Cloud Logging

Eric Fitzgerald edited this page Apr 8, 2026 · 2 revisions

TMI can send application logs directly to a cloud provider's logging service. This enables centralized log management, long-term retention, and integration with cloud monitoring and alerting systems.

Overview

The cloud logging feature provides:

  • Provider-agnostic interface - The CloudLogWriter interface supports multiple cloud providers
  • OCI Logging implementation - First-class support for Oracle Cloud Infrastructure Logging
  • Asynchronous buffered writes - Non-blocking log delivery with configurable batching
  • Health tracking - Monitors cloud logging connectivity
  • Graceful shutdown - Flushes pending logs before application exit
  • Dual logging - Sends logs to both local files and cloud simultaneously

Architecture

                    ┌─────────────────────────────┐
                    │      TMI Application        │
                    │                             │
                    │  slog.Logger                │
                    │      │                      │
                    │      ▼                      │
                    │  CloudLogHandler            │
                    │      │                      │
                    │      ├──► Local Handler     │──► File/Console
                    │      │                      │
                    │      └──► CloudLogWriter    │──► Cloud Provider
                    │           (async buffer)    │
                    └─────────────────────────────┘
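
The fan-out in the diagram can be sketched with the standard library's slog.Handler interface. The fanoutHandler type below is hypothetical and is not TMI's CloudLogHandler. It only illustrates how one logger can feed a local sink and a cloud sink that each apply their own minimum level. Both sinks are in-memory buffers here, standing in for the file handler and the async cloud writer.

```go
package main

import (
	"bytes"
	"context"
	"errors"
	"fmt"
	"log/slog"
)

// fanoutHandler is a hypothetical stand-in for CloudLogHandler: it forwards
// each record to every child handler that accepts the record's level.
type fanoutHandler struct {
	children []slog.Handler
}

// Enabled reports true if at least one child wants records at this level.
func (f fanoutHandler) Enabled(ctx context.Context, level slog.Level) bool {
	for _, h := range f.children {
		if h.Enabled(ctx, level) {
			return true
		}
	}
	return false
}

// Handle passes a copy of the record to each enabled child and joins any errors.
func (f fanoutHandler) Handle(ctx context.Context, r slog.Record) error {
	var errs []error
	for _, h := range f.children {
		if h.Enabled(ctx, r.Level) {
			errs = append(errs, h.Handle(ctx, r.Clone()))
		}
	}
	return errors.Join(errs...) // nil entries are ignored
}

// WithAttrs returns a new fan-out whose children all carry the extra attributes.
func (f fanoutHandler) WithAttrs(attrs []slog.Attr) slog.Handler {
	next := make([]slog.Handler, len(f.children))
	for i, h := range f.children {
		next[i] = h.WithAttrs(attrs)
	}
	return fanoutHandler{children: next}
}

// WithGroup returns a new fan-out whose children all open the named group.
func (f fanoutHandler) WithGroup(name string) slog.Handler {
	next := make([]slog.Handler, len(f.children))
	for i, h := range f.children {
		next[i] = h.WithGroup(name)
	}
	return fanoutHandler{children: next}
}

// demo logs one DEBUG and one INFO record and returns what each sink received.
func demo() (local, cloud string) {
	var localBuf, cloudBuf bytes.Buffer
	h := fanoutHandler{children: []slog.Handler{
		slog.NewJSONHandler(&localBuf, &slog.HandlerOptions{Level: slog.LevelDebug}),
		slog.NewJSONHandler(&cloudBuf, &slog.HandlerOptions{Level: slog.LevelInfo}),
	}}
	logger := slog.New(h)
	logger.Debug("cache miss", "key", "k1")
	logger.Info("request processed", "request_id", "abc123")
	return localBuf.String(), cloudBuf.String()
}

func main() {
	local, cloud := demo()
	fmt.Print("local:\n", local, "cloud:\n", cloud)
}
```

In this sketch the local sink receives both records, while the cloud sink receives only the INFO record because its handler is configured with a higher minimum level.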

Configuration

Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| TMI_LOG_LEVEL | Minimum log level | info |
| TMI_LOG_DIR | Directory for local log files | logs/ |
| TMI_CLOUD_LOG_ENABLED | Enable cloud logging | false |
| TMI_CLOUD_LOG_LEVEL | Minimum level for cloud logs | Same as TMI_LOG_LEVEL |

Note: The async write buffer size (default: 1000) is configured programmatically via Config.CloudLogBufferSize, not through an environment variable.
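
As an illustration of how these variables and their defaults fit together, here is a minimal sketch of a loader. The envConfig struct, parseLevel, and loadEnvConfig are hypothetical names, not TMI's actual configuration code, and the accepted boolean spellings are an assumption based on Go's strconv.ParseBool.

```go
package main

import (
	"fmt"
	"log/slog"
	"os"
	"strconv"
)

// envConfig is a hypothetical holder for the variables in the table above.
type envConfig struct {
	Level        slog.Level
	LogDir       string
	CloudEnabled bool
	CloudLevel   slog.Level
}

// parseLevel maps a level name such as "debug" or "WARN" to a slog.Level.
// Empty or unrecognized input yields def.
func parseLevel(s string, def slog.Level) slog.Level {
	var lvl slog.Level
	if s == "" || lvl.UnmarshalText([]byte(s)) != nil {
		return def
	}
	return lvl
}

// loadEnvConfig reads settings through getenv, so callers can pass os.Getenv
// in production and a map-backed function in tests.
func loadEnvConfig(getenv func(string) string) envConfig {
	cfg := envConfig{LogDir: "logs/"}
	cfg.Level = parseLevel(getenv("TMI_LOG_LEVEL"), slog.LevelInfo)
	if dir := getenv("TMI_LOG_DIR"); dir != "" {
		cfg.LogDir = dir
	}
	// An unset or unparsable value leaves cloud logging disabled.
	cfg.CloudEnabled, _ = strconv.ParseBool(getenv("TMI_CLOUD_LOG_ENABLED"))
	// The cloud level falls back to the general level when unset.
	cfg.CloudLevel = parseLevel(getenv("TMI_CLOUD_LOG_LEVEL"), cfg.Level)
	return cfg
}

func main() {
	cfg := loadEnvConfig(os.Getenv)
	fmt.Printf("%+v\n", cfg)
}
```

Passing the lookup function as a parameter keeps the defaulting logic testable without touching the process environment.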

OCI-Specific Variables

| Variable | Description | Required |
|----------|-------------|----------|
| TMI_CLOUD_LOG_PROVIDER | Cloud provider (oci) | Yes |
| TMI_OCI_LOG_ID | OCI Log OCID | Yes |

OCI Logging Setup

Prerequisites

  1. OCI Logging service enabled in your tenancy
  2. Log Group created in OCI Console
  3. Custom Log created within the Log Group
  4. IAM policy allowing OKE worker nodes (or other compute) to write logs

OCI Authentication

The OCI cloud writer uses the following authentication priority:

  1. Explicit ConfigProvider - If provided in OCICloudWriterConfig
  2. Resource Principal - For OCI Container Instances and Functions
  3. Instance Principal - For OCI VMs
  4. Default config (~/.oci/config) - For local development
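
The priority order above amounts to a first-success fallback chain. The sketch below shows that pattern in isolation. The credentialSource type, the resolve function, and the stub loaders are all hypothetical. A real writer would wrap OCI SDK configuration providers instead of returning strings.

```go
package main

import (
	"errors"
	"fmt"
)

// credentialSource is a hypothetical pairing of a label with a function that
// tries to obtain credentials. The loaders in this sketch are stubs.
type credentialSource struct {
	name string
	load func() (string, error)
}

// resolve walks the sources in priority order and returns the first one that
// succeeds. If none succeeds, the error wraps every individual failure.
func resolve(sources []credentialSource) (name, cred string, err error) {
	errs := []error{errors.New("no credential source succeeded")}
	for _, s := range sources {
		c, e := s.load()
		if e == nil {
			return s.name, c, nil
		}
		errs = append(errs, fmt.Errorf("%s: %w", s.name, e))
	}
	return "", "", errors.Join(errs...)
}

// unavailable simulates a source that cannot be used in this environment.
func unavailable() (string, error) { return "", errors.New("not available") }

// demoChain mimics an OCI VM: no explicit provider, no resource principal,
// but an instance principal is present.
func demoChain() []credentialSource {
	return []credentialSource{
		{"explicit ConfigProvider", unavailable},
		{"resource principal", unavailable},
		{"instance principal", func() (string, error) { return "instance-token", nil }},
		{"default config (~/.oci/config)", func() (string, error) { return "file-key", nil }},
	}
}

func main() {
	name, _, err := resolve(demoChain())
	fmt.Println("selected:", name, "error:", err)
}
```

Collecting the per-source failures makes it easier to see why every method was rejected when authentication fails outright.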

Terraform Configuration

The Terraform logging module creates the required OCI resources. It supports OKE control plane logs (SERVICE type) and container stdout/stderr log collection via the OCI Unified Monitoring Agent:

module "logging" {
  source = "../../modules/logging/oci"

  compartment_id           = var.compartment_id
  tenancy_ocid             = var.tenancy_ocid
  name_prefix              = var.name_prefix
  object_storage_namespace = data.oci_objectstorage_namespace.ns.namespace

  # OKE control plane log (SERVICE log)
  create_oke_log = true
  oke_cluster_id = var.oke_cluster_id

  # Container stdout/stderr log collection via Unified Monitoring Agent
  create_container_log = true

  retention_days         = 30
  archive_retention_days = 365
  create_archive_bucket  = true
  create_alert_topic     = true
  alert_email            = "<alert-email-address>"
  create_alarms          = true

  tags = local.tags
}

Note: Setting create_container_log = true also creates the dynamic group and IAM policy for OKE worker nodes to ship logs. Application logs must be in JSON format (use slog.NewJSONHandler, not TextHandler) for the Unified Monitoring Agent parser to extract structured fields.

Manual OCI Setup

If not using Terraform:

  1. Create Log Group:
     oci logging log-group create \
       --compartment-id <compartment-ocid> \
       --display-name "tmi-logs"
  2. Create Custom Log:
     oci logging log create \
       --log-group-id <log-group-ocid> \
       --display-name "tmi-application" \
       --log-type CUSTOM \
       --is-enabled true
  3. Create IAM Policy:
     Allow dynamic-group tmi-oke-workers to use log-content in compartment id <compartment-ocid>
     Allow dynamic-group tmi-oke-workers to manage log-groups in compartment id <compartment-ocid>

CloudLogWriter Interface

The CloudLogWriter interface enables support for multiple cloud providers:

type CloudLogWriter interface {
    io.Writer

    // WriteLog sends a structured log entry to the cloud provider
    WriteLog(ctx context.Context, entry LogEntry) error

    // Flush forces any buffered logs to be sent immediately
    Flush(ctx context.Context) error

    // Close releases resources and flushes remaining logs
    Close() error

    // Name returns the provider name for identification
    Name() string

    // IsHealthy returns true if the cloud provider is reachable
    IsHealthy(ctx context.Context) bool
}

LogEntry Structure

type LogEntry struct {
    Timestamp time.Time
    Level     slog.Level
    Message   string
    Attrs     map[string]any
    Source    string // file:line if available
}
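
To show how a LogEntry might be populated, the following sketch flattens a standard slog.Record into the struct above. The entryFromRecord helper is hypothetical and is not taken from TMI's source. Nested attribute groups are not expanded in this sketch.

```go
package main

import (
	"fmt"
	"log/slog"
	"runtime"
	"time"
)

// LogEntry mirrors the struct shown above.
type LogEntry struct {
	Timestamp time.Time
	Level     slog.Level
	Message   string
	Attrs     map[string]any
	Source    string // file:line if available
}

// entryFromRecord is a hypothetical helper that copies a slog.Record into a
// LogEntry, resolving each attribute to a plain Go value.
func entryFromRecord(r slog.Record) LogEntry {
	e := LogEntry{
		Timestamp: r.Time,
		Level:     r.Level,
		Message:   r.Message,
		Attrs:     make(map[string]any, r.NumAttrs()),
	}
	r.Attrs(func(a slog.Attr) bool {
		e.Attrs[a.Key] = a.Value.Resolve().Any()
		return true // keep iterating
	})
	// A zero PC means the caller did not record a source location.
	if r.PC != 0 {
		frame, _ := runtime.CallersFrames([]uintptr{r.PC}).Next()
		e.Source = fmt.Sprintf("%s:%d", frame.File, frame.Line)
	}
	return e
}

// sampleEntry builds a record by hand, capturing its own program counter so
// that the Source field is filled in.
func sampleEntry() LogEntry {
	var pcs [1]uintptr
	runtime.Callers(1, pcs[:]) // skip runtime.Callers itself
	r := slog.NewRecord(time.Now(), slog.LevelInfo, "Request processed", pcs[0])
	r.AddAttrs(slog.String("request_id", "abc123"), slog.Int("latency_ms", 45))
	return entryFromRecord(r)
}

func main() {
	e := sampleEntry()
	fmt.Println(e.Level, e.Message, e.Attrs, e.Source)
}
```

Note that slog stores integer attributes as int64, so consumers of Attrs should not assume the original Go integer type.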

OCI Cloud Writer

The OCICloudWriter implements CloudLogWriter for the OCI Logging service.

Features

  • Batched writes - Collects entries and sends in batches (default: 100 entries)
  • Periodic flushing - Flushes buffer every 5 seconds even if not full
  • Health tracking - Monitors successful/failed write operations
  • Structured logging - Preserves log attributes as JSON fields
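
The combination of batched writes and periodic flushing can be sketched as a single goroutine that selects between an input channel and a ticker. The batcher type below is hypothetical and much simpler than OCICloudWriter: entries are plain strings and the send callback stands in for the OCI API call.

```go
package main

import (
	"fmt"
	"time"
)

// batcher is a hypothetical sketch of size-triggered and time-triggered batching.
type batcher struct {
	in        chan string
	done      chan struct{}
	batchSize int
	interval  time.Duration
	send      func(batch []string) // stands in for the cloud API call
}

func newBatcher(batchSize int, interval time.Duration, send func([]string)) *batcher {
	b := &batcher{
		in:        make(chan string, 1000),
		done:      make(chan struct{}),
		batchSize: batchSize,
		interval:  interval,
		send:      send,
	}
	go b.run()
	return b
}

// run sends a batch when it is full, when the ticker fires, or when the
// input channel is closed and drained.
func (b *batcher) run() {
	defer close(b.done)
	ticker := time.NewTicker(b.interval)
	defer ticker.Stop()
	batch := make([]string, 0, b.batchSize)
	flush := func() {
		if len(batch) > 0 {
			b.send(batch)
			batch = make([]string, 0, b.batchSize)
		}
	}
	for {
		select {
		case e, ok := <-b.in:
			if !ok {
				flush() // final flush on shutdown
				return
			}
			batch = append(batch, e)
			if len(batch) >= b.batchSize {
				flush()
			}
		case <-ticker.C:
			flush() // partial batch after the interval elapses
		}
	}
}

// Add queues one entry. It blocks only if the input channel is full.
func (b *batcher) Add(e string) { b.in <- e }

// Close stops accepting entries and waits for the final flush.
func (b *batcher) Close() {
	close(b.in)
	<-b.done
}

// demoBatchSizes sends n entries and returns the size of each delivered batch.
// The long interval keeps the ticker from firing during the demo.
func demoBatchSizes(n, batchSize int) []int {
	var sizes []int
	b := newBatcher(batchSize, time.Hour, func(batch []string) {
		sizes = append(sizes, len(batch))
	})
	for i := 0; i < n; i++ {
		b.Add(fmt.Sprintf("entry-%d", i))
	}
	b.Close()
	return sizes
}

func main() {
	fmt.Println(demoBatchSizes(250, 100))
}
```

With 250 entries and a batch size of 100, the sketch delivers two full batches and one final partial batch of 50 when Close drains the channel.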

Configuration

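config := OCICloudWriterConfig{
    LogID:          "ocid1.log.oc1...",
    Source:         "tmi-server",
    Subject:        "production",
    BatchSize:      100,             // entries per batch
    FlushTimeout:   5 * time.Second, // periodic flush interval
    ConfigProvider: nil,             // nil falls through to the remaining methods listed under OCI Authentication
}

writer, err := NewOCICloudWriter(ctx, config)
if err != nil {
    log.Fatalf("Failed to create OCI writer: %v", err)
}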

Log Format in OCI

Logs appear in OCI Logging with this structure:

{
  "data": {
    "level": "INFO",
    "message": "Request processed successfully",
    "source": "api/server.go:245",
    "request_id": "abc123",
    "user_id": "user-456",
    "latency_ms": 45
  },
  "id": "1706123456789000000",
  "time": "2024-01-24T12:34:56.789Z"
}
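
A payload of this shape can be produced by merging the core fields and the log attributes into one data object. The sketch below is an assumption about how that mapping could work, not TMI's actual serializer. In particular, deriving id from the timestamp in nanoseconds is a guess based on the sample above, and ociEntry and toOCIJSON are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"time"
)

// ociEntry is a hypothetical type matching the JSON structure shown above.
type ociEntry struct {
	Data map[string]any `json:"data"`
	ID   string         `json:"id"`
	Time string         `json:"time"`
}

// toOCIJSON merges attributes and core fields into the data object.
func toOCIJSON(ts time.Time, level, msg, source string, attrs map[string]any) ([]byte, error) {
	data := make(map[string]any, len(attrs)+3)
	for k, v := range attrs {
		data[k] = v
	}
	// Core fields are written last so that an attribute named "level" or
	// "message" cannot overwrite them.
	data["level"] = level
	data["message"] = msg
	if source != "" {
		data["source"] = source
	}
	return json.Marshal(ociEntry{
		Data: data,
		ID:   strconv.FormatInt(ts.UnixNano(), 10), // assumption: id is a nanosecond timestamp
		Time: ts.UTC().Format("2006-01-02T15:04:05.000Z07:00"),
	})
}

// sampleJSON renders one entry with a fixed timestamp.
func sampleJSON() string {
	ts := time.Date(2024, 1, 24, 12, 34, 56, 789000000, time.UTC)
	out, err := toOCIJSON(ts, "INFO", "Request processed successfully", "api/server.go:245",
		map[string]any{"request_id": "abc123", "latency_ms": 45})
	if err != nil {
		return "error: " + err.Error()
	}
	return string(out)
}

func main() {
	fmt.Println(sampleJSON())
}
```

Because encoding/json sorts map keys, the fields inside data appear in alphabetical order rather than in the order shown in the sample.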

Logger Integration

Initialization

import "github.com/ericfitz/tmi/internal/slogging"

// Create OCI cloud writer
ociWriter, err := slogging.NewOCICloudWriter(ctx, slogging.OCICloudWriterConfig{
    LogID:  os.Getenv("TMI_OCI_LOG_ID"),
    Source: "tmi-server",
})
if err != nil {
    log.Fatalf("Failed to create OCI writer: %v", err)
}

// Initialize logger with cloud support
cloudLevel := slogging.LogLevelInfo
err = slogging.Initialize(slogging.Config{
    Level:              slogging.LogLevelDebug,
    IsDev:              false,
    LogDir:             "logs",
    AlsoLogToConsole:   true,
    CloudWriter:        ociWriter,
    CloudLogLevel:      &cloudLevel,
    CloudLogBufferSize: 1000,
})
if err != nil {
    log.Fatalf("Failed to initialize logger: %v", err)
}

Graceful Shutdown

Always close the logger to flush pending cloud logs:

defer func() {
    if err := slogging.Get().Close(); err != nil {
        log.Printf("Error closing logger: %v", err)
    }
}()
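
Because Close takes no context, a shutdown path may want to bound how long it waits for the final flush, for example when the cloud endpoint is unreachable. The sketch below shows one way to do that. The closeWithTimeout helper and errCloseTimeout are hypothetical and are not part of the slogging package.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errCloseTimeout is returned when the close function does not finish in time.
var errCloseTimeout = errors.New("timed out waiting for logger to flush")

// closeWithTimeout runs closeFn in a goroutine and waits at most timeout for
// it to return. On timeout the goroutine keeps running in the background.
func closeWithTimeout(closeFn func() error, timeout time.Duration) error {
	done := make(chan error, 1) // buffered so the goroutine never blocks
	go func() { done <- closeFn() }()

	timer := time.NewTimer(timeout)
	defer timer.Stop()
	select {
	case err := <-done:
		return err
	case <-timer.C:
		return errCloseTimeout
	}
}

// demo simulates a Close call that takes closeDelayMs milliseconds, bounded
// by a timeout of timeoutMs milliseconds.
func demo(closeDelayMs, timeoutMs int) error {
	slowClose := func() error {
		time.Sleep(time.Duration(closeDelayMs) * time.Millisecond)
		return nil
	}
	return closeWithTimeout(slowClose, time.Duration(timeoutMs)*time.Millisecond)
}

func main() {
	fmt.Println("fast close:", demo(0, 1000))
	fmt.Println("slow close:", demo(300, 20))
}
```

In an application, closeFn would be something like slogging.Get().Close, and the timeout would be chosen to fit within the platform's termination grace period.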

Monitoring Cloud Logging Health

logger := slogging.Get()

// Check error count
if errCount := logger.CloudLogErrors(); errCount > 0 {
    log.Printf("Cloud logging has %d errors", errCount)
}

// Get last error
if err := logger.CloudLogLastError(); err != nil {
    log.Printf("Last cloud log error: %v", err)
}
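
For readers implementing their own writer, counters like these can be maintained without locks by using sync/atomic. The healthTracker type below is a hypothetical sketch. How TMI tracks these values internally is not shown in this page.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
)

// healthTracker is a hypothetical lock-free record of write failures.
type healthTracker struct {
	errCount atomic.Int64
	lastErr  atomic.Pointer[error]
}

// record notes a failed write. Nil errors are ignored.
func (h *healthTracker) record(err error) {
	if err == nil {
		return
	}
	h.errCount.Add(1)
	h.lastErr.Store(&err)
}

// Errors returns the number of failures recorded so far.
func (h *healthTracker) Errors() int64 { return h.errCount.Load() }

// LastError returns the most recently stored error, or nil if none.
func (h *healthTracker) LastError() error {
	if p := h.lastErr.Load(); p != nil {
		return *p
	}
	return nil
}

// demoHealth records the given number of failures from concurrent goroutines
// and returns the resulting count and last error.
func demoHealth(failures int) (int64, error) {
	var h healthTracker
	var wg sync.WaitGroup
	for i := 0; i < failures; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			h.record(errors.New("write failed"))
		}()
	}
	wg.Wait()
	return h.Errors(), h.LastError()
}

func main() {
	n, err := demoHealth(50)
	fmt.Println(n, err)
}
```

Atomic counters keep the logging hot path free of mutex contention, which matters when many goroutines log at once.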

OCI Logging Module Resources

The Terraform logging module creates (resources are conditional based on input variables):

| Resource | Description | Condition |
|----------|-------------|-----------|
| Log Group | Container for related logs | Always created |
| OKE Control Plane Log | SERVICE log for kube-apiserver, controller manager, scheduler | create_oke_log = true |
| Container Custom Log | Custom log for container stdout/stderr | create_container_log = true |
| Unified Monitoring Agent | Fluentd-based agent that tails /var/log/containers/*.log on OKE workers | create_container_log = true |
| Dynamic Group | IAM group matching OKE worker node instances | create_container_log = true |
| IAM Policy | Permissions for worker nodes to ship logs | create_container_log = true |
| Object Storage Bucket | Archive storage (Archive tier) | create_archive_bucket = true |
| Service Connector | Automated log archival to Object Storage | create_archive_bucket = true and create_oke_log = true |
| Notification Topic | Alert delivery channel | create_alert_topic = true |
| Monitoring Alarm | Error rate alarm | create_alarms = true and create_oke_log = true |

Log Retention

| Tier | Retention | Cost |
|------|-----------|------|
| Live Logs | 30 days (configurable, 1-180 days) | Standard |
| Archive | 365 days (configurable) | Archive tier pricing |

Alarms

The module can create a monitoring alarm:

  • Error Rate Alarm - Triggers when error count exceeds the configured threshold (default: 10 errors in 1 minute) with a 5-minute pending duration. Notifications are sent to the alert topic if configured.

Querying Logs

OCI Console

  1. Navigate to Observability & Management > Logging > Logs
  2. Select your log group and log
  3. Use the search interface to filter logs

OCI CLI

# Search logs
oci logging-search search-logs \
  --search-query "search \"ocid1.log.oc1...\" | where level='ERROR'" \
  --time-start 2024-01-24T00:00:00Z \
  --time-end 2024-01-24T23:59:59Z

# Get recent logs
oci logging log-content get \
  --log-id <log-ocid> \
  --start-time 2024-01-24T12:00:00Z

Search Query Examples

-- Find all errors
search "ocid1.log.oc1..." | where level='ERROR'

-- Find by request ID
search "ocid1.log.oc1..." | where request_id='abc123'

-- Find slow requests
search "ocid1.log.oc1..." | where latency_ms > 1000

-- Find by user
search "ocid1.log.oc1..." | where user_id='user-456'

Best Practices

Log Levels

| Level | Cloud | Use Case |
|-------|-------|----------|
| DEBUG | Optional | Detailed debugging (high volume) |
| INFO | Yes | Normal operations, request logs |
| WARN | Yes | Recoverable issues |
| ERROR | Yes | Failures requiring attention |

To keep cloud ingestion costs down, consider setting the cloud log level to INFO while local logs capture DEBUG.

Structured Logging

Always use structured logging for better searchability:

logger.Info("Request processed",
    "request_id", requestID,
    "user_id", userID,
    "method", r.Method,
    "path", r.URL.Path,
    "status", statusCode,
    "latency_ms", latency.Milliseconds(),
)

Error Handling

Cloud logging errors don't fail the application. Monitor error counts:

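// Periodic health check; stops when ctx is cancelled
go func() {
    ticker := time.NewTicker(5 * time.Minute)
    defer ticker.Stop()
    for {
        select {
        case <-ctx.Done():
            return
        case <-ticker.C:
            if errCount := slogging.Get().CloudLogErrors(); errCount > 100 {
                // Alert on high error count (alerting.Send is a placeholder for your alerting integration)
                alerting.Send("Cloud logging degraded", errCount)
            }
        }
    }
}()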

Adding New Cloud Providers

To add support for a new cloud provider (e.g., AWS CloudWatch):

  1. Create a new file aws_cloud_writer.go
  2. Implement the CloudLogWriter interface
  3. Add configuration handling
  4. Update documentation

Example skeleton:

type AWSCloudWriter struct {
    client *cloudwatchlogs.Client
    // ...
}

func NewAWSCloudWriter(ctx context.Context, config AWSCloudWriterConfig) (*AWSCloudWriter, error) {
    // Initialize AWS CloudWatch Logs client
    return nil, errors.New("not implemented")
}

func (w *AWSCloudWriter) WriteLog(ctx context.Context, entry LogEntry) error {
    // Send to CloudWatch
    return errors.New("not implemented")
}

// Implement remaining interface methods...
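
Before writing a real provider, it can help to see every method of the interface implemented once. The following is a minimal in-memory writer, offered as a sketch that may also be useful in unit tests. MemoryCloudWriter is a hypothetical type. The LogEntry and CloudLogWriter definitions are copied from this page so that the example compiles on its own.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"log/slog"
	"sync"
	"time"
)

// LogEntry and CloudLogWriter are copied from the definitions on this page.
type LogEntry struct {
	Timestamp time.Time
	Level     slog.Level
	Message   string
	Attrs     map[string]any
	Source    string
}

type CloudLogWriter interface {
	io.Writer
	WriteLog(ctx context.Context, entry LogEntry) error
	Flush(ctx context.Context) error
	Close() error
	Name() string
	IsHealthy(ctx context.Context) bool
}

// MemoryCloudWriter is a hypothetical provider that keeps entries in memory.
type MemoryCloudWriter struct {
	mu      sync.Mutex
	entries []LogEntry
	closed  bool
}

// Compile-time check that the type satisfies the interface.
var _ CloudLogWriter = (*MemoryCloudWriter)(nil)

// Write wraps raw bytes in an INFO entry so the type also works as an io.Writer.
func (w *MemoryCloudWriter) Write(p []byte) (int, error) {
	entry := LogEntry{Timestamp: time.Now(), Level: slog.LevelInfo, Message: string(p)}
	if err := w.WriteLog(context.Background(), entry); err != nil {
		return 0, err
	}
	return len(p), nil
}

// WriteLog stores the entry unless the context is done or the writer is closed.
func (w *MemoryCloudWriter) WriteLog(ctx context.Context, entry LogEntry) error {
	if err := ctx.Err(); err != nil {
		return err
	}
	w.mu.Lock()
	defer w.mu.Unlock()
	if w.closed {
		return io.ErrClosedPipe
	}
	w.entries = append(w.entries, entry)
	return nil
}

// Flush has nothing to send because entries are stored immediately.
func (w *MemoryCloudWriter) Flush(ctx context.Context) error { return ctx.Err() }

// Close marks the writer as closed; later writes fail.
func (w *MemoryCloudWriter) Close() error {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.closed = true
	return nil
}

func (w *MemoryCloudWriter) Name() string { return "memory" }

// IsHealthy reports whether the writer still accepts entries.
func (w *MemoryCloudWriter) IsHealthy(ctx context.Context) bool {
	w.mu.Lock()
	defer w.mu.Unlock()
	return !w.closed
}

// Len returns the number of stored entries.
func (w *MemoryCloudWriter) Len() int {
	w.mu.Lock()
	defer w.mu.Unlock()
	return len(w.entries)
}

func main() {
	w := &MemoryCloudWriter{}
	fmt.Fprint(w, "hello")
	fmt.Println(w.Name(), w.Len(), w.IsHealthy(context.Background()))
}
```

The compile-time assertion on the var _ line is a common Go idiom that catches a missing or mistyped method as soon as the interface changes.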

Troubleshooting

Logs Not Appearing in OCI

  1. Verify TMI_OCI_LOG_ID is correct
  2. Check IAM policy grants write access
  3. Verify container can reach OCI services (Service Gateway)
  4. Check CloudLogErrors() for failures

High Latency

  1. Increase BatchSize to reduce API calls
  2. Verify network connectivity to OCI
  3. Check if buffer is consistently full (increase CloudLogBufferSize)

Buffer Overflow

If logs are being dropped:

  1. Increase CloudLogBufferSize
  2. Reduce FlushTimeout for faster delivery
  3. Consider higher CloudLogLevel to reduce volume
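
To make the trade-off concrete, here is a sketch of a bounded buffer that drops entries instead of blocking the caller when it is full. The dropBuffer type is hypothetical. It illustrates why a larger buffer or a faster consumer reduces drops, and it is not a copy of TMI's buffer implementation.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// dropBuffer is a hypothetical bounded queue that never blocks the producer.
type dropBuffer struct {
	ch      chan string
	dropped atomic.Int64
}

func newDropBuffer(size int) *dropBuffer {
	return &dropBuffer{ch: make(chan string, size)}
}

// Enqueue adds an entry if there is room. When the buffer is full the entry
// is discarded, the drop counter is incremented, and false is returned.
func (b *dropBuffer) Enqueue(entry string) bool {
	select {
	case b.ch <- entry:
		return true
	default:
		b.dropped.Add(1)
		return false
	}
}

// Dropped returns how many entries have been discarded so far.
func (b *dropBuffer) Dropped() int64 { return b.dropped.Load() }

// Pending returns how many entries are waiting to be consumed.
func (b *dropBuffer) Pending() int { return len(b.ch) }

func main() {
	b := newDropBuffer(2)
	for i := 0; i < 5; i++ {
		b.Enqueue(fmt.Sprintf("entry-%d", i))
	}
	// No consumer is running, so only the first two entries fit.
	fmt.Println("pending:", b.Pending(), "dropped:", b.Dropped())
}
```

Exposing a drop counter alongside the error count gives operators a direct signal for when the buffer size or log volume needs adjusting.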
