yxshwanth/flash-sale-engine

🚀 Flash Sale Engine

A production-ready high-concurrency distributed system for handling flash sales with idempotency, atomic inventory management, fault tolerance, and comprehensive observability. Built with Go, Kafka (Redpanda), Redis, and Docker.

✨ Features

Core Features

  • 🔄 Idempotency: Prevents duplicate order processing using Redis SETNX with configurable TTL and order status tracking
  • ⚛️ Atomic Inventory: Race-condition-free stock management using Redis Lua scripts with edge case handling
  • 📨 Async Processing: Kafka-based message queue for decoupled processing
  • 🛡️ Fault Tolerance: Circuit breaker pattern with configurable thresholds and Dead Letter Queue (DLQ) for failed orders
  • ✅ Input Validation: Comprehensive server-side validation with clear error messages
  • 📝 Structured Logging: JSON logs with correlation IDs for end-to-end request tracing
  • 🏥 Health Checks: Kubernetes-ready health endpoint with service status
  • 📊 High Concurrency: Handles thousands of concurrent requests

Production-Ready Features

  • 🚦 Rate Limiting: Per-user rate limiting using Redis sliding window (configurable)
  • 📈 Prometheus Metrics: Comprehensive metrics for monitoring and alerting
  • ⏱️ Request Timeouts: Context-based timeouts for all external calls
  • 🔄 Graceful Shutdown: Handles termination signals to drain in-flight requests
  • 📊 DLQ Monitoring: Track DLQ size, age, and failure reasons
  • 🔍 Order Status Tracking: Track order status (PENDING, COMPLETED, FAILED) in Redis
  • ⚡ Enhanced Circuit Breaker: Configurable failure thresholds, success thresholds, and timeouts

🏗️ Architecture

┌─────────┐      ┌─────────┐      ┌──────────┐      ┌─────────┐
│  User   │─────▶│ Gateway │─────▶│  Kafka   │─────▶│Processor│
│ Request │      │  (API)  │      │ (Queue)  │      │ (Worker)│
└─────────┘      │ :8080   │      │          │      │ :9090   │
                 │ /metrics│      └──────────┘      │ /metrics│
                 └─────────┘                        └─────────┘
                      │                                  │
                      ▼                                  ▼
                 ┌─────────┐                        ┌─────────┐
                 │  Redis  │                        │   DLQ   │
                 │(Idempot │                        │(Failed  │
                 │  ency,  │                        │ Orders) │
                 │  Rate   │                        └─────────┘
                 │  Limit) │
                 └─────────┘

Service Ports:

  • Gateway: :8080 (HTTP API, /health, /metrics)
  • Processor: :9090 (Prometheus metrics)
  • Redis: :6379 (Idempotency, Inventory, Rate Limiting)
  • Redpanda: :19092 (Kafka-compatible message broker)

🚀 Quick Start

Prerequisites

  • Docker and Docker Compose
  • Go 1.23+ (optional, for local development)
  • Make (optional, for Mac/Linux users - see Makefile commands)

1. Clone and Start

Option A: Using Make (Mac/Linux)

git clone <your-repo-url>
cd flash-sale-engine
make build    # Build Docker images
make up       # Start all services
make seed     # Seed inventory (100 items for item_id '101')

Option B: Using Docker Compose

git clone <your-repo-url>
cd flash-sale-engine
docker-compose up -d --build
docker exec flash-sale-engine-redis-1 redis-cli SET inventory:101 100

Option C: Using PowerShell (Windows)

git clone <your-repo-url>
cd flash-sale-engine
docker-compose up -d --build
docker exec flash-sale-engine-redis-1 redis-cli SET inventory:101 100

2. Seed Inventory

Using Make:

make seed                    # Default: 100 items for item_id '101'
make seed-item ITEM=102 QTY=50  # Custom item and quantity

Using Docker:

docker exec flash-sale-engine-redis-1 redis-cli SET inventory:101 100

3. Test the System

Option A: Comprehensive Test Suite (Recommended)

# Windows PowerShell - Tests all features
.\test-all-features.ps1

This comprehensive test suite validates:

  • ✅ Input validation
  • ✅ Idempotency
  • ✅ Atomic inventory operations
  • ✅ Structured logging with correlation IDs
  • ✅ Health checks
  • ✅ Circuit breaker behavior
  • ✅ Rate limiting
  • ✅ Prometheus metrics
  • ✅ DLQ monitoring
  • ✅ Order status tracking
  • ✅ Sold out handling

Option B: Quick Manual Testing

# Send an order
$body = '{"user_id":"u1","item_id":"101","amount":1,"request_id":"req-123"}'
Invoke-WebRequest -Uri "http://localhost:8080/buy" -Method POST -Body $body -ContentType "application/json" -UseBasicParsing

# Test idempotency (send same request twice)
Invoke-WebRequest -Uri "http://localhost:8080/buy" -Method POST -Body $body -ContentType "application/json" -UseBasicParsing
# Second request should return 409 Conflict

See TESTING.md for detailed testing scenarios.

📋 API Documentation

POST /buy

Place an order for a flash sale item.

Request:

{
  "user_id": "u1",
  "item_id": "101",
  "amount": 1,
  "request_id": "unique-request-id-123"
}

Validation Rules:

  • user_id: Required, alphanumeric/underscore/hyphen, max 100 chars
  • item_id: Required, alphanumeric/underscore/hyphen, max 100 chars
  • amount: Required, integer between 1 and 1000
  • request_id: Required, non-empty, max 200 chars
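
The rules above can be sketched as a small Go validator. This is a minimal illustration, not the code in gateway/validation.go: the names Order, FieldError, idPattern, and validateOrder are hypothetical, and every message except "amount must be at least 1" (taken from the 400 example below) is invented for the sketch.

```go
package main

import (
	"fmt"
	"regexp"
)

// Order mirrors the JSON body of POST /buy.
type Order struct {
	UserID    string `json:"user_id"`
	ItemID    string `json:"item_id"`
	Amount    int    `json:"amount"`
	RequestID string `json:"request_id"`
}

// FieldError matches one entry of the "errors" array in the 400 response.
type FieldError struct {
	Field   string `json:"field"`
	Message string `json:"message"`
}

// idPattern is an assumed encoding of "alphanumeric/underscore/hyphen, max 100 chars".
var idPattern = regexp.MustCompile(`^[A-Za-z0-9_-]{1,100}$`)

const idMessage = "must be 1-100 characters: letters, digits, underscore, or hyphen"

// validateOrder (hypothetical helper) returns one FieldError per violated rule.
func validateOrder(o Order) []FieldError {
	var errs []FieldError
	if !idPattern.MatchString(o.UserID) {
		errs = append(errs, FieldError{"user_id", "user_id " + idMessage})
	}
	if !idPattern.MatchString(o.ItemID) {
		errs = append(errs, FieldError{"item_id", "item_id " + idMessage})
	}
	if o.Amount < 1 {
		errs = append(errs, FieldError{"amount", "amount must be at least 1"})
	} else if o.Amount > 1000 {
		errs = append(errs, FieldError{"amount", "amount must be at most 1000"})
	}
	if o.RequestID == "" || len(o.RequestID) > 200 {
		errs = append(errs, FieldError{"request_id", "request_id must be 1-200 characters"})
	}
	return errs
}

func main() {
	// An order with amount 0 violates only the amount rule.
	fmt.Println(validateOrder(Order{UserID: "u1", ItemID: "101", Amount: 0, RequestID: "req-123"}))
}
```

Collecting every violation instead of stopping at the first lets the gateway report all problems in one 400 response.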

Responses:

  • 202 Accepted: Order queued successfully
    {
      "status": "Order Queued",
      "correlation_id": "uuid-here"
    }
  • 409 Conflict: Duplicate request detected (idempotency)
  • 429 Too Many Requests: Rate limit exceeded
  • 400 Bad Request: Validation failed
    {
      "error": "Validation failed",
      "errors": [
        {"field": "amount", "message": "amount must be at least 1"}
      ],
      "correlation_id": "uuid-here"
    }
  • 503 Service Unavailable: Circuit breaker is open (Kafka unavailable)
  • 500 Internal Server Error: Server error

GET /health

Health check endpoint for Kubernetes liveness/readiness probes.

Response:

{
  "status": "healthy",
  "redis": true,
  "kafka": true,
  "circuit_breaker_state": "closed"
}

  • 200 OK: All services healthy
  • 503 Service Unavailable: One or more services unhealthy

GET /metrics (Gateway)

Prometheus metrics endpoint for monitoring.

Metrics Exposed:

  • gateway_orders_received_total - Total orders received
  • gateway_orders_successful_total - Orders successfully queued
  • gateway_orders_failed_total - Orders that failed to queue
  • gateway_orders_validation_failed_total - Validation failures
  • gateway_orders_idempotency_rejected_total - Duplicate requests rejected
  • gateway_request_duration_seconds - Request processing time histogram
  • gateway_circuit_breaker_state - Circuit breaker state (0=closed, 1=open, 2=half-open)

Example:

curl http://localhost:8080/metrics

GET /metrics (Processor)

Prometheus metrics endpoint for processor monitoring (port 9090).

Metrics Exposed:

  • processor_orders_processed_total - Total orders processed
  • processor_orders_processed_success_total - Successfully processed
  • processor_orders_processed_failed_total - Failed processing
  • processor_orders_sold_out_total - Orders rejected due to sold out
  • processor_orders_moved_to_dlq_total - Orders moved to DLQ
  • processor_order_processing_duration_seconds - Processing time histogram
  • processor_dlq_size - Current DLQ depth
  • processor_dlq_oldest_message_age_seconds - Age of oldest DLQ message
  • processor_inventory_level{item_id="..."} - Inventory level per item

Example:

curl http://localhost:9090/metrics

🎯 Key Features Explained

1. Idempotency

Problem: User double-clicks or network retries cause duplicate orders.

Solution: Redis SETNX (Set if Not Exists) with request_id as key and 10-minute TTL.

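isNew, err := redisClient.SetNX(ctx, "idempotency:"+order.RequestID, "processing", 10*time.Minute).Result()
if err != nil {
    return http.StatusInternalServerError // Redis error: cannot tell whether this is a duplicate
}
if !isNew {
    return http.StatusConflict // Duplicate detected
}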

Demo:

# First request - succeeds
.\test-buy.ps1 -RequestId "demo-123"

# Second request with same ID - rejected
.\test-buy.ps1 -RequestId "demo-123"  # Returns 409 Conflict

2. Atomic Inventory Management

Problem: Race conditions when multiple users buy simultaneously.

Solution: A Redis Lua script performs the stock check, the decrement, and any refund as a single atomic operation.

// Lua script atomically decrements and refunds if sold out
result, err := checkInventoryScript.Run(ctx, redisClient, []string{inventoryKey}).Result()
// Returns {success: 0|1, stock: int} - all atomic

Benefits:

  • No race conditions: Redis runs each Lua script to completion without interleaving other commands
  • Automatic refund if sold out
  • No partial failures
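
To make the decrement-and-refund idea concrete, here is a sketch under stated assumptions. The Lua text in reserveLua is a guess at the script's shape, not the contents of processor/redis_scripts.go, and memStore is an in-memory stand-in for Redis so the example runs without a server. Its mutex plays the role of Redis's guarantee that a script executes without interleaving.

```go
package main

import (
	"fmt"
	"sync"
)

// reserveLua is an assumed shape for the script; the real one may differ.
// It returns {success, stock}, matching the comment in the snippet above.
const reserveLua = `
local stock = redis.call('DECR', KEYS[1])
if stock < 0 then
  redis.call('INCR', KEYS[1]) -- refund: the item was already sold out
  return {0, 0}
end
return {1, stock}`

// memStore is a hypothetical in-memory stand-in for Redis.
type memStore struct {
	mu    sync.Mutex
	stock map[string]int
}

// reserve mirrors reserveLua: decrement, then undo the decrement if it went negative.
func (s *memStore) reserve(key string) (ok bool, remaining int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.stock[key]--
	if s.stock[key] < 0 {
		s.stock[key]++
		return false, 0
	}
	return true, s.stock[key]
}

// sell fires n concurrent reserve calls and returns how many succeeded.
func sell(s *memStore, key string, n int) int {
	var wg sync.WaitGroup
	var mu sync.Mutex
	sold := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if ok, _ := s.reserve(key); ok {
				mu.Lock()
				sold++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return sold
}

func main() {
	s := &memStore{stock: map[string]int{"inventory:101": 100}}
	// 1000 buyers compete for 100 items: prints units sold, then remaining stock.
	fmt.Println(sell(s, "inventory:101", 1000), s.stock["inventory:101"])
}
```

Because the check and the decrement happen inside one critical section, the sold count can never exceed the seeded stock, however many buyers arrive at once.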

3. Circuit Breaker Pattern

Problem: Kafka failures can cascade and crash the gateway.

Solution: Enhanced circuit breaker with configurable thresholds and exponential backoff.

Features:

  • Opens after N consecutive failures (configurable, default: 5)
  • Half-open state allows limited requests to test recovery
  • Configurable timeout with exponential backoff support
  • State exposed via /health endpoint and Prometheus metrics

Configuration:

  • CIRCUIT_BREAKER_FAILURE_THRESHOLD: Failures before opening (default: 5)
  • CIRCUIT_BREAKER_SUCCESS_THRESHOLD: Successes in half-open (default: 2)
  • CIRCUIT_BREAKER_BASE_TIMEOUT: Base timeout (default: 30s)
  • CIRCUIT_BREAKER_MAX_TIMEOUT: Max timeout (default: 300s)

// Circuit breaker wraps Kafka producer
producer = NewCircuitBreaker(rawProducer)
// Returns 503 Service Unavailable when circuit is open

4. Input Validation

Problem: Invalid inputs can cause errors or security issues.

Solution: Comprehensive validation with clear error messages.

  • Validates user_id, item_id format (alphanumeric, underscore, hyphen)
  • Validates amount (1-1000 range)
  • Validates request_id (required, non-empty)
  • Returns 400 Bad Request with detailed error messages

5. Structured Logging

Problem: Hard to trace requests across services.

Solution: JSON logs with correlation IDs for request tracing.

  • Gateway generates UUID correlation IDs
  • Correlation IDs passed via Kafka headers
  • All logs include correlation_id field
  • Enables tracing requests across gateway → Kafka → processor

6. Fault Tolerance (DLQ)

Problem: The payment service fails after inventory has already been reserved for the order.

Solution: Failed orders moved to Dead Letter Queue, inventory refunded atomically.

Features:

  • Automatic inventory refund on failure (atomic Lua script)
  • DLQ size and age monitoring via Prometheus
  • Failure reason categorization
  • Correlation IDs preserved for tracing

if paymentFails {
    // Refund inventory using Lua script (atomic)
    refundScript.Run(ctx, redisClient, []string{inventoryKey}, 1)
    moveToDLQ(msg, "Payment Timeout", correlationID)
}

7. Rate Limiting

Problem: Users can overwhelm the system with too many requests.

Solution: Per-user rate limiting using Redis sliding window.

Features:

  • Configurable max requests per window
  • Per-user tracking (isolated limits)
  • Redis-based for distributed systems
  • Returns 429 Too Many Requests when exceeded

Configuration:

  • RATE_LIMIT_MAX_REQUESTS: Max requests per window (default: 60)
  • RATE_LIMIT_WINDOW: Time window (default: 1m)

8. Prometheus Metrics

Problem: No visibility into system performance and health.

Solution: Comprehensive Prometheus metrics for monitoring and alerting.

Gateway Metrics (:8080/metrics):

  • Order counters (received, successful, failed, validation errors, idempotency rejections)
  • Request duration histogram
  • Circuit breaker state gauge

Processor Metrics (:9090/metrics):

  • Processing counters (total, success, failed, sold out, DLQ)
  • Processing duration histogram
  • DLQ size and age gauges
  • Inventory level gauge per item

See OPERATIONS.md for monitoring and alerting guidelines.

9. Graceful Shutdown

Problem: Abrupt termination causes in-flight requests to fail.

Solution: Handles SIGTERM/SIGINT to drain in-flight requests.

Features:

  • Stops accepting new requests
  • Waits for in-flight requests to complete (30s timeout)
  • Closes connections gracefully
  • Ensures no data loss during shutdown

10. Order Status Tracking

Problem: No way to query order status after submission.

Solution: Track order status in Redis with TTL.

Status Values:

  • PENDING: Order queued, awaiting processing
  • COMPLETED: Order processed successfully
  • FAILED_SOLD_OUT: Order failed due to insufficient inventory
  • FAILED_PAYMENT: Order failed due to payment timeout

TTL: 30 minutes (configurable)

Query:

docker exec flash-sale-engine-redis-1 redis-cli GET "order_status:request-id-123"

📊 Monitoring & Observability

Logs

View Gateway Logs:

docker-compose logs -f gateway

View Processor Logs:

docker-compose logs -f processor

Search Logs by Correlation ID:

# Find all logs for a specific request
docker-compose logs gateway processor | grep "correlation-id-here"

Metrics

Gateway Metrics:

curl http://localhost:8080/metrics

Processor Metrics:

curl http://localhost:9090/metrics

Query Specific Metrics:

# Check circuit breaker state
curl -s http://localhost:8080/metrics | grep gateway_circuit_breaker_state

# Check DLQ size
curl -s http://localhost:9090/metrics | grep processor_dlq_size

# Check order success rate
curl -s http://localhost:8080/metrics | grep gateway_orders_successful_total

Health Checks

Check Service Health:

curl http://localhost:8080/health

Redis Operations

Check Inventory:

docker exec flash-sale-engine-redis-1 redis-cli GET inventory:101

Check Order Status:

docker exec flash-sale-engine-redis-1 redis-cli GET "order_status:request-id-123"

Check Rate Limit:

docker exec flash-sale-engine-redis-1 redis-cli GET "ratelimit:user-id-123"

Check All Services:

docker-compose ps

See OPERATIONS.md for comprehensive monitoring and troubleshooting guide.

🐳 Docker Compose Services

  • gateway: HTTP API service (port 8080)
    • Endpoints: /buy, /health, /metrics
    • Features: Rate limiting, circuit breaker, input validation
  • processor: Kafka consumer worker (port 9090)
    • Endpoints: /metrics
    • Features: Atomic inventory, DLQ handling, order status tracking
  • redis: Inventory, idempotency, and rate limiting storage (port 6379)
  • redpanda: Kafka-compatible message broker (port 19092)

☸️ Kubernetes Deployment

# Deploy infrastructure
kubectl apply -f k8s/infrastructure.yaml

# Wait for services
kubectl wait --for=condition=ready pod -l app=redis --timeout=60s
kubectl wait --for=condition=ready pod -l app=redpanda --timeout=60s

# Deploy applications
kubectl apply -f k8s/apps.yaml

# Seed inventory
kubectl exec -it deployment/redis -- redis-cli SET inventory:101 100

# Test (NodePort on 30000)
curl -X POST http://localhost:30000/buy \
  -H "Content-Type: application/json" \
  -d '{"user_id":"u1","item_id":"101","amount":1,"request_id":"req-123"}'

🧪 Testing

Comprehensive Test Suite

Run the full test suite to validate all features:

# Run comprehensive test suite
.\test-all-features.ps1

Tests Cover:

  • ✅ Input validation (missing fields, invalid values, format validation)
  • ✅ Idempotency (duplicate request rejection, order status tracking)
  • ✅ Atomic inventory (concurrent orders, sold out handling)
  • ✅ Structured logging (correlation IDs across services)
  • ✅ Health check endpoint (service status)
  • ✅ Circuit breaker (failure handling, recovery)
  • ✅ Rate limiting (per-user limits, 429 responses)
  • ✅ Prometheus metrics (gateway and processor)
  • ✅ DLQ monitoring (size, age, failure reasons)
  • ✅ Order status tracking (PENDING, COMPLETED, FAILED states)

Quick Manual Tests

Test 1: Idempotency

$body = '{"user_id":"u1","item_id":"101","amount":1,"request_id":"test-123"}'
# First request - should return 202
Invoke-WebRequest -Uri "http://localhost:8080/buy" -Method POST -Body $body -ContentType "application/json" -UseBasicParsing
# Second request - should return 409
Invoke-WebRequest -Uri "http://localhost:8080/buy" -Method POST -Body $body -ContentType "application/json" -UseBasicParsing

Test 2: Rate Limiting

# Send 70 requests rapidly (limit is 60/min)
for ($i=1; $i -le 70; $i++) {
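    # PowerShell does not treat backslash as an escape character, so build the JSON from a hashtable
    $body = @{ user_id = "ratelimit-user"; item_id = "101"; amount = 1; request_id = "rate-test-$i" } | ConvertTo-Json -Compress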
    try {
        Invoke-WebRequest -Uri "http://localhost:8080/buy" -Method POST -Body $body -ContentType "application/json" -UseBasicParsing
    } catch {
        Write-Host "Request $i : $($_.Exception.Response.StatusCode.value__)"
    }
}
# Should see 429 responses after 60 requests

Test 3: Metrics

# Check gateway metrics
Invoke-WebRequest -Uri "http://localhost:8080/metrics" -UseBasicParsing

# Check processor metrics
Invoke-WebRequest -Uri "http://localhost:9090/metrics" -UseBasicParsing

Test 4: Circuit Breaker

# Stop Kafka
docker-compose stop redpanda

# Send 6 requests (will fail)
for ($i=1; $i -le 6; $i++) {
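    # Build the JSON from a hashtable; backslash-escaped quotes do not work in PowerShell strings
    $body = @{ user_id = "u$i"; item_id = "101"; amount = 1; request_id = "cb-test-$i" } | ConvertTo-Json -Compress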
    try {
        Invoke-WebRequest -Uri "http://localhost:8080/buy" -Method POST -Body $body -ContentType "application/json" -UseBasicParsing
    } catch { }
}

# Check health - circuit should be "open"
Invoke-WebRequest -Uri "http://localhost:8080/health" -UseBasicParsing

# Restart Kafka
docker-compose start redpanda

# Wait 35 seconds for circuit recovery
Start-Sleep -Seconds 35

# Check health - circuit should be "closed" (it may report "half-open" until CIRCUIT_BREAKER_SUCCESS_THRESHOLD requests have succeeded)
Invoke-WebRequest -Uri "http://localhost:8080/health" -UseBasicParsing

See TESTING.md for detailed testing scenarios and troubleshooting.

📁 Project Structure

flash-sale-engine/
├── gateway/
│   ├── main.go              # HTTP API (Producer)
│   ├── validation.go        # Input validation logic
│   ├── circuit_breaker.go   # Circuit breaker for Kafka producer
│   └── rate_limiter.go      # Per-user rate limiting
├── processor/
│   ├── main.go              # Kafka Consumer (Worker)
│   ├── redis_scripts.go     # Redis Lua scripts for atomic operations
│   └── dlq_metrics.go       # DLQ monitoring metrics
├── common/
│   ├── logger.go            # Structured logging utilities
│   └── metrics.go           # Prometheus metrics definitions
├── k8s/
│   ├── infrastructure.yaml  # Redis, Redpanda
│   └── apps.yaml            # Gateway, Processor
├── Dockerfile               # Multi-stage build for both services
├── docker-compose.yml       # Local development setup
├── Makefile                 # Make commands for Mac/Linux users
├── go.mod                   # Go module dependencies
├── test-all-features.ps1    # Comprehensive test suite (PowerShell)
├── README.md                # This file
├── TESTING.md               # Detailed testing guide
└── OPERATIONS.md            # Operations runbook

🔧 Development

Using Makefile (Mac/Linux)

The project includes a Makefile with convenient commands:

make help              # Show all available commands
make build             # Build Docker images
make up                # Start all services
make down              # Stop all services
make logs              # View logs from all services
make logs-gateway      # View gateway logs
make logs-processor    # View processor logs
make test              # Run comprehensive test suite
make seed              # Seed inventory (100 items for item_id '101')
make health            # Check service health
make metrics           # View Prometheus metrics
make inventory ITEM=101 # Check inventory for specific item
make order-status REQ_ID=test-123 # Check order status
make clean             # Stop services and remove containers
make rebuild           # Rebuild and restart services
make test-order        # Send a test order
make test-idempotency  # Test idempotency (send same request twice)

Build Locally

Using Make:

make build

Manual Build:

go mod download
go build -o gateway-bin ./gateway
go build -o processor-bin ./processor

Run Locally (requires Redis and Kafka)

Using Docker Compose:

make up  # or: docker-compose up -d

Manual Run:

# Terminal 1: Gateway (the Service Ports list above puts Redpanda on host port 19092)
REDIS_ADDR=localhost:6379 KAFKA_ADDR=localhost:19092 ./gateway-bin

# Terminal 2: Processor
REDIS_ADDR=localhost:6379 KAFKA_ADDR=localhost:19092 ./processor-bin

📈 Performance Considerations

  • Idempotency Key TTL: 10 minutes (prevents key accumulation)
  • Order Status TTL: 30 minutes (configurable)
  • Circuit Breaker: Configurable thresholds (default: 5 failures, 30s timeout)
  • Rate Limiting: Configurable per-user limits (default: 60 requests/minute)
  • Request Timeouts: Context-based timeouts for all external calls
  • Kafka Topic: Auto-created in dev mode
  • Redis Lua Scripts: Atomic operations prevent race conditions
  • Structured Logging: JSON format for easy log aggregation
  • Concurrency: Handles 1000+ requests/second
  • Inventory Operations: All atomic (no locks needed)
  • Graceful Shutdown: 30s timeout for draining in-flight requests

⚙️ Configuration

Environment Variables

Gateway:

  • REDIS_ADDR: Redis address (default: redis-service:6379)
  • KAFKA_ADDR: Kafka address (default: kafka-service:9092)
  • LOG_LEVEL: Log level - debug, info, warn, error (default: info)
  • CIRCUIT_BREAKER_FAILURE_THRESHOLD: Failures before opening (default: 5)
  • CIRCUIT_BREAKER_SUCCESS_THRESHOLD: Successes in half-open (default: 2)
  • CIRCUIT_BREAKER_BASE_TIMEOUT: Base timeout (default: 30s)
  • CIRCUIT_BREAKER_MAX_TIMEOUT: Max timeout (default: 300s)
  • RATE_LIMIT_MAX_REQUESTS: Max requests per window (default: 60)
  • RATE_LIMIT_WINDOW: Rate limit window (default: 1m)

Processor:

  • REDIS_ADDR: Redis address (default: redis-service:6379)
  • KAFKA_ADDR: Kafka address (default: kafka-service:9092)
  • LOG_LEVEL: Log level - debug, info, warn, error (default: info)

Docker Compose Configuration

Edit docker-compose.yml to customize environment variables:

gateway:
  environment:
    - RATE_LIMIT_MAX_REQUESTS=120  # Increase rate limit
    - CIRCUIT_BREAKER_FAILURE_THRESHOLD=10  # More tolerant
    - LOG_LEVEL=debug  # Verbose logging

🛠️ Troubleshooting

Services not starting?

docker-compose logs
docker-compose ps

Can't connect to Redis/Kafka?

  • Check network: docker network ls
  • Verify services: docker-compose ps
  • Check logs: docker-compose logs <service>

Orders not processing?

  • Check processor logs: docker-compose logs processor
  • Verify Kafka topic exists
  • Check Redis connection

Circuit breaker stuck open?

  • Check Kafka/Redpanda is running: docker-compose ps redpanda
  • Restart Kafka: docker-compose restart redpanda
  • Wait 30+ seconds for circuit recovery
  • Check health: curl http://localhost:8080/health

Rate limiting too aggressive?

  • Check current limit: docker-compose exec gateway env | grep RATE_LIMIT
  • Increase limit in docker-compose.yml, then recreate the container so the new environment is applied (a plain restart keeps the old values): docker-compose up -d gateway

Metrics not accessible?

  • Verify ports are exposed: docker-compose ps
  • Check service is running: docker-compose logs gateway processor
  • Test endpoints: curl http://localhost:8080/metrics and curl http://localhost:9090/metrics

DLQ growing?

  • Check DLQ size: curl -s http://localhost:9090/metrics | grep processor_dlq_size
  • Review failure reasons in processor logs: docker-compose logs processor | grep DLQ
  • Check DLQ messages: docker exec flash-sale-engine-redpanda-1 rpk topic consume orders-dlq

See OPERATIONS.md for comprehensive troubleshooting guide.

📚 Documentation

  • TESTING.md: Comprehensive testing guide with all scenarios
  • OPERATIONS.md: Operations runbook for production monitoring and troubleshooting

🔗 Related Resources

📝 License

MIT License

🀝 Contributing

Contributions welcome! Please open an issue or submit a PR.

📧 Contact

For questions or issues, please open a GitHub issue.


Built with ❀️ for high-concurrency distributed systems

Production-Ready Features:

  • ✅ Comprehensive monitoring and observability
  • ✅ Fault tolerance and resilience patterns
  • ✅ Rate limiting and request validation
  • ✅ Graceful shutdown and resource management
  • ✅ End-to-end request tracing
  • ✅ Operational runbooks and testing guides
