A production-ready high-concurrency distributed system for handling flash sales with idempotency, atomic inventory management, fault tolerance, and comprehensive observability. Built with Go, Kafka (Redpanda), Redis, and Docker.
- π Idempotency: Prevents duplicate order processing using Redis SETNX with configurable TTL and order status tracking
- βοΈ Atomic Inventory: Race-condition-free stock management using Redis Lua scripts with edge case handling
- π¨ Async Processing: Kafka-based message queue for decoupled processing
- π‘οΈ Fault Tolerance: Circuit breaker pattern with configurable thresholds and Dead Letter Queue (DLQ) for failed orders
- β Input Validation: Comprehensive server-side validation with clear error messages
- π Structured Logging: JSON logs with correlation IDs for end-to-end request tracing
- π₯ Health Checks: Kubernetes-ready health endpoint with service status
- π High Concurrency: Handles thousands of concurrent requests
- π¦ Rate Limiting: Per-user rate limiting using Redis sliding window (configurable)
- π Prometheus Metrics: Comprehensive metrics for monitoring and alerting
- β±οΈ Request Timeouts: Context-based timeouts for all external calls
- π Graceful Shutdown: Handles termination signals to drain in-flight requests
- π DLQ Monitoring: Track DLQ size, age, and failure reasons
- π Order Status Tracking: Track order status (PENDING, COMPLETED, FAILED) in Redis
- β‘ Enhanced Circuit Breaker: Configurable failure thresholds, success thresholds, and timeouts
βββββββββββ βββββββββββ ββββββββββββ βββββββββββ
β User βββββββΆβ Gateway βββββββΆβ Kafka βββββββΆβProcessorβ
β Request β β (API) β β (Queue) β β (Worker)β
βββββββββββ β :8080 β β β β :9090 β
β /metricsβ ββββββββββββ β /metricsβ
βββββββββββ βββββββββββ
β β
βΌ βΌ
βββββββββββ βββββββββββ
β Redis β β DLQ β
β(Idempot β β(Failed β
β ency, β β Orders) β
β Rate β βββββββββββ
β Limit) β
βββββββββββ
Service Ports:
- Gateway:
:8080(HTTP API,/health,/metrics) - Processor:
:9090(Prometheus metrics) - Redis:
:6379(Idempotency, Inventory, Rate Limiting) - Redpanda:
:19092(Kafka-compatible message broker)
- Docker and Docker Compose
- Go 1.23+ (optional, for local development)
- Make (optional, for Mac/Linux users - see Makefile commands)
Option A: Using Make (Mac/Linux)
git clone <your-repo-url>
cd flash-sale-engine
make build # Build Docker images
make up # Start all services
make seed # Seed inventory (100 items for item_id '101')Option B: Using Docker Compose
git clone <your-repo-url>
cd flash-sale-engine
docker-compose up -d --build
docker exec flash-sale-engine-redis-1 redis-cli SET inventory:101 100Option C: Using PowerShell (Windows)
git clone <your-repo-url>
cd flash-sale-engine
docker-compose up -d --build
docker exec flash-sale-engine-redis-1 redis-cli SET inventory:101 100Using Make:
make seed # Default: 100 items for item_id '101'
make seed-item ITEM=102 QTY=50 # Custom item and quantityUsing Docker:
docker exec flash-sale-engine-redis-1 redis-cli SET inventory:101 100Option A: Comprehensive Test Suite (Recommended)
# Windows PowerShell - Tests all features
.\test-all-features.ps1This comprehensive test suite validates:
- β Input validation
- β Idempotency
- β Atomic inventory operations
- β Structured logging with correlation IDs
- β Health checks
- β Circuit breaker behavior
- β Rate limiting
- β Prometheus metrics
- β DLQ monitoring
- β Order status tracking
- β Sold out handling
Option B: Quick Manual Testing
# Send an order
$body = '{"user_id":"u1","item_id":"101","amount":1,"request_id":"req-123"}'
Invoke-WebRequest -Uri "http://localhost:8080/buy" -Method POST -Body $body -ContentType "application/json" -UseBasicParsing
# Test idempotency (send same request twice)
Invoke-WebRequest -Uri "http://localhost:8080/buy" -Method POST -Body $body -ContentType "application/json" -UseBasicParsing
# Second request should return 409 ConflictSee TESTING.md for detailed testing scenarios.
Place an order for a flash sale item.
Request:
{
"user_id": "u1",
"item_id": "101",
"amount": 1,
"request_id": "unique-request-id-123"
}Validation Rules:
user_id: Required, alphanumeric/underscore/hyphen, max 100 charsitem_id: Required, alphanumeric/underscore/hyphen, max 100 charsamount: Required, integer between 1 and 1000request_id: Required, non-empty, max 200 chars
Responses:
202 Accepted: Order queued successfully{ "status": "Order Queued", "correlation_id": "uuid-here" }409 Conflict: Duplicate request detected (idempotency)429 Too Many Requests: Rate limit exceeded400 Bad Request: Validation failed{ "error": "Validation failed", "errors": [ {"field": "amount", "message": "amount must be at least 1"} ], "correlation_id": "uuid-here" }503 Service Unavailable: Circuit breaker is open (Kafka unavailable)500 Internal Server Error: Server error
Health check endpoint for Kubernetes liveness/readiness probes.
Response:
{
"status": "healthy",
"redis": true,
"kafka": true,
"circuit_breaker_state": "closed"
}200 OK: All services healthy503 Service Unavailable: One or more services unhealthy
Prometheus metrics endpoint for monitoring.
Metrics Exposed:
gateway_orders_received_total- Total orders receivedgateway_orders_successful_total- Orders successfully queuedgateway_orders_failed_total- Orders that failed to queuegateway_orders_validation_failed_total- Validation failuresgateway_orders_idempotency_rejected_total- Duplicate requests rejectedgateway_request_duration_seconds- Request processing time histogramgateway_circuit_breaker_state- Circuit breaker state (0=closed, 1=open, 2=half-open)
Example:
curl http://localhost:8080/metricsPrometheus metrics endpoint for processor monitoring (port 9090).
Metrics Exposed:
processor_orders_processed_total- Total orders processedprocessor_orders_processed_success_total- Successfully processedprocessor_orders_processed_failed_total- Failed processingprocessor_orders_sold_out_total- Orders rejected due to sold outprocessor_orders_moved_to_dlq_total- Orders moved to DLQprocessor_order_processing_duration_seconds- Processing time histogramprocessor_dlq_size- Current DLQ depthprocessor_dlq_oldest_message_age_seconds- Age of oldest DLQ messageprocessor_inventory_level{item_id="..."}- Inventory level per item
Example:
curl http://localhost:9090/metricsProblem: User double-clicks or network retries cause duplicate orders.
Solution: Redis SETNX (Set if Not Exists) with request_id as key and 10-minute TTL.
isNew, err := redisClient.SetNX(ctx, "idempotency:"+order.RequestID, "processing", 10*time.Minute).Result()
if !isNew {
return http.StatusConflict // Duplicate detected
}Demo:
# First request - succeeds
.\test-buy.ps1 -RequestId "demo-123"
# Second request with same ID - rejected
.\test-buy.ps1 -RequestId "demo-123" # Returns 409 ConflictProblem: Race conditions when multiple users buy simultaneously.
Solution: Redis Lua scripts ensure atomic check-and-refund operations.
// Lua script atomically decrements and refunds if sold out
result, err := checkInventoryScript.Run(ctx, redisClient, []string{inventoryKey}).Result()
// Returns {success: 0|1, stock: int} - all atomicBenefits:
- No race conditions possible (Lua scripts are atomic)
- Automatic refund if sold out
- No partial failures
Problem: Kafka failures can cascade and crash the gateway.
Solution: Enhanced circuit breaker with configurable thresholds and exponential backoff.
Features:
- Opens after N consecutive failures (configurable, default: 5)
- Half-open state allows limited requests to test recovery
- Configurable timeout with exponential backoff support
- State exposed via
/healthendpoint and Prometheus metrics
Configuration:
CIRCUIT_BREAKER_FAILURE_THRESHOLD: Failures before opening (default: 5)CIRCUIT_BREAKER_SUCCESS_THRESHOLD: Successes in half-open (default: 2)CIRCUIT_BREAKER_BASE_TIMEOUT: Base timeout (default: 30s)CIRCUIT_BREAKER_MAX_TIMEOUT: Max timeout (default: 300s)
// Circuit breaker wraps Kafka producer
producer = NewCircuitBreaker(rawProducer)
// Returns 503 Service Unavailable when circuit is openProblem: Invalid inputs can cause errors or security issues.
Solution: Comprehensive validation with clear error messages.
- Validates user_id, item_id format (alphanumeric, underscore, hyphen)
- Validates amount (1-1000 range)
- Validates request_id (required, non-empty)
- Returns 400 Bad Request with detailed error messages
Problem: Hard to trace requests across services.
Solution: JSON logs with correlation IDs for request tracing.
- Gateway generates UUID correlation IDs
- Correlation IDs passed via Kafka headers
- All logs include correlation_id field
- Enables tracing requests across gateway β Kafka β processor
Problem: Payment service fails, but order is already processed.
Solution: Failed orders moved to Dead Letter Queue, inventory refunded atomically.
Features:
- Automatic inventory refund on failure (atomic Lua script)
- DLQ size and age monitoring via Prometheus
- Failure reason categorization
- Correlation IDs preserved for tracing
if paymentFails {
// Refund inventory using Lua script (atomic)
refundScript.Run(ctx, redisClient, []string{inventoryKey}, 1)
moveToDLQ(msg, "Payment Timeout", correlationID)
}Problem: Users can overwhelm the system with too many requests.
Solution: Per-user rate limiting using Redis sliding window.
Features:
- Configurable max requests per window
- Per-user tracking (isolated limits)
- Redis-based for distributed systems
- Returns
429 Too Many Requestswhen exceeded
Configuration:
RATE_LIMIT_MAX_REQUESTS: Max requests per window (default: 60)RATE_LIMIT_WINDOW: Time window (default: 1m)
Problem: No visibility into system performance and health.
Solution: Comprehensive Prometheus metrics for monitoring and alerting.
Gateway Metrics (:8080/metrics):
- Order counters (received, successful, failed, validation errors, idempotency rejections)
- Request duration histogram
- Circuit breaker state gauge
Processor Metrics (:9090/metrics):
- Processing counters (total, success, failed, sold out, DLQ)
- Processing duration histogram
- DLQ size and age gauges
- Inventory level gauge per item
See OPERATIONS.md for monitoring and alerting guidelines.
Problem: Abrupt termination causes in-flight requests to fail.
Solution: Handles SIGTERM/SIGINT to drain in-flight requests.
Features:
- Stops accepting new requests
- Waits for in-flight requests to complete (30s timeout)
- Closes connections gracefully
- Ensures no data loss during shutdown
Problem: No way to query order status after submission.
Solution: Track order status in Redis with TTL.
Status Values:
PENDING: Order queued, awaiting processingCOMPLETED: Order processed successfullyFAILED_SOLD_OUT: Order failed due to insufficient inventoryFAILED_PAYMENT: Order failed due to payment timeout
TTL: 30 minutes (configurable)
Query:
docker exec flash-sale-engine-redis-1 redis-cli GET "order_status:request-id-123"View Gateway Logs:
docker-compose logs -f gatewayView Processor Logs:
docker-compose logs -f processorSearch Logs by Correlation ID:
# Find all logs for a specific request
docker-compose logs gateway processor | grep "correlation-id-here"Gateway Metrics:
curl http://localhost:8080/metricsProcessor Metrics:
curl http://localhost:9090/metricsQuery Specific Metrics:
# Check circuit breaker state
curl -s http://localhost:8080/metrics | grep gateway_circuit_breaker_state
# Check DLQ size
curl -s http://localhost:9090/metrics | grep processor_dlq_size
# Check order success rate
curl -s http://localhost:8080/metrics | grep gateway_orders_successful_totalCheck Service Health:
curl http://localhost:8080/healthCheck Inventory:
docker exec flash-sale-engine-redis-1 redis-cli GET inventory:101Check Order Status:
docker exec flash-sale-engine-redis-1 redis-cli GET "order_status:request-id-123"Check Rate Limit:
docker exec flash-sale-engine-redis-1 redis-cli GET "ratelimit:user-id-123"Check All Services:
docker-compose psSee OPERATIONS.md for comprehensive monitoring and troubleshooting guide.
- gateway: HTTP API service (port 8080)
- Endpoints:
/buy,/health,/metrics - Features: Rate limiting, circuit breaker, input validation
- Endpoints:
- processor: Kafka consumer worker (port 9090)
- Endpoints:
/metrics - Features: Atomic inventory, DLQ handling, order status tracking
- Endpoints:
- redis: Inventory, idempotency, and rate limiting storage (port 6379)
- redpanda: Kafka-compatible message broker (port 19092)
# Deploy infrastructure
kubectl apply -f k8s/infrastructure.yaml
# Wait for services
kubectl wait --for=condition=ready pod -l app=redis --timeout=60s
kubectl wait --for=condition=ready pod -l app=redpanda --timeout=60s
# Deploy applications
kubectl apply -f k8s/apps.yaml
# Seed inventory
kubectl exec -it deployment/redis -- redis-cli SET inventory:101 100
# Test (NodePort on 30000)
curl -X POST http://localhost:30000/buy \
-H "Content-Type: application/json" \
-d '{"user_id":"u1","item_id":"101","amount":1,"request_id":"req-123"}'Run the full test suite to validate all features:
# Run comprehensive test suite
.\test-all-features.ps1Tests Cover:
- β Input validation (missing fields, invalid values, format validation)
- β Idempotency (duplicate request rejection, order status tracking)
- β Atomic inventory (concurrent orders, sold out handling)
- β Structured logging (correlation IDs across services)
- β Health check endpoint (service status)
- β Circuit breaker (failure handling, recovery)
- β Rate limiting (per-user limits, 429 responses)
- β Prometheus metrics (gateway and processor)
- β DLQ monitoring (size, age, failure reasons)
- β Order status tracking (PENDING, COMPLETED, FAILED states)
Test 1: Idempotency
$body = '{"user_id":"u1","item_id":"101","amount":1,"request_id":"test-123"}'
# First request - should return 202
Invoke-WebRequest -Uri "http://localhost:8080/buy" -Method POST -Body $body -ContentType "application/json" -UseBasicParsing
# Second request - should return 409
Invoke-WebRequest -Uri "http://localhost:8080/buy" -Method POST -Body $body -ContentType "application/json" -UseBasicParsingTest 2: Rate Limiting
# Send 70 requests rapidly (limit is 60/min)
for ($i=1; $i -le 70; $i++) {
$body = "{\"user_id\":\"ratelimit-user\",\"item_id\":\"101\",\"amount\":1,\"request_id\":\"rate-test-$i\"}"
try {
Invoke-WebRequest -Uri "http://localhost:8080/buy" -Method POST -Body $body -ContentType "application/json" -UseBasicParsing
} catch {
Write-Host "Request $i : $($_.Exception.Response.StatusCode.value__)"
}
}
# Should see 429 responses after 60 requestsTest 3: Metrics
# Check gateway metrics
Invoke-WebRequest -Uri "http://localhost:8080/metrics" -UseBasicParsing
# Check processor metrics
Invoke-WebRequest -Uri "http://localhost:9090/metrics" -UseBasicParsingTest 4: Circuit Breaker
# Stop Kafka
docker-compose stop redpanda
# Send 6 requests (will fail)
for ($i=1; $i -le 6; $i++) {
$body = "{\"user_id\":\"u$i\",\"item_id\":\"101\",\"amount\":1,\"request_id\":\"cb-test-$i\"}"
try {
Invoke-WebRequest -Uri "http://localhost:8080/buy" -Method POST -Body $body -ContentType "application/json" -UseBasicParsing
} catch { }
}
# Check health - circuit should be "open"
Invoke-WebRequest -Uri "http://localhost:8080/health" -UseBasicParsing
# Restart Kafka
docker-compose start redpanda
# Wait 35 seconds for circuit recovery
Start-Sleep -Seconds 35
# Check health - circuit should be "closed"
Invoke-WebRequest -Uri "http://localhost:8080/health" -UseBasicParsingSee TESTING.md for detailed testing scenarios and troubleshooting.
flash-sale-engine/
βββ gateway/
β βββ main.go # HTTP API (Producer)
β βββ validation.go # Input validation logic
β βββ circuit_breaker.go # Circuit breaker for Kafka producer
β βββ rate_limiter.go # Per-user rate limiting
βββ processor/
β βββ main.go # Kafka Consumer (Worker)
β βββ redis_scripts.go # Redis Lua scripts for atomic operations
β βββ dlq_metrics.go # DLQ monitoring metrics
βββ common/
β βββ logger.go # Structured logging utilities
β βββ metrics.go # Prometheus metrics definitions
βββ k8s/
β βββ infrastructure.yaml # Redis, Redpanda
β βββ apps.yaml # Gateway, Processor
βββ Dockerfile # Multi-stage build for both services
βββ docker-compose.yml # Local development setup
βββ Makefile # Make commands for Mac/Linux users
βββ go.mod # Go module dependencies
βββ test-all-features.ps1 # Comprehensive test suite (PowerShell)
βββ README.md # This file
βββ TESTING.md # Detailed testing guide
βββ OPERATIONS.md # Operations runbook
The project includes a Makefile with convenient commands:
make help # Show all available commands
make build # Build Docker images
make up # Start all services
make down # Stop all services
make logs # View logs from all services
make logs-gateway # View gateway logs
make logs-processor # View processor logs
make test # Run comprehensive test suite
make seed # Seed inventory (100 items for item_id '101')
make health # Check service health
make metrics # View Prometheus metrics
make inventory ITEM=101 # Check inventory for specific item
make order-status REQ_ID=test-123 # Check order status
make clean # Stop services and remove containers
make rebuild # Rebuild and restart services
make test-order # Send a test order
make test-idempotency # Test idempotency (send same request twice)Using Make:
make buildManual Build:
go mod download
go build -o gateway-bin ./gateway
go build -o processor-bin ./processorUsing Docker Compose:
make up # or: docker-compose up -dManual Run:
# Terminal 1: Gateway
REDIS_ADDR=localhost:6379 KAFKA_ADDR=localhost:9092 ./gateway-bin
# Terminal 2: Processor
REDIS_ADDR=localhost:6379 KAFKA_ADDR=localhost:9092 ./processor-bin- Idempotency Key TTL: 10 minutes (prevents key accumulation)
- Order Status TTL: 30 minutes (configurable)
- Circuit Breaker: Configurable thresholds (default: 5 failures, 30s timeout)
- Rate Limiting: Configurable per-user limits (default: 60 requests/minute)
- Request Timeouts: Context-based timeouts for all external calls
- Kafka Topic: Auto-created in dev mode
- Redis Lua Scripts: Atomic operations prevent race conditions
- Structured Logging: JSON format for easy log aggregation
- Concurrency: Handles 1000+ requests/second
- Inventory Operations: All atomic (no locks needed)
- Graceful Shutdown: 30s timeout for draining in-flight requests
Gateway:
REDIS_ADDR: Redis address (default:redis-service:6379)KAFKA_ADDR: Kafka address (default:kafka-service:9092)LOG_LEVEL: Log level -debug,info,warn,error(default:info)CIRCUIT_BREAKER_FAILURE_THRESHOLD: Failures before opening (default:5)CIRCUIT_BREAKER_SUCCESS_THRESHOLD: Successes in half-open (default:2)CIRCUIT_BREAKER_BASE_TIMEOUT: Base timeout (default:30s)CIRCUIT_BREAKER_MAX_TIMEOUT: Max timeout (default:300s)RATE_LIMIT_MAX_REQUESTS: Max requests per window (default:60)RATE_LIMIT_WINDOW: Rate limit window (default:1m)
Processor:
REDIS_ADDR: Redis address (default:redis-service:6379)KAFKA_ADDR: Kafka address (default:kafka-service:9092)LOG_LEVEL: Log level -debug,info,warn,error(default:info)
Edit docker-compose.yml to customize environment variables:
gateway:
environment:
- RATE_LIMIT_MAX_REQUESTS=120 # Increase rate limit
- CIRCUIT_BREAKER_FAILURE_THRESHOLD=10 # More tolerant
- LOG_LEVEL=debug # Verbose loggingServices not starting?
docker-compose logs
docker-compose psCan't connect to Redis/Kafka?
- Check network:
docker network ls - Verify services:
docker-compose ps - Check logs:
docker-compose logs <service>
Orders not processing?
- Check processor logs:
docker-compose logs processor - Verify Kafka topic exists
- Check Redis connection
Circuit breaker stuck open?
- Check Kafka/Redpanda is running:
docker-compose ps redpanda - Restart Kafka:
docker-compose restart redpanda - Wait 30+ seconds for circuit recovery
- Check health:
curl http://localhost:8080/health
Rate limiting too aggressive?
- Check current limit:
docker-compose exec gateway env | grep RATE_LIMIT - Increase limit in
docker-compose.ymland restart:docker-compose restart gateway
Metrics not accessible?
- Verify ports are exposed:
docker-compose ps - Check service is running:
docker-compose logs gateway processor - Test endpoints:
curl http://localhost:8080/metricsandcurl http://localhost:9090/metrics
DLQ growing?
- Check DLQ size:
curl -s http://localhost:9090/metrics | grep processor_dlq_size - Review failure reasons in processor logs:
docker-compose logs processor | grep DLQ - Check DLQ messages:
docker exec flash-sale-engine-redpanda-1 rpk topic consume orders-dlq
See OPERATIONS.md for comprehensive troubleshooting guide.
- TESTING.md: Comprehensive testing guide with all scenarios
- OPERATIONS.md: Operations runbook for production monitoring and troubleshooting
- Prometheus: prometheus.io - Metrics collection
- Redis: redis.io - In-memory data store
- Redpanda: redpanda.com - Kafka-compatible message broker
- Go Circuit Breaker: github.com/sony/gobreaker
MIT License
Contributions welcome! Please open an issue or submit a PR.
For questions or issues, please open a GitHub issue.
Built with β€οΈ for high-concurrency distributed systems
Production-Ready Features:
- β Comprehensive monitoring and observability
- β Fault tolerance and resilience patterns
- β Rate limiting and request validation
- β Graceful shutdown and resource management
- β End-to-end request tracing
- β Operational runbooks and testing guides