---
id: scaling
title: Scaling
sidebar_position: 9
description: Horizontal and vertical scaling strategies for Milvaion components.
---

# Scaling Guide

This guide covers strategies for scaling Milvaion horizontally and vertically to handle increasing workloads.

## Scaling Overview

Milvaion is designed for horizontal scaling:

| Component | Scaling Type | Strategy |
| --- | --- | --- |
| API Server | Horizontal | Add more instances behind a load balancer |
| Workers | Horizontal | Add more instances per job type |
| PostgreSQL | Vertical / read replicas | Larger instance or read replicas |
| Redis | Vertical / cluster | Larger instance or Redis Cluster |
| RabbitMQ | Horizontal | Clustering with replicated queues |

## Scaling Workers

Workers are stateless (unless you make them stateful), so scaling them is straightforward.

### Basic Scaling

```bash
# Docker Compose - scale to 5 instances
docker compose up -d --scale email-worker=5

# Kubernetes - scale deployment
kubectl scale deployment email-worker --replicas=5

# Kubernetes HPA (automatic)
kubectl autoscale deployment email-worker --min=2 --max=10 --cpu-percent=70
```

### Capacity Planning

```text
Jobs per second = Workers × MaxParallelJobs × (1 / AvgJobDuration)
```

Example:

- 3 workers
- `MaxParallelJobs`: 10
- Average job duration: 2 seconds

```text
Throughput = 3 × 10 × (1 / 2) = 15 jobs/second = 900 jobs/minute
```
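To size a deployment for a target rate, invert the same formula. A minimal sketch (the numbers are illustrative, not recommendations):

```bash
# Workers needed = ceil(target_rate × avg_duration / max_parallel_jobs)
target_jobs_per_sec=50
avg_job_duration_sec=2
max_parallel_jobs=10

echo $(( (target_jobs_per_sec * avg_job_duration_sec + max_parallel_jobs - 1) / max_parallel_jobs ))
# => 10 workers
```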

### Specialized Workers

Create job-specific workers for resource optimization:

**Email Worker (I/O-bound, high concurrency):**

```json
{
  "Worker": {
    "WorkerId": "email-worker",
    "MaxParallelJobs": 100
  }
}
```

**Report Worker (CPU-bound, low concurrency):**

```json
{
  "Worker": {
    "WorkerId": "report-worker",
    "MaxParallelJobs": 4
  }
}
```

### Worker Affinity

Route specific jobs to specific worker pools:

```text
            +---------------------+
            |      RabbitMQ       |
            |    Topic Exchange   |
            +---------------------+
                       |
         +-------------+-------------+
         |             |             |
   sendemail.*      report.*     migration.*
         |             |             |
   +----------+  +----------+  +------------+
   |  Email   |  |  Report  |  | Migration  |
   |  Workers |  |  Workers |  |  Worker    |
   |  (x10)   |  |  (x2)    |  |  (x1)      |
   +----------+  +----------+  +------------+
```
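How these bindings get created depends on your setup; as a hand-rolled sketch using RabbitMQ's management CLI (exchange and queue names here are illustrative, not Milvaion defaults):

```bash
# Bind each worker pool's queue to the topic exchange by routing-key pattern.
rabbitmqadmin declare binding source=milvaion.jobs destination_type=queue \
  destination=email-workers routing_key="sendemail.*"
rabbitmqadmin declare binding source=milvaion.jobs destination_type=queue \
  destination=report-workers routing_key="report.*"
rabbitmqadmin declare binding source=milvaion.jobs destination_type=queue \
  destination=migration-worker routing_key="migration.*"
```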

## Scaling the API Server

### Horizontal Scaling

The API is stateless – scale by adding instances:

```yaml
# docker-compose.yml
services:
  milvaion-api:
    image: milvasoft/milvaion-api:latest
    deploy:
      replicas: 3
```

### Load Balancer Configuration

NGINX example:

```nginx
upstream milvaion_api {
    least_conn;
    server api-1:5000;
    server api-2:5000;
    server api-3:5000;
}

server {
    listen 80;

    location / {
        proxy_pass http://milvaion_api;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
    }
}
```

**Note:** Enable WebSocket support for SignalR (the `Upgrade` and `Connection` headers above).
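A related tweak worth knowing: NGINX's default `proxy_read_timeout` of 60 seconds closes proxied connections that stay quiet longer than that, which can affect long-lived WebSocket connections. If you see unexpected SignalR disconnects, raising the timeouts is a common fix (values below are examples, not Milvaion requirements):

```nginx
location / {
    # Keep long-lived SignalR websockets open past the 60s default.
    proxy_read_timeout 120s;
    proxy_send_timeout 120s;
    # ...plus the proxy_pass and header directives shown above
}
```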

### Dispatcher Leader Election

Only one dispatcher should be active. Milvaion uses Redis distributed locking:

1. Each API instance attempts to acquire dispatcher lock
2. Lock winner runs dispatcher service
3. Other instances skip dispatch (passive standby)
4. If leader fails, lock expires, another instance takes over

Configure lock TTL:

```json
{
  "MilvaionConfig": {
    "JobDispatcher": {
      "LockTtlSeconds": 600
    }
  }
}
```
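Since the lock is an ordinary Redis key, you can inspect leader state directly. A sketch (the key name is hypothetical; check your deployment for the actual key Milvaion uses):

```bash
# Who holds the dispatcher lock, and how long until it expires?
redis-cli GET milvaion:dispatcher:lock
redis-cli TTL milvaion:dispatcher:lock   # seconds remaining; -2 if the key is absent
```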

## Scaling Infrastructure

### PostgreSQL

**Vertical Scaling (Primary):**

| Workload | vCPU | RAM | Storage |
| --- | --- | --- | --- |
| Small | 2 | 4 GB | 50 GB SSD |
| Medium | 4 | 8 GB | 100 GB SSD |
| Large | 8 | 16 GB | 500 GB SSD |

**Read Replicas:**

For heavy read workloads (dashboard, reports), add read replicas.

### Redis

**Vertical Scaling:**

| Workload | RAM | Notes |
| --- | --- | --- |
| Small | 1 GB | Single instance |
| Medium | 4 GB | Single instance with persistence |
| Large | 8+ GB | Redis Cluster or Redis Sentinel |

**Key Capacity Planning:**

- ~1 KB per scheduled job
- ~100 bytes per worker heartbeat
- ~500 bytes per distributed lock

Example: 10,000 active jobs ≈ 10 MB
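A slightly fuller back-of-envelope check using those per-key estimates (the worker and lock counts are made-up illustrative figures):

```bash
# 10,000 jobs × ~1 KB + 50 workers × ~100 B + 100 locks × ~500 B
echo $(( 10000 * 1024 + 50 * 100 + 100 * 500 ))   # 10295000 bytes ≈ 10 MB
```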

### RabbitMQ

**Clustering:**

For high availability, use RabbitMQ clustering with quorum queues.
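If you declare queues by hand, a quorum queue is requested via the `x-queue-type` argument. A sketch using the management CLI (the queue name is illustrative, not a Milvaion default):

```bash
# Declare a durable, replicated quorum queue.
rabbitmqadmin declare queue name=jobs.sendemail durable=true \
  arguments='{"x-queue-type": "quorum"}'
```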

**Queue Capacity:**

- ~500 bytes per queued job message
- 100,000 pending jobs ≈ 50 MB

## Throughput Benchmarks

### Reference Numbers

Tested on: 4 vCPU, 8 GB RAM, with PostgreSQL, Redis, and RabbitMQ on the same machine.

| Scenario | Jobs/sec | Workers | Concurrency |
| --- | --- | --- | --- |
| Simple logging job | 500 | 1 | 50 |
| API call (100 ms) | 100 | 1 | 10 |
| Database insert | 200 | 1 | 20 |
| Email send (500 ms) | 20 | 1 | 10 |

Throughput scales roughly linearly with the number of workers:

- 10 workers × 20 jobs/sec = 200 jobs/sec

### Bottleneck Identification

| Symptom | Likely Bottleneck | Solution |
| --- | --- | --- |
| High API CPU | Too many dashboard polls | Add API replicas |
| Redis high latency | Too many ZSET operations | Redis Cluster |
| RabbitMQ queue depth growing | Workers too slow | Add workers |
| PostgreSQL high CPU | Too many occurrence inserts | Read replicas, partitioning |
| Worker high memory | Job data too large | Optimize job payloads |
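A few generic commands that help confirm which row of the table you are in (none are Milvaion-specific):

```bash
# RabbitMQ: is queue depth growing faster than consumers drain it?
rabbitmqctl list_queues name messages consumers

# Redis: measure round-trip latency from the application host.
redis-cli --latency

# Kubernetes: per-pod CPU/memory usage (requires metrics-server).
kubectl top pods
```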

## Concurrency Policies

### Per-Job Concurrency

```json
{
  "concurrentExecutionPolicy": 0
}
```

| Value | Policy | Behavior |
| --- | --- | --- |
| 0 | Skip | Do not create an occurrence if one is already running |
| 1 | Queue | Create the occurrence and wait for the previous run to complete |
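For example, a long-running nightly report job is usually a good fit for Skip (0), so an overlapping trigger does not start a second run over the same data, while short user-triggered jobs often prefer Queue (1) so no request is silently dropped.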

### Worker-Level Concurrency

```json
{
  "Worker": {
    "MaxParallelJobs": 10
  },
  "JobConsumers": {
    "SendEmailJob": { "MaxParallelJobs": 20 },
    "GenerateReportJob": { "MaxParallelJobs": 2 }
  }
}
```

## Auto-Scaling Patterns

### Kubernetes HPA

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: email-worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: email-worker
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

### Queue-Based Scaling (KEDA)

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: email-worker-scaler
spec:
  scaleTargetRef:
    name: email-worker
  minReplicaCount: 1
  maxReplicaCount: 50
  triggers:
    - type: rabbitmq
      metadata:
        queueName: jobs.sendemail.wildcard
        host: amqp://guest:guest@rabbitmq:5672/
        mode: QueueLength
        value: "100"
```

## Scaling Checklist

### Before Scaling

- Identify the bottleneck (CPU, memory, I/O, network)
- Measure the current throughput baseline
- Check infrastructure capacity (connections, disk)

### After Scaling

- Verify even load distribution
- Monitor for new bottlenecks
- Update connection pool sizes if needed
- Adjust timeout values for increased load

### Connection Limits

| Component | Default Limit | Recommendation |
| --- | --- | --- |
| PostgreSQL | 100 connections | Increase to workers × 2 |
| Redis | 10,000 connections | Usually sufficient |
| RabbitMQ | 65,535 connections | Usually sufficient |
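For PostgreSQL specifically, a minimal sketch of raising the limit (200 assumes roughly 100 workers per the workers × 2 guideline; `max_connections` only takes effect after a restart):

```bash
psql -U postgres -c "ALTER SYSTEM SET max_connections = 200;"
# Restart PostgreSQL for the new limit to apply.
```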

## What's Next?