Skip to content

Latest commit

 

History

History
426 lines (329 loc) · 14.7 KB

File metadata and controls

426 lines (329 loc) · 14.7 KB

DevKit - AutoMQ Local Development Environment

Overview

DevKit is AutoMQ's local development environment. One-command Kafka cluster with MinIO S3 storage, JDWP debugging, chaos testing, and optional feature stacks.

Core principle: All commands run via just. Kafka commands execute inside containers — use shortcuts or just bin from the host.

Supported node counts: 1, 3, 4, 5. Node numbering starts at 0.

Prerequisites

  • Docker & Docker Compose
  • justbrew install just
  • envsubstbrew install gettext (macOS)
  • JDK 17+ (for just start-build only)

Quick Start

# First time: build image + code, then start
just start-build        # single node
just start-build 3      # 3-node cluster

# After first build
just start              # single node
just start 3            # 3-node cluster
just start 3 tabletopic # 3-node + Iceberg table topic

Architecture

MinIO (S3 storage)
  └── mc (bucket init, one-shot)
        └── AutoMQ Nodes
              ├── node 0-2: controller + broker (KRaft voters)
              └── node 3+:  broker only

Node roles by count: a 4-node cluster has nodes 0-2 as controllers, node 3 as broker-only.

Optional feature stacks: tabletopic → Schema Registry + Iceberg REST Catalog, analytics → Spark + Trino.

Decision: Which start command to use?

Java code changed + Dockerfile changed → just start-build
Java code changed only                 → just restart-build
Config files changed only              → just restart
Nothing changed                        → just start [N]

Lifecycle

just start                      # Start (skip build), single node
just start 3 tabletopic         # Start 3-node + features
just start-build                # Build image & code, then start
just stop                       # Stop containers (keep data)
just clean                      # Stop + remove all volumes
just restart                    # Restart containers (no rebuild)
just restart-build              # Rebuild code & image, then restart
just status                     # Show container status
just logs                       # Last 100 lines of all nodes
just logs 0                     # Last 100 lines of node-0
just logs 0 500                 # Last 500 lines of node-0
just logs-follow                # Follow all logs (blocks)
just logs-follow 0              # Follow node-0 logs (blocks)
just shell                      # Enter node-0 shell (interactive)
just shell 1                    # Enter node-1 shell (interactive)
just exec 0 ls /opt/automq/bin  # Run a single command in node-0
just build-image                # Rebuild Docker image only
just wait                       # Wait until all nodes healthy (auto-called by start)

just logs outputs to stdout — pipe it freely:

just logs 0 | grep ERROR
just logs 0 500 | grep WARN
just logs 0 | head -20          # oldest entries
just logs 0 | tail -20          # newest entries

Topic Operations

just topic-list                                     # List all topics
just topic-create my-topic --partitions 16          # Create topic
just topic-describe my-topic                        # Describe topic

# Producer — reads from stdin, supports --node N and all Kafka producer flags
just produce my-topic                               # Interactive (type messages, Ctrl-D to end)
echo "hello" | just produce my-topic               # Pipe a single message
just produce my-topic --node 1                      # Produce via node-1

# Produce key:value messages
printf 'k1:v1\nk2:v2\n' | just produce my-topic \
  --property parse.key=true \
  --property "key.separator=:"

# Specify client-id (use --producer-property, not --client-id)
echo "hello" | just produce my-topic \
  --producer-property client.id=my-producer

# Consumer — supports --node N and all Kafka consumer flags
just consume my-topic --from-beginning              # Consume from beginning
just consume my-topic --node 1                      # Consume via node-1
just consume my-topic --max-messages 10             # Stop after 10 messages

# Print keys with custom separator (special characters passed safely)
just consume my-topic --from-beginning \
  --property print.key=true \
  --property "key.separator=:"

# Specify client-id (use --consumer-property, not --client-id)
just consume my-topic --from-beginning \
  --consumer-property client.id=my-consumer

Consumer Group Operations

just group-list                                     # List all consumer groups
just group-describe my-group                        # Show offsets and lag per partition
just group-members my-group                         # Show active members and assignments
just group-reset my-group my-topic --to-earliest    # Reset to earliest offset
just group-reset my-group my-topic --to-latest      # Reset to latest offset
just group-reset my-group my-topic --to-offset 100  # Reset to specific offset

Cluster Operations

just broker-list                # List brokers and supported API versions
just broker-config              # Describe all broker configs (cluster-wide)
just broker-config 0            # Describe broker 0 configs only

Bin (direct passthrough)

Run any Kafka bin/ script inside a node container. Commands run via docker exec, so localhost:9092 refers to the broker inside the container.

just bin kafka-topics.sh --bootstrap-server localhost:9092 --list
just bin kafka-configs.sh --bootstrap-server localhost:9092 --describe --entity-type brokers
just bin --node 1 kafka-broker-api-versions.sh --bootstrap-server localhost:9092

# One-shot shell command in a container
just exec 0 ls /opt/automq/bin
just exec 0 cat /config/node-0.properties

Perf

# -1 = unlimited throughput
just perf-produce my-topic --num-records 100000 --record-size 1024 --throughput -1
just perf-consume my-topic --messages 100000

Diagnostics

Arthas (Java runtime diagnostics)

just arthas         # Attach to node-0 Kafka process (interactive)
just arthas 1       # Attach to node-1 (interactive)

# Non-interactive: run a single Arthas command and exit
just arthas-exec 0 version
just arthas-exec 0 'dashboard -n 1'       # one snapshot of CPU/memory/threads
just arthas-exec 0 'thread -n 5'          # top 5 busy threads
just arthas-exec 0 'jvm'                  # JVM info (heap, GC, classloader)
just arthas-exec 0 'sc kafka.server.*'    # search loaded classes

# Note: streaming commands like watch/trace require active traffic to produce
# output and will block until -n limit is reached. Use just arthas for those.

JMX

JMX exposed on node-0 at port 9999 via jmxterm (pre-installed).

just jmx -e 'domains'                                                              # List all JMX domains
just jmx -e 'beans -d kafka.server'                                                # List beans in domain
just jmx -e 'info -b kafka.server:name=BrokerState,type=KafkaServer'              # View bean attributes
just jmx -e 'get -b kafka.server:name=BrokerState,type=KafkaServer Value'         # Broker state (3 = running)
just jmx -e 'get -b kafka.server:name=MessagesInPerSec,type=BrokerTopicMetrics Count'
just jmx --node 1 -e 'get -b kafka.server:name=BrokerState,type=KafkaServer Value'

JDK tools

just shell      # Enter node-0 shell — jstack, jmap, jcmd available
just shell 1    # Enter node-1 shell

To change JVM heap size, edit HEAP_OPTS in justfile (default: -Xms256m -Xmx256m).

IDEA Remote Debugging

JDWP is enabled on all nodes at startup. Port mapping: node-0 → 5005, node-1 → 5006, node-2 → 5007, ...

# In IDEA: Run → Edit Configurations → + → Remote JVM Debug → localhost:5005
just start
# Then attach debugger

Chaos Testing

Inject faults using tc netem (network delay/loss) and iptables (traffic blocking). NET_ADMIN capability is already set in docker-compose.yml.

Always run just chaos-reset when done — clears all tc rules, iptables rules, and unpauses MinIO.

Node network

just chaos-delay                    # 200ms delay on node-0
just chaos-delay 500 node-1         # 500ms delay on node-1
just chaos-loss                     # 5% packet loss on node-0
just chaos-loss 10 node-1           # 10% packet loss on node-1

S3 (MinIO) faults

# Targeted: only affects traffic between the specified node and MinIO
just chaos-s3-delay 500 node-0      # 500ms latency to S3 on node-0
just chaos-s3-loss 10 node-0        # 10% packet loss to S3 on node-0

# Global
just chaos-s3-down                  # Pause MinIO (S3 completely unavailable)
just chaos-s3-up                    # Resume MinIO

# Partition (iptables DROP)
just chaos-s3-partition node-1      # Isolate node-1 from S3
just chaos-s3-partition-reset       # Restore S3 connectivity only

Node-to-node partition

just chaos-partition node-0 node-1  # Bidirectional isolation (iptables DROP)
just chaos-partition-reset          # Restore node connectivity only

Reset & status

just chaos-reset                    # Remove ALL chaos rules (tc + iptables + unpause MinIO)
just chaos-status                   # Show active rules per node and MinIO state

Scenarios

S3 slow response

just chaos-s3-delay 500 node-0
just produce my-topic
just chaos-reset

S3 outage

just chaos-s3-down
just logs
just chaos-s3-up

Single node S3 partition

just chaos-s3-partition node-1
just logs 1
just chaos-s3-partition-reset

Controller network partition

just start 3
just chaos-partition node-0 node-1
just logs
just chaos-partition-reset

Combined faults

just start 3
just chaos-delay 200 node-0
just chaos-s3-loss 10 node-0
just produce my-topic
just chaos-reset

Features

Append feature names to just start. Multiple features can be combined.

just start 1 telemetry              # Single node + OTel metrics
just start 3 tabletopic             # 3-node + Iceberg table topic
just start 5 zerozone analytics     # 5-node + zone router + query engines
Feature What it enables Extra services
tabletopic Iceberg table topic, REST catalog, schema registry Schema Registry (:8081), Iceberg REST (:8181)
zerozone Zone router, auto AZ assignment (round-robin: az-0/1/2) — (config only)
telemetry OTel metrics export via OTLP HTTP — edit config/features/telemetry.properties to set collector endpoint before use — (config only)
analytics Spark + Trino for querying Iceberg tables Spark (:8888), Trino (:8090)

Namespace (Multi-Instance Isolation)

Run multiple DevKit instances on the same machine without port conflicts. Each namespace gets a unique COMPOSE_PROJECT_NAME and port offset.

Requires separate git worktrees — each worktree has its own devkit/ directory and .devkit/compose.env, so each instance is fully isolated. You cannot run multiple namespaces from the same devkit directory.

# In worktree A:
just set-ns lag-test 1    # ID=1 → ports offset by +100
just start 3

# In worktree B:
just set-ns perf 2        # ID=2 → ports offset by +200
just start

# Revert to default ports
just clear-ns
  • Name: 1-8 chars, must start with a letter, only [a-z0-9-]
  • ID: 1-9, determines port offset (ID * 100)
  • Config is stored in .devkit/compose.env (gitignored)

Port Allocation

Node Kafka Controller JDWP JMX
0 9092 9093 5005 9999
1 19092 19093 5006
2 29092 29093 5007
3 39092 39093 5008
4 49092 49093 5009

Other services:

Configuration

5-layer system — later layers override earlier ones. Each layer is a template rendered via envsubst before Docker starts. Final merged config: .devkit/config/node-N.properties.

Layer File Variables Purpose
1 config/defaults.properties S3 vars S3 endpoints, KRaft basics
2 config/role/server.properties or broker.properties S3 vars, NODE_ID Listeners, process roles
3 config/role/node.properties NODE_ID, CLUSTER_ID, VOTERS, BOOTSTRAP KRaft identity
4 config/features/*.properties All vars + AZ_NAME Feature-specific settings
5 config/custom.properties verbatim (no substitution) Highest priority overrides

Variables

S3 variables (defined in justfile, Layer 1/2/4):

Variable Default Description
S3_REGION us-east-1 S3 region
S3_ENDPOINT http://minio:9000 S3 endpoint URL
S3_DATA_BUCKET automq-data Data bucket
S3_OPS_BUCKET automq-ops Ops bucket
S3_PATH_STYLE true Path-style access (required for MinIO)
S3_ACCESS_KEY admin S3 access key (exported as KAFKA_S3_ACCESS_KEY to containers)
S3_SECRET_KEY password S3 secret key (exported as KAFKA_S3_SECRET_KEY to containers)

Per-node variables (computed per node, Layer 2/3/4):

Variable Example Description
NODE_ID 0, 1, 2 Node index (0-based)
CLUSTER_ID abc-123 Stable UUID, persisted in .devkit/cluster-id
VOTERS 0@node-0:9093,1@node-1:9093 KRaft controller voter list
BOOTSTRAP node-0:9093,node-1:9093 Controller bootstrap servers
AZ_NAME az-0, az-1, az-2 Round-robin by NODE_ID % 3 (used by zerozone)

Customization examples

Use real AWS S3 — edit justfile:

S3_REGION      := "ap-northeast-1"
S3_ENDPOINT    := "https://s3.ap-northeast-1.amazonaws.com"
S3_DATA_BUCKET := "my-automq-data"
S3_OPS_BUCKET  := "my-automq-ops"
S3_PATH_STYLE  := "false"
S3_ACCESS_KEY  := "AKIA..."
S3_SECRET_KEY  := "..."

Change listeners (e.g. SASL) — edit config/role/server.properties:

process.roles=broker,controller
listeners=SASL_PLAINTEXT://:9092,CONTROLLER://:9093
advertised.listeners=SASL_PLAINTEXT://node-${NODE_ID}:9092,CONTROLLER://node-${NODE_ID}:9093

Add broker settings — create config/custom.properties (gitignored):

num.partitions=4
log.retention.hours=24

Add a new feature — create config/features/my-feature.properties:

my.setting=value
my.s3.path=s3://${S3_DATA_BUCKET}/my-prefix
my.node.label=node-${NODE_ID}
just start 1 my-feature