fix: replace removed bitnami/kafka with apache/kafka, production-harden deployment #38

tusharkhatriofficial merged 1 commit into main
Conversation
- Switch Kafka image from bitnami/kafka:3.7 (removed from Docker Hub) to apache/kafka:latest (official KRaft image)
- Convert Bitnami KAFKA_CFG_* env vars to native KAFKA_* format
- Rewrite Dockerfile as 3-stage build: Node (dashboard) → Maven (JAR) → JRE runtime; serves both API and dashboard on :8080
- Add .dockerignore to exclude .git, node_modules, ai-docs from builds
- Use eclipse-temurin:21-jre instead of jdk (saves ~300MB)
- Run app as non-root user (eventara)
- Add resource limits to prod compose (postgres 1G, kafka 1G, redis 512M, eventara 1G)
- Add healthcheck start_period for slower VPS environments
- Remove SPRING_JPA_HIBERNATE_DDL_AUTO override that conflicted with Flyway in dev compose
- Rewrite DEPLOYMENT.md with Coolify, DigitalOcean, and VPS guides
- Add EVENTARA_PORT and JAVA_OPTS to .env.example
Key deployment hardening items are currently undermined by (1) apache/kafka:latest being unpinned and (2) Compose deploy.resources.limits often being ignored outside Swarm. The container’s JAVA_OPTS setting is also not applied by the Dockerfile entrypoint, so JVM tuning guidance won’t work. .dockerignore is overly aggressive (notably ignoring Dockerfile), which can break remote/CI builders that rely on the build context.
Additional notes (2)
- Compatibility | `docker-compose.prod.yaml:18`

  `deploy.resources.limits` in `docker-compose.prod.yaml` is ignored by `docker compose up` outside of Swarm mode for many Docker setups. That means the documented "memory limits" may not actually apply on the target VPS, defeating the stated hardening.

  If your intent is to enforce limits in standard Compose, you should use Compose-supported settings (e.g., `mem_limit`) or document clearly that limits require Swarm / a compatible implementation (see the compose sketch after these notes).
- Maintainability | `Dockerfile:1-8`

  The dashboard build stage only copies `eventara-dashboard/package*.json` before `npm ci`. If the dashboard uses `package-lock.json` (good) this is fine, but if it uses `pnpm-lock.yaml`/`yarn.lock` or npm workspaces, the caching/install may be incorrect. Also, `npm ci --production=false` is an odd flag; `npm ci` already installs devDependencies by default unless `NODE_ENV=production` or `--omit=dev` is used.

  This isn't necessarily broken, but it's brittle and a bit confusing in a "production hardened" Dockerfile (a simplified stage sketch follows these notes).
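For illustration, a minimal sketch of the `mem_limit` alternative mentioned in the compatibility note; the service names and values mirror the limits table later in this PR and are assumptions about the actual `docker-compose.prod.yaml`:

```yaml
# Sketch only: per-service memory caps that plain `docker compose up` honors,
# unlike deploy.resources.limits, which generally requires Swarm.
services:
  postgres:
    mem_limit: 1g
  kafka:
    mem_limit: 1g
  redis:
    mem_limit: 512m
  eventara:
    mem_limit: 1g
```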
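And for the maintainability note, a simplified dashboard stage assuming the project really does use npm with a committed `package-lock.json` (paths follow the Dockerfile quoted further down in this PR):

```dockerfile
# Sketch: install strictly from the lockfile; npm ci already includes
# devDependencies by default, so --production=false is unnecessary.
FROM node:20-alpine AS dashboard-build
WORKDIR /app/eventara-dashboard
COPY eventara-dashboard/package.json eventara-dashboard/package-lock.json ./
RUN npm ci
COPY eventara-dashboard/ .
RUN npm run build
```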
Summary of changes
- Added a `.dockerignore` to reduce the Docker build context (ignores `node_modules`, `.git`, docs, compose files, etc.).
- Updated `.env.example` to reflect the Kafka KRaft `CLUSTER_ID` and documented `EVENTARA_PORT` + `JAVA_OPTS`.
- Rewrote `DEPLOYMENT.md` with updated guidance (Coolify + VPS steps), resource limits, and an env var table.
- Replaced `bitnami/kafka:3.7` with `apache/kafka:latest` in both compose files and migrated Bitnami `KAFKA_CFG_*` env vars to native `KAFKA_*`.
- Hardened `docker-compose.prod.yaml` with `start_period` on healthchecks, memory limits, a configurable app port, and `JAVA_OPTS`.
- Reworked the `Dockerfile` into a 3-stage build (dashboard → backend → JRE runtime); the container now runs as a non-root user.
- Minor README tweak to remove the pinned Kafka version string.
   kafka:
-    image: bitnami/kafka:3.7
+    image: apache/kafka:latest
     restart: unless-stopped
     environment:
       ALLOW_PLAINTEXT_LISTENER: "yes"
-      KAFKA_KRAFT_CLUSTER_ID: ${KAFKA_KRAFT_CLUSTER_ID:-NvDmnaWzQgiH8qbnraqxcg}
+      CLUSTER_ID: ${CLUSTER_ID:-NvDmnaWzQgiH8qbnraqxcg}

-      KAFKA_CFG_NODE_ID: 1
-      KAFKA_CFG_PROCESS_ROLES: controller,broker
-      KAFKA_CFG_CONTROLLER_QUORUM_VOTERS: 1@kafka:9094
-      KAFKA_CFG_CONTROLLER_LISTENER_NAMES: CONTROLLER
+      KAFKA_NODE_ID: 1
+      KAFKA_PROCESS_ROLES: controller,broker
+      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@kafka:9094
+      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER

       # Internal-only listener for production (no host port published)
-      KAFKA_CFG_LISTENERS: INTERNAL://:9092,CONTROLLER://:9094
-      KAFKA_CFG_ADVERTISED_LISTENERS: INTERNAL://kafka:9092
-      KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP: INTERNAL:PLAINTEXT,CONTROLLER:PLAINTEXT
-      KAFKA_CFG_INTER_BROKER_LISTENER_NAME: INTERNAL
+      KAFKA_LISTENERS: INTERNAL://:9092,CONTROLLER://:9094
+      KAFKA_ADVERTISED_LISTENERS: INTERNAL://kafka:9092
+      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: INTERNAL:PLAINTEXT,CONTROLLER:PLAINTEXT
+      KAFKA_INTER_BROKER_LISTENER_NAME: INTERNAL

-      KAFKA_CFG_AUTO_CREATE_TOPICS_ENABLE: "true"
-      KAFKA_CFG_NUM_PARTITIONS: 3
-      KAFKA_CFG_DEFAULT_REPLICATION_FACTOR: 1
+      KAFKA_AUTO_CREATE_TOPICS_ENABLE: "true"
+      KAFKA_NUM_PARTITIONS: 3
+      KAFKA_DEFAULT_REPLICATION_FACTOR: 1

-      KAFKA_CFG_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
-      KAFKA_CFG_TRANSACTION_STATE_LOG_MIN_ISR: 1
-      KAFKA_CFG_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
-      KAFKA_CFG_GROUP_INITIAL_REBALANCE_DELAY_MS: 0
+      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
+      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
+      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
+      KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0
     volumes:
-      - kafka_data:/bitnami/kafka
+      - kafka_data:/opt/kafka/kafka-logs
Using apache/kafka:latest is a reproducibility and stability risk. A new upstream release can change defaults, KRaft metadata behavior, or paths, and your deployment could break on the next docker compose up.
Given this PR’s goal is to “production-harden” deployment, the image should be pinned to a specific Kafka version (or at least a major/minor tag) and ideally to a digest.
Suggestion
Pin the Kafka image to a known-good version and (optionally) a digest, e.g.:
- `image: apache/kafka:3.7.2` (or whatever you validated)
- or `image: apache/kafka@sha256:<digest>`
Do the same in docker-compose.yaml.
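A minimal sketch of what the pinned service definition could look like; the tag shown here is a placeholder and should be whatever version was actually validated:

```yaml
services:
  kafka:
    # Pin to an explicit release (optionally by digest) instead of :latest so a
    # new upstream push cannot silently change behavior on the next compose up.
    image: apache/kafka:3.7.2   # placeholder tag, or apache/kafka@sha256:<digest>
    restart: unless-stopped
```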
Reply with "@CharlieHelps yes please" if you'd like me to add a commit pinning the image tag in both compose files.
EXPOSE 8080
ENTRYPOINT ["java", "-jar", "app.jar"]
JAVA_OPTS is set in the container environment, but the Dockerfile ENTRYPOINT never consumes it (it always runs java -jar app.jar). Unless your base image provides a wrapper that reads JAVA_OPTS (Temurin images typically do not), this variable will be ignored and the memory tuning guidance in .env.example/docs won’t work.
This is a functional issue: users will think memory limits are applied when they aren’t.
Suggestion
Update the entrypoint to actually apply JAVA_OPTS. Common patterns:
- Shell form: `ENTRYPOINT ["sh", "-c", "exec java $JAVA_OPTS -jar app.jar"]`
- Or keep exec form by using CMD: `ENTRYPOINT ["sh", "-c"]` plus `CMD ["exec java $JAVA_OPTS -jar app.jar"]`
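For context, the shell-form variant dropped into the runtime stage might look like this sketch (it assumes the final stage already copies the boot JAR to `app.jar`, as the current ENTRYPOINT implies):

```dockerfile
# Sketch: a shell wrapper so $JAVA_OPTS from the container environment is
# expanded; `exec` keeps java as PID 1 so it still receives SIGTERM on stop.
EXPOSE 8080
ENTRYPOINT ["sh", "-c", "exec java $JAVA_OPTS -jar app.jar"]
```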
Reply with "@CharlieHelps yes please" if you'd like me to add a commit implementing this change safely (including quoting/exec).
# ===== Stage 1: Build the dashboard =====
FROM node:20-alpine AS dashboard-build
WORKDIR /app/eventara-dashboard
COPY eventara-dashboard/package*.json ./
RUN npm ci --production=false
COPY eventara-dashboard/ .
# Vite builds to ../src/main/resources/static (relative to eventara-dashboard)
RUN npm run build

# ===== Stage 2: Build the Spring Boot app =====
FROM maven:3.9.9-eclipse-temurin-21 AS backend-build
WORKDIR /app
COPY pom.xml .
COPY src ./src
# Copy dashboard build output into Spring Boot static resources
COPY --from=dashboard-build /app/src/main/resources/static ./src/main/resources/static/
The Dockerfile comment says Vite builds to ../src/main/resources/static relative to eventara-dashboard, but the actual copy step pulls from /app/src/main/resources/static (i.e., outside /app/eventara-dashboard). That only works if the Vite config truly outputs to /app/src/main/resources/static inside the container.
Given the build stage WORKDIR is /app/eventara-dashboard, a common output would be /app/eventara-dashboard/dist. If the Vite output path ever changes, the Docker build will silently produce an image without the dashboard.
This is fragile because it relies on a non-standard output directory and a hard-coded cross-directory path.
Suggestion
Make the dashboard artifact path explicit and robust:
- Prefer building to the default `dist/` and then copying into backend resources: `COPY --from=dashboard-build /app/eventara-dashboard/dist ./src/main/resources/static/`

If you intentionally build directly into `../src/main/resources/static`, add a quick sanity check to fail fast: `RUN test -f /app/src/main/resources/static/index.html`
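Taken together, the backend stage might then look like the sketch below, assuming Vite's default `dist/` output; the `index.html` check is just a fail-fast guard and can target whatever entry file the build actually emits:

```dockerfile
# Sketch: copy the dashboard from the conventional Vite output directory and
# fail the image build immediately if the bundle is missing.
FROM maven:3.9.9-eclipse-temurin-21 AS backend-build
WORKDIR /app
COPY pom.xml .
COPY src ./src
COPY --from=dashboard-build /app/eventara-dashboard/dist ./src/main/resources/static/
RUN test -f src/main/resources/static/index.html
# (the Maven package step continues as in the existing Dockerfile)
```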
Reply with "@CharlieHelps yes please" if you'd like me to add a commit that switches to copying from dist/ (or adds the fail-fast check).
   # Required for Kafka KRaft mode. Do not change after first boot unless you wipe the kafka volume.
-  KAFKA_KRAFT_CLUSTER_ID=NvDmnaWzQgiH8qbnraqxcg
+  CLUSTER_ID=NvDmnaWzQgiH8qbnraqxcg
You renamed KAFKA_KRAFT_CLUSTER_ID → CLUSTER_ID in .env.example and compose. That’s fine for the new image, but it’s a breaking change for anyone with existing .env files from previous versions. The docs mention stability, but there’s no explicit migration note that users must rename the variable (or they’ll silently fall back to the default).
Given the importance of cluster IDs, silently defaulting could accidentally create a new cluster metadata state when users expected their old one.
Suggestion
Add a short migration note in DEPLOYMENT.md (or README) such as:
- "If upgrading from older versions, rename `KAFKA_KRAFT_CLUSTER_ID` to `CLUSTER_ID` in your `.env`."
- Optionally support both temporarily in compose: `CLUSTER_ID: ${CLUSTER_ID:-${KAFKA_KRAFT_CLUSTER_ID:-NvDm...}}`
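In the kafka service's environment block, that backward-compatible fallback could look like the sketch below (it uses the default cluster ID from the compose diff above; nested defaults need a reasonably recent Compose version):

```yaml
environment:
  # Prefer the new CLUSTER_ID, fall back to the legacy KAFKA_KRAFT_CLUSTER_ID,
  # and only then to the baked-in default, so existing .env files keep working.
  CLUSTER_ID: ${CLUSTER_ID:-${KAFKA_KRAFT_CLUSTER_ID:-NvDmnaWzQgiH8qbnraqxcg}}
```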
Reply with "@CharlieHelps yes please" if you'd like me to add a commit implementing backward-compatible env var fallback plus the doc note.
## Resource limits

The prod compose file sets memory limits:

| Service | Memory Limit |
|---|---|
| Postgres | 1 GB |
| Kafka | 1 GB |
| Redis | 512 MB |
| Eventara (API + Dashboard) | 1 GB |

Total: ~3.5 GB. A 4 GB VPS works for demos and light use.
The prod docs state the compose file "sets memory limits" totaling ~3.5GB. Given deploy.resources often won’t apply in standard Docker Compose (non-Swarm), this section is potentially misleading and could cause operators to size their VPS incorrectly.
This should be aligned with the actual behavior of the compose file.
Suggestion
Clarify how limits are enforced:
- Add a note like: "`deploy.resources` is applied in Swarm; for plain `docker compose up` limits may not be enforced."
- If you adopt `mem_limit` (see compose suggestion), update the docs to say limits are enforced by Docker.
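If useful, one possible wording for the DEPLOYMENT.md note (a sketch only; adjust once the enforcement mechanism in the compose file is settled):

```markdown
> **Note:** The limits above are declared via `deploy.resources`, which is only
> guaranteed to be enforced in Swarm mode. With plain `docker compose up` they
> may be informational only; verify enforcement on your target host.
```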
Reply with "@CharlieHelps yes please" if you'd like me to add a commit updating DEPLOYMENT.md to accurately describe limit enforcement.