All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- SASL/GSSAPI (Kerberos) authentication via the `gssapi` Cargo feature. Default builds remain Kerberos-free; opt in with `cargo build --features gssapi -p kafka-backup-cli`. State machine and credential hints adapted from @kthimjo's PR #95. Thank you.
  - New `SaslMechanism::Gssapi` enum variant.
  - New optional `SecurityConfig` fields: `sasl_kerberos_service_name`, `sasl_keytab_path`, `sasl_krb5_config_path`.
  - `GssapiPlugin` implements RFC 4752 Phase 1 multi-round `gss_init_sec_context`, Phase 1→2 turnaround, Phase 2 `layer = 0x01` (no security layer, no size) wrap/unwrap, and KIP-368 re-authentication via fresh-context rebuild.
  - `GssapiPluginFactory`: constructed from the operator-provided keytab + krb5.conf + service name, validated eagerly at config time. The factory binds the SPN hostname at `.build()` time (see Factory extension point below), so each per-broker `KafkaClient` authenticates against the correct per-broker SPN (`kafka/brokerN.fqdn@REALM`) on multi-broker clusters.
  - Process-wide `KRB5_ENV_LOCK: tokio::sync::Mutex<()>` serialises `KRB5_CLIENT_KTNAME` / `KRB5_CONFIG` / `KRB5CCNAME` mutation during credential acquisition, eliminating the multi-client env-var race inherent to `libgssapi 0.9`.
  - When a keytab is configured, `GssapiPlugin` isolates its credential cache via `KRB5CCNAME=MEMORY:<ptr>`. This prevents stale tickets in the OS default ccache (common on macOS `API:<uuid>` caches) from being preferred over a fresh TGT from the keytab, a failure mode that surfaces as a cryptic broker-side `Authentication failed due to invalid credentials`.
- Factory extension point: `SecurityConfig.sasl_mechanism_plugin_factory: Option<SaslMechanismPluginFactoryHandle>` replaces the prior `sasl_mechanism_plugin: Option<SaslMechanismPluginHandle>` (both introduced on this branch; neither has shipped). `KafkaClient::authenticate` calls `factory.build(broker_host, broker_port)` once per connection, receiving the endpoint from `bootstrap_servers[0]`, which `PartitionLeaderRouter` has already rewritten to the advertised per-broker `host:port` before spawning pooled clients. This fixes one correctness bug and removes a latent one:
  1. Multi-broker GSSAPI SPN (fixed). Non-bootstrap brokers now authenticate against their own SPN (`kafka/brokerN.fqdn@REALM`) rather than the bootstrap host's, which is the standard librdkafka / JVM-client behaviour.
  2. Per-connection GSSAPI state (removed as a latent risk). Each pooled `KafkaClient` now owns its own `GssapiPlugin` and its own `ClientCtx`. A shared plugin across the pool would have been a concurrency hazard even if it has not produced a visible failure in the current test matrix.
  - `SharedPluginFactory`: convenience wrapper for stateless mechanisms (PLAIN, OAUTHBEARER with a shared token provider); returns the same Arc from every `build` call.
  - New `SaslPluginError::FactoryFailed { mechanism, source }` variant for clean error surfaces at build time.
- `SaslMechanismPlugin::supports_reauth()` capability flag: default `true` (PLAIN, SCRAM, OAUTHBEARER continue to schedule KIP-368 live re-auth); `GssapiPlugin` overrides to `false`. Apache Kafka does not support live re-authentication for GSSAPI: Kerberos GSS-API contexts are bound to the wire connection, and the broker rejects in-place `SaslAuthenticate` after the initial handshake. Matches librdkafka: treat the broker-advertised `session_lifetime_ms` as a drain-and-reconnect timer rather than firing a reauth the broker will reject. With the plugin opting out, `KafkaClient::authenticate` skips `spawn_reauth_task` entirely; the session expires naturally and the next RPC reconnects through the normal auth path.
- CLI plumbing: new flags `--sasl-mechanism`, `--sasl-keytab`, `--sasl-krb5-config`, `--sasl-kerberos-service-name` on `offset-reset`, `offset-reset-bulk`, and `offset-rollback` commands. YAML configs auto-wire a `GssapiPluginFactory` when `sasl_mechanism: GSSAPI` is set. A runtime error surfaces if the CLI was built without `--features gssapi`.
- Deduplicated CLI security-args parsing via `commands/security_args.rs` (`#[derive(clap::Args)] SecurityCliArgs`), which removes three copies of the prior `parse_security_config` helper.
- Docker test fixture at `tests/sasl-gssapi-test-infra/`: self-hosted MIT KDC (`Dockerfile.kdc`), Apache Kafka 7.7.0 configured for `SASL_PLAINTEXT://kafka.test.local:9098` with `GSSAPI` enabled, realm `TEST.LOCAL`, keytab auto-generation with healthcheck gate.
- Three `#[ignore]` E2E tests: keytab happy-path, missing-keytab clear error, KIP-368 reauth fires within broker's 60 s window (`crates/kafka-backup-core/tests/integration_suite/sasl_gssapi_tests.rs`).
- Full backup → restore roundtrip E2E test over GSSAPI (`sasl_gssapi_backup_restore_roundtrip`): produces records, drives `BackupEngine` + `RestoreEngine` with topic remap, consumes from the restored topic and asserts record count + payload. Runs at the default `connections_per_broker: 4` now that each pooled connection owns its own `GssapiPlugin` via the factory.
- Factory-dispatch regression test (`sasl_plugin_mock_tests::factory_receives_per_broker_endpoint`): a `CapturingFactory` asserts `build(host, port)` is called exactly once per `KafkaClient` with the endpoint from `bootstrap_servers[0]`. No Docker; uses the in-process `MockKafkaBroker` fixture.
- Pool-isolation regression test (`sasl_plugin_mock_tests::pool_produces_distinct_plugin_per_kafkaclient`): N=3 separate `MockKafkaBroker` instances, N `KafkaClient`s sharing one `SaslMechanismPluginFactory`; asserts the factory is invoked once per client with the correct endpoint and returns a pointer-distinct plugin Arc each time. Turns item 2 above ("removed as a latent risk") into a tested guarantee.
- Scheduler-opt-out regression test (`sasl_plugin_mock_tests::reauth_scheduler_not_spawned_when_plugin_opts_out`): a plugin returning `supports_reauth() = false` connects against a mock that advertises `session_lifetime_ms: 60_000`; virtual time is advanced past the 80 % reauth deadline; the test asserts `reauth_payload` is never called and the mock sees exactly one `SaslAuthenticate` frame.
- Example YAML configs for operators: `config/gssapi-backup.yaml` and `config/gssapi-restore.yaml`, driving the release binary end-to-end against the fixture.
- Release-binary CLI smoke script at `tests/sasl-gssapi-test-infra/run-cli-smoke.sh`: builds `--release --features gssapi` and exercises `kafka-backup backup` and `kafka-backup restore` against the fixture, asserting exit codes, manifest existence, and restored record count.
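
The factory extension point listed above can be pictured roughly as follows. This is a minimal sketch, not the crate's code: the trait shapes, `PerBrokerFactory`, `PerBrokerPlugin`, and `SharedFactory` are hypothetical stand-ins, and the real signatures in `kafka-backup-core` may differ. It shows a factory that builds a distinct plugin per broker endpoint (the GSSAPI case) next to one that returns the same Arc on every call (the stateless case).

```rust
use std::sync::Arc;

// Hypothetical stand-ins for the crate's plugin and factory traits.
trait SaslMechanismPlugin: Send + Sync {
    /// Service principal this plugin instance authenticates against.
    fn target(&self) -> String;
}

trait SaslMechanismPluginFactory: Send + Sync {
    /// Called once per connection with the advertised broker endpoint.
    fn build(&self, broker_host: &str, broker_port: u16) -> Arc<dyn SaslMechanismPlugin>;
}

// Stateful mechanism: each connection gets its own plugin bound to that broker's SPN.
struct PerBrokerPlugin {
    spn: String,
}

impl SaslMechanismPlugin for PerBrokerPlugin {
    fn target(&self) -> String {
        self.spn.clone()
    }
}

struct PerBrokerFactory {
    service_name: String,
    realm: String,
}

impl SaslMechanismPluginFactory for PerBrokerFactory {
    fn build(&self, broker_host: &str, _broker_port: u16) -> Arc<dyn SaslMechanismPlugin> {
        Arc::new(PerBrokerPlugin {
            spn: format!("{}/{}@{}", self.service_name, broker_host, self.realm),
        })
    }
}

// Stateless mechanism: hand back the same Arc on every build call.
struct SharedFactory {
    plugin: Arc<dyn SaslMechanismPlugin>,
}

impl SaslMechanismPluginFactory for SharedFactory {
    fn build(&self, _broker_host: &str, _broker_port: u16) -> Arc<dyn SaslMechanismPlugin> {
        Arc::clone(&self.plugin)
    }
}

fn main() {
    let factory = PerBrokerFactory {
        service_name: "kafka".to_string(),
        realm: "TEST.LOCAL".to_string(),
    };
    let a = factory.build("broker1.test.local", 9098);
    let b = factory.build("broker2.test.local", 9098);
    println!("{} / {}", a.target(), b.target());

    let shared = SharedFactory { plugin: a };
    println!("{}", shared.build("broker3.test.local", 9098).target());
}
```

Keeping the endpoint as an argument to `build` rather than to the factory constructor is what lets one factory serve every pooled connection while still producing per-broker state.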
- `gssapi` feature links against MIT krb5 at build time. Install:
  - macOS: `brew install krb5` + export `PKG_CONFIG_PATH="$(brew --prefix krb5)/lib/pkgconfig:…"` (Apple's bundled Heimdal does not expose the symbols `libgssapi 0.9` requires).
  - Debian/Ubuntu: `apt-get install libkrb5-dev`.
  - Fedora/RHEL: `dnf install krb5-devel`.
- Apache Kafka does not support live KIP-368 re-authentication for the GSSAPI mechanism: Kerberos GSS-API contexts are bound to the wire connection and the broker rejects in-place `SaslAuthenticate` after the initial handshake. `GssapiPlugin::supports_reauth()` returns `false`, so the client no longer schedules a reauth task for GSSAPI connections; the broker-advertised `session_lifetime_ms` is treated as a drain-and-reconnect window, matching librdkafka and the JVM client behaviour. The connection lives out its session and the next RPC transparently reconnects through the normal auth path.
- The mock-broker test proves the factory contract (`build` is called with the correct endpoint per `KafkaClient`). A multi-broker Docker GSSAPI fixture that exercises distinct per-broker SPNs end-to-end is a planned follow-up.
- Release binaries and the default Docker image do not include GSSAPI. Build your own image with `--build-arg FEATURES=gssapi` once the downstream image ships that arg.
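
The re-authentication opt-out described in the first item above amounts to a defaulted trait method plus a guard before the scheduler is spawned. The sketch below is illustrative only: the trait is a simplified, hypothetical version of `SaslMechanismPlugin`, and `should_schedule_reauth` is an invented helper name, not a function from the crate.

```rust
// Hypothetical, simplified trait; only the capability flag is modelled here.
trait SaslMechanismPlugin {
    fn mechanism(&self) -> &'static str;

    /// Defaults to true so existing mechanisms keep scheduling re-authentication.
    fn supports_reauth(&self) -> bool {
        true
    }
}

struct PlainPlugin;

impl SaslMechanismPlugin for PlainPlugin {
    fn mechanism(&self) -> &'static str {
        "PLAIN"
    }
}

struct GssapiPlugin;

impl SaslMechanismPlugin for GssapiPlugin {
    fn mechanism(&self) -> &'static str {
        "GSSAPI"
    }

    /// GSSAPI opts out: the session is left to expire and the client reconnects.
    fn supports_reauth(&self) -> bool {
        false
    }
}

/// Decide whether a re-auth task should be spawned after the handshake
/// (hypothetical helper; the crate's own guard may look different).
fn should_schedule_reauth(plugin: &dyn SaslMechanismPlugin, session_lifetime_ms: i64) -> bool {
    plugin.supports_reauth() && session_lifetime_ms > 0
}

fn main() {
    let plugins: [&dyn SaslMechanismPlugin; 2] = [&PlainPlugin, &GssapiPlugin];
    for plugin in plugins {
        println!(
            "{}: schedule reauth = {}",
            plugin.mechanism(),
            should_schedule_reauth(plugin, 60_000)
        );
    }
}
```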
- Pluggable SASL mechanism extension point (`SaslMechanismPlugin` trait): lets downstream crates implement OAUTHBEARER, MSK IAM, or custom SASL mechanisms without forking `kafka-backup-core`.
  - Handshake + single- or multi-round `SaslAuthenticate` dispatch.
  - KIP-368 re-authentication scheduler: spawns a task post-handshake when the broker advertises `session_lifetime_ms > 0`; fires `reauth_payload` at 80 % of the advertised lifetime with a 30 s minimum floor and ±5 s jitter.
  - Default `interpret_server_error` handles both RFC 7628 JSON and Apache Kafka 3.5+ free-form `error_message` bytes.
  - New field `SecurityConfig.sasl_mechanism_plugin: Option<Arc<dyn SaslMechanismPlugin>>` (marked `#[serde(skip)]`; programmatic wiring only, no YAML surface).
- 14 unit tests + 4 integration tests exercising single-round, multi-round, server-error, and scheduler paths against an in-process Kafka-wire mock (no Docker required).
- `#[ignore]` E2E test against Confluent cp-kafka 7.7.0 configured for SASL_PLAINTEXT + OAUTHBEARER with the bundled unsecured-JWS validator. Fixture: `tests/sasl-oauth-test-infra/`.
- Example: `examples/custom_sasl_plugin.rs`, a minimal static-token OAUTHBEARER plugin (reference implementation).
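
The scheduler timing described above (80 % of the advertised lifetime, 30 s floor, ±5 s jitter) can be sketched as a pure function. Several things here are assumptions rather than the crate's behaviour: the name `reauth_delay`, passing the jitter in as a parameter so the example stays deterministic, and applying the jitter before the floor.

```rust
use std::time::Duration;

const REAUTH_PERCENT: i128 = 80;
const MIN_DELAY: Duration = Duration::from_secs(30);
const MAX_JITTER_MS: i64 = 5_000;

/// Delay before firing re-authentication, or None when the broker
/// advertises no session lifetime (hypothetical helper).
fn reauth_delay(session_lifetime_ms: i64, jitter_ms: i64) -> Option<Duration> {
    if session_lifetime_ms <= 0 {
        return None;
    }
    // 80 % of the advertised lifetime; i128 avoids overflow on very large inputs.
    let base_ms = (session_lifetime_ms as i128 * REAUTH_PERCENT / 100) as i64;
    // Keep the caller-supplied jitter within ±5 s.
    let jitter = jitter_ms.clamp(-MAX_JITTER_MS, MAX_JITTER_MS);
    let jittered_ms = base_ms.saturating_add(jitter).max(0) as u64;
    // Assumption: the 30 s floor is applied after jitter.
    Some(Duration::from_millis(jittered_ms).max(MIN_DELAY))
}

fn main() {
    println!("{:?}", reauth_delay(3_600_000, -1_500));
    println!("{:?}", reauth_delay(10_000, 0));
    println!("{:?}", reauth_delay(0, 0));
}
```

A real scheduler would draw the jitter from a random source; taking it as an argument keeps the arithmetic easy to test.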
- SASL dispatch in `KafkaClient` unified: the four duplicated `sasl_{plain,scram}_auth{,_raw}` methods collapse into a single dispatch function called by both initial-connect and reconnect. Behaviour for existing `PLAIN` / `SCRAM-SHA-256` / `SCRAM-SHA-512` configurations is unchanged.
- Incremental one-shot backups now work: offset tracking was previously gated on `continuous: true`, making one-shot and snapshot backups always start from `earliest`. Now, adding `offset_storage` to the config enables resume-from-last-offset in any backup mode.
- Unit tests for `merge_manifests()` function (previously untested)
- Integration test for incremental one-shot backup resume behavior
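
The one-shot resume fix noted above comes down to dropping the `continuous` condition from the start-position decision. The following is a rough illustration under stated assumptions: `StartPosition`, `BackupOptions`, and both functions are hypothetical names, and the stored value is assumed to be the next offset to read.

```rust
/// Where a backup run starts reading a partition.
#[derive(Debug, PartialEq)]
enum StartPosition {
    Earliest,
    ResumeAt(i64),
}

/// Hypothetical, trimmed-down view of the relevant config switches.
struct BackupOptions {
    continuous: bool,
    offset_storage_configured: bool,
}

/// Previous behaviour: stored offsets were consulted only in continuous mode.
fn start_position_before(opts: &BackupOptions, stored_next_offset: Option<i64>) -> StartPosition {
    match stored_next_offset {
        Some(next) if opts.continuous && opts.offset_storage_configured => {
            StartPosition::ResumeAt(next)
        }
        _ => StartPosition::Earliest,
    }
}

/// Fixed behaviour: any backup mode resumes once offset storage is configured.
fn start_position_after(opts: &BackupOptions, stored_next_offset: Option<i64>) -> StartPosition {
    match stored_next_offset {
        Some(next) if opts.offset_storage_configured => StartPosition::ResumeAt(next),
        _ => StartPosition::Earliest,
    }
}

fn main() {
    let one_shot = BackupOptions {
        continuous: false,
        offset_storage_configured: true,
    };
    println!("before: {:?}", start_position_before(&one_shot, Some(1_200)));
    println!("after:  {:?}", start_position_after(&one_shot, Some(1_200)));
}
```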
0.5.0 - 2026-01-17
- Prometheus/OpenMetrics metrics support (#9)
  - Consumer lag tracking per topic/partition (`kafka_backup_lag_records`)
  - Records and bytes throughput counters (`kafka_backup_records_total`, `kafka_backup_bytes_total`)
  - Compression ratio gauge (`kafka_backup_compression_ratio`)
  - Storage write latency histogram (`kafka_backup_storage_write_latency_seconds`)
  - Storage I/O bytes counter (`kafka_backup_storage_write_bytes_total`)
  - Error counting by type (`kafka_backup_errors_total`)
- HTTP metrics server with `/metrics` endpoint (default port 8080)
- `/health` endpoint for liveness checks
- New `metrics` configuration section in config file
- `MetricsServerConfig::new()` constructor for programmatic configuration
- Breaking: Added `metrics: Option<MetricsConfig>` field to `Config` struct
  - Existing code constructing `Config` with struct literals must add `metrics: None`
  - YAML configs are unaffected (field is optional with serde default)
- Marked `MetricsConfig` as `#[non_exhaustive]` to prevent future breaking changes
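
For code affected by the breaking change above, the migration is a one-field addition to each struct literal. The sketch below uses trimmed-down, hypothetical stand-ins for `Config` and `MetricsConfig` (the real structs have other fields, and `backup_id`, `enabled`, `port`, and `metrics_port` are illustrative names only).

```rust
// Hypothetical stand-ins; field names other than `metrics` are assumptions.
// In the real crate MetricsConfig is #[non_exhaustive], so downstream code
// cannot build it with a struct literal the way this single-file sketch does.
#[derive(Debug)]
struct MetricsConfig {
    enabled: bool,
    port: u16,
}

#[derive(Debug)]
struct Config {
    backup_id: String,
    /// New optional field: struct literals must now name it explicitly.
    metrics: Option<MetricsConfig>,
}

/// Port the metrics server would listen on, if metrics are configured and enabled.
fn metrics_port(config: &Config) -> Option<u16> {
    config
        .metrics
        .as_ref()
        .filter(|m| m.enabled)
        .map(|m| m.port)
}

fn main() {
    // Minimal migration: add `metrics: None` to keep the previous behaviour.
    let without_metrics = Config {
        backup_id: "nightly".to_string(),
        metrics: None,
    };

    let with_metrics = Config {
        backup_id: "nightly".to_string(),
        metrics: Some(MetricsConfig { enabled: true, port: 8080 }),
    };

    println!("{} -> {:?}", without_metrics.backup_id, metrics_port(&without_metrics));
    println!("{} -> {:?}", with_metrics.backup_id, metrics_port(&with_metrics));
}
```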
- Added Metrics & Monitoring section to README
- Full metrics reference available at kafka-backup-docs
- Monitoring stack (Prometheus + Grafana) available in kafka-backup-demos
0.4.0 - 2026-01-09
- TLS/SSL support for custom CA certificates (`ssl_ca_location`)
- Mutual TLS (mTLS) authentication with client certificates (`ssl_certificate_location`, `ssl_key_location`)
- TLS test infrastructure with Docker Compose for integration testing
- Comprehensive TLS documentation in configuration guide
- Breaking: Fixed TLS certificate configuration being ignored (#3)
  - Previously, `ssl_ca_location`, `ssl_certificate_location`, and `ssl_key_location` were parsed but never used
  - Connections to Kafka with self-signed or internal CA certificates now work correctly
  - Added new error variants to `KafkaError`: `TlsConfig`, `CertificateLoad`, `PrivateKeyLoad`
- Breaking: Added new variants to `KafkaError` enum. Code that exhaustively matches on this enum without a wildcard will need updating.
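
One way to absorb the new variants is a wildcard arm in downstream matches. The enum below is a hypothetical subset of `KafkaError`: the three TLS variant names come from this changelog, while the `Connection` variant, the `String` payloads, and `describe` are assumptions made for the example.

```rust
// Hypothetical subset of the crate's error enum; payload types are assumptions.
#[derive(Debug)]
enum KafkaError {
    Connection(String),
    TlsConfig(String),
    CertificateLoad(String),
    PrivateKeyLoad(String),
}

/// Coarse classification that keeps compiling when later releases add variants.
fn describe(err: &KafkaError) -> &'static str {
    match err {
        KafkaError::TlsConfig(_)
        | KafkaError::CertificateLoad(_)
        | KafkaError::PrivateKeyLoad(_) => "TLS setup problem",
        // Wildcard arm: any variant not named above, present or future.
        _ => "other Kafka error",
    }
}

fn main() {
    let errors = [
        KafkaError::CertificateLoad("ca.pem not found".to_string()),
        KafkaError::Connection("broker unreachable".to_string()),
    ];
    for err in &errors {
        println!("{:?} -> {}", err, describe(err));
    }
}
```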
0.1.4 - 2025-12-03
- crates.io publishing for `kafka-backup-core` library
- Semantic version checking workflow for breaking change detection
- Dependabot configuration for operator repository
- Crate-specific README for kafka-backup-core
- Updated kafka-backup-core package metadata for crates.io compatibility
0.1.3 - 2025-12-01
- Try It Yourself section linking to demos repository
- Suggest Features link to Contributing section
- GitHub issue templates for bugs and feature requests
- Contributing section in README
- Improved issue templates structure
0.1.2 - 2025-11-30
- Scoop package manager support for Windows installation
- Docker Hub automated publishing on releases
- Comprehensive installation guide in README
- Simplified Homebrew install to one-liner (`brew install osodevops/tap/kafka-backup`)
- Renamed Homebrew formula to `kafka-backup`
- Updated README installation instructions
- Fixed Docker image naming to use semantic versions
0.1.0 - 2025-11-30
- Initial release of kafka-backup
- `BackupEngine` for backing up Kafka topics to cloud storage
- `RestoreEngine` with point-in-time recovery (PITR) support
- Multi-cloud storage support:
  - Amazon S3
  - Azure Blob Storage
  - Google Cloud Storage
  - Local filesystem
  - In-memory (for testing)
- Consumer group offset recovery with multiple strategies:
  - `skip` - restore data only
  - `header-based` - extract offset from message headers
  - `timestamp-based` - query target by timestamp
  - `cluster-scan` - scan target `__consumer_offsets`
  - `manual` - operator-driven reset
- Three-phase restore orchestration for exact offset recovery
- Offset snapshot and rollback functionality
- Compression support: zstd, lz4, gzip, snappy
- Prometheus metrics integration
- Circuit breaker pattern for fault tolerance
- SQLite-based offset tracking with cloud sync
- CLI with commands: backup, restore, list, describe, validate, offset-reset
- cargo-dist release workflow with cross-platform binaries
- Homebrew tap for macOS/Linux installation