[release/v26.1.x] Parallelize acceptance and integration tests (#1407) #1416
Merged
andrewstucki merged 11 commits into release/v26.1.x on Apr 7, 2026
## Summary

- Parallelize acceptance and integration tests, reducing CI wall time from ~90+ minutes to ~22 minutes
- Add test infrastructure for parallel execution: file-based locking, per-test namespaces, shared operator install, image import caching
- Fix flaky tests exposed by parallel execution
- Add debugging tooling: diagnostics on failure, feature-name namespaces, timing markers
- Speed up CI builds by only compiling for the host architecture and parallelizing Docker image pulls

## Details

### Acceptance test parallelization

Refactored the harpoon test framework to run BDD features in parallel. Features are partitioned by a `@serial` tag: features without it run concurrently via `t.Run` + `t.Parallel()`, while `@serial` features (decommissioning, helm-chart) run sequentially afterward since they perform k3d node operations.

- Moved the Redpanda operator from per-feature helm install to a single shared instance installed during `BeforeSuite`
- Features that need their own operator (upgrade tests) run in isolated vclusters (`@vcluster` tag)
- Setup and teardown are separated so the cluster isn't torn down between parallel and serial phases

### Integration test parallelization

- Refactored `RedpandaControllerSuite` from a testify suite with shared mutable state to parallel subtests with per-test namespaces
- Added `WatchAllNamespaces` option to `testenv.Env` so a single controller manager serves all test namespaces
- Parallelized `charts/redpanda` integration test subtests
- Parallelized `TestLicense` subtests across 4 image variants
- Removed `-p=1` gate on integration tests (now uses Go's default parallelism)

### CI build speedups

- Only compile `linux/amd64` binaries in CI (was building all 4 os/arch combos)
- Parallelize Docker image pre-pulls (~25 images pulled concurrently instead of sequentially)
- Pre-pull k3d infrastructure images (`rancher/k3s`, `k3d-tools`, `k3d-proxy`)
- Batch `k3d image import` calls (one command with all images instead of N separate calls)
- Cache imported images with marker files to avoid redundant imports across parallel test packages

### Cross-process coordination

- Added file-based locking (`flock`) around k3d cluster creation and image imports to prevent conflicts when multiple test packages run simultaneously
- Made CRD installation idempotent (tolerates "already exists" errors)
- Made cert-manager installation idempotent in `helmtest.Setup`
- Added cert-manager webhook readiness wait after k3s manifest installation

### Flaky test fixes

- Fixed `require.FailNow` inside `assert.Eventually` goroutine causing panics in `rpk.go`
- Added namespace scoping to Redpanda list queries to avoid cross-feature interference
- Added `waitForStatefulSetReady` after cluster availability checks (looks up StatefulSet by label, not name)
- Bumped k3d cluster creation timeout from 3 to 5 minutes
- Bumped vcluster creation timeout from 3 to 5 minutes
- Bumped PVC unbinder test timeouts
- Added retry for schema registry readiness in `FactoryOperatorV1`
- Overrode upgrade test operator images to use Docker Hub (pre-loaded into k3d) instead of `docker.redpanda.com`
- Added `--rerun-fails` to gotestsum for automatic retry of flaky tests
- Disabled k3s servicelb (unused in tests)

### Debugging improvements

- Feature namespaces now include the feature name (e.g. `test-basic-cluster-tests-abc123`)
- Added `=== FEATURE START/END/FAILED ===` log markers with timing
- Added `DumpDiagnostics` on feature failure: pod statuses, events, resource descriptions, pod logs
- Diagnostics written to `ACCEPTANCE_ARTIFACTS_DIR` when set (uploaded as CI artifacts)
- Added `dumpDiagnostics` to `testenv.Env` for integration test failures
- Added `DumpContainerLogsOnFailure` helper for testcontainer-based tests
- Added vcluster creation failure diagnostics (pod state + events from host namespace)
- Added `[TestName]` prefix to `waitFor` log messages for traceability in parallel output
- CI test output uses `testname` format (per-test lines with timing, verbose on failure)

### Pre-loaded images

Ensured all Docker images used by tests are pre-pulled and imported into k3d clusters to avoid in-test pulls:

- Added operator images for upgrade tests (`v25.1.3`, `v25.2.2`, `v25.3.1`)
- Added Redpanda images for license tests (`v24.2.9`, `v24.3.1-rc4`)
- Added `redpanda-nightly`, `redpanda-operator-nightly` for topic controller and factory tests

(cherry picked from commit ca83446)

# Conflicts:
#	acceptance/steps/defaults.go
#	acceptance/steps/multicluster.go
#	charts/redpanda/testdata/template-cases.golden.txtar
#	ci/scripts/run-in-nix-docker.sh
#	harpoon/suite.go
#	operator/internal/controller/redpanda/redpanda_controller_test.go
#	pkg/vcluster/vcluster.go
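For readers unfamiliar with the pattern, the `@serial` partitioning described under "Acceptance test parallelization" can be sketched roughly as follows. This is an illustration only, not the harpoon implementation: the `Feature` type and the `partition` and `runFeatures` helpers are hypothetical names invented here. The one behavior it relies on is standard Go: a parent `t.Run` does not return until all of its `t.Parallel()` subtests have finished, which is what lets the serial phase start only after the parallel phase is done.

```go
package main

import (
	"fmt"
	"testing"
)

// Feature is a hypothetical stand-in for a parsed BDD feature.
type Feature struct {
	Name string
	Tags []string
	Run  func(t *testing.T)
}

// hasTag reports whether the feature carries the given tag.
func hasTag(f Feature, tag string) bool {
	for _, t := range f.Tags {
		if t == tag {
			return true
		}
	}
	return false
}

// partition splits features into those that may run concurrently and
// those tagged @serial, preserving the input order within each group.
func partition(features []Feature) (parallel, serial []Feature) {
	for _, f := range features {
		if hasTag(f, "@serial") {
			serial = append(serial, f)
		} else {
			parallel = append(parallel, f)
		}
	}
	return parallel, serial
}

// runFeatures runs the parallel group to completion, then the serial group.
func runFeatures(t *testing.T, features []Feature) {
	parallel, serial := partition(features)

	// The enclosing t.Run blocks until every t.Parallel() child has
	// finished, so the serial phase below never overlaps with it.
	t.Run("parallel", func(t *testing.T) {
		for _, f := range parallel {
			f := f // capture the loop variable for the closure
			t.Run(f.Name, func(t *testing.T) {
				t.Parallel()
				f.Run(t)
			})
		}
	})

	// Serial features run one at a time, in order.
	for _, f := range serial {
		t.Run(f.Name, f.Run)
	}
}

func main() {
	features := []Feature{
		{Name: "basic-cluster"},
		{Name: "decommissioning", Tags: []string{"@serial"}},
		{Name: "topic-crds"},
	}
	parallel, serial := partition(features)
	fmt.Println("parallel features:", len(parallel))
	fmt.Println("serial features:", len(serial))
}
```

In a real suite, `runFeatures` would be called from a `TestXxx` function; `main` here only exercises the partitioning step.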
RafalKorepta reviewed Apr 7, 2026
andrewstucki approved these changes Apr 7, 2026
Backport

This will backport the following commits from `main` to `release/v26.1.x`:

Questions? Please refer to the Backport tool documentation.