[codex] Add shared rich integration test suite #18324

Draft
xiangfu0 wants to merge 35 commits into master from codex/shared-rich-integration-suite

xiangfu0 commented Apr 24, 2026

Summary

  • adds reusable shared-suite infrastructure for pinot-integration-tests via SharedRichClusterIntegrationTest
  • moves the low-risk no-config-override batch behind one shared rich cluster: 1 controller, 1 broker, 2 servers, 1 minion, and embedded Kafka
  • adds exact-config shared profiles for tests that need different component counts, Kafka/minion startup, auth/TLS, gRPC, query-killing, realtime, hybrid, controller, special-topology, controller-only, multi-node, or restart/preload setup
  • keeps per-class schema/table/topic/temp-dir cleanup while sharing infrastructure at suite scope
  • documents setup grouping, timings, parked candidates, and Docker/blocker notes in pinot-integration-tests/INTEGRATION_TEST_SETUP_GROUPS.md
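The split between suite-scoped infrastructure and per-class cleanup can be sketched in plain Java. This is a minimal illustration, not the code in this PR: `SharedClusterSketch` and `PerClassLifecycleSketch` are hypothetical names, and the "cluster" is an in-memory stand-in. In TestNG terms, the cluster methods would roughly correspond to suite-level setup/teardown hooks and the per-class methods to class-level hooks.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical stand-in for suite-scoped infrastructure; not Pinot's actual API. */
final class SharedClusterSketch {
  private boolean started;
  private int startCount;
  private final Set<String> tables = new TreeSet<>();

  /** Suite-level setup: starts the pretend cluster only if it is not already running. */
  void startIfNeeded() {
    if (!started) {
      started = true;
      startCount++;
    }
  }

  /** Suite-level teardown. */
  void stop() {
    started = false;
  }

  void createTable(String name) {
    if (!started) {
      throw new IllegalStateException("cluster not started");
    }
    tables.add(name);
  }

  void dropTable(String name) {
    tables.remove(name);
  }

  /** Number of times the cluster was actually started. */
  int startCount() {
    return startCount;
  }

  /** Defensive copy of the tables that currently exist. */
  Set<String> tables() {
    return new TreeSet<>(tables);
  }
}

/** Hypothetical per-class lifecycle: owns its table, borrows the shared cluster. */
final class PerClassLifecycleSketch {
  private final SharedClusterSketch cluster;
  private final String tableName;

  PerClassLifecycleSketch(SharedClusterSketch cluster, String tableName) {
    this.cluster = cluster;
    this.tableName = tableName;
  }

  /** Class-level setup: reuse the cluster, create only this class's table. */
  void setUpClass() {
    cluster.startIfNeeded();
    cluster.createTable(tableName);
  }

  /** Class-level teardown: drop this class's table, leave the cluster running. */
  void tearDownClass() {
    cluster.dropTable(tableName);
  }
}
```

Running two such lifecycles back to back against one `SharedClusterSketch` starts the cluster once and leaves no tables behind, which is the property the shared suites depend on.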

Current Shared-Rich Batch

The main shared-rich suite currently covers 31 source test classes / 34 concrete TestNG classes / 252 TestNG tests. ErrorCodesIntegrationTest is represented by its four concrete inner classes.

Compared on this workstation:

| Mode | Tests | Wall time |
| --- | --- | --- |
| Per-class lifecycle | 252 | 886.69s |
| Shared rich suite | 252 | 563.98s |

That is a 322.71s wall-clock reduction, about 36% for this batch.
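
Spelled out, the arithmetic behind those figures is as follows. The speedup factor on the last line is derived here from the same two measurements and is not a separately measured number.

```latex
\begin{aligned}
\Delta t &= 886.69\,\mathrm{s} - 563.98\,\mathrm{s} = 322.71\,\mathrm{s} \\
\frac{\Delta t}{t_{\text{per-class}}} &= \frac{322.71}{886.69} \approx 0.364 \quad (\text{about } 36\%) \\
\frac{t_{\text{per-class}}}{t_{\text{shared}}} &= \frac{886.69}{563.98} \approx 1.57
\end{aligned}
```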

Exact-Config Suites

These profiles preserve process-config buckets and component-count buckets instead of mixing incompatible setups. Some are speed wins now; others are setup-correctness buckets that become useful as more compatible tests are added.
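
One way to picture the exact-config idea is to group test classes by a key that captures component counts and process overrides, so that only identical setups share a profile. The sketch below is an assumption about the concept, not the PR's implementation: `ClusterSetupKey`, `ExactConfigBucketsSketch`, and the fields on the key are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

/** Hypothetical description of the setup one test class needs; fields are illustrative. */
final class ClusterSetupKey {
  final int brokers;
  final int servers;
  final boolean kafka;
  final Map<String, String> overrides;

  ClusterSetupKey(int brokers, int servers, boolean kafka, Map<String, String> overrides) {
    this.brokers = brokers;
    this.servers = servers;
    this.kafka = kafka;
    this.overrides = new TreeMap<>(overrides); // sorted defensive copy
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof ClusterSetupKey)) {
      return false;
    }
    ClusterSetupKey other = (ClusterSetupKey) o;
    return brokers == other.brokers && servers == other.servers && kafka == other.kafka
        && overrides.equals(other.overrides);
  }

  @Override
  public int hashCode() {
    return Objects.hash(brokers, servers, kafka, overrides);
  }
}

final class ExactConfigBucketsSketch {
  /** Groups test class names by exact setup; only identical keys land in the same bucket. */
  static Map<ClusterSetupKey, List<String>> group(Map<String, ClusterSetupKey> setupByTestClass) {
    Map<ClusterSetupKey, List<String>> buckets = new LinkedHashMap<>();
    for (Map.Entry<String, ClusterSetupKey> entry : setupByTestClass.entrySet()) {
      buckets.computeIfAbsent(entry.getValue(), key -> new ArrayList<>()).add(entry.getKey());
    }
    return buckets;
  }
}
```

Under this model a single differing override or component count produces a separate bucket, which matches the intent of keeping incompatible setups apart even when a bucket holds only one test class.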

| Suite profile | Tests | Wall time / status |
| --- | --- | --- |
| shared-mse-explain-cluster-integration-test-suite | 4 | 23.86s |
| shared-no-override-offline-cluster-integration-test-suite | 47 | 122.42s |
| shared-cursor-memory-cluster-integration-test-suite | 19 | 74.29s |
| shared-cursor-fs-cluster-integration-test-suite | 15 | 30.30s |
| shared-cursor-cron-cluster-integration-test-suite | 1 | 24.47s |
| shared-empty-response-cluster-integration-test-suite | 6 | 22.98s |
| shared-broker-service-discovery-cluster-integration-test-suite | 1 | 18.09s |
| shared-broker-query-limit-cluster-integration-test-suite | 2 | 21.43s |
| shared-null-handling-cluster-integration-test-suite | 68 | 23.00s |
| shared-msq-without-stats-cluster-integration-test-suite | 1 | 21.59s |
| shared-group-by-trim-cluster-integration-test-suite | 2 | 20.59s |
| shared-jmx-metrics-cluster-integration-test-suite | 4 | 26.76s |
| shared-window-accounting-cluster-integration-test-suite | 1 | 19.73s |
| shared-offline-grpc-cluster-integration-test-suite | 26 | 35.50s |
| shared-offline-secure-grpc-cluster-integration-test-suite | 13 | 27.53s |
| shared-cpu-broker-query-killing-cluster-integration-test-suite | 3 | 43.65s |
| shared-cpu-server-query-killing-cluster-integration-test-suite | 8 | 43.60s |
| shared-memory-server-query-killing-cluster-integration-test-suite | 8 | 40.79s |
| shared-msq-small-buffer-cluster-integration-test-suite | 50 | 34.42s |
| shared-query-workload-cluster-integration-test-suite | 1 | 39.11s |
| shared-realtime-rate-limiter-cluster-integration-test-suite | 2 | 92.03s |
| shared-kafka-partition-cluster-integration-test-suite | 11 | 112.96s |
| shared-exactly-once-kafka-cluster-integration-test-suite | 9 | 104.45s |
| shared-realtime-manager-cluster-integration-test-suite | 2 | 87.81s |
| shared-controller-service-discovery-cluster-integration-test-suite | 1 | 16.50s |
| shared-cursor-auth-cluster-integration-test-suite | 13 | 25.53s |
| shared-timeseries-cluster-integration-test-suite | 22 | 18.04s |
| shared-timeseries-auth-cluster-integration-test-suite | 22 | 19.11s |
| shared-basic-auth-batch-cluster-integration-test-suite | 5 | 25.76s |
| shared-row-level-security-cluster-integration-test-suite | 4 | 64.51s |
| shared-tls-cluster-integration-test-suite | 21 | 52.14s |
| shared-url-auth-realtime-cluster-integration-test-suite | 2 | 47.64s |
| shared-grpc-broker-cluster-integration-test-suite | 2 | 53.04s |
| shared-hybrid-cluster-integration-test-suite | 56 | 161.93s |
| shared-controller-periodic-tasks-cluster-integration-test-suite | 5 | 306.88s |
| shared-offline-cluster-integration-test-suite | 134 | 103.43s |
| shared-multi-stage-engine-custom-tenant-integration-test-suite | 91 | 55.35s |
| shared-llc-realtime-cluster-integration-test-suite | 54 | 462.10s |
| shared-peer-download-llc-realtime-cluster-integration-test-suite | 13 | 106.13s |
| shared-confluent-schema-registry-realtime-cluster-integration-test-suite | 11 | blocked locally: no Docker |
| shared-segment-completion-cluster-integration-test-suite | 1 | 19.41s |
| shared-controller-only-cluster-integration-test-suite | 7 | 109.46s |
| shared-multi-nodes-offline-cluster-integration-test-suite | 137 | 119.99s |
| shared-dedup-preload-cluster-integration-test-suite | 1 | 24.75s |
| shared-upsert-preload-cluster-integration-test-suite | 1 | 27.68s |
| disabled-manual-cluster-integration-test-suite | 0 | 12.14s |

Recent Timing Notes

  • LLC realtime now includes LLCRealtimeClusterIntegrationTest, LLCRealtimeKafka3ClusterIntegrationTest, and LLCRealtimeKafka4ClusterIntegrationTest (54 tests, including 6 expected skips). The combined per-class baseline was 508.83s; one shared profile passed in 462.10s, a 46.73s reduction.
  • Segment completion is now suite-compatible with its fake-server topology: per-class 18.03s, shared 19.41s.
  • Controller-only bucket: ServerStarterIntegrationTest and ControllerLeaderLocatorIntegrationTest passed with a combined per-class baseline of about 125.82s; one shared profile passed in 109.46s, a 16.36s reduction.
  • Multi-nodes offline bucket: MultiNodesOfflineClusterIntegrationTest passed per-class in 119.01s; one shared profile passed in 119.99s. This is a setup-correctness bucket for the 2-broker/3-server topology.
  • Dedup preload bucket: DedupPreloadIntegrationTest passed per-class in 24.10s; one shared profile passed in 24.75s. It stays separate because it restarts the server with the dedup preload server override.
  • Upsert preload bucket: UpsertTableSegmentPreloadIntegrationTest passed per-class in 28.60s; one shared profile passed in 27.68s. It stays separate because its server override enables snapshot/preload plus one preload thread.
  • Disabled manual bucket: ChaosMonkeyIntegrationTest and TPCHGeneratedQueryIntegrationTest currently have no runnable TestNG methods, so the profile starts no shared infra and passed with 0 tests in 12.14s.
  • Confluent schema-registry runtime validation is blocked on this workstation because Testcontainers cannot find a valid Docker environment; standalone setup failed after 14.86s with 1 setup failure and 10 skipped methods for the same reason.
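
The per-class versus shared figures in these notes can be compared mechanically to separate speed wins from setup-correctness buckets. The following is a small sketch using only the timings quoted above; `BucketTimingSketch` and `reduction` are hypothetical names, and the controller-only baseline is the approximate 125.82s value from the note.

```java
import java.math.BigDecimal;
import java.util.LinkedHashMap;
import java.util.Map;

final class BucketTimingSketch {
  /** Per-class seconds minus shared seconds; positive means the shared profile was faster. */
  static BigDecimal reduction(String perClassSeconds, String sharedSeconds) {
    // BigDecimal keeps the two-decimal timings exact, unlike double arithmetic.
    return new BigDecimal(perClassSeconds).subtract(new BigDecimal(sharedSeconds));
  }

  public static void main(String[] args) {
    // {per-class, shared} wall times in seconds, copied from the notes above.
    Map<String, String[]> timings = new LinkedHashMap<>();
    timings.put("LLC realtime", new String[] {"508.83", "462.10"});
    timings.put("Segment completion", new String[] {"18.03", "19.41"});
    timings.put("Controller-only", new String[] {"125.82", "109.46"}); // baseline is approximate
    timings.put("Multi-nodes offline", new String[] {"119.01", "119.99"});
    timings.put("Dedup preload", new String[] {"24.10", "24.75"});
    timings.put("Upsert preload", new String[] {"28.60", "27.68"});

    for (Map.Entry<String, String[]> entry : timings.entrySet()) {
      BigDecimal delta = reduction(entry.getValue()[0], entry.getValue()[1]);
      String verdict = delta.signum() > 0 ? "faster when shared" : "not faster when shared";
      System.out.println(entry.getKey() + ": " + delta + "s (" + verdict + ")");
    }
  }
}
```

A non-positive difference does not mean a bucket is a mistake; as noted above, several profiles exist to preserve a specific topology or server override rather than to save time.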

Parked Candidates

  • CancelQueryIntegrationTests: shared patch compiled, but direct and shared validation hit the existing client-query-id cancellation race. The broker reported the client query id as unknown until the single-stage query timed out, so this needs a targeted cancellation-test fix before suite wiring.
  • PauselessRealtimeIngestionWithDedupIntegrationTest: intermittent unavailable realtime segments under strict replica-group routing.
  • KafkaPartitionSubsetChaosIntegrationTest: own chaos topology with pause/resume, force-commit, and server restart coverage.
  • PurgeMinionClusterIntegrationTest: shared-mode patch compiled, but testRealtimeLastSegmentPreservation timed out waiting for purged realtime records.
  • UpsertCompactMergeTaskIntegrationTest: shared-mode patch compiled, but the task generator skipped segments with empty download URLs and no task names were scheduled.
  • MultiStageEngineIntegrationTest, MergeRollupMinionClusterIntegrationTest: previous patch attempts need a tighter follow-up pass before inclusion.

Validation

  • ./mvnw spotless:apply -pl pinot-integration-tests
  • ./mvnw checkstyle:check -pl pinot-integration-tests
  • ./mvnw license:format -pl pinot-integration-tests
  • ./mvnw license:check -pl pinot-integration-tests
  • ./mvnw -pl pinot-integration-tests -DskipTests test-compile
  • git diff --check
  • ./mvnw -pl pinot-integration-tests -Pshared-rich-cluster-integration-test-suite test (252 tests, 563.98s)
  • direct per-class no-override batch (252 tests, 886.69s)
  • ./mvnw -pl pinot-integration-tests -Pshared-llc-realtime-cluster-integration-test-suite test (54 tests, 462.10s, 6 expected skips)
  • direct Kafka3/Kafka4 batch (36 tests, 333.44s, 4 expected skips)
  • direct SegmentCompletionIntegrationTest (1 test, 18.03s)
  • ./mvnw -pl pinot-integration-tests -Pshared-segment-completion-cluster-integration-test-suite test (1 test, 19.41s)
  • direct KafkaConfluentSchemaRegistryAvroMessageDecoderRealtimeClusterIntegrationTest attempted; blocked by missing Docker/Testcontainers environment
  • direct ServerStarterIntegrationTest (6 tests, about 109s)
  • direct ControllerLeaderLocatorIntegrationTest (1 test, 16.82s)
  • ./mvnw -pl pinot-integration-tests -Pshared-controller-only-cluster-integration-test-suite test (7 tests, 109.46s)
  • direct MultiNodesOfflineClusterIntegrationTest (137 tests, 119.01s)
  • ./mvnw -pl pinot-integration-tests -Pshared-multi-nodes-offline-cluster-integration-test-suite test (137 tests, 119.99s)
  • direct DedupPreloadIntegrationTest (1 test, 24.10s)
  • ./mvnw -pl pinot-integration-tests -Pshared-dedup-preload-cluster-integration-test-suite test (1 test, 24.75s)
  • direct UpsertTableSegmentPreloadIntegrationTest (1 test, 28.60s)
  • ./mvnw -pl pinot-integration-tests -Pshared-upsert-preload-cluster-integration-test-suite test (1 test, 27.68s)
  • ./mvnw -pl pinot-integration-tests -Pdisabled-manual-cluster-integration-test-suite test (0 tests, 12.14s)
  • direct/shared CancelQueryIntegrationTests attempted; blocked by client-query-id cancellation race
  • all other exact-suite validation commands and timings are recorded in pinot-integration-tests/INTEGRATION_TEST_SETUP_GROUPS.md

codecov-commenter commented Apr 24, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 63.62%. Comparing base (77f4d01) to head (9c1425a).

Additional details and impacted files
@@            Coverage Diff            @@
##             master   #18324   +/-   ##
=========================================
  Coverage     63.61%   63.62%           
  Complexity     1659     1659           
=========================================
  Files          3246     3246           
  Lines        197549   197549           
  Branches      30577    30577           
=========================================
+ Hits         125677   125684    +7     
+ Misses        61830    61825    -5     
+ Partials      10042    10040    -2     
| Flag | Coverage Δ |
| --- | --- |
| custom-integration1 | 100.00% <ø> (ø) |
| integration | 100.00% <ø> (ø) |
| integration1 | 100.00% <ø> (ø) |
| integration2 | 0.00% <ø> (ø) |
| java-11 | 63.57% <ø> (-0.02%) ⬇️ |
| java-21 | 63.59% <ø> (+0.01%) ⬆️ |
| temurin | 63.62% <ø> (+<0.01%) ⬆️ |
| unittests | 63.61% <ø> (+<0.01%) ⬆️ |
| unittests1 | 55.61% <ø> (+0.01%) ⬆️ |
| unittests2 | 35.05% <ø> (-0.01%) ⬇️ |

xiangfu0 force-pushed the codex/shared-rich-integration-suite branch from 5c313da to 9c1425a on April 25, 2026 21:18