
[v26.1.x] cluster/rm_stm: preserve open transaction producers in local snapshots#30019

Open
vbotbuildovich wants to merge 2 commits into redpanda-data:v26.1.x from vbotbuildovich:backport-pr-30003-v26.1.x-773


@vbotbuildovich
Collaborator

Backport of PR #30003

do_take_local_snapshot filters producers by finished_requests, which
drops transactional producers that have begun a transaction (fence
batch applied) but have not yet replicated any data batches. On
restart, the snapshot is loaded and log replay starts from the snapshot
offset, skipping the fence batch. Data batches replayed without the
fence synthesize transaction state with tx_seq{-1} and timeout=nullopt.
This makes the transaction impossible to commit (tx_seq mismatch), to
abort (the request's tx_seq is seen as coming from the future), or to
auto-expire (the missing timeout is treated as max), permanently
stalling the LSO (last stable offset) on the partition.
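To make the failure mode concrete, here is a minimal, hypothetical C++ model of the recovered state. The `open_tx` struct and the `from_fence`, `from_data_only`, `commit_accepted`, `abort_accepted`, `expired`, and `last_stable_offset` functions are illustrative assumptions, not Redpanda's rm_stm API, and the acceptance rules are reduced to only the checks named above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>
#include <vector>

// Hypothetical, simplified model of a producer's open transaction on one
// partition. Names echo the PR text; this is NOT Redpanda's rm_stm code.
struct open_tx {
    std::int64_t tx_seq;                     // seq recorded by the fence batch
    std::optional<std::int64_t> timeout_ms;  // nullopt: no timeout recorded
    std::int64_t first_offset;               // offset of the tx's first data batch
};

// State rebuilt when the fence batch is replayed (or restored from a snapshot).
open_tx from_fence(std::int64_t tx_seq, std::int64_t timeout_ms,
                   std::int64_t first_offset) {
    assert(tx_seq >= 0);  // model assumption: a real fence carries a real seq
    return {tx_seq, timeout_ms, first_offset};
}

// State synthesized when a data batch is replayed without its fence batch.
open_tx from_data_only(std::int64_t first_offset) {
    return {-1, std::nullopt, first_offset};
}

// Commit must carry exactly the tx_seq the partition recorded.
bool commit_accepted(const open_tx& tx, std::int64_t request_seq) {
    return request_seq == tx.tx_seq;
}

// Model of the "from the future" check only: a request whose tx_seq is newer
// than the recorded one is rejected.
bool abort_accepted(const open_tx& tx, std::int64_t request_seq) {
    return request_seq <= tx.tx_seq;
}

// A missing timeout falls back to max, so the tx effectively never expires.
bool expired(const open_tx& tx, std::int64_t elapsed_ms) {
    return elapsed_ms >=
           tx.timeout_ms.value_or(std::numeric_limits<std::int64_t>::max());
}

// LSO in this model: first offset of the oldest open tx, else the high watermark.
std::int64_t last_stable_offset(const std::vector<open_tx>& open,
                                std::int64_t high_watermark) {
    std::int64_t lso = high_watermark;
    for (const auto& tx : open) {
        lso = std::min(lso, tx.first_offset);
    }
    return lso;
}
```

Once the fence is skipped, `from_data_only` yields a state that rejects both commit and abort for any real tx_seq and never expires, so its `first_offset` keeps pinning `last_stable_offset`.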

Include producers with in-progress transactions in local snapshots
regardless of finished_requests, so that their tx_seq, timeout, and
coordinator partition survive the snapshot round trip.
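The change amounts to widening the snapshot filter. The sketch below is a hypothetical simplification, not the actual patch: `producer` stands in for the real producer state, `finished_requests` is modeled as a plain count, and an engaged `open_tx_seq` stands in for "has an in-progress transaction" (the real state also carries the timeout and coordinator partition).

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical producer entry; NOT Redpanda's producer state type.
struct producer {
    std::int64_t id;
    int finished_requests;                    // completed requests (modeled as a count)
    std::optional<std::int64_t> open_tx_seq;  // engaged while a tx is open
};

// Before the fix: only producers with finished requests are snapshotted.
bool keep_before(const producer& p) { return p.finished_requests > 0; }

// After the fix: producers with an open transaction are kept regardless.
bool keep_after(const producer& p) {
    return p.finished_requests > 0 || p.open_tx_seq.has_value();
}

// Collect the ids of producers that the given predicate keeps in the snapshot.
template <typename Pred>
std::vector<std::int64_t> snapshot_ids(const std::vector<producer>& producers,
                                       Pred keep) {
    std::vector<std::int64_t> ids;
    for (const auto& p : producers) {
        assert(p.finished_requests >= 0);  // counts are never negative
        if (keep(p)) {
            ids.push_back(p.id);
        }
    }
    return ids;
}
```

In this model, a producer that has applied its fence but has no finished requests is dropped by `keep_before` and retained by `keep_after`.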

(cherry picked from commit a697aa8)
vbotbuildovich added this to the v26.1.x-next milestone on Mar 31, 2026
vbotbuildovich added the kind/backport label (PRs targeting a stable branch) on Mar 31, 2026
ballard26 enabled auto-merge on March 31, 2026 at 19:46
@vbotbuildovich
Collaborator Author

vbotbuildovich commented Mar 31, 2026

Retry command for Build #82568

Please wait until all jobs have finished before running the slash command.

/ci-repeat 1
skip-redpanda-build
skip-units
skip-rebase
tests/rptest/tests/cluster_features_test.py::FeaturesMultiNodeUpgradeTest.test_upgrade
tests/rptest/tests/cluster_features_test.py::FeaturesMultiNodeUpgradeTest.test_rollback
tests/rptest/tests/license_enforcement_test.py::LicenseEnforcementTest.test_license_enforcement@{"clean_node_after_recovery":true,"clean_node_before_recovery":true}
tests/rptest/tests/license_enforcement_test.py::LicenseEnforcementTest.test_license_enforcement@{"clean_node_after_recovery":false,"clean_node_before_recovery":true}
tests/rptest/tests/license_enforcement_test.py::LicenseEnforcementTest.test_license_enforcement@{"clean_node_after_recovery":true,"clean_node_before_recovery":false}
tests/rptest/tests/cluster_features_test.py::FeaturesNodeJoinTest.test_old_node_join
tests/rptest/tests/license_enforcement_test.py::LicenseEnforcementTest.test_license_enforcement@{"clean_node_after_recovery":false,"clean_node_before_recovery":false}
tests/rptest/tests/cluster_features_test.py::FeaturesSingleNodeUpgradeTest.test_upgrade

@vbotbuildovich
Collaborator Author

vbotbuildovich commented Mar 31, 2026

CI test results

Test results on build #82568

| test_class | test_method | test_arguments | test_kind | job_url | test_status | passed | reason | test_history |
|---|---|---|---|---|---|---|---|---|
| FeaturesMultiNodeUpgradeTest | test_rollback | null | integration | https://buildkite.com/redpanda/redpanda/builds/82568#019d4554-52e4-4fc2-913f-f3f460c67806 | FAIL | 0/11 | Test FAILS after retries. Significant increase in flaky rate (baseline=0.3771, p0=0.0001, reject_threshold=0.0100) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=FeaturesMultiNodeUpgradeTest&test_method=test_rollback |
| FeaturesMultiNodeUpgradeTest | test_rollback | null | integration | https://buildkite.com/redpanda/redpanda/builds/82568#019d4556-17f6-4649-bd51-4218f3b70d47 | FAIL | 0/11 | Test FAILS after retries. Significant increase in flaky rate (baseline=0.3764, p0=0.0001, reject_threshold=0.0100) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=FeaturesMultiNodeUpgradeTest&test_method=test_rollback |
| FeaturesMultiNodeUpgradeTest | test_upgrade | null | integration | https://buildkite.com/redpanda/redpanda/builds/82568#019d4554-52e5-4b68-a226-520fec111b20 | FAIL | 0/11 | Test FAILS after retries. Significant increase in flaky rate (baseline=0.3771, p0=0.0001, reject_threshold=0.0100) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=FeaturesMultiNodeUpgradeTest&test_method=test_upgrade |
| FeaturesMultiNodeUpgradeTest | test_upgrade | null | integration | https://buildkite.com/redpanda/redpanda/builds/82568#019d4556-17f6-4617-b28d-d3f7fcef95a0 | FAIL | 0/11 | Test FAILS after retries. Significant increase in flaky rate (baseline=0.3764, p0=0.0001, reject_threshold=0.0100) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=FeaturesMultiNodeUpgradeTest&test_method=test_upgrade |
| FeaturesNodeJoinTest | test_old_node_join | null | integration | https://buildkite.com/redpanda/redpanda/builds/82568#019d4554-52e5-4f50-b289-3cd21304cf2b | FAIL | 0/11 | Test FAILS after retries. Significant increase in flaky rate (baseline=0.3757, p0=0.0001, reject_threshold=0.0100) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=FeaturesNodeJoinTest&test_method=test_old_node_join |
| FeaturesNodeJoinTest | test_old_node_join | null | integration | https://buildkite.com/redpanda/redpanda/builds/82568#019d4556-17f7-4da0-bf5f-c45deb202850 | FAIL | 0/11 | Test FAILS after retries. Significant increase in flaky rate (baseline=0.3757, p0=0.0001, reject_threshold=0.0100) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=FeaturesNodeJoinTest&test_method=test_old_node_join |
| FeaturesSingleNodeUpgradeTest | test_upgrade | null | integration | https://buildkite.com/redpanda/redpanda/builds/82568#019d4554-52e8-4c4a-a34d-d82a910f832d | FAIL | 0/11 | Test FAILS after retries. Significant increase in flaky rate (baseline=0.3636, p0=0.0000, reject_threshold=0.0100) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=FeaturesSingleNodeUpgradeTest&test_method=test_upgrade |
| FeaturesSingleNodeUpgradeTest | test_upgrade | null | integration | https://buildkite.com/redpanda/redpanda/builds/82568#019d4556-17fb-4b0c-94bd-93a66f4a74f4 | FAIL | 0/11 | Test FAILS after retries. Significant increase in flaky rate (baseline=0.3636, p0=0.0000, reject_threshold=0.0100) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=FeaturesSingleNodeUpgradeTest&test_method=test_upgrade |
| LicenseEnforcementTest | test_license_enforcement | {"clean_node_after_recovery": false, "clean_node_before_recovery": false} | integration | https://buildkite.com/redpanda/redpanda/builds/82568#019d4554-52e5-4f50-b289-3cd21304cf2b | FAIL | 0/11 | Test FAILS after retries. Significant increase in flaky rate (baseline=0.3679, p0=0.0000, reject_threshold=0.0100) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=LicenseEnforcementTest&test_method=test_license_enforcement |
| LicenseEnforcementTest | test_license_enforcement | {"clean_node_after_recovery": false, "clean_node_before_recovery": false} | integration | https://buildkite.com/redpanda/redpanda/builds/82568#019d4556-17f7-4da0-bf5f-c45deb202850 | FAIL | 0/11 | Test FAILS after retries. Significant increase in flaky rate (baseline=0.3679, p0=0.0000, reject_threshold=0.0100) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=LicenseEnforcementTest&test_method=test_license_enforcement |
| LicenseEnforcementTest | test_license_enforcement | {"clean_node_after_recovery": true, "clean_node_before_recovery": false} | integration | https://buildkite.com/redpanda/redpanda/builds/82568#019d4554-52e6-4a58-a318-b8b06fb38359 | FAIL | 0/11 | Test FAILS after retries. Significant increase in flaky rate (baseline=0.3681, p0=0.0000, reject_threshold=0.0100) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=LicenseEnforcementTest&test_method=test_license_enforcement |
| LicenseEnforcementTest | test_license_enforcement | {"clean_node_after_recovery": true, "clean_node_before_recovery": false} | integration | https://buildkite.com/redpanda/redpanda/builds/82568#019d4556-17f7-4131-92df-5e86efe32ce0 | FAIL | 0/11 | Test FAILS after retries. Significant increase in flaky rate (baseline=0.3679, p0=0.0000, reject_threshold=0.0100) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=LicenseEnforcementTest&test_method=test_license_enforcement |
| LicenseEnforcementTest | test_license_enforcement | {"clean_node_after_recovery": false, "clean_node_before_recovery": true} | integration | https://buildkite.com/redpanda/redpanda/builds/82568#019d4554-52e7-41e7-b9c3-36cb34475889 | FAIL | 0/11 | Test FAILS after retries. Significant increase in flaky rate (baseline=0.3681, p0=0.0000, reject_threshold=0.0100) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=LicenseEnforcementTest&test_method=test_license_enforcement |
| LicenseEnforcementTest | test_license_enforcement | {"clean_node_after_recovery": false, "clean_node_before_recovery": true} | integration | https://buildkite.com/redpanda/redpanda/builds/82568#019d4556-17f8-4340-b0f3-9a23ed2ef977 | FAIL | 0/11 | Test FAILS after retries. Significant increase in flaky rate (baseline=0.3679, p0=0.0000, reject_threshold=0.0100) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=LicenseEnforcementTest&test_method=test_license_enforcement |
| LicenseEnforcementTest | test_license_enforcement | {"clean_node_after_recovery": true, "clean_node_before_recovery": true} | integration | https://buildkite.com/redpanda/redpanda/builds/82568#019d4554-52e7-4da5-bc75-72f3ca68e517 | FAIL | 0/11 | Test FAILS after retries. Significant increase in flaky rate (baseline=0.3681, p0=0.0000, reject_threshold=0.0100) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=LicenseEnforcementTest&test_method=test_license_enforcement |
| LicenseEnforcementTest | test_license_enforcement | {"clean_node_after_recovery": true, "clean_node_before_recovery": true} | integration | https://buildkite.com/redpanda/redpanda/builds/82568#019d4556-17f9-44dd-9231-d3e56c2ac0f9 | FAIL | 0/11 | Test FAILS after retries. Significant increase in flaky rate (baseline=0.3679, p0=0.0000, reject_threshold=0.0100) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=LicenseEnforcementTest&test_method=test_license_enforcement |
| WriteCachingFailureInjectionE2ETest | test_crash_all | {"use_transactions": false} | integration | https://buildkite.com/redpanda/redpanda/builds/82568#019d4556-17f9-44dd-9231-d3e56c2ac0f9 | FLAKY | 17/21 | Test PASSES after retries. No significant increase in flaky rate (baseline=0.0860, p0=0.2445, reject_threshold=0.0100. adj_baseline=0.2365, p1=0.2689, trust_threshold=0.5000) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=WriteCachingFailureInjectionE2ETest&test_method=test_crash_all |

Test results on build #82779

| test_status | test_class | test_method | test_arguments | test_kind | job_url | passed | reason | test_history |
|---|---|---|---|---|---|---|---|---|
| FLAKY(PASS) | DebugRowsTest | test_read_and_write_rows | null | integration | https://buildkite.com/redpanda/redpanda/builds/82779#019d62c3-d755-4d55-9b10-2a1192764c5c | 10/11 | Test PASSES after retries. No significant increase in flaky rate (baseline=0.0028, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.1000, p1=0.3487, trust_threshold=0.5000) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=DebugRowsTest&test_method=test_read_and_write_rows |


Labels

area/redpanda kind/backport PRs targeting a stable branch


3 participants