feat(backends): pickle-counts gate in quarantine_stale_hnsw#1271
Open
sha2fiddy wants to merge 2 commits intoMemPalace:developfrom
Open
feat(backends): pickle-counts gate in quarantine_stale_hnsw#1271sha2fiddy wants to merge 2 commits intoMemPalace:developfrom
sha2fiddy wants to merge 2 commits intoMemPalace:developfrom
Conversation
Adds a third gate that catches the off-by-one corruption shape from chroma-core/chroma#6979: ``len(id_to_label) != total_elements_added`` in ``index_metadata.pickle``. The byte-sniff in ``_segment_appears_healthy`` cannot see this — the pickle is structurally valid; the inconsistency lives in the values, and HNSW traversal eventually lands on the unmapped label and SIGSEGVs (same shape diagnosed at MemPalace#1046). Uses a bounded unpickler that refuses any non-builtin ``find_class``, so a structurally-valid-but-malicious pickle cannot execute arbitrary code at quarantine-check time. Older 0.6.x palaces that stored metadata as a custom-class instance return "undetermined" and the existing byte-sniff verdict stands. 8 new tests; 53/53 in test_backends.py and 1469/1469 in the full suite pass.
Local autoformat hook used ruff 0.15.9 which produced output the CI's pinned `ruff>=0.4.0,<0.5` rejected. Pinned local venv to match and re-formatted. Reverts an incidental reformat of the test_make_client_quarantines_only_on_first_call_per_palace assert block that 0.15.9 introduced.
d20d4f8 to
cd25502
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
quarantine_stale_hnswmisses one corruption shape:index_metadata.picklewherelen(id_to_label) != total_elements_added. Stage 2's byte-sniff sees the PROTO + STOP markers and returns healthy, so the segment stays in place. HNSW traversal eventually dereferences the unmapped label and SIGSEGVs.Same shape as chroma-core/chroma#6979. Hit on a real 522k-drawer palace 2026-04-28; recovered with #1126's
repair --mode hnsw. This gate would have caught it before the crash.Fix
A stage 3 check runs only when stage 2 passes: deserialize
index_metadata.pickleand comparelen(id_to_label) == total_elements_added.<uuid>.drift-<timestamp>(same path as the stage 2 failure)._BoundedUnpicklerrefuses any non-builtinfind_class, so a structurally-valid-but-malicious pickle cannot execute code at quarantine time.Tests
8 new in
tests/test_backends.py:_segment_pickle_counts_consistent): matching, off-by-one, class-pickle (refused), missing keys, missing file.quarantine_stale_hnsw): quarantine on off-by-one, leave-alone on consistent counts, leave-alone on byte-sniff-pass + unparseable (the 0.6.x path).53/53 in
test_backends.pyand 1469/1469 in the full suite pass. ruff clean. No dependency or lockfile changes.