RFC: Synapse Phase 10–14
Continuation of RFC #595 (Phase 5–9). This RFC proposes five new Synapse pipeline phases that address open, unresolved issues in the repository. All phases are opt-in via RetrievalProfile flags and fully backward-compatible — existing behavior is unchanged when flags are off.
Motivation
Phases 5–9 (PR #596) added MMR deduplication, Pinned Memory, Query Expansion, Supersede Detection, and the Consolidation Engine on top of ChromaDB's default col.query(). However, several critical problems remain open with no PRs addressing them:
| Problem | Addressed by |
| --- | --- |
| Embedding model mismatch between ingest and query — silent search failure | Phase 10: Model Guard |
| The largest wing dominates unscoped search results by sheer vector volume | Phase 11: Cross-Wing Balancing |
| pipeline_trace cannot explain why an individual drawer ranked where it did | Phase 12: Score Explainability |
| The preCompact hook permanently blocks /compact in Claude Code | Phase 13: Adaptive Compaction |
| col.get(limit=10000) silently truncates on large palaces (>10K drawers) | Phase 14: Paginated Scoring |
This RFC addresses all five with new Synapse pipeline phases.
DX Guarantee
Every phase includes developer-experience instrumentation (timing, tracing, dry-run support, replay logging, and test hooks). All DX features are observability-only — they never modify scores, rankings, candidate selection, or any write path. With all DX flags at their defaults, the pipeline produces byte-identical results to a build without these features. This invariant is enforced by a dedicated test (test_dx_flags_off_identical_results) that runs the same query with all DX flags ON and OFF and asserts that the returned drawer IDs, order, and scores match exactly.
| DX capability | Default | When OFF |
| --- | --- | --- |
| Phase-level timing | always on | N/A — ~100 ns overhead per phase; no effect on scores |
| Candidate trace | false | single if guard, zero allocation |
| Dry run | false | not a flag — explicit per-call argument; normal calls unaffected |
| Replay logging | false | no writes beyond existing Query Expansion logging |
| Assertion hooks | none registered | iteration over empty list is a no-op |
Phase 10: Model Guard — Embedding Consistency Validation
Problem
The MCP server relies on ChromaDB's built-in default embedding function (all-MiniLM-L6-v2, 384 dimensions). There is no centralized embedding model configuration. If a user ingests with all-mpnet-base-v2 (768-dim), every MCP query silently returns garbage because 384-dim query vectors are compared against 768-dim stored vectors. The dimensional mismatch makes cosine similarity mathematically meaningless, and even when dimensions match, different models produce incompatible vector spaces.
Approach
Insert a validation gate at the very start of the Synapse pipeline (before Phase 1 LTP scoring).
Build-time stamp: During mempalace mine, write the embedding model name and dimension to collection.metadata (ChromaDB native) and optionally to palace_meta.json. Format: {"embedding_model": "all-mpnet-base-v2", "embedding_dim": 768}.
Query-time check: At the start of search_memories(), read the stored model metadata and compare against the currently loaded model's name and dimension.
On mismatch: Add "model_guard": "MISMATCH" to pipeline_trace and populate a warnings field: "ingest model: all-mpnet-base-v2 (768), query model: all-MiniLM-L6-v2 (384) — results may be unreliable". If RetrievalProfile.model_guard_strict is true, return an empty result set with an error message instead of garbage results.
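A minimal sketch of the query-time comparison, assuming illustrative function and field names (the metadata keys and warning format follow the approach above; the real check lives at the start of search_memories()):

```python
# Hypothetical sketch of the Phase 10 query-time check. The metadata keys
# match the RFC's stamp format; the function and result field names are
# illustrative, not the actual MemPalace API.

def model_guard(collection_metadata: dict, loaded_model: str, loaded_dim: int,
                strict: bool = False) -> dict:
    """Compare the stored ingest-time model stamp against the loaded model."""
    ingest_model = collection_metadata.get("embedding_model")
    ingest_dim = collection_metadata.get("embedding_dim")
    match = (ingest_model == loaded_model) and (ingest_dim == loaded_dim)
    result = {"model_guard": "MATCH" if match else "MISMATCH"}
    if not match:
        result["warnings"] = (
            f"ingest model: {ingest_model} ({ingest_dim}), "
            f"query model: {loaded_model} ({loaded_dim}) — results may be unreliable"
        )
        result["block_search"] = strict  # strict mode returns an empty result set
    return result

meta = {"embedding_model": "all-mpnet-base-v2", "embedding_dim": 768}
print(model_guard(meta, "all-MiniLM-L6-v2", 384)["model_guard"])  # MISMATCH
```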
RetrievalProfile flags
model_guard: bool (default true) — enable/disable the check.
model_guard_strict: bool (default false) — if true, block search on mismatch instead of warning.
Developer experience
Timing: phase_timing_ms.model_guard reports the time spent on metadata read + comparison (typically <2 ms). Always included in pipeline_trace.
Candidate trace: When candidate_trace: true, a mismatch event is recorded as {"phase": "model_guard", "action": "warn" | "block", "ingest_model": "...", "query_model": "...", "dim_ingest": 768, "dim_query": 384}. This lets developers confirm that Model Guard fired and what it detected.
Dry run: dry_run=true executes the full check but never writes to palace_meta.json — useful for verifying detection logic against a test palace without altering its metadata.
Replay: The replay_packet records model_guard_result: "MATCH" | "MISMATCH" so that replayed searches reflect the original model state, even if the model has since been changed.
Test hooks: register_hook("post_model_guard", fn) fires after validation with context {"match": bool, "ingest_model": str, "query_model": str}. Test example:
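A self-contained sketch of such a test; the context shape is the one given above, while the tiny registry stands in for the real hook machinery:

```python
# Illustrative test sketch: a stand-in hook registry simulates how
# register_hook("post_model_guard", fn) fires after validation.
hooks = {}

def register_hook(name, fn):
    hooks.setdefault(name, []).append(fn)

def fire(name, context):
    for fn in hooks.get(name, []):  # empty list → no-op, matching the DX table
        fn(context)

seen = []
register_hook("post_model_guard", seen.append)

# Simulated mismatch between ingest and query models:
fire("post_model_guard", {
    "match": False,
    "ingest_model": "all-mpnet-base-v2",
    "query_model": "all-MiniLM-L6-v2",
})

assert seen[0]["match"] is False
assert seen[0]["ingest_model"] != seen[0]["query_model"]
```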
Impact: Eliminates silent search failure. Low implementation cost (metadata read/write + comparison only). Acts as a safety net for all future embedding model PRs (#515, #553, #756).
Phase 11: Cross-Wing Balancing — Wing-Aware Diversity in MMR
Problem
When a palace has wings of vastly different sizes (e.g., a work wing with hundreds of JSON/YAML files vs. a flights wing with a few travel itineraries), unscoped search returns results exclusively from the largest wing. Searching "cambodia" returns swagger.yaml and schema.rb from work instead of the trip itinerary from flights that explicitly mentions "Cambodia", "Phnom Penh", and "Siem Reap". Similarity scores are negative (-0.53 to -0.58), meaning all results are poor — but the large wing's vectors dominate the nearest-neighbor space simply due to volume.
Approach
Extend Phase 5 (MMR) with a wing-diversity penalty term.
Standard MMR: score(d) = λ · sim(d, q) − (1 − λ) · max over s ∈ S of sim(d, s)
Cross-Wing MMR: score(d) = λ · sim(d, q) − (1 − λ) · max over s ∈ S of sim(d, s) − γ · wing_saturation(d, S)
Where wing_saturation(d, S) = count(d.wing in S) / |S| — the fraction of already-selected results that share the same wing as candidate d.
To ensure small-wing drawers enter the candidate pool, the initial ChromaDB query requests n_results = k × wing_count (e.g., 5 × 20 = 100) instead of the bare k. MMR then selects the final k from this expanded pool.
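The penalty term can be sketched as follows (illustrative: gamma corresponds to the wing_diversity_weight flag; the real implementation extends the existing MMR phase rather than standing alone):

```python
# Sketch of the Phase 11 wing-saturation penalty (illustrative names).

def wing_saturation(candidate_wing: str, selected: list) -> float:
    """Fraction of already-selected results sharing the candidate's wing."""
    if not selected:
        return 0.0
    same = sum(1 for s in selected if s["wing"] == candidate_wing)
    return same / len(selected)

def cross_wing_score(mmr_score: float, candidate_wing: str,
                     selected: list, gamma: float = 0.3) -> float:
    # gamma = 0.0 reduces this to standard MMR
    return mmr_score - gamma * wing_saturation(candidate_wing, selected)

selected = [{"wing": "work"}] * 4 + [{"wing": "flights"}]
# A further 'work' candidate is penalized by gamma * 4/5 = 0.24:
print(round(cross_wing_score(0.71, "work", selected), 2))     # 0.47
print(round(cross_wing_score(0.55, "flights", selected), 2))  # 0.49
```

With the default γ = 0.3, a high-scoring drawer from a saturated wing (0.71 → 0.47) can fall below a lower-scoring drawer from an underrepresented wing, which is exactly the rebalancing the phase is after.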
RetrievalProfile flags
wing_diversity_weight: float (default 0.3) — the γ parameter. Set to 0.0 to disable (equivalent to standard MMR).
candidate_pool_multiplier: int (default 1) — multiplier for the initial n_results. Set to wing_count for full cross-wing coverage.
Developer experience
Timing: phase_timing_ms.cross_wing_balancing reports the time spent computing wing_saturation penalties and re-ranking. Reported separately from phase_timing_ms.mmr so developers can see the incremental cost of the diversity layer.
Candidate trace: When candidate_trace: true, each drawer dropped by Cross-Wing Balancing is logged with the reason:
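For example:

```json
{
  "drawer_id": "abc-123",
  "dropped_at": "cross_wing_balancing",
  "reason": "wing 'work' already saturated (4/5 selected results)",
  "wing_saturation": 0.80,
  "original_mmr_score": 0.71,
  "penalized_score": 0.47
}
```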
This directly answers "why didn't my flights drawer show up?" — because work had already filled 4 of 5 slots.
Dry run: dry_run=true computes all penalties and produces the full candidate trace without persisting any LTP updates that might be triggered by the expanded candidate pool.
Replay: replay_packet records wing_counts_at_query_time: {"work": 4200, "flights": 23, ...} so the saturation math can be reproduced exactly.
Test hooks: register_hook("post_cross_wing", fn) fires with context {"candidates_before": [...], "candidates_after": [...], "wing_saturation_scores": {...}}. Test example:
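A sketch of such a test; the context shape is the one given above, and the one-line registry stands in for the real register_hook API:

```python
# Illustrative test sketch for the post_cross_wing hook.
hooks = {}
register_hook = lambda name, fn: hooks.setdefault(name, []).append(fn)
fire = lambda name, ctx: [fn(ctx) for fn in hooks.get(name, [])]

captured = {}
register_hook("post_cross_wing", captured.update)

# Simulated firing after balancing a 'work'-heavy candidate set:
fire("post_cross_wing", {
    "candidates_before": ["work-1", "work-2", "work-3", "work-4", "work-5"],
    "candidates_after": ["work-1", "work-2", "work-3", "work-4", "flights-1"],
    "wing_saturation_scores": {"work": 0.8, "flights": 0.2},
})

# Assert that at least one small-wing drawer survived balancing:
assert any(d.startswith("flights") for d in captured["candidates_after"])
assert captured["wing_saturation_scores"]["work"] > 0.5
```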
Impact: Resolves the "large room dominates" problem without requiring --room filters. Incremental change to existing MMR logic. Setting γ = 0 produces identical results to current behavior.
Phase 12: Score Explainability — Per-Drawer Score Breakdown
Problem
The current pipeline_trace reports aggregate metrics (phases_applied, phases_skipped, total_candidates_in/out, elapsed_ms) but does not explain why any individual drawer ranked where it did. This makes it impossible to debug ranking issues, validate the pipeline's value-add over raw ChromaDB, or produce transparent benchmarks.
Approach
Attach a score_breakdown object to each returned drawer's metadata.
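Example breakdown attached to a returned drawer (the components sum to final_score):

```json
{
  "score_breakdown": {
    "cosine_similarity": 0.82,
    "ltp_boost": 0.15,
    "mmr_penalty": -0.08,
    "wing_diversity_penalty": -0.02,
    "pinned_boost": 0.04,
    "supersede_penalty": 0.00,
    "final_score": 0.91
  }
}
```

Implementation: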
Each phase emits a ScoreEvent(phase: str, delta: float, reason: str) when it modifies a drawer's score.
Each drawer carries a score_events: list[ScoreEvent] through the pipeline.
At the end of the pipeline, score_events are aggregated into score_breakdown.
score_breakdown is only included when RetrievalProfile.explain is true (default false) to avoid overhead in normal searches.
The mempalace_search MCP tool accepts an explain: bool argument to enable this on a per-call basis.
This follows the Explainable IR methodology described in Anand et al. (arXiv:2211.02405) — pointwise score decomposition across ranking features.
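The mechanics can be sketched as follows (ScoreEvent's fields are the ones listed above; the aggregation helper and its names are illustrative):

```python
# Sketch of Phase 12's ScoreEvent aggregation: each phase emits a delta,
# and the pipeline sums the deltas into score_breakdown at the end.
from dataclasses import dataclass

@dataclass
class ScoreEvent:
    phase: str
    delta: float
    reason: str

def aggregate(base: float, events: list) -> dict:
    """Fold a drawer's score_events into a score_breakdown dict."""
    breakdown = {"cosine_similarity": base}
    for ev in events:
        breakdown[ev.phase] = breakdown.get(ev.phase, 0.0) + ev.delta
    breakdown["final_score"] = round(base + sum(ev.delta for ev in events), 4)
    return breakdown

events = [
    ScoreEvent("ltp_boost", 0.15, "ltp above boost threshold"),
    ScoreEvent("mmr_penalty", -0.08, "similar to drawer xyz-789"),
]
print(aggregate(0.82, events)["final_score"])  # 0.89
```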
RetrievalProfile flags
explain: bool (default false) — include score_breakdown in results.
Developer experience
Phase 12 is itself a DX feature — score_breakdown is the developer's primary debugging tool for ranking questions. The additional DX instrumentation layers on top of it:
Timing: phase_timing_ms.score_explainability reports the aggregation cost of ScoreEvent lists into score_breakdown. When explain: false, this phase is skipped entirely (no timing entry).
Candidate trace: When both candidate_trace: true and explain: true, dropped drawers also include their score_breakdown at the point of exclusion. This answers "drawer X was dropped at MMR — what was its score at that moment?":
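For example:

```json
{
  "drawer_id": "abc-123",
  "dropped_at": "mmr",
  "reason": "similarity to xyz-789 = 0.92 > threshold 0.85",
  "score_at_drop": {
    "cosine_similarity": 0.78,
    "ltp_boost": 0.10,
    "partial_score": 0.88
  }
}
```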
Dry run: dry_run=true with explain=true produces a complete score_breakdown for every candidate (not just the final k), giving a full "what-if" view of the entire ranking.
Replay: replay_packet records the full score_events list for every returned drawer, enabling exact reconstruction of the scoring process after the fact.
Test hooks: register_hook("post_score_event", fn) fires on every individual ScoreEvent emission. This is the finest-grained hook — it enables assertions like "MMR penalty for drawer X was between -0.1 and -0.05":
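A sketch of that assertion; the registry is a stand-in for the real API, and the per-event dict (ScoreEvent fields plus a drawer_id) is an assumed shape:

```python
# Illustrative sketch: the finest-grained hook fires once per ScoreEvent.
hooks = {}
def register_hook(name, fn): hooks.setdefault(name, []).append(fn)
def emit(event):
    for fn in hooks.get("post_score_event", []):
        fn(event)

mmr_penalties = []
register_hook("post_score_event",
              lambda ev: mmr_penalties.append(ev["delta"])
              if ev["phase"] == "mmr_penalty" and ev["drawer_id"] == "drawer-x"
              else None)

emit({"drawer_id": "drawer-x", "phase": "mmr_penalty", "delta": -0.08,
      "reason": "similar to xyz-789"})
emit({"drawer_id": "drawer-x", "phase": "ltp_boost", "delta": 0.15,
      "reason": "frequently accessed"})

# "MMR penalty for drawer X was between -0.1 and -0.05":
assert mmr_penalties and all(-0.1 <= d <= -0.05 for d in mmr_penalties)
```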
Impact: Full transparency into ranking decisions. Enables A/B comparison ("cosine-only" vs. "full pipeline") for benchmarking.
Phase 13: Adaptive Compaction — LTP-Based Memory Compression
Problem
The preCompact hook in Claude Code permanently blocks /compact. When the context window fills up, the hook fires and instructs the model to save everything to MemPalace first — but even after saving, compaction cannot resume because the hook has no mechanism to signal "save complete, proceed with compaction." The root cause is the absence of any criteria for deciding what to keep and what to compress.
Approach
Use Synapse's LTP scores and Supersede Detection to automatically identify compaction candidates.
Trigger: When context token usage exceeds a configurable threshold (e.g., 80%), the Synapse pipeline generates a "compaction candidate list."
Candidate criteria (all must be met):
ltp_score < compaction_threshold (default 0.3) — not important long-term.
is_superseded = true — a newer drawer has replaced this one.
last_accessed > compaction_age_days (default 30) — not recently referenced.
Compaction action: Feed candidates to the Consolidation Engine (Phase 9) for summarization. Original drawers are moved to soft-archive (reversible). The consolidated summary drawer replaces them in active search.
Hook integration: The preCompact hook calls mempalace_session_context to get compaction candidates → executes Consolidation → returns "compaction OK" to Claude Code. This creates an automated flow instead of a permanent block.
Optional decay: Apply a time-based decay factor to LTP scores: decayed_ltp = ltp_score × e^(−λt), where t is days since last access and λ is the decay constant. This models Ebbinghaus's forgetting curve — memories that are never revisited naturally become compaction candidates.
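The decay formula as a minimal sketch (function name illustrative; the defaults match the decay_lambda flag below):

```python
# decayed_ltp = ltp_score * e^(-lambda * t), t in days since last access.
import math

def decayed_ltp(ltp_score: float, days_since_access: float,
                decay_lambda: float = 0.01) -> float:
    return ltp_score * math.exp(-decay_lambda * days_since_access)

# With the default lambda = 0.01, an LTP score halves after roughly
# ln(2) / 0.01 ≈ 69 days, so never-revisited drawers drift under the
# 0.3 compaction threshold on their own.
score = decayed_ltp(0.18, 47)
assert score < 0.18  # decay only ever lowers the score
```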
RetrievalProfile flags
adaptive_compaction: bool (default false) — enable automatic candidate generation.
compaction_threshold: float (default 0.3) — LTP score below which a drawer becomes a candidate.
compaction_age_days: int (default 30) — minimum days since last access.
decay_enabled: bool (default false) — apply forgetting-curve decay to LTP scores.
decay_lambda: float (default 0.01) — decay rate constant.
Developer experience
Timing: phase_timing_ms.adaptive_compaction reports the total time for candidate selection + consolidation planning. Broken into sub-timings: ltp_scan_ms, supersede_check_ms, consolidation_plan_ms.
Candidate trace: This phase's trace is especially critical because compaction removes drawers from active search. When candidate_trace: true, every compaction candidate is logged with full justification:
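For example:

```json
{
  "drawer_id": "old-note-42",
  "action": "compact",
  "reason": "ltp=0.18 < 0.3, superseded_by=new-note-99, last_accessed=47 days ago",
  "ltp_score": 0.18,
  "decayed_ltp": 0.12,
  "superseded_by": "new-note-99",
  "days_since_access": 47,
  "consolidated_into": "summary-drawer-7"
}
```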
Dry run: Essential for this phase. dry_run=true executes the full candidate selection and consolidation planning but performs zero writes — no soft-archive moves, no summary drawer creation, no LTP updates. Returns the complete compaction plan as output so the developer can review what would be compressed before committing. mempalace_consolidate(mode="evaluate") (existing) is extended to work with the auto-generated candidate list.
Replay: replay_packet records compaction_candidates: [...] and compaction_plan: {groups: [...], summary_count: N}. This allows post-mortem analysis: "the compaction on April 15th archived 23 drawers — were they all correctly identified?"
Test hooks: register_hook("post_compaction_plan", fn) fires after candidate selection with context {"candidates": [...], "groups": [...], "would_archive": int, "would_create_summaries": int}. Test example:
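A sketch of such a test; the context shape is the one given above, and the registry is a stand-in for the real API:

```python
# Illustrative sketch of a post_compaction_plan assertion.
hooks = {}
def register_hook(name, fn): hooks.setdefault(name, []).append(fn)

plan = {}
register_hook("post_compaction_plan", plan.update)

# Simulated firing after candidate selection:
for fn in hooks["post_compaction_plan"]:
    fn({"candidates": ["old-note-42", "old-note-43"],
        "groups": [["old-note-42", "old-note-43"]],
        "would_archive": 2,
        "would_create_summaries": 1})

# A test can then assert the plan is internally consistent, e.g. that
# counts line up before any writes are allowed:
assert plan["would_archive"] == len(plan["candidates"])
assert plan["would_create_summaries"] <= len(plan["groups"])
```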
Impact: Unblocks the /compact workflow in Claude Code. Introduces cognitively plausible memory management. Dry run + candidate trace make compaction auditable and reversible.
Dependency: Phase 14 (Paginated Scoring) should be implemented first, since Adaptive Compaction needs to scan all drawers to identify candidates across the entire palace.
Phase 14: Paginated Scoring — Large Palace Support
Problem
col.get(limit=10000) is hardcoded in multiple paths (miner.py, mcp_server.py). On palaces with >10,000 drawers, this silently truncates results — mempalace status shows "10,000 drawers" when the actual count is 122,686. Wing/room breakdowns are incomplete (some wings are entirely missing). The same limitation prevents Synapse phases (LTP scoring, Supersede Detection) from scanning the full palace.
Approach
Introduce cursor-based pagination inside the Synapse pipeline.
Accurate count: Use col.count() for the true total. Display this in mempalace_status.
Batched iteration: Iterate with offset + limit (batch size 5,000):
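A sketch of that loop, assuming only Chroma's Collection.get(limit=..., offset=...) and count() calls (FakeCol below is a stand-in so the shape of the iteration is visible without a real palace):

```python
# Sketch of Phase 14's batched iteration with a bounded scan.
def iter_batches(col, batch_size=5000, max_scan_depth=50000):
    total = min(col.count(), max_scan_depth)  # true total, bounded scan
    offset = 0
    while offset < total:
        yield col.get(limit=min(batch_size, total - offset), offset=offset)
        offset += batch_size

# Minimal stand-in collection (illustrative, not ChromaDB):
class FakeCol:
    def __init__(self, n): self.n = n
    def count(self): return self.n
    def get(self, limit, offset):
        return {"ids": [f"d{i}" for i in range(offset, min(offset + limit, self.n))]}

col = FakeCol(122686)
scanned = sum(len(b["ids"]) for b in iter_batches(col, batch_size=5000))
print(scanned)  # 50000 — capped by max_scan_depth, not by a hardcoded limit
```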
Incremental LTP updates: LTP scores are persisted in synapse.sqlite3. Batch processing updates only drawers whose filed_at is newer than the last scan timestamp — no full rescan needed on every call.
Bounded scan: RetrievalProfile.max_scan_depth (default 50000) limits the total number of drawers scanned in a single pipeline run, preventing runaway processing on extremely large palaces.
RetrievalProfile flags
paginated_scoring: bool (default true) — enable paginated iteration.
batch_size: int (default 5000) — number of drawers per batch.
max_scan_depth: int (default 50000) — upper bound on total drawers scanned.
Developer experience
Timing: phase_timing_ms.paginated_scoring reports total iteration time, with sub-timings per batch: batch_timings_ms: [12, 14, 11, 13, ...]. This reveals whether specific batches are slow (e.g., batch 8 takes 200 ms because it hits a large wing), enabling targeted optimization.
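Candidate trace: When candidate_trace: true, each batch reports its contribution to the pipeline:

```json
{
  "batch_index": 3,
  "offset": 15000,
  "drawers_in_batch": 5000,
  "ltp_updated": 127,
  "ltp_skipped_unchanged": 4873,
  "supersede_candidates_found": 4
}
```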
This lets developers see which regions of the palace contain actively changing drawers vs. stable ones, informing batch size tuning.
Dry run: dry_run=true iterates through all batches and computes LTP scores and supersede candidates but writes nothing to synapse.sqlite3. Returns a summary of what would be updated. Useful for estimating the cost of a full-palace scan before committing.
Replay: replay_packet records pagination_state: {total: 122686, batches_processed: 25, last_offset: 122686, scan_depth_reached: false}. This allows developers to confirm that a past search actually covered the full palace or was truncated by max_scan_depth.
Test hooks: register_hook("post_batch", fn) fires after each batch with context {"batch_index": int, "offset": int, "batch_size": int, "ltp_updates": int}. Test example:
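A sketch of such a test; the context shape is the one given above, and the registry and simulated scan are stand-ins:

```python
# Illustrative sketch: post_batch fires once per batch, so a test can
# verify that batches tile the whole palace with no gaps.
hooks = {}
def register_hook(name, fn): hooks.setdefault(name, []).append(fn)

offsets = []
register_hook("post_batch", lambda ctx: offsets.append(ctx["offset"]))

# Simulate a paginated scan of 22,000 drawers in 5,000-drawer batches:
total, batch_size = 22000, 5000
for i, offset in enumerate(range(0, total, batch_size)):
    for fn in hooks["post_batch"]:
        fn({"batch_index": i, "offset": offset,
            "batch_size": min(batch_size, total - offset), "ltp_updates": 0})

# Full-palace coverage: the scan continued past the old 10,000 ceiling.
assert offsets == [0, 5000, 10000, 15000, 20000]
```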
Impact: Correct status display on large palaces. Full Synapse pipeline coverage regardless of palace size. Incremental updates minimize repeated computation. Unblocks Phase 13 (Adaptive Compaction) which requires full-palace scans.
Implementation Order
Each phase will be submitted as a separate PR against develop, following the same pattern as PR #596. Per the dependency noted under Phase 13, Phase 14 (Paginated Scoring) lands before Phase 13 (Adaptive Compaction).
Test Plan
Each phase adds tests to tests/test_synapse_advanced.py and/or new test files:
Phase 10: pipeline_trace output, timing entry, candidate trace event format, dry-run metadata preservation, replay packet model_guard_result, post_model_guard hook firing.
Phase 11: γ = 0 backward compatibility, timing separation from MMR, candidate trace with saturation scores, replay wing counts, post_cross_wing hook context.
Phase 12: score_breakdown structure, explain=false omits it, score component correctness, dropped-drawer score snapshots in candidate trace, full-candidate dry-run output, replay score_events list, post_score_event hook granularity.
Phase 14: max_scan_depth enforcement, per-batch timing, batch candidate trace, dry-run scan summary, replay pagination state, post_batch hook full-palace coverage.
DX invariant: test_dx_flags_off_identical_results — runs the same query with all DX flags ON and OFF, asserts byte-identical drawer IDs, order, and scores.
Target: ~60–80 new tests across all five phases.
References
/compact silently returns without compacting #856 — preCompact hook blocking