fix: persistent daemon to prevent zombie processes and enable multi-session access (closes #1229) #1270
Open
Vasanth19 wants to merge 1 commit into MemPalace:develop from
…ession access (closes MemPalace#1229)

Introduces a daemon + bridge architecture so all AI agent sessions share a single long-lived MemPalace process rather than each spawning their own MCP server. This eliminates two failure modes described in MemPalace#1229:

1. Zombie processes: SIGKILL bypasses Python atexit/trap cleanup, leaving stale PID files that block every subsequent session from connecting. The daemon outlives any individual session; only the bridge (a 60-line relay) dies with the session.
2. Concurrent ChromaDB writer corruption: multiple PersistentClient holders racing on the HNSW mmap files cause the sqlite metadata and in-memory index to diverge (see also MemPalace#1222). The daemon serialises all tools/call requests through a threading.Lock, giving a single-writer guarantee.

Files added under examples/:
- mempalace-daemon.py: persistent Unix socket MCP server (LaunchAgent target)
- mempalace-bridge.py: lightweight stdio<->socket relay (MCP command per session)
- com.mempalace.daemon.plist: macOS LaunchAgent template; auto-restarts on crash

Docs added:
- docs/multi-session-daemon.md: problem description, architecture, install steps, and MCP config examples for Claude Code, Codex, Gemini CLI, and generic clients.

No changes to the core mempalace package or mcp_server.py. The daemon imports handle_request() directly, making this purely additive.

Tested on macOS 14/15, Python 3.11/3.12, MemPalace 3.3.x with four concurrent sessions (Claude Code + Codex + Gemini CLI + GG).

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
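To make the bridge's role concrete, here is a minimal sketch of a stdio<->socket relay. It is not the code in examples/mempalace-bridge.py: the function names (pump_to_socket, pump_from_socket, relay) are hypothetical, and the sketch assumes the daemon closes its side of the connection once the session's input ends.

```python
import socket
import threading


def pump_to_socket(src, sock):
    """Forward each line from the session's stdin-like stream to the daemon."""
    for line in src:
        sock.sendall(line)
    # Half-close so the daemon sees EOF while replies can still flow back.
    sock.shutdown(socket.SHUT_WR)


def pump_from_socket(sock, dst):
    """Forward daemon replies to the session's stdout-like stream."""
    while True:
        chunk = sock.recv(65536)
        if not chunk:  # the daemon closed its side
            break
        dst.write(chunk)
        dst.flush()


def relay(src, dst, sock):
    """Run both directions until the input ends and the daemon stops replying."""
    reader = threading.Thread(target=pump_from_socket, args=(sock, dst), daemon=True)
    reader.start()
    pump_to_socket(src, sock)
    reader.join()
```

In an actual bridge, sock would presumably be an AF_UNIX stream socket connected to the daemon's socket path (~/.mempalace/mcp.sock in the test plan), with sys.stdin.buffer and sys.stdout.buffer as the two streams. Because the relay holds no state of its own, killing it with the session leaves the daemon untouched.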
Problem
Closes #1229.
The current mempalace-mcp / mempalace.mcp_server model runs one Python process per agent session. In practice, developers run Claude Code, Codex, Gemini CLI, and GG simultaneously. This exposes two compounding failure modes.

Failure 1: Zombie processes after SIGKILL
MCP hosts sometimes force-quit sessions with SIGKILL. Python's atexit hooks and signal.signal(SIGTERM) handlers never run on SIGKILL, so the process exits without releasing the PID file. The next session sees a stale PID file, decides another instance is running, and refuses to start. Every MCP tool call then returns "Connection closed". Manual PID-file removal is the only recovery, and it lasts only until the next SIGKILL.

Failure 2: Concurrent ChromaDB writers corrupt HNSW
When multiple sessions each hold an open PersistentClient against the same chroma.sqlite3 and simultaneously call upsert(), the writes interleave at the mmap level. The in-memory HNSW index and the on-disk sqlite metadata diverge, which is exactly the divergence that #1222 detects but cannot prevent. The only safe fix is a single process owning the ChromaDB connection.

Solution: Daemon + Bridge Architecture
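A minimal sketch of the single-writer idea, assuming a dispatcher that takes a lock only for tools/call and leaves other methods lock-free. The handle_request here is a hypothetical stand-in that merely counts overlapping calls; the real daemon imports handle_request() from mcp_server.py, whose signature is not shown in this PR.

```python
import threading
import time

_tool_lock = threading.Lock()
_active = 0       # tool calls currently inside the handler
max_active = 0    # highest overlap observed


def handle_request(request):
    """Hypothetical stand-in for the handler the daemon imports."""
    global _active, max_active
    if request["method"] == "tools/call":
        _active += 1
        max_active = max(max_active, _active)
        time.sleep(0.01)  # pretend to write to the store
        _active -= 1
    return {"jsonrpc": "2.0", "id": request.get("id"), "result": {}}


def dispatch(request):
    """Serialise tool calls; let protocol messages through without the lock."""
    if request.get("method") == "tools/call":
        with _tool_lock:
            return handle_request(request)
    return handle_request(request)


def run_concurrent_calls(n=8):
    """Fire n tool calls from separate threads and report the peak overlap."""
    threads = [
        threading.Thread(
            target=dispatch,
            args=({"jsonrpc": "2.0", "id": i, "method": "tools/call"},),
        )
        for i in range(n)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return max_active
```

With the lock in place the peak overlap stays at one regardless of how many sessions call in at once, which is the property the store needs. A single coarse lock trades write throughput for simplicity; that seems a reasonable trade when the alternative is index corruption.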
- tools/call requests are serialized through a threading.Lock inside the daemon. Protocol messages (initialize, tools/list, ping) remain lock-free.
- … ThrottleInterval seconds (default: 5).

Changes
Four files added, with no changes to the core mempalace package:

- examples/mempalace-daemon.py
- examples/mempalace-bridge.py
- examples/com.mempalace.daemon.plist (… launchctl load)
- docs/multi-session-daemon.md

Test Plan
- mempalace-daemon.py starts and creates ~/.mempalace/mcp.sock
- mempalace-bridge.py connects and relays tools/list / mempalace_status correctly
- Concurrent mempalace_add_drawer calls succeed without HNSW divergence
- No change to mcp_server.py behavior for users not using the daemon

Tested on

- macOS 14/15
- Python 3.11/3.12
- MemPalace 3.3.x
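The claim behind Failure 1, that atexit cleanup does not run on SIGKILL, can be reproduced in isolation. The sketch below is a POSIX-only illustration with a hypothetical demo.pid file; it does not use MemPalace's actual PID-file path or startup code.

```python
import os
import signal
import subprocess
import sys
import tempfile

# Child script: write a PID file, register its removal with atexit, then idle.
CHILD = """
import atexit, os, sys, time
pid_path = sys.argv[1]
with open(pid_path, "w") as f:
    f.write(str(os.getpid()))
atexit.register(os.remove, pid_path)
print("ready", flush=True)
time.sleep(60)
"""


def pid_file_survives_sigkill():
    """Return True if the child's PID file is still present after SIGKILL."""
    with tempfile.TemporaryDirectory() as tmp:
        pid_path = os.path.join(tmp, "demo.pid")  # hypothetical file name
        child = subprocess.Popen(
            [sys.executable, "-c", CHILD, pid_path],
            stdout=subprocess.PIPE,
        )
        child.stdout.readline()            # wait until the PID file is written
        child.send_signal(signal.SIGKILL)  # cannot be caught or handled
        child.wait()
        child.stdout.close()
        return os.path.exists(pid_path)
```

The file outlives the process because the kernel terminates it before the interpreter can run any exit hooks. That is why the daemon design moves long-lived state out of the per-session process instead of trying to harden its cleanup.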
🤖 Generated with Claude Code