fix: persistent daemon to prevent zombie processes and enable multi-session access (closes #1229) #1270

Open
Vasanth19 wants to merge 1 commit into MemPalace:develop from Vasanth19:fix/persistent-daemon-multi-session
@Vasanth19
Problem

Closes #1229.

The current mempalace-mcp / mempalace.mcp_server model runs one Python process per agent session. In practice, developers run Claude Code, Codex, Gemini CLI, and GG simultaneously. This exposes two compounding failure modes.

Failure 1 — Zombie processes after SIGKILL

MCP hosts sometimes force-quit sessions with SIGKILL. Python's atexit and signal.signal(SIGTERM) cleanup never fires on SIGKILL, so the process exits without releasing the PID file. The next session sees a stale PID file, decides another instance is running, and refuses to start. Every MCP tool call returns "Connection closed". Manual PID-file removal is the only recovery — until it happens again.
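
To make the failure concrete, here is a minimal reproduction sketch. It is not code from this PR: the child script, the `demo.pid` file name, and the function name are hypothetical. It assumes a POSIX system and shows that an `atexit` hook registered to remove a PID file does not run when the process receives SIGKILL.

```python
import signal
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for a per-session server that cleans up via atexit.
CHILD = """
import atexit, os, sys, time
pid_file = sys.argv[1]
with open(pid_file, "w") as fh:
    fh.write(str(os.getpid()))
atexit.register(os.remove, pid_file)  # cleanup that SIGKILL bypasses
print("ready", flush=True)
time.sleep(60)
"""


def pid_file_survives_sigkill() -> bool:
    """Return True if the child's PID file is still on disk after SIGKILL."""
    with tempfile.TemporaryDirectory() as tmp:
        pid_file = Path(tmp) / "demo.pid"
        child = subprocess.Popen(
            [sys.executable, "-c", CHILD, str(pid_file)],
            stdout=subprocess.PIPE,
            text=True,
        )
        child.stdout.readline()  # wait until the child has written the PID file
        child.send_signal(signal.SIGKILL)  # cannot be caught or handled
        child.wait()
        child.stdout.close()
        return pid_file.exists()
```

On a POSIX system, `pid_file_survives_sigkill()` should return `True`: the file outlives the process, which is the stale state the next session trips over.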

Failure 2 — Concurrent ChromaDB writers corrupt HNSW

When multiple sessions each hold an open PersistentClient against the same chroma.sqlite3 and call upsert() simultaneously, the writes interleave at the mmap level. The in-memory HNSW index and the on-disk sqlite metadata diverge, which is exactly the divergence that #1222 detects but cannot prevent. The only safe fix is a single process owning the ChromaDB connection.

Solution — Daemon + Bridge Architecture

macOS LaunchAgent
  └── mempalace-daemon.py   (one process, holds ChromaDB, listens on ~/.mempalace/mcp.sock)
        ├── Claude Code  ←→  mempalace-bridge.py  ←→  socket
        ├── Codex        ←→  mempalace-bridge.py  ←→  socket
        ├── Gemini CLI   ←→  mempalace-bridge.py  ←→  socket
        └── GG           ←→  mempalace-bridge.py  ←→  socket
  • No zombie problem. If a session is SIGKILL'd, only the bridge dies. The daemon keeps running; the next session connects within milliseconds.
  • No concurrent writer corruption. All tools/call requests are serialized through a threading.Lock inside the daemon. Protocol messages (initialize, tools/list, ping) remain lock-free.
  • Auto-start on first use. The bridge detects a missing socket and starts the daemon automatically.
  • LaunchAgent keeps it alive. If the daemon crashes, launchd restarts it within ThrottleInterval seconds (default: 5).
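
The serialization rule in the second bullet can be sketched as follows. This is an illustration under assumptions, not the daemon's actual code: `dispatch`, `LOCKED_METHODS`, and the `handler` parameter are hypothetical names, and `handler` stands in for whatever function the daemon uses to process a JSON-RPC request.

```python
import threading
from typing import Any, Callable

# Only tool invocations touch ChromaDB, so only they take the lock.
LOCKED_METHODS = {"tools/call"}
_tool_lock = threading.Lock()


def dispatch(request: dict[str, Any], handler: Callable[[dict[str, Any]], Any]) -> Any:
    """Run handler(request), serializing tools/call across all client threads."""
    if request.get("method") in LOCKED_METHODS:
        with _tool_lock:  # at most one tools/call runs at any moment
            return handler(request)
    # initialize, tools/list, ping and other protocol messages skip the lock
    return handler(request)
```

Because every client connection funnels through the same lock, a slow tool call delays other tool calls but never blocks a `ping` or `tools/list` from another session.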

Changes

Four files added — no changes to the core mempalace package:

| File | Description |
| --- | --- |
| examples/mempalace-daemon.py | Persistent Unix socket MCP server; the LaunchAgent target |
| examples/mempalace-bridge.py | ~60-line stdio↔socket relay; this is the MCP command each session uses |
| examples/com.mempalace.daemon.plist | macOS LaunchAgent template (edit paths, then launchctl load) |
| docs/multi-session-daemon.md | Full explanation of the problem, architecture, install steps, and MCP config examples for Claude Code, Codex, Gemini CLI, and any stdio client |
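
For readers who want a feel for what a stdio↔socket relay involves, here is a rough sketch. It is not the contents of mempalace-bridge.py: the function names are hypothetical, and it assumes newline-delimited JSON messages in both directions.

```python
import socket
import threading
from typing import BinaryIO


def pump_to_socket(src: BinaryIO, sock: socket.socket) -> None:
    """Forward each line from the MCP host (stdin) to the daemon."""
    for line in src:
        sock.sendall(line)
    sock.shutdown(socket.SHUT_WR)  # signal that no more requests will follow


def pump_from_socket(sock: socket.socket, dst: BinaryIO) -> None:
    """Forward each line from the daemon back to the MCP host (stdout)."""
    with sock.makefile("rb") as stream:
        for line in stream:
            dst.write(line)
            dst.flush()


def relay(sock: socket.socket, stdin: BinaryIO, stdout: BinaryIO) -> None:
    """Copy both directions until the host closes stdin and the daemon hangs up."""
    uplink = threading.Thread(target=pump_to_socket, args=(stdin, sock), daemon=True)
    uplink.start()
    pump_from_socket(sock, stdout)
    uplink.join()
```

A real bridge would connect an `AF_UNIX` stream socket to `~/.mempalace/mcp.sock` and call `relay(sock, sys.stdin.buffer, sys.stdout.buffer)`. Keeping the bridge this thin is what makes it safe to kill: it holds no state that the daemon needs.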

Test Plan

  • Confirm mempalace-daemon.py starts and creates ~/.mempalace/mcp.sock
  • Confirm mempalace-bridge.py connects and relays tools/list / mempalace_status correctly
  • SIGKILL a bridge process; confirm daemon keeps running and next session connects cleanly
  • Run two bridge sessions concurrently; confirm both mempalace_add_drawer calls succeed without HNSW divergence
  • Unload LaunchAgent; confirm daemon stops; confirm bridge auto-restarts it on next connection attempt
  • Verify no changes to existing mcp_server.py behavior for users not using the daemon
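
Several of these checks come down to asking whether a daemon is actually listening on the socket path. A small probe along these lines could help when running the plan by hand. It is a hypothetical helper, not part of the PR, and it treats a leftover socket file with no listener as "not live".

```python
import socket
import stat
from pathlib import Path


def daemon_socket_is_live(path: Path, timeout: float = 1.0) -> bool:
    """Return True only if path is a Unix socket that accepts a connection."""
    try:
        if not stat.S_ISSOCK(path.stat().st_mode):
            return False  # something exists at the path, but it is not a socket
    except FileNotFoundError:
        return False  # no socket file: the daemon has not started
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as probe:
        probe.settimeout(timeout)
        try:
            probe.connect(str(path))
        except OSError:
            return False  # stale socket file: nothing is listening behind it
    return True
```

For example, `daemon_socket_is_live(Path.home() / ".mempalace" / "mcp.sock")` can be run before and after unloading the LaunchAgent to confirm the daemon stopped and was later restarted.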

Tested on

  • macOS 14 Sonoma / macOS 15 Sequoia
  • Python 3.11, 3.12
  • MemPalace 3.3.x (ChromaDB 0.6.x)
  • Concurrent sessions: Claude Code + Codex + Gemini CLI + GG

🤖 Generated with Claude Code

…ession access (closes MemPalace#1229)

Introduces a daemon + bridge architecture so all AI agent sessions share a
single long-lived MemPalace process rather than each spawning their own
MCP server. This eliminates two failure modes described in MemPalace#1229:

1. Zombie processes: SIGKILL bypasses Python atexit/signal-handler cleanup, leaving
   stale PID files that block every subsequent session from connecting.
   The daemon outlives any individual session; only the bridge (a 60-line
   relay) dies with the session.

2. Concurrent ChromaDB writer corruption: multiple PersistentClient holders
   racing on the HNSW mmap files cause the sqlite metadata and in-memory
   index to diverge (see also MemPalace#1222). The daemon serialises all tools/call
   requests through a threading.Lock, giving a single-writer guarantee.

Files added under examples/:
- mempalace-daemon.py   — persistent Unix socket MCP server (LaunchAgent target)
- mempalace-bridge.py   — lightweight stdio<->socket relay (MCP command per session)
- com.mempalace.daemon.plist — macOS LaunchAgent template; auto-restarts on crash

Docs added:
- docs/multi-session-daemon.md — problem description, architecture, install steps,
  and MCP config examples for Claude Code, Codex, Gemini CLI, and generic clients.

No changes to the core mempalace package or mcp_server.py — the daemon
imports handle_request() directly, making this purely additive.

Tested on macOS 14/15, Python 3.11/3.12, MemPalace 3.3.x with four concurrent
sessions (Claude Code + Codex + Gemini CLI + GG).

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>