MemPalace · Vasanth19 · Apr 29, 2026
@@ -0,0 +1,218 @@
+# Multi-Session MCP Daemon — Persistent MemPalace for All Your AI Agents
+
+**Fixes:** #1229 — zombie MCP server processes blocking sessions and corrupting ChromaDB
+
+## The Problem
+
+The default `mempalace-mcp` command (which calls `mempalace.mcp_server`) is designed
+to run as one process per agent session. When you run Claude Code, Codex, Gemini CLI,
+and GG simultaneously, each session spawns its own Python process — and each process
+holds an open ChromaDB `PersistentClient`.
+
+This creates two compounding failure modes:
+
+### 1. Zombie processes after SIGKILL
+
+MCP host applications (Claude Desktop, VS Code, terminal multiplexers) sometimes
+force-quit sessions with `SIGKILL` rather than `SIGTERM`. Python's `atexit` handlers
+and `signal.signal(SIGTERM, ...)` cleanup traps never fire on `SIGKILL`. The kernel
+closes the dead process's file descriptors, but nothing deletes the PID file itself,
+so `~/.mempalace/mcp-server.pid` is left behind on disk.
+
+On the next session start, a PID-file guard sees a stale PID file, decides another
+instance is running, and refuses to start. The session connects to nothing, and every
+MCP tool call returns "Connection closed".
+
+Removing the stale PID file by hand is the only recovery — until it happens again.
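A guard can tell a stale PID file from a live one by probing the recorded PID with signal 0, which checks for existence without delivering a signal. The sketch below is a minimal illustration of that idea, not the guard MemPalace ships; `pid_is_alive` and `pid_file_blocks_start` are hypothetical names. PID reuse can still produce a false positive, so treat this as a mitigation rather than a guarantee.

```python
import os
from pathlib import Path


def pid_is_alive(pid: int) -> bool:
    """Return True if a process with this PID currently exists."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 performs the existence check only
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def pid_file_blocks_start(pid_file: Path) -> bool:
    """Hypothetical guard: block startup only if the file names a live process."""
    try:
        pid = int(pid_file.read_text().strip())
    except (FileNotFoundError, ValueError):
        return False  # missing or unreadable PID file: nothing to wait for
    return pid_is_alive(pid)
```

With a check like this, a PID file left behind by a `SIGKILL` no longer blocks the next session, because the recorded PID is no longer running.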
+
+### 2. Concurrent ChromaDB writers corrupt HNSW
+
+ChromaDB's HNSW index uses memory-mapped segment files. When two processes both hold
+a `PersistentClient` against the same `chroma.sqlite3` and both call `upsert()`, the
+writes interleave at the mmap level. This causes the in-memory HNSW graph and the on-disk
+SQLite metadata to diverge, which is exactly the divergence that issue #1222 documents
+and detects.
+
+The `hnsw_capacity_status` probe (introduced in #1222) can detect this after the fact,
+but it cannot prevent it. The only safe fix is ensuring **a single process owns the
+ChromaDB connection** at all times.
+
+## The Solution: Daemon + Bridge Architecture
+
+Instead of each session spawning its own `mcp_server.py` process, a single **daemon**
+process runs continuously (managed by macOS LaunchAgent), holds the ChromaDB connection,
+and serves all sessions over a Unix socket. Each agent session runs a tiny **bridge**
+script that relays its stdio to/from the daemon socket.
+
+```
+macOS LaunchAgent
+  └── mempalace-daemon.py    (one process, holds ChromaDB, listens on ~/.mempalace/mcp.sock)
+        ├── Claude Code session  ←→  mempalace-bridge.py  ←→  socket
+        ├── Codex session        ←→  mempalace-bridge.py  ←→  socket
+        ├── Gemini CLI session   ←→  mempalace-bridge.py  ←→  socket
+        └── GG session           ←→  mempalace-bridge.py  ←→  socket
+```
+
+**Benefits:**
+
+- **No zombie problem.** If a session is SIGKILL'd, only the bridge dies. The daemon
+  keeps running; the socket stays open; the next session connects immediately.
+- **No concurrent writer corruption.** All `tools/call` requests are serialised through
+  a single `threading.Lock` inside the daemon. Protocol messages (`initialize`,
+  `tools/list`, `ping`) are lock-free for speed.
+- **Auto-start on first use.** The bridge detects a missing socket and starts the
+  daemon automatically, so you don't have to think about it.
+- **LaunchAgent keeps it alive.** If the daemon crashes (e.g. OOM), launchd restarts
+  it, waiting at least `ThrottleInterval` seconds between launches (5 in the provided
+  plist; launchd's own default is 10).
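The locking rule in the second bullet can be sketched as follows. The daemon source is not shown in this section, so this is an illustration under assumptions rather than its actual code: `dispatch`, `SERIALISED_METHODS`, and the `handler` callback are hypothetical names. Only `tools/call` takes the lock; every other method runs without it.

```python
import threading

# Assumption: only tool invocations touch ChromaDB and need serialising.
SERIALISED_METHODS = {"tools/call"}

_tool_lock = threading.Lock()


def dispatch(request: dict, handler) -> dict:
    """Route one JSON-RPC request, holding the lock only for tool calls."""
    if request.get("method") in SERIALISED_METHODS:
        with _tool_lock:  # one ChromaDB operation at a time
            return handler(request)
    # Protocol messages (initialize, tools/list, ping) skip the lock.
    return handler(request)
```

A single coarse lock is the simplest way to guarantee one writer. The trade-off is that a slow tool call delays every other session's tool calls, though not their protocol traffic.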
+
+## Files
+
+Three files are provided in the `examples/` directory:
+
+| File | Purpose |
+|------|---------|
+| `mempalace-daemon.py` | The persistent server process — run once via LaunchAgent |
+| `mempalace-bridge.py` | Per-session stdio relay — this is the MCP command you configure |
+| `com.mempalace.daemon.plist` | macOS LaunchAgent template — edit paths, then `launchctl load` |
+
+## Installation
+
+### Step 1 — Copy the scripts
+
+```bash
+# Copy to wherever you keep your local tooling.
+# The bridge looks for the daemon script relative to its own location,
+# so keep both files in the same directory.
+cp examples/mempalace-daemon.py ~/bin/mempalace-daemon.py
+cp examples/mempalace-bridge.py ~/bin/mempalace-bridge.py
+chmod +x ~/bin/mempalace-daemon.py ~/bin/mempalace-bridge.py
+```
+
+### Step 2 — Edit the LaunchAgent plist
+
+Open `examples/com.mempalace.daemon.plist` and replace:
+
+- `/path/to/mempalace/venv/bin/python` — the Python interpreter in your MemPalace
+  virtualenv (run `which python` inside the venv to get the path)
+- `/path/to/mempalace-daemon.py` — the absolute path where you copied the daemon script
+- `YOUR_USERNAME` — your macOS username (`echo $USER`)
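If you would rather generate the plist than edit placeholders by hand, the standard-library `plistlib` module can write it. The sketch below mirrors the keys of the provided `com.mempalace.daemon.plist` template; `build_daemon_plist` is a hypothetical helper, and the paths passed to it are yours to supply.

```python
import plistlib
from pathlib import Path


def build_daemon_plist(python: str, daemon_script: str, home: Path) -> bytes:
    """Return XML plist bytes with the same keys as the provided template."""
    log = str(home / ".mempalace" / "daemon.log")
    job = {
        "Label": "com.mempalace.daemon",
        "ProgramArguments": [python, daemon_script],
        "RunAtLoad": True,
        "KeepAlive": True,
        "ThrottleInterval": 5,
        "StandardOutPath": log,
        "StandardErrorPath": log,
        "WorkingDirectory": str(home),
    }
    return plistlib.dumps(job)  # XML is plistlib's default output format


# Example usage (not executed here):
# target = Path.home() / "Library/LaunchAgents/com.mempalace.daemon.plist"
# target.write_bytes(build_daemon_plist(venv_python, script_path, Path.home()))
```

Deriving the log and working-directory paths from `home` removes the need to substitute `YOUR_USERNAME` manually.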
+
+### Step 3 — Install and load the LaunchAgent
+
+```bash
+cp examples/com.mempalace.daemon.plist ~/Library/LaunchAgents/com.mempalace.daemon.plist
+launchctl load ~/Library/LaunchAgents/com.mempalace.daemon.plist
+```
+
+Verify it started:
+
+```bash
+# The socket should exist within a few seconds
+ls -la ~/.mempalace/mcp.sock
+
+# Check the log
+tail -f ~/.mempalace/daemon.log
+```
+
+### Step 4 — Configure each agent to use the bridge
+
+Replace every `mempalace-mcp` / `mempalace.mcp_server` invocation with the bridge.
+
+#### Claude Code (`~/.claude.json`)
+
+```json
+{
+  "mcpServers": {
+    "mempalace": {
+      "type": "stdio",
+      "command": "/absolute/path/to/python",
+      "args": ["/absolute/path/to/mempalace-bridge.py"]
+    }
+  }
+}
+```
+
+Or register via CLI:
+
+```bash
+claude mcp add mempalace -- /path/to/python /path/to/mempalace-bridge.py
+```
+
+#### Codex (`~/.codex/config.toml`)
+
+```toml
+[mcp_servers.mempalace]
+command = "/absolute/path/to/python"
+args    = ["/absolute/path/to/mempalace-bridge.py"]
+```
+
+#### Gemini CLI (`~/.gemini/settings.json`)
+
+```json
+{
+  "mcpServers": {
+    "mempalace": {
+      "command": "/absolute/path/to/python",
+      "args": ["/absolute/path/to/mempalace-bridge.py"]
+    }
+  }
+}
+```
+
+Or via CLI:
+
+```bash
+gemini mcp add mempalace /absolute/path/to/python /absolute/path/to/mempalace-bridge.py --scope user
+```
+
+#### Any other stdio MCP client
+
+The bridge is a generic stdio relay. Any client that accepts a `command` + `args`
+MCP configuration can use it:
+
+```
+command: /path/to/python
+args:    ["/path/to/mempalace-bridge.py"]
+```
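Because every client needs the same two absolute paths, a small script can print the entry for you. This is a convenience sketch with a hypothetical helper name, `bridge_entry`. It deliberately uses `os.path.abspath` rather than `Path.resolve()`, since resolving symlinks on a virtualenv interpreter would point at the base Python and lose the venv's packages.

```python
import json
import os
import sys


def bridge_entry(bridge_script: str, python: str = sys.executable) -> dict:
    """Build a stdio MCP server entry using absolute, unresolved paths."""
    return {
        "type": "stdio",
        # abspath keeps venv symlinks intact; resolve() would follow them.
        "command": os.path.abspath(os.path.expanduser(python)),
        "args": [os.path.abspath(os.path.expanduser(bridge_script))],
    }


if __name__ == "__main__":
    entry = bridge_entry("~/bin/mempalace-bridge.py")
    print(json.dumps({"mcpServers": {"mempalace": entry}}, indent=2))
```

Run it with the virtualenv's interpreter so that `sys.executable` is the Python you want the bridge to use. Clients that do not recognise the `type` key may need it removed.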
+
+## Troubleshooting
+
+### "MemPalace daemon not reachable"
+
+The bridge tried 20 times (5 seconds total) to connect and failed.
+
+1. Check the daemon log: `tail ~/.mempalace/daemon.log`
+2. Check launchd status: `launchctl list | grep mempalace`
+3. Try starting the daemon manually to see startup errors:
+   ```bash
+   /path/to/python /path/to/mempalace-daemon.py
+   ```
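To separate "daemon is down" from "bridge is misconfigured", you can probe the socket directly. The sketch below assumes the daemon speaks newline-delimited JSON-RPC on the socket (the MCP stdio framing, which the bridge relays byte for byte) and that it answers `ping` without a prior `initialize`; both are assumptions about a daemon not shown here, and `ping_daemon` is a hypothetical helper.

```python
import json
import socket
from pathlib import Path


def ping_daemon(sock_path: Path, timeout: float = 2.0) -> dict:
    """Send one JSON-RPC ping over the Unix socket and return the reply."""
    request = {"jsonrpc": "2.0", "id": 1, "method": "ping"}
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        s.connect(str(sock_path))
        s.sendall(json.dumps(request).encode() + b"\n")
        buf = b""
        while b"\n" not in buf:
            chunk = s.recv(65536)
            if not chunk:
                break  # peer closed the connection
            buf += chunk
    if not buf:
        raise ConnectionError("daemon closed the connection without replying")
    return json.loads(buf.split(b"\n", 1)[0])


# Example usage (not executed here):
# print(ping_daemon(Path.home() / ".mempalace" / "mcp.sock"))
```

A `FileNotFoundError` or `ConnectionRefusedError` from `connect` means nothing is listening; a timeout means the daemon accepted the connection but did not answer.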
+
+### LaunchAgent not starting
+
+- Verify plist syntax: `plutil ~/Library/LaunchAgents/com.mempalace.daemon.plist`
+- Reload: `launchctl unload ~/Library/LaunchAgents/com.mempalace.daemon.plist && launchctl load ~/Library/LaunchAgents/com.mempalace.daemon.plist`
+- Check Console.app for launchd errors.
+
+### Stale socket from a previous crash
+
+If the daemon crashed without cleanup, the socket file may still exist but be dead.
+The daemon removes a stale socket automatically on startup (`SOCK_PATH.unlink()` before
+`server.bind()`), so restarting the daemon is sufficient:
+
+```bash
+launchctl kickstart -k gui/$(id -u)/com.mempalace.daemon
+```
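The unlink-before-bind step can be sketched as below. Note that this is a slightly more defensive variant than the unconditional `SOCK_PATH.unlink()` described above, not the daemon's actual code: it first probes the existing socket and only removes it if nothing is listening, so a second daemon cannot steal the path from a healthy one. `bind_unix_socket` is a hypothetical name.

```python
import socket
from pathlib import Path


def bind_unix_socket(sock_path: Path, backlog: int = 16) -> socket.socket:
    """Bind a listening Unix socket, replacing a dead socket file if present."""
    if sock_path.exists():
        probe = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            probe.connect(str(sock_path))
        except (ConnectionRefusedError, FileNotFoundError):
            # Nobody is listening: the file is a leftover from a crash.
            sock_path.unlink(missing_ok=True)
        else:
            raise RuntimeError(f"another daemon is already listening on {sock_path}")
        finally:
            probe.close()
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(str(sock_path))
    server.listen(backlog)
    return server
```

The probe leaves a small race window between the check and the bind, which is acceptable when launchd is the only thing starting the daemon.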
+
+### Reverting to the single-process mode
+
+Remove or unload the LaunchAgent, then update your MCP configurations back to
+`mempalace-mcp` or `python -m mempalace.mcp_server`. The daemon and bridge are
+additive — they do not modify the core `mcp_server.py`.
+
+## Tested on
+
+- macOS 14 Sonoma
+- macOS 15 Sequoia
+- Python 3.11 / 3.12
+- MemPalace 3.3.x (ChromaDB 0.6.x)
+- Concurrent sessions: Claude Code + Codex + Gemini CLI + GG
@@ -0,0 +1,26 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
+<plist version="1.0">
+<dict>
+    <key>Label</key>
+    <string>com.mempalace.daemon</string>
+    <key>ProgramArguments</key>
+    <array>
+        <!-- Replace with your actual Python path: `which python` in the mempalace venv -->
+        <string>/path/to/mempalace/venv/bin/python</string>
+        <string>/path/to/mempalace-daemon.py</string>
+    </array>
+    <key>RunAtLoad</key>
+    <true/>
+    <key>KeepAlive</key>
+    <true/>
+    <key>ThrottleInterval</key>
+    <integer>5</integer>
+    <key>StandardOutPath</key>
+    <string>/Users/YOUR_USERNAME/.mempalace/daemon.log</string>
+    <key>StandardErrorPath</key>
+    <string>/Users/YOUR_USERNAME/.mempalace/daemon.log</string>
+    <key>WorkingDirectory</key>
+    <string>/Users/YOUR_USERNAME</string>
+</dict>
+</plist>
@@ -0,0 +1,100 @@
+"""
+mempalace-bridge.py — Lightweight stdio<->socket relay for MemPalace.
+
+Configure each AI agent (Claude Code, Codex, Gemini CLI, etc.) to run this
+script as the MCP command instead of mempalace.mcp_server directly.
+It auto-starts the daemon on first use.
+
+MCP config example (Claude Code ~/.claude.json):
+  "mempalace": {
+    "type": "stdio",
+    "command": "/path/to/python",
+    "args": ["/path/to/mempalace-bridge.py"]
+  }
+"""
+import os
+import socket
+import subprocess
+import sys
+import threading
+import time
+from pathlib import Path
+
+PALACE_DIR    = Path(os.environ.get("MEMPALACE_PALACE", Path.home() / ".mempalace"))
+SOCK_PATH     = PALACE_DIR / "mcp.sock"
+DAEMON_PYTHON = sys.executable
+DAEMON_SCRIPT = str(Path(__file__).parent / "mempalace-daemon.py")
+
+
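+def _start_daemon():
+    """Launch the daemon detached from this session, logging to daemon.log."""
+    PALACE_DIR.mkdir(parents=True, exist_ok=True)
+    log_path = PALACE_DIR / "daemon.log"
+    # The child gets its own copy of the log descriptor, so the parent's
+    # handle can be closed as soon as Popen returns.
+    with open(log_path, "a") as log:
+        subprocess.Popen(
+            [DAEMON_PYTHON, DAEMON_SCRIPT],
+            stdin=subprocess.DEVNULL,  # do not share the MCP host's stdin pipe
+            stdout=log,
+            stderr=subprocess.STDOUT,
+            close_fds=True,
+            start_new_session=True,  # keep running if the bridge is killed
+        )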
+
+
+def _connect(retries: int = 20, delay: float = 0.25) -> socket.socket:
+    """Connect to the daemon socket, starting the daemon after the first failure."""
+    for i in range(retries):
+        s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
+        try:
+            s.connect(str(SOCK_PATH))
+            return s
+        except (FileNotFoundError, ConnectionRefusedError):
+            s.close()  # release the failed socket before retrying
+            if i == 0:
+                _start_daemon()
+            time.sleep(delay)
+    raise RuntimeError(f"MemPalace daemon not reachable at {SOCK_PATH}")
+
+
+def main():
+    try:
+        sock = _connect()
+    except RuntimeError as e:
+        print(f"[mempalace-bridge] {e}", file=sys.stderr)
+        sys.exit(1)
+
+    stop = threading.Event()
+
+    def pump_in():
+        try:
+            while not stop.is_set():
+                chunk = sys.stdin.buffer.read1(65536)
+                if not chunk:
+                    break
+                sock.sendall(chunk)
+        except Exception:
+            pass
+        finally:
+            stop.set()
+            try:
+                sock.shutdown(socket.SHUT_WR)
+            except Exception:
+                pass
+
+    def pump_out():
+        try:
+            while not stop.is_set():
+                chunk = sock.recv(65536)
+                if not chunk:
+                    break
+                sys.stdout.buffer.write(chunk)
+                sys.stdout.buffer.flush()
+        except Exception:
+            pass
+        finally:
+            stop.set()
+
+    t_in  = threading.Thread(target=pump_in,  daemon=True)
+    t_out = threading.Thread(target=pump_out, daemon=True)
+    t_in.start()
+    t_out.start()
+    t_out.join()
+
+
+if __name__ == "__main__":
+    main()