> "Intelligence without memory is just a series of disconnected thoughts."
NeuraMemory AI is an open-source, high-performance memory engine designed to give Large Language Models (LLMs) a persistent, cross-session, and model-agnostic long-term memory. It acts like a “second brain,” helping you save, organize, and find information easily, while understanding context, linking related ideas, and giving smart summaries and insights.
Presentation link: https://neuramemory-ai-ea5s372.gamma.site/
Demo Link: https://youtu.be/eSowmleQzQY?si=iYT1t2C6Sxl-blqX
Live Link: https://neura-memory-ai.vercel.app/
Currently, interacting with AI feels like meeting a brilliant person who gets a concussion every time you close the chat window. This creates three critical failures in the developer and user experience:
- **The Context Window "Token Tax":** Every time you start a new session, you have to re-feed the AI your project structure, coding preferences, or personal history. This wastes thousands of tokens and real money on redundant processing. Current context windows are growing, but they are still a "leaky bucket": once the limit is reached, the oldest (and often most important) context is discarded.
- **The Model Silo Problem:** Your "relationship" with an AI is trapped inside a single platform. If you move from Gemini to Claude or a local Llama model, you lose all previous context. There is no interoperable layer for personal or professional AI memory.
- **The "Grounding" Gap:** Generic LLMs lack "Personal Grounding." They know how to write code, but they don't know your specific project's architectural quirks unless you explicitly tell them every single time.
- **Massive Token Efficiency:** By using Retrieval-Augmented Generation (RAG) specifically for personal history, NeuraMemory can represent years of interaction in just a few hundred tokens. It mitigates the "Lost-in-the-Middle" phenomenon by ensuring only the most semantically relevant data is placed in the LLM's working memory.
- **User-Centric Sovereignty:** In a world where big tech companies want to own your "digital twin," NeuraMemory is local-first. You own your memory database. It can be hosted on your local machine or a private server, ensuring that your personal context never becomes someone else's training data.
- **Model Agnosticism:** NeuraMemory is designed with a universal API. Whether you are using a Go-based backend, a React frontend, or a CLI tool, you can hook into the same memory stream. It bridges the gap between different AI providers, making your personal context portable.
- **Hierarchical Memory Management:** Unlike simple databases, NeuraMemory distinguishes between:
  - **Episodic Memory:** Specific events (e.g., "We fixed the bug in the auth controller yesterday").
  - **Semantic Memory:** General facts (e.g., "I prefer using functional programming patterns in TypeScript").
  - **Procedural Memory:** How you like things done (e.g., "Always use snake_case for database schemas").
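The three memory tiers above can be modeled as a simple tagged record. Here is a minimal TypeScript sketch; the type and field names are illustrative, not NeuraMemory's actual schema:

```typescript
// Illustrative sketch only: these types are NOT NeuraMemory's real schema.
type MemoryKind = "episodic" | "semantic" | "procedural";

interface MemoryRecord {
  id: string;
  kind: MemoryKind;
  content: string;      // the remembered event, fact, or preference
  createdAt: Date;
  embedding?: number[]; // vector used for semantic retrieval
}

// Episodic memories are time-anchored, so a retriever might weight them by
// recency; semantic and procedural memories are treated as timeless here.
function recencyBoost(m: MemoryRecord, now: Date): number {
  if (m.kind !== "episodic") return 1;
  const ageDays = (now.getTime() - m.createdAt.getTime()) / 86_400_000;
  return 1 / (1 + ageDays); // newer events score higher
}
```

The tagged-union approach keeps all three tiers in one store while letting retrieval logic branch on `kind`.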
- Multi-Modal Interaction: Users can interact via text, links, files, and documents.
- Memory Management:
- All chats are stored as memories.
- Memories are displayed as cards on the Manage Memory page.
- Users can add, update, or delete memories easily.
- Conversational Memory: Talk to your stored memories anytime.
- Central AI Hub: Acts as a unified interface connecting multiple AI tools and services.
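To illustrate what "add, update, or delete memories" might look like programmatically, here is a hypothetical request builder in TypeScript. The endpoint path, payload shape, and auth header are assumptions for the sketch, not NeuraMemory's documented API:

```typescript
// Hypothetical sketch: the /memories path and Bearer-token auth are assumed,
// not taken from NeuraMemory's actual API reference.
const BASE = "http://localhost:3000/api/v1";

// Build the request descriptor for a memory-create call without sending it,
// so the shape can be inspected (or passed to fetch(url, init) later).
function buildCreateMemoryRequest(apiKey: string, content: string) {
  return {
    url: `${BASE}/memories`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ content }),
    },
  };
}
```

Separating request construction from transport makes the client easy to unit-test without a running server.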
```shell
make dev        # Start development environment
make dev-down   # Stop development environment
make prod-up    # Start production services
make prod-down  # Stop production services
make logs       # View logs
make clean      # Stop and remove containers
```

Equivalent raw Docker Compose commands:

```shell
# Development
docker compose -f docker-compose.yml -f docker-compose.dev.yml up --build

# Production
docker compose up -d
docker compose down

# View logs
docker compose logs -f
```

Quick start:

```shell
git clone https://github.com/Gautam7352/NeuraMemory-AI.git
cd NeuraMemory-AI
cp server/.env.example server/.env
cp client/.env.example client/.env.production
make dev
```

Endpoints:
- Frontend: http://localhost:5173
- API: http://localhost:3000
For more details, see the guides listed below.

To connect via MCP, make sure your environment files are in place:

```shell
cp server/.env.example server/.env
cp client/.env.example client/.env.production
```

Then paste the following into your Claude Desktop config:
```json
{
  "mcpServers": {
    "memories": {
      "command": "npx",
      "args": [
        "-y",
        "mcp-remote",
        "https://neura-memory-ai.vercel.app/api/v1/mcp?apiKey={YOUR_API_KEY}"
      ]
    }
  },
  "preferences": {
    "coworkScheduledTasksEnabled": false,
    "ccdScheduledTasksEnabled": false,
    "sidebarMode": "chat",
    "coworkWebSearchEnabled": true
  }
}
```

NeuraMemory-AI is currently in active development with a stable core. It is deployed as a distributed system:
- Frontend: Hosted on Vercel for high performance and global edge delivery.
- Backend: Hosted on a dedicated GCP VM behind a reverse proxy to avoid CORS issues and performance bottlenecks.
For deep dives into how the system works and how to manage it, see:
- Developer Guide — Architecture, local setup, CI/CD, and repository map.
- Production Ops Guide — Monitoring, logs, backups, and scaling instructions.
- Server Docs — API specifications, database design, and best practices.
- Multi-Modal Interaction: Users can interact via text, links, files, and documents (including OCR support).
- Intelligent Memory Extraction: Automatically distinguishes between episodic facts (bubbles) and semantic knowledge.
- Conversational Retrieval: Talk to your memories using context-grounded RAG.
- Model Context Protocol (MCP): Native integration with AI tools like Claude for remote memory access.
- Local-First / Sovereignty: You own your database; it can be fully self-hosted.
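The context-grounded RAG retrieval mentioned above ultimately comes down to ranking stored memories by similarity to the query embedding. Here is a minimal, illustrative TypeScript sketch; `cosineSim` and `topK` are generic helpers written for this example, not NeuraMemory's actual internals:

```typescript
// Illustrative retrieval sketch: assumes embeddings already exist for each
// memory (e.g., produced by an external embedding model).
function cosineSim(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

// Return the k memories most similar to the query embedding; only these
// few records need to be placed into the LLM's context window.
function topK<T extends { embedding: number[] }>(
  query: number[], memories: T[], k: number,
): T[] {
  return [...memories]
    .sort((x, y) => cosineSim(query, y.embedding) - cosineSim(query, x.embedding))
    .slice(0, k);
}
```

Because only the top-k records are injected into the prompt, years of history cost only a few hundred tokens per request.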
...