# LineageGraph

A local-first, zero-cost, AI-powered system for querying and understanding data lineage using natural language. Built with local LLMs, vector search, and graph databases.
## Overview

LineageGraph lets users ask natural-language questions about data dependencies and lineage, such as:
- "What feeds into the revenue dashboard?"
- "Which tables are upstream dependencies of revenue_daily?"
- "Can you trace the complete data flow from orders to revenue?"
The system combines:
- Vector Search (DuckDB) for semantic similarity matching
- Graph Database (PostgreSQL) for dependency traversal
- Local LLM (Ollama) for natural language understanding
- LangGraph Agent for intelligent query planning and execution
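Dependency traversal in the graph layer maps naturally onto a recursive CTE. The query shape can be sketched with Python's bundled `sqlite3` in place of PostgreSQL (the `lineage_edges` table, its columns, and the sample rows here are hypothetical; the real schema lives in `src/graph/schema.py`):

```python
import sqlite3

# In-memory stand-in for the PostgreSQL graph store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lineage_edges (src TEXT, dst TEXT)")
conn.executemany(
    "INSERT INTO lineage_edges VALUES (?, ?)",
    [("orders", "orders_clean"),
     ("orders_clean", "revenue_daily"),
     ("revenue_daily", "revenue_dashboard")],
)

def upstream(table: str, depth: int = 3) -> list[str]:
    """All tables feeding into `table`, up to `depth` hops away."""
    rows = conn.execute(
        """
        WITH RECURSIVE up(name, hops) AS (
            SELECT src, 1 FROM lineage_edges WHERE dst = ?
            UNION
            SELECT e.src, up.hops + 1
            FROM lineage_edges e JOIN up ON e.dst = up.name
            WHERE up.hops < ?
        )
        SELECT DISTINCT name FROM up
        """,
        (table, depth),
    ).fetchall()
    return sorted(r[0] for r in rows)

print(upstream("revenue_dashboard"))  # ['orders', 'orders_clean', 'revenue_daily']
```

The same `WITH RECURSIVE` shape runs unchanged on PostgreSQL 15, which is why a relational database can serve as the graph store here.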
## Features

- **Natural Language Queries**: Ask questions about data lineage in plain English
- **Intelligent Agent**: LangGraph-based agent that plans, investigates, and synthesizes answers
- **GraphRAG**: Grounds answers in structured graph data for accuracy
- **Vector Search**: Semantic search over table descriptions and metadata
- **Interactive Frontend**: React-based UI for querying and visualization
- **Evaluation Suite**: Comprehensive test harness with a golden dataset
- **Observability**: OpenTelemetry tracing for debugging and monitoring
- **Zero Cost**: Runs entirely locally, with no API costs
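At its core, the vector-search feature reduces to nearest-neighbour lookup by cosine similarity over embeddings. A minimal stdlib sketch of that computation (the real embeddings come from sentence-transformers; the 3-dimensional vectors below are toy values):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# Toy "embeddings" for three table descriptions.
revenue_daily = [0.9, 0.1, 0.2]
revenue_dashboard = [0.8, 0.2, 0.1]
orders_raw = [0.1, 0.9, 0.3]

print(cosine_similarity(revenue_daily, revenue_dashboard))  # close to 1.0
print(cosine_similarity(revenue_daily, orders_raw))         # much lower
```

Ranking all stored vectors by this score against the query's embedding is exactly the semantic matching the DuckDB layer performs, just at scale.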
## Architecture

```mermaid
graph TB
    subgraph Frontend["Frontend Layer"]
        UI[React UI<br/>Query Interface<br/>Port 5173]
    end
    subgraph Backend["Backend Layer"]
        API[FastAPI<br/>REST API<br/>Port 8000]
    end
    subgraph Agent["Agent Layer"]
        LangGraph[LangGraph Agent<br/>State Machine]
    end
    subgraph Services["Services"]
        LLM[Ollama LLM<br/>Mistral 7B]
        Vector[DuckDB<br/>Vector Search]
        Graph[PostgreSQL<br/>Graph Database]
    end
    UI -->|HTTP REST| API
    API --> LangGraph
    LangGraph --> LLM
    LangGraph --> Vector
    LangGraph --> Graph
    style Frontend fill:#e1f5ff
    style Backend fill:#fff4e1
    style Agent fill:#ffe1f5
    style Services fill:#e1ffe1
```
For detailed architecture documentation, see:
- System Overview - Visual architecture diagrams
- Architecture Documentation - Detailed system architecture
- Component Reference - Component details
## Prerequisites

- Python 3.11+
- Node.js 18+
- PostgreSQL 15+
- Ollama (for local LLM)
- Homebrew (macOS) or equivalent package manager
## Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/yxshwanth/LieageGraph.git
   cd LineageGraph
   ```

2. Install Python dependencies:

   ```bash
   python3 -m venv venv
   source venv/bin/activate
   pip install -r requirements.txt
   ```

3. Install frontend dependencies:

   ```bash
   cd frontend
   npm install
   cd ..
   ```

4. Set up services:

   ```bash
   # Start PostgreSQL
   brew services start postgresql@15

   # Start Ollama
   brew services start ollama

   # Download Mistral model
   ollama pull mistral
   ```

5. Load sample data:

   ```bash
   source venv/bin/activate
   python src/graph/loader.py
   python src/vector/loader.py
   ```
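Before loading data, it can help to confirm the backing services are actually reachable. A small stdlib check (the ports are the local defaults: 5432 for PostgreSQL, 11434 for Ollama; adjust if yours differ):

```python
import socket

def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Default local ports for the two infrastructure services.
for name, port in [("PostgreSQL", 5432), ("Ollama", 11434)]:
    status = "up" if is_listening("localhost", port) else "DOWN"
    print(f"{name:12} :{port}  {status}")
```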
## Running the Application

**Option 1: Using the management script (recommended)**

```bash
# Start infrastructure services
./scripts/manage.sh start

# Start backend (in terminal 1)
source venv/bin/activate
python src/main.py

# Start frontend (in terminal 2)
cd frontend
npm run dev
```

**Option 2: Using Make**

```bash
# Start infrastructure
make start

# Start backend
make backend

# Start frontend (in another terminal)
make frontend
```

**Option 3: Manual**

```bash
# Terminal 1: Backend
source venv/bin/activate
python src/main.py

# Terminal 2: Frontend
cd frontend
npm run dev
```

The application will be available at:

- Frontend: http://localhost:5173
- Backend API: http://localhost:8000
- API Docs: http://localhost:8000/docs
## Usage

### REST API

```bash
curl -X POST http://localhost:8000/api/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What feeds into the revenue dashboard?",
    "depth": 3
  }'
```

### Python

```python
from src.agents.graph import run_agent

result = run_agent("What feeds into the revenue dashboard?", verbose=True)
print(result["final_answer"])
```

### Web UI

Open http://localhost:5173 in your browser and use the query interface to ask questions about data lineage.
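The same REST call can also be made from Python with only the standard library. A sketch that builds the request without sending it (endpoint and field names as in the curl example; the backend must be running before the commented-out send will work):

```python
import json
import urllib.request

def build_query_request(query: str, depth: int = 3) -> urllib.request.Request:
    """Build the POST request for /api/query without sending it."""
    payload = json.dumps({"query": query, "depth": depth}).encode("utf-8")
    return urllib.request.Request(
        "http://localhost:8000/api/query",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_query_request("What feeds into the revenue dashboard?")
# With the backend running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))  # response schema as returned by the API
```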
## Testing

Run the test suite:

```bash
# All tests
pytest tests/ -v

# Unit tests
make test-unit

# Integration tests
make test-integration

# Evaluation pipeline
pytest tests/test_evaluation_pipeline.py -v
```

## Project Structure

```
LineageGraph/
├── src/
│   ├── agents/                      # LangGraph agent implementation
│   │   ├── graph.py                 # Agent graph definition
│   │   ├── nodes.py                 # Agent nodes (plan, investigate, synthesize)
│   │   ├── tools.py                 # Agent tools (vector search, graph queries)
│   │   └── state.py                 # Agent state management
│   ├── graph/                       # Graph database layer
│   │   ├── schema.py                # PostgreSQL schema and queries
│   │   └── loader.py                # Sample data loader
│   ├── vector/                      # Vector search layer
│   │   ├── database.py              # DuckDB vector store
│   │   ├── embeddings.py            # Sentence-transformers embedder
│   │   └── loader.py                # Sample data loader
│   └── main.py                      # FastAPI application
├── frontend/                        # React frontend
│   └── src/
│       ├── App.jsx                  # Main app component
│       └── components/              # UI components
├── tests/                           # Test suite
│   ├── test_agent_*.py              # Agent tests
│   ├── test_evaluation_pipeline.py  # Evaluation tests
│   └── data/                        # Golden dataset
├── docs/                            # Documentation
├── scripts/                         # Utility scripts
│   └── manage.sh                    # Service management
└── requirements.txt                 # Python dependencies
```
## Configuration

```bash
# Database connection
export DATABASE_URL="postgresql://postgres:postgres@localhost/semantic_lineage"

# Enable OpenTelemetry tracing
export TRACING_ENABLED=true
```

See SERVICE_MANAGEMENT.md for detailed service management instructions.
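A sketch of how the backend might read these settings with the standard library (the variable names match the ones above; the defaults and the returned keys are assumptions, not the project's actual config code):

```python
import os
from urllib.parse import urlparse

def load_settings() -> dict:
    """Read configuration from the environment, with local defaults."""
    db_url = os.environ.get(
        "DATABASE_URL",
        "postgresql://postgres:postgres@localhost/semantic_lineage",
    )
    parsed = urlparse(db_url)
    return {
        "db_host": parsed.hostname,
        "db_name": parsed.path.lstrip("/"),
        "tracing": os.environ.get("TRACING_ENABLED", "false").lower() == "true",
    }

print(load_settings())
```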
## Evaluation

The project includes a comprehensive evaluation harness:

- **Golden Dataset**: 20+ test cases covering various query types
- **Metrics**: Pass rate, node recall, answer relevance
- **Thresholds**: 70% pass rate, 70% node recall, 65% answer relevance

Run the evaluation:

```bash
pytest tests/test_evaluation_pipeline.py -v
```
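The metrics above can be illustrated with a small sketch. Treating node recall as set overlap between the lineage nodes the agent surfaced and the golden set is an assumption; the harness's exact formulas may differ:

```python
def node_recall(predicted: set[str], golden: set[str]) -> float:
    """Fraction of golden lineage nodes the agent actually surfaced."""
    if not golden:
        return 1.0
    return len(predicted & golden) / len(golden)

def pass_rate(results: list[bool]) -> float:
    """Fraction of golden test cases that passed."""
    return sum(results) / len(results)

golden = {"orders", "orders_clean", "revenue_daily"}
predicted = {"orders", "revenue_daily", "customers"}
print(node_recall(predicted, golden))        # 2 of 3 golden nodes found
print(pass_rate([True, True, False, True]))  # 0.75
```

Under this reading, the 70% node-recall threshold means the agent must surface at least 70% of the golden nodes on average across the dataset.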
## Development

### Adding a New Tool

1. Define the tool in `src/agents/tools.py`:

   ```python
   @tool("my_new_tool")
   def my_new_tool(param: str) -> Dict[str, Any]:
       """Tool description"""
       # Implementation
       return {"success": True, "result": ...}
   ```

2. Add it to `ALL_TOOLS` in `src/agents/tools.py`
3. The agent will automatically discover and use it

### Adding New Data

- Graph data: use `src/graph/loader.py` as a template
- Vector data: use `src/vector/loader.py` as a template
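Both loaders follow a read-parse-insert shape. A toy sketch of the parsing half, assuming a hypothetical CSV edge format (the real loaders write to PostgreSQL and DuckDB rather than an in-memory dict):

```python
import csv
import io

# Hypothetical edge file: one src,dst pair per data row.
SAMPLE_EDGES = """src,dst
orders,orders_clean
orders_clean,revenue_daily
"""

def load_edges(fh) -> dict[str, list[str]]:
    """Parse edge rows into an adjacency map (dst -> upstream sources)."""
    upstream: dict[str, list[str]] = {}
    for row in csv.DictReader(fh):
        upstream.setdefault(row["dst"], []).append(row["src"])
    return upstream

print(load_edges(io.StringIO(SAMPLE_EDGES)))
# {'orders_clean': ['orders'], 'revenue_daily': ['orders_clean']}
```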
## Continuous Integration

GitHub Actions automatically runs:
- Unit tests
- Integration tests
- Evaluation pipeline (optional, slow)
See .github/workflows/test.yml for details.
## Documentation

- System Overview - Visual architecture diagrams and system overview
- Architecture Documentation - Detailed system architecture with Mermaid diagrams
- Component Reference - Detailed component documentation
- Quick Start Guide - Step-by-step setup instructions
- Service Management - Service management guide
- Agent Tracing - OpenTelemetry tracing usage
## Contributing

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
## License

This project is open source and available under the MIT License.
## Acknowledgments

- Ollama for local LLM inference
- LangGraph for agent orchestration
- DuckDB for vector search
- PostgreSQL for graph storage
## Contact

For questions, issues, or contributions, please open an issue on GitHub.
Built with ❤️ for zero-cost AI systems