Component Reference

Detailed documentation for each component in the LineageGraph system.

Agent Components

Agent Graph (`src/agents/graph.py`)

The main agent orchestration system using LangGraph.

Key Functions:

create_agent_graph(): Creates the LangGraph state machine
run_agent(query, verbose=False): Main entry point for agent execution

Graph Structure:

START → plan → investigate → tool → check_continue
                                          ↓
                                   [loop back or]
                                          ↓
                                     synthesize → END
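
The control flow in the diagram can be sketched in plain Python. This is not the LangGraph API and not the code in `src/agents/graph.py`; it only illustrates the loop. The names `run_flow` and `max_steps`, the loop target (`investigate`), and the stop condition (an empty `next_tool` or a step limit) are assumptions made for illustration.

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]
Node = Callable[[State], State]


def check_continue(state: State, max_steps: int) -> str:
    """Return the next node name: loop back to 'investigate' or finish."""
    # Assumption: an empty next_tool means there is nothing left to look up.
    if state["next_tool"] and state["step_count"] < max_steps:
        return "investigate"
    return "synthesize"


def run_flow(query: str, nodes: Dict[str, Node], max_steps: int = 3) -> State:
    """Run plan once, loop investigate -> tool, then synthesize."""
    state: State = {
        "user_query": query,
        "next_tool": "",
        "tool_results": {},
        "step_count": 0,
        "tool_calls_made": [],
    }
    state = nodes["plan"](state)
    route = "investigate"
    while route == "investigate":
        state = nodes["investigate"](state)
        state = nodes["tool"](state)
        state["step_count"] += 1
        route = check_continue(state, max_steps)
    return nodes["synthesize"](state)
```

The step limit keeps the loop from running forever if a node keeps requesting tools.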

Agent Nodes (`src/agents/nodes.py`)

Individual nodes in the agent graph:

plan_node(state)
- Analyzes user query
- Creates execution plan
- Determines required information

investigate_node(state)
- Selects appropriate tools
- Prepares tool inputs
- Decides investigation strategy

synthesize_node(state)
- Combines tool results
- Generates final answer using LLM
- Calculates confidence score

Agent State (`src/agents/state.py`)

Type definitions for agent state:

from typing import Any, Dict, List, TypedDict


class AgentState(TypedDict):
    user_query: str
    current_step: str
    plan: str
    next_tool: str
    tool_results: Dict[str, Any]
    final_answer: str
    confidence_score: float
    step_count: int
    tool_calls_made: List[str]
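
A starting state for a run might be built as follows. This is a minimal sketch: the helper name `make_initial_state` and every default value (including `"plan"` as the first `current_step`) are assumptions, not taken from `src/agents/state.py`. The keys mirror the `AgentState` fields listed above.

```python
from typing import Any, Dict


def make_initial_state(query: str) -> Dict[str, Any]:
    """Build a state dict with one entry per AgentState field."""
    return {
        "user_query": query,
        "current_step": "plan",   # assumed starting step
        "plan": "",
        "next_tool": "",
        "tool_results": {},       # fresh dict per call, never shared
        "final_answer": "",
        "confidence_score": 0.0,
        "step_count": 0,
        "tool_calls_made": [],    # fresh list per call, never shared
    }
```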

Agent Tools (`src/agents/tools.py`)

Tools available to the agent:

1. search_vector_db

Purpose: Semantic search over table descriptions

Input:

{
    "query": "What feeds into revenue?",
    "limit": 3
}

Output:

{
    "success": True,
    "count": 3,
    "items": [
        {
            "id": "table_revenue_daily",
            "table_name": "revenue_daily",
            "text": "...",
            "similarity": 0.85
        },
        ...
    ]
}

2. get_table_dependencies

Purpose: Get upstream dependencies of a table

Input:

{
    "table_id": "dashboard_revenue",
    "depth": 3
}

Output:

{
    "success": True,
    "dependencies": [
        {
            "id": "table_revenue_daily",
            "name": "revenue_daily",
            "type": "Table",
            "depth": 0
        },
        ...
    ]
}
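
A depth-limited upstream lookup can be sketched as a breadth-first walk over an edge list. This is an in-memory illustration, not the PostgreSQL query the tool runs. The `EDGES` sample, the function name `get_upstream`, and the convention that direct parents sit at depth 0 (chosen to match the example output above) are assumptions.

```python
from collections import deque
from typing import Dict, List, Tuple

# Hypothetical sample lineage: (source, target) means source feeds target.
EDGES: List[Tuple[str, str]] = [
    ("table_orders", "table_order_clean"),
    ("table_users", "table_order_clean"),
    ("table_order_clean", "table_revenue_daily"),
    ("table_revenue_daily", "dashboard_revenue"),
]


def get_upstream(edges: List[Tuple[str, str]], table_id: str, depth: int) -> List[dict]:
    """Return upstream nodes of table_id, at most `depth` levels away."""
    parents: Dict[str, List[str]] = {}
    for source, target in edges:
        parents.setdefault(target, []).append(source)

    result: List[dict] = []
    seen = {table_id}
    # Direct parents start at level 0.
    queue = deque((p, 0) for p in parents.get(table_id, []))
    while queue:
        node, level = queue.popleft()
        if node in seen or level >= depth:
            continue
        seen.add(node)
        result.append({"id": node, "depth": level})
        queue.extend((p, level + 1) for p in parents.get(node, []))
    return result
```

The `seen` set guards against revisiting a node when two paths lead to the same ancestor.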

3. validate_lineage_path

Purpose: Validate if a path exists between two nodes

Input:

{
    "source_id": "table_orders",
    "target_id": "dashboard_revenue"
}

Output:

{
    "success": True,
    "is_valid": True,
    "path_length": 3
}

4. get_node_metadata

Purpose: Get metadata for a specific node

Input:

{
    "node_id": "table_users"
}

Output:

{
    "success": True,
    "node": {
        "id": "table_users",
        "name": "users",
        "type": "Table",
        "description": "..."
    }
}

5. trace_data_flow

Purpose: Trace complete data flow path

Input:

{
    "start_node": "table_orders",
    "end_node": "dashboard_revenue"
}

Output:

{
    "success": True,
    "path": [
        "table_orders",
        "table_order_clean",
        "table_revenue_daily",
        "dashboard_revenue"
    ]
}
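
One way to produce such a path is a breadth-first search that records each node's predecessor. The sketch below works on an in-memory edge list; the `EDGES` sample and the function name `trace_path` are assumptions, and the real tool may query the graph store differently.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Hypothetical sample lineage: (source, target) means source feeds target.
EDGES: List[Tuple[str, str]] = [
    ("table_orders", "table_order_clean"),
    ("table_users", "table_order_clean"),
    ("table_order_clean", "table_revenue_daily"),
    ("table_revenue_daily", "dashboard_revenue"),
]


def trace_path(edges: List[Tuple[str, str]], start: str, end: str) -> Optional[List[str]]:
    """Return the shortest downstream path from start to end, or None."""
    children: Dict[str, List[str]] = {}
    for source, target in edges:
        children.setdefault(source, []).append(target)

    previous: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == end:
            # Walk the predecessor links back to the start.
            path: List[str] = []
            current: Optional[str] = node
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for child in children.get(node, []):
            if child not in previous:
                previous[child] = node
                queue.append(child)
    return None
```

Under the same assumptions, a path check like `validate_lineage_path` could be derived from this result: the path is valid when the return value is not `None`, and the number of edges is `len(path) - 1`.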

6. check_data_freshness

Purpose: Check data freshness score

Input:

{
    "table_id": "table_users"
}

Output:

{
    "success": True,
    "freshness_score": 0.95,
    "last_updated": "2024-01-01T00:00:00Z"
}
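
This reference does not say how `freshness_score` is computed. Purely as an illustration, the sketch below derives a score in the range 0 to 1 from `last_updated` using linear decay over a maximum age. The formula, the function name, and the one-day `max_age` default are all assumptions.

```python
from datetime import datetime, timedelta


def freshness_score(
    last_updated: str,
    now: datetime,
    max_age: timedelta = timedelta(days=1),  # hypothetical freshness window
) -> float:
    """Linear decay: 1.0 when just updated, 0.0 at or beyond max_age."""
    # Replace a trailing "Z" so fromisoformat also parses it on Python < 3.11.
    updated = datetime.fromisoformat(last_updated.replace("Z", "+00:00"))
    fraction = (now - updated) / max_age  # timedelta / timedelta is a float
    return round(min(1.0, max(0.0, 1.0 - fraction)), 2)
```

`now` must be timezone-aware, because the parsed timestamp carries a UTC offset.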

Storage Components

Vector Store (`src/vector/database.py`)

DuckDB-based vector database for semantic search.

Key Methods:

add_embedding(id, text, embedding, table_name, source_type): Store embedding
search(query_embedding, limit=3): Search for similar embeddings

Schema:

embeddings table: Text and metadata
vectors table: Embedding vectors
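
The ranking step behind `search` can be illustrated in memory. This sketch assumes cosine similarity as the metric and a list of `(id, vector)` pairs in place of the DuckDB tables. Neither detail is confirmed by this reference, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(
    rows: List[Tuple[str, Sequence[float]]],
    query_embedding: Sequence[float],
    limit: int = 3,
) -> List[dict]:
    """Score every stored vector against the query and keep the top `limit`."""
    scored = [
        {"id": row_id, "similarity": cosine_similarity(query_embedding, vector)}
        for row_id, vector in rows
    ]
    scored.sort(key=lambda item: item["similarity"], reverse=True)
    return scored[:limit]
```

A full scan like this is fine for a handful of table descriptions. A larger store would typically push the scoring into the database.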

Graph Store (`src/graph/schema.py`)

PostgreSQL-based graph database for lineage relationships.

Key Methods:

add_node(id, node_type, name, description): Add a node
add_edge(source_id, target_id, edge_type): Add a relationship
get_dependencies(node_id, depth): Get upstream dependencies

Schema:

nodes table: Graph nodes
edges table: Graph relationships

Embedder (`src/vector/embeddings.py`)

Sentence-transformers based embedder.

Model: all-MiniLM-L6-v2 (384 dimensions)

Key Methods:

embed_text(text): Generate embedding for text
embed_batch(texts): Generate embeddings for multiple texts
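
A thin wrapper with these two methods might look like the sketch below. It assumes the `sentence-transformers` package and uses its `SentenceTransformer.encode` call. The class layout, the lazy loading, and the optional `model` argument (handy for tests) are assumptions, not a description of `src/vector/embeddings.py`.

```python
from typing import Any, List, Optional, Sequence

MODEL_NAME = "all-MiniLM-L6-v2"
EMBEDDING_DIM = 384


class Embedder:
    def __init__(self, model: Optional[Any] = None) -> None:
        # A preloaded (or fake) model can be passed in; otherwise load lazily.
        self._model = model

    def _get_model(self) -> Any:
        if self._model is None:
            # Imported here so the module loads without the package installed.
            from sentence_transformers import SentenceTransformer

            self._model = SentenceTransformer(MODEL_NAME)
        return self._model

    def embed_batch(self, texts: Sequence[str]) -> List[List[float]]:
        """Embed several texts in one call; returns one vector per text."""
        vectors = self._get_model().encode(list(texts))
        return [[float(value) for value in vector] for vector in vectors]

    def embed_text(self, text: str) -> List[float]:
        """Embed a single text by delegating to embed_batch."""
        return self.embed_batch([text])[0]
```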

API Components

FastAPI Application (`src/main.py`)

Main FastAPI application.

Endpoints:

GET /health: Health check
POST /api/query: Execute lineage query

Request/Response Models:

QueryRequest: Input model
QueryResponse: Output model
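
A call to `POST /api/query` can be sketched with the standard library. The fields of `QueryRequest` are not listed in this reference. The `query` and `depth` payload keys are assumed from the frontend client's `queryLineage(query, depth)` signature, and the base URL is hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # hypothetical; adjust to your deployment


def build_query_request(
    query: str, depth: int = 3, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for /api/query."""
    payload = json.dumps({"query": query, "depth": depth}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/query",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the request separately from sending it makes the payload easy to inspect without a running backend.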

Frontend Components

Query Interface (`frontend/src/components/QueryInterface.jsx`)

Main query interface component.

Features:

Natural language query input
Results display
Error handling

API Client (`frontend/src/api/client.js`)

HTTP client for backend communication.

Methods:

queryLineage(query, depth): Send query to backend

Utility Components

Service Management (`scripts/manage.sh`)

Bash script for managing services.

Commands:

start: Start all services
stop: Stop all services
status: Check service status
restart: Restart all services

Tracing (`src/agents/tracing.py`)

OpenTelemetry tracing support.

Features:

Agent execution tracing
Tool call tracing
LLM inference tracing

Usage:

export TRACING_ENABLED=true
# View traces in the Jaeger UI at http://localhost:16686

Data Loaders

Graph Loader (`src/graph/loader.py`)

Loads sample lineage data into PostgreSQL.

Sample Data:

5 nodes (users, orders, order_clean, revenue_daily, revenue_dashboard)
4 edges (lineage relationships)

Vector Loader (`src/vector/loader.py`)

Loads sample embeddings into DuckDB.

Sample Data:

5 table descriptions
Embeddings for each description

Testing Components

Evaluation Harness (`tests/eval_harness.py`)

Comprehensive evaluation system.

Features:

Golden dataset evaluation
Node recall calculation
Answer relevance scoring
Pass rate metrics
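
Two of these metrics have standard definitions that are easy to sketch. The function names, the handling of empty inputs, and the `0.8` pass threshold are assumptions. The harness in `tests/eval_harness.py` may define them differently.

```python
from typing import Iterable, Sequence


def node_recall(expected: Iterable[str], retrieved: Iterable[str]) -> float:
    """Fraction of expected nodes that appear in the retrieved set."""
    expected_set = set(expected)
    if not expected_set:
        return 1.0  # nothing was expected, so nothing was missed
    return len(expected_set & set(retrieved)) / len(expected_set)


def pass_rate(scores: Sequence[float], threshold: float = 0.8) -> float:
    """Fraction of test cases whose score meets the threshold."""
    if not scores:
        return 0.0
    return sum(score >= threshold for score in scores) / len(scores)
```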

Test Suites

test_agent_tools.py: Unit tests for agent tools
test_agent_graph.py: Tests for agent graph
test_week1_5_integration.py: Integration tests
test_evaluation_pipeline.py: Evaluation pipeline tests

Configuration

Environment Variables

DATABASE_URL: PostgreSQL connection string
TRACING_ENABLED: Enable/disable OpenTelemetry tracing
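
Reading these two variables might look like the sketch below. The `Settings` type, the function name, the accepted truthy strings, and the choice to fail when `DATABASE_URL` is missing are assumptions for illustration.

```python
import os
from typing import Mapping, NamedTuple, Optional


class Settings(NamedTuple):
    database_url: str
    tracing_enabled: bool


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from the given mapping, defaulting to os.environ."""
    source = os.environ if env is None else env
    database_url = source.get("DATABASE_URL", "")
    if not database_url:
        raise RuntimeError("DATABASE_URL is not set")
    # Assumption: tracing is off unless the flag is set to a truthy string.
    flag = source.get("TRACING_ENABLED", "false").strip().lower()
    return Settings(database_url, flag in {"1", "true", "yes"})
```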

Configuration Files

requirements.txt: Python dependencies
frontend/package.json: Frontend dependencies
.github/workflows/test.yml: CI/CD configuration
