CLI > SDK > Python Hierarchy

Following the TRUE 95/5 principle, we should use the simplest tool for each job:

1. CLI Commands (Simplest - Use First)

What's Available:

# Index and search with built-in CLI
llamaindex-cli rag --files "./docs" --question "How does X work?"

# Interactive chat mode
llamaindex-cli rag --files "./docs" --chat

# Download pre-built packs
llamaindex-cli download-llamapack CodeHierarchyAgentPack
llamaindex-cli download-llamapack SubQuestionQueryEnginePack

# Create new LlamaIndex app
llamaindex-cli rag --files "./docs" --create-llama

Our Current Usage: ❌ Not used. The CLI requires chromadb and is not compatible with our Qdrant setup.
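
If the CLI ever becomes usable for us, a thin Python wrapper can assemble the invocation without adding logic of its own. The sketch below only builds the argument list from flags shown above (`--files`, `--question`, `--chat`); `build_rag_command` and `run_rag` are hypothetical helper names, and the chromadb limitation still applies when the command actually runs.

```python
import subprocess
from typing import List, Optional


def build_rag_command(files: str, question: Optional[str] = None, chat: bool = False) -> List[str]:
    """Assemble a `llamaindex-cli rag` invocation (hypothetical helper)."""
    cmd = ["llamaindex-cli", "rag", "--files", files]
    if question is not None:
        cmd += ["--question", question]
    if chat:
        cmd.append("--chat")
    return cmd


def run_rag(files: str, question: str) -> str:
    """Run a one-shot question through the CLI and return its stdout."""
    result = subprocess.run(
        build_rag_command(files, question=question),
        capture_output=True,
        text=True,
        check=True,  # raise if the CLI exits with a non-zero status
    )
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems with questions that contain spaces or quotes.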

2. SDK One-Liners (Simple - Use Second)

What We're Using: ✅

# Indexing - one line
VectorStoreIndex.from_documents(docs, storage_context=storage_context)

# Search - one line  
index.as_query_engine().query("question")

# Property Graph - one line
PropertyGraphIndex.from_documents(docs)
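
The indexing one-liner assumes a `storage_context` that is already wired to Qdrant. A minimal sketch of that setup follows, assuming the `llama-index-vector-stores-qdrant` and `qdrant-client` packages are installed. The URL, collection name, and the `build_index` helper are placeholders, not our actual configuration.

```python
def build_index(docs_dir: str, collection_name: str = "docs", url: str = "http://localhost:6333"):
    """Index a directory into Qdrant (hypothetical helper; defaults are placeholders)."""
    # Local imports keep this module importable without the optional packages.
    from qdrant_client import QdrantClient
    from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
    from llama_index.vector_stores.qdrant import QdrantVectorStore

    client = QdrantClient(url=url)
    vector_store = QdrantVectorStore(client=client, collection_name=collection_name)
    storage_context = StorageContext.from_defaults(vector_store=vector_store)

    docs = SimpleDirectoryReader(docs_dir).load_data()
    # Same one-liner as above, now with storage_context defined.
    return VectorStoreIndex.from_documents(docs, storage_context=storage_context)
```

Building the index also requires an embedding model to be configured, which this sketch leaves to the LlamaIndex defaults.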

What We Could Use More:

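# SubQuestion engine - breaks complex questions into sub-questions
# (takes QueryEngineTool objects, not bare query engines)
SubQuestionQueryEngine.from_defaults(query_engine_tools=query_engine_tools)

# Router engine - routes to different indexes
RouterQueryEngine.from_defaults(query_engine_tools=query_engine_tools)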

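# Graph RAG - knowledge graph queries
# (GraphRAGQueryEngine is a custom class from the LlamaIndex GraphRAG cookbook,
# not a core export; the built-in route is the graph index's own query engine)
graph_index.as_query_engine()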
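
Both `SubQuestionQueryEngine` and `RouterQueryEngine` take `QueryEngineTool` objects rather than bare query engines, so each index needs a name and a description first. The sketch below assumes two existing indexes. `build_engines`, `code_index`, `docs_index`, the tool names, and the descriptions are all illustrative, and both engines need a configured LLM to generate sub-questions or choose a route.

```python
def build_engines(code_index, docs_index):
    """Wrap two indexes as tools and build both engines (hypothetical helper)."""
    # Local imports keep this module importable without llama-index installed.
    from llama_index.core.query_engine import RouterQueryEngine, SubQuestionQueryEngine
    from llama_index.core.tools import QueryEngineTool

    tools = [
        QueryEngineTool.from_defaults(
            query_engine=code_index.as_query_engine(),
            name="code",
            description="Source code of the project",
        ),
        QueryEngineTool.from_defaults(
            query_engine=docs_index.as_query_engine(),
            name="docs",
            description="Project documentation",
        ),
    ]
    # Splits a complex question into sub-questions answered by the tools.
    sub_question = SubQuestionQueryEngine.from_defaults(query_engine_tools=tools)
    # Lets a selector choose a tool for each question based on the descriptions.
    router = RouterQueryEngine.from_defaults(query_engine_tools=tools)
    return sub_question, router
```

The descriptions matter more than they look: they are what the engines read when deciding which index a question belongs to.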

3. Custom Python (Complex - Use Last)

What We Have (Necessary Custom Logic):

  • find_violations() - 95/5 rule checking (business logic)
  • suggest_libraries() - Custom prompting
  • Integration with claude-parser
  • CLI wrappers for convenience

What We Should Replace:

  • ❌ enterprise_architecture.py → Use CodeHierarchyAgentPack
  • ❌ Complex graph building → Use PropertyGraphIndex.from_documents()
  • ❌ Manual caching → Already removed ✅

LlamaPacks We Should Download

# For code analysis
llamaindex-cli download-llamapack CodeHierarchyAgentPack -d ./packs

# For complex Q&A
llamaindex-cli download-llamapack SubQuestionQueryEnginePack -d ./packs

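# For multi-step reasoning (confirm this pack name is listed on LlamaHub first)
llamaindex-cli download-llamapack ChainOfThoughtPack -d ./packs

# For structured output (confirm this pack name is listed on LlamaHub first)
llamaindex-cli download-llamapack StructuredOutputPack -d ./packs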

Refactoring Priority

  1. Replace enterprise_architecture.py with CodeHierarchyAgentPack

    • Current: 113 LOC custom code
    • Better: Download and use the pack
  2. Add SubQuestionQueryEngine for complex queries

    • Current: Simple search only
    • Better: Break down complex questions
  3. Add RouterQueryEngine for multi-domain

    • Current: Manual collection selection
    • Better: Automatic routing

The TRUE Pattern

❌ WRONG (What we were doing):

# 200+ lines of custom code
class EnterpriseArchitecture(SemanticSearch):
    def complex_custom_logic(self):
        # Reinventing what LlamaIndex already has
        pass

✅ RIGHT (What we should do):

# Step 1: Try CLI
llamaindex-cli rag --files "./code" --question "Show me the architecture"

# Step 2: If CLI doesn't work, use SDK one-liner
PropertyGraphIndex.from_documents(docs).as_query_engine().query("architecture")

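# Step 3: Only if needed, minimal Python wrapper
from llama_index.core import PropertyGraphIndex, SimpleDirectoryReader

def analyze_architecture(path):
    # Load every file under `path`, build a property graph, then query it
    docs = SimpleDirectoryReader(path).load_data()
    return PropertyGraphIndex.from_documents(docs).as_query_engine().query("extract architecture")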

Action Items

  1. Download CodeHierarchyAgentPack and replace enterprise_architecture.py
  2. Test whether llamaindex-cli can be made to work with our Qdrant setup
  3. Add SubQuestionQueryEngine for complex queries
  4. Add RouterQueryEngine for automatic index selection
  5. Document which approach to use for each feature

Cost-Benefit Analysis

  • CLI: 0 lines of code, but limited to chromadb
  • SDK One-liners: 1-3 lines per feature, works with Qdrant
  • Custom Python: 100+ lines, maintenance burden

Decision: Use SDK one-liners as our primary pattern, since the CLI is limited to chromadb and does not work with our Qdrant setup.
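
The hierarchy and the decision above reduce to a small rule, which may help with the last action item (documenting which approach each feature uses). This is an illustrative sketch only: `choose_tier` and its two flags are hypothetical and flatten the real trade-offs into yes/no answers.

```python
def choose_tier(cli_works: bool, sdk_works: bool) -> str:
    """Pick the simplest tier that can do the job: CLI, then SDK, then custom Python."""
    if cli_works:
        # Zero lines of code, but only an option where chromadb is acceptable.
        return "cli"
    if sdk_works:
        # 1-3 lines per feature, and compatible with Qdrant.
        return "sdk"
    # Reserved for logic with no built-in equivalent, e.g. find_violations().
    return "custom"


# Example: search must use Qdrant, so the CLI is ruled out and the SDK is chosen.
search_tier = choose_tier(cli_works=False, sdk_works=True)
```

With our Qdrant requirement, `cli_works` is false for every search feature, which is why those features land on the SDK tier.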