
Commit fc27d61

feat: Phase 2 AI utilities — prompt engine, token budget, context assembly, model router, conversation memory, tool registry
Add 6 new AI product engineering components:

- prompt_engine: template composition, variable injection, few-shot management
- token_budget: pluggable tokenizer port, budget tracking, multi-strategy truncation
- context_assembly: priority-based context window packing with auto-drop
- model_router: rule-based model selection for cost/capability routing
- conversation_memory: sliding window + token-budget-aware chat history
- tool_registry: declarative tool definitions, JSON schema gen, OpenAI export

All components follow Hexagonal Architecture (domain/ports/errors/services).
152 tests passing; ruff/black/mypy/mkdocs-strict all clean.
Updated README, docs/index, docs/api, and user guide with examples.
1 parent f31811b commit fc27d61

File tree

42 files changed (+2270, -7 lines)


README.md

Lines changed: 12 additions & 4 deletions

@@ -20,13 +20,15 @@ ElectriPy Studio is a curated collection of production-ready Python components a
 ## Status & recent updates
 
-- **Last updated**: 2026-03-04
-- **Maturity**: Early alpha (APIs may still evolve), but core components, CLI, concurrency primitives, and first AI building blocks are in place.
+- **Last updated**: 2026-03-24
+- **Maturity**: Early alpha (APIs may still evolve), but core components, CLI, concurrency primitives, and a growing suite of AI product engineering utilities are in place.
 - **Versioning**: SemVer begins at `v0.x` — expect breaking changes until `v1.0`.
 - **Recent highlights**:
   - Added an LLM Gateway for provider-agnostic LLM calls with structured output and safety seams.
   - Added a RAG Evaluation Runner and `electripy rag eval` CLI for benchmarking retrieval quality over JSONL datasets.
   - Added an AI Telemetry component for safe, provider-agnostic observability across HTTP resilience, LLM gateway, policy decisions, and RAG evaluation.
+  - Phase 1: Streaming chat, agent runtime, RAG quality/drift, hallucination guard, and response robustness utilities.
+  - Phase 2: Prompt engine, token budget management, context assembly, model routing, conversation memory, and tool registry.
   - Expanded documentation and user guides for core, concurrency, I/O, CLI, AI, and observability components.
 
 ## Features

@@ -37,7 +39,7 @@ ElectriPy Studio is a curated collection of production-ready Python components a
 - 💻 **CLI**: Typer-based command-line interface with health checks
 - 🤖 **AI building blocks**: Provider-agnostic LLM Gateway with sync/async clients and structured-output helpers, plus a RAG Evaluation Runner for retrieval benchmarking.
 - 📊 **AI Telemetry**: Provider-agnostic telemetry primitives and adapters (JSONL, optional OpenTelemetry) for HTTP resilience, LLM gateway, policy decisions, and RAG evaluation runs.
-- 🧠 **AI product engineering utilities**: Streaming chat primitives, deterministic agent runtime helpers, RAG quality/drift metrics, grounding checks for hallucination reduction, and response robustness helpers for structured outputs.
+- 🧠 **AI product engineering utilities**: Streaming chat primitives, deterministic agent runtime helpers, RAG quality/drift metrics, grounding checks for hallucination reduction, response robustness helpers for structured outputs, prompt templating and composition, token budget tracking and truncation, priority-based context window assembly, rule-based model routing, sliding-window conversation memory, and a declarative tool registry with JSON schema generation.
 
 ## Quick Start

@@ -159,7 +161,13 @@ electripy-studio/
 │   ├── agent_runtime/         # Deterministic tool-plan execution primitives
 │   ├── rag_quality/           # Retrieval metrics and drift comparison helpers
 │   ├── hallucination_guard/   # Grounding and citation checks
-│   └── response_robustness/   # JSON extraction/repair and output guards
+│   ├── response_robustness/   # JSON extraction/repair and output guards
+│   ├── prompt_engine/         # Template composition and few-shot management
+│   ├── token_budget/          # Pluggable token counting and truncation
+│   ├── context_assembly/      # Priority-based context window packing
+│   ├── model_router/          # Rule-based model selection and routing
+│   ├── conversation_memory/   # Sliding window and token-aware chat history
+│   └── tool_registry/         # Declarative tool definitions and JSON schema
 ├── tests/                     # Test suite
 ├── docs/                      # Documentation
 ├── recipes/                   # Example recipes

docs/api.md

Lines changed: 46 additions & 0 deletions

@@ -95,6 +95,52 @@ Complete API reference for ElectriPy modules.
 - `require_fields(value, fields) -> None`
 - `coalesce_non_empty(candidates) -> str`
 
+### Prompt Engine
+
+- `render_template(template, variables) -> str`: Replace `{{var}}` placeholders in a template string.
+- `build_few_shot_block(examples, max_examples=...) -> list[RenderedMessage]`: Convert few-shot examples into interleaved user/assistant messages.
+- `compose_messages(system=..., few_shot=..., user=..., variables=...) -> RenderedPrompt`: Compose a full chat prompt from building blocks.
+- `FewShotExample`: Typed few-shot example pair.
+- `RenderedPrompt.to_dicts() -> list[dict]`: Export messages for LLM API payloads.
+
+### Token Budget
+
+- `TokenizerPort`: Protocol for pluggable token counting.
+- `CharEstimatorTokenizer(chars_per_token=4.0)`: Zero-dependency character-based token estimator.
+- `count_tokens(text, tokenizer) -> TokenCount`
+- `fits_budget(text, budget, tokenizer) -> bool`
+- `truncate_to_budget(text, budget, tokenizer, strategy=..., strict=...) -> TruncationResult`
+- `TruncationStrategy`: TAIL, HEAD, or MIDDLE truncation.
+
+### Context Assembly
+
+- `ContextBlock(label, content, priority)`: A block of content with a priority level.
+- `ContextPriority`: LOW, MEDIUM, HIGH, CRITICAL.
+- `assemble_context(blocks, budget, tokenizer) -> AssembledContext`: Pack blocks into a token-limited window, dropping lowest priority first.
+
+### Model Router
+
+- `ModelProfile(model_id, provider, cost_tier, ...)`: Model capability/cost profile.
+- `RoutingRule(name, predicate)`: Composable model selection predicate.
+- `ModelRouter(models).route(rules) -> RoutingDecision`: Select cheapest model satisfying all rules.
+- `CostTier`: FREE, LOW, MEDIUM, HIGH, PREMIUM.
+
+### Conversation Memory
+
+- `append_turn(window, role, content, tokenizer) -> ConversationWindow`
+- `recent_turns(window, n) -> ConversationWindow`
+- `sliding_window(window, max_turns, tokenizer) -> ConversationWindow`
+- `trim_to_budget(window, budget, tokenizer, preserve_system=True) -> ConversationWindow`
+- `ConversationWindow.to_dicts() -> list[dict]`: Export for LLM API payloads.
+
+### Tool Registry
+
+- `tool_from_function(func, name=..., description=...) -> ToolDefinition`: Create tool definitions from Python functions.
+- `generate_schema(func) -> ToolSchema`: Infer JSON Schema from function signature.
+- `validate_arguments(tool, arguments) -> dict`: Validate and fill defaults.
+- `ToolRegistry()`: Register, look up, and export tools.
+- `ToolRegistry.to_openai_tools() -> list[dict]`: Export in OpenAI function-calling format.
+
 ---
 
 For more detailed examples, see the [User Guide](user-guide/core.md) and [Recipes](recipes/cli-tool.md).
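The Tool Registry's `generate_schema` is documented as inferring JSON Schema from a function signature. As a rough illustration of the kind of inference involved, here is a standalone sketch; the helper `sketch_schema` and its type map are hypothetical stand-ins, not the library's implementation, which likely handles more types and docstring metadata.

```python
import inspect

# Hypothetical sketch: derive a JSON-Schema-like dict from a Python
# function signature. Illustrative only; not ElectriPy's actual code.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def sketch_schema(func) -> dict:
    props, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        props[name] = {"type": _JSON_TYPES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default => caller must supply it
    return {"type": "object", "properties": props, "required": required}


def search(query: str, limit: int = 10) -> list:
    """Search the knowledge base."""


schema = sketch_schema(search)
# query is required (no default); limit is optional with type "integer"
```

The same shape (an object schema with `properties` and `required`) is what OpenAI-style function-calling payloads expect, which is presumably why `to_openai_tools()` can be a thin export step.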

docs/index.md

Lines changed: 3 additions & 3 deletions

@@ -8,8 +8,8 @@ ElectriPy Studio is a curated collection of production-ready Python components a
 ## Status
 
-- **Last updated**: 2026-03-04
-- **Maturity**: Early alpha (APIs may evolve), but core components, CLI, concurrency primitives, and first AI building blocks are in place.
+- **Last updated**: 2026-03-24
+- **Maturity**: Early alpha (APIs may evolve), but core components, CLI, concurrency primitives, and a growing suite of AI product engineering utilities are in place.
 
 ## Features
 
@@ -19,7 +19,7 @@ ElectriPy Studio is a curated collection of production-ready Python components a
 - **CLI**: Typer-based command-line interface with health checks and evaluation commands
 - **AI & LLM Gateway**: Provider-agnostic LLM clients with structured output and safety seams, plus a RAG Evaluation Runner for benchmarking retrieval quality.
 - **AI Telemetry**: Provider-agnostic telemetry primitives and adapters for HTTP resilience, LLM gateway, policy decisions, and RAG evaluation, with a safe-by-default posture.
-- **AI Product Engineering Utilities**: Streaming chat, deterministic agent runtime helpers, RAG quality/drift metrics, hallucination-risk grounding checks, and response robustness helpers.
+- **AI Product Engineering Utilities**: Streaming chat, deterministic agent runtime helpers, RAG quality/drift metrics, hallucination-risk grounding checks, response robustness helpers, prompt templating, token budget management, priority-based context assembly, rule-based model routing, conversation memory, and a declarative tool registry.
 
 ## Documentation Map

docs/user-guide/ai-product-engineering.md

Lines changed: 131 additions & 0 deletions

@@ -9,6 +9,12 @@ ElectriPy Studio includes lightweight, composable Python components for advanced
 - RAG quality metrics and retrieval drift comparison helpers.
 - Hallucination-risk reduction helpers through grounding/citation checks.
 - Response robustness helpers for JSON extraction, repair, and strict field validation.
+- Prompt templating with variable injection and few-shot example management.
+- Token budget tracking, budget checking, and multi-strategy truncation.
+- Priority-based context window assembly with automatic low-priority block dropping.
+- Rule-based model routing for cost/capability optimization.
+- Sliding-window conversation memory with token-budget-aware trimming.
+- Declarative tool registry with automatic JSON schema generation and OpenAI export.
 
 ## Component map
 
@@ -17,6 +23,12 @@ ElectriPy Studio includes lightweight, composable Python components for advanced
 - `electripy.ai.rag_quality`
 - `electripy.ai.hallucination_guard`
 - `electripy.ai.response_robustness`
+- `electripy.ai.prompt_engine`
+- `electripy.ai.token_budget`
+- `electripy.ai.context_assembly`
+- `electripy.ai.model_router`
+- `electripy.ai.conversation_memory`
+- `electripy.ai.tool_registry`
 
 ## Quick examples
 
@@ -85,3 +97,122 @@ from electripy.ai.response_robustness import parse_json_with_repair, require_fie
 parsed = parse_json_with_repair("```json\n{\"answer\": \"ok\",}\n```")
 require_fields(parsed.value, ["answer"])
 ```
+
+### Prompt templating and composition
+
+```python
+from electripy.ai.prompt_engine import compose_messages, FewShotExample
+
+prompt = compose_messages(
+    system="You are a {{persona}}.",
+    few_shot=[FewShotExample(user="2+2?", assistant="4")],
+    user="Summarize: {{text}}",
+    variables={"persona": "helpful assistant", "text": "ElectriPy is great"},
+)
+
+# Ready for any LLM API
+messages = prompt.to_dicts()
+```
+
+### Token budget management
+
+```python
+from electripy.ai.token_budget import (
+    CharEstimatorTokenizer,
+    fits_budget,
+    truncate_to_budget,
+    TruncationStrategy,
+)
+
+tokenizer = CharEstimatorTokenizer()
+
+assert fits_budget("short text", budget=100, tokenizer=tokenizer)
+
+result = truncate_to_budget(
+    "A very long document that exceeds the budget...",
+    budget=5,
+    tokenizer=tokenizer,
+    strategy=TruncationStrategy.TAIL,
+)
+assert result.was_truncated
+```
+
+### Priority-based context assembly
+
+```python
+from electripy.ai.context_assembly import (
+    ContextBlock,
+    ContextPriority,
+    assemble_context,
+)
+from electripy.ai.token_budget import CharEstimatorTokenizer
+
+blocks = [
+    ContextBlock(label="system", content="You are helpful.", priority=ContextPriority.CRITICAL),
+    ContextBlock(label="docs", content="Long reference document...", priority=ContextPriority.LOW),
+    ContextBlock(label="query", content="What is X?", priority=ContextPriority.HIGH),
+]
+
+result = assemble_context(blocks, budget=50, tokenizer=CharEstimatorTokenizer())
+# Low-priority blocks are dropped first when the budget is exceeded
+print(result.dropped_labels)
+```
+
+### Rule-based model routing
+
+```python
+from electripy.ai.model_router import (
+    CostTier,
+    ModelProfile,
+    ModelRouter,
+    RoutingRule,
+)
+
+router = ModelRouter(models=[
+    ModelProfile(model_id="gpt-4o-mini", provider="openai", cost_tier=CostTier.LOW, supports_structured_output=True),
+    ModelProfile(model_id="gpt-4o", provider="openai", cost_tier=CostTier.HIGH, supports_vision=True),
+])
+
+decision = router.route([
+    RoutingRule(name="needs-vision", predicate=lambda m: m.supports_vision),
+])
+assert decision.selected.model_id == "gpt-4o"
+```
+
+### Conversation memory with token budgets
+
+```python
+from electripy.ai.conversation_memory import (
+    ConversationWindow,
+    TurnRole,
+    append_turn,
+    trim_to_budget,
+)
+from electripy.ai.token_budget import CharEstimatorTokenizer
+
+tokenizer = CharEstimatorTokenizer()
+window = ConversationWindow()
+window = append_turn(window, TurnRole.SYSTEM, "You are helpful.", tokenizer)
+window = append_turn(window, TurnRole.USER, "Hello!", tokenizer)
+window = append_turn(window, TurnRole.ASSISTANT, "Hi there!", tokenizer)
+
+# Trim to budget, always preserving system messages
+trimmed = trim_to_budget(window, budget=20, tokenizer=tokenizer, preserve_system=True)
+messages = trimmed.to_dicts()
+```
+
+### Declarative tool registry
+
+```python
+from electripy.ai.tool_registry import tool_from_function, ToolRegistry
+
+def search(query: str, limit: int = 10) -> list[str]:
+    """Search the knowledge base."""
+    ...
+
+registry = ToolRegistry()
+registry.register(tool_from_function(search, name="search"))
+
+# Export for OpenAI function-calling API
+tools = registry.to_openai_tools()
+```
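The examples in this guide all count tokens with `CharEstimatorTokenizer(chars_per_token=4.0)`, described in the API docs as a zero-dependency character-based estimator. Its behavior can be approximated as below; the `CharEstimator` name and the ceiling rounding are assumptions for illustration, and the library's estimator may round or normalize differently.

```python
import math


# Illustrative stand-in for a character-based token estimator:
# estimate tokens as ceil(len(text) / chars_per_token).
class CharEstimator:
    def __init__(self, chars_per_token: float = 4.0) -> None:
        self.chars_per_token = chars_per_token

    def count(self, text: str) -> int:
        # Round up so a partial token still reserves budget.
        return math.ceil(len(text) / self.chars_per_token)


est = CharEstimator()
assert est.count("") == 0
assert est.count("abcdefgh") == 2   # 8 chars / 4.0 chars-per-token
assert est.count("abcdefghi") == 3  # 9 chars rounds up
```

Because this is only an estimate, budgets chosen with it should leave headroom; swapping in a real tokenizer behind the `TokenizerPort` protocol is the intended path when exact counts matter.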

src/electripy/ai/__init__.py

Lines changed: 6 additions & 0 deletions

@@ -16,10 +16,16 @@
 
 __all__ = [
     "agent_runtime",
+    "context_assembly",
+    "conversation_memory",
     "hallucination_guard",
     "llm_gateway",
+    "model_router",
+    "prompt_engine",
     "rag",
     "rag_quality",
     "response_robustness",
     "streaming_chat",
+    "token_budget",
+    "tool_registry",
 ]
Lines changed: 26 additions & 0 deletions

@@ -0,0 +1,26 @@
+"""Priority-based context window assembly for LLM prompts.
+
+Purpose:
+- Pack system prompts, documents, examples, and user queries into a
+  token-limited context window with explicit priority ordering.
+- Automatically trim lower-priority blocks when the budget is exceeded.
+
+Guarantees:
+- Higher-priority blocks are never dropped before lower-priority ones.
+- Uses the TokenizerPort from token_budget for consistent counting.
+"""
+
+from __future__ import annotations
+
+from .domain import AssembledContext, ContextBlock, ContextPriority
+from .errors import AssemblyError, EmptyAssemblyError
+from .services import assemble_context
+
+__all__ = [
+    "ContextBlock",
+    "ContextPriority",
+    "AssembledContext",
+    "AssemblyError",
+    "EmptyAssemblyError",
+    "assemble_context",
+]
Lines changed: 54 additions & 0 deletions

@@ -0,0 +1,54 @@
+"""Domain models for context assembly."""
+
+from __future__ import annotations
+
+from dataclasses import dataclass, field
+from enum import IntEnum
+
+
+class ContextPriority(IntEnum):
+    """Priority levels for context blocks (higher = more important)."""
+
+    LOW = 10
+    MEDIUM = 20
+    HIGH = 30
+    CRITICAL = 40
+
+
+@dataclass(slots=True)
+class ContextBlock:
+    """A single block of content to include in the context window.
+
+    Attributes:
+        label: Human-readable label for this block (e.g. "system_prompt").
+        content: The text content.
+        priority: Priority level; higher values survive truncation.
+        token_count: Cached token count (populated during assembly).
+    """
+
+    label: str
+    content: str
+    priority: ContextPriority = ContextPriority.MEDIUM
+    token_count: int = 0
+
+
+@dataclass(slots=True)
+class AssembledContext:
+    """Result of assembling context blocks within a budget.
+
+    Attributes:
+        blocks: Blocks that survived assembly, in original insertion order.
+        total_tokens: Total token count of assembled blocks.
+        dropped_labels: Labels of blocks that were dropped due to budget.
+        budget: The token budget used for assembly.
+    """
+
+    blocks: list[ContextBlock]
+    total_tokens: int
+    dropped_labels: list[str] = field(default_factory=list)
+    budget: int = 0
+
+    @property
+    def text(self) -> str:
+        """Concatenate all surviving block contents with double newlines."""
+        return "\n\n".join(b.content for b in self.blocks)
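Because `ContextPriority` is an `IntEnum`, a packing loop can compare priorities numerically and evict the lowest first, which is the guarantee the module docstring states. A minimal standalone sketch of that idea follows; the models are re-declared inline so it runs without the package, and `pack` (with its crude `len(s) // 4` token estimate) is a hypothetical stand-in for the real `assemble_context` in the services module, which this commit does not show.

```python
from dataclasses import dataclass
from enum import IntEnum


# Inline stand-ins mirroring the domain models above.
class ContextPriority(IntEnum):
    LOW = 10
    MEDIUM = 20
    HIGH = 30
    CRITICAL = 40


@dataclass
class ContextBlock:
    label: str
    content: str
    priority: ContextPriority = ContextPriority.MEDIUM


def pack(blocks, budget, tokens=lambda s: len(s) // 4):
    """Evict the lowest-priority blocks until the rest fit the budget."""
    kept, dropped = list(blocks), []
    while kept and sum(tokens(b.content) for b in kept) > budget:
        victim = min(kept, key=lambda b: b.priority)  # lowest priority first
        kept.remove(victim)
        dropped.append(victim.label)
    return kept, dropped  # kept preserves original insertion order


blocks = [
    ContextBlock("system", "You are helpful." * 4, ContextPriority.CRITICAL),
    ContextBlock("docs", "x" * 400, ContextPriority.LOW),
    ContextBlock("query", "What is X?", ContextPriority.HIGH),
]
kept, dropped = pack(blocks, budget=30)
# The LOW-priority "docs" block is evicted before "system" or "query".
```

Surviving blocks stay in insertion order (only eviction consults priority), matching the `AssembledContext.blocks` documentation above.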
Lines changed: 11 additions & 0 deletions

@@ -0,0 +1,11 @@
+"""Exception hierarchy for context assembly."""
+
+from __future__ import annotations
+
+
+class AssemblyError(Exception):
+    """Base exception for context assembly errors."""
+
+
+class EmptyAssemblyError(AssemblyError):
+    """Raised when no blocks can fit within the budget."""
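The subclass relationship lets callers treat "nothing fit" as a recoverable case while other assembly failures still propagate. A hedged sketch of that pattern follows; the exception classes are re-declared inline so the snippet is runnable, and whether `assemble_context` raises `EmptyAssemblyError` in exactly this way is inferred from the docstrings rather than confirmed by this commit.

```python
# Inline stand-ins for the exception hierarchy above.
class AssemblyError(Exception):
    """Base exception for context assembly errors."""


class EmptyAssemblyError(AssemblyError):
    """Raised when no blocks can fit within the budget."""


def assemble_or_none(assemble, blocks, budget):
    # Catch only the "nothing fit" case; any other AssemblyError
    # still propagates to the caller.
    try:
        return assemble(blocks, budget)
    except EmptyAssemblyError:
        return None  # caller can fall back to a smaller prompt


def always_empty(blocks, budget):
    # Hypothetical assembler standing in for a too-small budget.
    raise EmptyAssemblyError(f"no block fits within {budget} tokens")


result = assemble_or_none(always_empty, [], budget=1)
```

Keeping the base class distinct from the subclass is what makes this selective handling possible without a bare `except Exception`.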

0 commit comments
