The Python substrate for observable agent engineering.
ElectriPy Studio is a curated collection of production-grade Python components for building observable, testable, and governable agent systems. It provides composable infrastructure for LLM routing, evaluation, policy enforcement, MCP integration, reusable skills packaging, realtime session orchestration, and telemetry-aware runtime execution — all without adopting a framework.
Use ElectriPy when you want typed, production-grade building blocks that compose into your architecture rather than a monolithic framework that owns it.
| Problem | What ElectriPy provides |
|---|---|
| Agent systems are hard to observe | Observe — OpenTelemetry-aligned tracing with span kinds for LLM, agent, tool, retrieval, and policy operations |
| LLM calls need governance | Policy Engine + Policy Gateway — rule-based access control, PII scanning, approval workflows, and request/response guardrails |
| Evaluation is an afterthought | Evals + Eval Assertions — dataset-driven scoring, baseline drift detection, and pytest-native CI gating |
| Provider switching is costly | LLM Gateway + Provider Adapters + Workload Router — swap providers without rewriting business logic; route by cost, latency, or capability |
| Tool integrations are fragile | MCP Toolkit — strongly typed Model Context Protocol clients and server adapters |
| Agent knowledge is scattered | Skills — versioned, validated, template-aware skill packages with manifest-driven composition |
| Streaming sessions are glue code | Realtime — session lifecycle, event sequencing, tool-call orchestration, interruption, and backpressure in a provider-neutral runtime |
| No time to build infrastructure | 30+ composable components — caching, retries, circuit breakers, JSON repair, cost tracking, batch fan-out, replay tapes, and more |
- Ports & Adapters everywhere. Swap providers, stores, transports, and tools without rewriting business logic.
- Deterministic by default. Stable IDs, reproducible evaluation runs, and guarded state machines.
- Observable from day one. Structured tracing, telemetry hooks, and observer ports are built in — not bolted on.
- Safe logging posture. Hashes and redaction seams instead of raw prompts in logs.
- Typed, production APIs. Small public surfaces, strict typing, frozen dataclasses, and Protocol-based interfaces.
- Testable without the network. 1,000+ tests run offline, deterministically, with no API keys required.
graph TD
subgraph Foundation
CORE[Core — config, logging, errors]
CONC[Concurrency — retry, rate limit, circuit breaker]
IO[IO — JSONL read/write]
CLI[CLI — commands & demos]
end
subgraph "Agent Infrastructure"
GW[LLM Gateway]
PA[Provider Adapters]
WR[Workload Router]
FC[Fallback Chain]
BC[Batch Complete]
SO[Structured Output]
end
subgraph "Observability & Governance"
OBS[Observe — tracing & spans]
TEL[Telemetry — adapters]
POL[Policy Engine]
PGW[Policy Gateway]
SDS[Sensitive Data Scanner]
end
subgraph "Evaluation & Quality"
EV[Evals — dataset scoring]
EA[Eval Assertions — CI gating]
RAG[RAG Eval Runner]
end
subgraph "Composition & Packaging"
SK[Skills — versioned packages]
MCP[MCP Toolkit]
PE[Prompt Engine]
TR[Tool Registry]
end
subgraph "Orchestration & Runtime"
RT[Realtime — session orchestration]
AC[Agent Collaboration]
SC[Streaming Chat]
end
GW --> PA
GW --> FC
GW --> BC
GW --> SO
WR --> GW
PGW --> GW
POL --> PGW
OBS --> TEL
SK --> PE
RT --> TR
AC --> POL
EV --> EA
| Package | Purpose |
|---|---|
llm_gateway |
Provider-agnostic sync/async LLM clients with request/response hooks |
provider_adapters |
OpenAI, Anthropic, Ollama, and generic HTTP-JSON adapters |
workload_router |
Policy-driven, cost/latency/capability-aware model selection and routing |
fallback_chain |
Ranked provider failover with metadata tracking |
batch_complete |
Concurrent LLM fan-out with bounded concurrency and per-request error isolation |
structured_output |
Pydantic model extraction from LLM text with auto-retry and temperature decay |
llm_cache |
Pluggable response caching (in-memory LRU, SQLite WAL) with hit-rate tracking |
replay_tape |
Record, replay, and diff LLM interactions for deterministic offline tests |
| Package | Purpose |
|---|---|
observe |
OpenTelemetry-aligned structured tracing with AI-specific span kinds (LLM, agent, tool, retrieval, policy, MCP) |
telemetry |
Provider-agnostic telemetry adapters (JSONL, OpenTelemetry) for HTTP, LLM, policy, and RAG events |
policy |
Enterprise policy engine — subject/resource/action rules, approval workflows, evidence requirements, escalation chains |
policy_gateway |
Deterministic request/response guardrails with regex-based detection, sanitization, and multi-stage enforcement |
sensitive_data_scanner |
PII and secret detection with 9+ built-in patterns and extensible custom rules |
| Package | Purpose |
|---|---|
evals |
Dataset-driven evaluation framework with scoring, baseline comparison, and CI-friendly reporting |
eval_assertions |
Pytest-native assertion helpers (keyword, regex, JSON schema, predicate, length) for LLM output validation |
rag_eval_runner |
Retrieval benchmarking with precision/recall/MRR metrics and drift detection |
| Package | Purpose |
|---|---|
skills |
Versioned, validated skill packages with manifest-driven composition and {{variable}} template rendering |
mcp |
Strongly typed Model Context Protocol toolkit for building MCP clients, servers, and tool adapters |
prompt_engine |
Template composition, variable substitution, and few-shot example management |
tool_registry |
Declarative tool definitions with JSON schema generation and OpenAI function-calling format |
| Package | Purpose |
|---|---|
realtime |
Session lifecycle orchestration — event sequencing, tool calls, interruption, backpressure, transport abstraction |
agent_collaboration |
Bounded multi-agent handoff orchestration with hop limits and policy integration |
streaming_chat |
Sync/async stream chunk primitives and text collection helpers |
agent_runtime |
Deterministic tool-plan execution with step-by-step control |
| Package | Purpose |
|---|---|
core |
Configuration, structured logging, error hierarchy, type utilities |
concurrency |
Retry (sync/async), rate limiting, circuit breaker for cascading failure protection |
io |
JSONL read/write, data processing utilities |
cli |
Typer-based CLI with health checks, RAG eval, and offline demo commands |
| Component | Purpose |
|---|---|
cost_ledger |
Thread-safe token cost accumulation with multi-label slicing |
prompt_fingerprint |
Deterministic SHA-256 request hashing for caching, dedup, and drift detection |
json_repair |
Fix 7 common LLM JSON breakage patterns in one call |
conversation_memory |
Sliding-window and token-aware chat history management |
context_assembly |
Priority-based context window packing and truncation |
model_router |
Rule-based model selection (see also workload_router for the full routing engine) |
token_budget |
Pluggable token counting and budget-aware truncation |
hallucination_guard |
Grounding and citation verification checks |
response_robustness |
JSON extraction, output guards, and structured response validation |
rag_quality |
Retrieval quality metrics and drift comparison helpers |
ElectriPy is not a framework — it is composable infrastructure. Import the pieces you need; leave the rest.
| Library | Overlap | ElectriPy's edge |
|---|---|---|
| LiteLLM | Provider-agnostic LLM gateway | Bundles policy hooks, observability, structured output, and workload routing inline — no proxy server |
| Guardrails AI | Input/output validation | Lighter-weight, composable policy engine + gateway — no XML DSL or hosted dependency |
| CrewAI / AutoGen | Multi-agent orchestration | Bounded, deterministic collaboration with hop limits; building blocks, not a framework |
| RAGAS | RAG evaluation | Integrates eval directly into CI gating with drift comparison; ships scoring, assertions, and dataset harness |
| Instructor | Structured LLM output | Dedicated structured output engine with retry + temperature decay, plus caching, replay tape, and cost tracking |
| Haystack / LangChain | Full RAG/agent framework | Composable building blocks you import — not a framework you adopt wholesale |
- Maturity: Early alpha — APIs may still evolve. Core components, agent infrastructure, and the full observability/governance/evaluation stack are implemented and tested.
- Test suite: 1,000+ tests, all offline and deterministic.
- Versioning: SemVer at
v0.x— expect breaking changes untilv1.0.
pip install electripy-studioelectripy doctorfrom electripy import Config, get_logger
from electripy.concurrency import retry
config = Config.from_env()
logger = get_logger(__name__)
@retry(max_attempts=3, delay=1.0, backoff=2.0)
def fetch_data():
return api_call()from electripy.ai.llm_gateway import LlmGatewaySyncClient
from electripy.ai.policy_gateway import PolicyGateway, PolicyRule, PolicyStage, PolicyAction
gateway = PolicyGateway(rules=[
PolicyRule(
rule_id="pii-email", code="PII_EMAIL",
description="Mask emails",
stage=PolicyStage.PREFLIGHT,
pattern=r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+",
action=PolicyAction.SANITIZE,
),
])from electripy.ai.eval_assertions import assert_llm_output
assert_llm_output("The capital of France is Paris.", contains=["Paris"], min_length=10)from electripy.ai.realtime import RealtimeSessionService, RealtimeConfig, OutputStreamChunk
svc = RealtimeSessionService()
session = svc.create_session(config=RealtimeConfig(model="gpt-4o"))
svc.start_session(session.session_id)
svc.emit_output(session.session_id, OutputStreamChunk(index=0, text="Hello"))
svc.complete_session(session.session_id)electripy demo policy-collabSee recipes/03_policy_collaboration/ for the standalone script.
Full documentation is served via MkDocs. Build and serve locally:
pip install -e ".[docs]"
mkdocs serve- LLM Gateway
- Provider Adapters
- Workload Router
- Fallback Chain
- Batch Complete
- Structured Output
- LLM Caching Layer
- Replay Tape
electripy-studio/
├── src/electripy/
│ ├── core/ # Config, logging, errors, typing
│ ├── concurrency/ # Retry, rate limiting, circuit breaker
│ ├── io/ # JSONL utilities
│ ├── cli/ # CLI commands & demos
│ └── ai/ # Agent engineering components
│ ├── llm_gateway/ # Provider-agnostic LLM clients
│ ├── workload_router/ # Cost/latency/capability-aware model routing
│ ├── observe/ # Structured tracing & span lifecycle
│ ├── mcp/ # Model Context Protocol toolkit
│ ├── evals/ # Dataset-driven evaluation framework
│ ├── policy/ # Enterprise policy engine
│ ├── policy_gateway/ # Request/response guardrails
│ ├── skills/ # Versioned skill packaging
│ ├── realtime/ # Session orchestration & event pipeline
│ ├── agent_collaboration/# Multi-agent handoff orchestration
│ ├── structured_output/ # Pydantic extraction with retry
│ ├── eval_assertions/ # Pytest-native LLM output validation
│ ├── streaming_chat/ # Stream chunk primitives
│ ├── llm_cache/ # Response caching (LRU, SQLite)
│ ├── replay_tape/ # Record/replay/diff LLM interactions
│ ├── tool_registry/ # Declarative tool definitions
│ ├── prompt_engine/ # Template composition
│ ├── token_budget/ # Token counting & truncation
│ ├── context_assembly/ # Priority-based context packing
│ ├── agent_runtime/ # Deterministic tool-plan execution
│ ├── rag_eval_runner/ # Retrieval benchmarking
│ ├── rag_quality/ # Retrieval quality metrics
│ ├── hallucination_guard/# Grounding & citation checks
│ ├── response_robustness/# Output guards & JSON extraction
│ ├── model_router/ # Rule-based model selection
│ ├── conversation_memory/# Sliding-window chat history
│ ├── fallback_chain.py # Provider failover
│ ├── batch_complete.py # Concurrent LLM fan-out
│ ├── cost_ledger.py # Token cost accumulation
│ ├── prompt_fingerprint.py # Request hashing
│ ├── json_repair.py # LLM JSON breakage repair
│ └── sensitive_data_scanner.py # PII & secret detection
├── tests/ # 1,000+ offline, deterministic tests
├── docs/ # MkDocs documentation
├── recipes/ # Runnable examples
│ ├── 01_cli_tool/
│ ├── 02_llm_gateway/
│ └── 03_policy_collaboration/
└── pyproject.toml
- 01_cli_tool — Building a production CLI tool
- 02_llm_gateway — LLM Gateway with a fake provider (offline-friendly)
- 03_policy_collaboration — End-to-end policy + multi-agent collaboration demo
Additional recipe guides in the docs:
- Policy Gateway recipe
- Agent Collaboration Runtime recipe
- Policy + Collaboration E2E recipe
- RAG Evaluation Runner recipe
- AI Telemetry recipe
pytest tests/ -vWith coverage:
pytest tests/ -v --cov=src --cov-report=term-missingruff check . # Linting
black . # Formatting
mypy src/ # Type checkingThese tools are optional but recommended for contributors:
pipx install uv # Fast package manager
pipx install ruff # Fast linter
pipx install pre-commit # Git pre-commit hooks
uv venv .venv && source .venv/bin/activate
uv pip install -e ".[dev]"
pre-commit installGitHub Actions automatically runs tests, linting, and type checking on all pull requests.
- Python 3.11 or higher
- Dependencies managed via
pyproject.toml
MIT License — see LICENSE for details.
Contributions are welcome! Please read our Contributing Guide and Code of Conduct before submitting PRs. For security issues, see SECURITY.md.
