ElectriPy Studio

The Python substrate for observable agent engineering.

Overview

ElectriPy Studio is a curated collection of production-grade Python components for building observable, testable, and governable agent systems. It provides composable infrastructure for LLM routing, evaluation, policy enforcement, MCP integration, reusable skills packaging, realtime session orchestration, and telemetry-aware runtime execution — all without adopting a framework.

Use ElectriPy when you want typed, production-grade building blocks that compose into your architecture rather than a monolithic framework that owns it.

Why ElectriPy Studio

Problem	What ElectriPy provides
Agent systems are hard to observe	Observe — OpenTelemetry-aligned tracing with span kinds for LLM, agent, tool, retrieval, and policy operations
LLM calls need governance	Policy Engine + Policy Gateway — rule-based access control, PII scanning, approval workflows, and request/response guardrails
Evaluation is an afterthought	Evals + Eval Assertions — dataset-driven scoring, baseline drift detection, and pytest-native CI gating
Provider switching is costly	LLM Gateway + Provider Adapters + Workload Router — swap providers without rewriting business logic; route by cost, latency, or capability
Tool integrations are fragile	MCP Toolkit — strongly typed Model Context Protocol clients and server adapters
Agent knowledge is scattered	Skills — versioned, validated, template-aware skill packages with manifest-driven composition
Streaming sessions are glue code	Realtime — session lifecycle, event sequencing, tool-call orchestration, interruption, and backpressure in a provider-neutral runtime
No time to build infrastructure	30+ composable components — caching, retries, circuit breakers, JSON repair, cost tracking, batch fan-out, replay tapes, and more

Design principles

Ports & Adapters everywhere. Swap providers, stores, transports, and tools without rewriting business logic.
Deterministic by default. Stable IDs, reproducible evaluation runs, and guarded state machines.
Observable from day one. Structured tracing, telemetry hooks, and observer ports are built in — not bolted on.
Safe logging posture. Hashes and redaction seams instead of raw prompts in logs.
Typed, production APIs. Small public surfaces, strict typing, frozen dataclasses, and Protocol-based interfaces.
Testable without the network. 1,000+ tests run offline, deterministically, with no API keys required.

Architecture

graph TD
    subgraph Foundation
        CORE[Core — config, logging, errors]
        CONC[Concurrency — retry, rate limit, circuit breaker]
        IO[IO — JSONL read/write]
        CLI[CLI — commands & demos]
    end

    subgraph "Agent Infrastructure"
        GW[LLM Gateway]
        PA[Provider Adapters]
        WR[Workload Router]
        FC[Fallback Chain]
        BC[Batch Complete]
        SO[Structured Output]
    end

    subgraph "Observability & Governance"
        OBS[Observe — tracing & spans]
        TEL[Telemetry — adapters]
        POL[Policy Engine]
        PGW[Policy Gateway]
        SDS[Sensitive Data Scanner]
    end

    subgraph "Evaluation & Quality"
        EV[Evals — dataset scoring]
        EA[Eval Assertions — CI gating]
        RAG[RAG Eval Runner]
    end

    subgraph "Composition & Packaging"
        SK[Skills — versioned packages]
        MCP[MCP Toolkit]
        PE[Prompt Engine]
        TR[Tool Registry]
    end

    subgraph "Orchestration & Runtime"
        RT[Realtime — session orchestration]
        AC[Agent Collaboration]
        SC[Streaming Chat]
    end

    GW --> PA
    GW --> FC
    GW --> BC
    GW --> SO
    WR --> GW
    PGW --> GW
    POL --> PGW
    OBS --> TEL
    SK --> PE
    RT --> TR
    AC --> POL
    EV --> EA

Package map

Agent infrastructure

Package	Purpose
`llm_gateway`	Provider-agnostic sync/async LLM clients with request/response hooks
`provider_adapters`	OpenAI, Anthropic, Ollama, and generic HTTP-JSON adapters
`workload_router`	Policy-driven, cost/latency/capability-aware model selection and routing
`fallback_chain`	Ranked provider failover with metadata tracking
`batch_complete`	Concurrent LLM fan-out with bounded concurrency and per-request error isolation
`structured_output`	Pydantic model extraction from LLM text with auto-retry and temperature decay
`llm_cache`	Pluggable response caching (in-memory LRU, SQLite WAL) with hit-rate tracking
`replay_tape`	Record, replay, and diff LLM interactions for deterministic offline tests

Observability & governance

Package	Purpose
`observe`	OpenTelemetry-aligned structured tracing with AI-specific span kinds (LLM, agent, tool, retrieval, policy, MCP)
`telemetry`	Provider-agnostic telemetry adapters (JSONL, OpenTelemetry) for HTTP, LLM, policy, and RAG events
`policy`	Enterprise policy engine — subject/resource/action rules, approval workflows, evidence requirements, escalation chains
`policy_gateway`	Deterministic request/response guardrails with regex-based detection, sanitization, and multi-stage enforcement
`sensitive_data_scanner`	PII and secret detection with 9+ built-in patterns and extensible custom rules

Evaluation & quality

Package	Purpose
`evals`	Dataset-driven evaluation framework with scoring, baseline comparison, and CI-friendly reporting
`eval_assertions`	Pytest-native assertion helpers (keyword, regex, JSON schema, predicate, length) for LLM output validation
`rag_eval_runner`	Retrieval benchmarking with precision/recall/MRR metrics and drift detection

Composition & packaging

Package	Purpose
`skills`	Versioned, validated skill packages with manifest-driven composition and `{{variable}}` template rendering
`mcp`	Strongly typed Model Context Protocol toolkit for building MCP clients, servers, and tool adapters
`prompt_engine`	Template composition, variable substitution, and few-shot example management
`tool_registry`	Declarative tool definitions with JSON schema generation and OpenAI function-calling format

Orchestration & runtime

Package	Purpose
`realtime`	Session lifecycle orchestration — event sequencing, tool calls, interruption, backpressure, transport abstraction
`agent_collaboration`	Bounded multi-agent handoff orchestration with hop limits and policy integration
`streaming_chat`	Sync/async stream chunk primitives and text collection helpers
`agent_runtime`	Deterministic tool-plan execution with step-by-step control

Core infrastructure

Package	Purpose
`core`	Configuration, structured logging, error hierarchy, type utilities
`concurrency`	Retry (sync/async), rate limiting, circuit breaker for cascading failure protection
`io`	JSONL read/write, data processing utilities
`cli`	Typer-based CLI with health checks, RAG eval, and offline demo commands

Supporting components

Component	Purpose
`cost_ledger`	Thread-safe token cost accumulation with multi-label slicing
`prompt_fingerprint`	Deterministic SHA-256 request hashing for caching, dedup, and drift detection
`json_repair`	Fix 7 common LLM JSON breakage patterns in one call
`conversation_memory`	Sliding-window and token-aware chat history management
`context_assembly`	Priority-based context window packing and truncation
`model_router`	Rule-based model selection (see also `workload_router` for the full routing engine)
`token_budget`	Pluggable token counting and budget-aware truncation
`hallucination_guard`	Grounding and citation verification checks
`response_robustness`	JSON extraction, output guards, and structured response validation
`rag_quality`	Retrieval quality metrics and drift comparison helpers

How ElectriPy compares

ElectriPy is not a framework — it is composable infrastructure. Import the pieces you need; leave the rest.

Library	Overlap	ElectriPy's edge
LiteLLM	Provider-agnostic LLM gateway	Bundles policy hooks, observability, structured output, and workload routing inline — no proxy server
Guardrails AI	Input/output validation	Lighter-weight, composable policy engine + gateway — no XML DSL or hosted dependency
CrewAI / AutoGen	Multi-agent orchestration	Bounded, deterministic collaboration with hop limits; building blocks, not a framework
RAGAS	RAG evaluation	Integrates eval directly into CI gating with drift comparison; ships scoring, assertions, and dataset harness
Instructor	Structured LLM output	Dedicated structured output engine with retry + temperature decay, plus caching, replay tape, and cost tracking
Haystack / LangChain	Full RAG/agent framework	Composable building blocks you import — not a framework you adopt wholesale

Status

Maturity: Early alpha — APIs may still evolve. Core components, agent infrastructure, and the full observability/governance/evaluation stack are implemented and tested.
Test suite: 1,000+ tests, all offline and deterministic.
Versioning: SemVer at v0.x — expect breaking changes until v1.0.

Quick start

Install

pip install electripy-studio

Verify

electripy doctor

Core usage

from electripy import Config, get_logger
from electripy.concurrency import retry

config = Config.from_env()
logger = get_logger(__name__)

@retry(max_attempts=3, delay=1.0, backoff=2.0)
def fetch_data():
    return api_call()

LLM Gateway with policy hooks

from electripy.ai.llm_gateway import LlmGatewaySyncClient
from electripy.ai.policy_gateway import PolicyGateway, PolicyRule, PolicyStage, PolicyAction

gateway = PolicyGateway(rules=[
    PolicyRule(
        rule_id="pii-email", code="PII_EMAIL",
        description="Mask emails",
        stage=PolicyStage.PREFLIGHT,
        pattern=r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+",
        action=PolicyAction.SANITIZE,
    ),
])

Evaluation in CI

from electripy.ai.eval_assertions import assert_llm_output

assert_llm_output("The capital of France is Paris.", contains=["Paris"], min_length=10)

Realtime session

from electripy.ai.realtime import RealtimeSessionService, RealtimeConfig, OutputStreamChunk

svc = RealtimeSessionService()
session = svc.create_session(config=RealtimeConfig(model="gpt-4o"))
svc.start_session(session.session_id)
svc.emit_output(session.session_id, OutputStreamChunk(index=0, text="Hello"))
svc.complete_session(session.session_id)

Demo: Policy + Agent Collaboration

electripy demo policy-collab

See recipes/03_policy_collaboration/ for the standalone script.

Documentation

Full documentation is served via MkDocs. Build and serve locally:

pip install -e ".[docs]"
mkdocs serve

Getting started

Agent infrastructure

Observability & governance

Evaluation & quality

Composition & packaging

Orchestration & runtime

Foundation

Reference

Project structure

electripy-studio/
├── src/electripy/
│   ├── core/                   # Config, logging, errors, typing
│   ├── concurrency/            # Retry, rate limiting, circuit breaker
│   ├── io/                     # JSONL utilities
│   ├── cli/                    # CLI commands & demos
│   └── ai/                     # Agent engineering components
│       ├── llm_gateway/        # Provider-agnostic LLM clients
│       ├── workload_router/    # Cost/latency/capability-aware model routing
│       ├── observe/            # Structured tracing & span lifecycle
│       ├── mcp/                # Model Context Protocol toolkit
│       ├── evals/              # Dataset-driven evaluation framework
│       ├── policy/             # Enterprise policy engine
│       ├── policy_gateway/     # Request/response guardrails
│       ├── skills/             # Versioned skill packaging
│       ├── realtime/           # Session orchestration & event pipeline
│       ├── agent_collaboration/# Multi-agent handoff orchestration
│       ├── structured_output/  # Pydantic extraction with retry
│       ├── eval_assertions/    # Pytest-native LLM output validation
│       ├── streaming_chat/     # Stream chunk primitives
│       ├── llm_cache/          # Response caching (LRU, SQLite)
│       ├── replay_tape/        # Record/replay/diff LLM interactions
│       ├── tool_registry/      # Declarative tool definitions
│       ├── prompt_engine/      # Template composition
│       ├── token_budget/       # Token counting & truncation
│       ├── context_assembly/   # Priority-based context packing
│       ├── agent_runtime/      # Deterministic tool-plan execution
│       ├── rag_eval_runner/    # Retrieval benchmarking
│       ├── rag_quality/        # Retrieval quality metrics
│       ├── hallucination_guard/# Grounding & citation checks
│       ├── response_robustness/# Output guards & JSON extraction
│       ├── model_router/       # Rule-based model selection
│       ├── conversation_memory/# Sliding-window chat history
│       ├── fallback_chain.py   # Provider failover
│       ├── batch_complete.py   # Concurrent LLM fan-out
│       ├── cost_ledger.py      # Token cost accumulation
│       ├── prompt_fingerprint.py # Request hashing
│       ├── json_repair.py      # LLM JSON breakage repair
│       └── sensitive_data_scanner.py # PII & secret detection
├── tests/                      # 1,000+ offline, deterministic tests
├── docs/                       # MkDocs documentation
├── recipes/                    # Runnable examples
│   ├── 01_cli_tool/
│   ├── 02_llm_gateway/
│   └── 03_policy_collaboration/
└── pyproject.toml

Recipes

01_cli_tool — Building a production CLI tool
02_llm_gateway — LLM Gateway with a fake provider (offline-friendly)
03_policy_collaboration — End-to-end policy + multi-agent collaboration demo

Additional recipe guides in the docs:

Development

Running tests

pytest tests/ -v

With coverage:

pytest tests/ -v --cov=src --cov-report=term-missing

Code quality

ruff check .                  # Linting
black .                       # Formatting
mypy src/                     # Type checking

Python tooling (recommended)

These tools are optional but recommended for contributors:

pipx install uv               # Fast package manager
pipx install ruff              # Fast linter
pipx install pre-commit        # Git pre-commit hooks

uv venv .venv && source .venv/bin/activate
uv pip install -e ".[dev]"
pre-commit install

CI/CD

GitHub Actions automatically runs tests, linting, and type checking on all pull requests.

Requirements

Python 3.11 or higher
Dependencies managed via pyproject.toml

License

MIT License — see LICENSE for details.

Contributing

Contributions are welcome! Please read our Contributing Guide and Code of Conduct before submitting PRs. For security issues, see SECURITY.md.

Name		Name	Last commit message	Last commit date
Latest commit History 59 Commits
.github		.github
docs		docs
images		images
packages/electripy-cli		packages/electripy-cli
recipes		recipes
src/electripy		src/electripy
tests		tests
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
CHANGELOG.md		CHANGELOG.md
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
SECURITY.md		SECURITY.md
mkdocs.yml		mkdocs.yml
pyproject.toml		pyproject.toml

Folders and files

Latest commit

History

Repository files navigation

ElectriPy Studio

Overview

Why ElectriPy Studio

Design principles

Architecture

Package map

Agent infrastructure

Observability & governance

Evaluation & quality

Composition & packaging

Orchestration & runtime

Core infrastructure

Supporting components

How ElectriPy compares

Status

Quick start

Install

Verify

Core usage

LLM Gateway with policy hooks

Evaluation in CI

Realtime session

Demo: Policy + Agent Collaboration

Documentation

Getting started

Agent infrastructure

Observability & governance

Evaluation & quality

Composition & packaging

Orchestration & runtime

Foundation

Reference

Project structure

Recipes

Development

Running tests

Code quality

Python tooling (recommended)

CI/CD

Requirements

License

Contributing

Links

About

Topics

Resources

License

Code of conduct

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases 4

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages