Commit 94da594

feat: add 7 Tier 1 daily-driver utilities (v0.3.0)
New utilities:

- FallbackChainPort — automatic LLM provider failover
- batch_complete() — concurrent LLM fan-out with backpressure
- CostLedger — thread-safe token cost tracking with label slicing
- prompt_fingerprint() — deterministic SHA-256 request hashing
- json_repair() — fix 7 common LLM JSON breakage patterns
- CircuitBreaker — closed/open/half_open FSM for failure protection
- SensitiveDataScanner — PII & secret detection (9 built-in patterns)

Also includes:

- 82 new tests (351 total), all passing
- 7 new user guide pages + mkdocs nav update
- Updated README with features, doc links, project structure
- Full quality gate: ruff, black, mypy, pytest all green
1 parent 161b110 commit 94da594

27 files changed (+2312, −5 lines)

CHANGELOG.md

Lines changed: 14 additions & 0 deletions
@@ -7,6 +7,20 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ## [Unreleased]

+## [0.3.0] — 2026-03-25
+
+### Added
+
+- **Fallback Chain** — automatic provider failover across ranked `SyncLlmPort` adapters with metadata tracking (`_fallback_provider_index`).
+- **Batch Complete** — fan-out N LLM requests with bounded concurrency (`ThreadPoolExecutor`), order-preserving results, per-request error isolation, and progress callbacks.
+- **Cost Ledger** — thread-safe token cost accumulation with multi-dimensional label slicing (`by_label`), estimated cost calculation, and snapshot/reset support.
+- **Prompt Fingerprint** — deterministic SHA-256 request hashing (compatible with the LLM Cache key algorithm) with full and short digest variants.
+- **JSON Repair** — fix 7 common LLM JSON breakage patterns: markdown fences, JSON embedded in prose, trailing commas, single quotes, unquoted keys, mismatched brackets, and truncated JSON.
+- **Circuit Breaker** — closed→open→half_open FSM for cascading failure protection with configurable thresholds, decorator support, and thread-safe state transitions.
+- **Sensitive Data Scanner** — regex-based PII and secret detection with 9 built-in patterns (including email, phone, SSN, credit card, API keys, AWS keys, IPv4), extensible via `add_pattern()`.
+- User guide documentation for all seven new components.
+- 82 new tests (total suite now at 351).
+
 ## [0.2.0] — 2026-03-25

 ### Added

README.md

Lines changed: 29 additions & 3 deletions
@@ -46,6 +46,13 @@ ElectriPy is **not** a framework — it's a composable toolkit of production-gra

 - **Maturity**: Early alpha (APIs may still evolve), but core components, CLI, concurrency primitives, and a growing suite of AI product engineering utilities are in place.
 - **Versioning**: SemVer begins at `v0.x` — expect breaking changes until `v1.0`.
 - **Recent highlights**:
+  - Added **Fallback Chain** — automatic provider failover across ranked `SyncLlmPort` adapters with metadata tracking.
+  - Added **Batch Complete** — fan-out N LLM requests with bounded concurrency, order-preserving results, and per-request error isolation.
+  - Added **Cost Ledger** — thread-safe token cost accumulation with per-label slicing (tenant, model, feature).
+  - Added **Prompt Fingerprint** — deterministic SHA-256 request hashing for caching, dedup, and audit trails.
+  - Added **JSON Repair** — fix markdown fences, trailing commas, single quotes, unquoted keys, mismatched brackets, and truncated JSON in one call.
+  - Added **Circuit Breaker** — closed→open→half_open FSM protecting against cascading provider failures.
+  - Added **Sensitive Data Scanner** — regex-based PII and secret detection (email, phone, SSN, API keys, AWS keys) with extensible patterns.
   - Added a **Structured Output Engine** — extract typed Pydantic models from LLM text with auto-retry and temperature decay.
   - Added an **LLM Caching Layer** — pluggable response caching (in-memory LRU, SQLite WAL) with hit-rate tracking.
   - Added an **LLM Replay Tape** — record, replay, and diff LLM interactions for deterministic offline tests.
@@ -61,7 +68,7 @@ ElectriPy is **not** a framework — it's a composable toolkit of production-gra

 ## Features

 - 🔧 **Core Components**: Configuration, logging, error handling, and type utilities
-- ⚡ **Concurrency**: Retry mechanisms (sync/async) and async token bucket rate limiter
+- ⚡ **Concurrency**: Retry mechanisms (sync/async), async token bucket rate limiter, and circuit breaker for cascading failure protection
 - 📁 **I/O**: JSONL read/write utilities for efficient data processing
 - 💻 **CLI**: Typer-based command-line interface with health checks, RAG eval runner, and an offline demo showcase (`electripy demo policy-collab`)
 - 🤖 **AI building blocks**: Provider-agnostic LLM Gateway with sync/async clients, request/response policy hooks, structured-output helpers, and a RAG Evaluation Runner for retrieval benchmarking.
@@ -73,6 +80,12 @@ ElectriPy is **not** a framework — it's a composable toolkit of production-gra

 - 📊 **AI Telemetry**: Provider-agnostic telemetry primitives and adapters (JSONL, optional OpenTelemetry) for HTTP resilience, LLM gateway, policy decisions, and RAG evaluation runs.
 - 🧠 **AI product engineering utilities**: Streaming chat primitives, deterministic agent runtime helpers, RAG quality/drift metrics, grounding checks for hallucination reduction, response robustness helpers for structured outputs, prompt templating and composition, token budget tracking and truncation, priority-based context window assembly, rule-based model routing, sliding-window conversation memory, and a declarative tool registry with JSON schema generation.
 - 🛡️ **AI policy and collaboration runtime**: Deterministic policy gateway checks for preflight/postflight/stream/tool flows, plus bounded agent-to-agent collaboration runtime for specialist orchestration patterns.
+- 🔗 **Fallback Chain**: Automatic provider failover — tries ranked LLM adapters in order with metadata tracking.
+- 📦 **Batch Complete**: Fan-out N LLM requests with bounded concurrency, order-preserving results, and per-request error isolation.
+- 💰 **Cost Ledger**: Thread-safe token cost accumulation with multi-dimensional label slicing.
+- 🔑 **Prompt Fingerprint**: Deterministic SHA-256 request hashing for caching, dedup, and drift detection.
+- 🔧 **JSON Repair**: Fix 7 common LLM JSON breakage patterns (fences, prose extraction, trailing commas, single quotes, unquoted keys, mismatched brackets, truncation) in one call.
+- 🔒 **Sensitive Data Scanner**: Regex-based PII and secret detection with 9 built-in patterns and extensible custom rules.

 ## Quick Start
@@ -179,6 +192,13 @@ Full documentation is available in the [docs/](https://github.com/inference-stac

 - [LLM Replay Tape](https://github.com/inference-stack-llc/electripy-studio/blob/main/docs/user-guide/ai-replay-tape.md)
 - [Eval Assertions](https://github.com/inference-stack-llc/electripy-studio/blob/main/docs/user-guide/ai-eval-assertions.md)
 - [Provider Adapters (Anthropic, Ollama)](https://github.com/inference-stack-llc/electripy-studio/blob/main/docs/user-guide/ai-provider-adapters.md)
+- [Fallback Chain](https://github.com/inference-stack-llc/electripy-studio/blob/main/docs/user-guide/ai-fallback-chain.md)
+- [Batch Complete](https://github.com/inference-stack-llc/electripy-studio/blob/main/docs/user-guide/ai-batch-complete.md)
+- [Cost Ledger](https://github.com/inference-stack-llc/electripy-studio/blob/main/docs/user-guide/ai-cost-ledger.md)
+- [Prompt Fingerprint](https://github.com/inference-stack-llc/electripy-studio/blob/main/docs/user-guide/ai-prompt-fingerprint.md)
+- [JSON Repair](https://github.com/inference-stack-llc/electripy-studio/blob/main/docs/user-guide/ai-json-repair.md)
+- [Sensitive Data Scanner](https://github.com/inference-stack-llc/electripy-studio/blob/main/docs/user-guide/ai-sensitive-data-scanner.md)
+- [Circuit Breaker](https://github.com/inference-stack-llc/electripy-studio/blob/main/docs/user-guide/circuit-breaker.md)
 - [RAG Evaluation Runner](https://github.com/inference-stack-llc/electripy-studio/blob/main/docs/user-guide/ai-rag-eval-runner.md)
 - [AI Product Engineering Utilities](https://github.com/inference-stack-llc/electripy-studio/blob/main/docs/user-guide/ai-product-engineering.md)
 - [Component Maturity Model](https://github.com/inference-stack-llc/electripy-studio/blob/main/docs/user-guide/component-maturity.md)
@@ -208,7 +228,7 @@ mkdocs serve

 electripy-studio/
 ├── src/electripy/       # Main package
 │   ├── core/            # Config, logging, errors, typing
-│   ├── concurrency/     # Retry & rate limiting
+│   ├── concurrency/     # Retry, rate limiting & circuit breaker
 │   ├── io/              # JSONL utilities
 │   ├── cli/             # CLI commands
 │   └── ai/              # AI building blocks and product-engineering utilities
@@ -230,7 +250,13 @@ electripy-studio/

 │   ├── eval_assertions/          # pytest-native assertion helpers for LLM outputs
 │   ├── policy_gateway/           # Deterministic pre/post/tool/stream policy decisions
 │   ├── tool_registry/            # Declarative tool definitions and JSON schema
-│   └── agent_collaboration/      # Bounded multi-agent handoff orchestration
+│   ├── agent_collaboration/      # Bounded multi-agent handoff orchestration
+│   ├── fallback_chain.py         # Automatic provider failover
+│   ├── batch_complete.py         # Concurrent LLM fan-out with backpressure
+│   ├── cost_ledger.py            # Thread-safe token cost accumulation
+│   ├── prompt_fingerprint.py     # Deterministic SHA-256 request hashing
+│   ├── json_repair.py            # Fix common LLM JSON breakage
+│   └── sensitive_data_scanner.py # PII & secret detection
 ├── tests/               # Test suite
 ├── docs/                # Documentation
 ├── recipes/             # Example recipes

docs/user-guide/ai-batch-complete.md

Lines changed: 68 additions & 0 deletions

# Batch Complete

`batch_complete()` fans out many LLM requests in parallel with bounded concurrency, an optional progress callback, and per-request error isolation.

## When to use it

- You have 10–10 000 prompts to process and want to maximise throughput without melting your rate limit.
- You need **order-preserving** results — `results[i]` always corresponds to `requests[i]`.
- You want failed requests to capture the exception rather than crash the entire batch.

## Core concepts

| Symbol | Role |
|--------|------|
| `batch_complete()` | Main entry point — keyword-only, returns `list[BatchResult]`. |
| `BatchResult` | Type alias: `LlmResponse \| Exception`. |

## Basic example

```python
from electripy.ai.batch_complete import batch_complete
from electripy.ai.llm_gateway import build_llm_sync_client
from electripy.ai.llm_gateway.domain import LlmRequest, ChatMessage, MessageRole

port = build_llm_sync_client("openai")

requests = [
    LlmRequest(
        model="gpt-4o-mini",
        messages=[ChatMessage(role=MessageRole.USER, content=f"Summarise: {doc}")],
    )
    for doc in documents
]

results = batch_complete(
    port=port,
    requests=requests,
    max_concurrency=5,
    on_progress=lambda done, total: print(f"{done}/{total}"),
)

for r in results:
    if isinstance(r, Exception):
        print(f"FAILED: {r}")
    else:
        print(r.text[:80])
```

## Parameters

| Param | Type | Default | Description |
|-------|------|---------|-------------|
| `port` | `SyncLlmPort` | *(required)* | Any LLM adapter. |
| `requests` | `Sequence[LlmRequest]` | *(required)* | Ordered prompts. |
| `max_concurrency` | `int` | `5` | Max in-flight calls. |
| `timeout` | `float \| None` | `None` | Per-request timeout forwarded to the port. |
| `on_progress` | `Callable[[int, int], None] \| None` | `None` | `(completed, total)` callback. |

## Error handling

Each request is independent. If one fails, the exception is captured in the corresponding result slot — the rest of the batch continues. This means you never lose partial work to one bad prompt.
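The order-preserving, error-isolating fan-out pattern can be sketched with nothing but the standard library. This is an illustrative sketch of the pattern, not ElectriPy's implementation; `sketch_batch_complete` and `flaky` are made-up names:

```python
from concurrent.futures import ThreadPoolExecutor


def sketch_batch_complete(call, requests, max_concurrency=5):
    """Order-preserving fan-out: results[i] is either the return
    value of call(requests[i]) or the exception it raised."""
    def run_one(req):
        try:
            return call(req)
        except Exception as exc:  # capture, don't crash the batch
            return exc

    # pool.map preserves input order regardless of completion order.
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        return list(pool.map(run_one, requests))


def flaky(n):
    if n == 2:
        raise ValueError("bad prompt")
    return n * 10


results = sketch_batch_complete(flaky, [1, 2, 3])
# results[0] == 10, results[1] is a ValueError, results[2] == 30
```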

docs/user-guide/ai-cost-ledger.md

Lines changed: 64 additions & 0 deletions

# Cost Ledger

The **Cost Ledger** tracks LLM token usage and estimated cost in-process with thread-safe accumulation and label-based slicing.

## When to use it

- You want per-tenant, per-model, or per-feature cost visibility without shipping data to a third-party service.
- You need a running total during a batch pipeline or an agent loop.
- You want to set spend alerts or budget guards in calling code.

## Core concepts

| Symbol | Role |
|--------|------|
| `CostLedger` | Thread-safe accumulator with `record()`, `total()`, `by_label()`. |
| `LedgerEntry` | Frozen record: `tokens` + `labels`. |
| `LedgerTotal` | Frozen aggregate: `tokens`, `estimated_cost`, `call_count`. |

## Basic example

```python
from electripy.ai.cost_ledger import CostLedger

ledger = CostLedger(cost_per_1k_tokens=0.002)

# After each LLM call:
ledger.record(tokens=1_500, labels={"tenant": "acme", "model": "gpt-4o-mini"})
ledger.record(tokens=800, labels={"tenant": "acme", "model": "gpt-4o-mini"})
ledger.record(tokens=3_200, labels={"tenant": "globex", "model": "gpt-4o"})

# Global totals
print(ledger.total())
# LedgerTotal(tokens=5500, estimated_cost=0.011, call_count=3)

# Slice by any label dimension
by_tenant = ledger.by_label("tenant")
print(by_tenant["acme"])
# LedgerTotal(tokens=2300, estimated_cost=0.0046, call_count=2)
```

## Multi-dimensional labels

Labels are arbitrary string key-value pairs. Slice by any dimension:

```python
ledger.record(tokens=500, labels={"model": "gpt-4o", "feature": "chat", "env": "prod"})

by_model = ledger.by_label("model")
by_feature = ledger.by_label("feature")
by_env = ledger.by_label("env")
```

## Thread-safety

All mutations are guarded by an internal lock. Multiple threads can call `record()` concurrently — `total()` and `by_label()` always return consistent snapshot aggregates.

## Resetting

Call `ledger.reset()` to clear all entries (for example, between test runs or pipeline stages).
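The lock-guarded accumulation the ledger relies on can be sketched in a few lines. This is an illustrative reimplementation, not ElectriPy's code; `SketchLedger` and its methods are invented for this example:

```python
import threading
from collections import defaultdict


class SketchLedger:
    """Minimal thread-safe token accumulator (illustrative only)."""

    def __init__(self, cost_per_1k_tokens):
        self._rate = cost_per_1k_tokens
        self._lock = threading.Lock()
        self._tokens = 0
        self._by_label = defaultdict(lambda: defaultdict(int))

    def record(self, tokens, labels):
        with self._lock:  # one writer at a time keeps totals consistent
            self._tokens += tokens
            for key, value in labels.items():
                self._by_label[key][value] += tokens

    def tokens_for(self, key, value):
        with self._lock:
            return self._by_label[key][value]

    def total_cost(self):
        with self._lock:
            return self._tokens / 1000 * self._rate


ledger = SketchLedger(cost_per_1k_tokens=0.002)
threads = [
    threading.Thread(target=ledger.record, args=(100, {"tenant": "acme"}))
    for _ in range(50)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
# 50 concurrent record() calls of 100 tokens each: 5000 tokens, cost 0.01
```

Without the lock, concurrent `+=` updates could interleave and drop tokens; with it, the 50-thread run above always lands on the same totals.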

docs/user-guide/ai-fallback-chain.md

Lines changed: 64 additions & 0 deletions

# Fallback Chain

The **Fallback Chain** provides automatic provider failover for LLM calls. Wrap multiple `SyncLlmPort` adapters in a `FallbackChainPort` and the chain tries each provider in order until one succeeds.

## When to use it

- You run multi-provider setups (OpenAI + Anthropic + local) and want seamless failover without retry loops in calling code.
- A primary provider is occasionally rate-limited or down.
- You want to track **which** provider handled each request.

## Core concepts

| Symbol | Role |
|--------|------|
| `FallbackChainPort` | Implements `SyncLlmPort`, wraps N providers in ranked order. |

On success the response carries `metadata["_fallback_provider_index"]` — the zero-based index of the provider that handled the call.

## Basic example

```python
from electripy.ai.fallback_chain import FallbackChainPort
from electripy.ai.llm_gateway import build_llm_sync_client

chain = FallbackChainPort(
    providers=[
        build_llm_sync_client("openai"),
        build_llm_sync_client("anthropic"),
        build_llm_sync_client("ollama"),
    ],
)

response = chain.complete(request)
print(response.metadata["_fallback_provider_index"])  # 0, 1, or 2
```

## Behaviour on failure

- Exceptions from non-final providers are **swallowed** (logged at `DEBUG` level).
- If **all** providers fail, the exception from the **last** provider is re-raised — giving you a clear error from the final fallback.

## Combining with other utilities

```python
from electripy.concurrency.circuit_breaker import CircuitBreaker

# Wrap individual providers in circuit breakers, then chain them.
cb_openai = CircuitBreaker(failure_threshold=3, recovery_timeout=30.0)
cb_anthropic = CircuitBreaker(failure_threshold=3, recovery_timeout=30.0)

chain = FallbackChainPort(
    providers=[
        cb_openai(openai_adapter.complete),
        cb_anthropic(anthropic_adapter.complete),
    ],
)
```

docs/user-guide/ai-json-repair.md

Lines changed: 66 additions & 0 deletions

# JSON Repair

`json_repair()` fixes the most common JSON breakage patterns produced by LLMs and returns a parsed `dict` in one call.

## When to use it

- An LLM returns JSON wrapped in markdown fences, with trailing commas, single-quoted keys, or gets cut off mid-object by token limits.
- You want a single function that handles all of these cases without chaining regex hacks yourself.

## Repair strategies (applied in order)

1. **Strip markdown fences** — `` ```json … ``` ``.
2. **Extract the outermost `{…}` block** from surrounding prose.
3. **Remove trailing commas** before `}` or `]`.
4. **Replace single-quoted strings** with double quotes.
5. **Quote bare (unquoted) keys** — JavaScript-style `name:` → `"name":`.
6. **Fix mismatched brackets** — inserts a missing `]` or `}` when a closer matches a deeper bracket (e.g. `{"items": [1,2,3}` → `{"items": [1,2,3]}`).
7. **Close truncated JSON** — appends missing braces/brackets for objects that were cut off by token limits.

## Basic example

```python
from electripy.ai.json_repair import json_repair

text = '''Here is the result:
```json
{"name": "Alice", "age": 30,}
```'''

data = json_repair(text)
print(data)  # {"name": "Alice", "age": 30}
```

## Raw string variant

If you need the repaired JSON as a string (e.g. for logging or storage) rather than a parsed dict:

```python
from electripy.ai.json_repair import json_repair_raw

raw = json_repair_raw(text)
print(type(raw))  # <class 'str'>
```

## Truncated JSON

Token limits frequently cut off LLM output mid-object. `json_repair` handles this automatically:

```python
data = json_repair('{"users": [{"name": "Alice"')
# {"users": [{"name": "Alice"}]}
```

## Error handling

If no JSON object can be recovered at all, `ValueError` is raised. The error message includes the first 200 characters of the input for debugging.
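Two of the simpler strategies (fence stripping and trailing-comma removal) can be sketched with the standard library. This is a minimal illustration of the approach, not ElectriPy's implementation; `sketch_repair` is an invented name, and the fence marker is built with string repetition only to avoid literal backtick runs inside this example:

```python
import json
import re

TICKS = "`" * 3  # markdown fence marker


def sketch_repair(text):
    """Strip a markdown code fence and drop trailing commas, then parse."""
    # 1. If a fenced block is present, keep only its body.
    fence = re.search(TICKS + r"(?:json)?\s*(.*?)" + TICKS, text, re.DOTALL)
    if fence:
        text = fence.group(1)
    # 2. Remove trailing commas immediately before a closing brace/bracket.
    text = re.sub(r",\s*([}\]])", r"\1", text)
    return json.loads(text)


broken = "Result:\n" + TICKS + 'json\n{"name": "Alice", "age": 30,}\n' + TICKS
print(sketch_repair(broken))  # {'name': 'Alice', 'age': 30}
```

The real `json_repair()` layers five more strategies on top (prose extraction, quote normalisation, bare keys, bracket fixes, truncation), but each follows this same shape: a targeted transform applied before the final `json.loads`.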
