Commit 94da594

feat: add 7 Tier 1 daily-driver utilities (v0.3.0)
New utilities:

- FallbackChainPort — automatic LLM provider failover
- batch_complete() — concurrent LLM fan-out with backpressure
- CostLedger — thread-safe token cost tracking with label slicing
- prompt_fingerprint() — deterministic SHA-256 request hashing
- json_repair() — fix 7 common LLM JSON breakage patterns
- CircuitBreaker — closed/open/half_open FSM for failure protection
- SensitiveDataScanner — PII & secret detection (9 built-in patterns)

Also includes:

- 82 new tests (351 total), all passing
- 7 new user guide pages + mkdocs nav update
- Updated README with features, doc links, project structure
- Full quality gate: ruff, black, mypy, pytest all green
1 parent 161b110 commit 94da594

27 files changed (+2312, −5 lines)

CHANGELOG.md

Lines changed: 14 additions & 0 deletions
@@ -7,6 +7,20 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ## [Unreleased]

+## [0.3.0] — 2026-03-25
+
+### Added
+
+- **Fallback Chain** — automatic provider failover across ranked `SyncLlmPort` adapters with metadata tracking (`_fallback_provider_index`).
+- **Batch Complete** — fan-out N LLM requests with bounded concurrency (`ThreadPoolExecutor`), order-preserving results, per-request error isolation, and progress callbacks.
+- **Cost Ledger** — thread-safe token cost accumulation with multi-dimensional label slicing (`by_label`), estimated cost calculation, and snapshot/reset support.
+- **Prompt Fingerprint** — deterministic SHA-256 request hashing (compatible with the LLM Cache key algorithm) with full and short digest variants.
+- **JSON Repair** — fix 7 common LLM JSON breakage patterns: markdown fences, JSON embedded in prose, trailing commas, single quotes, unquoted keys, mismatched brackets, and truncated JSON.
+- **Circuit Breaker** — closed→open→half_open FSM for cascading failure protection with configurable thresholds, decorator support, and thread-safe state transitions.
+- **Sensitive Data Scanner** — regex-based PII and secret detection with 9 built-in patterns (including email, phone, SSN, credit card, API keys, AWS keys, IPv4), extensible via `add_pattern()`.
+- User guide documentation for all seven new components.
+- 82 new tests (total suite now at 351).
+
 ## [0.2.0] — 2026-03-25

 ### Added

README.md

Lines changed: 29 additions & 3 deletions
@@ -46,6 +46,13 @@ ElectriPy is **not** a framework — it's a composable toolkit of production-gra

 - **Maturity**: Early alpha (APIs may still evolve), but core components, CLI, concurrency primitives, and a growing suite of AI product engineering utilities are in place.
 - **Versioning**: SemVer begins at `v0.x` — expect breaking changes until `v1.0`.
 - **Recent highlights**:
+  - Added **Fallback Chain** — automatic provider failover across ranked `SyncLlmPort` adapters with metadata tracking.
+  - Added **Batch Complete** — fan-out N LLM requests with bounded concurrency, order-preserving results, and per-request error isolation.
+  - Added **Cost Ledger** — thread-safe token cost accumulation with per-label slicing (tenant, model, feature).
+  - Added **Prompt Fingerprint** — deterministic SHA-256 request hashing for caching, dedup, and audit trails.
+  - Added **JSON Repair** — fix markdown fences, trailing commas, single quotes, unquoted keys, mismatched brackets, and truncated JSON in one call.
+  - Added **Circuit Breaker** — closed→open→half_open FSM protecting against cascading provider failures.
+  - Added **Sensitive Data Scanner** — regex-based PII and secret detection (email, phone, SSN, API keys, AWS keys) with extensible patterns.
   - Added a **Structured Output Engine** — extract typed Pydantic models from LLM text with auto-retry and temperature decay.
   - Added an **LLM Caching Layer** — pluggable response caching (in-memory LRU, SQLite WAL) with hit-rate tracking.
   - Added an **LLM Replay Tape** — record, replay, and diff LLM interactions for deterministic offline tests.
@@ -61,7 +68,7 @@ ElectriPy is **not** a framework — it's a composable toolkit of production-gra

 ## Features

 - 🔧 **Core Components**: Configuration, logging, error handling, and type utilities
-- ⚡ **Concurrency**: Retry mechanisms (sync/async) and async token bucket rate limiter
+- ⚡ **Concurrency**: Retry mechanisms (sync/async), async token bucket rate limiter, and circuit breaker for cascading failure protection
 - 📁 **I/O**: JSONL read/write utilities for efficient data processing
 - 💻 **CLI**: Typer-based command-line interface with health checks, RAG eval runner, and an offline demo showcase (`electripy demo policy-collab`)
 - 🤖 **AI building blocks**: Provider-agnostic LLM Gateway with sync/async clients, request/response policy hooks, structured-output helpers, and a RAG Evaluation Runner for retrieval benchmarking.
@@ -73,6 +80,12 @@ ElectriPy is **not** a framework — it's a composable toolkit of production-gra

 - 📊 **AI Telemetry**: Provider-agnostic telemetry primitives and adapters (JSONL, optional OpenTelemetry) for HTTP resilience, LLM gateway, policy decisions, and RAG evaluation runs.
 - 🧠 **AI product engineering utilities**: Streaming chat primitives, deterministic agent runtime helpers, RAG quality/drift metrics, grounding checks for hallucination reduction, response robustness helpers for structured outputs, prompt templating and composition, token budget tracking and truncation, priority-based context window assembly, rule-based model routing, sliding-window conversation memory, and a declarative tool registry with JSON schema generation.
 - 🛡️ **AI policy and collaboration runtime**: Deterministic policy gateway checks for preflight/postflight/stream/tool flows, plus bounded agent-to-agent collaboration runtime for specialist orchestration patterns.
+- 🔗 **Fallback Chain**: Automatic provider failover — tries ranked LLM adapters in order with metadata tracking.
+- 📦 **Batch Complete**: Fan-out N LLM requests with bounded concurrency, order-preserving results, and per-request error isolation.
+- 💰 **Cost Ledger**: Thread-safe token cost accumulation with multi-dimensional label slicing.
+- 🔑 **Prompt Fingerprint**: Deterministic SHA-256 request hashing for caching, dedup, and drift detection.
+- 🔧 **JSON Repair**: Fix 7 common LLM JSON breakage patterns (fences, prose extraction, trailing commas, single quotes, unquoted keys, mismatched brackets, truncation) in one call.
+- 🔒 **Sensitive Data Scanner**: Regex-based PII and secret detection with 9 built-in patterns and extensible custom rules.

 ## Quick Start
@@ -179,6 +192,13 @@ Full documentation is available in the [docs/](https://github.com/inference-stac

 - [LLM Replay Tape](https://github.com/inference-stack-llc/electripy-studio/blob/main/docs/user-guide/ai-replay-tape.md)
 - [Eval Assertions](https://github.com/inference-stack-llc/electripy-studio/blob/main/docs/user-guide/ai-eval-assertions.md)
 - [Provider Adapters (Anthropic, Ollama)](https://github.com/inference-stack-llc/electripy-studio/blob/main/docs/user-guide/ai-provider-adapters.md)
+- [Fallback Chain](https://github.com/inference-stack-llc/electripy-studio/blob/main/docs/user-guide/ai-fallback-chain.md)
+- [Batch Complete](https://github.com/inference-stack-llc/electripy-studio/blob/main/docs/user-guide/ai-batch-complete.md)
+- [Cost Ledger](https://github.com/inference-stack-llc/electripy-studio/blob/main/docs/user-guide/ai-cost-ledger.md)
+- [Prompt Fingerprint](https://github.com/inference-stack-llc/electripy-studio/blob/main/docs/user-guide/ai-prompt-fingerprint.md)
+- [JSON Repair](https://github.com/inference-stack-llc/electripy-studio/blob/main/docs/user-guide/ai-json-repair.md)
+- [Sensitive Data Scanner](https://github.com/inference-stack-llc/electripy-studio/blob/main/docs/user-guide/ai-sensitive-data-scanner.md)
+- [Circuit Breaker](https://github.com/inference-stack-llc/electripy-studio/blob/main/docs/user-guide/circuit-breaker.md)
 - [RAG Evaluation Runner](https://github.com/inference-stack-llc/electripy-studio/blob/main/docs/user-guide/ai-rag-eval-runner.md)
 - [AI Product Engineering Utilities](https://github.com/inference-stack-llc/electripy-studio/blob/main/docs/user-guide/ai-product-engineering.md)
 - [Component Maturity Model](https://github.com/inference-stack-llc/electripy-studio/blob/main/docs/user-guide/component-maturity.md)
@@ -208,7 +228,7 @@ mkdocs serve

 electripy-studio/
 ├── src/electripy/       # Main package
 │   ├── core/            # Config, logging, errors, typing
-│   ├── concurrency/     # Retry & rate limiting
+│   ├── concurrency/     # Retry, rate limiting & circuit breaker
 │   ├── io/              # JSONL utilities
 │   ├── cli/             # CLI commands
 │   └── ai/              # AI building blocks and product-engineering utilities
@@ -230,7 +250,13 @@ electripy-studio/

 │   ├── eval_assertions/          # pytest-native assertion helpers for LLM outputs
 │   ├── policy_gateway/           # Deterministic pre/post/tool/stream policy decisions
 │   ├── tool_registry/            # Declarative tool definitions and JSON schema
-│   └── agent_collaboration/      # Bounded multi-agent handoff orchestration
+│   ├── agent_collaboration/      # Bounded multi-agent handoff orchestration
+│   ├── fallback_chain.py         # Automatic provider failover
+│   ├── batch_complete.py         # Concurrent LLM fan-out with backpressure
+│   ├── cost_ledger.py            # Thread-safe token cost accumulation
+│   ├── prompt_fingerprint.py     # Deterministic SHA-256 request hashing
+│   ├── json_repair.py            # Fix common LLM JSON breakage
+│   └── sensitive_data_scanner.py # PII & secret detection
 ├── tests/               # Test suite
 ├── docs/                # Documentation
 ├── recipes/             # Example recipes

docs/user-guide/ai-batch-complete.md

Lines changed: 68 additions & 0 deletions

# Batch Complete

`batch_complete()` fans out many LLM requests in parallel with bounded concurrency, an optional progress callback, and per-request error isolation.

## When to use it

- You have 10–10 000 prompts to process and want to maximise throughput without melting your rate limit.
- You need **order-preserving** results — `results[i]` always corresponds to `requests[i]`.
- You want failed requests to capture the exception rather than crash the entire batch.

## Core concepts

| Symbol | Role |
|--------|------|
| `batch_complete()` | Main entry point — keyword-only, returns `list[BatchResult]`. |
| `BatchResult` | Type alias: `LlmResponse \| Exception`. |

## Basic example

```python
from electripy.ai.batch_complete import batch_complete
from electripy.ai.llm_gateway import build_llm_sync_client
from electripy.ai.llm_gateway.domain import LlmRequest, ChatMessage, MessageRole

port = build_llm_sync_client("openai")

requests = [
    LlmRequest(
        model="gpt-4o-mini",
        messages=[ChatMessage(role=MessageRole.USER, content=f"Summarise: {doc}")],
    )
    for doc in documents
]

results = batch_complete(
    port=port,
    requests=requests,
    max_concurrency=5,
    on_progress=lambda done, total: print(f"{done}/{total}"),
)

for r in results:
    if isinstance(r, Exception):
        print(f"FAILED: {r}")
    else:
        print(r.text[:80])
```

## Parameters

| Param | Type | Default | Description |
|-------|------|---------|-------------|
| `port` | `SyncLlmPort` | *(required)* | Any LLM adapter. |
| `requests` | `Sequence[LlmRequest]` | *(required)* | Ordered prompts. |
| `max_concurrency` | `int` | `5` | Max in-flight calls. |
| `timeout` | `float \| None` | `None` | Per-request timeout forwarded to the port. |
| `on_progress` | `Callable[[int, int], None] \| None` | `None` | `(completed, total)` callback. |

## Error handling

Each request is independent. If one fails, the exception is captured in the corresponding result slot — the rest of the batch continues. This means you never lose partial work to one bad prompt.
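The order-preserving, error-isolating fan-out pattern can be sketched with nothing but the standard library. This is an illustrative sketch of the pattern, not ElectriPy's implementation; `sketch_batch_complete` and `flaky` are made-up names:

```python
from concurrent.futures import ThreadPoolExecutor


def sketch_batch_complete(call, requests, max_concurrency=5):
    """Order-preserving fan-out: results[i] is either the return
    value of call(requests[i]) or the exception it raised."""
    def run_one(req):
        try:
            return call(req)
        except Exception as exc:  # capture, don't crash the batch
            return exc

    # pool.map preserves input order regardless of completion order.
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        return list(pool.map(run_one, requests))


def flaky(n):
    if n == 2:
        raise ValueError("bad prompt")
    return n * 10


results = sketch_batch_complete(flaky, [1, 2, 3])
# results[0] == 10, results[1] is a ValueError, results[2] == 30
```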

docs/user-guide/ai-cost-ledger.md

Lines changed: 64 additions & 0 deletions

# Cost Ledger

The **Cost Ledger** tracks LLM token usage and estimated cost in-process with thread-safe accumulation and label-based slicing.

## When to use it

- You want per-tenant, per-model, or per-feature cost visibility without shipping data to a third-party service.
- You need a running total during a batch pipeline or an agent loop.
- You want to set spend alerts or budget guards in calling code.

## Core concepts

| Symbol | Role |
|--------|------|
| `CostLedger` | Thread-safe accumulator with `record()`, `total()`, `by_label()`. |
| `LedgerEntry` | Frozen record: `tokens` + `labels`. |
| `LedgerTotal` | Frozen aggregate: `tokens`, `estimated_cost`, `call_count`. |

## Basic example

```python
from electripy.ai.cost_ledger import CostLedger

ledger = CostLedger(cost_per_1k_tokens=0.002)

# After each LLM call:
ledger.record(tokens=1_500, labels={"tenant": "acme", "model": "gpt-4o-mini"})
ledger.record(tokens=800, labels={"tenant": "acme", "model": "gpt-4o-mini"})
ledger.record(tokens=3_200, labels={"tenant": "globex", "model": "gpt-4o"})

# Global totals
print(ledger.total())
# LedgerTotal(tokens=5500, estimated_cost=0.011, call_count=3)

# Slice by any label dimension
by_tenant = ledger.by_label("tenant")
print(by_tenant["acme"])
# LedgerTotal(tokens=2300, estimated_cost=0.0046, call_count=2)
```

## Multi-dimensional labels

Labels are arbitrary string key-value pairs. Slice by any dimension:

```python
ledger.record(tokens=500, labels={"model": "gpt-4o", "feature": "chat", "env": "prod"})

by_model = ledger.by_label("model")
by_feature = ledger.by_label("feature")
by_env = ledger.by_label("env")
```

## Thread-safety

All mutations are guarded by an internal lock. Multiple threads can call `record()` concurrently — `total()` and `by_label()` always return consistent snapshot aggregates.

## Resetting

Call `ledger.reset()` to clear all entries (for example, between test runs or pipeline stages).
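The lock-guarded accumulation the ledger relies on can be sketched in a few lines. This is an illustrative reimplementation, not ElectriPy's code; `SketchLedger` and its methods are invented for this example:

```python
import threading
from collections import defaultdict


class SketchLedger:
    """Minimal thread-safe token accumulator (illustrative only)."""

    def __init__(self, cost_per_1k_tokens):
        self._rate = cost_per_1k_tokens
        self._lock = threading.Lock()
        self._tokens = 0
        self._by_label = defaultdict(lambda: defaultdict(int))

    def record(self, tokens, labels):
        with self._lock:  # one writer at a time keeps totals consistent
            self._tokens += tokens
            for key, value in labels.items():
                self._by_label[key][value] += tokens

    def tokens_for(self, key, value):
        with self._lock:
            return self._by_label[key][value]

    def total_cost(self):
        with self._lock:
            return self._tokens / 1000 * self._rate


ledger = SketchLedger(cost_per_1k_tokens=0.002)
threads = [
    threading.Thread(target=ledger.record, args=(100, {"tenant": "acme"}))
    for _ in range(50)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
# 50 concurrent record() calls of 100 tokens each: 5000 tokens, cost 0.01
```

Without the lock, concurrent `+=` updates could interleave and drop tokens; with it, the 50-thread run above always lands on the same totals.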

docs/user-guide/ai-fallback-chain.md

Lines changed: 64 additions & 0 deletions

# Fallback Chain

The **Fallback Chain** provides automatic provider failover for LLM calls. Wrap multiple `SyncLlmPort` adapters in a `FallbackChainPort` and the chain tries each provider in order until one succeeds.

## When to use it

- You run multi-provider setups (OpenAI + Anthropic + local) and want seamless failover without retry loops in calling code.
- A primary provider is occasionally rate-limited or down.
- You want to track **which** provider handled each request.

## Core concepts

| Symbol | Role |
|--------|------|
| `FallbackChainPort` | Implements `SyncLlmPort`, wraps N providers in ranked order. |

On success the response carries `metadata["_fallback_provider_index"]` — the zero-based index of the provider that handled the call.

## Basic example

```python
from electripy.ai.fallback_chain import FallbackChainPort
from electripy.ai.llm_gateway import build_llm_sync_client

chain = FallbackChainPort(
    providers=[
        build_llm_sync_client("openai"),
        build_llm_sync_client("anthropic"),
        build_llm_sync_client("ollama"),
    ],
)

response = chain.complete(request)
print(response.metadata["_fallback_provider_index"])  # 0, 1, or 2
```

## Behaviour on failure

- Exceptions from non-final providers are **swallowed** (logged at `DEBUG` level).
- If **all** providers fail, the exception from the **last** provider is re-raised — giving you a clear error from the final fallback.

## Combining with other utilities

```python
from electripy.concurrency.circuit_breaker import CircuitBreaker

# Wrap individual providers in circuit breakers, then chain them.
cb_openai = CircuitBreaker(failure_threshold=3, recovery_timeout=30.0)
cb_anthropic = CircuitBreaker(failure_threshold=3, recovery_timeout=30.0)

chain = FallbackChainPort(
    providers=[
        cb_openai(openai_adapter.complete),
        cb_anthropic(anthropic_adapter.complete),
    ],
)
```

docs/user-guide/ai-json-repair.md

Lines changed: 66 additions & 0 deletions

# JSON Repair

`json_repair()` fixes the most common JSON breakage patterns produced by LLMs and returns a parsed `dict` in one call.

## When to use it

- An LLM returns JSON wrapped in markdown fences, with trailing commas, single-quoted keys, or gets cut off mid-object by token limits.
- You want a single function that handles all of these cases without chaining regex hacks yourself.

## Repair strategies (applied in order)

1. **Strip markdown fences** — `` ```json … ``` ``.
2. **Extract the outermost `{…}` block** from surrounding prose.
3. **Remove trailing commas** before `}` or `]`.
4. **Replace single-quoted strings** with double quotes.
5. **Quote bare (unquoted) keys** — JavaScript-style `name:` → `"name":`.
6. **Fix mismatched brackets** — inserts a missing `]` or `}` when a closer matches a deeper bracket (e.g. `{"items": [1,2,3}` → `{"items": [1,2,3]}`).
7. **Close truncated JSON** — appends missing braces/brackets for objects that were cut off by token limits.

## Basic example

```python
from electripy.ai.json_repair import json_repair

text = '''Here is the result:
```json
{"name": "Alice", "age": 30,}
```'''

data = json_repair(text)
print(data)  # {"name": "Alice", "age": 30}
```

## Raw string variant

If you need the repaired JSON as a string (e.g. for logging or storage) rather than a parsed dict:

```python
from electripy.ai.json_repair import json_repair_raw

raw = json_repair_raw(text)
print(type(raw))  # <class 'str'>
```

## Truncated JSON

Token limits frequently cut off LLM output mid-object. `json_repair` handles this automatically:

```python
data = json_repair('{"users": [{"name": "Alice"')
# {"users": [{"name": "Alice"}]}
```

## Error handling

If no JSON object can be recovered at all, `ValueError` is raised. The error message includes the first 200 characters of the input for debugging.
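Two of the simpler strategies (fence stripping and trailing-comma removal) can be sketched with the standard library. This is a minimal illustration of the approach, not ElectriPy's implementation; `sketch_repair` is an invented name, and the fence marker is built with string repetition only to avoid literal backtick runs inside this example:

```python
import json
import re

TICKS = "`" * 3  # markdown fence marker


def sketch_repair(text):
    """Strip a markdown code fence and drop trailing commas, then parse."""
    # 1. If a fenced block is present, keep only its body.
    fence = re.search(TICKS + r"(?:json)?\s*(.*?)" + TICKS, text, re.DOTALL)
    if fence:
        text = fence.group(1)
    # 2. Remove trailing commas immediately before a closing brace/bracket.
    text = re.sub(r",\s*([}\]])", r"\1", text)
    return json.loads(text)


broken = "Result:\n" + TICKS + 'json\n{"name": "Alice", "age": 30,}\n' + TICKS
print(sketch_repair(broken))  # {'name': 'Alice', 'age': 30}
```

The real `json_repair()` layers five more strategies on top (prose extraction, quote normalisation, bare keys, bracket fixes, truncation), but each follows this same shape: a targeted transform applied before the final `json.loads`.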
