Last updated: 2026-04-26
Purpose: Transfer complete project context to a new developer or agent with zero knowledge loss.
Read time: ~20 minutes for full comprehension.
- Project Overview
- Architecture
- Repository Structure
- Module-by-Module Breakdown
- Configuration System
- How a Call Flows End-to-End
- Dependencies and Versions
- Environment and Infrastructure
- Testing
- Security Measures
- Key Design Decisions and Rationale
- Known Issues and Technical Debt
- Planned Future Work
- Cost Profile
- Git History
- Quick Start for New Developers
- Troubleshooting and Gotchas
AIReceptionist is a voice-based AI phone receptionist that answers incoming calls for businesses, provides information from a configurable FAQ list, checks business hours, transfers calls to departments, and takes messages when staff are unavailable. It speaks and listens using real-time speech-to-speech AI -- callers talk to it like a human receptionist.
- It is not a chatbot or text-based system (though a web widget channel is planned).
- It is not a general-purpose voice assistant -- it is scoped to receptionist duties for a specific business.
- Call recording and transcripts are supported as of the 2026-04-23 refactor (see addendum at bottom of this document).
| Layer | Technology |
|---|---|
| Voice AI | OpenAI Realtime API (speech-to-speech) |
| Audio Transport | LiveKit Agents SDK |
| Telephony | LiveKit SIP (connects to phone numbers) |
| Configuration | YAML files + Pydantic v2 validation |
| Message Storage | JSON files on disk (webhook planned) |
| Language | Python 3.14.2 (see compatibility notes) |
A phone call arrives via a SIP trunk connected to LiveKit Cloud. LiveKit dispatches the call to this agent process. The agent loads the appropriate business configuration (YAML file), builds a system prompt describing the business, and connects to the OpenAI Realtime API for speech-to-speech conversation. The caller's audio streams to OpenAI, which generates spoken responses in real time. The agent has function tools (lookup FAQ, check hours, transfer call, take message) that the LLM can invoke during conversation. Messages are saved as JSON files. Call transfers use the LiveKit SIP transfer API.
  PSTN / SIP Trunk
          |
          v
+-------------------+
|   LiveKit Cloud   |
|    SIP Gateway    |
+-------------------+
          |
          v
+-------------------+
|  LiveKit Agents   |  <-- This project
|    AgentServer    |
+-------------------+
      |          |
      v          v
+-------------+ +------------------+
| Business    | | OpenAI Realtime  |
| Config YAML | | API (voice LLM)  |
+-------------+ +------------------+
          |
          v
+-------------------+
|  Message Storage  |
|   (JSON files)    |
+-------------------+
- AgentServer (`receptionist/agent.py`): Entry point. Accepts incoming LiveKit sessions, loads config, creates the AI agent session.
- Receptionist (`receptionist/agent.py`): The agent class. Defines the personality, greeting, and all tool functions the LLM can call.
- BusinessConfig (`receptionist/config.py`): Pydantic models that validate and structure all business-specific settings loaded from YAML.
- build_system_prompt (`receptionist/prompts.py`): Converts a BusinessConfig into the natural-language system prompt that instructs the LLM how to behave.
- save_message (`receptionist/messages.py`): Persists caller messages to disk (or, in the future, to a webhook endpoint).
One running agent process can serve multiple businesses. The routing works as follows:
- An incoming call arrives with job metadata containing a `"config"` key (e.g., `"example-dental"`).
- The agent loads `config/businesses/example-dental.yaml`.
- If no metadata is provided, it falls back to the first YAML file found in `config/businesses/`.
- Each business gets its own system prompt, FAQs, hours, routing, and message directory.
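The routing rule above can be sketched in a few lines. This is a minimal illustration, not the project's `load_business_config`: the function name `pick_config_path` is hypothetical, "first YAML file found" is assumed to mean alphabetical order, and config-name validation is left out for brevity.

```python
from pathlib import Path
from typing import Optional


def pick_config_path(metadata: Optional[dict], config_dir: Path) -> Path:
    """Return the YAML path named in job metadata, or the first YAML file found."""
    name = (metadata or {}).get("config")
    if name:
        return config_dir / f"{name}.yaml"
    # Assumption: "first" means first in sorted (alphabetical) order.
    candidates = sorted(config_dir.glob("*.yaml"))
    if not candidates:
        raise FileNotFoundError(f"no business configs found in {config_dir}")
    return candidates[0]
```

Sorting makes the fallback deterministic across operating systems, since directory listing order is otherwise not guaranteed.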
AIReceptionist/
├── README.md                      # Setup guide and configuration reference
├── HANDOFF.md                     # THIS FILE -- full project context
├── pyproject.toml                 # Project metadata, dependencies, tool config
├── .env.example                   # Template for required environment variables
├── .gitignore                     # Standard Python + project-specific ignores
│
├── receptionist/                  # Main application package
│   ├── __init__.py                # Package marker (empty or minimal)
│   ├── agent.py (177 lines)       # Agent server, session handler, Receptionist class
│   ├── config.py (101 lines)      # Pydantic v2 models, YAML loading, validation
│   ├── prompts.py (63 lines)      # System prompt builder from BusinessConfig
│   └── messages.py (55 lines)     # Message dataclass, file/webhook save logic
│
├── config/
│   └── businesses/
│       └── example-dental.yaml    # Example business configuration file
│
├── tests/
│   ├── test_config.py (6 tests)   # YAML parsing, validation, edge cases
│   ├── test_prompts.py (6 tests)  # Prompt content verification
│   └── test_messages.py (3 tests) # File creation, multiple messages, directory creation
│
├── docs/
│   └── plans/
│       ├── 2026-03-02-ai-receptionist-design.md
│       └── 2026-03-02-ai-receptionist-implementation.md
│
└── messages/                      # Runtime message storage (gitignored)
The messages/ directory is gitignored because it contains runtime data (caller messages saved as JSON files). It is created automatically when the first message is saved.
This module defines the entire configuration schema using Pydantic v2 models.
Models (in dependency order):
| Model | Fields | Notes |
|---|---|---|
| `BusinessInfo` | `name: str`, `type: str`, `timezone: str` | Core identity. `timezone` must be a valid IANA timezone string (e.g., `"America/New_York"`). |
| `VoiceConfig` | `voice_id: str` (default `"coral"`), `model: str` (default `"gpt-realtime"`) | OpenAI Realtime voice and model selection. Model can be `"gpt-realtime"` (latest) or `"gpt-4o-realtime-preview"` (original). |
| `DayHours` | `open: str`, `close: str` | Both validated by a regex (beginning `^([01]\d`) that enforces 24-hour `HH:MM` format. |
| `WeeklyHours` | 7 fields: `monday` through `sunday`, each `Optional[DayHours]` | A `field_validator` converts the string `"closed"` to `None` for any day. |
| `RoutingEntry` | `name: str`, `number: str`, `description: str` | Represents a department the agent can transfer calls to. |
| `FAQEntry` | `question: str`, `answer: str` | Single FAQ pair. |
| `DeliveryMethod` | Enum: `"file"`, `"webhook"` | How messages are delivered. |
| `MessagesConfig` | `delivery: DeliveryMethod`, `file_path: Optional[str]`, `webhook_url: Optional[str]` | A `model_validator` enforces that `file_path` is required when `delivery=file` and `webhook_url` is required when `delivery=webhook`. |
| `BusinessConfig` | All of the above as nested fields | Top-level model. Has a `from_yaml_string()` classmethod for parsing raw YAML. |
Key functions:
- `BusinessConfig.from_yaml_string(yaml_str) -> BusinessConfig`: Parses a YAML string using `yaml.safe_load` and validates it through Pydantic.
- `load_config(path: str) -> BusinessConfig`: Reads a YAML file with explicit UTF-8 encoding and returns a validated `BusinessConfig`.
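The three validation rules described for `DayHours`, `WeeklyHours`, and `MessagesConfig` can be illustrated without Pydantic. The sketch below is a plain-Python approximation, not the project's models: the function names are hypothetical, and the `HH:MM` pattern is an assumed one rather than the regex actually used in `config.py`.

```python
import re
from typing import Optional

# Assumed pattern for 24-hour HH:MM; the project's own regex may differ.
HHMM = re.compile(r"^([01]\d|2[0-3]):[0-5]\d$")


def normalize_day(value):
    """Map "closed" (or None) to None; otherwise validate the open/close times."""
    if value is None or value == "closed":
        return None
    for key in ("open", "close"):
        if not HHMM.match(value[key]):
            raise ValueError(f"{key} must be HH:MM, got {value[key]!r}")
    return {"open": value["open"], "close": value["close"]}


def check_messages(delivery: str, file_path: Optional[str] = None,
                   webhook_url: Optional[str] = None) -> None:
    """Cross-field rule: each delivery method requires its own target."""
    if delivery not in ("file", "webhook"):
        raise ValueError(f"unknown delivery method: {delivery!r}")
    if delivery == "file" and not file_path:
        raise ValueError("file_path is required when delivery is 'file'")
    if delivery == "webhook" and not webhook_url:
        raise ValueError("webhook_url is required when delivery is 'webhook'")
```

In the real models these checks would live in a `field_validator` and a `model_validator`, so they run automatically whenever a YAML file is parsed.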
Single function: build_system_prompt(config: BusinessConfig) -> str
This function constructs the full natural-language system prompt that the OpenAI Realtime LLM uses to guide its behavior. The prompt includes:
- Business identity: "You are the AI receptionist for {name}, a {type}."
- Personality instructions: Warm, professional, concise.
- Weekly hours schedule: Formatted day-by-day from config, including which days are closed.
- After-hours message: What to say when the business is closed.
- Routing departments: List of departments the agent can transfer to, with descriptions.
- Tool usage instructions: When and how to use each function tool.
- FAQ list: All question-answer pairs, so the LLM can answer them directly.
- Behavioral rules: Stay concise, never fabricate information, confirm before transferring, show empathy.
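To make the prompt structure concrete, here is a reduced sketch of how such a builder might assemble its sections. It takes a plain dict instead of a `BusinessConfig`, and the function name `build_prompt_sketch`, the section labels, and the exact wording are all assumptions made for illustration.

```python
DAYS = ["monday", "tuesday", "wednesday", "thursday",
        "friday", "saturday", "sunday"]


def build_prompt_sketch(config: dict) -> str:
    """Assemble identity, hours, routing, and FAQ sections into one prompt string."""
    lines = [
        f"You are the AI receptionist for {config['name']}, a {config['type']}.",
        f"Personality: {config['personality']}.",
        "Weekly hours:",
    ]
    for day in DAYS:
        hours = config["hours"].get(day)  # None means the business is closed
        label = "closed" if hours is None else f"{hours['open']} to {hours['close']}"
        lines.append(f"- {day.capitalize()}: {label}")
    lines.append(f"When closed, say: {config['after_hours_message']}")
    lines.append("Departments you can transfer to:")
    for entry in config["routing"]:
        lines.append(f"- {entry['name']}: {entry['description']}")
    lines.append("Frequently asked questions:")
    for faq in config["faq"]:
        lines.append(f"Q: {faq['question']} A: {faq['answer']}")
    return "\n".join(lines)
```

Keeping the builder a pure function of the config is what lets the prompt tests assert on substrings without any LiveKit or OpenAI dependency.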
Dataclass: Message
- `caller_name: str`
- `callback_number: str`
- `message: str`
- `business_name: str`
- `timestamp: str` -- Automatically set to current UTC time in ISO 8601 format.
Functions:
- `save_message(message: Message, config: MessagesConfig)`: Dispatches to `_save_to_file()` or `_send_webhook()` based on `config.delivery`.
- `_save_to_file(message, file_path)`: Creates the directory if needed, writes the message as a JSON file. Filename format uses UTC timestamp with microseconds to avoid collisions (e.g., `2026-03-02T14-30-00-123456.json`).
- `_send_webhook(message, webhook_url)`: Stubbed -- raises `NotImplementedError`. This is a known gap.
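The file-delivery path described above could look roughly like the following. This is a sketch under assumptions, not the contents of `messages.py`: the helper names `save_to_file` and `_utc_now_iso` are hypothetical, and the `strftime` pattern is inferred from the example filename.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from pathlib import Path


def _utc_now_iso() -> str:
    """Current UTC time in ISO 8601 format."""
    return datetime.now(timezone.utc).isoformat()


@dataclass
class Message:
    caller_name: str
    callback_number: str
    message: str
    business_name: str
    timestamp: str = field(default_factory=_utc_now_iso)


def save_to_file(message: Message, file_path: str) -> Path:
    """Write one message as JSON; the microsecond stamp keeps filenames unique."""
    directory = Path(file_path)
    directory.mkdir(parents=True, exist_ok=True)  # created on first write
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H-%M-%S-%f")
    target = directory / f"{stamp}.json"
    target.write_text(json.dumps(asdict(message), indent=2), encoding="utf-8")
    return target
```

Colons are replaced with hyphens in the filename because colons are not valid in Windows file names, which matters for the Windows development environment this project uses.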
This is the largest and most important module. It ties everything together.
Top-level functions:
- `load_business_config(ctx)`: Determines which business config to load.
  - Checks `ctx.job.metadata` for a `"config"` key.
  - Validates the config name matches `^[a-zA-Z0-9_-]+$` (path traversal protection).
  - Loads `config/businesses/{config_name}.yaml`.
  - If no metadata, falls back to the first `.yaml` file in `config/businesses/`.
- `_get_caller_identity(ctx)`: Iterates over room participants to find the SIP participant. Returns caller identity or `None` with a warning log.
Class: Receptionist(Agent)
This is a LiveKit Agents SDK Agent subclass. It defines:
| Method | Purpose |
|---|---|
| `__init__(config)` | Stores `BusinessConfig`, passes `build_system_prompt(config)` as the agent's instructions. |
| `on_enter()` | Called when the agent joins the session. Generates a spoken greeting using the business name from config. |
| `lookup_faq(question)` | Tool function. Performs case-insensitive substring matching against all FAQs in config. Returns the answer if found, or a neutral "I don't have specific information about that" fallback. |
| `transfer_call(department)` | Tool function. Looks up the department in `config.routing`, calls the LiveKit SIP transfer API (`ctx.room.transfer_participant`). Error messages are sanitized -- details are logged server-side, and a generic message is returned to the LLM. |
| `take_message(caller_name, message, callback_number)` | Tool function. Creates a `Message` dataclass and saves it via `asyncio.to_thread(save_message, ...)` to avoid blocking the event loop. |
| `get_business_hours()` | Tool function. Uses `zoneinfo.ZoneInfo` to get the current time in the business's timezone. Performs lexicographic `HH:MM` comparison against today's `DayHours` to determine open/closed status. |
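Two of these tools are simple enough to sketch in isolation. The code below is illustrative only: it works on plain dicts rather than the project's Pydantic models, the direction of the substring match and the treatment of the closing time as exclusive are assumptions, and `is_open_now` is a hypothetical wrapper name.

```python
from datetime import datetime
from typing import Optional
from zoneinfo import ZoneInfo

DAYS = ["monday", "tuesday", "wednesday", "thursday",
        "friday", "saturday", "sunday"]
FALLBACK = "I don't have specific information about that."


def lookup_faq(faqs: list, question: str) -> str:
    """Case-insensitive substring match in either direction (an assumption)."""
    needle = question.strip().lower()
    for entry in faqs:
        known = entry["question"].lower()
        if needle and (needle in known or known in needle):
            return entry["answer"]
    return FALLBACK


def is_open(day_hours: Optional[dict], now: datetime) -> bool:
    """Compare zero-padded HH:MM strings; string order equals time order."""
    if day_hours is None:
        return False
    current = now.strftime("%H:%M")
    return day_hours["open"] <= current < day_hours["close"]


def is_open_now(weekly_hours: dict, tz_name: str) -> bool:
    """Evaluate is_open against the current time in the business's timezone."""
    now = datetime.now(ZoneInfo(tz_name))
    return is_open(weekly_hours.get(DAYS[now.weekday()]), now)
```

The lexicographic comparison is only safe because the config validator guarantees zero-padded 24-hour strings; `"8:00"` would sort after `"17:00"` and break the check.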
Server setup:
from livekit.agents import AgentServer, AgentSession
from livekit.plugins import openai

server = AgentServer()

@server.rtc_session()
async def handle_call(ctx):
    config = await load_business_config(ctx)
    # AgentSession takes the Realtime model through its llm argument
    session = AgentSession(
        llm=openai.realtime.RealtimeModel()
    )
    receptionist = Receptionist(config)
    # Noise cancellation: BVCTelephony for SIP calls, BVC otherwise
    await session.start(receptionist, room=ctx.room)

Entry point: `python -m receptionist.agent dev`
The dev argument runs the agent in development mode (auto-reload, verbose logging).
Business configs live in config/businesses/. Here is the structural template based on example-dental.yaml:
business:
  name: "Example Dental Office"
  type: "dental office"
  timezone: "America/New_York"
voice:
  voice_id: "coral" # OpenAI Realtime voice
  # model: "gpt-realtime" # Optional: model variant (default: latest)
hours:
  monday:
    open: "08:00"
    close: "17:00"
  tuesday:
    open: "08:00"
    close: "17:00"
  wednesday:
    open: "08:00"
    close: "17:00"
  thursday:
    open: "08:00"
    close: "17:00"
  friday:
    open: "08:00"
    close: "15:00"
  saturday: "closed"
  sunday: "closed"
routing:
  - name: "Front Desk"
    number: "+15551234567"
    description: "General inquiries and appointment scheduling"
  - name: "Billing"
    number: "+15551234568"
    description: "Insurance and payment questions"
faq:
  - question: "What insurance do you accept?"
    answer: "We accept most major dental insurance plans including Delta Dental, Cigna, and Aetna."
  - question: "What are your hours?"
    answer: "We are open Monday through Thursday 8 AM to 5 PM, Friday 8 AM to 3 PM."
messages:
  delivery: "file"
  file_path: "messages/example-dental"
personality: "friendly and professional"
after_hours_message: "Our office is currently closed. I can take a message and someone will get back to you on our next business day."

- Create a new YAML file in `config/businesses/` (e.g., `acme-plumbing.yaml`).
- Follow the structure above, filling in all required fields.
- For multi-business dispatch, ensure job metadata includes `{"config": "acme-plumbing"}`.
- `DayHours.open` and `DayHours.close` must match `HH:MM` 24-hour format.
- `WeeklyHours` fields accept either a `DayHours` object or the string `"closed"` (converted to `None`).
- `MessagesConfig` cross-validates: `file_path` required for file delivery, `webhook_url` required for webhook delivery.
- `BusinessInfo.name` is a required non-empty string.
- YAML is loaded with `yaml.safe_load` (safe against code injection).
- File is read with explicit `encoding="utf-8"`.
This section traces a complete phone call through the system.
- An external caller dials the business phone number.
- The SIP trunk provider routes the call to the LiveKit Cloud SIP gateway.
- LiveKit Cloud creates a new room and dispatches the call to the registered agent.
- `handle_call(ctx)` is triggered by the `@server.rtc_session()` decorator.
- `load_business_config(ctx)` runs:
  - Checks `ctx.job.metadata` for a `"config"` key.
  - If found and valid (alphanumeric slug), loads the corresponding YAML file.
  - If not found, loads the first YAML in `config/businesses/`.
- An `AgentSession` is created with `openai.realtime.RealtimeModel()`.
- A `Receptionist` instance is created with the loaded config.
- Noise cancellation is applied (BVCTelephony for SIP, BVC otherwise).
- The session starts.
- The `Receptionist.on_enter()` method fires.
- It generates a greeting like: "Thank you for calling Example Dental Office. How can I help you today?"
- This is spoken to the caller via the OpenAI Realtime API.
- The caller speaks. Audio streams through LiveKit to OpenAI Realtime.
- OpenAI processes the speech and generates a response.
- If the LLM determines it needs to use a tool, it invokes one:
  - `lookup_faq(question)`: Searches FAQs, returns answer or fallback.
  - `get_business_hours()`: Checks if the business is currently open.
  - `transfer_call(department)`: Transfers via SIP to the department's number.
  - `take_message(caller_name, message, callback_number)`: Saves a message to disk.
- The LLM incorporates tool results into its spoken response.
- The caller hangs up, or the call is transferred.
- The LiveKit session ends.
- Any messages taken are already persisted as JSON files in the `messages/` directory.
| Package | Requirement | Installed Version | Purpose |
|---|---|---|---|
| `livekit-agents` | `>=1.0.0` | 1.4.3 | Agent SDK for real-time voice sessions |
| `livekit-plugins-openai` | `>=1.0.0` | 1.4.3 | OpenAI Realtime API integration for LiveKit |
| `livekit-plugins-noise-cancellation` | `>=0.2.3` | 0.2.5 | Background noise cancellation (BVC/Krisp) |
| `pydantic` | `>=2.0` | (latest v2) | Data validation for config models |
| `pyyaml` | `>=6.0` | (latest) | YAML config file parsing |
| `python-dotenv` | `>=1.0` | (latest) | .env file loading for secrets |

| Package | Requirement | Purpose |
|---|---|---|
| `pytest` | `>=8.0` | Test runner |
| `pytest-asyncio` | `>=0.24` | Async test support |
The livekit-agents package officially restricts Python to <3.14. The development environment runs Python 3.14.2, which means it was force-installed or the constraint was bypassed. This may cause runtime compatibility issues. For production deployment, use Python 3.11 or 3.12 for maximum stability and compatibility.
- Project URL: `wss://aireceptionist-402e6ask.livekit.cloud`
- Agent registration: The agent registers with `agent_name=""` (empty string) for auto-dispatch.
- Production note: For multi-business routing with dispatch rules, restore `agent_name="receptionist"` and configure LiveKit dispatch rules accordingly.
These should be set in a .env file (see .env.example for template):
| Variable | Purpose |
|---|---|
| `LIVEKIT_URL` | LiveKit Cloud WebSocket URL |
| `LIVEKIT_API_KEY` | LiveKit API key for authentication |
| `LIVEKIT_API_SECRET` | LiveKit API secret for authentication |
| `OPENAI_API_KEY` | OpenAI API key for Realtime API access |
- OS: Windows 11 Pro 10.0.26200
- Python: 3.14.2 (see compatibility note above)
- Shell: bash (Git Bash or similar on Windows)
# Development mode (auto-reload, verbose logging)
python -m receptionist.agent dev
# Production mode
python -m receptionist.agent start

pytest                        # Run all 15 tests
pytest tests/test_config.py   # Run only config tests
pytest -v                     # Verbose output

| Test File | Tests | What It Covers |
|---|---|---|
| `test_config.py` | 6 | YAML parsing, file loading, closed/open day hours, missing name validation, invalid delivery method validation, cross-field delivery validation |
| `test_prompts.py` | 6 | Business name in prompt, personality text, FAQ content, routing info, hours schedule, after-hours message |
| `test_messages.py` | 3 | Single file creation and content, multiple file uniqueness, auto-directory creation |
Total: 15 tests, all passing.
- The `agent.py` module (would require mocking LiveKit SDK and OpenAI Realtime API).
- Webhook delivery (stubbed, not implemented).
- Integration/end-to-end call flow.
- `get_business_hours()` timezone logic.
- `transfer_call()` SIP transfer logic.
- Error handling paths in agent tools.
Tests use plain pytest with fixtures. Config tests construct YAML strings and validate parsing. Prompt tests check that specific content appears in the generated prompt string. Message tests use temporary directories to verify file I/O.
The following security hardening was applied in commit 1201e07:
The load_business_config() function validates the config name from job metadata against ^[a-zA-Z0-9_-]+$ before constructing a file path. This prevents an attacker from passing ../../etc/passwd as a config name.
config_name from metadata -> regex validation -> config/businesses/{config_name}.yaml
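The validation step in this flow can be sketched as below. The regex is the one stated above; the function name `config_path_for` and the default base directory argument are assumptions for illustration, not the project's actual signature.

```python
import re
from pathlib import Path

CONFIG_NAME = re.compile(r"^[a-zA-Z0-9_-]+$")


def config_path_for(name: str, base: Path = Path("config/businesses")) -> Path:
    """Reject anything that is not a plain slug before building the file path."""
    # fullmatch also rejects a trailing newline, which "$" alone would tolerate.
    if not CONFIG_NAME.fullmatch(name):
        raise ValueError(f"invalid config name: {name!r}")
    return base / f"{name}.yaml"
```

Because dots and slashes are outside the allowed character set, a value such as `../../etc/passwd` fails validation before any path is constructed.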
When tool functions (e.g., transfer_call) encounter exceptions, the full error details are logged server-side using Python logging. The message returned to the LLM is generic (e.g., "I'm sorry, I'm unable to transfer your call right now"). This prevents leaking internal paths, stack traces, or infrastructure details to callers.
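The log-then-sanitize pattern might be sketched as follows. It is simplified to synchronous code, and the `do_transfer` callable, the logger name, and the success message are hypothetical stand-ins; the real tool is asynchronous and calls the LiveKit SIP transfer API.

```python
import logging

logger = logging.getLogger("receptionist")

GENERIC_ERROR = "I'm sorry, I'm unable to transfer your call right now."


def safe_transfer(do_transfer, department: str) -> str:
    """Run a transfer; on failure, log the details and return only generic text."""
    try:
        do_transfer(department)
    except Exception:
        # The full traceback goes to the server log, never to the caller.
        logger.exception("transfer to %s failed", department)
        return GENERIC_ERROR
    return f"Transferring you to {department}."
```

Returning a fixed string means nothing from the exception message can reach the LLM, and therefore nothing can be read aloud to the caller.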
save_message() is called via asyncio.to_thread() to prevent file I/O from blocking the event loop (which would cause audio glitches or dropped frames in the voice session).
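The effect of `asyncio.to_thread` can be demonstrated with a small stand-alone example. Everything here is illustrative: `blocking_save` stands in for `save_message` (with a sleep simulating slow disk I/O), and the heartbeat task stands in for the audio pipeline that must keep running.

```python
import asyncio
import time


def blocking_save(payload: dict) -> str:
    """Stand-in for save_message: a slow, blocking write."""
    time.sleep(0.2)
    return f"saved message from {payload['caller_name']}"


async def take_message(payload: dict) -> str:
    # Run the blocking call in a worker thread so the event loop stays free.
    return await asyncio.to_thread(blocking_save, payload)


async def demo() -> tuple:
    ticks = 0

    async def heartbeat() -> None:
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    beat = asyncio.create_task(heartbeat())
    result = await take_message({"caller_name": "Pat"})
    beat.cancel()
    await asyncio.gather(beat, return_exceptions=True)
    return result, ticks
```

Running `asyncio.run(demo())` returns the save result together with a non-zero tick count, showing that the loop kept servicing other tasks during the write. Calling `blocking_save` directly inside the coroutine would freeze the heartbeat for the full duration instead.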
- `DayHours` enforces HH:MM 24-hour format via regex.
- `MessagesConfig` uses a Pydantic `model_validator` for cross-field validation.
- YAML files are read with `yaml.safe_load` (prevents arbitrary code execution).
- Files are opened with explicit `encoding="utf-8"`.
Choice: Use OpenAI's Realtime API for end-to-end speech-to-speech processing. Alternative considered: Cascaded pipeline (Deepgram STT -> Claude/GPT-4o -> ElevenLabs TTS). Rationale: The Realtime API provides the lowest latency and highest fidelity for voice conversations. It handles interruptions, backchanneling, and natural turn-taking natively. The cascaded approach is planned as a future cost-conscious alternative.
Choice: LiveKit Agents SDK for real-time audio transport. Rationale: LiveKit is the same infrastructure OpenAI uses for ChatGPT Advanced Voice Mode. It provides production-grade WebRTC, SIP integration, and an agent framework with built-in session management.
Choice: Business configs are YAML files on disk. Alternative considered: Database (PostgreSQL, SQLite). Rationale: Zero additional infrastructure. Configs are git-versionable, human-readable, and trivially editable. For the expected scale (tens of businesses, not thousands), YAML is sufficient. A database can be added later if needed.
Choice: All FAQ entries are embedded directly in the LLM system prompt. Alternative considered: RAG with vector database. Rationale: At 10-30 FAQs (typical for a small business), the LLM can reason over them directly in context. RAG adds complexity (embedding model, vector store, retrieval logic) with no benefit at this scale. If FAQ counts grow beyond ~50-100, revisit this decision.
Choice: Messages saved as individual JSON files on disk. Alternative considered: Database, message queue. Rationale: Simplest possible approach for MVP. No additional dependencies. Easy to inspect and debug. Webhook delivery is planned for production integrations.
Choice: A single agent process serves multiple businesses, selected by job metadata. Rationale: Efficient resource usage. No need to run N agent processes for N businesses. LiveKit's dispatch system routes calls to the right config.
| # | Issue | Impact | Suggested Fix |
|---|---|---|---|
| 1 | Webhook delivery is stubbed (`NotImplementedError`) | Cannot integrate with external systems | Implement `_send_webhook()` using httpx or aiohttp |
| 2 | Python 3.14 compatibility uncertain | Potential runtime crashes in production | Pin to Python 3.11 or 3.12 in production Dockerfile/deployment |
| 3 | `agent_name=""` for dev testing | No named dispatch in production | Restore `agent_name="receptionist"` and configure dispatch rules |
| # | Issue | Impact | Suggested Fix |
|---|---|---|---|
| 4 | `lookup_faq` uses simple substring matching | May return wrong FAQ for ambiguous queries | Use TF-IDF or embedding similarity; sufficient for <30 FAQs now |
| 5 | No call recording or transcript capture | No audit trail or review capability | Use LiveKit Egress API for recordings; OpenAI Realtime text output for transcripts |
| 6 | No email notification for messages | Staff must manually check message files | Add SMTP/SendGrid integration triggered after save_message() |
| # | Issue | Impact | Suggested Fix |
|---|---|---|---|
| 7 | No admin dashboard or web UI | Config changes require file editing | Build a web UI (FastAPI + React) for config management |
| 8 | No integration tests for agent.py | Core module untested | Mock LiveKit and OpenAI SDKs; test tool invocation paths |
| 9 | No structured logging | Harder to debug in production | Add structured JSON logging with correlation IDs per call |
These items come from the design document at docs/plans/2026-03-02-ai-receptionist-design.md:
- Webhook message delivery: Implement `_send_webhook()` to POST messages to external endpoints (CRM, Slack, etc.).
- Call recordings: Use the LiveKit Egress API to record calls for quality assurance and compliance.
- Call transcripts: Capture the text output from the OpenAI Realtime API to generate searchable transcripts.
- Email notifications: Send email alerts when a message is taken (SMTP or SendGrid).
- Cascaded pipeline mode: Offer an alternative pipeline using Deepgram STT + Claude/GPT-4o + ElevenLabs TTS. This would be cheaper (~$0.05-0.10/min vs. ~$0.20-0.30/min) at the cost of slightly higher latency.
- Web widget channel: Allow businesses to embed a voice widget on their website. Uses browser WebRTC directly (no telephony needed), lowering per-call costs.
- Admin dashboard: Web UI for managing business configs, viewing messages, listening to recordings, and viewing analytics.
- Analytics: Track call volume, common questions, transfer rates, message rates, peak hours.
- Multi-language support: Leverage OpenAI Realtime's multilingual capabilities.
| Cost Component | Rate |
|---|---|
| OpenAI Realtime API | ~$0.20-0.30 per minute |
| SIP trunk (telephony) | ~$0.01-0.02 per minute |
| LiveKit Cloud | Included in agent hosting |
| Metric | Value |
|---|---|
| Calls per day | 30 |
| Average call duration | 2 minutes |
| Daily cost | ~$15 |
| Monthly cost (30 days) | ~$450 |
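The estimate can be reproduced from the per-minute rates. The short calculation below uses only the figures in the two tables; the function name `daily_cost` is an illustrative choice.

```python
def daily_cost(calls_per_day: int, minutes_per_call: float,
               ai_rate: float, sip_rate: float) -> float:
    """Total daily spend: call minutes times the combined per-minute rate."""
    return calls_per_day * minutes_per_call * (ai_rate + sip_rate)


# 30 calls x 2 minutes = 60 billable minutes per day.
low = daily_cost(30, 2, ai_rate=0.20, sip_rate=0.01)   # 60 x 0.21
high = daily_cost(30, 2, ai_rate=0.30, sip_rate=0.02)  # 60 x 0.32

monthly_low = low * 30
monthly_high = high * 30
```

With these rates the daily range works out to roughly $12.60 to $19.20, and the monthly range to roughly $378 to $576, so the ~$15 per day and ~$450 per month figures sit inside the range rather than at either bound.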
- Cascaded pipeline (Deepgram + Claude + ElevenLabs): Could reduce AI cost to ~$0.05-0.10/min.
- Web widget (no telephony): Eliminates SIP trunk costs entirely.
- Shorter calls: Optimize prompts and FAQ coverage to resolve calls faster.
The repository has 9 commits on the main branch, listed newest to oldest:
713c212 docs: add README with setup guide and configuration reference
1201e07 fix: harden agent against path traversal, error leaks, and blocking I/O
865cb62 feat: receptionist agent with function tools and server entry point
9673f30 feat: message storage with file-based delivery
953dfb8 feat: system prompt builder from business config
6acbdfc fix: add config validation for delivery fields, time format, and UTF-8 encoding
7d70f91 feat: business config Pydantic models with YAML loading and validation
89578d6 docs: add design doc and implementation plan
bddec57 chore: initial project scaffolding with dependencies
- Scaffolding (`bddec57`): Initial project structure, `pyproject.toml`, dependencies.
- Design (`89578d6`): Design document and implementation plan written before coding.
- Config (`7d70f91`, `6acbdfc`): Pydantic models for business config, then hardened with validation.
- Prompts (`953dfb8`): System prompt builder from business config.
- Messages (`9673f30`): File-based message storage.
- Agent (`865cb62`, `1201e07`): Core agent with tools, then hardened for security.
- Docs (`713c212`): README with setup guide.
- Python 3.11 or 3.12 (recommended; 3.14 works but is not officially supported by livekit-agents)
- A LiveKit Cloud account (or self-hosted LiveKit server)
- An OpenAI API key with Realtime API access
# 1. Clone the repository
cd C:\Users\MDASR\Desktop\Projects\AIReceptionist
# 2. Create and activate a virtual environment
python -m venv .venv
source .venv/Scripts/activate # Windows Git Bash
# or: .venv\Scripts\activate # Windows CMD
# or: source .venv/bin/activate # Linux/macOS
# 3. Install dependencies
pip install -e ".[dev]"
# 4. Set up environment variables
cp .env.example .env
# Edit .env and fill in:
# LIVEKIT_URL=wss://aireceptionist-402e6ask.livekit.cloud
# LIVEKIT_API_KEY=<your key>
# LIVEKIT_API_SECRET=<your secret>
# OPENAI_API_KEY=<your key>
# 5. Run the tests to verify everything works
pytest
# 6. Start the agent in development mode
python -m receptionist.agent dev

- Go to the LiveKit Cloud dashboard.
- Open the "Playground" or "Agent Playground" tool.
- Connect to the same LiveKit project.
- The agent should accept the session (since `agent_name=""` accepts all dispatches).
- Speak to test the conversation flow.
- Configure a SIP trunk in LiveKit Cloud pointing to your phone number provider.
- Create a dispatch rule routing incoming SIP calls to the agent.
- Call the phone number.
- Ensure `agent_name=""` in the code (for dev) or that dispatch rules match the agent name (for production).
- Check that the `.env` file has correct `LIVEKIT_URL`, `LIVEKIT_API_KEY`, and `LIVEKIT_API_SECRET`.
- Verify the agent is running and connected: the console should show a registration message.
- Ensure at least one `.yaml` file exists in `config/businesses/`.
- If using job metadata routing, verify the metadata `"config"` value matches the YAML filename (without `.yaml` extension).
- Check that the YAML file is valid (use a YAML linter).
- Check your `OPENAI_API_KEY` -- the Realtime API requires specific access.
- Noise cancellation requires the `livekit-plugins-noise-cancellation` package. If it fails to load, the agent may still work but without noise cancellation.
- Ensure the event loop is not being blocked (the `asyncio.to_thread` wrapper on `save_message` is specifically for this).
- If you encounter import errors or C extension failures, switch to Python 3.11 or 3.12.
- `livekit-agents` officially requires Python `<3.14`. Force-installing on 3.14 may cause subtle issues.
- Run `pip install -e ".[dev]"` to ensure dev dependencies are installed.
- Tests do not require LiveKit or OpenAI credentials -- they test config, prompts, and messages only.
- If `test_messages.py` fails, check filesystem permissions on the temp directory.
- Check the `messages` config in the YAML file -- `file_path` must be set when `delivery` is `"file"`.
- The directory is created automatically on first write, but the process needs write permissions.
- Look in the path specified by `file_path` in the business YAML config (e.g., `messages/example-dental/`).
This document contains everything needed to understand, maintain, and extend the AIReceptionist project. For architectural rationale and long-term vision, also consult:
- `docs/plans/2026-03-02-ai-receptionist-design.md`
- `docs/plans/2026-03-02-ai-receptionist-implementation.md`
For setup and configuration reference:
- `README.md`
- `.env.example`
- `config/businesses/example-dental.yaml`
This addendum summarizes the large refactor landed on the feat/call-artifacts-and-delivery branch. Sections 3, 4, 5, 6, 7, 9, 10, 12 above are partly superseded — see documentation/architecture.md for the current authoritative architecture.
- Package restructure into subpackages: `receptionist/messaging/`, `email/`, `recording/`, `transcript/`, `retention/`, plus `lifecycle.py`. The legacy `receptionist/messages.py` has been deleted; its contents moved into `messaging/models.py` and `messaging/channels/file.py`.
- New CallLifecycle class owns per-call state (metadata, transcript capture, recording handle) and fires the call-end fan-out (transcripts, recording stop, optional call-end email).
- Multi-channel delivery: a business's `messages.channels` is a list; file/webhook/email can be enabled simultaneously. Dispatcher awaits file synchronously and fires the rest as background tasks, writing `.failures/` records on exhausted retries.
- Email via SMTP or Resend behind an `EmailSender` protocol.
- Call recording via LiveKit Egress, to local disk or S3 (incl. R2/B2/MinIO via `endpoint_url`).
- Transcripts captured from AgentSession events and written as JSON (source of truth) + Markdown.
- Consent preamble spoken before the greeting when recording is enabled (two-party consent states).
- Multi-language auto-detection via the system prompt; per-business `languages.allowed` whitelist.
- Retention sweeper CLI: `python -m receptionist.retention sweep [--dry-run] [--business <name>]`. Skips `.failures/` directories.
- Failures visibility CLI: `python -m receptionist.messaging list-failures [--business <name>]`.
- Env-var interpolation in YAML (`${VAR}`).
- Voice default changed to `marin`, model default to `gpt-realtime-1.5`.
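As an illustration of the `${VAR}` interpolation mentioned in the list above, a loader could substitute variables in the raw YAML text before parsing it. This is a sketch only: the function name `interpolate_env`, the accepted variable-name pattern, and the choice to raise on a missing variable are assumptions, not the project's documented behavior.

```python
import os
import re
from typing import Mapping, Optional

# Assumed syntax: ${NAME} where NAME is a conventional identifier.
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def interpolate_env(text: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace each ${VAR} in text with its value from the environment."""
    source = os.environ if env is None else env

    def replace(match: "re.Match") -> str:
        name = match.group(1)
        if name not in source:
            # Assumption: fail loudly rather than substitute an empty string.
            raise KeyError(f"environment variable {name} is not set")
        return source[name]

    return _VAR.sub(replace, text)
```

Passing `env` explicitly keeps the function easy to test; in normal use it would fall back to `os.environ`, so secrets such as SMTP passwords can stay out of the YAML files.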
Production: aiosmtplib>=3.0, resend>=2.0, httpx>=0.27, aioboto3>=13.0, aiofiles>=23.0.
Dev: pytest-mock>=3.12, respx>=0.21, moto>=5.0.
Floor bumps: livekit-agents>=1.5.0, livekit-plugins-openai>=1.5.0.
~120 tests across unit + 1 integration. See tests/MANUAL.md for live-playground validation that cannot be automated.
- Webhook delivery was stubbed — now fully implemented with retry/backoff.
- No call recording — now supported via LiveKit Egress to local/S3.
- No call transcripts — now captured and persisted.
- No email notification for messages — now supported (SMTP or Resend).
- Python 3.14 compatibility uncertain (`.python-version` now pins 3.12; deploy on 3.11 or 3.12).
- `agent_name=""` for dev; production needs `agent_name="receptionist"` + LiveKit dispatch rules.
- `lookup_faq` uses substring matching; replace with embedding similarity if FAQs >50 per business.
- No retry CLI for `.failures/` (visibility only).
- No admin dashboard / web UI.
- S3 storage for transcripts not supported (local only).
- No structured JSON logging.
- Design spec: `docs/superpowers/specs/2026-04-23-call-artifacts-and-delivery-design.md`
- Implementation plan: `docs/superpowers/plans/2026-04-23-call-artifacts-and-delivery.md`
Adds in-call appointment booking via Google Calendar. See `documentation/architecture.md` for the authoritative architecture after this change; this addendum summarizes what shipped.
- Two new function tools on `Receptionist`: `check_availability` and `book_appointment`.
- New `receptionist/booking/` subpackage (auth, client wrapper, pure availability logic, booking with race detection, setup CLI).
- Both service account and OAuth 2.0 auth paths supported. Setup CLI (`python -m receptionist.booking setup <business>`) walks a business owner through the OAuth browser consent flow.
- New `on_booking` email trigger using the existing EmailChannel dispatcher; it notifies staff when an appointment lands.
- Session-scoped slot cache (`Receptionist._offered_slots`) enforces "check-before-book" architecturally: the LLM cannot book a slot it wasn't offered.
- UNVERIFIED tag in event descriptions: staff see the caller's identity was not verified.
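The "check-before-book" guarantee can be illustrated with a small class. This is a sketch of the idea only: the class name `BookingSession`, the method signatures, and the use of ISO timestamp strings as slot keys are assumptions, and the real tools talk to Google Calendar rather than accepting a list of free slots.

```python
class BookingSession:
    """Per-call state: only slots that were offered may be booked."""

    def __init__(self) -> None:
        self._offered_slots: set[str] = set()

    def check_availability(self, free_slots: list[str]) -> list[str]:
        # Remember every slot shown to the caller during this session.
        self._offered_slots.update(free_slots)
        return list(free_slots)

    def book_appointment(self, slot: str) -> str:
        if slot not in self._offered_slots:
            raise ValueError(f"slot {slot} was not offered in this call")
        # Drop the slot so it cannot be booked twice in the same call.
        self._offered_slots.discard(slot)
        return f"booked {slot}"
```

Enforcing the rule in code rather than in the prompt means a model that skips the availability check gets an error from the tool instead of creating an unvetted calendar event.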
CallMetadata.outcome: str | None → CallMetadata.outcomes: set[str].
Calls with multiple outcomes (e.g. transfer + book) now retain both.
Email subjects render as "Transferred + Appointment booked" when applicable.
_OUTCOME_PRIORITY dict deleted; _add_outcome replaces _set_outcome.
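A minimal sketch of the set-based outcome tracking follows. The outcome keys (`"transferred"`, `"booked"`) and the fixed rendering order are hypothetical; only the two subject labels come from the example above, and the real `_add_outcome` lives on the call lifecycle object rather than being a free function.

```python
# Hypothetical outcome keys mapped to the labels used in email subjects.
# Dict order doubles as the rendering order.
LABELS = {
    "transferred": "Transferred",
    "booked": "Appointment booked",
}


def add_outcome(outcomes: set, outcome: str) -> None:
    """Record an outcome without overwriting earlier ones."""
    outcomes.add(outcome)


def render_subject(outcomes: set) -> str:
    """Join the labels of all recorded outcomes in a stable order."""
    return " + ".join(label for key, label in LABELS.items() if key in outcomes)
```

Because a set has no inherent order, the subject is built by walking a fixed label list, which keeps the email subject stable regardless of the order in which outcomes occurred during the call.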
Added: google-api-python-client>=2.140, google-auth>=2.32,
google-auth-oauthlib>=1.2, python-dateutil>=2.9. All Apache 2.0.
Unit tests per subpackage module (~35 new tests). One integration test
(tests/integration/test_booking_flow.py) covering record_appointment_booked
→ on_call_ended → on_booking email fan-out. Browser OAuth flow is manual-only
(tests/MANUAL.md section).
- No cancellations (go via `take_message` for now)
- No rescheduling
- No recurring appointments
- No multi-provider round-robin
- No SMS confirmation / caller verification
- No payment integration
- No Outlook / Microsoft 365 / Apple Calendar
- No reminders (would need an SMS provider)
- Design spec: `docs/superpowers/specs/2026-04-24-google-calendar-integration-design.md`
- Implementation plan: `docs/superpowers/plans/2026-04-24-google-calendar-integration.md`
- Setup guide: `documentation/google-calendar-setup.md`
- Current architecture: `documentation/architecture.md`
Validated end-to-end against a personal Gmail Google Calendar via the LiveKit playground. The live test surfaced five real issues, all fixed before merge:
1. `calendar.events` scope alone is insufficient for `freeBusy.query`. Google treats freeBusy as a calendar-level operation. Added `calendar.freebusy` to the scope set. Existing OAuth tokens issued for the single-scope set must be re-minted via the setup CLI.
2. Setup CLI U+2713 ("✓") crashed on default Windows cp1252 console. Replaced with `[OK]`. The crash was post-success (token + chmod already done), masking the prior success.
3. `dateutil.parser` doesn't understand "tomorrow" / "next Monday". Added `_resolve_relative_date()` that normalizes those phrases to absolute dates before parsing. The CALENDAR prompt already advertised relative-date support; the parser silently rejected them.
4. Caller invite needed. Added an optional `caller_email` parameter to `book_appointment`. When provided, caller is added as an OPTIONAL Google attendee (so a decline does not affect the organizer's free/busy) and Google sends them the standard `.ics` invite.
5. Phone + email read-back discipline. The agent was booking before confirming the callback number and email letter-by-letter. CALENDAR prompt now requires digit-by-digit phone read-back and letter-by-letter email read-back, both with explicit "yes" confirmation, BEFORE the booking goes through.
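The relative-date fix can be sketched as a small normalizer that runs before the date parser. This is an illustration under assumptions, not the project's `_resolve_relative_date()`: it handles only "today", "tomorrow", and "next <weekday>", treats "next <weekday>" as the first such day strictly after today, and takes `today` as a parameter so the result is testable.

```python
from datetime import date, timedelta

_WEEKDAYS = [
    "monday", "tuesday", "wednesday", "thursday",
    "friday", "saturday", "sunday",
]


def resolve_relative_date(text: str, today: date) -> str:
    """Return an ISO date for a supported relative phrase, else the input unchanged."""
    phrase = text.strip().lower()
    if phrase == "today":
        return today.isoformat()
    if phrase == "tomorrow":
        return (today + timedelta(days=1)).isoformat()
    if phrase.startswith("next "):
        name = phrase[len("next "):].strip()
        if name in _WEEKDAYS:
            # Days until the named weekday; 0 means "today", which is
            # pushed a full week ahead so the result is always in the future.
            days_ahead = (_WEEKDAYS.index(name) - today.weekday()) % 7 or 7
            return (today + timedelta(days=days_ahead)).isoformat()
    # Anything else is passed through for the regular date parser to handle.
    return text
```

Passing unrecognized input through unchanged means absolute dates such as "May 3" still reach the regular parser exactly as before.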
Also during validation: added `RECEPTIONIST_CONFIG` env var so
`python -m receptionist.agent dev` can pick a non-default business
config without needing job metadata. The previous fallback (first YAML
alphabetically) silently picked `example-dental.yaml`, leading to
"calendar not configured" errors when running a fresh test config.
Google quirk worth knowing: when the attendee email's domain is unroutable, Google silently drops the attendee from the persisted event. The booking still succeeds; staff just don't see the attendee on the event. Verified by API re-fetch. Not a code bug — the readback discipline (#5) is our mitigation.
Final tally: 198 tests passing, 2 Windows-skipped POSIX-permission checks (intentional). Manual gate complete.
After PR #7 merged, ran a full security-and-optimization sweep before
opening the project up to broader use. 11 commits landed on main
between 7352469 and 939de92. 256 tests passing (+58 from the start
of the pass).
- Setup CLI path-traversal: `python -m receptionist.booking setup ../../etc/passwd` previously resolved into a config path. Validator added matching the rest of the codebase's `^[a-zA-Z0-9_-]+$`.
- Webhook URL schemes + private hosts: `WebhookChannel.url` now rejects non-http(s) schemes at config load and warns on loopback / private / link-local hosts (AWS metadata endpoint, localhost, etc.).
- Caller-supplied text caps: `take_message` and `book_appointment` truncate caller free-text fields with logging. Prevents storage bloat and keeps event descriptions under Google's 8KB ceiling.
- Windows OAuth ACL: was a silent no-op; now logs a one-shot WARNING per token path nudging operators toward user-only dirs.
- `assert` -> explicit raise in `recording/storage`, `recording/egress`, `messaging/retry`; survives `python -O`.
- `CallMetadata.mark_finalized()`: now logs WARNING instead of swallowing `ValueError` on duration parsing.
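The webhook URL check from the list above could be approximated with the standard library alone. This sketch assumes the check inspects only the literal host in the URL and does not resolve DNS names; `validate_webhook_url` and `is_internal_host` are hypothetical names, and the real validation presumably runs inside the config model rather than as free functions.

```python
import ipaddress
import logging
from urllib.parse import urlsplit

logger = logging.getLogger(__name__)


def is_internal_host(host: str) -> bool:
    """True for localhost and for loopback, private, or link-local IP literals."""
    if host.lower() == "localhost":
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal; this sketch does not resolve DNS names.
        return False
    return ip.is_loopback or ip.is_private or ip.is_link_local


def validate_webhook_url(url: str) -> str:
    """Reject non-http(s) schemes; warn, but do not fail, on internal hosts."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"unsupported webhook URL scheme: {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError("webhook URL has no host")
    if is_internal_host(parts.hostname):
        logger.warning("webhook URL targets an internal host: %s", parts.hostname)
    return url
```

Warning instead of rejecting on internal hosts matches the behavior described above and keeps local development webhooks usable.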
- Cached per-call: `Dispatcher` (was rebuilt per `take_message`), `EmailChannel` instances (were rebuilt per email trigger).
- `_offered_slots` bounded: replaced unbounded `set[str]` with a `deque[frozenset[str]]` of `maxlen=3`. Memory-safe on long calls.
- Routing lookup O(1): dict-by-lowercased-name built at `Receptionist.__init__`. FAQ matching deliberately stays linear (bidirectional substring match doesn't fit a single dict).
- Lightweight imports hoisted out of the deferred path; only the `googleapiclient`-pulling chain still loads lazily.
- `.env` "tracked in git": NOT tracked, IS gitignored. Surveyor saw the on-disk file and assumed it was tracked.
- `livekit-agents` / `google-auth` "version floor risk": speculative without a CVE lookup; floors are recent.
- Test gaps in SIP transfer error paths, OAuth refresh failures, recording egress failures.
- Splitting the long `check_availability` / `book_appointment` methods.
Plan file: C:\Users\MDASR\.claude\plans\stateful-floating-fiddle.md