Welcome back to our LiveKit voice agent series! Last week, we explored the fundamentals of voice agents using a Google Colab notebook tutorial where we built and tested a basic voice agent directly in the notebook environment. This gave us a solid foundation for understanding how LiveKit agents work with speech recognition, language models, and text-to-speech.
This week, we're taking a significant step forward by building a production-ready, full-stack voice agent application that includes:
- 🎙️ Web-based React frontend with real-time voice interaction
- 🐍 Python backend running LiveKit agents
- 🌐 Real-time audio streaming between browser and agent
- ⚡ Production deployment on a cloud server
While notebooks are excellent for learning and prototyping, real-world voice applications need:
- User-friendly interfaces - A polished web interface that users can actually interact with
- Scalable architecture - Separate frontend and backend that can handle multiple users
- Production readiness - Proper deployment, error handling, and monitoring
- Real-time performance - Low-latency voice interactions that feel natural
By the end of this tutorial, you'll have:
- ✅ Set up a LiveKit Cloud project with proper authentication
- ✅ Built a React frontend with real-time voice capabilities
- ✅ Deployed a Python voice agent backend on a cloud server
- ✅ Integrated the Gemini LLM with Google's Speech-to-Text and Text-to-Speech APIs
- ✅ Configured production-ready deployment with proper error handling
Here's how our full-stack voice agent will work:
```
┌─────────────────┐     WebRTC     ┌──────────────────┐
│ React Frontend  │ ◄────────────► │  LiveKit Cloud   │
│   (Browser)     │                │  (Relay Server)  │
└─────────────────┘                └────────┬─────────┘
                                            │
                                            │ Agent Protocol
                                            ▼
                                   ┌──────────────────┐
                                   │   Python Agent   │
                                   │  (Your Server)   │
                                   └────────┬─────────┘
                                            │
                     ┌──────────────────────┼──────────────────────┐
                     │                      │                      │
                     ▼                      ▼                      ▼
            ┌─────────────────┐      ┌────────────┐      ┌──────────────────┐
            │   Google STT    │      │ Gemini LLM │      │    Google TTS    │
            │ (Speech-to-Text)│      │   (LLM)    │      │ (Text-to-Speech) │
            └─────────────────┘      └────────────┘      └──────────────────┘
```
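The diagram boils down to one loop per conversational turn: audio from the browser is transcribed (STT), the transcript is answered (LLM), and the answer is synthesized back into audio (TTS). Here's a toy sketch of that data flow using stub stages — the function names are ours for illustration; in the real agent each stage is a LiveKit plugin making a network call:

```python
from typing import Callable

# Stage signatures: in the real agent these are the Google STT,
# Gemini, and Google TTS plugins wired together by LiveKit.
SttFn = Callable[[bytes], str]   # audio in   -> transcript
LlmFn = Callable[[str], str]     # transcript -> reply text
TtsFn = Callable[[str], bytes]   # reply text -> audio out

def run_turn(audio_in: bytes, stt: SttFn, llm: LlmFn, tts: TtsFn) -> bytes:
    """One conversational turn: speech -> text -> reply -> speech."""
    transcript = stt(audio_in)
    reply = llm(transcript)
    return tts(reply)

# Stub stages, just to make the flow runnable:
def fake_stt(audio: bytes) -> str:
    return audio.decode()

def fake_llm(text: str) -> str:
    return f"You said: {text}"

def fake_tts(text: str) -> bytes:
    return text.encode()

print(run_turn(b"hello", fake_stt, fake_llm, fake_tts))  # b'You said: hello'
```

Keeping each stage behind a plain function boundary is also why the real pipeline is swappable: you could drop in a different STT or TTS provider without touching the turn logic.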
We'll build this application step by step:
- LiveKit Cloud Configuration - Creating and configuring our LiveKit project and Sandbox environment (React frontend)
- Server Environment Setup - Setting up our development environment (installing libraries)
- Python Agent Development - Building our voice agent backend
- Integration & Testing - Connecting all components together
- Production Deployment - Deploying to a cloud server
- Visit https://cloud.livekit.io/ and create a project.
- Create a Sandbox and add it to the project you just created (no need for the Video or Chat features).
- Clone this repo: https://github.com/nnitiwe-dev/livekit-google-voice-agent-tutorial.
- Set up your environment variables (save them to a `.env` file):

```
GOOGLE_API_KEY=your_google_api_key
ELEVEN_API_KEY=your_eleven_api_key
LIVEKIT_URL=your_livekit_url
LIVEKIT_API_KEY=your_livekit_api_key
LIVEKIT_API_SECRET=your_livekit_api_secret
GOOGLE_APPLICATION_CREDENTIALS=path_to_your_google_application_credentials_json_file
```
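A missing or empty variable here is the most common cause of startup failures. Before running the agent, you can sanity-check your environment with a few lines of standard-library Python (the `missing_vars` helper is our own sketch, not part of the repo):

```python
import os

# Variables the tutorial's agent expects (from the .env file above).
REQUIRED = [
    "GOOGLE_API_KEY",
    "ELEVEN_API_KEY",
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "GOOGLE_APPLICATION_CREDENTIALS",
]

def missing_vars(env=os.environ, required=REQUIRED):
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not env.get(name)]

# An empty list means everything is set.
print("Missing:", missing_vars())
```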
- Visit https://docs.astral.sh/uv/getting-started/installation/ and install `uv`. UV is a next-generation Python package installer and resolver written in Rust by the team at Astral (creators of Ruff). It's designed as a drop-in replacement for pip and other Python packaging tools, but significantly faster and more reliable.

```bash
# Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# macOS
brew install uv

# Windows (PowerShell)
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

# or with pip
pip install uv
```
- Install the dependencies:

```bash
cd livekit-google-voice-agent-tutorial
uv sync
```

- Download required model files, such as the Silero VAD and the LiveKit turn detector:

```bash
uv run python src/agent.py download-files
```
- To run the agent for use with a frontend or telephony, use the `dev` command:

```bash
uv run python src/agent.py dev
```

- In production, use the `start` command:

```bash
uv run python src/agent.py start
```
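One piece that stays invisible in this setup: when the browser joins a room, it authenticates to LiveKit Cloud with a short-lived JWT signed by your `LIVEKIT_API_SECRET`. The Sandbox mints these tokens for you, and a production app would use LiveKit's server SDK (e.g. the `livekit-api` package) instead, but a standard-library sketch shows roughly what's inside one — the claim layout here follows LiveKit's documented access-token format, so double-check against the current docs before relying on it:

```python
import base64
import hashlib
import hmac
import json
import time

def b64url(data: bytes) -> str:
    """Base64url without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def livekit_token(api_key: str, api_secret: str, identity: str,
                  room: str, ttl: int = 3600) -> str:
    """Build an HS256-signed room-join token (illustrative sketch only)."""
    header = {"alg": "HS256", "typ": "JWT"}
    now = int(time.time())
    claims = {
        "iss": api_key,    # which API key signed this token
        "sub": identity,   # participant identity shown in the room
        "nbf": now,
        "exp": now + ttl,
        "video": {"room": room, "roomJoin": True},  # grant: may join `room`
    }
    signing_input = (
        b64url(json.dumps(header, separators=(",", ":")).encode())
        + "."
        + b64url(json.dumps(claims, separators=(",", ":")).encode())
    )
    sig = hmac.new(api_secret.encode(), signing_input.encode(),
                   hashlib.sha256).digest()
    return signing_input + "." + b64url(sig)

print(livekit_token("my_api_key", "my_api_secret", "alice", "demo-room"))
```

Because the token is signed with your API secret, it must be generated server-side; the frontend only ever receives the finished token, never the secret itself.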