Commit 7b714c4

[Docs][Voice agents single agent and multi-agent debate example with the voice agents package] [Example as well]
1 parent c943031 commit 7b714c4

7 files changed, with 416 additions and 2 deletions

SECURITY.md

Lines changed: 1 addition & 2 deletions
@@ -35,5 +35,4 @@ Once the vulnerability has been thoroughly assessed, we will take the necessary
3535

3636
We aim to respond to all vulnerability reports in a timely manner and work towards resolving them as quickly as possible. We thank you for your contribution to the security of our software.
3737

38-
Please note that any vulnerability reports that are not related to the specified versions or do not provide sufficient information may be declined.
39-
38+
Please note that any vulnerability reports that are not related to the specified versions or do not provide sufficient information may be declined.

docs/mkdocs.yml

Lines changed: 5 additions & 0 deletions
@@ -441,6 +441,11 @@ nav:
441441
- Job Finding Swarm: "examples/job_finding.md"
442442
- Mergers & Aquisition (M&A) Advisory Swarm: "examples/ma_swarm.md"
443443

444+
- Voice Agents:
445+
- Overview: "swarms/examples/voice_agents_overview.md"
446+
- Single Speech Agent: "swarms/examples/single_agent_speech.md"
447+
- Multi-Agent Speech Debate: "swarms/examples/multi_agent_speech_debate.md"
448+
444449
- Tools & Integrations:
445450
- Overview: "examples/tools_integrations_overview.md"
446451
- Web Search with Exa: "examples/exa_search.md"
Lines changed: 129 additions & 0 deletions
@@ -0,0 +1,129 @@
1+
# Multi-Agent Speech Debate
2+
3+
This tutorial explores a more advanced use case: a turn-based debate between two agents, each of which speaks its responses aloud through text-to-speech (TTS). Optionally, Speech-to-Text (STT) supplies the initial debate topic from a microphone recording.
4+
5+
## Prerequisites
6+
7+
- Python 3.10+
8+
- OpenAI API key
9+
- `swarms` library
10+
- `voice-agents` library
11+
- A working microphone (if using STT)
12+
13+
## Tutorial Steps
14+
15+
1. **Install Dependencies**
16+
```bash
17+
pip3 install -U swarms voice-agents
18+
```
19+
20+
2. **Define Agent Personalities**
21+
Create distinct system prompts for your agents to ensure a dynamic debate. In this example, we use Socrates and Simone de Beauvoir.
22+
23+
3. **Initialize Agents**
24+
Set up two agents with `streaming_on=True`.
25+
26+
4. **Create a Debate Loop**
27+
Implement a function that alternates turns between agents, uses their respective TTS voices, and passes the response of one agent as the input to the next.
28+
29+
5. **Integrate STT (Optional)**
30+
Use `record_audio` and `speech_to_text` to capture your own voice as the starting prompt for the debate.
31+
32+
## Code Example
33+
34+
```python
from swarms import Agent
from swarms.structs.conversation import Conversation
from voice_agents.main import speech_to_text, record_audio, StreamingTTSCallback

def debate_with_speech(
    agents: list,
    max_loops: int = 1,
    task: str = None,
    use_stt_for_input: bool = False,
):
    """
    Simulate a turn-based debate between two agents with speech capabilities.

    Args:
        agents (list): A list containing exactly two Agent instances who will debate.
        max_loops (int): The number of conversational turns.
        task (str): The initial prompt or question to start the debate.
        use_stt_for_input (bool): If True, use speech-to-text for the initial task input.

    Returns:
        str: The formatted conversation history.
    """
    conversation = Conversation()

    # Create TTS callbacks with different voices to differentiate speakers
    tts_callback1 = StreamingTTSCallback(voice="onyx", model="tts-1")  # Deeper voice
    tts_callback2 = StreamingTTSCallback(voice="nova", model="tts-1")  # Softer voice

    # Get initial task from STT or provided string
    if use_stt_for_input:
        print("Please speak your question or topic for the debate...")
        audio = record_audio(duration=5.0)
        task = speech_to_text(audio_data=audio, sample_rate=16000)
        print(f"Transcribed: {task}\n")

    message = task
    speaker = agents[0]
    other = agents[1]
    current_callback = tts_callback1
    other_callback = tts_callback2

    for i in range(max_loops):
        print(f"--- Turn {i+1}: {speaker.agent_name} speaking ---")

        # Agent generates response and speaks in real-time
        response = speaker.run(
            task=message,
            streaming_callback=current_callback,
        )
        current_callback.flush()

        conversation.add(speaker.agent_name, response)

        # Swap roles for the next turn
        message = response
        speaker, other = other, speaker
        current_callback, other_callback = other_callback, current_callback

    return conversation.return_history_as_string()

# Define System Prompts
socratic_prompt = "You are Socrates. Challenge every assumption with logic."
beauvoir_prompt = "You are Simone de Beauvoir. Focus on freedom and existence."

# Instantiate Agents
agent1 = Agent(
    agent_name="Socrates",
    system_prompt=socratic_prompt,
    model_name="gpt-4o",
    streaming_on=True,
)
agent2 = Agent(
    agent_name="Simone de Beauvoir",
    system_prompt=beauvoir_prompt,
    model_name="gpt-4o",
    streaming_on=True,
)

# Run the debate
history = debate_with_speech(
    agents=[agent1, agent2],
    max_loops=3,
    task="Is freedom an illusion?",
)

print(history)
```
122+
123+
## Key Components
124+
125+
- **Differentiated Voices**: Using "onyx" and "nova" helps the listener distinguish which agent is currently speaking.
126+
- **Turn-based Logic**: The output of the first agent becomes the input for the second, creating a continuous dialogue.
127+
- **STT Integration**: `speech_to_text` allows for hands-free interaction with the swarm.
128+
- **Conversation Tracking**: The `Conversation` struct helps maintain a record of the entire exchange.
129+
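The turn-based logic listed under Key Components can be tried without API keys or audio hardware. The following is a minimal sketch, assuming a hypothetical `EchoAgent` stand-in (not part of `swarms`) that exposes only `agent_name` and `run(task)`; the helper name `alternate_turns` is likewise an illustration, not a library function.

```python
from dataclasses import dataclass


@dataclass
class EchoAgent:
    """Hypothetical stand-in for an agent; not part of swarms."""

    agent_name: str

    def run(self, task: str) -> str:
        # A real agent would call an LLM here; this only tags the input.
        return f"{self.agent_name} replies to: {task}"


def alternate_turns(agents, task, max_loops=1):
    """Alternate between two agents, feeding each reply to the other."""
    speaker, other = agents
    message = task
    transcript = []
    for _ in range(max_loops):
        response = speaker.run(message)
        transcript.append((speaker.agent_name, response))
        # The reply becomes the next input, and the roles swap.
        message = response
        speaker, other = other, speaker
    return transcript


turns = alternate_turns(
    [EchoAgent("A"), EchoAgent("B")],
    "Is freedom an illusion?",
    max_loops=3,
)
for name, text in turns:
    print(f"{name}: {text}")
```

Note that `max_loops` counts individual turns rather than full rounds, so with `max_loops=3` the first agent speaks twice and the second once.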
Lines changed: 73 additions & 0 deletions
@@ -0,0 +1,73 @@
1+
# Creating a Single Speech Agent
2+
3+
This tutorial demonstrates how to create a single AI agent with real-time text-to-speech (TTS) capabilities using the Swarms framework and the `voice-agents` package. This setup is ideal for interactive applications where you want the agent to "speak" its responses as they are generated.
4+
5+
## Prerequisites
6+
7+
- Python 3.10+
8+
- OpenAI API key (for both LLM and TTS)
9+
- `swarms` library
10+
- `voice-agents` library
11+
12+
## Tutorial Steps
13+
14+
1. **Install Dependencies**
15+
Install the necessary packages:
16+
```bash
17+
pip3 install -U swarms voice-agents
18+
```
19+
20+
2. **Set Up Environment**
21+
Ensure your OpenAI API key is set in your environment:
22+
```bash
23+
export OPENAI_API_KEY="your-api-key-here"
24+
```
25+
26+
3. **Initialize the Agent**
27+
Create an agent with `streaming_on=True`. This is crucial for the TTS callback to work in real-time.
28+
29+
4. **Configure the TTS Callback**
30+
Use the `StreamingTTSCallback` from the `voice-agents` package. You can choose different voices like "alloy", "echo", "fable", "onyx", "nova", or "shimmer".
31+
32+
5. **Run the Agent**
33+
Pass the `streaming_callback` to the `agent.run()` method.
34+
35+
## Code Example
36+
37+
```python
from swarms import Agent
from voice_agents import StreamingTTSCallback

# Initialize the agent
agent = Agent(
    agent_name="Quantitative-Trading-Agent",
    agent_description="Advanced quantitative trading and algorithmic analysis agent",
    model_name="gpt-4o",
    dynamic_temperature_enabled=True,
    max_loops=1,
    streaming_on=True,  # Required for real-time TTS
)

# Create the streaming TTS callback
# voice: alloy, echo, fable, onyx, nova, shimmer
tts_callback = StreamingTTSCallback(voice="alloy", model="tts-1")

# Run the agent with streaming TTS callback
out = agent.run(
    task="What are the top five best energy stocks across nuclear, solar, gas, and other energy sources?",
    streaming_callback=tts_callback,
)

# Flush any remaining text in the buffer to ensure the last sentence is spoken
tts_callback.flush()

print(out)
```
66+
67+
## How it Works
68+
69+
- **Streaming**: When `streaming_on=True`, the agent yields tokens as they are generated.
70+
- **Callback**: The `StreamingTTSCallback` collects these tokens into sentences and sends them to the OpenAI TTS API.
71+
- **Real-time Audio**: Audio is played back as soon as the first sentence is processed, significantly reducing latency compared to waiting for the full response.
72+
- **Flushing**: The `.flush()` method is called at the end to process any remaining text that didn't end with a sentence delimiter.
73+
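The buffering and flushing behaviour described under "How it Works" can be illustrated with a small stand-alone class. This is a sketch only: `SentenceBuffer` is a hypothetical name, it is not the `voice-agents` implementation, and it appends sentences to a list where a real callback would send them to a TTS API.

```python
import re


class SentenceBuffer:
    """Hypothetical sentence buffer; not the voice-agents implementation."""

    # A sentence boundary is whitespace preceded by '.', '!' or '?'.
    _boundary = re.compile(r"(?<=[.!?])\s+")

    def __init__(self, on_sentence):
        self.on_sentence = on_sentence  # called once per complete sentence
        self.buffer = ""

    def __call__(self, token: str) -> None:
        self.buffer += token
        parts = self._boundary.split(self.buffer)
        # Every part except the last is a complete sentence.
        for sentence in parts[:-1]:
            self.on_sentence(sentence)
        self.buffer = parts[-1]

    def flush(self) -> None:
        # Emit whatever is left, even without a closing delimiter.
        if self.buffer.strip():
            self.on_sentence(self.buffer.strip())
        self.buffer = ""


spoken = []
buf = SentenceBuffer(spoken.append)
for token in ["Solar ", "is grow", "ing. ", "Nuclear is ", "steady"]:
    buf(token)
before_flush = list(spoken)
buf.flush()
print(before_flush)
print(spoken)
```

Before `flush()` only the first sentence has been emitted; the trailing text has no delimiter, so it stays in the buffer until `flush()` is called. This is why the tutorial calls `.flush()` after `agent.run()`.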
Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
1+
# Voice Agents Overview
2+
3+
The Swarms framework supports the creation of interactive, voice-enabled agents through integration with the `voice-agents` package. These agents can perceive and respond using human-like speech, enabling hands-free interaction and more natural user experiences.
4+
5+
## Core Features
6+
7+
- **Real-time Streaming TTS**: Leveraging OpenAI's TTS models to speak responses sentence-by-sentence as they are generated, minimizing latency.
8+
- **Differentiated Voices**: Multiple voice profiles (alloy, onyx, nova, etc.) to give each agent in a swarm a unique personality.
9+
- **Speech-to-Text (STT)**: Integration for voice-based task inputs, allowing users to talk directly to their agent swarms.
10+
- **Seamless Integration**: Works with the standard `Agent` class and complex multi-agent architectures like debates and sequential workflows.
11+
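One way to give each agent in a swarm its own voice is to pair agent names with voice names up front. The following is a minimal sketch, assuming a hypothetical helper `assign_voices` (not part of `swarms` or `voice-agents`) and the six voice names listed in the single-agent tutorial.

```python
from itertools import cycle

# Voice names as listed in the single-agent tutorial.
VOICES = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"]


def assign_voices(agent_names, voices=VOICES):
    """Map each agent name to a voice, reusing voices if agents outnumber them."""
    return dict(zip(agent_names, cycle(voices)))


voice_map = assign_voices(["Socrates", "Simone de Beauvoir"])
print(voice_map)
```

Each resulting voice name could then be passed as the `voice` argument when constructing that agent's `StreamingTTSCallback`.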
12+
## Available Tutorials
13+
14+
In this section, you will find step-by-step guides on implementing voice capabilities:
15+
16+
1. [**Single Speech Agent**](single_agent_speech.md): Learn how to add real-time text-to-speech to a single standalone agent.
17+
2. [**Multi-Agent Speech Debate**](multi_agent_speech_debate.md): A more advanced example in which two agents debate a topic using different voices, with optional speech-to-text input.
18+
19+
## Getting Started
20+
21+
To use these features, you'll need to install the `voice-agents` package alongside `swarms`:
22+
23+
```bash
24+
pip install -U swarms voice-agents
25+
```
26+
27+
You'll also need a valid `OPENAI_API_KEY` to access the TTS and STT models.
28+
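Because both the TTS and STT calls depend on the key, it can help to check for it before constructing any agents. A small sketch using only the standard library; the helper name `has_openai_key` is hypothetical.

```python
import os


def has_openai_key(env=os.environ) -> bool:
    """Return True if OPENAI_API_KEY is set to a non-empty value."""
    return bool(env.get("OPENAI_API_KEY", "").strip())


if not has_openai_key():
    print("OPENAI_API_KEY is not set; TTS and STT calls will fail.")
```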
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
from swarms import Agent
from voice_agents import StreamingTTSCallback

# Initialize the agent
agent = Agent(
    agent_name="Quantitative-Trading-Agent",
    agent_description="Advanced quantitative trading and algorithmic analysis agent",
    model_name="gpt-4.1",
    dynamic_temperature_enabled=True,
    max_loops=1,
    dynamic_context_window=True,
    top_p=None,
    streaming_on=True,
    interactive=False,
)

# Create the streaming TTS callback
tts_callback = StreamingTTSCallback(voice="alloy", model="tts-1")

# # Run the agent with streaming TTS callback
# out = agent.run(
#     task="What are the top five best energy stocks across nuclear, solar, gas, and other energy sources?",
#     streaming_callback=tts_callback,
# )

# # Flush any remaining text in the buffer
# tts_callback.flush()

# print(out)
