Commit c4c947e

[Docs][Example][Hierarchical voice agents example]
1 parent e021c38 commit c4c947e

3 files changed: 369 additions, 0 deletions

docs/mkdocs.yml

Lines changed: 1 addition & 0 deletions
@@ -460,6 +460,7 @@ nav:
 - Overview: "swarms/examples/voice_agents_overview.md"
 - Single Speech Agent: "swarms/examples/single_agent_speech.md"
 - Multi-Agent Speech Debate: "swarms/examples/multi_agent_speech_debate.md"
+- Hierarchical Speech Swarm: "swarms/examples/hierarchical_speech_swarm.md"

 - Tools & Integrations:
 - Overview: "examples/tools_integrations_overview.md"
Lines changed: 269 additions & 0 deletions
@@ -0,0 +1,269 @@
# Hierarchical Swarm with Speech Capabilities

This tutorial demonstrates how to create a hierarchical swarm in which multiple specialized agents communicate through voice using text-to-speech (TTS). Each agent has a unique voice, making it easy to distinguish who is speaking during collaborative task execution.

## Overview

This example combines hierarchical multi-agent collaboration with voice output. In this architecture:

- **Director Agent**: Coordinates the overall workflow and distributes tasks

- **Worker Agents**: Specialized agents that execute specific tasks

- **Voice Communication**: Each agent speaks its responses using a distinct TTS voice

This lets you hear the agents collaborate in real time.

## Prerequisites

- Python 3.10+
- OpenAI API key (used for both the LLM and TTS)
- `swarms` library
- `voice-agents` library

## Tutorial Steps

1. **Install Dependencies**
   ```bash
   pip3 install -U swarms voice-agents
   ```

2. **Set Up Environment**
   Ensure your OpenAI API key is set:
   ```bash
   export OPENAI_API_KEY="your-api-key-here"
   ```

3. **Create TTS Callbacks**
   Define distinct voices for each agent to differentiate speakers.

4. **Initialize Agents with TTS**
   Create specialized agents with `streaming_on=True` and assign TTS callbacks directly.

5. **Create Hierarchical Swarm**
   Set up the swarm with your speech-enabled agents.

6. **Run the Swarm**
   Execute tasks and listen to agents collaborate through voice.

## Complete Code Example

```python
"""
Hierarchical Swarm with Speech Capabilities

This example demonstrates a hierarchical swarm where agents communicate
with each other through voice using text-to-speech (TTS) capabilities.
Each agent has a unique voice, making it easy to distinguish who is speaking.
"""

from swarms import Agent, HierarchicalSwarm
from voice_agents import StreamingTTSCallback

# Create TTS callbacks for each agent with distinct voices
tts_callbacks = {
    "Research-Analyst": StreamingTTSCallback(
        voice="onyx", model="openai/tts-1"
    ),  # Deeper, authoritative voice
    "Data-Analyst": StreamingTTSCallback(
        voice="nova", model="openai/tts-1"
    ),  # Softer, analytical voice
    "Strategy-Consultant": StreamingTTSCallback(
        voice="alloy", model="openai/tts-1"
    ),  # Clear, professional voice
    "Director": StreamingTTSCallback(
        voice="echo", model="openai/tts-1"
    ),  # Distinctive voice for director
}

# Create specialized agents with streaming enabled for TTS
# Assign TTS callbacks directly to each agent
research_agent = Agent(
    agent_name="Research-Analyst",
    agent_description="Specialized in comprehensive research and data gathering",
    model_name="gpt-4.1",
    max_loops=1,
    verbose=False,
    streaming_on=True,  # Required for TTS streaming
    streaming_callback=tts_callbacks.get("Research-Analyst"),  # Direct TTS callback
)

analysis_agent = Agent(
    agent_name="Data-Analyst",
    agent_description="Expert in data analysis and pattern recognition",
    model_name="gpt-4.1",
    max_loops=1,
    verbose=False,
    streaming_on=True,  # Required for TTS streaming
    streaming_callback=tts_callbacks.get("Data-Analyst"),  # Direct TTS callback
)

strategy_agent = Agent(
    agent_name="Strategy-Consultant",
    agent_description="Specialized in strategic planning and recommendations",
    model_name="gpt-4.1",
    max_loops=1,
    verbose=False,
    streaming_on=True,  # Required for TTS streaming
    streaming_callback=tts_callbacks.get("Strategy-Consultant"),  # Direct TTS callback
)

# Create hierarchical swarm
swarm = HierarchicalSwarm(
    name="Swarms Corporation Operations",
    description="Enterprise-grade hierarchical swarm for complex task execution with voice communication",
    agents=[research_agent, analysis_agent, strategy_agent],
    max_loops=1,
    interactive=False,
    director_model_name="gpt-4.1",
    director_temperature=0.7,
    director_top_p=None,
    planning_enabled=True,
)

# Define the task
task = (
    "Conduct a comprehensive analysis of renewable energy stocks. "
    "Research the current market trends, analyze the data, and provide "
    "strategic recommendations for investment."
)

# Run the swarm (agents already have their TTS callbacks assigned)
try:
    result = swarm.run(task=task)

    # Flush all TTS buffers to ensure everything is spoken
    for callback in tts_callbacks.values():
        callback.flush()

except Exception:
    # Still flush buffers on error
    for callback in tts_callbacks.values():
        callback.flush()
    raise
```

## Key Components Explained

### 1. TTS Callback Configuration

Each agent gets a unique voice to distinguish speakers:

```python
tts_callbacks = {
    "Research-Analyst": StreamingTTSCallback(voice="onyx", model="openai/tts-1"),
    "Data-Analyst": StreamingTTSCallback(voice="nova", model="openai/tts-1"),
    "Strategy-Consultant": StreamingTTSCallback(voice="alloy", model="openai/tts-1"),
    "Director": StreamingTTSCallback(voice="echo", model="openai/tts-1"),
}
```

**Available Voices:**

| Voice     | Description                    |
|-----------|--------------------------------|
| `alloy`   | Clear, professional voice      |
| `echo`    | Distinctive, commanding voice  |
| `fable`   | Warm, narrative voice          |
| `onyx`    | Deeper, authoritative voice    |
| `nova`    | Softer, analytical voice       |
| `shimmer` | Bright, energetic voice        |

### 2. Agent Configuration

Key requirements for speech-enabled agents:

- **`streaming_on=True`**: Enables the real-time token streaming that TTS requires

- **`streaming_callback`**: Assigns the TTS callback directly to the agent

- **`max_loops=1`**: Typically set to 1 for hierarchical swarms, because the director handles coordination

```python
research_agent = Agent(
    agent_name="Research-Analyst",
    agent_description="Specialized in comprehensive research and data gathering",
    model_name="gpt-4.1",
    max_loops=1,
    verbose=False,
    streaming_on=True,  # Required for TTS streaming
    streaming_callback=tts_callbacks.get("Research-Analyst"),  # Direct TTS callback
)
```

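
The worker definitions in this tutorial repeat the same speech settings for every agent. As a minimal sketch, a small helper could build those keyword arguments in one place. The helper name `speech_agent_kwargs` is hypothetical and not part of `swarms`; the parameter names are the ones used in this tutorial's `Agent(...)` calls. Indexing the callback dict with `[]` raises a `KeyError` on a misspelled agent name, whereas `.get()` would silently pass `None`.

```python
from typing import Any, Callable, Dict


def speech_agent_kwargs(
    name: str,
    description: str,
    callbacks: Dict[str, Callable[..., Any]],
    model_name: str = "gpt-4.1",
) -> Dict[str, Any]:
    """Build keyword arguments for one speech-enabled agent (hypothetical helper)."""
    return {
        "agent_name": name,
        "agent_description": description,
        "model_name": model_name,
        "max_loops": 1,
        "verbose": False,
        "streaming_on": True,  # required for TTS streaming
        "streaming_callback": callbacks[name],  # KeyError if the name is misspelled
    }


# Stand-in callback so the sketch runs without the TTS library installed.
demo_callbacks = {"Research-Analyst": print}
kwargs = speech_agent_kwargs(
    "Research-Analyst",
    "Specialized in comprehensive research and data gathering",
    demo_callbacks,
)
print(sorted(kwargs))
```

With the real libraries installed, the resulting dict would be passed as `Agent(**kwargs)`.
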
### 3. Hierarchical Swarm Setup

The swarm coordinates multiple agents through a director:

```python
swarm = HierarchicalSwarm(
    name="Swarms Corporation Operations",
    description="Enterprise-grade hierarchical swarm for complex task execution",
    agents=[research_agent, analysis_agent, strategy_agent],
    max_loops=1,
    director_model_name="gpt-4.1",
    director_temperature=0.7,
    planning_enabled=True,
)
```

**Key Parameters:**
- `agents`: List of worker agents with TTS capabilities
- `director_model_name`: Model used by the coordinating director
- `planning_enabled`: Allows the director to create execution plans
- `max_loops`: Number of feedback iterations

### 4. Buffer Flushing

Always flush TTS buffers after execution to ensure all speech is played:

```python
# Flush all TTS buffers to ensure everything is spoken
for callback in tts_callbacks.values():
    callback.flush()
```

This is critical because the TTS callback buffers text and may not automatically flush incomplete sentences.

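
To illustrate why an explicit flush matters, here is a toy sentence buffer in plain Python. It is an assumption about how a buffering callback might behave, not the actual `StreamingTTSCallback` implementation: the class name `SentenceBuffer` and its `speak` argument are hypothetical. It only "speaks" text once a sentence is complete, so a trailing fragment stays buffered until `flush()` is called.

```python
import re


class SentenceBuffer:
    """Toy stand-in for a buffering TTS callback (hypothetical, not the voice_agents class)."""

    def __init__(self, speak):
        self.speak = speak  # function that receives one complete chunk of text
        self.pending = ""   # text received but not yet spoken

    def __call__(self, token: str) -> None:
        self.pending += token
        # Speak every complete sentence; keep the unfinished tail buffered.
        *complete, self.pending = re.split(r"(?<=[.!?])\s+", self.pending)
        for sentence in complete:
            self.speak(sentence)

    def flush(self) -> None:
        # Speak whatever is left, even without terminal punctuation.
        leftover = self.pending.strip()
        if leftover:
            self.speak(leftover)
        self.pending = ""


spoken = []
buffer = SentenceBuffer(spoken.append)
for token in ["Solar ", "is ", "growing. ", "Wind ", "is ", "steady"]:
    buffer(token)

before_flush = list(spoken)  # the trailing fragment is still buffered here
buffer.flush()
print(before_flush)
print(spoken)
```

In this sketch the final fragment has no closing punctuation, so it is only emitted by the `flush()` call.
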
## How It Works

1. **Task Distribution**: The director agent receives the task and creates a plan
2. **Agent Assignment**: The director distributes subtasks to the specialized worker agents
3. **Real-time Speech**: As each agent generates its response, tokens are streamed to that agent's TTS callback
4. **Voice Differentiation**: Each agent's unique voice makes it clear who is speaking
5. **Collaboration**: Agents can reference each other's work, creating a natural conversation flow

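
The flow above can be simulated with plain functions to make the order of events concrete. This is a toy sketch, not how `HierarchicalSwarm` is implemented: `director_plan`, `run_worker`, and `make_voice` are hypothetical names, the plan is hard-coded where a real director would call an LLM, and each "voice" simply records what would be spoken.

```python
from typing import Callable, Dict, List, Tuple

# Records (agent_name, token) pairs in the order they would be spoken.
transcript: List[Tuple[str, str]] = []


def make_voice(agent_name: str) -> Callable[[str], None]:
    """Return a per-agent callback that stands in for a TTS voice."""
    def on_token(token: str) -> None:
        transcript.append((agent_name, token))
    return on_token


voices: Dict[str, Callable[[str], None]] = {
    name: make_voice(name)
    for name in ("Research-Analyst", "Data-Analyst", "Strategy-Consultant")
}


def director_plan(task: str) -> List[Tuple[str, str]]:
    # A real director would ask an LLM for this plan; here it is hard-coded.
    return [
        ("Research-Analyst", f"Gather sources on {task}"),
        ("Data-Analyst", f"Quantify trends in {task}"),
        ("Strategy-Consultant", f"Recommend actions for {task}"),
    ]


def run_worker(agent_name: str, subtask: str) -> str:
    # Fake generation: stream the reply word by word to the agent's own callback.
    reply = f"Done: {subtask}"
    for word in reply.split():
        voices[agent_name](word)
    return reply


results = [
    run_worker(name, subtask)
    for name, subtask in director_plan("renewable energy stocks")
]
speakers = [name for name, _ in transcript]
print(results)
```

Because the workers run one after another here, each agent's tokens reach only its own callback and no two voices interleave.
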
## Advanced Customization

### Custom Voice Selection

Choose voices that match agent personalities:

```python
# Authoritative leader
leader_voice = StreamingTTSCallback(voice="onyx", model="openai/tts-1")

# Analytical researcher
researcher_voice = StreamingTTSCallback(voice="nova", model="openai/tts-1")

# Professional consultant
consultant_voice = StreamingTTSCallback(voice="alloy", model="openai/tts-1")
```

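
When the number of agents grows, it can help to assign voices programmatically so that no voice is used twice. The following is a minimal sketch using only the six voice names listed under Available Voices; the function `assign_voices` and its `preferred` argument are hypothetical and not part of either library.

```python
from typing import Dict, Iterable, Optional

# The six voice names listed in this tutorial.
VOICES = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"]


def assign_voices(
    agent_names: Iterable[str],
    preferred: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Return {agent_name: voice} with no voice used twice (hypothetical helper)."""
    preferred = dict(preferred or {})
    unknown = set(preferred.values()) - set(VOICES)
    if unknown:
        raise ValueError(f"unknown voices: {sorted(unknown)}")
    if len(set(preferred.values())) != len(preferred):
        raise ValueError("preferred voices must be distinct")

    # Voices not reserved by an explicit preference, in table order.
    remaining = [voice for voice in VOICES if voice not in preferred.values()]
    assignment: Dict[str, str] = {}
    for name in agent_names:
        if name in preferred:
            assignment[name] = preferred[name]
        elif remaining:
            assignment[name] = remaining.pop(0)
        else:
            raise ValueError("more agents than available voices")
    return assignment


assignment = assign_voices(
    ["Director", "Research-Analyst", "Data-Analyst"],
    preferred={"Director": "echo"},
)
print(assignment)
```

Each resulting voice name could then be passed as the `voice` argument when constructing that agent's callback.
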
## Best Practices

| Best Practice             | Description                                                  |
|---------------------------|--------------------------------------------------------------|
| **Voice Selection**       | Use distinct voices for each agent to avoid confusion        |
| **Buffer Management**     | Always flush TTS buffers after execution                     |
| **Error Handling**        | Flush buffers even on errors to prevent audio glitches       |
| **Streaming Requirement** | Always set `streaming_on=True` for TTS to work               |
| **Direct Assignment**     | Assign TTS callbacks directly to agents for better control   |

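
The error-handling practice in the table can also be expressed with a context manager, which avoids writing the flush loop twice. This is a sketch in plain Python under the assumption that each callback exposes a `flush()` method, as the tutorial's callbacks do; `flush_on_exit` and `DummyCallback` are hypothetical names used only for illustration.

```python
from contextlib import contextmanager


@contextmanager
def flush_on_exit(callbacks):
    """Flush every callback on exit, whether the body succeeded or raised."""
    try:
        yield callbacks
    finally:
        for callback in callbacks:
            try:
                callback.flush()
            except Exception as exc:
                # One failing callback should not stop the others from flushing.
                print(f"flush failed: {exc!r}")


class DummyCallback:
    """Stand-in for a TTS callback so the sketch runs without audio."""

    def __init__(self):
        self.was_flushed = False

    def flush(self):
        self.was_flushed = True


callbacks = [DummyCallback(), DummyCallback()]
try:
    with flush_on_exit(callbacks):
        raise RuntimeError("simulated swarm failure")
except RuntimeError:
    pass

print([callback.was_flushed for callback in callbacks])
```

In the tutorial's setting, the `with` body would contain the `swarm.run(task=task)` call and the argument would be `tts_callbacks.values()`.
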
### Tips for Audio Playback

- **Audio Overlap**: Agents normally speak sequentially, but if you hear overlapping audio, check that agents aren’t being executed concurrently. Adjust `max_loops` or modify the execution order if necessary.

- **Missing Audio**: Always flush TTS buffers after execution with `callback.flush()`. Make sure agents are generating responses and that the TTS callback is actively receiving streamed tokens.
Lines changed: 99 additions & 0 deletions
@@ -0,0 +1,99 @@
"""
Hierarchical Swarm with Speech Capabilities

This example demonstrates a hierarchical swarm where agents communicate
with each other through voice using text-to-speech (TTS) capabilities.
Each agent has a unique voice, making it easy to distinguish who is speaking.
"""

from swarms import Agent, HierarchicalSwarm
from voice_agents import StreamingTTSCallback

# Create TTS callbacks for each agent with distinct voices
tts_callbacks = {
    "Research-Analyst": StreamingTTSCallback(
        voice="onyx", model="openai/tts-1"
    ),  # Deeper, authoritative voice
    "Data-Analyst": StreamingTTSCallback(
        voice="nova", model="openai/tts-1"
    ),  # Softer, analytical voice
    "Strategy-Consultant": StreamingTTSCallback(
        voice="alloy", model="openai/tts-1"
    ),  # Clear, professional voice
    "Director": StreamingTTSCallback(
        voice="echo", model="openai/tts-1"
    ),  # Distinctive voice for director
}

# Create specialized agents with streaming enabled for TTS
# Assign TTS callbacks directly to each agent
research_agent = Agent(
    agent_name="Research-Analyst",
    agent_description="Specialized in comprehensive research and data gathering",
    model_name="gpt-4.1",
    max_loops=1,
    verbose=False,
    streaming_on=True,  # Required for TTS streaming
    streaming_callback=tts_callbacks.get(
        "Research-Analyst"
    ),  # Direct TTS callback
)

analysis_agent = Agent(
    agent_name="Data-Analyst",
    agent_description="Expert in data analysis and pattern recognition",
    model_name="gpt-4.1",
    max_loops=1,
    verbose=False,
    streaming_on=True,  # Required for TTS streaming
    streaming_callback=tts_callbacks.get(
        "Data-Analyst"
    ),  # Direct TTS callback
)

strategy_agent = Agent(
    agent_name="Strategy-Consultant",
    agent_description="Specialized in strategic planning and recommendations",
    model_name="gpt-4.1",
    max_loops=1,
    verbose=False,
    streaming_on=True,  # Required for TTS streaming
    streaming_callback=tts_callbacks.get(
        "Strategy-Consultant"
    ),  # Direct TTS callback
)

# Create hierarchical swarm
swarm = HierarchicalSwarm(
    name="Swarms Corporation Operations",
    description="Enterprise-grade hierarchical swarm for complex task execution with voice communication",
    agents=[research_agent, analysis_agent, strategy_agent],
    max_loops=1,
    interactive=False,
    director_model_name="gpt-4.1",
    director_temperature=0.7,
    director_top_p=None,
    planning_enabled=True,
)


# Define the task
task = (
    "Conduct a comprehensive analysis of renewable energy stocks. "
    "Research the current market trends, analyze the data, and provide "
    "strategic recommendations for investment."
)

# Run the swarm (agents already have their TTS callbacks assigned)
try:
    result = swarm.run(task=task)

    # Flush all TTS buffers to ensure everything is spoken
    for callback in tts_callbacks.values():
        callback.flush()

except Exception:
    # Still flush buffers on error
    for callback in tts_callbacks.values():
        callback.flush()
    raise
