# Hierarchical Swarm with Speech Capabilities

This tutorial shows how to build a hierarchical swarm in which several specialized agents speak their responses aloud using text-to-speech (TTS). Each agent has a distinct voice, so it is easy to tell who is speaking while the agents collaborate on a task.

## Overview

A hierarchical swarm combines multi-agent collaboration with voice output. In this architecture:

- **Director Agent**: Coordinates the overall workflow and distributes tasks.

- **Worker Agents**: Specialized agents that execute specific subtasks.

- **Voice Communication**: Each agent speaks its responses using a distinct TTS voice.

This lets you hear the agents collaborating in real time as they stream their responses.

## Prerequisites

- Python 3.10+
- An OpenAI API key (used for both the LLM and TTS)
- The `swarms` library
- The `voice-agents` library

## Tutorial Steps

1. **Install Dependencies**
   ```bash
   pip3 install -U swarms voice-agents
   ```

2. **Set Up Environment**
   Make sure your OpenAI API key is set:
   ```bash
   export OPENAI_API_KEY="your-api-key-here"
   ```

3. **Create TTS Callbacks**
   Define a distinct voice for each agent so speakers are easy to tell apart.

4. **Initialize Agents with TTS**
   Create specialized agents with `streaming_on=True` and pass each one its TTS callback through `streaming_callback`.

5. **Create Hierarchical Swarm**
   Set up the swarm with your speech-enabled agents.

6. **Run the Swarm**
   Execute the task and listen to the agents collaborate through voice.
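
Before running the swarm, it can help to fail fast when a prerequisite is missing. The sketch below is a hypothetical helper, `preflight_check`, which is not part of `swarms` or `voice-agents`. It only checks the Python version and that `OPENAI_API_KEY` is non-empty. It does not check that the key is valid.

```python
import os
import sys


def preflight_check(env=None, version=None):
    """Return a list of problems that would stop this tutorial from running.

    `env` and `version` default to the real environment and interpreter;
    they are parameters only so the check can be tested in isolation.
    """
    env = os.environ if env is None else env
    version = sys.version_info if version is None else version
    problems = []
    # Compare (major, minor) only, e.g. (3, 11) >= (3, 10)
    if tuple(version[:2]) < (3, 10):
        problems.append("Python 3.10+ is required")
    # An empty string counts as "not set"
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    return problems
```

You might call `problems = preflight_check()` at the top of your script and exit with a message if the list is non-empty.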

## Complete Code Example

```python
"""
Hierarchical Swarm with Speech Capabilities

This example demonstrates a hierarchical swarm in which each worker agent
speaks its responses aloud using text-to-speech (TTS). Each agent has a
unique voice, making it easy to distinguish who is speaking.
"""

from swarms import Agent, HierarchicalSwarm
from voice_agents import StreamingTTSCallback

# Create TTS callbacks for each agent with distinct voices
tts_callbacks = {
    "Research-Analyst": StreamingTTSCallback(
        voice="onyx", model="openai/tts-1"
    ),  # Deeper, authoritative voice
    "Data-Analyst": StreamingTTSCallback(
        voice="nova", model="openai/tts-1"
    ),  # Softer, analytical voice
    "Strategy-Consultant": StreamingTTSCallback(
        voice="alloy", model="openai/tts-1"
    ),  # Clear, professional voice
    # Distinctive voice reserved for the director. This example never passes
    # this callback to the swarm, so the director is not voiced as written.
    "Director": StreamingTTSCallback(
        voice="echo", model="openai/tts-1"
    ),
}

# Create specialized agents with streaming enabled for TTS,
# assigning each agent its TTS callback directly
research_agent = Agent(
    agent_name="Research-Analyst",
    agent_description="Specialized in comprehensive research and data gathering",
    model_name="gpt-4.1",
    max_loops=1,
    verbose=False,
    streaming_on=True,  # Required for TTS streaming
    streaming_callback=tts_callbacks["Research-Analyst"],  # Direct TTS callback
)

analysis_agent = Agent(
    agent_name="Data-Analyst",
    agent_description="Expert in data analysis and pattern recognition",
    model_name="gpt-4.1",
    max_loops=1,
    verbose=False,
    streaming_on=True,  # Required for TTS streaming
    streaming_callback=tts_callbacks["Data-Analyst"],  # Direct TTS callback
)

strategy_agent = Agent(
    agent_name="Strategy-Consultant",
    agent_description="Specialized in strategic planning and recommendations",
    model_name="gpt-4.1",
    max_loops=1,
    verbose=False,
    streaming_on=True,  # Required for TTS streaming
    streaming_callback=tts_callbacks["Strategy-Consultant"],  # Direct TTS callback
)

# Create the hierarchical swarm
swarm = HierarchicalSwarm(
    name="Swarms Corporation Operations",
    description="Enterprise-grade hierarchical swarm for complex task execution with voice communication",
    agents=[research_agent, analysis_agent, strategy_agent],
    max_loops=1,
    interactive=False,
    director_model_name="gpt-4.1",
    director_temperature=0.7,
    director_top_p=None,
    planning_enabled=True,
)

# Define the task
task = (
    "Conduct a comprehensive analysis of renewable energy stocks. "
    "Research the current market trends, analyze the data, and provide "
    "strategic recommendations for investment."
)

# Run the swarm (the worker agents already have their TTS callbacks assigned)
try:
    result = swarm.run(task=task)
    print(result)
finally:
    # Flush all TTS buffers, on success or error, so everything is spoken
    for callback in tts_callbacks.values():
        callback.flush()
```

## Key Components Explained

### 1. TTS Callback Configuration

Each agent gets a unique voice so you can tell the speakers apart:

```python
from voice_agents import StreamingTTSCallback

tts_callbacks = {
    "Research-Analyst": StreamingTTSCallback(voice="onyx", model="openai/tts-1"),
    "Data-Analyst": StreamingTTSCallback(voice="nova", model="openai/tts-1"),
    "Strategy-Consultant": StreamingTTSCallback(voice="alloy", model="openai/tts-1"),
    "Director": StreamingTTSCallback(voice="echo", model="openai/tts-1"),
}
```

**Available Voices** (the descriptions are subjective impressions):

| Voice     | Description                   |
|-----------|-------------------------------|
| `alloy`   | Clear, professional voice     |
| `echo`    | Distinctive, commanding voice |
| `fable`   | Warm, narrative voice         |
| `onyx`    | Deeper, authoritative voice   |
| `nova`    | Softer, analytical voice      |
| `shimmer` | Bright, energetic voice       |
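
The whole point of the setup is telling speakers apart, so a quick check that no two agents share a voice can catch copy-paste mistakes. This is a minimal sketch. `check_voice_map`, `KNOWN_VOICES`, and `VOICE_MAP` are hypothetical names. `KNOWN_VOICES` holds only the six voices in the table above, so pass a different set if your TTS provider offers others.

```python
from collections import Counter

# The six voices listed in the "Available Voices" table.
KNOWN_VOICES = frozenset({"alloy", "echo", "fable", "onyx", "nova", "shimmer"})


def check_voice_map(voice_map, known_voices=KNOWN_VOICES):
    """Raise ValueError if a voice is unknown or shared by two or more agents."""
    unknown = sorted({v for v in voice_map.values() if v not in known_voices})
    if unknown:
        raise ValueError(f"Unknown voice(s): {', '.join(unknown)}")
    counts = Counter(voice_map.values())
    shared = sorted(voice for voice, n in counts.items() if n > 1)
    if shared:
        raise ValueError(f"Voice(s) used by more than one agent: {', '.join(shared)}")


# Agent-to-voice assignment used in this tutorial.
VOICE_MAP = {
    "Research-Analyst": "onyx",
    "Data-Analyst": "nova",
    "Strategy-Consultant": "alloy",
    "Director": "echo",
}

check_voice_map(VOICE_MAP)  # all four voices are known and distinct, so no error
```

If the check passes, you can build the callbacks from the same map using the constructor arguments shown above, for example `{name: StreamingTTSCallback(voice=voice, model="openai/tts-1") for name, voice in VOICE_MAP.items()}`.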

### 2. Agent Configuration

Key requirements for speech-enabled agents:

- **`streaming_on=True`**: Enables real-time token streaming. The TTS callback needs this to speak text as it is generated.

- **`streaming_callback`**: The TTS callback assigned directly to each agent.

- **`max_loops=1`**: Typically set to 1 for agents in a hierarchical swarm, because the director handles coordination.

```python
research_agent = Agent(
    agent_name="Research-Analyst",
    agent_description="Specialized in comprehensive research and data gathering",
    model_name="gpt-4.1",
    max_loops=1,
    verbose=False,
    streaming_on=True,  # Required for TTS streaming
    streaming_callback=tts_callbacks["Research-Analyst"],  # Direct TTS callback
)
```
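
In the complete example, the three agent definitions differ only in name and description. One way to remove the repetition is to build the keyword arguments from a small spec. The sketch below assumes you keep the same `Agent` keyword arguments used in this tutorial. `AgentSpec` and `build_agent_kwargs` are hypothetical helpers, not part of `swarms`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSpec:
    """Hypothetical per-agent settings; everything else is shared."""

    name: str
    description: str


def build_agent_kwargs(spec, tts_callbacks, model_name="gpt-4.1"):
    """Return the Agent keyword arguments used in this tutorial for one agent."""
    return {
        "agent_name": spec.name,
        "agent_description": spec.description,
        "model_name": model_name,
        "max_loops": 1,
        "verbose": False,
        "streaming_on": True,  # Required for TTS streaming
        # Indexing raises KeyError on a misspelled name instead of returning None
        "streaming_callback": tts_callbacks[spec.name],
    }


SPECS = [
    AgentSpec("Research-Analyst", "Specialized in comprehensive research and data gathering"),
    AgentSpec("Data-Analyst", "Expert in data analysis and pattern recognition"),
    AgentSpec("Strategy-Consultant", "Specialized in strategic planning and recommendations"),
]
```

With `Agent` and `tts_callbacks` from the complete example in scope, `agents = [Agent(**build_agent_kwargs(spec, tts_callbacks)) for spec in SPECS]` creates the same three agents. A name mismatch raises an error immediately instead of leaving an agent silent.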

### 3. Hierarchical Swarm Setup

The swarm coordinates multiple agents through a director:

```python
swarm = HierarchicalSwarm(
    name="Swarms Corporation Operations",
    description="Enterprise-grade hierarchical swarm for complex task execution",
    agents=[research_agent, analysis_agent, strategy_agent],
    max_loops=1,
    director_model_name="gpt-4.1",
    director_temperature=0.7,
    planning_enabled=True,
)
```

**Key Parameters:**

- `agents`: List of worker agents with TTS capabilities
- `director_model_name`: Model used by the coordinating director
- `planning_enabled`: Lets the director create an execution plan before assigning work
- `max_loops`: Number of feedback iterations

### 4. Buffer Flushing

Always flush the TTS buffers after execution, including when an error occurs, so that all buffered text is spoken:

```python
# Flush all TTS buffers to ensure everything is spoken
for callback in tts_callbacks.values():
    callback.flush()
```

This matters because the TTS callback buffers streamed text and may not automatically speak an incomplete sentence, such as the last fragment of a response, until it is flushed.
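
To see why flushing matters, here is a toy model of a sentence-buffering callback. It is not the `voice-agents` implementation. It only models the behavior described above: text is buffered and spoken one sentence at a time. `speak` is any function standing in for audio synthesis (here, appending to a list). A sentence is assumed to end at `.`, `!`, or `?` followed by whitespace.

```python
import re
from typing import Callable

# Treat ".", "!" or "?" followed by whitespace as the end of a sentence.
SENTENCE_END = re.compile(r"[.!?]\s+")


class SentenceBufferingCallback:
    """Toy stand-in for a streaming TTS callback (not the voice-agents code).

    Text chunks are accumulated; each complete sentence is handed to `speak`,
    and any trailing fragment stays buffered until flush() is called.
    """

    def __init__(self, speak: Callable[[str], None]):
        self.speak = speak
        self._buffer = ""

    def __call__(self, chunk: str) -> None:
        self._buffer += chunk
        start = 0
        for match in SENTENCE_END.finditer(self._buffer):
            sentence = self._buffer[start:match.end()].strip()
            if sentence:
                self.speak(sentence)
            start = match.end()
        # Keep only the unfinished tail for the next chunk
        self._buffer = self._buffer[start:]

    def flush(self) -> None:
        """Speak whatever is left in the buffer, even without punctuation."""
        remainder = self._buffer.strip()
        self._buffer = ""
        if remainder:
            self.speak(remainder)


spoken = []
callback = SentenceBufferingCallback(spoken.append)
try:
    for chunk in ["Solar leads. ", "Wind lags"]:
        callback(chunk)
finally:
    callback.flush()  # without this, "Wind lags" would never be spoken
print(spoken)  # ['Solar leads.', 'Wind lags']
```

The last fragment, `Wind lags`, has no sentence-ending punctuation, so it stays in the buffer until `flush()` runs. This mirrors how the complete example flushes after `swarm.run()`, including on errors.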

## How It Works

1. **Task Distribution**: The director agent receives the task and creates a plan.
2. **Agent Assignment**: The director distributes subtasks to the specialized worker agents.
3. **Real-time Speech**: As each agent generates its response, tokens are streamed to that agent's TTS callback.
4. **Voice Differentiation**: Each agent's unique voice makes it clear who is speaking.
5. **Collaboration**: Agents can reference each other's work, so the spoken output unfolds like a natural conversation.
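
The five steps above can be modeled in plain Python to make the control flow concrete. This is a toy, single-process sketch. It calls no LLM and does not use `swarms`. `fixed_plan`, the canned worker functions, `RecordingVoice`, and `run_hierarchy` are all hypothetical stand-ins. In the real `HierarchicalSwarm`, the director model produces the plan rather than a fixed list.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Plan = Callable[[str], Iterable[Tuple[str, str]]]  # task -> (agent, subtask) pairs
Worker = Callable[[str], str]  # subtask -> response text


class RecordingVoice:
    """Stand-in for a TTS callback: collects chunks and logs them on flush."""

    def __init__(self, voice: str, log: List[Tuple[str, str]]):
        self.voice = voice
        self.log = log
        self._pending: List[str] = []

    def __call__(self, chunk: str) -> None:
        self._pending.append(chunk)

    def flush(self) -> None:
        if self._pending:
            self.log.append((self.voice, "".join(self._pending)))
            self._pending.clear()


def run_hierarchy(task: str, plan: Plan, workers: Dict[str, Worker],
                  voices: Dict[str, RecordingVoice]) -> Dict[str, str]:
    """Toy director loop: plan, assign, stream each response to its own voice."""
    results = {}
    try:
        for agent_name, subtask in plan(task):  # steps 1-2: plan and assign
            text = workers[agent_name](subtask)
            voice = voices[agent_name]
            for i in range(0, len(text), 8):  # step 3: stream in small chunks
                voice(text[i:i + 8])
            voice.flush()  # finish this agent before the next one speaks
            results[agent_name] = text
    finally:
        for voice in voices.values():  # speak anything left over, even on error
            voice.flush()
    return results


def fixed_plan(task: str) -> List[Tuple[str, str]]:
    """Hypothetical stand-in for the director's generated plan."""
    return [
        ("Research-Analyst", f"Research market trends: {task}"),
        ("Data-Analyst", "Analyze the research findings"),
        ("Strategy-Consultant", "Recommend an investment strategy"),
    ]


log: List[Tuple[str, str]] = []
voices = {
    "Research-Analyst": RecordingVoice("onyx", log),
    "Data-Analyst": RecordingVoice("nova", log),
    "Strategy-Consultant": RecordingVoice("alloy", log),
}
workers = {  # canned responses instead of LLM calls
    "Research-Analyst": lambda subtask: "Solar capacity grew this year.",
    "Data-Analyst": lambda subtask: "Margins are improving.",
    "Strategy-Consultant": lambda subtask: "Favor diversified exposure.",
}
results = run_hierarchy("renewable energy stocks", fixed_plan, workers, voices)
for voice_name, text in log:
    print(f"[{voice_name}] {text}")
```

Each worker's output goes only to its own callback, so every logged utterance is attributed to exactly one voice. That property is what makes the spoken output easy to follow.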

## Advanced Customization

### Custom Voice Selection

Choose voices that match each agent's role:

```python
from voice_agents import StreamingTTSCallback

# Authoritative leader
leader_voice = StreamingTTSCallback(voice="onyx", model="openai/tts-1")

# Analytical researcher
researcher_voice = StreamingTTSCallback(voice="nova", model="openai/tts-1")

# Professional consultant
consultant_voice = StreamingTTSCallback(voice="alloy", model="openai/tts-1")
```
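
As the number of agents grows, choosing voices by hand becomes error-prone. A small helper can honor explicit choices, such as an authoritative voice for the leader, and fill in the rest with unused voices. `assign_voices` is a hypothetical sketch. Its default pool is just the six voices from the "Available Voices" table.

```python
from typing import Dict, Iterable, Optional, Sequence

# Default pool: the six voices from the "Available Voices" table.
VOICES: Sequence[str] = ("alloy", "echo", "fable", "onyx", "nova", "shimmer")


def assign_voices(
    agent_names: Iterable[str],
    preferred: Optional[Dict[str, str]] = None,
    voices: Sequence[str] = VOICES,
) -> Dict[str, str]:
    """Give every agent a distinct voice, honoring explicit preferences first."""
    preferred = dict(preferred or {})
    if len(set(preferred.values())) != len(preferred):
        raise ValueError("Two agents prefer the same voice")
    # Voices not claimed by a preference, in pool order
    free = [v for v in voices if v not in preferred.values()]
    assignment = {}
    for name in agent_names:
        if name in preferred:
            assignment[name] = preferred[name]
        elif free:
            assignment[name] = free.pop(0)
        else:
            raise ValueError(f"No distinct voice left for {name!r}")
    return assignment


assignment = assign_voices(
    ["Director", "Research-Analyst", "Data-Analyst", "Strategy-Consultant"],
    preferred={"Director": "onyx"},  # authoritative leader, as above
)
print(assignment)
# {'Director': 'onyx', 'Research-Analyst': 'alloy', 'Data-Analyst': 'echo', 'Strategy-Consultant': 'fable'}
```

You could then create one `StreamingTTSCallback(voice=voice, model="openai/tts-1")` per entry. With only six voices in the pool, a seventh agent raises `ValueError` rather than silently sharing a voice.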

## Best Practices

| Best Practice             | Description                                                                 |
|---------------------------|-----------------------------------------------------------------------------|
| **Voice Selection**       | Use a distinct voice for each agent to avoid confusion                      |
| **Buffer Management**     | Always flush TTS buffers after execution                                    |
| **Error Handling**        | Flush buffers even when an error occurs, so buffered text is not lost       |
| **Streaming Requirement** | Always set `streaming_on=True`; the TTS callback receives no text without it |
| **Direct Assignment**     | Assign TTS callbacks directly to agents for per-agent control               |

### Tips for Audio Playback

- **Audio Overlap**: Agents normally speak one after another. If you hear overlapping audio, check whether agents are being executed concurrently, and if so, change the execution order so they run one at a time.

- **Missing Audio**: Always flush TTS buffers after execution with `callback.flush()`. Also confirm that the agents are generating responses and that each TTS callback is actually receiving streamed tokens.