[Docs][Voice agents single agent and multi-agent debate example with the voice agents package] [Example as well]

kyegomez · kyegomez · commit 7b714c4fdcd5 · 2025-12-27T22:23:54.000-05:00
diff --git a/SECURITY.md b/SECURITY.md
@@ -35,5 +35,4 @@ Once the vulnerability has been thoroughly assessed, we will take the necessary
 
 We aim to respond to all vulnerability reports in a timely manner and work towards resolving them as quickly as possible. We thank you for your contribution to the security of our software.
 
-Please note that any vulnerability reports that are not related to the specified versions or do not provide sufficient information may be declined.
-
+Please note that any vulnerability reports that are not related to the specified versions or do not provide sufficient information may be declined.
diff --git a/docs/mkdocs.yml b/docs/mkdocs.yml
@@ -441,6 +441,11 @@ nav:
       - Job Finding Swarm: "examples/job_finding.md"
       - Mergers & Aquisition (M&A) Advisory Swarm: "examples/ma_swarm.md"
     
+    - Voice Agents:
+      - Overview: "swarms/examples/voice_agents_overview.md"
+      - Single Speech Agent: "swarms/examples/single_agent_speech.md"
+      - Multi-Agent Speech Debate: "swarms/examples/multi_agent_speech_debate.md"
+    
     - Tools & Integrations:
       - Overview: "examples/tools_integrations_overview.md"
       - Web Search with Exa: "examples/exa_search.md"
diff --git a/docs/swarms/examples/multi_agent_speech_debate.md b/docs/swarms/examples/multi_agent_speech_debate.md
@@ -0,0 +1,129 @@
+# Multi-Agent Speech Debate
+
+This tutorial explores a more advanced use case: simulating a turn-based debate between two agents, where each agent speaks its responses aloud. Optionally, Speech-to-Text (STT) can supply the initial debate topic.
+
+## Prerequisites
+
+- Python 3.10+
+- OpenAI API key
+- `swarms` library
+- `voice-agents` library
+- A working microphone (if using STT)
+
+## Tutorial Steps
+
+1. **Install Dependencies**
+   ```bash
+   pip3 install -U swarms voice-agents
+   ```
+
+2. **Define Agent Personalities**
+   Create distinct system prompts for your agents to ensure a dynamic debate. In this example, we use Socrates and Simone de Beauvoir.
+
+3. **Initialize Agents**
+   Set up two agents with `streaming_on=True`.
+
+4. **Create a Debate Loop**
+   Implement a function that alternates turns between agents, uses their respective TTS voices, and passes the response of one agent as the input to the next.
+
+5. **Integrate STT (Optional)**
+   Use `record_audio` and `speech_to_text` to capture your own voice as the starting prompt for the debate.
+
+## Code Example
+
+```python
+from swarms import Agent
+from swarms.structs.conversation import Conversation
+from voice_agents.main import speech_to_text, record_audio, StreamingTTSCallback
+
+def debate_with_speech(
+    agents: list,
+    max_loops: int = 1,
+    task: str | None = None,
+    use_stt_for_input: bool = False,
+):
+    """
+    Simulate a turn-based debate between two agents with speech capabilities.
+    
+    Args:
+        agents (list): A list containing exactly two Agent instances who will debate.
+        max_loops (int): The total number of turns; each turn is one response from one agent.
+        task (str): The initial prompt or question to start the debate.
+        use_stt_for_input (bool): If True, use speech-to-text for the initial task input.
+    
+    Returns:
+        str: The formatted conversation history.
+    """
+    conversation = Conversation()
+    
+    # Create TTS callbacks with different voices to differentiate speakers
+    tts_callback1 = StreamingTTSCallback(voice="onyx", model="tts-1")  # Deeper voice
+    tts_callback2 = StreamingTTSCallback(voice="nova", model="tts-1")   # Softer voice
+    
+    # Get initial task from STT or provided string
+    if use_stt_for_input:
+        print("Please speak your question or topic for the debate...")
+        audio = record_audio(duration=5.0)
+        task = speech_to_text(audio_data=audio, sample_rate=16000)
+        print(f"Transcribed: {task}\n")
+    
+    message = task
+    speaker = agents[0]
+    other = agents[1]
+    current_callback = tts_callback1
+    other_callback = tts_callback2
+    
+    for i in range(max_loops):
+        print(f"--- Turn {i+1}: {speaker.agent_name} speaking ---")
+        
+        # Agent generates response and speaks in real-time
+        response = speaker.run(
+            task=message,
+            streaming_callback=current_callback,
+        )
+        current_callback.flush()
+        
+        conversation.add(speaker.agent_name, response)
+        
+        # Swap roles for the next turn
+        message = response
+        speaker, other = other, speaker
+        current_callback, other_callback = other_callback, current_callback
+    
+    return conversation.return_history_as_string()
+
+# Define System Prompts
+socratic_prompt = "You are Socrates. Challenge every assumption with logic."
+beauvoir_prompt = "You are Simone de Beauvoir. Focus on freedom and existence."
+
+# Instantiate Agents
+agent1 = Agent(
+    agent_name="Socrates",
+    system_prompt=socratic_prompt,
+    model_name="gpt-4o",
+    streaming_on=True,
+)
+agent2 = Agent(
+    agent_name="Simone de Beauvoir",
+    system_prompt=beauvoir_prompt,
+    model_name="gpt-4o",
+    streaming_on=True,
+)
+
+# Run the debate
+history = debate_with_speech(
+    agents=[agent1, agent2],
+    max_loops=3,
+    task="Is freedom an illusion?",
+)
+
+print(history)
+```
+
+## Key Components
+
+- **Differentiated Voices**: Using "onyx" and "nova" helps the listener distinguish which agent is currently speaking.
+- **Turn-based Logic**: Each agent's response becomes the input for the other agent on the next turn, creating a continuous dialogue.
+- **STT Integration**: `speech_to_text` allows for hands-free interaction with the swarm.
+- **Conversation Tracking**: The `Conversation` struct helps maintain a record of the entire exchange.
+
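Note on the turn-taking in `debate_with_speech`: the alternation can be exercised without API keys, audio hardware, or either library. The sketch below is a minimal illustration under stated assumptions: `StubAgent` and `alternate_turns` are hypothetical names introduced here, the stub only mimics the `agent_name` attribute and `run(task=...)` call used in the tutorial, and the input checks are an addition that the tutorial code does not perform.

```python
from typing import Callable, List, Optional, Tuple


class StubAgent:
    """Hypothetical stand-in that exposes only `agent_name` and `run`."""

    def __init__(self, agent_name: str) -> None:
        self.agent_name = agent_name

    def run(
        self,
        task: str,
        streaming_callback: Optional[Callable[[str], None]] = None,
    ) -> str:
        # Echo the input so the hand-off between speakers is easy to inspect.
        response = f"{self.agent_name} replies to: {task}"
        if streaming_callback is not None:
            streaming_callback(response)
        return response


def alternate_turns(
    agents: list, task: str, max_loops: int = 1
) -> List[Tuple[str, str]]:
    """Run `max_loops` single turns, swapping the speaker after each one."""
    if len(agents) != 2:
        raise ValueError("exactly two agents are required")
    if not task:
        raise ValueError("a non-empty task is required")

    transcript: List[Tuple[str, str]] = []
    message = task
    speaker, other = agents
    for _ in range(max_loops):
        response = speaker.run(task=message)
        transcript.append((speaker.agent_name, response))
        # The response becomes the next speaker's input.
        message = response
        speaker, other = other, speaker
    return transcript


turns = alternate_turns(
    [StubAgent("A"), StubAgent("B")], "Is freedom an illusion?", max_loops=3
)
for name, text in turns:
    print(f"{name}: {text}")
```

Because `max_loops` counts individual turns rather than full rounds, `max_loops=3` gives the first agent two turns and the second agent one.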
diff --git a/docs/swarms/examples/single_agent_speech.md b/docs/swarms/examples/single_agent_speech.md
@@ -0,0 +1,73 @@
+# Creating a Single Speech Agent
+
+This tutorial demonstrates how to create a single AI agent with real-time text-to-speech (TTS) capabilities using the Swarms framework and the `voice-agents` package. This setup is ideal for interactive applications where you want the agent to "speak" its responses as they are generated.
+
+## Prerequisites
+
+- Python 3.10+
+- OpenAI API key (for both LLM and TTS)
+- `swarms` library
+- `voice-agents` library
+
+## Tutorial Steps
+
+1. **Install Dependencies**
+   Install the necessary packages:
+   ```bash
+   pip3 install -U swarms voice-agents
+   ```
+
+2. **Set Up Environment**
+   Ensure your OpenAI API key is set in your environment:
+   ```bash
+   export OPENAI_API_KEY="your-api-key-here"
+   ```
+
+3. **Initialize the Agent**
+   Create an agent with `streaming_on=True`. This is crucial for the TTS callback to work in real-time.
+
+4. **Configure the TTS Callback**
+   Use the `StreamingTTSCallback` from the `voice-agents` package. You can choose different voices like "alloy", "echo", "fable", "onyx", "nova", or "shimmer".
+
+5. **Run the Agent**
+   Pass the `streaming_callback` to the `agent.run()` method.
+
+## Code Example
+
+```python
+from swarms import Agent
+from voice_agents import StreamingTTSCallback
+
+# Initialize the agent
+agent = Agent(
+    agent_name="Quantitative-Trading-Agent",
+    agent_description="Advanced quantitative trading and algorithmic analysis agent",
+    model_name="gpt-4o",
+    dynamic_temperature_enabled=True,
+    max_loops=1,
+    streaming_on=True,  # Required for real-time TTS
+)
+
+# Create the streaming TTS callback
+# voice: alloy, echo, fable, onyx, nova, shimmer
+tts_callback = StreamingTTSCallback(voice="alloy", model="tts-1")
+
+# Run the agent with streaming TTS callback
+out = agent.run(
+    task="What are the top five best energy stocks across nuclear, solar, gas, and other energy sources?",
+    streaming_callback=tts_callback,
+)
+
+# Flush any remaining text in the buffer to ensure the last sentence is spoken
+tts_callback.flush()
+
+print(out)
+```
+
+## How it Works
+
+- **Streaming**: When `streaming_on=True`, the agent yields tokens as they are generated.
+- **Callback**: The `StreamingTTSCallback` collects these tokens into sentences and sends them to the OpenAI TTS API.
+- **Real-time Audio**: Audio is played back as soon as the first sentence is processed, significantly reducing latency compared to waiting for the full response.
+- **Flushing**: The `.flush()` method is called at the end to process any remaining text that didn't end with a sentence delimiter.
+
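Note on sentence buffering and flushing: the "How it Works" list says the callback groups streamed tokens into sentences and that `.flush()` handles any trailing text. The sketch below shows one way such buffering could work. It is not the `voice-agents` implementation: `SentenceBuffer`, its regular expression, and the `speak` function are hypothetical, and `speak` merely stands in for a TTS request.

```python
import re
from typing import Callable, List

# Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


class SentenceBuffer:
    """Collect streamed text chunks and emit complete sentences."""

    def __init__(self, speak: Callable[[str], None]) -> None:
        self._speak = speak  # hypothetical stand-in for a TTS call
        self._pending = ""

    def __call__(self, chunk: str) -> None:
        self._pending += chunk
        parts = _SENTENCE_END.split(self._pending)
        # Every part except the last is a finished sentence.
        for sentence in parts[:-1]:
            if sentence.strip():
                self._speak(sentence.strip())
        # Keep the unfinished remainder for the next chunk.
        self._pending = parts[-1]

    def flush(self) -> None:
        """Emit whatever is left, even without a closing delimiter."""
        tail = self._pending.strip()
        if tail:
            self._speak(tail)
        self._pending = ""


spoken: List[str] = []
buffer = SentenceBuffer(spoken.append)
for chunk in ["Solar is ", "growing. Nuclear ", "is steady! Gas", " remains volatile"]:
    buffer(chunk)
before_flush = list(spoken)  # two complete sentences so far
buffer.flush()               # emits the unterminated third sentence
print(spoken)
```

Without the final `flush()`, the last sentence here would never be spoken, which is why both tutorials call it after `agent.run()` returns.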
diff --git a/docs/swarms/examples/voice_agents_overview.md b/docs/swarms/examples/voice_agents_overview.md
@@ -0,0 +1,28 @@
+# Voice Agents Overview
+
+The Swarms framework supports the creation of interactive, voice-enabled agents through integration with the `voice-agents` package. These agents can perceive and respond using human-like speech, enabling hands-free interaction and more natural user experiences.
+
+## Core Features
+
+- **Real-time Streaming TTS**: Leveraging OpenAI's TTS models to speak responses sentence-by-sentence as they are generated, minimizing latency.
+- **Differentiated Voices**: Multiple voice profiles (alloy, onyx, nova, etc.) to give each agent in a swarm a unique personality.
+- **Speech-to-Text (STT)**: Integration for voice-based task inputs, allowing users to talk directly to their agent swarms.
+- **Seamless Integration**: Works with the standard `Agent` class and complex multi-agent architectures like debates and sequential workflows.
+
+## Available Tutorials
+
+In this section, you will find step-by-step guides on implementing voice capabilities:
+
+1. [**Single Speech Agent**](single_agent_speech.md): Learn how to add real-time text-to-speech to a single standalone agent.
+2. [**Multi-Agent Speech Debate**](multi_agent_speech_debate.md): A more complex example showing two agents debating a topic using different voices, with optional speech-to-text input.
+
+## Getting Started
+
+To use these features, you'll need to install the `voice-agents` package alongside `swarms`:
+
+```bash
+pip install -U swarms voice-agents
+```
+
+You'll also need a valid `OPENAI_API_KEY` to access the TTS and STT models.
+
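Note on the API key requirement: a script can fail fast with a clear message when the key is missing, instead of failing partway through a spoken response. The helper below is a hypothetical sketch; `require_api_key` is not part of `swarms` or `voice-agents`, and it only checks that the variable is non-empty, not that the key is valid.

```python
import os
from typing import Mapping


def require_api_key(
    env: Mapping[str, str] = os.environ, name: str = "OPENAI_API_KEY"
) -> str:
    """Return the key from `env`, or raise if it is missing or blank."""
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; export it before running the voice examples"
        )
    return value


# Usage with an explicit mapping, so the example does not depend on the shell.
key = require_api_key({"OPENAI_API_KEY": " sk-example "})
print(len(key))
```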
diff --git a/examples/guides/voice_agents/agent_with_speech.py b/examples/guides/voice_agents/agent_with_speech.py
@@ -0,0 +1,29 @@
+from swarms import Agent
+from voice_agents import StreamingTTSCallback
+
+# Initialize the agent
+agent = Agent(
+    agent_name="Quantitative-Trading-Agent",
+    agent_description="Advanced quantitative trading and algorithmic analysis agent",
+    model_name="gpt-4.1",
+    dynamic_temperature_enabled=True,
+    max_loops=1,
+    dynamic_context_window=True,
+    top_p=None,
+    streaming_on=True,
+    interactive=False,
+)
+
+# Create the streaming TTS callback
+tts_callback = StreamingTTSCallback(voice="alloy", model="tts-1")
+
+# # Run the agent with streaming TTS callback
+# out = agent.run(
+#     task="What are the top five best energy stocks across nuclear, solar, gas, and other energy sources?",
+#     streaming_callback=tts_callback,
+# )
+
+# # Flush any remaining text in the buffer
+# tts_callback.flush()
+
+# print(out)
diff --git a/examples/guides/voice_agents/debate_with_speech.py b/examples/guides/voice_agents/debate_with_speech.py