1. Overview
This feature proposes representing gRPC-based conversational session and tool invocation workflows with the Activity Protocol, leveraging existing extensibility points (type, name, value, valueType, entities) without introducing new schema fields.
The proposal standardizes how the Nuance gRPC lifecycle messages (Start / Execute / Update / Status / Stop) map to Activities, allowing runtimes to process voice, dialog events, user input, and tool actions in a channel-agnostic, vendor-neutral manner.
2. Problem
Today, gRPC-based conversational platforms (e.g., voice/telephony runtimes) express session lifecycle, dialog events, ASR/NLU results, and action responses using protocol-specific payloads that do not map cleanly onto the Activity Protocol.
This results in:
- Custom, vendor-specific adapters in runtimes
- Limited reuse of Activity-based tooling and middleware
- Inconsistent handling of voice-specific concepts (barge-in, masking, partial transcripts, real-time settings)
- Difficulty extending support to new channels or tool providers
There is no clear, documented pattern for representing gRPC-style tool invocation and session control using Activity Protocol constructs.
3. Current Round-Trip Flow Example: Client and Server Interaction (Conversation Conductor [CC] and Microsoft Copilot Studio [MCS])
User places call:
Classic speech recognition flow → call reaches CC → CC transcribes user audio to text and sends it to MCS → MCS processes the dialog based on the MCS agent configuration and CC input → returns a text response to CC → CC performs text-to-speech (TTS) → returns a speech response to the user.
Realtime speech-to-speech flow → call reaches CC → CC fetches tools from the MCS agent (topics, actions, agents) and processes user audio with its own LLM → calls MCS for tool calling → MCS returns the tool call output to CC → CC summarizes and returns speech output to the user.
Initiate new conversation session:
client → StartRequest with {"payload": context data such as voice config, user data, client data, etc.}
server → StartResponse with {"payload": {"realtimeModelSettings": {/* ... */}, "capabilities": [/* ... */], "botInfo": {/* ... */} }} - acknowledges session creation and returns the negotiated MCS bot capabilities and configuration
MCS greeting with menu options:
client → Empty ExecuteRequest {}
server → ExecuteResponse with {"payload": bot-greeting-message-with-menu-options}
User speaks or provides DTMF input:
client → ExecuteRequest with {"payload": DialogAction, the user's transcribed speech/DTMF input, or a tool call request}
server → ExecuteResponse with {"payload": QaAction / EscalationAction / EndAction / ContinueAction / GenAiAction} - the server describes what the client should do next: wait for user input, escalate to a human agent, end the conversation, continue with an empty payload for an async response, or process the MCS bot's tool call response
For partial transcript updates in the Realtime speech flow:
client → UpdateRequest with {"payload": [ /* incremental transcript messages */] }
server → Empty UpdateResponse {}
Session status check:
client → StatusRequest with { sessionId }
server → StatusResponse with { sessionRemainingSec } // remaining session time to live (TTL) in seconds, after which the session is terminated
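The ExecuteResponse action types above drive the client's next step. The following sketch models that dispatch as a minimal client-side function; the action keys come from the Execute Response examples in section 4.4, but the returned step names are illustrative, not part of any contract.

```python
def next_client_step(execute_response_payload: dict) -> str:
    """Decide what the client (CC) should do next, given an ExecuteResponse payload.

    Mirrors the action enumeration in the flow above. Step names are illustrative.
    """
    if "qaAction" in execute_response_payload:
        return "wait-for-user-input"
    if "escalationAction" in execute_response_payload:
        return "escalate-to-human-agent"
    if "endAction" in execute_response_payload:
        return "end-conversation"
    if "continueAction" in execute_response_payload:
        return "poll-for-async-response"
    if "genAiAction" in execute_response_payload:
        return "process-tool-call-output"
    # Payload carried only messages (or nothing actionable).
    return "unknown"
```

A client runtime would typically check these keys in exactly this kind of priority order, since a payload carries at most one action alongside its messages.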
4. Current gRPC contract
4.1 Start Request
{
"sessionId": "xx",
"selector": {
"channel": "IVR",
"language": "en-US",
"library": "default"
},
"payload": {
"modelRef": {
"uri": "urn:nuance-mix:tag:model/LATEST/mix.dialog"
},
"data": {
"sipHeaders": {
},
"userData": {
"userChannelID": "+1xxx",
"systemID": "+1xxx"
},
"additionalContextItems": {
"OrganizationPhoneNumber": "+1xxx",
"msdyn_ConversationId": "xxx",
"msdyn_OrganizationPhone": "+1xxx",
"msdyn_CustomerPhone": "+1xxxx",
"sipHeaders": {
},
"CustomerPhoneNumber": "+1xxxx"
},
"msdyn_ocliveworkitemid": "xxxx",
"voices": {
"en-US": {
"voiceStyle": null,
"pitch": 0,
"voiceName": "en-US-AvaMultilingualNeural",
"speakingSpeed": 0
}
},
"botData": {
"type": "Mcs"
},
"amdContext": {
"enableAMD": false
},
"msdyn_sessionid": "xxx",
"resultContext": {
"ivrResultContextOnHangup": false
}
}
},
"sessionTimeoutSec": 900,
"clientData": {
"msdyn_sessionid": "xxx",
"correlation_id": "xxx",
"cc_session_id": "xxx",
"msdyn_botid": "xxx"
}
}
4.2 Start Response
{
"payload": {
"sessionId": "xxx",
"modelSettings": {
"instruction": {
"content": "a cheerful and enthusiastic virtual assistant"
},
"tools": [
{
"name": "UniversalSearchTool",
"description": "This powerful tool can search across various data sources within an organization, including documents, databases, tables, online public materials, sharepoints, tutorials, faq, customers support, wikis, companies' regulations and organization, etc. It processes multiple results into a summarized text response. It accesses all databases of the company like customers, products, suppliers, etc. It can also access the customer databases and their information, activities, history, ids, subscriptions. For data related queries, in has an agent to design the sql queries, execute them and retrieve the information. The tool operates in the background to retrieve relevant information, rules, instructions from the organization's resources. If customer/user needs some information that you need search, please consider this tool in the plan. In any case you will communicate with this tool in natural language. Use knowledge/search in tool_guide, if you need to get some information.",
"parameters": {
"required": [
"explanation_of_tool_call",
"user_query",
"user_intent",
"search_query",
"search_keywords"
],
"type": "object",
"properties": {
"explanation_of_tool_call": {
"type": "string",
"description": "Provide a 1-3 sentence explanation of why this action is needed, using a passive tone. For example, 'This action needs to be done to ...'. Please consider any provided instructions about using this tool and consider them in your provided sentences."
},
"user_query": {
"type": "string",
"description": "The exact question or query posed by the user regarding this request."
},
"user_intent": {
"type": "string",
"description": "A short phrase summarizing the user's goal or reason for this request.Providing as much detail as possible around the intent for the given request. This should be auto populated and not explicitly asked to the user."
},
"search_query": {
"type": "string",
"description": "A query related to any topic within the organization. The input should be in natural language and can vary in tone and wording according to customer's preference. The search_query is close to the wording of the customer.E.g, if the customer asks a question, search_query should be also a question."
},
"search_keywords": {
"type": "string",
"description": "A list of keywords that can be used to search for the search engine. The search_keywords are used to search for the search_query in the organization's resources. They are separated by commas."
}
}
},
"strict": true,
"category": "PREBUILT"
},
{
"name": "Escalate",
"parameters": {
"required": [
"explanation_of_tool_call",
"user_query",
"user_intent"
],
"type": "object",
"properties": {
"explanation_of_tool_call": {
"type": "string",
"description": "Provide a 1-3 sentence explanation of why this action is needed, using a passive tone. For example, 'This action needs to be done to ...'. Please consider any provided instructions about using this tool and consider them in your provided sentences."
},
"user_query": {
"type": "string",
"description": "The exact question or query posed by the user regarding this request."
},
"user_intent": {
"type": "string",
"description": "A short phrase summarizing the user's goal or reason for this request.Providing as much detail as possible around the intent for the given request. This should be auto populated and not explicitly asked to the user."
}
}
},
"strict": true,
"category": "TOPIC"
}
],
"toolChoice": "auto",
"modelParameters": {
"model": "gpt-4o-realtime-preview-2024-12-17",
"modalities": [
"text",
"audio"
],
"voice": "alloy",
"temperature": "0.8",
"speed": "1",
"inputAudioFormat": "pcm16",
"outputAudioFormat": "pcm16",
"inputAudioNoiseReduction": "near_field",
"inputAudioTranscription": {
"model": "whisper-1"
},
"maxResponseOutputTokens": 2000,
"turnDetectionSettings": {
"threshold": "0.5",
"prefixPaddingMs": "300",
"silenceDurationMs": "500",
"eagerness": "medium"
}
},
"supportedLanguage": [
"en-US"
],
"latencyMessage": {
"uri": "",
"text": "Hmm...Hold on a moment. I need to give it some thought.",
"delay": 500,
"minimum": 5000
},
"primaryLanguage": "en-US",
"knowledgeSettings": {
"useGeneralKnowledge": true
},
"dtmfSettings": {
"allowDtmfInput": true,
"interdigitTimeout": 3000
},
"silenceDetectionSettings": {
},
"systemInstruction": {
"content": "follow this decision flow exactly"
}
},
"botInfo": {
"botSchemaName": "xxx",
"lastPublishedVersion": "2026-03-30T12:30:52.8521408Z",
"lastPublishedTimestamp": "2026-03-30T12:30:52.852140800Z",
"botAppId": "xxxx",
"cdsBotId": "xxxx",
"conversationStartMessages": [
{
"nlg": [
{
"text": "\u003cspeak version=\"1.0\" xml:lang=\"en-US\" xmlns:mstts=\"http://www.w3.org/2001/mstts\" xmlns=\"http://www.w3.org/2001/10/synthesis\"\u003e\u003cvoice name=\"de-DE-SeraphinaMultilingualNeural\" xmlns=\"\"\u003e\u003cprosody rate=\"0%\" pitch=\"0%\"\u003eThanks for waiting. Your call transfer was successful. If you can hear this message, you are now connected to John from the Sales team. I can assist you with sales-related inquiries.\u003c/prosody\u003e\u003c/voice\u003e\u003c/speak\u003e",
"bargeInDisabled": true
}
],
"visual": [
{
"text": "Hello, I'm E2E CCv2 Call Transfer Test Bot. How can I help?",
"bargeInDisabled": true
}
],
"language": "en-US",
"ttsParameters": {
"voice": {
"name": "de-DE-SeraphinaMultilingualNeural",
"language": "en-US"
}
},
"channel": "conversationconductor",
"messageId": "greeting-0"
}
]
}
}
}
4.3 Execute Request
For handling DialogEvents:
A. End user has hung up the call-
{
"sessionId": "xxx",
"selector": {
"library": "default"
},
"payload": {
"dialogEvent": {
"type": "HANGUP"
},
"requestedData": {
"data": {
"hangupReason": "CUSTOMER_HANGUP"
}
}
}
}
B. End user provided no matched input-
{
"payload": {
"dialogEvent": {
"type": "NO_MATCH",
"message": "001 no-match"
}
}
}
C. End user has not provided any input-
{
"payload": {
"dialogEvent": {
"type": "NO_INPUT",
"message": "002 no-input-timeout"
}
}
}
D. Answer Machine Detected-
{
"payload": {
"dialogEvent": {
"type": "AMD"
},
"requestedData": {
"id": "bargeInData",
"data": {
"bargeInData": {
"messageId": "xxx",
"messageDuration": 1176
}
}
}
}
}
E. Error-
{
"payload": {
"dialogEvent": {
"type": "ERROR",
"message": "006 recognizer-error"
}
}
}
For handling UserInput:
A. DTMF as command (Ex: Press * to interrupt)-
{
"sessionId": "xxx",
"selector": {
"library": "default"
},
"payload": {
"userInput": {
"interpretation": {
"confidence": 1,
"inputMode": "DTMF",
"utterance": "*",
"data": {
"TriggerDtmfKeyPressInfo": "StarKey",
"INTENT": "NO_INTENT",
"SWI_literal": "*",
"SWI_grammarName": "xxx",
"SWI_meaning": "{INTENT:NO_INTENT TriggerDtmfKeyPressInfo:StarKey}"
},
"slotConfidences": {
"TriggerDtmfKeyPressInfo": 1,
"INTENT": 1
}
}
}
}
}
B. DTMF as answer (Ex: Press 1 for Sales)-
{
"sessionId": "xxx",
"selector": {
"library": "default"
},
"payload": {
"userInput": {
"interpretation": {
"confidence": 1,
"inputMode": "DTMF",
"utterance": "1",
"data": {
"SetEntityByDtmfKeyInfo": "Num1",
"INTENT": "NO_INTENT",
"SWI_literal": "1",
"SWI_grammarName": "xxx",
"SWI_meaning": "{INTENT:NO_INTENT SetEntityByDtmfKeyInfo:Num1}"
},
"slotConfidences": {
"SetEntityByDtmfKeyInfo": 1,
"INTENT": 1
}
}
}
}
}
C. Multi-digit freeform DTMF (Ex: user entered account digits 12345)-
{
"sessionId": "xxx",
"selector": {
"library": "default"
},
"payload": {
"userInput": {
"interpretation": {
"confidence": 1,
"inputMode": "DTMF",
"utterance": "1 2 3 4 5",
"data": {
"SWI_meaning": "12345",
"MEANING": "12345",
"SWI_literal": "1 2 3 4 5",
"SWI_grammarName": "builtin:dtmf/digits?length=5;"
},
"slotConfidences": {
"MEANING": 1
}
}
}
}
}
D. External grammar result-
{
"sessionId": "xxx",
"selector": {
"library": "default"
},
"payload": {
"userInput": {
"interpretation": {
"confidence": 0.94,
"inputMode": "VOICE",
"utterance": "yes",
"data": {
"SWI_meaning": "true",
"SWI_literal": "yes",
"SWI_grammarName": "xxx"
}
}
}
}
}
E. ASR input-
{
"sessionId": "xxx",
"selector": {
"library": "default"
},
"payload": {
"userInput": {
"asraasResult": {
"absStartMs": 4540,
"absEndMs": 5700,
"utteranceInfo": {
"durationMs": 1160,
"dsp": {
"numChannels": 1,
"initialSilenceMs": 4540
}
},
"hypotheses": [
{
"confidence": 0.84698063,
"averageConfidence": 0.8469807,
"formattedText": "I want to fly to",
"minimallyFormattedText": "i want to fly to",
"words": [
{
"text": "I",
"confidence": 0.8657025,
"endMs": 40
},
{
"text": "want",
"confidence": 0.863662,
"startMs": 40,
"endMs": 280
},
{
"text": "to",
"confidence": 0.84781086,
"startMs": 280,
"endMs": 360
},
{
"text": "fly",
"confidence": 0.94070596,
"startMs": 360,
"endMs": 680
},
{
"text": "to",
"confidence": 0.7170221,
"startMs": 680,
"endMs": 1160
}
],
"tokenization": [
{
"literal": "i",
"confidence": 0.8657025,
"etime": 40,
"emff": 1
},
{
"literal": "want",
"confidence": 0.863662,
"stime": 40,
"etime": 280,
"smff": 2,
"emff": 6
},
{
"literal": "to",
"confidence": 0.84781086,
"stime": 280,
"etime": 360,
"smff": 7,
"emff": 9
},
{
"literal": "fly",
"confidence": 0.94070596,
"stime": 360,
"etime": 680,
"smff": 10,
"emff": 13
},
{
"literal": "to",
"confidence": 0.7170221,
"stime": 680,
"etime": 1160,
"smff": 14,
"emff": 16
}
]
},
{
"confidence": 0.7131306,
"averageConfidence": 0.71313053,
"formattedText": "i want to fly to lo",
"minimallyFormattedText": "i want to fly to lo",
"words": [
{
"text": "i",
"confidence": 0.8657025,
"endMs": 40
},
{
"text": "want",
"confidence": 0.863662,
"startMs": 40,
"endMs": 280
},
{
"text": "to",
"confidence": 0.84781086,
"startMs": 280,
"endMs": 360
},
{
"text": "fly",
"confidence": 0.94070596,
"startMs": 360,
"endMs": 680
},
{
"text": "to",
"confidence": 0.7170221,
"startMs": 680,
"endMs": 1080
},
{
"text": "lo",
"confidence": 0.043879688,
"startMs": 1080,
"endMs": 1160
}
],
"tokenization": [
{
"literal": "i",
"confidence": 0.8657025,
"etime": 40,
"emff": 1
},
{
"literal": "want",
"confidence": 0.863662,
"stime": 40,
"etime": 280,
"smff": 2,
"emff": 6
},
{
"literal": "to",
"confidence": 0.84781086,
"stime": 280,
"etime": 360,
"smff": 7,
"emff": 9
},
{
"literal": "fly",
"confidence": 0.94070596,
"stime": 360,
"etime": 680,
"smff": 10,
"emff": 13
},
{
"literal": "to",
"confidence": 0.7170221,
"stime": 680,
"etime": 1080,
"smff": 14,
"emff": 16
},
{
"literal": "lo",
"confidence": 0.043879688,
"stime": 1080,
"etime": 1160,
"smff": 17,
"emff": 19
}
]
}
],
"dataPack": {
"language": "en-US",
"topic": "GEN",
"version": "0.0.7.0",
"id": "Pasco+Dpp; en-US; en-US"
}
}
}
}
}
F. LLM request-
{
"sessionId": "xxx",
"selector": {
"library": "default"
},
"payload": {
"userInput": {
"genAiResult": {
"messages": [
{
"role": "USER",
"content": "Of course, my account is xx",
"timestamp": "2026-04-06T10:33:56.128708600Z"
}
],
"toolCalls": [
{
"name": "Get-account-ID-by-phone-number",
"callId": "call_xx",
"arguments": "{ \n \"explanation_of_tool_call\": \"This action needs to be done to retrieve the customer’s account ID using the provided phone number, so we can link the order to their account.\", \n \"user_query\": \"My account phone number is xx.\", \n \"user_intent\": \"retrieve account ID using the customer’s phone number\", \n \"phone_number\": \"xxx\" \n}"
}
]
}
}
}
}
4.4 Execute Response
A. Question and Answer action to be performed by the client application-
{
"payload": {
"messages": [
{
"nlg": [
{
"text": "\u003cspeak version=\"1.0\" xml:lang=\"en-US\" xmlns:mstts=\"http://www.w3.org/2001/mstts\" xmlns=\"http://www.w3.org/2001/10/synthesis\"\u003e\u003cvoice name=\"en-US-ChristopherNeural\" xmlns=\"\"\u003e\u003cprosody rate=\"0%\" pitch=\"0%\"\u003eHello. Welcome to Contoso Bank.\u003c/prosody\u003e\u003c/voice\u003e\u003c/speak\u003e"
}
],
"language": "en-US",
"messageId": "xx"
}
],
"qaAction": {
"message": {
"nlg": [
{
"text": "\u003cspeak version=\"1.0\" xml:lang=\"en-US\" xmlns:mstts=\"http://www.w3.org/2001/mstts\" xmlns=\"http://www.w3.org/2001/10/synthesis\"\u003e\u003cvoice name=\"en-US-ChristopherNeural\" xmlns=\"\"\u003e\u003cprosody rate=\"0%\" pitch=\"0%\"\u003ePlease let me know your concern.\u003c/prosody\u003e\u003c/voice\u003e\u003c/speak\u003e"
}
],
"language": "en-US",
"messageId": "xxx"
},
"data": {
"asrRoutingHint": {
"isLegacy": true,
"urnType": ""
}
},
"recognitionSettings": {
"collectionSettings": {
"timeout": "7000ms",
"completeTimeout": "0ms",
"incompleteTimeout": "300ms",
"maxSpeechTimeout": "12000ms"
},
"speechSettings": {
"sensitivity": "0.5",
"bargeInType": "speech",
"speedVsAccuracy": "0.5"
},
"dtmfSettings": {
"interDigitTimeout": "3000ms",
"termTimeout": "2000ms",
"termChar": "#"
}
},
"orchestrationResourceReference": {
"grammarReferences": [
{
"uri": "builtin:dtmf/digits?length=4;",
"type": "DTMF_GRAMMAR"
}
]
},
"recognitionInitResources": {
"recognitionInitMessage": {
"parameters": {
"language": "en-US",
"topic": "GEN",
"audioFormat": {
"pcm": {
"sampleRateHz": 16000
}
},
"recognitionFlags": {
"includeTokenization": true,
"stallTimers": true
},
"noInputTimeoutMs": 7000,
"recognitionTimeoutMs": 12000,
"utteranceEndSilenceMs": 300,
"speechDetectionSensitivity": 0.5,
"maxHypotheses": 2
},
"resources": [
{
"wakeupWord": {
"words": [
"got it",
"continue",
"please continue",
"i got it",
"Entendido"
]
}
}
]
},
"dtmfRecognitionInit": {
"parameters": {
"recognitionFlags": {
"stallTimers": true
},
"noInputTimeoutMs": 7000,
"dtmfInterdigitTimeoutMs": 3000,
"dtmfTermTimeoutMs": 2000,
"dtmfTermChar": "#",
"resultFormat": {
"additionalParameters": "mrcpv=2.06;strictconfidencelevel=1"
}
},
"resources": [
{
"uriGrammar": {
"uri": "builtin:dtmf/digits?length=4;"
}
}
]
}
},
"language": "en-US",
"isHotwordMode": true,
"loopMessage": {
"audio": [
{
"uri": "xxx"
}
]
}
},
"channel": "conversationconductor"
}
}
B. Escalation action
{
"payload": {
"messages": [
{
"nlg": [
{
"text": "\u003cspeak version=\"1.0\" xml:lang=\"en-US\" xmlns:mstts=\"http://www.w3.org/2001/mstts\" xmlns=\"http://www.w3.org/2001/10/synthesis\"\u003e\u003cvoice name=\"en-US-ChristopherNeural\" xmlns=\"\"\u003e\u003cprosody rate=\"0%\" pitch=\"0%\"\u003eTesting blind transfer.\u003c/prosody\u003e\u003c/voice\u003e\u003c/speak\u003e"
}
],
"language": "en-US",
"messageId": "xxx"
},
{
"nlg": [
{
"text": "\u003cspeak version=\"1.0\" xml:lang=\"en-US\" xmlns:mstts=\"http://www.w3.org/2001/mstts\" xmlns=\"http://www.w3.org/2001/10/synthesis\"\u003e\u003cvoice name=\"en-US-ChristopherNeural\" xmlns=\"\"\u003e\u003cprosody rate=\"0%\" pitch=\"0%\"\u003ePlease hold while we transfer your call. This may take a moment.\u003c/prosody\u003e\u003c/voice\u003e\u003c/speak\u003e"
}
],
"language": "en-US",
"messageId": "xxx"
}
],
"escalationAction": {
"data": {
"sipHeaders": {
"userToUser": "customerid=xxx;accountnumber=zzz"
},
"va_AgentMessage": "Transferring to an agent.",
"va_BotId": "xxx",
"va_BotName": "xxx",
"va_ConversationId": "xxx",
"va_Language": "en-US",
"va_Scope": "bot"
},
"id": "External Actions",
"escalationSettings": {
"type": "blind",
"destination": "+1xxx"
}
},
"channel": "conversationconductor"
}
}
C. Call end action-
{
"payload": {
"messages": [
{
"nlg": [
{
"text": "\u003cspeak version=\"1.0\" xml:lang=\"en-US\" xmlns:mstts=\"http://www.w3.org/2001/mstts\" xmlns=\"http://www.w3.org/2001/10/synthesis\"\u003e\u003cvoice name=\"en-US-ChristopherNeural\" xmlns=\"\"\u003e\u003cprosody rate=\"0%\" pitch=\"0%\"\u003eNo intent. Your conversation will end.\u003c/prosody\u003e\u003c/voice\u003e\u003c/speak\u003e"
}
],
"language": "en-US"
}
],
"endAction": {
},
"channel": "conversationconductor"
}
}
D. Continue action during async request-
{
"payload": {
"messages": [
{
"nlg": [
{
"text": "\u003cspeak version=\"1.0\" xml:lang=\"en-US\" xmlns:mstts=\"http://www.w3.org/2001/mstts\" xmlns=\"http://www.w3.org/2001/10/synthesis\"\u003e\u003cvoice name=\"en-US-ChristopherNeural\" xmlns=\"\"\u003e\u003cprosody rate=\"0%\" pitch=\"0%\"\u003eI'll place the ticket and will order a cup of Americano\r\nfor you.\u003c/prosody\u003e\u003c/voice\u003e\u003c/speak\u003e"
}
],
"language": "en-US"
}
],
"continueAction": {
"message": {
"nlg": [
{
"bargeInDisabled": true
}
],
"language": "en-US"
},
"messageSettings": {
"delay": "500ms",
"minimum": "5000ms"
}
},
"channel": "conversationconductor"
}
}
E. Error during call-
{
"payload": {
"qaAction": {
"message": {
"nlg": [
{
"text": "\u003cspeak version=\"1.0\" xml:lang=\"en-US\" xmlns:mstts=\"http://www.w3.org/2001/mstts\" xmlns=\"http://www.w3.org/2001/10/synthesis\"\u003e\u003cvoice name=\"en-US-ChristopherNeural\" xmlns=\"\"\u003e\u003cprosody rate=\"0%\" pitch=\"0%\"\u003eSorry, the bot can't talk for a while. It's something the bot's owner needs to address. Error code: TelephonyBotNotResponseWithMessageBack. Trace ID: 3 0 3 5 8 7.\u003c/prosody\u003e\u003c/voice\u003e\u003c/speak\u003e"
}
],
"language": "en-US"
},
"data": {
"asrRoutingHint": {
"isLegacy": true,
"urnType": "ams"
}
},
"recognitionSettings": {
"collectionSettings": {
"timeout": "0ms",
"completeTimeout": "0ms",
"incompleteTimeout": "1500ms",
"maxSpeechTimeout": "12000ms"
},
"speechSettings": {
"sensitivity": "0.5",
"bargeInType": "speech",
"speedVsAccuracy": "0.5"
},
"dtmfSettings": {
"interDigitTimeout": "3000ms",
"termTimeout": "2000ms"
}
},
"recognitionInitResources": {
"recognitionInitMessage": {
"parameters": {
"language": "en-US",
"topic": "GEN",
"audioFormat": {
"pcm": {
"sampleRateHz": 16000
}
},
"recognitionFlags": {
"includeTokenization": true,
"stallTimers": true
},
"recognitionTimeoutMs": 12000,
"utteranceEndSilenceMs": 1500,
"speechDetectionSensitivity": 0.5,
"maxHypotheses": 2
}
}
},
"language": "en-US"
},
"channel": "conversationconductor"
}
}
{
"payload": {
"messages": [
{
"nlg": [
{
"text": "\u003cspeak version=\"1.0\" xml:lang=\"en-US\" xmlns:mstts=\"http://www.w3.org/2001/mstts\" xmlns=\"http://www.w3.org/2001/10/synthesis\"\u003e\u003cvoice name=\"en-US-AvaMultilingualNeural\" xmlns=\"\"\u003e\u003cprosody rate=\"0%\" pitch=\"0%\"\u003eSorry, the bot can't talk for a while. It's something the bot's owner needs to address. Error code: TelephonyBadRequest. Trace ID: 6 4 1 1 1 4.\u003c/prosody\u003e\u003c/voice\u003e\u003c/speak\u003e"
}
],
"language": "en-US"
}
],
"channel": "conversationconductor",
"genAiAction": {
"toolCallOutputs": [
{
"output": "Failure",
"disableModelResponse": true
}
]
}
}
}
F. Gen Ai action-
{
"payload": {
"messages": [
{
"nlg": [
{
"text": "\u003cspeak version=\"1.0\" xml:lang=\"en-US\" xmlns:mstts=\"http://www.w3.org/2001/mstts\" xmlns=\"http://www.w3.org/2001/10/synthesis\"\u003e\u003cvoice name=\"de-DE-SeraphinaMultilingualNeural\" xmlns=\"\"\u003e\u003cprosody rate=\"0%\" pitch=\"0%\"\u003eHello and thank you for calling E2E_S2S_agent. Please note that some responses are generated by AI and may require verification for accuracy. How may I help you today?\u003c/prosody\u003e\u003c/voice\u003e\u003c/speak\u003e"
}
],
"language": "en-US"
}
],
"channel": "conversationconductor",
"genAiAction": {
"tools": [
{
"name": "UniversalSearchTool",
"description": "This is tool",
"parameters": {
"type": "object",
"properties": {
"explanation_of_tool_call": {
"type": "string",
"description": "Provide a 1-3 sentence explanation of why this action is needed, using a passive tone. For example, 'This action needs to be done to ...'. Please consider any provided instructions about using this tool and consider them in your provided sentences."
},
"user_query": {
"type": "string",
"description": "The exact question or query posed by the user regarding this request."
},
"user_intent": {
"type": "string",
"description": "A short phrase summarizing the user's goal or reason for this request.Providing as much detail as possible around the intent for the given request. This should be auto populated and not explicitly asked to the user."
},
"search_query": {
"type": "string",
"description": "A query related to any topic within the organization. The input should be in natural language and can vary in tone and wording according to customer's preference. The search_query is close to the wording of the customer.E.g, if the customer asks a question, search_query should be also a question."
},
"search_keywords": {
"type": "string",
"description": "A list of keywords that can be used to search for the search engine. The search_keywords are used to search for the search_query in the organization's resources. They are separated by commas."
}
},
"required": [
"explanation_of_tool_call",
"user_query",
"user_intent",
"search_query",
"search_keywords"
]
},
"strict": true,
"category": "PREBUILT"
},
{
"name": "CustomEndConversation",
"description": "If users says, \"I am done\", \"end the call\", \"that's it\" or any related to this directly ends the conversation",
"parameters": {
"type": "object",
"properties": {
"explanation_of_tool_call": {
"type": "string",
"description": "Provide a 1-3 sentence explanation of why this action is needed, using a passive tone. For example, 'This action needs to be done to ...'. Please consider any provided instructions about using this tool and consider them in your provided sentences."
},
"user_query": {
"type": "string",
"description": "The exact question or query posed by the user regarding this request."
},
"user_intent": {
"type": "string",
"description": "A short phrase summarizing the user's goal or reason for this request.Providing as much detail as possible around the intent for the given request. This should be auto populated and not explicitly asked to the user."
}
},
"required": [
"explanation_of_tool_call",
"user_query",
"user_intent"
]
},
"strict": true,
"category": "TOPIC"
}
],
"toolChoice": "auto",
"toolCallOutputs": [
{
"callId": "call_xxx",
"output": [
{
"name": "message",
"value": "Your order with ID 789 has been created.",
"description": ""
}
]
}
],
"toolInterruption": {
"reasoning": "",
"intentChangeType": "UNKNOWN / REPLACE / CREATE"
},
"updateInstruction": true,
"messages": [
{
"content": "You are the specialist... "
},
{
"content": "You will handle...",
"highPriority": true
}
]
}
}
}
4.5 Update Request
{
"sessionId": "xxx",
"payload": {
"messages": [
{
"role": "USER",
"content": "Can you help me with the",
"timestamp": "2026-04-08T11:36:46.145676Z"
},
{
"role": "ASSISTANT",
"content": "I didn’t catch that. Could you please repeat?\n\u003c!-- LLMReasoningTrace: The user's message appears incomplete or unclear, likely due to a transcription error. Asking them to repeat or clarify their request. --\u003e",
"timestamp": "2026-04-08T11:36:47.089118100Z"
},
{
"role": "USER",
"content": "Order status",
"timestamp": "2026-04-08T11:36:47.573672400Z"
},
{
"role": "ASSISTANT",
"content": "I don’t have enough information to answer that based on the current context. Would you like help with anything else?\n\u003c!-- LLMReasoningTrace: Order status is not supported by the available tools or context, so I am informing the customer and prompting for any additional help needed. --\u003e",
"timestamp": "2026-04-08T11:36:48.636659800Z"
},
{
"role": "USER",
"content": "Based on the current what can you help me with",
"timestamp": "2026-04-08T11:36:55.243131Z"
},
{
"role": "ASSISTANT",
"content": "I can help you find store locations, store hours, make purchases, and provide information about careers and job openings at Contoso Mobile. What would you like assistance with?\n\u003c!-- LLMReasoningTrace: Providing the areas of support available based on the tools provided to guide the customer to a relevant topic. --\u003e",
"timestamp": "2026-04-08T11:36:56.556802300Z"
}
]
}
}
4.6 Update Response
{}
4.7 Status Request
{ "sessionId": "xxx" }
4.8 Status Response
{ "sessionRemainingSec": 259138 }
5. Proposal
Introduce a canonical mapping of gRPC session lifecycle and tool invocation messages to Activity Protocol constructs by:
- Representing gRPC operations using existing Activity types:
- command / commandResult for request–response flows
- event or message, where semantically appropriate
- Encoding dialog semantics via:
- Activity.name (e.g., session.init, ERROR, NO_MATCH)
- Activity.valueType for strongly-typed payloads (e.g., recognizer results, generative AI output)
- Activity.value for payload data
- Using entities (e.g., ProductInfo) to convey channel and sub-channel context, instead of overloading channelId
- Supporting voice and multimodal requirements such as:
- ASR / NLU / GenAI results
- Barge-in control
- Partial and final transcripts
- Sensitive-data masking
- Real-time model and capability negotiation
All mappings rely solely on existing Activity Protocol extensibility guidance, avoiding schema changes.
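The mapping pattern above can be sketched as a single translation function over an ExecuteRequest. This is a non-normative sketch: the dialogEvent and userInput keys follow the contract examples in section 4.3, while the valueType label is an assumption for illustration only.

```python
def execute_request_to_activity(request: dict) -> dict:
    """Sketch of the proposed ExecuteRequest -> Activity mapping.

    Only the general pattern (type / name / valueType / value) is fixed by
    the proposal; the 'userInput' valueType used here is hypothetical.
    """
    payload = request.get("payload", {})
    if "dialogEvent" in payload:
        # Dialog events (HANGUP, NO_MATCH, NO_INPUT, AMD, ERROR) become
        # event activities, named after the event type.
        return {
            "type": "event",
            "name": payload["dialogEvent"]["type"],
            "value": payload,
        }
    if "userInput" in payload:
        # User input (DTMF, external grammar, ASR, GenAI results) becomes a
        # message activity with a strongly typed value.
        return {
            "type": "message",
            "valueType": "userInput",  # illustrative valueType label
            "value": payload["userInput"],
        }
    # Anything else rides as a generic command in a request-response pair.
    return {"type": "command", "name": "execute", "value": payload}
```

The inverse direction (ExecuteResponse actions onto commandResult activities) would follow the same shape, with the action name carried in valueType.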
6. Activities
6.1 Start Request
{
"type": "command",
"id": "x-request-id",
"name": "session.init",
"conversation": {
"id": "abc123"
},
"recipient": {
"id": "xxx-11xx",
"role": "bot",
"name": "+2345xxx"
},
"from": {
"id": "xyz123",
"name": "+98665xxx",
"role": "user"
},
"locale": "en-US",
"channelId": "conversationconductor",
"entities": [
{
"type": "ProductInfo",
"id": "msteamsphone / omnichannel"
}
],
"channelData": {
"vnd.microsoft.msdyn.oc.data": {
"voices": {
"en-US": {
"voiceStyle": null,
"pitch": 0,
"voiceName": "en-US-AvaMultilingualNeural",
"speakingSpeed": 0
},
"OrganizationPhoneNumber": "+xxx",
"msdyn_ConversationId": "xxx",
"msdyn_OrganizationPhone": "+xxx",
"msdyn_CustomerPhone": "+xxx",
"CustomerPhoneNumber": "+xx"
}
},
"sipHeaders": {
"userToUser": "xyz",
"sipxHeader": { }
}
},
"value": {
"sessionTimeoutSec": "2000",
"requestAgentCapabilities": true,
"isCallRecordingConsentRequired": true,
"shareOutboundContextOnConversationEnd": true,
"botType": "mcs / tps",
"callingContext": {
"channelId": "msteams",
"userId": "x909",
"conversationType": "personal"
}
}
}
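The projection from the gRPC StartRequest (section 4.1) onto this session.init command activity can be sketched as follows. Field placement here is illustrative where the document leaves it unspecified (e.g., which payload data lands in channelData versus value).

```python
def start_request_to_activity(start_request: dict) -> dict:
    """Sketch: project a gRPC StartRequest onto a 'session.init' command activity.

    sessionId -> conversation.id, selector.language -> locale; the rest of the
    mapping is an assumption consistent with the example activity above.
    """
    selector = start_request.get("selector", {})
    payload = start_request.get("payload", {})
    return {
        "type": "command",
        "name": "session.init",
        "conversation": {"id": start_request.get("sessionId")},
        "locale": selector.get("language"),
        "channelId": "conversationconductor",
        # Vendor- and channel-specific context travels in channelData.
        "channelData": payload.get("data", {}),
        # Session negotiation parameters travel in value.
        "value": {
            "sessionTimeoutSec": start_request.get("sessionTimeoutSec"),
        },
    }
```

Note that channelId stays the transport channel ("conversationconductor"); the sub-channel (e.g., msteamsphone, omnichannel) is conveyed through the ProductInfo entity, as shown in the activity above.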
6.2 Start Response
{
"type": "commandResult",
"value": {
"realtimeGptModelSettings": {
"userInstruction": {
"role": "user",
"content": "a cheerful and enthusiastic virtual assistant "
},
"tools": [
{
"name": "UniversalSearchTool",
"description": "This powerful tool can search ...",
"parameters": {
"type": "object",
"properties": {
"explanation_of_tool_call": {
"type": "string",
"description": "Provide a 1-3 sentence explanation of why this action is needed, using a passive tone. For example, 'This action needs to be done to ...'. Please consider any provided instructions about using this tool and consider them in your provided sentences."
},
"user_query": {
"type": "string",
"description": "The exact question or query posed by the user regarding this request."
}
},
"required": [
"explanation_of_tool_call",
"user_query"
]
},
"strict": true,
"category": "PREBUILT"
},
{
"name": "CustomEndConversation",
"description": "If users says, \"I am done\", \"end the call\", \"that's it\" or any related to this directly ends the conversation",
"parameters": {
"type": "object",
"properties": {
"explanation_of_tool_call": {
"type": "string",
"description": "Provide a 1-3 sentence explanation of why this action is needed, using a passive tone. For example, 'This action needs to be done to ...'. Please consider any provided instructions about using this tool and consider them in your provided sentences."
},
"user_query": {
"type": "string",
"description": "The exact question or query posed by the user regarding this request."
},
"user_intent": {
"type": "string",
"description": "A short phrase summarizing the user's goal or reason for this request.Providing as much detail as possible around the intent for the given request. This should be auto populated and not explicitly asked to the user."
}
},
"required": [
"explanation_of_tool_call",
"user_query",
"user_intent"
]
},
"strict": true,
"category": "TOPIC"
}
],
"toolChoice": "auto",
"modelParameters": {
"model": "gpt-x",
"modalities": [
"text",
"audio"
],
"voiceFont": "shimmer",
"temperature": "0.8",
"speakingSpeed": "1",
"inputAudioFormat": "pcm16",
"outputAudioFormat": "pcm16",
"inputAudioNoiseReduction": "near_field",
"inputAudioTranscription": {
"model": "whisper-1"
},
"maxResponseOutputTokens": 1000,
"turnDetectionSettings": {
"threshold": "0.5",
"prefixPaddingMs": "300",
"silenceDurationMs": "500",
"eagerness": "medium"
}
},
"supportedLanguages": [
"en-US"
],
"latencyMessage": {
"uri": "",
"text": "Hmm...Hold on a moment. I need to give it some thought.",
"delay": 500,
"minimum": 5000
},
"primaryLanguage": "en-US",
"knowledgeSettings": {
},
"dtmfSettings": {
"allowDtmfInput": true,
"interdigitTimeout": 3000
},
"silenceDetectionSettings": {
"silenceDetectionTimeout": 7000
},
"systemInstruction": {
"role": "system",
"content": "follow this decision flow exactly"
},
"instructionExpressions": [
{
"expression": "System.Bot.Components.Actions.'crf37_HybridTopicTestAgent.action.Order_status_by_Id-RetrievesorderfromanOrderID'.DisplayName",
"value": "Retrieves-order-from-an-Order-ID"
}
]
},
"agentInfo": {
"botSchemaName": "",
"capabilities": [
"AMD"
],
"lastPublishedVersion": "2026-03-05T12:44:57.5164062Z",
"lastPublishedTimestamp": 123456,
"voiceProcessingMode": "CLASSIC_VOICE_PROCESSING_MODE / REALTIME_VOICE_PROCESSING_MODE",
"cdsBotId": "xxx",
"speech": {
"bargeInEnabled": true
},
"messages": [
{
"nlg": [
{
"text": "\u003cspeak version=\"1.0\" xml:lang=\"en-US\" xmlns:mstts=\"http://www.w3.org/2001/mstts\" xmlns=\"http://www.w3.org/2001/10/synthesis\"\u003e\u003cvoice name=\"en-US-ChristopherNeural\" xmlns=\"\"\u003e\u003cprosody rate=\"0%\" pitch=\"0%\"\u003eHello and thank you for calling Agent with CV, powered by generative AI. How may I help you today?\u003c/prosody\u003e\u003c/voice\u003e\u003c/speak\u003e",
"containsSensitiveData": false
}
],
"visual": [
{
"text": "Hello, I'm Agent with CV. How can I help?",
"containsSensitiveData": false
}
],
"messageId": "xx-0"
}
]
}
}
}
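A runtime receiving the Start Response above would typically pull the tool definitions out of the commandResult before registering them with its realtime model. The sketch below illustrates that extraction; the `StartResponseActivity` and `ToolDefinition` interfaces and the `extractTools` helper are illustrative assumptions modeled on the sample payload, not a published SDK contract.

```typescript
// Illustrative shapes mirroring the Start Response sample above.
interface ToolDefinition {
  name: string;
  description: string;
  category: string; // e.g. "PREBUILT" or "TOPIC" in the sample
}

interface StartResponseActivity {
  type: string;
  value?: {
    realtimeGptModelSettings?: {
      tools?: ToolDefinition[];
      toolChoice?: string;
    };
  };
}

// Returns the tool list a realtime runtime would register with its LLM,
// or an empty array when the Activity carries no realtime settings.
function extractTools(activity: StartResponseActivity): ToolDefinition[] {
  if (activity.type !== "commandResult") return [];
  return activity.value?.realtimeGptModelSettings?.tools ?? [];
}
```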
6.3 Execute Request
For starting conversation event:
{
"type": "event",
"name": "StartConversation"
}
For handling DialogEvents:
A. End user has hung up the call-
{
"type": "event ",
"name": "DialogHangup",
"value": {
"hangupReason": "CUSTOMER_HANGUP"
}
}
B. End user provided unrecognizable input-
{
"type": "event",
"name": "DialogUnrecognizedInput",
"valueType": "NoMatch/ UnrecognizedSpeech",
"value": {
"bargeInContext": {
"messageId": "xxx192",
"messageDuration": 20
}
}
}
C. End user has not provided any input-
{
"type": "event",
"name": "DialogNoInput",
"text": "002 no-input-timeout"
}
D. Answer Machine Detected-
{
"type": "event",
"name": "AnswerMachineDetected",
"value": {
"bargeInContext": {
"messageId": "xxx192",
"messageDuration": 20
}
}
}
E. Unexpected problem occurred-
{
"type": "event",
"name": "DialogError",
"text": "Response Finished Failed with a critical error",
"value": {
"errorCode": "inference_rate_limit_exceeded"
}
}
For handling UserInput:
A. DTMF as command (Ex: Press * to interrupt)-
{
"type": "message",
"text": "4",
"valueType": "TriggerDtmfKeyPressInfo",
"value": {
"dtmfMappingKey": "Num4"
}
}
B. DTMF as answer (Ex: Press 1 for Sales)-
{
"type": "message",
"text": "1",
"valueType": "SetEntityByDtmfKeyInfo",
"value": {
"dtmfMappingKey": "Num1"
}
}
C. Multi-digit freeform DTMF (Ex: user entered account digits 12345)-
{
"type": "message",
"text": "12345",
"valueType": "MultiDtmfInfo"
}
D. External grammar result-
{
"type": "message",
"text": "fifteen hundred U S dollar",
"valueType": "NuanceRecognizerResultInfo",
"value": {
"entityGrammarUri": "https://bugbashgrammars.blob.core.windows.net/common/currencygrammar.grxml?constrainedrequired=true",
"recognizedData": [
{ "entityName": "SWI_meaning", "entityValue": "USD1500.00" },
{ "entityName": "AMBIGUOUS", "entityValue": "0" },
{ "entityName": "MEANING", "entityValue": "USD1500.00" },
{ "entityName": "SWI_literal", "entityValue": "fifteen hundred U S dollar" }
]
}
}
E. ASR input-
{
"type": "message",
"text": "New York",
"valueType": "AsraasResult",
"value": {
"confidence": 0.9152545,
"minimallyFormattedText": "new york"
}
}
F. LLM request-
{
"type": "message",
"text": "We can go ahead and wrap up the call now.",
"valueType": "SpeechToSpeechTurnContext",
"value": {
"incomingActivities": [
{
"id": "<generated-guid-1>",
"locale": "en-US",
"channelId": "conversationconductor",
"type": "message",
"text": "I've successfully saved your mom's details to your address book, so it'll be easier next time you place an order. Is there anything else I can help you with today?",
"timestamp": "2026-04-03T10:36:10.5704886+00:00"
},
{
"id": "<generated-guid-2>",
"type": "message",
"text": "Thank you so much for your help.",
"timestamp": 1234567890
},
{
"id": "<generated-guid-3>",
"type": "message",
"text": "That covers everything I needed. I'm all set, and I really appreciate it.",
"timestamp": 1234567890
},
{
"id": "<generated-guid-4>",
"type": "message",
"text": "We can go ahead and wrap up the call now.",
"timestamp":1234567890
}
],
"toolCalls": [
{
"toolId": "BotHangupTool",
"thought": "This action needs to be done to end the call now that the customer has confirmed they are all set and no further assistance is needed.",
"inputs": { },
"outputs": { },
"callId": "call_Aax2clLaMvwB4Dei"
}
],
"userQuery": "We can go ahead and wrap up the call now.",
"userIntent": "end the conversation after completing the customer’s request"
}
}
G. Follow-up from a ContinueAction message based on async response sent from the bot-
{
"type": "message",
"valueType": "ContinueAction"
}
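The Execute Request variants above all reduce to the same Activity extensibility fields (type, name, text, valueType, value). A client could construct them with small builders like the ones sketched here; the field names mirror the samples, while the `Activity` interface and the builder functions themselves are hypothetical helpers, not part of any SDK.

```typescript
// Minimal Activity shape covering the fields used by the samples above.
interface Activity {
  type: string;
  name?: string;
  text?: string;
  valueType?: string;
  value?: Record<string, unknown>;
}

// DTMF-as-command input (section 6.3, UserInput A).
function dtmfCommandActivity(digit: string, mappingKey: string): Activity {
  return {
    type: "message",
    text: digit,
    valueType: "TriggerDtmfKeyPressInfo",
    value: { dtmfMappingKey: mappingKey },
  };
}

// Dialog hangup event (section 6.3, DialogEvents A).
function hangupActivity(reason: string): Activity {
  return { type: "event", name: "DialogHangup", value: { hangupReason: reason } };
}
```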
6.4 Execute Response
A. Question and Answer action to be performed by the client application-
{
"type": "message",
"valueType": "QaAction",
"value": {
"messages": [
{
"nlg": [
{
"text": "\u003cspeak version=\"1.0\" xml:lang=\"en-US\" xmlns:mstts=\"http://www.w3.org/2001/mstts\" xmlns=\"http://www.w3.org/2001/10/synthesis\"\u003e\u003cvoice name=\"en-US-AvaMultilingualNeural\" xmlns=\"\"\u003e\u003cprosody rate=\"0%\" pitch=\"0%\"\u003eContoso Airlines. Thanks for flying with us! Please listen closely as our menu options have recently changed.\u003c/prosody\u003e\u003c/voice\u003e\u003c/speak\u003e",
"containsSensitiveData": false
},
{
"text": "\u003cspeak version=\"1.0\" xml:lang=\"en-US\" xmlns:mstts=\"http://www.w3.org/2001/mstts\" xmlns=\"http://www.w3.org/2001/10/synthesis\"\u003e\u003cvoice name=\"en-US-AvaMultilingualNeural\" xmlns=\"\"\u003e\u003cprosody rate=\"0%\" pitch=\"0%\"\u003eHow can I help today? You can say BookFlight, FlightStatus, LostLuggage, or HangUp.\u003c/prosody\u003e\u003c/voice\u003e\u003c/speak\u003e",
"containsSensitiveData": false
}
],
"visual": [
{
"text": "Contoso Airlines. Thanks for flying with us! Please listen closely as our menu options have recently changed.",
"containsSensitiveData": false
},
{
"text": "How can I help today? You can say BookFlight, FlightStatus, LostLuggage, or HangUp.",
"containsSensitiveData": false
}
],
"messageId": "xx-0"
}
],
"data": {
"asrRoutingHint": {
"isLegacy": true,
"urnType": ""
}
},
"recognitionSettings": {
"dtmfMappings": [
{
"id": "TriggerDtmfKeyPressInfo",
"value": "Num2",
"dtmfKey": "2"
},
{
"id": "TriggerDtmfKeyPressInfo",
"value": "Num3",
"dtmfKey": "3"
},
{
"id": "TriggerDtmfKeyPressInfo",
"value": "Num1",
"dtmfKey": "1"
},
{
"id": "TriggerDtmfKeyPressInfo",
"value": "Num4",
"dtmfKey": "4"
},
{
"id": "TriggerDtmfKeyPressInfo",
"value": "StarKey",
"dtmfKey": "*"
}
],
"collectionSettings": {
"timeout": "5000ms",
"completeTimeout": "0ms",
"incompleteTimeout": "1500ms",
"maxSpeechTimeout": "0ms"
},
"speechSettings": {
"sensitivity": "0.5",
"bargeInType": "speech",
"speedVsAccuracy": "0.5"
},
"dtmfSettings": {
"interDigitTimeout": "3000ms",
"termTimeout": "2000ms"
}
},
"orchestrationResourceReference": {
"recognitionResources": [
{
"inlineWordset": "{\"WORDSETS\":[{\"literal\":\"BookFlight\"},{\"literal\":\"FlightStatus\"},{\"literal\":\"LostLuggage\"},{\"literal\":\"HangUp\"}]}",
"weightValue": 0.1
}
]
},
"recognitionInitResources": {
"recognitionInitMessage": {
"parameters": {
"language": "en-US",
"topic": "GEN",
"audioFormat": {
"pcm": {
"sampleRateHz": 16000
}
},
"recognitionFlags": {
"includeTokenization": true,
"stallTimers": true
},
"noInputTimeoutMs": 5000,
"utteranceEndSilenceMs": 1500,
"speechDetectionSensitivity": 0.5,
"maxHypotheses": 2
},
"resources": [
{
"inlineWordset": "{\"WORDSETS\":[{\"literal\":\"BookFlight\"},{\"literal\":\"FlightStatus\"},{\"literal\":\"LostLuggage\"},{\"literal\":\"HangUp\"}]}",
"weightValue": 0.1
}
]
}
},
"speech": {
"bargeInEnabled": false
},
"userConsentRecordingStatus": 1
}
}
B. Escalation action-
{
"type": "message",
"valueType": "EscalationAction",
"value": {
"messages": [
{
"nlg": [
{
"text": "\u003cspeak version=\"1.0\" xml:lang=\"en-US\" xmlns:mstts=\"http://www.w3.org/2001/mstts\" xmlns=\"http://www.w3.org/2001/10/synthesis\"\u003e\u003cvoice name=\"en-US-ChristopherNeural\" xmlns=\"\"\u003e\u003cprosody rate=\"0%\" pitch=\"0%\"\u003e\u003cemphasis level=\"strong\"\u003eOh no!\u003c/emphasis\u003e We can help. \u003cbreak strength=\"medium\" /\u003e Please wait a moment while I connect you to an agent that can help you find your luggage.\u003c/prosody\u003e\u003c/voice\u003e\u003c/speak\u003e",
"containsSensitiveData": false
}
],
"visual": [
{
"text": "Sorry to hear that. Please wait a moment while I connect you to an agent that can help you find your luggage",
"containsSensitiveData": false
}
],
"messageId": "xx-0"
}
],
"type": "blind",
"destination": "+xxxxxx",
"va_AgentMessage": "This contains custom defined variables",
"additionalContextItems": {
"IsLoggedIn": false,
"va_BotId": "xxxx",
"va_BotName": "xxx",
"va_ConversationId": "xxxx",
"va_Language": "en-US",
"va_Scope": "bot",
"va_LastTopic": "Store",
"va_LastPhrases": "Can I return my item",
"va_Phrases": [
"Hi",
"When does store open",
"Can I return my item"
],
"va_Topics": [
"Greetings",
"Store Hours",
"Return Item"
]
},
"speech": {
"bargeInEnabled": false
},
"userConsentRecordingStatus": 1
}
}
C. Call end action-
{
"type": "message",
"valueType": "EndAction",
"value": {
"messages": [
{
"nlg": [
{
"text": "\u003cspeak version=\"1.0\" xml:lang=\"en-US\" xmlns:mstts=\"http://www.w3.org/2001/mstts\" xmlns=\"http://www.w3.org/2001/10/synthesis\"\u003e\u003cvoice name=\"en-US-ChristopherNeural\" xmlns=\"\"\u003e\u003cprosody rate=\"0%\" pitch=\"0%\"\u003eThanks for ordering a Small pizza. Goodbye!\u003c/prosody\u003e\u003c/voice\u003e\u003c/speak\u003e",
"containsSensitiveData": false
}
],
"visual": [
{
"text": "Thanks for ordering a Small pizza. Goodbye!",
"containsSensitiveData": false
}
],
"messageId": "xx-0"
}
],
"additionalContextItems": {
"GlobalVar1";"Test"
},
"speech": {
"bargeInEnabled": false
},
"userConsentRecordingStatus": 1
}
}
D. Continue action during async request-
{
"type": "message",
"valueType": "ContinueAction",
"value": {
"messages": [
{
"nlg": [
{
"text": "\u003cspeak version=\"1.0\" xml:lang=\"en-US\" xmlns:mstts=\"http://www.w3.org/2001/mstts\" xmlns=\"http://www.w3.org/2001/10/synthesis\"\u003e\u003cvoice name=\"en-US-ChristopherNeural\" xmlns=\"\"\u003e\u003cprosody rate=\"0%\" pitch=\"0%\"\u003ePlease wait while we fetch a joke for you.\u003c/prosody\u003e\u003c/voice\u003e\u003c/speak\u003e",
"containsSensitiveData": false
}
],
"visual": [
{
"text": "Please wait, fetching joke...",
"containsSensitiveData": false
}
],
"messageId": "xx-0"
}
],
"speech": {
"bargeInEnabled": false
},
"callRecordingConsentStatus"1
}
}
E. Error during call-
{
"type": "event",
"name": "DialogError",
"text": "Failed to handle nested/base agent update: ToolCallOutput or CallId is null/empty, cannot process agent update",
"value": {
"errorCode": "bad_request",
"messages": [
{
"nlg": [
{
"text": "\u003cspeak version=\"1.0\" xml:lang=\"en-US\" xmlns:mstts=\"http://www.w3.org/2001/mstts\" xmlns=\"http://www.w3.org/2001/10/synthesis\"\u003e\u003cvoice name=\"de-DE-SeraphinaMultilingualNeural\" xmlns=\"\"\u003e\u003cprosody rate=\"0%\" pitch=\"0%\"\u003eSorry, the bot can't talk for a while. It's something the bot's owner needs to address. Error code: TelephonyBadRequest. Trace ID: 8 2 1 6 6 3.\u003c/prosody\u003e\u003c/voice\u003e\u003c/speak\u003e"
}
],
"visual": [
{
"text": "Sorry, the bot can't talk for a while. It's something the bot's owner needs to address",
"containsSensitiveData": false
}
],
"toolCallOutputs": [
{
"output": "Failure",
"disableModelResponse": true
}
],
"messageId": "xx-0"
}
]
}
}
F. Gen Ai action-
{
"type": "message",
"valueType": "GenAiAction",
"value": {
"toolCallOutputs": [
{
"callId": "call_xxx",
"output": [
{
"name": "message",
"value": "Your order with ID 789 has been created.",
"description": ""
}
]
}
],
"toolInterruption": {
"reasoning": "",
"intentChangeType": "UNKNOWN / REPLACE / CREATE"
},
"updateInstruction": true,
"instructionUpdate": {
"messages": [
{
"role": "SYSTEM",
"content": "You are the specialist..."
},
{
"role": "SYSTEM",
"content": "You will handle..."
}
]
},
"speech": {
"bargeInEnabled": false
},
"callRecordingConsentStatus": 1
}
}
G. Activity traces (in design test mode)-
{
"type": "message",
"valueType": "QaAction",
"value": {
"messages": [
{
"nlg": [
{
"text": "\u003cspeak version=\"1.0\" xml:lang=\"en-US\" xmlns:mstts=\"http://www.w3.org/2001/mstts\" xmlns=\"http://www.w3.org/2001/10/synthesis\"\u003e\u003cvoice name=\"en-US-AvaMultilingualNeural\" xmlns=\"\"\u003e\u003cprosody rate=\"0%\" pitch=\"0%\"\u003eHow can I help today? You can say BookFlight, FlightStatus, LostLuggage, or HangUp.\u003c/prosody\u003e\u003c/voice\u003e\u003c/speak\u003e",
"containsSensitiveData": false
}
],
"visual": [
{
"text": "How can I help today? You can say BookFlight, FlightStatus, LostLuggage, or HangUp.",
"containsSensitiveData": false
}
],
"messageId": "xx-0"
}
],
"activityTraces": [
{ ...Activity... },
{ ...Activity... }
]
}
}
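Because every Execute Response action above is distinguished by valueType (or, for errors, by type plus name), a client runtime can dispatch on those two fields alone. The sketch below shows one way to do that; the handler labels it returns are illustrative placeholders for client-side behavior, not protocol values.

```typescript
// valueType values taken from the Execute Response samples in section 6.4.
type ActionKind =
  | "QaAction"
  | "EscalationAction"
  | "EndAction"
  | "ContinueAction"
  | "GenAiAction";

interface ExecuteResponse {
  type: string;
  name?: string;
  valueType?: string;
}

// Maps an Execute Response Activity to the client behavior it implies.
function classifyResponse(activity: ExecuteResponse): string {
  // Errors arrive as events rather than typed messages (section 6.4 E).
  if (activity.type === "event" && activity.name === "DialogError") return "error";
  switch (activity.valueType as ActionKind) {
    case "QaAction": return "prompt-and-collect";
    case "EscalationAction": return "transfer";
    case "EndAction": return "hangup";
    case "ContinueAction": return "await-async-result";
    case "GenAiAction": return "apply-tool-output";
    default: return "unknown";
  }
}
```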
6.5 Update Request
{
"type": "message",
"valueType": "SpeechToSpeechUpdateTranscript",
"value": {
"incomingActivities": [
{ ...Activity... },
{ ...Activity... }
]
}
}
6.6 Update Response
{
"type": "event",
"name": "ack"
}
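The update round trip in sections 6.5 and 6.6 is a simple batch-and-acknowledge exchange: the client wraps partial transcript Activities in a SpeechToSpeechUpdateTranscript message and treats the "ack" event as confirmation. A minimal sketch, with hypothetical helper names:

```typescript
// Builds the Update Request of section 6.5 from a batch of transcript
// Activities (shapes elided, as in the sample).
function transcriptUpdate(activities: object[]) {
  return {
    type: "message",
    valueType: "SpeechToSpeechUpdateTranscript",
    value: { incomingActivities: activities },
  };
}

// Recognizes the Update Response of section 6.6.
function isAck(response: { type: string; name?: string }): boolean {
  return response.type === "event" && response.name === "ack";
}
```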
6.7 Status Request
{
"type": "command",
"name": "SESSION_STATUS_REQUEST",
"value": {
"sessionId": "xxx"
}
}
6.8 Status Response
{
"type": "event",
"name": "SESSION_STATUS_RESPONSE",
"value": {
"sessionRemainingSec": 2
}
}
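The status check in sections 6.7 and 6.8 pairs a command Activity with an event Activity carrying the remaining session TTL. The sketch below mirrors those shapes; the helper names are illustrative.

```typescript
// Builds the Status Request of section 6.7.
function statusRequest(sessionId: string) {
  return { type: "command", name: "SESSION_STATUS_REQUEST", value: { sessionId } };
}

// Reads the remaining TTL (seconds) from the Status Response of section
// 6.8, or undefined if the Activity is not a status response.
function remainingSeconds(response: {
  type: string;
  name?: string;
  value?: { sessionRemainingSec?: number };
}): number | undefined {
  if (response.type !== "event" || response.name !== "SESSION_STATUS_RESPONSE") return undefined;
  return response.value?.sessionRemainingSec;
}
```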
7. New Round-Trip Flow Example: Client and Server Interaction (Conversation Conductor [CC] and Microsoft Copilot Studio [MCS])
User places call-
Initiate new conversation session:
client → Activity with {"type": "command", "name": "session.init", "channelData": {/* ... */}, "value": {/*...contextData... */}}
server → Activity with {"type": "commandResult", "conversation": {"id": "<session_id>"}, "value": {"realtimeModelSettings": {/* ... */}, "capabilities": [ /* ... */], "botInfo": {/* ... */}}} - acknowledges session creation and returns the negotiated MCS bot capabilities and configuration
MCS greeting with menu options:
client → Activity with {"type": "event", "name": "StartConversation"}
server → Activity with {"type": "message", "valueType": "QaAction", "value": {/* ... */}}
User speaks, provides DTMF input, or stays silent:
client → Activity with {"type": "event", "name": "DialogNoInput"}
or Activity with {"type": "message", "text": "4", "valueType": "TriggerDtmfKeyPressInfo", "value": {"dtmfMappingKey": "Num4"}}
server → Activity with {"type": "event", "name": "DialogError", "value": {/* ... */}}
or Activity with {"type": "message", "valueType": "QaAction / EscalationAction / EndAction / ContinueAction / GenAiAction", "value": { ... }} - the server responds describing what the client should do next (that is, greet the user, wait for user input, escalate to a human agent, end the conversation, continue with an empty payload for an async response, or process the MCS bot's tool call response)
For partial transcript updates in the Realtime speech flow:
client → Activity with {"type": "message", "valueType":"SpeechToSpeechUpdateTranscript","value": {"incomingActivities":[{/*...Activity...*/}, {/*...Activity...*/}]}}
server → Activity with {"type": "event", "name": "ack"}
Session status check:
client → Activity with {"type": "command", "name": "SESSION_STATUS_REQUEST", "value": {"sessionId": "xxx"}}
server → Activity with {"type": "event", "name": "SESSION_STATUS_RESPONSE", "value": {"sessionRemainingSec": 2}} // Remaining session time-to-live (TTL) in seconds, after which the session is terminated
8. Backward Compatibility
9. Alternative Considered
Continue Using gRPC as the Primary Runtime Contract
This was rejected because:
- It breaks schema parity across vendor–server integrations, as each vendor defines gRPC payloads differently
- Servers must maintain vendor‑specific adapters and branching logic
- Activity semantics become derived rather than authoritative, leading to drift over time
- Cross‑vendor tooling, middleware, and routing cannot rely on a consistent Activity contract
Continuing with gRPC as the primary interface would prevent the Activity Protocol from serving as a stable, vendor‑neutral integration surface, undermining interoperability and long‑term extensibility.
10. Conclusion
- Demonstrates that gRPC-based session lifecycle and tool invocation workflows can be represented using the Activity Protocol without introducing new schema fields.
- Uses existing extensibility mechanisms (type, name, value, valueType, entities) to model gRPC semantics consistently.
- Preserves Activity schema parity across vendor–server integrations, avoiding protocol-specific contracts.
- Allows the Activity Protocol to remain the authoritative, vendor-neutral integration surface while supporting advanced capabilities.