Voice Graph Builder — Build Real-Time AI Voice Agents
What is a Voice Graph?
Unlike Text Graphs, which act sequentially, Voice Graphs operate in real-time. They handle continuous audio streams, user interruptions, and background noise using our voice gateway for WebRTC audio orchestration.
Definition — Voice Graph: A visual workflow for building real-time AI phone agents. Each Node represents an AI persona (a conversational state), and each Edge represents a transition that swaps the agent's persona mid-conversation.
A New Architecture
Voice agents achieve human-like response times under 500ms using WebRTC direct streaming, token-level TTS, and async background execution.
How Voice Agents Work
Langoedge Voice Agents use a multi-agent system where the agent is always "listening" for intent. The pipeline has four components:
- Speech-to-Text (STT)
- Voice Activity Detection (VAD)
- LLM Reasoning
- Text-to-Speech (TTS)
Nodes & States
Every Node in a Voice Graph is a GenericAgent instance with:
- Instructions: The primary system prompt for that state (e.g., "You are the greeting bot. Ask for their name, then transition to the Booking agent.")
- Tools: Transition tools (edges) and functional tools (API calls, Text Graph handoffs)
Agent Transitions (Handoffs)
When the AI decides to "Switch to the Booking Department," it calls an AgentTransitionTool. The voice engine intercepts this:
- It de-registers the current agent (e.g., Greeting Bot).
- It registers a new agent (e.g., Booking Bot).
- The conversation continues seamlessly — the human never knows the system switched agents.
This is what makes Langoedge's voice agents feel like natural conversations with a team, not a single robotic bot.
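The handoff steps above can be sketched in a few lines. This is only an illustration of the idea, not Langoedge's implementation: AgentNode and VoiceSession are hypothetical stand-ins for GenericAgent and the voice engine, and the shared history list is an assumption about how context survives the swap.

```python
from dataclasses import dataclass, field


@dataclass
class AgentNode:
    """Hypothetical stand-in for a node: a persona plus its outgoing edges."""
    name: str
    instructions: str
    transitions: list[str] = field(default_factory=list)


class VoiceSession:
    """Hypothetical session holding one active agent and a shared history."""

    def __init__(self, agents: dict[str, AgentNode], start: str) -> None:
        self.agents = agents
        self.active = agents[start]
        # (speaker, text) pairs; assumed to persist across handoffs.
        self.history: list[tuple[str, str]] = []

    def handle_transition(self, target: str) -> AgentNode:
        """Swap the active agent when a transition tool call is intercepted."""
        if target not in self.active.transitions:
            raise ValueError(f"{self.active.name} has no edge to {target}")
        # De-register the current agent and register the new one.
        self.active = self.agents[target]
        return self.active


agents = {
    "greeting": AgentNode("greeting", "Ask for the caller's name.", ["booking"]),
    "booking": AgentNode("booking", "Help the caller book an appointment."),
}
session = VoiceSession(agents, start="greeting")
session.history.append(("user", "I'd like to book an appointment."))
session.handle_transition("booking")
```

Because only the active agent changes and the history stays on the session, the new persona can pick up where the previous one left off.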
Integrating Text Graphs into Voice Calls
Voice agents are great at talking but poor at complex data tasks. Langoedge solves this with Text Graph hand-offs.
Key Takeaway: A Voice Agent can trigger a Text Graph for heavy lifting — database lookups, form submissions, report generation — either synchronously (wait for result) or asynchronously (fire and forget).
Synchronous Wait
The voice agent pauses and waits for the Text Graph to return a result. Best for: "Check my account balance."
Asynchronous Fire & Forget
The voice agent triggers the Text Graph and continues the conversation while the logic runs in the background. Best for: "Send me the report."
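The difference between the two modes is easiest to see side by side. The sketch below uses plain asyncio and a hypothetical run_text_graph coroutine as a stand-in for a real Text Graph call; the graph names and payloads are made up for illustration.

```python
import asyncio


async def run_text_graph(name: str, payload: dict) -> dict:
    """Hypothetical stand-in for a Text Graph run."""
    await asyncio.sleep(0.01)  # simulate the graph doing some work
    return {"graph": name, "input": payload, "status": "done"}


async def handle_turn() -> tuple[dict, asyncio.Task]:
    # Synchronous wait: the agent needs the result before it can answer.
    balance = await run_text_graph("check_balance", {"account": "demo"})

    # Fire and forget: schedule the run and keep talking.
    # Holding a reference keeps the task from being garbage-collected mid-run.
    report_task = asyncio.create_task(
        run_text_graph("send_report", {"account": "demo"})
    )
    return balance, report_task


async def main() -> tuple[dict, dict]:
    balance, report_task = await handle_turn()
    # The conversation would continue here. The task is awaited only so
    # this demo exits cleanly.
    report = await report_task
    return balance, report


balance, report = asyncio.run(main())
```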
Telephony & SIP Trunks
Langoedge handles phone calls via standard SIP Trunks — no complex telephony setup required.
Inbound Calls
Map any phone number to a Voice Graph in the dashboard. When the phone rings, the voice gateway automatically initializes a session and runs your AI agent via a SIP trunk provider like Twilio or Telnyx.
Outbound Calls
Trigger automated calls via the Langoedge API:
POST /voice-graphs/{id}/call
{
"phone_number": "+1234567890"
}
The agent will dial the user and start the conversation as soon as they pick up — perfect for appointment reminders, delivery notifications, or follow-up calls.
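As a rough sketch, the request above could be built in Python with the standard library. The base URL, the vg_123 graph ID, and the Bearer authorization header are assumptions for illustration; check your account's API settings for the real values.

```python
import json
import urllib.request

# Hypothetical base URL; replace with your actual API base URL.
BASE_URL = "https://api.example.com"


def build_call_request(
    voice_graph_id: str, phone_number: str, api_key: str
) -> urllib.request.Request:
    """Build the POST /voice-graphs/{id}/call request without sending it."""
    body = json.dumps({"phone_number": phone_number}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/voice-graphs/{voice_graph_id}/call",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme; the source does not specify one.
            "Authorization": f"Bearer {api_key}",
        },
    )


req = build_call_request("vg_123", "+15551234567", "YOUR_API_KEY")
# To actually place the call: urllib.request.urlopen(req)
```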
Voice Agent Settings & Sidebar Configuration
When editing a voice agent, the right-hand configuration panel allows you to manage voice identity, telephone bindings, external lookup webhooks, server lifecycles, and running transcript triggers.
1. AI Agent Voice Selection
Configure the voice identity and speech synthesis profile. The dropdown allows you to search and select from a curated list of natural-sounding, low-latency voices (such as Inworld or ElevenLabs samples) optimized for real-time, responsive dialogue.
2. Telephone Assignment (Twilio / Telnyx)
Configure how your voice agent is mapped to incoming or outgoing phone trunks:
- Buy Twilio Number: Purchase and provision a local or toll-free number directly within the Langoedge interface (subject to regional carrier availability).
- Bind Telnyx Number: Link an existing Telnyx number to your agent.
  - Enter the number in standard E.164 format (e.g., +15551234567, including the + and country code).
  - Click Bind number to authenticate and route carrier SIP requests to Langoedge's voice activity gateway. To enable this, ensure that you have connected your Telnyx account in the Connect Page.
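If you collect numbers programmatically before binding them, a quick format check can catch mistakes early. The helper below is a minimal sketch: it checks only the E.164 shape (a leading +, a non-zero first digit, at most 15 digits in total), not whether the number exists or is bindable. The function name is illustrative.

```python
import re

# "+" followed by 2 to 15 digits, the first of which is not 0.
E164_PATTERN = re.compile(r"\+[1-9][0-9]{1,14}")


def is_e164(number: str) -> bool:
    """Return True if the string has the shape of an E.164 number."""
    return E164_PATTERN.fullmatch(number) is not None
```

For example, is_e164("+15551234567") passes, while the same digits without the leading + or with spaces do not.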
3. Caller Profile Webhook (Dynamic CRM Lookup)
To enable personalized interactions, configure a Caller Profile Webhook to dynamically look up caller records (such as patient data from Cliniko) the instant a call is received:
- Webhook URL: The HTTP/HTTPS endpoint to query. You can use an absolute URL or a platform-relative path (e.g., webhooks/cliniko/patient, which is automatically prefixed with the platform's API base URL).
- Webhook API Key: A secure key sent in the headers of the webhook request (passed in Authorization: Bearer <key> and x-api-key: <key>).
- Technical Execution:
  - Upon receiving an inbound connection, the voice engine POSTs a JSON payload containing phone_number and voice_graph_id to your Webhook URL.
  - The webhook queries your CRM/database and returns the matched record as a JSON object.
  - Langoedge parses the response and injects it as a system message in the agent's chat history before the greeting. This lets the agent greet the caller with context immediately (e.g., "Hello Sarah, I see you have an appointment scheduled with Dr. Smith tomorrow at 2:00 PM. Are you calling to reschedule?").
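On the receiving side, the webhook's core logic might look like the sketch below, written as a framework-independent function. The in-memory PATIENTS table, the record fields, and the empty-object response for unknown callers are all assumptions for illustration; only the phone_number and voice_graph_id payload fields and the two header names come from the description above.

```python
import hmac
import json

EXPECTED_KEY = "YOUR_WEBHOOK_API_KEY"  # placeholder for the configured key

# Made-up demo data standing in for a CRM lookup.
PATIENTS = {
    "+15551234567": {"name": "Sarah", "next_appointment": "tomorrow 2:00 PM"},
}


def extract_key(headers: dict[str, str]) -> str:
    """Read the key from Authorization: Bearer <key>, else from x-api-key."""
    lowered = {k.lower(): v for k, v in headers.items()}
    auth = lowered.get("authorization", "")
    if auth.startswith("Bearer "):
        return auth[len("Bearer "):]
    return lowered.get("x-api-key", "")


def handle_lookup(headers: dict[str, str], body: bytes) -> tuple[int, dict]:
    """Return (HTTP status, JSON-serializable record) for a lookup request."""
    supplied = extract_key(headers).encode("utf-8")
    # Constant-time comparison avoids leaking key contents through timing.
    if not hmac.compare_digest(supplied, EXPECTED_KEY.encode("utf-8")):
        return 401, {"error": "unauthorized"}
    payload = json.loads(body)
    record = PATIENTS.get(payload.get("phone_number", ""))
    # Assumption: unknown callers get an empty object rather than an error.
    return 200, record if record is not None else {}
```

Wrapping handle_lookup in your HTTP framework of choice is then a matter of passing through the request headers and raw body.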
4. Lifecycle Settings (Sticky Deployment)
Configure the running state of the underlying speech-to-text (STT) and text-to-speech (TTS) worker containers:
- Sticky Deployment Toggle: By default, Langoedge scales down inactive voice containers to conserve resources. Toggle Sticky Deployment on to keep your voice agent container continuously active and warm. This eliminates initial container cold-start delay, ensuring optimal human-level latency from the first second of the call.
5. Running Transcript Triggers (Agent Configs)
Integrate Text Graphs / Agent Configs directly into the live audio conversation:
- Dynamic Transcript Processing: Select one or more Text Graph configurations. For every conversational turn/utterance, the voice session invokes these graphs asynchronously in the background, passing the running call transcript.
- Ecosystem Actions: The triggered text graphs run RAG lookups, execute API webhooks, write variables to the shared state, or return response text blocks directly back into the voice agent's assistant context in real-time.
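Conceptually, each utterance fans out to the selected graphs without blocking the audio loop. The sketch below shows that pattern with asyncio; TranscriptTriggers and summarize_graph are hypothetical names, and appending results to a context list is an assumption about how returned text reaches the agent.

```python
import asyncio
from collections.abc import Awaitable, Callable

Graph = Callable[[list[str]], Awaitable[str]]


async def summarize_graph(transcript: list[str]) -> str:
    """Hypothetical Text Graph that receives the running transcript."""
    await asyncio.sleep(0)  # yield control, as a real graph call would
    return f"turns so far: {len(transcript)}"


class TranscriptTriggers:
    def __init__(self, graphs: list[Graph]) -> None:
        self.graphs = graphs
        self.transcript: list[str] = []
        self.context: list[str] = []  # text blocks returned by graphs
        self._tasks: set[asyncio.Task] = set()

    def on_utterance(self, text: str) -> None:
        """Record the turn and start every graph in the background."""
        self.transcript.append(text)
        snapshot = list(self.transcript)  # each run sees a stable copy
        for graph in self.graphs:
            task = asyncio.create_task(self._run(graph, snapshot))
            self._tasks.add(task)
            task.add_done_callback(self._tasks.discard)

    async def _run(self, graph: Graph, snapshot: list[str]) -> None:
        result = await graph(snapshot)
        if result:
            self.context.append(result)

    async def drain(self) -> None:
        """Wait for outstanding runs (used here only to end the demo)."""
        await asyncio.gather(*self._tasks)


async def demo() -> list[str]:
    triggers = TranscriptTriggers([summarize_graph])
    triggers.on_utterance("Hi, I need to reschedule.")
    triggers.on_utterance("Tomorrow doesn't work for me.")
    await triggers.drain()
    return triggers.context


context = asyncio.run(demo())
```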
Latency Management
To achieve human-like response times under 500ms, Langoedge uses:
- WebRTC Direct Stream: No intermediate audio files. Audio streams directly over WebRTC through our streaming server.
- Token Streaming: TTS starts speaking the very first word of the LLM response while the rest is still being generated.
- Async Execution: Background logic is decoupled from the main audio loop so the conversation never stalls.
- Noise Suppression: AI-powered audio cleaning handles car, café, and office environments.
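To illustrate the token streaming idea, the generator below turns a stream of LLM tokens into complete words as soon as each one is finished, so a downstream TTS engine could start on the first word while later tokens are still arriving. It is a simplified sketch, not Langoedge's pipeline: the function name and the whitespace-based word boundary are assumptions.

```python
from collections.abc import Iterable, Iterator


def words_for_tts(tokens: Iterable[str]) -> Iterator[str]:
    """Yield each word as soon as the token stream completes it."""
    buffer = ""
    for token in tokens:
        buffer += token
        # Everything before the last space is a finished word; the rest
        # may still be extended by the next token.
        *complete, buffer = buffer.split(" ")
        for word in complete:
            if word:
                yield word
    if buffer:
        yield buffer  # flush the final word when the stream ends


tokens = ["Hel", "lo", " Sar", "ah", ",", " how", " are", " you", "?"]
stream = words_for_tts(iter(tokens))
first_word = next(stream)   # available after only three tokens
remaining = list(stream)
```

Because the generator is lazy, first_word is produced before the rest of the token list has been consumed.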
Speech Synthesis (TTS) Latency
| Voice Profile | Best For | Latency |
|---|---|---|
| High Speed | Ultra-low latency conversations | Sub-300ms |
| Expressive | Realistic, emotive voices for premium support | ~500ms |
| Standard | Fast, reliable standard voices | ~400ms |
| Roleplay | Character-driven and roleplay interactions | ~600ms |