Voice Graph Builder — Build Real-Time AI Voice Agents

Langoedge Team, 7 min read

What is a Voice Graph?

Unlike Text Graphs, which run sequentially, Voice Graphs operate in real time. They handle continuous audio streams, user interruptions, and background noise through our voice gateway, which orchestrates WebRTC audio.

Definition — Voice Graph: A visual workflow for building real-time AI phone agents. Each Node represents an AI persona (a conversational state), and each Edge represents a transition that swaps the agent's persona mid-conversation.

A New Architecture

Voice agents achieve human-like response times under 500ms using WebRTC direct streaming, token-level TTS, and async background execution.


How Voice Agents Work

Langoedge Voice Agents use a multi-agent system where the agent is always "listening" for intent.

1. Speech-to-Text (STT)

The user's voice is converted to text in real-time using high-speed streaming STT.

2. Voice Activity Detection (VAD)

Voice activity detection determines when a human starts and stops speaking, enabling natural interruption handling.

3. LLM Reasoning

A Large Language Model (LLM) processes the transcript and decides what to do: respond, call a tool, or transition to another agent.

4. Text-to-Speech (TTS)

Responses are synthesized into human-like speech in real-time using high-performance TTS synthesizers.
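
As a rough mental model, the four stages can be pictured as one loop per conversational turn. The sketch below is illustrative only: `is_speech`, `transcribe`, `decide`, `synthesize`, and `run_turn` are hypothetical stand-ins rather than Langoedge APIs, and audio frames are faked as plain strings.

```python
from __future__ import annotations

from dataclasses import dataclass

# All names here are hypothetical stand-ins for illustration,
# not Langoedge APIs. Audio frames are faked as plain strings.


@dataclass
class Decision:
    kind: str     # "respond" or "transition" in this toy example
    payload: str  # reply text, or the name of the target agent


def is_speech(frame: str) -> bool:
    """Toy VAD: any non-blank frame counts as speech."""
    return bool(frame.strip())


def transcribe(frames: list[str]) -> str:
    """Toy STT: join the 'audio' frames into a transcript."""
    return " ".join(f.strip() for f in frames)


def decide(transcript: str) -> Decision:
    """Toy LLM step: either respond or transition to another agent."""
    if "book" in transcript.lower():
        return Decision("transition", "Booking")
    return Decision("respond", f"You said: {transcript}")


def synthesize(text: str) -> bytes:
    """Toy TTS: encode text as bytes in place of audio."""
    return text.encode("utf-8")


def run_turn(frames: list[str]) -> tuple[Decision, bytes | None]:
    """One turn: VAD filters frames, STT transcribes, the LLM decides, TTS speaks."""
    voiced = [f for f in frames if is_speech(f)]
    decision = decide(transcribe(voiced))
    audio = synthesize(decision.payload) if decision.kind == "respond" else None
    return decision, audio
```

In a real deployment these stages run concurrently on streaming audio rather than one after another on a finished list of frames; the sequential version is only meant to show which stage feeds which.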

Nodes & States

Every Node in a Voice Graph is a GenericAgent instance with:

  • Instructions: The primary system prompt for that state (e.g., "You are the greeting bot. Ask for their name, then transition to the Booking agent.")
  • Tools: Transition tools (edges) and functional tools (API calls, Text Graph handoffs)

Agent Transitions (Handoffs)

When the AI decides to "Switch to the Booking Department," it calls an AgentTransitionTool. The voice engine intercepts this:

  1. It de-registers the current agent (e.g., Greeting Bot).
  2. It registers a new agent (e.g., Booking Bot).
  3. The conversation continues seamlessly — the human never knows the system switched agents.

This is what makes Langoedge's voice agents feel like natural conversations with a team, not a single robotic bot.
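
The handoff can be sketched as a small state machine. This is a minimal illustration under stated assumptions: the `Agent` and `VoiceSession` classes and their fields are hypothetical and do not reflect the real GenericAgent or AgentTransitionTool interfaces, and the idea that the transcript is shared across the swap is an assumption drawn from "the conversation continues seamlessly".

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical sketch: class and method names are illustrative assumptions,
# not the real GenericAgent or AgentTransitionTool interfaces.


@dataclass
class Agent:
    name: str
    instructions: str                                   # system prompt for this state
    transitions: set[str] = field(default_factory=set)  # edges to other nodes


class VoiceSession:
    def __init__(self, agents: dict[str, Agent], start: str) -> None:
        self.agents = agents
        self.active = agents[start]
        self.history: list[str] = []  # assumed to survive the swap

    def transition(self, target: str) -> None:
        """Swap the active agent if the current node has an edge to target."""
        if target not in self.active.transitions:
            raise ValueError(f"{self.active.name} has no edge to {target}")
        self.history.append(f"[handoff {self.active.name} -> {target}]")
        self.active = self.agents[target]  # old agent out, new agent in


agents = {
    "Greeting": Agent("Greeting", "Greet the caller and ask for their name.", {"Booking"}),
    "Booking": Agent("Booking", "Help the caller book an appointment."),
}
session = VoiceSession(agents, start="Greeting")
session.history.append("user: Hi, I'd like to book an appointment.")
session.transition("Booking")
```

Restricting transitions to declared edges mirrors the graph structure: an agent can only hand off along an Edge that exists on its Node.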


Integrating Text Graphs into Voice Calls

Voice agents are great at talking but poor at complex data tasks. Langoedge solves this with Text Graph hand-offs.

Key Takeaway: A Voice Agent can trigger a Text Graph for heavy lifting — database lookups, form submissions, report generation — either synchronously (wait for result) or asynchronously (fire and forget).

Synchronous Wait

The voice agent pauses and waits for the Text Graph to return a result. Best for: "Check my account balance."

Asynchronous Fire & Forget

The voice agent triggers the Text Graph and continues the conversation while the logic runs in the background. Best for: "Send me the report."
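
The difference between the two modes maps naturally onto awaiting a coroutine versus scheduling it as a background task. The following is a minimal sketch using Python's `asyncio`; `run_text_graph` and the graph names are hypothetical placeholders for a Text Graph invocation, not Langoedge APIs.

```python
from __future__ import annotations

import asyncio

# Hypothetical sketch: run_text_graph stands in for a Text Graph call.


async def run_text_graph(name: str, delay: float = 0.01) -> str:
    await asyncio.sleep(delay)  # simulate database or API work
    return f"{name}: done"


async def synchronous_wait() -> str:
    """Hold the turn until the Text Graph returns, then use the result."""
    return await run_text_graph("check_balance")


async def fire_and_forget(background: set[asyncio.Task]) -> str:
    """Start the Text Graph and reply immediately without awaiting it."""
    task = asyncio.create_task(run_text_graph("send_report"))
    background.add(task)  # keep a reference so the task is not garbage-collected
    task.add_done_callback(background.discard)
    return "Your report is on its way."


async def demo() -> tuple[str, str, int]:
    background: set[asyncio.Task] = set()
    balance = await synchronous_wait()
    reply = await fire_and_forget(background)
    pending = len(background)          # the report task is still running here
    await asyncio.gather(*background)  # drain background work before shutdown
    return balance, reply, pending


balance, reply, pending = asyncio.run(demo())
```

In the synchronous case the caller hears nothing until the result is ready, so it suits short lookups; in the asynchronous case the reply is spoken while the work is still pending.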


Telephony & SIP Trunks

Langoedge handles phone calls via standard SIP Trunks — no complex telephony setup required.

Inbound Calls

Map any phone number to a Voice Graph in the dashboard. When a call arrives through a SIP trunk provider such as Twilio or Telnyx, the voice gateway automatically initializes a session and runs your AI agent.

Outbound Calls

Trigger automated calls via the Langoedge API:

POST /voice-graphs/{id}/call
{
  "phone_number": "+1234567890"
}

The agent will dial the user and start the conversation as soon as they pick up — perfect for appointment reminders, delivery notifications, or follow-up calls.
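
For reference, here is one way the request above might be built from Python using only the standard library. The base URL, the bearer-token header, the graph ID, and the function name are assumptions made for illustration; only the path and the `phone_number` field come from the request shown above, so check your dashboard for the real host and authentication scheme.

```python
import json
import urllib.request

# Assumptions for illustration: the base URL, the auth header, and the
# function name are hypothetical. Only the path and the phone_number field
# come from the documented request.
API_BASE = "https://api.example.com"


def build_call_request(graph_id: str, phone_number: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the POST that triggers an outbound call."""
    body = json.dumps({"phone_number": phone_number}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/voice-graphs/{graph_id}/call",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
    )


req = build_call_request("graph_123", "+1234567890", "YOUR_API_TOKEN")
# To actually place the call: urllib.request.urlopen(req)
```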


Voice Agent Settings & Sidebar Configuration

When editing a voice agent, the right-hand configuration panel allows you to manage voice identity, telephone bindings, external lookup webhooks, server lifecycles, and running transcript triggers.

1. AI Agent Voice Selection

Configure the voice identity and speech synthesis profile. The dropdown allows you to search and select from a curated list of natural-sounding, low-latency voices (such as Inworld or ElevenLabs samples) optimized for real-time, responsive dialogue.

2. Telephone Assignment (Twilio / Telnyx)

Configure how your voice agent is mapped to incoming or outgoing phone trunks:

  • Buy Twilio Number: Purchase and provision a local or toll-free number directly within the Langoedge interface (subject to regional carrier availability).
  • Bind Telnyx Number: Link an existing Telnyx number to your agent.
    • Enter the number in standard E.164 format (e.g., +15551234567 including the + and country code).
    • Click Bind number to authenticate and route carrier SIP requests to Langoedge's voice gateway. To enable this, make sure you have connected your Telnyx account on the Connect page.
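
If you collect numbers programmatically before binding them, a quick format check can catch mistakes early. This is a small client-side sketch, not Langoedge's own validation: it checks only the E.164 shape (a leading +, no leading zero, at most 15 digits), not whether the number actually exists.

```python
import re

# E.164: a leading "+", then country code and subscriber number,
# at most 15 digits in total, and the first digit is never 0.
E164_PATTERN = re.compile(r"\+[1-9]\d{1,14}")


def is_e164(number: str) -> bool:
    """Return True if number has the E.164 shape."""
    return E164_PATTERN.fullmatch(number) is not None
```

For example, `is_e164("+15551234567")` is true, while `"15551234567"` (missing +) and `"+1 555 123 4567"` (contains spaces) are rejected.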

3. Caller Profile Webhook (Dynamic CRM Lookup)

To enable personalized interactions, configure a Caller Profile Webhook to dynamically look up caller records (such as patient data from Cliniko) the instant a call is received:

  • Webhook URL: The HTTP/HTTPS endpoint to query. You can use an absolute URL or a platform-relative path (e.g., webhooks/cliniko/patient, which is automatically prefixed with the platform's API base URL).
  • Webhook API Key: A secure key sent in the headers of the webhook request (passed in Authorization: Bearer <key> and x-api-key: <key>).
  • Technical Execution:
    1. Upon receiving an inbound connection, the voice engine POSTs a JSON payload containing phone_number and voice_graph_id to your Webhook URL.
    2. The webhook queries your CRM/database and returns the matched record as a JSON object.
    3. Langoedge parses the response and injects it as a system message in the agent's chat history before the greeting. This lets the agent greet the caller with context right away (e.g., "Hello Sarah, I see you have an appointment scheduled with Dr. Smith tomorrow at 2:00 PM. Are you calling to reschedule?").
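
The receiving side of this webhook could look roughly like the sketch below. It is framework-agnostic and deliberately simplified: the in-memory `CRM`, the record fields, the function names, and the choice to return an empty object for unknown callers are all assumptions. Only `phone_number`, `voice_graph_id`, and the two authentication headers come from the description above.

```python
from __future__ import annotations

import hmac
import json

# Hypothetical sketch of a webhook receiver. The record fields, the
# in-memory "CRM", and the function names are illustrative assumptions.
EXPECTED_KEY = "replace-with-your-webhook-api-key"
CRM = {
    "+15551234567": {"name": "Sarah", "next_appointment": "Dr. Smith, tomorrow 2:00 PM"},
}


def is_authorized(headers: dict[str, str]) -> bool:
    """Accept the key from either Authorization: Bearer <key> or x-api-key."""
    lowered = {k.lower(): v for k, v in headers.items()}
    auth = lowered.get("authorization", "")
    bearer = auth[len("Bearer "):] if auth.startswith("Bearer ") else ""
    candidates = (bearer, lowered.get("x-api-key", ""))
    return any(hmac.compare_digest(c, EXPECTED_KEY) for c in candidates)


def handle_lookup(headers: dict[str, str], raw_body: bytes) -> tuple[int, str]:
    """Return (HTTP status, JSON body) for a caller-profile lookup."""
    if not is_authorized(headers):
        return 401, json.dumps({"error": "unauthorized"})
    payload = json.loads(raw_body)  # contains phone_number and voice_graph_id
    record = CRM.get(payload["phone_number"], {})  # assumed: empty object if unknown
    return 200, json.dumps(record)
```

Because the lookup runs before the greeting, keeping the handler fast matters: a slow CRM query delays the first words the caller hears.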

4. Lifecycle Settings (Sticky Deployment)

Configure the running state of the underlying speech-to-text (STT) and text-to-speech (TTS) worker containers:

  • Sticky Deployment Toggle: By default, Langoedge scales down inactive voice containers to conserve resources. Toggle Sticky Deployment on to keep your voice agent container continuously active and warm. This eliminates the initial container cold-start delay, so responses stay fast from the first second of the call.

5. Running Transcript Triggers (Agent Configs)

Integrate Text Graphs / Agent Configs directly into the live audio conversation:

  • Dynamic Transcript Processing: Select one or more Text Graph configurations. For every conversational turn/utterance, the voice session invokes these graphs asynchronously in the background, passing the running call transcript.
  • Ecosystem Actions: The triggered text graphs run RAG lookups, execute API webhooks, write variables to the shared state, or return response text blocks directly back into the voice agent's assistant context in real-time.

Latency Management

To achieve human-like response times under 500ms, Langoedge uses:

  • WebRTC Direct Stream: No intermediate audio files. Audio streams directly over WebRTC to and from our streaming server.
  • Token Streaming: TTS starts speaking the very first word of the LLM response while the rest is still being generated.
  • Async Execution: Background logic is decoupled from the main audio loop so the conversation never stalls.
  • Noise Suppression: AI-powered audio cleaning handles car, café, and office environments.
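
To make the token-streaming idea concrete, the sketch below forwards text to a TTS stage as soon as each word is complete, instead of waiting for the full LLM response. It is an illustration only: `stream_to_tts` is a hypothetical name, and flushing on word boundaries is an assumed chunking rule, not a description of Langoedge's internals.

```python
from typing import Iterable, Iterator

# Hypothetical sketch: a real TTS engine would receive these chunks over a
# streaming connection. The word-boundary chunking rule is an assumption.


def stream_to_tts(tokens: Iterable[str], boundary: str = " ") -> Iterator[str]:
    """Yield speakable chunks as soon as each word is complete.

    Tokens are consumed lazily, so the first chunk is available before
    the rest of the response has been generated.
    """
    buffer = ""
    for token in tokens:
        buffer += token
        while boundary in buffer:
            word, buffer = buffer.split(boundary, 1)
            if word:
                yield word
    if buffer:
        yield buffer  # flush whatever remains when the stream ends
```

Because the function is a generator, the consumer can start synthesizing the first word while the producer is still emitting later tokens, which is where the latency saving comes from.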

Speech Synthesis (TTS) Latency

Voice Profile | Best For | Latency
High Speed | Ultra-low latency conversations | Sub-300ms
Expressive | Realistic, emotive voices for premium support | ~500ms
Standard | Fast, reliable standard voices | ~400ms
Roleplay | Character-driven and roleplay interactions | ~600ms

Example: Appointment Booker Voice Agent

sequenceDiagram
    participant User
    participant Greeting as Greeting Agent
    participant Booking as Booking Agent
    participant Calendar as Text Graph
    User->>Greeting: "Hi, I'd like to book an appointment."
    Greeting->>Booking: Transition to Booking Agent
    Booking->>User: "Sure! What day works for you?"
    User->>Booking: "Next Tuesday."
    Booking->>Calendar: "Check availability for next Tuesday"
    Calendar-->>Booking: "Available at 2pm"
    Booking->>User: "We have a slot at 2pm next Tuesday. Want me to book it?"

Frequently Asked Questions

How fast are Langoedge voice agents?
Langoedge voice agents achieve sub-500ms response times using WebRTC direct streaming, token-level TTS streaming, and async background execution. The exact latency depends on your chosen STT/LLM/TTS providers.

Can voice agents handle interruptions?
Yes. Langoedge uses Voice Activity Detection (VAD), which allows the agent to detect when a user starts speaking mid-sentence, pause the current response, and switch to listening, just like in a human conversation.

Can I use my own phone number?
Yes. You can buy a Twilio phone number directly inside the Voice Agent settings panel, or bind an existing Telnyx phone number (in E.164 format) to route carrier calls through Langoedge's voice gateway.

Can voice agents send emails or update my CRM?
Yes. Attach a Text Graph as a tool to any Voice Node, or use our integrations (Gmail, Slack, etc.) directly within a Voice Node. The agent can send emails, update records, or trigger any workflow.

For details on configuring custom voice parameters, please contact our support team.
Langoedge Team

The Langoedge engineering team builds AI agent infrastructure that empowers businesses to deploy reliable, observable AI staff.