Event Reference & Usage

VoiceRun uses an event-driven architecture. This reference describes every available event, its data fields, and how to use it.

Event Lifecycle

Events flow through the system in the following sequence:

The event handler runs once per input event.
The event handler can emit zero or more output events.
Output events are processed in the order in which they are emitted.
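
The lifecycle above can be sketched with a toy runtime loop. The `Event` base class and the `run` function below are stand-ins written for this illustration, not VoiceRun SDK code. The handler is assumed to be a plain generator function; the real SDK may use a different signature.

```python
from typing import Iterator, List


class Event:
    """Hypothetical stand-in for an SDK event: keyword arguments become `data`."""

    def __init__(self, **data):
        self.data = data


class StartEvent(Event): pass
class TextEvent(Event): pass
class LogEvent(Event): pass
class TextToSpeechEvent(Event): pass


def handler(event: Event) -> Iterator[Event]:
    """Runs once per input event and yields zero or more output events."""
    if isinstance(event, StartEvent):
        yield LogEvent(message="session started")
        yield TextToSpeechEvent(text="Hello!", voice="lyric")
    elif isinstance(event, TextEvent):
        yield TextToSpeechEvent(text=f"You said: {event.data['text']}", voice="lyric")
    # Any other input event yields nothing, which is also valid.


def run(inputs: List[Event]) -> List[Event]:
    """Toy runtime: one handler call per input, outputs kept in emit order."""
    processed: List[Event] = []
    for event in inputs:
        for output in handler(event):
            processed.append(output)
    return processed


outputs = run([StartEvent(), TextEvent(text="hi", source="text")])
print([type(o).__name__ for o in outputs])
# prints ['LogEvent', 'TextToSpeechEvent', 'TextToSpeechEvent']
```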

Input Events

StartEvent

Event emitted when a new voice agent session has started.

🗃

Data Fields

No data fields

Example Usage

if isinstance(event, StartEvent):
  # Handle start event
  pass

TextEvent

Event emitted when the user speaks or types text.

🗃

Data Fields

text

The text content spoken or typed by the user.

source

The source of the text event (e.g. "speech", "text")

Example Usage

if isinstance(event, TextEvent):
  text = event.data["text"] # What the user said
  source = event.data["source"] # "speech" or "text"

⏱

TimeoutEvent

Event emitted when the user does not speak for 5 seconds.

🗃

Data Fields

No data fields

Example Usage

if isinstance(event, TimeoutEvent):
  # Handle timeout event
  pass
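
One way a handler might combine the input events is to re-prompt on the first timeout and end the call on the second. This is a minimal sketch under stated assumptions: the event classes are local stand-ins, the `state` dictionary is a hypothetical way to keep per-session data, and `MAX_TIMEOUTS` is an illustrative threshold rather than a VoiceRun setting.

```python
class Event:
    """Hypothetical stand-in for an SDK event: keyword arguments become `data`."""

    def __init__(self, **data):
        self.data = data


class TextEvent(Event): pass
class TimeoutEvent(Event): pass
class TextToSpeechEvent(Event): pass
class StopEvent(Event): pass


MAX_TIMEOUTS = 2  # illustrative threshold, not a VoiceRun default


def handler(event, state):
    """Yield output events; `state` is a plain dict kept by the caller."""
    if isinstance(event, TextEvent):
        state["timeouts"] = 0  # the user spoke, so reset the counter
        yield TextToSpeechEvent(text=f"You said: {event.data['text']}", voice="lyric")
    elif isinstance(event, TimeoutEvent):
        state["timeouts"] = state.get("timeouts", 0) + 1
        if state["timeouts"] >= MAX_TIMEOUTS:
            yield StopEvent(closing_speech="I haven't heard from you, so I'll end the call. Goodbye!")
        else:
            yield TextToSpeechEvent(text="Are you still there?", voice="lyric")


state = {}
first = list(handler(TimeoutEvent(), state))
second = list(handler(TimeoutEvent(), state))
print(type(first[0]).__name__, type(second[0]).__name__)
# prints TextToSpeechEvent StopEvent
```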

Output Events

🔊

AudioEvent

Plays audio to the user from a URL.

🗃

Data Fields

path

The URL of the audio file to play.

Example Usage

yield AudioEvent(path="https://example.com/audio.mp3")

📝

LogEvent

Logs a message to the system logs.

🗃

Data Fields

message

The message to log.

Example Usage

yield LogEvent(message="Processing user request")

🔇

SilenceEvent

Plays silence for a specified duration in milliseconds.

🗃

Data Fields

duration

The duration of silence in milliseconds.

Example Usage

yield SilenceEvent(duration=2000)

⏹

StopEvent

Stops the current voice agent session and ends the call. Optionally plays a closing message before the call ends.

🗃

Data Fields

closing_speech

Optional text to speak before ending the call. If provided, this message will be played before the call terminates.

voice

Voice to use for closing speech. Defaults to the agent's default voice if not specified.

speed

Playback speed for closing speech. Range: 0.25x to 4.0x in 0.25 increments. Default: 1.0 (normal speed).

language

Language code for closing speech. Default: 'en' (English).

Example Usage

# Basic usage - end call immediately
yield StopEvent()

# End call with a goodbye message
yield StopEvent(
    closing_speech="Thank you for calling. Have a great day!",
    voice="nova",
    speed=1.25,
    language="en"
)

# Multilingual goodbye (the Chinese text means "Thank you for calling. Have a wonderful day!")
yield StopEvent(
    closing_speech="感谢您的来电，祝您有美好的一天！",
    voice="xiaoxiao",
    language="zh"
)

🗣

TextToSpeechEvent

Converts text to speech using the specified voice and plays it to the user. Supports advanced features like voice styling, speed control, caching, and streaming for optimized performance.

🗃

Data Fields

text

The text content to be converted to speech.

voice

The voice to use for speech synthesis. The following voices are available:
OpenAI voices (9): alloy, ash, coral, echo, fable, onyx, nova, sage, shimmer
VoiceRun voices (6): breath, flint, juniper, atlas, solstice, lyric
Google Chirp voices (30): zephyr, puck, charon, kore, fenrir, leda, orus, aoede, callirrhoe, autonoe, enceladus, iapetus, umbriel, algieba, despina, erinome, algenib, rasalgethi, laomedeia, achernar, alnilam, schedar, gacrux, pulcherrima, achird, zubenelgenubi, vindemiatrix, sadachbia, sadaltager, sulafar
Azure voices (32): xiaoxiao, xiaoxiao-multilingual, hsiaochen, yunjhe, hanhan, zhiyuan, aria, guy, jenny, tony, sonia, ryan, natasha, william, clara, liam, neerja, prabhat, emily, connor, asilia, chilemba, ezinne, abeo, rosa, james, luna, wayne, leah, luke, imani, elimu
Cartesia voices (10): brooke, david, sophie, luke, chinese_lisa, chen, chinese_commercial_woman, chinese_commercial_man, welcome_lady, lori_australian
MiniMax voice (1): calm woman
See the Voice Gallery for detailed voice descriptions and examples.

instructions

Optional voice styling instructions to control tone, emotion, or accent. Examples: 'enthusiastic with a Taiwanese accent', 'empathetic and professional tone', 'speak with enthusiasm'. Supported by: OpenAI.

speed

Optional playback speed modifier ranging from 0.25x to 4.0x in 0.25 increments. Supported values: 0.25 (very slow), 0.5 (slow), 0.75 (slightly slow), 1.0 (normal), 1.25 (slightly fast), 1.5 (fast), 2.0 (very fast), 4.0 (maximum speed). Supported by: OpenAI (0.25x-4.0x), Azure (0.5x-3.0x).
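
A speed value can be checked before it is sent. The helper below is a sketch, not part of the VoiceRun SDK: `is_valid_speed`, `SPEED_RANGES`, and the provider keys are names chosen for this example. It assumes every 0.25 step inside a provider's range is accepted, which is one reading of the description above.

```python
from typing import Dict, Tuple

# Ranges copied from the field description; the keys are labels for this sketch.
SPEED_RANGES: Dict[str, Tuple[float, float]] = {
    "openai": (0.25, 4.0),
    "azure": (0.5, 3.0),
}
SPEED_STEP = 0.25


def is_valid_speed(speed: float, provider: str) -> bool:
    """Return True if `speed` is in the provider's range and on a 0.25 step."""
    bounds = SPEED_RANGES.get(provider)
    if bounds is None:
        return False  # provider is not listed as supporting speed control
    low, high = bounds
    # Multiples of 0.25 are exact in binary floating point, so this test is safe.
    on_step = float(speed / SPEED_STEP).is_integer()
    return low <= speed <= high and on_step


print(is_valid_speed(1.25, "openai"))  # True
print(is_valid_speed(1.2, "openai"))   # False: not a multiple of 0.25
print(is_valid_speed(0.25, "azure"))   # False: below the Azure range
```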

cache

Enable TTS caching for improved performance on repeated text. When enabled, identical text+voice combinations are cached for 1 week with provider-specific cache keys. Default: true. Supported by: All providers.
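
To make the caching behavior concrete, here is a conceptual sketch of a cache keyed by provider, voice, and text with a one-week expiry. It is not VoiceRun's implementation: the key scheme, the `TTSCache` class, and the injectable clock are all assumptions made for illustration.

```python
import hashlib
import time
from typing import Callable, Dict, Optional, Tuple

CACHE_TTL_SECONDS = 7 * 24 * 60 * 60  # one week, per the field description


def cache_key(provider: str, voice: str, text: str) -> str:
    """Build a provider-specific key from the voice and text (hypothetical scheme)."""
    digest = hashlib.sha256(f"{voice}\n{text}".encode("utf-8")).hexdigest()
    return f"{provider}:{digest}"


class TTSCache:
    """In-memory cache whose entries expire after CACHE_TTL_SECONDS."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[bytes, float]] = {}

    def put(self, key: str, audio: bytes) -> None:
        self._entries[key] = (audio, self._clock())

    def get(self, key: str) -> Optional[bytes]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        audio, stored_at = entry
        if self._clock() - stored_at >= CACHE_TTL_SECONDS:
            del self._entries[key]  # entry is older than one week
            return None
        return audio


# A fake clock makes the expiry visible without waiting a week.
now = [0.0]
cache = TTSCache(clock=lambda: now[0])
key = cache_key("openai", "nova", "Thank you for calling.")
cache.put(key, b"audio-bytes")
print(cache.get(key) is not None)  # True
now[0] += CACHE_TTL_SECONDS
print(cache.get(key) is not None)  # False
```

Identical text and voice produce the same key, so repeated greetings or hold messages are the natural candidates for `cache=True`.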

stream

Enable streaming TTS for lower latency. When true, audio begins playing as soon as the first chunks are available rather than waiting for complete generation. Default: true. Supported by: OpenAI, Azure.

interruptible

Controls whether users can interrupt this TTS during playback. When false, creates a non-interruptible segment that must complete before user input is processed. Default: true. Supported by: All providers.

model

Optional model name to override the default model for the provider. For example, use 'tts-1-hd' instead of the default 'gpt-4o-mini-tts' for OpenAI voices, or 'speech-2.6-turbo' instead of the default 'speech-2.8-turbo' for MiniMax voices. Once set, the model persists for all subsequent TTS events until changed. Supported by: All providers.
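
Because a model override persists, later events that omit `model` do not fall back to the default. The sketch below models that behavior with a hypothetical `ModelTracker` class. It assumes overrides are tracked per provider, which the description above does not state, and the provider keys are labels chosen for this example.

```python
from typing import Dict, Optional

# Defaults quoted in the field description; the keys are labels for this sketch.
DEFAULT_MODELS: Dict[str, str] = {
    "openai": "gpt-4o-mini-tts",
    "minimax": "speech-2.8-turbo",
}


class ModelTracker:
    """Remembers the last model requested for each provider (assumed scope)."""

    def __init__(self) -> None:
        self._overrides: Dict[str, str] = {}

    def resolve(self, provider: str, model: Optional[str] = None) -> str:
        """Return the model a TTS event would use, recording any new override."""
        if model is not None:
            self._overrides[provider] = model
        return self._overrides.get(provider, DEFAULT_MODELS[provider])


tracker = ModelTracker()
print(tracker.resolve("openai"))                    # gpt-4o-mini-tts
print(tracker.resolve("openai", model="tts-1-hd"))  # tts-1-hd
print(tracker.resolve("openai"))                    # tts-1-hd (override persists)
print(tracker.resolve("minimax"))                   # speech-2.8-turbo
```

To return to the default, pass the default model name explicitly in a later event.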

Example Usage

# Basic usage with a VoiceRun voice
yield TextToSpeechEvent(
  text="Hello, I'm speaking to you!",
  voice="lyric"
)

# OpenAI Voice with full feature support
yield TextToSpeechEvent(
  text="Welcome to our service!",
  voice="alloy",
  instructions="enthusiastic and professional",
  speed=1.25,
  stream=True,
  cache=True
)

# Non-interruptible announcement (user cannot interrupt)
yield TextToSpeechEvent(
  text="IMPORTANT: This message cannot be interrupted.",
  voice="nova",
  interruptible=False,
  instructions="serious and authoritative tone"
)

# High-performance streaming for real-time conversation
yield TextToSpeechEvent(
  text="How can I help you today?",
  voice="alloy",  # streaming is listed as supported by OpenAI and Azure voices
  stream=True,
  cache=False  # Disable cache for dynamic content
)

# Azure Neural Voice with speed control (the Chinese text means "Hello, I'm Xiaoxiao!")
yield TextToSpeechEvent(
  text="你好，我是小晓！",
  voice="xiaoxiao",
  speed=1.0,
  stream=True
)

# Customer service scenario with caching for common responses
yield TextToSpeechEvent(
  text="Thank you for calling. How may I assist you?",
  voice="nova",
  instructions="warm and welcoming tone",
  speed=1.0,
  cache=True,  # Cache common greetings
  interruptible=True
)

# Performance-optimized for repeated content
yield TextToSpeechEvent(
  text="Please hold while I transfer your call.",
  voice="alloy",
  cache=True,  # Cache hold messages
  stream=False,  # Pre-generate common messages
  interruptible=False  # Complete message before transfer
)

# Override the default model for a provider
yield TextToSpeechEvent(
  text="This uses a specific model.",
  voice="calm woman",
  model="speech-2.6-turbo"  # Use speech-2.6-turbo instead of default speech-2.8-turbo
)

📞

TransferSessionEvent

Transfers the session on any channel. On phone channels, the call is transferred to another phone number. On web/API channels, the session is redirected to another agent or environment. Optionally plays a closing message before transferring.

🗃

Data Fields

phone_number

Phone number to transfer to (phone channel only). Must include country code. Example: '+15555555555'.
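
A number can be sanity-checked before the transfer is attempted. The sketch below assumes the expected format is E.164 (a plus sign, a country code, and at most 15 digits in total), which matches the example above but is not confirmed by this reference. `looks_like_e164` is a hypothetical helper, not an SDK function.

```python
import re

# "+" followed by 2 to 15 digits, the first of which is not zero.
E164_PATTERN = re.compile(r"\+[1-9]\d{1,14}")


def looks_like_e164(phone_number: str) -> bool:
    """Return True if the whole string has the E.164 shape."""
    return E164_PATTERN.fullmatch(phone_number) is not None


print(looks_like_e164("+15555555555"))     # True
print(looks_like_e164("15555555555"))      # False: missing "+" and country code prefix
print(looks_like_e164("+1 555 555 5555"))  # False: contains spaces
```

This checks the shape only; it does not confirm that the number exists or can receive calls.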

agent_id

Target agent ID to redirect to (web/API channels only). The agent must exist and be accessible.

environment

Target environment name for the agent (web/API channels only). Examples: 'production', 'staging', 'development'.

data

Optional extra data to pass with the transfer. Can include context, user information, or transfer reason.

closing_speech

Optional text to speak before transferring. If provided, this message will be played before the transfer occurs.

voice

Voice to use for closing speech. Defaults to the agent's default voice if not specified.

speed

Playback speed for closing speech. Range: 0.25x to 4.0x in 0.25 increments. Default: 1.0 (normal speed).

language

Language code for closing speech. Default: 'en' (English).

Example Usage

# Phone transfer (cold transfer)
yield TransferSessionEvent(phone_number="+15555555555")

# Phone transfer with context data
yield TransferSessionEvent(
    phone_number="+15555551234",
    data={"reason": "technical_support", "priority": "high", "customer_tier": "premium"}
)

# Phone transfer with closing message
yield TransferSessionEvent(
    phone_number="+15555551234",
    closing_speech="Please hold while I transfer you to our specialist.",
    voice="nova",
    speed=1.0
)

# Web/API redirect to different agent
yield TransferSessionEvent(
    agent_id="technical_support_agent",
    environment="production"
)

# Web/API redirect with context preservation
yield TransferSessionEvent(
    agent_id="billing_specialist",
    environment="production",
    data={"conversation_context": "payment_issue", "customer_id": "12345"}
)

# Web/API redirect with closing message
yield TransferSessionEvent(
    agent_id="billing_specialist",
    environment="production",
    closing_speech="Transferring you to our billing department now.",
    voice="alloy",
    language="en"
)

⚙️

STTUpdateSettingsEvent

Dynamically updates Speech-to-Text (STT) configuration settings during a conversation. This allows agents to change language, models, transcription prompts, endpointing sensitivity, and audio processing without restarting the session.

🗃

Data Fields

language

The language code for transcription. Examples: 'en' (English), 'es' (Spanish), 'fr' (French), 'de' (German), 'ja' (Japanese), 'zh' (Chinese), 'multi' (automatic language detection).

model

STT model to switch to. Examples: 'nova-3' (Deepgram), 'gpt-4o-transcribe' (OpenAI), 'whisper-1' (OpenAI), 'gpt-4o-mini-transcribe' (OpenAI), 'cartesia' (Cartesia). Enables cross-provider switching with shadow/standby instances for seamless transitions.

prompt

Context prompt to improve transcription accuracy. Examples: 'This is a technical conversation about software development', 'Medical terminology and patient care discussion', 'Casual conversation with informal language'.

endpointing

Controls when the system decides that the user has finished speaking. Values:
0: Semantic VAD with 'auto' eagerness
1: Semantic VAD with 'low' eagerness
2: Semantic VAD with 'medium' eagerness
3: Semantic VAD with 'high' eagerness
Greater than 3: Server VAD, with the value used as the silence duration in milliseconds
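
Because one integer selects both the VAD mode and its setting, a small decoder can make handler code easier to read. `describe_endpointing` below is a hypothetical helper written for this illustration. It treats negative values as invalid, which is an assumption.

```python
def describe_endpointing(value: int) -> str:
    """Translate an endpointing value into the behavior it selects."""
    semantic_eagerness = {0: "auto", 1: "low", 2: "medium", 3: "high"}
    if value in semantic_eagerness:
        return f"semantic VAD, {semantic_eagerness[value]} eagerness"
    if value > 3:
        return f"server VAD, {value} ms of silence"
    raise ValueError(f"unsupported endpointing value: {value}")


print(describe_endpointing(2))     # semantic VAD, medium eagerness
print(describe_endpointing(2000))  # server VAD, 2000 ms of silence
```

Note that small values such as 2 and 3 select a semantic mode, not a 2 or 3 millisecond silence window.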

noise_reduction

Audio processing type for noise reduction. Options: 'near_field' (close microphone), 'far_field' (distant microphone), 'telephony' (phone call optimization).

prewarm_model

Model to prepare in the background for instant switching. Enables zero-latency model transitions by maintaining standby instances.

Example Usage

# Switch to Spanish language
yield STTUpdateSettingsEvent(language="es")

# Switch STT provider with prewarming
yield STTUpdateSettingsEvent(
  model="gpt-4o-transcribe",
  prewarm_model="nova-3"  # Prepare Deepgram in background
)

# Improve transcription accuracy with context
yield STTUpdateSettingsEvent(
  prompt="This is a technical conversation. Please transcribe technical terms accurately."
)

# Optimize for phone call audio
yield STTUpdateSettingsEvent(
  noise_reduction="telephony",
  endpointing=2000  # 2 second silence for phone calls
)

# Comprehensive configuration update
yield STTUpdateSettingsEvent(
  language="es",
  model="nova-3",
  prompt="Conversación técnica en español. Transcribe términos técnicos con precisión.",  # "Technical conversation in Spanish. Transcribe technical terms accurately."
  endpointing=500,
  noise_reduction="near_field",
  prewarm_model="gpt-4o-transcribe"
)

🎙

StartRecordingEvent

Starts recording the current call session. Useful for quality assurance, training, or compliance purposes. Only available for phone channels.

🗃

Data Fields

status_callback_url

Optional webhook URL to receive recording status updates and completion notifications.

Example Usage

# Start recording without callback
yield StartRecordingEvent()

# Start recording with status webhook
yield StartRecordingEvent(
  status_callback_url="https://your-app.com/webhooks/recording-status"
)

⏹🎙

StopRecordingEvent

Stops the current call recording session. The recording will be finalized and made available for download.

🗃

Data Fields

No data fields

Example Usage

# Stop current recording
yield StopRecordingEvent()