Event Reference & Usage

VoiceRun uses an event-driven architecture. This reference describes every available event, its data fields, and how to use it effectively.

Event Lifecycle

Events flow through the system in the following sequence:

  1. The event handler runs once per "Input Event".
  2. The event handler can emit zero or more "Output Events".
  3. "Output Events" are processed in the order in which they are emitted.
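
The sequence above can be sketched in plain Python. The classes below are stand-ins defined only for this illustration (they are not imported from the VoiceRun SDK), and `dispatch` is a hypothetical driver that mimics how a runtime might call the handler once per input event and process its output events in emission order.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """Stand-in base class; the real SDK classes are not shown in this guide."""
    data: dict = field(default_factory=dict)


class StartEvent(Event): ...
class TextEvent(Event): ...
class LogEvent(Event): ...
class TextToSpeechEvent(Event): ...


def handler(event):
    """Runs once per input event and yields zero or more output events."""
    if isinstance(event, StartEvent):
        yield LogEvent({"message": "session started"})
        yield TextToSpeechEvent({"text": "Hello!", "voice": "lyric"})
    elif isinstance(event, TextEvent):
        reply = f"You said: {event.data['text']}"
        yield TextToSpeechEvent({"text": reply, "voice": "lyric"})
    # Any other input event yields nothing, which is also valid.


def dispatch(input_events):
    """Hypothetical driver: record outputs in the order they were emitted."""
    processed = []
    for event in input_events:
        for output in handler(event):
            processed.append(type(output).__name__)
    return processed
```

Calling `dispatch([StartEvent(), TextEvent({"text": "hi", "source": "text"})])` records the log event first, then the two speech events, matching the order in which the handler yielded them.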

Input Events

StartEvent

Event emitted when a new voice agent session has started.

Data Fields

No data fields

Example Usage

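if isinstance(event, StartEvent):
  # Handle start event, for example by greeting the user
  yield TextToSpeechEvent(text="Hello! How can I help you today?", voice="lyric")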
  
TextEvent

Event emitted when the user speaks or types text.

Data Fields

text
The text content spoken or typed by the user.
source
The source of the text event (e.g. "speech", "text")

Example Usage

if isinstance(event, TextEvent):
  text = event.data["text"] # What the user said
  source = event.data["source"] # "speech" or "text"
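
As a slightly fuller illustration, the sketch below normalizes the two documented fields before acting on them. `TextEvent` here is a stand-in class written for this example, and `summarize_input` is a hypothetical helper, not part of the SDK.

```python
from dataclasses import dataclass, field


@dataclass
class TextEvent:
    """Stand-in for the SDK class, carrying the documented data fields."""
    data: dict = field(default_factory=dict)


def summarize_input(event):
    """Return a normalized (text, came_from_speech) pair for a TextEvent."""
    text = event.data["text"].strip()
    # "speech" and "text" are the two example sources named above.
    came_from_speech = event.data.get("source") == "speech"
    return text, came_from_speech
```

A handler could use the second value to decide, for instance, whether to confirm a transcription back to the user before acting on it.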
  

TimeoutEvent

Event emitted when the user does not speak for 5 seconds.

Data Fields

No data fields

Example Usage

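if isinstance(event, TimeoutEvent):
  # Handle timeout event, for example by re-prompting the user
  yield TextToSpeechEvent(text="Are you still there?", voice="lyric")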
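
One common pattern is to re-prompt on the first few timeouts and end the session after several in a row. The sketch below shows that bookkeeping only. `TimeoutPolicy`, its `max_timeouts` parameter, and the returned action names are assumptions made for illustration, and the event classes are empty stand-ins.

```python
class TimeoutEvent:
    """Stand-in for the SDK class."""


class TextEvent:
    """Stand-in for the SDK class."""


class TimeoutPolicy:
    """Tracks consecutive timeouts and suggests what the handler should do."""

    def __init__(self, max_timeouts=3):
        self.max_timeouts = max_timeouts
        self.consecutive = 0

    def on_event(self, event):
        """Return "reprompt", "stop", or "continue" for the given input event."""
        if isinstance(event, TimeoutEvent):
            self.consecutive += 1
            if self.consecutive >= self.max_timeouts:
                return "stop"
            return "reprompt"
        # Any other input means the user is active again, so reset the count.
        self.consecutive = 0
        return "continue"
```

In a handler, "reprompt" could map to yielding a TextToSpeechEvent and "stop" to yielding a StopEvent.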
  

Output Events

AudioEvent

Plays audio to the user from a URL.

Data Fields

path
The URL of the audio file to play.

Example Usage

yield AudioEvent(path="https://example.com/audio.mp3")

LogEvent

Logs a message to the system logs.

Data Fields

message
The message to log.

Example Usage

yield LogEvent(message="Processing user request")

SilenceEvent

Plays silence for a specified duration in milliseconds.

Data Fields

duration
The duration of silence in milliseconds.

Example Usage

yield SilenceEvent(duration=2000)

StopEvent

Stops the current voice agent session. This will end the call.

Data Fields

No data fields

Example Usage

yield StopEvent()
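
Because output events are processed in the order they are emitted, a farewell can be yielded just before the stop. The sketch below uses stand-in dataclasses rather than the SDK classes, and `end_session` is a hypothetical helper. It assumes the runtime finishes playing queued speech before acting on the StopEvent, which this guide does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class TextToSpeechEvent:
    """Stand-in with a subset of the documented fields."""
    text: str
    voice: str
    interruptible: bool = True


@dataclass
class StopEvent:
    """Stand-in; the documented event has no data fields."""


def end_session(farewell="Thanks for calling. Goodbye!"):
    """Yield a non-interruptible farewell, then the event that ends the call."""
    yield TextToSpeechEvent(text=farewell, voice="nova", interruptible=False)
    yield StopEvent()
```

Inside a handler this could be used as `yield from end_session()`.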

TextToSpeechEvent

Converts text to speech using the specified voice and plays it to the user. Supports advanced features like voice styling, speed control, caching, and streaming for optimized performance.

Data Fields

text
The text content to be converted to speech.
voice
The voice to use for speech synthesis. You can use any of the following voices:
**OpenAI Voices (9 voices):** alloy, ash, coral, echo, fable, onyx, nova, sage, shimmer
**VoiceRun Voices (6 voices):** breath, flint, juniper, atlas, solstice, lyric
**Google Chirp Voices (30 voices):** zephyr, puck, charon, kore, fenrir, leda, orus, aoede, callirrhoe, autonoe, enceladus, iapetus, umbriel, algieba, despina, erinome, algenib, rasalgethi, laomedeia, achernar, alnilam, schedar, gacrux, pulcherrima, achird, zubenelgenubi, vindemiatrix, sadachbia, sadaltager, sulafar
**Azure Voices (32 voices):** xiaoxiao, xiaoxiao-multilingual, hsiaochen, yunjhe, hanhan, zhiyuan, aria, guy, jenny, tony, sonia, ryan, natasha, william, clara, liam, neerja, prabhat, emily, connor, asilia, chilemba, ezinne, abeo, rosa, james, luna, wayne, leah, luke, imani, elimu
**Cartesia Voices (10 voices):** brooke, david, sophie, luke, chinese_lisa, chen, chinese_commercial_woman, chinese_commercial_man, welcome_lady, lori_australian
📚 **Explore our complete [Voice Gallery](/docs/voice-reference) for detailed voice descriptions and examples.**
instructions
Optional voice styling instructions to control tone, emotion, or accent. Examples: 'enthusiastic with a Taiwanese accent', 'empathetic and professional tone', 'speak with enthusiasm'. **Supported by**: OpenAI.
speed
Optional playback speed modifier ranging from 0.25x to 4.0x in 0.25 increments. Supported values: 0.25 (very slow), 0.5 (slow), 0.75 (slightly slow), 1.0 (normal), 1.25 (slightly fast), 1.5 (fast), 2.0 (very fast), 4.0 (maximum speed). **Supported by**: OpenAI (0.25x-4.0x), Azure (0.5x-3.0x).
cache
Enable TTS caching for improved performance on repeated text. When enabled, identical text+voice combinations are cached for 1 week with provider-specific cache keys. Default: true. **Supported by**: All providers.
stream
Enable streaming TTS for lower latency. When true, audio begins playing as soon as the first chunks are available rather than waiting for complete generation. Default: true. **Supported by**: OpenAI, Azure.
interruptible
Controls whether users can interrupt this TTS during playback. When false, creates a non-interruptible segment that must complete before user input is processed. Default: true. **Supported by**: All providers.

Example Usage

# Basic usage with a VoiceRun voice
yield TextToSpeechEvent(
  text="Hello, I'm speaking to you!",
  voice="lyric"
)

# OpenAI Voice with full feature support
yield TextToSpeechEvent(
  text="Welcome to our service!",
  voice="alloy",
  instructions="enthusiastic and professional",
  speed=1.25,
  stream=True,
  cache=True
)

# Non-interruptible announcement (user cannot interrupt)
yield TextToSpeechEvent(
  text="IMPORTANT: This message cannot be interrupted.",
  voice="nova",
  interruptible=False,
  instructions="serious and authoritative tone"
)

# High-performance streaming for real-time conversation
yield TextToSpeechEvent(
  text="How can I help you today?",
  voice="coral",  # OpenAI voice; streaming is listed as supported by OpenAI and Azure
  stream=True,
  cache=False  # Disable cache for dynamic content
)

# Azure Neural Voice with speed control
yield TextToSpeechEvent(
  text="你好,我是小晓!",  # "Hello, I'm Xiaoxiao!"
  voice="xiaoxiao",
  speed=1.0,
  stream=True
)

# Customer service scenario with caching for common responses
yield TextToSpeechEvent(
  text="Thank you for calling. How may I assist you?",
  voice="nova",
  instructions="warm and welcoming tone",
  speed=1.0,
  cache=True,  # Cache common greetings
  interruptible=True
)

# Performance-optimized for repeated content
yield TextToSpeechEvent(
  text="Please hold while I transfer your call.",
  voice="alloy",
  cache=True,  # Cache hold messages
  stream=False,  # Pre-generate common messages
  interruptible=False  # Complete message before transfer
)
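
The `speed` field has different documented ranges per provider (OpenAI 0.25x-4.0x, Azure 0.5x-3.0x). A small guard like the one below can keep a requested value inside those ranges before it is passed to TextToSpeechEvent. `clamp_speed` and the provider labels are hypothetical names chosen for this sketch, and how the platform itself treats out-of-range values is not covered in this guide.

```python
# Ranges taken from the `speed` field description above.
SPEED_RANGES = {
    "openai": (0.25, 4.0),
    "azure": (0.5, 3.0),
}


def clamp_speed(provider, speed):
    """Clamp speed into the provider's documented range.

    Returns None for providers without documented speed support, so the
    caller can omit the `speed` argument entirely.
    """
    limits = SPEED_RANGES.get(provider.lower())
    if limits is None:
        return None
    low, high = limits
    return max(low, min(high, speed))
```

For example, `clamp_speed("azure", 0.25)` returns 0.5, the lowest speed documented for Azure voices.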

TransferSessionEvent

Transfers the session in a channel-agnostic way. On phone channels, it transfers the call to another phone number. On web/API channels, it redirects the session to another agent or environment.

Data Fields

phone_number
Phone number to transfer to (phone channel only). Must include country code. Example: '+15555555555'.
agent_id
Target agent ID to redirect to (web/API channels only). The agent must exist and be accessible.
environment
Target environment name for the agent (web/API channels only). Examples: 'production', 'staging', 'development'.
data
Optional extra data to pass with the transfer. Can include context, user information, or transfer reason.

Example Usage

# Phone transfer (cold transfer)
yield TransferSessionEvent(phone_number="+15555555555")

# Phone transfer with context data
yield TransferSessionEvent(
  phone_number="+15555551234",
  data={"reason": "technical_support", "priority": "high", "customer_tier": "premium"}
)

# Web/API redirect to different agent
yield TransferSessionEvent(
  agent_id="technical_support_agent",
  environment="production"
)

# Web/API redirect with context preservation
yield TransferSessionEvent(
  agent_id="billing_specialist",
  environment="production",
  data={"conversation_context": "payment_issue", "customer_id": "12345"}
)
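
Since the applicable fields depend on the channel, it can help to build the arguments in one place. The helper below is a sketch under assumptions: `build_transfer_kwargs`, the channel labels "phone" and "web", and the loose E.164-style pattern are all illustrative and not defined by the SDK.

```python
import re

# Assumption: "+" followed by 8 to 15 digits, a loose E.164-style check.
PHONE_PATTERN = re.compile(r"^\+[1-9]\d{7,14}$")


def build_transfer_kwargs(channel, *, phone_number=None, agent_id=None,
                          environment=None, data=None):
    """Return keyword arguments suitable for TransferSessionEvent(**kwargs)."""
    if channel == "phone":
        # Phone transfers need a number that includes the country code.
        if not phone_number or not PHONE_PATTERN.match(phone_number):
            raise ValueError("phone_number must include a country code, e.g. +15555555555")
        kwargs = {"phone_number": phone_number}
    else:
        # Web/API transfers redirect to another agent, optionally in a named environment.
        if not agent_id:
            raise ValueError("agent_id is required for web/API transfers")
        kwargs = {"agent_id": agent_id}
        if environment:
            kwargs["environment"] = environment
    if data:
        kwargs["data"] = data
    return kwargs
```

A handler could then write `yield TransferSessionEvent(**build_transfer_kwargs("phone", phone_number="+15555555555"))`.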

STTUpdateSettingsEvent

Dynamically updates Speech-to-Text (STT) configuration settings during a conversation. This allows agents to change language, models, transcription prompts, endpointing sensitivity, and audio processing without restarting the session.

Data Fields

language
The language code for transcription. Examples: 'en' (English), 'es' (Spanish), 'fr' (French), 'de' (German), 'ja' (Japanese), 'zh' (Chinese), 'multi' (automatic language detection).
model
STT model to switch to. Examples: 'nova-3' (Deepgram), 'gpt-4o-transcribe' (OpenAI), 'whisper-1' (OpenAI), 'gpt-4o-mini-transcribe' (OpenAI), 'cartesia' (Cartesia). Enables cross-provider switching with shadow/standby instances for seamless transitions.
prompt
Context prompt to improve transcription accuracy. Examples: 'This is a technical conversation about software development', 'Medical terminology and patient care discussion', 'Casual conversation with informal language'.
endpointing
Controls when the system considers the user to have finished speaking. Values:
  > 3: Server VAD, with the value used as the silence duration in milliseconds
  3: Semantic VAD, 'high' eagerness
  2: Semantic VAD, 'medium' eagerness
  1: Semantic VAD, 'low' eagerness
  0: Semantic VAD, 'auto' eagerness
noise_reduction
Audio processing type for noise reduction. Options: 'near_field' (close microphone), 'far_field' (distant microphone), 'telephony' (phone call optimization).
prewarm_model
Model to prepare in the background for instant switching. Enables zero-latency model transitions by maintaining standby instances.

Example Usage

# Switch to Spanish language
yield STTUpdateSettingsEvent(language="es")

# Switch STT provider with prewarming
yield STTUpdateSettingsEvent(
  model="gpt-4o-transcribe",
  prewarm_model="nova-3"  # Prepare Deepgram in background
)

# Improve transcription accuracy with context
yield STTUpdateSettingsEvent(
  prompt="This is a technical conversation. Please transcribe technical terms accurately."
)

# Optimize for phone call audio
yield STTUpdateSettingsEvent(
  noise_reduction="telephony",
  endpointing=2000  # 2 second silence for phone calls
)

# Comprehensive configuration update
yield STTUpdateSettingsEvent(
  language="es",
  model="nova-3",
  # Spanish prompt: "Technical conversation in Spanish. Transcribe technical terms accurately."
  prompt="Conversación técnica en español. Transcribe términos técnicos con precisión.",
  endpointing=500,
  noise_reduction="near_field",
  prewarm_model="gpt-4o-transcribe"
)
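
The `endpointing` field packs two modes into one number, which is easy to misread. The function below restates the documented mapping as code. `describe_endpointing` is a hypothetical helper for illustration, and rejecting negative or fractional values is an assumption rather than documented behavior.

```python
def describe_endpointing(value):
    """Map an `endpointing` value to the mode described in the field list."""
    semantic = {0: "auto", 1: "low", 2: "medium", 3: "high"}
    if value < 0:
        raise ValueError("endpointing must be non-negative")
    if value > 3:
        # Values above 3 are read as a silence duration in milliseconds.
        return f"server VAD, {value} ms of silence"
    if value not in semantic:
        raise ValueError("values from 0 to 3 must be whole numbers")
    return f"semantic VAD, {semantic[value]} eagerness"
```

So `endpointing=2000` selects server VAD with a two-second silence window, while `endpointing=2` selects semantic VAD with medium eagerness.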

StartRecordingEvent

Starts recording the current call session. Useful for quality assurance, training, or compliance purposes. Only available for phone channels.

Data Fields

status_callback_url
Optional webhook URL to receive recording status updates and completion notifications.

Example Usage

# Start recording without callback
yield StartRecordingEvent()

# Start recording with status webhook
yield StartRecordingEvent(
  status_callback_url="https://your-app.com/webhooks/recording-status"
)

StopRecordingEvent

Stops the current call recording session. The recording will be finalized and made available for download.

Data Fields

No data fields

Example Usage

# Stop current recording
yield StopRecordingEvent()
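
To record only part of a call, the start and stop events can bracket the events in between. The generator below is a minimal sketch using stand-in dataclasses, not the SDK classes. `recorded` is a hypothetical helper name, and the sketch assumes the session is on a phone channel, since recording is documented as phone-only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StartRecordingEvent:
    """Stand-in with the one documented optional field."""
    status_callback_url: Optional[str] = None


@dataclass
class StopRecordingEvent:
    """Stand-in; the documented event has no data fields."""


def recorded(inner_events, status_callback_url=None):
    """Wrap a sequence of output events in a start/stop recording pair."""
    yield StartRecordingEvent(status_callback_url=status_callback_url)
    yield from inner_events
    yield StopRecordingEvent()
```

Inside a handler, `yield from recorded(events)` would emit the start event, then every event in `events`, then the stop event, in that order.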