VoiceRun Developer Guide

This guide explains how to create agent functions using the VoiceRun framework. Agent functions are the core building blocks that define how your AI agent behaves and responds to various events.

Overview

VoiceRun is a framework for building conversational AI agents. It provides a simple, event-driven architecture that makes it easy to create sophisticated conversational experiences. The framework handles the complexity of speech-to-text, text-to-speech, and conversation management, allowing you to focus on defining your agent's behavior through Python functions.

Basic Structure

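Every agent function follows this pattern: an async generator (here named handler) that receives each incoming event together with the session context and yields events, such as speech, in response: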

from primfunctions.events import Event, StartEvent, TextEvent, StopEvent, TextToSpeechEvent, TimeoutEvent
from primfunctions.context import Context

async def handler(event: Event, context: Context):
    if isinstance(event, StartEvent):
        # Handle session start
        yield TextToSpeechEvent(text="Hello!", voice="nova")

    if isinstance(event, TextEvent):
        # Handle text input
        user_message = event.data.get("text", "")
        yield TextToSpeechEvent(text=f"You said: {user_message}", voice="nova")

    if isinstance(event, TimeoutEvent):
        # Handle timeout
        yield TextToSpeechEvent(text="Are you still there?", voice="nova")

    if isinstance(event, StopEvent):
        # Handle session end
        yield TextToSpeechEvent(text="Goodbye!", voice="nova")
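Because a handler is an ordinary async generator, you can drive it outside the VoiceRun runtime to check what it yields. The sketch below is a hypothetical local harness: StartEvent, TextEvent and TextToSpeechEvent are minimal stand-in dataclasses (not the real primfunctions classes) that mirror only the attributes the example handler touches.

```python
import asyncio
from dataclasses import dataclass, field


# Hypothetical stand-ins for primfunctions event classes, so this runs
# without the VoiceRun runtime installed.
@dataclass
class StartEvent:
    data: dict = field(default_factory=dict)


@dataclass
class TextEvent:
    data: dict = field(default_factory=dict)


@dataclass
class TextToSpeechEvent:
    text: str
    voice: str


async def handler(event, context):
    if isinstance(event, StartEvent):
        yield TextToSpeechEvent(text="Hello!", voice="nova")
    if isinstance(event, TextEvent):
        user_message = event.data.get("text", "")
        yield TextToSpeechEvent(text=f"You said: {user_message}", voice="nova")


async def collect(event, context=None):
    """Run the handler for one event and gather everything it yields."""
    return [out async for out in handler(event, context)]


if __name__ == "__main__":
    print(asyncio.run(collect(TextEvent(data={"source": "text", "text": "hi"}))))
```

With the real library you would import the classes from primfunctions.events instead of defining stand-ins; the collect pattern stays the same.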

Events

Input Events

Event Type      Description                          Data Format
StartEvent      Session begins                       { }
TextEvent       User sends text                      { "source": "text/speech", "text": "message" }
TimeoutEvent    No user input within the timeout     { "count": 1, "ms_since_input": 1000 }
StopEvent       Session ends                         { }
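The TimeoutEvent payload lends itself to escalating reprompts. The following is a minimal sketch, assuming that count increases with each consecutive timeout; the REPROMPTS wording and the timeout_reply helper are hypothetical, not part of VoiceRun.

```python
from typing import Optional

# Hypothetical reprompts, escalating with each consecutive timeout.
REPROMPTS = [
    "Are you still there?",
    "I haven't heard anything for a while. Can you hear me?",
    "I'll end the call soon if I don't hear from you.",
]


def timeout_reply(data: dict) -> Optional[str]:
    """Pick a reprompt from a TimeoutEvent payload such as
    {"count": 1, "ms_since_input": 1000}; None means stop reprompting."""
    count = int(data.get("count", 1))
    if 1 <= count <= len(REPROMPTS):
        return REPROMPTS[count - 1]
    return None
```

Inside the handler's TimeoutEvent branch you would call timeout_reply(event.data) and yield a TextToSpeechEvent only when it returns text.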

Output Events

from primfunctions.events import Event, StartEvent, TextToSpeechEvent, AudioEvent, SilenceEvent
from primfunctions.context import Context

async def handler(event: Event, context: Context):
    if isinstance(event, StartEvent):
        # Speak text with a specific voice
        yield TextToSpeechEvent(
            text="Hello, how can I help you?",
            voice="nova",
            cache=True,  # optional, defaults to True
            interruptable=True  # optional, defaults to True
        )

        # Play an audio file
        yield AudioEvent(path="/path/to/audio.mp3")

        # Pause for 2 seconds (duration is in milliseconds)
        yield SilenceEvent(duration=2000)

Context

The context object provides access to session state and utilities:

# Access session variables
user_name = context.get_data("user_name", "Guest")

# Set session variables
context.set_data("user_name", "John")

# Access environment/organization variables
api_key = context.variables.get("OPENAI_API_KEY")
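For unit-testing handler logic locally, a dictionary-backed stand-in with the same get_data/set_data/variables surface is often enough. FakeContext and greeting below are hypothetical helpers, not part of primfunctions; they mirror only the three calls shown above.

```python
from typing import Any, Dict, Optional


class FakeContext:
    """Hypothetical in-memory stand-in for Context, for local tests only."""

    def __init__(self, variables: Optional[Dict[str, str]] = None):
        self._data: Dict[str, Any] = {}
        self.variables: Dict[str, str] = dict(variables or {})

    def get_data(self, key: str, default: Any = None) -> Any:
        return self._data.get(key, default)

    def set_data(self, key: str, value: Any) -> None:
        self._data[key] = value


def greeting(context) -> str:
    # Same calls as the snippet above: read with a default, then personalize.
    return f"Welcome back, {context.get_data('user_name', 'Guest')}!"
```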

Tests

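Tests let you run A/B experiments to optimize your agent's performance. context.add_test registers named variants with their weights, and context.get_test returns the variant assigned to the current session so you can branch on it: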

async def handler(event: Event, context: Context):
    if isinstance(event, StartEvent):
        # Configure an A/B/C test for greeting style
        context.add_test("greeting_variant", {
            "formal": 0.33,
            "casual": 0.33,
            "friendly": 0.34
        }, stop={
            "max_iterations": 500,
            "max_confidence": 95,
            "target_outcome": "user_satisfied",
            "default": "friendly"
        })

        variant = context.get_test("greeting_variant")

        if variant == "formal":
            yield TextToSpeechEvent(text="Good day! How may I assist you?", voice="nova")
        elif variant == "casual":
            yield TextToSpeechEvent(text="Hey there! What's up?", voice="nova")
        else:  # friendly
            yield TextToSpeechEvent(text="Hi! I'm here to help!", voice="nova")
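To see how weights such as 0.33/0.33/0.34 can translate into an assignment, here is an illustrative sketch of cumulative-weight selection with sticky per-session hashing. This is not VoiceRun's assignment algorithm (the guide does not describe it); variant_for, assign_variant and the session id strings are hypothetical.

```python
import hashlib
from typing import Dict


def variant_for(u: float, weights: Dict[str, float]) -> str:
    """Map a number u in [0, 1) onto a variant using cumulative weights."""
    if not weights:
        raise ValueError("at least one variant is required")
    if not 0.0 <= u < 1.0:
        raise ValueError("u must be in [0, 1)")
    threshold = u * sum(weights.values())
    cumulative = 0.0
    for name, weight in weights.items():
        cumulative += weight
        if threshold < cumulative:
            return name
    # Floating-point rounding can leave threshold just past the final sum.
    return list(weights)[-1]


def assign_variant(session_id: str, weights: Dict[str, float]) -> str:
    """Sticky assignment: the same session id always maps to the same variant."""
    digest = hashlib.sha256(session_id.encode("utf-8")).hexdigest()
    # 13 hex digits = 52 bits, so the division is exact and always below 1.0.
    u = int(digest[:13], 16) / 16 ** 13
    return variant_for(u, weights)


GREETING_WEIGHTS = {"formal": 0.33, "casual": 0.33, "friendly": 0.34}
```

Hashing the session id rather than drawing a fresh random number keeps a caller on the same variant if the assignment is recomputed mid-session.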

Outcomes

Outcomes let you track and optimize for key metrics in your tests, such as conversion rates or user satisfaction.

async def handler(event: Event, context: Context):
    if isinstance(event, TextEvent):
        user_message = event.data.get("text", "").lower()
        # Increment conversion rate if user expresses purchase intent
        if "buy" in user_message or "purchase" in user_message:
            current_rate = context.get_outcome("conversion_rate", 0.0)
            context.set_outcome("conversion_rate", current_rate + 0.1)

Advanced Features

Custom Events

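from primfunctions.events import Event, CustomEvent, TextToSpeechEvent
from primfunctions.context import Context

# Create a custom event with a name and a data payload
custom_event = CustomEvent("payment_processed", {"amount": 100.00})

# Handle custom events inside your handler
async def handler(event: Event, context: Context):
    if isinstance(event, CustomEvent):
        if event.name == "payment_processed":
            payment_amt = event.data["amount"]
            yield TextToSpeechEvent(text=f"Payment of {payment_amt:.2f} processed", voice="nova")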
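When an agent handles several custom events, a lookup table keyed on event.name keeps the handler flat. The sketch below uses hypothetical stand-in dataclasses instead of primfunctions, and "payment_failed" is an invented second event name used only for illustration.

```python
import asyncio
from dataclasses import dataclass, field


# Hypothetical stand-ins so the sketch runs without primfunctions installed.
@dataclass
class CustomEvent:
    name: str
    data: dict = field(default_factory=dict)


@dataclass
class TextToSpeechEvent:
    text: str
    voice: str = "nova"


async def on_payment_processed(event):
    yield TextToSpeechEvent(text=f"Payment of {event.data['amount']:.2f} processed")


async def on_payment_failed(event):  # hypothetical second event name
    yield TextToSpeechEvent(text="Sorry, that payment didn't go through.")


# One entry per custom event name; unknown names are ignored.
CUSTOM_HANDLERS = {
    "payment_processed": on_payment_processed,
    "payment_failed": on_payment_failed,
}


async def handle_custom(event):
    sub_handler = CUSTOM_HANDLERS.get(event.name)
    if sub_handler is None:
        return
    async for out in sub_handler(event):
        yield out


async def collect(event):
    return [out async for out in handle_custom(event)]
```

In a real handler, the CustomEvent branch would simply re-yield from handle_custom(event).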

Stop Conditions

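The stop dictionary passed to context.add_test controls when a test concludes. The keys used here cap the test at a number of iterations (max_iterations), set a confidence threshold in percent (max_confidence), name the outcome the variants are judged on (target_outcome), and name a fallback variant (default).

# Configure a test with stop conditions for confidence and iterations
async def handler(event: Event, context: Context):
    if isinstance(event, StartEvent):
        context.add_test(
            "button_color",
            {
                "red": 0.5,
                "blue": 0.5
            },
            stop={
                "max_iterations": 1000,
                "max_confidence": 95,
                "target_outcome": "conversion_rate",
                "default": "blue"
            }
        )
        variant = context.get_test("button_color")
        if variant == "red":
            yield TextToSpeechEvent(text="The button is red.", voice="nova")
        else:
            yield TextToSpeechEvent(text="The button is blue.", voice="nova")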
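A stop configuration is easy to get subtly wrong (a default that is not one of the variants, a confidence given as 0.95 instead of 95). The sketch below validates the dictionary and applies one plausible reading of the keys: stop when either limit is reached. StopConfig and should_stop are hypothetical helpers; the guide does not specify VoiceRun's exact stopping rule or how confidence is computed.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class StopConfig:
    max_iterations: int
    max_confidence: float  # percent, e.g. 95
    target_outcome: str
    default: str

    @classmethod
    def from_dict(cls, stop: dict, variants: Iterable[str]) -> "StopConfig":
        cfg = cls(**stop)  # raises TypeError on missing or unexpected keys
        if cfg.max_iterations <= 0:
            raise ValueError("max_iterations must be positive")
        if not 0 < cfg.max_confidence <= 100:
            raise ValueError("max_confidence is a percentage in (0, 100]")
        if cfg.default not in set(variants):
            raise ValueError(f"default {cfg.default!r} is not one of the variants")
        return cfg


def should_stop(cfg: StopConfig, iterations: int, confidence: float) -> bool:
    """Assumed rule: stop once the iteration cap or the confidence bar is hit."""
    return iterations >= cfg.max_iterations or confidence >= cfg.max_confidence


BUTTON_STOP = {
    "max_iterations": 1000,
    "max_confidence": 95,
    "target_outcome": "conversion_rate",
    "default": "blue",
}
```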

Background Tasks

Background tasks allow you to perform time-consuming operations without blocking the main conversation flow. This is essential for creating responsive agents that can handle complex workflows while maintaining natural conversation.

Creating Background Tasks

Background tasks are async generator functions that yield events. They run independently of the main conversation and can perform operations like API calls, database queries, or complex calculations.

Use context.create_task() to launch background tasks. This immediately returns control to the main conversation while the task runs asynchronously.
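The non-blocking behaviour resembles plain asyncio.create_task draining an async generator. The sketch below is an analogy in standard-library asyncio, not VoiceRun's implementation of context.create_task; slow_job and drain are hypothetical, and a plain dict stands in for the shared context state.

```python
import asyncio


async def slow_job(state: dict):
    """Hypothetical background job written as an async generator."""
    yield "started"
    await asyncio.sleep(0.05)  # stands in for an API call or database query
    state["task_completed"] = True
    yield "done"


async def drain(agen, log: list):
    """Consume every event the background generator yields."""
    async for item in agen:
        log.append(item)


async def main():
    state = {"task_completed": False}
    log = []
    task = asyncio.create_task(drain(slow_job(state), log))
    await asyncio.sleep(0)  # let the task start; it now waits on its own sleep
    # The "conversation" is free to continue here while the job is pending.
    snapshot = (state["task_completed"], list(log))
    await task
    return snapshot, state, log
```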

Background tasks can share state with the main conversation using context.get_data() and context.set_data():

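import asyncio
import random
import time

# LogEvent is assumed to be exported from primfunctions.events like the other events
from primfunctions.events import Event, StartEvent, TextEvent, TextToSpeechEvent, LogEvent
from primfunctions.context import Context

# Background task: an async generator launched with context.create_task()
async def background_task(context: Context):
    yield LogEvent("Processing background task...")

    # Set initial state
    context.set_data("task_completed", False)

    # Simulate work with a random delay of up to 10 seconds
    await asyncio.sleep(random.random() * 10)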

    # Update state
    context.set_data("task_completed", True)
    context.set_data("completion_time", time.time())

    yield LogEvent("Background task done")

async def handler(event: Event, context: Context):
    if isinstance(event, StartEvent):
        yield TextToSpeechEvent(
            text="Hello! I'll start processing your data in the background.",
            voice="brooke"
        )

        context.create_task(background_task(context))

    if isinstance(event, TextEvent):
        user_message = event.data.get("text", "").lower()

        if context.get_data("task_completed", False):
            completion_seconds_ago = int(time.time() - context.get_data("completion_time", 0))

            yield TextToSpeechEvent(
                text=f"The data is done processing. Completion was {completion_seconds_ago} seconds ago.",
                voice="brooke"
            )

            yield TextToSpeechEvent(
                text="Starting new task...",
                voice="brooke"
            )

            context.create_task(background_task(context))
        else:
            yield TextToSpeechEvent(
                text="The data is still processing.",
                voice="brooke"
            )

Background Task Monitoring

Background tasks appear in the Agent Debugger interface with special visual indicators:

  • Orange background and border to distinguish them from regular events
  • Task name display for easy identification

When to Use Background Tasks

Use background tasks for operations that might take more than a few seconds:

  • API calls to external services
  • Database queries and updates
  • File processing and data analysis
  • Complex calculations or model inference
  • Any operation that could delay the conversation

💡 Pro Tip: Background Task Best Practices

  • Always give each task a meaningful background_task_name so it is easy to identify in the Agent Debugger
  • Log progress updates to keep users informed
  • Use context for state management between tasks
  • Keep background tasks focused on a single responsibility
  • Handle errors gracefully in background tasks
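One way to handle errors gracefully is to wrap a background generator so a failure is recorded in shared state instead of escaping. This is a sketch: guarded, flaky_lookup and healthy_lookup are hypothetical, a dict stands in for context state, and in a real agent you would yield LogEvent objects rather than strings.

```python
import asyncio
from typing import Any, AsyncIterator


async def guarded(agen: AsyncIterator[Any], state: dict, name: str):
    """Re-yield a background task's events, recording any failure in state."""
    try:
        async for item in agen:
            yield item
    except Exception as exc:  # GeneratorExit/CancelledError (3.8+) pass through
        state[f"{name}_error"] = repr(exc)
        yield f"{name} failed: {exc}"
    else:
        state[f"{name}_error"] = None


async def flaky_lookup():  # hypothetical task that fails partway through
    yield "fetching"
    await asyncio.sleep(0)
    raise RuntimeError("upstream timed out")


async def healthy_lookup():  # hypothetical task that succeeds
    yield "fetching"
    yield "done"


async def run(agen, name):
    state = {}
    events = [item async for item in guarded(agen, state, name)]
    return state, events
```

The conversation handler can then check the stored error key, much like task_completed above, and tell the user something went wrong.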

Outbound Calling

You can start a phone session (outbound call) by creating an API key in the Prim Voices dashboard and then calling the session start endpoint for your agent.

Create an API key

Issue an API key from Prim Voices → Profile → API Keys.

Start an outbound call

curl 'https://api.primvoices.com/v1/agents/<AGENT_ID>/sessions/start' \
  -X 'POST' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer <API_KEY>' \
  --data-raw '{"inputType":"phone","inputParameters":{"phoneNumber":"<NUMBER_TO_DIAL>"},"environment":"<ENVIRONMENT_NAME>","parameters":{}}'
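The same request can be built from Python with only the standard library. This sketch mirrors the curl command's URL, headers and JSON body; build_start_call_request is a hypothetical helper, and the agent id, key and phone number in the usage are placeholders. The response format is not described here, so the sketch stops at building the request.

```python
import json
import urllib.request

API_BASE = "https://api.primvoices.com/v1"


def build_start_call_request(agent_id, api_key, phone_number, environment, parameters=None):
    """Build (but do not send) the POST request that starts an outbound call."""
    body = {
        "inputType": "phone",
        "inputParameters": {"phoneNumber": phone_number},
        "environment": environment,
        "parameters": parameters or {},
    }
    return urllib.request.Request(
        f"{API_BASE}/agents/{agent_id}/sessions/start",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# To place the call:
# with urllib.request.urlopen(build_start_call_request(...)) as resp:
#     print(resp.status, resp.read().decode())
```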

Parameters

  • inputParameters.phoneNumber is the phone number to dial.
  • environment is the environment name to run the agent in.
  • parameters is a dictionary of values passed into the session context when the session starts. It is accessible from the handler using context.get_data("key").

This guide provides the foundation for creating sophisticated agent functions. The modular design allows for complex behaviors while maintaining simplicity and testability.

Ready to see examples in action? Check out our Examples section for complete, working agent implementations.