Build a Conversational Voice AI Agent with Avaluma

agent-1 is the standard conversational avatar agent. It wires a complete voice AI pipeline — speech-to-text, large language model, and text-to-speech — directly to an Avaluma avatar using LiveKit Inference. When a user speaks, the audio travels through AssemblyAI for transcription, OpenAI GPT-4.1-mini for a response, and Cartesia Sonic-3 for synthesis, with the final audio rendered by the avatar server into a live video stream.

Pipeline

Microphone → STT → LLM → TTS → Avaluma Avatar → Video stream
             │                         │
       AssemblyAI              Avatar Server
       (universal-           (animates .hvia
        streaming)              avatar file)
               LLM: OpenAI GPT-4.1-mini
               TTS: Cartesia Sonic-3
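
The order of the stages can be sketched as a chain of functions, each consuming the previous stage's output. This is only an illustration of the data flow in the diagram above: every function below is a hypothetical stand-in, and none of them calls AssemblyAI, OpenAI, Cartesia, or the avatar server.

```python
from typing import Any, Callable


def stt(audio: bytes) -> str:
    # Stand-in for transcription: pretend the audio bytes are already text.
    return audio.decode("utf-8")


def llm(transcript: str) -> str:
    # Stand-in for the language model reply.
    return f"You said: {transcript}"


def tts(reply: str) -> bytes:
    # Stand-in for speech synthesis.
    return reply.encode("utf-8")


def avatar(speech: bytes) -> list[str]:
    # Stand-in for the avatar server turning audio into video frames.
    return [f"frame<{len(speech)} bytes of audio>"]


# Same order as the diagram: STT, then LLM, then TTS, then the avatar.
PIPELINE: list[Callable[[Any], Any]] = [stt, llm, tts, avatar]


def run_turn(mic_audio: bytes) -> Any:
    """Pass one user turn through every stage in order."""
    data: Any = mic_audio
    for stage in PIPELINE:
        data = stage(data)
    return data
```

In the real agent this chaining is handled by AgentSession and AvatarSession; you do not write it yourself.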

Setup

Install the plugin

The avaluma-livekit-plugin package provides the AvatarSession class that connects the agent pipeline to your avatar. The pyproject.toml already declares it as a dependency — no extra install step is needed when using Docker:

pyproject.toml

dependencies = [
    "livekit",
    "livekit-agents[silero,turn-detector]~=1.2",
    "livekit-plugins-noise-cancellation~=0.2",
    "python-dotenv",
    "avaluma-livekit-plugin @ git+https://github.com/avaluma-ai/avaluma-livekit-plugin.git"
]

Configure credentials

Copy .env.example to .env.local and fill in your credentials:

cp .env.example .env.local

.env.local

AVALUMA_LICENSE_KEY="your-license-key"
AVATAR_SERVER_URL="https://your-avatar-server.com"  # or https://api.avaluma.ai

LIVEKIT_URL="wss://your-project.livekit.cloud"
LIVEKIT_API_KEY="your-api-key"
LIVEKIT_API_SECRET="your-api-secret"
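
A missing credential typically only shows up once the worker tries to connect, so it can help to check the environment up front. The following is a minimal sketch and not part of the Avaluma or LiveKit tooling: `missing_env_vars` is a hypothetical helper, and the list of names mirrors the .env.local example above. AVATAR_SERVER_URL is left out because agent-1.py falls back to https://api.avaluma.ai when it is unset.

```python
import os
from collections.abc import Mapping

# Names taken from the .env.local example; adjust if your setup differs.
REQUIRED_VARS = (
    "AVALUMA_LICENSE_KEY",
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
)


def missing_env_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variables that are unset or blank (hypothetical helper)."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

Calling `missing_env_vars()` after `load_dotenv(".env.local")` and raising an error if the list is non-empty would make a misconfigured container fail early with a readable message.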

Set your avatar ID

Open agents/1-agent-with-livekit-inference/agent-1.py and set avatar_id to your .hvia filename without the extension:

agent-1.py

avatar_id = "your-avatar-id"  # matches your-avatar-id.hvia
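
If you prefer to derive the ID from the file name rather than typing it by hand, a small helper along these lines would do it. This is a sketch: `avatar_id_from_file` is a hypothetical name, and it only encodes the rule stated above (file name minus the .hvia extension).

```python
from pathlib import PurePath


def avatar_id_from_file(filename: str) -> str:
    """Strip the .hvia extension to get the avatar ID (hypothetical helper)."""
    path = PurePath(filename)
    if path.suffix != ".hvia":
        raise ValueError(f"expected a .hvia file, got {filename!r}")
    # stem drops only the final suffix, so other dots in the name are preserved.
    return path.stem
```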

Start the agent

Launch livekit-agent-1 with Docker Compose:

docker compose up livekit-agent-1 -d

The container builds from the project root, loads .env.local, and mounts agent-1.py into the container at /app/src/agent.py.

Full Agent Code

This is the complete source of agent-1.py:

agent-1.py

import os

from avaluma_livekit_plugin import AvatarSession
from dotenv import load_dotenv
from livekit.agents import (
    Agent,
    AgentSession,
    JobContext,
    JobProcess,
    RoomInputOptions,
    WorkerOptions,
    cli,
    inference,
)
from livekit.plugins import noise_cancellation, silero
from livekit.plugins.turn_detector.multilingual import MultilingualModel

load_dotenv(".env.local")
agent_name = "agent-1"
avatar_id = "260218-Avaluma_Avatar_Kadda_v5"
license_key = os.getenv("AVALUMA_LICENSE_KEY", "")
avatar_server_url = os.getenv("AVATAR_SERVER_URL", "https://api.avaluma.ai")


class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="""You are a helpful virtual AI assistant. The user is interacting with you via voice, even if you perceive the conversation as text.
            You eagerly assist users with their questions by providing information from your extensive knowledge.
            Your responses are concise, to the point, and without any complex formatting or punctuation including emojis, asterisks, or other symbols.
            You are curious, friendly, and have a sense of humor.""",
        )


async def entrypoint(ctx: JobContext):
    ctx.log_context_fields = {
        "room": ctx.room.name,
    }

    session = AgentSession(
        stt=inference.STT(model="assemblyai/universal-streaming", language="en"),
        llm=inference.LLM(model="openai/gpt-4.1-mini"),
        tts=inference.TTS(
            model="cartesia/sonic-3", voice="9626c31c-bec5-4cca-baa8-f8ba9e84c8bc"
        ),
        turn_detection=MultilingualModel(),
        vad=ctx.proc.userdata["vad"],
        preemptive_generation=True,
    )

    # os.getenv() above defaults to "", so test for an empty value rather than None.
    if not license_key:
        raise ValueError("AVALUMA_LICENSE_KEY is not set")

    avatar = AvatarSession(
        license_key=license_key,  # Your License Key
        avatar_id=avatar_id,  # Avatar identifier (Name of .hvia file)
        avatar_server_url=avatar_server_url,
    )
    # Start the avatar and wait for it to join
    await avatar.start(agent_session=session, room=ctx.room)

    await session.start(
        agent=Assistant(),
        room=ctx.room,
        room_input_options=RoomInputOptions(
            noise_cancellation=noise_cancellation.BVC(),
        ),
    )
    await ctx.connect()


def prewarm(proc: JobProcess):
    proc.userdata["vad"] = silero.VAD.load()


if __name__ == "__main__":
    cli.run_app(
        WorkerOptions(
            entrypoint,
            prewarm_fnc=prewarm,
            agent_name=agent_name,
        )
    )

Key Components Explained

AvatarSession

AvatarSession is the core of the Avaluma integration. You instantiate it with your license key, the avatar ID (matching your .hvia filename), and the avatar server URL:

avatar = AvatarSession(
    license_key=license_key,
    avatar_id=avatar_id,
    avatar_server_url=avatar_server_url,
)
await avatar.start(agent_session=session, room=ctx.room)

Awaiting avatar.start() registers the avatar as a participant in the LiveKit room and connects it to the AgentSession, so that TTS audio frames are forwarded to the avatar server for rendering. The call does not return until the avatar participant has joined the room.
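
Because the start call waits for the avatar to join, an unreachable avatar server could leave the job waiting. One way to bound that wait is to wrap the call in `asyncio.wait_for`. The sketch below is an assumption-laden illustration: `start_with_timeout` and the 30-second default are hypothetical, and whether AvatarSession applies its own timeout is not covered here.

```python
import asyncio
from collections.abc import Awaitable


async def start_with_timeout(start_call: Awaitable[None], timeout_s: float = 30.0) -> None:
    """Await start_call, raising a descriptive error if it takes too long."""
    try:
        await asyncio.wait_for(start_call, timeout=timeout_s)
    except asyncio.TimeoutError:
        # Replace the bare timeout with a message that names the likely cause.
        raise RuntimeError(
            f"avatar did not join the room within {timeout_s} seconds"
        ) from None
```

Inside the entrypoint this would be used as `await start_with_timeout(avatar.start(agent_session=session, room=ctx.room))`.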

AgentSession with LiveKit Inference

The AgentSession configures the full voice pipeline using LiveKit’s managed inference endpoints — no separate API keys are required for the STT, LLM, or TTS models:

| Parameter | Value | Purpose |
| --- | --- | --- |
| `stt` | `assemblyai/universal-streaming` | Real-time speech-to-text |
| `llm` | `openai/gpt-4.1-mini` | Language model for responses |
| `tts` | `cartesia/sonic-3` | Text-to-speech synthesis |
| `turn_detection` | `MultilingualModel()` | Detects end-of-turn across languages |
| `vad` | `silero.VAD` | Voice activity detection (prewarmed) |
| `preemptive_generation` | `True` | Starts LLM generation before STT finalizes for lower latency |
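
If you want to swap models without editing the script, the identifiers could be read from the environment with the values above as defaults. This is a sketch only: `STT_MODEL`, `LLM_MODEL`, and `TTS_MODEL` are hypothetical variable names that agent-1.py does not read, and `pipeline_models` is a hypothetical helper.

```python
import os
from collections.abc import Mapping

# Defaults are the model identifiers used by agent-1.
DEFAULT_MODELS = {
    "stt": "assemblyai/universal-streaming",
    "llm": "openai/gpt-4.1-mini",
    "tts": "cartesia/sonic-3",
}


def pipeline_models(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Resolve model identifiers, letting STT_MODEL, LLM_MODEL, or TTS_MODEL override a default."""
    return {
        # An unset or empty variable falls back to the default for that stage.
        stage: env.get(f"{stage.upper()}_MODEL") or default
        for stage, default in DEFAULT_MODELS.items()
    }
```

The result would then feed the existing constructors, for example `inference.LLM(model=pipeline_models()["llm"])`.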

Noise Cancellation

Background noise suppression is applied at the room input level using noise_cancellation.BVC():

room_input_options=RoomInputOptions(
    noise_cancellation=noise_cancellation.BVC(),
),

BVC (Background Voice Cancellation) removes background voices as well as ambient noise from the microphone feed before the audio reaches the STT model, which improves transcription accuracy in noisy environments.

Prewarm Function

The prewarm function loads the Silero VAD model into worker process memory before the first job arrives, so the model-loading cost is not paid at the start of a conversation:

def prewarm(proc: JobProcess):
    proc.userdata["vad"] = silero.VAD.load()
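
The pattern itself is independent of LiveKit: do the expensive load while the process is idle, stash the result, and have the job read it back. The sketch below simulates that with stand-ins only. `FakeProcess` and `load_expensive_model` are hypothetical and take the place of JobProcess and silero.VAD.load().

```python
from typing import Any

events: list[str] = []


class FakeProcess:
    """Stand-in for a worker process: just a userdata dict."""

    def __init__(self) -> None:
        self.userdata: dict[str, Any] = {}


def load_expensive_model() -> str:
    # Stand-in for silero.VAD.load(); records when the load happens.
    events.append("load model")
    return "vad-model"


def prewarm(proc: FakeProcess) -> None:
    # Runs while the process is idle, before a job is assigned.
    proc.userdata["vad"] = load_expensive_model()


def entrypoint(proc: FakeProcess) -> str:
    # The job only reads the stored model; it does not load anything itself.
    events.append("job started")
    return proc.userdata["vad"]


proc = FakeProcess()
prewarm(proc)
vad = entrypoint(proc)
```

After running this, `events` shows the load happening before the job starts, which is the ordering the real prewarm hook is meant to provide.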

Adding a New Agent

Follow these steps to create an additional agent alongside agent-1:

Create a new agent directory

Add a directory under agents/ and place your agent script inside it:

agents/
└── 3-my-custom-agent/
    └── agent-3.py

Set a unique agent name

Inside your new script, set agent_name to a value that is unique within your LiveKit project:

agent-3.py

agent_name = "agent-3"

Add a service to docker-compose.yaml

Mount your script into the container and set AGENT_NAME to match agent_name in your script:

docker-compose.yaml

livekit-agent-3:
  build: .
  restart: unless-stopped
  env_file:
    - .env.local
  environment:
    - AGENT_NAME=agent-3
  volumes:
    - livekit_plugin_cache:/root/.cache
    - ./agents/3-my-custom-agent/agent-3.py:/app/src/agent.py

Start the new agent

docker compose up livekit-agent-3 -d

Each agent service must have a unique AGENT_NAME. Deploying multiple agents with the same name on the same LiveKit project causes routing conflicts — both workers will compete for the same jobs.
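
A quick check for accidental duplicates could look like the sketch below. It is an illustration under assumptions: `duplicate_agent_names` is a hypothetical helper, it expects the `services` section of docker-compose.yaml already parsed into a dict (for example by a YAML library), and it only understands the list form of `environment` shown above (`- AGENT_NAME=agent-3`).

```python
from collections import defaultdict


def duplicate_agent_names(services: dict) -> dict[str, list[str]]:
    """Map each AGENT_NAME used by more than one service to those services."""
    by_name: dict[str, list[str]] = defaultdict(list)
    for service, config in services.items():
        # Services without an environment section are simply skipped.
        for entry in config.get("environment", []):
            key, _, value = entry.partition("=")
            if key == "AGENT_NAME":
                by_name[value].append(service)
    return {name: svcs for name, svcs in by_name.items() if len(svcs) > 1}
```

An empty result means every service that sets AGENT_NAME uses a distinct value.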
