agent-1 is the standard conversational avatar agent. It wires a complete voice AI pipeline — speech-to-text, large language model, and text-to-speech — directly to an Avaluma avatar using LiveKit Inference. When a user speaks, the audio travels through AssemblyAI for transcription, OpenAI GPT-4.1-mini for a response, and Cartesia Sonic-3 for synthesis, with the final audio rendered by the avatar server into a live video stream.
Pipeline
Microphone → STT → LLM → TTS → Avaluma Avatar → Video stream
              │                       │
         AssemblyAI             Avatar Server
         (universal-           (animates .hvia
          streaming)            avatar file)

LLM: OpenAI GPT-4.1-mini
TTS: Cartesia Sonic-3
Setup
Install the plugin
The avaluma-livekit-plugin package provides the AvatarSession class that connects the agent pipeline to your avatar. The pyproject.toml already declares it as a dependency, so no extra install step is needed when using Docker:
dependencies = [
    "livekit",
    "livekit-agents[silero,turn-detector]~=1.2",
    "livekit-plugins-noise-cancellation~=0.2",
    "python-dotenv",
    "avaluma-livekit-plugin @ git+https://github.com/avaluma-ai/avaluma-livekit-plugin.git"
]
Configure credentials
Copy .env.example to .env.local and fill in your credentials:
cp .env.example .env.local
AVALUMA_LICENSE_KEY="your-license-key"
AVATAR_SERVER_URL="https://your-avatar-server.com" # or https://api.avaluma.ai
LIVEKIT_URL="wss://your-project.livekit.cloud"
LIVEKIT_API_KEY="your-api-key"
LIVEKIT_API_SECRET="your-api-secret"
Set your avatar ID
Open agents/1-agent-with-livekit-inference/agent-1.py and set avatar_id to your .hvia filename without the extension:
avatar_id = "your-avatar-id"  # matches your-avatar-id.hvia
Start the agent
Launch livekit-agent-1 with Docker Compose:
docker compose up livekit-agent-1 -d
The container builds from the project root, loads .env.local, and mounts agent-1.py into the container at /app/src/agent.py.
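Before starting the container, it can help to confirm that .env.local is complete and that avatar_id matches the avatar file. The following is a minimal sketch, not part of the project: the helper names are hypothetical, and the parser handles only simple KEY="value" lines like the ones shown above (python-dotenv, which the agent itself uses, covers more cases).

```python
from pathlib import Path

# Variable names taken from the .env.example contents shown above.
REQUIRED_VARS = (
    "AVALUMA_LICENSE_KEY",
    "AVATAR_SERVER_URL",
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
)


def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY="value" lines; blank lines and # comments are skipped."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing inline comment, then any surrounding quotes.
        value = value.split(" #", 1)[0].strip().strip('"').strip("'")
        values[key.strip()] = value
    return values


def missing_vars(values: dict[str, str]) -> list[str]:
    """Return the required variables that are absent or empty."""
    return [name for name in REQUIRED_VARS if not values.get(name)]


def avatar_id_from_file(filename: str) -> str:
    """Derive avatar_id from a .hvia filename by dropping the extension."""
    return Path(filename).stem
```

A possible use is `missing_vars(parse_env_file(Path(".env.local").read_text()))`, which returns an empty list when every credential is filled in.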
Full Agent Code
This is the complete source of agent-1.py:
import os

from avaluma_livekit_plugin import AvatarSession
from dotenv import load_dotenv
from livekit.agents import (
    Agent,
    AgentSession,
    JobContext,
    JobProcess,
    RoomInputOptions,
    WorkerOptions,
    cli,
    inference,
)
from livekit.plugins import noise_cancellation, silero
from livekit.plugins.turn_detector.multilingual import MultilingualModel

load_dotenv(".env.local")

agent_name = "agent-1"
avatar_id = "260218-Avaluma_Avatar_Kadda_v5"
license_key = os.getenv("AVALUMA_LICENSE_KEY", "")
avatar_server_url = os.getenv("AVATAR_SERVER_URL", "https://api.avaluma.ai")


class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="""You are a helpful virtual AI assistant. The user is interacting with you via voice, even if you perceive the conversation as text.
            You eagerly assist users with their questions by providing information from your extensive knowledge.
            Your responses are concise, to the point, and without any complex formatting or punctuation including emojis, asterisks, or other symbols.
            You are curious, friendly, and have a sense of humor.""",
        )


async def entrypoint(ctx: JobContext):
    ctx.log_context_fields = {
        "room": ctx.room.name,
    }

    # Voice pipeline: STT, LLM, and TTS all run through LiveKit Inference.
    session = AgentSession(
        stt=inference.STT(model="assemblyai/universal-streaming", language="en"),
        llm=inference.LLM(model="openai/gpt-4.1-mini"),
        tts=inference.TTS(
            model="cartesia/sonic-3", voice="9626c31c-bec5-4cca-baa8-f8ba9e84c8bc"
        ),
        turn_detection=MultilingualModel(),
        vad=ctx.proc.userdata["vad"],
        preemptive_generation=True,
    )

    # os.getenv() above defaults to "", so test for an empty value, not None.
    if not license_key:
        raise ValueError("AVALUMA_LICENSE_KEY is not set")

    avatar = AvatarSession(
        license_key=license_key,  # Your License Key
        avatar_id=avatar_id,  # Avatar identifier (Name of .hvia file)
        avatar_server_url=avatar_server_url,
    )
    # Start the avatar and wait for it to join
    await avatar.start(agent_session=session, room=ctx.room)

    await session.start(
        agent=Assistant(),
        room=ctx.room,
        room_input_options=RoomInputOptions(
            noise_cancellation=noise_cancellation.BVC(),
        ),
    )

    await ctx.connect()


def prewarm(proc: JobProcess):
    proc.userdata["vad"] = silero.VAD.load()


if __name__ == "__main__":
    cli.run_app(
        WorkerOptions(
            entrypoint,
            prewarm_fnc=prewarm,
            agent_name=agent_name,
        )
    )
Key Components Explained
AvatarSession
AvatarSession is the core of the Avaluma integration. You instantiate it with your license key, the avatar ID (matching your .hvia filename), and the avatar server URL:
avatar = AvatarSession(
    license_key=license_key,
    avatar_id=avatar_id,
    avatar_server_url=avatar_server_url,
)
await avatar.start(agent_session=session, room=ctx.room)
Calling await avatar.start() registers the avatar as a participant in the LiveKit room and connects it to the AgentSession so that TTS audio frames are forwarded to the avatar server for rendering. The awaited call does not return until the avatar participant has joined the room. In the entrypoint it runs before session.start().
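Because the start call waits for the avatar to join, an unreachable avatar server could stall the job. Whether the plugin applies its own timeout is not stated here, so the sketch below shows one generic way to bound the wait with asyncio.wait_for. StubAvatarSession is a hypothetical stand-in that only mimics an awaitable start(); it is not the plugin's class, and the timeout values are arbitrary.

```python
import asyncio


class StubAvatarSession:
    """Hypothetical stand-in for AvatarSession; only mimics an awaitable start()."""

    def __init__(self, join_delay: float) -> None:
        self.join_delay = join_delay
        self.joined = False

    async def start(self) -> None:
        # Simulates waiting for the avatar participant to join the room.
        await asyncio.sleep(self.join_delay)
        self.joined = True


async def start_with_timeout(avatar: StubAvatarSession, timeout: float) -> bool:
    """Return True if the avatar joined in time, False if the wait timed out."""
    try:
        await asyncio.wait_for(avatar.start(), timeout=timeout)
    except asyncio.TimeoutError:
        return False
    return True
```

In the real entrypoint the same idea would wrap the existing call, for example `await asyncio.wait_for(avatar.start(agent_session=session, room=ctx.room), timeout=30)`, with the timeout chosen to suit your avatar server.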
AgentSession with LiveKit Inference
The AgentSession configures the full voice pipeline using LiveKit’s managed inference endpoints — no separate API keys are required for the STT, LLM, or TTS models:
| Parameter | Value | Purpose |
|---|---|---|
| stt | assemblyai/universal-streaming | Real-time speech-to-text |
| llm | openai/gpt-4.1-mini | Language model for responses |
| tts | cartesia/sonic-3 | Text-to-speech synthesis |
| turn_detection | MultilingualModel() | Detects end-of-turn across languages |
| vad | silero.VAD | Voice activity detection (prewarmed) |
| preemptive_generation | True | Starts LLM generation before STT finalizes for lower latency |
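If you want to switch models without editing the script, one option is to read the model identifiers from the environment and fall back to the values in the table. This is a sketch under stated assumptions: the AGENT_STT_MODEL, AGENT_LLM_MODEL, and AGENT_TTS_MODEL variable names are hypothetical and are not read by agent-1.py as shipped.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineModels:
    """Model identifiers from the table above, used as defaults."""

    stt: str = "assemblyai/universal-streaming"
    llm: str = "openai/gpt-4.1-mini"
    tts: str = "cartesia/sonic-3"


def models_from_env(env=None) -> PipelineModels:
    """Build the model selection; hypothetical env vars override the defaults."""
    env = os.environ if env is None else env
    defaults = PipelineModels()
    return PipelineModels(
        stt=env.get("AGENT_STT_MODEL", defaults.stt),
        llm=env.get("AGENT_LLM_MODEL", defaults.llm),
        tts=env.get("AGENT_TTS_MODEL", defaults.tts),
    )
```

The resulting values would then be passed to the same constructors the entrypoint already uses, such as `inference.LLM(model=models.llm)`.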
Noise Cancellation
Background noise suppression is applied at the room input level using noise_cancellation.BVC():
room_input_options=RoomInputOptions(
    noise_cancellation=noise_cancellation.BVC(),
),
BVC (Background Voice Cancellation) removes ambient noise and competing background voices from the microphone feed before the audio reaches the STT model, which improves transcription accuracy in noisy environments.
Prewarm Function
The prewarm function loads the Silero VAD model into the worker process's memory before the first job arrives, so that job does not have to wait for the model to load:
def prewarm(proc: JobProcess):
    proc.userdata["vad"] = silero.VAD.load()
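The load-once, reuse-per-job pattern behind prewarm can be illustrated without LiveKit. In this sketch StubProcess and CountingLoader are hypothetical stand-ins for JobProcess and silero.VAD.load(); the counter only exists to show that the expensive load runs once, however many jobs follow.

```python
class StubProcess:
    """Hypothetical stand-in for JobProcess: a per-process userdata dict."""

    def __init__(self) -> None:
        self.userdata: dict[str, object] = {}


class CountingLoader:
    """Stands in for an expensive model load and counts how often it runs."""

    def __init__(self) -> None:
        self.calls = 0

    def load(self) -> object:
        self.calls += 1
        return object()


def prewarm(proc: StubProcess, loader: CountingLoader) -> None:
    # Runs once when the worker process starts, before any job is assigned.
    proc.userdata["vad"] = loader.load()


def handle_job(proc: StubProcess) -> object:
    # Each job reuses the preloaded object instead of loading it again.
    return proc.userdata["vad"]
```

This mirrors how the entrypoint reads `ctx.proc.userdata["vad"]` rather than calling the loader itself.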
Adding a New Agent
Follow these steps to create an additional agent alongside agent-1:
Create a new agent directory
Add a directory under agents/ and place your agent script inside it:
agents/
└── 3-my-custom-agent/
    └── agent-3.py
Set a unique agent name
Inside your new script, set agent_name to a value that is unique within your LiveKit project, for example agent_name = "agent-3".
Add a service to docker-compose.yaml
Mount your script into the container and set AGENT_NAME to match agent_name in your script:
livekit-agent-3:
  build: .
  restart: unless-stopped
  env_file:
    - .env.local
  environment:
    - AGENT_NAME=agent-3
  volumes:
    - livekit_plugin_cache:/root/.cache
    - ./agents/3-my-custom-agent/agent-3.py:/app/src/agent.py
Start the new agent
docker compose up livekit-agent-3 -d
Each agent service must have a unique AGENT_NAME. Deploying multiple agents with the same name on the same LiveKit project causes routing conflicts — both workers will compete for the same jobs.
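A quick way to catch a duplicated name before deploying is to compare the AGENT_NAME of every service. The helper below is a hypothetical sketch: it takes a plain mapping of service name to AGENT_NAME, which you would fill in by hand or extract from docker-compose.yaml with a YAML parser of your choice.

```python
from collections import defaultdict


def find_duplicate_agent_names(services: dict[str, str]) -> dict[str, list[str]]:
    """Map each AGENT_NAME used by more than one service to those services."""
    by_name: dict[str, list[str]] = defaultdict(list)
    for service, agent_name in services.items():
        by_name[agent_name].append(service)
    # Keep only names that are shared, with service names sorted for stable output.
    return {name: sorted(svcs) for name, svcs in by_name.items() if len(svcs) > 1}
```

An empty result means every service has its own name; any entry in the result lists the services that would compete for the same jobs.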