https://api.avaluma.ai.
The Avatar Server
The Avatar Server is the rendering engine of Avaluma AI. It runs inside a Docker container with direct access to an NVIDIA GPU and performs the following work:
- Loads your .hvia avatar files from the assets/avatars/ directory at startup
- Receives audio from the LiveKit Agent via the avaluma-livekit-plugin
- Renders the photorealistic avatar frame-by-frame, animating lip movement and facial expressions in sync with the audio
- Publishes the resulting video track back into the LiveKit room, where any connected participant can subscribe to it
| Option | URL |
|---|---|
| Self-hosted | http://localhost:8080 (or your domain with the reverse proxy) |
| Avaluma Managed | https://api.avaluma.ai |
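
As a rough illustration of choosing between the two options in the table, the sketch below validates a base URL before a client uses it. The function name `normalize_avatar_server_url` and the rule that plain HTTP is accepted only for local hosts are assumptions made for this example, not documented Avaluma behaviour.

```python
from urllib.parse import urlparse

# The two base URLs listed in the table above.
SELF_HOSTED_URL = "http://localhost:8080"
MANAGED_URL = "https://api.avaluma.ai"

# Hosts treated as "local" for this sketch (an assumption, not an Avaluma rule).
LOCAL_HOSTS = {"localhost", "127.0.0.1"}


def normalize_avatar_server_url(url: str) -> str:
    """Return the URL without a trailing slash, rejecting obviously unsafe values.

    Hypothetical helper: it only encodes the idea that a self-hosted server on
    a public domain should sit behind HTTPS (for example via a reverse proxy).
    """
    parts = urlparse(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"not an http(s) URL: {url!r}")
    if parts.scheme == "http" and parts.hostname not in LOCAL_HOSTS:
        raise ValueError("plain HTTP is only accepted for local hosts in this sketch")
    return url.rstrip("/")
```

For instance, `normalize_avatar_server_url(SELF_HOSTED_URL)` and `normalize_avatar_server_url(MANAGED_URL)` both pass, while a public `http://` domain is rejected.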
The optional Caddy reverse proxy included in avatar-server/reverse_proxy/ automatically provisions and renews a TLS certificate for your domain, making the self-hosted option production-ready without extra configuration.
The LiveKit Agent
The LiveKit Agent runs a full voice AI pipeline inside a Docker container. It listens to microphone audio from participants in the LiveKit room and drives a conversation through the following stages, which are set in the AgentSession configuration in agent-1.py:
- STT: the AssemblyAI universal-streaming model transcribes the participant's speech in real time
- LLM: OpenAI gpt-4.1-mini generates a response to the transcript
- TTS: Cartesia sonic-3 synthesises the response as speech audio
- AvatarSession: the avaluma-livekit-plugin forwards the TTS audio to the Avatar Server, which animates the avatar and streams the video back into the room
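
The order of the stages above can be sketched with plain Python functions. This is a simplified model of one conversational turn, not the code in agent-1.py: `run_turn` and the `fake_*` stand-ins are hypothetical names, and the real pipeline streams audio and text between stages rather than passing whole values.

```python
from typing import Callable, List


def run_turn(
    mic_audio: bytes,
    stt: Callable[[bytes], str],
    llm: Callable[[str], str],
    tts: Callable[[str], bytes],
    send_to_avatar: Callable[[bytes], None],
) -> str:
    """Push one utterance through STT -> LLM -> TTS -> avatar and return the reply text."""
    transcript = stt(mic_audio)   # speech to text
    reply = llm(transcript)       # text response to the transcript
    speech = tts(reply)           # response rendered as speech audio
    send_to_avatar(speech)        # audio handed to the avatar for animation
    return reply


# Stand-ins so the sketch runs without any external service (all hypothetical).
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(transcript: str) -> str:
    return f"You said: {transcript}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


avatar_inbox: List[bytes] = []  # collects what the avatar stage would receive
reply_text = run_turn(b"hello", fake_stt, fake_llm, fake_tts, avatar_inbox.append)
```

Keeping each stage behind a plain callable mirrors the way the providers can be swapped independently of one another.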
External Audio (agent-2 Pattern)
The Avatar Server is not limited to the AgentSession pipeline. Any external service that holds a valid LiveKit token can stream audio directly to the avatar over a LiveKit DataStream on the topic lk.audio_stream. The avatar animates that audio without an Agent or AgentSession involved at all.
This pattern is implemented in agent-2 and is useful when you already have a speech synthesis service, a pre-recorded script, or any audio source outside the standard pipeline.
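
Since the only credential an external sender needs is a LiveKit token, the sketch below shows what such a token contains. LiveKit access tokens are HS256-signed JWTs whose `video` claim carries the room grants; in practice the official LiveKit server SDK would normally generate them, and the hand-rolled version here exists only to make the structure visible. The function name `make_livekit_token` and the example key, secret, identity, and room values are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as the JWT format requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_livekit_token(
    api_key: str,
    api_secret: str,
    identity: str,
    room: str,
    ttl_seconds: int = 3600,
    now: Optional[int] = None,
) -> str:
    """Build an HS256 JWT that lets `identity` join `room` and publish data."""
    issued = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {
        "iss": api_key,               # LiveKit API key
        "sub": identity,              # participant identity of the sender
        "nbf": issued,                # not valid before
        "exp": issued + ttl_seconds,  # expiry
        # Room grants: join the named room and publish data packets.
        "video": {"room": room, "roomJoin": True, "canPublishData": True},
    }
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(
        api_secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256
    ).digest()
    return f"{signing_input}.{_b64url(signature)}"


# Example values are placeholders, not real credentials.
token = make_livekit_token("APIexample", "secret", "external-tts", "demo-room")
```

The sender would then connect to the room with this token and open a stream on lk.audio_stream; the audio format the Avatar Server expects on that stream is not covered by this sketch.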
The external sender only needs a LiveKit token and the DataStream topic lk.audio_stream; no Avaluma SDK is required on the sender side.
Component Summary
| Component | Responsibility | Hosting |
|---|---|---|
| Avatar Server | GPU rendering, lip-sync, video streaming | Self-hosted or api.avaluma.ai |
| LiveKit Agent | STT → LLM → TTS voice AI pipeline | Self-hosted Docker container |
| LiveKit Room | Shared media layer connecting both components | LiveKit Cloud or self-hosted |
Explore Further
Avatar Server
Learn how to configure GPU resources, manage avatar files, and enable HTTPS.
LiveKit Agent
Dive into the voice AI pipeline, environment variables, and the external audio pattern.
