.hvia avatar file; Avaluma handles GPU-accelerated rendering, lip-sync, and video streaming — so your users see a lifelike avatar speaking, not a static image or a cartoon.
How It Works
Avaluma AI is built on two cooperating components that you deploy independently. Avatar Server runs on an NVIDIA GPU and is the rendering engine. It reads your .hvia avatar files, animates the avatar frame-by-frame in response to incoming audio, and publishes the resulting video track directly into a LiveKit room. You can host the Avatar Server yourself or use Avaluma’s managed endpoint at https://api.avaluma.ai.
LiveKit Agent is a Python-based voice AI pipeline that drives a conversation. It captures microphone input, runs it through Speech-to-Text (STT), passes the transcript to a Large Language Model (LLM), synthesises speech with Text-to-Speech (TTS), and then hands the audio to the Avatar Server via the avaluma-livekit-plugin. The avatar animates that audio and streams the video back into the room.
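The flow described above can be sketched as a chain of stages. This is a minimal, illustrative sketch in plain Python, not the actual LiveKit Agents or avaluma-livekit-plugin API: `ConversationTurn`, `run_turn`, and the stub stage functions are hypothetical names, used only to show the order in which data moves from microphone audio to the avatar.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ConversationTurn:
    """Hypothetical container for the artefacts produced in one turn."""
    transcript: str
    reply_text: str
    reply_audio: bytes


def run_turn(
    mic_audio: bytes,
    stt: Callable[[bytes], str],
    llm: Callable[[str], str],
    tts: Callable[[str], bytes],
    send_to_avatar: Callable[[bytes], None],
) -> ConversationTurn:
    """Run one turn: STT -> LLM -> TTS, then hand the audio to the avatar."""
    transcript = stt(mic_audio)       # speech to text
    reply_text = llm(transcript)      # transcript to reply text
    reply_audio = tts(reply_text)     # reply text to synthesised speech
    send_to_avatar(reply_audio)       # the avatar side animates this audio
    return ConversationTurn(transcript, reply_text, reply_audio)


# Stub stages standing in for real providers (assumptions for illustration).
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


# A plain list stands in for the Avatar Server's audio input.
avatar_inbox: list[bytes] = []

turn = run_turn(b"hello", fake_stt, fake_llm, fake_tts, avatar_inbox.append)
print(turn.reply_text)
```

In a real deployment each stub would be replaced by a provider of your choice, and the final hand-off would go through the plugin rather than a list; the point here is only that the avatar consumes the TTS audio, not the text.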
Prerequisites
Before you deploy, make sure you have the following:
- An Avaluma license key (obtain one at avaluma.ai)
- One or more .hvia avatar files issued with your license
- A LiveKit account (LiveKit Cloud or self-hosted)
- An NVIDIA GPU with CUDA 12 support, OpenGL, and at least 6 GB VRAM (each simultaneous avatar session uses ~2.5 GB)
- Docker & Docker Compose installed on your server
- NVIDIA Container Toolkit installed so Docker can access the GPU
The Avatar Server is GPU-only. A CPU-only host is not supported. Tested architectures include Ampere, Ada Lovelace, and Blackwell.
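The VRAM figures above can be turned into a rough capacity estimate. The sketch below assumes that the ~2.5 GB per session is the only VRAM consumer and that usage scales linearly with session count; neither is stated here, so treat the result as an upper bound rather than a guarantee. `estimate_max_sessions` is a hypothetical helper, not part of Avaluma.

```python
import math


def estimate_max_sessions(total_vram_gb: float, per_session_gb: float = 2.5) -> int:
    """Upper-bound estimate of simultaneous avatar sessions on one GPU.

    Assumes linear scaling and no fixed overhead (both are assumptions).
    """
    if total_vram_gb <= 0 or per_session_gb <= 0:
        raise ValueError("VRAM sizes must be positive")
    return math.floor(total_vram_gb / per_session_gb)


for vram_gb in (6, 12, 24):
    sessions = estimate_max_sessions(vram_gb)
    print(f"{vram_gb} GB VRAM -> up to {sessions} session(s)")
```

Under these assumptions the 6 GB minimum leaves room for at most two simultaneous sessions, so plan GPU memory around your expected peak concurrency.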
Explore the Docs
Quickstart
Deploy the Avatar Server and LiveKit Agent in minutes and connect your first client.
Architecture
Understand how the Avatar Server and LiveKit Agent work together to render and stream your avatar.
Avatar Server
Configure GPU resources, manage .hvia files, and set up an HTTPS reverse proxy for production.
LiveKit Agent
Customise the STT → LLM → TTS pipeline and connect it to any LiveKit room.
