.hvia files and streams the resulting video directly into a LiveKit room. It handles all GPU-accelerated rendering locally, giving you full control over compute resources and latency. You can run multiple avatar sessions at the same time — each session consumes approximately 2.5 GB of VRAM — making it straightforward to scale up as your needs grow.
How It Works
The Avatar Server sits at the core of the Avaluma stack. A LiveKit Agent drives the conversational AI pipeline (speech-to-text, language model, and text-to-speech) and sends audio to the Avatar Server. The server animates the avatar in real time and streams the video back into the LiveKit room for participants to see.
GPU Requirements
The Avatar Server requires a dedicated NVIDIA GPU. CPU-only environments are not supported.
| Requirement | Detail |
|---|---|
| CUDA version | 12 |
| Additional capabilities | OpenGL support, graphics drivers |
| Minimum VRAM | 6 GB (each avatar session uses ~2.5 GB) |
| Tested architectures | Ampere, Ada Lovelace, Blackwell |
| NVIDIA Container Toolkit | Required — installation guide |
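To get a rough feel for how many concurrent sessions a given GPU might hold, the sketch below reads total VRAM from `nvidia-smi` and divides by the ~2.5 GB per-session figure from the table. It is an estimate under stated assumptions, not an official sizing tool: the helper names are hypothetical, 2.5 GB is treated as 2560 MiB, and the result ignores driver overhead and any other processes using the GPU.

```python
import math
import shutil
import subprocess

# ~2.5 GB per avatar session (figure from the table), treated here as 2560 MiB.
SESSION_VRAM_MIB = 2.5 * 1024


def parse_gpu_memory(csv_output: str) -> list[tuple[str, int]]:
    """Parse `name, memory.total` rows as printed by nvidia-smi (MiB, no units)."""
    gpus = []
    for line in csv_output.strip().splitlines():
        # Split on the last comma so GPU names containing commas stay intact.
        name, total_mib = line.rsplit(",", 1)
        gpus.append((name.strip(), int(total_mib)))
    return gpus


def max_sessions(total_mib: int, session_mib: float = SESSION_VRAM_MIB) -> int:
    """Upper bound on concurrent sessions; assumes all VRAM is available."""
    return math.floor(total_mib / session_mib)


def query_gpus() -> list[tuple[str, int]]:
    """Return (name, total MiB) per GPU, or an empty list if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_memory(result.stdout)


if __name__ == "__main__":
    for gpu_name, total in query_gpus():
        print(f"{gpu_name}: {total} MiB, up to {max_sessions(total)} session(s)")
```

Treat the number as a ceiling rather than a guarantee. For instance, a card reporting 24564 MiB works out to at most 9 sessions by this arithmetic, and real capacity may be lower.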
Deployment Options
You can connect your LiveKit Agent to either the Avaluma-hosted server or your own self-hosted instance:
- Avaluma Hosted: point your agent at api.avaluma.ai and let Avaluma manage the infrastructure.
- Self-Hosted: run the Docker service on your own hardware for full control over data and compute.
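One way an agent might switch between the two options is a single environment variable that falls back to the hosted endpoint. The sketch below is illustrative only: the variable name `AVATAR_SERVER_URL`, the `https://` scheme on the hosted host, and the function name are assumptions, not part of the Avaluma configuration.

```python
import os
from typing import Mapping, Optional
from urllib.parse import urlparse

# Host taken from the docs; the https:// scheme is an assumption.
HOSTED_URL = "https://api.avaluma.ai"


def resolve_avatar_server_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the self-hosted URL if configured, otherwise the hosted endpoint.

    AVATAR_SERVER_URL is a hypothetical variable name used for illustration.
    """
    env = os.environ if env is None else env
    url = env.get("AVATAR_SERVER_URL", "").strip() or HOSTED_URL
    parsed = urlparse(url)
    # Reject values that lack an explicit http(s) scheme or a host.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Invalid avatar server URL: {url!r}")
    return url.rstrip("/")
```

With the variable unset this returns the hosted endpoint. Setting it to something like `http://localhost:8080` (a placeholder address) would point the agent at a self-hosted instance.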
Directory Structure
The avatar-server/ directory contains everything you need to get started:
Place your .hvia avatar files in assets/avatars/. The reverse_proxy/ folder contains an optional Caddy configuration for terminating TLS in production.
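Before starting the server, it can help to confirm that avatar files are actually present in assets/avatars/. The following is a minimal sketch of such a check. The function name is hypothetical, and it only looks at file extensions, not at whether the files are valid avatars.

```python
from pathlib import Path


def find_avatars(avatar_dir: str = "assets/avatars") -> list[str]:
    """Return the sorted names of .hvia files directly inside avatar_dir."""
    directory = Path(avatar_dir)
    if not directory.is_dir():
        # A missing directory is treated the same as an empty one.
        return []
    return sorted(p.name for p in directory.glob("*.hvia") if p.is_file())


if __name__ == "__main__":
    avatars = find_avatars()
    if avatars:
        print("Found avatars:", ", ".join(avatars))
    else:
        print("No .hvia files found in assets/avatars/")
```

Run it from inside the avatar-server/ directory so the relative path resolves, or pass an absolute path instead.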
Next Steps
Setup
Deploy the Avatar Server with Docker, add your avatar files, and configure the environment.
HTTPS Proxy
Add automatic TLS to your deployment using the included Caddy reverse proxy.
