Automate Avatar Lip-Syncing for Live Audio with LiveAvatar

Ever wanted to give a digital avatar a voice without manually animating every syllable? What if you could do it in real time, driven by a live audio feed? That’s the challenge the team at Alibaba-Quark tackled with LiveAvatar, an open-source project that automatically generates lip-sync animation for avatars from streaming audio.

For developers building virtual presenters, interactive assistants, or real-time communication tools, this moves us a step closer to believable, dynamic digital characters without the heavy lifting of manual animation or pre-rendered sequences.

What It Does

LiveAvatar takes a live audio stream and a static image of an avatar, then outputs a synchronized lip-movement video in real time. It uses a deep learning model to predict facial landmarks, specifically mouth shapes, from the audio input, then warps the avatar’s mouth region to match those shapes frame by frame. The result is an animation that makes it look like your avatar is actually speaking the audio.

Why It’s Cool

The real-time aspect is a big deal here. Many lip-sync solutions require pre-recorded audio and offline processing. LiveAvatar is built for live feeds, opening doors for live streaming, video conferencing with avatars, or interactive AI agents. It’s also resource-conscious, designed to run efficiently to keep latency low.
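To make the latency point concrete, here is a minimal sketch of how a live sample stream can be cut into one chunk per video frame. The 16 kHz sample rate and 25 fps frame rate are assumptions picked for illustration, not values taken from the LiveAvatar repository, and the function names are hypothetical.

```python
from typing import Iterator, List


def samples_per_frame(sample_rate: int, fps: int) -> int:
    """Number of audio samples that span exactly one video frame."""
    if sample_rate % fps != 0:
        raise ValueError("this sketch expects sample_rate to be divisible by fps")
    return sample_rate // fps


def frame_chunks(
    stream: Iterator[float], sample_rate: int = 16000, fps: int = 25
) -> Iterator[List[float]]:
    """Group incoming samples into per-frame chunks as soon as each one fills."""
    size = samples_per_frame(sample_rate, fps)
    buffer: List[float] = []
    for sample in stream:
        buffer.append(sample)
        if len(buffer) == size:
            yield buffer  # one chunk drives one animation frame
            buffer = []
    # Any trailing samples shorter than one frame are dropped in this sketch.
```

Under these assumed numbers, each frame covers 640 samples, or 40 ms of audio, so whatever model runs per frame has to finish inside roughly that budget to keep up with a live feed.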

Technically, it’s clever in its approach. Instead of generating full video frames from scratch (which is computationally heavy), it focuses on predicting key facial points from the audio and applying image-based warping. This makes the process faster and helps preserve the original avatar’s style and details. The repository provides pre-trained models and a relatively straightforward pipeline, so you’re not starting from zero.
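A toy sketch can show why working with key points is lighter than generating pixels. It only covers the landmark side: it blends between a closed-mouth and an open-mouth set of points using a per-frame "openness" value, and leaves out the image warping step entirely. The openness value, the landmark counts, and the frame size are all hypothetical; in LiveAvatar the landmarks come from a learned model, which this code does not attempt to reproduce.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def blend_landmarks(
    closed: List[Point], opened: List[Point], openness: float
) -> List[Point]:
    """Linearly interpolate mouth landmarks between closed and open poses."""
    if len(closed) != len(opened):
        raise ValueError("both poses must have the same number of landmarks")
    t = min(1.0, max(0.0, openness))  # clamp to [0, 1]
    return [
        ((1 - t) * cx + t * ox, (1 - t) * cy + t * oy)
        for (cx, cy), (ox, oy) in zip(closed, opened)
    ]


def landmark_values_per_frame(n_points: int) -> int:
    """Each landmark is an (x, y) pair."""
    return 2 * n_points


def pixel_values_per_frame(width: int, height: int, channels: int = 3) -> int:
    """Values a model would have to produce to synthesize a full frame."""
    return width * height * channels
```

With an assumed 20 mouth landmarks, that is 40 numbers per frame, against 786,432 values for a 512 by 512 RGB frame. The gap gives a rough sense of why predicting points and then warping the existing image can be cheaper than synthesizing every frame.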

How to Try It

The project is hosted on GitHub. You’ll need some basic setup with Python and dependencies like PyTorch.

Clone the repo:

git clone https://github.com/Alibaba-Quark/LiveAvatar.git
cd LiveAvatar

Set up the environment as detailed in the README.md. This will involve installing required packages and downloading the pre-trained models they provide.
Prepare your assets: You’ll need a source avatar image (with a closed mouth) and an audio source.
Run the inference script to generate your first lip-synced video. The repository includes examples to help you get the command right.
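Before running anything, it can help to sanity-check the assets from the steps above. The sketch below uses only the Python standard library. It assumes the audio is a WAV file, which is a guess on our part since the accepted formats are whatever the README specifies, and the helper names are hypothetical rather than part of the repository.

```python
import wave
from pathlib import Path
from typing import Dict, Union


def describe_wav(path: Union[str, Path]) -> Dict[str, float]:
    """Read basic properties of a WAV file without loading the samples."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "duration_s": wav.getnframes() / rate,
        }


def check_assets(image_path: Union[str, Path], audio_path: Union[str, Path]) -> Dict[str, float]:
    """Fail early if either asset is missing, then report the audio properties."""
    for p in (Path(image_path), Path(audio_path)):
        if not p.is_file():
            raise FileNotFoundError(f"missing asset: {p}")
    return describe_wav(audio_path)
```

Calling check_assets("avatar.png", "speech.wav") with your own file names either raises on a missing file or returns the channel count, sample rate, and duration, which you can compare against whatever the README asks for.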

Since it’s a research project, be prepared for some tinkering. Check the README.
