Overview

Create an agent backed by an ElevenLabs Conversational AI agent. D-ID renders the expressive avatar; ElevenLabs handles speech-to-text, the LLM, and text-to-speech. You bring an existing ElevenLabs agent and API key.

How It Works

flowchart LR
    User(["fa:fa-user User"]) -- "audio" --> Avatar(["fa:fa-circle-user D-ID Expressive Avatar"])
    Avatar -- "audio" --> ElevenLabs(["fa:fa-microphone ElevenLabs Agent<br/>STT · LLM · TTS"])
    ElevenLabs -- "speech" --> Avatar
    Avatar -- "video + audio" --> User

User audio is forwarded to your existing ElevenLabs agent, which runs the entire conversation pipeline: speech-to-text, the LLM, and text-to-speech. D-ID receives the synthesized speech back and renders it through the expressive avatar.
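The round trip in the diagram can be modeled in a few lines of TypeScript. This is only an illustrative sketch: `AudioChunk`, `AvatarFrame`, `ConversationAgent`, and `AvatarRelay` are hypothetical names invented for this example, not part of the D-ID or ElevenLabs SDKs, and the payloads are plain strings standing in for real audio and video streams.

```typescript
// Hypothetical types for illustration only; none of these come from a real SDK.
type AudioChunk = { source: "user" | "elevenlabs"; payload: string };
type AvatarFrame = { video: string; audio: string };

// Stand-in for the ElevenLabs agent: STT, LLM, and TTS collapsed into one call.
interface ConversationAgent {
  respond(userAudio: AudioChunk): AudioChunk;
}

// Stand-in for the D-ID side: forwards user audio, then renders the returned speech.
class AvatarRelay {
  readonly trace: string[] = [];
  private readonly agent: ConversationAgent;

  constructor(agent: ConversationAgent) {
    this.agent = agent;
  }

  handleUserAudio(userAudio: AudioChunk): AvatarFrame {
    this.trace.push("user -> avatar: audio");
    this.trace.push("avatar -> elevenlabs: audio");
    const speech = this.agent.respond(userAudio);
    this.trace.push("elevenlabs -> avatar: speech");
    this.trace.push("avatar -> user: video + audio");
    // The avatar "renders" the speech; the audio itself is passed through unchanged.
    return { video: `rendered(${speech.payload})`, audio: speech.payload };
  }
}

// A fake agent that produces a tagged reply, so the flow can be observed.
const echoAgent: ConversationAgent = {
  respond: (userAudio) => ({
    source: "elevenlabs",
    payload: `reply-to:${userAudio.payload}`,
  }),
};

const relay = new AvatarRelay(echoAgent);
const frame = relay.handleUserAudio({ source: "user", payload: "hello" });
console.log(frame.audio); // prints "reply-to:hello"
console.log(relay.trace.join("\n"));
```

The point of the sketch is the division of labor: the relay never interprets the audio itself, it only forwards it and renders whatever speech comes back.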

What Each Side Handles

| ElevenLabs | D-ID |
| --- | --- |
| Speech-to-text | Expressive avatar rendering |
| LLM and prompting | Session orchestration |
| Text-to-speech | Embed and Client SDK |
| Knowledge, tools, conversation config | Vision input (optional) |

Connect

Once the agent is created, you can connect to it in one of two ways:

Embed Code

Drop a script tag into your page for the prebuilt agent UI with no frontend code.

Client SDK

Install the SDK to build a custom UI with full control over layout and behavior.