Overview
Create an agent backed by an ElevenLabs Conversational AI agent. D-ID renders the expressive avatar; ElevenLabs handles speech-to-text, the LLM, and text-to-speech. You bring an existing ElevenLabs agent and API key.
How It Works
flowchart LR
User(["fa:fa-user User"]) -- "audio" --> Avatar(["fa:fa-circle-user D-ID Expressive Avatar"])
Avatar -- "audio" --> ElevenLabs(["fa:fa-microphone ElevenLabs Agent<br/>STT · LLM · TTS"])
ElevenLabs -- "speech" --> Avatar
Avatar -- "video + audio" --> User
User audio is forwarded to your existing ElevenLabs agent, which runs the entire conversation pipeline: speech-to-text, the LLM, and text-to-speech. D-ID receives the synthesized speech back, renders it through the expressive avatar, and returns video and audio to the user.
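The round trip in the flowchart can be sketched as a small simulation. This is only an illustration of the order of the hops, not the D-ID or ElevenLabs API: `Trace`, `elevenlabs_agent`, and `did_avatar` are hypothetical names, and the "audio" is plain bytes standing in for real audio streams.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Trace:
    """Records each hop so the order of the pipeline can be inspected."""
    hops: List[str] = field(default_factory=list)


def elevenlabs_agent(user_audio: bytes, trace: Trace) -> bytes:
    """Hypothetical stand-in for the ElevenLabs agent: STT, then LLM, then TTS."""
    trace.hops.append("elevenlabs:stt")
    transcript = user_audio.decode("utf-8")   # pretend speech-to-text
    trace.hops.append("elevenlabs:llm")
    reply = f"You said: {transcript}"         # pretend LLM response
    trace.hops.append("elevenlabs:tts")
    return reply.encode("utf-8")              # pretend synthesized speech


def did_avatar(user_audio: bytes, trace: Trace) -> Dict[str, object]:
    """Hypothetical stand-in for the D-ID side: forward audio, render the reply."""
    trace.hops.append("did:forward_audio")
    speech = elevenlabs_agent(user_audio, trace)
    trace.hops.append("did:render_avatar")
    # The user receives video plus the synthesized audio.
    return {"video": f"<frames for {len(speech)} bytes of speech>", "audio": speech}


if __name__ == "__main__":
    trace = Trace()
    result = did_avatar(b"hello", trace)
    print(" -> ".join(trace.hops))
    print(result["audio"])
```

The point of the sketch is that the avatar side never performs speech recognition, reasoning, or synthesis itself; it forwards audio in one direction and renders speech in the other.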
What Each Side Handles
| ElevenLabs | D-ID |
|---|---|
| Speech-to-text | Expressive avatar rendering |
| LLM and prompting | Session orchestration |
| Text-to-speech | Embed and Client SDK |
| Knowledge, tools, conversation config | Vision input (optional) |
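When deciding where a setting belongs, the table above can be treated as a simple lookup. The snippet below is a minimal sketch that only restates the table; `RESPONSIBILITIES` and `owner_of` are hypothetical names, not part of either product's API.

```python
# Capability -> owning platform, copied from the table above.
RESPONSIBILITIES = {
    "Speech-to-text": "ElevenLabs",
    "LLM and prompting": "ElevenLabs",
    "Text-to-speech": "ElevenLabs",
    "Knowledge, tools, conversation config": "ElevenLabs",
    "Expressive avatar rendering": "D-ID",
    "Session orchestration": "D-ID",
    "Embed and Client SDK": "D-ID",
    "Vision input (optional)": "D-ID",
}


def owner_of(capability: str) -> str:
    """Return which side handles a capability (case-insensitive match)."""
    wanted = capability.strip().lower()
    for name, owner in RESPONSIBILITIES.items():
        if name.lower() == wanted:
            return owner
    raise KeyError(f"Unknown capability: {capability!r}")


if __name__ == "__main__":
    print(owner_of("text-to-speech"))
    print(owner_of("Session orchestration"))
```

In practice this means that changes to voice, prompt, or tools are made on the ElevenLabs agent, while avatar appearance and embedding are handled on the D-ID side.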
Connect
Once the agent is created, you can connect to it in one of two ways:
Embed Code
Drop a script tag into your page for the prebuilt agent UI with no frontend code.
Client SDK
Install the SDK to build a custom UI with full control over layout and behavior.