Issue with Real-Time Streaming Avatar Using D-ID Speak API
Hello D-ID Support Team,
I am using your API to create a real-time streaming avatar with speech for my AI agent. When I send a complete response to the API, the avatar and speech work smoothly. However, I am now attempting to send the response in chunks (streaming) as I receive it from my custom LLM API.
The responses I receive from my custom LLM API are streamed incrementally and cumulatively, meaning each message repeats all of the text generated so far plus the new tokens, as shown in the example below (in Arabic; the final message reads: "We have a Pakistani menu that includes dishes such as plain dosa and idli. Would you like to see the English menu?"):
11:56:25 - [assistant] : لد
11:56:25 - [assistant] : لدينا
11:56:25 - [assistant] : لدينا قائمة
11:56:25 - [assistant] : لدينا قائمة با
11:56:25 - [assistant] : لدينا قائمة باك
11:56:25 - [assistant] : لدينا قائمة باكستان
11:56:25 - [assistant] : لدينا قائمة باكستانية
11:56:25 - [assistant] : لدينا قائمة باكستانية تشمل
11:56:25 - [assistant] : لدينا قائمة باكستانية تشمل أطباق
11:56:25 - [assistant] : لدينا قائمة باكستانية تشمل أطباق مثل دوسا عادي وإدلي.
11:56:26 - [assistant] : لدينا قائمة باكستانية تشمل أطباق مثل دوسا عادي وإدلي. هل تود الاطلاع على القائمة الإنجليزية؟
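Note that because each chunk above contains the full text so far, forwarding every chunk verbatim would cause the avatar to repeat earlier words. Here is a minimal sketch of how I currently extract only the newly added text from each cumulative chunk before forwarding it (the forwarding call itself is omitted; this is just the delta logic):

```python
def extract_delta(previous: str, current: str) -> str:
    """Return only the text added since the previous cumulative chunk."""
    if current.startswith(previous):
        return current[len(previous):]
    # Fallback: the stream restarted or revised earlier text,
    # so treat the whole chunk as new.
    return current

spoken_so_far = ""
for chunk in ["لدينا", "لدينا قائمة", "لدينا قائمة باكستانية"]:
    delta = extract_delta(spoken_so_far, chunk)
    spoken_so_far = chunk
    # `delta` (not `chunk`) is what gets forwarded for speech
```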
When I send these chunks to the D-ID API:
The avatar’s speech skips some of the text chunks, resulting in missing words.
The real-time speech output is not smooth or continuous, which disrupts the natural flow of the conversation.
I would like to understand:
Does the D-ID API support real-time streaming input for both the avatar and speech?
If yes, how can I ensure that all text chunks are processed sequentially without dropping any part of the response?
Are there best practices or examples for implementing smooth, real-time streaming for an avatar with speech that mimics natural human-like conversation?
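While I wait for guidance, my working assumption is that buffering the streamed deltas into complete sentences, and only then submitting each sentence in order, might produce smoother speech than sending raw fragments. Below is a rough sketch of that idea; the downstream speak call is hypothetical and not part of your documented API, so please correct me if there is a supported approach:

```python
import re

class SentenceBuffer:
    """Accumulate streamed text deltas and release complete sentences,
    so each speech request carries a natural spoken unit and nothing
    is dropped between requests."""

    # Sentence-ending punctuation, including the Arabic question mark.
    _TERMINATOR = re.compile(r"[.!?؟]")

    def __init__(self) -> None:
        self.pending = ""

    def feed(self, delta: str) -> list[str]:
        """Add a new text delta; return any sentences now complete."""
        self.pending += delta
        sentences = []
        while True:
            match = self._TERMINATOR.search(self.pending)
            if not match:
                break
            sentences.append(self.pending[: match.end()].strip())
            self.pending = self.pending[match.end():].lstrip()
        return sentences

    def flush(self) -> str:
        """Return any trailing fragment when the stream ends."""
        leftover, self.pending = self.pending.strip(), ""
        return leftover

buffer = SentenceBuffer()
for delta in ["We have a Pakistani menu. Would you", " like the English menu?"]:
    for sentence in buffer.feed(delta):
        pass  # submit `sentence` to the avatar, one request at a time, in order
```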
My goal is to have the avatar respond in real time, speaking as the text is being streamed, while maintaining a natural conversational flow.
Thank you for your assistance!
Best regards,