
Issue with Real-Time Streaming Avatar Using D-ID Speak API

Hello D-ID Support Team,

I am using your API to create a real-time streaming avatar with speech for my AI agent. When I send the complete response in a single request, the avatar and its speech work smoothly. However, I am now trying to send the response in chunks (streaming) as I receive it from my custom LLM API.

The responses from my custom LLM API are streamed incrementally, with each chunk repeating the full text generated so far, as shown in the example below:

11:56:25 - [assistant] : لد
11:56:25 - [assistant] : لدينا
11:56:25 - [assistant] : لدينا قائمة
11:56:25 - [assistant] : لدينا قائمة با
11:56:25 - [assistant] : لدينا قائمة باك
11:56:25 - [assistant] : لدينا قائمة باكستان
11:56:25 - [assistant] : لدينا قائمة باكستانية
11:56:25 - [assistant] : لدينا قائمة باكستانية تشمل
11:56:25 - [assistant] : لدينا قائمة باكستانية تشمل أطباق
11:56:25 - [assistant] : لدينا قائمة باكستانية تشمل أطباق مثل دوسا عادي وإدلي.
11:56:26 - [assistant] : لدينا قائمة باكستانية تشمل أطباق مثل دوسا عادي وإدلي. هل تود الاطلاع على القائمة الإنجليزية؟

(The final message translates to: "We have a Pakistani menu that includes dishes such as plain dosa and idli. Would you like to see the English menu?")
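In case it helps, here is a minimal sketch of how I could reduce these cumulative chunks to incremental deltas before forwarding them, assuming each chunk always repeats the full text so far (the function name is my own, not part of any API):

```python
def deltas(cumulative_chunks):
    """Convert cumulative LLM chunks into incremental text deltas.

    Each chunk repeats the full text generated so far, so sending the
    chunks verbatim would replay earlier words; only the new suffix of
    each chunk should be forwarded to the avatar.
    """
    previous = ""
    for chunk in cumulative_chunks:
        yield chunk[len(previous):]  # the newly generated suffix
        previous = chunk
```

Is deduplicating the chunks like this on my side the expected approach, or does your API handle cumulative input itself?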

When I send these chunks to the D-ID API:

The avatar’s speech skips some of the text chunks, resulting in missing words.
The real-time speech output is not smooth or continuous, which breaks the natural flow of the conversation.
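To rule out ordering problems on my side, I am considering serializing the requests through a single worker, along these lines (a sketch only; send_fn stands in for whatever call posts text to the streaming session):

```python
import queue
import threading

class SequentialSender:
    """Deliver text chunks strictly one at a time, in order.

    A single worker thread issues each request only after the previous
    one has completed, so chunks cannot be skipped or reordered by
    concurrent requests.
    """

    def __init__(self, send_fn):
        self._send_fn = send_fn          # placeholder for the actual API call
        self._q = queue.Queue()
        threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self):
        while True:
            text = self._q.get()
            self._send_fn(text)          # blocks until this chunk is delivered
            self._q.task_done()

    def enqueue(self, text):
        self._q.put(text)

    def wait(self):
        """Block until every enqueued chunk has been sent."""
        self._q.join()
```

Would serializing requests like this be enough, or does the streaming session itself need to acknowledge each chunk before the next is sent?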

I would like to understand:

Does the D-ID API support real-time streaming input for both the avatar and speech?
If yes, how can I ensure that all text chunks are processed sequentially without dropping any part of the response?
Are there best practices or examples for implementing smooth, real-time streaming for an avatar with speech that mimics natural human-like conversation?
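For example, one approach I am considering is buffering the incremental deltas into whole sentences and sending one request per sentence rather than per token. A sketch of what I mean (the sentence-splitting regex, which also covers the Arabic question mark, is my own assumption):

```python
import re

# Sentence terminators: period, !, ?, and the Arabic question mark.
SENTENCE_END = re.compile(r"([.!?؟]\s*)")

class SentenceBuffer:
    """Accumulate small text deltas and release only whole sentences.

    Sending one request per sentence, instead of one per token, keeps
    the synthesized speech continuous and preserves ordering.
    """

    def __init__(self):
        self._buf = ""

    def feed(self, delta):
        """Append a delta; return any complete sentences now available."""
        self._buf += delta
        sentences = []
        while True:
            m = SENTENCE_END.search(self._buf)
            if not m:
                break
            sentences.append(self._buf[: m.end()].strip())
            self._buf = self._buf[m.end():]
        return sentences

    def flush(self):
        """Return whatever trailing text remains when the stream ends."""
        rest, self._buf = self._buf.strip(), ""
        return [rest] if rest else []
```

Is sentence-level batching like this the recommended granularity, or can the API accept smaller fragments without breaking the speech?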

My goal is to have the avatar respond in real time, speaking as the text is being streamed, while maintaining a natural conversational flow.

Thank you for your assistance!

Best regards,