
Reducing latency by sending audio/text in fragments (chunks) with "talks streams"

Hello,

We are using the D-ID API to simulate video calls in a chatbot, and we have implemented the "talks streams" functionality with streaming avatars. In our case, we generate the audio in advance with ElevenLabs and then send it to the D-ID API. However, we are experiencing noticeable latency, since we must wait for the entire audio file to be ready before sending it, and then wait for the response.
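For context, this is roughly how we submit the finished audio to an open stream today. This is a minimal sketch: the `build_audio_script` helper is illustrative, and the script payload shape reflects our integration, so field names may differ from yours.

```python
import json

D_ID_API = "https://api.d-id.com"  # base URL; the stream endpoint path below is from our setup

def build_audio_script(audio_url: str) -> dict:
    """Build the script payload that tells an existing talks stream to
    play a pre-generated audio file (illustrative helper)."""
    return {
        "script": {
            "type": "audio",
            "audio_url": audio_url,
        }
    }

# Only once ElevenLabs has produced the *whole* file do we upload it and
# POST the script to the stream, e.g. (requests call shown as a comment):
#   requests.post(f"{D_ID_API}/talks/streams/{stream_id}",
#                 headers=auth_headers,
#                 json=build_audio_script(uploaded_audio_url))
payload = build_audio_script("https://example.com/reply.mp3")
print(json.dumps(payload))
```

The wait for the complete file is exactly the step we would like to eliminate.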

To reduce this latency in our non-video-call mode, we use audio fragments (chunks) and play them as they are generated. We would like to know whether there is any way to achieve the same in video calls. Is it possible to send the text or audio in chunks rather than waiting for the complete file? We would also like to confirm whether the text can be sent in deltas, similar to how some text-to-speech APIs accept incremental input.
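To illustrate what we mean by "deltas": in our non-video mode we split the chatbot's reply into sentence-sized pieces and hand each piece to TTS as soon as it exists. A minimal sketch of that splitting step (the helper name is hypothetical; the downstream submission call is omitted because that is precisely what we are asking about):

```python
import re

def split_into_deltas(text: str) -> list:
    """Split a reply into sentence-sized deltas so each piece can be
    synthesized and played while the rest is still being generated."""
    pieces = re.findall(r"[^.!?]+[.!?]?\s*", text)
    return [p.strip() for p in pieces if p.strip()]

reply = "Hello there. How can I help you today?"
for delta in split_into_deltas(reply):
    # In non-video mode, each delta goes to ElevenLabs immediately and the
    # resulting audio chunk is played back as soon as it arrives.
    print(delta)
```

If "talks streams" could accept input at this granularity, we could start the avatar speaking after the first sentence instead of after the full file.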

Any suggestions to improve response times would be greatly appreciated.

Thanks in advance for your help.