Problem with audio chunk streaming

When we send audio chunks and receive them back (on stream/started), we hit a problem either way. If we do not reset the index, the next audio chunk is not played. If we do reset the index, we get the same stream/started and stream/done again, which makes the audio jitter and sometimes clips it. Please implement a real-life solution where a person can speak (not with an agent; the data is sent and received as the person speaks). Your current ElevenLabs implementation masks this problem because it sends all of the chunks together (you already have all the audio data), which is not how a user will actually use this.

I have tried various approaches: I used different refs and set stitch to true and fluent to true. The problem remains that as soon as I reset the index, the system breaks the sentence and does not know how to say it back again.

Here is an example:

Example 1 (works): "Tell me how is your day." I send the full sentence (I have the entire sentence before sending), so it works correctly. The next sentence also works correctly if I have the full sentence.

Example 2 (problem):

audio chunk sent
audio chunk sent
stream started
audio chunk sent
audio chunk sent
audio chunk sent
stream started

This is where the problem occurs. Please tell me the right approach to solve this issue, or lay it out in one of your examples.
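
To make the request concrete, here is a rough sketch of the behavior I am after. It is not taken from your SDK: the `audio` event, the `streamId` and `data` fields, and the `ContinuousPlaybackQueue` class are all hypothetical names. Only stream/started and stream/done come from what I see in practice. The idea is to keep one playback position for the whole session, so a new stream/started never forces an index reset.

```typescript
// Hypothetical event shapes; the real SDK's names and payloads may differ.
type StreamEvent =
  | { type: "stream/started"; streamId: string }
  | { type: "audio"; streamId: string; data: Uint8Array }
  | { type: "stream/done"; streamId: string };

// Keeps a single playback position for the whole session,
// rather than one index per stream that has to be reset.
class ContinuousPlaybackQueue {
  private pending: Uint8Array[] = []; // chunks received but not yet played
  private playedCount = 0;            // chunks handed to the player so far

  handle(event: StreamEvent): void {
    // stream/started and stream/done are treated as markers only:
    // they never clear the queue or rewind the playback position.
    if (event.type === "audio") {
      this.pending.push(event.data);
    }
  }

  // Returns every chunk that arrived since the last call, in arrival order.
  drain(): Uint8Array[] {
    const ready = this.pending;
    this.pending = [];
    this.playedCount += ready.length;
    return ready;
  }

  get totalPlayed(): number {
    return this.playedCount;
  }
}
```

With something like this, the sequence in Example 2 would call `handle` for each event and `drain` whenever the player is ready for more audio, and chunks from the second stream/started would simply continue after the first ones.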