Creating Interactive Digital Avatars: Overcoming Playback Delays and Video Conflicts
I'm developing an AI-based project that creates digital avatars for users. These avatars can be accessed by other people who can have conversations with them. The workflow of the project is as follows:
Users input text (or voice which is then converted to text).
The fine-tuned OpenAI model generates a text response.
The response text is split into segments, each of which is converted into a short audio clip.
The audio clips are converted into a video using the d-id Streams API (a rough sketch of this pipeline appears after the list).
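For context, here is a rough sketch of the pipeline in TypeScript. Every helper in it (transcribe, generateReply, synthesizeAudioClips, submitClipToStream) is a hypothetical placeholder rather than my actual code; only the overall flow matches the project.

```typescript
// Minimal sketch of the pipeline described above. Every helper below is a
// hypothetical placeholder; only the overall flow mirrors the project.

// Placeholder: speech-to-text for voice input.
async function transcribe(audio: Blob): Promise<string> {
  throw new Error("not implemented: speech-to-text step");
}

// Placeholder: call to the fine-tuned OpenAI model.
async function generateReply(prompt: string): Promise<string> {
  throw new Error("not implemented: OpenAI completion step");
}

// Placeholder: text-to-speech, returning URLs of short audio clips
// (each roughly 200 KB / ~3 s in my case).
async function synthesizeAudioClips(text: string): Promise<string[]> {
  throw new Error("not implemented: TTS segmentation step");
}

// Placeholder: submit one clip to the d-id Streams API session.
async function submitClipToStream(clipUrl: string): Promise<void> {
  throw new Error("not implemented: d-id Streams call");
}

// Overall flow: text/voice in -> OpenAI reply -> audio clips -> avatar video.
async function handleUserTurn(userInput: string | Blob): Promise<void> {
  const text =
    typeof userInput === "string" ? userInput : await transcribe(userInput);
  const reply = await generateReply(text);
  const clips = await synthesizeAudioClips(reply);
  for (const clip of clips) {
    await submitClipToStream(clip);
  }
}
```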
I'm currently facing a couple of issues:
I'm using the live-streaming demo at <https://github.com/de-id/live-streaming-demo>, but I'm experiencing significant delays in video playback, even though each audio clip is only around 200 KB and lasts about 3 seconds.
When I provide multiple audio clips to the d-id API simultaneously, I encounter conflicts in the returned video playback (a sketch of what I mean by "simultaneously" follows the list).
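To make the second issue concrete, this is roughly what I mean by "simultaneously": the clips are fired off concurrently, and the returned video segments then overlap on the same stream. A queued variant is shown for contrast. The endpoint path and request body below are loosely modeled on the live-streaming demo and may not match the current d-id Streams API exactly.

```typescript
// Illustrative only: the endpoint path and body shape are loosely based on the
// live-streaming demo and may not match the current d-id Streams API exactly.
const DID_API_URL = "https://api.d-id.com";

async function sendClip(
  streamId: string,
  sessionId: string,
  audioUrl: string,
  apiKey: string
): Promise<void> {
  const res = await fetch(`${DID_API_URL}/talks/streams/${streamId}`, {
    method: "POST",
    headers: {
      Authorization: `Basic ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      script: { type: "audio", audio_url: audioUrl },
      session_id: sessionId,
    }),
  });
  if (!res.ok) throw new Error(`d-id request failed: ${res.status}`);
}

// What I do now: all clips are submitted concurrently, and the resulting
// video segments conflict during playback.
async function sendClipsConcurrently(
  streamId: string,
  sessionId: string,
  clips: string[],
  apiKey: string
): Promise<void> {
  await Promise.all(clips.map((c) => sendClip(streamId, sessionId, c, apiKey)));
}

// Contrast: a simple queue that awaits each clip before sending the next one.
async function sendClipsQueued(
  streamId: string,
  sessionId: string,
  clips: string[],
  apiKey: string
): Promise<void> {
  for (const clip of clips) {
    await sendClip(streamId, sessionId, clip, apiKey);
  }
}
```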
For comparison, the demo at chat.d-id.com plays back very smoothly.