Integrating OpenAI Text-to-Speech with D-ID for Talkstream Generation: Is There an Existing Solution?

I am using OpenAI to generate responses to queries within my application. Currently, we send a question to OpenAI, which produces a text answer. We then pass that answer to D-ID as input, and D-ID generates a talkstream or image that speaks OpenAI's answer and streams the result.

However, I would prefer not to use D-ID's built-in text-to-speech (TTS). Instead, I want to convert the text into speech with OpenAI's TTS service and pass the resulting audio directly to D-ID as it is generated, so that D-ID builds the talkstream from OpenAI's audio rather than from text. As far as I can tell, this means I need to implement a webhook, or some similar bridging service, between OpenAI and D-ID. Is there an existing solution or framework that handles this?
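
To make the intended flow concrete, here is a rough Python sketch of the bridge I have in mind. It is only an outline under several assumptions, not a working integration: the OpenAI request uses the `/v1/audio/speech` endpoint with the `tts-1` model and `alloy` voice, while the D-ID endpoint path and the `script` fields (`type: "audio"`, `audio_url`) are my reading of their streaming API and should be checked against the D-ID documentation. The `post` and `upload` callables are hypothetical helpers (an authenticated HTTP POST, and something that stores the audio and returns a URL D-ID can fetch), since I am assuming D-ID needs a URL rather than raw bytes.

```python
OPENAI_TTS_URL = "https://api.openai.com/v1/audio/speech"
# Assumption: D-ID streaming endpoint path; verify against the D-ID docs.
DID_STREAM_URL = "https://api.d-id.com/talks/streams/{stream_id}"


def build_tts_request(text, voice="alloy", model="tts-1"):
    """Describe the OpenAI TTS call that turns the answer text into audio."""
    return {
        "url": OPENAI_TTS_URL,
        "body": {"model": model, "voice": voice, "input": text},
    }


def build_did_request(stream_id, session_id, audio_url):
    """Describe the D-ID call that animates the avatar from hosted audio.

    Assumption: D-ID accepts a script of type "audio" pointing at a URL.
    """
    return {
        "url": DID_STREAM_URL.format(stream_id=stream_id),
        "body": {
            "script": {"type": "audio", "audio_url": audio_url},
            "session_id": session_id,
        },
    }


def run_pipeline(text, stream_id, session_id, post, upload):
    """Text -> OpenAI audio -> hosted URL -> D-ID talkstream.

    `post(url, body)` and `upload(audio_bytes)` are hypothetical helpers
    supplied by the caller; they are not part of either vendor's SDK.
    """
    tts = build_tts_request(text)
    audio_bytes = post(tts["url"], tts["body"])   # OpenAI returns audio data
    audio_url = upload(audio_bytes)               # make it reachable for D-ID
    did = build_did_request(stream_id, session_id, audio_url)
    return post(did["url"], did["body"])
```

Keeping the network calls behind `post` and `upload` is deliberate: it lets me swap in real HTTP and storage clients later, and it makes the open question clearer, namely whether something already exists that does the middle step (hosting or streaming the audio to D-ID) for me.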