Using personalized voice

Hi,

I’m currently using the default Microsoft voices to generate talking image videos, and this works well: the image animates and speaks using a synthetic voice. However, I’d love to offer a more personalized experience.

Is it possible to let users upload a recording of their own voice and use it with the talking image, so that it sounds like the real person in the photo?

Alternatively, would I need to have users send their voice recordings and images to me, so I can process them externally (e.g., via Canva or another tool), then upload the final output into my app?

Ultimately, I’d like to know if I can programmatically pass a user-recorded voice to the system, instead of a TTS-generated one.

Thanks in advance!