✴️ Talks Overview
✴️ Interface
Input: Photo URL + Text or Audio file URL
Output: Video URL
✴️ Example #1: Default Call
POST https://api.d-id.com/talks | Create a talk
{
"source_url": "https://myhost.com/image.jpg",
"script": {
"type": "text",
"input": "Hello world!"
}
}
{
"id": "tlk_TMj4G1wiEGpQrdNFvrqAk",
"created_at": "2023-03-22T16:38:49.723Z",
"created_by": "google-oauth2|12345678",
"status": "created",
"object": "talk"
}
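As a concrete sketch, the same create call can be made from Python with the requests library. The Basic authorization header and the YOUR_API_KEY placeholder below are assumptions about the authentication scheme; check your account's API key settings before relying on them.

# Minimal sketch of the default create-talk call (Example #1) in Python.
# Assumption: the D-ID API key is sent in a Basic authorization header.
import requests

API_KEY = "YOUR_API_KEY"  # placeholder

response = requests.post(
    "https://api.d-id.com/talks",
    headers={
        "Authorization": f"Basic {API_KEY}",
        "Content-Type": "application/json",
    },
    json={
        "source_url": "https://myhost.com/image.jpg",
        "script": {"type": "text", "input": "Hello world!"},
    },
)
response.raise_for_status()
talk = response.json()
print(talk["id"], talk["status"])  # e.g. tlk_..., "created"

The returned id is then used to fetch the talk, as shown in the GET call below.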
GET https://api.d-id.com/talks/<id> | Get a specific talk
Empty request body. The full response is shown below.
{
"metadata": {
"driver_url": "bank://lively/driver-02/flipped",
"mouth_open": false,
"num_faces": 1,
"num_frames": 41,
"processing_fps": 51.51385098457352,
"resolution": [
512,
512
],
"size_kib": 334.22265625
},
"audio_url": "https://d-id-talks-prod.s3.us-west-2.amazonaws.com/google-oauth2%12345678/tlk_TMj4G1wiEGpQrdNFvrqAk/microsoft.wav?AWSAccessKeyId=AKIADED3BIK65W6FGA&Expires=167923230&Signature=BpLqGzh83cSL6DSFDSN3BE6pfc2M%3D",
"created_at": "2023-03-22T16:38:49.723Z",
"face": {
"mask_confidence": -1,
"detection": [
224,
198,
484,
553
],
"overlap": "no",
"size": 512,
"top_left": [
98,
119
],
"face_id": 0,
"detect_confidence": 0.9998300075531006
},
"config": {
"stitch": false,
"pad_audio": 0,
"align_driver": true,
"sharpen": true,
"auto_match": true,
"normalization_factor": 1,
"logo": {
"url": "ai",
"position": [
0,
0
]
},
"motion_factor": 1,
"result_format": ".mp4",
"fluent": false,
"align_expand_factor": 0.3
},
"source_url": "https://d-id-talks-prod.s3.us-west-2.amazonaws.com/google-oauth2%12345678/tlk_TMj4G1wiEGpQrdNFvrqAk/source/image.jpeg?AWSAccessKeyId=AKIA5CUSDFDF5W6FGA&Expires=167233230&Signature=TtFFRJTg9kEryjaKA7%2BlqPLv98%3D",
"created_by": "google-oauth2|12345678",
"status": "done",
"driver_url": "bank://lively/",
"modified_at": "2023-03-22T16:39:15.603Z",
"user_id": "google-oauth2|12345678",
"result_url": "https://d-id-talks-prod.s3.us-west-2.amazonaws.com/google-oauth2%12345678tlk_TMj4G1wiEGpQrdNFvrqAk/image.mp4?AWSAccessKeyId=AKIA5CUMPWEREWRWW6FGA&Expires=16795234235&Signature=C1lP87Ia1ulFdsddWWEamfZADq2HA%3D",
"id": "tlk_TMj4G1wiEGpQrdNFvrqAk",
"duration": 2,
"started_at": "2023-03-22T16:39:13.633"
}
The output video is located in the result_url field.
Note: The output video is ready only when "status": "done".
status field lifecycle:
"status": "created" | When a new talk request is posted
"status": "started" | When video processing has started
"status": "done" | When the video is ready
✴️ Example #2: Webhooks
Simply create an endpoint on your side and add it in the webhook field. Then the webhook endpoint will be triggered with the same response body once the video is ready.
POST https://api.d-id.com/talks | Create a talk
{
"source_url": "https://myhost.com/image.jpg",
"script": {
"type": "text",
"input": "Hello world!"
},
"webhook": "https://myhost.com/webhook"
}
{
"id": "tlk_TMj4G1wiEGpQrdNFvrqAk",
"created_at": "2023-03-22T16:38:49.723Z",
"created_by": "google-oauth2|12345678",
"status": "created",
"object": "talk"
}
Webhook request body (sent to your endpoint once the video is ready):
{
"metadata": {
"driver_url": "bank://lively/driver-02/flipped",
"mouth_open": false,
"num_faces": 1,
"num_frames": 41,
"processing_fps": 51.51385098457352,
"resolution": [
512,
512
],
"size_kib": 334.22265625
},
"audio_url": "https://d-id-talks-prod.s3.us-west-2.amazonaws.com/google-oauth2%12345678/tlk_TMj4G1wiEGpQrdNFvrqAk/microsoft.wav?AWSAccessKeyId=AKIADED3BIK65W6FGA&Expires=167923230&Signature=BpLqGzh83cSL6DSFDSN3BE6pfc2M%3D",
"created_at": "2023-03-22T16:38:49.723Z",
"face": {
"mask_confidence": -1,
"detection": [
224,
198,
484,
553
],
"overlap": "no",
"size": 512,
"top_left": [
98,
119
],
"face_id": 0,
"detect_confidence": 0.9998300075531006
},
"config": {
"stitch": false,
"pad_audio": 0,
"align_driver": true,
"sharpen": true,
"auto_match": true,
"normalization_factor": 1,
"logo": {
"url": "ai",
"position": [
0,
0
]
},
"motion_factor": 1,
"result_format": ".mp4",
"fluent": false,
"align_expand_factor": 0.3
},
"source_url": "https://d-id-talks-prod.s3.us-west-2.amazonaws.com/google-oauth2%12345678/tlk_TMj4G1wiEGpQrdNFvrqAk/source/image.jpeg?AWSAccessKeyId=AKIA5CUSDFDF5W6FGA&Expires=167233230&Signature=TtFFRJTg9kEryjaKA7%2BlqPLv98%3D",
"created_by": "google-oauth2|12345678",
"status": "done",
"driver_url": "bank://lively/",
"modified_at": "2023-03-22T16:39:15.603Z",
"user_id": "google-oauth2|12345678",
"result_url": "https://d-id-talks-prod.s3.us-west-2.amazonaws.com/google-oauth2%12345678tlk_TMj4G1wiEGpQrdNFvrqAk/image.mp4?AWSAccessKeyId=AKIA5CUMPWEREWRWW6FGA&Expires=16795234235&Signature=C1lP87Ia1ulFdsddWWEamfZADq2HA%3D",
"id": "tlk_TMj4G1wiEGpQrdNFvrqAk",
"duration": 2,
"started_at": "2023-03-22T16:39:13.633"
}
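For illustration, a minimal receiver for that callback might look like the Flask sketch below. The /webhook route matches the URL registered above; the assumption that the callback arrives as an HTTP POST carrying the talk object as JSON, and anything to do with securing or verifying the endpoint, should be confirmed on your side.

# Minimal sketch of a webhook receiver using Flask.
# Assumption: the callback is an HTTP POST with the talk object as its JSON body.
from flask import Flask, request

app = Flask(__name__)

@app.route("/webhook", methods=["POST"])
def talk_ready():
    talk = request.get_json(force=True)
    if talk.get("status") == "done":
        # The finished video can now be downloaded from result_url.
        print("Talk", talk["id"], "is ready at", talk["result_url"])
    return "", 200

if __name__ == "__main__":
    app.run(port=8000)  # expose publicly as https://myhost.com/webhook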
✴️ Example #3: Stitch
In order to get an output video that contains the entire input image context, and not only a cropped video around the face area, simply set "stitch": true under the config object.
POST https://api.d-id.com/talks | Create a talk
{
"source_url": "https://myhost.com/image.jpg",
"script": {
"type": "text",
"input": "Hello world!"
},
"config": {
"stitch": true
}
}
{
"id": "tlk_TMj4G1wiEGpQrdNFvrqAk",
"created_at": "2023-03-22T16:38:49.723Z",
"created_by": "google-oauth2|12345678",
"status": "created",
"object": "talk"
}
✴️ Example #4: Text to Speech
Choose different voices, languages, and styles. See the supported Text-to-Speech providers' voices list.
POST https://api.d-id.com/talks | Create a talk
{
"source_url": "https://myhost.com/image.jpg",
"script": {
"type": "text",
"input": "Hello world!",
"provider": {
"type": "microsoft",
"voice_id": "en-US-JennyNeural",
"voice_config": {
"style": "Cheerful"
}
}
}
}
{
"id": "tlk_TMj4G1wiEGpQrdNFvrqAk",
"created_at": "2023-03-22T16:38:49.723Z",
"created_by": "google-oauth2|12345678",
"status": "created",
"object": "talk"
}
✴️ Example #5: Audio Script
Use an audio file instead of a text script.
POST https://api.d-id.com/talks | Create a talk
{
"source_url": "https://myhost.com/image.jpg",
"script": {
"type": "audio",
"audio_url": "https://path.to/audio.mp3"
}
}
{
"id": "tlk_TMj4G1wiEGpQrdNFvrqAk",
"created_at": "2023-03-22T16:38:49.723Z",
"created_by": "google-oauth2|12345678",
"status": "created",
"object": "talk"
}
✴️ Example #6: Drivers
"Driver" is a video of a real human face, filmed behind the scenes, that controls the facial and head movements of the speaking output video. There are several different drivers that can be used when creating a Talks
request. By default, (when not providing a driver_url
field in the request body), the system automatically chooses the best-matched driver for the input photo. However, in order to manually force a different and specific driver to the request to diverse the head movements, you can provide one of the following drivers under the driver_url
field.
{
"source_url": "https://myhost.com/image.jpg",
"driver_url": "bank://lively/driver-05", // See Drivers List Tab above for more supported drivers
"script": {
"type": "text",
"input": "Hello world!"
}
}
// Use the prefix "bank://"
"natural/driver-1"
"natural/driver-2"
"natural/driver-3"
"natural/driver-4"
"natural/driver-5"
"natural/driver-6"
"natural/driver-7"
"natural/driver-8"
"lively/driver-01"
"lively/driver-02
"lively/driver-03"
"lively/driver-04"
"lively/driver-05"
"lively/driver-06"
"subtle/driver-01"
"subtle/driver-02"
"subtle/driver-03"
"subtle/driver-04"
Best Practice
We strongly recommend using the default auto-matching driver mechanism (by not providing driver_url) to achieve the best results.
✴️ Example #7: Expressions
To apply an expression to your avatar, simply add a driver_expressions parameter under the config object of the API request body. Learn more here.
(Image comparison: neutral expression vs. results with different facial expressions.)
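As a hedged sketch of where this parameter sits in the request, the snippet below extends Example #1 with a config.driver_expressions object. The inner field names (expressions, start_frame, expression, intensity) are illustrative assumptions; verify them against the Expressions documentation linked above.

# Hedged sketch: adding a facial expression to the create-talk request.
# The driver_expressions field names are assumptions -- confirm them in
# the Expressions documentation before relying on this shape.
import requests

payload = {
    "source_url": "https://myhost.com/image.jpg",
    "script": {"type": "text", "input": "Hello world!"},
    "config": {
        "driver_expressions": {
            "expressions": [
                {"start_frame": 0, "expression": "happy", "intensity": 1.0}
            ]
        }
    },
}

resp = requests.post(
    "https://api.d-id.com/talks",
    headers={
        "Authorization": "Basic YOUR_API_KEY",  # same auth assumption as the earlier sketches
        "Content-Type": "application/json",
    },
    json=payload,
)
print(resp.json())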
✴️ Video Tutorial
Live Coding Session
✴️ Support
Have any questions? We are here to help! Please leave your question in the Discussions section and we will be happy to answer shortly.