✴️ Talks Overview

Create talking-head videos from just text or audio to make business content more cost-effective, engaging, and human. Speaking Portrait (the Talks endpoint) lets you create a realistic video of a human presenter without any video production: provide an image and either text or an audio file, and our AI-based reenactment technology creates the video automatically. Transform articles, training materials, corporate communications, and product marketing materials into videos at scale, without costly productions and studios.
✴️ Interface
Input: photo URL + text or audio file URL
Output: video URL


✴️ Example #1: Default Call
POST https://api.d-id.com/talks (Create a talk)
Request body:
{
"source_url": "https://myhost.com/image.jpg",
"script": {
"type": "text",
"input": "Hello world!"
}
}
Response:
{
"id": "tlk_TMj4G1wiEGpQrdNFvrqAk",
"created_at": "2023-03-22T16:38:49.723Z",
"created_by": "google-oauth2|12345678",
"status": "created",
"object": "talk"
}
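A minimal Python sketch of this call using only the standard library. The helper names build_text_talk and create_talk are hypothetical, and the Authorization header (HTTP Basic with your API key) is an assumption that this page does not cover; check your account's API key settings before relying on it.

```python
import json
import urllib.request

API_BASE = "https://api.d-id.com"  # base URL used in the examples on this page


def build_text_talk(source_url: str, text: str) -> dict:
    """Build the request body for a text-script talk (same shape as Example #1)."""
    return {
        "source_url": source_url,
        "script": {"type": "text", "input": text},
    }


def create_talk(payload: dict, api_key: str) -> str:
    """POST the payload to /talks and return the new talk id.

    ASSUMPTION: the API key is sent via HTTP Basic auth in the Authorization header.
    """
    request = urllib.request.Request(
        f"{API_BASE}/talks",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Basic {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # The create response carries the talk "id" and "status": "created".
    return body["id"]
```

Usage: `talk_id = create_talk(build_text_talk("https://myhost.com/image.jpg", "Hello world!"), api_key="YOUR_API_KEY")`. Keeping the body builder separate from the HTTP call makes the payload easy to inspect and test offline.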
GET https://api.d-id.com/talks/<id> (Get a specific talk)
Request body: empty
Response:
{
"metadata": {
"driver_url": "bank://lively/driver-02/flipped",
"mouth_open": false,
"num_faces": 1,
"num_frames": 41,
"processing_fps": 51.51385098457352,
"resolution": [
512,
512
],
"size_kib": 334.22265625
},
"audio_url": "https://d-id-talks-prod.s3.us-west-2.amazonaws.com/google-oauth2%12345678/tlk_TMj4G1wiEGpQrdNFvrqAk/microsoft.wav?AWSAccessKeyId=AKIADED3BIK65W6FGA&Expires=167923230&Signature=BpLqGzh83cSL6DSFDSN3BE6pfc2M%3D",
"created_at": "2023-03-22T16:38:49.723Z",
"face": {
"mask_confidence": -1,
"detection": [
224,
198,
484,
553
],
"overlap": "no",
"size": 512,
"top_left": [
98,
119
],
"face_id": 0,
"detect_confidence": 0.9998300075531006
},
"config": {
"stitch": false,
"pad_audio": 0,
"align_driver": true,
"sharpen": true,
"auto_match": true,
"normalization_factor": 1,
"logo": {
"url": "ai",
"position": [
0,
0
]
},
"motion_factor": 1,
"result_format": ".mp4",
"fluent": false,
"align_expand_factor": 0.3
},
"source_url": "https://d-id-talks-prod.s3.us-west-2.amazonaws.com/google-oauth2%12345678/tlk_TMj4G1wiEGpQrdNFvrqAk/source/image.jpeg?AWSAccessKeyId=AKIA5CUSDFDF5W6FGA&Expires=167233230&Signature=TtFFRJTg9kEryjaKA7%2BlqPLv98%3D",
"created_by": "google-oauth2|12345678",
"status": "done",
"driver_url": "bank://lively/",
"modified_at": "2023-03-22T16:39:15.603Z",
"user_id": "google-oauth2|12345678",
"result_url": "https://d-id-talks-prod.s3.us-west-2.amazonaws.com/google-oauth2%12345678tlk_TMj4G1wiEGpQrdNFvrqAk/image.mp4?AWSAccessKeyId=AKIA5CUMPWEREWRWW6FGA&Expires=16795234235&Signature=C1lP87Ia1ulFdsddWWEamfZADq2HA%3D",
"id": "tlk_TMj4G1wiEGpQrdNFvrqAk",
"duration": 2,
"started_at": "2023-03-22T16:39:13.633"
}
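Because a talk is rendered asynchronously, a client can poll GET https://api.d-id.com/talks/<id> until "status" is "done" and then read result_url. The sketch below assumes a caller-supplied, hypothetical fetch_talk function that performs that GET and returns the decoded JSON; injecting it keeps the loop testable without network access. Treating any status outside created/started/done as a failure is also an assumption.

```python
import time
from typing import Callable


def wait_for_talk(
    fetch_talk: Callable[[str], dict],
    talk_id: str,
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
) -> str:
    """Poll a talk until "status" is "done", then return its result_url.

    fetch_talk (hypothetical) performs GET /talks/<id> and returns the JSON body.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        talk = fetch_talk(talk_id)
        status = talk.get("status")
        if status == "done":
            # The video URL is only meaningful once the talk is done.
            return talk["result_url"]
        if status not in ("created", "started"):
            # ASSUMPTION: any status outside the documented lifecycle is a failure.
            raise RuntimeError(f"talk {talk_id} ended with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"talk {talk_id} not ready after {timeout_s} s")
        time.sleep(interval_s)
```

For production use, a webhook (Example #2) avoids polling altogether.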
The output video is located in the result_url field.
Note: The output video is ready only when "status": "done".
The status field lifecycle:
"status": "created" | When a new talk request is posted
"status": "started" | When video processing starts
"status": "done" | When the video is ready
✴️ Example #2: Webhooks
Create an endpoint on your side and pass its URL in the webhook field. Once the video is ready, that endpoint is called with the same response body that GET https://api.d-id.com/talks/<id> returns.
POST https://api.d-id.com/talks (Create a talk)
Request body:
{
"source_url": "https://myhost.com/image.jpg",
"script": {
"type": "text",
"input": "Hello world!"
},
"webhook": "https://myhost.com/webhook"
}
Response:
{
"id": "tlk_TMj4G1wiEGpQrdNFvrqAk",
"created_at": "2023-03-22T16:38:49.723Z",
"created_by": "google-oauth2|12345678",
"status": "created",
"object": "talk"
}
Webhook request body (sent to your endpoint once the video is ready):
{
"metadata": {
"driver_url": "bank://lively/driver-02/flipped",
"mouth_open": false,
"num_faces": 1,
"num_frames": 41,
"processing_fps": 51.51385098457352,
"resolution": [
512,
512
],
"size_kib": 334.22265625
},
"audio_url": "https://d-id-talks-prod.s3.us-west-2.amazonaws.com/google-oauth2%12345678/tlk_TMj4G1wiEGpQrdNFvrqAk/microsoft.wav?AWSAccessKeyId=AKIADED3BIK65W6FGA&Expires=167923230&Signature=BpLqGzh83cSL6DSFDSN3BE6pfc2M%3D",
"created_at": "2023-03-22T16:38:49.723Z",
"face": {
"mask_confidence": -1,
"detection": [
224,
198,
484,
553
],
"overlap": "no",
"size": 512,
"top_left": [
98,
119
],
"face_id": 0,
"detect_confidence": 0.9998300075531006
},
"config": {
"stitch": false,
"pad_audio": 0,
"align_driver": true,
"sharpen": true,
"auto_match": true,
"normalization_factor": 1,
"logo": {
"url": "ai",
"position": [
0,
0
]
},
"motion_factor": 1,
"result_format": ".mp4",
"fluent": false,
"align_expand_factor": 0.3
},
"source_url": "https://d-id-talks-prod.s3.us-west-2.amazonaws.com/google-oauth2%12345678/tlk_TMj4G1wiEGpQrdNFvrqAk/source/image.jpeg?AWSAccessKeyId=AKIA5CUSDFDF5W6FGA&Expires=167233230&Signature=TtFFRJTg9kEryjaKA7%2BlqPLv98%3D",
"created_by": "google-oauth2|12345678",
"status": "done",
"driver_url": "bank://lively/",
"modified_at": "2023-03-22T16:39:15.603Z",
"user_id": "google-oauth2|12345678",
"result_url": "https://d-id-talks-prod.s3.us-west-2.amazonaws.com/google-oauth2%12345678tlk_TMj4G1wiEGpQrdNFvrqAk/image.mp4?AWSAccessKeyId=AKIA5CUMPWEREWRWW6FGA&Expires=16795234235&Signature=C1lP87Ia1ulFdsddWWEamfZADq2HA%3D",
"id": "tlk_TMj4G1wiEGpQrdNFvrqAk",
"duration": 2,
"started_at": "2023-03-22T16:39:13.633"
}
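A minimal sketch of a webhook receiver built on Python's standard http.server. It assumes the body above arrives as an HTTP POST with a JSON payload (the HTTP method is not stated on this page); result_url_from_webhook, TalkWebhookHandler and serve are hypothetical names.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional


def result_url_from_webhook(body: bytes) -> Optional[str]:
    """Return result_url if the webhook body reports a finished talk, else None."""
    talk = json.loads(body)
    if talk.get("status") == "done":
        return talk.get("result_url")
    return None


class TalkWebhookHandler(BaseHTTPRequestHandler):
    """ASSUMPTION: the webhook is delivered as POST with a JSON body."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        url = result_url_from_webhook(self.rfile.read(length))
        if url:
            print(f"video ready: {url}")
        # Acknowledge receipt with an empty 200 response.
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Run the receiver; expose it publicly and pass its URL in the "webhook" field."""
    HTTPServer(("", port), TalkWebhookHandler).serve_forever()
```

The parsing logic lives in a plain function so it can be tested without starting a server; a real deployment would sit behind HTTPS.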
✴️ Example #3: Stitch
To get an output video that contains the entire input image, rather than only a video cropped around the face area, set "stitch": true in the config object.
POST https://api.d-id.com/talks (Create a talk)
Request body:
{
"source_url": "https://myhost.com/image.jpg",
"script": {
"type": "text",
"input": "Hello world!"
},
"config": {
"stitch": true
}
}
Response:
{
"id": "tlk_TMj4G1wiEGpQrdNFvrqAk",
"created_at": "2023-03-22T16:38:49.723Z",
"created_by": "google-oauth2|12345678",
"status": "created",
"object": "talk"
}
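Options such as stitch live in the optional config object, so one way to set them is to merge them into a base body. A sketch with hypothetical helper names (build_text_talk, with_config); note that Python's True serializes to JSON true.

```python
import copy
import json


def build_text_talk(source_url: str, text: str) -> dict:
    """Base request body, as in Example #1."""
    return {"source_url": source_url, "script": {"type": "text", "input": text}}


def with_config(payload: dict, **options) -> dict:
    """Return a copy of payload with options merged into its "config" object."""
    result = copy.deepcopy(payload)
    result.setdefault("config", {}).update(options)
    return result


# Example #3: keep the full image context instead of a face crop.
body = with_config(build_text_talk("https://myhost.com/image.jpg", "Hello world!"), stitch=True)
print(json.dumps(body))
```

Copying the payload rather than mutating it lets one base body be reused with different config options.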
✴️ Example #4: Text to Speech
Choose different voices, languages, and styles. See the list of supported voices for each Text-to-Speech provider.
POST https://api.d-id.com/talks (Create a talk)
Request body:
{
"source_url": "https://myhost.com/image.jpg",
"script": {
"type": "text",
"input": "Hello world!",
"provider": {
"type": "microsoft",
"voice_id": "en-US-JennyNeural",
"voice_config": {
"style": "Cheerful"
}
}
}
}
Response:
{
"id": "tlk_TMj4G1wiEGpQrdNFvrqAk",
"created_at": "2023-03-22T16:38:49.723Z",
"created_by": "google-oauth2|12345678",
"status": "created",
"object": "talk"
}
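A sketch of building the Example #4 body in Python. build_tts_talk is a hypothetical helper; valid provider types and voice ids should come from the supported voices list, and "microsoft" is only the default here because it is the provider used in the example.

```python
from typing import Optional


def build_tts_talk(
    source_url: str,
    text: str,
    voice_id: str,
    provider_type: str = "microsoft",
    style: Optional[str] = None,
) -> dict:
    """Request body for a text script spoken by a chosen TTS voice."""
    provider = {"type": provider_type, "voice_id": voice_id}
    if style is not None:
        # voice_config is only sent when a style is requested.
        provider["voice_config"] = {"style": style}
    return {
        "source_url": source_url,
        "script": {"type": "text", "input": text, "provider": provider},
    }
```

Calling `build_tts_talk("https://myhost.com/image.jpg", "Hello world!", "en-US-JennyNeural", style="Cheerful")` produces the request body shown above.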
✴️ Example #5: Audio Script
Use an audio file instead of text as the script.
POST https://api.d-id.com/talks (Create a talk)
Request body:
{
"source_url": "https://myhost.com/image.jpg",
"script": {
"type": "audio",
"audio_url": "https://path.to/audio.mp3"
}
}
Response:
{
"id": "tlk_TMj4G1wiEGpQrdNFvrqAk",
"created_at": "2023-03-22T16:38:49.723Z",
"created_by": "google-oauth2|12345678",
"status": "created",
"object": "talk"
}
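Comparing Examples #1 and #5, only the script object changes between a text talk and an audio talk. A sketch of that design with hypothetical helper names (text_script, audio_script, build_talk):

```python
import json


def text_script(text: str) -> dict:
    """Script object for text input (Example #1)."""
    return {"type": "text", "input": text}


def audio_script(audio_url: str) -> dict:
    """Script object for an audio file (Example #5)."""
    return {"type": "audio", "audio_url": audio_url}


def build_talk(source_url: str, script: dict) -> dict:
    """Combine a source image with either kind of script."""
    return {"source_url": source_url, "script": script}


body = build_talk("https://myhost.com/image.jpg", audio_script("https://path.to/audio.mp3"))
print(json.dumps(body, indent=2))
```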
✴️ Example #6: Drivers
A "driver" is a video of a real human face, filmed behind the scenes, that controls the facial and head movements in the output video. Several drivers can be used when creating a Talks request. By default (when the driver_url field is omitted from the request body), the system automatically chooses the best-matching driver for the input photo. To force a specific driver instead, for example to vary the head movements, set the driver_url field to one of the drivers listed below.
{
"source_url": "https://myhost.com/image.jpg",
"driver_url": "bank://lively/driver-05",
"script": {
"type": "text",
"input": "Hello world!"
}
}
Supported drivers (prefix each with "bank://", e.g. "bank://lively/driver-05"):
"natural/driver-1"
"natural/driver-2"
"natural/driver-3"
"natural/driver-4"
"natural/driver-5"
"natural/driver-6"
"natural/driver-7"
"natural/driver-8"
"lively/driver-01"
"lively/driver-02"
"lively/driver-03"
"lively/driver-04"
"lively/driver-05"
"lively/driver-06"
"subtle/driver-01"
"subtle/driver-02"
"subtle/driver-03"
"subtle/driver-04"
Best Practice: We strongly recommend using the default auto-matching driver mechanism (omit driver_url) to achieve the best results.
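A sketch that follows the best practice above: driver_url is omitted unless a driver is explicitly requested, and a requested driver is checked against the list on this page before the "bank://" prefix is added. build_talk_with_driver is a hypothetical helper, and rejecting names outside the list is our own conservative choice, not an API rule.

```python
from typing import Optional

# Driver names copied from the list above.
DRIVERS = frozenset(
    [f"natural/driver-{i}" for i in range(1, 9)]
    + [f"lively/driver-{i:02d}" for i in range(1, 7)]
    + [f"subtle/driver-{i:02d}" for i in range(1, 5)]
)


def build_talk_with_driver(source_url: str, text: str, driver: Optional[str] = None) -> dict:
    """Request body with an optional, explicitly chosen driver."""
    payload = {"source_url": source_url, "script": {"type": "text", "input": text}}
    if driver is not None:
        if driver not in DRIVERS:
            raise ValueError(f"unknown driver: {driver!r}")
        payload["driver_url"] = f"bank://{driver}"
    # With no driver, driver_url is left out so the API auto-matches one.
    return payload
```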
✴️ Example #7: Expressions
To apply an expression to your avatar, add a driver_expressions parameter under the config object of the API request body.
[Figure: standard result (neutral expression) compared with results using different facial expressions: Neutral, Happy, Surprise, Serious]
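A sketch of placing driver_expressions under config. Only config, driver_expressions and the four expression names come from this page; the inner structure (an "expressions" list whose items carry start_frame, expression and intensity) and the 0 to 1 intensity range are ASSUMPTIONS, so verify them against the full expressions reference before use. build_talk_with_expression is a hypothetical helper.

```python
EXPRESSIONS = ("neutral", "happy", "surprise", "serious")


def build_talk_with_expression(
    source_url: str, text: str, expression: str, intensity: float = 1.0
) -> dict:
    """Request body applying one expression from the first frame onward."""
    if expression not in EXPRESSIONS:
        raise ValueError(f"unsupported expression: {expression!r}")
    # ASSUMPTION: intensity is a fraction between 0 and 1.
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity must be between 0 and 1")
    return {
        "source_url": source_url,
        "script": {"type": "text", "input": text},
        "config": {
            "driver_expressions": {
                # ASSUMED schema: a list of expressions, each starting at a frame index.
                "expressions": [
                    {"start_frame": 0, "expression": expression, "intensity": intensity}
                ]
            }
        },
    }
```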
✴️ Video Tutorial
[Video: D-ID's API - Talks Endpoint, Live Coding Session]
✴️ Support

Have any questions? We're here to help! Contact Support, and we'll be happy to answer shortly.