Overview 📙

✴️ Talks Overview

Create talking-head videos from just text or audio to make business content more cost-effective, engaging, and human. Speaking Portrait (the Talks endpoint) lets you create a realistic video of a human presenter without any video production: provide an image and either text or an audio file, and the video is generated automatically by our AI-based reenactment technology. Transform articles, training materials, corporate communications, and product marketing materials into videos at scale, without costly productions and studios.

✴️ Interface

Input: Photo URL + Text or Audio file URL
Output: Video URL


[Figure: Photo + Text / Audio → Video]

✴️ Example #1: Default Call

POST https://api.d-id.com/talks | Create a talk

{
    "source_url": "https://myhost.com/image.jpg",
    "script": {
        "type": "text",
        "input": "Hello world!"
    }
}

Response:

{
    "id": "tlk_TMj4G1wiEGpQrdNFvrqAk",
    "created_at": "2023-03-22T16:38:49.723Z",
    "created_by": "google-oauth2|12345678",
    "status": "created",
    "object": "talk"
}
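
A minimal Python sketch of the default call, using only the standard library. The endpoint and body fields are taken from the example above. The helper names are hypothetical, and the `auth_header` argument is a placeholder: this page does not describe how requests are authenticated, so its value is an assumption you need to fill in from your account settings.

```python
import json
import urllib.request

TALKS_URL = "https://api.d-id.com/talks"  # endpoint from the example above


def build_create_talk_request(source_url, text, auth_header):
    """Build (but do not send) the POST request for a text-driven talk."""
    body = {
        "source_url": source_url,
        "script": {"type": "text", "input": text},
    }
    return urllib.request.Request(
        TALKS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the header value depends on your credentials;
            # the authentication scheme is not documented on this page.
            "Authorization": auth_header,
        },
        method="POST",
    )


def create_talk(source_url, text, auth_header):
    """Send the request and return the parsed JSON response as a dict."""
    request = build_create_talk_request(source_url, text, auth_header)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

If the call succeeds, the `id` field of the returned dict is the value to use in `GET https://api.d-id.com/talks/<id>`.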

GET https://api.d-id.com/talks/<id> | Get a specific talk

Request body: empty

Response:

{
    "metadata": {
        "driver_url": "bank://lively/driver-02/flipped",
        "mouth_open": false,
        "num_faces": 1,
        "num_frames": 41,
        "processing_fps": 51.51385098457352,
        "resolution": [
            512,
            512
        ],
        "size_kib": 334.22265625
    },
    "audio_url": "https://d-id-talks-prod.s3.us-west-2.amazonaws.com/google-oauth2%12345678/tlk_TMj4G1wiEGpQrdNFvrqAk/microsoft.wav?AWSAccessKeyId=AKIADED3BIK65W6FGA&Expires=167923230&Signature=BpLqGzh83cSL6DSFDSN3BE6pfc2M%3D",
    "created_at": "2023-03-22T16:38:49.723Z",
    "face": {
        "mask_confidence": -1,
        "detection": [
            224,
            198,
            484,
            553
        ],
        "overlap": "no",
        "size": 512,
        "top_left": [
            98,
            119
        ],
        "face_id": 0,
        "detect_confidence": 0.9998300075531006
    },
    "config": {
        "stitch": false,
        "pad_audio": 0,
        "align_driver": true,
        "sharpen": true,
        "auto_match": true,
        "normalization_factor": 1,
        "logo": {
            "url": "ai",
            "position": [
                0,
                0
            ]
        },
        "motion_factor": 1,
        "result_format": ".mp4",
        "fluent": false,
        "align_expand_factor": 0.3
    },
    "source_url": "https://d-id-talks-prod.s3.us-west-2.amazonaws.com/google-oauth2%12345678/tlk_TMj4G1wiEGpQrdNFvrqAk/source/image.jpeg?AWSAccessKeyId=AKIA5CUSDFDF5W6FGA&Expires=167233230&Signature=TtFFRJTg9kEryjaKA7%2BlqPLv98%3D",
    "created_by": "google-oauth2|12345678",
    "status": "done",
    "driver_url": "bank://lively/",
    "modified_at": "2023-03-22T16:39:15.603Z",
    "user_id": "google-oauth2|12345678",
    "result_url": "https://d-id-talks-prod.s3.us-west-2.amazonaws.com/google-oauth2%12345678tlk_TMj4G1wiEGpQrdNFvrqAk/image.mp4?AWSAccessKeyId=AKIA5CUMPWEREWRWW6FGA&Expires=16795234235&Signature=C1lP87Ia1ulFdsddWWEamfZADq2HA%3D",
    "id": "tlk_TMj4G1wiEGpQrdNFvrqAk",
    "duration": 2,
    "started_at": "2023-03-22T16:39:13.633"
}

The output video is located in the result_url field.

📘 Note

The output video is ready only when "status": "done".

status field lifecycle:

"status": "created" - when posting a new talks request
"status": "started" - when starting the video processing
"status": "done" - when the video is ready

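The status lifecycle suggests a simple polling loop. The sketch below is one way to write it, not an official client: `wait_for_talk` and its `get_talk` argument are hypothetical names, the interval and timeout defaults are arbitrary, and treating any status other than the three listed above as a failure is an assumption.

```python
import time


def wait_for_talk(get_talk, talk_id, interval=2.0, timeout=120.0, sleep=time.sleep):
    """Poll until the talk reaches "done" and return its result_url.

    get_talk is any callable that takes a talk ID and returns the parsed
    JSON body of GET /talks/<id> as a dict.
    """
    deadline = time.monotonic() + timeout
    while True:
        talk = get_talk(talk_id)
        status = talk.get("status")
        if status == "done":
            return talk["result_url"]
        if status not in ("created", "started"):
            # Assumption: anything outside the documented lifecycle is a failure.
            raise RuntimeError(f"talk {talk_id} ended with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"talk {talk_id} not ready after {timeout} seconds")
        sleep(interval)
```

Passing the fetch function in as an argument keeps the loop independent of any HTTP library and makes it easy to exercise with canned responses.
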
✴️ Example #2: Webhooks

Create an endpoint on your side and pass its URL in the webhook field.
Once the video is ready, the webhook endpoint is called with the same body that GET https://api.d-id.com/talks/<id> returns.

POST https://api.d-id.com/talks | Create a talk

{
    "source_url": "https://myhost.com/image.jpg",
    "script": {
        "type": "text",
        "input": "Hello world!"
    },
    "webhook": "https://myhost.com/webhook"
}

Response:

{
    "id": "tlk_TMj4G1wiEGpQrdNFvrqAk",
    "created_at": "2023-03-22T16:38:49.723Z",
    "created_by": "google-oauth2|12345678",
    "status": "created",
    "object": "talk"
}

Webhook payload, sent to your endpoint once the video is ready:

{
    "metadata": {
        "driver_url": "bank://lively/driver-02/flipped",
        "mouth_open": false,
        "num_faces": 1,
        "num_frames": 41,
        "processing_fps": 51.51385098457352,
        "resolution": [
            512,
            512
        ],
        "size_kib": 334.22265625
    },
    "audio_url": "https://d-id-talks-prod.s3.us-west-2.amazonaws.com/google-oauth2%12345678/tlk_TMj4G1wiEGpQrdNFvrqAk/microsoft.wav?AWSAccessKeyId=AKIADED3BIK65W6FGA&Expires=167923230&Signature=BpLqGzh83cSL6DSFDSN3BE6pfc2M%3D",
    "created_at": "2023-03-22T16:38:49.723Z",
    "face": {
        "mask_confidence": -1,
        "detection": [
            224,
            198,
            484,
            553
        ],
        "overlap": "no",
        "size": 512,
        "top_left": [
            98,
            119
        ],
        "face_id": 0,
        "detect_confidence": 0.9998300075531006
    },
    "config": {
        "stitch": false,
        "pad_audio": 0,
        "align_driver": true,
        "sharpen": true,
        "auto_match": true,
        "normalization_factor": 1,
        "logo": {
            "url": "ai",
            "position": [
                0,
                0
            ]
        },
        "motion_factor": 1,
        "result_format": ".mp4",
        "fluent": false,
        "align_expand_factor": 0.3
    },
    "source_url": "https://d-id-talks-prod.s3.us-west-2.amazonaws.com/google-oauth2%12345678/tlk_TMj4G1wiEGpQrdNFvrqAk/source/image.jpeg?AWSAccessKeyId=AKIA5CUSDFDF5W6FGA&Expires=167233230&Signature=TtFFRJTg9kEryjaKA7%2BlqPLv98%3D",
    "created_by": "google-oauth2|12345678",
    "status": "done",
    "driver_url": "bank://lively/",
    "modified_at": "2023-03-22T16:39:15.603Z",
    "user_id": "google-oauth2|12345678",
    "result_url": "https://d-id-talks-prod.s3.us-west-2.amazonaws.com/google-oauth2%12345678tlk_TMj4G1wiEGpQrdNFvrqAk/image.mp4?AWSAccessKeyId=AKIA5CUMPWEREWRWW6FGA&Expires=16795234235&Signature=C1lP87Ia1ulFdsddWWEamfZADq2HA%3D",
    "id": "tlk_TMj4G1wiEGpQrdNFvrqAk",
    "duration": 2,
    "started_at": "2023-03-22T16:39:13.633"
}
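
A rough sketch of a webhook receiver built on Python's standard `http.server`. It assumes the payload arrives as an HTTP POST with a JSON body shaped like the one above; the delivery method and the expected reply are not stated on this page, so both are assumptions. The names `extract_result_url`, `TalkWebhookHandler`, and `serve` are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def extract_result_url(raw_body):
    """Return result_url from a webhook payload, or None if it is not done."""
    talk = json.loads(raw_body)
    if talk.get("status") != "done":
        return None
    return talk.get("result_url")


class TalkWebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Assumption: the payload is delivered as a POST with a JSON body.
        length = int(self.headers.get("Content-Length", 0))
        result_url = extract_result_url(self.rfile.read(length))
        if result_url is not None:
            print("Video ready:", result_url)
        # Acknowledge receipt; the expected reply is not specified here.
        self.send_response(204)
        self.end_headers()


def serve(port=8000):
    """Run the receiver until interrupted (port 8000 is an arbitrary choice)."""
    HTTPServer(("", port), TalkWebhookHandler).serve_forever()
```

A production endpoint would also need to be reachable over HTTPS at the URL given in the webhook field and should handle malformed bodies, which this sketch does not.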

✴️ Example #3: Stitch

To get an output video that contains the entire input image rather than only a crop around the face area, set "stitch": true in the config object.

POST https://api.d-id.com/talks | Create a talk

{
    "source_url": "https://myhost.com/image.jpg",
    "script": {
        "type": "text",
        "input": "Hello world!"
    },
    "config": {
        "stitch": true
    }
}

Response:

{
    "id": "tlk_TMj4G1wiEGpQrdNFvrqAk",
    "created_at": "2023-03-22T16:38:49.723Z",
    "created_by": "google-oauth2|12345678",
    "status": "created",
    "object": "talk"
}

✴️ Example #4: Text to Speech

Choose different voices, languages, and styles. See the list of voices supported by each Text-to-Speech provider.

POST https://api.d-id.com/talks | Create a talk

{
    "source_url": "https://myhost.com/image.jpg",
    "script": {
        "type": "text",
        "input": "Hello world!",
        "provider": {
            "type": "microsoft",
            "voice_id": "en-US-JennyNeural",
            "voice_config": {
                "style": "Cheerful"
            }
        }
    }
}

Response:

{
    "id": "tlk_TMj4G1wiEGpQrdNFvrqAk",
    "created_at": "2023-03-22T16:38:49.723Z",
    "created_by": "google-oauth2|12345678",
    "status": "created",
    "object": "talk"
}

✴️ Example #5: Audio Script

Use an audio file instead of text.

POST https://api.d-id.com/talks | Create a talk

{
    "source_url": "https://myhost.com/image.jpg",
    "script": {
        "type": "audio",
        "audio_url": "https://path.to/audio.mp3"
    }
}

Response:

{
    "id": "tlk_TMj4G1wiEGpQrdNFvrqAk",
    "created_at": "2023-03-22T16:38:49.723Z",
    "created_by": "google-oauth2|12345678",
    "status": "created",
    "object": "talk"
}
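
Examples #4 and #5 differ only in the script object. As an illustration, the hypothetical helpers below build both variants as plain dicts; the field names are copied from the request bodies above, and nothing here validates voices or styles against the providers' lists.

```python
import json


def voice_provider(provider_type, voice_id, style=None):
    """Build the provider object used by a text script (Example #4)."""
    provider = {"type": provider_type, "voice_id": voice_id}
    if style is not None:
        provider["voice_config"] = {"style": style}
    return provider


def text_script(text, provider=None):
    """Build a text script, optionally with a specific voice provider."""
    script = {"type": "text", "input": text}
    if provider is not None:
        script["provider"] = provider
    return script


def audio_script(audio_url):
    """Build an audio script (Example #5)."""
    return {"type": "audio", "audio_url": audio_url}


def talk_body(source_url, script):
    """Combine the source image URL and a script into a request body."""
    return {"source_url": source_url, "script": script}


if __name__ == "__main__":
    jenny = voice_provider("microsoft", "en-US-JennyNeural", style="Cheerful")
    body = talk_body("https://myhost.com/image.jpg", text_script("Hello world!", jenny))
    print(json.dumps(body, indent=4))
```

Keeping the script separate from the rest of the body means a caller can switch between text and audio without touching other fields such as config or webhook.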

✴️ Example #6: Drivers

A "driver" is a video of a real human face, filmed behind the scenes, that controls the facial and head movements of the speaking output video. Several drivers are available when creating a Talks request. By default (when no driver_url field is provided in the request body), the system automatically chooses the best-matched driver for the input photo. To force a specific driver instead, for example to vary the head movements, provide one of the following drivers in the driver_url field.

{
    "source_url": "https://myhost.com/image.jpg",
    "driver_url": "bank://lively/driver-05",
    "script": {
        "type": "text",
        "input": "Hello world!"
    }
}

Supported drivers (prefix each with "bank://"):

"natural/driver-1"
"natural/driver-2"
"natural/driver-3"
"natural/driver-4"
"natural/driver-5"
"natural/driver-6"
"natural/driver-7"
"natural/driver-8"

"lively/driver-01"
"lively/driver-02"
"lively/driver-03"
"lively/driver-04"
"lively/driver-05"
"lively/driver-06"

"subtle/driver-01"
"subtle/driver-02"
"subtle/driver-03"
"subtle/driver-04"

👍 Best Practice

We strongly recommend using the default auto-matching driver mechanism (by not providing driver_url) to achieve the best results.
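
For the cases where a driver is forced anyway, a small hypothetical helper can add the "bank://" prefix and reject names that are not in the list above. The names are hard-coded from this page as an assumption; the actual set of drivers may change, and the helper leaves driver_url out entirely when no driver is requested, in line with the recommendation.

```python
# Driver names exactly as listed on this page, grouped by family.
DRIVERS = {
    "natural": [f"driver-{n}" for n in range(1, 9)],
    "lively": [f"driver-{n:02d}" for n in range(1, 7)],
    "subtle": [f"driver-{n:02d}" for n in range(1, 5)],
}


def driver_url(family, name):
    """Return the bank:// URL for a listed driver, or raise ValueError."""
    if name not in DRIVERS.get(family, []):
        raise ValueError(f"unknown driver: {family}/{name}")
    return f"bank://{family}/{name}"


def with_driver(body, family=None, name=None):
    """Return a copy of the request body, optionally with a forced driver."""
    result = dict(body)
    if family is not None:
        result["driver_url"] = driver_url(family, name)
    return result
```

For example, `with_driver(body, "lively", "driver-05")` adds the same driver_url shown in the request above, while `with_driver(body)` returns the body unchanged so that auto-matching applies.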


✴️ Example #7: Expressions

To apply an expression to your avatar, add a driver_expressions parameter under the config object of the API request body. See the driver expressions documentation for details.

[Figure: standard result (neutral expression) alongside results with different facial expressions: Neutral, Happy, Surprise, Serious]

✴️ Video Tutorial

[Video: D-ID's API - Talks Endpoint, Live Coding Session]


✴️ Support


Have any questions? We are here to help! Please leave your question in the Discussions section and we will be happy to answer shortly.
