Talks Overview

Create talking-head videos from just text or audio to make business content more cost-effective, engaging, and human. Speaking Portrait (the Talks endpoint) lets you create a realistic video of a human presenter without any video production: provide an image and either text or an audio file, and our AI-based reenactment technology generates the video automatically. Use it to turn articles, training materials, corporate communications, and product marketing materials into videos at scale, without costly productions or studios.

Interface

Input
Photo URL + Text or Audio file URL

Output
Video URL


Example #1: Default Call

POST https://api.d-id.com/talks | Create a talk

{
    "source_url": "https://myhost.com/image.jpg",
    "script": {
        "type": "text",
        "input": "Hello world!"
    }
}

{
    "id": "tlk_TMj4G1wiEGpQrdNFvrqAk",
    "created_at": "2023-03-22T16:38:49.723Z",
    "created_by": "google-oauth2|12345678",
    "status": "created",
    "object": "talk"
}
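The default call can be reproduced from code. The following is a minimal sketch using only the Python standard library; it builds the same request body as above. This page does not describe authentication, so `auth_header` is a hypothetical placeholder for whatever Authorization value your account uses, and `build_create_talk_request` / `create_talk` are illustrative helper names, not part of any D-ID SDK.

```python
import json
import urllib.request

API_BASE = "https://api.d-id.com"  # base URL taken from the examples on this page


def build_create_talk_request(source_url: str, text: str, auth_header: str) -> urllib.request.Request:
    """Build (but do not send) a POST /talks request for the default call.

    `auth_header` is a hypothetical placeholder: this page does not say how
    requests are authenticated.
    """
    payload = {
        "source_url": source_url,
        "script": {"type": "text", "input": text},
    }
    return urllib.request.Request(
        url=f"{API_BASE}/talks",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": auth_header,
        },
    )


def create_talk(req: urllib.request.Request) -> str:
    """Send the request and return the new talk's id (e.g. "tlk_...")."""
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["id"]
```

Building the request separately from sending it keeps the payload easy to inspect before any network call is made.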

GET https://api.d-id.com/talks/<id> | Get a specific talk

This request has no body. A response for a finished talk looks like this:

{
    "metadata": {
        "driver_url": "bank://lively/driver-02/flipped",
        "mouth_open": false,
        "num_faces": 1,
        "num_frames": 41,
        "processing_fps": 51.51385098457352,
        "resolution": [
            512,
            512
        ],
        "size_kib": 334.22265625
    },
    "audio_url": "https://d-id-talks-prod.s3.us-west-2.amazonaws.com/google-oauth2%12345678/tlk_TMj4G1wiEGpQrdNFvrqAk/microsoft.wav?AWSAccessKeyId=AKIADED3BIK65W6FGA&Expires=167923230&Signature=BpLqGzh83cSL6DSFDSN3BE6pfc2M%3D",
    "created_at": "2023-03-22T16:38:49.723Z",
    "face": {
        "mask_confidence": -1,
        "detection": [
            224,
            198,
            484,
            553
        ],
        "overlap": "no",
        "size": 512,
        "top_left": [
            98,
            119
        ],
        "face_id": 0,
        "detect_confidence": 0.9998300075531006
    },
    "config": {
        "stitch": false,
        "pad_audio": 0,
        "align_driver": true,
        "sharpen": true,
        "auto_match": true,
        "normalization_factor": 1,
        "logo": {
            "url": "ai",
            "position": [
                0,
                0
            ]
        },
        "motion_factor": 1,
        "result_format": ".mp4",
        "fluent": false,
        "align_expand_factor": 0.3
    },
    "source_url": "https://d-id-talks-prod.s3.us-west-2.amazonaws.com/google-oauth2%12345678/tlk_TMj4G1wiEGpQrdNFvrqAk/source/image.jpeg?AWSAccessKeyId=AKIA5CUSDFDF5W6FGA&Expires=167233230&Signature=TtFFRJTg9kEryjaKA7%2BlqPLv98%3D",
    "created_by": "google-oauth2|12345678",
    "status": "done",
    "driver_url": "bank://lively/",
    "modified_at": "2023-03-22T16:39:15.603Z",
    "user_id": "google-oauth2|12345678",
    "result_url": "https://d-id-talks-prod.s3.us-west-2.amazonaws.com/google-oauth2%12345678/tlk_TMj4G1wiEGpQrdNFvrqAk/image.mp4?AWSAccessKeyId=AKIA5CUMPWEREWRWW6FGA&Expires=16795234235&Signature=C1lP87Ia1ulFdsddWWEamfZADq2HA%3D",
    "id": "tlk_TMj4G1wiEGpQrdNFvrqAk",
    "duration": 2,
    "started_at": "2023-03-22T16:39:13.633"
}

The output video is located in the result_url field.

📘
Note
The output video is ready only when "status": "done".
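Because the video is only usable once the status is "done", client code should check the status before reading result_url. A minimal sketch, assuming the GET response has already been parsed into a Python dict; `result_url_if_ready` is a hypothetical helper name.

```python
from typing import Optional


def result_url_if_ready(talk: dict) -> Optional[str]:
    """Return the output video URL from a GET /talks/<id> body, or None.

    Only trusts result_url once "status" is "done", as the note above requires.
    """
    if talk.get("status") != "done":
        return None
    return talk.get("result_url")
```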

Lifecycle of the status field:

`"status": "created"`: the new talks request has been posted
`"status": "started"`: video processing has started
`"status": "done"`: the video is ready
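The lifecycle above suggests a simple polling loop over GET /talks/<id>. This is a sketch under stated assumptions: `fetch_talk` is a hypothetical callable you supply that performs the GET and returns the parsed JSON, the default interval and timeout are arbitrary (this page gives no guidance on them), and any status other than the three listed is treated as an error because this page does not document others.

```python
import time
from typing import Callable


def wait_for_talk(
    talk_id: str,
    fetch_talk: Callable[[str], dict],
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the talk's status is "done" and return that response body."""
    deadline = time.monotonic() + timeout_s
    while True:
        talk = fetch_talk(talk_id)
        status = talk.get("status")
        if status == "done":
            return talk
        # "created" and "started" mean the video is still on its way.
        if status not in ("created", "started"):
            raise RuntimeError(f"talk {talk_id} has unexpected status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"talk {talk_id} not done after {timeout_s} s")
        sleep(interval_s)
```

Injecting `fetch_talk` and `sleep` keeps the loop independent of any HTTP client and makes it testable without network access. For long-running jobs, the webhook approach in Example #2 avoids polling altogether.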

Example #2: Webhooks

Create an endpoint on your side and pass its URL in the webhook field.
Once the video is ready, that endpoint is triggered with the same body that GET https://api.d-id.com/talks/<id> returns (see Example #1).

POST https://api.d-id.com/talks | Create a talk

{
    "source_url": "https://myhost.com/image.jpg",
    "script": {
        "type": "text",
        "input": "Hello world!"
    },
    "webhook": "https://myhost.com/webhook"
}

{
    "id": "tlk_TMj4G1wiEGpQrdNFvrqAk",
    "created_at": "2023-03-22T16:38:49.723Z",
    "created_by": "google-oauth2|12345678",
    "status": "created",
    "object": "talk"
}
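On the receiving side, the webhook endpoint only needs to parse the talk body and read result_url. A minimal standard-library sketch, assuming the callback arrives as an HTTP POST with a JSON body (this page only says the endpoint is "triggered"); the handler class, `parse_webhook_body`, and the port are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional


def parse_webhook_body(raw: bytes) -> Optional[str]:
    """Return result_url from a webhook payload, or None if the talk is not done."""
    talk = json.loads(raw)
    if talk.get("status") != "done":
        return None
    return talk.get("result_url")


class TalkWebhookHandler(BaseHTTPRequestHandler):
    """Receives the talk body posted to the URL given in the "webhook" field."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        url = parse_webhook_body(self.rfile.read(length))
        print("talk ready:", url)
        # Acknowledge receipt with an empty 200 response.
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Run the receiver on a hypothetical local port until interrupted."""
    HTTPServer(("", port), TalkWebhookHandler).serve_forever()
```

Calling `serve()` starts the listener; the endpoint must be publicly reachable for the API to call it, so a local server like this would normally sit behind a public URL.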


Example #3: Stitch

To get an output video that contains the entire input image rather than only a cropped video around the face area, set "stitch": true in the config object.

POST https://api.d-id.com/talks | Create a talk

{
    "source_url": "https://myhost.com/image.jpg",
    "script": {
        "type": "text",
        "input": "Hello world!"
    },
    "config": {
        "stitch": true
    }
}

{
    "id": "tlk_TMj4G1wiEGpQrdNFvrqAk",
    "created_at": "2023-03-22T16:38:49.723Z",
    "created_by": "google-oauth2|12345678",
    "status": "created",
    "object": "talk"
}
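When stitch is combined with other settings, it helps to merge it into an existing config object rather than overwrite it. A minimal sketch; `with_stitch` is a hypothetical helper that works on any request body dict.

```python
import copy


def with_stitch(payload: dict) -> dict:
    """Return a copy of a /talks request body with config.stitch set to true.

    Keys already present under "config" are kept; the input is not modified.
    """
    result = copy.deepcopy(payload)
    result.setdefault("config", {})["stitch"] = True
    return result
```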

Example #4: Text to Speech

Choose different voices, languages, and styles. See the list of voices supported by each Text-to-Speech provider.

POST https://api.d-id.com/talks | Create a talk

{
    "source_url": "https://myhost.com/image.jpg",
    "script": {
        "type": "text",
        "input": "Hello world!",
        "provider": {
            "type": "microsoft",
            "voice_id": "en-US-JennyNeural",
            "voice_config": {
                "style": "Cheerful"
            }
        }
    }
}

{
    "id": "tlk_TMj4G1wiEGpQrdNFvrqAk",
    "created_at": "2023-03-22T16:38:49.723Z",
    "created_by": "google-oauth2|12345678",
    "status": "created",
    "object": "talk"
}
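The script object above can be generated from a few parameters. A minimal sketch that mirrors the example's field names; `tts_script` is a hypothetical helper, and which styles are accepted may depend on the provider and the chosen voice.

```python
from typing import Optional


def tts_script(text: str, voice_id: str, provider: str = "microsoft", style: Optional[str] = None) -> dict:
    """Build a text "script" object with an explicit Text-to-Speech voice.

    voice_config is only included when a style is given.
    """
    provider_obj: dict = {"type": provider, "voice_id": voice_id}
    if style is not None:
        provider_obj["voice_config"] = {"style": style}
    return {"type": "text", "input": text, "provider": provider_obj}
```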

Example #5: Audio Script

Use an audio file instead of text as the script.

POST https://api.d-id.com/talks | Create a talk

{
    "source_url": "https://myhost.com/image.jpg",
    "script": {
        "type": "audio",
        "audio_url": "https://path.to/audio.mp3"
    }
}

{
    "id": "tlk_TMj4G1wiEGpQrdNFvrqAk",
    "created_at": "2023-03-22T16:38:49.723Z",
    "created_by": "google-oauth2|12345678",
    "status": "created",
    "object": "talk"
}
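Since a talk is driven by either text or an audio file, a client can enforce that exactly one is supplied when building the script. A minimal sketch; `make_script` is a hypothetical helper using the field names from Examples #1 and #5.

```python
from typing import Optional


def make_script(text: Optional[str] = None, audio_url: Optional[str] = None) -> dict:
    """Build a "script" object from exactly one of text or an audio file URL."""
    if (text is None) == (audio_url is None):
        raise ValueError("provide exactly one of text or audio_url")
    if audio_url is not None:
        return {"type": "audio", "audio_url": audio_url}
    return {"type": "text", "input": text}
```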

Example #6: Drivers

A "driver" is a video of a real human face, filmed behind the scenes, that controls the facial and head movements in the output video. Several drivers are available for Talks requests. By default (when the request body has no driver_url field), the system automatically chooses the driver that best matches the input photo. To force a specific driver instead, for example to vary the head movements, set the driver_url field to one of the following drivers.

POST https://api.d-id.com/talks | Create a talk

{
    "source_url": "https://myhost.com/image.jpg",
    "driver_url": "bank://lively/driver-05",
    "script": {
        "type": "text",
        "input": "Hello world!"
    }
}

Available drivers (prefix each name with "bank://", e.g. "bank://natural/driver-1"):
"natural/driver-1"
"natural/driver-2"
"natural/driver-3"
"natural/driver-4"
"natural/driver-5"
"natural/driver-6"
"natural/driver-7"
"natural/driver-8"

"lively/driver-01"
"lively/driver-02"
"lively/driver-03"
"lively/driver-04"
"lively/driver-05"
"lively/driver-06"

"subtle/driver-01"
"subtle/driver-02"
"subtle/driver-03"
"subtle/driver-04"
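The driver list above can be kept as data so that typos in driver names are caught before a request is sent. A minimal sketch; `DRIVERS` and `driver_url` are hypothetical names, and the list mirrors this page, so it may fall out of date if drivers are added or removed.

```python
# Driver names as listed on this page (note: "natural" is not zero-padded).
DRIVERS = (
    [f"natural/driver-{i}" for i in range(1, 9)]
    + [f"lively/driver-{i:02d}" for i in range(1, 7)]
    + [f"subtle/driver-{i:02d}" for i in range(1, 5)]
)


def driver_url(name: str) -> str:
    """Turn a driver name like "lively/driver-05" into a driver_url value."""
    if name not in DRIVERS:
        raise ValueError(f"unknown driver {name!r}")
    return "bank://" + name
```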

👍
Best Practice
For best results, we strongly recommend using the default auto-matching driver mechanism (that is, not providing driver_url).

Example #7: Expressions

To apply an expression to your avatar, add a driver_expressions parameter under the config object of the request body.

[Figure: Standard result with a neutral expression, compared with results using different facial expressions]
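This page names driver_expressions but not its internal structure. The sketch below assumes a nested shape of {"expressions": [{"start_frame", "expression", "intensity"}]}; that shape is an ASSUMPTION not stated here, so verify it against the Expressions reference before use. `with_expression` is a hypothetical helper.

```python
import copy


def with_expression(payload: dict, expression: str, intensity: float = 1.0, start_frame: int = 0) -> dict:
    """Return a copy of a /talks body with one expression under config.driver_expressions.

    ASSUMPTION: the nested "expressions" list and its keys are not documented
    on this page.
    """
    result = copy.deepcopy(payload)
    config = result.setdefault("config", {})
    config["driver_expressions"] = {
        "expressions": [
            {"start_frame": start_frame, "expression": expression, "intensity": intensity}
        ]
    }
    return result
```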


Support

Have any questions? We're here to help! Go to the Help Center or send us a message.
