Photo Avatar Quickstart

Create talking head videos from a photo and text

Generate a realistic video of a human presenter from a single image and a text script. The Talks endpoint turns a photo into a speaking avatar.

Create a video

Send a POST request with your image URL and the text you want the avatar to speak.

curl -X POST "https://api.d-id.com/talks" \
  -H "Authorization: Basic <YOUR KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "source_url": "https://create-images-results.d-id.com/api_docs/assets/noelle.jpeg",
    "script": {
      "type": "text",
      "input": "Hello! This is my first talking avatar video."
    }
  }'

Example response:

{
  "id": "tlk_abc123",
  "created_at": "2024-01-15T10:30:00.000Z",
  "status": "created",
  "object": "talk"
}

Save the id from the response — you'll need it to check the status and retrieve the video.
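The request above can be wrapped in a small script that builds the JSON body and captures the id in one step. This is a minimal sketch, not an official client: the function names and the DID_API_KEY environment variable are hypothetical, and the sed-based field extraction is a stand-in for a real JSON parser.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and DID_API_KEY are hypothetical, not part of the API.

# Escape backslashes and double quotes so a value can sit inside a JSON string.
# Newlines and other control characters are NOT handled here.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build the request body shown above from an image URL ($1) and the text to speak ($2).
build_talk_payload() {
  local image_url text
  image_url=$(json_escape "$1")
  text=$(json_escape "$2")
  printf '{"source_url": "%s", "script": {"type": "text", "input": "%s"}}' \
    "$image_url" "$text"
}

# Read JSON on stdin and print the string value of field $1.
# A sed stand-in for a real JSON parser; adequate for flat responses like the one above.
json_field() {
  sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p" | head -n 1
}

# POST the payload to the Talks endpoint and print the new talk's id.
create_talk() {
  build_talk_payload "$1" "$2" |
    curl -sS -X POST "https://api.d-id.com/talks" \
      -H "Authorization: Basic ${DID_API_KEY}" \
      -H "Content-Type: application/json" \
      -d @- |
    json_field id
}
```

With the key exported, a call such as talk_id=$(create_talk "<image URL>" "Hello!") would leave the id in a variable for the later steps. Building the body through json_escape keeps quotes in the spoken text from breaking the JSON.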

Check the video status

Poll the GET /talks/<id> endpoint until the status changes to done. Processing typically takes 10 to 30 seconds.

curl -X GET "https://api.d-id.com/talks/tlk_abc123" \
  -H "Authorization: Basic <YOUR KEY>"

Example response:

{
  "id": "tlk_abc123",
  "status": "done",
  "result_url": "https://result.d-id.com/.../video.mp4"
}
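The polling step can be sketched as a bounded loop that repeats the GET request until the status is done or a retry budget runs out. The function names, the DID_API_KEY variable, and the defaults (30 attempts, 2 seconds apart) are assumptions chosen for illustration. Only the created and done statuses appear in the responses above, so this sketch treats anything other than done as "keep waiting".

```shell
#!/usr/bin/env bash
# Sketch only: function names, DID_API_KEY, and retry defaults are hypothetical.

# Fetch the current talk JSON for talk id $1.
fetch_talk() {
  curl -sS -X GET "https://api.d-id.com/talks/$1" \
    -H "Authorization: Basic ${DID_API_KEY}"
}

# Read talk JSON on stdin and print its status (sed stand-in for a JSON parser).
talk_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Poll talk $1 up to $2 times (default 30), sleeping $3 seconds (default 2)
# between attempts. Prints the final talk JSON and returns 0 once status is
# done; returns 1 if the request fails or the attempts run out.
wait_for_talk() {
  local talk_id="$1" max_attempts="${2:-30}" delay="${3:-2}"
  local attempt=1 body status
  while [ "$attempt" -le "$max_attempts" ]; do
    body=$(fetch_talk "$talk_id") || return 1
    status=$(printf '%s\n' "$body" | talk_status)
    if [ "$status" = "done" ]; then
      printf '%s\n' "$body"
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  echo "talk $talk_id not done after $max_attempts attempts" >&2
  return 1
}
```

The default budget of 30 attempts at 2-second intervals gives roughly a minute, which leaves headroom over the typical processing time. Capping the attempts keeps a stuck job from looping forever.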

Get your video

Once the status is done, the result_url field contains a direct link to your generated video. Download or stream the video from this URL.

curl -O "https://result.d-id.com/.../video.mp4"

The video URL is valid for 24 hours. To keep the video longer, download and store it yourself, or re-fetch the talk to get a fresh URL.
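Because the URL expires, one option is to read result_url from a freshly fetched talk response and save the file under a name you control, rather than hard-coding the link. The sketch below assumes the talk JSON is already in hand; the function names are hypothetical and the sed extraction stands in for a real JSON parser.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical.

# Read talk JSON on stdin and print its result_url (sed stand-in for a JSON parser).
talk_result_url() {
  sed -n 's/.*"result_url"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Download the video named in the talk JSON ($1) to a local file ($2).
save_talk_video() {
  local talk_json="$1" dest="$2" url
  url=$(printf '%s\n' "$talk_json" | talk_result_url)
  if [ -z "$url" ]; then
    echo "no result_url in talk response; is the status done?" >&2
    return 1
  fi
  # -f fails on HTTP errors instead of saving an error page; -L follows redirects.
  curl -sS -f -L -o "$dest" "$url"
}
```

If a stored URL has expired, repeating the GET request for the talk and passing the new response to save_talk_video would pick up the fresh link.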