Custom LLMs
How to implement and integrate custom LLMs in agents
Comprehensive guide for integrating your own LLM implementation with D-ID agents. Learn how to create an OpenAI-compatible endpoint, handle authentication, optimize for production, and implement streaming responses.
Note: LLMs should stream tokens to minimize latency between turns.
Overview
Custom LLM integration enables you to:
- Use proprietary or fine-tuned models with D-ID agents
- Route requests through your own infrastructure
- Maintain full control over model selection and parameters
Example code: You can start from our reference implementation on GitHub.
Responsibilities
D-ID handles:
- Securely storing and encrypting your API keys/credentials
- Sending properly formatted requests to your endpoint
- Handling streaming and non-streaming responses
- Including metadata headers for context

You implement:
- An OpenAI-compatible endpoint
- Request authentication via API key or OAuth2
- Low latency (TTFT 200–500ms, under 1000ms p95)
- Infrastructure scaled for production load
Usage
Implement your LLM endpoint
Create an API endpoint that accepts POST requests with D-ID's message format and returns OpenAI-compatible streaming responses.
Request format D-ID sends:
{
"messages": [
{
"role": "user",
"content": "Hello, how are you?",
"created_at": "2025-01-30T12:51:16.946Z"
},
{
"role": "assistant",
"content": "I'm doing great, thanks!",
"created_at": "2025-01-30T12:51:18.123Z"
}
],
"options": {
"description": "Optional context or metadata"
},
"stream": true
}

Streaming response format your endpoint should return (Server-Sent Events):

data: {"id":"id1","created":1738183028,"choices":[{"delta":{"content":"Hello"}}]}
data: {"id":"id2","created":1738183029,"choices":[{"delta":{"content":" there"}}]}
data: {"id":"id3","created":1738183030,"choices":[{"delta":{"content":"!"}}]}
Non-streaming response format:

{
"content": "Hello there!"
}

Error response format:
{
"error": {
"message": "Detailed error message",
"code": "401",
"type": "Unauthorized",
"status": 401
}
}

Latency targets (streaming):
- Ideal: 200–500ms first token for great UX
- Acceptable: 500–1000ms (noticeable pause, still usable)
- Poor: >1000ms (conversation feels "laggy")
- Tip: Measure p50/p95 TTFT, not just average
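To make the formats above concrete, here is a minimal sketch of a conforming endpoint. It assumes Node and Express (an assumption; any stack works) and streams a canned reply token by token; in a real implementation you would generate tokens from your model using the incoming messages.

import express from 'express';

const app = express();
app.use(express.json());

app.post('/llm', (req, res) => {
  // Reject requests that don't carry the expected API key (the "basic" auth type)
  if (req.headers['x-api-key'] !== process.env.EXPECTED_API_KEY) {
    return res.status(401).json({
      error: { message: 'Invalid API key', code: '401', type: 'Unauthorized', status: 401 },
    });
  }

  const { messages, stream } = req.body;
  // Canned reply standing in for a real model call on `messages`
  const tokens = ['Hello', ' there', '!'];

  if (!stream) {
    // Non-streaming shape: the full reply in one JSON object
    return res.json({ content: tokens.join('') });
  }

  // Streaming shape: Server-Sent Events, one delta per token
  res.setHeader('Content-Type', 'text/event-stream');
  tokens.forEach((content, i) => {
    const chunk = {
      id: `id${i + 1}`,
      created: Math.floor(Date.now() / 1000),
      choices: [{ delta: { content } }],
    };
    res.write(`data: ${JSON.stringify(chunk)}\n\n`);
  });
  res.write('data: [DONE]\n\n'); // OpenAI-style stream terminator
  res.end();
});

app.listen(3000);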
Secure your endpoint
Your endpoint will receive credentials via request headers. Choose your authentication method:

API key (basic):

// Validate the API key from headers
const apiKey = request.headers['x-api-key'];
if (apiKey !== process.env.EXPECTED_API_KEY) {
return { statusCode: 401, body: JSON.stringify({ error: 'Unauthorized' }) };
}

OAuth2:

// Validate the OAuth2 access token
const authHeader = request.headers['authorization'];
const token = authHeader?.replace('Bearer ', '');
// Verify the token with your OAuth2 provider (see the sketch below)
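If your OAuth2 provider issues JWT access tokens, verification might look like the sketch below, using the jose package against your provider's JWKS endpoint. The URLs are placeholders, and whether your tokens are JWTs at all is an assumption; opaque tokens would instead be checked via your provider's introspection endpoint.

import { createRemoteJWKSet, jwtVerify } from 'jose';

// Cache the JWKS across requests (placeholder URL)
const jwks = createRemoteJWKSet(
  new URL('https://your-auth.example.com/.well-known/jwks.json')
);

async function verifyToken(token) {
  // Throws if the signature, issuer, or expiry check fails
  const { payload } = await jwtVerify(token, jwks, {
    issuer: 'https://your-auth.example.com',
  });
  return payload;
}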
D-ID metadata headers:

D-ID includes these headers with every request for logging and context:

- X-DID-AGENT-ID: The agent making the request
- X-DID-DISTINCT-ID: Unique client identifier
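Since these arrive as ordinary request headers (lowercased by Node), picking them up for per-agent logging is straightforward (sketch):

// Read D-ID's metadata headers for logging/analytics
const agentId = request.headers['x-did-agent-id'];
const distinctId = request.headers['x-did-distinct-id'];
console.log(`request from agent=${agentId}, client=${distinctId}`);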
Create or update agent with custom LLM
Configure your agent to use your custom LLM endpoint.

With API key authentication (basic):

curl -X POST "https://api.d-id.com/agents" \
-H "Authorization: Basic <YOUR KEY>" \
-H "Content-Type: application/json" \
-d '{
"preview_name": "Custom LLM Agent",
"presenter": {
"type": "clip",
"presenter_id": "v2_public_Amber@0zSz8kflCN",
"voice": {
"type": "microsoft",
"voice_id": "en-US-JennyMultilingualV2Neural"
}
},
"llm": {
"provider": "custom",
"instructions": "You are a helpful assistant",
"custom": {
"type": "basic",
"url": "https://your-api.example.com/llm",
"key": "your-secret-api-key",
"streaming": true,
"max_messages": 20
}
}
}'

With OAuth2 authentication:

curl -X POST "https://api.d-id.com/agents" \
-H "Authorization: Basic <YOUR KEY>" \
-H "Content-Type: application/json" \
-d '{
"preview_name": "Custom LLM Agent",
"presenter": {
"type": "clip",
"presenter_id": "v2_public_Amber@0zSz8kflCN",
"voice": {
"type": "microsoft",
"voice_id": "en-US-JennyMultilingualV2Neural"
}
},
"llm": {
"provider": "custom",
"instructions": "You are a helpful assistant",
"custom": {
"type": "oauth2",
"url": "https://your-api.example.com/llm",
"token_url": "https://your-auth.example.com/oauth2/token",
"client_id": "your-client-id",
"client_secret": "your-client-secret",
"streaming": true,
"max_messages": 20
}
}
}'

Response:

{
"id": "agt_xyz789",
"preview_name": "Custom LLM Agent",
"status": "created",
"llm": {
"provider": "custom",
"custom": {
"type": "basic",
"url": "https://your-api.example.com/llm",
"streaming": true
}
}
}
Security: Your API key and OAuth2 credentials are encrypted and securely stored by D-ID. They are never exposed in API responses.
Test your integration
Use the Agent Session quickstart to create a session with your custom LLM agent and start chatting.
Debugging tips:
- Monitor your endpoint logs for incoming requests
- Measure time to first token (aim for 200–500ms p50, <1000ms p95)
- Verify response format matches OpenAI structure
- Use max_messages to control the conversation history size sent to your LLM
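One way to spot-check TTFT from the outside is a small script that posts a message to your endpoint and times the first streamed chunk (a sketch; the URL and key are placeholders):

// Measure time to first token against your endpoint (Node 18+, built-in fetch)
const start = Date.now();
const res = await fetch('https://your-api.example.com/llm', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json', 'x-api-key': process.env.EXPECTED_API_KEY },
  body: JSON.stringify({
    messages: [{ role: 'user', content: 'Hello, how are you?' }],
    stream: true,
  }),
});

// The first chunk of the SSE body marks the first token
const reader = res.body.getReader();
await reader.read();
console.log(`TTFT: ${Date.now() - start}ms`);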
Before going to production:
- Ensure "streaming": true is enabled for optimal latency
- Load test your endpoint to ensure it can handle concurrent requests
- Monitor time to first token metrics (p50, p95, p99)
- Ensure p95 TTFT is below 1000ms for good user experience
Custom LLM Configuration Options
| Field | Type | Required | Description |
|---|---|---|---|
| type | string | ✓ Yes | Authentication type: "basic" or "oauth2" |
| url | string | ✓ Yes | Your LLM endpoint URL that receives POST requests |
| streaming | boolean | Optional | Enable streaming responses for production (low latency). Default: false |
| max_messages | number | Optional | Maximum number of conversation history messages sent to your LLM. Helps control context size and costs |
| headers | object | Optional | Additional custom headers to include in requests |
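For example, the headers field could carry a routing or tenant header to your infrastructure (the header name and value here are illustrative):

"custom": {
  "type": "basic",
  "url": "https://your-api.example.com/llm",
  "key": "your-secret-api-key",
  "streaming": true,
  "headers": {
    "X-Tenant-Id": "acme-prod"
  }
}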
LLM Behavior (Optional)
Custom LLMs support the same behavior configuration as OpenAI:
| Field | Type | Description |
|---|---|---|
| instructions | string | Defines what the Agent does and how it should behave |
| prompt_customization | object | Advanced prompt configuration (role, personality, topics_to_avoid, max_response_length) |
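Combined with a custom provider, a behavior block might look like the sketch below. The field names come from the table above; the value types and contents are illustrative assumptions:

"llm": {
  "provider": "custom",
  "instructions": "You are a helpful assistant",
  "prompt_customization": {
    "role": "Support agent",
    "personality": "Friendly and concise",
    "topics_to_avoid": ["pricing", "legal advice"],
    "max_response_length": 200
  }
}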
Authentication
Basic
| Field | Type | Required | Description |
|---|---|---|---|
| key | string | ✓ Yes | API key sent in the x-api-key header |
OAuth2
| Field | Type | Required | Description |
|---|---|---|---|
| token_url | string | ✓ Yes | Token endpoint for the client credentials grant |
| client_id | string | ✓ Yes | Client identifier |
| client_secret | string | ✓ Yes | Client secret |
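D-ID obtains tokens from your token_url via the client credentials grant. To sanity-check that endpoint yourself, a standard client_credentials request looks like this sketch (URL and credentials are placeholders):

// Standard OAuth2 client_credentials token request (Node 18+ fetch)
const res = await fetch('https://your-auth.example.com/oauth2/token', {
  method: 'POST',
  headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
  body: new URLSearchParams({
    grant_type: 'client_credentials',
    client_id: 'your-client-id',
    client_secret: 'your-client-secret',
  }),
});
const { access_token } = await res.json();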
FAQ
Is streaming required?

While streaming is optional, it's strongly recommended for production systems because it enables progressive response delivery, which is critical for conversational AI:
- Lower perceived latency: Users see responses immediately as they're generated
- Better UX: Natural conversation flow without long pauses
- Faster engagement: Aim for 200–500ms time to first token
Non-streaming mode is useful for debugging and development, but production systems should always use "streaming": true for the best user experience.
What is time to first token (TTFT)?

Time to first token is the delay between receiving a user message and generating the first word of the response.
- Ideal: 200–500ms (users perceive it as responsive)
- Acceptable: 500–1000ms (noticeable pause, still usable)
- Poor: >1000ms (conversation feels "laggy", users disengage)
Important: Measure p50 and p95 TTFT, not just average. Your p95 should stay below 1000ms.
Optimize your model inference, use caching, and minimize network latency to improve TTFT.
Does my endpoint have to follow OpenAI's format?

Yes, your endpoint must follow OpenAI's message format exactly:
- Message structure: array of {role, content} objects
- Roles: user, assistant, system
- Streaming format: data: {json}\n\n with choices[].delta.content
- Non-streaming format: simple {content: string} response
This ensures compatibility with D-ID's conversation management and allows seamless switching between providers.
Can I use instructions and behavior configuration with a custom LLM?

Yes! Custom LLMs support the same configuration fields as OpenAI. In the agent configuration:
{
"llm": {
"provider": "custom",
"instructions": "You are a helpful assistant",
"custom": {
"type": "basic",
"url": "https://your-api.example.com/llm",
"key": "your-key",
"streaming": true
}
}
}

D-ID will include these in system prompts. You can also pass custom parameters via the options field in requests and handle them on your endpoint side.
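On the endpoint side, reading those parameters is just a matter of picking options off the request body (sketch; any fields beyond the documented description are yours to define):

// Inside your POST handler: read D-ID's options payload
const { messages, options = {}, stream } = req.body;
// "description" is documented above; handle any custom fields you add alongside it
console.log('context:', options.description);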