Custom LLMs

How to implement and integrate custom LLMs in agents

Comprehensive guide for integrating your own LLM implementation with D-ID agents. Learn how to create an OpenAI-compatible endpoint, handle authentication, optimize for production, and implement streaming responses.

Note: LLMs should stream tokens to minimize latency between turns.

Overview

Custom LLM integration enables you to:

  • Use proprietary or fine-tuned models with D-ID agents
  • Route requests through your own infrastructure
  • Maintain full control over model selection and parameters
📘 Example Code

You can start from our reference implementation on GitHub.

Responsibilities

D-ID's Responsibility

  • Securely store and encrypt your API keys/credentials
  • Send properly formatted requests to your endpoint
  • Handle streaming and non-streaming responses
  • Include metadata headers for context

Your Responsibility

  • Implement an OpenAI-compatible endpoint
  • Authenticate requests via API key or OAuth2
  • Optimize for low latency (TTFT 200–500ms, under 1000ms p95)
  • Scale infrastructure for production load

Usage

Implement your LLM endpoint

Create an API endpoint that accepts POST requests with D-ID's message format and returns OpenAI-compatible streaming responses.

Request format D-ID sends:

{
  "messages": [
    {
      "role": "user",
      "content": "Hello, how are you?",
      "created_at": "2025-01-30T12:51:16.946Z"
    },
    {
      "role": "assistant",
      "content": "I'm doing great, thanks!",
      "created_at": "2025-01-30T12:51:18.123Z"
    }
  ],
  "options": {
    "description": "Optional context or metadata"
  },
  "stream": true
}
Streaming response format your endpoint should return (Server-Sent Events, one data: line per chunk):

data: {"id":"id1","created":1738183028,"choices":[{"delta":{"content":"Hello"}}]}

data: {"id":"id2","created":1738183029,"choices":[{"delta":{"content":" there"}}]}

data: {"id":"id3","created":1738183030,"choices":[{"delta":{"content":"!"}}]}

Non-streaming response format:

{
  "content": "Hello there!"
}
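
Putting these pieces together, here is a minimal sketch of an endpoint that accepts the request format above and produces both response formats. It assumes Node with Express; runModel is a placeholder for your own inference code (an async iterable yielding text chunks):

const express = require('express');
const app = express();
app.use(express.json());

app.post('/llm', async (req, res) => {
  const { messages, stream } = req.body;

  // Placeholder: replace with your own model inference.
  // Expected to be an async iterable of text chunks.
  const tokens = runModel(messages);

  if (stream) {
    // OpenAI-compatible SSE: one `data:` line per chunk
    res.setHeader('Content-Type', 'text/event-stream');
    let i = 0;
    for await (const token of tokens) {
      const chunk = {
        id: `chunk-${i++}`,
        created: Math.floor(Date.now() / 1000),
        choices: [{ delta: { content: token } }],
      };
      res.write(`data: ${JSON.stringify(chunk)}\n\n`);
    }
    res.write('data: [DONE]\n\n'); // OpenAI-style stream terminator
    res.end();
  } else {
    // Non-streaming: accumulate everything into a single response
    let content = '';
    for await (const token of tokens) content += token;
    res.json({ content });
  }
});

app.listen(3000);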

Error response format:

{
  "error": {
    "message": "Detailed error message",
    "code": "401",
    "type": "Unauthorized",
    "status": 401
  }
}
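
In an Express handler, that error shape might be returned like this (a sketch, using the format documented above):

// Reject unauthenticated requests using the documented error shape
res.status(401).json({
  error: {
    message: 'Invalid or missing API key',
    code: '401',
    type: 'Unauthorized',
    status: 401,
  },
});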

Latency targets (streaming):

  • Ideal: 200–500ms first token for great UX
  • Acceptable: 500–1000ms (noticeable pause, still usable)
  • Poor: >1000ms (conversation feels "laggy")
  • Tip: Measure p50/p95 TTFT, not just average

Secure your endpoint

Your endpoint will receive authentication via headers. Choose your authentication method:

API key (basic):

// Validate the API key from headers
const apiKey = request.headers['x-api-key'];
if (apiKey !== process.env.EXPECTED_API_KEY) {
  return { statusCode: 401, body: JSON.stringify({ error: 'Unauthorized' }) };
}

OAuth2:

// Extract the OAuth2 access token
const authHeader = request.headers['authorization'];
const token = authHeader?.replace('Bearer ', '');
// Verify the token with your OAuth2 provider (see the sketch below)
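
Token verification depends on your provider. Below is a minimal sketch assuming JWT access tokens validated against your provider's JWKS endpoint, using the jose package; the URLs are placeholders:

const { createRemoteJWKSet, jwtVerify } = require('jose');

// JWKS endpoint published by your OAuth2 provider (placeholder URL)
const JWKS = createRemoteJWKSet(
  new URL('https://your-auth.example.com/.well-known/jwks.json')
);

async function verifyAccessToken(token) {
  // Throws if the signature, issuer, or expiry check fails
  const { payload } = await jwtVerify(token, JWKS, {
    issuer: 'https://your-auth.example.com/',
  });
  return payload;
}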

D-ID metadata headers:

D-ID includes these headers with every request for logging and context:

  • X-DID-AGENT-ID: The agent making the request
  • X-DID-DISTINCT-ID: Unique client identifier
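
In a Node handler you can read these for logging like so (Node lowercases incoming header names):

// Log D-ID metadata headers for tracing and analytics
const agentId = request.headers['x-did-agent-id'];
const distinctId = request.headers['x-did-distinct-id'];
console.log(`LLM request from agent=${agentId}, client=${distinctId}`);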

Create or update agent with custom LLM

Configure your agent to use your custom LLM endpoint.

Basic (API key) authentication:

curl -X POST "https://api.d-id.com/agents" \
  -H "Authorization: Basic <YOUR KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "preview_name": "Custom LLM Agent",
    "presenter": {
      "type": "clip",
      "presenter_id": "v2_public_Amber@0zSz8kflCN",
      "voice": {
        "type": "microsoft",
        "voice_id": "en-US-JennyMultilingualV2Neural"
      }
    },
    "llm": {
      "provider": "custom",
      "instructions": "You are a helpful assistant",
      "custom": {
        "type": "basic",
        "url": "https://your-api.example.com/llm",
        "key": "your-secret-api-key",
        "streaming": true,
        "max_messages": 20
      }
    }
  }'
OAuth2 authentication:

curl -X POST "https://api.d-id.com/agents" \
  -H "Authorization: Basic <YOUR KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "preview_name": "Custom LLM Agent",
    "presenter": {
      "type": "clip",
      "presenter_id": "v2_public_Amber@0zSz8kflCN",
      "voice": {
        "type": "microsoft",
        "voice_id": "en-US-JennyMultilingualV2Neural"
      }
    },
    "llm": {
      "provider": "custom",
      "instructions": "You are a helpful assistant",
      "custom": {
        "type": "oauth2",
        "url": "https://your-api.example.com/llm",
        "token_url": "https://your-auth.example.com/oauth2/token",
        "client_id": "your-client-id",
        "client_secret": "your-client-secret",
        "streaming": true,
        "max_messages": 20
      }
    }
  }'
Example response:

{
  "id": "agt_xyz789",
  "preview_name": "Custom LLM Agent",
  "status": "created",
  "llm": {
    "provider": "custom",
    "custom": {
      "type": "basic",
      "url": "https://your-api.example.com/llm",
      "streaming": true
    }
  }
}
🔒 Security

Your API key and OAuth2 credentials are encrypted and securely stored by D-ID. They are never exposed in API responses.

Test your integration

Use the Agent Session quickstart to create a session with your custom LLM agent and start chatting.

Debugging tips:

  • Monitor your endpoint logs for incoming requests
  • Measure time to first token (aim for 200–500ms p50, <1000ms p95)
  • Verify response format matches OpenAI structure
  • Use max_messages to control conversation history size sent to your LLM

Before going to production:

  • Ensure "streaming": true is enabled for optimal latency
  • Load test your endpoint to ensure it can handle concurrent requests
  • Monitor time to first token metrics (p50, p95, p99)
  • Ensure p95 TTFT is below 1000ms for good user experience

Custom LLM Configuration Options

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| type | string | ✓ Yes | Authentication type: "basic" or "oauth2" |
| url | string | ✓ Yes | Your LLM endpoint URL that receives POST requests |
| streaming | boolean | Optional | Enable streaming responses for production (low latency). Default: false |
| max_messages | number | Optional | Maximum number of conversation history messages sent to your LLM. Helps control context size and costs |
| headers | object | Optional | Additional custom headers to include in requests |

LLM Behavior (Optional)

Custom LLMs support the same behavior configuration as OpenAI:

| Field | Type | Description |
| --- | --- | --- |
| instructions | string | Defines what the Agent does and how it should behave |
| prompt_customization | object | Advanced prompt configuration (role, personality, topics_to_avoid, max_response_length) |
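
For example, a hypothetical agent configuration combining both fields (field names come from the table above; the values are illustrative):

{
  "llm": {
    "provider": "custom",
    "instructions": "You are a helpful assistant",
    "prompt_customization": {
      "role": "Customer support agent",
      "personality": "Friendly and concise",
      "topics_to_avoid": ["pricing negotiations"],
      "max_response_length": 200
    },
    "custom": {
      "type": "basic",
      "url": "https://your-api.example.com/llm",
      "key": "your-secret-api-key",
      "streaming": true
    }
  }
}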

Authentication

Basic

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| key | string | ✓ Yes | API key sent in the x-api-key header |

OAuth2

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| token_url | string | ✓ Yes | Token endpoint for the client credentials grant |
| client_id | string | ✓ Yes | OAuth2 client identifier |
| client_secret | string | ✓ Yes | OAuth2 client secret |
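
D-ID exchanges these credentials for an access token via the standard OAuth2 client credentials grant; conceptually, the token request looks like this (URLs and credentials are the placeholders from the example above):

curl -X POST "https://your-auth.example.com/oauth2/token" \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "grant_type=client_credentials&client_id=your-client-id&client_secret=your-client-secret"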

FAQ

Is streaming required?

While streaming is optional, it's strongly recommended for production systems because it enables progressive response delivery, which is critical for conversational AI:

  • Lower perceived latency: Users see responses immediately as they're generated
  • Better UX: Natural conversation flow without long pauses
  • Faster engagement: Aim for 200–500ms time to first token

Non-streaming mode is useful for debugging and development, but production systems should always use "streaming": true for the best user experience.

What is time to first token (TTFT), and what should I aim for?

Time to first token (TTFT) is the delay between receiving a user message and emitting the first token of the response.

  • Ideal: 200–500ms (users perceive it as responsive)
  • Acceptable: 500–1000ms (noticeable pause, still usable)
  • Poor: >1000ms (conversation feels "laggy", users disengage)

Important: Measure p50 and p95 TTFT, not just average. Your p95 should stay below 1000ms.

Optimize your model inference, use caching, and minimize network latency to improve TTFT.
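
A rough way to measure TTFT against your endpoint, sketched with Node 18+'s built-in fetch (the URL and key are the placeholders used elsewhere in this guide):

// Rough client-side TTFT probe
async function measureTTFT() {
  const start = Date.now();
  const res = await fetch('https://your-api.example.com/llm', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-api-key': 'your-secret-api-key', // placeholder credential
    },
    body: JSON.stringify({
      messages: [{ role: 'user', content: 'Hello' }],
      stream: true,
    }),
  });
  const reader = res.body.getReader();
  await reader.read(); // resolves when the first streamed chunk arrives
  console.log(`TTFT: ${Date.now() - start}ms`);
}

measureTTFT();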

Does my endpoint need to match OpenAI's format exactly?

Yes, your endpoint must follow OpenAI's message format exactly:

  • Message structure: Array of {role, content} objects
  • Roles: user, assistant, system
  • Streaming format: data: {json}\n\n with choices[].delta.content
  • Non-streaming format: Simple {content: string} response

This ensures compatibility with D-ID's conversation management and allows seamless switching between providers.

Can I customize the agent's instructions and behavior?

Yes! Custom LLMs support the same configuration fields as OpenAI.

In the agent configuration:

{
  "llm": {
    "provider": "custom",
    "instructions": "You are a helpful assistant",
    "custom": {
      "type": "basic",
      "url": "https://your-api.example.com/llm",
      "key": "your-key",
      "streaming": true
    }
  }
}

D-ID will include these in system prompts. You can also pass custom parameters via the options field in requests and handle them on your endpoint side.
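
One way to consume that options field on your side, sketched as Express middleware (assumes express.json() is applied, as in the endpoint sketch earlier; the injection strategy is illustrative):

// Read the `options` field before running inference
app.use('/llm', (req, res, next) => {
  const { messages, options } = req.body;
  if (options?.description) {
    // Illustrative: surface the description as extra system context
    messages.unshift({ role: 'system', content: options.description });
  }
  next();
});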