Custom LLMs
How to implement and integrate custom LLMs in agents
Comprehensive guide for integrating your own LLM implementation with D-ID agents. Learn how to create an OpenAI-compatible endpoint, handle authentication, optimize for production, and implement streaming responses.
Note: LLMs should stream tokens to minimize latency between turns.
Overview
Custom LLM integration enables you to:
- Use proprietary or fine-tuned models with D-ID agents
- Route requests through your own infrastructure
- Maintain full control over model selection and parameters
Example code: You can start from our reference implementation on GitHub.
Responsibilities
D-ID handles:
- Securely storing and encrypting your API keys/credentials
- Sending properly formatted requests to your endpoint
- Handling streaming and non-streaming responses
- Including metadata headers for context

You implement:
- An OpenAI-compatible endpoint
- Request authentication via API key or OAuth2
- Low latency (TTFT 200–500ms, under 1000ms p95)
- Infrastructure scaled for production load
Usage
Implement your LLM endpoint
Create an API endpoint that accepts POST requests with D-ID's message format and returns OpenAI-compatible streaming responses.
Request format D-ID sends:
{
"messages": [
{
"role": "user",
"content": "Hello, how are you?",
"created_at": "2025-01-30T12:51:16.946Z"
},
{
"role": "assistant",
"content": "I'm doing great, thanks!",
"created_at": "2025-01-30T12:51:18.123Z"
}
],
"options": {
"description": "Optional context or metadata"
},
"stream": true
}

Streaming response format your endpoint should return (Server-Sent Events):

data: {"id":"id1","created":1738183028,"choices":[{"delta":{"content":"Hello"}}]}
data: {"id":"id2","created":1738183029,"choices":[{"delta":{"content":" there"}}]}
data: {"id":"id3","created":1738183030,"choices":[{"delta":{"content":"!"}}]}
Non-streaming response format:

{
"content": "Hello there!"
}

Error response format:
{
"error": {
"message": "Detailed error message",
"code": "401",
"type": "Unauthorized",
"status": 401
}
}

Latency targets (streaming):
- Ideal: 200–500ms first token for great UX
- Acceptable: 500–1000ms (noticeable pause, still usable)
- Poor: >1000ms (conversation feels "laggy")
- Tip: Measure p50/p95 TTFT, not just average
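To make the formats above concrete, here is a minimal sketch of a conforming endpoint. It assumes Node and Express (an assumption; any stack works) and streams a canned reply token by token; in a real implementation you would generate tokens from your model using the incoming messages.

import express from 'express';

const app = express();
app.use(express.json());

app.post('/llm', (req, res) => {
  // Reject requests that don't carry the expected API key (the "basic" auth type)
  if (req.headers['x-api-key'] !== process.env.EXPECTED_API_KEY) {
    return res.status(401).json({
      error: { message: 'Invalid API key', code: '401', type: 'Unauthorized', status: 401 },
    });
  }

  const { messages, stream } = req.body;
  // Canned reply standing in for a real model call on `messages`
  const tokens = ['Hello', ' there', '!'];

  if (!stream) {
    // Non-streaming shape: the full reply in one JSON object
    return res.json({ content: tokens.join('') });
  }

  // Streaming shape: Server-Sent Events, one delta per token
  res.setHeader('Content-Type', 'text/event-stream');
  tokens.forEach((content, i) => {
    const chunk = {
      id: `id${i + 1}`,
      created: Math.floor(Date.now() / 1000),
      choices: [{ delta: { content } }],
    };
    res.write(`data: ${JSON.stringify(chunk)}\n\n`);
  });
  res.write('data: [DONE]\n\n'); // OpenAI-style stream terminator
  res.end();
});

app.listen(3000);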
Secure your endpoint
Your endpoint will receive credentials via request headers. Choose your authentication method:

API key (basic):

// Validate the API key from headers
const apiKey = request.headers['x-api-key'];
if (apiKey !== process.env.EXPECTED_API_KEY) {
return { statusCode: 401, body: JSON.stringify({ error: 'Unauthorized' }) };
}

OAuth2:

// Validate the OAuth2 access token
const authHeader = request.headers['authorization'];
const token = authHeader?.replace('Bearer ', '');
// Verify the token with your OAuth2 provider (see the sketch below)
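If your OAuth2 provider issues JWT access tokens, verification might look like the sketch below, using the jose package against your provider's JWKS endpoint. The URLs are placeholders, and whether your tokens are JWTs at all is an assumption; opaque tokens would instead be checked via your provider's introspection endpoint.

import { createRemoteJWKSet, jwtVerify } from 'jose';

// Cache the JWKS across requests (placeholder URL)
const jwks = createRemoteJWKSet(
  new URL('https://your-auth.example.com/.well-known/jwks.json')
);

async function verifyToken(token) {
  // Throws if the signature, issuer, or expiry check fails
  const { payload } = await jwtVerify(token, jwks, {
    issuer: 'https://your-auth.example.com',
  });
  return payload;
}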
D-ID metadata headers:

D-ID includes these headers with every request for logging and context:

- X-DID-AGENT-ID: The agent making the request
- X-DID-DISTINCT-ID: Unique client identifier
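Since these arrive as ordinary request headers (lowercased by Node), picking them up for per-agent logging is straightforward (sketch):

// Read D-ID's metadata headers for logging/analytics
const agentId = request.headers['x-did-agent-id'];
const distinctId = request.headers['x-did-distinct-id'];
console.log(`request from agent=${agentId}, client=${distinctId}`);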
Create or update agent with custom LLM
Configure your agent to use your custom LLM endpoint.

With API key authentication (basic):

curl -X POST "https://api.d-id.com/agents" \
-H "Authorization: Basic <YOUR KEY>" \
-H "Content-Type: application/json" \
-d '{
"preview_name": "Custom LLM Agent",
"presenter": {
"type": "clip",
"presenter_id": "v2_public_Amber@0zSz8kflCN",
"voice": {
"type": "microsoft",
"voice_id": "en-US-JennyMultilingualV2Neural"
}
},
"llm": {
"provider": "custom",
"instructions": "You are a helpful assistant",
"custom": {
"type": "basic",
"url": "https://your-api.example.com/llm",
"key": "your-secret-api-key",
"streaming": true,
"max_messages": 20
}
}
}'

With OAuth2 authentication:

curl -X POST "https://api.d-id.com/agents" \
-H "Authorization: Basic <YOUR KEY>" \
-H "Content-Type: application/json" \
-d '{
"preview_name": "Custom LLM Agent",
"presenter": {
"type": "clip",
"presenter_id": "v2_public_Amber@0zSz8kflCN",
"voice": {
"type": "microsoft",
"voice_id": "en-US-JennyMultilingualV2Neural"
}
},
"llm": {
"provider": "custom",
"instructions": "You are a helpful assistant",
"custom": {
"type": "oauth2",
"url": "https://your-api.example.com/llm",
"token_url": "https://your-auth.example.com/oauth2/token",
"client_id": "your-client-id",
"client_secret": "your-client-secret",
"streaming": true,
"max_messages": 20
}
}
}'

Response:

{
"id": "agt_xyz789",
"preview_name": "Custom LLM Agent",
"status": "created",
"llm": {
"provider": "custom",
"custom": {
"type": "basic",
"url": "https://your-api.example.com/llm",
"streaming": true
}
}
}
Security: Your API key and OAuth2 credentials are encrypted and securely stored by D-ID. They are never exposed in API responses.
Test your integration
Use the Agent Session quickstart to create a session with your custom LLM agent and start chatting.
Debugging tips:
- Monitor your endpoint logs for incoming requests
- Measure time to first token (aim for 200–500ms p50, <1000ms p95)
- Verify response format matches OpenAI structure
- Use max_messages to control the conversation history size sent to your LLM
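One way to spot-check TTFT from the outside is a small script that posts a message to your endpoint and times the first streamed chunk (a sketch; the URL and key are placeholders):

// Measure time to first token against your endpoint (Node 18+, built-in fetch)
const start = Date.now();
const res = await fetch('https://your-api.example.com/llm', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json', 'x-api-key': process.env.EXPECTED_API_KEY },
  body: JSON.stringify({
    messages: [{ role: 'user', content: 'Hello, how are you?' }],
    stream: true,
  }),
});

// The first chunk of the SSE body marks the first token
const reader = res.body.getReader();
await reader.read();
console.log(`TTFT: ${Date.now() - start}ms`);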
Before going to production:
- Ensure "streaming": true is enabled for optimal latency
- Load test your endpoint to ensure it can handle concurrent requests
- Monitor time to first token metrics (p50, p95, p99)
- Ensure p95 TTFT is below 1000ms for good user experience
Custom LLM Configuration Options
| Field | Type | Required | Description |
|---|---|---|---|
| type | string | ✓ Yes | Authentication type: "basic" or "oauth2" |
| url | string | ✓ Yes | Your LLM endpoint URL that receives POST requests |
| streaming | boolean | Optional | Enable streaming responses for production (low latency). Default: false |
| max_messages | number | Optional | Maximum number of conversation history messages sent to your LLM. Helps control context size and costs |
| headers | object | Optional | Additional custom headers to include in requests |
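For example, the headers field could carry a routing or tenant header to your infrastructure (the header name and value here are illustrative):

"custom": {
  "type": "basic",
  "url": "https://your-api.example.com/llm",
  "key": "your-secret-api-key",
  "streaming": true,
  "headers": {
    "X-Tenant-Id": "acme-prod"
  }
}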
LLM Behavior (Optional)
Custom LLMs support the same behavior configuration as OpenAI:
| Field | Type | Description |
|---|---|---|
| instructions | string | Defines what the Agent does and how it should behave |
| prompt_customization | object | Advanced prompt configuration (role, personality, topics_to_avoid, max_response_length) |
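Combined with a custom provider, a behavior block might look like the sketch below. The field names come from the table above; the value types and contents are illustrative assumptions:

"llm": {
  "provider": "custom",
  "instructions": "You are a helpful assistant",
  "prompt_customization": {
    "role": "Support agent",
    "personality": "Friendly and concise",
    "topics_to_avoid": ["pricing", "legal advice"],
    "max_response_length": 200
  }
}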
Authentication
Basic
| Field | Type | Required | Description |
|---|---|---|---|
| key | string | ✓ Yes | API key sent in the x-api-key header |
OAuth2
| Field | Type | Required | Description |
|---|---|---|---|
| token_url | string | ✓ Yes | Token endpoint for the client credentials grant |
| client_id | string | ✓ Yes | Client identifier |
| client_secret | string | ✓ Yes | Client secret |
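D-ID obtains tokens from your token_url via the client credentials grant. To sanity-check that endpoint yourself, a standard client_credentials request looks like this sketch (URL and credentials are placeholders):

// Standard OAuth2 client_credentials token request (Node 18+ fetch)
const res = await fetch('https://your-auth.example.com/oauth2/token', {
  method: 'POST',
  headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
  body: new URLSearchParams({
    grant_type: 'client_credentials',
    client_id: 'your-client-id',
    client_secret: 'your-client-secret',
  }),
});
const { access_token } = await res.json();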
FAQ
Is streaming required?

While streaming is optional, it's strongly recommended for production systems because it enables progressive response delivery, which is critical for conversational AI:
- Lower perceived latency: Users see responses immediately as they're generated
- Better UX: Natural conversation flow without long pauses
- Faster engagement: Aim for 200–500ms time to first token
Non-streaming mode is useful for debugging and development, but production systems should always use "streaming": true for the best user experience.
What is time to first token (TTFT)?

Time to first token is the delay between receiving a user message and generating the first word of the response.
- Ideal: 200–500ms (users perceive it as responsive)
- Acceptable: 500–1000ms (noticeable pause, still usable)
- Poor: >1000ms (conversation feels "laggy", users disengage)
Important: Measure p50 and p95 TTFT, not just average. Your p95 should stay below 1000ms.
Optimize your model inference, use caching, and minimize network latency to improve TTFT.
Does my endpoint have to follow OpenAI's format?

Yes, your endpoint must follow OpenAI's message format exactly:
- Message structure: array of {role, content} objects
- Roles: user, assistant, system
- Streaming format: data: {json}\n\n with choices[].delta.content
- Non-streaming format: simple {content: string} response
This ensures compatibility with D-ID's conversation management and allows seamless switching between providers.
Can I use instructions and behavior configuration with a custom LLM?

Yes! Custom LLMs support the same configuration fields as OpenAI. In the agent configuration:
{
"llm": {
"provider": "custom",
"instructions": "You are a helpful assistant",
"custom": {
"type": "basic",
"url": "https://your-api.example.com/llm",
"key": "your-key",
"streaming": true
}
}
}

D-ID will include these in system prompts. You can also pass custom parameters via the options field in requests and handle them on your endpoint side.
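On the endpoint side, reading those parameters is just a matter of picking options off the request body (sketch; any fields beyond the documented description are yours to define):

// Inside your POST handler: read D-ID's options payload
const { messages, options = {}, stream } = req.body;
// "description" is documented above; handle any custom fields you add alongside it
console.log('context:', options.description);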