Providers
Universal LLM Client supports multiple providers through a single API. Each provider has specific behaviors for structured output, streaming, and feature support.
Provider Support Matrix
| Feature | OpenAI (compat) | Google/Vertex | Ollama | Anthropic | LlamaCpp |
|---|---|---|---|---|---|
| Chat | ✅ | ✅ | ✅ | ✅ | ✅ |
| Streaming | ✅ | ✅ | ✅ | ✅ | ✅ |
| Structured Output | ✅ | ✅ | ✅ | ✅ | ✅ |
| Tool Calling | ✅ | ✅ | ✅ | ✅ | ✅ |
| Vision/Images | ✅ | ✅ | ✅ | ✅ | ❌ |
| Embeddings | ✅ | ✅ | ✅ | ❌ | ✅ |
| Strict JSON Mode | ✅ (native) | ❌ (stripped) | ❌ | ⚠️ (prompted) | ❌ |
| Prompt Caching | Provider-dependent | Limited | N/A | ✅ (strong) | N/A |
| Native Thinking | Via decoder / fields | ✅ (thinkingConfig) | ✅ (think) | ✅ (extended + signatures) | Via decoder |
OpenAI
Supports OpenAI and any OpenAI-compatible API (OpenRouter, Groq, Together AI, LM Studio, vLLM, etc.).
const model = new AIModel({
  model: 'gpt-4o-mini',
  providers: [
    { type: 'openai', apiKey: process.env.OPENAI_API_KEY },
  ],
});
Structured Output
OpenAI uses response_format: { type: 'json_schema' } natively. The library sends your Zod schema as a JSON Schema in the request, and OpenAI constrains its output accordingly.
Strict mode is enabled by default — OpenAI guarantees the response matches the schema exactly:
// Strict mode (default)
const result = await model.generateStructured(MySchema, messages);
// Disable strict for compatible endpoints that don't support it
const relaxedResult = await model.generateStructured(MySchema, messages, {
  output: { strict: false },
});
Compatible Services
For OpenAI-compatible endpoints, just set the url:
{ type: 'openai', url: 'https://openrouter.ai/api/v1', apiKey: '...' }
{ type: 'openai', url: 'http://localhost:1234/v1', apiKey: 'lm-studio' }
{ type: 'openai', url: 'http://localhost:8000/v1', apiKey: '...' } // vLLM
Advanced transport flexibility (new in this release):
You can now control auth, query parameters, and base path without custom code:
// Azure OpenAI (recommended pattern)
{
  type: 'openai',
  url: 'https://YOUR-RESOURCE.openai.azure.com/openai/deployments/YOUR-DEPLOYMENT',
  apiBasePath: '', // do not append /v1
  queryParams: { 'api-version': '2024-10-21' },
  headers: { 'api-key': process.env.AZURE_OPENAI_KEY },
  // No need to put the query in the url string anymore
}
// Custom gateway with api-key style auth (no "Bearer")
{
  type: 'openai',
  url: 'https://my-gateway.example.com',
  authHeader: 'api-key',
  authPrefix: '',
  apiKey: process.env.GATEWAY_KEY,
  queryParams: { 'region': 'us-east' },
}
// Extra arbitrary headers (merged after auth)
{
  type: 'openai',
  url: '...',
  headers: { 'x-custom': 'value', 'api-key': '...' }, // OpenAI-compatible & Ollama (via buildHeaders)
}
These options are typed on every ProviderConfig, but not every provider honors them. The OpenAI-compatible provider honors all of them. Ollama also applies headers, authHeader, and authPrefix (both providers use buildHeaders), while queryParams and apiBasePath are OpenAI-compatible only. Google/Vertex and Anthropic build their own auth headers and URLs and ignore these options.
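As a rough illustration of how these options could combine into a final URL and header set, here is a minimal sketch. It is not the library's buildHeaders implementation: the buildRequest function, the TransportOptions interface, and the defaults (an Authorization header, a "Bearer " prefix, and a /v1 base path that is skipped when the URL already ends with it) are all assumptions made for this example.

```typescript
// Hypothetical config shape mirroring the option names used above.
interface TransportOptions {
  url: string;
  apiKey?: string;
  apiBasePath?: string; // assumed default: "/v1"
  authHeader?: string; // assumed default: "Authorization"
  authPrefix?: string; // assumed default: "Bearer "
  queryParams?: Record<string, string>;
  headers?: Record<string, string>;
}

interface BuiltRequest {
  url: string;
  headers: Record<string, string>;
}

// Illustrative only: assembles the URL and headers for one endpoint.
function buildRequest(opts: TransportOptions, endpoint: string): BuiltRequest {
  const base = opts.url.replace(/\/+$/, ""); // drop trailing slashes
  const basePath = opts.apiBasePath ?? "/v1";
  // Assumption: do not append the base path twice if the URL already has it.
  const path = base.endsWith(basePath) ? "" : basePath;

  const query = Object.entries(opts.queryParams ?? {})
    .map(([k, v]) => encodeURIComponent(k) + "=" + encodeURIComponent(v))
    .join("&");
  const suffix = query ? "?" + query : "";

  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (opts.apiKey !== undefined) {
    const name = opts.authHeader ?? "Authorization";
    headers[name] = (opts.authPrefix ?? "Bearer ") + opts.apiKey;
  }
  // Extra headers are merged after auth, so they can add to or override it.
  Object.assign(headers, opts.headers ?? {});

  return { url: base + path + endpoint + suffix, headers };
}

// Azure-style config from the example above (placeholder key).
const azure = buildRequest(
  {
    url: "https://YOUR-RESOURCE.openai.azure.com/openai/deployments/YOUR-DEPLOYMENT",
    apiBasePath: "",
    queryParams: { "api-version": "2024-10-21" },
    headers: { "api-key": "AZURE_KEY_PLACEHOLDER" },
  },
  "/chat/completions",
);
```

Under these assumptions, azure.url ends in /chat/completions?api-version=2024-10-21 and no Authorization header is sent, because no apiKey was supplied.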
For Anthropic-specific prompt caching:
await model.chat(messages, { enablePromptCaching: true, maxTokens: 8192 });
This causes the client to mark the system prompt with Anthropic's cache_control: { type: "ephemeral" }. See the Anthropic section for details.
Google / Vertex AI
Supports Google's Generative AI API and Vertex AI.
// Google AI Studio
const model = new AIModel({
  model: 'gemini-2.0-flash',
  providers: [
    { type: 'google', apiKey: process.env.GOOGLE_API_KEY },
  ],
});
// Vertex AI (separate variable so both examples can live in one file)
const vertexModel = new AIModel({
  model: 'gemini-2.0-flash',
  providers: [
    {
      type: 'vertex',
      apiKey: process.env.VERTEX_API_KEY,
      region: 'us-central1',
    },
  ],
});
Structured Output
Google uses responseMimeType: 'application/json' with a responseSchema. The library automatically strips JSON Schema features that Google doesn't support:
- pattern
- minLength / maxLength
- minimum / maximum / exclusiveMinimum / exclusiveMaximum
- additionalProperties
INFO
Your Zod schema can use .min(), .max(), .regex(), etc. freely — the library strips unsupported features from the schema sent to Google while keeping them for validation of the response.
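The stripping step can be pictured as a recursive walk over the JSON Schema that drops the unsupported keywords and leaves the original schema untouched for later validation. The sketch below is an assumption about how such a pass might look, not the library's code; stripUnsupported, isObject, and the sample schema are hypothetical names introduced here.

```typescript
type JsonSchema = { [key: string]: unknown };

// Keywords listed above as unsupported by Google's responseSchema.
const UNSUPPORTED_KEYWORDS = new Set([
  "pattern",
  "minLength",
  "maxLength",
  "minimum",
  "maximum",
  "exclusiveMinimum",
  "exclusiveMaximum",
  "additionalProperties",
]);

function isObject(v: unknown): v is JsonSchema {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Returns a cleaned copy; the input schema is never mutated.
function stripUnsupported(schema: JsonSchema): JsonSchema {
  const out: JsonSchema = {};
  for (const [key, value] of Object.entries(schema)) {
    if (UNSUPPORTED_KEYWORDS.has(key)) continue;
    if (key === "properties" && isObject(value)) {
      // Keys under "properties" are property names, not keywords:
      // keep every name and clean each subschema.
      const props: JsonSchema = {};
      for (const [name, sub] of Object.entries(value)) {
        props[name] = isObject(sub) ? stripUnsupported(sub) : sub;
      }
      out[key] = props;
    } else if (Array.isArray(value)) {
      out[key] = value.map((v) => (isObject(v) ? stripUnsupported(v) : v));
    } else if (isObject(value)) {
      out[key] = stripUnsupported(value);
    } else {
      out[key] = value;
    }
  }
  return out;
}

const original: JsonSchema = {
  type: "object",
  properties: {
    email: { type: "string", pattern: "^.+@.+$", minLength: 3 },
    age: { type: "integer", minimum: 0, maximum: 150 },
    pattern: { type: "string" }, // a property that happens to be named "pattern"
  },
  required: ["email"],
  additionalProperties: false,
};

const forGoogle = stripUnsupported(original);
```

Here forGoogle keeps type, properties, and required, while original still carries every constraint for response validation.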
Ollama
Local LLM provider using Ollama's REST API.
const model = new AIModel({
  model: 'llama3.2',
  providers: [
    { type: 'ollama', url: 'http://localhost:11434' },
  ],
});
Structured Output
Ollama accepts a format parameter with the JSON Schema object directly. The library converts your Zod schema and sends it as the format:
const result = await model.generateStructured(MySchema, [
  { role: 'user', content: 'Extract entities from...' },
]);
TIP
For best results with structured output on Ollama, use models that support JSON mode well (e.g., llama3.2, mistral, qwen2.5).
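To make the format parameter concrete, the sketch below builds the kind of JSON body that could be POSTed to Ollama's /api/chat endpoint. The buildOllamaChatBody helper and the hand-written entitySchema (standing in for a converted Zod schema) are assumptions for illustration; the library's actual request builder may differ.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface OllamaChatBody {
  model: string;
  messages: ChatMessage[];
  stream: boolean;
  format: Record<string, unknown>; // the JSON Schema object itself
}

// Hypothetical helper: the schema is passed through unchanged as `format`.
function buildOllamaChatBody(
  model: string,
  messages: ChatMessage[],
  schema: Record<string, unknown>,
): OllamaChatBody {
  return { model, messages, stream: false, format: schema };
}

// Hand-written JSON Schema standing in for the output of a Zod conversion.
const entitySchema: Record<string, unknown> = {
  type: "object",
  properties: {
    entities: { type: "array", items: { type: "string" } },
  },
  required: ["entities"],
};

const ollamaBody = buildOllamaChatBody(
  "llama3.2",
  [{ role: "user", content: "Extract entities from..." }],
  entitySchema,
);
```

Serializing ollamaBody with JSON.stringify gives the request payload. The response should still be validated against the original schema, since the constraint depends on the model and the server version.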
LlamaCpp
Direct connection to a llama.cpp server.
const model = new AIModel({
  model: 'default',
  providers: [
    { type: 'llamacpp', url: 'http://localhost:8080' },
  ],
});
Uses the OpenAI-compatible API format internally.
Anthropic (Claude)
Native support for Anthropic's Messages API (distinct from OpenAI wire format).
const model = new AIModel({
  model: 'claude-sonnet-4-20250514',
  providers: [
    { type: 'anthropic', apiKey: process.env.ANTHROPIC_API_KEY },
  ],
});
Key Differences Handled by the Library
- Content blocks (text, tool_use, tool_result, thinking, image) instead of flat messages.
- system is a top-level field (extracted from role: 'system' messages).
- Tool results are placed inside user messages as tool_result blocks.
- Streaming uses typed content_block_* events (including thinking_delta and signature_delta).
- Strong prompt caching and extended thinking support.
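The first three points can be sketched as a small conversion from flat, OpenAI-style messages to the Messages API shape. This is a simplified illustration under stated assumptions: FlatMessage, ToolCall, and toAnthropic are hypothetical names, only text, tool_use, and tool_result blocks are covered, and the library's real conversion handles more cases (images, thinking blocks, streaming).

```typescript
interface ToolCall {
  id: string;
  name: string;
  args: Record<string, unknown>;
}

// Simplified flat input messages (hypothetical shape for this sketch).
type FlatMessage =
  | { role: "system" | "user"; content: string }
  | { role: "assistant"; content: string; toolCalls?: ToolCall[] }
  | { role: "tool"; toolCallId: string; content: string };

type AnthropicBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: Record<string, unknown> }
  | { type: "tool_result"; tool_use_id: string; content: string };

interface AnthropicMessage {
  role: "user" | "assistant";
  content: AnthropicBlock[];
}

interface AnthropicRequestParts {
  system?: string;
  messages: AnthropicMessage[];
}

function toAnthropic(input: FlatMessage[]): AnthropicRequestParts {
  const systemParts: string[] = [];
  const messages: AnthropicMessage[] = [];

  for (const msg of input) {
    // System messages are lifted out into the top-level `system` field.
    if (msg.role === "system") {
      systemParts.push(msg.content);
      continue;
    }
    const blocks: AnthropicBlock[] = [];
    let role: "user" | "assistant" = "user";
    if (msg.role === "tool") {
      // Tool results travel inside a user message as tool_result blocks.
      blocks.push({ type: "tool_result", tool_use_id: msg.toolCallId, content: msg.content });
    } else {
      if (msg.content !== "") blocks.push({ type: "text", text: msg.content });
      if (msg.role === "assistant") {
        role = "assistant";
        for (const call of msg.toolCalls ?? []) {
          blocks.push({ type: "tool_use", id: call.id, name: call.name, input: call.args });
        }
      }
    }
    // Merge consecutive turns with the same role into one message.
    const last = messages[messages.length - 1];
    if (last !== undefined && last.role === role) last.content.push(...blocks);
    else messages.push({ role, content: blocks });
  }

  return systemParts.length > 0
    ? { system: systemParts.join("\n\n"), messages }
    : { messages };
}

const converted = toAnthropic([
  { role: "system", content: "You are terse." },
  { role: "user", content: "Weather in Paris?" },
  {
    role: "assistant",
    content: "",
    toolCalls: [{ id: "call_1", name: "get_weather", args: { city: "Paris" } }],
  },
  { role: "tool", toolCallId: "call_1", content: "18°C, cloudy" },
]);
```

In this example converted.system holds the system prompt, and converted.messages contains a user text turn, an assistant tool_use turn, and a user turn carrying the tool_result block.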
Prompt Caching
const response = await model.chat(messages, {
  enablePromptCaching: true,
  // ... other options
});
When enabled, the client emits appropriate cache_control markers for supported Claude models.
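For the system prompt, the marker is an extra field on a text block in the request's system array. The following sketch shows one way that field could be produced; buildSystemField and SystemBlock are hypothetical names, and which blocks the client actually marks beyond the system prompt is not covered here.

```typescript
interface SystemBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

// Hypothetical helper: wraps the system prompt in a block array and,
// when caching is enabled, tags it as a cache breakpoint.
function buildSystemField(systemPrompt: string, enablePromptCaching: boolean): SystemBlock[] {
  const block: SystemBlock = { type: "text", text: systemPrompt };
  if (enablePromptCaching) {
    block.cache_control = { type: "ephemeral" };
  }
  return [block];
}

const cachedSystem = buildSystemField("You are a contract analyst...", true);
const plainSystem = buildSystemField("You are a contract analyst...", false);
```

Caching pays off mainly for long, stable prefixes such as a large system prompt reused across many calls; short prompts may fall below the provider's minimum cacheable size.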
Structured Output & Thinking
Native schema enforcement is improving in recent Claude 4.x models, but the library always performs final validation regardless. Thinking / reasoning output is surfaced through the standard reasoning field and DecodedEvent stream events.
Embeddings
Anthropic does not offer embeddings — use a different provider in your failover chain if you need them.
Most Other Providers: Use the OpenAI-Compatible Path
The vast majority of inference providers (hosted and self-hosted) speak the OpenAI Chat Completions format (or a close superset). You do not need a dedicated adapter.
Just use { type: 'openai', url: '...' }:
// xAI Grok
{ type: 'openai', url: 'https://api.x.ai/v1', apiKey: process.env.XAI_API_KEY }
// Mistral
{ type: 'openai', url: 'https://api.mistral.ai/v1', apiKey: process.env.MISTRAL_API_KEY }
// DeepSeek (very low cost)
{ type: 'openai', url: 'https://api.deepseek.com', apiKey: process.env.DEEPSEEK_API_KEY }
// Cohere Compatibility API
{ type: 'openai', url: 'https://api.cohere.ai/compatibility/v1', apiKey: process.env.COHERE_API_KEY } // verify exact path in Cohere docs
// Groq (fast)
{ type: 'openai', url: 'https://api.groq.com/openai/v1', apiKey: process.env.GROQ_API_KEY }
// Together AI
{ type: 'openai', url: 'https://api.together.xyz/v1', apiKey: process.env.TOGETHER_API_KEY }
// Fireworks
{ type: 'openai', url: 'https://api.fireworks.ai/inference/v1', apiKey: process.env.FIREWORKS_API_KEY }
// OpenRouter (aggregator)
{ type: 'openai', url: 'https://openrouter.ai/api/v1', apiKey: process.env.OPENROUTER_API_KEY }
// Perplexity Sonar (grounded)
{ type: 'openai', url: 'https://api.perplexity.ai', apiKey: process.env.PERPLEXITY_API_KEY }
// Self-hosted vLLM / TGI / LM Studio / etc.
{ type: 'openai', url: 'http://localhost:8000/v1', apiKey: 'not-needed-for-local' }
Notes for Compatible Endpoints
- Many do not implement OpenAI's full strict JSON schema mode → pass output: { strict: false } when using generateStructured / chat({ output }).
- Tool streaming and parallel calls are widely supported but accumulation logic can be quirky; the library normalizes IDs and empty args.
- Usage / cost headers vary; the auditor still receives normalized TokenUsageInfo.
- For Azure OpenAI: the URL structure (/deployments/{deployment}/chat/completions?api-version=...) and api-key header are different. You can often construct the full URL + pass custom headers (see advanced config) or request an azure provider type for first-class ergonomics.
See the research document docs/research/provider-api-landscape-2026.md for a detailed 2026 survey of wire formats, why only Anthropic + Google warrant native clients, Bedrock guidance, Responses API notes, and more.
Provider Failover
When using multiple providers, the library automatically fails over to the next provider on errors:
const model = new AIModel({
  model: 'gpt-4o-mini',
  providers: [
    { type: 'openai', apiKey: process.env.OPENAI_API_KEY, priority: 0 },
    { type: 'google', apiKey: process.env.GOOGLE_API_KEY, priority: 1 },
    { type: 'ollama', url: 'http://localhost:11434', priority: 2 },
  ],
  retries: 2, // retries per provider before moving to next
});
How failover works with structured output
- The schema is sent in the provider's native format (JSON Schema for OpenAI, responseSchema for Google, format for Ollama)
- If a provider fails, the next provider receives the same schema re-formatted for its API
- Response validation is centralized in the Router — regardless of which provider responds, the same Zod validation runs
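The flow described in these points can be sketched as a loop over providers in priority order, with one shared validation step. This is an illustrative model, not the Router's implementation: generateWithFailover, Provider, and Validator are hypothetical names, retries is assumed to mean extra attempts after the first, and a validation failure is assumed to count as a provider failure.

```typescript
interface Provider {
  id: string;
  priority: number;
  // Stands in for a provider call that already sent the schema in its native format.
  call: () => Promise<unknown>;
}

// Throws on invalid data; stands in for a Zod schema's parse step.
type Validator<T> = (raw: unknown) => T;

async function generateWithFailover<T>(
  providers: Provider[],
  validate: Validator<T>,
  retries: number,
): Promise<{ providerId: string; value: T }> {
  const ordered = [...providers].sort((a, b) => a.priority - b.priority);
  const errors: string[] = [];

  for (const provider of ordered) {
    // One initial attempt plus `retries` further attempts per provider.
    for (let attempt = 0; attempt <= retries; attempt++) {
      try {
        const raw = await provider.call();
        // Same validation no matter which provider answered.
        return { providerId: provider.id, value: validate(raw) };
      } catch (err) {
        const reason = err instanceof Error ? err.message : String(err);
        errors.push(provider.id + "#" + attempt + ": " + reason);
      }
    }
  }
  throw new Error("All providers failed: " + errors.join("; "));
}

interface Person {
  name: string;
}

const validatePerson: Validator<Person> = (raw) => {
  if (typeof raw === "object" && raw !== null) {
    const name = (raw as { name?: unknown }).name;
    if (typeof name === "string") return { name };
  }
  throw new Error("schema validation failed");
};

// Mock providers: the first always errors, the second succeeds.
const mockProviders: Provider[] = [
  { id: "openai-0", priority: 0, call: async () => { throw new Error("503"); } },
  { id: "google-1", priority: 1, call: async () => ({ name: "Ada" }) },
];

const outcome = generateWithFailover(mockProviders, validatePerson, 2);
```

With these mocks, outcome resolves with providerId "google-1" after three failed attempts on "openai-0".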
Health Tracking
Each provider has health status with automatic cooldowns:
const statuses = model.getProviderStatus();
// [
// { id: 'openai-0', healthy: true, errorCount: 0 },
// { id: 'google-1', healthy: false, errorCount: 3, coolingDown: true },
// ]
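A cooldown scheme of this kind can be modeled with an error counter and a timestamp. The sketch below is an assumption about the mechanics, not the library's tracker: HealthTracker, the threshold of 3 errors, and the 30-second cooldown are hypothetical, and only the status field names are taken from the output shown above.

```typescript
interface ProviderStatus {
  id: string;
  healthy: boolean;
  errorCount: number;
  coolingDown?: boolean;
}

class HealthTracker {
  private readonly id: string;
  private readonly maxErrors: number;
  private readonly cooldownMs: number;
  private errorCount = 0;
  private cooldownUntil = 0;

  // Threshold and cooldown length are assumed values for this sketch.
  constructor(id: string, maxErrors = 3, cooldownMs = 30000) {
    this.id = id;
    this.maxErrors = maxErrors;
    this.cooldownMs = cooldownMs;
  }

  recordSuccess(): void {
    this.errorCount = 0;
    this.cooldownUntil = 0;
  }

  recordFailure(now: number = Date.now()): void {
    this.errorCount += 1;
    if (this.errorCount >= this.maxErrors) {
      this.cooldownUntil = now + this.cooldownMs;
    }
  }

  // A provider is skipped while cooling down and eligible again afterwards.
  status(now: number = Date.now()): ProviderStatus {
    const coolingDown = now < this.cooldownUntil;
    const status: ProviderStatus = {
      id: this.id,
      healthy: !coolingDown,
      errorCount: this.errorCount,
    };
    if (coolingDown) status.coolingDown = true;
    return status;
  }
}

// Explicit timestamps keep the example deterministic.
const tracker = new HealthTracker("google-1");
tracker.recordFailure(1000);
tracker.recordFailure(2000);
tracker.recordFailure(3000);
const duringCooldown = tracker.status(3001);
const afterCooldown = tracker.status(3000 + 30000);
```

In this model the third failure starts the cooldown, so duringCooldown reports the provider as unhealthy and cooling down, while afterCooldown reports it as eligible again even though the error count has not yet been reset by a success.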