
Universal LLM Client

One API. Every Provider.

Transparent failover, structured output, streaming, tool execution, and observability — across OpenAI, Google Gemini, Ollama, and any OpenAI-compatible service.

Works With

One interface across local inference, cloud APIs, and gateway services. Add a provider, set a priority — failover is automatic.

Ollama · Local models
OpenAI · GPT-5.4
Google AI · Gemini
Vertex AI · Enterprise
LlamaCpp · Native
OpenRouter · Gateway

Core Features

Everything you need to ship production AI without stitching together 5 different libraries.

Transparent Failover
Priority-ordered provider chain with retries, health tracking, and cooldowns. Your code never sees the switchover.
Google → OpenRouter → Ollama
✗ 500      ✗ timeout    ✓ 200 ok
╰── retry 2x ──╯  ╰── seamless ──╯
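Conceptually, the failover chain is a priority-ordered loop over providers with per-provider retries. A minimal sketch of that idea — not the library's actual engine; `Provider`, `chatWithFailover`, and the retry count here are illustrative:

```typescript
// Failover sketch: try providers in priority order, retrying each a
// fixed number of times before moving on to the next one.
type Provider = {
  name: string
  chat: (prompt: string) => Promise<string>
}

async function chatWithFailover(
  providers: Provider[], // already sorted by priority
  prompt: string,
  retries = 2,
): Promise<string> {
  let lastError: unknown
  for (const provider of providers) {
    for (let attempt = 1; attempt <= retries; attempt++) {
      try {
        return await provider.chat(prompt) // first success wins
      } catch (err) {
        lastError = err // retry, then fail over to the next provider
      }
    }
  }
  throw lastError // every provider exhausted
}

// Usage: the first provider always fails, the second answers.
const flaky: Provider = {
  name: 'google',
  chat: async () => { throw new Error('500') },
}
const stable: Provider = {
  name: 'ollama',
  chat: async (p) => `echo: ${p}`,
}

chatWithFailover([flaky, stable], 'Hello!').then((reply) => console.log(reply))
```

The calling code only ever sees the final reply, which is the "your code never sees the switchover" property the real client advertises.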
Structured Output
Zod 4 schemas in, typed JSON out. Streaming partial objects, provider-aware format negotiation, and native toJSONSchema.
z.object({             →  { name: "Ada",
  name: z.string(),         age: 36,
  age: z.number(),          email: "..." }
})
→ TypeScript inferred ✓
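The mechanics behind "schema in, typed JSON out" can be sketched without the library: parse the model's raw JSON text, check it against the expected shape, and hand back a typed value. This stand-in `parsePerson` is illustrative only — the real client uses Zod 4 schemas and negotiates output formats per provider:

```typescript
// Illustrative stand-in for schema-driven output: parse the model's
// raw JSON, verify the expected shape, and return a typed object.
interface Person {
  name: string
  age: number
}

function parsePerson(raw: string): Person {
  const value = JSON.parse(raw)
  if (typeof value?.name !== 'string' || typeof value?.age !== 'number') {
    throw new Error('model output did not match schema')
  }
  return value as Person // callers get a typed object, not unknown JSON
}

const person = parsePerson('{"name":"Ada","age":36}')
```

With Zod the shape check and the TypeScript type come from one schema declaration, which is what makes the inferred-type arrow in the snippet above possible.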
First-Class Streaming
Async generator streaming with pluggable decoder strategies — standard chat, interleaved reasoning, and custom formats.
for await (event of chatStream()) {
  text: "The answer is..."
  thinking: "[analyzing data]"
  tool: get_weather({city})
}
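First-class streaming boils down to consuming an async generator of tagged events. A sketch under assumed event names (`text`, `thinking`, `tool` mirror the snippet above, but the real event shape is the library's, not this one):

```typescript
// Streaming sketch: each yielded event carries a kind tag so
// consumers can route text, reasoning, and tool calls separately.
type StreamEvent =
  | { kind: 'text'; delta: string }
  | { kind: 'thinking'; delta: string }
  | { kind: 'tool'; name: string }

async function* fakeChatStream(): AsyncGenerator<StreamEvent> {
  yield { kind: 'thinking', delta: '[analyzing data]' }
  yield { kind: 'text', delta: 'The answer is...' }
  yield { kind: 'tool', name: 'get_weather' }
}

async function collectText(stream: AsyncGenerator<StreamEvent>): Promise<string> {
  let out = ''
  for await (const event of stream) {
    if (event.kind === 'text') out += event.delta // ignore non-text events
  }
  return out
}
```

A pluggable decoder strategy, in these terms, is just a different function turning raw provider chunks into this event stream.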
Autonomous Tool Calling
Register once, works everywhere. Fluent ToolBuilder, auto argument parsing, and MCP server integration in one API.
LLM ──→ get_weather({city:"Tokyo"})
 ↑                          │
 ╰── { temp: 22, sunny: true } ──╯
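The "register once, works everywhere" loop can be sketched as a name-to-handler registry with automatic argument parsing. These function names are hypothetical — the real ToolBuilder API is fluent and richer than this:

```typescript
// Tool-calling sketch: register handlers once, then dispatch whatever
// tool call the model emits by name, auto-parsing its JSON arguments.
type ToolHandler = (args: Record<string, unknown>) => unknown

const tools = new Map<string, ToolHandler>()

function registerTool(name: string, handler: ToolHandler): void {
  tools.set(name, handler)
}

function dispatchToolCall(name: string, rawArgs: string): unknown {
  const handler = tools.get(name)
  if (!handler) throw new Error(`unknown tool: ${name}`)
  return handler(JSON.parse(rawArgs)) // auto-parse the model's JSON args
}

registerTool('get_weather', ({ city }) => ({ city, temp: 22, sunny: true }))

const result = dispatchToolCall('get_weather', '{"city":"Tokyo"}')
```

The result object is what gets fed back to the model to complete the round trip shown in the diagram above.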
Native Observability
Built-in Auditor interface — every request, response, retry, failover, and tool call is a structured, flushable event.
▸ REQUEST  [google] gemini-3.1-flash
▸ RETRY    [google] 500 → attempt 2
▸ FAILOVER [openai] openrouter
▸ RESPONSE [openai] 1.2s 84 tokens
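An auditor of this kind is essentially a structured-event sink with a flush. A minimal sketch — the event fields and class name are assumptions, not the library's Auditor interface:

```typescript
// Auditor sketch: every lifecycle step becomes a structured event
// appended to a buffer that can be drained ("flushed") in one batch.
type AuditEvent = {
  type: 'REQUEST' | 'RETRY' | 'FAILOVER' | 'RESPONSE'
  provider: string
  detail: string
}

class BufferedAuditor {
  private buffer: AuditEvent[] = []

  record(event: AuditEvent): void {
    this.buffer.push(event) // cheap append on the hot path
  }

  flush(): AuditEvent[] {
    const drained = this.buffer
    this.buffer = [] // hand events to the caller and reset
    return drained
  }
}

const auditor = new BufferedAuditor()
auditor.record({ type: 'REQUEST', provider: 'google', detail: 'gemini-3.1-flash' })
auditor.record({ type: 'RETRY', provider: 'google', detail: '500 → attempt 2' })
```

Because events are structured rather than log lines, a flush can target any backend: stdout, a metrics pipeline, or a tracing system.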
MCP Native
Bridge MCP servers to LLM tools with zero glue code. Stdio and HTTP transports, auto tool discovery, seamless execution.
MCPToolBridge.connect({
  filesystem: { command: 'npx...' },
  weather: { url: 'https://...' }
})
→ registerTools(model) ✓

See It In Action

Real TypeScript. Real patterns. Copy, paste, ship.

chat.ts
import { AIModel } from 'universal-llm-client'

const model = new AIModel({
  model: 'gemini-3.1-flash',
  providers: [
    { type: 'google', apiKey: process.env.GOOGLE_KEY },
    { type: 'ollama' },
  ],
})

const response = await model.chat([
  { role: 'user', content: 'Hello!' }
])

Architecture

Clean layers, zero dependencies, designed as a transport layer for agent frameworks.

AIModel — Public API
Router — Failover Engine
StreamDecoder — Reasoning Strategies
Auditor — Observability
Ollama
OpenAI
Google
Vertex
LlamaCpp
Start Building
Zero dependencies. MIT licensed. Production-ready.
$ bun add universal-llm-client

Released under the MIT License.