Universal LLM Client

One API. Every Provider.

Transparent failover, structured output, streaming, tool execution, and observability — across OpenAI, Google Gemini, Ollama, and any OpenAI-compatible service.

Get Started

API Reference

Works With

One interface across local inference, cloud APIs, and gateway services. Add a provider, set a priority — failover is automatic.

Ollama: Local models

OpenAI: GPT-5.4

Google AI: Gemini

Vertex AI: Enterprise

LlamaCpp: Native

OpenRouter: Gateway

Core Features

Everything you need to ship production AI without stitching together five separate libraries.

Transparent Failover

Priority-ordered provider chain with retries, health tracking, and cooldowns. Your code never sees the switchover.

Google   →   OpenRouter   →   Ollama
✗ 500        ✗ timeout        ✓ 200 ok
╰── retry 2x ──╯  ╰── seamless ──╯
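The chain above can be sketched in a few lines. This is a minimal, self-contained illustration of priority ordering, per-provider retries, and cooldowns. The `Provider`, `FailoverOptions`, and `FailoverChain` names are hypothetical and are not the library's actual API.

```typescript
// Hypothetical types for illustration; not the library's actual API.
type Provider = {
  name: string
  priority: number // lower number = tried first
  call: (prompt: string) => Promise<string>
}

type FailoverOptions = { retries: number; cooldownMs: number }

class FailoverChain {
  // Timestamp (ms) until which a provider is skipped after exhausting its retries.
  private coolingUntil = new Map<string, number>()
  private providers: Provider[]
  private opts: FailoverOptions

  constructor(providers: Provider[], opts: FailoverOptions) {
    this.providers = [...providers].sort((a, b) => a.priority - b.priority)
    this.opts = opts
  }

  async chat(prompt: string): Promise<{ provider: string; text: string }> {
    const errors: string[] = []
    for (const p of this.providers) {
      if ((this.coolingUntil.get(p.name) ?? 0) > Date.now()) {
        errors.push(`${p.name}: cooling down`)
        continue
      }
      // One initial attempt plus `retries` retries.
      for (let attempt = 0; attempt <= this.opts.retries; attempt++) {
        try {
          return { provider: p.name, text: await p.call(prompt) }
        } catch (err) {
          errors.push(`${p.name} attempt ${attempt + 1}: ${String(err)}`)
        }
      }
      // Every attempt failed: skip this provider until the cooldown expires.
      this.coolingUntil.set(p.name, Date.now() + this.opts.cooldownMs)
    }
    throw new Error(`All providers failed: ${errors.join('; ')}`)
  }
}

// Stub providers that mirror the diagram: two failures, then a success.
const failing = (msg: string) => async (): Promise<string> => {
  throw new Error(msg)
}

const chain = new FailoverChain(
  [
    { name: 'ollama', priority: 3, call: async (p) => `echo: ${p}` },
    { name: 'google', priority: 1, call: failing('500') },
    { name: 'openrouter', priority: 2, call: failing('timeout') },
  ],
  { retries: 2, cooldownMs: 30_000 },
)
```

Calling `chain.chat('Hello!')` tries `google` and `openrouter` three times each, then resolves with the `ollama` result. The caller only sees the final answer, which is the property the diagram describes.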

Structured Output

Zod 4 schemas in, typed JSON out. Streaming partial objects, provider-aware format negotiation, and native toJSONSchema.

z.object({            →   { name: "Ada",
  name: z.string()          age: 36,
  age: z.number()           email: "..." }
})                    →   TypeScript inferred ✓

First-Class Streaming

Async generator streaming with pluggable decoder strategies — standard chat, interleaved reasoning, and custom formats.

for await (event of chatStream()) {
  text: "The answer is..."
  thinking: "[analyzing data]"
  tool: get_weather({city})
}
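As a rough sketch of how an async generator and a pluggable decoder can fit together, the example below separates plain text from interleaved reasoning. The `StreamEvent` and `Decoder` shapes, the `<think>` tag convention, and this `chatStream` signature are assumptions made for illustration; they are not taken from the library.

```typescript
// Hypothetical event and decoder shapes; not the library's actual API.
type StreamEvent = { type: 'text' | 'thinking'; text: string }

// A decoder strategy maps each raw chunk to zero or more events.
interface Decoder {
  decode(chunk: string): StreamEvent[]
}

// Standard chat: every non-empty chunk is plain answer text.
class PlainDecoder implements Decoder {
  decode(chunk: string): StreamEvent[] {
    return chunk ? [{ type: 'text', text: chunk }] : []
  }
}

// Interleaved reasoning: text between <think> and </think> is reasoning.
// Simplification: assumes a tag is never split across two chunks.
class ThinkTagDecoder implements Decoder {
  private inThinking = false

  decode(chunk: string): StreamEvent[] {
    const events: StreamEvent[] = []
    // The capture group keeps the tags in the split result.
    for (const part of chunk.split(/(<think>|<\/think>)/)) {
      if (part === '<think>') this.inThinking = true
      else if (part === '</think>') this.inThinking = false
      else if (part) events.push({ type: this.inThinking ? 'thinking' : 'text', text: part })
    }
    return events
  }
}

// Yields decoded events as chunks arrive; the decoder is swappable.
async function* chatStream(
  chunks: AsyncIterable<string> | Iterable<string>,
  decoder: Decoder,
): AsyncGenerator<StreamEvent> {
  for await (const chunk of chunks) {
    yield* decoder.decode(chunk)
  }
}

// Usage: collect every event from a canned two-chunk response.
async function collect(): Promise<StreamEvent[]> {
  const out: StreamEvent[] = []
  const chunks = ['<think>analyzing data</think>The answer', ' is 42']
  for await (const event of chatStream(chunks, new ThinkTagDecoder())) {
    out.push(event)
  }
  return out
}
```

Swapping `ThinkTagDecoder` for `PlainDecoder` changes how chunks are interpreted without touching the consuming `for await` loop.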

Autonomous Tool Calling

LLM ──→ get_weather({city:"Tokyo"})
 ↑                │
 ╰── { temp: 22, ──╯
       sunny: true }
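The loop in the diagram can be sketched as follows: the model either requests a tool or returns a final answer, and each tool result is fed back as a new message. All types here (`ToolCall`, `ModelTurn`, `Message`, `Model`, `Tools`) and the `runWithTools` helper are hypothetical, and the model is a stub standing in for a real LLM.

```typescript
// Hypothetical shapes for illustration; not the library's actual API.
type ToolCall = { name: string; args: Record<string, unknown> }
type ModelTurn = { toolCall: ToolCall } | { text: string }
type Message = { role: 'user' | 'assistant' | 'tool'; content: string }
type Model = (messages: Message[]) => Promise<ModelTurn>
type Tools = Record<string, (args: Record<string, unknown>) => unknown>

async function runWithTools(
  model: Model,
  tools: Tools,
  prompt: string,
  maxSteps = 5,
): Promise<string> {
  const messages: Message[] = [{ role: 'user', content: prompt }]
  for (let step = 0; step < maxSteps; step++) {
    const turn = await model(messages)
    if ('text' in turn) return turn.text // final answer ends the loop
    const { name, args } = turn.toolCall
    const tool = tools[name]
    // Report unknown tools back to the model instead of throwing.
    const result = tool ? await tool(args) : { error: `unknown tool: ${name}` }
    messages.push({ role: 'assistant', content: JSON.stringify(turn.toolCall) })
    messages.push({ role: 'tool', content: JSON.stringify(result) })
  }
  // The step cap keeps a misbehaving model from looping forever.
  throw new Error(`No final answer after ${maxSteps} steps`)
}

// Stub model: asks for the weather once, then answers from the tool result.
const fakeModel: Model = async (messages) => {
  const last = messages[messages.length - 1]
  if (last?.role === 'tool') {
    const weather = JSON.parse(last.content) as { temp: number; sunny: boolean }
    return { text: `Tokyo: ${weather.temp}°C, ${weather.sunny ? 'sunny' : 'cloudy'}` }
  }
  return { toolCall: { name: 'get_weather', args: { city: 'Tokyo' } } }
}

const tools: Tools = {
  get_weather: () => ({ temp: 22, sunny: true }),
}
```

With these stubs, `runWithTools(fakeModel, tools, 'Weather in Tokyo?')` makes one tool call and then resolves with the model's final text.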

Native Observability

Built-in Auditor interface — every request, response, retry, failover, and tool call is a structured, flushable event.

▸ REQUEST  [google] gemini-3.1-flash
▸ RETRY    [google] 500 → attempt 2
▸ FAILOVER [openai] openrouter
▸ RESPONSE [openai] 1.2s 84 tokens
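One way to picture a structured, flushable event log is a small interface with `record` and `flush` methods. The sketch below is an assumption about the general shape of such an interface; the `AuditEvent` fields, the `Auditor` method names, and `BufferedAuditor` are hypothetical and not the library's actual definitions.

```typescript
// Hypothetical event and interface shapes; not the library's actual API.
type AuditEvent = {
  kind: 'request' | 'retry' | 'failover' | 'response'
  provider: string
  detail: string
  at: number // epoch milliseconds
}

interface Auditor {
  record(event: AuditEvent): void
  flush(): Promise<void>
}

// Buffers events in memory and hands them to a sink in one batch on flush.
class BufferedAuditor implements Auditor {
  private buffer: AuditEvent[] = []
  private sink: (batch: AuditEvent[]) => Promise<void> | void

  constructor(sink: (batch: AuditEvent[]) => Promise<void> | void) {
    this.sink = sink
  }

  record(event: AuditEvent): void {
    this.buffer.push(event)
  }

  async flush(): Promise<void> {
    if (this.buffer.length === 0) return
    const batch = this.buffer
    this.buffer = [] // swap first so events recorded during the sink call are kept
    await this.sink(batch)
  }
}

// Renders an event in the same style as the log lines above.
const formatEvent = (e: AuditEvent): string =>
  `▸ ${e.kind.toUpperCase().padEnd(8)} [${e.provider}] ${e.detail}`

// Usage: collect formatted lines in an array instead of writing to a real backend.
const lines: string[] = []
const auditor = new BufferedAuditor((batch) => {
  lines.push(...batch.map(formatEvent))
})

auditor.record({ kind: 'request', provider: 'google', detail: 'gemini-3.1-flash', at: Date.now() })
auditor.record({ kind: 'retry', provider: 'google', detail: '500 → attempt 2', at: Date.now() })
```

Because events are plain objects, the same sink could forward them to a file, a metrics pipeline, or a test assertion without changing the code that records them.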

MCP Native

Bridge MCP servers to LLM tools with zero glue code. Stdio and HTTP transports, auto tool discovery, seamless execution.

MCPToolBridge.connect({
  filesystem: { command: 'npx...' }
  weather: { url: 'https://...' }
}) → registerTools(model) ✓

See It In Action

Real TypeScript. Real patterns. Copy, paste, ship.

chat.ts

import { AIModel } from 'universal-llm-client'

const model = new AIModel({
  model: 'gemini-3.1-flash',
  providers: [
    { type: 'google', apiKey: process.env.GOOGLE_KEY },
    { type: 'ollama' },
  ],
})

const response = await model.chat([
  { role: 'user', content: 'Hello!' }
])

Architecture

Clean layers, zero dependencies, designed as a transport layer for agent frameworks.

AIModel — Public API

Router: Failover Engine

StreamDecoder: Reasoning Strategies

Auditor: Observability

Ollama

OpenAI

Google

Vertex

LlamaCpp

Start Building

Zero dependencies. MIT licensed. Production-ready.

$ bun add universal-llm-client
