A provider-agnostic reasoning API with transparent failover across OpenAI, Anthropic, Google Gemini, Ollama, vLLM, and any OpenAI-compatible service. Structured output, tool execution, Deep Research, and observability are included.
One interface across local inference, cloud APIs, and gateway services. Add a provider, set a priority — failover is automatic.
Ollama: Local models
OpenAI: GPT + o-series
Anthropic: Claude
Google AI: Gemini
Vertex AI: Enterprise
LlamaCpp: Native
OpenRouter: Gateway
Core Features
Everything you need to ship production AI without stitching together five different libraries.
Transparent Failover
Priority-ordered provider chain with retries, health tracking, and cooldowns. Your code never sees the switchover.
Google → OpenRouter → Ollama
✗ 500 ✗ timeout ✓ 200 ok
╰── retry 2x ──╯ ╰── seamless ──╯
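The chain in the diagram above can be sketched roughly as follows. This is a minimal illustration of priority ordering, per-provider retries, and cooldowns, not the library's actual implementation: the names `Provider`, `FailoverOptions`, and `withFailover` are hypothetical, and real health tracking would likely be richer than a single timestamp.

```typescript
// Hypothetical types for illustration only; not the library's real API.
interface Provider<T> {
  name: string;
  priority: number;        // lower number = tried first
  call: () => Promise<T>;  // performs one request against this provider
  cooldownUntil?: number;  // epoch ms; the provider is skipped until then
}

interface FailoverOptions {
  retries: number;         // extra attempts per provider after the first failure
  cooldownMs: number;      // how long a provider is skipped once it is exhausted
  now?: () => number;      // injectable clock, defaults to Date.now
}

async function withFailover<T>(
  providers: Provider<T>[],
  opts: FailoverOptions,
): Promise<{ provider: string; value: T }> {
  const now = opts.now ?? (() => Date.now());
  // Sort a copy so the caller's array order is left untouched.
  const ordered = [...providers].sort((a, b) => a.priority - b.priority);
  const errors: string[] = [];

  for (const p of ordered) {
    // Skip providers that failed recently and are still cooling down.
    if ((p.cooldownUntil ?? 0) > now()) {
      errors.push(`${p.name}: cooling down`);
      continue;
    }
    for (let attempt = 0; attempt <= opts.retries; attempt++) {
      try {
        const value = await p.call();
        return { provider: p.name, value };
      } catch (err) {
        errors.push(`${p.name}: ${err instanceof Error ? err.message : String(err)}`);
      }
    }
    // All attempts failed: mark the provider unhealthy for a while.
    p.cooldownUntil = now() + opts.cooldownMs;
  }
  throw new Error(`All providers failed: ${errors.join("; ")}`);
}
```

The caller only awaits `withFailover(...)` and receives a value plus the name of the provider that answered, which is one way to keep the switchover out of application code.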
Unified Reasoning
One thinking flag — boolean or level (minimal/low/medium/high) — mapped to each backend. Chain-of-thought surfaced as response.reasoning and live thinking events.
thinking: 'high' → one flag
  OpenAI → reasoning_effort
  Gemini → thinkingLevel
  Anthropic → budget_tokens
res.reasoning // chain-of-thought
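One way to picture the mapping above is a small translation function. This is a sketch under stated assumptions: the parameter names `reasoning_effort`, `thinkingLevel`, and `budget_tokens` come from the diagram, while `mapThinking`, the object nesting, the default level for `true`, and the token budgets are hypothetical choices made for illustration. Which levels a given model accepts varies by provider and model.

```typescript
type ThinkingLevel = "minimal" | "low" | "medium" | "high";
type Thinking = boolean | ThinkingLevel;
type Backend = "openai" | "gemini" | "anthropic";

// Illustrative budgets only; real values would be tuned per model.
const ANTHROPIC_BUDGETS: Record<ThinkingLevel, number> = {
  minimal: 1024,
  low: 2048,
  medium: 8192,
  high: 16384,
};

// Translate the single `thinking` flag into backend-specific request fields.
function mapThinking(backend: Backend, thinking: Thinking): Record<string, unknown> {
  // Simplification: `false` adds no fields; some backends may need an explicit opt-out.
  if (thinking === false) return {};
  // Assumption: a bare `true` means a middle-of-the-road level.
  const level: ThinkingLevel = thinking === true ? "medium" : thinking;

  switch (backend) {
    case "openai":
      return { reasoning_effort: level };
    case "gemini":
      return { thinkingConfig: { thinkingLevel: level } };
    case "anthropic":
      return { thinking: { type: "enabled", budget_tokens: ANTHROPIC_BUDGETS[level] } };
  }
}
```

For example, `mapThinking("anthropic", "high")` yields a `thinking` object carrying a `budget_tokens` value, while the same flag on `"openai"` becomes a single `reasoning_effort` string.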
Structured Output
Zod 4 schemas in, typed JSON out. Streaming partial objects, provider-aware format negotiation, and native toJSONSchema.