Catalog

Hand-picked frontier.
One base URL.

We don't list every model — only the frontier per modality. Claude Opus 4.7 for code, GPT-5.4 for general, Sora 2 for video, ElevenLabs for voice, Deepgram for transcription. 21 curated picks across 18 providers, behind one API key. Customer-facing prices include our flat 7% markup.

Showing all 21

DeepSeek V3

DeepSeekText generation

General-purpose chat + coding model with strong tool-use reliability. Default workhorse for most coding-agent loops.

66K ctxTools
Routed via
DeepSeek
$0.289 in·$1.18 out
per 1M tokens

DeepSeek R1

DeepSeekText generation

Reasoning-tuned variant with separate chain-of-thought stream. Translates into Anthropic thinking blocks; rendered natively in Claude Code.

66K ctxToolsReasoning
Routed via
DeepSeek
$0.589 in·$2.34 out
per 1M tokens

Claude Opus 4.7

AnthropicText generation

Frontier coding model — 87.6% on SWE-bench Verified. Best agentic-loop reliability. Routed via Anthropic native API: cache_control markers, image content blocks, and thinking blocks all preserved verbatim — Claude Code customers see ~75% input-cost reduction from prompt cache without changing client code.

200K ctxToolsReasoningVision
Routed via
Anthropic
$5.35 in·$26.75 out
per 1M tokens

Claude Sonnet 4.6

AnthropicText generation

Mid-tier Anthropic model. Strong tool use + vision; the default Claude Code CLI uses this slug. Native-passthrough route preserves prompt cache + image + thinking blocks.

200K ctxToolsVision
Routed via
Anthropic
$3.21 in·$16.05 out
per 1M tokens

Claude Haiku 4.5

AnthropicText generation

Fastest + cheapest Anthropic model with full tool-use + vision. ~3x cheaper than Sonnet 4.6 with substantially higher throughput — the right default for high-volume agent loops, classification, and chat tools where you want Anthropic-quality outputs without Sonnet pricing. Native-passthrough preserves prompt cache + image blocks.

200K ctxToolsVision
Routed via
Anthropic
$1.07 in·$5.35 out
per 1M tokens

Gemini 3 Pro

GoogleText generation

Google frontier multimodal — text, vision, code. 2M-token context, strong reasoning, native tool-use. Routed via Google AI Studio OpenAI-compat endpoint.

2M ctxToolsReasoningVision
Routed via
Google AI Studio
$1.34 in·$10.70 out
per 1M tokens

Grok 4.20

xAIText generation

xAI frontier reasoning model — strong on math, code, agentic loops. 256K context. Multi-agent variant available via grok-4.20-multi-agent.

256K ctxToolsReasoning
Routed via
xAI Grok
$3.21 in·$16.05 out
per 1M tokens

Qwen Coder Plus

AlibabaText generation

Alibaba's strongest coding model — OpenAI-compat tool use, broad language coverage, very price-competitive. Cheaper alternative to Sonnet 4.6 in coding rotations.

128K ctxTools
Routed via
Alibaba Qwen
$0.428 in·$1.71 out
per 1M tokens

Kimi K2.6

MoonshotText generation

Moonshot Kimi K2.6 — agentic-coding specialist with very strong tool-use reliability. 128K context, fast generation. Frontier alternative for code-heavy loops.

128K ctxToolsReasoning
Routed via
Moonshot Kimi
$0.642 in·$2.68 out
per 1M tokens

GPT-5.4

OpenAIText generation

OpenAI flagship — strongest at general-purpose reasoning, vision, and broad knowledge as of April 2026. Solid agentic loop performance via tool use.

400K ctxToolsReasoningVision
Routed via
OpenAI
$5.35 in·$21.40 out
per 1M tokens

GPT-5.4 mini

OpenAIText generation

Cheaper, faster GPT-5.4 variant — strong tool-use, lower price-per-token. Good fallback for coding rotations where Sonnet is overkill.

400K ctxToolsReasoningVision
Routed via
OpenAI
$0.535 in·$2.14 out
per 1M tokens

MiniMax M2

MiniMaxText generation

MiniMax M2.7 — frontier coding model. ~80% on SWE-bench Verified, strong tool-use, 200K context. Cheaper alternative to Sonnet 4.6 when prompt cache is not the bottleneck.

200K ctxToolsReasoning
Routed via
MiniMax
$0.321 in·$1.28 out
per 1M tokens

GPT Image 1

OpenAIImage generation

OpenAI image generation — high-fidelity, prompt-faithful, strong at typography in images. Default for /v1/images/generations.

Routed via
OpenAI
Pricing per call — see provider docs

Eleven Multilingual v2

ElevenLabsText-to-speech

World-class TTS — natural prosody, expressive voices, 30+ languages. ElevenLabs' canonical default model — used in their official SDK examples.

Routed via
ElevenLabs
Pricing per call — see provider docs

Eleven Turbo v2.5

ElevenLabsText-to-speech

Lower-latency variant of Eleven Multilingual — ideal for real-time voice agents. Slight quality trade-off for ~half the latency.

Routed via
ElevenLabs
Pricing per call — see provider docs

Sora 2

OpenAIVideo generation

Frontier text-to-video — OpenAI Sora 2. Generates ~10 second clips at 1080p with strong physics + prompt fidelity. Routed via /v1/videos.

Routed via
OpenAI
Pricing per call — see provider docs

Sora 2 Pro

OpenAIVideo generation

Highest-quality Sora variant — slower generation, higher visual fidelity, longer clips. For hero shots and final renders.

Routed via
OpenAI
Pricing per call — see provider docs

Deepgram Nova-3

DeepgramSpeech-to-text

Best-in-class real-time speech-to-text — low latency, strong accent coverage, speaker diarization. Default for /v1/audio/transcriptions when Deepgram is preferred.

Routed via
Deepgram
Pricing per call — see provider docs

Whisper Large v3

OpenAISpeech-to-text

OpenAI Whisper — strong multilingual transcription, batch-mode. Fallback when Deepgram is not preferred.

Routed via
OpenAI
Pricing per call — see provider docs

Perplexity Sonar Pro

PerplexityWeb search

Web-grounded answer engine — citations, freshness, structured search. Used for /v1/search and any task that needs live-web context.

200K ctx
Routed via
Perplexity
$3.21 in·$16.05 out
per 1M tokens

OpenAI text-embedding-3-large

OpenAIEmbeddings

Default embeddings — 3072-dim, strong retrieval quality. Used for /v1/embeddings.

8K ctx
Routed via
OpenAI
$0.139 in·$0.0000 out
per 1M tokens
Roadmap

Drop-in compatibility for any provider version string

Today: pass any model id from the catalog above (the same string you'd use with the official SDK) and we route correctly. Coming soon: pass claude-3-5-sonnet-20241022, gpt-4o-2024-08-06, gemini-1.5-flash — even versions we don't catalog directly — and we'll resolve to the closest live model on the right provider, no code change.

Providers we route to

18 active

Every model above reaches the upstream via the qlaud edge — one base URL, one set of credentials, every provider.

DeepSeekGroqOpenAIAnthropicMistralCerebrasxAICloudflare Workers AIOpenRouterMiniMaxElevenLabsDeepgramPerplexityCartesiaGoogle AI StudioxAI GrokAlibaba QwenMoonshot Kimi
Catalog as JSON

Machine-readable. Edge-cached. No auth required.

$ curl https://api.qlaud.ai/v1/catalog

One key, every model on this page.

Sign up, mint a key, swap two env vars.