DeepSeek V3
General-purpose chat + coding model with strong tool-use reliability. Default workhorse for most coding-agent loops.
We don't list every model — only the frontier per modality. Claude Opus 4.7 for code, GPT-5.4 for general, Sora 2 for video, ElevenLabs for voice, Deepgram for transcription. 21 curated picks across 18 providers, behind one API key. Customer-facing prices include our flat 7% markup.
General-purpose chat + coding model with strong tool-use reliability. Default workhorse for most coding-agent loops.
Reasoning-tuned variant with separate chain-of-thought stream. Translates into Anthropic thinking blocks; rendered natively in Claude Code.
Frontier coding model — 87.6% on SWE-bench Verified. Best agentic-loop reliability. Routed via Anthropic native API: cache_control markers, image content blocks, and thinking blocks all preserved verbatim — Claude Code customers see ~75% input-cost reduction from prompt cache without changing client code.
Mid-tier Anthropic model. Strong tool use + vision; the default Claude Code CLI uses this slug. Native-passthrough route preserves prompt cache + image + thinking blocks.
Fastest + cheapest Anthropic model with full tool-use + vision. ~3x cheaper than Sonnet 4.6 with substantially higher throughput — the right default for high-volume agent loops, classification, and chat tools where you want Anthropic-quality outputs without Sonnet pricing. Native-passthrough preserves prompt cache + image blocks.
Google frontier multimodal — text, vision, code. 2M-token context, strong reasoning, native tool-use. Routed via Google AI Studio OpenAI-compat endpoint.
xAI frontier reasoning model — strong on math, code, agentic loops. 256K context. Multi-agent variant available via grok-4.20-multi-agent.
Alibaba's strongest coding model — OpenAI-compat tool use, broad language coverage, very price-competitive. Cheaper alternative to Sonnet 4.6 in coding rotations.
Moonshot Kimi K2.6 — agentic-coding specialist with very strong tool-use reliability. 128K context, fast generation. Frontier alternative for code-heavy loops.
OpenAI flagship — strongest at general-purpose reasoning, vision, and broad knowledge as of April 2026. Solid agentic loop performance via tool use.
Cheaper, faster GPT-5.4 variant — strong tool-use, lower price-per-token. Good fallback for coding rotations where Sonnet is overkill.
MiniMax M2.7 — frontier coding model. ~80% on SWE-bench Verified, strong tool-use, 200K context. Cheaper alternative to Sonnet 4.6 when prompt cache is not the bottleneck.
OpenAI image generation — high-fidelity, prompt-faithful, strong at typography in images. Default for /v1/images/generations.
World-class TTS — natural prosody, expressive voices, 30+ languages. ElevenLabs' canonical default model — used in their official SDK examples.
Lower-latency variant of Eleven Multilingual — ideal for real-time voice agents. Slight quality trade-off for ~half the latency.
Frontier text-to-video — OpenAI Sora 2. Generates ~10 second clips at 1080p with strong physics + prompt fidelity. Routed via /v1/videos.
Highest-quality Sora variant — slower generation, higher visual fidelity, longer clips. For hero shots and final renders.
Best-in-class real-time speech-to-text — low latency, strong accent coverage, speaker diarization. Default for /v1/audio/transcriptions when Deepgram is preferred.
OpenAI Whisper — strong multilingual transcription, batch-mode. Fallback when Deepgram is not preferred.
Web-grounded answer engine — citations, freshness, structured search. Used for /v1/search and any task that needs live-web context.
Default embeddings — 3072-dim, strong retrieval quality. Used for /v1/embeddings.
Today: pass any model id from the catalog above (the same string you'd use with the official SDK) and we route correctly. Coming soon: pass claude-3-5-sonnet-20241022, gpt-4o-2024-08-06, gemini-1.5-flash — even versions we don't catalog directly — and we'll resolve to the closest live model on the right provider, no code change.
Every model above reaches the upstream via the qlaud edge — one base URL, one set of credentials, every provider.
Machine-readable. Edge-cached. No auth required.
$ curl https://api.qlaud.ai/v1/catalogSign up, mint a key, swap two env vars.