DeepSeek V3
General-purpose chat + coding model with strong tool-use reliability. Default workhorse for most coding-agent loops.
Browse every model qlaud routes to. Customer-facing prices include our 7% markup. Models with multiple hosts can be requested with :nitro for the fastest tool-verified host or :floor for the cheapest live host.
Showing all 17 models
General-purpose chat + coding model with strong tool-use reliability. Default workhorse for most coding-agent loops.
Reasoning-tuned variant with separate chain-of-thought stream. Translates into Anthropic thinking blocks; rendered natively in Claude Code.
Frontier coding model — 87.6% on SWE-bench Verified. Best agentic-loop reliability. Routed via Anthropic native API: cache_control markers, image content blocks, and thinking blocks all preserved verbatim — Claude Code customers see ~75% input-cost reduction from prompt cache without changing client code.
Mid-tier Anthropic model. Strong tool use + vision; the default Claude Code CLI uses this slug. Native-passthrough route preserves prompt cache + image + thinking blocks.
Frontier multimodal — text, vision, code. 2M-token context, strong reasoning. Routed via Google AI Studio native passthrough.
xAI frontier model — strong on reasoning + math + code. 256K context. Routed via xAI native passthrough.
Qwen 3 Coder — Alibaba's strongest open-source coding model. 128K context, OpenAI-compat tool use. Cost-effective frontier coder.
Moonshot Kimi K2 — agentic-coding specialist. Strong tool-use, 128K context, very fast. Excellent fallback in coding rotations.
MiniMax M2.7 — frontier coding model. ~80% on SWE-bench Verified, strong tool-use, 200K context. Cheaper alternative to Sonnet 4.6 when prompt cache is not the bottleneck.
OpenAI image generation — high-fidelity, prompt-faithful, strong at typography in images. Default for /v1/images/generations.
World-class TTS — natural prosody, expressive voices, 30+ languages. Default for /v1/audio/speech when ElevenLabs is the preferred TTS provider.
OpenAI text-to-speech via gpt-4o-mini-tts. Cheaper baseline; good prosody on standard English. Fallback when ElevenLabs is not preferred.
Best-in-class real-time speech-to-text — low latency, strong accent coverage, speaker diarization. Default for /v1/audio/transcriptions when Deepgram is preferred.
OpenAI Whisper — strong multilingual transcription, batch-mode. Fallback when Deepgram is not preferred.
Web-grounded answer engine — citations, freshness, structured search. Used for /v1/search and any task that needs live-web context.
Default embeddings — 3072-dim, strong retrieval quality. Used for /v1/embeddings.
Backwards-compatible alias for the slug Claude Code CLI ships with as default. Routed to Anthropic Sonnet 4.6 via native passthrough — same Anthropic model, same prompt cache, same agentic UX.
All routed through Cloudflare AI Gateway. We add new providers as Cloudflare adds them to its supported list.
Want the full machine-readable catalog? It's public and edge-cached:
curl https://api.qlaud.ai/v1/catalog