LiteLLM is the canonical self-hosted LLM proxy: a Python service you deploy, point at every provider you use (OpenAI, Anthropic, Bedrock, Vertex, Azure OpenAI, dozens more), and call through a unified OpenAI-shape API. qlaud is a hosted gateway with the same routing primitive plus four other layers most production AI apps end up rebuilding. Honest comparison so you can pick without regretting it later.
What LiteLLM does
LiteLLM started as a Python package that translated between provider SDKs. It grew into a self-hosted proxy service: you run a container, configure providers in YAML, and your app speaks the OpenAI shape to LiteLLM while LiteLLM speaks the native protocol of whichever upstream the request resolves to. Strengths:
- Self-hosted. Runs in your VPC, your datacenter, your air-gapped lab. No third-party data plane.
- Massive provider catalog. Bedrock, Vertex, Azure OpenAI, Together, Replicate, Ollama, vLLM, plus the frontier providers — 100+ supported.
- Open source. Apache 2.0, with a paid Enterprise tier for advanced features.
- Team budgets + virtual keys. Manage spend across internal teams with key abstractions.
- Caching, fallback, retries built into the proxy.
For a company that needs an internal LLM proxy on-prem, LiteLLM is excellent. It's the right answer when your data residency or compliance posture forbids sending traffic through a hosted third party.
What qlaud is
qlaud is a hosted gateway with the same OpenAI-compatible / Anthropic-compatible routing surface, but built around a different primary use case: SaaS apps that ship to end-users you bill, where the gateway also handles billing, conversation state, vendor connectors, and tool dispatch.
Side-by-side
| Capability | LiteLLM | qlaud |
|---|---|---|
| Unified OpenAI-shape proxy | ✓ | ✓ |
| Native Anthropic-shape proxy (cache_control, etc.) | ✓ | ✓ |
| Self-hosted option | ✓ | — |
| Hosted (no infra) | cloud tier | ✓ (only) |
| 100+ provider support | ✓ | ~25 frontier |
| Per-team / per-key budgets | ✓ | ✓ |
| Per-end-user keys + hard spend caps | partial (Enterprise) | ✓ |
| Stripe-backed prepaid wallet | — | ✓ |
| Per-end-user usage rollup for invoicing | — | ✓ |
| Managed conversation threads | — | ✓ |
| Semantic search across conversations | — | ✓ |
| Vendor MCP catalog (Linear, GitHub, …) | — | ✓ (35) |
| First-party tool builtins (E2B, web search, …) | — | ✓ (12) |
| Hosted per-user connect URL flow | — | ✓ |
| Background async batch jobs | — | ✓ |
Pick LiteLLM when
- You need to self-host. Data residency, compliance, regulated industry, air-gapped lab, on-prem enterprise. qlaud is hosted-only; LiteLLM runs anywhere Python runs.
- You need the long-tail provider catalog. Ollama for local inference, vLLM for self-hosted Llama, obscure Bedrock models — LiteLLM's 100+ provider list covers more than qlaud's curated frontier set.
- Your consumer is "the company", not end-users. Internal team chatbot, internal AI tools. You don't need per-user-keyed billing because there aren't external users to bill.
- You want full control of the routing logic. Custom routing strategies, custom caching layers, custom observability — LiteLLM's extensible Python middleware hooks let you plug in.
Pick qlaud when
- You ship to end-users you bill. Per-user keys with hard `max_spend_usd` caps + per-user usage rollup are the core primitive. Build a SaaS that wraps AI in a couple hundred lines instead of 6 weeks.
- You want zero infra to maintain. qlaud runs on Cloudflare's edge — you don't pick instance sizes, restart pods, monitor uptime, or pay for idle CPU.
- You're building a chatbot and don't want to maintain a Postgres-of-messages + a vector index. Threads + semantic search are managed primitives.
- Your model needs to call third-party tools. 47 tools out of the box (Linear, GitHub, ClickUp, Notion, Stripe, Sentry, E2B code execution, web search, image gen, email — full list at /blog/47-tools-out-of-the-box-mcp-catalog-plus-builtins).
The migration shape (either direction)
Both expose the same OpenAI-shape and Anthropic-shape APIs. Migration is a base URL change:
```
# LiteLLM (self-hosted)
OPENAI_BASE_URL=https://your-litellm.your-vpc.com/v1
ANTHROPIC_BASE_URL=https://your-litellm.your-vpc.com

# qlaud (hosted)
OPENAI_BASE_URL=https://api.qlaud.ai/v1
ANTHROPIC_BASE_URL=https://api.qlaud.ai
```

Your SDK code stays the same. The added layers (per-user keys, threads, tools) are opt-in — if you don't use them on qlaud, you're paying the 7% gateway markup for routing only. Make sure that's the trade you want before switching purely for the platform.
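To make the "base URL change" concrete, here is a minimal sketch (the `chat_completions_url` helper is hypothetical, for illustration only; recent OpenAI SDKs read `OPENAI_BASE_URL` from the environment in the same way):

```python
import os

# Hypothetical helper: the request path your client builds is identical
# either way; only the base URL taken from the environment changes.
def chat_completions_url() -> str:
    base = os.environ.get("OPENAI_BASE_URL", "https://api.openai.com/v1")
    return base.rstrip("/") + "/chat/completions"

os.environ["OPENAI_BASE_URL"] = "https://your-litellm.your-vpc.com/v1"
print(chat_completions_url())  # → https://your-litellm.your-vpc.com/v1/chat/completions

os.environ["OPENAI_BASE_URL"] = "https://api.qlaud.ai/v1"
print(chat_completions_url())  # → https://api.qlaud.ai/v1/chat/completions
```

Nothing else in the calling code changes, which is why migrating in either direction is low-risk.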
Honest summary
These tools solve overlapping but different problems. LiteLLM is the right answer for self-hosted internal infrastructure where you control the data plane. qlaud is the right answer for hosted SaaS where you ship AI to end-users you bill, and want the billing + threads + tool layers managed instead of rebuilt. They aren't direct competitors — they target different teams.
If you're evaluating both, the question to answer first is: do you have to self-host? If yes, LiteLLM. If no, qlaud has more layers built in and you ship faster.
Get started
Sign up for qlaud, top up $5. Same OpenAI-compatible interface as LiteLLM, plus the layers you'd otherwise build yourself. If you're currently running LiteLLM and considering offloading the infra, the migration is a base URL change.