Product · Apr 28, 2026 · 7 min read

qlaud vs LiteLLM: same proxy idea, different layer

LiteLLM is a self-hosted Python proxy that wraps every LLM provider in an OpenAI-compatible interface. qlaud is a hosted gateway that does the same routing PLUS per-user billing, threads, semantic search, and 47 tools (35 vendor MCP connectors + 12 first-party builtins). When to pick which.

qlaud team · Engineering

LiteLLM is the canonical self-hosted LLM proxy: a Python service you deploy, point at every provider you use (OpenAI, Anthropic, Bedrock, Vertex, Azure OpenAI, dozens more), and call through a unified OpenAI-shape API. qlaud is a hosted gateway with the same routing primitive plus four other layers most production AI apps end up rebuilding. Honest comparison so you can pick without regretting it later.

What LiteLLM does

LiteLLM started as a Python package that translated between provider SDKs. It grew into a self-hosted proxy service: you run a container, configure providers in YAML, and your app talks OpenAI to LiteLLM while LiteLLM talks the native protocol of whichever upstream the request resolves to. Strengths:

  • Self-hosted. Runs in your VPC, your datacenter, your air-gapped lab. No third-party data plane.
  • Massive provider catalog. Bedrock, Vertex, Azure OpenAI, Together, Replicate, Ollama, vLLM, plus the frontier providers — 100+ supported.
  • Open source. Apache 2.0 core, with a paid Enterprise tier for advanced features.
  • Team budgets + virtual keys. Manage spend across internal teams with key abstractions.
  • Caching, fallback, retries built into the proxy.
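Concretely, "your app talks OpenAI to LiteLLM" means any OpenAI SDK pointed at the proxy. A minimal sketch, assuming a proxy reachable at litellm.internal and a virtual key you issued (both placeholders):

import os
from openai import OpenAI

# Placeholder base URL and virtual key; substitute your deployment's values.
client = OpenAI(
    base_url="https://litellm.internal/v1",
    api_key=os.environ["LITELLM_VIRTUAL_KEY"],
)

# The model name selects the upstream; LiteLLM resolves it against the
# model_list aliases in your YAML config.
resp = client.chat.completions.create(
    model="claude-sonnet",  # alias you defined, not a provider model ID
    messages=[{"role": "user", "content": "ping"}],
)
print(resp.choices[0].message.content)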

For a company that needs an internal LLM proxy on-prem, LiteLLM is excellent. It's the right answer when your data residency or compliance posture forbids sending traffic through a hosted third party.

What qlaud is

qlaud is a hosted gateway with the same OpenAI-compatible / Anthropic-compatible routing surface, but built around a different primary use case: SaaS apps that ship to end-users you bill, where the gateway also handles billing, conversation state, vendor connectors, and tool dispatch.

Side-by-side

Capability                                         | LiteLLM              | qlaud
Unified OpenAI-shape proxy                         | ✓                    | ✓
Native Anthropic-shape proxy (cache_control, etc.) | ✓                    | ✓
Self-hosted option                                 | ✓                    | ✗
Hosted (no infra)                                  | cloud tier           | ✓ (only)
Provider catalog                                   | 100+                 | ~25 frontier
Per-team / per-key budgets                         | ✓                    | ✓
Per-end-user keys + hard spend caps                | partial (Enterprise) | ✓
Stripe-backed prepaid wallet                       | ✗                    | ✓
Per-end-user usage rollup for invoicing            | ✗                    | ✓
Managed conversation threads                       | ✗                    | ✓
Semantic search across conversations               | ✗                    | ✓
Vendor MCP catalog (Linear, GitHub, …)             | ✗                    | ✓ (35)
First-party tool builtins (E2B, web search, …)     | ✗                    | ✓ (12)
Hosted per-user connect URL flow                   | ✗                    | ✓
Background async batch jobs                        | ✗                    | ✓

Pick LiteLLM when

  • You need to self-host. Data residency, compliance, regulated industry, air-gapped lab, on-prem enterprise. qlaud is hosted-only; LiteLLM runs anywhere Python runs.
  • You need the long-tail provider catalog. Ollama for local inference, vLLM for self-hosted Llama, obscure Bedrock models — LiteLLM's 100+ provider list covers more than qlaud's curated frontier set.
  • Your consumer is "the company", not end-users. Internal team chatbot, internal AI tools. You don't need per-user-keyed billing because there aren't external users to bill.
  • You want full control of the routing logic. Custom routing strategies, custom caching layers, custom observability — LiteLLM's extensible Python middleware hooks let you plug in.

Pick qlaud when

  • You ship to end-users you bill. Per-user keys with hard max_spend_usd caps + per-user usage rollup are the core primitive (see the sketch after this list). Build a SaaS that wraps AI in a couple hundred lines instead of 6 weeks.
  • You want zero infra to maintain. qlaud runs on Cloudflare's edge — you don't pick instance sizes, restart pods, monitor uptime, or pay for idle CPU.
  • You're building a chatbot and don't want to maintain a Postgres-of-messages + a vector index. Threads + semantic search are managed primitives.
  • Your model needs to call third-party tools. 47 tools out of the box (Linear, GitHub, ClickUp, Notion, Stripe, Sentry, E2B code execution, web search, image gen, email — full list at /blog/47-tools-out-of-the-box-mcp-catalog-plus-builtins).
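For the per-user billing bullet above, the key-creation call looks roughly like this. A hypothetical sketch: the /v1/user-keys path and field names are illustrative assumptions, not qlaud's documented API shapes.

import os
import requests

# Assumed endpoint and fields; check qlaud's docs for the real shapes.
resp = requests.post(
    "https://api.qlaud.ai/v1/user-keys",
    headers={"Authorization": f"Bearer {os.environ['QLAUD_API_KEY']}"},
    json={
        "user_id": "user_42",     # your end-user's ID in your system
        "max_spend_usd": 5.00,    # hard cap, enforced before each request
    },
)
user_key = resp.json()["key"]     # issue this key to that user's client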

The migration shape (either direction)

Both expose the same OpenAI-shape and Anthropic-shape APIs. Migration is a base URL change:

# LiteLLM (self-hosted)
OPENAI_BASE_URL=https://your-litellm.your-vpc.com/v1
ANTHROPIC_BASE_URL=https://your-litellm.your-vpc.com

# qlaud (hosted)
OPENAI_BASE_URL=https://api.qlaud.ai/v1
ANTHROPIC_BASE_URL=https://api.qlaud.ai
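As a concrete check (Python shown; other SDKs behave the same way), the openai client reads both variables from the environment, so there is nothing to edit in application code:

from openai import OpenAI

# No base_url in code: the SDK picks up OPENAI_BASE_URL and OPENAI_API_KEY
# from the environment, so the env swap above is the entire migration.
client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o",  # illustrative model name
    messages=[{"role": "user", "content": "ping"}],
)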

Your SDK code stays the same. The added layers (per-user keys, threads, tools) are opt-in — if you don't use them on qlaud, you're paying the 7% gateway markup for routing only. Make sure that's the trade you want before switching purely for the platform.

Honest summary

These tools solve overlapping but different problems. LiteLLM is the right answer for self-hosted internal infrastructure where you control the data plane. qlaud is the right answer for hosted SaaS where you ship AI to end-users you bill, and want the billing + threads + tool layers managed instead of rebuilt. They aren't direct competitors — they target different teams.

If you're evaluating both, the question to answer first is: do you have to self-host? If yes, LiteLLM. If no, qlaud has more layers built in and you ship faster.

Get started

Sign up for qlaud, top up $5. Same OpenAI-compatible interface as LiteLLM, plus the layers you'd otherwise build yourself. If you're currently running LiteLLM and considering offloading the infra, the migration is a base URL change.

Tags: litellm alternative · ai proxy comparison · llm gateway · openai compatible proxy · self-hosted ai proxy · managed ai gateway

Frequently asked questions

Can I use both LiteLLM and qlaud together?

Technically yes — point LiteLLM's `litellm_params.api_base` at qlaud and you get LiteLLM's Python interface to qlaud's gateway. But there's no real reason — qlaud already exposes the same OpenAI-compatible interface LiteLLM proxies for, and you'd be paying both layers (LiteLLM hosting cost + qlaud's 7%). Pick one.
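If you do wire them together anyway, the SDK-level equivalent of that config is a one-liner. A sketch, with the model name and key as placeholders:

import litellm

# The "openai/" prefix tells LiteLLM the upstream speaks the OpenAI
# protocol; api_base points it at qlaud instead of api.openai.com.
resp = litellm.completion(
    model="openai/gpt-4o",                 # placeholder model name
    api_base="https://api.qlaud.ai/v1",
    api_key="YOUR_QLAUD_KEY",              # placeholder key
    messages=[{"role": "user", "content": "ping"}],
)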

Does LiteLLM support per-user spend caps?

LiteLLM has user-level budgets via its `LiteLLM-User-API-Key` middleware in the Enterprise tier. The open-source version supports virtual keys + per-team budgets, but not the per-user keys with pre-flight spend-cap enforcement that qlaud was built around. If per-end-user billing is core to your app, qlaud is built for it; if you only need internal team budgets and self-hosting, LiteLLM works.

What about threads, tool catalog, semantic search?

LiteLLM is a routing proxy. It doesn't ship threads (you build that), it doesn't ship a vendor MCP catalog (you wire each integration), it doesn't ship semantic search. qlaud bundles all three as managed primitives. If you only need routing — LiteLLM is a fine choice. If you'd otherwise build the rest yourself, qlaud collapses ~6 weeks of plumbing.

Is LiteLLM faster because it cuts out a network hop?

Marginally — if you self-host LiteLLM in the same datacenter as your app, you save ~10-30ms vs. a hop to a hosted gateway. qlaud runs on Cloudflare's global edge (300+ PoPs), so the hop usually adds 5-15ms wherever your users are. For most apps the difference is invisible. If you need single-digit-millisecond inference latency, you probably already know it.
