Watch any AI app launch in 2026 and the architecture diagram is depressingly identical. Clerk for auth. Stripe for billing. Postgres for chat history. A custom Express/FastAPI/Next route that proxies OpenAI and meters tokens into a usage table. Add Pinecone or pgvector for semantic search. Add a Lambda for async background jobs. You spent six weeks before you wrote a single line of the actual product.
Three of those layers are commodity for AI apps specifically. qlaud collapses them into one platform so you can spend your engineering time on the experience, not the plumbing.
The default stack today
If you start an AI app from scratch right now you end up with:
- Auth: Clerk, Auth0, Supabase Auth, or rolled-from-scratch JWTs.
- Billing: Stripe + a custom proxy that intercepts every inference call, counts tokens, looks up provider prices, decrements a per-user counter, returns 402 when over a cap.
- Database for chat: Postgres or Supabase, with a `messages` table, indexes on `thread_id + created_at`, retention crons, GDPR delete plumbing.
- Semantic search: pgvector / Pinecone / Weaviate plus an embedding pipeline that fires on every message.
- The proxy itself: a Next route or FastAPI service that translates between your client and the upstream provider, handles streaming, retries, fallback, observability.
- Tool dispatch: a separate state machine that parses `tool_use` blocks, calls the right function, appends results, re-invokes the model.
- Connector vendor APIs: Linear, GitHub, Notion, Slack, etc. — each with its own SDK, its own auth dance, its own credential store.
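The billing layer alone shows how much hand-rolled logic hides in that list. A minimal sketch of the per-user metering check you'd otherwise write, where the prices and the `Usage` shape are illustrative assumptions, not any real provider's rates or API:

```typescript
// Illustrative per-user cap check for a hand-rolled metering proxy.
// Prices and the Usage shape are assumptions, not a real API.
const PRICE_PER_1M = { input: 3.0, output: 15.0 }; // USD per 1M tokens, example rates

type Usage = { spentUsd: number; capUsd: number };

function costUsd(inputTokens: number, outputTokens: number): number {
  return (inputTokens / 1e6) * PRICE_PER_1M.input +
         (outputTokens / 1e6) * PRICE_PER_1M.output;
}

// Returns the HTTP status your proxy should respond with.
function meter(u: Usage, inputTokens: number, outputTokens: number): number {
  const cost = costUsd(inputTokens, outputTokens);
  if (u.spentUsd + cost > u.capUsd) return 402; // over cap: reject before upstream
  u.spentUsd += cost;                           // under cap: record and forward
  return 200;
}
```

And that's before streaming (you only know output tokens at the end), per-provider price drift, and race conditions on the counter, all of which a real metering pipeline has to get right.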
Half of these you don't actually want to run. They're AI-app commodity infra. You want to run the part that's actually your product.
What qlaud collapses
Three of the seven layers above become managed primitives. Auth stays yours (Clerk or whatever) because that's product-shaped, not AI-shaped.
Billing → qlaud
POST /v1/keys mints a per-user API key with a hard max_spend_usd cap. The gateway enforces the cap before forwarding to the provider — over-cap requests return 402 immediately, no upstream burn. A Stripe-backed prepaid wallet handles top-ups end-to-end. GET /v1/usage gives you per-user spend rollups for invoicing however you bill (Stripe subscriptions, Paddle, in-app credits).
```js
// Replaces your custom proxy + metering + Stripe usage records
await fetch('https://api.qlaud.ai/v1/keys', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${MASTER_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    name: `user_${userId}`,
    scope: 'standard',
    max_spend_usd: 5, // enforced at the gateway
  }),
});
```

Database for chat → qlaud threads
POST /v1/threads creates a conversation. POST /v1/threads/:id/messages sends the next turn — qlaud auto-loads prior messages into the model's context, persists the assistant reply, returns the response. GET /v1/threads/:id/messages?order=desc&limit=30 paginates for your UI's scroll-up history. No Postgres table. No index decisions. No retention cron.
```js
// Your app's chat route — no DB writes, no message-state plumbing
const r = await fetch(`https://api.qlaud.ai/v1/threads/${threadId}/messages`, {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${user.qlaudKey}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'claude-sonnet-4-6',
    max_tokens: 1024,
    content: userInput, // qlaud loads the rest
    stream: true,
  }),
});
return new Response(r.body, { headers: { 'content-type': 'text/event-stream' } });
```

Semantic search → qlaud /v1/search
Every persisted message gets indexed in Vectorize automatically. GET /v1/search?q=… returns ranked snippets across every conversation in your account, scoped by end_user_id if you set one on the thread. No embedding pipeline to wire, no pgvector index to maintain.
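A hedged sketch of calling it from a server route. The `searchUrl` helper is our own convenience; only GET /v1/search and the `q` / `end_user_id` parameters come from the description above:

```typescript
// Hypothetical helper; only GET /v1/search, q, and end_user_id
// are from the qlaud description — the helper itself is ours.
function searchUrl(base: string, q: string, endUserId?: string): string {
  const u = new URL('/v1/search', base);
  u.searchParams.set('q', q);
  if (endUserId) u.searchParams.set('end_user_id', endUserId);
  return u.toString();
}

// e.g. inside a route handler:
// const r = await fetch(searchUrl('https://api.qlaud.ai', query, user.id), {
//   headers: { Authorization: `Bearer ${user.qlaudKey}` },
// });
```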
Proxy + tool dispatch → built-in
The router is the same OpenAI-compatible / Anthropic-compatible surface every other gateway exposes (api.qlaud.ai as base URL). The tool-dispatch loop runs INSIDE qlaud — when you set tools_mode: 'dynamic' (the default when no explicit tools array is passed), the gateway injects 4 meta-tools, the model picks/dispatches them, and you just get the final assistant message back. No state machine on your side.
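In practice the request body is the same chat call as before; the only difference is whether you pass `tools` yourself. A sketch under the assumption that explicit tool arrays are forwarded upstream unchanged, which mirrors how other gateways behave:

```typescript
// Dynamic mode (the default when no tools array is passed):
// qlaud injects its meta-tools and runs the dispatch loop server-side.
const dynamicBody = {
  model: 'claude-sonnet-4-6',
  max_tokens: 1024,
  content: 'File a Linear ticket for the login bug',
  tools_mode: 'dynamic',
};

// Explicit mode (assumption: mirrors the upstream provider API):
// pass your own tools array and run the dispatch loop yourself.
const explicitBody = {
  model: 'claude-sonnet-4-6',
  max_tokens: 1024,
  content: 'File a Linear ticket for the login bug',
  tools: [
    { name: 'create_ticket', description: 'File a ticket', input_schema: { type: 'object' } },
  ],
};
```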
Connectors → 35 catalog vendors, default-on
Linear, GitHub, ClickUp, Notion, Stripe, Sentry, HubSpot, Salesforce, Asana, etc. — every per-user-capable catalog vendor is auto-discoverable by the model. End-user pastes their API key once via a hosted connect URL, qlaud encrypts it, the tools materialize. See 47 tools out of the box for the full list.
What you keep building
- The actual chat UI — message bubbles, streaming text renderer, tool-use rendering, mobile responsiveness.
- Auth — Clerk or whatever you already use.
- Onboarding, settings, the parts of your product that aren't AI-shaped.
- Your prompt engineering and the system messages that make your assistant feel like YOUR assistant.
- Custom tools your model needs that aren't in the catalog — register them via `POST /v1/tools` as a webhook; your endpoint runs the logic, qlaud dispatches.
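The webhook side is just a route that receives a tool call and returns a result. A minimal sketch of that handler; the payload shape here is an assumption, not qlaud's documented schema:

```typescript
// Assumed webhook payload shape — check qlaud's docs for the real schema.
type ToolCall = { name: string; input: Record<string, unknown> };

// Your endpoint's logic: dispatch on tool name, return a JSON-serializable result.
function handleToolCall(call: ToolCall): unknown {
  switch (call.name) {
    case 'lookup_order':
      // e.g. query your own DB here; stubbed for the sketch
      return { orderId: call.input.orderId, status: 'shipped' };
    default:
      return { error: `unknown tool: ${call.name}` };
  }
}
```

qlaud calls this endpoint during the dispatch loop, appends the returned JSON as the tool result, and re-invokes the model, so the loop still never touches your client.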
Trade-offs — be honest
You give up some flexibility for managed simplicity. Specifically:
- You don't own the database row layout. Threads + messages live in qlaud's D1. You read them via API, you can export them, but you can't do a one-off SQL query against them. If your chat history is the core IP of your product (most apps it isn't), this is a real tradeoff.
- You bind your AI billing to qlaud's 7% markup. Whatever you charge end-users, qlaud takes the 7% on top of provider cost. If you do $100K/month of inference at razor margins, that's a real line item — but it's also less than the engineer-time to maintain a metering pipeline + caps + Stripe usage records + per-provider price tracking.
- Self-hosting isn't supported. qlaud is a hosted service. If you need on-prem or air-gapped, see LiteLLM for routing and build the rest yourself.
The math
Default stack: ~6 weeks of one engineer to ship per-user metering + cap enforcement + chat persistence + semantic search + a couple of tool integrations. That's ~$30-60K loaded cost before any product work starts, plus ongoing maintenance.
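To make the comparison concrete, a back-of-envelope sketch where every figure is an assumption you should swap for your own numbers:

```typescript
// All numbers illustrative: swap in your own spend and loaded cost.
const monthlyProviderCostUsd = 10_000;                 // raw inference spend
const qlaudMarkupUsd = monthlyProviderCostUsd * 0.07;  // $700/month at the 7% markup
const buildCostUsd = 45_000;                           // midpoint of the $30-60K estimate
const breakEvenMonths = buildCostUsd / qlaudMarkupUsd; // ~64 months, before any
                                                       // ongoing maintenance cost
```

At that spend level the build-it-yourself path takes over five years to pay back the markup, and the maintenance burden only widens the gap; the calculus changes at much higher volumes, which is the $100K/month caveat above.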
qlaud: a base URL change and four lines on signup (full walkthrough). One bill at month-end. Six weeks back to spend on the parts of your product users actually see.
Get started
Sign up for qlaud, top up $5, mint a master key. Threads + tools + per-user billing are live from request #1. The chat-app reference impl with Clerk + qlaud (full source) is at /blog/per-user-ai-billing-in-5-minutes.