Architecture · Apr 28, 2026 · 7 min read

Stop building auth, billing, and a database for your AI app

The default AI-app stack is Clerk + Stripe + Postgres + a custom OpenAI/Anthropic proxy. qlaud collapses three of those four into one platform: per-user keys with spend caps, a Stripe-backed wallet, and threads with semantic search. You write the chat UI; qlaud handles the backend.

qlaud team · Engineering

Watch any AI app launch in 2026 and the architecture diagram is depressingly identical. Clerk for auth. Stripe for billing. Postgres for chat history. A custom Express/FastAPI/Next route that proxies OpenAI and meters tokens into a usage table. Add Pinecone or pgvector for semantic search. Add a Lambda for async background jobs. You spent six weeks before you wrote a single line of the actual product.

Three of those layers are commodity for AI apps specifically. qlaud collapses them into one platform so you can spend your engineering time on the experience, not the plumbing.

The default stack today

If you start an AI app from scratch right now you end up with:

  • Auth: Clerk, Auth0, Supabase Auth, or rolled-from-scratch JWTs.
  • Billing: Stripe + a custom proxy that intercepts every inference call, counts tokens, looks up provider prices, decrements a per-user counter, returns 402 when over a cap.
  • Database for chat: Postgres or Supabase, with a messages table, indexes on thread_id + created_at, retention crons, GDPR delete plumbing.
  • Semantic search: pgvector / Pinecone / Weaviate plus an embedding pipeline that fires on every message.
  • The proxy itself: a Next route or FastAPI service that translates between your client and the upstream provider, handles streaming, retries, fallback, observability.
  • Tool dispatch: a separate state machine that parses tool_use blocks, calls the right function, appends results, re-invokes the model.
  • Connector vendor APIs: Linear, GitHub, Notion, Slack, etc. — each with its own SDK, its own auth dance, its own credential store.

Most of these you don't actually want to run. They're commodity AI-app infrastructure. You want to run the part that's actually your product.

What qlaud collapses

Six of the seven layers above become managed primitives. Auth stays yours (Clerk or whatever) because it's product-shaped, not AI-shaped.

Billing → qlaud

POST /v1/keys mints a per-user API key with a hard max_spend_usd cap. The gateway enforces the cap before forwarding to the provider — over-cap requests return 402 immediately, no upstream burn. A Stripe-backed prepaid wallet handles top-ups end-to-end. GET /v1/usage gives you per-user spend rollups for invoicing however you bill (Stripe subscriptions, Paddle, in-app credits).

// Replaces your custom proxy + metering + Stripe usage records
await fetch('https://api.qlaud.ai/v1/keys', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${MASTER_KEY}`,
    'content-type': 'application/json',
  },
  body: JSON.stringify({
    name: `user_${userId}`,
    scope: 'standard',
    max_spend_usd: 5,    // enforced at the gateway
  }),
});
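The cap semantics can be pictured with a small sketch. This is illustrative only: the field names (`max_spend_usd` aside, which is from the API above, `spent_usd` and the response shape are assumptions), and the real check happens inside the gateway, not in your code.

```javascript
// Illustrative sketch of the gateway-side cap check (assumed field
// names). Over-cap requests are rejected before any tokens reach the
// upstream provider, so there is no spend past the cap.
function capCheck(key, estimatedCostUsd) {
  const remaining = key.max_spend_usd - key.spent_usd;
  if (estimatedCostUsd > remaining) {
    return { status: 402, error: 'spend_cap_exceeded', remaining_usd: remaining };
  }
  return { status: 200, remaining_usd: remaining - estimatedCostUsd };
}
```

The point of the sketch: 402 is decided from state the gateway already holds, which is why an over-cap request costs you nothing upstream.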

Database for chat → qlaud threads

POST /v1/threads creates a conversation. POST /v1/threads/:id/messages sends the next turn — qlaud auto-loads prior messages into the model's context, persists the assistant reply, returns the response. GET /v1/threads/:id/messages?order=desc&limit=30 paginates for your UI's scroll-up history. No Postgres table. No index decisions. No retention cron.

// Your app's chat route — no DB writes, no message-state plumbing
const r = await fetch(`https://api.qlaud.ai/v1/threads/${threadId}/messages`, {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${user.qlaudKey}`,
    'content-type': 'application/json',
  },
  body: JSON.stringify({
    model: 'claude-sonnet-4-6',
    max_tokens: 1024,
    content: userInput,                        // qlaud loads the rest
    stream: true,
  }),
});
return new Response(r.body, { headers: { 'content-type': 'text/event-stream' } });
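On the UI side, scroll-up history is a merge problem: pages from `order=desc&limit=30` arrive newest-first, while the transcript renders oldest-first. A minimal sketch, assuming an illustrative `{ id }` message shape rather than the documented schema:

```javascript
// Scroll-up pagination: each fetched page is newest-first (order=desc);
// reverse it and prepend so the rendered list stays oldest-first.
// Message shape is illustrative, not the documented schema.
function prependOlderPage(visibleMessages, descPage) {
  return [...descPage].reverse().concat(visibleMessages);
}
```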

Semantic search → qlaud /v1/search

Every persisted message gets indexed in Vectorize automatically. GET /v1/search?q=… returns ranked snippets across every conversation in your account, scoped by end_user_id if you set one on the thread. No embedding pipeline to wire, no pgvector index to maintain.
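Building the scoped query is the only client-side work. A sketch, assuming `end_user_id` is passed as a query parameter (the scoping mechanism described above; check the API reference for the exact parameter name):

```javascript
// Build a scoped /v1/search URL. The end_user_id query parameter is
// an assumption based on the scoping described in the text.
function buildSearchUrl(q, endUserId) {
  const url = new URL('https://api.qlaud.ai/v1/search');
  url.searchParams.set('q', q);
  if (endUserId) url.searchParams.set('end_user_id', endUserId);
  return url.toString();
}
```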

Proxy + tool dispatch → built-in

The router exposes the same OpenAI-compatible / Anthropic-compatible surface every other gateway does (api.qlaud.ai as base URL). The tool-dispatch loop runs INSIDE qlaud — when you set tools_mode: 'dynamic' (the default when no explicit tools array is passed), the gateway injects four meta-tools, the model picks and dispatches them, and you just get the final assistant message back. No state machine on your side.
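For intuition, here is the loop qlaud runs internally so you don't have to, sketched with stubbed model and tool functions. The gateway's actual internals aren't public, so every name and message shape here is illustrative (the shapes loosely follow Anthropic-style `tool_use` / `tool_result` blocks):

```javascript
// The tool-dispatch state machine, sketched. callModel and runTool
// are stubs standing in for the provider call and tool execution.
async function dispatchLoop(callModel, runTool, messages) {
  for (;;) {
    const reply = await callModel(messages);
    const calls = reply.content.filter(b => b.type === 'tool_use');
    if (calls.length === 0) return reply; // final assistant message
    messages.push({ role: 'assistant', content: reply.content });
    const results = [];
    for (const c of calls) {
      results.push({ type: 'tool_result', tool_use_id: c.id, content: await runTool(c) });
    }
    messages.push({ role: 'user', content: results });
  }
}
```

This is the loop you'd otherwise maintain yourself: parse tool blocks, execute, append results, re-invoke until the model stops asking for tools.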

Connectors → 35 catalog vendors, default-on

Linear, GitHub, ClickUp, Notion, Stripe, Sentry, HubSpot, Salesforce, Asana, etc. — every per-user-capable catalog vendor is auto-discoverable by the model. End-user pastes their API key once via a hosted connect URL, qlaud encrypts it, the tools materialize. See 47 tools out of the box for the full list.

What you keep building

  • The actual chat UI — message bubbles, streaming text renderer, tool-use rendering, mobile responsiveness.
  • Auth — Clerk or whatever you already use.
  • Onboarding, settings, the parts of your product that aren't AI-shaped.
  • Your prompt engineering and the system messages that make your assistant feel like YOUR assistant.
  • Custom tools your model needs that aren't in the catalog — register them via POST /v1/tools as a webhook, your endpoint runs the logic, qlaud dispatches.
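A custom webhook tool can be a single handler function. The payload shape (`name`, `input`) and the tool name below are assumptions for illustration; check the /v1/tools docs for the actual contract:

```javascript
// Hypothetical handler for a custom tool webhook: qlaud POSTs the
// tool call to your endpoint, you return the result the model sees.
// The payload shape and 'lookup_order' tool name are illustrative.
function handleToolCall(payload) {
  if (payload.name === 'lookup_order') {
    // Your business logic here; hard-coded for illustration.
    return { status: 'shipped', order_id: payload.input.order_id };
  }
  return { error: `unknown tool: ${payload.name}` };
}
```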

The trade-offs, honestly

You give up some flexibility for managed simplicity. Specifically:

  • You don't own the database row layout. Threads and messages live in qlaud's D1. You read them via the API and can export them, but you can't run a one-off SQL query against them. If your chat history is the core IP of your product (for most apps it isn't), this is a real trade-off.
  • You bind your AI billing to qlaud's 7% markup. Whatever you charge end users, qlaud takes 7% on top of provider cost. At $100K/month of inference on razor-thin margins, that's a real line item, but it's still less than the engineering time to maintain a metering pipeline, spend caps, Stripe usage records, and per-provider price tracking.
  • Self-hosting isn't supported. qlaud is a hosted service. If you need on-prem or air-gapped, see LiteLLM for routing and build the rest yourself.

The math

Default stack: ~6 weeks of one engineer to ship per-user metering + cap enforcement + chat persistence + semantic search + a couple of tool integrations. That's ~$30-60K loaded cost before any product work starts, plus ongoing maintenance.

qlaud: a base URL change and four lines on signup (full walkthrough). One bill at month-end. Six weeks back to spend on the parts of your product users actually see.

Get started

Sign up for qlaud, top up $5, mint a master key. Threads + tools + per-user billing are live from request #1. The chat-app reference impl with Clerk + qlaud (full source) is at /blog/per-user-ai-billing-in-5-minutes.

Tags: ai app backend · stripe for ai · ai billing · managed ai infrastructure · no database chatbot · per-user keys

Frequently asked questions

Doesn't 'no database' just mean qlaud is your single point of failure?

qlaud's data plane is Cloudflare D1 + Durable Objects + Vectorize across the global edge — the same primitives Cloudflare uses for their own production services. We expose every piece of your data via API: GET /v1/threads/:id/messages dumps the full transcript any time you want it, GET /v1/usage exports billing rollups. If you ever want to migrate off, the data is yours to take. The 'no database' promise is about not running ONE for your AI state — not about lock-in.
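Migrating off is a paginated export loop. Sketched here with an injectable page fetcher so the shape is testable; the offset-style pagination is an assumption for illustration, and the documented cursoring may differ:

```javascript
// Drain a thread's full transcript page by page until a fetch
// returns an empty page. fetchPage wraps the real
// GET /v1/threads/:id/messages call; offset-style paging is assumed.
async function exportThread(fetchPage, threadId) {
  const all = [];
  let page;
  do {
    page = await fetchPage(threadId, all.length);
    all.push(...page);
  } while (page.length > 0);
  return all;
}
```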

How does this compare to Supabase + the OpenAI SDK?

Supabase gives you Postgres + auth; you still wire metering, spend caps, threads, semantic search, and tool dispatch yourself. That's a quarter of engineering time. qlaud gives you all of those as managed primitives; you keep Clerk (or Supabase Auth) for sign-in and ship the actual product. Roughly: Supabase + OpenAI = batteries not included; qlaud = batteries included for the AI layer specifically.

What if I need a custom DB for non-AI app data?

Use one. qlaud handles AI-specific state (threads, messages, semantic index, tool credentials, wallet). You still need a DB for product data — user profiles, settings, your own business logic. Most apps end up with a small Postgres or SQLite for app state + qlaud for everything AI-shaped. The 'no database' claim is specifically about the chat-backend bit.

Can I self-host qlaud if I don't want a managed service?

Not currently. qlaud is a hosted service (Cloudflare Workers + D1 + DO). If you specifically need self-hosted (on-prem, air-gapped, regulated environment) the right tool is LiteLLM for routing + your own per-user metering layer. qlaud's value proposition is 'don't run infrastructure for your AI app' — which is the wrong fit if you must run infrastructure.

Keep reading

Ship an AI agent on qlaud in under a minute.

Hardware-isolated microVM per sandbox, ~190 ms round-trip, 80 ms fork(), full Python REPL persistence. Free tier includes $200 credit.
