Tutorials · Apr 25, 2026 · 7 min read

Per-user AI billing in 5 minutes (without rebuilding metering)

If you're shipping a product that wraps frontier AI, you need per-user usage caps and per-user invoices. Here's the four-step playbook with qlaud — mint, cap, meter, bill — using the official SDKs you're already using.

qlaud team · Engineering

You shipped an AI app. Day-one users love it. Day-seven you notice your provider bill is climbing fast and you have no idea which user is responsible. Day-fourteen one user's leaked browser key burns through $400 in three hours. This post walks through the four-step pattern that prevents that — without ripping out the SDK you're already using.

The pattern: mint, cap, meter, bill

Same shape as Stripe Connect, AWS sub-accounts, or any other tenant-scoped infra primitive — but for AI inference. You hold one master key. On signup you mint a per-user child key with a hard spend cap. Your users' requests use their own key, and the gateway enforces the cap before forwarding to the provider. At month-end you pull a per-user spend rollup and push it into whatever billing system you already run.

Step 1: mint your master key

Sign up at qlaud.ai, top up the wallet, and in /keys create a master-scope key. Store it as a server-side secret — this key can mint, list, and revoke other keys, so treat it like an admin token.

# .env (server-side only — never ship to the browser)
QLAUD_MASTER_KEY=qlk_live_<your_master_key>

Step 2: mint a per-user key on signup, with a cap

Wherever your user-signup webhook lives (Clerk user.created, Supabase auth trigger, your own POST /signup) — add four lines that mint a qlaud key for the new user with a hard spend cap.

// On user signup in your app
const res = await fetch('https://api.qlaud.ai/v1/keys', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.QLAUD_MASTER_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    name: `user_${userId}`,
    scope: 'standard',     // inference-only, can't mint sub-keys
    max_spend_usd: 5,      // hard cap — gateway enforces, returns 402 over
  }),
});

const { id: keyId, secret } = await res.json();
// secret = qlk_live_... — returned ONCE, store with the user
await db.users.update({ id: userId }, { qlaudKeyId: keyId, qlaudKey: secret });

The secret is shown once — store it with the user record. The id is what you use to revoke or look up usage later. The cap is enforced at the gateway, not in your app code: a request that would push the user over their cap returns 402 Payment Required before it ever reaches the upstream provider.
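Since the cap comes back as an HTTP status rather than an SDK-level quota object, it's worth mapping it explicitly in your app. Here's a minimal sketch — the helper and the status-to-outcome mapping are our own illustration, not part of any SDK — so a capped user sees an upgrade prompt instead of a generic error:

```javascript
// Our own sketch: map gateway status codes to app-level outcomes.
// 402 = per-user cap reached (enforced before the provider is called).
function classifyGatewayStatus(status) {
  if (status === 402) return 'spend_cap_reached';
  if (status === 401) return 'key_invalid';      // e.g. key was revoked
  if (status >= 500) return 'upstream_error';
  return status < 400 ? 'ok' : 'client_error';
}

// In a route handler, roughly:
//   const outcome = classifyGatewayStatus(err.status);
//   if (outcome === 'spend_cap_reached') showUpgradePrompt(user);
```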

Step 3: your user's app calls qlaud with their key

This is the part where you change nothing about your existing SDK code. You're already using the OpenAI SDK, the Anthropic SDK, the ElevenLabs SDK, Vercel AI SDK, LangChain — point the base URL at qlaud and pass the user's minted key. Same request shape, same response shape, same streaming semantics.

// Anthropic SDK — Claude with cache_control, tool_use, thinking blocks all preserved
import Anthropic from '@anthropic-ai/sdk';
const claude = new Anthropic({
  baseURL: 'https://api.qlaud.ai',
  apiKey: user.qlaudKey,                  // qlk_live_... minted in step 2
});
await claude.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 1024,
  messages: [{ role: 'user', content: prompt }],
});

// OpenAI SDK — same key, same gateway
import OpenAI from 'openai';
const openai = new OpenAI({
  baseURL: 'https://api.qlaud.ai/v1',
  apiKey: user.qlaudKey,
});
await openai.chat.completions.create({
  model: 'gpt-5.4',
  messages: [{ role: 'user', content: prompt }],
});

Anthropic, OpenAI, ElevenLabs, Deepgram, Google AI Studio, xAI Grok, Alibaba Qwen, Moonshot Kimi, MiniMax, DeepSeek, Mistral, Cerebras — all accessible through the same key, same base URL. Whatever auth-header convention your SDK uses (x-api-key, xi-api-key, x-goog-api-key, Authorization: Bearer, Authorization: Token), qlaud accepts it.
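If you're making raw REST calls instead of using an SDK, the same flexibility applies. A tiny helper like this — our own illustration, not a qlaud or provider API — shows the conventions side by side:

```javascript
// Our own sketch: build the auth header in whichever convention a given
// provider's native API expects. The gateway accepts all of them.
function authHeader(provider, key) {
  switch (provider) {
    case 'anthropic':  return { 'x-api-key': key };
    case 'elevenlabs': return { 'xi-api-key': key };
    case 'google':     return { 'x-goog-api-key': key };
    default:           return { Authorization: `Bearer ${key}` };
  }
}
```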

Step 4: pull per-user spend at month-end and bill

Whenever your billing cron fires — first of the month, every Sunday, end of trial — pull the rollup with a single GET. The response groups spend by key, broken down by model, with token counts and latency.

// Pull last month's per-user usage
const fromMs = startOfLastMonth.getTime();
const toMs = endOfLastMonth.getTime();
const res = await fetch(
  `https://api.qlaud.ai/v1/usage?from_ms=${fromMs}&to_ms=${toMs}`,
  { headers: { Authorization: `Bearer ${process.env.QLAUD_MASTER_KEY}` } },
);

const { by_key } = await res.json();
// by_key: [{ key_id, name, cost_micros, by_model: [...] }, ...]

for (const row of by_key) {
  const user = await db.users.findOne({ qlaudKeyId: row.key_id });
  const upstreamUsd = row.cost_micros / 1_000_000;  // micros -> dollars
  const yourMargin = upstreamUsd * YOUR_MARGIN_MULTIPLIER;  // your rate, e.g. 0.5 = 50% markup
  const billUsd = upstreamUsd + yourMargin;
  // Push to whatever billing tool you're already using
  // (Stripe, Paddle, Lemon Squeezy, in-app credits, custom invoicing)
  await yourBillingTool.charge(user, billUsd);
}

Costs are returned in micros (1/1,000,000 USD) for exact decimal arithmetic — no floating-point rounding when you reconcile. The by_key array only contains keys you minted, so you see only your own users' usage. Want a single user's breakdown? Hit /v1/keys/:keyId/usage for an event-level drill-down with cap progress.
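To get the benefit of micros, keep aggregation in integers and convert to dollars only at the display edge. A minimal sketch (the helper names are ours; the `cost_micros` field follows the `by_key` shape above):

```javascript
// Sum spend across keys in integer micros — no float drift while aggregating.
function totalCostMicros(byKey) {
  return byKey.reduce((sum, row) => sum + row.cost_micros, 0);
}

// Convert micros to a display string, rounding to whole cents at the edge.
function microsToUsdString(micros) {
  const cents = Math.round(micros / 10_000);  // 10,000 micros = 1 cent
  return `$${Math.floor(cents / 100)}.${String(cents % 100).padStart(2, '0')}`;
}
```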

Why this beats rolling your own

Per-user metering looks deceptively simple — it's "just" counting tokens and writing rows. The hard parts are the ones you don't see until you ship:

  • Streaming usage is inconsistent across providers. Some include usage in the final SSE chunk, some include it only when you set stream_options.include_usage = true, some omit it entirely on partial streams. We patch this per-provider.
  • Caps have to be enforced before the request, not after. A naive "decrement after response" lets one user fan out 50 parallel requests and overspend by 50× the cap.
  • Cost calc has to track every model price change. Providers re-price quietly. We track the catalog so your invoice math doesn't silently go wrong.
  • Native SDK shapes are not all OpenAI-compatible. Anthropic cache_control, ElevenLabs voice cloning, Deepgram diarization — these don't fit OpenAI's schema. We do native passthrough so the SDK you're using actually works.
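To make the streaming-usage bullet concrete: with OpenAI's documented `stream_options: { include_usage: true }`, the usage object arrives only on a final chunk whose `choices` array is empty. A sketch of extracting it — the helper is our own, not an SDK API:

```javascript
// Our own sketch: pull the usage object out of a stream's chunks.
// Only the terminal chunk carries it when include_usage is set.
function extractUsage(chunks) {
  let usage = null;
  for (const chunk of chunks) {
    if (chunk.usage) usage = chunk.usage;
  }
  return usage;
}

// In a real stream loop, the same logic is:
//   for await (const chunk of stream) { if (chunk.usage) usage = chunk.usage; }
```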

You spend a quarter writing this and you're still patching it next year. Or you change a base URL and add four lines on signup.

Get started

Sign up for qlaud, top up $5, mint a master key, ship per-user billing tonight. Your users' SDK code stays exactly the same.

Tags: per-user billing · AI gateway · usage metering · Stripe · API keys · spending caps

Frequently asked questions

Why not just rate-limit my own OpenAI/Anthropic key?

Provider keys are global. You can't enforce per-user caps, can't see per-user spend, and one runaway user (or a leaked client-side key) can drain your account in minutes. qlaud mints a key per end-user with a hard spend cap enforced at the gateway — over the cap, the request returns 402 before it ever hits the provider.

Do I need to rewrite my app to use qlaud?

No. qlaud accepts the official SDK request shape from OpenAI, Anthropic, ElevenLabs, Vercel AI SDK, LangChain, LlamaIndex — verbatim. You change one line: the base URL. Your app code, your prompts, your response handling — all unchanged.

How do I bill the user the markup I want to charge?

qlaud charges you the upstream cost plus a flat 7% gateway fee. Whatever margin you put on top of that is yours — set it in your Stripe / Paddle / Lemon Squeezy product, charge per-token, charge per-feature, charge a flat subscription with a usage allowance. The /v1/usage endpoint gives you per-user cost in micros; you decide what to invoice.

What if a user goes over their cap mid-request?

qlaud does a pre-flight check before forwarding. If the cap is already exceeded the request returns 402 Payment Required immediately. We tolerate cents-level overdraft on a single in-flight stream rather than killing it mid-response — once the stream finishes the next request is blocked until you raise the cap or top up.
