Building a ChatGPT clone is a 2024 cliché — except every production version still needs auth, per-user billing, message persistence, semantic search, and tool integrations, which adds up to weeks of plumbing. This is the end-to-end Next.js + Clerk + qlaud version that ships in ~200 lines because qlaud owns the bottom three layers. Full source is open at github.com/qlaud/chatai.
What we're building
- Sign-in via Clerk (email + Google + GitHub)
- Per-user qlaud key minted on signup with a $5 spend cap
- Chat UI with streaming, message history, infinite-scroll pagination
- Tool calls work out of the box — "send an email", "create a Linear ticket", "run this Python" all dispatch to qlaud's 47 catalog tools without per-tool wiring
- Semantic search across the user's past conversations
What we're NOT building (because qlaud handles it):
- A messages table or any database for chat state
- An OpenAI/Anthropic proxy
- A metering pipeline or spend-cap enforcement
- An embedding pipeline + vector index
- A tool dispatch state machine
- Per-vendor MCP server registration
1. Project setup
```bash
npx create-next-app@latest chatai --typescript --tailwind --app
cd chatai
pnpm add @clerk/nextjs
```

Add Clerk env vars to .env.local:

```bash
NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY=pk_test_…
CLERK_SECRET_KEY=sk_test_…
CLERK_WEBHOOK_SECRET=whsec_…
QLAUD_MASTER_KEY=qlk_live_…  # mint at qlaud.ai/keys
```

2. Mint a per-user qlaud key on signup
Wire a Clerk webhook handler that fires on user.created, mints a qlaud key with a $5 cap, and stashes it in Clerk privateMetadata so it's available on every authenticated request.
```ts
// src/app/api/webhooks/clerk/route.ts
import { Webhook } from 'svix';
import { clerkClient } from '@clerk/nextjs/server';

export async function POST(req: Request) {
  const wh = new Webhook(process.env.CLERK_WEBHOOK_SECRET!);

  // svix throws on a bad signature — answer 400, not a generic 500
  let event: { type: string; data: { id: string } };
  try {
    event = wh.verify(await req.text(), {
      'svix-id': req.headers.get('svix-id')!,
      'svix-timestamp': req.headers.get('svix-timestamp')!,
      'svix-signature': req.headers.get('svix-signature')!,
    }) as { type: string; data: { id: string } };
  } catch {
    return new Response('invalid signature', { status: 400 });
  }

  if (event.type === 'user.created') {
    const qlaud = await fetch('https://api.qlaud.ai/v1/keys', {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${process.env.QLAUD_MASTER_KEY}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        name: `user_${event.data.id}`,
        scope: 'standard',
        max_spend_usd: 5,
      }),
    }).then((r) => r.json());

    await (await clerkClient()).users.updateUserMetadata(event.data.id, {
      privateMetadata: {
        qlaud_key_id: qlaud.id,
        qlaud_key: qlaud.secret, // shown ONCE — store now
      },
    });
  }

  return Response.json({ ok: true });
}
```

3. Server-only helper to read the user's qlaud key
```ts
// src/lib/user-state.ts
import 'server-only';
import { auth, clerkClient } from '@clerk/nextjs/server';

export async function getQlaudKey(): Promise<string> {
  const { userId } = await auth();
  if (!userId) throw new Error('not signed in');
  const u = await (await clerkClient()).users.getUser(userId);
  const key = u.privateMetadata.qlaud_key as string | undefined;
  if (!key) throw new Error('no qlaud key for user');
  return key;
}
```

4. Chat route — streaming, with tools, no DB
The chat route creates a thread on first message, then streams replies. tools_mode defaults to 'dynamic' — qlaud injects the 4 meta-tools and the model auto-discovers Linear / GitHub / web search / email / etc. on demand.
```ts
// src/app/api/chat/route.ts
import { getQlaudKey } from '@/lib/user-state';

const QLAUD_BASE = 'https://api.qlaud.ai';

export async function POST(req: Request) {
  const { threadId, message } = await req.json();
  const key = await getQlaudKey();

  // Create thread on first message
  let tid = threadId;
  if (!tid) {
    const t = await fetch(`${QLAUD_BASE}/v1/threads`, {
      method: 'POST',
      headers: { Authorization: `Bearer ${key}`, 'Content-Type': 'application/json' },
      body: JSON.stringify({ end_user_id: 'self' }), // single-tenant chat app
    }).then((r) => r.json());
    tid = t.id;
  }

  // Stream the assistant reply — qlaud loads prior messages, persists this one,
  // dispatches any tool calls, returns Anthropic-shape SSE.
  const upstream = await fetch(`${QLAUD_BASE}/v1/threads/${tid}/messages`, {
    method: 'POST',
    headers: { Authorization: `Bearer ${key}`, 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: 'claude-sonnet-4-6',
      max_tokens: 2048,
      content: message,
      stream: true,
    }),
  });

  return new Response(upstream.body, {
    headers: {
      'content-type': 'text/event-stream',
      'x-thread-id': tid,
    },
  });
}
```

5. Message history — paginated, cursor-based
```ts
// src/app/api/threads/[id]/messages/route.ts
import { getQlaudKey } from '@/lib/user-state';

export async function GET(
  req: Request,
  { params }: { params: Promise<{ id: string }> },
) {
  const { id } = await params;
  const key = await getQlaudKey();
  const url = new URL(req.url);
  const beforeSeq = url.searchParams.get('before_seq');
  const r = await fetch(
    `https://api.qlaud.ai/v1/threads/${id}/messages?` +
      new URLSearchParams({
        order: 'desc',
        limit: '30',
        ...(beforeSeq ? { before_seq: beforeSeq } : {}),
      }),
    { headers: { Authorization: `Bearer ${key}` } },
  );
  return new Response(r.body, { headers: { 'content-type': 'application/json' } });
}
```

6. The chat UI (client component)
```tsx
// src/app/chat/page.tsx — simplified for clarity, full source on github
'use client';
import { useState } from 'react';

type Msg = { role: 'user' | 'assistant'; text: string };

export default function ChatPage() {
  const [messages, setMessages] = useState<Msg[]>([]);
  const [input, setInput] = useState('');
  const [threadId, setThreadId] = useState<string | null>(null);

  async function send() {
    if (!input.trim()) return;
    const userMsg: Msg = { role: 'user', text: input };
    const placeholder: Msg = { role: 'assistant', text: '' };
    setMessages((m) => [...m, userMsg, placeholder]);
    setInput('');

    const r = await fetch('/api/chat', {
      method: 'POST',
      body: JSON.stringify({ threadId, message: input }),
    });
    setThreadId(r.headers.get('x-thread-id'));

    const reader = r.body!.getReader();
    const decoder = new TextDecoder();
    let assistantText = '';
    let buffer = ''; // SSE events can split across chunks — keep the partial tail
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      buffer += decoder.decode(value, { stream: true });
      const lines = buffer.split('\n');
      buffer = lines.pop()!; // last element may be an incomplete line
      // Parse Anthropic-shape SSE — content_block_delta events carry text
      for (const line of lines) {
        if (!line.startsWith('data: ')) continue;
        try {
          const evt = JSON.parse(line.slice(6));
          if (evt.type === 'content_block_delta' && evt.delta?.text) {
            assistantText += evt.delta.text;
            setMessages((m) => {
              const next = [...m];
              next[next.length - 1] = { role: 'assistant', text: assistantText };
              return next;
            });
          }
        } catch {}
      }
    }
  }

  return (
    <div className="mx-auto max-w-2xl p-6">
      <div className="space-y-3">
        {messages.map((m, i) => (
          <div key={i} className={m.role === 'user' ? 'text-right' : ''}>
            <span className="inline-block rounded-lg border px-3 py-2">{m.text || '…'}</span>
          </div>
        ))}
      </div>
      <div className="mt-4 flex gap-2">
        <input
          value={input}
          onChange={(e) => setInput(e.target.value)}
          onKeyDown={(e) => e.key === 'Enter' && send()}
          className="flex-1 rounded-md border px-3 py-2"
          placeholder="Ask anything… ('search the web for X', 'create a Linear ticket', …)"
        />
        <button onClick={send} className="rounded-md bg-black px-4 py-2 text-white">Send</button>
      </div>
    </div>
  );
}
```

That's the whole app
Tally it up: ~30 lines for the Clerk webhook, ~5 for the user-state helper, ~30 for the chat route, ~15 for the messages-list route, ~80 for the UI component — about 160 lines of TypeScript, plus boilerplate. Under 200 total.
What you get for those 200 lines:
- Streaming chat with conversation memory across turns
- Per-user budget enforcement (request > cap → 402)
- Tool calls — say "send an email to bob@…" or "create a Linear ticket: Fix login" and it just works (the model auto-discovers email + Linear via qlaud's meta-tools)
- Cursor-based pagination for infinite scroll
- Per-user usage rollup at /v1/usage for invoicing (Stripe, Paddle, whatever)
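The 402 case is worth one explicit client-side branch: when a request would push a user past the cap, qlaud refuses it, and the UI should say so instead of rendering an empty bubble. A minimal sketch that assumes nothing beyond the status code; the `ChatOutcome` type and the notice strings are invented names for illustration:

```typescript
// Classify the chat route's response before touching the stream.
// ChatOutcome and the wording below are hypothetical, not part of qlaud's API.
type ChatOutcome =
  | { kind: 'stream'; body: ReadableStream<Uint8Array> }
  | { kind: 'capped'; notice: string }
  | { kind: 'error'; notice: string };

function classifyChatResponse(
  status: number,
  body: ReadableStream<Uint8Array> | null,
): ChatOutcome {
  if (status === 402) {
    // Spend-cap enforcement: the per-user key was minted with max_spend_usd: 5
    return { kind: 'capped', notice: 'You have hit the $5 usage cap for this account.' };
  }
  if (!body || status >= 400) {
    return { kind: 'error', notice: `Chat request failed (HTTP ${status}).` };
  }
  return { kind: 'stream', body };
}
```

In `send()`, call this on the `fetch` result and only enter the reader loop for the `'stream'` case, rendering the notice as the assistant bubble otherwise.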
Bonus: semantic search across past chats
Add one route. Every persisted message is auto-indexed in Vectorize.
```ts
// src/app/api/search/route.ts
import { getQlaudKey } from '@/lib/user-state';

export async function GET(req: Request) {
  const q = new URL(req.url).searchParams.get('q')!;
  const key = await getQlaudKey();
  const r = await fetch(
    `https://api.qlaud.ai/v1/search?q=${encodeURIComponent(q)}&limit=10`,
    { headers: { Authorization: `Bearer ${key}` } },
  );
  return new Response(r.body, { headers: { 'content-type': 'application/json' } });
}
```

Hook a search input into your UI, hit /api/search?q=…, render the snippets. No embedding pipeline, no vector index to maintain.
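The client half can be two small helpers. A sketch that assumes the route proxies qlaud's JSON through unchanged; the `{ results: [{ snippet, thread_id }] }` response shape is an assumption to verify against qlaud's /v1/search reference:

```typescript
// Build the query string for the /api/search route above. Pure, so it's easy
// to test; URLSearchParams handles the encoding.
export function buildSearchUrl(q: string, limit = 10): string {
  return `/api/search?${new URLSearchParams({ q, limit: String(limit) })}`;
}

// Fetch and unwrap results. The results/snippet/thread_id field names are
// assumed, not confirmed by qlaud's docs.
export async function searchChats(
  q: string,
): Promise<{ snippet: string; thread_id: string }[]> {
  const r = await fetch(buildSearchUrl(q));
  if (!r.ok) return []; // swallow errors in the UI path; log if you care
  const data = await r.json();
  return data.results ?? [];
}
```

Wire `searchChats` to a debounced input, link each snippet to its thread, done.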
Get the full source
The reference implementation is open source — github.com/qlaud/chatai. Includes the polished UI (markdown rendering, code highlighting, tool-call rendering, mobile responsive), error states, the Clerk webhook handler with svix verification, and a Vercel deploy config. Fork it to start your own.
Sign up for qlaud, top up $5, mint a master key. The stack above is shippable the same afternoon.