Platform deep-dive

Skip the Postgres. Skip the metering. Skip the connector wiring.

One POST /v1/threads/:id/messages and you have persisted conversations, 100+ vendor connectors, semantic search, per-user spend caps, and tool dispatch. See the master pitch on the landing page →

Threads

Managed conversation state

Send the next user turn. qlaud loads prior messages into the model's context, persists the assistant reply, returns the response. No `messages` table to maintain. No retention cron. Pagination + GDPR-delete are API calls.

POST /v1/threads/:id/messages
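A minimal client-side sketch, stdlib only. The field names follow the example request shown later on this page; since the thread endpoint loads prior context itself, the body carries only the new turn, not a history array:

```python
import json

def next_turn_payload(text: str, model: str = "claude-sonnet-4-6") -> str:
    """Body for POST /v1/threads/:id/messages. Note what is absent:
    no `messages` history array. qlaud reconstructs context server-side."""
    return json.dumps({
        "model": model,
        "max_tokens": 1024,
        "content": text,
    })

body = next_turn_payload("What did we decide about the Safari bug?")
```

The thread ID and auth key live in the URL and header, so the payload stays identical from the first turn to the hundredth.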

Connectors

105 vendors auto-discoverable

Linear, GitHub, ClickUp, Notion, Stripe, Sentry, HubSpot, Salesforce, Zapier, 96 more — auto-enabled per end-user. They authorize their own account via a one-time hosted URL. Zero per-vendor wiring on your side.

/connectors · 105 live

Semantic search

Every message indexed

Vectorize-backed semantic search across every conversation in your account, scoped by end-user. No embedding pipeline, no pgvector index, no separate vector store bill.

GET /v1/search?q=…
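A sketch of constructing the search call, assuming only the `q` parameter shown above (any paging or filter parameters are not documented on this page). End-user scoping comes from the bearer key, so the URL itself stays simple:

```python
from urllib.parse import urlencode

BASE = "https://api.qlaud.ai"  # base URL used elsewhere on this page

def search_url(query: str) -> str:
    """Build GET /v1/search?q=… with proper URL encoding.
    Per-user scoping is implied by the Authorization header, not the URL."""
    return f"{BASE}/v1/search?{urlencode({'q': query})}"

url = search_url("login button safari")
```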

Per-user billing

Spend caps that actually fire

Mint a key per end-user with a hard `max_spend_usd` cap. The gateway enforces it BEFORE forwarding to the provider — over-cap requests return 402, no upstream burn. Pull per-user usage rollups for invoicing.

POST /v1/keys
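The ordering is the point: reject first, forward second. A hypothetical local model of that decision (the real accounting is server-side, and the function name is an illustration, not qlaud API):

```python
def gateway_precheck(spent_usd: float, max_spend_usd: float,
                     est_cost_usd: float) -> int:
    """Return the HTTP status the gateway would answer with:
    402 if this request would push the user over max_spend_usd,
    200 (i.e. forward upstream) otherwise. Rejection happens
    before any provider tokens are purchased."""
    if spent_usd + est_cost_usd > max_spend_usd:
        return 402  # over-cap: no upstream burn
    return 200      # under-cap: forward to provider

status = gateway_precheck(spent_usd=4.99, max_spend_usd=5.00, est_cost_usd=0.02)
```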

What you usually have to build

The default AI-app stack is the same diagram in every Notion doc. qlaud collapses six of these seven layers into managed primitives — auth stays yours.

| Layer | Without qlaud | With qlaud |
|---|---|---|
| Auth | Clerk / Auth0 / Supabase Auth | Bring your own — qlaud doesn't replace this |
| Per-user billing | Stripe + custom proxy that meters tokens, enforces caps, emits invoices | POST /v1/keys with `max_spend_usd`; gateway enforces; GET /v1/usage for invoicing |
| Conversation state | Postgres `messages` table + index + retention cron + GDPR cascade-delete | Threads API persists each turn; pagination is a query param |
| Semantic search | Pinecone / pgvector + embedding pipeline that fires on every message | GET /v1/search?q=… — auto-indexed in Vectorize |
| LLM proxy | Express / FastAPI route that streams, retries, falls back across providers | qlaud AI Gateway under the hood; native shapes preserved |
| Tool dispatch | State machine that parses tool_use, calls the function, appends result, re-invokes | Built-in dispatch loop; meta-tools for dynamic discovery |
| Connectors | Per-vendor SDK + per-user OAuth + credential vault + per-tool dispatch path | 105 catalog vendors auto-enabled; hosted connect URL flow |

You spend a quarter writing this and you're still patching it next year. Or you change a base URL and add four lines on signup.

The whole conversation, with auto-discovered Linear, in one call

No tools array, no Linear setup, no DB write, no embedding job.

POST https://api.qlaud.ai/v1/threads/:thread_id/messages
Authorization: Bearer qlk_live_<your_end_user's_key>

{
  "model": "claude-sonnet-4-6",
  "max_tokens": 1024,
  "content": "Create a Linear ticket: 'Login button broken on Safari iOS'",
  "stream": true
}

# What happens, autonomously, inside one HTTP call:
#   1. qlaud loads prior thread messages into context
#   2. tools_mode defaults to "dynamic" (no tools array passed)
#      → 4 meta-tools injected (qlaud_search_tools, get_tool_schemas,
#        multi_execute, manage_connections)
#   3. Model calls qlaud_search_tools({intent: "linear ticket"})
#   4. Catalog returns Linear (auto-enabled, not yet connected for this user)
#   5. Model calls qlaud_manage_connections({action: "connect", tool: "qlaud-mcp/linear"})
#   6. qlaud mints a hosted URL, model relays it: "open this to authorize Linear"
#   7. End-user pastes their Linear API key in the qlaud-hosted form
#   8. Model calls linear/create_issue → ticket lands in Linear
#   9. qlaud persists the assistant reply, indexes it for search,
#      debits the user's wallet, returns the streamed response
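With `"stream": true` the response presumably arrives as server-sent events in Anthropic's native shape. A stdlib sketch of pulling the JSON payloads out of the `data:` lines; the event shapes in the comment are assumptions about the wire format, not qlaud documentation:

```python
import json

def parse_sse_data(raw: str) -> list:
    """Collect the JSON payloads from the `data:` lines of an SSE stream.
    Assumes Anthropic-style events, e.g. content_block_delta chunks;
    skips blank lines and a terminal [DONE] sentinel if present."""
    events = []
    for line in raw.splitlines():
        if line.startswith("data:"):
            chunk = line[len("data:"):].strip()
            if chunk and chunk != "[DONE]":
                events.append(json.loads(chunk))
    return events
```

In a real client this would feed from the chunked HTTP body; here it parses a captured string so the logic is testable offline.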

Choosing the right surface

/v1/messages vs /v1/threads

qlaud exposes two API surfaces. Same auth, same wallet, different power-vs-simplicity tradeoff.

/v1/messages

Just routing + billing

Pure Anthropic-shape passthrough. Whatever tools array you send is forwarded verbatim; send none and the model has no tools.

  • ✓ One-line URL swap from your Anthropic SDK
  • ✓ Per-user keys + spend caps + per-user usage rollup
  • ✓ Provider routing, fallback, native shapes preserved
  • ✗ No catalog connectors, no threads, no semantic search

Right when: your existing app already manages history, tool dispatch, and any third-party integrations.
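The one-line swap can be sketched with the stdlib alone: same Anthropic-shape body, new base URL, per-user key. Every name below is a placeholder:

```python
import json
import urllib.request

BASE = "https://api.qlaud.ai"   # was: https://api.anthropic.com

def messages_request(user_key: str, body: dict) -> urllib.request.Request:
    """Assemble (but do not send) a POST /v1/messages request.
    The body is whatever Anthropic-shape payload you already build today;
    only the host and the key change."""
    return urllib.request.Request(
        f"{BASE}/v1/messages",
        data=json.dumps(body).encode(),
        headers={"Authorization": f"Bearer {user_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )

req = messages_request("qlk_live_placeholder", {
    "model": "claude-sonnet-4-6",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "ping"}],
})
```

If you use an official SDK instead of raw HTTP, the same swap is typically a `base_url` override at client construction time.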

/v1/threads/:id/messages

Full backend

Send the next user turn. qlaud handles the rest. Default tools_mode is dynamic — the model auto-discovers + invokes any of 105 catalog tools.

  • ✓ Everything from /v1/messages, plus:
  • ✓ Persisted threads (no DB to maintain)
  • ✓ 105 vendor connectors auto-enabled per end-user
  • ✓ Semantic search across every conversation
  • ✓ Built-in tool dispatch loop

Right when: you're building a chatbot, an agent, or any AI UX with end-user identity.

Built on

Cloudflare Workers, D1, Durable Objects, Vectorize

The data plane is Cloudflare's — the same primitives Cloudflare uses for their own production services. Globally edge-distributed, atomic per-customer wallet, native vector search. Your data is exportable via API any time. Security details.

One Threads call. The whole AI app backend.

Top up $5, mint a master key, ship a chatbot in 200 lines.
