Rate limits
The Mnemos API meters per workspace, per key, and per endpoint class. Every response tells you where you stand.
Tiers
| Plan | Standard | Burst | Chat / search | Note |
|---|---|---|---|---|
| Starter | 60 | 120 | 30 / min | Per workspace. |
| Business | 600 | 1,200 | 300 / min | Per workspace, with per-key overrides. |
| Enterprise | Contractual | Contractual | Contractual | Negotiated. Hard ceilings live in the rate-limit dashboard. |
Response headers
Every API response includes the following headers. They are the source of truth — use them rather than hardcoding the table above.
headers
X-RateLimit-Limit: 600
X-RateLimit-Remaining: 427
X-RateLimit-Reset: 2026-05-19T15:00:00Z
X-RateLimit-Bucket: workspace:org_01H...:chat
Retry-After: 14 (only on 429)
Backoff
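One way to avoid a 429 in the first place is to read the rate-limit headers on every response and pause once the remaining budget reaches zero. The following is a minimal sketch, not an official client: `RateLimitState`, `HeaderReader`, `readRateLimit`, and `msUntilReset` are hypothetical names, and the code assumes a fetch-style headers object whose `get()` returns `null` for a missing header.

```typescript
// Hypothetical helper types; not part of any official Mnemos SDK.
interface RateLimitState {
  limit: number;
  remaining: number;
  resetAt: Date;
  bucket: string;
}

// Anything with a fetch-style get(), for example `response.headers`.
interface HeaderReader {
  get(name: string): string | null;
}

// Parse the X-RateLimit-* headers; returns null if any are missing or malformed.
function readRateLimit(headers: HeaderReader): RateLimitState | null {
  const limit = headers.get("x-ratelimit-limit");
  const remaining = headers.get("x-ratelimit-remaining");
  const reset = headers.get("x-ratelimit-reset");
  if (limit === null || remaining === null || reset === null) return null;

  const state: RateLimitState = {
    limit: Number(limit),
    remaining: Number(remaining),
    resetAt: new Date(reset), // ISO 8601 timestamp, as in the listing above
    bucket: headers.get("x-ratelimit-bucket") ?? "",
  };
  const valid =
    Number.isFinite(state.limit) &&
    Number.isFinite(state.remaining) &&
    !Number.isNaN(state.resetAt.getTime());
  return valid ? state : null;
}

// Milliseconds to wait before the next request: 0 while budget remains,
// otherwise the time left until the bucket resets.
function msUntilReset(state: RateLimitState, now: Date = new Date()): number {
  if (state.remaining > 0) return 0;
  return Math.max(0, state.resetAt.getTime() - now.getTime());
}
```

A caller could run `readRateLimit(response.headers)` after each request and sleep for `msUntilReset(state)` before sending the next one. Because limits are tracked per bucket, a real client would likely keep one state per `bucket` value rather than a single global one.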
On 429 rate_limited, honor Retry-After if present. Otherwise back off with full jitter: pick a random delay between 0 and an exponentially growing cap, starting at 1s and doubling on each failure, with a maximum of 60s. Do not retry the same request more than 5 times.
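To make the schedule concrete, the sketch below computes the cap for each retry and draws a full-jitter delay under it. The 1s start, doubling, and 60s maximum come from the paragraph above; the function names `backoffCapMs` and `fullJitterDelayMs`, and the injectable `rand` parameter, are illustrative choices only.

```typescript
// Upper bound, in ms, of the random delay before retry number `attempt` (0-based).
function backoffCapMs(attempt: number, baseMs = 1_000, maxMs = 60_000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Full jitter: a uniform random delay between 0 and the cap.
// `rand` defaults to Math.random and can be replaced for deterministic tests.
function fullJitterDelayMs(attempt: number, rand: () => number = Math.random): number {
  return rand() * backoffCapMs(attempt);
}

// Caps for the five permitted retries.
const caps = [0, 1, 2, 3, 4].map((a) => backoffCapMs(a));
console.log(caps); // [ 1000, 2000, 4000, 8000, 16000 ]
```

With at most five retries, the 60s ceiling is never reached; it would first apply on a seventh retry (2^6 = 64s, clamped to 60s). The worst-case total wait from jitter alone is 1 + 2 + 4 + 8 + 16 = 31s, not counting any longer delay requested through Retry-After.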
example
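async function withBackoff<T>(fn: () => Promise<T>, max = 5): Promise<T> {
  let attempt = 0; // number of retries already made
  for (;;) {
    try {
      return await fn();
    } catch (err: any) {
      // Give up on anything other than a 429, or once `max` retries are spent.
      if (err.status !== 429 || attempt >= max) throw err;
      // Retry-After is in seconds; a missing or non-numeric value falls through to jitter.
      const ra = Number(err.headers?.get("retry-after") ?? 0);
      // Full jitter: random delay below a cap that starts at 1s, doubles per attempt, and tops out at 60s.
      const cap = Math.min(60_000, 1_000 * 2 ** attempt);
      const delay = ra > 0 ? ra * 1000 : Math.random() * cap;
      await new Promise(r => setTimeout(r, delay));
      attempt++;
    }
  }
}
Need higher limits?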
Business customers can request key-scoped overrides for a single workload (for example, a one-time backfill) without amending the contract. Enterprise customers can negotiate permanent ceilings.