How Rate Limiting Works
This page covers the mechanics behind Zuplo's rate limiter: how requests are
counted, what each rateLimitBy mode does in detail, and every configuration
option available. If you just want to add a rate limit to your API, start with
the Getting Started guide instead — this page is the
deep dive you can read alongside or after it.
Zuplo's rate limiter uses a sliding window algorithm enforced globally across all edge locations. Unlike a fixed window algorithm (which resets counters at fixed intervals and can allow bursts at window boundaries), the sliding window continuously tracks requests over a rolling time period. This produces smoother, more predictable throttling behavior.
Key terms
A few terms show up repeatedly in the rate limiting docs. They are related but not interchangeable.
- Counter (or bucket) — The running tally Zuplo keeps for a single caller and a single policy. Each unique combination of policy name and caller identifier gets its own counter. Two different policies tracking the same caller do not share a counter; two different callers under the same policy do not share a counter either.
- Rate limit key — The string value that identifies a caller for bucketing. For rateLimitBy: "ip" the key is the client's IP address; for "user" it is request.user.sub; for "function" it is whatever your custom function returns as CustomRateLimitDetails.key; for "all" there is a single implicit key shared by every request to the route.
- identifier option — A field in the policy's configuration that points Zuplo at your custom TypeScript function when rateLimitBy is "function". Zuplo calls that function on each request, and the function returns a CustomRateLimitDetails object whose key property becomes the rate limit key. In short: identifier is where the function lives; key is what the function returns.
Rate limiting policies
Zuplo provides two rate limiting policies, each suited to different levels of complexity.
Rate Limiting policy
The Rate Limiting policy enforces a single request counter per time window. Configure a maximum number of requests, a time window, and how to identify callers.
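A minimal configuration sketch for policies.json (the policy name and limit values are placeholders; see the Rate Limiting policy reference for the full option list):

```json
{
  "name": "my-rate-limit-policy",
  "policyType": "rate-limit-inbound",
  "handler": {
    "export": "RateLimitInboundPolicy",
    "module": "$import(@zuplo/runtime)",
    "options": {
      "rateLimitBy": "user",
      "requestsAllowed": 100,
      "timeWindowMinutes": 1
    }
  }
}
```

This configuration allows each authenticated consumer 100 requests per sliding one-minute window.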
Use this policy when you need a straightforward "X requests per Y minutes" limit.
Complex Rate Limiting policy
The Complex Rate Limiting policy supports multiple named counters in a single policy. Each counter tracks a different resource or unit of work.
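A configuration sketch, assuming named counters are declared as a list (the exact option shape is documented in the Complex Rate Limiting policy reference; the counter names and numbers here are illustrative):

```json
{
  "name": "my-complex-rate-limit-policy",
  "policyType": "complex-rate-limit-inbound",
  "handler": {
    "export": "ComplexRateLimitInboundPolicy",
    "module": "$import(@zuplo/runtime)",
    "options": {
      "rateLimitBy": "user",
      "limits": [
        { "key": "requests", "requestsAllowed": 1000, "timeWindowMinutes": 1 },
        { "key": "computeUnits", "requestsAllowed": 5000, "timeWindowMinutes": 1 }
      ]
    }
  }
}
```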
You can override counter increments programmatically per request using
ComplexRateLimitInboundPolicy.setIncrements(). This is useful for usage-based
pricing where different endpoints consume different amounts of a resource (for
example, counting compute units or tokens instead of raw requests).
Choosing a policy
| Scenario | Policy |
|---|---|
| Fixed requests-per-minute limit for all callers | Rate Limiting |
| Different limits per customer tier (free vs. paid) | Rate Limiting with a custom function |
| Counting multiple resources (requests + compute units) | Complex Rate Limiting (enterprise) |
| Usage-based billing with variable cost per request | Complex Rate Limiting with dynamic increments (enterprise) |
How rateLimitBy works
The rateLimitBy option determines how the rate limiter groups requests into
buckets. Both policies support the same four modes.
ip
Groups requests by the client's IP address. No authentication is required. This is the simplest option and works well for public APIs or as a first layer of protection.
Be aware that multiple clients behind the same corporate proxy, cloud NAT, or
shared Wi-Fi network can share a single IP address. In these cases, IP-based
rate limiting can unfairly throttle unrelated users. For authenticated APIs,
prefer rateLimitBy: "user" instead.
user
Groups requests by the authenticated user's identity (request.user.sub). When
using API key authentication, the
sub value is the consumer name you assigned when creating the API key. When
using JWT authentication, it comes from the token's sub claim.
This is the recommended mode for authenticated APIs because it ties limits to the actual consumer rather than a shared IP address.
The user mode requires an authentication policy (such as API Key
Authentication or JWT authentication) earlier in the policy pipeline. If no
authenticated user is present on the request, the policy returns an error.
function
Groups requests using a custom TypeScript function that you provide. The
function returns a CustomRateLimitDetails object containing a grouping key
and, optionally, overridden values for requestsAllowed and
timeWindowMinutes.
This mode enables dynamic rate limiting where limits vary based on customer tier, route parameters, or any other request property.
all
Applies a single shared counter across all requests to the route, regardless of who makes them. Use this for global rate limits on endpoints that call resource-constrained backends.
Dynamic rate limiting with custom functions
When rateLimitBy is set to "function", you provide a TypeScript module that
determines the rate limit at request time.
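Sketched with local stand-in types so the example is self-contained (in a real Zuplo module, ZuploRequest, ZuploContext, and CustomRateLimitDetails are imported from @zuplo/runtime), the function looks roughly like this:

```typescript
// Stand-in types; in a Zuplo module these come from "@zuplo/runtime".
interface ZuploRequest {
  user?: { sub: string };
}
interface ZuploContext {
  requestId: string;
}
interface CustomRateLimitDetails {
  key: string;                // rate limit bucketing key
  requestsAllowed?: number;   // optional override of the policy default
  timeWindowMinutes?: number; // optional override of the policy default
}

export function rateLimitKey(
  request: ZuploRequest,
  context: ZuploContext,
  policyName: string,
): CustomRateLimitDetails | undefined {
  // Bucket authenticated callers by their identity; put unauthenticated
  // traffic into a single shared bucket.
  if (request.user) {
    return { key: request.user.sub };
  }
  return { key: "anonymous" };
}
```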
The CustomRateLimitDetails object has the following properties:
- key — The string used to group requests into rate limit buckets
- requestsAllowed (optional) — Overrides the policy's requestsAllowed value
- timeWindowMinutes (optional) — Overrides the policy's timeWindowMinutes value
Returning undefined skips rate limiting for that request entirely.
The function can also be async if you need to look up limits from a database
or external service. See
Per-user rate limiting using a database
for a complete example using the ZoneCache for performance.
Wire the function into the policy configuration using the identifier option.
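A wiring sketch for policies.json, assuming the custom function is exported as rateLimitKey from a module at ./modules/rate-limiter (both names are placeholders):

```json
{
  "name": "tiered-rate-limit",
  "policyType": "rate-limit-inbound",
  "handler": {
    "export": "RateLimitInboundPolicy",
    "module": "$import(@zuplo/runtime)",
    "options": {
      "rateLimitBy": "function",
      "requestsAllowed": 100,
      "timeWindowMinutes": 1,
      "identifier": {
        "export": "rateLimitKey",
        "module": "$import(./modules/rate-limiter)"
      }
    }
  }
}
```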
The requestsAllowed and timeWindowMinutes values in the policy configuration
serve as defaults. The custom function can override them per request.
Combining rate limiting with authentication
Rate limiting works best when combined with authentication so that limits apply per consumer rather than per IP. A typical policy pipeline is:
1. Authentication (e.g., API Key Authentication) -- validates credentials and populates request.user
2. Rate Limiting with rateLimitBy: "user" -- enforces per-consumer limits using request.user.sub
With API key authentication, the consumer's metadata (stored when creating the
key) is available at request.user.data. A custom rate limit function can read
fields like customerType or plan from the metadata to apply tiered limits.
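As an illustrative sketch (local stand-in types; the tier names and limits are made up), a function that applies tiered limits from key metadata might look like:

```typescript
// Stand-in types; in a Zuplo module the user object comes from the
// authenticated request populated by the API key policy.
interface ZuploUser {
  sub: string;
  data?: { customerType?: string };
}
interface CustomRateLimitDetails {
  key: string;
  requestsAllowed?: number;
  timeWindowMinutes?: number;
}

// Map a hypothetical customerType metadata field to a per-minute limit,
// falling back to the "free" tier when metadata is absent or unknown.
export function tieredRateLimitKey(user: ZuploUser): CustomRateLimitDetails {
  const limits: Record<string, number> = { free: 60, paid: 600 };
  return {
    key: user.sub,
    requestsAllowed: limits[user.data?.customerType ?? "free"] ?? 60,
    timeWindowMinutes: 1,
  };
}
```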
Rate limiting and monetization
If you use Zuplo's Monetization feature, the monetization policy handles quota enforcement based on subscription plans. You can still add a rate limiting policy after the monetization policy to provide per-second or per-minute spike protection on top of monthly billing quotas. These serve different purposes:
- Monetization quotas enforce monthly or billing-period usage limits tied to a subscription plan
- Rate limiting protects against short-duration traffic spikes that could overwhelm your backend
Combining multiple rate limit policies
You can apply multiple rate limiting policies to the same route. For example, you might enforce both a per-minute and a per-hour limit. When using multiple policies, apply the longest time window first, followed by shorter durations. This ordering ensures that the broadest limit is checked first — if a caller has exhausted their hourly quota, the request is rejected immediately without incrementing the shorter-duration counter.
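Assuming two policies named hourly-rate-limit and per-minute-rate-limit are defined in policies.json, the route configuration would list them longest window first. The snippet below sketches that ordering in a route's inbound pipeline:

```json
{
  "x-zuplo-route": {
    "policies": {
      "inbound": [
        "hourly-rate-limit",
        "per-minute-rate-limit"
      ]
    }
  }
}
```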
Additional options
Both rate limiting policies support the following additional options:
- headerMode — Set to "retry-after" (default) to include the retry-after header in 429 responses, or "none" to omit it. The retry-after value is returned as a number of seconds (delay-seconds format).
- mode — Set to "strict" (default) or "async". In strict mode, the request is held until the rate limit check completes — the backend is never called if the limit is exceeded. This adds some latency to every request because the check hits a globally distributed rate limit service. In async mode, the request proceeds to the backend in parallel with the rate limit check. This minimizes added latency but means some requests may get through even after the limit is exceeded. Async mode is a good fit when low latency matters more than exact enforcement.
- throwOnFailure — Controls behavior when the rate limit service is unreachable. When set to false (default), requests are allowed through (fail-open). When set to true, the policy returns an error to the client. The fail-open default prevents a rate limit service outage from blocking all traffic to your API.
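Put together, an options block exercising these settings might look like the following sketch (values are illustrative):

```json
{
  "options": {
    "rateLimitBy": "user",
    "requestsAllowed": 100,
    "timeWindowMinutes": 1,
    "headerMode": "retry-after",
    "mode": "async",
    "throwOnFailure": false
  }
}
```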
Related resources
Go deeper on configuration:
- Rate Limiting policy reference — Every option for the standard policy.
- Complex Rate Limiting policy reference — Multi-counter limits for usage-based pricing (enterprise).
Learn by example:
- Dynamic Rate Limiting tutorial — Tiered limits by customer type.
- Per-user rate limiting with a database — Look up limits at request time using ZoneCache and a database.
Combine with other policies:
- Quota policy — Monthly or billing-period usage caps.
- Monetization policy — Subscription-based access control and metering.