Why Code is the Wrong Layer for Rate Limiting

A practical take on keeping it out of your service code.


Rate limiting controls how many requests your system will accept within a certain period. You might be familiar with “burst limits” or “throttling”. It’s all about protecting your app from overload. There are plenty of ways to do it, including in your code via middleware, libraries, or frameworks. But just because you can handle rate limiting in code doesn’t mean you should.

The Appeal of Middleware-Based Rate Limiting

In many frameworks (ASP.NET Core is a good example), you can add a few lines of configuration in your Program or Startup file to enable built-in rate limiting. You define your policy:

  • PermitLimit (allowed requests),
  • Window (time period),
  • whether you queue or reject additional requests,
  • and which HTTP status code to send back when the limit is reached.

It’s convenient, and it keeps everything in one place: your codebase.

Why is this a bad thing?

  1. Operational Concerns Should Stay Out of Core Logic
    Rate limiting isn’t really part of your business logic; it’s about overall system stability. When that logic lives in your service code, it risks cluttering your application’s responsibilities. If your system experiences heavy load, you want the infrastructure to handle the meltdown gracefully. It’s easier to reason about rate limiting when it’s separate from your application code.
  2. Unified Observability
    Infrastructure-based rate limiting centralizes metrics and logs in one location. If each service has its own limiter, you’ll piece together scattered logs to understand how traffic flows and where bottlenecks form. A gateway or dedicated rate-limiting layer can give you a real-time snapshot of the entire system.
  3. Scalability and Reliability
    At higher loads, in-process or “code-located” rate limiting solutions tend to suffer. Yes, you can distribute counters or use shared state, but that adds complexity. A dedicated layer or external tool can scale horizontally and is already optimized for concurrency.
  4. Consistency Across Tech Stacks
    Odds are, .NET Core might not be the only platform in your environment. You might also run Node services, Java apps, or Azure Functions. When you enforce rate limits at the infrastructure level, everything is governed by the same set of rules. That’s a huge plus for consistency.

Centralized Control Matters

If you manage rate limiting across multiple microservices or containers, doing it at the code level means repeating the same logic in every service. Changing or updating a rule then requires a new deployment of each one. That’s fragile and labor-intensive.

Putting rate limits at the infrastructure level (e.g., API gateways, load balancers, or service meshes) allows you to update a single configuration and instantly apply it everywhere.
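
To make that concrete, here’s a minimal sketch of what a per-client limit can look like when it lives in an NGINX reverse proxy instead of your service code. The zone name, rate, and upstream are illustrative, not prescriptive:

# Sketch of proxy-level rate limiting in nginx.conf; zone name, rate, and upstream are illustrative.
http {
    # Track clients by IP: 10 requests per second, counters kept in a 10 MB shared-memory zone.
    limit_req_zone $binary_remote_addr zone=per_client:10m rate=10r/s;

    server {
        location /api/ {
            # Allow short bursts of up to 20 extra requests; reject the rest with 429.
            limit_req zone=per_client burst=20 nodelay;
            limit_req_status 429;
            proxy_pass http://backend_services;
        }
    }
}

Every service behind the proxy is covered by the same rule, and changing the limit is a config reload, not a redeploy.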

Example: Fixed Window in ASP.NET Core

Here’s an example of how you might enable a fixed-window rate limiter in ASP.NET Core:


using Microsoft.AspNetCore.RateLimiting;

services.AddRateLimiter(options =>
{
    options.AddFixedWindowLimiter("FixedPolicy", policy =>
    {
        policy.PermitLimit = 10;                     // requests allowed per window
        policy.Window = TimeSpan.FromSeconds(60);    // length of the fixed window
        policy.QueueLimit = 2;                       // requests to queue once the limit is hit
    });

    // Status code returned when a request is rejected
    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;
});

app.UseRateLimiter();

// The named policy only applies where you attach it, e.g.:
// app.MapGet("/orders", handler).RequireRateLimiting("FixedPolicy");

This works fine for a single instance, but the counters live in each process’s memory. Once you run multiple instances or a cluster of services, you need a shared backplane or a more sophisticated distributed approach to keep them coordinated. And that’s exactly why external solutions often make more sense.
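
To make that coordination cost concrete, here is a minimal sketch of a shared fixed-window counter backed by Redis, assuming the StackExchange.Redis client; the key scheme and limits are illustrative. Even this simple version has a race between the increment and the expiry call, which is exactly the kind of detail you end up owning once rate limiting lives in your code:

// Sketch of a shared fixed-window counter on Redis; assumes StackExchange.Redis.
// The key scheme and limits are illustrative only.
using System;
using System.Threading.Tasks;
using StackExchange.Redis;

public sealed class SharedFixedWindowLimiter
{
    private readonly IDatabase _db;
    private readonly int _permitLimit;
    private readonly TimeSpan _window;

    public SharedFixedWindowLimiter(IConnectionMultiplexer redis, int permitLimit, TimeSpan window)
    {
        _db = redis.GetDatabase();
        _permitLimit = permitLimit;
        _window = window;
    }

    public async Task<bool> TryAcquireAsync(string clientId)
    {
        // One counter per client per window, shared by every service instance.
        var bucket = DateTimeOffset.UtcNow.ToUnixTimeSeconds() / (long)_window.TotalSeconds;
        var key = $"ratelimit:{clientId}:{bucket}";

        var count = await _db.StringIncrementAsync(key);
        if (count == 1)
        {
            // First hit in this window: set the expiry so stale counters disappear.
            // Note: INCR and EXPIRE are separate calls, so a crash in between leaks a key.
            await _db.KeyExpireAsync(key, _window);
        }

        return count <= _permitLimit;
    }
}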

When Code-Based Rate Limiting Might Be Okay

  • Small-Scale Internal Tools: If you run a simple app or internal tool with limited traffic, a quick in-code limiter might be enough.
  • Prototyping/Demos: If you’re just whipping up a proof of concept, basic in-code protection against bursts is all you need.
  • Local Testing: When you want to observe how your service reacts under load without setting up external rate limiting.

Just understand that you’ll likely outgrow it as soon as your service evolves beyond trivial environments.

The Right Way: Keep It Outside

My recommendation is to let a single external mechanism, such as an API gateway, a reverse proxy (e.g. Traefik, NGINX, Kong), or your orchestration layer, handle all traffic and apply rate limits consistently. This means:

  1. One Central Configuration: Update your rate-limiting rules in a single file or config store.
  2. Unified Monitoring: Log and visualize everything from a central dashboard.
  3. Better Scalability: Infrastructure solutions are built for concurrency.
  4. Less Code Bloat: Keep your service focused on business logic.

Final Thoughts

Yes, you can do rate limiting in code, and yes, frameworks and libraries make it pretty easy. But that doesn’t change the fundamental mismatch: rate limiting is an operational concern, and code-based solutions create extra overhead when you really need a stable, scalable, and easily maintainable approach.

If you’re small-scale or testing, go for a built-in solution. But when your application is heading toward critical usage, it’s better to have your rate-limiting rules live outside the code. You’ll avoid sleepless nights fighting distributed state, repetitive logic, and app-level logs that only tell half the story. Choose a gateway or orchestrator-level rate limiter and let your codebase do what it does best: deliver your business logic without getting bogged down in operational concerns.

Cheers!
