Strategic Protection: Mitigating DoS/DDoS Attacks and Balancing User Blocking
Co-author:
Pankaj Pandey (Senior Technical Architect / Director of Engineering, SCM Tech @ Tata 1mg)
Introduction
At Tata 1mg, our constant endeavor is to ensure seamless user access to our platforms, while simultaneously safeguarding user data against all forms of security threats and fraudulent activities.
Organizations are currently grappling with the challenges posed by a variety of attacks that are directed toward disrupting their services or gaining unauthorized access to their systems.
Problem Statement
The challenge is to effectively counteract DoS/DDoS attacks and API abuse without solely relying on expensive third-party solutions. As organizations scale, these solutions become less feasible due to their generalized nature, associated costs, and lack of customization. Additionally, protection against gradual API abuses requires meticulous attention to make sure legitimate users are not blocked from accessing our services.
Read on for our approach to the following problems:
- Safeguard our systems against DoS/DDoS attacks
- Prevent API abuse even at a very slow rate and balance user blocking
The Rationale
We built a customized plan by learning from solutions running in production and by evaluating offerings from Cloudflare, AWS, and others. The plan lets us use our existing infrastructure effectively, saves cost, and allows more customization than typical third-party solutions.
Solution Overview
Disclaimer: As we delve into custom solutions supported by illustrative code snippets and diagrams, the article’s length might appear substantial. However, comprehensive detailing is crucial to outline the end-to-end solution, making sure it is useful for those who want a complete understanding.
We call it a four-layered approach that combines various defenses to tackle DoS/DDoS attacks and API abuse.
Layer 1: IP-Based Restrictions
Implement IP-based restrictions to limit the number of requests from the same source. If the threshold is exceeded, the source is blocked for a short duration. The Web Application Firewall (WAF) layer is the best fit for this, so we have added IP-based restrictions on Cloudflare (our WAF).
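To make the Layer 1 behavior concrete, here is a minimal in-process sketch of "count requests per IP, block for a short duration once the threshold is exceeded". It is not how Cloudflare implements its rules; the class name `IpBlocker`, its parameters, and the sliding-window choice are all assumptions made for illustration.

```python
import time
from collections import defaultdict, deque


class IpBlocker:
    """Hypothetical helper: blocks an IP for `block_seconds` once it sends
    more than `limit` requests within `window` seconds."""

    def __init__(self, limit, window, block_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.block_seconds = block_seconds
        self.clock = clock                 # injectable so the sketch is testable
        self.hits = defaultdict(deque)     # ip -> timestamps of recent requests
        self.blocked_until = {}            # ip -> time at which the block lifts

    def allow(self, ip):
        now = self.clock()
        until = self.blocked_until.get(ip)
        if until is not None and now < until:
            return False                   # still inside the block period
        recent = self.hits[ip]
        # Drop timestamps that have fallen out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        recent.append(now)
        if len(recent) > self.limit:
            self.blocked_until[ip] = now + self.block_seconds
            recent.clear()
            return False
        return True
```

In practice the same three knobs (threshold, counting window, block duration) are what a WAF rate-limiting rule asks for; keeping this at the edge means blocked traffic never reaches the application servers.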
Layer 2: API-Based Rate Limiting
Use the API gateway in front of our microservices to enforce API-based rate limiting. Each API is configured with rate limits based on request headers and methods. Requests exceeding these limits receive a 429 (Too Many Requests) status code.
Layer 3: Parameter-Based Rate Limiting
Implement rate limiting on request parameters such as user ID and tokens. This layer gives fine-grained control: a limit can be keyed on anything available in the HTTP request headers or body, and eval expressions are supported for fetching identifiers from deep inside the request body. Developers define these limits at the aggregator layer. Layer 3 blocks repeated requests with the same parameter values once they exceed the configured count within a specified time frame.
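As an illustration of how an identifier expression might be turned into the value a limit is keyed on, here is a small standalone sketch. The function name `resolve_identifier` and the sample request are hypothetical, and running `eval` with an emptied builtins namespace is a choice made for this sketch rather than something the production module is known to do.

```python
def resolve_identifier(identifier, headers, body):
    """Hypothetical helper: return the value a rule is keyed on, or None if absent."""
    source, _, rest = identifier.partition(".")
    if source == "headers":
        return headers.get(rest)
    if source == "body":
        value = body
        for key in rest.split("."):
            if not isinstance(value, dict):
                return None            # path goes deeper than the body does
            value = value.get(key)
        return None if value is None else str(value)
    if source == "eval":
        try:
            # The expression comes from trusted service config and can see
            # only `request_body`. This is NOT a sandbox for untrusted input.
            return eval(rest, {"__builtins__": {}}, {"request_body": body})
        except Exception:
            return None                # a missing key or index means "no identifier"
    raise ValueError(f"unsupported identifier: {identifier}")


# Sample request data, invented for this example.
headers = {"x-authorization": "token-abc"}
body = {"user": {"id": 42}, "items": [{"sku": "A1"}]}

print(resolve_identifier("headers.x-authorization", headers, body))
print(resolve_identifier("body.user.id", headers, body))
print(resolve_identifier("eval.request_body['items'][0]['sku']", headers, body))
```

The dotted form covers plain nested objects; the eval form is what reaches into lists or computed values, which is why the expression must stay under developer control in the service config.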
Layer 4: Bot Management
Leverage cloud services such as Cloudflare and AWS to manage bots. This provides an effective and cost-efficient solution to address this complex aspect of security.
Custom API Rate Limiting for Layers 2 & 3
Layers 2 and 3 are implemented as a custom API rate-limiting solution. We believe the best place to apply it is where requests first enter our systems and begin consuming resources. Since we don’t want any illegitimate request to flow beyond our API gateway, it makes sense to place the Layer 2 and Layer 3 protections there.
We have written a configurable rate-limiting module, implemented as a Python decorator, that can be attached to an API just by defining a service config that looks like this:
"SEND_OTP_NUMBER": [
    {
        "identifier": "body.number",
        "duration": 60,
        "limit": 3,
        "message": "Max limit reached. Please try after sometime."
    },
    {
        "identifier": "body.number",
        "duration": 86400,
        "limit": 20,
        "message": "Max limit reached for the day. Please try after 24 hours."
    }
]
The configuration above places two rules on the SEND_OTP_NUMBER action (the API identifier, passed as “rl_config_key” in the decorator code below), keyed on the mobile number in the request body: at most 3 calls in 60 seconds and at most 20 calls in one day. The daily rule matters because a caller who stays within the per-minute limit could otherwise make up to 3 × 1,440 = 4,320 OTP requests a day. The configuration can be extended to chain any number of rules on request body or header values and durations to build an effective blocking strategy.
Python decorator code:
The code is worth reading closely, because it carries the low-level implementation details.
import asyncio
import inspect
from functools import wraps

# CONFIG, logger, ApiRateLimit, BadRequestException and
# get_api_rate_limit_exceeded_error_message are project-internal names,
# assumed to be imported from the service's own modules.


def rate_limit_api(rl_config_key: str):
    """
    This decorator is used to rate limit an API call.
    Use this to put a limit on the number of times an API can be called by a particular
    caller (a user or a resource) in a specified duration.
    :param rl_config_key: config key name, the decorator will look for the config details in the
    config['RATE_LIMIT'] with this key. It must contain a list of limiting rates where each entry contains below
    details (these limits are applied with AND condition)
        * identifier - values can be like headers.<value> or params.<value> or body.<value> or
          body.<value>.<value1> OR eval.<eval_expression> | eval.<eval_expression> must be used for APIs with body
          and must refer to request body as "request_body"
        * duration - duration in seconds
        * limit - allowed number of calls in the specified duration
        * message - Error message to be raised if the limit exceeds.
        * publish_breach - (optional) publish blocked requests for further analysis when true
    """
    # Precondition: the config must exist and every rule needs identifier, limit and duration
    rl_config_details = CONFIG.config['RATE_LIMIT'].get(rl_config_key) or list()
    if not rl_config_details or not all(
        rl_config_detail.get('identifier')
        and rl_config_detail.get('limit') is not None
        and rl_config_detail.get('duration')
        for rl_config_detail in rl_config_details
    ):
        raise Exception('Config details not found / or incorrect for rate_limit_api config - {}'.format(rl_config_key))

    def _validate_config(rl_config_detail):
        identifier_str = rl_config_detail.get('identifier')
        identifier_path = identifier_str.split('.')
        if identifier_path[0] not in ["headers", "params", "body", "eval"]:
            raise Exception("Invalid identifier passed to the rate_limit_api")
        if identifier_path[0] in ["headers", "params"] and len(identifier_path) > 2:
            raise Exception("Invalid identifier depth in rate_limit_api")

    # Validate once, when the decorator is applied, not on every request
    for rl_config_detail in rl_config_details:
        _validate_config(rl_config_detail)

    def decorated(func):
        @wraps(func)
        async def wrapper(self, *args, **kwargs):
            async def _get_identifier_value(rl_config_detail):
                request_obj = args[0]
                identifier_str = rl_config_detail['identifier']
                identifier_path = identifier_str.split('.')
                if identifier_path[0] == 'headers':
                    return request_obj.headers.get(identifier_path[1])
                elif identifier_path[0] == 'params':
                    return request_obj.request_params().get(identifier_path[1])
                elif identifier_path[0] == 'body':
                    request_body = await request_obj.custom_json()
                    value = request_body or dict()
                    for key in identifier_path[1:]:
                        # Stop if the path goes deeper than the body does
                        if not isinstance(value, dict):
                            return None
                        value = value.get(key)
                    return str(value) if value else None
                elif identifier_path[0] == 'eval':
                    request_body = await request_obj.custom_json()
                    # Never delete the request_body declaration below. It is used in the
                    # eval statement in the try block
                    request_body = request_body or dict()
                    try:
                        # Strip only the leading "eval." prefix, not later occurrences
                        return eval(identifier_str[len('eval.'):])
                    except Exception:
                        return None

            # Collect the rules whose identifier is present in this request
            rules_data = list()
            for rl_config_detail in rl_config_details:
                identifier_value = await _get_identifier_value(rl_config_detail)
                if identifier_value:
                    duration = rl_config_detail['duration']
                    max_limit = rl_config_detail['limit']
                    message = rl_config_detail.get('message') or 'Limit exceeded'
                    publish_breach = rl_config_detail.get('publish_breach') is True
                    rules_data.append((identifier_value, duration, max_limit, message, publish_breach))

            # Reject the request if any rule is already breached
            for rule in rules_data:
                if await ApiRateLimit.check_and_publish_limit_breach(rl_config_key, rule):
                    logger.info('HTTP_429 - {}'.format(rl_config_key))
                    raise BadRequestException(get_api_rate_limit_exceeded_error_message(rule[3]))

            try:
                if inspect.iscoroutinefunction(func):
                    return await func(self, *args, **kwargs)
                else:
                    return func(self, *args, **kwargs)
            finally:
                # Count the call against every rule, whether or not the handler succeeded
                for rule in rules_data:
                    asyncio.create_task(ApiRateLimit.update_rate_limit(rule[0], rl_config_key, rule[1], increment_by=1))

        return wrapper

    return decorated
The rate limiting module is capable of parsing the HTTP request body/headers to get the identifier value from the expression mentioned in the config. It creates a unique identifier for a request (against each rule) by combining the action (SEND_OTP_NUMBER), identifier, and duration and stores the count of requests in Redis against the unique identifier generated here. When the count breaches the threshold mentioned in the config, the request is blocked from proceeding further.
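The counting scheme described above can be sketched as follows. This is an illustration under stated assumptions, not the production `ApiRateLimit` class: the key format, the helper names (`InMemoryCounterStore`, `build_key`, `is_limited`, `record_call`), and the fixed-window behavior are all hypothetical, and an in-memory store stands in for Redis so the example runs without a server.

```python
import time


class InMemoryCounterStore:
    """Stand-in for Redis: a counter per key that expires a fixed time after creation."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.data = {}  # key -> (count, expires_at)

    def get(self, key):
        count, expires_at = self.data.get(key, (0, 0.0))
        if count and self.clock() >= expires_at:
            del self.data[key]          # window elapsed, counter resets
            return 0
        return count

    def incr_with_ttl(self, key, ttl):
        count = self.get(key)
        # The expiry is set only when the key is created, giving a fixed window.
        expires_at = self.data[key][1] if count else self.clock() + ttl
        self.data[key] = (count + 1, expires_at)
        return count + 1


def build_key(action, identifier_value, duration):
    # Assumed key layout: one counter per (action, identifier value, rule duration).
    return f"rate_limit:{action}:{identifier_value}:{duration}"


def is_limited(store, action, identifier_value, rules):
    """Return the message of the first breached rule, or None if the call may proceed."""
    for rule in rules:
        key = build_key(action, identifier_value, rule["duration"])
        if store.get(key) >= rule["limit"]:
            return rule["message"]
    return None


def record_call(store, action, identifier_value, rules):
    """Count one call against every rule, mirroring the decorator's finally block."""
    for rule in rules:
        key = build_key(action, identifier_value, rule["duration"])
        store.incr_with_ttl(key, rule["duration"])
```

Because the duration is part of the key, the 60-second and 24-hour rules for the same mobile number keep separate counters that expire independently. With a real Redis backend the same pattern is commonly built on the `INCR` and `EXPIRE` commands.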
Protection against API abuse at a slow rate
Addressing slow-rate abuse calls for aggressive limits, which must be balanced against the risk of blocking legitimate users. We have extended the Layer 2 and Layer 3 solution described above with a mechanism that reviews blocked users and takes corrective action. It uses a pub-sub model (Redis channels): blocked requests are published and analyzed by an independent microservice named ‘Forfend Service’, which decides whether to unblock users, extend blocking, or take other actions based on predefined business logic.
The diagram below captures the idea.
Not all rate-limiting configurations require dedicated analysis. Hence, we have provisioned a key in our rate-limiting configuration to decide whether the blocked requests should be published for further analysis or not.
"APPLY_COUPON": [
    {
        "duration": 300,
        "identifier": "headers.x-authorization",
        "limit": 10,
        "message": "Maximum limit reached. Please try after sometime.",
        "publish_breach": true
    }
]
If ‘publish_breach’ is true, any request blocked by this rate-limiting rule is published for analysis by the Forfend Service.
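The publish-and-analyze flow could look roughly like the sketch below. Everything here is hypothetical: a `deque` stands in for the Redis channel, the event fields are invented, and the decision policy in `decide` (unblock trusted callers, extend the block after repeated breaches) is only a placeholder for whatever business logic the Forfend Service actually applies.

```python
import json
from collections import Counter, deque

channel = deque()  # stand-in for a Redis pub/sub channel


def publish_breach_event(action, identifier_value, rule):
    """Publish a blocked request only when the rule opts in via publish_breach."""
    if rule.get("publish_breach") is not True:
        return False
    channel.append(json.dumps({
        "action": action,
        "identifier": identifier_value,
        "duration": rule["duration"],
        "limit": rule["limit"],
    }))
    return True


def decide(event, breach_counts, trusted_identifiers, extend_after=3):
    """Placeholder policy: unblock trusted callers, extend blocks for repeat offenders."""
    ident = event["identifier"]
    breach_counts[ident] += 1
    if ident in trusted_identifiers:
        return "unblock"
    if breach_counts[ident] >= extend_after:
        return "extend_block"
    return "keep_block"


def drain(trusted_identifiers):
    """Consume every pending event and return (identifier, decision) pairs."""
    breach_counts = Counter()
    decisions = []
    while channel:
        event = json.loads(channel.popleft())
        decision = decide(event, breach_counts, trusted_identifiers)
        decisions.append((event["identifier"], decision))
    return decisions
```

Keeping the analysis in a separate consumer means the request path only pays for a publish, while the slower judgment about who is a genuine user happens out of band.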
Conclusion
Combining real-time mitigation, layered defenses, and data-driven analysis, this multi-layered approach stands as a robust strategy against DoS/DDoS attacks and API abuse. By harnessing existing infrastructure and intelligent analysis, organizations can strengthen their security posture without depending solely on third-party solutions.