Your Code Should Do Security Checks at the Last Possible Moment


Sometimes it feels like there’s a tradeoff between security and simplicity, agility, performance, or usability. In many cases that feeling is false and stems from misinterpreted security guidelines. In reality, many security bugs are usability bugs in disguise; injection vulnerabilities are a prime example.

Why should you care about injection vulnerabilities?

Current development frameworks make building real-world applications easier, while also taking care of many security issues for the developer. Some feel that thanks to the adoption of such frameworks, security bugs in applications will soon be a thing of the past. Unfortunately, this is not true: security researchers keep discovering vulnerabilities in the wild. Granted, they are gradually becoming harder to find and exploit, but the impact of a single critical vulnerability is still devastating.

Take, for example, the family of injection vulnerabilities, like the well-known SQL injection vulnerability. Why are there still SQL injection bugs in the wild, even in newer applications that were built well after mitigating tools such as ORMs were widely adopted by developers?

The answer is that in many real-world cases, developers opt out of ORMs and similar tools when hand-writing SQL makes more sense. Have a look at “Architecture Patterns with Python” by Harry Percival and Bob Gregory: in one of the later chapters, they make an excellent case for using raw SQL queries in some views. Even if you disagree with their reasoning, the fact is that experienced and novice developers alike keep using “dangerous” tools to handle real-world use cases when it makes sense.

This is true for many types of data-handling vulnerabilities. Cross-Site Scripting vulnerabilities are found in the wild, even though front-end frameworks like React and Angular exist. Many websites, including giant web applications like GitHub, don’t use any big front-end framework; instead they take the classic approach of server-side rendering, with JavaScript used to enhance the user experience where needed, which is a valid and effective approach.

Ultimately, if you write code, you must have a strategy for handling user-controlled data safely, because a framework won’t always protect you. This is especially true in complex, real-world use cases. And if you’re a security researcher, you should keep looking for these types of vulnerabilities.

Analysis of a simple example

Let’s take a simple web view: we want to let users search for music tracks by artist, and return an error message if nothing was found. A simplified, intentionally vulnerable implementation of this view in Django might look like this:
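What follows is a minimal sketch of such a view; the Track model, its fields, and the /search/<artist> URL pattern are assumptions made for illustration.

```python
# views.py -- a minimal, intentionally vulnerable sketch
from django.http import HttpResponse

from .models import Track  # hypothetical model with "name" and "artist_name" fields


def search(request, artist):
    tracks = Track.objects.filter(artist_name=artist)
    if not tracks:
        # user-controlled input mixed straight into the HTML response
        return HttpResponse(f"<p>No tracks found for artist: {artist}</p>")
    items = "".join(f"<li>{track.name}</li>" for track in tracks)
    return HttpResponse(f"<p>Tracks by {artist}:</p><ul>{items}</ul>")
```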

Here’s what the interface looks like:

/search/Django%20Reinhardt

But this view is vulnerable to Cross-Site Scripting — a malicious user could craft a special URL that would inject HTML and JavaScript code into the view:

/search/<img%20onerror="alert('Hacked!')"%20src=not-there>

This is possible because the implementation of the view mixes user-controlled input (the artist name) with code — the HTML response.

Don’t fix injection vulnerabilities with input sanitization

To fix this vulnerability, some might suggest whitelist input sanitization: it’s more secure to restrict user input and only allow the necessary characters! Some frameworks do this out of the box. There’s no reason to allow the user to search for <img onerror="alert('Hacked!')" src=not-there>. So in this example we might want to restrict the input to just letters and spaces:
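A sketch of that whitelist check, bolted onto the hypothetical view from above:

```python
# views.py -- the same sketch, with whitelist input sanitization added
from django.http import HttpResponse, HttpResponseBadRequest

from .models import Track


def search(request, artist):
    # allow only letters and spaces in the artist name
    if not all(c.isalpha() or c.isspace() for c in artist):
        return HttpResponseBadRequest("Invalid artist name")
    tracks = Track.objects.filter(artist_name=artist)
    if not tracks:
        return HttpResponse(f"<p>No tracks found for artist: {artist}</p>")
    items = "".join(f"<li>{track.name}</li>" for track in tracks)
    return HttpResponse(f"<p>Tracks by {artist}:</p><ul>{items}</ul>")
```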

The problem with this solution is that it hurts usability; there are many artists we can’t search for now:

/search/Crosby,%20Stills,%20Nash%20&%20Young

What about , and &? Should we allow them? They don’t feel too dangerous, but are they? Let’s say we do allow them. What about:

/search/Sinéad%20O'Connor

What about '? It can be dangerous sometimes, but right now we’re not sure if we can use it to exploit the website. Maybe we should talk to the product manager and ask them if we can store all artist names using the closest alphabetic representation? How would we even explain this to them and to the customers?

The problem with this solution is that it’s tied to implementation details that are difficult to explain to stakeholders and to keep track of internally. For example, imagine that you now want to put a link to the artist’s page, and the user input goes into an HTML attribute delimited with single quotes. The response looks like this:
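A hypothetical variation of the view’s “no results” branch, with the artist name now also landing inside a single-quoted href attribute:

```python
# views.py -- a hypothetical variation that links to the artist's page
from django.http import HttpResponse

from .models import Track


def search(request, artist):
    tracks = Track.objects.filter(artist_name=artist)
    if not tracks:
        # user input lands inside a single-quoted attribute, and in the text
        return HttpResponse(
            f"<p>No tracks found for <a href='/artists/{artist}'>{artist}</a></p>"
        )
    ...  # the rest of the view stays as before
```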

Now, ' really is dangerous, because we can use it to escape the attribute and open a new one:

/search/'%20onmouseover='alert(1)'%20x='

Does this mean we have to go back to the product manager and tell them we have to remove support for '? We can’t constrain the requirements like this.

Input sanitization is an important component of web applications and can prevent many types of attacks, but by itself it is not a reliable enough strategy for mitigating these kinds of security issues. The problem with input sanitization is twofold:

  • It’s derived from product requirements, not security requirements.
  • It happens too early — without the proper context of where the data is used. (We will get back to this point later).

If we want to reliably fix injection vulnerabilities, we have to find a fix that works even if there is no input sanitization at all. Over-reliance on input sanitization as the main defense layer in applications is one factor that contributes to security-related frustration.

Separation of code and data

With injection-type vulnerabilities, user-controlled input only becomes dangerous when it is put into code: OS commands, SQL, or, in the case of Cross-Site Scripting, HTML. This user-controlled data is not a Cross-Site Scripting threat when it passes from the browser to the server, from the server to the application router, or from the router to the view. The only dangerous action in our example is putting this data into HTML. A function that is dangerous to pass user data to is sometimes called a sink. In a security code review, when you look for injection vulnerabilities, you look for sinks and try to figure out whether attackers can control the data that goes into them. In our case, HttpResponse is a sink.

The real mitigation is to eliminate sinks, or to guard them by keeping the data separated from the code, that is, by escaping the data. Every major framework has a built-in escaping / encoding function, and most of them have a built-in mechanism that automatically escapes data when it is mixed into HTML. In the case of HTML escaping, characters that have special meaning in HTML are converted into a special representation:
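Specifically, Django’s escape filter applies the following conversions:

  • < is converted to &lt;
  • > is converted to &gt;
  • ' (single quote) is converted to &#x27;
  • " (double quote) is converted to &quot;
  • & is converted to &amp;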

Taken from the Django documentation: https://docs.djangoproject.com/en/4.0/ref/templates/builtins/#escape

HTML escaping is unique to HTML — SQL has different characters that have special meaning, and their escaping is different.
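As an aside, when part of an application does hand-write SQL, the same separation of code and data applies there. A hypothetical raw-SQL version of the track lookup might rely on query parameters, so the database driver keeps the artist value apart from the SQL statement (the table and column names here are assumptions):

```python
# a hypothetical raw-SQL lookup using query parameters
from django.db import connection


def track_names_by_artist(artist):
    with connection.cursor() as cursor:
        # the placeholder keeps the value separate from the SQL code,
        # so no hand-written SQL escaping is needed
        cursor.execute(
            "SELECT name FROM music_track WHERE artist_name = %s", [artist]
        )
        return [row[0] for row in cursor.fetchall()]
```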

If all the data that goes into the sink is escaped properly, then it is not vulnerable to injection attacks.

Do data escaping as late as possible

A good strategy for securing your code, which might be counterintuitive at first, is to escape the data as late as possible, in other words, as close to the sink as possible. Here is an example of how to fix the function from before:
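Continuing the earlier sketch, every piece of dynamic data is escaped right where it is mixed into the HTML:

```python
# views.py -- the same sketch, now escaping right next to the sink
import html

from django.http import HttpResponse

from .models import Track


def search(request, artist):
    tracks = Track.objects.filter(artist_name=artist)
    if not tracks:
        # the artist name is HTML-escaped on the same line that builds the response
        return HttpResponse(f"<p>No tracks found for artist: {html.escape(artist)}</p>")
    items = "".join(f"<li>{html.escape(track.name)}</li>" for track in tracks)
    return HttpResponse(f"<p>Tracks by {html.escape(artist)}:</p><ul>{items}</ul>")
```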

Notice that when sinks are protected as late as possible, everything you need in order to verify that a sink is secure is right in front of your eyes when you review it. In the case of HttpResponse in our search function, you don’t have to look at any other lines of code to determine that it is used securely. You can see, in the same line of code or very close to it, that all data is escaped before it goes into the sink, and that the type of escaping matches the type of sink: here, user data goes into HTML, so we use html.escape. Note that without the escaping, the search function itself technically becomes a sink, since it can be exploited if user data goes into it.

You might notice that we also escape the track names, even though they don’t come from user data in this function. Let’s think about that next.

The track name comes from the Track model, which is stored as a table in the database. It’s actually very difficult to know whether or not we trust the data in the database to be safe! The flow of the track name into this page probably starts at a content management view, goes into the database, and then passes through the artist search view into HttpResponse, which is currently the only known sink. So where should we escape the data? To be on the safe side, shouldn’t we make sure no dangerous data ever enters the system? Maybe we should escape it as early as possible, right in the content management view, before storing it in the database:
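A sketch of that “escape early” idea, in a hypothetical content management view (the form fields are assumptions):

```python
# content_management/views.py -- escaping long before any HTML sink (the anti-pattern)
import html

from django.http import HttpResponse

from .models import Track


def save_track(request):
    # the track name is HTML-escaped here, far away from where it will ever be rendered
    name = html.escape(request.POST["track_name"])
    Track.objects.create(name=name, artist_name=request.POST["artist_name"])
    return HttpResponse("Saved")
```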

This is wrong. What is actually dangerous is not the data but the sinks themselves. Escaping early actually makes the database, the artist search view, and HttpResponse all sinks! The more sinks in the application, the greater the chance that an attacker can exploit one of them.

During a code review, you will have to go through every component and line of code from HttpResponse back to the database itself, and make sure unescaped user-controlled data can’t reach any of these components. You also have to remember to keep doing this every time any of that code changes. It also couples components on opposite sides of the flow: the team that maintains the content management system has to know that the track name is only ever mixed into HTML on the other side.

But then what if the application team wants to change that and send the data via an API to a React application, or to a non-web client? They will have to un-escape the track name at their end… and the track name could go into an entirely different sink — it could be written to another database, a log file, etc.

Escaping data very early might feel like “good security”, but it isn’t, and it makes maintaining the application more complicated than it needs to be. This is another example of how a misunderstood security requirement can become an unnecessary burden.

The correct place to escape is as late as possible, in context: inside the artist search view, near the HttpResponse call, exactly as in the fixed search function shown earlier.

Frameworks won’t always save you

Sometimes you have to work outside the framework to implement a unique use case. When you do that, it’s crucial to make sure you handle the data safely.

Let’s use Django templates, which handle HTML escaping automatically:
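The search view from before, now delegating HTML generation to a template (the template name is an assumption for this sketch):

```python
# views.py -- the same view, rendering a Django template instead of hand-built HTML
from django.shortcuts import render

from .models import Track


def search(request, artist):
    tracks = Track.objects.filter(artist_name=artist)
    # template auto-escaping is on by default, so {{ artist }} and {{ track.name }} are HTML-escaped
    return render(request, "results.html", {"artist": artist, "tracks": tracks})
```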

Now let’s assume we want to add a refresh button, and we want to do it in JavaScript to make refreshes look smooth. We don’t use any frontend frameworks right now, but it’s a simple use case, so we will just do it by hand:
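Since the original template is not reproduced here, the following is a hypothetical reconstruction of results.html for this walkthrough; the element IDs and the partial-refresh request are assumptions:

```html
{# results.html -- a hypothetical sketch of the template and its hand-written refresh code #}
<p>Search results for {{ artist }}:</p>
<div id="track-list">
  {% for track in tracks %}<p>{{ track.name }}</p>{% endfor %}
</div>
<p id="refresh-message"></p>
<button id="refresh-button">Refresh</button>

<script>
  const trackList = document.getElementById("track-list");
  const refreshMessage = document.getElementById("refresh-message");

  document.getElementById("refresh-button").addEventListener("click", async () => {
    // user-controlled data lands inside a JavaScript string literal here
    refreshMessage.innerHTML = "Refreshing results for {{ artist }}...";

    // re-fetch the server-rendered (template-escaped) list and swap it in as-is
    const response = await fetch(window.location.pathname + "?partial=1");
    trackList.innerHTML = await response.text();

    // static text only
    refreshMessage.innerHTML = "Up to date.";
  });
</script>
```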

Let’s do a quick security code review of that code — we used innerHTML a few times. innerHTML is a sink just like HttpResponse, so we have to make sure we use it safely.

The innerHTML call that writes the static “Up to date.” message looks safe, because we don’t pass any user data into it. The call that swaps in the track list also looks pretty safe, since it just re-grabs the HTML from the template as-is.

The call that builds the “Refreshing results for…” message, however, requires a closer look. At first it seems safe, because {{ artist }} is escaped by Django, so it should be HTML-safe. However, upon closer inspection we discover that Django’s escaping happens too early: the value goes into a JavaScript string, and JavaScript strings have their own escaping mechanism. For example, you can represent < as \x3c, which won’t be changed by Django’s HTML escaping at all, but will still inject valid HTML into refreshMessage.innerHTML:

/search/\x3cimg%20src=not-there%20onerror=\x22alert(\x27Hacked!\x27)\x22\x3e
Script runs when the user clicks on the refresh button

So the sink is actually in the JavaScript code, and we should escape the data in JavaScript if we can. We should also note that a template that puts data inside a JavaScript string is yet another kind of sink, different from HTML sinks. The correct solution is:

  • Escape the value using JavaScript escaping instead of the default HTML escaping (because the data goes into a JavaScript string, not HTML). The escapejs filter in Django does exactly that.
  • HTML-escape the data as close to innerHTML as possible, in the JavaScript itself. We can do this, for example, by inserting the user data with innerText instead of innerHTML, as in the sketch below.
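Applied to the hypothetical template above, a fix combining both points might replace the dangerous assignment like this:

```html
{# results.html -- the dangerous assignment from the sketch above, with both fixes applied #}
<script>
  // escapejs protects the JavaScript string context; innerText keeps the value out of the HTML parser
  refreshMessage.innerText = "Refreshing results for {{ artist|escapejs }}...";
</script>
```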

It’s important to note that without these changes, the application would have not only security issues but also usability issues whenever users search for artists with special characters in their names. Again, security and usability are not in competition with each other here.

Ultimately, although input sanitization is nice to have, it should not be used as the main defense layer in applications and APIs. Instead, code and data should be kept separate, for example by escaping the data, and that escaping should happen as late as possible in order to reduce the number of sinks and make the code easy to review. This is fundamentally true for both security and usability, because they are not in competition with each other.

Eyal Greenberg is an accomplished researcher with years of experience as a penetration tester, security researcher, team leader and, eventually, Commander of the Israeli national Red Team in Matzov. After completing his service, Eyal helped found CYE, where he currently serves as a penetration tester and team leader and spearheads the research division.

Gil Cohen is a highly experienced information security architect, consultant, researcher, and penetration tester with more than 16 years of experience. Gil currently works as the Head of Application Security at CYE, taking part both in the information security services department and in the research of CYE’s automation product.
