What we’ve learned from our last annual penetration test

Published in

inato

8 min read · Oct 4, 2023

At Inato, we’re on a mission to bring clinical research to each and every patient, regardless of who they are and where they live. Working on clinical topics means that we are working with confidential data that must be protected from potential attackers.

While code security is a day-to-day matter for every developer, we also rely on security audits done by third-party companies. Every year, our platform goes through a thorough penetration test. The auditors have full access to our platform and code base, so they can find vulnerabilities that a real hacker would probably never find. We can then fix these vulnerabilities before they cause any real incident.

Our last penetration test revealed one vulnerability which is quite interesting both for understanding what hackers can do and for learning how to avoid it with good code practices. That’s what I’d like to share in this article.

The symptoms

In our marketplace, we match sites that can perform clinical trials with the right clinical trials. Once the matching is done, the site has to fill in a questionnaire so the sponsor (a pharmaceutical company) can select it for the real trial. Among the information we gather, we ask for the CVs of the key individuals who would work on the trial. The person filling in the questionnaire uploads a file that a person working on the sponsor side will then download.

The vulnerability found was a cross-site scripting (XSS) vulnerability. With it, a hacker was able to upload a fake CV that was actually a JavaScript file. When the user on the sponsor side tried to download it, the JavaScript inside the file was executed:

[Image: an alert telling the user they have been hacked, displayed on a URL whose domain is api-marketplace.inato.com]

This can be used in many different malicious ways:

Executing sponsor actions with API calls, such as selecting a site for a clinical trial
Telling the user to visit another web page where they would give away confidential data such as an email address or a password
…

Let’s dive deeper into how this could be done!

The technical deep dive

We usually handle file upload in 2 steps in our stack:

We first upload the file to a specific API endpoint that returns a file path. The file is stored in a GCP bucket and the returned path is a part of the file’s path in the bucket.
We then call a GraphQL mutation to, let’s say, edit a member. Among the variables of the mutation, there is the file path returned by the previous API call, along with the file name.

First breach

Because the upload and the mutation call are separate, it’s easy to call the mutation with a path that does not exist. When doing so, nothing special happened until a user tried to download the file. Our download endpoint then returned a 404 with an error message containing the file path, something like “We could not find the file not-existing-file.jpg”. The sad thing was that the Content-Type header of the response was set to text/html. Express does this automatically when response.send('some_string') is called with a string and the content type is not already set.

Because of this, a hacker could call the mutation with a file path such as "<strong>You have been hacked</strong>", and any user trying to download the file would have that HTML rendered by their browser and see the image above.
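To make the first breach concrete, here is a minimal sketch in TypeScript that models the download endpoint's replies as plain objects rather than real Express responses. The Reply type and both function names are hypothetical. The first function mimics the behaviour described above, and the second mimics a reply that carries only the status text, which is what Express's res.sendStatus produces.

```typescript
// Hypothetical model of an HTTP reply; this is not an Express type.
interface Reply {
  status: number;
  contentType: string;
  body: string;
}

// Mimics the vulnerable endpoint: the attacker-controlled path is echoed
// in a body that the browser will parse as HTML.
function notFoundWithDetails(filePath: string): Reply {
  return {
    status: 404,
    contentType: "text/html; charset=utf-8",
    body: `We could not find the file ${filePath}`,
  };
}

// Mimics a reply that carries only the status text, so nothing supplied
// by the caller ever reaches the browser.
function notFoundStatusOnly(): Reply {
  return {
    status: 404,
    contentType: "text/plain; charset=utf-8",
    body: "Not Found",
  };
}

const attackerPath = "<strong>You have been hacked</strong>";
const vulnerable = notFoundWithDetails(attackerPath);
const safe = notFoundStatusOnly();
```

In the first reply the markup chosen by the attacker ends up inside an HTML document served from our domain. In the second, the body is a fixed string served as plain text.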

Thanks to our proper use of the Content-Security-Policy header, no inline JavaScript added in the file name can be executed. For example, this script set in a file path would not be executed: <script>alert(42)</script>. We use helmet to secure our Express app and we barely modify the default settings.

We also prevent any download of script files from any source except our own server. We’re safe then, right? Hackers can’t add a script on our server and use it to hack one of our users, right?
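The effect of these two protections can be sketched with a toy model of a script-src 'self' policy, which is what helmet's default Content-Security-Policy sets for scripts. This is a simplification for illustration only: real browsers evaluate many more directives, and the ScriptSource type, the isScriptAllowed function and the evil.example origin are all hypothetical.

```typescript
// A script is either inline in the page or loaded from some origin.
type ScriptSource =
  | { kind: "inline" }
  | { kind: "external"; origin: string };

// Toy evaluation of `script-src 'self'`: inline scripts are refused
// (there is no 'unsafe-inline'), and external scripts must come from
// the same origin as the page.
function isScriptAllowed(pageOrigin: string, script: ScriptSource): boolean {
  if (script.kind === "inline") {
    return false;
  }
  return script.origin === pageOrigin;
}

const apiOrigin = "https://api-marketplace.inato.com";

// <script>alert(42)</script> injected through the file path
const inlineAllowed = isScriptAllowed(apiOrigin, { kind: "inline" });

// <script src="https://evil.example/x.js"></script>
const foreignAllowed = isScriptAllowed(apiOrigin, {
  kind: "external",
  origin: "https://evil.example",
});

// A script served by our own API
const sameOriginAllowed = isScriptAllowed(apiOrigin, {
  kind: "external",
  origin: apiOrigin,
});
```

Only the last case is allowed, which is why the attack needs a script that is hosted on our own API.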

Second breach

We saw that we could upload files to our server that could then be downloaded. Can a hacker use this to upload a malicious JavaScript file?

We restrict the type of files that can be uploaded to our server to “business” documents (.doc, .pdf, .xls, …) or images (.png, .jpg, …) and we use the file-type library to check the type of the uploaded files.

With this library, you can’t write a JavaScript file and just rename the extension from .js to .jpg. The check is based on the file’s content, not its name, so our API will detect that the file is not an image and reject it.

It’s however not hard to trick the file detection by adding some specific bytes at the start of the file to disguise a JavaScript file as an image file. For a JPEG, these “magic bytes” are FF D8 FF. If interpreted with the ISO-8859-1 charset, they are parsed as ÿØÿ, which can be part of a valid JavaScript file, like ÿØÿ=true;alert(42) for example.
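The disguise can be reproduced in a few lines. The sketch below uses a deliberately naive stand-in for a magic-number check (it is not the file-type library's API), and looksLikeJpeg, decodeLatin1 and buildDisguisedScript are hypothetical helper names.

```typescript
// JPEG files start with the bytes FF D8 FF.
const JPEG_MAGIC = [0xff, 0xd8, 0xff];

// Naive magic-number check: only the first three bytes are inspected.
function looksLikeJpeg(bytes: Uint8Array): boolean {
  return JPEG_MAGIC.every((expected, i) => bytes[i] === expected);
}

// ISO-8859-1 maps each byte to the code point with the same value.
function decodeLatin1(bytes: Uint8Array): string {
  let text = "";
  for (let i = 0; i < bytes.length; i++) {
    text += String.fromCharCode(bytes[i]);
  }
  return text;
}

// Prefix an ASCII-only script with the JPEG magic bytes.
function buildDisguisedScript(script: string): Uint8Array {
  const out = new Uint8Array(JPEG_MAGIC.length + script.length);
  out.set(JPEG_MAGIC, 0);
  for (let i = 0; i < script.length; i++) {
    out[JPEG_MAGIC.length + i] = script.charCodeAt(i);
  }
  return out;
}

const payload = buildDisguisedScript("=true;alert(42)");
const passesCheck = looksLikeJpeg(payload); // true
const asText = decodeLatin1(payload); // "ÿØÿ=true;alert(42)"
```

The same bytes satisfy the image check and, once read as ISO-8859-1, form a script that assigns to a variable named ÿØÿ and then calls alert.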

Here’s where another issue of our code lay: when uploading a file, no matter its real type (the one retrieved by file-type), the corresponding downloaded file kept the extension of the file uploaded. This means that by uploading a file.js file disguised as an image file, the corresponding download link would be something like https://api-marketplace.inato.com/...../...file.js

Doing so, a hacker could upload a file and retrieve a URL to download it from our API, a trusted source, in a way that the browser would interpret it as JavaScript.

That’s actually not 100% true. One more technical detail prevents a .js file from being executed by the browser. For the magic bytes to be valid in a JavaScript file, the file must be read as ISO-8859-1 and not UTF-8. When the res.attachment method is used, Express automatically sets the charset of a .js file to utf-8 in the Content-Type header. To bypass this, we can use the .ecma extension: Express won’t set a charset for it and the browser will execute it like a .js file.

Full flow

Here’s how both breaches can be used to execute malicious JavaScript on our API’s domain, creating a cross-site scripting vulnerability:

[Image: description of how a hacker could upload a malicious JavaScript file and have it downloaded and executed by another user]

Remediation actions

Here are the fixes we implemented to remove the vulnerability:

Instead of keeping the extension of the given file, we now store the file name with the extension given by file-type. This ensures we only ever serve files with a trusted and validated extension.
We stopped giving technical info in download file errors. We already made the move a year ago to stop giving error details to our frontend when querying our GraphQL API. We then relied on properly defined error types to display proper error messages to our users. We have now also removed them from our few REST API endpoints.
To remove the technical info above, we stopped using response.send('some_string') and instead just do response.sendStatus(404). Doing this automatically prevented Express from setting the content type to text/html.
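The first fix can be sketched as a small pure function. It assumes the detected extension has already been obtained from a content-based check. The allowlist contents and the storedFileName helper are hypothetical and only illustrate the idea; they are not the production code.

```typescript
// Hypothetical allowlist of extensions the platform accepts.
const ALLOWED_EXTENSIONS = ["pdf", "doc", "xls", "png", "jpg"];

// Build the stored name from the detected type, ignoring the extension
// chosen by the uploader. Returns undefined when the detected type is
// missing or not accepted.
function storedFileName(
  uploadedName: string,
  detectedExt: string | undefined
): string | undefined {
  if (detectedExt === undefined || ALLOWED_EXTENSIONS.indexOf(detectedExt) === -1) {
    return undefined;
  }
  const lastDot = uploadedName.lastIndexOf(".");
  const baseName = lastDot > 0 ? uploadedName.slice(0, lastDot) : uploadedName;
  return `${baseName}.${detectedExt}`;
}

// A script disguised as a JPEG is stored, and later served, as a .jpg.
const disguised = storedFileName("cv.ecma", "jpg"); // "cv.jpg"

// A file whose type could not be detected is rejected.
const rejected = storedFileName("cv.js", undefined); // undefined
```

With this approach the download URL can no longer end in .js or .ecma, whatever name the uploader picked. A real implementation would also need to sanitize the base name itself.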

Any one of the 3 fixes would have been enough to prevent this vulnerability, but each of these weaknesses could be part of another vulnerability, so it’s best to fix them all.

One thing that we did not do but that could also be done is to stop uploading files in 2 steps. Doing it in one step would ensure only our system can set the path of the file, once the file has been uploaded.
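A one-step upload could look like the following in-memory sketch, where the server generates the path itself and the client never supplies one. The Member shape, the Map-based stand-ins for the bucket and the database, and the uploadCv function are all hypothetical.

```typescript
interface Member {
  id: string;
  cvPath?: string;
  cvName?: string;
}

// In-memory stand-ins for the GCP bucket and the member table.
const bucket = new Map<string, Uint8Array>();
const members = new Map<string, Member>();
members.set("m1", { id: "m1" });

let uploadCounter = 0;

// Store the file and attach it to the member in a single operation.
// The path is generated here, so callers cannot point a member at an
// arbitrary or non-existing path.
function uploadCv(memberId: string, fileName: string, content: Uint8Array): Member {
  const member = members.get(memberId);
  if (member === undefined) {
    throw new Error(`Unknown member ${memberId}`);
  }
  uploadCounter += 1;
  const path = `cv/${memberId}/${uploadCounter}`;
  bucket.set(path, content);
  member.cvPath = path;
  member.cvName = fileName;
  return member;
}

// The first four bytes of a PDF file ("%PDF"), used here as dummy content.
const updated = uploadCv("m1", "cv.pdf", new Uint8Array([0x25, 0x50, 0x44, 0x46]));
```

Because the path never travels through the client, the download endpoint can only be asked for paths that exist in the bucket.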

What did we learn from that?

First, let’s take some time to congratulate ourselves. Having set the Content-Security-Policy header on our application really makes the hackers’ job significantly harder. We spent some time adding it and it’s something that can be tedious to maintain. However, without this header, the hacker could have executed any JavaScript without needing to upload a disguised file to our API.
Let’s also emphasize the benefits we can get from security audits. We’ve never had any real incident caused by this vulnerability. We were able to fix it before that and, most importantly, we globally strengthened our system with the fixes we made.
Don’t let technical info leak to the users of your application. It can be exploited in many ways by hackers, from knowing what your system is built with to exploiting a vulnerable display. Rely on status codes or technical codes that are not built from real error messages. It can be tempting to send the API’s errors to the frontend for easier debugging, but you should rely on logging for that.
Always validate data coming from end users. It’s something that always comes up in security audits, trainings or presentations. Never trust the data coming to your API from anybody, even internal users. They can always be hacked, or ill-intentioned. Validating user data is not only sanitizing strings to prevent SQL or HTML injection (using a modern framework and ORM will prevent most of these anyway), it’s also validating the files’ types and extensions, among others.
Last but not least, this is an excellent reminder that security is multilayered. There’s not one single protection that will prevent all vulnerabilities. On the contrary, closing your eyes to one small breach because it does not look that harmful, which can be true at a given point in time, lowers your security defenses. Add a few of these and you can have a vulnerability in your system. In our example, I identified 3 small breaches, and fixing only one of them would have prevented the vulnerability.

[Image: the vulnerability came from multiple small breaches]

I hope you learned a few things from this and that it was interesting to read. Security can be tedious but it was actually pretty fun to investigate this vulnerability. Seeing how far a hacker can go to penetrate your system and trying to build the correct protections is a nice challenge. I hope these kinds of examples can help you think about your product and identify existing and potential breaches in it.

Feel free to leave your comments or questions!


Written by Vincent Francois