The content upload nightmare

How to properly detect the content of a file

Vincent Dufrasnes
PayFit
2 min read · Aug 17, 2020

“What is this file?”

It’s a simple question, and one I never cared much about. That was before I had to handle file uploads properly.

The question looks easy, and anyone can answer it quickly: test.pdf is a PDF because it ends with pdf, and image.jpg is a JPG. That's when things start to get interesting, because there are plenty of questions you can ask to make someone doubt that answer:

  • Are you sure it’s a PDF? Someone could have changed the extension.
  • Your machine says it’s a PDF, but can you trust your machine? How does it know that it is a PDF?
  • What if someone changed the bytes inside the file to make your machine think it’s a valid PDF?
  • What if the file you’re trying to open is malware in disguise?
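
To make the first two questions concrete, here is a minimal Python sketch of how a machine can "know" a file is a PDF: it compares the first bytes of the content against well-known file signatures ("magic bytes") instead of trusting the extension. The names `SIGNATURES`, `ALIASES`, `sniff_type` and `extension_matches_content` are hypothetical and only for illustration, and the signature table is deliberately tiny. This is not necessarily the approach taken in the full article.

```python
import os
from typing import Optional

# Leading "magic bytes" of a few common formats (illustrative, not exhaustive).
SIGNATURES = {
    "pdf": (b"%PDF-",),
    "jpg": (b"\xff\xd8\xff",),
    "png": (b"\x89PNG\r\n\x1a\n",),
}

# Extensions that name the same format (assumption: we treat .jpeg as .jpg).
ALIASES = {"jpeg": "jpg"}


def sniff_type(data: bytes) -> Optional[str]:
    """Return the format whose signature starts the data, or None if unknown."""
    for kind, prefixes in SIGNATURES.items():
        if data.startswith(prefixes):
            return kind
    return None


def extension_matches_content(filename: str, data: bytes) -> bool:
    """True only if the content's signature agrees with the file extension."""
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    ext = ALIASES.get(ext, ext)
    detected = sniff_type(data)
    return detected is not None and detected == ext


if __name__ == "__main__":
    # A PDF renamed to image.jpg is caught: the content says "pdf".
    print(extension_matches_content("image.jpg", b"%PDF-1.4\n"))
    print(extension_matches_content("test.pdf", b"%PDF-1.4\n"))
```

A renamed file fails this check, but a file whose first bytes were crafted to look like a PDF still passes it. That is exactly the third question above, so a signature check is a first filter rather than proof.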

It’s almost impossible for a system that handles file uploads to be completely sure, but let’s try to get as close to the truth as we can.

In this article, I’ll share how you can detect and verify, on the server side, the files that are being uploaded to your server. Alright, let’s jump into it!

The full content of this article has been moved to PayFit’s Backstage blog.

--

Vincent Dufrasnes
AWS Software engineer - RDS MySQL MariaDB Security