What does sanitize mean? And why sanitize code/data?

Abderrahman Hamila
Jan 30, 2018 · 3 min read

I was teaching web technologies to electromechanical engineering students at an engineering school. It was a very interesting and very challenging experience for me as a developer. It’s not as easy as it seems. I had to prepare my courses, some sample code and some project ideas so they could apply what they learned in the theory sessions. So I taught them some basics of HTML5, CSS3, JavaScript and a few frameworks.

They are not developers, but for me EVERYBODY SHOULD LEARN HOW TO WRITE SOME CODE and, obviously, WRITE HTML CODE!

This is a screenshot of a project built by my students.

One of my students asked, “What is sanitization?” I explained what it is about, how important it can be in a real-world project, and the difference between sanitizing and validating user input.

  • Sanitizing will remove any illegal character from the data.
  • Validating will determine if the data is in proper form.
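The difference between the two can be sketched in a few lines of Python. This is only an illustration: the function names and the username rule (letters, digits and underscore, 3 to 20 characters) are assumptions made for the example, not a standard.

```python
import re

def sanitize_username(raw: str) -> str:
    """Sanitize: remove every character outside the (hypothetical) allowlist."""
    return re.sub(r"[^A-Za-z0-9_]", "", raw)

def is_valid_username(value: str) -> bool:
    """Validate: check that the data already has the proper form, without changing it."""
    return re.fullmatch(r"[A-Za-z0-9_]{3,20}", value) is not None

user_input = "bob<script>"
print(is_valid_username(user_input))   # validation rejects the input as-is
print(sanitize_username(user_input))   # sanitization strips the illegal characters
```

Validation answers yes or no and leaves the data untouched; sanitization returns a cleaned copy. In practice the two are often combined: sanitize first, then validate the result.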

I would like to share with you the definition of the word:

sanitize

[san-i-tahyz]

verb (used with object), sanitized, sanitizing.

1. to free from dirt, germs, etc., as by cleaning or sterilizing.

2. to make less offensive by eliminating anything unwholesome, objectionable, incriminating, etc.:

to sanitize a document before releasing it to the press.

In the real world, to sanitize is to “clean” something of “bad things”. In computer science it means the same thing: mostly for security purposes, we protect the system from malicious data.

For example, a user can type anything into a form and submit it. The value may pass as valid form input, yet be dangerous on the server side: it might contain malicious escape sequences, as in an SQL injection attack. Validation, by contrast, applies to checking that a field is well-formed, in which case it can return an error to the user.
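To make the SQL injection risk concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `users` table and both function names are made up for the illustration. Note that the safe version does not strip characters from the input; it uses a parameterized query, which is the usual defence because the database driver then treats the value purely as data.

```python
import sqlite3

def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # DANGEROUS: the user input is pasted directly into the SQL text.
    query = f"SELECT id, name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()

def find_user(conn: sqlite3.Connection, name: str) -> list:
    # The "?" placeholder keeps `name` out of the SQL text entirely.
    cur = conn.execute("SELECT id, name FROM users WHERE name = ?", (name,))
    return cur.fetchall()

# Hypothetical in-memory table, only for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

malicious = "' OR '1'='1"
print(find_user_unsafe(conn, malicious))  # the injected condition matches every row
print(find_user(conn, malicious))         # no user has that literal name
```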

Writing a blog post is a good example. The user enters some HTML, directly or via a WYSIWYG editor; we store it in the database and later display it. So what if the user copies and pastes some code from the internet that contains a <script> tag with malicious code? In this case, we perform HTML sanitization.

HTML sanitization is the process of examining an HTML document and producing a new HTML document that preserves only whatever tags are designated “safe” and desired. HTML sanitization can be used to protect against cross-site scripting (XSS) attacks by sanitizing any HTML code submitted by a user.

Basic tags for changing fonts are often allowed, such as <b>, <i>, <u>, <em>, and <strong>, while more advanced tags such as <script>, <object>, <embed>, and <link> are removed by the sanitization process. Potentially dangerous attributes such as onclick are also removed in order to prevent malicious code from being injected.
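The allowlist idea described above can be sketched with Python's standard `html.parser` module. This is a teaching example under simple assumptions (the class name, the `sanitize_html` helper and the exact tag sets are mine), not a production sanitizer; for a real project, a maintained sanitization library is the safer choice.

```python
from html import escape
from html.parser import HTMLParser

ALLOWED_TAGS = {"b", "i", "u", "em", "strong"}   # tags kept in the output
DROP_CONTENT = {"script", "style"}               # tags whose inner text is dropped too

class AllowlistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
        elif tag in ALLOWED_TAGS and self.skip_depth == 0:
            # Attributes are never copied, so onclick and friends disappear.
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            if self.skip_depth > 0:
                self.skip_depth -= 1
        elif tag in ALLOWED_TAGS and self.skip_depth == 0:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            # Re-escape text so it cannot be read back as markup.
            self.out.append(escape(data))

def sanitize_html(html: str) -> str:
    parser = AllowlistSanitizer()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)

dirty = '<b onclick="evil()">Hi</b><script>alert(1)</script> <em>there</em>'
print(sanitize_html(dirty))
```

Tags that are not on the allowlist are dropped while their text is kept, so a link such as `<a href="...">link</a>` becomes plain `link`.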

Depending on the context, sanitization takes different forms. It can range from something as simple as removing vulgarities and odd symbols from text to stripping SQL injection attempts and other malicious code.

Never trust user input; trusting it is a very naive approach. Always validate forms on the frontend as well as the backend. Sanitizing data and URLs is a MUST.
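For URLs, one simple approach is to accept only the schemes you expect. The sketch below uses `urllib.parse.urlsplit` from the standard library; the function name and the choice of allowing only `http` and `https` are assumptions for the example.

```python
from typing import Optional
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}  # hypothetical policy for this example

def sanitize_url(raw: str) -> Optional[str]:
    """Return a normalized URL if its scheme is allowed, otherwise None."""
    try:
        parts = urlsplit(raw.strip())
    except ValueError:
        return None  # malformed input, e.g. a broken IPv6 host
    if parts.scheme not in ALLOWED_SCHEMES or not parts.netloc:
        return None  # rejects javascript:, data:, relative paths, etc.
    return parts.geturl()

print(sanitize_url("https://example.com/a?b=1"))
print(sanitize_url("javascript:alert(1)"))
```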

