Security Code Review 101 — Input Validation

How to employ character validation strategies in order to effectively reduce the software attack surface

Paul Ionescu
3 min readJan 1, 2019

This article is part of a series. Read the previous article here: https://medium.com/@paul_io/security-code-review-101-a3c593dc6854

Would you let someone in your house if you thought they should not be there?

You wouldn’t. So why allow any input to your application? Why allow symbols into a variable that is intended to be numeric?

Even when validation is used, a common mistake is to use block lists. For example an application will prevent symbols that are known to cause trouble. The weakness of this countermeasure is that some symbols may be overlooked.

Would you maintain a block list of people that cannot come to your house?

You wouldn’t. So why block quotes when you know the input should be numeric? You should only allow numbers.

In the two code samples below one has a security issue due to improper input validation. Can you tell which?

Both code samples, shell out, to execute OS commands, in this case sending a ping to a server. Shelling out is an insecure practice because it can lead to OS Command Injection, however the bottom code mitigates the issue because it uses an allow list to prevent hazardous characters while the top code uses a block list which is, in this case, insufficient since an attacker could pass in something like updateserver.com `command` or updateserver.com |command and neither ` nor | have been included in the block list.

As you can see, allow lists are much more effective at preventing application security issues. A simple multi-purpose function that checks if the input is alphanumeric can prevent multiple types of flaws:

And here is how to use this simple function to prevent a wide range of attacks:

One little function can prevent multiple attack types. The table below demonstrates how the function prevents SQL Injection, OS Injection, Cross-Site Scripting and Path Traversal.

The function also works for multi-language support. For example the character è will be considered a letter and will be allowed.

In an HTTP request there are many parameters that are simply numeric or alphanumeric. Let’s analyze the URL below which is part of a Twitter API request generated by executing a search for security.

https://api.twitter.com/2/search/adaptive.json?include_profile_interstitial_type=1&include_blocking=1&include_blocked_by=1&include_followed_by=1&include_want_retweets=1&include_mute_edge=1&include_can_dm=1&include_can_media_tag=1&skip_status=1&cards_platform=Web-12&include_cards=1&include_composer_source=true&include_ext_alt_text=true&include_reply_count=1&tweet_mode=extended&include_entities=true&include_user_entities=true&include_ext_media_color=true&send_error_codes=true&q=security&count=20&query_source=typd&pc=1&spelling_corrections=1&ext=mediaStats%2ChighlightedLabel

The only parameter here that may need to be excluded from validation is q . More than 90% of the request parameters can benefit from an alphanumeric allow list. By applying input validation to 90% of the input on the request, we reduce 90% of the attack surface. This is why input validation is the most effective way of reducing vulnerabilities.

Click the following link to access the next article in the series which covers Parameterized Statements: https://medium.com/@paul_io/security-code-review-101-parameterized-statements-df95c264364a

--

--

Paul Ionescu

Cyber-security professional and OWASP contributor from Ottawa, Canada. Creator and maintainer of the Secure Coding Dojo open source project.