Published in Cloud Security

Why Data Types Matter

How you handle data types may lead to a number of vulnerabilities or odd behavior attackers can abuse

This blog post is part of a series on secure coding principles that may become a future book like my other blog-book Cybersecurity for Executives in the Age of Cloud. If you want to know when the book gets published, follow me on Medium or Twitter — or both!

You’ve probably already heard about data types if you’ve been programming for any length of time. But have you dug into the details? Why should you care about data types?

In some languages, you have to make sure you use the correct data type in your code when you declare a variable. Other languages will try to figure out the type of data you want to use when you instantiate a variable. Who cares what the data type is? Just let the programming language figure it out, right? Well, let’s take a look at how that might work out.

Problems with mismatched data types

Most programming languages include something called primitive data types. These data types are defined within the language itself. They have certain properties. For example, the int data type in Java must be a whole number between -2,147,483,648 and 2,147,483,647.

Let’s look at data types in Java and C#. What do you notice about the data types in these two charts?

Java

https://www.w3schools.com/java/java_data_types.asp

C#

https://www.w3schools.com/cs/cs_data_types.php

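The Java chart lists a 2-byte short data type. The C# chart does not, although C# does have a 2-byte short of its own. The two languages still differ in ways that matter: Java’s byte is signed (-128 to 127) while C#’s byte is unsigned (0 to 255), and C# has unsigned integer types and a built-in decimal type that Java lacks as primitives. When processing numeric values, such as in banking systems, you’ll need to ensure that systems that integrate with each other use common data types. If you are using two different languages, define the minimum and maximum values allowed in each system to prevent bugs.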
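One way to make those agreed limits explicit is to check every value against the range of the narrowest type that any integrated system uses. Here is a minimal sketch in Python. The names INT32_MIN, INT32_MAX, and fits_int32 are made up for illustration, and the 32-bit range is the one quoted above for Java's int.

```python
# Range of a 32-bit signed integer, such as Java's int.
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def fits_int32(value: int) -> bool:
    """Return True if value can be stored in a 32-bit signed integer."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(value, int) or isinstance(value, bool):
        raise TypeError("expected an int")
    return INT32_MIN <= value <= INT32_MAX


print(fits_int32(2_147_483_647))  # True: the largest value that fits
print(fits_int32(2_147_483_648))  # False: one past the limit
```

A check like this belongs at the boundary where data enters from another system, so an oversized value gets rejected with a clear message instead of failing somewhere deeper in the process.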

Now consider some of the numeric data types in Microsoft’s version of SQL, Transact-SQL.

Here the data types can be even more granular, to conserve space in a database where storage and processing time become crucial factors to database performance.

One of the most common reasons for batch job failures at the bank where I worked involved batch jobs that imported data from external systems that sent numbers larger than our system could process. Often that would cause a production incident. Sometimes one of our team members on-call would have to respond to a phone call and write a query to manually adjust the system.

That manual query or production fix usually doesn’t go through the same level of scrutiny and testing that code goes through during a normal development process. At some organizations, a “production support” or “operations” team may make this correction. That team is doing its best given the information it has most of the time, but often doesn’t fully understand the functionality and nuances of the application.

What could go wrong when someone comes in to manually write a query that impacts financial records? If you aren’t already imagining a number of scenarios, I’ll help. What if two people are colluding to write a query and send money to the wrong bank account, leveraging a process that entails less scrutiny of their actions? What if the person has a bug in their hastily written manual query that wipes out valid records or posts incorrect values, and that goes unnoticed? What if the person writes a query and provides an explanation non-technical people don’t understand that fixes the problem but introduces a system vulnerability on purpose?

I provided another example of what could go wrong with incorrect data types in my last post on secure transactions. The scenario I described involved a flat file, a CSV file, that received the wrong data. It could be that someone put a string in a column that should have been an integer. The CSV file doesn’t care. That could trigger an error in a downstream system. In my example, that led to a free wire transfer for a customer in code that failed to properly wrap a set of related operations in a transaction.
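Since a CSV file will hold anything, the program that reads it has to enforce the types. The sketch below is one hedged way to do that in Python. The two-column layout (account, amount in cents) and the function name parse_amount_rows are assumptions made up for this example, not the format from the earlier post.

```python
import csv
import io
import re

# Only an optional minus sign followed by the ASCII digits 0-9.
INTEGER_PATTERN = re.compile(r"-?[0-9]+")


def parse_amount_rows(csv_text: str) -> list:
    """Parse 'account,amount_cents' rows, rejecting non-integer amounts."""
    rows = []
    reader = csv.reader(io.StringIO(csv_text))
    for line_number, record in enumerate(reader, start=1):
        if len(record) != 2:
            raise ValueError(f"line {line_number}: expected 2 columns")
        account, amount_text = record
        if not INTEGER_PATTERN.fullmatch(amount_text):
            raise ValueError(f"line {line_number}: amount is not an integer")
        rows.append((account, int(amount_text)))
    return rows


print(parse_amount_rows("A1,100\nB2,-5\n"))  # [('A1', 100), ('B2', -5)]
```

Rejecting the whole file on the first bad row is a design choice. It keeps a half-processed batch from reaching the downstream system.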

Incorrect data types lead to unexpected results

The other day I noticed the following post on Twitter. Notice anything a bit odd?

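The last line parses 0.0000005 as 5. JavaScript’s parseInt converts its argument to a string first, and the number 0.0000005 becomes the string “5e-7”, so parsing stops after the leading 5. What could go wrong?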

Let’s say an application is processing dividends that could result in a number with a lot of extra decimal places. Instead of getting a partial cent as a dividend the person receiving this dividend would receive $5.00. Let’s say you have hundreds of thousands of people receiving that dividend. That could add up!

Many people commented on this post. Some of them wrote about how ridiculous it is that JavaScript behaves this way.

Others had a different perspective. They chastised anyone who would write code like this because the parseInt function requires you to pass in a String rather than Number. Who would be so foolish as to use a function incorrectly?

JavaScript is one of the languages that allow you to create variables without specifying a data type. Do you always check what type of data you need to pass into a JavaScript function before you call it? I can imagine many programmers don’t. They fail to take this step either because they aren’t aware that they should, or because they are in a hurry to get something done and don’t think about the risk or know that it even exists.

Another potential problem is simply making a mistake. Yes, you’re all perfect, right? I’m not. I could easily see myself doing this when I’m quickly writing some code to accomplish some task. You know you’re supposed to pass in a string and that you should understand all the inputs to a particular function before calling it. But while typing fast you forgot to put those quotes around that value. Oops. It happens.

On the other hand, shouldn’t this function be checking for proper inputs? Why do you, as a programmer, need to understand how every single function you call works? Shouldn’t the function protect you from mistakes like this? Well, some programming languages do and others don’t. It may be a pain to define your data types upfront, but when you do, a programming language that enforces correct data types should protect you from problems like this.

What’s the takeaway? A programming language that forces you to define a data type could lead to fewer errors. A programmer would need to choose the correct data type when they define the variable. The language would throw an error if the programmer passes a variable into a function and its data type doesn’t match the function’s signature. This makes it more complicated to program in that language but can prevent errors such as the one above.

This is how the languages you choose for creating your applications can affect your security, by the way. When you choose to use a particular language for applications involving sensitive data, dig into the details of how it works under the hood to prevent security problems later. If you choose to use a language that doesn’t enforce data types, you’ll need to ensure your application checks that the right type of data is used throughout to prevent related security problems.
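Here is a sketch of what checking the type yourself can look like. Python, like JavaScript, does not enforce declared types at run time, so the function verifies its own input. The function name parse_whole_number is hypothetical, and this illustrates the idea rather than replacing any built-in parser.

```python
def parse_whole_number(text: str) -> int:
    """Parse a string of decimal digits, refusing any other input type."""
    # Python does not enforce the annotation above at run time, so check it.
    if not isinstance(text, str):
        raise TypeError(f"expected str, got {type(text).__name__}")
    # Accept only the ASCII digits 0-9, so "5e-7" is rejected outright.
    if not text.isascii() or not text.isdigit():
        raise ValueError("expected only the digits 0-9")
    return int(text)


print(parse_whole_number("5"))  # 5
```

With this check in place, passing the number 0.0000005 raises a TypeError and passing the string "5e-7" raises a ValueError. Neither one quietly turns into 5.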

Vulnerabilities and unexpected behavior in primitive data types

Sometimes the data types in your favorite programming language have a vulnerability. Of course, you’ll need to be aware of those and update to the latest version as quickly as possible to ensure your system is not vulnerable to attack. However, you can also learn from these vulnerabilities and ensure when you implement your own data types they do not have similar problems.

Primitive data types may also have unexpected behavior. Be aware of these issues and ensure you select data types appropriately for given application functionality. Binary floating-point types cannot represent many decimal fractions exactly, which produces rounding errors when dealing with currency values. If you are using Java, you’ll want to avoid using the double data type when programming financial applications. Dig into the details of the data types you select and make sure they are appropriate for your application.

This article goes into a lot more detail on Java data types and currency calculations if you are interested:

https://www.infoworld.com/article/2071332/the-need-for-bigdecimal.html
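The same effect is easy to reproduce outside Java. The sketch below uses Python, where float is a binary floating-point type like Java's double on typical platforms, and the standard library's Decimal plays a role similar to BigDecimal. The variable names are only for illustration.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1 or 0.2 exactly.
float_total = 0.1 + 0.2
print(float_total == 0.3)  # False

# Decimal values built from strings keep the exact decimal amounts.
decimal_total = Decimal("0.10") + Decimal("0.20")
print(decimal_total == Decimal("0.30"))  # True
```

Note that the Decimal values are built from strings. Building a Decimal from a float would carry the float's inexact value along with it.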

A related issue involves functions that round numbers. This article from Microsoft includes rounding discrepancies in C#:

I wrote a blog post about a problem with foreign currencies in ColdFusion back in 2010:

From some tax documentation I am reading for Sales Tax Online web tax calculation component:

“You may encounter a precision problem for very large amounts when using the ColdFusion number type. Because this type has a very large exponential range, it necessarily sacrifices precision in the number of significant digits it can carry. This will not generally be a problem with amounts expressed in currencies that have only two significant fractional digits (e.g., US Dollars), however, foreign currencies can have as many as four significant fractional digits. A value with a large non-fractional portion could suffer a rounding error in the fractional digits.”

To help discover issues with unexpected behavior, test applications thoroughly with appropriate bounds checking. Bounds checking means that if you allow values from 1 to 10 for a particular variable, you test values at and just outside those boundaries. You should test 0, 1, 10, and 11. You’ll probably also want to test negative numbers because sometimes people add functionality or use data types that drop the sign, and that leads to unexpected results.
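A boundary test for the 1 to 10 example might look like the following sketch. The names validate_quantity and is_accepted are hypothetical stand-ins for whatever your application actually validates.

```python
def validate_quantity(value: int) -> int:
    """Accept only whole numbers from 1 to 10 inclusive."""
    if not isinstance(value, int) or isinstance(value, bool):
        raise TypeError("quantity must be an int")
    if not 1 <= value <= 10:
        raise ValueError("quantity must be between 1 and 10")
    return value


def is_accepted(value) -> bool:
    """Report whether validate_quantity accepts the value."""
    try:
        validate_quantity(value)
        return True
    except (TypeError, ValueError):
        return False


# Boundary cases: a negative, just outside, and exactly on each limit.
results = {value: is_accepted(value) for value in (-1, 0, 1, 10, 11)}
print(results)  # {-1: False, 0: False, 1: True, 10: True, 11: False}
```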

You could use automation to help you find discrepancies when using different data types and functions. I haven’t done this myself but it would be pretty interesting. If you try it out give my blog post a plug! Write functionality to loop through numbers using different data types and functions with the same numeric inputs. Compare the results. Any time the results don’t match you have a potential issue that you’ll want to understand before using that data type or function.
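As a rough idea of what that automation could look like, the sketch below feeds the same inputs to two rounding approaches in Python and records every disagreement. It is one possible approach under assumptions I chose for the example: the inputs are thousandths from 0.000 to 3.000, both sides round to two places, and the function name find_rounding_mismatches is made up.

```python
from decimal import Decimal, ROUND_HALF_EVEN

CENT = Decimal("0.01")


def find_rounding_mismatches(max_thousandths: int) -> list:
    """Compare float and Decimal rounding to 2 places for n/1000 inputs."""
    mismatches = []
    for n in range(max_thousandths + 1):
        # Approach 1: binary floating point with the built-in round().
        float_result = Decimal(str(round(n / 1000, 2)))
        # Approach 2: exact decimal arithmetic with an explicit rounding mode.
        decimal_result = (Decimal(n) / 1000).quantize(
            CENT, rounding=ROUND_HALF_EVEN
        )
        if float_result != decimal_result:
            mismatches.append((n, float_result, decimal_result))
    return mismatches


mismatches = find_rounding_mismatches(3000)
print(len(mismatches), "inputs round differently")
print(mismatches[:3])
```

One input that shows up in the list is 2.675. The float version rounds to 2.67 because the stored binary value is slightly below 2.675, while the Decimal version rounds the exact tie to 2.68.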

Creating your own data types

In addition to primitive data types, languages may allow you to create your own data types. You may create data types with the definition of your choosing. Some languages allow you to create objects.

I’m not going to explain objects in complete depth here as you should look up and understand how objects work in your programming language of choice. But I want to address the fact that when creating and using your own data types you have some of the same issues you do when using primitive data types. You need to ensure the correct data types are in use within your own objects.

When you create an object it often has characteristics called properties and methods. Properties leverage primitive data types to describe the object within your system. These larger groupings of primitive data types are expected to have certain characteristics and as you use your object you should validate the integrity of the data contained in your object. Prevent assignment of incorrect data types to object properties to prevent security errors.

Methods are actions that your object takes within an application. These methods often receive data types as input. Just like properties, those data types passed in as arguments to a method could be primitive data types or other objects. Ensure the data types your method receives are correct to prevent a myriad of security problems.
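Here is a minimal sketch of an object that validates its own properties when it is created. The Transfer class and its field names are hypothetical. Python dataclass annotations are not enforced at run time, which is why the checks are written out.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Transfer:
    """Hypothetical value object; the field names are illustrative."""

    account_id: str
    amount: Decimal

    def __post_init__(self):
        # Annotations alone do not stop a caller from passing other types.
        if not isinstance(self.account_id, str):
            raise TypeError("account_id must be a str")
        if not isinstance(self.amount, Decimal):
            raise TypeError("amount must be a Decimal")
        # Reject NaN, infinity, zero, and negative amounts.
        if not self.amount.is_finite() or self.amount <= 0:
            raise ValueError("amount must be a positive, finite value")


transfer = Transfer("A1", Decimal("25.00"))
print(transfer.amount)  # 25.00
```

Marking the class as frozen means the properties cannot be reassigned after validation, so the object stays in the state that was checked.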

A security problem caused by passing the wrong data type into an application is known as object type confusion. Object type confusion can lead to a number of problems, up to executing commands on a remote machine. This book is about securing code, so I’m not going into all the details of how these attacks work and the potential resulting damage. If you want to look at one of these attacks in more detail, check out this blog post from Microsoft involving an Adobe Flash vulnerability.

Here’s an example of a CVE in TensorFlow that allows an attacker to create a model that causes an integer overflow. In other words, the attacker pushes data outside the bounds of an integer data type to an application expecting an integer.

What kind of damage might be caused by an integer overflow? Well in one instance, it crashed a rocket. You can do additional research on your own for more examples of why you don’t want to allow data larger than the expected data type size including obtaining an understanding of buffer overflows. These types of vulnerabilities have caused numerous security problems over time.
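To see what an overflow looks like, the sketch below uses Python's ctypes module to mimic a fixed-width 32-bit integer. Python's own integers grow as needed and do not overflow. The helper checked_add_int32 is a hypothetical example of checking the range before the value is used.

```python
import ctypes

INT32_MAX = 2**31 - 1

# Python ints grow as needed, so this addition is exact.
exact = INT32_MAX + 1

# ctypes.c_int32 does no overflow checking, so the value wraps around
# the way a fixed-width 32-bit signed integer would.
wrapped = ctypes.c_int32(exact).value

print(exact)    # 2147483648
print(wrapped)  # -2147483648


def checked_add_int32(a: int, b: int) -> int:
    """Add two ints and refuse a result that does not fit in 32 bits."""
    total = a + b
    if not -2**31 <= total <= INT32_MAX:
        raise OverflowError("result does not fit in a 32-bit signed integer")
    return total
```

A large positive number silently turning into a large negative one is the kind of surprise that leads to the failures described above.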

Here’s another vulnerability due to failing to check data types in PHP that leads to sensitive information exposure:

Some programming languages help you prevent these problems via strong type checking. If they don’t, you should be checking those data types yourself. In a language with strict, statically checked types, you generally declare the data type when you create a variable and also for parameters in functions you define. Java advertised that it prevented buffer overflows when it arrived on the scene. It does, mostly, through type checking and proper error handling, a topic covered in a prior blog post.

Use proper error handling to ensure your custom objects accept only the expected values and call functions with the proper values. Use proper error handling to prevent unexpected errors from introducing security vulnerabilities.

Null

A null value in programming indicates that a variable has no value. It’s empty. Nothing is assigned to it. Some programming languages will distinguish between null and zero, such as Microsoft’s Transact-SQL. Other languages consider null and zero to be one and the same. Some languages use a different name for null values such as Python’s None. Different languages handle null in different ways and it is important to understand those distinctions.

Null values are the source of many bugs, as well as security problems. When you forget to assign a value to a variable and pass it to a method, you may see the infamous “null value” error message appear on your screen. Too many times programmers forget to check for null values and the resulting error is not always descriptive or helpful. Checking a stack trace may lead you back to the offending function or library that caused a system crash with this nondescript error message.

Some data types use a null value to indicate to the program it has reached the end of a data type. For example, an application or programming language may determine that it has reached the end of a string when it encounters a null value. Sometimes a buffer will assume it’s at the end when it reaches a null value. Attackers may take advantage of this by inserting null values before the end of the value in an application, which then causes the application to dump memory or execute code passed in after it reached the null value. Sometimes the null value will be obfuscated (obscured or disguised) using encoded values, the topic of a future post.
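The sketch below imitates that behavior in Python to show why an embedded null matters. The function c_style_read only simulates a reader that stops at the first null byte and is not a real library call. The function reject_embedded_nulls is a hypothetical check you might run on input before passing it along.

```python
def c_style_read(data: bytes) -> bytes:
    """Simulate a reader that treats the first null byte as the end."""
    return data.split(b"\x00", 1)[0]


def reject_embedded_nulls(data: bytes) -> bytes:
    """Refuse input that contains a null byte anywhere."""
    if b"\x00" in data:
        raise ValueError("input contains a null byte")
    return data


name = b"report.pdf\x00.exe"

# One component sees the whole value, another stops at the null byte.
print(name)                # b'report.pdf\x00.exe'
print(c_style_read(name))  # b'report.pdf'
```

The danger comes from the disagreement. A check that sees the full value and a component that stops at the null are making decisions about two different strings.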

Null values in financial applications can also cause unwanted errors and unexpected results. Is the null treated as a zero? Or is it treated as an empty string? Or nothing? Perhaps your application programming language handles the null differently than your database. When the null value occurs in a batch processing job it may cause the system to crash. A null value may also alter the outcome of a calculation. Always check for null values in stored procedures or functions used for financial calculations.
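Here is a sketch of a calculation that refuses to guess what a missing value means. It is written in Python, where null is spelled None, and the function name total_balance is hypothetical.

```python
from decimal import Decimal


def total_balance(amounts: list) -> Decimal:
    """Sum Decimal amounts, refusing to guess what a missing value means."""
    total = Decimal("0")
    for position, amount in enumerate(amounts):
        # Test for None explicitly instead of relying on truthiness.
        if amount is None:
            raise ValueError(f"amount at position {position} is missing")
        total += amount
    return total


# A truthiness test cannot tell a missing value from a legitimate zero.
print(not None, not Decimal("0"))  # True True

print(total_balance([Decimal("1.50"), Decimal("0")]))  # 1.50
```

Whether a missing amount should stop the job or count as zero is a business decision. The point is to make that decision on purpose rather than let the language make it for you.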

You should always try to test your applications with null values for all the different inputs to ensure you do not get unwanted results. Also test the word ‘null’ because sometimes programmers will inadvertently check for the string ‘null’ instead of an actual null value. The programming language itself may have an issue with the string ‘null’ as well.
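A test for that case can be as small as the sketch below. The function normalize_last_name is a hypothetical input handler. The point is only that a real null and the four-letter word stay distinct.

```python
def normalize_last_name(value):
    """Hypothetical input handler: only a real None counts as missing."""
    if value is None:
        return None
    if not isinstance(value, str):
        raise TypeError("expected str or None")
    return value.strip()


# The missing value and the four-letter string must stay distinct.
print(normalize_last_name(None))    # None
print(normalize_last_name("null"))  # null
```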

Speaking of unwanted results involving null values, here’s a hilarious presentation by someone who thought it would be fun to get a license plate with the word “null” on it called “Go Null Yourself.” I highly recommend this entertaining presentation on what can go wrong with unexpected processing of null values. I doubt you can watch it without at least cracking a smile.

File Types

I’m always happy to get a penetration test with file upload functionality because I almost always find a problem related to that. Sometimes developers simply check the extension at the end of the file (e.g., a Word document gets uploaded and they check for ‘doc’ or ‘docx’ at the end of the file name).

It’s super easy to give a file the proper name and insert all sorts of fun data in the file that can cause the system to react in unexpected ways. In a file upload form you can do additional data checks on a file to make sure it is the type of file you expect. This isn’t going to prevent every vulnerability but it will help. When you inspect the first few bytes of a file, they should indicate the proper file type.

Here’s a source that lists the signatures for different types of files:

Take a look at information for a PNG image file:

On the left-hand side you have the first few bytes of the file in hexadecimal notation. What’s that? Data can be displayed using different formats that use different numerical bases. Without getting into all the details, numbers can be translated from a numerical system in one base to a numerical system in another base. Why would you want to do that? Hexadecimal is a compact way to write binary data because every byte fits in exactly two hexadecimal digits.

You are used to using a decimal numbering system. Take the number that represents the total quantity of fingers most humans have on both their hands. To you that number is 10. In hexadecimal that value is A.
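If you want to check conversions like this yourself, most languages have them built in. Here is a short Python illustration:

```python
# Parse the hexadecimal digit A as a base-16 number.
print(int("A", 16))        # 10

# Format the decimal number 10 in hexadecimal and in binary.
print(format(10, "X"))     # A
print(format(10, "b"))     # 1010

# One byte (0 to 255) always fits in two hexadecimal digits.
print(format(255, "02X"))  # FF
```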

When you get into cybersecurity you typically learn to convert hex to decimal if you are on the technical side. Oftentimes you do this to inspect the bytes of individual packets sent on the wire, which tools typically display in hexadecimal format.

Sometimes attackers will insert crafty values into the bits and bytes. You need to translate that into something you can understand to see what’s going on. Even data types in network packets need to be correct! I wrote this cheat sheet to translate hexadecimal to binary when I was studying for one of my cybersecurity exams.

You can also convert binary or hex to ASCII (the text you view in a text editor). Depending on the file type, the ASCII translation may look like gibberish. That’s because the file you’re dealing with is a binary file. The application that reads that file expects and processes binary files, not text files.

Open a binary file in a hex editor to view the contents as hexadecimal. Your programming language of choice usually has a way to inspect the bytes at the beginning of the file. You’ll want to ensure that the first few bytes match the expected file signature. In our PNG example, according to the file list above, that would be the following:

89 50 4E 47 0D 0A 1A 0A

If you want to see what this translates to in ASCII you can use your programming language of choice, or one of many online hex to ASCII converters. Just don’t post sensitive data into these online converters and try to use a trusted source.
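For the programming language route, here is a short Python sketch that decodes the PNG signature locally, so nothing gets pasted into a website. The variable names are only for illustration.

```python
signature = bytes.fromhex("89 50 4E 47 0D 0A 1A 0A")

# Show printable ASCII characters and replace everything else with a dot,
# similar to the text column in a hex editor.
readable = "".join(chr(b) if 32 <= b < 127 else "." for b in signature)

print(signature)  # b'\x89PNG\r\n\x1a\n'
print(readable)   # .PNG....
```

Only the three bytes 50 4E 47 are printable letters. The rest are control or non-ASCII bytes, which is why a binary file looks like gibberish in a text editor.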

I plugged the hexadecimal value above into this online converter and you can see the results:

Anywhere you use files you should check to see that they are the type you expect. Different files will have different values at the beginning to indicate the file type. You should be checking those file types when you allow file uploads on a web site or open a file using an application that expects a certain type of file.
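A signature check along those lines could look like the sketch below, which uses the PNG signature listed earlier. The function names has_png_signature and file_is_png are hypothetical, and a real upload handler would need more checks than this one.

```python
PNG_SIGNATURE = bytes.fromhex("89504E470D0A1A0A")


def has_png_signature(first_bytes: bytes) -> bool:
    """Return True if the data starts with the 8-byte PNG signature."""
    return first_bytes[: len(PNG_SIGNATURE)] == PNG_SIGNATURE


def file_is_png(path: str) -> bool:
    """Read only the first bytes of a file and compare the signature."""
    with open(path, "rb") as handle:
        return has_png_signature(handle.read(len(PNG_SIGNATURE)))


print(has_png_signature(PNG_SIGNATURE + b"rest of image"))  # True
print(has_png_signature(b"MZ" + b"\x00" * 6))               # False
```

Matching the signature only shows that the file starts like a PNG. The rest of the file still needs to be treated as untrusted.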

What could go wrong if you allow a website to upload the wrong type of image? Just today I read about a phishing attack that uses files with double extensions:

What that means is someone creates a file name with two extensions like:

image.png.exe

When you view this file on some operating system configurations that hide file types, the person about to open the file would only see the png, not the exe, and assume it’s an image. If your file upload process lets that through, when the person double clicks on the file thinking it is an image it would execute. Perhaps a file has an extension of png but gets interpreted as an executable by some other program because it bases decisions on the contents of the file, not the file name. Many variations on this theme exist.
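One hedged way to catch double extensions is to look at every suffix in the name instead of only the last one. In the Python sketch below, the allow list and the function name has_single_allowed_suffix are assumptions made up for the example.

```python
from pathlib import PurePath

ALLOWED_SUFFIXES = {".png", ".jpg", ".jpeg"}  # illustrative allow list


def has_single_allowed_suffix(filename: str) -> bool:
    """Accept names with exactly one extension that is on the allow list."""
    suffixes = [suffix.lower() for suffix in PurePath(filename).suffixes]
    return len(suffixes) == 1 and suffixes[0] in ALLOWED_SUFFIXES


print(PurePath("image.png.exe").suffixes)          # ['.png', '.exe']
print(has_single_allowed_suffix("image.png"))      # True
print(has_single_allowed_suffix("image.png.exe"))  # False
print(has_single_allowed_suffix("image.exe.png"))  # False
```

This version is deliberately strict. It also rejects harmless names with extra dots, such as my.vacation.png, so you may want to rename uploads on the server instead. A name check like this complements inspecting the file contents and does not replace it.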

Not only should you check file types you process, you should also add the proper information at the beginning of files you create to ensure they are processed correctly. If you’ve ever used bash scripts on Linux you may have seen this value:

#!/bin/sh

That’s called a shebang and it tells the system which program to use to execute the file. In that case, the system should use the Bourne shell or a compatible shell located at /bin/sh.

If you are writing a script for python3 you would put a similar line at the top of your file. The line below uses the env program to find the python3 on your PATH that should execute your program.

#!/usr/bin/env python3

When you create web pages you should specify the proper MIME type. You can read more about MIME types here:

What can go wrong if you don’t specify the proper content type for a web page? A browser may process a page incorrectly, allowing an attacker to insert executable code. Here’s an example of such a vulnerability where a JSON file doesn’t have the proper “application/json” content-type. The content could be interpreted as HTML code with JavaScript that gets executed by the browser.
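To make that concrete, here is a minimal sketch of a Python WSGI application that labels its JSON response explicitly. The function name json_app and the response body are made up. The X-Content-Type-Options header is an addition of mine that asks browsers not to guess a different type from the content.

```python
import json


def json_app(environ, start_response):
    """Minimal WSGI app that labels its JSON response explicitly."""
    body = json.dumps({"status": "ok"}).encode("utf-8")
    headers = [
        ("Content-Type", "application/json; charset=utf-8"),
        # Asks browsers not to guess a different type from the content.
        ("X-Content-Type-Options", "nosniff"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

To try it locally, you could serve it with the standard library's wsgiref.simple_server.make_server. Your framework of choice will have its own way to set the content type.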

Inspect the files you receive to ensure they are the proper type. Reject them if they are not. Ensure that the files you create properly identify their type so they get processed correctly.

Actions on Objects

When using objects you may leverage techniques such as serialization. When leveraging these techniques, ensure that you are using the correct data types and continue validating data as it moves around throughout your systems.

The Equifax breach is often described as resulting from a serialization vulnerability, although the Apache Struts flaw Equifax identified (CVE-2017-5638) was an injection through a crafted Content-Type header. Either way, serialization has been the source of many vulnerabilities over time. Serialization is complicated and you might choose not to use it at all, due to the complexity it adds to your application. If you are going to use it, you’ll want to do extensive review and testing. Validate that you only serialize expected data types. As you serialize and deserialize data, validate the different values as they are processed. Ensure the integrity of each object remains intact as you use it in your application and pass it from one system to another.
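As a small illustration of validating values while deserializing, the Python sketch below reads a JSON payload and checks each field before the rest of the application sees it. The order format with sku and quantity fields, the 1 to 1000 limit, and the function name load_order are all assumptions made up for the example.

```python
import json


def load_order(payload: str) -> dict:
    """Deserialize a JSON order and validate the type of every field."""
    data = json.loads(payload)  # invalid JSON raises a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("order must be a JSON object")
    if set(data) != {"sku", "quantity"}:
        raise ValueError("unexpected or missing fields")
    if not isinstance(data["sku"], str):
        raise ValueError("sku must be a string")
    quantity = data["quantity"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(quantity, int) or isinstance(quantity, bool):
        raise ValueError("quantity must be an integer")
    if not 1 <= quantity <= 1000:
        raise ValueError("quantity out of range")
    return {"sku": data["sku"], "quantity": quantity}


print(load_order('{"sku": "ABC-1", "quantity": 3}'))
```

JSON is used here because it can only describe plain data. Formats that can rebuild arbitrary objects, such as Python's pickle, are documented as unsafe for untrusted input.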

I might delve more into serialization in the book, but if you want to learn more check out the guidance on the OWASP website.

Just as with mismatched primitive data types, different software libraries, components, and APIs may process data differently. That can cause discrepancies that lead to security problems. Make sure the different systems you use handle data types the same way. Be aware of where they do not and incorporate the appropriate controls to prevent system compromise.

One common problem I see on penetration tests is called Request Smuggling or HTTP Desync attacks. I’ve explained that concept in prior presentations. This type of attack occurs because an HTTP request is processed differently by two systems. You can think of an HTTP request as a data type. There’s a specification defining how HTTP requests should be formulated and behave, but the variance in how different software programmers implement the specification can lead to security problems.
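One well-known source of that disagreement is a request that describes its body length in more than one way. The Python sketch below flags two such cases: a request with both Content-Length and Transfer-Encoding headers, and a request with conflicting Content-Length values. It is a simplified illustration under my own assumptions, with a made-up function name, and it is not a complete defense against request smuggling.

```python
def has_ambiguous_framing(headers: list) -> bool:
    """Flag header sets that two HTTP parsers might frame differently.

    headers is a list of (name, value) pairs as received, duplicates included.
    """
    names = [name.lower() for name, _ in headers]
    content_lengths = {
        value.strip() for name, value in headers
        if name.lower() == "content-length"
    }
    both_present = "content-length" in names and "transfer-encoding" in names
    conflicting_lengths = len(content_lengths) > 1
    return both_present or conflicting_lengths


suspicious = [("Content-Length", "4"), ("Transfer-Encoding", "chunked")]
print(has_ambiguous_framing(suspicious))                 # True
print(has_ambiguous_framing([("Content-Length", "4")]))  # False
```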

The easiest way to prevent that problem would be to use systems that process the data the same way, but that isn’t always possible. In that case, you’ll need to implement additional software or use security controls that minimize the risk. Be aware of all the ways that data types affect your security. Leverage your engineering skills to choose the right data types, validate them carefully, and test your systems thoroughly.

Next Steps

  • Consider whether you want to use a strongly-typed programming language when you choose a language for a software application.
  • If you choose to use a language that does not enforce data types, carefully consider inputs and outputs when using it.
  • Choose the correct data type for your application, taking into consideration known anomalous behavior such as rounding errors.
  • Validate inputs and outputs when you create your own custom data types.
  • Validate data types as they pass through your system to avoid type confusion, overflows, and other security problems related to mismatched data types or data processing.
  • Check file types when processing files. Reject invalid file types. Add the proper file identifiers to your files.
  • Understand and test the use of null values by your programming language and in your applications and data types.
  • Test applications with bounds checking to find errors related to mismatched or oversized inputs that cause incorrect outputs and undesirable behavior.

Teri Radichel

If you liked this story please clap and follow:

Medium: Teri Radichel or Email List: Teri Radichel
Twitter: @teriradichel or @2ndSightLab
Request services via LinkedIn: Teri Radichel or IANS Research

© 2nd Sight Lab 2022

____________________________________________

Want to learn more about Cybersecurity and Cloud Security? Check out: Cybersecurity for Executives in the Age of Cloud on Amazon

Need Cloud Security Training? 2nd Sight Lab Cloud Security Training

Is your cloud secure? Hire 2nd Sight Lab for a penetration test or security assessment.

Have a Cybersecurity or Cloud Security Question? Ask Teri Radichel by scheduling a call with IANS Research.

Cybersecurity & Cloud Security Resources by Teri Radichel: Cybersecurity and Cloud security classes, articles, white papers, presentations, and podcasts

Teri Radichel

Cloud Security Training and Penetration Testing | GSE, GSEC, GCIH, GCIA, GCPM, GCCC, GREM, GPEN, GXPN | AWS Hero | Infragard | IANS Faculty | 2ndSightLab.com