Sensitivities in Static Code Analysis

Program analysis and its fight against imperfection

CodeThreat
Dec 1, 2020 · 7 min read

It’s impossible for security automation tools to provide provably correct answers to non-trivial security questions. Nevertheless, this limitation doesn’t undermine their value as great helpers in making our applications secure. Without automated tools, it’s quite hard to keep up with the speed of current development practices.

For example, we can’t build an algorithm that decides whether a target program is free of SQL Injection or not.

Imperfection is an inherent part of any automated security analysis solution, and sensitivities are mechanisms to increase the precision of these tools (at a cost).

Sure, there are good tools that check this interesting property, but never with 100% certainty. Please read our previous blog post for a proof. This is why tools make certain assumptions and merely approximate.

What kind of assumptions do automated security solutions make, and what limitations do they have? Sqlmap, one of the best dynamic tools for evaluating and exploiting SQL Injection, has many, such as the number of right parentheses to use or the sleep time to wait. Without these boundaries we could quickly have a halting problem on our hands.

How about static analysis (SAST)? These tools, too, have to make many rough approximations in order to avoid being expensive in both time and money. For instance, many SAST tools exclude reflection APIs so they don’t have to deal with really complex cases.

These approximations make SAST tools lose precision, and it’s important to know them to get an idea of how effective these tools are. Sensitivities are types of approximations that a tool may pursue in order to increase the precision of the issues it produces.

What is sensitivity?

What happens if a static analysis tool issues a SQL Injection finding whenever it sees SQL-related API syntax in a target system? Assume that whenever the tool sees a call to java.sql.Statement.executeQuery or System.Data.SqlClient.SqlCommand.ExecuteReader, it flags a SQL Injection without ever checking whether the SQL command is constructed with untrusted input or hard-coded.

Sure, that behavior will make the tool complete: it will never miss a vulnerability in this category of weaknesses. However, reporting every single SQL-related API usage will only make it useless because of the false alarms.

This is an extreme approximation, and it cripples issue quality immensely with false alarms. A security tool has to steer away from being useless by supporting and implementing various sensitivities in order to increase its precision.

A simple example

To make the case for sensitivities concrete, consider the following code. The GetUserInput method returns a string provided by an end-user of our application, which means it cannot be trusted. If that input ends up as an argument to the Execute method, which runs its argument as an operating system command, we have a security risk.

a = 3; b = 5;
input = String.Empty;
if (a < b)
{
input = GetUserInput();
}
else
{
input = String.Empty;
}
Execute(input);

When we analyze the code manually, we can see that input can never be set to an untrusted value, since a is smaller than b. If a static code analyzer reports an issue here, it’s naturally hard for a developer to accept the validity of this issue. That is a false alarm.

This is a simple example, since the values of a and b are hard-coded. What if they had dynamic values? That’s where we start to comprehend the real challenge and see that the hardness of different problems varies.

Returning to our specific case, any tool that manages to evaluate the above conditional and suppress the alarm is superior to one that blatantly issues a vulnerability.

Types of sensitivities

We’ll try to list some of the sensitivities defined in static code analysis in order to understand the various avenues for increasing the precision of the results. We will also briefly comment on the cost of each.

The first one is an internal property of Data Flow Analysis (DFA): flow sensitivity. The simple-looking code below shows a very basic taint flow of the input we get from the end-user via GetUserInput to the dangerous Execute method. Since we assign an empty string to the tainted variable, input, before feeding it into Execute, there’s no vulnerability here.

input = GetUserInput();
input = String.Empty;
Execute(input);

This is manual analysis. A flow-sensitive automatic analysis will not issue a vulnerability here either, since the order of statements is taken into account with strong updates.

A flow-insensitive automatic analysis, however, will definitely issue a vulnerability here, which would then be a rather silly-looking false alarm.
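The difference can be sketched in a few lines. Below is a minimal Python toy, not any real tool’s algorithm: statements are hypothetical tuples standing in for the three lines above, and taint is a simple set of variable names.

```python
# Toy taint analysis over a tiny three-statement program.
# Statement forms (made up for this sketch):
#   ("assign_source", var)  ->  var = GetUserInput()
#   ("assign_const",  var)  ->  var = String.Empty
#   ("sink",          var)  ->  Execute(var)

def flow_sensitive(stmts):
    """Walk statements in order; assignments strongly update taint."""
    tainted, alarms = set(), []
    for op, var in stmts:
        if op == "assign_source":
            tainted.add(var)          # var now holds untrusted data
        elif op == "assign_const":
            tainted.discard(var)      # strong update: taint is killed
        elif op == "sink" and var in tainted:
            alarms.append(var)
    return alarms

def flow_insensitive(stmts):
    """Ignore statement order: a variable is tainted if ANY statement taints it."""
    tainted = {var for op, var in stmts if op == "assign_source"}
    return [var for op, var in stmts if op == "sink" and var in tainted]

program = [
    ("assign_source", "input"),   # input = GetUserInput();
    ("assign_const", "input"),    # input = String.Empty;
    ("sink", "input"),            # Execute(input);
]

print(flow_sensitive(program))    # [] - no alarm, taint was overwritten
print(flow_insensitive(program))  # ['input'] - false alarm
```

The flow-sensitive version kills the taint at the overwrite and stays quiet; the flow-insensitive version merges all assignments and raises the false alarm described above.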

How about spicing things up a little with the code below? As a static analysis tool, if we don’t evaluate the conditionals (which happens to be the second sensitivity we will talk about below), what order of statements do we have to take into account, and should we issue a vulnerability or not? I’ll leave this to you as a simple exercise.

input = String.Empty;
if (shouldAllow)
{
input = GetUserInput();
}
else
{
input = String.Empty;
}
Execute(input);

While it’s expensive to run a flow-sensitive analysis, it naturally increases the precision.

Path or predicate sensitivity is about taking conditionals into account during the analysis. Here’s another simple example to illustrate this mechanism. The code below doesn’t have a feasible path that leads the tainted input from GetUserInput to Execute. It’s easy to see manually, as the first if block contains a statement that makes sure a < b cannot be true at the second if conditional.

a = 3; b = 5;
input = String.Empty;
if (a < b)
{
b = a - 1;
input = GetUserInput();
}
else
{
input = String.Empty;
}
if (a < b)
Execute(input);

So, a static analysis tool that issues a vulnerability here produces a false alarm and is considered path-insensitive, because it doesn’t take predicates into account.

Surely, a path-sensitive analysis is superior; however, its cost is high and sometimes prohibitive.
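For this particular example, a path-sensitive verdict can be sketched by enumerating branch choices and discarding infeasible ones. This is a hand-rolled Python toy of the a < b program above, using the hard-coded constants; real tools use constant propagation or SMT solvers rather than this brute-force enumeration.

```python
# Toy path-sensitive check of the two-conditional example.
# We "execute" each of the two branch choices and keep only feasible paths.

def feasible_path_alarms():
    """Return alarms from paths whose predicates can actually hold."""
    alarms = []
    a, b = 3, 5                      # a = 3; b = 5;
    for take_then_branch in (True, False):
        a1, b1 = a, b
        if take_then_branch:
            if not (a1 < b1):        # predicate must hold to enter the then-branch
                continue             # infeasible path, skip it
            b1 = a1 - 1              # b = a - 1;
            tainted = True           # input = GetUserInput();
        else:
            if a1 < b1:              # else-branch infeasible while a < b holds
                continue
            tainted = False          # input = String.Empty;
        if a1 < b1 and tainted:      # second conditional guards Execute(input)
            alarms.append("tainted input reaches Execute")
    return alarms

print(feasible_path_alarms())        # [] - no feasible tainted path reaches Execute
```

A path-insensitive analysis would instead merge both branches, see that input may be tainted somewhere, and flag the Execute call anyway.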

While it can be applied to any flow analysis, context sensitivity is mostly attributed to an analysis called points-to analysis. In short, since references play an important role in programming languages, we have to calculate which references point to which memory areas to increase our issue quality, especially to decrease the number of missing alarms.

Let’s look at this simple code block. It gets untrusted input from the GetUserInput method and then produces two separate variables, output1 and output2, by calling the same Identity method with different arguments: input and String.Empty, respectively. The Identity method just returns the argument it gets, without any modification.

It’s easy to see that output1 is tainted and output2 is not. So, there’s no security problem in the dangerous Execute method having output2 as the argument.

/* 
* Name: Context sensitivity
* Description: Different calling contexts are taken into account separately
*/
input = GetUserInput();
string output1 = Identity(input);
string output2 = Identity(String.Empty);
Execute(output2);

string Identity(string id)
{
return id;
}

If the automated analysis is context-sensitive, it can differentiate the two call sites of Identity as two separate calls with two separate input values. This differentiation prevents the analysis from issuing a false alarm here. Otherwise, the analysis would merge all calls to Identity in the code block, and its return value could be considered tainted for output2, too.

For static analysis it’s really expensive to differentiate call sites in its calculations, but this clearly increases the quality of the issued security alarms.
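The merging effect is easy to demonstrate. In this minimal Python sketch (my own toy model, not a real analyzer), Identity’s return taint equals its argument taint, and the two call sites from the code block above are recorded as (result variable, argument-is-tainted) pairs.

```python
# Toy comparison: context-insensitive vs context-sensitive handling of Identity.
# Each entry is (result variable, whether the argument is tainted).
calls = [
    ("output1", True),    # Identity(input)        - argument is tainted
    ("output2", False),   # Identity(String.Empty) - argument is clean
]

def context_insensitive(calls):
    """Merge every call site: Identity's return is tainted if ANY caller passes taint."""
    merged_return = any(arg_tainted for _, arg_tainted in calls)
    return {result: merged_return for result, _ in calls}

def context_sensitive(calls):
    """Analyze each call site separately, so taint doesn't leak between them."""
    return {result: arg_tainted for result, arg_tainted in calls}

print(context_insensitive(calls))  # {'output1': True, 'output2': True}  - false alarm on output2
print(context_sensitive(calls))    # {'output1': True, 'output2': False} - precise
```

With the call sites merged, the clean output2 inherits the taint of output1 and the Execute call gets flagged; keeping the contexts apart avoids that.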

Yet another sensitivity in static code analysis is field or index sensitivity. Since we have aggregate data types, it becomes important to take different fields of an object into account separately to increase the precision.

Let’s check out the code below. The question is: “Do we have a vulnerability leading to the Execute method?” The answer is simply no, because the field of the foo object that is given to Execute is not tainted. While it’s the same object, the ParamA field is tainted but ParamB is not.

Foo foo = new Foo();
foo.ParamA = GetUserInput();
foo.ParamB = String.Empty;

Execute(foo.ParamB);
class Foo
{
public string ParamA;
public string ParamB;
}

This differentiation is called field sensitivity and is easy to understand. It’s expensive for a data flow analysis to keep track of all these different members, considering all the nested objects, but it also increases the precision of the reported alarms.
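Here’s the Foo example rendered as a minimal Python toy of the two abstractions; the per-object versus per-field bookkeeping is my own illustrative simplification, not any tool’s actual representation.

```python
# Toy field-insensitive vs field-sensitive taint tracking for the Foo example.

def field_insensitive():
    """One taint bit per object: any tainted field taints the whole object."""
    foo_tainted = False
    foo_tainted = foo_tainted or True    # foo.ParamA = GetUserInput();
    foo_tainted = foo_tainted or False   # foo.ParamB = String.Empty;
    return foo_tainted                   # Execute(foo.ParamB) checks the whole object

def field_sensitive():
    """One taint bit per (object, field) pair."""
    fields = {}
    fields["ParamA"] = True              # foo.ParamA = GetUserInput();
    fields["ParamB"] = False             # foo.ParamB = String.Empty;
    return fields["ParamB"]              # Execute(foo.ParamB) checks only ParamB

print(field_insensitive())  # True  - false alarm on foo.ParamB
print(field_sensitive())    # False - no alarm
```

Collapsing the object into a single taint bit is cheap but smears ParamA’s taint onto ParamB; tracking fields separately keeps the clean field clean.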

The same idea is depicted in the following code. This time we use arrays instead of classes, but the idea is the same. I’ll leave it as an exercise for you to deduce why it’s hard and expensive not to issue a vulnerability here.

string [] myArray = new string[len];
myArray[0] = GetUserInput();
myArray[1] = String.Empty;
Execute(myArray[1]);

Similar to context sensitivity, object sensitivity dictates that different objects of the same class be taken into account separately, even for the same field.

The code below contains two instances of the Foo class: foo1 and foo2. One of them contains a tainted value and the other doesn’t. Since the Execute method is fed with a field of the non-tainted one, foo2, we don’t have a vulnerability here.

Foo foo1 = new Foo();
foo1.ParamA = GetUserInput();
Foo foo2 = new Foo();
foo2.ParamA = String.Empty;
Execute(foo2.ParamA);

class Foo
{
public string ParamA;
}

An object-sensitive analysis is obviously expensive but more precise.
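One more minimal Python toy, assuming the usual abstraction where each allocation site gets its own abstract object; merging all Foo instances into one abstract object is the object-insensitive extreme.

```python
# Toy object-insensitive vs object-sensitive tracking of Foo.ParamA.

def object_insensitive():
    """All Foo instances share one abstract object; taint on foo1 bleeds into foo2."""
    param_a_tainted = False
    param_a_tainted = param_a_tainted or True    # foo1.ParamA = GetUserInput();
    param_a_tainted = param_a_tainted or False   # foo2.ParamA = String.Empty;
    return param_a_tainted                       # Execute(foo2.ParamA)

def object_sensitive():
    """Each allocation site (new Foo()) gets its own abstract object."""
    heap = {"foo1": {"ParamA": False}, "foo2": {"ParamA": False}}
    heap["foo1"]["ParamA"] = True                # foo1.ParamA = GetUserInput();
    heap["foo2"]["ParamA"] = False               # foo2.ParamA = String.Empty;
    return heap["foo2"]["ParamA"]                # Execute(foo2.ParamA)

print(object_insensitive())  # True  - false alarm on foo2.ParamA
print(object_sensitive())    # False - no alarm
```

With one shared abstract Foo, the taint written through foo1 is read back through foo2 and the clean call gets flagged; separating the two allocations avoids the false alarm.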

Conclusion

The precision of a static analysis tool is just like that of a medical test. Think about COVID testing kits. Test kit makers aim to be as precise as possible with their results. Otherwise, a kit may report COVID-positive for people who don’t have the virus (false alarms), and negative for people who actually have it (missing alarms).

Precision is about keeping false alarms as low as possible, but this comes at a price. Producing such a precise test kit may take a lot of time and energy; it’s surely worth it, yet such tests may also be difficult to apply or need weeks to report a result.

The same goes for security static code analysis. Increasing precision is a trade-off between a complex, useful tool that takes a lot of time and resources to execute and a simpler but less useful tool that takes less time.

The Startup
