Ballerina taint checking guide

Dhananjaya Wicramasingha
Ballerina Swan Lake Tech Blog
Sep 26, 2019

Taint checking is a built-in security feature in Ballerina designed to help prevent malicious actors from executing arbitrary commands on remote Ballerina services. This covers attacks such as SQL injection and open redirect. The taint checker flags code that may allow malicious users to inject code and take control of the remote service in unintended ways. Taint checking is carried out at compile time by analyzing the taintedness of variables as their values propagate through the program, hence the name taint flow analysis.

Tainted data, or potentially tainted data, usually means data that is passed into a Ballerina program. It may be provided by a user or read from the disk or a database; either way, it can be used to manipulate program execution in an undesired way. We can also think of tainted data as unvalidated data. The problem is that validation is domain specific, so it is difficult to perform the validation as soon as the data is read. What taint checking does is follow this potentially tainted data through the program and, when it is passed to a security sensitive area of the program, notify the developer so that she can perform the validation most suitable to the situation at hand.

Let’s take a look at a trivial example from the taint checking section of the Ballerina by Example pages at https://ballerina.io.
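The listing itself lives on the Ballerina by Example page. A minimal sketch of the same situation, assuming the Ballerina 1.0 syntax current when this post was written and a hypothetical `runQuery` function standing in for the real database client, could look like this:

```ballerina
// Hypothetical stand-in for a database call; `runQuery` is not a library function.
// The @untainted annotation marks `sqlQuery` as security sensitive.
function runQuery(@untainted string sqlQuery) {
    // A real implementation would send `sqlQuery` to the database here.
}

public function main(string... args) {
    // `args[0]` comes from the command line, so it is treated as tainted.
    // Concatenating it into the query makes the whole string tainted,
    // so the compiler is expected to reject this call.
    runQuery("SELECT firstname FROM student WHERE registration_id = " + args[0]);
}
```

Because this is a stand-in, the file name and the line and column numbers in the reported error will differ from the ones quoted in this post.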

If you try to compile or run that Ballerina program, you will see that the compiler emits an error message:

error: .::taintcheckingBBE.bal:9:39: tainted value passed to untainted parameter ‘sqlQuery’

Or, if you use the Ballerina plugin for Visual Studio Code, the editor underlines the offending expression with a squiggly line.

OK, so what is an untainted parameter? And where did this tainted value come from?

You help Ballerina find tainted values using the Ballerina annotation mechanism. When you develop a function that may produce tainted values, you can annotate its return type with @tainted, and the compiler will assume that any value returned from this function is potentially tainted. Ballerina also has designated constructs that are considered to produce tainted values, such as arguments to the main function and values coming into a service over the network. Finally, Ballerina standard library functions are equipped with information about their taint behavior.

And where did this untainted parameter come from? For function parameters that are sensitive to tainted values, the developer can use the @untainted annotation to specify that no tainted value may be passed to that parameter.

The @untainted annotation on a function parameter instructs the taint checker to reject any program that tries to pass a potentially tainted value as an argument to the annotated parameter.

We can also specify that a function returns a value that can be tainted, by annotating the return type with @tainted. For instance, if a developer writes a function that reads file content from the disk, and that content can be tainted, the developer should annotate the function’s return type with @tainted.
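As a sketch of such a signature, assuming Ballerina 1.0 syntax: `readFileContent` is a hypothetical name, and the body is a placeholder rather than real file I/O, so that the example does not depend on any particular library API.

```ballerina
// Hypothetical helper: a real version would read the file at `path` from disk.
// The @tainted annotation on the return type tells the compiler that callers
// must treat the returned content as potentially tainted.
function readFileContent(string path) returns @tainted string {
    // Placeholder body so the sketch stays self-contained.
    return "content of " + path;
}
```

Any caller that forwards the result of `readFileContent` to an @untainted parameter would then have to validate it and mark it untainted first.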

In the same way, when the developer knows that the values returned from her function cannot be tainted, she can say so with the @untainted annotation on the return type.

Once you have sanitized a potentially tainted value, there are two ways to tell the Ballerina compiler that the value is no longer tainted.

The first is to use the type conversion operator with the @untainted annotation, <@untainted T>, to indicate that the expression on its right hand side is not tainted. The Ballerina compiler then treats the result as an untainted value. Here T is the target type of the conversion.

The @untainted indication can be given with or without the target type. If a type is specified, the expression both converts the value to that type and marks it as no longer tainted. When only the @untainted indication is needed and no type conversion is expected, leave out the target type and write <@untainted>.
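Both forms can be sketched as follows, assuming Ballerina 1.0 syntax. `lookupByName` is a hypothetical security sensitive function, and the validation step is only indicated by a comment.

```ballerina
// Hypothetical security sensitive function, for illustration only.
function lookupByName(@untainted string name) {
}

public function main(string... args) {
    // Domain specific validation of args[0] belongs here, before it is trusted.

    // Without a target type: the value keeps its type and is only marked untainted.
    lookupByName(<@untainted> args[0]);

    // With a target type: the value is cast to `string` and marked untainted.
    lookupByName(<@untainted string> args[0]);
}
```

The cast itself performs no validation. It only records the developer's claim that the value is safe, so it belongs right after the actual check.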

The second way to mark a value as untainted is the @untainted annotation on a function return type. For example, the return value of a function named ‘validateAndEscapeForPostgresQuery’ that declares its return type @untainted is considered untainted.
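A sketch of such a function, assuming Ballerina 1.0 syntax, might look like the following. The body of ‘validateAndEscapeForPostgresQuery’ is a placeholder that performs no real escaping, and `runQuery` is a hypothetical stand-in for a database call.

```ballerina
// The @untainted return annotation tells the compiler to trust the result,
// even though it is derived from a potentially tainted argument.
function validateAndEscapeForPostgresQuery(string input) returns @untainted string {
    // Placeholder: real validation and escaping must happen here.
    return input;
}

// Hypothetical security sensitive function, for illustration only.
function runQuery(@untainted string sqlQuery) {
}

public function main(string... args) {
    string id = validateAndEscapeForPostgresQuery(args[0]);
    // `id` is untainted, so the concatenated query is accepted.
    runQuery("SELECT firstname FROM student WHERE registration_id = '" + id + "'");
}
```

Putting the annotation on a dedicated function keeps the validation logic and the trust decision in one place, which is easier to review than casts scattered through the code.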

The rules and how to use them.

There are several predefined rules around tainted values, which are designed to help the compiler perform taint flow analysis.

  • Module level variables are not allowed to contain tainted values unless they are annotated @tainted.
  • Closure variables are not allowed to store tainted values.
  • You cannot pass a potentially tainted value to a parameter marked @untainted.
  • A structured value such as a map, list, record, or object is considered tainted if any member of that value is tainted.
  • When a function produces a tainted value without it being passed in as an argument, you must mark the return type @tainted or @untainted, depending on the context.
  • When a function taints a value passed into it as an argument, such as by adding a tainted value to a map passed in as an argument, you must annotate that parameter @tainted.

These are the basic rules; the compiler refuses to compile a program that violates any of them.

Let’s take a look at a few taint errors, from the most straightforward to the least.

“tainted value passed to untainted parameter ‘paramName’” is the hallmark taint error message, indicating that you are invoking a function by passing a potentially tainted value to a parameter annotated @untainted. Here you should probably clean up and validate the data to make sure the value is safe, and then mark it untainted using either method described previously.

“entry point parameter ‘paramName’ cannot be untainted”: you will see this error if you annotate with @untainted any parameter of an entry point function, that is, a parameter of the main function or of a resource function.

“tainted value passed to global variable ‘moduleVar’”: I think this one is pretty straightforward; you are storing a tainted value in a module level variable. And yes, I think the error message needs to be updated, as Ballerina does not have global variables but module level variables.

“method invocation taint global object ‘moduleLevelObjVal’”: this is reported when an object method invocation taints a module level variable.
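A sketch of this situation, assuming Ballerina 1.0 object syntax: the type name `Holder` and the field name are hypothetical, and the rejected call is left as a comment so that the rest of the sketch is expected to compile.

```ballerina
type Holder object {
    string value = "";

    // Stores the argument in the object, so a tainted argument taints the object.
    function setField(string newValue) {
        self.value = newValue;
    }
};

// Module level object value.
Holder G = new;

public function main(string... args) {
    // Expected to be rejected, since args[0] is tainted and would taint G:
    // G.setField(args[0]);

    // Accepted once the value has been validated and marked untainted.
    G.setField(<@untainted> args[0]);
}
```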

Take a module level object value G with a setField method that stores its argument in the object. Executing setField on G with a tainted argument would potentially taint G, and since module level variables may not store tainted values, this invocation is a taint error.

“tainted value passed to closure variable ‘closureVar’”

This is another Ballerina taint checking rule in action: closures are not allowed to store tainted values. For instance, if a closure captures a variable named test and assigns it from the program arguments, args[0] may be tainted, so the assignment ‘test = args[0]’ is invalid in Ballerina. To store the value in the closure, clean it up and mark it untainted.
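The closure case can be sketched as below, assuming Ballerina 1.0 anonymous function syntax. The name `store` is hypothetical, and the rejected assignment is kept as a comment next to the accepted form.

```ballerina
public function main(string... args) {
    string test = "";

    // The anonymous function captures `test` from the enclosing scope.
    var store = function () {
        // Expected to be rejected, because args[0] may be tainted:
        // test = args[0];

        // Accepted once the value has been validated and marked untainted.
        test = <@untainted> args[0];
    };
}
```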

You may also encounter a taint error stating “functions returning tainted value are required to annotate return signature @tainted: ‘funcName’”. This happens when a function returns a tainted value that originated inside the function and was not passed in as a parameter. In other words, the function returns a tainted value even when none of the values passed into it are tainted.
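A minimal sketch of a function that satisfies this rule, assuming Ballerina 1.0 syntax: `readUserInput` is a hypothetical source of tainted data with a placeholder body.

```ballerina
// Hypothetical source of tainted data; the body is a placeholder.
function readUserInput() returns @tainted string {
    return "placeholder";
}

// The tainted value originates inside `funcName`, not in its arguments,
// so the return type carries @tainted. Without the annotation, the compiler
// is expected to report the error quoted above.
function funcName() returns @tainted string {
    return readUserInput();
}
```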

This error has a sibling, which reads “argument to parameter ‘r’ is tainted by ‘foo’ hence require to annotate @tainted”. When a function taints a value passed in as an argument, even when none of the arguments to that function are tainted, the affected parameter must be annotated @tainted.
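The annotated form could be sketched like this, assuming Ballerina 1.0 syntax. The names `foo` and `r` follow the error message, and `readUserInput` is a hypothetical source of tainted data.

```ballerina
// Hypothetical source of tainted data; the body is a placeholder.
function readUserInput() returns @tainted string {
    return "placeholder";
}

// `foo` adds a tainted value to the map it receives, so the caller's map
// becomes tainted. Annotating `r` with @tainted documents that side effect.
function foo(@tainted map<string> r) {
    r["input"] = readUserInput();
}
```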

You may have noticed that when a function produces tainted values for the caller without them being passed in, Ballerina requires you to declare it with the @tainted annotation. The purpose of mandating this annotation is to document the taint behavior of the function in its signature, so that it is easier for users of the function to reason about it. When you read the function documentation, you can see which functions produce tainted values.

Let’s talk about the limitations of the Ballerina taint checker. For some people, one of the biggest weaknesses could be that there is no way to disable taint checking: you have to comply with the taint checking rules even when your domain does not really require taint checking, or when you know that all input data is safe. If this is the issue you are facing, my recommendation is to apply <@untainted> as early as possible. This has no runtime performance implications, as taint checking is entirely a compile time analysis.

Since the taint checker only knows how to analyze Ballerina code, when native/extern functions are used with Ballerina code it relies solely on annotations to figure out the taint behavior of the function. That is, the @tainted annotations on the parameters and return type of an extern function are the only way to inform the compiler about its taint behavior. The annotations do not describe how tainted values propagate through the function. Beyond obeying the annotations, the checker assumes that if any argument is tainted, the return value of that function is tainted. Situations where tainted data from one parameter taints arguments passed to other parameters are ignored in the analysis because of this limitation.

At the moment, taint flow analysis between workers is not fully supported; in complex interactions between workers, the taint flow may not be fully tracked.

If you have any queries or complaints, please reach the Ballerina dev team at ballerina-dev@googlegroups.com or ask on StackOverflow using the Ballerina tag.
