SOUR PICKLE : Insecure Deserialization with Python Pickle module.

abhishek kumar
4 min readMar 19, 2020

--

This is one of those findings which is very difficult to identify during security testing. The best chance we have is while doing Source Code Review of the vulnerable code base. Thus, today will we look from the source code perspective, how a vulnerable code looks and how to flag this.

This certainly is one of those critical bugs that you don’t wish to have in your system. This is the reason why it has earned its place in the current OWASP TOP 10 findings. Below are some to the attacks that can be performed if Insecure Deserialization is present.

i. RCE (Remote Code Execution) : This is by far the most popular and undoubtedly, a serious vulnerabiltiy.
ii. Data Extraction : HTTP cookies, HTML form parameters, API authentication tokens.
iii. DoS (Denial of Service) attack.

Before diving right into the vulnerability itself, let’s understand briefly what is Serialization, Deserialization and their use in real world.

Serialization : Serialization is a mechanism of converting the state of an object into a byte stream, JSON, YAML, XML. JSON and XML are two of the most commonly used serialization formats within web applications.
Deserialization : It is the reverse process where the byte stream is used to recreate the actual object in memory.

We serialize the data for the purpose of storing it into the memory(hard disk) or sending the data over a network.

Now that we have a fair bit of understanding of the terminologies involved, let’s just address the elephant in the room.

As the name of the vulnerability suggests, this vulnerability arises when the we deserialize the data that might be coming from a malicious source.
Python’s native module for serialization and deserialization data is called pickle. The process of serialization and deserialization is called “pickling” and “unpickling” respectively.
It is worth noting that python in its official documentation mentions this module “is not secure” and it should be handled with due care.

Let’s see a vulnerable code snippet and try to understand it.

############################################
#example of unsafe de-serialization
import pickle
import os

#1. creating a Evil class which has our malicious payload command (‘whoami’)
class MyEvilPickle(object):
def __reduce__(self):
return (os.system, (‘whoami’, ))

#2. serializing the malicious class
pickle_data = pickle.dumps(EvilPickle())
#storing the serialized output into a file in current directory
with open(“backup.data”, “wb”) as file:
file.write(pickle_data)

#3. reading the malicious serialized data and de-serializing it
with open(“backup.data”, “rb”) as file:
pickle_data = file.read()
my_data = pickle.loads(pickle_data)

###########################################

The above example has essentially 3 parts:

1. Creating the malicious serialized input : Here we are creating a class called “MyEvilPickle” which overrides “reduce” directive. We placed our malicious payload inside this “reduce” directive which is “whoami” command (LOL :p) in our case. You can replace is with a payload of your choice to get reverse connection or anything. We used “pickele.dumps” method for serializing the object.

2. Storing the supposedly “malicious” serialized object into a local file location.
Note: In case of actual application code, you have high chances that this serialized object is coming through HTTP request instead of reading it from a file.

3. Reading the supposedly “malicios” serialized object and Deserializing it using “pickle.loads” function.

It is step 3, where all the magic happens. Here we are deserializing un-validated user provided serialized input data. This is a classic case of “insecure de-serialization”.

Now you may wonder , what is this “reduce” directive?
Well, it plays a pivotal role in creating our malicious serialized data. Whenever you try to pickle (serialize) an object, there will be some properties that may not serialize well. For instance, an open file handle (open resource), pickle won’t know how to handle the object and will throw an error. You can tell the pickle module how to handle these types of objects natively within a class directly by overriding “reduce”. Here we used this same “reduce” directly for inserting our malicious command instead of telling pickle how to handle the de-serialization.

When we called the “pickle.loads” method, pickle first refers to this “reduce” directive in order to see if there is any instruction which needs to executed while deserializing the data. In our case it will execute our “so called possibly malicious” “whoami” command and we achieved RCE attack.

Now the obvious question comes, how do we remediate this issue ?
Well we can use a combination of below mentioned solutions:
i. Not accepting serialized objects from unknown or untrusted sources.
ii. Implementing integrity checks such as digital signatures on any serialized objects to prevent hostile object creation or data tampering.
iii. Running the deserialization code with limited access permissions.
iv. Using a firewall which can help detect insecure deserialization.

Now the last one, How to identify this while doing Source code Review/ Analysis ?
There are certain automated tools which might flag this “pickle.loads” method but you will have to track back the input source of the serialized data.
While doing this manually, we can follow a step by step approach as mentioned below :
1. look at the import section on top to establish, if pickle module is being used or not.
2. look for the “pickle.loads” method use, whether it is called on an input parameter or not.
3. Track to the source to see if the input source is user controlled and can possibly be tainted.

Yeah, this marks the end of this blog. I’ll continue adding more blogs on new interesting topics. “HASTA LUEGO, HASTA PRONTO!”.

References:
https://docs.python.org/3/library/pickle.html
https://owasp.org/www-project-top-ten/OWASP_Top_Ten_2017/Top_10-2017_A8-Insecure_Deserialization

--

--