Pickling Perils in Python

Amit Nigam
6 min read · Jan 24, 2023


In this article we will examine in detail what pickling (serialization) is in Python, and build on that understanding to show how it can lead to RCE, or Remote Code Execution, when untrusted input is un-pickled without adequate validation. We will wind up our discussion with a comparison between Python Pickle & JSON and also look at how to use the pickle library securely.

What is a Python Pickle?

Pickle is a module in Python that is used to serialize and de-serialize data. Typically this data is some kind of Python object structure, e.g., a string or an instance of a class. Serialization essentially means that we convert the object into a byte stream with the intention of storing it somewhere (in a file) or transmitting it over the network. In either case, this byte stream should be able to spring back into the original object upon de-serialization.

Let us look at an example with a String Data Type.

Pickling a String Object

Basic Example of Python Pickle using String Data Type

In this quick example, we accept a user input (a string) on line 4, then pickle it and Base64 encode it on line 5. We print the pickled object on the screen on line 6. Finally, we un-pickle the object and get back the original string. (Line numbers refer to the script listed at the end of this article.) The following snippet captures the output of this basic program.

Output of the Basic Example script

As can be seen from the output, we were able to pickle the string “Pickling Perils in Python” and retrieve it by performing the un-pickling operation on it. An interesting point to observe is that a string starting with gASV may well be a Base64 encoded pickled object: those four characters are the Base64 encoding of the bytes that begin a typical protocol 4 pickle. So if a cookie value starts with gASV, it might be carrying pickled data.
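The gASV observation can be checked with a short sketch. It assumes pickle protocol 4 is requested explicitly (the default protocol depends on the Python version) and that the pickled value is large enough for pickle to emit a frame header; the variable names are illustrative only.

```python
import base64
import pickle

text = "Pickling Perils in Python"

# Ask for protocol 4 explicitly so the result does not depend on the
# interpreter's default protocol.
raw = pickle.dumps(text, protocol=4)
encoded = base64.b64encode(raw)

print(raw[:3])      # b'\x80\x04\x95' (PROTO opcode, version 4, FRAME opcode)
print(encoded[:4])  # b'gASV'

# Round trip: decode the Base64 text and un-pickle it again.
restored = pickle.loads(base64.b64decode(encoded))
print(restored == text)  # True
```

Other protocols give a different prefix (protocol 5 pickles of this kind start with gAWV, for instance), so gASV is a useful hint rather than a reliable signature.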

We can pickle not just basic data types like strings but in fact most Python objects. Let us extend our existing script to include pickling of an instance of a Python class.

Pickling a Python Class

Lines 11–30 extend the basic Pickling construct to a Class Object

Here is a quick explanation of what it is that we are trying to do.

a. In Lines 13–19 we define a new Class called Dog that contains 2 properties (name & breed) and a Method (introduceSelf)

b. On line 21 we create an Object of type Dog and store it in variable a. The Object has a name “Tommy” and a breed of “Labrador”

c. On line 22 we test the object created by calling the introduceSelf method off of the object

d. In lines 25–27 we pickle the Dog object and also base64 encode it. We then print it out on the console. We also un-pickle the object and store it inside another variable “outClassPickle”.

e. Finally on lines 29–30, we print out the name property from the de-serialized variable “outClassPickle” and also access the method available to the object i.e., introduceSelf

Here is the output of the above extended script

Output from the extended pickling script developed above

This demonstrates how we can not only pickle strings but also pickle python objects and then re-construct them.

This looks good, so where is the danger?

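The danger lies in a special method called __reduce__. It is not a function of the pickle module itself; any class can define it, and pickle consults it to learn how the object should be rebuilt. If an attacker is able to control the byte stream that gets un-pickled, he/she can craft it so that un-pickling leads to code execution. The __reduce__ method can return a tuple whose first two items are mandatory: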

a. Callable Object that is called to re-create the pickled object

b. Another Tuple containing arguments for the callable object.

We can provide a callable object like os.system and put system commands in the second item (the arguments tuple); when the data is un-pickled, pickle calls os.system with those arguments and does the rest of the work for us. I know this might not make sense with just the theory, so let me show a snippet of code that will help.

Example Code to achieve RCE using Python Pickle

As noted in its comments, the following code declares a class RCE and implements the __reduce__ method in it. The __reduce__ method returns a tuple. Its first item is a callable object, in this case os.system, and its second item is another tuple containing the arguments that are passed to that callable when the object is un-pickled. In this example we pass the command entered by the user directly to os.system to achieve code execution.
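A minimal sketch of the idea described above. The class name RCE follows the text, but this is an illustrative reconstruction rather than the exact script: it assumes a fixed, harmless echo command in place of user input, and the variable names are made up for the example.

```python
import os
import pickle


class RCE:
    """An object whose pickle instructs the un-pickler to call os.system."""

    def __reduce__(self):
        # Harmless placeholder command (an assumption for this sketch);
        # an attacker would put an arbitrary shell command here.
        command = "echo code ran during un-pickling"
        # (callable, args): pickle.loads will call os.system(command).
        return (os.system, (command,))


# What the attacker sends: the pickle only references os.system and the
# command string, not the RCE class itself.
payload = pickle.dumps(RCE())

# What the victim runs: simply loading the data executes the command.
# The "re-created object" is whatever os.system returned (its exit status).
exit_status = pickle.loads(payload)
print(exit_status)
```

Note that the victim never has to call a method on the result; the command runs inside pickle.loads itself.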

The following output will make things clearer.

Achieved Code Execution

We can up the game by providing a command that will collect the privileges of the current user.

Code Execution

From here one can work towards getting a complete reverse shell. So we see that we were able to get full RCE by abusing the way Python’s Pickle library honors an object’s __reduce__ method during un-pickling.

To summarize: if an attacker is able to spot a pickled object and can then control the data that gets un-pickled, he/she can submit custom serialized objects that will give them RCE on the server.

Comparing JSON & Python Pickle

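JSON is definitely the more secure choice when we have to accept untrusted data or work with foreign data. However, it is limited in what it can serialize: it only knows a handful of basic types, so some built-in Python data types (sets and bytes, for example) cannot be serialized directly, and tuples come back as lists. It also can’t serialize custom classes out of the box, which Pickle can do because it is native to Python.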
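These limits are easy to see in a short sketch. The Dog class mirrors the one used earlier in this article; the explicit dictionary conversion at the end is just one possible workaround, shown here as an assumption rather than a recommendation.

```python
import json
import pickle


class Dog:
    def __init__(self, name, breed):
        self.name = name
        self.breed = breed


dog = Dog("Tommy", "Labrador")

# A tuple survives a pickle round trip but comes back from JSON as a list.
via_pickle = pickle.loads(pickle.dumps((1, 2)))
via_json = json.loads(json.dumps((1, 2)))
print(via_pickle, via_json)  # (1, 2) [1, 2]

# JSON refuses objects it has no representation for.
try:
    json.dumps(dog)
except TypeError as exc:
    json_error = str(exc)
    print(json_error)

# Workaround: serialize plain data only and rebuild the object explicitly.
dog_json = json.dumps({"name": dog.name, "breed": dog.breed})
restored = Dog(**json.loads(dog_json))
print(restored.name, restored.breed)  # Tommy Labrador
```

The workaround costs a few extra lines, but loading the JSON text only ever produces plain data; no callable chosen by the sender gets invoked.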

So essentially the choice comes down to the use case. If there is a need to serialize complex Python objects, and the pickling and un-pickling happen in a controlled environment with high integrity, then Python Pickle is a reasonable choice. Otherwise, it is better to use JSON and perform text serialization.
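One way to approach the "high integrity" condition is to sign pickled data and verify the signature before un-pickling. The sketch below uses the standard hmac module; the key, the function names and the tag-then-payload layout are all assumptions made for illustration. It only protects data that our own application produced and signed, and it relies on the key staying secret.

```python
import hashlib
import hmac
import pickle

# Hypothetical key for illustration; a real one must be random and kept secret.
SECRET_KEY = b"replace-with-a-real-secret"
TAG_SIZE = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def sign_and_dump(obj) -> bytes:
    """Pickle obj and prepend an HMAC-SHA256 tag computed over the pickle."""
    payload = pickle.dumps(obj)
    tag = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return tag + payload


def verify_and_load(blob: bytes):
    """Un-pickle blob only if its tag matches; otherwise raise ValueError."""
    tag, payload = blob[:TAG_SIZE], blob[TAG_SIZE:]
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    # compare_digest performs a constant-time comparison.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("signature mismatch; refusing to un-pickle")
    return pickle.loads(payload)


blob = sign_and_dump({"name": "Tommy", "breed": "Labrador"})
print(verify_and_load(blob))  # {'name': 'Tommy', 'breed': 'Labrador'}
```

A payload that was modified or crafted by someone without the key fails the check, so pickle.loads is never reached for it.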

Conclusion

I hope the article was useful and helped clear up a few points about getting RCE through Python’s serialization library (Pickle). I have provided the script I used to demonstrate pickling below. Cheers!

import pickle
import base64

x = input('Enter any string for pickling: ') # Accept a String input
pickled = base64.b64encode(pickle.dumps(x)) # Base64 encoding the Pickled Object
print(f'Pickled Object is {pickled}') # Printing the Base64 encoded object

# Printing back the original De-Serialized or unpickled STRING
print('Unpickled Object is: \n',pickle.loads(base64.b64decode(pickled)))

print('\n\n[+] ---------------------More Complex Pickle Object -----------------------------------\n')
# Defining a Dog Class with a couple of properties & one method
class Dog:
    def __init__(self,name,breed) -> None: # Class requires a Name & Breed
        self.name = name
        self.breed = breed

    def introduceSelf(self): # A Method to introduce itself
        print(f'My Name is {self.name} and my breed is {self.breed}')

a = Dog('Tommy', 'Labrador') # Instantiating class by defining a Dog Object => a
a.introduceSelf() # To show that new Dog object can introduce itself

# Pickling the Dog Object
classPickled = base64.b64encode(pickle.dumps(a))
print(classPickled) # Print the pickled & base64 encoded object onto the screen
outClassPickle = pickle.loads(base64.b64decode(classPickled)) # Un-pickle the object & store in another variable

print(outClassPickle.name) # Access the name property
outClassPickle.introduceSelf() # Access the introduce Method for the pickled object


Amit Nigam

I like studying up about Cyber Security, Programming & Hacking. An Engineer & MBA by qualification. Want to give back to this wonderful community !!