Hacking Python Applications

And how attackers exploit common programming pitfalls to gain control

Vickie Li
Nov 15, 2019 · 7 min read

How secure an application is has very little to do with language choice. You can program securely in languages that are prone to vulnerabilities, and do very insecure things in languages designed to be secure.

However, there are features that developers should be on the lookout for as potential weaknesses in every language. Today, let’s talk about a few dangerous features that could be exploited by attackers in Python.

Exploiting dangerous functions: eval(), exec() and input()

Dangerous functions in Python like eval(), exec() and input() can be used to achieve authentication bypass and even code injection.

The eval() function in Python takes a string and evaluates it as a Python expression. For example, eval('1+1') would return 2.

Since eval() can execute arbitrary code on the system, it should never be used on any type of unsanitized user input. Let’s look at a vulnerable application as an example. Suppose a calculator application uses a JSON API to accept user input and passes that input to eval().
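A minimal sketch of such a calculator, assuming a hypothetical calculate() handler and a hypothetical "user_input" JSON field (neither name comes from the original application), could look like this:

```python
import json


def calculate(request_body):
    """Evaluate the expression in a JSON request body and report the result."""
    data = json.loads(request_body)
    # VULNERABLE: the user-controlled string is executed as Python code.
    result = eval(data["user_input"])
    return "The result is {}".format(result)


print(calculate('{"user_input": "1+2"}'))
```

The only thing standing between the request body and the interpreter here is json.loads(), which does nothing to restrict what the string contains.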

When operating as expected, an input expression such as 1+2 would cause the program to print “The result is 3”.

But since eval() would take user-provided input and execute it as Python code, a hacker could provide the application with something more malicious instead: an expression that imports the os module, calls os.system() and spawns a reverse shell back to the IP 10.0.0.1 on port 8080.
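To see the mechanism without running anything harmful, here is a sketch that uses a benign stand-in for the payload. The calculate() handler and the "user_input" field are hypothetical names, and os.getpid() takes the place of the os.system() call an attacker would use:

```python
import json
import os


def calculate(request_body):
    data = json.loads(request_body)
    # VULNERABLE: eval() runs whatever expression the client sends.
    return "The result is {}".format(eval(data["user_input"]))


# The expression imports a module and calls a function of the sender's
# choosing. os.getpid() is harmless; os.system(...) would not be.
payload = json.dumps({"user_input": "__import__('os').getpid()"})
print(calculate(payload))
```

The printed value is the process ID of the server process, which shows that the expression ran inside the application with the application's privileges.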

exec() is similar to eval(): both execute Python code from a given string, although exec() accepts whole statements rather than a single expression. A program that passes user input to exec() could be exploited in the same way as above.
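As a rough sketch, assume a hypothetical run_snippet() helper that executes a user-supplied snippet and reads back a variable named result (both names are made up for illustration). The injected code again uses the harmless os.getpid():

```python
import os


def run_snippet(user_code):
    """Run a user-supplied snippet and return what it stored in `result`."""
    namespace = {}
    # VULNERABLE: exec() runs full statements, including imports.
    exec(user_code, namespace)
    return namespace.get("result")


print(run_snippet("result = 1 + 2"))

# A benign stand-in for an attack: the snippet imports os on its own.
leaked_pid = run_snippet("import os\nresult = os.getpid()")
print(leaked_pid == os.getpid())
```

Passing a fresh dictionary as the namespace does not sandbox the snippet, since the injected code can still import any module it likes.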

In Python 2, there are two built-in functions for taking user input: input() and raw_input(). In Python 3, there is only one: input().

The difference between input() and raw_input() in Python 2 is that raw_input() returns whatever the user typed as a string, whereas input() evaluates what was typed as a Python expression (it is equivalent to eval(raw_input())) and returns the resulting value.

So what’s the issue with this? Using Python 2’s input() function could mean that attackers are free to pass in variable names, function names and other data types, leading to authentication bypass and other unexpected outcomes.

For example, suppose a program stores the expected password in a variable named user_pass, reads the submitted password with input(), and compares the two for access control.

The attacker could simply pass in the user_pass variable name as input and the check would pass, since the program would interpret the user input as a variable. The Python conditional would then compare user_pass with itself, which is always true.

The attacker could even pass in get_user_pass("admin") and get the same result, as the user input would be interpreted as a function call.
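Python 2 code no longer runs on a current interpreter, so the sketch below imitates the behavior in Python 3 by calling eval() on the typed text, which is what Python 2's input() did. The credential store and the py2_style_login() function are hypothetical; only the names user_pass and get_user_pass come from the scenario above:

```python
def get_user_pass(username):
    # Hypothetical credential lookup; a real application would query a database.
    return {"admin": "correct horse battery staple"}[username]


def py2_style_login(typed_text):
    user_pass = get_user_pass("admin")
    # Python 2's input() behaved like eval(raw_input()): the typed text is
    # evaluated as an expression that can see the surrounding names.
    supplied = eval(typed_text)
    return supplied == user_pass


print(py2_style_login('"guess"'))                 # an ordinary wrong password
print(py2_style_login("user_pass"))               # the variable name itself
print(py2_style_login('get_user_pass("admin")'))  # a function call
```

Neither of the last two inputs requires the attacker to know the password; knowing or guessing a name in scope is enough.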

Because of these security concerns, if using Python 2, raw_input() should be used instead of input().

This vulnerability is eliminated in Python 3. The only input function in Python 3, input(), behaves in the same way as raw_input() in Python 2, and will always convert user input to a string.

Exploiting string formatting

Another dangerous Python function is str.format(). If an application uses str.format() on a user-controlled format string, an attacker might be able to access arbitrary data of the program via crafted format strings. This is an easy-to-exploit and severe vulnerability that leads to authentication bypass and leaks of confidential data.

Python 3 introduced a new way of formatting text, str.format(), that is much more powerful and flexible than the old format strings using % operators (it was also backported to Python 2.6). One of the features of the new string formatting functionality is that a format string can access the attributes of the objects passed to it. For instance, a replacement field such as {0.name} reads the name attribute of the first argument.
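A small illustration of attribute and item access inside replacement fields, using a made-up Person class and made-up values:

```python
class Person:
    def __init__(self, name, title):
        self.name = name
        self.title = title


person = Person("Vickie", "security researcher")

# Attribute access on a positional argument.
print("{0.name} works as a {0.title}".format(person))

# Attribute access on a keyword argument, and item access on a dict.
print("{p.name} lives in {place[city]}".format(p=person, place={"city": "Seattle"}))
```

Note that the lookups happen inside the format string itself, so whoever writes the format string decides which attributes are read.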

Imagine there is a program that allows users to format their own nametag using str.format(). The user supplies a format string that refers to fields of a person object, such as the person's name, and the program fills those fields in and outputs the finished nametag.
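One way such a nametag program might be written is sketched below. The render_nametag() function, the Person class and the sample template are all assumptions made for illustration:

```python
class Person:
    def __init__(self, name):
        self.name = name


def render_nametag(format_string, person):
    # VULNERABLE when format_string comes straight from the user:
    # the user decides which attributes of `person` are read.
    return format_string.format(person=person)


new_person = Person("Vickie")
print(render_nametag("Hi, my name is {person.name}!", new_person))
```

With a well-behaved template this prints a greeting containing the name, which is the intended use.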

The issue arises when users control the format string directly and a Python object is passed into it. Python objects and their methods carry special attributes that a format string can traverse, and these attributes can be used to leak a variety of program data. For example, the __globals__ attribute of a function or method gives access to the dictionary that stores the global variables of its module.

If the application keeps an API key in a global variable, a format string that walks from the object to one of its methods and on to __globals__ would return “771df488714111d39138eb60df756e6b”, thus leaking the API key that the application uses.
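The leak can be reproduced with a short sketch. The global name API_KEY, the Person class and render_nametag() are hypothetical; the key value is the one quoted above:

```python
API_KEY = "771df488714111d39138eb60df756e6b"  # hypothetical global holding the secret


class Person:
    def __init__(self, name):
        self.name = name


def render_nametag(format_string, person):
    # VULNERABLE: the format string is attacker-controlled.
    return format_string.format(person=person)


new_person = Person("Vickie")

# Walk from the object to its __init__ method, then to the globals of the
# module that defined it, then pick out one entry by key.
malicious = "{person.__init__.__globals__[API_KEY]}"
print(render_nametag(malicious, new_person))
```

Keeping the format string constant and passing user data only as arguments, for example "Hi, my name is {}".format(user_text), avoids this, because user data is then never parsed for replacement fields.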

Exploiting Pickle deserialization

Serialization is a process during which an object in a programming language (say, a Python object) is converted into a format that can be saved to a database or transferred over a network. Deserialization is the opposite: the serialized object is read from a file or the network and converted back into an object.


In Python, serialization is commonly done with the pickle module. Calling pickle.dumps(new_person) returns the pickled representation of new_person as a byte string (this process is called pickling).

Calling pickle.loads(pickled_object) returns the original Python object for the application to operate on (this process is called unpickling).
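A round trip might look like the following sketch, where the Person class and its name field are assumed for illustration:

```python
import pickle


class Person:
    def __init__(self, name):
        self.name = name


new_person = Person("Vickie")

pickled_object = pickle.dumps(new_person)  # pickling
print(pickled_object)  # a bytes object; the exact bytes depend on the pickle protocol

restored = pickle.loads(pickled_object)    # unpickling
print(restored.name)
```

The unpickled object is a new Person instance with the same attributes, not the original instance itself.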

The danger in this functionality occurs when an application unpickles data from untrusted sources. If an attacker can control data that is unpickled by the application, she can cause authentication bypass and often even code execution.

If an application uses information from a pickled object for access control and does not check the integrity of the object, an attacker can simply supply the application with a forged pickle to bypass access control.

Let’s say an application’s session cookie is a string that is the base64 encoded, pickled representation of a Person object. And when the application receives a session cookie, it unpickles it to check for the user’s identity in the “name” field of the object.

Pickling data does not provide any form of data protection. It is simply a way of packaging data for transmission. If the cookie is not encrypted and the integrity of the cookie is not checked before use, an attacker can easily forge a cookie for any user by pickling a Person object with the desired name and base64 encoding the result.
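A sketch of the forgery, assuming a Person class with a name field and a hypothetical server-side identify_user() function that trusts the decoded cookie:

```python
import base64
import pickle


class Person:
    def __init__(self, name):
        self.name = name


def identify_user(session_cookie):
    # Hypothetical server-side check: it trusts whatever the cookie decodes to.
    person = pickle.loads(base64.b64decode(session_cookie))
    return person.name


# The attacker builds the cookie locally; no secret is needed to do so.
forged_cookie = base64.b64encode(pickle.dumps(Person("admin"))).decode("ascii")
print(identify_user(forged_cookie))
```

Because nothing in the cookie proves that the server issued it, the application has no way to tell this cookie from a legitimate one.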

Now for the even more exciting part: achieving code execution by utilizing insecure pickle deserialization!

Remember, a pickle can represent any arbitrary Python object. When an application unpickles a pickle, it is instantiating a new object of that class.

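The pickle module allows objects to declare how they should be pickled via the __reduce__ method. This method takes no arguments and returns either a string or a tuple. When it returns a tuple, the tuple dictates how the object is reconstructed during unpickling. In its simplest form the tuple is (callable, arguments), where arguments is itself a tuple: during unpickling, the callable is invoked with those arguments and its return value becomes the reconstructed object.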

This means that if an attacker defines a __reduce__ method in an object, the pickled object could be instantiated as something else during unpickling. If the attacker constructs a malicious object whose __reduce__ returns a callable such as os.system together with a shell command, she can make the victim application call that function with that command upon unpickling.

With a suitable command, this would spawn a reverse shell to the IP 10.0.0.1 on port 8080.
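The mechanism can be demonstrated safely by substituting a harmless callable. In this sketch the class name Malicious is made up, and os.getpid stands in for the os.system call described above:

```python
import os
import pickle


class Malicious:
    def __reduce__(self):
        # (callable, arguments): on unpickling, pickle calls os.getpid()
        # instead of rebuilding a Malicious instance.
        return (os.getpid, ())


payload = pickle.dumps(Malicious())

# The "victim" only calls pickle.loads(); the function call happens inside it.
result = pickle.loads(payload)
print(result == os.getpid())
```

The victim does not need to have the Malicious class defined, since the pickle only refers to the callable and its arguments.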

Exploiting YAML parsing

Another way that insecure deserialization can endanger Python applications is through the loading of YAML files.

YAML, interestingly enough, stands for “YAML Ain’t Markup Language”. It is a data serialization standard that is widely used across programming languages. In Python, PyYAML is the most popular YAML processor.

YAML files, similar to pickles, can represent arbitrary Python objects. In PyYAML, yaml.dump() packages a Python object into a YAML document, which is a plain string tagged with the class of the object.

To reconstruct the original Python object from the YAML document, applications call yaml.load().
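A round trip through PyYAML might look like this sketch. It assumes PyYAML is installed and uses a made-up Person class; yaml.Loader is passed explicitly because recent PyYAML versions require a Loader argument:

```python
import yaml  # PyYAML, a third-party package


class Person:
    def __init__(self, name):
        self.name = name


document = yaml.dump(Person("Vickie"))
print(document)  # a string carrying a python/object tag and the class path

# yaml.Loader can rebuild arbitrary Python objects, which is what makes it unsafe.
restored = yaml.load(document, Loader=yaml.Loader)
print(restored.name)
```

The tag at the top of the document is what tells the loader which class to instantiate.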

Similar to pickle deserialization issues, YAML loading gives attackers a way to forge arbitrary objects and achieve code execution.

If the application uses user-supplied YAML for access control and does not check for integrity of the YAML file, a malicious user can potentially forge arbitrary YAML documents to bypass access control.

For example, if YAML documents like this are used as session cookies, an attacker can simply write a YAML document describing whichever user she wants to impersonate and submit it as a forged session cookie.

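If the application calls yaml.load() without a safe loader, it is also possible to achieve arbitrary code execution by embedding a call such as os.system() in the YAML document through the python/object/apply tag. Before PyYAML 5.1, this unsafe behavior was the default for yaml.load().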
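The sketch below shows the idea with a harmless call. It assumes PyYAML 5.1 or later (for yaml.UnsafeLoader), and os.getpid stands in for os.system:

```python
import os

import yaml  # PyYAML, a third-party package

# The tag asks the loader to call os.getpid() with an empty argument list.
document = "!!python/object/apply:os.getpid []"

result = yaml.load(document, Loader=yaml.UnsafeLoader)
print(result == os.getpid())

# yaml.safe_load() only builds plain data types and refuses the tag.
try:
    yaml.safe_load(document)
    rejected = False
except yaml.YAMLError as error:
    rejected = True
    print("safe_load rejected the document:", type(error).__name__)
```

For YAML that comes from users, yaml.safe_load() is the appropriate entry point, since it never constructs arbitrary Python objects.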

Other dangers when developing in Python

Besides language-specific vulnerabilities, platform-agnostic issues like XSS, XXE, SQL injection and command injections are always something to be on the lookout for.

In addition, polluted packages and unpatched dependencies continue to be one of the biggest security concerns for Python developers. So be sure to pay attention to those as well!

As always, hope you found that interesting. Thanks for reading!

The Startup

Medium's largest active publication, followed by +733K people. Follow to join our community.

Vickie Li

Written by

Vickie Li

Professional investigator of nerdy stuff. Hacks and secures. Creates god awful infographics. https://twitter.com/vickieli7
