Securing PDF Generators Against SSRF Vulnerabilities

Securing WeasyPrint and wkhtmltopdf against SSRF

Rick Ramgattie
May 29, 2023

A couple of months ago, I was trying to figure out how to secure a PDF generator running in AWS Lambda against SSRF attacks. Server-side request forgery (SSRF) is a type of attack in which an attacker tricks a service into making requests to arbitrary resources. This can be used to steal sensitive information or to launch denial-of-service attacks. For some examples of how PDF generators are exploited, check out "Owning the cloud through SSRF and PDF Generators" by Ben Sadeghipour and Cody Brocious.

To secure PDF generators against SSRF vulnerabilities, you can often configure them to not interact with external resources, or to allow only a limited subset of protocols and resources. Another common control is an egress firewall, but since this generator ran in AWS Lambda, I was unable to set one up.

Without an egress firewall, I set out to understand how I could restrict which external resources are fetched at the application level. In this blog post, I will describe approaches to securing WeasyPrint and wkhtmltopdf against SSRF attacks.

WeasyPrint:

When I was looking into how to disable or limit the external resources fetched by WeasyPrint, I found that it supports custom URL fetchers. The Security section of the WeasyPrint documentation explains that a custom URL fetcher lets you limit which protocols are used and set a timeout on requests to prevent DoS.

Figure 1. WeasyPrint PDF generator with custom URL fetcher and timeout of 10 seconds.

Securing WeasyPrint was fairly straightforward. After adding in the custom URL fetcher I was unable to access any external resources outside of those available at https://ramgattie.com, and the timeout ensured that requests weren’t going to last more than 10 seconds.
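A minimal sketch of what such a fetcher could look like follows. It is an illustration under assumptions, not necessarily the code from Figure 1: the names is_allowed, restricted_url_fetcher, and render_pdf are hypothetical, and the call to default_url_fetcher with a timeout argument assumes a WeasyPrint release that exposes that function with that parameter (newer releases may offer a different fetcher API). The WeasyPrint imports are done lazily so the allow-list check can be exercised on its own.

```python
from urllib.parse import urlsplit

# Assumed policy: only HTTPS resources hosted on this one domain may be fetched.
ALLOWED_SCHEMES = {"https"}
ALLOWED_HOSTS = {"ramgattie.com"}
FETCH_TIMEOUT_SECONDS = 10


def is_allowed(url: str) -> bool:
    """Return True only if the URL's scheme and host are both on the allow-list."""
    parts = urlsplit(url)
    # urlsplit lowercases the scheme and hostname, and hostname excludes any
    # userinfo ("user@") and port, so look-alike URLs are compared correctly.
    return parts.scheme in ALLOWED_SCHEMES and parts.hostname in ALLOWED_HOSTS


def restricted_url_fetcher(url: str):
    """Custom URL fetcher: reject disallowed URLs, otherwise fetch with a timeout."""
    if not is_allowed(url):
        raise ValueError(f"Blocked fetch of disallowed URL: {url!r}")
    # Imported lazily so the allow-list logic can be tested without WeasyPrint.
    from weasyprint import default_url_fetcher
    return default_url_fetcher(url, timeout=FETCH_TIMEOUT_SECONDS)


def render_pdf(html: str) -> bytes:
    """Render an HTML string to PDF bytes using the restricted fetcher."""
    from weasyprint import HTML
    return HTML(string=html, url_fetcher=restricted_url_fetcher).write_pdf()
```

One caveat with this sketch: the check only inspects the URL that WeasyPrint asks for. If an allowed host responds with a redirect, the underlying HTTP client may follow it, so the allow-listed hosts need to be ones you trust not to redirect to internal addresses.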

The problem was that it was a lot slower than I wanted it to be (even if I got rid of the timeout altogether). I had to bump Lambda’s timeout to 15 minutes if I wanted large (> 1 MB) HTML to render. That did not seem reasonable, which led me to look at some alternatives like wkhtmltopdf.

wkhtmltopdf:

Running wkhtmltopdf meant using their custom Lambda layer, which allowed me to call the binary directly with Python’s subprocess module. This meant that I could use its command line arguments to disable functionality that could allow for SSRF. I found this blog on how to use that Lambda layer pretty helpful.

After reviewing their command line arguments documentation and some experimentation, I learned that you could set --proxy None to prevent it from fetching external resources and --disable-local-file-access to prevent it from reading local files. I also used --disable-forms, --disable-plugins, and --disable-javascript because I had no need for them.
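Put together, the invocation might look like the sketch below. The flags are the ones listed above. The binary path /opt/bin/wkhtmltopdf, the helper names build_command and render_pdf, and the 60-second timeout are assumptions made for illustration. The two "-" arguments ask wkhtmltopdf to read the HTML from stdin and write the PDF to stdout.

```python
import subprocess

# Assumed location of the binary inside the Lambda layer; adjust to your layer.
WKHTMLTOPDF_BIN = "/opt/bin/wkhtmltopdf"

# Flags described in the post for reducing SSRF and local-file exposure.
HARDENING_FLAGS = [
    "--proxy", "None",               # per the post: stops external fetches
    "--disable-local-file-access",   # no reading of local files
    "--disable-javascript",
    "--disable-plugins",
    "--disable-forms",
]


def build_command(binary: str = WKHTMLTOPDF_BIN) -> list[str]:
    """Build the argument list: binary, hardening flags, stdin in, stdout out."""
    return [binary, *HARDENING_FLAGS, "-", "-"]


def render_pdf(html: str, timeout_seconds: int = 60) -> bytes:
    """Pipe HTML to wkhtmltopdf and return the PDF bytes it writes to stdout."""
    result = subprocess.run(
        build_command(),
        input=html.encode("utf-8"),
        capture_output=True,
        timeout=timeout_seconds,  # bound the render time to limit DoS
        check=True,               # raise CalledProcessError on non-zero exit
    )
    return result.stdout
```

Passing the arguments as a list rather than a shell string means user-supplied HTML never passes through a shell, and the subprocess timeout caps how long a single render can run.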

After doing a bakeoff between wkhtmltopdf and WeasyPrint, it was clear that wkhtmltopdf was the winner. The same large HTML file that was causing Lambda to time out rendered in under a minute.

Conclusion

When converting HTML to PDFs, you need to ensure that user input is sanitized and the environment is appropriately sandboxed to prevent DoS and information disclosure. If you need to sandbox a PDF generator that cannot easily be placed behind an egress firewall, consider using the generator’s built-in fetching controls to prevent access to unintended resources.
