How to prevent data leakage when using AWS CloudFront to cache API requests

Glib Pushkov
4 min read · Oct 12, 2022


Thanks to https://nightcafe.studio/ for helping with the image.

At some point in the product lifecycle, a software developer named Florian noticed that users had started accessing the application from multiple countries and regions. To improve their experience, he decided to leverage HTTP caching with a CDN and Cache-Control headers. He thought, “Mmm, it’s an easy, high-impact improvement. If the cache behavior is carefully configured, nothing bad will happen, for sure!” and created a task for an upcoming sprint.

The task required the following steps:

  1. Update views/handlers in the application code to set
    - a “Cache-Control: public, max-age=86400” header for public endpoints, which can be accessed without an Authorization header (cache for one day; a sketch of this step follows the list);
    - “Cache-Control: no-cache” for responses from private API endpoints.
  2. Create a CloudFront distribution on top of the Elastic/Application Load Balancer and configure the “Default (*)” cache behavior with a custom cache policy (see image below).
  3. Update the Route 53 configuration to point api.savelife.in.ua to the newly created CloudFront distribution.
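
In application code (step 1), that could look roughly like the sketch below, assuming a Flask backend; the framework, endpoint paths, and payloads are illustrative and not taken from the real project.

```python
# Minimal sketch of step 1, assuming a Flask backend.
# Endpoint paths and payloads are illustrative only.
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.get("/v1/donations")
def list_donations():
    # Public endpoint: works without an Authorization header,
    # so CloudFront may cache the response for one day.
    payload = {"items": [], "country": request.args.get("country")}
    response = jsonify(payload)
    response.headers["Cache-Control"] = "public, max-age=86400"
    return response

@app.get("/v1/profile")
def get_profile():
    # Private endpoint: the response is user-specific and must not be cached.
    response = jsonify({"user": "me"})
    response.headers["Cache-Control"] = "no-cache"
    return response
```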

Florian assumed that this configuration would not cache any API response by default (even when the backend does not explicitly send “Cache-Control: no-cache”), and would serve cached data only for public endpoints when requests share the same query parameters, such as https://api.savelife.in.ua/v1/donations?country=usa&dateFrom=24-02-2022, regardless of the Authorization (and other) headers they carry.

After implementing all these changes in a staging environment, it was clear that public URLs that were supposed to be cached were delivered from a CloudFront edge location, while private URLs opened in different browser tabs by different users returned the correct user-specific data. Ready to deploy, huh?

…and an hour after the production deployment, the first email with a user complaint reached the mailbox. The user said that sometimes, when he refreshed his profile page, he saw other users’ data.

“WHAAAAT??? HOW??? The configuration is crystal clear, all headers are there!!!” Florian thought while quickly updating the “Cache key settings” to buy some time to investigate. Headers were temporarily added to the cache key, so each authorized user got their own cache entry for each URL they accessed (as each carried a unique Bearer JWT in the “Authorization” header).

As a modern software developer, Florian started searching Google and StackOverflow. No luck. “We need to go deeper,” he thought and started reading the CloudFront documentation. His eyes caught an interesting title: “Simultaneous requests for the same object (request collapsing)”.
The chapter said that regardless of the configuration (even if you choose the AWS-managed “CachingDisabled” policy), when multiple requests for the same URL (and with the same cache key) hit an edge location at the same time, only one of them reaches the origin server (your backend), and the response is shared among all of those requests.
It was a long-awaited dopamine-rewarding aha moment!
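
One way to observe the effect is to fire several simultaneous requests for the same URL with different Authorization headers and compare the responses. A quick diagnostic sketch, where the URL and the tokens are placeholders:

```python
# Quick diagnostic: send simultaneous requests for the same URL with
# different bearer tokens and check whether the responses are identical.
# The URL and tokens below are placeholders.
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "https://api.example.com/v1/profile"
TOKENS = ["token-of-user-a", "token-of-user-b", "token-of-user-c"]

def fetch(token: str) -> str:
    # Each "user" sends the same request, only the bearer token differs.
    headers = {"Authorization": f"Bearer {token}"}
    return requests.get(URL, headers=headers, timeout=10).text

# Fire the requests as close to simultaneously as possible.
with ThreadPoolExecutor(max_workers=len(TOKENS)) as pool:
    bodies = list(pool.map(fetch, TOKENS))

if len(set(bodies)) == 1:
    print("All users got the same body: the response is being shared!")
else:
    print("Each user got their own body.")
```

If different users keep getting identical bodies for a private URL, responses are being shared, which is exactly the leak the user reported.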

The first solution that came to his mind was to add more cache behaviors.

The “Default (*)” behavior would be kept as it is, with a cache key generated from request headers (it would be used for all private URLs). For each public endpoint that is supposed to be cached, a new behavior with a higher priority would be created, using a custom cache policy that builds the cache key from the query string only. This solution is temporary, as it’s easy to forget to extend the CloudFront configuration once a new public endpoint is added. But right now it buys time to implement a better solution without a rush.
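
Such a “query string only” cache policy could be created, for instance, with boto3; a sketch where the policy name and TTL values are illustrative:

```python
# Sketch of a cache policy whose cache key is built only from the URL path
# and the query string (no headers, no cookies). Name and TTLs are illustrative.
import boto3

cloudfront = boto3.client("cloudfront")

cloudfront.create_cache_policy(
    CachePolicyConfig={
        "Name": "public-endpoints-query-string-only",
        "Comment": "Cache key = URL path + full query string",
        "MinTTL": 0,
        "DefaultTTL": 86400,
        "MaxTTL": 86400,
        "ParametersInCacheKeyAndForwardedToOrigin": {
            "EnableAcceptEncodingGzip": True,
            "EnableAcceptEncodingBrotli": True,
            # No headers and no cookies in the cache key;
            # only the query string differentiates cached objects.
            "HeadersConfig": {"HeaderBehavior": "none"},
            "CookiesConfig": {"CookieBehavior": "none"},
            "QueryStringsConfig": {"QueryStringBehavior": "all"},
        },
    }
)
```

The new higher-priority behaviors for the public path patterns would then reference this policy’s Id; attaching them to the distribution (via update_distribution or the console) is left out here.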

And as a long-term solution, it would be more suitable to create a separate ‘cached’ subdomain and let the frontend decide which API requests are fetched through CloudFront and which go to the application directly.

To be sure that nobody can access ‘private’ endpoints through the cache, it makes sense to add an ‘Origin request policy’ to the cache behavior that drops the ‘Authorization’ header when the CDN forwards a request to the origin server.
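
One way to get that effect with boto3 is an origin request policy whose header whitelist simply omits Authorization; a sketch, where the whitelisted headers are an assumption about what the public endpoints actually need:

```python
# Sketch of an origin request policy that forwards only a whitelist of
# headers, so the Authorization header never reaches the origin.
import boto3

cloudfront = boto3.client("cloudfront")

cloudfront.create_origin_request_policy(
    OriginRequestPolicyConfig={
        "Name": "public-endpoints-drop-authorization",
        "Comment": "Forward query strings and a small header whitelist only",
        "HeadersConfig": {
            # Authorization is not in the whitelist, so it is never forwarded.
            # The listed headers are an assumption about what public
            # endpoints need.
            "HeaderBehavior": "whitelist",
            "Headers": {"Quantity": 2, "Items": ["Accept", "Accept-Language"]},
        },
        "CookiesConfig": {"CookieBehavior": "none"},
        "QueryStringsConfig": {"QueryStringBehavior": "all"},
    }
)
```

With this policy attached to the ‘cached’ subdomain’s behaviors, even if a client sends a token, the origin never sees it and responds as it would to an anonymous request.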

For Florian, it was a pretty stressful day, and he was glad that he would be able to sleep peacefully. “I should write a short story to warn people not to repeat my mistakes,” he thought while buying a bottle of beer at a Späti. Florian slowly walked home, catching the last glimpse of a Berlin sunset.

Again, giving credit to https://creator.nightcafe.studio/ for the generated image!

P.S. https://savelife.in.ua/ is used only as an example and has no relation to the article. The russia-Ukraine war is still going on, and to prevent a genocide we need any support, even the smallest.
The best way to help is via https://savelife.in.ua/en/. Thank you 🇺🇦
I hope the war never comes to your country or your home.
