Recover The Past
Rescue cached files in browser storage
We have all read and heard enough about the importance of keeping backups. But honestly, sometimes we don’t pay a blind bit of notice to it, and then, while you’re watching a movie or chilling with your friends, Murphy’s law shows up and joins the party.
A similar case happened to me a few days ago. A lot of changes were lost in an accident and, as you might guess, they were not easy to redo. So I started looking for logs, FTP client temporary files, editor backups, and regular periodic server backups (which had been disabled by the administrator. Avoid ’em, people.). No luck. Then an idea came to mind: the browser cache would be my last hope.
So I checked Chrome’s cache store to see whether the file was still there:
And Firefox too:
Thankfully, it was there. But wait.
Now what? It’s not usable in this form. Unfortunately, the file itself cannot be picked out among the thousands of files sitting in the browser cache directory on disk. It’s a drop in the ocean. Even worse, it’s gzipped and has to be decompressed. So I had to extract the file contents from their hexadecimal representation on the cache entry page and then decompress them. A bit of scripting should get it done.
I wrote a PHP script that does the job. All you have to do is open the cache entry page in Chrome/Firefox, copy the file data section into a new file named “cachefile.txt” inside the script’s working directory, and run the code via the Command-Line Interface (CLI) or a web server, as you wish. You can get it from this gist:
Finding the file data section is straightforward in Firefox, but it can be a little tricky in Chrome, because Chrome’s cache pages also show a hexadecimal representation of the HTTP headers. That is not a problem by itself: you can easily tell the header section from the data section by its short length (about 10 lines). The real problem shows up with HTTPS URLs, where the header section grows to about 4000 lines or more because the certificate is included. Be sure to skip that section and select the data section only.
As a trick, you can find the first line of every section by searching the page for “00000000:” using CTRL+F or CMD+F. The last result will be the first line of the data section.
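The same trick can be automated if you save the whole cache entry page as text. Here is a minimal Python sketch of it (not the PHP script from the gist). The function name `data_section` is made up for illustration, and it assumes that every hex dump section on the page starts with a line beginning with “00000000:”.

```python
def data_section(page_text: str, marker: str = "00000000:") -> str:
    """Return the text from the last line starting with `marker` to the end.

    Assumption: each hex dump section on the cache page begins with a line
    whose offset is 00000000, and the file data is the last such section.
    """
    lines = page_text.splitlines()
    # Indexes of all lines that open a new hex dump section.
    starts = [i for i, line in enumerate(lines)
              if line.lstrip().startswith(marker)]
    if not starts:
        raise ValueError("no hex dump section found in the page text")
    # Keep everything from the start of the last section onwards.
    return "\n".join(lines[starts[-1]:])
```

Anything that follows the last section on the page is kept as well, so you may still want to eyeball the result before converting it.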
The script converts the hexadecimal data back into the actual file. If the $is_gzip variable is set to true, a second output file is generated that holds the decompressed version. It is true by default. If your file is not gzipped, you can disable this option to save some time.
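To give a rough idea of what such a conversion involves, here is a small Python sketch. It is not the PHP script from the gist, just an illustration under assumptions: each dump line is taken to be an 8-digit hex offset, a colon, up to 16 byte pairs separated by single spaces, then two or more spaces and an ASCII preview. The function names and the `is_gzip` parameter (mirroring `$is_gzip`) are my own naming.

```python
import gzip
import re

# Assumed line layout: "00000000:  1f 8b 08 ...  <ascii preview>".
# Pairs are separated by single spaces, so the match stops at the wider
# gap that precedes the ASCII preview column.
HEX_LINE = re.compile(r"^\s*[0-9a-fA-F]{8}:\s*((?:[0-9a-fA-F]{2} ?){1,16})")


def hexdump_to_bytes(dump_text: str) -> bytes:
    """Collect the byte values from every line that looks like a dump line."""
    out = bytearray()
    for line in dump_text.splitlines():
        match = HEX_LINE.match(line)
        if match:  # lines that are not part of the dump are skipped
            out += bytes.fromhex(match.group(1))
    return bytes(out)


def recover(dump_text: str, is_gzip: bool = True) -> bytes:
    """Rebuild the cached file; gunzip it when is_gzip is true."""
    raw = hexdump_to_bytes(dump_text)
    return gzip.decompress(raw) if is_gzip else raw


# Example usage (file names are placeholders):
#   with open("cachefile.txt", encoding="utf-8", errors="replace") as src:
#       data = recover(src.read(), is_gzip=True)
#   with open("recovered.out", "wb") as dst:
#       dst.write(data)
```

If the dump layout in your browser version differs (for example an extra gap in the middle of each line), the regular expression would need adjusting.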
As a hint, an entry like Content-Encoding: gzip in the HTTP headers tells you for sure that the file is gzipped.
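If you would rather not decide by hand, the check can be sketched in Python as well. The helper name `looks_gzipped` is hypothetical. It looks for the header mentioned above and, as a fallback that is not part of the original script, for the two magic bytes (1f 8b) that every gzip stream starts with.

```python
import re

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def looks_gzipped(headers_text: str = "", body: bytes = b"") -> bool:
    """Guess whether a cached body is gzipped.

    headers_text: the plain-text HTTP headers from the cache entry page.
    body: the raw bytes recovered from the data section.
    """
    # A Content-Encoding header naming gzip settles it.
    header = re.search(r"^\s*content-encoding:.*\bgzip\b",
                       headers_text, re.IGNORECASE | re.MULTILINE)
    if header:
        return True
    # Otherwise fall back to the magic number at the start of the body.
    return body.startswith(GZIP_MAGIC)
```

The magic-byte test is only a heuristic: a file that happens to start with those two bytes would be misdetected, so the header remains the more reliable signal.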
And don’t worry. It can handle large files pretty well. :)