Capturing the JavaScript cache day on websites

Introduction

This article shows how to use PhantomJS to crawl websites, capture the cache day of each JavaScript file, and calculate its stable day. The cache day indicates how long a JavaScript file will be kept in your browser's cache. In reality, though, many script files have a very long cache day but are renewed frequently. By combining the cache day with the stable day, we can tell which files are seldom renewed: a long cache day together with a long stable day indicates that the script file rarely changes. This information is very useful in a cache poisoning attack. This article does not go into further detail about cache poisoning attacks; if you are interested in them, please read other articles on the topic.

To run this script, you have to install node.js and phantomjs on your computer. Strictly speaking, phantomjs bundles its own JavaScript engine, but I still suggest installing node.js because it includes the V8 engine and is easy to install. My computer runs macOS, so the following installation steps use a macOS environment. For other operating systems, please search the Internet for node.js, node package manager (npm) and phantomjs installation tutorials.

Installation

Before installing the frameworks mentioned above, I suggest macOS users install Homebrew first. Homebrew is a package manager for macOS, and it is very useful.

Homebrew website and its installation guide: https://brew.sh/

After installing Homebrew, open your terminal and type the following command.

Then type brew install node to install node.js.

After running this command, type node -v and npm -v to see the installed node.js and node package manager versions.

These versions may not match the ones you install, because I installed mine a long time ago and have not updated them since.

Now you can install phantomjs. Go to the phantomjs website to download the installation package: http://phantomjs.org/download.html

First, unzip the downloaded package into a folder of your choice.

Second, set an environment variable called PHANTOMJS and append it to your PATH environment variable. The environment variable should be created in the file ~/.bash_profile. After saving this file, use the source command to reload it.

Finally, you can run the phantomjs command in your terminal.

Sniffer script Usage Guide

The complete code is in my GitHub repository: https://github.com/Isaac234517/snifferjs.git

Clone the git repository to your computer and then run phantomjs sniffer.js -h. You should see the command usage guide:

  • -i specifies the text file that contains your website list.
  • -o specifies your output json file.
  • -h shows which arguments you can pass to the run command.

This script captures the cache day of the JavaScript files on each website, calculates their stable day, and outputs the captured data in JSON format. Once you run the script with phantomjs sniffer.js -i website.txt -o output.json, you get an output file that contains the captured information.

All progress and error messages are printed to the console. The final result can be written to a json file, ordered by cache day first and stable day second.
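The two-level ordering can be sketched with a comparator like the one below. This is only an illustration: the record shape ({ url, cacheDay, stableDay }), the function name byCacheThenStable, and the choice of descending order (longest first) are assumptions, not taken from sniffer.js.

```javascript
// Hypothetical record shape: { url: string, cacheDay: number, stableDay: number }.
// Assumption: larger values come first; the real script may sort the other way.
function byCacheThenStable(a, b) {
  if (b.cacheDay !== a.cacheDay) {
    return b.cacheDay - a.cacheDay;   // primary key: cache day
  }
  return b.stableDay - a.stableDay;   // tie-breaker: stable day
}

var records = [
  { url: 'http://example.com/a.js', cacheDay: 7, stableDay: 40 },
  { url: 'http://example.com/b.js', cacheDay: 365, stableDay: 3 },
  { url: 'http://example.com/c.js', cacheDay: 365, stableDay: 200 }
];

records.sort(byCacheThenStable);
records.forEach(function (r) {
  console.log(r.url + ' cache=' + r.cacheDay + ' stable=' + r.stableDay);
});
```

With this ordering, a file with a long cache day and a long stable day, such as c.js here, ends up at the top of the list.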

Code Implementation

This script is written in JavaScript and uses phantomjs to crawl the web pages. The logic of the script is:

  1. Accept the input parameters from the console.
  2. Get the urls from the text file and put them in a list.
  3. Scan the url list and save the result for each url in a hash table.
  4. Save the result hash table in a json file.

Line 258 of the script runs the main function, passing system.args so that the console input becomes the function's parameters. Inside the main function, I use a switch statement to handle each option. In phantomjs, each console input is a separate argument. For instance, with phantomjs sniffer.js -i website.txt, the args array contains two elements, [-i, website.txt]. I use args[++i] on lines 231 and 236 to get the next value. In this script, all file I/O is implemented in the FileProcessor class, whose definition is in sniffer.js.
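As a rough illustration of the switch and args[++i] pattern, here is a minimal sketch. The function name parseArgs and the shape of the returned options object are hypothetical and are not copied from sniffer.js.

```javascript
// Hypothetical sketch of the argument loop; names are illustrative only.
function parseArgs(args) {
  var options = { input: null, output: null, help: false };
  for (var i = 0; i < args.length; i++) {
    switch (args[i]) {
      case '-i':
        // The value of a flag is the next element, so advance the index first.
        options.input = args[++i];
        break;
      case '-o':
        options.output = args[++i];
        break;
      case '-h':
        options.help = true;
        break;
      default:
        // Anything else (for example the script name) is ignored in this sketch.
        break;
    }
  }
  return options;
}

var parsed = parseArgs(['-i', 'website.txt', '-o', 'output.json']);
console.log(JSON.stringify(parsed));
// prints {"input":"website.txt","output":"output.json","help":false}
```

Pre-incrementing with args[++i] consumes the flag's value in the same step, so the loop does not mistake the file name for another flag on the next iteration.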

The FileProcessor class only handles reading text from a file and writing text to a file. After the text is read, the program filters out empty lines, commented-out urls and https urls. Once the filtering is complete, the compose text function runs a callback function that trims whitespace and returns a list containing all the urls.
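The filtering step might look roughly like the sketch below. The function name composeUrlList is hypothetical, and the use of a leading # to mark a commented-out url is an assumption, since the article does not say which comment marker the script recognizes.

```javascript
// Hypothetical helper; the '#' comment marker is an assumption.
function composeUrlList(text) {
  return text
    .split(/\r?\n/)
    .map(function (line) { return line.trim(); })   // trim surrounding whitespace
    .filter(function (line) {
      if (line.length === 0) { return false; }           // skip empty lines
      if (line.charAt(0) === '#') { return false; }      // skip commented-out urls
      if (/^https:\/\//i.test(line)) { return false; }   // skip https urls
      return true;
    });
}

var sample = [
  'http://example.com',
  '',
  '# http://skipped.example',
  'https://secure.example',
  '  http://example.org  '
].join('\n');

console.log(composeUrlList(sample));
```

For the sample text above, only http://example.com and http://example.org remain in the list.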

Now every target url is in the list. I wrote a scan page function to process it. This function accepts three parameters: list, result and callback. list is the collection of target urls, result is a hash table that stores the information we need, and callback is used to print the final result to the console and save it in a json file. Inside the scanpage function, I use the functions provided by phantomjs to open the page and capture information about the downloaded files. The done function is called once the page has finished loading. It prints all the information I got from this url. For instance, if I get the cache day and stable day of “xx.js” and “yy.js” on url “abc”, this function prints the cache day and stable day of xx.js and yy.js in a clear format, sorted by cache day first and stable day second. Before opening the page, I declare functions to handle the onResourceReceived and onResourceError states.
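The control flow of processing the list one url at a time and calling the callback at the end can be sketched as follows. This is not the code from sniffer.js: the fourth parameter openPage is a hypothetical stand-in for the phantomjs page-opening call, added so the sketch can run without phantomjs, and the data stored per url is simplified to a list of script names.

```javascript
// Sketch of sequential scanning. openPage(url, done) is a hypothetical stand-in
// for opening the page with phantomjs; done receives (status, scripts).
function scanPage(list, result, callback, openPage) {
  if (list.length === 0) {
    callback(result);   // every url has been processed
    return;
  }
  var url = list[0];
  openPage(url, function done(status, scripts) {
    // Keep whatever was captured for this url, or an empty list on failure.
    result[url] = status === 'success' ? scripts : [];
    // Continue with the remaining urls only after this page has finished.
    scanPage(list.slice(1), result, callback, openPage);
  });
}

// Fake opener used only for this example; it pretends each page loads one script.
function fakeOpenPage(url, done) {
  done('success', [url + '/xx.js']);
}

scanPage(['http://a.example', 'http://b.example'], {}, function (result) {
  console.log(JSON.stringify(result, null, 2));
}, fakeOpenPage);
```

Moving to the next url from inside the done function means only one page is open at a time, and the final callback runs exactly once, after the last url.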

Opening a page goes through several states. The API for handling each state is documented clearly here: http://phantomjs.org/api/webpage/

During the onResourceReceived state, I capture the cache day, calculate the stable day, and save both in the result hash table.
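The article does not spell out how the two numbers are derived, so the sketch below rests on stated assumptions: the cache day is taken to be the max-age value of the Cache-Control header converted to days, and the stable day is taken to be the number of days since the Last-Modified header. The helper names are hypothetical. The response object follows the shape phantomjs passes to page.onResourceReceived (url, stage, and headers as a list of { name, value } pairs), but the result table and the current time are passed in explicitly so the function can run outside phantomjs.

```javascript
var MS_PER_DAY = 24 * 60 * 60 * 1000;

// Find a header by name, case-insensitively, in a [{ name, value }] list.
function getHeader(headers, name) {
  var wanted = name.toLowerCase();
  for (var i = 0; i < headers.length; i++) {
    if (headers[i].name.toLowerCase() === wanted) { return headers[i].value; }
  }
  return null;
}

// Assumed definition: cache day = Cache-Control max-age (seconds) in whole days.
function cacheDays(headers) {
  var cacheControl = getHeader(headers, 'Cache-Control');
  var match = cacheControl && /max-age=(\d+)/i.exec(cacheControl);
  return match ? Math.floor(parseInt(match[1], 10) / 86400) : 0;
}

// Assumed definition: stable day = whole days elapsed since Last-Modified.
function stableDays(headers, now) {
  var lastModified = getHeader(headers, 'Last-Modified');
  var modifiedAt = lastModified ? Date.parse(lastModified) : NaN;
  if (isNaN(modifiedAt)) { return 0; }
  return Math.max(0, Math.floor((now - modifiedAt) / MS_PER_DAY));
}

// Handler body: record one entry per JavaScript file in the result hash table.
function recordScript(response, result, now) {
  if (response.stage !== 'end') { return; }               // ignore the 'start' notification
  if (!/\.js(\?|$)/i.test(response.url)) { return; }      // keep only .js resources
  result[response.url] = {
    cacheDay: cacheDays(response.headers),
    stableDay: stableDays(response.headers, now)
  };
}

var table = {};
recordScript({
  stage: 'end',
  url: 'http://example.com/xx.js',
  headers: [
    { name: 'Cache-Control', value: 'public, max-age=2592000' },
    { name: 'Last-Modified', value: 'Mon, 01 Jan 2018 00:00:00 GMT' }
  ]
}, table, Date.parse('Thu, 11 Jan 2018 00:00:00 GMT'));

console.log(JSON.stringify(table));
// prints {"http://example.com/xx.js":{"cacheDay":30,"stableDay":10}}
```

Inside phantomjs, a handler like this could be attached as page.onResourceReceived = function (response) { recordScript(response, result, Date.now()); }, assuming result is the shared hash table.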

The result hash table is saved as a JSON string in a json file by the printOutFailResult function. It is a callback and is executed after all urls in the list have been processed.

If you passed the output file parameter when running the script, you will find the json file in the current directory.

Thank you for reading.