Before installing the framework I mentioned earlier, I suggest that macOS users install Homebrew first. Homebrew is a package manager for macOS, and it is very useful.
Homebrew's website and installation guide: https://brew.sh/
After installing Homebrew, open your terminal and type the following command to install Node.js: brew install node
After running this command, type node -v and npm -v to see the Node.js version and the npm (Node package manager) version.
These versions may not match the ones you install, because I installed mine a long time ago and have not updated since.
Now you can install PhantomJS. Go to the PhantomJS website and download the installation package: http://phantomjs.org/download.html.
First, unzip the downloaded package into a folder of your choice.
Second, create an environment variable called PHANTOMJS and append it to your PATH environment variable. These variables should be set in the file ~/.bash_profile; after saving the file, use the source command to reload it.
Finally, you should be able to run the phantomjs command in your terminal.
Sniffer script Usage Guide
The complete code is in my GitHub repository: https://github.com/Isaac234517/snifferjs.git
Clone the repository to your PC, then run sniffer.js with phantomjs sniffer.js -h. You should see the command usage guideline.
- -i specifies your website list text file.
- -o specifies your output JSON file.
- -h shows what kinds of arguments you can pass when running the command.
All running and error messages are printed to the console, and the final result can be written to a JSON file, ordered by cache day first and stable day second.
- Accept the input parameters from the console.
- Read the URLs from the text file and put them in a list.
- Scan the URL list and save each URL's result in a hash table.
- Save the result hash table to a JSON file.
Line 258 runs the script, and system.args holds the console input that is passed to the main function as its parameters. Inside the main function, I use a switch statement to handle the different options. In PhantomJS, each console token becomes a separate argument; for instance, with phantomjs sniffer.js -i website.txt, the args array contains the two elements [-i, website.txt]. I use args[++i] at lines 231 and 236 to read the value that follows an option. In this script, all file I/O is implemented in the FileProcessor class. The following picture is the FileProcessor class definition.
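As a standalone illustration of that parsing loop, here is a sketch of the switch/args[++i] pattern. The function name and the options object are my own assumptions, not the exact code in sniffer.js:

```javascript
// Sketch of the argument-parsing loop described above. In PhantomJS the
// array would come from require('system').args; here it is a plain array
// so the logic can be shown standalone.
function parseArgs(args) {
  var options = { input: null, output: null, help: false };
  for (var i = 0; i < args.length; i++) {
    switch (args[i]) {
      case '-i':
        options.input = args[++i]; // consume the next token as the list file
        break;
      case '-o':
        options.output = args[++i]; // consume the next token as the output file
        break;
      case '-h':
        options.help = true;
        break;
    }
  }
  return options;
}
```

So an invocation like phantomjs sniffer.js -i website.txt leaves options.input set to "website.txt", because args[++i] advances the index past the value it just consumed.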
The FileProcessor class only handles reading text from a file and writing text to a file. After reading the text, the program filters out empty lines, commented-out URLs, and https URLs. Once the filtering is done, the compose-text function runs a callback that trims whitespace and returns a list containing all the URLs.
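A minimal sketch of that filtering step is below. The function name is mine, and I am assuming comment lines start with "#"; the real FileProcessor may mark comments differently:

```javascript
// Sketch of the URL filtering described above: drop empty lines,
// commented-out URLs (assumed to start with '#'), and https URLs,
// then trim whitespace from each remaining line.
function composeUrlList(text) {
  return text.split('\n')
    .map(function (line) { return line.trim(); })        // trim spaces
    .filter(function (line) {
      if (line === '') return false;                     // skip empty lines
      if (line.charAt(0) === '#') return false;          // skip comment urls
      if (line.indexOf('https://') === 0) return false;  // skip https urls
      return true;
    });
}
```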
Now every target URL is in the list. I wrote a scanPage function to process it. The function accepts three parameters: list, result, and callback. list is the collection of target URLs, result is a hash table that stores the information we need, and callback prints the final result to the console and saves it to a JSON file. Inside scanPage, I use the functions PhantomJS provides to open each page and capture information about the downloaded files. A done function is called once page loading completes; it prints all the information gathered for that URL. For instance, if I get the cache day and stable day of "xx.js" and "yy.js" from the URL "abc", this function prints the cache day and stable day of xx.js and yy.js in a clear format, sorted by cache day first and stable day second. Before opening the page, I declare handlers for the onResourceReceived and onResourceError events.
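The two-key ordering (cache day first, stable day as a tie-breaker) can be sketched as a comparator. The field names here are my assumptions, not sniffer.js's own:

```javascript
// Sketch of the sort order described above: ascending cache day,
// then ascending stable day when cache days are equal.
function compareEntries(a, b) {
  if (a.cacheDay !== b.cacheDay) {
    return a.cacheDay - b.cacheDay; // primary key: cache day
  }
  return a.stableDay - b.stableDay; // tie-breaker: stable day
}

var entries = [
  { name: 'xx.js', cacheDay: 7, stableDay: 3 },
  { name: 'yy.js', cacheDay: 7, stableDay: 1 },
  { name: 'zz.js', cacheDay: 2, stableDay: 9 }
];
entries.sort(compareEntries); // order: zz.js, yy.js, xx.js
```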
Each page goes through different states while it is being opened. The API for handling each state is documented clearly here: http://phantomjs.org/api/webpage/
In the onResourceReceived handler, I capture the cache day, calculate the stable day, and save them in the result hash table.
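To illustrate the kind of extraction that happens in that handler, here is a hedged sketch that derives a cache lifetime in days from a PhantomJS-style headers array (an array of { name, value } objects). The actual calculation in sniffer.js may look at other headers as well, such as Expires:

```javascript
// Sketch only: derive cache days from a Cache-Control max-age directive.
// PhantomJS passes the response to onResourceReceived with a headers
// array of { name, value } pairs; this function scans that array.
function cacheDaysFromHeaders(headers) {
  for (var i = 0; i < headers.length; i++) {
    if (headers[i].name.toLowerCase() === 'cache-control') {
      var match = /max-age=(\d+)/.exec(headers[i].value);
      if (match) {
        return Math.floor(parseInt(match[1], 10) / 86400); // 86400 seconds per day
      }
    }
  }
  return 0; // no cache information found
}
```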
The result hash table is saved as a JSON string in a JSON-format file by the printOutFailResult function. It is a callback that executes after all URLs in the list have been processed.
If you passed the output file parameter when running the script, you will find the JSON file in the current directory.
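The serialization step itself is small. A sketch of it, with the PhantomJS file write shown only in a comment so the JSON part stays standalone (the function name is my own):

```javascript
// Sketch of serializing the result hash table to a JSON string.
// In PhantomJS you would then write it out with:
//   require('fs').write(outputPath, json, 'w');
function toJsonString(resultTable) {
  return JSON.stringify(resultTable, null, 2); // pretty-printed JSON
}

var json = toJsonString({ 'xx.js': { cacheDay: 7, stableDay: 3 } });
```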
Thank you for reading.