Querying entries in HAR files with jq

Most browsers can log all interactions with a site and export the entries as a HAR (HTTP Archive) file. HAR is a JSON-based format that contains everything you need to debug potential interaction issues.

Here is a snippet from accessing cnn.com:

{
  "startedDateTime": "2016-09-13T15:58:23.413Z",
  "time": 5.663000003551133,
  "request": {
    "method": "GET",
    "url": "http://edition.cnn.com/favicon.ie9.ico",
    "httpVersion": "HTTP/1.1",
    "headers": [
      { "name": "Accept-Encoding", "value": "gzip, deflate, sdch" },
      ...
    ],
    ...
  },
  "response": {
    "status": 200,
    "statusText": "OK",
    "httpVersion": "HTTP/1.1",
    ...
  },
  ...
}

Drowning in JSON

The “log all interactions” part raises an issue, though: the initial request to cnn.com leads to a 2.6 MB HAR file with 8,852 lines of formatted JSON. So how do we find the requests we are looking for?

Just scanning through such a file is clearly not the best option.

./jq to the rescue

Luckily, Stephen Dolan wrote a nifty tool to query JSON files: jq. With jq, finding our favicon requests is as simple as using sed:

jq '[.log.entries[] | select(.request.url | startswith("http://edition.cnn.com/favicon"))]' < cnn.har
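
The query above returns the complete matching entries, which can still be a lot of JSON. As a minimal sketch of how to narrow things down further, the example below keeps only the URL, status and time of each match. It assumes the file follows the log.entries layout used in the query above; sample.har and its two entries are made up for illustration and are not taken from the cnn.com capture.

```shell
# Create a tiny, made-up HAR file so the example runs on its own.
# (sample.har and its contents are hypothetical.)
cat > sample.har <<'EOF'
{
  "log": {
    "entries": [
      {
        "time": 5.6,
        "request": { "method": "GET", "url": "http://edition.cnn.com/favicon.ie9.ico" },
        "response": { "status": 200 }
      },
      {
        "time": 120.4,
        "request": { "method": "GET", "url": "http://edition.cnn.com/index.html" },
        "response": { "status": 200 }
      }
    ]
  }
}
EOF

# Select the favicon entries, then build a small object per match
# that keeps only the URL, the response status and the time.
# -c prints each result as compact JSON on a single line.
jq -c '.log.entries[]
  | select(.request.url | startswith("http://edition.cnn.com/favicon"))
  | {url: .request.url, status: .response.status, time: .time}' sample.har
```

With the sample file, only the favicon entry passes the select filter, so jq prints one compact object for it. Against a real capture, replace sample.har with your own file, such as cnn.har.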

Happy coding!

Want to learn more about coding? Have a look at our other articles.


Photo: Ronny Roeller