Data wrangling with jq

jq is..
“a lightweight and flexible command-line JSON processor.”
“like sed for JSON data — you can use it to slice and filter and map and transform structured data with the same ease that sed, awk, grep and friends let you play with text.”
These data wrangling sessions assume you are somewhat familiar with jq — if you are new visit https://stedolan.github.io/jq/ and start digging. There is an interactive playground to help you get started, but I recommend getting your hands dirty and working with it locally on your machine :)
Working with json files from separate input streams.
One can easily pipe data into jq via STDIN:
$ echo ‘{“name”:”blossm”}’ | jq ‘.name’
$ “blossm”jq can also read input via it’s --slurpfile argument and assign a variable name to the json stream in your jq program:
//data.json
{"name": "blossm"}--------------------------------$ jq '$myVariablename | .name' — slurpfile myVariableName data.json
$ "blossm"
We can leverage these capabilities to operate on multiple streams inside our jq filter.
Here we’ll use the key "name"stored in data.json to access a property in the json stream being piped via STDIN:
//dataKeys.json
["name"]-----------------------------------$ echo '{"name":"blossm"}' | jq '.[$dataKeys[0][0]]' --slurpfile dataKeys dataKeys.json
$ "blossm"
While the example above is somewhat contrived it exemplifies the expressiveness of jq in more complex scenarios.
Let’s say we want to enrich one json file with data from another:
//fileA.json
[
{
"name": "react",
"description": "A JAVASCRIPT LIBRARY FOR BUILDING USER INTERFACES"
},
{
"name": "webpack",
"description": "webpack is a module bundler for modern JavaScript applications."
}
]------------------------------------// fileB.json
[
{
"name": "react",
"url": "https://facebook.github.io/react/"
},
{
"name": "webpack",
"url": "https://webpack.github.io/"
}
]
Let’s map over fileA.json and add website_url properties containing the corresponding url properties found in fileB.json. We will use the name property for looking up the corresponding objects.
NOTE: There are countless ways to achieve this — here I am primarily focused on demonstrating jq’s ability to mix incoming streams into a single program.
I am also using jq’s map, select and recursive decent functions along with variable bindings to get the job done — go here to learn more:
cat fileA.json | jq 'map({name, description, website_url:(.name as $name|$fileBData|..|select(.name? and .name==$name))})' --slurpfile fileBData fileB.
jsonAt a glance this looks somewhat complex — the syntax can get cryptic pretty quick. That’s OK.
Basically we are saying:
- Map over
fileA.json - Retain it’s
nameanddescriptionproperties - Add a
website_urlproperty with a value corresponding to theurlproperty of an object infileB.json - Join objects using the
nameproperty common to both objects
When you are working with jq step into your program one piece at a time. You can clean it up later and define your own jq functions to get organized. More often than not you are trying to understand your data; learn about it’s structure and develop intuition around how to work with it — jq let’s you do this quickly without setting up a bunch of tooling and project scaffolding.
Get what you need faster by getting your hands dirty — you’ll know if it’s actually worth the time investment to take it a step further. You’ll thank yourself for not wasting a bunch of time writing high level code without really knowing your data.
Give jq a try if you haven’t already — it’s a powerful tool and you just might impress yourself with what you can do with it. I look forward to hearing any alternate approaches or insights on solving this problem with jq.
