Mangling JSON Data with jq
Introduction
This is how jq is described on its homepage:
jq is like sed for JSON data - you can use it to slice and filter and map and transform structured data with the same ease that sed, awk, grep and friends let you play with text.
jq is written in portable C, and it has zero runtime dependencies. You can download a single binary, scp it to a far away machine of the same type, and expect it to work.
jq can mangle the data format that you have into the one that you want with very little effort, and the program to do so is often shorter and simpler than you’d expect.
Now, to understand a bit better what jq really is, let me quote a few more paragraphs from its docs. If you are familiar with how piping is used in bash, you may notice a slight resemblance to its philosophy.
A jq program is a “filter”: it takes an input, and produces an output. There are a lot of builtin filters for extracting a particular field of an object, or converting a number to a string, or various other standard tasks.
Filters can be combined in various ways - you can pipe the output of one filter into another filter, or collect the output of a filter into an array.
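To make the two ways of combining filters concrete, here is a minimal sketch. The inline sample is a tiny stand-in for real data, and the variable names are only illustrative.

```shell
# Tiny inline sample so the snippet runs on its own (illustrative data).
squad='{"members":[{"name":"Molecule Man","age":29},{"name":"Madame Uppercut","age":39}]}'

# Pipe: each output of .members[] becomes the input of .name
names=$(printf '%s\n' "$squad" | jq '.members[] | .name')

# Collect: wrapping a filter in [...] gathers all of its outputs into one array
# (-c prints the result compactly on a single line)
names_array=$(printf '%s\n' "$squad" | jq -c '[.members[] | .name]')

printf '%s\n' "$names" "$names_array"
```

The piped version emits one JSON value per line, while the bracketed version emits a single array containing the same values.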
Now this may seem a bit intimidating; after all, we just wanted to mangle some JSON. So let’s get our hands dirty doing that on some superhero-themed examples. By the end of this article you will no longer be that guy who is still using grep on JSON.
Processing superhero data with jq
We find a JSON document suitable for our processing needs and save it somewhere handy.
https://developer.mozilla.org/en-US/docs/Learn/JavaScript/Objects/JSON#json_structure
Let’s assume we have this JSON-formatted blob of data saved as superherosquad.json and we want to process it with jq.
{
"squadName": "Super hero squad",
"homeTown": "Metro City",
"formed": 2016,
"secretBase": "Super tower",
"active": true,
"members": [
{
"name": "Molecule Man",
"age": 29,
"secretIdentity": "Dan Jukes",
"powers": ["Radiation resistance", "Turning tiny", "Radiation blast"]
},
{
"name": "Madame Uppercut",
"age": 39,
"secretIdentity": "Jane Wilson",
"powers": [
"Million tonne punch",
"Damage resistance",
"Superhuman reflexes"
]
},
{
"name": "Eternal Flame",
"age": 1000000,
"secretIdentity": "Unknown",
"powers": [
"Immortality",
"Heat Immunity",
"Inferno",
"Teleportation",
"Interdimensional travel"
]
}
]
}
Pretty-print a json file
It can be helpful to pipe the response through jq to pretty-print it. The simplest jq program is the expression ., which takes the input and produces it unchanged as output.
cat superherosquad.json | jq '.'
# or without torturing the cat
jq '.' superherosquad.json
Selecting a single field
To select a single field, we write its name prefixed with a dot, for example .homeTown
jq '.homeTown' superherosquad.json
Which outputs
"Metro City"
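The quotes in the output are there because jq prints JSON by default. When the value is going into a shell variable or another command, the -r (raw output) flag prints strings without the JSON quoting. A minimal sketch, using a one-field stand-in for the file:

```shell
# One-field stand-in for superherosquad.json so the snippet runs on its own.
squad='{"homeTown":"Metro City"}'

# Default output is a JSON string, including the double quotes
quoted=$(printf '%s\n' "$squad" | jq '.homeTown')

# -r prints the raw string, which is usually what a shell script wants
raw=$(printf '%s\n' "$squad" | jq -r '.homeTown')

printf '%s\n' "$quoted" "$raw"
```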
Selecting just the superhero array
To select just the array of superheroes we use .members
jq '.members' superherosquad.json
[
{
"name": "Molecule Man",
"age": 29,
"secretIdentity": "Dan Jukes",
"powers": [
"Radiation resistance",
"Turning tiny",
"Radiation blast"
]
},
{
"name": "Madame Uppercut",
"age": 39,
"secretIdentity": "Jane Wilson",
"powers": [
"Million tonne punch",
"Damage resistance",
"Superhuman reflexes"
]
},
{
"name": "Eternal Flame",
"age": 1000000,
"secretIdentity": "Unknown",
"powers": [
"Immortality",
"Heat Immunity",
"Inferno",
"Teleportation",
"Interdimensional travel"
]
}
]
Selecting first superhero from the superhero array
Now to select only the first member of the superhero array we can do
jq '.members[0]' superherosquad.json
{
"name": "Molecule Man",
"age": 29,
"secretIdentity": "Dan Jukes",
"powers": [
"Radiation resistance",
"Turning tiny",
"Radiation blast"
]
}
Array slicing
We can use array slicing syntax when we want to work with a subarray selected by index. For example, the slice
[1:3]
will be of length 2, containing the elements from index 1 (inclusive) to index 3 (exclusive).
jq '.members[1:3]' superherosquad.json
[
{
"name": "Madame Uppercut",
"age": 39,
"secretIdentity": "Jane Wilson",
"powers": [
"Million tonne punch",
"Damage resistance",
"Superhuman reflexes"
]
},
{
"name": "Eternal Flame",
"age": 1000000,
"secretIdentity": "Unknown",
"powers": [
"Immortality",
"Heat Immunity",
"Inferno",
"Teleportation",
"Interdimensional travel"
]
}
]
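Either end of a slice can be left out, and negative indexes count from the end of the array. The sketch below shows a few variations. It uses a names-only copy of the members array, which is an assumption made to keep the snippet short.

```shell
# Names-only copy of the members array so the snippet runs on its own.
squad='{"members":[{"name":"Molecule Man"},{"name":"Madame Uppercut"},{"name":"Eternal Flame"}]}'

# Open start: everything before index 2
first_two=$(printf '%s\n' "$squad" | jq -c '.members[:2] | map(.name)')

# Negative start: the last two elements
last_two=$(printf '%s\n' "$squad" | jq -c '.members[-2:] | map(.name)')

# Negative index: the last element
last_name=$(printf '%s\n' "$squad" | jq -r '.members[-1].name')

printf '%s\n' "$first_two" "$last_two" "$last_name"
```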
Selecting first superpower of the first superhero
Similarly if we wanted to get the first superpower of the first superhero we can simply continue using this syntax chaining it together like so
.members[0].powers[0]
jq '.members[0].powers[0]' superherosquad.json
"Radiation resistance"
Getting the age of all superheroes
Now we may be interested in just a single field from the superhero array, like age. We can get it by using
.members[].age
jq '.members[].age' superherosquad.json
29
39
1000000
Finding maximum age
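To find the maximum age, we might be tempted to fall back on something like sort and head after getting the ages of all superheroes with jq. Note that sort needs -n to compare numerically (a plain sort compares the ages as text) and -r to put the largest value first:
jq '.members[].age' superherosquad.json | sort -rn | head -n 1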
1000000
but there is a way to do this with just jq by using map and max
jq '.members | map(.age) | max' superherosquad.json
1000000
Now you may be thinking that’s not really much better. But imagine you do not just want the max age; you want to output the data of the superhero with the max age. That’s where the sort | head approach falls apart, but with jq you can simply reach for max_by
jq '.members | max_by(.age)' superherosquad.json
{
"name": "Eternal Flame",
"age": 1000000,
"secretIdentity": "Unknown",
"powers": [
"Immortality",
"Heat Immunity",
"Inferno",
"Teleportation",
"Interdimensional travel"
]
}
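max_by has siblings that work the same way: min_by picks the element with the smallest key, and sort_by orders the whole array. A minimal sketch, again on a trimmed copy of the data with only name and age kept:

```shell
# Trimmed copy of the members array (name and age only) so the snippet runs on its own.
squad='{"members":[{"name":"Molecule Man","age":29},{"name":"Madame Uppercut","age":39},{"name":"Eternal Flame","age":1000000}]}'

# min_by: the youngest hero
youngest=$(printf '%s\n' "$squad" | jq -r '.members | min_by(.age) | .name')

# sort_by sorts ascending; reverse flips it to oldest first
oldest_first=$(printf '%s\n' "$squad" | jq -c '.members | sort_by(.age) | reverse | map(.name)')

printf '%s\n' "$youngest" "$oldest_first"
```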
Finding the oldest hero aged 100 or less
This is very similar to the previous example with one change: we first want to filter out any heroes that do not satisfy a boolean predicate; more specifically, we want age <= 100 to be true. We achieve that by adding map(select(.age <= 100)) before the max_by
jq '.members | map(select(.age <= 100)) | max_by(.age)' superherosquad.json
{
"name": "Madame Uppercut",
"age": 39,
"secretIdentity": "Jane Wilson",
"powers": [
"Million tonne punch",
"Damage resistance",
"Superhuman reflexes"
]
}
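Predicates inside select can be combined with and / or, and select also works on a stream of values, without map. A short sketch on a trimmed copy of the data (the age bounds are arbitrary and only for illustration):

```shell
# Trimmed copy of the members array (name and age only) so the snippet runs on its own.
squad='{"members":[{"name":"Molecule Man","age":29},{"name":"Madame Uppercut","age":39},{"name":"Eternal Flame","age":1000000}]}'

# .members[] streams the heroes one by one; select keeps those matching both conditions
in_range=$(printf '%s\n' "$squad" | jq -r '.members[] | select(.age > 30 and .age <= 100) | .name')

printf '%s\n' "$in_range"
```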
Regex matching
Just as we were able to filter for heroes that are not older than 100, we can also use regex matching to select only matching entries. Let’s find all heroes whose name matches the regex
[dl]{1}ame
jq '.members | map(select(.name | test("[dl]{1}ame")))' superherosquad.json
[
{
"name": "Madame Uppercut",
"age": 39,
"secretIdentity": "Jane Wilson",
"powers": [
"Million tonne punch",
"Damage resistance",
"Superhuman reflexes"
]
},
{
"name": "Eternal Flame",
"age": 1000000,
"secretIdentity": "Unknown",
"powers": [
"Immortality",
"Heat Immunity",
"Inferno",
"Teleportation",
"Interdimensional travel"
]
}
]
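test also accepts a second argument with regex flags, for example "i" for case-insensitive matching. The sketch below finds heroes whose name starts with the letter m in either case. The pattern and the names-only sample are illustrative choices, not part of the original example.

```shell
# Names-only copy of the members array so the snippet runs on its own.
squad='{"members":[{"name":"Molecule Man"},{"name":"Madame Uppercut"},{"name":"Eternal Flame"}]}'

# test(regex; flags): "i" makes the match case-insensitive, so ^m also matches "M"
starts_with_m=$(printf '%s\n' "$squad" | jq -c '[.members[] | select(.name | test("^m"; "i")) | .name]')

printf '%s\n' "$starts_with_m"
```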
Renaming and deleting fields
Now we want to rename the array of powers to superpowers and get rid of the field secretIdentity entirely. Once again we reach for map: we copy the values to a new key using .superpowers = .powers and then delete the old keys using del.
jq '.members | map(.superpowers = .powers | del(.powers, .secretIdentity))' superherosquad.json
[
{
"name": "Molecule Man",
"age": 29,
"superpowers": [
"Radiation resistance",
"Turning tiny",
"Radiation blast"
]
},
{
"name": "Madame Uppercut",
"age": 39,
"superpowers": [
"Million tonne punch",
"Damage resistance",
"Superhuman reflexes"
]
},
{
"name": "Eternal Flame",
"age": 1000000,
"superpowers": [
"Immortality",
"Heat Immunity",
"Inferno",
"Teleportation",
"Interdimensional travel"
]
}
]
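The command above outputs only the members array. If the rest of the document (squadName and so on) should be kept, one option is the update-assignment operator |=, which rewrites .members in place. A minimal sketch on a cut-down copy of the data:

```shell
# Cut-down copy of the data (one hero, two top-level fields) so the snippet runs on its own.
squad='{"squadName":"Super hero squad","members":[{"name":"Molecule Man","secretIdentity":"Dan Jukes","powers":["Turning tiny"]}]}'

# |= applies the filter on the right to .members and keeps everything else untouched
updated=$(printf '%s\n' "$squad" | jq -c '.members |= map(.superpowers = .powers | del(.powers, .secretIdentity))')

printf '%s\n' "$updated"
```

The result still contains squadName at the top level, while each member now has superpowers instead of powers and no secretIdentity.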
Generating a custom structure
Now let’s try going the other way around, instead of removing and renaming fields to fit our needs, we want to generate a custom JSON structure using the data in the original JSON.
More specifically we want to use current_age instead of age, superheroname instead of name, add a new field is_cool which is obviously always true and also create a new field first_power which is the first power in the powers array.
jq '.members[] | {current_age: .age, superheroname: .name, is_cool: true, first_power: .powers[0]}' superherosquad.json
{
"current_age": 29,
"superheroname": "Molecule Man",
"is_cool": true,
"first_power": "Radiation resistance"
}
{
"current_age": 39,
"superheroname": "Madame Uppercut",
"is_cool": true,
"first_power": "Million tonne punch"
}
{
"current_age": 1000000,
"superheroname": "Eternal Flame",
"is_cool": true,
"first_power": "Immortality"
}
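Note that the output above is a stream of separate objects, not one JSON document. To get a single array, the whole filter can be wrapped in [...]. The sketch below also uses the {name} shorthand, which means the same as {name: .name}. It runs on a trimmed copy of the data.

```shell
# Trimmed copy of the members array (name and powers only) so the snippet runs on its own.
squad='{"members":[{"name":"Molecule Man","powers":["Radiation resistance","Turning tiny"]},{"name":"Madame Uppercut","powers":["Million tonne punch"]},{"name":"Eternal Flame","powers":["Immortality","Inferno"]}]}'

# [...] collects the generated objects into one array; {name} is short for {name: .name}
result=$(printf '%s\n' "$squad" | jq -c '[.members[] | {name, first_power: .powers[0]}]')

printf '%s\n' "$result"
```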
Other tools
There are many more tools like jq for other common formats such as XML, YAML, TOML, CSV etc.
- Go yq: a lightweight and portable command-line YAML, JSON and XML processor https://github.com/mikefarah/yq/
- Python yq: jq wrapper for YAML, XML, TOML documents https://kislyuk.github.io/yq/
- xmlstarlet for XML https://xmlstar.sourceforge.net/doc/UG/xmlstarlet-ug.html
- xsv for CSV https://github.com/BurntSushi/xsv
Conclusion
JSON is a widely employed structured data format used in most modern APIs and data services. Unfortunately, shells such as bash have no good way of working with structured data like JSON directly. This means that working with JSON on the command line can be painful, just as trying to parse HTML with regex is. You can fix this awkwardness by using jq, a cool command-line processor for JSON.
This was more of a hands-on introduction to the basics of jq. For a more formal introduction I really recommend reading through man jq and also this in-browser interactive jq guide https://ishan.page/blog/2023-11-06-jq-by-example/
The key takeaway here is that using the right tool for the job, instead of trying to hack something together in pure bash or whatever your favourite shell is, may make your life significantly less miserable.