Mangling JSON Data with jq

Posted on Dec 25, 2023
tl;dr: Using jq can help you with manipulation of JSON data.

Introduction

https://jqlang.github.io/jq/

This is how jq is described at it’s homepage

jq is like sed for JSON data - you can use it to slice and filter and map and transform structured data with the same ease that sed, awk, grep and friends let you play with text.

jq is written in portable C, and it has zero runtime dependencies. You can download a single binary, scp it to a far away machine of the same type, and expect it to work.

jq can mangle the data format that you have into the one that you want with very little effort, and the program to do so is often shorter and simpler than you’d expect.

Now to understand a bit better what jq really is let my quote few more paragraphs from its docs. If you are familiar with how piping in bash is used you may find a slight resamblence to its philosophy.

A jq program is a “filter”: it takes an input, and produces an output. There are a lot of builtin filters for extracting a particular field of an object, or converting a number to a string, or various other standard tasks.

Filters can be combined in various ways - you can pipe the output of one filter into another filter, or collect the output of a filter into an array.

Now this may seem a bit intimidating, afterall we just wanted to mangle some JSON, so let’s get our hand dirty doing that on some superhero themed examples. By the end of this article you will no longer be that guy who is still using grep on JSON.

Processing superhero data with jq

We find a JSON suitable for our processing needs and save it somewhere handy.

https://developer.mozilla.org/en-US/docs/Learn/JavaScript/Objects/JSON#json_structure

Let’s assume we have this json formatted blob of data saved as superherosquad.json and we want to process it with jq.

{
  "squadName": "Super hero squad",
  "homeTown": "Metro City",
  "formed": 2016,
  "secretBase": "Super tower",
  "active": true,
  "members": [
    {
      "name": "Molecule Man",
      "age": 29,
      "secretIdentity": "Dan Jukes",
      "powers": ["Radiation resistance", "Turning tiny", "Radiation blast"]
    },
    {
      "name": "Madame Uppercut",
      "age": 39,
      "secretIdentity": "Jane Wilson",
      "powers": [
        "Million tonne punch",
        "Damage resistance",
        "Superhuman reflexes"
      ]
    },
    {
      "name": "Eternal Flame",
      "age": 1000000,
      "secretIdentity": "Unknown",
      "powers": [
        "Immortality",
        "Heat Immunity",
        "Inferno",
        "Teleportation",
        "Interdimensional travel"
      ]
    }
  ]
}

Pretty-print a json file

It can be helpful to pipe the response through jq to pretty-print it. The simplest jq program is the expression ., which takes the input and produces it unchanged as output.

cat superherosquad.json | jq '.'
# or without torturing the cat
jq '.' superherosquad.json

Selecting a single field

To select a single field we can just use the name of the field .homeTown

jq '.homeTown' superherosquad.json

Which outputs

"Metro City"

Selecting just the superhero array

To select just the array of superheros w

jq '.members' superherosquad.json
[
  {
    "name": "Molecule Man",
    "age": 29,
    "secretIdentity": "Dan Jukes",
    "powers": [
      "Radiation resistance",
      "Turning tiny",
      "Radiation blast"
    ]
  },
  {
    "name": "Madame Uppercut",
    "age": 39,
    "secretIdentity": "Jane Wilson",
    "powers": [
      "Million tonne punch",
      "Damage resistance",
      "Superhuman reflexes"
    ]
  },
  {
    "name": "Eternal Flame",
    "age": 1000000,
    "secretIdentity": "Unknown",
    "powers": [
      "Immortality",
      "Heat Immunity",
      "Inferno",
      "Teleportation",
      "Interdimensional travel"
    ]
  }
]

Selecting first superhero from the superhero array

Now to select only the first member of the superhero array we can do

jq '.members[0]' superherosquad.json
{
  "name": "Molecule Man",
  "age": 29,
  "secretIdentity": "Dan Jukes",
  "powers": [
    "Radiation resistance",
    "Turning tiny",
    "Radiation blast"
  ]
}

Array slicing

We can use array slicing syntax when accessing array, if we want to work with subarray based on the indexes like this

[1:3]

will be of length 2, containing the elements from index 1 (inclusive) to index 3 (exclusive).

jq '.members[1:3]' superherosquad.json
[
  {
    "name": "Madame Uppercut",
    "age": 39,
    "secretIdentity": "Jane Wilson",
    "powers": [
      "Million tonne punch",
      "Damage resistance",
      "Superhuman reflexes"
    ]
  },
  {
    "name": "Eternal Flame",
    "age": 1000000,
    "secretIdentity": "Unknown",
    "powers": [
      "Immortality",
      "Heat Immunity",
      "Inferno",
      "Teleportation",
      "Interdimensional travel"
    ]
  }
]

Selecting first superpower of the first superhero

Similarly if we wanted to get the first superpower of the first superhero we can simply continue using this syntax chaining it together like so

.members[0].powers[0]
 jq '.members[0].powers[0]' superherosquad.json
"Radiation resistance"

Getting age of all superheros

Now we may be intrested in a just a single field from the superhero array like age, we can simply do the by doing

.members[].age
jq '.members[].age' superherosquad.json
29
39
1000000

Finding maximum age

To find a maximum age we might be tempted to do fallback to something like sort and head after getting the age of all superheros using jq like this

jq '.members[].age' superherosquad.json | sort | head -n 1
1000000

but there is a way to do this with just jq by using map and max

jq '.members | map(.age) | max' superherosquad.json
1000000

now you may be thinking well thats not really that much better, but now imagine you do not want to just find the max age but you actually want to output the data of superhero with max age, that’s where the sort | head approach falls apart, but with jq you can simply reach to max_by

jq '.members | max_by(.age)' superherosquad.json
{
  "name": "Eternal Flame",
  "age": 1000000,
  "secretIdentity": "Unknown",
  "powers": [
    "Immortality",
    "Heat Immunity",
    "Inferno",
    "Teleportation",
    "Interdimensional travel"
  ]
}

Finding a hero with max age less than 100

This is very similar to the previous example with one change, we first want to filter out any heroes which do not satisfy a boolean predicate, more specifically we want age <= 100 to be true. We achieve that by adding map(select(.age <= 100)) before we do the max_by

jq '.members | map(select(.age <= 100)) | max_by(.age)' superherosquad.json
{
  "name": "Madame Uppercut",
  "age": 39,
  "secretIdentity": "Jane Wilson",
  "powers": [
    "Million tonne punch",
    "Damage resistance",
    "Superhuman reflexes"
  ]
}

Regex matching

Just as we were able to filter for heroes that are not older than 100, we can also do regex matching to select only matching entries, so lets find all heroes with name matching regex

[dl]{1}ame
jq '.members | map(select(.name | test("[dl]{1}ame")))' superherosquad.json
[
  {
    "name": "Madame Uppercut",
    "age": 39,
    "secretIdentity": "Jane Wilson",
    "powers": [
      "Million tonne punch",
      "Damage resistance",
      "Superhuman reflexes"
    ]
  },
  {
    "name": "Eternal Flame",
    "age": 1000000,
    "secretIdentity": "Unknown",
    "powers": [
      "Immortality",
      "Heat Immunity",
      "Inferno",
      "Teleportation",
      "Interdimensional travel"
    ]
  }
]

Renaming and deleting fields

Now we want to rename the array of powers to superpowers and get rid of the field secretIdentity entirely. Once again we reach out for map to assign the values to a new key using .superpowers = .powers and delete them after using del.

jq '.members | map(.superpowers = .powers| del(.powers, .secretIdentity))' superherosquad.json
[
  {
    "name": "Molecule Man",
    "age": 29,
    "superpowers": [
      "Radiation resistance",
      "Turning tiny",
      "Radiation blast"
    ]
  },
  {
    "name": "Madame Uppercut",
    "age": 39,
    "superpowers": [
      "Million tonne punch",
      "Damage resistance",
      "Superhuman reflexes"
    ]
  },
  {
    "name": "Eternal Flame",
    "age": 1000000,
    "superpowers": [
      "Immortality",
      "Heat Immunity",
      "Inferno",
      "Teleportation",
      "Interdimensional travel"
    ]
  }
]

Generating a custom structure

Now let’s try going the other way around, instead of removing and renaming fields to fit our needs, we want to generate a custom JSON structure using the data in the original JSON.

More specifically we want to use current_age instead of age, superheroname instead of name, add a new field is_cool which is obviously always true and also create a new field first_power which is the first power in the powers array.

jq '.members[] | {current_age: .age, superheroname: .name, is_cool: true, first_power: .powers[0]}' superherosquad.json
{
  "current_age": 29,
  "superheroname": "Molecule Man",
  "is_cool": true,
  "first_power": "Radiation resistance"
}
{
  "current_age": 39,
  "superheroname": "Madame Uppercut",
  "is_cool": true,
  "first_power": "Million tonne punch"
}
{
  "current_age": 1000000,
  "superheroname": "Eternal Flame",
  "is_cool": true,
  "first_power": "Immortality"
}

Other tools

There is many more like jq for other common formats like XML, yaml, toml, csv etc.

Conclusion

JSON is a widely employed structured data format typically used in most modern APIs and data services. Unfortunately, shells such as bash have no good way of working with structured data like JSON directly. This means that working with JSON via the command line can be painful, just as trying to parse HTML with regex is. You can fix this awkwardness by using jq a cool command-line processor for JSON.

This was more of a hands on introduction to basics of jq for a more formal introduction I really recommend reading trough man jq and also this in-browser interactive jq guide https://ishan.page/blog/2023-11-06-jq-by-example/

Now the key takeaway here is that using the right tool for the job instead of trying to hack something together with just pure bash or whatever you favourite shell is may make you life significantly less miserable.