knowledge/technology/applications/cli/jq.md
2024-01-17 09:00:45 +01:00


obj: application
website: https://jqlang.github.io/jq/
repo: https://github.com/jqlang/jq

jq

jq is a lightweight and flexible command-line JSON processor, akin to sed, awk, grep, and friends, but for JSON data. It's written in portable C and has zero runtime dependencies, allowing you to easily slice, filter, map, and transform structured data.

Usage

cat data.json | jq [FILTER]

# Raw output: print strings without JSON quotes
cat data.json | jq -r [FILTER]

Filters

Identity

The absolute simplest filter is . (a single dot). This filter takes its input and produces the same value as output; that is, it is the identity operator.

Object Identifier

The simplest useful filter has the form .foo. When given a JSON object (aka dictionary or hash) as input, .foo produces the value at the key "foo" if the key is present, or null otherwise.

The .foo syntax only works for simple, identifier-like keys, that is, keys that are all made of alphanumeric characters and underscore, and which do not start with a digit.

If the key contains special characters or starts with a digit, you need to surround it with double quotes like this: ."foo$", or else .["foo$"].
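A minimal sketch of the lookups described above, assuming jq is installed and on the PATH. The sample object and its key names are made up for illustration.

```shell
# Hypothetical sample input; the key names are illustrative only.
input='{"foo": 42, "foo$": 7, "1st": "x"}'

echo "$input" | jq '.foo'       # 42
echo "$input" | jq '.missing'   # null, because the key is absent
echo "$input" | jq '."foo$"'    # 7, quoted form for a key with a special character
echo "$input" | jq '.["1st"]'   # "x", bracket form for a key starting with a digit
```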

Array Index

When the index value is an integer, .[<number>] can index arrays. Arrays are zero-based, so .[2] returns the third element.

Negative indices are allowed, with -1 referring to the last element, -2 referring to the next to last element, and so on.
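As a quick illustration of positive and negative indexing, here is a small sketch on a made-up four-element array (jq is assumed to be on the PATH):

```shell
# Hypothetical sample array.
letters='["a","b","c","d"]'

echo "$letters" | jq '.[2]'    # "c", the third element (zero-based)
echo "$letters" | jq '.[-1]'   # "d", the last element
echo "$letters" | jq '.[-2]'   # "c", the next to last element
echo "$letters" | jq '.[10]'   # null, the index is out of range
```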

Array/String Slice

The .[<number>:<number>] syntax can be used to return a subarray of an array or substring of a string. The array returned by .[10:15] will be of length 5, containing the elements from index 10 (inclusive) to index 15 (exclusive). Either index may be negative (in which case it counts backwards from the end of the array), or omitted (in which case it refers to the start or end of the array). Indices are zero-based.
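The slice rules above can be tried out as follows. This is only a sketch on made-up inputs; -c asks jq for compact, single-line output.

```shell
# Hypothetical sample array.
nums='[0,1,2,3,4,5]'

echo "$nums" | jq -c '.[2:4]'    # [2,3], index 2 inclusive to index 4 exclusive
echo "$nums" | jq -c '.[:2]'     # [0,1], start omitted
echo "$nums" | jq -c '.[-2:]'    # [4,5], negative start, end omitted
echo '"abcdef"' | jq '.[2:4]'    # "cd", slicing also works on strings
```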

Array/Object Value Iterator

If you use the .[index] syntax, but omit the index entirely, it will return all of the elements of an array. Running .[] with the input [1,2,3] will produce the numbers as three separate results, rather than as a single array. A filter of the form .foo[] is equivalent to .foo | .[].

You can also use this on an object, and it will return all the values of the object.

Note that the iterator operator is a generator of values.
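A short sketch of the iterator on an array, an object, and a nested field. The inputs are invented for illustration; each result is printed on its own line.

```shell
# Array: three separate results, not one array.
echo '[1,2,3]' | jq '.[]'            # 1, 2, 3 (one per line)

# Object: the values only, keys are dropped.
echo '{"a":1,"b":2}' | jq '.[]'      # 1, 2 (one per line)

# .foo[] behaves like .foo | .[]
echo '{"foo":[10,20]}' | jq '.foo[]' # 10, 20 (one per line)
```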

Comma

If two filters are separated by a comma, then the same input will be fed into both and the two filters' output value streams will be concatenated in order: first, all of the outputs produced by the left expression, and then all of the outputs produced by the right. For instance, the filter .foo, .bar produces both the "foo" fields and "bar" fields as separate outputs.

The , operator is one way to construct generators.
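To see the ordering of the concatenated streams, here is a small sketch on a made-up object (field names are hypothetical):

```shell
# Hypothetical sample input.
doc='{"foo":42,"bar":["x","y"]}'

echo "$doc" | jq -c '.foo, .bar'     # 42, then ["x","y"]
echo "$doc" | jq -c '.bar[], .foo'   # "x", "y", then 42: all left outputs come first
```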

Pipe

The | operator combines two filters by feeding the output(s) of the one on the left into the input of the one on the right. It's similar to the Unix shell's pipe, if you're used to that.

If the one on the left produces multiple results, the one on the right will be run for each of those results. So, the expression .[] | .foo retrieves the "foo" field of each element of the input array. This is a cartesian product, which can be surprising.

Note that .a.b.c is the same as .a | .b | .c.

Note too that . is the input value at the particular stage in a "pipeline", specifically: where the . expression appears. Thus .a | . | .b is the same as .a.b, as the . in the middle refers to whatever value .a produced.
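The equivalences above can be checked directly. The following is a sketch on invented inputs, assuming jq is on the PATH:

```shell
# The right-hand filter runs once per result of the left-hand filter.
echo '[{"foo":1},{"foo":2}]' | jq '.[] | .foo'   # 1, then 2

# Hypothetical nested input.
nested='{"a":{"b":{"c":"deep"}}}'

echo "$nested" | jq '.a.b.c'           # "deep"
echo "$nested" | jq '.a | .b | .c'     # "deep", same as .a.b.c
echo "$nested" | jq -c '.a | . | .b'   # {"c":"deep"}, same as .a.b
```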

Array Construction: []

As in JSON, [] is used to construct arrays, as in [1,2,3]. The elements of the arrays can be any jq expression, including a pipeline. All of the results produced by all of the expressions are collected into one big array. You can use it to construct an array out of a known quantity of values (as in [.foo, .bar, .baz]) or to "collect" all the results of a filter into an array (as in [.items[].name]).

Once you understand the "," operator, you can look at jq's array syntax in a different light: the expression [1,2,3] is not using a built-in syntax for comma-separated arrays, but is instead applying the [] operator (collect results) to the expression 1,2,3 (which produces three different results).

If you have a filter X that produces four results, then the expression [X] will produce a single result, an array of four elements.
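A brief sketch of both uses of [], on a made-up input. The builtin range(4) stands in for "a filter X that produces four results"; -n tells jq to run without reading any input.

```shell
# Hypothetical sample input.
doc='{"foo":1,"bar":2,"items":[{"name":"a"},{"name":"b"}]}'

echo "$doc" | jq -c '[.foo, .bar]'        # [1,2], a known number of values
echo "$doc" | jq -c '[.items[].name]'     # ["a","b"], collecting a stream

# range(4) emits 0,1,2,3 as four results; [] collects them into one array.
jq -nc '[range(4)]'                       # [0,1,2,3]
```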

Object Construction: {}

Like JSON, {} is for constructing objects (aka dictionaries or hashes), as in: {"a": 42, "b": 17}.

If the keys are "identifier-like", then the quotes can be left off, as in {a:42, b:17}. Variable references as key expressions use the value of the variable as the key. Key expressions other than constant literals, identifiers, or variable references, need to be parenthesized, e.g., {("a"+"b"):59}.

The value can be any expression (although you may need to wrap it in parentheses if, for example, it contains colons), which gets applied to the {} expression's input (remember, all filters have an input and an output).

{foo: .bar}

will produce the JSON object {"foo": 42} if given the JSON object {"bar":42, "baz":43} as its input. You can use this to select particular fields of an object: if the input is an object with "user", "title", "id", and "content" fields and you just want "user" and "title", you can write

{user: .user, title: .title}

Because that is so common, there's a shortcut syntax for it: {user, title}.

If one of the expressions produces multiple results, multiple dictionaries will be produced. If the input's

{"user":"stedolan","titles":["JQ Primer", "More JQ"]}

then the expression

{user, title: .titles[]}

will produce two outputs:

{"user":"stedolan", "title": "JQ Primer"}
{"user":"stedolan", "title": "More JQ"}

Putting parentheses around the key means it will be evaluated as an expression. With the same input as above,

{(.user): .titles}

produces

{"stedolan": ["JQ Primer", "More JQ"]}

Functions

has(key)

The builtin function has returns whether the input object has the given key, or the input array has an element at the given index.
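A small sketch of has on an object and on an array (inputs are invented). Note that has checks for the presence of the key, not for a non-null value.

```shell
echo '{"foo":1}' | jq 'has("foo")'      # true
echo '{"foo":1}' | jq 'has("bar")'      # false
echo '{"foo":null}' | jq 'has("foo")'   # true, the key exists even though its value is null
echo '[10,20]' | jq 'has(1)'            # true, index 1 exists
echo '[10,20]' | jq 'has(2)'            # false, index 2 is past the end
```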

map(f), map_values(f)

For any filter f, map(f) and map_values(f) apply f to each of the values in the input array or object, that is, to the values of .[].

In the absence of errors, map(f) always outputs an array whereas map_values(f) outputs an array if given an array, or an object if given an object.

When the input to map_values(f) is an object, the output object has the same keys as the input object except for those keys whose values when piped to f produce no values at all.

map(f) is equivalent to [.[] | f] and map_values(f) is equivalent to .[] |= f.
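The difference between the two is easiest to see on an object. This is a sketch on made-up data; the last line assumes a reasonably recent jq (1.6 or later), where keys whose value yields no output are dropped.

```shell
# Hypothetical sample object.
obj='{"a":1,"b":2}'

echo '[1,2,3]' | jq -c 'map(. + 1)'                # [2,3,4]
echo "$obj" | jq -c 'map(. + 1)'                   # [2,3], map always builds an array
echo "$obj" | jq -c 'map_values(. + 1)'            # {"a":2,"b":3}, keys are kept
echo "$obj" | jq -c 'map_values(select(. > 1))'    # {"b":2}, "a" produced no value
```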

del(path)

The builtin function del removes a key and its corresponding value from an object.
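For example, on a made-up object (a sketch; several paths can be given at once, separated by commas):

```shell
# Hypothetical sample object.
obj='{"foo":1,"bar":2,"baz":3}'

echo "$obj" | jq -c 'del(.foo)'         # {"bar":2,"baz":3}
echo "$obj" | jq -c 'del(.foo, .bar)'   # {"baz":3}
```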

reverse

This function reverses an array.

contains(element)

The filter contains(b) will produce true if b is completely contained within the input. A string B is contained in a string A if B is a substring of A. An array B is contained in an array A if all elements in B are contained in any element in A. An object B is contained in object A if all of the values in B are contained in the value in A with the same key. All other types are assumed to be contained in each other if they are equal.
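The containment rules for strings, arrays, and objects can be sketched as follows. The values are invented; -n runs the filter without reading input.

```shell
# String: substring test.
jq -n '"foobar" | contains("bar")'                     # true

# Array: every element of the argument must be contained in some input element.
jq -n '["foobar","baz"] | contains(["ba"])'            # true, "ba" is a substring of "foobar"
jq -n '[1,2] | contains([3])'                          # false

# Object: compared key by key, recursively.
jq -n '{"a":[1,2],"b":"x"} | contains({"a":[1]})'      # true
jq -n '{"a":[1,2],"b":"x"} | contains({"c":1})'        # false, no key "c"
```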

startswith(str)

Outputs true if . starts with the given string argument.

endswith(str) 

Outputs true if . ends with the given string argument.
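A quick sketch of startswith and endswith on made-up strings, including their use inside map:

```shell
jq -n '"foobar" | startswith("foo")'    # true
jq -n '"foobar" | endswith("foo")'      # false
jq -n '"foobar" | endswith("bar")'      # true

# Applied to every string of a hypothetical array.
echo '["fo","foo","barfoo"]' | jq -c 'map(startswith("foo"))'   # [false,true,false]
```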

split(str)

Splits an input string on the separator argument.

join(str)

Joins the array of elements given as input, using the argument as separator. It is the inverse of split: that is, running split("foo") | join("foo") over any input string returns said input string.
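The round trip between split and join can be sketched like this (sample strings are made up; -r prints the resulting string without JSON quotes):

```shell
jq -nc '"a, b,cd" | split(", ")'              # ["a","b,cd"]
jq -nr '["a","b","c"] | join("-")'            # a-b-c
jq -nr '"a-b-c" | split("-") | join("-")'     # a-b-c, the original string again
```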

Conditionals

if-then-else-end

if A then B else C end will act the same as B if A produces a value other than false or null, but act the same as C otherwise.

if A then B end is the same as if A then B else . end. That is, the else branch is optional (in jq 1.7 and later), and if absent it behaves like else . (the identity filter). This also applies to an elif chain without a final else branch.

Checking for false or null is a simpler notion of "truthiness" than is found in JavaScript or Python, but it means that you'll sometimes have to be more explicit about the condition you want. You can't test whether, e.g. a string is empty using if .name then A else B end; you'll need something like if .name == "" then A else B end instead.

If the condition A produces multiple results, then B is evaluated once for each result that is not false or null, and C is evaluated once for each false or null.

More cases can be added to an if using elif A then B syntax.

Example: jq 'if . == 0 then "zero" elif . == 1 then "one" else "many" end'
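Running the example filter against a few inputs, plus a check of the truthiness rule described above. This is only a sketch; the variable name filter is an arbitrary choice.

```shell
# The filter from the example above, stored in a shell variable for reuse.
filter='if . == 0 then "zero" elif . == 1 then "one" else "many" end'

echo 0 | jq -r "$filter"   # zero
echo 1 | jq -r "$filter"   # one
echo 5 | jq -r "$filter"   # many

# Only false and null are falsy; an empty string still counts as true.
jq -nr '"" | if . then "truthy" else "falsy" end'     # truthy
jq -nr 'null | if . then "truthy" else "falsy" end'   # falsy
```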

Alternative Operator //

The // operator produces all the values of its left-hand side that are neither false nor null, or, if the left-hand side produces no values other than false or null, then // produces all the values of its right-hand side.

A filter of the form a // b produces all the results of a that are not false or null. If a produces no results, or no results other than false or null, then a // b produces the results of b.

This is useful for providing defaults: .foo // 1 will evaluate to 1 if there's no .foo element in the input, or if .foo is null or false.
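A short sketch of the default-value behaviour on made-up inputs. Note that 0 is not treated as missing, since only false and null trigger the right-hand side.

```shell
echo '{}' | jq '.foo // 1'               # 1, key is absent
echo '{"foo":null}' | jq '.foo // 1'     # 1, value is null
echo '{"foo":false}' | jq '.foo // 1'    # 1, value is false
echo '{"foo":19}' | jq '.foo // 1'       # 19
echo '{"foo":0}' | jq '.foo // 1'        # 0, zero is neither false nor null
```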