-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Internals: backtracking
In jq every expression, in every context, can be a generator, and backtracking is an essential part of how generators work.
Not every expression generates any number of values other than 1. For example, null
, false
, true
, numeric literals, and string literals that do no interpolation all produce just one value -- not zero, not two, not more, just one. But expressions involving ,
, .[]
, range
, and/or recursion can generate multiple values.
empty
, of course, generates no values -- it backtracks!
The pipe operator, |
, simply concatenates the AST of the left-hand side with the right-hand side's:
Exp '|' Exp {
$$ = block_join($1, $3);
} |
How can it be that simple?! It's not magic. It has to do with how the jq bytecode VM works.
The VM executes instructions in order, from the first to the last, just as one would expect. At any point in the execution the value at the top of the jq stack is the value that corresponds to .
at that point in the program's execution. Each instruction may manipulate the stack in some way, possibly consume the .
value, and if it consumes it, push a new one.
The BACKTRACK
opcode reverse program flow, backtracking the execution of the bytecode to the nearest stack save point.
Generators create save points when they still have values to yield, and on backtrack they resume the VM's forward execution order, but with a new value (the next output of the generator).
Generators can also initiate backtracking when they run out of outputs. They can also produce a final output without creating a save point, in which case on backtrack that generator will not be resumed as it's no longer there.
The ,
operator corresponds to the FORK
opcode. FORK
instructions have an associated jump target: the address of the first instruction of the right-hand side of the ,
. When a FORK
is executed it creates a save point. When backtracking FORK
simply jumps to the right-hand side's first instruction without creating a save point.
When a function or program outputs a value it executes a RET
instruction, and that creates a save point. For the top-level program the RET
instruction causes the C function jq_next()
to return, though the state of the VM and its stack remain as-is, saved in the jq_state
structure. When the application calls jq_next()
again the VM initiates backtracking, which causes the nearest save point's generator to be resumed -- if there are no save points, or if none of them do anything other than backtrack, then jq_next()
will output a jv_invalid()
to indicate that the top-level program has stopped producing outputs.
When jq_next()
returns jv_invalid()
to indicate that the program is exhausted, the application can reset the program state and execute it again with a new input.
Thus the jq processor simply does that: it loops over inputs running the jq program for each input, and for each input it loops over jq_next()
.
So then joining two block
s of jq program then simply causes the output of the one on the left to be at the top of the stack when the one on the right begins executing.
|
is not where the interesting magic for generators and backtracking is! The magic is as described above, in: a) stack save points, b) FORK
, EACH
, and RANGE
, c) in BACKTRACK
and RET
and any instruction that can initiate backtracking, d) in builtin C-coded functions that can return a jv_invalid()
(backtrack) or jv_invalid_with_msg()
(error).
- Home
- FAQ
- jq Language Description
- Cookbook
- Modules
- Parsing Expression Grammars
- Docs for Oniguruma Regular Expressions (RE.txt)
- Advanced Topics
- Guide for Contributors
- How To
- C API
- jq Internals
- Tips
- Development