[#1, #4] Function Extensions and Other Changes #32

Merged: hiltontj merged 66 commits into main from 4-error-handling on Apr 1, 2023

hiltontj (Owner) commented Mar 31, 2023

The primary focus of this PR is the introduction of support for Function Extensions in JSONPath query strings (closes #1). In addition, a refactored error system was introduced (related to #4), the crate was converted to a workspace, and tracing was introduced.

Function Extensions

The JSONPath spec outlines Function Extensions. These allow for the use of functions within queries, e.g.,

$[? length(@.foo) > 5 ]

There is a standard registry of functions, currently comprising the length, count, match, search, and value functions. There is also a type system that the standard itself, as well as implementors of the standard, can use to extend JSONPath via the creation of additional functions.

In this PR, support for Function Extensions has been added via the functions module, as well as the #[function] attribute macro.

serde_json_path will now support the standard registered functions defined in the JSONPath specification. Currently that is limited to the five listed above, but will grow as the standard grows.

The Type System and Parse-time Function Validation

serde_json_path validates the use of functions in query strings at parse time. There is a concept of well-typedness in the standard that defines where functions, based on their signature, can be used within queries. These rules are enforced for functions by serde_json_path.

This PR therefore introduces three new types to support those defined in the standard: ValueType, NodesType, and LogicalType. These are used internally by serde_json_path to implement the standard registry functions, but are also exposed through the public API for users to use in their own custom functions, along with the #[function] attribute macro.
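
To make the idea of well-typedness concrete, here is a minimal, std-only sketch of a parse-time check. It is a simplified model and not the crate's implementation: `JsonPathType`, `Position`, `Signature`, and `check_well_typed` are hypothetical names invented for this illustration. It assumes the rule, as I read the spec, that a function used in a comparison must return ValueType, while a function used as a bare test expression must return LogicalType or NodesType.

```rust
/// Simplified stand-ins for the three function-extension types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum JsonPathType {
    Value,
    Nodes,
    Logical,
}

/// Where a function call appears inside a filter expression.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Position {
    /// One side of a comparison, e.g. `length(@.foo) > 5`.
    Comparison,
    /// A bare test expression, e.g. `$[? match(@.foo, "a.*") ]`.
    Test,
}

/// Hypothetical signature record: a name and a declared result type.
struct Signature {
    name: &'static str,
    result: JsonPathType,
}

/// Reject a function call whose result type does not fit its position.
fn check_well_typed(sig: &Signature, position: Position) -> Result<(), String> {
    let ok = match position {
        Position::Comparison => sig.result == JsonPathType::Value,
        Position::Test => matches!(sig.result, JsonPathType::Logical | JsonPathType::Nodes),
    };
    if ok {
        Ok(())
    } else {
        Err(format!(
            "function `{}` returns {:?}, which is not allowed in a {:?} position",
            sig.name, sig.result, position
        ))
    }
}

fn main() {
    let length = Signature { name: "length", result: JsonPathType::Value };
    let match_fn = Signature { name: "match", result: JsonPathType::Logical };

    assert!(check_well_typed(&length, Position::Comparison).is_ok());
    assert!(check_well_typed(&match_fn, Position::Test).is_ok());
    // `match(...) == true` is not well-typed: a LogicalType result cannot be compared.
    assert!(check_well_typed(&match_fn, Position::Comparison).is_err());
}
```

Because the check only needs the declared signature and the position in the query, it can run entirely at parse time, which is what allows evaluation to stay infallible.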

The #[function] Attribute Macro

The #[function] attribute macro was introduced to enable custom function registration in serde_json_path. This macro allows users of serde_json_path to define their own functions. To give an example, say you want to have a function called first() in your queries, that takes a NodeList, and returns the first node, if it is not empty. You can now accomplish this by adding the following somewhere in your application's code:

use serde_json_path::functions::{NodesType, ValueType};

#[serde_json_path::function]
fn first(nodes: NodesType) -> ValueType {
    match nodes.first() {
        Some(v) => ValueType::Node(v),
        None => ValueType::Nothing,
    }
}

And that's it! Now you can use a function called first in your queries, i.e., the following query will be valid:

$[? first(@.*) == 42 ]

The macro will ensure that user-defined functions are valid at compile-time, and will generate the means to:

  • validate the use of this function at parse-time, i.e., when calling JsonPath::parse, and
  • evaluate the use of this function at query-time, i.e., when calling JsonPath::query.

This maintains the status quo; namely, that JsonPath::parse is fallible, while JsonPath::query is not.

The macro's capability to register functions at compile time is made possible via the inventory crate.

Users who do not want to use the #[function] attribute macro can disable the functions feature.

Other Changes

Updated Error System

The introduction of functions brought with it the need for a custom parser error type, mainly to propagate specific errors pertaining to function misuse in query strings at parse time. This was inevitable (see #4) and needed to be addressed, so I did so here rather than in a future PR.

Therefore, the internal parser error system was updated by this PR. The public Error type was also altered so that it can provide more concise error messages and better support the addition of finer-grained errors in the future without introducing breaking changes.

Overhaul of Repository Structure

To support the new #[function] attribute macro, the repository was converted to a workspace comprising the following members:

  • serde_json_path - the primary crate for general consumption
  • serde_json_path_macros - re-exports the #[function] attribute macro, and provides serde_json_path_macros_internal with external dependencies
  • serde_json_path_macros_internal - the implementation of the #[function] macro
  • serde_json_path_core - defines core types used in both serde_json_path and serde_json_path_macros/macros_internal

This means that serde_json_path now covers four separate crates.

The trace feature flag

To better support debugging of query parsing and evaluation, the use of tracing was introduced, and gated by the trace feature flag, which is not enabled by default. All parser functions and query evaluation functions are instrumented at the TRACE level, to enable debugging of query string parsing and evaluation.

This is largely to aid internal development efforts of serde_json_path, but may also prove useful to anyone attempting to define their own functions, or submit issues for parser bugs.

Tracing was added to the project to debug the count function. It
may not be something that stays, as it may bloat the crate a bit,
but perhaps it could be isolated to a feature flag.
This commit reorganizes the crate so that it can support proc macros.
The primary crate was moved down a level into `serde_json_path` and a
new crate, `serde_json_path_macros` was created alongside it.
`serde_json_path` is still the primary crate, and all of its main
components were moved down, including:

- src/ folder
- test/ folder
- Cargo.toml
- LICENSE
- README
- CHANGELOG

One exception was that some of the build flags in the Cargo.toml file
had to be elevated up to the workspace Cargo.toml file.

The new `serde_json_path_macros` crate has its own unique structure. It
contains an `internal` crate of its own. This is where the actual proc
macro is defined. This was done because the proc macro we are defining
needs to rely on external dependencies; namely, `inventory` and
`once_cell`, and we don't want to put the burden on the user to have to
import these crates into their own libraries/applications.
And add Display implementations for function expressions
This is a progress commit to move over and work on some bug fixes
Refactor the crates into three separate crates:

A) serde_json_path
B) serde_json_path_macros
C) serde_json_path_core

The (C) core crate is needed because (A) and (B) will share dependencies
and thus would cause a circular dep in absence of (C). In general, all
of the core types and traits are moved from (A) to (C), while (A) still
holds some of the high-level user-facing types, as well as the parser.

(B) will still just be dedicated for the proc macro, but by moving the
core types into (C), (B) can now, hopefully, more reliably generate the
code needed for functions to compile correctly.

The code is compiling after this commit, however the functions have been
broken, and therefore there are a few tests failing.
The parsing errors were re-worked to actually propagate to the top. This
required writing impls for any external error type generated by a
map_res throughout the code.
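
As a rough illustration of what "writing impls for any external error type" involves, the sketch below uses only the standard library's `From` trait and the `?` operator. It is an analogy, not the crate's code: `ParserError` and `parse_index` are hypothetical names, and a parser built on combinators such as map_res would implement the combinator library's own conversion trait rather than `From`.

```rust
use std::fmt;
use std::num::ParseIntError;

/// Hypothetical parser error that wraps an error produced by external code.
#[derive(Debug, PartialEq)]
enum ParserError {
    InvalidIndex(ParseIntError),
}

impl fmt::Display for ParserError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ParserError::InvalidIndex(e) => write!(f, "invalid index: {e}"),
        }
    }
}

impl std::error::Error for ParserError {}

/// The conversion impl: this is what lets the external error propagate
/// upward instead of being swallowed or replaced by a generic failure.
impl From<ParseIntError> for ParserError {
    fn from(e: ParseIntError) -> Self {
        ParserError::InvalidIndex(e)
    }
}

/// Parse an array index; `?` converts `ParseIntError` via the impl above.
fn parse_index(input: &str) -> Result<i64, ParserError> {
    let index = input.trim().parse::<i64>()?;
    Ok(index)
}

fn main() {
    assert_eq!(parse_index(" 42 "), Ok(42));
    // The original cause is preserved inside the custom error.
    let err = parse_index("4x2").unwrap_err();
    println!("{err}");
}
```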

The tests for functions were removed from the spec_examples tests into
the functions tests.

Some parser alt arms were re-ordered, and the function parser was
wrapped in cut to cause failure when function validation fails. Not sure
if this is the correct solution yet, but everything is passing for now.
@hiltontj hiltontj self-assigned this Mar 31, 2023
This solves the problem of having potential inventory collisions between
standard registry functions and user-defined functions. Now, when
validating queries, the inventory of user-defined functions is checked
first, and the function validated if there is a hit. If there is no hit,
then the REGISTRY is checked for one of the standard defined functions.

This ensures that if users define functions with names that later become
standardized (and added to serde_json_path), then their functions will
still take priority.

The feature gating was changed so that the registry, which is part of
the standard, is always built-in to serde_json_path, while the ability
to define custom functions is now behind the "functions" feature flag.
For now, this feature is enabled by default.

To do this, a new attribute macro, `#[register]`, was added. This is
purely for internal use within the serde_json_path crate, and serves no
purpose for external users.

Another benefit worth noting is that the standard registry functions
are now built in using a HashMap, which ensures fast lookups.
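
The lookup order described in this commit can be sketched roughly as follows. This is a std-only model under stated assumptions: a plain `Vec` stands in for the inventory of user-defined functions, and `Evaluator`, `Functions`, `resolve`, `std_count`, and `user_count` are hypothetical names that do not come from the crate.

```rust
use std::collections::HashMap;

/// Hypothetical evaluator type; real functions operate on JSONPath types.
type Evaluator = fn(&[i64]) -> i64;

struct Functions {
    /// Stand-in for the inventory of user-defined functions.
    user_defined: Vec<(&'static str, Evaluator)>,
    /// Stand-in for the built-in standard registry.
    registry: HashMap<&'static str, Evaluator>,
}

impl Functions {
    /// Check user-defined functions first, then fall back to the registry.
    fn resolve(&self, name: &str) -> Option<Evaluator> {
        self.user_defined
            .iter()
            .find(|(n, _)| *n == name)
            .map(|(_, f)| *f)
            .or_else(|| self.registry.get(name).copied())
    }
}

/// Stand-in for a standard registry function.
fn std_count(args: &[i64]) -> i64 {
    args.len() as i64
}

/// Stand-in for a user function that happens to share a standard name.
fn user_count(_args: &[i64]) -> i64 {
    -1
}

fn main() {
    let mut registry: HashMap<&'static str, Evaluator> = HashMap::new();
    registry.insert("count", std_count as Evaluator);

    let functions = Functions {
        user_defined: vec![("count", user_count as Evaluator)],
        registry,
    };

    // The user-defined `count` shadows the standard one.
    let count = functions.resolve("count").expect("count is defined");
    assert_eq!(count(&[1, 2, 3]), -1);
    // Unknown names resolve to nothing, which would be a parse-time error.
    assert!(functions.resolve("missing").is_none());
}
```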
@hiltontj hiltontj changed the title [#1, #4] Function Extensions and New Error System [#1, #4] Function Extensions, New Error System, Crate Division, and Tracing Apr 1, 2023
@hiltontj hiltontj changed the title [#1, #4] Function Extensions, New Error System, Crate Division, and Tracing [#1, #4] Function Extensions and Other Changes Apr 1, 2023
The NodesType, ValueType, and LogicalType all had their APIs improved
by implementing common Rust traits.

NodesType also implements Deref/DerefMut so that users get direct access
to the methods on the inner NodeList.
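
The Deref/DerefMut pattern mentioned here can be sketched with a small std-only model. The `NodeList` and `NodesType` below are simplified stand-ins that hold string slices instead of JSON values, and the `push` method is a hypothetical addition to show mutable access; none of this reflects the crate's actual definitions.

```rust
use std::ops::{Deref, DerefMut};

/// Simplified stand-in for a node list (string slices instead of JSON values).
struct NodeList<'a>(Vec<&'a str>);

impl<'a> NodeList<'a> {
    fn first(&self) -> Option<&'a str> {
        self.0.first().copied()
    }

    fn len(&self) -> usize {
        self.0.len()
    }

    /// Hypothetical mutating method, included to exercise DerefMut.
    fn push(&mut self, node: &'a str) {
        self.0.push(node);
    }
}

/// Wrapper type in the spirit of NodesType.
struct NodesType<'a>(NodeList<'a>);

impl<'a> Deref for NodesType<'a> {
    type Target = NodeList<'a>;

    fn deref(&self) -> &Self::Target {
        &self.0
    }
}

impl<'a> DerefMut for NodesType<'a> {
    fn deref_mut(&mut self) -> &mut Self::Target {
        &mut self.0
    }
}

fn main() {
    let mut nodes = NodesType(NodeList(vec!["a", "b"]));
    // Methods of the inner list are callable directly on the wrapper.
    assert_eq!(nodes.first(), Some("a"));
    nodes.push("c");
    assert_eq!(nodes.len(), 3);
}
```

This is the same mechanism that lets the first() example in the PR description call nodes.first() on a NodesType argument.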
@hiltontj hiltontj merged commit 13c90ab into main Apr 1, 2023
@hiltontj hiltontj deleted the 4-error-handling branch April 1, 2023 19:34
Successfully merging this pull request may close these issues.

Support Function Extensions