Add Java client method for dataset/job lineage #2623
Merged
Problem
The Java SDK didn't have a method for the dataset/job-level lineage endpoint (`GET /lineage`). See https://marquezproject.slack.com/archives/C01E8MQGJP7/p1692774856981109

Closes: #1527
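For orientation, here is a rough sketch of the kind of request the missing client method would wrap. The `/api/v1` base path, the `nodeId` and `depth` query parameters, and the `dataset:<namespace>:<name>` node ID shape are assumptions for illustration and are not taken from this PR; `LineageRequestSketch` and `lineageUri` are hypothetical names, not part of `MarquezClient`.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the Marquez Java client.
final class LineageRequestSketch {

  // Builds the URI for a GET /lineage request.
  // Assumption: the endpoint accepts "nodeId" and "depth" query parameters.
  static URI lineageUri(String baseUrl, String nodeId, int depth) {
    // Node IDs contain ':' separators, so encode them before appending.
    String encodedNodeId = URLEncoder.encode(nodeId, StandardCharsets.UTF_8);
    return URI.create(baseUrl + "/lineage?nodeId=" + encodedNodeId + "&depth=" + depth);
  }

  public static void main(String[] args) {
    // Assumed base URL and node ID, for illustration only.
    URI uri = lineageUri("http://localhost:5000/api/v1", "dataset:my-namespace:my-dataset", 2);
    System.out.println(uri);
    // prints http://localhost:5000/api/v1/lineage?nodeId=dataset%3Amy-namespace%3Amy-dataset&depth=2
  }
}
```

The resulting URI could then be passed to any HTTP client; the new client method presumably hides this plumbing and returns a deserialized lineage graph instead.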
Solution
Adds a new method to `MarquezClient` for the endpoint, along with tests and the necessary new subclasses of `NodeData` for datasets and jobs.

Also reworks how the polymorphic deserialization is done, to get away from the problem described in #1527, which I ran into when working on the new method. This was happening due to the way we were using `@JsonTypeInfo`. Specifically, we had the `EXTERNAL_PROPERTY` inclusion strategy on the `NodeData` interface class; however, per the Jackson docs, that strategy can only be used on properties, not on types (classes), and using it on a class results in plain `PROPERTY` inclusion instead.

This accounted for the extra `type` attribute being added on serialization: the intended behaviour of using the property on the parent `Node` was never happening. Unfortunately, even moving the relevant annotations to the right places didn't work, I think because `type` is an existing property on `Node`. We'd kind of want a combination of Jackson's `EXISTING_PROPERTY` and `EXTERNAL_PROPERTY`, but it doesn't exist.

Happily, using the `DEDUCTION` resolution strategy (TIL!) works nicely with no extra properties, because each of the subclasses has fields that are both unique and non-nullable, so Jackson can work it out via reflection. It does mean you can construct a `Node` with a type that contradicts the `NodeData`, but that was kind of the case anyway.

For backwards compatibility, the `defaultImpl` for `NodeData` in the client is set to the column lineage one. This is because, when encountering a payload from the current Marquez API with the extraneous `type` property, the Jackson deduction will get confused and throw. So if consumers upgrade the client first and then Marquez itself, they should see no issues during the transition.

One-line summary:
Add Java client method for dataset/job lineage
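To illustrate the idea behind deduction-based resolution with a default fallback, here is a minimal plain-Java sketch. It is a conceptual stand-in, not Jackson's implementation and not the code in this PR: the subtype names, their identifying fields (`physicalName`, `inputs`, `outputs`), and the `DeductionSketch` class are all hypothetical. It only shows why unique, always-present fields are enough to pick a subtype, and how a default catches payloads that match nothing unambiguously.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Conceptual sketch only; names and fields are assumptions for illustration.
final class DeductionSketch {

  // Hypothetical identifying fields per NodeData subtype.
  static final Map<String, Set<String>> UNIQUE_FIELDS = new LinkedHashMap<>();

  static {
    UNIQUE_FIELDS.put("DatasetNodeData", Set.of("physicalName"));
    UNIQUE_FIELDS.put("JobNodeData", Set.of("inputs", "outputs"));
  }

  // Stand-in for the defaultImpl fallback (the column lineage subtype).
  static final String DEFAULT_IMPL = "ColumnLineageNodeData";

  // Picks the single subtype whose identifying fields all appear in the
  // payload; falls back to the default when none or several match.
  static String deduce(Set<String> payloadFields) {
    String match = null;
    for (Map.Entry<String, Set<String>> entry : UNIQUE_FIELDS.entrySet()) {
      if (payloadFields.containsAll(entry.getValue())) {
        if (match != null) {
          return DEFAULT_IMPL; // ambiguous: more than one candidate
        }
        match = entry.getKey();
      }
    }
    return match != null ? match : DEFAULT_IMPL;
  }

  public static void main(String[] args) {
    System.out.println(deduce(Set.of("namespace", "name", "physicalName")));      // DatasetNodeData
    System.out.println(deduce(Set.of("namespace", "name", "inputs", "outputs"))); // JobNodeData
    System.out.println(deduce(Set.of("namespace", "dataset", "field")));          // ColumnLineageNodeData
  }
}
```

In Jackson itself this behaviour is configured declaratively through `@JsonTypeInfo` with the `DEDUCTION` strategy and a `defaultImpl`, rather than written by hand as above.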
Checklist
`CHANGELOG.md` (Depending on the change, this may not be necessary).
`.sql` database schema migration according to Flyway's naming convention (if relevant)