Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

delete: add possibility to soft-delete datasets and jobs #2032

Merged
merged 2 commits into from
Sep 6, 2022

Conversation

mobuchowski
Copy link
Contributor

@mobuchowski mobuchowski commented Jul 6, 2022

First part of #1736 - ability to "soft delete" datasets and jobs - API to "hide" them which can be used from UI; the reason for adding this is process to hiding inactive datasets and jobs. This PR does not include the UI part.

The feature works by adding is_hidden flag to both datasets and jobs tables. Then, it changes jobs_view and adds datasets_view that hide rows where is_hidden flag is set to true value. This makes writing proper queries easier, since, you don't need to do this filtering manually everywhere as long as you use this view.

The soft-delete is reversed if the job or dataset is updated again - new version reverts the flag.

Signed-off-by: Maciej Obuchowski [email protected]

@pawel-big-lebowski
Copy link
Collaborator

This looks really great. The only inconsistency I see is that:

  • findBySomething does return soft-deleted datasets,
  • findAll does not return them.
    Such a behavior definitely makes here sense but can confuse other contributors.

What if we introduced includeDeleted parameter in the URL and make default values true and false for findByName/findAll respectively. ?

@mobuchowski mobuchowski force-pushed the delete/delete-datasets-endpoint branch 2 times, most recently from c6a5a86 to bc7e1f6 Compare August 1, 2022 11:33
@mobuchowski mobuchowski force-pushed the delete/delete-datasets-endpoint branch from bc7e1f6 to deb00a6 Compare August 4, 2022 12:41
@codecov
Copy link

codecov bot commented Aug 4, 2022

Codecov Report

Merging #2032 (17aaa8f) into main (27b54ed) will increase coverage by 0.03%.
The diff coverage is 80.48%.

@@             Coverage Diff              @@
##               main    #2032      +/-   ##
============================================
+ Coverage     75.11%   75.14%   +0.03%     
- Complexity     1020     1023       +3     
============================================
  Files           202      202              
  Lines          4798     4836      +38     
  Branches        392      392              
============================================
+ Hits           3604     3634      +30     
- Misses          754      762       +8     
  Partials        440      440              
Impacted Files Coverage Δ
api/src/main/java/marquez/db/DatasetDao.java 98.52% <ø> (ø)
api/src/main/java/marquez/db/DatasetFieldDao.java 100.00% <ø> (ø)
...pi/src/main/java/marquez/db/DatasetVersionDao.java 95.83% <ø> (ø)
api/src/main/java/marquez/db/JobDao.java 100.00% <ø> (ø)
api/src/main/java/marquez/db/JobVersionDao.java 91.04% <ø> (ø)
...va/src/main/java/marquez/client/MarquezClient.java 57.43% <0.00%> (-1.83%) ⬇️
...java/src/main/java/marquez/client/MarquezHttp.java 81.91% <83.33%> (+0.20%) ⬆️
api/src/main/java/marquez/api/DatasetResource.java 95.00% <100.00%> (+0.76%) ⬆️
api/src/main/java/marquez/api/JobResource.java 93.10% <100.00%> (+0.79%) ⬆️
.../src/main/java/marquez/service/LineageService.java 85.32% <100.00%> (+0.85%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@mobuchowski mobuchowski force-pushed the delete/delete-datasets-endpoint branch 4 times, most recently from ab7641f to 4b8efd7 Compare August 8, 2022 13:57
@mobuchowski
Copy link
Contributor Author

@pawel-big-lebowski changed how this works. Now it all references view.

@mobuchowski mobuchowski marked this pull request as ready for review August 8, 2022 13:58
@mobuchowski
Copy link
Contributor Author

@wslulciuc can you take a look?

Also, can you think of any additional tests this might need?

Copy link
Collaborator

@pawel-big-lebowski pawel-big-lebowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fell in love with testLineageWithDeletedDataset test ❤️

&& jobNameEquals(n, "downstreamJob0<-outputData<-readJob0<-commonDataset"))
.isEmpty();
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a great test! 💯 🥇

@mobuchowski
Copy link
Contributor Author

Next time I won't change query formatting AND query itself in one PR - it makes resolving conflicts unpleasant 😞

@mobuchowski mobuchowski force-pushed the delete/delete-datasets-endpoint branch from 4b8efd7 to ae95250 Compare August 30, 2022 14:01
@boring-cyborg boring-cyborg bot added api API layer changes client/java docker labels Aug 30, 2022
@mobuchowski mobuchowski changed the title delete: add possibility to soft-delete datasets delete: add possibility to soft-delete datasets and jobs Aug 30, 2022
Copy link
Member

@julienledem julienledem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have two main comments:

  1. if I agree that the sql reformatting with triple quotes is good, this should be done in a separate PR. Mixing refactoring/reformatting with same behavior and change of behavior in the same PR makes it difficult to review. We should avoid doing that in the future.
  2. personally I would prefer we call this "delete". It is a soft delete since all the versions are maintained but it is a delete as these jobs and dataset don't exist anymore. If they do they just get undeleted the next time they run or get read/written.

api/src/main/java/marquez/api/DatasetResource.java Outdated Show resolved Hide resolved
api/src/main/java/marquez/api/JobResource.java Outdated Show resolved Hide resolved
@mobuchowski mobuchowski force-pushed the delete/delete-datasets-endpoint branch from 25b7d66 to 17aaa8f Compare September 6, 2022 09:56
@mobuchowski mobuchowski merged commit b709b03 into main Sep 6, 2022
@mobuchowski mobuchowski deleted the delete/delete-datasets-endpoint branch September 6, 2022 11:43
@wslulciuc wslulciuc mentioned this pull request Sep 7, 2022
7 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants