Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

drop and create datasets_view #2184

Merged
merged 1 commit into from
Oct 13, 2022
Merged

drop and create datasets_view #2184

merged 1 commit into from
Oct 13, 2022

Conversation

pawel-big-lebowski
Copy link
Collaborator

Signed-off-by: Pawel Leszczynski [email protected]

Problem

Change within datasets_view definition breaks upgrades.

Closes: #2183

Solution

Drop view each time migration is run

Note: All database schema changes require discussion. Please link the issue for context.

Checklist

  • You've signed-off your work
  • [] Your changes are accompanied by tests (if relevant)
  • Your change contains a small diff and is self-contained
  • You've updated any relevant documentation (if relevant)
  • You've updated the CHANGELOG.md with details about your change under the "Unreleased" section (if relevant, depending on the change, this may not be necessary)
  • You've versioned your .sql database schema migration according to Flyway's naming convention (if relevant)
  • You've included a header in any source code files (if relevant)

Signed-off-by: Pawel Leszczynski <[email protected]>
@boring-cyborg boring-cyborg bot added the api API layer changes label Oct 13, 2022
@codecov
Copy link

codecov bot commented Oct 13, 2022

Codecov Report

Merging #2184 (2e5a346) into main (2d80f4c) will not change coverage.
The diff coverage is n/a.

@@            Coverage Diff            @@
##               main    #2184   +/-   ##
=========================================
  Coverage     76.54%   76.54%           
  Complexity     1108     1108           
=========================================
  Files           214      214           
  Lines          5181     5181           
  Branches        408      408           
=========================================
  Hits           3966     3966           
  Misses          760      760           
  Partials        455      455           

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@pawel-big-lebowski pawel-big-lebowski merged commit 9316120 into main Oct 13, 2022
@pawel-big-lebowski pawel-big-lebowski deleted the fix-datasets-view branch October 13, 2022 10:29
@collado-mike
Copy link
Collaborator

It should be a development principle that we don't make backward-incompatible changes. Dropping/renaming columns is fine in a single instance of an app where upgrades mean taking down the application and restarting it, but in a distributed environment, upgrading should not require downtime. As is, a rolling upgrade will cause all other running hosts to begin failing requests as they will be looking for a column that no longer exists in the central database.

Dropping/renaming columns should always be a two-step process

  1. Remove/update all references to the column in the code
  2. Remove/rename the column in the database

These two steps should be done in different releases so that users have an opportunity to upgrade in a controller fashion without downtime.

@pawel-big-lebowski
Copy link
Collaborator Author

@collado-mike Not clear to me how the problem you described applies for this PR.

The issue is related to a logical view modification and adding new column dataset_symlinks to it (former change). Adding a new column into a view should be backward compatible and old-code readers should be able to read data properly.

It turned out that CREATE OR REPLACE is not able to add a column to existing view and solution to this is to drop the view and re-create afterwards. Theoretically, this introduces millisecond time window when a view does not exist and other readers can fail (the moment after drop view and before create view) but that shouldn't be a big deal within distributed environment where single failures are allowed to happen.

@collado-mike
Copy link
Collaborator

Ah, sorry. I thought the previous view change was dropping a column and that this was attempting to band-aid over that by dropping the view and recreating it. Yeah, I'm not sure why postgres doesn't detect that the previous columns are still there - order should be irrelevant to the view definition. I think adding the new column to the end of the view definition would fix the issue.

but that shouldn't be a big deal within distributed environment where single failures are allowed to happen.

I think our assumptions of what happens in distributed systems differ 🤔. Distributed systems should be resilient to failures. So, if the view is missing and the code retries the query and succeeds, that is resilient to single failures. But that's not what Marquez does right now. Currently, if the query fails, we'll fail the whole request. If it happens to fail during an OpenLineage event processing, we'll lose the whole event. The OL clients don't have any retry built-in, so the data is simply lost. Of course, these issues can be addressed with retries on the client or persistent queues on the backend, but the simpler fix is to make sure view definition changes are backward compatible whenever possible.

@pawel-big-lebowski
Copy link
Collaborator Author

Thanks for providing an explanation on how Marquez is operating right now. It's really helpful. According to the docs:

By default, Flyway always wraps the execution of an entire migration within a single transaction.

So there should be no moment of a missing view visible to other readers.

I added dataset_symlinks column in the middle of view definition to locate it just after existing name column. I can change it to make it the last, however it's unclear to me if this can be any problem, as Marquez makes use of util methods like marquez.db.Columns#stringOrNull which do results.getObject(column). These methods rely on column name instead of column ordering.

The root cause issue was not according to column ordering but because we were trying to replace the view with a view containing different amount of columns which is not allowed in PostgreSQL.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api API layer changes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ERROR: cannot drop columns from view from marquez/db/migration/R__3_Datasets_view.sql
3 participants