drop and create datasets_view #2184

pawel-big-lebowski · 2022-10-13T09:38:46Z

Signed-off-by: Pawel Leszczynski [email protected]

Problem

Change within datasets_view definition breaks upgrades.

Closes: #2183

Solution

Drop view each time migration is run

Note: All database schema changes require discussion. Please link the issue for context.

Checklist

You've signed-off your work
[] Your changes are accompanied by tests (if relevant)
Your change contains a small diff and is self-contained
You've updated any relevant documentation (if relevant)
You've updated the CHANGELOG.md with details about your change under the "Unreleased" section (if relevant, depending on the change, this may not be necessary)
You've versioned your .sql database schema migration according to Flyway's naming convention (if relevant)
You've included a header in any source code files (if relevant)

Signed-off-by: Pawel Leszczynski <[email protected]>

codecov · 2022-10-13T09:45:26Z

Codecov Report

Merging #2184 (2e5a346) into main (2d80f4c) will not change coverage.
The diff coverage is n/a.

@@            Coverage Diff            @@
##               main    #2184   +/-   ##
=========================================
  Coverage     76.54%   76.54%           
  Complexity     1108     1108           
=========================================
  Files           214      214           
  Lines          5181     5181           
  Branches        408      408           
=========================================
  Hits           3966     3966           
  Misses          760      760           
  Partials        455      455

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

collado-mike · 2022-10-13T18:11:27Z

It should be a development principle that we don't make backward-incompatible changes. Dropping/renaming columns is fine in a single instance of an app where upgrades mean taking down the application and restarting it, but in a distributed environment, upgrading should not require downtime. As is, a rolling upgrade will cause all other running hosts to begin failing requests as they will be looking for a column that no longer exists in the central database.

Dropping/renaming columns should always be a two-step process

Remove/update all references to the column in the code
Remove/rename the column in the database

These two steps should be done in different releases so that users have an opportunity to upgrade in a controller fashion without downtime.

pawel-big-lebowski · 2022-10-17T11:28:50Z

@collado-mike Not clear to me how the problem you described applies for this PR.

The issue is related to a logical view modification and adding new column dataset_symlinks to it (former change). Adding a new column into a view should be backward compatible and old-code readers should be able to read data properly.

It turned out that CREATE OR REPLACE is not able to add a column to existing view and solution to this is to drop the view and re-create afterwards. Theoretically, this introduces millisecond time window when a view does not exist and other readers can fail (the moment after drop view and before create view) but that shouldn't be a big deal within distributed environment where single failures are allowed to happen.

collado-mike · 2022-10-17T23:07:18Z

Ah, sorry. I thought the previous view change was dropping a column and that this was attempting to band-aid over that by dropping the view and recreating it. Yeah, I'm not sure why postgres doesn't detect that the previous columns are still there - order should be irrelevant to the view definition. I think adding the new column to the end of the view definition would fix the issue.

but that shouldn't be a big deal within distributed environment where single failures are allowed to happen.

I think our assumptions of what happens in distributed systems differ 🤔. Distributed systems should be resilient to failures. So, if the view is missing and the code retries the query and succeeds, that is resilient to single failures. But that's not what Marquez does right now. Currently, if the query fails, we'll fail the whole request. If it happens to fail during an OpenLineage event processing, we'll lose the whole event. The OL clients don't have any retry built-in, so the data is simply lost. Of course, these issues can be addressed with retries on the client or persistent queues on the backend, but the simpler fix is to make sure view definition changes are backward compatible whenever possible.

pawel-big-lebowski · 2022-10-18T06:46:18Z

Thanks for providing an explanation on how Marquez is operating right now. It's really helpful. According to the docs:

By default, Flyway always wraps the execution of an entire migration within a single transaction.

So there should be no moment of a missing view visible to other readers.

I added dataset_symlinks column in the middle of view definition to locate it just after existing name column. I can change it to make it the last, however it's unclear to me if this can be any problem, as Marquez makes use of util methods like marquez.db.Columns#stringOrNull which do results.getObject(column). These methods rely on column name instead of column ordering.

The root cause issue was not according to column ordering but because we were trying to replace the view with a view containing different amount of columns which is not allowed in PostgreSQL.

drop and create datasets_view

2e5a346

Signed-off-by: Pawel Leszczynski <[email protected]>

boring-cyborg bot added the api API layer changes label Oct 13, 2022

mobuchowski approved these changes Oct 13, 2022

View reviewed changes

pawel-big-lebowski merged commit 9316120 into main Oct 13, 2022

pawel-big-lebowski deleted the fix-datasets-view branch October 13, 2022 10:29

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

drop and create datasets_view #2184

drop and create datasets_view #2184

pawel-big-lebowski commented Oct 13, 2022

codecov bot commented Oct 13, 2022 •

edited

Loading

collado-mike commented Oct 13, 2022

pawel-big-lebowski commented Oct 17, 2022

collado-mike commented Oct 17, 2022

pawel-big-lebowski commented Oct 18, 2022

drop and create datasets_view #2184

drop and create datasets_view #2184

Conversation

pawel-big-lebowski commented Oct 13, 2022

Problem

Solution

Checklist

codecov bot commented Oct 13, 2022 • edited Loading

Codecov Report

collado-mike commented Oct 13, 2022

pawel-big-lebowski commented Oct 17, 2022

collado-mike commented Oct 17, 2022

pawel-big-lebowski commented Oct 18, 2022

codecov bot commented Oct 13, 2022 •

edited

Loading