Iceberg: Inefficient Equality Delete file handling #18396

Closed
jasonf20 opened this issue Jul 25, 2023 · 3 comments
Labels
iceberg Iceberg connector

Comments

@jasonf20
Member

jasonf20 commented Jul 25, 2023

Currently, the Iceberg plugin will:

  1. Load equality delete files in each split. If the same delete file is used by many splits, it is re-read once per split.
  2. Store a separate map of deleted rows for each loaded delete file. This is inefficient when rows are updated often. A single map from each deleted row to the maximum data sequence number at which it was deleted should be enough.
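
The two points above can be illustrated with a minimal Python sketch. It is not the Trino implementation (Trino is written in Java), and every name in it (`DeleteFile`, `DeleteFileCache`, `build_delete_index`, `is_deleted`) is hypothetical. It assumes all delete files share the same equality columns, that a row's key is a tuple of those column values, and that an equality delete applies only to data files with a strictly smaller data sequence number, as the Iceberg spec describes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeleteFile:
    # Hypothetical stand-in for an equality delete file reference.
    path: str
    data_sequence_number: int


class DeleteFileCache:
    """Reads each delete file at most once, however many splits reference it."""

    def __init__(self, reader):
        # `reader` is an assumed callable: path -> iterable of key tuples.
        self._reader = reader
        self._rows = {}
        self.reads = 0  # counts physical reads, for illustration only

    def rows(self, delete_file):
        if delete_file.path not in self._rows:
            self._rows[delete_file.path] = list(self._reader(delete_file.path))
            self.reads += 1
        return self._rows[delete_file.path]


def build_delete_index(delete_files, cache):
    """Build one map: deleted key -> max data sequence number of its deletes."""
    index = {}
    for delete_file in delete_files:
        for key in cache.rows(delete_file):
            seq = delete_file.data_sequence_number
            if seq > index.get(key, -1):
                index[key] = seq
    return index


def is_deleted(index, key, data_file_sequence_number):
    """A row is deleted if a delete with a strictly larger sequence number exists."""
    deleted_at = index.get(key)
    return deleted_at is not None and data_file_sequence_number < deleted_at
```

With this shape, two splits that reference the same delete files trigger one read per file rather than one per split, and memory grows with the number of distinct deleted keys rather than with the number of delete files. A real engine would also need a thread-safe, memory-bounded cache, which this sketch omits.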
@findinpath
Contributor

Does #17115 relate to this issue?

@findinpath findinpath added the iceberg Iceberg connector label Aug 4, 2023
@jasonf20
Member Author

jasonf20 commented Aug 7, 2023

@findinpath It addresses 2. but not 1. from what I can tell. PR #18397 addresses both. I have a more detailed explanation here: #18397 (comment)

@ebyhr
Member

ebyhr commented Jun 19, 2024

Closing as #21441

@ebyhr ebyhr closed this as completed Jun 19, 2024