Deleted records returned when using equality deletes and filtering on columns not part of the Iceberg Identifier Field IDs #22393

Open
nrutherford-w opened this issue Jun 14, 2024 · 3 comments


nrutherford-w commented Jun 14, 2024

Description

Trino queries return deleted records. This occurs when using equality deletes with Iceberg and filtering by columns that are not part of the Iceberg Identifier Field IDs.

Steps to Reproduce

  1. Create Table
CREATE TABLE orders (
   order_pk integer NOT NULL,
   customer_id integer,
   order_void boolean
)
WITH (
   format = 'ORC',
   format_version = 2,
   location = 'dir/orders-1c20829da05e477893e047e4ba8034e0',
   partitioning = ARRAY['order_pk']
)
  2. Insert into orders table

| order_pk | customer_id | order_void |
|----------|-------------|------------|
| 19       | 400         | 0          |
| 20       | 400         | 0          |
| 21       | 400         | 0          |
| 22       | 400         | 0          |

  3. Delete from orders table

| order_pk |
|----------|
| 21       |
| 22       |

  4. Select from orders table
    SELECT * FROM orders

Expected response is received

| order_pk | customer_id | order_void |
|----------|-------------|------------|
| 19       | 400         | 0          |
| 20       | 400         | 0          |

  5. Select from orders table filtering by a column that is not part of the table's partitioning
    SELECT * FROM orders WHERE NOT order_void

Response includes deleted records

| order_pk | customer_id | order_void |
|----------|-------------|------------|
| 19       | 400         | 0          |
| 20       | 400         | 0          |
| 21       | 400         | 0          |
| 22       | 400         | 0          |

Expected Result:

The query SELECT * FROM orders WHERE NOT order_void should not return records with 'order_pk' 21 and 22 as they were deleted.

Actual Result:

The query SELECT * FROM orders WHERE NOT order_void returns deleted records with 'order_pk' 21 and 22.

Additional Information

Trino Version 449
Inserts and equality deletes are written through the Iceberg API.
Running insert and delete statements in Trino will not recreate the issue.
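
For readers less familiar with how equality deletes are meant to apply, here is a minimal Python sketch of the merge-on-read rule as described in the Iceberg table spec: an equality delete removes any row in an older data file whose equality columns match a deleted key. It is a toy model, not Trino's or Iceberg's implementation. The class names, the `scan` function, the sequence numbers 1 and 2, the single data file, and the use of `False` for the `0` values of `order_void` are all assumptions made for illustration, and partition scoping is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    seq: int      # data sequence number of the commit that added the file (assumed)
    rows: tuple   # rows as dicts keyed by column name


@dataclass(frozen=True)
class EqualityDeleteFile:
    seq: int                 # data sequence number of the delete commit (assumed)
    equality_fields: tuple   # column names standing in for equality field IDs
    keys: frozenset          # tuples of deleted key values


def scan(data_files, delete_files, predicate=lambda row: True):
    """Return live rows: apply equality deletes first, then the query predicate."""
    result = []
    for data in data_files:
        # A delete applies only to data files with a strictly lower sequence number.
        applicable = [d for d in delete_files if d.seq > data.seq]
        for row in data.rows:
            deleted = any(
                tuple(row[f] for f in d.equality_fields) in d.keys
                for d in applicable
            )
            if not deleted and predicate(row):
                result.append(row)
    return result


# Hypothetical stand-ins for the files written through the Iceberg API.
inserts = DataFile(
    seq=1,
    rows=tuple(
        {"order_pk": pk, "customer_id": 400, "order_void": False}
        for pk in (19, 20, 21, 22)
    ),
)
deletes = EqualityDeleteFile(
    seq=2,
    equality_fields=("order_pk",),
    keys=frozenset({(21,), (22,)}),
)

# SELECT * FROM orders
unfiltered = scan([inserts], [deletes])
# SELECT * FROM orders WHERE NOT order_void
filtered = scan([inserts], [deletes], predicate=lambda r: not r["order_void"])

print([r["order_pk"] for r in unfiltered])  # [19, 20]
print([r["order_pk"] for r in filtered])    # [19, 20]
```

In this model both scans return only `order_pk` 19 and 20, because the delete is matched on `order_pk` for every row regardless of which column the query filters on. That is the behavior described under Expected Result.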

Attachments

Insert orc files for order_pk = 22
989149293-1-94f64dda-537c-4ada-ab9b-4b1c1bdc64ed-00007.orc
989149293-1-94f64dda-537c-4ada-ab9b-4b1c1bdc64ed-00008.orc
Delete orc file for order_pk = 22
989569443-1-bfa480a1-c1ee-4982-9d55-176a46cac1a2-00004.orc

orc-files.zip

Member

ebyhr commented Jun 15, 2024

@nrutherford-w I can't reproduce the issue on master. What's your Trino version? Also, please share the actual steps to reproduce without converting statements to markdown tables.

nrutherford-w changed the title from "Deleted records returned when using equality deletes with Iceberg and filtering by non-partitioned columns" to "Deleted records returned when using equality deletes and filtering on columns not part of the Iceberg Identifier Field IDs" on Jun 21, 2024
Author

nrutherford-w commented Jun 21, 2024

@ebyhr Trino version is 449. Inserts and deletes are processed through the Iceberg API. The .orc files from the insert and delete for order_pk 22 are attached to the initial comment.

Member

ebyhr commented Jun 23, 2024

@nrutherford-w Thanks for sharing the files. Please share the code snippet so that we can reproduce the issue easily.
