feat: Dedup strategy that keeps the last not null field #4184

Merged: 10 commits merged on Jun 25, 2024
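The PR title names the merge rule: among the versions of a row that share a key, each field keeps its newest value that is not null. Below is a minimal Rust sketch of that rule. It is not the PR's code, which works on batches through `LastFieldsBuilder`. It assumes that the versions of one key arrive newest-first and that an older delete acts as a barrier hiding everything written before it. Both are assumptions. `Op`, `Row`, and `merge_last_non_null` are hypothetical names.

```rust
/// Hypothetical op type of one row version.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Op {
    Put,
    Delete,
}

/// Hypothetical row version: op type plus nullable field values.
#[derive(Clone, Debug)]
struct Row {
    op: Op,
    fields: Vec<Option<i64>>,
}

/// Merges all versions of one key, which must be ordered newest-first.
/// Each field takes the newest non-null value; returns `None` for no versions.
fn merge_last_non_null(versions: &[Row]) -> Option<Row> {
    let newest = versions.first()?;
    // A delete as the newest version wins outright: nothing to merge.
    if newest.op == Op::Delete {
        return Some(newest.clone());
    }
    let mut merged = newest.clone();
    for older in &versions[1..] {
        // Assumption: an older delete hides every version written before it.
        if older.op == Op::Delete {
            break;
        }
        // Every field already has a value, so older versions cannot change anything.
        if merged.fields.iter().all(Option::is_some) {
            break;
        }
        // Fill only the fields that are still null.
        for (dst, src) in merged.fields.iter_mut().zip(&older.fields) {
            if dst.is_none() {
                *dst = *src;
            }
        }
    }
    Some(merged)
}
```

For example, the versions `[Some(3), None, None]`, `[Some(2), Some(20), None]`, and `[Some(1), Some(10), Some(100)]` (newest first) merge into `[Some(3), Some(20), Some(100)]`. The early exit once no field is null mirrors the idea behind `skip_merge()` in the diff: when nothing is missing, older versions are irrelevant.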
chore: skip has_null check
evenyag committed Jun 21, 2024
commit 1221fef6a8bae5cda0ce8b2f58d395f97f3697fc
src/mito2/src/read/dedup.rs: 7 changes (6 additions, 1 deletion)
@@ -230,8 +230,10 @@ struct LastFieldsBuilder {
    /// Fields builders, lazy initialized.
    builders: Vec<Box<dyn MutableVector>>,
    /// Last fields to merge, lazy initialized.
    /// Only initializes this field when `skip_merge()` is false.
    last_fields: Vec<Value>,
    /// Whether the last row (including `last_fields`) has null field.
    /// Only sets this field when `has_delete` is false.
    has_null: bool,
    /// Whether the last row has delete op. If true, skips merging fields.
    has_delete: bool,
@@ -265,9 +267,12 @@ impl LastFieldsBuilder {

        let last_idx = batch.num_rows() - 1;
        let fields = batch.fields();
        self.has_null = fields.iter().any(|col| col.data.is_null(last_idx));
        // Safety: The last_idx is valid.
        self.has_delete = batch.op_types().get_data(last_idx).unwrap() == OpType::Delete as u8;
        // If the row has been deleted, then we don't need to merge fields.
        if !self.has_delete {
            self.has_null = fields.iter().any(|col| col.data.is_null(last_idx));
        }

        if self.skip_merge() {
            // No null field or the row has been deleted, no need to merge.
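This commit reorders the checks so that the per-column null scan runs only when the last row is not a delete. Below is a small Rust sketch of that control flow on a simplified, hypothetical batch type. `OpType`, `Batch`, `LastRowState`, `update_last_row`, and the `null_scans` counter are illustrative inventions, not the crate's types. `skip_merge()` is assumed to return true when the last row is deleted or has no null field, as the comment "No null field or the row has been deleted" in the diff suggests.

```rust
/// Hypothetical per-row operation type.
#[derive(Clone, Copy, Debug, PartialEq)]
enum OpType {
    Put,
    Delete,
}

/// Hypothetical stand-in for a batch: one op type per row plus
/// column-major nullable fields (`fields[col][row]`).
struct Batch {
    op_types: Vec<OpType>,
    fields: Vec<Vec<Option<i64>>>,
}

/// Hypothetical subset of the builder state touched by the commit.
#[derive(Default)]
struct LastRowState {
    has_null: bool,
    has_delete: bool,
    /// Illustration only: how many times the null scan actually ran.
    null_scans: usize,
}

impl LastRowState {
    /// Assumed semantics: skip merging when the last row is deleted
    /// or has no null field.
    fn skip_merge(&self) -> bool {
        self.has_delete || !self.has_null
    }

    fn update_last_row(&mut self, batch: &Batch) {
        // An empty batch has no last row to inspect.
        let Some(last_idx) = batch.op_types.len().checked_sub(1) else {
            return;
        };
        self.has_delete = batch.op_types[last_idx] == OpType::Delete;
        // Only scan the fields for nulls when the row was not deleted;
        // otherwise `has_null` is left untouched, as in the diff.
        if !self.has_delete {
            self.null_scans += 1;
            self.has_null = batch.fields.iter().any(|col| col[last_idx].is_none());
        }
    }
}
```

Leaving `has_null` stale for deleted rows is safe in this sketch only because `skip_merge()` checks `has_delete` first. That ordering is the invariant the new doc comment "Only sets this field when `has_delete` is false" records.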