Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use IEC standard abbreviations (GiB, TiB, etc) #6742

Merged
merged 4 commits into from
Feb 16, 2023

Conversation

dbeatty10
Copy link
Contributor

@dbeatty10 dbeatty10 commented Jan 26, 2023

resolves #6741
equivalent to dbt-labs/dbt-bigquery#482

Description

When using multiples of 1024, the IEC standard abbreviations are "GiB", "TiB", "PiB" instead of "GB", "TB", "PB" (which are for multiples of 1000).

More info

Today I Learned (TIL), format_bytes (and format_rows_number) are defined in dbt-core but redefined in dbt-bigquery and only used in that adapter. So the dbt-core version is unused to best I can tell.

Context for the duplication

Removing them from dbt-core was mentioned here and here as part of the discussion for #48.

It feels to me like one of the following should have happened along with dbt-labs/dbt-bigquery#48:

  1. a simultaneous PR should have removed format_bytes and format_rows_number from dbt-core; or
  2. #48 should have been added to dbt-core instead of dbt-bigquery.

As it stands, the implementations in dbt-core are unused (to the best that we can tell), but would probably need to be maintained to preserve backwards compatibility in case some unknown adapter is using them.

Proposal

Going forward, I'd propose that we choose one (and only one) of the following:

  1. Keep separate implementations in dbt-core and dbt-bigquery (and fix the KB vs. KiB issue in both)
  2. Remove the implementations in dbt-core (so they exist only in dbt-bigquery)
  3. Completely move the implementation in dbt-bigquery back into dbt-core

Checklist

@cla-bot cla-bot bot added the cla:yes label Jan 26, 2023
@dbeatty10 dbeatty10 marked this pull request as ready for review January 26, 2023 02:42
@dbeatty10 dbeatty10 requested a review from a team as a code owner January 26, 2023 02:42
@dbeatty10 dbeatty10 added the ready_for_review Externally contributed PR has functional approval, ready for code review from Core engineering label Jan 26, 2023
Copy link
Contributor

@ChenyuLInx ChenyuLInx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The updated change looks good to me! Learned a bit new things about IEC standard also

@@ -454,7 +454,7 @@ def __get__(self, obj, objtype):


def format_bytes(num_bytes):
for unit in ["Bytes", "KB", "MB", "GB", "TB", "PB"]:
for unit in ["Bytes", "KiB", "MiB", "GiB", "TiB", "PiB"]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we actually use this method anywhere in dbt-core. It's a holdover from when it was added, for the sole benefit of dbt-bigquery, and we forgot to remove it during the Great Adapter Split-Apart of 2021.

What do you think about just removing it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about just removing it?

Removed!

I did a quick cursory check for both format_bytes and format_rows_number in the dbt Labs adapters using a GitHub search like this. DIdn't see any references other than in dbt-bigquery, which we know about.

I didn't check any non-dbt Labs adapters, although our expectation is that it is not being used. If it did happen to be used by some adapter out there, they'd be on their own to resolve it.

Copy link
Contributor

@jtcohen6 jtcohen6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm aussi

@dbeatty10
Copy link
Contributor Author

Les remerciements @ChenyuLInx and @jtcohen6

Merging now.

@dbeatty10 dbeatty10 merged commit 971b38c into main Feb 16, 2023
@dbeatty10 dbeatty10 deleted the dbeatty/fix-size-abbreviations branch February 16, 2023 21:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla:yes ready_for_review Externally contributed PR has functional approval, ready for code review from Core engineering
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[CT-1911] [Bug] Non-standard abbreviations for kibibyte, etc
3 participants