deterministic qsv enum #1888

Closed
tmtmtmtm opened this issue Jun 17, 2024 · 2 comments · Fixed by #1902
Labels
enhancement New feature or request

Comments

@tmtmtmtm
Contributor

A lot of the time I add id columns to incoming CSVs that don't already have one, to make various other tasks simpler later. But when the upstream file changes, and I regenerate my copy and then diff it, I can end up with a lot of noise in the case where, say, one line has been moved and everything after that gets renumbered.

It would be great if there was an option, similar to --uuid, but where the result was more deterministic (perhaps by generating a SHA of the existing data, or something along those lines?)
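To make the idea concrete, here is a minimal Python sketch (not qsv code) of a content-derived ID: each row's fields are joined and hashed, so a row keeps its ID wherever it ends up in the file. The helper names `row_id` and `add_ids`, the unit-separator delimiter, and the choice of SHA-256 truncated to 16 hex characters are all assumptions made for illustration.

```python
import csv
import hashlib
import io

def row_id(fields, length=16):
    """Return a hex ID derived only from the row's field values (hypothetical helper)."""
    # Join with the ASCII unit separator so ("ab", "c") and ("a", "bc") hash differently.
    joined = "\x1f".join(fields)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:length]

def add_ids(csv_text):
    """Prepend an 'id' column computed from each row's contents."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    out = [["id"] + header]
    for row in reader:
        out.append([row_id(row)] + row)
    return out

old = "name,city\nalice,Oslo\nbob,Lima\n"
new = "name,city\ncarol,Rome\nalice,Oslo\nbob,Lima\n"  # one row inserted at the top

# Map each row's data to its generated ID in both versions of the file.
old_ids = {tuple(r[1:]): r[0] for r in add_ids(old)[1:]}
new_ids = {tuple(r[1:]): r[0] for r in add_ids(new)[1:]}

# Rows present in both versions keep the same ID even though they moved.
print(all(new_ids[k] == v for k, v in old_ids.items()))
```

One caveat with this approach: two rows with identical contents get identical IDs, so it only yields unique IDs when the rows themselves are unique.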

Or is there some way to get there with qsv apply dynfmt that I can't think of atm (other than explicitly concatenating all the other fields together)?

@jqnatividad
Owner

Generating a hash is a good idea to enable enum to create a deterministic ID.

I'll add a --hash option to enum, perhaps using the xxhash crate so that it is fast and the same hash is generated across platforms.

@jqnatividad added the enhancement label Jun 17, 2024
@jqnatividad
Owner

Further, when using --hash, one can specify which columns to use to create the hash. By default, all columns will be used (except the index, if it already exists).
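The column-selection behavior described above could look roughly like the following Python sketch. This is an illustration of the idea, not the eventual qsv implementation: the function `hash_id`, its parameters, and the assumption that the existing index column is named `index` are hypothetical, and SHA-256 from the standard library stands in for xxhash.

```python
import hashlib

def hash_id(row, columns=None, index_column="index"):
    """Hash selected columns of a dict row (hypothetical helper).

    With columns=None, every column except index_column contributes.
    """
    if columns is None:
        columns = [name for name in row if name != index_column]
    # The unit separator keeps adjacent field values from running together.
    joined = "\x1f".join(row[name] for name in columns)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:16]

a = {"index": "1", "name": "alice", "city": "Oslo"}
b = {"index": "7", "name": "alice", "city": "Oslo"}  # same data, renumbered
c = {"index": "1", "name": "alice", "city": "Rome"}  # same index, different city

print(hash_id(a) == hash_id(b))                      # index is ignored by default
print(hash_id(a, ["name"]) == hash_id(c, ["name"]))  # only 'name' contributes
print(hash_id(a) == hash_id(c))                      # city differs, so the hashes differ
```

Leaving the old index out of the default matters for the diffing use case: if it were hashed, renumbering a row would change its ID and bring the noise back.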

2 participants