Add a script to automatically merge multiple .csv files and deal with duplicates #65
Comments
Some updates: merge_csv.py now also prints a summary report, which is handy to quickly see whether there are any issues. For instance, the report from this run means that 1235 items of boavizta-data-us.csv are not present in dell.csv, 26 items are present in dell.csv but not in the current db, and the current db contains 174 items having one (or more) duplicates (*). Among the items that are in both files, 455 are fully covered by dell.csv, but for 42 items we found attributes in boavizta-data-us.csv that are not present in dell.csv.
(*) So far, duplicates are detected solely based on the model name, which implies some false positives.
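To make the meaning of these counters concrete, here is a minimal sketch of how such a summary could be computed. It is not the code of merge_csv.py: the `summarize` and `normalize` helpers, the `name` column, and the keys of the returned dict are all hypothetical, and rows are assumed to be plain dicts such as those produced by `csv.DictReader`.

```python
from collections import Counter


def normalize(name):
    # Hypothetical normalization: case-insensitive, whitespace-collapsed model name.
    return " ".join(name.lower().split())


def summarize(current_rows, new_rows, key="name"):
    """Count the overlap between two lists of row dicts, matching on model name only."""
    # How many rows of the current db share the same (normalized) model name.
    name_counts = Counter(normalize(r[key]) for r in current_rows)
    # When a name is duplicated, the last row wins in these lookup tables.
    current = {normalize(r[key]): r for r in current_rows}
    new = {normalize(r[key]): r for r in new_rows}

    common = current.keys() & new.keys()
    fully_covered = 0
    for name in common:
        # Attributes filled in the current db but empty or absent in the new file.
        missing = [a for a, v in current[name].items() if v and not new[name].get(a)]
        if not missing:
            fully_covered += 1

    return {
        "only_in_current": len(current.keys() - new.keys()),
        "only_in_new": len(new.keys() - current.keys()),
        "with_duplicates": sum(1 for c in name_counts.values() if c > 1),
        "fully_covered": fully_covered,
        "partially_covered": len(common) - fully_covered,
    }
```

Because matching relies only on the model name, two distinct devices sharing a name are counted as duplicates, which is the false-positive case mentioned above.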
We need a dedicated tool to merge multiple .csv files while detecting and merging duplicates.
I've started to implement it through a new static method of `DeviceCarbonFootprint`, and a `merge_csv.py file1 file2` standalone script written on top of the `merge` function. By default, priority is given to device2/file2.
Conflicts are detected only for attributes that are provided for both devices and whose values are clearly different. If the values are close enough, then merge only prints a warning in verbose mode.
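The rule described above can be sketched as follows. This is an illustration under stated assumptions, not the actual `DeviceCarbonFootprint` method: devices are represented as plain dicts, `merge_devices` and `_close` are hypothetical names, and "close enough" is assumed to mean a relative difference of at most 5% between numeric values.

```python
import math


def _close(a, b, rel_tol):
    # Two values are "close enough" only if both parse as numbers
    # and differ by at most rel_tol (relative); the threshold is an assumption.
    try:
        return math.isclose(float(a), float(b), rel_tol=rel_tol)
    except (TypeError, ValueError):
        return False


def merge_devices(device1, device2, rel_tol=0.05, verbose=False):
    """Merge two attribute dicts, giving priority to device2.

    Returns the merged dict and the list of (attribute, value1, value2) conflicts.
    """
    merged = dict(device1)
    conflicts = []
    for attr, v2 in device2.items():
        if v2 in (None, ""):
            continue  # device2 provides nothing here: keep device1's value
        v1 = device1.get(attr)
        if v1 not in (None, "") and v1 != v2:
            if _close(v1, v2, rel_tol):
                if verbose:
                    print(f"warning: {attr}: {v1!r} is close to {v2!r}")
            else:
                conflicts.append((attr, v1, v2))
        merged[attr] = v2  # default priority: device2 wins
    return merged, conflicts
```

In this sketch, conflicts are only collected and reported; how they are then resolved is left to the caller.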
Then, there are two modes to resolve the conflicts:
TODO: