Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Warn on implicit --in-place #134

Merged
merged 4 commits into from
Sep 6, 2022
Merged

Warn on implicit --in-place #134

merged 4 commits into from
Sep 6, 2022

Conversation

blairconrad
Copy link
Owner

@blairconrad blairconrad commented Aug 23, 2022

I've come to recommend against in-place anonymization in most cases. By overwriting the original files, we run the risk of losing those files. This is obvious, and can be protected against by copying the originals before anonymizing. A more subtle problem is that anonymization of dates in dicognito is performed by shifting them slightly into the past. If the same files are anonymized in place repeatedly, we find that study dates, patient birth dates, and the like continue to shift backwards in time, which can cause problems when the data is used.

Thus, dicognito will eventually require explicit choice of --in-place mode or a designated --output-directory.

@blairconrad
Copy link
Owner Author

blairconrad commented Aug 29, 2022

@paulsbduncan, I'm bothering you for a review (no rush! take weeks) not of the code, although that would be welcome as well, but of the concept. You're a principal user. Is the end goal (require --output-directory or --in-place) reasonable? Can you think of a better goal? If this goal is good, is the path I'm taking reasonable?

@paulsbduncan
Copy link
Collaborator

I think so, @blairconrad. There are two main use cases I see:

  1. I downloaded some real data to debug an issue at a site! Quick, anonymize it!
  2. I have some 'seed' data and want to create more data

I want to make sure #1 is easy, because I think it is the most important case (though likely not the most USED case). Users should never be confused, thinking they've anonymized data when they've actually just made a copy of anonymized data.

So, I DO like the overall goal of requiring -o or -i (*). as long as it clearly communicates that -o doesn't anonymize the in place data.

*Oh, I'd love a short command for --in-place, like -i

@blairconrad
Copy link
Owner Author

blairconrad commented Aug 29, 2022

Thanks for the feedback, @paulsbduncan. Yeah, I'm also hoping to avoid confusion. Hence the required switch rather than "well you didn't say --in-place, so I'll assume your first argument is the output directory".

I'll continue down this path.

*Oh, I'd love a short command for --in-place, like -i

I was going to say that we shouldn't've used -i as the short form for --id-prefix, but I guess we didn't. -p. Who knew! I guess -i was my proposed short form for the hypothetical "--id-infix" addition…

Your request is reasonable! PR to be updated as time allows.

@blairconrad
Copy link
Owner Author

I just noticed this part of your comment, @paulsbduncan

as long as it clearly communicates that -o doesn't anonymize the in place data

It's hard for the author to determine whether such clear communication is in place.

From the --help

  --output-directory OUTPUT_DIRECTORY, -o OUTPUT_DIRECTORY
                        Write anonymized files to OUTPUT_DIRECTORY. The output filename will be the
                        new SOP Instance UID. OUTPUT_DIRECTORY will be created if necessary.

I think this meets your goal. Tell me if it don't.

@paulsbduncan
Copy link
Collaborator

Yeah, I think your usage instructions are good. I think I was thinking of something drastic like a warning after you run it that the original files remain unanonymized, but probably that is going to far.

paulsbduncan
paulsbduncan previously approved these changes Sep 6, 2022
@blairconrad
Copy link
Owner Author

blairconrad commented Sep 6, 2022

I think I was thinking of something drastic like a warning after you run it that the original files remain unanonymized, but probably that is going to far.

Agreed:

User: *uses tool in a perfectly cromulent way*
Tool: DANGER!! I DID WHAT YOU WANTED!

Thanks for the feedback!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants