Initial Pseudonymization #10776
You know that we have a bib file generator Python script to quickly generate thousands of entries? https://github.com/JabRef/jabref/blob/main/scripts/bib-file-generator.py In addition, for reproducing issues and the like, it does not make any sense.
I will implement the feature to check a BibTeX file in the context of a paper. For that, I need mappings of real examples. This is (for me) easier than replicating the properties of the real example in a Python script. I am thinking of end-to-end tests closer to reality. I can also include this code in a follow-up pull request.
Additionally:
- .bib files now need to have \n as line ending
- Use \n as newline separator at medline and ris imports
- Use @ParameterizedTests
- Improve code of RisImporter
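To illustrate the line-ending point, here is a minimal sketch of normalizing text to \n before it is parsed or compared. The class and method names are hypothetical and this is not the code from this PR; it only assumes that both \r\n and a lone \r should become \n.

```java
// Hypothetical helper, not part of JabRef: normalizes all line endings to "\n".
final class LineEndings {
    private LineEndings() {
    }

    static String normalize(String content) {
        // Replace Windows line endings first, so that "\r\n" does not become "\n\n".
        return content.replace("\r\n", "\n").replace('\r', '\n');
    }
}
```

With this, `LineEndings.normalize("a\r\nb\rc\n")` yields `"a\nb\nc\n"`, so a test comparing imported content against an expected string does not depend on the platform that wrote the file.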
This PR (should) also fix issues with
Also fixes
The image in the PR description of #10778 is my use case.
// TODO: Anonymize metadata
// TODO: Anonymize strings
Followup?
public void writeValuesMappingAsCsv(Path path) throws IOException {
    try (
        OutputStreamWriter writer = new OutputStreamWriter(Files.newOutputStream(path), StandardCharsets.UTF_8);
        CSVPrinter csvPrinter = new CSVPrinter(writer, CSVFormat.DEFAULT)
    ) {
        csvPrinter.printRecord("pseudonymized", "original value");
        valueMapping.entrySet().stream()
                    // We have date-1, date-2, ..., date-10, date-11. That should be sorted accordingly.
                    .sorted(Comparator.comparing((Map.Entry<String, String> entry) -> getKeyPrefix(entry.getKey()))
                                      .thenComparingInt(entry -> extractNumber(entry.getKey())))
                    .forEach(Unchecked.consumer(entry -> {
                        csvPrinter.printRecord(entry.getKey(), entry.getValue());
                    }));
    }
}
Nope. ;-)
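The helpers getKeyPrefix and extractNumber used in writeValuesMappingAsCsv are not shown in this thread. The following is a minimal sketch of how such helpers could produce the order date-1, date-2, ..., date-10, date-11, assuming every key has the form prefix-number. The class name, the helper bodies, and the fallback to 0 for keys without a numeric suffix are assumptions for illustration, not the PR's code.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical stand-in for the helpers referenced in the reviewed snippet.
final class PseudonymKeyOrder {
    private PseudonymKeyOrder() {
    }

    // "date-10" -> "date"; a key without '-' is returned unchanged.
    static String getKeyPrefix(String key) {
        int dash = key.lastIndexOf('-');
        return dash == -1 ? key : key.substring(0, dash);
    }

    // "date-10" -> 10; keys without a numeric suffix map to 0 (assumption).
    static int extractNumber(String key) {
        int dash = key.lastIndexOf('-');
        if (dash == -1) {
            return 0;
        }
        try {
            return Integer.parseInt(key.substring(dash + 1));
        } catch (NumberFormatException e) {
            return 0;
        }
    }

    // Sorts by prefix alphabetically, then by the numeric suffix.
    static List<String> sortedKeys(Map<String, String> valueMapping) {
        Comparator<String> byPrefix = Comparator.comparing(PseudonymKeyOrder::getKeyPrefix);
        return valueMapping.keySet().stream()
                .sorted(byPrefix.thenComparingInt(PseudonymKeyOrder::extractNumber))
                .collect(Collectors.toList());
    }
}
```

A plain string sort would place date-10 before date-2. Splitting the key into prefix and number avoids that.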
* upstream/main:
  Bump org.apache.lucene:lucene-queries from 9.9.0 to 9.9.1 (#10795)
  Bump com.google.guava:guava from 32.1.3-jre to 33.0.0-jre (#10793)
  Bump com.dlsc.gemsfx:gemsfx from 1.90.0 to 1.92.0 (#10796)
  Bump org.mockito:mockito-core from 5.8.0 to 5.9.0 (#10794)
  Bump lycheeverse/lychee-action from 1.9.0 to 1.9.1 (#10791)
  refactor: Transform calls to `Objects.isNull(..)` and `Objects.nonNull(..)` (#10788)
  refactor: Prefer `String#formatted(Object...)` (#10787)
  refactor: Adopt `SequencedCollection` (#10786)
  Update CSL styles (#10785)
  Initial Pseudonymization (#10776)
  Use StringUtil.intValueOf instead of StringUtil.intValueOfOptional or custom code (#10779)
  Refine loading code (#10780)
  Add wokraround for theme detector issue (#10777)
  Fix enablement of Aux dialog's "Generate" button (#10775)
  Fix package of TypedBibEntry (#10774)
  Fix labeling of PRs for newcomers (#10773)
  Update add-greeting-to-issue.yml
  Update add-greeting-to-issue.yml
  Update add-greeting-to-issue.yml
This adds an initial anonymization of BibTeX libraries.
The use case is to make .bib files available inside JabRef's repository for reproducing issues. One area is performance issues; another is certain quality checks.
The current implementation is very basic, but sufficient for my current needs.
I propose to merge this in and refine it afterwards. TODOs are inside the code. Other future work is to include this functionality in the CLI (and the UI). In the UI, it could work similarly to library-based-on-aux-generation.
NO CHANGELOG entry, because the functionality is accessible through tests only.
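For readers who have not seen the code, the basic idea can be sketched as follows: each field value is replaced by a pseudonym of the form field-N, and a pseudonym-to-original mapping is kept so that it can later be exported (for example, as CSV). The class name, method names, and the choice to reuse a pseudonym for a repeated value are assumptions made for this illustration, not necessarily what the PR does.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of field-wise pseudonymization, not JabRef's actual class.
final class FieldPseudonymizer {
    // Next running number per field name, e.g. "author" -> 2.
    private final Map<String, Integer> counters = new HashMap<>();
    // (field, original value) -> pseudonym, so repeated values get the same pseudonym (assumption).
    private final Map<String, String> originalToPseudonym = new HashMap<>();
    // pseudonym -> original value, in creation order.
    private final Map<String, String> valueMapping = new LinkedHashMap<>();

    String pseudonymize(String field, String value) {
        // '\u0000' separates field and value so that different pairs cannot collide.
        String lookupKey = field + '\u0000' + value;
        return originalToPseudonym.computeIfAbsent(lookupKey, key -> {
            int number = counters.merge(field, 1, Integer::sum);
            String pseudonym = field + "-" + number;
            valueMapping.put(pseudonym, value);
            return pseudonym;
        });
    }

    Map<String, String> getValueMapping() {
        return valueMapping;
    }
}
```

Calling `pseudonymize("author", "Alice")` and then `pseudonymize("author", "Bob")` on a fresh instance returns `author-1` and `author-2`, and `getValueMapping()` then maps `author-2` back to `Bob`.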
Mandatory checks
- CHANGELOG.md described in a way that is understandable for the average user (if applicable)