Initial Pseudonymization #10776

Merged
merged 11 commits into from
Jan 14, 2024

koppor
Member

@koppor koppor commented Jan 13, 2024

This adds an initial anonymization of BibTeX libraries.

The use case is to make .bib files available inside JabRef's repository for reproducing issues, for example performance issues or problems with certain quality checks.

The current implementation is very basic, but it is sufficient for my needs at the moment.

I propose to merge this in and refine it afterwards. TODOs are inside the code. Other future work is to include this functionality in the CLI (and the UI). In the UI, it could work similarly to generating a library based on an AUX file.

No CHANGELOG entry, because the functionality is accessible through tests only.
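
To illustrate the general idea of pseudonymization with a value mapping, here is a minimal sketch. It is not the code of this PR: the class `PseudonymSketch` and its methods are hypothetical, and the `field-N` naming is only assumed from the keys mentioned in the review thread further down (`date-1`, `date-2`, ...). The sketch also assumes that the same original value always receives the same pseudonym.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not JabRef's implementation.
class PseudonymSketch {
    // field name -> (original value -> pseudonym)
    private final Map<String, Map<String, String>> pseudonymsByField = new HashMap<>();
    // pseudonym -> original value, kept so that the mapping can be exported later
    private final Map<String, String> valueMapping = new LinkedHashMap<>();

    // Returns a stable pseudonym of the form "<field>-<n>" for the given value.
    String pseudonymize(String field, String originalValue) {
        Map<String, String> known = pseudonymsByField.computeIfAbsent(field, f -> new HashMap<>());
        String pseudonym = known.get(originalValue);
        if (pseudonym == null) {
            // Assumption: numbering starts at 1 and is counted per field
            pseudonym = field + "-" + (known.size() + 1);
            known.put(originalValue, pseudonym);
            valueMapping.put(pseudonym, originalValue);
        }
        return pseudonym;
    }

    Map<String, String> getValueMapping() {
        return valueMapping;
    }

    public static void main(String[] args) {
        PseudonymSketch sketch = new PseudonymSketch();
        System.out.println(sketch.pseudonymize("author", "Jane Doe"));
        System.out.println(sketch.pseudonymize("author", "John Roe"));
        System.out.println(sketch.pseudonymize("author", "Jane Doe"));
        System.out.println(sketch.getValueMapping());
    }
}
```

Keeping the reverse mapping separate from the pseudonymized library means that the library can be shared while the mapping stays private.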

Mandatory checks

  • Change in CHANGELOG.md described in a way that is understandable for the average user (if applicable)
  • Tests created for changes (if applicable)
  • Manually tested changed features in running JabRef (always required)
  • Screenshots added in PR description (for UI changes)
  • Checked developer's documentation: Is the information available and up to date? If not, I outlined it in this pull request.
  • Checked documentation: Is the information available and up to date? If not, I created an issue at https://github.com/JabRef/user-documentation/issues or, even better, I submitted a pull request to the documentation repository.

@Siedlerchr
Member

You know that we have a bib file generator Python script to quickly generate thousands of entries? https://github.com/JabRef/jabref/blob/main/scripts/bib-file-generator.py

In addition, for reproducing issues or the like, it does not make any sense.

@koppor
Member Author

koppor commented Jan 13, 2024

> In addition, for reproducing issues or the like, it does not make any sense.

I will implement the feature to check a BibTeX file in the context of a paper. For this, I need mappings of real examples. This is (for me) easier than replicating the properties of the real example in a Python script. I am thinking of end-to-end tests that are closer to reality.

I can also include this code in a follow-up pull request.

Additionally:

- .bib files now need to have \n as line ending
- Use \n as newline separator in the Medline and RIS imports
- Use @ParameterizedTest
- Improve code of RisImporter

@koppor koppor changed the title from "Initial anonymization" to "[WIP] Initial Pseudonymization" Jan 13, 2024
@koppor
Member Author

koppor commented Jan 13, 2024

This PR should also fix issues with the .bib files maintained in this repository.

@koppor
Member Author

koppor commented Jan 13, 2024

This PR also fixes equals for BibDatabaseContext. The eventBus object (created anew for each context) was included in the comparison. Therefore, two contexts were never equal, even if all their other fields were equal.
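
The problem can be sketched as follows. This is an illustration only, not the actual BibDatabaseContext code: `ContextSketch` and its `databasePath` field are hypothetical, and a plain `Object` stands in for the event bus. The point is that a field holding a fresh object per instance must be left out of `equals` and `hashCode`.

```java
import java.util.Objects;

// Hypothetical stand-in for a context class; not JabRef's BibDatabaseContext.
class ContextSketch {
    private final String databasePath;
    // Created anew for every instance, so it is never equal across two instances
    private final Object eventBus = new Object();

    ContextSketch(String databasePath) {
        this.databasePath = databasePath;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof ContextSketch)) {
            return false;
        }
        ContextSketch other = (ContextSketch) o;
        // eventBus is intentionally not compared: it is per-instance infrastructure, not state
        return Objects.equals(databasePath, other.databasePath);
    }

    @Override
    public int hashCode() {
        // Must use the same fields as equals
        return Objects.hash(databasePath);
    }

    public static void main(String[] args) {
        ContextSketch first = new ContextSketch("library.bib");
        ContextSketch second = new ContextSketch("library.bib");
        System.out.println(first.equals(second));
    }
}
```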

@koppor koppor changed the title from "[WIP] Initial Pseudonymization" to "Initial Pseudonymization" Jan 13, 2024
@koppor
Member Author

koppor commented Jan 13, 2024

The image in the PR description of #10778 shows my use case.

Comment on lines +65 to +66
// TODO: Anonymize metadata
// TODO: Anonymize strings
Member

Follow-up?

Comment on lines 34 to 48
public void writeValuesMappingAsCsv(Path path) throws IOException {
    try (
            OutputStreamWriter writer = new OutputStreamWriter(Files.newOutputStream(path), StandardCharsets.UTF_8);
            CSVPrinter csvPrinter = new CSVPrinter(writer, CSVFormat.DEFAULT)
    ) {
        csvPrinter.printRecord("pseudonymized", "original value");
        valueMapping.entrySet().stream()
                    // We have date-1, date-2, ..., date-10, date-11. That should be sorted accordingly.
                    .sorted(Comparator.comparing((Map.Entry<String, String> entry) -> getKeyPrefix(entry.getKey())
                    ).thenComparingInt(entry -> extractNumber(entry.getKey())))
                    .forEach(Unchecked.consumer(entry -> {
                        csvPrinter.printRecord(entry.getKey(), entry.getValue());
                    }));
    }
}
Member

Nope. ;-)
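
The ordering used in the reviewed method (first by key prefix, then numerically by the trailing number) can be sketched in isolation as follows. The helpers `keyPrefix` and `keyNumber` are hypothetical stand-ins for `getKeyPrefix` and `extractNumber`, whose bodies are not shown in the review excerpt. The sketch assumes that every key has the form `<prefix>-<number>`.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of the sort order; helper bodies are assumptions.
class PseudonymKeyOrder {
    // Everything before the last dash, e.g. "date" for "date-10"
    static String keyPrefix(String key) {
        int dash = key.lastIndexOf('-');
        return dash < 0 ? key : key.substring(0, dash);
    }

    // The number after the last dash, e.g. 10 for "date-10"; 0 if there is no dash.
    // Throws NumberFormatException if the suffix is not a number.
    static int keyNumber(String key) {
        int dash = key.lastIndexOf('-');
        return dash < 0 ? 0 : Integer.parseInt(key.substring(dash + 1));
    }

    static Comparator<String> order() {
        Comparator<String> byPrefix = Comparator.comparing(PseudonymKeyOrder::keyPrefix);
        // Comparing the numbers as ints puts date-2 before date-10
        return byPrefix.thenComparingInt(PseudonymKeyOrder::keyNumber);
    }

    static List<String> sorted(List<String> keys) {
        return keys.stream().sorted(order()).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // prints [author-1, date-1, date-2, date-10]
        System.out.println(sorted(List.of("date-10", "date-2", "author-1", "date-1")));
    }
}
```

A plain string sort would place `date-10` before `date-2`, which is why the numeric part is compared separately.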

Contributor

github-actions bot commented Jan 14, 2024

The build for this PR is no longer available. Please visit https://builds.jabref.org/main/ for the latest build.

@koppor koppor added this pull request to the merge queue Jan 14, 2024
Merged via the queue into main with commit 6582752 Jan 14, 2024
18 checks passed
@koppor koppor deleted the add-anonymization branch January 14, 2024 19:30
Siedlerchr added a commit that referenced this pull request Jan 15, 2024
* upstream/main:
  Bump org.apache.lucene:lucene-queries from 9.9.0 to 9.9.1 (#10795)
  Bump com.google.guava:guava from 32.1.3-jre to 33.0.0-jre (#10793)
  Bump com.dlsc.gemsfx:gemsfx from 1.90.0 to 1.92.0 (#10796)
  Bump org.mockito:mockito-core from 5.8.0 to 5.9.0 (#10794)
  Bump lycheeverse/lychee-action from 1.9.0 to 1.9.1 (#10791)
  refactor: Transform calls to `Objects.isNull(..)` and `Objects.nonNull(..)` (#10788)
  refactor: Prefer `String#formatted(Object...)` (#10787)
  refactor: Adopt `SequencedCollection` (#10786)
  Update CSL styles (#10785)
  Initial Pseudonymization (#10776)
  Use StringUtil.intValueOf instead of StringUtil.intValueOfOptional or custom code (#10779)
  Refine loading code (#10780)
  Add wokraround for theme detector issue (#10777)
  Fix enablement of Aux dialog's "Generate" button (#10775)
  Fix package of TypedBibEntry (#10774)
  Fix labeling of PRs for newcomers (#10773)
  Update add-greeting-to-issue.yml
  Update add-greeting-to-issue.yml
  Update add-greeting-to-issue.yml
@koppor koppor mentioned this pull request Jun 20, 2024
6 tasks