Add examples of advanced SYNERGY use #96

Merged 40 commits on Apr 24, 2023
0a89e4a
Add new README to branch
J535D165 Mar 22, 2023
0b25352
Update README.md
J535D165 Mar 22, 2023
41dfbd8
Add files via upload
J535D165 Mar 22, 2023
f81c8fa
Add image
J535D165 Mar 22, 2023
9ff3262
Update README.md
J535D165 Mar 22, 2023
92e25d3
Add Kwok dataset
J535D165 Mar 22, 2023
bdd4ac8
Update README.md
J535D165 Mar 22, 2023
8033a6c
Update README.md
J535D165 Mar 22, 2023
a6a146e
Update README.md
J535D165 Mar 22, 2023
7f45467
Update README.md
J535D165 Mar 22, 2023
abb1eb0
Update README.md
J535D165 Mar 22, 2023
b4046a4
Add link to web.archive.org
J535D165 Apr 1, 2023
9bd3dc9
Create ATTRIBUTION.md
J535D165 Apr 2, 2023
e44ac02
Update README.md
J535D165 Apr 2, 2023
c60ea72
Update ATTRIBUTION.md
J535D165 Apr 2, 2023
ef31c70
Update LICENSE
J535D165 Apr 2, 2023
7ee4b0c
Update broken links in ATTRIBUTION.md
J535D165 Apr 4, 2023
7d48dfc
Merge branch 'master' into README
J535D165 Apr 4, 2023
ca0a175
Update numbers in README.md
J535D165 Apr 10, 2023
6694da4
Fix wrong percentage
J535D165 Apr 10, 2023
4342069
Add examples on Python package
J535D165 Apr 15, 2023
51dd9d8
Update README.md
J535D165 Apr 15, 2023
d8de592
Update README.md
J535D165 Apr 15, 2023
b37200f
Update attribution
J535D165 Apr 15, 2023
412eba5
Update ATTRIBUTION.md
J535D165 Apr 15, 2023
5ba3470
Update ATTRIBUTION.md
J535D165 Apr 15, 2023
b8ec22d
Update README.md
J535D165 Apr 15, 2023
dd95287
Update README.md
J535D165 Apr 15, 2023
f42e058
Add LICENSE info
J535D165 Apr 16, 2023
bce3a20
Update license text
J535D165 Apr 16, 2023
10fe4a0
Update codebook
J535D165 Apr 16, 2023
202c2ee
Merge branch 'README' into examples
J535D165 Apr 16, 2023
b7eaad3
Merge branch 'master' into README
J535D165 Apr 16, 2023
ad598cc
Merge branch 'README' into examples
J535D165 Apr 16, 2023
cab2654
Add notebook on concepts in SYNERGY
J535D165 Apr 17, 2023
da3c1d4
Add more API examples
J535D165 Apr 24, 2023
5519fe1
Remove changes to readme
J535D165 Apr 24, 2023
4bc5697
Delete ATTRIBUTION.md
J535D165 Apr 24, 2023
dd337f8
Add Attribution
J535D165 Apr 24, 2023
9cd3fb6
Merge branch 'master' into examples
J535D165 Apr 24, 2023
Update README.md
J535D165 committed Mar 22, 2023
commit 8033a6c0a7538a823ac23ab361cef53c0b086e3b
63 changes: 17 additions & 46 deletions README.md
@@ -117,61 +117,32 @@ oa_status
oa_url
```

## Benchmark

### Integration with ASReview Makita
Work in progress.

First install both pyodss and makita with
```sh
pip install pyodss asreview-makita
```

Now create a new folder and run the following code:

Linux/MacOS:

```sh
pyodss get -o data
asreview makita basic
sh run.sh
```

Windows:

```bat
pyodss get -o data
asreview makita basic - run.bat
run.bat
```


# Data pre-processing

The full text of each article is pre-processed using natural language processing techniques. This includes tasks such as sentence segmentation, tokenization, part-of-speech tagging, and named entity recognition. The pre-processing step is designed to extract meaningful features from the text that can be used to train machine learning algorithms. The resulting pre-processed dataset is then split into training and testing sets, with a predefined ratio.
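
The split described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's actual pipeline: the function name `train_test_split`, the 0.2 test ratio, the seed, and the record fields `id` and `included` are all hypothetical, since the text only says the ratio is "predefined".

```python
import random


def train_test_split(records, test_ratio=0.2, seed=42):
    """Shuffle a copy of `records` and split it into (train, test).

    `test_ratio` and `seed` are illustrative defaults, not values
    taken from the dataset documentation.
    """
    if not 0.0 < test_ratio < 1.0:
        raise ValueError("test_ratio must be strictly between 0 and 1")
    shuffled = list(records)  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical records: ten articles, every fifth one labeled as included.
articles = [{"id": i, "included": i % 5 == 0} for i in range(10)]
train, test = train_test_split(articles, test_ratio=0.2)
print(len(train), len(test))
```

Fixing the seed keeps the split reproducible across runs, which matters when results on the test set are compared between experiments.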



# Data Format
## Attribution

The dataset is provided in a standardized format that includes the following fields:
We would like to thank the following authors for openly sharing the data corresponding to their systematic review:

Title: The title of the article.
Abstract: The abstract of the article.
Full Text: The full text of the article, pre-processed using natural language processing techniques.
Inclusion Status: A binary label indicating whether the article was included or excluded in the systematic review.
Reason for Exclusion: If the article was excluded, a brief explanation of the reason for exclusion.
The dataset is provided in both CSV and JSON formats.
[Christian Appenzeller-Herzog](https://orcid.org/0000-0001-7430-294X), [Tim Mathes](https://orcid.org/0000-0002-5304-1717), Marlies L.S. Heeres, [Karl Heinz Weiss](https://orcid.org/0000-0002-6336-9935), [Roderick H. J. Houwen](https://orcid.org/0000-0001-6124-7937), [Hannah Ewald](https://orcid.org/0000-0002-5081-1093), [Alexandra Bannach-Brown](https://orcid.org/0000-0002-3161-1395), [Piotr Przybyła](https://orcid.org/0000-0001-9043-6817), James D. Thomas, [Andrew S.C. Rice](https://orcid.org/0000-0001-9533-5636), [Sophia Ananiadou](https://orcid.org/0000-0002-4097-9191), [Jing Liao](https://orcid.org/0000-0001-7014-5377), [Malcolm R. Macleod](https://orcid.org/0000-0001-9187-9839), [Daniel Bos](https://orcid.org/0000-0001-8979-2603), [Frank J. Wolters](https://orcid.org/0000-0003-2226-4050), [Sirwan K.L. Darweesh](https://orcid.org/0000-0002-4361-4593), [Meike W. Vernooij](https://orcid.org/0000-0003-4658-2176), Frank de Wolf, [M. Arfan Ikram](https://orcid.org/0000-0003-0173-9571), [Albert Hofman](https://orcid.org/0000-0002-9865-121X), [Roger Chou](https://orcid.org/0000-0001-9889-8610), Elizabeth A. Clark, Mark Helfand, [Roger Chou](https://orcid.org/0000-0001-9889-8610), Kim Peterson, Mark Helfand, [Anouk A. M. T. Donners](https://orcid.org/0000-0002-8147-013X), Carin M. A. Rademaker, Lisanne A. H. Bevers, [Alwin D. R. Huitema](https://orcid.org/0000-0003-1939-4639), [Roger E. G. Schutgens](https://orcid.org/0000-0002-2762-6033), [Antoine C. G. Egberts](https://orcid.org/0000-0003-1758-7779), [Krista Fischer](https://orcid.org/0000-0001-7126-6613), [Trevor J. Hall](https://orcid.org/0000-0002-0427-6325), [Sarah Beecham](https://orcid.org/0000-0003-1584-5447), David Bowes, David Gray, [Serena J. Counsell](https://orcid.org/0000-0002-8033-5673), [Cathalijn H. C. Leenaars](https://orcid.org/0000-0002-8212-7632), Wilhelmus Drinkenburg, Christ Nolten, Maurice Dematteis, Ruud N. J. M. A. Joosten, Matthijs G. P. Feenstra, [Rob B. M. 
de Vries](https://orcid.org/0000-0002-0000-8796), [Rosanne W. Meijboom](https://orcid.org/0000-0002-7370-0695), [Helga Gardarsdottir](https://orcid.org/0000-0001-5623-9684), [Antoine C. G. Egberts](https://orcid.org/0000-0003-1758-7779), [Thijs J. Giezen](https://orcid.org/0000-0002-4087-033X), Heidi Nelson, Linda Humphrey, Peggy Nygren, Steven M. Teutsch, Janet D. Allan, Dimitrije Radjenović, [Marjan Hericko](https://orcid.org/0000-0002-1094-0085), [Richard Torkar](https://orcid.org/0000-0002-0118-8143), Aleš Živkovič, [Sanne C. Smid](https://orcid.org/0000-0001-6451-202X), [Daniel McNeish](https://orcid.org/0000-0003-1643-9408), [Milica Miočević](https://orcid.org/0000-0001-8487-3666), [Rens van de Schoot](https://orcid.org/0000-0001-7736-2091), [Eline S van der Valk](https://orcid.org/0000-0001-5134-5453), [Ozair Abawi](https://orcid.org/0000-0002-1343-6562), Mostafa Mohseni, Amir Abdelmoumen, Vincent L. Wester, [Bibian van der Voorn](https://orcid.org/0000-0003-1299-0067), [Anand Krishnan V. Iyer](https://orcid.org/0000-0002-2090-5590), [Erica L T van den Akker](https://orcid.org/0000-0001-5352-9328), [Sanne E. Hoeks](https://orcid.org/0000-0003-4022-9574), Sjoerd A.A. van den Berg, [Yolanda B. de Rijke](https://orcid.org/0000-0001-7759-4968), [Tobias Stalder](https://orcid.org/0000-0001-7558-1274), [Elisabeth F.C. van Rossum](https://orcid.org/0000-0003-0120-4913), [Rens van de Schoot](https://orcid.org/0000-0001-7736-2091), [Marit Sijbrandij](https://orcid.org/0000-0001-5430-9810), [Sonja D. Winter](https://orcid.org/0000-0002-2203-002X), [Sarah Depaoli](https://orcid.org/0000-0002-1277-0462), [Jeroen K. Vermunt](https://orcid.org/0000-0001-9053-9330), Eva A.M. van Dis, [Suzanne C. van Veen](https://orcid.org/0000-0002-5659-2557), Muriel A. Hagenaars, [Neeltje M. 
Batelaan](https://orcid.org/0000-0001-6444-3781), [Claudi L H Bockting](https://orcid.org/0000-0002-9220-9244), [Rinske M van den Heuvel](https://orcid.org/0000-0002-3835-4686), [Pim Cuijpers](https://orcid.org/0000-0001-5497-2743), Iris M. Engelhard, [Frank J. Wolters](https://orcid.org/0000-0003-2226-4050), Reffat A. Segufa, [Sirwan K.L. Darweesh](https://orcid.org/0000-0002-4361-4593), [Daniel Bos](https://orcid.org/0000-0001-8979-2603), [M. Arfan Ikram](https://orcid.org/0000-0003-0173-9571), [Behnam Sabayan](https://orcid.org/0000-0002-1176-9152), [Albert Hofman](https://orcid.org/0000-0002-9865-121X), [Sanaz Sedaghat](https://orcid.org/0000-0002-3244-7726)

# Dataset Size
For more credits, run `pyodss attribution`.

The dataset contains a total of X articles, of which Y were included in the systematic review and Z were excluded. The training set contains a subset of X articles, with a predefined ratio, while the testing set contains the remaining articles.
## Citing SYNERGY dataset

The ODSS dataset is a linked dataset that consists of Study Selection in Systematic Reviews. The dataset consists of XXX fully labeled datasets. For all these datasets, an OpenAlex record is available.
If you use SYNERGY in a scientific publication, we would appreciate references to:

## Attribution
Biblatex entry:

We would like to thank the following authors for openly sharing the data corresponding to their systematic review:
@online{xxx,
author = {xxx},
title = {xxx},
date = {xxx},
year = {2023},
}

[Christian Appenzeller-Herzog](https://orcid.org/0000-0001-7430-294X), [Tim Mathes](https://orcid.org/0000-0002-5304-1717), Marlies L.S. Heeres, [Karl Heinz Weiss](https://orcid.org/0000-0002-6336-9935), [Roderick H. J. Houwen](https://orcid.org/0000-0001-6124-7937), [Hannah Ewald](https://orcid.org/0000-0002-5081-1093), [Alexandra Bannach-Brown](https://orcid.org/0000-0002-3161-1395), [Piotr Przybyła](https://orcid.org/0000-0001-9043-6817), James D. Thomas, [Andrew S.C. Rice](https://orcid.org/0000-0001-9533-5636), [Sophia Ananiadou](https://orcid.org/0000-0002-4097-9191), [Jing Liao](https://orcid.org/0000-0001-7014-5377), [Malcolm R. Macleod](https://orcid.org/0000-0001-9187-9839), [Daniel Bos](https://orcid.org/0000-0001-8979-2603), [Frank J. Wolters](https://orcid.org/0000-0003-2226-4050), [Sirwan K.L. Darweesh](https://orcid.org/0000-0002-4361-4593), [Meike W. Vernooij](https://orcid.org/0000-0003-4658-2176), Frank de Wolf, [M. Arfan Ikram](https://orcid.org/0000-0003-0173-9571), [Albert Hofman](https://orcid.org/0000-0002-9865-121X), [Roger Chou](https://orcid.org/0000-0001-9889-8610), Elizabeth A. Clark, Mark Helfand, [Roger Chou](https://orcid.org/0000-0001-9889-8610), Kim Peterson, Mark Helfand, [Anouk A. M. T. Donners](https://orcid.org/0000-0002-8147-013X), Carin M. A. Rademaker, Lisanne A. H. Bevers, [Alwin D. R. Huitema](https://orcid.org/0000-0003-1939-4639), [Roger E. G. Schutgens](https://orcid.org/0000-0002-2762-6033), [Antoine C. G. Egberts](https://orcid.org/0000-0003-1758-7779), [Krista Fischer](https://orcid.org/0000-0001-7126-6613), [Trevor J. Hall](https://orcid.org/0000-0002-0427-6325), [Sarah Beecham](https://orcid.org/0000-0003-1584-5447), David Bowes, David Gray, [Serena J. Counsell](https://orcid.org/0000-0002-8033-5673), [Cathalijn H. C. Leenaars](https://orcid.org/0000-0002-8212-7632), Wilhelmus Drinkenburg, Christ Nolten, Maurice Dematteis, Ruud N. J. M. A. Joosten, Matthijs G. P. Feenstra, [Rob B. M. 
de Vries](https://orcid.org/0000-0002-0000-8796), [Rosanne W. Meijboom](https://orcid.org/0000-0002-7370-0695), [Helga Gardarsdottir](https://orcid.org/0000-0001-5623-9684), [Antoine C. G. Egberts](https://orcid.org/0000-0003-1758-7779), [Thijs J. Giezen](https://orcid.org/0000-0002-4087-033X), Heidi Nelson, Linda Humphrey, Peggy Nygren, Steven M. Teutsch, Janet D. Allan, Dimitrije Radjenović, [Marjan Hericko](https://orcid.org/0000-0002-1094-0085), [Richard Torkar](https://orcid.org/0000-0002-0118-8143), Aleš Živkovič, [Sanne C. Smid](https://orcid.org/0000-0001-6451-202X), [Daniel McNeish](https://orcid.org/0000-0003-1643-9408), [Milica Miočević](https://orcid.org/0000-0001-8487-3666), [Rens van de Schoot](https://orcid.org/0000-0001-7736-2091), [Eline S van der Valk](https://orcid.org/0000-0001-5134-5453), [Ozair Abawi](https://orcid.org/0000-0002-1343-6562), Mostafa Mohseni, Amir Abdelmoumen, Vincent L. Wester, [Bibian van der Voorn](https://orcid.org/0000-0003-1299-0067), [Anand Krishnan V. Iyer](https://orcid.org/0000-0002-2090-5590), [Erica L T van den Akker](https://orcid.org/0000-0001-5352-9328), [Sanne E. Hoeks](https://orcid.org/0000-0003-4022-9574), Sjoerd A.A. van den Berg, [Yolanda B. de Rijke](https://orcid.org/0000-0001-7759-4968), [Tobias Stalder](https://orcid.org/0000-0001-7558-1274), [Elisabeth F.C. van Rossum](https://orcid.org/0000-0003-0120-4913), [Rens van de Schoot](https://orcid.org/0000-0001-7736-2091), [Marit Sijbrandij](https://orcid.org/0000-0001-5430-9810), [Sonja D. Winter](https://orcid.org/0000-0002-2203-002X), [Sarah Depaoli](https://orcid.org/0000-0002-1277-0462), [Jeroen K. Vermunt](https://orcid.org/0000-0001-9053-9330), Eva A.M. van Dis, [Suzanne C. van Veen](https://orcid.org/0000-0002-5659-2557), Muriel A. Hagenaars, [Neeltje M. 
Batelaan](https://orcid.org/0000-0001-6444-3781), [Claudi L H Bockting](https://orcid.org/0000-0002-9220-9244), [Rinske M van den Heuvel](https://orcid.org/0000-0002-3835-4686), [Pim Cuijpers](https://orcid.org/0000-0001-5497-2743), Iris M. Engelhard, [Frank J. Wolters](https://orcid.org/0000-0003-2226-4050), Reffat A. Segufa, [Sirwan K.L. Darweesh](https://orcid.org/0000-0002-4361-4593), [Daniel Bos](https://orcid.org/0000-0001-8979-2603), [M. Arfan Ikram](https://orcid.org/0000-0003-0173-9571), [Behnam Sabayan](https://orcid.org/0000-0002-1176-9152), [Albert Hofman](https://orcid.org/0000-0002-9865-121X), [Sanaz Sedaghat](https://orcid.org/0000-0002-3244-7726)
## Contact

For more credits, run `pyodss credits`.
Reach out on the [Discussion forum](https://github.com/asreview/systematic-review-datasets/discussions).