KART: Privacy Leakage Framework of Language Models Pre-trained with Clinical Records

This is an implementation of our arXiv preprint "KART: Privacy Leakage Framework of Language Models Pre-trained with Clinical Records" (https://arxiv.org/abs/2101.00036).

Usage

0. Requirements

  • Python 3.6.4
  • Make sure that $HOME is included in the environment variable $PYTHONPATH (a minimal sketch follows this list).
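
If $PYTHONPATH does not include $HOME yet, one minimal way to prepend it (assuming a Bash shell) is:

export PYTHONPATH="$HOME${PYTHONPATH:+:$PYTHONPATH}"

Adding this line to ~/.bashrc makes the setting persistent across sessions.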

1. Preparation

1-1. Get Poetry

curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py > ~/get-poetry.py
cd ~
python get-poetry.py --version 1.1.4
poetry config virtualenvs.in-project true
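
To check that the installation worked: the legacy get-poetry.py installer places Poetry under ~/.poetry, so you may need to load its environment file first.

source $HOME/.poetry/env  # only needed if poetry is not on your PATH yet
poetry --version          # should report Poetry version 1.1.4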

1-2. Clone Repository & Install Packages

cd ~
git clone [email protected]:yutanakamura-tky/kart.git
cd ~/kart
poetry install
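
Since virtualenvs.in-project was enabled above, the virtual environment is created at ~/kart/.venv. Subsequent commands can be run inside it, for example:

poetry run python --version  # run a single command in the virtualenv
poetry shell                 # or spawn an interactive shell inside it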

1-3. Make MIMIC-III-dummy-PHI

cd ~/kart/src
bash make_mimic_iii_dummy_phi.sh
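
Note that this step requires a local copy of MIMIC-III, which is only available after PhysioNet credentialing. As a sanity check you can verify that the clinical notes table is in place; the path below is only an illustration, so check the script for the location it actually expects:

ls ~/kart/corpus/NOTEEVENTS.csv  # hypothetical path; adjust to the script's expectation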

1-4. Get non-domain-specific uncased BERT-base model

cd ~/kart/src
bash get_google_bert_model.sh
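
The uncased BERT-Base release from Google ships as uncased_L-12_H-768_A-12 and contains vocab.txt, bert_config.json, and the bert_model.ckpt.* checkpoint files. A quick way to verify the download (the directory below is an assumption; see the script for the real output path):

ls ~/kart/models/uncased_L-12_H-768_A-12/  # hypothetical path
# expected: bert_config.json  bert_model.ckpt.*  vocab.txt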

1-5. Convert MIMIC-III to BERT pre-training data

cd ~/kart/src
bash make_pretraining_data.sh
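
BERT pre-training data in TFRecord format is typically produced with create_pretraining_data.py from the google-research/bert repository, which this script presumably wraps. A minimal sketch of such an invocation, with illustrative paths and hyperparameters rather than the values this repository actually uses:

python create_pretraining_data.py \
  --input_file=notes.txt \
  --output_file=pretraining_data.tfrecord \
  --vocab_file=uncased_L-12_H-768_A-12/vocab.txt \
  --do_lower_case=True \
  --max_seq_length=128 \
  --max_predictions_per_seq=20 \
  --masked_lm_prob=0.15 \
  --dupe_factor=5

Here the input file is plain text with one sentence per line and a blank line between documents.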

1-6. Pre-train BERT model from scratch

cd ~/kart/src
bash pretrain_bert_from_scratch.sh
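
With the google-research/bert code, pre-training from random initialization is driven by run_pretraining.py. A sketch of such a run follows; the paths and hyperparameters are illustrative, not necessarily those used by the script:

python run_pretraining.py \
  --input_file=pretraining_data.tfrecord \
  --output_dir=pretraining_from_scratch \
  --do_train=True \
  --bert_config_file=uncased_L-12_H-768_A-12/bert_config.json \
  --train_batch_size=32 \
  --max_seq_length=128 \
  --max_predictions_per_seq=20 \
  --num_train_steps=1000000 \
  --num_warmup_steps=10000 \
  --learning_rate=1e-4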

1-7. Pre-train BERT model from BERT-base-uncased

cd ~/kart/src
bash pretrain_bert_from_bert_base_uncased.sh
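
The difference from step 1-6 is that training is initialized from the downloaded BERT-base-uncased checkpoint instead of random weights. With run_pretraining.py this corresponds to adding one flag to the invocation sketched above (the path is illustrative):

  --init_checkpoint=uncased_L-12_H-768_A-12/bert_model.ckpt  # hypothetical checkpoint path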

Citation

Please cite our arXiv paper:

@misc{kart,
  author = {Yuta Nakamura and Shouhei Hanaoka and Yukihiro Nomura and Naoto Hayashi and Osamu Abe and Shuntaro Yada and Shoko Wakamiya and Eiji Aramaki},
  title = {KART: Privacy Leakage Framework of Language Models Pre-trained with Clinical Records},
  year = {2020},
  eprint = {arXiv:2101.00036},
}
