Assignment1-BMI550

Assignment 1: BMI 550 (Applied BioNLP)
Author: Chase Fensore

Python Packages

P0: lowercasing and word tokenization. To use, pass "none" as the preprocessing argument of predict_test_CUIs (line 423).
P1: P0 plus punctuation removal. To use, pass "pp1" as the preprocessing argument of predict_test_CUIs (line 423).
P2: P1 plus Porter stemming. To use, pass "pp2" as the preprocessing argument of predict_test_CUIs (line 423).
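The three cumulative preprocessing levels can be sketched as below. This is a minimal illustration, not the code in rulebased_system_Fensore.py: the function name `preprocess`, its `stem` parameter, and the regex tokenizer are assumptions made for the example. Only the mode strings "none", "pp1", and "pp2" come from the description above. When no stemmer is supplied, P2 falls back to NLTK's PorterStemmer, which requires the nltk package.

```python
import re
import string


def preprocess(text, mode="none", stem=None):
    """Illustrative P0/P1/P2 pipeline (hypothetical helper, not the script's code)."""
    if mode not in ("none", "pp1", "pp2"):
        raise ValueError(f"unknown preprocessing mode: {mode!r}")

    # P0: lowercase, then split into word tokens and single punctuation tokens.
    # The regex tokenizer is an assumption; the script may tokenize differently.
    tokens = re.findall(r"\w+|[^\w\s]", text.lower())

    if mode in ("pp1", "pp2"):
        # P1: drop tokens made up entirely of punctuation characters.
        tokens = [
            t for t in tokens
            if not all(ch in string.punctuation for ch in t)
        ]

    if mode == "pp2":
        # P2: Porter stemming. Import lazily so P0/P1 work without nltk.
        if stem is None:
            from nltk.stem import PorterStemmer
            stem = PorterStemmer().stem
        tokens = [stem(t) for t in tokens]

    return tokens


if __name__ == "__main__":
    print(preprocess("Severe Headache, nausea!", mode="none"))
    print(preprocess("Severe Headache, nausea!", mode="pp1"))
```

Because each level builds on the previous one, a single function with a mode switch keeps the shared lowercasing and tokenization step in one place.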

Levenshtein distance: to use, pass "Levenshtein" as the sim_metric argument of predict_test_CUIs (line 423).
Token sort ratio: to use, pass "token_sort_ratio" as the sim_metric argument of predict_test_CUIs (line 423).
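For readers unfamiliar with the two metrics, here is a standard-library sketch of each. It is an approximation written for illustration, not the script's implementation, which may rely on a fuzzy-matching library. The function names are assumptions. The normalization of Levenshtein distance to a 0-1.0 similarity is also an assumption, inferred from the threshold range given in this README. The token sort ratio here uses difflib's SequenceMatcher, so its scores can differ slightly from library versions.

```python
from difflib import SequenceMatcher


def levenshtein_distance(a, b):
    """Minimum number of single-character insertions, deletions, substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a, b):
    """Distance rescaled to 0-1.0, where 1.0 means identical (assumed scaling)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / longest


def token_sort_ratio(a, b):
    """Sort whitespace-separated tokens, then score the joined strings on 0-100."""
    sorted_a = " ".join(sorted(a.lower().split()))
    sorted_b = " ".join(sorted(b.lower().split()))
    return 100.0 * SequenceMatcher(None, sorted_a, sorted_b).ratio()


if __name__ == "__main__":
    print(levenshtein_distance("kitten", "sitting"))
    print(token_sort_ratio("chest pain", "pain chest"))
```

Sorting tokens before comparison makes the second metric insensitive to word order, which is useful when the same symptom is phrased with its words rearranged.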

Token sort ratio: the threshold ranges from 0 to 100. To adjust it, change the min_pred_thresh argument of predict_test_CUIs (line 423).
Levenshtein distance: the threshold ranges from 0 to 1.0. To adjust it, change the min_pred_thresh argument of predict_test_CUIs (line 423).
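The sketch below shows one plausible way a minimum-prediction threshold could be applied. How predict_test_CUIs actually uses min_pred_thresh is not documented here, so the helper name `best_match_above_threshold`, the greater-than-or-equal comparison, and the placeholder candidate IDs are all assumptions. The point to note is the scale: the threshold must match the metric, for example 0.8 for Levenshtein and 80 for token sort ratio.

```python
def best_match_above_threshold(scores, min_pred_thresh):
    """Return the highest-scoring candidate, or None if it falls below the threshold.

    scores: dict mapping a candidate ID to its similarity score.
    min_pred_thresh: cutoff on the same scale as the scores
        (0-1.0 for Levenshtein, 0-100 for token sort ratio).
    Hypothetical helper; the script's own rule may differ.
    """
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_pred_thresh else None


if __name__ == "__main__":
    # Placeholder IDs, not real concept identifiers.
    levenshtein_scores = {"CUI_A": 0.92, "CUI_B": 0.40}
    print(best_match_above_threshold(levenshtein_scores, 0.8))

    token_sort_scores = {"CUI_A": 62.0}
    print(best_match_above_threshold(token_sort_scores, 80))
```

Raising the threshold trades recall for precision: fewer expressions receive a prediction, but the predictions that remain are closer matches.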

Run: python rulebased_system_Fensore.py. Before running: on line 409, set the input file to Assignment1GoldStandardSet.xlsx; on line 423, set the hyperparameters described above; on line 424, change the output file name if desired.
New output will be stored in: data/result-Assignment1GoldStandardSet.xlsx.
My best-performing output is currently stored in: results/result-Assignment1GoldStandardSet.xlsx.

Run: python rulebased_system_Fensore.py. Before running: on line 409, set the input file to UnlabeledSet.xlsx; on line 423, set the hyperparameters described above; on line 424, change the output file name if desired.
New output will be stored in: data/result-UnlabeledSet.xlsx.
Existing output is stored in: results/result-UnlabeledSet.xlsx.
