
Data and code from our "Inferring Which Medical Treatments Work from Reports of Clinical Trials", NAACL 2019. This work concerns inferring the results reported in clinical trials from text.

wangah/evidence-inference

SciFact Pre-training

This work further investigates whether pre-training on a related scientific fact-verification task can improve the performance of the Evidence Inference BERT-to-BERT pipeline model. Specifically, we pre-train on the SciFact claim verification corpus.

See README.evidence_inference.md for the original Evidence Inference README.

Colab Notebooks

The following experiments were run using Colab Pro.

  • PICO Extraction with bi-LSTM-CRF

  • PICO Extraction with SciBERT

  • SciFact Claim Prediction Analysis and Preprocessing

  • BERT Pipeline Hyperparameter Tuning on SciFact

  • BERT Pipeline Evidence Inference Abstract-Only

  • BERT Pipeline Evidence Inference with SciFact Pretraining
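PICO extraction with a bi-LSTM-CRF is a token-tagging setup, and SciBERT is commonly used the same way. As a minimal sketch, assuming the tagger emits BIO tags with hypothetical entity labels `INT`, `CMP`, and `OUT` (intervention, comparator, outcome), predicted tags for a claim could be decoded into text spans like this. The function name `bio_to_spans` and the tag names are illustrative, not taken from the notebooks.

```python
from typing import Dict, List, Optional


def bio_to_spans(tokens: List[str], tags: List[str]) -> Dict[str, List[str]]:
    """Group B-/I- tagged tokens into text spans keyed by (hypothetical) entity type."""
    spans: Dict[str, List[str]] = {}
    current_type: Optional[str] = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Close the open span, if any, and record it under its entity type.
        nonlocal current_type, current_tokens
        if current_type is not None:
            spans.setdefault(current_type, []).append(" ".join(current_tokens))
        current_type, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
        else:
            # "O" closes the open span; a stray I- tag also closes it and is ignored.
            flush()
    flush()
    return spans


tokens = ["Drug", "A", "versus", "placebo", "on", "symptom", "scores"]
tags = ["B-INT", "I-INT", "O", "B-CMP", "O", "B-OUT", "I-OUT"]
spans = bio_to_spans(tokens, tags)
print(spans)
```

SciFact claims often state only an intervention and an outcome, so a missing `CMP` key in the result is expected rather than an error.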

Experiment Design

The following steps were performed to evaluate the effectiveness of SciFact pre-training for the Evidence Inference BERT-to-BERT pipeline model:

  1. Extract and preprocess PICO spans from SciFact claims.

  2. Adapt SciFact data into the format expected by the BERT pipeline and define corresponding samplers.

  3. Train the pipeline on SciFact and save the model weights. Because of memory constraints, we optionally replaced the RoBERTa encoder with SciBERT.

  4. Train the pipeline on Evidence Inference using the pre-trained weights.
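Steps 1 and 2 can be sketched roughly as follows. This is a minimal illustration, not the repository's code. It assumes the layout of the public SciFact JSONL release (each claim has an `evidence` dict mapping document IDs to rationales with `sentences` and `label`, and each corpus entry has an `abstract` given as a list of sentences, here indexed by document ID), assumes the extracted PICO spans arrive as a plain dict, and assumes the pipeline's first stage classifies individual sentences as evidence or not. `PromptExample`, `scifact_to_prompts`, and `sample_identifier_pairs` are hypothetical names, and the toy claim is made up.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class PromptExample:
    """One SciFact rationale recast as an Evidence Inference-style prompt (hypothetical layout)."""
    intervention: str
    comparator: str
    outcome: str
    evidence: str
    label: str  # SciFact label kept as-is; no mapping to Evidence Inference labels is assumed


def scifact_to_prompts(claim: dict, pico: Dict[str, str],
                       corpus: Dict[int, dict]) -> List[PromptExample]:
    """Pair the claim's extracted ICO spans with each annotated rationale."""
    examples = []
    for doc_id, rationales in claim.get("evidence", {}).items():
        # Assumed SciFact layout: evidence keys are string doc IDs, corpus IDs are ints.
        abstract = corpus[int(doc_id)]["abstract"]
        for rationale in rationales:
            evidence = " ".join(abstract[i] for i in rationale["sentences"])
            examples.append(PromptExample(
                intervention=pico.get("intervention", ""),
                comparator=pico.get("comparator", ""),
                outcome=pico.get("outcome", ""),
                evidence=evidence,
                label=rationale["label"],
            ))
    return examples


def sample_identifier_pairs(abstract: List[str], rationale_idx: List[int],
                            n_negatives: int, seed: int = 0) -> List[Tuple[str, int]]:
    """Label rationale sentences 1 and a random subset of the remaining sentences 0."""
    rationale_set = set(rationale_idx)
    positives = [(abstract[i], 1) for i in rationale_idx]
    candidates = [s for i, s in enumerate(abstract) if i not in rationale_set]
    rng = random.Random(seed)
    chosen = rng.sample(candidates, min(n_negatives, len(candidates)))
    return positives + [(s, 0) for s in chosen]


# Toy inputs (made up for illustration).
claim = {"id": 1, "claim": "Drug A improves symptom scores compared with placebo.",
         "evidence": {"7": [{"sentences": [1], "label": "SUPPORT"}]}}
corpus = {7: {"abstract": ["Patients were randomized.",
                           "Drug A improved symptom scores versus placebo.",
                           "No adverse events were reported."]}}
pico = {"intervention": "Drug A", "comparator": "placebo", "outcome": "symptom scores"}

examples = scifact_to_prompts(claim, pico, corpus)
pairs = sample_identifier_pairs(corpus[7]["abstract"], [1], n_negatives=1)
print(examples[0])
print(pairs)
```

Keeping the SciFact label unchanged sidesteps the question of how SUPPORT/CONTRADICT should map onto the Evidence Inference outcome labels, which this README does not specify; the classifier's label space would need to be adapted either way.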

Note that this experiment does not change the model architecture; instead, it forces the SciFact dataset into the same format as the Evidence Inference data by extracting PICO spans from SciFact claims. Alternative approaches that could also be considered include:

  1. Add a module that first learns a representation of prompts and claims before feeding them to the rest of the model.

  2. Apply linearization to both the claims and the prompts.
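For the second option, one possible linearization renders both input types as plain text with explicit field names, so a single encoder sees a uniform query format whether the query came from a structured prompt or a free-text claim. The field prefixes, the literal `[SEP]` separator, and the function names below are assumptions for illustration; a real BERT tokenizer inserts its own separator token when given a text pair.

```python
SEP = " [SEP] "  # illustrative separator; a real tokenizer adds its own special tokens


def linearize_prompt(intervention: str, comparator: str, outcome: str) -> str:
    """Flatten a structured ICO prompt into one text string with field names."""
    return (f"intervention: {intervention}{SEP}"
            f"comparator: {comparator}{SEP}"
            f"outcome: {outcome}")


def linearize_claim(claim: str) -> str:
    """Wrap a free-text SciFact claim in the same prefixed style."""
    return f"claim: {claim}"


def build_input(query: str, evidence: str) -> str:
    """Join a linearized query and an evidence passage into one encoder input."""
    return f"{query}{SEP}evidence: {evidence}"


prompt = linearize_prompt("Drug A", "placebo", "symptom scores")
claim = linearize_claim("Drug A improves symptom scores compared with placebo.")
print(build_input(prompt, "Drug A improved symptom scores versus placebo."))
print(build_input(claim, "Drug A improved symptom scores versus placebo."))
```

Because both paths end in the same `build_input` format, SciFact pre-training and Evidence Inference fine-tuning would share one input layout without requiring PICO extraction from the claims.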
