From c9191c2cb4ccdc42afbf01ea5c4a253d27613b50 Mon Sep 17 00:00:00 2001
From: Joe Makepeace
Date: Thu, 5 May 2022 05:54:02 +0100
Subject: [PATCH] Dataset creation instructions added to README.md

---
 README.md | 33 ++++++++++++++++++++++++++++++---
 1 file changed, 30 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 1c34a51..352b9c6 100644
--- a/README.md
+++ b/README.md
@@ -65,14 +65,41 @@ python3 train.py \
     --semg_train
 ```
 
+## Create Dataset
+
+Two main types of dataset can be used with this module. The first is
+for a regular ASR model that performs speech recognition on the
+ground-truth audio files. To create a dataset of this type, use:
+
+```bash
+python3 create_dataset.py \
+    --emg_dir "./silent_speech/emg_data" \
+    --testset_path "./silent_speech/testset_largedev.json"
+```
+
+If you instead want to create a dataset that uses the predicted
+mel spectrograms you have already generated with the transduction
+model, use:
+
+```bash
+python3 create_dataset.py \
+    --emg_dir "./silent_speech/emg_data" \
+    --testset_path "./silent_speech/testset_largedev.json" \
+    --semg_preds_path "./silent_speech/pred_audio"
+```
+
 ## Evaluate
 
-To evaluate the best trained model released with the report, run the
-following code:
+To evaluate the best trained model released with the report,
+download the model from
+[Google Drive](https://drive.google.com/file/d/1O8jIWV1v0orE4kOVA6IG-FgYyFO8OMDH/view?usp=sharing)
+into this directory.
+Then create a dataset of the full EMG data predictions using
+the above instructions and run the following code:
 
 ```bash
 python3 evaluate.py \
-    --checkpoint_path "path_to_pretrained_model/ds2_DATASET_SILENT_SPEECH_EPOCHS_10_TEST_LOSS_1.8498832106590273_WER_0.6825681123095443" \
+    --checkpoint_path "ds2_DATASET_SILENT_SPEECH_EPOCHS_10_TEST_LOSS_1.8498832106590273_WER_0.6825681123095443" \
     --dataset_path "path_to_dataset.csv" \
     --print_top 10 \
     --semg_eval