This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

FastMRI dataset onboarding script and detailed examples #444

Merged
merged 64 commits into from
May 19, 2021
ee10d2c
better logging
ant0nsc Apr 20, 2021
60ebb7c
increase upload timeout
ant0nsc Apr 21, 2021
f9ea89d
onboarding script
ant0nsc Apr 22, 2021
85af616
project file
ant0nsc Apr 22, 2021
14cb742
Merge remote-tracking branch 'origin/main' into antonsc/fastmri
ant0nsc Apr 22, 2021
b112445
changelog
ant0nsc Apr 22, 2021
a1ba5fb
flake8
ant0nsc Apr 22, 2021
4892cee
doc
ant0nsc Apr 22, 2021
e2e9252
fix auth problems
ant0nsc Apr 23, 2021
b20cdb6
fix AWS problem
ant0nsc Apr 23, 2021
a513d8d
Merge remote-tracking branch 'origin/main' into antonsc/fastmri
ant0nsc Apr 23, 2021
c4dbeb6
style fix
ant0nsc Apr 23, 2021
3a15484
fix .tar.gz problem
ant0nsc Apr 26, 2021
3fcf966
fix multi-node problem on HelloContainerfla
ant0nsc Apr 26, 2021
ecab8f2
mypy
ant0nsc Apr 26, 2021
35dac4c
docu
ant0nsc Apr 26, 2021
f63d460
docu
ant0nsc Apr 26, 2021
d24a7dd
running fastmri on knee_singlecoil
ant0nsc Apr 26, 2021
1a6463e
logging noise
ant0nsc Apr 27, 2021
7e5edd2
downgrade azure-mgmt-resource because it leads to loads of warnings
ant0nsc Apr 27, 2021
8ad9354
bug fix
ant0nsc Apr 27, 2021
bc3c37c
cleanup
ant0nsc Apr 27, 2021
7a5014d
flake
ant0nsc Apr 27, 2021
25f8147
docu
ant0nsc Apr 27, 2021
d51679a
docu
ant0nsc Apr 27, 2021
aba61d7
docu
ant0nsc Apr 28, 2021
24a56c4
docu
ant0nsc Apr 28, 2021
3d6251a
progress bar
ant0nsc May 4, 2021
22342c7
rename func
ant0nsc May 4, 2021
b64ee72
PR doc
ant0nsc May 11, 2021
acb51f9
adding more models
ant0nsc May 11, 2021
10fede1
Merge remote-tracking branch 'origin/main' into antonsc/fastmri
ant0nsc May 11, 2021
8d281a0
docu
ant0nsc May 11, 2021
8826890
docu
ant0nsc May 11, 2021
dfdb900
mypy
ant0nsc May 11, 2021
ef52486
test fix
ant0nsc May 12, 2021
8d452ae
Adding more hooks
ant0nsc May 12, 2021
4741fe3
Merge remote-tracking branch 'origin/main' into antonsc/fastmri
ant0nsc May 12, 2021
18d2c63
merge
ant0nsc May 12, 2021
16d5c9c
adding fixed mountpoints
ant0nsc May 12, 2021
cfb32e6
mypy
ant0nsc May 12, 2021
0e2a28d
mypy
ant0nsc May 12, 2021
4da0544
PR doc
ant0nsc May 12, 2021
b1aeba2
doc
ant0nsc May 12, 2021
f88b253
test fix
ant0nsc May 12, 2021
307366f
test fix
ant0nsc May 12, 2021
799a531
docu
ant0nsc May 12, 2021
ab450c3
fallback
ant0nsc May 12, 2021
7ea6cba
removing "unused params" warning
ant0nsc May 12, 2021
040aebc
docker warning
ant0nsc May 12, 2021
0bfaea7
mypy
ant0nsc May 12, 2021
87726ab
docu
ant0nsc May 12, 2021
c683cb1
test fix
ant0nsc May 12, 2021
03065a9
docu
ant0nsc May 12, 2021
da26cd7
docu
ant0nsc May 12, 2021
932dfa8
accidental changes
ant0nsc May 14, 2021
6ef12ab
PR comments
ant0nsc May 14, 2021
b163757
Update InnerEye/Scripts/prepare_fastmri.py
ant0nsc May 14, 2021
d5e2520
fix stuck HelloContainer problem
ant0nsc May 18, 2021
fe16627
diagnostics
ant0nsc May 18, 2021
2bd0eba
Merge remote-tracking branch 'origin/main' into antonsc/fastmri
ant0nsc May 18, 2021
6ada342
remove accidental exit(1)
ant0nsc May 18, 2021
eb303eb
unique name
ant0nsc May 19, 2021
1fb63dc
Merge branch 'main' into antonsc/fastmri
ant0nsc May 19, 2021
docu
ant0nsc committed Apr 27, 2021
commit 25f81472f0da702b4beab7455f3b761836d1e69b
31 changes: 30 additions & 1 deletion docs/fastmri.md
@@ -66,7 +66,36 @@ with their corrected extension.
`knee_DICOMs_compressed` and `brain_DICOMs_compressed` (as `.tar` files)


# Troubleshooting
### Troubleshooting the data downloading
If you see a runtime error saying "The subscription is not registered to use namespace 'Microsoft.DataFactory'", then
follow the steps described in [this Stack Overflow answer](https://stackoverflow.com/a/48419951/5979993) to enable
DataFactory for your subscription.
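
The error message indicates that the `Microsoft.DataFactory` resource provider is not registered for the subscription.
As an alternative to the steps in the linked answer, the registration can usually also be done from the Azure CLI.
The sketch below is not part of the InnerEye toolbox: it assumes that the Azure CLI (`az`) is installed and that you
have already run `az login`. The helper names (`provider_register_command`, `provider_state_command`, `run`) are
made up for this example.

```python
import subprocess
from typing import List

# Namespace taken from the error message above.
DATA_FACTORY_NAMESPACE = "Microsoft.DataFactory"


def provider_register_command(namespace: str = DATA_FACTORY_NAMESPACE) -> List[str]:
    """Build the Azure CLI call that registers a resource provider namespace."""
    return ["az", "provider", "register", "--namespace", namespace]


def provider_state_command(namespace: str = DATA_FACTORY_NAMESPACE) -> List[str]:
    """Build the Azure CLI call that prints the registration state of a namespace."""
    return ["az", "provider", "show", "--namespace", namespace,
            "--query", "registrationState", "--output", "tsv"]


def run(command: List[str]) -> str:
    """Run a command and return its stdout. Raises CalledProcessError if the command fails."""
    result = subprocess.run(command, check=True, capture_output=True, text=True)
    return result.stdout.strip()
```

To use it, call `run(provider_register_command())` once, then call `run(provider_state_command())` until it reports
`Registered`. Registration can take a few minutes.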


## Running a FastMri model with InnerEye

The Azure Data Factory that downloaded the data has put it into the storage account you supplied on the command line.
If set up correctly, this is the Azure storage account that holds all datasets used in your AzureML workspace.
Hence, once the download completes, you are ready to use the InnerEye toolbox to submit an AzureML job that uses
the FastMRI data.

The InnerEye toolbox already includes an example model that uses the `knee_multicoil` dataset. Please
check out [fastmri_varnet.py](../InnerEye/ML/configs/other/fastmri_varnet.py). As with all InnerEye models, you can
start a training run by specifying the name of the class that defines the model, like this:
```shell script
python InnerEye/ML/runner.py --model FastMri --azureml=True --num_nodes=4
```
This will start an AzureML job with 4 nodes training at the same time. The total number of GPUs depends on the
virtual machine type of your compute cluster: for example, if your cluster uses ND24 virtual machines, where
each VM has 4 Tesla P40 cards, training will use a total of 16 GPUs.
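
The GPU count above is simply the number of nodes times the number of GPUs per node. A minimal sketch of that
arithmetic follows. The function `total_gpus` is a hypothetical helper for illustration, not an InnerEye API.

```python
def total_gpus(num_nodes: int, gpus_per_node: int) -> int:
    """Total number of GPUs used by a job that occupies all GPUs on each node."""
    if num_nodes < 1 or gpus_per_node < 1:
        raise ValueError("num_nodes and gpus_per_node must both be at least 1")
    return num_nodes * gpus_per_node


# Example from the text: --num_nodes=4 on VMs with 4 GPUs each.
print(total_gpus(num_nodes=4, gpus_per_node=4))
```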

As is common with multi-node training, training time does not scale linearly with the number of nodes. The following
table gives a rough overview of the time to train the FastMri model in the InnerEye toolbox on our cluster (4 Tesla P40
cards per node):

| Step | 1 node | 4 nodes | 8 nodes |
| --- | --- | --- | --- |
| Download training data (1.25 TB) | 22min | 22min | 22min |
| Train and validate 1 epoch | 4h 15min | 1h 6min | 34min |
| Evaluate on test set | 30min | 30min | 30min |
| Total time | 5h 7min | 1h 58min | 1h 26min |
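
To make the scaling behaviour in the table explicit, the sketch below recomputes the totals and derives speed-up
factors relative to a single node. The numbers are copied from the table. The function names are made up for this
example, and the speed-ups only describe this one set of measurements.

```python
# Timings in minutes, copied from the table above.
DOWNLOAD_MIN = 22   # download training data: same for all node counts
EVALUATE_MIN = 30   # evaluate on test set: same for all node counts
EPOCH_MIN = {1: 4 * 60 + 15, 4: 1 * 60 + 6, 8: 34}  # train and validate 1 epoch


def total_minutes(num_nodes: int) -> int:
    """Total wall-clock time for download, one epoch, and evaluation."""
    return DOWNLOAD_MIN + EPOCH_MIN[num_nodes] + EVALUATE_MIN


def epoch_speedup(num_nodes: int) -> float:
    """Speed-up of one training epoch relative to a single node."""
    return EPOCH_MIN[1] / EPOCH_MIN[num_nodes]


def total_speedup(num_nodes: int) -> float:
    """Speed-up of the whole job relative to a single node."""
    return total_minutes(1) / total_minutes(num_nodes)


for nodes in sorted(EPOCH_MIN):
    print(f"{nodes} node(s): total {total_minutes(nodes)} min, "
          f"epoch speed-up {epoch_speedup(nodes):.1f}x, "
          f"total speed-up {total_speedup(nodes):.1f}x")
```

With these numbers, a single epoch speeds up by about 3.9x on 4 nodes and 7.5x on 8 nodes, while the total job time
only improves by about 2.6x and 3.6x, because the download and test-set evaluation steps take the same time
regardless of the number of nodes.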