Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add GDCM dependency to Dockerfile #5

Open
danielsnider opened this issue Jan 2, 2019 · 16 comments
Open

Add GDCM dependency to Dockerfile #5

danielsnider opened this issue Jan 2, 2019 · 16 comments

Comments

@danielsnider
Copy link

danielsnider commented Jan 2, 2019

My DICOM images seem to be compressed and ask for GDCM. It would be great to add GDCM to the ocr Dockerfile :-).

This is what I see:

root@38a72d18ad9b:/code# ./entrypoint.sh --input /data
DEBUG Found 1 contender files in data
DEBUG Checking 1 dicom files for validation.
Found 1 valid dicom files
Traceback (most recent call last):
  File "/code/main.py", line 172, in <module>
    main()
  File "/code/main.py", line 108, in main
    verbose=args.verbose)
  File "/code/user/__init__.py", line 31, in __init__
    self.image = dicom._get_pixel_array()
  File "/opt/anaconda2/lib/python2.7/site-packages/pydicom-1.0.0a1-py2.7.egg/pydicom/dataset.py", line 1053, in _get_pixel_array
    self._pixel_array = self._pixel_data_numpy()
  File "/opt/anaconda2/lib/python2.7/site-packages/pydicom-1.0.0a1-py2.7.egg/pydicom/dataset.py", line 690, in _pixel_data_numpy
    raise NotImplementedError("Pixel Data is compressed in a "
NotImplementedError: Pixel Data is compressed in a format pydicom does not yet handle. Cannot return array. Pydicom might be able to convert the pixel data using GDCM if it is installed.

Sidenote: Commenting on line main.py#L157... if you catch all exceptions it would be great to at minimum print the exception instead of hiding it. Not seeing the error is the worst kind of error.

@vsoch
Copy link
Member

vsoch commented Jan 2, 2019

Agree on both fronts! I haven't looked at this repository in a bit, let me see what I can do.

@danielsnider
Copy link
Author

It would be great to release a new build to DockerHub!

@danielsnider
Copy link
Author

I'm eager to compare your progress with a machine learning approach to my approach with tesseract OCR, computer vision, and python.

@vsoch
Copy link
Member

vsoch commented Jan 2, 2019

Definitely! I'd like to do this proper with a Docker container deployed via CI, so I'll likely take a day or two to do this. I'll post an update (and ping you on the PR) when I have something to test.

@danielsnider
Copy link
Author

danielsnider commented Jan 2, 2019 via email

@vsoch
Copy link
Member

vsoch commented Jan 2, 2019

Just a heads up, I'm renaming the repo to "dicom-cleaner" because the scraper doesn't as well describe the intended purpose. The Docker images will be built from:

  • pydicom/dicom-header-cleaner
  • pydicom/dicom-ocr-cleaner

I don't have Github Actions for this organization yet, but if/when we do the deploy will be much easier than relying on an external service.

@vsoch
Copy link
Member

vsoch commented Jan 2, 2019

And it would be really fun to test these images and develop new methods! I created the tooling a while back and nobody was super interested in a proper testing, so I'm really happy you are ! :D

@vsoch
Copy link
Member

vsoch commented Jan 3, 2019

hey @danielsnider I'm still figuring out the CI (customizing the orb to build only on PRs and then push on merge to master) but since the container (with gdcm installed via conda) was erroneously pushed to docker hub, if you want to play around with it, here you go! https://cloud.docker.com/u/pydicom/repository/docker/pydicom/dicom-ocr-cleaner

  • See here for the PR to add CircleCI. Note that I have recipes that work to customize the build/deploy cycle, but I'd like to do it using the new CircleCI orbs so I'm waiting to get some feedback.
  • I didn't test anything or change anything in the image except for using the latest version of deid and pydicom. If you are testing and hit issues please post on the PR and I'll fix it up.
  • I am installing GDCM in the same way it's done for pydicom in the travis testing.

I should have some time to check up on this later, hopefully we will hear back from Circle and then can test the GDCM as well.

@vsoch
Copy link
Member

vsoch commented Jan 3, 2019

hey @danielsnider the CI is added, see the images being built and pushed to the docker repos I mentioned above here --> https://circleci.com/gh/pydicom/dicom-cleaner

I added the gdcm install to the pydicom/dicom-ocr-cleaner, although I didn't do any testing, etc (the PR was to add the continuous integration). If you'd like to use this container as a start for your testing, with your feedback we can open another PR to work on the software itself.

@danielsnider
Copy link
Author

Here we go! I'm trying your new container! Can you look into the error below? I've included the python package versions that are in the container too. It looks like pydicom version 1.0.0a1 actually came through, which I suspect is the problem.

dan@ubuntu:~/dicom-scraper$ sudo docker pull pydicom/dicom-header-cleaner                                          
07fbc26aa5a1: Pull complete 
Digest: sha256:edf9c8f44b0d65c2c013b0de28880d58fc053f62b9e210bfb59fd5729942d1ef
Status: Downloaded newer image for pydicom/dicom-header-cleaner:latest

dan@ubuntu:~/dicom-scraper$ sudo docker run --volume ~/input:/data -it  --entrypoint=/bin/bash  pydicom/dicom-heade
r-cleaner -i
root@f7eaa1bdbc23:/code# ./entrypoint.sh --input /data

Traceback (most recent call last):
  File "/code/main.py", line 180, in <module>
    main()
  File "/code/main.py", line 90, in main
    from deid.dicom import get_files
  File "/usr/local/lib/python3.5/dist-packages/deid/dicom/__init__.py", line 1, in <module>
    from .header import (
  File "/usr/local/lib/python3.5/dist-packages/deid/dicom/header.py", line 31, in <module>
    from .tags import (
  File "/usr/local/lib/python3.5/dist-packages/deid/dicom/tags.py", line 27, in <module>
    from pydicom.tag import tag_in_exception
ImportError: cannot import name 'tag_in_exception'

root@f7eaa1bdbc23:/code# python3.5
Python 3.5.2 (default, Nov 12 2018, 13:43:14) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from pydicom.tag import tag_in_exception
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name 'tag_in_exception'
>>>

root@f7eaa1bdbc23:/code# pip3 freeze
certifi==2018.11.29
chardet==3.0.4
cloudpickle==0.6.1
cycler==0.10.0
dask==1.0.0
decorator==4.3.0
deid==0.1.27
idna==2.8
ipython==2.4.1
kiwisolver==1.0.1
matplotlib==3.0.2
networkx==2.2
numpy==1.15.4
Pillow==5.4.0
pydicom==1.0.0a1
Pygments==2.3.1
pyparsing==2.3.0
python-dateutil==2.7.5
PyWavelets==1.0.1
requests==2.21.0
retrying==1.3.3
scikit-image==0.14.1
scikit-learn==0.15.2
scipy==1.2.0
simplegeneric==0.8.1
simplejson==3.16.0
six==1.12.0
toolz==0.9.0
urllib3==1.21.1
validator.py==1.2.5

Hopefully just a wrong dependency issue on pydicom==1.0.0a1?

@vsoch
Copy link
Member

vsoch commented Jan 4, 2019

Yep this is actually what I expected! The module needs to be updated for the newest release of pydicom. I'll do this in a new PR and we can continue discussion from there.

@danielsnider
Copy link
Author

danielsnider commented Jan 4, 2019 via email

@vsoch
Copy link
Member

vsoch commented Jan 4, 2019

hey @danielsnider a PR is open with fixes to the header cleaner, see here, and see the last comment for a pushed container you can test.

#8

I'm not sure how well you can test with your images since the header flagging largely depends on the deid recipe (the filter) criteria used. If your images aren't flagged I'd like to ask you to look at deid.dicom's default recipe -> and comment on what combinations of header fields are being missed! https://github.com/pydicom/deid/blob/master/deid/data/deid.dicom It's a terrible strategy, generally, but the problems with ML approaches is that they take much longer to do.

The ocr image has a bug #9 that likely needs fixing via updating versions. If you have any insight let me know, the original issue is linked and I just don't have time to work on it now.

@danielsnider
Copy link
Author

danielsnider commented Jan 4, 2019 via email

@danielsnider
Copy link
Author

danielsnider commented Jan 4, 2019 via email

@vsoch
Copy link
Member

vsoch commented Jan 4, 2019

I'm on the same page - the "hardcoded" memorize patterns in headers to find pixels just doesn't have feet!

The image is rebuilding now - I'll let you know how it goes after testing!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants