Skip to content

Latest commit

 

History

History

ocr

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Dicom OCR Scraper

important this model would need to be rebuilt to work with more recent python and packages. The older version is provided at vanessa/dicom-scraper. We will need to rebuild the model using python 3 and a later version of scikit-learn to update the base container.


This is currently under development, and builds a Docker image to run text (letter detection) on a demo image. You can either detect (just find and report) or clean the data (and save cleaned png images). If this is a route we want to go, the data can be saved as dicom proper.

Docker

First, to build the image (or just skip to download and use version built on Docker Hub):

$ docker build -t vanessa/dicom-scraper .

Then to run it, you can first see if it works:

$ docker run vanessa/dicom-scraper --help

and you should see usage

$ docker run vanessa/dicom-scraper --help
usage: main.py [-h] [--input FOLDER] [--outfolder OUTFOLDER] [--detect]
               [--verbose]

Deid (de-identification) pixel scaping tool.

optional arguments:
  -h, --help            show this help message and exit
  --input FOLDER, -i FOLDER
                        input folder to search for images.
  --outfolder OUTFOLDER, -o OUTFOLDER
                        full path to save output, will use /data folder if not
                        specified
  --detect, -d          Only detect, but don't try to scrub
  --verbose, -v         if set, print more image debugging to screen.

We see that you should provide a folder with dicom files to the --input argument. If you want to see the image files preprocessed (with contenders in red boxes), you should also map a --volume. If you only want to detect (and not clean) you can use --detect.

Detection

Let's cd to some folder with dicom images, and then map it (the $PWD to /data) in the container. We will specify --input to be /data, meaning the mapped folder with our images. Let's try just detection first

$ cd dicom_folder
$ docker run --volume $PWD:/data vanessa/dicom-scraper --input /data --detect

You'll see overly verbose output (this would be nice to replace with a progress bar) followed by the final summary of detection:

DETECTED: 84
SKIPPED:  27
CLEAN:    1
TOTAL:    112

Clean after Detection

Now we will specify the same command, but without --detect so we also perform cleaning (note this is under development).

cd dicom_folder
$ docker run --volume $PWD:/data vanessa/dicom-scraper --input /data

You'll see the pixels that are being cleaned, and the output (png files for preview) in the same folder (the deprecation warnings need to be disabled):

$ docker run --volume $PWD:/data vanessa/dicom-scraper --input /data
DEBUG Found 5 contender files in data
DEBUG Checking 5 dicom files for validation.
/opt/anaconda2/lib/python2.7/site-packages/skimage/transform/_warps.py:84: UserWarning: The default mode, 'constant', will be changed to 'reflect' in skimage 0.15.
  warn("The default mode, 'constant', will be changed to 'reflect' in "
/opt/anaconda2/lib/python2.7/site-packages/skimage/feature/_hog.py:119: skimage_deprecation: Default value of `block_norm`==`L1` is deprecated and will be changed to `L2-Hys` in v0.15
  'be changed to `L2-Hys` in v0.15', skimage_deprecation)
Found 5 valid dicom files
Scrubbing (108,0,408,512)
Scrubbing (418,200,431,230)
Scrubbing (418,241,431,245)
Scrubbing (418,255,431,263)
Scrubbing (422,268,431,287)
Scrubbing (422,288,435,296)
Scrubbing (437,202,452,214)
Scrubbing (437,214,450,222)
Scrubbing (437,223,452,236)
Scrubbing (437,239,450,245)
============================================================
Scrubbing (48,255,63,271)
Scrubbing (50,249,63,253)
Scrubbing (54,276,63,295)
Scrubbing (54,296,67,304)
============================================================
Scrubbing (64,0,447,512)
Scrubbing (438,268,447,287)
Scrubbing (438,288,451,296)
Scrubbing (453,202,468,214)
Scrubbing (453,214,466,222)
Scrubbing (453,223,468,236)
Scrubbing (453,239,466,245)
============================================================
Scrubbing (90,184,103,214)
Scrubbing (90,225,103,229)
Scrubbing (90,239,103,247)
Scrubbing (94,252,103,271)
Scrubbing (94,272,107,280)
Scrubbing (109,0,474,512)
Scrubbing (109,198,122,206)
Scrubbing (109,223,122,229)
============================================================
Scrubbing (62,283,69,288)
Scrubbing (73,302,80,307)
Scrubbing (90,330,97,335)
============================================================

Complete credit for the base work goes to @FraPochetti, I just wrapped the functions in a container, added xvfb and other dependencies to (hopefully) reproduce most of the versions that he used, and then added functions to save to file.