Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support_TIF_classweights_and_exclusion #18

Closed
wants to merge 7 commits into from
Closed

support_TIF_classweights_and_exclusion #18

wants to merge 7 commits into from

Conversation

minh-doan
Copy link
Collaborator

  • Support parsing TIF
  • Support class_weights in model.py
  • Allow exclusion when collecting samples for training/testing sets

@minh-doan minh-doan mentioned this pull request Oct 21, 2017
@@ -44,14 +44,29 @@
"--verbose",
is_flag=True
)
def command(input, batch_size, directory, name, verbose):
@click.option(
"--exclusion",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--exclude

@click.option(
"--exclusion",
default=None,
help="A comma-separated list of prefixes (string) specifying the files that needs to be helf off the testing dataset."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"helf" -> "held"

"--exclusion",
default=None,
help="A comma-separated list of prefixes (string) specifying the files that needs to be helf off the testing dataset."
" E.g., \"'patient_A', 'patient_X'\". All numpy arrays will be collected for testing if this flag is omitted."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"numpy arrays" -> "files"

Then we won't have to change the wording if/when the underlying data structure changes.

" E.g., \"'patient_A', 'patient_X'\". All numpy arrays will be collected for testing if this flag is omitted."
)
@click.option(
"--nsamples",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--samples would be okay, too.

"This setting is useful to limit certain amount of datapoint to be displayed in unsupervised PCA/t-SNE plots."
"All numpy arrays will be collected for testing if this flag is omitted."
)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary newline.

# Check extension
pathnames = glob.glob(os.path.join(label_directories[0], "*"))
## TO_DO:
# how to make sure the label_directories[0] is not non-folder files? e.g. .DS_Store
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use os.path.isdir to filter out directories from non-directories.

@@ -101,7 +101,8 @@ def fit(self, x, y, batch_size=32, epochs=512, validation_split=0.2, verbose=0):
"epochs": epochs,
"steps_per_epoch": len(x_train) // batch_size,
"validation_steps": len(x_valid) // batch_size,
"verbose": verbose
"verbose": verbose,
"class_weight" : class_weight
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"class_weight": class_weight

@@ -33,7 +37,7 @@ def parse(pathname, output_directory, size, channels=None):
if ext == ".cif":
return _parse_cif(pathname, output_directory, size, channels)

raise NotImplementedError("Unsupported file format: {}".format(ext))
raise NotImplementedError("Expected file format: {}".format(".CIF"))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I envisioned parse being a top-level function which delegated to different parse functions based on the file extensions. E.g.,

def parse(pathname, output_directory, size, channels=None):
    ext = os.path.splitext(pathname)[-1].lower()

    if ext == ".cif":
        return _parse_cif(pathname, output_directory, size, channels)

    if ext == ".npy":
        return _parse_npy(pathname, output_directory, size, channels) 

    raise NotImplementedError("Unsupported file format: {}".format(ext))

I see how it would be annoying to call parse for every .TIF image. Maybe the first argument (pathname) could take either a single file name or a list of file names.

nested_filenames = []

for label in labels:
# print("Parsing directory: {}".format(label))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment can be deleted.


src_dir = os.path.join(src, label)

filenames = glob.glob("{}/*.tif".format(src_dir))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use os.path.join to concatenate paths: os.path.join(src_dir, "*.tif")

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't need glob.glob when using os.path.join(src_dir, "*.tif"), right?

@mcquin
Copy link
Collaborator

mcquin commented Oct 23, 2017

There are some good changes here, Minh. Nice work!

Let's work next on getting the existing tests passing:

  • Any call to model.fit needs to have the class_weights parameter specified. Alternatively, we could provide a sane default. For example, if class_weights is none, then use equal weights (1?).
  • The evaluate command doesn't run due to this line. I made some comments in the PR on how we can change it that should resolve the issue.
  • The default for --exclusion should probably be "" instead of None, or we should use an if to skip the exclusion filtering when a pattern isn't provided.
  • The parse commands are failing with "Unsupported file format: .cif" errors. Maybe there's something missing in the extension check.

@mcquin
Copy link
Collaborator

mcquin commented Mar 7, 2018

Usurped by #22

@mcquin mcquin closed this Mar 7, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants