add slidescore url and slidescore study id to download manifest. #42

Open
wants to merge 5 commits into
base: main

Conversation

YoniSchirris
Contributor


Fixes #40

- save manifest as json
- when starting the download, add slidescore_url and slidescore_study_id
- nest the manifest under `slidescore_url` and `slidescore_study_id` keys, so that if a user downloads multiple studies into a single directory, a second study does not overwrite the `slidescore_study_id` of the first
- add option to save mappings without downloading WSIs
- add option to save mappings as tsv instead of json
- save image name instead of file name, since the file name is not helpful when .mrxs files are downloaded as zips
- flip the image name -> image id mapping to an image id -> image name mapping, since the image name is not necessarily unique
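
The reason for flipping the mapping can be illustrated with a small sketch. The image list below is made up for illustration and does not come from any real study; it only shows that keying on a non-unique image name silently drops entries, while keying on the image id does not.

```python
# Hypothetical (image_id, image_name) pairs; two images share the same name.
images = [(101, "slide_A"), (102, "slide_B"), (103, "slide_A")]

# Keying on the name: the second "slide_A" silently overwrites the first.
name_to_id = {name: image_id for image_id, name in images}

# Keying on the id: every image keeps its own entry.
id_to_name = {image_id: name for image_id, name in images}

print(len(name_to_id))  # 2
print(len(id_to_name))  # 3
```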
@YoniSchirris
Contributor Author

Now also fixes #43

Contributor

@jonasteuwen jonasteuwen left a comment

Some comments. Can you also fix mypy?

Comment on lines 409 to 417
    Will end up with something like
    {
        'slidescore_url': str,
        'slidescore_study_id': int,
        'slide_filename_to_id_mapping': {
            str: int
            ...
        }
    }
Contributor

This is not the filename anymore, is it?

Contributor Author

Indeed, that was no longer the structure. Changed to:

    {
        "url": {
            "study_id": {
                "slidescore_url": "url",
                "slidescore_study_id": study_id,
                "slide_filename_to_study_image_id_mapping": {
                    "image_id": "image_name",
                    ...
                    ...
                }
            }
        }
    }
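
A minimal sketch of how such a nested manifest could be updated without overwriting an earlier study is shown below. The helper name `update_json_manifest` and the example URL are hypothetical and not taken from the PR; only the key names and the `slidescore_mapping.json` file name follow this conversation. Note that JSON object keys are always strings, so the ids are stored as `str`.

```python
import json
import pathlib
import tempfile


def update_json_manifest(
    save_dir: pathlib.Path,
    slidescore_url: str,
    study_id: int,
    image_id: int,
    image_name: str,
) -> dict:
    """Add one image to the nested manifest without touching other studies."""
    manifest_path = save_dir / "slidescore_mapping.json"
    manifest = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}

    # Create the url -> study_id layers only if they do not exist yet.
    study = manifest.setdefault(slidescore_url, {}).setdefault(
        str(study_id),
        {
            "slidescore_url": slidescore_url,
            "slidescore_study_id": study_id,
            "slide_filename_to_study_image_id_mapping": {},
        },
    )
    study["slide_filename_to_study_image_id_mapping"][str(image_id)] = image_name

    manifest_path.write_text(json.dumps(manifest, indent=4))
    return manifest


# Download two (made-up) studies into the same directory.
with tempfile.TemporaryDirectory() as tmp:
    save_dir = pathlib.Path(tmp)
    update_json_manifest(save_dir, "https://example.org", 1, 101, "slide_A")
    manifest = update_json_manifest(save_dir, "https://example.org", 2, 201, "slide_B")

print(sorted(manifest["https://example.org"]))  # ['1', '2']
```

Because each study lives under its own key, the second call adds a new entry instead of replacing the details of the first study.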

Comment on lines 358 to 360
    def append_to_tsv_mapping(save_dir: pathlib.Path, items: List[str]) -> None:
        """
    -   Create a manifest mapping image id to the filename.
    +   Create a manifest mapping image id to image name
Contributor

Can you make it so that, if the file does not exist yet, these two lines are written first?

    # slidescore_url: <URL>
    # study_id: <id>

Contributor Author

I do this manually in cli.download_wsis.

Otherwise I would always need to pass the url and id to the function.

Also, if the header were only written when the file does not exist yet, then downloading a second SlideScore study into the same directory would NOT write the slidescore_url and study_id of the second study.

    elif mapping_format == "tsv":
        append_to_tsv_mapping(save_dir=save_dir, items=[f"# {slidescore_url}"])
        append_to_tsv_mapping(save_dir=save_dir, items=[f"# {study_id}"])
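
To illustrate the point, here is a rough sketch of what the resulting file looks like when the caller writes the two comment lines once per study. Only the signature of `append_to_tsv_mapping` comes from the diff; its body, the `slidescore_mapping.tsv` file name, the reader `read_tsv_mapping`, and the example data are assumptions made for this sketch.

```python
import pathlib
import tempfile
from typing import List


def append_to_tsv_mapping(save_dir: pathlib.Path, items: List[str]) -> None:
    """Append one tab-separated line to the mapping file (assumed behavior)."""
    with open(save_dir / "slidescore_mapping.tsv", "a") as file:  # assumed file name
        file.write("\t".join(items) + "\n")


def read_tsv_mapping(save_dir: pathlib.Path) -> List[List[str]]:
    """Hypothetical reader: return data rows, skipping '#' comment lines."""
    rows = []
    for line in (save_dir / "slidescore_mapping.tsv").read_text().splitlines():
        if line and not line.startswith("#"):
            rows.append(line.split("\t"))
    return rows


with tempfile.TemporaryDirectory() as tmp:
    save_dir = pathlib.Path(tmp)
    for url, study_id, image in [
        ("https://example.org", 1, (101, "slide_A")),
        ("https://example.org", 2, (201, "slide_B")),
    ]:
        # Header lines are written by the caller, once per downloaded study.
        append_to_tsv_mapping(save_dir=save_dir, items=[f"# {url}"])
        append_to_tsv_mapping(save_dir=save_dir, items=[f"# {study_id}"])
        append_to_tsv_mapping(save_dir=save_dir, items=[str(image[0]), image[1]])
    rows = read_tsv_mapping(save_dir)
    raw_lines = (save_dir / "slidescore_mapping.tsv").read_text().splitlines()

print(rows)  # [['101', 'slide_A'], ['201', 'slide_B']]
```

With this approach the second study gets its own header lines, which would not happen if the header were tied to the file not existing yet.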

@@ -399,23 +493,56 @@ def download_wsis(
# Collect image metadata
images = client.get_images(study_id)

# # Add study details to mapping manifest
Contributor

Single #

Comment on lines +467 to +468
disable_download: bool = False,
mapping_format: str = "json",
Contributor

Is there no other way to get the mapping only?

Contributor Author

Not that I can think of, at least not without duplicating a lot of code or refactoring the function into less coherent parts.

Comment on lines +522 to +526
    elif mapping_format == "tsv":
        append_to_tsv_mapping(
            save_dir=save_dir,
            items=[str(image_id), image_name],
        )
Contributor

If mapping_format is not one of these, it is better to raise an error.

Contributor Author

Good point. I thought this was already handled by the argument parser, but raising an error here makes the function more generic.

    else:
        raise ValueError(f"mapping_format should be either 'tsv' or 'json', but is {mapping_format}")
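
The two layers of validation discussed here can be sketched as follows. The helper `check_mapping_format`, the constant `VALID_MAPPING_FORMATS`, and the `--mapping-format` flag name are hypothetical and only serve this illustration; the actual CLI in the repository may be set up differently. `argparse` rejects invalid values via `choices`, while the explicit check also protects callers that use the function directly from Python.

```python
import argparse

VALID_MAPPING_FORMATS = ("json", "tsv")


def check_mapping_format(mapping_format: str) -> str:
    """Raise a ValueError for anything other than 'json' or 'tsv'."""
    if mapping_format not in VALID_MAPPING_FORMATS:
        raise ValueError(
            f"mapping_format should be either 'tsv' or 'json', but is {mapping_format}"
        )
    return mapping_format


# CLI layer: argparse already refuses values outside `choices`.
parser = argparse.ArgumentParser()
parser.add_argument("--mapping-format", choices=VALID_MAPPING_FORMATS, default="json")
args = parser.parse_args(["--mapping-format", "tsv"])

# Function layer: still validated when called without the parser.
print(check_mapping_format(args.mapping_format))  # tsv
```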

- update the docstring describing the file structure of slidescore_mapping.json
- remove double #
- raise ValueError when the mapping format is not 'json' or 'tsv'

Successfully merging this pull request may close these issues.

Add slidescore server URL and slidescore study ID to slidescore_mapping.txt
2 participants