Error when running with greengenes taxonomy files, integer ID field causes type issue #531

Open
mstapylton opened this issue Jul 13, 2021 · 4 comments

@mstapylton

When the ID column contains only integers, the DataFrame index gets an integer dtype, which causes the index.intersection code in tools.py to fail:
Screen Shot 2021-07-01 at 3 40 03 PM
reproduce-files.zip
Here's the fix I put in, which casts the index to string type:
Screenshot from 2021-07-13 10-39-40
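
Since the fix above is only available as a screenshot, here is a minimal sketch of the same idea in text form. It is not the actual Empress code: the variable names, the feature IDs, and the taxonomy strings are all made up for illustration, and it assumes that tip names read from a tree are strings while an all-integer ID column is parsed by pandas as an integer index.

```python
import pandas as pd

# Hypothetical feature metadata whose ID column holds only integers
# (as with Greengenes OTU IDs), so pandas infers an integer index.
feature_metadata = pd.DataFrame(
    {"Taxon": ["k__Bacteria", "k__Archaea"]},
    index=pd.Index([228054, 228057], name="Feature ID"),
)

# Hypothetical tip names; names parsed from a tree file are strings.
tip_names = pd.Index(["228054", "228057", "1000"])

# Integer labels never compare equal to string labels, so the
# intersection is empty even though the IDs "look" the same.
shared_before = feature_metadata.index.intersection(tip_names)

# Casting the index to str makes the labels comparable.
feature_metadata.index = feature_metadata.index.astype(str)
shared_after = feature_metadata.index.intersection(tip_names)

print(len(shared_before), len(shared_after))
```

An alternative with the same effect would be to read the ID column as strings when loading the file in the first place, rather than casting afterwards.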

@ElDeveloper
Member

Thanks @mstapylton, would you be able to submit a pull request with these changes and a new unit test?

@mstapylton
Author

Sure, I think I can carve out some time next week to put together a PR. I work in the Clemente lab by the way.

@ElDeveloper
Member

ElDeveloper commented Jul 16, 2021 via email

@fedarko added the bug label Jul 16, 2021
@fedarko
Collaborator

fedarko commented Jul 21, 2021

Thank you for the detailed information and bug report, @mstapylton!

I downloaded the files you uploaded and tried running Empress with them. I don't know if this will help with the PR or testing, but the behavior I get (using the latest version of Empress as of today, on my system) differs somewhat from the error message in your screenshot. I'm recording the differences below in case they're helpful to you or to future users.

  • When running Empress standalone (as is done in the issue here), I instead get the following error:

    Traceback (most recent call last):
      File "/home/marcus/Software/miniconda2/envs/qiime2-2021.2/bin/empress", line 11, in <module>
        load_entry_point('empress', 'console_scripts', 'empress')()
      File "/home/marcus/Software/miniconda2/envs/qiime2-2021.2/lib/python3.6/site-packages/click/core.py", line 829, in __call__
        return self.main(*args, **kwargs)
      File "/home/marcus/Software/miniconda2/envs/qiime2-2021.2/lib/python3.6/site-packages/click/core.py", line 782, in main
        rv = self.invoke(ctx)
      File "/home/marcus/Software/miniconda2/envs/qiime2-2021.2/lib/python3.6/site-packages/click/core.py", line 1259, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/home/marcus/Software/miniconda2/envs/qiime2-2021.2/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/home/marcus/Software/miniconda2/envs/qiime2-2021.2/lib/python3.6/site-packages/click/core.py", line 610, in invoke
        return callback(*args, **kwargs)
      File "/home/marcus/Dropbox/Work/KnightLab/Empress/fedarko/empress/empress/scripts/_cli.py", line 46, in tree_plot
        shear_to_feature_metadata=shear_to_feature_metadata)
      File "/home/marcus/Dropbox/Work/KnightLab/Empress/fedarko/empress/empress/core.py", line 158, in __init__
        shear_to_feature_metadata,
      File "/home/marcus/Dropbox/Work/KnightLab/Empress/fedarko/empress/empress/core.py", line 244, in _validate_and_match_data
        ) = match_tree_and_feature_metadata(self.tree, self.features)
      File "/home/marcus/Dropbox/Work/KnightLab/Empress/fedarko/empress/empress/tools.py", line 87, in match_tree_and_feature_metadata
        "No features in the feature metadata are present in the tree, "
    empress.tools.DataMatchingError: No features in the feature metadata are present in the tree, either as tips or as internal nodes.
    

    This is still a bug in Empress (the feature metadata does match up with the tree, so this error is incorrect!), but it's a different error to the one you show above (ValueError: invalid literal for int() with base 10: '35bf...').

    I assume the dataset used in the screenshot is different from the dataset included in the ZIP file, which might explain the difference in error messages. However, I'm still confused by the traceback you showed, since the btt = [int(s) for s in btt] line listed there doesn't appear anywhere in Empress' codebase (at least as far as I can tell), even in old releases.

  • When I run Empress through QIIME 2*, I don't run into this problem -- I'm able to create and load a visualization without any problems.

    * (after first importing the tree file into a QZA with the following command: qiime tools import --type "Phylogeny[Rooted]" --input-path 99_otus.tree --output-path 99_otus.qza)

Anyway -- it's not a huge deal or anything, but hopefully this provides some extra context if these problems were coming up during creation of the PR / testing. Thank you again for raising this issue.
