Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New tool: biobambam/bamsormadup #1478

Merged
merged 27 commits into from
Apr 6, 2022
Merged

New tool: biobambam/bamsormadup #1478

merged 27 commits into from
Apr 6, 2022

Conversation

matthdsm
Copy link
Contributor

@matthdsm matthdsm commented Apr 1, 2022

Add biobambam/bamsormadup
fixes #1479

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the module conventions in the contribution docs
  • If necessary, include test data in your PR.
  • Remove all TODO statements.
  • Emit the versions.yml file.
  • Follow the naming conventions.
  • Follow the parameters requirements.
  • Follow the input/output options guidelines.
  • Add a resource label
  • Use BioConda and BioContainers if possible to fulfil software requirements.
  • Ensure that the test works with either Docker / Singularity. Conda CI tests can be quite flaky:
    • PROFILE=docker pytest --tag <MODULE> --symlink --keep-workflow-wd --git-aware
    • PROFILE=singularity pytest --tag <MODULE> --symlink --keep-workflow-wd --git-aware
    • PROFILE=conda pytest --tag <MODULE> --symlink --keep-workflow-wd --git-aware

@matthdsm matthdsm self-assigned this Apr 1, 2022
@matthdsm matthdsm mentioned this pull request Apr 1, 2022
4 tasks
@matthdsm
Copy link
Contributor Author

matthdsm commented Apr 1, 2022

I'm trying to run the tests locally, but keep running into errors with both docker and conda.
AFAIK everything should be ok, but I suppose I'm still missing something.

Some help would be much appreciated
M

@edmundmiller
Copy link
Contributor

I ran it with manually with conda and got the error:
PROFILE=conda nextflow run tests/modules/biobambam/bamsormadup -entry test_biobambam_bamsormadup -c tests/config/nextflow.config

Command error:
  libmaus2::bambam::parallel::GenericInputControlReadWorkPackageDispatcher: Invalid empty stream is invalid bgzf/BAM

  /usr/local/bin/../lib/./libmaus2_stacktrace.so.2(libmaus2::stacktrace::StackTrace::StackTrace()+0x67)[0x7f4181dcce37]
  /usr/local/bin/../lib/libmaus2_exception.so.2(libmaus2::exception::LibMausException::LibMausException()+0x2c)[0x7f418247ea7c]
  bamsormadup(+0x648a1)[0x555ad5f518a1]
  bamsormadup(+0x9a4a8)[0x555ad5f874a8]
  bamsormadup(+0x4a565)[0x555ad5f37565]                                                                      /usr/local/bin/../lib/libstdc++.so.6(+0xcc9d4)[0x7f41823579d4]
  /lib/x86_64-linux-gnu/libpthread.so.0(+0x7fa3)[0x7f41820a0fa3]
  /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f4181fd14cf]

  libmaus2::bambam::parallel::GenericInputControlReadWorkPackageDispatcher: Invalid empty stream is invalid bgzf/BAM

  /usr/local/bin/../lib/./libmaus2_stacktrace.so.2(libmaus2::stacktrace::StackTrace::StackTrace()+0x67)[0x7f4181dcce37]
  /usr/local/bin/../lib/libmaus2_exception.so.2(libmaus2::exception::LibMausException::LibMausException()+0x2c)[0x7f418247ea7c]
  bamsormadup(+0x648a1)[0x555ad5f518a1]
  bamsormadup(+0x9a4a8)[0x555ad5f874a8]
  bamsormadup(+0x4a565)[0x555ad5f37565]
  /usr/local/bin/../lib/libstdc++.so.6(+0xcc9d4)[0x7f41823579d4]
  /lib/x86_64-linux-gnu/libpthread.so.0(+0x7fa3)[0x7f41820a0fa3]
  /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f4181fd14cf]

Copy link
Member

@jfy133 jfy133 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall pretty good, just a few requests/suggestions

modules/biobambam/bamsormadup/main.nf Outdated Show resolved Hide resolved
tests/modules/biobambam/bamsormadup/test.yml Outdated Show resolved Hide resolved
@jfy133 jfy133 self-requested a review April 4, 2022 07:44
@matthdsm
Copy link
Contributor Author

matthdsm commented Apr 4, 2022

Awesome, thanks for the review. I'll get to it right away!

Copy link
Member

@jfy133 jfy133 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops something went funky in last review and missed a bunch of comments..

Also tests were failing previously

$args \\
I=$bam \\
O=${prefix}.bam \\
M=${prefix}.metrics.txt \\
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove metrics, file naming should be up to the pipeline developer 1

Footnotes

  1. https://nf-co.re/developers/modules#naming-conventions point 5

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how would you suggest differentiating the filename? Or do you mean remove M=xxx and moving it to $args?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just I mean just specify as {prefix}.txt, unless the tool can also produce other .txt files? Otherwise .txt should be enough to differentiate - and a pipeline developer can add metrics themselves if they want.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allright, thanks. This is resolved than.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you reverted back to metrics, is that required by the tool in the end?

- add (optional) reference input
- add bam index ouput
- add cram output option
- make metrics output: more general
@matthdsm
Copy link
Contributor Author

matthdsm commented Apr 4, 2022

Is there a common way to check the args for errors?
e.g. If the output is a cram file, and reference is not defined we should get an error.

Also, is there a way to set the output extension based on a (possibly undefined) arg (outputformat) in $args?

@jfy133
Copy link
Member

jfy133 commented Apr 4, 2022

Is there a common way to check the args for errors? e.g. If the output is a cram file, and reference is not defined we should get an error.

Also, is there a way to set the output extension based on a (possibly undefined) arg (outputformat) in $args?

It's generally OK to add some checks in the script block with the other process variable definitions. You can see an example of an error check at the bottom of point 5 here: https://nf-co.re/developers/modules#naming-conventions.

and like wise for the variable extension, you are welcome to write a internal conditional variable to adust this

@matthdsm
Copy link
Contributor Author

matthdsm commented Apr 4, 2022

Most comments are fixed! Now we just need to get the tests to work...

@jfy133
Copy link
Member

jfy133 commented Apr 4, 2022

For the failng tests, have you tried running locally?

https://nf-co.re/developers/modules#running-tests-manually

@matthdsm
Copy link
Contributor Author

matthdsm commented Apr 4, 2022

Yeah, but I keep having some issues with the versions.yml file it seems:

 NF_CORE_MODULES_TEST=1 TMPDIR=~ PROFILE=docker pytest --tag biobambam/bamsormadup --symlink --keep-workflow-wd
============================================================= test session starts ==============================================================
platform darwin -- Python 3.10.4, pytest-7.1.1, pluggy-0.12.0
rootdir: /Users/matdsmet/Projects/nf-core-modules, configfile: pytest.ini
plugins: datafiles-2.0, workflow-1.6.0, cov-3.0.0
collecting ... 
collected 732 items                                                                                                                            

biobambam/bamsormadup test_biobambam_bamsormadup:
        command:   nextflow run tests/modules/biobambam/bamsormadup -entry test_biobambam_bamsormadup -c tests/config/nextflow.config
        directory: /Users/matdsmet/pytest_workflow_gqwmex1a/biobambam/bamsormadup_test_biobambam_bamsormadup
        stdout:    /Users/matdsmet/pytest_workflow_gqwmex1a/biobambam/bamsormadup_test_biobambam_bamsormadup/log.out
        stderr:    /Users/matdsmet/pytest_workflow_gqwmex1a/biobambam/bamsormadup_test_biobambam_bamsormadup/log.err
'biobambam/bamsormadup test_biobambam_bamsormadup' done.

tests/test_versions_yml.py sssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss [ 14%]
sssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssFssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss [ 33%]
ssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss [ 52%]
ssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss [ 70%]
ssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss [ 89%]
sssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss                                                              [ 99%]
tests/modules/biobambam/bamsormadup/test.yml FFFF                                                                                        [100%] Keeping temporary directories and logs. Use '--kwd' or '--keep-workflow-wd' to disable this behaviour.


=================================================================== FAILURES ===================================================================
_______________________________ test_ensure_valid_version_yml[biobambam/bamsormadup test_biobambam_bamsormadup] ________________________________

workflow_dir = PosixPath('/Users/matdsmet/pytest_workflow_gqwmex1a/biobambam/bamsormadup_test_biobambam_bamsormadup')

    @pytest.mark.workflow(*_get_workflow_names())
    def test_ensure_valid_version_yml(workflow_dir):
        workflow_dir = Path(workflow_dir)
        software_name = workflow_dir.name.split("_")[0].lower()
        try:
            versions_yml_file = workflow_dir / f"output/{software_name}/versions.yml"
>           versions_yml = versions_yml_file.read_text()

tests/test_versions_yml.py:31: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = PosixPath('/Users/matdsmet/pytest_workflow_gqwmex1a/biobambam/bamsormadup_test_biobambam_bamsormadup/output/bamsormadup/versions.yml')
encoding = 'locale', errors = None

    def read_text(self, encoding=None, errors=None):
        """
        Open the file in text mode, read it, and close the file.
        """
        encoding = io.text_encoding(encoding)
>       with self.open(mode='r', encoding=encoding, errors=errors) as f:

../../miniconda3/lib/python3.10/pathlib.py:1132: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = PosixPath('/Users/matdsmet/pytest_workflow_gqwmex1a/biobambam/bamsormadup_test_biobambam_bamsormadup/output/bamsormadup/versions.yml')
mode = 'r', buffering = -1, encoding = 'locale', errors = None, newline = None

    def open(self, mode='r', buffering=-1, encoding=None,
             errors=None, newline=None):
        """
        Open the file pointed by this path and return a file object, as
        the built-in open() function does.
        """
        if "b" not in mode:
            encoding = io.text_encoding(encoding)
>       return self._accessor.open(self, mode, buffering, encoding, errors,
                                   newline)
E       FileNotFoundError: [Errno 2] No such file or directory: '/Users/matdsmet/pytest_workflow_gqwmex1a/biobambam/bamsormadup_test_biobambam_bamsormadup/output/bamsormadup/versions.yml'

../../miniconda3/lib/python3.10/pathlib.py:1117: FileNotFoundError

During handling of the above exception, another exception occurred:

workflow_dir = PosixPath('/Users/matdsmet/pytest_workflow_gqwmex1a/biobambam/bamsormadup_test_biobambam_bamsormadup')

    @pytest.mark.workflow(*_get_workflow_names())
    def test_ensure_valid_version_yml(workflow_dir):
        workflow_dir = Path(workflow_dir)
        software_name = workflow_dir.name.split("_")[0].lower()
        try:
            versions_yml_file = workflow_dir / f"output/{software_name}/versions.yml"
            versions_yml = versions_yml_file.read_text()
        except FileNotFoundError:
>           raise AssertionError(
                dedent(
                    f"""\
                    `versions.yml` not found in the output directory.
                    Expected path: `{versions_yml_file}`
    
                    This can have multiple reasons:
                    * The test-workflow failed before a `versions.yml` could be generated.
                    * The workflow name in `test.yml` does not start with the tool name.
                    """
                )
            )
E           AssertionError: `versions.yml` not found in the output directory.
E           Expected path: `/Users/matdsmet/pytest_workflow_gqwmex1a/biobambam/bamsormadup_test_biobambam_bamsormadup/output/bamsormadup/versions.yml`
E           
E           This can have multiple reasons:
E           * The test-workflow failed before a `versions.yml` could be generated.
E           * The workflow name in `test.yml` does not start with the tool name.

tests/test_versions_yml.py:33: AssertionError
_________________________________________________________________ test session _________________________________________________________________
'/Users/matdsmet/pytest_workflow_gqwmex1a/biobambam/bamsormadup_test_biobambam_bamsormadup/output/biobambam/test.bam' does not exist while it should
_________________________________________________________________ test session _________________________________________________________________
'/Users/matdsmet/pytest_workflow_gqwmex1a/biobambam/bamsormadup_test_biobambam_bamsormadup/output/biobambam/test.metrics.txt' does not exist while it should
_________________________________________________________________ test session _________________________________________________________________
'/Users/matdsmet/pytest_workflow_gqwmex1a/biobambam/bamsormadup_test_biobambam_bamsormadup/output/biobambam/versions.yml' does not exist while it should
_________________________________________________________________ test session _________________________________________________________________
'biobambam/bamsormadup test_biobambam_bamsormadup' exited with exit code '1' instead of '0'.
=============================================================== warnings summary ===============================================================
../../miniconda3/lib/python3.10/site-packages/pytest_workflow/plugin.py:391
  /Users/matdsmet/miniconda3/lib/python3.10/site-packages/pytest_workflow/plugin.py:391: UserWarning: .git dir detected: /Users/matdsmet/Projects/nf-core-modules/.git. pytest-workflow will copy the entire .git directory and all files ignored by git. It is recommended to use the --git-aware option.
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================================================== short test summary info ============================================================
FAILED tests/test_versions_yml.py::test_ensure_valid_version_yml[biobambam/bamsormadup test_biobambam_bamsormadup] - AssertionError: `version...
FAILED tests/modules/biobambam/bamsormadup/test.yml::biobambam/bamsormadup test_biobambam_bamsormadup::output/biobambam/test.bam::should exist
FAILED tests/modules/biobambam/bamsormadup/test.yml::biobambam/bamsormadup test_biobambam_bamsormadup::output/biobambam/test.metrics.txt::should exist
FAILED tests/modules/biobambam/bamsormadup/test.yml::biobambam/bamsormadup test_biobambam_bamsormadup::output/biobambam/versions.yml::should exist
FAILED tests/modules/biobambam/bamsormadup/test.yml::biobambam/bamsormadup test_biobambam_bamsormadup::exit code should be 0
================================================== 5 failed, 727 skipped, 1 warning in 11.81s ==================================================

@jfy133
Copy link
Member

jfy133 commented Apr 4, 2022

Yeah, but I keep having some issues with the versions.yml file it seems:

This is an unfortunate misdirection , you need to check the actual working directory or the .nexflow.log of the test to see what the error is (I hate python traceback sometimes ;) ) . I suspect the process is failing too early so it doesn't generate the versions.yml and but for whatever reason that specific check is the one that makes the largest error...

@matthdsm
Copy link
Contributor Author

matthdsm commented Apr 4, 2022

Ok, looks like a syntax error, altough I'm not really sure what's wrong...

N E X T F L O W  ~  version 21.10.6
Launching `tests/modules/biobambam/bamsormadup/main.nf` [berserk_meucci] - revision: 1a1ee0d1f7
Module compilation error
- file : /Users/matdsmet/pytest_workflow_pfgis0rq/biobambam/bamsormadup_test_biobambam_bamsormadup/tests/modules/biobambam/bamsormadup/../../../../modules/biobambam/bamsormadup/main.nf
- cause: Unexpected input: '{' @ line 1, column 31.
   process BIOBAMBAM_BAMSORMADUP {
                                 ^

1 error

@jfy133
Copy link
Member

jfy133 commented Apr 4, 2022

The other option is to pick other BAM files from the nf-core test/datasets and see if the ywork! There are quite a few now.

@matthdsm
Copy link
Contributor Author

matthdsm commented Apr 5, 2022

I've tried using params.test_data['homo_sapiens']['illumina']['test_paired_end_sorted_bam'], but still no luck. samtools view of the test data seems to yield a valid result, so it must be something inside the docker image

@matthdsm
Copy link
Contributor Author

matthdsm commented Apr 5, 2022

The issue must be somewhere in the module, because when I update biobambam/bammarkduplicates to the latest version (so same docker image), the tests do work.

@matthdsm
Copy link
Contributor Author

matthdsm commented Apr 5, 2022

Downgrading the version of biobambam/bamsormadup doesn't help either

@matthdsm matthdsm added the help wanted Extra attention is needed label Apr 5, 2022
@jfy133
Copy link
Member

jfy133 commented Apr 5, 2022

@matthdsm if you go into the work directory of the process (after a failing test), and try manually running bash .command.sh does it work for you? Can be on any of docker/singularity/conda

@matthdsm
Copy link
Contributor Author

matthdsm commented Apr 5, 2022

I can't run any of the commands locally, the conda recipe isn't built for ARM, so not available on my computer

@jfy133
Copy link
Member

jfy133 commented Apr 5, 2022

Ah I see... will need to get more input else where, I thikn it best to post the problem on slack and get some other people to try too... I still can't get it to run. Just hangs...

@matthdsm
Copy link
Contributor Author

matthdsm commented Apr 5, 2022

Turns out I'm actually an idiot and got the command line all wrong 🤦🏻 fixing right now

@matthdsm
Copy link
Contributor Author

matthdsm commented Apr 5, 2022

So I added the bamcat command in front of bamsormadup, since bamsormadup takes it's input from stdin.
bamcat is part of the biobambam suite, so actually the same tool, which is why I could do this without issue.

@matthdsm
Copy link
Contributor Author

matthdsm commented Apr 5, 2022

bamcat can take multiple inputs, making this chain also suitable to merge several bam files while simultaneously sorting/markdup.

I'm still new to nextflow. How would I define multiple bam inputs? I imagine would then be able to do something like "-I ".join([bam1,bam2,...])

EDIT: nvm, figured it out! I think this is good to go!

@matthdsm matthdsm added Ready for Review and removed help wanted Extra attention is needed labels Apr 5, 2022
@matthdsm matthdsm requested a review from jfy133 April 5, 2022 10:18
Copy link
Member

@jfy133 jfy133 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of comments but looking pretty ready for me now!

modules/biobambam/bamsormadup/meta.yml Outdated Show resolved Hide resolved
modules/biobambam/bamsormadup/meta.yml Outdated Show resolved Hide resolved
$args \\
I=$bam \\
O=${prefix}.bam \\
M=${prefix}.metrics.txt \\
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you reverted back to metrics, is that required by the tool in the end?

tests/modules/biobambam/bamsormadup/test.yml Outdated Show resolved Hide resolved
@matthdsm
Copy link
Contributor Author

matthdsm commented Apr 6, 2022

I see you reverted back to metrics, is that required by the tool in the end?

Not required, but I noticed some other tools used for marking duplicates (picard, bammarkduplicates2) use the same convention, so I decided to stick with it for the sake of continuity

@matthdsm
Copy link
Contributor Author

matthdsm commented Apr 6, 2022

Feel free to merge if everything looks ok to you!

@jfy133
Copy link
Member

jfy133 commented Apr 6, 2022

I see you reverted back to metrics, is that required by the tool in the end?

Not required, but I noticed some other tools used for marking duplicates (picard, bammarkduplicates2) use the same convention, so I decided to stick with it for the sake of continuity

Oh really? they were early tools so maybe that was before the guidlines were confirmed...

Hm ok. Fair enough then!

@jfy133 jfy133 merged commit dc95e67 into master Apr 6, 2022
@matthdsm matthdsm deleted the tool/bamsormadup branch April 6, 2022 06:19
@matthdsm
Copy link
Contributor Author

matthdsm commented Apr 6, 2022

Thanks for the help and review. Lots more to come!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

new module: biobambam/bamsormadup
3 participants