MultiQC report: Issues with FastQC #1308

Open: wants to merge 17 commits into base: dev

@MatthiasZepper
Member

MatthiasZepper commented May 29, 2024

This draft PR comprises my current progress towards fixing issue #1303.

It modifies the publishDir directives in the FastQC module config so that the reports are consistently published in ${params.outdir}/fastqc/raw and ${params.outdir}/fastqc/trim regardless of the chosen trimmer (TrimGalore!, Fastp), and adapts the pipeline's custom MultiQC config accordingly.

This is, however, not sufficient to fix the issue, because recent versions of MultiQC have a bug that prevents running the same module twice. There are still separate entries and columns in the General Statistics table, but the modules are not shown in the report and navigation bar:

| MultiQC 1.23dev, 1.22, 1.21 | MultiQC 1.18 |
| --- | --- |
| [Screenshot: sidebar/navbar in MultiQC 1.23dev] | [Screenshot: sidebar/navbar in MultiQC 1.18] |

For both screenshots, I ran MultiQC on the output directory of a test profile run of this pipeline using the custom config in workflows/rnaseq/assets/multiqc/multiqc_config.yml.

It should be stressed that the FastQC module itself works in recent versions: if the custom config is omitted, the module is shown. Forcing the module to run twice via a custom config, however, seemingly breaks it. Only in the General Statistics table does it still work like a charm. Thus, the reports are parsed, but the module output is not displayed in the report.

MultiQC 1.23 General Statistics Table

Further issues

In the course of troubleshooting this issue, I discovered more issues that need to be tackled. Help would be greatly appreciated with those:

Inconsistent naming of FastQC output:

For FastP, the file names are retained before and after trimming:

fastqc
├── raw
│   ├── RAP1_IAA_30M_REP1_1_fastqc.html
│   ├── RAP1_IAA_30M_REP1_1_fastqc.zip
│   ├── RAP1_IAA_30M_REP1_2_fastqc.html
│   ├── RAP1_IAA_30M_REP1_2_fastqc.zip
│   ├── RAP1_UNINDUCED_REP1_fastqc.html
│   ├── RAP1_UNINDUCED_REP1_fastqc.zip
│   ├── RAP1_UNINDUCED_REP2_fastqc.html
│   ├── RAP1_UNINDUCED_REP2_fastqc.zip
│   ├── WT_REP1_1_fastqc.html
│   ├── WT_REP1_1_fastqc.zip
│   ├── WT_REP1_2_fastqc.html
│   ├── WT_REP1_2_fastqc.zip
│   ├── WT_REP2_1_fastqc.html
│   ├── WT_REP2_1_fastqc.zip
│   ├── WT_REP2_2_fastqc.html
│   └── WT_REP2_2_fastqc.zip
└── trim
    ├── RAP1_IAA_30M_REP1_1_fastqc.html
    ├── RAP1_IAA_30M_REP1_1_fastqc.zip
    ├── RAP1_IAA_30M_REP1_2_fastqc.html
    ├── RAP1_IAA_30M_REP1_2_fastqc.zip
    ├── RAP1_UNINDUCED_REP1_fastqc.html
    ├── RAP1_UNINDUCED_REP1_fastqc.zip
    ├── RAP1_UNINDUCED_REP2_fastqc.html
    ├── RAP1_UNINDUCED_REP2_fastqc.zip
    ├── WT_REP1_1_fastqc.html
    ├── WT_REP1_1_fastqc.zip
    ├── WT_REP1_2_fastqc.html
    ├── WT_REP1_2_fastqc.zip
    ├── WT_REP2_1_fastqc.html
    ├── WT_REP2_1_fastqc.zip
    ├── WT_REP2_2_fastqc.html
    └── WT_REP2_2_fastqc.zip

For TrimGalore!, the RAP1_UNINDUCED samples are renamed with a _trimmed suffix and the others receive _val_1 and _val_2 suffixes.

fastqc
├── raw
│   ├── RAP1_IAA_30M_REP1_1_fastqc.html
│   ├── RAP1_IAA_30M_REP1_1_fastqc.zip
│   ├── RAP1_IAA_30M_REP1_2_fastqc.html
│   ├── RAP1_IAA_30M_REP1_2_fastqc.zip
│   ├── RAP1_UNINDUCED_REP1_fastqc.html
│   ├── RAP1_UNINDUCED_REP1_fastqc.zip
│   ├── RAP1_UNINDUCED_REP2_fastqc.html
│   ├── RAP1_UNINDUCED_REP2_fastqc.zip
│   ├── WT_REP1_1_fastqc.html
│   ├── WT_REP1_1_fastqc.zip
│   ├── WT_REP1_2_fastqc.html
│   ├── WT_REP1_2_fastqc.zip
│   ├── WT_REP2_1_fastqc.html
│   ├── WT_REP2_1_fastqc.zip
│   ├── WT_REP2_2_fastqc.html
│   └── WT_REP2_2_fastqc.zip
└── trim
    ├── RAP1_IAA_30M_REP1_1_val_1_fastqc.html
    ├── RAP1_IAA_30M_REP1_1_val_1_fastqc.zip
    ├── RAP1_IAA_30M_REP1_2_val_2_fastqc.html
    ├── RAP1_IAA_30M_REP1_2_val_2_fastqc.zip
    ├── RAP1_UNINDUCED_REP1_trimmed_fastqc.html
    ├── RAP1_UNINDUCED_REP1_trimmed_fastqc.zip
    ├── RAP1_UNINDUCED_REP2_trimmed_fastqc.html
    ├── RAP1_UNINDUCED_REP2_trimmed_fastqc.zip
    ├── WT_REP1_1_val_1_fastqc.html
    ├── WT_REP1_1_val_1_fastqc.zip
    ├── WT_REP1_2_val_2_fastqc.html
    ├── WT_REP1_2_val_2_fastqc.zip
    ├── WT_REP2_1_val_1_fastqc.html
    ├── WT_REP2_1_val_1_fastqc.zip
    ├── WT_REP2_2_val_2_fastqc.html
    └── WT_REP2_2_val_2_fastqc.zip

Unfortunately, I have no idea why. I have quadruple-checked the publishDir directives and can't explain it. Help and inspiration needed!
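
One observation from the two listings: the samples that end up with a _trimmed suffix are exactly those with a single report per sample, while the _val_1 / _val_2 samples are the ones with _1 / _2 read pairs. That would be consistent with TrimGalore! itself naming single-end and paired-end outputs differently, independent of any publishDir directive. If consistent names are wanted downstream, the suffixes could be stripped after the fact. The following Python sketch only illustrates the idea; the helper `normalise_report_name` is hypothetical and not part of the pipeline or of MultiQC.

```python
import re

# Suffixes seen in the TrimGalore! listing above, directly before "_fastqc".
# The lookahead keeps the "_fastqc.html" / "_fastqc.zip" ending untouched.
_TRIM_SUFFIX = re.compile(r"(_val_[12]|_trimmed)(?=_fastqc\.(?:html|zip)$)")


def normalise_report_name(filename):
    """Drop the TrimGalore!-specific suffix from a FastQC report file name.

    Hypothetical helper for illustration only.
    """
    return _TRIM_SUFFIX.sub("", filename)


if __name__ == "__main__":
    for name in (
        "RAP1_IAA_30M_REP1_1_val_1_fastqc.html",
        "RAP1_UNINDUCED_REP1_trimmed_fastqc.zip",
        "WT_REP1_1_fastqc.zip",  # already in the fastp-style form
    ):
        print(name, "->", normalise_report_name(name))
```

With this, the trimmed report names from the TrimGalore! run would line up with those from the Fastp run in the first listing.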

Duplicate column is actually shown in the General Statistics table (FIXED!)

According to the config, the duplicate column from FastQC should be hidden in the General Statistics table. However, it is shown. This might be another MultiQC bug, or I may simply have overlooked something.

# Don't show % Dups in the General Stats table (we have this from Picard)
table_columns_visible:
  fastqc:
    percent_duplicates: False

umi-tools dedup stats not shown (Fixed)

According to our current master / dev branch config, the umi_tools module is not run. Seeing this, I believed that would be an easy fix for #1277 and added the module in the config. However, no reports are shown. Either the module is broken or the deduplication stats are not channelled to MultiQC. Either way, there is no quick solution in sight here.

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/rnaseq branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

github-actions bot commented May 29, 2024

nf-core lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 1931c05

+| ✅ 173 tests passed       |+
#| ❔   9 tests were ignored |#
!| ❗   7 tests had warnings |!

❗ Test warnings:

  • files_exist - File not found: assets/multiqc_config.yml
  • files_exist - File not found: .github/workflows/awstest.yml
  • files_exist - File not found: .github/workflows/awsfulltest.yml
  • pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
  • pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
  • pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!
  • pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline

❔ Tests ignored:

✅ Tests passed:

Run details

  • nf-core/tools version 2.14.1
  • Run at 2024-07-10 12:23:46

@MatthiasZepper
Member Author

Some progress:

  • Vlad Savelyev speedily fixed the "Multi Module Multi QC issue" for us, and MultiQC release 1.22.2 was pushed just for us. Patching the MultiQC module to the latest version, in conjunction with my proposed changes to the publishDir directives, thus fixes #1303 (MultiQC report is missing fastQC results on the dev branch). One issue is down, three to go.
  • After some testing, I finally understood that the YAML config in table_columns_visible expects the actual module names (as set for the report in the config) and not the original module name used by MultiQC. Thus, I could now successfully suppress the display of the unwanted column. Two issues are down, two to go.

@drpatelh
Member

Thanks @MatthiasZepper !!

Two issues are down, two to go.

I read through your write-up but was a little unclear as to what is still missing here?

@pinin4fjords
Member

To copy in @MatthiasZepper's note on this from Slack:

I am somewhat stuck with #1308, both because of a lack of time recently and also a lack of ideas. I believed that I had fixed 3 of the 4 issues, with the 4th, the inconsistent naming of the TrimGalore! output, being somewhat negligible.

However, it turns out that I did not fix the main issue yet. The reports generated by MultiQC when run inside the pipeline and manually on the outdir of the pipeline differ. The manual runs look exactly how I want them, so I thought it should be good, but the pipeline version does not behave the same way.

In the pipeline version, the path_filters in the MultiQC config (workflows/rnaseq/assets/multiqc/multiqc_config.yml) are not applied:

module_order:
  - fastqc:
      name: "FastQC (raw)"
      anchor: "fastqc_raw"
      info: "This section of the report shows FastQC results before adapter trimming."
      path_filters:
        - "**/raw/*.zip"
  - cutadapt
  - fastp
  - fastqc:
      name: "FastQC (trimmed)"
      anchor: "fastqc_trimmed"
      info: "This section of the report shows FastQC results after adapter trimming."
      path_filters:
        - "**/trim/*.zip"

I think that is because the file paths in ch_multiqc_files still point to the work dir and do not yet correspond to the final folder structure specified by the publishDir directives when I mix the output into the channel…

ch_multiqc_files = ch_multiqc_files.mix(FASTQ_FASTQC_UMITOOLS_FASTP.out.fastqc_raw_zip.collect{it[1]})

… but since I can’t do a proper introspection into the channel (a .view() or .collectFile() completely crashes the pipeline), I don’t know for sure.
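
The hypothesis above can be checked outside the pipeline. Below is a minimal Python sketch, assuming that MultiQC evaluates path_filters with fnmatch-style globbing against the path of each found file, and that files staged into the MultiQC task's work directory lose the parent folders created by publishDir. The example paths are hypothetical.

```python
from fnmatch import fnmatch

# Patterns taken from the module_order config above.
RAW_FILTER = "**/raw/*.zip"
TRIM_FILTER = "**/trim/*.zip"

# Hypothetical paths: what a manual run on the published outdir would see ...
published = [
    "results/fastqc/raw/WT_REP1_1_fastqc.zip",
    "results/fastqc/trim/WT_REP1_1_fastqc.zip",
]
# ... versus what a flat work directory would contain inside the pipeline.
staged = [
    "./WT_REP1_1_fastqc.zip",
]


def matching_filters(path):
    """Return the names of the filters that accept this path."""
    filters = {"raw": RAW_FILTER, "trim": TRIM_FILTER}
    return [name for name, pattern in filters.items() if fnmatch(path, pattern)]


if __name__ == "__main__":
    for path in published + staged:
        print(path, "->", matching_filters(path))
```

Under these assumptions, the published paths each match exactly one filter, while the flat staged path matches neither, which would explain why both FastQC sections come up empty inside the pipeline but look fine in a manual run.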

@pinin4fjords
Member

OK, I know the fix @MatthiasZepper, I sorted this in riboseq. The issue is that the file structure is flat by the time it gets to MultiQC.

We need to do something like:

    if (params.trimmer == 'trimgalore') {
        process {
            withName: '.*:FASTQ_FASTQC_UMITOOLS_TRIMGALORE:FASTQC' {
                ext.prefix = { "${meta.id}_raw" }
                ext.args   = '--quiet'
                publishDir = [
                    path: { "${params.outdir}/preprocessing/fastqc" },
                    mode: params.publish_dir_mode,
                    saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
                ]
            }
        }
    }

... and then:

module_order:
  - fastqc:
      name: "FastQC (raw)"
      info: "This section of the report shows FastQC results before adapter trimming."
      path_filters:
        - "*_raw_fastqc.zip"

So we're using the prefix sent to FASTQC to mark the outputs appropriately. I'll push a commit to your branch if I can, but this is the way to solve it.
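
To illustrate why a filename-based filter survives the flat staging, here is a small Python sketch, again assuming fnmatch-style matching of path_filters. The staged file names are hypothetical; in particular, the exact name FastQC produces for a given ext.prefix is assumed here and not taken from a real run.

```python
from fnmatch import fnmatch

RAW_FILTER = "*_raw_fastqc.zip"  # pattern from the config above

# Hypothetical flat listing, as MultiQC might see it in its work directory.
staged = [
    "./RAP1_UNINDUCED_REP1_raw_fastqc.zip",      # assumed name with ext.prefix "<id>_raw"
    "./RAP1_UNINDUCED_REP1_trimmed_fastqc.zip",  # post-trim report name from the listing above
]


def split_by_filter(paths, pattern):
    """Partition paths into those matching the pattern and the rest."""
    matched = [p for p in paths if fnmatch(p, pattern)]
    rest = [p for p in paths if not fnmatch(p, pattern)]
    return matched, rest


raw_reports, other_reports = split_by_filter(staged, RAW_FILTER)

if __name__ == "__main__":
    print("raw:  ", raw_reports)
    print("other:", other_reports)
```

Because the marker is part of the file name rather than of a parent directory, it is still present after staging, so the raw reports can be told apart without relying on the publishDir layout.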

@MatthiasZepper
Member Author

MatthiasZepper commented Jun 20, 2024

OK, I know the fix @MatthiasZepper, I sorted this in riboseq. The issue is that the file structure is flat by the time it gets to MultiQC.

So we're using the prefix sent to FASTQC to mark the outputs appropriately. I'll push a commit to your branch if I can, but this is the way to solve it.

Thank you so much! That would be fantastic! You should be able to push to the branch since you are a maintainer, but just in case, I have also invited you as a collaborator to my fork!

@pinin4fjords
Member

@MatthiasZepper OK, committed! Had a quick check and I think this works, though I note that the trimgalore subworkflow doesn't do a post-trim FastQC, which we might want to address at some point....

Anyway, I'll let you take it home from here :-)

@MatthiasZepper
Member Author

Thank you so much! I will try my best to finish this quickly now!

though I note that the trimgalore subworkflow doesn't do a post-trim FastQC, which we might want to address at some point....

Oh, it does. It is just confusing, because TrimGalore! in itself is a wrapper script around cutadapt and FastQC. So FastQC is not run as a Nextflow process but by the TrimGalore Perl script.

@pinin4fjords
Member

Ahh right, thought I was forgetting something ;-). So there is probably a missing bit to get those outputs prefixed correctly, but you know what to do.

@pinin4fjords
Member

@MatthiasZepper in case it's impacting your work, we've noticed that the latest MultiQC has generated some issues in the workflow. We're looking into it.

@MatthiasZepper MatthiasZepper force-pushed the MultiQC_FastQC_bug branch 3 times, most recently from c005701 to 3ad2adf Compare July 2, 2024 16:55
@MatthiasZepper MatthiasZepper marked this pull request as ready for review July 3, 2024 13:06
@MatthiasZepper
Member Author

I think/hope/wish I am done with this PR. It now fixes 3 out of the 4 issues that were spotted, with the TrimGalore! renaming being the one left. However, I perceive this as a minor issue and think that it could be tackled sometime later if needed.

@pinin4fjords
Member

Great, thanks @MatthiasZepper ! Just to be clear, you don't need an updated MultiQC?

@MatthiasZepper
Member Author

MatthiasZepper commented Jul 4, 2024

Great, thanks @MatthiasZepper ! Just to be clear, you don't need an updated MultiQC?

It did need changes to MultiQC, since the previous version was not working. However, the critical bug was fixed with 1.22.2 and my updates to the umi-tools module were already contained within 1.22.3.

Therefore, with this PR, we should now see (re)introduced:

  • MultiQC report has a FastQC (raw) and a FastQC (trimmed) section again, closes #1303 (MultiQC report is missing fastQC results on the dev branch).
  • MultiQC report now features a umi-tools extract statistics section. While not very helpful for the basic extraction, it will be quite useful for the regex mode of umi-tools.
  • The FastQC duplicate estimate is hidden from the General Statistics table (since the umi-tools / picard duplicate estimates are computed on the aligned reads and are thus more accurate).
  • MultiQC report now correctly picks up and displays the umi-tools dedup statistics. This should close, or at least represent significant progress towards a solution of, #1277 (Improve/add UMI deduplication metrics).
  • Account for some MultiQC config changes, e.g. reverseColors is now reverse_colors in the custom content config. Since the custom content is, however, not displayed, it is hard to fully test this. It at least addresses all the warning messages about deprecated config that were displayed before.
  • I have sneaked in instructions for processing the Watchmaker UMIs with the pipeline. This is unrelated to the purpose of this PR, but since it was a tiny update, it felt excessive to make a separate PR for it.

@@ -124,19 +124,13 @@ line="#id: dupradar
# - color: 'green'
# dash: 'LongDash'
# label:
# style: {color: 'green'}
Member

Since this is an nf-core module we should really be fixing this in nf-core modules, or via the patch currently on dev (which I just noticed has some config bits I should probably remove)

Member Author

@MatthiasZepper MatthiasZepper Jul 4, 2024

Fair enough. I did admittedly not consider whether that was a local file or is provided with a module. However, I noticed that @drpatelh also changed that file locally for the pipeline. So I think we would redo that for the module once we tested everything with a new MultiQC version?

Member

Yep, made a patch for that one

Member Author

Ok, I have now undone all changes and reverted to your file version from when you turned it into a module.

Member Author

And here is the corresponding draft PR to the modules repo. I think we should keep it as a draft for now, until we can properly test with the new MultiQC version.

MatthiasZepper and others added 3 commits July 9, 2024 18:17
…rshil Patel and me, since they should be fixed upstream in the modules directory. Reset state of file to cc1bf2c.
@pinin4fjords
Member

Hope you don't mind @MatthiasZepper - just illustrating in those last couple of commits what I meant. So use the module in its updated form, but also have a patch to help with updates.

I also removed something I added to the patch earlier and which shouldn't have been there, and bumped the module (think it was just Maxime mucking about with stubs)

Successfully merging this pull request may close these issues.

MultiQC report is missing fastQC results on the dev branch
3 participants