effective genome sizes in docs #1285

Merged: 7 commits, Jan 10, 2024
6 changes: 3 additions & 3 deletions .github/workflows/test.yml
@@ -63,9 +63,9 @@ jobs:
micromamba activate test_and_build
rm -f dist/*
python -m build
- uses: actions/upload-artifact@master
- uses: actions/upload-artifact@v3
with:
name: "Dist files"
name: "distfiles"
path: "dist"
test-wheels:
name: test wheel
@@ -78,7 +78,7 @@ jobs:
- uses: actions/checkout@v3
- uses: actions/download-artifact@v3
with:
name: "Dist files"
name: "distfiles"
path: ~/dist/
- uses: actions/setup-python@v4
with:
2 changes: 2 additions & 0 deletions CHANGES.txt
@@ -3,6 +3,8 @@
* doc fixes (argparse properly displayed, minor changes in installation instructions)
* deepblue support stops
* initiate deprecation of tight_layout in plotheatmap, in favor of constrained_layout. Minor changes in paddings, etc can occur (but for the better).
* documentation changes to improve ESS tab, table constraints have been lifted & sphinx_rtd_theme to v2.0.0
* upload artifact in gh test runner pinned to 3

3.5.4
* error handling and cases for bwAverage with >2 samples
7 changes: 0 additions & 7 deletions docs/_static/fix_tables.css

This file was deleted.

80 changes: 53 additions & 27 deletions docs/content/feature/effectiveGenomeSize.rst
@@ -6,30 +6,56 @@ A number of tools can accept an "effective genome size". This is defined as the
1. The number of non-N bases in the genome.
2. The number of regions (of some size) in the genome that are uniquely mappable (possibly given some maximal edit distance).

Option 1 can be computed using ``faCount`` from `Kent's tools <https://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/>`__. The effective genome size for a number of genomes using this method is given below:

======== ==============
Genome Effective size
======== ==============
GRCh37 2864785220
GRCh38 2913022398
GRCm37 2620345972
GRCm38 2652783500
dm3 162367812
dm6 142573017
GRCz10 1369631918
WBcel235 100286401
TAIR10 119481543
======== ==============

These values are only appropriate if multimapping reads are included. If they are excluded (or there's any MAPQ filter applied), then values derived from option 2 are more appropriate. These depend on the read length. We can approximate these values for various read lengths using the `khmer program <https://khmer.readthedocs.io/en/v2.1.1/>`__, and ``unique-kmers.py`` in particular. A table of effective genome sizes given a read length using this method is provided below:

=========== ========== ========== ========== ========== ========= ========= ========== ========
Read length GRCh37 GRCh38 GRCm37 GRCm38 dm3 dm6 GRCz10 WBcel235
=========== ========== ========== ========== ========== ========= ========= ========== ========
50 2685511504 2701495761 2304947926 2308125349 130428560 125464728 1195445591 95159452
75 2736124973 2747877777 2404646224 2407883318 135004462 127324632 1251132686 96945445
100 2776919808 2805636331 2462481010 2467481108 139647232 129789873 1280189044 98259998
150 2827437033 2862010578 2489384235 2494787188 144307808 129941135 1312207169 98721253
200 2855464000 2887553303 2513019276 2520869189 148524010 132509163 1321355241 98672758
=========== ========== ========== ========== ========== ========= ========= ========== ========
Option 1 can be computed using ``faCount`` from `Kent's tools <https://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/>`__.
The effective genome size for a number of genomes using this method is given below:


+---------------+------------------+
| Genome | Effective size |
+===============+==================+
|GRCh37 | 2864785220 |
+---------------+------------------+
|GRCh38 | 2913022398 |
+---------------+------------------+
|T2T/CHM13CAT_v2| 3117292070 |
+---------------+------------------+
|GRCm37 | 2620345972 |
+---------------+------------------+
|GRCm38 | 2652783500 |
+---------------+------------------+
|dm3 | 162367812 |
+---------------+------------------+
|dm6 | 142573017 |
+---------------+------------------+
|GRCz10 | 1369631918 |
+---------------+------------------+
|GRCz11 | 1368780147 |
+---------------+------------------+
|WBcel235 | 100286401 |
+---------------+------------------+
|TAIR10 | 119482012 |
+---------------+------------------+
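
For readers who want to sanity-check an option 1 value without ``faCount``, the following is a minimal sketch of the same idea: count every base that is not ``N``. It assumes the effective size is simply total sequence length minus the number of ``N``/``n`` characters; the function name ``count_non_n_bases`` and the toy FASTA are hypothetical and not part of deepTools or Kent's tools.

```python
import io


def count_non_n_bases(fasta_handle):
    """Return the number of bases that are not N/n across all FASTA records.

    Assumption: the "effective genome size" of option 1 equals the total
    sequence length minus the count of N (or n) characters.
    """
    total = 0
    for line in fasta_handle:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and FASTA header lines
        total += len(line) - line.count("N") - line.count("n")
    return total


# Toy input (hypothetical): two short records containing some N bases.
toy_fasta = io.StringIO(">chr1\nACGTNNNNAC\nGTnnAC\n>chr2\nNNNN\nACGT\n")
print(count_non_n_bases(toy_fasta))  # prints 14
```

For a real genome the same function can be given an open file handle, for example ``with open("genome.fa") as fh: count_non_n_bases(fh)``, where ``genome.fa`` is a placeholder path.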



These values are only appropriate if multimapping reads are included. If they are excluded (or there's any MAPQ filter applied),
then values derived from option 2 are more appropriate.
These depend on the read length.
We can approximate these values for various read lengths using the `khmer program <https://khmer.readthedocs.io/en/latest/>`__, and ``unique-kmers.py`` in particular.
A table of effective genome sizes given a read length using this method is provided below:

+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+
|Read length | GRCh37 | GRCh38 | T2T/CHM13CAT_v2 | GRCm37 | GRCm38 | dm3 | dm6 | GRCz10 | GRCz11 | WBcel235 | TAIR10 |
+=================+=================+=================+=================+=================+=================+=================+=================+=================+=================+=================+=================+
|50 | 2685511454 | 2701495711 | 2725240337 | 2304947876 | 2308125299 | 130428510 | 125464678 | 1195445541 | 1197575653 | 95159402 | 114339094 |
+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+
|75 | 2736124898 | 2747877702 | 2786136059 | 2404646149 | 2407883243 | 135004387 | 127324557 | 1251132611 | 1250812288 | 96945370 | 115317469 |
+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+
|100 | 2776919708 | 2805636231 | 2814334875 | 2462480910 | 2467481008 | 139647132 | 129789773 | 1280188944 | 1280354977 | 98259898 | 118459858 |
+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+
|150 | 2827436883 | 2862010428 | 2931551487 | 2489384085 | 2494787038 | 144307658 | 129940985 | 1312207019 | 1311832909 | 98721103 | 118504138 |
+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+
|200 | 2855463800 | 2887553103 | 2936403235 | 2513019076 | 2520868989 | 148523810 | 132508963 | 1321355041 | 1322366338 | 98672558 | 117723393 |
+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+
|250 | 2855044784 | 2898802627 | 2960856300 | 2528988583 | 2538590322 | 151901455 | 132900923 | 1339205109 | 1342093482 | 101271756 | 119585546 |
+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+-----------------+
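
The option 2 values above come from counting distinct k-mers, with k set to the read length. As a rough illustration of that idea only, here is a toy sketch that counts distinct k-mers exactly with a Python set. It is not khmer's implementation: ``unique-kmers.py`` produces an estimate and scales to whole genomes, whereas this version only suits tiny inputs. Skipping k-mers that contain ``N`` and not collapsing reverse complements are simplifying assumptions of the sketch, and the name ``count_distinct_kmers`` is hypothetical.

```python
def count_distinct_kmers(sequence, k):
    """Count distinct k-mers of length k in a sequence (exact, toy-scale).

    Assumptions of this sketch: k-mers containing N are ignored, and a
    k-mer and its reverse complement are counted separately.
    """
    seen = set()
    seq = sequence.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if "N" in kmer:
            continue  # an ambiguous base cannot be mapped uniquely
        seen.add(kmer)
    return len(seen)


# Toy input (hypothetical): a repeat of ACGT plus a short run of N.
toy_sequence = "ACGTACGTNNACGTTT"
print(count_distinct_kmers(toy_sequence, 4))  # prints 6
```

In the toy sequence, ``ACGT`` occurs three times but is counted once, which is the behaviour that makes repeat-rich genomes yield a smaller value under option 2 than under option 1.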
2 changes: 1 addition & 1 deletion docs/requirements.txt
@@ -1,4 +1,4 @@
sphinx==7.2.6
mock==5.1.0
sphinx_rtd_theme==1.3.0
sphinx_rtd_theme==2.0.0
sphinx-argparse==0.4.0
2 changes: 1 addition & 1 deletion docs/source/_templates/layout.html
@@ -1,3 +1,3 @@
{% extends "!layout.html" %}
{% set script_files = script_files + ["_static/welcome_owl.carousel.min.js"] %}
{% set css_files = css_files + ["_static/welcome_owl.carousel.css", "_static/welcome_owl.carousel.theme.css", "_static/fix_tables.css"] %}
{% set css_files = css_files + ["_static/welcome_owl.carousel.css", "_static/welcome_owl.carousel.theme.css"] %}