dgeapy: Differential Gene Expression Analysis in Python

dgeapy is a Python toolkit for analyzing RNA-seq data, focusing on differential gene expression and intersections among datasets.

Installation

Setting up the Environment

Create a dedicated environment for dgeapy using Conda:

conda create --name dgeapy python=3.10
conda activate dgeapy

Installing Dependencies

Install all necessary libraries with Conda:

conda install pandas numpy openpyxl matplotlib seaborn matplotlib-venn UpSetPlot

Dependencies and their Roles

pandas: Dataframe analysis and manipulation.
numpy: Numerical computing.
openpyxl: Reading and writing Excel xlsx files.
matplotlib: Data visualization.
seaborn: High-level statistical plotting built on matplotlib.
matplotlib-venn: Venn diagrams.
UpSetPlot: UpSet plots.
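
After installing, a quick way to confirm that the environment is complete is to check that each library can be located by Python. The snippet below is only a sketch and is not part of dgeapy; the function name missing_packages is hypothetical. Note that two packages are imported under names that differ from their install names (matplotlib_venn and upsetplot).

```python
import importlib.util

# Install name -> import name. The mapping mirrors the dependency list above.
REQUIRED_MODULES = {
    "pandas": "pandas",
    "numpy": "numpy",
    "openpyxl": "openpyxl",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "matplotlib-venn": "matplotlib_venn",
    "UpSetPlot": "upsetplot",
}


def missing_packages(modules=REQUIRED_MODULES):
    """Return the install names of packages whose module cannot be found."""
    return [
        package
        for package, module in modules.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All dependencies found.")
```

Run it inside the activated dgeapy environment; any name it reports can be installed with the conda command shown earlier.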

Usage

Available commands are:

dgeapy.py analyze
dgeapy.py intersect

python dgeapy/dgeapy.py -h

Differential Gene Expression Analysis in Python at different levels.

Usage: python dgeapy.py <COMMAND> [OPTIONS]

Commands:
    analyze           Perform differential gene expression analysis
    intersect         Find intersections between indexes of multiple files

Options:
    -h, --help        Show this help message and exit
    -V, --version     Show version number and exit

Examples:
    python dgeapy.py analyze -h
    python dgeapy.py analyze <mygenes.csv> -f 2.0 -p 0.01
    python dgeapy.py intersect -f <mutA.csv> -f <mutB.csv> -n "Mutant A" -n "Mutant B" -i gene_id

dgeapy analyze

Determine the differentially expressed genes from a dataframe.

$ python dgeapy/dgeapy.py analyze -h
usage: dgeapy.py analyze <TABLE> [OPTIONS]

Differential Gene Expression Analysis.
Generates tables with differentially expressed genes and plots to visualize the results.


positional arguments:
  <TABLE>                  Path to the gene expression data file (CSV, TSV, XLSX).

options:
  -h, --help               show this help message and exit
  -o, --output DIR         Specify the output directory (default: cwd).
  -p, --padj FLOAT         Adjusted p-value threshold for significance (default: 0.05).
  -f, --fold-change FLOAT  Fold change threshold for significance (default: 1.5).
  -F, --formats [STR]      Output formats for plots (e.g. svg, pdf) (default: ['png']).
  -e, --exclude [STR]      Exclude indexes matching specified patterns.
  -N, --nan-values [STR]   Strings to recognize as NaN (default: ['', '--']).
  -k, --keep-duplicated    Keep duplicated indexes (default: False).
  -I, --index-column STR   Column name for index (default: index).
  -L, --log2fc-column STR  Column name for log2 Fold Change (default: log2_fold_change).
  -P, --p-column STR       Column name for adjusted p-values (default: padj).

Script workflow summary:

Take a table in CSV, TSV, or XLSX format.
Verify and clean the data by checking for NaN values, duplicated values in the index, and excluding indexes with specific patterns using --exclude.
Use --index-column to set the row index, then add fold change and regulation columns.
Identify differentially expressed genes (DEGs) by applying thresholds for the adjusted p-value (--padj) and the absolute fold change (--fold-change).
Output three tables (DEGs, upregulated and downregulated) and two figures (a bar plot and a volcano plot).
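
The thresholding step in this workflow can be illustrated with a small plain-Python sketch. It is not dgeapy's own code, and the function name classify_gene is hypothetical. It rests on two assumptions that the help text does not spell out: the fold change threshold is compared against 2 raised to the absolute log2 fold change, and a gene passes when its adjusted p-value is strictly below --padj and its absolute fold change is at least --fold-change. The defaults mirror those listed in the help output.

```python
import math


def classify_gene(log2fc, padj, fc_threshold=1.5, padj_threshold=0.05):
    """Return 'up', 'down', or 'ns' (not significant) for a single gene."""
    # Genes with missing values cannot pass either threshold.
    if log2fc is None or padj is None:
        return "ns"
    if math.isnan(log2fc) or math.isnan(padj):
        return "ns"

    # Assumption: the threshold applies to the absolute fold change,
    # i.e. 2 ** |log2FC|, so up- and downregulation are treated alike.
    abs_fold_change = 2 ** abs(log2fc)
    if padj >= padj_threshold or abs_fold_change < fc_threshold:
        return "ns"

    return "up" if log2fc > 0 else "down"


# Made-up rows for illustration only; column names follow the CLI defaults.
rows = [
    {"index": "geneA", "log2_fold_change": 1.2, "padj": 0.001},
    {"index": "geneB", "log2_fold_change": -2.0, "padj": 0.01},
    {"index": "geneC", "log2_fold_change": 0.3, "padj": 0.001},
    {"index": "geneD", "log2_fold_change": 3.0, "padj": 0.2},
]

for row in rows:
    row["regulation"] = classify_gene(row["log2_fold_change"], row["padj"])
    print(row["index"], row["regulation"])
```

With the default thresholds, geneC is filtered out by its small fold change and geneD by its adjusted p-value, which is why both thresholds matter when tuning -f and -p.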

Usage example:

python dgeapy/dgeapy.py analyze example/data/GSE206442.xlsx -o example/analyze_output -L log2FoldChange -N NA

Output tables and figures can be found in example/analyze_output.

Example data can be downloaded from GEO under accession GSE206442.

dgeapy intersect

Compute intersections of indexes among a list of dataframes.

$ python dgeapy/dgeapy.py intersect -h
usage: dgeapy.py intersect -f <file1> -f <file2> [...] -n <name1> -n <name2> [...] [OPTIONS]

Computes intersections between multiple data files and generates comprehensive intersection
tables and visualizations.


options:
  -h, --help              show this help message and exit
  -o, --output DIR        Specify the output directory for results (default: cwd).
  -i, --index-column STR  Name of the index column in the data files (default: index).
  -F, --formats [STR]     Output formats for the plots (e.g. svg) (default: ['png', 'pdf']).
  -N, --nan-values [STR]  Strings to recognize as NaN (default: ['', '--', 'NA']).
  -e, --exclude [STR]     Exclude indexes matching specified patterns.

required arguments:
  -f, --files [<FILE>]    Paths to the data files for intersection analysis.
  -n, --names [STR]       Names for the data files to label plots and tables.

Script workflow summary:

Take multiple dataframes along with their assigned names.
Validate and prepare data by checking for NaN, null, or duplicated values in the indexes, excluding specific patterns with --exclude.
Compute all possible intersections between the indexes of the provided dataframes.
Generate TSV and XLSX files for each non-empty intersection to document results.
Produce visual representations of intersections: automatically generate a weighted and unweighted Venn Diagram for up to three dataframes, or an UpSet Plot for larger sets to visualize the present and missing intersections.
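
The intersection step can be sketched with plain Python sets. This is an illustration rather than dgeapy's implementation, and the function name exclusive_intersections is hypothetical. It assumes one common reading of "all possible intersections": each index is assigned to the exact combination of datasets that contain it, which corresponds to the regions of a Venn diagram or the bars of an UpSet plot. Combinations with no members simply do not appear in the result.

```python
from collections import defaultdict


def exclusive_intersections(named_sets):
    """Group indexes by the exact combination of datasets that contain them.

    named_sets maps a dataset name to the set of indexes found in it.
    The result maps a tuple of dataset names to a set of indexes.
    """
    regions = defaultdict(set)
    all_items = set().union(*named_sets.values())
    for item in all_items:
        # The key lists every dataset containing this index, in input order.
        key = tuple(name for name, items in named_sets.items() if item in items)
        regions[key].add(item)
    return dict(regions)


# Made-up gene identifiers for illustration only.
datasets = {
    "Condition 1": {"g1", "g2", "g3"},
    "Condition 2": {"g2", "g3", "g4"},
    "Condition 3": {"g3", "g5"},
}

for names, genes in sorted(exclusive_intersections(datasets).items()):
    print(" & ".join(names), sorted(genes))
```

Grouping by exact membership keeps the regions disjoint, so each index is counted once across the output tables.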

Usage example:

Intersection analysis between 3 files:

python dgeapy/dgeapy.py intersect -f example/data/condition1.xlsx -f example/data/condition2.xlsx -f example/data/condition3.xlsx -n "Condition 1" -n "Condition 2" -n "Condition 3" -o example/intersect3_output

Results can be found in example/intersect3_output

[Figure: example of a generated Venn diagram]

Intersection analysis between 4 files:

python dgeapy/dgeapy.py intersect -f example/data/condition1.xlsx -f example/data/condition2.xlsx -f example/data/condition3.xlsx -f example/data/condition5.xlsx -n "Condition 1" -n "Condition 2" -n "Condition 3" -n "Condition 4" -o example/intersect4_output

Results can be found in example/intersect4_output

[Figure: example of a generated UpSet plot]
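
When scripting several intersect runs, the repeated -f and -n flags can be assembled programmatically. The helper below is a hypothetical sketch (build_intersect_command is not part of dgeapy) that uses only the flags listed in the intersect help output. The check that files and names have the same count is an assumption about how the two lists are paired.

```python
import shlex


def build_intersect_command(files, names, output=None, index_column=None,
                            script="dgeapy/dgeapy.py"):
    """Build the argument list for a dgeapy intersect run."""
    # Assumption: each file is labeled by the name at the same position.
    if len(files) != len(names):
        raise ValueError("files and names must have the same length")

    command = ["python", script, "intersect"]
    for path in files:
        command += ["-f", path]
    for name in names:
        command += ["-n", name]
    if output is not None:
        command += ["-o", output]
    if index_column is not None:
        command += ["-i", index_column]
    return command


command = build_intersect_command(
    files=["example/data/condition1.xlsx", "example/data/condition2.xlsx"],
    names=["Condition 1", "Condition 2"],
    output="example/intersect2_output",  # hypothetical output directory
)
# shlex.join quotes names that contain spaces, giving a shell-ready string.
print(shlex.join(command))
```

The returned list can be passed directly to subprocess.run, which avoids shell quoting issues with names such as "Condition 1".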
