Workflow for generating OrthoDB v11 protein sets.
A Snakemake workflow automatically downloads all files from OrthoDB and parses them. Run it with the following command, where N is the number of CPU cores to use:
snakemake --cores N
The resulting protein sets are saved into two different folders:
clades: contains clade-specific (e.g., Arthropoda.fa or Viridiplantae.fa) OrthoDB sets.
species: contains species-specific protein sets from which the proteins of the same species or proteins of all species in the same taxonomic order were removed. This is intended for gene prediction experiments, see, e.g., the BRAKER2 paper.
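The exclusion step behind the species-specific sets can be illustrated with a minimal Python sketch. It is not the workflow's actual code. It assumes that each FASTA header begins with the NCBI taxonomy ID followed by an underscore (e.g., "7227_0:000001"), and the names read_fasta, taxid_of and exclude_taxa are hypothetical helpers introduced only for this example.

```python
from typing import Iterable, Iterator, Set, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def taxid_of(header: str) -> str:
    """Return the taxonomy ID encoded at the start of a header.

    ASSUMPTION: headers look like "<taxid>_<version>:<gene id>".
    """
    return header.split("_", 1)[0]


def exclude_taxa(
    records: Iterable[Tuple[str, str]], excluded_taxids: Set[str]
) -> Iterator[Tuple[str, str]]:
    """Keep only records whose taxonomy ID is not in excluded_taxids."""
    for header, seq in records:
        if taxid_of(header) not in excluded_taxids:
            yield header, seq
```

Passing a single taxonomy ID corresponds to removing the proteins of the same species; passing the IDs of every species in an order corresponds to the order-level exclusion. For example, exclude_taxa(read_fasta(open("Arthropoda.fa")), {"7227"}) would drop all records whose header starts with "7227_", provided the headers follow the assumed layout.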