rfm-targa/minimizers

minimizers

Compute the set of minimizers for DNA or protein sequences.

The implementation is based on the article by Roberts et al., which first described how to determine the set of minimizers for biological sequences.
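To make the idea concrete, here is a minimal sketch of window-based minimizer selection. It is not the code from this repository: the function name `minimizers`, its parameters, the lexicographic ordering of kmers, and the leftmost tie-break are all assumptions made for illustration, and the repository's `default` and `skipper` modes may behave differently.

```python
def minimizers(sequence, k, w, with_positions=False):
    """Return the set of minimizers of `sequence` (illustrative sketch).

    Assumptions: kmers are compared lexicographically and ties are
    broken by taking the leftmost kmer in the window.
    """
    # Every overlapping substring of length k.
    kmers = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    selected = set()
    # Slide over every group of `w` adjacent kmers. If the sequence
    # holds fewer than `w` kmers, the loop does not run.
    for start in range(max(len(kmers) - w + 1, 0)):
        window = kmers[start:start + w]
        # Index of the smallest kmer in this window (leftmost on ties).
        offset = min(range(len(window)), key=lambda j: (window[j], j))
        pos = start + offset
        selected.add((kmers[pos], pos) if with_positions else kmers[pos])
    return selected


if __name__ == "__main__":
    print(sorted(minimizers("ACGTACGT", k=3, w=2)))
    print(sorted(minimizers("ACGTACGT", k=3, w=2, with_positions=True)))
```

Adjacent windows often share the same smallest kmer, which is why the selected set is usually much smaller than the full set of kmers.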

Usage

Purpose
-------

Compute the set of minimizers for DNA or Protein sequences.

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT_FILE         Path to a FASTA file with DNA or protein sequences.
  -o OUTPUT_FILE        Path to output to save results.
  --of {binary,csv}     Output file format. `binary` will use pickle to save
                        results into a binary file. `csv` will save results to
                        a CSV file with sequence identifiers in the first
                        field followed by a minimizer per field.
  --k K_VALUE           Value for the size of the kmers.
  --w WINDOW_SIZE       Window size value. Minimizers will be determined based
                        on groups of `w` adjacent kmers.
  --p                   If the start position of the kmers should be returned.
  --m {default,skipper}
                        Defines function that will be used to determine
                        minimizers.
  --t THREADS           Number of CPU cores/threads used to parallelize
                        minimizer computation.
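
The `csv` output described above (sequence identifier in the first field, followed by one minimizer per field) could be read back with a short helper like the one below. This is a sketch under assumptions: the helper name `read_minimizers_csv` is hypothetical, and it assumes comma-separated fields with no header row. How start positions are encoded when `--p` is used is not documented here, so the fields are returned as plain strings.

```python
import csv


def read_minimizers_csv(path):
    """Map each sequence identifier to its list of minimizer fields.

    Assumes comma-separated rows without a header, with the sequence
    identifier in the first field (hypothetical helper, not part of
    the tool).
    """
    records = {}
    with open(path, newline="") as handle:
        for row in csv.reader(handle):
            if not row:
                continue  # skip blank lines
            seqid, *fields = row
            records[seqid] = fields
    return records
```

Rows can have different lengths because each sequence yields its own number of minimizers, so a dictionary of lists fits better than a fixed-width table.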
