Disorderly.py

Compare protein sequences by their lengths and compositions

MIT License.

Requires Python 3+

To see the commands:

$ python3 disorderly.py -h

How to use it?

1. Prepare your query

Put your query sequences in a file in FASTA format.
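If you want to sanity-check a query file before running a search, the short Python sketch below parses FASTA text into a header-to-sequence mapping. It is not disorderly.py's own parser. read_fasta is a hypothetical helper, and it assumes the whole header line (minus the leading ">") is the sequence ID, in line with the results using FASTA headers as IDs.

```python
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {header: sequence}.

    Assumption: the full header text after '>' is used as the ID.
    Sequence lines following a header are joined until the next header.
    """
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)  # close the previous record
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence may be wrapped over several lines
    if header is not None:
        records[header] = "".join(chunks)  # close the last record
    return records


EXAMPLE = """>query-seq-1
MKTAYIAKQR
QISFVKSHFS
>query-seq-2
MSEQNNTEMT
"""
print(read_fasta(EXAMPLE.splitlines()))
```

For a real file, pass an open file handle instead, e.g. read_fasta(open("query.fasta")).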

2. Prepare your database

Your database contains the sequences you want to compare your queries against. It is also in FASTA format, but it must first be converted to a .disorderdb database before it can be searched. Generate a .disorderdb file from your database with the following command:

$ python3 disorderly.py -v -fb path/to/your_database.fasta

-v Verbose flag

-fb Database FASTA file

This will generate your_database.fasta.disorderdb in the same folder as your_database.fasta

3. Search

Each query is compared only with database sequences of the same length. For every same-length sequence, the Euclidean distance between the composition of the query and the composition of the database sequence is computed. The output lists all same-length hits, sorted by Euclidean distance (lowest first).
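To make the search step concrete, here is a minimal Python sketch of a length-filtered composition search. It assumes that a sequence's composition is the fraction of each of the 20 standard amino acids; disorderly.py's exact composition vector and .disorderdb layout may differ, and the names composition, index_by_length and search are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Assumption: composition = fraction of each of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq: str) -> List[float]:
    """Return the fraction of each standard amino acid in seq."""
    counts = Counter(seq.upper())
    n = len(seq) or 1  # avoid dividing by zero for an empty sequence
    return [counts[aa] / n for aa in AMINO_ACIDS]


def euclidean(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def index_by_length(db: Dict[str, str]) -> Dict[int, List[Tuple[str, str]]]:
    """Group database sequences by length so a query only touches same-length entries."""
    by_len: Dict[int, List[Tuple[str, str]]] = defaultdict(list)
    for name, seq in db.items():
        by_len[len(seq)].append((name, seq))
    return by_len


def search(query: str, by_len: Dict[int, List[Tuple[str, str]]]) -> List[Tuple[str, float]]:
    """Return (hit, distance) for every same-length sequence, lowest distance first."""
    q = composition(query)
    hits = [(name, euclidean(q, composition(seq))) for name, seq in by_len.get(len(query), [])]
    return sorted(hits, key=lambda hit: hit[1])


db = {"db-1": "ACDE", "db-2": "AAAA", "db-3": "ACD", "db-4": "EDCA"}
for name, dist in search("ACDE", index_by_length(db)):
    print(f"{name}\t{dist:.3f}")  # db-1 0.000, db-4 0.000, db-2 0.866
```

db-3 never appears because its length differs from the query. Because only composition is compared, ACDE and EDCA come out at distance 0 even though their residue order differs.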

This search is distributed over all the available CPUs!

$ python3 disorderly.py -v -i path/to/query.fasta -db path/to/your_database.fasta.disorderdb

-i Your query sequences in FASTA

-db The converted .disorderdb database

This generates a .csv file named after your query with some extra text appended (e.g. for query.fasta, the result will be query_search-20180816190934-ABCD.csv). The result is written to the same directory as your query, and the -v verbose flag prints its location.

Alternatively, you can run everything all at once:

$ python3 disorderly.py -v -i query.fasta -fb your_database.fasta

The step-by-step instructions above are meant to help you understand what is going on under the hood.

Reading the result

Open the .csv file with a text editor or Excel

The format is (sequence IDs are the FASTA headers):

Queries	Hits	Distances
query-seq-1	database-seq-9	0.000
query-seq-1	database-seq-5	0.135
query-seq-1	database-seq-14	0.246
query-seq-2	database-seq-3	0.000
query-seq-2	database-seq-75	0.321
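If you want to post-process the results in Python rather than in Excel, the sketch below keeps the lowest-distance hit for each query. It assumes the column names shown above (Queries, Hits, Distances) and guesses the delimiter from the header line, since the example is shown tab-separated while the file has a .csv extension. best_hits is a hypothetical helper.

```python
import csv
import io
from typing import Dict, Tuple


def best_hits(text: str) -> Dict[str, Tuple[str, float]]:
    """Return {query: (closest hit, distance)} from result-file text."""
    first_line = text.splitlines()[0] if text else ""
    # Assumption: tab-separated if the header contains a tab, else comma-separated.
    delimiter = "\t" if "\t" in first_line else ","
    best: Dict[str, Tuple[str, float]] = {}
    for row in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        query, hit, dist = row["Queries"], row["Hits"], float(row["Distances"])
        if query not in best or dist < best[query][1]:
            best[query] = (hit, dist)
    return best


SAMPLE = (
    "Queries\tHits\tDistances\n"
    "query-seq-1\tdatabase-seq-9\t0.000\n"
    "query-seq-1\tdatabase-seq-5\t0.135\n"
    "query-seq-1\tdatabase-seq-14\t0.246\n"
    "query-seq-2\tdatabase-seq-3\t0.000\n"
    "query-seq-2\tdatabase-seq-75\t0.321\n"
)
print(best_hits(SAMPLE))
```

For a real result file, read it with open(path, newline="") and pass the contents to best_hits.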

How to get it? (Install)

No wheel currently :( , so just:

1. Download the .zip

2. Unpack it wherever you want

3. Find disorderly.py under src/ and run as described above

For Stanford folks

If you run on MEMEX (or any of our servers that use SLURM):

Feel free to use the bash_run.sh file to submit jobs so they can run on multiple CPUs.

$ sbatch bash_run.sh -v -i query.fasta -fb your_database.fasta

NOTE: bash_run.sh must be in the same folder as disorderly.py

ALSO: It is currently configured to use the DPB partition and 24 cores (1 node on MEMEX). Edit the file with any editor to change this, e.g.:

#SBATCH -p dge    # To use the DGE partition
#SBATCH -c 12     # for 12 cores

qks1lver/disorderly

Folders and files

Latest commit

History

Repository files navigation

Disorderly.py

Compare protein sequences by their lengths and compositions

MIT License.

Requires Python 3+

How to use it?

1. Prepare your query

2. Prepare your database

3. Search

Alternatively, you can run everything all at once:

Reading the result

Open the .csv file with a text editor or Excel

How to get it? (Install)

No wheel currently :( , so just:

1. Download the .zip

2. Unpack it wherever you want

3. Find disorderly.py under src/ and run as described above

For Stanford folks

Those that run on MEMEX (or any of our servers that uses SLURM):

Feel free to use the bash_run.sh file to submit jobs so it can be run on multiple CPUs

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages