Merge pull request #1 from Edinburgh-Genome-Foundry/dev
Biopython v1.78 fix
veghp committed Sep 7, 2020
2 parents 74e3a4a + 15633cf commit 28f4ec1
Showing 9 changed files with 84 additions and 73 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -44,3 +44,4 @@ nosetests.xml
.pypirc

.cache
.vscode
14 changes: 5 additions & 9 deletions LICENCE.txt
@@ -1,8 +1,4 @@

The MIT License (MIT)
[OSI Approved License]

The MIT License (MIT)
MIT License

Copyright (c) 2017 Edinburgh Genome Foundry

@@ -13,13 +9,13 @@ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
46 changes: 23 additions & 23 deletions README.rst
@@ -5,7 +5,7 @@
<br /><br />
</p>

Bandwitch
BandWitch
=========

.. image:: https://travis-ci.org/Edinburgh-Genome-Foundry/BandWitch.svg?branch=master
@@ -14,29 +14,29 @@ Bandwitch
.. image:: https://coveralls.io/repos/github/Edinburgh-Genome-Foundry/BandWitch/badge.svg?branch=master
:target: https://coveralls.io/github/Edinburgh-Genome-Foundry/BandWitch?branch=master

Bandwitch (full documentation `here <https://edinburgh-genome-foundry.github.io/BandWitch/>`_)
BandWitch (full documentation `here <https://edinburgh-genome-foundry.github.io/BandWitch/>`_)
is a Python library for the planning and analysis of restriction
experiments in DNA assembly operations. Bandwitch implements method to select the best enzyme(s) to validate or identify DNA assemblies. It also provides report generation methods to automatically validate/identify assemblies from experimental data.
experiments in DNA assembly operations. BandWitch implements methods for selecting the best enzyme(s) to validate or identify DNA assemblies. It also provides report generation methods to automatically validate/identify assemblies from experimental data.

You can try BandWitch's enzyme suggestion feature in `this web demo <https://cuba.genomefoundry.org/select_digestions>`_, and the sequence validation (from AATI fragment analyzer files) in `this other demo <http://cuba.genomefoundry.org/analyze-digests>`_
You can try BandWitch's enzyme suggestion feature in `this web demo <https://cuba.genomefoundry.org/select_digestions>`_, and the sequence validation (from AATI fragment analyzer files) in `this other demo <http://cuba.genomefoundry.org/analyze-digests>`_.

Installation
-------------
------------

You can install DnaCauldron through PIP
You can install BandWitch through PIP:


.. code:: shell
sudo pip install bandwitch
On Ubuntu at least, you may need to install libblas first:
On Ubuntu, you may need to install libblas first:

.. code::
sudo apt-get install libblas-dev liblapack-dev
Alternatively, you can unzip the sources in a folder and type
Alternatively, you can unzip the source files in a folder and type:

.. code:: shell
@@ -49,7 +49,7 @@ Enzyme selection with BandWitch
In the following examples, we assume that we have a set of 12 constructs which we will
need to either validate (i.e. we digest these constructs and compare each pattern
with the expected pattern for that construct) or identify (i.e. we will digest an
a-priori unknown construct and use the migration patterns to un-ambiguously
*a priori* unknown construct and use the migration patterns to unambiguously
identify each construct among the 12 possible candidates).

For validation purposes, the difficulty is to find a digestion that will produce
@@ -61,8 +61,8 @@ candidates.
Whenever the problem cannot be solved with a single digestion, BandWitch
can propose 2 or 3 digestions which collectively solve the problem.

**Important:** when providing BandWitch with a record, make sure to set the
topology, defined by ``record.annotations['topology'] = 'linear'|'circular'``.
**Important:** when providing BandWitch with a Biopython record, make sure to set the
topology, defined by: ``record.annotations['topology'] = 'linear'|'circular'``
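
As a minimal sketch of the annotation described above, the snippet below sets and checks the topology on a record-like object. ``SimpleNamespace`` is only a stand-in for a Biopython ``SeqRecord`` (so the example runs without Biopython installed), and the helper name ``set_topology`` is hypothetical rather than part of BandWitch's API:

```python
from types import SimpleNamespace

VALID_TOPOLOGIES = ("linear", "circular")


def set_topology(record, topology):
    """Store the topology in record.annotations, rejecting unknown values."""
    if topology not in VALID_TOPOLOGIES:
        raise ValueError(
            "topology should be one of %s (was %s)."
            % (", ".join(VALID_TOPOLOGIES), topology)
        )
    record.annotations["topology"] = topology
    return record


# Stand-in for a SeqRecord, which also exposes an `annotations` dict.
record = SimpleNamespace(id="construct_1", annotations={})
set_topology(record, "circular")
print(record.annotations["topology"])
```

With a real ``SeqRecord`` the assignment is the same: only the ``annotations`` dictionary is touched.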


Finding enzymes that "work well" for many constructs
@@ -78,8 +78,8 @@ Here is the code to select enzymes that will produce nice patterns for all const
enzymes = ["EcoRI", "BamHI", "XhoI", "EcoRV", "SpeI", "XbaI",
"NotI", "SacI", "SmaI", "HindIII", "PstI"]
sequences = [
load_record(genbank_file_paidname=f, topology='circular')
for genbank_file_path in some_list_of_files)
load_record(record_file=f, topology='circular')
for f in some_list_of_genbank_files
]
# SELECT THE BEST SINGLE DIGESTION WITH AT MOST ENZYMES
@@ -103,7 +103,7 @@ Result:
:align: center

Finding enzymes that will differentiate many constructs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To select enzymes that will produce **different patterns for each construct, for identification:**

@@ -116,8 +116,8 @@ To select enzymes that will produce **different patterns for each construct, for
# DEFINE SEQUENCES AND ENZYME SET (6-CUTTERS WITH >3 COMMERCIAL PROVIDERS)
enzymes = list_common_enzymes(site_length=(6,), min_suppliers=3)
sequences = [
load_record(genbank_file_path, id=f)
for genbank_file_path in some_list_of_files)
load_record(genbank_file_path, topology='circular')
for genbank_file_path in some_list_of_genbank_files
]
# SELECT THE BEST DIGESTION PAIRS (AT MOST 1 ENZYME PER DIGESTION)
@@ -146,15 +146,15 @@ Result:

In the result above, each construct has a unique "fingerprint". Assuming that you
have an unlabelled DNA sample which could be any of these assemblies, then simply
digesting the sample with MspA1I and BsmI will give you 2 pattern which collectively
digesting the sample with MspA1I and BsmI will give you 2 patterns which collectively
will correspond to a unique assembly.
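
To illustrate the fingerprint idea, here is a small sketch with made-up band sizes. The construct names, the band values and the 5% tolerance are all assumptions for illustration, and the matching rule is a simplification rather than BandWitch's own scoring:

```python
def patterns_match(observed, expected, tolerance=0.05):
    """True if both patterns have the same number of bands and each pair of
    size-sorted bands differs by at most `tolerance` (relative)."""
    if len(observed) != len(expected):
        return False
    return all(
        abs(o - e) <= tolerance * e
        for o, e in zip(sorted(observed), sorted(expected))
    )


def identify(observed_by_digestion, expected_by_construct):
    """Names of the constructs whose expected patterns match the observed
    pattern for every digestion."""
    return [
        name
        for name, expected in expected_by_construct.items()
        if all(
            patterns_match(observed_by_digestion[digestion], expected[digestion])
            for digestion in observed_by_digestion
        )
    ]


# Hypothetical expected band sizes (bp) for two digestions.
expected = {
    "construct_A": {"MspA1I": [1200, 3400], "BsmI": [800, 3800]},
    "construct_B": {"MspA1I": [1200, 3400], "BsmI": [2100, 2500]},
    "construct_C": {"MspA1I": [500, 4100], "BsmI": [800, 3800]},
}
observed = {"MspA1I": [1210, 3350], "BsmI": [2080, 2540]}
print(identify(observed, expected))
```

In this toy data neither digestion is sufficient alone (MspA1I cannot tell A from B, BsmI cannot tell A from C), but the two patterns together single out one construct.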

Usage: Construct validation or identification from experimental data
---------------------------------------------------------------------
--------------------------------------------------------------------

This part is still under construction.
*This part is still under construction.*

Bandwitch can process output files from an automated fragment analyzer and produce
BandWitch can process output files from an automated fragment analyzer and produce
informative reports as illustrated below:

.. image:: https://raw.githubusercontent.com/Edinburgh-Genome-Foundry/BandWitch/master/docs/_static/images/bands_validation.png
@@ -164,12 +164,12 @@ informative reports as illustrated below:


License = MIT
--------------
-------------

BandWitch is an open-source software originally written at the `Edinburgh Genome Foundry <http://edinburgh-genome-foundry.github.io/home.html>`_ by `Zulko <https://github.com/Zulko>`_ and `released on Github <https://github.com/Edinburgh-Genome-Foundry/Primavera>`_ under the MIT licence (¢ Edinburg Genome Foundry). Everyone is welcome to contribute !
BandWitch is an open-source software originally written at the `Edinburgh Genome Foundry <http://edinburgh-genome-foundry.github.io/home.html>`_ by `Zulko <https://github.com/Zulko>`_ and `released on Github <https://github.com/Edinburgh-Genome-Foundry/BandWitch>`_ under the MIT license (Copyright 2017 Edinburgh Genome Foundry). Everyone is welcome to contribute!

More biology software
-----------------------
---------------------

.. image:: https://raw.githubusercontent.com/Edinburgh-Genome-Foundry/Edinburgh-Genome-Foundry.github.io/master/static/imgs/logos/egf-codon-horizontal.png
:target: https://edinburgh-genome-foundry.github.io/
19 changes: 7 additions & 12 deletions bandwitch/DigestionProblem/SeparatingDigestionProblem.py
@@ -36,7 +36,7 @@ class SeparatingDigestionsProblem(DigestionProblem):
sequences
An (ordered) dictionary of the form {sequence_name: sequence} where the
sequence is an ATGC string
sequence is an ATGC string.
enzymes
List of the names of the enzymes to consider, e.g. ``['EcoRI', 'XbaI']``.
@@ -45,7 +45,7 @@ class SeparatingDigestionsProblem(DigestionProblem):
A Ladder object representing the ladder used for migrations.
linear
True for linear sequences, false for circular sequences
True for linear sequences, false for circular sequences.
max_enzymes_per_digestion
Maximal number of enzymes that can go in a single digestion.
@@ -113,8 +113,7 @@ def _parameter_element_score(self, digestion, sequences_pair):
"""See max_patterns_difference."""
sequence1, sequence2 = sequences_pair
digestion1, digestion2 = [
self.sequences_digestions[s][digestion]
for s in (sequence1, sequence2)
self.sequences_digestions[s][digestion] for s in (sequence1, sequence2)
]
# If similar pair already computed, return the previous result.
if digestion1["same_as"] == digestion2["same_as"]:
@@ -137,12 +136,11 @@ def _score_to_color(score, maxi=0.1):
Parameters
----------
score
Value between 0 (perfect similarity, green) and 1 (red)
Value between 0 (perfect similarity, green) and 1 (red).
maxi
Value of the score above which everything appears completely red.
Below this value the color goes progressively from red to green in 0.
"""
return (
max(0, min(1, score / maxi)),
@@ -171,8 +169,7 @@ def plot_distances_map(self, digestions, ax=None, target_file=None):
axes
The axes of the generated figure (if a target file is written to,
the figure is closed and None is returned instead)
the figure is closed and None is returned instead).
"""

if not PLOTS_AVAILABLE:
@@ -210,17 +207,15 @@ def plot_distances_map(self, digestions, ax=None, target_file=None):
)

ax.set_yticks(range(len(grid)))
ax.set_yticklabels(
list(self.sequences)[:-1], size=14, fontdict={"weight": "bold"}
)
ax.set_yticklabels(list(self.sequences), size=14, fontdict={"weight": "bold"})
ax.set_xticks(range(len(grid)))
ax.xaxis.set_ticks_position("top")
ax.spines["right"].set_visible(False)
ax.spines["top"].set_visible(False)
ax.spines["left"].set_visible(False)
ax.spines["bottom"].set_visible(False)
ax.set_xticklabels(
[" " + s for s in list(self.sequences)[1:][::-1]],
[" " + s for s in list(self.sequences)[::-1]],
rotation=90,
size=14,
fontdict={"weight": "bold"},
12 changes: 6 additions & 6 deletions bandwitch/README.md
@@ -1,10 +1,10 @@
# Code organization

This document walks you trough the DNA Chisel code. Please request changes if anything is unclear.
This document walks you through the BandWitch code. Please request changes if anything is unclear.

#### Ladder/

- **Ladder.py** implements the Ladder class, used throughout the library to represent band ladders.A ladder's main method is ``dna_size_to_migration``, which predicts the coordinates of a band of a given size.
- **Ladder.py** implements the Ladder class, used throughout the library to represent band ladders. A ladder's main method is ``dna_size_to_migration``, which predicts the coordinates of a band of a given size.
- **preset_ladders.py** provides a set of commonly used ladders obtained by parsing the spreadsheets in ``data/``.
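
To make the role of ``dna_size_to_migration`` concrete, here is a minimal sketch that assumes migration is interpolated linearly in log(size) between known ladder bands. Both that assumption and the ladder values are invented for illustration, and the actual ``Ladder`` class may work differently:

```python
import math
from bisect import bisect_left


def dna_size_to_migration(size, ladder):
    """Interpolate the migration distance of a band of `size` bp.

    `ladder` maps known band sizes (bp) to migration distances. Interpolation
    is linear in log(size); sizes outside the ladder are clamped to its ends.
    """
    sizes = sorted(ladder)
    if size <= sizes[0]:
        return ladder[sizes[0]]
    if size >= sizes[-1]:
        return ladder[sizes[-1]]
    i = bisect_left(sizes, size)
    low, high = sizes[i - 1], sizes[i]
    fraction = (math.log(size) - math.log(low)) / (math.log(high) - math.log(low))
    return ladder[low] + fraction * (ladder[high] - ladder[low])


# Hypothetical ladder: smaller fragments migrate further.
ladder = {100: 200.0, 1000: 100.0, 10000: 0.0}
print(dna_size_to_migration(1000, ladder))
```

A band whose size is the geometric mean of two ladder bands lands halfway between them, which is the behaviour expected from a log-scaled gel.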

#### DigestionProblem/
@@ -13,25 +13,25 @@ This submodule implements the classes and methods for finding enzyme sets adapte
- **SetCoverProblem** (implemented in SetCoverProblem.py) implements a class for solving generic Minimal Subset Cover problems. This class is inherited by all other classes in this module to find one or more digestions collectively covering all the given records.
- **DigestionProblem** (implemented in DigestionProblem.py) inherits from *SetCoverProblem* and is the base class for all enzyme selection problems. It implements the initialization (where the digestion of every sequence by every enzyme mixes are computed), the plotting of the sequences by selected enzymes, and the scaffold for enzyme selection. All children classes simply need to define a custom ``_parameter_element_score`` method.
- **IdealDigestionsProblem** (implemented in IdealDigestionsProblem.py) is a subclass of DigestionProblem to select enzymes that will give "acceptable" patterns for every sequence provided. Its ``_parameter_element_score`` method computes a ``migration_score()`` depending on the number of bands, the spacing between bands, etc.
- **SeparatingDigestionsProblem** (implemented in SeparatingDigestionsProblem.py) is a subclass of DigestionProblem to select enzymes that will give distinct patterns (not necessarily ideal) for each given sequence. It's ``_parameter_element_score`` computes the distance between the bands of 2 digestions.
- **SeparatingDigestionsProblem** (implemented in SeparatingDigestionsProblem.py) is a subclass of DigestionProblem to select enzymes that will give distinct patterns (not necessarily ideal) for each given sequence. Its ``_parameter_element_score`` computes the distance between the bands of 2 digestions.
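
The set-cover idea behind these classes can be sketched with a generic greedy heuristic. This is not BandWitch's implementation: the function name, the ``max_sets`` parameter and the data are assumptions, and a greedy choice is not guaranteed to find the smallest cover:

```python
def greedy_set_cover(elements, candidate_sets, max_sets=3):
    """Pick up to `max_sets` candidate sets that together cover `elements`.

    Returns the chosen names, or None if full coverage is not reached.
    """
    uncovered = set(elements)
    chosen = []
    while uncovered and len(chosen) < max_sets:
        # Take the candidate covering the most still-uncovered elements.
        best = max(
            candidate_sets, key=lambda name: len(candidate_sets[name] & uncovered)
        )
        if not candidate_sets[best] & uncovered:
            break  # no remaining candidate covers anything new
        chosen.append(best)
        uncovered -= candidate_sets[best]
    return chosen if not uncovered else None


# Hypothetical data: the constructs for which each digestion gives a usable pattern.
constructs = {"c1", "c2", "c3", "c4"}
digestions = {
    "EcoRI": {"c1", "c2"},
    "BamHI": {"c2", "c3"},
    "XhoI": {"c3", "c4"},
}
print(greedy_set_cover(constructs, digestions))
```

Here no single digestion covers all four constructs, so two digestions are returned; in the separating variant the covered elements would be pairs of constructs rather than individual constructs.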

#### ClonesObservations/

This submodule implements classes for representing and validating the results of experimental restriction digest experiments. In particular, it allows to import observations from the AATI fragment analyzer, and compare these to expected results to create validation or identification reports.

The class structure in this module is complicated because life is complicated. In a typical assembly batch, there are several different constructs assembled. For each construct we can pick several clones and submit the DNA extracted from each clone to several restriction digests.

- **BandsObservations** represents the observation of a digest. It has a name (e.g. the name of the microplate well in which it was observed), bands (more precisely, a set of bands sizes), a ladder with which it was observed, and optionally a picture of the gel. It has a method to be compared with another pattern (using the *band_patterns_discrepancy()* method defined in *band_patterns_discrepancy.py*)
- **BandsObservations** represents the observation of a digest. It has a name (e.g. the name of the microplate well in which it was observed), bands (more precisely, a set of bands sizes), a ladder with which it was observed, and optionally a picture of the gel. It has a method to be compared with another pattern (using the *band_patterns_discrepancy()* method defined in *band_patterns_discrepancy.py*).
- **Clone** represents a clone, with a name (could be a microplate well's name), the construct associated with the clone (if known) and the (possibly multiple) restriction digest observations for this clone.
- **CloneValidation** is a structure representing the comparison between the observations of a Clone and expected patterns. It contains a Clone, the expected patterns for each digestion, and the discrepancy between observed and expected for each digestion. It is the "building block" of validation and identification reports.
- **ClonesObservations** represents all the information necessary to process a full experimental outcome involving several clones and restriction digests. At its core, it groups several Clone instances together with the biopython records representing the expected constructs. As the highest-level class in this module, ClonesObservations has methods to be directly created from an AATI Fragment Analyzer zip file (or several zip files, as ClonesObservations can be merged together). The class also has methods to generate CloneValidations for all clones in its set, in order to validate or identify all clones and compile the results in a PDF report.

#### list_common_enzymes/

- **list_common_enzymes.py** provides a method for getting common enzymes (filtered for star-activity using data in *enzymes_data/*)
- **enzymes_data/** provides the dictionnary ``enzymes_infos``, generated in the *__init__.py* by parsing the spreadsheet **enzymes_infos.csv**, which contains enzyme data, notably methylation sensitivity, compiled from rebase.neb.com using the script **update_enzymes_list.py**.
- **enzymes_data/** provides the dictionary ``enzymes_infos``, generated in the *__init__.py* by parsing the spreadsheet **enzymes_infos.csv**, which contains enzyme data, notably methylation sensitivity, compiled from rebase.neb.com using the script **update_enzymes_list.py**.

#### Files at the root

- **bands_predictions.py** implements methods to predict which band sizes given records and enzymes will create, in particular, some methods efficiently compute band sizes for batches of sequences and combinations of enzymes. The methods are used extensively by the *DigestionProblem* class.
- **tools.py** implements generic methods, notably for Genbank record manipulation.
- **tools.py** implements generic methods, notably for Genbank record manipulation.
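
As a rough illustration of what band prediction involves, the sketch below turns cut positions into fragment sizes for linear and circular sequences. The function name and signature are hypothetical and do not reflect the contents of ``bands_predictions.py``:

```python
def fragment_sizes(sequence_length, cut_positions, linear=True):
    """Sizes of the fragments produced by cutting at `cut_positions`.

    Positions are measured in bp from the start of the sequence.
    An uncut sequence is returned as a single full-length fragment.
    """
    cuts = sorted(set(cut_positions))
    if not cuts:
        return [sequence_length]
    if linear:
        bounds = [0] + cuts + [sequence_length]
        return sorted(end - start for start, end in zip(bounds, bounds[1:]))
    # Circular: n cuts give n fragments; the last one wraps around the origin.
    sizes = [end - start for start, end in zip(cuts, cuts[1:])]
    sizes.append(sequence_length - cuts[-1] + cuts[0])
    return sorted(sizes)


print(fragment_sizes(5000, [1000, 3500], linear=True))
print(fragment_sizes(5000, [1000, 3500], linear=False))
```

The same two cuts give three bands on a linear sequence but only two on a circular one, which is why the topology of each record matters.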
21 changes: 16 additions & 5 deletions bandwitch/tools.py
@@ -6,7 +6,13 @@
from Bio.Seq import Seq
from Bio import SeqIO
from Bio.SeqRecord import SeqRecord
from Bio.Alphabet import DNAAlphabet

try:
    # Biopython <1.78
    from Bio.Alphabet import DNAAlphabet

    has_dna_alphabet = True
except ImportError:
    # Biopython >=1.78
    has_dna_alphabet = False
from snapgene_reader import snapgene_file_to_seqrecord


@@ -26,9 +32,8 @@ def set_record_topology(record, topology):
]
if topology not in valid_topologies:
raise ValueError(
"topology should be one of %s (was %s)." % (
", ".join(valid_topologies), topology
)
"topology should be one of %s (was %s)."
% (", ".join(valid_topologies), topology)
)
annotations = record.annotations
default_prefix = "default_to_"
@@ -116,11 +121,17 @@ def sequence_to_biopython_record(
sequence, id="<unknown id>", name="<unknown name>", features=()
):
"""Return a SeqRecord of the sequence, ready to be Genbanked."""
if has_dna_alphabet:
seq = Seq(sequence, alphabet=DNAAlphabet())
else:
seq = Seq(sequence)

return SeqRecord(
Seq(sequence, alphabet=DNAAlphabet()),
seq=seq,
id=id,
name=name,
features=list(features),
annotations={"molecule_type": "DNA"},
)
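
The compatibility logic in this file follows a common feature-detection pattern: try the old import, remember whether it worked, and branch on that flag later. A standalone sketch of the pattern is given below. The helper names ``optional_attr`` and ``seq_constructor_kwargs`` are hypothetical, and the code is written so that it also runs when Biopython is not installed at all:

```python
import importlib


def optional_attr(module_name, attr_name):
    """Return module_name.attr_name, or None if the module or attribute is missing."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, attr_name, None)


# Bio.Alphabet exists in Biopython < 1.78; from 1.78 on, importing it raises
# ImportError, which optional_attr turns into None.
DNAAlphabet = optional_attr("Bio.Alphabet", "DNAAlphabet")
has_dna_alphabet = DNAAlphabet is not None


def seq_constructor_kwargs():
    """Extra keyword arguments to pass to Seq() on the detected Biopython."""
    if has_dna_alphabet:
        return {"alphabet": DNAAlphabet()}
    return {}


print(has_dna_alphabet, seq_constructor_kwargs())
```

Setting the flag in both branches matters: if it is only assigned in the ``except`` branch, any later ``if has_dna_alphabet:`` raises ``NameError`` whenever the old import succeeds.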


2 changes: 1 addition & 1 deletion bandwitch/version.py
@@ -1 +1 @@
__version__ = "0.3.1"
__version__ = "0.3.2"
12 changes: 6 additions & 6 deletions pypi-readme.rst
@@ -1,8 +1,8 @@
BandWitch
===========
=========

Bandwitch is a Python library for the planning and analysis of restriction
experiments in DNA assembly operations. Bandwitch implements method to select
BandWitch is a Python library for the planning and analysis of restriction
experiments in DNA assembly operations. BandWitch implements method to select
the best enzyme(s) to validate or identify DNA assemblies. It also provides
report generation methods to automatically validate/identify assemblies from
experimental data.
@@ -25,11 +25,11 @@ Infos

`<https://edinburgh-genome-foundry.github.io/BandWitch/>`_

**Github Page**
**Github Page:**

`<https://github.com/Edinburgh-Genome-Foundry/BandWitch>`_

**Live demo**
**Live demo:**

Enzyme suggestion: `<http://cuba.genomefoundry.org/digestion-selector>`_

@@ -46,7 +46,7 @@ Enzymes selected by bandwitch to obtain **clear, optimal** patterns for all test
:alt: [logo]
:align: center

Enzymes selected by bandwitch to obtain **significant differences** between the patterns of the tested constructs, so that a construct can be identified by its pattern.
Enzymes selected by bandwitch to obtain **significant differences** between the patterns of the tested constructs, so that a construct can be identified by its pattern:

.. image:: https://raw.githubusercontent.com/Edinburgh-Genome-Foundry/BandWitch/master/examples/separating_digestions.png
:alt: [logo]