Merge pull request #1 from Edinburgh-Genome-Foundry/dev

Biopython v1.78 fix
Edinburgh-Genome-Foundry · Sep 7, 2020 · 28f4ec1
2 parents 74e3a4a + 15633cf
commit 28f4ec1

Showing 9 changed files with 84 additions and 73 deletions.
diff --git a/.gitignore b/.gitignore
@@ -44,3 +44,4 @@ nosetests.xml
 .pypirc
 
 .cache
+.vscode
diff --git a/LICENCE.txt b/LICENCE.txt
@@ -1,8 +1,4 @@
-
-The MIT License (MIT)
-[OSI Approved License]
-
-The MIT License (MIT)
+MIT License
 
 Copyright (c) 2017 Edinburgh Genome Foundry
 
@@ -13,13 +9,13 @@ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 copies of the Software, and to permit persons to whom the Software is
 furnished to do so, subject to the following conditions:
 
-The above copyright notice and this permission notice shall be included in
-all copies or substantial portions of the Software.
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
 
 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
-OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
-THE SOFTWARE.
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/README.rst b/README.rst
@@ -5,7 +5,7 @@
  <br /><br />
  </p>
 
-Bandwitch
+BandWitch
 =========
 
 .. image:: https://travis-ci.org/Edinburgh-Genome-Foundry/BandWitch.svg?branch=master
@@ -14,29 +14,29 @@ Bandwitch
 .. image:: https://coveralls.io/repos/github/Edinburgh-Genome-Foundry/BandWitch/badge.svg?branch=master
  :target: https://coveralls.io/github/Edinburgh-Genome-Foundry/BandWitch?branch=master
 
-Bandwitch (full documentation `here <https://edinburgh-genome-foundry.github.io/BandWitch/>`_)
+BandWitch (full documentation `here <https://edinburgh-genome-foundry.github.io/BandWitch/>`_)
 is a Python library for the planning and analysis of restriction
-experiments in DNA assembly operations. Bandwitch implements method to select the best enzyme(s) to validate or identify DNA assemblies. It also provides report generation methods to automatically validate/identify assemblies from experimental data.
+experiments in DNA assembly operations. BandWitch implements methods for selecting the best enzyme(s) to validate or identify DNA assemblies. It also provides report generation methods to automatically validate/identify assemblies from experimental data.
 
-You can try BandWitch's enzyme suggestion feature in `this web demo <https://cuba.genomefoundry.org/select_digestions>`_, and the sequence validation (from AATI fragment analyzer files) in `this other demo <https://cuba.genomefoundry.org/analyze-digests>`_
+You can try BandWitch's enzyme suggestion feature in `this web demo <https://cuba.genomefoundry.org/select_digestions>`_, and the sequence validation (from AATI fragment analyzer files) in `this other demo <https://cuba.genomefoundry.org/analyze-digests>`_.
 
 Installation
--------------
+------------
 
-You can install DnaCauldron through PIP
+You can install BandWitch through PIP:
 
 
 .. code:: shell
 
  sudo pip install bandwitch
 
-On Ubuntu at least, you may need to install libblas first:
+On Ubuntu, you may need to install libblas first:
 
 .. code::
 
  sudo apt-get install libblas-dev liblapack-dev
 
-Alternatively, you can unzip the sources in a folder and type
+Alternatively, you can unzip the source files in a folder and type:
 
 .. code:: shell
 
@@ -49,7 +49,7 @@ Enzyme selection with BandWitch
 In the following examples, we assume that we have a set of 12 constructs which we will
 need to either validate (i.e. we digest these constructs and compare each pattern
 with the expected pattern for that construct) or identify (i.e. we will digest an
-a-priori unknown construct and use the migration patterns to un-ambiguously
+*a priori* unknown construct and use the migration patterns to unambiguously
 identify each construct among the 12 possible candidates).
 
 For validation purposes, the difficulty is to find a digestion that will produce
@@ -61,8 +61,8 @@ candidates.
 Every time when the problem cannot be solved with a single digestion, BandWitch
 can propose 2 or 3 digestions which collectively solve the problem.
 
-**Important:** when providing BandWitch with a record, make sure to set the
-topology, defined by ``record.annotations['topology'] = 'linear'|'circular'``.
+**Important:** when providing BandWitch with a Biopython record, make sure to set the
+topology, defined by: ``record.annotations['topology'] = 'linear'|'circular'``
 
 
 Finding enzymes that "work well" for many constructs
@@ -78,8 +78,8 @@ Here is the code to select enzymes that will produce nice patterns for all const
  enzymes = ["EcoRI", "BamHI", "XhoI", "EcoRV", "SpeI", "XbaI",
  "NotI", "SacI", "SmaI", "HindIII", "PstI"]
  sequences = [
- load_record(genbank_file_paidname=f, topology='circular')
- for genbank_file_path in some_list_of_files)
+ load_record(record_file=f, topology='circular')
+ for f in some_list_of_genbank_files
  ]
 
  # SELECT THE BEST SINGLE DIGESTION WITH AT MOST ENZYMES
@@ -103,7 +103,7 @@ Result:
  :align: center
 
 Finding enzymes that will differentiate many constructs
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 To select enzymes that will produce **different patterns for each construct, for identification:**
 
@@ -116,8 +116,8 @@ To select enzymes that will produce **different patterns for each construct, for
  # DEFINE SEQUENCES AND ENZYME SET (6-CUTTERS WITH >3 COMMERCIAL PROVIDERS)
  enzymes = list_common_enzymes(site_length=(6,), min_suppliers=3)
  sequences = [
- load_record(genbank_file_path, id=f)
- for genbank_file_path in some_list_of_files)
+ load_record(genbank_file_path, topology='circular')
+ for genbank_file_path in some_list_of_genbank_files
  ]
 
  # SELECT THE BEST DIGESTION PAIRS (AT MOST 1 ENZYME PER DIGESTION)
@@ -146,15 +146,15 @@ Result:
 
 In the result above, each construct has a unique "fingerprint". Assuming that you
 have an unlabelled DNA sample which could be any of these assemblies, then simply
-digesting the sample with MspA1I and BsmI will give you 2 pattern which collectively
+digesting the sample with MspA1I and BsmI will give you 2 patterns which collectively
 will correspond to a unique assembly.
 
 Usage: Construct validation or identification from experimental data
----------------------------------------------------------------------
+--------------------------------------------------------------------
 
-This part is still under construction.
+*This part is still under construction.*
 
-Bandwitch can process output files from an automated fragment analyzer and produce
+BandWitch can process output files from an automated fragment analyzer and produce
 informative reports as illustrated below:
 
 .. image:: https://raw.githubusercontent.com/Edinburgh-Genome-Foundry/BandWitch/master/docs/_static/images/bands_validation.png
@@ -164,12 +164,12 @@ informative reports as illustrated below:
 
 
 License = MIT
---------------
+-------------
 
-BandWitch is an open-source software originally written at the `Edinburgh Genome Foundry <https://edinburgh-genome-foundry.github.io/home.html>`_ by `Zulko <https://github.com/Zulko>`_ and `released on Github <https://github.com/Edinburgh-Genome-Foundry/Primavera>`_ under the MIT licence (¢ Edinburg Genome Foundry). Everyone is welcome to contribute !
+BandWitch is an open-source software originally written at the `Edinburgh Genome Foundry <https://edinburgh-genome-foundry.github.io/home.html>`_ by `Zulko <https://github.com/Zulko>`_ and `released on Github <https://github.com/Edinburgh-Genome-Foundry/BandWitch>`_ under the MIT license (Copyright 2017 Edinburgh Genome Foundry). Everyone is welcome to contribute!
 
 More biology software
------------------------
+---------------------
 
 .. image:: https://raw.githubusercontent.com/Edinburgh-Genome-Foundry/Edinburgh-Genome-Foundry.github.io/master/static/imgs/logos/egf-codon-horizontal.png
  :target: https://edinburgh-genome-foundry.github.io/
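The README's "fingerprint" idea (each construct should give a distinct combination of band patterns across the chosen digestions) can be illustrated with a small, library-independent sketch. The band sizes below are made up for illustration, the helper names are hypothetical, and exact equality of band sizes is a simplification: BandWitch itself scores how different two patterns look on a gel rather than testing for strict equality.

```python
from itertools import combinations


def fingerprints(band_patterns, digestions):
    """Map each construct to its band patterns, one sorted tuple per digestion."""
    return {
        construct: tuple(tuple(sorted(patterns[d])) for d in digestions)
        for construct, patterns in band_patterns.items()
    }


def ambiguous_pairs(band_patterns, digestions):
    """Return the pairs of constructs that share the same fingerprint."""
    prints = fingerprints(band_patterns, digestions)
    return [(a, b) for a, b in combinations(prints, 2) if prints[a] == prints[b]]


# Hypothetical band sizes (in bp) for three constructs and two digestions.
band_patterns = {
    "construct_A": {"MspA1I": [1200, 3400], "BsmI": [4600]},
    "construct_B": {"MspA1I": [1200, 3400], "BsmI": [2100, 2500]},
    "construct_C": {"MspA1I": [800, 3800], "BsmI": [4600]},
}

# One digestion alone leaves an ambiguity; the two together remove it.
print(ambiguous_pairs(band_patterns, ["MspA1I"]))
print(ambiguous_pairs(band_patterns, ["MspA1I", "BsmI"]))
```

With this toy data, neither enzyme separates all three constructs on its own, but the pair does, which is the situation the two-digestion result in the README describes.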

diff --git a/bandwitch/DigestionProblem/SeparatingDigestionProblem.py b/bandwitch/DigestionProblem/SeparatingDigestionProblem.py
@@ -36,7 +36,7 @@ class SeparatingDigestionsProblem(DigestionProblem):
 
  sequences
  An (ordered) dictionary of the form {sequence_name: sequence} where the
- sequence is an ATGC string
+ sequence is an ATGC string.
 
  enzymes
  List of the names of the enzymes to consider, e.g. ``['EcoRI', 'XbaI']``.
@@ -45,7 +45,7 @@ class SeparatingDigestionsProblem(DigestionProblem):
  A Ladder object representing the ladder used for migrations.
 
  linear
- True for linear sequences, false for circular sequences
+ True for linear sequences, false for circular sequences.
 
  max_enzymes_per_digestion
  Maximal number of enzymes that can go in a single digestion.
@@ -113,8 +113,7 @@ def _parameter_element_score(self, digestion, sequences_pair):
  """See max_patterns_difference."""
  sequence1, sequence2 = sequences_pair
  digestion1, digestion2 = [
- self.sequences_digestions[s][digestion]
- for s in (sequence1, sequence2)
+ self.sequences_digestions[s][digestion] for s in (sequence1, sequence2)
  ]
  # If similar pair already computed, return the previous result.
  if digestion1["same_as"] == digestion2["same_as"]:
@@ -137,12 +136,11 @@ def _score_to_color(score, maxi=0.1):
  Parameters
  ----------
  score
- Value between 0 (perfect similarity, green) and 1 (red)
+ Value between 0 (perfect similarity, green) and 1 (red).
 
  maxi
  Value of the score above which everything appears completely red.
  Below this value the color goes progressively from red to green in 0.
-
  """
  return (
  max(0, min(1, score / maxi)),
@@ -171,8 +169,7 @@ def plot_distances_map(self, digestions, ax=None, target_file=None):
 
  axes
  The axes of the generated figure (if a target file is written to,
- the figure is closed and None is returned instead)
-
+ the figure is closed and None is returned instead).
  """
 
  if not PLOTS_AVAILABLE:
@@ -210,17 +207,15 @@ def plot_distances_map(self, digestions, ax=None, target_file=None):
  )
 
  ax.set_yticks(range(len(grid)))
- ax.set_yticklabels(
- list(self.sequences)[:-1], size=14, fontdict={"weight": "bold"}
- )
+ ax.set_yticklabels(list(self.sequences), size=14, fontdict={"weight": "bold"})
  ax.set_xticks(range(len(grid)))
  ax.xaxis.set_ticks_position("top")
  ax.spines["right"].set_visible(False)
  ax.spines["top"].set_visible(False)
  ax.spines["left"].set_visible(False)
  ax.spines["bottom"].set_visible(False)
  ax.set_xticklabels(
- [" " + s for s in list(self.sequences)[1:][::-1]],
+ [" " + s for s in list(self.sequences)[::-1]],
  rotation=90,
  size=14,
  fontdict={"weight": "bold"},
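The ``_score_to_color`` docstring in this file describes a clamped mapping from a score to a color between green and red. The sketch below shows that idea in isolation. Only the red channel, ``max(0, min(1, score / maxi))``, is visible in the diff; the green, blue and alpha values here are assumptions chosen for illustration, and ``score_to_rgba`` is a hypothetical name.

```python
def clamp01(value):
    """Restrict a number to the closed interval [0, 1]."""
    return max(0.0, min(1.0, value))


def score_to_rgba(score, maxi=0.1, alpha=0.5):
    """Map a score to an RGBA tuple: green at 0, fully red at `maxi` and above.

    The green/blue/alpha components are assumptions for this sketch.
    """
    red = clamp01(score / maxi)
    return (red, 1.0 - red, 0.0, alpha)


print(score_to_rgba(0.0))            # pure green
print(score_to_rgba(0.5, maxi=1.0))  # halfway between green and red
print(score_to_rgba(3.0))            # saturates at pure red
```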

diff --git a/bandwitch/README.md b/bandwitch/README.md
@@ -1,10 +1,10 @@
 # Code organization
 
-This document walks you trough the DNA Chisel code. Please request changes if anything is unclear.
+This document walks you through the BandWitch code. Please request changes if anything is unclear.
 
 #### Ladder/
 
-- **Ladder.py** implements the Ladder class, used throughout the library to represent band ladders.A ladder's main method is ``dna_size_to_migration``, which predicts the coordinates of a band of a given size.
+- **Ladder.py** implements the Ladder class, used throughout the library to represent band ladders. A ladder's main method is ``dna_size_to_migration``, which predicts the coordinates of a band of a given size.
 - **preset_ladders.py** provides a set of commonly used ladders obtained by parsing the spreadsheets in ``data/``.
 
 #### DigestionProblem/
@@ -13,25 +13,25 @@ This submodule implements the classes and methods for finding enzyme sets adapte
 - **SetCoverProblem** (implemented in SetCoverProblem.py) implements a class for solving generic Minimal Subset Cover problems. This class is inherited by all other classes in this module to find one or more digestions collectively covering all the given records.
 - **DigestionProblem** (implemented in DigestionProblem.py) inherits from *SetCoverProblem* and is the base class for all enzyme selection problems. It implements the initialization (where the digestion of every sequence by every enzyme mix is computed), the plotting of the sequences by selected enzymes, and the scaffold for enzyme selection. All child classes simply need to define a custom ``_parameter_element_score`` method.
 - **IdealDigestionsProblem** (implemented in IdealDigestionsProblem.py) is a subclass of DigestionProblem to select enzymes that will give "acceptable" patterns for every sequence provided. Its ``_parameter_element_score`` method computes a ``migration_score()`` depending on the number of bands, the spacing between bands, etc.
-- **SeparatingDigestionsProblem** (implemented in SeparatingDigestionsProblem.py) is a subclass of DigestionProblem to select enzymes that will give distinct patterns (not necessarily ideal) for each given sequence. It's ``_parameter_element_score`` computes the distance between the bands of 2 digestions.
+- **SeparatingDigestionsProblem** (implemented in SeparatingDigestionsProblem.py) is a subclass of DigestionProblem to select enzymes that will give distinct patterns (not necessarily ideal) for each given sequence. Its ``_parameter_element_score`` computes the distance between the bands of 2 digestions.
 
 #### ClonesObservations/
 
 This submodule implements classes for representing and validating the results of restriction digest experiments. In particular, it can import observations from the AATI fragment analyzer and compare them to expected results to create validation or identification reports.
 
 The class structure in this module is complicated because life is complicated. In a typical assembly batch, there are several different constructs assembled. For each construct we can pick several clones and submit the DNA extracted from each clone to several restriction digests.
 
-- **BandsObservations** represents the observation of a digest. It has a name (e.g. the name of the microplate well in which it was observed), bands (more precisely, a set of bands sizes), a ladder with which it was observed, and optionally a picture of the gel. It has a method to be compared with another pattern (using the *band_patterns_discrepancy()* method defined in *band_patterns_discrepancy.py*)
+- **BandsObservations** represents the observation of a digest. It has a name (e.g. the name of the microplate well in which it was observed), bands (more precisely, a set of band sizes), a ladder with which it was observed, and optionally a picture of the gel. It has a method to be compared with another pattern (using the *band_patterns_discrepancy()* method defined in *band_patterns_discrepancy.py*).
 - **Clone** represents a clone, with a name (could be a microplate well's name), the construct associated with the clone (if known) and the (possibly multiple) restriction digest observations for this clone.
 - **CloneValidation** is a structure representing the comparison between the observations of a Clone and expected patterns. It contains a Clone, the expected patterns for each digestion, and the discrepancy between observed and expected for each digestion. It is the "building block" of validation and identification reports.
 - **ClonesObservations** represents all the information necessary to process a full experimental outcome involving several clones and restriction digests. At its core, it groups several Clone instances together with the biopython records representing the expected constructs. As the highest-level class in this module, ClonesObservations has methods to be directly created from an AATI Fragment Analyzer zip file (or several zip files, as ClonesObservations can be merged together). The class also has methods to generate CloneValidations for all clones in its set, in order to validate or identify all clones and compile the results in a PDF report.
 
 #### list_common_enzymes/
 
 - **list_common_enzymes.py** provides a method for getting common enzymes (filtered for star-activity using data in *enzymes_data/*)
-- **enzymes_data/** provides the dictionnary ``enzymes_infos``, generated in the *__init__.py* by parsing the spreadsheet **enzymes_infos.csv**, which contains enzyme data, notably methylation sensitivity, compiled from rebase.neb.com using the script **update_enzymes_list.py**.
+- **enzymes_data/** provides the dictionary ``enzymes_infos``, generated in the *__init__.py* by parsing the spreadsheet **enzymes_infos.csv**, which contains enzyme data, notably methylation sensitivity, compiled from rebase.neb.com using the script **update_enzymes_list.py**.
 
 #### Files at the root
 
 - **bands_predictions.py** implements methods to predict which band sizes given records and enzymes will create, in particular, some methods efficiently compute band sizes for batches of sequences and combinations of enzymes. The methods are used extensively by the *DigestionProblem* class.
-- **tools.py** implements generic methods, notably for Genbank record manipulation.
+- **tools.py** implements generic methods, notably for Genbank record manipulation.
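The SetCoverProblem entry above refers to minimal subset cover: picking a small number of digestions so that, together, they "cover" every construct. A common way to approximate this is a greedy heuristic, sketched below. This is an illustration of the general technique under assumed inputs, not a description of how BandWitch's own solver works; the function name and the coverage data are hypothetical.

```python
def greedy_set_cover(universe, subsets):
    """Pick subset names greedily until every element of `universe` is covered.

    `subsets` maps a name (e.g. a digestion) to the set of elements it covers
    (e.g. the constructs for which that digestion gives a usable pattern).
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Take the subset that covers the most still-uncovered elements.
        best = max(subsets, key=lambda name: len(subsets[name] & uncovered))
        gained = subsets[best] & uncovered
        if not gained:
            raise ValueError("Some elements cannot be covered by any subset.")
        chosen.append(best)
        uncovered -= gained
    return chosen


# Hypothetical coverage data: which constructs each digestion handles well.
coverage = {
    "EcoRI": {"c1", "c2"},
    "BamHI": {"c2", "c3", "c4"},
    "XhoI": {"c4"},
}
print(greedy_set_cover({"c1", "c2", "c3", "c4"}, coverage))
```

The greedy choice is not guaranteed to be minimal in general, but it is simple and usually gives a small cover.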
diff --git a/bandwitch/tools.py b/bandwitch/tools.py
@@ -6,7 +6,13 @@
 from Bio.Seq import Seq
 from Bio import SeqIO
 from Bio.SeqRecord import SeqRecord
-from Bio.Alphabet import DNAAlphabet
+
+try:
+ # Biopython <1.78
+ from Bio.Alphabet import DNAAlphabet
+except ImportError:
+ # Biopython >=1.78
+ has_dna_alphabet = False
 from snapgene_reader import snapgene_file_to_seqrecord
 
 
@@ -26,9 +32,8 @@ def set_record_topology(record, topology):
  ]
  if topology not in valid_topologies:
  raise ValueError(
- "topology should be one of %s (was %s)." % (
- ", ".join(valid_topologies), topology
- )
+ "topology should be one of %s (was %s)."
+ % (", ".join(valid_topologies), topology)
  )
  annotations = record.annotations
  default_prefix = "default_to_"
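The validation step in ``set_record_topology`` can be sketched on a plain dictionary, which plays the role of ``record.annotations`` here. This is a simplified stand-in: it only accepts the two values named in the README (``'linear'`` and ``'circular'``) and leaves out the ``default_to_`` handling that the real function performs; ``set_topology`` is a hypothetical name.

```python
VALID_TOPOLOGIES = ("linear", "circular")  # simplified list, see note above


def set_topology(annotations, topology):
    """Validate `topology` and store it in an annotations dictionary."""
    if topology not in VALID_TOPOLOGIES:
        raise ValueError(
            "topology should be one of %s (was %s)."
            % (", ".join(VALID_TOPOLOGIES), topology)
        )
    annotations["topology"] = topology
    return annotations


print(set_topology({}, "circular"))
```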
@@ -116,11 +121,17 @@ def sequence_to_biopython_record(
  sequence, id="<unknown id>", name="<unknown name>", features=()
 ):
  """Return a SeqRecord of the sequence, ready to be Genbanked."""
+ if has_dna_alphabet:
+ seq = Seq(sequence, alphabet=DNAAlphabet())
+ else:
+ seq = Seq(sequence)
+
  return SeqRecord(
- Seq(sequence, alphabet=DNAAlphabet()),
+ seq=seq,
  id=id,
  name=name,
  features=list(features),
+ annotations={"molecule_type": "DNA"},
  )
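The version guard in this file follows a common pattern: try the old import, and record in a flag whether it succeeded. As shown in the diff, ``has_dna_alphabet`` is only assigned in the ``except`` branch, so on Biopython <1.78 the later ``if has_dna_alphabet:`` check would raise a ``NameError``. The sketch below assigns the flag on both paths. The upper-case flag name and the ``seq_keyword_arguments`` helper are hypothetical, and the snippet also runs when Biopython is not installed at all, since that raises ``ImportError`` too.

```python
try:
    # Biopython <1.78 still provides Bio.Alphabet.
    from Bio.Alphabet import DNAAlphabet

    HAS_DNA_ALPHABET = True
except ImportError:
    # Biopython >=1.78 removed Bio.Alphabet (also reached without Biopython).
    DNAAlphabet = None
    HAS_DNA_ALPHABET = False


def seq_keyword_arguments():
    """Return the extra keyword arguments to pass to Bio.Seq.Seq here."""
    if HAS_DNA_ALPHABET:
        return {"alphabet": DNAAlphabet()}
    return {}


print(HAS_DNA_ALPHABET, seq_keyword_arguments())
```

On newer Biopython, the molecule type travels in the record annotations instead, which is why the diff adds ``annotations={"molecule_type": "DNA"}`` to the ``SeqRecord``.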
 
 

diff --git a/bandwitch/version.py b/bandwitch/version.py
@@ -1 +1 @@
-__version__ = "0.3.1"
+__version__ = "0.3.2"
diff --git a/pypi-readme.rst b/pypi-readme.rst
@@ -1,8 +1,8 @@
 BandWitch
-===========
+=========
 
-Bandwitch is a Python library for the planning and analysis of restriction
-experiments in DNA assembly operations. Bandwitch implements method to select
+BandWitch is a Python library for the planning and analysis of restriction
+experiments in DNA assembly operations. BandWitch implements methods to select
 the best enzyme(s) to validate or identify DNA assemblies. It also provides
 report generation methods to automatically validate/identify assemblies from
 experimental data.
@@ -25,11 +25,11 @@ Infos
 
 `<https://edinburgh-genome-foundry.github.io/BandWitch/>`_
 
-**Github Page**
+**Github Page:**
 
 `<https://github.com/Edinburgh-Genome-Foundry/BandWitch>`_
 
-**Live demo**
+**Live demo:**
 
 Enzyme suggestion: `<https://cuba.genomefoundry.org/digestion-selector>`_
 
@@ -46,7 +46,7 @@ Enzymes selected by bandwitch to obtain **clear, optimal** patterns for all test
  :alt: [logo]
  :align: center
 
-Enzymes selected by bandwitch to obtain **significant differences** between the patterns of the tested constructs, so that a construct can be identified by its pattern.
+Enzymes selected by bandwitch to obtain **significant differences** between the patterns of the tested constructs, so that a construct can be identified by its pattern:
 
 .. image:: https://raw.githubusercontent.com/Edinburgh-Genome-Foundry/BandWitch/master/examples/separating_digestions.png
  :alt: [logo]