BiosecSFA/datangle

DATANGLE

DATA for genTANGLE


Overview

This repository contains data and results for the GENTANGLE (Gene Tuples ArraNGed in overLapping Elements) pipeline, including its computational core, CAMEOX (CAMEOs eXtended).

Data

The data in this repository can be used as an example to run the different entry points of the GENTANGLE Singularity container. In particular, DATANGLE provides:

  1. A data directory structure that is fully compatible with GENTANGLE.
  2. Example GENTANGLE input files, which can serve as templates for other gene pairs or parameters.
  3. Example GENTANGLE output files, which show in advance what typical results look like.
  4. All the intermediate files needed to test any step of the GENTANGLE pipeline independently with the Singularity container, using the example entanglement of infA and aroB (these two genes are entangled in the original CAMEOS paper).

Please see details and documentation on the DATANGLE wiki page.

Results

This repository contains the complete results for an end-to-end gene entanglement example: the aroB and infA genes, with Pseudomonas protegens PF5 as the bacterial host of the synthetic sequences. These results are stored in the output/aroB_pf5_uref100_infA_pf5_uref100_p1 directory and include:

  • Three different interactive plots in HTML format:
    • All the variants with series by CAMEOX run.
    • Redundant variants for multiplicity analysis.
    • Sampled variants by sampling method (Pareto, multiplicity, overdensities, and random) and ERP (Entanglement Relative Position).
  • An output Jupyter notebook that serves as the front end of the analysis.
  • Several data files in different tabular formats, such as plain CSV and compressed serialized pandas dataframes. Please visit the GENTANGLE repository and GENTANGLE wiki for further details and updated documentation on the different results files.
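To get a quick overview of what the results directory listed above contains, the files can be grouped by extension. The following is a minimal sketch using only the Python standard library; the helper name `inventory_results` is hypothetical and not part of GENTANGLE, and the exact file names and extensions in the directory are not assumed here.

```python
from collections import defaultdict
from pathlib import Path


def inventory_results(results_dir):
    """Group file names under results_dir by their last extension (lowercased).

    Hypothetical helper for illustration only; not part of GENTANGLE.
    """
    groups = defaultdict(list)
    for path in sorted(Path(results_dir).rglob("*")):
        if path.is_file():
            suffix = path.suffix.lower() or "(no extension)"
            groups[suffix].append(path.name)
    return dict(groups)


if __name__ == "__main__":
    # Directory named in this README; assumes the script runs from the repository root.
    results = Path("output/aroB_pf5_uref100_infA_pf5_uref100_p1")
    if results.is_dir():
        for suffix, names in inventory_results(results).items():
            print(f"{suffix}: {len(names)} file(s)")
    else:
        print(f"{results} not found; run from the repository root")
```

Because compressed files are grouped by their last extension only, a compressed dataframe would appear under the compression suffix rather than under its serialization format. For how to load each file type, refer to the GENTANGLE wiki.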

License

DATANGLE is released as part of the GENTANGLE pipeline (LLNL-CODE-845475) and is distributed under the terms of the GNU Affero General Public License v3.0 (see LICENSE).

SPDX-License-Identifier: AGPL-3.0-or-later

LLNL-CODE-845475

Funding

This work is supported by the U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research, Lawrence Livermore National Laboratory Secure Biosystems Design SFA “From Sequence to Cell to Population: Secure and Robust Biosystems Design for Environmental Microorganisms”. Work at LLNL is performed under the auspices of the U.S. Department of Energy at Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.


If you use GENTANGLE in your research, please cite the following papers. Thanks!

  1. Martí, JM, et al. GENTANGLE: integrated computational design of gene entanglements. bioRxiv. 2023.11.09.565696. https://doi.org/10.1101/2023.11.09.565696

  2. Blazejewski T, Ho HI, Wang HH. Synthetic sequence entanglement augments stability and containment of genetic information in cells. Science. 2019 Aug 9;365(6453):595-8. https://doi.org/10.1126/science.aav5477