WO2023097325A2 - Systems and methods for identifying genetic phenotypes using programmable nucleases - Google Patents
- Publication number
- WO2023097325A2 (application PCT/US2022/080553)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- wells
- nucleic acid
- candidate
- allele
- well
- Prior art date
Concepts
- 238000000034 method Methods 0.000 title claims abstract description 337
- 101710163270 Nuclease Proteins 0.000 title claims description 160
- 230000002068 genetic effect Effects 0.000 title description 10
- 150000007523 nucleic acids Chemical class 0.000 claims abstract description 707
- 102000039446 nucleic acids Human genes 0.000 claims abstract description 676
- 108020004707 nucleic acids Proteins 0.000 claims abstract description 676
- 108700028369 Alleles Proteins 0.000 claims abstract description 382
- 230000035772 mutation Effects 0.000 claims abstract description 267
- 108090000623 proteins and genes Proteins 0.000 claims description 209
- 239000000523 sample Substances 0.000 claims description 186
- 238000003199 nucleic acid amplification method Methods 0.000 claims description 184
- 230000003321 amplification Effects 0.000 claims description 182
- 238000001514 detection method Methods 0.000 claims description 159
- 102000004169 proteins and genes Human genes 0.000 claims description 155
- 239000012472 biological sample Substances 0.000 claims description 120
- 238000003776 cleavage reaction Methods 0.000 claims description 103
- 239000012636 effector Substances 0.000 claims description 98
- 230000007017 scission Effects 0.000 claims description 85
- 239000002773 nucleotide Substances 0.000 claims description 81
- 125000003729 nucleotide group Chemical group 0.000 claims description 80
- 241000700605 Viruses Species 0.000 claims description 75
- 238000006243 chemical reaction Methods 0.000 claims description 63
- 108020004414 DNA Proteins 0.000 claims description 61
- 101150010882 S gene Proteins 0.000 claims description 37
- 239000000203 mixture Substances 0.000 claims description 35
- 206010064571 Gene mutation Diseases 0.000 claims description 27
- 241001678559 COVID-19 virus Species 0.000 claims description 26
- 125000003275 alpha amino acid group Chemical group 0.000 claims description 23
- 244000052769 pathogen Species 0.000 claims description 23
- 230000002441 reversible effect Effects 0.000 claims description 23
- 230000001717 pathogenic effect Effects 0.000 claims description 22
- 238000003752 polymerase chain reaction Methods 0.000 claims description 22
- 239000012634 fragment Substances 0.000 claims description 21
- 239000003153 chemical reaction reagent Substances 0.000 claims description 19
- 230000008859 change Effects 0.000 claims description 17
- 241000894006 Bacteria Species 0.000 claims description 15
- 238000003860 storage Methods 0.000 claims description 15
- 239000012530 fluid Substances 0.000 claims description 13
- 238000000638 solvent extraction Methods 0.000 claims description 13
- 102000053602 DNA Human genes 0.000 claims description 12
- 238000009396 hybridization Methods 0.000 claims description 11
- 239000007850 fluorescent dye Substances 0.000 claims description 10
- 239000012139 lysis buffer Substances 0.000 claims description 10
- 244000045947 parasite Species 0.000 claims description 10
- 238000007397 LAMP assay Methods 0.000 claims description 8
- 210000004369 blood Anatomy 0.000 claims description 8
- 239000008280 blood Substances 0.000 claims description 8
- 238000011901 isothermal amplification Methods 0.000 claims description 8
- 230000001404 mediated effect Effects 0.000 claims description 8
- 229940096437 Protein S Drugs 0.000 claims description 7
- 102000018120 Recombinases Human genes 0.000 claims description 7
- 108010091086 Recombinases Proteins 0.000 claims description 7
- 101710198474 Spike protein Proteins 0.000 claims description 7
- 108010092799 RNA-directed DNA polymerase Proteins 0.000 claims description 6
- 230000002934 lysing effect Effects 0.000 claims description 6
- 238000013518 transcription Methods 0.000 claims description 6
- 230000035897 transcription Effects 0.000 claims description 6
- 241000233866 Fungi Species 0.000 claims description 5
- 108020004682 Single-Stranded DNA Proteins 0.000 claims description 5
- 210000001124 body fluid Anatomy 0.000 claims description 5
- 210000003296 saliva Anatomy 0.000 claims description 5
- 108091092878 Microsatellite Proteins 0.000 claims description 4
- 206010036790 Productive cough Diseases 0.000 claims description 4
- 101100203795 Severe acute respiratory syndrome coronavirus 2 S gene Proteins 0.000 claims description 4
- 238000007894 restriction fragment length polymorphism technique Methods 0.000 claims description 4
- 210000003802 sputum Anatomy 0.000 claims description 4
- 208000024794 sputum Diseases 0.000 claims description 4
- 102100034343 Integrase Human genes 0.000 claims description 3
- 108091034117 Oligonucleotide Proteins 0.000 claims description 3
- 101000629318 Severe acute respiratory syndrome coronavirus 2 Spike glycoprotein Proteins 0.000 claims description 3
- 238000004422 calculation algorithm Methods 0.000 claims description 3
- 238000011179 visual inspection Methods 0.000 claims description 3
- 238000010240 RT-PCR analysis Methods 0.000 claims description 2
- 238000010191 image analysis Methods 0.000 claims description 2
- 230000008707 rearrangement Effects 0.000 claims description 2
- 235000018102 proteins Nutrition 0.000 description 144
- 238000003556 assay Methods 0.000 description 83
- 208000037265 diseases, disorders, signs and symptoms Diseases 0.000 description 48
- 201000010099 disease Diseases 0.000 description 46
- 102000004190 Enzymes Human genes 0.000 description 44
- 108090000790 Enzymes Proteins 0.000 description 44
- 229940088598 enzyme Drugs 0.000 description 44
- 230000000694 effects Effects 0.000 description 43
- 238000010791 quenching Methods 0.000 description 38
- 210000004027 cell Anatomy 0.000 description 34
- 230000000171 quenching effect Effects 0.000 description 34
- 206010028980 Neoplasm Diseases 0.000 description 32
- 108091028043 Nucleic acid sequence Proteins 0.000 description 32
- 241000282414 Homo sapiens Species 0.000 description 30
- 208000037797 influenza A Diseases 0.000 description 30
- 238000012070 whole genome sequencing analysis Methods 0.000 description 29
- 235000001014 amino acid Nutrition 0.000 description 25
- 201000011510 cancer Diseases 0.000 description 25
- -1 methods Substances 0.000 description 24
- 230000003612 virological effect Effects 0.000 description 22
- 241000196324 Embryophyta Species 0.000 description 21
- 208000025721 COVID-19 Diseases 0.000 description 20
- 108020005004 Guide RNA Proteins 0.000 description 19
- 230000027455 binding Effects 0.000 description 19
- 230000000295 complement effect Effects 0.000 description 19
- 238000012217 deletion Methods 0.000 description 19
- 238000006467 substitution reaction Methods 0.000 description 19
- 108091028113 Trans-activating crRNA Proteins 0.000 description 18
- 230000037430 deletion Effects 0.000 description 18
- 230000004048 modification Effects 0.000 description 18
- 238000012986 modification Methods 0.000 description 18
- 239000000758 substrate Substances 0.000 description 18
- 238000012360 testing method Methods 0.000 description 18
- 210000001519 tissue Anatomy 0.000 description 17
- 108091079001 CRISPR RNA Proteins 0.000 description 16
- 230000003287 optical effect Effects 0.000 description 16
- 208000015181 infectious disease Diseases 0.000 description 15
- 239000000047 product Substances 0.000 description 15
- 125000006850 spacer group Chemical group 0.000 description 15
- 241000711573 Coronaviridae Species 0.000 description 13
- 201000003176 Severe Acute Respiratory Syndrome Diseases 0.000 description 13
- 238000005516 engineering process Methods 0.000 description 13
- 238000003384 imaging method Methods 0.000 description 13
- 108091093088 Amplicon Proteins 0.000 description 12
- 239000002609 medium Substances 0.000 description 12
- 230000002085 persistent effect Effects 0.000 description 12
- 239000013641 positive control Substances 0.000 description 12
- 241000725303 Human immunodeficiency virus Species 0.000 description 11
- 208000026350 Inborn Genetic disease Diseases 0.000 description 11
- HCHKCACWOHOZIP-UHFFFAOYSA-N Zinc Chemical compound [Zn] HCHKCACWOHOZIP-UHFFFAOYSA-N 0.000 description 11
- 208000016361 genetic disease Diseases 0.000 description 11
- 229920001184 polypeptide Polymers 0.000 description 11
- 102000004196 processed proteins & peptides Human genes 0.000 description 11
- 108090000765 processed proteins & peptides Proteins 0.000 description 11
- 239000011701 zinc Substances 0.000 description 11
- 229910052725 zinc Inorganic materials 0.000 description 11
- YBJHBAHKTGYVGT-ZKWXMUAHSA-N (+)-Biotin Chemical compound N1C(=O)N[C@@H]2[C@H](CCCCC(=O)O)SC[C@@H]21 YBJHBAHKTGYVGT-ZKWXMUAHSA-N 0.000 description 10
- 238000007792 addition Methods 0.000 description 10
- 150000001413 amino acids Chemical class 0.000 description 10
- 238000006073 displacement reaction Methods 0.000 description 10
- 108020004635 Complementary DNA Proteins 0.000 description 9
- 108010051210 beta-Fructofuranosidase Proteins 0.000 description 9
- 239000001573 invertase Substances 0.000 description 9
- 235000011073 invertase Nutrition 0.000 description 9
- 108020004999 messenger RNA Proteins 0.000 description 9
- 239000000126 substance Substances 0.000 description 9
- 241000701806 Human papillomavirus Species 0.000 description 8
- 241001465754 Metazoa Species 0.000 description 8
- 238000010804 cDNA synthesis Methods 0.000 description 8
- 239000003795 chemical substances by application Substances 0.000 description 8
- 239000002299 complementary DNA Substances 0.000 description 8
- 238000007405 data analysis Methods 0.000 description 8
- 238000007834 ligase chain reaction Methods 0.000 description 8
- 239000002777 nucleoside Substances 0.000 description 8
- 125000002924 primary amino group Chemical group [H]N([H])* 0.000 description 8
- 238000010839 reverse transcription Methods 0.000 description 8
- 239000007787 solid Substances 0.000 description 8
- XLYOFNOQVPJJNP-UHFFFAOYSA-N water Substances O XLYOFNOQVPJJNP-UHFFFAOYSA-N 0.000 description 8
- 238000010453 CRISPR/Cas method Methods 0.000 description 7
- 230000000052 comparative effect Effects 0.000 description 7
- 230000009089 cytolysis Effects 0.000 description 7
- 230000001419 dependent effect Effects 0.000 description 7
- 230000007613 environmental effect Effects 0.000 description 7
- 125000003835 nucleoside group Chemical group 0.000 description 7
- 102100031780 Endonuclease Human genes 0.000 description 6
- 108010043121 Green Fluorescent Proteins Proteins 0.000 description 6
- 102000004144 Green Fluorescent Proteins Human genes 0.000 description 6
- 108060004795 Methyltransferase Proteins 0.000 description 6
- 241000223960 Plasmodium falciparum Species 0.000 description 6
- 241000725643 Respiratory syncytial virus Species 0.000 description 6
- 241000223109 Trypanosoma cruzi Species 0.000 description 6
- 230000015572 biosynthetic process Effects 0.000 description 6
- 239000000975 dye Substances 0.000 description 6
- 239000005090 green fluorescent protein Substances 0.000 description 6
- 208000006454 hepatitis Diseases 0.000 description 6
- 231100000283 hepatitis Toxicity 0.000 description 6
- 230000028327 secretion Effects 0.000 description 6
- 210000002966 serum Anatomy 0.000 description 6
- RWQNBRDOKXIBIV-UHFFFAOYSA-N thymine Chemical compound CC1=CNC(=O)NC1=O RWQNBRDOKXIBIV-UHFFFAOYSA-N 0.000 description 6
- 102000040650 (ribonucleotides)n+m Human genes 0.000 description 5
- 108091032973 (ribonucleotides)n+m Proteins 0.000 description 5
- 241000004176 Alphacoronavirus Species 0.000 description 5
- 241000494545 Cordyline virus 2 Species 0.000 description 5
- 201000007336 Cryptococcosis Diseases 0.000 description 5
- 208000001490 Dengue Diseases 0.000 description 5
- 206010012310 Dengue fever Diseases 0.000 description 5
- 241000124008 Mammalia Species 0.000 description 5
- 241000244206 Nematoda Species 0.000 description 5
- 208000002606 Paramyxoviridae Infections Diseases 0.000 description 5
- 108700005078 Synthetic Genes Proteins 0.000 description 5
- 241000710886 West Nile virus Species 0.000 description 5
- 239000003570 air Substances 0.000 description 5
- 238000004458 analytical method Methods 0.000 description 5
- 230000000845 anti-microbial effect Effects 0.000 description 5
- 229960002685 biotin Drugs 0.000 description 5
- 235000020958 biotin Nutrition 0.000 description 5
- 239000011616 biotin Substances 0.000 description 5
- 125000003178 carboxy group Chemical group [H]OC(*)=O 0.000 description 5
- 210000000349 chromosome Anatomy 0.000 description 5
- 238000004891 communication Methods 0.000 description 5
- 208000025729 dengue disease Diseases 0.000 description 5
- 238000013504 emergency use authorization Methods 0.000 description 5
- 230000002255 enzymatic effect Effects 0.000 description 5
- 238000003205 genotyping method Methods 0.000 description 5
- 244000000013 helminth Species 0.000 description 5
- 208000032839 leukemia Diseases 0.000 description 5
- 201000004792 malaria Diseases 0.000 description 5
- 230000000813 microbial effect Effects 0.000 description 5
- 244000005700 microbiome Species 0.000 description 5
- 210000000056 organ Anatomy 0.000 description 5
- 239000000243 solution Substances 0.000 description 5
- 241000712461 unidentified influenza virus Species 0.000 description 5
- 241000251468 Actinopterygii Species 0.000 description 4
- 241000283690 Bos taurus Species 0.000 description 4
- 241000221204 Cryptococcus neoformans Species 0.000 description 4
- 201000011001 Ebola Hemorrhagic Fever Diseases 0.000 description 4
- 241000228404 Histoplasma capsulatum Species 0.000 description 4
- 241000187479 Mycobacterium tuberculosis Species 0.000 description 4
- 229910019142 PO4 Inorganic materials 0.000 description 4
- 108091028664 Ribonucleotide Proteins 0.000 description 4
- 241000700584 Simplexvirus Species 0.000 description 4
- 229930006000 Sucrose Natural products 0.000 description 4
- 241000223997 Toxoplasma gondii Species 0.000 description 4
- 241000223105 Trypanosoma brucei Species 0.000 description 4
- ISAKRJDGNUQOIC-UHFFFAOYSA-N Uracil Chemical compound O=C1C=CNC(=O)N1 ISAKRJDGNUQOIC-UHFFFAOYSA-N 0.000 description 4
- 208000000260 Warts Diseases 0.000 description 4
- 230000002159 abnormal effect Effects 0.000 description 4
- 230000003197 catalytic effect Effects 0.000 description 4
- 239000005547 deoxyribonucleotide Substances 0.000 description 4
- 125000002637 deoxyribonucleotide group Chemical group 0.000 description 4
- 230000005284 excitation Effects 0.000 description 4
- 235000013305 food Nutrition 0.000 description 4
- 230000006870 function Effects 0.000 description 4
- UYTPUPDQBNUYGX-UHFFFAOYSA-N guanine Chemical compound O=C1NC(N)=NC2=C1N=CN2 UYTPUPDQBNUYGX-UHFFFAOYSA-N 0.000 description 4
- 230000036039 immunity Effects 0.000 description 4
- 230000002458 infectious effect Effects 0.000 description 4
- 238000007851 intersequence-specific PCR Methods 0.000 description 4
- 238000007855 methylation-specific PCR Methods 0.000 description 4
- 210000003097 mucus Anatomy 0.000 description 4
- 239000002105 nanoparticle Substances 0.000 description 4
- 238000007857 nested PCR Methods 0.000 description 4
- 239000013610 patient sample Substances 0.000 description 4
- 239000010452 phosphate Substances 0.000 description 4
- 210000002381 plasma Anatomy 0.000 description 4
- 238000007858 polymerase cycling assembly Methods 0.000 description 4
- 102000040430 polynucleotide Human genes 0.000 description 4
- 108091033319 polynucleotide Proteins 0.000 description 4
- 239000002157 polynucleotide Substances 0.000 description 4
- 230000008569 process Effects 0.000 description 4
- 238000012545 processing Methods 0.000 description 4
- 238000003908 quality control method Methods 0.000 description 4
- 238000003753 real-time PCR Methods 0.000 description 4
- 239000002336 ribonucleotide Substances 0.000 description 4
- 125000002652 ribonucleotide group Chemical group 0.000 description 4
- 230000035945 sensitivity Effects 0.000 description 4
- 238000012163 sequencing technique Methods 0.000 description 4
- 201000010153 skin papilloma Diseases 0.000 description 4
- 239000002689 soil Substances 0.000 description 4
- 241000894007 species Species 0.000 description 4
- 239000005720 sucrose Substances 0.000 description 4
- 238000003786 synthesis reaction Methods 0.000 description 4
- 239000001226 triphosphate Substances 0.000 description 4
- 241000701161 unidentified adenovirus Species 0.000 description 4
- 238000012800 visualization Methods 0.000 description 4
- 108090001008 Avidin Proteins 0.000 description 3
- 241000008904 Betacoronavirus Species 0.000 description 3
- 241000606161 Chlamydia Species 0.000 description 3
- 241000588724 Escherichia coli Species 0.000 description 3
- 206010018612 Gonorrhoea Diseases 0.000 description 3
- 241000282412 Homo Species 0.000 description 3
- 241000701044 Human gammaherpesvirus 4 Species 0.000 description 3
- 208000004554 Leishmaniasis Diseases 0.000 description 3
- 241000270322 Lepidosauria Species 0.000 description 3
- 241000588652 Neisseria gonorrhoeae Species 0.000 description 3
- 241000243985 Onchocerca volvulus Species 0.000 description 3
- 241001631646 Papillomaviridae Species 0.000 description 3
- 241001135221 Prevotella intermedia Species 0.000 description 3
- 241000315672 SARS coronavirus Species 0.000 description 3
- 241000191967 Staphylococcus aureus Species 0.000 description 3
- 108010090804 Streptavidin Proteins 0.000 description 3
- 241000193996 Streptococcus pyogenes Species 0.000 description 3
- 241000282898 Sus scrofa Species 0.000 description 3
- 208000000389 T-cell leukemia Diseases 0.000 description 3
- 208000028530 T-cell lymphoblastic leukemia/lymphoma Diseases 0.000 description 3
- 208000002903 Thalassemia Diseases 0.000 description 3
- 201000005485 Toxoplasmosis Diseases 0.000 description 3
- 208000005448 Trichomonas Infections Diseases 0.000 description 3
- 206010044620 Trichomoniasis Diseases 0.000 description 3
- 125000000539 amino acid group Chemical group 0.000 description 3
- 239000011324 bead Substances 0.000 description 3
- 230000008901 benefit Effects 0.000 description 3
- 239000000872 buffer Substances 0.000 description 3
- 210000001175 cerebrospinal fluid Anatomy 0.000 description 3
- 238000013461 design Methods 0.000 description 3
- 230000004069 differentiation Effects 0.000 description 3
- 231100000676 disease causative agent Toxicity 0.000 description 3
- 108010048367 enhanced green fluorescent protein Proteins 0.000 description 3
- 210000003722 extracellular fluid Anatomy 0.000 description 3
- GNBHRKFJIUUOQI-UHFFFAOYSA-N fluorescein Chemical compound O1C(=O)C2=CC=CC=C2C21C1=CC=C(O)C=C1OC1=CC(O)=CC=C21 GNBHRKFJIUUOQI-UHFFFAOYSA-N 0.000 description 3
- 102000034287 fluorescent proteins Human genes 0.000 description 3
- 108091006047 fluorescent proteins Proteins 0.000 description 3
- 230000002496 gastric effect Effects 0.000 description 3
- 208000001786 gonorrhea Diseases 0.000 description 3
- 230000001965 increasing effect Effects 0.000 description 3
- 208000037798 influenza B Diseases 0.000 description 3
- 238000002955 isolation Methods 0.000 description 3
- 239000003550 marker Substances 0.000 description 3
- 238000002625 monoclonal antibody therapy Methods 0.000 description 3
- 239000013642 negative control Substances 0.000 description 3
- 230000000241 respiratory effect Effects 0.000 description 3
- 238000003757 reverse transcription PCR Methods 0.000 description 3
- 230000009870 specific binding Effects 0.000 description 3
- 238000011895 specific detection Methods 0.000 description 3
- 208000006379 syphilis Diseases 0.000 description 3
- 238000007861 thermal asymmetric interlaced PCR Methods 0.000 description 3
- 229940113082 thymine Drugs 0.000 description 3
- 210000002700 urine Anatomy 0.000 description 3
- 238000002255 vaccination Methods 0.000 description 3
- FZWGECJQACGGTI-UHFFFAOYSA-N 2-amino-7-methyl-1,7-dihydro-6H-purin-6-one Chemical compound NC1=NC(O)=C2N(C)C=NC2=N1 FZWGECJQACGGTI-UHFFFAOYSA-N 0.000 description 2
- OIVLITBTBDPEFK-UHFFFAOYSA-N 5,6-dihydrouracil Chemical compound O=C1CCNC(=O)N1 OIVLITBTBDPEFK-UHFFFAOYSA-N 0.000 description 2
- RYVNIFSIEDRLSJ-UHFFFAOYSA-N 5-(hydroxymethyl)cytosine Chemical compound NC=1NC(=O)N=CC=1CO RYVNIFSIEDRLSJ-UHFFFAOYSA-N 0.000 description 2
- LRFVTYWOQMYALW-UHFFFAOYSA-N 9H-xanthine Chemical compound O=C1NC(=O)NC2=C1NC=N2 LRFVTYWOQMYALW-UHFFFAOYSA-N 0.000 description 2
- 102100033793 ALK tyrosine kinase receptor Human genes 0.000 description 2
- 102000000872 ATM Human genes 0.000 description 2
- 241000700606 Acanthocephala Species 0.000 description 2
- 241000203022 Acholeplasma laidlawii Species 0.000 description 2
- 229930024421 Adenine Natural products 0.000 description 2
- GFFGJBXGBJISGV-UHFFFAOYSA-N Adenine Chemical compound NC1=NC=NC2=C1N=CN2 GFFGJBXGBJISGV-UHFFFAOYSA-N 0.000 description 2
- 208000000230 African Trypanosomiasis Diseases 0.000 description 2
- 102000002260 Alkaline Phosphatase Human genes 0.000 description 2
- 108020004774 Alkaline Phosphatase Proteins 0.000 description 2
- 206010001935 American trypanosomiasis Diseases 0.000 description 2
- 206010001986 Amoebic dysentery Diseases 0.000 description 2
- 108010004586 Ataxia Telangiectasia Mutated Proteins Proteins 0.000 description 2
- 241000223838 Babesia bovis Species 0.000 description 2
- 241000193738 Bacillus anthracis Species 0.000 description 2
- 102100026189 Beta-galactosidase Human genes 0.000 description 2
- 241000228405 Blastomyces dermatitidis Species 0.000 description 2
- 102100035631 Bloom syndrome protein Human genes 0.000 description 2
- 108091009167 Bloom syndrome protein Proteins 0.000 description 2
- 241000120506 Bluetongue virus Species 0.000 description 2
- 241000588832 Bordetella pertussis Species 0.000 description 2
- 241000589969 Borreliella burgdorferi Species 0.000 description 2
- 241000724256 Brome mosaic virus Species 0.000 description 2
- 241000589567 Brucella abortus Species 0.000 description 2
- 102100026094 C-type lectin domain family 12 member A Human genes 0.000 description 2
- 241000222122 Candida albicans Species 0.000 description 2
- 241001135194 Capnocytophaga canimorsus Species 0.000 description 2
- 241000701489 Cauliflower mosaic virus Species 0.000 description 2
- 206010008342 Cervix carcinoma Diseases 0.000 description 2
- 241000242722 Cestoda Species 0.000 description 2
- 208000024699 Chagas disease Diseases 0.000 description 2
- 201000006082 Chickenpox Diseases 0.000 description 2
- 201000009182 Chikungunya Diseases 0.000 description 2
- 241001647372 Chlamydia pneumoniae Species 0.000 description 2
- 241000606153 Chlamydia trachomatis Species 0.000 description 2
- 241000223205 Coccidioides immitis Species 0.000 description 2
- 208000003495 Coccidiosis Diseases 0.000 description 2
- 208000035473 Communicable disease Diseases 0.000 description 2
- 208000001528 Coronaviridae Infections Diseases 0.000 description 2
- 241000724252 Cucumber mosaic virus Species 0.000 description 2
- 201000003808 Cystic echinococcosis Diseases 0.000 description 2
- 201000003883 Cystic fibrosis Diseases 0.000 description 2
- 241000701022 Cytomegalovirus Species 0.000 description 2
- 241000725619 Dengue virus Species 0.000 description 2
- 108010008532 Deoxyribonuclease I Proteins 0.000 description 2
- 102000007260 Deoxyribonuclease I Human genes 0.000 description 2
- 241000244170 Echinococcus granulosus Species 0.000 description 2
- 241000223932 Eimeria tenella Species 0.000 description 2
- 108010042407 Endonucleases Proteins 0.000 description 2
- 102100038083 Endosialin Human genes 0.000 description 2
- 241000991587 Enterovirus C Species 0.000 description 2
- 241000283073 Equus caballus Species 0.000 description 2
- 241000714165 Feline leukemia virus Species 0.000 description 2
- 241000282324 Felis Species 0.000 description 2
- 241000224466 Giardia Species 0.000 description 2
- 108010015776 Glucose oxidase Proteins 0.000 description 2
- 239000004366 Glucose oxidase Substances 0.000 description 2
- 102100032530 Glypican-3 Human genes 0.000 description 2
- 241000606768 Haemophilus influenzae Species 0.000 description 2
- 208000031220 Hemophilia Diseases 0.000 description 2
- 208000009292 Hemophilia A Diseases 0.000 description 2
- 241000711549 Hepacivirus C Species 0.000 description 2
- 241000700721 Hepatitis B virus Species 0.000 description 2
- 208000005176 Hepatitis C Diseases 0.000 description 2
- 101001023784 Heteractis crispa GFP-like non-fluorescent chromoprotein Proteins 0.000 description 2
- 208000017604 Hodgkin disease Diseases 0.000 description 2
- 208000010747 Hodgkins lymphoma Diseases 0.000 description 2
- 101000779641 Homo sapiens ALK tyrosine kinase receptor Proteins 0.000 description 2
- 101000929928 Homo sapiens Angiotensin-converting enzyme 2 Proteins 0.000 description 2
- 101000785776 Homo sapiens Artemin Proteins 0.000 description 2
- 101000884275 Homo sapiens Endosialin Proteins 0.000 description 2
- 101001014668 Homo sapiens Glypican-3 Proteins 0.000 description 2
- 101000981336 Homo sapiens Nibrin Proteins 0.000 description 2
- 108010001336 Horseradish Peroxidase Proteins 0.000 description 2
- 241000714260 Human T-lymphotropic virus 1 Species 0.000 description 2
- 241000701074 Human alphaherpesvirus 2 Species 0.000 description 2
- 241000701085 Human alphaherpesvirus 3 Species 0.000 description 2
- 241000711467 Human coronavirus 229E Species 0.000 description 2
- 241000482741 Human coronavirus NL63 Species 0.000 description 2
- 241001428935 Human coronavirus OC43 Species 0.000 description 2
- 241000342334 Human metapneumovirus Species 0.000 description 2
- 206010061598 Immunodeficiency Diseases 0.000 description 2
- 208000029462 Immunodeficiency disease Diseases 0.000 description 2
- 206010023076 Isosporiasis Diseases 0.000 description 2
- PWKSKIMOESPYIA-BYPYZUCNSA-N L-N-acetyl-Cysteine Chemical compound CC(=O)N[C@@H](CS)C(O)=O PWKSKIMOESPYIA-BYPYZUCNSA-N 0.000 description 2
- 241000904817 Lachnospiraceae bacterium Species 0.000 description 2
- 241000589242 Legionella pneumophila Species 0.000 description 2
- 241000222736 Leishmania tropica Species 0.000 description 2
- 206010058467 Lung neoplasm malignant Diseases 0.000 description 2
- 241000712899 Lymphocytic choriomeningitis mammarenavirus Species 0.000 description 2
- 241000829100 Macaca mulatta polyomavirus 1 Species 0.000 description 2
- 201000005505 Measles Diseases 0.000 description 2
- 241000712079 Measles morbillivirus Species 0.000 description 2
- 206010027260 Meningitis viral Diseases 0.000 description 2
- 241000002163 Mesapamea fractilinea Species 0.000 description 2
- 241000520674 Mesocestoides corti Species 0.000 description 2
- 108700011259 MicroRNAs Proteins 0.000 description 2
- 241000127282 Middle East respiratory syndrome-related coronavirus Species 0.000 description 2
- 241000713333 Mouse mammary tumor virus Species 0.000 description 2
- 241000711386 Mumps virus Species 0.000 description 2
- 241000714177 Murine leukemia virus Species 0.000 description 2
- 241000711408 Murine respirovirus Species 0.000 description 2
- 241000186362 Mycobacterium leprae Species 0.000 description 2
- 241000204028 Mycoplasma arginini Species 0.000 description 2
- 241000202956 Mycoplasma arthritidis Species 0.000 description 2
- 241000204051 Mycoplasma genitalium Species 0.000 description 2
- 241000202938 Mycoplasma hyorhinis Species 0.000 description 2
- 241000202894 Mycoplasma orale Species 0.000 description 2
- 241000202934 Mycoplasma pneumoniae Species 0.000 description 2
- 241000202889 Mycoplasma salivarium Species 0.000 description 2
- 241000588650 Neisseria meningitidis Species 0.000 description 2
- 102100024403 Nibrin Human genes 0.000 description 2
- 241001263478 Norovirus Species 0.000 description 2
- 206010072369 Pharyngeal exudate Diseases 0.000 description 2
- 240000009188 Phyllostachys vivax Species 0.000 description 2
- 241000223810 Plasmodium vivax Species 0.000 description 2
- 241000723784 Plum pox virus Species 0.000 description 2
- 208000000474 Poliomyelitis Diseases 0.000 description 2
- 241000709992 Potato virus X Species 0.000 description 2
- 241000723762 Potato virus Y Species 0.000 description 2
- 241000611831 Prevotella sp. Species 0.000 description 2
- 208000010362 Protozoan Infections Diseases 0.000 description 2
- 241000589517 Pseudomonas aeruginosa Species 0.000 description 2
- 238000012341 Quantitative reverse-transcriptase PCR Methods 0.000 description 2
- 206010037742 Rabies Diseases 0.000 description 2
- 241000711798 Rabies lyssavirus Species 0.000 description 2
- 241000702263 Reovirus sp. Species 0.000 description 2
- 241000710799 Rubella virus Species 0.000 description 2
- 241000242678 Schistosoma Species 0.000 description 2
- 241000242677 Schistosoma japonicum Species 0.000 description 2
- 241000242680 Schistosoma mansoni Species 0.000 description 2
- 241000710960 Sindbis virus Species 0.000 description 2
- 102220642430 Spindlin-1_P681R_mutation Human genes 0.000 description 2
- 239000004138 Stearyl citrate Substances 0.000 description 2
- 241000193985 Streptococcus agalactiae Species 0.000 description 2
- CZMRCDWAGMRECN-UGDNZRGBSA-N Sucrose Chemical compound O[C@H]1[C@H](O)[C@@H](CO)O[C@@]1(CO)O[C@@H]1[C@H](O)[C@@H](O)[C@H](O)[C@@H](CO)O1 CZMRCDWAGMRECN-UGDNZRGBSA-N 0.000 description 2
- 238000010459 TALEN Methods 0.000 description 2
- PZBFGYYEXUXCOF-UHFFFAOYSA-N TCEP Chemical compound OC(=O)CCP(CCC(O)=O)CCC(O)=O PZBFGYYEXUXCOF-UHFFFAOYSA-N 0.000 description 2
- 241001672171 Taenia hydatigena Species 0.000 description 2
- 241000244154 Taenia ovis Species 0.000 description 2
- 241000244159 Taenia saginata Species 0.000 description 2
- 241000223779 Theileria parva Species 0.000 description 2
- 241000723873 Tobacco mosaic virus Species 0.000 description 2
- 241000016010 Tomato spotted wilt orthotospovirus Species 0.000 description 2
- 241000223996 Toxoplasma Species 0.000 description 2
- 108010043645 Transcription Activator-Like Effector Nucleases Proteins 0.000 description 2
- 241000242541 Trematoda Species 0.000 description 2
- 241000589884 Treponema pallidum Species 0.000 description 2
- 241000243777 Trichinella spiralis Species 0.000 description 2
- 241000224526 Trichomonas Species 0.000 description 2
- 241000224527 Trichomonas vaginalis Species 0.000 description 2
- 241001442397 Trypanosoma brucei rhodesiense Species 0.000 description 2
- 241000223097 Trypanosoma rangeli Species 0.000 description 2
- 208000006105 Uterine Cervical Neoplasms Diseases 0.000 description 2
- 206010046980 Varicella Diseases 0.000 description 2
- 241000251539 Vertebrata <Metazoa> Species 0.000 description 2
- 241000711975 Vesicular stomatitis virus Species 0.000 description 2
- 108700020467 WT1 Proteins 0.000 description 2
- 241000710772 Yellow fever virus Species 0.000 description 2
- 108010017070 Zinc Finger Nucleases Proteins 0.000 description 2
- 238000002835 absorbance Methods 0.000 description 2
- 229960004308 acetylcysteine Drugs 0.000 description 2
- 239000002253 acid Substances 0.000 description 2
- 230000001154 acute effect Effects 0.000 description 2
- 229960000643 adenine Drugs 0.000 description 2
- 238000007844 allele-specific PCR Methods 0.000 description 2
- 210000004102 animal cell Anatomy 0.000 description 2
- 230000000507 anthelmentic effect Effects 0.000 description 2
- 230000000692 anti-sense effect Effects 0.000 description 2
- 230000000840 anti-viral effect Effects 0.000 description 2
- 238000007846 asymmetric PCR Methods 0.000 description 2
- 201000008680 babesiosis Diseases 0.000 description 2
- 230000001580 bacterial effect Effects 0.000 description 2
- 239000013060 biological fluid Substances 0.000 description 2
- 229940056450 brucella abortus Drugs 0.000 description 2
- 229940095731 candida albicans Drugs 0.000 description 2
- 230000010261 cell growth Effects 0.000 description 2
- 239000013592 cell lysate Substances 0.000 description 2
- 230000019522 cellular metabolic process Effects 0.000 description 2
- 201000010881 cervical cancer Diseases 0.000 description 2
- 210000003756 cervix mucus Anatomy 0.000 description 2
- 229940038705 chlamydia trachomatis Drugs 0.000 description 2
- 238000007796 conventional method Methods 0.000 description 2
- OPTASPLRGRRNAP-UHFFFAOYSA-N cytosine Chemical compound NC=1C=CNC(=O)N=1 OPTASPLRGRRNAP-UHFFFAOYSA-N 0.000 description 2
- SUYVUBYJARFZHO-RRKCRQDMSA-N dATP Chemical compound C1=NC=2C(N)=NC=NC=2N1[C@H]1C[C@H](O)[C@@H](COP(O)(=O)OP(O)(=O)OP(O)(O)=O)O1 SUYVUBYJARFZHO-RRKCRQDMSA-N 0.000 description 2
- SUYVUBYJARFZHO-UHFFFAOYSA-N dATP Natural products C1=NC=2C(N)=NC=NC=2N1C1CC(O)C(COP(O)(=O)OP(O)(=O)OP(O)(O)=O)O1 SUYVUBYJARFZHO-UHFFFAOYSA-N 0.000 description 2
- RGWHQCVHVJXOKC-SHYZEUOFSA-J dCTP(4-) Chemical compound O=C1N=C(N)C=CN1[C@@H]1O[C@H](COP([O-])(=O)OP([O-])(=O)OP([O-])([O-])=O)[C@@H](O)C1 RGWHQCVHVJXOKC-SHYZEUOFSA-J 0.000 description 2
- HAAZLUGHYHWQIW-KVQBGUIXSA-N dGTP Chemical compound C1=NC=2C(=O)NC(N)=NC=2N1[C@H]1C[C@H](O)[C@@H](COP(O)(=O)OP(O)(=O)OP(O)(O)=O)O1 HAAZLUGHYHWQIW-KVQBGUIXSA-N 0.000 description 2
- NHVNXKFIZYSCEB-XLPZGREQSA-N dTTP Chemical compound O=C1NC(=O)C(C)=CN1[C@@H]1O[C@H](COP(O)(=O)OP(O)(=O)OP(O)(O)=O)[C@@H](O)C1 NHVNXKFIZYSCEB-XLPZGREQSA-N 0.000 description 2
- 230000009615 deamination Effects 0.000 description 2
- 238000006481 deamination reaction Methods 0.000 description 2
- 230000003412 degenerative effect Effects 0.000 description 2
- 238000004925 denaturation Methods 0.000 description 2
- 230000036425 denaturation Effects 0.000 description 2
- 239000005549 deoxyribonucleoside Substances 0.000 description 2
- 230000000249 disinfective effect Effects 0.000 description 2
- 238000010586 diagram Methods 0.000 description 2
- 238000007847 digital PCR Methods 0.000 description 2
- 239000000539 dimer Substances 0.000 description 2
- 208000035475 disorder Diseases 0.000 description 2
- 238000009826 distribution Methods 0.000 description 2
- VHJLVAABSRFDPM-QWWZWVQMSA-N dithiothreitol Chemical compound SC[C@@H](O)[C@H](O)CS VHJLVAABSRFDPM-QWWZWVQMSA-N 0.000 description 2
- 229940079593 drug Drugs 0.000 description 2
- 239000003814 drug Substances 0.000 description 2
- 208000001848 dysentery Diseases 0.000 description 2
- 102000052116 epidermal growth factor receptor activity proteins Human genes 0.000 description 2
- 108700015053 epidermal growth factor receptor activity proteins Proteins 0.000 description 2
- 210000003527 eukaryotic cell Anatomy 0.000 description 2
- 238000011156 evaluation Methods 0.000 description 2
- 238000002474 experimental method Methods 0.000 description 2
- 210000000416 exudates and transudate Anatomy 0.000 description 2
- 238000002866 fluorescence resonance energy transfer Methods 0.000 description 2
- 230000002538 fungal effect Effects 0.000 description 2
- 244000053095 fungal pathogen Species 0.000 description 2
- 210000004392 genitalia Anatomy 0.000 description 2
- 238000013412 genome amplification Methods 0.000 description 2
- 229940116332 glucose oxidase Drugs 0.000 description 2
- 235000019420 glucose oxidase Nutrition 0.000 description 2
- 208000014829 head and neck neoplasm Diseases 0.000 description 2
- 208000005252 hepatitis A Diseases 0.000 description 2
- 208000002672 hepatitis B Diseases 0.000 description 2
- 102000048657 human ACE2 Human genes 0.000 description 2
- 208000029080 human African trypanosomiasis Diseases 0.000 description 2
- 210000005260 human cell Anatomy 0.000 description 2
- 239000001257 hydrogen Substances 0.000 description 2
- 229910052739 hydrogen Inorganic materials 0.000 description 2
- FDGQSTZJBFJUBT-UHFFFAOYSA-N hypoxanthine Chemical compound O=C1NC=NC2=C1NC=N2 FDGQSTZJBFJUBT-UHFFFAOYSA-N 0.000 description 2
- 230000007813 immunodeficiency Effects 0.000 description 2
- 238000010348 incorporation Methods 0.000 description 2
- 201000006747 infectious mononucleosis Diseases 0.000 description 2
- 206010022000 influenza Diseases 0.000 description 2
- 238000003780 insertion Methods 0.000 description 2
- 230000037431 insertion Effects 0.000 description 2
- 239000000543 intermediate Substances 0.000 description 2
- 238000007852 inverse PCR Methods 0.000 description 2
- 229940115932 legionella pneumophila Drugs 0.000 description 2
- 208000028454 lice infestation Diseases 0.000 description 2
- 210000004072 lung Anatomy 0.000 description 2
- 201000005202 lung cancer Diseases 0.000 description 2
- 208000020816 lung neoplasm Diseases 0.000 description 2
- 238000007403 mPCR Methods 0.000 description 2
- 210000004962 mammalian cell Anatomy 0.000 description 2
- 238000004519 manufacturing process Methods 0.000 description 2
- 201000001441 melanoma Diseases 0.000 description 2
- 239000002679 microRNA Substances 0.000 description 2
- 201000006938 muscular dystrophy Diseases 0.000 description 2
- YOHYSYJDKVYCJI-UHFFFAOYSA-N n-[3-[[6-[3-(trifluoromethyl)anilino]pyrimidin-4-yl]amino]phenyl]cyclopropanecarboxamide Chemical compound FC(F)(F)C1=CC=CC(NC=2N=CN=C(NC=3C=C(NC(=O)C4CC4)C=CC=3)C=2)=C1 YOHYSYJDKVYCJI-UHFFFAOYSA-N 0.000 description 2
- 230000003472 neutralizing effect Effects 0.000 description 2
- 230000002018 overexpression Effects 0.000 description 2
- 230000003071 parasitic effect Effects 0.000 description 2
- 125000002467 phosphate group Chemical group [H]OP(=O)(O[H])O[*] 0.000 description 2
- 108060006184 phycobiliprotein Proteins 0.000 description 2
- 229920000642 polymer Polymers 0.000 description 2
- 244000079416 protozoan pathogen Species 0.000 description 2
- 230000005180 public health Effects 0.000 description 2
- 239000002096 quantum dot Substances 0.000 description 2
- XKMLYUALXHKNFT-UHFFFAOYSA-N rGTP Natural products C1=2NC(N)=NC(=O)C=2N=CN1C1OC(COP(O)(=O)OP(O)(=O)OP(O)(O)=O)C(O)C1O XKMLYUALXHKNFT-UHFFFAOYSA-N 0.000 description 2
- 102000005962 receptors Human genes 0.000 description 2
- 108020003175 receptors Proteins 0.000 description 2
- 239000002342 ribonucleoside Substances 0.000 description 2
- 238000005096 rolling process Methods 0.000 description 2
- 201000005404 rubella Diseases 0.000 description 2
- 210000000582 semen Anatomy 0.000 description 2
- 208000002491 severe combined immunodeficiency Diseases 0.000 description 2
- 208000007056 sickle cell anemia Diseases 0.000 description 2
- 201000002612 sleeping sickness Diseases 0.000 description 2
- 230000001954 sterilising effect Effects 0.000 description 2
- 238000004659 sterilization and disinfection Methods 0.000 description 2
- 125000000185 sucrose group Chemical group 0.000 description 2
- 238000002198 surface plasmon resonance spectroscopy Methods 0.000 description 2
- 230000008685 targeting Effects 0.000 description 2
- 210000001138 tear Anatomy 0.000 description 2
- 238000007862 touchdown PCR Methods 0.000 description 2
- 229940096911 trichinella spiralis Drugs 0.000 description 2
- 235000011178 triphosphate Nutrition 0.000 description 2
- 239000000107 tumor biomarker Substances 0.000 description 2
- 241001529453 unidentified herpesvirus Species 0.000 description 2
- 229940035893 uracil Drugs 0.000 description 2
- 229960005486 vaccine Drugs 0.000 description 2
- 108700026220 vif Genes Proteins 0.000 description 2
- 201000010044 viral meningitis Diseases 0.000 description 2
- 229940051021 yellow-fever virus Drugs 0.000 description 2
- XOYCLJDJUKHHHS-LHBOOPKSSA-N (2s,3s,4s,5r,6r)-6-[[(2s,3s,5r)-3-amino-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy]-3,4,5-trihydroxyoxane-2-carboxylic acid Chemical compound O=C1NC(=O)C(C)=CN1[C@@H]1O[C@H](CO[C@H]2[C@@H]([C@@H](O)[C@H](O)[C@H](O2)C(O)=O)O)[C@@H](N)C1 XOYCLJDJUKHHHS-LHBOOPKSSA-N 0.000 description 1
- 102100028734 1,4-alpha-glucan-branching enzyme Human genes 0.000 description 1
- WKBPZYKAUNRMKP-UHFFFAOYSA-N 1-[2-(2,4-dichlorophenyl)pentyl]1,2,4-triazole Chemical compound C=1C=C(Cl)C=C(Cl)C=1C(CCC)CN1C=NC=N1 WKBPZYKAUNRMKP-UHFFFAOYSA-N 0.000 description 1
- UHDGCWIWMRVCDJ-UHFFFAOYSA-N 1-beta-D-Xylofuranosyl-NH-Cytosine Natural products O=C1N=C(N)C=CN1C1C(O)C(O)C(CO)O1 UHDGCWIWMRVCDJ-UHFFFAOYSA-N 0.000 description 1
- 108020004463 18S ribosomal RNA Proteins 0.000 description 1
- YMHOBZXQZVXHBM-UHFFFAOYSA-N 2,5-dimethoxy-4-bromophenethylamine Chemical compound COC1=CC(CCN)=C(OC)C=C1Br YMHOBZXQZVXHBM-UHFFFAOYSA-N 0.000 description 1
- JKMHFZQWWAIEOD-UHFFFAOYSA-N 2-[4-(2-hydroxyethyl)piperazin-1-yl]ethanesulfonic acid Chemical compound OCC[NH+]1CCN(CCS([O-])(=O)=O)CC1 JKMHFZQWWAIEOD-UHFFFAOYSA-N 0.000 description 1
- BGFTWECWAICPDG-UHFFFAOYSA-N 2-[bis(4-chlorophenyl)methyl]-4-n-[3-[bis(4-chlorophenyl)methyl]-4-(dimethylamino)phenyl]-1-n,1-n-dimethylbenzene-1,4-diamine Chemical compound C1=C(C(C=2C=CC(Cl)=CC=2)C=2C=CC(Cl)=CC=2)C(N(C)C)=CC=C1NC(C=1)=CC=C(N(C)C)C=1C(C=1C=CC(Cl)=CC=1)C1=CC=C(Cl)C=C1 BGFTWECWAICPDG-UHFFFAOYSA-N 0.000 description 1
- QKNYBSVHEMOAJP-UHFFFAOYSA-N 2-amino-2-(hydroxymethyl)propane-1,3-diol;hydron;chloride Chemical compound Cl.OCC(N)(CO)CO QKNYBSVHEMOAJP-UHFFFAOYSA-N 0.000 description 1
- CVOFKRWYWCSDMA-UHFFFAOYSA-N 2-chloro-n-(2,6-diethylphenyl)-n-(methoxymethyl)acetamide;2,6-dinitro-n,n-dipropyl-4-(trifluoromethyl)aniline Chemical compound CCC1=CC=CC(CC)=C1N(COC)C(=O)CCl.CCCN(CCC)C1=C([N+]([O-])=O)C=C(C(F)(F)F)C=C1[N+]([O-])=O CVOFKRWYWCSDMA-UHFFFAOYSA-N 0.000 description 1
- ASJSAQIRZKANQN-CRCLSJGQSA-N 2-deoxy-D-ribose Chemical compound OC[C@@H](O)[C@@H](O)CC=O ASJSAQIRZKANQN-CRCLSJGQSA-N 0.000 description 1
- 102100035352 2-oxoisovalerate dehydrogenase subunit alpha, mitochondrial Human genes 0.000 description 1
- 102100035315 2-oxoisovalerate dehydrogenase subunit beta, mitochondrial Human genes 0.000 description 1
- 108010067083 3 beta-hydroxysteroid dehydrogenase type II Proteins 0.000 description 1
- LKKMLIBUAXYLOY-UHFFFAOYSA-N 3-Amino-1-methyl-5H-pyrido[4,3-b]indole Chemical compound N1C2=CC=CC=C2C2=C1C=C(N)N=C2C LKKMLIBUAXYLOY-UHFFFAOYSA-N 0.000 description 1
- 108010071258 4-hydroxy-2-oxoglutarate aldolase Proteins 0.000 description 1
- 102100027715 4-hydroxy-2-oxoglutarate aldolase, mitochondrial Human genes 0.000 description 1
- MXCVHSXCXPHOLP-UHFFFAOYSA-N 4-oxo-6-propylchromene-2-carboxylic acid Chemical compound O1C(C(O)=O)=CC(=O)C2=CC(CCC)=CC=C21 MXCVHSXCXPHOLP-UHFFFAOYSA-N 0.000 description 1
- 102100039791 43 kDa receptor-associated protein of the synapse Human genes 0.000 description 1
- ZAYHVCMSTBRABG-UHFFFAOYSA-N 5-Methylcytidine Natural products O=C1N=C(N)C(C)=CN1C1C(O)C(O)C(CO)O1 ZAYHVCMSTBRABG-UHFFFAOYSA-N 0.000 description 1
- BLQMCTXZEMGOJM-UHFFFAOYSA-N 5-carboxycytosine Chemical compound NC=1NC(=O)N=CC=1C(O)=O BLQMCTXZEMGOJM-UHFFFAOYSA-N 0.000 description 1
- FHSISDGOVSHJRW-UHFFFAOYSA-N 5-formylcytosine Chemical compound NC1=NC(=O)NC=C1C=O FHSISDGOVSHJRW-UHFFFAOYSA-N 0.000 description 1
- ZAYHVCMSTBRABG-JXOAFFINSA-N 5-methylcytidine Chemical compound O=C1N=C(N)C(C)=CN1[C@H]1[C@H](O)[C@H](O)[C@@H](CO)O1 ZAYHVCMSTBRABG-JXOAFFINSA-N 0.000 description 1
- LRSASMSXMSNRBT-UHFFFAOYSA-N 5-methylcytosine Chemical compound CC1=CNC(=O)N=C1N LRSASMSXMSNRBT-UHFFFAOYSA-N 0.000 description 1
- 102100031126 6-phosphogluconolactonase Human genes 0.000 description 1
- 108010029731 6-phosphogluconolactonase Proteins 0.000 description 1
- 102100036512 7-dehydrocholesterol reductase Human genes 0.000 description 1
- OGHAROSJZRTIOK-KQYNXXCUSA-O 7-methylguanosine Chemical compound C1=2N=C(N)NC(=O)C=2[N+](C)=CN1[C@@H]1O[C@H](CO)[C@@H](O)[C@H]1O OGHAROSJZRTIOK-KQYNXXCUSA-O 0.000 description 1
- 102100027399 A disintegrin and metalloproteinase with thrombospondin motifs 2 Human genes 0.000 description 1
- 102100040079 A-kinase anchor protein 4 Human genes 0.000 description 1
- 101710109924 A-kinase anchor protein 4 Proteins 0.000 description 1
- 101710112753 A-type inclusion protein A25 homolog Proteins 0.000 description 1
- 108091005662 ADAMTS2 Proteins 0.000 description 1
- 102100031585 ADP-ribosyl cyclase/cyclic ADP-ribose hydrolase 1 Human genes 0.000 description 1
- 102000017918 ADRB3 Human genes 0.000 description 1
- 108060003355 ADRB3 Proteins 0.000 description 1
- 102100024645 ATP-binding cassette sub-family C member 8 Human genes 0.000 description 1
- 102100024643 ATP-binding cassette sub-family D member 1 Human genes 0.000 description 1
- 102100032922 ATP-dependent 6-phosphofructokinase, muscle type Human genes 0.000 description 1
- 102100027452 ATP-dependent DNA helicase Q4 Human genes 0.000 description 1
- 101150020330 ATRX gene Proteins 0.000 description 1
- 206010063409 Acarodermatitis Diseases 0.000 description 1
- 102100040963 Acetylcholine receptor subunit epsilon Human genes 0.000 description 1
- 108010055851 Acetylglucosaminidase Proteins 0.000 description 1
- 241000588626 Acinetobacter baumannii Species 0.000 description 1
- 102100035886 Adenine DNA glycosylase Human genes 0.000 description 1
- 102100034540 Adenomatous polyposis coli protein Human genes 0.000 description 1
- 102100036664 Adenosine deaminase Human genes 0.000 description 1
- 102100020775 Adenylosuccinate lyase Human genes 0.000 description 1
- 108700040193 Adenylosuccinate lyases Proteins 0.000 description 1
- 102100026402 Adhesion G protein-coupled receptor E2 Human genes 0.000 description 1
- 102100026423 Adhesion G protein-coupled receptor E5 Human genes 0.000 description 1
- 102100031934 Adhesion G-protein coupled receptor G1 Human genes 0.000 description 1
- 102100026608 Aldehyde dehydrogenase family 3 member A2 Human genes 0.000 description 1
- 241000099223 Alistipes sp. Species 0.000 description 1
- 102100025683 Alkaline phosphatase, tissue-nonspecific isozyme Human genes 0.000 description 1
- 102100034112 Alkyldihydroxyacetonephosphate synthase, peroxisomal Human genes 0.000 description 1
- 102100035028 Alpha-L-iduronidase Human genes 0.000 description 1
- 102100034561 Alpha-N-acetylglucosaminidase Human genes 0.000 description 1
- 102100026277 Alpha-galactosidase A Human genes 0.000 description 1
- 102100030685 Alpha-sarcoglycan Human genes 0.000 description 1
- 101710085003 Alpha-tubulin N-acetyltransferase Proteins 0.000 description 1
- 101710085461 Alpha-tubulin N-acetyltransferase 1 Proteins 0.000 description 1
- 102100032360 Alstrom syndrome protein 1 Human genes 0.000 description 1
- 101710191958 Amino-acid acetyltransferase Proteins 0.000 description 1
- 101710185938 Amino-acid acetyltransferase, mitochondrial Proteins 0.000 description 1
- 102100039338 Aminomethyltransferase, mitochondrial Human genes 0.000 description 1
- 102100040894 Amylo-alpha-1,6-glucosidase Human genes 0.000 description 1
- 206010061424 Anal cancer Diseases 0.000 description 1
- 102100032187 Androgen receptor Human genes 0.000 description 1
- 102100023003 Ankyrin repeat domain-containing protein 30A Human genes 0.000 description 1
- 244000303258 Annona diversifolia Species 0.000 description 1
- 235000002198 Annona diversifolia Nutrition 0.000 description 1
- 102100021253 Antileukoproteinase Human genes 0.000 description 1
- 108010036221 Aquaporin 2 Proteins 0.000 description 1
- 102000011899 Aquaporin 2 Human genes 0.000 description 1
- 101000686547 Arabidopsis thaliana 30S ribosomal protein S1, chloroplastic Proteins 0.000 description 1
- 102100024003 Arf-GAP with SH3 domain, ANK repeat and PH domain-containing protein 1 Human genes 0.000 description 1
- 102100031378 Arginine-hydroxylase NDUFAF5, mitochondrial Human genes 0.000 description 1
- 108700040066 Argininosuccinate lyases Proteins 0.000 description 1
- 102100020999 Argininosuccinate synthase Human genes 0.000 description 1
- 102100029361 Aromatase Human genes 0.000 description 1
- 102100022146 Arylsulfatase A Human genes 0.000 description 1
- 102100031491 Arylsulfatase B Human genes 0.000 description 1
- 101001120734 Ascaris suum Pyruvate dehydrogenase E1 component subunit alpha type I, mitochondrial Proteins 0.000 description 1
- 101150025804 Asl gene Proteins 0.000 description 1
- 102100023927 Asparagine synthetase [glutamine-hydrolyzing] Human genes 0.000 description 1
- 102100032948 Aspartoacylase Human genes 0.000 description 1
- 201000002909 Aspergillosis Diseases 0.000 description 1
- 208000036641 Aspergillus infections Diseases 0.000 description 1
- 101000690509 Aspergillus oryzae (strain ATCC 42149 / RIB 40) Alpha-glucosidase Proteins 0.000 description 1
- 102100036465 Autoimmune regulator Human genes 0.000 description 1
- 241000271566 Aves Species 0.000 description 1
- 102100035683 Axin-2 Human genes 0.000 description 1
- 108010008014 B-Cell Maturation Antigen Proteins 0.000 description 1
- 102000006942 B-Cell Maturation Antigen Human genes 0.000 description 1
- 208000010839 B-cell chronic lymphocytic leukemia Diseases 0.000 description 1
- 102100025218 B-cell differentiation antigen CD72 Human genes 0.000 description 1
- 102100038080 B-cell receptor CD22 Human genes 0.000 description 1
- 102100024222 B-lymphocyte antigen CD19 Human genes 0.000 description 1
- 102100022005 B-lymphocyte antigen CD20 Human genes 0.000 description 1
- 101700002522 BARD1 Proteins 0.000 description 1
- 108700020463 BRCA1 Proteins 0.000 description 1
- 102000036365 BRCA1 Human genes 0.000 description 1
- 101150072950 BRCA1 gene Proteins 0.000 description 1
- 102100028048 BRCA1-associated RING domain protein 1 Human genes 0.000 description 1
- 108700020462 BRCA2 Proteins 0.000 description 1
- 102000052609 BRCA2 Human genes 0.000 description 1
- 208000004926 Bacterial Vaginosis Diseases 0.000 description 1
- 206010004022 Bacterial food poisoning Diseases 0.000 description 1
- 102100027522 Baculoviral IAP repeat-containing protein 7 Human genes 0.000 description 1
- 102100021295 Bardet-Biedl syndrome 1 protein Human genes 0.000 description 1
- 102100021296 Bardet-Biedl syndrome 10 protein Human genes 0.000 description 1
- 102100021297 Bardet-Biedl syndrome 12 protein Human genes 0.000 description 1
- 102100027883 Bardet-Biedl syndrome 2 protein Human genes 0.000 description 1
- 102100025359 Barttin Human genes 0.000 description 1
- 102100022440 Battenin Human genes 0.000 description 1
- 241000190863 Bergeyella zoohelcum Species 0.000 description 1
- 102100022548 Beta-hexosaminidase subunit alpha Human genes 0.000 description 1
- 102100022549 Beta-hexosaminidase subunit beta Human genes 0.000 description 1
- 102100030686 Beta-sarcoglycan Human genes 0.000 description 1
- 102100028282 Bile salt export pump Human genes 0.000 description 1
- 102100033743 Biotin-[acetyl-CoA-carboxylase] ligase Human genes 0.000 description 1
- 206010005003 Bladder cancer Diseases 0.000 description 1
- 208000019838 Blood disease Diseases 0.000 description 1
- 206010005949 Bone cancer Diseases 0.000 description 1
- 102100037086 Bone marrow stromal antigen 2 Human genes 0.000 description 1
- 102100025423 Bone morphogenetic protein receptor type-1A Human genes 0.000 description 1
- 208000018084 Bone neoplasm Diseases 0.000 description 1
- 241000588779 Bordetella bronchiseptica Species 0.000 description 1
- 241001495147 Bordetella holmesii Species 0.000 description 1
- 241000588780 Bordetella parapertussis Species 0.000 description 1
- 208000003508 Botulism Diseases 0.000 description 1
- 208000003174 Brain Neoplasms Diseases 0.000 description 1
- 101150008921 Brca2 gene Proteins 0.000 description 1
- 206010006187 Breast cancer Diseases 0.000 description 1
- 208000026310 Breast neoplasm Diseases 0.000 description 1
- 241000589513 Burkholderia cepacia Species 0.000 description 1
- 208000011691 Burkitt lymphomas Diseases 0.000 description 1
- 101710188619 C-type lectin domain family 12 member A Proteins 0.000 description 1
- 108700012439 CA9 Proteins 0.000 description 1
- 102100034808 CCAAT/enhancer-binding protein alpha Human genes 0.000 description 1
- 102100038078 CD276 antigen Human genes 0.000 description 1
- 108010058905 CD44v6 antigen Proteins 0.000 description 1
- 102100029390 CMRF35-like molecule 1 Human genes 0.000 description 1
- 108091033409 CRISPR Proteins 0.000 description 1
- 108010040467 CRISPR-Associated Proteins Proteins 0.000 description 1
- 102100022509 Cadherin-23 Human genes 0.000 description 1
- 108010050543 Calcium-Sensing Receptors Proteins 0.000 description 1
- 102100034279 Calcium-binding mitochondrial carrier protein Aralar2 Human genes 0.000 description 1
- 102100032539 Calpain-3 Human genes 0.000 description 1
- 102100033620 Calponin-1 Human genes 0.000 description 1
- 241000282836 Camelus dromedarius Species 0.000 description 1
- 102100025570 Cancer/testis antigen 1 Human genes 0.000 description 1
- 206010007134 Candida infections Diseases 0.000 description 1
- 241000283707 Capra Species 0.000 description 1
- 102100024423 Carbonic anhydrase 9 Human genes 0.000 description 1
- 102100027943 Carnitine O-palmitoyltransferase 1, liver isoform Human genes 0.000 description 1
- 102100024853 Carnitine O-palmitoyltransferase 2, mitochondrial Human genes 0.000 description 1
- 102100028003 Catenin alpha-1 Human genes 0.000 description 1
- 102100024940 Cathepsin K Human genes 0.000 description 1
- 108091007854 Cdh1/Fizzy-related Proteins 0.000 description 1
- 206010007882 Cellulitis Diseases 0.000 description 1
- 102100035673 Centrosomal protein of 290 kDa Human genes 0.000 description 1
- 101710198317 Centrosomal protein of 290 kDa Proteins 0.000 description 1
- 102100036165 Ceramide kinase-like protein Human genes 0.000 description 1
- 241000282693 Cercopithecidae Species 0.000 description 1
- 102100034505 Ceroid-lipofuscinosis neuronal protein 5 Human genes 0.000 description 1
- 102100034480 Ceroid-lipofuscinosis neuronal protein 6 Human genes 0.000 description 1
- 108091005944 Cerulean Proteins 0.000 description 1
- 208000026368 Cestode infections Diseases 0.000 description 1
- 241000283153 Cetacea Species 0.000 description 1
- 208000004293 Chikungunya Fever Diseases 0.000 description 1
- 206010067256 Chikungunya virus infection Diseases 0.000 description 1
- 241001647378 Chlamydia psittaci Species 0.000 description 1
- 241000579895 Chlorostilbon Species 0.000 description 1
- 206010008631 Cholera Diseases 0.000 description 1
- 241000251730 Chondrichthyes Species 0.000 description 1
- 101710178046 Chorismate synthase 1 Proteins 0.000 description 1
- 108091005960 Citrine Proteins 0.000 description 1
- 102100031060 Clarin-1 Human genes 0.000 description 1
- 102100038449 Claudin-6 Human genes 0.000 description 1
- 241000193163 Clostridioides difficile Species 0.000 description 1
- 102100023470 Cobalamin trafficking protein CblD Human genes 0.000 description 1
- 108091026890 Coding region Proteins 0.000 description 1
- 108020004705 Codon Proteins 0.000 description 1
- 102100024335 Collagen alpha-1(VII) chain Human genes 0.000 description 1
- 102100031544 Collagen alpha-1(XXVII) chain Human genes 0.000 description 1
- 102100033780 Collagen alpha-3(IV) chain Human genes 0.000 description 1
- 102100033779 Collagen alpha-4(IV) chain Human genes 0.000 description 1
- 102100033775 Collagen alpha-5(IV) chain Human genes 0.000 description 1
- 206010009944 Colon cancer Diseases 0.000 description 1
- 102100021645 Complex I assembly factor ACAD9, mitochondrial Human genes 0.000 description 1
- 108010022637 Copper-Transporting ATPases Proteins 0.000 description 1
- 102100027587 Copper-transporting ATPase 1 Human genes 0.000 description 1
- 102100027591 Copper-transporting ATPase 2 Human genes 0.000 description 1
- 108010043471 Core Binding Factor Alpha 2 Subunit Proteins 0.000 description 1
- 102100023376 Corrinoid adenosyltransferase Human genes 0.000 description 1
- 241000186227 Corynebacterium diphtheriae Species 0.000 description 1
- 241001481833 Coryphaena hippurus Species 0.000 description 1
- 241000606678 Coxiella burnetii Species 0.000 description 1
- 208000000307 Crimean Hemorrhagic Fever Diseases 0.000 description 1
- 201000003075 Crimean-Congo hemorrhagic fever Diseases 0.000 description 1
- 108091005943 CyPet Proteins 0.000 description 1
- 102100023381 Cyanocobalamin reductase / alkylcobalamin dealkylase Human genes 0.000 description 1
- 101710164985 Cyanocobalamin reductase / alkylcobalamin dealkylase Proteins 0.000 description 1
- 102100029140 Cyclic nucleotide-gated cation channel beta-3 Human genes 0.000 description 1
- 102000002427 Cyclin B Human genes 0.000 description 1
- 108010068150 Cyclin B Proteins 0.000 description 1
- 108010025464 Cyclin-Dependent Kinase 4 Proteins 0.000 description 1
- 108010009392 Cyclin-Dependent Kinase Inhibitor p16 Proteins 0.000 description 1
- 102000000577 Cyclin-Dependent Kinase Inhibitor p27 Human genes 0.000 description 1
- 108010016777 Cyclin-Dependent Kinase Inhibitor p27 Proteins 0.000 description 1
- 102000004480 Cyclin-Dependent Kinase Inhibitor p57 Human genes 0.000 description 1
- 108010017222 Cyclin-Dependent Kinase Inhibitor p57 Proteins 0.000 description 1
- 102100036252 Cyclin-dependent kinase 4 Human genes 0.000 description 1
- 101710152695 Cysteine synthase 1 Proteins 0.000 description 1
- 102100031089 Cystinosin Human genes 0.000 description 1
- UHDGCWIWMRVCDJ-PSQAKQOGSA-N Cytidine Natural products O=C1N=C(N)C=CN1[C@@H]1[C@@H](O)[C@@H](O)[C@H](CO)O1 UHDGCWIWMRVCDJ-PSQAKQOGSA-N 0.000 description 1
- 108010009911 Cytochrome P-450 CYP11B2 Proteins 0.000 description 1
- 102100024332 Cytochrome P450 11B1, mitochondrial Human genes 0.000 description 1
- 102100024329 Cytochrome P450 11B2, mitochondrial Human genes 0.000 description 1
- 102100027417 Cytochrome P450 1B1 Human genes 0.000 description 1
- 102100025621 Cytochrome b-245 heavy chain Human genes 0.000 description 1
- 102100025620 Cytochrome b-245 light chain Human genes 0.000 description 1
- 102100026234 Cytokine receptor common subunit gamma Human genes 0.000 description 1
- 102100037579 D-3-phosphoglycerate dehydrogenase Human genes 0.000 description 1
- HMFHBZSHGGEWLO-SOOFDHNKSA-N D-ribofuranose Chemical compound OC[C@H]1OC(O)[C@H](O)[C@@H]1O HMFHBZSHGGEWLO-SOOFDHNKSA-N 0.000 description 1
- 102100038017 DIS3-like exonuclease 2 Human genes 0.000 description 1
- 230000005778 DNA damage Effects 0.000 description 1
- 231100000277 DNA damage Toxicity 0.000 description 1
- 102100031867 DNA excision repair protein ERCC-6 Human genes 0.000 description 1
- 102100031868 DNA excision repair protein ERCC-8 Human genes 0.000 description 1
- 102100034157 DNA mismatch repair protein Msh2 Human genes 0.000 description 1
- 102100037700 DNA mismatch repair protein Msh3 Human genes 0.000 description 1
- 102100021147 DNA mismatch repair protein Msh6 Human genes 0.000 description 1
- 230000033616 DNA repair Effects 0.000 description 1
- 102100039116 DNA repair protein RAD50 Human genes 0.000 description 1
- 102100034484 DNA repair protein RAD51 homolog 3 Human genes 0.000 description 1
- 102100034483 DNA repair protein RAD51 homolog 4 Human genes 0.000 description 1
- 102100022474 DNA repair protein complementing XP-A cells Human genes 0.000 description 1
- 102100022477 DNA repair protein complementing XP-C cells Human genes 0.000 description 1
- 101100481408 Danio rerio tie2 gene Proteins 0.000 description 1
- 102100036511 Dehydrodolichyl diphosphate synthase complex subunit DHDDS Human genes 0.000 description 1
- 241001461743 Deltacoronavirus Species 0.000 description 1
- 102100034289 Deoxynucleoside triphosphate triphosphohydrolase SAMHD1 Human genes 0.000 description 1
- 108010046331 Deoxyribodipyrimidine photo-lyase Proteins 0.000 description 1
- 102100023319 Dihydrolipoyl dehydrogenase, mitochondrial Human genes 0.000 description 1
- 102100032086 Dolichyl pyrophosphate Man9GlcNAc2 alpha-1,3-glucosyltransferase Human genes 0.000 description 1
- 206010059866 Drug resistance Diseases 0.000 description 1
- 102100031648 Dynein axonemal heavy chain 5 Human genes 0.000 description 1
- 102100033595 Dynein axonemal intermediate chain 1 Human genes 0.000 description 1
- 102100033596 Dynein axonemal intermediate chain 2 Human genes 0.000 description 1
- 102100032248 Dysferlin Human genes 0.000 description 1
- 102100024108 Dystrophin Human genes 0.000 description 1
- 102000012804 EPCAM Human genes 0.000 description 1
- 101150084967 EPCAM gene Proteins 0.000 description 1
- 101150029707 ERBB2 gene Proteins 0.000 description 1
- 102100037354 Ectodysplasin-A Human genes 0.000 description 1
- 102100030695 Electron transfer flavoprotein subunit alpha, mitochondrial Human genes 0.000 description 1
- 102100031804 Electron transfer flavoprotein-ubiquinone oxidoreductase, mitochondrial Human genes 0.000 description 1
- 101001003194 Eleusine coracana Alpha-amylase/trypsin inhibitor Proteins 0.000 description 1
- 102100037074 Ellis-van Creveld syndrome protein Human genes 0.000 description 1
- 102100037642 Elongation factor G, mitochondrial Human genes 0.000 description 1
- 102100021309 Elongation factor Ts, mitochondrial Human genes 0.000 description 1
- 102100039246 Elongator complex protein 1 Human genes 0.000 description 1
- 206010014612 Encephalitis viral Diseases 0.000 description 1
- 102100021710 Endonuclease III-like protein 1 Human genes 0.000 description 1
- 102100023387 Endoribonuclease Dicer Human genes 0.000 description 1
- 102100031785 Endothelial transcription factor GATA-2 Human genes 0.000 description 1
- 241000588697 Enterobacter cloacae Species 0.000 description 1
- 241000194032 Enterococcus faecalis Species 0.000 description 1
- 241000194031 Enterococcus faecium Species 0.000 description 1
- 241001026002 Enterococcus italicus Species 0.000 description 1
- 108010055196 EphA2 Receptor Proteins 0.000 description 1
- 102100030340 Ephrin type-A receptor 2 Human genes 0.000 description 1
- 108010044090 Ephrin-B2 Proteins 0.000 description 1
- 102100023721 Ephrin-B2 Human genes 0.000 description 1
- 208000007985 Erythema Infectiosum Diseases 0.000 description 1
- 208000000461 Esophageal Neoplasms Diseases 0.000 description 1
- 241000206602 Eukaryota Species 0.000 description 1
- 201000005866 Exanthema Subitum Diseases 0.000 description 1
- 102100035650 Extracellular calcium-sensing receptor Human genes 0.000 description 1
- 108010067741 Fanconi Anemia Complementation Group N protein Proteins 0.000 description 1
- 102100034553 Fanconi anemia group J protein Human genes 0.000 description 1
- 102100031507 Fc receptor-like protein 5 Human genes 0.000 description 1
- 101150032879 Fcrl5 gene Proteins 0.000 description 1
- 241000282326 Felis catus Species 0.000 description 1
- 102100032596 Fibrocystin Human genes 0.000 description 1
- 108090000331 Firefly luciferases Proteins 0.000 description 1
- 102100040859 Fizzy-related protein homolog Human genes 0.000 description 1
- 102000010451 Folate receptor alpha Human genes 0.000 description 1
- 108050001931 Folate receptor alpha Proteins 0.000 description 1
- 102000010449 Folate receptor beta Human genes 0.000 description 1
- 108050001930 Folate receptor beta Proteins 0.000 description 1
- 102100027909 Folliculin Human genes 0.000 description 1
- 102100028875 Formylglycine-generating enzyme Human genes 0.000 description 1
- 108090000123 Fos-related antigen 1 Proteins 0.000 description 1
- 102000003817 Fos-related antigen 1 Human genes 0.000 description 1
- 102100022272 Fructose-bisphosphate aldolase B Human genes 0.000 description 1
- 102100036939 G-protein coupled receptor 20 Human genes 0.000 description 1
- 102100021197 G-protein coupled receptor family C group 5 member D Human genes 0.000 description 1
- 102100029974 GTPase HRas Human genes 0.000 description 1
- 102100028496 Galactocerebrosidase Human genes 0.000 description 1
- 102100037777 Galactokinase Human genes 0.000 description 1
- 102100036291 Galactose-1-phosphate uridylyltransferase Human genes 0.000 description 1
- 102000044445 Galectin-8 Human genes 0.000 description 1
- 102100021792 Gamma-sarcoglycan Human genes 0.000 description 1
- 241000008920 Gammacoronavirus Species 0.000 description 1
- 102100037260 Gap junction beta-1 protein Human genes 0.000 description 1
- 102100037156 Gap junction beta-2 protein Human genes 0.000 description 1
- 101710088083 Glomulin Proteins 0.000 description 1
- WQZGKKKJIJFFOK-GASJEMHNSA-N Glucose Natural products OC[C@H]1OC(O)[C@H](O)[C@@H](O)[C@@H]1O WQZGKKKJIJFFOK-GASJEMHNSA-N 0.000 description 1
- 102100036264 Glucose-6-phosphatase catalytic subunit 1 Human genes 0.000 description 1
- 102100039684 Glucose-6-phosphate exchanger SLC37A4 Human genes 0.000 description 1
- 108010018962 Glucosephosphate Dehydrogenase Proteins 0.000 description 1
- 102100041003 Glutamate carboxypeptidase 2 Human genes 0.000 description 1
- 102100028603 Glutaryl-CoA dehydrogenase, mitochondrial Human genes 0.000 description 1
- 102100033495 Glycine dehydrogenase (decarboxylating), mitochondrial Human genes 0.000 description 1
- 102100029492 Glycogen phosphorylase, muscle form Human genes 0.000 description 1
- 102100030648 Glyoxylate reductase/hydroxypyruvate reductase Human genes 0.000 description 1
- 241000282575 Gorilla Species 0.000 description 1
- 102100038367 Gremlin-1 Human genes 0.000 description 1
- 102100040579 Guanidinoacetate N-methyltransferase Human genes 0.000 description 1
- 102100034445 HCLS1-associated protein X-1 Human genes 0.000 description 1
- 239000007995 HEPES buffer Substances 0.000 description 1
- 102100031561 Hamartin Human genes 0.000 description 1
- 102100037931 Harmonin Human genes 0.000 description 1
- 241001364929 Havel River virus Species 0.000 description 1
- 102100027685 Hemoglobin subunit alpha Human genes 0.000 description 1
- 102100021519 Hemoglobin subunit beta Human genes 0.000 description 1
- 102100039991 Heparan-alpha-glucosaminide N-acetyltransferase Human genes 0.000 description 1
- 102000007343 Hepatitis A Virus Cellular Receptor 1 Human genes 0.000 description 1
- 108010007712 Hepatitis A Virus Cellular Receptor 1 Proteins 0.000 description 1
- 208000005331 Hepatitis D Diseases 0.000 description 1
- 206010019799 Hepatitis viral Diseases 0.000 description 1
- 241000613556 Herbinix hemicellulosilytica Species 0.000 description 1
- 102100028902 Hermansky-Pudlak syndrome 1 protein Human genes 0.000 description 1
- 102100028716 Hermansky-Pudlak syndrome 3 protein Human genes 0.000 description 1
- 102100028721 Hermansky-Pudlak syndrome 5 protein Human genes 0.000 description 1
- 208000007514 Herpes zoster Diseases 0.000 description 1
- 108091027305 Heteroduplex Proteins 0.000 description 1
- 102100035108 High affinity nerve growth factor receptor Human genes 0.000 description 1
- 201000002563 Histoplasmosis Diseases 0.000 description 1
- 102100021088 Homeobox protein Hox-B13 Human genes 0.000 description 1
- 102100031159 Homeobox protein prophet of Pit-1 Human genes 0.000 description 1
- 101001058479 Homo sapiens 1,4-alpha-glucan-branching enzyme Proteins 0.000 description 1
- 101000597665 Homo sapiens 2-oxoisovalerate dehydrogenase subunit alpha, mitochondrial Proteins 0.000 description 1
- 101000597680 Homo sapiens 2-oxoisovalerate dehydrogenase subunit beta, mitochondrial Proteins 0.000 description 1
- 101000744504 Homo sapiens 43 kDa receptor-associated protein of the synapse Proteins 0.000 description 1
- 101000928720 Homo sapiens 7-dehydrocholesterol reductase Proteins 0.000 description 1
- 101000777636 Homo sapiens ADP-ribosyl cyclase/cyclic ADP-ribose hydrolase 1 Proteins 0.000 description 1
- 101000760570 Homo sapiens ATP-binding cassette sub-family C member 8 Proteins 0.000 description 1
- 101000730838 Homo sapiens ATP-dependent 6-phosphofructokinase, muscle type Proteins 0.000 description 1
- 101000580577 Homo sapiens ATP-dependent DNA helicase Q4 Proteins 0.000 description 1
- 101000614701 Homo sapiens ATP-sensitive inward rectifier potassium channel 11 Proteins 0.000 description 1
- 101000965233 Homo sapiens Acetylcholine receptor subunit epsilon Proteins 0.000 description 1
- 101001000351 Homo sapiens Adenine DNA glycosylase Proteins 0.000 description 1
- 101000924577 Homo sapiens Adenomatous polyposis coli protein Proteins 0.000 description 1
- 101000929495 Homo sapiens Adenosine deaminase Proteins 0.000 description 1
- 101000718211 Homo sapiens Adhesion G protein-coupled receptor E2 Proteins 0.000 description 1
- 101000718243 Homo sapiens Adhesion G protein-coupled receptor E5 Proteins 0.000 description 1
- 101000775042 Homo sapiens Adhesion G-protein coupled receptor G1 Proteins 0.000 description 1
- 101000717967 Homo sapiens Aldehyde dehydrogenase family 3 member A2 Proteins 0.000 description 1
- 101000574445 Homo sapiens Alkaline phosphatase, tissue-nonspecific isozyme Proteins 0.000 description 1
- 101000799143 Homo sapiens Alkyldihydroxyacetonephosphate synthase, peroxisomal Proteins 0.000 description 1
- 101001019502 Homo sapiens Alpha-L-iduronidase Proteins 0.000 description 1
- 101000718525 Homo sapiens Alpha-galactosidase A Proteins 0.000 description 1
- 101000703500 Homo sapiens Alpha-sarcoglycan Proteins 0.000 description 1
- 101000797795 Homo sapiens Alstrom syndrome protein 1 Proteins 0.000 description 1
- 101000887804 Homo sapiens Aminomethyltransferase, mitochondrial Proteins 0.000 description 1
- 101000893559 Homo sapiens Amylo-alpha-1,6-glucosidase Proteins 0.000 description 1
- 101000757191 Homo sapiens Ankyrin repeat domain-containing protein 30A Proteins 0.000 description 1
- 101000615334 Homo sapiens Antileukoproteinase Proteins 0.000 description 1
- 101000752037 Homo sapiens Arginase-1 Proteins 0.000 description 1
- 101000588484 Homo sapiens Arginine-hydroxylase NDUFAF5, mitochondrial Proteins 0.000 description 1
- 101000784014 Homo sapiens Argininosuccinate synthase Proteins 0.000 description 1
- 101000919395 Homo sapiens Aromatase Proteins 0.000 description 1
- 101000901140 Homo sapiens Arylsulfatase A Proteins 0.000 description 1
- 101000923070 Homo sapiens Arylsulfatase B Proteins 0.000 description 1
- 101000975992 Homo sapiens Asparagine synthetase [glutamine-hydrolyzing] Proteins 0.000 description 1
- 101000797251 Homo sapiens Aspartoacylase Proteins 0.000 description 1
- 101000928549 Homo sapiens Autoimmune regulator Proteins 0.000 description 1
- 101000874569 Homo sapiens Axin-2 Proteins 0.000 description 1
- 101000934359 Homo sapiens B-cell differentiation antigen CD72 Proteins 0.000 description 1
- 101000884305 Homo sapiens B-cell receptor CD22 Proteins 0.000 description 1
- 101000980825 Homo sapiens B-lymphocyte antigen CD19 Proteins 0.000 description 1
- 101000897405 Homo sapiens B-lymphocyte antigen CD20 Proteins 0.000 description 1
- 101000936083 Homo sapiens Baculoviral IAP repeat-containing protein 7 Proteins 0.000 description 1
- 101000894722 Homo sapiens Bardet-Biedl syndrome 1 protein Proteins 0.000 description 1
- 101000894732 Homo sapiens Bardet-Biedl syndrome 10 protein Proteins 0.000 description 1
- 101000894739 Homo sapiens Bardet-Biedl syndrome 12 protein Proteins 0.000 description 1
- 101000697700 Homo sapiens Bardet-Biedl syndrome 2 protein Proteins 0.000 description 1
- 101000934823 Homo sapiens Barttin Proteins 0.000 description 1
- 101000901683 Homo sapiens Battenin Proteins 0.000 description 1
- 101000765010 Homo sapiens Beta-galactosidase Proteins 0.000 description 1
- 101001045440 Homo sapiens Beta-hexosaminidase subunit alpha Proteins 0.000 description 1
- 101001045433 Homo sapiens Beta-hexosaminidase subunit beta Proteins 0.000 description 1
- 101000703495 Homo sapiens Beta-sarcoglycan Proteins 0.000 description 1
- 101000871771 Homo sapiens Biotin-[acetyl-CoA-carboxylase] ligase Proteins 0.000 description 1
- 101000740785 Homo sapiens Bone marrow stromal antigen 2 Proteins 0.000 description 1
- 101000934638 Homo sapiens Bone morphogenetic protein receptor type-1A Proteins 0.000 description 1
- 101000912622 Homo sapiens C-type lectin domain family 12 member A Proteins 0.000 description 1
- 101000945515 Homo sapiens CCAAT/enhancer-binding protein alpha Proteins 0.000 description 1
- 101000884279 Homo sapiens CD276 antigen Proteins 0.000 description 1
- 101100382122 Homo sapiens CIITA gene Proteins 0.000 description 1
- 101000990055 Homo sapiens CMRF35-like molecule 1 Proteins 0.000 description 1
- 101000899442 Homo sapiens Cadherin-23 Proteins 0.000 description 1
- 101000867715 Homo sapiens Calpain-3 Proteins 0.000 description 1
- 101000945318 Homo sapiens Calponin-1 Proteins 0.000 description 1
- 101000856237 Homo sapiens Cancer/testis antigen 1 Proteins 0.000 description 1
- 101000855412 Homo sapiens Carbamoyl-phosphate synthase [ammonia], mitochondrial Proteins 0.000 description 1
- 101000914324 Homo sapiens Carcinoembryonic antigen-related cell adhesion molecule 5 Proteins 0.000 description 1
- 101000914321 Homo sapiens Carcinoembryonic antigen-related cell adhesion molecule 7 Proteins 0.000 description 1
- 101000859570 Homo sapiens Carnitine O-palmitoyltransferase 1, liver isoform Proteins 0.000 description 1
- 101000909313 Homo sapiens Carnitine O-palmitoyltransferase 2, mitochondrial Proteins 0.000 description 1
- 101000859063 Homo sapiens Catenin alpha-1 Proteins 0.000 description 1
- 101000761509 Homo sapiens Cathepsin K Proteins 0.000 description 1
- 101000715707 Homo sapiens Ceramide kinase-like protein Proteins 0.000 description 1
- 101000710208 Homo sapiens Ceroid-lipofuscinosis neuronal protein 5 Proteins 0.000 description 1
- 101000710215 Homo sapiens Ceroid-lipofuscinosis neuronal protein 6 Proteins 0.000 description 1
- 101000851684 Homo sapiens Chimeric ERCC6-PGBD3 protein Proteins 0.000 description 1
- 101000992973 Homo sapiens Clarin-1 Proteins 0.000 description 1
- 101000882898 Homo sapiens Claudin-6 Proteins 0.000 description 1
- 101000977167 Homo sapiens Cobalamin trafficking protein CblD Proteins 0.000 description 1
- 101000909498 Homo sapiens Collagen alpha-1(VII) chain Proteins 0.000 description 1
- 101000940372 Homo sapiens Collagen alpha-1(XXVII) chain Proteins 0.000 description 1
- 101000710873 Homo sapiens Collagen alpha-3(IV) chain Proteins 0.000 description 1
- 101000710870 Homo sapiens Collagen alpha-4(IV) chain Proteins 0.000 description 1
- 101000710886 Homo sapiens Collagen alpha-5(IV) chain Proteins 0.000 description 1
- 101000677550 Homo sapiens Complex I assembly factor ACAD9, mitochondrial Proteins 0.000 description 1
- 101000936280 Homo sapiens Copper-transporting ATPase 2 Proteins 0.000 description 1
- 101001114650 Homo sapiens Corrinoid adenosyltransferase Proteins 0.000 description 1
- 101000771083 Homo sapiens Cyclic nucleotide-gated cation channel beta-3 Proteins 0.000 description 1
- 101000922034 Homo sapiens Cystinosin Proteins 0.000 description 1
- 101000725164 Homo sapiens Cytochrome P450 1B1 Proteins 0.000 description 1
- 101000856723 Homo sapiens Cytochrome b-245 light chain Proteins 0.000 description 1
- 101001055227 Homo sapiens Cytokine receptor common subunit gamma Proteins 0.000 description 1
- 101000739890 Homo sapiens D-3-phosphoglycerate dehydrogenase Proteins 0.000 description 1
- 101000951062 Homo sapiens DIS3-like exonuclease 2 Proteins 0.000 description 1
- 101000920783 Homo sapiens DNA excision repair protein ERCC-6 Proteins 0.000 description 1
- 101000920778 Homo sapiens DNA excision repair protein ERCC-8 Proteins 0.000 description 1
- 101001134036 Homo sapiens DNA mismatch repair protein Msh2 Proteins 0.000 description 1
- 101001027762 Homo sapiens DNA mismatch repair protein Msh3 Proteins 0.000 description 1
- 101000968658 Homo sapiens DNA mismatch repair protein Msh6 Proteins 0.000 description 1
- 101000743929 Homo sapiens DNA repair protein RAD50 Proteins 0.000 description 1
- 101001132271 Homo sapiens DNA repair protein RAD51 homolog 3 Proteins 0.000 description 1
- 101001132266 Homo sapiens DNA repair protein RAD51 homolog 4 Proteins 0.000 description 1
- 101000618531 Homo sapiens DNA repair protein complementing XP-A cells Proteins 0.000 description 1
- 101000618535 Homo sapiens DNA repair protein complementing XP-C cells Proteins 0.000 description 1
- 101000928713 Homo sapiens Dehydrodolichyl diphosphate synthase complex subunit DHDDS Proteins 0.000 description 1
- 101000776319 Homo sapiens Dolichyl pyrophosphate Man9GlcNAc2 alpha-1,3-glucosyltransferase Proteins 0.000 description 1
- 101000954709 Homo sapiens Doublecortin domain-containing protein 2 Proteins 0.000 description 1
- 101000866368 Homo sapiens Dynein axonemal heavy chain 5 Proteins 0.000 description 1
- 101000872267 Homo sapiens Dynein axonemal intermediate chain 1 Proteins 0.000 description 1
- 101000872272 Homo sapiens Dynein axonemal intermediate chain 2 Proteins 0.000 description 1
- 101001016184 Homo sapiens Dysferlin Proteins 0.000 description 1
- 101001053946 Homo sapiens Dystrophin Proteins 0.000 description 1
- 101001095815 Homo sapiens E3 ubiquitin-protein ligase RING2 Proteins 0.000 description 1
- 101000880080 Homo sapiens Ectodysplasin-A Proteins 0.000 description 1
- 101001010541 Homo sapiens Electron transfer flavoprotein subunit alpha, mitochondrial Proteins 0.000 description 1
- 101000920874 Homo sapiens Electron transfer flavoprotein-ubiquinone oxidoreductase, mitochondrial Proteins 0.000 description 1
- 101000881890 Homo sapiens Ellis-van Creveld syndrome protein Proteins 0.000 description 1
- 101000880344 Homo sapiens Elongation factor G, mitochondrial Proteins 0.000 description 1
- 101000895350 Homo sapiens Elongation factor Ts, mitochondrial Proteins 0.000 description 1
- 101000813117 Homo sapiens Elongator complex protein 1 Proteins 0.000 description 1
- 101000970385 Homo sapiens Endonuclease III-like protein 1 Proteins 0.000 description 1
- 101000907904 Homo sapiens Endoribonuclease Dicer Proteins 0.000 description 1
- 101001066265 Homo sapiens Endothelial transcription factor GATA-2 Proteins 0.000 description 1
- 101000848171 Homo sapiens Fanconi anemia group J protein Proteins 0.000 description 1
- 101000730595 Homo sapiens Fibrocystin Proteins 0.000 description 1
- 101001060703 Homo sapiens Folliculin Proteins 0.000 description 1
- 101000648611 Homo sapiens Formylglycine-generating enzyme Proteins 0.000 description 1
- 101000755933 Homo sapiens Fructose-bisphosphate aldolase B Proteins 0.000 description 1
- 101001071355 Homo sapiens G-protein coupled receptor 20 Proteins 0.000 description 1
- 101001040713 Homo sapiens G-protein coupled receptor family C group 5 member D Proteins 0.000 description 1
- 101000584633 Homo sapiens GTPase HRas Proteins 0.000 description 1
- 101100437500 Homo sapiens GUSB gene Proteins 0.000 description 1
- 101000860395 Homo sapiens Galactocerebrosidase Proteins 0.000 description 1
- 101001024874 Homo sapiens Galactokinase Proteins 0.000 description 1
- 101001021379 Homo sapiens Galactose-1-phosphate uridylyltransferase Proteins 0.000 description 1
- 101000608769 Homo sapiens Galectin-8 Proteins 0.000 description 1
- 101000616435 Homo sapiens Gamma-sarcoglycan Proteins 0.000 description 1
- 101000954104 Homo sapiens Gap junction beta-1 protein Proteins 0.000 description 1
- 101000954092 Homo sapiens Gap junction beta-2 protein Proteins 0.000 description 1
- 101000930910 Homo sapiens Glucose-6-phosphatase catalytic subunit 1 Proteins 0.000 description 1
- 101000892862 Homo sapiens Glutamate carboxypeptidase 2 Proteins 0.000 description 1
- 101001058943 Homo sapiens Glutaryl-CoA dehydrogenase, mitochondrial Proteins 0.000 description 1
- 101001066129 Homo sapiens Glyceraldehyde-3-phosphate dehydrogenase Proteins 0.000 description 1
- 101000998096 Homo sapiens Glycine dehydrogenase (decarboxylating), mitochondrial Proteins 0.000 description 1
- 101000700475 Homo sapiens Glycogen phosphorylase, muscle form Proteins 0.000 description 1
- 101001010442 Homo sapiens Glyoxylate reductase/hydroxypyruvate reductase Proteins 0.000 description 1
- 101001032872 Homo sapiens Gremlin-1 Proteins 0.000 description 1
- 101000893897 Homo sapiens Guanidinoacetate N-methyltransferase Proteins 0.000 description 1
- 101001068173 Homo sapiens HCLS1-associated protein X-1 Proteins 0.000 description 1
- 101100231743 Homo sapiens HPRT1 gene Proteins 0.000 description 1
- 101000795643 Homo sapiens Hamartin Proteins 0.000 description 1
- 101000805947 Homo sapiens Harmonin Proteins 0.000 description 1
- 101001009007 Homo sapiens Hemoglobin subunit alpha Proteins 0.000 description 1
- 101000899111 Homo sapiens Hemoglobin subunit beta Proteins 0.000 description 1
- 101001035092 Homo sapiens Heparan-alpha-glucosaminide N-acetyltransferase Proteins 0.000 description 1
- 101000838926 Homo sapiens Hermansky-Pudlak syndrome 1 protein Proteins 0.000 description 1
- 101000985492 Homo sapiens Hermansky-Pudlak syndrome 3 protein Proteins 0.000 description 1
- 101000985516 Homo sapiens Hermansky-Pudlak syndrome 5 protein Proteins 0.000 description 1
- 101000596894 Homo sapiens High affinity nerve growth factor receptor Proteins 0.000 description 1
- 101001041145 Homo sapiens Homeobox protein Hox-B13 Proteins 0.000 description 1
- 101000706471 Homo sapiens Homeobox protein prophet of Pit-1 Proteins 0.000 description 1
- 101000962530 Homo sapiens Hyaluronidase-1 Proteins 0.000 description 1
- 101001041100 Homo sapiens Hydrolethalus syndrome protein 1 Proteins 0.000 description 1
- 101001047912 Homo sapiens Hydroxymethylglutaryl-CoA lyase, mitochondrial Proteins 0.000 description 1
- 101000840540 Homo sapiens Iduronate 2-sulfatase Proteins 0.000 description 1
- 101000878602 Homo sapiens Immunoglobulin alpha Fc receptor Proteins 0.000 description 1
- 101000840267 Homo sapiens Immunoglobulin lambda-like polypeptide 1 Proteins 0.000 description 1
- 101001103039 Homo sapiens Inactive tyrosine-protein kinase transmembrane receptor ROR1 Proteins 0.000 description 1
- 101000998120 Homo sapiens Interleukin-3 receptor subunit alpha Proteins 0.000 description 1
- 101000614481 Homo sapiens Kidney-associated antigen 1 Proteins 0.000 description 1
- 101001020452 Homo sapiens LIM/homeobox protein Lhx3 Proteins 0.000 description 1
- 101000972491 Homo sapiens Laminin subunit alpha-2 Proteins 0.000 description 1
- 101001023271 Homo sapiens Laminin subunit gamma-2 Proteins 0.000 description 1
- 101001008411 Homo sapiens Lebercilin Proteins 0.000 description 1
- 101000966742 Homo sapiens Leucine-rich PPR motif-containing protein, mitochondrial Proteins 0.000 description 1
- 101001042362 Homo sapiens Leukemia inhibitory factor receptor Proteins 0.000 description 1
- 101000984197 Homo sapiens Leukocyte immunoglobulin-like receptor subfamily A member 2 Proteins 0.000 description 1
- 101001138062 Homo sapiens Leukocyte-associated immunoglobulin-like receptor 1 Proteins 0.000 description 1
- 101001122174 Homo sapiens Lipoamide acyltransferase component of branched-chain alpha-keto acid dehydrogenase complex, mitochondrial Proteins 0.000 description 1
- 101001043326 Homo sapiens Lipoxygenase homology domain-containing protein 1 Proteins 0.000 description 1
- 101000841267 Homo sapiens Long chain 3-hydroxyacyl-CoA dehydrogenase Proteins 0.000 description 1
- 101000923835 Homo sapiens Low density lipoprotein receptor adapter protein 1 Proteins 0.000 description 1
- 101001051093 Homo sapiens Low-density lipoprotein receptor Proteins 0.000 description 1
- 101001065550 Homo sapiens Lymphocyte antigen 6K Proteins 0.000 description 1
- 101001018034 Homo sapiens Lymphocyte antigen 75 Proteins 0.000 description 1
- 101000997662 Homo sapiens Lysosomal acid glucosylceramidase Proteins 0.000 description 1
- 101001004953 Homo sapiens Lysosomal acid lipase/cholesteryl ester hydrolase Proteins 0.000 description 1
- 101000979046 Homo sapiens Lysosomal alpha-mannosidase Proteins 0.000 description 1
- 101001014223 Homo sapiens MAPK/MAK/MRK overlapping kinase Proteins 0.000 description 1
- 101000575454 Homo sapiens Major facilitator superfamily domain-containing protein 8 Proteins 0.000 description 1
- 101000834118 Homo sapiens Malonate-CoA ligase ACSF3, mitochondrial Proteins 0.000 description 1
- 101001051053 Homo sapiens Mannose-6-phosphate isomerase Proteins 0.000 description 1
- 101001120868 Homo sapiens Meckel syndrome type 1 protein Proteins 0.000 description 1
- 101000575066 Homo sapiens Mediator of RNA polymerase II transcription subunit 17 Proteins 0.000 description 1
- 101000573526 Homo sapiens Membrane protein MLC1 Proteins 0.000 description 1
- 101001057193 Homo sapiens Membrane-associated guanylate kinase, WW and PDZ domain-containing protein 1 Proteins 0.000 description 1
- 101000954986 Homo sapiens Merlin Proteins 0.000 description 1
- 101000629405 Homo sapiens Mesoderm posterior protein 2 Proteins 0.000 description 1
- 101001116314 Homo sapiens Methionine synthase reductase Proteins 0.000 description 1
- 101000615488 Homo sapiens Methyl-CpG-binding domain protein 2 Proteins 0.000 description 1
- 101000587058 Homo sapiens Methylenetetrahydrofolate reductase Proteins 0.000 description 1
- 101001114654 Homo sapiens Methylmalonic aciduria type A protein, mitochondrial Proteins 0.000 description 1
- 101000588130 Homo sapiens Microsomal triglyceride transfer protein large subunit Proteins 0.000 description 1
- 101000697649 Homo sapiens Mitochondrial chaperone BCS1 Proteins 0.000 description 1
- 101000972158 Homo sapiens Mitochondrial tRNA-specific 2-thiouridylase 1 Proteins 0.000 description 1
- 101001133056 Homo sapiens Mucin-1 Proteins 0.000 description 1
- 101000934338 Homo sapiens Myeloid cell surface antigen CD33 Proteins 0.000 description 1
- 101000635885 Homo sapiens Myosin light chain 1/3, skeletal muscle isoform Proteins 0.000 description 1
- 101001132874 Homo sapiens Myotubularin Proteins 0.000 description 1
- 101001072477 Homo sapiens N-acetylglucosamine-1-phosphotransferase subunit gamma Proteins 0.000 description 1
- 101001072470 Homo sapiens N-acetylglucosamine-1-phosphotransferase subunits alpha/beta Proteins 0.000 description 1
- 101000829992 Homo sapiens N-acetylglucosamine-6-sulfatase Proteins 0.000 description 1
- 101000997654 Homo sapiens N-acetylmannosamine kinase Proteins 0.000 description 1
- 101000589519 Homo sapiens N-acetyltransferase 8 Proteins 0.000 description 1
- 101000938705 Homo sapiens N-acetyltransferase ESCO2 Proteins 0.000 description 1
- 101000983292 Homo sapiens N-fatty-acyl-amino acid synthase/hydrolase PM20D1 Proteins 0.000 description 1
- 101000651201 Homo sapiens N-sulphoglucosamine sulphohydrolase Proteins 0.000 description 1
- 101000979243 Homo sapiens NADH dehydrogenase [ubiquinone] iron-sulfur protein 6, mitochondrial Proteins 0.000 description 1
- 101001124388 Homo sapiens NPC intracellular cholesterol transporter 1 Proteins 0.000 description 1
- 101001109579 Homo sapiens NPC intracellular cholesterol transporter 2 Proteins 0.000 description 1
- 101000978730 Homo sapiens Nephrin Proteins 0.000 description 1
- 101001051490 Homo sapiens Neural cell adhesion molecule L1 Proteins 0.000 description 1
- 101001103036 Homo sapiens Nuclear receptor ROR-alpha Proteins 0.000 description 1
- 101000721757 Homo sapiens Olfactory receptor 51E2 Proteins 0.000 description 1
- 101000613490 Homo sapiens Paired box protein Pax-3 Proteins 0.000 description 1
- 101000601724 Homo sapiens Paired box protein Pax-5 Proteins 0.000 description 1
- 101000692768 Homo sapiens Paired mesoderm homeobox protein 2B Proteins 0.000 description 1
- 101000589399 Homo sapiens Pannexin-3 Proteins 0.000 description 1
- 101000945735 Homo sapiens Parafibromin Proteins 0.000 description 1
- 101000833892 Homo sapiens Peroxisomal acyl-coenzyme A oxidase 1 Proteins 0.000 description 1
- 101001045218 Homo sapiens Peroxisomal multifunctional enzyme type 2 Proteins 0.000 description 1
- 101000730779 Homo sapiens Peroxisome assembly factor 2 Proteins 0.000 description 1
- 101000579342 Homo sapiens Peroxisome assembly protein 12 Proteins 0.000 description 1
- 101001099372 Homo sapiens Peroxisome biogenesis factor 1 Proteins 0.000 description 1
- 101001126498 Homo sapiens Peroxisome biogenesis factor 10 Proteins 0.000 description 1
- 101000693847 Homo sapiens Peroxisome biogenesis factor 2 Proteins 0.000 description 1
- 101000938567 Homo sapiens Persulfide dioxygenase ETHE1, mitochondrial Proteins 0.000 description 1
- 101001094831 Homo sapiens Phosphomannomutase 2 Proteins 0.000 description 1
- 101000633511 Homo sapiens Photoreceptor-specific nuclear receptor Proteins 0.000 description 1
- 101000691463 Homo sapiens Placenta-specific protein 1 Proteins 0.000 description 1
- 101001126417 Homo sapiens Platelet-derived growth factor receptor alpha Proteins 0.000 description 1
- 101001064779 Homo sapiens Plexin domain-containing protein 2 Proteins 0.000 description 1
- 101000595193 Homo sapiens Podocin Proteins 0.000 description 1
- 101000617725 Homo sapiens Pregnancy-specific beta-1-glycoprotein 2 Proteins 0.000 description 1
- 101000874919 Homo sapiens Probable arginine-tRNA ligase, mitochondrial Proteins 0.000 description 1
- 101001098989 Homo sapiens Propionyl-CoA carboxylase alpha chain, mitochondrial Proteins 0.000 description 1
- 101001098982 Homo sapiens Propionyl-CoA carboxylase beta chain, mitochondrial Proteins 0.000 description 1
- 101001136592 Homo sapiens Prostate stem cell antigen Proteins 0.000 description 1
- 101001136981 Homo sapiens Proteasome subunit beta type-9 Proteins 0.000 description 1
- 101000710213 Homo sapiens Protein CLN8 Proteins 0.000 description 1
- 101000875616 Homo sapiens Protein FAM161A Proteins 0.000 description 1
- 101000969776 Homo sapiens Protein Mpv17 Proteins 0.000 description 1
- 101000979748 Homo sapiens Protein NDRG1 Proteins 0.000 description 1
- 101000880770 Homo sapiens Protein SSX2 Proteins 0.000 description 1
- 101000814371 Homo sapiens Protein Wnt-10a Proteins 0.000 description 1
- 101000720958 Homo sapiens Protein artemis Proteins 0.000 description 1
- 101000726148 Homo sapiens Protein crumbs homolog 1 Proteins 0.000 description 1
- 101001028804 Homo sapiens Protein eyes shut homolog Proteins 0.000 description 1
- 101000893100 Homo sapiens Protein fantom Proteins 0.000 description 1
- 101000579425 Homo sapiens Proto-oncogene tyrosine-protein kinase receptor Ret Proteins 0.000 description 1
- 101001072259 Homo sapiens Protocadherin-15 Proteins 0.000 description 1
- 101001120726 Homo sapiens Pyruvate dehydrogenase E1 component subunit alpha, somatic form, mitochondrial Proteins 0.000 description 1
- 101001137451 Homo sapiens Pyruvate dehydrogenase E1 component subunit beta, mitochondrial Proteins 0.000 description 1
- 101000620777 Homo sapiens Rab proteins geranylgeranyltransferase component A 1 Proteins 0.000 description 1
- 101001130305 Homo sapiens Ras-related protein Rab-23 Proteins 0.000 description 1
- 101001012157 Homo sapiens Receptor tyrosine-protein kinase erbB-2 Proteins 0.000 description 1
- 101000932478 Homo sapiens Receptor-type tyrosine-protein kinase FLT3 Proteins 0.000 description 1
- 101000639763 Homo sapiens Regulator of telomere elongation helicase 1 Proteins 0.000 description 1
- 101000729271 Homo sapiens Retinoid isomerohydrolase Proteins 0.000 description 1
- 101000742938 Homo sapiens Retinol dehydrogenase 12 Proteins 0.000 description 1
- 101000846198 Homo sapiens Ribitol 5-phosphate transferase FKRP Proteins 0.000 description 1
- 101000846336 Homo sapiens Ribitol-5-phosphate transferase FKTN Proteins 0.000 description 1
- 101001125551 Homo sapiens Ribose-phosphate pyrophosphokinase 1 Proteins 0.000 description 1
- 101000615373 Homo sapiens SWI/SNF-related matrix-associated actin-dependent regulator of chromatin subfamily A-like protein 1 Proteins 0.000 description 1
- 101000702542 Homo sapiens SWI/SNF-related matrix-associated actin-dependent regulator of chromatin subfamily E member 1 Proteins 0.000 description 1
- 101000641122 Homo sapiens Sacsin Proteins 0.000 description 1
- 101000665137 Homo sapiens Scm-like with four MBT domains protein 1 Proteins 0.000 description 1
- 101000836983 Homo sapiens Secretoglobin family 1D member 1 Proteins 0.000 description 1
- 101000629622 Homo sapiens Serine-pyruvate aminotransferase Proteins 0.000 description 1
- 101000777277 Homo sapiens Serine/threonine-protein kinase Chk2 Proteins 0.000 description 1
- 101000628562 Homo sapiens Serine/threonine-protein kinase STK11 Proteins 0.000 description 1
- 101000649929 Homo sapiens Serine/threonine-protein kinase VRK1 Proteins 0.000 description 1
- 101000884271 Homo sapiens Signal transducer CD24 Proteins 0.000 description 1
- 101000785978 Homo sapiens Sphingomyelin phosphodiesterase Proteins 0.000 description 1
- 101000873927 Homo sapiens Squamous cell carcinoma antigen recognized by T-cells 3 Proteins 0.000 description 1
- 101000896517 Homo sapiens Steroid 17-alpha-hydroxylase/17,20 lyase Proteins 0.000 description 1
- 101000861263 Homo sapiens Steroid 21-hydroxylase Proteins 0.000 description 1
- 101000875401 Homo sapiens Sterol 26-hydroxylase, mitochondrial Proteins 0.000 description 1
- 101000951145 Homo sapiens Succinate dehydrogenase [ubiquinone] cytochrome b small subunit, mitochondrial Proteins 0.000 description 1
- 101000685323 Homo sapiens Succinate dehydrogenase [ubiquinone] flavoprotein subunit, mitochondrial Proteins 0.000 description 1
- 101000874160 Homo sapiens Succinate dehydrogenase [ubiquinone] iron-sulfur subunit, mitochondrial Proteins 0.000 description 1
- 101000934888 Homo sapiens Succinate dehydrogenase cytochrome b560 subunit, mitochondrial Proteins 0.000 description 1
- 101000628885 Homo sapiens Suppressor of fused homolog Proteins 0.000 description 1
- 101000617738 Homo sapiens Survival motor neuron protein Proteins 0.000 description 1
- 101000835705 Homo sapiens Tectonin beta-propeller repeat-containing protein 2 Proteins 0.000 description 1
- 101000714168 Homo sapiens Testisin Proteins 0.000 description 1
- 101000799466 Homo sapiens Thrombopoietin receptor Proteins 0.000 description 1
- 101000796134 Homo sapiens Thymidine phosphorylase Proteins 0.000 description 1
- 101000772267 Homo sapiens Thyrotropin receptor Proteins 0.000 description 1
- 101000702545 Homo sapiens Transcription activator BRG1 Proteins 0.000 description 1
- 101000894428 Homo sapiens Transcriptional repressor CTCFL Proteins 0.000 description 1
- 101000652736 Homo sapiens Transgelin Proteins 0.000 description 1
- 101000925985 Homo sapiens Translation initiation factor eIF-2B subunit epsilon Proteins 0.000 description 1
- 101000638154 Homo sapiens Transmembrane protease serine 2 Proteins 0.000 description 1
- 101000637950 Homo sapiens Transmembrane protein 127 Proteins 0.000 description 1
- 101000681215 Homo sapiens Transmembrane protein 216 Proteins 0.000 description 1
- 101000795659 Homo sapiens Tuberin Proteins 0.000 description 1
- 101000800287 Homo sapiens Tubulointerstitial nephritis antigen-like Proteins 0.000 description 1
- 101000851376 Homo sapiens Tumor necrosis factor receptor superfamily member 8 Proteins 0.000 description 1
- 101001047681 Homo sapiens Tyrosine-protein kinase Lck Proteins 0.000 description 1
- 101000740048 Homo sapiens Ubiquitin carboxyl-terminal hydrolase BAP1 Proteins 0.000 description 1
- 101000808105 Homo sapiens Uroplakin-2 Proteins 0.000 description 1
- 101000805941 Homo sapiens Usherin Proteins 0.000 description 1
- 101001061851 Homo sapiens V(D)J recombination-activating protein 2 Proteins 0.000 description 1
- 101000854875 Homo sapiens V-type proton ATPase 116 kDa subunit a 3 Proteins 0.000 description 1
- 101000670953 Homo sapiens V-type proton ATPase subunit B, kidney isoform Proteins 0.000 description 1
- 101000667092 Homo sapiens Vacuolar protein sorting-associated protein 13A Proteins 0.000 description 1
- 101000667110 Homo sapiens Vacuolar protein sorting-associated protein 13B Proteins 0.000 description 1
- 101000771982 Homo sapiens Vacuolar protein sorting-associated protein 45 Proteins 0.000 description 1
- 101000851007 Homo sapiens Vascular endothelial growth factor receptor 2 Proteins 0.000 description 1
- 101000854931 Homo sapiens Visual system homeobox 2 Proteins 0.000 description 1
- 101000814512 Homo sapiens X antigen family member 1 Proteins 0.000 description 1
- 101000785721 Homo sapiens Zinc finger FYVE domain-containing protein 26 Proteins 0.000 description 1
- 101001026573 Homo sapiens cAMP-dependent protein kinase type I-alpha regulatory subunit Proteins 0.000 description 1
- 101001039228 Homo sapiens mRNA export factor GLE1 Proteins 0.000 description 1
- 241000598171 Human adenovirus sp. Species 0.000 description 1
- 241000046923 Human bocavirus Species 0.000 description 1
- 241001207270 Human enterovirus Species 0.000 description 1
- 241000430519 Human rhinovirus sp. Species 0.000 description 1
- 208000023105 Huntington disease Diseases 0.000 description 1
- 102100039283 Hyaluronidase-1 Human genes 0.000 description 1
- UFHFLCQGNIYNRP-UHFFFAOYSA-N Hydrogen Chemical compound [H][H] UFHFLCQGNIYNRP-UHFFFAOYSA-N 0.000 description 1
- 102100021092 Hydrolethalus syndrome protein 1 Human genes 0.000 description 1
- 102100024004 Hydroxymethylglutaryl-CoA lyase, mitochondrial Human genes 0.000 description 1
- UGQMRVRMYYASKQ-UHFFFAOYSA-N Hypoxanthine nucleoside Natural products OC1C(O)C(CO)OC1N1C(NC=NC2=O)=C2N=C1 UGQMRVRMYYASKQ-UHFFFAOYSA-N 0.000 description 1
- XQFRJNBWHJMXHO-RRKCRQDMSA-N IDUR Chemical compound C1[C@H](O)[C@@H](CO)O[C@H]1N1C(=O)NC(=O)C(I)=C1 XQFRJNBWHJMXHO-RRKCRQDMSA-N 0.000 description 1
- 108010031794 IGF Type 1 Receptor Proteins 0.000 description 1
- 102100029199 Iduronate 2-sulfatase Human genes 0.000 description 1
- 229940076838 Immune checkpoint inhibitor Drugs 0.000 description 1
- 102100038005 Immunoglobulin alpha Fc receptor Human genes 0.000 description 1
- 102100029616 Immunoglobulin lambda-like polypeptide 1 Human genes 0.000 description 1
- 102100039615 Inactive tyrosine-protein kinase transmembrane receptor ROR1 Human genes 0.000 description 1
- 229930010555 Inosine Natural products 0.000 description 1
- UGQMRVRMYYASKQ-KQYNXXCUSA-N Inosine Chemical compound O[C@@H]1[C@H](O)[C@@H](CO)O[C@H]1N1C2=NC=NC(O)=C2N=C1 UGQMRVRMYYASKQ-KQYNXXCUSA-N 0.000 description 1
- 102100039688 Insulin-like growth factor 1 receptor Human genes 0.000 description 1
- 108010061833 Integrases Proteins 0.000 description 1
- 108010002352 Interleukin-1 Proteins 0.000 description 1
- 102100033493 Interleukin-3 receptor subunit alpha Human genes 0.000 description 1
- 102100025392 Isovaleryl-CoA dehydrogenase, mitochondrial Human genes 0.000 description 1
- 101710201965 Isovaleryl-CoA dehydrogenase, mitochondrial Proteins 0.000 description 1
- 102000017792 KCNJ11 Human genes 0.000 description 1
- 102100034872 Kallikrein-4 Human genes 0.000 description 1
- 208000008839 Kidney Neoplasms Diseases 0.000 description 1
- 241000588747 Klebsiella pneumoniae Species 0.000 description 1
- 102100031413 L-dopachrome tautomerase Human genes 0.000 description 1
- 101710093778 L-dopachrome tautomerase Proteins 0.000 description 1
- UBORTCNDUKBEOP-UHFFFAOYSA-N L-xanthosine Natural products OC1C(O)C(CO)OC1N1C(NC(=O)NC2=O)=C2N=C1 UBORTCNDUKBEOP-UHFFFAOYSA-N 0.000 description 1
- 102100036106 LIM/homeobox protein Lhx3 Human genes 0.000 description 1
- 241000186869 Lactobacillus salivarius Species 0.000 description 1
- 102100022745 Laminin subunit alpha-2 Human genes 0.000 description 1
- 102100022743 Laminin subunit alpha-4 Human genes 0.000 description 1
- 102100024629 Laminin subunit beta-3 Human genes 0.000 description 1
- 102100035159 Laminin subunit gamma-2 Human genes 0.000 description 1
- 206010023927 Lassa fever Diseases 0.000 description 1
- 101000740049 Latilactobacillus curvatus Bioactive peptide 1 Proteins 0.000 description 1
- 102100027443 Lebercilin Human genes 0.000 description 1
- 102000004856 Lectins Human genes 0.000 description 1
- 241000589264 Legionella longbeachae Species 0.000 description 1
- 241000589929 Leptospira interrogans Species 0.000 description 1
- 241001453171 Leptotrichia Species 0.000 description 1
- 241000123728 Leptotrichia buccalis Species 0.000 description 1
- 241000029603 Leptotrichia shahii Species 0.000 description 1
- 102100040589 Leucine-rich PPR motif-containing protein, mitochondrial Human genes 0.000 description 1
- 102100021747 Leukemia inhibitory factor receptor Human genes 0.000 description 1
- 102100025586 Leukocyte immunoglobulin-like receptor subfamily A member 2 Human genes 0.000 description 1
- 102100020943 Leukocyte-associated immunoglobulin-like receptor 1 Human genes 0.000 description 1
- 108090000364 Ligases Proteins 0.000 description 1
- 102000003960 Ligases Human genes 0.000 description 1
- 102100035135 Limbin Human genes 0.000 description 1
- 108050003065 Limbin Proteins 0.000 description 1
- 102100027064 Lipoamide acyltransferase component of branched-chain alpha-keto acid dehydrogenase complex, mitochondrial Human genes 0.000 description 1
- 108010013563 Lipoprotein Lipase Proteins 0.000 description 1
- 102100022119 Lipoprotein lipase Human genes 0.000 description 1
- 102100021959 Lipoxygenase homology domain-containing protein 1 Human genes 0.000 description 1
- 241000390917 Listeria newyorkensis Species 0.000 description 1
- 241000186807 Listeria seeligeri Species 0.000 description 1
- 102100029107 Long chain 3-hydroxyacyl-CoA dehydrogenase Human genes 0.000 description 1
- 102100034389 Low density lipoprotein receptor adapter protein 1 Human genes 0.000 description 1
- 102100024640 Low-density lipoprotein receptor Human genes 0.000 description 1
- 206010024971 Lower respiratory tract infections Diseases 0.000 description 1
- 208000016604 Lyme disease Diseases 0.000 description 1
- 102100032129 Lymphocyte antigen 6K Human genes 0.000 description 1
- 102100033486 Lymphocyte antigen 75 Human genes 0.000 description 1
- 206010025323 Lymphomas Diseases 0.000 description 1
- 102100033342 Lysosomal acid glucosylceramidase Human genes 0.000 description 1
- 102100026001 Lysosomal acid lipase/cholesteryl ester hydrolase Human genes 0.000 description 1
- 102100023231 Lysosomal alpha-mannosidase Human genes 0.000 description 1
- 102100031520 MAPK/MAK/MRK overlapping kinase Human genes 0.000 description 1
- 102000016200 MART-1 Antigen Human genes 0.000 description 1
- 108010010995 MART-1 Antigen Proteins 0.000 description 1
- 102000003624 MCOLN1 Human genes 0.000 description 1
- 101150091161 MCOLN1 gene Proteins 0.000 description 1
- 102100026371 MHC class II transactivator Human genes 0.000 description 1
- 108700002010 MHC class II transactivator Proteins 0.000 description 1
- 229910015837 MSH2 Inorganic materials 0.000 description 1
- 108700012912 MYCN Proteins 0.000 description 1
- 101150022024 MYCN gene Proteins 0.000 description 1
- 102100025613 Major facilitator superfamily domain-containing protein 8 Human genes 0.000 description 1
- 102100026665 Malonate-CoA ligase ACSF3, mitochondrial Human genes 0.000 description 1
- 208000000932 Marburg Virus Disease Diseases 0.000 description 1
- 201000011013 Marburg hemorrhagic fever Diseases 0.000 description 1
- 102100026048 Meckel syndrome type 1 protein Human genes 0.000 description 1
- 102100025530 Mediator of RNA polymerase II transcription subunit 17 Human genes 0.000 description 1
- 108010049137 Member 1 Subfamily D ATP Binding Cassette Transporter Proteins 0.000 description 1
- 108010093662 Member 11 Subfamily B ATP Binding Cassette Transporter Proteins 0.000 description 1
- 102000056548 Member 3 Solute Carrier Family 12 Human genes 0.000 description 1
- 102000012750 Membrane Glycoproteins Human genes 0.000 description 1
- 108010090054 Membrane Glycoproteins Proteins 0.000 description 1
- 102100026290 Membrane protein MLC1 Human genes 0.000 description 1
- 102100027240 Membrane-associated guanylate kinase, WW and PDZ domain-containing protein 1 Human genes 0.000 description 1
- 206010027202 Meningitis bacterial Diseases 0.000 description 1
- 206010027236 Meningitis fungal Diseases 0.000 description 1
- 208000002030 Merkel cell carcinoma Diseases 0.000 description 1
- 241000579048 Merkel cell polyomavirus Species 0.000 description 1
- 102100037106 Merlin Human genes 0.000 description 1
- 102100026817 Mesoderm posterior protein 2 Human genes 0.000 description 1
- 108090000015 Mesothelin Proteins 0.000 description 1
- 102000003735 Mesothelin Human genes 0.000 description 1
- RJQXTJLFIWVMTO-TYNCELHUSA-N Methicillin Chemical compound COC1=CC=CC(OC)=C1C(=O)N[C@@H]1C(=O)N2[C@@H](C(O)=O)C(C)(C)S[C@@H]21 RJQXTJLFIWVMTO-TYNCELHUSA-N 0.000 description 1
- 102100024614 Methionine synthase reductase Human genes 0.000 description 1
- 102100021299 Methyl-CpG-binding domain protein 2 Human genes 0.000 description 1
- 102100029684 Methylenetetrahydrofolate reductase Human genes 0.000 description 1
- 102100023377 Methylmalonic aciduria type A protein, mitochondrial Human genes 0.000 description 1
- 102000016397 Methyltransferase Human genes 0.000 description 1
- 108010050345 Microphthalmia-Associated Transcription Factor Proteins 0.000 description 1
- 102100030157 Microphthalmia-associated transcription factor Human genes 0.000 description 1
- 102100031545 Microsomal triglyceride transfer protein large subunit Human genes 0.000 description 1
- 208000025370 Middle East respiratory syndrome Diseases 0.000 description 1
- 108010074346 Mismatch Repair Endonuclease PMS2 Proteins 0.000 description 1
- 102000008071 Mismatch Repair Endonuclease PMS2 Human genes 0.000 description 1
- 102100027891 Mitochondrial chaperone BCS1 Human genes 0.000 description 1
- 102100030108 Mitochondrial ornithine transporter 1 Human genes 0.000 description 1
- 102100022450 Mitochondrial tRNA-specific 2-thiouridylase 1 Human genes 0.000 description 1
- 241000588655 Moraxella catarrhalis Species 0.000 description 1
- 102100025725 Mothers against decapentaplegic homolog 4 Human genes 0.000 description 1
- 101710143112 Mothers against decapentaplegic homolog 4 Proteins 0.000 description 1
- 102100034256 Mucin-1 Human genes 0.000 description 1
- 241000699666 Mus <mouse, genus> Species 0.000 description 1
- 101100063504 Mus musculus Dlx2 gene Proteins 0.000 description 1
- 101100481410 Mus musculus Tek gene Proteins 0.000 description 1
- 102000013609 MutL Protein Homolog 1 Human genes 0.000 description 1
- 108010026664 MutL Protein Homolog 1 Proteins 0.000 description 1
- 102100025243 Myeloid cell surface antigen CD33 Human genes 0.000 description 1
- 108010009047 Myosin VIIa Proteins 0.000 description 1
- 102100033817 Myotubularin Human genes 0.000 description 1
- 108700026495 N-Myc Proto-Oncogene Proteins 0.000 description 1
- 102100036713 N-acetylglucosamine-1-phosphotransferase subunit gamma Human genes 0.000 description 1
- 102100036710 N-acetylglucosamine-1-phosphotransferase subunits alpha/beta Human genes 0.000 description 1
- 102100023282 N-acetylglucosamine-6-sulfatase Human genes 0.000 description 1
- 102100032618 N-acetylglutamate synthase, mitochondrial Human genes 0.000 description 1
- 102100033341 N-acetylmannosamine kinase Human genes 0.000 description 1
- 102100030822 N-acetyltransferase ESCO2 Human genes 0.000 description 1
- 102100026873 N-fatty-acyl-amino acid synthase/hydrolase PM20D1 Human genes 0.000 description 1
- 102100030124 N-myc proto-oncogene protein Human genes 0.000 description 1
- 102100027661 N-sulphoglucosamine sulphohydrolase Human genes 0.000 description 1
- 102100023214 NADH dehydrogenase [ubiquinone] iron-sulfur protein 6, mitochondrial Human genes 0.000 description 1
- 108010082739 NADPH Oxidase 2 Proteins 0.000 description 1
- 102100029565 NPC intracellular cholesterol transporter 1 Human genes 0.000 description 1
- 102100022737 NPC intracellular cholesterol transporter 2 Human genes 0.000 description 1
- 208000002454 Nasopharyngeal Carcinoma Diseases 0.000 description 1
- 206010061306 Nasopharyngeal cancer Diseases 0.000 description 1
- 241000588653 Neisseria Species 0.000 description 1
- 206010062701 Nematodiasis Diseases 0.000 description 1
- 102100023195 Nephrin Human genes 0.000 description 1
- 108010069196 Neural Cell Adhesion Molecules Proteins 0.000 description 1
- 102100027347 Neural cell adhesion molecule 1 Human genes 0.000 description 1
- 102100024964 Neural cell adhesion molecule L1 Human genes 0.000 description 1
- 206010029266 Neuroendocrine carcinoma of the skin Diseases 0.000 description 1
- 102000007530 Neurofibromin 1 Human genes 0.000 description 1
- 108010085793 Neurofibromin 1 Proteins 0.000 description 1
- 208000015914 Non-Hodgkin lymphomas Diseases 0.000 description 1
- 108091092724 Noncoding DNA Proteins 0.000 description 1
- 206010029803 Nosocomial infection Diseases 0.000 description 1
- 108020004711 Nucleic Acid Probes Proteins 0.000 description 1
- 102000011931 Nucleoproteins Human genes 0.000 description 1
- 108010061100 Nucleoproteins Proteins 0.000 description 1
- 102100030224 O-phosphoseryl-tRNA(Sec) selenium transferase Human genes 0.000 description 1
- 206010030155 Oesophageal carcinoma Diseases 0.000 description 1
- 102100025128 Olfactory receptor 51E2 Human genes 0.000 description 1
- 108700020796 Oncogene Proteins 0.000 description 1
- 208000007027 Oral Candidiasis Diseases 0.000 description 1
- 101710148753 Ornithine aminotransferase Proteins 0.000 description 1
- 102100027177 Ornithine aminotransferase, mitochondrial Human genes 0.000 description 1
- 206010033128 Ovarian cancer Diseases 0.000 description 1
- 206010061535 Ovarian neoplasm Diseases 0.000 description 1
- 230000010718 Oxidation Activity Effects 0.000 description 1
- 238000009004 PCR Kit Methods 0.000 description 1
- 101150045883 POMGNT1 gene Proteins 0.000 description 1
- 108010011536 PTEN Phosphohydrolase Proteins 0.000 description 1
- 102000014160 PTEN Phosphohydrolase Human genes 0.000 description 1
- 102100040891 Paired box protein Pax-3 Human genes 0.000 description 1
- 102100037504 Paired box protein Pax-5 Human genes 0.000 description 1
- 102100026354 Paired mesoderm homeobox protein 2B Human genes 0.000 description 1
- 108020002591 Palmitoyl protein thioesterase Proteins 0.000 description 1
- 102000005327 Palmitoyl protein thioesterase Human genes 0.000 description 1
- 241001099939 Paludibacter propionicigenes Species 0.000 description 1
- 241000282577 Pan troglodytes Species 0.000 description 1
- 206010061902 Pancreatic neoplasm Diseases 0.000 description 1
- 101001024685 Pandinus imperator Pandinin-2 Proteins 0.000 description 1
- 102100032364 Pannexin-3 Human genes 0.000 description 1
- 102100034743 Parafibromin Human genes 0.000 description 1
- 241000991583 Parechovirus Species 0.000 description 1
- 102100040884 Partner and localizer of BRCA2 Human genes 0.000 description 1
- 241001494479 Pecora Species 0.000 description 1
- 241000517307 Pediculus humanus Species 0.000 description 1
- 102100035278 Pendrin Human genes 0.000 description 1
- 108010077056 Peroxisomal Targeting Signal 2 Receptor Proteins 0.000 description 1
- 102100026798 Peroxisomal acyl-coenzyme A oxidase 1 Human genes 0.000 description 1
- 102100022587 Peroxisomal multifunctional enzyme type 2 Human genes 0.000 description 1
- 102100032924 Peroxisomal targeting signal 2 receptor Human genes 0.000 description 1
- 102100032931 Peroxisome assembly factor 2 Human genes 0.000 description 1
- 102100028224 Peroxisome assembly protein 12 Human genes 0.000 description 1
- 102100038881 Peroxisome biogenesis factor 1 Human genes 0.000 description 1
- 102100030554 Peroxisome biogenesis factor 10 Human genes 0.000 description 1
- 102100025516 Peroxisome biogenesis factor 2 Human genes 0.000 description 1
- 241000009328 Perro Species 0.000 description 1
- 102100030940 Persulfide dioxygenase ETHE1, mitochondrial Human genes 0.000 description 1
- 201000005702 Pertussis Diseases 0.000 description 1
- 102100035362 Phosphomannomutase 2 Human genes 0.000 description 1
- 102100029533 Photoreceptor-specific nuclear receptor Human genes 0.000 description 1
- 108010004729 Phycoerythrin Proteins 0.000 description 1
- 102100026181 Placenta-specific protein 1 Human genes 0.000 description 1
- 108010051742 Platelet-Derived Growth Factor beta Receptor Proteins 0.000 description 1
- 102100030485 Platelet-derived growth factor receptor alpha Human genes 0.000 description 1
- 102100026547 Platelet-derived growth factor receptor beta Human genes 0.000 description 1
- 102100031889 Plexin domain-containing protein 2 Human genes 0.000 description 1
- 208000009362 Pneumococcal Pneumonia Diseases 0.000 description 1
- 241000142787 Pneumocystis jirovecii Species 0.000 description 1
- 206010035728 Pneumonia pneumococcal Diseases 0.000 description 1
- 102100036037 Podocin Human genes 0.000 description 1
- 101000616469 Polybia paulista Mastoparan-1 Proteins 0.000 description 1
- 241000605862 Porphyromonas gingivalis Species 0.000 description 1
- 241000162745 Porphyromonas gulae Species 0.000 description 1
- 102100022019 Pregnancy-specific beta-1-glycoprotein 2 Human genes 0.000 description 1
- 241000326476 Prevotella aurantiaca Species 0.000 description 1
- 241001135217 Prevotella buccae Species 0.000 description 1
- 241001116196 Prevotella saccharolytica Species 0.000 description 1
- 101710119292 Probable D-lactate dehydrogenase, mitochondrial Proteins 0.000 description 1
- 102100036134 Probable arginine-tRNA ligase, mitochondrial Human genes 0.000 description 1
- 102100023832 Prolyl endopeptidase FAP Human genes 0.000 description 1
- 102100039022 Propionyl-CoA carboxylase alpha chain, mitochondrial Human genes 0.000 description 1
- 102100039025 Propionyl-CoA carboxylase beta chain, mitochondrial Human genes 0.000 description 1
- 206010060862 Prostate cancer Diseases 0.000 description 1
- 102100036735 Prostate stem cell antigen Human genes 0.000 description 1
- 208000000236 Prostatic Neoplasms Diseases 0.000 description 1
- 102100035764 Proteasome subunit beta type-9 Human genes 0.000 description 1
- 102100034479 Protein CLN8 Human genes 0.000 description 1
- 102100036002 Protein FAM161A Human genes 0.000 description 1
- 102100032831 Protein ITPRID2 Human genes 0.000 description 1
- 102100021273 Protein Mpv17 Human genes 0.000 description 1
- 102100024980 Protein NDRG1 Human genes 0.000 description 1
- 102100030122 Protein O-GlcNAcase Human genes 0.000 description 1
- 102100036226 Protein O-linked-mannose beta-1,2-N-acetylglucosaminyltransferase 1 Human genes 0.000 description 1
- 102100037686 Protein SSX2 Human genes 0.000 description 1
- 102100039461 Protein Wnt-10a Human genes 0.000 description 1
- 102100025918 Protein artemis Human genes 0.000 description 1
- 102100027331 Protein crumbs homolog 1 Human genes 0.000 description 1
- 102100037166 Protein eyes shut homolog Human genes 0.000 description 1
- 102100040970 Protein fantom Human genes 0.000 description 1
- 102100030944 Protein-glutamine gamma-glutamyltransferase K Human genes 0.000 description 1
- 241000588767 Proteus vulgaris Species 0.000 description 1
- 102100028286 Proto-oncogene tyrosine-protein kinase receptor Ret Human genes 0.000 description 1
- 102100036382 Protocadherin-15 Human genes 0.000 description 1
- 241000517305 Pthiridae Species 0.000 description 1
- 108010007100 Pulmonary Surfactant-Associated Protein A Proteins 0.000 description 1
- 102100027773 Pulmonary surfactant-associated protein A2 Human genes 0.000 description 1
- 206010037660 Pyrexia Diseases 0.000 description 1
- CZPWVGJYEJSRLH-UHFFFAOYSA-N Pyrimidine Chemical compound C1=CN=CN=C1 CZPWVGJYEJSRLH-UHFFFAOYSA-N 0.000 description 1
- 108091093078 Pyrimidine dimer Proteins 0.000 description 1
- 102100026067 Pyruvate dehydrogenase E1 component subunit alpha, somatic form, mitochondrial Human genes 0.000 description 1
- 102100035711 Pyruvate dehydrogenase E1 component subunit beta, mitochondrial Human genes 0.000 description 1
- 102100022881 Rab proteins geranylgeranyltransferase component A 1 Human genes 0.000 description 1
- 101000962158 Ralstonia sp Maleylpyruvate isomerase Proteins 0.000 description 1
- 102100031522 Ras-related protein Rab-23 Human genes 0.000 description 1
- 241000700159 Rattus Species 0.000 description 1
- 101100388570 Rattus norvegicus Ebp gene Proteins 0.000 description 1
- 102100030086 Receptor tyrosine-protein kinase erbB-2 Human genes 0.000 description 1
- 102100020718 Receptor-type tyrosine-protein kinase FLT3 Human genes 0.000 description 1
- 208000015634 Rectal Neoplasms Diseases 0.000 description 1
- 102100034469 Regulator of telomere elongation helicase 1 Human genes 0.000 description 1
- 208000035415 Reinfection Diseases 0.000 description 1
- 208000006265 Renal cell carcinoma Diseases 0.000 description 1
- 241000242739 Renilla Species 0.000 description 1
- 206010057190 Respiratory tract infections Diseases 0.000 description 1
- 102100031176 Retinoid isomerohydrolase Human genes 0.000 description 1
- 102100038054 Retinol dehydrogenase 12 Human genes 0.000 description 1
- 241000191023 Rhodobacter capsulatus Species 0.000 description 1
- 102100031774 Ribitol 5-phosphate transferase FKRP Human genes 0.000 description 1
- 102100031754 Ribitol-5-phosphate transferase FKTN Human genes 0.000 description 1
- 102000004167 Ribonuclease P Human genes 0.000 description 1
- 108090000621 Ribonuclease P Proteins 0.000 description 1
- 108010083644 Ribonucleases Proteins 0.000 description 1
- 102000006382 Ribonucleases Human genes 0.000 description 1
- 102000004389 Ribonucleoproteins Human genes 0.000 description 1
- 108010081734 Ribonucleoproteins Proteins 0.000 description 1
- PYMYPHUHKUWMLA-LMVFSUKVSA-N Ribose Natural products OC[C@@H](O)[C@@H](O)[C@@H](O)C=O PYMYPHUHKUWMLA-LMVFSUKVSA-N 0.000 description 1
- 102100029508 Ribose-phosphate pyrophosphokinase 1 Human genes 0.000 description 1
- 241001478212 Riemerella anatipestifer Species 0.000 description 1
- 208000036485 Roseola Diseases 0.000 description 1
- 241000702670 Rotavirus Species 0.000 description 1
- 241000282849 Ruminantia Species 0.000 description 1
- 102100025373 Runt-related transcription factor 1 Human genes 0.000 description 1
- 108700019718 SAM Domain and HD Domain-Containing Protein 1 Proteins 0.000 description 1
- 101150114242 SAMHD1 gene Proteins 0.000 description 1
- 108091005774 SARS-CoV-2 proteins Proteins 0.000 description 1
- 101150116830 SEPSECS gene Proteins 0.000 description 1
- 108091006623 SLC12A3 Proteins 0.000 description 1
- 108091006633 SLC12A6 Proteins 0.000 description 1
- 108091006161 SLC17A5 Proteins 0.000 description 1
- 108091006736 SLC22A5 Proteins 0.000 description 1
- 108091006418 SLC25A13 Proteins 0.000 description 1
- 108091006411 SLC25A15 Proteins 0.000 description 1
- 108091006505 SLC26A2 Proteins 0.000 description 1
- 108091006507 SLC26A4 Proteins 0.000 description 1
- 108091006542 SLC35A3 Proteins 0.000 description 1
- 108091006924 SLC37A4 Proteins 0.000 description 1
- 108091006947 SLC39A4 Proteins 0.000 description 1
- 102000016681 SLC4A Proteins Human genes 0.000 description 1
- 108091006267 SLC4A11 Proteins 0.000 description 1
- 102000005041 SLC6A8 Human genes 0.000 description 1
- 108091006236 SLC7A7 Proteins 0.000 description 1
- 108700028341 SMARCB1 Proteins 0.000 description 1
- 101150008214 SMARCB1 gene Proteins 0.000 description 1
- 102100021248 SWI/SNF-related matrix-associated actin-dependent regulator of chromatin subfamily A-like protein 1 Human genes 0.000 description 1
- 102100025746 SWI/SNF-related matrix-associated actin-dependent regulator of chromatin subfamily B member 1 Human genes 0.000 description 1
- 102100031029 SWI/SNF-related matrix-associated actin-dependent regulator of chromatin subfamily E member 1 Human genes 0.000 description 1
- 101001053942 Saccharolobus solfataricus (strain ATCC 35092 / DSM 1617 / JCM 11322 / P2) Diphosphomevalonate decarboxylase Proteins 0.000 description 1
- 102100034272 Sacsin Human genes 0.000 description 1
- 241000607142 Salmonella Species 0.000 description 1
- 241001678561 Sarbecovirus Species 0.000 description 1
- 241000447727 Scabies Species 0.000 description 1
- 102100038689 Scm-like with four MBT domains protein 1 Human genes 0.000 description 1
- 206010040047 Sepsis Diseases 0.000 description 1
- 102100026842 Serine-pyruvate aminotransferase Human genes 0.000 description 1
- 102100031075 Serine/threonine-protein kinase Chk2 Human genes 0.000 description 1
- 102100026715 Serine/threonine-protein kinase STK11 Human genes 0.000 description 1
- 102100028235 Serine/threonine-protein kinase VRK1 Human genes 0.000 description 1
- 241000607720 Serratia Species 0.000 description 1
- 241000008910 Severe acute respiratory syndrome-related coronavirus Species 0.000 description 1
- 241000607768 Shigella Species 0.000 description 1
- 101710173694 Short transient receptor potential channel 2 Proteins 0.000 description 1
- 102100023105 Sialin Human genes 0.000 description 1
- 102100038081 Signal transducer CD24 Human genes 0.000 description 1
- 208000000453 Skin Neoplasms Diseases 0.000 description 1
- 102100034245 Solute carrier family 12 member 6 Human genes 0.000 description 1
- 102100036924 Solute carrier family 22 member 5 Human genes 0.000 description 1
- 102100026263 Sphingomyelin phosphodiesterase Human genes 0.000 description 1
- 102100035748 Squamous cell carcinoma antigen recognized by T-cells 3 Human genes 0.000 description 1
- 206010041925 Staphylococcal infections Diseases 0.000 description 1
- 241000191963 Staphylococcus epidermidis Species 0.000 description 1
- 108010049356 Steroid 11-beta-Hydroxylase Proteins 0.000 description 1
- 102100021719 Steroid 17-alpha-hydroxylase/17,20 lyase Human genes 0.000 description 1
- 102100039081 Steroid Delta-isomerase Human genes 0.000 description 1
- 102100036325 Sterol 26-hydroxylase, mitochondrial Human genes 0.000 description 1
- 208000005718 Stomach Neoplasms Diseases 0.000 description 1
- 241000194017 Streptococcus Species 0.000 description 1
- 241000193998 Streptococcus pneumoniae Species 0.000 description 1
- 241000194024 Streptococcus salivarius Species 0.000 description 1
- 102100038014 Succinate dehydrogenase [ubiquinone] cytochrome b small subunit, mitochondrial Human genes 0.000 description 1
- 102100023155 Succinate dehydrogenase [ubiquinone] flavoprotein subunit, mitochondrial Human genes 0.000 description 1
- 102100035726 Succinate dehydrogenase [ubiquinone] iron-sulfur subunit, mitochondrial Human genes 0.000 description 1
- 102100031715 Succinate dehydrogenase assembly factor 2, mitochondrial Human genes 0.000 description 1
- 108050007461 Succinate dehydrogenase assembly factor 2, mitochondrial Proteins 0.000 description 1
- 102100025393 Succinate dehydrogenase cytochrome b560 subunit, mitochondrial Human genes 0.000 description 1
- 102100030113 Sulfate transporter Human genes 0.000 description 1
- 102100026939 Suppressor of fused homolog Human genes 0.000 description 1
- 208000033809 Suppuration Diseases 0.000 description 1
- 102100021947 Survival motor neuron protein Human genes 0.000 description 1
- 101150057140 TACSTD1 gene Proteins 0.000 description 1
- 108010032166 TARP Proteins 0.000 description 1
- 102100026312 Tectonin beta-propeller repeat-containing protein 2 Human genes 0.000 description 1
- 108091046869 Telomeric non-coding RNA Proteins 0.000 description 1
- 208000024313 Testicular Neoplasms Diseases 0.000 description 1
- 206010057644 Testis cancer Diseases 0.000 description 1
- 102100036494 Testisin Human genes 0.000 description 1
- 206010043376 Tetanus Diseases 0.000 description 1
- 101150050472 Tfr2 gene Proteins 0.000 description 1
- 241000589499 Thermus thermophilus Species 0.000 description 1
- 102100034196 Thrombopoietin receptor Human genes 0.000 description 1
- 102100031372 Thymidine phosphorylase Human genes 0.000 description 1
- 208000024770 Thyroid neoplasm Diseases 0.000 description 1
- 102100029337 Thyrotropin receptor Human genes 0.000 description 1
- 208000002474 Tinea Diseases 0.000 description 1
- 102100031027 Transcription activator BRG1 Human genes 0.000 description 1
- 102100021393 Transcriptional repressor CTCFL Human genes 0.000 description 1
- 108020004566 Transfer RNA Proteins 0.000 description 1
- 102100026143 Transferrin receptor protein 2 Human genes 0.000 description 1
- 102100034267 Translation initiation factor eIF-2B subunit epsilon Human genes 0.000 description 1
- 102100031989 Transmembrane protease serine 2 Human genes 0.000 description 1
- 102100032072 Transmembrane protein 127 Human genes 0.000 description 1
- 102100022301 Transmembrane protein 216 Human genes 0.000 description 1
- 102000008579 Transposases Human genes 0.000 description 1
- 108010020764 Transposases Proteins 0.000 description 1
- 241000893966 Trichophyton verrucosum Species 0.000 description 1
- 108010039203 Tripeptidyl-Peptidase 1 Proteins 0.000 description 1
- 102100034197 Tripeptidyl-peptidase 1 Human genes 0.000 description 1
- 101710162629 Trypsin inhibitor Proteins 0.000 description 1
- 102100031638 Tuberin Human genes 0.000 description 1
- 102100033469 Tubulointerstitial nephritis antigen-like Human genes 0.000 description 1
- 108700025716 Tumor Suppressor Genes Proteins 0.000 description 1
- 102000044209 Tumor Suppressor Genes Human genes 0.000 description 1
- 108010078814 Tumor Suppressor Protein p53 Proteins 0.000 description 1
- 102000015098 Tumor Suppressor Protein p53 Human genes 0.000 description 1
- 102100036857 Tumor necrosis factor receptor superfamily member 8 Human genes 0.000 description 1
- 102100033254 Tumor suppressor ARF Human genes 0.000 description 1
- 241000287411 Turdidae Species 0.000 description 1
- 102100021869 Tyrosine aminotransferase Human genes 0.000 description 1
- 101710175714 Tyrosine aminotransferase Proteins 0.000 description 1
- 102100024036 Tyrosine-protein kinase Lck Human genes 0.000 description 1
- 102100033778 UDP-N-acetylglucosamine transporter Human genes 0.000 description 1
- 208000025865 Ulcer Diseases 0.000 description 1
- 102100031835 Unconventional myosin-VIIa Human genes 0.000 description 1
- 206010046306 Upper respiratory tract infection Diseases 0.000 description 1
- 208000023915 Ureteral Neoplasms Diseases 0.000 description 1
- 208000007097 Urinary Bladder Neoplasms Diseases 0.000 description 1
- 102100038851 Uroplakin-2 Human genes 0.000 description 1
- 102100037930 Usherin Human genes 0.000 description 1
- 208000002495 Uterine Neoplasms Diseases 0.000 description 1
- 102100029591 V(D)J recombination-activating protein 2 Human genes 0.000 description 1
- 102100020738 V-type proton ATPase 116 kDa subunit a 3 Human genes 0.000 description 1
- 102100039468 V-type proton ATPase subunit B, kidney isoform Human genes 0.000 description 1
- 102100039114 Vacuolar protein sorting-associated protein 13A Human genes 0.000 description 1
- 102100039113 Vacuolar protein sorting-associated protein 13B Human genes 0.000 description 1
- 102100029495 Vacuolar protein sorting-associated protein 45 Human genes 0.000 description 1
- 208000037009 Vaginitis bacterial Diseases 0.000 description 1
- 241000700647 Variola virus Species 0.000 description 1
- 102100033177 Vascular endothelial growth factor receptor 2 Human genes 0.000 description 1
- 241000545067 Venus Species 0.000 description 1
- 241001416177 Vicugna pacos Species 0.000 description 1
- 102100020676 Visual system homeobox 2 Human genes 0.000 description 1
- 201000007096 Vulvovaginal Candidiasis Diseases 0.000 description 1
- 206010064899 Vulvovaginal mycotic infection Diseases 0.000 description 1
- 102000040856 WT1 Human genes 0.000 description 1
- 101150084041 WT1 gene Proteins 0.000 description 1
- 102100039490 X antigen family member 1 Human genes 0.000 description 1
- 102000056014 X-linked Nuclear Human genes 0.000 description 1
- 108700042462 X-linked Nuclear Proteins 0.000 description 1
- 102100033220 Xanthine oxidase Human genes 0.000 description 1
- 108010093894 Xanthine oxidase Proteins 0.000 description 1
- UBORTCNDUKBEOP-HAVMAKPUSA-N Xanthosine Natural products O[C@@H]1[C@H](O)[C@H](CO)O[C@H]1N1C(NC(=O)NC2=O)=C2N=C1 UBORTCNDUKBEOP-HAVMAKPUSA-N 0.000 description 1
- 102100032726 Y+L amino acid transporter 1 Human genes 0.000 description 1
- 208000003152 Yellow Fever Diseases 0.000 description 1
- 102100026419 Zinc finger FYVE domain-containing protein 26 Human genes 0.000 description 1
- 102100023140 Zinc transporter ZIP4 Human genes 0.000 description 1
- 241000193458 [Clostridium] aminophilum Species 0.000 description 1
- 241001531188 [Eubacterium] rectale Species 0.000 description 1
- 238000009825 accumulation Methods 0.000 description 1
- 230000009471 action Effects 0.000 description 1
- 239000000443 aerosol Substances 0.000 description 1
- 230000029936 alkylation Effects 0.000 description 1
- 238000005804 alkylation reaction Methods 0.000 description 1
- 108010004469 allophycocyanin Proteins 0.000 description 1
- HMFHBZSHGGEWLO-UHFFFAOYSA-N alpha-D-Furanose-Ribose Natural products OCC1OC(O)C(O)C1O HMFHBZSHGGEWLO-UHFFFAOYSA-N 0.000 description 1
- 108010009380 alpha-N-acetyl-D-glucosaminidase Proteins 0.000 description 1
- 239000012080 ambient air Substances 0.000 description 1
- 210000004381 amniotic fluid Anatomy 0.000 description 1
- 201000007538 anal carcinoma Diseases 0.000 description 1
- 108010080146 androgen receptors Proteins 0.000 description 1
- 208000007502 anemia Diseases 0.000 description 1
- 238000000137 annealing Methods 0.000 description 1
- 230000000844 anti-bacterial effect Effects 0.000 description 1
- 230000001745 anti-biotin effect Effects 0.000 description 1
- 230000000843 anti-fungal effect Effects 0.000 description 1
- 230000000842 anti-protozoal effect Effects 0.000 description 1
- 229940125644 antibody drug Drugs 0.000 description 1
- 229940121375 antifungal agent Drugs 0.000 description 1
- 239000000427 antigen Substances 0.000 description 1
- 108091007433 antigens Proteins 0.000 description 1
- 102000036639 antigens Human genes 0.000 description 1
- 239000003904 antiprotozoal agent Substances 0.000 description 1
- 244000309743 astrovirus Species 0.000 description 1
- 229940065181 bacillus anthracis Drugs 0.000 description 1
- 201000009904 bacterial meningitis Diseases 0.000 description 1
- 208000033847 bacterial urinary tract infection Diseases 0.000 description 1
- 108010005774 beta-Galactosidase Proteins 0.000 description 1
- 230000003115 biocidal effect Effects 0.000 description 1
- 230000005540 biological transmission Effects 0.000 description 1
- 239000000090 biomarker Substances 0.000 description 1
- 210000000601 blood cell Anatomy 0.000 description 1
- 210000003103 bodily secretion Anatomy 0.000 description 1
- 210000004899 c-terminal region Anatomy 0.000 description 1
- 102100037490 cAMP-dependent protein kinase type I-alpha regulatory subunit Human genes 0.000 description 1
- 201000003984 candidiasis Diseases 0.000 description 1
- 210000000234 capsid Anatomy 0.000 description 1
- 230000001364 causal effect Effects 0.000 description 1
- 230000022131 cell cycle Effects 0.000 description 1
- 230000001413 cellular effect Effects 0.000 description 1
- 208000019065 cervical carcinoma Diseases 0.000 description 1
- 239000002738 chelating agent Substances 0.000 description 1
- 238000007385 chemical modification Methods 0.000 description 1
- 239000003638 chemical reducing agent Substances 0.000 description 1
- 230000001684 chronic effect Effects 0.000 description 1
- 239000011035 citrine Substances 0.000 description 1
- 238000004140 cleaning Methods 0.000 description 1
- 208000029742 colonic neoplasm Diseases 0.000 description 1
- 238000010835 comparative analysis Methods 0.000 description 1
- 230000009850 completed effect Effects 0.000 description 1
- 150000001875 compounds Chemical class 0.000 description 1
- 238000012790 confirmation Methods 0.000 description 1
- 239000000356 contaminant Substances 0.000 description 1
- 238000011359 convalescent plasma therapy Methods 0.000 description 1
- 108010007169 creatine transporter Proteins 0.000 description 1
- 208000017763 cutaneous neuroendocrine carcinoma Diseases 0.000 description 1
- 230000001351 cycling effect Effects 0.000 description 1
- 210000002726 cyst fluid Anatomy 0.000 description 1
- UHDGCWIWMRVCDJ-ZAKLUEHWSA-N cytidine Chemical compound O=C1N=C(N)C=CN1[C@H]1[C@H](O)[C@@H](O)[C@H](CO)O1 UHDGCWIWMRVCDJ-ZAKLUEHWSA-N 0.000 description 1
- 229940104302 cytosine Drugs 0.000 description 1
- 230000007812 deficiency Effects 0.000 description 1
- 239000000412 dendrimer Substances 0.000 description 1
- 229920000736 dendritic polymer Polymers 0.000 description 1
- 230000027832 depurination Effects 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 238000003745 diagnosis Methods 0.000 description 1
- 238000002405 diagnostic procedure Methods 0.000 description 1
- 238000000502 dialysis Methods 0.000 description 1
- ZPTBLXKRQACLCR-XVFCMESISA-N dihydrouridine Chemical compound O[C@@H]1[C@H](O)[C@@H](CO)O[C@H]1N1C(=O)NC(=O)CC1 ZPTBLXKRQACLCR-XVFCMESISA-N 0.000 description 1
- 238000006471 dimerization reaction Methods 0.000 description 1
- 241001493065 dsRNA viruses Species 0.000 description 1
- 239000010976 emerald Substances 0.000 description 1
- 229910052876 emerald Inorganic materials 0.000 description 1
- 229940119563 enterobacter cloacae Drugs 0.000 description 1
- 229940032049 enterococcus faecalis Drugs 0.000 description 1
- 108010087914 epidermal growth factor receptor VIII Proteins 0.000 description 1
- 201000004101 esophageal cancer Diseases 0.000 description 1
- 230000001747 exhibiting effect Effects 0.000 description 1
- 210000003608 fece Anatomy 0.000 description 1
- 210000003754 fetus Anatomy 0.000 description 1
- 238000001914 filtration Methods 0.000 description 1
- MHMNJMPURVTYEJ-UHFFFAOYSA-N fluorescein-5-isothiocyanate Chemical compound O1C(=O)C2=CC(N=C=S)=CC=C2C21C1=CC=C(O)C=C1OC1=CC(O)=CC=C21 MHMNJMPURVTYEJ-UHFFFAOYSA-N 0.000 description 1
- 108010021843 fluorescent protein 583 Proteins 0.000 description 1
- 231100000221 frame shift mutation induction Toxicity 0.000 description 1
- 230000037433 frameshift Effects 0.000 description 1
- 125000002446 fucosyl group Chemical group C1([C@@H](O)[C@H](O)[C@H](O)[C@@H](O1)C)* 0.000 description 1
- 201000010056 fungal meningitis Diseases 0.000 description 1
- 230000004927 fusion Effects 0.000 description 1
- 206010017758 gastric cancer Diseases 0.000 description 1
- 230000007274 generation of a signal involved in cell-cell signaling Effects 0.000 description 1
- 238000010362 genome editing Methods 0.000 description 1
- 210000004602 germ cell Anatomy 0.000 description 1
- 201000006592 giardiasis Diseases 0.000 description 1
- 208000005017 glioblastoma Diseases 0.000 description 1
- 239000008103 glucose Substances 0.000 description 1
- 150000004676 glycans Chemical class 0.000 description 1
- PCHJSUWPFVWCPO-UHFFFAOYSA-N gold Chemical compound [Au] PCHJSUWPFVWCPO-UHFFFAOYSA-N 0.000 description 1
- 229910052737 gold Inorganic materials 0.000 description 1
- 239000010931 gold Substances 0.000 description 1
- 229940047650 haemophilus influenzae Drugs 0.000 description 1
- 208000014951 hematologic disease Diseases 0.000 description 1
- 208000018706 hematopoietic system disease Diseases 0.000 description 1
- 230000002440 hepatic effect Effects 0.000 description 1
- 206010073071 hepatocellular carcinoma Diseases 0.000 description 1
- 231100000844 hepatocellular carcinoma Toxicity 0.000 description 1
- 239000000710 homodimer Substances 0.000 description 1
- 102000047486 human GAPDH Human genes 0.000 description 1
- 244000005702 human microbiome Species 0.000 description 1
- 235000020256 human milk Nutrition 0.000 description 1
- 210000004251 human milk Anatomy 0.000 description 1
- 125000002887 hydroxy group Chemical group [H]O* 0.000 description 1
- 239000012274 immune-checkpoint protein inhibitor Substances 0.000 description 1
- 230000001939 inductive effect Effects 0.000 description 1
- 208000037799 influenza C Diseases 0.000 description 1
- 230000000977 initiatory effect Effects 0.000 description 1
- 229960003786 inosine Drugs 0.000 description 1
- 238000011835 investigation Methods 0.000 description 1
- 108010028309 kalinin Proteins 0.000 description 1
- 108010024383 kallikrein 4 Proteins 0.000 description 1
- 230000001535 kindling effect Effects 0.000 description 1
- 108010008094 laminin alpha 3 Proteins 0.000 description 1
- 239000004816 latex Substances 0.000 description 1
- 229920000126 latex Polymers 0.000 description 1
- 231100000225 lethality Toxicity 0.000 description 1
- 239000007788 liquid Substances 0.000 description 1
- 201000007270 liver cancer Diseases 0.000 description 1
- 208000014018 liver neoplasm Diseases 0.000 description 1
- 244000144972 livestock Species 0.000 description 1
- 230000033001 locomotion Effects 0.000 description 1
- 210000002751 lymph Anatomy 0.000 description 1
- 210000004880 lymph fluid Anatomy 0.000 description 1
- 230000000527 lymphocytic effect Effects 0.000 description 1
- 102100040700 mRNA export factor GLE1 Human genes 0.000 description 1
- 208000015486 malignant pancreatic neoplasm Diseases 0.000 description 1
- 208000026037 malignant tumor of neck Diseases 0.000 description 1
- 238000005259 measurement Methods 0.000 description 1
- 208000015688 methicillin-resistant staphylococcus aureus infectious disease Diseases 0.000 description 1
- 229960003085 meticillin Drugs 0.000 description 1
- 238000012737 microarray-based gene expression Methods 0.000 description 1
- 244000000010 microbial pathogen Species 0.000 description 1
- 208000008588 molluscum contagiosum Diseases 0.000 description 1
- 238000012544 monitoring process Methods 0.000 description 1
- 239000000178 monomer Substances 0.000 description 1
- 238000012243 multiplex automated genomic engineering Methods 0.000 description 1
- 210000003928 nasal cavity Anatomy 0.000 description 1
- 201000009240 nasopharyngitis Diseases 0.000 description 1
- 201000011216 nasopharynx carcinoma Diseases 0.000 description 1
- 238000006386 neutralization reaction Methods 0.000 description 1
- QJGQUHMNIGDVPM-UHFFFAOYSA-N nitrogen group Chemical group [N] QJGQUHMNIGDVPM-UHFFFAOYSA-N 0.000 description 1
- 108091027963 non-coding RNA Proteins 0.000 description 1
- 102000042567 non-coding RNA Human genes 0.000 description 1
- 208000002154 non-small cell lung carcinoma Diseases 0.000 description 1
- 238000010606 normalization Methods 0.000 description 1
- 239000002853 nucleic acid probe Substances 0.000 description 1
- 238000001668 nucleic acid synthesis Methods 0.000 description 1
- 150000003833 nucleoside derivatives Chemical class 0.000 description 1
- 208000003177 ocular onchocerciasis Diseases 0.000 description 1
- 229920001542 oligosaccharide Polymers 0.000 description 1
- 150000002482 oligosaccharides Chemical class 0.000 description 1
- 238000012634 optical imaging Methods 0.000 description 1
- 208000020668 oropharyngeal carcinoma Diseases 0.000 description 1
- 210000003300 oropharynx Anatomy 0.000 description 1
- 201000006958 oropharynx cancer Diseases 0.000 description 1
- VSZGPKBBMSAYNT-RRFJBIMHSA-N oseltamivir Chemical compound CCOC(=O)C1=C[C@@H](OC(CC)CC)[C@H](NC(C)=O)[C@@H](N)C1 VSZGPKBBMSAYNT-RRFJBIMHSA-N 0.000 description 1
- 229960003752 oseltamivir Drugs 0.000 description 1
- 210000003254 palate Anatomy 0.000 description 1
- 201000002528 pancreatic cancer Diseases 0.000 description 1
- 208000008443 pancreatic carcinoma Diseases 0.000 description 1
- 239000000123 paper Substances 0.000 description 1
- 230000008506 pathogenesis Effects 0.000 description 1
- 230000001575 pathological effect Effects 0.000 description 1
- 208000030940 penile carcinoma Diseases 0.000 description 1
- 201000008174 penis carcinoma Diseases 0.000 description 1
- NBIIXXVUZAFLBC-UHFFFAOYSA-K phosphate Chemical compound [O-]P([O-])([O-])=O NBIIXXVUZAFLBC-UHFFFAOYSA-K 0.000 description 1
- 230000010287 polarization Effects 0.000 description 1
- 229920001481 poly(stearyl methacrylate) Polymers 0.000 description 1
- 102000054765 polymorphisms of proteins Human genes 0.000 description 1
- 108040000983 polyphosphate:AMP phosphotransferase activity proteins Proteins 0.000 description 1
- 229920001282 polysaccharide Polymers 0.000 description 1
- 239000005017 polysaccharide Substances 0.000 description 1
- 244000144977 poultry Species 0.000 description 1
- 238000007781 pre-processing Methods 0.000 description 1
- 238000002360 preparation method Methods 0.000 description 1
- 210000001236 prokaryotic cell Anatomy 0.000 description 1
- 230000035755 proliferation Effects 0.000 description 1
- 208000017497 prostate disease Diseases 0.000 description 1
- 229940007042 proteus vulgaris Drugs 0.000 description 1
- 108010078587 pseudouridylate synthetase Proteins 0.000 description 1
- 239000013635 pyrimidine dimer Substances 0.000 description 1
- 102000005912 ran GTP Binding Protein Human genes 0.000 description 1
- 108010005597 ran GTP Binding Protein Proteins 0.000 description 1
- 238000012124 rapid diagnostic test Methods 0.000 description 1
- 102000016914 ras Proteins Human genes 0.000 description 1
- 108010014186 ras Proteins Proteins 0.000 description 1
- 238000013102 re-test Methods 0.000 description 1
- 230000006798 recombination Effects 0.000 description 1
- 238000005215 recombination Methods 0.000 description 1
- 206010038038 rectal cancer Diseases 0.000 description 1
- 201000001275 rectum cancer Diseases 0.000 description 1
- 230000000306 recurrent effect Effects 0.000 description 1
- 230000001105 regulatory effect Effects 0.000 description 1
- 238000012958 reprocessing Methods 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 208000023504 respiratory system disease Diseases 0.000 description 1
- 230000004044 response Effects 0.000 description 1
- 102220083974 rs863224729 Human genes 0.000 description 1
- 101150071322 ruvC gene Proteins 0.000 description 1
- 238000005464 sample preparation method Methods 0.000 description 1
- 229910052594 sapphire Inorganic materials 0.000 description 1
- 239000010980 sapphire Substances 0.000 description 1
- 208000005687 scabies Diseases 0.000 description 1
- 230000000405 serological effect Effects 0.000 description 1
- 210000003491 skin Anatomy 0.000 description 1
- 201000000849 skin cancer Diseases 0.000 description 1
- 210000000813 small intestine Anatomy 0.000 description 1
- 238000005063 solubilization Methods 0.000 description 1
- 230000007928 solubilization Effects 0.000 description 1
- 238000003892 spreading Methods 0.000 description 1
- 201000011549 stomach cancer Diseases 0.000 description 1
- 208000022218 streptococcal pneumonia Diseases 0.000 description 1
- 229940031000 streptococcus pneumoniae Drugs 0.000 description 1
- FRGKKTITADJNOE-UHFFFAOYSA-N sulfanyloxyethane Chemical compound CCOS FRGKKTITADJNOE-UHFFFAOYSA-N 0.000 description 1
- 239000006228 supernatant Substances 0.000 description 1
- 230000009182 swimming Effects 0.000 description 1
- 208000011580 syndromic disease Diseases 0.000 description 1
- 102100029783 tRNA pseudouridine synthase A Human genes 0.000 description 1
- 101150047061 tag-72 gene Proteins 0.000 description 1
- 108010057210 telomerase RNA Proteins 0.000 description 1
- BWMISRWJRUSYEX-SZKNIZGXSA-N terbinafine hydrochloride Chemical compound Cl.C1=CC=C2C(CN(C\C=C\C#CC(C)(C)C)C)=CC=CC2=C1 BWMISRWJRUSYEX-SZKNIZGXSA-N 0.000 description 1
- 201000003120 testicular cancer Diseases 0.000 description 1
- ABZLKHKQJHEPAX-UHFFFAOYSA-N tetramethylrhodamine Chemical compound C=12C=CC(N(C)C)=CC2=[O+]C2=CC(N(C)C)=CC=C2C=1C1=CC=CC=C1C([O-])=O ABZLKHKQJHEPAX-UHFFFAOYSA-N 0.000 description 1
- 201000002510 thyroid cancer Diseases 0.000 description 1
- 201000004647 tinea pedis Diseases 0.000 description 1
- 239000011031 topaz Substances 0.000 description 1
- 229910052853 topaz Inorganic materials 0.000 description 1
- 238000012546 transfer Methods 0.000 description 1
- 108010058734 transglutaminase 1 Proteins 0.000 description 1
- 238000013519 translation Methods 0.000 description 1
- 230000005945 translocation Effects 0.000 description 1
- 230000032258 transport Effects 0.000 description 1
- 125000002264 triphosphate group Chemical class [H]OP(=O)(O[H])OP(=O)(O[H])OP(=O)(O[H])O* 0.000 description 1
- 201000008827 tuberculosis Diseases 0.000 description 1
- 208000029729 tumor suppressor gene on chromosome 11 Diseases 0.000 description 1
- 210000001944 turbinate Anatomy 0.000 description 1
- 231100000397 ulcer Toxicity 0.000 description 1
- 241001515965 unidentified phage Species 0.000 description 1
- 210000000626 ureter Anatomy 0.000 description 1
- 201000005112 urinary bladder cancer Diseases 0.000 description 1
- 208000019206 urinary tract infection Diseases 0.000 description 1
- 206010046766 uterine cancer Diseases 0.000 description 1
- 201000002498 viral encephalitis Diseases 0.000 description 1
- 201000001862 viral hepatitis Diseases 0.000 description 1
- 239000012855 volatile organic compound Substances 0.000 description 1
- 239000002699 waste material Substances 0.000 description 1
- 244000000028 waterborne pathogen Species 0.000 description 1
- 229940075420 xanthine Drugs 0.000 description 1
- UBORTCNDUKBEOP-UUOKFMHZSA-N xanthosine Chemical compound O[C@@H]1[C@H](O)[C@@H](CO)O[C@H]1N1C(NC(=O)NC2=O)=C2N=C1 UBORTCNDUKBEOP-UUOKFMHZSA-N 0.000 description 1
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/70—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving virus or bacteriophage
- C12Q1/701—Specific hybridization probes
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/20—Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B30/10—Sequence alignment; Homology search
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/10—Signal processing, e.g. from mass spectrometry [MS] or from PCR
Definitions
- This specification describes technologies generally relating to detection and discrimination of genetic mutations in a biological sample.
- programmable nuclease-based (e.g., CRISPR-Cas-based)
- amplification-based assays for recognition of mutated sequences, such as in SARS-CoV-2 variants.
- mutations in the spike protein, which binds to the human ACE2 receptor, can render the SARS-CoV-2 virus more infectious and/or more resistant to antibody neutralization, resulting in increased transmissibility and/or escape from immunity, whether vaccine-mediated or naturally acquired.
- Variant identification can also be clinically significant, as some mutations substantially reduce the effectiveness of available monoclonal antibody therapies for the COVID-19 disease.
- compositions, computing systems, methods, and non-transitory computer readable storage mediums for addressing the above identified problems are provided in the present disclosure.
- the compositions, systems, and methods described herein satisfy the abovementioned needs, among others, and provide related advantages.
- Some advantages of the assays, compositions, computing systems, methods, and non-transitory computer readable storage mediums described herein for use in laboratory and point of care settings include low cost, minimal instrumentation, and a sample-to-answer turnaround time of under 2 hours.
- one aspect of the present disclosure provides a method for determining a mutation call for a target locus in a biological sample, at a computer system having one or more processors, and memory storing one or more programs for execution by the one or more processors.
- the method includes obtaining a signal dataset comprising, for each respective well in a plurality of wells in a common plate, a corresponding plurality of reporting signals for a respective one or more nucleic acid molecules in the biological sample that map to the target locus.
- the signal dataset represents a plurality of time points.
- the plurality of wells comprises a first set of wells representing a first allele for the target locus, where each well in the first set of wells includes a first plurality of guide nucleic acids that have the first allele of the target locus.
- the plurality of wells further comprises a second set of wells representing a second allele for the target locus, where each well in the second set of wells includes a second plurality of guide nucleic acids that have the second allele of the target locus.
- Each corresponding plurality of reporting signals comprises, for each respective time point in the plurality of time points, a respective reporting signal in the form of a corresponding discrete attribute value, and each respective well in the first set of wells and each respective well in the second set of wells contains a corresponding aliquot of nucleic acid derived from the biological sample.
- a corresponding signal yield for the respective well is determined using the corresponding plurality of reporting signals for the respective well across the plurality of time points.
- a respective candidate call identity is determined based on a comparison between a corresponding first signal yield for the respective well in the first set of wells and a corresponding second signal yield for a corresponding well in the second set of wells, thereby obtaining a plurality of candidate call identities.
- a voting procedure is performed across the plurality of candidate call identities for the first set of wells, thereby obtaining a mutation call for the target locus.
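The three steps above (per-well signal yield, pairwise candidate call, vote) can be sketched as follows. This is a minimal illustration, not the claimed implementation: the yield definition (last reading minus first reading), the `min_yield` and `ratio` thresholds, the strict-majority rule, and all function names are assumptions introduced here, since the text above does not fix them.

```python
from collections import Counter


def signal_yield(signals):
    """Assumed yield: rise in signal from the first to the last time point."""
    return signals[-1] - signals[0]


def candidate_call(first_yield, second_yield, min_yield=100.0, ratio=2.0):
    """Compare the yields of a paired first-allele well and second-allele well.

    min_yield and ratio are hypothetical thresholds chosen for illustration.
    """
    a = max(first_yield, 0.0)
    b = max(second_yield, 0.0)
    if max(a, b) < min_yield:
        return "no_call"  # neither well produced a usable signal
    if a >= ratio * b:
        return "first_allele"
    if b >= ratio * a:
        return "second_allele"
    return "no_call"  # yields too similar to discriminate


def vote(calls):
    """Assumed voting rule: a strict majority of candidate calls wins."""
    if not calls:
        return "no_call"
    top, count = Counter(calls).most_common(1)[0]
    return top if count > len(calls) / 2 else "no_call"


def mutation_call(first_wells, second_wells):
    """Pair wells by position, call each pair, then vote across the pairs."""
    calls = [
        candidate_call(signal_yield(f), signal_yield(s))
        for f, s in zip(first_wells, second_wells)
    ]
    return vote(calls)
```

Each element of `first_wells` and `second_wells` is the list of reporting signals for one well, ordered by time point. Pairing wells by position is one possible reading of "a corresponding well in the second set of wells".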
- the signal dataset is obtained by a procedure comprising amplifying a first plurality of nucleic acids derived from the biological sample, thereby generating a plurality of amplified nucleic acids.
- the method includes partitioning, from the plurality of amplified nucleic acids, the respective corresponding aliquot of nucleic acid; contacting the respective corresponding aliquot of nucleic acid with at least a programmable nuclease, a guide nucleic acid, and one or more reporters; and cleaving the one or more reporters using the programmable nuclease if the first allele or the second allele, respectively, is present, thereby generating a reporting signal.
- the method further comprises, for each candidate second allele in a plurality of candidate second alleles, repeating the process of obtaining, determining, determining, and performing, thereby obtaining a plurality of candidate mutation calls for the target locus, and performing a mutation call voting procedure across the plurality of candidate mutation calls for the target locus, thereby obtaining a final mutation call for the target locus.
- the first allele is a wild-type allele and each respective candidate second allele in the plurality of candidate second alleles is a different respective mutant allele for the target locus.
- the respective candidate mutation call is determined as the final mutation call for the target locus.
- the obtaining the final mutation call for the target locus is based upon a comparison of signal yields for each pair of candidate second alleles in the sub-plurality of candidate second alleles.
- Another aspect of the present disclosure provides a method for determining a mutation call for a target locus in a biological sample.
- the method comprises, using a computer system having one or more processors, and memory storing one or more programs for execution by the one or more processors, obtaining a signal dataset comprising, for each respective well in a plurality of wells in a common plate, a corresponding plurality of reporting signals for a respective one or more nucleic acid molecules in the biological sample that map to the target locus.
- the signal dataset represents a plurality of time points.
- the plurality of wells comprises a first set of wells representing a first allele for the target locus, where each well in the first set of wells includes a first plurality of guide nucleic acids that have the first allele of the target locus. Further still, the plurality of wells also comprises, for each respective candidate second allele in a set of candidate second alleles, a respective additional set of wells representing the respective candidate second allele for the target locus, where each well in this respective additional set of wells includes a corresponding plurality of guide nucleic acids that have the respective candidate second allele for the target locus.
- Each corresponding plurality of reporting signals comprises, for each respective time point in the plurality of time points, a respective reporting signal in the form of a corresponding discrete attribute value. Also, each respective well in the first set of wells and each respective well in each respective additional set of wells contains a corresponding aliquot of nucleic acid derived from the biological sample. The method continues by determining, for each respective well in the plurality of wells, a corresponding signal yield for the respective well using the corresponding plurality of reporting signals for the respective well across the plurality of time points.
- the method further determines, for each respective well in the first set of wells, for each respective candidate second allele in the set of candidate second alleles, a respective candidate call identity based on a comparison between a corresponding first signal yield for the respective well in the first set of wells and a corresponding respective second signal yield for a corresponding well in the respective additional set of wells for the respective candidate second allele, thereby obtaining a plurality of candidate call identities.
- a voting procedure is then performed across the plurality of candidate call identities, thereby obtaining a mutation call for the target locus that is one of the candidate second alleles in the set of candidate second alleles or the first allele.
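A sketch of this multi-allele form, in which every first-allele well is compared against the matching well for each candidate second allele and a single vote is taken over all resulting candidate call identities. The fold-change rule, the thresholds, the exclusion of "no_call" entries from the vote, the tie handling, and the names used are assumptions for illustration only. Yields are taken as already computed and non-negative.

```python
from collections import Counter


def call_locus(first_yields, candidate_yields, min_yield=100.0, ratio=2.0):
    """Return the first allele, one candidate second allele, or "no_call".

    first_yields: list of signal yields, one per first-allele well.
    candidate_yields: dict mapping each candidate second allele to a list of
        yields for its additional set of wells (same order as first_yields).
    """
    identities = []
    for allele, yields in candidate_yields.items():
        for y_first, y_cand in zip(first_yields, yields):
            if max(y_first, y_cand) < min_yield:
                identities.append("no_call")
            elif y_cand >= ratio * y_first:
                identities.append(allele)
            elif y_first >= ratio * y_cand:
                identities.append("first")
            else:
                identities.append("no_call")

    # Assumed rule: uninformative comparisons do not vote.
    votes = Counter(i for i in identities if i != "no_call")
    ranked = votes.most_common(2)
    if not ranked:
        return "no_call"
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return "no_call"  # tie between two identities
    return ranked[0][0]
```

For example, with first-allele yields `[10, 12, 11]` and candidate yields `{"E484K": [900, 850, 880], "E484Q": [15, 9, 14]}`, only the E484K comparisons are informative, so the sketch returns `"E484K"`.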
- the first allele is a wild-type allele and each respective candidate second allele in the set of candidate second alleles is a different respective mutant allele for the target locus. In some such embodiments, the first allele is other than a wild-type allele. In some such embodiments, the set of candidate second alleles consists of between 2 and 10 candidate second alleles. In some such embodiments the set of candidate second alleles consists of a single second allele.
- Another aspect of the present disclosure provides a computer system having one or more processors, and memory storing one or more programs for execution by the one or more processors, the one or more programs comprising instructions for performing any of the methods and/or embodiments disclosed above.
- Another aspect of the present disclosure provides a non-transitory computer readable storage medium storing one or more programs configured for execution by a computer, the one or more programs comprising instructions for performing any of the methods and/or embodiments disclosed above.
- Another aspect of the present disclosure provides a CRISPR-based COVID-19 variant DETECTR® assay (henceforth abbreviated as DETECTR assay) for the detection of SARS-CoV-2 mutations.
- the assay combines RT-LAMP pre-amplification with fluorescent detection using a CRISPR-Cas12 enzyme.
- in a comparative evaluation of multiple candidate Cas12 enzymes, robust assay performance was found with a CRISPR-Cas12 enzyme called CasDx1, which had high specificity in identifying key SNP mutations of functional relevance in the spike protein at amino acid positions 452, 484, and 501.
- the disclosure provides a method of assaying for a SARS-CoV-2 variant in an individual, the method comprising: collecting a nasal swab or a throat swab from the individual; optionally extracting a target nucleic acid comprising a segment of a SARS-CoV-2 Spike (“S”) gene from the nasal swab or the throat swab; amplifying the target nucleic acid to produce an amplification product, wherein amplifying the target nucleic acid comprises contacting the target nucleic acid to a plurality of LAMP amplification primers; contacting the amplification product, the target nucleic acid, or a combination thereof, to a composition comprising (i) an effector protein and (ii) a guide nucleic acid comprising a nucleotide sequence that is at least 90% identical to any one of SEQ ID NOS: 1 to 6, 22 to 27, or 40-42; assaying for a change in a signal produced by clea
- the nasal swab is a nasopharyngeal swab. In some embodiments, the throat swab is an oropharyngeal swab.
- amplifying the target nucleic acid comprises contacting the target nucleic acid to at least one reagent for amplification. In some embodiments, the amplifying comprises loop mediated amplification (LAMP). In some embodiments, the at least one reagent for amplification comprises a polymerase, dNTPs, or a combination thereof.
- LAMP: loop mediated amplification
- the plurality of LAMP amplification primers comprise an FIP primer, a BIP primer, an F3 primer, a B3 primer, an LF primer, and an LB primer.
- the amplification primers comprise SEQ ID NOS: 8-13, SEQ ID NOS: 14-19, SEQ ID NOS: 28-33, and/or SEQ ID NOS: 34-39.
- the amplifying comprises reverse transcription-LAMP.
- the method further comprises lysing the sample.
- lysing the sample comprises contacting the sample to a lysis buffer.
- determining an SNP call comprises determining whether the segment of the S-gene comprises one or more S-gene mutation(s) relative to a reference wild-type SARS-CoV-2 S-gene.
- the reference wild-type SARS-CoV-2 gene is from a SARS-CoV-2-Wuhan-Hu-1 sequence or the USA-WA1/2020 sequence.
- the one or more S-gene mutations is a single nucleotide polymorphism (SNP).
- the one or more S-gene mutations is associated with one or more Spike protein mutations.
- the one or more Spike protein mutations is (i) a mutation in amino acid position 484 from E to K (E484K), (ii) a mutation in amino acid position 501 from N to Y (N501Y), (iii) a mutation in amino acid position 452 from L to R (L452R), (iv) a mutation in amino acid position 484 from E to Q (E484Q), (v) a mutation in amino acid position 484 from E to A (E484A), or a combination thereof.
- determining an SNP call of the sample comprises comparing the signal produced by the cleavage of the reporter nucleic acid by a composition comprising a guide nucleic acid comprising a nucleotide sequence that is at least 90% identical to any one of SEQ ID NOS: 1-3 or 22-24 to a signal produced by contacting a composition comprising a guide nucleic acid comprising a nucleotide sequence that is at least 90% identical to any one of SEQ ID NOS: 4-6, 25-27, or 40-42 to a nucleic acid identical to the target nucleic acid.
- the method comprises determining a variant call of the sample.
- determining the variant call comprises determining whether the sample comprises a wild-type SARS-CoV-2 or a SARS-CoV-2 variant.
- the SARS-CoV-2 variant is any one of an Alpha, Beta, Gamma, Delta, Epsilon, Kappa, Omicron, or Zeta SARS-CoV-2 variant.
- determining the variant call comprises determining one or more SNP calls of the sample.
- determining whether the sample comprises an Alpha SARS-CoV-2 variant comprises detecting an S-gene mutation associated with an N501Y mutation.
- determining whether the sample comprises a Beta SARS-CoV-2 variant comprises detecting S-gene mutations associated with E484K and N501Y mutations. In some embodiments, determining whether the sample comprises a Mu SARS-CoV-2 variant comprises detecting S-gene mutations associated with E484K and N501Y mutations. In some embodiments, determining whether the sample comprises a Gamma SARS-CoV-2 variant comprises detecting S-gene mutations associated with E484K and N501Y mutations. In some embodiments, determining whether the sample comprises a Delta SARS-CoV-2 variant comprises detecting an S-gene mutation associated with an L452R mutation.
- determining whether the sample comprises an Epsilon SARS-CoV-2 variant comprises detecting an S-gene mutation associated with an L452R mutation. In some embodiments, determining whether the sample comprises a Kappa SARS-CoV-2 variant comprises detecting an S-gene mutation associated with an L452R mutation. In some embodiments, determining whether the sample comprises an Omicron SARS-CoV-2 variant comprises detecting an S-gene mutation associated with an E484A mutation. In some embodiments, determining whether the sample comprises a Zeta SARS-CoV-2 variant comprises detecting an S-gene mutation associated with an E484K mutation. In some embodiments, the effector protein is a Type V Cas effector protein.
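- The SNP-to-variant associations listed in the variant-call embodiments above can be collected into a simple lookup. The following is a minimal sketch, not the disclosed implementation: the names `VARIANT_SIGNATURES` and `candidate_variants` are hypothetical, and exact-set matching is an assumption made for illustration.

```python
# Lookup built only from the SNP-to-variant associations listed above.
# VARIANT_SIGNATURES and candidate_variants are hypothetical names.
VARIANT_SIGNATURES = {
    "Alpha": {"N501Y"},
    "Beta": {"E484K", "N501Y"},
    "Mu": {"E484K", "N501Y"},
    "Gamma": {"E484K", "N501Y"},
    "Delta": {"L452R"},
    "Epsilon": {"L452R"},
    "Kappa": {"L452R"},
    "Omicron": {"E484A"},
    "Zeta": {"E484K"},
}


def candidate_variants(detected_mutations):
    """Return the variants whose listed signature equals the detected set.

    Assumption: a variant is a candidate only on an exact match; an empty
    set of detected mutations is reported as wild-type.
    """
    detected = set(detected_mutations)
    if not detected:
        return ["wild-type"]
    return sorted(
        name for name, signature in VARIANT_SIGNATURES.items()
        if signature == detected
    )
```

- Under these assumptions, `candidate_variants(["E484K", "N501Y"])` returns Beta, Gamma, and Mu together, which reflects that the listed mutations alone do not separate those three variants.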
- the Type V effector protein is a Cas12 effector protein, a Cas13 effector protein, a Cas14 effector protein, or a CasPhi effector protein.
- the guide nucleic acid comprises a nucleotide sequence that is at least 95% identical to one of SEQ ID NO: 1 to 6, 22 to 27, or 40 to 42. In some embodiments, the guide nucleic acid comprises a nucleotide sequence that is at least 98% identical to one of SEQ ID NO: 1 to 6, 22 to 27, or 40 to 42. In some embodiments, the guide nucleic acid comprises a nucleotide sequence of one of SEQ ID NO: 1 to 6, 22 to 27, or 40 to 42.
- the reporter nucleic acid comprises a nucleotide sequence that is at least 75% identical to SEQ ID NO: 7, wherein the nucleotide sequence is flanked with a fluorescent dye on the 5’ end and a quencher on the 3’ end. In some embodiments, the reporter nucleic acid has a nucleotide sequence of SEQ ID NO: 7 flanked with a fluorescent dye on the 5’ end and a quencher on the 3’ end.
- FIG. 1 is an example block diagram illustrating a computing device and related data structures used by the computing device in accordance with some implementations of the present disclosure.
- FIGS. 2A, 2B, 2C, 2D, 2E, 2F, and 2G illustrate an example method in accordance with an embodiment of the present disclosure, in which optional steps are indicated by broken lines.
- FIG. 3 illustrates an example schematic for a method of determining a mutation call for a target locus in a sample using programmable nucleases, in accordance with some embodiments of the present disclosure.
- FIGS. 4A, 4B, and 4C collectively illustrate a method for determining a mutation call for a target locus in a sample, in accordance with some embodiments of the present disclosure.
- FIGS. 5A, 5B, and 5C collectively illustrate identification of differentiated phenotypes called as wild-type, mutated, and no-call in a signal dataset, in accordance with an embodiment of the present disclosure.
- Figure 5A provides an allele discrimination plot visualizing the signal yields obtained from a COVID Variant programmable nuclease-based assay on gene fragments. The allele discrimination plots represent scatter plots of scaled WT and MUT fluorescence values plotted against each other.
- Figure 5B shows Mean Average (MA) plots of the COVID Variant programmable nuclease-based assay data on gene fragments to decrease ambiguity of the signal yields.
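- The text does not define how the MA values are computed. As a hedged illustration, the sketch below assumes the conventional MA-plot transform, in which M is the log2 ratio of two signals and A is the mean of their log2 values. The function name `ma_transform` and the choice of MUT over WT for the ratio are assumptions.

```python
import math


def ma_transform(wt_signal, mut_signal):
    """Return (M, A) for one pair of positive scaled WT and MUT signals.

    Assumed convention: M = log2(MUT) - log2(WT), A = mean of the two log2
    values. Both inputs must be strictly positive.
    """
    log_wt = math.log2(wt_signal)
    log_mut = math.log2(mut_signal)
    return log_mut - log_wt, 0.5 * (log_mut + log_wt)
```

- With this convention, a sample whose MUT signal exceeds its WT signal has a positive M, and a sample with equal signals has M equal to zero.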
- FIGS. 6A, 6B, 6C, 6D, 6E, 6F, 6G, 6H, 6I, 6J, and 6K illustrate design and workflow for a COVID-19 Variant DETECTR® assay, in accordance with some embodiments of the present disclosure.
- FIG. 6A shows a schematic of CRISPR-Cas gRNA design for SARS-CoV-2 S gene mutations.
- FIG. 6C shows raw fluorescence curves visualizing the SNP differentiation capability of CasDx1 using synthetic gene fragments of an S-gene fragment of interest from (i) a wild-type form of the virus containing no major mutations and (ii) SARS-CoV-2 variants having mutations at amino acid positions 452, 484, and 501 relative to the wild-type virus.
- FIG. 6D shows raw fluorescence curves visualizing the SNP differentiation capability of LbCas12a (which may be referred to as “LbaCas12a”, “LbCas12”, and “LbCas12a” interchangeably) using synthetic gene fragments of an S-gene fragment of interest from (i) a wild-type form of the virus containing no major mutations and (ii) SARS-CoV-2 variants having mutations at amino acid positions 452, 484, and 501 relative to the wild-type virus.
- FIG. 6E shows raw fluorescence curves visualizing the SNP differentiation capability of AsCas12a using synthetic gene fragments of an S-gene fragment of interest from (i) a wild-type form of the virus containing no major mutations and (ii) SARS-CoV-2 variants having mutations at amino acid positions 452, 484, and 501 relative to the wild-type virus.
- FIG. 6F shows a schematic of multiplexed RT-LAMP primer design showing the SARS-CoV-2 S gene mutations and gRNA positions.
- FIG. 7 shows an example workflow comparison between the COVID Variant DETECTR® assay and SARS-CoV-2 Whole-Genome-Sequencing, in accordance with some embodiments of the present disclosure.
- FIGS. 8A, 8B, 8C, 8D, 8E, 8F, 8G, and 8H illustrate example plots obtained using a DETECTR® data analysis pipeline for SARS-CoV-2 SNP mutation calling, as illustrated in FIGS. 4A-4C, in accordance with an embodiment of the present disclosure.
- FIG. 8A shows a key providing orientation for FIGS. 8B-8H relative to each other.
- FIGS. 8B-8H show raw fluorescence RT-LAMP curves for each clinical sample.
- RT-LAMP replicates that passed quality control (QC) are represented with solid black lines and dark gray shading and failed LAMP replicates are shown with solid gray lines and light gray shading. Only valid RT-LAMP replicates were used in subsequent data analysis.
- FIGS. 9A, 9B, 9C, 9D, 9E, 9F, 9G, 9H, 9I, 9J, 9K, 9L, and 9M illustrate example plots obtained using the DETECTR® data analysis pipeline for SARS-CoV-2 SNP mutation calling illustrated in FIGS. 4A-4C and 8A-8H, in accordance with an embodiment of the present disclosure.
- FIG. 9A shows a key providing orientation for FIGS. 9B-9M relative to each other.
- FIGS. 9B-9M show raw fluorescence CasDx1 curves for each clinical sample amplified by RT-LAMP. Each clinical sample was amplified with RT-LAMP in triplicate, and the resulting amplicons were detected by CasDx1 in triplicate.
- the raw fluorescence curves show WT detection in thick black lines and MUT detection in thin gray lines.
- FIGS. 10A, 10B, 10C, 10D, 10E, and 10F illustrate example plots obtained using the DETECTR® data analysis pipeline for SARS-CoV-2 SNP mutation calling illustrated in FIGS. 4A-4C, 8A-8H, and 9A-9M, in accordance with an embodiment of the present disclosure.
- FIG. 10A shows allele discrimination plots visualizing the scaled signals from the COVID Variant DETECTR® assay on gene fragments. The allele discrimination plots represent scatter plots of scaled WT and MUT fluorescence values plotted against each other.
- FIGS. 10C-10D collectively show three representative clinical samples of different SARS-CoV-2 lineages used in the workflow of the COVID-19 Variant DETECTR® assay.
- FIG. 10C shows raw fluorescence curves of each sample run in RT-LAMP amplification and subsequent triplicate DETECTR® reactions targeting both WT and MUT SNPs for L452(R), E484(K), and N501(Y).
- FIG. 10D shows box plot visualization of the end point fluorescence in DETECTR® across each SNP for the three representative clinical samples shown in FIG. 10C. Calls were made for each SNP by evaluating the median values of the DETECTR® calls and overall calls through the LAMP replicates, and given a designation of WT, MUT, or NoCall. Final calls were made on the lineage determined by each SNP.
- Non-shaded elements represent WT and shaded elements represent MUT.
- FIG. 10F shows a schematic of a data analysis pipeline workflow describing the RT-LAMP QC and subsequent CasDx1 signal scaling. The scaled signals were compared across SNPs and the calls were made for each RT-LAMP replicate. The combined replicate calls defined the mutation call, which informed the final lineage classification.
- FIGS. 11A, 11B, 11C, and 11D illustrate an example evaluation of the COVID-19 Variant DETECTR® assay compared to SARS-CoV-2 Whole-Genome Sequencing, in accordance with an embodiment of the present disclosure.
- FIG. 11A shows determination of RT-LAMP threshold with a ROC curve. Thresholds for LAMP quality analysis were derived to determine which samples had amplified sufficiently. The exact score value for this qualitative QC metric was determined using a ROC analysis.
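- The text states that the RT-LAMP QC threshold was determined by ROC analysis but does not give the selection criterion. The sketch below is one common way to do this and is offered only as an illustration: it assumes each replicate has a numeric QC score and a known pass/fail label, and it picks the cutoff that maximises Youden's J statistic (TPR minus FPR). The function names and the choice of criterion are assumptions.

```python
def roc_points(scores, labels):
    """Return (cutoff, fpr, tpr) for each distinct score used as a cutoff.

    A replicate is predicted as amplified when score >= cutoff. labels holds
    1 for truly amplified replicates and 0 otherwise; both classes must be
    present so that the rates are defined.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and not y)
        points.append((cutoff, fp / negatives, tp / positives))
    return points


def best_threshold(scores, labels):
    """Pick the cutoff maximising Youden's J = TPR - FPR (assumed criterion)."""
    return max(roc_points(scores, labels), key=lambda p: p[2] - p[1])[0]
```

- Other criteria, such as fixing a maximum false-positive rate, would fit the same `roc_points` output; the text does not say which was used.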
- FIG. 11B illustrates a heat
- FIG. 11C shows allele discrimination plots visualizing the scaled signals from the COVID Variant DETECTR® assay on clinical samples.
- FIGS. 12A, 12B, 12C, 12D, 12E, 12F, 12G, 12H, 12I, 12J, 12K, 12L, and 12M show visualization of SNP calls by the data analysis pipeline illustrated in FIGS. 11A-11D. Box plots of all the clinical samples illustrate the spread of the scaled signals for each of the samples across the replicates in the experiment. SNP calls were made on each sample in agreement with the median values depicted on the box plot of the sample, which also provided an analytical confirmation of the DETECTR® results. WT detection is represented by shaded boxes and MUT detection is represented by non-shaded boxes.
- FIGS. 13A, 13B, 13C, 13D, 13E, 13F, 13G, 13H, 13I, and 13J illustrate evaluation of the COVID-19 Variant DETECTR® assay compared to SARS-CoV-2 Whole-Genome Sequencing as illustrated in FIGS. 11A-11D and 12A-12M, in accordance with an embodiment of the present disclosure.
- FIGS. 13A-13D collectively show raw fluorescence CasDx1 curves for the clinical samples with discordant DETECTR® and WGS results. WT detection is represented by black lines and MUT detection is represented by gray lines.
- FIG. 13E shows visualization of the COVID Variant DETECTR® and SARS-CoV-2 WGS assays showing the alignment of final calls. Across all of the clinical samples in this cohort, 80 out of the 91 clinical sample COVID Variant DETECTR® assay calls were consistent with the SARS-CoV-2 WGS calls.
- FIG. 13F shows alignment of final mutation calls comparing the COVID-19 Variant DETECTR® and SARS-CoV-2 WGS assay results across 91 clinical samples after discordant samples (indicated by asterisk) were resolved.
- FIG. 13G shows final lineage classification on each clinical sample by the COVID-19 Variant DETECTR® compared to the SARS-CoV-2 lineage determined by the viral WGS.
- FIGS. 13H-13I collectively show a summary of re-testing of discordant samples from the original clinical sample where nearly all SNP discrepancies are resolved.
- FIG. 13J shows final lineage classification on each clinical sample by the COVID-19 Variant DETECTR® compared to the SARS-CoV-2 lineage determined by the viral WGS following resolution of discordant samples.
- FIGS. 14A, 14B, 14C, and 14D collectively illustrate an overall results summary of final SNP calls by COVID-19 Variant DETECTR® assay and viral WGS, in accordance with an embodiment of the present disclosure.
- a summary table of the final SNP calls from the COVID-19 Variant DETECTR® assay and the SARS-CoV-2 whole genome sequencing assay after discordant testing is shown.
- the table includes the lineage classification from DETECTR® calls as well as the PANGO lineage and WHO labels assigned to the WGS calls.
- Ct values from running an FDA EUA-authorized SARS-CoV-2 RT-PCR assay, the TaqPath™ COVID-19 RT-PCR kit, are shown.
- FIGS. 15A, 15B, and 15C illustrate specific detection of 484 mutations, which can enable rapid Omicron identification, in accordance with an embodiment of the present disclosure.
- FIG. 15A shows a schematic of Omicron mutations within the S-gene LAMP amplicon and relative position of 484-specific gRNAs and degenerate LAMP primers.
- FIG. 15C shows alignment of final 484 mutation calls comparing the DETECTR® and SARS-CoV-2 WGS assay results across 36 clinical samples.
- FIG. 16 shows a comparison of RT-LAMP primers for processing Omicron clinical samples, in accordance with an embodiment of the present disclosure.
- Omicron clinical samples (1802), WT control RNA (1804), and Alpha control RNA (1806) were amplified using the original RT-LAMP primer set (see Table 3A) and the degenerate RT-LAMP primer set (see Table 3B) and plotted relative to a negative control (1808).
- FIGS. 17A, 17B, 17C, and 17D show a summary of results of final SNP calls by the COVID Variant DETECTR® and WGS assays, in accordance with an embodiment of the present disclosure, with reference to Example 4 below.
- FIG. 17A shows a schematic of the workflow for determining the final variant calls. If the result was an A484, K484, or Q484, the final variant call was made. If the result was an E484, the sample was reflexed to DETECTR® analysis at the 452 and 501 positions to make the variant determination.
- FIG. 17B shows an interpretation table including the specific 484 SNPs.
- FIG. 17C shows a summary table of the final SNP calls from the DETECTR® assay and the SARS-CoV-2 whole genome sequencing assay of Example 4 including the lineage classification from DETECTR® calls as well as the PANGO lineage and WHO labels assigned to the WGS calls. Ct values were obtained from the FDA EUA-authorized TaqPath™ COVID-19 RT-PCR kit.
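- The reflex workflow described for FIG. 17A can be sketched as a small routing function. This is an illustration only: the function name `route_484_result` and the returned strings are hypothetical, and the interpretation table of FIG. 17B is not reproduced here.

```python
def route_484_result(residue_484):
    """Route a sample based on the amino acid detected at Spike position 484.

    Per the workflow described for FIG. 17A: A484, K484, or Q484 leads
    directly to a final variant call; E484 is reflexed to analysis at
    positions 452 and 501. Any other input is rejected.
    """
    if residue_484 in {"A", "K", "Q"}:
        return "final variant call"
    if residue_484 == "E":
        return "reflex to 452 and 501 analysis"
    raise ValueError(f"unexpected residue at position 484: {residue_484!r}")
```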
- Tracking the evolution and spread of pathogenic variants can inform public policy regarding testing and vaccination, as well as guide contact tracing and containment efforts during local outbreaks.
- the ability to detect and discriminate genetic phenotypes such as wild-type and/or mutant sequences responsible for disease and infection facilitates a wide range of clinical and epidemiological applications including detection of infectious variants, monitoring the evolution and spread of pathogenic or intervention-resistant strains, and/or discovery of novel target sequences for intervention.
- SARS-CoV-2 variants, especially Variants of Concern (VOCs), threaten to substantially prolong the COVID-19 pandemic.
- Variant identification can also be clinically significant, as some mutations substantially reduce the effectiveness of monoclonal antibody or convalescent plasma therapies for the disease.
- CRISPR Clustered Regularly Interspaced Short Palindromic Repeats
- EUA Emergency Use Authorization
- FDA US Food and Drug Administration
- the systems and methods disclosed herein further overcome the limitations of conventional methods for identification of genetic phenotypes in biological samples by providing improved mutation calling for target loci using programmable nuclease-based and/or amplification-based assays.
- these systems and methods provide improved sensitivity and accuracy in identifying target nucleic acids due to the use of specific primers during amplification and/or highly precise programmable nucleases for detection of target sequences.
- Such identification can be used to determine, e.g., pathogenesis, transmission risk, treatment response, diagnosis, and/or epidemiology of pathogens and infectious diseases, as well as tumor DNA and/or cancer-related viruses.
- the systems and methods disclosed herein can be used to screen for rare or novel pathogenic variants (e.g., SARS-CoV-2 variants), alone or in conjunction with conventional sequencing methods. Because the sequencing capacity for most clinical and public health laboratories is limited, the systems and methods of the present disclosure would enable rapid detection of newly emerging variants or currently circulating variants that have acquired additional mutations. The information thus obtained could directly inform outbreak investigation and public health containment efforts, such as quarantine decisions. Thus, the systems and methods disclosed herein have potential as a rapid diagnostic test alternative to sequencing-based methods. Furthermore, identification of specific mutations associated with intervention resistance, such as neutralizing antibody evasion in SARS-CoV-2 variants, could guide the care of individual patients, for instance, with regard to the use of monoclonal antibodies that remain effective in treating the infection.
- one aspect of the present disclosure provides systems and methods for detecting and determining mutation calls for a target locus in a biological sample, such as a single nucleotide polymorphism in a target sequence.
- a signal dataset is obtained comprising, for each respective well in a plurality of wells in a common plate, a corresponding plurality of reporting signals for a respective one or more nucleic acid molecules in the biological sample that map to the target locus.
- the signal dataset represents a plurality of time points
- the plurality of wells comprises a first set of wells representing a first allele for the target locus, where each well in the first set of wells includes a first plurality of guide nucleic acids that have the first allele of the target locus, and the plurality of wells further comprises a second set of wells representing a second allele for the target locus, where each well in the second set of wells includes a second plurality of guide nucleic acids that have the second allele of the target locus.
- Each corresponding plurality of reporting signals includes, for each respective time point in the plurality of time points, a respective reporting signal in the form of a corresponding discrete attribute value, and each respective well in the first set of wells and each respective well in the second set of wells contains a corresponding aliquot of nucleic acid derived from the biological sample.
- a corresponding signal yield is determined using the corresponding plurality of reporting signals for the respective well across the plurality of time points. Furthermore, for each respective well in the first set of wells, a respective candidate call identity is determined based on a comparison between a corresponding first signal yield for the respective well in the first set of wells and a corresponding second signal yield for a corresponding well in the second set of wells, thereby obtaining a plurality of candidate call identities. A voting procedure across the plurality of candidate call identities for the first set of wells is performed, thereby obtaining a mutation call for the target locus.
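- The yield, comparison, and voting steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed algorithm: the signal yield is taken as the rise from the first to the last time point, the comparison uses hypothetical `min_yield` and `min_ratio` thresholds, the first and second alleles are labeled WT and MUT, and the vote is a simple majority. All function and parameter names are invented for this example.

```python
from collections import Counter


def signal_yield(signals):
    """Assumed yield: rise in reporting signal from first to last time point."""
    return signals[-1] - signals[0]


def candidate_call(first_yield, second_yield, min_yield=100.0, min_ratio=2.0):
    """Compare paired yields from a first-allele well and a second-allele well.

    Thresholds are hypothetical. Returns "WT", "MUT", or "NoCall".
    """
    if max(first_yield, second_yield) < min_yield:
        return "NoCall"  # neither well produced enough signal
    if first_yield >= min_ratio * second_yield:
        return "WT"
    if second_yield >= min_ratio * first_yield:
        return "MUT"
    return "NoCall"  # yields too similar to discriminate


def mutation_call(first_wells, second_wells):
    """Majority vote over candidate calls from paired wells (assumed rule)."""
    calls = [
        candidate_call(signal_yield(a), signal_yield(b))
        for a, b in zip(first_wells, second_wells)
    ]
    if not calls:
        return "NoCall"
    top, count = Counter(calls).most_common(1)[0]
    return top if count > len(calls) / 2 else "NoCall"
```

- Each element of `first_wells` and `second_wells` is the time series of reporting signals for one well, and wells are paired by position. A stricter voting rule, such as requiring unanimity, would only change the last line of `mutation_call`.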
- the present disclosure provides various systems and methods for assaying for and detecting single nucleotide polymorphisms (SNPs) in a target sequence.
- the various systems and methods disclosed herein use a programmable nuclease complexed with non-naturally occurring (e.g., engineered) guide nucleic acid sequence to detect the presence or absence of, and/or quantify the amount of, a target sequence having one or more SNPs.
- the various systems and methods disclosed herein are used to distinguish or discriminate between sequences having different mutations or variations therein (and/or between SNP-containing sequences and wild-type sequences).
- the SNP-containing target nucleic acids are amplified and/or comprise amplicons.
- Amplifying SNP-containing target nucleic acids can use, for example, reverse transcription (RT) and/or isothermal amplification (e.g., loop-mediated amplification (LAMP)) or thermal amplification (e.g., polymerase chain reaction (PCR)) of RNA or DNA (e.g., RNA or DNA extracted from a patient sample).
- the present disclosure is described in relation to systems and methods for coronavirus variant detection or discrimination in a sample.
- the compositions and methods disclosed herein may be used to detect and/or determine mutations (e.g., SNPs) in other target sequences of interest.
- the compositions and methods described herein may be used to detect mutations (e.g., SNPs) associated with other viruses or strains or variants thereof, bacteria or strains or variants thereof, diseases, disorders, and/or genetic traits or susceptibilities of interest.
- the present disclosure provides various systems and methods of use thereof for assaying for and detecting mutations or variations of interest or concern in a segment of a Spike (S) gene of a coronavirus in a sample.
- the coronavirus is SARS-CoV-2 (also known as 2019 novel coronavirus, Wuhan coronavirus, or 2019-nCoV), 229E (alpha coronavirus), NL63 (alpha coronavirus), OC43 (beta coronavirus), HKU1 (beta coronavirus), MERS-CoV, or SARS-CoV.
- the coronavirus is a variant of SARS-CoV-2, particularly the alpha variant (also referred to herein as the United Kingdom (UK) variant) known as 20B/501Y.V1, VOC 202012/01, or B.1.1.7 lineage; the beta variant (also referred to herein as the South African variant) known as 20C/501Y.V2 or B.1.351 lineage; the delta variant known as B.1.617.2; or the gamma variant known as P.1.
- the terms “2019-nCoV,” “SARS-CoV-2,” and “COVID-19” may be used interchangeably herein. Exemplary variants of concern or interest are shown in Table 1. The genetic characteristics of these variants are discussed in Leung et.
- the systems and methods disclosed herein are used to detect the presence or absence of the segment of the S-gene of the SARS-CoV-2 in a patient sample.
- a patient is diagnosed with COVID-19 if the presence of SARS-CoV-2 is detected in a sample from the patient.
- the assays disclosed herein provide single nucleotide target specificity, enabling specific detection of a single coronavirus.
- candidate call identities are obtained using DETECTR assays, such as those described herein and in Example 1, below. See, for example, Broughton et al., “CRISPR-Cas12-based detection of SARS-CoV-2,” Nature Biotechnology 38, 870-874 (2020), which is hereby incorporated herein by reference in its entirety.
- DETECTR assays disclosed herein use amplification of samples (e.g., RT and/or isothermal amplification (e.g., LAMP) of RNA (e.g., RNA extracted from a patient sample) and/or PCR), followed by Cas12 detection of predefined target loci (e.g., coronavirus sequences), followed by cleavage of a reporter molecule to detect the presence of nucleic acids in the sample having the target loci (e.g., a viral sequence).
- a DETECTR assay targets the E (envelope) genes or N (nucleoprotein) genes of a coronavirus (e.g., SARS-CoV-2).
- a DETECTR assay targets the S-gene of a coronavirus (e.g., SARS-CoV-2) or coronavirus variant. Isothermal amplification can also be performed to amplify one or more regions of the S gene.
- Table 1 Genetic Changes Characterizing the SARS-CoV-2 Variants of Concern and Variants of Interest
- any of the regions of the Spike gene comprising the groups of mutations detailed in Table 1 may be selected as target.
- compositions and methods of use thereof for assaying for and detecting a SARS-CoV-2 variant in a sample.
- the various compositions, methods, and reagents disclosed herein use an effector protein complexed with a guide nucleic acid sequence to distinguish among SARS-CoV-2 variants or between a SARS-CoV-2 variant and a wild-type SARS-CoV-2.
- the variant of SARS-CoV-2 (also known as 2019 novel coronavirus or 2019-nCoV) is one of the variants known as Alpha (B.1.1.7), Beta (B.1.351), Gamma (P.1), Epsilon (B.1.427 and B.1.429), Kappa (B.1.617.1), Omicron (B.1.1.529), Zeta (P.2), and Delta (B.1.617.2) and lineages thereof.
- the assays disclosed herein target the S (spike) gene of the SARS-CoV-2 variant.
- a method of detecting a SARS-CoV-2 variant in an individual comprises a) collecting a nasal swab or a throat swab from the individual; b) optionally extracting a target nucleic acid from the nasal swab or the throat swab; c) amplifying the target nucleic acid to produce an amplification product; d) contacting the amplification product, the target nucleic acid, or a combination thereof, to a composition comprising an effector protein and a non-naturally occurring guide nucleic acid that hybridizes to a segment of the target nucleic acid or amplification product thereof; e) assaying for a change in a signal produced by cleavage of a reporter nucleic acid wherein the effector protein trans cleaves the reporter nucleic acid upon hybridization of the non-naturally occurring guide nucleic acid to the segment of the target nucleic acid or the a
- methods described herein comprise amplifying the target nucleic acid (e.g., via reverse transcription-loop-mediated isothermal amplification (RT-LAMP)).
- reagents for the amplification reaction comprise an FIP primer, a BIP primer, an F3 primer, a B3 primer, an LF primer, and an LB primer.
- the amplification primers are selected from SEQ ID NOS: 8-13, SEQ ID NOS: 14-19, SEQ ID NOS: 28-33, or SEQ ID NOS: 34-39.
- determining the variant call of the SARS-CoV-2 variant comprises detecting one or more S-gene mutation(s) (e.g., associated with an L452R, E484K, E484Q, E484A, and N501Y mutation of the S-gene product) relative to a wild-type SARS-CoV-2.
- the non-naturally occurring guide nucleic acid comprises a nucleotide sequence that is at least 80%, 90%, 95%, 98% or 100% identical to SEQ ID NO: 1, 2, 3, 4, 5, or 6 or any one of SEQ ID NOS: 22-27 or 40-42.
- the reporter nucleic acid comprises a nucleotide sequence that is at least 75% or 100% identical to SEQ ID NO: 7, wherein the nucleotide sequence is flanked with a fluorescent dye on the 5’ end and a quencher on the 3’ end.
- the terms “individual,” “subject,” and “patient” are used interchangeably and include any member of the animal kingdom, including humans.
- percent (%) sequence identity describes the number of matches (“hits”) of identical nucleotides/amino acids of two or more aligned nucleic acid or amino acid sequences as compared to the number of nucleotides or amino acid residues making up the overall length of the reference nucleic acid or amino acid sequences.
- the percentage of amino acid residues or nucleotides that are the same may be determined, when the (sub)sequences are compared and aligned for maximum correspondence over a window of comparison, or over a designated region as measured using a sequence comparison algorithm as known in the art, or when manually aligned and visually inspected.
- This definition also applies to the complement of any sequence to be aligned.
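- The percent identity definition above can be illustrated with a short calculation. This is a sketch under stated assumptions: the two sequences are already aligned to equal length with `-` marking gaps, and the denominator is the ungapped length of the reference sequence, following the definition's wording. The function name `percent_identity` is hypothetical, and no particular alignment algorithm is implied.

```python
def percent_identity(aligned_query, aligned_reference):
    """Percent identity of two pre-aligned, equal-length sequences.

    Matches are positions where both sequences carry the same residue
    (gap characters '-' never count as matches). The denominator is the
    number of residues in the reference, excluding gaps.
    """
    if len(aligned_query) != len(aligned_reference):
        raise ValueError("sequences must be aligned to the same length")
    reference_length = sum(1 for r in aligned_reference if r != "-")
    if reference_length == 0:
        raise ValueError("reference sequence contains no residues")
    matches = sum(
        1 for q, r in zip(aligned_query, aligned_reference)
        if q == r and q != "-"
    )
    return 100.0 * matches / reference_length
```

- For a guide sequence compared against a 20-nucleotide reference, a threshold such as "at least 90% identical" would correspond under this calculation to at most two mismatched or gapped positions.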
- amplification or “amplifying” is a process by which a nucleic acid molecule is enzymatically copied to generate a progeny population with the same sequence as the parental one.
- the method provided herein includes repeating hybridizing a primer to the target nucleic acid, and extending a nucleic acid complementary to the strand of the target nucleic acid from the primer using a polymerase for one or more times, e.g., until a desired amount of amplification is achieved.
- amplification of the target nucleic acid increases the concentration of the target nucleic acid in the sample relative to the concentration of nucleic acids that do not correspond to the target nucleic acid.
- the term ‘cleavage assay’ refers to an assay designed to visualize, quantitate or identify the cleavage activity of an effector protein.
- the cleavage activity may be cis-cleavage activity.
- the cleavage activity may be trans-cleavage activity.
- the effector protein is an activated CRISPR effector protein.
- the term ‘activated effector protein’ refers to an effector protein associated with a guide RNA bound to a target RNA, thereby forming an ‘activated’ complex capable of exhibiting trans- or cis-cleavage.
- CRISPR-RNA or “crRNA” or “spacer” refers to an RNA molecule having a sequence with sufficient complementarity to a target nucleic acid sequence to direct sequence-specific binding of an RNA-targeting complex to the target RNA sequence.
- crRNAs contain a sequence that mediates target recognition and a sequence that duplexes with a tracrRNA.
- the crRNA and tracrRNA duplex are present as parts of a single larger guide RNA molecule.
- the crRNA comprises a repeat region that interacts with the effector protein.
- the repeat region may also be referred to as a “protein-binding segment.”
- the repeat region is adjacent to the spacer region.
- a guide nucleic acid that interacts with the effector protein may comprise a repeat region that is 5’ of the spacer region.
- the spacer region of the guide nucleic acid may be complementary to (e.g., hybridize to) a target sequence of a target nucleic acid.
- the spacer region is 15-28 linked nucleosides in length.
- the spacer region is 15-26, 15-24, 15-22, 15-20, 15-18, 16-28, 16-26, 16-24, 16-22, 16-20, 16-18, 17-26, 17-24, 17-22, 17-20, 17-18, 18-26, 18-24, or 18-22 linked nucleosides in length. In some cases, the spacer region is 18-24 linked nucleosides in length. In some cases, the spacer region is at least 15 linked nucleosides in length. In some cases, the spacer region is at least 16, 18, 20, or 22 linked nucleosides in length. In some cases, the spacer region comprises at least 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 nucleotides. In some cases, the spacer region is at least 17 linked nucleosides in length. In some cases, the spacer region is at least 18 linked nucleosides in length.
- a positive “detectable signal” may be any signal that can be detected using optical, fluorescent, chemiluminescent, electrochemical or other detection methods known in the art.
- a first detectable signal may be generated by binding of the CRISPR effector protein complex to the target nucleic acid, indicating that the sample contains the target nucleic acid.
- a detectable signal may be generated upon the cleavage event, the event comprising the cleavage of a detector nucleic acid by a Cas effector protein.
- the detectable signal may be generated indirectly by the cleavage event.
- the cleavage event is a trans-cleavage reaction or a cis-cleavage reaction.
- effector protein refers to a polypeptide, or a fragment thereof, possessing enzymatic activity, and that is capable of binding to a target nucleic acid molecule with the support of a guide nucleic acid molecule.
- the binding is sequence-specific.
- the guide nucleic acid molecule is DNA or RNA.
- the target nucleic acid molecule may be DNA or RNA.
- effector protein refers to a protein that is capable of modifying a nucleic acid molecule (e.g., by cleavage, deamination, recombination).
- enzymatic activities include, but are not limited to, nuclease activity, methyltransferase activity, demethylase activity, DNA repair activity, DNA damage activity, deamination activity, dismutase activity, alkylation activity, depurination activity, oxidation activity, pyrimidine dimer forming activity, integrase activity, transposase activity, recombinase activity, polymerase activity, ligase activity, helicase activity, photolyase activity, and glycosylase activity.
- the term “guide nucleic acid” or “guide sequence” is any polynucleotide sequence having sufficient complementarity with a target polynucleotide sequence to hybridize with the target sequence and direct binding of a nucleic acid-targeting complex to the target sequence.
- the term “gRNA”, as used herein, refers to any RNA molecule that supports the targeting of an effector protein described herein to a target nucleic acid.
- gRNAs include, but are not limited to, crRNAs or crRNAs in combination with associated trans-activating RNAs (tracrRNAs). The latter can be independent RNAs or portions of sequences thereof can be fused into a single RNA using a linker.
- the gRNA is designed to contain a chemical or biochemical modification.
- a gRNA can contain one or more nucleotides.
- non-naturally occurring or “engineered” are used interchangeably and indicate the involvement of the hand of man.
- the terms, when referring to nucleic acid molecules or polypeptides mean that the nucleic acid molecule or the polypeptide is at least substantially free from at least one other component with which they are naturally associated in nature and as found in nature.
- nuclease activity refers to the enzymatic activity of an enzyme which allows the enzyme to cleave the phosphodiester bonds between the nucleotide subunits of nucleic acids; the term “endonuclease activity” refers to the enzymatic activity of an enzyme which allows the enzyme to cleave the phosphodiester bond within a polynucleotide chain.
- Nucleotide refers to a base-sugar-phosphate compound. Nucleotides are the monomeric subunits of both types of nucleic acid polymers, RNA and DNA. “Nucleotide” refers to ribonucleoside triphosphates, rATP, rGTP, rUTP, and rCTP, and deoxyribonucleoside triphosphates, such as dATP, dGTP, dTTP, and dCTP. As used herein, a “nucleoside” refers to a base-sugar combination without a phosphate group.
- Base refers to the nitrogen-containing base, for example, adenine (A), cytosine (C), guanine (G), thymine (T), and uracil (U).
- a 2’-deoxyribonucleoside-5’-triphosphate refers to a base-sugar-phosphate compound, which has hydrogen at the 2’ position of the sugar, including, but not limited to, the four common deoxyribose-containing substrates (dATP, dCTP, dGTP, dTTP), and derivatives and analogs thereof.
- a ribonucleoside-5’-triphosphate refers to a base-sugar-phosphate compound, which has a hydroxyl group at the 2’ position of the sugar, including, but not limited to, the four common ribose-containing substrates for an RNA polymerase (ATP, CTP, GTP, and UTP), and derivatives and analogs thereof.
- the upper (sense) strand sequence is, in general, understood as going in the direction from its 5'- to 3'-end, and the complementary sequence is thus understood as the sequence of the lower (antisense) strand in the same direction as the upper strand.
- the reverse sequence is understood as the sequence of the upper strand in the direction from its 3'- to its 5'-end, while the ‘reverse complement’ sequence or the ‘reverse complementary’ sequence is understood as the sequence of the lower strand in the direction of its 5'- to its 3'-end.
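The three strand conventions defined above (complementary, reverse, and reverse complement) can be made concrete with a short Python sketch. It assumes an unambiguous uppercase DNA alphabet (A, C, G, T); the function names and the example sequence are illustrative and do not come from the disclosure.

```python
# Illustrative sketch for DNA written with the letters A, C, G, T only.
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(upper_5to3: str) -> str:
    """Lower-strand bases listed in the same left-to-right order as the upper strand."""
    return upper_5to3.translate(_DNA_COMPLEMENT)


def reverse(upper_5to3: str) -> str:
    """Upper strand read from its 3'-end to its 5'-end."""
    return upper_5to3[::-1]


def reverse_complement(upper_5to3: str) -> str:
    """Lower strand read from its 5'-end to its 3'-end."""
    return complement(upper_5to3)[::-1]


if __name__ == "__main__":
    upper = "ATGC"
    print(complement(upper))          # TACG
    print(reverse(upper))             # CGTA
    print(reverse_complement(upper))  # GCAT
```

For an RNA sequence the same approach applies with U in place of T in the translation table.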
- reporter is used interchangeably with “reporter nucleic acid,” “reporter molecule,” or “detector nucleic acid.”
- reporter nucleic acid refers generally to an off-target nucleic acid molecule that is capable of providing a detectable signal upon cleavage by an ‘activated’ effector protein.
- a reporter may comprise a single stranded nucleic acid and a detection moiety (e.g., a labeled single stranded RNA reporter), wherein the nucleic acid is capable of being cleaved by an effector protein (e.g., a CRISPR/Cas protein as disclosed herein) or a multimeric complex thereof, releasing the detection moiety, and generating a detectable signal.
- the effector proteins disclosed herein, activated upon hybridization of a guide nucleic acid to a target nucleic acid may cleave the reporter.
- an ‘activated’ effector protein is a CRISPR nuclease that is activated upon hybridization of a guide nucleic acid to a target nucleic acid.
- the reporter nucleic acid is further attached to a moiety, chemical compound or other component that can be used to visualize, quantitate or identify the cleavage of the reporter molecule. Cleaving the “reporter” may be referred to herein as cleaving the “reporter nucleic acid,” the “reporter molecule,” or the “nucleic acid.”
- reporters may comprise RNA. Reporters may comprise DNA. In some embodiments, reporters may be double-stranded. In some embodiments, reporters may be single-stranded. In some instances, reporters comprise a protein capable of generating a signal. A signal may be a calorimetric, potentiometric, amperometric, optical (e.g., fluorescent, colorimetric, etc.), or piezo-electric signal. In some instances, the reporter comprises a detection moiety.
- Suitable detectable labels and/or moieties that may provide a signal include, but are not limited to, an enzyme, a radioisotope, a member of a specific binding pair, a fluorophore, a fluorescent protein, a quantum dot, and the like.
- the reporter comprises a detection moiety and a quenching moiety.
- the reporter comprises a cleavage site, wherein the detection moiety is located at a first site on the reporter and the quenching moiety is located at a second site on the reporter, wherein the first site and the second site are separated by the cleavage site.
- the quenching moiety is a fluorescence quenching moiety.
- the quenching moiety is 5' to the cleavage site and the detection moiety is 3' to the cleavage site. In some instances, the detection moiety is 5' to the cleavage site and the quenching moiety is 3' to the cleavage site. Sometimes the quenching moiety is at the 5' terminus of the nucleic acid of a reporter. Sometimes the detection moiety is at the 3' terminus of the nucleic acid of a reporter. In some instances, the detection moiety is at the 5' terminus of the nucleic acid of a reporter. In some instances, the quenching moiety is at the 3' terminus of the nucleic acid of a reporter.
- the terms “trans-activating crRNA” or “tracrRNA” refer to a transactivating or transactivated RNA molecule or a fragment thereof which contains a sequence with sufficient complementarity to allow association with a crRNA molecule.
- the tracrRNA can refer to an RNA molecule that provides a scaffold for binding to a CRISPR effector protein.
- the tracrRNA forms a structure facilitating the binding of a CRISPR-associated or a CRISPR effector protein to a specific target nucleic acid.
- at least a portion of a crRNA sequence and at least a portion of a tracrRNA sequence are present as parts of a larger single guide RNA molecule.
- single guide nucleic acid refers to a single nucleic acid system comprising: a first nucleotide sequence that binds non-covalently with an effector protein; and a second nucleotide sequence that hybridizes to a target nucleic acid.
- the first nucleotide sequence is referred to as a handle sequence.
- a handle sequence may comprise at least a portion of a tracrRNA sequence, at least a portion of a repeat sequence, or a combination thereof.
- a sgRNA does not comprise a tracrRNA.
- target sequence refers to a DNA or RNA sequence to which a DNA or RNA-targeting guide RNA is designed to have complementarity, where hybridization between a target sequence and a guide RNA promotes the association of the CRISPR effector protein with the target RNA.
- tissue sample is any biological sample derived from a patient. This term includes, but is not limited to, biological fluids such as blood, serum, plasma, urine, cerebrospinal fluid, tears, saliva, lymph, dialysate, lavage fluid, semen, and other biological fluid samples. Cells and tissues of scientific origin are included as tissue samples. The term also includes cells or cells derived therefrom and their progeny, including cells in culture, cell supernatants, and cell lysates. This definition includes samples that have been manipulated in any way after they are obtained, such as treatment with reagents, solubilization, or enrichment of specific components such as polynucleotides or polypeptides. The term also includes derivatives and fractions of patient samples.
- trans-cleavage activity, also referred to as “collateral” or “transcollateral” cleavage, may be non-specific cleavage of nearby single-stranded nucleic acid by an activated effector protein, such as trans cleavage of detector nucleic acids with a detection moiety.
- trans cleavage assay refers to an assay designed to visualize, quantitate or identify the trans-collateral activity of an activated effector protein.
- an ‘activated’ effector protein is a CRISPR nuclease that is activated upon hybridization of a guide nucleic acid to a target nucleic acid, thereby inducing the formation of the nuclease-guide complex.
- a “DETECTR®” assay is a trans cleavage assay.
- compositions and systems comprising at least one of an engineered effector protein and an engineered guide nucleic acid, which may simply be referred to herein as an effector protein and a guide nucleic acid, respectively.
- an engineered effector protein and an engineered guide nucleic acid refer to an effector protein and a guide nucleic acid, respectively, that are not found in nature.
- systems and compositions comprise at least one non-naturally occurring component.
- compositions and systems may comprise a guide nucleic acid, wherein the sequence of the guide nucleic acid is different or modified from that of a naturally-occurring guide nucleic acid.
- compositions and systems comprise at least two components that do not naturally occur together.
- compositions and systems may comprise a guide nucleic acid comprising a repeat region and a spacer region which do not naturally occur together.
- compositions and systems may comprise a guide nucleic acid and a Cas protein that do not naturally occur together.
- an effector protein or guide nucleic acid that is “natural,” “naturally-occurring,” or “found in nature” includes Cas proteins and guide nucleic acids from cells or organisms that have not been genetically modified by a human or machine.
- the guide nucleic acid comprises a non-natural nucleobase sequence.
- the non-natural sequence is a nucleobase sequence that is not found in nature.
- the non-natural sequence may comprise a portion of a naturally-occurring sequence, wherein the portion of the naturally-occurring sequence is not present in nature absent the remainder of the naturally-occurring sequence.
- the guide nucleic acid comprises two naturally-occurring sequences arranged in an order or proximity that is not observed in nature.
- compositions and systems comprise a ribonucleotide complex comprising a CRISPR/Cas effector protein and a guide nucleic acid that do not occur together in nature.
- Engineered guide nucleic acids may comprise a first sequence and a second sequence that do not occur naturally together.
- an engineered guide nucleic acid may comprise a sequence of a naturally-occurring repeat region and a spacer region that is complementary to a naturally-occurring eukaryotic sequence.
- the engineered guide nucleic acid may comprise a sequence of a repeat region that occurs naturally in an organism and a spacer region that does not occur naturally in that organism.
- An engineered guide nucleic acid may comprise a first sequence that occurs in a first organism and a second sequence that occurs in a second organism, wherein the first organism and the second organism are different.
- the guide nucleic acid may comprise a third sequence at a 3’ or 5’ end of the guide nucleic acid, or between the first and second sequences of the guide nucleic acid.
- an engineered guide nucleic acid may comprise a naturally occurring crRNA and tracrRNA coupled by a linker sequence.
- an engineered guide nucleic acid may comprise at least a portion of a crRNA sequence and at least a portion of a tracrRNA sequence.
- compositions and systems described herein comprise an engineered effector protein that is similar to a naturally occurring effector protein.
- the engineered effector protein may lack a portion of the naturally occurring effector protein.
- the effector protein may comprise a mutation relative to the naturally-occurring effector protein, wherein the mutation is not found in nature.
- the effector protein may also comprise at least one additional amino acid relative to the naturally-occurring effector protein.
- the nucleotide sequence encoding the effector protein is codon optimized relative to the naturally occurring sequence.
- programmable nucleases and uses thereof, e.g., detection of target nucleic acids.
- a programmable nuclease is capable of being activated when complexed with the guide nucleic acid and the target nucleic acid segment.
- a programmable nuclease can be capable of being activated when complexed with a guide nucleic acid and the target sequence.
- the programmable nuclease can be activated upon binding of the guide nucleic acid to its target nucleic acid and can non-specifically degrade a non-target nucleic acid in its environment.
- the programmable nuclease has trans-cleavage activity once activated.
- a programmable nuclease can be a Cas protein (also referred to, interchangeably, as a Cas nuclease or Cas effector protein).
- a guide nucleic acid (e.g., crRNA) and Cas protein can form a CRISPR enzyme.
- one or more programmable nucleases as disclosed herein can be activated to initiate trans-cleavage activity of a reporter (also referred to herein as a reporter molecule).
- a programmable nuclease as disclosed herein can, in some cases, bind to a target sequence or target nucleic acid to initiate trans-cleavage of a reporter.
- the programmable nuclease can be referred to as an RNA-activated programmable RNA nuclease.
- the programmable nuclease as disclosed herein can bind to a target DNA to initiate trans-cleavage of an RNA reporter.
- a programmable nuclease can be referred to herein as a DNA-activated programmable RNA nuclease.
- a programmable nuclease as described herein can be activated by a target RNA or a target DNA.
- a programmable nuclease (e.g., a Cas enzyme)
- the programmable nuclease can bind to a target ssDNA which initiates trans-cleavage of RNA reporters.
- a programmable nuclease as disclosed herein can bind to a target DNA to initiate trans-cleavage of a DNA reporter, and this programmable nuclease can be referred to as a DNA-activated programmable DNA nuclease.
- the programmable nuclease can become activated after binding of a guide nucleic acid that is complexed with the programmable nuclease with a target nucleic acid, and the activated programmable nuclease can cleave the target nucleic acid, which can result in a trans-cleavage activity.
- Trans-cleavage activity can be non-specific cleavage of nearby single-stranded nucleic acids by the activated programmable nuclease, such as trans-cleavage of reporter nucleic acids comprising a detection moiety.
- the detection moiety can be released or separated from the reporter and can directly or indirectly generate a detectable signal.
- the reporter and/or the detection moiety can be immobilized on a support medium.
- the detection moiety is at least one of a fluorophore, a dye, a polypeptide, or a nucleic acid.
- the detection moiety binds to a capture molecule on the support medium to be immobilized.
- the detectable signal can be visualized on the support medium to assess the presence or concentration of one or more target nucleic acids associated with an ailment, such as a SNP associated with a disease, cancer, or genetic disorder.
- a programmable nuclease is any enzyme that can be or has been designed, modified, or engineered by human contribution so that the enzyme targets or cleaves a nucleic acid in a sequence-specific manner.
- Programmable nucleases can include, for example, zinc-finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), and/or RNA-guided nucleases such as the bacterial clustered regularly interspaced short palindromic repeat (CRISPR)-Cas (CRISPR-associated) nucleases or Cpf1.
- Programmable nucleases can also include, for example, PfAgo and/or NgAgo.
- Cas proteins are programmable nucleases used in the methods and systems disclosed herein.
- Cas proteins can include any of the known Classes and Types of CRISPR/Cas enzymes.
- Programmable nucleases disclosed herein include Class 1 Cas proteins, such as the Type I, Type IV, or Type III Cas proteins.
- Programmable nucleases disclosed herein also include the Class 2 Cas proteins, such as the Type II, Type V, and Type VI Cas proteins.
- Programmable nucleases included in the methods disclosed herein and methods of use thereof include Type V or Type VI Cas proteins.
- the programmable nuclease is a Type V Cas protein.
- a Type V Cas effector protein comprises a RuvC domain but lacks an HNH domain.
- the RuvC domain of the Type V Cas effector protein comprises three partial RuvC domains (RuvC-I, RuvC-II, and RuvC-III, also referred to herein as subdomains).
- the three RuvC subdomains are located within the C-terminal half of the Type V Cas effector protein.
- none of the RuvC subdomains are located at the N terminus of the protein.
- the RuvC subdomains are contiguous.
- the RuvC subdomains are not contiguous with respect to the primary amino acid sequence of the Type V Cas protein, but form a RuvC domain once the protein is produced and folds. In some instances, there are zero to about 50 amino acids between the first and second RuvC subdomains. In some instances, there are zero to about 50 amino acids between the second and third RuvC subdomains.
- the Cas effector is a Cas14 effector.
- the Cas14 effector is a Cas14a, Cas14a1, Cas14b, Cas14c, Cas14d, Cas14e, Cas14f, Cas14g, Cas14h, or Cas14u effector.
- the Cas effector is a CasPhi (also referred to herein as a CasΦ) effector.
- the Cas effector is a Cas12 effector.
- the Cas12 effector is a Cas12a, Cas12b, Cas12c, Cas12d, Cas12e, or Cas12j effector.
- the Type V Cas protein comprises a Cas14 protein.
- Cas14 proteins can comprise a bilobed structure with distinct amino-terminal and carboxy-terminal domains.
- the amino- and carboxy-terminal domains can be connected by a flexible linker.
- the flexible linker can affect the relative conformations of the amino- and carboxyl-terminal domains.
- the flexible linker can be short, for example less than 10 amino acids, less than 8 amino acids, less than 6 amino acids, less than 5 amino acids, or less than 4 amino acids in length.
- the flexible linker can be sufficiently long to enable different conformations of the amino- and carboxy-terminal domains among two Cas14 proteins of a Cas14 dimer complex (e.g., the relative orientations of the amino- and carboxy-terminal domains differ between two Cas14 proteins of a Cas14 homodimer complex).
- the linker domain can comprise a mutation which affects the relative conformations of the amino- and carboxyl-terminal domains.
- the linker can comprise a mutation which affects Cas14 dimerization. For example, a linker mutation can enhance the stability of a Cas14 dimer.
- the amino-terminal domain of a Cas14 protein comprises a wedge domain, a recognition domain, a zinc finger domain, or any combination thereof.
- the wedge domain can comprise a multi-strand β-barrel structure.
- a multi-strand β-barrel structure can comprise an oligonucleotide/oligosaccharide-binding fold that is structurally comparable to those of some Cas12 proteins.
- the recognition domain and the zinc finger domain can each (individually or collectively) be inserted between β-barrel strands of the wedge domain.
- the recognition domain can comprise a 4-α-helix structure, structurally comparable to but shorter than those found in some Cas12 proteins.
- the recognition domain can comprise a binding affinity for a guide nucleic acid or for a guide nucleic acid-target nucleic acid heteroduplex.
- a REC lobe can comprise a binding affinity for a PAM sequence in the target nucleic acid.
- the amino-terminal can comprise a wedge domain, a recognition domain, and a zinc finger domain.
- the carboxy-terminal can comprise a RuvC domain, a zinc finger domain, or any combination thereof.
- the carboxy-terminal can comprise one RuvC and one zinc finger domain.
- Cas14 proteins comprise a RuvC domain or a partial RuvC domain.
- the RuvC domain can be defined by a single, contiguous sequence, or a set of partial RuvC domains that are not contiguous with respect to the primary amino acid sequence of the Cas14 protein.
- a partial RuvC domain does not have any substrate binding activity or catalytic activity on its own.
- a Cas14 protein of the present disclosure can include multiple partial RuvC domains, which can combine to generate a RuvC domain with substrate binding or catalytic activity.
- a Cas14 can include 3 partial RuvC domains (RuvC-I, RuvC-II, and RuvC-III, also referred to herein as subdomains) that are not contiguous with respect to the primary amino acid sequence of the Cas14 protein but form a RuvC domain once the protein is produced and folds.
- a Cas14 protein can comprise a linker loop connecting a carboxy terminal domain of the Cas14 protein with the amino terminal domain of the Cas14 protein, and where the carboxy terminal domain comprises one or more RuvC domains and the amino terminal domain comprises a recognition domain.
- Cas14 proteins comprise a zinc finger domain.
- a carboxy terminal domain of a Cas14 protein comprises a zinc finger domain.
- an amino terminal domain of a Cas14 protein comprises a zinc finger domain.
- the amino terminal domain comprises a wedge domain (e.g., a multi-β-barrel wedge structure), a zinc finger domain, or any combination thereof.
- the carboxy terminal domain comprises the RuvC domains and a zinc finger domain, and the amino terminal domain comprises a recognition domain, a wedge domain, and a zinc finger domain.
- the Type V Cas protein is a CasΦ protein.
- a Cas protein can function as an endonuclease that catalyzes cleavage at a specific sequence in a target nucleic acid.
- a programmable Cas nuclease can have a single active site in a RuvC domain that is capable of catalyzing pre-crRNA processing and nicking or cleaving of nucleic acids. This compact catalytic site can render the programmable Cas nuclease especially advantageous for genome engineering and new functionalities for genome manipulation.
- the programmable nuclease is a Type VI Cas protein.
- the Type VI Cas protein is a programmable Cas13 nuclease.
- the general architecture of a Cas13 protein includes an N-terminal domain and two HEPN (higher eukaryotes and prokaryotes nucleotide-binding) domains separated by two helical domains.
- the HEPN domains each comprise an R-X4-H motif. Shared features across Cas13 proteins include that upon binding of the crRNA of the guide nucleic acid to a target nucleic acid, the protein undergoes a conformational change to bring together the HEPN domains and form a catalytically active RNase.
- programmable Cas13 nucleases also consistent with the present disclosure include Cas13 nucleases comprising mutations in the HEPN domain that enhance the Cas13 protein's cleavage efficiency or mutations that catalytically inactivate the HEPN domains.
- Programmable Cas13 nucleases consistent with the present disclosure also include Cas13 nucleases comprising catalytic components.
- the Cas effector is a Cas13 effector.
- the Cas13 effector is a Cas13a, a Cas13b, a Cas13c, a Cas13d, or a Cas13e effector protein.
- the programmable nuclease is Cas13. In some instances, the Cas13 is Cas13a, Cas13b, Cas13c, Cas13d, or Cas13e. In some cases, the programmable nuclease is Mad7 or Mad2. In some cases, the programmable nuclease is Cas12. In some embodiments, the Cas12 is Cas12a, Cas12b, Cas12c, Cas12d, or Cas12e. In some cases, the programmable nuclease is Csm1, Cas9, C2c4, C2c8, C2c5, C2c10, C2c9, or CasZ.
- the Csm1 is also called smCms1, miCms1, obCms1, or suCms1.
- Cas13a is called C2c2.
- CasZ is called Cas14a, Cas14b, Cas14c, Cas14d, Cas14e, Cas14f, Cas14g, or Cas14h.
- the programmable nuclease is a type V CRISPR-Cas system. In some cases, the programmable nuclease is a type VI CRISPR-Cas system.
- the programmable nuclease is a type III CRISPR-Cas system. In some cases, the programmable nuclease is from at least one of Leptotrichia shahii (Lsh), Listeria seeligeri (Lse), Leptotrichia buccalis (Lbu), Leptotrichia wadei (Lwa), Rhodobacter capsulatus (Rca), Herbinix hemicellulosilytica (Hhe), Paludibacter propionicigenes (Ppr), Lachnospiraceae bacterium (Lba), [Eubacterium] rectale (Ere), Listeria newyorkensis (Lny), Clostridium aminophilum (Cam), Prevotella sp.
- Leptotrichia shahii Lsh
- Listeria seeligeri Lse
- Leptotrichia buccalis Lbu
- Leptotrichia wadei Lwa
- Psm Capnocytophaga canimorsus
- Ca Lachnospiraceae bacterium
- Bzo Bergeyella zoohelcum
- Prevotella intermedia Pin
- Prevotella buccae Pbu
- Alistipes sp. Asp
- Riemerella anatipestifer Ran
- Prevotella aurantiaca Pau
- Prevotella saccharolytica Psa
- Pin2 Capnocytophaga canimorsus
- Porphyromonas gulae Pgu
- Prevotella sp Prevotella sp.
- the Cas13 is at least one of LbuCas13a, LwaCas13a, LbaCas13a, HheCas13a, PprCas13a, EreCas13a, CamCas13a, or LshCas13a.
- the trans-cleavage activity of the programmable nuclease can be activated when the guide nucleic acid is complexed with the target nucleic acid.
- the target nucleic acid is RNA or DNA.
- the programmable nuclease comprises a Cas12 protein, where the Cas12 enzyme binds and cleaves double stranded DNA and single stranded DNA.
- the programmable nuclease comprises a Cas13 protein, where the Cas13 enzyme binds and cleaves single stranded RNA.
- the programmable nuclease comprises a Cas14 protein, where the Cas14 enzyme binds and cleaves both double stranded DNA and single stranded DNA.
- Table 2 provides illustrative amino acid sequences of programmable nucleases having trans-cleavage activity.
- programmable nucleases described herein comprise an amino acid sequence that is at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, at least 99%, or 100% identical to any one of SEQ ID NOS: 45-117.
- the programmable nuclease consists of an amino acid sequence that is at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% identical to any one of SEQ ID NOS: 45-117.
- the programmable nuclease comprises at least about 50, at least about 100, at least about 150, at least about 200, at least about 250, at least about 300, at least about 350, at least about 400, at least about 450, or at least about 500 consecutive amino acids of any one of SEQ ID NOS: 45-117.
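The identity thresholds and the "consecutive amino acids" criterion above can be illustrated with a small Python sketch. The text does not state how percent identity is computed, so the sketch assumes the simplest case: two sequences that have already been aligned to equal length, with "-" marking gaps, and identity counted as matching columns over all aligned columns. The function names are hypothetical, and any sequences used with them are toy strings, not SEQ ID NOS: 45-117.

```python
# Illustrative sketch; assumes pre-aligned, equal-length inputs with '-' as the gap symbol.
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns in which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    if not aligned_a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


def shares_consecutive_run(candidate: str, reference: str, n: int) -> bool:
    """True if the candidate contains any n consecutive residues of the reference."""
    return any(
        reference[i:i + n] in candidate
        for i in range(len(reference) - n + 1)
    )


def meets_identity_threshold(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """Compare the computed identity against a threshold such as 65.0 or 90.0."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

In practice the alignment step and the choice of denominator affect the reported percentage, so a result from this sketch is only comparable to other values computed the same way.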
- effector proteins disclosed herein are engineered proteins.
- Engineered proteins are not identical to a naturally-occurring protein.
- Engineered proteins can provide enhanced nuclease or nickase activity as compared to a naturally occurring nuclease or nickase.
- An engineered protein can comprise a modified form of a wild-type counterpart protein.
- effector proteins comprise at least one amino acid change (e.g., deletion, insertion, or substitution) that enhances or reduces the nucleic acid-cleaving activity of the effector protein relative to the wild-type counterpart.
- a programmable nuclease is thermostable.
- known programmable nucleases (e.g., Cas12 nucleases)
- a thermostable protein can have enzymatic activity, stability, or folding comparable to those at 37 °C.
- the trans-cleavage activity (e.g., the maximum trans-cleavage rate as measured by fluorescent signal generation) of a programmable nuclease in a trans-cleavage assay at 40 °C, 45 °C, 50 °C, 55 °C, 60 °C, 65 °C, 70 °C, 75 °C, 80 °C, or more is at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 100%, at least 1-fold, at least 2-fold, at least 3-fold, at least 4-fold, at least 5-fold, at least 6-fold, at least 7-fold, at least 8-fold, at least 9-fold, at least 10-fold, at least 11-fold, at least 12-fold, at least 13-fold, at least 14-fold, at least 15-fold, at least 20-fold, at least 25-fold, at least 30-fold, at least 35-
- a guide nucleic acid refers to a nucleic acid that comprises a sequence that targets (e.g., is reverse complementary to) the sequence of a target nucleic acid.
- a guide nucleic acid can be DNA, RNA, or a combination thereof (e.g., RNA with a thymine base), or include a chemically modified nucleobase or phosphate backbone.
- Guide nucleic acids are often referred to as a “guide RNA” or “gRNA.”
- a guide nucleic acid can comprise deoxyribonucleotides and/or modified nucleobases.
- a guide nucleic acid is a nucleic acid molecule that binds to an effector protein (e.g., a Cas effector protein), thereby forming a ribonucleoprotein complex (RNP).
- the engineered guide nucleic acid imparts activity or sequence selectivity to the effector protein.
- a guide nucleic acid when complexed with an effector protein, brings the effector protein into proximity of a target nucleic acid molecule.
- An engineered guide nucleic acid can comprise a CRISPR RNA (crRNA) that is at least partially complementary to a target nucleic acid.
- the engineered guide nucleic acid comprises a trans-activating crRNA (tracrRNA), at least a portion of which interacts with the effector protein.
- a guide nucleic acid can comprise or be coupled to a tracrRNA.
- the tracrRNA can comprise deoxyribonucleosides in addition to ribonucleosides.
- the tracrRNA can be separate from but form a complex with a crRNA.
- at least a portion of a crRNA sequence and at least a portion of a tracrRNA sequence are provided as a single guide nucleic acid, also referred to as a single guide RNA (sgRNA).
- a crRNA and tracrRNA function as two separate, unlinked molecules.
- a reporter can comprise a single stranded nucleic acid and a detection moiety (e.g., a labeled single stranded RNA reporter), where the nucleic acid is capable of being cleaved by a programmable nuclease (e.g., a Type V or VI CRISPR/Cas protein as disclosed herein) or a multimeric complex thereof, releasing the detection moiety, and generating a detectable signal.
- reporter signal refers to any readout from a reporter, such as a fluorescence intensity and/or a lateral flow readout.
- Reporters can comprise RNA. Reporters can comprise DNA. Reporters can be double-stranded. Reporters can be single-stranded.
- the systems and methods disclosed herein provide a Type V CRISPR/Cas protein and a reporter nucleic acid configured to undergo transcollateral cleavage by the Type V CRISPR/Cas protein.
- Transcollateral cleavage of the reporter can generate a signal from the reporter or alter a signal from the reporter.
- the signal is an optical signal, such as a fluorescence signal or absorbance band.
- Transcollateral cleavage of the reporter can alter the wavelength, intensity, or polarization of the optical signal.
- the reporter can comprise a fluorophore and a quencher, such that transcollateral cleavage of the reporter separates the fluorophore and the quencher thereby increasing a fluorescence signal from the fluorophore.
- detection of reporter cleavage (directly or indirectly) to determine the presence of a target nucleic acid sequence is, in some embodiments, referred to as “DETECTR.”
- a method of assaying for a target nucleic acid in a sample comprising contacting the target nucleic acid with a programmable nuclease, a non-naturally occurring guide nucleic acid that hybridizes to a segment of the target nucleic acid, and a reporter nucleic acid, and assaying for a change in a signal, where the change in the signal is produced by or indicative of cleavage of the reporter nucleic acid.
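One way to assay for the change in signal described above is to compare a sample reading with a no-target control. The following is a minimal sketch under that assumption; the function names and the 2-fold threshold are hypothetical placeholders and are not values taken from this disclosure.

```python
# Illustrative sketch only; names and the default threshold are assumptions.


def signal_change(sample_signal: float, control_signal: float) -> float:
    """Return the absolute change in signal relative to a no-target control."""
    return sample_signal - control_signal


def target_detected(sample_signal: float, control_signal: float,
                    min_fold_change: float = 2.0) -> bool:
    """Call the target present when the sample signal exceeds the control
    by at least min_fold_change (a hypothetical cutoff)."""
    if control_signal <= 0:
        raise ValueError("control_signal must be positive")
    return sample_signal / control_signal >= min_fold_change


if __name__ == "__main__":
    # Made-up fluorescence readings in arbitrary units.
    print(signal_change(5000.0, 1000.0), target_detected(5000.0, 1000.0))
```

In practice, a cutoff would be set empirically for the reporter, instrument, and reaction time in use.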
- the reporter comprises a detection moiety.
- the reporter comprises a cleavage site, where the detection moiety is located at a first site on the reporter, where the first site is separated from the remainder of reporter upon cleavage at the cleavage site.
- the detection moiety is 3’ to the cleavage site.
- the detection moiety is 5’ to the cleavage site.
- the detection moiety is at the 3’ terminus of the nucleic acid of a reporter.
- the detection moiety is at the 5’ terminus of the nucleic acid of a reporter.
- the reporter comprises a nucleic acid and a detection moiety.
- a reporter is connected to a surface by a linkage.
- a reporter comprises at least one of a nucleic acid, a chemical functionality, a detection moiety, a quenching moiety, or a combination thereof.
- a reporter is configured for the detection moiety to remain immobilized to the surface and the quenching moiety to be released into solution upon cleavage of the reporter.
- a reporter is configured for the quenching moiety to remain immobilized to the surface and for the detection moiety to be released into solution, upon cleavage of the reporter.
- the detection moiety is at least one of a label, a polypeptide, a dendrimer, or a nucleic acid, or a combination thereof.
- the reporter contains a label.
- the label is FITC, DIG, TAMRA, Cy5, AF594, or Cy3.
- the label comprises a dye or a nanoparticle configured to produce a signal.
- the dye is a fluorescent dye.
- the at least one chemical functionality comprises biotin.
- the at least one chemical functionality is configured to be captured by a capture probe.
- the at least one chemical functionality comprises biotin and the capture probe comprises anti-biotin, streptavidin, avidin or other molecule configured to bind with biotin.
- the dye is the chemical functionality.
- a capture probe comprises a molecule that is complementary to the chemical functionality.
- the capture antibodies are anti-FITC, anti-DIG, anti-TAMRA, anti-Cy5, anti-AF594, or any other appropriate capture antibody capable of binding the detection moiety or conjugate.
- the detection moiety is the chemical functionality.
- reporters comprise a detection moiety capable of generating a signal.
- a signal can be a calorimetric, potentiometric, amperometric, optical (e.g., fluorescent, colorimetric, etc.), or piezo-electric signal.
- Suitable detectable labels and/or moieties that can provide a signal include, but are not limited to, an enzyme, a radioisotope, a member of a specific binding pair, a fluorophore, a fluorescent protein, a quantum dot, and the like.
- the reporter comprises a detection moiety and a quenching moiety.
- the reporter comprises a cleavage site, where the detection moiety is located at a first site on the reporter and the quenching moiety is located at a second site on the reporter, where the first site and the second site are separated by the cleavage site.
- the quenching moiety is a fluorescence quenching moiety.
- the quenching moiety is 5’ to the cleavage site and the detection moiety is 3’ to the cleavage site.
- the detection moiety is 5’ to the cleavage site and the quenching moiety is 3’ to the cleavage site.
- the quenching moiety is at the 5’ terminus of the nucleic acid of a reporter. In some embodiments, the detection moiety is at the 3’ terminus of the nucleic acid of a reporter. In some cases, the detection moiety is at the 5’ terminus of the nucleic acid of a reporter. In some cases, the quenching moiety is at the 3’ terminus of the nucleic acid of a reporter.
- Suitable fluorescent proteins include, but are not limited to, green fluorescent protein (GFP) or variants thereof, blue fluorescent variant of GFP (BFP), cyan fluorescent variant of GFP (CFP), yellow fluorescent variant of GFP (YFP), enhanced GFP (EGFP), enhanced CFP (ECFP), enhanced YFP (EYFP), GFPS65T, Emerald, Topaz (TYFP), Venus, Citrine, mCitrine, GFPuv, destabilised EGFP (dEGFP), destabilised ECFP (dECFP), destabilised EYFP (dEYFP), mCFPm, Cerulean, T-Sapphire, CyPet, YPet, mKO, HcRed, t-HcRed, DsRed, DsRed2, DsRed-monomer, J-Red, dimer2, t-dimer2(12), mRFP1, pocilloporin, Renilla GFP, Monster GFP, paGFP
- Suitable enzymes include, but are not limited to, horseradish peroxidase (HRP), alkaline phosphatase (AP), beta-galactosidase (GAL), glucose-6-phosphate dehydrogenase, beta-N-acetylglucosaminidase, β-glucuronidase, invertase, xanthine oxidase, firefly luciferase, and glucose oxidase (GO).
- the detection moiety comprises an invertase.
- the substrate of the invertase can be sucrose.
- a DNS reagent can be included in the system to produce a colorimetric change when the invertase converts sucrose to glucose.
- the reporter nucleic acid and invertase are conjugated using a heterobifunctional linker via sulfo-SMCC chemistry.
- suitable fluorophores provide a detectable fluorescence signal in the same range as 6-Fluorescein (Integrated DNA Technologies), IRDye 700 (Integrated DNA Technologies), TYE 665 (Integrated DNA Technologies), Alexa Fluor 594 (Integrated DNA Technologies), or ATTO™ 633 (NHS Ester) (Integrated DNA Technologies).
- fluorophores are fluorescein amidite, 6-Fluorescein, IRDye 700, TYE 665, Alexa Fluor 594, or ATTO™ 633 (NHS Ester).
- the fluorophore can be an infrared fluorophore.
- the fluorophore can emit fluorescence in the range of 500 nm and 720 nm.
- the fluorophore emits fluorescence at a wavelength of 700 nm or higher. In other cases, the fluorophore emits fluorescence at about 665 nm. In some cases, the fluorophore emits fluorescence in the range of 500 nm to 520 nm, 500 nm to 540 nm, 500 nm to 590 nm, 590 nm to 600 nm, 600 nm to 610 nm, 610 nm to 620 nm, 620 nm to 630 nm, 630 nm to 640 nm, 640 nm to 650 nm, 650 nm to 660 nm, 660 nm to 670 nm, 670 nm to 680 nm, 680 nm to 690 nm, 690 nm to 700 nm, 700 nm to 710 nm, 710 nm to 720 nm, or
- systems comprise a quenching moiety.
- a quenching moiety may be chosen based on its ability to quench the detection moiety.
- a quenching moiety can be a non-fluorescent fluorescence quencher.
- a quenching moiety may quench a detection moiety that emits fluorescence in the range of 500 nm and 720 nm. In some cases, the quenching moiety quenches a detection moiety that emits fluorescence at a wavelength of 700 nm or higher.
- the quenching moiety quenches a detection moiety that emits fluorescence at about 660 nm or about 670 nm. In some cases, the quenching moiety quenches a detection moiety that emits fluorescence in the range of 500 to 520, 500 to 540, 500 to 590, 590 to 600, 600 to 610, 610 to 620, 620 to 630, 630 to 640, 640 to 650, 650 to 660, 660 to 670, 670 to 680, 680 to 690, 690 to 700, 700 to 710, 710 to 720, or 720 to 730 nm.
- the quenching moiety quenches a detection moiety that emits fluorescence in the range 450 nm to 750 nm, 500 nm to 650 nm, or 550 to 650 nm.
- a quenching moiety may quench fluorescein amidite, 6-Fluorescein, IRDye 700, TYE 665, Alexa Fluor 594, or ATTO™ 633 (NHS Ester).
- a quenching moiety can be Iowa Black RQ, Iowa Black FQ or IRDye QC-1 Quencher.
- a quenching moiety quenches fluorescein amidite, 6-Fluorescein (Integrated DNA Technologies), IRDye 700 (Integrated DNA Technologies), TYE 665 (Integrated DNA Technologies), Alexa Fluor 594 (Integrated DNA Technologies), or ATTO™ 633 (NHS Ester) (Integrated DNA Technologies).
- a quenching moiety can be Iowa Black RQ (Integrated DNA Technologies), Iowa Black FQ (Integrated DNA Technologies) or IRDye QC-1 Quencher (LiCor). Any of the quenching moieties described herein can be from any commercially available source, can be an alternative with a similar function, a generic, or a non-trade name of the quenching moieties listed.
- the generation of the detectable signal from the release of the detection moiety indicates that cleavage by the programmable nucleases has occurred and that the sample contains the target nucleic acid.
- the detection moiety comprises a fluorescent dye.
- the detection moiety comprises a fluorescence resonance energy transfer (FRET) pair.
- the detection moiety comprises an infrared (IR) dye.
- the detection moiety comprises an ultraviolet (UV) dye.
- the detection moiety comprises a protein.
- the detection moiety comprises a biotin.
- the detection moiety comprises at least one of avidin or streptavidin.
- the detection moiety comprises a polysaccharide, a polymer, or a nanoparticle.
- the detection moiety comprises a gold nanoparticle or a latex nanoparticle.
- a detection moiety can be any moiety capable of generating a calorimetric, potentiometric, amperometric, optical (e.g., fluorescent, colorimetric, etc.), or piezo-electric signal.
- a nucleic acid of a reporter, in some embodiments, is a protein-nucleic acid that is capable of generating a calorimetric, potentiometric, amperometric, optical (e.g., fluorescent, colorimetric, etc.), or piezo-electric signal upon cleavage of the nucleic acid. Often a calorimetric signal is heat produced after cleavage of the nucleic acids of a reporter.
- a calorimetric signal is heat absorbed after cleavage of the nucleic acids of a reporter.
- a potentiometric signal for example, is electrical potential produced after cleavage of the nucleic acids of a reporter.
- An amperometric signal can be movement of electrons produced after the cleavage of nucleic acid of a reporter.
- the signal is an optical signal, such as a colorimetric signal or a fluorescence signal.
- An optical signal is, for example, a light output produced after the cleavage of the nucleic acids of a reporter.
- an optical signal is a change in light absorbance between before and after the cleavage of nucleic acids of a reporter.
- a piezo-electric signal is a change in mass between before and after the cleavage of the nucleic acid of a reporter.
- Other methods of detection may also be used, such as optical imaging, surface plasmon resonance (SPR), and/or interferometric sensing.
- the detectable signal is a colorimetric signal or a signal visible by eye.
- the detectable signal is fluorescent, electrical, chemical, electrochemical, or magnetic.
- a detectable signal (e.g., a first detectable signal) is generated by binding of the detection moiety to the capture molecule in the detection region, where the detectable signal indicates that the sample contained the target nucleic acid.
- systems are capable of detecting more than one type of target nucleic acid, where the system comprises more than one type of guide nucleic acid and more than one type of reporter nucleic acid.
- systems may be capable of distinguishing between two different target nucleic acids in a sample (e.g., wild-type or mutant when investigating SNPs as described herein).
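Distinguishing a wild-type target from a mutant target with two guide nucleic acids can be pictured as comparing the reporter signal obtained with each guide. The sketch below is one hypothetical way to make such a call from two background-subtracted readings; the function name, the minimum-signal cutoff, and the dominance ratio are assumptions made for illustration and are not parameters from this disclosure.

```python
# Illustrative sketch only; thresholds and names are assumptions.


def call_allele(wt_signal: float, mut_signal: float,
                min_signal: float = 1000.0,
                dominance_ratio: float = 3.0) -> str:
    """Return 'wild-type', 'mutant', 'both alleles', or 'no call' from the
    signals measured with a wild-type guide and a mutant guide."""
    wt_on = wt_signal >= min_signal
    mut_on = mut_signal >= min_signal
    if not wt_on and not mut_on:
        return "no call"
    # One guide clearly dominates the other.
    if wt_on and (not mut_on or wt_signal >= dominance_ratio * mut_signal):
        return "wild-type"
    if mut_on and (not wt_on or mut_signal >= dominance_ratio * wt_signal):
        return "mutant"
    # Both guides give comparable signal above the cutoff.
    return "both alleles"


if __name__ == "__main__":
    # Made-up readings in arbitrary fluorescence units.
    for wt, mut in [(9000, 500), (400, 8000), (5000, 4500), (100, 200)]:
        print(wt, mut, call_allele(wt, mut))
```

The ratio test is a design choice that tolerates some cross-reactivity of each guide with the other allele; a real assay would calibrate both thresholds against known controls.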
- the detectable signal is generated directly by the cleavage event. Alternatively, or in combination, the detectable signal is generated indirectly by the cleavage event. In some instances, the detectable signal is not a fluorescent signal. In some instances, the detectable signal is a colorimetric or color-based signal.
- the detected target nucleic acid is identified based on its spatial location on the detection region of the support medium. In some cases, a second detectable signal is generated in a spatially distinct location than a first detectable signal when two or more detectable signals are generated.
- the reporter is an enzyme-nucleic acid.
- the enzyme can be sterically hindered when present as in the enzyme-nucleic acid, but then functional upon cleavage from the nucleic acid by the programmable nuclease.
- the enzyme is an enzyme that produces a reaction with an enzyme substrate.
- An enzyme can be invertase.
- the substrate of invertase is sucrose and DNS reagent.
- the reporter is a substrate-nucleic acid.
- the substrate is a substrate that produces a reaction with an enzyme. Release of the substrate upon cleavage by the programmable nuclease may free the substrate to react with the enzyme.
- a reporter is attached to a solid support.
- the solid support for example, can be a surface. A surface can be an electrode.
- the solid support is a bead. Often the bead is a magnetic bead.
- the detection moiety is liberated from the solid support and interacts with other mixtures.
- the detection moiety is an enzyme, and upon cleavage of the nucleic acid of the enzyme-nucleic acid, the enzyme flows through a chamber into a mixture comprising the substrate. When the enzyme meets the enzyme substrate, a reaction occurs, such as a colorimetric reaction, which is then detected.
- the detection moiety is an enzyme substrate, and upon cleavage of the nucleic acid of the enzyme substrate-nucleic acid, the enzyme substrate flows through a chamber into a mixture comprising the enzyme. When the enzyme substrate meets the enzyme, a reaction occurs, such as a colorimetric reaction, which is then detected.
- the reporter comprises a nucleic acid conjugated to an affinity molecule which is in turn conjugated to the fluorophore (e.g., nucleic acid - affinity molecule - fluorophore) or the nucleic acid conjugated to the fluorophore which is in turn conjugated to the affinity molecule (e.g., nucleic acid - fluorophore - affinity molecule).
- a linker conjugates the nucleic acid to the affinity molecule.
- a linker conjugates the affinity molecule to the fluorophore.
- a linker conjugates the nucleic acid to the fluorophore.
- a linker can be any suitable linker known in the art.
- the nucleic acid of the reporter can be directly conjugated to the affinity molecule and the affinity molecule can be directly conjugated to the fluorophore or the nucleic acid can be directly conjugated to the fluorophore and the fluorophore can be directly conjugated to the affinity molecule.
- “directly conjugated” indicates that no intervening molecules, polypeptides, proteins, or other moieties are present between the two moieties directly conjugated to each other.
- a reporter comprises a nucleic acid directly conjugated to an affinity molecule and an affinity molecule directly conjugated to a fluorophore.
- the affinity molecule is biotin, avidin, streptavidin, or any similar molecule.
- the reporter comprises a substrate-nucleic acid.
- the substrate can be sequestered from its cognate enzyme when present as in the substrate-nucleic acid, but then is released from the nucleic acid upon cleavage, where the released substrate can contact the cognate enzyme to produce a detectable signal.
- the substrate is sucrose and the cognate enzyme is invertase, and a DNS reagent can be used to monitor invertase activity.
- a reporter can be a hybrid nucleic acid reporter.
- a hybrid nucleic acid reporter comprises a nucleic acid with at least one deoxyribonucleotide and at least one ribonucleotide.
- the nucleic acid of the hybrid nucleic acid reporter is of any length and/or has any mixture of DNAs and RNAs. For example, in some cases, longer stretches of DNA can be interrupted by a few ribonucleotides. Alternatively, longer stretches of RNA can be interrupted by a few deoxyribonucleotides. Alternatively, every other base in the nucleic acid can alternate between ribonucleotides and deoxyribonucleotides.
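The alternating arrangement described above, in which every other position switches between a ribonucleotide and a deoxyribonucleotide, can be sketched as follows. The "r"/"d" prefixes are an ad hoc notation assumed here for illustration, and the function names are hypothetical.

```python
# Illustrative sketch only; the "r"/"d" prefix notation is an assumption.
from typing import List


def alternating_hybrid(seq: str, start_with_ribo: bool = True) -> List[str]:
    """Return the sequence as units that alternate between ribonucleotides
    ('rN') and deoxyribonucleotides ('dN')."""
    units = []
    for i, base in enumerate(seq.upper()):
        ribo = (i % 2 == 0) == start_with_ribo
        if ribo:
            # Ribonucleotide positions use uracil in place of thymine.
            units.append("r" + ("U" if base == "T" else base))
        else:
            # Deoxyribonucleotide positions use thymine in place of uracil.
            units.append("d" + ("T" if base == "U" else base))
    return units


def is_hybrid(units: List[str]) -> bool:
    """A hybrid reporter has at least one ribonucleotide and one deoxyribonucleotide."""
    return (any(u.startswith("r") for u in units)
            and any(u.startswith("d") for u in units))


if __name__ == "__main__":
    reporter = alternating_hybrid("TTATT")  # made-up reporter sequence
    print(" ".join(reporter), is_hybrid(reporter))
```

The other arrangements mentioned above, such as long DNA stretches interrupted by a few ribonucleotides, would use a different position rule in place of the even/odd test.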
- an advantage of a hybrid nucleic acid reporter is increased stability as compared to a pure RNA nucleic acid reporter.
- a hybrid nucleic acid reporter can be more stable in solution, lyophilized, or vitrified as compared to a pure DNA or pure RNA reporter.
- a reporter and/or guide nucleic acid comprises one or more modifications, e.g., a base modification, a backbone modification, a sugar modification, etc., to provide the nucleic acid with a new or enhanced feature (e.g., improved stability).
- suitable modifications include modified nucleic acid backbones and non-natural internucleoside linkages. Other suitable modifications include nucleic acid mimetics.
- the nucleic acids described herein can include one or more substituted sugar moieties.
- the nucleic acids described herein can include nucleobase modifications or substitutions.
- the nucleic acids described and referred to herein can comprise a plurality of base pairs.
- a base pair refers to a biological unit comprising two nucleobases bound to each other by hydrogen bonds.
- Nucleobases can comprise adenine, guanine, cytosine, thymine, and/or uracil.
- the nucleic acids described and referred to herein can comprise different base pairs.
- the nucleic acids described and referred to herein can comprise one or more modified base pairs. The one or more modified base pairs can be produced when one or more base pairs undergo a chemical modification leading to new bases.
- the one or more modified base pairs can be, for example, Hypoxanthine, Inosine, Xanthine, Xanthosine, 7-Methylguanine, 7-Methylguanosine, 5,6-Dihydrouracil, Dihydrouridine, 5-Methylcytosine, 5-Methylcytidine, 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC), or 5-carboxylcytosine (5caC).
- locus refers to one or more nucleotides that map to a reference sequence, such as a genome.
- a locus refers to a position (e.g., a site) within a reference sequence, e.g., on a particular chromosome of a genome.
- a locus refers to a single nucleotide position within a reference sequence, e.g., on a particular chromosome within a genome.
- a locus refers to a group of nucleotide positions within a reference sequence.
- a locus is defined by a mutation (e.g., substitution, insertion, deletion, inversion, or translocation) of consecutive nucleotides within a reference sequence.
- a locus is defined by a gene, a sub-genic structure (e.g., a regulatory element, exon, intron, or combination thereof), or a predefined span of a chromosome.
- a locus is defined by a biomarker and/or antimicrobial resistance gene that maps to the position of the respective locus.
- a respective target locus comprises a plurality of alleles including an allele having a mutation that confers resistance to a microorganism against antimicrobial interventions that target the respective locus (e.g., antibacterial resistance, antiprotozoal resistance, antifungal resistance, antihelminthic resistance, and/or antiviral resistance).
- a respective target locus comprises a genetic marker for the target locus that indicates a resistance to antimicrobial interventions.
- examples of antimicrobial resistance markers include, but are not limited to, the antimicrobial resistance markers listed in, e.g., Capela et al., 2019, “An Overview of Drug Resistance in Protozoal Diseases,” Int J Mol Sci.
- the target nucleic acid is a single stranded nucleic acid.
- the target nucleic acid is a double stranded nucleic acid and is prepared into single stranded nucleic acids before or upon contacting the programmable nuclease-based detection reagents (e.g., programmable nuclease, guide nucleic acid, and/or reporter).
- the target nucleic acid is a double stranded nucleic acid.
- the double stranded nucleic acid is DNA.
- the target nucleic acid is an RNA.
- Target nucleic acids include but are not limited to mRNA, rRNA, tRNA, non-coding RNA, long non-coding RNA, and microRNA (miRNA).
- the target nucleic acid is complementary DNA (cDNA) synthesized from a single-stranded RNA template in a reaction catalyzed by a reverse transcriptase.
- the target nucleic acid is single-stranded RNA (ssRNA) or mRNA.
- the target nucleic acid is from a virus, a parasite, or a bacterium described herein.
- the target nucleic acid comprises 5 to 100, 5 to 90, 5 to 80, 5 to 70, 5 to 60, 5 to 50, 5 to 40, 5 to 30, 5 to 25, 5 to 20, 5 to 15, or 5 to 10 nucleotides in length. In some cases, the target nucleic acid comprises 10 to 90, 20 to 80, 30 to 70, or 40 to 60 nucleotides in length.
- the target nucleic acid sequence can be from 10 to 95, from 20 to 95, from 30 to 95, from 40 to 95, from 50 to 95, from 60 to 95, from 10 to 75, from 20 to 75, from 30 to 75, from 40 to 75, from 50 to 75, from 5 to 50, from 15 to 50, from 25 to 50, from 35 to 50, or from 45 to 50 nucleotides in length.
- the target nucleic acid comprises 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 45, 50, 60, 70, 80, 90, or 100 nucleotides in length.
- the target nucleic acid comprises at least 10, at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, or at least 100 nucleotides in length.
- the target nucleic acid can be reverse complementary to a guide nucleic acid. In some cases, at least 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21,
- nucleotides of a guide nucleic acid can be reverse complementary to a target nucleic acid.
- a target nucleic acid can be an amplified nucleic acid of interest.
- the nucleic acid of interest can be any nucleic acid disclosed herein or from any sample as disclosed herein.
- the nucleic acid of interest can be an RNA that is reverse transcribed before amplification.
- the nucleic acid of interest is amplified, and then the amplicons are transcribed into RNA.
- target nucleic acids activate a programmable nuclease to initiate sequence-independent cleavage of a nucleic acid-based reporter (e.g., a reporter comprising an RNA sequence, or a reporter comprising DNA and RNA).
- a programmable nuclease of the present disclosure is activated by a target nucleic acid to cleave reporters having an RNA (also referred to herein as an “RNA reporter”).
- the RNA reporter can comprise a single-stranded RNA labeled with a detection moiety or can be any RNA reporter as disclosed herein.
- the target nucleic acid as described in the methods herein does not initially comprise a PAM sequence.
- any target nucleic acid of interest can be generated using the methods described herein to comprise a PAM sequence, and thus be a PAM target nucleic acid.
- a PAM target nucleic acid refers to a target nucleic acid that has been amplified to insert a PAM sequence that is recognized by a CRISPR/Cas system.
- the target nucleic acid is in a cell.
- the cell is a single-cell eukaryotic organism; a plant cell; an algal cell; a fungal cell; an animal cell; a cell of an invertebrate animal; a cell of a vertebrate animal such as a fish, amphibian, reptile, bird, or mammal; or a cell of a mammal such as a human, a non-human primate, an ungulate, a feline, a bovine, an ovine, or a caprine.
- the cell is a eukaryotic cell.
- the cell is a mammalian cell, a human cell, or a plant cell.
- the target nucleic acid sequence comprises a nucleic acid sequence of a virus, a bacterium, or other pathogen responsible for a disease in a plant (e.g., a crop).
- Methods and compositions of the disclosure can be used to treat or detect a disease in a plant.
- the methods of the disclosure can be used to target a viral nucleic acid sequence in a plant.
- a programmable nuclease of the disclosure cleaves the viral nucleic acid.
- the target nucleic acid sequence comprises a nucleic acid sequence of a virus or a bacterium or other agents (e.g., any pathogen) responsible for a disease in the plant (e.g., a crop).
- the target nucleic acid comprises RNA.
- the target nucleic acid, in some cases, is a portion of a nucleic acid from a virus or a bacterium or other agents responsible for a disease in the plant (e.g., a crop).
- the target nucleic acid is a portion of a nucleic acid from a genomic locus, or any NA amplicon, such as a reverse transcribed mRNA or a cDNA from a gene locus, a transcribed mRNA, or a reverse transcribed cDNA from a gene locus in a virus or a bacterium or other agents (e.g., any pathogen) responsible for a disease in the plant (e.g., a crop).
- a virus infecting the plant can be an RNA virus.
- a virus infecting the plant can be a DNA virus.
- Tobacco mosaic virus (TMV)
- Tomato spotted wilt virus (TSWV)
- Cucumber mosaic virus (CMV)
- Potato virus Y (PVY)
- Cauliflower mosaic virus (CaMV)
- Plum pox virus (PPV)
- Brome mosaic virus (BMV)
- Potato virus X (PVX)
- target nucleic acids comprise a mutation.
- a sequence comprising a mutation is modified to a wild-type sequence with a composition, system or method described herein.
- a sequence comprising a mutation is detected with a composition, system or method described herein.
- the mutation can be a mutation of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more nucleotides.
- Non-limiting examples of mutations are insertion-deletion (indel), single nucleotide polymorphism (SNP), and frameshift mutations.
- guide nucleic acids described herein hybridize to a region of the target nucleic acid comprising the mutation.
- a target locus comprises a plurality of alleles including a first allele (e.g., a wild-type allele) and a second allele (e.g., a mutant allele), where the first allele and the second allele differ by a mutation in the nucleic acid sequence of the respective locus, and the respective guide nucleic acid for each respective allele is a nucleic acid that is reverse complementary to the respective allele.
- the mutation is located in a non-coding region or a coding region of a gene.
- target nucleic acids comprise a mutation, where the mutation is a SNP.
- the single nucleotide mutation or SNP can be associated with a phenotype of the sample or a phenotype of the organism from which the sample was taken.
- the SNP in some embodiments, is associated with altered phenotype from wild-type phenotype.
- the SNP can be a synonymous substitution or a nonsynonymous substitution.
- the nonsynonymous substitution can be a missense substitution or a nonsense point mutation.
- the synonymous substitution can be a silent substitution.
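The distinction drawn above between synonymous, missense, and nonsense substitutions can be made concrete by translating a codon before and after a single-nucleotide change. The sketch below assumes the standard genetic code; the function name is hypothetical, and the "stop-loss" category for a change that removes a stop codon is added here for completeness and is not named in the text above.

```python
# Illustrative sketch only, assuming the standard genetic code.
from itertools import product

_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Map each codon (TTT, TTC, TTA, ...) to its one-letter amino acid; '*' is stop.
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)}


def classify_substitution(ref_codon: str, alt_codon: str) -> str:
    """Classify a single-nucleotide change within one codon."""
    ref = ref_codon.upper().replace("U", "T")
    alt = alt_codon.upper().replace("U", "T")
    if len(ref) != 3 or len(alt) != 3:
        raise ValueError("codons must be three nucleotides long")
    if sum(a != b for a, b in zip(ref, alt)) != 1:
        raise ValueError("codons must differ at exactly one position")
    ref_aa, alt_aa = CODON_TABLE[ref], CODON_TABLE[alt]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stop-loss"  # category assumed here, not named in the source
    return "missense"


if __name__ == "__main__":
    print(classify_substitution("GAG", "GTG"))  # Glu to Val
    print(classify_substitution("TAC", "TAA"))  # Tyr to stop
    print(classify_substitution("GAA", "GAG"))  # Glu to Glu
```

The classification depends on the reading frame, so the codon containing the SNP has to be identified from the coding sequence before this function is applied.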
- the mutation can be a deletion of one or more nucleotides.
- the single nucleotide mutation, SNP, or deletion is associated with a disease such as cancer or a genetic disorder.
- the mutation such as a single nucleotide mutation, a SNP, or a deletion, can be encoded in the sequence of a target nucleic acid from the germline of an organism or can be encoded in a target nucleic acid from a diseased cell, such as a cancer cell.
- target nucleic acids comprise a mutation, where the mutation is a deletion of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more nucleotides.
- the mutation can be a deletion of about 5, about 10, about 15, about 20, about 25, about 30, about 35, about 40, about 45, about 50, about 55, about 60, about 65, about 70, about 75, about 80, about 85, about 90, about 95, about 100, about 200, about 300, about 400, about 500, about 600, about 700, about 800, about 900, or about 1000 nucleotides.
- the mutation can be a deletion of 1 to 5, 5 to 10, 10 to 15, 15 to 20, 20 to 25, 25 to 30, 30 to 35, 35 to 40, 40 to 45, 45 to 50, 50 to 55, 55 to 60, 60 to 65, 65 to 70, 70 to 75, 75 to 80, 80 to 85, 85 to 90, 90 to 95, 95 to 100, 100 to 200, 200 to 300, 300 to 400, 400 to 500, 500 to 600, 600 to 700, 700 to 800, 800 to 900, 900 to 1000, 1 to 50, 1 to 100, 25 to 50, 25 to 100, 50 to 100, 100 to 500, 100 to 1000, or 500 to 1000 nucleotides.
- single nucleotide polymorphism refers to the variation of a single nucleotide or nucleobase at a specific position in a nucleic acid sequence.
- the single nucleotide or nucleobase variation is generally between the genomes of two members of the same species, or some other specific population.
- a SNP occurs at a specific nucleic acid site in genomic DNA in which different alternative sequences, e.g., “alleles,” exist more frequently in certain members of a population.
- a less frequent allele comprises the SNP and has an abundance of at least 1%, 0.8%, 0.5%, 0.4%, 0.3%, 0.2% or 0.1%.
- a SNP is any point mutation that is sufficiently present in a population (e.g., 1%, 0.8%, 0.5%, 0.4%, 0.3%, 0.2%, 0.1% or more).
- a SNP can be disease-causing, at least partially, or can be associated with a disease.
- SNPs are known to skilled artisans and can be located in relevant published papers and genomic databases.
- allelic frequency can be determined, thereby providing the fraction of all chromosomes that carry a particular allele of a gene within a specific population.
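Allelic frequency as described above, the fraction of all chromosomes in a population that carry a particular allele, can be computed from genotype counts. The sketch below assumes a diploid, autosomal locus with two alleles; the function names are hypothetical, and the 1% default cutoff is taken from the example abundances listed above.

```python
# Illustrative sketch only, assuming a diploid autosomal locus with two alleles.


def allele_frequency(hom_ref: int, het: int, hom_alt: int) -> float:
    """Return the fraction of chromosomes carrying the alternate allele.

    hom_ref, het, hom_alt are counts of individuals with zero, one, and two
    copies of the alternate allele, respectively.
    """
    total_chromosomes = 2 * (hom_ref + het + hom_alt)
    if total_chromosomes == 0:
        raise ValueError("at least one individual is required")
    return (het + 2 * hom_alt) / total_chromosomes


def meets_abundance_threshold(frequency: float, threshold: float = 0.01) -> bool:
    """Check whether an allele is at least as abundant as the chosen threshold."""
    return frequency >= threshold


if __name__ == "__main__":
    # Made-up genotype counts for 100 individuals.
    freq = allele_frequency(hom_ref=90, het=9, hom_alt=1)
    print(freq, meets_abundance_threshold(freq))
```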
- wild-type refers to a segment or region of nucleic acid sequence, or fragment thereof, that is the universal form (e.g., present in at least 40%) within a population.
- wild-type refers to a segment or region of nucleic acid sequence, or fragment thereof, lacking commonly known sequence variations or allelic variations which can be silent, causal, disease-associated, or disease-risk causing.
- wild-type refers to a polypeptide or protein expressed by a naturally occurring organism, or a polypeptide or protein having the characteristics of a polypeptide or protein isolated from a naturally occurring organism, where the polypeptide or protein is relatively constant within (e.g., present in at least 40% of) a species population.
- mutations are associated with a disease, that is the mutation in a subject indicates that the subject is susceptible to, or suffers from, a disease, disorder, or pathological state.
- a mutation associated with a disease refers to a mutation which causes the disease, contributes to the development of the disease, or indicates the existence of the disease.
- a mutation associated with a disease can also refer to any mutation which generates transcription or translation products at an abnormal level, or in an abnormal form, in cells affected by a disease relative to a control without the disease.
- Nonlimiting examples of diseases associated with mutations are hemophilia, sickle cell anemia, β-thalassemia, Duchenne muscular dystrophy, severe combined immunodeficiency (SCID, also known as “bubble boy syndrome”), Huntington’s disease, cystic fibrosis, and various cancers.
- the systems and methods of the present disclosure can be used to detect one or more target sequences or nucleic acids in one or more samples.
- the one or more samples can comprise one or more target sequences or nucleic acids for detection of an ailment, such as a disease, cancer, or genetic disorder, or genetic information, such as for phenotyping, genotyping, or determining ancestry and are compatible with the reagents and support mediums as described herein.
- a sample can be taken from any place where a nucleic acid can be found.
- Samples can be taken from an individual/human, a non-human animal, or a crop, or an environmental sample can be obtained to test for presence of a disease, virus, pathogen, cancer, genetic disorder, or any mutation or pathogen of interest.
- a biological sample can be blood, serum, plasma, lung fluid, exhaled breath condensate, saliva, spit, urine, stool, feces, mucus, lymph fluid, peritoneal fluid, cerebrospinal fluid, amniotic fluid, breast milk, gastric secretions, bodily discharges, secretions from ulcers, pus, nasal secretions, sputum, pharyngeal exudates, urethral secretions/mucus, vaginal secretions/mucus, anal secretion/mucus, semen, tears, an exudate, an effusion, tissue fluid, interstitial fluid (e.g., tumor interstitial fluid), cyst fluid, tissue, or, in some instances, any combination thereof.
- a sample can be an aspirate of a bodily fluid from an animal (e.g., a human, livestock, a pet, etc.) or plant.
- a tissue sample can be from any tissue that can be infected or affected by a pathogen (e.g., a wart, lung tissue, skin tissue, and the like).
- a tissue sample (e.g., from animals, plants, or humans) can be dissociated or liquified prior to application to the detection system of the present disclosure.
- a sample can be from a plant (e.g., a crop, a hydroponically grown crop or plant, and/or a house plant). Plant samples can include extracellular fluid from tissue (e.g., root, leaves, stem, trunk, etc.).
- a sample can be taken from the environment immediately surrounding a plant, such as hydroponic fluid/water, or soil.
- a sample from an environment can be from soil, air, or water.
- the environmental sample is taken as a swab from a surface of interest or taken directly from the surface of interest.
- the raw sample is applied to the detection system.
- the sample is diluted with a buffer or a fluid or concentrated prior to application to the detection system.
- the sample is contained in no more than about 200 nanoliters (nL). In some cases, the sample is contained in about 200 nL. In some cases, the sample is contained in a volume that is greater than about 200 nL and less than about 20 microliters (µL).
- the sample is contained in no more than 20 µL. In some cases, the sample is contained in no more than 1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 90, 100, 200, 300, 400, or 500 µL, or any value from 1 µL to 500 µL.
- the sample is contained in from 1 µL to 500 µL, from 10 µL to 500 µL, from 50 µL to 500 µL, from 100 µL to 500 µL, from 200 µL to 500 µL, from 300 µL to 500 µL, from 400 µL to 500 µL, from 1 µL to 200 µL, from 10 µL to 200 µL, from 50 µL to 200 µL, from 100 µL to 200 µL, from 1 µL to 100 µL, from 10 µL to 100 µL, from 50 µL to 100 µL, from 1 µL to 50 µL, from 10 µL to 50 µL, from 1 µL to 20 µL, from 10 µL to 20 µL, or from 1 µL to 10 µL.
- the sample is contained in more than 500 µL.
- the sample is taken from a single-cell eukaryotic organism; a plant or a plant cell; an algal cell; a fungal cell; an animal or an animal cell, tissue, or organ; a cell, tissue, or organ from an invertebrate animal; a cell, tissue, fluid, or organ from a vertebrate animal such as fish, amphibian, reptile, bird, and mammal; a cell, tissue, fluid, or organ from a mammal such as a human, a non-human primate, an ungulate, a feline, a bovine, an ovine, and a caprine.
- the sample is taken from nematodes, protozoans, helminths, or malarial parasites.
- the sample can comprise nucleic acids from a cell lysate from a eukaryotic cell, a mammalian cell, a human cell, a prokaryotic cell, or a plant cell.
- the sample can comprise nucleic acids expressed from a cell.
- the sample used for phenotyping testing can comprise at least one target nucleic acid segment that can bind to a guide nucleic acid of the reagents described herein.
- the target nucleic acid segment in some cases, is a portion of a nucleic acid from a gene associated with a phenotypic trait.
- the sample used for genotyping testing can comprise at least one target nucleic acid segment that can bind to a guide nucleic acid of the reagents described herein.
- the target nucleic acid segment in some cases, is a portion of a nucleic acid from a gene associated with a genotype.
- the sample used for ancestral testing can comprise at least one target nucleic acid segment that can bind to a guide nucleic acid of the reagents described herein.
- the target nucleic acid segment in some cases, is a portion of a nucleic acid from a gene associated with a geographic region of origin or ethnic group.
- the sample can be used for identifying a disease status.
- a sample is any sample described herein, and is obtained from a subject for use in identifying a disease status of a subject.
- the disease can be a cancer or genetic disorder.
- a method may comprise obtaining a serum sample from a subject; and identifying a disease status of the subject. Often, the disease status is prostate disease status.
- the device can be configured for asymptomatic, pre-symptomatic, and/or symptomatic diagnostic applications, irrespective of immunity.
- the device can be configured to perform one or more serological assays on a sample (e.g., a sample comprising blood).
- the target sequence is a portion of a nucleic acid from a virus or a bacterium or other agents responsible for a disease in the sample.
- the target sequence in some cases, is a portion of a nucleic acid from a sexually transmitted infection or a contagious disease, in the sample.
- the target sequence in some cases, is a portion of a nucleic acid from an upper respiratory tract infection, a lower respiratory tract infection, or a contagious disease, in the sample.
- the target sequence in some cases, is a portion of a nucleic acid from a hospital acquired infection or a contagious disease, in the sample.
- the target sequence is a portion of a nucleic acid from sepsis, in the sample.
- diseases can include but are not limited to respiratory viruses (e.g., SARS-CoV-2 (i.e., a virus that causes COVID-19), SARS-CoV-1, MERS-CoV, influenza, Adenovirus, Coronavirus HKU1, Coronavirus NL63, Coronavirus 229E, Coronavirus OC43, Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), Human Metapneumovirus (hMPV), Human Rhinovirus (HRVs A, B, C), Human Enterovirus, Influenza A, Influenza A/H1, Influenza A/H2, Influenza A/H3, Influenza A/H4, Influenza A/H5, Influenza A/H6, Influenza A/H7, Influenza A/H8, Influenza A/H9, Influenza A/H10, Influenza A/H11, Influenza A/H12, Influenza A/H13, Influenza A
- viruses include human immunodeficiency virus (HIV), human papillomavirus (HPV), chlamydia, gonorrhea, syphilis, trichomoniasis, sexually transmitted infection, malaria, Dengue fever, Ebola, chikungunya, and leishmaniasis.
- Pathogens include viruses, fungi, helminths, protozoa, malarial parasites, Plasmodium parasites, Toxoplasma parasites, and Schistosoma parasites.
- Helminths include roundworms, heartworms, and phytophagous nematodes, flukes, Acanthocephala, and tapeworms.
- Protozoan infections include infections from Giardia spp., Trichomonas spp., African trypanosomiasis, amoebic dysentery, babesiosis, balantidial dysentery, Chagas disease, coccidiosis, malaria and toxoplasmosis.
- pathogens such as parasitic/protozoan pathogens include, but are not limited to: Plasmodium falciparum, P. vivax, Trypanosoma cruzi and Toxoplasma gondii.
- Fungal pathogens include, but are not limited to Cryptococcus neoformans, Histoplasma capsulatum, Coccidioides immitis, Blastomyces dermatitidis, Chlamydia trachomatis, Chlamydia pneumoniae, Chlamydia psittaci, and Candida albicans.
- Pathogenic viruses include but are not limited to: respiratory viruses (e.g., adenoviruses, parainfluenza viruses, severe acute respiratory syndrome (SARS), coronavirus, MERS), gastrointestinal viruses (e.g., noroviruses, rotaviruses, some adenoviruses, astroviruses), exanthematous viruses (e.g., the virus that causes measles, the virus that causes rubella, the virus that causes chickenpox/shingles, the virus that causes roseola, the virus that causes smallpox, the virus that causes fifth disease, chikungunya virus infection); hepatic viral diseases (e.g., hepatitis A, B, C, D, E); cutaneous viral diseases (e.g., warts (including genital, anal), herpes (including oral, genital, anal), molluscum contagiosum); hemorrhagic viral diseases (e.g., Ebola, Lassa fever
- Pathogens include, e.g., HIV virus, Mycobacterium tuberculosis, Klebsiella pneumoniae, Acinetobacter baumannii, Bacillus anthracis, Bordetella pertussis, Burkholderia cepacia, Corynebacterium diphtheriae, Coxiella burnetii, Streptococcus agalactiae, methicillin-resistant Staphylococcus aureus, Legionella longbeachae, Legionella pneumophila, Leptospira interrogans, Moraxella catarrhalis, Streptococcus pyogenes, Escherichia coli, Neisseria gonorrhoeae, Neisseria meningitidis, Neisseria elongata, Parechovirus, Pneumococcus, Pneumocystis jirovecii, Cryptoc
- the target nucleic acid comprises a sequence from a virus or a bacterium or other agents responsible for a disease that can be found in the sample.
- the target nucleic acid is a portion of a nucleic acid from a genomic locus, a transcribed mRNA, or a reverse transcribed cDNA from a gene locus in at least one of: human immunodeficiency virus (HIV), human papillomavirus (HPV), chlamydia, gonorrhea, syphilis, trichomoniasis, sexually transmitted infection, malaria, Dengue fever, Ebola, chikungunya, and leishmaniasis.
- Pathogens include viruses, fungi, helminths, protozoa, malarial parasites, Plasmodium parasites, Toxoplasma parasites, and Schistosoma parasites.
- Helminths include roundworms, heartworms, and phytophagous nematodes, flukes, Acanthocephala, and tapeworms.
- Protozoan infections include infections from Giardia spp., Trichomonas spp., African trypanosomiasis, amoebic dysentery, babesiosis, balantidial dysentery, Chagas disease, coccidiosis, malaria and toxoplasmosis.
- pathogens such as parasitic/protozoan pathogens include, but are not limited to: Plasmodium falciparum, P. vivax, Trypanosoma cruzi and Toxoplasma gondii.
- Fungal pathogens include, but are not limited to Cryptococcus neoformans, Histoplasma capsulatum, Coccidioides immitis, Blastomyces dermatitidis, Chlamydia trachomatis, and Candida albicans.
- Pathogenic viruses include but are not limited to immunodeficiency virus (e.g., HIV); influenza virus; dengue; West Nile virus; herpes virus; yellow fever virus; Hepatitis Virus C; Hepatitis Virus A; Hepatitis Virus B; papillomavirus; and the like.
- Pathogens include, e.g., HIV virus, Mycobacterium tuberculosis, Streptococcus agalactiae, methicillin-resistant Staphylococcus aureus, Staphylococcus epidermidis, Legionella pneumophila, Streptococcus pyogenes, Streptococcus salivarius, Escherichia coli, Neisseria gonorrhoeae, Neisseria meningitidis, Pneumococcus, Cryptococcus neoformans, Histoplasma capsulatum, Hemophilus influenzae B, Treponema pallidum, Lyme disease spirochetes, Pseudomonas aeruginosa, Mycobacterium leprae, Brucella abortus, rabies virus, influenza virus, cytomegalovirus, herpes simplex virus I, herpes simplex virus II, human serum parvo-like virus, respiratory
- Pathogens further include T. vaginalis, varicella-zoster virus, hepatitis B virus, hepatitis C virus, measles virus, human adenovirus (type A, B, C, D, E, F, G), human T-cell leukemia viruses, Epstein-Barr virus, murine leukemia virus, mumps virus, vesicular stomatitis virus, Sindbis virus, lymphocytic choriomeningitis virus, wart virus, and blue tongue virus.
- SARS-CoV-2 Variants include Coronavirus HKU1, Coronavirus NL63, Coronavirus 229E, Coronavirus OC43, SARS-CoV-2 85Δ, SARS-CoV-2 T1001I, SARS-CoV-2 3675-3677Δ, SARS-CoV-2 P4715L, SARS-CoV-2 S5360L, SARS-CoV-2 69-70Δ, SARS-CoV-2 Tyr144fs, SARS-CoV-2 242-244Δ, SARS-CoV-2 Y453F, SARS-CoV-2 S477N, SARS-CoV-2 E848K, SARS-CoV-2 N501Y, SARS-CoV-2 D614G, SARS-CoV-2 P681R, SARS-CoV-2 P681H, SARS-CoV-2 L21F, SARS-CoV
- the target sequence is a portion of a nucleic acid from a genomic locus, a transcribed mRNA, or a reverse transcribed cDNA from a gene locus of bacterium or other agents responsible for a disease in the sample comprising a mutation that confers resistance to a treatment, such as a single nucleotide mutation that confers resistance to antibiotic treatment.
- the target sequence is a portion of a nucleic acid from a subject having cancer.
- the cancer can be a solid cancer (tumor).
- the cancer can be a blood cell cancer, including leukemias and lymphomas.
- Non-limiting types of cancer that could be treated with such methods and compositions include colon cancer, rectal cancer, renal-cell carcinoma, liver cancer, bladder cancer, cancer of the kidney or ureter, lung cancer, cancer of the small intestine, esophageal cancer, melanoma, bone cancer, pancreatic cancer, skin cancer, brain cancer (e.g., glioblastoma), cancer of the head or neck, melanoma, uterine cancer, ovarian cancer, breast cancer, testicular cancer, cervical cancer, stomach cancer, Hodgkin’s Disease, non-Hodgkin’s lymphoma, thyroid cancer.
- the cancer can be a leukemia, such as, by way of non-limiting example, acute myeloid (or myelogenous) leukemia (AML), chronic myeloid (or myelogenous) leukemia (CML), acute lymphocytic (or lymphoblastic) leukemia (ALL), and chronic lymphocytic leukemia (CLL).
- the target sequence is a portion of a nucleic acid from a cancer cell.
- a cancer cell can be a cell harboring one or more mutations that results in unchecked proliferation of the cancer cell. Such mutations are known in the art.
- Non-limiting examples of antigens are ADRB3, AKAP-4, ALK, Androgen receptor, B7H3, BCMA, BORIS, BST2, CAIX, CD179a, CD123, CD171, CD19, CD20, CD22, CD24, CD30, CD300LF, CD33, CD38, CD44v6, CD72, CD79a, CD79b, CD97, CEA, CLDN6, CLEC12A, CLL-1, CS-1, CXORF61, CYP1B1, Cyclin B1, E7, EGFR, EGFRvIII, ELF2M, EMR2, EPCAM, ERBB2 (Her2/neu), ERG (TMPRSS2 ETS fusion gene), ETV6-AML, EphA2, Ephrin B2, FAP, FCRL5, FLT3, Folate receptor alpha, Folate receptor beta, Fos-related antigen 1, Fucosyl GM1, GD2, GD3, GM3, GPC3, GPR20, GPRC5D, Glob
- the target sequence is a portion of a nucleic acid from a control gene in a sample.
- the control gene is an endogenous control.
- the endogenous control can include human 18S rRNA, human GAPDH, human HPRT1, human GUSB, human RNase P, MS2 bacteriophage, or any other control sequence of interest within the sample.
- the sample used for cancer testing or cancer risk testing can comprise at least one target sequence or target nucleic acid segment that can bind to a guide nucleic acid of the reagents described herein.
- the target nucleic acid segment in some cases, is a portion of a nucleic acid from a gene with a mutation associated with cancer, from a gene whose overexpression is associated with cancer, a tumor suppressor gene, an oncogene, a checkpoint inhibitor gene, a gene associated with cellular growth, a gene associated with cellular metabolism, or a gene associated with cell cycle.
- the target nucleic acid encodes a cancer biomarker, such as a prostate cancer biomarker or a non-small cell lung cancer biomarker.
- the assay can be used to detect “hotspots” in target nucleic acids that can be predictive of cancer, such as lung cancer or cervical cancer. In some cases, the cancer can be a cancer that is caused by a virus.
- viruses that cause cancers in humans include Epstein-Barr virus (e.g., Burkitt’s lymphoma, Hodgkin’s Disease, and nasopharyngeal carcinoma); papillomavirus (e.g., cervical carcinoma, anal carcinoma, oropharyngeal carcinoma, penile carcinoma); hepatitis B and C viruses (e.g., hepatocellular carcinoma); human adult T-cell leukemia virus type 1 (HTLV-1) (e.g., T-cell leukemia); and Merkel cell polyomavirus (e.g., Merkel cell carcinoma).
- the target nucleic acid is a portion of a nucleic acid that is associated with a blood fever.
- the mutation is located in a portion of a nucleic acid from a genomic locus, any DNA amplicon of, a reverse transcribed mRNA, or a cDNA from a locus of at least one of: ALK, APC, ATM, AXIN2, BAP1, BARD1, BLM, BMPR1A, BRCA1, BRCA2, BRIP1, CASR, CDC73, CDH1, CDK4, CDKN1B, CDKN1C, CDKN2A, CEBPA, CHEK2, CTNNA1, DICER1, DIS3L2, EGFR, EPCAM, FH, FLCN, GATA2, GPC3, GREM1, HOXB13, HRAS, system, MAX, MEN1, MET, MITF, MLH1, MSH2, MSH3, M
- the sample used for genetic disorder testing can comprise at least one target sequence or target nucleic acid segment that can bind to a guide nucleic acid of the reagents described herein.
- the genetic disorder is hemophilia, sickle cell anemia, β-thalassemia, Duchenne muscular dystrophy, severe combined immunodeficiency, or cystic fibrosis.
- the target nucleic acid in some cases, is from a gene with a mutation associated with a genetic disorder, from a gene whose overexpression is associated with a genetic disorder, from a gene associated with abnormal cellular growth resulting in a genetic disorder, or from a gene associated with abnormal cellular metabolism resulting in a genetic disorder.
- the target nucleic acid is a nucleic acid from a genomic locus, a transcribed mRNA, or a reverse transcribed mRNA, a DNA amplicon of or a cDNA from a locus of at least one of: CFTR, FMR1, SMN1, ABCB11, ABCC8, ABCD1, ACAD9, ACADM, ACADVL, ACAT1, ACOX1, ACSF3, ADA, ADAMTS2, ADGRG1, AGA, AGL, AGPS, AGXT, AIRE, ALDH3A2, ALDOB, ALG6, ALMS1, ALPL, AMT, AQP2, ARG1, ARSA, ARSB, ASL, ASNS, ASPA, ASS1, ATM, ATP6V1B1, ATP7A, ATP7B, ATRX, BBS1, BBS10, BBS12, BBS2, BCKDHA, BCKDHB, BCS1L, BLM, BSND, CAPN3, CBS, CDH23
- target nucleic acids are amplified and/or comprise amplicons.
- the methods described herein can comprise amplifying a target nucleic acid molecule, and/or amplifying a nucleic acid molecule in a sample to produce a target nucleic acid molecule.
- Amplification can occur prior to or simultaneously with detection of a signal indicative of reporter cleavage.
- amplification and detection can occur within the same reaction volume sequentially or simultaneously.
- amplification and detection can occur sequentially in different reaction volumes.
- amplification and detection can occur at different temperatures.
- amplification and detection can occur at the same temperature.
- amplifying can improve at least one of sensitivity, specificity, or accuracy of the detection of the target nucleic acid.
- Amplification can be isothermal or can comprise thermocycling.
- amplification comprises transcription mediated amplification (TMA), helicase dependent amplification (HDA), circular helicase dependent amplification (cHDA), strand displacement amplification (SDA), recombinase polymerase amplification (RPA), polymerase chain reaction (PCR), quantitative PCR (qPCR), nested PCR, multiplex PCR, degenerative PCR, asymmetric PCR, touchdown PCR, random primer PCR, hemi-nested PCR, polymerase cycling assembly (PCA), colony PCR, ligase chain reaction (LCR), digital PCR, methylation-specific PCR (MSP), co-amplification at lower denaturation temperature PCR (COLD-PCR), allele-specific PCR, intersequence-specific PCR (ISS-PCR), whole genome amplification (WGA), inverse PCR, thermal asymmetric interlaced PCR (TAIL-PCR), loop mediated mediated amplification (TM
- amplification of the target nucleic acid comprises modifying the sequence of the target nucleic acid.
- amplification can be used to insert a protospacer adjacent motif (PAM) sequence into a target nucleic acid that lacks a PAM sequence.
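One way to picture PAM insertion by amplification is a forward primer that matches the template except at the positions where the PAM is wanted. The sketch below is illustrative only and rests on several assumptions that are not taken from the disclosure: the PAM is `TTTA` placed directly 5' of the protospacer, the primer keeps a few matched bases at its 3' end, and the amplicon is idealized as the primer followed by the downstream template. All function names and sequences are hypothetical.

```python
def pam_introducing_primer(template, protospacer_start, pam, flank=12, anchor=4):
    """Build a forward primer that places `pam` directly 5' of the protospacer.

    The primer copies `flank` template bases upstream of the PAM site,
    substitutes `pam` for the template bases at the PAM site, and ends with
    `anchor` matched protospacer bases so its 3' end still pairs with the template.
    """
    pam_start = protospacer_start - len(pam)
    flank_start = pam_start - flank
    if flank_start < 0:
        raise ValueError("not enough upstream sequence for the primer")
    return (
        template[flank_start:pam_start]
        + pam
        + template[protospacer_start:protospacer_start + anchor]
    )


def idealized_amplicon(template, primer, protospacer_start, anchor=4):
    """Idealized product: the primer followed by the template downstream of it."""
    return primer + template[protospacer_start + anchor:]


# Hypothetical 40-nt template whose protospacer starts at index 20 and is not
# preceded by the assumed PAM.
template = "GATTACAGGCTAGCATCGGACGTACGTTAGCCATGCAAGT"
pam = "TTTA"
primer = pam_introducing_primer(template, protospacer_start=20, pam=pam)
amplicon = idealized_amplicon(template, primer, protospacer_start=20)
```

In the resulting amplicon the protospacer is unchanged and is now immediately preceded by the assumed PAM. Real primer design would also need to account for melting temperature, mismatch tolerance, and the PAM requirements of the specific programmable nuclease.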
- amplification is used to increase the homogeneity of a target nucleic acid in a sample.
- amplification can be used to remove a nucleic acid variation that is not of interest in the target nucleic acid sequence.
- the term “subject” refers to any living or non-living organism including, but not limited to, a human (e.g., a male human, female human, fetus, pregnant female, child, or the like), a non-human mammal, or a non-human animal.
- Any human or non-human animal can serve as a subject, including but not limited to mammal, reptile, avian, amphibian, fish, ungulate, ruminant, bovine (e.g., cattle), equine (e.g., horse), caprine and ovine (e.g., sheep, goat), swine (e.g., pig), camelid (e.g., camel, llama, alpaca), monkey, ape (e.g., gorilla, chimpanzee), ursid (e.g., bear), poultry, dog, cat, mouse, rat, fish, dolphin, whale and shark.
- a subject is a male or female of any age (e.g., a man, a woman, or a child).
- a reference sequence refers to a sequence of nucleotide bases.
- a reference sequence is a reference genome.
- a “genome” or “reference genome” can refer to any particular known, sequenced or characterized genome, whether partial or complete, of any organism or virus that can be used to reference identified sequences from a subject. Exemplary reference genomes used for human subjects as well as many other organisms are provided in the on-line genome browser hosted by the National Center for Biotechnology Information (“NCBI”) or the University of California, Santa Cruz (UCSC).
- a “genome” refers to the complete genetic information of an organism or virus, expressed in nucleic acid sequences.
- a reference sequence or reference genome often is an assembled or partially assembled genomic sequence from an individual subject or from multiple subjects.
- a reference genome is an assembled or partially assembled genomic sequence from one or more human subjects.
- a reference genome is an assembled or partially assembled genomic sequence from one or more microorganisms of the same species.
- the reference genome can be viewed as a representative example of a species’ set of genes.
- a reference genome comprises sequences assigned to chromosomes.
- Exemplary human reference genomes include but are not limited to NCBI build 34 (UCSC equivalent: hg16), NCBI build 35 (UCSC equivalent: hg17), NCBI build 36.1 (UCSC equivalent: hg18), GRCh37 (UCSC equivalent: hg19), and GRCh38 (UCSC equivalent: hg38).
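When sequence coordinates are exchanged between tools, it can help to normalize the assembly name. The lookup below is a small sketch of the build equivalences listed above; the dictionary and function names are hypothetical.

```python
# Assembly name (NCBI/GRC) -> UCSC genome browser name.
ASSEMBLY_TO_UCSC = {
    "NCBI build 34": "hg16",
    "NCBI build 35": "hg17",
    "NCBI build 36.1": "hg18",
    "GRCh37": "hg19",
    "GRCh38": "hg38",
}


def ucsc_name(assembly):
    """Return the UCSC name for an assembly, or raise if it is not listed."""
    try:
        return ASSEMBLY_TO_UCSC[assembly]
    except KeyError:
        raise ValueError(f"unknown reference assembly: {assembly!r}") from None


# Reverse lookup, e.g. "hg19" -> "GRCh37".
UCSC_TO_ASSEMBLY = {ucsc: name for name, ucsc in ASSEMBLY_TO_UCSC.items()}
```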
- FIG. 1 is a block diagram illustrating a system 100 for determining a mutation call for a target locus in a biological sample, in accordance with some implementations.
- the device 100 in some implementations includes one or more central processing units (CPU(s)) 102 (also referred to as processors), one or more network interfaces 104, a user interface 106, a non-persistent memory 111, a persistent memory 112, and one or more communication buses 110 for interconnecting these components.
- the one or more communication buses 110 optionally include circuitry (sometimes called a chipset) that interconnects and controls communications between system components.
- the non-persistent memory 111 typically includes high-speed random access memory, such as DRAM, SRAM, DDR RAM, ROM, EEPROM, flash memory, whereas the persistent memory 112 typically includes CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid state storage devices.
- the persistent memory 112 optionally includes one or more storage devices remotely located from the CPU(s) 102.
- the persistent memory 112, and the non-volatile memory device(s) within the non-persistent memory 111, comprise a non-transitory computer readable storage medium.
- the non-persistent memory 111 or alternatively the non-transitory computer readable storage medium stores the following programs, modules and data structures, or a subset thereof, sometimes in conjunction with the persistent memory 112:
- an optional operating system 116 which includes procedures for handling various basic system services and for performing hardware dependent tasks;
- a signal data store 120 comprising, for each respective well 122 in a plurality of wells, a corresponding plurality of reporting signals 124 for a respective one or more nucleic acid molecules in the biological sample that map to the target locus, where:
  o the plurality of wells comprises a first set of wells 122 (e.g., 122-1-1, ... 122-K-1) representing a first allele for the target locus,
  o the plurality of wells further comprises a second set of wells 122 (e.g., 122-1-2, ...
- each corresponding plurality of reporting signals 124 comprises, for each respective time point in a plurality of time points (e.g., P time points, where P is a positive integer), a respective reporting signal in the form of a corresponding discrete attribute value (e.g., 124-1-1-1, ... 124-1-1-P; 124-2-1-1, ...
- each respective well 122 in the first set of wells and each respective well 122 in the second set of wells contains a corresponding aliquot of nucleic acid derived from the biological sample
- the signal data store 120 further comprises a plurality of control wells 126 that are free of nucleic acid derived from the biological sample, including a first set of control wells 126-1 representing the first allele for the target locus and a second set of control wells 126-2 representing the second allele for the target locus;
- a data analysis construct 130 for determining:
  o for each respective well 122 in the plurality of wells, a corresponding signal yield 132 (e.g., 132-1-1, 132-1-2) for the respective well 122 using the corresponding plurality of reporting signals across the plurality of time points, and
  o for each respective well 122 in the first set of wells (e.g., 122-1-1, ...
- a respective candidate call identity 134 (e.g., 134-1) based on a comparison between a corresponding first signal yield (e.g., 132-1-1) for the respective well in the first set of wells and a corresponding second signal yield (e.g., 132-1-2) for a corresponding well in the second set of wells; and
- a voting construct 140 for performing a voting procedure across the plurality of candidate call identities for the first set of wells, thereby obtaining a mutation call for the target locus.
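The flow from reporting signals to signal yields, candidate calls, and a voted mutation call can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the yield definition (rise of the signal over the time course), the `ratio` and `floor` thresholds, the call labels, and the tie handling are all assumptions chosen for the example, and control wells are omitted.

```python
from collections import Counter


def signal_yield(signals):
    """Assumed yield: rise of the reporting signal over the time course."""
    return max(signals) - signals[0]


def candidate_call(yield_first, yield_second, ratio=2.0, floor=1.0):
    """Compare the yields of paired wells; `ratio` and `floor` are illustrative."""
    if max(yield_first, yield_second) < floor:
        return "no_call"  # neither well produced an appreciable signal
    if yield_first >= ratio * yield_second:
        return "first_allele"
    if yield_second >= ratio * yield_first:
        return "second_allele"
    return "both_alleles"


def vote(calls):
    """Majority vote across candidate calls; an empty list or a tie gives 'no_call'."""
    ranked = Counter(calls).most_common()
    if not ranked:
        return "no_call"
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "no_call"
    return ranked[0][0]


# Hypothetical time courses for three replicate well pairs (P = 3 time points).
first_set = [[1.0, 4.0, 9.0], [1.2, 3.8, 8.6], [0.9, 1.0, 1.1]]
second_set = [[1.0, 1.1, 1.2], [1.1, 1.3, 1.4], [1.0, 1.0, 1.1]]

calls = [
    candidate_call(signal_yield(a), signal_yield(b))
    for a, b in zip(first_set, second_set)
]
mutation_call = vote(calls)
```

In this toy input, two replicate pairs favor the first allele and the third pair is too weak to call, so the vote returns the first allele.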
- one or more of the above identified elements are stored in one or more of the previously mentioned memory devices and correspond to a set of instructions for performing a function described above.
- the above identified modules, data, or programs (e.g., sets of instructions) need not be implemented as separate software programs, procedures, data sets, or modules, and thus various subsets of these modules and data may be combined or otherwise re-arranged in various implementations.
- the non-persistent memory 111 optionally stores a subset of the modules and data structures identified above. Furthermore, in some embodiments, the memory stores additional modules and data structures not described above.
- one or more of the above identified elements is stored in a computer system, other than that of system 100, that is addressable by system 100 so that system 100 may retrieve all or a portion of such data when needed.
- Although FIG. 1 depicts a “system 100,” the figure is intended more as a functional description of the various features which may be present in computer systems than as a structural schematic of the implementations described herein. In practice, and as recognized by those of ordinary skill in the art, items shown separately could be combined and some items could be separated. Moreover, although FIG. 1 depicts certain data and modules in non-persistent memory 111, some or all of these data and modules may be in persistent memory 112.
- While a system in accordance with the present disclosure has been disclosed with reference to FIG. 1, a method in accordance with the present disclosure is now detailed with reference to FIGS. 2A-2G.
- the present disclosure provides a method 200 for determining a mutation call for a target locus in a biological sample.
- the method is performed at a computer system having one or more processors, and memory storing one or more programs for execution by the one or more processors.
- the method includes obtaining a signal dataset 120 comprising, for each respective well 122 in a plurality of wells in a common plate, a corresponding plurality of reporting signals 124 for a respective one or more nucleic acid molecules in the biological sample that map to the target locus.
- the signal dataset 120 represents a plurality of time points.
- the plurality of wells 122 comprises a first set of wells representing a first allele for the target locus, where each well in the first set of wells includes a first plurality of guide nucleic acids that have the first allele of the target locus.
- the plurality of wells 122 further comprises a second set of wells representing a second allele for the target locus, where each well in the second set of wells includes a second plurality of guide nucleic acids that have the second allele of the target locus.
- Each corresponding plurality of reporting signals 124 comprises, for each respective time point in the plurality of time points, a respective reporting signal 124 in the form of a corresponding discrete attribute value.
- Each respective well in the first set of wells and each respective well in the second set of wells contains a corresponding aliquot of nucleic acid derived from the biological sample.
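The signal dataset described above can be pictured as a collection of well records, each tagged with the allele set it represents and holding one value per time point. The sketch below is one possible in-memory layout; the `Well` class, its field names, and the use of a replicate index to pair a first-set well with its second-set counterpart are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Well:
    allele_set: int        # 1 = first-allele guides, 2 = second-allele guides
    replicate: int         # hypothetical index pairing wells across the two sets
    signals: List[float]   # one discrete attribute value per time point


def pair_wells(wells: List[Well], n_time_points: int) -> Dict[int, Tuple[Well, Well]]:
    """Validate the dataset and pair each first-set well with its second-set well."""
    first: Dict[int, Well] = {}
    second: Dict[int, Well] = {}
    for well in wells:
        if len(well.signals) != n_time_points:
            raise ValueError(f"replicate {well.replicate}: expected {n_time_points} signals")
        if well.allele_set == 1:
            first[well.replicate] = well
        elif well.allele_set == 2:
            second[well.replicate] = well
        else:
            raise ValueError(f"unknown allele set: {well.allele_set}")
    if first.keys() != second.keys():
        raise ValueError("every first-set well needs a matching second-set well")
    return {k: (first[k], second[k]) for k in sorted(first)}


# Hypothetical plate with two replicate pairs and three time points.
plate = [
    Well(1, 1, [1.0, 4.0, 9.0]),
    Well(2, 1, [1.0, 1.1, 1.2]),
    Well(1, 2, [1.2, 3.8, 8.6]),
    Well(2, 2, [1.1, 1.3, 1.4]),
]
pairs = pair_wells(plate, n_time_points=3)
```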
- the biological sample is a clinical sample, a diagnostic sample, an environmental sample, a consumer quality sample, a food sample, a biological product sample, a microbial testing sample, a tumor sample, a forensic sample and/or a laboratory or hospital sample.
- the biological sample is obtained from a human or an animal.
- a biological sample is a sample from a patient undergoing a treatment.
- the biological sample is collected from an environmental source, such as a field (e.g., an agricultural field), lake, river, creek, ocean, watershed, water tank, water reservoir, pool (e.g., swimming pool), pond, air vent, wall, roof, soil, plant, and/or other environmental source.
- the biological sample is collected from an industrial source, such as a clean room (e.g., in manufacturing or research facilities), hospital, medical laboratory, pharmacy, pharmaceutical compounding center, food processing area, food production area, water or waste treatment facility, and/or food product.
- the biological sample is an air sample, such as ambient air in a facility (e.g., a medical facility or other facility), exhaled or expectorated air from a subject, and/or aerosols, including any biological contaminants present therein (e.g., bacteria, fungi, viruses, and/or pollens).
- the biological sample is a water sample, such as water from dialysis systems in a medical facility (e.g., to detect waterborne pathogens of clinical significance and/or to determine the quality of water in a facility).
- the biological sample is an environmental surface sample, such as a sample taken before or after a sterilization or disinfecting process (e.g., to confirm the effectiveness of the sterilization or disinfecting procedure).
- the biological sample is obtained from a subject, such as a human (e.g., a patient). In some embodiments, the biological sample is obtained from any tissue, organ or fluid from the subject. In some embodiments, a plurality of biological samples is obtained from the subject (e.g., a plurality of replicates and/or a plurality of samples including a healthy sample and a diseased sample).
- the biological sample is a clinical sample. In some embodiments, the biological sample is a bodily fluid. In some embodiments, the bodily fluid is sputum, saliva, nasopharyngeal fluid, oropharyngeal fluid or blood. In some embodiments, the biological sample is obtained from a nasopharyngeal or oropharyngeal swab. In some embodiments, the biological sample is obtained from a sample repository.
- the biological sample is obtained from a human with a disease condition (e.g., an infectious disease and/or a disease caused by a pathogenic microorganism).
- the biological sample is obtained from a subject that is infected with a pathogen.
- the pathogen is a virus, bacteria, fungus, or parasite.
- the pathogen is the causative agent of influenza, common cold, measles, rubella, chickenpox, norovirus, polio, infectious mononucleosis (mono), herpes simplex virus (HSV), human papillomavirus (HPV), human immunodeficiency virus (HIV), viral hepatitis (e.g., hepatitis A, B, C, D, and/or E), viral meningitis, West Nile Virus, rabies, ebola, strep throat, bacterial urinary tract infections (UTIs) (e.g., coliform bacteria), bacterial food poisoning (e.g., E. coli, Salmonella, and/or Shigella), bacterial cellulitis (e.g., Staphylococcus aureus (MRSA)), bacterial vaginosis, gonorrhea, chlamydia, syphilis, and/or Clostridium difficile.
- the pathogen is the causative agent of one or more viral respiratory diseases.
- the pathogen is the causative agent of a coronavirus infection. Referring to Block 206, in some embodiments, the pathogen is severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).
- the biological sample is any of the embodiments for samples described herein.
- Other suitable embodiments of samples and/or subjects include those disclosed above (see, for example, the sections entitled “Target Nucleic Acids: Samples” and “Subjects,” above), and any substitutions, modifications, additions, deletions, and/or combinations thereof, as will be apparent to one skilled in the art.
- the target locus is selected from the group consisting of a single nucleotide variant, a multi-nucleotide variant, an indel, a DNA rearrangement, and a copy number variation.
- the target locus is selected from the group consisting of a restriction fragment length polymorphism (RFLP), a random amplified polymorphic DNA (RAPD), an amplified fragment length polymorphism (AFLP), a variable number tandem repeat (VNTR), an oligonucleotide polymorphism (OP), a single nucleotide polymorphism (SNP), an allele specific associated primer (ASAP), an inverse sequence-tagged repeat (ISTR), an inter-retrotransposon amplified polymorphism (IRAP), and a simple sequence repeat (SSR or microsatellite).
- the target locus is selected from a database.
- the target locus maps to a corresponding reference sequence (e.g., a reference genome), and the corresponding reference sequence is obtained from a nucleotide sequence database.
- Suitable nucleotide sequence databases include global genome databases and/or microorganism-specific genome databases.
- a reference sequence for a microorganism is obtained from NCBI, BLAST, EMBL-EBI, GenBank, Ensembl, EuPathDB, The Human Microbiome Project, Pathogen Portal, RDP, SILVA, GREENGENES, EBI Metagenomics, EcoCyc, PATRIC, TBDB, PlasmoDB, the Microbial Genome Database (MBGD), and/or the Microbial Rosetta Stone Database.
- the target locus is a gene. In some embodiments, the target locus maps to all or a portion of a gene. In some embodiments, the target locus maps to all or a portion of a plurality of genes.
- the target locus comprises DNA or RNA.
- the target locus maps to all or a portion of a reference genome that consists essentially of RNA sequences.
- the target locus maps to all or a portion of a reference genome that consists essentially of DNA sequences.
- the target locus comprises DNA, and the one or more nucleic acid molecules in the biological sample that map to the target locus are obtained from a transcription reaction of a DNA molecule for the target locus.
- the transcription reaction generates one or more RNA transcripts that correspond to the target locus and, for each respective well in the plurality of wells in the common plate, the corresponding aliquot of nucleic acid derived from the biological sample includes the one or more RNA transcripts used for obtaining the respective plurality of reporting signals.
- the target locus, nucleic acids, mutations, and/or reference sequences include any of the embodiments described herein.
- Other suitable embodiments of target loci, nucleic acids, mutations, and/or reference sequences include those disclosed above (see, for example, the sections entitled “Target Loci,” “Target Nucleic Acids,” and “Reference Sequences,” above), as well as any substitutions, modifications, additions, deletions, and/or combinations thereof, as will be apparent to one skilled in the art.
- the target locus comprises a plurality of alleles, including a first allele and a second allele.
- the first allele is wild-type and the second allele is mutant.
- the mutation call is wild-type or mutant.
- the first allele is a first mutant and the second allele is a second mutant.
- the mutation call is for the first mutant allele or the second mutant allele.
- the first allele and the second allele are selected from a plurality of alleles for the respective target locus.
- the plurality of alleles includes at least 2, at least 3, at least 4, at least 5, or at least 8 alleles. In some embodiments, the plurality of alleles includes no more than 10, no more than 8, no more than 5, no more than 3, or no more than 2 alleles. In some embodiments, the plurality of alleles is from 2 to 5, from 3 to 8, from 3 to 10, or from 5 to 10 alleles. In some embodiments, the plurality of alleles falls within another range starting no lower than 2 alleles and ending no higher than 10 alleles.
- the target locus comprises a plurality of alleles, including at least a wild-type allele. In some embodiments, the target locus comprises a plurality of alleles, including at least a mutant allele. In some embodiments, the target locus comprises a plurality of alleles, including at least a first mutant allele and a second mutant allele. In some embodiments, the target locus comprises one or more mutant alleles. For example, in some embodiments, the target locus comprises at least 2, at least 3, at least 4, at least 5, or at least 8 mutant alleles. In some embodiments, the target locus comprises no more than 10, no more than 8, no more than 5, no more than 3, or no more than 2 mutant alleles.
- the target locus comprises from 2 to 5, from 3 to 8, from 3 to 10, or from 5 to 10 mutant alleles. In some embodiments, the target locus comprises a set of mutant alleles that falls within another range starting no lower than 2 alleles and ending no higher than 10 alleles.
- the mutation call is for a wild-type allele. In some embodiments, the mutation call is for a mutant allele. In some embodiments, the mutation call is for a respective mutant allele in a plurality of mutant alleles. In some embodiments, the mutation call is selected from the group consisting of a wild-type allele and a respective mutant allele in a plurality of mutant alleles.
- the first plurality of guide nucleic acids that have the first allele of the target locus hybridize to the first allele. In some embodiments, the first plurality of guide nucleic acids that have the first allele of the target locus have a nucleic acid sequence that is reverse complementary to the nucleic acid sequence of the first allele. In some embodiments, the second plurality of guide nucleic acids that have the second allele of the target locus hybridize to the second allele. In some embodiments, the second plurality of guide nucleic acids that have the second allele of the target locus have a nucleic acid sequence that is reverse complementary to the nucleic acid sequence of the second allele.
- a respective guide nucleic acid comprises any of the embodiments for guide nucleic acids disclosed herein, as well as any substitutions, modifications, additions, deletions, and/or combinations thereof, as will be apparent to one skilled in the art. See, for example, the section entitled “Engineered Guide Nucleic Acids,” above.
- the corresponding aliquot of nucleic acid derived from the biological sample is RNA or DNA.
- the corresponding aliquot of nucleic acid derived from the biological sample comprises nucleic acids obtained from within cells.
- the corresponding aliquot of nucleic acid derived from the biological sample comprises cell-free nucleic acid molecules.
- the one or more nucleic acid molecules comprises synthetic nucleic acid molecules.
- Example 1 describes systems and methods for identifying mutation calls using synthetic nucleic acid molecules, in accordance with an embodiment of the present disclosure.
- the method includes isolating the one or more nucleic acid molecules from the biological sample.
- isolation of nucleic acid molecules includes removing one or more cells from the biological sample, subjecting the biological sample to a lysis step, and/or treating the biological sample in order to separate cellular nucleic acid molecules from cell-free nucleic acid molecules.
- isolation of nucleic acid molecules includes isolation of cell-free nucleic acid molecules from a liquid biological sample (e.g., plasma and/or blood).
- the method includes heat-inactivating the biological sample.
- the method includes performing an amplification of one or more nucleic acid molecules derived from the biological sample. See, for instance, the section entitled, “Nucleic Acid Amplification,” below.
- Other suitable methods of obtaining nucleic acid from biological samples and/or preparing biological samples for the same are contemplated for use in the present disclosure, as will be apparent to one skilled in the art.
- the plurality of wells 122 comprises at least 3, at least 5, at least 10, at least 20, at least 40, at least 50, at least 80, at least 100, at least 300, or at least 500 wells. In some embodiments, the plurality of wells comprises no more than 1000, no more than 500, no more than 200, no more than 80, no more than 50, or no more than 30 wells. In some embodiments, the plurality of wells is from 3 to 20, from 5 to 100, from 8 to 50, or from 10 to 500 wells. In some embodiments, the plurality of wells falls within another range starting no lower than 3 wells and ending no higher than 1000 wells.
- the common plate is a multi-well plate.
- the signal dataset 120 is obtained by a procedure comprising amplifying a first plurality of nucleic acids 304 derived from the biological sample, thereby generating a plurality of amplified nucleic acids 404.
- the procedure further includes, for each respective well 122 in the first set of wells and each respective well in the second set of wells, partitioning, from the plurality of amplified nucleic acids 404, the respective corresponding aliquot of nucleic acid 410; contacting the respective corresponding aliquot of nucleic acid with at least a programmable nuclease, a guide nucleic acid, and one or more reporters; and cleaving the one or more reporters using the programmable nuclease if the first allele or the second allele, respectively, is present, thereby generating a reporting signal 124.
- a biological sample 302 is used to obtain a plurality of nucleic acids 304.
- the nucleic acids 304 are extracted from the biological sample 302.
- the signal dataset 120 is obtained by a procedure 306 such as a DETECTR reaction, which includes an optional amplification reaction 308 and a programmable nuclease-based assay 310.
- the amplifying 308 includes any suitable method for amplification of nucleic acids known in the art, such as reverse transcriptase loop-mediated isothermal amplification (RT-LAMP).
- the programmable nuclease-based assay 310 also includes any suitable detection reaction, such as a Cas nuclease-based reaction, in which recognition of a target nucleic acid in the sample nucleic acids 304 by a guide nucleic acid (e.g., gRNA) induces cleavage of a reporter (e.g., a nucleic acid probe) by a Cas nuclease.
- the Cas nuclease is complexed with the guide nucleic acid. Reporting signals generated from cleavage of the reporter can be detected, measured, and used for mutation calling and variant identification 312, in accordance with methods of the present disclosure.
- the procedure 306 is a DETECTR reaction. In some embodiments, the procedure 306 is SHERLOCK. In some embodiments, the signal dataset 120 is obtained by a procedure that does not comprise an amplification step.
- the amplifying is performed using isothermal amplification, loop-mediated isothermal amplification (LAMP), reverse transcriptase loop-mediated isothermal amplification (RT-LAMP), recombinase polymerase amplification (RPA), reverse transcriptase recombinase polymerase amplification (RT-RPA), polymerase chain reaction (PCR), or reverse transcriptase polymerase chain reaction (RT-PCR).
- the amplifying is performed using transcription mediated amplification (TMA), helicase dependent amplification (HDA), circular helicase dependent amplification (cHDA), strand displacement amplification (SDA), recombinase polymerase amplification (RPA), polymerase chain reaction (PCR), quantitative PCR (qPCR), nested PCR, multiplex PCR, degenerative PCR, asymmetric PCR, touchdown PCR, random primer PCR, hemi-nested PCR, polymerase cycling assembly (PCA), colony PCR, ligase chain reaction (LCR), digital PCR, methylation-specific PCR (MSP), co-amplification at lower denaturation temperature PCR (COLD-PCR), allele-specific PCR, intersequence-specific PCR (ISS-PCR), whole genome amplification (WGA), inverse PCR, thermal asymmetric interlaced PCR (TAIL-PCR), loop mediated amplification (LAMP), exponential amplification reaction (
- isothermal amplification techniques employ a polymerase and a set of specialized primers designed to recognize distinct sequences in one or more target nucleic acids.
- LAMP techniques typically perform amplification of target nucleic acid molecules at a constant temperature (e.g., 60-65 °C) using multiple inner and outer primers and a polymerase having strand displacement activity.
- LAMP is initiated by an inner primer pair containing a nucleic acid sequence complementary to a portion of the sense and antisense strands of the target nucleic acid.
- strand displacement synthesis primed by an outer primer pair can cause release of a single-stranded amplicon.
- the single-stranded amplicon can serve as a template for further synthesis primed by a second inner and second outer primer that hybridize to the other end of the target nucleic acid and produce a stem-loop nucleic acid structure.
- one inner primer hybridizes to the loop on the product and initiates displacement and target nucleic acid synthesis, yielding the original stem-loop product and a new stem-loop product with a stem twice as long.
- the 3’ terminus of an amplicon loop structure serves as initiation site for self-templating strand synthesis, yielding a hairpin-like amplicon that forms an additional loop structure to prime subsequent rounds of self-templated amplification.
- the amplification continues with accumulation of many copies of the target nucleic acid.
- the final products of the LAMP process are stem-loop nucleic acids with concatenated repeats of the target nucleic acid in cauliflower-like structures with multiple loops formed by annealing between alternately inverted repeats of a target nucleic acid sequence in the same strand.
- LAMP assays produce a detectable signal (e.g., fluorescence) during the amplification reaction.
- the method comprises detecting and/or quantifying a detectable signal (e.g., fluorescence) produced during the LAMP assay. Any suitable method for detecting and quantifying fluorescence can be used.
- the amplifying is performed using any of the embodiments described herein, or any substitutions, modifications, additions, deletions, and/or combinations thereof, as will be apparent to one skilled in the art. See, for example, the section entitled “Target Nucleic Acids: Amplification Techniques,” above.
- the method further includes, prior to the amplifying, partitioning the first plurality of nucleic acids 304 derived from the biological sample into a plurality of amplification replicates 404 (e.g., 404-1, 404-2, 404-3), where the amplifying generates a respective plurality of amplified nucleic acids for each respective amplification replicate.
- the method comprises generating aliquots 404 for amplification reactions, as illustrated in FIG. 4A.
- the first plurality of nucleic acids is not partitioned into amplification replicates prior to the amplifying.
- the plurality of amplification replicates comprises at least 3, at least 4, at least 5, at least 6, at least 8, at least 10, at least 15, or at least 20 amplification replicates. In some embodiments, the plurality of amplification replicates comprises no more than 30, no more than 20, no more than 10, or no more than 6 amplification replicates. In some embodiments, the plurality of amplification replicates is from 3 to 9, from 4 to 12, or from 10 to 25 amplification replicates. In some embodiments, the plurality of amplification replicates falls within another range starting no lower than 3 replicates and ending no higher than 30 replicates.
- the method further includes applying an amplification threshold filter to each respective amplification replicate, where, when the respective plurality of amplified nucleic acids fails to satisfy an amplification threshold, the respective amplification replicate is removed from the plurality of amplification replicates.
- the method further includes applying an amplification threshold filter to each respective biological sample in a plurality of biological samples, where, when the respective plurality of amplified nucleic acids fails to satisfy an amplification threshold, the respective sample is removed from the plurality of samples.
- the amplification threshold is established by performing a visual inspection of amplification reaction curves.
- the amplification reaction curve is a receiver operating characteristic (ROC) curve.
- the amplification threshold is determined by selecting an optimal Youden Index point such that the absolute difference between a sensitivity metric and a false positive rate calculated using reference amplification data is maximized.
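The Youden Index selection described above can be sketched as a scan over candidate thresholds. This is a minimal illustration under stated assumptions: the function name `youden_threshold`, the convention that a sample is called amplified when its score is greater than or equal to the threshold, and the reference scores and labels are all hypothetical and do not come from the disclosure.

```python
from typing import Optional, Sequence, Tuple


def youden_threshold(scores: Sequence[float],
                     labels: Sequence[bool]) -> Tuple[float, float]:
    """Return (threshold, J), where J = |sensitivity - false positive rate|
    is maximal. A sample is called positive when score >= threshold."""
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative reference")

    best_threshold: Optional[float] = None
    best_j = -1.0
    # Every observed score is a candidate operating point on the ROC curve.
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y and s >= t)
        fp = sum(1 for s, y in zip(scores, labels) if not y and s >= t)
        j = abs(tp / positives - fp / negatives)
        if j > best_j:  # ties keep the lowest threshold
            best_threshold, best_j = t, j
    assert best_threshold is not None
    return best_threshold, best_j


# Hypothetical reference amplification data: score plus ground-truth label.
ref_scores = [0.2, 0.4, 0.6, 1.1, 1.3, 1.8]
ref_labels = [False, False, False, True, True, True]
threshold, j_stat = youden_threshold(ref_scores, ref_labels)
```

With these perfectly separated reference values, the scan settles on the lowest score among the truly amplified samples.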
- the amplification threshold is derived using positive and negative (e.g., no template) amplification controls from the amplification reaction (e.g., LAMP) of a given sample.
- the amplification threshold includes a time threshold and/or a fluorescence rate threshold.
- Positive amplification controls are deemed to represent an ideal sample, where the ideal sample exhibits a classic sigmoidal rise of fluorescence over time, and negative amplification controls are deemed to represent the background fluorescence.
- an ideal sample is approximated as having fluorescence kinetics similar to that of a positive amplification control.
- the amplification threshold is determined based on a mean value between the amplification controls within a common plate. This determination can be used to reduce the impact of background that can occur in amplification replicates.
- the determination of the amplification threshold further includes collecting fluorescence values at the time threshold.
- the time threshold can be used to exclude samples that amplify close to the endpoint, where amplification intermediates (e.g., LAMP intermediates), rather than the sample itself, are the main contributors to the rise in signal.
- the determination further includes assigning a score for each sample, where the score is calculated as a ratio of the rate of fluorescence threshold to the rate fluorescence value at the time threshold for each sample.
- when the ratio is less than a threshold ratio (e.g., 1), samples are deemed to have failed to reach the minimum fluorescence required to be called out as amplified and, when the ratio is greater than or equal to the threshold ratio (e.g., 1), samples are deemed to have amplified sufficiently.
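The score-based amplification filter can be sketched as follows. The source's wording of the ratio is ambiguous, so this sketch makes an explicit assumption: the score is taken as the sample's fluorescence rate at the time threshold divided by the rate threshold, so that a score of at least 1 corresponds to sufficient amplification, matching the pass/fail rule above. The function names, replicate names, and numeric values are hypothetical.

```python
from typing import Dict, List


def amplification_scores(rates_at_time_threshold: Dict[str, float],
                         rate_threshold: float) -> Dict[str, float]:
    """Score each sample against the fluorescence rate threshold.

    ASSUMPTION: score = sample rate / rate threshold, so score >= 1 means
    the sample reached the threshold by the time threshold."""
    if rate_threshold <= 0:
        raise ValueError("rate_threshold must be positive")
    return {name: rate / rate_threshold
            for name, rate in rates_at_time_threshold.items()}


def filter_amplified(scores: Dict[str, float],
                     threshold_ratio: float = 1.0) -> List[str]:
    """Keep only replicates whose score meets the threshold ratio."""
    return [name for name, score in scores.items() if score >= threshold_ratio]


# Hypothetical fluorescence rates collected at the time threshold.
rates = {"replicate-1": 120.0, "replicate-2": 40.0, "replicate-3": 80.0}
scores = amplification_scores(rates, rate_threshold=80.0)
kept = filter_amplified(scores)
```

Replicates that do not appear in `kept` would be removed from the plurality of amplification replicates before the detection step.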
- the determination of the amplification threshold comprises performing an ROC analysis using the calculated score and/or a measured amplification metric (e.g., ground truth) of each sample in a plurality of samples, thereby identifying an exact score value for the respective sample.
- the time threshold is from 5 minutes to 40 minutes. In some embodiments, the time threshold is a time point within the total duration of the amplification reaction that is at least 10%, at least 20%, or at least 30% into the total duration and is no more than 90%, no more than 80%, or no more than 70% into the total duration. For example, in some embodiments, the amplification reaction has a duration of 40 minutes and the time threshold is selected from a range of from 10 minutes to 30 minutes. In some embodiments, the amplification reaction has a duration of 40 minutes and the time threshold is 18 minutes.
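The fractional bounds on the time threshold amount to simple arithmetic, sketched below. The helper names and the default fractions of 0.25 and 0.75 are assumptions picked so that a 40-minute reaction reproduces the 10-to-30-minute range given above; other fractions within the stated limits could be used instead.

```python
from typing import Tuple


def time_threshold_window(duration_min: float,
                          lower_fraction: float = 0.25,
                          upper_fraction: float = 0.75) -> Tuple[float, float]:
    """Return the (earliest, latest) allowed time threshold in minutes."""
    if not 0.0 <= lower_fraction < upper_fraction <= 1.0:
        raise ValueError("fractions must satisfy 0 <= lower < upper <= 1")
    return duration_min * lower_fraction, duration_min * upper_fraction


def is_valid_time_threshold(threshold_min: float, duration_min: float) -> bool:
    """Check that a candidate time threshold falls inside the default window."""
    earliest, latest = time_threshold_window(duration_min)
    return earliest <= threshold_min <= latest


window = time_threshold_window(40.0)
ok = is_valid_time_threshold(18.0, 40.0)
```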
- one or more amplification replicates in a plurality of amplification replicates are pooled and/or diluted, e.g., by any suitable method known in the art.
- the method further includes, prior to the contacting, partitioning the respective plurality of amplified nucleic acids for each respective amplification replicate 404 into a plurality of detection replicates 408, where each respective detection replicate 410 in the plurality of detection replicates represents a respective well 122 in the plurality of wells.
- the method comprises generating a respective plurality of aliquots 408 from a respective amplification reaction 404, as illustrated in FIGS. 4A-4B.
- each respective aliquot 410 corresponds to a respective well 122 in the plurality of wells.
- wells are grouped into a plurality of sets depending on the type of guide nucleic acid used to contact the corresponding aliquot of nucleic acid, as illustrated in FIGS. 4A-4B (e.g., a first set of wells comprises allele 1 (WT) guide sequences and a second set of wells comprises allele 2 (MUT) guide sequences).
- each well in the first set of wells contains amplified nucleic acids that are interrogated by the allele 1 guide sequences (WT; set “A”) and each well in the second set of wells contains amplified nucleic acids that are interrogated by the allele 2 guide sequences (MUT; set “B”).
- each respective detection replicate represents a respective well in the plurality of wells.
- each respective detection replicate represents a pair of wells in a respective comparative programmable nuclease-based reaction.
- a pair of wells includes (i) a first well in which the corresponding aliquot of nucleic acid is interrogated with a guide nucleic acid having the first allele and (ii) a second well in which the corresponding aliquot of nucleic acid is interrogated with a guide nucleic acid having the second allele.
- a detection replicate includes, e.g., well 410-1-1-1-A (e.g., well 410 for plate 1, amplification replicate 1, detection reaction 1, and wild-type nucleic acid guide “A”) and well 410-1-1-1-B (e.g., well 410 for plate 1, amplification replicate 1, detection reaction 1, and mutant nucleic acid guide “B”).
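The well-naming scheme above (plate, amplification replicate, detection reaction, guide set) lends itself to a small label generator. This is an illustrative sketch only: the function names are hypothetical, and the assumption that every amplification replicate has the same number of detection reactions is made for simplicity.

```python
from itertools import product
from typing import List, Tuple


def well_label(plate: int, amp_rep: int, det_rep: int, guide: str,
               prefix: str = "410") -> str:
    """Build a label such as '410-1-1-1-A' from its components."""
    return f"{prefix}-{plate}-{amp_rep}-{det_rep}-{guide}"


def detection_pairs(plate: int, n_amp: int, n_det: int) -> List[Tuple[str, str]]:
    """List the (guide 'A', guide 'B') well pair for every detection replicate."""
    return [
        (well_label(plate, a, d, "A"), well_label(plate, a, d, "B"))
        for a, d in product(range(1, n_amp + 1), range(1, n_det + 1))
    ]


# Three amplification replicates with three detection reactions each.
pairs = detection_pairs(plate=1, n_amp=3, n_det=3)
```

Each tuple names the two wells that are compared against each other in one comparative programmable nuclease-based reaction.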
- the plurality of partitioned detection replicates comprises at least 3, at least 4, at least 5, at least 6, at least 8, at least 10, at least 15, or at least 20 detection replicates. In some embodiments, for each respective amplification replicate, the plurality of partitioned detection replicates comprises no more than 30, no more than 20, no more than 10, or no more than 6 detection replicates. In some embodiments, for each respective amplification replicate, the plurality of partitioned detection replicates is from 3 to 9, from 4 to 12, or from 10 to 25 detection replicates. In some embodiments, for each respective amplification replicate, the plurality of partitioned detection replicates falls within another range starting no lower than 3 replicates and ending no higher than 30 replicates. In some embodiments, nucleic acids in the biological sample, and/or a respective plurality of amplified nucleic acids derived therefrom, are not partitioned into a plurality of detection replicates prior to the contacting.
- the cleaving the one or more reporters using the programmable nuclease is performed using a detection assay, such as a programmable nuclease-based step in a DETECTR assay. See, e.g., Broughton et al., “CRISPR-Cas12-based detection of SARS-CoV-2,” Nature Biotechnology 38, 870-874 (2020), which is hereby incorporated herein by reference in its entirety.
- the method further comprises, prior to the amplifying, partitioning the first plurality of nucleic acids 304 derived from the biological sample into a plurality of replicates for each locus in a plurality of loci (e.g., an amino acid position, gene, and/or target sequence of interest).
- nucleic acids in the biological sample are not partitioned into a plurality of replicates for each locus in a plurality of loci.
- the method further comprises, prior to the contacting, partitioning the respective plurality of amplified nucleic acids for a respective amplification replicate 404 into a plurality of replicates for each locus in a plurality of loci (e.g., an amino acid position, gene, and/or target sequence of interest). For instance, as illustrated in FIG. 4A, nucleic acids 304 are interrogated at each of a plurality of loci of interest 406.
- the amplification replicates 404 are partitioned across a plurality of multi-well plates, where each respective plate corresponds to a respective target locus in the plurality of loci (e.g., 406-1, 406-2, 406-3).
- the respective amplification replicate 404 is further partitioned into a plurality of detection replicates 408 (e.g., 408-1-1, 408-1-2, 408-1-3), where each respective detection replicate 410 in the plurality of detection replicates represents a respective well 122 in the plurality of wells.
- one or more detection replicates in a plurality of detection replicates are pooled and/or diluted, e.g., by any suitable method known in the art.
- the method comprises contacting the respective corresponding aliquot of nucleic acid with a plurality of programmable nucleases.
- each respective programmable nuclease in the plurality of programmable nucleases is the same or different type of nuclease.
- a respective programmable nuclease is a trans-cleaving programmable nuclease (e.g., capable of nonspecific cleavage of nucleic acids).
- the programmable nuclease targets DNA (e.g., single-stranded DNA) or RNA.
- the programmable nuclease is a Cas nuclease.
- the Cas nuclease is selected from the group consisting of a Cas12, Cas13, Cas14, CasPhi, and/or any subtypes or orthologs thereof.
- the Cas nuclease is LbCas12a, AsCas12a, or CasDx1.
- the programmable nuclease is any of the embodiments described herein, or any substitutions, modifications, additions, deletions, and/or combinations thereof, as will be apparent to one skilled in the art.
- the programmable nuclease is any of the programmable nucleases described in Broughton et al., “CRISPR-Cas12-based detection of SARS-CoV-2,” Nature Biotechnology 38, 870-874 (2020), which is hereby incorporated herein by reference in its entirety.
- the programmable nuclease is programmed for the target locus using a corresponding guide nucleic acid in the first plurality of guide nucleic acids that have the first allele of the target locus, where the cleaving the one or more reporters using the programmable nuclease is initiated upon recognition of the target locus by the corresponding guide nucleic acid.
- a guide nucleic acid directs a programmable nuclease to a respective target locus for cleavage by recognizing the nucleic acid sequence of the target locus.
- Recognition of the target locus can be engineered by designing guide nucleic acids that hybridize to (e.g., are reverse complementary to) all or a portion of the nucleic acid sequence of the target locus.
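The reverse-complement relationship between a guide and its target allele can be sketched in a few lines. This is a simplified illustration, not a guide design procedure: it covers only the target-recognizing portion of a guide and ignores any other guide regions or nuclease-specific sequence requirements, and the function name and the short wild-type and mutant sequences are made up for the example.

```python
def reverse_complement(target: str, as_rna: bool = True) -> str:
    """Return the reverse complement of a DNA or RNA target sequence.

    With as_rna=True the result uses U instead of T, as for an RNA guide."""
    pairs = {"A": "U" if as_rna else "T", "C": "G", "G": "C", "T": "A", "U": "A"}
    try:
        return "".join(pairs[base] for base in reversed(target.upper()))
    except KeyError as exc:
        raise ValueError(f"unexpected base {exc.args[0]!r}") from None


# Hypothetical target sequences differing at a single position.
wild_type_target = "ATGGCA"
mutant_target = "ATGACA"

wt_guide = reverse_complement(wild_type_target)
mut_guide = reverse_complement(mutant_target)
```

Because the two guides differ at the position that distinguishes the alleles, each one is fully complementary only to its own allele.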
- the corresponding guide nucleic acid is complexed to the programmable nuclease.
- the first plurality of guide nucleic acids is RNA.
- the first plurality of guide nucleic acids is gRNA or sgRNA.
- the first plurality of guide nucleic acids hybridizes to a wild-type sequence for the target locus. In some embodiments, the first plurality of guide nucleic acids is reverse complementary to a wild-type sequence for the target locus.
- the programmable nuclease is programmed for the target locus using a corresponding guide nucleic acid in the second plurality of guide nucleic acids that have the second allele of the target locus, where the cleaving the one or more reporters using the programmable nuclease is initiated upon recognition of the target locus by the corresponding guide nucleic acid.
- the second plurality of guide nucleic acids is RNA.
- the second plurality of guide nucleic acids hybridizes to (e.g., is reverse complementary to) a mutant sequence for the target locus.
- the guide nucleic acids comprise any of the embodiments described herein, or any substitutions, modifications, additions, deletions, and/or combinations thereof, as will be apparent to one skilled in the art. See, for example, the section entitled “Engineered Guide Nucleic Acids,” above.
- each respective reporter in the one or more reporters is single-stranded DNA. In some embodiments, each respective reporter in the one or more reporters is single-stranded RNA.
- each respective reporter in the one or more reporters comprises a fluorescent reporter linked to a quencher.
- the one or more reporters are cleaved by the programmable nuclease when the first allele is present in the corresponding aliquot of nucleic acid derived from the biological sample, thus generating a reporting signal.
- the one or more reporters are not cleaved by the programmable nuclease, and the reporting signal is zero or background.
- the one or more reporters are cleaved by the programmable nuclease when the second allele is present in the corresponding aliquot of nucleic acid derived from the biological sample, thus generating a reporting signal.
- the one or more reporters are not cleaved by the programmable nuclease, and the reporting signal is zero or background.
- each respective reporting signal 124 is obtained from a detection moiety.
- each respective reporting signal is a fluorescence emission by a fluorophore
- the corresponding discrete attribute value is a fluorescence intensity.
- each respective reporting signal is an intensity of light, and the corresponding discrete attribute value is measured in relative units (e.g., relative fluorescent units).
- the corresponding discrete attribute value for each respective reporting signal is detected using a plate reader.
- the respective reporting signal 124 is a lateral flow readout.
- the lateral flow readout is detected manually by visual inspection.
- the lateral flow readout is detected using an image analysis algorithm (e.g., ImageJ, FIJI, etc.). Detection of reporting signals by lateral flow readout is further described in Broughton et al., “CRISPR-Cas12-based detection of SARS-CoV-2,” Nature Biotechnology 38, 870-874 (2020), which is hereby incorporated herein by reference in its entirety.
- the one or more reporters and/or the plurality of reporting signals, and methods of detection thereof comprise any of the embodiments described herein, as well as any substitutions, modifications, additions, deletions, and/or combinations thereof, as will be apparent to one skilled in the art. See, for example, the section entitled “Reporters,” above.
- the plurality of time points represented by the signal dataset comprises between 20 and 2000 time points. In some embodiments, the plurality of time points represented by the signal dataset is at least 10, at least 20, at least 50, at least 80, at least 100, at least 200, at least 500, at least 1000, or at least 2000 time points. In some embodiments, the plurality of time points is no more than 5000, no more than 2000, no more than 1000, no more than 500, no more than 200, no more than 100, or no more than 50 time points. In some embodiments, the plurality of time points represented by the signal dataset is from 10 to 500, from 50 to 200, from 40 to 100, or from 200 to 1000. In some embodiments, the plurality of time points represented by the signal dataset falls within another range starting no lower than 10 time points and ending no higher than 5000 time points.
- the plurality of time points is obtained over a duration of between 30 seconds and 1 hour. In some embodiments, the plurality of time points is obtained over a duration of at least 30 seconds, at least 1 minute, at least 5 minutes, at least 10 minutes, at least 30 minutes, or at least 1 hour. In some embodiments, the plurality of time points is obtained over a duration of no more than 2 hours, no more than 1 hour, no more than 30 minutes, or no more than 5 minutes. In some embodiments, the plurality of time points is obtained over a duration of from 30 seconds to 10 minutes, from 5 minutes to 1 hour, or from 20 minutes to 50 minutes. In some embodiments, the plurality of time points is obtained over a duration that falls within another range starting no lower than 30 seconds and ending no higher than 2 hours.
- each respective time point corresponds to a respective measurement obtained during a detection assay, where the detection assay is performed over a duration of time.
- each respective time point corresponds to a respective readout obtained from a plate reader, where each respective readout in a plurality of readouts is taken at intervals over the duration of the detection assay (e.g., every 30 seconds, every 1 second, every 0.5 seconds, etc.).
- the plurality of reporting signals 124 for each respective well 122 in the plurality of wells comprises at least 10, at least 20, at least 50, at least 80, at least 100, at least 200, at least 500, at least 1000, or at least 2000 reporting signals.
- the plurality of reporting signals for each respective well is no more than 5000, no more than 2000, no more than 1000, no more than 500, no more than 200, no more than 100, or no more than 50 reporting signals.
- the plurality of reporting signals for each respective well is from 10 to 500, from 50 to 200, from 40 to 100, or from 200 to 1000 reporting signals.
- the plurality of reporting signals for each respective well falls within another range starting no lower than 10 reporting signals and ending no higher than 5000 reporting signals.
- the signal dataset 120 further comprises, for each respective control well in a plurality of control wells in the common plate, a corresponding plurality of control reporting signals, where the plurality of control wells are free of nucleic acid derived from the biological sample.
- the plurality of control wells comprises a first set of control wells 126-1 representing the first allele for the target locus, where each control well in the first set of control wells includes a first plurality of guide nucleic acids that have the first allele of the target locus.
- the plurality of control wells comprises a second set of control wells 126-2 representing the second allele for the target locus, where each control well in the second set of control wells includes a second plurality of guide nucleic acids that have the second allele of the target locus.
- Each corresponding plurality of control reporting signals comprises, for each respective time point in the plurality of time points, a respective control reporting signal in the form of a corresponding discrete attribute value.
- each respective control well is a negative control well and/or a no-template control well.
- the signal dataset 120 further comprises, for each respective positive control well in a plurality of positive control wells in the common plate, a corresponding plurality of positive control reporting signals.
- the plurality of positive control wells comprises a first set of positive control wells representing the first allele for the target locus, where each respective positive control well in the first set of positive control wells includes a first plurality of guide nucleic acids that have the first allele of the target locus.
- the signal dataset further includes a second set of positive control wells representing the second allele for the target locus, where each respective positive control well in the second set of positive control wells comprises a second plurality of guide nucleic acids that have the second allele of the target locus.
- each respective well in the first set of wells comprises one or more nucleic acid molecules corresponding to the first allele of the target locus
- each respective well in the second set of wells comprises one or more nucleic acid molecules corresponding to the second allele of the target locus.
- a respective guide nucleic acid in a respective plurality of guide nucleic acids in a well hybridizes to all or a portion of the nucleic acid sequence of the respective allele.
- any of the embodiments disclosed herein for guide nucleic acids, reporting signals and methods of obtaining the same, alleles, target loci, and wells are contemplated for use with control wells and/or positive control wells, as will be apparent to one skilled in the art. See, e.g., the sections entitled, “Samples,” “Nucleic Acid Amplification,” “Programmable Nuclease-Based Assays,” and “Signal Dataset,” above.
- the signal dataset is preprocessed for analysis.
- the preprocessing comprises data cleaning, normalization, filtering, and/or any substitutions, modifications, additions, deletions, and/or combinations thereof, as will be apparent to one skilled in the art.
- the method further includes determining, for each respective well 122 in the plurality of wells, a corresponding signal yield 132 for the respective well using the corresponding plurality of reporting signals 124 for the respective well across the plurality of time points.
- the determining signal yield further comprises, for each respective well 122 in the plurality of wells, normalizing the respective plurality of reporting signals 124 for the respective well by scaling a maximum discrete attribute value by a minimum discrete attribute value in the corresponding plurality of reporting signals 124 for the respective well, thereby obtaining the corresponding signal yield 132 for the respective well.
- the maximum discrete attribute value is an endpoint fluorescence intensity
- the minimum discrete attribute value is a minimum fluorescence intensity
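The signal yield normalization described above can be sketched as follows. This is a minimal sketch that assumes "scaling a maximum discrete attribute value by a minimum discrete attribute value" means dividing the maximum reading by the minimum reading; the function name `signal_yield` and the example traces are hypothetical.

```python
from typing import Sequence


def signal_yield(signals: Sequence[float]) -> float:
    """Divide the maximum reading by the minimum reading for one well.

    Assumption: "scaling" is interpreted as a simple ratio.
    """
    if not signals:
        raise ValueError("no reporting signals for this well")
    lowest = min(signals)
    if lowest <= 0:
        raise ValueError("minimum reading must be positive to form a ratio")
    return max(signals) / lowest


# Made-up relative fluorescence units over five time points.
well_trace = [100.0, 150.0, 400.0, 900.0, 1200.0]
# A trace that stays at background gives a yield close to 1.
control_trace = [100.0, 101.0, 99.0, 100.0, 102.0]
```

In embodiments where the maximum is defined as the endpoint fluorescence intensity, `max(signals)` would be replaced by the last reading in the trace.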
- the signal yield threshold is obtained by determining, for each respective control well in a plurality of control wells in the common plate, a corresponding control signal yield for the respective well using the corresponding plurality of control reporting signals for the respective well across the plurality of time points, thereby obtaining a plurality of control signal yields including a maximum control signal yield and a minimum control signal yield.
- the signal yield threshold is a respective control signal yield (e.g., the maximum and/or the minimum control signal yield), and the respective well is deemed a no-call when the signal yield for the respective well is equal to or less than the respective control signal yield.
- the signal yield threshold is a central tendency metric for the plurality of control signal yields (e.g., an average control signal yield), and the respective well is deemed a no-call when the signal yield for the respective well is equal to or less than the central tendency metric.
- each respective control signal yield is determined by normalizing a maximum discrete attribute value in the corresponding plurality of control reporting signals for the corresponding control well by a minimum discrete attribute value in the corresponding plurality of control reporting signals for the corresponding control well.
- the maximum discrete attribute value and the minimum discrete attribute value will generally be similar, due to the lack of target nucleic acid derived from the biological sample. For instance, in the absence of target nucleic acids to which a corresponding guide nucleic acid can hybridize, a complexed programmable nuclease is unlikely to initiate reporter cleavage. As a result, reporting signals are expected to maintain background levels throughout the course of the detection assay.
- each respective control signal yield in the plurality of control signal yields is 1 or about 1, and the respective well is deemed a no-call when the signal yield for the respective well is equal to 1 or about 1.
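One way to picture the signal yield threshold is the sketch below, which derives a threshold from the control signal yields and flags a well as a no-call when its yield is equal to or less than that threshold. The function names, the `mode` parameter, and the example values are hypothetical; the arithmetic mean stands in for "a central tendency metric" as one possible choice.

```python
from statistics import mean
from typing import Sequence


def yield_threshold(control_yields: Sequence[float], mode: str = "max") -> float:
    """Pick a threshold from control signal yields.

    mode is a hypothetical switch: "max", "min", or "mean".
    """
    if not control_yields:
        raise ValueError("at least one control signal yield is required")
    if mode == "max":
        return max(control_yields)
    if mode == "min":
        return min(control_yields)
    if mode == "mean":
        return mean(control_yields)
    raise ValueError(f"unknown mode: {mode}")


def is_no_call(well_yield: float, threshold: float) -> bool:
    """A well is a no-call when its yield is equal to or less than the threshold."""
    return well_yield <= threshold


# Made-up control signal yields, all close to 1 as expected for template-free wells.
controls = [1.0, 1.02, 1.05]
```

For example, `is_no_call(1.03, yield_threshold(controls))` flags a well whose yield does not rise above the highest control yield.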
- the method further includes determining, for each respective well 122 in the first set of wells, a respective candidate call identity 134 based on a comparison between (i) a corresponding first signal yield 132 for the respective well in the first set of wells and (ii) a corresponding second signal yield 132 for a corresponding well in the second set of wells, thereby obtaining a plurality of candidate call identities 134.
- the respective candidate call identity is selected from the group consisting of first allele, second allele, and no-call (e.g., wild-type, mutant, and/or no-call).
- the corresponding signal yield from the corresponding well in the second set of wells originates from a common biological replicate of the biological sample.
- the first well and the corresponding well are matched for comparison across detection replicates in order to determine candidate call identities.
- the first well and the corresponding well are matched for comparison within a comparative programmable nuclease-based reaction.
- a respective portion of nucleic acid is divided into two wells, where a first well is interrogated by a guide nucleic acid having the first allele sequence and a second well is interrogated by a guide nucleic acid having the second allele sequence. Relative reporting signals obtained from the first well and the second well are compared.
- Comparative programmable nuclease-based reactions are further described in detail herein (see, e.g., the section entitled, “Programmable Nuclease-Based Assays,” above).
- the comparative programmable nuclease-based reaction is a DETECTR reaction.
- the respective portion of nucleic acid is an aliquot of nucleic acid.
- the respective portion of nucleic acid is a plurality of nucleic acids derived from a biological sample.
- FIG. 4B illustrates a comparative detection reaction including, for each respective pair of wells, a first well 410 interrogated by the wild-type guide nucleic acid “A” and a second well 410 interrogated by the mutant guide nucleic acid “B.”
- the first well and the corresponding well are matched for comparison across amplification replicates in order to determine candidate call identities.
- the respective candidate call identity is obtained using a relative intensity metric between the corresponding first signal yield and the corresponding second signal yield.
- the relative intensity metric is a ratio, a log ratio, a ratio of binary logarithm, a difference, and/or a percent change.
- the relative intensity metric has the form $\log_2(F_y(\mathrm{FAl})) / \log_2(F_y(\mathrm{SAl}))$,
- $F_y(\mathrm{FAl})$ is the first allele corresponding signal yield
- $F_y(\mathrm{SAl})$ is the second allele corresponding signal yield
- when the relative intensity metric satisfies a first threshold criterion, the candidate call identity is first allele, and when the relative intensity metric satisfies a second threshold criterion, the candidate call identity is second allele.
- the first threshold criterion and/or the second threshold criterion is based on a control relative intensity metric for a first control signal yield and a second control signal yield.
- the control relative intensity metric is a ratio, a log ratio, a ratio of binary logarithm, a difference, and/or a fold change.
- the first threshold criterion is satisfied when the relative intensity metric is greater than the ratio of (i) the binary logarithm of a first control signal yield and (ii) the binary logarithm of a second control signal yield, where the first control signal yield is obtained for a first control well in the first set of control wells including a first plurality of guide nucleic acids that have the first allele of the target locus, and the second control signal yield is obtained for a second control well in the second set of control wells including a second plurality of guide nucleic acids that have the second allele of the target locus. Accordingly, in some embodiments, the first threshold criterion is satisfied when the relative intensity metric is greater than $\log_2(F_c(\mathrm{FAl})) / \log_2(F_c(\mathrm{SAl}))$
- $F_c(\mathrm{FAl})$ is the first control signal yield
- $F_c(\mathrm{SAl})$ is the second control signal yield
- the ratio of (i) the binary logarithm of a first control signal yield and (ii) the binary logarithm of a second control signal yield is 1 or about 1.
- the candidate call identity is first allele.
- the second threshold criterion is satisfied when the relative intensity metric is less than the ratio of (i) the binary logarithm of a first control signal yield and (ii) the binary logarithm of a second control signal yield, where the first control signal yield is obtained for a first control well in the first set of control wells including a first plurality of guide nucleic acids that have the first allele of the target locus, and the second control signal yield is obtained for a second control well in the second set of control wells including a second plurality of guide nucleic acids that have the second allele of the target locus. Accordingly, in some embodiments, the second threshold criterion is satisfied when the relative intensity metric is less than $\log_2(F_c(\mathrm{FAl})) / \log_2(F_c(\mathrm{SAl}))$
- $F_c(\mathrm{FAl})$ is the first control signal yield
- $F_c(\mathrm{SAl})$ is the second control signal yield
- the ratio of (i) the binary logarithm of a first control signal yield and (ii) the binary logarithm of a second control signal yield is 1 or about 1.
- the candidate call identity is second allele.
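The threshold logic above can be sketched in a few lines, assuming the relative intensity metric is the ratio of binary logarithms and the control ratio is 1 (the "1 or about 1" case). The function names are hypothetical. The text above does not say what happens when the metric exactly equals the control ratio; returning "no-call" in that case is an assumption of this sketch. The sketch also assumes signal yields greater than 1, so that both logarithms are positive.

```python
import math


def relative_intensity(first_yield: float, second_yield: float) -> float:
    """Ratio of binary logarithms: log2(first yield) / log2(second yield)."""
    denominator = math.log2(second_yield)
    if denominator == 0:
        raise ValueError("second yield of exactly 1 makes the ratio undefined")
    return math.log2(first_yield) / denominator


def candidate_call(first_yield: float, second_yield: float,
                   control_ratio: float = 1.0) -> str:
    """Compare the metric against the control ratio to pick a candidate call."""
    metric = relative_intensity(first_yield, second_yield)
    if metric > control_ratio:
        return "first allele"
    if metric < control_ratio:
        return "second allele"
    # Assumption: an exact tie is treated as a no-call.
    return "no-call"
```

With made-up yields of 8 for the first-allele well and 2 for the second-allele well, the metric is 3 and the candidate call is first allele; swapping the yields gives a metric of 1/3 and a second-allele call.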
- the method further comprises obtaining a central tendency metric for the corresponding first signal yield and the corresponding second signal yield.
- the central tendency metric is an arithmetic mean, weighted mean, log mean, midrange, midhinge, trimean, geometric mean, geometric median, Winsorized mean, median, and/or mode of the distribution of values.
- the central tendency metric is calculated as
- the signal dataset 120 further comprises, for each respective control well in a plurality of control wells in the common plate, a corresponding plurality of control reporting signals, where the plurality of control wells are free of nucleic acid derived from the biological sample.
- the plurality of control wells comprises a first set of control wells 126-1 representing the first allele for the target locus, where each well in the first set of control wells includes a first plurality of guide nucleic acids that have the first allele of the target locus.
- the plurality of control wells comprises a second set of control wells 126-2 representing the second allele for the target locus, where each well in the second set of control wells includes a second plurality of guide nucleic acids that have the second allele of the target locus.
- Each corresponding plurality of control reporting signals comprises, for each respective time point in the plurality of time points, a respective control reporting signal in the form of a corresponding discrete attribute value.
- the method further comprises determining, for each respective control well in the plurality of control wells, a corresponding control signal yield for the respective well using the corresponding plurality of control reporting signals for the respective well across the plurality of time points, thereby obtaining a plurality of control signal yields including a maximum control signal yield and a minimum control signal yield for the common plate.
- the method further includes evaluating whether a respective candidate call identity 134 in the plurality of candidate call identities is a no-call using the maximum control signal yield and the minimum control signal yield.
- the evaluating comprises performing a comparison of (i) the relative intensity metric between the corresponding first and second signal yields and (ii) the plurality of control signal yields, where, when the relative intensity metric between the corresponding first and second signal yields falls within a range bounded by the maximum control signal yield and the minimum control signal yield, the respective candidate call identity 134 is deemed a no-call.
- each respective control signal yield is determined by normalizing a maximum discrete attribute value in the corresponding plurality of control reporting signals for the corresponding control well by a minimum discrete attribute value in the corresponding plurality of control reporting signals for the corresponding control well. As described above, in some embodiments, each respective control signal yield is 1 or about 1.
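The range check used for this no-call evaluation can be illustrated as below. This is a sketch only: the function name is hypothetical, the control values are made up, and treating the bounds as inclusive is an assumption, since the text above does not specify how values exactly on a bound are handled.

```python
from typing import Sequence


def within_control_range(metric: float, control_yields: Sequence[float]) -> bool:
    """True when the metric lies between the minimum and maximum control yields.

    Assumption: both bounds are inclusive.
    """
    if not control_yields:
        raise ValueError("at least one control signal yield is required")
    return min(control_yields) <= metric <= max(control_yields)


# Made-up control signal yields clustered around 1.
control_yields = [0.98, 1.0, 1.04]

# A candidate call whose relative intensity metric sits inside the control
# range would be deemed a no-call under this embodiment.
is_no_call = within_control_range(1.01, control_yields)
```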
- the method further includes obtaining a central tendency metric for the corresponding first signal yield and the corresponding second signal yield.
- the method further comprises determining, for each respective control well in the first set of control wells, a respective control central tendency metric, thereby obtaining a plurality of control central tendency metrics including a maximum control central tendency metric and a minimum control central tendency metric.
- the evaluating comprises performing a comparison between (i) the central tendency metric of the corresponding first and second signal yields and (ii) the plurality of control central tendency metrics for the first set of control wells, where, when the central tendency metric of the corresponding first and second signal yields falls within a range bounded by the maximum control central tendency metric and the minimum control central tendency metric, the respective candidate call identity 134 is deemed a no-call.
- the evaluating comprises performing a comparison between (i) the central tendency metric of the corresponding first and second signal yields and (ii) the plurality of control central tendency metrics for the first set of control wells, where, when the central tendency metric of the corresponding first and second signal yields is less than a respective control central tendency threshold, the respective candidate call identity 134 is deemed a no-call.
- the control central tendency threshold is a respective control central tendency metric in the plurality of control central tendency metrics (e.g., the maximum control central tendency metric).
- the control central tendency metric is an arithmetic mean, weighted mean, log mean, midrange, midhinge, trimean, geometric mean, geometric median, Winsorized mean, median, and/or mode of the distribution of values.
- the control central tendency metric is a log average calculated as
- Fc(FAl) is a first control signal yield for the respective control well in the first set of control wells
- Fc(SAl) is a second control signal yield for a corresponding control well in the second set of control wells.
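The central-tendency-based no-call check can be sketched as follows. The log-average formula referred to above is not reproduced in this text, so this sketch does not attempt to reconstruct it; instead it uses the arithmetic mean, which is one of the listed options, and lets the caller substitute another function (e.g., `statistics.median` or `statistics.geometric_mean`). All names and values are hypothetical.

```python
from statistics import mean
from typing import Callable, Sequence, Tuple

Pair = Tuple[float, float]


def no_call_by_central_tendency(
    sample_pair: Pair,
    control_pairs: Sequence[Pair],
    metric: Callable[[Sequence[float]], float] = mean,
) -> bool:
    """Flag a no-call when the sample pair's central tendency is below the
    maximum control central tendency (used here as the threshold)."""
    if not control_pairs:
        raise ValueError("at least one control pair is required")
    control_metrics = [metric(list(pair)) for pair in control_pairs]
    threshold = max(control_metrics)
    return metric(list(sample_pair)) < threshold


# Made-up (first allele, second allele) control signal yields.
control_pairs = [(1.0, 1.0), (1.0, 1.1)]
```

A sample pair whose yields both stay near background, such as `(1.0, 1.05)`, falls below the control threshold here, while a pair with one strongly responding well does not.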
- the method further includes performing a voting procedure across the plurality of candidate call identities for the first set of wells, thereby obtaining a mutation call for the target locus.
- each respective candidate call identity for each respective well in the first set of wells corresponds to a respective biological sample in one or more biological samples
- the voting procedure comprises assigning the respective candidate call identity as the mutation call for the target locus
- the voting procedure comprises applying a concordance vote across the plurality of candidate call identities for the first set of wells, where the mutation call for the target locus is assigned based on a common candidate call identity that is shared by at least a majority of the first set of wells. In some embodiments, the voting procedure comprises applying a concordance vote across the plurality of candidate call identities for the first set of wells, where the mutation call for the target locus is assigned based on a common candidate call identity that is shared by all of the wells in the first set of wells. In some embodiments, a respective mutation call is no-call when no candidate call identity is shared by at least a majority of the first set of wells.
- each well in the first set of wells corresponds to a different respective replicate of an amplification reaction.
- each well in the first set of wells corresponds to a different respective replicate of a detection assay (e.g., a programmable nuclease-based assay). Partitioning biological samples into replicates is illustrated in FIGS.
- each well in the first set of wells corresponds to a different respective replicate of an amplification reaction, and a respective mutation call is no-call when one or more wells in the amplification reaction fails to satisfy an amplification threshold. See, e.g., the section entitled “Nucleic Acid Amplification,” above.
- Other embodiments for voting procedures and/or concordance votes are contemplated, as will be apparent to one skilled in the art and as disclosed further herein.
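A concordance vote of the kind described above might look like the following sketch. The function name and the `unanimous` flag are hypothetical; "at least a majority" is interpreted here as strictly more than half of the wells, which is an assumption.

```python
from collections import Counter
from typing import Sequence


def concordance_vote(calls: Sequence[str], unanimous: bool = False) -> str:
    """Return the shared candidate call identity, or "no-call" if none qualifies.

    unanimous=False: the call must be shared by more than half of the wells.
    unanimous=True:  the call must be shared by all of the wells.
    """
    if not calls:
        return "no-call"
    label, count = Counter(calls).most_common(1)[0]
    needed = len(calls) if unanimous else len(calls) // 2 + 1
    return label if count >= needed else "no-call"
```

With three wells called wild-type, wild-type, and mutant, the majority vote returns wild-type, whereas the unanimous variant returns no-call. An even split between two identities also returns no-call.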
- the method further includes, prior to the performing the voting procedure, binning the first set of wells into a plurality of bins, where each respective bin comprises a respective subset of wells originating from a common respective biological replicate of the biological sample.
- the voting procedure further comprises (i) performing, for each respective bin in the plurality of bins, a respective first concordance vote across the candidate call identities 134 for the subset of wells in the respective bin, thereby generating a plurality of bin votes 414, and (ii) applying a second concordance vote across the plurality of bin votes 414, thereby obtaining the mutation call 416 for the target locus.
- each respective bin vote in the plurality of bin votes is selected from the group consisting of first allele, second allele, and no-call.
- the first allele is wild-type and the second allele is mutant.
- the first allele is a first mutant, and the second allele is a second mutant.
- a respective first concordance vote generates a respective bin vote based on a common candidate call identity that is shared by at least a majority of the subset of wells for the respective bin. For example, as illustrated in FIG. 4B, consider the case of three wells, in which at least two wells have a candidate call identity of wild-type. In some such embodiments, the first concordance vote generates a bin vote 414 that is wild-type. Similarly, as illustrated in FIG. 4C, where the subset of wells consists of three wells and at least two of the wells have a candidate call identity of mutant, then the first concordance vote generates a bin vote 414 that is mutant.
- the respective first concordance vote generates a respective bin vote based on a common candidate call identity that is shared by at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or at least 95% of the wells in the subset of wells for the respective bin.
- a respective first concordance vote generates a respective bin vote based on a common candidate call identity that is shared by all of the wells in the subset of wells for the respective bin.
- a respective bin vote is no-call when no candidate call identity is shared by at least a majority of the subset of wells for the respective bin. For instance, in some embodiments, when the candidate call identities for the subset of wells are equally distributed between first allele and second allele, then the respective bin vote is no-call.
- a respective bin vote is no-call when at least one well in the subset of wells for the respective bin has a candidate call identity of no-call. For instance, as illustrated in FIG. 4C, where the subset of wells consists of three wells and one of the wells has a candidate call identity of no-call, then the first concordance vote generates a bin vote 414 for the respective bin that is no-call.
- the no-call well is removed from the plurality of wells and the bin vote is performed using the candidate call identities of the remaining wells.
- the second concordance vote generates a mutation call 416 based on a common bin vote that is shared by at least a majority of the bins in the plurality of bins.
- FIG. 4B illustrates a plurality of three bins, where at least two of the bins have a bin vote of wild-type, and where the second concordance vote generates a mutation call 416 for the target locus that is wild-type.
- FIG. 4C illustrates a plurality of three bins, where at least two of the bins have a bin vote of mutant, and where the second concordance vote generates a mutation call 416 for the target locus that is mutant.
- the second concordance vote generates a mutation call based on a common bin vote that is shared by at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or at least 95% of the bins in the plurality of bins. In some embodiments, the second concordance vote generates a mutation call based on a common bin vote that is shared by all of the bins in the plurality of bins.
- the mutation call is no-call when no bin vote is shared by at least a majority of the plurality of bins. For instance, in some embodiments, when the bin votes for the plurality of bins are equally distributed between first allele and second allele, then the mutation call is no-call. In some embodiments, as illustrated in FIG. 4C, when a respective bin vote is no-call, the no-call bin is removed from the plurality of bin votes and the second concordance vote is performed with the remaining bins.
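The two-level voting procedure (a first concordance vote within each bin, then a second concordance vote across bin votes) can be sketched as below. This is an illustrative sketch rather than a reproduction of the figures: the function names and the `drop_no_calls` flag are hypothetical, majority is taken as strictly more than half, and no-call bins are removed before the second vote as in the embodiment described above.

```python
from collections import Counter
from typing import List, Sequence

NO_CALL = "no-call"


def majority(labels: Sequence[str]) -> str:
    """Return the label shared by more than half of the entries, else no-call."""
    if not labels:
        return NO_CALL
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else NO_CALL


def bin_vote(well_calls: Sequence[str], drop_no_calls: bool = False) -> str:
    """First concordance vote within one bin (one biological replicate)."""
    if drop_no_calls:
        # Embodiment in which no-call wells are removed before voting.
        well_calls = [call for call in well_calls if call != NO_CALL]
    elif NO_CALL in well_calls:
        # Embodiment in which any no-call well makes the whole bin a no-call.
        return NO_CALL
    return majority(well_calls)


def mutation_call(bins: Sequence[Sequence[str]], drop_no_calls: bool = False) -> str:
    """Second concordance vote across bin votes, ignoring no-call bins."""
    bin_votes: List[str] = [bin_vote(wells, drop_no_calls) for wells in bins]
    remaining = [vote for vote in bin_votes if vote != NO_CALL]
    return majority(remaining)


# Made-up example: three bins of three wells each.
example_bins = [
    ["mutant", "mutant", "wild-type"],
    ["mutant", "mutant", "mutant"],
    ["mutant", "no-call", "mutant"],
]
```

In this example the third bin votes no-call under the stricter embodiment and is dropped, and the two remaining bins agree on mutant.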
- each respective bin in the plurality of bins corresponds to a different respective biological replicate of an amplification reaction, and the respective subset of wells for the respective bin is partitioned from the respective biological replicate. Partitioning biological samples into replicates is illustrated in FIGS. 4A-4C and described in further detail above (see, e.g., the sections entitled “Nucleic Acid Amplification” and “Programmable Nuclease-Based Assays,” above).
- the plurality of bins comprises between 2 and 10 bins. In some embodiments, the plurality of bins comprises at least 3, at least 4, at least 5, at least 6, at least 8, at least 10, at least 15, or at least 20 bins. In some embodiments, the plurality of bins comprises no more than 30, no more than 20, no more than 10, or no more than 6 bins. In some embodiments, the plurality of bins is from 3 to 9, from 4 to 12, or from 10 to 25 bins. In some embodiments, the plurality of bins falls within another range starting no lower than 3 bins and ending no higher than 30 bins.
- each respective bin in the plurality of bins comprises between 2 and 10 wells in the respective subset of wells.
- each respective subset of wells comprises at least 3, at least 4, at least 5, at least 6, at least 8, at least 10, at least 15, or at least 20 wells.
- each respective subset of wells comprises no more than 30, no more than 20, no more than 10, or no more than 6 wells.
- each respective subset of wells is from 3 to 9, from 4 to 12, or from 10 to 25 wells. In some embodiments, each respective subset of wells falls within another range starting no lower than 3 wells and ending no higher than 30 wells.
- the systems and methods disclosed herein are used for comparison with one or more additional mutation calling methods (e.g., sequencing-based methods such as whole genome sequencing). In some embodiments, the systems and methods disclosed herein are performed concurrently with one or more additional mutation calling methods (e.g., sequencing-based methods such as whole genome sequencing).
- the systems and methods for determining a mutation call disclosed herein are performed for each respective mutant allele in a plurality of mutant alleles (see, e.g., the section entitled “Samples,” above).
- the method further includes performing an instance of the obtaining a mutation call for the target locus, for each respective mutant allele in the plurality of mutant alleles for the target locus.
- the method further comprises, for each candidate second allele in a plurality of candidate second alleles, repeating the obtaining a signal dataset, determining corresponding signal yields, determining respective candidate call identities, and performing a voting procedure, thereby obtaining a plurality of candidate mutation calls for the target locus.
- the method further includes performing a mutation call voting procedure across the plurality of candidate mutation calls for the target locus, thereby obtaining a final mutation call for the target locus.
- the first allele is a wild-type allele and each respective candidate second allele in the plurality of candidate second alleles is a different respective mutant allele for the target locus.
- the target locus comprises at least a first allele that is a wild-type allele and a plurality of candidate second alleles, where each respective candidate second allele is a different mutant allele for the target locus.
- the respective candidate mutation call is determined as the final mutation call for the target locus.
- where the first allele is a wild-type allele and each respective candidate second allele in the plurality of candidate second alleles is a different respective mutant allele for the target locus, the final mutation call for the target locus is a candidate mutation call that corresponds to a respective mutant allele.
- the final mutation call for the target locus is the only candidate mutation call that corresponds to a respective mutant allele, where every other candidate mutation call in the plurality of candidate mutation calls corresponds to the wild-type allele.
- a target allele comprises a wild-type allele and n mutant alleles, where n is a positive integer of 1 or greater (e.g., n is between 1 and 100).
- n comparisons are performed between the wild-type allele and each of the n mutant alleles, in accordance with some embodiments of the present disclosure, thus obtaining a corresponding n candidate mutation calls.
- one of the n comparisons yields a candidate mutation call that identifies the corresponding mutant allele, while every other comparison yields a candidate mutation call that identifies the wild-type allele.
- the candidate mutation call that identifies the corresponding mutant allele is selected as the final mutation call, and the identity of the target locus is determined to be the corresponding mutant allele (e.g., an SNP).
- each instance of the obtaining a mutation call for each of the four mutant alleles compared with the wild-type allele would generate the following candidate mutation calls:
  - log2(Fy(WT)) > log2(Fy(M1)): Wild Type
  - log2(Fy(WT)) > log2(Fy(M2)): Wild Type
  - log2(Fy(WT)) > log2(Fy(M3)): Wild Type
  - log2(Fy(WT)) < log2(Fy(M4)): Mutant (4)
- mutant 4 is selected as the final mutation call.
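- The selection described in this example can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the function names, the numeric signal yields, and the treatment of a tie as wild type are hypothetical; only the log2(Fy) comparison between the wild-type allele and each mutant allele comes from the text.

```python
import math

def candidate_calls(fy_wt, fy_mutants):
    """Run one wild-type-vs-mutant comparison per mutant allele.

    fy_wt: signal yield for the wild-type allele (must be > 0).
    fy_mutants: dict mapping mutant name -> signal yield (each > 0).
    Returns a dict mapping mutant name -> candidate call ("WT" or the mutant name).
    """
    calls = {}
    for name, fy in fy_mutants.items():
        # Assumption: a tie is called wild type (strict inequality for mutant).
        calls[name] = name if math.log2(fy) > math.log2(fy_wt) else "WT"
    return calls

def final_call_single(calls):
    """Return the final call when at most one comparison identifies a mutant.

    Returns None when two or more mutants are identified, because that case
    needs a further comparison between the mutant alleles.
    """
    mutants = sorted({c for c in calls.values() if c != "WT"})
    if not mutants:
        return "WT"
    if len(mutants) == 1:
        return mutants[0]
    return None

# Hypothetical signal yields chosen to mirror the four-mutant example.
yields = {"M1": 120.0, "M2": 95.0, "M3": 210.0, "M4": 5200.0}
calls = candidate_calls(800.0, yields)
result = final_call_single(calls)
print(calls, result)
```

- With these hypothetical values, only the comparison against M4 identifies a mutant, so M4 is returned as the final call.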
- the obtaining the final mutation call for the target locus is based upon a comparison of signal yields for each pair of candidate second alleles in the sub-plurality of candidate second alleles.
- where the first allele is a wild-type allele and each respective candidate second allele in the plurality of candidate second alleles is a different respective mutant allele for the target locus, two or more candidate mutation calls in the plurality of candidate mutation calls identify a corresponding two or more mutant alleles.
- a comparison is performed between each respective pair of mutant alleles corresponding to candidate mutation calls to determine which mutant allele has the highest signal yield and thus is selected as the final mutation call.
- the comparison is performed by repeating the obtaining a signal dataset, determining corresponding signal yields, determining respective candidate call identities, and performing a voting procedure, in accordance with the methods disclosed above, where the first allele is a first mutant allele in a respective pair of mutant alleles and the second allele is a second mutant allele in the respective pair of mutant alleles.
- each instance of the obtaining a mutation call for each of the four mutant alleles compared with the wild-type allele would generate the following candidate mutation calls:
  - log2(Fy(WT)) > log2(Fy(M1)): Wild Type
  - log2(Fy(WT)) < log2(Fy(M2)): Mutant (2)
  - log2(Fy(WT)) > log2(Fy(M3)): Wild Type
  - log2(Fy(WT)) < log2(Fy(M4)): Mutant (4)
- Two of the mutant alleles (e.g., mutant 2 and mutant 4) are represented as a candidate mutation call. Comparison of the signal yields of the pair of mutant alleles represented as candidate mutation calls reveals that one of the mutant alleles has a higher signal yield than the other: log2(Fy(M2)) < log2(Fy(M4)): Mutant (4)
- the mutant allele (e.g., mutant 4) having the highest signal yield in the comparison is selected as the final mutation call.
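- The mutant-versus-mutant comparison can be sketched as follows. This is a simplified illustration: the text repeats the full procedure (signal dataset, signal yields, candidate call identities, voting) for each pair of mutant alleles, whereas this sketch assumes a single summary signal yield per mutant allele and counts pairwise wins. The function name and the numeric yields are hypothetical.

```python
from itertools import combinations

def resolve_mutants(fy_by_mutant):
    """Pick the mutant allele that wins the most pairwise yield comparisons.

    fy_by_mutant: dict mapping mutant name -> summary signal yield.
    Returns the winning mutant name, or None if there is no unique winner.
    """
    if not fy_by_mutant:
        return None
    wins = {name: 0 for name in fy_by_mutant}
    for a, b in combinations(fy_by_mutant, 2):
        if fy_by_mutant[a] == fy_by_mutant[b]:
            continue  # assumption: an exact tie gives neither allele a win
        winner = a if fy_by_mutant[a] > fy_by_mutant[b] else b
        wins[winner] += 1
    best = max(wins.values())
    top = [name for name, count in wins.items() if count == best]
    return top[0] if len(top) == 1 else None

# Hypothetical yields for the two mutants called in the example above.
final = resolve_mutants({"M2": 1500.0, "M4": 5200.0})
print(final)
```

- Because comparing positive yields gives the same ordering as comparing their log2 values, the sketch compares the yields directly.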
- the signal dataset comprises, for each respective well in a plurality of wells in a common plate, a corresponding plurality of reporting signals for a respective one or more nucleic acid molecules in the biological sample that map to the target locus.
- the signal dataset represents a plurality of time points.
- the plurality of wells comprises a first set of wells representing a first allele for the target locus, where each well in the first set of wells includes a first plurality of guide nucleic acids that have the first allele of the target locus.
- the plurality of wells also comprises, for each respective candidate second allele in a set of candidate second alleles, a respective additional set of wells representing the respective candidate second allele for the target locus, where each well in this respective additional set of wells includes a corresponding plurality of guide nucleic acids that have the respective candidate second allele for the target locus.
- Each corresponding plurality of reporting signals comprises, for each respective time point in the plurality of time points, a respective reporting signal in the form of a corresponding discrete attribute value.
- each respective well in the first set of wells and each respective well in each respective additional set of wells contains a corresponding aliquot of nucleic acid derived from the biological sample.
- a corresponding signal yield for the respective well is determined using the corresponding plurality of reporting signals for the respective well across the plurality of time points.
- a voting procedure is then performed across the plurality of candidate call identities, thereby obtaining a mutation call for the target locus that is one of the candidate second alleles in the set of candidate second alleles or the first allele.
- the first allele is a wild-type allele and each respective candidate second allele in the set of candidate second alleles is a different respective mutant allele for the target locus.
- the first allele is other than a wild-type allele.
- the set of candidate second alleles consists of between 2 and 10, between 2 and 20, or between 2 and 100 candidate second alleles.
- each respective target locus in the plurality of target loci comprises any of the embodiments disclosed herein for a first respective target locus, as well as any substitutions, modifications, additions, deletions, and/or combinations thereof (see, e.g., the section entitled “Samples,” above), as will be apparent to one skilled in the art.
- the plurality of target loci comprises at least 2, at least 5, at least 10, at least 20, at least 30, at least 50, at least 100, or at least 200 target loci. In some embodiments, the plurality of target loci comprises no more than 500, no more than 200, no more than 100, no more than 50, or no more than 10 target loci. In some embodiments, the plurality of target loci is from 2 to 10, from 5 to 50, or from 10 to 100 target loci. In some embodiments, the plurality of target loci falls within another range starting no lower than 2 loci and ending no higher than 500 loci.
- each respective target locus in the plurality of target loci maps to a reference sequence for a single organism.
- a respective organism (e.g., a virus)
- the systems and methods for determining a mutation call disclosed herein are performed for each sample in a plurality of samples.
- FIG. 4A illustrates a first plurality of amplification replicates partitioned from a first sample and a second plurality of amplification replicates partitioned from a second sample.
- FIG. 4B illustrates the determination of mutation calls, in accordance with an embodiment of the present disclosure, using candidate call identities obtained for the first sample, while FIG. 4C illustrates the determination of mutation calls using candidate call identities obtained for the second sample.
- the plurality of samples comprises at least 2, at least 5, at least 10, at least 20, at least 30, at least 50, at least 100, at least 200, or at least 500 samples. In some embodiments, the plurality of samples comprises no more than 1000, no more than 500, no more than 200, no more than 100, no more than 50, or no more than 10 samples. In some embodiments, the plurality of samples is from 2 to 20, from 5 to 100, or from 10 to 200 samples. In some embodiments, the plurality of samples falls within another range starting no lower than 2 samples and ending no higher than 1000 samples.
- Suitable embodiments for performing the present systems and methods for a plurality of loci and/or for a plurality of samples including samples, target loci, target nucleic acids, wells, nucleic acid amplification, programmable nuclease-based assays, reporters, guide nucleic acid sequences, candidate call identities and determining the same, voting procedures, and mutation calls, and any characteristics or elements thereof, include any of the embodiments for samples, target loci, target nucleic acids, wells, nucleic acid amplification, programmable nuclease-based assays, reporters, guide nucleic acid sequences, candidate call identities and determining the same, voting procedures, and mutation calls, disclosed herein for a single target locus and/or for a single sample, as well as any substitutions, modifications, additions, deletions, and/or combinations thereof, as will be apparent to one skilled in the art.
- one aspect of the present disclosure provides a method for determining a mutation call for a target locus in a biological sample, at a computer system having one or more processors, and memory storing one or more programs for execution by the one or more processors.
- the method includes amplifying a first plurality of nucleic acids derived from the biological sample, thereby obtaining a plurality of amplified nucleic acids.
- a procedure is performed including (i) partitioning, from the plurality of amplified nucleic acids, a respective corresponding aliquot of nucleic acid derived from the biological sample, and (ii) contacting the respective corresponding aliquot of nucleic acid with at least a programmable nuclease, a guide nucleic acid, and a plurality of reporters.
- a signal dataset is obtained including, for each respective well in the plurality of wells in the common plate, a corresponding plurality of reporting signals for a respective one or more nucleic acid molecules derived from the biological sample that map to the target locus.
- the signal dataset represents a plurality of time points.
- the plurality of wells includes a first set of wells representing a first allele for the target locus, where each well in the first set of wells includes a first plurality of guide nucleic acids that have the first allele of the target locus.
- the plurality of wells further includes a second set of wells representing a second allele for the target locus, where each well in the second set of wells includes a second plurality of guide nucleic acids that have the second allele of the target locus.
- Each corresponding plurality of reporting signals comprises, for each respective time point in the plurality of time points, a respective reporting signal in the form of a corresponding discrete attribute value obtained from a cleavage of one or more respective reporters, in the plurality of reporters, by the programmable nuclease.
- a corresponding signal yield for the respective well is determined using the corresponding plurality of reporting signals for the respective well across the plurality of time points.
- a respective candidate call identity is determined based on a comparison between a corresponding first signal yield for the respective well in the first set of wells and a corresponding second signal yield for a corresponding well in the second set of wells, thereby obtaining a plurality of candidate call identities.
- a voting procedure is performed across the plurality of candidate call identities for the first set of wells, thereby obtaining a mutation call for the target locus.
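- The computational steps of this aspect (signal yield per well, candidate call identity per pair of wells, and voting) can be sketched as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: the signal yield is assumed here to be the maximum reporting signal minus the signal at the first time point, the voting procedure is assumed to be a strict majority vote with a hypothetical "NoCall" outcome on a tie, and all names and values are illustrative.

```python
from collections import Counter

def signal_yield(signals):
    """Assumed yield: maximum reporting signal minus the first time point."""
    return max(signals) - signals[0]

def call_locus(first_wells, second_wells, first_label="WT", second_label="MUT"):
    """Compare paired wells and vote across the candidate call identities.

    first_wells / second_wells: lists of per-well time series, where well i
    in the first set is paired with well i in the second set.
    Returns (candidate call identities, mutation call).
    """
    if not first_wells or len(first_wells) != len(second_wells):
        raise ValueError("need equal, non-empty sets of paired wells")
    identities = []
    for w1, w2 in zip(first_wells, second_wells):
        y1, y2 = signal_yield(w1), signal_yield(w2)
        # Assumption: a tie between paired wells is called as the first allele.
        identities.append(second_label if y2 > y1 else first_label)
    ranked = Counter(identities).most_common()
    # Assumption: equal top vote counts give a "NoCall" rather than a call.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return identities, "NoCall"
    return identities, ranked[0][0]

# Hypothetical reporting signals: three replicate wells per allele,
# four time points per well.
wt_wells = [[100, 110, 120, 125], [98, 105, 118, 121], [101, 300, 900, 1500]]
mut_wells = [[100, 400, 1500, 3000], [99, 380, 1400, 2900], [100, 350, 700, 1200]]
identities, call = call_locus(wt_wells, mut_wells)
print(identities, call)
```

- In this example two of the three paired wells favor the second allele, so the majority vote returns the second allele as the mutation call.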
- Another aspect of the present disclosure provides a computer system having one or more processors, and memory storing one or more programs for execution by the one or more processors, the one or more programs comprising instructions for performing a method including obtaining a signal dataset comprising, for each respective well in a plurality of wells in a common plate, a corresponding plurality of reporting signals for a respective one or more nucleic acid molecules in the biological sample that map to the target locus.
- the signal dataset represents a plurality of time points.
- the plurality of wells comprises a first set of wells representing a first allele for the target locus, where each well in the first set of wells includes a first plurality of guide nucleic acids that have the first allele of the target locus.
- the plurality of wells further comprises a second set of wells representing a second allele for the target locus, where each well in the second set of wells includes a second plurality of guide nucleic acids that have the second allele of the target locus.
- Each corresponding plurality of reporting signals comprises, for each respective time point in the plurality of time points, a respective reporting signal in the form of a corresponding discrete attribute value.
- Each respective well in the first set of wells and each respective well in the second set of wells contains a corresponding aliquot of nucleic acid derived from the biological sample.
- a corresponding signal yield for the respective well is determined using the corresponding plurality of reporting signals for the respective well across the plurality of time points.
- a respective candidate call identity is determined based on a comparison between a corresponding first signal yield for the respective well in the first set of wells and a corresponding second signal yield for a corresponding well in the second set of wells, thereby obtaining a plurality of candidate call identities.
- a voting procedure is performed across the plurality of candidate call identities for the first set of wells, thereby obtaining a mutation call for the target locus.
- Another aspect of the present disclosure provides a non-transitory computer readable storage medium storing one or more programs configured for execution by a computer, the one or more programs comprising instructions for carrying out a method including obtaining a signal dataset comprising, for each respective well in a plurality of wells in a common plate, a corresponding plurality of reporting signals for a respective one or more nucleic acid molecules in the biological sample that map to the target locus.
- the signal dataset represents a plurality of time points.
- the plurality of wells comprises a first set of wells representing a first allele for the target locus, where each well in the first set of wells includes a first plurality of guide nucleic acids that have the first allele of the target locus.
- the plurality of wells further comprises a second set of wells representing a second allele for the target locus, where each well in the second set of wells includes a second plurality of guide nucleic acids that have the second allele of the target locus.
- Each corresponding plurality of reporting signals comprises, for each respective time point in the plurality of time points, a respective reporting signal in the form of a corresponding discrete attribute value.
- Each respective well in the first set of wells and each respective well in the second set of wells contains a corresponding aliquot of nucleic acid derived from the biological sample.
- a corresponding signal yield for the respective well is determined using the corresponding plurality of reporting signals for the respective well across the plurality of time points.
- a respective candidate call identity is determined based on a comparison between a corresponding first signal yield for the respective well in the first set of wells and a corresponding second signal yield for a corresponding well in the second set of wells, thereby obtaining a plurality of candidate call identities.
- a voting procedure is performed across the plurality of candidate call identities for the first set of wells, thereby obtaining a mutation call for the target locus.
- compositions, systems, and methods for assaying for a SARS-CoV-2 variant are consistent with the methods and reagents disclosed herein.
- the sample comprises a target nucleic acid which is a gene fragment of a SARS-CoV-2 variant.
- the SARS-CoV-2 variant that is specifically targeted is Alpha strain/B.1.1.7 and Q lineages and descendent lineages.
- the SARS-CoV-2 variant that is specifically targeted is Beta strain/B.1.351 and descendent lineages.
- the SARS-CoV-2 variant that is specifically targeted is Gamma strain/P.1 and descendent lineages.
- the SARS-CoV-2 variant that is specifically targeted is Epsilon strain/B.1.427 and B.1.429 lineage and descendent lineages.
- the SARS-CoV-2 variant that is specifically targeted is Kappa strain/B.1.617.1 lineage and descendent lineages. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is Zeta strain/P.2 lineage and descendent lineages. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is Delta strain/B.1.617.2 and AY lineages and descendent lineages. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is Zeta strain and descendent lineages. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is B.1.526 lineage and descendent lineages.
- the SARS-CoV-2 variant that is specifically targeted is B.1.526.1 lineage and descendent lineages. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is B.1.525 lineage and descendent lineages. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is B.1.617 lineage and descendent lineages. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is B.1.617.3 lineage and descendent lineages. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is B.1.617.1 lineage and descendent lineages. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is Lambda strain/C.37 lineage and descendent lineages.
- the SARS-CoV-2 variant that is specifically targeted is Eta strain/B.1.525 lineage and descendent lineages. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is Iota strain/B.1.526 lineage and descendent lineages. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is Mu strain/B.1.621 and B.1.621.1 lineages and descendent lineages. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is Omicron strain/B.1.1.529 lineage and descendent lineages. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is a lineage obtained from a reference database (see, for example, Rambaut et al., (2020) Nature Microbiology, doi:10.1038/s41564-020-0770-5; and Cov-Lineages, available on the Internet at cov-lineages.org/lineage_hst.html).
- the SARS-CoV-2 variant that is specifically targeted has one or more mutation(s) in the gene encoding Spike (S) protein. In some embodiments, the SARS-CoV-2 variant that is specifically targeted has one or more mutation(s) in the gene encoding Nucleocapsid (N) protein. In some embodiments, the SARS-CoV-2 variant that is specifically targeted has one or more mutation(s) in the gene encoding Envelope (E) protein. In some embodiments, the SARS-CoV-2 variant that is specifically targeted has one or more mutation(s) in the gene encoding Membrane glycoprotein (M) protein.
- the SARS-CoV-2 variant that is specifically targeted has one or more mutation(s) in ORF1a. In some embodiments, the SARS-CoV-2 variant that is specifically targeted has one or more mutation(s) in ORF1b. In some instances, the target nucleic acid comprises a segment of a nucleic acid from the SARS-CoV-2 variant comprising the one or more mutations.
- the sample is a biological sample from an individual.
- a biological sample is a blood, serum, plasma, saliva, urine, mucosal sample, peritoneal sample, cerebrospinal fluid, gastric secretions, nasal secretions, sputum, pharyngeal exudates, urethral or vaginal secretions, an exudate, an effusion, or tissue sample.
- the sample is a nasal swab.
- the nasal swab is a nasopharyngeal swab.
- the sample is a throat swab.
- the throat swab is an oropharyngeal swab.
- a tissue sample may be dissociated or liquefied prior to application to the detection system of the disclosure.
- a sample from an environment may be from soil, air, or water.
- the environmental sample is taken as a swab from a surface of interest or taken directly from the surface of interest.
- the raw sample is applied to the detection system.
- the sample is diluted with a buffer or a fluid or concentrated prior to application to the detection system, or is applied neat to the detection system. Sometimes, the sample is contained in no more than 20 µL.
- the sample, in some cases, is contained in no more than 1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 90, 100, 200, 300, 400, or 500 µL, or any value from 1 µL to 500 µL. Sometimes, the sample is contained in more than 500 µL.
- Some methods described herein can detect a target nucleic acid present in the sample in various concentrations or amounts as a target nucleic acid.
- the sample has at least 2 target nucleic acids.
- the sample has at least 3, 5, 10, 20, 30, 40, 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, or 10000 target nucleic acids.
- the method detects target nucleic acid present at least at one copy per 10^1 non-target nucleic acids, 10^2 non-target nucleic acids, 10^3 non-target nucleic acids, 10^4 non-target nucleic acids, 10^5 non-target nucleic acids, 10^6 non-target nucleic acids, 10^7 non-target nucleic acids, 10^8 non-target nucleic acids, 10^9 non-target nucleic acids, or 10^10 non-target nucleic acids.
- the target nucleic acid comprises a segment of an S (Spike) gene of a SARS-CoV-2 variant.
- the segment of the S gene, in some embodiments, comprises a mutation (e.g., an SNP) relative to a wild-type SARS-CoV-2 protein.
- the mutation, in some embodiments, is used, at least in part, to identify the SARS-CoV-2 variant (e.g., to distinguish it from other variants or from a wild-type SARS-CoV-2). Therefore, detection of the target nucleic acid comprising the mutation in a sample, in some embodiments, is used to identify the sample as comprising the SARS-CoV-2 variant.
- Methods described herein comprise contacting the sample to a complex comprising a guide nucleic acid comprising a segment that is reverse complementary to a segment of the target nucleic acid and an effector protein that exhibits sequence independent cleavage upon forming a complex comprising the segment of the guide nucleic acid binding to the segment of the target nucleic acid (e.g., comprising a segment of a SARS-CoV-2 variant); and assaying for a signal indicating cleavage of a reporter, wherein the signal indicates a presence of the target nucleic acid in the sample and wherein absence of the signal indicates an absence of the target nucleic acid in the sample.
- when a guide nucleic acid binds to a target nucleic acid (e.g., a nucleic acid from a SARS-CoV-2 variant), the effector protein's trans cleavage activity can be initiated, and reporter nucleic acids can be cleaved, resulting in the detection of fluorescence indicative of, at least in part, the presence of the target nucleic acid.
- the effector protein, in some embodiments, cleaves the reporter nucleic acid with an efficiency of 50% as measured by a change in a signal that is calorimetric, potentiometric, amperometric, optical (e.g., fluorescent, colorimetric, etc.), or piezo-electric, as non-limiting examples.
- the cleavage efficiency is at least 40%, 50%, 60%, 70%, 80%, 90%, or 95% as measured by a change in a signal that is calorimetric, potentiometric, amperometric, optical (e.g., fluorescent, colorimetric, etc.), or piezo-electric, as non-limiting examples.
- the methods described herein detect a target nucleic acid (e.g., a nucleic acid from a SARS-CoV-2 variant) with an effector protein and a detector nucleic acid in a sample, where the sample is contacted with the reagents for a predetermined length of time sufficient for trans cleavage of the single-stranded detector nucleic acid.
- Some methods described herein comprise a) contacting the sample to i) a detector nucleic acid; and ii) a composition comprising an effector protein and a non-naturally occurring guide nucleic acid having a nucleotide sequence that is at least 80% identical to SEQ ID NO: 1, 2, 3, 4, 5, or 6 (or any one of SEQ ID NOS: 22-27 or 40-42) that hybridizes to a segment of the target nucleic acid, wherein the effector protein cleaves the detector nucleic acid upon hybridization of the non-naturally occurring guide nucleic acid to the segment of the coronavirus target nucleic acid; and b) assaying for a change in a signal, wherein the change in the signal is produced by cleavage of the detector nucleic acid.
- the detector nucleic acid comprises a nucleotide sequence that is at least 62.5% identical to SEQ ID NO: 7, and flanked with a fluorescent dye on the 5’ end and a quencher on the 3’ end.
- the detector nucleic acid comprises a nucleotide sequence that is at least 75% identical to SEQ ID NO: 7, and flanked with a fluorescent dye on the 5’ end and a quencher on the 3’ end.
- the detector nucleic acid comprises a nucleotide sequence that is at least 87.5% identical to SEQ ID NO: 7, and flanked with a fluorescent dye on the 5’ end and a quencher on the 3’ end.
- the non-naturally occurring guide nucleic acid has a nucleotide sequence that is at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 1, 2, 3, 4, 5, or 6, or any one of SEQ ID NOS: 22-27 or 40-42.
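- The percent-identity thresholds above can be illustrated with a short calculation. This is a sketch under stated assumptions: it computes identity position by position for ungapped sequences of equal length, whereas identity is often computed from a sequence alignment, and the 8-nucleotide reference string is a hypothetical placeholder, not SEQ ID NO: 7. The thresholds 62.5%, 75%, and 87.5% would correspond to 5, 6, and 7 matching positions out of 8 if the reference were 8 nucleotides long, which is an inference from the numbers rather than a statement in the text.

```python
def percent_identity(seq, ref):
    """Percent of positions at which seq matches ref (ungapped, equal length)."""
    if not ref or len(seq) != len(ref):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq.upper(), ref.upper()))
    return 100.0 * matches / len(ref)

# Hypothetical 8-nucleotide placeholder; NOT the sequence of SEQ ID NO: 7.
ref = "ACGTACGT"

one_mismatch = percent_identity("ACGTACGA", ref)     # 7 of 8 positions match
two_mismatches = percent_identity("ACGTACTA", ref)   # 6 of 8 positions match
three_mismatches = percent_identity("ACGTATTA", ref) # 5 of 8 positions match
print(one_mismatch, two_mismatches, three_mismatches)
```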
- the method further comprises assaying for a control sequence by contacting the control nucleic acid to a second detector nucleic acid and a composition comprising the programmable nuclease and a non-naturally occurring guide nucleic acid that hybridizes to a segment of the control nucleic acid, wherein the programmable nuclease cleaves the detector nucleic acid upon hybridization of the non-naturally occurring guide nucleic acid to the segment of the control nucleic acid.
- Contacting the sample with the effector protein and guide nucleic acid occurs at a temperature of at least about 25°C, at least about 30°C, at least about 35°C, at least about 40°C, at least about 50°C, or at least about 65°C. In some instances, the temperature is not greater than 80°C. In some instances, the temperature is about 25°C, about 30°C, about 35°C, about 40°C, about 45°C, about 50°C, about 55°C, about 60°C, about 65°C, or about 70°C. In some instances, the temperature is about 25°C to about 45°C, about 35°C to about 55°C, or about 55°C to about 65°C.
- methods of the disclosure detect a target nucleic acid (e.g., the target nucleic acid indicative of, at least in part, a SARS-CoV-2 variant) in less than 60 minutes.
- methods of the disclosure detect a target nucleic acid in less than about 120 minutes, less than about 110 minutes, less than about 100 minutes, less than about 90 minutes, less than about 80 minutes, less than about 70 minutes, less than about 60 minutes, less than about 55 minutes, less than about 50 minutes, less than about 45 minutes, less than about 40 minutes, less than about 35 minutes, less than about 30 minutes, less than about 25 minutes, less than about 20 minutes, less than about 15 minutes, less than about 10 minutes, less than about 5 minutes, less than about 4 minutes, less than about 3 minutes, less than about 2 minutes, or less than about 1 minute.
- the methods of detecting are performed in less than 10 hours, less than 9 hours, less than 8 hours, less than 7 hours, less than 6 hours, less than 5 hours, less than 4 hours, less than 3 hours, less than 2 hours, less than 1 hour, less than 50 minutes, less than 45 minutes, less than 40 minutes, less than 35 minutes, less than 30 minutes, less than 25 minutes, less than 20 minutes, less than 15 minutes, less than 10 minutes, less than 9 minutes, less than 8 minutes, less than 7 minutes, less than 6 minutes, or less than 5 minutes. In some cases, methods of detecting are performed in about 5 minutes to about 10 hours, about 10 minutes to about 8 hours, about 15 minutes to about 6 hours, about 20 minutes to about 5 hours, about 30 minutes to about 2 hours, or about 45 minutes to about 1 hour.
- the results from the completed assay can be detected and analyzed in various ways.
- signal produced by the reaction is visible by eye, and the results can be read by the user.
- the signal is visualized by an imaging device or other device depending on the type of signal.
- the imaging device is a digital camera, such as a digital camera on a mobile device.
- the mobile device, in some embodiments, has a software program or a mobile application that can capture an image of the support medium, identify the assay being performed, detect the detection region and the detection spot, provide image properties of the detection spot, analyze the image properties of the detection spot, and provide a result.
- the imaging device can capture fluorescence, ultraviolet (UV), infrared (IR), or visible wavelength signals.
- the imaging device, in some embodiments, has an excitation source to provide the excitation energy and captures the emitted signals.
- the excitation source is a camera flash and optionally a filter.
- the imaging device is used together with an imaging box that is placed over the support medium to create a dark room to improve imaging.
- the imaging box can be a cardboard box that the imaging device can fit into before imaging.
- the imaging box has optical lenses, mirrors, filters, or other optical elements to aid in generating a more focused excitation signal or to capture a more focused emission signal.
- the imaging box and the imaging device are small, handheld, and portable to facilitate the transport and use of the assay in remote or low resource settings.
- the assay described herein can be visualized and analyzed by a mobile application (app) or a software program.
- Using the graphical user interface (GUI) of the app or program, in some embodiments, an individual takes an image of the support medium, including the detection region, barcode, reference color scale, and fiduciary markers on the housing, using a camera on a mobile device.
- the program or app reads the barcode or identifiable label for the test type, locates the fiduciary marker to orient the sample, reads the detectable signals, compares against the reference color grid, and determines the presence or absence of the target nucleic acid, which indicates the presence of the gene, virus, or the agent responsible for the disease.
- the mobile application, in some embodiments, presents the results of the test to the individual.
- the mobile application stores the test results in the mobile application.
- the mobile application, in some embodiments, communicates with a remote device and transfers the data of the test results.
- the test results, in some embodiments, are viewable remotely from the remote device by another individual, including a healthcare professional.
- a remote user, in some embodiments, accesses the results and uses the information to recommend action for treatment, intervention, or cleanup of an environment.
- the method of the disclosure is used as an initial screen for the presence of an SNP before whole genome sequencing. In other cases, the method of the disclosure is used to replace whole genome sequencing. In other cases, the method of the disclosure is used to detect specific mutations associated with neutralizing antibody evasion before physicians make a clinical decision to use certain antibody drugs to treat the infection. In other cases, the method of the disclosure is used for contact tracing within communities.
- Methods described herein comprise contacting a nasal swab or a throat swab from an individual.
- the nasal swab is a nasopharyngeal swab.
- the throat swab is an oropharyngeal swab.
- the procedures to collect a nasal swab are well known in the relevant field. Briefly, a tapered swab, in some embodiments, is used. When the individual’s head is tilted to a certain angle, the swab is inserted into the individual’s nasal cavity parallel to the palate until resistance is met at the turbinates.
- the swab, in some embodiments, is preserved in a sterile tube.
- the procedures to collect a throat swab are also well known in the relevant field. Briefly, in some embodiments, a swab is inserted into the mouth of the individual, and touches both tonsillar pillars and the posterior oropharynx. In some embodiments, the swab is preserved in a sterile tube.
- methods described herein comprise preparing samples, including lysing the sample.
- lysing the sample comprises contacting the sample to a lysis buffer.
- a lysis reaction, in some embodiments, is performed at a range of temperatures. In some embodiments, a lysis reaction is performed at about room temperature. In some embodiments, a lysis reaction is performed at about 95 °C. In some embodiments, a lysis reaction is performed at from 1 °C to 10 °C, from 4 °C to 8 °C, from 10 °C to 20 °C, from 15 °C to 25 °C, from 15 °C to 20 °C, from 18 °C to 25 °C, from 18 °C to 95 °C, from 20 °C to 37 °C, from 25 °C to 40 °C, from 35 °C to 45 °C, from 40 °C to 60 °C, from 50 °C to 70 °C, from 60 °C to 80 °C, from 70 °C to 90 °C, from 80 °C to 95 °C, or from 90 °C to 99 °C.
- a lysis reaction is performed for about 5 minutes, about 15 minutes, or about 30 minutes. In some embodiments, a lysis reaction is performed for from 2 minutes to 5 minutes, from 3 minutes to 8 minutes, from 5 minutes to 15 minutes, from 10 minutes to 20 minutes, from 15 minutes to 25 minutes, from 20 minutes to 30 minutes, from 25 minutes to 35 minutes, from 30 minutes to 40 minutes, from 35 minutes to 45 minutes, from 40 minutes to 50 minutes, from 45 minutes to 55 minutes, from 50 minutes to 60 minutes, from 55 minutes to 65 minutes, from 60 minutes to 70 minutes, from 65 minutes to 75 minutes, from 70 minutes to 80 minutes, from 75 minutes to 85 minutes, or from 80 minutes to 90 minutes.
- the lysis buffer disclosed herein comprises a viral lysis buffer.
- a viral lysis buffer, in some embodiments, lyses a coronavirus capsid in a viral sample (e.g., a sample collected from an individual suspected of having a coronavirus infection), releasing a viral genome.
- the viral lysis buffer, in some embodiments, is compatible with amplification (e.g., RT-LAMP amplification) of a target region of the viral genome.
- the viral lysis buffer, in some embodiments, is compatible with detection (e.g., a DETECTR reaction disclosed herein).
- a sample, in some embodiments, is prepared in a one-step sample preparation method comprising suspending the sample in a viral lysis buffer compatible with amplification, detection (e.g., a DETECTR reaction), or both.
- a viral lysis buffer compatible with amplification (e.g., RT-LAMP amplification), detection (e.g., DETECTR), or both comprises a buffer (e.g., Tris-HCl, phosphate, or HEPES), a reducing agent (e.g., N-Acetyl Cysteine (NAC), Dithiothrei
- a viral lysis buffer, in some embodiments, comprises a buffer and a reducing agent, or a viral lysis buffer comprises a buffer and a chelating agent.
- the viral lysis buffer, in some embodiments, is formulated at a low pH.
- the viral lysis buffer, in some embodiments, is formulated at a pH of from about pH 4 to about pH 5.
- the viral lysis buffer, in some embodiments, is formulated at a pH of from about pH 4 to about pH 9.
- the viral lysis buffer, in some embodiments, further comprises a preservative (e.g., ProClin 150).
- the viral lysis buffer, in some embodiments, comprises an activator of the amplification reaction.
- the buffer, in some embodiments, comprises primers, dNTPs, or magnesium (e.g., MgSO4, MgCl2, or MgOAc), or a combination thereof, to activate the amplification reaction.
- an activator (e.g., primers, dNTPs, or magnesium) is added to the buffer following lysis of the coronavirus to initiate the amplification reaction.
- Methods described herein comprise RNA extraction.
- the method described herein comprises obtaining a sample from a subject infected with or suspected to be infected with coronavirus.
- the methods comprise extracting RNA from the sample.
- RNA can be extracted from the sample with conventional RNA extraction methods.
- RNA is extracted with the Mag-Bind Viral DNA/RNA 96 kit (Omega Bio-Tek).
- RNA is extracted with the HighPrep™ Viral DNA/RNA Kit (MagBio).
- RNA is extracted with the AB MagPure Virus RNA Isolation Kit.
- RNA is extracted with the GeneJET RNA Purification Kit (Thermo Fisher Scientific).
- RNA, in some embodiments, is automatically extracted on a liquid handling platform. In some instances, RNA is automatically extracted on the KingFisher Flex (Thermo Fisher Scientific). In some instances, RNA is automatically extracted on Microlab® STAR (Hamilton). In some instances, RNA is automatically extracted on Microlab® NIMBUS (Hamilton). In some instances, RNA is automatically extracted on KingFisher® Duo Prime (Thermo Fisher Scientific). In some instances, RNA is automatically extracted on MagMAX® Express-96 (Applied Biosystems). In some instances, RNA is automatically extracted on BioSprint® 96 (QIAGEN).
- Methods described herein comprise amplifying a target nucleic acid for detection.
- amplifying comprises changing the temperature of the amplification reaction, also known as thermal amplification (e.g., PCR).
- amplifying is performed at essentially one temperature, also known as isothermal amplification.
- amplifying comprises subjecting a target nucleic acid to an amplification reaction selected from transcription mediated amplification (TMA), helicase dependent amplification (HDA), circular helicase dependent amplification (cHDA), strand displacement amplification (SDA), recombinase polymerase amplification (RPA), loop mediated amplification (LAMP), exponential amplification reaction (EXPAR), rolling circle amplification (RCA), ligase chain reaction (LCR), simple method amplifying RNA targets (SMART), single primer isothermal amplification (SPIA), multiple displacement amplification (MDA), nucleic acid sequence based amplification (NASBA), hinge-initiated primer-dependent amplification of nucleic acids (HIP), nicking enzyme amplification reaction (NEAR), and improved multiple displacement amplification (IMDA).
- the target nucleic acid is amplified by LAMP. In some instances, the target nucleic acid is amplified by RT-LAMP.
- LAMP primer sets can be designed to target distinct but nearby sequences around one or more mutations (e.g., SNP) that are indicative of, at least in part, a SARS-CoV-2 variant.
- four primers comprising forward inner primer (FIP) with an overhang designed to form dumbbell structures, backward inner primer (BIP) with an overhang designed to form dumbbell structures, forward outer primer (F3) for displacing the FIP linked complementary strands from templates, and backward outer primer (B3) for displacing the BIP linked complementary strands from templates, are designed to target one region.
- six primers comprising FIP, BIP, F3, B3, loop forward (LF), and loop backward (LB), are designed to target one region, wherein LF and LB are added for faster amplification.
- one or more of the primers are degenerate primers.
- a set of FIP, BIP, F3, B3, LF, and LB are designed to amplify around an SNP that is indicative of, at least in part, a SARS-CoV-2 variant.
- a set of FIP, BIP, F3, B3, LF, and LB are designed to amplify a fragment of DNA that encodes around amino acid position 452 of the S protein.
- a set of FIP, BIP, F3, B3, LF, and LB are designed to amplify a fragment of DNA that encodes around amino acid position 484 of the S protein.
- a set of FIP, BIP, F3, B3, LF, and LB are designed to amplify a fragment of DNA that encodes around amino acid position 501 of the S protein.
- a set of FIP, BIP, F3, B3, LF, and LB are designed to amplify a fragment of DNA that encodes around amino acid position 18, amino acid position 95, amino acid position 144, amino acid position 614, amino acid position 653, amino acid position 681, amino acid position 655, amino acid position 796, or amino acid position 1219 of the S protein.
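- As a non-limiting illustration, the region that a primer set would need to flank can be located by converting an S protein amino acid position into nucleotide coordinates. The sketch below assumes the S coding sequence starts at nucleotide 21563 of the SARS-CoV-2 reference genome NC_045512.2 (1-based); that coordinate, the function names, and the flank size are assumptions for illustration and are not stated in this section.

```python
# Assumption: 1-based start of the S coding sequence in NC_045512.2.
S_CDS_START = 21563


def codon_coordinates(aa_position, cds_start=S_CDS_START):
    """Return (first, last) 1-based genome coordinates of the codon
    encoding the given amino acid position."""
    if aa_position < 1:
        raise ValueError("amino acid positions are 1-based")
    first = cds_start + 3 * (aa_position - 1)
    return first, first + 2


def amplicon_window(aa_position, flank=100, cds_start=S_CDS_START):
    """Return a (start, end) window extending `flank` nucleotides on each
    side of the codon; the flank size is a hypothetical design parameter."""
    first, last = codon_coordinates(aa_position, cds_start)
    return max(1, first - flank), last + flank
```

Under that assumption, amino acid position 501 maps to the codon at nucleotides 23063 to 23065, and a primer set would be placed within a window around it.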
- a set of FIP, BIP, F3, B3, LF, and LB has the primer sequences shown in Tables 3A and 3B herein. Table 3A: LAMP Primer Sequences
- compositions that, in some embodiments, further comprise reagents for amplification, the uses thereof, e.g., in systems, and methods for assaying for a SARS-CoV-2 variant.
- reagents for amplifying a nucleic acid include polymerases, primers, and nucleotides.
- Nucleic acid amplification of a target nucleic acid improves at least one of sensitivity, specificity, or accuracy of the assay in detecting the target nucleic acid.
- amplification of the target nucleic acid increases the concentration of the target nucleic acid in the sample relative to the concentration of nucleic acids that do not correspond to the target nucleic acid.
- amplification of the target nucleic acid comprises modifying the sequence of the target nucleic acid. For example, in some embodiments, amplification is used to insert a PAM sequence into a target nucleic acid that lacks a PAM sequence.
- the nucleic acid amplification reaction is carried out with reagents comprising amplification primers described herein, a DNA polymerase, dNTPs, and appropriate buffer.
- the buffer comprises Tris-HCl, (NH4)2SO4, KCl, MgSO4, and Tween® 20.
- the concentration of MgSO4 is 6 mM.
- Amplifying takes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 40, 50, or 60 minutes. Sometimes, the nucleic acid amplification is performed for 1 to 60, 5 to 55, 10 to 50, 15 to 45, 20 to 40, or 25 to 35 minutes. Amplifying, in some embodiments, is performed at a temperature of around 20-45°C. Amplifying, in some embodiments, is performed at a temperature of less than about 20°C, less than about 25°C, less than about 30°C, less than about 35°C, less than about 37°C, less than about 40°C, or less than about 45°C.
- the nucleic acid amplification reaction, in some embodiments, is performed at a temperature of at least about 20°C, at least about 25°C, at least about 30°C, at least about 35°C, at least about 37°C, at least about 40°C, or at least about 45°C.
- the amplification reaction is performed on at least 10, 100, 1,000, 5,000, 10,000, or 15,000 copies of the target nucleic acid. In some cases, at least 10,000 copies of the target nucleic acid are input into the amplification reaction.
- Methods described herein comprise reverse transcribing the coronavirus target nucleic acid, the amplification product, or a combination thereof.
- the reverse transcribing comprises contacting the sample to reagents for reverse transcription.
- the reagents for reverse transcription comprise a reverse transcriptase, an oligonucleotide primer, and dNTPs.
- the reverse transcriptase described herein is WarmStart RTx reverse transcriptase.
- the reverse transcriptase described herein is MultiScribe™ reverse transcriptase.
- the reverse transcriptase described herein is QuantiTect reverse transcriptase. In one instance, the reverse transcriptase described herein is GoScript™ reverse transcriptase. In one instance, the reverse transcriptase described herein is UltraScript 2.0 reverse transcriptase.
- contacting the sample to reagents for reverse transcription occurs prior to contacting the sample to the detector nucleic acid and the composition, prior to contacting the sample to the reagents for amplification, or prior to both.
- contacting the sample to reagents for reverse transcription occurs concurrently with contacting the sample to the detector nucleic acid and the composition, concurrently with contacting the sample to the reagents for amplification, or concurrently with both.
- Methods described herein comprise contacting the amplification product, the target nucleic acid, or a combination thereof, to a composition comprising an effector protein and a non-naturally occurring guide nucleic acid that hybridizes to a segment of the target nucleic acid or amplification product thereof.
- Methods described herein further comprise assaying for a change in a signal produced by cleavage of a reporter nucleic acid wherein the effector protein trans cleaves the reporter nucleic acid upon hybridization of the non-naturally occurring guide nucleic acid to the segment of the target nucleic acid or the amplification product thereof.
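- As a non-limiting illustration, assaying for a change in signal can be sketched as a fold-change comparison between a sample well and no-target control wells. The fold-change criterion, the default threshold of 2.0, and the function names are assumptions made for illustration; the disclosure does not specify a particular decision rule here.

```python
from statistics import mean


def fold_change(sample_signal, control_signals):
    """Ratio of the sample signal to the mean no-target control signal."""
    baseline = mean(control_signals)
    if baseline <= 0:
        raise ValueError("control baseline must be positive")
    return sample_signal / baseline


def reporter_cleaved(sample_signal, control_signals, threshold=2.0):
    """Call reporter cleavage when the signal exceeds the control baseline
    by at least `threshold`-fold (hypothetical cutoff)."""
    return fold_change(sample_signal, control_signals) >= threshold
```

For example, a sample reading of 900 against control readings averaging 100 gives a nine-fold change, which this sketch would report as cleavage.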
- CRISPR/Cas enzymes are effector proteins used in the methods and systems disclosed herein.
- CRISPR/Cas enzymes can include any of the known Classes and Types of CRISPR/Cas enzymes.
- Effector proteins disclosed herein include Class 1 CRISPR/Cas enzymes, such as the Type I, Type IV, or Type III CRISPR/Cas enzymes.
- Effector proteins disclosed herein also include the Class 2 CRISPR/Cas enzymes, such as the Type II, Type V, and Type VI CRISPR/Cas enzymes.
- the Type V CRISPR/Cas enzyme is a Cas12 effector protein.
- Type V CRISPR/Cas enzymes (e.g., Cas12 or Cas14) lack an HNH domain.
- a Cas12 nuclease of the disclosure cleaves a nucleic acid via a single catalytic RuvC domain.
- the RuvC domain is within a nuclease, or “NUC”, lobe of the protein, and the Cas12 nucleases further comprise a recognition, or “REC”, lobe.
- the programmable Cas12 effector protein described herein is Cas12a.
- the programmable Cas12 effector protein described herein is Cas12c.
- the programmable Cas12 effector protein described herein is Cas12d.
- the programmable Cas12 effector protein described herein is Cas12e.
- the programmable Cas12 effector protein described herein is Cas12b. In some embodiments, the programmable Cas12 effector protein described herein is Cas12h. In some embodiments, the programmable Cas12 effector protein described herein is Cas12i. In some embodiments, the programmable Cas12 effector protein described herein is a small effector such as Cas12g.
- compositions comprising an effector protein and a guide nucleic acid and uses thereof, e.g., in systems, and methods for assaying for a SARS-CoV-2 variant.
- the guide nucleic acid described herein targets (e.g., is capable of binding to and/or is complementary to) a target nucleic acid.
- the target nucleic acid is near a neighboring PAM sequence that is recognizable by the effector proteins described herein.
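- As a non-limiting illustration, candidate target segments adjacent to a PAM can be enumerated by scanning a DNA sequence. The sketch assumes a Cas12a-style 5'-TTTV-3' PAM (V = A, C, or G) located immediately 5' of a 20-nucleotide target segment; the PAM motif, the segment length, and the function name are assumptions not stated in this section, and only the given strand is scanned.

```python
import re

# Assumed PAM motif: TTTV, matched with a lookahead so overlapping sites are found.
PAM_PATTERN = re.compile(r"(?=(TTT[ACG]))")


def find_protospacers(dna, spacer_len=20):
    """Return (pam_index, segment) pairs for every TTTV site that is followed
    by a full-length target segment on the given strand (0-based index)."""
    dna = dna.upper()
    hits = []
    for match in PAM_PATTERN.finditer(dna):
        start = match.start() + 4  # first base after the 4-nt PAM
        end = start + spacer_len
        if end <= len(dna):
            hits.append((match.start(), dna[start:end]))
    return hits
```

A complete search would also scan the reverse complement, which is omitted here to keep the sketch short.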
- the engineered guide RNA comprises a CRISPR RNA (crRNA) that is at least partially complementary to a target nucleic acid.
- the engineered guide RNA comprises a trans-activating crRNA (tracrRNA), at least a portion of which interacts with the effector protein. The tracrRNA, in some embodiments, hybridizes to a portion of the guide RNA that does not hybridize to the target nucleic acid.
- compositions comprise a crRNA and tracrRNA that function together as two separate, unlinked molecules.
- the guide nucleic acid targets (e.g., is capable of binding to and/or is complementary to) a target nucleic acid indicative of, at least in part, a SARS-CoV-2 variant.
- the target nucleic acid is from a SARS-CoV-2 variant and comprises a mutation relative to a wild-type SARS-CoV-2 or to a strain of SARS-CoV-2 that comprises no mutations at the same locus.
- the mutation is a single nucleotide polymorphism (SNP).
- the target nucleic acid is an RNA, a DNA, or a synthetic nucleic acid.
- the guide nucleic acid targets a target nucleic acid which comprises a segment of an S (Spike) gene of SARS-CoV-2.
- the S protein plays a key role in the receptor recognition and cell membrane fusion process, which is essential for the infection and transmission capabilities of SARS-CoV-2.
- the guide nucleic acid targets a target nucleic acid comprising certain SNP(s) in the S gene that encodes S protein.
- the guide nucleic acid targets a portion of the S gene encoding around amino acid position 18, amino acid position 95, amino acid position 144, amino acid position 614, amino acid position 653, amino acid position 681, amino acid position 655, amino acid position 796, or amino acid position 1219 of the Spike protein of SARS-CoV-2 (see NCBI Reference Sequence: YP_009724390.1).
- the guide nucleic acid is a guide nucleic acid represented in Table 4A or Table 4B.
- the guide nucleic acid described herein targets around position 452 in the Spike protein and hybridizes to a wild-type segment of sequence.
- the guide nucleic acid described herein has a nucleotide sequence of SEQ ID NO: 1.
- the guide nucleic acid described herein comprises a nucleotide sequence that is at least 60% identical to SEQ ID NO: 1.
- the guide nucleic acid described herein comprises a nucleotide sequence that is at least 70% identical to SEQ ID NO: 1.
- the guide nucleic acid described herein comprises a nucleotide sequence that is at least 80% identical to SEQ ID NO: 1. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 1. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical to SEQ ID NO: 1. In some specific embodiments, the guide nucleic acid described herein has a nucleotide sequence of SEQ ID NO: 22.
- the guide nucleic acid described herein comprises a nucleotide sequence that is at least 60% identical to SEQ ID NO: 22. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 70% identical to SEQ ID NO: 22. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 80% identical to SEQ ID NO: 22. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 22. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical to SEQ ID NO: 22.
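- As a non-limiting illustration, a percent identity such as those recited above can be computed by counting matching positions between a candidate sequence and a reference sequence. The sketch assumes two sequences of equal length compared without gaps; the disclosure does not specify an alignment method in this section, and the function names and example sequences are hypothetical.

```python
def percent_identity(candidate, reference):
    """Percentage of positions at which two equal-length sequences match."""
    if not reference or len(candidate) != len(reference):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(candidate.upper(), reference.upper()))
    return 100.0 * matches / len(reference)


def meets_identity(candidate, reference, minimum=90.0):
    """True when the candidate is at least `minimum` percent identical."""
    return percent_identity(candidate, reference) >= minimum
```

Two 10-nucleotide sequences that differ at one position are 90% identical under this definition; sequences of unequal length would need an alignment step first.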
- the guide nucleic acid described herein targets around position 484 in the Spike protein and hybridizes to a wild-type segment of sequence.
- the guide nucleic acid described herein has a nucleotide sequence of SEQ ID NO: 2.
- the guide nucleic acid described herein comprises a nucleotide sequence that is at least 60% identical to SEQ ID NO: 2.
- the guide nucleic acid described herein comprises a nucleotide sequence that is at least 70% identical to SEQ ID NO: 2.
- the guide nucleic acid described herein comprises a nucleotide sequence that is at least 80% identical to SEQ ID NO: 2.
- the guide nucleic acid described herein comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 2. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical to SEQ ID NO: 2. In some specific embodiments, the guide nucleic acid described herein has a nucleotide sequence of SEQ ID NO: 23. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 60% identical to SEQ ID NO: 23.
- the guide nucleic acid described herein comprises a nucleotide sequence that is at least 70% identical to SEQ ID NO: 23. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 80% identical to SEQ ID NO: 23. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 23. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical to SEQ ID NO: 23.
- the guide nucleic acid described herein targets around position 501 in the Spike protein and hybridizes to a wild-type segment of sequence.
- the guide nucleic acid described herein has a nucleotide sequence of SEQ ID NO: 3.
- the guide nucleic acid described herein comprises a nucleotide sequence that is at least 60% identical to SEQ ID NO: 3.
- the guide nucleic acid described herein comprises a nucleotide sequence that is at least 70% identical to SEQ ID NO: 3.
- the guide nucleic acid described herein comprises a nucleotide sequence that is at least 80% identical to SEQ ID NO: 3.
- the guide nucleic acid described herein comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 3. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical to SEQ ID NO: 3. In some specific embodiments, the guide nucleic acid described herein has a nucleotide sequence of SEQ ID NO: 24. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 60% identical to SEQ ID NO: 24.
- the guide nucleic acid described herein comprises a nucleotide sequence that is at least 70% identical to SEQ ID NO: 24. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 80% identical to SEQ ID NO: 24. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 24. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical to SEQ ID NO: 24.
- the guide nucleic acid described herein targets around position 452 in the Spike protein and hybridizes to a mutant segment of sequence.
- the guide nucleic acid described herein has a nucleotide sequence of SEQ ID NO: 4.
- the guide nucleic acid described herein comprises a nucleotide sequence that is at least 60% identical to SEQ ID NO: 4.
- the guide nucleic acid described herein comprises a nucleotide sequence that is at least 70% identical to SEQ ID NO: 4.
- the guide nucleic acid described herein comprises a nucleotide sequence that is at least 80% identical to SEQ ID NO: 4.
- the guide nucleic acid described herein comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 4. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical to SEQ ID NO: 4. In some specific embodiments, the guide nucleic acid described herein has a nucleotide sequence of SEQ ID NO: 25. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 60% identical to SEQ ID NO: 25.
- the guide nucleic acid described herein comprises a nucleotide sequence that is at least 70% identical to SEQ ID NO: 25. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 80% identical to SEQ ID NO: 25. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 25. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical to SEQ ID NO: 25.
- the guide nucleic acid described herein targets around position 484, and hybridizes to a mutant segment of sequence.
- the guide nucleic acid described herein has a nucleotide sequence of SEQ ID NO: 5.
- the guide nucleic acid described herein comprises a nucleotide sequence that is at least 60% identical to SEQ ID NO: 5.
- the guide nucleic acid described herein comprises a nucleotide sequence that is at least 70% identical to SEQ ID NO: 5.
- the guide nucleic acid described herein comprises a nucleotide sequence that is at least 80% identical to SEQ ID NO: 5.
- the guide nucleic acid described herein comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 5. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical to SEQ ID NO: 5. In some specific embodiments, the guide nucleic acid described herein has a nucleotide sequence of SEQ ID NO: 26. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 60% identical to SEQ ID NO: 26.
- the guide nucleic acid described herein comprises a nucleotide sequence that is at least 70% identical to SEQ ID NO: 26. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 80% identical to SEQ ID NO: 26. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 26. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical to SEQ ID NO: 26.
- the guide nucleic acid described herein has a nucleotide sequence of SEQ ID NO: 40. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 60% identical to SEQ ID NO: 40. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 70% identical to SEQ ID NO: 40. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 80% identical to SEQ ID NO: 40. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 40.
- the guide nucleic acid described herein comprises a nucleotide sequence that is at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical to SEQ ID NO: 40.
- the guide nucleic acid described herein has a nucleotide sequence of SEQ ID NO: 41.
- the guide nucleic acid described herein comprises a nucleotide sequence that is at least 60% identical to SEQ ID NO: 41.
- the guide nucleic acid described herein comprises a nucleotide sequence that is at least 70% identical to SEQ ID NO: 41.
- the guide nucleic acid described herein comprises a nucleotide sequence that is at least 80% identical to SEQ ID NO: 41.
- the guide nucleic acid described herein comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 41.
- the guide nucleic acid described herein comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO:
- the guide nucleic acid described herein has a nucleotide sequence of SEQ ID NO: 42. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 60% identical to SEQ ID NO: 42. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 70% identical to SEQ ID NO: 42. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 80% identical to SEQ ID NO: 42. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 42. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical to SEQ ID NO: 42.
- the guide nucleic acid described herein targets around position 501, and hybridizes to a mutant segment of sequence.
- the guide nucleic acid described herein has a nucleotide sequence of SEQ ID NO: 6.
- the guide nucleic acid described herein comprises a nucleotide sequence that is at least 60% identical to SEQ ID NO: 6.
- the guide nucleic acid described herein comprises a nucleotide sequence that is at least 70% identical to SEQ ID NO: 6.
- the guide nucleic acid described herein comprises a nucleotide sequence that is at least 80% identical to SEQ ID NO: 6.
- the guide nucleic acid described herein comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 6. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical to SEQ ID NO: 6. In some specific embodiments, the guide nucleic acid described herein has a nucleotide sequence of SEQ ID NO: 27. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 60% identical to SEQ ID NO: 27.
- the guide nucleic acid described herein comprises a nucleotide sequence that is at least 70% identical to SEQ ID NO: 27. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 80% identical to SEQ ID NO: 27. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 27. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical to SEQ ID NO: 27.
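- As a non-limiting illustration, signals from a guide that hybridizes to the wild-type segment and a guide that hybridizes to the mutant segment at the same position can be compared to call an allele. The minimum-signal and ratio thresholds, the result labels, and the function name below are hypothetical and are not taken from the disclosure.

```python
def call_allele(wt_signal, mut_signal, min_signal=1000.0, min_ratio=2.0):
    """Call the allele at one position from paired guide signals.

    wt_signal: signal from the well with the wild-type guide
    mut_signal: signal from the well with the mutant guide
    min_signal, min_ratio: illustrative thresholds (assumptions)
    """
    if max(wt_signal, mut_signal) < min_signal:
        return "no call"  # neither guide produced a usable signal
    if wt_signal >= min_ratio * mut_signal:
        return "wild-type"
    if mut_signal >= min_ratio * wt_signal:
        return "mutant"
    return "indeterminate"  # signals too similar to separate
```

Applying the same comparison at positions 452, 484, and 501 would give one call per position, which together could indicate a variant.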
- the guide nucleic acids described herein comprise UAAUUUCUACUAAGUGUAGAU (SEQ ID NO: 126) on the 5’ end to be recognized by CasDx1, as described in the Example Section.
- the guide nucleic acids described herein comprise UAAUUUCUACUAAGUGUAGAU (SEQ ID NO: 126) on the 5’ end to be recognized by LbCas12a, as described in the Example Section.
- the guide nucleic acids described herein comprise UAAUUUCUACUCUUGUAGAU (SEQ ID NO: 127) on the 5’ end to be recognized by AsCas12a, as described in the Example Section.
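- As a non-limiting illustration, a guide nucleic acid with the recited 5’ sequence can be assembled by prepending the effector-specific sequence to a target-specific segment. The two 5’ sequences are SEQ ID NO: 126 and SEQ ID NO: 127 as given above; the function name and the example target-specific segment are hypothetical.

```python
# 5' sequences recited above (SEQ ID NO: 126 and SEQ ID NO: 127).
FIVE_PRIME_SEQUENCES = {
    "LbCas12a": "UAAUUUCUACUAAGUGUAGAU",  # SEQ ID NO: 126
    "AsCas12a": "UAAUUUCUACUCUUGUAGAU",   # SEQ ID NO: 127
}


def build_guide(effector, spacer):
    """Return the 5' sequence for `effector` followed by the target-specific
    segment, written as RNA (T is converted to U)."""
    rna_spacer = spacer.upper().replace("T", "U")
    if set(rna_spacer) - set("ACGU"):
        raise ValueError("spacer contains non-nucleotide characters")
    return FIVE_PRIME_SEQUENCES[effector] + rna_spacer
```

For instance, a hypothetical 20-nucleotide segment passed with "LbCas12a" yields a 41-nucleotide guide that begins with SEQ ID NO: 126.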
- systems comprising an effector protein, a guide nucleic acid and a reporter nucleic acid and uses thereof, e.g., in systems, and methods for assaying for a SARS-CoV-2 variant.
- a reporter comprises a single-stranded nucleic acid and a detection moiety, wherein the single-stranded nucleic acid is capable of being cleaved by the activated effector protein, thereby generating a detectable signal.
- the reporter comprises deoxyribonucleotides.
- the reporter comprises ribonucleotides.
- the reporter comprises at least one deoxyribonucleotide and at least one ribonucleotide.
- the reporter comprises a single-stranded nucleic acid comprising at least one ribonucleotide residue at an internal position that functions as a cleavage site.
- the reporter comprises a protein capable of generating a signal.
- a signal, in some embodiments, is a calorimetric, potentiometric, amperometric, optical (e.g., fluorescent, colorimetric, etc.), or piezo-electric signal.
- the reporter comprises a detection moiety. Suitable detectable labels and/or moieties contemplated for providing a signal include, but are not limited to, an enzyme, a radioisotope, a member of a specific binding pair; a fluorophore; a fluorescent protein; a quantum dot; and the like.
- Suitable fluorescent proteins include, but are not limited to, green fluorescent protein (GFP) or variants thereof, blue fluorescent variant of GFP (BFP), cyan fluorescent variant of GFP (CFP), yellow fluorescent variant of GFP (YFP), enhanced GFP (EGFP), enhanced CFP (ECFP), enhanced YFP (EYFP), GFPS65T, Emerald, Topaz (TYFP), Venus, Citrine, mCitrine, GFPuv, destabilized EGFP (dEGFP), destabilized ECFP (dECFP), destabilized EYFP (dEYFP), mCFPm, Cerulean, T-Sapphire, CyPet, YPet, mKO, HcRed, t-HcRed, DsRed, DsRed2, DsRed-monomer, J-Red, dimer2, t-dimer2(12), mRFP1, pocilloporin, Renilla GFP, Monster GFP, paGF
- Suitable enzymes include, but are not limited to, horseradish peroxidase (HRP), alkaline phosphatase (AP), beta-galactosidase (GAL), glucose-6-phosphate dehydrogenase, beta-N-acetylglucosaminidase, β-glucuronidase, invertase, xanthine oxidase, firefly luciferase, and glucose oxidase (GO).
- the reporter comprises a detection moiety.
- the reporter comprises a cleavage site, wherein the detection moiety is located at a first site on the reporter, wherein the first site is separated from the remainder of reporter upon cleavage at the cleavage site.
- the detection moiety is 3’ to the cleavage site.
- the detection moiety is 5’ to the cleavage site.
- the detection moiety is at the 3’ terminus of the nucleic acid of a reporter. In some cases, the detection moiety is at the 5’ terminus of the nucleic acid of a reporter.
- the reporter comprises a detection moiety and a quenching moiety.
- the reporter comprises a cleavage site, wherein the detection moiety is located at a first site on the reporter and the quenching moiety is located at a second site on the reporter, wherein the first site and the second site are separated by the cleavage site.
- the quenching moiety is a fluorescence quenching moiety.
- the quenching moiety is 5’ to the cleavage site and the detection moiety is 3’ to the cleavage site.
- the detection moiety is 5’ to the cleavage site and the quenching moiety is 3’ to the cleavage site.
- the quenching moiety is at the 5’ terminus of the nucleic acid of a reporter.
- the detection moiety is at the 3’ terminus of the nucleic acid of a reporter. In some cases, the detection moiety is at the 5’ terminus of the nucleic acid of a reporter. In some cases, the quenching moiety is at the 3’ terminus of the nucleic acid of a reporter.
- the reporter is a reporter represented in Table 5.
- Table 5 Exemplary Single Stranded Detector Nucleic Acid
- the detector nucleic acid has a sequence of SEQ ID NO: 7. In some cases, the detector nucleic acid comprises a sequence that is at least 87.5% identical to SEQ ID NO: 7. In one embodiment, the detector nucleic acid comprises a sequence that is at least 75% identical to SEQ ID NO: 7. In one embodiment, the detector nucleic acid comprises a sequence that is at least 62.5% identical to SEQ ID NO: 7. In one embodiment, the detector nucleic acid comprises a sequence that is at least 50% identical to SEQ ID NO: 7.
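The identity thresholds above can be illustrated with a small calculation. The following is a minimal sketch, assuming identity is computed as the fraction of matching positions in an ungapped, equal-length comparison (the text does not specify an alignment method). The function name `percent_identity` and the 8-nucleotide example sequences are hypothetical and are not SEQ ID NO: 7. Under that assumption, thresholds of 87.5%, 75%, 62.5%, and 50% correspond to 7, 6, 5, and 4 matching positions out of 8.

```python
def percent_identity(query: str, reference: str) -> float:
    """Percent of matching positions in an ungapped, equal-length comparison."""
    if len(query) != len(reference) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")
    # Case-insensitive position-by-position comparison.
    matches = sum(q == r for q, r in zip(query.upper(), reference.upper()))
    return 100.0 * matches / len(reference)


# Hypothetical 8-nt reference, used only to illustrate the thresholds.
reference = "TTATTATT"
for candidate in ("TTATTATT", "TTATTATA", "TTATTAAA", "TTATTCCC", "TTATGGGG"):
    print(candidate, percent_identity(candidate, reference))
```

A real comparison against a listed sequence would more likely use a gapped alignment tool; the ungapped version is shown only because it makes the arithmetic behind the percentages easy to follow.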
- Suitable fluorophores provide a detectable fluorescence signal in the same range as 6-Fluorescein (Integrated DNA Technologies), IRDye 700 (Integrated DNA Technologies), TYE 665 (Integrated DNA Technologies), Alexa Fluor 594 (Integrated DNA Technologies), or ATTO™ 633 (NHS Ester) (Integrated DNA Technologies).
- fluorophores are fluorescein amidite, 6-Fluorescein, IRDye 700, TYE 665, Alexa Fluor 594, or ATTO™ 633 (NHS Ester).
- the fluorophore may be an infrared fluorophore.
- the fluorophore may emit fluorescence in the range of 500 nm and 720 nm. In some cases, the fluorophore emits fluorescence at a wavelength of 700 nm or higher. In other cases, the fluorophore emits fluorescence at about 665 nm.
- the fluorophore emits fluorescence in the range of 500 nm to 520 nm, 500 nm to 540 nm, 500 nm to 590 nm, 590 nm to 600 nm, 600 nm to 610 nm, 610 nm to 620 nm, 620 nm to 630 nm, 630 nm to 640 nm, 640 nm to 650 nm, 650 nm to 660 nm, 660 nm to 670 nm, 670 nm to 680 nm, 680 nm to 690 nm, 690 nm to 700 nm, 700 nm to 710 nm, 710 nm to 720 nm, or 720 nm to 730 nm. In some cases, the fluorophore emits fluorescence in the range 450 nm to 750 nm, 500 nm to 650 nm,
- a quenching moiety in some embodiments, is chosen based on its ability to quench the detection moiety.
- a quenching moiety, in some embodiments, is a non-fluorescent fluorescence quencher.
- a quenching moiety in some embodiments, quenches a detection moiety that emits fluorescence in the range of 500 nm and 720 nm.
- a quenching moiety in some embodiments, quenches a detection moiety that emits fluorescence in the range of 500 nm and 720 nm. In some cases, the quenching moiety quenches a detection moiety that emits fluorescence at a wavelength of 700 nm or higher.
- the quenching moiety quenches a detection moiety that emits fluorescence at about 660 nm or about 670 nm. In some cases, the quenching moiety quenches a detection moiety that emits fluorescence in the range of 500 to 520, 500 to 540, 500 to 590, 590 to 600, 600 to 610, 610 to 620, 620 to 630, 630 to 640, 640 to 650, 650 to 660, 660 to 670, 670 to 680, 680 to 690, 690 to 700, 700 to 710, 710 to 720, or 720 to 730 nm.
- the quenching moiety quenches a detection moiety that emits fluorescence in the range 450 nm to 750 nm, 500 nm to 650 nm, or 550 to 650 nm.
- a quenching moiety quenches fluorescein amidite, 6-Fluorescein, IRDye 700, TYE 665, Alexa Fluor 594, or ATTO™ 633 (NHS Ester).
- a quenching moiety, in some embodiments, is Iowa Black RQ, Iowa Black FQ, or IRDye QC-1 Quencher.
- a quenching moiety quenches fluorescein amidite, 6-Fluorescein (Integrated DNA Technologies), IRDye 700 (Integrated DNA Technologies), TYE 665 (Integrated DNA Technologies), Alexa Fluor 594 (Integrated DNA Technologies), or ATTO™ 633 (NHS Ester) (Integrated DNA Technologies).
- a quenching moiety may be Iowa Black RQ (Integrated DNA Technologies), Iowa Black FQ (Integrated DNA Technologies) or IRDye QC-1 Quencher (LiCor). Any of the quenching moieties described herein, in some embodiments, is from any commercially available source, may be an alternative with a similar function, a generic, or a non-trade name of the quenching moieties listed.
- the reporter is present in at least 1.5 fold, at least 2 fold, at least 3 fold, at least 4 fold, at least 5 fold, at least 6 fold, at least 7 fold, at least 8 fold, at least 9 fold, at least 10 fold, at least 11 fold, at least 12 fold, at least 13 fold, at least 14 fold, at least 15 fold, at least 16 fold, at least 17 fold, at least 18 fold, at least 19 fold, at least 20 fold, at least 30 fold, at least 40 fold, at least 50 fold, at least 60 fold, at least 70 fold, at least 80 fold, at least 90 fold, at least 100 fold, from 1.5 fold to 100 fold, from 2 fold to 10 fold, from 10 fold to 20 fold, from 20 fold to 30 fold, from 30 fold to 40 fold, from 40 fold to 50 fold, from 50 fold to 60 fold, from 60 fold to 70 fold, from 70 fold to 80 fold, from 80 fold to 90 fold, from 90 fold to 100 fold, from 1.5 fold to 10 fold, from 1.5 fold to 20 fold, from 10 fold to 40 fold, from 20 fold
- Methods described herein comprise determining a variant call of a sample.
- determining the variant call comprises determining whether the sample comprises a wild-type SARS-CoV-2 or a SARS-CoV-2 variant.
- determining the variant call of the SARS-CoV-2 variant comprises detecting one or more S-gene mutation(s) relative to a wild-type SARS-CoV-2.
- the SARS-CoV-2 variant is any one of Alpha, Beta, Gamma, Delta, Epsilon, Kappa, Omicron, or Zeta SARS-CoV-2 variants.
- determining the variant call of Alpha SARS-CoV-2 variant comprises detecting an S-gene mutation in amino acid position 501 from N to Y (N501Y). In specific cases, determining the variant call of Alpha SARS-CoV-2 variant comprises contacting the amplification product, the target nucleic acid, or a combination thereof, to a composition comprising an effector protein and a non-naturally occurring guide nucleic acid that hybridizes to a segment of the target nucleic acid or amplification product thereof that contains amino acid position 501 of S protein.
- determining the variant call of Alpha SARS-CoV-2 variant comprises using the non-naturally occurring guide nucleic acids each comprising a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 3 or 6. In certain cases, determining the variant call of Alpha SARS-CoV-2 variant comprises using the non-naturally occurring guide nucleic acids each comprising a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 24 or 27.
- determining the variant call of Beta SARS-CoV-2 variant comprises detecting S-gene mutations in amino acid positions 484 from E to K and 501 from N to Y (E484K and N501Y).
- determining the variant call of Beta SARS-CoV-2 variant comprises contacting the amplification product, the target nucleic acid, or a combination thereof, to a composition comprising an effector protein and a non-naturally occurring guide nucleic acid that hybridizes to a segment of the target nucleic acid or amplification product thereof that contains amino acid positions 484 and 501 of S protein.
- determining the variant call of Beta SARS-CoV-2 variant comprises using the non-naturally occurring guide nucleic acids each comprising a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 2, 3, 5, or 6. In certain cases, determining the variant call of Beta SARS-CoV-2 variant comprises using the non-naturally occurring guide nucleic acids each comprising a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 23, 24, 26, or 27.
- determining the variant call of Gamma SARS-CoV-2 variant comprises detecting S-gene mutations in amino acid positions 484 from E to K and 501 from N to Y (E484K and N501Y). In specific cases, determining the variant call of Gamma SARS-CoV-2 variant comprises contacting the amplification product, the target nucleic acid, or a combination thereof, to a composition comprising an effector protein and a non-naturally occurring guide nucleic acid that hybridizes to a segment of the target nucleic acid or amplification product thereof that contains amino acid positions 484 and 501 of S protein.
- determining the variant call of Gamma SARS-CoV-2 variant comprises using the non-naturally occurring guide nucleic acids each comprising a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 2, 3, 5, or 6. In certain cases, determining the variant call of Gamma SARS-CoV-2 variant comprises using the non-naturally occurring guide nucleic acids each comprising a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 23, 24, 26, or 27.
- determining the variant call of Delta SARS-CoV-2 variant comprises detecting an S-gene mutation in amino acid position 452 from L to R (L452R).
- determining the variant call of Delta SARS-CoV-2 variant comprises contacting the amplification product, the target nucleic acid, or a combination thereof, to a composition comprising an effector protein and a non-naturally occurring guide nucleic acid that hybridizes to a segment of the target nucleic acid or amplification product thereof that contains amino acid position 452 of S protein.
- determining the variant call of Delta SARS-CoV-2 variant comprises using the non-naturally occurring guide nucleic acids each comprising a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 1 or 4. In certain cases, determining the variant call of Delta SARS-CoV-2 variant comprises using the non-naturally occurring guide nucleic acids each comprising a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 22 or 25.
- determining the variant call of Epsilon SARS-CoV-2 variant comprises detecting an S-gene mutation in amino acid position 452 from L to R (L452R).
- determining the variant call of Epsilon SARS-CoV-2 variant comprises contacting the amplification product, the target nucleic acid, or a combination thereof, to a composition comprising an effector protein and a non-naturally occurring guide nucleic acid that hybridizes to a segment of the target nucleic acid or amplification product thereof that contains amino acid position 452 of S protein.
- determining the variant call of Epsilon SARS-CoV-2 variant comprises using the non-naturally occurring guide nucleic acids each comprising a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 1 or 4. In certain cases, determining the variant call of Epsilon SARS-CoV-2 variant comprises using the non-naturally occurring guide nucleic acids each comprising a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 22 or 25.
- determining the variant call of Kappa SARS-CoV-2 variant comprises detecting an S-gene mutation in amino acid position 452 from L to R (L452R).
- determining the variant call of Kappa SARS-CoV-2 variant comprises contacting the amplification product, the target nucleic acid, or a combination thereof, to a composition comprising an effector protein and a non-naturally occurring guide nucleic acid that hybridizes to a segment of the target nucleic acid or amplification product thereof that contains amino acid position 452 of S protein.
- determining the variant call of Kappa SARS-CoV-2 variant comprises using the non-naturally occurring guide nucleic acids each comprising a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 1 or 4. In certain cases, determining the variant call of Kappa SARS-CoV-2 variant comprises using the non-naturally occurring guide nucleic acids each comprising a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 22 or 25.
- determining the variant call of Omicron SARS-CoV-2 variant comprises detecting an S-gene mutation in amino acid position 484 from E to A (E484A). In specific cases, determining the variant call of Omicron SARS-CoV-2 variant comprises contacting the amplification product, the target nucleic acid, or a combination thereof, to a composition comprising an effector protein and a non-naturally occurring guide nucleic acid that hybridizes to a segment of the target nucleic acid or amplification product thereof that contains amino acid position 484 of S protein.
- determining the variant call of Omicron SARS-CoV-2 variant comprises using the non-naturally occurring guide nucleic acids each comprising a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 2 or 42.
- determining the variant call of Zeta SARS-CoV-2 variant comprises detecting an S-gene mutation in amino acid position 484 from E to K (E484K).
- determining the variant call of Zeta SARS-CoV-2 variant comprises contacting the amplification product, the target nucleic acid, or a combination thereof, to a composition comprising an effector protein and a non-naturally occurring guide nucleic acid that hybridizes to a segment of the target nucleic acid or amplification product thereof that contains amino acid position 484 of S protein.
- determining the variant call of Zeta SARS-CoV-2 variant comprises using the non-naturally occurring guide nucleic acids each comprising a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 2 or 5. In certain cases, determining the variant call of Zeta SARS-CoV-2 variant comprises using the non-naturally occurring guide nucleic acids each comprising a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 23 or 26.
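The variant-to-mutation associations listed above can be collected into a lookup table. The following is a minimal sketch: the mutation sets are taken from the text, while the names `VARIANT_SIGNATURES` and `candidate_variants` and the exact-match rule are illustrative assumptions rather than the disclosed method.

```python
# Signature S-gene mutations per variant, as listed in the text above.
VARIANT_SIGNATURES = {
    "Alpha": {"N501Y"},
    "Beta": {"E484K", "N501Y"},
    "Gamma": {"E484K", "N501Y"},
    "Delta": {"L452R"},
    "Epsilon": {"L452R"},
    "Kappa": {"L452R"},
    "Omicron": {"E484A"},
    "Zeta": {"E484K"},
}


def candidate_variants(detected_mutations):
    """Return the variants whose listed signature equals the detected mutation set.

    Exact set equality is an assumption made for this sketch.
    """
    detected = set(detected_mutations)
    return sorted(
        name for name, signature in VARIANT_SIGNATURES.items() if signature == detected
    )


print(candidate_variants({"L452R"}))           # ['Delta', 'Epsilon', 'Kappa']
print(candidate_variants({"E484K", "N501Y"}))  # ['Beta', 'Gamma']
print(candidate_variants({"N501Y"}))           # ['Alpha']
```

Because several variants share the same mutations at positions 452, 484, and 501, the lookup returns a list of candidates rather than a single lineage.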
- the data analysis pipeline to determine a variant call of a sample comprises 1) obtaining the maximum fluorescence value and the minimum fluorescence value from a well containing the sample, an effector protein, a reporter nucleic acid, and a non-naturally occurring guide nucleic acid that targets the target nucleic acid from wild-type SARS-CoV-2; 2) dividing the maximum fluorescence value by the corresponding minimum fluorescence value from 1); 3) obtaining the maximum fluorescence value and the minimum fluorescence value from a well with the sample, an effector protein, a reporter nucleic acid, and a non-naturally occurring guide nucleic acid that targets the target nucleic acid from a SARS-CoV-2 variant.
- the non-naturally occurring guide nucleic acids are designed for detecting mutations L452R, E484K, E484Q, E484A, and N501Y.
- the non-naturally occurring guide nucleic acid comprises a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 22, 23, 24, 25, 26, 27, 40, 41, or 42; 4) dividing the maximum fluorescence value by the corresponding minimum fluorescence value from 3).
- the data analysis pipeline to determine a variant calling of a sample comprises transforming the normalized signals described herein (e.g., fluorescence yield).
- the transformation is applying a logarithmic value to normalized signal (e.g., fluorescence yield).
- the transformation is applying a logarithmic value with a base of 2 to normalized signal (e.g., fluorescence yield). Therefore, log2 Fy (WT) and log2 Fy (M) are obtained.
- an allele discrimination plot can be generated with the transformed signal (see FIGS. 4A-4C or FIG. 11D as non-limiting examples).
- when a binary logarithm is applied to the scaled signals and the ratio of the WT and Mutant transformed values is plotted against the average of the WT and Mutant transformed values, a mean average (MA) plot can be generated (see FIGS. 8A-8H or FIGS. 12A-12M as non-limiting examples).
- the data analysis pipeline to determine a variant calling of a sample comprises comparing the transformed signals.
- the sample is determined as WT.
- the data analysis pipeline can be expressed as log2(Fy(WT)) > log2(Fy(M)) → Wild Type.
- the sample is determined as Mutant, which is indicative of a variant.
- the data analysis pipeline can be expressed as log2(Fy(WT)) < log2(Fy(M)) → Mutant.
- the signals of each mutant can be compared to the wild-type signal. If there exists a mutant, then among (n) comparisons for n mutants and one wild-type, one of the comparisons may yield a mutant call.
- the data analysis pipeline can be expressed as log2(Fy(WT)) > log2(Fy(M1)) → Wild Type, log2(Fy(WT)) > log2(Fy(M2)) → Wild Type, log2(Fy(WT)) > log2(Fy(M3)) → Wild Type, log2(Fy(WT)) < log2(Fy(M4)) → Mutant (4).
- the tie breaker analysis pipeline can be expressed as log2(Fy(WT)) > log2(Fy(M1)) → Wild Type, log2(Fy(WT)) < log2(Fy(M2)) → Mutant (2), log2(Fy(WT)) > log2(Fy(M3)) → Wild Type, log2(Fy(WT)) < log2(Fy(M4)) → Mutant (4), log2(Fy(M2)) < log2(Fy(M4)) → Mutant (4).
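The comparison and tie-breaker rules above can be sketched in a few lines. This is an illustrative sketch only: the function names, the dictionary input, and the example fluorescence values are hypothetical. The text does not say how an exact tie between the wild-type and mutant values is handled, so the sketch treats a tie as not mutant.

```python
import math


def fluorescence_yield(f_max: float, f_min: float) -> float:
    """Scale the maximum fluorescence of a well by its minimum fluorescence."""
    return f_max / f_min


def call_sample(wt_yield: float, mutant_yields: dict) -> str:
    """Compare log2(Fy(WT)) with log2(Fy(M)) for each mutant guide.

    If more than one mutant exceeds wild type, the mutant with the largest
    transformed signal wins (the tie breaker described in the text).
    """
    log_wt = math.log2(wt_yield)
    log_mut = {name: math.log2(fy) for name, fy in mutant_yields.items()}
    # Assumption: an exact tie (log_wt == value) is not called mutant.
    above_wt = {name: value for name, value in log_mut.items() if log_wt < value}
    if not above_wt:
        return "Wild Type"
    best = max(above_wt, key=above_wt.get)
    return f"Mutant ({best})"


# Hypothetical yields: M2 and M4 both exceed wild type, and M4 is larger.
wt = fluorescence_yield(3000.0, 1000.0)
print(call_sample(wt, {"M1": 1.2, "M2": 4.0, "M3": 1.1, "M4": 9.0}))  # Mutant (M4)
print(call_sample(wt, {"M1": 1.2, "M2": 1.5, "M3": 1.1, "M4": 2.0}))  # Wild Type
```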
- a no-target control can be processed concurrently with the sample, and the maximum and the minimum fluorescence values can be scaled and transformed as described herein for a sample.
- logarithmic ratio between the scaled signals and the logarithmic mean of the scaled signals can be defined as Contrast and Size.
- when the Contrast and the Size of the sample are within the lower and upper bounds of the NTC that is concurrently processed, the sample is determined as No Call.
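The Contrast, Size, and No Call rules can be sketched as follows. The definitions are assumptions: Contrast is taken as log2(Fy(WT)) minus log2(Fy(M)), which is the logarithm of the ratio of the scaled signals, and Size as the mean of the two log2 values, following the usual MA-plot convention. The function names and the example values are hypothetical, and the NTC bounds are assumed to be the minimum and maximum over the concurrently processed NTC wells.

```python
import math


def contrast_and_size(fy_wt: float, fy_mut: float):
    """Return (Contrast, Size) for one WT/mutant pair of fluorescence yields."""
    log_wt, log_mut = math.log2(fy_wt), math.log2(fy_mut)
    contrast = log_wt - log_mut       # log2 of the WT/mutant ratio (assumed form)
    size = (log_wt + log_mut) / 2.0   # mean of the log2 values (assumed form)
    return contrast, size


def is_no_call(sample, ntc_pairs) -> bool:
    """True when both Contrast and Size fall within the NTC bounds.

    sample is a (Fy(WT), Fy(M)) pair; ntc_pairs is a list of such pairs
    measured in no-target control wells processed with the sample.
    """
    contrast, size = contrast_and_size(*sample)
    ntc = [contrast_and_size(wt, mut) for wt, mut in ntc_pairs]
    contrasts = [c for c, _ in ntc]
    sizes = [s for _, s in ntc]
    return (min(contrasts) <= contrast <= max(contrasts)
            and min(sizes) <= size <= max(sizes))


ntc_wells = [(1.0, 1.1), (1.2, 1.0), (1.1, 1.1)]  # hypothetical NTC yields
print(is_no_call((1.05, 1.1), ntc_wells))  # True: indistinguishable from NTC
print(is_no_call((8.0, 1.0), ntc_wells))   # False: clear wild-type signal
```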
- the quality of the determination of a variant calling can be visualized by the separation of WT and Mutant signals in the allele discrimination plot described herein. In other instances, the quality of the determination of a variant calling can be visualized by the separation of WT and Mutant signals in the MA plot described herein (see FIGS. 8A-8H or FIGS. 12A-12M as non-limiting examples).
- compositions, systems, and methods described herein, in some implementations, are multiplexed in a number of ways. These methods of multiplexing are, for example, consistent with methods and reagents disclosed herein for detection of a target nucleic acid within the sample.
- Multiplexing can be spatial multiplexing, wherein multiple different target nucleic acids are detected at the same time but the reactions are spatially separated. Often, the multiple target nucleic acids are detected using the same programmable nuclease but different guide nucleic acids. The multiple target nucleic acids sometimes are detected using different programmable nucleases. Sometimes, multiplexing can be single reaction multiplexing, wherein multiple different target nucleic acids are detected in a single reaction volume. Often, a single population of programmable nucleases is used in single reaction multiplexing. Sometimes, at least two different programmable nucleases are used in single reaction multiplexing. For example, multiplexing can be enabled by immobilization of multiple categories of detector nucleic acids within a fluidic system, to enable detection of multiple target nucleic acids within a single sample.
- Disclosed herein are kits, reagents, methods, and systems for use in detecting a SARS-CoV-2 variant.
- kits include a package, carrier, or container that is compartmentalized to receive one or more containers such as vials, tubes, and the like, each of the container(s) comprising one of the separate elements to be used in a method described herein.
- Suitable containers include, for example, test wells, bottles, vials, and test tubes.
- the containers are formed from a variety of materials such as glass, plastic, or polymers.
- the kit or systems described herein contain packaging materials. Examples of packaging materials include, but are not limited to, pouches, blister packs, bottles, tubes, bags, containers, and any packaging material suitable for the intended mode of use.
- a kit typically includes labels listing contents and/or instructions for use, and package inserts with instructions for use.
- a set of instructions will also typically be included.
- a label is on or associated with the container.
- a label is on a container when letters, numbers or other characters forming the label are attached, molded or etched into the container itself; a label is associated with a container when it is present within a receptacle or carrier that also holds the container, e.g., as a package insert.
- a label is used to indicate that the contents are to be used for a specific therapeutic application. The label also indicates directions for use of the contents, such as in the methods described herein.
- After packaging the formed product and wrapping or boxing to maintain a sterile barrier, the product may be terminally sterilized by heat sterilization, gas sterilization, gamma irradiation, or by electron beam sterilization. Alternatively, the product may be prepared and packaged by aseptic processing.
- a signal dataset was generated using a wild-type sample containing the wild-type synthetic gene fragments and a mutant sample containing the mutant synthetic gene fragments, in which nucleic acids were extracted from each sample and partitioned into replicates (e.g., three wild-type amplification replicates and three mutant amplification replicates).
- Amplification replicates were amplified by reverse transcriptase loop-mediated isothermal amplification (RT-LAMP), and each amplification replicate was further partitioned into replicate wells in a microtiter plate, for each of the three amino acid positions 452, 484, and 501 (see, for example, the schematic in FIGS. 4A-4B).
- Replicate wells were processed using comparative programmable nuclease-based (DETECTR) reactions, in which amplified synthetic gene fragments were interrogated using a CasDx1 programmable nuclease, one or more reporters, and a guide sequence corresponding to either the wild-type sequence or the mutant sequence of the SARS-CoV-2 S-gene.
- LAMP primer sets each containing 6 primers, were designed to target the L452R, E484K and N501Y mutations in the SARS-CoV-2 Spike (S) protein.
- Multiplexed RT-LAMP was performed using a final reaction volume of 50 µl, which consisted of 8 µl nucleic acid template, 5 µl of L452R primer set, 5 µl of E484K/N501Y primer set, 17 µl of nuclease-free water, 1 µl of SYTO-9 dye, and 14 µl of LAMP mastermix.
- Each of the primer sets consisted of 1.6 µM each of inner primers FIP and BIP, 0.2 µM each of outer primers F3 and B3, and 0.8 µM each of loop primers LF and LB.
- the LAMP mastermix contained 6 mM of MgSO4, isothermal amplification buffer at 1X final concentration, 1.5 mM of dNTP mix, 8 units of Bst 2.0 WarmStart DNA Polymerase, and 0.5 µl of WarmStart RTx Reverse Transcriptase. Plates were incubated at 65°C for 40 minutes in a real-time QuantStudio™ 5 PCR instrument. Fluorescent signals were collected every 30 seconds.
- Each sample was thus interrogated using a wild-type/mutant comparison in replicates of three, for each of the three amino acid positions, and for each of the three amplification replicates. Reporting signals for each well were obtained based on fluorescence intensities generated by cleavage of the reporter by the CasDx1 programmable nuclease upon recognition of the target sequence by the guide sequence.
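The replicate structure described above can be enumerated to see how many wells one sample occupies. The following is a minimal sketch under one reading of the text: three amplification replicates, three amino acid positions, a wild-type and a mutant guide per position, and three replicate wells per comparison. The function name and field names are hypothetical.

```python
from itertools import product

AMPLIFICATION_REPLICATES = (1, 2, 3)
POSITIONS = (452, 484, 501)
GUIDES = ("WT", "MUT")
WELL_REPLICATES = (1, 2, 3)


def detectr_wells(sample: str):
    """List every well needed to interrogate one sample under this layout."""
    return [
        {"sample": sample, "amp_rep": amp, "position": pos,
         "guide": guide, "well_rep": rep}
        for amp, pos, guide, rep in product(
            AMPLIFICATION_REPLICATES, POSITIONS, GUIDES, WELL_REPLICATES
        )
    ]


wells = detectr_wells("wild-type")
print(len(wells))  # 54 = 3 amplification replicates x 3 positions x 2 guides x 3 wells
```

Under this reading, the wild-type and mutant samples together occupy 108 wells before any no-target controls are added.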
- the fluorescence intensities of each well were normalized by scaling the maximum fluorescence values to the minimum fluorescence values for the respective well to generate fluorescence yields. Scaled signals for wells interrogated with the wild-type guide sequence were then compared with scaled signals for wells interrogated with the mutant guide sequence, for each variant amino acid position per sample in the plate. Specifically, after scaling, a logarithmic transformation was applied to the generated fluorescence yields for a respective well, thus obtaining a logarithmic ratio calculated in the form of log2(Fy(FA1)) / log2(Fy(SA1)),
- where Fy(FA1) is the corresponding fluorescence yield for a first well interrogated with the wild-type guide sequence in the comparative assay, and
- Fy(SA1) is the corresponding fluorescence yield for a second well of the same sample type interrogated with the mutant guide sequence in the comparative assay.
- a well was deemed to be wild-type (e.g., contain an aliquot of the wild-type synthetic gene fragments) if the resulting ratio indicated a greater value for log2(Fy(FA1)) than for log2(Fy(SA1)), whereas a well was deemed to be mutant (e.g., contain an aliquot of the mutant synthetic gene fragments) if the resulting ratio indicated a lower value for log2(Fy(FA1)) than for log2(Fy(SA1)).
- comparative assay replicates were designated as no-call if the logarithmic ratio between the WT and MUT fluorescence yields fell within the maximum fluorescence yield and minimum fluorescence yield generated for a plurality of control wells containing no target samples (NTCs). Comparative assay replicates were further designated as no-call if the logarithmic mean of the WT and MUT fluorescence yields fell within the range of the maximum and minimum logarithmic means for NTCs, where logarithmic means were calculated in the form of
- FIGS. 5A, 5B, and 5C illustrate the identification of differentiated phenotypes WT and MUT (“M”) using the systems and methods disclosed herein.
- FIG. 5A illustrates allele discrimination plots that show separation of the WT and MUT (“M”) signals, which are concordant with the genotype of the originating samples (WT, M, and NTC).
- FIG. 5B illustrates further separation of the populations after application of a binary logarithmic value to each scaled signal in each well. These transformed values provide clear and improved differentiation for making mutation calls when the ratio of the WT and MUT transformed values are plotted against the average of the WT and MUT transformed values on a Mean Average (MA) plot.
- gRNAs Guide RNAs
- CasDx1 and LbCas12a were initially screened for activity on synthetic gene fragments encoding regions of the SARS-CoV-2 S-gene with either wild-type (WT, SEQ ID NO: 20) or mutant (MUT, SEQ ID NO: 21) sequences at amino acid positions 452, 484, and 501.
- CasDx1, LbCas12a, and AsCas12a were further evaluated with their cognate gRNAs on synthetic gene fragments with respect to SNP differentiation capabilities.
- S-gene fragment for SARS-CoV-2, forward strand (SEQ ID NO: 118):
- SEQ ID: 118 ggugguaauuauaauuaccuguauagauuguuuuaggaagucuaaucucaaaccuuuugagagagauauuucaacugaaa ucuaucaggccgguagcacaccuuguaaugguguugaagguuuuuaauuguuacuuuccuuuacaaucauaugguuucc aacccacuaaugguguugguuaccaacca
- S-gene fragment for SARS-CoV-2, reverse strand (SEQ ID NO: 119):
- SEQ ID: 119 ccaccauuuaauauuaauggacauaucuaacaauccuucagauuagaguuuggaaaacucucucuauaaaguugacuuua gauaguccggccaucguguggaacauuaccacaacuuccaaaauuaacaaugaaaggaaauguuaguauaccaaagguug ggugauuaccacaaccaaugguuggu
- SEQ ID: 120 ccGguauagauuguuuagga
- SEQ ID: 121 aauggacauaucuaacaaau
- SEQ ID: 122 acauuaccacaaUuuccaaa
- SEQ ID: 123 acauuaccacaacuuccaaa
- SEQ ID: 124 aacccacuUaugguguuggu
- Wild-type S-gene fragment subsequence spanning amino acid position 501 (SEQ ID NO: 125):
- SEQ ID: 125 caacccacuaaugguguugg
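For subsequences of equal length that are listed in the same register, the differing base can be located with a position-by-position comparison. The following is a minimal sketch using the sequences listed above as SEQ ID: 122 and SEQ ID: 123. The function name `mismatches` is hypothetical, and reading the uppercase letter as the marker of the variant base is an assumption.

```python
def mismatches(seq_a: str, seq_b: str):
    """Return (1-based position, base in seq_a, base in seq_b) for each difference."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length for an ungapped comparison")
    return [
        (index + 1, a, b)
        for index, (a, b) in enumerate(zip(seq_a.lower(), seq_b.lower()))
        if a != b
    ]


seq_122 = "acauuaccacaaUuuccaaa"  # as listed for SEQ ID: 122
seq_123 = "acauuaccacaacuuccaaa"  # as listed for SEQ ID: 123
print(mismatches(seq_122, seq_123))  # [(13, 'u', 'c')]
```

The comparison is case-insensitive, so the single reported difference reflects a true base change rather than the capitalization used in the listing.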
- FIG. 7 illustrates the SARS-CoV-2 CRISPR-CasDx1 based DETECTR® workflow.
- the sample (e.g., an RNA extraction) was used as an input to DETECTR, which was visualized by a fluorescent reader.
- RNA encoding SARS-CoV-2 S-gene was amplified using an isothermal amplification method such as RT-LAMP.
- Amplified samples were detected using a Cas12 programmable nuclease complexed with gRNAs directed to SARS-CoV-2 S-gene sequence.
- the Cas12 programmable nuclease cleaved an ssDNA reporter nucleic acid upon complex formation with the target nucleic acid.
- the results of the DETECTR assay were then compared with Whole Genome Sequencing (WGS) results.
- Wild-type and mutant synthetic gene fragments were PCR amplified using NEB 2x Phusion Master Mix following the manufacturer’s protocol. The amplified product was cleaned using AMPure XP beads at a 0.7X concentration following the manufacturer’s protocol. The product was eluted in nuclease-free water and normalized to 10 nM.
- the LAMP primers used herein are shown in Table 3A and were synthesized by Eurofins Genomics.
- the guide RNAs used herein are shown in Tables 4A and were synthesized by Dharmacon or Synthego.
- the reporter used herein is shown in Table 5 and was synthesized by IDT.
- the synthetic gene fragments SEQ ID Nos. 20 and 21 were synthesized by Twist Biosciences.
- NP/OP SARS-CoV-2 RT-PCR positive nasopharyngeal and/or oropharyngeal
- UDM universal transport media
- VTM viral transport media
- NP/OP swab samples obtained from the UCSF Clinical Microbiology Laboratory were pretreated with DNA/RNA Shield (Zymo Research, # R1100-250) at a 1:1 ratio.
- the Mag-Bind Viral DNA/RNA 96 kit (Omega Bio-Tek, # M6246-03) on the KingFisher Flex (Thermo Fisher Scientific, # 5400630) was used for viral RNA extraction using an input volume of 200 µl of diluted NP/OP swab sample and an elution volume of 100 µl.
- the TaqPath™ COVID-19 RT-PCR kit was used to determine the N gene cycle threshold values.
- VBM SARS-CoV-2 variants being monitored
- VOC variants of concern
- VOI variants of interest
- RNA from heat-inactivated SARS-CoV-2 VBM/VOC/VOI isolates was extracted using the EZ1 Virus Mini Kit v2.0 (Qiagen, # 955134) on the EZ1 Advanced XL (Qiagen, # 9001875) according to the manufacturer’s instructions. For each culture, six replicate LAMP reactions were pooled into a single sample. DETECTR® was performed on a 1:10 dilution of the 10,000 cp/rxn LAMP amplification products.
- the COVID-19 Variant DETECTR® assay was performed with RT-LAMP followed by a CRISPR-Cas-based assay.
- LAMP primer sets, each containing 6 primers, were designed to target the L452R, E484K and N501Y mutations in the SARS-CoV-2 Spike (S) protein (see Table 3A).
- Sets of LAMP primers were designed from a 350 bp target sequence spanning the 3 mutations using Primer Explorer V5 (available on the Internet at primerexplorer.jp/e/).
- Candidate primers were manually evaluated for inclusion using the OligoCalc online oligonucleotide properties calculator (see W. A. Kibbe, Nucleic Acids Res 35, W43-46 (2007)) while ensuring that there was no overlap with either primers from the other set or guide RNA target regions that included the L452R, E484K, and N501Y mutations.
- Multiplexed RT-LAMP was performed using a final reaction volume of 50 µl, which consisted of 8 µl RNA template, 5 µl of L452R primer set (Eurofins Genomics), 5 µl of E484K/N501Y primer set, 17 µl of nuclease-free water, 1 µl of SYTO-9 dye (ThermoFisher Scientific), and 14 µl of LAMP mastermix.
- Each of the primer sets consisted of 1.6 µM each of inner primers FIP and BIP, 0.2 µM each of outer primers F3 and B3, and 0.8 µM each of loop primers LF and LB.
- the LAMP mastermix contained 6 mM of MgSO4, isothermal amplification buffer at 1X final concentration, 1.5 mM of dNTP mix (NEB), 8 units of Bst 2.0 WarmStart DNA Polymerase (NEB), and 0.5 µl of WarmStart RTx Reverse Transcriptase (NEB). Plates were incubated at 65°C for 40 minutes in a real-time QuantStudio™ 5 PCR instrument. Fluorescent signals were collected every 60 seconds.
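The component volumes listed for the multiplexed RT-LAMP reaction can be checked against the stated 50 µl final volume. The following is a minimal sketch only; the function names and the `overage` parameter (extra volume to cover pipetting loss) are illustrative assumptions and are not part of the described protocol.

```python
# Component volumes in microliters, taken from the multiplexed RT-LAMP description.
RT_LAMP_MIX_UL = {
    "RNA template": 8.0,
    "L452R primer set": 5.0,
    "E484K/N501Y primer set": 5.0,
    "nuclease-free water": 17.0,
    "SYTO-9 dye": 1.0,
    "LAMP mastermix": 14.0,
}


def total_volume_ul(mix):
    """Sum the per-reaction component volumes."""
    return sum(mix.values())


def batch_volumes_ul(mix, n_reactions, overage=0.10):
    """Scale the shared (non-template) components for a batch of reactions.

    `overage` is a hypothetical allowance for pipetting loss; the source
    does not specify one.
    """
    factor = n_reactions * (1.0 + overage)
    return {
        name: volume * factor
        for name, volume in mix.items()
        if name != "RNA template"  # template is added per well, not batched
    }


print(total_volume_ul(RT_LAMP_MIX_UL))  # 50.0
```

Excluding the RNA template from the batch mix reflects the common practice of adding template to each well individually; this too is an assumption rather than a stated step.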
- CasDx1 (Mammoth Biosciences), LbCas12a (EnGen® Lba Cas12a, NEB) or AsCas12a (Alt-R® A.s. Cas12a, IDT) protein targeting the WT or MUT SNP at L452R, E484K, or N501Y was incubated with 40 nM gRNA in 1X buffer (MBuffer3 for CasDx1, NEBuffer r2.1 for LbCas12a and AsCas12a) for 30 min at 37°C.
- 1X buffer MBuffer3 for CasDx1, NEBuffer r2.1 for LbCas12a and AsCas12a
- gRNAs with an extra sequence of UAAUUUCUACUAAGUGUAGAU (SEQ ID NO: 126) on the 5’ end were used with both CasDx1 and LbCas12a, whereas gRNAs with an extra sequence of UAAUUUCUACUCUUGUAGAU (SEQ ID NO: 127) on the 5’ end were used with AsCas12a (see Table 4A).
- 100 nM ssDNA reporter (SEQ ID NO: 7) was added to the RNA-protein complex. 18 µL of this DETECTR® master mix was combined with 2 µL target amplicon. The DETECTR® assays were monitored for 30 min at 37°C in a plate reader (Tecan).
- cDNA Complementary DNA
- SARS-CoV-2 primers version 3 according to the ARTIC protocol (see Plitnick, J. et al., J Clin Microbiol 59, e0064921 (2021) and Quick, J. et al., Nat Protoc 12, 1261-1276 (2017)).
- SARS-CoV-2 viral genome assembly and variant analyses were performed using an in-house bioinformatics pipeline. Briefly, sequencing reads generated by Illumina sequencers (NextSeq 550 or NovaSeq 6000) were demultiplexed and converted to FASTQ files using bcl2fastq (v2.20.0.422). Raw FASTQ files were first screened for SARS-CoV-2 sequences using BLASTn (BLAST+ package 2.9.0) alignment against the Wuhan-Hu-1 SARS-CoV-2 viral reference genome (NC_045512).
- each well had a guide specific to the mutant or the wild-type SNP.
- the comparison was important to assign a genotypic call to the sample.
- the DETECTR® reactions across the plate were not comparable to each other.
- the endpoint fluorescence intensities were normalized in each well to its own minimum intensity, which is defined as fluorescence yield.
- the fluorescence yield can be compared across wells in a plate under the assumption that each well will have a similar minimum fluorescence starting point. Irrespective of the highest levels of the fluorescence intensities observed across samples, without being limited to any one theory of operation, the yield for a given target will generally remain the same assuming that similar concentrations of samples/target are being compared. This aids in normalizing the signal and comparing replicates across the wells in the same plate.
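The normalization described above, in which each well's endpoint fluorescence is divided by that well's own minimum intensity, can be sketched as follows. The function name and the example traces are illustrative assumptions; only the endpoint-over-minimum ratio comes from the description.

```python
def fluorescence_yield(trace):
    """Return endpoint fluorescence divided by the well's own minimum.

    `trace` is a sequence of raw fluorescence readings for one well,
    ordered in time. The last reading is treated as the endpoint.
    """
    if not trace:
        raise ValueError("trace must contain at least one reading")
    baseline = min(trace)
    if baseline <= 0:
        raise ValueError("minimum fluorescence must be positive")
    return trace[-1] / baseline


# Hypothetical traces: a reacting well and a flat no-template control.
reacting_well = [100.0, 120.0, 250.0, 400.0]
flat_control = [100.0, 101.0, 100.0, 100.0]

print(fluorescence_yield(reacting_well))  # 4.0
print(fluorescence_yield(flat_control))   # 1.0
```

Because each well is divided by its own baseline, wells with different absolute starting intensities become comparable, and a well with no change over time yields a value close to 1.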
- Variant calling was performed using the following general rules: 1. No template controls were assigned NTC; 2. If the contrast of the sample for a SNP was between minimum and maximum contrast for the plate, then the sample was assigned a NoCall; and 3. If the size of the sample was lower than the size of the NTC on the plate, then the sample was assigned a NoCall.
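The three general rules above can be sketched as a small decision function. This is a hedged illustration only: the description does not define "contrast" or "size" in this passage, so the sketch assumes that contrast is a signed WT-versus-MUT measure (larger values favoring WT) and that size is an overall signal magnitude. The function name, argument names, the strict inequality for "between", and the final WT/MUT assignment are all assumptions.

```python
def call_variant(is_ntc, contrast, size, contrast_min, contrast_max, ntc_size):
    """Apply the three general rules, then fall back to a WT/MUT assignment."""
    # Rule 1: no-template controls are labeled NTC.
    if is_ntc:
        return "NTC"
    # Rule 3: a sample with less signal than the plate's NTC is not callable.
    if size < ntc_size:
        return "NoCall"
    # Rule 2: contrast inside the plate's ambiguous band is not callable.
    if contrast_min < contrast < contrast_max:
        return "NoCall"
    # Assumption (not stated in the source): contrast at or above the band
    # indicates wild-type, and contrast at or below the band indicates mutant.
    return "WT" if contrast >= contrast_max else "MUT"


print(call_variant(False, 3.0, 5.0, -1.0, 1.0, 2.0))   # WT
print(call_variant(False, 0.2, 5.0, -1.0, 1.0, 2.0))   # NoCall
```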
- gRNAs Guide RNAs
- WT wild-type
- MUT mutant sequences at amino acid positions 452, 484, and 501
- Further evaluation of CasDx1, LbCas12a and AsCas12a with their cognate gRNAs on synthetic gene fragments is illustrated in FIGS. 6B-E.
- In FIGS. 6C-E, each figure corresponds to one of the three effector proteins.
- the first column of each figure shows the fluorescent read out from the CRISPR assay performed on a wild-type fragment, with a wild-type guide versus a mutant guide.
- the second column of each figure shows the fluorescent read out from the CRISPR assay performed on a mutant fragment, with a wild-type guide versus a mutant guide.
- the third column of each figure shows the fluorescent read out from the CRISPR assay performed on a control, with a wild-type guide versus a mutant guide.
- CasDx1 showed clear SNP differentiation between wild-type (WT) and mutant (MUT) sequences on all three S-gene variants (see FIG. 6C).
- LbCas12a was capable of differentiating SNPs at positions 452 and 484, and AsCas12a was able to differentiate the SNP at position 452 (see FIGS. 6D-6E).
- a redundant LAMP design was adopted for two reasons: first, this approach was shown to improve detection sensitivity in initial experiments; second, the goal was to increase assay robustness given the continual emergence of escape mutations in the spike RBD throughout the course of the pandemic (see Harvey et al., Nat Rev Microbiol 19, 409-424 (2021)).
- the tested viral cultures included an ancestral SARS-CoV-2 lineage (WA-1) containing the wild-type spike protein (D614) targeted by the approved mRNA vaccines (BNT162b2 from Pfizer or mRNA-1273 from Moderna) (see K. S. Corbett et al., Nature 586, 567-571 (2020); F. P.
- VBMs variants being monitored
- VOCs variants of concern
- VOIs variants of interest
- Alpha B.1.1.7
- Beta B.1.351
- Gamma P.1
- Epsilon B.1.427 and B.1.429
- Kappa B.1.617.1
- Zeta P.2 lineages
- Heat-inactivated viral culture samples representing the seven SARS-CoV-2 lineages were quantified by digital droplet PCR across a 4-log dynamic range and used to evaluate the analytical sensitivity of the pre-amplification step.
- RT-LAMP amplification was evaluated using six replicates from each viral culture. Consistent amplification was observed for all seven SARS-CoV-2 lineages with 10,000 copies of target input per reaction (200,000 copies/mL) (see FIG. 6G), which is comparable to the target input of more than 200,000 copies/mL viruses (less than 30 Ct value) required for sequencing workflows used in SARS-CoV-2 variant surveillance (see e.g., F.
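The equivalence between 10,000 copies per reaction and 200,000 copies/mL is consistent with dividing the target input by a 50 µl reaction volume. The sketch below shows that arithmetic; treating the 50 µl final RT-LAMP volume as the denominator is an assumption, since the passage does not state which volume was used, and the function name is illustrative.

```python
def copies_per_ml(copies_per_reaction, reaction_volume_ul):
    """Convert copies per reaction to copies per milliliter."""
    if reaction_volume_ul <= 0:
        raise ValueError("reaction volume must be positive")
    copies_per_ul = copies_per_reaction / reaction_volume_ul
    return copies_per_ul * 1000.0  # 1 mL = 1000 microliters


# Assumed denominator: the 50 microliter final RT-LAMP reaction volume.
print(copies_per_ml(10_000, 50.0))  # 200000.0
```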
- FIG. 6B shows a representative heatmap of the expected pattern of wild-type (WT) and mutational (MUT) calls for each of the SNPs resulting in L452R, E484K and N501Y. Specifically, for each combination of effector protein and either wild-type or mutant guide, FIG.
- FIG. 6H shows the heatmap results for each reaction carried out in this assay, including for each combination of effector proteins, variant, and corresponding mutant or wild type guide.
- the results of the fluorescent assay are also shown in the fluorescent read-out curves of FIGS. 6I-6K plotting raw fluorescence over time for the wild-type SARS-CoV-2 (Column 1), each variant (Columns 2-7) and controls (Columns 8-12).
- CasDx1 correctly identified the wild-type (WT) and mutational (MUT) targets at positions 452, 484 and 501 in each LAMP-amplified, heat-inactivated viral culture (FIG. 6H and FIGS. 6I-6K).
- LbCas12a was capable of differentiating WT from MUT at position 501 on LAMP-amplified viral cultures but showed much higher background for the WT target at position 452 and higher background for both WT and MUT targets at position 484 (FIG. 6H and FIGS. 6I-6K). Additionally, AsCas12a was able to differentiate WT from MUT targets at position 452, albeit with substantial background, but was unable to differentiate WT from MUT targets at positions 484 and 501 (FIG. 6H and FIGS. 6I-6K). From these data, it was concluded that CasDx1 provided more consistent and accurate calls for the L452R, E484K and N501Y mutations.
- one sample (COVID-31) was designated a ‘no call’ at position 452 by viral WGS and thus lacked a comparator, two samples were designated a ‘no call’ due to flat WT and MUT curves (COVID-41 and COVID-73), four samples had similar WT and MUT curve amplitudes, suggesting a mixed population (COVID-03, COVID-56, COVID-61 and COVID-81) (see FIGS. 13A-13D), and four samples had SNP assignments discordant with those from viral WGS (COVID-12, COVID-13, COVID-20 and COVID-63) (see FIGS. 13A-13D).
- the positive predictive agreement (PPA) between the DETECTR® assay and viral WGS at all three WT and MUT SNP positions was 100% (272 of 272, p < 2.2e-16 by Fisher’s Exact Test) (see Table 7).
- the corresponding negative predictive agreement (NPA) was 91.4% as the E484Q mutation for two SNPs was incorrectly classified as WT. Nevertheless, the final viral lineage classification for the 91 samples after discrepancy testing showed 100% agreement with viral WGS (see FIGS.
- a CRISPR-based DETECTR® assay was developed for the detection of SARS-CoV-2 variants.
- Three CRISPR-Cas12 enzymes were evaluated. Based on a head-to-head comparison of these enzymes, clear differences in performance were observed, with CasDx1 demonstrating the highest fidelity and reliably detecting all three of the targeted SNPs.
- a data analysis pipeline was developed to differentiate between WT and MUT signals with the COVID-19 Variant DETECTR® assay, yielding an overall SNP concordance of 100% (272/272 total SNP calls) and 100% agreement with lineage classification compared to viral WGS.
- Although CRISPR-based diagnostic assays have been demonstrated for the detection of SARS-CoV-2 variants, these studies have limitations in coverage of circulating lineages and in the extent of clinical sample evaluation.
- the miSHERLOCK variant assay uses LbCas12a (NEB) to detect N501Y, E484K and Y144Del covering eight lineages (WA-1, Alpha, Beta, Gamma, Eta, Iota, Mu and Zeta) and was tested only on contrived samples (RNA spiked into human saliva) (see Puig et al., Sci Adv 7, (2021)).
- the SHINEv2 assay uses LwaCas13a to detect 69/70Del, K417N/T, L452R and 156/157Del + R158G covering eight lineages (WA-1, Alpha, Beta, Gamma, Delta, Epsilon, Kappa and Mu) and was tested with only the 69/70Del gRNAs on 20 Alpha-positive NP clinical samples (see Arizti-Sanz et al., medRxiv, (2021)).
- the COVID-19 Variant DETECTR® assay uses CasDx1 to detect N501Y, E484K and L452R covering eleven lineages (WA-1, Alpha, Beta, Gamma, Delta, Epsilon, Eta, Iota, Kappa, Mu and Zeta); 91 clinical samples representing seven of the eleven lineages were tested, with successful detection of all seven.
- the COVID-19 Variant DETECTR® assay described herein can serve as an initial screen for the presence of a rare or novel variant (e.g., carrying both L452R and E484K or carrying all three SNPs) that could be reflexed to viral WGS.
- a rare or novel variant e.g., carrying both L452R and E484K or carrying all three SNPs
- the COVID-19 Variant DETECTR® assay would thus enable rapid identification of variants circulating in the community to support outbreak investigation and public health containment efforts.
- identification of specific mutations associated with neutralizing antibody evasion, such as E484K, could inform patient care with regard to the use of monoclonal antibodies that remain effective in treating the infection.
- the COVID-19 Variant DETECTR® assay can be readily reconfigured by validating new pre-amplification LAMP primers and gRNAs that target emerging mutations with clinical and epidemiological significance.
- the newly emerging Omicron variant, containing at least 30 mutations in the spike protein and 11 mutations in the spike RBD region targeted by the assay, could be detected by increasing degeneracy in the LAMP primers and adding at least one gRNA to be able to distinguish this variant from the others.
- a validated CRISPR assay that combines SARS-CoV-2 detection with variant identification would be useful as a tool for simultaneous COVID-19 diagnosis in individual patients and surveillance for infection control and public health purposes.
- Example 3 SARS-CoV-2 SNP Calling: L452R, E484K, and N501Y
- the disclosure provides methods for determining L452R, E484K, and N501Y versus wild-type calls within the spike gene of SARS-CoV-2 using an interpretive algorithm for SNP calling in conjunction with a CRISPR-Cas12 based assay.
- the spike variants L452R, E484K, N501Y are described in Zhang et al., 2021, “Ten emerging SARS-CoV-2 spike variants exhibit variable infectivity, animal tropism, and antibody neutralization,” Commun Biol 4, 1196, which is hereby incorporated by reference.
- a signal dataset was obtained that comprised, for each respective well in a plurality of wells in a common plate, a corresponding plurality of reporting fluorescent signals for a respective one or more nucleic acid molecules in a biological sample that map to position 452, 484 or 501 of the spike protein arising from RT-LAMP amplification of the CRISPR-Cas12 based DETECTR® assay.
- the plurality of wells comprised a first set of nine wells representing the wild type allele for position 452 of the spike protein. Each well in the first set of nine wells included a first plurality of guide nucleic acids that have the wild type allele for position 452 of the SARS-CoV-2 spike protein. The plurality of wells further comprised a second set of nine wells representing the L452R allele for the SARS-CoV-2 spike protein. Each well in the second set of nine wells included a second plurality of guide nucleic acids that have the L452R allele for the SARS-CoV-2 spike protein.
- Each respective well in the first set of wells and each respective well in the second set of wells contained a corresponding aliquot of nucleic acid derived from a biological sample for which the determination of a L452R versus a wild-type call within the spike gene of SARS-CoV-2 was sought, as well as a Cas12 protein with trans-cutting activity.
- Each corresponding plurality of reporting fluorescent signals in the signal dataset comprised, for each respective time point in a set of 30 time points, a respective fluorescent reporting signal in the form of a corresponding discrete attribute value arising from RT-LAMP amplification.
- Each respective time point in the corresponding plurality of reporting signals represented a different 60 second time interval within a 30 minute monitoring time period of the RT-LAMP amplification.
- the signal yield for each well was the maximum signal observed across the 30 time points divided by the minimum signal observed across the 30 time points. This allowed signal yields from individual wells to be compared to each other.
- the relative intensity metric has the form: $\log_2(F_y(\mathrm{WT})) / \log_2(F_y(\mathrm{M}))$,
- $F_y(\mathrm{WT})$ was the first corresponding signal yield representing the subject wild type allele
- $F_y(\mathrm{M})$ was the second corresponding signal yield representing the subject mutant allele of the SARS-CoV-2 spike protein.
- WT was the wild type allele for position 452 of the SARS-CoV-2 spike protein
- M was the L452R allele for the SARS-CoV-2 spike protein.
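The signal yield and the relative intensity metric defined above can be sketched together. This is an illustrative sketch under stated assumptions: the function names are hypothetical, the handling of a flat mutant well (whose log2 yield is zero) is an assumption not addressed in the description, and the thresholds that map the metric to a candidate call identity are not shown because they are not given in this passage.

```python
import math


def signal_yield(signals):
    """Maximum signal divided by minimum signal across the time points."""
    lowest = min(signals)
    if lowest <= 0:
        raise ValueError("signals must be positive")
    return max(signals) / lowest


def relative_intensity(wt_signals, mut_signals):
    """Compute log2(Fy(WT)) / log2(Fy(M)) for a WT-guide and a mutant-guide well."""
    log_wt = math.log2(signal_yield(wt_signals))
    log_mut = math.log2(signal_yield(mut_signals))
    if log_mut == 0.0:
        # Assumption: a flat mutant well (yield of 1) gives an unbounded ratio
        # when the WT well reacted, and an undefined ratio when neither did.
        return math.inf if log_wt > 0.0 else math.nan
    return log_wt / log_mut


# Hypothetical two-point traces: WT-guide yield 8, mutant-guide yield 2.
print(relative_intensity([100.0, 800.0], [100.0, 200.0]))  # 3.0
```

Taking logarithms of the yields places a non-reacting well (yield close to 1) near zero, so the ratio grows when the wild-type guide reacts much more strongly than the mutant guide.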
- the first set of nine wells was binned into three bins, each bin consisting of three of the wells from the first set of nine wells. For each respective bin in the set of three bins, a respective first concordance vote across the candidate call identities for wells in the respective bin was made, thereby generating three bin votes, one for each bin.
- the respective first concordance vote generated a respective bin vote based on a common candidate call identity that was shared by all of the wells in the subset of wells for the respective bin.
- a respective bin vote was no-call when at least one well in the three wells in the respective bin had a candidate call identity of no-call.
- a second concordance vote was made across the three bin votes for the first set of nine wells, thereby obtaining the mutation call that was one of L452R or wild type for position 452 of the SARS-CoV-2 spike protein for the biological sample.
- the second concordance vote generated this mutation call based on a common bin vote that was shared by at least two of the three bins.
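The two-stage voting procedure described above (nine wells binned into three bins of three, a unanimous vote within each bin, then agreement of at least two of three bins) can be sketched as follows. The function names, the assignment of consecutive wells to a bin, the treatment of a bin whose wells disagree, and the result returned when fewer than two bins agree are assumptions made for illustration.

```python
from collections import Counter

NO_CALL = "no-call"


def bin_vote(calls):
    """First concordance vote: the call shared by every well in the bin."""
    if NO_CALL in calls:
        return NO_CALL  # any no-call well makes the whole bin a no-call
    if len(set(calls)) == 1:
        return calls[0]
    return NO_CALL  # assumption: wells that disagree also yield a no-call


def mutation_call(well_calls, bin_size=3, min_bins=2):
    """Second concordance vote across the bin votes."""
    # Assumption: consecutive wells are grouped into the same bin.
    bins = [well_calls[i:i + bin_size] for i in range(0, len(well_calls), bin_size)]
    votes = [bin_vote(b) for b in bins]
    winner, count = Counter(votes).most_common(1)[0]
    if winner != NO_CALL and count >= min_bins:
        return winner
    return NO_CALL  # assumption: no call when fewer than min_bins bins agree


wells = ["L452R"] * 6 + ["L452R", NO_CALL, "L452R"]
print(mutation_call(wells))  # L452R
```

In the usage example, the third bin contains a no-call well and therefore votes no-call, but the first two bins agree, so the sample is still called L452R.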
- Example 4 SARS-CoV-2 Omicron Variant Detection with High-Fidelity CRISPR-Cas12 Enzyme
- CasDx1 (Mammoth Biosciences) was evaluated for activity on SARS-CoV-2 Omicron variants using methods substantially similar to those described in Examples 2 and 3.
- gRNAs Guide RNAs
- Guide RNAs (gRNAs) with CasDx1 were screened for activity on synthetic gene fragments encoding regions of the SARS-CoV-2 S-gene with wild-type (WT, SEQ ID NO: 20), E484K mutant (K484, SEQ ID NO: 21), E484A mutant (A484, SEQ ID NO: 43), or E484Q mutant (Q484, SEQ ID NO: 44) sequences at amino acid positions 452, 484, and 501.
- WT wild-type
- E484K mutant K484, SEQ ID NO: 21
- E484A mutant A484, SEQ ID NO: 43
- E484Q mutant Q484, SEQ ID NO: 44
- Wild type and mutant synthetic gene fragments were PCR amplified using NEB 2x Phusion Master Mix following the manufacturer’s protocol. The amplified product was cleaned using AMPure XP beads following the manufacturer’s protocol at a 0.7X concentration. The product was eluted in nuclease-free water and normalized to 10 nM.
- the LAMP primers used herein are shown in Tables 3A and 3B and were synthesized by Eurofins Genomics.
- the guide RNAs used herein are shown in Tables 4A and 4B and were synthesized by Dharmacon or Synthego.
- the reporter used herein is shown in Table 5 and was synthesized by IDT.
- the synthetic gene fragments SEQ ID Nos. 20, 21, 43, and 44 were synthesized by Twist Biosciences.
- NP/OP SARS-CoV-2 RT-PCR positive nasopharyngeal and/or oropharyngeal
- UDM universal transport media
- VTM viral transport media
- NP/OP swab samples obtained from the UCSF Clinical Microbiology Laboratory were pretreated with DNA/RNA Shield (Zymo Research, # R1100-250) at a 1 : 1 ratio.
- the Mag-Bind Viral DNA/RNA 96 kit (Omega Bio-Tek, # M6246-03) on the KingFisher Flex (Thermo Fisher Scientific, # 5400630) was used for viral RNA extraction using an input volume of 200 µl of diluted NP/OP swab sample and an elution volume of 100 µl.
- the TaqPath™ COVID-19 RT-PCR kit was used to determine the N gene cycle threshold values.
- VBM SARS-CoV-2 variants being monitored
- VOC variants of concern
- VOI variants of interest
- RNA from heat-inactivated SARS-CoV-2 VBM/VOC/VOI isolates was extracted using the EZ1 Virus Mini Kit v2.0 (Qiagen, # 955134) on the EZ1 Advanced XL (Qiagen, # 9001875) according to the manufacturer’s instructions. For each culture, six replicate LAMP reactions were pooled into a single sample. DETECTR® was performed on a 1:10 dilution of the 10,000 cp/rxn LAMP amplification products.
- the COVID-19 Variant DETECTR® assay was performed with RT-LAMP followed by a CRISPR-Cas-based assay.
- LAMP primer sets, each containing 6 primers, were designed to target the L452R, E484K and N501Y mutations in the SARS-CoV-2 Spike (S) protein (see Table 3A).
- Sets of LAMP primers were designed from a 350 bp target sequence spanning the 3 mutations using Primer Explorer V5 (available on the Internet at primerexplorer.jp/e/).
- Candidate primers were manually evaluated for inclusion using the OligoCalc online oligonucleotide properties calculator (see W. A.
- RT-LAMP was performed using a final reaction volume of 50 µl, which consisted of 8 µl RNA template, 5 µl of L452R primer set (Eurofins Genomics), 5 µl of E484K/N501Y primer set, 17 µl of nuclease-free water, 1 µl of SYTO-9 dye (ThermoFisher Scientific), and 14 µl of LAMP mastermix.
- Each of the primer sets consisted of 1.6 µM each of inner primers FIP and BIP, 0.2 µM each of outer primers F3 and B3, and 0.8 µM each of loop primers LF and LB.
- the LAMP mastermix contained 6 mM of MgSO4, isothermal amplification buffer at 1X final concentration, 1.5 mM of dNTP mix (NEB), 8 units of Bst 2.0 WarmStart DNA Polymerase (NEB), and 0.5 µl of WarmStart RTx Reverse Transcriptase (NEB). Plates were incubated at 65°C for 40 minutes in a real-time QuantStudio™ 5 PCR instrument. Fluorescent signals were collected every 60 seconds.
- Degenerate multiplexed RT-LAMP was performed using a final reaction volume of 65 µl, which consisted of 9.6 µl RNA template, 10 µl of L452R degenerate primer set (Eurofins Genomics), 10 µl of E484(K/Q/A)/N501Y degenerate primer set, 14.1 µl of nuclease-free water, 1.3 µl of SYTO-9 dye (ThermoFisher Scientific), and 20 µl of LAMP mastermix.
- Two degenerate LAMP primer sets, each containing 6 primers modified from the original LAMP primer sets (see Table 3A) with degenerate nucleotides, were designed to capture the L452R, E484K, E484Q, E484A, and N501Y mutations in the SARS-CoV-2 Spike (S) protein (see Table 3B).
- S SARS-CoV-2 Spike
- SARS-CoV-2 viral genome assembly and variant analyses were performed using an in-house bioinformatics pipeline. Briefly, sequencing reads generated by Illumina sequencers (MiSeq or NextSeq 550) were demultiplexed and converted to FASTQ files using bcl2fastq (v2.20.0.422). Raw FASTQ files were first screened for SARS-CoV-2 sequences using BLASTn (BLAST+ package 2.9.0) alignment against the Wuhan-Hu-1 SARS-CoV-2 viral reference genome (NC_045512).
- each well had a guide specific to the mutant or the wild-type SNP.
- the comparison was important to assign a genotypic call to the sample.
- the DETECTR® reactions across the plate were not comparable to each other.
- the endpoint fluorescence intensities were normalized in each well to its own minimum intensity, which was defined as fluorescence yield.
- the fluorescence yield was compared across wells in a plate under the assumption that each well would have a similar minimum fluorescence starting point. Irrespective of the highest levels of the fluorescence intensities observed across samples, without being limited to any one theory of operation, the yield for a given target will generally remain the same assuming that similar concentrations of samples/target are being compared. This aided in normalizing the signal and comparing replicates across the wells in the same plate.
- the wild-type and mutant target guides on NTC generally do not show any change in intensity over time.
- the fluorescence yield for NTC remains constant across replicates and plates, and moreover is close to 1.
- Variant calling was performed using the following general rules: 1. No template controls were assigned NTC; 2. If the contrast of the sample for a SNP was between minimum and maximum contrast for the plate, then the sample was assigned a NoCall; and 3. If the size of the sample was lower than the size of the NTC on the plate, then the sample was assigned a NoCall.
- the 2x2 cross tables classify all three SNPs (at positions 452, 484, 501) across all the samples between sequencing and DETECTR® technologies (see Table 7 and FIGS. 14A-14D).
- the data transformation and statistical analysis were done in R (see R Core Team. R: A Language and Environment for Statistical Computing, (Vienna, Austria, 2018), available on the Internet at R-project.org, date accessed: 11/26/21).
- the TaqPath PCR assay with S-gene Target Failure (SGTF) has functioned as a screen that can be reflexed to sequencing to identify the Omicron variant.
- the SGTF assay alone cannot differentiate between Omicron BA.1 and Alpha and cannot identify emerging variants that lack the SGTF, such as the Omicron BA.2 sublineage.
- the COVID-19 variant DETECTR® assay described in Example 2 was reconfigured for the identification of Omicron by targeting the E484A mutation, which alone differentiates Omicron from all other current VBM/VOI/VOC.
- gRNAs Guide RNAs
- Guide RNAs with CasDx1 were screened for activity on synthetic gene fragments encoding regions of the SARS-CoV-2 S-gene with wild-type (WT, SEQ ID NO: 20), E484K mutant (K484, SEQ ID NO: 21), E484A mutant (A484, SEQ ID NO: 43), or E484Q mutant (Q484, SEQ ID NO: 44) sequences at amino acid positions 452, 484, and 501 (see FIG. 15A). From this initial activity screen, the top-performing gRNAs were identified for each S-gene variant encoding either E484K, E484Q, or E484A (see FIG. 15B). Further evaluation of CasDx1 with its cognate gRNAs on synthetic gene fragments, with respect to SNP differentiation capabilities, is illustrated in FIG. 15B.
- the original LAMP primer set used in Example 2 (Table 3A) was tested against a degenerate LAMP primer set (Table 3B) which incorporated degenerate nucleotides within the LAMP primers, as it was suspected that the original LAMP primer set may not have sufficient sensitivity to amplify the targeted spike RBD region.
- the degenerate LAMP primer design incorporated two sets of six primers each, with both sets generating overlapping spike RBD amplicons that spanned the L452R, E484K/Q/A, and N501Y mutations.
- FIG. 16 shows a comparison of RT-LAMP primers for processing Omicron clinical samples.
- Omicron clinical samples (1802), WT control RNA (1804), and Alpha control RNA (1806) were amplified using the original RT-LAMP primer set and the degenerate RT-LAMP primer set.
- the degenerate primers (see Table 3B) amplified both the controls (WT and Alpha) and Omicron samples, whereas the original primer set (see Table 3A) only amplified the control samples (WT and Alpha).
- MA mean average
- sample COVID-112 was called an Omicron by DETECTR® based on its A484 SNP call, which was confirmed by WGS.
- sample COVID-122 could not be amplified by RT-LAMP, also suggesting a loss in sample integrity. Following this discrepancy analysis, an overall SNP concordance of 94.7%, and 100% NPA was demonstrated for this set of 48 samples (Table 9).
- Table 9 Overall SNP concordance values for the 484 SNP from the evaluation of the DETECTR® assay against the SARS-CoV-2 WGS comparator assay
- a CRISPR-based DETECTR® assay was developed for the detection of SARS-CoV-2 variants.
- a data analysis pipeline was developed to differentiate between WT and MUT signals with the COVID-19 Variant DETECTR® assay, yielding an overall SNP concordance of 97.9% (373/381 total SNP calls when combined with Example 2) and 99.3% (138/139 when combined with Example 2) agreement with lineage classification compared to viral WGS.
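The reported concordance percentages follow directly from the stated counts. A minimal sketch of that arithmetic is shown below; the function name is illustrative, and rounding to one decimal place is an assumption chosen to match how the percentages are reported.

```python
def percent_agreement(concordant, total):
    """Return concordant calls as a percentage of total calls, to one decimal."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * concordant / total, 1)


print(percent_agreement(373, 381))  # 97.9 (overall SNP concordance)
print(percent_agreement(138, 139))  # 99.3 (lineage classification agreement)
```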
- these findings show robust agreement between the COVID-19 Variant DETECTR® assay and viral WGS for identification of SNP mutations and variant categorization.
- the COVID-19 Variant DETECTR® assay provided a faster and simpler alternative to sequencing-based methods for COVID-19 variant diagnostics and surveillance.
- Although CRISPR-based diagnostic assays have been demonstrated for the detection of SARS-CoV-2 variants, these studies have limitations regarding coverage of circulating lineages, the extent of clinical sample evaluation, and/or assay complexity.
- the miSHERLOCK variant assay uses LbCas12a (NEB) with RPA preamplification to detect N501Y, E484K and Y144Del covering eight lineages (WA-1, Alpha, Beta, Gamma, Eta, Iota, Mu and Zeta) and was tested only on contrived samples (RNA spiked into human saliva) (see Puig et al., Sci Adv 7, (2021)).
- the SHINEv2 assay uses LwaCas13a with RPA pre-amplification to detect 69/70Del, K417N/T, L452R and 156/157Del + R158G covering eight lineages (WA-1, Alpha, Beta, Gamma, Delta, Epsilon, Kappa and Mu) and was tested with only the 69/70Del gRNAs on 20 Alpha-positive NP clinical samples (see Arizti-Sanz et al., medRxiv, (2021)).
- the mCARMEN variant identification panel (VIP) uses 26 crRNA pairs with either the LwaCas13a or LbaCas13a and PCR pre-amplification to identify all current circulating lineages including Omicron; however, the VIP requires the Fluidigm Biomark HD system or similar, more complex instrumentation for streamlined execution (see Welch et al., medRxiv, (2021)).
- the COVID-19 Variant DETECTR® assay disclosed herein in Examples 2 and 4 uses CasDx1 to detect N501Y, E484K/Q/A, and L452R covering all current circulating lineages and was tested on 139 clinical samples representing eight lineages (WA-1, Alpha, Gamma, Delta, Epsilon, Iota, Mu, Omicron). Furthermore, specific Omicron identification was accomplished using only the E484 WT and A484 MUT guides.
- Example 4 shows that the choice of Cas enzyme may be important to maximize the accuracy of CRISPR-based diagnostic assays and may need to be tailored to the site that is being targeted.
- the COVID-19 variant DETECTR® assay was capable of distinguishing the Alpha, Delta, Kappa and Omicron variants, but did not resolve the remaining VBMs or VOIs.
- tracking of key mutations, many of which are suspected to arise from convergent evolution, rather than tracking of variants, may be more important for surveillance as the pandemic continues.
- the data analysis pipeline developed for CRISPR-based SNP calling described herein can readily incorporate additional targets and may offer a blueprint for automated interpretation of fluorescent signal patterns.
- the COVID-19 Variant DETECTR® assay described herein can serve as an initial screen for circulating variants and/or a distinct pattern from a rare or novel variant by interrogating the key 452, 484, and 501 positions that could be reflexed to viral WGS.
- the COVID-19 Variant DETECTR® assay would thus enable rapid identification of variants circulating in the community to support outbreak investigation and public health containment efforts.
- the COVID-19 Variant DETECTR® assay can be readily reconfigured by validating new preamplification LAMP primers and gRNAs that target emerging mutations with clinical and epidemiological significance.
- newly emerging variants may contain additional mutations that may not be captured with the resolution of the L452R, E484K/Q/A, and N501Y mutations described in Examples 2-4 or which may result in variable performance at a particular SNP compared to others, and could be detected by incorporation of additional gRNA(s) to provide specific and redundant coverage and/or improve identification of specific lineages.
- incorporation of an additional N-gene target to the assay may improve the limit of detection of the DETECTR® assay, which currently relies entirely on a multiplexed and degenerate S-gene LAMP primer design, optionally to facilitate simultaneous detection and SNP/variant identification.
- a validated CRISPR assay that combines SARS-CoV-2 detection with variant identification may offer a faster and simpler alternative to sequencing and would be useful as a tool for simultaneous COVID-19 diagnosis in individual patients and surveillance for infection control and public health purposes.
- first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first subject could be termed a second subject, and, similarly, a second subject could be termed a first subject, without departing from the scope of the present disclosure. The first subject and the second subject are both subjects, but they are not the same subject.
- the term “if” may be construed to mean “when” or “upon” or “in response to determining” or “in response to detecting,” depending on the context.
- the phrase “if it is determined” or “if [a stated condition or event] is detected” may be construed to mean “upon determining” or “in response to determining” or “upon detecting (the stated condition or event)” or “in response to detecting (the stated condition or event),” depending on the context.
- the term “about” in reference to a number or range of numbers is understood to mean the stated number and numbers +/- 10% thereof, or 10% below the lower listed limit and 10% above the higher listed limit for the values listed for a range.
Landscapes
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Chemical & Material Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Biotechnology (AREA)
- Medical Informatics (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- General Health & Medical Sciences (AREA)
- Biophysics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Evolutionary Biology (AREA)
- Organic Chemistry (AREA)
- Molecular Biology (AREA)
- Theoretical Computer Science (AREA)
- Analytical Chemistry (AREA)
- Bioinformatics & Computational Biology (AREA)
- Wood Science & Technology (AREA)
- Zoology (AREA)
- Immunology (AREA)
- Genetics & Genomics (AREA)
- General Engineering & Computer Science (AREA)
- Biochemistry (AREA)
- Microbiology (AREA)
- Virology (AREA)
- Signal Processing (AREA)
- Artificial Intelligence (AREA)
- Bioethics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Epidemiology (AREA)
- Evolutionary Computation (AREA)
- Public Health (AREA)
- Software Systems (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
Systems and methods for determining a mutation call for a target locus in a sample are provided. A signal dataset is obtained comprising, for each well in a plurality of wells, reporting signals for nucleic acids in the sample that map to the locus, over a plurality of time points. Each well contains an aliquot of nucleic acid derived from the sample. A first set of wells includes guide nucleic acids having a first allele for the locus. A second set of wells includes guide nucleic acids having a second allele for the locus. For each well, signal yields are determined using the reporting signals, and candidate call identities are determined for the first set of wells based on a comparison of signal yields in the first and second set of wells. A voting procedure is performed across the candidate call identities, thus obtaining a mutation call for the locus.
Description
SYSTEMS AND METHODS FOR IDENTIFYING GENETIC PHENOTYPES USING PROGRAMMABLE NUCLEASES
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application claims priority to U.S. Provisional Patent Application Serial No. 63/390,572, filed July 19, 2022; U.S. Provisional Patent Application Serial No. 63/305,934, filed February 2, 2022; U.S. Provisional Patent Application Serial No. 63/305,872, filed February 2, 2022; U.S. Provisional Patent Application Serial No. 63/283,930, filed November 29, 2021; and U.S. Provisional Patent Application Serial No. 63/283,936, filed November 29, 2021, each of which is hereby incorporated by reference in its entirety.
TECHNICAL FIELD
[0002] This specification describes technologies generally relating to detection and discrimination of genetic mutations in a biological sample.
INCORPORATION BY REFERENCE OF SEQUENCE LISTING
[0003] Accompanying this filing is a Sequence Listing entitled, “SeqListing.xml” created on November 29, 2022 and having 190,717 bytes of data, machine formatted on IBM-PC, MS-Windows operating system. The sequence listing is hereby incorporated by reference in its entirety for all purposes.
BACKGROUND
[0004] There have been recurrent large-scale epidemics from novel emerging viruses, including SARS-CoV-2. Since the initial outbreak of SARS-CoV-2, multiple fast-spreading variants have emerged and have been observed to complicate the epidemic and to lead to various government restrictions after the initial relief, which resulted in more severe socioeconomic impacts than the original strain.
[0005] Tracking the evolution and spread of SARS-CoV-2 variants in the community is critical to inform public policy regarding testing and vaccination, as well as guide contact tracing and containment efforts during local outbreaks. Variant identification can also be clinically significant, as some mutations substantially reduce the effectiveness of available monoclonal antibody therapies for the disease.
[0006] As a result, there is an urgent need to strengthen genomic surveillance to monitor the evolution of SARS-CoV-2 variants and to uncover possible different patterns of infection by variants of concern (e.g., changes in transmissibility, clinical patterns, or lethality). Virus whole-genome sequencing (WGS) and single nucleotide polymorphism (SNP) genotyping are commonly used to identify variants, but can be limited by long turnaround times and/or the requirement for bulky and expensive laboratory instrumentation.
SUMMARY
[0007] Given the above background, improved systems and methods are needed for determining mutation calls for target loci in biological samples, particularly in the application of programmable nuclease-based (e.g., CRISPR-Cas-based) assays and/or amplification-based assays for recognition of mutated sequences, such as in SARS-CoV-2 variants. For example, in at least some instances, mutations in the spike protein, which binds to the human ACE2 receptor, can render the SARS-CoV-2 virus more infectious and/or more resistant to antibody neutralization, resulting in increased transmissibility and/or escape from immunity, whether vaccine-mediated or naturally acquired immunity. Variant identification can also be clinically significant, as some mutations substantially reduce the effectiveness of available monoclonal antibody therapies for the COVID-19 disease.
[0008] Advantageously, technical solutions (e.g., compositions, computing systems, methods, and non-transitory computer readable storage mediums) for addressing the above identified problems are provided in the present disclosure. The compositions, systems, and methods described herein satisfy the abovementioned needs, among others, and provide related advantages.
[0009] For instance, tracking the evolution and spread of pathogenic variants (e.g., SARS-CoV-2 variants) in the community can inform public policy regarding testing and vaccination, as well as guide contact tracing and containment efforts during local outbreaks. As described above, virus whole-genome sequencing (WGS) and single nucleotide polymorphism (SNP) genotyping are commonly used to identify variants, but can be limited in at least some instances by long turnaround times and/or the requirement for bulky and expensive laboratory instrumentation. In some implementations, it is therefore advantageous to provide diagnostic assays based on clustered regularly interspaced short palindromic repeats (CRISPR) for rapid detection of variants (e.g., SARS-CoV-2 variants) in clinical samples. Some advantages of the assays, compositions, computing systems, methods, and non-transitory computer readable storage mediums described herein for use in laboratory and point of care settings include low cost, minimal instrumentation, and a sample-to-answer turnaround time of under 2 hours.
[0010] Systems and Methods for Determining Mutation Calls
[0011] Accordingly, one aspect of the present disclosure provides a method for determining a mutation call for a target locus in a biological sample, at a computer system having one or more processors, and memory storing one or more programs for execution by the one or more processors. The method includes obtaining a signal dataset comprising, for each respective well in a plurality of wells in a common plate, a corresponding plurality of reporting signals for a respective one or more nucleic acid molecules in the biological sample that map to the target locus. The signal dataset represents a plurality of time points. The plurality of wells comprises a first set of wells representing a first allele for the target locus, where each well in the first set of wells includes a first plurality of guide nucleic acids that have the first allele of the target locus. The plurality of wells further comprises a second set of wells representing a second allele for the target locus, where each well in the second set of wells includes a second plurality of guide nucleic acids that have the second allele of the target locus. Each corresponding plurality of reporting signals comprises, for each respective time point in the plurality of time points, a respective reporting signal in the form of a corresponding discrete attribute value, and each respective well in the first set of wells and each respective well in the second set of wells contains a corresponding aliquot of nucleic acid derived from the biological sample.
[0012] For each respective well in the plurality of wells, a corresponding signal yield for the respective well is determined using the corresponding plurality of reporting signals for the respective well across the plurality of time points. For each respective well in the first set of wells, a respective candidate call identity is determined based on a comparison between a corresponding first signal yield for the respective well in the first set of wells and a corresponding second signal yield for a corresponding well in the second set of wells, thereby obtaining a plurality of candidate call identities. A voting procedure is performed across the plurality of candidate call identities for the first set of wells, thereby obtaining a mutation call for the target locus.
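By way of non-limiting illustration, the determination of signal yields, candidate call identities, and the voting procedure described above can be sketched as follows. The sketch assumes that a signal yield is the largest rise of the reporting signal above its first time point, that a candidate call identity is derived from the log2 ratio of paired yields using hypothetical thresholds, and that the voting procedure is a strict majority vote. These choices, together with all function names, parameter names, and default values, are illustrative assumptions and are not taken from the present disclosure.

```python
from collections import Counter
from math import log2


def signal_yield(signals):
    """Assumed yield: largest rise above the first (baseline) time point."""
    baseline = signals[0]
    return max(s - baseline for s in signals)


def candidate_call(first_yield, second_yield, ratio_threshold=1.0, min_yield=100.0):
    """Compare the paired yields of a first-allele well and a second-allele well.

    Both thresholds are hypothetical values chosen only for illustration.
    """
    if max(first_yield, second_yield) < min_yield:
        return "NoCall"  # neither guide produced an appreciable signal
    # Floor the yields at 1.0 so that the log ratio is always defined.
    m = log2(max(first_yield, 1.0) / max(second_yield, 1.0))
    if m >= ratio_threshold:
        return "first_allele"
    if m <= -ratio_threshold:
        return "second_allele"
    return "NoCall"


def vote(calls):
    """Strict majority vote; a tie or the absence of a majority gives NoCall."""
    top, count = Counter(calls).most_common(1)[0]
    return top if count > len(calls) / 2 else "NoCall"


def mutation_call(first_wells, second_wells):
    """Pair wells by position, make one candidate call per pair, then vote."""
    calls = [
        candidate_call(signal_yield(a), signal_yield(b))
        for a, b in zip(first_wells, second_wells)
    ]
    return vote(calls)
```

In this sketch each well is a list of reporting signals ordered by time point, and wells in the first set are paired with wells in the second set by position. Other yield definitions (for example, an end-point value or a fitted rate) and other voting rules could be substituted without changing the overall structure.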
[0013] In some embodiments, the signal dataset is obtained by a procedure comprising amplifying a first plurality of nucleic acids derived from the biological sample, thereby generating a plurality of amplified nucleic acids. For each respective well in the first set of wells and each respective well in the second set of wells, the method includes partitioning, from the plurality of amplified nucleic acids, the respective corresponding aliquot of nucleic acid; contacting the respective corresponding aliquot of nucleic acid with at least a programmable nuclease, a guide nucleic acid, and one or more reporters; and cleaving the one or more reporters using the programmable nuclease if the first allele or the second allele, respectively, is present, thereby generating a reporting signal.
[0014] In some embodiments, the method further comprises, for each candidate second allele in a plurality of candidate second alleles, repeating the process of obtaining, determining, determining, and performing, thereby obtaining a plurality of candidate mutation calls for the target locus, and performing a mutation call voting procedure across the plurality of candidate mutation calls for the target locus, thereby obtaining a final mutation call for the target locus. In some such embodiments, the first allele is a wild-type allele and each respective candidate second allele in the plurality of candidate second alleles is a different respective mutant allele for the target locus.
[0015] In some embodiments, when a single respective candidate mutation call in the plurality of candidate mutation calls corresponds to a respective candidate second allele in the plurality of candidate second alleles, the respective candidate mutation call is determined as the final mutation call for the target locus. In some embodiments, when a sub-plurality of candidate mutation calls in the plurality of candidate mutations calls corresponds to a respective sub-plurality of candidate second alleles in the plurality of candidate second alleles, the obtaining the final mutation call for the target locus is based upon a comparison of signal yields for each pair of candidate second alleles in the sub-plurality of candidate second alleles.
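The resolution of a final mutation call across several candidate second alleles, as described in the two preceding paragraphs, can be sketched as follows. The sketch assumes that each candidate second allele has already received a per-allele call of "MUT", "WT", or "NoCall", that the pairwise comparison of signal yields is decided by the larger mean yield, and that the first allele is reported only when every per-allele call is "WT". The function name, argument names, and tie-handling rule are hypothetical.

```python
from itertools import combinations


def final_mutation_call(candidate_calls, mean_yields, first_allele="WT"):
    """Resolve one final call from per-allele candidate mutation calls.

    candidate_calls maps each candidate second allele to "MUT", "WT", or "NoCall".
    mean_yields maps each candidate second allele to its mean signal yield.
    """
    called = [allele for allele, call in candidate_calls.items() if call == "MUT"]
    if not called:
        # Assumption: report the first allele only if every comparison agreed on it.
        all_first = all(call == "WT" for call in candidate_calls.values())
        return first_allele if all_first else "NoCall"
    if len(called) == 1:
        # A single candidate second allele was called, so it is the final call.
        return called[0]
    # Several candidates were called: compare signal yields for each pair.
    wins = {allele: 0 for allele in called}
    for a, b in combinations(called, 2):
        if mean_yields[a] > mean_yields[b]:
            wins[a] += 1
        elif mean_yields[b] > mean_yields[a]:
            wins[b] += 1
    best = max(wins.values())
    leaders = [allele for allele, w in wins.items() if w == best]
    return leaders[0] if len(leaders) == 1 else "NoCall"
```

Counting pairwise wins, rather than simply taking the largest yield, keeps the sketch close to the pairwise comparison recited above and makes ties explicit as a no-call.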
[0016] Another aspect of the present disclosure provides a method for determining a mutation call for a target locus in a biological sample. The method comprises, using a computer system having one or more processors, and memory storing one or more programs for execution by the one or more processors, obtaining a signal dataset comprising, for each respective well in a plurality of wells in a common plate, a corresponding plurality of reporting signals for a respective one or more nucleic acid molecules in the biological sample that map to the target locus. In such embodiments, the signal dataset represents a plurality of time points. Moreover, the plurality of wells comprises a first set of wells representing a first allele for the target locus, where each well in the first set of wells includes a first plurality of guide nucleic acids that have the first allele of the target locus. Further still, the plurality of wells also comprises, for each respective candidate second allele in a set of candidate second alleles, a respective additional set of wells representing the respective candidate second allele for the target locus, where each well in this respective additional set of wells includes a corresponding plurality of guide nucleic acids that have the respective candidate second allele for the target locus. Each corresponding plurality of reporting signals comprises, for each respective time point in the plurality of time points, a respective reporting signal in the form of a corresponding discrete attribute value. Also, each respective well in the first set of wells and each respective well in each respective additional set of wells contains a corresponding aliquot of nucleic acid derived from the biological sample. The method continues by determining, for each respective well in the plurality of wells, a corresponding signal yield for the respective well using the corresponding plurality of reporting signals for the respective well across the plurality of time points. The method further determines, for each respective well in the first set of wells, for each respective candidate second allele in the set of candidate second alleles, a respective candidate call identity based on a comparison between a corresponding first signal yield for the respective well in the first set of wells and a corresponding respective second signal yield for a corresponding well in the respective additional set of wells for the respective candidate second allele, thereby obtaining a plurality of candidate call identities. A voting procedure is then performed across the plurality of candidate call identities, thereby obtaining a mutation call for the target locus that is one of the candidate second alleles in the set of candidate second alleles or the first allele.
In some such embodiments, the first allele is a wild-type allele and each respective candidate second allele in the set of candidate second alleles is a different respective mutant allele for the target locus. In some such embodiments, the first allele is other than a wild-type allele. In some such embodiments, the set of candidate second alleles consists of between 2 and 10 candidate second alleles. In some such embodiments the set of candidate second alleles consists of a single second allele.
[0017] Another aspect of the present disclosure provides a computer system having one or more processors, and memory storing one or more programs for execution by the one or more processors, the one or more programs comprising instructions for performing any of the methods and/or embodiments disclosed above.
[0018] Another aspect of the present disclosure provides a non-transitory computer readable storage medium storing one or more programs configured for execution by a computer, the one or more programs comprising instructions for performing any of the methods and/or embodiments disclosed above.
[0019] Compositions, Systems, and Methods for Assaying SARS-CoV-2 Variants
[0020] Another aspect of the present disclosure provides a CRISPR-based COVID-19 variant DETECTR® assay (henceforth abbreviated as DETECTR assay) for the detection of SARS-CoV-2 mutations. The assay combines RT-LAMP pre-amplification followed by fluorescent detection using a CRISPR-Cas12 enzyme. In a comparative evaluation of multiple candidate Cas12 enzymes, robust assay performance was found with a CRISPR-Cas12 enzyme called CasDx1, which had high specificity in identifying key SNP mutations of functional relevance in the spike protein at amino acid positions 452, 484, and 501.
[0021] In various aspects, the disclosure provides a method of assaying for a SARS-CoV-2 variant in an individual, the method comprising: collecting a nasal swab or a throat swab from the individual; optionally extracting a target nucleic acid comprising a segment of a SARS-CoV-2 Spike (“S”) gene from the nasal swab or the throat swab; amplifying the target nucleic acid to produce an amplification product, wherein amplifying the target nucleic acid comprises contacting the target nucleic acid to a plurality of LAMP amplification primers; contacting the amplification product, the target nucleic acid, or a combination thereof, to a composition comprising (i) an effector protein and (ii) a guide nucleic acid comprising a nucleotide sequence that is at least 90% identical to any one of SEQ ID NOS: 1 to 6, 22 to 27, or 40 to 42; assaying for a change in a signal produced by cleavage of a reporter nucleic acid, wherein the effector protein trans cleaves the reporter nucleic acid upon hybridization of the guide nucleic acid to the target nucleic acid, a segment thereof, or an amplification product thereof; and determining an SNP call of the sample. In some embodiments, the nasal swab is a nasopharyngeal swab. In some embodiments, the throat swab is an oropharyngeal swab. In some embodiments, amplifying the target nucleic acid comprises contacting the target nucleic acid to at least one reagent for amplification. In some embodiments, the amplifying comprises loop mediated amplification (LAMP). In some embodiments, the at least one reagent for amplification comprises a polymerase, dNTPs, or a combination thereof. In some embodiments, the plurality of LAMP amplification primers comprise an FIP primer, a BIP primer, an F3 primer, a B3 primer, an LF primer, and an LB primer. In some embodiments, the amplification primers comprise SEQ ID NOS: 8-13, SEQ ID NOS: 14-19, SEQ ID NOS: 28-33, and/or SEQ ID NOS: 34-39. 
In some embodiments, the amplifying comprises reverse transcription-LAMP. In some embodiments, the method further comprises lysing the sample.
In some embodiments, lysing the sample comprises contacting the sample to a lysis buffer. In some embodiments, determining an SNP call comprises determining whether the segment of the S-gene comprises one or more S-gene mutation(s) relative to a reference wild-type SARS-CoV-2 S-gene. In some embodiments, the reference wild-type SARS-CoV-2 gene is from a SARS-CoV-2-Wuhan-Hu1 sequence or the USA-WA1/2020 sequence. In some embodiments, the one or more S-gene mutations is a single nucleotide polymorphism (SNP). In some embodiments, the one or more S-gene mutations is associated with one or more Spike protein mutations. In some embodiments, the one or more Spike protein mutations is (i) a mutation in amino acid position 484 from E to K (E484K), (ii) a mutation in amino acid position 501 from N to Y (N501Y), (iii) a mutation in amino acid position 452 from L to R (L452R), (iv) a mutation in amino acid position 484 from E to Q (E484Q), (v) a mutation in amino acid position 484 from E to A (E484A), or a combination thereof. In some embodiments, determining an SNP call of the sample comprises comparing the signal produced by the cleavage of the reporter nucleic acid by a composition comprising a guide nucleic acid comprising a nucleotide sequence that is at least 90% identical to any one of SEQ ID NOS: 1-3 or 22-24 to a signal produced by contacting a composition comprising a guide nucleic acid comprising a nucleotide sequence that is at least 90% identical to any one of SEQ ID NOS: 4-6, 25-27, or 40-42 to a nucleic acid identical to the target nucleic acid. In some embodiments, the method comprises determining a variant call of the sample. In some embodiments, determining the variant call comprises determining whether the sample comprises a wild-type SARS-CoV-2 or a SARS-CoV-2 variant. In some embodiments, the SARS-CoV-2 variant is any one of an Alpha, Beta, Gamma, Delta, Epsilon, Kappa, Omicron, or Zeta SARS-CoV-2 variant. 
In some embodiments, determining the variant call comprises determining one or more SNP calls of the sample. In some embodiments, determining whether the sample comprises an Alpha SARS-CoV-2 variant comprises detecting an S-gene mutation associated with an N501Y mutation. In some embodiments, determining whether the sample comprises a Beta SARS-CoV-2 variant comprises detecting S-gene mutations associated with E484K and N501Y mutations. In some embodiments, determining whether the sample comprises a Mu SARS-CoV-2 variant comprises detecting S-gene mutations associated with E484K and N501Y mutations. In some embodiments, determining whether the sample comprises a Gamma SARS-CoV-2 variant comprises detecting S-gene mutations associated with E484K and N501Y mutations. In some embodiments, determining whether the sample comprises a Delta SARS-CoV-2 variant comprises detecting an S-gene mutation associated with an L452R mutation. In some embodiments, determining whether the sample comprises an Epsilon SARS-CoV-2 variant comprises detecting an S-gene mutation associated with an L452R mutation. In some embodiments, determining whether the sample comprises a Kappa SARS-CoV-2 variant comprises detecting an S-gene mutation associated with an L452R mutation. In some embodiments, determining whether the sample comprises an Omicron SARS-CoV-2 variant comprises detecting an S-gene mutation associated with an E484A mutation. In some embodiments, determining whether the sample comprises a Zeta SARS-CoV-2 variant comprises detecting an S-gene mutation associated with an E484K mutation. In some embodiments, the effector protein is a Type V Cas effector protein. In some embodiments, the Type V effector protein is a Cas12 effector protein, a Cas13 effector protein, a Cas14 effector protein, or a CasPhi effector protein. In some embodiments, the guide nucleic acid comprises a nucleotide sequence that is at least 95% identical to one of SEQ ID NOS: 1 to 6, 22 to 27, or 40 to 42. In some embodiments, the guide nucleic acid comprises a nucleotide sequence that is at least 98% identical to one of SEQ ID NOS: 1 to 6, 22 to 27, or 40 to 42. In some embodiments, the guide nucleic acid comprises a nucleotide sequence of one of SEQ ID NOS: 1 to 6, 22 to 27, or 40 to 42. In some embodiments, the reporter nucleic acid comprises a nucleotide sequence that is at least 75% identical to SEQ ID NO: 7, wherein the nucleotide sequence is flanked with a fluorescent dye on the 5’ end and a quencher on the 3’ end. In some embodiments, the reporter nucleic acid has a nucleotide sequence of SEQ ID NO: 7 flanked with a fluorescent dye on the 5’ end and a quencher on the 3’ end.
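The correspondence between SNP calls and variant calls recited above can be sketched as a lookup. The mutation signatures in the table below are taken from the embodiments above; the exact-match rule, the function name, and the "wild-type" label returned for an empty set of detected mutations are illustrative assumptions. Because several variants share a signature at the 452, 484, and 501 positions, the sketch returns every variant consistent with the detected mutations rather than a single lineage.

```python
# Mutation signatures as recited in the embodiments above.
VARIANT_SIGNATURES = {
    "Alpha": {"N501Y"},
    "Beta": {"E484K", "N501Y"},
    "Gamma": {"E484K", "N501Y"},
    "Mu": {"E484K", "N501Y"},
    "Delta": {"L452R"},
    "Epsilon": {"L452R"},
    "Kappa": {"L452R"},
    "Omicron": {"E484A"},
    "Zeta": {"E484K"},
}


def candidate_variants(detected_mutations):
    """Return the variants whose signature equals the set of detected mutations.

    Exact matching is an assumption of this sketch; an empty list means that
    no signature in the table matches the detected pattern.
    """
    detected = set(detected_mutations)
    if not detected:
        return ["wild-type"]
    return sorted(
        name for name, signature in VARIANT_SIGNATURES.items() if signature == detected
    )
```

For example, a sample in which only the L452R mutation is detected would be consistent with the Delta, Epsilon, and Kappa variants under this table, so further resolution would rely on additional guide nucleic acids or sequencing.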
[0022] Various embodiments of systems, methods and devices within the scope of the appended claims each have several aspects, no single one of which is solely responsible for the desirable attributes described herein. Without limiting the scope of the appended claims, some prominent features are described herein. After considering this discussion, and particularly after reading the section entitled “Detailed Description” one will understand how the features of various embodiments are used.
INCORPORATION BY REFERENCE
[0023] All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference in their entireties to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference.
BRIEF DESCRIPTION OF THE DRAWINGS
[0024] The implementations disclosed herein are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings. Like reference numerals refer to corresponding parts throughout the several views of the drawings.
[0025] FIG. 1 is an example block diagram illustrating a computing device and related data structures used by the computing device in accordance with some implementations of the present disclosure.
[0026] FIGS. 2A, 2B, 2C, 2D, 2E, 2F, and 2G illustrate an example method in accordance with an embodiment of the present disclosure, in which optional steps are indicated by broken lines.
[0027] FIG. 3 illustrates an example schematic for a method of determining a mutation call for a target locus in a sample using programmable nucleases, in accordance with some embodiments of the present disclosure.
[0028] FIGS. 4A, 4B, and 4C collectively illustrate a method for determining a mutation call for a target locus in a sample, in accordance with some embodiments of the present disclosure.
[0029] FIGS. 5A, 5B, and 5C collectively illustrate identification of differentiated phenotypes called as wild-type, mutated, and no-call in a signal dataset, in accordance with an embodiment of the present disclosure. FIG. 5A provides an allele discrimination plot visualizing the signal yields obtained from a COVID Variant programmable nuclease-based assay on gene fragments. The allele discrimination plots represent scatter plots of scaled WT and MUT fluorescence values plotted against each other. FIG. 5B shows Mean Average (MA) plots of the COVID Variant programmable nuclease-based assay data on gene fragments to decrease ambiguity of the signal yields. A ratio of the WT and MUT transformed values is plotted against the average of the WT and MUT transformed values on the MA plot. FIG. 5C presents MA plots of the COVID Variant programmable nuclease-based assay on the gene fragments (n=30 WT; n=30 MUT for each SNP) and no template controls (n=33 WT; n=33 MUT for each SNP) used to validate the present systems and methods.
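The coordinates of an MA plot of the kind referenced above can be computed as in the following sketch. It assumes the conventional definition in which the transformed values are base-2 logarithms, M is the difference of the transformed WT and MUT values (the log ratio), and A is their average. The pseudocount and the function name are hypothetical, and the transformation actually used to generate the figures may differ.

```python
from math import log2


def ma_transform(wt_signal, mut_signal, pseudocount=1.0):
    """Return (M, A) for one pair of WT and MUT signal yields.

    The pseudocount keeps the logarithm defined for zero signal; its value
    is an assumption of this sketch.
    """
    log_wt = log2(wt_signal + pseudocount)
    log_mut = log2(mut_signal + pseudocount)
    m = log_wt - log_mut        # log ratio: positive favors WT, negative favors MUT
    a = (log_wt + log_mut) / 2  # mean of the transformed values: overall intensity
    return m, a
```

Under this convention, points with M well above zero indicate WT, points with M well below zero indicate MUT, and points with low A (little signal from either guide) are candidates for a no-call.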
[0030] FIGS. 6A, 6B, 6C, 6D, 6E, 6F, 6G, 6H, 6I, 6J, and 6K illustrate design and workflow for a COVID-19 Variant DETECTR® assay, in accordance with some embodiments of the present disclosure. FIG. 6A shows a schematic of CRISPR-Cas gRNA design for SARS-CoV-2 S gene mutations. FIG. 6B shows a heat map comparison of three different Cas12 enzymes tested using 10 nM PCR-amplified synthetic gene fragments (t=30 minutes). FIG. 6C shows raw fluorescence curves visualizing the SNP differentiation capability of CasDx1 using synthetic gene fragments of an S-gene fragment of interest from (i) a wild-type form of the virus containing no major mutations and (ii) SARS-CoV-2 variants having mutations at amino acid positions 452, 484, and 501 relative to the wild-type virus. FIG. 6D shows raw fluorescence curves visualizing the SNP differentiation capability of LbCas12a (which may be referred to as “LbaCas12a”, “LbCas12”, and “LbCas12a” interchangeably) using synthetic gene fragments of an S-gene fragment of interest from (i) a wild-type form of the virus containing no major mutations and (ii) SARS-CoV-2 variants having mutations at amino acid positions 452, 484, and 501 relative to the wild-type virus. FIG. 6E shows raw fluorescence curves visualizing the SNP differentiation capability of AsCas12a using synthetic gene fragments of an S-gene fragment of interest from (i) a wild-type form of the virus containing no major mutations and (ii) SARS-CoV-2 variants having mutations at amino acid positions 452, 484, and 501 relative to the wild-type virus. FIG. 6F shows a schematic of multiplexed RT-LAMP primer design showing the SARS-CoV-2 S gene mutations and gRNA positions. FIG. 6G shows a dot plot showing the number (n = 6) of positive replicates across a 4-log dynamic range of the RT-LAMP products. FIG. 6H shows a heat-map comparison of end-point fluorescence (t = 30 minutes) of three different Cas12 enzymes tested against heat-inactivated viral cultures. Replicates (n = 6) generated using RT-LAMP were pooled, and CRISPR-Cas12 reactions were then run in triplicate (n = 3). FIGS. 6I-6K show raw fluorescence curves from three Cas12 enzymes (CasDx1, LbCas12a, and AsCas12a) on eight heat-inactivated viral culture samples from various SARS-CoV-2 lineages, a no target control (RT-LAMP) and CasDx1 detection controls (WT, MUT and NTC). Experiments were performed using CasDx1 replicates (n = 3), with standard deviation of ±1.0SD.
[0031] FIG. 7 shows an example workflow comparison between the COVID Variant DETECTR® assay and SARS-CoV-2 Whole-Genome-Sequencing, in accordance with some embodiments of the present disclosure.
[0032] FIGS. 8A, 8B, 8C, 8D, 8E, 8F, 8G, and 8H illustrate example plots obtained using a DETECTR® data analysis pipeline for SARS-CoV-2 SNP mutation calling, as illustrated in FIGS. 4A-4C, in accordance with an embodiment of the present disclosure. FIG. 8A shows a key providing orientation for FIGS. 8B-8H relative to each other. FIGS. 8B-8H show raw fluorescence RT-LAMP curves for each clinical sample. The raw fluorescence RT-LAMP amplification curves for each of the clinical samples were analyzed in triplicate (n = 3 replicates). Each line is representative of the median ±1.0SD of the three RT-LAMP replicates for each sample. RT-LAMP replicates that passed quality control (QC) are represented with solid black lines and dark gray shading and failed LAMP replicates are shown with solid gray lines and light gray shading. Only valid RT-LAMP replicates were used in subsequent data analysis.
[0033] FIGS. 9A, 9B, 9C, 9D, 9E, 9F, 9G, 9H, 9I, 9J, 9K, 9L, and 9M illustrate example plots obtained using the DETECTR® data analysis pipeline for SARS-CoV-2 SNP mutation calling illustrated in FIGS. 4A-4C and 8A-8H, in accordance with an embodiment of the present disclosure. FIG. 9A shows a key providing orientation for FIGS. 9B-9M relative to each other. FIGS. 9B-9M show raw fluorescence CasDx1 curves for each clinical sample amplified by RT-LAMP. Each clinical sample was amplified with RT-LAMP in triplicate, and the resulting amplicons were detected by CasDx1 in triplicate. The raw fluorescence curves show WT detection in thick black lines and MUT detection in thin gray lines. Each line is representative of the median ±1.0SD of the CasDx1 replicates (n = 3) for each WT and MUT guide for each of the RT-LAMP replicates (n = 3), represented by different patterned lines.
[0034] FIGS. 10A, 10B, 10C, 10D, 10E, and 10F illustrate example plots obtained using the DETECTR® data analysis pipeline for SARS-CoV-2 SNP mutation calling illustrated in FIGS. 4A-4C, 8A-8H, and 9A-9M, in accordance with an embodiment of the present disclosure. FIG. 10A shows allele discrimination plots visualizing the scaled signals from the COVID Variant DETECTR® assay on gene fragments. The allele discrimination plots represent scatter plots of scaled WT and MUT fluorescence values plotted against each other. FIG. 10B shows contrast-size or mean average (MA) plots of the COVID Variant DETECTR® assay data on gene fragments to decrease ambiguity of the scaled signals, where a ratio of the WT and MUT transformed values is plotted against the average of the WT and MUT transformed values on the MA plot. FIGS. 10C-10D collectively show three representative clinical samples of different SARS-CoV-2 lineages used in the workflow of the COVID-19 Variant DETECTR® assay. In FIG. 10C, raw fluorescence curves of each sample run in RT-LAMP amplification and subsequent triplicate DETECTR® reactions targeting both WT and MUT SNPs for L452(R), E484(K), and N501(Y) are shown. For raw fluorescence curves, WT detection is shown in black lines without asterisks and MUT detection is shown in black lines marked by asterisks. Assays were performed using RT-LAMP replicates (n = 3) and CasDx1 replicates (n = 3 per LAMP replicate), where shading around kinetic curves indicates standard deviation of ±1.0SD. FIG. 10D shows box plot visualization of the end point fluorescence in DETECTR® across each SNP for the three representative clinical samples shown in FIG. 10C. Calls were made for each SNP by evaluating the median values of the DETECTR® calls and overall calls through the LAMP replicates, and given a designation of WT, MUT, or NoCall. Final calls were made on the lineage determined by each SNP. Non-shaded elements represent WT and shaded elements represent MUT. FIG. 10E shows MA plots of the COVID Variant DETECTR® assay on the gene fragments (n = 30 WT; n = 30 MUT for each SNP) and no template controls (n = 33 WT; n = 33 MUT for each SNP) used to develop the data analysis pipeline. FIG. 10F shows a schematic of a data analysis pipeline workflow describing the RT-LAMP QC and subsequent CasDx1 signal scaling. The scaled signals were compared across SNPs and the calls were made for each RT-LAMP replicate. The combined replicate calls defined the mutation call, which informed the final lineage classification.
[0035] FIGS. 11A, 11B, 11C, and 11D illustrate an example evaluation of the COVID-19 Variant DETECTR® assay compared to SARS-CoV-2 Whole-Genome Sequencing, in accordance with an embodiment of the present disclosure. FIG. 11A shows determination of RT-LAMP threshold with a ROC curve. Thresholds for LAMP quality analysis were derived to determine which samples had amplified sufficiently. The exact score value for this qualitative QC metric was determined using a ROC analysis. FIG. 11B illustrates a heat map showing CasDx1 signal (n = 3) for every LAMP replicate (n = 3) for each SNP on every clinical sample, reflecting samples prior to discordance testing. FIG. 11C shows allele discrimination plots visualizing the scaled signals from the COVID Variant DETECTR® assay on clinical samples. FIG. 11D shows MA plots, transformed onto M (log ratio) and A (mean average) scales, showing CasDx1 SNP detection replicates (n = 807) for each SARS-CoV-2 mutation across 91 clinical samples. WT, MUT, NoCall, and NTC detection is denoted by labeled circles.
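The MA transformation referred to in paragraphs [0034] and [0035] can be sketched as follows. This is a minimal illustration only: the base-2 logarithm, the pseudocount, and the function name `ma_transform` are assumptions made for the example and are not specified in the present disclosure.

```python
import math


def ma_transform(wt_signal: float, mut_signal: float, pseudocount: float = 1.0):
    """Map a (WT, MUT) pair of scaled signals onto M (log ratio) and A (mean average).

    The log base and the pseudocount are illustrative assumptions.
    """
    log_wt = math.log2(wt_signal + pseudocount)
    log_mut = math.log2(mut_signal + pseudocount)
    m = log_wt - log_mut          # log ratio of WT to MUT
    a = 0.5 * (log_wt + log_mut)  # mean of the two log-transformed signals
    return m, a
```

Under this convention, a positive M indicates a stronger WT signal, a negative M indicates a stronger MUT signal, and a low A indicates that both signals are weak, as would be expected for a no template control.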
[0036] FIGS. 12A, 12B, 12C, 12D, 12E, 12F, 12G, 12H, 12I, 12J, 12K, 12L, and 12M show visualization of SNP calls by the data analysis pipeline illustrated in FIGS. 11A-11D. Box plots of all the clinical samples illustrate the spread of the scaled signals for each of the samples across the replicates in the experiment. SNP calls were made on each sample in agreement with the median values depicted on the box plot of the sample, which also provided an analytical confirmation of the DETECTR® results. WT detection is represented by shaded boxes and MUT detection is represented by non-shaded boxes.
[0037] FIGS. 13A, 13B, 13C, 13D, 13E, 13F, 13G, 13H, 13I, and 13J illustrate evaluation of the COVID-19 Variant DETECTR® assay compared to SARS-CoV-2 Whole-Genome Sequencing as illustrated in FIGS. 11A-11D and 12A-12M, in accordance with an embodiment of the present disclosure. FIGS. 13A-13D collectively show raw fluorescence CasDx1 curves for the clinical samples with discordant DETECTR® and WGS results. WT detection is represented by black lines and MUT detection is represented by gray lines. Each line is representative of the median ±1.0 SD of the CasDx1 replicates (n = 3) for each guide for each of the LAMP replicates (n = 3), and each RT-LAMP replicate is represented by different patterned lines. FIG. 13E shows visualization of the COVID Variant DETECTR® and SARS-CoV-2 WGS assays showing the alignment of final calls. Across all of the clinical samples in this cohort, 80 out of the 91 clinical sample COVID Variant DETECTR® assay calls were consistent with the SARS-CoV-2 WGS calls. FIG. 13F shows alignment of final mutation calls comparing the COVID-19 Variant DETECTR® and SARS-CoV-2 WGS assay results across 91 clinical samples after discordant samples (indicated by asterisk) were resolved. FIG. 13G shows final lineage classification on each clinical sample by the COVID-19 Variant DETECTR® compared to the SARS-CoV-2 lineage determined by the viral WGS. FIGS. 13H-13I collectively show a summary of re-testing of discordant samples from the original clinical sample where nearly all SNP discrepancies are resolved. FIG. 13J shows final lineage classification on each clinical sample by the COVID-19 Variant DETECTR® compared to the SARS-CoV-2 lineage determined by the viral WGS following resolution of discordant samples.
[0038] FIGS. 14A, 14B, 14C, and 14D collectively illustrate an overall results summary of final SNP calls by COVID-19 Variant DETECTR® assay and viral WGS, in accordance with an embodiment of the present disclosure. A summary table of the final SNP calls from the COVID-19 Variant DETECTR® assay and the SARS-CoV-2 whole genome sequencing assay after discordant testing is shown. The table includes the lineage classification from DETECTR® calls as well as the PANGO lineage and WHO labels assigned to the WGS calls. Ct values from running an FDA EUA authorized SARS-CoV-2 RT-PCR assay, the TaqPath™ COVID-19 RT-PCR kit, are shown. Discordant samples were reflexed back for reprocessing (indicated by asterisks); COVID-63 was classified as a Delta variant by WGS despite its Q484 SNP call (indicated by marker f).
[0039] FIGS. 15A, 15B, and 15C illustrate specific detection of 484 mutations, which can enable rapid Omicron identification, in accordance with an embodiment of the present disclosure. FIG. 15A shows a schematic of Omicron mutations within the S-gene LAMP amplicon and relative position of 484-specific gRNAs and degenerate LAMP primers. FIG. 15B shows a heat map comparison of end-point fluorescence (t = 30 min) showing specific detection of 484-specific mutations (E, K, Q, A) on PCR-amplified synthetic gene fragments (n = 3). FIG. 15C shows alignment of final 484 mutation calls comparing the DETECTR® and SARS-CoV-2 WGS assay results across 36 clinical samples.
[0040] FIG. 16 shows a comparison of RT-LAMP primers for processing Omicron clinical samples, in accordance with an embodiment of the present disclosure. Omicron clinical samples (1802), WT (1804) control RNA, and Alpha (1806) control RNAs were amplified using the original RT-LAMP primer set and degenerate RT-LAMP primer set and plotted relative to negative control (1808). The degenerate primers (see Table 3B) amplified both the controls (WT and Alpha) and Omicron samples, whereas the original primer set (see Table 3A) only amplified the control samples (WT and Alpha).
[0041] FIGS. 17A, 17B, 17C, and 17D show a summary of results of final SNP calls by the COVID Variant DETECTR® and WGS assays, in accordance with an embodiment of the present disclosure, with reference to Example 4 below. FIG. 17A shows a schematic of the workflow for determining the final variant calls. If the result was an A484, K484, or Q484, the final variant call was made. If the result was an E484, the sample was reflexed to DETECTR® analysis at the 452 and 501 positions to make the variant determination. FIG. 17B shows an interpretation table including the specific 484 SNPs. FIG. 17C shows a summary table of the final SNP calls from the DETECTR® assay and the SARS-CoV-2 whole genome sequencing assay of Example 4 including the lineage classification from DETECTR® calls as well as the PANGO lineage and WHO labels assigned to the WGS calls. Ct values were obtained from the FDA EUA authorized S TaqPath™ COVID-19 RT-PCR kit. FIG. 17D shows a summary table of the five discordant samples from the DETECTR® assay and WGS after retesting. (NoCall = lack of data generated, N/A = assay not run).
[0042] FIG. 18 shows a summary table of the clinical specimens described in Example 4 with no signal in either the RT-LAMP or the DETECTR® reactions, in accordance with an embodiment of the present disclosure. These samples were called “COVID Not Detected (ND)” by both the COVID-19 variant DETECTR® assay and the SARS-CoV-2 whole genome sequencing assays. Ct values were obtained from the FDA EUA authorized S TaqPath™ COVID-19 RT-PCR kit. (NoCall = lack of data generated, N/A = assay not run).
DETAILED DESCRIPTION
[0043] Introduction
[0044] Tracking the evolution and spread of pathogenic variants (e.g., SARS-CoV-2 variants) in the community can inform public policy regarding testing and vaccination, as well as guide contact tracing and containment efforts during local outbreaks. In particular, the ability to detect and discriminate genetic phenotypes such as wild-type and/or mutant sequences responsible for disease and infection facilitates a wide range of clinical and epidemiological applications including detection of infectious variants, monitoring the evolution and spread of pathogenic or intervention-resistant strains, and/or discovery of novel target sequences for intervention.
[0045] For example, the emergence of new SARS-CoV-2 variants threatens to substantially prolong the COVID-19 pandemic. SARS-CoV-2 variants, especially Variants of Concern (VOCs), have caused resurgent COVID-19 outbreaks worldwide, even in populations with a high proportion of vaccinated individuals. Mutations in the spike protein, which binds to the human ACE2 receptor, can render the virus more infectious and thereby more transmissible, and/or more capable of evading vaccine or naturally acquired immunity, leading to re-infection. Variant identification can also be clinically significant, as some mutations substantially reduce the effectiveness of monoclonal antibody or convalescent plasma therapies for the disease.
[0046] Traditional methods such as whole genome sequencing (WGS) and single nucleotide polymorphism (SNP) genotyping are commonly used to identify variants but can be limited by long turnaround times and/or the requirement for bulky and expensive laboratory instrumentation. Moreover, conventional diagnostic methods such as quantitative reverse transcriptase PCR (qRT-PCR) can result in false negatives when the concentration of target nucleic acids is low. For instance, viral loads for SARS-CoV-2 can vary during the day and at different stages of infection, such that conventional methods may fail to provide sufficient sensitivity to detect target sequences. Given these deficiencies, there is a need for improved systems and methods for the identification of phenotypically relevant mutations at target loci of interest.
[0047] Accordingly, diagnostic assays based on Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) have been developed for rapid detection of target nucleic acids in biological samples, such as detection of SARS-CoV-2 in clinical samples. A few have obtained Emergency Use Authorization (EUA) from the US Food and Drug Administration (FDA). These assays can be performed at low cost with a sample-to-answer turnaround time of under 2 hours and thus can be suitable for use in point of care settings.
[0048] The systems and methods disclosed herein further overcome the limitations of conventional methods for identification of genetic phenotypes in biological samples by providing improved mutation calling for target loci using programmable nuclease-based and/or amplification-based assays. Advantageously, these systems and methods provide improved sensitivity and accuracy in identifying target nucleic acids due to the use of specific primers during amplification and/or highly precise programmable nucleases for detection of target sequences. Such identification can be used to determine, e.g., pathogenesis, transmission risk, treatment response, diagnosis, and/or epidemiology of pathogens and infectious diseases, as well as tumor DNA and/or cancer-related viruses. For instance, the systems and methods disclosed herein can be used to screen for rare or novel pathogenic variants (e.g., SARS-CoV-2 variants), alone or in conjunction with conventional sequencing methods. Because the sequencing capacity for most clinical and public health laboratories is limited, the systems and methods of the present disclosure would enable rapid detection of newly emerging variants or currently circulating variants that have acquired additional mutations. The information thus obtained could directly inform outbreak investigation and public health containment efforts, such as quarantine decisions. Thus, the systems and methods disclosed herein have potential as a rapid diagnostic test alternative to sequencing-based methods. Furthermore, identification of specific mutations associated with intervention resistance, such as neutralizing antibody evasion in SARS-CoV-2 variants, could guide the care of individual patients, for instance, with regard to the use of monoclonal antibodies that remain effective in treating the infection.
[0049] In particular, one aspect of the present disclosure provides systems and methods for detecting and determining mutation calls for a target locus in a biological sample, such as a single nucleotide polymorphism in a target sequence. A signal dataset is obtained comprising, for each respective well in a plurality of wells in a common plate, a corresponding plurality of reporting signals for a respective one or more nucleic acid molecules in the biological sample that map to the target locus. The signal dataset represents a plurality of time points. The plurality of wells comprises a first set of wells representing a first allele for the target locus, where each well in the first set of wells includes a first plurality of guide nucleic acids that have the first allele of the target locus, and the plurality of wells further comprises a second set of wells representing a second allele for the target locus, where each well in the second set of wells includes a second plurality of guide nucleic acids that have the second allele of the target locus. Each corresponding plurality of reporting signals includes, for each respective time point in the plurality of time points, a respective reporting signal in the form of a corresponding discrete attribute value, and each respective well in the first set of wells and each respective well in the second set of wells contains a corresponding aliquot of nucleic acid derived from the biological sample.
[0050] For each respective well in the plurality of wells, a corresponding signal yield is determined using the corresponding plurality of reporting signals for the respective well across the plurality of time points. Furthermore, for each respective well in the first set of wells, a respective candidate call identity is determined based on a comparison between a corresponding first signal yield for the respective well in the first set of wells and a corresponding second signal yield for a corresponding well in the second set of wells, thereby obtaining a plurality of candidate call identities. A voting procedure across the plurality of candidate call identities for the first set of wells is performed, thereby obtaining a mutation call for the target locus.
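By way of non-limiting illustration, the steps of paragraphs [0049] and [0050] (per-well signal yield, pairwise candidate call, and voting) can be sketched as follows. The specific yield definition (end-point signal minus first reading), the thresholds `min_yield` and `min_ratio`, the strict-majority vote, and all function names are hypothetical choices made for this sketch and are not taken from the present disclosure.

```python
from collections import Counter
from typing import List, Sequence


def signal_yield(signals: Sequence[float]) -> float:
    """Assumed yield: last reporting signal minus the first reporting signal."""
    return signals[-1] - signals[0]


def candidate_call(first_yield: float, second_yield: float,
                   min_yield: float = 100.0, min_ratio: float = 2.0) -> str:
    """Compare the yield of a first-allele well with that of its paired second-allele well."""
    if max(first_yield, second_yield) < min_yield:
        return "NoCall"  # neither well produced enough signal
    if first_yield >= min_ratio * second_yield:
        return "first_allele"
    if second_yield >= min_ratio * first_yield:
        return "second_allele"
    return "NoCall"  # signals too similar to discriminate


def vote(calls: Sequence[str]) -> str:
    """Return the candidate call held by a strict majority of wells, else NoCall."""
    if not calls:
        return "NoCall"
    top, count = Counter(calls).most_common(1)[0]
    return top if 2 * count > len(calls) else "NoCall"


def mutation_call(first_wells: List[Sequence[float]],
                  second_wells: List[Sequence[float]]) -> str:
    """Pair wells by index, derive one candidate call per pair, then vote."""
    calls = [
        candidate_call(signal_yield(first), signal_yield(second))
        for first, second in zip(first_wells, second_wells)
    ]
    return vote(calls)
```

For example, with three first-allele wells and three second-allele wells, two pairs that clearly favor the first allele and one pair with no appreciable signal would give candidate calls of `first_allele`, `first_allele`, and `NoCall`, and the vote would return `first_allele`.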
[0051] In some embodiments, the present disclosure provides various systems and methods for assaying for and detecting single nucleotide polymorphisms (SNPs) in a target sequence. In particular, the various systems and methods disclosed herein use a programmable nuclease complexed with a non-naturally occurring (e.g., engineered) guide nucleic acid sequence to detect the presence or absence of, and/or quantify the amount of, a target sequence having one or more SNPs. In some instances, the various systems and methods disclosed herein are used to distinguish or discriminate between sequences having different mutations or variations therein (and/or between SNP-containing sequences and wild-type sequences). In some embodiments, the SNP-containing target nucleic acids are amplified and/or comprise amplicons. Amplifying SNP-containing target nucleic acids can use, for example, reverse transcription (RT) and/or isothermal amplification (e.g., loop-mediated amplification (LAMP)) or thermal amplification (e.g., polymerase chain reaction (PCR)) of RNA or DNA (e.g., RNA or DNA extracted from a patient sample). Accordingly, disclosed herein is a programmable nuclease-based assay for detection and discrimination between SNPs in a target sequence.
[0052] In exemplary embodiments, the present disclosure is described in relation to systems and methods for coronavirus variant detection or discrimination in a sample. However, one of ordinary skill in the art will appreciate that this is not intended to be limiting and the compositions and methods disclosed herein may be used to detect and/or determine mutations (e.g., SNPs) in other target sequences of interest. For example, the compositions and methods described herein may be used to detect mutations (e.g., SNPs) associated with other viruses or strains or variants thereof, bacteria or strains or variants thereof, diseases, disorders, and/or genetic traits or susceptibilities of interest.
[0053] In an exemplary aspect, the present disclosure provides various systems and methods of use thereof for assaying for and detecting mutations or variations of interest or concern in a segment of a Spike (S) gene of a coronavirus in a sample. In some embodiments, the coronavirus is SARS-CoV-2 (also known as 2019 novel coronavirus, Wuhan coronavirus, or 2019-nCoV), 229E (alpha coronavirus), NL63 (alpha coronavirus), OC43 (beta coronavirus), HKU1 (beta coronavirus), MERS-CoV, or SARS-CoV. In some embodiments, the coronavirus is a variant of SARS-CoV-2, particularly the alpha variant (also referred to herein as the United Kingdom (UK) variant) known as 20B/501Y.V1, VOC 202012/01, or B.1.1.7 lineage; the beta variant (also referred to herein as the South African variant) known as 20C/501Y.V2 or B.1.351 lineage; the delta variant known as B.1.617.2; or the gamma variant known as P.1. The terms “2019-nCoV,” “SARS-CoV-2,” and “COVID-19” may be used interchangeably herein. Exemplary variants of concern or interest are shown in Table 1. The genetic characteristics of these variants are discussed in Leung et al., Early transmissibility assessment of the N501Y mutant strains of SARS-CoV-2 in the United Kingdom, October to November 2020, Euro Surveill. 2021;26(1) and in Tegally et al., Emergence and rapid spread of a new severe acute respiratory syndrome-related coronavirus 2 (SARS-CoV-2) lineage with multiple spike mutations in South Africa, MedRxiv 2020.12.21, each of which is hereby incorporated herein by reference in its entirety. In some embodiments, as illustrated in Example 1, below, with reference to FIGS. 5A-5C, the systems and methods disclosed herein specifically target and assay for a segment of an S-gene of the SARS-CoV-2 coronavirus. In some embodiments, the systems and methods disclosed herein are used to detect the presence or absence of the segment of the S-gene of the SARS-CoV-2 in a patient sample. In some implementations, a patient is diagnosed with COVID-19 if the presence of SARS-CoV-2 is detected in a sample from the patient. In some embodiments, the assays disclosed herein provide single nucleotide target specificity, enabling specific detection of a single coronavirus.
[0054] In some embodiments, candidate call identities are obtained using DETECTR assays, such as those described herein and in Example 1, below. See, for example, Broughton et al., “CRISPR-Cas12-based detection of SARS-CoV-2,” Nature Biotechnology 38, 870-874 (2020), which is hereby incorporated herein by reference in its entirety. In some implementations, DETECTR assays disclosed herein use amplification of samples (e.g., RT and/or isothermal amplification (e.g., LAMP) of RNA (e.g., RNA extracted from a patient sample) and/or PCR), followed by Cas12 detection of predefined target loci (e.g., coronavirus sequences), followed by cleavage of a reporter molecule to detect the presence of nucleic acids in the sample having the target loci (e.g., a viral sequence). For instance, in some embodiments, a DETECTR assay targets the E (envelope) genes or N (nucleoprotein) genes of a coronavirus (e.g., SARS-CoV-2). In some cases, a DETECTR assay targets the S-gene of a coronavirus (e.g., SARS-CoV-2) or coronavirus variant. Isothermal amplification can also be performed to amplify one or more regions of the S gene.
[0055] Table 1: Genetic Changes Characterizing the SARS-CoV-2 Variants of Concern and Variants of Interest
[0056] Any of the regions of the Spike gene comprising the groups of mutations detailed in Table 1 may be selected as target.
[0057] Accordingly, another aspect of the present disclosure provides various compositions and methods of use thereof for assaying for and detecting a SARS-CoV-2 variant in a sample. In some implementations, the various compositions, methods, and reagents disclosed herein use an effector protein complexed with a guide nucleic acid sequence to distinguish among SARS-CoV-2 variants or between a SARS-CoV-2 variant and a wild-type SARS-CoV-2. In some embodiments, the variant of SARS-CoV-2 (also known as 2019 novel coronavirus or 2019-nCoV) is one of the variants known as Alpha (B.1.1.7), Beta (B.1.351), Gamma (P.1), Epsilon (B.1.427 and B.1.429), Kappa (B.1.617.1), Omicron (B.1.1.529), Zeta (P.2), and Delta (B.1.617.2) and lineages thereof. In some embodiments, the assays disclosed herein target the S (spike) gene of the SARS-CoV-2 variant.
[0058] Disclosed herein is a method of detecting a SARS-CoV-2 variant in an individual. In some embodiments, such method comprises a) collecting a nasal swab or a throat swab from the individual; b) optionally extracting a target nucleic acid from the nasal swab or the throat swab; c) amplifying the target nucleic acid to produce an amplification product; d) contacting the amplification product, the target nucleic acid, or a combination thereof, to a composition comprising an effector protein and a non-naturally occurring guide nucleic acid that hybridizes to a segment of the target nucleic acid or amplification product thereof; e) assaying for a change in a signal produced by cleavage of a reporter nucleic acid wherein the effector protein trans cleaves the reporter nucleic acid upon hybridization of the non-naturally occurring guide nucleic acid to the segment of the target nucleic acid or the amplification product thereof; and f) determining a variant call of the sample. In some embodiments, methods described herein comprise amplifying the target nucleic acid (e.g., via reverse transcription-loop-mediated isothermal amplification (RT-LAMP)). In some embodiments, reagents for the amplification reaction comprise an FIP primer, a BIP primer, an F3 primer, a B3 primer, an LF primer, and an LB primer. In some embodiments, the amplification primers are selected from SEQ ID NOS: 8-13, SEQ ID NOS: 14-19, SEQ ID NOS: 28-33, or SEQ ID NOS: 34-39. In some embodiments, determining the variant call of the SARS-CoV-2 variant (e.g., any one of Alpha, Beta, Gamma, Delta, Epsilon, Kappa, Omicron, or Zeta SARS-CoV-2 variants) comprises detecting one or more S-gene mutation(s) (e.g., associated with a L452R, E484K, E484Q, E484A, and N501Y mutation of the S-gene product) relative to a wild-type SARS-CoV-2. In some embodiments, the non-naturally occurring guide nucleic acid comprises a nucleotide sequence that is at least 80%, 90%, 95%, 98% or 100% identical to SEQ ID NO: 1, 2, 3, 4, 5, or 6 or any one of SEQ ID NOS: 22-27 or 40-42. In some embodiments, the reporter nucleic acid comprises a nucleotide sequence that is at least 75% or 100% identical to SEQ ID NO: 7, wherein the nucleotide sequence is flanked with a fluorescent dye on the 5’ end and a quencher on the 3’ end.
[0059] Reference will now be made in detail to embodiments, examples of which are illustrated in the accompanying drawings. In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure. However, it will be apparent to one of ordinary skill in the art that the present disclosure may be practiced without these specific details. In other instances, well-known methods, procedures, components, circuits, and networks have not been described in detail so as not to unnecessarily obscure aspects of the embodiments.
[0060] Definitions.
[0061] As used herein, the terms “individual,” “subject,” and “patient” are used interchangeably and include any member of the animal kingdom, including humans.
[0062] As used herein, the term “percent (%) sequence identity” describes the number of matches (“hits”) of identical nucleotides/amino acids of two or more aligned nucleic acid or amino acid sequences as compared to the number of nucleotides or amino acid residues making up the overall length of the reference nucleic acid or amino acid sequences. In other terms, using an alignment, for two or more sequences or sub-sequences the percentage of amino acid residues or nucleotides that are the same (e.g., 95% identity) may be determined, when the (sub)sequences are compared and aligned for maximum correspondence over a window of comparison, or over a designated region as measured using a sequence comparison algorithm as known in the art, or when manually aligned and visually inspected. This definition also applies to the complement of any sequence to be aligned.
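As a minimal sketch of the percent sequence identity definition above, the following assumes that the two sequences have already been aligned to equal length by some alignment procedure, with gaps written as “-”. The function name `percent_identity` and the gap symbol are assumptions made for this illustration; the alignment step itself is not shown.

```python
def percent_identity(aligned_query: str, aligned_reference: str) -> float:
    """Percent identity of pre-aligned sequences relative to the reference length.

    Matches are counted at aligned positions holding identical residues;
    the denominator is the number of reference residues (gaps excluded).
    """
    if len(aligned_query) != len(aligned_reference):
        raise ValueError("aligned sequences must have equal length")
    reference_length = sum(1 for r in aligned_reference if r != "-")
    if reference_length == 0:
        raise ValueError("reference contains no residues")
    matches = sum(
        1
        for q, r in zip(aligned_query.upper(), aligned_reference.upper())
        if r != "-" and q == r
    )
    return 100.0 * matches / reference_length
```

For instance, a 10-nucleotide query that differs from a 10-nucleotide reference at one aligned position has 9 matches over a reference length of 10, i.e., 90% identity.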
[0063] In general terms, “amplification” or “amplifying” is a process by which a nucleic acid molecule is enzymatically copied to generate a progeny population with the same sequence as the parental one. For example, in some embodiments, the method provided herein includes performing, one or more times, hybridizing a primer to the target nucleic acid and extending a nucleic acid complementary to the strand of the target nucleic acid from the primer using a polymerase, e.g., until a desired amount of amplification is achieved. In some instances, amplification of the target nucleic acid increases the concentration of the target nucleic acid in the sample relative to the concentration of nucleic acids that do not correspond to the target nucleic acid.
[0064] In general, the term ‘cleavage assay’ refers to an assay designed to visualize, quantitate or identify the cleavage activity of an effector protein. In some cases, the cleavage activity may be cis-cleavage activity. In some cases, the cleavage activity may be trans-cleavage activity. In some cases, the effector protein is an activated CRISPR effector protein. The term ‘activated effector protein’ refers to an effector protein associated with a guide RNA bound to a target RNA, thereby forming an ‘activated’ complex capable of exhibiting trans- or cis-cleavage.
[0065] The term “CRISPR-RNA” or “crRNA” or “spacer” refers to an RNA molecule having a sequence with sufficient complementarity to a target nucleic acid sequence to direct sequence-specific binding of an RNA-targeting complex to the target RNA sequence. In some embodiments, crRNAs contain a sequence that mediates target recognition and a sequence that duplexes with a tracrRNA. In some embodiments, the crRNA and tracrRNA duplex are present as parts of a single larger guide RNA molecule.
[0066] In some cases, the crRNA comprises a repeat region that interacts with the effector protein. The repeat region may also be referred to as a “protein-binding segment.” Typically, the repeat region is adjacent to the spacer region. For example, a guide nucleic acid that interacts with the effector protein may comprise a repeat region that is 5’ of the spacer region. The spacer region of the guide nucleic acid may be complementary to (e.g., hybridize to) a target sequence of a target nucleic acid. In some cases, the spacer region is 15-28 linked nucleosides in length. In some cases, the spacer region is 15-26, 15-24, 15-22, 15-20, 15-18, 16-28, 16-26, 16-24, 16-22, 16-20, 16-18, 17-26, 17-24, 17-22, 17-20, 17-18, 18-26, 18-24, or 18-22 linked nucleosides in length. In some cases, the spacer region is 18-24 linked nucleosides in length. In some cases, the spacer region is at least 15 linked nucleosides in length. In some cases, the spacer region is at least 16, 18, 20, or 22 linked nucleosides in length. In some cases, the spacer region comprises at least 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 nucleotides. In some cases, the spacer region is at least 17 linked nucleosides in length. In some cases, the spacer region is at least 18 linked nucleosides in length.
[0067] A positive “detectable signal” may be any signal that can be detected using optical, fluorescent, chemiluminescent, electrochemical or other detection methods known in the art. In some embodiments, a first detectable signal may be generated by binding of the CRISPR effector protein complex to the target nucleic acid, indicating that the sample contains the target nucleic acid. In some embodiments, a detectable signal may be generated upon the cleavage event, the event comprising the cleavage of a detector nucleic acid by a Cas effector protein. Alternatively, or in combination, the detectable signal may be generated indirectly by the cleavage event. In some embodiments, the cleavage event is a trans-cleavage reaction or a cis-cleavage reaction.
[0068] As used herein, the term “effector protein”, “CRISPR effector”, “effector”, “CRISPR-associated protein” or “CRISPR enzyme” refers to a polypeptide, or a fragment thereof, possessing enzymatic activity, and that is capable of binding to a target nucleic acid molecule with the support of a guide nucleic acid molecule. In some embodiments, the binding is sequence-specific. In some embodiments, the guide nucleic acid molecule is DNA or RNA. In some cases, the target nucleic acid molecule may be DNA or RNA. In some cases, the term “effector protein” refers to a protein that is capable of modifying a nucleic acid molecule (e.g., by cleavage, deamination, recombination). In some embodiments, such enzymatic activities include, but are not limited to, nuclease activity, methyltransferase activity, demethylase activity, DNA repair activity, DNA damage activity, deamination activity, dismutase activity, alkylation activity, depurination activity, oxidation activity, pyrimidine dimer forming activity, integrase activity, transposase activity, recombinase activity, polymerase activity, ligase activity, helicase activity, photolyase activity, and glycosylase activity.
[0069] As used herein, the term “guide nucleic acid” or “guide sequence” is any polynucleotide sequence having sufficient complementarity with a target polynucleotide sequence to hybridize with the target sequence and direct binding of a nucleic acid-targeting complex to the target sequence. In different embodiments, the term “gRNA”, as used herein, refers to any RNA molecule that supports the targeting of an effector protein described herein to a target nucleic acid. For example, “gRNAs” include, but are not limited to, crRNAs or crRNAs in combination with associated trans-activating RNAs (tracrRNAs). The latter can be independent RNAs or portions of sequences thereof can be fused into a single RNA using a linker. In different embodiments, the gRNA is designed to contain a chemical or biochemical modification. In some embodiments, a gRNA can contain one or more nucleotides. The terms “non-naturally occurring” or “engineered” are used interchangeably and indicate the involvement of the hand of man. The terms, when referring to nucleic acid molecules or polypeptides, mean that the nucleic acid molecule or the polypeptide is at least substantially free from at least one other component with which it is naturally associated in nature and as found in nature.
[0070] As used herein, the term “nuclease activity” refers to the enzymatic activity of an enzyme which allows the enzyme to cleave the phosphodiester bonds between the nucleotide subunits of nucleic acids; the term “endonuclease activity” refers to the enzymatic activity of an enzyme which allows the enzyme to cleave the phosphodiester bond within a polynucleotide chain.
[0071] “Nucleotide” refers to a base-sugar-phosphate compound. Nucleotides are the monomeric subunits of both types of nucleic acid polymers, RNA and DNA. “Nucleotide” also refers to ribonucleoside triphosphates, such as rATP, rGTP, rUTP, and rCTP, and deoxyribonucleoside triphosphates, such as dATP, dGTP, dTTP, and dCTP. As used herein, a “nucleoside” refers to a base-sugar combination without a phosphate group. “Base” refers to the nitrogen-containing base, for example, adenine (A), cytosine (C), guanine (G), thymine (T), and uracil (U). In general, a 2’-deoxyribonucleoside-5’-triphosphate (dNTP) refers to a base-sugar-phosphate compound, which has hydrogen at the 2’ position of the sugar, including, but not limited to, the four common deoxyribose-containing substrates (dATP, dCTP, dGTP, dTTP), and derivatives and analogs thereof. In general, a ribonucleoside-5’-triphosphate (rNTP) refers to a base-sugar-phosphate compound, which has a hydroxyl group at the 2’ position of the sugar, including, but not limited to, the four common ribose-containing substrates for an RNA polymerase (ATP, CTP, GTP, and UTP), and derivatives and analogs thereof. Each nucleotide in a double stranded DNA or RNA molecule is paired with its Watson-Crick counterpart, called its complementary nucleotide. In a double stranded DNA or RNA sequence, the upper (sense) strand sequence is, in general, understood as going in the direction from its 5'- to 3'-end, and the complementary sequence is thus understood as the sequence of the lower (antisense) strand in the same direction as the upper strand. Following the same logic, the reverse sequence is understood as the sequence of the upper strand in the direction from its 3'- to its 5'-end, while the ‘reverse complement’
sequence or the ‘reverse complementary’ sequence is understood as the sequence of the lower strand in the direction of its 5'- to its 3'-end.
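By way of illustration only, the strand conventions defined above can be sketched in a few lines of Python. The function names and the example sequence are hypothetical and are not part of the disclosure; the sketch assumes a DNA sequence written 5' to 3' using only the letters A, C, G, and T.

```python
# Illustrative sketch only: names and the example sequence are assumptions.
# Watson-Crick pairing for DNA; an RNA version would pair A with U instead of T.
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(upper_strand: str) -> str:
    """Lower-strand bases, read in the same direction as the upper strand."""
    return upper_strand.upper().translate(DNA_COMPLEMENT)


def reverse(upper_strand: str) -> str:
    """Upper-strand bases, read from the 3' end to the 5' end."""
    return upper_strand.upper()[::-1]


def reverse_complement(upper_strand: str) -> str:
    """Lower-strand bases, read from the lower strand's 5' end to its 3' end."""
    return complement(upper_strand)[::-1]


upper = "ATGCCA"
print(complement(upper))          # TACGGT
print(reverse(upper))             # ACCGTA
print(reverse_complement(upper))  # TGGCAT
```

In this sketch the reverse complement is simply the complement read backwards, which matches the definition of the lower strand read in its own 5' to 3' direction.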
[0072] As used herein, “reporter” is used interchangeably with “reporter nucleic acid,” “reporter molecule,” or “detector nucleic acid.” As used herein, the term “reporter” or “reporter nucleic acid” refers generally to an off-target nucleic acid molecule that is capable of providing a detectable signal upon cleavage by an ‘activated’ effector protein. By way of non-limiting and illustrative example, a reporter may comprise a single stranded nucleic acid and a detection moiety (e.g., a labeled single stranded RNA reporter), wherein the nucleic acid is capable of being cleaved by an effector protein (e.g., a CRISPR/Cas protein as disclosed herein) or a multimeric complex thereof, releasing the detection moiety, and generating a detectable signal.
[0073] The effector proteins disclosed herein, activated upon hybridization of a guide nucleic acid to a target nucleic acid, may cleave the reporter. As used herein, an ‘activated’ effector protein is a CRISPR nuclease that is activated upon hybridization of a guide nucleic acid to a target nucleic acid. In various embodiments, the reporter nucleic acid is further attached to a moiety, chemical compound or other component that can be used to visualize, quantitate or identify the cleavage of the reporter molecule. Cleaving the “reporter” may be referred to herein as cleaving the “reporter nucleic acid,” the “reporter molecule,” or the “nucleic acid of the reporter.”
[0074] In some embodiments, reporters may comprise RNA. Reporters may comprise DNA. In some embodiments, reporters may be double-stranded. In some embodiments, reporters may be single-stranded. In some instances, reporters comprise a protein capable of generating a signal. A signal may be a calorimetric, potentiometric, amperometric, optical (e.g., fluorescent, colorimetric, etc.), or piezo-electric signal. In some instances, the reporter comprises a detection moiety. Suitable detectable labels and/or moieties that may provide a signal include, but are not limited to, an enzyme, a radioisotope, a member of a specific binding pair, a fluorophore, a fluorescent protein, a quantum dot, and the like. In some instances, the reporter comprises a detection moiety and a quenching moiety. In some instances, the reporter comprises a cleavage site, wherein the detection moiety is located at a first site on the reporter and the quenching moiety is located at a second site on the reporter, wherein the first site and the second site are separated by the cleavage site. Sometimes the quenching moiety is a fluorescence quenching moiety. In some instances, the quenching moiety is 5' to the cleavage site and the detection moiety is 3' to the cleavage site. In some
instances, the detection moiety is 5' to the cleavage site and the quenching moiety is 3' to the cleavage site. Sometimes the quenching moiety is at the 5' terminus of the nucleic acid of a reporter. Sometimes the detection moiety is at the 3' terminus of the nucleic acid of a reporter. In some instances, the detection moiety is at the 5' terminus of the nucleic acid of a reporter. In some instances, the quenching moiety is at the 3' terminus of the nucleic acid of a reporter.
[0075] As used herein, the terms “trans-activating crRNA” or “tracrRNA” refer to a transactivating or transactivated RNA molecule or a fragment thereof which contains a sequence with sufficient complementarity to allow association with a crRNA molecule. In some embodiments, the tracrRNA can refer to an RNA molecule that provides a scaffold for binding to a CRISPR effector protein. In some embodiments, the tracrRNA forms a structure facilitating the binding of a CRISPR-associated or a CRISPR effector protein to a specific target nucleic acid. In some embodiments, at least a portion of a crRNA sequence and at least a portion of a tracrRNA sequence are present as parts of a larger single guide RNA molecule.
[0076] The terms, “sgRNA”, “single guide RNA” or “single guide nucleic acid,” as used herein refer to a single nucleic acid system comprising: a first nucleotide sequence that binds non-covalently with an effector protein; and a second nucleotide sequence that hybridizes to a target nucleic acid. In some embodiments, the first nucleotide sequence is referred to as a handle sequence. A handle sequence may comprise at least a portion of a tracrRNA sequence, at least a portion of a repeat sequence, or a combination thereof. A sgRNA does not comprise a tracrRNA.
[0077] In the context of formation of a DNA or RNA-targeting complex, “target sequence” refers to a DNA or RNA sequence to which a DNA or RNA-targeting guide RNA is designed to have complementarity, where hybridization between a target sequence and a guide RNA promotes the association of the CRISPR effector protein with the target RNA.
[0078] As used herein, a “tissue sample” is any biological sample derived from a patient. This term includes, but is not limited to, biological fluids such as blood, serum, plasma, urine, cerebrospinal fluid, tears, saliva, lymph, dialysate, lavage fluid, semen, and other biological fluid samples. Cells and tissues of scientific origin are included as tissue samples. The term also includes cells or cells derived therefrom and their progeny, including cells in culture, cell supernatants, and cell lysates. This definition includes samples that have been manipulated in any way after they are obtained, such as treatment with reagents, solubilization, or
enrichment of specific components such as polynucleotides or polypeptides. The term also includes derivatives and fractions of patient samples.
[0079] As used herein “trans-cleavage activity” also referred to as “collateral” or “transcollateral” cleavage may be non-specific cleavage of nearby single-stranded nucleic acid by an activated effector protein, such as trans cleavage of detector nucleic acids with a detection moiety.
[0080] In general, the term “trans cleavage assay” refers to an assay designed to visualize, quantitate or identify the trans-collateral activity of an activated effector protein. As used herein, an ‘activated’ effector protein is a CRISPR nuclease that is activated upon hybridization of a guide nucleic acid to a target nucleic acid, thereby inducing the formation of the nuclease-guide complex. A “DETECTR®” assay, as used herein, is a trans cleavage assay.
[0081] Disclosed herein are non-naturally occurring compositions and systems comprising at least one of an engineered effector protein and an engineered guide nucleic acid, which may simply be referred to herein as an effector protein and a guide nucleic acid, respectively. In general, an engineered effector protein and an engineered guide nucleic acid refer to an effector protein and a guide nucleic acid, respectively, that are not found in nature. In some instances, systems and compositions comprise at least one non-naturally occurring component. For example, compositions and systems may comprise a guide nucleic acid, wherein the sequence of the guide nucleic acid is different or modified from that of a naturally-occurring guide nucleic acid. In some instances, compositions and systems comprise at least two components that do not naturally occur together. For example, compositions and systems may comprise a guide nucleic acid comprising a repeat region and a spacer region which do not naturally occur together. Also, by way of example, compositions and systems may comprise a guide nucleic acid and a Cas protein that do not naturally occur together. Conversely, and for clarity, an effector protein or guide nucleic acid that is “natural,” “naturally-occurring,” or “found in nature” includes Cas proteins and guide nucleic acids from cells or organisms that have not been genetically modified by a human or machine.
[0082] In some instances, the guide nucleic acid comprises a non-natural nucleobase sequence. In some instances, the non-natural sequence is a nucleobase sequence that is not found in nature. The non-natural sequence may comprise a portion of a naturally-occurring
sequence, wherein the portion of the naturally-occurring sequence is not present in nature absent the remainder of the naturally-occurring sequence. In some instances, the guide nucleic acid comprises two naturally-occurring sequences arranged in an order or proximity that is not observed in nature. In some instances, compositions and systems comprise a ribonucleotide complex comprising a CRISPR/Cas effector protein and a guide nucleic acid that do not occur together in nature. Engineered guide nucleic acids may comprise a first sequence and a second sequence that do not occur naturally together. For example, an engineered guide nucleic acid may comprise a sequence of a naturally-occurring repeat region and a spacer region that is complementary to a naturally-occurring eukaryotic sequence. The engineered guide nucleic acid may comprise a sequence of a repeat region that occurs naturally in an organism and a spacer region that does not occur naturally in that organism. An engineered guide nucleic acid may comprise a first sequence that occurs in a first organism and a second sequence that occurs in a second organism, wherein the first organism and the second organism are different. The guide nucleic acid may comprise a third sequence at a 3’ or 5’ end of the guide nucleic acid, or between the first and second sequences of the guide nucleic acid. For example, an engineered guide nucleic acid may comprise a naturally occurring crRNA and tracrRNA coupled by a linker sequence. Alternatively, an engineered guide nucleic acid may comprise at least a portion of a crRNA sequence and at least a portion of a tracrRNA sequence.
[0083] In some instances, compositions and systems described herein comprise an engineered effector protein that is similar to a naturally occurring effector protein. The engineered effector protein may lack a portion of the naturally occurring effector protein. The effector protein may comprise a mutation relative to the naturally-occurring effector protein, wherein the mutation is not found in nature. The effector protein may also comprise at least one additional amino acid relative to the naturally-occurring effector protein. In certain embodiments, the nucleotide sequence encoding the effector protein is codon optimized relative to the naturally occurring sequence.
[0084] Programmable Nucleases.
[0085] Disclosed herein are programmable nucleases and uses thereof, e.g., detection of target nucleic acids. In some cases, a programmable nuclease is capable of being activated when complexed with the guide nucleic acid and the target nucleic acid segment. A programmable nuclease can be capable of being activated when complexed with a guide nucleic acid and the target sequence. The programmable nuclease can be activated upon
binding of the guide nucleic acid to its target nucleic acid and can non-specifically degrade a non-target nucleic acid in its environment. The programmable nuclease has trans-cleavage activity once activated. A programmable nuclease can be a Cas protein (also referred to, interchangeably, as a Cas nuclease or Cas effector protein). A guide nucleic acid (e.g., crRNA) and Cas protein can form a CRISPR enzyme.
[0086] In some embodiments, one or more programmable nucleases as disclosed herein can be activated to initiate trans-cleavage activity of a reporter (also referred to herein as a reporter molecule). A programmable nuclease as disclosed herein can, in some cases, bind to a target sequence or target nucleic acid to initiate trans-cleavage of a reporter. The programmable nuclease can be referred to as an RNA-activated programmable RNA nuclease. In some instances, the programmable nuclease as disclosed herein can bind to a target DNA to initiate trans-cleavage of an RNA reporter. Such a programmable nuclease can be referred to herein as a DNA-activated programmable RNA nuclease. In some cases, a programmable nuclease as described herein can be activated by a target RNA or a target DNA. For example, a programmable nuclease, e.g., a Cas enzyme, can be activated by a target RNA nucleic acid or a target DNA nucleic acid to cleave RNA reporters. In some embodiments, the programmable nuclease can bind to a target ssDNA which initiates trans-cleavage of RNA reporters. In some instances, a programmable nuclease as disclosed herein can bind to a target DNA to initiate trans-cleavage of a DNA reporter, and this programmable nuclease can be referred to as a DNA-activated programmable DNA nuclease.
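By way of illustration only, the naming convention used in the preceding paragraph, in which the first term names the activating target nucleic acid and the second names the reporter that is cleaved in trans, can be sketched as follows. The function name and its arguments are hypothetical and are not part of the disclosure.

```python
# Illustrative sketch only: the function and argument names are assumptions.
VALID_TYPES = {"RNA", "DNA"}


def describe_nuclease(activator: str, reporter: str) -> str:
    """Build the descriptive label from the activating target and the cleaved reporter."""
    activator, reporter = activator.upper(), reporter.upper()
    if activator not in VALID_TYPES or reporter not in VALID_TYPES:
        raise ValueError("activator and reporter must each be 'RNA' or 'DNA'")
    return f"{activator}-activated programmable {reporter} nuclease"


print(describe_nuclease("DNA", "RNA"))  # DNA-activated programmable RNA nuclease
print(describe_nuclease("DNA", "DNA"))  # DNA-activated programmable DNA nuclease
```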
[0087] The programmable nuclease can become activated after binding of a guide nucleic acid that is complexed with the programmable nuclease with a target nucleic acid, and the activated programmable nuclease can cleave the target nucleic acid, which can result in a trans-cleavage activity. Trans-cleavage activity can be non-specific cleavage of nearby single-stranded nucleic acids by the activated programmable nuclease, such as trans-cleavage of reporter nucleic acids comprising a detection moiety. Once the reporter is cleaved by the activated programmable nuclease, the detection moiety can be released or separated from the reporter and can directly or indirectly generate a detectable signal. The reporter and/or the detection moiety can be immobilized on a support medium. Often the detection moiety is at least one of a fluorophore, a dye, a polypeptide, or a nucleic acid. In some embodiments, the detection moiety binds to a capture molecule on the support medium to be immobilized. The detectable signal can be visualized on the support medium to assess the presence or
concentration of one or more target nucleic acids associated with an ailment, such as a SNP associated with a disease, cancer, or genetic disorder.
[0088] In some embodiments, a programmable nuclease is any enzyme that can be or has been designed, modified, or engineered by human contribution so that the enzyme targets or cleaves a nucleic acid in a sequence-specific manner. Programmable nucleases can include, for example, zinc-finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), and/or RNA-guided nucleases such as the bacterial clustered regularly interspaced short palindromic repeat (CRISPR)-Cas (CRISPR-associated) nucleases or Cpf1. Programmable nucleases can also include, for example, PfAgo and/or NgAgo.
[0089] Several programmable nucleases are consistent with the methods and devices of the present disclosure. For example, Cas proteins are programmable nucleases used in the methods and systems disclosed herein. Cas proteins can include any of the known Classes and Types of CRISPR/Cas enzymes. Programmable nucleases disclosed herein include Class 1 Cas proteins, such as the Type I, Type IV, or Type III Cas proteins. Programmable nucleases disclosed herein also include the Class 2 Cas proteins, such as the Type II, Type V, and Type VI Cas proteins. Programmable nucleases included in the methods disclosed herein and methods of use thereof include Type V or Type VI Cas proteins.
[0090] In some instances, the programmable nuclease is a Type V Cas protein. In general, a Type V Cas effector protein comprises a RuvC domain but lacks an HNH domain. In most instances, the RuvC domain of the Type V Cas effector protein comprises three partial RuvC domains (RuvC-I, RuvC-II, and RuvC-III, also referred to herein as subdomains). In some instances, the three RuvC subdomains are located within the C-terminal half of the Type V Cas effector protein. In some instances, none of the RuvC subdomains are located at the N terminus of the protein. In some instances, the RuvC subdomains are contiguous. In some instances, the RuvC subdomains are not contiguous with respect to the primary amino acid sequence of the Type V Cas protein, but form a RuvC domain once the protein is produced and folds. In some instances, there are zero to about 50 amino acids between the first and second RuvC subdomains. In some instances, there are zero to about 50 amino acids between the second and third RuvC subdomains. In some instances, the Cas effector is a Cas14 effector. In some instances, the Cas14 effector is a Cas14a, Cas14a1, Cas14b, Cas14c, Cas14d, Cas14e, Cas14f, Cas14g, Cas14h, or Cas14u effector. In some instances, the Cas effector is a CasPhi (also referred to herein as a CasΦ) effector. In some instances, the Cas effector is a Cas12 effector. In some instances, the Cas12 effector is a Cas12a, Cas12b, Cas12c, Cas12d, Cas12e, or Cas12j effector.
[0091] In some instances, the Type V Cas protein comprises a Cas14 protein. Cas14 proteins can comprise a bilobed structure with distinct amino-terminal and carboxy-terminal domains. The amino- and carboxy-terminal domains can be connected by a flexible linker. The flexible linker can affect the relative conformations of the amino- and carboxy-terminal domains. The flexible linker can be short, for example less than 10 amino acids, less than 8 amino acids, less than 6 amino acids, less than 5 amino acids, or less than 4 amino acids in length. The flexible linker can be sufficiently long to enable different conformations of the amino- and carboxy-terminal domains among two Cas14 proteins of a Cas14 dimer complex (e.g., the relative orientations of the amino- and carboxy-terminal domains differ between two Cas14 proteins of a Cas14 homodimer complex). The linker domain can comprise a mutation which affects the relative conformations of the amino- and carboxy-terminal domains. The linker can comprise a mutation which affects Cas14 dimerization. For example, a linker mutation can enhance the stability of a Cas14 dimer.
[0092] In some instances, the amino-terminal domain of a Cas14 protein comprises a wedge domain, a recognition domain, a zinc finger domain, or any combination thereof. The wedge domain can comprise a multi-strand β-barrel structure. A multi-strand β-barrel structure can comprise an oligonucleotide/oligosaccharide-binding fold that is structurally comparable to those of some Cas12 proteins. The recognition domain and the zinc finger domain can each (individually or collectively) be inserted between β-barrel strands of the wedge domain. The recognition domain can comprise a 4-α-helix structure, structurally comparable to but shorter than those found in some Cas12 proteins. The recognition domain can comprise a binding affinity for a guide nucleic acid or for a guide nucleic acid-target nucleic acid heteroduplex. In some cases, a REC lobe can comprise a binding affinity for a PAM sequence in the target nucleic acid. The amino-terminal domain can comprise a wedge domain, a recognition domain, and a zinc finger domain. The carboxy-terminal domain can comprise a RuvC domain, a zinc finger domain, or any combination thereof. The carboxy-terminal domain can comprise one RuvC and one zinc finger domain.
[0093] In some embodiments, Cas14 proteins comprise a RuvC domain or a partial RuvC domain. The RuvC domain can be defined by a single, contiguous sequence, or a set of partial RuvC domains that are not contiguous with respect to the primary amino acid sequence of the Cas14 protein. In some instances, a partial RuvC domain does not have any substrate binding activity or catalytic activity on its own. A Cas14 protein of the present disclosure can include multiple partial RuvC domains, which can combine to generate a RuvC domain with substrate binding or catalytic activity. For example, a Cas14 can include 3 partial RuvC domains (RuvC-I, RuvC-II, and RuvC-III, also referred to herein as subdomains) that are not contiguous with respect to the primary amino acid sequence of the Cas14 protein but form a RuvC domain once the protein is produced and folds. A Cas14 protein can comprise a linker loop connecting a carboxy terminal domain of the Cas14 protein with the amino terminal domain of the Cas14 protein, where the carboxy terminal domain comprises one or more RuvC domains and the amino terminal domain comprises a recognition domain.
[0094] In some embodiments, Cas14 proteins comprise a zinc finger domain. In some instances, a carboxy terminal domain of a Cas14 protein comprises a zinc finger domain. In some instances, an amino terminal domain of a Cas14 protein comprises a zinc finger domain. In some instances, the amino terminal domain comprises a wedge domain (e.g., a multi-β-barrel wedge structure), a zinc finger domain, or any combination thereof. In some cases, the carboxy terminal domain comprises the RuvC domains and a zinc finger domain, and the amino terminal domain comprises a recognition domain, a wedge domain, and a zinc finger domain.
[0095] In some instances, the Type V Cas protein is a CasΦ protein. A CasΦ protein can function as an endonuclease that catalyzes cleavage at a specific sequence in a target nucleic acid. A programmable CasΦ nuclease can have a single active site in a RuvC domain that is capable of catalyzing pre-crRNA processing and nicking or cleaving of nucleic acids. This compact catalytic site can render the programmable CasΦ nuclease especially advantageous for genome engineering and new functionalities for genome manipulation.
[0096] In some instances, the programmable nuclease is a Type VI Cas protein. In some embodiments, the Type VI Cas protein is a programmable Cas13 nuclease. The general architecture of a Cas13 protein includes an N-terminal domain and two HEPN (higher eukaryotes and prokaryotes nucleotide-binding) domains separated by two helical domains. The HEPN domains each comprise an R-X4-H motif. Shared features across Cas13 proteins include that upon binding of the crRNA of the guide nucleic acid to a target nucleic acid, the protein undergoes a conformational change to bring together the HEPN domains and form a catalytically active RNase. Thus, two activatable HEPN domains are characteristic of a programmable Cas13 nuclease of the present disclosure. However, programmable Cas13 nucleases also consistent with the present disclosure include Cas13 nucleases comprising mutations in the HEPN domain that enhance the Cas13 protein's cleavage efficiency or mutations that catalytically inactivate the HEPN domains. Programmable Cas13 nucleases consistent with the present disclosure also include Cas13 nucleases comprising catalytic components. In some instances, the Cas effector is a Cas13 effector. In some instances, the Cas13 effector is a Cas13a, a Cas13b, a Cas13c, a Cas13d, or a Cas13e effector protein.
[0097] In some cases, the programmable nuclease is Cas13. In some instances, the Cas13 is Cas13a, Cas13b, Cas13c, Cas13d, or Cas13e. In some cases, the programmable nuclease is Mad7 or Mad2. In some cases, the programmable nuclease is Cas12. In some embodiments, the Cas12 is Cas12a, Cas12b, Cas12c, Cas12d, or Cas12e. In some cases, the programmable nuclease is Csm1, Cas9, C2c4, C2c8, C2c5, C2c10, C2c9, or CasZ. In some embodiments, the Csm1 is also called smCms1, miCms1, obCms1, or suCms1. In some embodiments, Cas13a is called C2c2. In some embodiments, CasZ is called Cas14a, Cas14b, Cas14c, Cas14d, Cas14e, Cas14f, Cas14g, or Cas14h. In some embodiments, the programmable nuclease is a type V CRISPR-Cas system. In some cases, the programmable nuclease is a type VI CRISPR-Cas system. In some embodiments, the programmable nuclease is a type III CRISPR-Cas system. In some cases, the programmable nuclease is from at least one of Leptotrichia shahii (Lsh), Listeria seeligeri (Lse), Leptotrichia buccalis (Lbu), Leptotrichia wadei (Lwa), Rhodobacter capsulatus (Rca), Herbinix hemicellulosilytica (Hhe), Paludibacter propionicigenes (Ppr), Lachnospiraceae bacterium (Lba), [Eubacterium] rectale (Ere), Listeria newyorkensis (Lny), Clostridium aminophilum (Cam), Prevotella sp. (Psm), Capnocytophaga canimorsus (Cca), Lachnospiraceae bacterium (Lba), Bergeyella zoohelcum (Bzo), Prevotella intermedia (Pin), Prevotella buccae (Pbu), Alistipes sp. (Asp), Riemerella anatipestifer (Ran), Prevotella aurantiaca (Pau), Prevotella saccharolytica (Psa), Prevotella intermedia (Pin2), Capnocytophaga canimorsus (Cca), Porphyromonas gulae (Pgu), Prevotella sp. (Psp), Porphyromonas gingivalis (Pig), Prevotella intermedia (Pin3), Enterococcus italicus (Ei), Lactobacillus salivarius (Ls), or Thermus thermophilus (Tt). In some embodiments, the Cas13 is at least one of LbuCas13a, LwaCas13a, LbaCas13a, HheCas13a, PprCas13a, EreCas13a, CamCas13a, or LshCas13a. 
The trans-cleavage activity of the programmable nuclease can be activated when the guide nucleic acid is complexed with the target nucleic acid. In some embodiments, the target nucleic acid is RNA or DNA.
[0098] In some embodiments, the programmable nuclease comprises a Cas12 protein, where the Cas12 enzyme binds and cleaves double stranded DNA and single stranded DNA. In some embodiments, the programmable nuclease comprises a Cas13 protein, where the Cas13 enzyme binds and cleaves single stranded RNA. In some embodiments, the programmable nuclease comprises a Cas14 protein, where the Cas14 enzyme binds and cleaves both double stranded DNA and single stranded DNA.
[0099] Table 2 provides illustrative amino acid sequences of programmable nucleases having trans-cleavage activity. In some instances, programmable nucleases described herein comprise an amino acid sequence that is at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, at least 99%, or 100% identical to any one of SEQ ID NOS: 45-117. In some embodiments, the programmable nuclease consists of an amino acid sequence that is at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100% identical to any one of SEQ ID NOS: 45-117. In some embodiments, the programmable nuclease comprises at least about 50, at least about 100, at least about 150, at least about 200, at least about 250, at least about 300, at least about 350, at least about 400, at least about 450, or at least about 500 consecutive amino acids of any one of SEQ ID NOS: 45-117.
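By way of illustration only, the two sequence criteria recited above (percent identity to a reference sequence, and a run of consecutive amino acids shared with a reference sequence) can be sketched as follows. The function names and the short example sequences are hypothetical and do not correspond to any SEQ ID NO. The sketch assumes the two sequences have already been aligned to equal length with "-" marking gaps, and it counts identity over all aligned columns; other definitions of percent identity use a different denominator and would give different values.

```python
# Illustrative sketch only: names, sequences, and the identity convention are assumptions.
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns holding the same residue (gap columns never match)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("aligned sequences must be non-empty")
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    return 100.0 * matches / len(aligned_a)


def shares_consecutive_run(candidate: str, reference: str, k: int) -> bool:
    """True if the candidate contains any k consecutive residues of the reference."""
    if k <= 0 or k > len(reference):
        return False
    return any(reference[i:i + k] in candidate for i in range(len(reference) - k + 1))


reference = "MKTAYIAKQR"  # hypothetical reference fragment
variant = "MKTAYLAKQR"    # one substitution relative to the reference
print(percent_identity(reference, variant))              # 90.0
print(percent_identity(reference, variant) >= 65.0)      # True
print(shares_consecutive_run("XXMKTAYXX", reference, 5))  # True
```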
[00101] In some instances, effector proteins disclosed herein are engineered proteins. Engineered proteins are not identical to a naturally-occurring protein. Engineered proteins can provide enhanced nuclease or nickase activity as compared to a naturally occurring nuclease or nickase. An engineered protein can comprise a modified form of a wild-type counterpart protein. In some instances, effector proteins comprise at least one amino acid change (e.g., deletion, insertion, or substitution) that enhances or reduces the nucleic acid-cleaving activity of the effector protein relative to the wild-type counterpart.
[00102] In some embodiments, a programmable nuclease is thermostable. In some instances, known programmable nucleases (e.g., Cas12 nucleases) are relatively thermosensitive and only exhibit activity (e.g., cis and/or trans-cleavage) sufficient to produce a detectable signal in a diagnostic assay at temperatures less than 40 °C, and optimally at about 37 °C. A thermostable protein can have enzymatic activity, stability, or folding comparable to those at 37 °C. In some instances, the trans-cleavage activity (e.g., the maximum trans-cleavage rate as measured by fluorescent signal generation) of a programmable nuclease in a trans-cleavage assay at 40 °C, 45 °C, 50 °C, 55 °C, 60 °C, 65 °C, 70 °C, 75 °C, 80 °C, or more is at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 100%, at least 1-fold, at least 2-fold, at least 3-fold, at least 4-fold, at least 5-fold, at least 6-fold, at least 7-fold, at least 8-fold, at least 9-fold, at least 10-fold, at least 11-fold, at least 12-fold, at least 13-fold, at least
14-fold, at least 15-fold, at least 20-fold, at least 25-fold, at least 30-fold, at least 35-fold, at least 40-fold, at least 45-fold, at least 50-fold or more of that at 37 °C.
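By way of illustration only, the comparison described above, in which trans-cleavage activity at an elevated temperature is expressed relative to the activity at 37 °C, can be sketched as follows. The function names, the example rates, and the 0.5 default threshold (corresponding to "at least 50%") are hypothetical; the rates are assumed to be maximum trans-cleavage rates in arbitrary but identical units.

```python
# Illustrative sketch only: names, units, and example values are assumptions.
def relative_activity(rate_at_temp: float, rate_at_37c: float) -> float:
    """Activity at the test temperature as a fraction of the activity at 37 C."""
    if rate_at_37c <= 0:
        raise ValueError("the 37 C reference rate must be positive")
    return rate_at_temp / rate_at_37c


def meets_threshold(rate_at_temp: float, rate_at_37c: float,
                    min_fraction: float = 0.5) -> bool:
    """True if the relative activity is at least min_fraction (0.5 means 50%)."""
    return relative_activity(rate_at_temp, rate_at_37c) >= min_fraction


# Hypothetical maximum rates measured at 55 C and at 37 C.
print(meets_threshold(8.0, 10.0))                    # True: 80% is at least 50%
print(meets_threshold(8.0, 10.0, min_fraction=2.0))  # False: not at least 2-fold
```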
[00103] Engineered Guide Nucleic Acids.
[00104] As used herein, a guide nucleic acid refers to a nucleic acid that comprises a sequence that targets (e.g., is reverse complementary to) the sequence of a target nucleic acid. A guide nucleic acid can be DNA, RNA, or a combination thereof (e.g., RNA with a thymine base), or include a chemically modified nucleobase or phosphate backbone. Guide nucleic acids are often referred to as a “guide RNA” or “gRNA.” However, a guide nucleic acid can comprise deoxyribonucleotides and/or modified nucleobases. In general, a guide nucleic acid is a nucleic acid molecule that binds to an effector protein (e.g., a Cas effector protein), thereby forming a ribonucleoprotein complex (RNP). In some instances, the engineered guide nucleic acid imparts activity or sequence selectivity to the effector protein. In some cases, a guide nucleic acid, when complexed with an effector protein, brings the effector protein into proximity of a target nucleic acid molecule. An engineered guide nucleic acid can comprise a CRISPR RNA (crRNA) that is at least partially complementary to a target nucleic acid. In some instances, the engineered guide nucleic acid comprises a trans-activating crRNA (tracrRNA), at least a portion of which interacts with the effector protein. A guide nucleic acid can comprise or be coupled to a tracrRNA. The tracrRNA can comprise deoxyribonucleosides in addition to ribonucleosides. The tracrRNA can be separate from but form a complex with a crRNA. In some instances, at least a portion of a crRNA sequence and at least a portion of a tracrRNA sequence are provided as a single guide nucleic acid, also referred to as a single guide RNA (sgRNA). In some instances, a crRNA and tracrRNA function as two separate, unlinked molecules.
[00105] Reporters.
[00106] In some instances, the systems and methods disclosed herein provide a reporter (interchangeably, “reporter nucleic acid” or “reporter molecule”). By way of non-limiting and illustrative example, a reporter can comprise a single stranded nucleic acid and a detection moiety (e.g., a labeled single stranded RNA reporter), where the nucleic acid is capable of being cleaved by a programmable nuclease (e.g., a Type V or VI CRISPR/Cas protein as disclosed herein) or a multimeric complex thereof, releasing the detection moiety, and generating a detectable signal. Moreover, as used interchangeably herein, the term “reporting signal” or “detectable signal” refers to any readout from a reporter, such as a fluorescence
intensity and/or a lateral flow readout. The programmable nucleases disclosed herein, activated upon hybridization of a guide RNA to a target nucleic acid, may cleave the reporter. Accordingly, cleaving the “reporter” can be referred to interchangeably herein as cleaving the “reporter nucleic acid,” the “reporter molecule,” or the “nucleic acid of the reporter.” Reporters can comprise RNA. Reporters can comprise DNA. Reporters can be double-stranded. Reporters can be single-stranded.
[00107] In some instances, the systems and methods disclosed herein provide a Type V CRISPR/Cas protein and a reporter nucleic acid configured to undergo transcollateral cleavage by the Type V CRISPR/Cas protein. Transcollateral cleavage of the reporter can generate a signal from the reporter or alter a signal from the reporter. In some cases, the signal is an optical signal, such as a fluorescence signal or absorbance band. Transcollateral cleavage of the reporter can alter the wavelength, intensity, or polarization of the optical signal. For example, the reporter can comprise a fluorophore and a quencher, such that transcollateral cleavage of the reporter separates the fluorophore and the quencher thereby increasing a fluorescence signal from the fluorophore. Herein, detection of reporter cleavage (directly or indirectly) to determine the presence of a target nucleic acid sequence is, in some embodiments, referred to as “DETECTR.” In some embodiments described herein is a method of assaying for a target nucleic acid in a sample comprising contacting the target nucleic acid with a programmable nuclease, a non-naturally occurring guide nucleic acid that hybridizes to a segment of the target nucleic acid, and a reporter nucleic acid, and assaying for a change in a signal, where the change in the signal is produced by or indicative of cleavage of the reporter nucleic acid.
[00108] In some cases, the reporter comprises a detection moiety. In some instances, the reporter comprises a cleavage site, where the detection moiety is located at a first site on the reporter, where the first site is separated from the remainder of reporter upon cleavage at the cleavage site. In some cases, the detection moiety is 3’ to the cleavage site. In some cases, the detection moiety is 5’ to the cleavage site. In some embodiments, the detection moiety is at the 3’ terminus of the nucleic acid of a reporter. In some cases, the detection moiety is at the 5’ terminus of the nucleic acid of a reporter.
[00109] In some embodiments, the reporter comprises a nucleic acid and a detection moiety. In some embodiments, a reporter is connected to a surface by a linkage. In some embodiments, a reporter comprises at least one of a nucleic acid, a chemical functionality, a detection moiety, a quenching moiety, or a combination thereof. In some embodiments, a
reporter is configured for the detection moiety to remain immobilized to the surface and the quenching moiety to be released into solution upon cleavage of the reporter. In some embodiments, a reporter is configured for the quenching moiety to remain immobilized to the surface and for the detection moiety to be released into solution, upon cleavage of the reporter. Often the detection moiety is at least one of a label, a polypeptide, a dendrimer, or a nucleic acid, or a combination thereof. In some embodiments, the reporter contains a label. In some embodiments, the label is FITC, DIG, TAMRA, Cy5, AF594, or Cy3. In some embodiments, the label comprises a dye, a nanoparticle configured to produce a signal. In some embodiments, the dye is a fluorescent dye. In some embodiments, the at least one chemical functionality comprises biotin. In some embodiments, the at least one chemical functionality is configured to be captured by a capture probe. In some embodiments, the at least one chemical functionality comprises biotin and the capture probe comprises anti-biotin, streptavidin, avidin or other molecule configured to bind with biotin. In some embodiments, the dye is the chemical functionality. In some embodiments, a capture probe comprises a molecule that is complementary to the chemical functionality. In some embodiments, the capture antibodies are anti-FITC, anti-DIG, anti-TAMRA, anti-Cy5, anti-AF594, or any other appropriate capture antibody capable of binding the detection moiety or conjugate. In some embodiments, the detection moiety is the chemical functionality.
[00110] In some instances, reporters comprise a detection moiety capable of generating a signal. A signal can be a calorimetric, potentiometric, amperometric, optical (e.g., fluorescent, colorimetric, etc.), or piezo-electric signal. Suitable detectable labels and/or moieties that can provide a signal include, but are not limited to, an enzyme, a radioisotope, a member of a specific binding pair, a fluorophore, a fluorescent protein, a quantum dot, and the like.
[00111] In some cases, the reporter comprises a detection moiety and a quenching moiety. In some instances, the reporter comprises a cleavage site, where the detection moiety is located at a first site on the reporter and the quenching moiety is located at a second site on the reporter, where the first site and the second site are separated by the cleavage site. In some embodiments, the quenching moiety is a fluorescence quenching moiety. In some cases, the quenching moiety is 5’ to the cleavage site and the detection moiety is 3’ to the cleavage site. In some cases, the detection moiety is 5’ to the cleavage site and the quenching moiety is 3’ to the cleavage site. In some embodiments, the quenching moiety is at the 5’ terminus of the nucleic acid of a reporter. In some embodiments, the detection moiety is at the 3’
terminus of the nucleic acid of a reporter. In some cases, the detection moiety is at the 5’ terminus of the nucleic acid of a reporter. In some cases, the quenching moiety is at the 3’ terminus of the nucleic acid of a reporter.
[00112] Suitable fluorescent proteins include, but are not limited to, green fluorescent protein (GFP) or variants thereof, blue fluorescent variant of GFP (BFP), cyan fluorescent variant of GFP (CFP), yellow fluorescent variant of GFP (YFP), enhanced GFP (EGFP), enhanced CFP (ECFP), enhanced YFP (EYFP), GFPS65T, Emerald, Topaz (TYFP), Venus, Citrine, mCitrine, GFPuv, destabilised EGFP (dEGFP), destabilised ECFP (dECFP), destabilised EYFP (dEYFP), mCFPm, Cerulean, T-Sapphire, CyPet, YPet, mKO, HcRed, t-HcRed, DsRed, DsRed2, DsRed-monomer, J-Red, dimer2, t-dimer2(12), mRFP1, pocilloporin, Renilla GFP, Monster GFP, paGFP, Kaede protein and kindling protein, Phycobiliproteins and Phycobiliprotein conjugates including B-Phycoerythrin, R-Phycoerythrin and Allophycocyanin. Suitable enzymes include, but are not limited to, horseradish peroxidase (HRP), alkaline phosphatase (AP), beta-galactosidase (GAL), glucose-6-phosphate dehydrogenase, beta-N-acetylglucosaminidase, β-glucuronidase, invertase, Xanthine Oxidase, firefly luciferase, and glucose oxidase (GO).
[00113] In some instances, the detection moiety comprises an invertase. The substrate of the invertase can be sucrose. A DNS reagent can be included in the system to produce a colorimetric change when the invertase converts sucrose to glucose. In some cases, the reporter nucleic acid and invertase are conjugated using a heterobifunctional linker via sulfo-SMCC chemistry.
[00114] In some embodiments, suitable fluorophores provide a detectable fluorescence signal in the same range as 6-Fluorescein (Integrated DNA Technologies), IRDye 700 (Integrated DNA Technologies), TYE 665 (Integrated DNA Technologies), Alexa Fluor 594 (Integrated DNA Technologies), or ATTO™ 633 (NHS Ester) (Integrated DNA Technologies). Non-limiting examples of fluorophores are fluorescein amidite, 6-Fluorescein, IRDye 700, TYE 665, Alexa Fluor 594, or ATTO™ 633 (NHS Ester). The fluorophore can be an infrared fluorophore. The fluorophore can emit fluorescence in the range of 500 nm to 720 nm. In some cases, the fluorophore emits fluorescence at a wavelength of 700 nm or higher. In other cases, the fluorophore emits fluorescence at about 665 nm. In some cases, the fluorophore emits fluorescence in the range of 500 nm to 520 nm, 500 nm to 540 nm, 500 nm to 590 nm, 590 nm to 600 nm, 600 nm to 610 nm, 610 nm to 620 nm, 620 nm to 630 nm, 630 nm to 640 nm, 640 nm to 650 nm, 650 nm to 660 nm, 660 nm to 670 nm, 670 nm to 680 nm,
680 nm to 690 nm, 690 nm to 700 nm, 700 nm to 710 nm, 710 nm to 720 nm, or 720 nm to 730 nm. In some cases, the fluorophore emits fluorescence in the range of 450 nm to 750 nm, 500 nm to 650 nm, or 550 nm to 650 nm.
[00115] In some embodiments, systems comprise a quenching moiety. A quenching moiety may be chosen based on its ability to quench the detection moiety. A quenching moiety can be a non-fluorescent fluorescence quencher. A quenching moiety may quench a detection moiety that emits fluorescence in the range of 500 nm to 720 nm. In some cases, the quenching moiety quenches a detection moiety that emits fluorescence at a wavelength of 700 nm or higher. In other cases, the quenching moiety quenches a detection moiety that emits fluorescence at about 660 nm or about 670 nm. In some cases, the quenching moiety quenches a detection moiety that emits fluorescence in the range of 500 to 520, 500 to 540, 500 to 590, 590 to 600, 600 to 610, 610 to 620, 620 to 630, 630 to 640, 640 to 650, 650 to 660, 660 to 670, 670 to 680, 680 to 690, 690 to 700, 700 to 710, 710 to 720, or 720 to 730 nm. In some cases, the quenching moiety quenches a detection moiety that emits fluorescence in the range of 450 nm to 750 nm, 500 nm to 650 nm, or 550 nm to 650 nm. A quenching moiety may quench fluorescein amidite, 6-Fluorescein, IRDye 700, TYE 665, Alexa Fluor 594, or ATTO™ 633 (NHS Ester). A quenching moiety can be Iowa Black RQ, Iowa Black FQ or IRDye QC-1 Quencher. In some embodiments, a quenching moiety quenches fluorescein amidite, 6-Fluorescein (Integrated DNA Technologies), IRDye 700 (Integrated DNA Technologies), TYE 665 (Integrated DNA Technologies), Alexa Fluor 594 (Integrated DNA Technologies), or ATTO™ 633 (NHS Ester) (Integrated DNA Technologies). A quenching moiety can be Iowa Black RQ (Integrated DNA Technologies), Iowa Black FQ (Integrated DNA Technologies) or IRDye QC-1 Quencher (LiCor).
Any of the quenching moieties described herein can be from any commercially available source, can be an alternative with a similar function, a generic, or a non-trade name of the quenching moieties listed.
[00116] Generally, the generation of the detectable signal from the release of the detection moiety indicates that cleavage by the programmable nucleases has occurred and that the sample contains the target nucleic acid. In some cases, the detection moiety comprises a fluorescent dye. In some embodiments, the detection moiety comprises a fluorescence resonance energy transfer (FRET) pair. In some cases, the detection moiety comprises an infrared (IR) dye. In some cases, the detection moiety comprises an ultraviolet (UV) dye.
Alternatively, or in combination, the detection moiety comprises a protein. In some embodiments, the detection moiety comprises a biotin. In some embodiments, the detection moiety comprises at least one of avidin or streptavidin. In some instances, the detection moiety comprises a polysaccharide, a polymer, or a nanoparticle. In some instances, the detection moiety comprises a gold nanoparticle or a latex nanoparticle.
[00117] A detection moiety can be any moiety capable of generating a calorimetric, potentiometric, amperometric, optical (e.g., fluorescent, colorimetric, etc.), or piezo-electric signal. A nucleic acid of a reporter, in some embodiments, is a protein-nucleic acid that is capable of generating a calorimetric, potentiometric, amperometric, optical (e.g., fluorescent, colorimetric, etc.), or piezo-electric signal upon cleavage of the nucleic acid. Often a calorimetric signal is heat produced after cleavage of the nucleic acids of a reporter. In some embodiments, a calorimetric signal is heat absorbed after cleavage of the nucleic acids of a reporter. A potentiometric signal, for example, is electrical potential produced after cleavage of the nucleic acids of a reporter. An amperometric signal can be movement of electrons produced after the cleavage of the nucleic acid of a reporter. Often, the signal is an optical signal, such as a colorimetric signal or a fluorescence signal. An optical signal is, for example, a light output produced after the cleavage of the nucleic acids of a reporter. In some embodiments, an optical signal is a change in light absorbance between before and after the cleavage of nucleic acids of a reporter. Often, a piezo-electric signal is a change in mass between before and after the cleavage of the nucleic acid of a reporter. Other methods of detection may also be used, such as optical imaging, surface plasmon resonance (SPR), and/or interferometric sensing.
[00118] In some embodiments, the detectable signal is a colorimetric signal or a signal visible by eye. In some instances, the detectable signal is fluorescent, electrical, chemical, electrochemical, or magnetic. In some cases, a detectable signal (e.g., a first detectable signal) is generated by binding of the detection moiety to the capture molecule in the detection region, where the detectable signal indicates that the sample contained the target nucleic acid. In some embodiments, systems are capable of detecting more than one type of target nucleic acid, where the system comprises more than one type of guide nucleic acid and more than one type of reporter nucleic acid. For example, systems may be capable of distinguishing between two different target nucleic acids in a sample (e.g., wild-type or mutant when investigating SNPs as described herein). In some cases, the detectable signal is generated directly by the cleavage event. Alternatively, or in combination, the detectable
signal is generated indirectly by the cleavage event. In some instances, the detectable signal is not a fluorescent signal. In some instances, the detectable signal is a colorimetric or color-based signal. In some cases, the detected target nucleic acid is identified based on its spatial location on the detection region of the support medium. In some cases, a second detectable signal is generated in a spatially distinct location from a first detectable signal when two or more detectable signals are generated.
[00119] Often, the reporter is an enzyme-nucleic acid. The enzyme can be sterically hindered when present as in the enzyme-nucleic acid, but then functional upon cleavage from the nucleic acid by the programmable nuclease. Often, the enzyme is an enzyme that produces a reaction with an enzyme substrate. An enzyme can be invertase. Often, the substrate of invertase is sucrose and DNS reagent.
[00120] In some embodiments, the reporter is a substrate-nucleic acid. Often the substrate is a substrate that produces a reaction with an enzyme. Release of the substrate upon cleavage by the programmable nuclease may free the substrate to react with the enzyme.
[00121] In some embodiments, a reporter is attached to a solid support. The solid support, for example, can be a surface. A surface can be an electrode. In some embodiments, the solid support is a bead. Often the bead is a magnetic bead. Upon cleavage, the detection moiety is liberated from the solid support and interacts with other mixtures. For example, the detection moiety is an enzyme, and upon cleavage of the nucleic acid of the enzyme-nucleic acid, the enzyme flows through a chamber into a mixture comprising the substrate. When the enzyme meets the enzyme substrate, a reaction occurs, such as a colorimetric reaction, which is then detected. As another example, the detection moiety is an enzyme substrate, and upon cleavage of the nucleic acid of the enzyme substrate-nucleic acid, the enzyme substrate flows through a chamber into a mixture comprising the enzyme. When the enzyme substrate meets the enzyme, a reaction occurs, such as a colorimetric reaction, which is then detected.
[00122] In some embodiments, the reporter comprises a nucleic acid conjugated to an affinity molecule which is in turn conjugated to the fluorophore (e.g., nucleic acid - affinity molecule - fluorophore) or the nucleic acid conjugated to the fluorophore which is in turn conjugated to the affinity molecule (e.g., nucleic acid - fluorophore - affinity molecule). In some embodiments, a linker conjugates the nucleic acid to the affinity molecule. In some embodiments, a linker conjugates the affinity molecule to the fluorophore. In some embodiments, a linker conjugates the nucleic acid to the fluorophore. A linker can be any
suitable linker known in the art. In some embodiments, the nucleic acid of the reporter can be directly conjugated to the affinity molecule and the affinity molecule can be directly conjugated to the fluorophore or the nucleic acid can be directly conjugated to the fluorophore and the fluorophore can be directly conjugated to the affinity molecule. In this context, “directly conjugated” indicates that no intervening molecules, polypeptides, proteins, or other moieties are present between the two moieties directly conjugated to each other. For example, if a reporter comprises a nucleic acid directly conjugated to an affinity molecule and an affinity molecule directly conjugated to a fluorophore, then no intervening moiety is present between the nucleic acid and the affinity molecule and no intervening moiety is present between the affinity molecule and the fluorophore. In some implementations, the affinity molecule is biotin, avidin, streptavidin, or any similar molecule.
[00123] In some cases, the reporter comprises a substrate-nucleic acid. The substrate can be sequestered from its cognate enzyme when present as in the substrate-nucleic acid, but then is released from the nucleic acid upon cleavage, where the released substrate can contact the cognate enzyme to produce a detectable signal. Often, the substrate is sucrose and the cognate enzyme is invertase, and a DNS reagent can be used to monitor invertase activity.
[00124] A reporter can be a hybrid nucleic acid reporter. A hybrid nucleic acid reporter comprises a nucleic acid with at least one deoxyribonucleotide and at least one ribonucleotide. In some embodiments, the nucleic acid of the hybrid nucleic acid reporter is of any length and/or has any mixture of DNAs and RNAs. For example, in some cases, longer stretches of DNA can be interrupted by a few ribonucleotides. Alternatively, longer stretches of RNA can be interrupted by a few deoxyribonucleotides. Alternatively, every other base in the nucleic acid can alternate between ribonucleotides and deoxyribonucleotides. A major advantage of the hybrid nucleic acid reporter is increased stability as compared to a pure RNA nucleic acid reporter. For example, a hybrid nucleic acid reporter can be more stable in solution, lyophilized, or vitrified as compared to a pure DNA or pure RNA reporter.
[00125] Modified Nucleic Acids.
[00126] In some cases, a reporter and/or guide nucleic acid comprises one or more modifications, e.g., a base modification, a backbone modification, a sugar modification, etc., to provide the nucleic acid with a new or enhanced feature (e.g., improved stability).
Examples of suitable modifications include modified nucleic acid backbones and non-natural
internucleoside linkages. Other suitable modifications include nucleic acid mimetics. The nucleic acids described herein can include one or more substituted sugar moieties. The nucleic acids described herein can include nucleobase modifications or substitutions.
[00127] The nucleic acids described and referred to herein can comprise a plurality of base pairs. As used herein, a base pair refers to a biological unit comprising two nucleobases bound to each other by hydrogen bonds. Nucleobases can comprise adenine, guanine, cytosine, thymine, and/or uracil. In some cases, the nucleic acids described and referred to herein can comprise different base pairs. In some cases, the nucleic acids described and referred to herein can comprise one or more modified base pairs. The one or more modified base pairs can be produced when one or more base pairs undergo a chemical modification leading to new bases. The one or more modified base pairs can be, for example, Hypoxanthine, Inosine, Xanthine, Xanthosine, 7-Methylguanine, 7-Methylguanosine, 5,6-Dihydrouracil, Dihydrouridine, 5-Methylcytosine, 5-Methylcytidine, 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC), or 5-carboxylcytosine (5caC).
[00128] Target Loci.
[00129] As used herein, the term “locus” (e.g., target locus) refers to one or more nucleotides that map to a reference sequence, such as a genome. In some embodiments, a locus refers to a position (e.g., a site) within a reference sequence, e.g., on a particular chromosome of a genome. In some embodiments, a locus refers to a single nucleotide position within a reference sequence, e.g., on a particular chromosome within a genome. In some embodiments, a locus refers to a group of nucleotide positions within a reference sequence. In some instances, a locus is defined by a mutation (e.g., substitution, insertion, deletion, inversion, or translocation) of consecutive nucleotides within a reference sequence. In some instances, a locus is defined by a gene, a sub-genic structure (e.g., a regulatory element, exon, intron, or combination thereof), or a predefined span of a chromosome. In some embodiments, a locus (e.g., a target locus) is represented in a biological sample by one or more nucleic acid molecules (e.g., target nucleic acid molecules) in the biological sample, where all or a portion of the one or more nucleic acid molecules map to the respective locus.
[00130] In some cases, a locus is defined by a biomarker and/or antimicrobial resistance gene that maps to the position of the respective locus. For instance, in some embodiments, a respective target locus comprises a plurality of alleles including an allele having a mutation that confers resistance to a microorganism against antimicrobial interventions that target the
respective locus (e.g., antibacterial resistance, antiprotozoal resistance, antifungal resistance, antihelminthic resistance, and/or antiviral resistance). In some such embodiments, a respective target locus comprises a genetic marker for the target locus that indicates a resistance to antimicrobial interventions. Examples of antimicrobial resistance markers (e.g., genes and/or amino acid residues) include, but are not limited to, the antimicrobial resistance markers listed in, e.g., Capela et al., 2019, “An Overview of Drug Resistance in Protozoal Diseases,” Int J Mol Sci. 20(22): 5748; doi: 10.3390/ijms20225748; Beech et al., 2011, “Anthelmintic resistance: markers for resistance, or susceptibility?” Parasitology 138(2): 160-174; doi: 10.1017/S0031182010001198; and Toledo-Rueda et al., 2018, “Antiviral resistance markers in influenza virus sequences in Mexico, 2000-2017,” Infect Drug Resist 11: 1751-1756; doi: 10.2147/IDR.S153154, each of which is hereby incorporated herein by reference in its entirety.
[00131] Target Nucleic Acids.
[00132] Disclosed herein are systems and methods for detecting a target nucleic acid and/or distinguishing between different target nucleic acids. In some instances, the target nucleic acid is a single stranded nucleic acid. Alternatively, or in combination, the target nucleic acid is a double stranded nucleic acid and is prepared into single stranded nucleic acids before or upon contacting the programmable nuclease-based detection reagents (e.g., programmable nuclease, guide nucleic acid, and/or reporter). In some embodiments, the target nucleic acid is a double stranded nucleic acid. In some embodiments, the double stranded nucleic acid is DNA. In some implementations, the target nucleic acid is an RNA. Target nucleic acids include but are not limited to mRNA, rRNA, tRNA, non-coding RNA, long non-coding RNA, and microRNA (miRNA). In some instances, the target nucleic acid is complementary DNA (cDNA) synthesized from a single-stranded RNA template in a reaction catalyzed by a reverse transcriptase. In some cases, the target nucleic acid is single-stranded RNA (ssRNA) or mRNA. In some cases, the target nucleic acid is from a virus, a parasite, or a bacterium described herein.
[00133] In some cases, the target nucleic acid comprises 5 to 100, 5 to 90, 5 to 80, 5 to 70, 5 to 60, 5 to 50, 5 to 40, 5 to 30, 5 to 25, 5 to 20, 5 to 15, or 5 to 10 nucleotides in length. In some cases, the target nucleic acid comprises 10 to 90, 20 to 80, 30 to 70, or 40 to 60 nucleotides in length. In some instances, the target nucleic acid sequence can be from 10 to 95, from 20 to 95, from 30 to 95, from 40 to 95, from 50 to 95, from 60 to 95, from 10 to 75, from 20 to 75, from 30 to 75, from 40 to 75, from 50 to 75, from 5 to 50, from 15 to 50, from
25 to 50, from 35 to 50, or from 45 to 50 nucleotides in length. In some cases, the target nucleic acid comprises 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 45, 50, 60, 70, 80, 90, or 100 nucleotides in length. In some instances, the target nucleic acid comprises at least 10, at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, or at least 100 nucleotides in length. The target nucleic acid can be reverse complementary to a guide nucleic acid. In some cases, at least 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21,
22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 45, 50, 60, 70, 80, 90, or 100 nucleotides of a guide nucleic acid can be reverse complementary to a target nucleic acid.
[00134] A target nucleic acid can be an amplified nucleic acid of interest. The nucleic acid of interest can be any nucleic acid disclosed herein or from any sample as disclosed herein. The nucleic acid of interest can be an RNA that is reverse transcribed before amplification. In some embodiments, the nucleic acid of interest is amplified, and then the amplicons are transcribed into RNA.
[00135] In some embodiments, target nucleic acids activate a programmable nuclease to initiate sequence-independent cleavage of a nucleic acid-based reporter (e.g., a reporter comprising an RNA sequence, or a reporter comprising DNA and RNA). For example, a programmable nuclease of the present disclosure is activated by a target nucleic acid to cleave reporters having an RNA (also referred to herein as an “RNA reporter”). Alternatively, a programmable nuclease of the present disclosure is activated by a target RNA to cleave reporters having an RNA (also referred to herein as an “RNA reporter”). The RNA reporter can comprise a single-stranded RNA labeled with a detection moiety or can be any RNA reporter as disclosed herein.
[00136] In some embodiments, the target nucleic acid as described in the methods herein does not initially comprise a PAM sequence. However, any target nucleic acid of interest can be generated using the methods described herein to comprise a PAM sequence, and thus be a PAM target nucleic acid. A PAM target nucleic acid, as used herein, refers to a target nucleic acid that has been amplified to insert a PAM sequence that is recognized by a CRISPR/Cas system.
[00137] In some embodiments, the target nucleic acid is in a cell. In some embodiments, the cell is a single-cell eukaryotic organism; a plant cell; an algal cell; a fungal cell; an animal
cell; a cell of an invertebrate animal; a cell of a vertebrate animal such as a fish, amphibian, reptile, bird, or mammal; or a cell of a mammal such as a human, a non-human primate, an ungulate, a feline, a bovine, an ovine, or a caprine. In preferred embodiments, the cell is a eukaryotic cell. In preferred embodiments, the cell is a mammalian cell, a human cell, or a plant cell.
[00138] In some embodiments, the target nucleic acid sequence comprises a nucleic acid sequence of a virus, a bacterium, or other pathogen responsible for a disease in a plant (e.g., a crop). Methods and compositions of the disclosure can be used to treat or detect a disease in a plant. For example, the methods of the disclosure can be used to target a viral nucleic acid sequence in a plant. In some embodiments, a programmable nuclease of the disclosure (e.g., Cas14) cleaves the viral nucleic acid. In some embodiments, the target nucleic acid sequence comprises a nucleic acid sequence of a virus or a bacterium or other agents (e.g., any pathogen) responsible for a disease in the plant (e.g., a crop). In some embodiments, the target nucleic acid comprises RNA. The target nucleic acid, in some cases, is a portion of a nucleic acid from a virus or a bacterium or other agents responsible for a disease in the plant (e.g., a crop). In some cases, the target nucleic acid is a portion of a nucleic acid from a genomic locus, or any DNA amplicon, such as a reverse transcribed mRNA or a cDNA from a gene locus, a transcribed mRNA, or a reverse transcribed cDNA from a gene locus in a virus or a bacterium or other agents (e.g., any pathogen) responsible for a disease in the plant (e.g., a crop). A virus infecting the plant can be an RNA virus. A virus infecting the plant can be a DNA virus. Non-limiting examples of viruses that can be targeted with the disclosure include Tobacco mosaic virus (TMV), Tomato spotted wilt virus (TSWV), Cucumber mosaic virus (CMV), Potato virus Y (PVY), Cauliflower mosaic virus (CaMV) (RT virus), Plum pox virus (PPV), Brome mosaic virus (BMV) and Potato virus X (PVX).
[00139] Mutations
[00140] In some instances, target nucleic acids comprise a mutation. In some instances, a sequence comprising a mutation is modified to a wild-type sequence with a composition, system or method described herein. In some instances, a sequence comprising a mutation is detected with a composition, system or method described herein. The mutation can be a mutation of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more nucleotides. Non-limiting examples of mutations are insertion-deletion (indel), single nucleotide polymorphism (SNP), and frameshift mutations. In some instances, guide nucleic acids described herein hybridize to a region of the target nucleic acid comprising the mutation. For example, in some embodiments, a target locus comprises a plurality of alleles
including a first allele (e.g., a wild-type allele) and a second allele (e.g., a mutant allele), where the first allele and the second allele differ by a mutation in the nucleic acid sequence of the respective locus, and the respective guide nucleic acid for each respective allele is a nucleic acid that is reverse complementary to the respective allele. In some embodiments, the mutation is located in a non-coding region or a coding region of a gene.
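As a non-limiting sketch of the allele-specific guide arrangement described above, the following Python snippet derives one guide sequence per allele and reports which allele a sample sequence matches. All names and sequences are hypothetical, and the exact-match rule is an assumption made for illustration: it does not model mismatch tolerance, PAM requirements, or the behavior of any particular effector protein.

```python
# Illustrative sketch only: allele names and sequences are hypothetical.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def allele_guides(alleles: dict) -> dict:
    """Map each allele name to a guide sequence reverse complementary to that allele."""
    return {name: reverse_complement(seq) for name, seq in alleles.items()}


def matching_alleles(sample_seq: str, guides: dict) -> list:
    """Return the alleles whose guide is fully reverse complementary to the sample."""
    sample = sample_seq.upper()
    return [name for name, guide in guides.items()
            if reverse_complement(guide) == sample]


# Two alleles of a hypothetical locus that differ by a single nucleotide (A vs. T).
alleles = {"wild_type": "GATTACAGGC", "mutant": "GATTACTGGC"}
guides = allele_guides(alleles)
called = matching_alleles("GATTACTGGC", guides)
```

Here `called` contains only the mutant allele, because only the mutant guide is fully reverse complementary to the sample sequence.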
[00141] In some instances, target nucleic acids comprise a mutation, where the mutation is a SNP. The single nucleotide mutation or SNP can be associated with a phenotype of the sample or a phenotype of the organism from which the sample was taken. The SNP, in some embodiments, is associated with altered phenotype from wild-type phenotype. The SNP can be a synonymous substitution or a nonsynonymous substitution. The nonsynonymous substitution can be a missense substitution or a nonsense point mutation. The synonymous substitution can be a silent substitution. The mutation can be a deletion of one or more nucleotides. Often, the single nucleotide mutation, SNP, or deletion is associated with a disease such as cancer or a genetic disorder. The mutation, such as a single nucleotide mutation, a SNP, or a deletion, can be encoded in the sequence of a target nucleic acid from the germline of an organism or can be encoded in a target nucleic acid from a diseased cell, such as a cancer cell.
[00142] In some instances, target nucleic acids comprise a mutation, where the mutation is a deletion of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more nucleotides. The mutation can be a deletion of about 5, about 10, about 15, about 20, about 25, about 30, about 35, about 40, about 45, about 50, about 55, about 60, about 65, about 70, about 75, about 80, about 85, about 90, about 95, about 100, about 200, about 300, about 400, about 500, about 600, about 700, about 800, about 900, or about 1000 nucleotides. The mutation can be a deletion of 1 to 5, 5 to 10, 10 to 15, 15 to 20, 20 to 25, 25 to 30, 30 to 35, 35 to 40, 40 to 45, 45 to 50, 50 to 55, 55 to 60, 60 to 65, 65 to 70, 70 to 75, 75 to 80, 80 to 85, 85 to 90, 90 to 95, 95 to 100, 100 to 200, 200 to 300, 300 to 400, 400 to 500, 500 to 600, 600 to 700, 700 to 800, 800 to 900, 900 to 1000, 1 to 50, 1 to 100, 25 to 50, 25 to 100, 50 to 100, 100 to 500, 100 to 1000, or 500 to 1000 nucleotides.
[00143] As used herein, the term “single nucleotide polymorphism” or “SNP,” refers to the variation of a single nucleotide or nucleobase at a specific position in a nucleic acid sequence. The single nucleotide or nucleobase variation is generally between the genomes of two members of the same species, or some other specific population. In some cases, a SNP occurs at a specific nucleic acid site in genomic DNA in which different alternative sequences, e.g.,
“alleles,” exist more frequently in certain members of a population. In some cases, a less frequent allele comprises the SNP and has an abundance of at least 1%, 0.8%, 0.5%, 0.4%, 0.3%, 0.2% or 0.1%. In some instances, a SNP is any point mutation that is sufficiently present in a population (e.g., 1%, 0.8%, 0.5%, 0.4%, 0.3%, 0.2%, 0.1% or more). A SNP can be disease-causing, at least partially, or can be associated with a disease. SNPs are known to skilled artisans and can be located in relevant published papers and genomic databases.
[00144] As used herein, the term “allele” refers to a variant of a given gene and can be the result of a SNP or any sequence variation from a reference sequence. From any number of different alleles of a given gene, an allelic frequency can be determined, thereby providing the fraction of all chromosomes that carry a particular allele of a gene within a specific population.
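By way of a hedged example, an allelic frequency as defined above can be computed by counting how many chromosome copies in a population carry each allele and dividing by the total number of chromosome copies. The genotype data and the function name below are hypothetical and serve only to illustrate the arithmetic.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Return the fraction of chromosome copies carrying each allele.

    genotypes is an iterable of tuples with one allele per chromosome copy
    (e.g., two alleles per diploid individual).
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


# Four hypothetical diploid individuals genotyped at a single SNP site.
genotypes = [("A", "A"), ("A", "G"), ("G", "G"), ("A", "A")]
freqs = allele_frequencies(genotypes)
```

In this made-up population, 5 of the 8 chromosome copies carry allele A and 3 carry allele G.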
[00145] As used herein, the term “wild-type,” or “wild-type variant” refers to a segment or region of nucleic acid sequence, or fragment thereof, that is the universal form (e.g., present in at least 40%) within a population. In some embodiments, “wild-type,” or “wild-type variant” refers to a segment or region of nucleic acid sequence, or fragment thereof, lacking commonly known sequence variations or allelic variations which can be silent, causal, disease-associated, or disease-risk causing. In some embodiments, “wild-type,” or “wild-type variant” refers to a polypeptide or protein expressed by a naturally occurring organism, or a polypeptide or protein having the characteristics of a polypeptide or protein isolated from a naturally occurring organism, where the polypeptide or protein is relatively constant within (e.g., present in at least 40% of) a species population.
[00146] In some instances, mutations are associated with a disease; that is, the mutation in a subject indicates that the subject is susceptible to, or suffers from, a disease, disorder, or pathological state. In some examples, a mutation associated with a disease refers to a mutation which causes the disease, contributes to the development of the disease, or indicates the existence of the disease. A mutation associated with a disease can also refer to any mutation which generates transcription or translation products at an abnormal level, or in an abnormal form, in cells affected by a disease relative to a control without the disease. Nonlimiting examples of diseases associated with mutations are hemophilia, sickle cell anemia, β-thalassemia, Duchenne muscular dystrophy, severe combined immunodeficiency (SCID, also known as “bubble boy syndrome”), Huntington’s disease, cystic fibrosis, and various cancers.
[00147] Samples
[00148] The systems and methods of the present disclosure can be used to detect one or more target sequences or nucleic acids in one or more samples. The one or more samples can comprise one or more target sequences or nucleic acids for detection of an ailment, such as a disease, cancer, or genetic disorder, or genetic information, such as for phenotyping, genotyping, or determining ancestry, and are compatible with the reagents and support mediums as described herein. Generally, a sample can be taken from any place where a nucleic acid can be found. Samples can be taken from an individual/human, a non-human animal, or a crop, or an environmental sample can be obtained to test for presence of a disease, virus, pathogen, cancer, genetic disorder, or any mutation or pathogen of interest. A biological sample can be blood, serum, plasma, lung fluid, exhaled breath condensate, saliva, spit, urine, stool, feces, mucus, lymph fluid, peritoneal fluid, cerebrospinal fluid, amniotic fluid, breast milk, gastric secretions, bodily discharges, secretions from ulcers, pus, nasal secretions, sputum, pharyngeal exudates, urethral secretions/mucus, vaginal secretions/mucus, anal secretion/mucus, semen, tears, an exudate, an effusion, tissue fluid, interstitial fluid (e.g., tumor interstitial fluid), cyst fluid, tissue, or, in some instances, any combination thereof. A sample can be an aspirate of a bodily fluid from an animal (e.g., human, animals, livestock, pet, etc.) or plant. A tissue sample can be from any tissue that can be infected or affected by a pathogen (e.g., a wart, lung tissue, skin tissue, and the like). A tissue sample (e.g., from animals, plants, or humans) can be dissociated or liquified prior to application to the detection system of the present disclosure. A sample can be from a plant (e.g., a crop, a hydroponically grown crop or plant, and/or house plant). Plant samples can include extracellular fluid from tissue (e.g., root, leaves, stem, trunk, etc.).
A sample can be taken from the environment immediately surrounding a plant, such as hydroponic fluid/water, or soil. A sample from an environment can be from soil, air, or water. In some instances, the environmental sample is taken as a swab from a surface of interest or taken directly from the surface of interest. In some instances, the raw sample is applied to the detection system. In some instances, the sample is diluted with a buffer or a fluid or concentrated prior to application to the detection system. In some cases, the sample is contained in no more than about 200 nanoliters (nL). In some cases, the sample is contained in about 200 nL. In some cases, the sample is contained in a volume that is greater than about 200 nL and less than about 20 microliters (µL). In some cases, the sample is contained in no more than 20 µL. In some cases, the sample is contained in no more than 1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 90, 100, 200, 300, 400, or 500 µL, or any value from 1 µL to 500 µL. In some cases, the sample is contained in from 1 µL to 500 µL, from 10 µL to 500 µL, from 50 µL to 500 µL, from 100 µL to 500 µL, from 200 µL to 500 µL, from 300 µL to 500 µL, from 400 µL to 500 µL, from 1 µL to 200 µL, from 10 µL to 200 µL, from 50 µL to 200 µL, from 100 µL to 200 µL, from 1 µL to 100 µL, from 10 µL to 100 µL, from 50 µL to 100 µL, from 1 µL to 50 µL, from 10 µL to 50 µL, from 1 µL to 20 µL, from 10 µL to 20 µL, or from 1 µL to 10 µL. In some embodiments, the sample is contained in more than 500 µL.
[00149] In some instances, the sample is taken from a single-cell eukaryotic organism; a plant or a plant cell; an algal cell; a fungal cell; an animal or an animal cell, tissue, or organ; a cell, tissue, or organ from an invertebrate animal; a cell, tissue, fluid, or organ from a vertebrate animal such as fish, amphibian, reptile, bird, and mammal; a cell, tissue, fluid, or organ from a mammal such as a human, a non-human primate, an ungulate, a feline, a bovine, an ovine, and a caprine. In some instances, the sample is taken from nematodes, protozoans, helminths, or malarial parasites. In some cases, the sample can comprise nucleic acids from a cell lysate from a eukaryotic cell, a mammalian cell, a human cell, a prokaryotic cell, or a plant cell. In some cases, the sample can comprise nucleic acids expressed from a cell.
[00150] The sample used for phenotyping testing can comprise at least one target nucleic acid segment that can bind to a guide nucleic acid of the reagents described herein. The target nucleic acid segment, in some cases, is a portion of a nucleic acid from a gene associated with a phenotypic trait.
[00151] The sample used for genotyping testing can comprise at least one target nucleic acid segment that can bind to a guide nucleic acid of the reagents described herein. The target nucleic acid segment, in some cases, is a portion of a nucleic acid from a gene associated with a genotype.
[00152] The sample used for ancestral testing can comprise at least one target nucleic acid segment that can bind to a guide nucleic acid of the reagents described herein. The target nucleic acid segment, in some cases, is a portion of a nucleic acid from a gene associated with a geographic region of origin or ethnic group.
[00153] The sample can be used for identifying a disease status. For example, a sample is any sample described herein, and is obtained from a subject for use in identifying a disease status of a subject. The disease can be a cancer or genetic disorder. In some embodiments, a method may comprise obtaining a serum sample from a subject; and identifying a disease
status of the subject. Often, the disease status is prostate disease status. In any of the embodiments described herein, the device can be configured for asymptomatic, pre-symptomatic, and/or symptomatic diagnostic applications, irrespective of immunity. In any of the embodiments described herein, the device can be configured to perform one or more serological assays on a sample (e.g., a sample comprising blood).
[00154] In some cases, the target sequence is a portion of a nucleic acid from a virus or a bacterium or other agents responsible for a disease in the sample. The target sequence, in some cases, is a portion of a nucleic acid from a sexually transmitted infection or a contagious disease, in the sample. The target sequence, in some cases, is a portion of a nucleic acid from an upper respiratory tract infection, a lower respiratory tract infection, or a contagious disease, in the sample. The target sequence, in some cases, is a portion of a nucleic acid from a hospital acquired infection or a contagious disease, in the sample. The target sequence, in some cases, is a portion of a nucleic acid from sepsis, in the sample. These diseases can include but are not limited to respiratory viruses (e.g., SARS-CoV-2 (i.e., a virus that causes COVID-19), SARS-CoV-1, MERS-CoV, influenza, Adenovirus, Coronavirus HKU1, Coronavirus NL63, Coronavirus 229E, Coronavirus OC43, Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), Human Metapneumovirus (hMPV), Human Rhinovirus (HRVs A, B, C), Human Enterovirus, Influenza A, Influenza A/H1, Influenza A/H2, Influenza A/H3, Influenza A/H4, Influenza A/H5, Influenza A/H6, Influenza A/H7, Influenza A/H8, Influenza A/H9, Influenza A/H10, Influenza A/H11, Influenza A/H12, Influenza A/H13, Influenza A/H14, Influenza A/H15, Influenza A/H16, Influenza A/H1-2009, Influenza A/N1, Influenza A/N2, Influenza A/N3, Influenza A/N4, Influenza A/N5, Influenza A/N6, Influenza A/N7, Influenza A/N8, Influenza A/N9, Influenza A/N10, Influenza A/N11, oseltamivir-resistant Influenza A, Influenza B, Influenza B - Victoria V1, Influenza B - Yamagata Y1, Influenza C, Parainfluenza Virus 1, Parainfluenza Virus 2, Parainfluenza Virus 3, Parainfluenza Virus 4, Respiratory Syncytial Virus A, Respiratory Syncytial Virus B) and respiratory bacteria (e.g., Bordetella parapertussis, Bordetella pertussis, Bordetella bronchiseptica, Bordetella holmesii,
Chlamydia pneumoniae, Mycoplasma pneumoniae). Other viruses include human immunodeficiency virus (HIV), human papillomavirus (HPV), chlamydia, gonorrhea, syphilis, trichomoniasis, sexually transmitted infection, malaria, Dengue fever, Ebola, chikungunya, and leishmaniasis.
Pathogens include viruses, fungi, helminths, protozoa, malarial parasites, Plasmodium parasites, Toxoplasma parasites, and Schistosoma parasites. Helminths include roundworms,
heartworms, and phytophagous nematodes, flukes, Acanthocephala, and tapeworms. Protozoan infections include infections from Giardia spp., Trichomonas spp., African trypanosomiasis, amoebic dysentery, babesiosis, balantidial dysentery, Chagas disease, coccidiosis, malaria and toxoplasmosis. Examples of pathogens such as parasitic/protozoan pathogens include, but are not limited to: Plasmodium falciparum, P. vivax, Trypanosoma cruzi and Toxoplasma gondii. Fungal pathogens include, but are not limited to Cryptococcus neoformans, Histoplasma capsulatum, Coccidioides immitis, Blastomyces dermatitidis, Chlamydia trachomatis, Chlamydia pneumoniae, Chlamydia psittaci, and Candida albicans. Pathogenic viruses include but are not limited to: respiratory viruses (e.g., adenoviruses, parainfluenza viruses, severe acute respiratory syndrome (SARS), coronavirus, MERS), gastrointestinal viruses (e.g., noroviruses, rotaviruses, some adenoviruses, astroviruses), exanthematous viruses (e.g., the virus that causes measles, the virus that causes rubella, the virus that causes chickenpox/shingles, the virus that causes roseola, the virus that causes smallpox, the virus that causes fifth disease, chikungunya virus infection); hepatic viral diseases (e.g., hepatitis A, B, C, D, E); cutaneous viral diseases (e.g., warts (including genital, anal), herpes (including oral, genital, anal), molluscum contagiosum); hemorrhagic viral diseases (e.g., Ebola, Lassa fever, dengue fever, yellow fever, Marburg hemorrhagic fever, Crimean-Congo hemorrhagic fever); neurologic viruses (e.g., polio, viral meningitis, viral encephalitis, rabies), sexually transmitted viruses (e.g., HIV, HPV, and the like), immunodeficiency virus (e.g., HIV); influenza virus; dengue; West Nile virus; herpes virus; yellow fever virus; Hepatitis Virus C; Hepatitis Virus A; Hepatitis Virus B; papillomavirus; and the like.
Pathogens include, e.g., HIV virus, Mycobacterium tuberculosis, Klebsiella pneumoniae, Acinetobacter baumannii, Bacillus anthracis, Bordetella pertussis, Burkholderia cepacia, Corynebacterium diphtheriae, Coxiella burnetii, Streptococcus agalactiae, methicillin-resistant Staphylococcus aureus, Legionella longbeachae, Legionella pneumophila, Leptospira interrogans, Moraxella catarrhalis, Streptococcus pyogenes, Escherichia coli, Neisseria gonorrhoeae, Neisseria meningitidis, Neisseria elongata, Parechovirus, Pneumococcus, Pneumocystis jirovecii, Cryptococcus neoformans, Histoplasma capsulatum, Haemophilus influenzae B, Treponema pallidum, Lyme disease spirochetes, Pseudomonas aeruginosa, Mycobacterium leprae, Brucella abortus, rabies virus, influenza virus, cytomegalovirus, herpes simplex virus I, herpes simplex virus II, human serum parvo-like virus, respiratory syncytial virus (RSV), M. genitalium, T. vaginalis, varicella-zoster virus, hepatitis B virus, hepatitis C virus, measles virus, adenovirus, human T-cell leukemia viruses, Epstein-Barr virus, murine leukemia virus,
mumps virus, vesicular stomatitis virus, Sindbis virus, lymphocytic choriomeningitis virus, wart virus, blue tongue virus, Sendai virus, feline leukemia virus, Reovirus, polio virus, simian virus 40, mouse mammary tumor virus, dengue virus, rubella virus, West Nile virus, Plasmodium falciparum, Plasmodium vivax, Toxoplasma gondii, Trypanosoma rangeli, Trypanosoma cruzi, Trypanosoma rhodesiense, Trypanosoma brucei, Schistosoma mansoni, Schistosoma japonicum, Babesia bovis, Eimeria tenella, Onchocerca volvulus, Leishmania tropica, Mycobacterium tuberculosis, Trichinella spiralis, Theileria parva, Taenia hydatigena, Taenia ovis, Taenia saginata, Echinococcus granulosus, Mesocestoides corti, Mycoplasma arthritidis, M. hyorhinis, M. orale, M. arginini, Acholeplasma laidlawii, M. salivarium, M. pneumoniae, Enterobacter cloacae, Klebsiella aerogenes, Proteus vulgaris, Serratia marcescens, Enterococcus faecalis, Enterococcus faecium, Streptococcus intermedius, Streptococcus pneumoniae, and Streptococcus pyogenes. Often the target nucleic acid comprises a sequence from a virus or a bacterium or other agents responsible for a disease that can be found in the sample. In some cases, the target nucleic acid is a portion of a nucleic acid from a genomic locus, a transcribed mRNA, or a reverse transcribed cDNA from a gene locus in at least one of: human immunodeficiency virus (HIV), human papillomavirus (HPV), chlamydia, gonorrhea, syphilis, trichomoniasis, sexually transmitted infection, malaria, Dengue fever, Ebola, chikungunya, and leishmaniasis. Pathogens include viruses, fungi, helminths, protozoa, malarial parasites, Plasmodium parasites, Toxoplasma parasites, and Schistosoma parasites. Helminths include roundworms, heartworms, and phytophagous nematodes, flukes, Acanthocephala, and tapeworms.
Protozoan infections include infections from Giardia spp., Trichomonas spp., African trypanosomiasis, amoebic dysentery, babesiosis, balantidial dysentery, Chagas disease, coccidiosis, malaria and toxoplasmosis. Examples of pathogens such as parasitic/protozoan pathogens include, but are not limited to: Plasmodium falciparum, P. vivax, Trypanosoma cruzi and Toxoplasma gondii. Fungal pathogens include, but are not limited to Cryptococcus neoformans, Histoplasma capsulatum, Coccidioides immitis, Blastomyces dermatitidis, Chlamydia trachomatis, and Candida albicans. Pathogenic viruses include but are not limited to immunodeficiency virus (e.g., HIV); influenza virus; dengue; West Nile virus; herpes virus; yellow fever virus; Hepatitis Virus C; Hepatitis Virus A; Hepatitis Virus B; papillomavirus; and the like. Pathogens include, e.g., HIV virus, Mycobacterium tuberculosis, Streptococcus agalactiae, methicillin-resistant Staphylococcus aureus, Staphylococcus epidermidis, Legionella pneumophila, Streptococcus pyogenes, Streptococcus salivarius, Escherichia coli, Neisseria gonorrhoeae, Neisseria meningitidis, Pneumococcus, Cryptococcus neoformans, Histoplasma capsulatum,
Haemophilus influenzae B, Treponema pallidum, Lyme disease spirochetes, Pseudomonas aeruginosa, Mycobacterium leprae, Brucella abortus, rabies virus, influenza virus, cytomegalovirus, herpes simplex virus I, herpes simplex virus II, human serum parvo-like virus, respiratory syncytial virus (RSV), Alphacoronavirus, Betacoronavirus, Sarbecovirus, SARS-related virus, Gammacoronavirus, Deltacoronavirus, M. genitalium, T. vaginalis, varicella-zoster virus, hepatitis B virus, hepatitis C virus, measles virus, human adenovirus (type A, B, C, D, E, F, G), human T-cell leukemia viruses, Epstein-Barr virus, murine leukemia virus, mumps virus, vesicular stomatitis virus, Sindbis virus, lymphocytic choriomeningitis virus, wart virus, blue tongue virus, Sendai virus, feline leukemia virus, Reovirus, polio virus, simian virus 40, mouse mammary tumor virus, dengue virus, rubella virus, West Nile virus, Human Bocavirus, Plasmodium falciparum, Plasmodium vivax, Toxoplasma gondii, Trypanosoma rangeli, Trypanosoma cruzi, Trypanosoma rhodesiense, Trypanosoma brucei, Schistosoma mansoni, Schistosoma japonicum, Babesia bovis, Eimeria tenella, Onchocerca volvulus, Leishmania tropica, Mycobacterium tuberculosis, Trichinella spiralis, Theileria parva, Taenia hydatigena, Taenia ovis, Taenia saginata, Echinococcus granulosus, Mesocestoides corti, Mycoplasma arthritidis, M. hyorhinis, M. orale, M. arginini, Acholeplasma laidlawii, M. salivarium and M. pneumoniae. SARS-CoV-2 Variants include Coronavirus HKU1, Coronavirus NL63, Coronavirus 229E, Coronavirus OC43, SARS-CoV-2 85Δ, SARS-CoV-2 T1001I, SARS-CoV-2 3675-3677Δ, SARS-CoV-2 P4715L, SARS-CoV-2 S5360L, SARS-CoV-2 69-70Δ, SARS-CoV-2 Tyr144fs, SARS-CoV-2 242-244Δ, SARS-CoV-2 Y453F, SARS-CoV-2 S477N, SARS-CoV-2 E848K, SARS-CoV-2 N501Y, SARS-CoV-2 D614G, SARS-CoV-2 P681R, SARS-CoV-2 P681H, SARS-CoV-2 L21F, SARS-CoV-2 Q27Stop, SARS-CoV-2 M1fs, and SARS-CoV-2 R203fs.
In some cases, the target sequence is a portion of a nucleic acid from a genomic locus, a transcribed mRNA, or a reverse transcribed cDNA from a gene locus of a bacterium or other agent responsible for a disease in the sample, where the nucleic acid comprises a mutation that confers resistance to a treatment, such as a single nucleotide mutation that confers resistance to antibiotic treatment.
[00155] In some instances, the target sequence is a portion of a nucleic acid from a subject having cancer. The cancer can be a solid cancer (tumor). The cancer can be a blood cell cancer, including leukemias and lymphomas. Non-limiting types of cancer that could be treated with such methods and compositions include colon cancer, rectal cancer, renal-cell carcinoma, liver cancer, bladder cancer, cancer of the kidney or ureter, lung cancer, cancer of the small intestine, esophageal cancer, melanoma, bone cancer, pancreatic cancer, skin
cancer, brain cancer (e.g., glioblastoma), cancer of the head or neck, uterine cancer, ovarian cancer, breast cancer, testicular cancer, cervical cancer, stomach cancer, Hodgkin’s Disease, non-Hodgkin’s lymphoma, and thyroid cancer. The cancer can be a leukemia, such as, by way of non-limiting example, acute myeloid (or myelogenous) leukemia (AML), chronic myeloid (or myelogenous) leukemia (CML), acute lymphocytic (or lymphoblastic) leukemia (ALL), and chronic lymphocytic leukemia (CLL).
[00156] In some instances, the target sequence is a portion of a nucleic acid from a cancer cell. A cancer cell can be a cell harboring one or more mutations that result in unchecked proliferation of the cancer cell. Such mutations are known in the art. Non-limiting examples of antigens are ADRB3, AKAP-4, ALK, Androgen receptor, B7H3, BCMA, BORIS, BST2, CAIX, CD179a, CD123, CD171, CD19, CD20, CD22, CD24, CD30, CD300LF, CD33, CD38, CD44v6, CD72, CD79a, CD79b, CD97, CEA, CLDN6, CLEC12A, CLL-1, CS-1, CXORF61, CYP1B1, Cyclin B1, E7, EGFR, EGFRvIII, ELF2M, EMR2, EPCAM, ERBB2 (Her2/neu), ERG (TMPRSS2 ETS fusion gene), ETV6-AML, EphA2, Ephrin B2, FAP, FCAR, FCRL5, FLT3, Folate receptor alpha, Folate receptor beta, Fos-related antigen 1, Fucosyl GM1, GD2, GD3, GM3, GPC3, GPR20, GPRC5D, GloboH, HAVCR1, HMWMAA, HPV E6, IGF-I receptor, IL-13Ra2, IL-11Ra, KIT, LAGE-1a, LAIR1, LCK, LILRA2, LMP2, LY6K, LY75, LewisY, MAD-CT-1, MAD-CT-2, MAGE A1, MAGE-A1, ML-IAP, MUC1, MYCN, MelanA/MART1, Mesothelin, NA17, NCAM, NY-BR-1, NY-ESO-1, OR51E2, OY-TES1, PANX3, PAP, PAX3, PAX5, PCTA-1/Galectin 8, PDGFR-beta, PLAC1, PRSS21, PSCA, PSMA, Polysialic acid, Prostase, RAGE-1, ROR1, RU1, RU2, Ras mutant, RhoC, SART3, SSEA-4, SSX2, TAG72, TARP, TEM1/CD248, TEM7R, TGS5, TRP-2, TSHR, Tie 2, Tn Ag, UPK2, VEGFR2, WT1, XAGE1, and IGLL1.
[00157] In some cases, the target sequence is a portion of a nucleic acid from a control gene in a sample. In some embodiments, the control gene is an endogenous control. The endogenous control can include human 18S rRNA, human GAPDH, human HPRT1, human GUSB, human RNase P, MS2 bacteriophage, or any other control sequence of interest within the sample.
[00158] The sample used for cancer testing or cancer risk testing can comprise at least one target sequence or target nucleic acid segment that can bind to a guide nucleic acid of the reagents described herein. The target nucleic acid segment, in some cases, is a portion of a nucleic acid from a gene with a mutation associated with cancer, from a gene whose overexpression is associated with cancer, a tumor suppressor gene, an oncogene, a checkpoint
inhibitor gene, a gene associated with cellular growth, a gene associated with cellular metabolism, or a gene associated with cell cycle. In some embodiments, the target nucleic acid encodes for a cancer biomarker, such as a prostate cancer biomarker or a non-small cell lung cancer biomarker. In some cases, the assay can be used to detect “hotspots” in target nucleic acids that can be predictive of cancer, such as lung cancer or cervical cancer. In some cases, the cancer can be a cancer that is caused by a virus. Some non-limiting examples of viruses that cause cancers in humans include Epstein-Barr virus (e.g., Burkitt’s lymphoma, Hodgkin’s Disease, and nasopharyngeal carcinoma); papillomavirus (e.g., cervical carcinoma, anal carcinoma, oropharyngeal carcinoma, penile carcinoma); hepatitis B and C viruses (e.g., hepatocellular carcinoma); human adult T-cell leukemia virus type 1 (HTLV-1) (e.g., T-cell leukemia); and Merkel cell polyomavirus (e.g., Merkel cell carcinoma). One skilled in the art will recognize that viruses can cause or contribute to other types of cancers. In some cases, the target nucleic acid is a portion of a nucleic acid that is associated with a blood fever. In some instances, the mutation is located in a portion of a nucleic acid from a genomic locus, any DNA amplicon of, a reverse transcribed mRNA, or a cDNA from a locus of at least one of: ALK, APC, ATM, AXIN2, BAP1, BARD1, BLM, BMPR1A, BRCA1, BRCA2, BRIP1, CASR, CDC73, CDH1, CDK4, CDKN1B, CDKN1C, CDKN2A, CEBPA, CHEK2, CTNNA1, DICER1, DIS3L2, EGFR, EPCAM, FH, FLCN, GATA2, GPC3, GREM1, HOXB13, HRAS, system, MAX, MEN1, MET, MITF, MLH1, MSH2, MSH3, MSH6, MUTYH, NBN, NF1, NF2, NTHL1, PALB2, PDGFRA, PHOX2B, PMS2, POLD1, POLE, POT1, PRKAR1A, PTCH1, PTEN, RAD50, RAD51C, RAD51D, RB1, RECQL4, RET, RUNX1, SDHA, SDHAF2, SDHB, SDHC, SDHD, SMAD4, SMARCA4, SMARCB1, SMARCE1, STK11, SUFU, TERC, TERT, TMEM127, TP53, TSC1, TSC2, VHL, WRN, and WT1.
In some instances, the mutation is associated with a blood disorder, e.g., a thalassemia or an anemia.
[00159] The sample used for genetic disorder testing can comprise at least one target sequence or target nucleic acid segment that can bind to a guide nucleic acid of the reagents described herein. In some embodiments, the genetic disorder is hemophilia, sickle cell anemia, β-thalassemia, Duchenne muscular dystrophy, severe combined immunodeficiency, or cystic fibrosis. The target nucleic acid, in some cases, is from a gene with a mutation associated with a genetic disorder, from a gene whose overexpression is associated with a genetic disorder, from a gene associated with abnormal cellular growth resulting in a genetic disorder, or from a gene associated with abnormal cellular metabolism resulting in a genetic
disorder. In some cases, the target nucleic acid is a nucleic acid from a genomic locus, a transcribed mRNA, a reverse transcribed mRNA, a DNA amplicon of, or a cDNA from a locus of at least one of: CFTR, FMR1, SMN1, ABCB11, ABCC8, ABCD1, ACAD9, ACADM, ACADVL, ACAT1, ACOX1, ACSF3, ADA, ADAMTS2, ADGRG1, AGA, AGL, AGPS, AGXT, AIRE, ALDH3A2, ALDOB, ALG6, ALMS1, ALPL, AMT, AQP2, ARG1, ARSA, ARSB, ASL, ASNS, ASPA, ASS1, ATM, ATP6V1B1, ATP7A, ATP7B, ATRX, BBS1, BBS10, BBS12, BBS2, BCKDHA, BCKDHB, BCS1L, BLM, BSND, CAPN3, CBS, CDH23, CEP290, CERKL, CHM, CHRNE, CIITA, CLN3, CLN5, CLN6, CLN8, CLRN1, CNGB3, COL27A1, COL4A3, COL4A4, COL4A5, COL7A1, CPS1, CPT1A, CPT2, CRB1, CTNS, CTSK, CYBA, CYBB, CYP11B1, CYP11B2, CYP17A1, CYP19A1, CYP27A1, DBT, DCLRE1C, DHCR7, DHDDS, DLD, DMD, DNAH5, DNAI1, DNAI2, DYSF, EDA, EIF2B5, EMD, ERCC6, ERCC8, ESCO2, ETFA, ETFDH, ETHE1, EVC, EVC2, EYS, F9, FAH, FAM161A, FANCA, FANCC, FANCG, FH, FKRP, FKTN, G6PC, GAA, GALC, GALK1, GALT, GAMT, GBA, GBE1, GCDH, GFM1, GJB1, GJB2, GLA, GLB1, GLDC, GLE1, GNE, GNPTAB, GNPTG, GNS, GRHPR, HADHA, HAX1, HBA1, HBA2, HBB, HEXA, HEXB, HGSNAT, HLCS, HMGCL, HOGA1, HPS1, HPS3, HSD17B4, HSD3B2, HYAL1, HYLS1, IDS, IDUA, IKBKAP, IL2RG, IVD, KCNJ11, LAMA2, LAMA3, LAMB3, LAMC2, LCA5, LDLR, LDLRAP1, LHX3, LIFR, LIPA, LOXHD1, LPL, LRPPRC, MAN2B1, MCOLN1, MED17, MESP2, MFSD8, MKS1, MLC1, MMAA, MMAB, MMACHC, MMADHC, MPI, MPL, MPV17, MTHFR, MTM1, MTRR, MTTP, MUT, MYO7A, NAGLU, NAGS, NBN, NDRG1, NDUFAF5, NDUFS6, NEB, NPC1, NPC2, NPHS1, NPHS2, NR2E3, NTRK1, OAT, OPA3, OTC, PAH, PC, PCCA, PCCB, PCDH15, PDHA1, PDHB, PEX1, PEX10, PEX12, PEX2, PEX6, PEX7, PFKM, PHGDH, PKHD1, PMM2, POMGNT1, PPT1, PROP1, PRPS1, PSAP, PTS, PUS1, PYGM, RAB23, RAG2, RAPSN, RARS2, RDH12, RMRP, RPE65, RPGRIP1L, RS1, RTEL1, SACS, SAMHD1, SEPSECS, SGCA, SGCB, SGCG, SGSH, SLC12A3, SLC12A6, SLC17A5, SLC22A5, SLC25A13, SLC25A15, SLC26A2, SLC26A4, SLC35A3, SLC37A4, SLC39A4, SLC4A11, SLC6A8, SLC7A7, SMARCAL1, SMPD1, STAR, SUMF1, TAT, TCIRG1, TECPR2, TFR2, TGM1, TH, TMEM216, TPP1, TRMU, TSFM, TTPA, TYMP, USH1C, USH2A, VPS13A, VPS13B, VPS45, VRK1, VSX2, WNT10A, XPA, XPC, and ZFYVE26.
[00160] Amplification Techniques
[00161] In some embodiments, target nucleic acids are amplified and/or comprise amplicons. The methods described herein can comprise amplifying a target nucleic acid
molecule, and/or amplifying a nucleic acid molecule in a sample to produce a target nucleic acid molecule. Amplification can occur prior to or simultaneously with detection of a signal indicative of reporter cleavage. In some embodiments, amplification and detection can occur within the same reaction volume sequentially or simultaneously. In some embodiments, amplification and detection can occur sequentially in different reaction volumes. In some embodiments, amplification and detection can occur at different temperatures. In some embodiments, amplification and detection can occur at the same temperature. In some embodiments, amplifying can improve at least one of sensitivity, specificity, or accuracy of the detection of the target nucleic acid.
[00162] Amplification can be isothermal or can comprise thermocycling. In some embodiments, amplification comprises transcription mediated amplification (TMA), helicase dependent amplification (HDA), circular helicase dependent amplification (cHDA), strand displacement amplification (SDA), recombinase polymerase amplification (RPA), polymerase chain reaction (PCR), quantitative PCR (qPCR), nested PCR, multiplex PCR, degenerative PCR, asymmetric PCR, touchdown PCR, random primer PCR, hemi-nested PCR, polymerase cycling assembly (PCA), colony PCR, ligase chain reaction (LCR), digital PCR, methylation-specific PCR (MSP), co-amplification at lower denaturation temperature PCR (COLD-PCR), allele-specific PCR, intersequence-specific PCR (ISS-PCR), whole genome amplification (WGA), inverse PCR, thermal asymmetric interlaced PCR (TAIL-PCR), loop mediated amplification (LAMP), exponential amplification reaction (EXPAR), rolling circle amplification (RCA), simple method amplifying RNA targets (SMART), single primer isothermal amplification (SPIA), multiple displacement amplification (MDA), nucleic acid sequence based amplification (NASBA), hinge-initiated primer-dependent amplification of nucleic acids (HIP), nicking enzyme amplification reaction (NEAR), and/or improved multiple displacement amplification (IMDA). In some embodiments, the amplification reaction includes reverse transcription (RT), for example when amplifying an RNA target nucleic acid.
[00163] In some embodiments, amplification of the target nucleic acid comprises modifying the sequence of the target nucleic acid. For example, amplification can be used to insert a protospacer adjacent motif (PAM) sequence into a target nucleic acid that lacks a PAM sequence. In some cases, amplification is used to increase the homogeneity of a target nucleic acid in a sample. For example, amplification can be used to remove a nucleic acid variation that is not of interest in the target nucleic acid sequence.
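As a minimal sketch of the PAM-insertion example above, the following code models the amplicon sequence that could result if a primer overwrote the bases immediately 5′ of a protospacer with a PAM. The template sequence, the protospacer position, the PAM sequence `TTTA`, and the function name `introduce_pam` are all hypothetical assumptions chosen for illustration; the disclosure does not specify them, and the sketch models only the resulting sequence, not primer design or reaction conditions.

```python
def introduce_pam(template: str, protospacer_start: int, pam: str = "TTTA") -> str:
    """Return the amplicon expected when the bases immediately 5' of the
    protospacer are replaced by `pam` (hypothetical primer-encoded change).

    protospacer_start is the 0-based index of the first protospacer base.
    """
    pam_start = protospacer_start - len(pam)
    if pam_start < 0:
        raise ValueError("not enough sequence upstream of the protospacer")
    return template[:pam_start] + pam + template[protospacer_start:]


# Made-up template whose protospacer begins at index 8 and lacks a 5' PAM.
template = "GGCCGGCCACGTACGTACGT"
amplicon = introduce_pam(template, protospacer_start=8)
has_pam = amplicon[4:8] == "TTTA"
```

The amplicon keeps the protospacer unchanged while the four upstream bases now read as the assumed PAM.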
[00164] Subjects.
[00165] As used herein, the term “subject” refers to any living or non-living organism including, but not limited to, a human (e.g., a male human, female human, fetus, pregnant female, child, or the like), a non-human mammal, or a non-human animal. Any human or non-human animal can serve as a subject, including but not limited to mammal, reptile, avian, amphibian, fish, ungulate, ruminant, bovine (e.g., cattle), equine (e.g., horse), caprine and ovine (e.g., sheep, goat), swine (e.g., pig), camelid (e.g., camel, llama, alpaca), monkey, ape (e.g., gorilla, chimpanzee), ursid (e.g., bear), poultry, dog, cat, mouse, rat, fish, dolphin, whale and shark. In some embodiments, a subject is a male or female of any age (e.g., a man, a woman, or a child).
[00166] Reference Sequences.
[00167] As used herein, the term “reference sequence” refers to a sequence of nucleotide bases. In some embodiments, a reference sequence is a reference genome. For example, a “genome” or “reference genome” can refer to any particular known, sequenced or characterized genome, whether partial or complete, of any organism or virus that can be used to reference identified sequences from a subject. Exemplary reference genomes used for human subjects as well as many other organisms are provided in the on-line genome browser hosted by the National Center for Biotechnology Information (“NCBI”) or the University of California, Santa Cruz (UCSC). A “genome” refers to the complete genetic information of an organism or virus, expressed in nucleic acid sequences. As used herein, a reference sequence or reference genome often is an assembled or partially assembled genomic sequence from an individual subject or from multiple subjects. In some embodiments, a reference genome is an assembled or partially assembled genomic sequence from one or more human subjects. In some embodiments, a reference genome is an assembled or partially assembled genomic sequence from one or more microorganisms of the same species. The reference genome can be viewed as a representative example of a species’ set of genes. In some embodiments, a reference genome comprises sequences assigned to chromosomes. Exemplary human reference genomes include but are not limited to NCBI build 34 (UCSC equivalent: hg16), NCBI build 35 (UCSC equivalent: hg17), NCBI build 36.1 (UCSC equivalent: hg18), GRCh37 (UCSC equivalent: hg19), and GRCh38 (UCSC equivalent: hg38).
[00168] The implementations described herein provide various technical solutions for determining a mutation call for a target locus in a biological sample. Details of implementations are now described in conjunction with the Figures.
[00169] Exemplary System Embodiments
[00170] FIG. 1 is a block diagram illustrating a system 100 for determining a mutation call for a target locus in a biological sample, in accordance with some implementations. The system 100 in some implementations includes one or more central processing units (CPU(s)) 102 (also referred to as processors), one or more network interfaces 104, a user interface 106, a non-persistent memory 111, a persistent memory 112, and one or more communication buses 110 for interconnecting these components. The one or more communication buses 110 optionally include circuitry (sometimes called a chipset) that interconnects and controls communications between system components. The non-persistent memory 111 typically includes high-speed random access memory, such as DRAM, SRAM, DDR RAM, ROM, EEPROM, or flash memory, whereas the persistent memory 112 typically includes CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid state storage devices. The persistent memory 112 optionally includes one or more storage devices remotely located from the CPU(s) 102. The persistent memory 112, and the non-volatile memory device(s) within the non-persistent memory 111, comprise a non-transitory computer readable storage medium. In some implementations, the non-persistent memory 111 or alternatively the non-transitory computer readable storage medium stores the following programs, modules and data structures, or a subset thereof, sometimes in conjunction with the persistent memory 112:
• an optional operating system 116, which includes procedures for handling various basic system services and for performing hardware dependent tasks;
• an optional network communication module (or instructions) 118 for connecting the visualization system 100 with other devices, or a communication network;
• a signal data store 120 comprising, for each respective well 122 in a plurality of wells, a corresponding plurality of reporting signals 124 for a respective one or more nucleic acid molecules in the biological sample that map to the target locus, where:
o the plurality of wells comprises a first set of wells 122 (e.g., 122-1-1, ... 122-K-1) representing a first allele for the target locus,
o the plurality of wells further comprises a second set of wells 122 (e.g., 122-1-2, ... 122-K-2) representing a second allele for the target locus,
o each corresponding plurality of reporting signals 124 comprises, for each respective time point in a plurality of time points (e.g., P time points, where P is a positive integer), a respective reporting signal in the form of a corresponding discrete attribute value (e.g., 124-1-1-1, ... 124-1-1-P; 124-2-1-1, ... 124-2-1-P),
o each respective well 122 in the first set of wells and each respective well 122 in the second set of wells contains a corresponding aliquot of nucleic acid derived from the biological sample, and
o optionally, the signal data store 120 further comprises a plurality of control wells 126 that are free of nucleic acid derived from the biological sample, including a first set of control wells 126-1 representing the first allele for the target locus and a second set of control wells 126-2 representing the second allele for the target locus;
• a data analysis construct 130 for determining:
o for each respective well 122 in the plurality of wells, a corresponding signal yield 132 (e.g., 132-1-1, 132-1-2) for the respective well 122 using the corresponding plurality of reporting signals across the plurality of time points, and
o for each respective well 122 in the first set of wells (e.g., 122-1-1, ... 122-K-1), a respective candidate call identity 134 (e.g., 134-1) based on a comparison between a corresponding first signal yield (e.g., 132-1-1) for the respective well in the first set of wells and a corresponding second signal yield (e.g., 132-1-2) for a corresponding well in the second set of wells; and
• a voting construct 140 for performing a voting procedure across the plurality of candidate call identities for the first set of wells, thereby obtaining a mutation call for the target locus.
[00171] In some implementations, one or more of the above identified elements are stored in one or more of the previously mentioned memory devices and correspond to a set of instructions for performing a function described above. The above identified modules, data, or programs (e.g., sets of instructions) need not be implemented as separate software programs, procedures, data sets, or modules, and thus various subsets of these modules and data may be combined or otherwise re-arranged in various implementations. In some implementations, the non-persistent memory 111 optionally stores a subset of the modules and data structures identified above. Furthermore, in some embodiments, the memory stores additional modules and data structures not described above. In some embodiments, one or more of the above identified elements is stored in a computer system, other than that of system 100, that is addressable by system 100 so that system 100 may retrieve all or a portion of such data when needed.
[00172] Although FIG. 1 depicts a “system 100,” the figures are intended more as a functional description of the various features which may be present in computer systems than as a structural schematic of the implementations described herein. In practice, and as recognized by those of ordinary skill in the art, items shown separately could be combined and some items could be separated. Moreover, although FIG. 1 depicts certain data and modules in non-persistent memory 111, some or all of these data and modules may be in persistent memory 112.
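As a rough illustration of how the signal data store 120, the data analysis construct 130, and the voting construct 140 could fit together, the following Python sketch pairs each first-allele well with a second-allele well, reduces each time series to a signal yield, derives a candidate call per pair, and takes a vote. The yield definition (last reading minus first reading), the "larger yield wins" comparison, the plurality vote, the "no_call" label, and all class and function names are assumptions made for illustration only; this passage of the disclosure does not fix any of them.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Well:
    """One well 122: the allele its guides represent and its reporting signals 124."""
    allele: str               # hypothetical label, e.g. "WT" or "MUT"
    signals: Sequence[float]  # one discrete attribute value per time point


def signal_yield(well: Well) -> float:
    # Assumed definition: net signal gained between the first and last time point.
    return well.signals[-1] - well.signals[0]


def candidate_call(first: Well, second: Well) -> str:
    # Assumed comparison: the allele whose well shows the larger yield is called.
    y1, y2 = signal_yield(first), signal_yield(second)
    if y1 == y2:
        return "no_call"
    return first.allele if y1 > y2 else second.allele


def mutation_call(first_set: List[Well], second_set: List[Well]) -> str:
    # Assumed voting procedure: plurality vote over the candidate calls,
    # with a tie between the top two labels reported as "no_call".
    if not first_set or len(first_set) != len(second_set):
        raise ValueError("well sets must be non-empty and of equal size")
    calls = [candidate_call(a, b) for a, b in zip(first_set, second_set)]
    ranked = Counter(calls).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "no_call"
    return ranked[0][0]
```

In this sketch, wells are paired by position in the two lists; a real implementation would pair wells by whatever plate layout or replicate identifier the assay actually uses.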
[00173] Embodiments for Determining Mutation Calls
[00174] While a system in accordance with the present disclosure has been disclosed with reference to FIG. 1, a method in accordance with the present disclosure is now detailed with reference to FIGS. 2A-2G.
[00175] Referring to FIGS. 2A-2G, the present disclosure provides a method 200 for determining a mutation call for a target locus in a biological sample. In some embodiments, the method is performed at a computer system having one or more processors, and memory storing one or more programs for execution by the one or more processors.
[00176] Referring to Block 202, the method includes obtaining a signal dataset 120 comprising, for each respective well 122 in a plurality of wells in a common plate, a corresponding plurality of reporting signals 124 for a respective one or more nucleic acid molecules in the biological sample that map to the target locus. The signal dataset 120 represents a plurality of time points. The plurality of wells 122 comprises a first set of wells representing a first allele for the target locus, where each well in the first set of wells includes a first plurality of guide nucleic acids that have the first allele of the target locus. The plurality of wells 122 further comprises a second set of wells representing a second allele for the target locus, where each well in the second set of wells includes a second plurality of guide nucleic acids that have the second allele of the target locus. Each corresponding plurality of reporting signals 124 comprises, for each respective time point in the plurality of time points, a respective reporting signal 124 in the form of a corresponding discrete attribute value. Each respective well in the first set of wells and each respective well in the second set of wells contains a corresponding aliquot of nucleic acid derived from the biological sample.
[00177] Samples.
[00178] In some embodiments, the biological sample is a clinical sample, a diagnostic sample, an environmental sample, a consumer quality sample, a food sample, a biological product sample, a microbial testing sample, a tumor sample, a forensic sample and/or a laboratory or hospital sample. In some embodiments, the biological sample is obtained from a human or an animal. In some embodiments, a biological sample is a sample from a patient undergoing a treatment.
[00179] In some embodiments, the biological sample is collected from an environmental source, such as a field (e.g., an agricultural field), lake, river, creek, ocean, watershed, water tank, water reservoir, pool (e.g., swimming pool), pond, air vent, wall, roof, soil, plant, and/or other environmental source. In some embodiments, the biological sample is collected from an industrial source, such as a clean room (e.g., in manufacturing or research facilities), hospital, medical laboratory, pharmacy, pharmaceutical compounding center, food processing area, food production area, water or waste treatment facility, and/or food product. In some embodiments, the biological sample is an air sample, such as ambient air in a facility (e.g., a medical facility or other facility), exhaled or expectorated air from a subject, and/or aerosols, including any biological contaminants present therein (e.g., bacteria, fungi, viruses, and/or pollens). In some embodiments, the biological sample is a water sample, such as water from dialysis systems in a medical facility (e.g., to detect waterborne pathogens of clinical significance and/or to determine the quality of water in a facility). In some embodiments, the biological sample is an environmental surface sample, such as a sample taken before or after a sterilization or disinfecting process (e.g., to confirm the effectiveness of the sterilization or disinfecting procedure).
[00180] In some embodiments, the biological sample is obtained from a subject, such as a human (e.g., a patient). In some embodiments, the biological sample is obtained from any tissue, organ or fluid from the subject. In some embodiments, a plurality of biological samples is obtained from the subject (e.g., a plurality of replicates and/or a plurality of samples including a healthy sample and a diseased sample).
[00181] In some embodiments, the biological sample is a clinical sample. In some embodiments, the biological sample is a bodily fluid. In some embodiments, the bodily fluid is sputum, saliva, nasopharyngeal fluid, oropharyngeal fluid or blood. In some embodiments, the biological sample is obtained from a nasopharyngeal or oropharyngeal swab. In some embodiments, the biological sample is obtained from a sample repository.
[00182] In some embodiments, the biological sample is obtained from a human with a disease condition (e.g., an infectious disease and/or a disease caused by a pathogenic microorganism). For instance, referring to Block 204, in some embodiments, the biological sample is obtained from a subject that is infected with a pathogen. In some embodiments, the pathogen is a virus, bacteria, fungus, or parasite.
[00183] In some embodiments, the pathogen is the causative agent of influenza, common cold, measles, rubella, chickenpox, norovirus, polio, infectious mononucleosis (mono), herpes simplex virus (HSV), human papillomavirus (HPV), human immunodeficiency virus (HIV), viral hepatitis (e.g., hepatitis A, B, C, D, and/or E), viral meningitis, West Nile Virus, rabies, ebola, strep throat, bacterial urinary tract infections (UTIs) (e.g., coliform bacteria), bacterial food poisoning (e.g., E. coli, Salmonella, and/or Shigella), bacterial cellulitis (e.g., Staphylococcus aureus (MRSA)), bacterial vaginosis, gonorrhea, chlamydia, syphilis, Clostridium difficile (C. diff), tuberculosis, whooping cough, pneumococcal pneumonia, bacterial meningitis, Lyme disease, cholera, botulism, tetanus, anthrax, vaginal yeast infection, ringworm, athlete’s foot, thrush, aspergillosis, histoplasmosis, Cryptococcus infection, fungal meningitis, malaria, toxoplasmosis, trichomoniasis, giardiasis, tapeworm infection, roundworm infection, pubic and head lice, scabies, leishmaniasis, and/or river blindness. In some embodiments, the pathogen is the causative agent of one or more viral respiratory diseases. In some embodiments, the pathogen is the causative agent of a coronavirus infection. Referring to Block 206, in some embodiments, the pathogen is severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).
[00184] In some embodiments, the biological sample is any of the embodiments for samples described herein. Other suitable embodiments of samples and/or subjects include those disclosed above (see, for example, the sections entitled “Target Nucleic Acids: Samples” and “Subjects,” above), and any substitutions, modifications, additions, deletions, and/or combinations thereof, as will be apparent to one skilled in the art.
[00185] Referring to Block 208, in some embodiments, the target locus is selected from the group consisting of a single nucleotide variant, a multi-nucleotide variant, an indel, a DNA rearrangement, and a copy number variation. In some embodiments, the target locus is selected from the group consisting of a restriction fragment length polymorphism (RFLP), a random amplified polymorphic DNA (RAPD), an amplified fragment length polymorphism (AFLP), a variable number tandem repeat (VNTR), an oligonucleotide polymorphism (OP), a single nucleotide polymorphism (SNP), an allele specific associated primer (ASAP), an inverse sequence-tagged repeat (ISTR), an inter-retrotransposon amplified polymorphism (IRAP), and a simple sequence repeat (SSR or microsatellite).
[00186] In some embodiments, the target locus is selected from a database. For instance, in some embodiments, the target locus maps to a corresponding reference sequence (e.g., a reference genome), and the corresponding reference sequence is obtained from a nucleotide sequence database. Suitable nucleotide sequence databases include global genome databases and/or microorganism-specific genome databases. For example, in some embodiments, a reference sequence for a microorganism is obtained from NCBI, BLAST, EMBL-EBI, GenBank, Ensembl, EuPathDB, The Human Microbiome Project, Pathogen Portal, RDP, SILVA, GREENGENES, EBI Metagenomics, EcoCyc, PATRIC, TBDB, PlasmoDB, the Microbial Genome Database (MBGD), and/or the Microbial Rosetta Stone Database. See, for example, Zhulin, 2015, “Databases for Microbiologists,” J Bacteriol 197:2458-2467, doi: 10.1128/JB.00330-15; Uchiyama et al., 2019, “MBGD update 2018: microbial genome database based on hierarchical orthology relations covering closely related and distantly related comparisons,” Nuc Acids Res., 47 (D1), D382-D389, doi: 10.1093/nar/gky1054; and Ecker et al., 2005, “The Microbial Rosetta Stone Database: A compilation of global and emerging infectious microorganisms and bioterrorist threat agents,” BMC Microbiology 5, 19, doi: 10.1186/1471-2180-5-19, each of which is hereby incorporated by reference herein in its entirety.
[00187] In some embodiments, the target locus is a gene. In some embodiments, the target locus maps to all or a portion of a gene. In some embodiments, the target locus maps to all or a portion of a plurality of genes.
[00188] In some embodiments, the target locus comprises DNA or RNA. For instance, in some such embodiments, the target locus maps to all or a portion of a reference genome that consists essentially of RNA sequences. In some embodiments, the target locus maps to all or a portion of a reference genome that consists essentially of DNA sequences.
[00189] In some embodiments, the target locus comprises DNA, and the one or more nucleic acid molecules in the biological sample that map to the target locus are obtained from a transcription reaction of a DNA molecule for the target locus. In some such embodiments, the transcription reaction generates one or more RNA transcripts that correspond to the target locus and, for each respective well in the plurality of wells in the common plate, the corresponding aliquot of nucleic acid derived from the biological sample includes the one or more RNA transcripts used for obtaining the respective plurality of reporting signals.
[00190] In some embodiments, the target locus, nucleic acids, mutations, and/or reference sequences include any of the embodiments described herein. Other suitable embodiments of target loci, nucleic acids, mutations, and/or reference sequences include those disclosed above (see, for example, the sections entitled “Target Loci,” “Target Nucleic Acids,” and “Reference Sequences,” above), as well as any substitutions, modifications, additions, deletions, and/or combinations thereof, as will be apparent to one skilled in the art.
[00191] For instance, in some embodiments, the target locus comprises a plurality of alleles, including a first allele and a second allele. Referring to Block 210, in some embodiments, the first allele is wild-type, the second allele is mutant, and the mutation call is wild-type or mutant. In some embodiments, referring to Block 212, the first allele is a first mutant, and the second allele is a second mutant. In some such embodiments, the mutation call is for the first mutant allele or the second mutant allele. In some embodiments, the first allele and the second allele are selected from a plurality of alleles for the respective target locus.
[00192] In some embodiments, the plurality of alleles includes at least 2, at least 3, at least 4, at least 5, or at least 8 alleles. In some embodiments, the plurality of alleles includes no more than 10, no more than 8, no more than 5, no more than 3, or no more than 2 alleles. In some embodiments, the plurality of alleles is from 2 to 5, from 3 to 8, from 3 to 10, or from 5 to 10 alleles. In some embodiments, the plurality of alleles falls within another range starting no lower than 2 alleles and ending no higher than 10 alleles.
[00193] In some embodiments, the target locus comprises a plurality of alleles, including at least a wild-type allele. In some embodiments, the target locus comprises a plurality of alleles, including at least a mutant allele. In some embodiments, the target locus comprises a plurality of alleles, including at least a first mutant allele and a second mutant allele. In some embodiments, the target locus comprises one or more mutant alleles. For example, in some embodiments, the target locus comprises at least 2, at least 3, at least 4, at least 5, or at least 8 mutant alleles. In some embodiments, the target locus comprises no more than 10, no more than 8, no more than 5, no more than 3, or no more than 2 mutant alleles. In some embodiments, the target locus comprises from 2 to 5, from 3 to 8, from 3 to 10, or from 5 to 10 mutant alleles. In some embodiments, the target locus comprises a set of mutant alleles that falls within another range starting no lower than 2 alleles and ending no higher than 10 alleles.
[00194] Accordingly, in some embodiments, the mutation call is for a wild-type allele. In some embodiments, the mutation call is for a mutant allele. In some embodiments, the mutation call is for a respective mutant allele in a plurality of mutant alleles. In some embodiments, the mutation call is selected from the group consisting of a wild-type allele and a respective mutant allele in a plurality of mutant alleles.
[00195] In some embodiments, the first plurality of guide nucleic acids that have the first allele of the target locus hybridize to the first allele. In some embodiments, the first plurality of guide nucleic acids that have the first allele of the target locus have a nucleic acid sequence that is reverse complementary to the nucleic acid sequence of the first allele. In some embodiments, the second plurality of guide nucleic acids that have the second allele of the target locus hybridize to the second allele. In some embodiments, the second plurality of guide nucleic acids that have the second allele of the target locus have a nucleic acid sequence that is reverse complementary to the nucleic acid sequence of the second allele.
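The reverse-complement relationship described above can be illustrated with a short Python sketch. The sequences are made-up placeholders rather than real allele or guide sequences, the function and variable names are hypothetical, and the sketch works in the DNA alphabet only (an RNA guide would carry U in place of T).

```python
# Translation table mapping each DNA base to its complement (case preserved).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A<->T, C<->G)."""
    return seq.translate(_COMPLEMENT)[::-1]


# Placeholder allele sequences that differ at a single position.
first_allele = "ATGCCGTTA"
second_allele = "ATGCTGTTA"

# Under the reverse-complement embodiment, each guide's targeting region
# would be the reverse complement of the allele it represents.
first_guide_region = reverse_complement(first_allele)
second_guide_region = reverse_complement(second_allele)
```

Because the two placeholder alleles differ at one base, the two guide regions also differ at exactly one base, which is the property the paired first-allele and second-allele wells rely on.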
[00196] In some embodiments, a respective guide nucleic acid comprises any of the embodiments for guide nucleic acids disclosed herein, as well as any substitutions, modifications, additions, deletions, and/or combinations thereof, as will be apparent to one skilled in the art. See, for example, the section entitled “Engineered Guide Nucleic Acids,” above.
[00197] In some embodiments, the corresponding aliquot of nucleic acid derived from the biological sample is RNA or DNA. In some embodiments, the corresponding aliquot of nucleic acid derived from the biological sample comprises nucleic acids obtained from within cells. Alternatively, or in addition, in some embodiments, the corresponding aliquot of nucleic acid derived from the biological sample comprises cell-free nucleic acid molecules.
[00198] In some embodiments, the one or more nucleic acid molecules comprises synthetic nucleic acid molecules. For instance, Example 1 describes systems and methods for identifying mutation calls using synthetic nucleic acid molecules, in accordance with an embodiment of the present disclosure.
[00199] In some embodiments, the method includes isolating the one or more nucleic acid molecules from the biological sample. In some embodiments, isolation of nucleic acid molecules includes removing one or more cells from the biological sample, subjecting the biological sample to a lysis step, and/or treating the biological sample in order to separate cellular nucleic acid molecules from cell-free nucleic acid molecules. In some embodiments, isolation of nucleic acid molecules includes isolation of cell-free nucleic acid molecules from a liquid biological sample (e.g., plasma and/or blood).
[00200] In some embodiments, the method includes heat-inactivating the biological sample.
[00201] In some embodiments, the method includes performing an amplification of one or more nucleic acid molecules derived from the biological sample. See, for instance, the section entitled, “Nucleic Acid Amplification,” below.
[00202] Other methods for deriving nucleic acid from biological samples and/or preparing biological samples for the same are contemplated for use in the present disclosure, as will be apparent to one skilled in the art.
[00203] In some embodiments, the plurality of wells 122 comprises at least 3, at least 5, at least 10, at least 20, at least 40, at least 50, at least 80, at least 100, at least 300, or at least 500 wells. In some embodiments, the plurality of wells comprises no more than 1000, no more than 500, no more than 200, no more than 80, no more than 50, or no more than 30 wells. In some embodiments, the plurality of wells is from 3 to 20, from 5 to 100, from 8 to 50, or from 10 to 500 wells. In some embodiments, the plurality of wells falls within another range starting no lower than 3 wells and ending no higher than 1000 wells.
[00204] In some embodiments, the common plate is a multi-well plate.
[00205] Nucleic Acid Amplification.
[00206] Referring to Block 214, in some embodiments, the signal dataset 120 is obtained by a procedure comprising amplifying a first plurality of nucleic acids 304 derived from the biological sample, thereby generating a plurality of amplified nucleic acids 404. The procedure further includes, for each respective well 122 in the first set of wells and each respective well in the second set of wells, partitioning, from the plurality of amplified nucleic acids 404, the respective corresponding aliquot of nucleic acid 410; contacting the respective corresponding aliquot of nucleic acid with at least a programmable nuclease, a guide nucleic acid, and one or more reporters; and cleaving the one or more reporters using the programmable nuclease if the first allele or the second allele, respectively, is present, thereby generating a reporting signal 124.
[00207] Accordingly, in an example embodiment illustrated in FIG. 3, a biological sample 302 is used to obtain a plurality of nucleic acids 304. For instance, in some implementations, the nucleic acids 304 are extracted from the biological sample 302. The signal dataset 120 is obtained by a procedure 306 such as a DETECTR reaction, which includes an optional amplification reaction 308 and a programmable nuclease-based assay 310. The amplifying 308 includes any suitable method for amplification of nucleic acids known in the art, such as reverse transcriptase loop-mediated isothermal amplification (RT-LAMP). The programmable nuclease-based assay 310 also includes any suitable detection reaction, such as a Cas nuclease-based reaction, in which recognition of a target nucleic acid in the sample nucleic acids 304 by a guide nucleic acid (e.g., gRNA) induces cleavage of a reporter (e.g., a nucleic acid probe) by a Cas nuclease. In some implementations, the Cas nuclease is complexed with the guide nucleic acid. Reporting signals generated from cleavage of the reporter can be detected, measured, and used for mutation calling and variant identification 312, in accordance with methods of the present disclosure.
[00208] In some embodiments, the procedure 306 is a DETECTR reaction. In some embodiments, the procedure 306 is SHERLOCK. In some embodiments, the signal dataset 120 is obtained by a procedure that does not comprise an amplification step.
[00209] Generation of reporting signals using amplification and/or programmable nuclease-based assays is further described in, e.g., Broughton et al., “CRISPR-Cas12-based detection of SARS-CoV-2,” Nature Biotechnology 38, 870-874 (2020), which is hereby incorporated herein by reference in its entirety. Other embodiments for the generation of reporting signals using amplification and/or programmable nuclease-based assays are contemplated herein, as will be apparent to one skilled in the art.
[00210] Referring to Block 216, in some embodiments, the amplifying is performed using isothermal amplification, loop-mediated isothermal amplification (LAMP), reverse transcriptase loop-mediated isothermal amplification (RT-LAMP), recombinase polymerase amplification (RPA), reverse transcriptase recombinase polymerase amplification (RT-RPA), polymerase chain reaction (PCR), or reverse transcriptase polymerase chain reaction (RT-PCR). In some embodiments, the amplifying is performed using a multiplexed amplification reaction.
[00211] In some embodiments, the amplifying is performed using transcription mediated amplification (TMA), helicase dependent amplification (HDA), circular helicase dependent amplification (cHDA), strand displacement amplification (SDA), recombinase polymerase amplification (RPA), polymerase chain reaction (PCR), quantitative PCR (qPCR), nested PCR, multiplex PCR, degenerative PCR, asymmetric PCR, touchdown PCR, random primer PCR, hemi-nested PCR, polymerase cycling assembly (PCA), colony PCR, ligase chain reaction (LCR), digital PCR, methylation-specific PCR (MSP), co-amplification at lower denaturation temperature PCR (COLD-PCR), allele-specific PCR, intersequence-specific PCR (ISS-PCR), whole genome amplification (WGA), inverse PCR, thermal asymmetric interlaced PCR (TAIL-PCR), loop mediated amplification (LAMP), exponential amplification reaction (EXPAR), rolling circle amplification (RCA), ligase chain reaction (LCR), simple method amplifying RNA targets (SMART), single primer isothermal amplification (SPIA), multiple displacement amplification (MDA), nucleic acid sequence based amplification (NASBA), hinge-initiated primer-dependent amplification of nucleic acids (HIP), nicking enzyme amplification reaction (NEAR), and/or improved multiple displacement amplification (IMDA). In some embodiments, the amplification reaction includes reverse transcription (RT), for example when amplifying an RNA target nucleic acid. See, e.g., the section entitled “Target Nucleic Acids: Amplification Techniques,” above.
[00212] In particular, isothermal amplification techniques employ a polymerase and a set of specialized primers designed to recognize distinct sequences in one or more target nucleic acids. For example, LAMP techniques typically perform amplification of target nucleic acid molecules at a constant temperature (e.g., 60-65 °C) using multiple inner and outer primers and a polymerase having strand displacement activity. In some instances, LAMP is initiated by an inner primer pair containing a nucleic acid sequence complementary to a portion of the sense and antisense strands of the target nucleic acid. Following strand displacement synthesis by the inner primers, strand displacement synthesis primed by an outer primer pair can cause release of a single-stranded amplicon. The single-stranded amplicon can serve as a template for further synthesis primed by a second inner and second outer primer that hybridize to the other end of the target nucleic acid and produce a stem-loop nucleic acid structure. In subsequent LAMP cycling, one inner primer hybridizes to the loop on the product and initiates displacement and target nucleic acid synthesis, yielding the original stem-loop product and a new stem-loop product with a stem twice as long. Additionally, the 3’ terminus of an amplicon loop structure serves as initiation site for self-templating strand synthesis, yielding a hairpin-like amplicon that forms an additional loop structure to prime subsequent rounds of self-templated amplification. The amplification continues with accumulation of many copies of the target nucleic acid. The final products of the LAMP process are stem-loop nucleic acids with concatenated repeats of the target nucleic acid in cauliflower-like structures with multiple loops formed by annealing between alternately inverted repeats of a target nucleic acid sequence in the same strand.
[00213] Typically, LAMP assays produce a detectable signal (e.g., fluorescence) during the amplification reaction. In some embodiments, the method comprises detecting and/or quantifying a detectable signal (e.g., fluorescence) produced during the LAMP assay. Any suitable method for detecting and quantifying fluorescence can be used.
[00214] In some embodiments, the amplifying is performed using any of the embodiments described herein, or any substitutions, modifications, additions, deletions, and/or combinations thereof, as will be apparent to one skilled in the art. See, for example, the section entitled “Target Nucleic Acids: Amplification Techniques,” above.
[00215] Referring to Block 218, the method further includes, prior to the amplifying, partitioning the first plurality of nucleic acids 304 derived from the biological sample into a plurality of amplification replicates 404 (e.g., 404-1, 404-2, 404-3), where the amplifying generates a respective plurality of amplified nucleic acids for each respective amplification replicate. Accordingly, the method comprises generating aliquots 404 for amplification reactions, as illustrated in FIG. 4A. In some embodiments, the first plurality of nucleic acids is not partitioned into amplification replicates prior to the amplifying.
[00216] In some embodiments, the plurality of amplification replicates comprises at least 3, at least 4, at least 5, at least 6, at least 8, at least 10, at least 15, or at least 20 amplification replicates. In some embodiments, the plurality of amplification replicates comprises no more than 30, no more than 20, no more than 10, or no more than 6 amplification replicates. In some embodiments, the plurality of amplification replicates is from 3 to 9, from 4 to 12, or from 10 to 25 amplification replicates. In some embodiments, the plurality of amplification replicates falls within another range starting no lower than 3 replicates and ending no higher than 30 replicates.
[00217] In some embodiments, the method further includes applying an amplification threshold filter to each respective amplification replicate, where, when the respective plurality of amplified nucleic acids fails to satisfy an amplification threshold, the respective amplification replicate is removed from the plurality of amplification replicates.
[00218] In some embodiments, the method further includes applying an amplification threshold filter to each respective biological sample in a plurality of biological samples, where, when the respective plurality of amplified nucleic acids fails to satisfy an amplification threshold, the respective sample is removed from the plurality of samples.
[00219] In some embodiments, the amplification threshold is established by performing a visual inspection of amplification reaction curves. In some embodiments, the amplification reaction curve is a receiver operating characteristic (ROC) curve. In some embodiments, the amplification threshold is determined by selecting an optimal Youden Index point such that the absolute difference between a sensitivity metric and a false positive rate calculated using reference amplification data is maximized.
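The Youden Index selection described above can be sketched as follows. This is a minimal illustration only, not the disclosed implementation: the function name `youden_threshold`, the input layout (one score and one known amplified/not-amplified label per reference reaction), and the assumption that higher scores indicate amplification are all hypothetical. Under that last assumption the signed difference between sensitivity and false positive rate is maximized, which coincides with the absolute difference for any useful threshold.

```python
def youden_threshold(scores, labels):
    """Return (threshold, J) maximizing J = sensitivity - false positive rate.

    scores: amplification metric per reference reaction (assumed: higher = more amplified).
    labels: True if the reference reaction is known to have amplified.
    """
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative reference")

    best_threshold, best_j = None, float("-inf")
    for candidate in sorted(set(scores)):
        # A reaction is called "amplified" when its score >= candidate.
        tp = sum(1 for s, y in zip(scores, labels) if y and s >= candidate)
        fp = sum(1 for s, y in zip(scores, labels) if not y and s >= candidate)
        j = tp / positives - fp / negatives  # sensitivity minus false positive rate
        if j > best_j:
            best_threshold, best_j = candidate, j
    return best_threshold, best_j
```

With perfectly separated reference data, the selected threshold is the lowest score among the amplified references, and J equals 1.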
[00220] In some embodiments, the amplification threshold is derived using positive and negative (e.g., no template) amplification controls from the amplification reaction (e.g., LAMP) of a given sample. For instance, in some such embodiments, the amplification threshold includes a time threshold and/or a fluorescence rate threshold. Positive amplification controls are deemed to represent an ideal sample, where the ideal sample exhibits a classic sigmoidal rise of fluorescence over time, and negative amplification controls are deemed to represent the background fluorescence. For instance, in some such embodiments, an ideal sample is approximated as having fluorescence kinetics similar to that of a positive amplification control. In some embodiments, the amplification threshold is determined based on a mean value between the amplification controls within a common plate. This determination can be used to reduce the impact of background that can occur in amplification replicates.
[00221] In some implementations, the determination of the amplification threshold further includes collecting fluorescence values at the time threshold. The time threshold can be used to exclude those samples that amplify closer to the endpoint, for which amplification intermediates (e.g., LAMP intermediates), rather than the sample itself, are the major contributors to the rise in signal. The determination further includes assigning a score for each sample, where the score is calculated as a ratio of the rate of fluorescence threshold to the rate of fluorescence value at the time threshold for each sample. In some embodiments, when the ratio of rate of fluorescence between controls and samples is less than a threshold ratio (e.g., 1), then samples are deemed to have failed to reach the minimum fluorescence required to be called out as amplified and, when the ratio is greater than or equal to the threshold ratio (e.g., 1), then samples are deemed to have amplified sufficiently.
[00222] In some embodiments, the determination of the amplification threshold comprises performing an ROC analysis using the calculated score and/or a measured amplification metric (e.g., ground truth) of each sample in a plurality of samples, thereby identifying an exact score value for the respective sample.
[00223] In some embodiments, the time threshold is from 5 minutes to 40 minutes. In some embodiments, the time threshold is a time point within the total duration of the amplification reaction that is at least 10%, at least 20%, or at least 30% into the total duration and is no more than 90%, no more than 80%, or no more than 70% into the total duration. For example, in some embodiments, the amplification reaction has a duration of 40 minutes and the time threshold is selected from a range of from 10 minutes to 30 minutes. In some embodiments, the amplification reaction has a duration of 40 minutes and the time threshold is 18 minutes.
[00224] In some embodiments, one or more amplification replicates in a plurality of amplification replicates are pooled and/or diluted, e.g., by any suitable method known in the art.
[00225] In some embodiments, referring to Block 220, the method further includes, prior to the contacting, partitioning the respective plurality of amplified nucleic acids for each respective amplification replicate 404 into a plurality of detection replicates 408, where each respective detection replicate 410 in the plurality of detection replicates represents a respective well 122 in the plurality of wells.
[00226] Accordingly, the method comprises generating a respective plurality of aliquots 408 from a respective amplification reaction 404, as illustrated in FIGS. 4A-4B. For instance, for each plurality of aliquots partitioned from a respective amplification reaction, each respective aliquot 410 corresponds to a respective well 122 in the plurality of wells. In some embodiments, wells are grouped into a plurality of sets depending on the type of guide nucleic acid used to contact the corresponding aliquot of nucleic acid, as illustrated in FIGS. 4A-4B (e.g., a first set of wells comprises allele 1 (WT) guide sequences and a second set of wells comprises allele 2 (MUT) guide sequences). Thus, as illustrated in FIG. 4B, each well in the first set of wells contains amplified nucleic acids that are interrogated by the allele 1 guide sequences (WT; set “A”) and each well in the second set of wells contains amplified nucleic acids that are interrogated by the allele 2 guide sequences (MUT; set “B”).
[00227] In some embodiments, each respective detection replicate represents a respective well in the plurality of wells. In some embodiments, each respective detection replicate represents a pair of wells in a respective comparative programmable nuclease-based reaction. For example, in a respective comparative programmable nuclease-based reaction, a pair of wells includes (i) a first well in which the corresponding aliquot of nucleic acid is interrogated with a guide nucleic acid having the first allele and (ii) a second well in which the corresponding aliquot of nucleic acid is interrogated with a guide nucleic acid having the second allele. Thus, as FIG. 4B illustrates, in some such embodiments, a detection replicate includes, e.g., well 410-1-1-1-A (e.g., well 410 for plate 1, amplification replicate 1, detection reaction 1, and wild-type nucleic acid guide “A”) and well 410-1-1-1-B (e.g., well 410 for plate 1, amplification replicate 1, detection reaction 1, and mutant nucleic acid guide “B”).
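One way to picture the pairing of wells into detection replicates is the sketch below. It is illustrative only and assumes that well labels follow the FIG. 4B pattern (reference numeral, plate, amplification replicate, detection reaction, guide letter, separated by hyphens); the names `WellKey`, `parse_label`, and `pair_wells` are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class WellKey(NamedTuple):
    plate: int
    amplification_replicate: int
    detection_replicate: int
    guide: str  # "A" = first-allele (WT) guide, "B" = second-allele (MUT) guide


def parse_label(label):
    """Parse a label such as '410-1-1-1-A'; the leading 410 is the figure numeral."""
    _, plate, amp, det, guide = label.split("-")
    return WellKey(int(plate), int(amp), int(det), guide)


def pair_wells(labels):
    """Group labels into pairs keyed by (plate, amplification replicate, detection replicate)."""
    pairs = defaultdict(dict)
    for label in labels:
        key = parse_label(label)
        pairs[key[:3]][key.guide] = label
    # Keep only complete pairs that contain both an "A" well and a "B" well.
    return {k: v for k, v in pairs.items() if {"A", "B"}.issubset(v)}
```

A well that has no partner interrogated by the other guide set is dropped from the result, since no comparison can be made for it.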
[00228] In some embodiments, for each respective amplification replicate, the plurality of partitioned detection replicates comprises at least 3, at least 4, at least 5, at least 6, at least 8, at least 10, at least 15, or at least 20 detection replicates. In some embodiments, for each respective amplification replicate, the plurality of partitioned detection replicates comprises no more than 30, no more than 20, no more than 10, or no more than 6 detection replicates. In some embodiments, for each respective amplification replicate, the plurality of partitioned detection replicates is from 3 to 9, from 4 to 12, or from 10 to 25 detection replicates. In some embodiments, for each respective amplification replicate, the plurality of partitioned detection replicates falls within another range starting no lower than 3 replicates and ending no higher than 30 replicates. In some embodiments, nucleic acids in the biological sample, and/or a respective plurality of amplified nucleic acids derived therefrom, are not partitioned into a plurality of detection replicates prior to the contacting.
[00229] In some embodiments, the cleaving of the one or more reporters using the programmable nuclease is performed using a detection assay, such as a programmable nuclease-based step in a DETECTR assay. See, e.g., Broughton et al., “CRISPR-Cas12-based detection of SARS-CoV-2,” Nature Biotechnology 38, 870-874 (2020), which is hereby incorporated herein by reference in its entirety.
[00230] In some embodiments, the method further comprises, prior to the amplifying, partitioning the first plurality of nucleic acids 304 derived from the biological sample into a plurality of replicates for each locus in a plurality of loci (e.g., an amino acid position, gene, and/or target sequence of interest). In some embodiments, nucleic acids in the biological sample are not partitioned into a plurality of replicates for each locus in a plurality of loci.
[00231] In some embodiments, the method further comprises, prior to the contacting, partitioning the respective plurality of amplified nucleic acids for a respective amplification replicate 404 into a plurality of replicates for each locus in a plurality of loci (e.g., an amino acid position, gene, and/or target sequence of interest). For instance, as illustrated in FIG. 4A, nucleic acids 304 are interrogated at each of a plurality of loci of interest 406. Accordingly, the amplification replicates 404 are partitioned across a plurality of multi-well plates, where each respective plate corresponds to a respective target locus in the plurality of loci (e.g., 406-1, 406-2, 406-3). In some such embodiments, for each respective multi-well plate 406 corresponding to a respective target locus, the respective amplification replicate 404 is further partitioned into a plurality of detection replicates 408 (e.g., 408-1-1, 408-1-2, 408-1-3), where each respective detection replicate 410 in the plurality of detection replicates represents a respective well 122 in the plurality of wells.
[00232] In some embodiments, one or more detection replicates in a plurality of detection replicates are pooled and/or diluted, e.g., by any suitable method known in the art.
[00233] Programmable Nuclease-Based Assays.
[00234] In some embodiments, the method comprises contacting the respective corresponding aliquot of nucleic acid with a plurality of programmable nucleases. In some embodiments, each respective programmable nuclease in the plurality of programmable nucleases is the same or different type of nuclease.
[00235] Referring to Block 222, in some embodiments, a respective programmable nuclease is a trans-cleaving programmable nuclease (e.g., capable of nonspecific cleavage of nucleic acids). In some embodiments, the programmable nuclease targets DNA (e.g., single-stranded DNA) or RNA.
[00236] Referring to Block 224, in some embodiments, the programmable nuclease is a Cas nuclease. In some embodiments, the Cas nuclease is selected from the group consisting of a Cas12, Cas13, Cas14, CasPhi, and/or any subtypes or orthologs thereof. In some embodiments, the Cas nuclease is LbCas12a, AsCas12a, or CasDx1.
[00237] In some embodiments, the programmable nuclease is any of the embodiments described herein, or any substitutions, modifications, additions, deletions, and/or combinations thereof, as will be apparent to one skilled in the art. See, for example, the section entitled “Programmable Nucleases,” above. For instance, in some embodiments, the programmable nuclease is any of the programmable nucleases described in Broughton et al., “CRISPR-Cas12-based detection of SARS-CoV-2,” Nature Biotechnology 38, 870-874 (2020), which is hereby incorporated herein by reference in its entirety.
[00238] In some embodiments, referring to Block 226, for each respective well 122 in the first set of wells, the programmable nuclease is programmed for the target locus using a corresponding guide nucleic acid in the first plurality of guide nucleic acids that have the first allele of the target locus, where the cleaving the one or more reporters using the programmable nuclease is initiated upon recognition of the target locus by the corresponding guide nucleic acid. For instance, as described herein (see, e.g., the sections entitled “Engineered Guide Nucleic Acids” and “Programmable Nucleases,” above), a guide nucleic acid directs a programmable nuclease to a respective target locus for cleavage by recognizing the nucleic acid sequence of the target locus. Recognition of the target locus can be engineered by designing guide nucleic acids that hybridize to (e.g., are reverse complementary to) all or a portion of the nucleic acid sequence of the target locus. In some embodiments, the corresponding guide nucleic acid is complexed to the programmable nuclease.
[00239] Referring to Block 228, in some embodiments, the first plurality of guide nucleic acids is RNA. In some embodiments, the first plurality of guide nucleic acids is gRNA or sgRNA.
[00240] Referring to Block 230, in some embodiments, the first plurality of guide nucleic acids hybridizes to a wild-type sequence for the target locus. In some embodiments, the first plurality of guide nucleic acids is reverse complementary to a wild-type sequence for the target locus.
[00241] In some embodiments, for each respective well in the second set of wells, the programmable nuclease is programmed for the target locus using a corresponding guide nucleic acid in the second plurality of guide nucleic acids that have the second allele of the target locus, where the cleaving of the one or more reporters using the programmable nuclease is initiated upon recognition of the target locus by the corresponding guide nucleic acid. In some embodiments, the second plurality of guide nucleic acids is RNA. In some embodiments, the second plurality of guide nucleic acids hybridizes to (e.g., is reverse complementary to) a mutant sequence for the target locus.
[00242] In some embodiments, the guide nucleic acids comprise any of the embodiments described herein, or any substitutions, modifications, additions, deletions, and/or combinations thereof, as will be apparent to one skilled in the art. See, for example, the section entitled “Engineered Guide Nucleic Acids,” above.
[00243] Signal Dataset.
[00244] In some embodiments, each respective reporter in the one or more reporters is single-stranded DNA. In some embodiments, each respective reporter in the one or more reporters is single-stranded RNA.
[00245] Referring to Block 232, in some embodiments, each respective reporter in the one or more reporters comprises a fluorescent reporter linked to a quencher.
[00246] As described above, in some implementations, for each respective well in the first set of wells, the one or more reporters are cleaved by the programmable nuclease when the first allele is present in the corresponding aliquot of nucleic acid derived from the biological sample, thus generating a reporting signal. In some embodiments, when the first allele is not present in the corresponding aliquot of nucleic acid, the one or more reporters are not cleaved by the programmable nuclease, and the reporting signal is zero or background.
[00247] In some implementations, for each respective well in the second set of wells, the one or more reporters are cleaved by the programmable nuclease when the second allele is present in the corresponding aliquot of nucleic acid derived from the biological sample, thus generating a reporting signal. In some embodiments, when the second allele is not present in the corresponding aliquot of nucleic acid, the one or more reporters are not cleaved by the programmable nuclease, and the reporting signal is zero or background.
[00248] In some embodiments, each respective reporting signal 124 is obtained from a detection moiety. For instance, in some embodiments, each respective reporting signal is a fluorescence emission by a fluorophore, and the corresponding discrete attribute value is a fluorescence intensity. In some embodiments, each respective reporting signal is an intensity of light, and the corresponding discrete attribute value is measured in relative units (e.g., relative fluorescent units).
[00249] In some embodiments, the corresponding discrete attribute value for each respective reporting signal is detected using a plate reader.
[00250] In some embodiments, the respective reporting signal 124 is a lateral flow readout. In some such embodiments, the lateral flow readout is detected manually by visual inspection. In some embodiments, the lateral flow readout is detected using an image analysis algorithm (e.g., ImageJ, FIJI, etc.). Detection of reporting signals by lateral flow readout is further described in Broughton et al., “CRISPR-Cas12-based detection of SARS-CoV-2,” Nature Biotechnology 38, 870-874 (2020), which is hereby incorporated herein by reference in its entirety.
[00251] In some embodiments, the one or more reporters and/or the plurality of reporting signals, and methods of detection thereof, comprise any of the embodiments described herein, as well as any substitutions, modifications, additions, deletions, and/or combinations thereof, as will be apparent to one skilled in the art. See, for example, the section entitled “Reporters,” above.
[00252] In some embodiments, the plurality of time points represented by the signal dataset comprises between 20 and 2000 time points. In some embodiments, the plurality of time points represented by the signal dataset is at least 10, at least 20, at least 50, at least 80, at least 100, at least 200, at least 500, at least 1000, or at least 2000 time points. In some embodiments, the plurality of time points is no more than 5000, no more than 2000, no more than 1000, no more than 500, no more than 200, no more than 100, or no more than 50 time points. In some embodiments, the plurality of time points represented by the signal dataset is from 10 to 500, from 50 to 200, from 40 to 100, or from 200 to 1000. In some embodiments, the plurality of time points represented by the signal dataset falls within another range starting no lower than 10 time points and ending no higher than 5000 time points.
[00253] In some embodiments, the plurality of time points is obtained over a duration of between 30 seconds and 1 hour. In some embodiments, the plurality of time points is obtained over a duration of at least 30 seconds, at least 1 minute, at least 5 minutes, at least 10 minutes, at least 30 minutes, or at least 1 hour. In some embodiments, the plurality of time points is obtained over a duration of no more than 2 hours, no more than 1 hour, no more than 30 minutes, or no more than 5 minutes. In some embodiments, the plurality of time points is obtained over a duration of from 30 seconds to 10 minutes, from 5 minutes to 1 hour, or from 20 minutes to 50 minutes. In some embodiments, the plurality of time points is obtained over a duration that falls within another range starting no lower than 30 seconds and ending no higher than 2 hours.
[00254] In some embodiments, each respective time point corresponds to a respective measurement obtained during a detection assay, where the detection assay is performed over a duration of time. For instance, in some embodiments, each respective time point corresponds to a respective readout obtained from a plate reader, where each respective readout in a plurality of readouts is taken at intervals over the duration of the detection assay (e.g., every 30 seconds, every 1 second, every 0.5 seconds, etc.).
[00255] Accordingly, in some embodiments, the plurality of reporting signals 124 for each respective well 122 in the plurality of wells comprises at least 10, at least 20, at least 50, at least 80, at least 100, at least 200, at least 500, at least 1000, or at least 2000 reporting signals. In some embodiments, the plurality of reporting signals for each respective well is no more than 5000, no more than 2000, no more than 1000, no more than 500, no more than 200, no more than 100, or no more than 50 reporting signals. In some embodiments, the plurality of reporting signals for each respective well is from 10 to 500, from 50 to 200, from 40 to 100, or from 200 to 1000 reporting signals. In some embodiments, the plurality of reporting signals for each respective well falls within another range starting no lower than 10 reporting signals and ending no higher than 5000 reporting signals.
[00256] In some embodiments, the signal dataset 120 further comprises, for each respective control well in a plurality of control wells in the common plate, a corresponding plurality of control reporting signals, where the plurality of control wells are free of nucleic acid derived from the biological sample. The plurality of control wells comprises a first set of control wells 126-1 representing the first allele for the target locus, where each control well in the first set of control wells includes a first plurality of guide nucleic acids that have the first allele of the target locus. The plurality of control wells comprises a second set of control wells 126-2 representing the second allele for the target locus, where each control well in the second set of control wells includes a second plurality of guide nucleic acids that have the second allele of the target locus. Each corresponding plurality of control reporting signals comprises, for each respective time point in the plurality of time points, a respective control reporting signal in the form of a corresponding discrete attribute value.
[00257] In some embodiments, each respective control well is a negative control well and/or a no-template control well.
[00258] In some embodiments, the signal dataset 120 further comprises, for each respective positive control well in a plurality of positive control wells in the common plate, a corresponding plurality of positive control reporting signals. The plurality of positive control wells comprises a first set of positive control wells representing the first allele for the target locus, where each respective positive control well in the first set of positive control wells includes a first plurality of guide nucleic acids that have the first allele of the target locus. The signal dataset further includes a second set of positive control wells representing the second allele for the target locus, where each respective positive control well in the second set of positive control wells comprises a second plurality of guide nucleic acids that have the second allele of the target locus.
[00259] In some embodiments, each respective well in the first set of wells comprises one or more nucleic acid molecules corresponding to the first allele of the target locus, and each respective well in the second set of wells comprises one or more nucleic acid molecules corresponding to the second allele of the target locus.
[00260] As described above, in some embodiments, a respective guide nucleic acid in a respective plurality of guide nucleic acids in a well (e.g., a negative control well and/or a positive control well) hybridizes to all or a portion of the nucleic acid sequence of the respective allele.
[00261] Moreover, in some embodiments, any of the embodiments disclosed herein for guide nucleic acids, reporting signals and methods of obtaining the same, alleles, target loci, and wells are contemplated for use with control wells and/or positive control wells, as will be apparent to one skilled in the art. See, e.g., the sections entitled, “Samples,” “Nucleic Acid Amplification,” “Programmable Nuclease-Based Assays,” and “Signal Dataset,” above.
[00262] In some embodiments, the signal dataset is preprocessed for analysis. In some such embodiments, the preprocessing comprises data cleaning, normalization, filtering, and/or any substitutions, modifications, additions, deletions, and/or combinations thereof, as will be apparent to one skilled in the art.
[00263] Candidate Call Identities.
[00264] Referring to Block 234, the method further includes determining, for each respective well 122 in the plurality of wells, a corresponding signal yield 132 for the respective well using the corresponding plurality of reporting signals 124 for the respective well across the plurality of time points.
[00265] In particular, referring to Block 236, in some embodiments, the determining signal yield further comprises, for each respective well 122 in the plurality of wells, normalizing the respective plurality of reporting signals 124 for the respective well by scaling a maximum discrete attribute value by a minimum discrete attribute value in the corresponding plurality of reporting signals 124 for the respective well, thereby obtaining the corresponding signal yield 132 for the respective well. Accordingly, in some such embodiments, the maximum discrete attribute value is an endpoint fluorescence intensity, and the minimum discrete attribute value is a minimum fluorescence intensity. Without being limited to any one theory of operation, each well in the common plate is deemed to have a similar minimum fluorescence at the start of a detection assay, and thus the signal yield for a respective target locus is approximated to be consistent across wells when the concentrations of sample and/or target nucleic acids in the corresponding aliquots of nucleic acid are similar.
[00266] In some embodiments, the signal yield has the form Fy = max(F) / min(F), where Fy is the signal yield, max(F) is the maximum discrete attribute value in the corresponding plurality of reporting signals, and min(F) is the minimum discrete attribute value in the corresponding plurality of reporting signals.
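The signal yield Fy = max(F) / min(F) can be computed per well as in the following sketch. The function name `signal_yield` and the guard against non-positive readings are assumptions made for illustration; they are not part of the disclosure.

```python
def signal_yield(reporting_signals):
    """Return Fy = max(F) / min(F) for one well's reporting signals across time points."""
    if not reporting_signals:
        raise ValueError("a well needs at least one reporting signal")
    lowest = min(reporting_signals)
    highest = max(reporting_signals)
    if lowest <= 0:
        # Assumption: raw fluorescence intensities are strictly positive;
        # a zero or negative minimum would make the ratio meaningless.
        raise ValueError("minimum reporting signal must be positive")
    return highest / lowest
```

For example, a well whose fluorescence rises from 100 to 400 relative fluorescent units has a signal yield of 4, while a well that stays at background has a signal yield of about 1.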
[00267] In some embodiments, when the signal yield for a respective well fails to satisfy a signal yield threshold, the respective well is deemed a no-call. In some embodiments, the signal yield threshold is obtained by determining, for each respective control well in a plurality of control wells in the common plate, a corresponding control signal yield for the respective well using the corresponding plurality of control reporting signals for the respective well across the plurality of time points, thereby obtaining a plurality of control signal yields including a maximum control signal yield and a minimum control signal yield. Accordingly, in some embodiments, the signal yield threshold is a respective control signal yield (e.g., the maximum and/or the minimum control signal yield), and the respective well is deemed a no-call when the signal yield for the respective well is equal to or less than the respective control signal yield. In some embodiments, the signal yield threshold is a central tendency metric for the plurality of control signal yields (e.g., an average control signal yield), and the respective well is deemed a no-call when the signal yield for the respective well is equal to or less than the central tendency metric.
[00268] In some embodiments, each respective control signal yield is determined by normalizing a maximum discrete attribute value in the corresponding plurality of control reporting signals for the corresponding control well by a minimum discrete attribute value in the corresponding plurality of control reporting signals for the corresponding control well. Without being limited to any one theory of operation, the maximum discrete attribute value and the minimum discrete attribute value will generally be similar, due to the lack of target nucleic acid derived from the biological sample. For instance, in the absence of target nucleic acids to which a corresponding guide nucleic acid can hybridize, a complexed programmable nuclease is unlikely to initiate reporter cleavage. As a result, reporting signals are expected to maintain background levels throughout the course of the detection assay. Accordingly, in some embodiments, each respective control signal yield in the plurality of control signal yields is 1 or about 1, and the respective well is deemed a no-call when the signal yield for the respective well is equal to 1 or about 1.
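The no-call filter based on control signal yields might be sketched as below. The function names, the `mode` parameter, and the choice of the arithmetic mean as the central tendency metric are hypothetical; the comparison itself (no-call when the well's signal yield is equal to or less than the threshold) follows the description above.

```python
from statistics import mean


def control_threshold(control_yields, mode="max"):
    """Derive a signal yield threshold from the control wells on the common plate.

    mode: "max" or "min" selects that control signal yield;
          "mean" uses the average as the central tendency metric (an assumption).
    """
    if not control_yields:
        raise ValueError("at least one control signal yield is required")
    if mode == "max":
        return max(control_yields)
    if mode == "min":
        return min(control_yields)
    if mode == "mean":
        return mean(control_yields)
    raise ValueError(f"unknown mode: {mode}")


def is_no_call(well_yield, threshold):
    """A well is a no-call when its signal yield is equal to or less than the threshold."""
    return well_yield <= threshold
```

Because control wells contain no sample-derived nucleic acid, their signal yields are expected to sit near 1, so the threshold derived here is typically close to 1 as well.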
[00269] Referring to Block 238, the method further includes determining, for each respective well 122 in the first set of wells, a respective candidate call identity 134 based on a comparison between (i) a corresponding first signal yield 132 for the respective well in the first set of wells and (ii) a corresponding second signal yield 132 for a corresponding well in the second set of wells, thereby obtaining a plurality of candidate call identities 134. Referring to Block 240, in some embodiments, the respective candidate call identity is selected from the group consisting of first allele, second allele, and no-call (e.g., wild-type, mutant, and/or no-call).
[00270] In some embodiments, for each respective well in the first set of wells, the corresponding signal yield from the corresponding well in the second set of wells originates from a common biological replicate of the biological sample. For instance, in some embodiments, the first well and the corresponding well are matched for comparison across detection replicates in order to determine candidate call identities.
[00271] In some embodiments, the first well and the corresponding well are matched for comparison within a comparative programmable nuclease-based reaction. In some such embodiments, for a respective comparative reaction, a respective portion of nucleic acid is divided into two wells, where a first well is interrogated by a guide nucleic acid having the first allele sequence and a second well is interrogated by a guide nucleic acid having the second allele sequence. Relative reporting signals obtained from the first well and the second well are compared. Comparative programmable nuclease-based reactions are further described in detail herein (see, e.g., the section entitled, “Programmable Nuclease-Based Assays,” above). In some embodiments, the comparative programmable nuclease-based reaction is a DETECTR reaction. In some embodiments, the respective portion of nucleic acid is an aliquot of nucleic acid. In some embodiments, the respective portion of nucleic acid is a plurality of nucleic acids derived from a biological sample. FIG. 4B illustrates a comparative detection reaction including, for each respective pair of wells, a first well 410 interrogated by the wild-type guide nucleic acid “A” and a second well 410 interrogated by the mutant guide nucleic acid “B.”
[00272] In some embodiments, the first well and the corresponding well are matched for comparison across amplification replicates in order to determine candidate call identities.
[00273] Referring to Block 242, in some embodiments, the respective candidate call identity is obtained using a relative intensity metric between the corresponding first signal yield and the corresponding second signal yield. In some embodiments, the relative intensity metric is a ratio, a log ratio, a ratio of binary logarithm, a difference, and/or a percent change.
[00274] In some embodiments, the relative intensity metric has the form log2(Fy(FAl)) / log2(Fy(SAl)),
[00275] where Fy(FAl) is the first allele corresponding signal yield, and Fy(SAl) is the second allele corresponding signal yield.
[00276] In some embodiments, when the relative intensity metric satisfies a first threshold criterion, the candidate call identity is first allele, and when the relative intensity metric satisfies a second threshold criterion, the candidate call identity is second allele. In some embodiments, the first threshold criterion and/or the second threshold criterion is based on a control relative intensity metric for a first control signal yield and a second control signal yield. In some embodiments, the control relative intensity metric is a ratio, a log ratio, a ratio of binary logarithm, a difference, and/or a fold change.
[00277] For instance, in some embodiments, the first threshold criterion is satisfied when the relative intensity metric is greater than the ratio of (i) the binary logarithm of a first control signal yield and (ii) the binary logarithm of a second control signal yield, where the first control signal yield is obtained for a first control well in the first set of control wells including a first plurality of guide nucleic acids that have the first allele of the target locus, and the second control signal yield is obtained for a second control well in the second set of control wells including a second plurality of guide nucleic acids that have the second allele of the target locus. Accordingly, in some embodiments, the first threshold criterion is satisfied when the relative intensity metric is greater than
log2(Fc(FAl)) / log2(Fc(SAl))
[00278] where Fc(FAl) is the first control signal yield, and Fc(SAl) is the second control signal yield.
[00279] In some embodiments, the ratio of (i) the binary logarithm of a first control signal yield and (ii) the binary logarithm of a second control signal yield is 1 or about 1. Accordingly, in some embodiments, when the relative intensity metric is greater than 1, the candidate call identity is first allele.
[00280] In some embodiments, the second threshold criterion is satisfied when the relative intensity metric is less than the ratio of (i) the binary logarithm of a first control signal yield and (ii) the binary logarithm of a second control signal yield, where the first control signal yield is obtained for a first control well in the first set of control wells including a first plurality of guide nucleic acids that have the first allele of the target locus, and the second control signal yield is obtained for a second control well in the second set of control wells including a second plurality of guide nucleic acids that have the second allele of the target locus. Accordingly, in some embodiments, the second threshold criterion is satisfied when the relative intensity metric is less than
log2(Fc(FAl)) / log2(Fc(SAl))
[00281] where Fc(FAl) is the first control signal yield, and Fc(SAl) is the second control signal yield.
[00282] In some embodiments, the ratio of (i) the binary logarithm of a first control signal yield and (ii) the binary logarithm of a second control signal yield is 1 or about 1. Accordingly, in some embodiments, when the relative intensity metric is less than 1, the candidate call identity is second allele.
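By way of non-limiting illustration, the threshold comparison described above can be sketched as follows. The function names `relative_intensity` and `candidate_call` are hypothetical and do not appear in the disclosure. The sketch assumes signal yields strictly greater than 1 (so that both binary logarithms are positive), a default threshold of 1 as in the embodiments where the control ratio is 1 or about 1, and a no-call when the metric exactly equals the threshold; the last choice is an assumption made only for this sketch.

```python
import math


def relative_intensity(first_yield: float, second_yield: float) -> float:
    """Ratio of binary logarithms: log2(Fy(FAl)) / log2(Fy(SAl))."""
    # Assumption: yields must exceed 1 so that neither logarithm is zero or negative.
    if first_yield <= 1 or second_yield <= 1:
        raise ValueError("signal yields must be greater than 1 in this sketch")
    return math.log2(first_yield) / math.log2(second_yield)


def candidate_call(first_yield: float, second_yield: float, threshold: float = 1.0) -> str:
    """Return a candidate call identity from the relative intensity metric."""
    metric = relative_intensity(first_yield, second_yield)
    if metric > threshold:
        return "first allele"
    if metric < threshold:
        return "second allele"
    # Assumption: equality with the threshold is treated as a no-call.
    return "no-call"
```

In this sketch, the threshold can instead be supplied from control wells by passing `relative_intensity(control_first_yield, control_second_yield)` as the `threshold` argument.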
[00283] In some embodiments, the method further comprises obtaining a central tendency metric for the corresponding first signal yield and the corresponding second signal yield. In some embodiments, the central tendency metric is an arithmetic mean, weighted mean, log mean, midrange, midhinge, trimean, geometric mean, geometric median, Winsorized mean, median, and/or mode of the distribution of values.
[00284] In some embodiments, the central tendency metric is calculated as
[log2(Fy(FAl) + Fy(SAl))] / 2,
[00285] where Fy(FAl) is the corresponding first signal yield, and Fy(SAl) is the corresponding second signal yield.
[00286] Referring to Block 244, the signal dataset 120 further comprises, for each respective control well in a plurality of control wells in the common plate, a corresponding plurality of control reporting signals, where the plurality of control wells are free of nucleic acid derived from the biological sample. The plurality of control wells comprises a first set of control wells 126-1 representing the first allele for the target locus, where each well in the first set of control wells includes a first plurality of guide nucleic acids that have the first allele of the target locus. The plurality of control wells comprises a second set of control wells 126-2 representing the second allele for the target locus, where each well in the second set of control wells includes a second plurality of guide nucleic acids that have the second allele of the target locus. Each corresponding plurality of control reporting signals comprises, for each respective time point in the plurality of time points, a respective control reporting signal in the form of a corresponding discrete attribute value.
[00287] In some such embodiments, the method further comprises determining, for each respective control well in the plurality of control wells, a corresponding control signal yield for the respective well using the corresponding plurality of control reporting signals for the respective well across the plurality of time points, thereby obtaining a plurality of control signal yields including a maximum control signal yield and a minimum control signal yield for the common plate. The method further includes evaluating whether a respective candidate call identity 134 in the plurality of candidate call identities is a no-call using the maximum control signal yield and the minimum control signal yield.
[00288] For instance, referring to Block 246, in some embodiments, the evaluating comprises performing a comparison of (i) the relative intensity metric between the corresponding first and second signal yields and (ii) the plurality of control signal yields, where, when the relative intensity metric between the corresponding first and second signal yields falls within a range bounded by the maximum control signal yield and the minimum control signal yield, the respective candidate call identity 134 is deemed a no-call.
[00289] In some embodiments, referring to Block 248, each respective control signal yield is determined by normalizing a maximum discrete attribute value in the corresponding plurality of control reporting signals for the corresponding control well by a minimum discrete attribute value in the corresponding plurality of control reporting signals for the corresponding control well. As described above, in some embodiments, each respective control signal yield is 1 or about 1.
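By way of non-limiting illustration, the control signal yield normalization of Block 248 and the range-based no-call evaluation of Block 246 can be sketched as follows. The function names `signal_yield` and `is_no_call` are hypothetical. The sketch assumes positive reporting signal values and treats the bounds of the control range as inclusive; the disclosure does not state whether the bounds are inclusive.

```python
from typing import Sequence


def signal_yield(reporting_signals: Sequence[float]) -> float:
    """Normalize the maximum discrete attribute value by the minimum across time points."""
    lowest = min(reporting_signals)
    if lowest <= 0:
        # Assumption: signals are positive, so the ratio is well defined.
        raise ValueError("reporting signals must be positive in this sketch")
    return max(reporting_signals) / lowest


def is_no_call(metric: float, control_yields: Sequence[float]) -> bool:
    """True when the metric falls within the range bounded by the control yields."""
    # Assumption: the range is inclusive of both the minimum and the maximum.
    return min(control_yields) <= metric <= max(control_yields)


# Hypothetical control wells, each measured at three time points.
control_signals = [[100.0, 100.0, 110.0], [100.0, 100.0, 125.0]]
control_yields = [signal_yield(s) for s in control_signals]
```

With these hypothetical values, a relative intensity metric of 1.2 falls within the control range and is deemed a no-call, whereas a metric of 3.0 falls outside the range and is not.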
[00290] Referring to Block 250, in some embodiments, the method further includes obtaining a central tendency metric for the corresponding first signal yield and the corresponding second signal yield. In some such embodiments, the method further comprises determining, for each respective control well in the first set of control wells, a respective control central tendency metric, thereby obtaining a plurality of control central tendency metrics including a maximum control central tendency metric and a minimum control central tendency metric. In some such embodiments, the evaluating comprises performing a comparison between (i) the central tendency metric of the corresponding first and second signal yields and (ii) the plurality of control central tendency metrics for the first set of control wells, where, when the central tendency metric of the corresponding first and second signal yields falls within a range bounded by the maximum control central tendency metric and the minimum control central tendency metric, the respective candidate call identity 134 is deemed a no-call.
[00291] In some embodiments, the evaluating comprises performing a comparison between (i) the central tendency metric of the corresponding first and second signal yields and (ii) the plurality of control central tendency metrics for the first set of control wells, where, when the central tendency metric of the corresponding first and second signal yields is less than a respective control central tendency threshold, the respective candidate call identity 134 is deemed a no-call. In some embodiments, the control central tendency threshold is a respective control central tendency metric in the plurality of control central tendency metrics (e.g., the maximum control central tendency metric).
[00292] In some embodiments, the control central tendency metric is an arithmetic mean, weighted mean, log mean, midrange, midhinge, trimean, geometric mean, geometric median, Winsorized mean, median, and/or mode of the distribution of values.
[00293] In some embodiments, the control central tendency metric is a log average calculated as
[log2(Fc(FAl) + Fc(SAl))] / 2,
[00294] where Fc(FAl) is a first control signal yield for the respective control well in the first set of control wells, and Fc(SAl) is a second control signal yield for a corresponding control well in the second set of control wells.
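By way of non-limiting illustration, the central tendency comparison can be sketched as follows. The function names `central_tendency` and `is_no_call_by_central_tendency` are hypothetical. The sketch evaluates the expression exactly as it is printed above, namely the binary logarithm of the sum of the two signal yields, divided by 2, and uses the maximum control central tendency metric as the threshold, which is one of the options described; both are assumptions of this sketch rather than a definitive implementation.

```python
import math
from typing import Sequence


def central_tendency(first_yield: float, second_yield: float) -> float:
    """Evaluate [log2(F(FAl) + F(SAl))] / 2 as printed in the text."""
    return math.log2(first_yield + second_yield) / 2


def is_no_call_by_central_tendency(sample_metric: float,
                                   control_metrics: Sequence[float]) -> bool:
    """True when the sample metric is below the control central tendency threshold."""
    # Assumption: the threshold is the maximum control central tendency metric.
    return sample_metric < max(control_metrics)


# Hypothetical paired control signal yields (first allele well, second allele well).
control_pairs = [(1.0, 1.0), (2.0, 2.0)]
control_metrics = [central_tendency(a, b) for a, b in control_pairs]
```

In this sketch, a sample pair with signal yields of 1.0 and 1.0 has a central tendency metric of 0.5, which is below the maximum control metric of 1.0 and is therefore deemed a no-call.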
[00295] Voting Methods.
[00296] Referring to Block 252, the method further includes performing a voting procedure across the plurality of candidate call identities for the first set of wells, thereby obtaining a mutation call for the target locus.
[00297] In some embodiments, each respective candidate call identity for each respective well in the first set of wells corresponds to a respective biological sample in one or more biological samples, and the voting procedure comprises assigning the respective candidate call identity as the mutation call for the target locus.
[00298] In some embodiments, the voting procedure comprises applying a concordance vote across the plurality of candidate call identities for the first set of wells, where the mutation call for the target locus is assigned based on a common candidate call identity that is shared by at least a majority of the first set of wells. In some embodiments, the voting procedure comprises applying a concordance vote across the plurality of candidate call identities for the first set of wells, where the mutation call for the target locus is assigned based on a common candidate call identity that is shared by all of the wells in the first set of wells. In some embodiments, a respective mutation call is no-call when no candidate call identity is shared by at least a majority of the first set of wells. For instance, in some embodiments, when the candidate call identities for the first set of wells are equally distributed between first allele and second allele, then the respective mutation call is no-call. In some embodiments, when a respective candidate call identity in the plurality of candidate call identities is no-call, the no-call well is removed from the first set of wells and the concordance vote is performed using the candidate call identities of the remaining wells. In some embodiments, each well in the first set of wells corresponds to a different respective replicate of an amplification reaction. In some embodiments, each well in the first set of wells corresponds to a different respective replicate of a detection assay (e.g., a programmable nuclease-based assay). Partitioning biological samples into replicates is illustrated in FIGS. 4A-4C and described in further detail above (see, e.g., the sections entitled “Nucleic Acid Amplification” and “Programmable Nuclease-Based Assays,” above).
In some embodiments, each well in the first set of wells corresponds to a different respective replicate of an amplification reaction, and a respective mutation call is no-call when one or more wells in the amplification reaction fails to satisfy an amplification threshold. See, e.g., the section entitled “Nucleic Acid Amplification,” above.
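By way of non-limiting illustration, a concordance vote across candidate call identities can be sketched as follows. The function name `concordance_vote` and the string labels are hypothetical. The sketch follows the embodiment in which no-call wells are removed before voting, and it assumes that a majority means strictly more than half of the remaining wells, so that an equal split yields a no-call.

```python
from collections import Counter
from typing import Sequence

NO_CALL = "no-call"


def concordance_vote(candidate_calls: Sequence[str]) -> str:
    """Return the call shared by a majority of wells, or no-call if none exists."""
    # Remove no-call wells and vote with the remaining wells.
    votes = [call for call in candidate_calls if call != NO_CALL]
    if not votes:
        return NO_CALL
    top_call, count = Counter(votes).most_common(1)[0]
    # Assumption: a majority is strictly more than half of the remaining wells.
    return top_call if count > len(votes) / 2 else NO_CALL
```

For example, three wells with candidate call identities of first allele, first allele, and second allele yield a mutation call of first allele, whereas one well of each allele yields a no-call.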
[00299] Other embodiments for voting procedures and/or concordance votes are contemplated, as will be apparent to one skilled in the art and as disclosed further herein.
[00300] For instance, referring to Block 254, in some embodiments, the method further includes, prior to the performing the voting procedure, binning the first set of wells into a plurality of bins, where each respective bin comprises a respective subset of wells originating from a common respective biological replicate of the biological sample. The voting procedure further comprises (i) performing, for each respective bin in the plurality of bins, a respective first concordance vote across the candidate call identities 134 for the subset of wells in the respective bin, thereby generating a plurality of bin votes 414, and (ii) applying a second concordance vote across the plurality of bin votes 414, thereby obtaining the mutation call 416 for the target locus. Referring to Block 256, in some embodiments, each respective bin vote in the plurality of bin votes is selected from the group consisting of first allele, second allele, and no-call.
[00301] In some embodiments, as illustrated in FIGS. 4B-4C, the first allele is wild-type and the second allele is mutant. In some embodiments, the first allele is a first mutant, and the second allele is a second mutant.
[00302] Referring to Block 258, in some embodiments, a respective first concordance vote generates a respective bin vote based on a common candidate call identity that is shared by at least a majority of the subset of wells for the respective bin. For example, as illustrated in FIG. 4B, consider the case of three wells, in which at least two wells have a candidate call identity of wild-type. In some such embodiments, the first concordance vote generates a bin vote 414 that is wild-type. Similarly, as illustrated in FIG. 4C, where the subset of wells consists of three wells and at least two of the wells have a candidate call identity of mutant, then the first concordance vote generates a bin vote 414 that is mutant.
[00303] In some embodiments, the respective first concordance vote generates a respective bin vote based on a common candidate call identity that is shared by at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or at least 95% of the wells in the subset of wells for the respective bin.
[00304] In some embodiments, a respective first concordance vote generates a respective bin vote based on a common candidate call identity that is shared by all of the wells in the subset of wells for the respective bin.
[00305] In some embodiments, a respective bin vote is no-call when no candidate call identity is shared by at least a majority of the subset of wells for the respective bin. For instance, in some embodiments, when the candidate call identities for the subset of wells are equally distributed between first allele and second allele, then the respective bin vote is no-call.
[00306] Referring to Block 260, in some embodiments, a respective bin vote is no-call when at least one well in the subset of wells for the respective bin has a candidate call identity of no-call. For instance, as illustrated in FIG. 4C, where the subset of wells consists of three wells and one of the wells has a candidate call identity of no-call, then the first concordance vote generates a bin vote 414 for the respective bin that is no-call. In some embodiments, when a respective candidate call identity is no-call, the no-call well is removed from the plurality of wells and the bin vote is performed using the candidate call identities of the remaining wells.
[00307] Referring to Block 262, in some embodiments, the second concordance vote generates a mutation call 416 based on a common bin vote that is shared by at least a majority of the bins in the plurality of bins. FIG. 4B illustrates a plurality of three bins, where at least two of the bins have a bin vote of wild-type, and where the second concordance vote generates a mutation call 416 for the target locus that is wild-type. Similarly, FIG. 4C illustrates a plurality of three bins, where at least two of the bins have a bin vote of mutant, and where the second concordance vote generates a mutation call 416 for the target locus that is mutant.
[00308] In some embodiments, the second concordance vote generates a mutation call based on a common bin vote that is shared by at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or at least 95% of the bins in the plurality of bins. In some embodiments, the second concordance vote generates a mutation call based on a common bin vote that is shared by all of the bins in the plurality of bins.
[00309] In some embodiments, the mutation call is no-call when no bin vote is shared by at least a majority of the plurality of bins. For instance, in some embodiments, when the bin votes for the plurality of bins are equally distributed between first allele and second allele, then the mutation call is no-call.
[00310] In some embodiments, as illustrated in FIG. 4C, when a respective bin vote is no-call, the no-call bin is removed from the plurality of bin votes and the second concordance vote is performed with the remaining bins.
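By way of non-limiting illustration, the two-level voting procedure of Blocks 254 through 262 can be sketched as follows. The function names `majority`, `bin_vote`, and `mutation_call` are hypothetical. The sketch combines three of the embodiments described above: a bin vote is no-call when any well in the bin is no-call, no-call bins are removed before the second concordance vote, and a majority is assumed to mean strictly more than half.

```python
from collections import Counter
from typing import Sequence

NO_CALL = "no-call"


def majority(votes: Sequence[str]) -> str:
    """Return the vote shared by more than half of the voters, else no-call."""
    if not votes:
        return NO_CALL
    top_vote, count = Counter(votes).most_common(1)[0]
    return top_vote if count > len(votes) / 2 else NO_CALL


def bin_vote(well_calls: Sequence[str]) -> str:
    """First concordance vote: any no-call well makes the whole bin a no-call."""
    if NO_CALL in well_calls:
        return NO_CALL
    return majority(well_calls)


def mutation_call(bins: Sequence[Sequence[str]]) -> str:
    """Second concordance vote across bin votes, after removing no-call bins."""
    bin_votes = [bin_vote(well_calls) for well_calls in bins]
    remaining = [vote for vote in bin_votes if vote != NO_CALL]
    return majority(remaining)


# Hypothetical data: three bins (biological replicates) of three wells each.
example_bins = [
    ["mutant", "mutant", "mutant"],
    ["mutant", "no-call", "mutant"],
    ["mutant", "mutant", "wild-type"],
]
```

With these hypothetical data, the bin votes are mutant, no-call, and mutant; the no-call bin is removed, and the second concordance vote yields a mutation call of mutant.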
[00311] Referring to Block 264, in some embodiments, each respective bin in the plurality of bins corresponds to a different respective biological replicate of an amplification reaction, and the respective subset of wells for the respective bin is partitioned from the respective biological replicate. Partitioning biological samples into replicates is illustrated in FIGS. 4A-4C and described in further detail above (see, e.g., the sections entitled “Nucleic Acid Amplification” and “Programmable Nuclease-Based Assays,” above).
[00312] In some embodiments, the plurality of bins comprises between 2 and 10 bins. In some embodiments, the plurality of bins comprises at least 3, at least 4, at least 5, at least 6, at least 8, at least 10, at least 15, or at least 20 bins. In some embodiments, the plurality of bins comprises no more than 30, no more than 20, no more than 10, or no more than 6 bins. In some embodiments, the plurality of bins is from 3 to 9, from 4 to 12, or from 10 to 25 bins. In some embodiments, the plurality of bins falls within another range starting no lower than 3 bins and ending no higher than 30 bins.
[00313] In some embodiments, each respective bin in the plurality of bins comprises between 2 and 10 wells in the respective subset of wells. In some embodiments, each respective subset of wells comprises at least 3, at least 4, at least 5, at least 6, at least 8, at least 10, at least 15, or at least 20 wells. In some embodiments, each respective subset of wells comprises no more than 30, no more than 20, no more than 10, or no more than 6 wells. In some embodiments, each respective subset of wells is from 3 to 9, from 4 to 12, or from 10 to 25 wells. In some embodiments, each respective subset of wells falls within another range starting no lower than 3 wells and ending no higher than 30 wells.
[00314] In some embodiments, the systems and methods disclosed herein are used for comparison with one or more additional mutation calling methods (e.g., sequencing-based methods such as whole genome sequencing). In some embodiments, the systems and methods disclosed herein are performed concurrently with one or more additional mutation calling methods (e.g., sequencing-based methods such as whole genome sequencing).
[00315] In some embodiments, the systems and methods for determining a mutation call disclosed herein are performed for each respective mutant allele in a plurality of mutant alleles (see, e.g., the section entitled “Samples,” above). For example, in some embodiments, the method further includes performing an instance of the obtaining a mutation call for the target locus, for each respective mutant allele in the plurality of mutant alleles for the target locus.
[00316] In one example of some such embodiments, the method further comprises, for each candidate second allele in a plurality of candidate second alleles, repeating the obtaining a signal dataset, determining corresponding signal yields, determining respective candidate call identities, and performing a voting procedure, thereby obtaining a plurality of candidate mutation calls for the target locus. In some such embodiments, the method further includes performing a mutation call voting procedure across the plurality of candidate mutation calls for the target locus, thereby obtaining a final mutation call for the target locus.
[00317] In some embodiments, the first allele is a wild-type allele and each respective candidate second allele in the plurality of candidate second alleles is a different respective mutant allele for the target locus. For example, in some such embodiments, the target locus comprises at least a first allele that is a wild-type allele and a plurality of candidate second alleles, where each respective candidate second allele is a different mutant allele for the target locus.
[00318] In some embodiments, when a single respective candidate mutation call in the plurality of candidate mutation calls corresponds to a respective candidate second allele in the plurality of candidate second alleles, the respective candidate mutation call is determined as the final mutation call for the target locus. For example, in some implementations, the first allele is a wild-type allele, each respective candidate second allele in the plurality of candidate second alleles is a different respective mutant allele for the target locus, and the final mutation call for the target locus is a candidate mutation call that corresponds to a respective mutant allele. In some such embodiments, the final mutation call for the target locus is the only candidate mutation call that corresponds to a respective mutant allele, where every other candidate mutation call in the plurality of candidate mutation calls corresponds to the wild-type allele.
[00319] Thus, in an example embodiment, a target locus comprises a wild-type allele and n mutant alleles, where n is a positive integer of 1 or greater (e.g., n is between 1 and 100). A total of n comparisons are performed between the wild-type allele and each of the n mutant alleles, in accordance with some embodiments of the present disclosure, thus obtaining a corresponding n candidate mutation calls. In some implementations, one of the n comparisons yields a candidate mutation call that identifies the corresponding mutant allele, while every other comparison yields a candidate mutation call that identifies the wild-type allele. In some such implementations, the candidate mutation call that identifies the corresponding mutant allele is selected as the final mutation call, and the identity of the target locus is determined to be the corresponding mutant allele (e.g., SNP).
[00320] Accordingly, for an example target locus comprising a wild-type allele and four possible mutant alleles (n = 4 in this example), each instance of the obtaining a mutation call for each of the four mutant alleles compared with the wild-type allele would generate the following candidate mutation calls:
log2(Fy(WT)) > log2(Fy(M1))    Wild Type
log2(Fy(WT)) > log2(Fy(M2))    Wild Type
log2(Fy(WT)) > log2(Fy(M3))    Wild Type
log2(Fy(WT)) < log2(Fy(M4))    Mutant (4)
[00321] As only one of the mutant alleles (e.g., mutant 4) is represented as a candidate mutation call, in some such embodiments, mutant 4 is selected as the final mutation call.
[00322] In some embodiments, when a sub-plurality of candidate mutation calls in the plurality of candidate mutation calls corresponds to a respective sub-plurality of candidate second alleles in the plurality of candidate second alleles, the obtaining the final mutation call for the target locus is based upon a comparison of signal yields for each pair of candidate second alleles in the sub-plurality of candidate second alleles. For example, in some implementations, the first allele is a wild-type allele, each respective candidate second allele in the plurality of candidate second alleles is a different respective mutant allele for the target locus, and two or more candidate mutation calls in the plurality of candidate mutation calls identify a corresponding two or more mutant alleles. In some such embodiments, a comparison is performed between each respective pair of mutant alleles corresponding to candidate mutation calls to determine which mutant allele has the highest signal yield and thus is selected as the final mutation call. In some such embodiments, the comparison is performed by repeating the obtaining a signal dataset, determining corresponding signal yields, determining respective candidate call identities, and performing a voting procedure, in accordance with the methods disclosed above, where the first allele is a first mutant allele in a respective pair of mutant alleles and the second allele is a second mutant allele in the respective pair of mutant alleles.
[00323] Accordingly, for an example target locus comprising a wild-type allele and four possible mutant alleles, each instance of the obtaining a mutation call for each of the four mutant alleles compared with the wild-type allele would generate the following candidate mutation calls:
log2(Fy(WT)) > log2(Fy(M1))    Wild Type
log2(Fy(WT)) < log2(Fy(M2))    Mutant (2)
log2(Fy(WT)) > log2(Fy(M3))    Wild Type
log2(Fy(WT)) < log2(Fy(M4))    Mutant (4)
[00324] Two of the mutant alleles (e.g., mutant 2 and mutant 4) are represented as candidate mutation calls. Comparison of the signal yields of the pair of mutant alleles represented as candidate mutation calls reveals that one of the mutant alleles has a higher signal yield than the other:
log2(Fy(M2)) < log2(Fy(M4))    Mutant (4)
[00325] Thus, the mutant allele (e.g., mutant 4) having the highest signal yield in the comparison is selected as the final mutation call.
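By way of non-limiting illustration, selection of a final mutation call across several candidate second alleles can be sketched as follows. The function name `final_mutation_call` and the allele labels are hypothetical. For brevity, the sketch reduces each wild-type versus mutant comparison to a single pair of positive signal yields, whereas in the disclosed methods each candidate mutation call results from the full voting procedure; a mutant whose signal yield merely equals the wild-type signal yield is assumed not to be a candidate.

```python
import math
from typing import Dict


def final_mutation_call(wild_type_yield: float, mutant_yields: Dict[str, float]) -> str:
    """Pick the final call from per-mutant comparisons against the wild-type yield."""
    # A mutant is a candidate when log2(Fy(WT)) < log2(Fy(mutant)).
    candidates = [
        name
        for name, mutant_yield in mutant_yields.items()
        if math.log2(wild_type_yield) < math.log2(mutant_yield)
    ]
    if not candidates:
        return "wild-type"
    # With two or more candidates, the mutant with the highest signal yield is selected.
    return max(candidates, key=lambda name: mutant_yields[name])
```

For example, with a hypothetical wild-type signal yield of 4.0 and mutant signal yields of 2.0, 6.0, 1.5, and 9.0 for mutants 1 through 4, mutants 2 and 4 are candidates and mutant 4 is selected as the final mutation call.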
[00326] In another example of some such embodiments, all the candidate second alleles in the plurality of candidate second alleles are in the same plate. In such embodiments, the signal dataset comprises, for each respective well in a plurality of wells in a common plate, a corresponding plurality of reporting signals for a respective one or more nucleic acid molecules in the biological sample that map to the target locus. In this example, the signal dataset represents a plurality of time points. Moreover, the plurality of wells comprises a first set of wells representing a first allele for the target locus, where each well in the first set of wells includes a first plurality of guide nucleic acids that have the first allele of the target locus. Further still, the plurality of wells also comprises, for each respective candidate second allele in a set of candidate second alleles, a respective additional set of wells representing the respective candidate second allele for the target locus, where each well in this respective additional set of wells includes a corresponding plurality of guide nucleic acids that have the
respective candidate second allele for the target locus. Each corresponding plurality of reporting signals comprises, for each respective time point in the plurality of time points, a respective reporting signal in the form of a corresponding discrete attribute value. Also, each respective well in the first set of wells and each respective well in each respective additional set of wells contains a corresponding aliquot of nucleic acid derived from the biological sample. In this example, there is determined, for each respective well in the plurality of wells, a corresponding signal yield for the respective well using the corresponding plurality of reporting signals for the respective well across the plurality of time points. In this example, there is further determined, for each respective well in the first set of wells, for each respective candidate second allele in the set of candidate second alleles, a respective candidate call identity based on a comparison between a corresponding first signal yield for the respective well in the first set of wells and a corresponding respective second signal yield for a corresponding well in the respective additional set of wells for the respective candidate second allele, thereby obtaining a plurality of candidate call identities. A voting procedure is then performed across the plurality of candidate call identities, thereby obtaining a mutation call for the target locus that is one of the candidate second alleles in the set of candidate second alleles or the first allele. In some embodiments in accordance with this example, the first allele is a wild-type allele and each respective candidate second allele in the set of candidate second alleles is a different respective mutant allele for the target locus. In some embodiments in accordance with this example, the first allele is other than a wild-type allele. 
In some embodiments in accordance with this example, the set of candidate second alleles consists of between 2 and 10, between 2 and 20, or between 2 and 100 candidate second alleles.
In some embodiments, the systems and methods for determining a mutation call disclosed herein are performed for each target locus in a plurality of target loci. For instance, FIGS. 4A-4B illustrate a method in accordance with the present disclosure, which is performed for each of three target loci. Additionally, Example 1 describes an exemplary method in accordance with the present disclosure performed for each of amino acid positions 452, 484, and 501. In some embodiments, each respective target locus in the plurality of target loci comprises any of the embodiments disclosed herein for a first respective target locus, as well as any substitutions, modifications, additions, deletions, and/or combinations thereof (see, e.g., the section entitled “Samples,” above), as will be apparent to one skilled in the art.
[00327] In some embodiments, the plurality of target loci comprises at least 2, at least 5, at least 10, at least 20, at least 30, at least 50, at least 100, or at least 200 target loci. In some embodiments, the plurality of target loci comprises no more than 500, no more than 200, no more than 100, no more than 50, or no more than 10 target loci. In some embodiments, the plurality of target loci is from 2 to 10, from 5 to 50, or from 10 to 100 target loci. In some embodiments, the plurality of target loci falls within another range starting no lower than 2 loci and ending no higher than 500 loci.
[00328] In some embodiments, each respective target locus in the plurality of target loci maps to a reference sequence for a single organism. For instance, in some embodiments, a respective organism (e.g., a virus) comprises a plurality of target loci at which mutations can occur. Determining the number and type of mutations at each of the target loci of interest can inform identification and classification of organisms (e.g., by strain, variant, and/or subtype).
[00329] In some embodiments, the systems and methods for determining a mutation call disclosed herein are performed for each sample in a plurality of samples. For instance, FIG. 4A illustrates a first plurality of amplification replicates partitioned from a first sample and a second plurality of amplification replicates partitioned from a second sample. FIG. 4B illustrates the determination of mutation calls, in accordance with an embodiment of the present disclosure, using candidate call identities obtained for the first sample, while FIG. 4C illustrates the determination of mutation calls using candidate call identities obtained for the second sample.
[00330] In some embodiments, the plurality of samples comprises at least 2, at least 5, at least 10, at least 20, at least 30, at least 50, at least 100, at least 200, or at least 500 samples. In some embodiments, the plurality of samples comprises no more than 1000, no more than 500, no more than 200, no more than 100, no more than 50, or no more than 10 samples. In some embodiments, the plurality of samples is from 2 to 20, from 5 to 100, or from 10 to 200 samples. In some embodiments, the plurality of samples falls within another range starting no lower than 2 samples and ending no higher than 1000 samples.
[00331] Suitable embodiments for performing the present systems and methods for a plurality of loci and/or for a plurality of samples, including samples, target loci, target nucleic acids, wells, nucleic acid amplification, programmable nuclease-based assays, reporters, guide nucleic acid sequences, candidate call identities and determining the same, voting procedures, and mutation calls, and any characteristics or elements thereof, include any of the
embodiments for samples, target loci, target nucleic acids, wells, nucleic acid amplification, programmable nuclease-based assays, reporters, guide nucleic acid sequences, candidate call identities and determining the same, voting procedures, and mutation calls, disclosed herein for a single target locus and/or for a single sample, as well as any substitutions, modifications, additions, deletions, and/or combinations thereof, as will be apparent to one skilled in the art.
[00332] Additional embodiments.
[00333] As described above, one aspect of the present disclosure provides a method for determining a mutation call for a target locus in a biological sample, at a computer system having one or more processors, and memory storing one or more programs for execution by the one or more processors. The method includes amplifying a first plurality of nucleic acids derived from the biological sample, thereby obtaining a plurality of amplified nucleic acids. For each respective well in a plurality of wells in a common plate, a procedure is performed including (i) partitioning, from the plurality of amplified nucleic acids, a respective corresponding aliquot of nucleic acid derived from the biological sample, and (ii) contacting the respective corresponding aliquot of nucleic acid with at least a programmable nuclease, a guide nucleic acid, and a plurality of reporters.
[00334] A signal dataset is obtained including, for each respective well in the plurality of wells in the common plate, a corresponding plurality of reporting signals for a respective one or more nucleic acid molecules derived from the biological sample that map to the target locus. The signal dataset represents a plurality of time points. The plurality of wells includes a first set of wells representing a first allele for the target locus, where each well in the first set of wells includes a first plurality of guide nucleic acids that have the first allele of the target locus. The plurality of wells further includes a second set of wells representing a second allele for the target locus, where each well in the second set of wells includes a second plurality of guide nucleic acids that have the second allele of the target locus. Each corresponding plurality of reporting signals comprises, for each respective time point in the plurality of time points, a respective reporting signal in the form of a corresponding discrete attribute value obtained from a cleavage of one or more respective reporters, in the plurality of reporters, by the programmable nuclease.
[00335] For each respective well in the plurality of wells, a corresponding signal yield for the respective well is determined using the corresponding plurality of reporting signals for the
respective well across the plurality of time points. For each respective well in the first set of wells, a respective candidate call identity is determined based on a comparison between a corresponding first signal yield for the respective well in the first set of wells and a corresponding second signal yield for a corresponding well in the second set of wells, thereby obtaining a plurality of candidate call identities. A voting procedure is performed across the plurality of candidate call identities for the first set of wells, thereby obtaining a mutation call for the target locus.
[00336] Another aspect of the present disclosure provides a computer system having one or more processors, and memory storing one or more programs for execution by the one or more processors, the one or more programs comprising instructions for performing a method including obtaining a signal dataset comprising, for each respective well in a plurality of wells in a common plate, a corresponding plurality of reporting signals for a respective one or more nucleic acid molecules in the biological sample that map to the target locus. The signal dataset represents a plurality of time points. The plurality of wells comprises a first set of wells representing a first allele for the target locus, where each well in the first set of wells includes a first plurality of guide nucleic acids that have the first allele of the target locus. The plurality of wells further comprises a second set of wells representing a second allele for the target locus, where each well in the second set of wells includes a second plurality of guide nucleic acids that have the second allele of the target locus. Each corresponding plurality of reporting signals comprises, for each respective time point in the plurality of time points, a respective reporting signal in the form of a corresponding discrete attribute value. Each respective well in the first set of wells and each respective well in the second set of wells contains a corresponding aliquot of nucleic acid derived from the biological sample.
[00337] For each respective well in the plurality of wells, a corresponding signal yield for the respective well is determined using the corresponding plurality of reporting signals for the respective well across the plurality of time points. For each respective well in the first set of wells, a respective candidate call identity is determined based on a comparison between a corresponding first signal yield for the respective well in the first set of wells and a corresponding second signal yield for a corresponding well in the second set of wells, thereby obtaining a plurality of candidate call identities. A voting procedure is performed across the plurality of candidate call identities for the first set of wells, thereby obtaining a mutation call for the target locus.
[00338] Another aspect of the present disclosure provides a non-transitory computer readable storage medium storing one or more programs configured for execution by a computer, the one or more programs comprising instructions for carrying out a method including obtaining a signal dataset comprising, for each respective well in a plurality of wells in a common plate, a corresponding plurality of reporting signals for a respective one or more nucleic acid molecules in the biological sample that map to the target locus. The signal dataset represents a plurality of time points. The plurality of wells comprises a first set of wells representing a first allele for the target locus, where each well in the first set of wells includes a first plurality of guide nucleic acids that have the first allele of the target locus. The plurality of wells further comprises a second set of wells representing a second allele for the target locus, where each well in the second set of wells includes a second plurality of guide nucleic acids that have the second allele of the target locus. Each corresponding plurality of reporting signals comprises, for each respective time point in the plurality of time points, a respective reporting signal in the form of a corresponding discrete attribute value. Each respective well in the first set of wells and each respective well in the second set of wells contains a corresponding aliquot of nucleic acid derived from the biological sample.
[00339] For each respective well in the plurality of wells, a corresponding signal yield for the respective well is determined using the corresponding plurality of reporting signals for the respective well across the plurality of time points. For each respective well in the first set of wells, a respective candidate call identity is determined based on a comparison between a corresponding first signal yield for the respective well in the first set of wells and a corresponding second signal yield for a corresponding well in the second set of wells, thereby obtaining a plurality of candidate call identities. A voting procedure is performed across the plurality of candidate call identities for the first set of wells, thereby obtaining a mutation call for the target locus.
[00340] Another aspect of the present disclosure provides a computer system having one or more processors, and memory storing one or more programs for execution by the one or more processors, the one or more programs comprising instructions for performing any of the methods and/or embodiments disclosed herein. In some embodiments, any of the presently disclosed methods and/or embodiments are performed at a computer system having one or more processors, and memory storing one or more programs for execution by the one or more processors.
[00341] Another aspect of the present disclosure provides a non-transitory computer readable storage medium storing one or more programs configured for execution by a computer, the one or more programs comprising instructions for carrying out any of the methods disclosed herein.
[00342] Embodiments for Detecting SARS-CoV-2 Variants
[00343] I. Samples.
[00344] Described herein are compositions, systems and methods for assaying for a SARS-CoV-2 variant. A number of samples are consistent with the methods and reagents disclosed herein.
[00345] In some cases, the sample comprises a target nucleic acid which is a gene fragment of a SARS-CoV-2 variant. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is Alpha strain/B.1.1.7 and Q lineages and descendent lineages. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is Beta strain/B.1.351 and descendent lineages. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is Gamma strain/P.1 and descendent lineages. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is Epsilon strain/B.1.427 and B.1.429 lineage and descendent lineages. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is Kappa strain/B.1.617.1 lineage and descendent lineages. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is Zeta strain/P.2 lineage and descendent lineages. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is Delta strain/B.1.617.2 and AY lineages and descendent lineages. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is Zeta strain and descendent lineages. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is B.1.526 lineage and descendent lineages. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is B.1.526.1 lineage and descendent lineages. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is B.1.525 lineage and descendent lineages. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is B.1.617 lineage and descendent lineages. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is B.1.617.3 lineage and descendent lineages. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is B.1.617.1 lineage and descendent lineages.
In some embodiments, the SARS-CoV-2 variant that is specifically targeted is Lambda strain/C.37 lineage and descendent lineages. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is Eta strain/B.1.525 lineage and descendent lineages. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is Iota strain/B.1.526 lineage and descendent lineages. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is Mu strain/B.1.621 and B.1.621.1 lineages and descendent lineages. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is Omicron strain/B.1.1.529 lineage and descendent lineages. In some embodiments, the SARS-CoV-2 variant that is specifically targeted is a lineage obtained from a reference database (see, for example, Rambaut et al., (2020) Nature Microbiology, doi:10.1038/s41564-020-0770-5; and Cov-Lineages, available on the Internet at cov-lineages.org/lineage_hst.html).
[00346] In some embodiments, the SARS-CoV-2 variant that is specifically targeted has one or more mutation(s) in the gene encoding Spike (S) protein. In some embodiments, the SARS-CoV-2 variant that is specifically targeted has one or more mutation(s) in the gene encoding Nucleocapsid (N) protein. In some embodiments, the SARS-CoV-2 variant that is specifically targeted has one or more mutation(s) in the gene encoding Envelope (E) protein. In some embodiments, the SARS-CoV-2 variant that is specifically targeted has one or more mutation(s) in the gene encoding Membrane glycoprotein (M) protein. In some embodiments, the SARS-CoV-2 variant that is specifically targeted has one or more mutation(s) in ORF1a. In some embodiments, the SARS-CoV-2 variant that is specifically targeted has one or more mutation(s) in ORF1b. In some instances, the target nucleic acid comprises a segment of a nucleic acid from the SARS-CoV-2 variant comprising the one or more mutations.
[00347] In some instances, the sample is a biological sample from an individual. In some instances, a biological sample is a blood, serum, plasma, saliva, urine, mucosal sample, peritoneal sample, cerebrospinal fluid, gastric secretions, nasal secretions, sputum, pharyngeal exudates, urethral or vaginal secretions, an exudate, an effusion, or tissue sample. In some instances, the sample is a nasal swab. In specific instances, the nasal swab is a nasopharyngeal swab. In some instances, the sample is a throat swab. In specific instances, the throat swab is an oropharyngeal swab. A tissue sample may be dissociated or liquified prior to application to the detection system of the disclosure. A sample from an environment may be from soil, air, or water. In some instances, the environmental sample is taken as a swab from a surface of interest or taken directly from the surface of interest. In some instances, the raw sample is applied to the detection system. In some instances, the sample is diluted with a buffer or a fluid, or concentrated, prior to application to the detection system, or is applied neat to the detection system.
[00348] Sometimes, the sample is contained in no more than 20 µL. The sample, in some cases, is contained in no more than 1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 90, 100, 200, 300, 400, 500 µL, or any value from 1 µL to 500 µL. Sometimes, the sample is contained in more than 500 µL.
[00349] Some methods described herein can detect a target nucleic acid present in the sample at various concentrations or amounts. In some cases, the sample has at least 2 target nucleic acids. In some cases, the sample has at least 3, 5, 10, 20, 30, 40, 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, or 10000 target nucleic acids. In some cases, the method detects target nucleic acid present at least at one copy per 10¹ non-target nucleic acids, 10² non-target nucleic acids, 10³ non-target nucleic acids, 10⁴ non-target nucleic acids, 10⁵ non-target nucleic acids, 10⁶ non-target nucleic acids, 10⁷ non-target nucleic acids, 10⁸ non-target nucleic acids, 10⁹ non-target nucleic acids, or 10¹⁰ non-target nucleic acids.
[00350] Any of the above disclosed samples are consistent with the systems, assays, and effector proteins disclosed herein.
[00351] II. Methods of Nucleic Acid Detection.
[00352] Provided herein are methods of detecting target nucleic acids indicative of, at least in part, SARS-CoV-2 variants. The SARS-CoV-2 variant, in some embodiments, is any one of the WA-1, Alpha, Beta, Gamma, Delta, Epsilon, Kappa, Omicron, or Zeta SARS-CoV-2 variants. In some embodiments, the target nucleic acid comprises a segment of an S (Spike) gene of a SARS-CoV-2 variant. The segment of the S gene, in some embodiments, comprises a mutation (e.g., an SNP) relative to a wild-type SARS-CoV-2 protein. The mutation, in some embodiments, is used, at least in part, to identify the SARS-CoV-2 variant (e.g., to distinguish it from other variants or from a wild-type SARS-CoV-2). Therefore, detection of the target nucleic acid comprising the mutation in a sample, in some embodiments, is used to identify the sample as comprising the SARS-CoV-2 variant.
[00353] Methods described herein, in some embodiments, comprise contacting the sample to a complex comprising a guide nucleic acid comprising a segment that is reverse complementary to a segment of the target nucleic acid and an effector protein that exhibits sequence independent cleavage upon forming a complex comprising the segment of the guide nucleic acid binding to the segment of the target nucleic acid (e.g., comprising a segment of a SARS-CoV-2 variant); and assaying for a signal indicating cleavage of a reporter, wherein the signal indicates a presence of the target nucleic acid in the sample and wherein absence of the signal indicates an absence of the target nucleic acid in the sample. When a guide nucleic acid binds to a target nucleic acid (e.g., a nucleic acid from a SARS-CoV-2 variant), the effector protein's trans cleavage activity can be initiated, and reporter nucleic acids can be cleaved, resulting in the detection of fluorescence indicative of, at least in part, the presence of the target nucleic acid. The effector protein, in some embodiments, cleaves the reporter nucleic acid with an efficiency of 50% as measured by a change in a signal that is calorimetric, potentiometric, amperometric, optical (e.g., fluorescent, colorimetric, etc.), or piezo-electric, as non-limiting examples. In some cases, the cleavage efficiency is at least 40%, 50%, 60%, 70%, 80%, 90%, or 95% as measured by a change in a signal that is calorimetric, potentiometric, amperometric, optical (e.g., fluorescent, colorimetric, etc.), or piezo-electric, as non-limiting examples. In some cases, the methods described herein detect a target nucleic acid (e.g., a nucleic acid from a SARS-CoV-2 variant) with an effector protein and a detector nucleic acid in a sample where the sample is contacted with the reagents for a predetermined length of time sufficient for trans cleavage of the single-stranded detector nucleic acid.
[00354] Some methods described herein comprise a) contacting the sample to i) a detector nucleic acid; and ii) a composition comprising an effector protein and a non-naturally occurring guide nucleic acid having a nucleotide sequence that is at least 80% identical to SEQ ID NO: 1, 2, 3, 4, 5, or 6 (or any one of SEQ ID NOS: 22-27 or 40-42) that hybridizes to a segment of the target nucleic acid, wherein the effector protein cleaves the detector nucleic acid upon hybridization of the non-naturally occurring guide nucleic acid to the segment of the coronavirus target nucleic acid; and b) assaying for a change in a signal, wherein the change in the signal is produced by cleavage of the detector nucleic acid. In some specific embodiments, the detector nucleic acid comprises a nucleotide sequence that is at least 62.5% identical to SEQ ID NO: 7, and flanked with a fluorescent dye on the 5’ end and a quencher on the 3’ end. In some specific embodiments, the detector nucleic acid comprises a nucleotide sequence that is at least 75% identical to SEQ ID NO: 7, and flanked with a fluorescent dye on the 5’ end and a quencher on the 3’ end. In some specific embodiments, the detector nucleic acid comprises a nucleotide sequence that is at least 87.5% identical to SEQ ID NO: 7, and flanked with a fluorescent dye on the 5’ end and a quencher on the 3’ end. In some specific embodiments, the non-naturally occurring guide nucleic acid has a nucleotide sequence that is at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 1, 2, 3, 4, 5, or 6, or any one of SEQ ID NOS: 22-27 or 40-42.
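The percent identity thresholds recited above can be illustrated with a short Python sketch. It assumes equal-length, ungapped sequences compared position by position; sequence identity is often computed from an alignment instead, so this is a simplification. The reference string below is a hypothetical placeholder, not SEQ ID NO: 7, and the function name `percent_identity` is likewise an assumption. For an eight-nucleotide reference, 62.5%, 75%, and 87.5% identity would correspond to 5, 6, and 7 matching positions, respectively.

```python
def percent_identity(candidate: str, reference: str) -> float:
    """Percent of matching positions between two equal-length sequences.

    Simplifying assumption: no gaps or alignment; positions are compared
    one-to-one and the comparison is case-insensitive.
    """
    if not reference:
        raise ValueError("reference sequence must not be empty")
    if len(candidate) != len(reference):
        raise ValueError("this sketch assumes equal-length, ungapped sequences")
    matches = sum(a == b for a, b in zip(candidate.upper(), reference.upper()))
    return 100.0 * matches / len(reference)


if __name__ == "__main__":
    reference = "ACGTACGT"  # hypothetical 8-nt placeholder, not SEQ ID NO: 7
    for candidate in ("ACGTACGT", "ACGTACGA", "ACGTACAA", "ACGTAAAA"):
        print(candidate, percent_identity(candidate, reference))
```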
[00355] In some instances, the method further comprises assaying for a control sequence by contacting the control nucleic acid to a second detector nucleic acid and a composition comprising the programmable nuclease and a non-naturally occurring guide nucleic acid that hybridizes to a segment of the control nucleic acid, wherein the programmable nuclease cleaves the detector nucleic acid upon hybridization of the non-naturally occurring guide nucleic acid to the segment of the control nucleic acid.
[00356] Contacting the sample with the effector protein and guide nucleic acid, in some embodiments, occurs at a temperature of at least about 25°C, at least about 30°C, at least about 35°C, at least about 40°C, at least about 50°C, or at least about 65°C. In some instances, the temperature is not greater than 80°C. In some instances, the temperature is about 25°C, about 30°C, about 35°C, about 40°C, about 45°C, about 50°C, about 55°C, about 60°C, about 65°C, or about 70°C. In some instances, the temperature is about 25°C to about 45°C, about 35°C to about 55°C, or about 55°C to about 65°C.
[00357] In some cases, methods of the disclosure detect a target nucleic acid (e.g., the target nucleic acid indicative of, at least in part, a SARS-CoV-2 variant) in less than 60 minutes. In some cases, methods of the disclosure detect a target nucleic acid in less than about 120 minutes, less than about 110 minutes, less than about 100 minutes, less than about 90 minutes, less than about 80 minutes, less than about 70 minutes, less than about 60 minutes, less than about 55 minutes, less than about 50 minutes, less than about 45 minutes, less than about 40 minutes, less than about 35 minutes, less than about 30 minutes, less than about 25 minutes, less than about 20 minutes, less than about 15 minutes, less than about 10 minutes, less than about 5 minutes, less than about 4 minutes, less than about 3 minutes, less than about 2 minutes, or less than about 1 minute.
[00358] In some cases, the methods of detecting are performed in less than 10 hours, less than 9 hours, less than 8 hours, less than 7 hours, less than 6 hours, less than 5 hours, less than 4 hours, less than 3 hours, less than 2 hours, less than 1 hour, less than 50 minutes, less than 45 minutes, less than 40 minutes, less than 35 minutes, less than 30 minutes, less than 25 minutes, less than 20 minutes, less than 15 minutes, less than 10 minutes, less than 9 minutes, less than 8 minutes, less than 7 minutes, less than 6 minutes, or less than 5 minutes. In some cases, methods of detecting are performed in about 5 minutes to about 10 hours, about 10
minutes to about 8 hours, about 15 minutes to about 6 hours, about 20 minutes to about 5 hours, about 30 minutes to about 2 hours, or about 45 minutes to about 1 hour.
[00359] The results from the completed assay can be detected and analyzed in various ways. In some cases, signal produced by the reaction is visible by eye, and the results can be read by the user. In some cases, the signal is visualized by an imaging device or other device depending on the type of signal. Often, the imaging device is a digital camera, such as a digital camera on a mobile device. The mobile device, in some embodiments, has a software program or a mobile application that can capture an image of the support medium, identify the assay being performed, detect the detection region and the detection spot, provide image properties of the detection spot, analyze the image properties of the detection spot, and provide a result. Alternatively, or in combination, the imaging device can capture fluorescence, ultraviolet (UV), infrared (IR), or visible wavelength signals. The imaging device, in some embodiments, has an excitation source to provide the excitation energy and captures the emitted signals. In some cases, the excitation source is a camera flash and optionally a filter. In some cases, the imaging device is used together with an imaging box that is placed over the support medium to create a dark room to improve imaging. The imaging box can be a cardboard box that the imaging device can fit into before imaging. In some instances, the imaging box has optical lenses, mirrors, filters, or other optical elements to aid in generating a more focused excitation signal or to capture a more focused emission signal. Often, the imaging box and the imaging device are small, handheld, and portable to facilitate the transport and use of the assay in remote or low resource settings.
[00360] The assay described herein can be visualized and analyzed by a mobile application (app) or a software program. Using the graphic user interface (GUI) of the app or program, in some embodiments, an individual takes an image of the support medium, including the detection region, barcode, reference color scale, and fiducial markers on the housing, using a camera on a mobile device. The program or app reads the barcode or identifiable label for the test type, locates the fiducial marker to orient the sample, reads the detectable signals, compares against the reference color grid, and determines the presence or absence of the target nucleic acid, which indicates the presence of the gene, virus, or the agent responsible for the disease. The mobile application, in some embodiments, presents the results of the test to the individual. The mobile application, in some embodiments, stores the test results in the mobile application. The mobile application, in some embodiments, communicates with a remote device and transfers the data of the test results. The test results, in some embodiments, are viewable remotely from the remote device by another individual, including a healthcare professional. A remote user, in some embodiments, accesses the results and uses the information to recommend action for treatment, intervention, or cleanup of an environment.
[00361] In some cases, the method of the disclosure is used as an initial screen for the presence of an SNP before whole genome sequencing. In other cases, the method of the disclosure is used to replace whole genome sequencing. In other cases, the method of the disclosure is used to detect specific mutations associated with neutralizing antibody evasion before physicians make a clinical decision to use certain antibody drugs to treat the infection. In other cases, the method of the disclosure is used for contact tracing within communities.
[00362] a. Sample Preparation
[00363] Methods described herein, in some embodiments, comprise contacting a nasal swab or a throat swab from an individual. In some instances, the nasal swab is a nasopharyngeal swab. In some instances, the throat swab is an oropharyngeal swab. The procedures to collect a nasal swab are well known in the relevant field. Briefly, a tapered swab, in some embodiments, is used. When the individual is tilted to a certain angle, the swab is inserted into the individual’s nasal cavity parallel to the palate until resistance is met at the turbinates. The swab, in some embodiments, is preserved in a sterile tube. The procedures to collect a throat swab are also well known in the relevant field. Briefly, in some embodiments, a swab is inserted into the mouth of the individual, and touches both tonsillar pillars and the posterior oropharynx. In some embodiments, the swab is preserved in a sterile tube.
[00364] In some embodiments, methods described herein comprise preparing samples, including lysing the sample. In some instances, lysing the sample comprises contacting the sample to a lysis buffer.
[00365] A lysis reaction, in some embodiments, is performed at a range of temperatures. In some embodiments, a lysis reaction is performed at about room temperature. In some embodiments, a lysis reaction is performed at about 95°C. In some embodiments, a lysis reaction is performed at from 1 °C to 10 °C, from 4 °C to 8 °C, from 10 °C to 20 °C, from 15 °C to 25 °C, from 15 °C to 20 °C, from 18 °C to 25 °C, from 18 °C to 95 °C, from 20 °C to 37 °C, from 25 °C to 40 °C, from 35 °C to 45 °C, from 40 °C to 60 °C, from 50 °C to 70 °C, from 60 °C to 80 °C, from 70 °C to 90 °C, from 80 °C to 95 °C, or from 90 °C to 99 °C. In
some embodiments, a lysis reaction is performed for about 5 minutes, about 15 minutes, or about 30 minutes. In some embodiments, a lysis reaction is performed for from 2 minutes to 5 minutes, from 3 minutes to 8 minutes, from 5 minutes to 15 minutes, from 10 minutes to 20 minutes, from 15 minutes to 25 minutes, from 20 minutes to 30 minutes, from 25 minutes to 35 minutes, from 30 minutes to 40 minutes, from 35 minutes to 45 minutes, from 40 minutes to 50 minutes, from 45 minutes to 55 minutes, from 50 minutes to 60 minutes, from 55 minutes to 65 minutes, from 60 minutes to 70 minutes, from 65 minutes to 75 minutes, from 70 minutes to 80 minutes, from 75 minutes to 85 minutes, or from 80 minutes to 90 minutes.
[00366] The lysis buffer disclosed herein, in some embodiments, comprises a viral lysis buffer. A viral lysis buffer, in some embodiments, lyses a coronavirus capsid in a viral sample (e.g., a sample collected from an individual suspected of having a coronavirus infection), releasing a viral genome. The viral lysis buffer, in some embodiments, is compatible with amplification (e.g., RT-LAMP amplification) of a target region of the viral genome. The viral lysis buffer, in some embodiments, is compatible with detection (e.g., a DETECTR reaction disclosed herein). A sample, in some embodiments, is prepared in a one-step sample preparation method comprising suspending the sample in a viral lysis buffer compatible with amplification, detection (e.g., a DETECTR reaction), or both. A viral lysis buffer compatible with amplification (e.g., RT-LAMP amplification), detection (e.g., DETECTR), or both, in some embodiments, comprises a buffer (e.g., Tris-HCl, phosphate, or HEPES), a reducing agent (e.g., N-Acetyl Cysteine (NAC), Dithiothreitol (DTT), β-mercaptoethanol (BME), or tris(2-carboxyethyl)phosphine (TCEP)), a chelating agent (e.g., EDTA or EGTA), a detergent (e.g., deoxycholate, NP-40 (Igepal), Triton X-100, or Tween 20), a salt (e.g., ammonium acetate, magnesium acetate, manganese acetate, potassium acetate, sodium acetate, ammonium chloride, potassium chloride, magnesium chloride, manganese chloride, sodium chloride, ammonium sulfate, magnesium sulfate, manganese sulfate, potassium sulfate, or sodium sulfate), or a combination thereof. For example, a viral lysis buffer, in some embodiments, comprises a buffer and a reducing agent, or a viral lysis buffer comprises a buffer and a chelating agent. The viral lysis buffer, in some embodiments, is formulated at a low pH. For example, the viral lysis buffer, in some embodiments, is formulated at a pH of from about pH 4 to about pH 5. The viral lysis buffer, in some embodiments, is formulated at a pH of from about pH 4 to about pH 9. The viral lysis buffer, in some embodiments, further comprises a preservative (e.g., ProClin 150). The viral lysis buffer, in some embodiments, comprises an activator of the amplification reaction. For example, the buffer, in some embodiments, comprises primers, dNTPs, or magnesium (e.g., MgSO4, MgCl2, or MgOAc), or a combination thereof, to activate the amplification reaction. In some cases, an activator (e.g., primers, dNTPs, or magnesium), in some embodiments, is added to the buffer following lysis of the coronavirus to initiate the amplification reaction.
[00367] b. RNA extraction
[00368] Methods described herein, in some embodiments, comprise RNA extraction. In some embodiments, the method described herein comprises obtaining a sample from a subject infected with or suspected to be infected with coronavirus. In some instances, the methods comprise extracting RNA from the sample. RNA can be extracted from the sample with conventional RNA extraction methods. In some instances, RNA is extracted with the Mag-Bind Viral DNA/RNA 96 kit (Omega Bio-Tek). In some instances, RNA is extracted with the HighPrep™ Viral DNA/RNA Kit (MagBio). In some instances, RNA is extracted with the AB MagPure Virus RNA Isolation Kit. In some instances, RNA is extracted with the GeneJET RNA Purification Kit (Thermo Fisher Scientific).
[00369] RNA, in some embodiments, is automatically extracted on a liquid handling platform. In some instances, RNA is automatically extracted on the KingFisher Flex (Thermo Fisher Scientific). In some instances, RNA is automatically extracted on Microlab® STAR (Hamilton). In some instances, RNA is automatically extracted on Microlab® NIMBUS (Hamilton). In some instances, RNA is automatically extracted on KingFisher® Duo Prime (Thermo Fisher Scientific). In some instances, RNA is automatically extracted on MagMAX® Express-96 (Applied Biosystems). In some instances, RNA is automatically extracted on BioSprint® 96 (QIAGEN).
[00370] c. Amplification
[00371] Methods described herein, in some embodiments, comprise amplifying a target nucleic acid for detection. In some embodiments, amplifying comprises changing the temperature of the amplification reaction, also known as thermal amplification (e.g., PCR). In some implementations, amplifying is performed at essentially one temperature, also known as isothermal amplification.
[00372] In some embodiments, amplifying comprises subjecting a target nucleic acid to an amplification reaction selected from transcription mediated amplification (TMA), helicase dependent amplification (HDA), circular helicase dependent amplification (cHDA), strand displacement amplification (SDA), recombinase polymerase amplification (RPA), loop mediated isothermal amplification (LAMP), exponential amplification reaction (EXPAR), rolling circle amplification (RCA), ligase chain reaction (LCR), simple method amplifying RNA targets (SMART), single primer isothermal amplification (SPIA), multiple displacement amplification (MDA), nucleic acid sequence based amplification (NASBA), hinge-initiated primer-dependent amplification of nucleic acids (HIP), nicking enzyme amplification reaction (NEAR), and improved multiple displacement amplification (IMDA).
[00373] In some instances, the target nucleic acid is amplified by LAMP. In some instances, the target nucleic acid is amplified by RT-LAMP. LAMP primer sets can be designed to target distinct but nearby sequences around one or more mutations (e.g., SNPs) that are indicative of, at least in part, a SARS-CoV-2 variant. In some instances, four primers, comprising forward inner primer (FIP) with an overhang designed to form dumbbell structures, backward inner primer (BIP) with an overhang designed to form dumbbell structures, forward outer primer (F3) for displacing the FIP linked complementary strands from templates, and backward outer primer (B3) for displacing the BIP linked complementary strands from templates, are designed to target one region. In other instances, six primers, comprising FIP, BIP, F3, B3, loop forward (LF), and loop backward (LB), are designed to target one region, wherein LF and LB are added for faster amplification. In some instances, one or more of the primers are degenerate primers.
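The four-primer and six-primer arrangements described above can be sketched as a small data structure. This is a minimal illustration, assuming hypothetical class and method names; the sequences are placeholders and are not the primers of Tables 3A and 3B. Degenerate primers are accommodated by accepting IUPAC ambiguity codes.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# IUPAC nucleotide codes, including degenerate symbols for degenerate primers.
IUPAC = set("ACGTRYSWKMBDHVN")


@dataclass(frozen=True)
class LampPrimerSet:
    """Hypothetical container for one LAMP primer set targeting one region."""
    fip: str
    bip: str
    f3: str
    b3: str
    lf: Optional[str] = None  # loop forward primer, optional
    lb: Optional[str] = None  # loop backward primer, optional

    def primers(self) -> Dict[str, str]:
        """Return the primers that are actually present, keyed by name."""
        named = {"FIP": self.fip, "BIP": self.bip, "F3": self.f3,
                 "B3": self.b3, "LF": self.lf, "LB": self.lb}
        return {name: seq for name, seq in named.items() if seq}

    def problems(self) -> List[str]:
        """Return a list of issues; empty if the set looks well formed."""
        issues = []
        present = self.primers()
        for name in ("FIP", "BIP", "F3", "B3"):
            if name not in present:
                issues.append(f"{name} is required")
        # The text describes four-primer and six-primer sets, so LF and LB
        # are expected together or not at all.
        if bool(self.lf) != bool(self.lb):
            issues.append("LF and LB should be supplied together")
        for name, seq in present.items():
            bad = set(seq.upper()) - IUPAC
            if bad:
                issues.append(f"{name} has non-IUPAC symbols: {sorted(bad)}")
        return issues


# Placeholder sequences for illustration only; these are not real primers.
four = LampPrimerSet(fip="ACGTACGTACGTTTTTGCATGCATGCAT",
                     bip="TGCATGCATGCATTTTACGTACGTACGT",
                     f3="ACGTRCGTACGTACGT",   # R is a degenerate position (A or G)
                     b3="TGCATGCAYGCATGCA")   # Y is a degenerate position (C or T)
print(len(four.primers()), four.problems())
```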
[00374] In some instances, a set of FIP, BIP, F3, B3, LF, and LB are designed to amplify around an SNP that is indicative of, at least in part, a SARS-CoV-2 variant. In specific instances, a set of FIP, BIP, F3, B3, LF, and LB are designed to amplify a fragment of DNA that encodes around amino acid position 452 of the S protein. In specific instances, a set of FIP, BIP, F3, B3, LF, and LB are designed to amplify a fragment of DNA that encodes around amino acid position 484 of the S protein. In specific instances, a set of FIP, BIP, F3, B3, LF, and LB are designed to amplify a fragment of DNA that encodes around amino acid position 501 of the S protein. In specific instances, a set of FIP, BIP, F3, B3, LF, and LB are designed to amplify a fragment of DNA that encodes around amino acid position 18, amino acid position 95, amino acid position 144, amino acid position 614, amino acid position 653, amino acid position 681, amino acid position 655, amino acid position 796, or amino acid position 1219 of the S protein. In one instance, a set of FIP, BIP, F3, B3, LF, and LB has primer sequences shown in Tables 3A and 3B herein.
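To locate the fragment of DNA that encodes a given amino acid position, the codon arithmetic can be sketched as follows. The function name is hypothetical, and the coordinates are 1-based positions relative to the first nucleotide of the S coding sequence; converting them to genome coordinates would require the start position of the S gene, which is not given here.

```python
from typing import Tuple


def codon_span(aa_position: int) -> Tuple[int, int]:
    """Return the 1-based, inclusive nucleotide span of the codon that
    encodes amino acid `aa_position`, relative to the start of the
    coding sequence."""
    if aa_position < 1:
        raise ValueError("amino acid positions are numbered from 1")
    start = 3 * (aa_position - 1) + 1
    return start, start + 2


# Positions 452, 484, and 501 are the S protein positions named in the text.
for pos in (452, 484, 501):
    print(pos, codon_span(pos))
```

Under this numbering, amino acid position 452 corresponds to coding-sequence nucleotides 1354 to 1356, position 484 to nucleotides 1450 to 1452, and position 501 to nucleotides 1501 to 1503.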
[00375] Table 3A: LAMP Primer Sequences
[00377] Provided herein are compositions that, in some embodiments, further comprise reagents for amplification, the uses thereof, e.g., in systems, and methods for assaying for a SARS-CoV-2 variant. Non-limiting examples of reagents for amplifying a nucleic acid include polymerases, primers, and nucleotides. Nucleic acid amplification of a target nucleic acid, in some embodiments, improves at least one of sensitivity, specificity, or accuracy of the assay in detecting the target nucleic acid. In some cases, amplification of the target nucleic acid increases the concentration of the target nucleic acid in the sample relative to the concentration of nucleic acids that do not correspond to the target nucleic acid.
[00378] In some embodiments, amplification of the target nucleic acid comprises modifying the sequence of the target nucleic acid. For example, in some embodiments, amplification is used to insert a PAM sequence into a target nucleic acid that lacks a PAM sequence.
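One way to picture PAM insertion by amplification is a primer that carries mismatched bases, so that the amplicon differs from the template at the PAM site. The sketch below models only the resulting sequence. It assumes a four-nucleotide PAM placed immediately 5' of the guide-targeted segment, as is typical for Cas12a-type effectors; the PAM value, the function names, and the template are hypothetical and are not taken from this disclosure.

```python
def introduce_pam(template: str, protospacer_start: int, pam: str = "TTTA") -> str:
    """Return the amplicon sequence with `pam` written immediately 5' of
    the protospacer, which begins at 0-based index `protospacer_start`."""
    if not len(pam) <= protospacer_start <= len(template):
        raise ValueError("not enough sequence 5' of the protospacer for the PAM")
    pam_start = protospacer_start - len(pam)
    return template[:pam_start] + pam + template[protospacer_start:]


def count_mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


# Made-up template; the protospacer is taken to start at index 10.
template = "GGCCGGCCGATTGCACCTGACTGACCATGCA"
amplicon = introduce_pam(template, 10)
print(amplicon)
print(count_mismatches(template, amplicon), "bases changed by the primer")
```

In practice, a primer carrying such mismatches must still anneal to the template, so the number and placement of changed bases would need to be evaluated experimentally.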
[00379] In some embodiments, the nucleic acid amplification reaction is carried out with reagents comprising amplification primers described herein, a DNA polymerase, dNTPs, and appropriate buffer. In some specific embodiments, the buffer comprises Tris-HCl, (NH4)2SO4, KCl, MgSO4, and Tween® 20. In one embodiment, the concentration of MgSO4 is 6 mM.
[00381] Amplifying, in some embodiments, takes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 40, 50, or 60 minutes. Sometimes, the nucleic acid amplification is performed for 1 to 60, 5 to 55, 10 to 50, 15 to 45, 20 to 40, or 25 to 35 minutes. Amplifying, in some embodiments, is performed at a temperature of around 20-45°C. Amplifying, in some embodiments, is performed at a temperature of less than about 20°C, less than about 25°C, less than about 30°C, less than about 35°C, less than about 37°C, less than about 40°C, or less than about 45°C. The nucleic acid amplification reaction, in some embodiments, is performed at a temperature of at least about 20°C, at least about 25°C, at least about 30°C, at least about 35°C, at least about 37°C, at least about 40°C, or at least about 45°C.
[00382] In some cases, the amplification reaction is performed on at least 10, 100, 1,000, 5,000, 10,000, or 15,000 copies of the target nucleic acid. In some cases, at least 10,000 copies of the target nucleic acid are input into the amplification reaction.
[00384] d. Reverse transcription
[00385] Methods described herein, in some embodiments, comprise reverse transcribing the coronavirus target nucleic acid, the amplification product, or a combination thereof. In some instances, the reverse transcribing comprises contacting the sample to reagents for reverse transcription. In some specific instances, the reagents for reverse transcription comprise a reverse transcriptase, an oligonucleotide primer, and dNTPs. In one instance, the reverse transcriptase described herein is WarmStart RTx reverse transcriptase. In one instance, the reverse transcriptase described herein is MultiScribe™ reverse transcriptase. In one instance, the reverse transcriptase described herein is QuantiTect reverse transcriptase. In one instance, the reverse transcriptase described herein is GoScript™ reverse transcriptase. In one instance, the reverse transcriptase described herein is UltraScript 2.0 reverse transcriptase.
[00386] In some instances, the contacting the sample to reagents for reverse transcription occurs prior to the contacting the sample to the detector nucleic acid and the composition, prior to the contacting the sample to the reagents for amplification, or prior to both.
[00387] In some instances, the contacting the sample to reagents for reverse transcription occurs concurrently with the contacting the sample to the detector nucleic acid and the composition, concurrently with the contacting the sample to the reagents for amplification, or concurrently with both.
[00388] e. Generating and assaying signals
[00389] Methods described herein, in some embodiments, comprise contacting the amplification product, the target nucleic acid, or a combination thereof, to a composition comprising an effector protein and a non-naturally occurring guide nucleic acid that hybridizes to a segment of the target nucleic acid or amplification product thereof.
[00390] Methods described herein, in some embodiments, further comprise assaying for a change in a signal produced by cleavage of a reporter nucleic acid wherein the effector protein trans cleaves the reporter nucleic acid upon hybridization of the non-naturally occurring guide nucleic acid to the segment of the target nucleic acid or the amplification product thereof.
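A change in signal of the kind described above is often judged against a control reaction lacking the target. The following is a minimal sketch under stated assumptions: signals are fluorescence readings in arbitrary units, a no-target control is run alongside the sample, and the fold-change threshold of 2.0 is an arbitrary illustrative value. None of these specifics, nor the function names, come from this disclosure.

```python
from typing import Sequence


def fold_change(sample_signal: float, control_signal: float) -> float:
    """Ratio of the sample signal to the no-target control signal."""
    if control_signal <= 0:
        raise ValueError("control signal must be positive")
    return sample_signal / control_signal


def is_positive(sample_trace: Sequence[float],
                control_trace: Sequence[float],
                threshold: float = 2.0) -> bool:
    """Call a reaction positive when its endpoint signal exceeds the
    endpoint of the no-target control by at least `threshold`-fold."""
    return fold_change(sample_trace[-1], control_trace[-1]) >= threshold


# Made-up readings taken at successive time points.
sample = [100.0, 400.0, 900.0, 1500.0]
control = [100.0, 110.0, 120.0, 125.0]
print(fold_change(sample[-1], control[-1]), is_positive(sample, control))
```

An endpoint comparison is only one option; a rate-based measure over the time course could be substituted without changing the overall approach.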
[00391] i. Effector Proteins
[00392] Provided herein are effector proteins and uses thereof, e.g., in compositions, systems, and methods for assaying for a SARS-CoV-2 variant.
[00393] Several effector proteins are consistent with the methods of the disclosure. For example, CRISPR/Cas enzymes are effector proteins used in the methods and systems disclosed herein. CRISPR/Cas enzymes can include any of the known Classes and Types of CRISPR/Cas enzymes. Effector proteins disclosed herein include Class 1 CRISPR/Cas enzymes, such as the Type I, Type IV, or Type III CRISPR/Cas enzymes. Effector proteins disclosed herein also include the Class 2 CRISPR/Cas enzymes, such as the Type II, Type V, and Type VI CRISPR/Cas enzymes.
[00394] In some embodiments, the Type V CRISPR/Cas enzyme is a Cas12 effector protein. Type V CRISPR/Cas enzymes (e.g., Cas12 or Cas14) lack an HNH domain. A Cas12 nuclease of the disclosure cleaves nucleic acids via a single catalytic RuvC domain. The RuvC domain is within a nuclease, or “NUC”, lobe of the protein, and the Cas12 nucleases further comprise a recognition, or “REC”, lobe. The REC and NUC lobes are connected by a bridge helix, and the Cas12 proteins additionally include two domains for PAM recognition, termed the PAM interacting (PI) domain and the wedge (WED) domain (Murugan et al., Mol Cell. 2017 Oct 5; 68(1): 15-25). In some embodiments, the programmable Cas12 effector protein described herein is Cas12a. In some embodiments, the programmable Cas12 effector protein described herein is Cas12c. In some embodiments, the programmable Cas12 effector protein described herein is Cas12d. In some embodiments, the programmable Cas12 effector protein described herein is Cas12e. In some embodiments, the programmable Cas12 effector protein described herein is Cas12b. In some embodiments, the programmable Cas12 effector protein described herein is Cas12h. In some embodiments, the programmable Cas12 effector protein described herein is Cas12i. In some embodiments, the programmable Cas12 effector protein described herein is a small effector such as Cas12g.
[00395] ii. Guide Nucleic Acids
[00396] Provided herein are compositions comprising an effector protein and a guide nucleic acid and uses thereof, e.g., in systems, and methods for assaying for a SARS-CoV-2 variant.
[00397] In some embodiments, the guide nucleic acid described herein targets (e.g., is capable of binding to and/or is complementary to) a target nucleic acid. In some instances, the target nucleic acid is near a neighboring PAM sequence that is recognizable by the effector proteins described herein. In some instances, the engineered guide RNA comprises a CRISPR RNA (crRNA) that is at least partially complementary to a target nucleic acid. In
some instances, the engineered guide RNA comprises a trans-activating crRNA (tracrRNA), at least a portion of which interacts with the effector protein. The tracrRNA, in some embodiments, hybridizes to a portion of the guide RNA that does not hybridize to the target nucleic acid. In some instances, at least a portion of a crRNA sequence and at least a portion of a tracrRNA sequence are provided as a single guide RNA (sgRNA). In some instances, compositions comprise a crRNA and tracrRNA that function together as two separate, unlinked molecules.
[00398] In some instances, the guide nucleic acid targets (e.g., is capable of binding to and/or is complementary to) a target nucleic acid indicative of, at least in part, a SARS-CoV-2 variant. In some instances, the target nucleic acid is from a SARS-CoV-2 variant and comprises a mutation relative to a wild-type SARS-CoV-2 or to a strain of SARS-CoV-2 that comprises no mutations at the same locus. In some instances, the mutation is a single nucleotide polymorphism (SNP). In some instances, the mutation (e.g., an SNP) can be used to distinguish the SARS-CoV-2 variant from a wild-type or from another SARS-CoV-2 variant. In some instances, the target nucleic acid is an RNA, a DNA, or a synthetic nucleic acid. In some instances, the guide nucleic acid targets a target nucleic acid which comprises a segment of an S (Spike) gene of SARS-CoV-2. In some cases, the S protein plays a key role in the receptor recognition and cell membrane fusion process, which is essential for the infection and transmission capabilities of SARS-CoV-2. In some instances, the guide nucleic acid targets a target nucleic acid comprising certain SNP(s) in the S gene that encodes S protein. In some specific instances, the guide nucleic acid targets a portion of the S gene encoding around amino acid position 18, amino acid position 95, amino acid position 144, amino acid position 614, amino acid position 653, amino acid position 681, amino acid position 655, amino acid position 796, or amino acid position 1219 of the Spike protein of SARS-CoV-2 (see NCBI Reference Sequence: YP_009724390.1).
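Distinguishing a variant from wild-type at an SNP can be pictured as comparing the signals obtained with a wild-type-targeting guide and a mutant-targeting guide on the same sample. The sketch below is one possible decision rule, not the method of this disclosure: the function name, the background-based detection cutoff, and the log2 ratio cutoff are all assumptions chosen for illustration.

```python
import math


def call_allele(wt_signal: float, mut_signal: float, background: float,
                min_fold: float = 2.0, min_log2_ratio: float = 1.0) -> str:
    """Compare signals from a wild-type guide and a mutant guide.

    Returns "wild-type", "mutant", "indeterminate", or "no call".
    """
    if min(wt_signal, mut_signal, background) <= 0:
        raise ValueError("signals must be positive")
    # Require at least one guide to rise clearly above background.
    detected = max(wt_signal, mut_signal) >= min_fold * background
    if not detected:
        return "no call"
    ratio = math.log2(mut_signal / wt_signal)
    if ratio >= min_log2_ratio:
        return "mutant"
    if ratio <= -min_log2_ratio:
        return "wild-type"
    return "indeterminate"


# Made-up endpoint signals in arbitrary units, with a background of 100.
print(call_allele(1500.0, 200.0, 100.0))
print(call_allele(180.0, 1600.0, 100.0))
print(call_allele(1000.0, 1200.0, 100.0))
print(call_allele(120.0, 130.0, 100.0))
```

The "indeterminate" outcome covers cases in which both guides produce comparable signal, for example a mixed sample or a guide pair with weak discrimination.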
[00399] In some embodiments, the guide nucleic acid is a guide nucleic acid represented in Table 4A or Table 4B.
[00401] Table 4B: Additional E484 Mutant Guide Nucleic Acid Sequences
[00402] In some embodiments, the guide nucleic acid described herein targets around position 452 in the Spike protein and hybridizes to a wild-type segment of sequence. In some specific embodiments, the guide nucleic acid described herein has a nucleotide sequence of SEQ ID NO: 1. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 60% identical to SEQ ID NO: 1. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 70% identical to SEQ ID NO: 1. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 80% identical to SEQ ID NO: 1. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 1. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical to SEQ ID NO: 1. In some specific embodiments, the guide nucleic acid described herein has a nucleotide sequence of SEQ ID NO: 22. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 60% identical to SEQ ID NO: 22. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 70% identical to SEQ ID NO: 22. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 80% identical to SEQ ID NO: 22. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 22. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical to SEQ ID NO: 22.
[00403] In some embodiments, the guide nucleic acid described herein targets around position 484 in the Spike protein and hybridizes to a wild-type segment of sequence. In some specific embodiments, the guide nucleic acid described herein has a nucleotide sequence of SEQ ID NO: 2. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 60% identical to SEQ ID NO: 2. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 70% identical to SEQ ID NO: 2. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 80% identical to SEQ ID NO: 2. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 2. In
some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical to SEQ ID NO: 2. In some specific embodiments, the guide nucleic acid described herein has a nucleotide sequence of SEQ ID NO: 23. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 60% identical to SEQ ID NO: 23. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 70% identical to SEQ ID NO: 23. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 80% identical to SEQ ID NO: 23. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 23. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical to SEQ ID NO: 23.
[00404] In some embodiments, the guide nucleic acid described herein targets around position 501 in the Spike protein and hybridizes to a wild-type segment of sequence. In some specific embodiments, the guide nucleic acid described herein has a nucleotide sequence of SEQ ID NO: 3. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 60% identical to SEQ ID NO: 3. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 70% identical to SEQ ID NO: 3. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 80% identical to SEQ ID NO: 3. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 3. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical to SEQ ID NO: 3. In some specific embodiments, the guide nucleic acid described herein has a nucleotide sequence of SEQ ID NO: 24. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 60% identical to SEQ ID NO: 24. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 70% identical to SEQ ID NO: 24. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 80% identical to SEQ ID NO: 24. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 90% identical to SEQ
ID NO: 24. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical to SEQ ID NO: 24.
[00405] In some embodiments, the guide nucleic acid described herein targets around position 452 in the Spike protein and hybridizes to a mutant segment of sequence. In some specific embodiments, the guide nucleic acid described herein has a nucleotide sequence of SEQ ID NO: 4. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 60% identical to SEQ ID NO: 4. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 70% identical to SEQ ID NO: 4. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 80% identical to SEQ ID NO: 4. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 4. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical to SEQ ID NO: 4. In some specific embodiments, the guide nucleic acid described herein has a nucleotide sequence of SEQ ID NO: 25. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 60% identical to SEQ ID NO: 25. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 70% identical to SEQ ID NO: 25. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 80% identical to SEQ ID NO: 25. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 25. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical to SEQ ID NO: 25.
[00406] In some embodiments, the guide nucleic acid described herein targets around position 484, and hybridizes to a mutant segment of sequence. In some specific embodiments, the guide nucleic acid described herein has a nucleotide sequence of SEQ ID NO: 5. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 60% identical to SEQ ID NO: 5. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 70% identical to SEQ ID NO: 5. In some specific embodiments, the guide nucleic acid described
herein comprises a nucleotide sequence that is at least 80% identical to SEQ ID NO: 5. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 5. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical to SEQ ID NO: 5. In some specific embodiments, the guide nucleic acid described herein has a nucleotide sequence of SEQ ID NO: 26. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 60% identical to SEQ ID NO: 26. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 70% identical to SEQ ID NO: 26. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 80% identical to SEQ ID NO: 26. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 26. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical to SEQ ID NO: 26. In some specific embodiments, the guide nucleic acid described herein has a nucleotide sequence of SEQ ID NO: 40. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 60% identical to SEQ ID NO: 40. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 70% identical to SEQ ID NO: 40. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 80% identical to SEQ ID NO: 40. 
In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 40. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical to SEQ ID NO: 40. In some specific embodiments, the guide nucleic acid described herein has a nucleotide sequence of SEQ ID NO: 41. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 60% identical to SEQ ID NO: 41. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 70% identical to SEQ ID NO: 41. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 80% identical to SEQ ID NO: 41. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 41. In some specific embodiments, the guide nucleic acid described herein comprises
a nucleotide sequence that is at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical to SEQ ID NO: 41. In some specific embodiments, the guide nucleic acid described herein has a nucleotide sequence of SEQ ID NO: 42. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 60% identical to SEQ ID NO: 42. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 70% identical to SEQ ID NO: 42. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 80% identical to SEQ ID NO: 42. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 42. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical to SEQ ID NO: 42.
[00407] In some embodiments, the guide nucleic acid described herein targets around position 501, and hybridizes to a mutant segment of sequence. In some specific embodiments, the guide nucleic acid described herein has a nucleotide sequence of SEQ ID NO: 6. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 60% identical to SEQ ID NO: 6. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 70% identical to SEQ ID NO: 6. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 80% identical to SEQ ID NO: 6. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 6. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical to SEQ ID NO: 6. In some specific embodiments, the guide nucleic acid described herein has a nucleotide sequence of SEQ ID NO: 27. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 60% identical to SEQ ID NO: 27. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 70% identical to SEQ ID NO: 27. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 80% identical to SEQ ID NO: 27. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 90% identical to SEQ ID NO: 27. In some specific embodiments, the guide nucleic acid described herein comprises a nucleotide sequence that is at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identical to SEQ ID NO: 27.
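The percent identity thresholds recited above can be illustrated with a simple position-by-position comparison. This sketch assumes two sequences of equal length compared without gaps, which is a simplification; percent identity is commonly computed after aligning the sequences. The function name is hypothetical, and the sequences are made up and do not correspond to any SEQ ID NO herein.

```python
def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity between two equal-length sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


# Made-up 20-nucleotide sequences differing at two positions.
reference = "ACGUACGUACGUACGUACGU"
candidate = "ACGUACGUACGAACGUACGA"
identity = percent_identity(reference, candidate)
print(identity, identity >= 90.0)
```

For a 20-nucleotide sequence, each mismatch lowers the identity by 5 percentage points, so a threshold of at least 90% tolerates up to two mismatches under this simplified measure.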
[00408] In some embodiments, the guide nucleic acids described herein comprise UAAUUUCUACUAAGUGUAGAU (SEQ ID NO: 126) on the 5’ end to be recognized by CasDx1, as described in the Example Section.
[00409] In some embodiments, the guide nucleic acids described herein comprise UAAUUUCUACUAAGUGUAGAU (SEQ ID NO: 126) on the 5’ end to be recognized by LbCas12a, as described in the Example Section.
[00410] In some embodiments, the guide nucleic acids described herein comprise UAAUUUCUACUCUUGUAGAU (SEQ ID NO: 127) on the 5’ end to be recognized by AsCas12a, as described in the Example Section.
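Assembling a guide nucleic acid from the 5’ sequences above and a target-specific segment can be sketched as a simple concatenation. The two 5’ sequences are those recited as SEQ ID NO: 126 and SEQ ID NO: 127; the function name, the dictionary, and the example spacer are hypothetical and are not the guide sequences of Table 4A or Table 4B.

```python
# 5' sequences recited in the text, keyed by the effector that recognizes them.
FIVE_PRIME_SEQUENCES = {
    "CasDx1": "UAAUUUCUACUAAGUGUAGAU",    # SEQ ID NO: 126
    "LbCas12a": "UAAUUUCUACUAAGUGUAGAU",  # SEQ ID NO: 126
    "AsCas12a": "UAAUUUCUACUCUUGUAGAU",   # SEQ ID NO: 127
}


def build_guide(effector: str, spacer: str) -> str:
    """Prepend the effector-specific 5' sequence to a target-specific spacer.

    The spacer may be given as DNA or RNA letters; T is converted to U.
    """
    if effector not in FIVE_PRIME_SEQUENCES:
        raise KeyError(f"no 5' sequence recorded for {effector}")
    rna_spacer = spacer.upper().replace("T", "U")
    bad = set(rna_spacer) - set("ACGU")
    if bad:
        raise ValueError(f"unexpected symbols in spacer: {sorted(bad)}")
    return FIVE_PRIME_SEQUENCES[effector] + rna_spacer


# Made-up 20-nucleotide spacer for illustration only.
guide = build_guide("LbCas12a", "ACGTACGTACGTACGTACGT")
print(guide, len(guide))
```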
[00411] iii. Reporter Nucleic Acids
[00412] Provided herein are systems comprising an effector protein, a guide nucleic acid and a reporter nucleic acid and uses thereof, e.g., in systems, and methods for assaying for a SARS-CoV-2 variant.
[00413] In some instances, a reporter comprises a single-stranded nucleic acid and a detection moiety, wherein the single-stranded nucleic acid is capable of being cleaved by the activated effector protein, thereby generating a detectable signal. In some cases, the reporter comprises deoxyribonucleotides. In other cases, the reporter comprises ribonucleotides. In some cases, the reporter comprises at least one deoxyribonucleotide and at least one ribonucleotide. In some cases, the reporter comprises a single-stranded nucleic acid comprising at least one ribonucleotide residue at an internal position that functions as a cleavage site.
[00414] In some instances, the reporter comprises a protein capable of generating a signal. A signal, in some embodiments, is a calorimetric, potentiometric, amperometric, optical (e.g., fluorescent, colorimetric, etc.), or a piezo-electric signal. In some cases, the reporter comprises a detection moiety. Suitable detectable labels and/or moieties contemplated for providing a signal include, but are not limited to, an enzyme, a radioisotope, a member of a specific binding pair, a fluorophore, a fluorescent protein, a quantum dot, and the like.
[00415] Suitable fluorescent proteins include, but are not limited to, green fluorescent protein (GFP) or variants thereof, blue fluorescent variant of GFP (BFP), cyan fluorescent
variant of GFP (CFP), yellow fluorescent variant of GFP (YFP), enhanced GFP (EGFP), enhanced CFP (ECFP), enhanced YFP (EYFP), GFPS65T, Emerald, Topaz (TYFP), Venus, Citrine, mCitrine, GFPuv, destabilized EGFP (dEGFP), destabilized ECFP (dECFP), destabilized EYFP (dEYFP), mCFPm, Cerulean, T-Sapphire, CyPet, YPet, mKO, HcRed, t- HcRed, DsRed, DsRed2, DsRed-monomer, J-Red, dimer2, t-dimer2(12), mRFPl, pocilloporin, Renilla GFP, Monster GFP, paGFP, Kaede protein and kindling protein, Phycobiliproteins and Phycobiliprotein conjugates including B-Phycoerythrin, R- Phycoerythrin and Allophycocyanin. Suitable enzymes include, but are not limited to, horse radish peroxidase (HRP), alkaline phosphatase (AP), beta-galactosidase (GAL), glucose-6- phosphate dehydrogenase, beta-N-acetylglucosaminidase, P-glucuronidase, invertase, Xanthine Oxidase, firefly luciferase, and glucose oxidase (GO).
[00416] In some cases, the reporter comprises a detection moiety. In some instances, the reporter comprises a cleavage site, wherein the detection moiety is located at a first site on the reporter, wherein the first site is separated from the remainder of reporter upon cleavage at the cleavage site. In some cases, the detection moiety is 3’ to the cleavage site. In some cases, the detection moiety is 5’ to the cleavage site. Sometimes the detection moiety is at the 3’ terminus of the nucleic acid of a reporter. In some cases, the detection moiety is at the 5’ terminus of the nucleic acid of a reporter.
[00417] In some cases, the reporter comprises a detection moiety and a quenching moiety. In some instances, the reporter comprises a cleavage site, wherein the detection moiety is located at a first site on the reporter and the quenching moiety is located at a second site on the reporter, wherein the first site and the second site are separated by the cleavage site. Sometimes the quenching moiety is a fluorescence quenching moiety. In some cases, the quenching moiety is 5’ to the cleavage site and the detection moiety is 3’ to the cleavage site. In some cases, the detection moiety is 5’ to the cleavage site and the quenching moiety is 3’ to the cleavage site. Sometimes the quenching moiety is at the 5’ terminus of the nucleic acid of a reporter. Sometimes the detection moiety is at the 3’ terminus of the nucleic acid of a reporter. In some cases, the detection moiety is at the 5’ terminus of the nucleic acid of a reporter. In some cases, the quenching moiety is at the 3’ terminus of the nucleic acid of a reporter.
[00418] In some embodiments, the reporter is a reporter represented in Table 5.
[00419] Table 5: Exemplary Single Stranded Detector Nucleic Acid
/5Alex594N/: 5' Alexa Fluor 594 (NHS Ester) (Integrated DNA Technologies)
/3IAbRQSp/: 3' Iowa Black RQ (Integrated DNA Technologies)
[00420] In one embodiment, the detector nucleic acid has a sequence of SEQ ID NO: 7. In some cases, the detector nucleic acid comprises a sequence that is at least 87.5% identical to SEQ ID NO: 7. In one embodiment, the detector nucleic acid comprises a sequence that is at least 75% identical to SEQ ID NO: 7. In one embodiment, the detector nucleic acid comprises a sequence that is at least 62.5% identical to SEQ ID NO: 7. In one embodiment, the detector nucleic acid comprises a sequence that is at least 50% identical to SEQ ID NO: 7.
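The identity thresholds above correspond to whole numbers of matching positions when the reference sequence is short. The following is a minimal sketch, assuming an ungapped comparison of equal-length sequences. The helper name `percent_identity` is hypothetical, and the 8-nucleotide reporter sequence TTATTATT from Example 1 is used only as a stand-in reference, since this passage does not itself recite the sequence of SEQ ID NO: 7. For an 8-nucleotide reference, 87.5%, 75%, 62.5%, and 50% identity correspond to 7, 6, 5, and 4 matching positions.

```python
def percent_identity(candidate: str, reference: str) -> float:
    """Ungapped percent identity between two equal-length sequences.

    Assumption: no alignment or gap handling; positions are compared one to one.
    """
    if len(candidate) != len(reference):
        raise ValueError("this sketch only compares equal-length sequences")
    matches = sum(a == b for a, b in zip(candidate.upper(), reference.upper()))
    return 100.0 * matches / len(reference)


# Stand-in 8-nt reference (assumption: taken from the Example 1 reporter).
REFERENCE = "TTATTATT"

# One mismatch out of eight positions.
print(percent_identity("TTATTATA", REFERENCE))
```

A full sequence-identity calculation would normally use an alignment step; the one-to-one comparison here is only meant to show how the percentages arise.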
[00421] Suitable fluorophores, in some embodiments, provide a detectable fluorescence signal in the same range as 6-Fluorescein (Integrated DNA Technologies), IRDye 700 (Integrated DNA Technologies), TYE 665 (Integrated DNA Technologies), Alexa Fluor 594 (Integrated DNA Technologies), or ATTO™ 633 (NHS Ester) (Integrated DNA Technologies). Non-limiting examples of fluorophores are fluorescein amidite, 6-Fluorescein, IRDye 700, TYE 665, Alexa Fluor 594, or ATTO™ 633 (NHS Ester). The fluorophore may be an infrared fluorophore. The fluorophore may emit fluorescence in the range of 500 nm to 720 nm. In some cases, the fluorophore emits fluorescence at a wavelength of 700 nm or higher. In other cases, the fluorophore emits fluorescence at about 665 nm. In some cases, the fluorophore emits fluorescence in the range of 500 nm to 520 nm, 500 nm to 540 nm, 500 nm to 590 nm, 590 nm to 600 nm, 600 nm to 610 nm, 610 nm to 620 nm, 620 nm to 630 nm, 630 nm to 640 nm, 640 nm to 650 nm, 650 nm to 660 nm, 660 nm to 670 nm, 670 nm to 680 nm, 680 nm to 690 nm, 690 nm to 700 nm, 700 nm to 710 nm, 710 nm to 720 nm, or 720 nm to 730 nm. In some cases, the fluorophore emits fluorescence in the range 450 nm to 750 nm, 500 nm to 650 nm, or 550 nm to 650 nm.
[00422] A quenching moiety, in some embodiments, is chosen based on its ability to quench the detection moiety. A quenching moiety, in some embodiments, is a non-fluorescent fluorescence quencher. A quenching moiety, in some embodiments, quenches a detection moiety that emits fluorescence in the range of 500 nm to 720 nm. In some cases, the quenching moiety quenches a detection moiety that emits fluorescence at a wavelength of 700 nm or higher. In other cases, the quenching moiety quenches a detection moiety that emits fluorescence at about 660 nm or about 670 nm. In some cases, the quenching moiety quenches a detection moiety that emits fluorescence in the range of 500 to 520, 500 to 540, 500 to 590, 590 to 600, 600 to 610, 610 to 620, 620 to 630, 630 to 640, 640 to 650, 650 to 660, 660 to 670, 670 to 680, 680 to 690, 690 to 700, 700 to 710, 710 to 720, or 720 to 730 nm. In some cases, the quenching moiety quenches a detection moiety that emits fluorescence in the range 450 nm to 750 nm, 500 nm to 650 nm, or 550 nm to 650 nm. A quenching moiety, in some embodiments, quenches fluorescein amidite, 6-Fluorescein, IRDye 700, TYE 665, Alexa Fluor 594, or ATTO™ 633 (NHS Ester). A quenching moiety, in some embodiments, is Iowa Black RQ, Iowa Black FQ or IRDye QC-1 Quencher. A quenching moiety, in some embodiments, quenches fluorescein amidite, 6-Fluorescein (Integrated DNA Technologies), IRDye 700 (Integrated DNA Technologies), TYE 665 (Integrated DNA Technologies), Alexa Fluor 594 (Integrated DNA Technologies), or ATTO™ 633 (NHS Ester) (Integrated DNA Technologies). A quenching moiety may be Iowa Black RQ (Integrated DNA Technologies), Iowa Black FQ (Integrated DNA Technologies) or IRDye QC-1 Quencher (LiCor). Any of the quenching moieties described herein, in some embodiments, is from any commercially available source, or may be an alternative with a similar function, a generic, or a non-trade name of the quenching moieties listed.
[00423] In some cases, the reporter is present in at least 1.5 fold, at least 2 fold, at least 3 fold, at least 4 fold, at least 5 fold, at least 6 fold, at least 7 fold, at least 8 fold, at least 9 fold, at least 10 fold, at least 11 fold, at least 12 fold, at least 13 fold, at least 14 fold, at least 15 fold, at least 16 fold, at least 17 fold, at least 18 fold, at least 19 fold, at least 20 fold, at least 30 fold, at least 40 fold, at least 50 fold, at least 60 fold, at least 70 fold, at least 80 fold, at least 90 fold, at least 100 fold, from 1.5 fold to 100 fold, from 2 fold to 10 fold, from 10 fold to 20 fold, from 20 fold to 30 fold, from 30 fold to 40 fold, from 40 fold to 50 fold, from 50 fold to 60 fold, from 60 fold to 70 fold, from 70 fold to 80 fold, from 80 fold to 90 fold, from 90 fold to 100 fold, from 1.5 fold to 10 fold, from 1.5 fold to 20 fold, from 10 fold to 40 fold, from 20 fold to 60 fold, or from 10 fold to 80 fold excess of total nucleic acids in a sample.
[00424] f. Determination of variant calls
[00425] Methods described herein, in some embodiments, comprise determining a variant call of a sample. In some cases, determining the variant call comprises determining whether the sample comprises a wild-type SARS-CoV-2 or a SARS-CoV-2 variant. In specific cases, determining the variant call of the SARS-CoV-2 variant comprises detecting one or more S-gene mutation(s) relative to a wild-type SARS-CoV-2. In specific cases, the SARS-CoV-2 variant is any one of Alpha, Beta, Gamma, Delta, Epsilon, Kappa, Omicron, or Zeta SARS-CoV-2 variants.
[00426] In some instances, determining the variant call of Alpha SARS-CoV-2 variant comprises detecting an S-gene mutation in amino acid position 501 from N to Y (N501Y). In specific cases, determining the variant call of Alpha SARS-CoV-2 variant comprises contacting the amplification product, the target nucleic acid, or a combination thereof, to a composition comprising an effector protein and a non-naturally occurring guide nucleic acid that hybridizes to a segment of the target nucleic acid or amplification product thereof that contains amino acid position 501 of S protein. In certain cases, determining the variant call of Alpha SARS-CoV-2 variant comprises using the non-naturally occurring guide nucleic acids each comprising a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 3 or 6. In certain cases, determining the variant call of Alpha SARS-CoV-2 variant comprises using the non-naturally occurring guide nucleic acids each comprising a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 24 or 27.
[00427] In some instances, determining the variant call of Beta SARS-CoV-2 variant comprises detecting S-gene mutations in amino acid positions 484 from E to K and 501 from N to Y (E484K and N501Y). In specific cases, determining the variant call of Beta SARS-CoV-2 variant comprises contacting the amplification product, the target nucleic acid, or a combination thereof, to a composition comprising an effector protein and a non-naturally occurring guide nucleic acid that hybridizes to a segment of the target nucleic acid or amplification product thereof that contains amino acid positions 484 and 501 of S protein. In certain cases, determining the variant call of Beta SARS-CoV-2 variant comprises using the non-naturally occurring guide nucleic acids each comprising a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 2, 3, 5, or 6. In certain cases, determining the variant call of Beta SARS-CoV-2 variant comprises using the non-naturally occurring guide nucleic acids each comprising a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 23, 24, 26, or 27.
[00428] In some instances, determining the variant call of Gamma SARS-CoV-2 variant comprises detecting S-gene mutations in amino acid positions 484 from E to K and 501 from N to Y (E484K and N501Y). In specific cases, determining the variant call of Gamma SARS-CoV-2 variant comprises contacting the amplification product, the target nucleic acid, or a combination thereof, to a composition comprising an effector protein and a non-naturally occurring guide nucleic acid that hybridizes to a segment of the target nucleic acid or amplification product thereof that contains amino acid positions 484 and 501 of S protein. In certain cases, determining the variant call of Gamma SARS-CoV-2 variant comprises using the non-naturally occurring guide nucleic acids each comprising a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 2, 3, 5, or 6. In certain cases, determining the variant call of Gamma SARS-CoV-2 variant comprises using the non-naturally occurring guide nucleic acids each comprising a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 23, 24, 26, or 27.
[00429] In some instances, determining the variant call of Delta SARS-CoV-2 variant comprises detecting an S-gene mutation in amino acid position 452 from L to R (L452R). In specific cases, determining the variant call of Delta SARS-CoV-2 variant comprises contacting the amplification product, the target nucleic acid, or a combination thereof, to a composition comprising an effector protein and a non-naturally occurring guide nucleic acid that hybridizes to a segment of the target nucleic acid or amplification product thereof that contains amino acid position 452 of S protein. In certain cases, determining the variant call of Delta SARS-CoV-2 variant comprises using the non-naturally occurring guide nucleic acids each comprising a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 1 or 4. In certain cases, determining the variant call of Delta SARS-CoV-2 variant comprises using the non-naturally occurring guide nucleic acids each comprising a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 22 or 25.
[00430] In some instances, determining the variant call of Epsilon SARS-CoV-2 variant comprises detecting an S-gene mutation in amino acid position 452 from L to R (L452R). In specific cases, determining the variant call of Epsilon SARS-CoV-2 variant comprises contacting the amplification product, the target nucleic acid, or a combination thereof, to a composition comprising an effector protein and a non-naturally occurring guide nucleic acid that hybridizes to a segment of the target nucleic acid or amplification product thereof that contains amino acid position 452 of S protein. In certain cases, determining the variant call of
Epsilon SARS-CoV-2 variant comprises using the non-naturally occurring guide nucleic acids each comprising a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 1 or 4. In certain cases, determining the variant call of Epsilon SARS-CoV-2 variant comprises using the non-naturally occurring guide nucleic acids each comprising a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 22 or 25.
[00431] In some instances, determining the variant call of Kappa SARS-CoV-2 variant comprises detecting an S-gene mutation in amino acid position 452 from L to R (L452R). In specific cases, determining the variant call of Kappa SARS-CoV-2 variant comprises contacting the amplification product, the target nucleic acid, or a combination thereof, to a composition comprising an effector protein and a non-naturally occurring guide nucleic acid that hybridizes to a segment of the target nucleic acid or amplification product thereof that contains amino acid position 452 of S protein. In certain cases, determining the variant call of Kappa SARS-CoV-2 variant comprises using the non-naturally occurring guide nucleic acids each comprising a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 1 or 4. In certain cases, determining the variant call of Kappa SARS-CoV-2 variant comprises using the non-naturally occurring guide nucleic acids each comprising a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 22 or 25.
[00432] In some instances, determining the variant call of Omicron SARS-CoV-2 variant comprises detecting an S-gene mutation in amino acid position 484 from E to A (E484A). In specific cases, determining the variant call of Omicron SARS-CoV-2 variant comprises contacting the amplification product, the target nucleic acid, or a combination thereof, to a composition comprising an effector protein and a non-naturally occurring guide nucleic acid that hybridizes to a segment of the target nucleic acid or amplification product thereof that contains amino acid position 484 of S protein. In certain cases, determining the variant call of Omicron SARS-CoV-2 variant comprises using the non-naturally occurring guide nucleic acids each comprising a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 2 or 42.
[00433] In some instances, determining the variant call of Zeta SARS-CoV-2 variant comprises detecting an S-gene mutation in amino acid position 484 from E to K (E484K). In specific cases, determining the variant call of Zeta SARS-CoV-2 variant comprises contacting the amplification product, the target nucleic acid, or a combination thereof, to a
composition comprising an effector protein and a non-naturally occurring guide nucleic acid that hybridizes to a segment of the target nucleic acid or amplification product thereof that contains amino acid position 484 of S protein. In certain cases, determining the variant call of Zeta SARS-CoV-2 variant comprises using the non-naturally occurring guide nucleic acids each comprising a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 2 or 5. In certain cases, determining the variant call of Zeta SARS-CoV-2 variant comprises using the non-naturally occurring guide nucleic acids each comprising a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 23 or 26.
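The variant-to-mutation associations recited in paragraphs [00426] to [00433] can be collected into a lookup table. The following is a minimal sketch; the names `VARIANT_MUTATIONS` and `candidate_variants` are hypothetical, and the exact-match rule is an assumption made for illustration rather than a rule stated in this disclosure. It also makes visible that several variants share the same marker set at positions 452, 484, and 501, so these positions alone narrow a sample to a group of candidate variants.

```python
# S-gene marker mutations per variant, as recited in the paragraphs above.
VARIANT_MUTATIONS = {
    "Alpha": {"N501Y"},
    "Beta": {"E484K", "N501Y"},
    "Gamma": {"E484K", "N501Y"},
    "Delta": {"L452R"},
    "Epsilon": {"L452R"},
    "Kappa": {"L452R"},
    "Omicron": {"E484A"},
    "Zeta": {"E484K"},
}


def candidate_variants(detected_mutations):
    """Return variants whose marker set equals the detected mutations.

    Assumption: an exact set match is required; other matching rules are possible.
    """
    detected = set(detected_mutations)
    return sorted(
        name for name, markers in VARIANT_MUTATIONS.items() if markers == detected
    )


print(candidate_variants({"E484K", "N501Y"}))
print(candidate_variants({"L452R"}))
```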
[00434] In some instances, the data analysis pipeline to determine a variant call of a sample comprises 1) obtaining the maximum fluorescence value and the minimum fluorescence value from a well containing the sample, an effector protein, a reporter nucleic acid, and a non-naturally occurring guide nucleic acid that targets the target nucleic acid from wild-type SARS-CoV-2; 2) dividing the maximum fluorescence value by the corresponding minimum fluorescence value from 1); 3) obtaining the maximum fluorescence value and the minimum fluorescence value from a well with the sample, an effector protein, a reporter nucleic acid, and a non-naturally occurring guide nucleic acid that targets the target nucleic acid from a SARS-CoV-2 variant; and 4) dividing the maximum fluorescence value by the corresponding minimum fluorescence value from 3). In some instances, the non-naturally occurring guide nucleic acid of step 3) is designed for detecting mutations L452R, E484K, E484Q, E484A, and N501Y. In specific instances, the non-naturally occurring guide nucleic acid comprises a nucleotide sequence that is at least 80%, 90%, 95%, 98%, or 100% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 22, 23, 24, 25, 26, 27, 40, 41, or 42. In specific instances, the ratio of the maximum fluorescence value to the minimum fluorescence value is defined as fluorescence yield, and it can be calculated by the formula: Fy = max(F)/min(F). Therefore, from step 2) above, Fy (WT) is obtained. From step 4) above, Fy (Mutant) (or Fy(M)) is obtained.
[00435] Next, in some instances, the data analysis pipeline to determine a variant calling of a sample comprises transforming the normalized signals described herein (e.g., fluorescence yield). In specific instances, the transformation is applying a logarithm to the normalized signal (e.g., fluorescence yield). In one instance, the transformation is applying a logarithm with a base of 2 to the normalized signal (e.g., fluorescence yield). Therefore, log2 Fy (WT) and log2 Fy (M) are obtained. In some specific instances, an allele
discrimination plot can be generated with the transformed signal (see FIGS. 4A-4C or FIG. 11D as non-limiting examples). In other specific instances, a binary logarithm can be applied to the scaled signals, and a mean average (MA) plot can be generated in which the ratio of the WT and Mutant transformed values is plotted against the average of the WT and Mutant transformed values (see FIGS. 8A-8H or FIGS. 12A-12M as non-limiting examples).
[00436] Next, in some instances, the data analysis pipeline to determine a variant calling of a sample comprises comparing the transformed signals. In some specific instances where the transformed signals from WT are higher than the transformed signals from Mutant, the sample is determined as WT. In one instance, the data analysis pipeline can be expressed as log2(Fy(WT)) > log2(Fy(M)) → Wild Type. In some specific instances where the transformed signals from WT are lower than the transformed signals from Mutant, the sample is determined as Mutant, which is indicative of a variant. In one instance, the data analysis pipeline can be expressed as log2(Fy(WT)) < log2(Fy(M)) → Mutant.
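The scaling, transformation, and comparison steps of paragraphs [00434] to [00436] can be sketched as follows. This is a minimal illustration, not the implementation used in the Examples. The function names, the representation of a well as a list of fluorescence readings, and the example values are assumptions. The case of equal transformed signals is not addressed in the text, so the sketch reports it as "Undetermined".

```python
import math


def fluorescence_yield(trace):
    """Fy = max(F) / min(F) for the fluorescence readings of one well."""
    lowest = min(trace)
    if lowest <= 0:
        raise ValueError("minimum fluorescence must be positive to form a ratio")
    return max(trace) / lowest


def call_allele(wt_trace, mutant_trace):
    """Compare log2(Fy(WT)) with log2(Fy(M)) and return a call."""
    log_wt = math.log2(fluorescence_yield(wt_trace))
    log_mut = math.log2(fluorescence_yield(mutant_trace))
    if log_wt > log_mut:
        return "Wild Type"
    if log_wt < log_mut:
        return "Mutant"
    return "Undetermined"  # assumption: equality is not covered by the text


# Illustrative readings only: the WT-guide well rises 8-fold, the mutant-guide well 2-fold.
print(call_allele([100, 150, 800], [100, 110, 200]))
```

Because the binary logarithm is monotonic, comparing the transformed values gives the same call as comparing the fluorescence yields directly; the transformation mainly serves the plots described above.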
[00437] In cases where more than one mutant at a particular position is being analyzed, the signals of each mutant can be compared to the wild-type signal. If there exists a mutant, then among (n) comparisons for n mutants and one wild-type, one of the comparisons may yield a mutant call. In one instance, the data analysis pipeline can be expressed as log2(Fy(WT)) > log2(Fy(M1)) → Wild Type; log2(Fy(WT)) > log2(Fy(M2)) → Wild Type; log2(Fy(WT)) > log2(Fy(M3)) → Wild Type; log2(Fy(WT)) < log2(Fy(M4)) → Mutant (4).
[00438] In cases where there is a tie in the above logic between mutant and wild-type, then a tie breaker comparison can be used to yield a final result. In one instance, the tie breaker analysis pipeline can be expressed as log2(Fy(WT)) > log2(Fy(M1)) → Wild Type; log2(Fy(WT)) < log2(Fy(M2)) → Mutant (2); log2(Fy(WT)) > log2(Fy(M3)) → Wild Type; log2(Fy(WT)) < log2(Fy(M4)) → Mutant (4); log2(Fy(M2)) < log2(Fy(M4)) → Mutant (4).
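The multi-mutant comparison and tie breaker of paragraphs [00437] and [00438] can be sketched as follows. The function name `call_position`, the dictionary of mutant labels, and the numeric values are assumptions chosen for illustration. The sketch takes the tie breaker to mean that, among mutants whose transformed signal exceeds the wild-type signal, the mutant with the highest transformed signal is reported, which matches the Mutant (4) outcome in the example above.

```python
import math


def call_position(fy_wt, fy_mutants):
    """Call one amino acid position from a WT yield and several mutant yields.

    fy_mutants maps a mutant label to its fluorescence yield.
    Assumption: a mutant must strictly exceed WT to be considered.
    """
    log_wt = math.log2(fy_wt)
    above_wt = {
        label: math.log2(fy)
        for label, fy in fy_mutants.items()
        if math.log2(fy) > log_wt
    }
    if not above_wt:
        return "Wild Type"
    # Tie breaker: compare the remaining mutants with each other.
    best = max(above_wt, key=above_wt.get)
    return f"Mutant ({best})"


# Illustrative yields: M2 and M4 both exceed WT, and M4 exceeds M2.
print(call_position(4.0, {"M1": 2.0, "M2": 6.0, "M3": 1.5, "M4": 9.0}))
```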
[00439] Separately, in some instances, a no-target control (NTC) can be processed concurrently with the sample, and the maximum and the minimum fluorescence values can be scaled and transformed as described herein for a sample. In specific instances, the logarithmic ratio between the scaled signals and the logarithmic mean of the scaled signals can be defined as Contrast and Size, respectively. In some specific instances where the Contrast and the Size of the sample are within the lower and upper bounds of the NTC that is concurrently processed, the sample is determined as No Call. In one instance, the data analysis pipeline can be expressed as Cmin(NTC-snp) <= C(sample-snp) <= Cmax(NTC-snp) → NoCall. In one instance, the data
analysis pipeline can be expressed as Smin(NTC-snp) <= S(sample-snp) <= Smax(NTC-snp) → NoCall (see “NoCalls” and “NTC” in FIG. 11D as a non-limiting example).
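The NTC-based No Call rule can be sketched as follows. Several choices here are assumptions made for illustration: Contrast is computed as log2(Fy(WT)) minus log2(Fy(M)), Size as the mean of the two log2 values, and a sample is flagged only when both values fall inside the NTC bounds. The function names and the numeric yields are hypothetical.

```python
import math


def contrast_and_size(fy_wt, fy_mut):
    """Return (Contrast, Size) for one WT/Mutant pair of fluorescence yields."""
    log_wt, log_mut = math.log2(fy_wt), math.log2(fy_mut)
    contrast = log_wt - log_mut      # assumption: log of the WT/Mutant ratio
    size = (log_wt + log_mut) / 2    # assumption: mean of the log2 values
    return contrast, size


def is_no_call(sample_pair, ntc_pairs):
    """True when the sample's Contrast and Size both lie within the NTC bounds."""
    c, s = contrast_and_size(*sample_pair)
    ntc = [contrast_and_size(wt, mut) for wt, mut in ntc_pairs]
    contrasts = [value[0] for value in ntc]
    sizes = [value[1] for value in ntc]
    return min(contrasts) <= c <= max(contrasts) and min(sizes) <= s <= max(sizes)


# Illustrative NTC wells with near-flat signal (yields close to 1).
ntc_wells = [(1.0, 1.0), (1.25, 1.0), (1.0, 1.25)]
print(is_no_call((1.1, 1.1), ntc_wells))  # sample resembles the NTC wells
print(is_no_call((8.0, 2.0), ntc_wells))  # sample with a clear WT signal
```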
[00440] In some instances, the quality of the determination of a variant calling can be visualized by the separation of WT and Mutant signals in the allele discrimination plot described herein. In other instances, the quality of the determination of a variant calling can be visualized by the separation of WT and Mutant signals in the MA plot described herein (see FIGS. 8A-8H or FIGS. 12A-12M as non-limiting examples).
[00441] III. Multiplexing.
[00442] The compositions, systems, and methods described herein, in some implementations, are multiplexed in a number of ways. These methods of multiplexing are, for example, consistent with methods and reagents disclosed herein for detection of a target nucleic acid within the sample.
[00443] Multiplexing, in some embodiments, is spatial multiplexing, wherein multiple different target nucleic acids are detected at the same time, but the reactions are spatially separated. Often, the multiple target nucleic acids are detected using the same programmable nuclease, but different guide nucleic acids. The multiple target nucleic acids sometimes are detected using different programmable nucleases. Sometimes, multiplexing can be single reaction multiplexing, wherein multiple different target nucleic acids are detected in a single reaction volume. Often, a single population of programmable nucleases is used in single reaction multiplexing. Sometimes, at least two different programmable nucleases are used in single reaction multiplexing. For example, multiplexing can be enabled by immobilization of multiple categories of detector nucleic acids within a fluidic system, to enable detection of multiple target nucleic acids within a single sample.
[00444] IV. Kits.
[00445] Disclosed herein are kits, reagents, methods, and systems for use in detecting a SARS-CoV-2 variant.
[00446] In some instances, such kits include a package, carrier, or container that is compartmentalized to receive one or more containers such as vials, tubes, and the like, each of the container(s) comprising one of the separate elements to be used in a method described herein. Suitable containers include, for example, test wells, bottles, vials, and test tubes. In one embodiment, the containers are formed from a variety of materials such as glass, plastic, or polymers.
[00447] The kit or systems described herein contain packaging materials. Examples of packaging materials include, but are not limited to, pouches, blister packs, bottles, tubes, bags, containers, and any packaging material suitable for the intended mode of use.
[00448] A kit typically includes labels listing contents and/or instructions for use, and package inserts with instructions for use. A set of instructions will also typically be included. In one embodiment, a label is on or associated with the container. In some instances, a label is on a container when letters, numbers or other characters forming the label are attached, molded or etched into the container itself; a label is associated with a container when it is present within a receptacle or carrier that also holds the container, e.g., as a package insert. In one embodiment, a label is used to indicate that the contents are to be used for a specific therapeutic application. The label also indicates directions for use of the contents, such as in the methods described herein.
[00449] After packaging the formed product and wrapping or boxing to maintain a sterile barrier, the product may be terminally sterilized by heat sterilization, gas sterilization, gamma irradiation, or by electron beam sterilization. Alternatively, the product may be prepared and packaged by aseptic processing.
[00450] EXAMPLES
[00451] The following examples are illustrative and non-limiting to the scope of the methods, reagents, systems, and kits described herein.
[00452] Example 1 - Identification of Genetic Phenotypes in Synthetic SARS-CoV-2 S-Gene Fragments
[00453] Systems and methods for identifying genetic phenotypes in accordance with an embodiment of the present disclosure were developed and tested using samples of synthetic gene fragments encoding regions of the SARS-CoV-2 S-gene with either wild-type (WT) or mutant (MUT) sequences at amino acid positions 452, 484, and 501. Wild-type and mutant synthetic gene fragments were first PCR amplified using NEB 2x Phusion Master Mix following the manufacturer’s protocol. The amplified product was cleaned using AMPure XP beads following the manufacturer’s protocol at a 0.7x concentration, eluted in nuclease-free water, and normalized to 10 nM.
[00454] A signal dataset was generated using a wild-type sample containing the wild-type synthetic gene fragments and a mutant sample containing the mutant synthetic gene fragments, in which each sample was extracted for their respective nucleic acids and
partitioned into replicates (e.g., three wild-type amplification replicates and three mutant amplification replicates). Amplification replicates were amplified by reverse transcriptase loop-mediated isothermal amplification (RT-LAMP), and each amplification replicate was further partitioned into replicate wells in a microtiter plate, for each of the three amino acid positions 452, 484, and 501 (see, for example, the schematic in FIGS. 4A-4B). Replicate wells were processed using comparative programmable nuclease-based (DETECTR) reactions, in which amplified synthetic gene fragments were interrogated using a CasDx1 programmable nuclease, one or more reporters, and a guide sequence corresponding to either the wild-type sequence or the mutant sequence of the SARS-CoV-2 S-gene.
[00455] Briefly, two LAMP primer sets, each containing 6 primers, were designed to target the L452R, E484K and N501Y mutations in the SARS-CoV-2 Spike (S) protein. Multiplexed RT-LAMP was performed using a final reaction volume of 50 µl, which consisted of 8 µl nucleic acid template, 5 µl of L452R primer set, 5 µl of E484K/N501Y primer set, 17 µl of nuclease-free water, 1 µl of SYTO-9 dye, and 14 µl of LAMP mastermix. Each of the primer sets consisted of 1.6 µM each of inner primers FIP and BIP, 0.2 µM each of outer primers F3 and B3, and 0.8 µM each of loop primers LF and LB. The LAMP mastermix contained 6 mM of MgSO4, isothermal amplification buffer at 1x final concentration, 1.5 mM of dNTP mix, 8 units of Bst 2.0 WarmStart DNA Polymerase, and 0.5 µl of WarmStart RTx Reverse Transcriptase. Plates were incubated at 65°C for 40 minutes in a real-time QuantStudio™ 5 PCR instrument. Fluorescent signals were collected every 30 seconds. 40 nM Cas protein was incubated with 40 nM gRNA in 1x buffer for 30 min at 37°C. 100 nM ssDNA reporter (/5Alex594N/TTATTATT/3IAbRQSp/, IDT) was added to the RNA-protein complex. 18 µL of this DETECTR® master mix was combined with 2 µL target amplicon. The DETECTR® assays were monitored for 30 min at 37°C in a plate reader (Tecan).
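As a quick arithmetic check on the volumes recited above, the following minimal sketch sums the RT-LAMP components and computes how far the amplicon is diluted in the DETECTR reaction. The variable and function names are hypothetical; the volumes are those stated in paragraph [00455], read as microliters.

```python
# RT-LAMP reaction components in microliters, as recited in paragraph [00455].
RT_LAMP_UL = {
    "nucleic acid template": 8,
    "L452R primer set": 5,
    "E484K/N501Y primer set": 5,
    "nuclease-free water": 17,
    "SYTO-9 dye": 1,
    "LAMP mastermix": 14,
}


def dilution_factor(total_volume, aliquot_volume):
    """Fold dilution of an aliquot once it is brought to the total volume."""
    return total_volume / aliquot_volume


rt_lamp_total = sum(RT_LAMP_UL.values())   # expected to equal the stated 50 µl
detectr_total = 18 + 2                     # master mix plus target amplicon

print(rt_lamp_total)
print(dilution_factor(detectr_total, 2))   # fold dilution of the amplicon
```

The component volumes add up to the stated 50 µl final volume, and the 2 µL amplicon aliquot is diluted 10-fold in the 20 µL DETECTR reaction.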
[00456] Each sample was thus interrogated using a wild-type/mutant comparison in replicates of three, for each of the three amino acid positions, and for each of the three amplification replicates. Reporting signals for each well were obtained based on fluorescence intensities generated by cleavage of the reporter by the CasDx1 programmable nuclease upon recognition of the target sequence by the guide sequence.
[00457] To evaluate the signal dataset, the fluorescence intensities of each well were normalized by scaling the maximum fluorescence values to the minimum fluorescence values for the respective well to generate fluorescence yields. Scaled signals for wells interrogated
with the wild-type guide sequence were then compared with scaled signals for wells interrogated with the mutant guide sequence, for each variant amino acid position per sample in the plate. Specifically, after scaling, a logarithmic transformation was applied to the generated fluorescence yields for a respective well, thus obtaining a logarithmic ratio calculated in the form of log2(Fy(FA1)) / log2(Fy(SA1)),
[00458] where Fy(FA1) is the corresponding fluorescence yield for a first well interrogated with the wild-type guide sequence in the comparative assay, and Fy(SA1) is the corresponding fluorescence yield for a second well of the same sample type interrogated with the mutant guide sequence in the comparative assay. A well was deemed to be wild-type (e.g., contain an aliquot of the wild-type synthetic gene fragments) if the resulting ratio indicated a greater value for log2(Fy(FA1)) than for log2(Fy(SA1)), whereas a well was deemed to be mutant (e.g., contain an aliquot of the mutant synthetic gene fragments) if the resulting ratio indicated a lower value for log2(Fy(FA1)) than for log2(Fy(SA1)).
[00459] Additionally, comparative assay replicates were designated as no-call if the logarithmic ratio between the WT and MUT fluorescence yields fell within the maximum fluorescence yield and minimum fluorescence yield generated for a plurality of control wells containing no target samples (NTCs). Comparative assay replicates were further designated as no-call if the logarithmic mean of the WT and MUT fluorescence yields fell within the range of the maximum and minimum logarithmic means for NTCs, where logarithmic means were calculated in the form of
[log2(Fy(FA1) + Fy(SA1))]/2.
[00460] The assignment of no-call with this method was based on the assumption that an NTC will have neither WT nor MUT signals, thereby making no-calls indistinguishable from NTC samples.
[00461] FIGS. 5A, 5B, and 5C illustrate the identification of differentiated phenotypes WT and MUT (“M”) using the systems and methods disclosed herein. FIG. 5A illustrates allele discrimination plots that show separation of the WT and MUT (“M”) signals, which are concordant with the genotype of the originating samples (WT, M, and NTC). FIG. 5B illustrates further separation of the populations after application of a binary logarithmic value to each scaled signal in each well. These transformed values provide clear and improved differentiation for making mutation calls when the ratio of the WT and MUT transformed
values are plotted against the average of the WT and MUT transformed values on a Mean Average (MA) plot. This analysis resulted in 100% concordance for SNP identity at positions 452, 484, and 501, which was consistent when observed as individual clusters of wells as illustrated in FIG. 5C. These results, obtained using synthetic gene fragment samples, provide robust validation of the presently disclosed systems and methods for use in biological and/or clinical contexts.
[00462] Example 2 - Rapid SARS-CoV-2 Variant Detection with CRISPR
[00463] Three different DNA-targeting CRISPR-Cas effectors with trans-cutting activity - CasDx1 (Mammoth Biosciences), LbCas12a (NEB) and AsCas12a (IDT) - were evaluated for activity on SARS-CoV-2 variants.
[00464] SNP differentiation capabilities on synthetic guide nucleic acids
[00465] Guide RNAs (gRNAs) with CasDx1 and LbCas12a were initially screened for activity on synthetic gene fragments encoding regions of the SARS-CoV-2 S-gene with either wild-type (WT, SEQ ID NO: 20) or mutant (MUT, SEQ ID NO: 21) sequences at amino acid positions 452, 484, and 501. CasDx1, LbCas12a, and AsCas12a were further evaluated with their cognate gRNAs on synthetic gene fragments with respect to SNP differentiation capabilities.
[00466] WT synthetic S-gene fragment (SEQ ID NO: 20):
[00467] tctgctttactaatgtctatgcagattcatttgtaattagaggtgatgaagtcagacaaatcgctccagggcaaactggaa agattgctgattataattataaattaccagatgattttacaggctgcgttatagcttggaattctaacaatcttgattctaaggttggtggtaatt ataattacctgtatagattgtttaggaagtctaatctcaaaccttttgagagagatatttcaactgaaatctatcaggccggtagcacacctt gtaatggtgttgaaggttttaattgttactttcctttacaatcatatggtttccaacccactaatggtgttggttaccaaccatacagagtagta gtactttcttttgaacttctacatgca
[00468] Mutant synthetic S-gene fragment including mutations on amino acid positions 452, 484, and 501 (SEQ ID NO: 21):
[00469] Tctgctttactaatgtctatgcagattcatttgtaattagaggtgatgaagtcagacaaatcgctccagggcaaactgga aatattgctgattataattataaattaccagatgattttacaggctgcgttatagcttggaattctaacaatcttgattctaaggttggtggtaat tataattaccggtatagattgtttaggaagtctaatctcaaaccttttgagagagatatttcaactgaaatctatcaggccggtagcacacct tgtaatggtgttaaaggttttaattgttactttcctttacaatcatatggtttccaacccacttatggtgttggttaccaaccatacagagtagta gtactttcttttgaacttctacatgca
[00470] A schematic of CRISPR-Cas gRNA design for SARS-CoV-2 S gene mutations is illustrated in FIG. 6A, including a region of the SARS-CoV-2 S-gene and wild-type or mutant sequences at amino acid positions 452, 484, and 501.
[00471] S-gene fragment for SARS-CoV-2, forward strand (SEQ ID NO: 118):
[00472] SEQ ID: 118: ggugguaauuauaauuaccuguauagauuguuuaggaagucuaaucucaaaccuuuugagagagauauuucaacugaaa ucuaucaggccgguagcacaccuuguaaugguguugaagguuuuaauuguuacuuuccuuuacaaucauaugguuucc aacccacuaaugguguugguuaccaacca
[00473] S-gene fragment for SARS-CoV-2, reverse strand (SEQ ID NO: 119):
[00474] SEQ ID: 119: ccaccauuaauauuaauggacauaucuaacaaauccuucagauuagaguuuggaaaacucucucuauaaaguugacuuua gauaguccggccaucguguggaacauuaccacaacuuccaaaauuaacaaugaaaggaaauguuaguauaccaaagguug ggugauuaccacaaccaaugguuggu
[00475] Mutant S-gene fragment subsequence including mutation at amino acid position 452 (SEQ ID NO: 120):
[00476] SEQ ID: 120: ccGguauagauuguuuagga
[00477] Wild-type S-gene fragment subsequence spanning amino acid position 452 (SEQ ID NO: 121):
[00478] SEQ ID: 121: aauggacauaucuaacaaau
[00479] Mutant S-gene fragment subsequence including mutation at amino acid position 484 (SEQ ID NO: 122):
[00480] SEQ ID: 122: acauuaccacaaUuuccaaa
[00481] Wild-type S-gene fragment subsequence spanning amino acid position 484 (SEQ ID NO: 123):
[00482] SEQ ID: 123: acauuaccacaacuuccaaa
[00483] Mutant S-gene fragment subsequence including mutation at amino acid position 501 (SEQ ID NO: 124):
[00484] SEQ ID: 124: aacccacuUaugguguuggu
[00485] Wild-type S-gene fragment subsequence spanning amino acid position 501 (SEQ ID NO: 125):
[00486] SEQ ID: 125: caacccacuaaugguguugg
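The single-nucleotide difference that a guide pair is meant to discriminate can be checked directly from the listed subsequences. The following is a minimal sketch, assuming the two sequences are already aligned and of equal length, as is the case for SEQ ID NOs: 122 and 123; the function name `mismatch_positions` is hypothetical and not part of the disclosure.

```python
def mismatch_positions(a: str, b: str) -> list[int]:
    """Return 0-based positions where two equal-length sequences differ.

    The comparison is case-insensitive because the listing capitalizes
    the mutated base for emphasis.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return [i for i, (x, y) in enumerate(zip(a.lower(), b.lower())) if x != y]


mut_484 = "acauuaccacaaUuuccaaa"  # SEQ ID NO: 122 (mutant, position 484)
wt_484 = "acauuaccacaacuuccaaa"   # SEQ ID NO: 123 (wild-type, position 484)

print(mismatch_positions(mut_484, wt_484))  # [12]
```

Subsequences that are offset relative to each other (for example SEQ ID NOs: 124 and 125) would need to be aligned before this comparison is meaningful.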
[00487] SNP differentiation capabilities on heat inactivated viral cultures
[00488] Additionally, SNP differentiation capabilities on heat-inactivated viral cultures were tested using the full SARS-CoV-2 CRISPR-CasDx1 based DETECTR® workflow, which involved a multiplexed RT-LAMP amplification followed by the CRISPR-based DETECTR® readout that enabled SNP differentiation. FIG. 7 illustrates the SARS-CoV-2 CRISPR-CasDx1 based DETECTR® workflow. As shown in FIG. 7, the sample (e.g., an RNA extraction) was used as an input to DETECTR, which was visualized by a fluorescent reader. RNA encoding SARS-CoV-2 S-gene was amplified using an isothermal amplification method such as RT-LAMP. Amplified samples were detected using a Cas12 programmable nuclease complexed with gRNAs directed to SARS-CoV-2 S-gene sequence. The Cas12 programmable nuclease cleaved an ssDNA reporter nucleic acid upon complex formation with the target nucleic acid. The results of the DETECTR assay were then compared with Whole Genome Sequencing (WGS) results.
[00489] Development and testing of data analysis pipelines
[00490] To create a development environment for data analysis pipelines analyzing the COVID-19 CRISPR-CasDx1 based DETECTR® assay, a blinded dataset was generated with the SARS-CoV-2 CRISPR-CasDx1 based DETECTR® assay on 93 SARS-CoV-2 positive clinical samples (previously characterized by sequencing) and corresponding SNP DETECTR® controls.
[00491] Materials and Methods
[00492] Synthetic gene fragments
[00493] Wild-type and mutant synthetic gene fragments (Twist) were PCR amplified using NEB 2x Phusion Master Mix following the manufacturer’s protocol. The amplified product was cleaned using AMPure XP beads following the manufacturer’s protocol at a 0.7X concentration. The product was eluted in nuclease-free water and normalized to 10 nM. The LAMP primers used herein are shown in Table 3A and were synthesized by Eurofins Genomics. The guide RNAs used herein are shown in Table 4A and were synthesized by Dharmacon or Synthego. The reporter used herein is shown in Table 5 and was synthesized by IDT. The synthetic gene fragments (SEQ ID NOs: 20 and 21) were synthesized by Twist Biosciences.
[00494] Clinical sample acquisition and extraction
[00495] De-identified residual SARS-CoV-2 RT-PCR positive nasopharyngeal and/or oropharyngeal (NP/OP) swab samples in universal transport media (UTM) or viral transport media (VTM) were obtained from the UCSF Clinical Microbiology Laboratory. All samples were stored in a biorepository according to protocols approved by the UCSF Institutional Review Board (protocol number 10-01116, 11-05519) until processed.
[00496] All NP/OP swab samples obtained from the UCSF Clinical Microbiology Laboratory were pretreated with DNA/RNA Shield (Zymo Research, # R1100-250) at a 1:1 ratio. The Mag-Bind Viral DNA/RNA 96 kit (Omega Bio-Tek, # M6246-03) on the KingFisher Flex (Thermo Fisher Scientific, # 5400630) was used for viral RNA extraction using an input volume of 200 µl of diluted NP/OP swab sample and an elution volume of 100 µl. The TaqPath™ COVID-19 RT-PCR kit (Thermo Fisher Scientific) was used to determine the N gene cycle threshold values.
[00497] Heat-inactivated culture acquisition and extraction
[00498] Heat-inactivated cultures of SARS-CoV-2 variants being monitored (VBM), variants of concern (VOC), or variants of interest (VOI) were provided by the California Department of Public Health (CDPH).
[00499] RNA from heat-inactivated SARS-CoV-2 VBM/VOC/VOI isolates was extracted using the EZ1 Virus Mini Kit v2.0 (Qiagen, # 955134) on the EZ1 Advanced XL (Qiagen, # 9001875) according to the manufacturer’s instructions. For each culture, six replicate LAMP reactions were pooled into a single sample. DETECTR® was performed on a 1:10 dilution of the 10,000 cp/rxn LAMP amplification products.
[00500] COVID-19 Variant DETECTR® assay
[00501] The COVID-19 Variant DETECTR® assay was performed with RT-LAMP followed by a CRISPR-Cas-based assay.
[00502] Two LAMP primer sets, each containing 6 primers, were designed to target the L452R, E484K and N501Y mutations in the SARS-CoV-2 Spike (S) protein (see Table 3A). Sets of LAMP primers were designed from a 350 bp target sequence spanning the 3 mutations using Primer Explorer V5 (available on the Internet at primerexplorer.jp/e/).
Candidate primers were manually evaluated for inclusion using the OligoCalc online oligonucleotide properties calculator (see W. A. Kibbe, Nucleic Acids Res 35, W43-46 (2007)) while ensuring that there was no overlap with either primers from the other set or guide RNA target regions that included the L452R, E484K, and N501Y mutations.
[00503] Multiplexed RT-LAMP was performed using a final reaction volume of 50 µl, which consisted of 8 µl RNA template, 5 µl of L452R primer set (Eurofins Genomics), 5 µl of E484K/N501Y primer set, 17 µl of nuclease-free water, 1 µl of SYTO-9 dye (ThermoFisher Scientific), and 14 µl of LAMP mastermix. Each of the primer sets consisted of 1.6 µM each of inner primers FIP and BIP, 0.2 µM each of outer primers F3 and B3, and 0.8 µM each of loop primers LF and LB. The LAMP mastermix contained 6 mM of MgSO4, isothermal amplification buffer at 1X final concentration, 1.5 mM of dNTP mix (NEB), 8 units of Bst 2.0 WarmStart DNA Polymerase (NEB), and 0.5 µl of WarmStart RTx Reverse Transcriptase (NEB). Plates were incubated at 65°C for 40 minutes in a real-time QuantStudio™ 5 PCR instrument. Fluorescent signals were collected every 60 seconds.
[00504] 40 nM CasDx1 (Mammoth Biosciences), LbCas12a (EnGen® Lba Cas12a, NEB) or AsCas12a (Alt-R® A.s. Cas12a, IDT) protein targeting the WT or MUT SNP at L452R, E484K, or N501Y was incubated with 40 nM gRNA in 1X buffer (MBuffer3 for CasDx1, NEBuffer r2.1 for LbCas12a and AsCas12a) for 30 min at 37°C. gRNAs with an extra sequence of UAAUUUCUACUAAGUGUAGAU (SEQ ID NO: 126) on the 5’ end were used with both CasDx1 and LbCas12a, whereas gRNAs with an extra sequence of UAAUUUCUACUCUUGUAGAU (SEQ ID NO: 127) on the 5’ end were used with AsCas12a (see Table 4A). 100 nM ssDNA reporter (SEQ ID NO: 7) was added to the RNA-protein complex. 18 µL of this DETECTR® master mix was combined with 2 µL target amplicon. The DETECTR® assays were monitored for 30 min at 37°C in a plate reader (Tecan).
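As a quick consistency check on the RT-LAMP recipe above, the listed component volumes can be summed and, if desired, scaled for a batch of reactions. This is a minimal sketch; the helper `scale_mix`, the dictionary keys, and the 10% overage are illustrative assumptions and are not taken from the disclosure.

```python
# Per-reaction RT-LAMP component volumes in microliters, as listed in the text.
LAMP_COMPONENTS_UL = {
    "RNA template": 8.0,
    "L452R primer set": 5.0,
    "E484K/N501Y primer set": 5.0,
    "nuclease-free water": 17.0,
    "SYTO-9 dye": 1.0,
    "LAMP mastermix": 14.0,
}

total_ul = sum(LAMP_COMPONENTS_UL.values())
print(total_ul)  # 50.0, matching the stated final reaction volume


def scale_mix(components: dict[str, float], n_reactions: int,
              overage: float = 0.10) -> dict[str, float]:
    """Scale shared components for n reactions with a pipetting overage.

    The RNA template is excluded because it is added per sample.
    The overage fraction is a hypothetical convenience parameter.
    """
    factor = n_reactions * (1.0 + overage)
    return {name: round(vol * factor, 2)
            for name, vol in components.items()
            if name != "RNA template"}


print(scale_mix(LAMP_COMPONENTS_UL, n_reactions=8))
```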
[00505] Digital PCR
[00506] For digital PCR, samples were evaluated at 3 dilutions (1:100; 1:1,000; and 1:10,000) using the ApexBio Covid-19 Multiplex Digital PCR Detection Kit (Stilla Technologies) according to the manufacturer’s protocol. The controls (positive and negative, the Kit Controls, and an internal control) were run with the samples in duplicate. The dilutions were used to determine the most accurate concentration, which was determined from the N gene concentration.
[00507] Sequencing methods
[00508] Experiments with sequencing steps were carried out. Complementary DNA (cDNA) synthesis from RNA via reverse transcription and tiling multiplexed amplicon PCR were performed using SARS-CoV-2 primers version 3 according to the ARTIC protocol (see Plitnick, J. et al., J Clin Microbiol 59, e0064921 (2021) and Quick, J. et al., Nat Protoc 12, 1261-1276 (2017)). Libraries were constructed by ligating adapters to the amplicon products using NEBNext Ultra II DNA Library Prep Kit for Illumina (New England Biolabs, # E7645L), barcoding using NEBNext Multiplex Oligos for Illumina (New England Biolabs, # E6440L), and purification with AMPure XP (Beckman-Coulter, # 63880). Final pooled libraries were sequenced on either Illumina NextSeq 550 or NovaSeq 6000 as 1x300 single-end reads (300 cycles).
[00509] SARS-CoV-2 viral genome assembly and variant analyses were performed using an in-house bioinformatics pipeline. Briefly, sequencing reads generated by Illumina sequencers (NextSeq 550 or NovaSeq 6000) were demultiplexed and converted to FASTQ files using bcl2fastq (v2.20.0.422). Raw FASTQ files were first screened for SARS-CoV-2 sequences using BLASTn (BLAST+ package 2.9.0) alignment against the Wuhan-Hu-1 SARS-CoV-2 viral reference genome (NC_045512). Reads containing adapters, the ARTIC primer sequences, and low-quality reads were filtered using BBDuk (version 38.87) and then mapped to the NC_045512 reference genome using BBMap (version 38.87). Variants were called with CallVariants and a depth cutoff of 5 was used to generate the final assembly. Pangolin software (version 3.0.6, see A. O'Toole et al., Virus Evol 7, veab064 (2021); and A. Rambaut et al., Nat Microbiol 5, 1403-1407 (2020)) was used to identify the lineage. Using a custom in-house script, consensus FASTA files generated by the genome assembly pipeline were scanned to confirm L452R, E484K, and N501Y mutations.
[00510] Eleven samples were re-extracted as described above for the NP/OP swab samples and evaluated by viral WGS as described above. The samples were then thawed and amplified using the LAMP protocol described above and evaluated using the COVID-19 Variant DETECTR® assay as described above.
[00511] DETECTR® data analysis pipeline
[00512] Quality control metric for the LAMP reaction
[00513] Prior to processing DETECTR® data from the clinical samples, data indicating the success or failure of the samples to amplify in the LAMP reaction were collected. The absolute truth was based on visual inspection of LAMP curves. This absolute truth was used to develop thresholds for the LAMP reactions. The positive and negative controls from the LAMP reactions were used to derive the thresholds to qualify the samples. Two sets of thresholds were used: a time threshold and a fluorescence rate threshold. The positive LAMP controls were assumed to represent an ideal sample and displayed a classic sigmoidal rise of fluorescence over time, and the NTC represented the background fluorescence. It was hypothesized that a sample would ideally have positive-control-like fluorescence kinetics. However, due to the presence of high background in some samples, a mean value between controls for each plate was chosen as the threshold. After this, the fluorescence values at a time threshold of 18 minutes were collected. The time point is of importance here to rule out those samples that would amplify closer to the endpoint, signifying the LAMP intermediates to be the majority contributors of the rise in the signal and not the actual sample itself. A score was assigned for each sample which was calculated as a ratio of rate of fluorescence rate threshold to the rate of fluorescence value at 18 minutes for each sample. The hypothesis was that if this ratio of rate of fluorescence between controls and samples is less than 1, then samples had failed to reach the minimum fluorescence required to be called out as amplified and if the ratio is greater than or equal to 1, then samples had amplified sufficiently. To identify the exact score value for a qualitative QC metric, an ROC analysis was done on scores and the absolute truth (see FIG. 11A).
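The LAMP quality-control score described above can be sketched as follows. The passage is ambiguous about which quantity is the numerator of the ratio, so this sketch assumes the direction under which a score of at least 1 means the sample reached the plate threshold (sample value divided by threshold). It also assumes the "mean value between controls" is the midpoint of the mean positive-control and mean NTC values at 18 minutes. All function names and the default cutoff of 1.0 are hypothetical; the source selects the actual cutoff by ROC analysis.

```python
from statistics import mean


def plate_threshold(pos_ctrl_at_18: list[float], ntc_at_18: list[float]) -> float:
    """Midpoint between mean positive-control and mean NTC signal at 18 min.

    Assumption: this is one reading of "a mean value between controls".
    """
    return (mean(pos_ctrl_at_18) + mean(ntc_at_18)) / 2.0


def lamp_score(sample_at_18: float, threshold: float) -> float:
    """Score >= 1 is read as sufficient amplification (assumed direction)."""
    return sample_at_18 / threshold


def passed_lamp_qc(score: float, cutoff: float = 1.0) -> bool:
    # The cutoff would be tuned against visually curated curves (ROC analysis).
    return score >= cutoff


thr = plate_threshold([1000.0, 1200.0], [100.0, 100.0])
print(thr)                                   # 600.0
print(passed_lamp_qc(lamp_score(900.0, thr)))  # True
print(passed_lamp_qc(lamp_score(300.0, thr)))  # False
```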
[00514] Data analysis for CRISPR-based SNP calling
[00515] Each well had a guide specific to the mutant or the wild-type SNP. The comparison was important to assign a genotypic call to the sample. The DETECTR® reactions across the plate were not comparable to each other. For this purpose, the endpoint fluorescence intensities were normalized in each well to its own minimum intensity, which is defined as fluorescence yield. The fluorescence yield can be compared across wells in a plate under the assumption that each well will have a similar minimum fluorescence starting point. Irrespective of the highest levels of the fluorescence intensities observed across samples, without being limited to any one theory of operation, the yield for a given target will generally remain the same assuming that similar concentrations of samples/target are being compared. This aids in normalizing the signal and comparing replicates across the wells in the same plate.
Fy = max(F)/min(F)
[00516] Without being limited to any one theory of operation, the wild-type and mutant target guides on NTC generally do not show any change in intensity over time. Thus, the fluorescence yield for NTC remains constant across replicates and plates, and moreover is close to 1.
Fy(NTC) = 1
[00517] On the contrary, if a sample has a fluorescence yield ~ 1, then it qualifies for a No Call.
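The fluorescence yield defined above, Fy = max(F)/min(F), can be computed per well from its fluorescence time course. This is a minimal sketch; the function name and the example traces are illustrative assumptions, not measured data.

```python
def fluorescence_yield(trace: list[float]) -> float:
    """Return max(F) / min(F) for one well's fluorescence time course."""
    lowest = min(trace)
    if lowest <= 0:
        raise ValueError("fluorescence values must be positive")
    return max(trace) / lowest


# Illustrative traces only: a flat NTC well and a well with a rising signal.
ntc_trace = [100.0, 101.0, 100.0, 102.0]
sample_trace = [100.0, 400.0, 800.0]

print(fluorescence_yield(ntc_trace))     # 1.02, close to 1 as expected for an NTC
print(fluorescence_yield(sample_trace))  # 8.0
```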
[00518] Variant calling was performed using the following general rules: 1. No template controls were assigned NTC; 2. If the contrast of the sample for a SNP was between minimum and maximum contrast for the plate, then the sample was assigned a NoCall; and 3. If the size of the sample was lower than the size of the NTC on the plate, then the sample was assigned a NoCall.
Cmin(NTC-snp) <= C(sample-snp) <= Cmax(NTC-snp) - NoCall
Smin(NTC-snp) <= S(sample-snp) <= Smax(NTC-snp) - NoCall
log2(Fy(WT)) > log2(Fy(M)) - Wild Type
log2(Fy(WT)) < log2(Fy(M)) - Mutant
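The calling rules above can be sketched in code. The passage does not define contrast (C) and size (S) explicitly at this point; this sketch assumes they correspond to the M and A values of an MA plot, i.e. the difference and the mean of the log2-transformed WT and MUT fluorescence yields. The function `call_snp` and the example values are hypothetical.

```python
import math


def _contrast(fy_wt: float, fy_mut: float) -> float:
    # Assumed definition: log2 ratio of WT to MUT fluorescence yield.
    return math.log2(fy_wt) - math.log2(fy_mut)


def _size(fy_wt: float, fy_mut: float) -> float:
    # Assumed definition: mean of the log2-transformed yields.
    return (math.log2(fy_wt) + math.log2(fy_mut)) / 2.0


def call_snp(fy_wt: float, fy_mut: float,
             ntc_pairs: list[tuple[float, float]]) -> str:
    """Return 'WT', 'MUT', or 'NoCall' for one sample at one SNP.

    ntc_pairs holds (Fy(WT), Fy(MUT)) for the NTC wells on the same plate.
    """
    c_ntc = [_contrast(w, m) for w, m in ntc_pairs]
    s_ntc = [_size(w, m) for w, m in ntc_pairs]
    c, s = _contrast(fy_wt, fy_mut), _size(fy_wt, fy_mut)
    if min(c_ntc) <= c <= max(c_ntc):
        return "NoCall"
    if min(s_ntc) <= s <= max(s_ntc):
        return "NoCall"
    if c > 0:
        return "WT"
    if c < 0:
        return "MUT"
    return "NoCall"  # exact tie is not covered by the stated rules


ntc = [(1.0, 1.02), (1.05, 1.0), (1.01, 1.01)]
print(call_snp(8.0, 1.2, ntc))    # WT
print(call_snp(1.1, 6.0, ntc))    # MUT
print(call_snp(1.02, 1.01, ntc))  # NoCall
```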
[00519] SNP calls
[00520] The following procedure was used to evaluate the concordance between sequencing and DETECTR® technologies for genotypic classification of the clinical cohort dataset.
[00521] First, all samples and SNPs for which both sequencing and DETECTR® data were present in the distributed files were considered by matching the SNP IDs and sample names. This included cleaning and curating the dataset, which had failed LAMP reactions, and identifying WT and MUT based on the spacer fluorescence. This yielded a preliminary data set containing 279 calls across 3 SNPs against 93 samples. After eliminating samples that had failed to amplify in the LAMP reaction but were assigned a genotype, the resulting final analysis data consisted of 272 calls (Wild-type, Mutant and NoCall) spread across 3 SNPs and 91 samples. For each of the 3 SNPs in the analysis data set, both sequencing and DETECTR® genotypes (including NoCalls and LAMP Fails) were identified and recorded for each of the 93 patients. The 91 patients included the individuals for whom actual sequencing data was available.
[00522] Statistical Analysis
[00523] SNP Calls
[00524] Statistical analysis for the experiments described herein was performed as follows. For each SNP in the analysis of genotypic calls, a variety of statistics evaluating the concordance between genotype calls were computed on the two different technologies. The concordant and discordant genotypes were visualized through contingency tables. For each SNP, there are three possible genotypes (Wild-type, Mutant and No Call). The concordance rates were calculated without the samples that failed the LAMP reaction (see Table 7 and FIGS. 14A-14D). The 2x2 cross tables classify all three SNPs across all the samples between sequencing and DETECTR® technologies (see Table 7 and FIGS. 14A-14D). The data transformation and statistical analysis were done in R (see R Core Team. R: A Language and Environment for Statistical Computing, (Vienna, Austria, 2018), available on the Internet at R-project.org, date accessed: 11/26/21).
[00525] Results
[00526] Identifying the optimal CRISPR-Cas12 enzyme for SNP detection
[00527] To determine the optimal CRISPR-Cas12 enzyme for SNP detection, three different CRISPR-Cas effectors with trans-cutting activity were evaluated: CasDx1, LbCas12a, and AsCas12a. Guide RNAs (gRNAs) with CasDx1 and LbCas12a were initially screened for activity on synthetic gene fragments encoding regions of the SARS-CoV-2 S-gene with either wild-type (WT, SEQ ID NO: 20) or mutant (MUT, SEQ ID NO: 21) sequences at amino acid positions 452, 484, and 501 (see FIG. 6A and FIG. 6F). From this initial activity screen, the top-performing gRNAs were identified for each S-gene variant encoding either L452R, E484K or N501Y (see FIG. 6B). Further evaluation of CasDx1, LbCas12a and AsCas12a with their cognate gRNAs on synthetic gene fragments is illustrated in FIGS. 6B-E. For example, in FIGS. 6C-E, each figure corresponds to one of the three effector proteins. The first column of each figure shows the fluorescent read out from the CRISPR assay performed on a wild-type fragment, with a wild-type guide versus a mutant guide. The second column of each figure shows the fluorescent read out from the CRISPR assay performed on a mutant fragment, with a wild-type guide versus a mutant guide. The third column of each figure shows the fluorescent read out from the CRISPR assay performed on a control, with a wild-type guide versus a mutant guide. As shown in FIG. 6B, CasDx1 showed clear SNP differentiation between wild-type (WT) and mutant (MUT) sequences on all three S-gene variants (see FIG. 6C). LbCas12a was capable of differentiating SNPs at positions 452 and 484, and AsCas12a was able to differentiate the SNP at position 452 (see FIGS. 6D-6E).
[00528] Next, SNP differentiation capabilities were tested on heat-inactivated viral cultures using the full COVID-19 Variant DETECTR® assay, consisting of RNA extraction, multiplexed RT-LAMP amplification (see FIG. 6F), and CRISPR-Cas12 detection with guide RNAs targeting part of the spike receptor-binding domain (RBD) (see FIG. 6A). The LAMP primer design incorporated two sets of six primers each, with both sets generating overlapping spike RBD amplicons that spanned the L452R, E484K, and N501Y mutations. A redundant LAMP design was adopted for two reasons: first, this approach was shown to improve detection sensitivity in initial experiments; second, the goal was to increase assay robustness given the continual emergence of escape mutations in the spike RBD throughout the course of the pandemic (see Harvey et al., Nat Rev Microbiol 19, 409-424 (2021)). The tested viral cultures included an ancestral SARS-CoV-2 lineage (WA-1) containing the wild-type spike protein (D614) targeted by the approved mRNA (BNT162b2 from Pfizer or mRNA-1273 from Moderna) (see K. S. Corbett et al., Nature 586, 567-571 (2020); F. P. Polack et al., N Engl J Med 383, 2603-2615 (2020)) and DNA adenovirus vector (Ad26.COV2.S from Johnson and Johnson) (see R. Bos et al., NPJ Vaccines 5, 91 (2020)) vaccines, variants being monitored (VBMs) that were previously classified as variants of concern (VOCs) or variants of interest (VOIs), including Alpha (B.1.1.7), Beta (B.1.351), Gamma (P.1), Epsilon (B.1.427 and B.1.429), Kappa (B.1.617.1), and Zeta (P.2) lineages, and the current VOC Delta (B.1.617.2) lineage (see Table 6 for combination of mutations at each position for each strain). Heat-inactivated viral culture samples representing the seven SARS-CoV-2 lineages were quantified by digital droplet PCR across a 4-log dynamic range and used to evaluate the analytical sensitivity of the pre-amplification step. RT-LAMP amplification was evaluated using six replicates from each viral culture.
Consistent amplification was observed for all seven SARS-CoV-2 lineages with 10,000 copies of target input per reaction (200,000 copies/mL) (see FIG. 6G), which is comparable to the target input of more than 200,000 copies/mL viruses (less than 30 Ct value) required for sequencing workflows used in SARS-CoV-2 variant surveillance (see e.g., F. de Mello Malta et al., Sci Rep 11, 7122 (2021); Igloi et al., Emerg Infect Dis 27, 1323-1329 (2021)).
[00529] Table 6: Interpretation table summarizing the SARS-CoV-2 mutations in this study associated with the corresponding lineage classification
[00530] To evaluate the specificity of the different Cas12 enzymes, amplified material from each viral culture was pooled and the SNPs resulting in the L452(R), E484(K) and N501(Y) mutations were detected using CasDx1, LbCas12a and AsCas12a. FIG. 6B shows a representative heatmap of the expected pattern of wild-type (WT) and mutational (MUT) calls for each of the SNPs resulting in L452R, E484K and N501Y. Specifically, for each combination of effector protein and either wild-type or mutant guide, FIG. 6H shows the heatmap results for each reaction carried out in this assay, including for each combination of effector proteins, variant, and corresponding mutant or wild-type guide. The results of the fluorescent assay are also shown in the fluorescent read-out curves of FIGS. 6I-6K plotting raw fluorescence over time for the wild-type SARS-CoV-2 (Column 1), each variant (Columns 2-7) and controls (Columns 8-12). Similar to the results found using gene fragments, CasDx1 correctly identified the wild-type (WT) and mutational (MUT) targets at positions 452, 484 and 501 in each LAMP-amplified, heat-inactivated viral culture (FIG. 6H and FIGS. 6I-6K). In comparison, LbCas12a was capable of differentiating WT from MUT at position 501 on LAMP-amplified viral cultures but showed much higher background for the WT target at position 452 and higher background for both WT and MUT targets at position 484 (FIG. 6H and FIGS. 6I-6K). Additionally, AsCas12a was able to differentiate WT from MUT targets at position 452, albeit with substantial background, but was unable to differentiate WT from MUT targets at positions 484 and 501 (FIG. 6H and FIGS. 6I-6K). From these data, it was concluded that CasDx1 provided more consistent and accurate calls for the L452R, E484K and N501Y mutations.
[00531] Data analysis pipeline for calling COVID-19 Variant SNPs
[00532] Development of the SARS-CoV-2 CRISPR-CasDx1 based DETECTR® assay proceeded using only the high-fidelity CasDx1 enzyme. To develop a data analysis pipeline for calling SARS-CoV-2 SNP mutations and assigning lineage classifications with the COVID-19 Variant DETECTR® assay (see Table 6 and FIG. 10F), data collected from SNP synthetic gene fragment controls (n=279) that included all mutational combinations of 452, 484 and 501 (see Materials and Methods) were used. Based on the control sample data, allele discrimination plots were generated (see C. Broccanello et al., Plant Methods 14, 28 (2018); F. E. McGuigan, S. H. Ralston, Psychiatr Genet 12, 133-136 (2002)) to define boundaries that separate the WT and MUT signals (see FIG. 10A). Clear differentiation between WT and MUT signals was observed when plotting the ratio against the average of the WT and MUT transformed values on a mean average (MA) plot (see C. Broccanello et al., Plant Methods 14, 28 (2018); F. E. McGuigan, S. H. Ralston, Psychiatr Genet 12, 133-136 (2002)) (see FIG. 10B), with 100% concordance for SNP identity at positions 452, 484, and 501 for the control samples.
[00533] Performance evaluation of the COVID-19 Variant DETECTR® assay using clinical samples
[00534] Next, a blinded dataset consisting of 93 COVID-19 positive clinical samples (previously analyzed by viral WGS) was assembled and the SNP controls were run in parallel. These samples were extracted, amplified in triplicate RT-LAMP reactions (see FIGS. 8A-8H), and processed further as triplicate CasDx1 reactions for each LAMP replicate (see FIGS. 9A-9M). A total of nine replicates were thus generated for each sample to detect WT or MUT SNPs at positions 452, 484, and 501. The DETECTR® data analysis pipeline was then applied to each sample to provide a final lineage categorization (see Table 6, FIGS. 10C-10D and FIG. 10F). For a biological RT-LAMP replicate to be designated as either WT or MUT, the same call needed to be made from all three technical CasDx1 replicates (see FIGS. 4A-4C). A final SNP mutation call was made based on more than or equal to 1 of the same calls from the three biological replicates, with replicates that were designated as a No Call ignored (see FIGS. 4A-4C and FIG. 11B). After excluding two samples that were considered invalid because the fluorescence intensity from RT-LAMP amplification did not reach a pre-established threshold determined using receiver-operator characteristic (ROC) curve analysis (see FIGS. 8A-8H and FIG. 11A), a total of 807 CasDx1 signals from the 91 remaining clinical samples were evaluated, generating up to 9 replicates for each clinical sample (see FIGS. 4A-4C). Differentiation of WT and MUT signals according to the allele
discrimination plots was more pronounced at positions 484 and 501 than position 452 (see FIG. 11C), whereas the MA plots, generated by transforming the data onto M (log ratio) and A (mean average) scales, showed clear separation of WT and MUT calls for all three positions (see FIG. 11D). The variant calls made on each sample were consistent with the difference in median values of the log-transformed signals as determined using the data analysis pipeline (see FIGS. 12A-12M).
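The replicate-consensus rule described in this paragraph (three technical replicates per biological RT-LAMP replicate, three biological replicates per sample) can be sketched as follows. The source does not state what happens when biological replicates give conflicting WT and MUT calls; this sketch assumes such a sample is reported as a No Call. Function names are hypothetical.

```python
def biological_call(technical_calls: list[str]) -> str:
    """A biological replicate is WT or MUT only if all technical replicates agree."""
    if not technical_calls:
        return "NoCall"
    first = technical_calls[0]
    if first in ("WT", "MUT") and all(c == first for c in technical_calls):
        return first
    return "NoCall"


def final_call(biological_calls: list[str]) -> str:
    """Final SNP call from the biological replicates, ignoring No Calls."""
    informative = {c for c in biological_calls if c != "NoCall"}
    if len(informative) == 1:
        return informative.pop()
    # No informative replicate, or WT and MUT both present (assumed handling).
    return "NoCall"


sample = [
    ["WT", "WT", "WT"],              # unanimous, so WT
    ["WT", "NoCall", "WT"],          # not unanimous, so NoCall
    ["NoCall", "NoCall", "NoCall"],  # NoCall
]
bio = [biological_call(t) for t in sample]
print(bio)              # ['WT', 'NoCall', 'NoCall']
print(final_call(bio))  # WT
```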
[00535] The viral WGS results were then unblinded to evaluate the accuracy of the DETECTR® assay for SNP calls and lineage classification. There were 14 discordant SNP calls out of 272 (94.9% SNP concordance) distributed among 11 clinical samples out of 91 (see FIGS. 13A-13D, FIG. 13E, and FIGS. 13H-13I). Among the 11 discordant samples, one sample (COVID-31) was designated a ‘no call’ at position 452 by viral WGS and thus lacked a comparator, two samples were designated a ‘no call’ due to flat WT and MUT curves (COVID-41 and COVID-73), four samples had similar WT and MUT curve amplitudes, suggesting a mixed population (COVID-03, COVID-56, COVID-61 and COVID-81) (see FIGS. 13A-13D), and four samples had SNP assignments discordant with those from viral WGS (COVID-12, COVID-13, COVID-20 and COVID-63) (see FIGS. 13A-13D).
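The pre-resolution concordance figure quoted above follows directly from the stated counts; the short calculation below reproduces it. Only the counts come from the text; the variable names are illustrative.

```python
total_calls = 272        # SNP calls in the final analysis set
discordant_calls = 14    # calls that disagreed with viral WGS
total_samples = 91
discordant_samples = 11

snp_concordance = (total_calls - discordant_calls) / total_calls
sample_fraction = discordant_samples / total_samples

print(f"{snp_concordance:.1%}")  # 94.9%
print(f"{sample_fraction:.1%}")  # 12.1%
```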
[00536] Given that the comparison data had been collected over an extended time period, it was suspected that sample stability issues arising from aliquoting and multiple freeze-thaw cycles may have accounted for the observed discrepancies. To further investigate this possibility, the 11 discordant clinical samples were re-extracted from original respiratory swab matrix and re-analyzed by running both viral WGS and the DETECTR® assay in parallel. Re-testing of the samples resulted in nearly complete agreement between the two methods, with the exception of two SNPs that were identified as E484Q in two samples by WGS but were incorrectly called E484 (WT) by the DETECTR® assay (see Table 7, FIG. 13F, and FIGS. 13H-13I). Thus, based on discrepancy testing, the positive predictive agreement (PPA) between the DETECTR® assay and viral WGS at all three WT and MUT SNP positions was 100% (272 of 272, p<2.2e-16 by Fisher’s Exact Test) (see Table 7). The corresponding negative predictive agreement (NPA) was 91.4% as the E484Q mutation for two SNPs was incorrectly classified as WT. Nevertheless, the final viral lineage classification for the 91 samples after discrepancy testing showed 100% agreement with viral WGS (see FIGS. 13G, 13J, and 14A-14D)
[00537] Table 7: Final Positive Predictive Agreement (PPA), Negative Predictive Agreement (NPA) and concordance values for each WT and MUT SNP from the evaluation of the DETECTR® assay against the SARS-CoV-2 WGS comparator assay after discordant samples were resolved
[00538] Discussion
[00539] In this example, a CRISPR based DETECTR® assay was developed for the detection of SARS-CoV-2 variants. Three CRISPR-Casl2 enzymes were evaluated. Based on a head-to-head comparison of these enzymes, clear differences in performance were observed with CasDxl demonstrating the highest fidelity was able to reliably detect all three of the targeted SNPs. A data analysis pipeline was developed to differentiate between WT and MUT signals with the COVID-19 Variant DETECTR® assay, yielding an overall SNP concordance of 100% (272/272 total SNP calls) and 100% agreement with lineage classification compared to viral WSG. Taken together, these findings show robust agreement between the COVID- 19 Variant DETECTR® assay and viral WGS for identification of SNP mutations and variant categorization. Thus, the COVID- 19 Variant DETECTR® assay provided a faster and simpler alternative to sequencing-based methods for COVID- 19 variant surveillance.
[00540] Although CRISPR-based diagnostic assays have been demonstrated for the detection of SARS-CoV-2 variants, these studies have limitations in coverage of circulating lineages and in the extent of clinical sample evaluation. For example, the miSHERLOCK variant assay uses LbCasl2a (NEB) to detect N501Y, E484K and Y144Del covering eight lineages (WA-1, Alpha, Beta, Gamma, Eta, Iota, Mu and Zeta) and was tested only on contrived samples (RNA spiked into human saliva) (see Puig et al., Sci Adv 7, (2021)). Additionally, the SHINEv2 assay uses LwaCasl3a to detect 69/70Del, K417N/T, L452R and 156/157Del + R158G covering eight lineages (WA-1, Alpha, Beta, Gamma, Delta, Epsilon, Kappa and Mu) and was tested with only the 69/70Del gRNAs on 20 Alpha-positive NP clinical samples (see Arizti-Sanz et al., medRxiv, (2021)). In comparison, the COVID-19 Variant DETECTR® assay disclosed herein uses CasDxl to detect N501Y, E484K and L452R covering eleven lineages (WA-1, Alpha, Beta, Gamma, Delta, Epsilon, Eta, Iota, Kappa, Mu and Zeta) and 91 clinical samples representing seven out of the eleven lineages were tested with successful detection of all seven.
[00541] In the near term, the COVID-19 Variant DETECTR® assay described herein can serve as an initial screen for the presence of a rare or novel variant (e.g., carrying both L452R and E484K or carrying all three SNPs) that could be reflexed to viral WGS. As the sequencing capacity for most clinical and public health laboratories is limited, the COVID-19 Variant DETECTR® assay would thus enable rapid identification of variants circulating in the community to support outbreak investigation and public health containment efforts. In addition, identification of specific mutations associated with neutralizing antibody evasion, such as E484K, could inform patient care with regard to the use of monoclonal antibodies that remain effective in treating the infection. As the virus continues to mutate and evolve, the COVID-19 Variant DETECTR® assay can be readily reconfigured by validating new pre-amplification LAMP primers and gRNAs that target emerging mutations with clinical and epidemiological significance. For example, the newly emerging Omicron variant, containing at least 30 mutations in the spike protein and 11 mutations in the spike RBD region targeted by the assay, could be detected by increasing degeneracy in the LAMP primers and adding at least one gRNA to distinguish this variant from the others. Over the longer term, a validated CRISPR assay that combines SARS-CoV-2 detection with variant identification would be useful as a tool for simultaneous COVID-19 diagnosis in individual patients and surveillance for infection control and public health purposes.
[00542] Example 3 - SARS-CoV-2 SNP Calling: L452R, E484K, and N501Y
[00543] The disclosure provides methods for determining L452R, E484K, and N501Y versus wild-type calls within the spike gene of SARS-CoV-2 using an interpretive algorithm for SNP calling in conjunction with a CRISPR-Cas12 based assay. The spike variants L452R, E484K, and N501Y are described in Zhang et al., 2021, “Ten emerging SARS-CoV-2 spike variants exhibit variable infectivity, animal tropism, and antibody neutralization,” Commun Biol 4, 1196, which is hereby incorporated by reference.
[00544] In accordance with the interpretive algorithm, a signal dataset was obtained that comprised, for each respective well in a plurality of wells in a common plate, a corresponding plurality of reporting fluorescent signals for a respective one or more nucleic acid molecules in a biological sample that map to position 452, 484 or 501 of the spike protein arising from RT-LAMP amplification of the CRISPR-Cas12 based DETECTR® assay.
[00545] The plurality of wells comprised a first set of nine wells representing the wild type allele for position 452 of the spike protein. Each well in the first set of nine wells included a first plurality of guide nucleic acids that have the wild type allele for position 452 of the SARS-CoV-2 spike protein.
[00546] The plurality of wells further comprised a second set of nine wells representing the L452R allele for the SARS-CoV-2 spike protein. Each well in the second set of nine wells included a second plurality of guide nucleic acids that have the L452R allele for the SARS-CoV-2 spike protein.
[00547] Each respective well in the first set of wells and each respective well in the second set of wells contained a corresponding aliquot of nucleic acid derived from a biological sample for which the determination of a L452R versus a wild-type call within the spike gene of SARS-CoV-2 was sought, as well as a Cas12 protein with trans-cutting activity.
[00548] Each corresponding plurality of reporting fluorescent signals in the signal dataset comprised, for each respective time point in a set of 30 time points, a respective fluorescent reporting signal in the form of a corresponding discrete attribute value arising from RT-LAMP amplification. Each respective time point in the corresponding plurality of reporting signals represented a different 60 second time interval within a 30 minute monitoring time period of the RT-LAMP amplification.
[00549] A determination was made, for each respective well in the plurality of wells, of a corresponding fluorescent signal yield for the respective well using the corresponding plurality of reporting signals for the respective well across the 30 time points. For each well, the signal yield was the maximum signal observed across the 30 time points divided by the minimum signal observed across the 30 time points. Because each well was normalized to its own minimum, signal yields from individual wells could be compared to each other.
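The signal yield calculation can be illustrated with a minimal Python sketch. The function name `fluorescence_yield` and the rejection of non-positive readings are assumptions made for illustration and are not part of the disclosed pipeline.

```python
def fluorescence_yield(signals):
    """Return max(F) / min(F) for one well's fluorescence time series.

    `signals` is the list of reporter readings for a single well
    (e.g., 30 readings taken at 60 second intervals).
    """
    if not signals:
        raise ValueError("a well needs at least one reading")
    lowest = min(signals)
    if lowest <= 0:
        # Assumption: readings are positive raw fluorescence units.
        raise ValueError("readings must be positive to form a ratio")
    return max(signals) / lowest


# A well whose signal rises from 100 to 800 has a yield of 8;
# a flat well (such as a no-template control) has a yield of 1.
rising_well = [100.0, 150.0, 400.0, 800.0]
flat_well = [120.0, 120.0, 120.0, 120.0]
print(fluorescence_yield(rising_well), fluorescence_yield(flat_well))
```

Because the yield is a ratio within a single well, it does not depend on the absolute fluorescence scale of that well.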
[00550] Then, a determination was made, for each respective well in the first set of nine wells, of a respective candidate call identity based on a relative intensity metric between the corresponding first signal yield for the respective well in the first set of nine wells and a corresponding second signal yield for a corresponding well in the second set of nine wells, thereby obtaining a plurality of candidate call identities. The relative intensity metric had the form: log2(Fy(WT)) / log2(Fy(M)),
[00551] where Fy(WT) was the first corresponding signal yield representing the subject wild type allele, and Fy(M) was the second corresponding signal yield representing the subject mutant allele of the SARS-CoV-2 spike protein. In the case of the first and second set of nine wells, WT was the wild type allele for position 452 of the SARS-CoV-2 spike protein whereas M was the L452R allele for the SARS-CoV-2 spike protein. There were three possible candidate call identities for each well in the first set of nine wells: wild type for position 452 of the SARS-CoV-2 spike protein, L452R for the SARS-CoV-2 spike protein, or No Call. No Call arose when the relative intensity metric for a given well was between the maximum control signal yield and the minimum control signal yield arising from RT-LAMP assays of control wells on the common plate that were free of nucleic acid derived from the biological sample.
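The per-well candidate call can be sketched in Python as follows. This is an illustrative reading of the relative intensity metric, not the disclosed implementation: the function name, the call labels, the choice that a metric above the control band means wild type, and the handling of a flat mutant well (where log2(Fy(M)) is zero and the metric is undefined) are all assumptions.

```python
import math


def candidate_call(fy_wt, fy_mut, ctrl_min, ctrl_max):
    """Assign 'WT', 'MUT', or 'No Call' to one WT/mutant well pair.

    fy_wt, fy_mut: signal yields (max/min, so always >= 1) for the
        wild-type-guide well and the matching mutant-guide well.
    ctrl_min, ctrl_max: minimum and maximum signal yields observed in
        control wells that contain no sample nucleic acid.
    """
    log_wt = math.log2(fy_wt)
    log_mut = math.log2(fy_mut)
    if log_mut == 0:
        # Assumption: with a flat mutant well the ratio is undefined,
        # so fall back on whether the wild-type well rose at all.
        return "WT" if log_wt > 0 else "No Call"
    metric = log_wt / log_mut
    if ctrl_min <= metric <= ctrl_max:
        return "No Call"  # metric falls inside the control band
    # Assumption: above the band the WT signal dominates, below it the mutant.
    return "WT" if metric > ctrl_max else "MUT"


print(candidate_call(8.0, 2.0, 0.9, 1.1))  # wild-type well dominates
print(candidate_call(2.0, 8.0, 0.9, 1.1))  # mutant well dominates
print(candidate_call(4.0, 4.0, 0.9, 1.1))  # indistinguishable signals
```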
[00552] The first set of nine wells was binned into three bins, each bin consisting of three of the wells from the first set of nine wells. For each respective bin in the set of three bins, a respective first concordance vote across the candidate call identities for wells in the respective bin was made, thereby generating three bin votes, one for each bin. The respective first concordance vote generated a respective bin vote based on a common candidate call identity that was shared by all of the wells in the subset of wells for the respective bin. A respective bin vote was no-call when at least one of the three wells in the respective bin had a candidate call identity of no-call.
[00553] A second concordance vote was made across the three bin votes for the first set of nine wells, thereby obtaining the mutation call that was one of L452R or wild type for position 452 of the SARS-CoV-2 spike protein for the biological sample. The second concordance vote generated this mutation call based on a common bin vote that was shared by at least two of the three bins.
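The two-level concordance vote can be sketched in Python as shown below. The function names are hypothetical. Two behaviors are assumptions because the text does not specify them: a bin whose wells disagree is treated as a no-call, and a sample with no call shared by at least two bins is reported as a no-call.

```python
from collections import Counter

NO_CALL = "No Call"


def bin_vote(well_calls):
    """First concordance vote: all wells in the bin must share one call."""
    if NO_CALL in well_calls:
        return NO_CALL
    # Assumption: disagreement inside a bin is treated as a no-call.
    return well_calls[0] if len(set(well_calls)) == 1 else NO_CALL


def sample_call(well_calls, bin_size=3, min_bins=2):
    """Second concordance vote across bins of `bin_size` wells each."""
    bins = [well_calls[i:i + bin_size]
            for i in range(0, len(well_calls), bin_size)]
    votes = [bin_vote(b) for b in bins]
    counts = Counter(v for v in votes if v != NO_CALL)
    if not counts:
        return NO_CALL
    call, n_bins = counts.most_common(1)[0]
    # Assumption: fewer than `min_bins` agreeing bins yields a no-call.
    return call if n_bins >= min_bins else NO_CALL


# Nine wells: two clean L452R bins and one bin spoiled by a no-call well.
wells = ["L452R"] * 6 + ["L452R", NO_CALL, "L452R"]
print(sample_call(wells))
```

In this example the third bin votes no-call, yet the remaining two bins agree, so the sample is still called L452R.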
[00554] As with L452R, additional sets of nine wells in the common plate respectively represented the wild type allele for position 484 within the spike gene of SARS-CoV-2, the E484K mutation in the spike gene of SARS-CoV-2, the wild type allele for position 501 within the spike gene of SARS-CoV-2, and the N501Y mutation in the spike gene of SARS-CoV-2. Such sets of nine wells were processed in the same manner as the first and second sets of nine wells for the respective wild type allele for position 452 and the L452R allele for the SARS-CoV-2 spike protein described above.
[00555] Example 4 - SARS-CoV-2 Omicron Variant Detection with High-Fidelity CRISPR-Cas12 Enzyme
[00556] Laboratory tests for the accurate and rapid identification of SARS-CoV-2 variants can potentially guide the treatment of COVID-19 patients and inform infection control and public health surveillance efforts. This example presents the development and validation of a rapid COVID-19 variant DETECTR® assay incorporating loop-mediated isothermal amplification (LAMP) followed by CRISPR-Cas12 based identification of single nucleotide polymorphism (SNP) mutations in the SARS-CoV-2 spike (S) gene. The assay targeted the L452R, E484K/Q/A, and N501Y mutations that are associated with nearly all circulating viral lineages and identified the two currently-circulating variants of concern, Delta and Omicron. In a comparison of three different Cas12 enzymes, the newly identified enzyme CasDx1 was able to accurately identify all targeted SNP mutations under the conditions tested. An analysis pipeline for CRISPR-based SNP identification from 139 clinical samples yielded an overall SNP concordance of 98% and agreement with SARS-CoV-2 lineage classification of 138/139 compared to viral whole-genome sequencing. It was also shown that detection of the single E484A mutation was necessary and sufficient to accurately identify Omicron from other major circulating variants in patient samples. These findings demonstrate the utility of CRISPR-based DETECTR® as a faster and simpler diagnostic than sequencing for SARS-CoV-2 variant identification in clinical and public health laboratories.
[00557] CasDx1 (Mammoth Biosciences) was evaluated for activity on SARS-CoV-2 Omicron variants using methods substantially similar to those described in Examples 2 and 3.
[00558] SNP differentiation capabilities on synthetic guide nucleic acids
[00559] Guide RNAs (gRNAs) with CasDx1 were screened for activity on synthetic gene fragments encoding regions of the SARS-CoV-2 S-gene with wild-type (WT, SEQ ID NO: 20), E484K mutant (K484, SEQ ID NO: 21), E484A mutant (A484, SEQ ID NO: 43), or E484Q mutant (Q484, SEQ ID NO: 44) sequences at amino acid positions 452, 484, and 501. CasDx1 was further evaluated with its cognate gRNAs on synthetic gene fragments with respect to SNP differentiation capabilities.
[00560] WT synthetic S-gene fragment (SEQ ID NO: 20):
[00561] tctgctttactaatgtctatgcagattcatttgtaattagaggtgatgaagtcagacaaatcgctccagggcaaactggaa agattgctgattataattataaattaccagatgattttacaggctgcgttatagcttggaattctaacaatcttgattctaaggttggtggtaatt ataattacctgtatagattgtttaggaagtctaatctcaaaccttttgagagagatatttcaactgaaatctatcaggccggtagcacacctt gtaatggtgttgaaggttttaattgttactttcctttacaatcatatggtttccaacccactaatggtgttggttaccaaccatacagagtagta gtactttcttttgaacttctacatgca
[00562] Mutant synthetic S-gene fragment including mutations on amino acid positions 452, 484 (K484), and 501 (SEQ ID NO: 21):
[00563] tctgctttactaatgtctatgcagattcatttgtaattagaggtgatgaagtcagacaaatcgctccagggcaaactggaa atattgctgattataattataaattaccagatgattttacaggctgcgttatagcttggaattctaacaatcttgattctaaggttggtggtaatt
ataattaccggtatagattgtttaggaagtctaatctcaaaccttttgagagagatatttcaactgaaatctatcaggccggtagcacacctt gtaatggtgttaaaggttttaattgttactttcctttacaatcatatggtttccaacccacttatggtgttggttaccaaccatacagagtagta gtactttctttgaacttctacatgca
[00564] Mutant synthetic S-gene fragment including mutations on amino acid position 484 (A484) (SEQ ID NO: 43):
[00565] tctgctttactaatgtctatgcagattcatttgtaattagaggtgatgaagtcagacaaatcgctccagggcaaactggaa aGattgctgattataattataaattaccagatgattttacaggctgcgttatagcttggaattctaacaatcttgattctaaggttggtggtaat tataattaccggtatagattgtttaggaagtctaatctcaaaccttttgagagagatatttcaactgaaatctatcaggccggtagcacacct tgtaatggtgttgCaggttttaattgttactttcctttacaatcatatggtttccaacccacttatggtgttggttaccaaccatacagagtagt agtactttcttttgaacttctacatgca
[00566] Mutant synthetic S-gene fragment including mutations on amino acid position 484 (Q484) (SEQ ID NO: 44):
[00567] tctgctttactaatgtctatgcagattcatttgtaattagaggtgatgaagtcagacaaatcgctccagggcaaactggaa aGattgctgattataattataaattaccagatgattttacaggctgcgttatagcttggaattctaacaatcttgattctaaggttggtggtaat tataattaccggtatagattgtttaggaagtctaatctcaaaccttttgagagagatatttcaactgaaatctatcaggccggtagcacacct tgtaatggtgttCaaggttttaattgttactttcctttacaatcatatggtttccaacccacttatggtgttggttaccaaccatacagagtagt agtactttcttttgaacttctacatgca
[00568] Materials and Methods
[00569] Synthetic gene fragments
[00570] Wild type and mutant synthetic gene fragments (Twist) were PCR amplified using NEB 2x Phusion Master Mix following the manufacturer’s protocol. The amplified product was cleaned using AMPure XP beads following the manufacturer’s protocol at a 0.7X concentration. The product was eluted in nuclease-free water and normalized to 10 nM. The LAMP primers used herein are shown in Tables 3A and 3B and were synthesized by Eurofins Genomics. The guide RNAs used herein are shown in Tables 4A and 4B and were synthesized by Dharmacon or Synthego. The reporter used herein is shown in Table 5 and was synthesized by IDT. The synthetic gene fragments (SEQ ID NOs: 20, 21, 43, and 44) were synthesized by Twist Biosciences.
[00571] Clinical sample acquisition and extraction
[00572] De-identified residual SARS-CoV-2 RT-PCR positive nasopharyngeal and/or oropharyngeal (NP/OP) swab samples in universal transport media (UTM) or viral transport media (VTM) were obtained from the UCSF Clinical Microbiology Laboratory. All samples were stored in a biorepository according to protocols approved by the UCSF Institutional Review Board (protocol numbers 10-01116, 11-05519) until processed.
[00573] All NP/OP swab samples obtained from the UCSF Clinical Microbiology Laboratory were pretreated with DNA/RNA Shield (Zymo Research, # R1100-250) at a 1:1 ratio. The Mag-Bind Viral DNA/RNA 96 kit (Omega Bio-Tek, # M6246-03) on the KingFisher Flex (Thermo Fisher Scientific, # 5400630) was used for viral RNA extraction using an input volume of 200 µl of diluted NP/OP swab sample and an elution volume of 100 µl. The TaqPath™ COVID-19 RT-PCR kit (Thermo Fisher Scientific) was used to determine the N gene cycle threshold values.
[00574] Heat-inactivated culture acquisition and extraction
[00575] Heat-inactivated cultures of SARS-CoV-2 variants being monitored (VBM), variants of concern (VOC), or variants of interest (VOI) were provided by the California Department of Public Health (CDPH).
[00576] RNA from heat-inactivated SARS-CoV-2 VBM/VOC/VOI isolates was extracted using the EZ1 Virus Mini Kit v2.0 (Qiagen, # 955134) on the EZ1 Advanced XL (Qiagen, # 9001875) according to the manufacturer’s instructions. For each culture, six replicate LAMP reactions were pooled into a single sample. DETECTR® was performed on a 1:10 dilution of the 10,000 cp/rxn LAMP amplification products.
[00577] COVID-19 Variant DETECTR® assay
[00578] The COVID-19 Variant DETECTR® assay was performed with RT-LAMP followed by a CRISPR-Cas-based assay.
[00579] Two LAMP primer sets, each containing 6 primers, were designed to target the L452R, E484K and N501Y mutations in the SARS-CoV-2 Spike (S) protein (see Table 3A). Sets of LAMP primers were designed from a 350 bp target sequence spanning the 3 mutations using Primer Explorer V5 (available on the Internet at primerexplorer.jp/e/). Candidate primers were manually evaluated for inclusion using the OligoCalc online oligonucleotide properties calculator (see W. A. Kibbe, Nucleic Acids Res 35, W43-46 (2007)) while ensuring that there was no overlap with either primers from the other set or guide RNA target regions that included the L452R, E484K, and N501Y mutations.
[00580] Multiplexed RT-LAMP was performed using a final reaction volume of 50 µl, which consisted of 8 µl RNA template, 5 µl of L452R primer set (Eurofins Genomics), 5 µl of E484K/N501Y primer set, 17 µl of nuclease-free water, 1 µl of SYTO-9 dye (ThermoFisher Scientific), and 14 µl of LAMP mastermix. Each of the primer sets consisted of 1.6 µM each of inner primers FIP and BIP, 0.2 µM each of outer primers F3 and B3, and 0.8 µM each of loop primers LF and LB. The LAMP mastermix contained 6 mM of MgSO4, isothermal amplification buffer at 1X final concentration, 1.5 mM of dNTP mix (NEB), 8 units of Bst 2.0 WarmStart DNA Polymerase (NEB), and 0.5 µl of WarmStart RTx Reverse Transcriptase (NEB). Plates were incubated at 65°C for 40 minutes in a real-time QuantStudio™ 5 PCR instrument. Fluorescent signals were collected every 60 seconds.
[00581] Degenerate multiplexed RT-LAMP was performed using a final reaction volume of 65 µl, which consisted of 9.6 µl RNA template, 10 µl of L452R degenerate primer set (Eurofins Genomics), 10 µl of E484(K/Q/A)/N501Y degenerate primer set, 14.1 µl of nuclease-free water, 1.3 µl of SYTO-9 dye (ThermoFisher Scientific), and 20 µl of LAMP mastermix. Two degenerate LAMP primer sets, each containing 6 primers modified from the original LAMP primer sets (see Table 3A) with degenerate nucleotides, were designed to capture the L452R, E484K, E484Q, E484A, and N501Y mutations in the SARS-CoV-2 Spike (S) protein (see Table 3B). The primer set and mastermix assembly, the incubation, and the data collection were as described above.
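As a quick arithmetic check, the component volumes listed for the two RT-LAMP reactions can be summed against the stated final volumes. The Python sketch below is illustrative only; the dictionary names and the helper `total_volume` are hypothetical.

```python
import math

# Component volumes in microliters, as listed for each reaction.
STANDARD_RT_LAMP = {
    "RNA template": 8.0,
    "L452R primer set": 5.0,
    "E484K/N501Y primer set": 5.0,
    "nuclease-free water": 17.0,
    "SYTO-9 dye": 1.0,
    "LAMP mastermix": 14.0,
}
DEGENERATE_RT_LAMP = {
    "RNA template": 9.6,
    "L452R degenerate primer set": 10.0,
    "E484(K/Q/A)/N501Y degenerate primer set": 10.0,
    "nuclease-free water": 14.1,
    "SYTO-9 dye": 1.3,
    "LAMP mastermix": 20.0,
}


def total_volume(components):
    """Sum component volumes with reduced floating-point error."""
    return math.fsum(components.values())


# Both recipes add up to their stated final reaction volumes.
assert math.isclose(total_volume(STANDARD_RT_LAMP), 50.0)
assert math.isclose(total_volume(DEGENERATE_RT_LAMP), 65.0)
```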
[00582] 40 nM CasDx1 (Mammoth Biosciences) protein targeting the WT or MUT SNP at L452R, E484K, E484Q, E484A, or N501Y was incubated with 40 nM gRNA in 1X buffer (MBuffer3 for CasDx1) for 30 min at 37°C. gRNAs with an extra sequence of UAAUUUCUACUAAGUGUAGAU (SEQ ID NO: 126) on the 5′ end were used with CasDx1 (see Tables 4A and 4B). 100 nM ssDNA reporter (SEQ ID NO: 7) was added to the RNA-protein complex. 18 µL of this DETECTR® master mix was combined with 2 µL of target amplicon. The DETECTR® assays were monitored for 30 min at 37°C in a plate reader (Tecan).
[00583] Sequencing methods
[00584] Experiments with sequencing steps were carried out as described in Example 2. Complementary DNA (cDNA) synthesis from RNA via reverse transcription and tiling multiplexed amplicon PCR were performed using SARS-CoV-2 primers version 3 according to the ARTIC protocol (see Plitnick, J. et al., J Clin Microbiol 59, e0064921 (2021); and Quick, J. et al., Nat Protoc 12, 1261-1276 (2017)). Libraries were constructed by ligating adapters to the amplicon products using NEBNext Ultra II DNA Library Prep Kit for Illumina (New England Biolabs, # E7645L), barcoding using NEBNext Multiplex Oligos for Illumina (New England Biolabs, # E6440L), and purification with AMPure XP (Beckman-Coulter, # 63880). Final pooled libraries were sequenced on either Illumina MiSeq or NextSeq 550 as 2x150 single-end reads (300 cycles).
[00585] SARS-CoV-2 viral genome assembly and variant analyses were performed using an in-house bioinformatics pipeline. Briefly, sequencing reads generated by Illumina sequencers (MiSeq or NextSeq 550) were demultiplexed and converted to FASTQ files using bcl2fastq (v2.20.0.422). Raw FASTQ files were first screened for SARS-CoV-2 sequences using BLASTn (BLAST+ package 2.9.0) alignment against the Wuhan-Hu-1 SARS-CoV-2 viral reference genome (NC_045512). Reads containing adapters, the ARTIC primer sequences, and low-quality reads were filtered using BBDuk (version 38.87) and then mapped to the NC_045512 reference genome using BBMap (version 38.87). Variants were called with CallVariants and iVar (version 1.3.1) and a depth cutoff of 5 was used to generate the final assembly. Pangolin software (version 3.1.17, see A. O'Toole et al., Virus Evol 7, veab064 (2021); and A. Rambaut et al., Nat Microbiol 5, 1403-1407 (2020)) was used to identify the lineage. Using a custom in-house script, consensus FASTA files generated by the genome assembly pipeline were scanned to confirm L452R, E484K, E484Q, E484A, and N501Y mutations.
[00586] Discordant sample retesting
[00587] The discordant samples for Examples 2 and 3 (n=16) were re-extracted as described above for the NP/OP swab samples and evaluated by viral WGS as described above. The extracted nucleic acids were then thawed (incurring an additional freeze/thaw as needed) and amplified using the LAMP protocol described above and evaluated using the COVID-19 Variant DETECTR® assay as described above.
[00588] DETECTR® data analysis pipeline
[00589] Quality Control Metric for the LAMP Reaction
[00590] Prior to processing DETECTR® data from the clinical samples, data indicating the success or failure of the samples to amplify in the LAMP reaction were collected. The absolute truth was based on visual inspection of LAMP curves. This absolute truth was used to develop thresholds for the LAMP reactions. The positive and negative controls from the LAMP reactions were used to derive the thresholds to qualify the samples. Two sets of thresholds were used: a time threshold and a fluorescence rate threshold. The positive LAMP controls were assumed to represent an ideal sample and displayed a classic sigmoidal rise of fluorescence over time, and the NTC represented the background fluorescence. It was hypothesized that a sample would ideally have positive-control-like fluorescence kinetics. However, due to the presence of high background in some samples, a mean value between controls for each plate was chosen as the threshold. After this, the fluorescence values at a time threshold of 18 minutes were collected. The time point is important for ruling out samples that amplify closer to the endpoint, where LAMP intermediates, rather than the actual sample itself, are the majority contributors to the rise in signal. A score was assigned to each sample, calculated as a ratio of the rate of fluorescence at 18 minutes for that sample relative to the fluorescence rate threshold. The hypothesis was that if this ratio of rate of fluorescence between controls and samples was less than 1, then the sample had failed to reach the minimum fluorescence required to be called amplified. If the ratio was greater than or equal to 1, then the sample had amplified sufficiently. To identify the exact score value for a qualitative QC metric, an ROC analysis was done on the scores and the absolute truth (see FIG. 11A).
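The LAMP quality control score can be sketched in Python under stated assumptions. The function names are hypothetical. The sketch assumes that the per-plate threshold is the mean of the positive control and NTC values at 18 minutes, and that the score is the sample value divided by that threshold, which is the orientation under which a score below 1 indicates a failed amplification. The nominal cutoff of 1.0 is a placeholder, since the text states that the exact cutoff was identified by ROC analysis.

```python
def lamp_qc_threshold(pos_ctrl_at_18, ntc_at_18):
    """Per-plate threshold: mean of the two controls at 18 minutes."""
    return (pos_ctrl_at_18 + ntc_at_18) / 2.0


def lamp_qc_score(sample_at_18, threshold):
    """Score a sample relative to the plate threshold (assumed orientation)."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return sample_at_18 / threshold


def lamp_amplified(score, cutoff=1.0):
    """Nominal rule: a score of at least `cutoff` counts as amplified."""
    return score >= cutoff


threshold = lamp_qc_threshold(pos_ctrl_at_18=1000.0, ntc_at_18=200.0)
for name, value in [("strong sample", 900.0), ("weak sample", 300.0)]:
    score = lamp_qc_score(value, threshold)
    print(name, round(score, 2), lamp_amplified(score))
```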
[00591] Data analysis for CRISPR-based SNP calling
[00592] As described in Examples 2 and 3, each well had a guide specific to either the mutant or the wild-type SNP. Comparing the two was necessary to assign a genotypic call to the sample. However, the raw DETECTR® reactions across the plate were not directly comparable to each other. To address this, the endpoint fluorescence intensities were normalized in each well to that well's own minimum intensity, and the result was defined as the fluorescence yield. The fluorescence yield was compared across wells in a plate under the assumption that each well would have a similar minimum fluorescence starting point. Irrespective of the highest levels of the fluorescence intensities observed across samples, and without being limited to any one theory of operation, the yield for a given target will generally remain the same assuming that similar concentrations of samples/target are being compared. This aided in normalizing the signal and comparing replicates across the wells in the same plate.
Fy = max(F)/min(F)
[00593] Without being limited to any one theory of operation, the wild-type and mutant target guides on NTC generally do not show any change in intensity over time. Thus, the fluorescence yield for NTC remains constant across replicates and plates, and moreover is close to 1.
Fy(NTC) = 1
[00594] Accordingly, if a sample has a fluorescence yield ~ 1 (i.e., indistinguishable from the NTC), then it qualifies for a No Call.
[00595] Variant calling was performed using the following general rules: 1. No template controls were assigned NTC; 2. If the contrast of the sample for a SNP was between minimum and maximum contrast for the plate, then the sample was assigned a NoCall; and 3. If the size of the sample was lower than the size of the NTC on the plate, then the sample was assigned a NoCall.
Cmin(NTC-snp) <= C(sample-snp) <= Cmax(NTC-snp) → NoCall
Smin(NTC-snp) <= S(sample-snp) <= Smax(NTC-snp) → NoCall
log2(Fy(WT)) > log2(Fy(M)) → Wild Type
log2(Fy(WT)) < log2(Fy(M)) → Mutant
[00596] In cases where more than one mutant at a particular position was being analyzed, such as in the case of the Omicron variant of SARS-CoV-2, the signal of each mutant was compared with the wild-type signal. If a mutant was present, then among the n comparisons for n mutants and one wild type, one of the comparisons would yield a mutant call.
log2(Fy(WT)) > log2(Fy(M1)) → Wild Type
log2(Fy(WT)) > log2(Fy(M2)) → Wild Type
log2(Fy(WT)) > log2(Fy(M3)) → Wild Type
log2(Fy(WT)) < log2(Fy(M4)) → Mutant (4)
[00597] If there was a tie in the above logic between mutant and wild-type, then a tie breaker comparison would yield a final result.
log2(Fy(WT)) > log2(Fy(M1)) → Wild Type
log2(Fy(WT)) < log2(Fy(M2)) → Mutant (2)
log2(Fy(WT)) > log2(Fy(M3)) → Wild Type
log2(Fy(WT)) < log2(Fy(M4)) → Mutant (4)
log2(Fy(M2)) < log2(Fy(M4)) → Mutant (4)
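The multi-mutant comparison and tie-breaker can be sketched in Python as follows. The function name and the dictionary-based input are hypothetical, and the sketch assumes the NoCall screening against the NTC has already been applied. The treatment of exactly equal yields is not specified in the text, so here the first such mutant encountered is returned.

```python
import math


def call_position(fy_wt, fy_mutants):
    """Call one SNP position from a wild-type yield and several mutant yields.

    fy_wt: fluorescence yield for the wild-type guide.
    fy_mutants: mapping of mutant name to fluorescence yield.
    """
    log_wt = math.log2(fy_wt)
    # Keep every mutant whose log2 yield exceeds the wild-type log2 yield.
    beating_wt = {}
    for name, fy in fy_mutants.items():
        log_mut = math.log2(fy)
        if log_mut > log_wt:
            beating_wt[name] = log_mut
    if not beating_wt:
        return "Wild Type"
    # Tie-breaker: among mutants that beat WT, the highest yield wins.
    return max(beating_wt, key=beating_wt.get)


# M2 and M4 both exceed the wild-type signal; M4 wins the tie-breaker.
yields = {"M1": 1.5, "M2": 4.0, "M3": 1.2, "M4": 8.0}
print(call_position(2.0, yields))
```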
[00598] SNP calls
[00599] The following procedure, similar to that described in Examples 2 and 3, was used to evaluate the concordance between sequencing and DETECTR® technologies for genotypic classification of the clinical cohort dataset.
[00600] First, all samples and SNPs for which both sequencing and DETECTR® data were present in the distributed files were considered by matching the SNP IDs and sample names. This included cleaning and curating the dataset, which contained failed LAMP reactions, and identifying WT and MUT based on the spacer fluorescence. Samples that had failed to amplify in the LAMP reaction but were assigned a genotype were eliminated as described in Example 2. From the second cohort of 48 samples, 102 calls were made. These 48 samples were initially assayed for SNP calls at position 484 and the variant was determined from there if possible as described herein (e.g., presence of the E484A mutant at position 484 indicated the Omicron variant). When the variant could not be called based on the SNP call of position 484, SNP calls for positions 452 and 501 were combined with that of position 484 to make the final variant call. The resulting final analysis data consisted of 102 calls (WT, K484, A484, Q484, R452, Y501, or No Call) spread across 3 SNPs and 48 samples. For each of the 3 SNPs in the analysis data set, both sequencing and DETECTR® genotypes (including NoCalls and LAMP Fails) were identified and recorded for each of the 48 patients. The 48 patients included the individuals for whom actual sequencing data was available.
[00601] Statistical Analysis
[00602] SNP calls
[00603] Statistical analysis for the experiments described herein was done as follows. For each SNP in the analysis of genotypic calls, a variety of statistics evaluating the concordance between genotype calls were computed on the two different technologies. The concordant and discordant genotypes were visualized through contingency tables. For each SNP, there are three possible genotypes (Wild-type, Mutant and No Call). For the SNP at position 484, the Mutant genotype may be any one of K484, A484, or Q484. The concordance rates were calculated without the samples that failed the LAMP reaction (see Table 7 and FIGS. 14A-14D). The 2X2 cross tables classify all three SNPs (at positions 452, 484, 501) across all the samples between sequencing and DETECTR® technologies (see Table 7 and FIGS. 14A-14D). The data transformation and statistical analysis was done in R (see R Core Team. R: A Language and Environment for Statistical Computing, (Vienna, Austria, 2018), available on the Internet at R-project.org, date accessed: 11/26/21).
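The concordance calculation can be illustrated with a short sketch. The analysis described above was done in R; the Python version below is only an illustration, and the function name, the `"LAMP Fail"` label, and the example calls are hypothetical rather than data from the study.

```python
def snp_concordance(paired_calls, excluded=("LAMP Fail",)):
    """Return (agreeing, total, rate) for paired genotype calls.

    paired_calls: list of (sequencing_call, detectr_call) tuples.
    excluded: DETECTR labels dropped before the rate is computed,
        mirroring the exclusion of samples that failed the LAMP reaction.
    """
    kept = [(seq, det) for seq, det in paired_calls if det not in excluded]
    if not kept:
        raise ValueError("no comparable calls after exclusions")
    agreeing = sum(1 for seq, det in kept if seq == det)
    return agreeing, len(kept), agreeing / len(kept)


# Made-up example calls, for illustration only.
example = [
    ("WT", "WT"),
    ("A484", "A484"),
    ("K484", "WT"),
    ("WT", "LAMP Fail"),
]
agreeing, total, rate = snp_concordance(example)
print(f"{agreeing}/{total} concordant ({rate:.1%})")
```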
[00604] Results
[00605] Update of the COVID-19 Variant DETECTR® assay for distinct 484 mutant SNP detection
[00606] In November 2021, a new SARS-CoV-2 variant, called Omicron, was identified and almost immediately designated a variant of concern. The Omicron variant carries an exceptionally high number of mutations (>30) within the S-gene and has been shown to have enhanced transmissibility and immune evasion. The record number of COVID-19 cases globally from Omicron and the loss of activity by certain therapeutic antibodies underscore the need for rapid and targeted identification of SARS-CoV-2 variants. Although the TaqPath PCR assay with S-gene Target Failure (SGTF) has functioned as a screen that can be reflexed to sequencing to identify the Omicron variant, the SGTF assay alone cannot differentiate between Omicron BA.1 and Alpha and cannot identify emerging variants that lack the SGTF, such as the Omicron BA.2 sublineage. The COVID-19 Variant DETECTR® assay described in Example 2 was reconfigured for the identification of Omicron by targeting the E484A mutation, which alone differentiates Omicron from all other current VBM/VOI/VOC. Given that E484-related mutations are present in multiple circulating variants and have a strong effect on reducing antibody neutralization, a panel of CasDx1 gRNAs to detect all relevant mutations (E, K, Q and A) at amino acid position 484 was also developed (see Tables 4A and 4B). Guide RNAs (gRNAs) with CasDx1 were screened for activity on synthetic gene fragments encoding regions of the SARS-CoV-2 S-gene with wild-type (WT, SEQ ID NO: 20), E484K mutant (K484, SEQ ID NO: 21), E484A mutant (A484, SEQ ID NO: 43), or E484Q mutant (Q484, SEQ ID NO: 44) sequences at amino acid positions 452, 484, and 501 (see FIG. 15A). From this initial activity screen, the top-performing gRNAs were identified for each S-gene variant encoding either E484K, E484Q, or E484A (see FIG. 15B). CasDx1 was further evaluated with its cognate gRNAs on synthetic gene fragments with respect to SNP differentiation capabilities, as illustrated in FIG. 15B.
[00607] Given the highly mutated Omicron S-gene (see FIG. 15A), the original LAMP primer set used in Example 2 (Table 3A) was tested against a degenerate LAMP primer set (Table 3B) which incorporated degenerate nucleotides within the LAMP primers, as it was suspected that the original LAMP primer set may not have sufficient sensitivity to amplify the targeted spike RBD region. As in Example 2, the degenerate LAMP primer design incorporated two sets of six primers each, with both sets generating overlapping spike RBD amplicons that spanned the L452R, E484K/Q/A, and N501Y mutations. FIG. 16 shows a comparison of RT-LAMP primers for processing Omicron clinical samples. Omicron clinical samples (1802), WT (1804) control RNA, and Alpha (1806) control RNAs were amplified using the original RT-LAMP primer set and the degenerate RT-LAMP primer set. The degenerate primers (see Table 3B) amplified both the controls (WT and Alpha) and Omicron samples, whereas the original primer set (see Table 3A) only amplified the control samples (WT and Alpha).
[00608] Within weeks of the first Omicron case in the US, an additional set of 48 clinical samples beyond those of Example 2 was procured and tested. These samples were blinded and processed with the updated DETECTR® assay workflow, which included sample extraction, followed by amplification with the degenerate LAMP primers and detection with each of the 484-specific CasDx1 gRNAs. Once processed, a result with mutations K484, Q484, or A484 was called Beta/Eta/Gamma/Iota/Mu/Zeta, Kappa, or Omicron, respectively, as shown in Tables 8 and 9 and FIGS. 15C and 17A-17B.
[00610] Development of an updated data analysis pipeline for calling COVID-19 Variant SNPs
[00611] To develop an updated data analysis pipeline for calling SARS-CoV-2 SNP mutations and assigning lineage classifications with the COVID-19 Variant DETECTR® assay (see Table 8 and FIGS. 17A-17B), data collected from the additional clinical samples (n=48) were used. Allele discrimination plots were generated as described in Example 2 to define boundaries that separate the WT and MUT signals. Clear differentiation between WT and MUT signals was observed when plotting the ratio against the average of the WT and MUT transformed values on a mean average (MA) plot as described in Example 2 for SNP identity at positions 452, 484, and 501 for the control samples. In cases of SNP 484 (n=484), where additional mutants were characterized against WT, the ratio of MUT(1) vs. MUT(2), etc. was generated as described above to make the final MUT(1) or MUT(2) call.
[00612] Performance evaluation of the updated COVID-19 Variant DETECTR® assay using clinical samples
[00613] A blinded dataset consisting of 48 COVID-19 positive clinical samples (previously analyzed by viral WGS) was assembled and the SNP controls were run in parallel. These samples were extracted, amplified in triplicate RT-LAMP reactions, and processed further as triplicate CasDx1 reactions for each LAMP replicate. A total of nine replicates were thus generated for each sample to detect WT (E484) or MUT (K484, Q484, or A484) SNPs at position 484. The DETECTR® data analysis pipeline was then applied to each sample to provide a 484 SNP categorization (see Tables 8 and 9 and FIGS. 15C and 17A-17B). For a biological RT-LAMP replicate to be designated as either WT or MUT, the same call needed to be made from all three technical CasDx1 replicates (see FIGS. 4A-4C). A final 484 SNP mutation call was made based on at least one of the same calls from the three biological replicates, with replicates that were designated as a No Call ignored (see FIGS. 4A-4C). Once processed, a result with mutations K484, Q484, or A484 was called Beta/Eta/Gamma/Iota/Mu/Zeta, Kappa, or Omicron, respectively, as shown in Tables 8 and 9 and FIGS. 15C and 17A-17B. If the result was associated with E484 (WT), the assay was run using WT and MUT gRNAs at positions 452 and 501 and the DETECTR® data analysis pipeline was applied to provide a final lineage categorization (see Tables 8 and 9 and FIGS. 15C and 17A-17B). Using this workflow, 36 out of 48 total clinical samples were detected: 18/48 resulted as E484 (WT) and were subsequently tested with 452 and 501 gRNAs (3/18 called WT, 6/18 called Alpha and 9/18 called Delta), 4/48 resulted as K484 (called Beta/Eta/Gamma/Iota/Mu/Zeta), 2/48 resulted as Q484 (called Kappa), and 12/48 resulted as A484 (called Omicron) (see FIG. 15C and 17B-15C). The remaining 12/48 clinical samples neither amplified nor showed any DETECTR® signal and were thus called “Not Detected” (see FIG. 18).
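The replicate logic and the position-484 lineage assignment described in this paragraph can be sketched as follows. This is an illustrative sketch only, not the pipeline itself: the function names are hypothetical, and the handling of conflicting biological replicates (returning a No Call) is an assumption, since the text does not state how such conflicts are resolved.

```python
from typing import Dict, List

NO_CALL = "No Call"

# Position-484 result to lineage label, as stated in the paragraph above.
LINEAGE_BY_484: Dict[str, str] = {
    "K484": "Beta/Eta/Gamma/Iota/Mu/Zeta",
    "Q484": "Kappa",
    "A484": "Omicron",
}


def biological_replicate_call(technical_calls: List[str]) -> str:
    """Call one RT-LAMP replicate: all technical replicates must agree."""
    unique = set(technical_calls)
    if len(unique) == 1 and NO_CALL not in unique:
        return technical_calls[0]
    return NO_CALL


def final_484_call(biological_calls: List[str]) -> str:
    """Combine biological replicates, ignoring No Call replicates."""
    informative = {call for call in biological_calls if call != NO_CALL}
    if len(informative) == 1:
        return next(iter(informative))
    # Assumption: no informative replicate, or conflicting ones, gives No Call.
    return NO_CALL


def classify_484(replicates: List[List[str]]) -> str:
    """Map nine replicate calls (3 biological x 3 technical) to a result."""
    call = final_484_call([biological_replicate_call(t) for t in replicates])
    if call == "E484":
        # WT at 484: the text reflexes these samples to the 452 and 501 guides.
        return "E484 (WT): test positions 452 and 501"
    return LINEAGE_BY_484.get(call, NO_CALL)


# Hypothetical sample: one concordant biological replicate is sufficient.
sample = [["A484"] * 3, ["A484", NO_CALL, "A484"], [NO_CALL] * 3]
print(classify_484(sample))  # Omicron
```

In this sketch a single fully concordant biological replicate is enough to make the final call, which mirrors the "at least one" criterion in the text.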
[00614] The viral WGS results were unblinded to evaluate the accuracy of the updated DETECTR® assay for SNP calls and lineage classification. Unblinding of samples COVID-92 through COVID-127 revealed five discordant samples: COVID-103, COVID-108, COVID-109, COVID-112 and COVID-122 (FIG. 17D). All five discordant samples were re-extracted from the original patient sample and re-processed with WGS and COVID Variant DETECTR®. After repeat testing, three samples (COVID-103, COVID-108, COVID-109) showed 100% concordance between WGS and DETECTR®, with both methods resulting in “No Call” at position 452. Notably, these samples were also part of the original set of 91 samples (COVID-20, COVID-63, COVID-73) from Example 2 that were previously concordant at position 452, suggesting a decrease in sample integrity likely resulting from multiple freeze/thaw cycles incurred during several re-extractions. Sample COVID-112 was called Omicron by DETECTR® based on its A484 SNP call, which was confirmed by WGS. Finally, sample COVID-122 could not be amplified by RT-LAMP, also suggesting a loss in sample integrity. Following this discrepancy analysis, an overall SNP concordance of 94.7% and an NPA of 100% were demonstrated for this set of 48 samples (Table 9).
[00615] Table 9: Overall SNP concordance values for the 484 SNP from the evaluation of the DETECTR® assay against the SARS-CoV-2 WGS comparator assay
[00616] Discussion
[00617] In this example, a CRISPR-based DETECTR® assay was developed for the detection of SARS-CoV-2 variants. A data analysis pipeline was developed to differentiate between WT and MUT signals with the COVID-19 Variant DETECTR® assay, yielding an overall SNP concordance of 97.9% (373/381 total SNP calls when combined with Example 2) and 99.3% (138/139 when combined with Example 2) agreement with lineage classification compared to viral WGS. Taken together, these findings show robust agreement between the COVID-19 Variant DETECTR® assay and viral WGS for identification of SNP mutations and variant categorization. Thus, the COVID-19 Variant DETECTR® assay provided a faster and simpler alternative to sequencing-based methods for COVID-19 variant diagnostics and surveillance.
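The percentages quoted above follow directly from the stated counts. The short check below is a sketch only; the helper name `percent_agreement` is hypothetical, and rounding to one decimal place is assumed to match the reported figures.

```python
def percent_agreement(concordant: int, total: int) -> float:
    """Percent agreement between two methods, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * concordant / total, 1)


# Counts as stated in the text (combined with Example 2).
print(percent_agreement(373, 381))  # 97.9 (overall SNP concordance)
print(percent_agreement(138, 139))  # 99.3 (lineage classification agreement)
```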
[00618] Although CRISPR-based diagnostic assays have been demonstrated for the detection of SARS-CoV-2 variants, these studies have limitations regarding coverage of circulating lineages, the extent of clinical sample evaluation, and/or assay complexity. For example, the miSHERLOCK variant assay uses LbCas12a (NEB) with RPA pre-amplification to detect N501Y, E484K and Y144Del covering eight lineages (WA-1, Alpha, Beta, Gamma, Eta, Iota, Mu and Zeta) and was tested only on contrived samples (RNA spiked into human saliva) (see Puig et al., Sci Adv 7, (2021)). The SHINEv2 assay uses LwaCas13a with RPA pre-amplification to detect 69/70Del, K417N/T, L452R and 156/157Del + R158G covering eight lineages (WA-1, Alpha, Beta, Gamma, Delta, Epsilon, Kappa and Mu) and was tested with only the 69/70Del gRNAs on 20 Alpha-positive NP clinical samples (see Arizti-Sanz et al., medRxiv, (2021)). Finally, the mCARMEN variant identification panel (VIP) uses 26 crRNA pairs with either the LwaCas13a or LbaCas13a and PCR pre-amplification to identify all current circulating lineages including Omicron; however, the VIP requires the Fluidigm Biomark HD system or similar, more complex instrumentation for streamlined execution (see Welch et al., medRxiv, (2021)). In comparison, the COVID-19 Variant DETECTR® assay disclosed herein in Examples 2 and 4 uses CasDx1 to detect N501Y, E484K/Q/A, and L452R covering all current circulating lineages and was tested on 139 clinical samples representing eight lineages (WA-1, Alpha, Gamma, Delta, Epsilon, Iota, Mu, Omicron). Furthermore, specific Omicron identification was accomplished using only the E484 WT and A484 MUT guides.
[00619] The results of Examples 2-4 show that the choice of Cas enzyme may be important to maximize the accuracy of CRISPR-based diagnostic assays and may need to be tailored to the site that is being targeted. As configured in Example 4 with the L452R, E484K/Q/A and N501Y SNP targets, the COVID-19 Variant DETECTR® assay was capable of distinguishing the Alpha, Delta, Kappa and Omicron variants, but did not resolve the remaining VBMs or VOIs. However, given the rapid emergence and shifts in the distribution of variants over time, it appears likely that tracking of key mutations, many of which are suspected to arise from convergent evolution, rather than tracking of variants, may be more important for surveillance as the pandemic continues. The data analysis pipeline developed for CRISPR-based SNP calling described herein can readily incorporate additional targets and may offer a blueprint for automated interpretation of fluorescent signal patterns.
[00620] In the near term, the COVID-19 Variant DETECTR® assay described herein can serve as an initial screen for circulating variants, and/or for a distinct pattern from a rare or novel variant, by interrogating the key 452, 484, and 501 positions; such samples could then be reflexed to viral WGS. As the sequencing capacity for most clinical and public health laboratories is limited, the COVID-19 Variant DETECTR® assay would thus enable rapid identification of variants circulating in the community to support outbreak investigation and public health containment efforts. Identification of specific mutations associated with neutralizing antibody evasion could inform patient care with regard to the use of monoclonal antibodies that remain effective in treating the infection. As the virus continues to mutate and evolve, the COVID-19 Variant DETECTR® assay can be readily reconfigured by validating new pre-amplification LAMP primers and gRNAs that target emerging mutations with clinical and epidemiological significance. For example, newly emerging variants may contain additional mutations that may not be captured with the resolution of the L452R, E484K/Q/A, and N501Y mutations described in Examples 2-4 or which may result in variable performance at a particular SNP compared to others, and could be detected by incorporation of additional gRNA(s) to provide specific and redundant coverage and/or improve identification of specific lineages. Additionally, incorporation of an additional N-gene target to the assay may improve the limit of detection of the DETECTR® assay, which currently relies entirely on a multiplexed and degenerate S-gene LAMP primer design, optionally to facilitate simultaneous detection and SNP/variant identification. Over the longer term, a validated CRISPR assay that combines SARS-CoV-2 detection with variant identification may offer a faster and simpler alternative to sequencing and would be useful as a tool for simultaneous COVID-19 diagnosis in individual patients and surveillance for infection control and public health purposes.
[00621] CONCLUSION
[00622] All references cited herein are incorporated herein by reference in their entirety and for all purposes to the same extent as if each individual publication or patent or patent application was specifically and individually indicated to be incorporated by reference in its entirety for all purposes.
[00623] Plural instances may be provided for components, operations or structures described herein as a single instance. Finally, boundaries between various components, operations, and data stores are somewhat arbitrary, and particular operations are illustrated in the context of specific illustrative configurations. Other allocations of functionality are envisioned and may fall within the scope of the implementation(s). In general, structures and functionality presented as separate components in the example configurations may be implemented as a combined structure or component. Similarly, structures and functionality presented as a single component may be implemented as separate components. These and other variations, modifications, additions, and improvements fall within the scope of the implementation(s).
[00624] It will also be understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first subject could be termed a second subject, and, similarly, a second subject could be termed a first subject, without departing from the scope of the present disclosure. The first subject and the second subject are both subjects, but they are not the same subject.
[00625] The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the description of the invention and the appended claims, the singular forms “a,” “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the term “and/or” as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
[00626] As used herein, the term “if” may be construed to mean “when” or “upon” or “in response to determining” or “in response to detecting,” depending on the context. Similarly, the phrase “if it is determined” or “if [a stated condition or event] is detected” may be construed to mean “upon determining” or “in response to determining” or “upon detecting (the stated condition or event)” or “in response to detecting (the stated condition or event),” depending on the context. As used herein, the term “about” in reference to a number or range of numbers is understood to mean the stated number and numbers +/- 10% thereof, or 10% below the lower listed limit and 10% above the higher listed limit for the values listed for a range.
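As a purely illustrative reading of the definition of “about” given above, the bounds can be computed as in the following sketch. The function names `about` and `about_range` are hypothetical, and the use of the absolute value for negative numbers is an assumption not addressed in the text.

```python
from typing import Tuple


def about(value: float) -> Tuple[float, float]:
    """Bounds for 'about value': the stated number +/- 10% thereof."""
    delta = abs(value) / 10.0
    return value - delta, value + delta


def about_range(lower: float, upper: float) -> Tuple[float, float]:
    """Bounds for 'about lower to upper': 10% below the lower listed limit
    and 10% above the higher listed limit."""
    return lower - abs(lower) / 10.0, upper + abs(upper) / 10.0


print(about(100.0))             # (90.0, 110.0)
print(about_range(20.0, 50.0))  # (18.0, 55.0)
```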
[00627] The foregoing description included example systems, methods, techniques, instruction sequences, and computing machine program products that embody illustrative implementations. For purposes of explanation, numerous specific details were set forth in order to provide an understanding of various implementations of the inventive subject matter. It will be evident, however, to those skilled in the art that implementations of the inventive subject matter may be practiced without these specific details. In general, well-known instruction instances, protocols, structures and techniques have not been shown in detail.
[00628] The foregoing description, for the purpose of explanation, has been described with reference to specific implementations. However, the illustrative discussions above are not intended to be exhaustive or to limit the implementations to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The implementations were chosen and described in order to best explain the principles and their practical applications, to thereby enable others skilled in the art to best utilize the implementations and various implementations with various modifications as are suited to the particular use contemplated.
Claims
1. A method for determining a mutation call for a target locus in a biological sample, the method comprising: at a computer system having one or more processors, and memory storing one or more programs for execution by the one or more processors:
(A) obtaining a signal dataset comprising, for each respective well in a plurality of wells in a common plate, a corresponding plurality of reporting signals for a respective one or more nucleic acid molecules in the biological sample that map to the target locus, wherein: the signal dataset represents a plurality of time points, the plurality of wells comprises a first set of wells representing a first allele for the target locus, wherein each well in the first set of wells includes a first plurality of guide nucleic acids that have the first allele of the target locus, the plurality of wells further comprises a second set of wells representing a second allele for the target locus, wherein each well in the second set of wells includes a second plurality of guide nucleic acids that have the second allele of the target locus, each corresponding plurality of reporting signals comprises, for each respective time point in the plurality of time points, a respective reporting signal in the form of a corresponding discrete attribute value, and each respective well in the first set of wells and each respective well in the second set of wells contains a corresponding aliquot of nucleic acid derived from the biological sample,
(B) determining, for each respective well in the plurality of wells, a corresponding signal yield for the respective well using the corresponding plurality of reporting signals for the respective well across the plurality of time points;
(C) determining, for each respective well in the first set of wells, a respective candidate call identity based on a comparison between a corresponding first signal yield for the respective well in the first set of wells and a corresponding second signal yield for a corresponding well in the second set of wells, thereby obtaining a plurality of candidate call identities; and
(D) performing a voting procedure across the plurality of candidate call identities for the first set of wells, thereby obtaining a mutation call for the target locus.
2. The method of claim 1, wherein the determining (B) further comprises, for each respective well in the plurality of wells, normalizing the respective plurality of reporting signals for the respective well by scaling a maximum discrete attribute value by a minimum discrete attribute value in the corresponding plurality of reporting signals for the respective well, thereby obtaining the corresponding signal yield for the respective well.
3. The method of claim 1 or 2, wherein, for each respective well in the first set of wells, the corresponding signal yield from the corresponding well in the second set of wells originates from a common biological replicate of the biological sample.
4. The method of any one of claims 1-3, wherein the respective candidate call identity is selected from the group consisting of first allele, second allele, and no-call.
5. The method of any one of claims 1-4, wherein the respective candidate call identity is obtained using a relative intensity metric between the corresponding first signal yield and the corresponding second signal yield.
6. The method of claim 5, wherein the relative intensity metric has the form log2(Fy(FAl)) / log2(Fy(SAl)), wherein:
Fy(FAl) is the first corresponding signal yield, and Fy(SAl) is the second corresponding signal yield.
7. The method of claim 6, wherein, when the relative intensity metric is greater than 1, the candidate call identity is first allele.
8. The method of claim 6 or 7, wherein, when the relative intensity metric is less than 1, the candidate call identity is second allele.
9. The method of any one of claims 5-8, wherein the signal dataset further comprises, for each respective control well in a plurality of control wells in the common plate, a corresponding plurality of control reporting signals, wherein: the plurality of control wells are free of nucleic acid derived from the biological sample,
the plurality of control wells comprises a first set of control wells representing the first allele for the target locus, wherein each well in the first set of control wells includes a first plurality of guide nucleic acids that have the first allele of the target locus, the plurality of control wells comprises a second set of control wells representing the second allele for the target locus, wherein each well in the second set of control wells includes a second plurality of guide nucleic acids that have the second allele of the target locus, and each corresponding plurality of control reporting signals comprises, for each respective time point in the plurality of time points, a respective control reporting signal in the form of a corresponding discrete attribute value, and wherein: the method further comprises: determining, for each respective control well in the plurality of control wells, a corresponding control signal yield for the respective well using the corresponding plurality of control reporting signals for the respective well across the plurality of time points, thereby obtaining a plurality of control signal yields including a maximum control signal yield and a minimum control signal yield for the common plate, and evaluating whether a respective candidate call identity in the plurality of candidate call identities is a no-call using the maximum control signal yield and the minimum control signal yield.
10. The method of claim 9, wherein the evaluating comprises performing a comparison of
(i) the relative intensity metric between the corresponding first and second signal yields and
(ii) the plurality of control signal yields, wherein, when the relative intensity metric between the corresponding first and second signal yields falls within a range bounded by the maximum control signal yield and the minimum control signal yield, the respective candidate call identity is deemed a no-call.
11. The method of claim 9 or 10, wherein each respective control signal yield is determined by normalizing a maximum discrete attribute value in the corresponding plurality of control reporting signals for the corresponding control well by a minimum discrete attribute value in the corresponding plurality of control reporting signals for the corresponding control well.
12. The method of any one of claims 9-11, further comprising:
obtaining a central tendency metric for the corresponding first signal yield and the corresponding second signal yield; and determining, for each respective control well in the first set of control wells, a respective control central tendency metric, thereby obtaining a plurality of control central tendency metrics including a maximum control central tendency metric and a minimum control central tendency metric, and wherein: the evaluating comprises performing a comparison between the central tendency metric of the corresponding first and second signal yields and the control central tendency metric for the first set of control wells, wherein, when the central tendency metric of the corresponding first and second signal yields falls within a range bounded by the maximum control central tendency metric and the minimum control central tendency metric, the respective candidate call identity is deemed a no-call.
13. The method of claim 12, wherein the central tendency metric is calculated as
[log2(Fy(FAl) + Fy(SAl))]/2, wherein,
Fy(FAl) is the corresponding first signal yield, and Fy(SAl) is the corresponding second signal yield.
14. The method of claim 12 or 13, wherein the control central tendency metric is a log average calculated as
[log2(Fc(FAl) + Fc(SAl))]/2 wherein,
Fc(FAl) is a first control signal yield for the respective control well in the first set of control wells, and
Fc(SAl) is a second control signal yield for a corresponding control well in the second set of control wells.
15. The method of any one of claims 1-14, further comprising, prior to the performing (D), binning the first set of wells into a plurality of bins, wherein each respective bin comprises a respective subset of wells originating from a common respective biological replicate of the biological sample, and wherein the voting procedure further comprises:
(i) performing, for each respective bin in the plurality of bins, a respective first concordance vote across the candidate call identities for the subset of wells in the respective bin, thereby generating a plurality of bin votes, and
(ii) applying a second concordance vote across the plurality of bin votes, thereby obtaining the mutation call for the target locus.
16. The method of claim 15, wherein each respective bin vote in the plurality of bin votes is selected from the group consisting of first allele, second allele, and no-call.
17. The method of claim 15 or 16, wherein a respective first concordance vote generates a respective bin vote based on a common candidate call identity that is shared by at least a majority of the subset of wells for the respective bin.
18. The method of claim 15 or 16, wherein a respective first concordance vote generates a respective bin vote based on a common candidate call identity that is shared by all of the wells in the subset of wells for the respective bin.
19. The method of claim 15 or 16, wherein a respective bin vote is no-call when at least one well in the subset of wells for the respective bin has a candidate call identity of no-call.
20. The method of any one of claims 15-19, wherein the second concordance vote generates a mutation call based on a common bin vote that is shared by at least a majority of the bins in the plurality of bins.
21. The method of any one of claims 15-19, wherein the second concordance vote generates a mutation call based on a common bin vote that is shared by all of the bins in the plurality of bins.
22. The method of any one of claims 15-21, wherein each respective bin in the plurality of bins corresponds to a different respective biological replicate of an amplification reaction, and the respective subset of wells for the respective bin is partitioned from the respective biological replicate.
23. The method of any one of claims 15-22, wherein the plurality of bins comprises between 2 and 10 bins.
24. The method of any one of claims 15-23, wherein each respective bin in the plurality of bins comprises between 2 and 10 wells in the respective subset of wells.
25. The method of any one of claims 1-24, wherein the signal dataset is obtained by a procedure comprising: amplifying a first plurality of nucleic acids derived from the biological sample, thereby generating a plurality of amplified nucleic acids; and for each respective well in the first set of wells and each respective well in the second set of wells: partitioning, from the plurality of amplified nucleic acids, the respective corresponding aliquot of nucleic acid; contacting the respective corresponding aliquot of nucleic acid with at least a programmable nuclease, a guide nucleic acid, and one or more reporters; and cleaving the one or more reporters using the programmable nuclease if the first allele or the second allele, respectively, is present, thereby generating a reporting signal.
26. The method of claim 25, wherein the programmable nuclease is a trans-cleaving programmable nuclease.
27. The method of claim 25 or 26, wherein the programmable nuclease targets DNA or RNA.
28. The method of any one of claims 25-27, wherein the programmable nuclease is a Cas nuclease.
29. The method of claim 28, wherein the Cas nuclease is selected from the group consisting of a Cas12, Cas13, Cas14, or CasPhi.
30. The method of claim 28 or 29, wherein the Cas nuclease is LbCas12a, AsCas12a, or CasDx1.
31. The method of any one of claims 25-30, wherein, for each respective well in the first set of wells, the programmable nuclease is programmed for the target locus using a corresponding guide nucleic acid in the first plurality of guide nucleic acids that have the first allele of the target locus, and wherein the cleaving the one or more reporters using the programmable nuclease is initiated upon recognition of the target locus by the corresponding guide nucleic acid.
32. The method of claim 31, wherein the first plurality of guide nucleic acids is RNA.
33. The method of claim 31 or 32, wherein the first plurality of guide nucleic acids hybridizes to a wild-type sequence for the target locus.
34. The method of any one of claims 25-33, wherein each respective reporter in the one or more reporters is single-stranded DNA.
35. The method of any one of claims 25-33, wherein each respective reporter in the one or more reporters is single-stranded RNA.
36. The method of any one of claims 25-35, wherein each respective reporter in the one or more reporters comprises a fluorescent reporter linked to a quencher.
37. The method of any one of claims 25-36, wherein the amplifying is performed using isothermal amplification, loop-mediated isothermal amplification (LAMP), reverse transcriptase loop-mediated isothermal amplification (RT-LAMP), recombinase polymerase amplification (RPA), reverse transcriptase recombinase polymerase amplification (RT-RPA), polymerase chain reaction (PCR), or reverse transcriptase polymerase chain reaction (RT-PCR).
38. The method of any one of claims 25-37, wherein the amplifying is performed using a multiplexed amplification reaction.
39. The method of any one of claims 25-38, further comprising, prior to the amplifying, partitioning the first plurality of nucleic acids derived from the biological sample into a plurality of amplification replicates, wherein the amplifying generates a respective plurality of amplified nucleic acids for each respective amplification replicate.
40. The method of claim 39, further comprising, prior to the contacting, partitioning the respective plurality of amplified nucleic acids for each respective amplification replicate into a plurality of detection replicates, wherein each respective detection replicate in the plurality of detection replicates represents a respective well in the plurality of wells.
41. The method of claim 39 or 40, further comprising applying an amplification threshold filter to each respective amplification replicate, wherein, when the respective plurality of amplified nucleic acids fails to satisfy an amplification threshold, the respective amplification replicate is removed from the plurality of amplification replicates.
42. The method of any one of claims 1-41, wherein the first allele is wild-type, the second allele is mutant, and the mutation call is wild-type or mutant.
43. The method of any one of claims 1-42, wherein the first allele is a first mutant, and the second allele is a second mutant.
44. The method of any one of claims 1-43, wherein each respective reporting signal is obtained from a detection moiety.
45. The method of claim 44, wherein each respective reporting signal is a fluorescence emission by a fluorophore, and the corresponding discrete attribute value is a fluorescence intensity.
46. The method of claim 44 or 45, wherein the corresponding discrete attribute value for each respective reporting signal is detected using a plate reader.
47. The method of any one of claims 1-43, wherein the respective reporting signal is a lateral flow readout.
48. The method of claim 47, wherein the lateral flow readout is detected manually by visual inspection.
49. The method of claim 47, wherein the lateral flow readout is detected using an image analysis algorithm.
50. The method of any one of claims 1-49, wherein the target locus is selected from the group consisting of: a single nucleotide variant, a multi-nucleotide variant, an indel, a DNA rearrangement, and a copy number variation.
51. The method of any one of claims 1-49, wherein the target locus is selected from the group consisting of: a restriction fragment length polymorphism (RFLP), a random amplified polymorphic DNA (RAPD), an amplified fragment length polymorphism (AFLP), a variable number tandem repeat (VNTR), an oligonucleotide polymorphism (OP), a single nucleotide polymorphism (SNP), an allele specific associated primer (ASAP), an inverse sequence-tagged repeat (ISTR), an inter-retrotransposon amplified polymorphism (IRAP), and a simple sequence repeat (SSR or microsatellite).
52. The method of any one of claims 1-51, wherein the target locus is selected from a database.
53. The method of any one of claims 1-52, wherein the target locus is a gene.
54. The method of any one of claims 1-53, wherein the target locus comprises DNA or RNA.
55. The method of claim 54, wherein the target locus comprises DNA, and the one or more nucleic acid molecules in the biological sample that map to the target locus is obtained from a transcription reaction of a DNA molecule for the target locus.
56. The method of any one of claims 1-55, wherein the plurality of time points comprises between 20 and 2000 time points.
57. The method of any one of claims 1-56, wherein the plurality of time points is obtained over a duration of between 30 seconds and 1 hour.
58. The method of any one of claims 1-57, wherein the biological sample is a clinical sample.
59. The method of any one of claims 1-58, wherein the biological sample is a bodily fluid.
60. The method of claim 59, wherein the bodily fluid is sputum, saliva, nasopharyngeal fluid, oropharyngeal fluid or blood.
61. The method of any one of claims 1-60, wherein the biological sample is obtained from a nasopharyngeal or oropharyngeal swab.
62. The method of any one of claims 1-61, wherein the biological sample is obtained from a subject that is infected with a pathogen.
63. The method of claim 62, wherein the pathogen is a virus, bacteria, fungus, or parasite.
64. The method of claim 62 or 63, wherein the pathogen is severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).
65. The method of any one of claims 1-64, further comprising heat-inactivating the biological sample.
66. The method of any one of claims 1-65, further comprising isolating the one or more nucleic acid molecules from the biological sample.
67. The method of any one of claims 1-66, wherein the one or more nucleic acid molecules comprises synthetic nucleic acid molecules.
68. A computer system having one or more processors, and memory storing one or more programs for execution by the one or more processors, the one or more programs comprising instructions for performing a method comprising:
(A) obtaining a signal dataset comprising, for each respective well in a plurality of wells in a common plate, a corresponding plurality of reporting signals for a respective one or more nucleic acid molecules in the biological sample that map to the target locus, wherein: the signal dataset represents a plurality of time points, the plurality of wells comprises a first set of wells representing a first allele for the target locus, wherein each well in the first set of wells includes a first plurality of guide nucleic acids that have the first allele of the target locus,
the plurality of wells further comprises a second set of wells representing a second allele for the target locus, wherein each well in the second set of wells includes a second plurality of guide nucleic acids that have the second allele of the target locus, each corresponding plurality of reporting signals comprises, for each respective time point in the plurality of time points, a respective reporting signal in the form of a corresponding discrete attribute value, and each respective well in the first set of wells and each respective well in the second set of wells contains a corresponding aliquot of nucleic acid derived from the biological sample,
(B) determining, for each respective well in the plurality of wells, a corresponding signal yield for the respective well using the corresponding plurality of reporting signals for the respective well across the plurality of time points;
(C) determining, for each respective well in the first set of wells, a respective candidate call identity based on a comparison between a corresponding first signal yield for the respective well in the first set of wells and a corresponding second signal yield for a corresponding well in the second set of wells, thereby obtaining a plurality of candidate call identities; and
(D) performing a voting procedure across the plurality of candidate call identities for the first set of wells, thereby obtaining a mutation call for the target locus.
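Steps (B) through (D) recited above can be illustrated with a minimal Python sketch. The claim does not fix how a signal yield is computed, how two yields are compared, or which voting rule applies. The sketch therefore assumes that the yield is the peak reading minus the first reading, that wells in the two sets are paired by index, that a call requires one yield to exceed the other by a ratio threshold, and that the vote is a strict majority. All function names, the `min_ratio` value, and the `"NoCall"` label are hypothetical.

```python
from collections import Counter
from typing import Sequence


def signal_yield(signals: Sequence[float]) -> float:
    """Assumed yield: peak reading minus the first (baseline) reading."""
    return max(signals) - signals[0]


def candidate_call(first_yield: float, second_yield: float,
                   min_ratio: float = 2.0) -> str:
    """Compare paired yields; min_ratio is a hypothetical threshold."""
    floor = 1e-9  # keeps zero or negative yields from producing a call
    if first_yield >= min_ratio * max(second_yield, floor):
        return "first_allele"
    if second_yield >= min_ratio * max(first_yield, floor):
        return "second_allele"
    return "NoCall"


def vote(calls: Sequence[str]) -> str:
    """Strict majority vote; anything short of a majority gives 'NoCall'."""
    if not calls:
        return "NoCall"
    label, count = Counter(calls).most_common(1)[0]
    return label if count > len(calls) / 2 else "NoCall"


def mutation_call(first_wells: Sequence[Sequence[float]],
                  second_wells: Sequence[Sequence[float]],
                  min_ratio: float = 2.0) -> str:
    """One candidate call per first-set well, then a vote across the calls."""
    calls = [
        candidate_call(signal_yield(a), signal_yield(b), min_ratio)
        for a, b in zip(first_wells, second_wells)
    ]
    return vote(calls)
```

With three replicate well pairs, two pairs favoring the first allele and one favoring the second, `mutation_call` returns `"first_allele"` under these assumptions.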
69. A non-transitory computer readable storage medium storing one or more programs configured for execution by a computer, the one or more programs comprising instructions for carrying out a method comprising:
(A) obtaining a signal dataset comprising, for each respective well in a plurality of wells in a common plate, a corresponding plurality of reporting signals for a respective one or more nucleic acid molecules in the biological sample that map to the target locus, wherein: the signal dataset represents a plurality of time points, the plurality of wells comprises a first set of wells representing a first allele for the target locus, wherein each well in the first set of wells includes a first plurality of guide nucleic acids that have the first allele of the target locus, the plurality of wells further comprises a second set of wells representing a second allele for the target locus, wherein each well in the second set of wells includes a second plurality of guide nucleic acids that have the second allele of the target locus,
each corresponding plurality of reporting signals comprises, for each respective time point in the plurality of time points, a respective reporting signal in the form of a corresponding discrete attribute value, and each respective well in the first set of wells and each respective well in the second set of wells contains a corresponding aliquot of nucleic acid derived from the biological sample,
(B) determining, for each respective well in the plurality of wells, a corresponding signal yield for the respective well using the corresponding plurality of reporting signals for the respective well across the plurality of time points;
(C) determining, for each respective well in the first set of wells, a respective candidate call identity based on a comparison between a corresponding first signal yield for the respective well in the first set of wells and a corresponding second signal yield for a corresponding well in the second set of wells, thereby obtaining a plurality of candidate call identities; and
(D) performing a voting procedure across the plurality of candidate call identities for the first set of wells, thereby obtaining a mutation call for the target locus.
70. A method for determining a mutation call for a target locus in a biological sample, the method comprising: at a computer system having one or more processors, and memory storing one or more programs for execution by the one or more processors:
(A) amplifying a first plurality of nucleic acids derived from the biological sample, thereby obtaining a plurality of amplified nucleic acids;
(B) for each respective well in a plurality of wells in a common plate, partitioning, from the plurality of amplified nucleic acids, a respective corresponding aliquot of nucleic acid derived from the biological sample; and contacting the respective corresponding aliquot of nucleic acid with at least a programmable nuclease, a guide nucleic acid, and a plurality of reporters;
(C) obtaining a signal dataset comprising, for each respective well in the plurality of wells in the common plate, a corresponding plurality of reporting signals for a respective one or more nucleic acid molecules derived from the biological sample that map to the target locus, wherein: the signal dataset represents a plurality of time points,
the plurality of wells comprises a first set of wells representing a first allele for the target locus, wherein each well in the first set of wells includes a first plurality of guide nucleic acids that have the first allele of the target locus, the plurality of wells further comprises a second set of wells representing a second allele for the target locus, wherein each well in the second set of wells includes a second plurality of guide nucleic acids that have the second allele of the target locus, and each corresponding plurality of reporting signals comprises, for each respective time point in the plurality of time points, a respective reporting signal in the form of a corresponding discrete attribute value obtained from a cleavage of one or more respective reporters, in the plurality of reporters, by the programmable nuclease,
(D) determining, for each respective well in the plurality of wells, a corresponding signal yield for the respective well using the corresponding plurality of reporting signals for the respective well across the plurality of time points;
(E) determining, for each respective well in the first set of wells, a respective candidate call identity based on a comparison between a corresponding first signal yield for the respective well in the first set of wells and a corresponding second signal yield for a corresponding well in the second set of wells, thereby obtaining a plurality of candidate call identities; and
(F) performing a voting procedure across the plurality of candidate call identities for the first set of wells, thereby obtaining a mutation call for the target locus.
71. The method of any one of claims 1-67, further comprising, for each candidate second allele in a plurality of candidate second alleles, repeating the A) obtaining, B) determining, C) determining, and D) performing thereby obtaining a plurality of candidate mutation calls for the target locus, and the method further comprises performing a mutation call voting procedure across the plurality of candidate mutation calls for the target locus, thereby obtaining a final mutation call for the target locus that is one of the candidate mutation calls or the first allele.
72. The method of claim 71, wherein the first allele is a wild-type allele and each respective candidate second allele in the plurality of candidate second alleles is a different respective mutant allele for the target locus.
73. The method of claim 71 or 72, wherein, when a single respective candidate mutation call in the plurality of candidate mutation calls corresponds to a respective candidate second allele in the plurality of candidate second alleles, the respective candidate mutation call is determined as the final mutation call for the target locus.
74. The method of claim 71 or 72, wherein, when a sub-plurality of candidate mutation calls in the plurality of candidate mutation calls corresponds to a respective sub-plurality of candidate second alleles in the plurality of candidate second alleles, the obtaining the final mutation call for the target locus is based upon a comparison of signal yields for each pair of candidate second alleles in the sub-plurality of candidate second alleles.
74. The method of any one of claims 71-74, wherein the plurality of candidate second alleles consists of between 2 and 10 candidate second alleles.
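The resolution logic of claims 71 through 74 can be sketched as follows. The claims state only that a single mutant call becomes the final call and that multiple mutant calls are resolved by comparing signal yields for each pair of candidate second alleles. The sketch assumes that each candidate allele has one summary yield (for example, a mean across its wells) and that the allele winning the most pairwise comparisons is chosen. The function name, the dictionary layout, the `"WT"` default, and the `"NoCall"` fallback for an unresolved tie are hypothetical.

```python
from itertools import combinations
from typing import Dict


def final_mutation_call(candidate_calls: Dict[str, str],
                        mean_yields: Dict[str, float],
                        first_allele: str = "WT") -> str:
    """Resolve per-allele candidate mutation calls into one final call.

    candidate_calls maps each candidate second allele to the call obtained
    when that allele was tested against the first allele.
    """
    # Alleles whose own test returned a mutant call.
    mutant_calls = [a for a, call in candidate_calls.items() if call == a]
    if not mutant_calls:
        return first_allele          # every test favored the first allele
    if len(mutant_calls) == 1:
        return mutant_calls[0]       # single mutant call is final (claim 73)

    # Several mutant calls: compare yields for each pair (claim 74).
    wins = {allele: 0 for allele in mutant_calls}
    for a, b in combinations(mutant_calls, 2):
        if mean_yields[a] > mean_yields[b]:
            wins[a] += 1
        elif mean_yields[b] > mean_yields[a]:
            wins[b] += 1
    best = max(wins.values())
    leaders = [a for a, w in wins.items() if w == best]
    return leaders[0] if len(leaders) == 1 else "NoCall"
```

Counting pairwise wins is one of several possible readings of "a comparison of signal yields for each pair"; a ratio test between the top two alleles would be an equally plausible choice.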
75. The method of any one of claims 1-67, wherein the second allele is a first candidate second allele in a plurality of candidate second alleles, the plurality of wells further comprises, for each respective candidate second allele in the plurality of candidate second alleles beyond the first candidate second allele, a respective additional set of wells representing the candidate second allele for the target locus, wherein each well in the respective additional set of wells includes a plurality of guide nucleic acids that have the respective candidate second allele of the target locus, each respective well in each respective additional set of wells contains a corresponding aliquot of nucleic acid derived from the biological sample wherein the C) determining and D) performing is performed for each candidate second allele in the plurality of candidate second alleles, thereby obtaining a plurality of candidate mutation calls for the target locus, and the method further comprises performing a mutation call voting procedure across the plurality of candidate mutation calls for the target locus, thereby obtaining a final mutation call for the target locus that is one of the candidate mutation calls or the first allele.
76. The method of claim 75, wherein the first allele is a wild-type allele and each respective candidate second allele in the plurality of candidate second alleles is a different respective mutant allele for the target locus.
77. The method of claim 75 or 76, wherein, when a single respective candidate mutation call in the plurality of candidate mutation calls corresponds to a respective candidate second allele in the plurality of candidate second alleles, the respective candidate mutation call is determined as the final mutation call for the target locus.
78. The method of claim 75 or 76, wherein, when a sub-plurality of candidate mutation calls in the plurality of candidate mutation calls corresponds to a respective sub-plurality of candidate second alleles in the plurality of candidate second alleles, the obtaining the final mutation call for the target locus is based upon a comparison of signal yields for each pair of candidate second alleles in the sub-plurality of candidate second alleles.
79. The method of any one of claims 75-78, wherein the plurality of candidate second alleles consists of between 2 and 10 candidate second alleles.
80. A method for determining a mutation call for a target locus in a biological sample, the method comprising: at a computer system having one or more processors, and memory storing one or more programs for execution by the one or more processors:
(A) obtaining a signal dataset comprising, for each respective well in a plurality of wells in a common plate, a corresponding plurality of reporting signals for a respective one or more nucleic acid molecules in the biological sample that map to the target locus, wherein: the signal dataset represents a plurality of time points, the plurality of wells comprises a first set of wells representing a first allele for the target locus, wherein each well in the first set of wells includes a first plurality of guide nucleic acids that have the first allele of the target locus, the plurality of wells further comprises, for each respective candidate second allele in a set of candidate second alleles, a respective additional set of wells representing the respective candidate second allele for the target locus, wherein each well in the respective additional set of wells includes a corresponding plurality of guide nucleic acids that have the respective candidate second allele for the target locus, each corresponding plurality of reporting signals comprises, for each respective time point in the plurality of time points, a respective reporting signal in the form of a corresponding discrete attribute value, and
each respective well in the first set of wells and each respective well in each respective additional set of wells contains a corresponding aliquot of nucleic acid derived from the biological sample,
(B) determining, for each respective well in the plurality of wells, a corresponding signal yield for the respective well using the corresponding plurality of reporting signals for the respective well across the plurality of time points;
(C) determining, for each respective well in the first set of wells for each respective candidate second allele in the set of candidate second alleles, a respective candidate call identity based on a comparison between a corresponding first signal yield for the respective well in the first set of wells and a corresponding respective second signal yield for a corresponding well in the respective additional set of wells for the respective candidate second allele, thereby obtaining a plurality of candidate call identities; and
(D) performing a voting procedure across the plurality of candidate call identities for the first set of wells, thereby obtaining a mutation call for the target locus that is one of the candidate second alleles in the set of candidate second alleles or the first allele.
81. The method of claim 80, wherein the first allele is a wild-type allele and each respective candidate second allele in the set of candidate second alleles is a different respective mutant allele for the target locus.
82. The method of claim 80 or 81, wherein the set of candidate second alleles consists of between 2 and 10 candidate second alleles.
83. The method of claim 80, wherein the set of candidate second alleles consists of a single second allele.
84. A method of assaying for a SARS-CoV-2 variant in an individual, the method comprising:
(A) collecting a nasal swab or a throat swab from the individual;
(B) optionally extracting a target nucleic acid comprising a segment of a SARS-CoV-2 Spike (“S”) gene from the nasal swab or the throat swab;
(C) amplifying the target nucleic acid to produce an amplification product, wherein amplifying the target nucleic acid comprises contacting the target nucleic acid to a plurality of LAMP amplification primers;
(D) contacting the amplification product, the target nucleic acid, or a combination thereof, to a composition comprising (i) an effector protein and (ii) a guide nucleic acid comprising a nucleotide sequence that is at least 90% identical to any one of SEQ ID NOS: 1 to 6, 22 to 27, or 40 to 42;
(E) assaying for a change in a signal produced by cleavage of a reporter nucleic acid, wherein the effector protein trans cleaves the reporter nucleic acid upon hybridization of the guide nucleic acid to the target nucleic acid, a segment thereof, or an amplification product thereof; and
(F) determining an SNP call of the sample.
85. The method of claim 84, wherein the nasal swab is a nasopharyngeal swab.
86. The method of claim 84, wherein the throat swab is an oropharyngeal swab.
87. The method of any one of claims 84-86, wherein amplifying the target nucleic acid comprises contacting the target nucleic acid to at least one reagent for amplification.
88. The method of any one of claims 84-87, wherein the amplifying comprises loop mediated amplification (LAMP).
89. The method of any one of claims 87-88, wherein the at least one reagent for amplification comprises a polymerase, dNTPs, or a combination thereof.
90. The method of any one of claims 84-89, wherein the plurality of LAMP amplification primers comprise an FIP primer, a BIP primer, an F3 primer, a B3 primer, an LF primer, and an LB primer.
91. The method of any one of claims 84-90, wherein the amplification primers comprise SEQ ID NOS: 8-13, SEQ ID NOS: 14-19, SEQ ID NOS: 28-33, and/or SEQ ID NOS: 34-39.
92. The method of any one of claims 84-91, wherein the amplifying comprises reverse transcription-LAMP.
93. The method of any one of claims 84-92, comprising lysing the sample.
94. The method of claim 93, wherein lysing the sample comprises contacting the sample to a lysis buffer.
95. The method of any one of claims 1-67 or 70-94, wherein determining an SNP call comprises determining whether the segment of the S-gene comprises one or more S-gene mutation(s) relative to a reference wild-type SARS-CoV-2 S-gene.
96. The method of claim 95, wherein the reference wild-type SARS-CoV-2 gene is from the USA-WA1/2020 sequence.
97. The method of claim 95 or 96, wherein the one or more S-gene mutations is a single nucleotide polymorphism (SNP).
98. The method of any one of claims 95-97, wherein the one or more S-gene mutations is associated with one or more Spike protein mutations.
99. The method of claim 98, wherein the one or more Spike protein mutations is (i) a mutation in amino acid position 484 from E to K (E484K), (ii) a mutation in amino acid position 501 from N to Y (N501Y), (iii) a mutation in amino acid position 452 from L to R (L452R), (iv) a mutation in amino acid position 484 from E to Q (E484Q), (v) a mutation in amino acid position 484 from E to A (E484A), or a combination thereof.
100. The method of any one of claims 84-99, wherein determining an SNP call of the sample comprises comparing the signal produced by the cleavage of the reporter nucleic acid by a composition comprising a guide nucleic acid comprising a nucleotide sequence that is at least 90% identical to any one of SEQ ID NOS: 1-3 or 22-24 to a signal produced by contacting a composition comprising a guide nucleic acid comprising a nucleotide sequence that is at least 90% identical to any one of SEQ ID NOS: 4-6, 25-27, or 40-42 to a nucleic acid identical to the target nucleic acid.
101. The method of any one of claims 1-67 or 70-100, comprising determining a variant call of the sample.
102. The method of claim 101, wherein determining the variant call comprises determining whether the sample comprises a wild-type SARS-CoV-2 or a SARS-CoV-2 variant.
103. The method of claim 100 or 102, wherein the SARS-CoV-2 variant is any one of an Alpha, Beta, Gamma, Delta, Epsilon, Kappa, Omicron, or Zeta SARS-CoV-2 variant.
104. The method of any one of claims 101-103, wherein determining the variant call comprises determining one or more SNP calls of the sample.
105. The method of any one of claims 101-104, wherein determining whether the sample comprises an Alpha SARS-CoV-2 variant comprises detecting an S-gene mutation associated with an N501Y mutation.
106. The method of any one of claims 101-105, wherein determining whether the sample comprises a Beta SARS-CoV-2 variant comprises detecting S-gene mutations associated with E484K and N501Y mutations.
107. The method of any one of claims 101-106, wherein determining whether the sample comprises a Gamma SARS-CoV-2 variant comprises detecting S-gene mutations associated with E484K and N501Y mutations.
108. The method of any one of claims 101-107, wherein determining whether the sample comprises a Mu SARS-CoV-2 variant comprises detecting S-gene mutations associated with E484K and N501Y mutations.
109. The method of any one of claims 101-108, wherein determining whether the sample comprises a Delta SARS-CoV-2 variant comprises detecting an S-gene mutation associated with an L452R mutation.
110. The method of any one of claims 101-109, wherein determining whether the sample comprises an Epsilon SARS-CoV-2 variant comprises detecting an S-gene mutation associated with an L452R mutation.
111. The method of any one of claims 101-110, wherein determining whether the sample comprises a Kappa SARS-CoV-2 variant comprises detecting an S-gene mutation associated with an L452R mutation.
112. The method of any one of claims 101-111, wherein determining whether the sample comprises an Omicron SARS-CoV-2 variant comprises detecting an S-gene mutation associated with an E484A mutation.
113. The method of any one of claims 101-112, wherein determining whether the sample comprises a Zeta SARS-CoV-2 variant comprises detecting an S-gene mutation associated with an E484K mutation.
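The variant-to-mutation associations recited in claims 105 through 113 can be collected into a lookup table, sketched below. The table copies only what those claims state. Because several variants share the same recited mutations, the lookup returns every variant consistent with the detected set rather than a single answer. Requiring an exact match between the detected set and the recited set is an assumption, and the names `CLAIMED_SIGNATURES` and `consistent_variants` are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List

# Spike mutations recited for each variant in claims 105-113.
CLAIMED_SIGNATURES: Dict[str, FrozenSet[str]] = {
    "Alpha": frozenset({"N501Y"}),
    "Beta": frozenset({"E484K", "N501Y"}),
    "Gamma": frozenset({"E484K", "N501Y"}),
    "Mu": frozenset({"E484K", "N501Y"}),
    "Delta": frozenset({"L452R"}),
    "Epsilon": frozenset({"L452R"}),
    "Kappa": frozenset({"L452R"}),
    "Omicron": frozenset({"E484A"}),
    "Zeta": frozenset({"E484K"}),
}


def consistent_variants(detected: Iterable[str]) -> List[str]:
    """Return, sorted by name, every variant whose recited mutation set
    equals the detected set (exact matching is an assumption)."""
    detected_set = frozenset(detected)
    return sorted(
        name for name, signature in CLAIMED_SIGNATURES.items()
        if signature == detected_set
    )
```

For example, a sample with only an L452R call is consistent with Delta, Epsilon, and Kappa under this table, so these three SNP positions alone do not separate them.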
114. The method of any one of claims 84-113, wherein the effector protein is a Type V Cas effector protein.
115. The method of claim 114, wherein the Type V effector protein is a Cas12 effector protein, a Cas13 effector protein, a Cas14 effector protein, or a CasPhi effector protein.
116. The method of any one of claims 84-115, wherein the guide nucleic acid comprises a nucleotide sequence that is at least 95% identical to one of SEQ ID NO: 1 to 6, 22 to 27, or 40 to 42.
117. The method of any one of claims 84-116, wherein the guide nucleic acid comprises a nucleotide sequence that is at least 98% identical to one of SEQ ID NO: 1 to 6, 22 to 27, or 40 to 42.
118. The method of any one of claims 84-117, wherein the guide nucleic acid comprises a nucleotide sequence of one of SEQ ID NO: 1 to 6, 22 to 27, or 40 to 42.
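The identity thresholds in claims 84 and 116 through 118 (at least 90%, at least 95%, at least 98%, and identical) can be illustrated with a simple percent-identity calculation. These claims do not say how identity is computed, so the sketch assumes an ungapped, position-by-position comparison of two sequences of equal length. The function name is hypothetical, and any sequences passed to it in an example are placeholders, not the sequences of the SEQ ID NOS.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of positions at which two equal-length sequences match.

    Assumes no gaps; an alignment-based definition would differ for
    sequences of unequal length.
    """
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)
```

Under this definition, a 20-nucleotide guide with one mismatch is 95% identical to its reference and would meet the 90% and 95% thresholds but not the 98% threshold.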
119. The method of any one of claims 84-118, wherein the reporter nucleic acid comprises a nucleotide sequence that is at least 75% identical to SEQ ID NO: 7, wherein the nucleotide sequence is flanked with a fluorescent dye on the 5’ end and a quencher on the 3’ end.
120. The method of any one of claims 84-119, wherein the reporter nucleic acid has a nucleotide sequence of SEQ ID NO: 7 flanked with a fluorescent dye on the 5’ end and a quencher on the 3’ end.
121. A method of assaying for a SARS-CoV-2 variant in an individual, the method comprising:
(A) amplifying a target nucleic acid from a biological sample of the individual, the target nucleic acid comprising a segment of a SARS-CoV-2 Spike (“S”) gene to produce an amplification product, wherein amplifying the target nucleic acid comprises contacting the target nucleic acid to a plurality of LAMP amplification primers;
(B) contacting the amplification product, the target nucleic acid, or a combination thereof, to a composition comprising (i) an effector protein and (ii) a guide nucleic acid comprising a nucleotide sequence that is at least 90% identical to any one of SEQ ID NOS: 1 to 6, 22 to 27, or 40 to 42;
(C) assaying for a change in a signal produced by cleavage of a reporter nucleic acid, wherein the effector protein trans cleaves the reporter nucleic acid upon hybridization of the guide nucleic acid to the target nucleic acid, a segment thereof, or an amplification product thereof;
(D) determining an SNP call of the sample; and
(E) optionally extracting a target nucleic acid from a biological sample obtained from the individual, prior to (A).
Applications Claiming Priority (10)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202163283936P | 2021-11-29 | 2021-11-29 | |
US202163283930P | 2021-11-29 | 2021-11-29 | |
US63/283,930 | 2021-11-29 | ||
US63/283,936 | 2021-11-29 | ||
US202263305934P | 2022-02-02 | 2022-02-02 | |
US202263305872P | 2022-02-02 | 2022-02-02 | |
US63/305,872 | 2022-02-02 | ||
US63/305,934 | 2022-02-02 | ||
US202263390572P | 2022-07-19 | 2022-07-19 | |
US63/390,572 | 2022-07-19 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2023097325A2 (en) | 2023-06-01 |
WO2023097325A3 (en) | 2023-07-06 |
Family
ID=86540404
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2022/080553 WO2023097325A2 (en) | 2021-11-29 | 2022-11-29 | Systems and methods for identifying genetic phenotypes using programmable nucleases |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2023097325A2 (en) |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
IL269097B2 (en) * | 2012-09-04 | 2024-01-01 | Guardant Health Inc | Systems and methods to detect rare mutations and copy number variation |
WO2020142754A2 (en) * | 2019-01-04 | 2020-07-09 | Mammoth Biosciences, Inc. | Programmable nuclease improvements and compositions and methods for nucleic acid amplification and detection |
Also Published As
Publication number | Publication date |
---|---|
WO2023097325A3 (en) | 2023-07-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Kostyusheva et al. | CRISPR-Cas systems for diagnosing infectious diseases | |
US20230392141A1 (en) | Methods and compositions for analyzing nucleic acid | |
US20220119788A1 (en) | Programmable nuclease improvements and compositions and methods for nucleic acid amplification and detection | |
US20240102084A1 (en) | Compositions and methods for detection of a nucleic acid | |
US20230159992A1 (en) | High throughput single-chamber programmable nuclease assay | |
US20220364159A1 (en) | Compositions for detection of dna and methods of use thereof | |
EA027558B1 (en) | Process for multiplex nucleic acid identification | |
JP2010521156A (en) | System and method for detection of HIV drug resistant variants | |
No et al. | Comparison of targeted next-generation sequencing for whole-genome sequencing of Hantaan orthohantavirus in Apodemus agrarius lung tissues | |
WO2022133108A2 (en) | Methods and compositions for performing a detection assay | |
WO2023056451A1 (en) | Compositions and methods for assaying for and genotyping genetic variations | |
US20240191224A1 (en) | Programmable nucleases and methods of use | |
JP2023527850A (en) | programmable nuclease diagnostic device | |
Li et al. | Applications of the CRISPR-Cas system for infectious disease diagnostics | |
Nafea et al. | Application of next-generation sequencing to identify different pathogens | |
Zhang et al. | Diagnostic efficiency of RPA/RAA integrated CRISPR-Cas technique for COVID-19: A systematic review and meta-analysis | |
Qian et al. | Clustered regularly interspaced short palindromic Repeat/Cas12a mediated multiplexable and portable detection platform for GII genotype Porcine Epidemic Diarrhoea Virus Rapid diagnosis | |
US20240209421A1 (en) | Single-buffer compositions for nucleic acid detection | |
Wongpalee et al. | Highly specific and sensitive detection of Burkholderia pseudomallei genomic DNA by CRISPR-Cas12a | |
WO2023097325A2 (en) | Systems and methods for identifying genetic phenotypes using programmable nucleases | |
Zhou et al. | Sensitive and specific exonuclease III-assisted recombinase-aided amplification colorimetric assay for rapid detection of nucleic acids | |
Luo et al. | An isothermal CRISPR-based diagnostic assay for Neisseria gonorrhoeae and Chlamydia trachomatis detection | |
Zhang et al. | Visualized Genotyping from “Sample to Results” Within 25 Minutes by Coupling Recombinase Polymerase Amplification (RPA) With Allele-Specific Invasive Reaction Assisted Gold Nanoparticle Probes Assembling | |
Zhao et al. | A novel strategy for the detection of SARS-CoV-2 variants based on multiplex PCR-MALDI-TOF MS | |
WO2024040112A2 (en) | Signal amplification assays for nucleic acid detection |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22899584; Country of ref document: EP; Kind code of ref document: A2 |
 | NENP | Non-entry into the national phase | Ref country code: DE |