CN111383713B - ctDNA detection and analysis device and method - Google Patents

ctDNA detection and analysis device and method

Info

Publication number
CN111383713B
Authority
CN
China
Prior art keywords
mutation
variation
somatic mutation
ctdna
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811642911.9A
Other languages
Chinese (zh)
Other versions
CN111383713A (en
Inventor
于慧
张美俊
刘涛
李大为
玄兆伶
王海良
王娟
肖飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Annoroad Gene Technology Beijing Co ltd
Beijing Annoroad Medical Laboratory Co ltd
Original Assignee
Annoroad Gene Technology Beijing Co ltd
Anoroad Institute Of Life Science
Beijing Annoroad Medical Laboratory Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Annoroad Gene Technology Beijing Co ltd, Anoroad Institute Of Life Science, Beijing Annoroad Medical Laboratory Co ltd
Priority to CN201811642911.9A
Publication of CN111383713A
Application granted
Publication of CN111383713B


Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H70/00: ICT specially adapted for the handling or processing of medical references
    • G16H70/40: ICT specially adapted for the handling or processing of medical references relating to drugs, e.g. their side effects or intended usage
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/106: Pharmacogenomics, i.e. genetic variability in individual responses to drugs and drug metabolism
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/156: Polymorphic or mutational markers
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00: Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Public Health (AREA)
  • Pathology (AREA)
  • Organic Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Zoology (AREA)
  • Primary Health Care (AREA)
  • Biomedical Technology (AREA)
  • Epidemiology (AREA)
  • Analytical Chemistry (AREA)
  • Wood Science & Technology (AREA)
  • Genetics & Genomics (AREA)
  • Immunology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Hospice & Palliative Care (AREA)
  • Oncology (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Biotechnology (AREA)
  • Microbiology (AREA)
  • Molecular Biology (AREA)
  • Toxicology (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Medicinal Chemistry (AREA)
  • Biochemistry (AREA)
  • General Engineering & Computer Science (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The present invention relates to a ctDNA detection and analysis device comprising a data processing module, a somatic mutation acquisition module, and a somatic mutation screening module.

Description

ctDNA detection and analysis device and method
Technical Field
The invention relates to the field of tumor DNA analysis, and in particular to a ctDNA detection and analysis device and method.
Background
Molecular and cellular heterogeneity is a hallmark of cancer and one of the greatest challenges in tumor diagnosis and treatment. The genomic instability and phenotypic plasticity of cancer cells result in lesion-specific genomes. Precision treatment of tumors has long been a goal of both scientific research and clinical medicine. Liquid biopsy is a non-invasive, blood-based in vitro diagnostic method that can monitor circulating tumor cells (CTCs) and circulating tumor DNA (ctDNA) released into the bloodstream by tumors or metastases, and it is a breakthrough auxiliary technique for detecting tumors and supporting their treatment. Liquid biopsy addresses key obstacles to precision treatment: non-invasive sampling reduces the harm of biopsy, it can effectively prolong patient survival, and it is cost-effective.
Circulating cfDNA levels are higher in cancer patients than in healthy people. As tumor volume increases, cell turnover increases as well, and with it the number of apoptotic and necrotic cells. Most cfDNA fragments are between 180 and 200 bp long, which suggests that apoptosis produces most of the cfDNA in circulation. In healthy people, the concentration of plasma cell-free DNA (cfDNA) is between 1 and 10 ng/ml. Cell-free nucleic acid fragments in human plasma were first reported by Mandel and Métais in 1948; cfDNA levels in cancer patients were first reported in 1977; in 1989, Stroun et al. found that some of the cfDNA in the plasma of cancer patients was derived from cancer cells; and in 1994, KRAS gene mutations were first reported in the plasma cfDNA of pancreatic cancer patients. Mutations in cfDNA are highly specific markers of cancer, which gave rise to the term "circulating tumor DNA" (ctDNA).
ctDNA-based liquid biopsy can be applied clinically in the following ways. 1. Early screening and diagnosis of cancer: for symptomatic patients, sensitive and specific cancer detection can shorten the time to diagnosis and treatment; at the population level, tumor screening with ctDNA as a disease marker allows intervention before symptoms become apparent. 2. Assessment of prognosis, residual disease, and risk of recurrence: assessing the risk of progression is critical to selecting a treatment, and identifying patients at high risk of relapse after treatment helps stratify adjuvant therapy. 3. Treatment selection: novel molecular targeted therapies and immunotherapies have advanced the molecular typing and therapeutic stratification of patients. Tissue biopsy is currently the gold standard for tumor diagnosis, but tissue is not always readily available, and results are affected by tumor heterogeneity, which can lead to false negatives. 4. Treatment monitoring: traditional monitoring methods have limited precision, are costly, and carry radiation hazards. Liquid biopsy can be repeated continually with little risk to the patient while providing accurate tumor-related information.
At present, ctDNA detection is mainly used to detect somatic mutations at a single or a few known pathogenic sites in order to guide the diagnosis and treatment of patients. For patients who carry both somatic and germline mutations, often only a subset of sites is selected for analysis, which introduces subjective factors and biases the results, and there is no way to provide evaluation and guidance at the same time on the differing toxic reactions to drugs in populations of different genotypes.
Disclosure of Invention
To solve the above problems, the present invention provides a device and a method that combine next-generation sequencing with the analysis, evaluation, and interpretation of ctDNA mutation sites.
Specifically, the invention relates to the following items:
1. A ctDNA detection and analysis device, comprising:
a data processing module for filtering and aligning the sequencing data of the ctDNA sample and the control sample, screening out low-quality sequencing data and retaining high-quality sequencing data;
a somatic mutation acquisition module for comparing the high-quality sequencing data of the ctDNA sample and of the control sample against a human reference genome to obtain somatic mutation result information; and
a somatic mutation screening module for screening the obtained somatic mutation result information to obtain gene information related to the somatic mutation sites.
2. The ctDNA detection and analysis device according to item 1, further comprising, after the data processing module:
a germline variation acquisition module for comparing the sequencing data of the control sample against the human reference genome and filtering the result to obtain germline variation results; and
a genetic risk interpretation module for evaluating cancer risk information from the germline variation results.
3. The ctDNA detection and analysis device according to item 1 or 2, further comprising, after the somatic mutation screening module:
a targeted drug guidance module for screening, for the somatic mutation sites of genes, drugs applicable to those sites.
4. The ctDNA detection and analysis device according to item 2 or 3, further comprising, after the genetic risk interpretation module:
a drug metabolism module for evaluating the effect of drugs at specific germline variation sites of specific genes and determining the drug response in populations of different genotypes.
5. The ctDNA detection and analysis device according to any one of items 1 to 4, further comprising:
a sample collection module for collecting a ctDNA sample and a control sample from a subject and performing next-generation sequencing library construction and sequencing to obtain sequencing data,
wherein the control sample is a normal leukocyte sample from the subject who provides the ctDNA sample.
6. The ctDNA detection and analysis device according to any one of items 1 to 5, wherein the somatic mutations comprise single nucleotide variations and small insertion/deletion variations.
7. The ctDNA detection and analysis device according to any one of items 1 to 6, wherein the somatic mutation screening module comprises three submodules:
a somatic mutation site annotation submodule for annotating somatic mutation sites;
an annotation result screening submodule for screening the annotation results; and
a somatic mutation result integration submodule for integrating the somatic mutation results obtained by screening.
8. The ctDNA detection and analysis device according to item 2 or 7, wherein the annotation result screening submodule and the germline variation acquisition module apply the following filtering method to the annotated somatic mutation sites and the germline variation results:
retaining missense variations in exon regions or splice-site regions;
filtering out variant sites whose population frequency in the 1000 Genomes database is greater than 0.01, and retaining variant sites whose frequency in the 1000 Genomes database is less than 0.01; and
filtering out variant sites whose population frequency in the ExAC database is greater than 0.01, and retaining variant sites whose frequency in the ExAC database is less than 0.01.
9. The ctDNA detection and analysis device according to item 3, wherein the targeted drug guidance module uses the OncoKB database to annotate and interpret targeted drug metabolism information, and
further preferably, the targeted drug guidance module uses the detected somatic mutation information to evaluate medication information for the patient's specific mutation sites according to the FDA level classification of the targeted drugs, and issues a medication level classification.
10. The ctDNA detection and analysis device according to item 2, wherein the genetic risk interpretation module interprets the genetic risk of germline variation sites according to the ClinVar, BIC, and HGMD databases and gives a risk level for each site variation.
11. The ctDNA detection and analysis device according to item 4, wherein the drug metabolism module evaluates, according to the PharmGKB database and based on the genotype information produced by germline variation, the response of a patient of a given genotype to a drug, including toxicity and sensitivity.
12. A ctDNA detection and analysis method, comprising:
a data processing step of filtering and aligning the sequencing data of the ctDNA sample and the control sample, screening out low-quality sequencing data and retaining high-quality sequencing data;
a somatic mutation acquisition step of comparing the high-quality sequencing data of the ctDNA sample and of the control sample against a human reference genome to obtain somatic mutation result information; and
a somatic mutation screening step of screening the obtained somatic mutation result information to obtain gene information related to the somatic mutation sites.
13. The ctDNA detection and analysis method according to item 12, further comprising, after the data processing step:
a germline variation acquisition and filtering step of comparing the sequencing data of the control sample against the human reference genome and filtering the result to obtain germline variation results; and
a genetic risk interpretation step of evaluating cancer risk information from the germline variation results.
14. The ctDNA detection and analysis method according to item 12 or 13, further comprising, after the somatic mutation screening step:
a targeted drug guidance step of screening, for the somatic mutation sites of genes, drugs applicable to those sites.
15. The ctDNA detection and analysis method according to item 13 or 14, further comprising, after the genetic risk interpretation step:
a drug metabolism analysis step of evaluating the effect of drugs at specific germline variation sites of specific genes and determining the drug response in populations of different genotypes.
16. The ctDNA detection and analysis method according to any one of items 12 to 15, further comprising:
a sample collection step of collecting a ctDNA sample and a control sample from a subject and performing next-generation sequencing library construction and sequencing to obtain sequencing data,
wherein the control sample is a normal leukocyte sample from the subject who provides the ctDNA sample.
17. The ctDNA detection and analysis method according to any one of items 12 to 16, wherein the somatic mutations comprise single nucleotide variations and small insertion/deletion variations.
18. The ctDNA detection and analysis method according to any one of items 12 to 17, wherein the somatic mutation screening step comprises three sub-steps:
a somatic mutation site annotation sub-step of annotating somatic mutation sites;
an annotation result screening sub-step of screening the annotation results; and
a somatic mutation result integration sub-step of integrating the somatic mutation results obtained by screening.
19. The ctDNA detection and analysis method according to item 13 or 18, wherein, in the annotation result screening sub-step and in the germline variation acquisition and filtering step, the following filtering method is applied to the annotated somatic mutation sites and the germline variation results:
retaining missense variations in exon regions or splice-site regions;
filtering out variant sites whose population frequency in the 1000 Genomes database is greater than 0.01, and retaining variant sites whose frequency in the 1000 Genomes database is less than 0.01; and
filtering out variant sites whose population frequency in the ExAC database is greater than 0.01, and retaining variant sites whose frequency in the ExAC database is less than 0.01.
20. The ctDNA detection and analysis method according to item 14, wherein the targeted drug guidance step uses the OncoKB database to annotate and interpret targeted drug metabolism information, and
further preferably, the targeted drug guidance step uses the detected somatic mutation information to evaluate medication information for the patient's specific mutation sites according to the FDA level classification of the targeted drugs, and provides a medication level classification.
21. The ctDNA detection and analysis method according to item 13, wherein the genetic risk interpretation step interprets the genetic risk of germline variation sites according to the ClinVar, BIC, and HGMD databases and gives a risk level for each site variation.
22. The ctDNA detection and analysis method according to item 15, wherein the drug metabolism analysis step evaluates, according to the PharmGKB database and based on the genotype information produced by germline variation, the response of a patient of a given genotype to a drug, including toxicity and sensitivity.
Advantageous Effects of Invention
All variations within the designed target region are considered, which avoids the bias and omissions caused by personal subjective factors. All variations in the target region are analyzed at one time, which plays an important role in guiding medication for individuals of different genotypes.
Drawings
FIG. 1 shows a ctDNA detection and analysis device according to an embodiment of the present invention.
FIG. 2 shows a ctDNA detection and analysis device according to another embodiment of the present invention.
FIG. 3 shows a ctDNA detection and analysis device according to still another embodiment of the present invention.
FIG. 4 shows a ctDNA detection and analysis device according to still another embodiment of the present invention.
FIG. 5 shows a ctDNA detection and analysis device according to another embodiment of the present invention.
The present invention will be described in further detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The genetic mutations referred to in the present invention include somatic mutations and germline mutations. A germline mutation is a genetic mutation inherited from a parent and is present in all cells of the body (both normal cells and cancer cells); a somatic mutation generally refers to a mutation that arises in the cancer cells themselves and is not present in normal cells. In short, acquired mutations are somatic mutations, and only germline mutations are passed on to the next generation.
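As an illustration only, the distinction drawn above can be sketched in Python as a lookup on where a variant was observed. The function name and the boolean inputs are hypothetical; a real variant caller decides presence statistically rather than with simple flags, and the matched control is assumed here to be the subject's normal leukocyte sample.

```python
def classify_origin(in_ctdna: bool, in_control: bool) -> str:
    """Label a variant by the samples in which it was observed.

    in_ctdna   -- the variant was seen in the plasma ctDNA sample
    in_control -- the variant was seen in the matched normal control
    """
    if in_control:
        # Present in normal cells, so it is carried by every cell: germline.
        return "germline"
    if in_ctdna:
        # Absent from normal cells but present in tumor-derived DNA: somatic.
        return "somatic"
    return "not_detected"
```

For example, a variant seen in both samples is labeled germline, while one seen only in the ctDNA sample is labeled somatic.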
The present invention relates to single nucleotide polymorphisms (SNPs) and single nucleotide variants (SNVs). An SNP is a polymorphism caused by a single-nucleotide variation (substitution, insertion, or deletion) at the same position in the genomic DNA sequence of different individuals, that is, the phenomenon in which a single nucleotide at the same genomic position differs between individuals. Loci and DNA sequences that differ in this way can be used as markers for genome mapping. On average, about 1 in every 1000 nucleotides in the human genome may show a single nucleotide polymorphism; some of these may be associated with disease, although most are probably not. Single nucleotide polymorphisms are an important basis for studying genetic variation in human families and in animal and plant lines. In studies of genomic variation in cancer, a single nucleotide variation that is specific to the cancer relative to normal tissue is a somatic mutation (SNV).
The small insertion/deletion variation (InDel) in the present invention refers to the insertion or deletion of a small fragment (< 50 bp) in the genome.
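A minimal Python sketch of how the two variant types discussed above might be told apart from reference and alternate alleles is given below. It assumes VCF-style alleles that share a leading anchor base, and it treats an InDel as small when fewer than 50 bases are inserted or deleted, which is the usual convention and an assumption of this sketch. The function and constant names are hypothetical.

```python
MAX_SMALL_INDEL_BP = 50  # assumed upper bound (exclusive) for a "small" InDel


def variant_type(ref: str, alt: str) -> str:
    """Classify a variant as SNV, small insertion, small deletion, or other."""
    if len(ref) == 1 and len(alt) == 1 and ref != alt:
        # One base replaced by another base.
        return "SNV"
    size = abs(len(ref) - len(alt))  # number of bases gained or lost
    if 0 < size < MAX_SMALL_INDEL_BP:
        return "insertion" if len(alt) > len(ref) else "deletion"
    # Identical alleles, equal-length multi-base changes, or larger events.
    return "other"
```

Under these assumptions, `variant_type("A", "ATG")` is a two-base insertion and `variant_type("ATG", "A")` is a two-base deletion.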
ctDNA in the present invention refers to circulating tumor DNA, that is, somatic DNA of tumor cells that is released into the circulatory system after the cells are shed or undergo apoptosis; it is a characteristic tumor biomarker. Traces of a tumor in the blood can be detected through ctDNA detection. ctDNA is extracellular DNA in a cell-free state that is present in body fluids such as blood, synovial fluid, and cerebrospinal fluid; it consists mainly of single-stranded DNA, double-stranded DNA, or a mixture of the two, and it exists as DNA-protein complexes or as free DNA.
The white blood cells (WBCs) in the present invention are colorless, spherical, nucleated blood cells. The total count in normal adults is (4.0-10.0) × 10^9/L, and it can vary within a certain range with the time of day and the functional state of the body. Leukocytes are generally actively motile and can migrate from inside a blood vessel to outside it, or from outside to inside. White blood cells are therefore present not only in blood and lymph but also in tissues outside the blood vessels and lymphatic vessels.
In a specific embodiment of the present invention, as shown in FIG. 1, the present invention relates to a ctDNA detection and analysis device comprising: a data processing module for filtering and aligning the sequencing data of the ctDNA sample and the control sample, screening out low-quality sequencing data and retaining high-quality sequencing data; a somatic mutation acquisition module for comparing the high-quality sequencing data of the ctDNA sample and of the control sample against a human reference genome to obtain somatic mutation result information; and a somatic mutation screening module for screening the obtained somatic mutation result information to obtain gene information related to the somatic mutation sites.
In a specific embodiment of the present invention, as shown in FIG. 2, the present invention relates to a ctDNA detection and analysis device comprising: a sample collection module for collecting a ctDNA sample and a control sample from a subject and performing next-generation sequencing library construction and sequencing to obtain sequencing data; a data processing module for filtering and aligning the sequencing data of the ctDNA sample and the control sample, screening out low-quality sequencing data and retaining high-quality sequencing data; a somatic mutation acquisition module for comparing the high-quality sequencing data of the ctDNA sample and of the control sample against a human reference genome to obtain somatic mutation result information; and a somatic mutation screening module for screening the obtained somatic mutation result information to obtain gene information related to the somatic mutation sites.
In a specific embodiment of the present invention, as shown in FIG. 3, the present invention relates to a ctDNA detection and analysis device comprising: a sample collection module for collecting a ctDNA sample and a control sample from a subject and performing next-generation sequencing library construction and sequencing to obtain sequencing data; a data processing module for filtering and aligning the sequencing data of the ctDNA sample and the control sample, screening out low-quality sequencing data and retaining high-quality sequencing data; a somatic mutation acquisition module for comparing the high-quality sequencing data of the ctDNA sample and of the control sample against a human reference genome to obtain somatic mutation result information; a somatic mutation screening module for screening the obtained somatic mutation result information to obtain gene information related to the somatic mutation sites; and a targeted drug guidance module for screening, for the somatic mutation sites of genes, drugs applicable to those sites.
In a specific embodiment of the present invention, as shown in FIG. 4, the present invention relates to a ctDNA detection and analysis device comprising: a data processing module for filtering and aligning the sequencing data of the ctDNA sample and the control sample, screening out low-quality sequencing data and retaining high-quality sequencing data; a somatic mutation acquisition module for comparing the high-quality sequencing data of the ctDNA sample and of the control sample against a human reference genome to obtain somatic mutation result information; a somatic mutation screening module for screening the obtained somatic mutation result information to obtain gene information related to the somatic mutation sites; a germline variation acquisition module for comparing the sequencing data of the control sample against the human reference genome and filtering the result to obtain germline variation results; and a genetic risk interpretation module for evaluating cancer risk information from the germline variation results.
In a specific embodiment of the present invention, as shown in FIG. 5, the present invention relates to a ctDNA detection and analysis device comprising: a sample collection module for collecting a ctDNA sample and a control sample from a subject and performing next-generation sequencing library construction and sequencing to obtain sequencing data; a data processing module for filtering and aligning the sequencing data of the ctDNA sample and the control sample, screening out low-quality sequencing data and retaining high-quality sequencing data; a somatic mutation acquisition module for comparing the high-quality sequencing data of the ctDNA sample and of the control sample against a human reference genome to obtain somatic mutation result information; a somatic mutation screening module for screening the obtained somatic mutation result information to obtain gene information related to the somatic mutation sites; a targeted drug guidance module for screening, for the somatic mutation sites of genes, drugs applicable to those sites; a germline variation acquisition module for comparing the sequencing data of the control sample against the human reference genome and filtering the result to obtain germline variation results; a genetic risk interpretation module for evaluating cancer risk information from the germline variation results; and a drug metabolism module for evaluating the effect of drugs at specific germline variation sites of specific genes and determining the drug response in populations of different genotypes.
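One way to picture how the modules of the fullest embodiment depend on one another is as a small dependency graph. The Python sketch below is illustrative only: the module identifiers are hypothetical names chosen for this example, and the ordering is derived with the standard-library `graphlib` module (Python 3.9 or later) rather than by anything described in the invention.

```python
from graphlib import TopologicalSorter

# Each module maps to the modules whose output it consumes (hypothetical names).
MODULE_INPUTS = {
    "sample_collection": [],
    "data_processing": ["sample_collection"],
    "somatic_mutation_acquisition": ["data_processing"],
    "somatic_mutation_screening": ["somatic_mutation_acquisition"],
    "targeted_drug_guidance": ["somatic_mutation_screening"],
    "germline_variation_acquisition": ["data_processing"],
    "genetic_risk_interpretation": ["germline_variation_acquisition"],
    "drug_metabolism": ["genetic_risk_interpretation"],
}


def run_order(module_inputs=MODULE_INPUTS):
    """Return one valid execution order in which every module follows its inputs."""
    return list(TopologicalSorter(module_inputs).static_order())
```

The graph makes the two branches explicit: the somatic branch ends in targeted drug guidance, and the germline branch ends in drug metabolism, with both starting from the data processing module.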
In a specific embodiment of the invention, the sample collection module collects a ctDNA sample and a control sample from the subject and performs next-generation sequencing library construction and sequencing to obtain sequencing data. The control sample is a normal leukocyte sample from the subject who provides the ctDNA sample.
In a specific embodiment of the invention, in the data processing module, the constructed DNA libraries of the ctDNA sample and of the control-sample leukocytes are sequenced on an Illumina sequencing platform with paired-end 150 bp reads (PE150). Raw sequencing data are stored in fq (FASTQ) format. Before the data are used for subsequent analysis, they are filtered to screen out low-quality sequencing data and retain high-quality sequencing data. The filtering steps are as follows:
(1) Remove reads contaminated by adapters (more than 5 bp of adapter sequence in the read; for paired-end sequencing, if one end is contaminated by adapters, the reads at both ends are removed);
(2) Remove low-quality reads (bases with quality value Q ≤ 19 account for more than 15% of the bases in the read; for paired-end sequencing, if one end is a low-quality read, the reads at both ends are removed);
(3) Remove reads with an N content greater than 5% (for paired-end sequencing, if the N content at one end is greater than 5%, the reads at both ends are removed).
The clear Data obtained after filtering can be used for subsequent analysis. Clear Data was first aligned to the ginseng genome hg19 by BWA, and then the Data was sorted and deduplicated with samtools and Picard.
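The quality and N-content rules (2) and (3) above can be sketched in a few lines of Python. This is only an illustrative sketch, not the filtering tool used by the inventors: the function names are hypothetical, Phred+33 quality encoding is assumed, and the adapter rule (1) is left out because it depends on the adapter sequences, which the text does not give.

```python
PHRED_OFFSET = 33  # assumption: Phred+33 (Sanger / Illumina 1.8+) quality encoding


def read_passes(seq: str, qual: str) -> bool:
    """Apply rule (2) (low-quality bases) and rule (3) (N content) to one read."""
    n = len(seq)
    if n == 0:
        return False
    low_q = sum(1 for ch in qual if ord(ch) - PHRED_OFFSET <= 19)
    n_count = seq.upper().count("N")
    # Integer comparisons avoid floating-point edge cases at exactly 15% and 5%.
    if low_q * 100 > 15 * n:      # more than 15% of bases have Q <= 19
        return False
    if n_count * 100 > 5 * n:     # more than 5% of bases are N
        return False
    return True


def pair_passes(read1, read2) -> bool:
    """Keep a read pair only if both mates pass; one failing mate removes both."""
    return read_passes(*read1) and read_passes(*read2)


good = ("ACGT" * 5, "I" * 20)               # all bases at Q40
low = ("ACGT" * 5, "4" * 4 + "I" * 16)      # 4 of 20 bases (20%) at Q19
print(pair_passes(good, good), pair_passes(good, low))
```

The pair-level function mirrors the parenthetical remark in each rule: a failure at either end discards both ends.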
In a specific embodiment of the present invention, in the somatic mutation acquisition module, the high-quality variant sequencing data of the ctDNA and of the control sample are compared with the human reference genome to obtain somatic mutation result information. Specifically, GATK (Genome Analysis ToolKit) is software for analyzing variant information from high-throughput sequencing data and is currently one of the most widely used SNP calling tools. The alignment results of the ctDNA sample and of the control-sample white blood cells obtained in step S2 are used as input files, together with the FASTA (fa) file of the human reference genome hg19 and a BED file of the target region; the BED file contains the positions on the reference genome of the genes to be tested. Somatic mutation results are detected directly with the Mutect2 module of GATK. In this way, the high-quality variant sequencing data of the ctDNA and of the control sample are compared with the human reference genome to obtain somatic mutation result information.
In a specific embodiment of the invention, the somatic variation mainly comprises single nucleotide variations and small insertion/deletion variations.
In a specific embodiment of the invention, the somatic mutation screening module comprises three sub-modules: a somatic mutation site annotation sub-module for annotating somatic mutation sites; an annotation result screening sub-module for screening the annotation results; and a somatic mutation result integration sub-module for integrating the somatic mutation results obtained by screening. In the annotation result screening sub-module, annotated somatic mutation sites are filtered as follows: missense variants in an exon region or a splice-site region are retained; variant sites with a population frequency greater than 0.01 in the 1000 Genomes database are filtered out, and variant sites with a frequency lower than 0.01 in that database are retained; and variant sites with a population frequency greater than 0.01 in the ExAC database are filtered out, and variant sites with a frequency lower than 0.01 in the ExAC database are retained.
In a specific embodiment of the invention, in the somatic mutation screening module, low-quality somatic mutation sites are screened out and high-quality somatic mutation sites are retained, and all subsequent interpretation of mutation sites uses the screened mutation site information. Low-quality variant sites are screened mainly on the sequencing depth and detection quality value of each site, using GATK4 software with its default parameters.
First, in the somatic mutation site annotation sub-module, the somatic mutation sites are annotated. To make the annotation comprehensive, annotation is performed against both the NCBI and UCSC databases in this sub-module, determining the gene in which each mutation site is located, its specific position within the gene, and the effect of the mutation on protein coding. Preferably, several databases are also used to annotate different aspects, for example disease-related databases such as COSMIC and HGMD, in order to obtain more comprehensive genetic information.
Next, in the annotation result screening sub-module, the annotation results are further filtered. The annotated somatic mutation sites are screened as follows: 1) retain missense variants in an exon region (exonic) or a splice-site region (splicing, 2 bp upstream of the splice site); 2) filter out variant sites with a population frequency greater than 0.01 in the 1000 Genomes database and retain variant sites with a frequency lower than 0.01 in that database; 3) filter out variant sites with a population frequency greater than 0.01 in the ExAC database and retain variant sites with a frequency lower than 0.01 in the ExAC database.
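The three screening rules above amount to a simple predicate on each annotated variant. The sketch below is illustrative only and rests on several assumptions that the text does not settle: the dictionary keys are hypothetical, splice-site variants are kept regardless of their coding effect, a variant absent from a database (frequency `None`) counts as rare, and a frequency of exactly 0.01 is retained.

```python
from typing import Optional

MAX_POPULATION_FREQ = 0.01  # threshold stated in the text


def is_rare(freq: Optional[float]) -> bool:
    """True if the variant is absent from the database or not above the threshold."""
    # Assumption: None means "not found in the database" and is treated as rare.
    return freq is None or freq <= MAX_POPULATION_FREQ


def keep_somatic_variant(variant: dict) -> bool:
    """Rule 1 (region / function) combined with rules 2 and 3 (population frequency)."""
    region = variant.get("region")
    if region == "splicing":
        functional = True  # assumption: splice-site variants need no missense label
    elif region == "exonic":
        functional = variant.get("exonic_func") == "missense"
    else:
        functional = False
    return (
        functional
        and is_rare(variant.get("freq_1000g"))
        and is_rare(variant.get("freq_exac"))
    )


candidates = [
    {"region": "exonic", "exonic_func": "missense", "freq_1000g": None, "freq_exac": 0.0002},
    {"region": "exonic", "exonic_func": "missense", "freq_1000g": 0.05, "freq_exac": 0.04},
    {"region": "intronic", "exonic_func": None, "freq_1000g": None, "freq_exac": None},
]
kept = [v for v in candidates if keep_somatic_variant(v)]
print(len(kept))
```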
Then, in the somatic mutation result integration sub-module, the somatic mutation results obtained by screening are integrated, with one mutation site per row. The first columns record the chromosome on which the mutation is located, the mutation start position, the mutation end position, the reference genome base, the mutated base, the gene name, the gene region, the cDNA change and the protein (peptide) change; this information is used to locate the mutation.
In a specific embodiment of the present invention, in the germline variation acquisition module, germline variation results are filtered as follows: missense variants in an exon region or a splice-site region are retained; variant sites with a population frequency greater than 0.01 in the 1000 Genomes database are filtered out, and variant sites with a frequency lower than 0.01 in that database are retained; and variant sites with a population frequency greater than 0.01 in the ExAC database are filtered out, and variant sites with a frequency lower than 0.01 in the ExAC database are retained. In this module, germline variation is likewise detected with GATK software, using only the sequencing result of the control-sample white blood cells as the input file, together with the FASTA (fa) file of the human reference genome hg19 and a BED file of the target region; the BED file contains the positions on the reference genome of the genes to be tested.
In a specific embodiment of the present invention, the targeted drug guidance module uses the OncoKB database to annotate and interpret targeted drug information. Further preferably, the targeted drug guidance module evaluates, from the detected somatic mutation information and according to the FDA classification level of the targeted drug, the medication information for the patient's specific mutation site, and gives a drug level classification. Specifically, the module screens out the drugs currently used for a specific mutation site of a gene in order to guide clinical medication. The OncoKB database can be used to match the mutation information of gene loci against targeted drug guidance and so compile the targeted drug guidance information for the somatic mutations of a sample.
In a specific embodiment of the invention, the genetic risk interpretation module interprets the genetic risk of germline variation sites according to the ClinVar, BIC and HGMD databases and gives a risk rating for each site variation. Specifically, based on the filtered germline variants, ANNOVAR is used to match the cancer risk information in the ClinVar, HGMD and BIC databases; the population databases are consulted to match the population frequencies; and a cancer risk rating is assigned to each germline variant.
In a specific embodiment of the invention, the drug metabolism module evaluates, based on the PharmGKB database and the genotype information arising from germline variation, the response (including toxicity and sensitivity) of a patient with a given genotype to a drug. Specifically, based on the filtered germline variants, the PharmGKB database is consulted through ANNOVAR to match information such as the drugs associated with the site in the database and the metabolism of, and response to, those drugs in patients with different genotypes.
In a specific embodiment of the present invention, the present invention relates to a ctDNA detection and analysis method. Specifically, the method comprises the following steps:
Step S1: sample information collection. This step collects a ctDNA sample and a control sample and performs next-generation sequencing library construction and sequencing to obtain sequencing data. The samples typically comprise a ctDNA sample from the subject and a control white blood cell sample; both are subjected to library construction and sequencing to obtain their sequencing data. The control white blood cell sample is preferably a normal white blood cell sample from the subject.
Step S2: data processing. This step filters and aligns the sequencing data of the ctDNA and control samples, screening out low-quality variant sequencing data and retaining high-quality variant sequencing data.
Specifically, the constructed DNA libraries of the ctDNA and of the control-sample white blood cells are sequenced on an Illumina sequencing platform with paired-end 150 bp reads (PE150). Raw sequencing data are stored in FASTQ (fq) format. Before the data are used for subsequent analysis, they are filtered to screen out low-quality variant sequencing data and retain high-quality variant sequencing data. The filtering steps are as follows:
(1) Remove reads contaminated by adapters (more than 5 bp of adapter-derived bases in a read; for paired-end sequencing, if one end is contaminated by adapters, the reads at both ends are removed);
(2) Remove low-quality reads (bases with quality value Q ≤ 19 account for more than 15% of all bases in the read; for paired-end sequencing, if one end is a low-quality read, the reads at both ends are removed);
(3) Remove reads with an N content greater than 5% (for paired-end sequencing, if the N content at one end is greater than 5%, the reads at both ends are removed).
The Clean Data obtained after filtering can be used for subsequent analysis. The Clean Data are first aligned to the human reference genome hg19 with BWA, and the aligned data are then sorted and deduplicated with samtools and Picard.
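As a rough illustration of the alignment, sorting and deduplication chain just described, the sketch below only assembles the command lines and does not run them. The text names the tools but not their options, so everything else here is an assumption: the `bwa mem` subcommand, the use of `samtools sort` for sorting and Picard `MarkDuplicates` for deduplication, the `picard` launcher name, and all file names.

```python
def alignment_commands(ref_fa: str, fq1: str, fq2: str, prefix: str):
    """Return the three command lines (as argument lists) for one sample."""
    sam = f"{prefix}.sam"
    sorted_bam = f"{prefix}.sorted.bam"
    dedup_bam = f"{prefix}.dedup.bam"
    return [
        # bwa mem writes SAM to standard output; the caller redirects it to `sam`.
        ["bwa", "mem", ref_fa, fq1, fq2],
        # Coordinate-sort the alignments.
        ["samtools", "sort", "-o", sorted_bam, sam],
        # Mark duplicate read pairs; whether duplicates are also removed
        # is not stated in the text, so no removal option is set here.
        ["picard", "MarkDuplicates", f"I={sorted_bam}", f"O={dedup_bam}",
         f"M={prefix}.dup_metrics.txt"],
    ]


commands = alignment_commands("hg19.fa", "ctDNA_1.fq", "ctDNA_2.fq", "ctDNA")
for cmd in commands:
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run`, with the first command's standard output redirected to the `.sam` file.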
Step S3: somatic mutation acquisition. This step compares the high-quality variant sequencing data of the ctDNA and of the control sample with the human reference genome to obtain somatic mutation result information.
Specifically, GATK (Genome Analysis ToolKit) is software for analyzing variant information from high-throughput sequencing data and is currently one of the most widely used SNP calling tools. The alignment results of the ctDNA sample and of the control-sample white blood cells obtained in step S2 are used as input files, together with the FASTA (fa) file of the human reference genome hg19 and a BED file of the target region; the BED file contains the positions on the reference genome of the genes to be tested. Somatic mutation results are detected directly with the Mutect2 module of GATK. In this way, the high-quality variant sequencing data of the ctDNA and of the control sample are compared with the human reference genome to obtain somatic mutation result information.
The somatic mutation information mainly comprises single nucleotide variations (SNV) and small insertion/deletion variations (InDel).
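A tumor/normal Mutect2 call with the inputs listed above might be assembled as follows. This is a sketch that assumes the GATK4 command-line interface (`gatk Mutect2` with `-R`, `-I`, `-normal`, `-L`, `-O`); the file names and the normal sample name are hypothetical, and some GATK4 releases additionally expect the tumor sample name to be given.

```python
import shlex


def mutect2_command(ref_fa, ctdna_bam, normal_bam, normal_sample, target_bed, out_vcf):
    """Build a GATK4 Mutect2 command for one ctDNA / white-blood-cell pair."""
    return [
        "gatk", "Mutect2",
        "-R", ref_fa,             # hg19 reference FASTA
        "-I", ctdna_bam,          # ctDNA (tumor) alignments
        "-I", normal_bam,         # control white-blood-cell alignments
        "-normal", normal_sample, # sample name of the control in its BAM header
        "-L", target_bed,         # restrict calling to the captured target region
        "-O", out_vcf,
    ]


cmd = mutect2_command(
    "hg19.fa", "ctDNA.dedup.bam", "WBC.dedup.bam", "WBC", "targets.bed", "somatic.vcf.gz"
)
print(" ".join(shlex.quote(arg) for arg in cmd))
```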
Step S4: somatic mutation screening. This step screens the obtained somatic mutation result information to obtain gene information related to the somatic mutation sites. Low-quality somatic mutation sites are screened out and high-quality somatic mutation sites are retained, and all subsequent interpretation of mutation sites uses the screened mutation site information. Low-quality variant sites are screened mainly on the sequencing depth and detection quality value of each site, using GATK4 software with its default parameters.
First, in step S4-1, the somatic mutation sites are annotated. To make the annotation comprehensive, annotation is performed against both the NCBI and UCSC databases in this step, determining the gene in which each mutation site is located, its specific position within the gene, and the effect of the mutation on protein coding. Preferably, several databases are also used to annotate different aspects, for example disease-related databases such as COSMIC and HGMD, in order to obtain more comprehensive genetic information.
Next, in step S4-2, the annotation results are further filtered. The annotated somatic mutation sites are screened as follows:
1) Retain missense variants in an exon region (exonic) or a splice-site region (splicing, 2 bp upstream of the splice site);
2) Filter out variant sites with a population frequency greater than 0.01 in the 1000 Genomes database, and retain variant sites with a frequency lower than 0.01 in that database;
3) Filter out variant sites with a population frequency greater than 0.01 in the ExAC database, and retain variant sites with a frequency lower than 0.01 in the ExAC database.
Then, in step S4-3, the somatic mutation results obtained by screening are integrated, with one mutation site per row. The first columns record the chromosome on which the mutation is located, the mutation start position, the mutation end position, the reference genome base, the mutated base, the gene name, the gene region, the cDNA change and the protein (peptide) change; this information is used to locate the mutation.
Step S5: targeted drug guidance. This step screens, for the somatic mutation sites of genes, the drugs applicable to those sites.
Specifically, this step screens out the drugs currently used for a specific mutation site of a gene in order to guide clinical medication. The OncoKB database can be used to match the mutation information of gene loci against targeted drug guidance and so compile the targeted drug guidance information for the somatic mutations of a sample.
The step further comprises evaluating, from the detected somatic mutation information and according to the FDA classification level of the targeted drug, the medication information for the patient's specific mutation site, and giving a drug level classification.
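The matching of a detected alteration against targeted drug guidance can be pictured as a keyed lookup. The sketch below uses a tiny in-memory table, not the OncoKB database or its interface; the single entry is the EGFR example reported in Experiment 7 of this document, and the function and variable names are hypothetical.

```python
# Hypothetical stand-in for a drug-guidance knowledge base, keyed by
# (gene name, protein alteration). Only the EGFR example from Experiment 7 is listed.
TARGETED_DRUGS = {
    ("EGFR", "L858R"): ["erlotinib", "afatinib", "gefitinib"],
}


def drugs_for(gene: str, alteration: str) -> list:
    """Return the targeted drugs recorded for a somatic alteration, or an empty list."""
    return list(TARGETED_DRUGS.get((gene.upper(), alteration), []))


detected = [("EGFR", "L858R"), ("TMEM51", "L24P")]
report = {site: drugs_for(*site) for site in detected}
print(report)
```

Sites with no entry simply yield an empty list, so they can be reported as having no targeted drug guidance in the table.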
Step S6: germline variation acquisition. This step also uses the high-quality sequencing data obtained in steps S1 and S2, but only the sequencing data of the control-sample white blood cells, preferably the subject's own white blood cells. Germline variation is likewise detected with GATK software, using only the sequencing result of the control-sample white blood cells as the input file, together with the FASTA (fa) file of the human reference genome hg19 and a BED file of the target region; the BED file contains the positions on the reference genome of the genes to be tested.
Specifically, germline variation is identified from the alignment result of the normal sample in the cancer analysis project, and the germline variants are then filtered.
The specific filtering method is as follows:
1) Retain missense variants in an exon region (exonic) or a splice-site region (splicing, 2 bp upstream of the splice site);
2) Filter out variant sites with a frequency greater than 0.01 in the 1000 Genomes database (in all populations and in the Asian population), and retain variant sites with a frequency lower than 0.01 in that database;
3) Filter out variant sites with a frequency greater than 0.01 in the ExAC database (in all populations and in the Asian population), and retain variant sites with a frequency lower than 0.01 in the ExAC database;
4) The detected germline variant sites are filtered according to the above criteria.
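Because the germline frequency rules are checked in both the overall and the Asian population of each database, one way to express them is to require every available frequency to stay at or below the threshold. The following is a minimal sketch under stated assumptions: the key names are hypothetical, a missing value (`None`) means the variant is absent from that database, and a frequency of exactly 0.01 is retained.

```python
def passes_population_filter(freqs: dict, threshold: float = 0.01) -> bool:
    """True if no recorded population frequency exceeds the threshold."""
    observed = [f for f in freqs.values() if f is not None]
    return all(f <= threshold for f in observed)


rare_everywhere = {"1000g_all": 0.001, "1000g_asian": 0.004,
                   "exac_all": None, "exac_asian": 0.002}
common_in_asia = {"1000g_all": 0.004, "1000g_asian": 0.03,
                  "exac_all": 0.003, "exac_asian": 0.02}

print(passes_population_filter(rare_everywhere))
print(passes_population_filter(common_in_asia))
```

Under this reading, a variant that is rare overall but common in the Asian population is filtered out.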
Step S7: genetic risk interpretation. This step evaluates the cancer risk information of the germline variant sites.
Specifically, based on the filtered germline variants, ANNOVAR is used to match the cancer risk information in the ClinVar, HGMD and BIC databases; the population databases are consulted to match the population frequencies; and a cancer risk rating is assigned to each germline variant.
Step S8: drug metabolism analysis. This step evaluates drug efficacy at specific variation sites of specific genes and drug response in populations with different genotypes.
Based on the filtered germline variants, the PharmGKB database is consulted through ANNOVAR to match information such as the drugs associated with the site in the database and the metabolism of, and response to, those drugs in patients with different genotypes.
In the prior art, the conventional analysis strategy for next-generation sequencing data is: filtering and alignment of sequencing data, variant detection, ANNOVAR annotation, and then other analyses. To connect seamlessly with this analysis workflow, the invention starts from the detection results for ctDNA somatic variation and germline variation, and evaluates targeted drug use, cancer genetic risk, and the metabolism of chemotherapy and targeted drugs.
Examples
Experiment 1: sample collection
ctDNA and blood white blood cell samples were collected from two different cases, and DNA was extracted and libraries were constructed for next-generation sequencing analysis. The DNA extraction and library construction methods are those commonly used in the art, and a customized hybridization chip was used to capture the target region sequences of the candidate genes. First, the purity, concentration and volume of the DNA were measured. For DNA meeting the quality requirements, a genomic sequencing library was constructed, yielding fragmented genomic sequences with sequencing adapters at both ends. The genomic DNA library was hybridized with the customized liquid-phase chip, so that target genomic DNA fragments bound to biotin-labeled oligonucleotide probes by complementary base pairing to form hybridization complexes. Genomic fragments not bound to the liquid-phase chip probes were washed away, and the captured fragments were purified and amplified by PCR. Finally, the purity and concentration of the library after capture hybridization were measured.
Experiment 2: sequencing data processing
The constructed DNA library was sequenced on an Illumina sequencing platform with paired-end 150 bp reads (PE150). Raw sequencing data were stored in FASTQ (fq) format and, before being used for analysis, were filtered under the following conditions:
(1) Remove reads contaminated by adapters (more than 5 bp of adapter-derived bases in a read; for paired-end sequencing, if one end is contaminated by adapters, the reads at both ends are removed);
(2) Remove low-quality reads (bases with quality value Q ≤ 19 account for more than 15% of all bases in the read; for paired-end sequencing, if one end is a low-quality read, the reads at both ends are removed);
(3) Remove reads with an N content greater than 5% (for paired-end sequencing, if the N content at one end is greater than 5%, the reads at both ends are removed).
The Clean Data obtained after filtering can be used for subsequent analysis. The Clean Data were first aligned to the human reference genome hg19 with BWA, and the aligned data were then sorted and deduplicated with samtools and Picard.
Experiment 3: somatic mutation acquisition
GATK (Genome Analysis ToolKit) is software for analyzing variant information from high-throughput sequencing data and is currently one of the most widely used SNP calling tools. The alignment result files of the ctDNA sample and the control sample obtained in Experiment 2 were used as input files, together with the FASTA (fa) file of the human reference genome hg19 and a BED file of the target region; the BED file contains the positions on the reference genome of the genes to be tested. Somatic mutation results were detected directly with the Mutect2 module of GATK.
Experiment 4: germ line variation acquisition
Germline variation was likewise detected with GATK software, using only the alignment result file of the control sample (that is, the blood white blood cell sample) as the input file, together with the FASTA (fa) file of the human reference genome hg19 and a BED file of the target region; the BED file contains the positions on the reference genome of the genes to be tested. Germline variation results were detected directly with the HaplotypeCaller module of GATK.
Experiment 5: somatic mutation screening
Low-quality somatic variant sites were screened out with the FilterMutectCalls module of GATK, and low-quality germline variant sites were screened out with the VariantFiltration module of GATK. Default parameters were used throughout; filtering is based mainly on the depth, quality value and frequency of the variant sites.
Experiment 6: somatic mutation annotation screening
Somatic mutations were annotated with the ANNOVAR annotation program. ANNOVAR is software that uses up-to-date database information to functionally annotate variant sites, and its annotation covers three aspects: gene-based, region-based and filter-based annotation. Gene-based annotation mainly records the name of the gene in which the variant is located, whether the encoded protein is affected, and the position of the affected amino acid, for example annotation against the RefGene database. Region-based annotation mainly records the genomic functional region in which the variant is located, such as gene, exon, UTR or transcription factor binding site, for example annotation against the tfbsConsSites database. Filter-based annotation mainly records whether the variant appears in certain commonly used databases, together with its conservation and pathogenicity, for example annotation against the 1000 Genomes, dbSNP and dbNSFP databases.
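The three kinds of annotation map onto ANNOVAR's operation codes `g` (gene-based), `r` (region-based) and `f` (filter-based), which `table_annovar.pl` takes alongside a matching list of database names. The sketch below only builds such a command line; the specific database versions (`1000g2015aug_all`, `exac03`) and all file and directory names are assumptions, since the text does not state which releases were used.

```python
# (database name, ANNOVAR operation code); versions are illustrative assumptions.
PROTOCOLS = (
    ("refGene", "g"),           # gene-based: gene name, coding effect, amino acid change
    ("tfbsConsSites", "r"),     # region-based: conserved transcription factor binding sites
    ("1000g2015aug_all", "f"),  # filter-based: 1000 Genomes allele frequency
    ("exac03", "f"),            # filter-based: ExAC allele frequency
)


def table_annovar_command(avinput, db_dir, out_prefix, protocols=PROTOCOLS):
    """Build a table_annovar.pl command annotating against hg19."""
    names = ",".join(name for name, _ in protocols)
    operations = ",".join(op for _, op in protocols)
    return [
        "table_annovar.pl", avinput, db_dir,
        "-buildver", "hg19",
        "-out", out_prefix,
        "-remove",
        "-protocol", names,
        "-operation", operations,
        "-nastring", ".",
    ]


annovar_cmd = table_annovar_command("somatic.avinput", "humandb/", "somatic_anno")
print(" ".join(annovar_cmd))
```

Keeping each database paired with its operation code in one tuple avoids the two comma-separated lists drifting out of step when a database is added or removed.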
After annotation is completed, the module sorts the annotation results and extracts the key information for each variant site; the results are shown in Table 1 below. The detected variant site is located at position 15541654 on chromosome 1. The reference base at this position is T and the variant base is C. The gene involved is TMEM51, and the site lies in an exon region of that gene. The T-to-C change is a missense mutation, so it changes the encoded amino acid and the corresponding transcript information. The transcripts involved are NM_001136217, NM_018002, NM_001136216 and NM_001136218, in which the variant falls in exon2, exon2, exon3 and exon3 respectively. At the cDNA level the T at this position becomes C, and the corresponding amino acid change is leucine (L) to proline (P) at position 24. The sequencing depth at this site in the normal sample is 80 with a mutation frequency of 0; the mutation frequency in the tumor sample is 22.22; and the site passes the quality threshold of the detection.
Table 1. Somatic variation annotation and screening results
Chr: chromosome;
start: start position;
end: end position;
ref: reference genome base;
alt: variant base;
gene: gene name;
genePos: gene functional region in which the site is located;
exonicFunc: variant type classification;
transcript: transcript;
exon: exon number;
cdna: base change;
pep: amino acid change;
splicing: splice site;
depth_normal: sequencing depth of the normal sample;
freq_normal: mutation frequency in the normal sample;
depth_scroll: sequencing depth of the cancer sample;
freq_scroll: mutation frequency in the cancer sample;
filter: whether the site passes filtering.
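A result table with one variant per row and the columns listed above can be read with the standard `csv` module. The sketch below is illustrative: it assumes a tab-separated layout, uses only a subset of the columns, and fills the single row with the TMEM51 values described in the text (the end position is assumed equal to the start position for a single-base change).

```python
import csv
import io

# One header line and one row, tab-separated; values taken from the TMEM51 example.
TABLE = (
    "Chr\tstart\tend\tref\talt\tgene\tgenePos\texonicFunc\n"
    "1\t15541654\t15541654\tT\tC\tTMEM51\texonic\tmissense\n"
)


def load_variants(handle):
    """Read variant rows into dictionaries, converting positions to integers."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        row["start"] = int(row["start"])
        row["end"] = int(row["end"])
        rows.append(row)
    return rows


variants = load_variants(io.StringIO(TABLE))
first = variants[0]
print(first["gene"], first["Chr"], first["start"], first["ref"] + ">" + first["alt"])
```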
Because the disease-related variants that current research focuses on are mainly nonsynonymous mutations in exon regions and mutations at splice sites, and because somatic variants usually have a low frequency in the population, the detected somatic variants were selected on this basis. The specific filtering method is as follows:
1) Retain missense variants in an exon region (exonic) or a splice-site region (splicing, 2 bp upstream of the splice site);
2) Filter out variant sites with a frequency greater than 0.01 in the 1000 Genomes database (in all populations and in the Asian population), and retain variant sites with a frequency lower than 0.01 in that database;
3) Filter out variant sites with a frequency greater than 0.01 in the ExAC database (in all populations and in the Asian population), and retain variant sites with a frequency lower than 0.01 in the ExAC database.
Experiment 7: targeted drug guidance
The OncoKB database records the biological effects of alterations in specific tumor genes together with the related drug treatment information, and is regarded as a highly accurate database for tumor treatment. The detected somatic variant sites and the cancer type were entered and compared against the OncoKB database for annotation, giving the pathogenicity rating of each site for the cancer type of the project and the treatment information for FDA-approved targeted drugs. The results show that the gene involved in the somatic mutation site is EGFR and that the mutation changes leucine (L) at amino acid position 858 to arginine (R). For this mutation, the targeted drugs currently indicated are erlotinib, afatinib and gefitinib; all three are FDA-approved targeted drugs, and the approved disease type is consistent with the disease type tested in this module.
Table 2. Targeted drug guidance analysis results
gene_name: name of the gene targeted by the drug;
alt: mutation information;
drug: drug information;
level: drug level classification.
The levels in the last column (level) are defined as follows:
a: FDA-approved targeted drug, used within its approved indication, with an FDA-approved biomarker test establishing applicability;
b1: FDA-approved targeted drug, used within its approved indication, with an NCCN-recommended biomarker test establishing applicability;
b2: FDA-approved targeted drug, used outside its approved indication, with an NCCN-recommended biomarker test establishing applicability;
c1: targeted drug with evidence of clinical efficacy, used within its indication;
c2: targeted drug with evidence of clinical efficacy, used outside its indication;
d: targeted drug with biological evidence of therapeutic significance;
n: based on the indication and the biomarker test result, resistance to the FDA-approved targeted drug is expected.
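The level codes above lend themselves to a small lookup table, for example to order the drugs in a report. The sketch below is an illustration only: treating the listed order (a, b1, b2, c1, c2, d) as a ranking and placing `n` last are assumptions, and the drug names in the example are placeholders rather than results from this document.

```python
LEVEL_ORDER = ["a", "b1", "b2", "c1", "c2", "d", "n"]   # assumption: listing order = rank
ON_LABEL_LEVELS = {"a", "b1", "c1"}                     # levels described as within indication


def sort_by_level(entries):
    """Sort (drug, level) pairs by level rank; unknown levels go to the end."""
    rank = {level: i for i, level in enumerate(LEVEL_ORDER)}
    return sorted(entries, key=lambda entry: rank.get(entry[1], len(LEVEL_ORDER)))


def is_on_label(level: str) -> bool:
    """True for levels whose definition says the drug is used within its indication."""
    return level in ON_LABEL_LEVELS


# Placeholder drug names, not taken from the document.
entries = [("drug_x", "c2"), ("drug_y", "a"), ("drug_z", "n"), ("drug_w", "b1")]
ordered = sort_by_level(entries)
print([drug for drug, _ in ordered])
```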
Experiment 8: germline variation annotation
Germline variants were also annotated with the ANNOVAR annotation program. ANNOVAR is software that uses up-to-date database information to functionally annotate variant sites, and its annotation covers three aspects: gene-based, region-based and filter-based annotation. Gene-based annotation mainly records the name of the gene in which the variant is located, whether the encoded protein is affected, and the position of the affected amino acid, for example annotation against the RefGene database. Region-based annotation mainly records the genomic functional region in which the variant is located, such as gene, exon, UTR or transcription factor binding site, for example annotation against the tfbsConsSites database. Filter-based annotation mainly records whether the variant appears in certain commonly used databases, together with its conservation and pathogenicity, for example annotation against the 1000 Genomes, dbSNP and dbNSFP databases.
Experiment 9: genetic risk interpretation
The occurrence of cancer can be affected by genetic factors. The "two-hit" theory holds that if a deleterious germline mutation is present in a tumor suppressor gene, cancer can develop once an additional deleterious somatic mutation occurs. Based on the germline variants identified, cancer genetic risk information is generally obtained by reference to the ClinVar, HGMD and BIC databases. Risk information is classified into five categories: (1) benign; (2) likely benign; (3) variant of uncertain significance (VUS); (4) likely pathogenic; (5) pathogenic.
The ClinVar database mainly collects genetic variation information associated with disease and contains clinical annotations for more than 125,000 unique variants. ClinVar uses a star-based rating system to evaluate the asserted or annotated role of a particular variant in a disease.
The HGMD database records and curates, from the published literature, pathogenic sites closely related to human inherited diseases. Each site carries the PMID of its reference. The site-disease relationship types collected are mainly: disease-causing mutation; likely disease-causing mutation whose pathogenicity remains to be confirmed; functional polymorphism; disease-associated polymorphism; and disease-associated polymorphism with supporting functional evidence.
The BIC database mainly collects information on pathogenic site variants related to breast cancer.
The germline variants and the corresponding cancer genetic risk information are shown in Table 3 below. The gene involved in the germline variant is KCNQ4, and the variant changes the C base at position 546 of the cDNA to G. According to the ClinVar database, this variant is pathogenic and may be associated with non-syndromic hearing impairment. According to the HGMD database, it is a dominantly inherited mutation associated with autosomal dominant deafness. No genetic risk information for this site is available from the BIC database.
Table 3. Genetic risk interpretation information
Gene: gene name;
mutation: variation information;
risk_label_clinvar: risk assessment based on the ClinVar database;
risk_label_bic: risk assessment based on the BIC database;
risk_label_hgmd: risk assessment based on the HGMD database.
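The five risk categories form an ordered scale, so they can be represented as an ordered enumeration. The sketch below also shows one possible way to summarize per-database labels by taking the most severe one; that summarizing rule is an assumption for illustration and is not described in the text, and the HGMD label is omitted because the text does not map HGMD categories onto the five tiers.

```python
from enum import IntEnum
from typing import Optional


class Risk(IntEnum):
    """The five risk categories, ordered from least to most severe."""
    BENIGN = 1
    LIKELY_BENIGN = 2
    VUS = 3
    LIKELY_PATHOGENIC = 4
    PATHOGENIC = 5


def most_severe(labels: dict) -> Optional[Risk]:
    """Return the highest risk among the databases that report one, else None."""
    reported = [risk for risk in labels.values() if risk is not None]
    return max(reported) if reported else None


# KCNQ4 example from the text: pathogenic in ClinVar, no information in BIC.
kcnq4 = {"risk_label_clinvar": Risk.PATHOGENIC, "risk_label_bic": None}
summary = most_severe(kcnq4)
print(summary.name)
```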
Experiment 10: drug metabolism assessment
The PharmGKB database collects genotype and phenotype information related to pharmacogenomics and categorizes it systematically. PharmGKB stands for Pharmacogenetics and Pharmacogenomics Knowledge Base. The website divides the relationships between genes and drugs into two broad categories: phenotype, including clinical outcome (CO), pharmacodynamics and drug response (PD), pharmacokinetics (PK), and molecular and cellular functional assays (FA); and genotype (GN).
Through annotation against the PharmGKB database, the differences in drug absorption and toxicity among patients with different genotypes at a variant site can be determined for treatment with a given drug.
The drug metabolism evaluation results show that the gene involved in the germline variant is EGFR. The variant site is at position 55086755 on chromosome 7, where the reference base G is changed to T. This is a known variant, numbered rs712829 in the dbSNP database. The individual's genotype at this site is 0/1, that is, a heterozygous germline variant. According to the PharmGKB database, erlotinib can be used for this variant, and the severity of possible diarrhea is lower when the patient is treated with the drug.
Table 4. Drug metabolism evaluation results
Gene: gene name;
chromosome: chromosome;
gene position: position information;
rs number: rs number;
REF: reference sequence;
ALT: variant sequence;
genotype: genotype;
drug class: drug type;
drug name: Chinese name of the drug;
metabolic characteristics: related description.
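The genotype field can be read mechanically. The sketch below assumes the VCF-style genotype notation used in the text (0 for the reference allele, a non-zero index for an alternate allele) and classifies a genotype string. The function name `zygosity`, the returned labels, and the dictionary layout of the record are illustrative assumptions; the values in the record are those reported in the text for EGFR.

```python
def zygosity(gt: str) -> str:
    """Classify a diploid VCF-style genotype such as '0/1' or '1|1'."""
    alleles = gt.replace("|", "/").split("/")  # treat phased and unphased alike
    if len(alleles) != 2 or "." in alleles:
        return "unknown"  # missing call or non-diploid genotype
    if alleles[0] == alleles[1]:
        return "homozygous reference" if alleles[0] == "0" else "homozygous variant"
    return "heterozygous"


# Record using the Table 4 legend names as keys (layout is an assumption).
record = {
    "Gene": "EGFR",
    "chromosome": "7",
    "gene position": 55086755,
    "rs number": "rs712829",
    "REF": "G",
    "ALT": "T",
    "genotype": "0/1",
}

print(record["rs number"], zygosity(record["genotype"]))
```

For the EGFR site above, the genotype 0/1 is classified as heterozygous, which agrees with the description in the text.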
The targeted drug guidance in Experiment 7 was based solely on somatic mutations, and three drugs were indicated: erlotinib, afatinib, and gefitinib. Experiment 10 further analyzed the germline variation in the sample and, combined with the results of the somatic mutation analysis, showed that the severity of possible diarrhea is lower when the patient is treated with erlotinib. The detection and analysis device provided by the invention therefore supplements existing somatic-only detection results and plays an important role in guiding medication for individuals with different genotypes.
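The way the germline result refines the somatic drug list can be illustrated with a small sketch. It assumes the somatic result is an ordered list of drug names and the germline result is a mapping from drug name to a metabolism note; the function name `combine_guidance`, the data layout, and the fallback text are hypothetical and are not taken from the patent.

```python
def combine_guidance(somatic_drugs, germline_notes):
    """Attach a germline metabolism note to each somatic drug candidate."""
    return [
        (drug, germline_notes.get(drug, "no germline annotation"))
        for drug in somatic_drugs
    ]


# Drugs indicated by the somatic analysis (Experiment 7).
somatic_drugs = ["erlotinib", "afatinib", "gefitinib"]
# Note derived from the germline analysis (Experiment 10).
germline_notes = {"erlotinib": "lower severity of possible diarrhea"}

report = combine_guidance(somatic_drugs, germline_notes)
for drug, note in report:
    print(f"{drug}: {note}")
```

The somatic list is left intact and only annotated, so the germline step adds information without removing any candidate drug.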
The foregoing is merely a preferred embodiment of the present invention. It should be noted that those skilled in the art can make modifications and substitutions without departing from the technical principles of the present invention, and these modifications and substitutions should also be considered as being within the scope of the present invention.

Claims (6)

1. A ctDNA detection analysis device, comprising:
a data processing module, used for filtering and comparing the sequencing data of the ctDNA and of a control sample to obtain high-quality sequencing data;
a somatic mutation acquisition module, used for comparing the high-quality sequencing data of the ctDNA and the high-quality mutation sequencing data of the control sample with a human reference genome to acquire somatic mutation result information;
a somatic mutation screening module, used for screening the acquired somatic mutation result information to obtain gene information related to the somatic mutation sites;
and further comprising, after the data processing module:
a germline variation acquisition module, used for comparing the variation sequencing data of the control sample with human reference genome information and filtering to acquire germline variation results; and
a genetic risk interpretation module, used for evaluating cancer risk information of the germline variation results;
wherein the control sample is a normal leukocyte sample derived from the subject providing the ctDNA sample;
the somatic mutation screening module comprises three submodules:
a somatic mutation site annotation submodule, used for annotating somatic mutation sites;
an annotation result screening submodule, used for screening the annotation results; and
a somatic mutation result integration submodule, used for integrating the somatic mutation results obtained by screening;
the device further comprises, after the somatic mutation screening module:
a targeted drug guidance module, used for screening, for the somatic mutation sites of a gene, drugs applicable to those somatic mutation sites;
and the device further comprises, after the genetic risk interpretation module:
a drug metabolism module, used for evaluating the drug use effect at specific germline variation sites of a gene and determining the drug response in populations with different genotypes.
2. The ctDNA detection analysis device according to claim 1, further comprising:
a sample collection module, used for collecting the ctDNA sample and the control sample of the subject and carrying out second-generation library construction and sequencing to obtain the sequencing data.
3. The ctDNA detection analysis device according to claim 1 or 2, wherein the somatic mutation comprises: single nucleotide variations and small insertion-deletion variations.
4. The ctDNA detection analysis device according to claim 1, wherein, in the annotation result screening submodule and in the germline variation acquisition module, the annotated somatic variation sites and the germline variation results are filtered by the following method:
retaining missense variations in exon regions or splice site regions;
filtering out variation sites with a population frequency greater than 0.01 in the 1000 Genomes database, and retaining variation sites with a frequency lower than 0.01 in the 1000 Genomes database; and
filtering out variation sites with a population frequency greater than 0.01 in the ExAC database, and retaining variation sites with a frequency lower than 0.01 in the ExAC database.
5. The ctDNA detection analysis device according to claim 1, wherein the targeted drug guidance module uses the OncoKB database for annotation and interpretation of targeted drug metabolic information.
6. The ctDNA detection analysis device according to claim 1, wherein the genetic risk interpretation module interprets the genetic risk of germline variation sites according to the ClinVar, BIC, and HGMD databases and gives a risk grade for each site variation.
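The filtering method of claim 4 can be sketched as a single predicate. This is an illustration under stated assumptions, not the claimed implementation: the dictionary keys (`region`, `effect`, `freq_1000g`, `freq_exac`) and the region labels are hypothetical; the first condition is read as "a missense variation in an exon region, or any variation in a splice site region", which is only one possible reading; a site absent from a frequency database (`None`) is retained; and a frequency of exactly 0.01, on which the claim is silent, is retained.

```python
MAX_POP_FREQ = 0.01  # population frequency threshold stated in claim 4


def passes_filter(variant):
    """Return True if a variant survives the region and frequency filters."""
    region = variant["region"]
    effect = variant.get("effect")
    # Assumed reading: exonic missense, or any splice site variation.
    functional = (region == "exonic" and effect == "missense") or region == "splicing"
    if not functional:
        return False
    # Drop sites that are common in either population database.
    for key in ("freq_1000g", "freq_exac"):
        freq = variant.get(key)
        if freq is not None and freq > MAX_POP_FREQ:
            return False
    return True


# Made-up variants for illustration only.
variants = [
    {"id": "a", "region": "exonic", "effect": "missense", "freq_1000g": 0.001, "freq_exac": 0.002},
    {"id": "b", "region": "exonic", "effect": "missense", "freq_1000g": 0.05, "freq_exac": 0.001},
    {"id": "c", "region": "exonic", "effect": "synonymous", "freq_1000g": None, "freq_exac": None},
    {"id": "d", "region": "splicing", "effect": None, "freq_1000g": None, "freq_exac": 0.0},
    {"id": "e", "region": "intronic", "effect": None, "freq_1000g": 0.0, "freq_exac": 0.0},
]

kept = [v["id"] for v in variants if passes_filter(v)]
print(kept)
```

Applying the same predicate to both the annotated somatic sites and the germline results mirrors the claim, which names one filtering method for both modules.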
CN201811642911.9A 2018-12-29 2018-12-29 ctDNA detection and analysis device and method Active CN111383713B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811642911.9A CN111383713B (en) 2018-12-29 2018-12-29 ctDNA detection and analysis device and method

Publications (2)

Publication Number Publication Date
CN111383713A (en) 2020-07-07
CN111383713B (en) 2023-08-01

Family

ID=71216590

Country Status (1)

Country Link
CN (1) CN111383713B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112735520B (en) * 2021-02-03 2021-07-20 深圳裕康医学检验实验室 Interpretation method, system and storage medium for tumor individualized immunotherapy gene detection result

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2759605A1 (en) * 2013-01-25 2014-07-30 Signature Diagnostics AG A method for predicting a manifestation of an outcome measure of a cancer patient
JP2015089364A (en) * 2013-11-07 2015-05-11 有限会社ジェノテックス Cancer diagnostic method by multiplex somatic mutation, development method of cancer pharmaceutical, and cancer diagnostic device
CN105063208A (en) * 2015-08-10 2015-11-18 北京吉因加科技有限公司 Low-frequency mutation enrichment sequencing method for free target DNA (deoxyribonucleic acid) in plasma
CN105518151A (en) * 2013-03-15 2016-04-20 莱兰斯坦福初级大学评议会 Identification and use of circulating nucleic acid tumor markers
CN106591486A (en) * 2016-12-27 2017-04-26 安诺优达基因科技(北京)有限公司 Kit for detecting blood disease related gene variation
CN106676182A (en) * 2017-02-07 2017-05-17 北京诺禾致源科技股份有限公司 Low-frequency gene fusion detection method and device
CN108300716A (en) * 2018-01-05 2018-07-20 武汉康测科技有限公司 Joint component, its application and the method that targeting sequencing library structure is carried out based on asymmetric multiplex PCR

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040072143A1 (en) * 1998-06-01 2004-04-15 Weyerhaeuser Company Methods for classification of somatic embryos
US9646134B2 (en) * 2010-05-25 2017-05-09 The Regents Of The University Of California Bambam: parallel comparative analysis of high-throughput sequencing data
JP2015501974A (en) * 2011-11-07 2015-01-19 インジェヌイティ システムズ インコーポレイテッド Methods and systems for identification of causal genomic mutations.
US20130268207A1 (en) * 2012-04-09 2013-10-10 Life Technologies Corporation Systems and methods for identifying somatic mutations
US20140315199A1 (en) * 2013-04-17 2014-10-23 Life Technologies Corporation Gene fusions and gene variants associated with cancer
CN104462869B (en) * 2014-11-28 2017-12-26 天津诺禾致源生物信息科技有限公司 The method and apparatus for detecting body cell single nucleotide mutation
WO2016106391A1 (en) * 2014-12-22 2016-06-30 The Broad Institute, Inc. Rapid quantitative detection of single nucleotide polymorphisms or somatic variants and methods to identify malignant neoplasms
EP3271848A4 (en) * 2015-03-16 2018-12-05 Personal Genome Diagnostics Inc. Systems and methods for analyzing nucleic acid
US11302416B2 (en) * 2015-09-02 2022-04-12 Guardant Health Machine learning for somatic single nucleotide variant detection in cell-free tumor nucleic acid sequencing applications
GB201516047D0 (en) * 2015-09-10 2015-10-28 Cancer Rec Tech Ltd Method
US20180089373A1 (en) * 2016-09-23 2018-03-29 Driver, Inc. Integrated systems and methods for automated processing and analysis of biological samples, clinical information processing and clinical trial matching
SI3551753T1 (en) * 2016-12-09 2022-09-30 The Broad Institute, Inc. Crispr effector system based diagnostics
CN106845153A (en) * 2016-12-29 2017-06-13 安诺优达基因科技(北京)有限公司 A kind of device for using Circulating tumor DNA pattern detection somatic mutation
CN107423578B (en) * 2017-03-02 2020-09-22 北京诺禾致源科技股份有限公司 Device for detecting somatic cell mutation
CN107391965A (en) * 2017-08-15 2017-11-24 上海派森诺生物科技股份有限公司 A kind of lung cancer somatic mutation determination method based on high throughput sequencing technologies

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240328

Address after: Room 701, unit 2, building 8, yard 88, Kechuang 6th Street, Daxing District, Beijing 100176

Patentee after: ANNOROAD GENE TECHNOLOGY (BEIJING) Co.,Ltd.

Country or region after: China

Patentee after: BEIJING ANNOROAD MEDICAL LABORATORY Co.,Ltd.

Address before: Room 101 and Room 201, Unit 2, Building 8, Yard 88, Kechuang 6th Street, Daxing District Economic and Technological Development Zone, Beijing 100176

Patentee before: BEIJING ANNOROAD MEDICAL LABORATORY Co.,Ltd.

Country or region before: China

Patentee before: ANOROAD INSTITUTE OF LIFE SCIENCE

Patentee before: ANNOROAD GENE TECHNOLOGY (BEIJING) Co.,Ltd.