CN118380053A - Method for screening novel hereditary susceptibility genes of medulloblastoma - Google Patents

Method for screening novel hereditary susceptibility genes of medulloblastoma

Info

Publication number
CN118380053A
Authority
CN
China
Prior art keywords
mutation
medulloblastoma
family
genes
susceptibility
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410813874.2A
Other languages
Chinese (zh)
Inventor
姜涛
李健康
韩洞明
刘海龙
陈璇
刘子嘉
金鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BGI Shenzhen Co Ltd
Beijing Tiantan Hospital
Original Assignee
BGI Shenzhen Co Ltd
Beijing Tiantan Hospital
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BGI Shenzhen Co Ltd and Beijing Tiantan Hospital
Priority to CN202410813874.2A
Publication of CN118380053A
Legal status: Pending


Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B35/00 ICT specially adapted for in silico combinatorial libraries of nucleic acids, proteins or peptides
    • G16B35/20 Screening of libraries
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20 Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • G16B20/30 Detection of binding sites or motifs
    • G16B20/40 Population genetics; Linkage disequilibrium
    • G16B20/50 Mutagenesis
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Genetics & Genomics (AREA)
  • Biotechnology (AREA)
  • Biophysics (AREA)
  • Chemical & Material Sciences (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Public Health (AREA)
  • Library & Information Science (AREA)
  • Data Mining & Analysis (AREA)
  • Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Primary Health Care (AREA)
  • Biomedical Technology (AREA)
  • Ecology (AREA)
  • Physiology (AREA)
  • Epidemiology (AREA)
  • Biochemistry (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention relates to the technical field of tumor susceptibility genes, and in particular to a method for screening novel hereditary susceptibility genes of medulloblastoma. The method first excludes families carrying mutations in known susceptibility genes and then searches for candidate susceptibility genes in the variant data of the remaining families, which improves the explanatory power of the method (that is, the proportion of medulloblastoma cases whose pathogenesis can be explained at the molecular level). By retaining genes with at least two loss-of-heterozygosity (LOH) sites and filtering out genes that the Gene Damage Index (GDI) reports as carrying a high damage load in healthy individuals, the method reduces false positives and improves the accuracy of the screening results.

Description

Method for screening novel hereditary susceptibility genes of medulloblastoma
Technical Field
The invention relates to the technical field of tumor susceptibility genes, and in particular to a method for screening novel hereditary susceptibility genes of medulloblastoma.
Background
Cancer susceptibility genes have been shown to be highly clinically relevant for guiding cancer prevention strategies [1]. In medulloblastoma (MB), the importance of cancer predisposition syndromes is increasingly recognized [2]. Germline mutations in susceptibility genes account for about 5-6% of MB diagnoses. Genetic defects can dysregulate specific molecular pathways and thereby lead to tumorigenesis [3]. MB has been associated with the following cancer predisposition syndromes: Gorlin syndrome (PTCH1 [6] and SUFU [7] mutations), Li-Fraumeni syndrome (TP53 mutation) [4], Fanconi anemia (BRCA2 mutation) [5] and familial adenomatous polyposis (APC mutation) [6]. Six susceptibility genes have been validated for MB: APC, BRCA2, PALB2, PTCH1, SUFU and TP53. They act in different MB subtypes: APC affects the Wnt/β-catenin pathway, BRCA2 and PALB2 affect DNA repair, PTCH1 and SUFU affect the Hedgehog pathway, and TP53 affects cell-cycle regulation and the DNA damage response. Mutation or aberrant expression of these genes can dysregulate the corresponding pathways and thereby disturb cell proliferation, differentiation, growth and apoptosis, leading to MB; knowledge of these genes provides guidance for genetic counseling and testing in MB [3]. However, germline mutations in known susceptibility genes currently explain only 5-6% of MB diagnoses, and the pathogenesis of most medulloblastoma patients cannot yet be explained at the molecular level.
References:
[1] Glaire MA, Brown M, Church DN, Tomlinson I. Cancer predisposition syndromes: lessons for truly precision medicine. J Pathol. 2017;241(2).
[2] Carta R, et al. Cancer predisposition syndromes and medulloblastoma in the molecular era. Front Oncol. 2020;10.
[3] Waszak SM, Northcott PA, Buchhalter I, Robinson GW, Sutter C, Groebner S, et al. Spectrum and prevalence of genetic predisposition in medulloblastoma: a retrospective genetic study and prospective validation in a clinical trial cohort. Lancet Oncol. 2018;19(6).
[4] Li FP, Fraumeni JF. Rhabdomyosarcoma in children: epidemiologic study and identification of a familial cancer syndrome. J Natl Cancer Inst. 1969;43:1365-73.
[5] Offit K, Levran O, Mullaney B, et al. Shared genetic susceptibility to breast cancer, brain tumors, and Fanconi anemia. J Natl Cancer Inst. 2003;95:1548-51.
[6] Hamilton SR, Liu B, Parsons RE, et al. The molecular basis of Turcot's syndrome. N Engl J Med. 1995;332:839-47.
Disclosure of Invention
To solve these problems, the invention provides a method for screening novel hereditary susceptibility genes of medulloblastoma. The method screens whole-genome data from medulloblastoma families for new hereditary susceptibility genes with a low false-positive rate, and provides technical support for genetic counseling and testing in MB.
To achieve the above object, the invention provides the following technical solution:
The invention provides a method for screening novel hereditary susceptibility genes of medulloblastoma, comprising the following steps:
Selecting multiple family samples as test samples, where each family sample comprises blood of a medulloblastoma patient, tumor tissue of the patient, and blood of the patient's parents;
Performing whole-genome sequencing and variant detection on the test samples to obtain a VCF file for each sample, and merging the per-sample VCF files by family relationship to obtain family VCF files;
Performing quality control and annotation on the family VCF files, and removing the family VCF files that carry mutations in known medulloblastoma susceptibility genes, to obtain family VCF files without known susceptibility gene mutations;
Screening the family VCF files without known susceptibility gene mutations to obtain novel hereditary susceptibility genes of medulloblastoma, where a gene is selected only if it satisfies all of 1) to 3):
1) The gene carries loss-of-heterozygosity (LOH) sites at two or more positions. An LOH site is a variant site that is heterozygous in the patient's blood sequencing data, homozygous wild-type or homozygous mutant in the tumor sequencing data, and heterozygous in the blood sequencing data of the father and/or the mother;
2) The gene is rated below Medium by GDI or is unrated;
3) The mean expression of the gene in brain is at least 10 TPM.
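Criterion 1) can be expressed as a predicate on the four genotypes observed at one site in a family. The sketch below is illustrative only: it assumes VCF-style genotype strings (for example "0/1"), treats any missing call as failing, and its function names are hypothetical rather than part of any tool named in this document.

```python
from __future__ import annotations


def parse_gt(gt: str) -> tuple[int, ...] | None:
    """Parse a VCF-style genotype such as '0/1' or '1|1' into allele indices.

    Returns None for missing calls such as './.'.
    """
    alleles = gt.replace("|", "/").split("/")
    if any(a == "." for a in alleles):
        return None
    return tuple(int(a) for a in alleles)


def is_het(gt: str) -> bool:
    """True when the call contains two different alleles (e.g. 0/1)."""
    alleles = parse_gt(gt)
    return alleles is not None and len(set(alleles)) > 1


def is_hom(gt: str) -> bool:
    """True for homozygous wild-type (0/0) or homozygous mutant (1/1) calls."""
    alleles = parse_gt(gt)
    return alleles is not None and len(set(alleles)) == 1


def is_loh_site(patient_blood: str, tumor: str, father: str, mother: str) -> bool:
    """Heterozygous in patient blood, homozygous in tumor,
    heterozygous in the father and/or the mother."""
    return (
        is_het(patient_blood)
        and is_hom(tumor)
        and (is_het(father) or is_het(mother))
    )
```

For example, is_loh_site("0/1", "1/1", "0/1", "0/0") is True, while a site that stays heterozygous in the tumor is rejected.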
Preferably, the quality control criteria include removing variant sites that meet any of the following: Qual By Depth < 2.0, RMS Mapping Quality < 40.0, Fisher Strand > 60.0, Strand Odds Ratio > 3.0, Mapping Quality Rank Sum Test < -12.5, or Read Pos Rank Sum Test < -8.0.
Preferably, the detected variants include point mutations, insertions and deletions.
Preferably, the annotation criteria include: (1) using known variant databases to filter out variant sites with an allele frequency greater than 0.01, the databases including the 1000 Genomes Project, dbSNP, ESP6500, ExAC or gnomAD; (2) retaining variants of the following types: frameshift, missense, nonframeshift, splicing, start-loss, stop-gain or stop-loss; and (3) retaining heterozygous variant sites with a variant allele fraction of 0.25-0.75.
Preferably, the annotated regions include the 3' untranslated region (3' UTR), the 5' untranslated region (5' UTR) and exonic regions.
Preferably, the known medulloblastoma susceptibility genes include APC, BRCA2, PALB2, PTCH1, TP53 and SUFU.
Preferably, the number of families is at least 43.
Preferably, the novel hereditary susceptibility genes of medulloblastoma include SSPO, SPECC1, RABEP1, MTUS1, MPRIP and CSMD1.
Preferably, the blood comprises peripheral blood.
Beneficial effects:
The method first excludes families carrying mutations in known susceptibility genes and then searches for candidate susceptibility genes in the variant data of the remaining families, which improves the explanatory power of the method (that is, the proportion of medulloblastoma cases whose pathogenesis can be explained at the molecular level). By retaining genes with at least two loss-of-heterozygosity (LOH) sites and filtering out genes that the Gene Damage Index (GDI) reports as carrying a high damage load in healthy individuals, the method reduces false positives and improves the accuracy of the screening results.
Detailed Description
The invention provides a method for screening novel hereditary susceptibility genes of medulloblastoma, comprising the following steps:
Selecting multiple family samples as test samples, where each family sample comprises blood of a medulloblastoma patient, tumor tissue of the patient, and blood of the patient's parents;
Performing whole-genome sequencing and variant detection on the test samples to obtain a VCF file for each sample, and merging the per-sample VCF files by family relationship to obtain family VCF files;
Performing quality control and annotation on the family VCF files, and removing the family VCF files that carry mutations in known medulloblastoma susceptibility genes, to obtain family VCF files without known susceptibility gene mutations;
Screening the family VCF files without known susceptibility gene mutations to obtain novel hereditary susceptibility genes of medulloblastoma, where a gene is selected only if it satisfies all of 1) to 3):
1) The gene carries loss-of-heterozygosity (LOH) sites at two or more positions. An LOH site is a variant site that is heterozygous in the patient's blood sequencing data, homozygous wild-type or homozygous mutant in the tumor sequencing data, and heterozygous in the blood sequencing data of the father and/or the mother;
2) The gene is rated below Medium by GDI or is unrated;
3) The mean expression of the gene in brain is at least 10 TPM.
The invention selects multiple family samples as test samples; each family sample includes blood from a medulloblastoma patient, tumor tissue from the patient and blood from the patient's parents. In the invention, the number of families is preferably at least 43, and the blood preferably comprises peripheral blood.
After the test samples are obtained, the invention performs whole-genome sequencing and variant detection on them to obtain a VCF file for each sample. In the invention, the detected variants include point mutations, insertions and deletions. The invention places no particular requirement on the methods used for whole-genome sequencing and variant detection; methods well known to those skilled in the art may be used, or the work may be outsourced to a genetic testing company.
After the per-sample VCF files are obtained, the invention merges them by family relationship to obtain family VCF files. In the invention, the merging tool preferably comprises the GATK software.
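The merge step can be pictured as building one record per variant that holds the genotype of every family member. The sketch below is a hypothetical in-memory illustration, not GATK: real joint genotyping distinguishes a confident homozygous-reference call from a site with no data, whereas here any member without a call is simply marked missing ("./."). The role labels are also hypothetical.

```python
from __future__ import annotations

# Family roles used in this sketch (hypothetical labels).
ROLES = ("patient_blood", "tumor", "father", "mother")


def merge_family(
    per_sample: dict[str, dict[tuple[str, int, str, str], str]],
) -> dict[tuple[str, int, str, str], dict[str, str]]:
    """Combine per-sample calls into one genotype record per variant.

    per_sample maps a role in ROLES to {(chrom, pos, ref, alt): genotype}.
    Members without a call at a variant are filled with './.' (missing).
    """
    all_keys = set()
    for calls in per_sample.values():
        all_keys.update(calls)
    merged = {}
    # Sorted by chromosome name (lexicographically), then position.
    for key in sorted(all_keys):
        merged[key] = {role: per_sample.get(role, {}).get(key, "./.") for role in ROLES}
    return merged
```

Each merged record can then be passed directly to a per-site check such as the LOH test of criterion 1).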
After the family VCF files are obtained, the invention performs quality control and annotation on them and removes the family VCF files carrying mutations in known medulloblastoma susceptibility genes, obtaining family VCF files without known susceptibility gene mutations. In the invention, the known medulloblastoma susceptibility genes preferably include APC, BRCA2, PALB2, PTCH1, TP53 and SUFU; the quality control criteria preferably include removing variant sites that meet any of: Qual By Depth (QD) < 2.0, RMS Mapping Quality (MQ) < 40.0, Fisher Strand (FS) > 60.0, Strand Odds Ratio (SOR) > 3.0, Mapping Quality Rank Sum Test (MQRankSum) < -12.5, or Read Pos Rank Sum Test (ReadPosRankSum) < -8.0. Removing families with mutations in known medulloblastoma susceptibility genes and searching for candidate genes only in the remaining families prevents the known genes from interfering with the discovery of new susceptibility genes, which improves the accuracy of the method.
QD in the invention is the variant Quality divided by the Depth of coverage. The Quality is the QUAL value in the VCF and measures the reliability of the variant call; the depth is the sum of the coverage depths of all samples carrying the variant allele at the site.
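As a worked example of this definition, a site with QUAL = 60 whose two variant-carrying samples have depths 20 and 15 gives QD = 60 / 35 ≈ 1.71, which is below 2.0, so the site would be filtered. A minimal sketch of the calculation (the function name is hypothetical):

```python
def quality_by_depth(qual: float, carrier_depths: list[int]) -> float:
    """QD = QUAL / (sum of depths of the samples carrying the variant allele)."""
    total_depth = sum(carrier_depths)
    if total_depth <= 0:
        raise ValueError("QD is undefined without depth at variant carriers")
    return qual / total_depth
```

Normalizing by depth keeps deeply covered sites from passing merely because QUAL grows with the number of supporting reads.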
MQ in the invention is the root mean square of the mapping qualities of all reads aligned to the site.
FS and SOR both assess strand bias: FS is the Phred-scaled P value of Fisher's exact test, and SOR is a symmetric odds ratio test computed from the same 2×2 table of forward- and reverse-strand read counts. They describe whether reads carrying only the variant allele and reads carrying only the reference allele differ significantly in their forward/reverse strand distribution (strand bias) during sequencing or alignment. Such a difference indicates either insufficient randomness in the sequencing process or an alignment preference of the algorithm in certain genomic regions.
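FS can be illustrated with a two-sided Fisher's exact test on the 2×2 table of reference/alternate reads by forward/reverse strand, Phred-scaled as FS = -10·log10(P). This is a simplified sketch with hypothetical function names; GATK applies additional adjustments (for example for very deep sites) that are not reproduced here, and SOR is not computed.

```python
import math


def fisher_two_sided(ref_fwd: int, ref_rev: int, alt_fwd: int, alt_rev: int) -> float:
    """Two-sided Fisher's exact P value for [[ref_fwd, ref_rev], [alt_fwd, alt_rev]]."""
    n_ref = ref_fwd + ref_rev   # row total: reference-supporting reads
    n_alt = alt_fwd + alt_rev   # row total: alternate-supporting reads
    n_fwd = ref_fwd + alt_fwd   # column total: forward-strand reads
    denom = math.comb(n_ref + n_alt, n_fwd)

    def table_prob(a: int) -> float:
        # Hypergeometric probability that `a` forward-strand reads support the reference.
        return math.comb(n_ref, a) * math.comb(n_alt, n_fwd - a) / denom

    p_observed = table_prob(ref_fwd)
    lo, hi = max(0, n_fwd - n_alt), min(n_ref, n_fwd)
    # Sum every table with the same margins that is no more likely than the observed one.
    p = sum(
        prob
        for prob in (table_prob(a) for a in range(lo, hi + 1))
        if prob <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p)


def fisher_strand(ref_fwd: int, ref_rev: int, alt_fwd: int, alt_rev: int) -> float:
    """Phred-scaled strand-bias P value; larger values mean stronger bias."""
    p = fisher_two_sided(ref_fwd, ref_rev, alt_fwd, alt_rev)
    return -10.0 * math.log10(max(p, 1e-300))
```

With balanced strands (10, 10, 10, 10) FS is essentially 0; with reference reads split 20/20 but all 20 alternate reads on the forward strand, FS rises to roughly 40-45, still under the 60.0 cut-off.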
MQRankSum is a quality annotation for assessing variant sites. It stands for Mapping Quality Rank Sum Test, a nonparametric Wilcoxon rank-sum test that compares the mapping qualities of reads supporting the reference allele with those of reads supporting the variant allele at a variant site. A significant difference in mapping quality between the two groups suggests that the variant call may be affected by sequencing or mapping errors. In GATK variant filtering, MQRankSum can be used as a filter criterion to identify and remove variants that may have been called erroneously for technical reasons.
ReadPosRankSum is likewise a quality annotation for assessing variant sites. It stands for Read Position Rank Sum Test, a nonparametric Wilcoxon rank-sum test that compares the positions of the reference and variant alleles within their sequencing reads; it helps identify and remove variants that may be artifacts of read-position bias, for example alleles observed only near read ends.
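Both MQRankSum and ReadPosRankSum are rank-sum z-scores that compare a per-read value (mapping quality, or position within the read) between alternate- and reference-supporting reads. Below is a minimal sketch of the Wilcoxon rank-sum (Mann-Whitney U) z-score with the normal approximation. The function name is hypothetical, ties receive average ranks, and both the tie correction to the variance and GATK's exact read-position transform are omitted. Negative values mean the alternate-supporting reads tend to have lower values than the reference-supporting reads.

```python
import math


def rank_sum_z(alt_values: list[float], ref_values: list[float]) -> float:
    """Rank-sum z-score of alt-supporting reads versus ref-supporting reads."""
    n_alt, n_ref = len(alt_values), len(ref_values)
    if n_alt == 0 or n_ref == 0:
        raise ValueError("both groups need at least one read")
    # Pool the values, remembering which group each came from.
    pooled = sorted(
        [(v, True) for v in alt_values] + [(v, False) for v in ref_values],
        key=lambda item: item[0],
    )
    rank_sum_alt = 0.0
    i = 0
    while i < len(pooled):
        # Find the run of tied values pooled[i..j] and give each the average 1-based rank.
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        rank_sum_alt += avg_rank * sum(1 for k in range(i, j + 1) if pooled[k][1])
        i = j + 1
    u_alt = rank_sum_alt - n_alt * (n_alt + 1) / 2
    mean_u = n_alt * n_ref / 2
    sd_u = math.sqrt(n_alt * n_ref * (n_alt + n_ref + 1) / 12)
    return (u_alt - mean_u) / sd_u
```

With alternate reads at MQ 10 and reference reads at MQ 60 (three each) the score is about -1.96; in practice many more reads are needed before values reach the -12.5 (MQRankSum) or -8.0 (ReadPosRankSum) cut-offs.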
In the invention, the annotated regions preferably include the 3' untranslated region (3' UTR), the 5' untranslated region (5' UTR) and exonic regions (Exonic); the annotation criteria preferably include: (1) using known variant databases to filter out variant sites with an allele frequency greater than 0.01, the databases including the 1000 Genomes Project, dbSNP, ESP6500, ExAC or gnomAD; (2) retaining variants of the following types: frameshift, missense, nonframeshift, splicing, start-loss, stop-gain or stop-loss; and (3) retaining heterozygous variant sites with a variant allele fraction (alt/DP) of 0.25-0.75.
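The first two annotation criteria can be sketched as a per-variant check. The input layout is hypothetical (a dictionary of population allele frequencies plus one consequence label of the kind used by the annotation step), and a frequency missing from a database is treated as absent rather than as failing, which is an assumption. The region criterion and the allele-fraction criterion are not modeled here.

```python
from __future__ import annotations

MAX_POPULATION_AF = 0.01
KEPT_CONSEQUENCES = frozenset(
    {"frameshift", "missense", "nonframeshift", "splicing", "startloss", "stopgain", "stoploss"}
)


def passes_annotation_filters(population_afs: dict[str, float | None], consequence: str) -> bool:
    """Keep a variant that is rare in every listed database (AF <= 0.01)
    and whose consequence is one of the retained types."""
    observed = [af for af in population_afs.values() if af is not None]
    if observed and max(observed) > MAX_POPULATION_AF:
        return False
    return consequence.lower() in KEPT_CONSEQUENCES
```

Taking the maximum across databases means a variant common in any single reference population is discarded.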
The terms used in the invention are defined as follows:
Frameshift mutation (frameshift): the number of inserted or deleted bases is not a multiple of 3, so the reading frame changes from the variant position onward, possibly producing a completely different protein.
Missense mutation (missense): a single-base change that alters the encoded amino acid.
Nonframeshift mutation (nonframeshift): the number of inserted or deleted bases is a multiple of 3, so the reading frame is preserved, but amino acids may be added to or removed from the protein.
Splicing mutation (splicing): the mutation occurs at a splice site of the gene and may disrupt RNA splicing, producing abnormal mRNA and protein products.
Start-loss mutation (startloss): the mutation abolishes the start codon (usually ATG), which may prevent protein synthesis from initiating.
Stop-gain mutation (stopgain): the mutation creates a new stop codon, causing premature termination of the protein.
Stop-loss mutation (stoploss): the mutation abolishes the normal stop codon, which may lead to an elongated protein.
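The frameshift/nonframeshift distinction above depends only on whether the indel length is a multiple of 3. A minimal sketch, assuming VCF-style REF/ALT alleles for a simple indel inside a coding exon (the function name is hypothetical):

```python
def classify_coding_indel(ref: str, alt: str) -> str:
    """Return 'frameshift' or 'nonframeshift' for a simple coding indel.

    Assumes VCF-style alleles (e.g. REF='A', ALT='AT'), so the net length
    change equals the number of inserted or deleted bases.
    """
    length_change = abs(len(alt) - len(ref))
    if length_change == 0:
        raise ValueError("REF and ALT have equal length; not an indel")
    return "nonframeshift" if length_change % 3 == 0 else "frameshift"
```

For example, inserting one base (A to AT) is a frameshift, while deleting three bases (ATGC to A) removes one codon and is nonframeshift.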
Alt: the number of sequencing reads supporting the alternate (non-reference) allele at the variant site. DP: the total depth of coverage at the site, i.e., the number of all sequencing reads at the variant site, including reads supporting the reference allele and the alternate allele.
The ratio alt/DP is the fraction of the total sequencing depth that supports the variant allele, i.e., the variant allele fraction. When filtering variant data, a threshold range is set so that only variants whose allele fraction falls within that range are kept, excluding likely false-positive calls.
If alt/DP is below 0.25, the variant allele fraction is too low for a heterozygous call, which may be due to sequencing errors or contamination.
If alt/DP is above 0.75, the variant allele fraction is too high for a heterozygous call, which may be due to inaccuracies in the reference genome or multiple copies in the sample.
Thus, by applying the filter 0.25 ≤ alt/DP ≤ 0.75, the invention retains variants with intermediate allele fractions, which are more likely to be true heterozygous variants. This filter helps improve the accuracy of variant detection.
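The allele-fraction rule 0.25 ≤ alt/DP ≤ 0.75 can be written directly. The sketch assumes alt and DP are available as read counts for the site, and the function names are hypothetical. For example, 12 alternate reads out of 30 gives 0.40 and is kept, while 5 out of 30 (about 0.17) is discarded.

```python
def variant_allele_fraction(alt_reads: int, depth: int) -> float:
    """alt/DP: fraction of reads at the site that support the alternate allele."""
    if depth <= 0:
        raise ValueError("DP must be positive")
    return alt_reads / depth


def is_plausible_het(alt_reads: int, depth: int, low: float = 0.25, high: float = 0.75) -> bool:
    """Keep heterozygous calls whose allele fraction lies in [low, high] (inclusive)."""
    return low <= variant_allele_fraction(alt_reads, depth) <= high
```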
The family VCF files without known susceptibility gene mutations are screened to obtain novel hereditary susceptibility genes of medulloblastoma, where a gene is selected only if it satisfies all of 1) to 3):
1) The gene carries loss-of-heterozygosity (LOH) sites at two or more positions. An LOH site is a variant site that is heterozygous in the patient's blood sequencing data, homozygous wild-type or homozygous mutant in the tumor sequencing data, and heterozygous in the blood sequencing data of the father and/or the mother;
2) The gene is rated below Medium by GDI or is unrated;
3) The mean expression of the gene in brain is at least 10 TPM.
A gene with two or more LOH sites is one in which LOH sites are found at two or more different positions within the gene; the sites may be variants of the same type or of different types.
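Counting LOH sites per gene, with repeated hits at the same position counted once, can be sketched as follows. The input layout (gene, chromosome, position) and the gene names in the test are hypothetical. The sketch pools positions across all retained families, which is an assumption.

```python
from __future__ import annotations

from collections import defaultdict


def genes_with_recurrent_loh(loh_sites: list[tuple[str, str, int]], min_sites: int = 2) -> set[str]:
    """Return genes with LOH sites at >= min_sites distinct (chrom, pos) positions.

    loh_sites holds (gene, chrom, pos) for sites that already passed the LOH test.
    """
    positions: dict[str, set[tuple[str, int]]] = defaultdict(set)
    for gene, chrom, pos in loh_sites:
        positions[gene].add((chrom, pos))  # a set ignores repeats at one position
    return {gene for gene, seen in positions.items() if len(seen) >= min_sites}
```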
The Gene Damage Index (GDI) is the cumulative mutational damage of each human gene in healthy individuals, computed from the genetic variation of healthy individuals in the 1000 Genomes Project database (phase 3). Previous studies have shown that human genes that frequently carry damaging variation in the general population are unlikely to cause disease. Retaining only genes rated below Medium by GDI or unrated therefore filters out frequently damaged genes that are unlikely to be disease-causing.
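The GDI criterion keeps a gene when its rating is below Medium or when it has no rating. The sketch assumes the ratings are available as a dictionary of category labels Low/Medium/High, which is an assumption about the input format; the function and gene names are hypothetical.

```python
from __future__ import annotations


def passes_gdi(gene: str, gdi_rating: dict[str, str]) -> bool:
    """True if the gene is unrated or rated below Medium (i.e. 'Low')."""
    rating = gdi_rating.get(gene)
    return rating is None or rating.lower() == "low"
```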
The mean brain expression used in the invention is the mean expression level in brain bulk RNA-seq data. Because medulloblastoma arises in the brain, selecting genes highly expressed in the brain makes the resulting novel hereditary susceptibility genes of medulloblastoma more accurate.
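The brain-expression criterion reduces to a mean over brain bulk RNA-seq TPM values. The sketch assumes those values have already been collected per gene (for example one value per brain sample or region; the input format is an assumption), and the function name is hypothetical.

```python
def passes_brain_expression(brain_tpms: list[float], min_mean_tpm: float = 10.0) -> bool:
    """True if the mean brain expression is at least min_mean_tpm TPM."""
    if not brain_tpms:
        return False  # assumption: genes without expression data are not retained
    return sum(brain_tpms) / len(brain_tpms) >= min_mean_tpm
```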
The novel hereditary susceptibility genes of medulloblastoma obtained by screening preferably include SSPO, SPECC1, RABEP1, MTUS1, MPRIP and CSMD1.
To further illustrate the invention, the method for screening novel hereditary susceptibility genes of medulloblastoma is described in detail below with reference to an example, which should not be construed as limiting the scope of the invention.
Example 1
1. Case inclusion
300 medulloblastoma cases were recruited, including 43 families; each family comprised the patient's tumor sample, the patient's blood sample and blood samples of the parents. All diagnoses were confirmed independently by two senior pathologists. The project involves the collection and preservation of human genetic resources and was approved by the Ethics Committee of Beijing Tiantan Hospital, Capital Medical University; all enrolled patients signed informed consent, and the project was approved by the Office of Human Genetic Resources Administration of the Ministry of Science and Technology of the People's Republic of China.
2. Whole genome sequencing and mutation detection
Sampling: peripheral blood from the patient and the parents and tumor tissue from the patient were collected and DNA was extracted, with total DNA ≥ 6 μg, concentration ≥ 50 ng/μl and purity OD260/280 = 1.8-2.0.
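The DNA acceptance thresholds in the sampling step can be expressed as a simple check; the parameter names are hypothetical, and the units follow the text (μg, ng/μl).

```python
def dna_sample_acceptable(total_ug: float, conc_ng_per_ul: float, od260_280: float) -> bool:
    """Total DNA >= 6 ug, concentration >= 50 ng/ul, OD260/280 within 1.8-2.0."""
    return total_ug >= 6.0 and conc_ng_per_ul >= 50.0 and 1.8 <= od260_280 <= 2.0
```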
Whole-genome sequencing and variant detection on the sequencing data were outsourced to the BGI life science research institute in Shenzhen and comprised the following steps:
Whole-genome sequencing: ① genomic DNA is randomly fragmented by sonication; ② the fragments undergo end repair and adapter ligation to construct a DNA library; ③ the library DNA is amplified by bridge PCR; ④ the amplified library is sequenced on the MGISEQ-2000 platform using paired-end sequencing, and the data are output;
Variant detection from sequencing data: ① raw data quality control (QC): with a filtering threshold of Q30, low-quality reads are removed, as are reads containing a complete sequencing adapter, reads in which N bases make up more than 10% of the read length, and reads shorter than 100 bp, using tools such as SOAPnuke and FastQC; ② alignment: the sequenced short reads are aligned to the human reference genome (hg38) using software such as SOAP and BWA; ③ variant detection: based on the alignment results, point mutations and indels are detected using software such as SOAPsnp and GATK. After the VCF of each sample is obtained, the VCFs within each family are merged with GATK according to family relationships to obtain merged family VCFs.
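Two of the read-level QC rules above (discarding reads shorter than 100 bp and reads whose N content exceeds 10% of the read length) can be sketched as a per-read check. The Q30 and adapter rules are omitted because the text does not say exactly how SOAPnuke applies them; the function name is hypothetical.

```python
def read_passes_qc(seq: str, min_length: int = 100, max_n_fraction: float = 0.10) -> bool:
    """Drop reads shorter than min_length or with N content above max_n_fraction."""
    if not seq or len(seq) < min_length:
        return False
    n_fraction = seq.upper().count("N") / len(seq)
    return n_fraction <= max_n_fraction
```

A 150 bp read may therefore contain at most 15 N bases.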
3. Quality control, annotation and pathogenicity classification of variants
Quality control of the variant VCF: hard filtering was applied to the whole-genome variant data, i.e., thresholds were set manually on one or more annotations (also called data feature values), and all variant sites failing the thresholds were discarded. Six annotations were used, with the thresholds QD < 2.0, MQ < 40.0, FS > 60.0, SOR > 3.0, MQRankSum < -12.5 and ReadPosRankSum < -8.0; sites meeting any of these conditions were discarded.
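The hard-filtering step can be sketched as checking each site's INFO annotations against the six thresholds and discarding the site if any condition holds. This is an illustrative re-implementation, not GATK VariantFiltration, and treating a missing annotation as not failing is an assumption.

```python
from __future__ import annotations

# (annotation, comparison, threshold): a site is discarded if any comparison holds.
HARD_FILTERS = (
    ("QD", "<", 2.0),
    ("MQ", "<", 40.0),
    ("FS", ">", 60.0),
    ("SOR", ">", 3.0),
    ("MQRankSum", "<", -12.5),
    ("ReadPosRankSum", "<", -8.0),
)


def failed_filters(info: dict[str, float]) -> list[str]:
    """Return the names of the hard filters that a site fails."""
    failed = []
    for name, op, threshold in HARD_FILTERS:
        value = info.get(name)
        if value is None:
            continue  # assumption: a missing annotation does not fail the site
        if (op == "<" and value < threshold) or (op == ">" and value > threshold):
            failed.append(name)
    return failed


def passes_hard_filters(info: dict[str, float]) -> bool:
    return not failed_filters(info)
```

Returning the list of failed filters, rather than a bare boolean, mirrors how filtered VCFs record the reason for rejection in the FILTER column.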
Annotation of the quality-controlled variants: the hard-filtered VCF contains more reliable variant calls. KGGSeq was used to annotate the quality-controlled VCF at the gene level, with the following quality-control parameters for sequence variants: (1) population allele frequency filtering against the 1000 Genomes Project, dbSNP, ESP6500, ExAC and gnomAD, removing variants with a frequency greater than 0.01; (2) retaining variants in 3'UTR, 5'UTR and exonic regions of the types frameshift, missense, nonframeshift, splicing, startloss, stopgain and stoploss; (3) requiring heterozygous variant sites to satisfy 0.25 ≤ alt/DP ≤ 0.75.
4. Exclusion of families harboring known susceptibility gene mutations
Based on an extensive prior literature review, six MB susceptibility genes are known: APC, BRCA2, PALB2, PTCH1, TP53 and SUFU. The filtered annotation information of each family was checked for mutations in these genes, and families carrying such mutations were excluded. Of the 43 families, 16 carried mutations in known susceptibility genes and 27 carried no mutation in any of the six known genes.
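The exclusion step amounts to a set intersection between the genes hit by each family's filtered variants and the six known MB genes. The family IDs and the input layout are hypothetical.

```python
from __future__ import annotations

KNOWN_MB_GENES = frozenset({"APC", "BRCA2", "PALB2", "PTCH1", "TP53", "SUFU"})


def split_families(family_genes: dict[str, set[str]]) -> tuple[set[str], set[str]]:
    """Return (retained, excluded) family IDs.

    family_genes maps a family ID to the genes carrying filtered variants in that family.
    """
    excluded = {fam for fam, genes in family_genes.items() if genes & KNOWN_MB_GENES}
    retained = set(family_genes) - excluded
    return retained, excluded
```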
5. Loss of heterozygosity (LOH) analysis in families without known susceptibility gene mutations
In the families without known susceptibility gene mutations, variant sites were screened for the following pattern: heterozygous in the patient's blood, homozygous wild-type or homozygous mutant in the tumor, and heterozygous in the blood of at least one parent. Sites meeting these conditions were taken as preliminary candidate sites for new MB susceptibility genes, yielding 246 genes.
6. Gene Damage Index (GDI) assessment
Using add_GDI.py (https://lab.Scanner.Edu/casanova/GDI/programs), genes rated below Medium in GDI or unrated were retained, leaving 102 candidate susceptibility genes out of the 246 genes obtained from the preliminary LOH analysis.
7. Determination of the list of novel MB hereditary susceptibility genes
Family co-segregation analysis was performed on nuclear-family whole-genome sequencing (WGS) data, using the non-cancer adult gnomAD cohort as controls. Each family comprised the blood and tumor whole-genome variant data of the proband and the whole-genome variant data of the proband's parents. After removing susceptibility genes previously reported to be mutated in patients, 102 candidate susceptibility genes were retained. From these 102 candidates, genes showing LOH twice or more within the families were selected to obtain a list of high-frequency LOH genes (Table 1); these genes were filtered against the GDI database, narrowing the candidates to genes with a damage rating of NA or below Medium (Table 2); genes highly expressed in brain (mean bulk RNA expression ≥ 10 TPM) were then selected, yielding the final list of six genes (Table 3).
Table 1. List of LOH high-frequency genes
Table 2. Results of filtering against the GDI database
Table 3. List of novel hereditary susceptibility genes for MB
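The three gene-level filters of step 7 (LOH in two or more families, a GDI rating of NA or below Medium, and mean brain bulk-RNA expression ≥ 10 TPM) can be combined as in the sketch below. It assumes LOH sites have already been called and mapped to genes as (family, gene) pairs, that GDI labels are `Low`/`Medium`/`High` or unrated, and that brain TPM values come from an external expression table. All input formats and the `screen_genes` name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple

# GDI labels treated as passing: unrated or below Medium (assumed label set).
PASSING_GDI = {None, "NA", "Low"}


def screen_genes(
    loh_hits: Iterable[Tuple[str, str]],
    gdi: Dict[str, Optional[str]],
    brain_tpm: Dict[str, float],
    min_families: int = 2,
    min_tpm: float = 10.0,
) -> List[str]:
    """Return genes with LOH in >= min_families distinct families that also
    pass the GDI and brain-expression filters, sorted alphabetically."""
    families_per_gene: Dict[str, Set[str]] = defaultdict(set)
    for family_id, gene in loh_hits:
        families_per_gene[gene].add(family_id)  # count each family once per gene

    selected: List[str] = []
    for gene, families in families_per_gene.items():
        if len(families) < min_families:
            continue  # not an LOH high-frequency gene
        if gdi.get(gene) not in PASSING_GDI:
            continue  # rated Medium or High
        if brain_tpm.get(gene, 0.0) < min_tpm:
            continue  # missing expression is treated as 0 TPM
        selected.append(gene)
    return sorted(selected)
```

Counting distinct family IDs, rather than raw LOH sites, keeps a gene with several LOH sites in a single family from passing the two-family threshold.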
The embodiments described above are some, but not all, of the embodiments of the invention. It should be understood that other embodiments may be devised on the basis of these embodiments without departing from the spirit and scope of the invention.

Claims (9)

1. A method for screening novel hereditary susceptibility genes of medulloblastoma, comprising the following steps:
Selecting a plurality of family samples as test samples, each family sample comprising blood of a medulloblastoma patient, tumor tissue of the medulloblastoma patient, and blood of the patient's parents;
Performing whole-genome sequencing and variant calling on the test samples to obtain a VCF file for each sample, and merging the VCF files of the samples according to family relationships to obtain family VCF files;
Performing quality control and annotation on the family VCF files, and removing family VCF files carrying mutations in known medulloblastoma susceptibility genes, to obtain family VCF files without known susceptibility gene mutations;
Screening the family VCF files without known susceptibility gene mutations to obtain the novel hereditary susceptibility genes of medulloblastoma, wherein the screening criterion is that a gene simultaneously satisfies 1) to 3):
1) The gene has loss-of-heterozygosity (LOH) sites occurring two or more times, an LOH site being a variant site that is heterozygous in the blood sequencing data of the medulloblastoma patient, homozygous wild-type or homozygous mutant in the tumor sequencing data, and heterozygous in the blood sequencing data of the father and/or the mother;
2) The gene is rated below Medium or is unrated in the GDI;
3) The gene has a mean brain expression of ≥ 10 TPM.
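The merging step in claim 1 (combining per-sample VCF files by family relationship) is usually done with a dedicated tool such as `bcftools merge`. The sketch below only illustrates the idea on genotype calls already parsed into dictionaries; the role names, the `merge_family` helper, and the choice to fill absent sites with a no-call (`./.`) are assumptions.

```python
from typing import Dict, Tuple

Site = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)

# Hypothetical role labels for the four samples of one family.
ROLES = ("proband_blood", "proband_tumor", "father", "mother")


def merge_family(calls: Dict[str, Dict[Site, str]]) -> Dict[Site, Tuple[str, ...]]:
    """Combine per-sample GT calls into one row per site, ordered by ROLES.

    A site absent from a sample is filled with './.' (no call); with
    single-sample VCFs this may hide true reference calls, which is why
    joint genotyping from gVCFs is usually preferred.
    """
    all_sites = set()
    for role in ROLES:
        all_sites.update(calls.get(role, {}))
    return {
        site: tuple(calls.get(role, {}).get(site, "./.") for role in ROLES)
        for site in sorted(all_sites)
    }
```

Each merged row then carries the four genotypes in a fixed order, which is the shape needed to evaluate the per-site LOH criterion of claim 1.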
2. The method of claim 1, wherein the quality control criteria comprise removing variant sites with QualByDepth (QD) < 2.0, RMSMappingQuality (MQ) < 40.0, FisherStrand (FS) > 60.0, StrandOddsRatio (SOR) > 3.0, MappingQualityRankSumTest (MQRankSum) < -12.5 and ReadPosRankSumTest (ReadPosRankSum) < -8.0.
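The thresholds in claim 2 match the hard-filtering thresholds GATK documents for SNPs, expressed on the INFO annotations QD, MQ, FS, SOR, MQRankSum and ReadPosRankSum. The sketch below assumes, as in GATK hard filtering, that a site failing any single threshold is removed and that a missing annotation (the RankSum tests are undefined at some sites) does not by itself cause removal. `fails_hard_filter` is a hypothetical name.

```python
from typing import Callable, Dict, List, Tuple

# (INFO key, predicate that flags a failing value); thresholds from claim 2.
HARD_FILTERS: List[Tuple[str, Callable[[float], bool]]] = [
    ("QD", lambda v: v < 2.0),
    ("MQ", lambda v: v < 40.0),
    ("FS", lambda v: v > 60.0),
    ("SOR", lambda v: v > 3.0),
    ("MQRankSum", lambda v: v < -12.5),
    ("ReadPosRankSum", lambda v: v < -8.0),
]


def fails_hard_filter(info: Dict[str, float]) -> bool:
    """True if any annotation present in `info` crosses its threshold.

    Missing annotations are not treated as failures.
    """
    return any(key in info and fails(info[key]) for key, fails in HARD_FILTERS)


if __name__ == "__main__":
    print(fails_hard_filter({"QD": 1.5, "MQ": 60.0}))  # True (QD < 2.0)
    print(fails_hard_filter({"QD": 12.0, "MQ": 60.0, "FS": 1.2, "SOR": 0.7}))  # False
```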
3. The method of claim 1, wherein the variants detected in the variant-calling step comprise point mutations, insertions and deletions.
4. The method of claim 1, wherein the annotation criteria comprise: (1) filtering out variant sites with an allele frequency greater than 0.01 using known mutation databases, the known mutation databases including the 1000 Genomes Project, dbSNP, ESP6500, ExAC or gnomAD; (2) retaining variants whose mutation type is a frameshift, missense, non-frameshift (in-frame), splicing, start-codon loss, stop-codon gain or stop-codon loss mutation; and (3) retaining heterozygous variant sites with a mutant allele frequency of 0.25 to 0.75.
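The three annotation criteria of claim 4 can be applied per variant as sketched below. The consequence labels are simplified, hypothetical stand-ins for the mutation types listed in the claim; real annotators such as ANNOVAR or VEP use their own vocabularies, so a mapping step would be needed. A database with no recorded frequency for the variant is assumed not to trigger removal.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Simplified, hypothetical consequence labels for the retained mutation types.
KEPT_CONSEQUENCES = {
    "frameshift", "missense", "inframe", "splicing",
    "start_lost", "stop_gained", "stop_lost",
}


@dataclass
class AnnotatedVariant:
    consequence: str
    vaf: float  # mutant allele fraction in the sample
    population_af: Dict[str, float] = field(default_factory=dict)  # database -> AF


def passes_annotation_filter(
    v: AnnotatedVariant,
    max_pop_af: float = 0.01,
    vaf_range: Tuple[float, float] = (0.25, 0.75),
) -> bool:
    # (1) Drop variants with AF > 0.01 in any population database.
    if any(af > max_pop_af for af in v.population_af.values()):
        return False
    # (2) Keep only the listed protein-altering or splicing consequence types.
    if v.consequence not in KEPT_CONSEQUENCES:
        return False
    # (3) Keep heterozygous sites with a mutant allele fraction in [0.25, 0.75].
    low, high = vaf_range
    return low <= v.vaf <= high
```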
5. The method of claim 1 or 4, wherein the annotated regions comprise the 3' untranslated region, the 5' untranslated region and exon regions.
6. The method of claim 1, wherein the known medulloblastoma susceptibility genes comprise: APC, BRCA2, PALB2, PTCH1, TP53 and SUFU.
7. The method of claim 1, wherein the number of families in the plurality of family samples is ≥ 43.
8. The method of claim 1, wherein the novel hereditary susceptibility genes of medulloblastoma comprise: SSPO, SPECC1, RABEP1, MTUS1, MPRIP and CSMD1.
9. The method of claim 1, wherein the blood comprises peripheral blood.
CN202410813874.2A 2024-06-24 2024-06-24 Method for screening novel hereditary susceptibility genes of medulloblastoma Pending CN118380053A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410813874.2A CN118380053A (en) 2024-06-24 2024-06-24 Method for screening novel hereditary susceptibility genes of medulloblastoma

Publications (1)

Publication Number Publication Date
CN118380053A true CN118380053A (en) 2024-07-23

Family

ID=91908986

Country Status (1)

Country Link
CN (1) CN118380053A (en)

Patent Citations (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109715802A (en) * 2016-03-18 2019-05-03 Caris Science, Inc. Oligonucleotide probe and application thereof
CN106434918A (en) * 2016-09-26 2017-02-22 Huashan Hospital Affiliated to Fudan University Application of specific gene mutation in molecular diagnosis and targeted therapy of pituitary adenoma detection
CN110499364A (en) * 2019-07-30 2019-11-26 Beijing Kai'ang Medical Diagnostic Technology Co., Ltd. A kind of probe groups and its kit and application for detecting the full exon of extended pattern hereditary disease
CN112375815A (en) * 2020-11-11 2021-02-19 Shanghai Children's Hospital Genetic disease high-throughput sequencing pathogenic mutation screening method based on core family
CN112342293A (en) * 2020-11-23 2021-02-09 First Affiliated Hospital of Xinjiang Medical University Family pathogenic coronary heart disease susceptibility gene, reagent for in vitro detection thereof and application thereof
CN112626213A (en) * 2020-12-28 2021-04-09 Fudan University Shanghai Cancer Center Liver cancer detection panel based on next-generation sequencing technology, kit and application thereof
CN112735599A (en) * 2021-01-26 2021-04-30 Henan Provincial People's Hospital Evaluation method for judging rare hereditary diseases
CN113470776A (en) * 2021-05-28 2021-10-01 Dermatology Hospital of Southern Medical University (Guangdong Provincial Dermatology Hospital, Guangdong Provincial Center for Skin Disease and STI Control, China Leprosy Control Research Center) Genetic diagnosis system integrating data acquisition, analysis and report generation
CN113249483A (en) * 2021-06-10 2021-08-13 Genetron Health (Beijing) Co., Ltd. Gene combination, system and application for detecting tumor mutation load
CN113355418A (en) * 2021-06-15 2021-09-07 Shanghai Changzheng Hospital Gene for osteosarcoma typing and osteosarcoma prognosis evaluation and application thereof
CN113502353A (en) * 2021-07-07 2021-10-15 Wuhan Kaideweisi Biotechnology Co., Ltd. Application of integrated high-frequency gene locus of high-intermediate-risk HPV (human papilloma virus) related to cervical cancer occurrence
WO2023286305A1 (en) * 2021-07-15 2023-01-19 FUJIFILM Corporation Cell quality management method and cell production method
TW202307215A (en) * 2021-07-15 2023-02-16 FUJIFILM Corporation Cell quality management method and cell production method
CN117677707A (en) * 2021-07-15 2024-03-08 FUJIFILM Corporation Quality control method for specific cells and method for producing specific cells
CN117730164A (en) * 2021-07-15 2024-03-19 FUJIFILM Corporation Method for managing cell quality and method for producing cell
CN113628681A (en) * 2021-07-21 2021-11-09 Harbin Xingyun Medical Laboratory Co., Ltd. Family de novo mutation-based analysis method and application thereof
CN114921536A (en) * 2022-06-28 2022-08-19 Suzhou Basecare Medical Device Co., Ltd. Method, device, storage medium and equipment for detecting uniparental disomy and loss of heterozygosity
CN115171784A (en) * 2022-07-20 2022-10-11 Beijing Xieyun Qiyuan Technology Co., Ltd. Screening method and screening system for risk genes and risk mutations of polygenic genetic disorders
CN114999570A (en) * 2022-08-05 2022-09-02 Suzhou Basecare Medical Device Co., Ltd. Haplotype construction method independent of proband
CN116665774A (en) * 2023-04-24 2023-08-29 Suzhou Basecare Medical Device Co., Ltd. Family whole genome monomer linkage analysis method, device, storage medium and equipment
CN116168762A (en) * 2023-04-25 2023-05-26 Genetron Health (Beijing) Co., Ltd. Computer readable storage medium and device for predicting medulloblastoma typing by low depth whole genome sequencing technique and application thereof
CN117727363A (en) * 2023-12-19 2024-03-19 Delifu (Xiamen) Biotechnology Co., Ltd. Method and system for analyzing tumor gene mutation detection biological information of multiple sequencing platforms

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
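Zhu Zhu; Huang Jian; Pan Guoqing; Chen Changwang; Hong Min; Li Wenliang; Dong Jian: "Study of cancer-causing gene mutations in seven Lynch syndrome families in Yunnan Province", Cancer Research on Prevention and Treatment (肿瘤防治研究), no. 04, 25 April 2015 (2015-04-25) *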

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination