US20140336996A1 - Analysis of genetic variants - Google Patents
Analysis of genetic variants
- Publication number
- US20140336996A1 (application US14/274,525)
- Authority
- US
- United States
- Prior art keywords
- variant
- value
- tumor
- allele frequency
- sample
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 230000002068 genetic effect Effects 0.000 title abstract description 17
- 238000004458 analytical method Methods 0.000 title description 37
- 238000000034 method Methods 0.000 claims abstract description 105
- 206010028980 Neoplasm Diseases 0.000 claims description 224
- 108700028369 Alleles Proteins 0.000 claims description 126
- 108091027544 Subgenomic mRNA Proteins 0.000 claims description 109
- 210000004602 germ cell Anatomy 0.000 claims description 108
- 230000000392 somatic effect Effects 0.000 claims description 86
- 230000035772 mutation Effects 0.000 claims description 65
- 238000012512 characterization method Methods 0.000 claims description 47
- 108700024394 Exon Proteins 0.000 claims description 45
- 238000012163 sequencing technique Methods 0.000 claims description 41
- 230000006870 function Effects 0.000 claims description 34
- 238000004364 calculation method Methods 0.000 claims description 29
- 238000012217 deletion Methods 0.000 claims description 19
- 230000037430 deletion Effects 0.000 claims description 19
- 206010069754 Acquired gene mutation Diseases 0.000 claims description 12
- 230000037439 somatic mutation Effects 0.000 claims description 12
- 239000000523 sample Substances 0.000 description 159
- 210000001519 tissue Anatomy 0.000 description 45
- 201000011510 cancer Diseases 0.000 description 37
- 108090000623 proteins and genes Proteins 0.000 description 28
- 238000007481 next generation sequencing Methods 0.000 description 25
- 238000012360 testing method Methods 0.000 description 21
- 230000008569 process Effects 0.000 description 20
- 239000002773 nucleotide Substances 0.000 description 19
- 125000003729 nucleotide group Chemical group 0.000 description 19
- 238000011282 treatment Methods 0.000 description 16
- 210000004027 cell Anatomy 0.000 description 14
- 239000012634 fragment Substances 0.000 description 12
- 238000004422 calculation algorithm Methods 0.000 description 11
- 102000039446 nucleic acids Human genes 0.000 description 11
- 108020004707 nucleic acids Proteins 0.000 description 11
- 150000007523 nucleic acids Chemical class 0.000 description 11
- 239000000126 substance Substances 0.000 description 11
- 101150072950 BRCA1 gene Proteins 0.000 description 10
- 101000605639 Homo sapiens Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit alpha isoform Proteins 0.000 description 10
- 102100038332 Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit alpha isoform Human genes 0.000 description 10
- 230000004075 alteration Effects 0.000 description 10
- 238000011319 anticancer therapy Methods 0.000 description 10
- 210000000349 chromosome Anatomy 0.000 description 10
- 230000037442 genomic alteration Effects 0.000 description 10
- 238000013459 approach Methods 0.000 description 9
- 230000008859 change Effects 0.000 description 8
- 239000003814 drug Substances 0.000 description 8
- 108700020463 BRCA1 Proteins 0.000 description 7
- 102000036365 BRCA1 Human genes 0.000 description 7
- 108020004414 DNA Proteins 0.000 description 7
- 229940124597 therapeutic agent Drugs 0.000 description 7
- 101150008921 Brca2 gene Proteins 0.000 description 6
- 102100025064 Cellular tumor antigen p53 Human genes 0.000 description 6
- 208000005443 Circulating Neoplastic Cells Diseases 0.000 description 6
- 108010011536 PTEN Phosphohydrolase Proteins 0.000 description 6
- 102000014160 PTEN Phosphohydrolase Human genes 0.000 description 6
- 108010078814 Tumor Suppressor Protein p53 Proteins 0.000 description 6
- 230000011218 segmentation Effects 0.000 description 6
- 238000003860 storage Methods 0.000 description 6
- 102000052609 BRCA2 Human genes 0.000 description 5
- 108700020462 BRCA2 Proteins 0.000 description 5
- 101001012157 Homo sapiens Receptor tyrosine-protein kinase erbB-2 Proteins 0.000 description 5
- 102100030086 Receptor tyrosine-protein kinase erbB-2 Human genes 0.000 description 5
- 239000012491 analyte Substances 0.000 description 5
- 238000004891 communication Methods 0.000 description 5
- 208000037265 diseases, disorders, signs and symptoms Diseases 0.000 description 5
- 238000003364 immunohistochemistry Methods 0.000 description 5
- 238000003780 insertion Methods 0.000 description 5
- 230000037431 insertion Effects 0.000 description 5
- 238000002560 therapeutic procedure Methods 0.000 description 5
- 108091026890 Coding region Proteins 0.000 description 4
- 230000004544 DNA amplification Effects 0.000 description 4
- 208000022120 Jeavons syndrome Diseases 0.000 description 4
- 238000003556 assay Methods 0.000 description 4
- 210000004369 blood Anatomy 0.000 description 4
- 239000008280 blood Substances 0.000 description 4
- 208000035475 disorder Diseases 0.000 description 4
- 206010073071 hepatocellular carcinoma Diseases 0.000 description 4
- 230000003902 lesion Effects 0.000 description 4
- 230000007246 mechanism Effects 0.000 description 4
- 239000000203 mixture Substances 0.000 description 4
- 238000009521 phase II clinical trial Methods 0.000 description 4
- 230000001105 regulatory effect Effects 0.000 description 4
- 230000004044 response Effects 0.000 description 4
- 102200085789 rs121913279 Human genes 0.000 description 4
- 102220097748 rs766786605 Human genes 0.000 description 4
- 238000005070 sampling Methods 0.000 description 4
- 239000007858 starting material Substances 0.000 description 4
- 238000013179 statistical model Methods 0.000 description 4
- 238000010200 validation analysis Methods 0.000 description 4
- 108700040618 BRCA1 Genes Proteins 0.000 description 3
- 206010009944 Colon cancer Diseases 0.000 description 3
- 108010076010 Cystathionine beta-lyase Proteins 0.000 description 3
- 102100035813 E3 ubiquitin-protein ligase CBL Human genes 0.000 description 3
- 102100030708 GTPase KRas Human genes 0.000 description 3
- 108091092195 Intron Proteins 0.000 description 3
- 102000048850 Neoplasm Genes Human genes 0.000 description 3
- 108700019961 Neoplasm Genes Proteins 0.000 description 3
- 108091028043 Nucleic acid sequence Proteins 0.000 description 3
- 206010035226 Plasma cell myeloma Diseases 0.000 description 3
- 201000000582 Retinoblastoma Diseases 0.000 description 3
- 208000003721 Triple Negative Breast Neoplasms Diseases 0.000 description 3
- 230000003321 amplification Effects 0.000 description 3
- 238000006243 chemical reaction Methods 0.000 description 3
- 239000003153 chemical reaction reagent Substances 0.000 description 3
- 238000000205 computational method Methods 0.000 description 3
- 238000010586 diagram Methods 0.000 description 3
- 238000011156 evaluation Methods 0.000 description 3
- 239000000463 material Substances 0.000 description 3
- 238000005259 measurement Methods 0.000 description 3
- 206010061289 metastatic neoplasm Diseases 0.000 description 3
- 238000003199 nucleic acid amplification method Methods 0.000 description 3
- 102000054765 polymorphisms of proteins Human genes 0.000 description 3
- 238000012545 processing Methods 0.000 description 3
- 230000008707 rearrangement Effects 0.000 description 3
- 230000004043 responsiveness Effects 0.000 description 3
- 208000022679 triple-negative breast carcinoma Diseases 0.000 description 3
- 108091032973 (ribonucleotides)n+m Proteins 0.000 description 2
- CDKIEBFIMCSCBB-UHFFFAOYSA-N 1-(6,7-dimethoxy-3,4-dihydro-1h-isoquinolin-2-yl)-3-(1-methyl-2-phenylpyrrolo[2,3-b]pyridin-3-yl)prop-2-en-1-one;hydrochloride Chemical compound Cl.C1C=2C=C(OC)C(OC)=CC=2CCN1C(=O)C=CC(C1=CC=CN=C1N1C)=C1C1=CC=CC=C1 CDKIEBFIMCSCBB-UHFFFAOYSA-N 0.000 description 2
- 102100026205 1-phosphatidylinositol 4,5-bisphosphate phosphodiesterase gamma-1 Human genes 0.000 description 2
- 108020005345 3' Untranslated Regions Proteins 0.000 description 2
- 108020003589 5' Untranslated Regions Proteins 0.000 description 2
- 102100038776 ADP-ribosylation factor-related protein 1 Human genes 0.000 description 2
- 102100033793 ALK tyrosine kinase receptor Human genes 0.000 description 2
- 102100034580 AT-rich interactive domain-containing protein 1A Human genes 0.000 description 2
- 102000000872 ATM Human genes 0.000 description 2
- 208000024893 Acute lymphoblastic leukemia Diseases 0.000 description 2
- 208000014697 Acute lymphocytic leukaemia Diseases 0.000 description 2
- 208000031261 Acute myeloid leukaemia Diseases 0.000 description 2
- 102100035886 Adenine DNA glycosylase Human genes 0.000 description 2
- 102100034540 Adenomatous polyposis coli protein Human genes 0.000 description 2
- 102100024439 Adhesion G protein-coupled receptor A2 Human genes 0.000 description 2
- 102100021569 Apoptosis regulator Bcl-2 Human genes 0.000 description 2
- 108010004586 Ataxia Telangiectasia Mutated Proteins Proteins 0.000 description 2
- 102000004000 Aurora Kinase A Human genes 0.000 description 2
- 108090000461 Aurora Kinase A Proteins 0.000 description 2
- 102100032306 Aurora kinase B Human genes 0.000 description 2
- 102100021631 B-cell lymphoma 6 protein Human genes 0.000 description 2
- 108091012583 BCL2 Proteins 0.000 description 2
- 208000032791 BCR-ABL1 positive chronic myelogenous leukemia Diseases 0.000 description 2
- 108091007743 BRCA1/2 Proteins 0.000 description 2
- 102100026596 Bcl-2-like protein 1 Human genes 0.000 description 2
- 102100023932 Bcl-2-like protein 2 Human genes 0.000 description 2
- 102100021334 Bcl-2-related protein A1 Human genes 0.000 description 2
- 101150008012 Bcl2l1 gene Proteins 0.000 description 2
- 101001042041 Bos taurus Isocitrate dehydrogenase [NAD] subunit beta, mitochondrial Proteins 0.000 description 2
- 206010006187 Breast cancer Diseases 0.000 description 2
- 208000026310 Breast neoplasm Diseases 0.000 description 2
- 101710098191 C-4 methylsterol oxidase ERG25 Proteins 0.000 description 2
- 102100034808 CCAAT/enhancer-binding protein alpha Human genes 0.000 description 2
- 102100036364 Cadherin-2 Human genes 0.000 description 2
- 102100022480 Cadherin-20 Human genes 0.000 description 2
- 102100029761 Cadherin-5 Human genes 0.000 description 2
- 102100024965 Caspase recruitment domain-containing protein 11 Human genes 0.000 description 2
- 102100028914 Catenin beta-1 Human genes 0.000 description 2
- 102100037182 Cation-independent mannose-6-phosphate receptor Human genes 0.000 description 2
- ZEOWTGPWHLSLOG-UHFFFAOYSA-N Cc1ccc(cc1-c1ccc2c(n[nH]c2c1)-c1cnn(c1)C1CC1)C(=O)Nc1cccc(c1)C(F)(F)F Chemical compound Cc1ccc(cc1-c1ccc2c(n[nH]c2c1)-c1cnn(c1)C1CC1)C(=O)Nc1cccc(c1)C(F)(F)F ZEOWTGPWHLSLOG-UHFFFAOYSA-N 0.000 description 2
- 108091007854 Cdh1/Fizzy-related Proteins 0.000 description 2
- 102000038594 Cdh1/Fizzy-related Human genes 0.000 description 2
- 208000010833 Chronic myeloid leukaemia Diseases 0.000 description 2
- 108010043471 Core Binding Factor Alpha 2 Subunit Proteins 0.000 description 2
- 102100029375 Crk-like protein Human genes 0.000 description 2
- 108010058546 Cyclin D1 Proteins 0.000 description 2
- 108010025464 Cyclin-Dependent Kinase 4 Proteins 0.000 description 2
- 108010025468 Cyclin-Dependent Kinase 6 Proteins 0.000 description 2
- 108010009356 Cyclin-Dependent Kinase Inhibitor p15 Proteins 0.000 description 2
- 102000009512 Cyclin-Dependent Kinase Inhibitor p15 Human genes 0.000 description 2
- 108010009392 Cyclin-Dependent Kinase Inhibitor p16 Proteins 0.000 description 2
- 108010009367 Cyclin-Dependent Kinase Inhibitor p18 Proteins 0.000 description 2
- 102000009503 Cyclin-Dependent Kinase Inhibitor p18 Human genes 0.000 description 2
- 102100036252 Cyclin-dependent kinase 4 Human genes 0.000 description 2
- 102100026804 Cyclin-dependent kinase 6 Human genes 0.000 description 2
- 102100024456 Cyclin-dependent kinase 8 Human genes 0.000 description 2
- 108010081668 Cytochrome P-450 CYP3A Proteins 0.000 description 2
- 102100038497 Cytokine receptor-like factor 2 Human genes 0.000 description 2
- 102100024812 DNA (cytosine-5)-methyltransferase 3A Human genes 0.000 description 2
- 108010024491 DNA Methyltransferase 3A Proteins 0.000 description 2
- 102100034157 DNA mismatch repair protein Msh2 Human genes 0.000 description 2
- 102100021147 DNA mismatch repair protein Msh6 Human genes 0.000 description 2
- 102100024607 DNA topoisomerase 1 Human genes 0.000 description 2
- 102100037799 DNA-binding protein Ikaros Human genes 0.000 description 2
- 102100022204 DNA-dependent protein kinase catalytic subunit Human genes 0.000 description 2
- 208000035240 Disease Resistance Diseases 0.000 description 2
- 206010061818 Disease progression Diseases 0.000 description 2
- 102100031480 Dual specificity mitogen-activated protein kinase kinase 1 Human genes 0.000 description 2
- 102100023266 Dual specificity mitogen-activated protein kinase kinase 2 Human genes 0.000 description 2
- 102100023274 Dual specificity mitogen-activated protein kinase kinase 4 Human genes 0.000 description 2
- 108050002772 E3 ubiquitin-protein ligase Mdm2 Proteins 0.000 description 2
- 102000012199 E3 ubiquitin-protein ligase Mdm2 Human genes 0.000 description 2
- 101150016325 EPHA3 gene Proteins 0.000 description 2
- 102100039563 ETS translocation variant 1 Human genes 0.000 description 2
- 102100039578 ETS translocation variant 4 Human genes 0.000 description 2
- 102100039577 ETS translocation variant 5 Human genes 0.000 description 2
- 108010055323 EphB4 Receptor Proteins 0.000 description 2
- 101150025643 Epha5 gene Proteins 0.000 description 2
- 102100030324 Ephrin type-A receptor 3 Human genes 0.000 description 2
- 102100021605 Ephrin type-A receptor 5 Human genes 0.000 description 2
- 102100021604 Ephrin type-A receptor 6 Human genes 0.000 description 2
- 102100021606 Ephrin type-A receptor 7 Human genes 0.000 description 2
- 102100030779 Ephrin type-B receptor 1 Human genes 0.000 description 2
- 102100031983 Ephrin type-B receptor 4 Human genes 0.000 description 2
- 102100031984 Ephrin type-B receptor 6 Human genes 0.000 description 2
- 102100031690 Erythroid transcription factor Human genes 0.000 description 2
- 102100038595 Estrogen receptor Human genes 0.000 description 2
- 101710105178 F-box/WD repeat-containing protein 7 Proteins 0.000 description 2
- 102100028138 F-box/WD repeat-containing protein 7 Human genes 0.000 description 2
- 102000009095 Fanconi Anemia Complementation Group A protein Human genes 0.000 description 2
- 108010087740 Fanconi Anemia Complementation Group A protein Proteins 0.000 description 2
- 102100023593 Fibroblast growth factor receptor 1 Human genes 0.000 description 2
- 101710182386 Fibroblast growth factor receptor 1 Proteins 0.000 description 2
- 102100023600 Fibroblast growth factor receptor 2 Human genes 0.000 description 2
- 101710182389 Fibroblast growth factor receptor 2 Proteins 0.000 description 2
- 102100027842 Fibroblast growth factor receptor 3 Human genes 0.000 description 2
- 101710182396 Fibroblast growth factor receptor 3 Proteins 0.000 description 2
- 102100027844 Fibroblast growth factor receptor 4 Human genes 0.000 description 2
- 102100032596 Fibrocystin Human genes 0.000 description 2
- 102100027579 Forkhead box protein P4 Human genes 0.000 description 2
- 102100024165 G1/S-specific cyclin-D1 Human genes 0.000 description 2
- 102100024185 G1/S-specific cyclin-D2 Human genes 0.000 description 2
- 102100037859 G1/S-specific cyclin-D3 Human genes 0.000 description 2
- 102100037858 G1/S-specific cyclin-E1 Human genes 0.000 description 2
- 102100029974 GTPase HRas Human genes 0.000 description 2
- 102100039788 GTPase NRas Human genes 0.000 description 2
- 101001077417 Gallus gallus Potassium voltage-gated channel subfamily H member 6 Proteins 0.000 description 2
- 102100025334 Guanine nucleotide-binding protein G(q) subunit alpha Human genes 0.000 description 2
- 102100032610 Guanine nucleotide-binding protein G(s) subunit alpha isoforms XLas Human genes 0.000 description 2
- 102100036738 Guanine nucleotide-binding protein subunit alpha-11 Human genes 0.000 description 2
- 102100040735 Guanylate cyclase soluble subunit alpha-2 Human genes 0.000 description 2
- 102100031561 Hamartin Human genes 0.000 description 2
- 102100034051 Heat shock protein HSP 90-alpha Human genes 0.000 description 2
- 102100035108 High affinity nerve growth factor receptor Human genes 0.000 description 2
- 102100022103 Histone-lysine N-methyltransferase 2A Human genes 0.000 description 2
- 102100038970 Histone-lysine N-methyltransferase EZH2 Human genes 0.000 description 2
- 102100039489 Histone-lysine N-methyltransferase, H3 lysine-79 specific Human genes 0.000 description 2
- 102100039541 Homeobox protein Hox-A3 Human genes 0.000 description 2
- 102100027893 Homeobox protein Nkx-2.1 Human genes 0.000 description 2
- 101000691599 Homo sapiens 1-phosphatidylinositol 4,5-bisphosphate phosphodiesterase gamma-1 Proteins 0.000 description 2
- 101000809413 Homo sapiens ADP-ribosylation factor-related protein 1 Proteins 0.000 description 2
- 101000779641 Homo sapiens ALK tyrosine kinase receptor Proteins 0.000 description 2
- 101000924266 Homo sapiens AT-rich interactive domain-containing protein 1A Proteins 0.000 description 2
- 101001000351 Homo sapiens Adenine DNA glycosylase Proteins 0.000 description 2
- 101000924577 Homo sapiens Adenomatous polyposis coli protein Proteins 0.000 description 2
- 101000833358 Homo sapiens Adhesion G protein-coupled receptor A2 Proteins 0.000 description 2
- 101000798306 Homo sapiens Aurora kinase B Proteins 0.000 description 2
- 101000971234 Homo sapiens B-cell lymphoma 6 protein Proteins 0.000 description 2
- 101000904691 Homo sapiens Bcl-2-like protein 2 Proteins 0.000 description 2
- 101000894929 Homo sapiens Bcl-2-related protein A1 Proteins 0.000 description 2
- 101000945515 Homo sapiens CCAAT/enhancer-binding protein alpha Proteins 0.000 description 2
- 101000714537 Homo sapiens Cadherin-2 Proteins 0.000 description 2
- 101000899459 Homo sapiens Cadherin-20 Proteins 0.000 description 2
- 101000794587 Homo sapiens Cadherin-5 Proteins 0.000 description 2
- 101000761179 Homo sapiens Caspase recruitment domain-containing protein 11 Proteins 0.000 description 2
- 101000916173 Homo sapiens Catenin beta-1 Proteins 0.000 description 2
- 101001028831 Homo sapiens Cation-independent mannose-6-phosphate receptor Proteins 0.000 description 2
- 101000919315 Homo sapiens Crk-like protein Proteins 0.000 description 2
- 101000980937 Homo sapiens Cyclin-dependent kinase 8 Proteins 0.000 description 2
- 101000956427 Homo sapiens Cytokine receptor-like factor 2 Proteins 0.000 description 2
- 101001134036 Homo sapiens DNA mismatch repair protein Msh2 Proteins 0.000 description 2
- 101000968658 Homo sapiens DNA mismatch repair protein Msh6 Proteins 0.000 description 2
- 101000830681 Homo sapiens DNA topoisomerase 1 Proteins 0.000 description 2
- 101000599038 Homo sapiens DNA-binding protein Ikaros Proteins 0.000 description 2
- 101000619536 Homo sapiens DNA-dependent protein kinase catalytic subunit Proteins 0.000 description 2
- 101001115395 Homo sapiens Dual specificity mitogen-activated protein kinase kinase 4 Proteins 0.000 description 2
- 101000813729 Homo sapiens ETS translocation variant 1 Proteins 0.000 description 2
- 101000813747 Homo sapiens ETS translocation variant 4 Proteins 0.000 description 2
- 101000813745 Homo sapiens ETS translocation variant 5 Proteins 0.000 description 2
- 101000967216 Homo sapiens Eosinophil cationic protein Proteins 0.000 description 2
- 101000898696 Homo sapiens Ephrin type-A receptor 6 Proteins 0.000 description 2
- 101000898708 Homo sapiens Ephrin type-A receptor 7 Proteins 0.000 description 2
- 101001064150 Homo sapiens Ephrin type-B receptor 1 Proteins 0.000 description 2
- 101001064451 Homo sapiens Ephrin type-B receptor 6 Proteins 0.000 description 2
- 101001066268 Homo sapiens Erythroid transcription factor Proteins 0.000 description 2
- 101000882584 Homo sapiens Estrogen receptor Proteins 0.000 description 2
- 101000917134 Homo sapiens Fibroblast growth factor receptor 4 Proteins 0.000 description 2
- 101000730595 Homo sapiens Fibrocystin Proteins 0.000 description 2
- 101000861403 Homo sapiens Forkhead box protein P4 Proteins 0.000 description 2
- 101000980741 Homo sapiens G1/S-specific cyclin-D2 Proteins 0.000 description 2
- 101000738559 Homo sapiens G1/S-specific cyclin-D3 Proteins 0.000 description 2
- 101000738568 Homo sapiens G1/S-specific cyclin-E1 Proteins 0.000 description 2
- 101000584633 Homo sapiens GTPase HRas Proteins 0.000 description 2
- 101000584612 Homo sapiens GTPase KRas Proteins 0.000 description 2
- 101000744505 Homo sapiens GTPase NRas Proteins 0.000 description 2
- 101000857888 Homo sapiens Guanine nucleotide-binding protein G(q) subunit alpha Proteins 0.000 description 2
- 101001014590 Homo sapiens Guanine nucleotide-binding protein G(s) subunit alpha isoforms XLas Proteins 0.000 description 2
- 101001014594 Homo sapiens Guanine nucleotide-binding protein G(s) subunit alpha isoforms short Proteins 0.000 description 2
- 101001072407 Homo sapiens Guanine nucleotide-binding protein subunit alpha-11 Proteins 0.000 description 2
- 101001038749 Homo sapiens Guanylate cyclase soluble subunit alpha-2 Proteins 0.000 description 2
- 101000795643 Homo sapiens Hamartin Proteins 0.000 description 2
- 101001016865 Homo sapiens Heat shock protein HSP 90-alpha Proteins 0.000 description 2
- 101000596894 Homo sapiens High affinity nerve growth factor receptor Proteins 0.000 description 2
- 101001045846 Homo sapiens Histone-lysine N-methyltransferase 2A Proteins 0.000 description 2
- 101000882127 Homo sapiens Histone-lysine N-methyltransferase EZH2 Proteins 0.000 description 2
- 101000963360 Homo sapiens Histone-lysine N-methyltransferase, H3 lysine-79 specific Proteins 0.000 description 2
- 101000962622 Homo sapiens Homeobox protein Hox-A3 Proteins 0.000 description 2
- 101000632178 Homo sapiens Homeobox protein Nkx-2.1 Proteins 0.000 description 2
- 101100508538 Homo sapiens IKBKE gene Proteins 0.000 description 2
- 101001056180 Homo sapiens Induced myeloid leukemia cell differentiation protein Mcl-1 Proteins 0.000 description 2
- 101001077600 Homo sapiens Insulin receptor substrate 2 Proteins 0.000 description 2
- 101001034652 Homo sapiens Insulin-like growth factor 1 receptor Proteins 0.000 description 2
- 101000960234 Homo sapiens Isocitrate dehydrogenase [NADP] cytoplasmic Proteins 0.000 description 2
- 101000599886 Homo sapiens Isocitrate dehydrogenase [NADP], mitochondrial Proteins 0.000 description 2
- 101000984620 Homo sapiens Low-density lipoprotein receptor-related protein 1B Proteins 0.000 description 2
- 101000582631 Homo sapiens Menin Proteins 0.000 description 2
- 101000954986 Homo sapiens Merlin Proteins 0.000 description 2
- 101000653374 Homo sapiens Methylcytosine dioxygenase TET2 Proteins 0.000 description 2
- 101001030211 Homo sapiens Myc proto-oncogene protein Proteins 0.000 description 2
- 101001014610 Homo sapiens Neuroendocrine secretory protein 55 Proteins 0.000 description 2
- 101001109719 Homo sapiens Nucleophosmin Proteins 0.000 description 2
- 101000601724 Homo sapiens Paired box protein Pax-5 Proteins 0.000 description 2
- 101001120056 Homo sapiens Phosphatidylinositol 3-kinase regulatory subunit alpha Proteins 0.000 description 2
- 101001126417 Homo sapiens Platelet-derived growth factor receptor alpha Proteins 0.000 description 2
- 101000808592 Homo sapiens Probable ubiquitin carboxyl-terminal hydrolase FAF-X Proteins 0.000 description 2
- 101000797903 Homo sapiens Protein ALEX Proteins 0.000 description 2
- 101000585703 Homo sapiens Protein L-Myc Proteins 0.000 description 2
- 101000579425 Homo sapiens Proto-oncogene tyrosine-protein kinase receptor Ret Proteins 0.000 description 2
- 101000602015 Homo sapiens Protocadherin gamma-B4 Proteins 0.000 description 2
- 101000779418 Homo sapiens RAC-alpha serine/threonine-protein kinase Proteins 0.000 description 2
- 101000798015 Homo sapiens RAC-beta serine/threonine-protein kinase Proteins 0.000 description 2
- 101000798007 Homo sapiens RAC-gamma serine/threonine-protein kinase Proteins 0.000 description 2
- 101000712530 Homo sapiens RAF proto-oncogene serine/threonine-protein kinase Proteins 0.000 description 2
- 101100087590 Homo sapiens RICTOR gene Proteins 0.000 description 2
- 101000932478 Homo sapiens Receptor-type tyrosine-protein kinase FLT3 Proteins 0.000 description 2
- 101000606537 Homo sapiens Receptor-type tyrosine-protein phosphatase delta Proteins 0.000 description 2
- 101000742859 Homo sapiens Retinoblastoma-associated protein Proteins 0.000 description 2
- 101001112293 Homo sapiens Retinoic acid receptor alpha Proteins 0.000 description 2
- 101000927796 Homo sapiens Rho guanine nucleotide exchange factor 7 Proteins 0.000 description 2
- 101000771237 Homo sapiens Serine/threonine-protein kinase A-Raf Proteins 0.000 description 2
- 101000984753 Homo sapiens Serine/threonine-protein kinase B-raf Proteins 0.000 description 2
- 101000777293 Homo sapiens Serine/threonine-protein kinase Chk1 Proteins 0.000 description 2
- 101000777277 Homo sapiens Serine/threonine-protein kinase Chk2 Proteins 0.000 description 2
- 101000987315 Homo sapiens Serine/threonine-protein kinase PAK 3 Proteins 0.000 description 2
- 101000628562 Homo sapiens Serine/threonine-protein kinase STK11 Proteins 0.000 description 2
- 101000713600 Homo sapiens T-box transcription factor TBX22 Proteins 0.000 description 2
- 101000799466 Homo sapiens Thrombopoietin receptor Proteins 0.000 description 2
- 101000702545 Homo sapiens Transcription activator BRG1 Proteins 0.000 description 2
- 101000813738 Homo sapiens Transcription factor ETV6 Proteins 0.000 description 2
- 101000664703 Homo sapiens Transcription factor SOX-10 Proteins 0.000 description 2
- 101000687905 Homo sapiens Transcription factor SOX-2 Proteins 0.000 description 2
- 101001010792 Homo sapiens Transcriptional regulator ERG Proteins 0.000 description 2
- 101000638154 Homo sapiens Transmembrane protease serine 2 Proteins 0.000 description 2
- 101000795659 Homo sapiens Tuberin Proteins 0.000 description 2
- 101000823316 Homo sapiens Tyrosine-protein kinase ABL1 Proteins 0.000 description 2
- 101000823271 Homo sapiens Tyrosine-protein kinase ABL2 Proteins 0.000 description 2
- 101000997835 Homo sapiens Tyrosine-protein kinase JAK1 Proteins 0.000 description 2
- 101000997832 Homo sapiens Tyrosine-protein kinase JAK2 Proteins 0.000 description 2
- 101000934996 Homo sapiens Tyrosine-protein kinase JAK3 Proteins 0.000 description 2
- 101001087416 Homo sapiens Tyrosine-protein phosphatase non-receptor type 11 Proteins 0.000 description 2
- 101000851018 Homo sapiens Vascular endothelial growth factor receptor 1 Proteins 0.000 description 2
- 102100026539 Induced myeloid leukemia cell differentiation protein Mcl-1 Human genes 0.000 description 2
- 102100027004 Inhibin beta A chain Human genes 0.000 description 2
- 102100021857 Inhibitor of nuclear factor kappa-B kinase subunit epsilon Human genes 0.000 description 2
- 102100025092 Insulin receptor substrate 2 Human genes 0.000 description 2
- 102100039688 Insulin-like growth factor 1 receptor Human genes 0.000 description 2
- 102100039905 Isocitrate dehydrogenase [NADP] cytoplasmic Human genes 0.000 description 2
- 102100037845 Isocitrate dehydrogenase [NADP], mitochondrial Human genes 0.000 description 2
- 102100027121 Low-density lipoprotein receptor-related protein 1B Human genes 0.000 description 2
- 206010058467 Lung neoplasm malignant Diseases 0.000 description 2
- 108010068342 MAP Kinase Kinase 1 Proteins 0.000 description 2
- 108010068353 MAP Kinase Kinase 2 Proteins 0.000 description 2
- 102000017274 MDM4 Human genes 0.000 description 2
- 108050005300 MDM4 Proteins 0.000 description 2
- 102000046961 MRE11 Homologue Human genes 0.000 description 2
- 108700019589 MRE11 Homologue Proteins 0.000 description 2
- 229910015837 MSH2 Inorganic materials 0.000 description 2
- 108700012912 MYCN Proteins 0.000 description 2
- 101150022024 MYCN gene Proteins 0.000 description 2
- 208000035346 Margins of Excision Diseases 0.000 description 2
- 102100030550 Menin Human genes 0.000 description 2
- 102100037106 Merlin Human genes 0.000 description 2
- 102100030803 Methylcytosine dioxygenase TET2 Human genes 0.000 description 2
- 108010050345 Microphthalmia-Associated Transcription Factor Proteins 0.000 description 2
- 102100030157 Microphthalmia-associated transcription factor Human genes 0.000 description 2
- 102100025751 Mothers against decapentaplegic homolog 2 Human genes 0.000 description 2
- 101710143123 Mothers against decapentaplegic homolog 2 Proteins 0.000 description 2
- 102100025748 Mothers against decapentaplegic homolog 3 Human genes 0.000 description 2
- 101710143111 Mothers against decapentaplegic homolog 3 Proteins 0.000 description 2
- 102100025725 Mothers against decapentaplegic homolog 4 Human genes 0.000 description 2
- 101710143112 Mothers against decapentaplegic homolog 4 Proteins 0.000 description 2
- 101150097381 Mtor gene Proteins 0.000 description 2
- 208000034578 Multiple myelomas Diseases 0.000 description 2
- 102000013609 MutL Protein Homolog 1 Human genes 0.000 description 2
- 108010026664 MutL Protein Homolog 1 Proteins 0.000 description 2
- 102100038895 Myc proto-oncogene protein Human genes 0.000 description 2
- 201000003793 Myelodysplastic syndrome Diseases 0.000 description 2
- 208000033761 Myelogenous Chronic BCR-ABL Positive Leukemia Diseases 0.000 description 2
- 208000033776 Myeloid Acute Leukemia Diseases 0.000 description 2
- 208000014767 Myeloproliferative disease Diseases 0.000 description 2
- 201000007224 Myeloproliferative neoplasm Diseases 0.000 description 2
- 108700026495 N-Myc Proto-Oncogene Proteins 0.000 description 2
- 102100030124 N-myc proto-oncogene protein Human genes 0.000 description 2
- 102100029166 NT-3 growth factor receptor Human genes 0.000 description 2
- 102000007530 Neurofibromin 1 Human genes 0.000 description 2
- 108010085793 Neurofibromin 1 Proteins 0.000 description 2
- 208000015914 Non-Hodgkin lymphomas Diseases 0.000 description 2
- 102000001759 Notch1 Receptor Human genes 0.000 description 2
- 108010029755 Notch1 Receptor Proteins 0.000 description 2
- 102100022678 Nucleophosmin Human genes 0.000 description 2
- 108700020796 Oncogene Proteins 0.000 description 2
- 102100037504 Paired box protein Pax-5 Human genes 0.000 description 2
- 108010065129 Patched-1 Receptor Proteins 0.000 description 2
- 102000012850 Patched-1 Receptor Human genes 0.000 description 2
- 102100026169 Phosphatidylinositol 3-kinase regulatory subunit alpha Human genes 0.000 description 2
- 108010051742 Platelet-Derived Growth Factor beta Receptor Proteins 0.000 description 2
- 102100030485 Platelet-derived growth factor receptor alpha Human genes 0.000 description 2
- 102100026547 Platelet-derived growth factor receptor beta Human genes 0.000 description 2
- 102100022807 Potassium voltage-gated channel subfamily H member 2 Human genes 0.000 description 2
- 208000006664 Precursor Cell Lymphoblastic Leukemia-Lymphoma Diseases 0.000 description 2
- 102100038603 Probable ubiquitin carboxyl-terminal hydrolase FAF-X Human genes 0.000 description 2
- 206010060862 Prostate cancer Diseases 0.000 description 2
- 208000000236 Prostatic Neoplasms Diseases 0.000 description 2
- 102100030128 Protein L-Myc Human genes 0.000 description 2
- 102100028286 Proto-oncogene tyrosine-protein kinase receptor Ret Human genes 0.000 description 2
- 102100033810 RAC-alpha serine/threonine-protein kinase Human genes 0.000 description 2
- 102100032315 RAC-beta serine/threonine-protein kinase Human genes 0.000 description 2
- 102100032314 RAC-gamma serine/threonine-protein kinase Human genes 0.000 description 2
- 102100033479 RAF proto-oncogene serine/threonine-protein kinase Human genes 0.000 description 2
- 108090000740 RNA-binding protein EWS Proteins 0.000 description 2
- 102000004229 RNA-binding protein EWS Human genes 0.000 description 2
- 108700019586 Rapamycin-Insensitive Companion of mTOR Proteins 0.000 description 2
- 102000046941 Rapamycin-Insensitive Companion of mTOR Human genes 0.000 description 2
- 101710100969 Receptor tyrosine-protein kinase erbB-3 Proteins 0.000 description 2
- 102100029986 Receptor tyrosine-protein kinase erbB-3 Human genes 0.000 description 2
- 102100029981 Receptor tyrosine-protein kinase erbB-4 Human genes 0.000 description 2
- 101710100963 Receptor tyrosine-protein kinase erbB-4 Proteins 0.000 description 2
- 102100020718 Receptor-type tyrosine-protein kinase FLT3 Human genes 0.000 description 2
- 102100039666 Receptor-type tyrosine-protein phosphatase delta Human genes 0.000 description 2
- 108010029031 Regulatory-Associated Protein of mTOR Proteins 0.000 description 2
- 102100040969 Regulatory-associated protein of mTOR Human genes 0.000 description 2
- 102100038042 Retinoblastoma-associated protein Human genes 0.000 description 2
- 102100023606 Retinoic acid receptor alpha Human genes 0.000 description 2
- 102100025373 Runt-related transcription factor 1 Human genes 0.000 description 2
- 108700028341 SMARCB1 Proteins 0.000 description 2
- 101150008214 SMARCB1 gene Proteins 0.000 description 2
- 102000001332 SRC Human genes 0.000 description 2
- 108060006706 SRC Proteins 0.000 description 2
- 102100025746 SWI/SNF-related matrix-associated actin-dependent regulator of chromatin subfamily B member 1 Human genes 0.000 description 2
- 101100495925 Schizosaccharomyces pombe (strain 972 / ATCC 24843) chr3 gene Proteins 0.000 description 2
- 238000012300 Sequence Analysis Methods 0.000 description 2
- 102100029437 Serine/threonine-protein kinase A-Raf Human genes 0.000 description 2
- 102100027103 Serine/threonine-protein kinase B-raf Human genes 0.000 description 2
- 102100031081 Serine/threonine-protein kinase Chk1 Human genes 0.000 description 2
- 102100031075 Serine/threonine-protein kinase Chk2 Human genes 0.000 description 2
- 102100027911 Serine/threonine-protein kinase PAK 3 Human genes 0.000 description 2
- 102100026715 Serine/threonine-protein kinase STK11 Human genes 0.000 description 2
- 102100023085 Serine/threonine-protein kinase mTOR Human genes 0.000 description 2
- 102000013380 Smoothened Receptor Human genes 0.000 description 2
- 101710090597 Smoothened homolog Proteins 0.000 description 2
- 206010068771 Soft tissue neoplasm Diseases 0.000 description 2
- 208000005718 Stomach Neoplasms Diseases 0.000 description 2
- 102100036839 T-box transcription factor TBX22 Human genes 0.000 description 2
- 102100033455 TGF-beta receptor type-2 Human genes 0.000 description 2
- 102100034196 Thrombopoietin receptor Human genes 0.000 description 2
- 102100031027 Transcription activator BRG1 Human genes 0.000 description 2
- 102100039580 Transcription factor ETV6 Human genes 0.000 description 2
- 102100038808 Transcription factor SOX-10 Human genes 0.000 description 2
- 102100024270 Transcription factor SOX-2 Human genes 0.000 description 2
- 108010082684 Transforming Growth Factor-beta Type II Receptor Proteins 0.000 description 2
- 102100031989 Transmembrane protease serine 2 Human genes 0.000 description 2
- 102100031638 Tuberin Human genes 0.000 description 2
- 102000001742 Tumor Suppressor Proteins Human genes 0.000 description 2
- 108010040002 Tumor Suppressor Proteins Proteins 0.000 description 2
- 102100033254 Tumor suppressor ARF Human genes 0.000 description 2
- 102100022596 Tyrosine-protein kinase ABL1 Human genes 0.000 description 2
- 102100022651 Tyrosine-protein kinase ABL2 Human genes 0.000 description 2
- 102100033438 Tyrosine-protein kinase JAK1 Human genes 0.000 description 2
- 102100033444 Tyrosine-protein kinase JAK2 Human genes 0.000 description 2
- 102100025387 Tyrosine-protein kinase JAK3 Human genes 0.000 description 2
- 102100033019 Tyrosine-protein phosphatase non-receptor type 11 Human genes 0.000 description 2
- 108010053100 Vascular Endothelial Growth Factor Receptor-3 Proteins 0.000 description 2
- 102100033178 Vascular endothelial growth factor receptor 1 Human genes 0.000 description 2
- 102100033177 Vascular endothelial growth factor receptor 2 Human genes 0.000 description 2
- 102100033179 Vascular endothelial growth factor receptor 3 Human genes 0.000 description 2
- 208000009956 adenocarcinoma Diseases 0.000 description 2
- 238000013476 bayesian approach Methods 0.000 description 2
- 108700000711 bcl-X Proteins 0.000 description 2
- 230000008901 benefit Effects 0.000 description 2
- 239000000872 buffer Substances 0.000 description 2
- 208000037516 chromosome inversion disease Diseases 0.000 description 2
- 208000029742 colonic neoplasm Diseases 0.000 description 2
- 239000002299 complementary DNA Substances 0.000 description 2
- 238000011109 contamination Methods 0.000 description 2
- 239000013068 control sample Substances 0.000 description 2
- 230000002596 correlated effect Effects 0.000 description 2
- 230000001419 dependent effect Effects 0.000 description 2
- 230000005750 disease progression Effects 0.000 description 2
- 230000036267 drug metabolism Effects 0.000 description 2
- 102000052116 epidermal growth factor receptor activity proteins Human genes 0.000 description 2
- 108700015053 epidermal growth factor receptor activity proteins Proteins 0.000 description 2
- 239000012520 frozen sample Substances 0.000 description 2
- 206010017758 gastric cancer Diseases 0.000 description 2
- 108010019691 inhibin beta A subunit Proteins 0.000 description 2
- 230000003993 interaction Effects 0.000 description 2
- 210000004072 lung Anatomy 0.000 description 2
- 201000005202 lung cancer Diseases 0.000 description 2
- 208000020816 lung neoplasm Diseases 0.000 description 2
- 230000003211 malignant effect Effects 0.000 description 2
- 230000001394 metastatic effect Effects 0.000 description 2
- 101150071637 mre11 gene Proteins 0.000 description 2
- YOHYSYJDKVYCJI-UHFFFAOYSA-N n-[3-[[6-[3-(trifluoromethyl)anilino]pyrimidin-4-yl]amino]phenyl]cyclopropanecarboxamide Chemical compound FC(F)(F)C1=CC=CC(NC=2N=CN=C(NC=3C=C(NC(=O)C4CC4)C=CC=3)C=2)=C1 YOHYSYJDKVYCJI-UHFFFAOYSA-N 0.000 description 2
- 208000002154 non-small cell lung carcinoma Diseases 0.000 description 2
- 238000010606 normalization Methods 0.000 description 2
- 201000008968 osteosarcoma Diseases 0.000 description 2
- 230000002974 pharmacogenomic effect Effects 0.000 description 2
- 238000009522 phase III clinical trial Methods 0.000 description 2
- 238000002360 preparation method Methods 0.000 description 2
- 238000012913 prioritisation Methods 0.000 description 2
- 238000000611 regression analysis Methods 0.000 description 2
- 238000011160 research Methods 0.000 description 2
- 238000012552 review Methods 0.000 description 2
- 102200108481 rs121912654 Human genes 0.000 description 2
- 208000011571 secondary malignant neoplasm Diseases 0.000 description 2
- 201000011549 stomach cancer Diseases 0.000 description 2
- 230000008685 targeting Effects 0.000 description 2
- 230000001225 therapeutic effect Effects 0.000 description 2
- 201000002510 thyroid cancer Diseases 0.000 description 2
- 231100000419 toxicity Toxicity 0.000 description 2
- 230000001988 toxicity Effects 0.000 description 2
- 230000005945 translocation Effects 0.000 description 2
- 108010064892 trkC Receptor Proteins 0.000 description 2
- 238000010626 work up procedure Methods 0.000 description 2
- 102000040650 (ribonucleotides)n+m Human genes 0.000 description 1
- 102100028161 ATP-binding cassette sub-family C member 2 Human genes 0.000 description 1
- 102100028163 ATP-binding cassette sub-family C member 4 Human genes 0.000 description 1
- 102100033350 ATP-dependent translocase ABCB1 Human genes 0.000 description 1
- 201000003076 Angiosarcoma Diseases 0.000 description 1
- 206010073360 Appendix cancer Diseases 0.000 description 1
- 206010003571 Astrocytoma Diseases 0.000 description 1
- 102100027205 B-cell antigen receptor complex-associated protein alpha chain Human genes 0.000 description 1
- 102100027203 B-cell antigen receptor complex-associated protein beta chain Human genes 0.000 description 1
- 208000010839 B-cell chronic lymphocytic leukemia Diseases 0.000 description 1
- 102100035080 BDNF/NT-3 growth factors receptor Human genes 0.000 description 1
- 108700010154 BRCA2 Genes Proteins 0.000 description 1
- 206010004146 Basal cell carcinoma Diseases 0.000 description 1
- 206010004593 Bile duct cancer Diseases 0.000 description 1
- 206010005003 Bladder cancer Diseases 0.000 description 1
- 102100022595 Broad substrate specificity ATP-binding cassette transporter ABCG2 Human genes 0.000 description 1
- 201000009030 Carcinoma Diseases 0.000 description 1
- 206010008342 Cervix carcinoma Diseases 0.000 description 1
- 208000005243 Chondrosarcoma Diseases 0.000 description 1
- 201000009047 Chordoma Diseases 0.000 description 1
- 208000006332 Choriocarcinoma Diseases 0.000 description 1
- VYZAMTAEIAYCRO-UHFFFAOYSA-N Chromium Chemical compound [Cr] VYZAMTAEIAYCRO-UHFFFAOYSA-N 0.000 description 1
- 208000016718 Chromosome Inversion Diseases 0.000 description 1
- 208000001333 Colorectal Neoplasms Diseases 0.000 description 1
- 208000009798 Craniopharyngioma Diseases 0.000 description 1
- 108010026925 Cytochrome P-450 CYP2C19 Proteins 0.000 description 1
- 108010000561 Cytochrome P-450 CYP2C8 Proteins 0.000 description 1
- 108010001237 Cytochrome P-450 CYP2D6 Proteins 0.000 description 1
- 102100027417 Cytochrome P450 1B1 Human genes 0.000 description 1
- 102100029363 Cytochrome P450 2C19 Human genes 0.000 description 1
- 102100029359 Cytochrome P450 2C8 Human genes 0.000 description 1
- 102100021704 Cytochrome P450 2D6 Human genes 0.000 description 1
- 102100039205 Cytochrome P450 3A4 Human genes 0.000 description 1
- 102100039208 Cytochrome P450 3A5 Human genes 0.000 description 1
- 238000007400 DNA extraction Methods 0.000 description 1
- 101100226017 Dictyostelium discoideum repD gene Proteins 0.000 description 1
- 102100022334 Dihydropyrimidine dehydrogenase [NADP(+)] Human genes 0.000 description 1
- 101150105460 ERCC2 gene Proteins 0.000 description 1
- 201000009051 Embryonal Carcinoma Diseases 0.000 description 1
- 206010014733 Endometrial cancer Diseases 0.000 description 1
- 206010014759 Endometrial neoplasm Diseases 0.000 description 1
- 102100021771 Endoplasmic reticulum mannosyl-oligosaccharide 1,2-alpha-mannosidase Human genes 0.000 description 1
- 206010014950 Eosinophilia Diseases 0.000 description 1
- 206010014967 Ependymoma Diseases 0.000 description 1
- 208000000461 Esophageal Neoplasms Diseases 0.000 description 1
- 208000032027 Essential Thrombocythemia Diseases 0.000 description 1
- 102100029951 Estrogen receptor beta Human genes 0.000 description 1
- 208000006168 Ewing Sarcoma Diseases 0.000 description 1
- 201000008808 Fibrosarcoma Diseases 0.000 description 1
- 102100035184 General transcription and DNA repair factor IIH helicase subunit XPD Human genes 0.000 description 1
- 208000034951 Genetic Translocation Diseases 0.000 description 1
- 208000032612 Glial tumor Diseases 0.000 description 1
- 206010018338 Glioma Diseases 0.000 description 1
- 102100030943 Glutathione S-transferase P Human genes 0.000 description 1
- 208000001258 Hemangiosarcoma Diseases 0.000 description 1
- 208000017604 Hodgkin disease Diseases 0.000 description 1
- 208000021519 Hodgkin lymphoma Diseases 0.000 description 1
- 208000010747 Hodgkins lymphoma Diseases 0.000 description 1
- 101000986629 Homo sapiens ATP-binding cassette sub-family C member 4 Proteins 0.000 description 1
- 101000914489 Homo sapiens B-cell antigen receptor complex-associated protein alpha chain Proteins 0.000 description 1
- 101000914491 Homo sapiens B-cell antigen receptor complex-associated protein beta chain Proteins 0.000 description 1
- 101000596896 Homo sapiens BDNF/NT-3 growth factors receptor Proteins 0.000 description 1
- 101000725164 Homo sapiens Cytochrome P450 1B1 Proteins 0.000 description 1
- 101000902632 Homo sapiens Dihydropyrimidine dehydrogenase [NADP(+)] Proteins 0.000 description 1
- 101001095815 Homo sapiens E3 ubiquitin-protein ligase RING2 Proteins 0.000 description 1
- 101000615944 Homo sapiens Endoplasmic reticulum mannosyl-oligosaccharide 1,2-alpha-mannosidase Proteins 0.000 description 1
- 101001010910 Homo sapiens Estrogen receptor beta Proteins 0.000 description 1
- 101001010139 Homo sapiens Glutathione S-transferase P Proteins 0.000 description 1
- 101001056794 Homo sapiens Inosine triphosphate pyrophosphatase Proteins 0.000 description 1
- 101000917858 Homo sapiens Low affinity immunoglobulin gamma Fc region receptor III-A Proteins 0.000 description 1
- 101001043562 Homo sapiens Low-density lipoprotein receptor-related protein 2 Proteins 0.000 description 1
- 101001039199 Homo sapiens Low-density lipoprotein receptor-related protein 6 Proteins 0.000 description 1
- 101001025967 Homo sapiens Lysine-specific demethylase 6A Proteins 0.000 description 1
- 101001057193 Homo sapiens Membrane-associated guanylate kinase, WW and PDZ domain-containing protein 1 Proteins 0.000 description 1
- 101000587058 Homo sapiens Methylenetetrahydrofolate reductase Proteins 0.000 description 1
- 101000973778 Homo sapiens NAD(P)H dehydrogenase [quinone] 1 Proteins 0.000 description 1
- 101000807596 Homo sapiens Orotidine 5'-phosphate decarboxylase Proteins 0.000 description 1
- 101000595751 Homo sapiens Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit gamma isoform Proteins 0.000 description 1
- 101000662592 Homo sapiens Poly [ADP-ribose] polymerase tankyrase-2 Proteins 0.000 description 1
- 101000834853 Homo sapiens SUZ domain-containing protein 1 Proteins 0.000 description 1
- 101000826399 Homo sapiens Sulfotransferase 1A1 Proteins 0.000 description 1
- 101000628885 Homo sapiens Suppressor of fused homolog Proteins 0.000 description 1
- 101000799388 Homo sapiens Thiopurine S-methyltransferase Proteins 0.000 description 1
- 101000809797 Homo sapiens Thymidylate synthase Proteins 0.000 description 1
- 101000740048 Homo sapiens Ubiquitin carboxyl-terminal hydrolase BAP1 Proteins 0.000 description 1
- 206010048643 Hypereosinophilic syndrome Diseases 0.000 description 1
- 201000003803 Inflammatory myofibroblastic tumor Diseases 0.000 description 1
- 206010067917 Inflammatory myofibroblastic tumour Diseases 0.000 description 1
- 102100025458 Inosine triphosphate pyrophosphatase Human genes 0.000 description 1
- 108091029795 Intergenic region Proteins 0.000 description 1
- 208000008839 Kidney Neoplasms Diseases 0.000 description 1
- 208000031671 Large B-Cell Diffuse Lymphoma Diseases 0.000 description 1
- 101000740049 Latilactobacillus curvatus Bioactive peptide 1 Proteins 0.000 description 1
- 108091026898 Leader sequence (mRNA) Proteins 0.000 description 1
- 208000018142 Leiomyosarcoma Diseases 0.000 description 1
- 102100029193 Low affinity immunoglobulin gamma Fc region receptor III-A Human genes 0.000 description 1
- 102100021922 Low-density lipoprotein receptor-related protein 2 Human genes 0.000 description 1
- 102100040704 Low-density lipoprotein receptor-related protein 6 Human genes 0.000 description 1
- 102100037462 Lysine-specific demethylase 6A Human genes 0.000 description 1
- 241000124008 Mammalia Species 0.000 description 1
- 208000025205 Mantle-Cell Lymphoma Diseases 0.000 description 1
- 208000007054 Medullary Carcinoma Diseases 0.000 description 1
- 208000000172 Medulloblastoma Diseases 0.000 description 1
- 108010047230 Member 1 Subfamily B ATP Binding Cassette Transporter Proteins 0.000 description 1
- 108010090306 Member 2 Subfamily G ATP Binding Cassette Transporter Proteins 0.000 description 1
- 102100027240 Membrane-associated guanylate kinase, WW and PDZ domain-containing protein 1 Human genes 0.000 description 1
- 206010027406 Mesothelioma Diseases 0.000 description 1
- 206010027476 Metastases Diseases 0.000 description 1
- 102100029684 Methylenetetrahydrofolate reductase Human genes 0.000 description 1
- 108010066419 Multidrug Resistance-Associated Protein 2 Proteins 0.000 description 1
- 102100022365 NAD(P)H dehydrogenase [quinone] 1 Human genes 0.000 description 1
- 206010029260 Neuroblastoma Diseases 0.000 description 1
- 108090000770 Neuropilin-2 Proteins 0.000 description 1
- 108091092724 Noncoding DNA Proteins 0.000 description 1
- 206010030155 Oesophageal carcinoma Diseases 0.000 description 1
- 201000010133 Oligodendroglioma Diseases 0.000 description 1
- 108091034117 Oligonucleotide Proteins 0.000 description 1
- 102100037214 Orotidine 5'-phosphate decarboxylase Human genes 0.000 description 1
- 206010033128 Ovarian cancer Diseases 0.000 description 1
- 206010061535 Ovarian neoplasm Diseases 0.000 description 1
- 206010061902 Pancreatic neoplasm Diseases 0.000 description 1
- 102000007497 Patched-2 Receptor Human genes 0.000 description 1
- 108010071083 Patched-2 Receptor Proteins 0.000 description 1
- 102100036052 Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit gamma isoform Human genes 0.000 description 1
- ZYFVNVRFVHJEIU-UHFFFAOYSA-N PicoGreen Chemical compound CN(C)CCCN(CCCN(C)C)C1=CC(=CC2=[N+](C3=CC=CC=C3S2)C)C2=CC=CC=C2N1C1=CC=CC=C1 ZYFVNVRFVHJEIU-UHFFFAOYSA-N 0.000 description 1
- 101150063858 Pik3ca gene Proteins 0.000 description 1
- 208000007641 Pinealoma Diseases 0.000 description 1
- 102100037477 Poly [ADP-ribose] polymerase tankyrase-2 Human genes 0.000 description 1
- 102100034433 Protein kinase C-binding protein NELL2 Human genes 0.000 description 1
- 102100029753 Reduced folate transporter Human genes 0.000 description 1
- 206010038389 Renal cancer Diseases 0.000 description 1
- 208000006265 Renal cell carcinoma Diseases 0.000 description 1
- 108091006778 SLC19A1 Proteins 0.000 description 1
- 108091006735 SLC22A2 Proteins 0.000 description 1
- 108091006730 SLCO1B3 Proteins 0.000 description 1
- 102100026877 SUZ domain-containing protein 1 Human genes 0.000 description 1
- 208000004337 Salivary Gland Neoplasms Diseases 0.000 description 1
- 206010061934 Salivary gland cancer Diseases 0.000 description 1
- 206010039491 Sarcoma Diseases 0.000 description 1
- 201000010208 Seminoma Diseases 0.000 description 1
- 208000021712 Soft tissue sarcoma Diseases 0.000 description 1
- 102100032417 Solute carrier family 22 member 2 Human genes 0.000 description 1
- 102100027239 Solute carrier organic anion transporter family member 1B3 Human genes 0.000 description 1
- 102100023986 Sulfotransferase 1A1 Human genes 0.000 description 1
- 102100032891 Superoxide dismutase [Mn], mitochondrial Human genes 0.000 description 1
- 102100026939 Suppressor of fused homolog Human genes 0.000 description 1
- 201000008736 Systemic mastocytosis Diseases 0.000 description 1
- 101150080074 TP53 gene Proteins 0.000 description 1
- 208000024313 Testicular Neoplasms Diseases 0.000 description 1
- 206010057644 Testis cancer Diseases 0.000 description 1
- 102100034162 Thiopurine S-methyltransferase Human genes 0.000 description 1
- 108091036066 Three prime untranslated region Proteins 0.000 description 1
- 102100038618 Thymidylate synthase Human genes 0.000 description 1
- 208000024770 Thyroid neoplasm Diseases 0.000 description 1
- 108010047933 Tumor Necrosis Factor alpha-Induced Protein 3 Proteins 0.000 description 1
- 108700025716 Tumor Suppressor Genes Proteins 0.000 description 1
- 102000044209 Tumor Suppressor Genes Human genes 0.000 description 1
- 102100024596 Tumor necrosis factor alpha-induced protein 3 Human genes 0.000 description 1
- 102100029152 UDP-glucuronosyltransferase 1A1 Human genes 0.000 description 1
- 101710205316 UDP-glucuronosyltransferase 1A1 Proteins 0.000 description 1
- 208000007097 Urinary Bladder Neoplasms Diseases 0.000 description 1
- 208000006105 Uterine Cervical Neoplasms Diseases 0.000 description 1
- 208000002495 Uterine Neoplasms Diseases 0.000 description 1
- 108010053099 Vascular Endothelial Growth Factor Receptor-2 Proteins 0.000 description 1
- 208000014070 Vestibular schwannoma Diseases 0.000 description 1
- 208000008383 Wilms tumor Diseases 0.000 description 1
- 108700031763 Xeroderma Pigmentosum Group D Proteins 0.000 description 1
- JLCPHMBAVCMARE-UHFFFAOYSA-N [3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[5-(2-amino-6-oxo-1H-purin-9-yl)-3-[[3-[[3-[[3-[[3-[[3-[[5-(2-amino-6-oxo-1H-purin-9-yl)-3-[[5-(2-amino-6-oxo-1H-purin-9-yl)-3-hydroxyoxolan-2-yl]methoxy-hydroxyphosphoryl]oxyoxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxyoxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methyl [5-(6-aminopurin-9-yl)-2-(hydroxymethyl)oxolan-3-yl] hydrogen phosphate Polymers 
Cc1cn(C2CC(OP(O)(=O)OCC3OC(CC3OP(O)(=O)OCC3OC(CC3O)n3cnc4c3nc(N)[nH]c4=O)n3cnc4c3nc(N)[nH]c4=O)C(COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3CO)n3cnc4c(N)ncnc34)n3ccc(N)nc3=O)n3cnc4c(N)ncnc34)n3ccc(N)nc3=O)n3ccc(N)nc3=O)n3ccc(N)nc3=O)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)n3cc(C)c(=O)[nH]c3=O)n3cc(C)c(=O)[nH]c3=O)n3ccc(N)nc3=O)n3cc(C)c(=O)[nH]c3=O)n3cnc4c3nc(N)[nH]c4=O)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)O2)c(=O)[nH]c1=O JLCPHMBAVCMARE-UHFFFAOYSA-N 0.000 description 1
- 208000004064 acoustic neuroma Diseases 0.000 description 1
- 208000017733 acquired polycythemia vera Diseases 0.000 description 1
- 201000005188 adrenal gland cancer Diseases 0.000 description 1
- 208000024447 adrenal gland neoplasm Diseases 0.000 description 1
- 210000004381 amniotic fluid Anatomy 0.000 description 1
- 208000036878 aneuploidy Diseases 0.000 description 1
- 231100001075 aneuploidy Toxicity 0.000 description 1
- 239000003242 anti bacterial agent Substances 0.000 description 1
- 229940088710 antibiotic agent Drugs 0.000 description 1
- 239000003146 anticoagulant agent Substances 0.000 description 1
- 229940127219 anticoagulant drug Drugs 0.000 description 1
- 208000021780 appendiceal neoplasm Diseases 0.000 description 1
- 210000003567 ascitic fluid Anatomy 0.000 description 1
- 210000003719 b-lymphocyte Anatomy 0.000 description 1
- 230000037429 base substitution Effects 0.000 description 1
- 239000011324 bead Substances 0.000 description 1
- 201000007180 bile duct carcinoma Diseases 0.000 description 1
- 201000009036 biliary tract cancer Diseases 0.000 description 1
- 208000020790 biliary tract neoplasm Diseases 0.000 description 1
- 239000000090 biomarker Substances 0.000 description 1
- 238000001574 biopsy Methods 0.000 description 1
- 201000001531 bladder carcinoma Diseases 0.000 description 1
- 210000001124 body fluid Anatomy 0.000 description 1
- 210000000988 bone and bone Anatomy 0.000 description 1
- 210000004556 brain Anatomy 0.000 description 1
- 201000000220 brain stem cancer Diseases 0.000 description 1
- 208000003362 bronchogenic carcinoma Diseases 0.000 description 1
- 201000005200 bronchus cancer Diseases 0.000 description 1
- JJWKPURADFRFRB-UHFFFAOYSA-N carbonyl sulfide Chemical compound O=C=S JJWKPURADFRFRB-UHFFFAOYSA-N 0.000 description 1
- 208000002458 carcinoid tumor Diseases 0.000 description 1
- 201000007455 central nervous system cancer Diseases 0.000 description 1
- 210000001175 cerebrospinal fluid Anatomy 0.000 description 1
- 201000010881 cervical cancer Diseases 0.000 description 1
- 230000002759 chromosomal effect Effects 0.000 description 1
- 208000021668 chronic eosinophilic leukemia Diseases 0.000 description 1
- 210000003040 circulating cell Anatomy 0.000 description 1
- 238000010224 classification analysis Methods 0.000 description 1
- 150000001875 compounds Chemical class 0.000 description 1
- 239000000470 constituent Substances 0.000 description 1
- 238000010276 construction Methods 0.000 description 1
- 230000003247 decreasing effect Effects 0.000 description 1
- 238000001514 detection method Methods 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 230000018109 developmental process Effects 0.000 description 1
- 206010012818 diffuse large B-cell lymphoma Diseases 0.000 description 1
- 210000001840 diploid cell Anatomy 0.000 description 1
- 201000010099 disease Diseases 0.000 description 1
- 229940079593 drug Drugs 0.000 description 1
- 239000000890 drug combination Substances 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 239000003623 enhancer Substances 0.000 description 1
- 208000037828 epithelial carcinoma Diseases 0.000 description 1
- 201000004101 esophageal cancer Diseases 0.000 description 1
- 238000002474 experimental method Methods 0.000 description 1
- 210000003722 extracellular fluid Anatomy 0.000 description 1
- 238000000605 extraction Methods 0.000 description 1
- 239000000834 fixative Substances 0.000 description 1
- 201000003444 follicular lymphoma Diseases 0.000 description 1
- 238000009472 formulation Methods 0.000 description 1
- 238000012260 full gene deletion Methods 0.000 description 1
- 230000004927 fusion Effects 0.000 description 1
- 201000011243 gastrointestinal stromal tumor Diseases 0.000 description 1
- 230000004545 gene duplication Effects 0.000 description 1
- 230000014509 gene expression Effects 0.000 description 1
- 230000004077 genetic alteration Effects 0.000 description 1
- 238000010448 genetic screening Methods 0.000 description 1
- 230000007614 genetic variation Effects 0.000 description 1
- 201000010536 head and neck cancer Diseases 0.000 description 1
- 208000014829 head and neck neoplasm Diseases 0.000 description 1
- 201000002222 hemangioblastoma Diseases 0.000 description 1
- 230000002489 hematologic effect Effects 0.000 description 1
- 231100000844 hepatocellular carcinoma Toxicity 0.000 description 1
- 238000009396 hybridization Methods 0.000 description 1
- 238000007901 in situ hybridization Methods 0.000 description 1
- 201000010982 kidney cancer Diseases 0.000 description 1
- 206010024627 liposarcoma Diseases 0.000 description 1
- 210000004185 liver Anatomy 0.000 description 1
- 201000007270 liver cancer Diseases 0.000 description 1
- 208000014018 liver neoplasm Diseases 0.000 description 1
- 210000001165 lymph node Anatomy 0.000 description 1
- 208000037829 lymphangioendotheliosarcoma Diseases 0.000 description 1
- 208000012804 lymphangiosarcoma Diseases 0.000 description 1
- 208000015486 malignant pancreatic neoplasm Diseases 0.000 description 1
- 239000011159 matrix material Substances 0.000 description 1
- 208000023356 medullary thyroid gland carcinoma Diseases 0.000 description 1
- 201000001441 melanoma Diseases 0.000 description 1
- 206010027191 meningioma Diseases 0.000 description 1
- 230000004060 metabolic process Effects 0.000 description 1
- 230000009401 metastasis Effects 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 210000000214 mouth Anatomy 0.000 description 1
- 238000000491 multivariate analysis Methods 0.000 description 1
- 206010028537 myelofibrosis Diseases 0.000 description 1
- 201000000050 myeloid neoplasm Diseases 0.000 description 1
- 208000001611 myxosarcoma Diseases 0.000 description 1
- 201000002120 neuroendocrine carcinoma Diseases 0.000 description 1
- 235000015097 nutrients Nutrition 0.000 description 1
- 230000005969 oncogenic driver mutation Effects 0.000 description 1
- 210000000056 organ Anatomy 0.000 description 1
- 108700025694 p53 Genes Proteins 0.000 description 1
- 201000002528 pancreatic cancer Diseases 0.000 description 1
- 208000008443 pancreatic carcinoma Diseases 0.000 description 1
- 208000004019 papillary adenocarcinoma Diseases 0.000 description 1
- 201000010198 papillary carcinoma Diseases 0.000 description 1
- 239000013610 patient sample Substances 0.000 description 1
- 208000029255 peripheral nervous system cancer Diseases 0.000 description 1
- 210000003800 pharynx Anatomy 0.000 description 1
- 238000000053 physical method Methods 0.000 description 1
- 208000024724 pineal body neoplasm Diseases 0.000 description 1
- 201000004123 pineal gland cancer Diseases 0.000 description 1
- 208000037244 polycythemia vera Diseases 0.000 description 1
- 230000035935 pregnancy Effects 0.000 description 1
- 239000003755 preservative agent Substances 0.000 description 1
- 208000003476 primary myelofibrosis Diseases 0.000 description 1
- 238000007639 printing Methods 0.000 description 1
Classifications
-
- G06F19/3437—
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/10—Ploidy or copy number detection
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/20—Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
Definitions
- the disclosure relates to the analysis of genetic variants.
- cancer tissues are characterized by genetic lesions which are, at least in part, responsible for the occurrence or phenotype of the disorder.
- Many cancers are characterized by one or more genetic aberrations, including gene copy number changes, somatic and germline mutations. The need still exists for analyzing genetic variants associated with cancer.
- the disclosure features, inter alia, methods and systems for analyzing or characterizing variants in a tumor, e.g., generating a characterization model for a variant (e.g., a mutation) in a tissue (e.g., a tumor or tumor sample) from a subject (e.g., a human subject, e.g., a cancer patient).
- the methods described herein can be used in genomic testing that includes variants, e.g., novel variants, whose somatic status is unknown or unclear.
- the characterization can include assessment or indication of zygosity and/or variant type, e.g., as somatic or germline.
- the assessment has numerous uses including: obtaining an understanding of the genetic lesions in a cancer; selecting a treatment modality, e.g., in response to the analysis; staging, diagnosing, or prognosing a subject, e.g., in response to the analysis; developing novel therapeutic agents; the discovery and use of existing therapeutic agents for disorders not previously treated with that therapeutic agent; selection of subjects for experimental trials; understanding mechanisms of tumor characteristics, e.g., tumor metabolism, growth, invasiveness, resistance or susceptibility to therapy; selection or discovery of treatment regimes, e.g., drug combinations, e.g., for simultaneous use or for sequential use, e.g., as early or subsequent line of treatment; and assembling databases of tumor characteristics.
- the systems and methods disclosed herein are also useful for developing compositions, assays, kits, devices, systems, and methods for treating cancer.
- the disclosure provides a system for generating a characterization model (including, e.g., variant type and/or zygosity) for a variant (e.g., a mutation) in a tissue or sample, e.g., a tumor or tumor sample, from a subject, e.g., a human subject, e.g., a cancer patient.
- the system is configured such that the analysis can be performed without the need for analyzing non-tumor tissue from the subject.
- the analysis is performed without analyzing non-tumor tissue from the subject, e.g., non-tumor tissue from the same subject is not sequenced.
- the system is configured to determine for at least one of the tumor sample, the selected subgenomic intervals, and the selected germline SNPs that the variant type, e.g., mutation type, cannot be determined for analyzed values.
- At least one processor when executing acquires the SCI calculated as a function (e.g., the log of the ratio) of the number of reads for a subgenomic interval and the number of reads for a control (e.g., a process-matched control).
- At least one processor when executing is configured to calculate SCI as a function (e.g., the log of the ratio) of the number of reads for a subgenomic interval and the number of reads for a control (e.g., a process-matched control).
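A minimal sketch of the log-ratio form of the SCI described above, assuming per-interval read counts are already available for the sample and for a process-matched control. The function name, the pseudocount, and the scaling by total read count are illustrative assumptions, not details taken from the disclosure.

```python
import math


def sci_log_ratios(sample_reads, control_reads, pseudocount=0.5):
    """Return one log2 ratio per subgenomic interval.

    sample_reads / control_reads: read counts per interval, in the same order.
    pseudocount: assumed guard against zero counts (not from the disclosure).
    """
    if len(sample_reads) != len(control_reads):
        raise ValueError("sample and control must cover the same intervals")
    sample_total = sum(sample_reads)
    control_total = sum(control_reads)
    if sample_total <= 0 or control_total <= 0:
        raise ValueError("both libraries need at least one read")
    ratios = []
    for s, c in zip(sample_reads, control_reads):
        # Assumed normalization: divide by library size so that overall
        # sequencing depth differences cancel out of the ratio.
        s_norm = (s + pseudocount) / sample_total
        c_norm = (c + pseudocount) / control_total
        ratios.append(math.log2(s_norm / c_norm))
    return ratios
```

With this normalization, an interval covered at the same relative depth in both libraries gets a value of 0, and values above or below 0 suggest relative gain or loss.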
- the at least one processor when executing is configured to validate a minimum number of subgenomic intervals have been selected or analyzed.
- the at least one processor when executing is configured to acquire the SCI from values calculated against at least 10, 25, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 1,000, 2,000, 3,000, or 4,000, subgenomic intervals (e.g., exons).
- the at least one processor when executing is configured to calculate the SCI against at least 10, 25, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 1,000, 2,000, 3,000, or 4,000, subgenomic intervals (e.g., exons).
- the SCI comprises a plurality of respective values (e.g., log r values) for a plurality of subgenomic intervals (e.g., exons) from at least 10, 25, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 1,000, 2,000, 3,000, or 4,000, genes.
- At least one, a plurality, or substantially all of the values comprised in the SCI are corrected for correlation with GC content.
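The passage above does not say how the GC correction is performed. As one hedged illustration, the sketch below removes a straight-line trend of log-ratio against GC fraction using ordinary least squares; the linear model and the name `gc_correct` are assumptions, and smoother fits (e.g., LOESS) are a common alternative.

```python
def gc_correct(log_ratios, gc_fractions):
    """Subtract a fitted linear GC trend from per-interval log ratios.

    log_ratios: SCI values, one per subgenomic interval.
    gc_fractions: GC content (0..1) of the same intervals, same order.
    """
    n = len(log_ratios)
    if n != len(gc_fractions) or n < 2:
        raise ValueError("need matching inputs with at least two intervals")
    mean_gc = sum(gc_fractions) / n
    mean_lr = sum(log_ratios) / n
    sxx = sum((g - mean_gc) ** 2 for g in gc_fractions)
    if sxx == 0:
        # All intervals share one GC value, so there is no trend to remove.
        return list(log_ratios)
    sxy = sum((g - mean_gc) * (r - mean_lr)
              for g, r in zip(gc_fractions, log_ratios))
    slope = sxy / sxx
    # Remove only the GC-dependent part; the overall mean level is preserved.
    return [r - slope * (g - mean_gc)
            for r, g in zip(log_ratios, gc_fractions)]
```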
- At least one processor when executing is configured to validate a minimum number of a plurality of germline SNPs have been selected or analyzed.
- the minimum number of germline SNPs comprises at least 10, 25, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 1,000, 2,000, 3,000, 4,000, 5000, 6000, 7000, 8000, 9000, 10,000, or 15,000 germline SNPs.
- the SAFI is based, at least in part, on a minor allele frequency in the tumor sample.
- the at least one processor when executing is configured to calculate, or acquire, SAFI based, at least in part, on a minor allele frequency in the tumor sample.
- the SAFI is based, at least in part, on an alternative allele frequency (e.g., an allele frequency other than a standard allele in a human genome reference database).
- the at least one processor when executing is configured to calculate, or acquire, SAFI based, at least in part, on an alternative allele frequency (e.g., an allele frequency other than a standard allele in a human genome reference database).
- the at least one processor when executing is configured to access values of C, M, and p calculated from fitting a genome-wide copy number model to the SCI and the SAFI.
- the at least one processor when executing is configured to calculate C, M, and p.
- the at least one processor when executing generates a best fit between the genome-wide copy number model and the SCI and the SAFI to calculate C, M, and p.
- values of C, M, and p fit a plurality of genome-wide copy number model inputs of the SCI and the SAFI.
- the at least one processor when executing is configured to access or calculate one or more genome-wide copy number models.
- the at least one processor when executing is configured to determine a confidence value for each of the plurality of genome-wide copy number models based on a determined fit to the SCI and the SAFI.
- the at least one processor when executing is configured to calculate C, M, and p, responsive to contributions from each of the plurality of genome-wide copy models.
- the contributions are determined according to a confidence level for each of the plurality of genome-wide copy models (including, e.g., confidence levels reflective of a degree of fit).
- a genomic segment comprises a plurality of subgenomic intervals, e.g., exons, e.g., subgenomic intervals which have been assigned a SCI value.
- the system is configured to calculate and/or assign SCI values to a plurality of subgenomic intervals.
- the at least one processor when executing is configured to require a minimum number of subgenomic intervals for analysis of a genomic segment.
- a genomic segment comprises at least 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 125, 150, 175, 200, 225, 250, 275, 300, 400, or 500 subgenomic intervals, e.g., exons.
- a genomic segment comprises 10 to 1,000, 20 to 900, 30 to 700, 40 to 600, 50 to 500, 60 to 400, 70 to 300, 80 to 200, 80 to 150, 80 to 120, 90 to 110, or about 100, subgenomic intervals (e.g., exons).
- a genomic segment comprises between 100 and 10,000, 100 and 5,000, 100 and 4,000, 100 and 3,000, 100 and 2,000, or 100 and 1,000, subgenomic intervals (e.g., exons).
- a genomic segment comprises 10 to 1,000, 20 to 900, 30 to 700, 40 to 600, 50 to 500, 60 to 400, 70 to 300, 80 to 200, 80 to 150, 80 to 120, 90 to 110, or about 100 genomic SNPs, which have been assigned a SAFI value.
- a genomic segment comprises between 100 and 10,000, 100 and 5,000, 100 and 4,000, 100 and 3,000, 100 and 2,000, or 100 and 1,000, genomic SNPs which have been assigned a SAFI value.
- the at least one processor when executing is configured to validate each of a plurality of genomic segments with values having one or both of:
- the at least one processor when executing is configured to require the number of subgenomic intervals (e.g., exons) that are contained in, or are combined to form, a genomic segment is at least 2, 5, 10, 15, 20, 50, or 100 times the number of genomic segments.
- the at least one processor when executing is configured to require the number of subgenomic intervals, e.g., exons, is at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 times the number of genomic segments.
- the at least one processor when executing is configured to define a boundary for a genomic segment (e.g., automatically define boundary, accept user input on the boundary, generate relative boundary from user provided inputs, display a user interface for defining genomic segment boundary, display suggested boundary, etc.).
- the at least one processor when executing is configured to assemble sequences of subgenomic intervals (e.g., exons) into genetic segments (including, e.g., user identified subgenomic intervals, system identified subgenomic intervals, candidate subgenomic intervals, user confirmed candidate subgenomic intervals).
- the at least one processor when executing is configured to segment a genomic sequence into subgenomic intervals of equal copy number (e.g., according to circular binary segmentation (CBS) algorithms, an HMM based method, a Wavelet based method, or a Cluster along Chromosomes method).
- the at least one processor when executing is configured to assemble subgenomic intervals into genomic segments of equal copy number (e.g., according to circular binary segmentation (CBS) algorithms, an HMM based method, a Wavelet based method, or a Cluster along Chromosomes method).
- the at least one processor when executing is configured to assemble sequences for subgenomic intervals according to a method described herein (e.g., circular binary segmentation (CBS), an HMM based method, a Wavelet based method, or a Cluster along Chromosomes method).
- the at least one processor when executing is configured to fit the genome-wide copy number model to the SCI according to calculation of:
- the at least one processor when executing is configured to fit the genome-wide copy number model to the SAFI according to calculation of:
- the at least one processor when executing is configured to fit the genome-wide copy number model according to execution of Gibbs sampling.
- the at least one processor when executing is configured to fit the genome-wide copy number model by determining a best fit model from a fitting algorithm (e.g., a Markov chain Monte Carlo (MCMC) algorithm, e.g., ASCAT (Allele-Specific Copy Number Analysis of Tumors), OncoSNP, or PICNIC (Predicting Integral Copy Numbers In Cancer)).
- the fitting comprises using Metropolis-Hastings MCMC.
- the fitting comprises using a non-Bayesian approach (e.g., a frequentist approach, e.g., using least squares fitting).
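As a hedged illustration of the least squares option, the sketch below fits only the purity p by grid search, taking C and M for each genomic segment as already known. This is a simplification: the disclosure describes fitting C, M, and p together. The expected frequency uses the relation AF = (pM + g(1 - p)) / (pC + 2(1 - p)) from this section with g = 1, i.e., for germline SNPs. Function names and the grid resolution are assumptions.

```python
def expected_germline_af(p, c, m):
    """Expected allele frequency of a germline SNP (g = 1) at purity p."""
    return (p * m + (1 - p)) / (p * c + 2 * (1 - p))


def fit_purity_least_squares(segments, grid_steps=100):
    """Grid-search the purity that minimizes the summed squared error.

    segments: list of (C, M, observed_af) tuples, one per genomic segment.
    Returns the best purity on the grid 1/grid_steps .. 1, or None if no
    grid point is usable.
    """
    best_p, best_sse = None, float("inf")
    for k in range(1, grid_steps + 1):
        p = k / grid_steps
        # Skip purities where the model has no DNA in a segment (C = 0, p = 1).
        if any(p * c + 2 * (1 - p) <= 0 for c, _, _ in segments):
            continue
        sse = sum((af - expected_germline_af(p, c, m)) ** 2
                  for c, m, af in segments)
        if sse < best_sse:
            best_p, best_sse = p, sse
    return best_p
```

Segments with C = 2 and M = 1 carry no purity information under this relation, since the expected frequency is 0.5 at every p; the fit depends on segments with allelic imbalance.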
- the at least one processor when executing is configured to determine g by calculating a fit of values for VAFI, p, C, and M to a model for somatic/germline status.
- the at least one processor when executing is configured to determine g by solving for g in
- $AF = \dfrac{pM + g(1 - p)}{pC + 2(1 - p)}$.
- the at least one processor when executing is configured to classify a type of variant responsive to a calculated value of g.
- the at least one processor when executing is configured to classify the type of variant based on at least one of:
- when the g value is approximately equal to 1 (e.g., within a predetermined distance from 1), or higher, classifying the variant as a germline variant
- the at least one processor when executing is configured to define an indistinguishable range of values for g responsive to local evaluation of the genomic segment calculations.
- the at least one processor when executing is configured to define the indistinguishable range of values based on a confidence level associated with calculated values, wherein the greater the confidence level the smaller the range of values of g defining the indistinguishable range, and wherein the smaller the confidence level the greater the range of values of g defining the indistinguishable range of values.
- the at least one processor when executing is configured to classify a zygosity of the variant responsive to a calculated value indicating heterozygosity.
- the at least one processor when executing is configured to determine the sample purity (p) as a global purity value (e.g., is the same for all genomic segments).
- the at least one processor when executing is configured to determine the value of g according to:
- the at least one processor when executing is configured to determine that a g value is approximately equal to 1 and classify the variant as a germline polymorphism.
- the at least one processor when executing is configured to determine that a g value is approximately equal to 1 (e.g., greater than 0.6) and classify the variant as a germline polymorphism.
- the at least one processor when executing is configured to determine that a g value is approximately equal to 0 (e.g., less than 0.4) and classify the variant as a somatic mutation.
- the at least one processor when executing is configured to determine that a g value is approximately equal to a classification value (e.g., g is approximately 1 or 0) responsive to a degree of statistical confidence in the calculations.
- the at least one processor when executing is configured to determine that a g value is significantly less than 0, and classify the variant as a subclonal somatic variant.
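The classification rules above can be sketched by rearranging the relation AF = (pM + g(1 - p)) / (pC + 2(1 - p)) to isolate g. The 0.4 and 0.6 cutoffs are the example values given in the text; the cutoff for "significantly less than 0" is not specified there, so `subclonal_cutoff` is a hypothetical placeholder, as are the function names.

```python
def solve_g(af, p, c, m):
    """Solve AF = (p*M + g*(1 - p)) / (p*C + 2*(1 - p)) for g."""
    if not 0 <= p < 1:
        # At p = 1 the g term vanishes and g cannot be recovered.
        raise ValueError("purity must be in [0, 1) to solve for g")
    return (af * (p * c + 2 * (1 - p)) - p * m) / (1 - p)


def classify_variant(g, subclonal_cutoff=-0.2):
    """Map a g value to a call using the example thresholds in the text.

    subclonal_cutoff is an assumed stand-in for 'significantly less than 0'.
    """
    if g < subclonal_cutoff:
        return "subclonal somatic"
    if g < 0.4:
        return "somatic"
    if g > 0.6:
        return "germline"
    return "indistinguishable"
```

For example, at p = 0.3 with C = 2 and M = 1, an observed allele frequency of 0.5 gives g close to 1 (germline), while 0.15 gives g close to 0 (somatic).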
- the at least one processor when executing is configured to determine the value of g according to:
- AF is the allele frequency.
- the somatic/germline status is determined when the sample purity is below, for example, about 40% (e.g., between about 10% and 30% (e.g., between about 10% and 20% or between about 20% and 30%)).
- the at least one processor when executing is configured to validate purity values.
- the at least one processor when executing is configured to define a confidence level for a calculation based on the sample purity value.
- the at least one processor when executing is configured to increase a confidence level for a determination of variant type based on a low purity (e.g., 10-30%), and/or decrease a confidence level for a determination of zygosity based on a low purity (e.g., 10-30%), and/or increase a confidence level for a determination of zygosity based on a high purity (e.g., >90%), and/or decrease a confidence level for a determination of variant type based on a high purity (e.g., >90%).
- the at least one processor when executing is configured to classify the variant according to:
- the at least one processor when executing is configured to determine an indication of zygosity for said variant (e.g., mutation).
- the at least one processor when executing is configured to determine the indication of zygosity for said variant is heterozygous when 0 < M < C.
- the at least one processor when executing is configured to require the sample purity is greater than about 80%, e.g., between about 90% and 100%, e.g., between about 90% and 95%, or between about 95% and 100%, when determining the zygosity.
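A small sketch of the zygosity rule above: a variant is called heterozygous when 0 < M < C, and the call is only attempted when purity exceeds a threshold (about 80% in the text). The function name, the return labels, and the treatment of every other case as simply "not heterozygous" are assumptions for illustration.

```python
def zygosity_call(m, c, purity, min_purity=0.8):
    """Return a zygosity label for a variant in a segment.

    m: copy number of the variant allele; c: total copy number of the segment.
    min_purity: purity above which zygosity is assessed (about 80% per text).
    """
    if purity <= min_purity:
        # Below the purity requirement the call is withheld.
        return "not determined"
    return "heterozygous" if 0 < m < c else "not heterozygous"
```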
- the at least one processor when executing is configured to process-match control values using values obtained where the control is a sample of euploid (e.g., diploid) tissue from a subject other than the subject from which the tumor sample is obtained, or a sample of mixed euploid (e.g., diploid) tissues from one or more (e.g., at least 2, 3, 4, or 5) subjects other than the subject from which the tumor sample is obtained.
- the at least one processor when executing is configured to sequence each of the selected subgenomic intervals and each of the selected germline SNPs, e.g., by next generation sequencing (NGS).
- the at least one processor when executing is configured to determine that the sequence coverage prior to normalization is at least about 10×, 20×, 30×, 50×, 100×, 250×, 500×, 750×, or 1000× the depth of the sequencing.
- the subject has received an anti-cancer therapy.
- the subject has received an anti-cancer therapy and is resistant to the therapy or exhibits disease progression.
- the subject has received an anti-cancer therapy which is selected from: a therapeutic agent that has been approved by the FDA, EMEA, or other regulatory agency; or a therapeutic agent that has not been approved by the FDA, EMEA, or other regulatory agency.
- the subject has received an anti-cancer therapy in the course of a clinical trial, e.g., a Phase I, Phase II, or Phase III clinical trial (or in an ex-US equivalent of such a trial).
- the variant is positively associated with the type of tumor present in the subject, e.g., with occurrence of, or resistance to treatment.
- the variant is not positively associated with the type of tumor present in the subject.
- the variant is positively associated with a tumor other than the type of tumor present in the subject.
- the variant is a variant that is not positively associated with the type of tumor present in the subject.
- the system is configured to memorialize, e.g., in a database, e.g., a machine readable database, provide a report containing, or transmit, a descriptor for one or more of: the presence, absence, or frequency, of other mutations in the tumor, e.g., other mutations associated with the tumor type in the sample, other mutations not associated with the tumor type in the sample, or other mutations associated with a tumor other than the tumor type in the sample; the characterization of the variant; the allele or gene; or the tumor type, e.g., the name of the type of tumor, whether the tumor is primary or secondary; a subject characteristic; or therapeutic alternatives, recommendations, or choices.
- a descriptor relating to the characterization of the variant comprises a descriptor for zygosity or germline vs. somatic status.
- a descriptor relating to a subject characteristic comprises a descriptor for one or more of: the subject's identity; one or more of the subject's age, gender, weight, occupation, or other similar characteristic; the subject's medical history, e.g., occurrence of the tumor or of other disorders; the subject's family medical history, e.g., relatives who share or do not share the variant; or the subject's prior treatment history, e.g., the treatment received, response to a previously administered anti-cancer therapy, e.g., disease resistance, responsiveness, or progression.
- the system is in communication with a system that provides one or more of: sequencing data, e.g., raw sequencing data; or sequence analysis.
- the system can further provide one or more of: sequencing data, e.g., raw sequencing data; or sequence analysis.
- the at least one processor when executing is configured to generate a user interface.
- the user interface is configured to accept as input any one or more of:
- responsive to the user interface input, e.g., for one or more (e.g., 2, 3, 4, 5 or all) of SCI, SAFI, VAFI, C, M, or p, the system generates a characterization model, e.g., a characterization model for a variant as described herein.
- the user interface is configured to display subgenomic intervals or a value calculated therefrom.
- the user interface is configured to accept user input selecting a plurality of subgenomic intervals on which to evaluate the tumor sample from the subject.
- the user interface is configured to display germline SNPs for the tumor sample.
- the user interface is configured to accept user input selecting a plurality of germline SNPs on which to evaluate the tumor sample.
- the user interface is configured to accept a user-defined confidence level for calculated values (e.g., a calculated value described above).
- the user interface is configured to accept user input to define a boundary for a genomic segment.
- the user interface is configured to display a system generated genomic segment boundary for acceptance or modification by a user.
- the disclosure features a method of characterizing a variant, e.g., a mutation, in a tissue or sample, e.g., a tumor or tumor sample, from a subject, e.g., a human, e.g., a cancer patient, comprising:
- the analysis can be performed without the need for analyzing non-tumor tissue from the subject.
- the analysis is performed without analyzing non-tumor tissue from the subject, e.g., non-tumor tissue from the same subject is not sequenced.
- the SCI comprises values that are a function, e.g., the log of the ratio, of the number of reads for a subgenomic interval, e.g., from the sample, and the number of reads for a control, e.g., a process-matched control.
- the SCI comprises values, e.g., log r values, for at least 10, 25, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 1,000, 2,000, 3,000, 4,000, 5,000, 6,000, 7,000, 8,000, 9,000, or 10,000, subgenomic intervals, e.g., exons.
- the SCI comprises values, e.g., log r values, for at least 100 subgenomic intervals, e.g., exons.
- the SCI comprises values, e.g., log r values, for 1,000 to 10,000, 2,000 to 9,000, 3,000 to 8,000, 3,000 to 7,000, 3,000 to 6,000, or 4,000 to 5,000, subgenomic intervals, e.g., exons.
- the SCI comprises values, e.g., log r values, for subgenomic intervals, e.g., exons, from at least 10, 25, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 1,000, 2,000, 3,000, or 4,000, genes.
- At least one, a plurality, or substantially all of the values comprised in the SCI are corrected for correlation with GC content.
- a subgenomic interval, e.g., an exon, from the sample has at least 10, 20, 30, 40, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 600, 700, 800, 900, or 1,000 reads.
- each of a plurality of, e.g., at least 10, 25, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 1,000, 2,000, 3,000, 4,000, 5,000, 6,000, 7,000, 8,000, 9,000, or 10,000, subgenomic intervals, e.g., exons, from the sample has a predetermined number of reads.
- the predetermined number of reads is at least 10, 20, 30, 40, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 600, 700, 800, 900, or 1,000.
- the plurality of germline SNPs comprise at least 10, 25, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 1,000, 2,000, 3,000, 4,000, 5000, 6000, 7000, 8000, 9000, 10,000, or 15,000 germline SNPs.
- the plurality of germline SNPs comprise at least 100 germline SNPs.
- the plurality of germline SNPs comprises 500 to 5,000, 1,000 to 4,000, or 2,000 to 3,000 germline SNPs.
- the allele frequency is a minor allele frequency.
- the allele frequency is an alternative allele frequency, e.g., the frequency of an allele other than a standard allele in a human genome reference database.
- the method comprises characterizing a plurality of variants, e.g., mutants, in the tumor sample.
- the method comprises characterizing at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 25, 50, 100, 150, 200, 250, 300, 350, 400, 450, or 500 variants, e.g., mutants.
- the method comprises characterizing variants, e.g., mutants, in at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 25, 50, 100, 150, 200, 250, 300, 350, 400, 450, or 500 different genes.
- the method comprises acquiring a VAFI for at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 25, 50, 100, 150, 200, 250, 300, 350, 400, 450, or 500 variants, e.g., mutants.
- the method comprises performing one, two or all, of steps a), b), and c) for at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 25, 50, 100, 150, 200, 250, 300, 350, 400, 450, or 500 variants, e.g., mutants.
- values of C, M, and p are, have been, or can be obtained by fitting a genome-wide copy number model to one or both of the SCI and the SAFI.
- values of C, M, and p fit a plurality of genome-wide copy number model inputs of the SCI and the SAFI.
- a genomic segment comprises a plurality of subgenomic intervals, e.g., exons, e.g., subgenomic intervals which have been assigned a SCI value.
- a genomic segment comprises at least 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 125, 150, 175, 200, 225, 250, 275, 300, 400, or 500 subgenomic intervals, e.g., exons.
- a genomic segment comprises 10 to 1,000, 20 to 900, 30 to 700, 40 to 600, 50 to 500, 60 to 400, 70 to 300, 80 to 200, 80 to 150, 80 to 120, 90 to 110, or about 100, subgenomic intervals, e.g., exons.
- a genomic segment comprises between 100 and 10,000, 100 and 5,000, 100 and 4,000, 100 and 3,000, 100 and 2,000, or 100 and 1,000, subgenomic intervals, e.g., exons.
- a genomic segment comprises 10 to 1,000, 20 to 900, 30 to 700, 40 to 600, 50 to 500, 60 to 400, 70 to 300, 80 to 200, 80 to 150, 80 to 120, 90 to 110, or about 100 genomic SNPs, which have been assigned a SAFI value.
- a genomic segment comprises between 100 and 10,000, 100 and 5,000, 100 and 4,000, 100 and 3,000, 100 and 2,000, or 100 and 1,000, genomic SNPs which have been assigned a SAFI value.
- each of a plurality of genomic segments are characterized by having one or both of:
- the number of subgenomic intervals, e.g., exons, that are contained in, or are combined to form, a genomic segment is at least 2, 5, 10, 15, 20, 50, or 100 times the number of genomic segments.
- the number of subgenomic intervals, e.g., exons, is at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 times the number of genomic segments.
- a boundary for a genomic segment is provided.
- the method comprises assembling sequences for subgenomic intervals, e.g., exons, into genetic segments.
- the method comprises assembling sequences for subgenomic intervals, with a method described herein, e.g., a method comprising a circular binary segmentation (CBS), an HMM based method, a Wavelet based method, or a Cluster along Chromosomes method.
- fitting the genome-wide copy number model to the SCI comprises using the equation of:
- $\psi = \left(\sum_i l_i C_i\right) / \sum_i l_i$, where $l_i$ is the length of a genomic segment.
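The expression above is a length-weighted average of per-segment copy numbers. A minimal sketch of that arithmetic, with a hypothetical function name and an assumed input layout of (length, copy number) pairs:

```python
def length_weighted_copy_number(segments):
    """Average copy number across segments, weighted by segment length.

    segments: list of (l_i, C_i) pairs, where l_i is the segment length and
    C_i is the copy number assigned to that segment.
    """
    total_length = sum(length for length, _ in segments)
    if total_length <= 0:
        raise ValueError("total segment length must be positive")
    weighted_sum = sum(length * copies for length, copies in segments)
    return weighted_sum / total_length
```

Longer segments contribute proportionally more: three quarters of the genome at 2 copies and one quarter at 4 copies averages to 2.5.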
- fitting the genome-wide copy number model to the SAFI comprises using the equation of:
- the fitting comprises using Gibbs sampling.
- fitting comprises using, e.g., a Markov chain Monte Carlo (MCMC) algorithm, e.g., ASCAT (Allele-Specific Copy Number Analysis of Tumors), OncoSNP, or PICNIC (Predicting Integral Copy Numbers In Cancer).
- fitting comprises using Metropolis-Hastings MCMC.
- fitting comprises using a non-Bayesian approach, e.g., a frequentist approach, e.g., using least squares fitting.
- g is determined by determining the fit of values for VAFI, p, C, and M to a model for somatic/germline status.
- the method comprises acquiring an indication of heterozygosity for said variant, e.g., mutation.
- sample purity (p) is global purity, e.g., is the same for all genomic segments.
- the value of g is acquired by:
- a value of g that is close to 0, e.g., does not differ significantly from 0, indicates the variant is a somatic variant.
- a value of g that is close to 1, e.g., does not differ significantly from 1, indicates the variant is a germline variant.
- a value of g that is less than 1 but more than 0, e.g., less than 1 by a predetermined amount and more than 0 by a predetermined amount, e.g., between 0.4 and 0.6, indicates an indistinguishable result.
- a value of g that is significantly less than 0 is indicative of a subclonal somatic variant.
- the value of g is acquired by:
- AF is the allele frequency.
- $M' = C - M$ (e.g., when M is a non-minor allele frequency)
- the somatic/germline status is determined, e.g., when the sample purity is below about 40%, e.g., between about 10% and 30%, e.g., between about 10% and 20%, or between about 20% and 30%.
- the method comprises acquiring an indication of zygosity for said variant, e.g., mutation.
- the mutation status is determined as heterozygous if $0 < M < C$.
- the zygosity is determined, e.g., when the sample purity is greater than about 80%, e.g., between about 90% and 100%, e.g., between about 90% and 95%, or between about 95% and 100%.
- the method comprises sequencing each of the selected subgenomic intervals and each of the selected germline SNPs, e.g., by next generation sequencing (NGS).
- the sequence coverage prior to normalization is at least about 10×, 20×, 30×, 50×, 100×, 250×, 500×, 750×, or 1000× the depth of the sequencing.
- the subject has received an anti-cancer therapy.
- the subject has received an anti-cancer therapy and is resistant to the therapy or exhibits disease progression.
- an anti-cancer therapy which is selected from: a therapeutic agent that has been approved by the FDA, EMEA, or other regulatory agency; or a therapeutic agent that has not been approved by the FDA, EMEA, or other regulatory agency.
- the subject has received an anti-cancer therapy in the course of a clinical trial, e.g., a Phase I, Phase II, or Phase III clinical trial (or in an ex-US equivalent of such a trial).
- the variant is positively associated with the type of tumor present in the subject, e.g., with occurrence of, or resistance to treatment.
- the variant is not positively associated with the type of tumor present in the subject.
- the variant is positively associated with a tumor other than the type of tumor present in the subject.
- the variant is a variant that is not positively associated with the type of tumor present in the subject.
- the method can memorialize, e.g., in a database, e.g., a machine readable database, provide a report containing, or transmit, a descriptor for one or more of: the presence, absence, or frequency, of other mutations in the tumor, e.g., other mutations associated with the tumor type in the sample, other mutations not associated with the tumor type in the sample, or other mutations associated with a tumor other than the tumor type in the sample; the characterization of the variant; the allele or gene; or the tumor type, e.g., the name of the type of tumor, whether the tumor is primary or secondary; a subject characteristic; or therapeutic alternatives, recommendations, or choices.
- a descriptor relating to the characterization of the variant comprises a descriptor for zygosity or germline vs somatic status.
- a descriptor relating to a subject characteristic comprises a descriptor for one or more of: the subject's identity; one or more of the subject's age, gender, weight, occupation, or other similar characteristic; the subject's medical history, e.g., occurrence of the tumor or of other disorders; the subject's family medical history, e.g., relatives who share or do not share the variant; or the subject's prior treatment history, e.g., the treatment received, response to a previously administered anti-cancer therapy, e.g., disease resistance, responsiveness, or progression.
- FIG. 1 depicts an exemplary CGH-like log-ratio profile of a sample, used to acquire Input SCI. The region that encompasses the BRCA1 gene is circled.
- FIG. 2 depicts an exemplary germline SNP allele frequency profile of a sample, used to acquire Input SAFI. The region that encompasses the BRCA1 gene is circled.
- FIG. 3 is a process flow chart for determining a characterization model for a tumor sample according to one embodiment.
- FIG. 4 shows an exemplary block diagram of a general-purpose computer system 400 which can be specially configured to practice various aspects of the present disclosure discussed herein.
- FIG. 5 depicts a storage device.
- FIG. 6 depicts a networked computer system.
- FIG. 7 provides a Table of expected allele frequencies showing that the ability to distinguish somatic variants versus germline polymorphisms, and the ability to determine zygosity status are dependent upon sample purity.
- FIG. 8 depicts a subset of the Table shown in FIG. 7 with the LOH status indicated.
- FIG. 9 depicts a CGH-like log-ratio profile of a sample for determination of somatic/germline status and zygosity for the PIK3CA H1047R variant.
- FIG. 10 depicts a CGH-like log-ratio profile of a sample for determination of somatic/germline status and zygosity for the TP53 G356R variant.
- FIG. 11 depicts an exemplary CGH-like log-ratio profile of a sample.
- the articles “a” and “an” refer to one or to more than one (e.g., to at least one) of the grammatical object of the article.
- “About” and “approximately” shall generally mean an acceptable degree of error for the quantity measured given the nature or precision of the measurements. Exemplary degrees of error are within 20 percent (%), typically, within 10%, and more typically, within 5% of a given value or range of values.
- “Acquire” or “acquiring,” as the terms are used herein, refer to obtaining possession of a physical entity, or a value, e.g., a numerical value, by one or more or all of: “directly acquiring” or “indirectly acquiring” the physical entity or value, or, in the case of a value, “acquiring by calculation.”
- “Directly acquiring” means performing a process (e.g., performing a synthetic or analytical method) to obtain the physical entity or value.
- “Directly acquiring” a physical entity includes performing a process that includes a physical change in a physical substance, e.g., a starting material. Exemplary changes include making a physical entity from two or more starting materials, shearing or fragmenting a substance, separating or purifying a substance, combining two or more separate entities into a mixture, and performing a chemical reaction that includes breaking or forming a covalent or non-covalent bond.
- Directly acquiring a value includes performing a process that includes a physical change in a sample or another substance, e.g., performing an analytical process which includes a physical change in a substance, e.g., a sample, analyte, or reagent (sometimes referred to herein as “physical analysis”), performing an analytical method, e.g., a method which includes one or more of the following: separating or purifying a substance, e.g., an analyte, or a fragment or other derivative thereof, from another substance; combining an analyte, or fragment or other derivative thereof, with another substance, e.g., a buffer, solvent, or reactant; or changing the structure of an analyte, or a fragment or other derivative thereof, e.g., by breaking or forming a covalent or non covalent bond, between a first and a second atom of the analyte; or by changing the structure of a reagent, or a fragment or other derivative thereof
- “Indirectly acquiring” refers to receiving the physical entity or value from another party or source (e.g., a third party laboratory that directly acquired the physical entity or value).
- a first party may acquire a value from a second party (indirectly acquiring) which said second party directly acquired or acquired by calculation.
- “Acquiring by calculation” refers to acquiring a value by calculation or computation, e.g., as performed on a machine, e.g., a computer.
- “Acquiring a sample” as the term is used herein, refers to obtaining possession of a sample, e.g., a tissue sample or nucleic acid sample, by “directly acquiring” or “indirectly acquiring” the sample.
- “Directly acquiring a sample” means performing a process (e.g., performing a physical method such as a surgery or extraction) to obtain the sample.
- “Indirectly acquiring a sample” refers to receiving the sample from another party or source (e.g., a third party laboratory that directly acquired the sample).
- Directly acquiring a sample includes performing a process that includes a physical change in a physical substance, e.g., a starting material, such as a tissue, e.g., a tissue in a human patient or a tissue that was previously isolated from a patient.
- Exemplary changes include making a physical entity from a starting material, dissecting or scraping a tissue; separating or purifying a substance (e.g., a sample tissue or a nucleic acid sample); combining two or more separate entities into a mixture; performing a chemical reaction that includes breaking or forming a covalent or non-covalent bond.
- Directly acquiring a sample includes performing a process that includes a physical change in a sample or another substance, e.g., as described above. Methods described herein can include acquiring the tumor sample.
- “Next-generation sequencing,” “NGS,” or “NG sequencing” refers to any sequencing method that determines the nucleotide sequence of either individual nucleic acid molecules (e.g., in single molecule sequencing) or clonally expanded proxies for individual nucleic acid molecules in a high throughput fashion (e.g., greater than 10^3, 10^4, 10^5 or more molecules are sequenced simultaneously).
- the relative abundance of the nucleic acid species in the library can be estimated by counting the relative number of occurrences of their cognate sequences in the data generated by the sequencing experiment.
- Next generation sequencing methods are known in the art, and are described, e.g., in Metzker, M. (2010) Nature Reviews Genetics 11:31-46, incorporated herein by reference.
- Next generation sequencing can detect a variant present in less than 5% of the nucleic acids in a sample. Methods described herein can use NGS methods.
- Nucleotide value represents the identity of the nucleotide(s) occupying or assigned to a preselected nucleotide position. Typical nucleotide values include: missing (e.g., deleted); additional (e.g., an insertion of one or more nucleotides, the identity of which may or may not be included); or present (occupied); A; T; C; or G.
- a nucleotide value can be a frequency for one or more, e.g., 2, 3, or 4, bases (or other value described herein, e.g., missing or additional) at a nucleotide position.
- a nucleotide value can comprise a frequency for A, and a frequency for G, at a nucleotide position.
- “tissue sample” refers to a collection of cells obtained from a subject or patient, e.g., from a tissue, or circulating cells, of a subject or patient.
- the source of the tissue sample can be solid tissue as from a fresh, frozen and/or preserved organ, tissue sample, biopsy, or aspirate; blood or any blood constituents; bodily fluids such as cerebral spinal fluid, amniotic fluid, peritoneal fluid or interstitial fluid; or cells from any time in gestation or development of the subject.
- the tissue sample can contain compounds that are not naturally intermixed with the tissue in nature such as preservatives, anticoagulants, buffers, fixatives, nutrients, antibiotics or the like.
- the sample is preserved as a frozen sample or as a formaldehyde- or paraformaldehyde-fixed paraffin-embedded (FFPE) tissue preparation.
- the sample can be embedded in a matrix, e.g., an FFPE block or a frozen sample.
- the sample is a tumor sample, e.g., includes one or more premalignant or malignant cells.
- the sample, e.g., the tumor sample, is acquired from a solid tumor, a soft tissue tumor, or a metastatic lesion.
- the sample includes tissue or cells from a surgical margin.
- the sample includes one or more circulating tumor cells (CTC) (e.g., a CTC acquired from a blood sample).
- the sample, e.g., the tumor sample, includes tissue or cells from a surgical margin. The sample can be histologically normal tissue.
- the method further includes acquiring a sample, e.g., a tumor sample as described herein.
- the sample can be acquired directly or indirectly.
- “Sequencing” a nucleic acid molecule requires determining the identity of at least one nucleotide in the molecule. In embodiments the identity of less than all of the nucleotides in a molecule is determined. In other embodiments, the identity of a majority or all of the nucleotides in the molecule is determined.
- Subgenomic interval refers to a portion of genomic sequence.
- a subgenomic interval can be a single nucleotide position, e.g., a nucleotide position variants of which are associated (positively or negatively) with a tumor phenotype.
- a subgenomic interval comprises more than one nucleotide position.
- Such embodiments include sequences of at least 2, 5, 10, 50, 100, 150, or 250 nucleotide positions in length.
- Subgenomic intervals can comprise an entire gene, or a preselected portion thereof, e.g., the coding region (or portions thereof), a preselected intron (or portion thereof) or exon (or portion thereof).
- a subgenomic interval will include or be an exon.
- a subgenomic interval can comprise all or a part of a fragment of a naturally occurring, e.g., genomic, nucleic acid.
- a subgenomic interval can correspond to a fragment of genomic DNA which is subjected to a sequencing reaction.
- a subgenomic interval is continuous sequence from a genomic source.
- a subgenomic interval includes sequences that are not contiguous in the genome, e.g., it can include junctions found at exon-exon junctions in cDNA.
- a subgenomic interval comprises or consists of: a single nucleotide position; an intragenic region or an intergenic region; an exon or an intron, or a fragment thereof, typically an exon sequence or a fragment thereof; a coding region or a non-coding region, e.g., a promoter, an enhancer, a 5′ untranslated region (5′ UTR), or a 3′ untranslated region (3′ UTR), or a fragment thereof; a cDNA or a fragment thereof; a polymorphism; an SNP; a somatic mutation, a germ line mutation or both; an alteration, e.g., a point or a single mutation; a deletion mutation (e.g., an in-frame deletion, an intragenic deletion, a full gene deletion); an insertion mutation (e.g., intragenic insertion); an inversion mutation (e.g., an intra-chromosomal inversion); a linking mutation; a linked insertion mutation; an inverted
- the “copy number of a gene” refers to the number of DNA sequences in a cell encoding a particular gene product. Generally, for a given gene, a mammal has two copies of each gene. The copy number can be increased, e.g., by gene amplification or duplication, or reduced by deletion.
- Variant refers to a structure that can be present at a subgenomic interval that can have more than one structure, e.g., an allele at a polymorphic locus.
- SCI is a measure of normalized sequence coverage at each of a plurality of selected subgenomic intervals, e.g., exons.
- SCI can comprise a series of values for a plurality of selected subgenomic intervals.
- a useful formulation of SCI is a function, e.g., the log, of the ratio of a value related to the number of sequencing reads for a subgenomic interval, e.g., an exon, in the tumor sample to a value related to the number of sequencing reads for that subgenomic interval in the control. This is sometimes referred to herein as log r.
- a useful form for SCI is:
- reads are acquired for a particular subgenomic interval, e.g., an exon.
- Reads for that subgenomic interval from a control diploid cell are acquired.
- the log of the ratio of the former to the latter is acquired. This is repeated for each of a plurality of subgenomic intervals.
- the resulting series of log r values can be used as SCI.
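The per-interval log-ratio steps above can be sketched as follows. This is an illustration under stated assumptions: a base-2 logarithm is used, and each sample's counts are scaled by its total read count before taking the ratio; the function name `sci_log_ratios` and the read counts are hypothetical.

```python
import math


def sci_log_ratios(tumor_reads, control_reads):
    """Return log2(tumor / control) per subgenomic interval.

    Both arguments map an interval name (e.g., an exon) to a read count.
    Counts are divided by each sample's total so that differences in
    overall sequencing depth do not shift every log ratio (an assumed
    normalization, not one specified in the text).
    """
    tumor_total = sum(tumor_reads.values())
    control_total = sum(control_reads.values())
    log_r = {}
    for interval, t in tumor_reads.items():
        c = control_reads[interval]
        if t <= 0 or c <= 0:
            raise ValueError(f"interval {interval!r} has no reads")
        log_r[interval] = math.log2((t / tumor_total) / (c / control_total))
    return log_r


# Hypothetical read counts for three exons.
tumor = {"exon1": 200, "exon2": 100, "exon3": 100}
control = {"exon1": 100, "exon2": 100, "exon3": 200}
log_r = sci_log_ratios(tumor, control)
```

Positive values suggest relative gain in the tumor sample and negative values suggest relative loss; the resulting series is the kind of input that a segmentation or copy number model would consume.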
- the measure of normalized sequence coverage can also comprise adjustment for other parameters that might distort the analysis.
- the method can include the use of an SCI that is corrected for such distortion, e.g., distortion arising from GC content.
- the GC content for a plurality of the subgenomic intervals is acquired.
- the GC content and log r can be compared to determine if they are correlated. This can be undesirable as variations in log r should generally be independent of GC content. Then if there is a correlation, the values for log r can be adjusted, e.g., by regression analysis.
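A regression-based adjustment of this kind might look like the sketch below. It assumes a simple linear relationship between GC content and log r, which is only one possible choice; the function name `gc_correct` and the example values are hypothetical.

```python
def gc_correct(log_r, gc):
    """Remove a linear GC trend from log-ratio values.

    log_r and gc are equal-length sequences, one entry per subgenomic
    interval. An ordinary least-squares line is fitted to log_r as a
    function of gc, and the fitted trend is subtracted while the overall
    mean of log_r is preserved. If there is no correlation, the fitted
    slope is close to zero and the values are left nearly unchanged.
    """
    if len(log_r) != len(gc) or not log_r:
        raise ValueError("log_r and gc must be non-empty and equal length")
    n = len(log_r)
    mean_gc = sum(gc) / n
    mean_lr = sum(log_r) / n
    sxx = sum((g - mean_gc) ** 2 for g in gc)
    if sxx == 0:
        return list(log_r)  # GC content is constant; nothing to correct
    slope = sum((g - mean_gc) * (r - mean_lr) for g, r in zip(gc, log_r)) / sxx
    return [r - slope * (g - mean_gc) for g, r in zip(gc, log_r)]


# Hypothetical intervals whose log r rises purely with GC content.
gc_content = [0.3, 0.4, 0.5, 0.6]
raw_log_r = [2 * g - 1 for g in gc_content]
corrected = gc_correct(raw_log_r, gc_content)
```

In this contrived example all of the variation is explained by GC content, so the corrected values collapse to a single constant.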
- Input SAFI comprises a measure of the allele frequency for each of a plurality of selected germline SNPs in the tumor sample.
- An allele frequency at a selected SNP can be acquired from reads from the sample which cover a selected SNP.
- the allele frequency is the frequency of the minor allele as portrayed in the reads.
- the allele frequency is the frequency, as portrayed in the reads, of an alternative allele.
- the identity of an alternative allele can be acquired from a reference database, e.g., UCSC Human Genome Browser (Meyer L. R., et al., The UCSC Genome Browser database: extensions and updates 2013. Nucleic Acids Res.
- dbSNP the NCBI database of genetic variation. Nucleic Acids Res. 2001; 29(1): 308-311).
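Deriving an allele frequency from the reads that cover a SNP can be sketched as below. The input is assumed to be the base observed at the SNP position in each covering read; the function names and the example pileup are hypothetical.

```python
from collections import Counter


def allele_frequency(observed_bases, allele):
    """Fraction of covering reads that show `allele` at the SNP position."""
    if not observed_bases:
        raise ValueError("no reads cover this SNP")
    return Counter(observed_bases)[allele] / len(observed_bases)


def minor_allele_frequency(observed_bases):
    """Frequency of the second most common base among covering reads.

    Returns 0.0 when only one allele is observed. Bases beyond the two
    most common ones (e.g., sequencing errors) still count toward the
    denominator.
    """
    if not observed_bases:
        raise ValueError("no reads cover this SNP")
    top_two = Counter(observed_bases).most_common(2)
    if len(top_two) < 2:
        return 0.0
    return top_two[1][1] / len(observed_bases)


# Hypothetical pileup: ten reads cover the SNP, six show A and four show G.
pileup = "AAAAAAGGGG"
alt_af = allele_frequency(pileup, "G")
minor_af = minor_allele_frequency(pileup)
```

The first function corresponds to the alternative-allele formulation, where the alternative allele is supplied from a reference database; the second corresponds to the minor-allele formulation, where the allele is identified from the reads themselves.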
- Input VAFI (Variant Allele Frequency Input) comprises the allele frequency for said variant, e.g., mutation, in the tissue or sample, e.g., tumor sample.
- the number of reads for each of a plurality of subgenomic intervals is normalized, e.g., to the number of reads from a control.
- the control need not be, and typically is not, from the subject that supplies the tumor sample.
- the control sample can be from an individual that does not have a tumor, or does not have a tumor of the type in the subject sample.
- the sample is from normal, non-disease state tissue.
- a control is “process-matched” with the tumor sample if they are sequenced under similar conditions.
- a process-matched control can be one in which one or more or all of the following conditions for the treatment of the tumor sample and the control are met: they are prepared in the same way; nucleic acid for sequencing is obtained from them in the same way; they are sequenced with the same sequencing method; or they are sequenced in the same run.
- a genomic segment comprises a subgenomic interval, e.g., an exon, and other genomic sequence, e.g., one or a plurality of other subgenomic intervals.
- a genomic segment will include a plurality of subgenomic intervals, e.g., exons, which are characterized by having one or both of:
- a measure of normalized sequence coverage, e.g., log r, that differs by no more than a preselected amount, e.g., the values for log2 r for subgenomic intervals, e.g., exons, within the boundaries of the genomic segment differ by no more than a reference value, or are substantially constant; and
- SNP allele frequencies for germline SNPs that differ by no more than a preselected amount, e.g., the values for germline SNP allele frequencies for subgenomic intervals, e.g., exons, within the boundaries of the genomic segment differ by no more than a reference value, or are substantially constant.
- assembling genomic sequences into genomic segments can, in some cases, be viewed as a data reduction step.
- E.g., several thousand exons may amount to many fewer, e.g., a hundred or fewer, genomic segments.
- the number of subgenomic intervals, e.g., exons, that are contained in, or are combined to form, the genomic segments can be at least 2, 5, 10, 15, 20, 50 or 100 times the number of genomic segments.
- the number of subgenomic intervals, e.g., exons is at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 times the number of genomic segments.
- Genomic sequences, e.g., subgenomic intervals, e.g., exons, can be assembled into genomic segments using a method described herein, e.g., a method comprising circular binary segmentation (CBS).
- Other methods that can be used include, but are not limited to, HMM-based methods (Fridlyand et al. Journal of Multivariate Analysis 90 (2004): 132-153), wavelet-based methods (Hsu et al. Biostatistics. 2005; 6(2): 211-226), and the Cluster along Chromosomes method (Wang et al. Biostatistics. 2005; 6(1): 45-58).
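To illustrate the data-reduction idea only, the sketch below groups consecutive intervals whose log r values stay within a threshold of the running segment mean. This is a deliberately simplified greedy stand-in and is not circular binary segmentation or any of the cited methods; the function name `segment_by_threshold` and the `max_diff` value are hypothetical.

```python
def segment_by_threshold(log_r, max_diff=0.2):
    """Greedy left-to-right grouping of ordered log-ratio values.

    A new segment starts whenever a value departs from the mean of the
    current segment by more than `max_diff`. Returns a list of
    (start_index, end_index_exclusive, segment_mean) tuples.
    """
    segments = []
    start = 0
    running_sum = 0.0
    for i, value in enumerate(log_r):
        count = i - start
        if count > 0 and abs(value - running_sum / count) > max_diff:
            segments.append((start, i, running_sum / count))
            start, running_sum = i, 0.0
        running_sum += value
    if log_r:
        segments.append((start, len(log_r), running_sum / (len(log_r) - start)))
    return segments


# Hypothetical log ratios for six consecutive exons: three near 0, three near 1.
exon_log_r = [0.0, 0.05, -0.05, 1.0, 0.95, 1.05]
segments = segment_by_threshold(exon_log_r)
```

Six exon-level values reduce to two segments here, which mirrors how thousands of exons can collapse into far fewer genomic segments. A production pipeline would rely on a statistically grounded method such as those cited above.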
- Genome-wide copy number, as well as copy number and LOH estimates for each chromosomal segment, can be determined by fitting a statistical model, e.g., a statistical model described herein.
- a germline variant at segment i can have expected AF:
- $AF_{\text{germline}} = \dfrac{pM_i + (1 - p)}{pC_i + 2(1 - p)}$,
- a somatic mutation at segment i can have expected AF:
- $AF_{\text{somatic}} = \dfrac{pM_i}{pC_i + 2(1 - p)}$,
- a subclonal somatic mutation at segment i can have expected AF:
- FIG. 8 is an exemplary expected allele frequency table for copy numbers, given purity (p), copy number (C), and alternative allele count (M).
- low purity (e.g., <20%) samples are relatively easier for assessing somatic status, but more difficult for assessing tumor LOH.
- high purity (e.g., >90%) samples are easier for assessing tumor LOH, but more difficult in assessing somatic status.
- Tumor samples that are well-admixed with surrounding normal tissue e.g., many clinical cancer specimens
- a more comprehensive table for expected allele frequencies is depicted in FIG. 7 .
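The dependence on purity can be made concrete by tabulating the germline and somatic expected allele frequencies, (pM + (1 - p)) / (pC + 2(1 - p)) and pM / (pC + 2(1 - p)), for a few states. The sketch below is illustrative; the function name `expected_af` and the chosen purity and copy-number values are hypothetical and are not taken from FIG. 7 or FIG. 8.

```python
def expected_af(p, c, m, germline):
    """Expected allele frequency of a variant on a genomic segment.

    p: sample purity (tumor fraction)
    c: total copy number of the segment in tumor cells
    m: copies carrying the variant allele in tumor cells
    A germline variant also occupies one of the two copies in the admixed
    normal cells (the (1 - p) term); a somatic variant does not.
    """
    denominator = p * c + 2.0 * (1.0 - p)
    if denominator <= 0:
        raise ValueError("no DNA covers this segment (p == 1 and c == 0)")
    normal_part = (1.0 - p) if germline else 0.0
    return (p * m + normal_part) / denominator


# Tabulate a few illustrative purity and copy-number states.
for p in (0.2, 0.5, 0.9):
    for c, m in ((2, 1), (2, 2), (3, 1)):
        print(f"p={p:.1f} C={c} M={m} "
              f"germline={expected_af(p, c, m, True):.3f} "
              f"somatic={expected_af(p, c, m, False):.3f}")
```

For a diploid segment with one variant copy (C = 2, M = 1), the germline and somatic expectations are 0.50 and 0.10 at 20% purity but 0.50 and 0.45 at 90% purity, so somatic status is easier to call at low purity. Conversely, a germline variant with LOH (C = 2, M = 2) is expected at 0.60 at 20% purity and 0.95 at 90% purity, against 0.50 without LOH, so LOH is easier to call at high purity.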
- the methods described herein can be used to characterize variants found anywhere in the genome including in exons, introns, 5′-UTRs, and inter-gene regions.
- the method comprises characterizing a variant, e.g., a mutation, in a tumor suppressor gene. In another embodiment, the method comprises characterizing a variant, e.g., a mutation, in an oncogene.
- the method comprises characterizing a variant, e.g., a mutation, in a gene selected from: Table 1, Table 2, or Table 3.
- the method comprises acquiring an SCI for subgenomic intervals from at least five, six, seven, eight, nine, ten, fifteen, twenty, twenty-five, thirty or more genes from the sample, wherein the genes are chosen from: Table 1, Table 2, or Table 3.
- the method comprises acquiring an SCI for a plurality, e.g., at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, of subgenomic intervals, e.g., exons, from a gene chosen from: Table 1, Table 2, or Table 3.
- the method comprises acquiring an SAFI for a SNP from at least five, six, seven, eight, nine, ten, fifteen, twenty, twenty-five, thirty or more genes from the sample, wherein the genes or gene products are chosen from: Table 1, Table 2, or Table 3.
- one or more of the genomic segments are relevant to pharmacogenetics and pharmacogenomics (PGx), e.g., drug metabolism and toxicity.
- the method can be used to analyze variants in subjects having cancer.
- Cancers include, but are not limited to, B cell cancer, e.g., multiple myeloma, melanomas, breast cancer, lung cancer (such as non-small cell lung carcinoma or NSCLC), bronchus cancer, colorectal cancer, prostate cancer, pancreatic cancer, stomach cancer, ovarian cancer, urinary bladder cancer, brain or central nervous system cancer, peripheral nervous system cancer, esophageal cancer, cervical cancer, uterine or endometrial cancer, cancer of the oral cavity or pharynx, liver cancer, kidney cancer, testicular cancer, biliary tract cancer, small bowel or appendix cancer, salivary gland cancer, thyroid gland cancer, adrenal gland cancer, osteosarcoma, chondrosarcoma, cancer of hematological tissues, adenocarcinomas, inflammatory myofibroblastic tumors, gastrointestinal stromal tumor (GIST), colon cancer, multiple myeloma (MM), myelodysplastic syndrome (MDS), myeloproliferative disorder (MP
- the cancer is a primary cancer, e.g., a cancer is named after the part of the body where it first started to grow.
- the cancer is a secondary cancer (or a metastasis), e.g., when cancer cells spread from the primary cancer to another part of the body (e.g., lymph nodes, lungs, liver, brain, and bones).
- a secondary cancer can contain cancer cells that originated from the primary cancer site.
- the specimens can be processed and analyzed using an NGS-based cancer assay, e.g., as described in Frampton et al. Nat Biotechnol. 31(11):1023-1031 (2013).
- the method includes, e.g., DNA extraction, sequencing, analysis and interpretation.
- DNA can be extracted from FFPE tumor samples.
- Typical sample requirements include, e.g., surface area ≥25 mm², sample volume ≥1 mm³, nucleated cellularity ≥80% or 30,000 cells, tumor content ≥20%.
- A sequencing library can be prepared using “with-bead” library construction.
- DNA can be fragmented by sonication, and ≥50 ng of dsDNA (e.g., quantified by PicoGreen) may be required for library preparation.
- DNA fragments can be captured by biotinylated DNA oligonucleotides during hybridization.
- Sequencing can be performed, e.g., to >500× average unique coverage (e.g., with >100× at >99% of exons), e.g., on a HiSeq platform (Illumina) using 49×49 paired-end sequencing.
- the methods described herein can be sensitive, e.g., to variants present at any mutant allele frequency. Detection of long (e.g., 1-40 bp) indel variants can be achieved using de Bruijn graph-based local assembly. CGH-like analysis of read depth can be used for assessment of copy number alterations (CNAs).
- the methods described herein allow for clinical interpretation without a matched normal.
- the reporting approach can include, e.g., removal of germline variants (e.g., from the 1000 Genomes Project (dbSNP135)) and highlighting of known driver alterations (e.g., COSMIC v62) as biologically significant.
- specially configured computer systems can be configured to perform the analysis discussed herein, e.g., to generate characterization models of genetic variants appearing in tumor samples.
- the characterization models can specify, for example, a tumor type (e.g., somatic, germline, subclonal somatic, and not-distinguishable) and/or a tumor zygosity (e.g., homozygous, heterozygous, and absent) for a genetic variant based on sequencing information obtained on the sample.
- Various embodiments of characterization systems can be configured to operate on testing data (e.g., genetic sequencing information) provided from genetic screening systems and/or methods.
- the characterization systems can also be configured to perform genetic testing on tumor samples directly to generate, for example, genetic sequencing information.
- characterization models can be generated by system components that interact with system components for sequencing and/or testing tumor samples. The results generated by sequencing components can be accessed by characterization system components to generate characterization models of genetic variants.
- characterization systems can provide user or collaborator (e.g., physicians, researchers, clinicians, and other medical personnel) access to genomic sequencing data or information on variants through user interfaces. Responsive to selection in the user interface, the system can accept definition of subgenomic intervals and/or germline single nucleotide polymorphisms (SNPs) within a tumor sample on which to provide a characterization model. In other embodiments, the characterization system can automatically define the subgenomic intervals and/or germline SNPs on which to develop classification analysis.
- a characterization system is configured to capture data on a genomic sequence coverage for specified subgenomic intervals.
- the system can define a variable for a sequence coverage input (“SCI” discussed herein) based on the values for sequence coverage at the specified subgenomic intervals.
- the system includes a user interface display configured to accept user input to define the specified subgenomic intervals.
- the subgenomic intervals can be pre-defined as part of genetic testing and/or analysis.
- the system can also be configured to identify the subgenomic intervals to analyze automatically (e.g., based on segmentation analysis, etc).
- the system captures a value for sequence coverage for each of a plurality of specified subgenomic intervals.
- the captured values can be normalized, averaged, or weighted to prevent outlier values from skewing subsequent calculations.
- a normalized value for sequence coverage is used in generating a characterization model for a tumor sample.
- the characterization system can also be configured to derive an allele frequency value according to specification of germline SNPs in the tumor sample.
- the system can define a variable for an SNP allele frequency input (“SAFI” as discussed herein) based on the values for allele frequency for the selected germline SNPs.
- the system specifies the germline SNPs on which to capture values for allele frequency (e.g., based on pre-specified selection, automatically based on analysis of the tumor sample, etc.).
- the user interface can also be configured to accept selection of germline SNPs within genetic sequencing information obtained on, for example, a tumor sample.
- the system can be configured to capture and/or calculate additional values from genetic sequence information (including, e.g., captured from testing systems and/or components or generated by the characterization system directly).
- the system can capture allele frequency in a tumor sample (“VAFI”—variant allele frequency as discussed herein) for a given variant (e.g., a mutation) from testing data.
- the system can generate the data for capturing the allele frequency responsive to genetic sequence testing performed on the sample.
- the additional values which can be captured and/or acquired can also include any one or more of genomic segment total copy number (“C”—discussed herein) for a plurality of genomic segments; a genomic segment minor allele copy number (“M”—discussed herein) for a plurality of genomic segments; and a sample purity value (“p”—discussed herein).
- the characterization system can determine a tumor type (e.g., somatic, germline, subclonal somatic, and not-distinguishable), a tumor zygosity (e.g., homozygous, heterozygous, and absent) responsive to the genetic sequencing data. In embodiments this is achieved without resort to physical analysis of a control sample to determine for example purity.
- the system can calculate a value for a variant type, e.g., mutation type (“g”—e.g., a value that is indicative of a variant being somatic, germline, subclonal somatic, or not-distinguishable) by executing a function on the acquired and/or calculated values for VAFI, p, C, and M.
- the system can classify the variant type, e.g., mutation type.
- a g value equal or approximately equal to 0 is classified by the system as somatic variant.
- a g value equal or approximately equal to 1 is classified by the system as a germline variant. Values of g between 0 and 1 (e.g., 0.4-0.6) are classified by the system as not-determinable.
- the system can calculate a value indicative of the zygosity of the variant in the sample as a function of the acquired and/or calculated values for C and M. For example, a value of M equal to 0 with C not equal to 0 is indicative of absence of the variant, a non-zero value of M equal to C is indicative of homozygosity of the variant (e.g., LOH), values of M and C both equal to 0 are indicative of homozygous deletion of the variant, and a non-zero value of M not equal to C is indicative of heterozygosity of the variant.
- a value of M equal to 0 not equal to C is indicative of absence of the variant
- a non-zero value of M equal to C is indicative of homozygosity of the variant (e.g., LOH)
- a value of M equal to 0 equal to C is indicative of homozygous deletion of the variant
- a non-zero value of M not equal to C is indicative of heterozygosity of the variant.
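- The four zygosity cases listed above can be sketched as a small lookup. This is a minimal illustration only; the function name `zygosity` and the check that 0 <= M <= C are assumptions, not part of the disclosure.

```python
def zygosity(C: int, M: int) -> str:
    """Map total copy number C and copy number M to a zygosity label.

    Hypothetical helper: the labels follow the four cases in the text,
    and the 0 <= M <= C range check is an added assumption.
    """
    if C < 0 or M < 0 or M > C:
        raise ValueError("expected integers with 0 <= M <= C")
    if M == 0 and C == 0:
        return "homozygous deletion"   # M equal to 0 and equal to C
    if M == 0:
        return "absent"                # M equal to 0, not equal to C
    if M == C:
        return "homozygous"            # non-zero M equal to C (e.g., LOH)
    return "heterozygous"              # non-zero M not equal to C
```

- For instance, a segment with 2 copies and 2 variant copies maps to "homozygous", while 4 copies with 2 variant copies maps to "heterozygous".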
- the system can also be configured to determine a confidence level associated with any calculation and/or calculated value (e.g., based on statistical analysis of the input(s) and computational values used to derive an output).
- the system can use determinations on the confidence of calculations and/or calculated values in interpreting classification outputs.
- the not-determinable range of values can be increased where the degree of confidence associated with the calculation of g is low. In another example, the not-determinable range of values can be decreased where the degree of confidence associated with the calculation of g is high.
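- One way to picture the classification of g, including a not-determinable range that widens when confidence is low, is sketched here. The function name, the default band of 0.4 to 0.6, the `widen` parameter, and the cutoff used for "less than 0" are all illustrative assumptions; the disclosure gives these only as examples and does not fix how "approximately equal" is decided.

```python
def classify_g(g: float,
               band: tuple = (0.4, 0.6),
               widen: float = 0.0,
               subclonal_below: float = -0.1) -> str:
    """Classify a variant from its g value.

    Assumptions (not from the source): values below the band are treated
    as "approximately 0", values above it as "approximately 1", and
    `widen` enlarges the not-determinable band on both sides when the
    confidence in g is low.
    """
    lo, hi = band[0] - widen, band[1] + widen
    if lo <= g <= hi:
        return "not-determinable"
    if g < subclonal_below:
        return "subclonal somatic"     # g clearly less than 0
    return "somatic" if g < lo else "germline"
```

- With the defaults, `classify_g(0.3)` is called somatic; passing `widen=0.15` for a low-confidence estimate moves the same value into the not-determinable range.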
- system for generating characterization models can perform any one or more of the functions and/or computations discussed herein.
- the system includes system components specially configured to calculate C, M, and/or p responsive to fitting a genome-wide copy number model to one or both of the SCI and the SAFI.
- the system and/or system components are configured to fit the genome-wide copy number model to the SCI using the equation of:
- where ψ denotes tumor ploidy.
- the system can also be configured to fit the genome-wide copy number model to the SAFI using the equation of:
- the system calculates g based on the fit of values for VAFI, p, C, and M to models of somatic/germline status.
- Various fitting methodologies can be executed by the system to determine g values, e.g., a Markov chain Monte Carlo (MCMC) algorithm, ASCAT (Allele-Specific Copy Number Analysis of Tumors), OncoSNP, or PICNIC (Predicting Integral Copy Numbers In Cancer).
- a system for determining a characterization model for a tumor sample can execute a variety of functions and/or processes.
- Shown in FIG. 3 is an example process 300 for generating a characterization model for a tumor sample according to one embodiment.
- Process 300 begins at 302 by acquisition of calculation values.
- the acquisition of the calculation values at 302 can include accessing any one or more of the values used to calculate g and/or determine zygosity (e.g., from evaluation of M against C).
- the calculation values accessed at 302 can include any one or more of: SCI, SAFI, VAFI, C, M, p.
- acquisition at 302 can also include calculation and/or direct determination of SCI, SAFI, and VAFI from sequencing on a tumor sample. Additionally, acquisition at 302 can also include calculation and/or direct determination of C, M, and/or p.
- Process 300 continues at 304 , where values necessary for determining the characterization model that are missing ( 304 YES) are calculated from the acquired values of 302 .
- C, M, and/or p can be calculated at 306 if any of the values are not acquired, and intermediate calculations are necessary 304 YES. If the values necessary for classification are acquired at 302 , then intermediate calculations are not needed 304 NO.
- classification values can be determined at 308 .
- a value indicative of variant type is determined at 308 .
- the variant type can include somatic, germline, subclonal somatic, and/or not-distinguishable based on the value determined at 308 .
- a value for g is determined at 308 , and the variant type is classified based on the value of g (e.g., equal or approximately equal to 0: somatic; equal or approximately equal to 1: germline; less than 0: subclonal somatic; and in a range between 0 and 1 (e.g., 0.4 to 0.6): not-distinguishable).
- a value indicative of zygosity as a function of C and M is determined at 308 (e.g., a value of M equal to 0 not equal to C is indicative of absence of the variant, a non-zero value of M equal to C is indicative of homozygosity of the variant (e.g., LOH), a value of M equal to 0 equal to C is indicative of a homozygous deletion of the variant, and a non-zero value of M not equal to C is indicative of heterozygosity of the variant).
- a characterization model can be generated for a variant specifying type and/or zygosity.
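- The control flow of process 300 (acquire at 302, check for missing values at 304, calculate intermediates at 306, classify at 308) can be sketched as follows. The function name and the two injected callables, `estimate_c_m_p` and `classify`, are hypothetical placeholders for the model fitting and classification steps; only the branching mirrors the description.

```python
from typing import Any, Callable, Dict, Mapping


def run_process_300(acquired: Mapping[str, Any],
                    estimate_c_m_p: Callable[[Any, Any], Dict[str, Any]],
                    classify: Callable[[Dict[str, Any]], Any]) -> Any:
    """Sketch of steps 302-308; both callables are supplied by the caller."""
    values = dict(acquired)                                  # 302: acquired values
    missing = [k for k in ("C", "M", "p") if values.get(k) is None]
    if missing:                                              # 304 YES
        for key in ("SCI", "SAFI"):
            if values.get(key) is None:
                raise ValueError(f"{key} is required to estimate {missing}")
        estimated = estimate_c_m_p(values["SCI"], values["SAFI"])  # 306
        for k in missing:
            values[k] = estimated[k]
    return classify(values)                                  # 308 (also reached on 304 NO)
```

- Passing the fitting and classification steps in as arguments keeps the sketch independent of any particular copy number model.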
- Various embodiments according to the disclosure may be implemented on one or more specially programmed computer systems.
- These computer systems may be, for example, general-purpose computers such as those based on Intel PENTIUM-type processor, Motorola PowerPC, AMD Athlon or Turion, Sun UltraSPARC, Hewlett-Packard PA-RISC processors, or any other type of processor, including multi-core processors.
- any type of computer system may be used to perform a process or processes for generating a characterization model for a variant in a tumor sample.
- the system may be located on a single computer or may be distributed among a plurality of computers attached by a communications network.
- a general-purpose computer system is specially configured to perform any of the described functions, including but not limited to, acquiring calculation values (e.g., SCI, SAFI, VAFI, M, C, p), normalizing calculation values against a control, calculating intermediate values, calculating classification value(s) (e.g., g and/or zygosity value(s)), etc.
- Additional functions include, for example, fitting genome-wide models to determine classification values, determining log r values, determining correlation of GC content, specifying genomic intervals, specifying germline SNPs, determining calculation values (e.g., SCI, SAFI, VAFI, M, C, p), defining genomic segments, segmenting genomic sequence information, determining sequence coverage, determining SNP allele frequencies, determining genomic segment boundaries, etc.
- the system may perform other functions, including assembling sequences for subgenomic intervals, generating genome-wide copy number model(s), fitting genome-wide copy number model(s), displaying genomic sequence information for selection, determining sample purity, calculating confidence values, and enforcing thresholds on calculations (e.g., purity >80%).
- the functions, operations, and/or algorithms described herein can also be encoded as software executing on hardware that together define a processing component, that can further define one or more portions of a specially configured general purpose computer, that reside on an individual specially configured general purpose computer, and/or reside on multiple specially configured general purpose computers.
- FIG. 4 shows an example block diagram of a general-purpose computer system 400 which can be specially configured to practice various aspects of the present disclosure discussed herein.
- various aspects of the disclosure can be implemented as specialized software executing in one or more computer systems including general-purpose computer systems 604 , 606 , and 608 communicating over network 602 shown in FIG. 6 .
- Computer system 400 may include a processor 406 connected to one or more memory devices 410 , such as a disk drive, memory, or other device for storing data.
- Memory 410 is typically used for storing programs and data during operation of the computer system 400 .
- Components of computer system 400 can be coupled by an interconnection mechanism 408 , which may include one or more busses (e.g., between components that are integrated within a same machine) and/or a network (e.g., between components that reside on separate discrete machines).
- the interconnection mechanism 408 enables communications (e.g., data, instructions) to be exchanged between system components of system 400 .
- Computer system 400 may also include one or more input/output (I/O) devices 402 - 404 , for example, a keyboard, mouse, trackball, microphone, touch screen, a printing device, display screen, speaker, etc.
- Storage 412 typically includes a computer readable and writeable nonvolatile recording medium in which instructions are stored that define a program to be executed by the processor or information stored on or in the medium to be processed by the program.
- the medium may, for example, be a disk 502 or flash memory as shown in FIG. 5 .
- the processor causes data to be read from the nonvolatile recording medium into another memory 504 that allows for faster access to the information by the processor than does the medium.
- This memory is typically a volatile, random access memory such as a dynamic random access memory (DRAM) or static memory (SRAM).
- the computer-readable medium is a non-transient storage medium.
- the memory can be located in storage 412 as shown, or in memory system 410 .
- the processor 406 generally manipulates the data within the memory 410 , and then copies the data to the medium associated with storage 412 after processing is completed.
- a variety of mechanisms are known for managing data movement between the medium and integrated circuit memory element and the disclosure is not limited thereto. The disclosure is not limited to a particular memory system or storage system.
- the computer system may include specially-programmed, special-purpose hardware, for example, an application-specific integrated circuit (ASIC).
- aspects of the disclosure can be implemented in software executed on hardware, hardware or firmware, or any combination thereof.
- Although computer system 400 is shown by way of example as one type of computer system upon which various aspects of the disclosure can be practiced, it should be appreciated that aspects of the disclosure are not limited to being implemented on the computer system as shown in FIG. 4 .
- Various aspects of the disclosure can be practiced on one or more computers having different architectures or components than those shown in FIG. 4 .
- Various embodiments of the disclosure can be programmed using an object-oriented programming language, such as Java, C++, Ada, or C# (C-Sharp). Other object-oriented programming languages may also be used. Alternatively, functional, scripting, and/or logical programming languages can be used.
- Various aspects of the disclosure can be implemented in a non-programmed environment (e.g., documents created in HTML, XML or other format that, when viewed in a window of a browser program, render aspects of a graphical-user interface (GUI) or perform other functions).
- the system libraries of the programming languages are incorporated herein by reference.
- Various aspects of the disclosure can be implemented as programmed or non-programmed elements, or any combination thereof.
- the system can be a distributed system (e.g., client server, multi-tier system) comprising multiple general-purpose computer systems.
- the system includes software processes executing on a system for generating a characterization model.
- Various system embodiments can execute operations such as accepting a tumor sample, executing genomic sequencing, generating and displaying classification/characterization information on the sample, generating user interfaces for displaying classification information, accepting user input regarding genomic segments and/or boundary definition, among other options.
- the system embodiments may operate as “black box” systems where an input sample is classified without further interaction, and other system embodiments may permit user interaction to specify genomic segments, genomic intervals, etc., on which analysis is performed.
- FIG. 6 shows an architecture diagram of an example distributed system 600 suitable for implementing various aspects of the disclosure. It should be appreciated that FIG. 6 is used for illustration purposes only, and that other architectures can be used to facilitate one or more aspects of the disclosure.
- System 600 may include one or more general-purpose computer systems distributed among a network 602 such as, for example, the Internet. Such systems may cooperate to perform any of the functions and/or processes discussed above.
- one or more of systems 604 , 606 , and 608 may accept tumor samples, conduct genomic sequencing, and pass the resulting data to one or more of the remaining systems 604 , 606 , and 608 .
- the one or more client computer systems 604 , 606 , and 608 can also be used to access and/or update calculations for values to solve classification functions, and/or determine classification values, execute fitting algorithms, execute regression analysis, determine confidence values, etc.
- a system 604 includes a browser program such as the Microsoft Internet Explorer application program, Mozilla's FireFox, or Google's Chrome browser through which one or more websites can be accessed. Further, there can be one or more application programs that are executed on system 604 that perform functions associated with evaluating a tumor sample, submitting a tumor sample, obtaining genomic sequencing data, and/or communicating genomic sequencing data.
- system 604 may include one or more local databases for storing, caching and/or retrieving sequencing information associated with testing, sequencing, etc.
- Network 602 may also include one or more server systems, which can be implemented on general-purpose computers that cooperate to perform various functions discussed herein.
- System 600 may execute any number of software programs or processes and the disclosure is not limited to any particular type or number of processes. Such processes can be executed by system embodiments and/or system components to perform the various workflows and operations discussed.
- the method or the assay further includes acquiring the sequence of a subgenomic interval that is present in a gene or gene product associated with one or more of drug metabolism, drug responsiveness, or toxicity (also referred to herein as “PGx” genes).
- Methods described herein can comprise providing a report, e.g., in electronic, web-based, or paper form, to the patient or to another person or entity, e.g., a caregiver, e.g., a physician, e.g., an oncologist, a hospital, clinic, third-party payor, insurance company or government office, a research collaborator, or, generally, a party which is interested in the characterization of a variant.
- a CGH-like log-ratio profile of the sample is obtained by normalizing the sequence coverage obtained at all exons and >1,700 genome-wide SNPs against a process-matched normal control. This profile is segmented and interpreted using allele frequencies of sequenced SNPs to estimate tumor purity and copy number at each segment. Briefly, if $S_i$ is a genomic segment at constant copy number in the tumor, let $l_i$ be the length of $S_i$, $r_{ij}$ be the coverage measurement of exon $j$ within $S_i$, and $f_{ik}$ be the minor allele frequency of SNP $k$ within $S_i$. We seek to estimate $p$, the tumor purity, and $C_i$, the copy number of $S_i$. We jointly model $r_{ij}$ and $f_{ik}$, given $p$ and $C_i$:
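- The joint model itself did not survive in this text. One form consistent with the definitions given here (segment lengths, coverage and SNP measurements, purity, copy number, tumor ploidy) and with the germline allele-frequency expression stated elsewhere in this document is sketched here. The Gaussian noise terms and the length-weighted ploidy $\psi$ are assumptions and should not be read as a transcription of the original equation.

```latex
\begin{aligned}
r_{ij} &\sim \mathcal{N}\!\left(\log_2 \frac{p\,C_i + 2(1-p)}{p\,\psi + 2(1-p)},\ \sigma_{r_i}\right),
\qquad \psi = \frac{\sum_i l_i\,C_i}{\sum_i l_i},\\[4pt]
f_{ik} &\sim \mathcal{N}\!\left(\frac{p\,M_i + (1-p)}{p\,C_i + 2(1-p)},\ \sigma_{f_i}\right).
\end{aligned}
```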
- $M_i$ is the copy number of minor alleles at $S_i$, distributed as an integer $0 \le M_i \le C_i$; $\sigma_{r_i}$ and $\sigma_{f_i}$ reflect noise observed in the CGH and SNP data, respectively. Fitting is performed using Gibbs sampling, assigning absolute copy number to all segments. Focal amplifications are called at segments with ≥6 copies and homozygous deletions at 0 copies, in samples with purity >20%.
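- The calling rule for focal amplifications and homozygous deletions can be illustrated with a short sketch. It assumes the amplification threshold is "at least 6 copies" and that purity is expressed as a fraction; the function and parameter names are hypothetical.

```python
from typing import Dict


def call_focal_events(segment_copy_numbers: Dict[str, int],
                      purity: float,
                      min_purity: float = 0.20,
                      amp_threshold: int = 6) -> Dict[str, str]:
    """Return calls per segment; no calls are made unless purity > min_purity."""
    if purity <= min_purity:
        return {}
    calls = {}
    for segment, copies in segment_copy_numbers.items():
        if copies >= amp_threshold:
            calls[segment] = "amplification"
        elif copies == 0:
            calls[segment] = "homozygous deletion"
    return calls
```

- Segments with intermediate copy numbers are simply left uncalled in this sketch.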
- NGS Next-generation sequencing
- A CLIA-certified, CAP-accredited NGS-based test has been developed and deployed that interrogates the entire coding sequence of 236 selected cancer genes from minimal (≥50 ng) DNA from FFPE tissue.
- Deep, uniform coverage and customized algorithms permit accurate identification of all classes of genomic alterations.
- a key practical constraint in genomic testing in oncology is the limited availability of matching normal specimens, restricting the interpretation of any novel variants identified which are either private germline polymorphisms or somatic alterations.
- An approach to assessing somatic vs. germline status of genomic alterations without a patient matched normal, as well as determining variant zygosity and LOH, is described herein.
- a CGH-like log-ratio profile of the sample is obtained by normalizing the coverage obtained at all exons and >3,500 genome-wide SNPs against a process-matched normal control.
- This profile is segmented and interpreted using allele frequencies of sequenced SNPs to estimate tumor purity (p) and copy number (C) at each segment. Fitting is performed using Gibbs sampling, assigning total copy number and minor allele count to all segments. Given a list of variants with unknown somatic/germline/zygosity status, the copy number and minor allele count (M) of the segment local to each variant is obtained. Allele frequencies f of variants of interest are interpreted using equation
- Statistical significance is assessed relative to read depth and to local variability in allele frequency estimates.
- TNBC triple-negative breast cancer
- This work describes a computational method based on interpretation of variant allele frequencies for determining the somatic/germline/LOH status of genomic alterations in clinical cancer specimens without a matched normal control.
- the method supports functional prioritization and interpretation of novel alterations discovered on routine testing and enables indication for additional diagnostic workup if predicted germline risk variants are found.
- this further informs clinical decision making and expands treatment choices for cancer patients.
- The CGH-like log-ratio profile used to acquire SCI is shown in FIG. 1 . As shown in FIG. 1 , the total local copy number for BRCA1 is 2.
- this BRCA1 I600fs*7 variant is a somatic variant that is homozygous (2 of 2 copies) in the tumor.
- the candidate mutation tested in this Example is PIK3CA H1047R.
- a genome-wide copy number model indicated that the tumor has 4 copies of PIK3CA, with 2 variant alleles.
- the genomic segment containing PIK3CA is not under LOH in the tumor.
- AF allele frequency
- the candidate mutation tested in this Example is TP53 G356R.
- a genome-wide copy number model indicated that the tumor has 2 copies of TP53, with 2 variant alleles.
- the genomic segment containing TP53 is under LOH in the tumor.
- AF allele frequency
- FIG. 11 depicts a CGH-like log-ratio profile of sample for establishing an exemplary genome-wide copy number model. Selected chromosomes are annotated with respect to copy number, zygosity, and somatic/germline status, as shown in Table 4.
- p-arm of chromosome 1 is under copy-neutral LOH (LOHx), while the entire chromosome 13 is under copy-loss LOH (LOH1). Somatic status of certain functional mutations is also reported in Table 4.
- A key constraint in genomic testing in oncology is that matched normal specimens are not commonly obtained in clinical practice. Thus, while most clinically relevant genomic alterations have been previously characterized and do not require normal tissue for interpretation, the use of novel variants whose somatic status is unknown is limited.
- This example describes an approach to predicting somatic vs. germline status of genomic alterations from tumor tissue alone in a CLIA-certified, NGS-based test that interrogates all exons of 236 cancer-related genes.
- $f_{\text{germline}} = \dfrac{pM + 1 - p}{pC + 2(1 - p)}$ vs. $f_{\text{somatic}} = \dfrac{pM}{pC + 2(1 - p)}$.
- measured allele frequency is compared to expectation, and a prediction is made with statistical confidence assessed based on read depth and local variability of SNP measurements in each segment.
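- The comparison of a measured allele frequency against the germline and somatic expectations can be sketched as follows. The two expectation functions follow the expressions (pM + 1 - p) / (pC + 2(1 - p)) and pM / (pC + 2(1 - p)); the nearest-expectation rule and the `min_separation` cutoff are simplifying assumptions that stand in for the statistical confidence assessment (read depth, local SNP variability), which is not modeled here.

```python
def expected_af_germline(p: float, C: int, M: int) -> float:
    """Expected allele frequency if the variant is germline."""
    return (p * M + 1 - p) / (p * C + 2 * (1 - p))


def expected_af_somatic(p: float, C: int, M: int) -> float:
    """Expected allele frequency if the variant is somatic."""
    return (p * M) / (p * C + 2 * (1 - p))


def closer_model(f: float, p: float, C: int, M: int,
                 min_separation: float = 0.05) -> str:
    """Pick the expectation nearest to the measured frequency f.

    Hypothetical rule: if the two expectations are closer together than
    min_separation, the variant is reported as not-distinguishable.
    """
    germ = expected_af_germline(p, C, M)
    som = expected_af_somatic(p, C, M)
    if abs(germ - som) < min_separation:
        return "not-distinguishable"
    return "germline" if abs(f - germ) < abs(f - som) else "somatic"
```

- With p = 0.5, C = 2, and M = 2, the expectations are 0.75 (germline) and 0.5 (somatic), so a measured frequency of 0.52 is closer to the somatic expectation. At p = 1 the two expectations coincide, which is why a pure tumor sample gives no basis for the distinction in this sketch.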
- specimens from 30 lung and colon cancer patients were examined by sequencing the primary tumor, the metastatic tumor, and a matched-normal control. A total of 305 unique variants with known somatic status were assessed.
- This computational method leverages deep next-generation sequencing of clinical cancer specimens to predict variant somatic status without a matched-normal control. Accuracy of the method is >95%, demonstrated using three independent validation approaches.
- the analytic framework also assesses tumor LOH status of identified variants, and the sub-clonality of somatic mutations. It supports functional prioritization and interpretation of alterations discovered on routine testing and can indicate additional work-up if germline risk variants are found.
- a characterization model can be captured and tracked over time.
- the system can be configured to analyze and store characterization information on multiple tissue samples taken from a subject.
- the characterization model developed over time provides information on changes to the characterization model (including e.g., variant type, zygosity, etc.).
- the system can analyze the characterization model to identify relationships between different variants (e.g., tumors) based, for example, on similarity in characterization models.
- the system can identify related variants in different tumors, different patients, etc.
- a characterization model can include treatment information.
- the system can identify related treatment options responsive to similarity in characterization models and any respective treatments. Once related treatment options are identified, the system can present related treatment in user interface displays, in a report generated by the system, etc.
Abstract
Description
- This application claims the benefit of U.S. Provisional Application No. 61/821,920, filed May 10, 2013 and U.S. Provisional Application No. 61/939,936, filed Feb. 14, 2014, the contents of which are hereby incorporated by reference in their entirety.
- The disclosure relates to the analysis of genetic variants.
- Typically, cancer tissues are characterized by genetic lesions which are, at least in part, responsible for the occurrence or phenotype of the disorder. Many cancers are characterized by one or more genetic aberrations, including gene copy number changes, somatic and germline mutations. The need still exists for analyzing genetic variants associated with cancer.
- The disclosure features, inter alia, methods and systems for analyzing or characterizing variants in a tumor, e.g., generating a characterization model for a variant (e.g., a mutation) in a tissue (e.g., a tumor or tumor sample) from a subject (e.g., a human subject, e.g., a cancer patient). Embodiments described herein allow for the analysis without the need for analyzing non-tumor tissue from the subject. For example, the methods described herein can be used in genomic testing that includes variants, e.g., novel variants, whose somatic status is unknown or unclear. The characterization can include assessment or indication of zygosity and/or variant type, e.g., as somatic or germline. The assessment has numerous uses including: obtaining an understanding of the genetic lesions in a cancer; selecting a treatment modality, e.g., in response to the analysis; staging, diagnosing, or prognosing a subject, e.g., in response to the analysis; developing novel therapeutic agents; the discovery and use of existing therapeutic agents for disorders not previously treated with that therapeutic agent; selection of subjects for experimental trials; understanding mechanisms of tumor characteristics, e.g., tumor metabolism, growth, invasiveness, resistance or susceptibility to therapy; selection or discovery of treatment regimes, e.g., drug combinations, e.g., for simultaneous use or for sequential use, e.g., as early or subsequent line of treatment; and assembling databases of tumor characteristics. The systems and methods disclosed herein are also useful for developing compositions, assays, kits, devices, systems, and methods for treating cancer. The systems and methods disclosed herein can inform clinical decision making and expand treatment choices for cancer patients.
- In one aspect, the disclosure provides a system for generating a characterization model (including, e.g., variant type and/or zygosity) for a variant (e.g., a mutation) in a tissue or sample, e.g., a tumor, or tumor sample, from a subject, e.g., a human subject, e.g., a cancer patient. The system comprises:
-
- at least one processor operatively connected to a memory, the at least one processor when executing is configured to:
- a) acquire:
- i) a sequence coverage input (SCI), which comprises, for each of a plurality of selected subgenomic intervals (e.g., exons) a value for sequence coverage at the selected subgenomic intervals (including, e.g., a normalized sequence coverage value);
- ii) an SNP allele frequency input (SAFI), which comprises, for each of a plurality of selected germline SNPs, a value for the allele frequency, in the tissue or sample, e.g., tumor sample;
- iii) a variant allele frequency input (VAFI), which comprises the allele frequency for said variant, e.g., mutation, in the tissue or sample, e.g., tumor sample;
- b) acquire values, determined as a function of SCI and SAFI, for:
- a genomic segment total copy number (C), for each of a plurality of genomic segments;
- a genomic segment minor allele copy number (M), for each of a plurality of genomic segments; and
- sample purity (p); and
- c) calculate one or both of:
- i) a value for variant type, e.g., mutation type, e.g., g, which is indicative of the variant being somatic, germline, subclonal somatic, or not-distinguishable, wherein the at least one processor when executing is configured to calculate the value for variant type, e.g., mutation type, as a function of VAFI, p, C, and M;
- ii) an indication of the zygosity (e.g., homozygous, heterozygous, and absent) of the variant, e.g., mutation, in the tissue or sample, e.g., tumor sample, as a function of C and M.
- In an embodiment, the system is configured such that the analysis can be performed without the need for analyzing non-tumor tissue from the subject.
- In an embodiment, the analysis is performed without analyzing non-tumor tissue from the subject, e.g., non-tumor tissue from the same subject is not sequenced.
- In an embodiment, the system is configured to determine for at least one of the tumor sample, the selected subgenomic intervals, and the selected germline SNPs that the variant type, e.g., mutation type, cannot be determined for analyzed values.
- In an embodiment, at least one processor when executing acquires the SCI calculated as a function (e.g., the log of the ratio) of the number of reads for a subgenomic interval and the number of reads for a control (e.g., a process-matched control).
- In an embodiment, at least one processor when executing is configured to calculate SCI as a function (e.g., the log of the ratio) of the number of reads for a subgenomic interval and the number of reads for a control (e.g., a process-matched control).
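- A minimal sketch of an SCI calculation as a log ratio of read counts is given here. The base-2 logarithm, the normalization of each count by its library total, and the skipping of zero-count intervals are assumptions added for illustration; the function name and the dictionary layout are hypothetical.

```python
import math
from typing import Dict


def sci_log_ratios(sample_reads: Dict[str, int],
                   control_reads: Dict[str, int]) -> Dict[str, float]:
    """Per-interval log2 ratio of depth-normalized sample vs. control reads."""
    sample_total = sum(sample_reads.values())
    control_total = sum(control_reads.values())
    ratios = {}
    for interval, s in sample_reads.items():
        c = control_reads.get(interval, 0)
        if s == 0 or c == 0:
            continue  # the log ratio is undefined; such intervals are left out
        ratios[interval] = math.log2((s / sample_total) / (c / control_total))
    return ratios
```

- Because each count is divided by its library total, sequencing the sample twice as deeply as the control leaves every log ratio unchanged.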
- In an embodiment, the at least one processor when executing is configured to validate that a minimum number of subgenomic intervals have been selected or analyzed.
- In an embodiment, the at least one processor when executing is configured to acquire the SCI from values calculated against at least 10, 25, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 1,000, 2,000, 3,000, or 4,000, subgenomic intervals (e.g., exons).
- In an embodiment, the at least one processor when executing is configured to calculate the SCI against at least 10, 25, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 1,000, 2,000, 3,000, or 4,000, subgenomic intervals (e.g., exons).
- In an embodiment, the SCI comprises a plurality of respective values (e.g., log r values) for a plurality of subgenomic intervals (e.g., exons) from at least 10, 25, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 1,000, 2,000, 3,000, or 4,000, genes.
- In an embodiment, at least one, a plurality, or substantially all of the values comprised in the SCI are corrected for correlation with GC content.
- In an embodiment, at least one processor when executing is configured to validate that a minimum number of germline SNPs have been selected or analyzed.
- In an embodiment, the minimum number of germline SNPs comprises at least 10, 25, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 1,000, 2,000, 3,000, 4,000, 5000, 6000, 7000, 8000, 9000, 10,000, or 15,000 germline SNPs.
- In an embodiment, the SAFI is based, at least in part, on a minor allele frequency in the tumor sample.
- In an embodiment, the at least one processor when executing is configured to calculate, or acquire, SAFI based, at least in part, on a minor allele frequency in the tumor sample.
- In an embodiment, the SAFI is based, at least in part, on an alternative allele frequency (e.g., an allele frequency other than a standard allele in a human genome reference database).
- In an embodiment, the at least one processor when executing is configured to calculate, or acquire, SAFI based, at least in part, on an alternative allele frequency (e.g., an allele frequency other than a standard allele in a human genome reference database).
- In an embodiment, the at least one processor when executing is configured to access values of C, M, and p calculated from fitting a genome-wide copy number model to the SCI and the SAFI.
- In an embodiment, the at least one processor when executing is configured to calculate C, M, and p.
- In an embodiment, the at least one processor when executing generates a best fit between the genome-wide copy number model and the SCI and the SAFI to calculate C, M, and p.
- In an embodiment, values of C, M, and p fit a plurality of genome-wide copy number model inputs of the SCI and the SAFI.
- In an embodiment, the at least one processor when executing is configured to access or calculate one or more genome-wide copy number models.
- In an embodiment, the at least one processor when executing is configured to determine a confidence value for each of the plurality of genome-wide copy number models based on a determined fit to the SCI and the SAFI.
- In an embodiment, the at least one processor when executing is configured to calculate C, M, and p, responsive to contributions from each of the plurality of genome-wide copy number models.
- In an embodiment, the contributions are determined according to a confidence level for each of the plurality of genome-wide copy number models (including, e.g., confidence levels reflective of a degree of fit).
- In an embodiment, a genomic segment comprises a plurality of subgenomic intervals, e.g., exons, e.g., subgenomic intervals, e.g., exons, which have been assigned a SCI value.
- In an embodiment, the system is configured to calculate and/or assign SCI values to a plurality of subgenomic intervals.
- In an embodiment, the at least one processor when executing is configured to require a minimum number of subgenomic intervals for analysis of a genomic segment.
- In an embodiment, a genomic segment comprises at least 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 125, 150, 175, 200, 225, 250, 275, 300, 400, or 500 subgenomic intervals, e.g., exons.
- In an embodiment, a genomic segment comprises 10 to 1,000, 20 to 900, 30 to 700, 40 to 600, 50 to 500, 60 to 400, 70 to 300, 80 to 200, 80 to 150, 80 to 120, 90 to 110, or about 100 subgenomic intervals (e.g., exons).
- In an embodiment, a genomic segment comprises between 100 and 10,000, 100 and 5,000, 100 and 4,000, 100 and 3,000, 100 and 2,000, or 100 and 1,000, subgenomic intervals (e.g., exons).
- In an embodiment, a genomic segment comprises 10 to 1,000, 20 to 900, 30 to 700, 40 to 600, 50 to 500, 60 to 400, 70 to 300, 80 to 200, 80 to 150, 80 to 120, 90 to 110, or about 100 genomic SNPs, which have been assigned a SAFI value.
- In an embodiment, a genomic segment comprises between 100 and 10,000, 100 and 5,000, 100 and 4,000, 100 and 3,000, 100 and 2,000, or 100 and 1,000, genomic SNPs which have been assigned a SAFI value.
- In an embodiment, the at least one processor when executing is configured to validate each of a plurality of genomic segments with values having one or both of:
-
- a measure of normalized sequence coverage, e.g., log r, that differ by no more than a preselected amount, e.g., the values for log2 r for subgenomic intervals, e.g., exons, within the boundaries of the genomic segment differ by no more than a reference value, or are substantially constant; and
- SNP allele frequencies for germline SNPs that differ by no more than a preselected amount, e.g., the values for germline SNP allele frequencies for subgenomic intervals, e.g., exons, within the boundaries of the genomic segment differ by no more than a reference value, or are substantially constant.
- In an embodiment, the at least one processor when executing is configured to require that the number of subgenomic intervals (e.g., exons) that are contained in, or are combined to form, a genomic segment be at least 2, 5, 10, 15, 20, 50, or 100 times the number of genomic segments.
- In an embodiment, the at least one processor when executing is configured to require that the number of subgenomic intervals, e.g., exons, be at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 times the number of genomic segments.
- In an embodiment, the at least one processor when executing is configured to define a boundary for a genomic segment (e.g., automatically define boundary, accept user input on the boundary, generate relative boundary from user provided inputs, display a user interface for defining genomic segment boundary, display suggested boundary, etc.).
- In an embodiment, the at least one processor when executing is configured to assemble sequences of subgenomic intervals (e.g., exons) into genomic segments (including, e.g., user identified subgenomic intervals, system identified subgenomic intervals, candidate subgenomic intervals, user confirmed candidate subgenomic intervals).
- In an embodiment, the at least one processor when executing is configured to segment a genomic sequence into subgenomic intervals of equal copy number (e.g., according to circular binary segmentation (CBS) algorithms, an HMM based method, a Wavelet based method, or a Cluster along Chromosomes method).
- In an embodiment, the at least one processor when executing is configured to assemble subgenomic intervals into genomic segments of equal copy number (e.g., according to circular binary segmentation (CBS) algorithms, an HMM based method, a Wavelet based method, or a Cluster along Chromosomes method).
- In an embodiment, the at least one processor when executing is configured to assemble sequences for subgenomic intervals according to a method described herein (e.g., a circular binary segmentation (CBS) function, an HMM based method, a Wavelet based method, or a Cluster along Chromosomes method).
- In an embodiment, the at least one processor when executing is configured to fit the genome-wide copy number model to the SCI according to calculation of:
-
- where ψ is tumor ploidy.
- In an embodiment, the at least one processor is configured to determine ψ = (Σ_i l_i C_i)/(Σ_i l_i), wherein l_i is determined based at least in part on the length of a genomic segment being analyzed.
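- The ploidy expression above is a length-weighted mean of segment copy numbers. A minimal sketch follows; the function name and the `(length, copy_number)` tuple layout are illustrative assumptions.

```python
def tumor_ploidy(segments):
    """Length-weighted mean copy number: psi = sum(l_i * C_i) / sum(l_i).

    `segments` is a list of (length, copy_number) pairs, one per genomic segment.
    """
    total_length = sum(length for length, _ in segments)
    if total_length <= 0:
        raise ValueError("total segment length must be positive")
    weighted_copies = sum(length * copies for length, copies in segments)
    return weighted_copies / total_length


# Hypothetical segments: lengths in arbitrary units, integer copy numbers.
psi = tumor_ploidy([(100, 2), (50, 4), (50, 1)])  # (200 + 200 + 50) / 200 = 2.25
```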
- In an embodiment, the at least one processor when executing is configured to fit the genome-wide copy number model to the SAFI according to calculation of:
-
- where AF is allele frequency.
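- For orientation, the sketch below shows one purity-weighted mixture form that is consistent with the symbols defined in this section (p, C, M, ψ). The specific expressions are an assumption: they model the sample as a fraction p of tumor cells with segment copy number C and minor allele copy number M, mixed with a fraction 1 - p of diploid normal cells heterozygous at the germline SNP. The function names are hypothetical.

```python
import math


def expected_log_ratio(p, C, psi):
    """Assumed model: log2 of segment copy mass over genome-average copy mass.

    Tumor cells contribute C copies (psi on average), normal cells contribute 2.
    """
    return math.log2((p * C + 2 * (1 - p)) / (p * psi + 2 * (1 - p)))


def expected_snp_af(p, C, M):
    """Assumed model: allele frequency of a heterozygous germline SNP.

    Tumor cells carry M copies of the tracked allele out of C; each normal cell
    carries 1 copy out of 2. Undefined when p == 1 and C == 0.
    """
    return (p * M + (1 - p)) / (p * C + 2 * (1 - p))


# A balanced diploid segment in a diploid tumor gives a log ratio of 0 and AF of 0.5.
balanced_lr = expected_log_ratio(0.5, 2, 2.0)
balanced_af = expected_snp_af(0.5, 2, 1)
# Loss of the tracked allele (C=2, M=0) at 50% purity pulls the AF down to 0.25.
loh_af = expected_snp_af(0.5, 2, 0)
```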
- In an embodiment, the at least one processor when executing is configured to fit the genome-wide copy number model according to execution of Gibbs sampling.
- In an embodiment, the at least one processor when executing is configured to fit the genome-wide copy number model by determining a best fit model from a fitting algorithm (e.g., a Markov chain Monte Carlo (MCMC) algorithm, e.g., ASCAT (Allele-Specific Copy Number Analysis of Tumors), OncoSNP, or PICNIC (Predicting Integral Copy Numbers In Cancer)).
- In an embodiment, the fitting comprises using Metropolis-Hastings MCMC.
- In an embodiment, the fitting comprises using a non-Bayesian approach (e.g., a frequentist approach, e.g., using least squares fitting).
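- As a frequentist illustration, the sketch below estimates sample purity by least squares over a grid, given germline SNP allele frequencies for segments whose C and M are treated as known. This is a deliberate simplification: a full fit would also search over C and M and use the SCI. The expected-frequency expression, the grid resolution, and all names are assumptions.

```python
def expected_af(p, C, M):
    """Assumed mixture model for a heterozygous germline SNP (see lead-in)."""
    return (p * M + (1 - p)) / (p * C + 2 * (1 - p))


def fit_purity(observations, steps=99):
    """Return the grid purity minimizing the sum of squared AF residuals.

    `observations` is a list of (observed_af, C, M) tuples. The grid covers
    p = 1/(steps+1), ..., steps/(steps+1), so p = 0 and p = 1 are excluded.
    """
    best_p, best_sse = None, float("inf")
    for k in range(1, steps + 1):
        p = k / (steps + 1)
        sse = sum((af - expected_af(p, C, M)) ** 2 for af, C, M in observations)
        if sse < best_sse:
            best_p, best_sse = p, sse
    return best_p


# Synthetic, noise-free observations generated at a purity of 0.4.
observations = [(expected_af(0.4, C, M), C, M) for C, M in [(2, 0), (3, 1), (1, 0)]]
fitted_purity = fit_purity(observations)
```

Balanced segments (for example C=2, M=1) give an expected frequency of 0.5 at any purity under this model, so they carry no information about p.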
- In an embodiment, the at least one processor when executing is configured to determine g by calculating a fit of values for VAFI, p, C, and M to a model for somatic/germline status.
- In an embodiment, the at least one processor when executing is configured to determine g by solving for g in
-
- In an embodiment, the at least one processor when executing is configured to classify a type of variant responsive to a calculated value of g.
- In an embodiment, the at least one processor when executing is configured to classify the type of variant based on at least one of:
- for a g value sufficiently close to 0 (e.g., within a predetermined distance from 0), classify the variant as a somatic variant;
- for a g value approximately equal to 1 (e.g., within a predetermined distance from 1), or higher, classify the variant as a germline variant;
- for a g value between 0 and 1 that is not close to either the somatic classification value or the germline classification value (e.g., 0.4 to 0.6), classify the variant as indistinguishable; and
- for a g value less than 0, classify the variant as a subclonal somatic variant.
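- The classification rules above can be sketched as a simple threshold function. The 0.4 and 0.6 cutoffs follow the example range given in this section; the cutoff below which a negative g counts as subclonal is a hypothetical value chosen only for illustration, as is the function name.

```python
def classify_variant(g, somatic_max=0.4, germline_min=0.6, subclonal_max=-0.4):
    """Map a g value to a variant type.

    somatic_max / germline_min follow the example 0.4 to 0.6 indistinguishable
    range; subclonal_max is an illustrative assumption for "less than 0".
    """
    if g < subclonal_max:
        return "subclonal somatic"
    if g < somatic_max:
        return "somatic"
    if g > germline_min:
        return "germline"  # includes values approximately equal to 1 or higher
    return "indistinguishable"


labels = [classify_variant(g) for g in (-0.8, 0.05, 0.5, 0.95, 1.2)]
```

In practice the cutoffs would more plausibly be derived from the statistical confidence of the fit than fixed in advance.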
- In an embodiment, the at least one processor when executing is configured to define an indistinguishable range of values for g responsive to local evaluation of the genomic segment calculations.
- In an embodiment, the at least one processor when executing is configured to define the indistinguishable range of values based on a confidence level associated with calculated values, wherein the greater the confidence level the smaller the range of values of g defining the indistinguishable range, and wherein the smaller the confidence level the greater the range of values of g defining the indistinguishable range of values.
- In an embodiment, the at least one processor when executing is configured to classify a zygosity of the variant responsive to a calculated value indicating heterozygosity.
- In an embodiment, the at least one processor when executing is configured to determine the sample purity (p) as a global purity value (e.g., is the same for all genomic segments).
- In an embodiment, the at least one processor when executing is configured to determine the value of g according to:
-
- where AF is allele frequency (e.g., the variant is a germline polymorphism if g=1 and the variant is a somatic mutation if g=0).
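- One expression consistent with the stated endpoints (g=1 for a germline polymorphism, g=0 for a somatic mutation) is AF = (pM + g(1 - p))/(pC + 2(1 - p)), in which the normal-cell fraction contributes the variant only when it is germline. That form is an assumption here. The sketch below solves it for g; the function name is hypothetical.

```python
def solve_g(af, p, C, M):
    """Solve AF = (p*M + g*(1 - p)) / (p*C + 2*(1 - p)) for g (assumed model).

    Requires 0 <= p < 1: with no normal-cell admixture (p == 1) the variant
    allele frequency carries no information about g.
    """
    if not 0 <= p < 1:
        raise ValueError("purity must satisfy 0 <= p < 1")
    return (af * (p * C + 2 * (1 - p)) - p * M) / (1 - p)


# Diploid segment (C=2, M=1) at 50% purity.
g_germline = solve_g(0.50, 0.5, 2, 1)  # heterozygous germline variant, AF 0.5
g_somatic = solve_g(0.25, 0.5, 2, 1)   # somatic variant on one tumor copy, AF 0.25
```

The division by 1 - p suggests why lower purity may help separate the two cases: the more normal-cell admixture there is, the further apart the germline and somatic frequencies fall.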
- In an embodiment, the at least one processor when executing is configured to determine that a g value is approximately equal to 1 and classify the variant as a germline polymorphism.
- In an embodiment, the at least one processor when executing is configured to determine that a g value is approximately equal to 1 (e.g., greater than 0.6) and classify the variant as a germline polymorphism.
- In an embodiment, the at least one processor when executing is configured to determine that a g value is approximately equal to 0 (e.g., less than 0.4) and classify the variant as a somatic mutation.
- In an embodiment, the at least one processor when executing is configured to determine that a g value is approximately equal to a classification value (e.g., g is approximately 1 or 0) responsive to a degree of statistical confidence in the calculations.
- In an embodiment, the at least one processor when executing is configured to determine that a g value is significantly less than 0, and classify the variant as a subclonal somatic variant.
- In an embodiment, the at least one processor when executing is configured to determine the value of g according to:
-
- where AF is allele frequency, and M′=C−M (e.g., when M is a non-minor allele frequency) (e.g., the variant is a germline polymorphism if g=1 and the variant is a somatic mutation if g=0).
- In an embodiment, the somatic/germline status is determined when the sample purity is below, for example, about 40% (e.g., between about 10% and 30% (e.g., between about 10% and 20% or between about 20% and 30%)).
- In an embodiment, the at least one processor when executing is configured to validate purity values.
- In an embodiment, the at least one processor when executing is configured to define a confidence level for a calculation based on the sample purity value.
- In an embodiment, the at least one processor when executing is configured to increase a confidence level for a determination of variant type based on a low purity (e.g., 10-30%), and/or decrease a confidence level for a determination of zygosity based on a low purity (e.g., 10-30%), and/or increase a confidence level for a determination of zygosity based on a high purity (e.g., >90%), and/or decrease a confidence level for a determination of variant type based on a high purity (e.g., >90%).
- In an embodiment, the at least one processor when executing is configured to classify the variant according to:
-
- a value of M equal to 0 and not equal to C (M=0, C≠0) indicates an absence of the variant, e.g., mutation, e.g., not existent in the tumor;
- a non-zero value of M equal to C (M=C≠0) indicates a homozygosity of the variant, e.g., mutation, e.g., with loss of heterozygosity (LOH);
- a value of M equal to 0 and equal to C (M=C=0) indicates a homozygous deletion of the variant, e.g., mutation, e.g., not existent in the tumor; and
- a non-zero value of M not equal to C indicates a heterozygosity of the variant, e.g., mutation.
- In an embodiment, the at least one processor when executing is configured to determine an indication of zygosity for said variant (e.g., mutation).
- In an embodiment, the at least one processor when executing is configured to determine the indication of zygosity for said variant is homozygous when M=C≠0 (including, for example, M is approximately equal to C), e.g., with LOH.
- In an embodiment, the at least one processor when executing is configured to determine the indication of zygosity for said variant is homozygously deleted when M=C=0 (including, for example, M is approximately equal to C).
- In an embodiment, the at least one processor when executing is configured to determine the indication of zygosity for said variant is heterozygous when 0<M<C.
- In an embodiment, the at least one processor when executing is configured to determine the indication of zygosity for said variant is absent from the tumor when M=0 and C≠0 (including, for example, M is approximately equal to 0).
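- The zygosity rules above reduce to a comparison of M against C. A minimal sketch, assuming integer copy numbers with 0 <= M <= C (the function name and return labels are illustrative):

```python
def zygosity(M, C):
    """Classify zygosity from variant allele copy number M and total copy number C."""
    if M < 0 or C < 0 or M > C:
        raise ValueError("expected 0 <= M <= C")
    if M == 0 and C == 0:
        return "homozygous deletion"
    if M == 0:
        return "absent from tumor"
    if M == C:
        return "homozygous (LOH)"
    return "heterozygous"  # remaining case: 0 < M < C


calls = [zygosity(M, C) for M, C in [(0, 0), (0, 2), (2, 2), (1, 3)]]
```

With fitted, non-integer estimates, the equality checks would need a tolerance, in line with the "approximately equal" language above.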
- In an embodiment, the at least one processor when executing is configured to require the sample purity is greater than about 80%, e.g., between about 90% and 100%, e.g., between about 90% and 95%, or between about 95% and 100%, when determining the zygosity.
- In an embodiment, the at least one processor when executing is configured to process-match control values using values obtained where the control is a sample of euploid (e.g., diploid) tissue from a subject other than the subject from which the tumor sample is from, or a sample of mixed euploid (e.g., diploid) tissues from one or more (e.g., at least 2, 3, 4, or 5) subjects other than the subject from which the tumor sample is from.
- In an embodiment, the at least one processor when executing is configured to sequence each of the selected subgenomic intervals and each of the selected germline SNPs, e.g., by next generation sequencing (NGS).
- In an embodiment, the at least one processor when executing is configured to determine that the sequence coverage prior to normalization is at least about 10×, 20×, 30×, 50×, 100×, 250×, 500×, 750×, or 1000× the depth of the sequencing.
- In an embodiment, the subject has received an anti-cancer therapy.
- In an embodiment, the subject has received an anti-cancer therapy and is resistant to the therapy or exhibits disease progression.
- In an embodiment, the subject has received an anti-cancer therapy which is selected from: a therapeutic agent that has been approved by the FDA, EMEA, or other regulatory agency; or a therapeutic agent that has not been approved by the FDA, EMEA, or other regulatory agency.
- In an embodiment, the subject has received an anti-cancer therapy in the course of a clinical trial, e.g., a Phase I, Phase II, or Phase III clinical trial (or in an ex-US equivalent of such a trial).
- In an embodiment, the variant is positively associated with the type of tumor present in the subject, e.g., with occurrence of, or resistance to treatment.
- In an embodiment, the variant is not positively associated with the type of tumor present in the subject.
- In an embodiment, the variant is positively associated with a tumor other than the type of tumor present in the subject.
- In an embodiment, the variant is a variant that is not positively associated with the type of tumor present in the subject.
- In an embodiment, the system is configured to memorialize, e.g., in a database, e.g., a machine readable database, provide a report containing, or transmit, a descriptor for one or more of: the presence, absence, or frequency, of other mutations in the tumor, e.g., other mutations associated with the tumor type in the sample, other mutations not associated with the tumor type in the sample, or other mutations associated with a tumor other than the tumor type in the sample; the characterization of the variant; the allele or gene; or the tumor type, e.g., the name of the type of tumor, whether the tumor is primary or secondary; a subject characteristic; or therapeutic alternatives, recommendations, or choices.
- In an embodiment, a descriptor relating to the characterization of the variant comprises a descriptor for zygosity or germline vs. somatic status.
- In an embodiment, a descriptor relating to a subject characteristic comprises a descriptor for one or more of: the subject's identity; one or more of the subject's age, gender, weight, occupation, or other similar characteristic; the subject's medical history, e.g., occurrence of the tumor or of other disorders; the subject's family medical history, e.g., relatives who share or do not share the variant; or the subject's prior treatment history, e.g., the treatment received, response to a previously administered anti-cancer therapy, e.g., disease resistance, responsiveness, or progression.
- In an embodiment, the system is in communication with a system that provides one or more of: sequencing data, e.g., raw sequencing data; or sequence analysis.
- In an embodiment, the system can further provide one or more of: sequencing data, e.g., raw sequencing data; or sequence analysis.
- In an embodiment, the at least one processor when executing is configured to generate a user interface.
- In an embodiment, the user interface is configured to accept as input any one or more of:
-
- a sequence coverage input (SCI), which comprises, for each of a plurality of selected subgenomic intervals, e.g., exons, a value for sequence coverage at the selected subgenomic intervals (including, e.g., a normalized sequence coverage value);
- an SNP allele frequency input (SAFI), which comprises, for each of a plurality of selected germline SNPs, a value for the allele frequency, in the tumor sample;
- a variant allele frequency input (VAFI), which comprises the allele frequency for said variant, e.g., mutation, in the tumor sample;
- a genomic segment total copy number (C), for each of a plurality of genomic segments;
- a genomic segment minor allele copy number (M), for each of a plurality of genomic segments; and
- sample purity (p).
- In an embodiment, responsive to the user interface input, e.g., for one or more (e.g., 2, 3, 4, 5 or all) of SCI, SAFI, VAFI, C, M, or p, the system generates a characterization model, e.g., a characterization model for a variant as described herein.
- In an embodiment, the user interface is configured to display subgenomic intervals or a value calculated therefrom.
- In an embodiment, the user interface is configured to accept user input selecting a plurality of subgenomic intervals on which to evaluate the tumor sample from the subject.
- In an embodiment, the user interface is configured to display germline SNPs for the tumor sample.
- In an embodiment, the user interface is configured to accept user input selecting a plurality of germline SNPs on which to evaluate the tumor sample.
- In an embodiment, the user interface is configured to accept user defined confidence level for calculated values (e.g., calculated value described above).
- In an embodiment, the user interface is configured to accept user input to define a boundary for a genomic segment.
- In an embodiment, the user interface is configured to display a system generated genomic segment boundary for acceptance or modification by a user.
- In another aspect, the disclosure features a method of characterizing a variant, e.g., a mutation, in a tissue or sample, e.g., a tumor, or tumor sample, from a subject, e.g., a human, e.g., a cancer patient, comprising:
-
- a) acquiring:
- i) a sequence coverage input (SCI), which comprises, for each of a plurality of selected subgenomic intervals, e.g., exons, a value for normalized sequence coverage at the selected subgenomic intervals;
- ii) an SNP allele frequency input (SAFI), which comprises, for each of a plurality of selected germline SNPs, a value for the allele frequency, in the tumor or sample, e.g., tumor sample;
- iii) a variant allele frequency input (VAFI), which comprises the allele frequency for said variant, e.g., mutation, in the tumor or sample, e.g., tumor sample;
- b) acquiring values, as a function of SCI and SAFI, for:
- C, for each of a plurality of genomic segments, wherein C is a genomic segment total copy number;
- M, for each of a plurality of genomic segments, wherein M is a genomic segment minor allele copy number; and
- p, wherein p is sample purity; and
- c) acquiring one or both of:
- i) a value for variant type, e.g. mutation type, e.g., g, which is indicative of the variant, e.g., a mutation, being somatic, a subclonal somatic variant, germline, or not-distinguishable, and is a function of VAFI, p, C, and M;
- ii) an indication of the zygosity of the variant, e.g., mutation, in the tumor or sample, e.g., tumor sample, as a function of C and M.
- In an embodiment, the analysis can be performed without the need for analyzing non-tumor tissue from the subject.
- In an embodiment, the analysis is performed without analyzing non-tumor tissue from the subject, e.g., non-tumor tissue from the same subject is not sequenced.
- In an embodiment, the SCI comprises values that are a function, e.g., the log of the ratio, of the number of reads for a subgenomic interval, e.g., from the sample, and the number of reads for a control, e.g., a process-matched control.
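- A minimal sketch of such a coverage value for one subgenomic interval follows. Base 2 is used because the section elsewhere refers to log2 r; the function name is hypothetical, and the read counts are assumed to have already been scaled to comparable total sequencing depth.

```python
import math


def log_ratio(sample_reads, control_reads):
    """log2 of sample read count over control read count for one interval.

    Assumes both counts are positive and already depth-normalized.
    """
    if sample_reads <= 0 or control_reads <= 0:
        raise ValueError("read counts must be positive")
    return math.log2(sample_reads / control_reads)


gain = log_ratio(200, 100)  # twice the control coverage gives 1.0
loss = log_ratio(50, 100)   # half the control coverage gives -1.0
```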
- In an embodiment, the SCI comprises values, e.g., log r values, for at least 10, 25, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 1,000, 2,000, 3,000, 4,000, 5,000, 6,000, 7,000, 8,000, 9,000, or 10,000, subgenomic intervals, e.g., exons.
- In an embodiment, the SCI comprises values, e.g., log r values, for at least 100 subgenomic intervals, e.g., exons.
- In an embodiment, the SCI comprises values, e.g., log r values, for 1,000 to 10,000, 2,000 to 9,000, 3,000 to 8,000, 3,000 to 7,000, 3,000 to 6,000, or 4,000 to 5,000, subgenomic intervals, e.g., exons.
- In an embodiment, the SCI comprises values, e.g., log r values, for subgenomic intervals, e.g., exons, from at least 10, 25, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 1,000, 2,000, 3,000, or 4,000, genes.
- In an embodiment, at least one, a plurality, or substantially all of the values comprised in the SCI are corrected for correlation with GC content.
- In an embodiment, a subgenomic interval, e.g., an exon, from the sample has at least 10, 20, 30, 40, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 600, 700, 800, 900, or 1,000 reads.
- In an embodiment, a plurality, e.g., at least 10, 25, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 1,000, 2,000, 3,000, 4,000, 5,000, 6,000, 7,000, 8,000, 9,000, or 10,000, subgenomic intervals, e.g., exons, from the sample has a predetermined number of reads.
- In an embodiment, the predetermined number of reads is at least 10, 20, 30, 40, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 600, 700, 800, 900, or 1,000.
- In an embodiment, the plurality of germline SNPs comprise at least 10, 25, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 1,000, 2,000, 3,000, 4,000, 5,000, 6,000, 7,000, 8,000, 9,000, 10,000, or 15,000 germline SNPs.
- In an embodiment, the plurality of germline SNPs comprise at least 100 germline SNPs.
- In an embodiment, the plurality of germline SNPs comprises 500 to 5,000, 1,000 to 4,000, or 2,000 to 3,000 germline SNPs.
- In an embodiment, the allele frequency is a minor allele frequency.
- In an embodiment, the allele frequency is an alternative allele frequency, e.g., the frequency of an allele other than a standard allele in a human genome reference database.
- In an embodiment, the method comprises characterizing a plurality of variants, e.g., mutants, in the tumor sample.
- In an embodiment, the method comprises characterizing at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 25, 50, 100, 150, 200, 250, 300, 350, 400, 450, or 500 variants, e.g., mutants.
- In an embodiment, the method comprises characterizing variants, e.g., mutants, in at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 25, 50, 100, 150, 200, 250, 300, 350, 400, 450, or 500 different genes.
- In an embodiment, the method comprises acquiring a VAFI for at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 25, 50, 100, 150, 200, 250, 300, 350, 400, 450, or 500 variants, e.g., mutants.
- In an embodiment, the method comprises performing one, two, or all of steps a), b), and c) for at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 25, 50, 100, 150, 200, 250, 300, 350, 400, 450, or 500 variants, e.g., mutants.
- In an embodiment, values of C, M, and p are, have been, or can be obtained by fitting a genome-wide copy number model to one or both of the SCI and the SAFI.
- In an embodiment, values of C, M, and p fit a plurality of genome-wide copy number model inputs of the SCI and the SAFI.
- In an embodiment, a genomic segment comprises a plurality of subgenomic intervals, e.g., exons, e.g., subgenomic intervals which have been assigned a SCI value.
- In an embodiment, a genomic segment comprises at least 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 125, 150, 175, 200, 225, 250, 275, 300, 400, or 500 subgenomic intervals, e.g., exons.
- In an embodiment, a genomic segment comprises 10 to 1,000, 20 to 900, 30 to 700, 40 to 600, 50 to 500, 60 to 400, 70 to 300, 80 to 200, 80 to 150, 80 to 120, 90 to 110, or about 100 subgenomic intervals, e.g., exons.
- In an embodiment, a genomic segment comprises between 100 and 10,000, 100 and 5,000, 100 and 4,000, 100 and 3,000, 100 and 2,000, or 100 and 1,000, subgenomic intervals, e.g., exons.
- In an embodiment, a genomic segment comprises 10 to 1,000, 20 to 900, 30 to 700, 40 to 600, 50 to 500, 60 to 400, 70 to 300, 80 to 200, 80 to 150, 80 to 120, 90 to 110, or about 100 genomic SNPs, which have been assigned a SAFI value.
- In an embodiment, a genomic segment comprises between 100 and 10,000, 100 and 5,000, 100 and 4,000, 100 and 3,000, 100 and 2,000, or 100 and 1,000, genomic SNPs which have been assigned a SAFI value.
- In an embodiment, each of a plurality of genomic segments are characterized by having one or both of:
-
- a measure of normalized sequence coverage, e.g., log r, that differ by no more than a preselected amount, e.g., the values for log2 r for subgenomic intervals, e.g., exons, within the boundaries of the genomic segment differ by no more than a reference value, or are substantially constant; and
- SNP allele frequencies for germline SNPs that differ by no more than a preselected amount, e.g., the values for germline SNP allele frequencies for subgenomic intervals, e.g., exons, within the boundaries of the genomic segment differ by no more than a reference value, or are substantially constant.
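- The two segment criteria above can be sketched as a spread check: within a segment, the log ratios and the germline SNP allele frequencies should each vary by no more than a reference value. The tolerances and function names below are hypothetical, and this sketch requires both criteria although the text allows one or both.

```python
def is_homogeneous(values, max_spread):
    """True when the values differ from one another by no more than max_spread."""
    if not values:
        raise ValueError("need at least one value")
    return max(values) - min(values) <= max_spread


def segment_is_valid(log_ratios, snp_afs, max_lr_spread=0.2, max_af_spread=0.1):
    """Check both coverage and SNP allele frequency homogeneity for a segment.

    The default tolerances are illustrative assumptions, not prescribed values.
    """
    return is_homogeneous(log_ratios, max_lr_spread) and is_homogeneous(
        snp_afs, max_af_spread
    )


steady = segment_is_valid([0.48, 0.52, 0.55], [0.33, 0.35, 0.36])
mixed = segment_is_valid([0.05, 0.50, 0.95], [0.33, 0.35, 0.36])
```

A segment that fails the check would be a candidate for splitting at a new boundary, for example by a segmentation method such as CBS.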
- In an embodiment, the number of subgenomic intervals, e.g., exons, that are contained in, or are combined to form, a genomic segment is at least 2, 5, 10, 15, 20, 50, or 100 times the number of genomic segments.
- In an embodiment, the number of subgenomic intervals, e.g., exons, is at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 times the number of genomic segments.
- In an embodiment, a boundary for a genomic segment is provided.
- In an embodiment, the method comprises assembling sequences for subgenomic intervals, e.g., exons, into genomic segments.
- In an embodiment, the method comprises assembling sequences for subgenomic intervals, with a method described herein, e.g., a method comprising a circular binary segmentation (CBS), an HMM based method, a Wavelet based method, or a Cluster along Chromosomes method.
- In an embodiment, fitting the genome-wide copy number model to the SCI comprises using the equation of:
-
- where ψ is tumor ploidy.
- In an embodiment, ψ = (Σ_i l_i C_i)/(Σ_i l_i), where l_i is the length of a genomic segment.
- In an embodiment, fitting the genome-wide copy number model to the SAFI comprises using the equation of:
-
- where AF is allele frequency.
- In an embodiment, the fitting comprises using Gibbs sampling.
- In an embodiment, fitting comprises using, e.g., a Markov chain Monte Carlo (MCMC) algorithm, e.g., ASCAT (Allele-Specific Copy Number Analysis of Tumors), OncoSNP, or PICNIC (Predicting Integral Copy Numbers In Cancer).
- In an embodiment, fitting comprises using Metropolis-Hastings MCMC.
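- To illustrate Metropolis-Hastings sampling in this setting, the sketch below draws posterior samples of purity p from germline SNP allele frequencies, with C and M per segment treated as known. It is a simplification under stated assumptions: the expected-frequency model, the Gaussian noise with standard deviation `sigma`, the uniform prior on (0, 1), the random-walk proposal, and all names are illustrative, and a full fit would also sample C and M.

```python
import math
import random


def expected_af(p, C, M):
    """Assumed mixture model for a heterozygous germline SNP."""
    return (p * M + (1 - p)) / (p * C + 2 * (1 - p))


def log_likelihood(p, observations, sigma):
    """Gaussian log-likelihood (up to a constant) of the observed frequencies."""
    sse = sum((af - expected_af(p, C, M)) ** 2 for af, C, M in observations)
    return -sse / (2 * sigma ** 2)


def sample_purity(observations, n_steps=5000, step=0.05, sigma=0.02, seed=0):
    """Random-walk Metropolis-Hastings over p with a uniform prior on (0, 1)."""
    rng = random.Random(seed)
    p = 0.5
    current = log_likelihood(p, observations, sigma)
    samples = []
    for _ in range(n_steps):
        proposal = p + rng.gauss(0.0, step)  # symmetric proposal
        if 0.0 < proposal < 1.0:  # prior density is zero outside (0, 1)
            candidate = log_likelihood(proposal, observations, sigma)
            # Accept with probability min(1, likelihood ratio).
            if rng.random() < math.exp(min(0.0, candidate - current)):
                p, current = proposal, candidate
        samples.append(p)  # a rejected proposal repeats the current state
    return samples


# Synthetic, noise-free observations generated at a purity of 0.4.
observations = [(expected_af(0.4, C, M), C, M) for C, M in [(2, 0), (3, 1), (1, 0)]]
samples = sample_purity(observations)
kept = samples[1000:]  # discard burn-in
posterior_mean = sum(kept) / len(kept)
```

Because the proposal is symmetric, the acceptance ratio reduces to the likelihood ratio; Gibbs sampling would instead draw each parameter from its full conditional distribution.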
- In an embodiment, fitting comprises using a non-Bayesian approach, e.g., a frequentist approach, e.g., using least squares fitting.
- In an embodiment, g is determined by determining the fit of values for VAFI, p, C, and M to a model for somatic/germline status.
- In an embodiment, the method comprises acquiring an indication of heterozygosity for said variant, e.g., mutation.
- In an embodiment, sample purity (p) is global purity, e.g., is the same for all genomic segments.
- In an embodiment, the value of g is acquired by:
-
- where AF is allele frequency.
- In an embodiment, a value of g that is close to 0, e.g., does not differ significantly from 0, indicates the variant is a somatic variant.
- In an embodiment, a value of g that is 0, or close to 0, e.g., within a predetermined distance from 0, e.g., a value of g of less than 0.4, indicates the variant is a somatic variant.
- In an embodiment, a value of g that is close to 1, e.g., does not differ significantly from 1, indicates the variant is a germline variant.
- In an embodiment, a value of g that is 1, or close to 1, e.g., within a predetermined distance from 1, e.g., a value of g of more than 0.6, indicates the variant is a germline variant.
- In an embodiment, a value of g that is less than 1 but more than 0, e.g., less than 1 by a predetermined amount and more than 0 by a predetermined amount, e.g., between 0.4 and 0.6, indicates an indistinguishable result.
- In an embodiment, a value of g that is significantly less than 0, is indicative of a subclonal somatic variant.
- In an embodiment, the value of g is acquired by:
-
- where AF is allele frequency, and M′=C−M (e.g., when M is a non-minor allele frequency), e.g., the variant is a germline polymorphism if g=1 and the variant is a somatic mutation if g=0.
- In an embodiment, the somatic/germline status is determined, e.g., when the sample purity is below about 40%, e.g., between about 10% and 30%, e.g., between about 10% and 20%, or between about 20% and 30%.
- In an embodiment, when:
-
- a value of M equal to 0 and not equal to C (M=0, C≠0) is indicative of absence of the variant, e.g., mutation, e.g., not existent in the tumor;
- a non-zero value of M equal to C (M=C≠0) is indicative of homozygosity of the variant, e.g., mutation, e.g., with loss of heterozygosity (LOH);
- a value of M equal to 0 and equal to C (M=C=0) indicates a homozygous deletion of the variant, e.g., mutation, e.g., not existent in the tumor; and
- a non-zero value of M not equal to C is indicative of heterozygosity of the variant, e.g., mutation.
- In an embodiment, the method comprises acquiring an indication of zygosity for said variant, e.g., mutation.
- In an embodiment, the mutation status is determined as homozygous (e.g., LOH) if M=C≠0.
- In an embodiment, the mutation status is determined as homozygous deletion if M=C=0.
- In an embodiment, the mutation status is determined as heterozygous if 0<M<C.
- In an embodiment, the mutation is absent from the tumor if M=0 and C≠0.
- In an embodiment, the zygosity is determined, e.g., when the sample purity is greater than about 80%, e.g., between about 90% and 100%, e.g., between about 90% and 95%, or between about 95% and 100%.
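- The purity ranges given for the two determinations can be sketched as a simple gate on which analyses to report. The cutoffs of about 40% and about 80% follow the examples in this section; treating them as hard limits, and the function name, are assumptions for illustration.

```python
def supported_analyses(purity, germline_max=0.40, zygosity_min=0.80):
    """List the determinations suggested for a given sample purity.

    Somatic/germline status is exemplified at purity below about 40%,
    zygosity at purity above about 80%; both cutoffs are adjustable.
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    analyses = []
    if purity < germline_max:
        analyses.append("somatic/germline status")
    if purity > zygosity_min:
        analyses.append("zygosity")
    return analyses


low_purity = supported_analyses(0.20)
high_purity = supported_analyses(0.95)
mid_purity = supported_analyses(0.60)
```

A softer variant could return a confidence weight per analysis rather than a yes/no list.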
- In an embodiment, the control is a sample of euploid (e.g., diploid) tissue from a subject other than the subject from which the tumor sample is from, or a sample of mixed euploid (e.g., diploid) tissues from one or more (e.g., at least 2, 3, 4, or 5) subjects other than the subject from which the tumor sample is from.
- In an embodiment, the method comprises sequencing each of the selected subgenomic intervals and each of the selected germline SNPs, e.g., by next generation sequencing (NGS).
- In an embodiment, the sequence coverage prior to normalization is at least about 10×, 20×, 30×, 50×, 100×, 250×, 500×, 750×, or 1000× the depth of the sequencing.
- In an embodiment, the subject has received an anti-cancer therapy.
- In an embodiment, the subject has received an anti-cancer therapy and is resistant to the therapy or exhibits disease progression.
- In an embodiment, the subject has received an anti-cancer therapy which is selected from: a therapeutic agent that has been approved by the FDA, EMEA, or other regulatory agency; or a therapeutic agent that has not been approved by the FDA, EMEA, or other regulatory agency.
- In an embodiment, the subject has received an anti-cancer therapy in the course of a clinical trial, e.g., a Phase I, Phase II, or Phase III clinical trial (or in an ex-US equivalent of such a trial).
- In an embodiment, the variant is positively associated with the type of tumor present in the subject, e.g., with occurrence of, or resistance to treatment.
- In an embodiment, the variant is not positively associated with the type of tumor present in the subject.
- In an embodiment, the variant is positively associated with a tumor other than the type of tumor present in the subject.
- In an embodiment, the variant is a variant that is not positively associated with the type of tumor present in the subject.
- In an embodiment, the method can memorialize, e.g., in a database, e.g., a machine readable database, provide a report containing, or transmit, a descriptor for one or more of: the presence, absence, or frequency, of other mutations in the tumor, e.g., other mutations associated with the tumor type in the sample, other mutations not associated with the tumor type in the sample, or other mutations associated with a tumor other than the tumor type in the sample; the characterization of the variant; the allele or gene; or the tumor type, e.g., the name of the type of tumor, whether the tumor is primary or secondary; a subject characteristic; or therapeutic alternatives, recommendations, or choices.
- In an embodiment a descriptor relating to the characterization of the variant comprises a descriptor for zygosity or germline vs somatic status.
- In an embodiment a descriptor relating to a subject characteristic comprises a descriptor for one or more of: the subject's identity; one or more of the subject's age, gender, weight, occupation, or other similar characteristic; the subject's medical history, e.g., occurrence of the tumor or of other disorders; the subject's family medical history, e.g., relatives who share or do not share the variant; or the subject's prior treatment history, e.g., the treatment received, response to a previously administered anti-cancer therapy, e.g., disease resistance, responsiveness, or progression.
- Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, suitable methods and materials are described below. All publications, patent applications, patents, and other references mentioned herein are incorporated by reference in their entirety. In addition, the materials, methods, and examples are illustrative only and not intended to be limiting.
- Other features and advantages of the invention will be apparent from the detailed description, drawings, and from the claims.
- The drawings are first described.
- FIG. 1 depicts an exemplary CGH-like log-ratio profile of sample to acquire Input SCI. The region that encompasses BRCA1 gene is circled.
- FIG. 2 depicts an exemplary germline SNP allele frequency profile of sample to acquire Input SAFI. The region that encompasses BRCA1 gene is circled.
- FIG. 3 is a process flow chart for determining a characterization model for a tumor sample according to one embodiment.
- FIG. 4 shows an exemplary block diagram of a general-purpose computer system 400 which can be specially configured to practice various aspects of the present disclosure discussed herein.
- FIG. 5 depicts a storage device.
- FIG. 6 depicts a networked computer system.
- FIG. 7 provides a Table of expected allele frequencies showing that the ability to distinguish somatic variants versus germline polymorphisms, and the ability to determine zygosity status are dependent upon sample purity.
- FIG. 8 depicts a subset of the Table shown in FIG. 7 with the LOH status indicated.
- FIG. 9 depicts a CGH-like log-ratio profile of sample for determination of somatic/germline status and zygosity for PIK3CA H1047R variant.
- FIG. 10 depicts a CGH-like log-ratio profile of sample for determination of somatic/germline status and zygosity for TP53 G356R variant.
- FIG. 11 depicts an exemplary CGH-like log-ratio profile of sample.
- Certain terms are first defined. Additional terms are defined throughout the specification.
- As used herein, the articles “a” and “an” refer to one or to more than one (e.g., to at least one) of the grammatical object of the article.
- “About” and “approximately” shall generally mean an acceptable degree of error for the quantity measured given the nature or precision of the measurements. Exemplary degrees of error are within 20 percent (%), typically, within 10%, and more typically, within 5% of a given value or range of values.
- “Acquire” or “acquiring” as the terms are used herein, refer to obtaining possession of a physical entity, or a value, e.g., a numerical value, by one or more or all of: “directly acquiring,” “indirectly acquiring” the physical entity or value, or in the case of a value, “acquiring by calculation.”
- “Directly acquiring” means performing a process (e.g., performing a synthetic or analytical method) to obtain the physical entity or value. Directly acquiring a physical entity includes performing a process that includes a physical change in a physical substance, e.g., a starting material. Exemplary changes include making a physical entity from two or more starting materials, shearing or fragmenting a substance, separating or purifying a substance, combining two or more separate entities into a mixture, performing a chemical reaction that includes breaking or forming a covalent or non-covalent bond. Directly acquiring a value includes performing a process that includes a physical change in a sample or another substance, e.g., performing an analytical process which includes a physical change in a substance, e.g., a sample, analyte, or reagent (sometimes referred to herein as “physical analysis”), performing an analytical method, e.g., a method which includes one or more of the following: separating or purifying a substance, e.g., an analyte, or a fragment or other derivative thereof, from another substance; combining an analyte, or fragment or other derivative thereof, with another substance, e.g., a buffer, solvent, or reactant; or changing the structure of an analyte, or a fragment or other derivative thereof, e.g., by breaking or forming a covalent or non-covalent bond, between a first and a second atom of the analyte; or by changing the structure of a reagent, or a fragment or other derivative thereof, e.g., by breaking or forming a covalent or non-covalent bond, between a first and a second atom of the reagent.
- “Indirectly acquiring” refers to receiving the physical entity or value from another party or source (e.g., a third party laboratory that directly acquired the physical entity or value). E.g., a first party may acquire a value from a second party (indirectly acquiring) which said second party directly acquired or acquired by calculation.
- “Acquiring by calculation” refers to acquiring a value by calculation or computation, e.g., as performed on a machine, e.g., a computer.
- “Acquiring a sample” as the term is used herein, refers to obtaining possession of a sample, e.g., a tissue sample or nucleic acid sample, by “directly acquiring” or “indirectly acquiring” the sample. “Directly acquiring a sample” means performing a process (e.g., performing a physical method such as a surgery or extraction) to obtain the sample. “Indirectly acquiring a sample” refers to receiving the sample from another party or source (e.g., a third party laboratory that directly acquired the sample). Directly acquiring a sample includes performing a process that includes a physical change in a physical substance, e.g., a starting material, such as a tissue, e.g., a tissue in a human patient or a tissue that was previously isolated from a patient. Exemplary changes include making a physical entity from a starting material, dissecting or scraping a tissue; separating or purifying a substance (e.g., a sample tissue or a nucleic acid sample); combining two or more separate entities into a mixture; performing a chemical reaction that includes breaking or forming a covalent or non-covalent bond. Directly acquiring a sample includes performing a process that includes a physical change in a sample or another substance, e.g., as described above. Methods described herein can include acquiring the tumor sample.
- “Next-generation sequencing or NGS or NG sequencing” as used herein, refers to any sequencing method that determines the nucleotide sequence of either individual nucleic acid molecules (e.g., in single molecule sequencing) or clonally expanded proxies for individual nucleic acid molecules in a high throughput fashion (e.g., greater than 10³, 10⁴, 10⁵ or more molecules are sequenced simultaneously). In one embodiment, the relative abundance of the nucleic acid species in the library can be estimated by counting the relative number of occurrences of their cognate sequences in the data generated by the sequencing experiment. Next generation sequencing methods are known in the art, and are described, e.g., in Metzker, M. (2010) Nature Reviews Genetics 11:31-46, incorporated herein by reference. Next generation sequencing can detect a variant present in less than 5% of the nucleic acids in a sample. Methods described herein can use NGS methods.
- “Nucleotide value” as referred to herein, represents the identity of the nucleotide(s) occupying or assigned to a preselected nucleotide position. Typical nucleotide values include: missing (e.g., deleted); additional (e.g., an insertion of one or more nucleotides, the identity of which may or may not be included); or present (occupied); A; T; C; or G. Other values can be, e.g., not Y, wherein Y is A, T, G, or C; A or X, wherein X is one or two of T, G, or C; T or X, wherein X is one or two of A, G, or C; G or X, wherein X is one or two of T, A, or C; C or X, wherein X is one or two of T, G, or A; a pyrimidine nucleotide; or a purine nucleotide. A nucleotide value can be a frequency for one or more, e.g., 2, 3, or 4, bases (or other value described herein, e.g., missing or additional) at a nucleotide position. E.g., a nucleotide value can comprise a frequency for A, and a frequency for G, at a nucleotide position.
- “Or” is used herein to mean, and is used interchangeably with, the term “and/or”, unless context clearly indicates otherwise. The use of the term “and/or” in some places herein does not mean that uses of the term “or” are not interchangeable with the term “and/or” unless the context clearly indicates otherwise.
- “Sample,” “tumor sample,” “cancer sample,” “tissue sample,” “patient sample,” “patient cell or tissue sample” or “specimen” each refers to a collection of cells obtained from a subject or patient, e.g., from a tissue, or circulating cells, of a subject or patient. The source of the tissue sample can be solid tissue as from a fresh, frozen and/or preserved organ, tissue sample, biopsy, or aspirate; blood or any blood constituents; bodily fluids such as cerebral spinal fluid, amniotic fluid, peritoneal fluid or interstitial fluid; or cells from any time in gestation or development of the subject. The tissue sample can contain compounds that are not naturally intermixed with the tissue in nature such as preservatives, anticoagulants, buffers, fixatives, nutrients, antibiotics or the like. In one embodiment, the sample is preserved as a frozen sample or as formaldehyde- or paraformaldehyde-fixed paraffin-embedded (FFPE) tissue preparation. For example, the sample can be embedded in a matrix, e.g., an FFPE block or a frozen sample. Typically, the sample is a tumor sample, e.g., includes one or more premalignant or malignant cells. In certain embodiments, the sample, e.g., the tumor sample, is acquired from a solid tumor, a soft tissue tumor or a metastatic lesion. In other embodiments, the sample, e.g., the tumor sample, includes tissue or cells from a surgical margin. In an embodiment, the sample, e.g., tumor sample, includes one or more circulating tumor cells (CTC) (e.g., a CTC acquired from a blood sample). The sample can be histologically normal tissue. In one embodiment, the method further includes acquiring a sample, e.g., a tumor sample as described herein. The sample can be acquired directly or indirectly.
- “Sequencing” a nucleic acid molecule requires determining the identity of at least one nucleotide in the molecule. In embodiments the identity of less than all of the nucleotides in a molecule is determined. In other embodiments, the identity of a majority or all of the nucleotides in the molecule is determined.
- “Subgenomic interval” as referred to herein, refers to a portion of genomic sequence. In an embodiment a subgenomic interval can be a single nucleotide position, e.g., a nucleotide position variants of which are associated (positively or negatively) with a tumor phenotype. In an embodiment a subgenomic interval comprises more than one nucleotide position. Such embodiments include sequences of at least 2, 5, 10, 50, 100, 150, or 250 nucleotide positions in length. Subgenomic intervals can comprise an entire gene, or a preselected portion thereof, e.g., the coding region (or portions thereof), a preselected intron (or portion thereof) or exon (or portion thereof). Typically a subgenomic interval will include or be an exon. A subgenomic interval can comprise all or a part of a fragment of a naturally occurring, e.g., genomic, nucleic acid. E.g., a subgenomic interval can correspond to a fragment of genomic DNA which is subjected to a sequencing reaction. In embodiments a subgenomic interval is a continuous sequence from a genomic source. In embodiments a subgenomic interval includes sequences that are not contiguous in the genome, e.g., it can include junctions found at exon-exon junctions in cDNA.
- In an embodiment, a subgenomic interval comprises or consists of: a single nucleotide position; an intragenic region or an intergenic region; an exon or an intron, or a fragment thereof, typically an exon sequence or a fragment thereof; a coding region or a non-coding region, e.g., a promoter, an enhancer, a 5′ untranslated region (5′ UTR), or a 3′ untranslated region (3′ UTR), or a fragment thereof; a cDNA or a fragment thereof; a polymorphism; an SNP; a somatic mutation, a germ line mutation or both; an alteration, e.g., a point or a single mutation; a deletion mutation (e.g., an in-frame deletion, an intragenic deletion, a full gene deletion); an insertion mutation (e.g., intragenic insertion); an inversion mutation (e.g., an intra-chromosomal inversion); a linking mutation; a linked insertion mutation; an inverted duplication mutation; a tandem duplication (e.g., an intrachromosomal tandem duplication); a translocation (e.g., a chromosomal translocation, a non-reciprocal translocation); a rearrangement (e.g., a genomic rearrangement (e.g., a rearrangement of one or more introns, or a fragment thereof; a rearranged intron can include a 5′- and/or 3′-UTR)); a change in gene copy number; a change in gene expression; a change in RNA levels, or a combination thereof. The “copy number of a gene” refers to the number of DNA sequences in a cell encoding a particular gene product. Generally, a mammal has two copies of each gene. The copy number can be increased, e.g., by gene amplification or duplication, or reduced by deletion.
- “Variant,” as used herein, refers to a structure that can be present at a subgenomic interval that can have more than one structure, e.g., an allele at a polymorphic locus.
- Headings, e.g., (a), (b), (i) etc, are presented merely for ease of reading the specification and claims. The use of headings in the specification or claims does not require the steps or elements be performed in alphabetical or numerical order or the order in which they are presented.
- Sequence Coverage Input (SCI)
- Input SCI is a measure of normalized sequence coverage at each of a plurality of selected subgenomic intervals, e.g., exons. SCI can comprise a series of values for a plurality of selected subgenomic intervals. A useful formulation of SCI is a function, e.g., the log, of a value related to the number of sequencing reads for a subgenomic interval, e.g., an exon, in the tumor sample/a value related to the number of sequencing reads for that subgenomic interval in the control. This is sometimes referred to herein as log r. A useful form for SCI is:
- log2 (the # of sequencing reads for a subgenomic interval, e.g., an exon, in the tumor sample/the # of sequencing reads for that subgenomic interval in the control).
- E.g., for a particular subgenomic interval, e.g., an exon, reads are acquired. Reads for that subgenomic interval from a control diploid cell are acquired. The log of the ratio of the former to the latter is acquired. This is repeated for each of a plurality of subgenomic intervals. The resulting series of log r values can be used as SCI.
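- The log r calculation described above can be sketched as follows. This is a minimal illustration, not the method of any particular embodiment: the function name `log_ratio_profile`, the scaling of each count by total read depth, and the `pseudocount` parameter (used to avoid taking the log of zero) are all assumptions introduced here.

```python
import math


def log_ratio_profile(tumor_reads, control_reads, pseudocount=0.5):
    """Return log2(tumor / control) for each subgenomic interval.

    tumor_reads and control_reads map an interval name (e.g., an exon)
    to a read count. Depth scaling and the pseudocount are assumptions.
    """
    if tumor_reads.keys() != control_reads.keys():
        raise ValueError("tumor and control must cover the same intervals")
    tumor_total = sum(tumor_reads.values())
    control_total = sum(control_reads.values())
    profile = {}
    for interval, tumor_count in tumor_reads.items():
        control_count = control_reads[interval]
        # Scale by total depth so that library size does not bias the ratio.
        tumor_norm = (tumor_count + pseudocount) / tumor_total
        control_norm = (control_count + pseudocount) / control_total
        profile[interval] = math.log2(tumor_norm / control_norm)
    return profile


if __name__ == "__main__":
    tumor = {"exon1": 100, "exon2": 200, "exon3": 100}
    control = {"exon1": 100, "exon2": 100, "exon3": 100}
    for name, value in log_ratio_profile(tumor, control).items():
        print(name, round(value, 3))
```

The resulting dictionary is one possible representation of the series of log r values used as SCI.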
- The measure of normalized sequence coverage can also comprise adjustment for other parameters that might distort the analysis. E.g., if it were found that values for measure of normalized sequence coverage correlated with another factor, e.g., GC content, the method can include the use of an SCI that is corrected for this. In an embodiment the GC content for a plurality of the subgenomic intervals is acquired. The GC content and log r can be compared to determine if they are correlated. This can be undesirable as variations in log r should generally be independent of GC content. Then if there is a correlation, the values for log r can be adjusted, e.g., by regression analysis.
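- One way to perform the regression-based adjustment is sketched below. It assumes a simple linear relationship between GC content and log r, fitted by ordinary least squares; the text does not specify the form of the regression, and the function name `gc_correct` is hypothetical.

```python
def gc_correct(log_ratios, gc_content):
    """Remove a linear GC trend from log r values by least squares.

    log_ratios and gc_content are equal-length sequences, one entry per
    subgenomic interval. A linear trend is an assumption made here.
    """
    n = len(log_ratios)
    if n != len(gc_content) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_gc = sum(gc_content) / n
    mean_lr = sum(log_ratios) / n
    sxx = sum((g - mean_gc) ** 2 for g in gc_content)
    if sxx == 0:
        # All intervals share the same GC content: nothing to correct.
        return list(log_ratios)
    sxy = sum((g - mean_gc) * (r - mean_lr)
              for g, r in zip(gc_content, log_ratios))
    slope = sxy / sxx
    # Subtract only the GC-dependent part so the mean log r is preserved.
    return [r - slope * (g - mean_gc) for r, g in zip(log_ratios, gc_content)]
```

If log r is already independent of GC content, the fitted slope is near zero and the values are returned essentially unchanged.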
- SNP Allele Frequency Input (SAFI)
- Input SAFI comprises a measure of the allele frequency for each of a plurality of selected germline SNPs in the tumor sample. An allele frequency at a selected SNP can be acquired from reads from the sample which cover a selected SNP. In an embodiment the allele frequency is the frequency of the minor allele as portrayed in the reads. In other embodiments the allele frequency is the frequency, as portrayed in the reads, of an alternative allele. The identity of an alternative allele can be acquired from a reference database, e.g., UCSC Human Genome Browser (Meyer L. R., et al., The UCSC Genome Browser database: extensions and updates 2013. Nucleic Acids Res. 2013; 41(Database issue): D64-69), and dbSNP (Sherry S. T., et al., dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001; 29(1): 308-311).
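- A minimal sketch of acquiring minor allele frequencies from read counts follows. The SNP identifiers, the `min_depth` cutoff, and both function names are hypothetical and introduced only for illustration; the text does not specify a depth threshold.

```python
def minor_allele_frequency(ref_reads, alt_reads):
    """Frequency of the less-represented allele among reads covering a SNP."""
    depth = ref_reads + alt_reads
    if depth == 0:
        return None
    return min(ref_reads, alt_reads) / depth


def snp_allele_frequency_input(snp_counts, min_depth=50):
    """Map SNP id -> minor allele frequency, skipping poorly covered SNPs.

    snp_counts maps a SNP id to a (ref_reads, alt_reads) pair.
    min_depth is an assumed, illustrative coverage cutoff.
    """
    safi = {}
    for snp_id, (ref_reads, alt_reads) in snp_counts.items():
        if ref_reads + alt_reads < min_depth:
            continue
        safi[snp_id] = minor_allele_frequency(ref_reads, alt_reads)
    return safi


if __name__ == "__main__":
    counts = {"snp_A": (60, 40), "snp_B": (5, 5), "snp_C": (10, 190)}
    print(snp_allele_frequency_input(counts))
```

To report the alternative allele frequency instead, `alt_reads / depth` would replace the `min(...)` expression.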
- Variant Allele Frequency Input (VAFI)
- Input VAFI comprises the allele frequency for said variant, e.g., mutation, in the tissue or sample, e.g., tumor sample.
- Control
- Typically, the number of reads for each of a plurality of subgenomic intervals is normalized, e.g., to the number of reads from a control. The control need not be, and typically is not, from the subject that supplies the tumor sample. The control sample can be from an individual that does not have a tumor, or does not have a tumor of the type in the subject sample. Typically the control sample is from normal, non-disease state tissue. A control is “process-matched” with the tumor sample if they are sequenced under similar conditions. E.g., a process matched control can be one in which one or more or all of the following conditions for the treatment of the tumor sample and the control are met: they are prepared in the same way; nucleic acid for sequencing is obtained from them in the same way; they are sequenced with the same sequencing method; or they are sequenced in the same run.
- Genomic Segments
- A genomic segment comprises a subgenomic interval, e.g., an exon, and other genomic sequence, e.g., one or a plurality of other subgenomic intervals. Typically, a genomic segment will include a plurality of subgenomic intervals, e.g., exons, which are characterized by having one or both of:
- measures of normalized sequence coverage, e.g., log r, that differ by no more than a preselected amount, e.g., the values for log2 r for subgenomic intervals, e.g., exons, within the boundaries of the genomic segment differ by no more than a reference value, or are substantially constant; and
- SNP allele frequencies for germline SNPs that differ by no more than a preselected amount, e.g., the values for germline SNP allele frequencies for subgenomic intervals, e.g., exons, within the boundaries of the genomic segment differ by no more than a reference value, or are substantially constant.
- Assembly of genomic sequences into genomic segments can in some cases be viewed as a data reduction step. E.g., several thousand exons may amount to many fewer, e.g., a hundred or fewer, genomic segments. The number of subgenomic intervals, e.g., exons, that are contained in, or are combined to form, the genomic segments can be at least 2, 5, 10, 15, 20, 50 or 100 times the number of genomic segments. In embodiments the number of subgenomic intervals, e.g., exons, is at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 times the number of genomic segments.
- Genomic sequences, e.g., subgenomic intervals, e.g., exons, can be assembled into genomic segments, with a method described herein, e.g., a method comprising a circular binary segmentation (CBS) (Olshen et al. Biostatistics. 2004; 5(4): 557-572). Other methods that can be used include, but are not limited to, HMM based methods (Fridlyand et al. Journal of Multivariate Analysis 90 (2004): 132-153), Wavelet based methods (Hsu et al. Biostatistics. 2005; 6(2): 211-226), and Cluster along Chromosomes method (Wang et al. Biostatistics. 2005; 6(1): 45-58).
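- The idea of grouping adjacent subgenomic intervals whose log r values differ by no more than a preselected amount can be illustrated with a deliberately simple greedy pass. This sketch is not circular binary segmentation or any of the cited methods; the function name `segment_log_ratios` and the `max_diff` threshold are assumptions chosen for illustration.

```python
def segment_log_ratios(log_ratios, max_diff=0.2):
    """Greedily group adjacent intervals with similar log r values.

    A new segment starts when a value departs from the running mean of
    the current segment by more than max_diff (an assumed threshold).
    Returns a list of (start_index, end_index_inclusive, mean_log_ratio).
    """
    segments = []
    start = 0
    total = 0.0
    for i, value in enumerate(log_ratios):
        count = i - start
        if count > 0 and abs(value - total / count) > max_diff:
            # Close the current segment and begin a new one at interval i.
            segments.append((start, i - 1, total / count))
            start, total = i, 0.0
        total += value
    if log_ratios:
        last = len(log_ratios) - 1
        segments.append((start, last, total / (last - start + 1)))
    return segments


if __name__ == "__main__":
    exon_log_ratios = [0.0, 0.05, -0.05, 1.0, 1.1, 0.9]
    for seg in segment_log_ratios(exon_log_ratios):
        print(seg)
```

Here six exons reduce to two segments, which mirrors the data reduction described above; a production analysis would use a statistically grounded method such as CBS.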
- Statistical Model
- Genome-wide copy number, as well as copy number and LOH estimates for each chromosomal segment, can be determined by fitting a statistical model, e.g., a statistical model described herein.
- For example, the following steps can be performed:
- Let:
-
- Si be a genomic segment
- li be the length of Si
- rij be the log ratio (LR) of exon j within Si
- fik be the minor allele frequency of SNP k within Si
- Seek to estimate p—tumor purity, and Ci—the copy numbers of Si
- Jointly model rij and fik, given p and Ci;
-
-
- Mi≦Ci is number of altered alleles at Si
- σri and σfi are noise parameters
- Fit the model using standard methods, e.g., Markov chain Monte Carlo (MCMC), assigning copy numbers to all segments.
- For each genomic segment i:
-
- If Ci=Mi=0, the segment has homozygous deletion in tumor;
- If Ci=Mi≠0, the segment has LOH in tumor;
- If Ci≠Mi≠0, the segment is heterozygous in tumor.
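- The quantities used in the steps above can be sketched in code. The two expectation formulas below assume a standard tumor/normal admixture form, in which a fraction p of cells are tumor cells with copy number C and the remainder are diploid normal cells; that form, the `ploidy` argument (average tumor copy number), and all function names are assumptions for illustration rather than a statement of the model as fitted. Only `segment_state` follows rules listed explicitly above.

```python
import math


def expected_log_ratio(purity, copy_number, ploidy):
    """Assumed expected log2 coverage ratio for a segment under admixture."""
    numerator = purity * copy_number + 2 * (1 - purity)
    baseline = purity * ploidy + 2 * (1 - purity)
    if numerator == 0:
        # Pure tumor with a homozygous deletion: no reads expected.
        return float("-inf")
    return math.log2(numerator / baseline)


def expected_snp_frequency(purity, copy_number, allele_count):
    """Assumed expected germline SNP allele frequency for a segment."""
    denominator = purity * copy_number + 2 * (1 - purity)
    if denominator == 0:
        raise ValueError("no DNA expected at this segment")
    return (purity * allele_count + (1 - purity)) / denominator


def segment_state(copy_number, allele_count):
    """Apply the three rules listed above to a fitted (C, M) pair."""
    if not 0 <= allele_count <= copy_number:
        raise ValueError("require 0 <= M <= C")
    if copy_number == 0:
        return "homozygous deletion"
    if allele_count == copy_number:
        return "LOH"
    if allele_count != 0:
        return "heterozygous"
    # M = 0 with C > 0 is not covered by the rules as listed.
    return "unclassified"


if __name__ == "__main__":
    print(expected_log_ratio(0.5, 2, 2.0))     # diploid segment, diploid tumor
    print(expected_snp_frequency(0.5, 2, 1))   # balanced heterozygous segment
    print(segment_state(2, 2))
```

A fitting procedure such as MCMC would compare these expected values with the observed rij and fik for candidate values of p, Ci, and Mi.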
- For each mutation identified, use the model fit to assess differences in expected allele frequencies (AF) between germline, somatic, and subclonal somatic mutations. Statistical confidence is assessed based on read depth and local variability in allele frequency estimates.
- For example, a germline variant at segment i can have expected AF:
-
- a somatic mutation at segment i can have expected AF:
-
- and
- a subclonal somatic mutation at segment i can have expected AF:
-
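- As an illustration of how expected allele frequencies can separate germline from somatic variants, the sketch below assumes a common admixture formulation: a heterozygous germline variant contributes one copy from each normal cell, while a somatic variant is absent from normal cells. These formulas, the function names, and the nearest-expectation comparison are assumptions; the comparison ignores read depth and local variability, which a real assessment of statistical confidence would include. Subclonal somatic variants, expected below the clonal somatic value, are not modeled here.

```python
def expected_af(purity, copy_number, allele_count, germline):
    """Assumed expected allele frequency of a variant at a segment.

    purity: tumor fraction p; copy_number: C; allele_count: M.
    germline=True adds one variant copy per diploid normal cell.
    """
    denominator = purity * copy_number + 2 * (1 - purity)
    if denominator <= 0:
        raise ValueError("no DNA expected at this segment")
    normal_part = (1 - purity) if germline else 0.0
    return (purity * allele_count + normal_part) / denominator


def closest_origin(observed_af, purity, copy_number, allele_count):
    """Label a variant by whichever expectation is nearer to the observed AF."""
    germline_af = expected_af(purity, copy_number, allele_count, True)
    somatic_af = expected_af(purity, copy_number, allele_count, False)
    d_germline = abs(observed_af - germline_af)
    d_somatic = abs(observed_af - somatic_af)
    if d_germline == d_somatic:
        return "ambiguous"
    return "germline" if d_germline < d_somatic else "somatic"


if __name__ == "__main__":
    # Low purity: the two expectations are far apart.
    print(expected_af(0.2, 2, 1, True), expected_af(0.2, 2, 1, False))
    # High purity with LOH: the two expectations nearly coincide.
    print(expected_af(0.95, 2, 2, True), expected_af(0.95, 2, 2, False))
```

Under these assumptions the gap between the germline and somatic expectations shrinks as purity approaches 100%, which is why somatic status becomes harder to assess in very pure samples.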
-
FIG. 8 is an exemplary expected allele frequency table for copy numbers, given purity (p), copy number (C), and alternative allele count (M). For example, low purity (e.g., <20%) samples are relatively easier for assessing somatic status, but more difficult for assessing tumor LOH. As another example, high purity (e.g., >90%) samples are easier for assessing tumor LOH, but more difficult for assessing somatic status. Tumor samples that are well-admixed with surrounding normal tissue (e.g., many clinical cancer specimens) can be optimal. A more comprehensive table for expected allele frequencies is depicted in FIG. 7.
- Variants and SNPs
- The methods described herein can be used to characterize variants found anywhere in the genome including in exons, introns, 5′-UTRs, and inter-gene regions.
- In an embodiment, the method comprises characterizing a variant, e.g., a mutation, in a tumor suppressor gene. In another embodiment, the method comprises characterizing a variant, e.g., a mutation, in an oncogene.
- In an embodiment, the method comprises characterizing a variant, e.g., a mutation, in a gene selected from: Table 1, Table 2, or Table 3.
- In an embodiment, the method comprises acquiring an SCI for subgenomic intervals from at least five, six, seven, eight, nine, ten, fifteen, twenty, twenty-five, thirty or more genes from the sample, wherein the genes are chosen from: Table 1, Table 2, or Table 3.
- In an embodiment, the method comprises acquiring an SCI for a plurality, e.g., at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, of subgenomic intervals, e.g., exons, from a gene chosen from: Table 1, Table 2, or Table 3.
- In an embodiment, the method comprises acquiring an SAFI for a SNP from at least five, six, seven, eight, nine, ten, fifteen, twenty, twenty-five, thirty or more genes from the sample, wherein the genes or gene products are chosen from: Table 1, Table 2, or Table 3.
- TABLE 1. Exemplary Genes for Analysis: ABL1, AKT1, AKT2, AKT3, ALK, APC, AR, BRAF, CCND1, CDK4, CDKN2A, CEBPA, CTNNB1, EGFR, ERBB2, ESR1, FGFR1, FGFR2, FGFR3, FLT3, HRAS, JAK2, KIT, KRAS, MAP2K1, MAP2K2, MET, MLL, MYC, NF1, NOTCH1, NPM1, NRAS, NTRK3, PDGFRA, PIK3CA, PIK3CG, PIK3R1, PTCH1, PTCH2, PTEN, RB1, RET, SMO, STK11, SUFU, and TP53.
- TABLE 2. Exemplary Genes for Analysis: ABL2, ARAF, ARFRP1, ARID1A, ATM, ATR, AURKA, AURKB, BAP1, BCL2, BCL2A1, BCL2L1, BCL2L2, BCL6, BRCA1, BRCA2, CBL, CARD11, CBL, CCND2, CCND3, CCNE1, CD79A, CD79B, CDH1, CDH2, CDH20, CDH5, CDK6, CDK8, CDKN2B, CDKN2C, CHEK1, CHEK2, CRKL, CRLF2, DNMT3A, DOT1L, EPHA3, EPHA5, EPHA6, EPHA7, EPHB1, EPHB4, EPHB6, ERBB3, ERBB4, ERG, ETV1, ETV4, ETV5, ETV6, EWSR1, EZH2, FANCA, FBXW7, FGFR4, FLT1, FLT4, FOXP4, GATA1, GNA11, GNAQ, GNAS, GPR124, GUCY1A2, HOXA3, HSP90AA1, IDH1, IDH2, IGF1R, IGF2R, IKBKE, IKZF1, INHBA, IRS2, JAK1, JAK3, JUN, KDM6A, KDR, LRP1B, LRP6, LTK, MAP2K4, MCL1, MDM2, MDM4, MEN1, MITF, MLH1, MPL, MRE11A, MSH2, MSH6, MTOR, MUTYH, MYCL1, MYCN, NF2, NKX2-1, NTRK1, NTRK2, PAK3, PAX5, PDGFRB, PKHD1, PLCG1, PRKDC, PTPN11, PTPRD, RAF1, RARA, RICTOR, RPTOR, RUNX1, SMAD2, SMAD3, SMAD4, SMARCA4, SMARCB1, SOX10, SOX2, SRC, TBX22, TET2, TGFBR2, TMPRSS2, TNFAIP3, TNK, TNKS2, TOP1, TSC1, TSC2, USP9X, VHL, and WT1.
- TABLE 3. Exemplary Genes for Analysis: ABCB1, ABCC2, ABCC4, ABCG2, ABL1, ABL2, AKT1, AKT2, AKT3, ALK, APC, AR, ARAF, ARFRP1, ARID1A, ATM, ATR, AURKA, AURKB, BCL2, BCL2A1, BCL2L1, BCL2L2, BCL6, BRAF, BRCA1, BRCA2, C1orf144, CARD11, CBL, CCND1, CCND2, CCND3, CCNE1, CDH1, CDH2, CDH20, CDH5, CDK4, CDK6, CDK8, CDKN2A, CDKN2B, CDKN2C, CEBPA, CHEK1, CHEK2, CRKL, CRLF2, CTNNB1, CYP1B1, CYP2C19, CYP2C8, CYP2D6, CYP3A4, CYP3A5, DNMT3A, DOT1L, DPYD, EGFR, EPHA3, EPHA5, EPHA6, EPHA7, EPHB1, EPHB4, EPHB6, ERBB2, ERBB3, ERBB4, ERCC2, ERG, ESR1, ESR2, ETV1, ETV4, ETV5, ETV6, EWSR1, EZH2, FANCA, FBXW7, FCGR3A, FGFR1, FGFR2, FGFR3, FGFR4, FLT1, FLT3, FLT4, FOXP4, GATA1, GNA11, GNAQ, GNAS, GPR124, GSTP1, GUCY1A2, HOXA3, HRAS, HSP90AA1, IDH1, IDH2, IGF1R, IGF2R, IKBKE, IKZF1, INHBA, IRS2, ITPA, JAK1, JAK2, JAK3, JUN, KDR, KIT, KRAS, LRP1B, LRP2, LTK, MAN1B1, MAP2K1, MAP2K2, MAP2K4, MCL1, MDM2, MDM4, MEN1, MET, MITF, MLH1, MLL, MPL, MRE11A, MSH2, MSH6, MTHFR, MTOR, MUTYH, MYC, MYCL1, MYCN, NF1, NF2, NKX2-1, NOTCH1, NPM1, NQO1, NRAS, NRP2, NTRK1, NTRK3, PAK3, PAX5, PDGFRA, PDGFRB, PIK3CA, PIK3R1, PKHD1, PLCG1, PRKDC, PTCH1, PTEN, PTPN11, PTPRD, RAF1, RARA, RB1, RET, RICTOR, RPTOR, RUNX1, SLC19A1, SLC22A2, SLCO1B3, SMAD2, SMAD3, SMAD4, SMARCA4, SMARCB1, SMO, SOD2, SOX10, SOX2, SRC, STK11, SULT1A1, TBX22, TET2, TGFBR2, TMPRSS2, TOP1, TP53, TPMT, TSC1, TSC2, TYMS, UGT1A1, UMPS, USP9X, VHL, and WT1.
- In one embodiment, one or more of the genomic segments (e.g., SNPs) are relevant to pharmacogenetics and pharmacogenomics (PGx), e.g., drug metabolism and toxicity.
- Cancers
- The method can be used to analyze variants in subjects having cancer.
- Cancers include, but are not limited to, B cell cancer, e.g., multiple myeloma, melanomas, breast cancer, lung cancer (such as non-small cell lung carcinoma or NSCLC), bronchus cancer, colorectal cancer, prostate cancer, pancreatic cancer, stomach cancer, ovarian cancer, urinary bladder cancer, brain or central nervous system cancer, peripheral nervous system cancer, esophageal cancer, cervical cancer, uterine or endometrial cancer, cancer of the oral cavity or pharynx, liver cancer, kidney cancer, testicular cancer, biliary tract cancer, small bowel or appendix cancer, salivary gland cancer, thyroid gland cancer, adrenal gland cancer, osteosarcoma, chondrosarcoma, cancer of hematological tissues, adenocarcinomas, inflammatory myofibroblastic tumors, gastrointestinal stromal tumor (GIST), colon cancer, multiple myeloma (MM), myelodysplastic syndrome (MDS), myeloproliferative disorder (MPD), acute lymphocytic leukemia (ALL), acute myelocytic leukemia (AML), chronic myelocytic leukemia (CML), chronic lymphocytic leukemia (CLL), polycythemia vera, Hodgkin lymphoma, non-Hodgkin lymphoma (NHL), soft-tissue sarcoma, fibrosarcoma, myxosarcoma, liposarcoma, osteogenic sarcoma, chordoma, angiosarcoma, endotheliosarcoma, lymphangiosarcoma, lymphangioendotheliosarcoma, synovioma, mesothelioma, Ewing's tumor, leiomyosarcoma, rhabdomyosarcoma, squamous cell carcinoma, basal cell carcinoma, adenocarcinoma, sweat gland carcinoma, sebaceous gland carcinoma, papillary carcinoma, papillary adenocarcinomas, medullary carcinoma, bronchogenic carcinoma, renal cell carcinoma, hepatoma, bile duct carcinoma, choriocarcinoma, seminoma, embryonal carcinoma, Wilms' tumor, bladder carcinoma, epithelial carcinoma, glioma, astrocytoma, medulloblastoma, craniopharyngioma, ependymoma, pinealoma, hemangioblastoma, acoustic neuroma, oligodendroglioma, meningioma, neuroblastoma, retinoblastoma, follicular lymphoma, diffuse large B-cell lymphoma, mantle cell lymphoma, hepatocellular carcinoma, thyroid cancer, gastric cancer, head and neck cancer, small cell cancers, essential thrombocythemia, agnogenic myeloid metaplasia, hypereosinophilic syndrome, systemic mastocytosis, familial hypereosinophilia, chronic eosinophilic leukemia, neuroendocrine cancers, carcinoid tumors, and the like.
- In some embodiments, the cancer is a primary cancer, e.g., a cancer named after the part of the body where it first started to grow. In some embodiments, the cancer is a secondary cancer (or a metastasis), e.g., when cancer cells spread from the primary cancer to another part of the body (e.g., lymph nodes, lungs, liver, brain, and bones). For example, a secondary cancer can contain cancer cells that originated from the primary cancer site.
- Sample Processing, Analysis, Interpretation and Reporting
- The specimens can be processed and analyzed using NGS-based cancer assay, e.g., as described in Frampton et al. Nat Biotechnol. 31(11):1023-1031 (2013). Typically, the method includes, e.g., DNA extraction, sequencing, analysis and interpretation.
- DNA can be extracted from FFPE tumor samples. Typical sample requirements include, e.g., surface area ≧25 mm², sample volume ≧1 mm³, nucleated cellularity ≧80% or 30,000 cells, tumor content ≧20%.
- A sequencing library can be prepared using “with-bead” library construction. DNA can be fragmented by sonication and ≧50 ng of dsDNA (e.g., quantified by PicoGreen) may be required for library preparation. DNA fragments can be captured by biotinylated DNA oligonucleotides during hybridization. Sequencing can be performed, e.g., to >500× average unique coverage (e.g., with >100× at >99% exons), e.g., on a HiSeq platform (Illumina) using 49×49 paired-end sequencing.
- Various types of analysis can be performed. For example, base substitutions and short insertions/deletions can be analyzed by a Bayesian algorithm and local assembly, respectively. As another example, copy number alterations (CNAs) can be assessed by comparison with a process-matched normal control and gene fusions can be identified by analysis of chimeric read pairs. The methods described herein can be sensitive, e.g., to variants present at any mutant allele frequency. Detection of long (e.g., 1-40 bp) indel variants can be achieved using de Bruijn graph-based local assembly. CGH-like analysis of read-depth can be used for assessment of CNAs.
- The methods described herein allow for clinical interpretation without a matched normal. The reporting approach can include, e.g., removal of germline variants (e.g., from the 1000 Genomes Project (dbSNP135)) and highlighting known driver alterations (e.g., COSMIC v62) as biologically significant. A concise summary of the biomedical literature and current clinical trials can be provided for each highlighted alteration.
- Example System Environment
- According to some embodiments, specially configured computer systems can be configured to perform the analysis discussed herein, e.g., to generate characterization models of genetic variants appearing in tumor samples. The characterization models can specify, for example, a tumor type (e.g., somatic, germline, subclonal somatic, and not-distinguishable) and/or a tumor zygosity (e.g., homozygous, heterozygous, and absent) for a genetic variant based on sequencing information obtained on the sample. Various embodiments of characterization systems can be configured to operate on testing data (e.g., genetic sequencing information) provided from genetic screening systems and/or methods. In some embodiments, the characterization systems can also be configured to perform genetic testing on tumor samples directly to generate, for example, genetic sequencing information. In further embodiments, characterization models can be generated by system components that interact with system components for sequencing and/or testing tumor samples. The results generated by sequencing components can be accessed by characterization system components to generate characterization models of genetic variants.
- According to some embodiments, characterization systems can provide a user or collaborator (e.g., physicians, researchers, clinicians, and other medical personnel) access to genomic sequencing data or information on variants through user interfaces. Responsive to selection in the user interface, the system can accept definition of subgenomic intervals and/or germline single nucleotide polymorphisms (SNPs) within a tumor sample on which to provide a characterization model. In other embodiments, the characterization system can automatically define the subgenomic intervals and/or germline SNPs on which to develop classification analysis.
- According to one embodiment, a characterization system is configured to capture data on a genomic sequence coverage for specified subgenomic intervals. The system can define a variable for a sequence coverage input (“SCI” discussed herein) based on the values for sequence coverage at the specified subgenomic intervals. In one example, the system includes a user interface display configured to accept user input to define the specified subgenomic intervals. In other embodiments, the subgenomic intervals can be pre-defined as part of genetic testing and/or analysis. Further, the system can also be configured to identify the subgenomic intervals to analyze automatically (e.g., based on segmentation analysis, etc). Once the subgenomic intervals are specified, the system captures a value for sequence coverage for each of a plurality of specified subgenomic intervals. The captured values can be normalized, averaged, or weighted to prevent outlier values from skewing subsequent calculations. In one example, a normalized value for sequence coverage is used in generating a characterization model for a tumor sample.
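- As an illustration of the normalization step, the following is a minimal sketch, assuming the captured coverage values are compared against a control profile as a log2 ratio and then median-centered to limit the influence of outliers. The function name and the choice of median centering are assumptions made for this example and are not taken from the disclosure.

```python
import math
from statistics import median


def normalized_log_ratios(sample_cov, control_cov):
    """Return median-centered log2(sample/control) for each subgenomic interval.

    sample_cov, control_cov: per-interval coverage values in the same order,
    all strictly positive. Median centering is an assumed, illustrative choice.
    """
    if len(sample_cov) != len(control_cov):
        raise ValueError("coverage lists must have the same length")
    raw = [math.log2(s / c) for s, c in zip(sample_cov, control_cov)]
    center = median(raw)
    return [r - center for r in raw]
```

- A weighted or averaged scheme could be substituted for the median without changing the rest of the workflow.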
- The characterization system can also be configured to derive an allele frequency value according to specification of germline SNPs in the tumor sample. The system can define a variable for an SNP allele frequency input (“SAFI” as discussed herein) based on the values for allele frequency for the selected germline SNPs. In some embodiments, the system specifies the germline SNPs on which to capture values for allele frequency (e.g., based on pre-specified selection, automatically based on analysis of the tumor sample, etc.). In other embodiments, the user interface can also be configured to accept selection of germline SNPs within genetic sequencing information obtained on, for example, a tumor sample.
- In some embodiments, the system can be configured to capture and/or calculate additional values from genetic sequence information (including, e.g., captured from testing systems and/or components or generated by the characterization system directly). In one example, the system can capture allele frequency in a tumor sample (“VAFI”—variant allele frequency as discussed herein) for a given variant (e.g., a mutation) from testing data. In another example, the system can generate the data for capturing the allele frequency responsive to genetic sequence testing performed on the sample. The additional values which can be captured and/or acquired can also include any one or more of genomic segment total copy number (“C”—discussed herein) for a plurality of genomic segments; a genomic segment minor allele copy number (“M”—discussed herein) for a plurality of genomic segments; and a sample purity value (“p”—discussed herein).
- According to one embodiment, the characterization system can determine a tumor type (e.g., somatic, germline, subclonal somatic, and not-distinguishable) and/or a tumor zygosity (e.g., homozygous, heterozygous, and absent) responsive to the genetic sequencing data. In some embodiments, this is achieved without resorting to physical analysis of a control sample to determine, for example, purity.
- For example, the system can calculate a value for a variant type, e.g., mutation type (“g”—e.g., a value that is indicative of a variant being somatic, germline, subclonal somatic, or not-distinguishable) by executing a function on the acquired and/or calculated values for VAFI, p, C, and M. Based on the output value of g, the system can classify the variant type, e.g., mutation type. In one example, a g value equal or approximately equal to 0 is classified by the system as a somatic variant. In another example, a g value equal or approximately equal to 1 is classified by the system as a germline variant. Values of g between 0 and 1 (e.g., 0.4-0.6) are classified by the system as not-determinable.
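- The classification rule can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the tolerance `tol` used for "approximately equal" and the default band are assumptions, and the handling of negative g as subclonal somatic follows the description given elsewhere in this disclosure.

```python
def classify_variant_type(g, tol=0.1, band=(0.4, 0.6)):
    """Map a g value to a variant-type label.

    tol and band are illustrative assumptions; the text only states
    "approximately 0", "approximately 1", and an example range of 0.4-0.6.
    """
    if band[0] <= g <= band[1]:
        return "not-distinguishable"
    if abs(g) <= tol:
        return "somatic"
    if abs(g - 1.0) <= tol:
        return "germline"
    if g < -tol:
        return "subclonal somatic"
    # Values outside every stated range are left unclassified here.
    return "not-distinguishable"
```

- Widening or narrowing `band` is one way the not-determinable range could be adjusted to reflect the confidence of the g calculation.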
- In further examples, the system can calculate a value indicative of the zygosity of the variant in the sample as a function of the acquired and/or calculated values for C and M. For example, a value of M equal to 0 with C not equal to 0 is indicative of absence of the variant, a non-zero value of M equal to C is indicative of homozygosity of the variant (e.g., LOH), values of M and C both equal to 0 are indicative of homozygous deletion of the variant, and a non-zero value of M not equal to C is indicative of heterozygosity of the variant.
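- The four zygosity cases can be written as a small decision function. This is a sketch under the assumption that C and M are non-negative integers with M no greater than C; the function name and labels are chosen for this example.

```python
def classify_zygosity(C, M):
    """Classify zygosity from total copy number C and variant copy count M."""
    if C < 0 or M < 0 or M > C:
        raise ValueError("expected integers with 0 <= M <= C")
    if C == 0:
        return "homozygous deletion"  # M == C == 0
    if M == 0:
        return "absent"               # M == 0, C != 0
    if M == C:
        return "homozygous (LOH)"     # M == C != 0
    return "heterozygous"             # 0 < M < C
```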
- In some implementations, the system can also be configured to determine a confidence level associated with any calculation and/or calculated value (e.g., based on statistical analysis of the input(s) and computational values used to derive an output). The system can use determinations on the confidence of calculations and/or calculated values in interpreting classification outputs. In one example, the not-determinable range of values can be increased where the degree of confidence associated with the calculation of g is low. In another example, the not-determinable range of values can be decreased where the degree of confidence associated with the calculation of g is high.
- Example Calculations
- Various embodiments of the system for generating characterization models can perform any one or more of the functions and/or computations discussed herein. In some embodiments, the system includes system components specially configured to calculate C, M, and/or p responsive to fitting a genome-wide copy number model to one or both of the SCI and the SAFI. In one example, the system and/or system components are configured to fit the genome-wide copy number model to the SCI using the equation of:
- $$\log \text{Ratio} = \log_2 \frac{pC + 2(1-p)}{p\psi + 2(1-p)}$$
- where ψ is tumor ploidy. The system and/or system components can calculate ψ as $\psi = \left(\sum_i l_i C_i\right) / \sum_i l_i$, where $l_i$ is the length of a genomic segment. The system can also be configured to fit the genome-wide copy number model to the SAFI using the equation of:
- $$AF = \frac{pM + (1-p)}{pC + 2(1-p)}$$
- where AF is allele frequency. In one example, the system calculates g based on the fit of values for VAFI, p, C, and M to models of somatic/germline status. Various fitting methodologies can be executed by the system to determine g values (e.g., a Markov chain Monte Carlo (MCMC) algorithm, e.g., ASCAT (Allele-Specific Copy Number Analysis of Tumors), OncoSNP, or PICNIC (Predicting Integral Copy Numbers In Cancer)).
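- The quantities involved in the fit can be sketched in a few lines. The ploidy function follows the length-weighted average stated above. The two expected-value functions are assumptions: they use forms that are consistent with the denominator pC+2(1−p) given later in this disclosure, with a germline SNP contributing one copy from each normal cell. All function names are hypothetical.

```python
import math


def tumor_ploidy(segments):
    """Length-weighted mean copy number.

    segments: list of (length, copy_number) tuples, one per genomic segment.
    """
    total_length = sum(length for length, _ in segments)
    return sum(length * cn for length, cn in segments) / total_length


def expected_log_ratio(p, C, ploidy):
    """Assumed expected log2 coverage ratio for a segment at copy number C."""
    return math.log2((p * C + 2 * (1 - p)) / (p * ploidy + 2 * (1 - p)))


def expected_snp_af(p, C, M):
    """Assumed expected allele frequency of a germline SNP with M tumor copies."""
    return (p * M + (1 - p)) / (p * C + 2 * (1 - p))
```

- For instance, under these assumptions a purity of 46% with C=2 and M=2 gives an expected SNP allele frequency of about 0.73, close to the 74% reported for SNPs near BRCA1 in the example of FIG. 2.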
- According to one embodiment, a system for determining a characterization model for a tumor sample can execute a variety of functions and/or processes. Shown in FIG. 3 is an example process 300 for generating a characterization model for a tumor sample according to one embodiment. Process 300 begins at 302 by acquisition of calculation values. The acquisition of the calculation values at 302 can include accessing any one or more of the values used to calculate g and/or determine zygosity (e.g., from evaluation of M against C). For example, the calculation values accessed at 302 can include any one or more of: SCI, SAFI, VAFI, C, M, p. In some implementations, acquisition at 302 can also include calculation and/or direct determination of SCI, SAFI, and VAFI from sequencing on a tumor sample. Additionally, acquisition at 302 can also include calculation and/or direct determination of C, M, and/or p.
- Process 300 continues at 304, where values necessary for determining the characterization model that are missing (304 YES) are calculated from the acquired values of 302. For example, C, M, and/or p can be calculated at 306 if any of the values are not acquired and intermediate calculations are necessary (304 YES). If the values necessary for classification are acquired at 302, then intermediate calculations are not needed (304 NO). Once the necessary values are defined, classification values can be determined at 308. In one example, a value indicative of variant type is determined at 308. The variant type can include somatic, germline, subclonal somatic, and/or not-distinguishable based on the value determined at 308. In one example, a value for g is determined at 308, and the variant type is classified based on the value of g (e.g., equal or approximately equal to 0: somatic; equal or approximately equal to 1: germline; less than 0: subclonal somatic; and in a range between 0 and 1 (e.g., 0.4 to 0.6): not-distinguishable).
- In another example, a value indicative of zygosity as a function of C and M is determined at 308 (e.g., a value of M equal to 0 with C not equal to 0 is indicative of absence of the variant, a non-zero value of M equal to C is indicative of homozygosity of the variant (e.g., LOH), values of M and C both equal to 0 are indicative of a homozygous deletion of the variant, and a non-zero value of M not equal to C is indicative of heterozygosity of the variant). Based on the classification value(s) determined at 308, a characterization model can be generated for a variant specifying type and/or zygosity.
- Various embodiments according to the disclosure may be implemented on one or more specially programmed computer systems. These computer systems may be, for example, general-purpose computers such as those based on Intel PENTIUM-type, Motorola PowerPC, AMD Athlon or Turion, Sun UltraSPARC, or Hewlett-Packard PA-RISC processors, or any other type of processor, including multi-core processors. It should be appreciated that one or more of any type of computer system may be used to perform a process or processes for generating a characterization model for a variant in a tumor sample. Further, the system may be located on a single computer or may be distributed among a plurality of computers attached by a communications network.
- A general-purpose computer system according to one embodiment is specially configured to perform any of the described functions, including but not limited to, acquiring calculation values (e.g., SCI, SAFI, VAFI, M, C, p), normalizing calculation values against a control, calculating intermediate values, calculating classification value(s) (e.g., g and/or zygosity value(s)), etc. Additional functions include, for example, fitting genome-wide models to determine classification values, determining log r values, determining correlation of GC content, specifying genomic intervals, specifying germline SNPs, determining calculation values (e.g., SCI, SAFI, VAFI, M, C, p), defining genomic segments, segmenting genomic sequence information, determining sequence coverage, determining SNP allele frequencies, determining genomic segment boundaries, etc.
- It should be appreciated that the system may perform other functions, including assembling sequences for subgenomic intervals, generating genome-wide copy number model(s), fitting genome-wide copy number model(s), displaying genomic sequence information for selection, determining sample purity, calculating confidence values, and enforcing thresholds on calculations (e.g., purity >80%).
- The functions, operations, and/or algorithms described herein can also be encoded as software executing on hardware that together define a processing component, that can further define one or more portions of a specially configured general purpose computer, that reside on an individual specially configured general purpose computer, and/or reside on multiple specially configured general purpose computers.
- FIG. 4 shows an example block diagram of a general-purpose computer system 400 which can be specially configured to practice various aspects of the present disclosure discussed herein. For example, various aspects of the disclosure can be implemented as specialized software executing in one or more computer systems, including general-purpose computer systems on network 602 shown in FIG. 6. Computer system 400 may include a processor 406 connected to one or more memory devices 410, such as a disk drive, memory, or other device for storing data. Memory 410 is typically used for storing programs and data during operation of the computer system 400. Components of computer system 400 can be coupled by an interconnection mechanism 408, which may include one or more busses (e.g., between components that are integrated within a same machine) and/or a network (e.g., between components that reside on separate discrete machines). The interconnection mechanism 408 enables communications (e.g., data, instructions) to be exchanged between system components of system 400.
- Computer system 400 may also include one or more input/output (I/O) devices 402-404, for example, a keyboard, mouse, trackball, microphone, touch screen, a printing device, display screen, speaker, etc. Storage 412 typically includes a computer readable and writeable nonvolatile recording medium in which instructions are stored that define a program to be executed by the processor or information stored on or in the medium to be processed by the program.
- The medium may, for example, be a disk 502 or flash memory as shown in FIG. 5. Typically, in operation, the processor causes data to be read from the nonvolatile recording medium into another memory 504 that allows for faster access to the information by the processor than does the medium. This memory is typically a volatile, random access memory such as a dynamic random access memory (DRAM) or static memory (SRAM). In one example, the computer-readable medium is a non-transient storage medium.
- Referring again to FIG. 4, the memory can be located in storage 412 as shown, or in memory system 410. The processor 406 generally manipulates the data within the memory 410, and then copies the data to the medium associated with storage 412 after processing is completed. A variety of mechanisms are known for managing data movement between the medium and integrated circuit memory element and the disclosure is not limited thereto. The disclosure is not limited to a particular memory system or storage system.
- The computer system may include specially-programmed, special-purpose hardware, for example, an application-specific integrated circuit (ASIC). Aspects of the disclosure can be implemented in software executed on hardware, hardware or firmware, or any combination thereof. Although computer system 400 is shown by way of example as one type of computer system upon which various aspects of the disclosure can be practiced, it should be appreciated that aspects of the disclosure are not limited to being implemented on the computer system as shown in FIG. 4. Various aspects of the disclosure can be practiced on one or more computers having different architectures or components than that shown in FIG. 4.
- It should also be appreciated that the disclosure is not limited to executing on any particular system or group of systems. Also, it should be appreciated that the disclosure is not limited to any particular distributed architecture, network, or communication protocol.
- Various embodiments of the disclosure can be programmed using an object-oriented programming language, such as Java, C++, Ada, or C# (C-Sharp). Other object-oriented programming languages may also be used. Alternatively, functional, scripting, and/or logical programming languages can be used. Various aspects of the disclosure can be implemented in a non-programmed environment (e.g., documents created in HTML, XML or other format that, when viewed in a window of a browser program, render aspects of a graphical-user interface (GUI) or perform other functions). The system libraries of the programming languages are incorporated herein by reference. Various aspects of the disclosure can be implemented as programmed or non-programmed elements, or any combination thereof.
- Various aspects of the disclosure can be implemented by one or more systems similar to system 400. For instance, the system can be a distributed system (e.g., client server, multi-tier system) comprising multiple general-purpose computer systems. In one example, the system includes software processes executing on a system for generating a characterization model. Various system embodiments can execute operations such as accepting a tumor sample, executing genomic sequencing, generating and displaying classification/characterization information on the sample, generating user interfaces for displaying classification information, accepting user input regarding genomic segments and/or boundary definition, among other options. The system embodiments may operate as “black box” systems where an input sample is classified without further interaction, and other system embodiments may permit user interaction to specify genomic segments, genomic intervals, etc., on which analysis is performed.
- There can be other computer systems that perform functions such as fitting genomic data to genome-wide copy number models, generating characterization models, storing characterization models, etc. These systems can also be configured to manage administration of testing of samples, accept samples as inputs, sequence samples, provide sequencing data to classification components, among other options. These systems and/or system components can be distributed over a communication system such as the Internet. One such distributed network, as discussed below with respect to FIG. 6, can be used to implement various aspects of the disclosure.
- FIG. 6 shows an architecture diagram of an example distributed system 600 suitable for implementing various aspects of the disclosure. It should be appreciated that FIG. 6 is used for illustration purposes only, and that other architectures can be used to facilitate one or more aspects of the disclosure. System 600 may include one or more general-purpose computer systems distributed among a network 602 such as, for example, the Internet. Such systems may cooperate to perform any of the functions and/or processes discussed above.
- In an example of one such system, one or more of systems system client computer systems
- In another example, a system 604 includes a browser program such as the Microsoft Internet Explorer application program, Mozilla's Firefox, or Google's Chrome browser through which one or more websites can be accessed. Further, there can be one or more application programs that are executed on system 604 that perform functions associated with evaluating a tumor sample, submitting a tumor sample, obtaining genomic sequencing data, and/or communicating genomic sequencing data. For example, system 604 may include one or more local databases for storing, caching and/or retrieving sequencing information associated with testing, sequencing, etc.
- Network 602 may also include one or more server systems, which can be implemented on general-purpose computers that cooperate to perform various functions discussed herein. System 600 may execute any number of software programs or processes and the disclosure is not limited to any particular type or number of processes. Such processes can be executed by system embodiments and/or system components to perform the various workflows and operations discussed.
- Effect of Sample Purity on Analysis
- The ability to distinguish somatic versus germline status and the ability to determine zygosity status both depend on sample purity. See FIG. 7, which provides a table of expected frequencies. The table enumerates values for the formula:
- $$AF = \frac{pM + g(1-p)}{pC + 2(1-p)}$$
- where p is “sample purity” (0%, 5%, 10%, 15%, 20% . . . ),
- g is “status of variant”, as described herein, e.g., g=0 being somatic, and g=1 being germline,
- M is “variant allele count”,
- C is “total copy number”,
- AF are all the expected allele frequencies on the grid, and
- NaN is “not a number”, which occurs when the denominator pC+2(1−p) is precisely 0.
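- A table of this kind can be enumerated programmatically. The sketch below assumes the relation AF = (pM + g(1−p))/(pC + 2(1−p)), which matches the denominator named above and the worked examples in this disclosure; the grid bounds and function names are illustrative assumptions, and purity is expressed as a fraction rather than a percentage.

```python
import math


def expected_af(p, g, M, C):
    """Expected allele frequency, or NaN when the denominator is exactly 0."""
    denom = p * C + 2 * (1 - p)
    if denom == 0:
        return math.nan
    return (p * M + g * (1 - p)) / denom


def af_table(purities, max_copy=4):
    """Enumerate (p, g, M, C, AF) rows for g in {0, 1} and 0 <= M <= C <= max_copy."""
    rows = []
    for p in purities:
        for C in range(max_copy + 1):
            for M in range(C + 1):
                for g in (0, 1):
                    rows.append((p, g, M, C, expected_af(p, g, M, C)))
    return rows
```

- The NaN case arises only when p is 100% and C is 0, i.e., a homozygous deletion in a sample with no normal cells.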
- The limitations, based on sample purity, are as follows:
- Low purity (p<20%) samples: Tumor zygosity assessment is difficult due to lack of tumor content. As an extreme example, if p=0%, there is no tumor specimen whatsoever, and one cannot assign a zygosity status at all. However, it is easy to distinguish somatic versus germline status here, because germline variants are expected to have an allele frequency close to 50%, whereas somatic variants are expected to differ drastically from an allele frequency of 50%.
- High purity (p>90%) samples: Somatic versus germline assessment is difficult due to lack of normal-cell content. As an extreme example, if p=100%, there are no normal cells whatsoever, and therefore there is no germline information. However, it is easy to determine tumor zygosity, because there is an abundance of tumor information.
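- Both limits can be made explicit with a short derivation. It assumes the allele-frequency relation AF = (pM + g(1−p))/(pC + 2(1−p)), whose denominator is the one named above, and treats C as fixed for the segment; this is offered as an illustration rather than as part of the disclosed method.

```latex
% Separation between germline (g = 1) and somatic (g = 0) at fixed M and C:
AF_{g=1} - AF_{g=0} = \frac{1-p}{pC + 2(1-p)} \;\longrightarrow\; 0
\quad \text{as } p \to 1

% Separation between adjacent variant copy counts (M+1 versus M) at fixed g and C:
AF_{M+1} - AF_{M} = \frac{p}{pC + 2(1-p)} \;\longrightarrow\; 0
\quad \text{as } p \to 0
```

- The first difference vanishes at high purity, so somatic and germline variants become indistinguishable; the second vanishes at low purity, so different zygosity states become indistinguishable.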
- In other embodiments, the method, or the assay, further includes acquiring the sequence of a subgenomic interval that is present in a gene or gene product associated with one or more of drug metabolism, drug responsiveness, or toxicity (also referred to herein as “PGx” genes).
- Methods described herein can comprise providing a report, e.g., in electronic, web-based, or paper form, to the patient or to another person or entity, e.g., a caregiver, e.g., a physician, e.g., an oncologist, a hospital, clinic, third-party payor, insurance company or government office, a research collaborator, or, generally, a party which is interested in the characterization of a variant.
- Molecular diagnostics are increasingly important to clinical oncology, as the number of therapies targeting specific genomic alterations grows. This trend has led to a proliferation of single biomarker assays or hotspot panels, restricted in the breadth of genes and classes of genomic alterations assessed. Limitations of these approaches have been overcome in a CLIA-certified, pan-solid-tumor, next-generation sequencing (NGS)-based test that interrogates the entire coding sequence of 182 selected cancer genes from minimal (≧50 ng) DNA from FFPE tissue. High coverage (>500×) and customized algorithms permit clinical-grade identification of all classes of genomic alterations. An approach to copy number assessment, which addresses the high stromal contamination in routine patient specimens and enables sensitive detection of focal gene amplifications and homozygous deletions, is discussed.
- A CGH-like log-ratio profile of the sample is obtained by normalizing the sequence coverage obtained at all exons and >1,700 genome-wide SNPs against a process-matched normal control. This profile is segmented and interpreted using allele frequencies of sequenced SNPs to estimate tumor purity and copy number at each segment. Briefly, if Si is a genomic segment at constant copy number in the tumor, let li be the length of Si, rij be the coverage measurement of exon j within Si, and fik be the minor allele frequency of SNP k within Si. We seek to estimate p—tumor purity, and Ci—the copy numbers of Si. We jointly model rij and fik, given p and Ci:
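- $$r_{ij} \sim N\!\left(\log_2 \frac{pC_i + 2(1-p)}{p\psi + 2(1-p)},\ \sigma_{ri}\right), \qquad f_{ik} \sim N\!\left(\frac{pM_i + (1-p)}{pC_i + 2(1-p)},\ \sigma_{fi}\right)$$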
- where Mi is the copy number of minor alleles at Si, distributed as integer 0≦Mi≦Ci, and σri and σfi reflect noise observed in the CGH and SNP data, respectively. Fitting is performed using Gibbs sampling, assigning absolute copy number to all segments. Focal amplifications are called at segments with ≧6 copies and homozygous deletions at 0 copies, in samples with purity >20%.
- The method was validated against current clinical standards for copy number assessment: fluorescence in-situ hybridization (FISH) and immunohistochemistry (IHC). 42 breast cancer specimens were analyzed with NGS-based and FISH/IHC calls for HER2 amplification and 22 prostate cancer samples with calls for PTEN homozygous loss. Average sequence coverage in the dataset exceeded 1000×. Of the 6 HER2 amplified/36 normal and 6 PTEN deleted/16 normal cases by FISH/IHC, all but one were classified identically by NGS. Review of NGS data for the discordant NGS deleted/FISH normal PTEN call supported homozygous loss. Overall, relative to FISH/IHC, model accuracy for detecting focal gene amplification and homozygous deletion was thus 98% (63/64 calls). Importantly, nearly 40% (24/64) of cases had tumor purity ≦50%, including 4/13 (30%) of cases with a HER2 or PTEN alteration, highlighting the importance of addressing stromal contamination in clinical cancer specimens.
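- The calling rule stated above (amplification at 6 or more copies, homozygous deletion at 0 copies, only in samples with purity above 20%) can be sketched as follows. The function name, the dictionary input, and the gene names in the test are hypothetical and used only for illustration.

```python
def call_focal_events(segment_copy_numbers, purity, min_purity=0.20, amp_copies=6):
    """Return {segment: call} for focal amplifications and homozygous deletions.

    segment_copy_numbers: mapping of segment (or gene) name to absolute copy number.
    purity: estimated tumor purity as a fraction. No calls are made at or
    below min_purity, mirroring the purity >20% condition in the text.
    """
    if purity <= min_purity:
        return {}
    calls = {}
    for name, copies in segment_copy_numbers.items():
        if copies >= amp_copies:
            calls[name] = "amplification"
        elif copies == 0:
            calls[name] = "homozygous deletion"
    return calls
```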
- This study describes the computational approach and presents validation of copy number assessment in a comprehensive, clinical grade, NGS-based cancer gene test. The observed accuracy for focal amplifications and homozygous deletions, coupled with the ability to interrogate all classes of potentially actionable alterations, suggests that this type of testing can become a routine component of cancer patient care.
- Next-generation sequencing (NGS) of cancer specimens is increasingly important to clinical oncology, as the number of therapies targeting specific genomic alterations grows. A CLIA-certified, CAP-accredited NGS-based test has been developed and deployed that interrogates the entire coding sequence of 236 selected cancer genes from minimal (≧50 ng) DNA from FFPE tissue. Deep, uniform coverage and customized algorithms permit accurate identification of all classes of genomic alterations. However, a key practical constraint in genomic testing in oncology is the limited availability of matching normal specimens, which restricts the interpretation of any novel variants identified, since these may be either private germline polymorphisms or somatic alterations. An approach to assessing somatic vs. germline status of genomic alterations without a patient-matched normal, as well as determining variant zygosity and LOH, is described herein.
- First, a CGH-like log-ratio profile of the sample is obtained by normalizing the coverage obtained at all exons and >3,500 genome-wide SNPs against a process-matched normal control. This profile is segmented and interpreted using allele frequencies of sequenced SNPs to estimate tumor purity (p) and copy number (C) at each segment. Fitting is performed using Gibbs sampling, assigning total copy number and minor allele count to all segments. Given a list of variants with unknown somatic/germline/zygosity status, the copy number and minor allele count (M) of the segment local to each variant is obtained. Allele frequencies f of variants of interest are interpreted using equation
- $$f = \frac{pM + g(1-p)}{pC + 2(1-p)}$$
- where we compute the value of g. A germline variant has g=1, a somatic variant has g=0, and a sub-clonal somatic variant has g<0. Statistical significance is assessed relative to read depth and to local variability in allele frequency estimates. Following determination of g, zygosity is determined from M and C: a tumor homozygous deletion has C=M=0, tumor LOH has C=M≠0, a heterozygous tumor has C≠M with M≠0, and the variant is absent from the tumor if C≠0 and M=0.
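- One way read depth could enter the significance assessment is through a binomial likelihood comparison between the germline (g=1) and somatic (g=0) hypotheses. The sketch below is an assumption made for illustration and is not stated in the disclosure: it uses the relation f = (pM + g(1−p))/(pC + 2(1−p)), ignores local variability in allele frequency estimates, and assumes the denominator is non-zero. All function names are hypothetical.

```python
import math


def expected_af(p, g, M, C):
    """Expected allele frequency; assumes p*C + 2*(1-p) is non-zero."""
    return (p * M + g * (1 - p)) / (p * C + 2 * (1 - p))


def binom_log_pmf(k, n, q):
    """Log of the binomial probability of k successes in n trials."""
    q = min(max(q, 1e-9), 1 - 1e-9)  # keep the logarithms finite
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(q) + (n - k) * math.log(1 - q))


def germline_vs_somatic_llr(alt_reads, depth, p, M, C):
    """Log-likelihood ratio: positive favors germline, negative favors somatic."""
    ll_germline = binom_log_pmf(alt_reads, depth, expected_af(p, 1, M, C))
    ll_somatic = binom_log_pmf(alt_reads, depth, expected_af(p, 0, M, C))
    return ll_germline - ll_somatic
```

- At a fixed observed allele frequency, the magnitude of this ratio grows with depth, which is the sense in which deeper coverage makes the somatic/germline call more confident.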
- As proof-of-principle, the approach was applied to 74 triple-negative breast cancer (TNBC) specimens from Instituto Nacional de Enfermedades Neoplásicas in Lima, Perú. The 4 genes most frequently altered in the dataset were selected for analysis: TP53, BRCA1, BRCA2, and PIK3CA. As expected, 47/49 (96%) of TP53 variants were predicted somatic, with clear evidence of second copy tumor suppressor loss through LOH for 43/49 (88%). 8/8 (100%) PIK3CA variants were also predicted somatic, and 6/8 heterozygous, consistent with PIK3CA's established role as an oncogene. In contrast, 12/18 (67%) of BRCA1/2 variants were germline, consistent with the established role for inherited BRCA1/2 variation and somatic alterations in TNBC.
- This work describes a computational method based on interpretation of variant allele frequencies for determining the somatic/germline/LOH status of genomic alterations in clinical cancer specimens without a matched normal control. The method supports functional prioritization and interpretation of novel alterations discovered on routine testing and enables indication for additional diagnostic workup if predicted germline risk variants are found. When coupled with the accurate assessment of all classes of known cancer genomic alterations offered by deep NGS testing, this further informs clinical decision making and expands treatment choices for cancer patients.
- Inputs SCI and SAFI were acquired as described herein. Fitting a genome-wide copy number model to SCI and SAFI inputs yielded a tumor purity of 40%, with the local region around TP53 gene showing C=2, M=2. VAFI input of TP53 V157F variant has an allele frequency (AF) of 40%. Applying the equations, a value for g, g=0.01, was obtained, given the purity, C, and M from the previous step. Thus, it was concluded that this TP53 V157F is a somatic variant that is homozygous (2 of 2 copies) in the tumor.
- Inputs SCI and SAFI were acquired as described herein. Fitting a genome-wide copy number model to SCI and SAFI inputs yielded a tumor purity of 40%, with the local region around BRCA2 gene showing C=4, M=2. VAFI input of BRCA2 V1229I variant has an AF of 51%. Applying the equations, a value for g, g=1.05, was obtained, given the purity, C, and M from the step above. Thus, it was concluded that this BRCA2 V1229I is a germline variant that is heterozygous (2 of 4 copies) in the tumor.
- Inputs SCI and SAFI were acquired as described herein. Fitting a genome-wide copy number model to SCI and SAFI inputs yielded a tumor purity of 50%, with the local region around PIK3CA gene showing C=2, M=1. VAFI input of PIK3CA H419_P421>T variant has an AF of 13%. Applying the equations, a value for g, g=−0.48, was obtained, given the purity, C, and M from the previous step. This allele frequency of 13% is well below the expected AF of 25% for a fully clonal somatic variant and the expected AF of 50% for a germline variant. Thus, it was concluded that this PIK3CA H419_P421>T is a sub-clonal somatic variant that is heterozygous (1 of 2 copies) in the tumor.
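- The g values in the three examples above can be checked by solving the allele-frequency relation for g. The sketch assumes AF = (pM + g(1−p))/(pC + 2(1−p)), so that g = (AF·(pC + 2(1−p)) − pM)/(1−p); the function name is hypothetical, and the inputs are the rounded percentages quoted in the text.

```python
def solve_g(af, p, C, M):
    """Solve AF = (p*M + g*(1-p)) / (p*C + 2*(1-p)) for g.

    Requires purity p < 1, since no germline signal remains at p = 1.
    """
    if not 0 <= p < 1:
        raise ValueError("purity must be in [0, 1)")
    return (af * (p * C + 2 * (1 - p)) - p * M) / (1 - p)


examples = {
    "TP53 V157F": solve_g(0.40, 0.40, 2, 2),
    "BRCA2 V1229I": solve_g(0.51, 0.40, 4, 2),
    "PIK3CA H419_P421>T": solve_g(0.13, 0.50, 2, 1),
}
for name, g in examples.items():
    print(f"{name}: g = {g:.2f}")
```

- With these rounded inputs the computed values are approximately 0.00, 1.05, and −0.48, in line with the reported 0.01, 1.05, and −0.48; the small difference for TP53 is attributable to rounding of the quoted purity and allele frequency.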
- Inputs SCI and SAFI were acquired as described herein. The CGH-like log-ratio profile used to acquire SCI is shown in FIG. 1. As shown in FIG. 1, the total local copy number for BRCA1 is 2. The germline SNP allele frequency profile used to acquire SAFI is shown in FIG. 2. As shown in FIG. 2, the allele frequency of nearby SNPs (26% or 74%) implies that the allelic copy number for BRCA1 is 0 or 2 copies. Fitting a genome-wide copy number model to the SCI and SAFI inputs yielded a tumor purity of 46%, with the local region around the BRCA1 gene showing C=2, M=2. The VAFI input of the BRCA1 I600fs*7 variant has an AF of 42%. Applying the equations with the purity, C, and M from the previous step, a value of g=0 was obtained. Thus, it was concluded that this BRCA1 I600fs*7 variant is a somatic variant that is homozygous (2 of 2 copies) in the tumor.
- The candidate mutation tested in this Example is PIK3CA H1047R. As shown in FIG. 9, a genome-wide copy number model indicated that the tumor has 4 copies of PIK3CA, with 2 variant alleles. The genomic segment containing PIK3CA is not under LOH in the tumor. The PIK3CA H1047R variant has an allele frequency (AF) of 36%, which is significantly below the threshold for a germline variant (expected AF=50%, FIG. 9) but matches a fully clonal somatic mutation (expected AF=38%). Thus, it was concluded that this PIK3CA H1047R mutation is somatic and heterozygous in the tumor.
- The candidate mutation tested in this Example is TP53 G356R. As shown in FIG. 10, a genome-wide copy number model indicated that the tumor has 2 copies of TP53, with 2 variant alleles. The genomic segment containing TP53 is under LOH in the tumor. The TP53 G356R variant has an allele frequency (AF) of 85%, which is significantly above the threshold for a somatic variant (expected AF=65%) but matches a germline mutation (expected AF=83%/MAF=17%, FIG. 10). Thus, it was concluded that this TP53 G356R mutation is germline and homozygous in the tumor.
- FIG. 11 depicts a CGH-like log-ratio profile of a sample for establishing an exemplary genome-wide copy number model. Selected chromosomes are annotated with respect to copy number, zygosity, and somatic/germline status, as shown in Table 4.
TABLE 4. Genome-wide copy number model with select chromosomes annotated
chr | start (Mb) | end (Mb) | CN | LOH | chromosome arm level gains and LOH | Status of short variant |
---|---|---|---|---|---|---|
chr1 | 1 | 120 | 2 | LOHx | 1p_LOHx | |
chr3 | 1 | 90 | 1 | LOH1 | 3p_LOH1 | |
chr3 | 130 | 198 | 6 | none | 3q_gain | |
chr5 | 1 | 180 | 2 | none | | |
chr8 | 1 | 146 | 2 | none | | |
chr13 | 1 | 115 | 1 | LOH1 | chr13_LOH1 | BRCA2 D651N somatic and homozygous |
chr17 | 1 | 8 | 1 | LOH1 | 17p_LOH1 | TP53 R282W somatic and homozygous |
chr21 | 31 | 48 | 1 | LOH1 | | |
- As indicated in FIG. 11 and Table 4, the p-arm of chromosome 1 is under copy-neutral LOH (LOHx), while the entire chromosome 13 is under copy-loss LOH (LOH1). The somatic status of certain functional mutations is also reported in Table 4.
- A key constraint in genomic testing in oncology is that matched normal specimens are not commonly obtained in clinical practice. Thus, while most clinically relevant genomic alterations have been previously characterized and do not require normal tissue for interpretation, the use of novel variants whose somatic status is unknown is limited. This example describes an approach to predicting somatic vs. germline status of genomic alterations from tumor tissue alone in a CLIA-certified, NGS-based test that interrogates all exons of 236 cancer-related genes.
- For each sample, a “CGH”-like aneuploidy profile was obtained by normalizing against a process-matched control. This profile is segmented and modeled using allele frequencies at >3,500 SNPs to estimate the genome-wide tumor purity (p), copy number (C), and minor allele count (M) at each segment. Variant allele frequency is expected to differ based on somatic status:
-
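The expected allele frequencies can be written out as follows. This is a reconstruction under stated assumptions rather than the original figure: it takes M to be the number of tumor copies carrying the variant (as in the worked examples), assumes a diploid normal admixture, and is chosen because it reproduces the expected values quoted in the examples (for instance 25% somatic and 50% germline at p = 0.5, C = 2, M = 1).

```latex
\[
AF_{\text{germline}} = \frac{pM + (1-p)}{pC + 2(1-p)},
\qquad
AF_{\text{somatic}} = \frac{pM}{pC + 2(1-p)}
\]
% Equivalently, with g = 1 for a germline variant and g = 0 for a somatic variant:
\[
AF = \frac{pM + g(1-p)}{pC + 2(1-p)}
\]
```

Here the denominator is the average number of copies of the locus per cell in the tumor-normal mixture, and the (1 − p) term in the germline numerator is the single variant copy contributed by each heterozygous normal cell.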
- For variants of unknown status, measured allele frequency is compared to expectation, and a prediction is made with statistical confidence assessed based on read depth and local variability of SNP measurements in each segment.
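The comparison of measured and expected allele frequency can be illustrated with a simple read-count model. The sketch below is not the statistical procedure of this example: it assumes a binomial model for variant reads, uses a log-likelihood ratio with an arbitrary cutoff (`min_llr`, hypothetical) to decide between calls, and ignores the local SNP variability mentioned above. The function names and the relation used for the expected AF are likewise assumptions.

```python
import math


def expected_af(p, C, M, g):
    """Assumed relation: AF = (p*M + g*(1 - p)) / (p*C + 2*(1 - p))."""
    return (p * M + g * (1.0 - p)) / (p * C + 2.0 * (1.0 - p))


def classify_variant(alt_reads, depth, p, C, M, min_llr=3.0):
    """Return (call, llr) comparing germline (g=1) and somatic (g=0) hypotheses.

    llr is the binomial log-likelihood ratio log L(germline) - log L(somatic);
    the binomial coefficient cancels, so only the read counts and the two
    expected allele frequencies are needed. Low depth yields a small |llr|
    and therefore "no call".
    """
    if depth <= 0 or not 0 <= alt_reads <= depth:
        raise ValueError("need 0 <= alt_reads <= depth and depth > 0")
    eps = 1e-6  # keep expected frequencies strictly inside (0, 1) for the logs
    f_germ = min(max(expected_af(p, C, M, g=1), eps), 1.0 - eps)
    f_som = min(max(expected_af(p, C, M, g=0), eps), 1.0 - eps)
    ref_reads = depth - alt_reads
    llr = (alt_reads * math.log(f_germ / f_som)
           + ref_reads * math.log((1.0 - f_germ) / (1.0 - f_som)))
    if llr >= min_llr:
        return "germline", llr
    if llr <= -min_llr:
        return "somatic", llr
    return "no call", llr


# Hypothetical read depths applied to purity/C/M values from the worked examples.
print(classify_variant(alt_reads=255, depth=500, p=0.40, C=4, M=2)[0])
print(classify_variant(alt_reads=65, depth=500, p=0.50, C=2, M=1)[0])
print(classify_variant(alt_reads=7, depth=20, p=0.50, C=2, M=1)[0])
```

A likelihood ratio was chosen for the sketch because it naturally weakens as read depth drops, which mirrors the idea that some variants cannot be called with confidence.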
- To validate the method, specimens from 30 lung and colon cancer patients were examined by sequencing the primary tumor, the metastatic tumor, and a matched-normal control. A total of 305 unique variants with known somatic status were assessed.
- Next, to evaluate performance broadly, predictions for 17 somatic “hotspot” mutations (e.g. KRAS G12, PIK3CA H1047, BRAF V600E) and 20 common germline SNPs in 2,578 clinical cancer specimens were examined.
- Further, to assess the impact of stromal admixture, three cell lines (HCC-1937, HCC-1954, NCI-H1395) which were experimentally titrated with their matched normal to 6 different levels (10% to 75%) were examined.
- Overall, predictions could be made in about 85% of cases, with 96% of known somatic variants and 98% of known germline variants predicted correctly, as demonstrated in Table 5 below.
TABLE 5. Summary of results
Validation study | Call rate | Somatic variants predicted correctly | Germline variants predicted correctly |
---|---|---|---|
30 matched-normal samples | 84% (479/567) | 95% (311/326) | 99% (151/153) |
2,578 clinical samples at common somatic and germline variants | 85% (4771/5583) | 96% (2556/2665) | 98% (2062/2106) |
3 cell lines with varying proportions of tumor-normal admixture | 83% (184/222) | 97% (60/62) | 97% (118/122) |
- This computational method leverages deep next-generation sequencing of clinical cancer specimens to predict variant somatic status without a matched-normal control. Accuracy of the method is >95%, demonstrated using three independent validation approaches. The analytic framework also assesses tumor LOH status of identified variants, and the sub-clonality of somatic mutations. It supports functional prioritization and interpretation of alterations discovered on routine testing and can indicate additional work-up if germline risk variants are found.
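The percentages in Table 5 follow directly from the raw counts, and the somatic and germline denominators in each row sum to the number of calls made. A minimal sketch of that arithmetic, using only the counts from the table (the dictionary layout and the name `summarize` are illustrative):

```python
# Counts copied from Table 5: (numerator, denominator) per column.
TABLE5 = {
    "30 matched-normal samples": {
        "call_rate": (479, 567), "somatic": (311, 326), "germline": (151, 153)},
    "2,578 clinical samples": {
        "call_rate": (4771, 5583), "somatic": (2556, 2665), "germline": (2062, 2106)},
    "3 cell lines (tumor-normal admixture)": {
        "call_rate": (184, 222), "somatic": (60, 62), "germline": (118, 122)},
}


def summarize(row):
    """Return whole-number percentages for one validation study."""
    return {name: round(100.0 * num / den) for name, (num, den) in row.items()}


for study, row in TABLE5.items():
    # Every called variant is either a known somatic or a known germline variant.
    assert row["somatic"][1] + row["germline"][1] == row["call_rate"][0]
    print(study, summarize(row))
```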
- According to one embodiment, a characterization model can be captured and tracked over time. For example, the system can be configured to analyze and store characterization information on multiple tissue samples taken from a subject. The characterization model developed over time provides information on changes to the characterization model (including e.g., variant type, zygosity, etc.). The system can analyze the characterization model to identify relationships between different variants (e.g., tumors) based, for example, on similarity in characterization models. In some implementations, the system can identify related variants in different tumors, different patients, etc.
- According to another embodiment, a characterization model can include treatment information. The system can identify related treatment options responsive to similarity in characterization models and any respective treatments. Once related treatment options are identified, the system can present related treatment in user interface displays, in a report generated by the system, etc.
- Other embodiments are described within the following claims.
Claims (30)
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/274,525 US9792403B2 (en) | 2013-05-10 | 2014-05-09 | Analysis of genetic variants |
US15/708,475 US10847249B2 (en) | 2013-05-10 | 2017-09-19 | Analysis of genetic variants |
US17/077,967 US20210043274A1 (en) | 2013-05-10 | 2020-10-22 | Analysis of genetic variants |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201361821920P | 2013-05-10 | 2013-05-10 | |
US201461939936P | 2014-02-14 | 2014-02-14 | |
US14/274,525 US9792403B2 (en) | 2013-05-10 | 2014-05-09 | Analysis of genetic variants |
Related Child Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/708,475 Continuation US10847249B2 (en) | 2013-05-10 | 2017-09-19 | Analysis of genetic variants |
Publications (2)
Publication Number | Publication Date |
---|---|
US20140336996A1 (en) | 2014-11-13 |
US9792403B2 (en) | 2017-10-17 |
Family
ID=51865422
Family Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/274,525 Active US9792403B2 (en) | 2013-05-10 | 2014-05-09 | Analysis of genetic variants |
US15/708,475 Active 2035-08-02 US10847249B2 (en) | 2013-05-10 | 2017-09-19 | Analysis of genetic variants |
US17/077,967 Pending US20210043274A1 (en) | 2013-05-10 | 2020-10-22 | Analysis of genetic variants |
Country Status (6)
Country | Link |
---|---|
US (3) | US9792403B2 (en) |
EP (2) | EP4524972A2 (en) |
AU (3) | AU2014262481A1 (en) |
CA (1) | CA2912059A1 (en) |
HK (1) | HK1222466A1 (en) |
WO (1) | WO2014183078A1 (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110098193A1 (en) * | 2009-10-22 | 2011-04-28 | Kingsmore Stephen F | Methods and Systems for Medical Sequencing Analysis |
US20120095697A1 (en) * | 2010-10-13 | 2012-04-19 | Aaron Halpern | Methods for estimating genome-wide copy number variations |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100028873A1 (en) * | 2006-03-14 | 2010-02-04 | Abdelmajid Belouchi | Methods and means for nucleic acid sequencing |
EP3225697A3 (en) | 2010-12-30 | 2017-11-22 | Foundation Medicine, Inc. | Optimization of multigene analysis of tumor samples |
EP4524972A2 (en) | 2013-05-10 | 2025-03-19 | Foundation Medicine, Inc. | Analysis of genetic variants |
2014
- 2014-05-09 EP EP25155115.6A patent/EP4524972A2/en active Pending
- 2014-05-09 EP EP14795352.5A patent/EP2994847B1/en active Active
- 2014-05-09 AU AU2014262481A patent/AU2014262481A1/en not_active Abandoned
- 2014-05-09 CA CA2912059A patent/CA2912059A1/en active Pending
- 2014-05-09 WO PCT/US2014/037569 patent/WO2014183078A1/en active Application Filing
- 2014-05-09 US US14/274,525 patent/US9792403B2/en active Active
2016
- 2016-09-09 HK HK16110725.3A patent/HK1222466A1/en unknown
2017
- 2017-09-19 US US15/708,475 patent/US10847249B2/en active Active
2020
- 2020-02-24 AU AU2020201325A patent/AU2020201325B2/en active Active
- 2020-10-22 US US17/077,967 patent/US20210043274A1/en active Pending
2022
- 2022-02-23 AU AU2022201252A patent/AU2022201252A1/en not_active Abandoned
Non-Patent Citations (1)
Title |
---|
Van Loo et al. (Allele-Specific Copy number Analysis Of Tumors, PNAS, September 28, 2010, vol. 107, no. 39, pages 16910-16915). * |
Also Published As
Publication number | Publication date |
---|---|
CA2912059A1 (en) | 2014-11-13 |
US9792403B2 (en) | 2017-10-17 |
AU2014262481A1 (en) | 2015-11-26 |
HK1222466A1 (en) | 2017-06-30 |
EP2994847A1 (en) | 2016-03-16 |
EP2994847A4 (en) | 2017-04-19 |
US20180218113A1 (en) | 2018-08-02 |
AU2020201325B2 (en) | 2021-12-09 |
EP2994847B1 (en) | 2025-02-12 |
AU2022201252A1 (en) | 2022-03-17 |
AU2020201325A1 (en) | 2020-03-12 |
US10847249B2 (en) | 2020-11-24 |
US20210043274A1 (en) | 2021-02-11 |
WO2014183078A1 (en) | 2014-11-13 |
EP4524972A2 (en) | 2025-03-19 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: FOUNDATION MEDICINE, INC., MASSACHUSETTS Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:SUN, JAMES XIN;YELENSKY, ROMAN;SIGNING DATES FROM 20140728 TO 20140729;REEL/FRAME:033583/0985 |
|
AS | Assignment |
Owner name: ROCHE FINANCE LTD, SWITZERLAND Free format text: PATENT SECURITY AGREEMENT;ASSIGNOR:FOUNDATION MEDICINE, INC.;REEL/FRAME:040165/0615 Effective date: 20160927 |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 4 |
|
AS | Assignment |
Owner name: FOUNDATION MEDICINE, INC., MASSACHUSETTS Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:ROCHE FINANCE LTD;REEL/FRAME:056715/0711 Effective date: 20210430 |
|
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 8 |