CA2345441A1 - Complexity management and analysis of genomic DNA
- Publication number
- CA2345441A1
- Authority
- CA
- Canada
- Prior art keywords
- nucleic acid
- acid sample
- sequences
- fragments
- dna
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6813—Hybridisation assays
- C12Q1/6834—Enzymatic or biochemical coupling of nucleic acids to a solid phase
- C12Q1/6837—Enzymatic or biochemical coupling of nucleic acids to a solid phase using probe arrays or probe chips
- C12Q1/6844—Nucleic acid amplification reactions
- C12Q1/6853—Nucleic acid amplification reactions using modified primers or templates
- C12Q1/6855—Ligating adaptors
- C12Q1/6858—Allele-specific amplification
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/156—Polymorphic or mutational markers
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S977/00—Nanotechnology
- Y10S977/902—Specified use of nanostructure
- Y10S977/904—Specified use of nanostructure for medical, immunological, body treatment, or diagnosis
- Y10S977/924—Specified use of nanostructure for medical, immunological, body treatment, or diagnosis using nanostructure as support of dna analysis
Abstract
The present invention provides for novel methods of sample preparation and analysis involving reproducibly reducing the complexity of a nucleic acid sample. The invention further provides for analysis of the above sample by hybridization to an array which may be specifically designed to interrogate the desired fragments for particular characteristics, such as, for example, the presence or absence of a polymorphism. The invention further provides for novel methods of using a computer system to model enzymatic reactions in order to determine experimental conditions before conducting actual experiments.
Description
COMPLEXITY MANAGEMENT AND ANALYSIS OF GENOMIC DNA
RELATED APPLICATIONS
This application claims the benefit of U.S. Provisional Application Serial Nos.
60/105,867, filed 10/27/98, and 60/136,125, filed 5/26/99, the entire teachings of which are incorporated herein by reference.
BACKGROUND OF THE INVENTION
The past years have seen a dynamic change in the ability of science to comprehend vast amounts of data. Pioneering technologies such as nucleic acid arrays allow scientists to delve into the world of genetics in far greater detail than ever before.
Exploration of genomic DNA has long been a dream of the scientific community.
Held within the complex structures of genomic DNA lies the potential to identify, diagnose, or treat diseases like cancer, Alzheimer's disease or alcoholism. Answers to the world's food distribution problems may be held within the exploitation of genomic information from plants and animals.
It is estimated that by the spring of 2000 a reference sequence of the entire human genome will be completed, allowing for types of genetic analysis that were never before possible. Novel methods of sample preparation and sample analysis are needed to provide for the fast and cost-effective exploration of complex samples of nucleic acids, particularly genomic DNA.
SUMMARY OF THE INVENTION
The present invention provides a flexible and scalable method for analyzing complex samples of nucleic acids, such as genomic DNA. These methods are not limited to any particular type of nucleic acid sample: plant, bacterial, animal (including human) total genome DNA, RNA, cDNA and the like may be analyzed using some or all of the methods disclosed in this invention. The word "DNA" may be used below as an example of a nucleic acid. It is understood that this term includes all nucleic acids, such as DNA and RNA, unless a use below requires a specific type of nucleic acid.
This invention provides a powerful tool for analysis of complex nucleic acid samples.
From experimental design to isolation of desired fragments and hybridization to an appropriate array, the invention provides for faster, more efficient and less expensive methods of complex nucleic acid analysis.
The present invention provides for novel methods of sample preparation and analysis comprising managing or reducing, in a reproducible manner, the complexity of a nucleic acid sample. The present invention eliminates the need for multiplex PCR, a time-intensive and expensive step in most large-scale analysis protocols, and for many of the embodiments the step of complexity reduction may be performed entirely in a single tube. The invention further provides for analysis of the sample by hybridization to an array which may be specifically designed to interrogate fragments for particular characteristics, such as, for example, the presence or absence of a polymorphism. The invention further provides for novel methods of using a computer system to model enzymatic reactions in order to determine experimental conditions and/or to design arrays. In a preferred embodiment the invention discloses novel methods of genome-wide polymorphism discovery and genotyping.
In one embodiment of the invention, the step of complexity management of the nucleic acid sample comprises enzymatically cutting the nucleic acid sample into fragments, separating the fragments and selecting a particular fragment pool. Optionally, the selected fragments are then ligated to adaptor sequences containing PCR primer templates.
In a preferred embodiment, the step of complexity management is performed entirely in a single tube.
In one embodiment of complexity management, a type IIs endonuclease is used to digest the nucleic acid sample and the fragments are selectively ligated to adaptor sequences and then amplified.
In another embodiment, the method of complexity management utilizes two restriction enzymes with different cutting sites and frequencies and two different adaptor sequences.
In another embodiment of the invention, the step of complexity management comprises performing the Arbitrarily Primed Polymerase Chain Reaction (AP PCR) upon the sample.
In another embodiment of the invention, the step of complexity management comprises removing repeated sequences by denaturing and reannealing the DNA
and then removing double stranded duplexes.
In another embodiment of the invention, the step of complexity management comprises hybridizing the DNA sample to a magnetic bead which is bound to an oligonucleotide probe containing a desired sequence. This embodiment may further comprise exposing the hybridized sample to a single strand DNA nuclease to remove the single stranded DNA, ligating an adaptor sequence containing a Class IIs restriction enzyme site to the resulting duplexed DNA and digesting the duplex with the appropriate Class IIs restriction enzyme to release the magnetic bead. This embodiment may or may not comprise amplification of the isolated DNA sequence. Furthermore, the adaptor sequence may or may not be used as a template for the PCR primer. In this embodiment, the adaptor sequence may or may not contain a SNP identification sequence or tag.
In another embodiment, the method of complexity management comprises exposing the DNA sample to a mismatch binding protein and digesting the sample with a 3' to 5' exonuclease and then a single strand DNA nuclease. This embodiment may or may not include the use of a magnetic bead attached to the mismatch binding protein.
BRIEF DESCRIPTION OF THE FIGURES
Figure 1 is a schematic representation of a method of complexity management comprising restriction enzyme digest, fragment separation, and isolation and purification of a fragment size range of interest.
Figure 2 is a schematic representation of a method of complexity management comprising restriction enzyme digest, fragment separation, isolation and purification of a fragment size range of interest, ligation of an adaptor sequence to the desired fragments and amplification of those fragments.
Figure 3 depicts the effect on complexity of PCR amplification using primers with and without specific nucleotides.
Figure 4 is a schematic representation of a method of complexity management comprising a type IIs restriction enzyme digest, adaptor sequence ligation and amplification of desired fragments.
Figure 5 depicts type IIs restriction enzymes and their cleavage sites.
Figure 6 is a schematic representation of a method of complexity management comprising a type IIs restriction enzyme digest, adaptor sequence ligation and amplification of desired fragments.
Figure 7 is a schematic representation of a method of complexity management comprising AP PCR.
Figure 8 depicts the results of AP PCR on human genomic DNA.
Figure 9 depicts the reproducibility of AP PCR.
Figure 10 is a schematic representation of a method of complexity management comprising removing repetitive sequences by denaturing and reannealing genomic DNA.
Figure 11 is a schematic representation of a method of complexity management comprising hybridizing a probe sequence attached to a magnetic bead to a pool of fractionated DNA.
Figure 12 is a schematic representation of a method of complexity management comprising hybridizing a probe sequence bound to a magnetic bead to a pool of fractionated DNA, ligating an adaptor sequence containing a class IIs restriction enzyme site to the DNA/probe duplex, digesting the duplex, ligating a second adaptor sequence to the duplex and amplifying.
Figure 13 is a schematic representation of a method of complexity management comprising hybridizing a probe sequence bound to a magnetic bead to a pool of fractionated DNA, ligating an adaptor sequence containing a class IIs restriction enzyme site to the DNA/probe duplex, digesting the duplex, ligating a second adaptor sequence to the duplex and amplifying.
Figure 14 depicts a chimeric probe array.
Figure 15 is a schematic representation of a method of complexity management comprising hybridizing a probe sequence attached to a magnetic bead to a pool of fractionated DNA, ligating an adaptor sequence containing a class IIs restriction enzyme site to the DNA/probe duplex, digesting the duplex, ligating a second adaptor sequence to the duplex, amplifying and hybridizing the amplicons to a chimeric probe array.
Figure 16 is a schematic representation of a method of complexity management comprising hybridizing a mismatch binding protein to DNA containing a polymorphism and isolating the region containing the polymorphism.
Figure 17 is a schematic representation of a method of complexity management comprising attaching a magnetic bead to the mismatch binding protein of Figure 16.
Figure 18 shows digestion of DNA by a combination of restriction enzymes.
Figure 19 shows digested yeast total genomic DNA.
Exhibit 1 is an example of one type of computer program which can be written to model restriction enzyme digestions.
Exhibit 2 is an example of one type of computer program which can be written to model ligation reactions.
DETAILED DESCRIPTION OF THE PRESENT INVENTION
This application relies on the disclosure of other patent applications and literature references. These documents are hereby incorporated by reference in their entireties for all purposes.
Definitions
A "genome" is all the genetic material in the chromosomes of an organism. DNA derived from the genetic material in the chromosomes of a particular organism is genomic DNA. A genomic library is a collection of clones made from a set of randomly generated overlapping DNA fragments representing the entire genome of an organism.
An "oligonucleotide" can be nucleic acid, such as DNA or RNA, and single- or double-stranded. Oligonucleotides can be naturally occurring or synthetic, but are typically prepared by synthetic means. Oligonucleotides can be of any length but are usually at least 5, 10, or 20 bases long and may be up to 20, 50, 100, 1,000, or 5,000 bases long. A polymorphic site can occur within any position of the oligonucleotide. Oligonucleotides can include peptide nucleic acids (PNAs) or analog nucleic acids. See US Patent Application No. 08/630,427 filed 4/3/96.
An array comprises a solid support with nucleic acid probes attached to said support. Arrays typically comprise a plurality of different oligonucleotide probes that are coupled to a surface of a substrate in different known locations. These arrays, also described as "microarrays" or colloquially "chips", have been generally described in the art, for example, U.S. Pat. Nos. 5,143,854, 5,445,934, 5,744,305, 5,677,195 and PCT Patent Publication Nos. WO 90/15070 and 92/10092, each of which is incorporated by reference in its entirety for all purposes. These arrays may generally be produced using mechanical synthesis methods or light directed synthesis methods which incorporate a combination of photolithographic methods and solid phase synthesis methods.
See Fodor et al., Science, 251:767-777 (1991), Pirrung et al., U.S. Pat. No.
5,143,854 (see also PCT Application No. WO 90/15070) and Fodor et al., PCT Publication No. WO 92/10092 and U.S. Pat. No. 5,424,186, each of which is hereby incorporated in its entirety by reference for all purposes. Techniques for the synthesis of these arrays using mechanical synthesis methods are described in, e.g., U.S. Pat. No. 5,384,261, incorporated herein by reference in its entirety for all purposes. Although a planar array surface is preferred, the array may be fabricated on a surface of virtually any shape or even a multiplicity of surfaces. Arrays may be nucleic acids on beads, fibers such as fiber optics, glass or any other appropriate substrate, see US Patent Nos. 5,770,358, 5,789,162, 5,708,153 and 5,800,992, which are hereby incorporated in their entirety for all purposes. Arrays may be packaged in such a manner as to allow for diagnostics or other manipulation in an all-inclusive device, see for example, US Patent Nos. 5,856,174 and 5,922,591, incorporated in their entirety by reference for all purposes.
Hybridization probes are oligonucleotides capable of binding in a base-specific manner to a complementary strand of nucleic acid. Such probes include peptide nucleic acids, as described in Nielsen et al., Science 254, 1497-1500 (1991), and other nucleic acid analogs and nucleic acid mimetics. See US Patent Application No. 08/630,427 filed 4/3/96.
Hybridizations are usually performed under stringent conditions, for example, at a salt concentration of no more than 1 M and a temperature of at least 25°C. For example, conditions of 5X SSPE (750 mM NaCl, 50 mM NaPhosphate, 5 mM EDTA, pH 7.4) and a temperature of 25-30°C are suitable for allele-specific probe hybridizations. For stringent conditions, see, for example, Sambrook, Fritsch and Maniatis, "Molecular Cloning: A Laboratory Manual", 2nd Ed., Cold Spring Harbor Press (1989), which is hereby incorporated by reference in its entirety for all purposes above.
Polymorphism refers to the occurrence of two or more genetically determined alternative sequences or alleles in a population. A polymorphic marker or site is the locus at which divergence occurs. Preferred markers have at least two alleles, each occurring at a frequency of greater than 1%, and more preferably greater than 10% or 20% of a selected population. A polymorphism may comprise one or more base changes, an insertion, a repeat, or a deletion. A polymorphic locus may be as small as one base pair. Polymorphic markers include restriction fragment length polymorphisms, variable number of tandem repeats (VNTRs), hypervariable regions, minisatellites, dinucleotide repeats, trinucleotide repeats, tetranucleotide repeats, simple sequence repeats, and insertion elements such as Alu. The first identified allelic form is arbitrarily designated as the reference form and other allelic forms are designated as alternative or variant alleles. The allelic form occurring most frequently in a selected population is sometimes referred to as the wildtype form. Diploid organisms may be homozygous or heterozygous for allelic forms. A diallelic polymorphism has two forms. A triallelic polymorphism has three forms.
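As a non-limiting illustration of the preferred-marker criterion stated above (at least two alleles, each at a frequency greater than 1%), the following minimal Python sketch may be considered. The function names, the default threshold argument and the sample counts are hypothetical and are not taken from this disclosure.

```python
from collections import Counter


def allele_frequencies(alleles):
    """Return {allele: frequency} for a list of observed alleles."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def is_preferred_marker(alleles, min_freq=0.01):
    """True if at least two alleles each occur at a frequency above min_freq."""
    freqs = allele_frequencies(alleles)
    common = [a for a, f in freqs.items() if f > min_freq]
    return len(common) >= 2


# Hypothetical observations: 200 chromosomes sampled at one polymorphic site.
observed = ["A"] * 170 + ["G"] * 30

print(is_preferred_marker(observed))        # True  (both alleles exceed 1%)
print(is_preferred_marker(observed, 0.20))  # False (G is at 15%, below 20%)
```

Raising `min_freq` to 0.10 or 0.20 corresponds to the more preferred thresholds mentioned above.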
A single nucleotide polymorphism (SNP) occurs at a polymorphic site occupied by a single nucleotide, which is the site of variation between allelic sequences. The site is usually preceded by and followed by highly conserved sequences of the allele (e.g., sequences that vary in less than 1/100 or 1/1000 members of the populations).
A single nucleotide polymorphism usually arises due to substitution of one nucleotide for another at the polymorphic site. A transition is the replacement of one purine by another purine or one pyrimidine by another pyrimidine. A transversion is the replacement of a purine by a pyrimidine or vice versa. Single nucleotide polymorphisms can also arise from a deletion of a nucleotide or an insertion of a nucleotide relative to a reference allele.
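The transition and transversion definitions above may be sketched in a few lines of Python. This is an illustrative example only; the function name and the restriction to the four DNA bases are assumptions made for the sketch.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Classify a single-base substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases:
        raise ValueError("expected single DNA bases A, C, G or T")
    if ref == alt:
        raise ValueError("reference and alternative bases are identical")
    # Same chemical class (purine/purine or pyrimidine/pyrimidine) is a transition.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


print(classify_substitution("A", "G"))  # transition
print(classify_substitution("C", "T"))  # transition
print(classify_substitution("A", "C"))  # transversion
```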
An individual is not limited to a human being, but may also include other organisms including but not limited to mammals, plants, bacteria or cells derived from any of the above.
General
The present invention provides for novel methods of sample preparation and analysis involving managing or reducing the complexity of a nucleic acid sample, such as genomic DNA, in a reproducible manner. The invention further provides for analysis of the above sample by hybridization to an array which may be specifically designed to interrogate the desired fragments for particular characteristics, such as, for example, the presence or absence of a polymorphism. The invention further provides for novel methods of using a computer system to model enzymatic reactions in order to determine experimental conditions before conducting any actual experiments. As an example, the present techniques are useful to identify new polymorphisms and to genotype individuals after polymorphisms have been identified.
Generally, the steps of the present invention involve reducing the complexity of a nucleic acid sample using the disclosed techniques alone or in combination. None of these techniques requires multiplex PCR and most of them can be performed in a single tube. With one exception (AP PCR), the methods for complexity reduction involve fragmenting the nucleic acid sample, often, but not always, by restriction enzyme digest. The resulting fragments, or in the case of AP PCR, PCR products, of interest are then isolated. The isolation steps of the present invention vary but may involve size selection or direct amplification; often adaptor sequences are employed to facilitate isolation. In a preferred embodiment the isolated sequences are then exposed to an array which may or may not have been specifically designed and manufactured to interrogate the isolated sequences. Design of both the complexity management steps and the arrays may be aided by the computer modeling techniques which are also described in the present invention.
Complexity management
The present invention provides for a number of novel methods of complexity management of nucleic acid samples such as genomic DNA. These methods are disclosed below.
A number of methods disclosed herein require the use of restriction enzymes to fragment the nucleic acid sample. Methods of using a restriction enzyme or enzymes to cut nucleic acids at a large number of sites and selecting a size range of restriction fragments for assay have been shown. This scheme is illustrated in Figure 1.
In one embodiment of the invention, schematically illustrated in Figure 2, restriction enzymes are used to cut the nucleic acids in the sample (Fig. 2, Step 1). In general, a restriction enzyme recognizes a specific nucleotide sequence of four to eight nucleotides (though this number can vary) and cuts a DNA molecule at a specific site. For example, the restriction enzyme Eco RI recognizes the sequence GAATTC and will cut a DNA molecule between the G and the first A. Many different restriction enzymes are known and appropriate restriction enzymes can be chosen for a desired result. For example, restriction enzymes can be purchased from suppliers such as New England Biolabs. Methods for conducting restriction digests will be known to those of skill in the art, but directions for each restriction enzyme are generally supplied with the restriction enzymes themselves. For a thorough explanation of the use of restriction enzymes, see for example, section 5, specifically pages 5.2-5.32 of Sambrook, et al., incorporated by reference above.
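By way of illustration only, a restriction digest of the kind described above may be modeled on a computer along the following lines. This Python sketch is not the program of Exhibit 1; the function names and the test sequence are hypothetical, only the top strand of a linear molecule is modeled, and the expected spacing between sites assumes a random sequence with equal base frequencies.

```python
def digest(sequence, site="GAATTC", cut_offset=1):
    """Cut a linear top-strand sequence at every occurrence of `site`.

    cut_offset is the number of bases of the site left on the upstream
    fragment (1 for Eco RI, which cuts between the G and the first A).
    """
    sequence = sequence.upper()
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])  # remainder after the last cut
    return fragments


def expected_site_spacing(site_length):
    """Average distance between sites in a random sequence (assumption:
    the four bases are equally frequent and independent)."""
    return 4 ** site_length


# Hypothetical 18-base sequence containing two Eco RI sites.
print(digest("AAGAATTCCCGAATTCTT"))  # ['AAG', 'AATTCCCG', 'AATTCTT']
print(expected_site_spacing(6))      # 4096
```

Under the stated assumption, a six-base recognition site occurs on average once every 4096 bases, whereas a four-base site occurs once every 256 bases, which is one way the choice of enzyme influences the fragment size distribution.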
After restriction enzyme digestion, the method further requires that the pool of digested DNA fragments be separated by size and that DNA fragments of the desired size be selected (Figure 2, Step 2) and isolated (Figure 2, Step 3). Methods for separating DNA fragments after a restriction digest will be well known to those of skill in the art. As a non-limiting example, DNA fragments which have been digested with a restriction enzyme may be separated using gel electrophoresis, see for example, Maniatis, section 6. In this technique, DNA fragments are placed in a gel matrix. An electric field is applied across the gel and the DNA fragments migrate towards the positive end. The larger the DNA fragment, the more the fragment's migration is inhibited by the gel matrix. This allows for the separation of the DNA fragments by size. A size marker is run on the gel simultaneously with the DNA fragments so that the fragments of the desired size may be identified and isolated from the gel. Methods for purification of the DNA fragments from the gel matrix are also described in Sambrook et al.
Any other non-destructive method of isolating DNA fragments of the desired size may be employed. For example, size-based chromatography, HPLC, dHPLC or a sucrose density gradient could be used to reduce the DNA pool to those fragments within a particular size range and then this smaller pool could be run on an electrophoresis gel.
After isolation, adaptor sequences are ligated to the fragments (Figure 2, Step 4). Adaptor sequences are generally oligonucleotides of at least 5 or 10 bases and preferably no more than 50 or 60 bases in length; however, adaptor sequences may be even longer, up to 100 or 200 bases depending upon the desired result. For example, if the desired outcome is to prevent amplification of a particular fragment, longer adaptor sequences designed to form stem loops or other tertiary structures may be ligated to the fragment. Adaptor sequences may be synthesized using any methods known to those of skill in the art. For the purposes of this invention they may, as options, comprise templates for PCR primers and/or tag or recognition sequences. The design and use of tag sequences is described in US Patent No. 5,800,992 and US Provisional Patent Application No. 60/140,350, filed 6/23/99, both of which are incorporated by reference for all purposes. Adaptor sequences may be ligated to either blunt end or sticky end DNA. Methods of ligation will be known to those of skill in the art and are described, for example, in Sambrook et al. Methods include DNase digestion to "nick" the DNA, ligation with ddNTP and the use of polymerase I to fill in gaps or any other methods described in the art.
Further complexity reduction is achieved by adding a specific nucleotide on the 5' end of the PCR primer as illustrated in Figure 3. The specific nucleotide further reduces the complexity of the resulting DNA pool because only those fragments which have been isolated after restriction enzyme digestion and contain the complement of the specific nucleotide(s) incorporated in the PCR primer will be amplified. Figure 3A depicts the results of hybridization to an array after enzyme digestion, ligation to an adaptor and PCR amplification. Figs. 3B and 3C depict the results of hybridization to an array after enzyme digestion, ligation to an adaptor and PCR amplification where the PCR primers incorporated specific nucleotides in the 5' end of the primer. In Fig. 3B the 5' and 3' primers have different specific nucleotides incorporated. In Fig. 3C the 5' and 3' primers have the same nucleotides incorporated. The level of complexity in the isolated pool can be varied depending upon the identity and number of nucleotides incorporated into the PCR primers. A number of embodiments of the present invention involve amplification by PCR. Any of these embodiments may be further modified to reduce complexity using the above disclosed technique.
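As a rough worked illustration of how the number of incorporated nucleotides scales the pool, the following Python sketch assumes that the bases next to each adaptor are independent and equally likely to be A, C, G or T. That assumption is an idealization introduced here, not a statement from the disclosure, and the function name is hypothetical.

```python
from fractions import Fraction


def retained_fraction(n_first_primer: int, n_second_primer: int) -> Fraction:
    """Expected fraction of adaptor-ligated fragments that still amplify.

    Each selective nucleotide must match one base of the fragment, which
    happens with probability 1/4 under the uniform-base assumption.
    """
    return Fraction(1, 4) ** (n_first_primer + n_second_primer)


if __name__ == "__main__":
    for n in range(4):
        frac = retained_fraction(n, n)
        print(f"{n} selective nt per primer: about 1/{frac.denominator} of fragments")
```

Real genomic sequence is not uniformly random, so the observed reduction would be expected to deviate from these idealized values.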
Various methods of conducting PCR amplification and primer design and construction for PCR amplification will be known to those of skill in the art.
PCR is a method by which a specific polynucleotide sequence can be amplified in vitro.
PCR is an extremely powerful technique for amplifying specific polynucleotide sequences, including genomic DNA, single-stranded cDNA, and mRNA among others. As described in U.S. Pat. Nos. 4,683,202, 4,683,195, and 4,800,159 (which are incorporated herein by reference), PCR typically comprises treating separate complementary strands of a target nucleic acid with two oligonucleotide primers to form complementary primer extension products on both strands that act as templates for synthesizing copies of the desired nucleic acid sequences. By repeating the separation and synthesis steps in an automated system, essentially exponential duplication of the target sequences can be achieved. Standard protocols may be found in, for example, Sambrook et al., which is hereby incorporated by reference for all purposes.
In another embodiment, schematically illustrated in Figure 4, the step of complexity management of the DNA samples comprises digestion with a Type IIs endonuclease thereby creating sticky ends comprised of random nucleic acid sequences (Fig. 4, Step 1). Type-IIs endonucleases are generally commercially available and are well known in the art. A description of Type IIs endonucleases can be found in US Patent No. 5,710,000 which is hereby incorporated by reference for all purposes. Like their Type-II counterparts, Type-IIs endonucleases recognize specific sequences of nucleotide base pairs within a double stranded polynucleotide sequence.
Upon recognizing that sequence, the endonuclease will cleave the polynucleotide sequence, generally leaving an overhang of one strand of the sequence, or "sticky end."
Type-II endonucleases, however, generally require that the specific recognition site be palindromic. That is, reading in the 5' to 3' direction, the base pair sequence is the same for both strands of the recognition site. For example, the sequence
G↓A-A-T-T-C
C-T-T-A-A↑G
is the recognition site for the Type-II endonuclease EcoRI, where the arrows indicate the cleavage sites in each strand. This sequence is palindromic in that both strands of the sequence, when read in the 5' to 3' direction, are the same.
The Type-IIs endonucleases, on the other hand, generally do not require palindromic recognition sequences. Additionally, these Type-IIs endonucleases also generally cleave outside of their recognition sites. For example, the Type-IIs endonuclease EarI recognizes and cleaves in the following manner:
CTCTTCN↓NNNN
GAGAAGnnnn↑n
where the recognition sequence is -C-T-C-T-T-C-, N and n represent complementary, ambiguous base pairs and the arrows indicate the cleavage sites in each strand. As the example illustrates, the recognition sequence is non-palindromic, and the cleavage occurs outside of that recognition site.
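The distinction drawn above can be sketched in a few lines of Python. The sketch below checks whether a recognition site is palindromic and, for a Type-IIs enzyme, locates the two strand cuts downstream of the site. The function names, the example sequence and the simplification of searching only the top strand in one orientation are assumptions made for illustration; the cut offsets of 1 and 4 follow the EarI diagram above.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(site: str) -> bool:
    """True when both strands read the same in the 5' to 3' direction."""
    return site.upper() == reverse_complement(site)


def type_iis_cut(seq: str, site: str, top_offset: int, bottom_offset: int):
    """Locate the cuts for the first top-strand occurrence of `site`.

    Offsets are counted in bases downstream of the recognition site.
    Returns (top_cut, bottom_cut, overhang) or None if the site is absent
    or the cuts would fall beyond the end of the sequence.
    """
    seq = seq.upper()
    start = seq.find(site.upper())
    if start < 0:
        return None
    site_end = start + len(site)
    top_cut = site_end + top_offset
    bottom_cut = site_end + bottom_offset
    if max(top_cut, bottom_cut) > len(seq):
        return None
    low, high = sorted((top_cut, bottom_cut))
    return top_cut, bottom_cut, seq[low:high]


if __name__ == "__main__":
    print(is_palindromic("GAATTC"))   # EcoRI site
    print(is_palindromic("CTCTTC"))   # EarI site
    print(type_iis_cut("AAACTCTTCGATCGGG", "CTCTTC", 1, 4))
```

Because the overhang lies outside the recognition site, its sequence depends on the flanking DNA rather than on the enzyme, which is why the sticky ends are described as random.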
Specific Type-IIs endonucleases which are useful in the present invention include, e.g., EarI, MnlI, PleI, AlwI, BbsI, BsaI, BsmAI, BspMI, Esp3I, HgaI, SapI, SfaNI, BbvI, BsmFI, FokI, BseRI, HphI and MboII. The activity of these Type-IIs endonucleases is illustrated in FIG. 5, which shows the cleavage and recognition patterns of the Type-IIs endonucleases.
The sticky ends resulting from Type-IIs endonuclease digestion are then ligated to adaptor sequences (Fig. 4, Step 2). Those of skill in the art will be familiar with methods of ligation. Standard protocols can be found in, for example, Sambrook et al., hereby incorporated by reference for all purposes. Only those fragments containing the adaptor sequence are isolated (Figure 6). In addition to those methods of isolation discussed above, methods of isolation which take advantage of unique tag sequences which may be constructed in the adaptor sequences may be employed. These tag sequences may or may not be used as PCR primer templates. Fragments containing these tags can then be segregated from other non-tag bearing sequences using various methods of hybridization or any of the methods described in the above referenced application.
In another embodiment, depicted in Figure 18, the method of complexity reduction comprises digesting the DNA sample with two different restriction enzymes.
The first restriction enzyme is a frequent base cutter, such as Mse I, which has a four base recognition site. The second restriction enzyme is a rare base cutter, such as Eco RI, which has a six base recognition site. This results in three possible categories of fragments: (most common) those which have been cut on both ends with the frequent base cutter, (least common) those which have been cut on both ends with the rare base cutter, and those which have been cut on one end with the frequent base cutter and on one end with the rare base cutter. Adaptors are ligated to the fragments and PCR primers are designed such that only those fragments which fall into the desired category or categories are amplified. This technique, employed with a six base cutter and a four base cutter, can reduce complexity 8-fold when only those fragments from the latter category are amplified. Other combinations of restriction enzymes may be employed to achieve the desired level of complexity.
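The three fragment categories can be illustrated with a short Python sketch that digests a sequence in silico with both enzymes and counts internal fragments by the enzymes that produced their two ends. The toy sequence and function names are hypothetical, only the top strand of a linear molecule is modeled, and the cut offsets (T↓TAA for Mse I, G↓AATTC for Eco RI) are the standard ones for these enzymes. As an idealized check on the 8-fold figure, in random sequence a four base site is expected about every 256 bases and a six base site about every 4096 bases, so roughly 2 of every 17 fragments would carry one end of each type.

```python
import re
from collections import Counter

# Recognition site and top-strand cut offset within the site.
ENZYMES = {"MseI": ("TTAA", 1), "EcoRI": ("GAATTC", 1)}


def cut_positions(seq: str, site: str, offset: int) -> list[int]:
    """Top-strand cut coordinates for every (possibly overlapping) site."""
    return [m.start() + offset for m in re.finditer(f"(?={site})", seq)]


def classify_fragments(seq: str) -> Counter:
    """Count internal fragments by the pair of enzymes that cut their ends."""
    seq = seq.upper()
    cuts = sorted(
        (pos, name)
        for name, (site, offset) in ENZYMES.items()
        for pos in cut_positions(seq, site, offset)
    )
    counts: Counter = Counter()
    for (_, left), (_, right) in zip(cuts, cuts[1:]):
        counts[tuple(sorted((left, right)))] += 1
    return counts


if __name__ == "__main__":
    toy = "CCTTAACCGAATTCGGTTAACCTTAACC"  # hypothetical sequence
    for ends, n in classify_fragments(toy).items():
        print(ends, n)
```

The two terminal pieces of the linear molecule are ignored because each has only one enzyme-generated end.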
In another embodiment, the step of complexity management comprises removing repetitive sequences. Figure 10 depicts a schematic representation of this embodiment. The nucleic acid sample is first fragmented (Figure 10, Step 1). Various methods of fragmenting DNA will be known to those of skill in the art.
These methods may be, for example, either chemical or physical in nature. Chemical fragmentation may include partial degradation with a DNase, partial depurination with acid, the use of restriction enzymes or other enzymes which cleave DNA at known or unknown locations. Physical fragmentation methods may involve subjecting the DNA to a high shear rate. High shear rates may be produced, for example, by moving DNA through a chamber or channel with pits or spikes, or forcing the DNA sample through a restricted size flow passage, e.g., an aperture having a cross sectional dimension in the micron or submicron scale.
In a preferred embodiment adaptor sequences are ligated to the resulting fragments (Figure 10, Step 2). The fragments with or without adaptor sequences are then denatured (Figure 10, Step 3). Methods of denaturation will be well known to those of skill in the art. After denaturation, the fragments are then allowed to reanneal (Figure 10, Step 4). Annealing conditions may be altered as appropriate to obtain the level of repetitive sequence removal desired. Finally, double stranded sequences are removed (Figure 10, Step 5). Methods of removing double stranded sequences will be known to those of skill in the art and may include, without limitation, methods of digesting double stranded DNA such as double strand specific nucleases and exonucleases or methods of physical separation including, without limitation, gel based electrophoresis or size chromatography.
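The effect of annealing conditions on repeat removal can be sketched with the ideal second-order renaturation model, in which the fraction of a sequence class remaining single stranded is 1 / (1 + k·C0t). The Python example below is a sketch under stated assumptions: the rate constant, the copy numbers and the proportionality of the rate to copy number are hypothetical illustration values, not figures from the disclosure.

```python
def single_stranded_fraction(cot: float, k: float) -> float:
    """Ideal second-order renaturation.

    cot: initial concentration times time (mol·s/L)
    k:   rate constant for the sequence class (L/(mol·s))
    """
    return 1.0 / (1.0 + k * cot)


# Hypothetical rate constant for a single-copy sequence; the rate for a
# repeat family is assumed to scale with its copy number.
K_SINGLE_COPY = 1e-3
COPY_NUMBERS = {"single copy": 1, "1000-copy repeat": 1000}


def remaining_by_class(cot: float) -> dict[str, float]:
    """Fraction of each class still single stranded at a given C0t."""
    return {
        name: single_stranded_fraction(cot, K_SINGLE_COPY * copies)
        for name, copies in COPY_NUMBERS.items()
    }


if __name__ == "__main__":
    for cot in (1.0, 10.0, 100.0):
        fractions = remaining_by_class(cot)
        summary = ", ".join(f"{name}: {f:.3f}" for name, f in fractions.items())
        print(f"C0t = {cot:g} -> {summary}")
```

Under this model, stopping the reannealing at an intermediate C0t leaves most single-copy sequence single stranded while most of the highly repeated sequence has formed duplexes that can then be removed.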
In another embodiment, the step of complexity management comprises performing an arbitrarily primed polymerase chain reaction (AP PCR) upon the sample.
AP PCR is described in US Patent No. 5,487,985 which is hereby incorporated by reference in its entirety for all purposes. Figure 7 depicts a schematic illustration of this embodiment. Performing AP PCR with random primers which have specific nucleotides incorporated into the primers produces a reduced representation of genomic DNA in a reproducible manner. Figure 8 shows the level of complexity reduction of human genomic DNA resulting from AP PCR with various primers. Column 1 lists the primer name. Column 2 lists the primer sequence. Column 3 lists the annealing temperature. Column 4 lists the polymerase used. Column 5 lists the number correlated to a specific gene on the Hum6.8K GeneChip(R) probe array (Affymetrix, Inc., Santa Clara, CA). Column 6 lists the percentage of the human genes on the Hum6.8K GeneChip(R) probe array found by fragments whose complexity has been reduced by this method. Figure 9 shows the reproducibility of AP PCR.
Independently prepared sample preps were subjected to AP PCR using the same primers. The gel bands show that the level of reproducibility between the samples is very high.
Primers may be designed using standard techniques. For example, a computer program is available on the internet at the Operon Technologies, Inc. website at http://www.operon.com. The Operon Oligo Toolkit allows a user to input a potential primer sequence into the webform. The site will instantly calculate a variety of attributes for the oligonucleotide including molecular weight, GC content, Tm, and primer-dimer sets. You may also plot the oligonucleotide against a second sequence. PCR amplification techniques are described above in this application and will be well known to those of skill in the art.
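Two of the attributes mentioned above can be approximated with a short Python sketch. It is not the Operon tool's algorithm, which is not described in the source. The melting temperature uses the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C, a common rule of thumb for short oligonucleotides that is assumed here purely for illustration. The example primer is one of the adaptor strands given in Example 1.

```python
def gc_content(primer: str) -> float:
    """Fraction of bases that are G or C."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer: str) -> int:
    """Approximate Tm in °C by the Wallace rule (short oligos only)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    primer = "AATTCGAACCCCTTCGGATC"
    print(f"length {len(primer)}, GC {gc_content(primer):.0%}, "
          f"Tm about {wallace_tm(primer)} °C")
```

More accurate estimates, such as nearest-neighbor thermodynamic models, also depend on salt and primer concentration, which this sketch ignores.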
In another embodiment of the invention, the method of reducing the complexity of a nucleic acid sample comprises hybridizing the sample to a nucleic acid probe containing a desired sequence which is bound to a solid support, such as a magnetic bead. For a description of hybridization of nucleic acids to solid supports, see US Pat No. 5,800,992 incorporated by reference above. This sequence may comprise, for example, a sequence containing a SNP, a cDNA fragment, a chromosome fragment, a subset of genomic DNA or a subset of a library. The sequence may comprise as few as 16 nucleotides and may comprise as many as 2,000, 3,000, 5,000 or more nucleotides in length. Methods of designing and making oligonucleotide probes will be well known to those of skill in the art. In one embodiment, the probe may contain a template sequence for a PCR primer. Solid supports suitable for the attachment of nucleic acid probe sequences will be well known to those of skill in the art but may include glass beads, magnetic beads, and/or planar surfaces. Magnetic beads are commercially available from, for example, Dynal (Oslo, Norway). The nucleic acid probes may be synthesized directly on the solid support or attached to the support as a full length sequence. Protocols for attaching magnetic beads to probes are included in US Patent No. 5,512,439 which is hereby incorporated by reference for all purposes.
Standard hybridization protocols as discussed above may be employed.
Figure 11 depicts a schematic representation of one example of the above embodiment, wherein the complexity management step is utilized to facilitate genome-wide genotyping. Much of the cost of genotyping comes from multiplex PCR.
In this embodiment, the entire sample preparation can be performed in a single tube without the need for multiplex PCR. Because the desired result is to genotype a DNA sample, the desired sequence in Figure 11 contains a polymorphism. The oligonucleotide comprises 32 bases with the SNP in the center. A magnetic bead is attached to the oligonucleotide probe (Fig. 11, step 1). The probe is then exposed to, for example, fractionated genomic DNA (Fig. 11, step 2). Adaptor sequences are ligated to both ends of the fragments (Fig. 11, step 3). The fragments are then amplified (Fig. 11, step 4) and the PCR product containing the desired polymorphism may then be analyzed by various methods including, for example, hybridization to an array or single base extension (SBE). SBE is described in, for example, US Provisional Application 60/140,359 which is hereby incorporated by reference in its entirety for all purposes.
The method may further comprise exposing the hybridized sample to a single strand DNA nuclease to remove the single stranded DNA. This embodiment may further comprise ligating an adaptor sequence containing a Class IIS restriction enzyme site to the resulting duplexed DNA and digesting the duplex with the appropriate Class IIS restriction enzyme to release the attached sequences. The sequences are then isolated and a second adaptor sequence is ligated to the complex and the sequences are amplified.
Figures 12 and 13 depict schematic representations of an embodiment comprising the use of Class IIS endonucleases. Both figures depict methods which may be employed for single tube genotyping without the need for multiplex PCR. In Figures 12 and 13, the desired sequence is a SNP. The oligonucleotide probe in Figure 12 is 32 bases long and in Figure 13 is 17 bases long. In both figures the SNP is in the center of the oligonucleotide. The oligonucleotide probe is bound to a magnetic bead (Figs. 12 and 13, step 1). The probe is then hybridized to fragmented genomic DNA (Figs. 12 and 13, step 2). Single stranded DNA is digested with a single strand DNA nuclease, leaving a DNA duplex attached to the magnetic bead (Figs. 12 and 13, step 3). An adaptor sequence is then ligated to the duplex. The adaptor sequence contains a Class IIS restriction site. The probe length and Class IIS endonuclease are chosen such that the site where the duplex is cut is between the SNP and the magnetic bead. In Figure 12 the Class IIS endonuclease cuts directly adjacent to the SNP site, such that the SNP is part of the sticky end left by the endonuclease digestion (Fig. 12, step 5). In Figure 13 the endonuclease cuts closer to the magnetic bead, leaving a number of bases between the sticky end and the SNP site (Fig. 13, step 5). In either case, the magnetic bead is released and the sequences are isolated. Adaptor sequences are then ligated to the sticky ends (Figs. 12 and 13, step 6). In both Figures 12 and 13 the adaptor sequences contain templates for PCR probes. The fragments containing the SNP are then amplified (Figs. 12 and 13, step 7) and the PCR products may be analyzed by a number of different methods including hybridization to an array designed to detect SNPs or SBE.
In this embodiment, the adaptor sequence may further comprise a SNP identification sequence or tag. In this case, the array to which the PCR products are hybridized may be a generic tag array as described in the above referenced US Patent No. 5,800,992 and US Provisional Patent Application 60/140,359 or a chimeric probe array (Figure 14). A chimeric probe array contains probes which interrogate both for particular sequences characteristic of a genotype as well as for artificial sequences which have been ligated to specific fragments in the sample pool. This allows for higher specificity of hybridization and better differentiation between probes.
This embodiment is depicted in Figure 15.
In another embodiment, depicted in Figure 16, the method of complexity reduction comprises hybridizing the DNA sample to a mismatch binding protein (Fig. 16, step 2). Mismatch binding proteins are described in Wagner R. and Radman, M. (1995) "Methods: A Companion to Methods in Enzymology" 7, 199-203 which is hereby incorporated by reference in its entirety for all purposes. Mismatch binding proteins preferentially bind to DNA duplexes which contain sequence mismatches.
This allows for a relatively simple and rapid method to locate and identify SNPs. In this embodiment no prior knowledge of the SNP is required. Mismatch binding proteins are commercially available through GeneCheck (Ft. Collins, Co.). In a further embodiment, depicted in Figure 17, magnetic beads are attached to the mismatch binding proteins. Mismatch binding proteins attached to magnetic beads are commercially available through GeneCheck (Ft. Collins, Co.). After hybridization the sample is digested with a 3' to 5' exonuclease (Fig. 16, step 3). Remaining single stranded DNA is then removed with a nuclease (Fig. 16, step 4).
If it is desired to cut the duplex at the mismatch, then the enzyme resolvase may be used. See US Patent Nos. 5,958,692, 5,871,911 and 5,876,941 (each of which is incorporated by reference in its entirety for all purposes) for a description of various methods of cleaving nucleic acids. The resolvases (e.g. X-solvases of yeast and bacteriophage T4, Jensch et al. EMBO J. 8, 4325 (1989)) are nucleolytic enzymes capable of catalyzing the resolution of branched DNA intermediates (e.g., DNA cruciforms) which can involve hundreds of nucleotides. In general, these enzymes are active close to the site of DNA distortion (Bhattacharyya et al., J. Mol. Biol., 221, 1191, (1991)). T4 Endonuclease VII, the product of gene 49 of bacteriophage T4 (Kleff et al., The EMBO J. 7, 1527, (1988)) is a resolvase (West, Annu. Rev. Biochem. 61, 603, (1992)) which was first shown to resolve Holliday-structures (Mizuuchi et al., Cell 29, 357, (1982)). T4 Endonuclease VII has been shown to recognize DNA cruciforms (Bhattacharyya et al., supra; Mizuuchi et al., supra) and DNA loops (Kleff et al., supra), and it may be involved in patch repair. Bacteriophage T7 Endonuclease I has also been shown to recognize and cleave DNA cruciforms (West, Ann. Rev. Biochem. 61, 603, (1992)). Eukaryotic resolvases, particularly from the yeast Saccharomyces cerevisiae, have been shown to recognize and cleave cruciform DNA (West, supra; Jensch, et al., EMBO J. 8, 4325 (1989)). Other nucleases are known which recognize and cleave DNA mismatches. For example, S1 nuclease is capable of recognizing and cleaving DNA mismatches formed when a test DNA and a control DNA are annealed to form a heteroduplex (Shenk et al., Proc. Natl. Acad. Sci. 72, 989, (1975)). The Mut Y repair protein of E. coli is also capable of detecting and cleaving DNA mismatches.
Computer Implemented Analysis
In another embodiment a computer system is used to model the reactions discussed above to aid the user in selecting the correct experimental conditions. In this embodiment, the sequence of the DNA sample must be known. A computer program queries an electronic database containing the sequence of the DNA sample looking for sites which will be recognized by the enzyme being used. The method of modeling experiments can be employed for a wide variety of experiments.
In one embodiment, the user can run multiple experiments altering various conditions. For example, if a user desires to isolate a particular sequence of interest in a fragment which has been digested with a restriction enzyme, the user can have the computer model the possible outcomes using a wide variety of restriction enzymes.
The particular sequence which is selected may be chosen by specific criteria, e.g., because the region is believed to be associated with specific genes, polymorphisms, or phenotypes, or may be chosen at random. The user can then select the restriction enzyme which, for example, isolates the desired sequence in a fragment of unique size. Additionally or alternatively, if the user desires to reduce complexity using the type IIS nuclease/ligation technique described above, the user can experiment with the length and sequence of the adaptors to determine the optimal sequence for the adaptors' "sticky" ends. This enables the user to be confident that they will obtain a fragment containing a particular sequence of interest or to fine tune the level of complexity in the DNA pool. In another embodiment, a user could model the kinetics of the denaturing, reannealing technique for removal of repeated sequences discussed above to determine the conditions which allow for the desired result. For example, a user may desire the removal of only a certain percentage of repeated sequences.
For example, virtual restriction digests may be performed by querying an electronic database which contains the sequence of DNA of interest. Because the database contains the nucleic acid sequence and restriction enzymes cut at known locations based on the DNA sequence, one can easily predict the sequence and size of fragments which will result from a restriction digest of the DNA. Ideally, restriction enzymes which produce no two fragments of the same or very similar size are desired.
Combinations of restriction enzymes may be employed. Those of skill in the art will be familiar with electronic databases of DNA sequences. GenBank, for example, contains approximately 2,570,000,000 nucleic acid bases in 3,525,000 sequence records as of April 1999. A computer program searches the electronic database for a sequence which suits the requirements of the particular restriction enzyme. For example, the restriction enzyme Eco RI recognizes the sequence GAATTC and will cut a DNA molecule between the G and the first A. The computer program will query the chosen sequence for any occurrences of the sequence GAATTC and mark the site where the restriction enzyme will cut. The program will then provide the user with a display of the resulting fragments.
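The query-and-mark procedure just described can be sketched in Python as follows. This is a minimal illustration written for this passage and is not the program of the Exhibits. The function name and test sequence are hypothetical, and only the top strand of a linear molecule is modeled.

```python
import re


def virtual_digest(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list[str]:
    """Cut `seq` at every occurrence of `site`.

    cut_offset is the number of bases of the site left on the upstream
    fragment (1 for Eco RI, which cuts between the G and the first A).
    """
    seq = seq.upper()
    pattern = f"(?={re.escape(site.upper())})"  # lookahead finds overlapping sites
    cuts = [m.start() + cut_offset for m in re.finditer(pattern, seq)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    fragments = virtual_digest("AAGAATTCCCGAATTCTT")
    for frag in fragments:
        print(len(frag), frag)
```

Comparing the list of fragment lengths across candidate enzymes is one way a user could look for a digest in which the fragment of interest has a unique size.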
Exhibit 1 is an example of a program to conduct this type of virtual enzyme digestion. Exhibit 2 is an example of a program to virtually model the ligation of two sequences to each other.
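A virtual ligation can be sketched along similar lines. The Python example below is not the program of Exhibit 2. It uses a deliberately simple, hypothetical representation in which each fragment end is described by its single-stranded overhang written 5' to 3', and two ends are treated as ligatable when one overhang is the reverse complement of the other.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def overhangs_compatible(overhang_a: str, overhang_b: str) -> bool:
    """True when two sticky ends (each written 5' to 3') can base pair."""
    return overhang_a.upper() == reverse_complement(overhang_b)


def ligate_top_strands(left_top: str, right_top: str,
                       left_overhang: str, right_overhang: str):
    """Join two top strands if their sticky ends are compatible, else None.

    Assumes 5' overhangs, so the right fragment's top strand already
    begins with its own overhang sequence.
    """
    if not overhangs_compatible(left_overhang, right_overhang):
        return None
    return left_top.upper() + right_top.upper()


if __name__ == "__main__":
    # Two Eco RI ends (overhang AATT on each) re-form the GAATTC site.
    print(ligate_top_strands("AAG", "AATTCTT", "AATT", "AATT"))
    # An Eco RI end and an unrelated 3-base overhang do not ligate.
    print(ligate_top_strands("AAG", "ATCGG", "AATT", "ATC"))
```

With such a model a user could screen candidate adaptor overhangs against the sticky ends predicted by a virtual digest.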
In another embodiment, the method of modeling experiments in a computer system can be used to design probe arrays. A database may be interrogated for any desired sequence, for example, a polymorphism. Computer modeled reactions are then performed to help determine the method for isolating a fragment of DNA containing the sequence of interest. These methods may comprise any of the methods described above, alone or in combination. Arrays are then constructed which are designed to interrogate the resulting fragments. It is important to note that for the purpose of designing arrays, the virtual reactions need not be performed flawlessly, since the arrays may contain hundreds of thousands of sequences.
One embodiment of the invention relies on the use of virtual reactions to predetermine the sequence of chosen DNA fragments which have been subjected to various procedures. The sequence information for the chosen fragments is then used to design the probes which are to be attached to DNA arrays. Arrays may be designed and manufactured in any number of ways. For example, DNA arrays may be synthesized directly onto a solid support using methods described in, for example, US Patent Nos. 5,837,832, 5,744,305 and 5,800,992 and WO 95/11995 herein incorporated by reference for all purposes. See also, Fodor et al., Science, 251:767-777 (1991), Pirrung et al., U.S. Pat. No. 5,143,854 (see also PCT Application No. WO 90/15070) and Fodor et al., PCT Publication No. WO 92/10092 and U.S. Pat. No. 5,424,186, each of which is hereby incorporated in its entirety by reference for all purposes.
Techniques for the synthesis of these arrays using mechanical synthesis methods are described in, e.g., U.S. Pat. No. 5,384,261, incorporated herein by reference in its entirety for all purposes.
Briefly, 5,837,832 describes a tiling method for array fabrication whereby probes are synthesized on a solid support. These arrays comprise a set of oligonucleotide probes such that, for each base in a specific reference sequence, the set includes a probe (called the "wild-type" or "WT" probe) that is exactly complementary to a section of the sequence of the chosen fragment including the base of interest and four additional probes (called "substitution probes"), which are identical to the WT probe except that the base of interest has been replaced by one of a predetermined set (typically 4) of nucleotides. Probes may be synthesized to query each base in the sequence of the chosen fragment. Hybridization of a target nucleic acid sequence to a substitution probe on the array indicates the presence of a single nucleotide polymorphism. Other applications describing methods of designing tiling arrays include US Patent Nos. 5,858,659 and 5,861,242, each of which is incorporated by reference in its entirety for all purposes. In a similar manner, arrays could be constructed to test for a variety of sequence variations including deletions, repeats or base changes greater than one nucleotide. US Patent Nos. 5,593,839 and 5,856,101 (each of which is incorporated by reference for all purposes) describe methods of using computers to design arrays and lithographic masks.
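The tiling idea can be sketched in Python. For one interrogated base, the sketch below builds a probe for each of the four possible nucleotides at the central position, one of which is the wild-type probe, with every probe complementary to the corresponding target variant. The probe length, function names and reference sequence are hypothetical, and the cited patents should be consulted for the actual tiling schemes.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def tiling_probes(reference: str, position: int, length: int = 15):
    """Return (wild_type_base, {base: probe}) for one interrogated position.

    `length` should be odd so the interrogated base sits at the center.
    Each probe is the reverse complement of the target window carrying
    the given base at the central position.
    """
    half = length // 2
    start, end = position - half, position + half + 1
    if start < 0 or end > len(reference):
        raise ValueError("window extends beyond the reference sequence")
    window = reference[start:end].upper()
    probes = {
        base: reverse_complement(window[:half] + base + window[half + 1:])
        for base in "ACGT"
    }
    return window[half], probes


if __name__ == "__main__":
    wt_base, probes = tiling_probes("ACGTACGTACGTACGTACGT", position=10, length=5)
    for base, probe in probes.items():
        label = "WT" if base == wt_base else "substitution"
        print(base, probe, label)
```

Repeating the call for each position of a chosen fragment yields a probe set that queries every base in turn.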
The label used to detect the target sequences will be determined, in part, by the detection methods being applied. Thus, the labeling method and label used are selected in combination with the actual detecting systems being used. Once a particular label has been selected, appropriate labeling protocols will be applied, as described below for specific embodiments. Standard labeling protocols for nucleic acids are described, e.g., in Maniatis; Kambara, H. et al. (1988) BioTechnology 6:816-821; Smith, L. et al.
(1985) Nuc. Acids Res. 13:2399-2412; for polypeptides, see, e.g., Allen G.
(1989) Sequencing of Proteins and Peptides, Elsevier, N.Y., especially chapter 5, and Greenstein and Winitz (1961) Chemistry of the Amino Acids, Wiley and Sons, N.Y.
Carbohydrate labeling is described, e.g., in Chaplin and Kennedy (1986) Carbohydrate Analysis: A Practical Approach, IRL Press, Oxford. Other techniques such as TdT end labeling may likewise be employed. Techniques for labeling protocols for use with SBE are described in, e.g. US Provisional Patent Application 60/140,359 which is incorporated by reference above.
Generally, when using a DNA array a quickly and easily detectable signal is preferred. Fluorescent tagging of the target sequence is often preferred, but other suitable labels include heavy metal labels, magnetic probes, chromogenic labels (e.g., phosphorescent labels, dyes, and fluorophores), spectroscopic labels, enzyme linked labels, radioactive labels, and labeled binding proteins. Additional labels are described in U.S. Pat. Nos. 5,800,992 and 4,366,241, and published PCT Application WO 99/13319 which are incorporated herein by reference.
The hybridization conditions between probe and target should be selected such that the specific recognition interaction, i.e., hybridization, of the two molecules is both sufficiently specific and sufficiently stable. See, e.g., Hames and Higgins (1985) Nucleic Acid Hybridisation: A Practical Approach, IRL Press, Oxford. These conditions will be dependent both on the specific sequence and often on the guanine and cytosine (GC) content of the complementary hybrid strands. The conditions may often be selected to be universally equally stable independent of the specific sequences involved. This typically will make use of a reagent such as an alkylammonium buffer.
See, Wood et al. (1985) "Base Composition-independent Hybridization in Tetramethylammonium Chloride: A Method for Oligonucleotide Screening of Highly Complex Gene Libraries," Proc. Natl. Acad. Sci. USA, 82:1585-1588; and Krupov et al. (1989) "An Oligonucleotide Hybridization Approach to DNA Sequencing," FEBS Letters, 256:118-122; each of which is hereby incorporated herein by reference. An alkylammonium buffer tends to minimize differences in hybridization rate and stability due to GC content. By virtue of the fact that sequences then hybridize with approximately equal affinity and stability, there is relatively little bias in strength or kinetics of binding for particular sequences. Temperature and salt conditions along with other buffer parameters should be selected such that the kinetics of renaturation are essentially independent of the specific target subsequence or oligonucleotide probe involved. In order to ensure this, the hybridization reactions will usually be performed in a single incubation of all the substrate matrices together exposed to the identical target probe solution under the same conditions. The hybridization conditions will usually be selected to be sufficiently specific such that the fidelity of base matching will be properly discriminated. Of course, control hybridizations should be included to determine the stringency and kinetics of hybridization. See for example, US Patent No. 5,871,928 which is hereby incorporated in its entirety for all purposes.
Another factor that can be adjusted to increase the ability of targets to hybridize to probes is the use of nucleic acid analogs of PNAs in the probes. They can be built into the probes to create a more uniform set of hybridization conditions across the entire array. See US Patent Application No. 08/630,427 incorporated by reference above.
The detection methods used to determine where hybridization has taken place will typically depend upon the label selected. Thus, for a fluorescent label a fluorescent detection apparatus will typically be used. Pirrung et al. (1992) U.S. Pat.
No. 5,143,854 and Ser. No. 07/624,120, now abandoned, (both of which are hereby incorporated by reference for all purposes) describe apparatus and mechanisms for scanning a substrate matrix using fluorescence detection, but a similar apparatus is adaptable for other optically detectable labels. See also, US Patent Nos. 5,578,832, 5,834,758, and 5,837,832 each of which is incorporated by reference in its entirety for all purposes.
A variety of methods can be used to enhance detection of labeled targets bound to a probe attached to a solid support. In one embodiment, the protein MutS (from E. coli) or equivalent proteins such as yeast MSH1, MSH2, and MSH3; mouse Rep-3; and Streptococcus Hex-A, is used in conjunction with target hybridization to detect probe-target complexes that contain mismatched base pairs. The protein, labeled directly or indirectly, can be added during or after hybridization of target nucleic acid, and differentially binds to homo- and heteroduplex nucleic acid. A wide variety of dyes and other labels can be used for similar purposes. For instance, the dye YOYO-1 is known to bind preferentially to nucleic acids containing sequences comprising runs of 3 or more G residues. Signal amplification methods as described in US Patent Application No. 09/276,774 may likewise be used.
Various methods of hybridization detection will be known to those of skill in the art. See, for example, US Patent Nos. 5,578,832, 5,631,734, 5,744,305 and 5,800,992, each of which is hereby incorporated in its entirety for all purposes.
Examples
Example 1 - Restriction Enzyme Digest/Sizing
The complexity of total genomic DNA from human and yeast was reproducibly reduced using a restriction enzyme digestion. For each species, 0.5 ug genomic DNA was digested with 20 units of EcoRI in a total volume of 40 ul at 37 °C overnight (Figure 2, Step 1). The enzyme was inactivated by incubation at 65 °C for 10 minutes.
The DNA solution was mixed with 10 ul 5x loading buffer and separated by gel electrophoresis on a 2% agarose gel (Figure 2, Step 2). The gel was visualized by ethidium bromide staining. Fragments of 250 - 350 bp were excised from the gel and purified using a QIAquick gel extraction kit (Qiagen) (Figure 2, Step 3). Alternatively, fragments of the required size could have been isolated using HPLC.
Adaptor sequences containing PCR primer template sequences were then ligated to the purified fragments using 100 U T4 ligase in 1x T4 DNA ligase buffer (New England Biolabs) at 16 °C overnight. The adaptor sequences were 5'-d(pAATTCGAACCCCTTCGGATC)-3' and 5'-d(GATCCGAAGGGGTTCGAATT)-3' (Figure 2, Step 4). The ligase was then heat inactivated at 65 °C for 15 minutes.
The fragments were then subjected to PCR with one primer that corresponded to the PCR primer template sequence 5'-d(GATCCGAAGGGGTTCGAATT)-3' (Figure 2, Step 5). The PCR mixture contained approx. 1 ng ligated DNA fragments, 5 units AmpliTaq Gold polymerase (Perkin Elmer), 5 uM primer, 200 uM dNTPs, 15 mM Tris-HCl (pH 8.2), 50 mM KCl, 2.5 mM MgCl2 in a final volume of 50 ul. PCR was performed in a Perkin-Elmer 9600 thermocycler using an initial 10 minute denaturation at 95 °C, 35 cycles of a 1 minute denaturation at 94 °C, annealing for 1 minute at 57 °C and extension at 72 °C for [illegible] minutes. This was followed by a final 5 minute extension cycle at 72 °C.
The PCR products were then purified with a QIAquick PCR Purification kit (Qiagen) according to the manufacturer's instructions and fragmented with DNase I. The remaining fragments were then labeled with biotin-N6-ddATP as follows:
In each tube, incubate 10 ug DNA with 0.3 unit DNase I (Promega) at 37 °C for 30 minutes in a 45 ul mixture also containing 10 mM Tris-Acetate (pH 7.5), 10 mM magnesium acetate and 50 mM potassium acetate. Stop the reaction by heating the sample to 95 °C for 15 minutes. Label the sample by adding 60 units terminal transferase and 4 pmol biotin-N6-ddATP (Dupont NEN), followed by incubation at 37 °C for 90 minutes and a final heat inactivation at 95 °C for 15 minutes.
The labeled DNA was then hybridized to an array in a hybridization mixture containing 80 ug labeled DNA, 160 ug human COT-1 DNA (GIBCO), 3.5 M tetramethylammonium chloride, 10 mM MES (pH 6.5), 0.01% Triton-100, 20 ug herring sperm DNA, 100 ug bovine serum albumin and 200 pM control oligomer at 44 °C for 40 hours on a rotisserie at 40 rpm. The arrays were then washed with 0.1 M NaCl in 10 mM MES at 44 °C for 30 minutes on a rotisserie at 40 rpm. The hybridized arrays were then stained with a staining solution [10 mM MES (pH 6.5), 1 M NaCl, 10 ug/ml streptavidin R-phycoerythrin, 0.5 mg/ml acetylated BSA, 0.01% Triton-100] at 40 °C for 15 minutes. The arrays were then washed with 6x SSPET [0.9 M NaCl, 60 mM NaH2PO4 (pH 7.4), 6 mM EDTA, 0.005% Triton-100] on a GeneChip® Fluidics Station (Affymetrix, Inc., Santa Clara, CA) 10 times at 22 °C. The arrays were then stained with anti-streptavidin antibody at 40 °C for 30 minutes with antibody solution [10 mM MES (pH 6.5), 1 M NaCl, 10 ug/ml streptavidin R-phycoerythrin, 0.5 mg/ml acetylated BSA, 0.01% Triton-100]. The arrays were then restained with staining solution for 15 minutes, followed by 6x SSPET washing as above. The arrays were then scanned with a confocal scanner at 560 nm. The hybridization patterns were then screened for SNP detection with a computer program as described in D.G. Wang et al., Science 280, 1077-1082, 1998. The results of the hybridization can be seen in Figures 8A and 8B.
Example 2 - Digestion with a Type IIs Endonuclease and Selective Ligation
Complexity was reproducibly reduced after digestion with a type IIs endonuclease and selective ligation to an adaptor sequence. 2 ug of genomic DNA was digested with Bbv I at 37 °C overnight (Figure 3, Step 1). The enzyme was heat inactivated at 65 °C for 15 minutes.
Adaptors containing PCR primer template sequences were ligated in a 50 ul mixture of 400 ng digested genomic DNA, 10 pmol adaptor and 40 units T4 ligase in 1X T4 ligase buffer (Figure 3, Step 2). The adaptor sequences were as follows: 5'-d(pATNNGATCCGAAGGGGTTCGAATTC)-3' and 5'-d(GAATTCGAACCCCTTCGGATC)-3'. The ligation was conducted at 16 °C overnight. The ligase was inactivated by incubation at 65 °C for 15 minutes.
The fragments were then subjected to PCR with one primer that corresponded to the PCR primer template sequence 5'-d(GAATTCGAACCCCTTCGGATC)-3' in a 50 ul reaction containing 20 ng ligated DNA, 1 unit AmpliTaq Gold polymerase (Perkin Elmer), 3 uM primer, 200 uM dNTPs, 15 mM Tris-HCl (pH 8.0), 50 mM KCl, 2.5 mM MgCl2. PCR was performed in a Perkin-Elmer 9600 thermocycler using an initial 10 minute denaturation at 95 °C, 35 cycles of a 0.5 minute denaturation at 94 °C, annealing for 0.5 minute at 57 °C and extension at 72 °C for 2 minutes. This was followed by a final 5 minute extension cycle at 72 °C.
Example 3 - Double Digestion and Selective PCR
Human genomic DNA was digested in a 40 ul reaction at 37 °C for 1 hour. The reaction mixture contained 0.5 ug human genomic DNA, 0.5 mM DTT, 5 units EcoRI (New England Biolabs), 5 units Sau3AI (New England Biolabs), 0.5 ng/ul BSA, 10 mM Tris-Acetate (pH 7.5), 117 mM magnesium acetate and 50 mM potassium acetate. The enzymes were inactivated at 65 °C for 15 minutes.
The restriction fragments were then ligated to adaptor sequences. The ligation mixture contained: 5 pmol Eco R I adaptor [5'-d(pAATTCGAACCCCTTCGGATC)-3' and 5'-d(GATCCGAAGGGGTTCG)-3'], 50 pmol Sau3A I adaptor [5'-d(pGATCGCCCTATAGTGAGTCGTATTACAGTGGACCATCGAGGGTCA)-3'], 5 mM DTT, 0.5 ng/ul BSA, 100 units T4 DNA ligase, 1 mM ATP, 10 mM Tris-Acetate (pH 7.5), 10 mM magnesium acetate and 50 mM potassium acetate. The ligation mixture was incubated with the restriction fragments at 37 °C for 3 hours. The ligase was inactivated at 65 °C for 20 minutes.
The ligated DNA target was then amplified by PCR. The PCR mixture contained 12.5 ng ligated DNA, 1 unit AmpliTaq Gold polymerase (Perkin Elmer), 0.272 mM EcoRI selective primer (5'-AAGGGGTTCGGAATTCCC-3'; CC as the selective bases), 0.272 uM Sau3AI selective primer (5'-TCACTATAGGGCGATCTG-3'; TG as the selective bases), 200 uM dNTPs, 15 mM Tris-HCl (pH 8.0), 50 mM KCl, 2.5 mM MgCl2 in a final volume of 50 ul. PCR was performed in a Perkin-Elmer thermocycler using an initial 10 minute denaturation at 95 °C, 35 cycles of a 1 minute denaturation at 94 °C, annealing for 1 minute at 56 °C and extension at 72 °C for 2 minutes. This was followed by a final 5 minute extension at 72 °C.
Example 4 - Arbitrarily Primed PCR
PCR primers were designed with the Operon Oligo Toolkit described in the specification above.
Human genomic DNA was amplified in a 100 ul reaction containing 100 ng genomic DNA, 1.25 units AmpliTaq Gold polymerase (Perkin Elmer), 10 uM arbitrary primer, 200 mM dNTPs, 10 mM Tris-HCl (pH 8.3), 50 mM KCl and 2.5 mM MgCl2.
PCR was performed in a Perkin-Elmer 9600 thermocycler using an initial 10 minute denaturation at 95 °C, 35 cycles of a 1 minute denaturation at 94 °C, annealing for 1 minute at 56 °C and extension at 72 °C for 2 minutes. This was followed by a final 7 minute extension at 72 °C.
The PCR product was then purified, fragmented, labeled and hybridized as described in the examples above.
Example 5 - SNP discovery - Generally
As an example, the present invention may be directed to a method for simplifying the detection of, or comparing the presence or absence of, SNPs among individuals, populations, species or between different species. This invention allows for a quick and cost-effective method of comparing polymorphism data between multiple individuals. First, a reduced representation of a nucleic acid sample is produced in a repeatable and highly reproducible manner from multiple individuals, using any of the above described techniques alone or in combination. Then, the data generated by hybridizing the DNA samples collected from multiple individuals to identical arrays, in order to detect the presence or absence of a number of sequence variants, are compared. Arrays are designed to detect specific SNPs or simply to detect the presence of a region known to frequently contain SNPs. In the latter case, other techniques such as sequencing could be employed to identify the SNP.
SNP discovery - Method 1
Typically, the detection of SNPs has been made using at least one procedure in which the nucleic acid sequence that may contain the SNP is amplified using PCR primers. This can create an expense if many SNPs are to be evaluated or tested, and it adds significantly more time to the experiment for primer design, selection and testing. The following example eliminates the need for the specific PCR amplification step or steps. First, using the methods provided in Example 1 above, a restriction enzyme or enzymes is used to cut genomic DNA at a large number of sites and a size range of restriction fragments is selected for assay. An electronic database, such as GenBank, is queried to determine which sequences would be cut with the specific restriction enzyme(s) that were selected above. The sequences of the resulting fragments are then used to design DNA arrays which will screen the regions for the SNPs or other variants. The selected fragments are then subjected to further fragmentation and hybridized to the array for analysis.
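The database query step described above can be sketched in code. The following is a minimal illustration only, not the method of the specification: it assumes a sequence already retrieved as a plain string, uses the EcoRI recognition site GAATTC (cut after the first G on the top strand) and the 250 - 350 bp window from Example 1, and the function names `digest` and `size_select` are hypothetical.

```python
import re


def digest(sequence, site="GAATTC", cut_offset=1):
    """Cut `sequence` at every occurrence of `site`.

    `cut_offset` is the position of the cut within the site on the top
    strand (1 for EcoRI, which cuts G^AATTC). Only the top strand is
    modeled; overhangs are ignored in this sketch.
    """
    seq = sequence.upper()
    # A lookahead match finds every site occurrence, including overlapping ones.
    cuts = [m.start() + cut_offset for m in re.finditer("(?=" + site + ")", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def size_select(fragments, min_len=250, max_len=350):
    """Keep fragments whose length falls in the selected size range."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


if __name__ == "__main__":
    # Toy sequence (an assumption for illustration): two EcoRI sites.
    toy = "A" * 10 + "GAATTC" + "C" * 300 + "GAATTC" + "T" * 5
    fragments = digest(toy)
    print([len(f) for f in fragments])
    print(len(size_select(fragments)))
```

The fragments returned by `size_select` are the ones whose sequences would be candidates for array design; a real workflow would also need to handle both strands, ambiguous bases, and multiple enzymes.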
SNP discovery - Method 2
Alternatively, the method provided in Example 2 above may be employed: type IIs restriction enzymes cut genomic DNA from each individual, and adaptor sequences are designed to ligate to specific fragments as desired. Adaptor sequences may include both random and specific nucleotide ends as required to produce the desired result. If desired, amplification primers may be designed to hybridize to the adaptor sequences, allowing for amplification of only the fragments of interest. An electronic database and computer modeling system may be used to aid in the selection of appropriate experimental conditions and to design the appropriate arrays. The fragments are then hybridized to the array for analysis.
SNP discovery - Method 3
As another alternative, MutS protein was used to isolate DNA containing SNPs for analysis on an array. 3 ug of DNA was fragmented with Eco R I (alternatively, DNase I could have been used). At this point an equal amount of control DNA was added (this step is optional).
0.5 ug of the fragments were denatured at 95 °C for 10 minutes and gradually cooled to 65 °C over a 60 minute period. The fragments were then incubated at 65 °C for 30 minutes and the temperature was ramped down to 25 °C over a 60 minute period.
1.5 ug MutS protein (Epicentre) was then added and allowed to incubate at room temperature for 15 minutes to allow for binding (Figure 7, Step 1). The bound fragments were then digested with 20 units T7 polymerase (New England Biolabs) at 30 °C for 30 minutes (Figure 7, Step 2). The T7 polymerase was inactivated by incubation at 65 °C for 10 minutes.
Single stranded DNA was trimmed with 100 units of nuclease S1 (Boehringer-Mannheim) at 16 °C for 15 minutes (Figure 7, Step 3). The enzyme was inactivated by adding 50 nmol EDTA and incubating at 65 °C for 15 minutes.
Adaptor sequences containing PCR primer templates were then ligated to the DNA sequences in a 10 ul ligation mixture [1 ul DNA solution, 4 ul dH2O, 1 ul 10X T4 DNA ligase buffer, 3 ul 10 mM adaptor (5'-d(GATCCGAAGGGGTTCGAATT)-3' and 5'-d(pGAATTCGAACCCCTTCGGATC)-3') and 1 ul 400 U/ul T4 DNA ligase], incubated at 16 °C overnight, and then inactivated at 65 °C for 15 minutes (Figure 7, Step 4).
The sequences were amplified in a 25 ul reaction containing 0.25 pmol template DNA, 0.125 units AmpliTaq Gold polymerase (Perkin Elmer), 3 uM primer [5'-d(GATCCGAAGGGGTTCGAATT)-3'], 200 uM dNTPs, 15 mM Tris-HCl (pH 8.0), 50 mM KCl and 1.5 mM MgCl2.
PCR was performed in an MJ Research Tetrad thermocycler using an initial 10 minute denaturation at 95 °C, 35 cycles of a 0.5 minute denaturation at 94 °C, annealing for 0.5 minute at 57 °C and extension at 72 °C. This was followed by a final 5 minute extension at 72 °C.
The sequences were then labeled and hybridized to an array as described above.
SNP discovery - Method 4
As another alternative, oligonucleotides attached to magnetic beads may be used for allele specific SNP enrichment and genotyping. Synthesized biotin-tagged oligonucleotides containing sequences complementary to the regions of desired SNPs were mixed with target DNA in a 1000:1 ratio. (Alternatively, a 10:1, 20:1, 50:1, 250:1 or any other ratio could have been chosen.) The sample was then denatured at 95 °C for 10 minutes and allowed to reanneal by slowly cooling to room temperature.
The sample was then bound to streptavidin-magnetic beads (Promega) by mixing the sample and the beads and incubating at room temperature for 10 minutes.
The beads were then washed with 1X MES with 1 M sodium chloride (NaCl) three times. The beads were then resuspended in 50 ul 1X mung bean nuclease buffer and mixed with 1 unit of mung bean nuclease. The beads were then incubated at 30 °C for 15 minutes. The mung bean nuclease was then inactivated by adding 1% SDS. The beads were then washed with 1X MES with 1 M NaCl three times.
The beads were then resuspended in ligation mixture containing T4 ligase in 1X T4 ligase buffer and 200 fold excess adaptor I sequence [5'-d(ATTAACCCTCACTAAAGCTGGAG)-3' and 5'-d(pCTCCAGCTTTAGTGAGGGTTAAT)-3'; BpmI recognition sites are highlighted in boldface] at 16 °C overnight. The ligase was then inactivated by incubation at 65 °C for 10 minutes.
The beads were then washed with 1X MES with 1 M NaCl three times and then resuspended in 50 ul 1X Bpm I restriction buffer. Bpm I was then added and the beads were incubated at 37 °C for 1 hr. The enzyme was inactivated by incubation at 65 °C for 10 minutes, and the supernatant solution with the sequences containing the desired SNPs was collected.
A second set of adaptor sequences containing PCR template sequences [5'-d(pCTATAGTGAGTCGTATT)-3' and 5'-d(AATACGACTCACTATAGNN)-3'] and ligase were then added to the supernatant solution and incubated at 16 °C overnight. The ligase was then heat inactivated at 65 °C for 10 minutes.
The samples were then amplified with PCR using T3 [5'-d(ATTAACCCTCACTAAAG)-3'] and T7 [5'-d(TAATACGACTCACTATAGGG)-3'] sequencing primers (Operon) in a 50 ml reaction containing 106 copies of each target DNA, 1 unit AmpliTaq Gold polymerase (Perkin Elmer), 2 uM each primer, 200 uM dNTPs, 15 mM Tris-HCl (pH 8.0), 50 mM KCl and 2.5 mM MgCl2.
PCR was performed in an MJ Research Tetrad thermocycler using an initial 10 minute denaturation at 95 °C, 45 cycles of a 0.5 minute denaturation at 94 °C, annealing for 0.5 minute at [illegible]2 °C and extension at 72 °C for 1 minute. This was followed by a final 5 minute extension at 72 °C. The fragments were then labeled and hybridized to an array.
Methods of Use
The present methods of sample preparation and analysis are appropriate for a wide variety of applications. Any analysis of genomic DNA may benefit from a reproducible method of complexity management.
As a preferred embodiment, the present procedure can be used for SNP discovery and to genotype individuals. For example, any of the procedures described above, alone or in combination, could be used to isolate the SNPs present in one or more specific regions of genomic DNA. Arrays could then be designed and manufactured on a large scale basis to interrogate only those fragments containing the regions of interest. Thereafter, a sample from one or more individuals would be obtained and prepared using the same techniques which were used to design the array. Each sample can then be hybridized to a pre-designed array and the hybridization pattern can be analyzed to determine the genotype of each individual or of a population of individuals as a whole. Methods of use for polymorphisms can be found in, for example, co-pending U.S. application 08/813,159. Some methods of use are briefly discussed below.
Correlation of Polymorphisms with Phenotypic Traits
Some polymorphisms occur within a protein coding sequence and contribute to phenotype by affecting protein structure. The effect may be neutral, beneficial or detrimental, or both beneficial and detrimental, depending on the circumstances. For example, a heterozygous sickle cell mutation (which involves a single nucleotide polymorphism) confers resistance to malaria, but a homozygous sickle cell mutation is usually lethal. Other polymorphisms occur in noncoding regions but may exert phenotypic effects indirectly via influence on replication, transcription, and translation. A single polymorphism may affect more than one phenotypic trait. Likewise, a single phenotypic trait may be affected by polymorphisms in different genes. Further, some polymorphisms predispose an individual to a distinct mutation that is causally related to a certain phenotype.
Phenotypic traits include diseases that have known but hitherto unmapped genetic components (e.g., agammaglobulinemia, diabetes insipidus, Lesch-Nyhan syndrome, muscular dystrophy, Wiskott-Aldrich syndrome, Fabry's disease, familial hypercholesterolemia, polycystic kidney disease, hereditary spherocytosis, von Willebrand's disease, tuberous sclerosis, hereditary hemorrhagic telangiectasia, familial colonic polyposis, Ehlers-Danlos syndrome, osteogenesis imperfecta, and acute intermittent porphyria). Phenotypic traits also include symptoms of, or susceptibility to, multifactorial diseases of which a component is or may be genetic, such as autoimmune diseases, inflammation, cancer, diseases of the nervous system, and infection by pathogenic microorganisms. Some examples of autoimmune diseases include rheumatoid arthritis, multiple sclerosis, diabetes (insulin-dependent and non-insulin-dependent), systemic lupus erythematosus and Graves disease. Some examples of cancers include cancers of the bladder, brain, breast, colon, esophagus, kidney, leukemia, liver, lung, oral cavity, ovary, pancreas, prostate, skin, stomach and uterus. Phenotypic traits also include characteristics such as longevity, appearance (e.g., baldness, obesity), strength, speed, endurance, fertility, and susceptibility or receptivity to particular drugs or therapeutic treatments.
Correlation is performed for a population of individuals who have been tested for the presence or absence of a phenotypic trait of interest and for polymorphic marker sets. To perform such analysis, the presence or absence of a set of polymorphisms (i.e., a polymorphic set) is determined for a set of the individuals, some of whom exhibit a particular trait and some of whom lack the trait. The alleles of each polymorphism of the set are then reviewed to determine whether the presence or absence of a particular allele is associated with the trait of interest.
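The allele-by-trait review described above can be illustrated with a 2x2 contingency table and a Pearson chi-squared statistic. This is a minimal sketch under stated assumptions: the counts are invented for illustration, the function name `chi_squared_2x2` is hypothetical, and no continuity correction is applied. The critical value 3.841 is the standard 5% threshold for one degree of freedom.

```python
def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic for a 2x2 table.

                     trait present   trait absent
    allele present         a               b
    allele absent          c               d
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    # Closed form for a 2x2 table, equivalent to sum((O - E)^2 / E).
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


CRITICAL_5_PERCENT_1_DF = 3.841  # chi-squared critical value, df = 1, alpha = 0.05

if __name__ == "__main__":
    # Hypothetical counts: 30 carriers with the trait, 10 carriers without,
    # 10 non-carriers with the trait, 30 non-carriers without.
    stat = chi_squared_2x2(30, 10, 10, 30)
    print(stat, stat > CRITICAL_5_PERCENT_1_DF)
```

When many polymorphisms are tested at once, the threshold would normally be adjusted for multiple comparisons; that adjustment is outside this sketch.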
Correlation can be performed by standard statistical methods such as a chi-squared test, and statistically significant correlations between polymorphic form(s) and phenotypic characteristics are noted. For example, it might be found that the presence of allele A1 at polymorphism A correlates with heart disease. As a further example, it might be found that the combined presence of allele A1 at polymorphism A and allele B1 at polymorphism B correlates with increased milk production of a farm animal. (See Beitz et al., US 5,292,639.)
Genetic Mapping of Phenotypic Traits
Linkage analysis is useful for mapping a genetic locus associated with a phenotypic trait to a chromosomal position, and thereby cloning gene(s) responsible for the trait. See Lander et al., Proc. Natl. Acad. Sci. (USA) 83, 7353-7357 (1986); Lander et al., Proc. Natl. Acad. Sci. (USA) 84, 2363-2367 (1987); Donis-Keller et al., Cell 51, 319-337 (1987); Lander et al., Genetics 121, 185-199 (1989). Genes localized by linkage can be cloned by a process known as directional cloning. See Wainwright, Med. J. Australia 159, 170-174 (1993); Collins, Nature Genetics 1, 3-6 (1992) (each of which is incorporated by reference in its entirety for all purposes).
Linkage studies are typically performed on members of a family. Available members of the family are characterized for the presence or absence of a phenotypic trait and for a set of polymorphic markers. The distribution of polymorphic markers in an informative meiosis is then analyzed to determine which polymorphic markers co-segregate with a phenotypic trait. See, e.g., Kerem et al., Science 245, 1073-(1989); Monaco et al., Nature 316, 842 (1985); Yamoka et al., Neurology 40, (1990); Rossiter et al., FASEB Journal 5, 21-27 (1991).
Disequilibrium mapping of the entire genome
Linkage disequilibrium or allelic association is the preferential association of a particular allele or genetic marker with a specific allele or genetic marker at a nearby chromosomal location more frequently than expected by chance for any particular allele frequency in the population. For example, if locus X has alleles a and b, which occur equally frequently, and linked locus Y has alleles c and d, which occur equally frequently, one would expect the combination ac to occur with a frequency of 0.25. If ac occurs more frequently, then alleles a and c are in linkage disequilibrium. Linkage disequilibrium may result from natural selection of certain combinations of alleles or because an allele has been introduced into a population too recently to have reached equilibrium with linked alleles.
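The ac example above corresponds to the standard linkage disequilibrium coefficient D, the observed haplotype frequency minus the product of the two allele frequencies. The sketch below is illustrative only; the observed frequency of 0.35 is an assumed value and the function name `ld_coefficient` is hypothetical.

```python
def ld_coefficient(freq_ac, freq_a, freq_c):
    """Return D = f(ac) - f(a) * f(c).

    D is zero when alleles a and c associate only as often as chance
    predicts, and positive when haplotype ac is over-represented.
    """
    for f in (freq_ac, freq_a, freq_c):
        if not 0.0 <= f <= 1.0:
            raise ValueError("frequencies must lie between 0 and 1")
    return freq_ac - freq_a * freq_c


if __name__ == "__main__":
    # Alleles a and c each at frequency 0.5, so chance predicts ac at 0.25.
    print(ld_coefficient(0.25, 0.5, 0.5))  # no disequilibrium
    # Assumed observation: ac seen at frequency 0.35.
    print(ld_coefficient(0.35, 0.5, 0.5))  # positive D, a and c in disequilibrium
```

In practice D is often normalized (for example as D' or r squared) so that values are comparable across loci with different allele frequencies.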
A marker in linkage disequilibrium can be particularly useful in detecting susceptibility to disease (or other phenotype) notwithstanding that the marker does not cause the disease. For example, a marker (X) that is not itself a causative element of a disease, but which is in linkage disequilibrium with a gene (including regulatory sequences) (Y) that is a causative element of a phenotype, can be detected to indicate susceptibility to the disease in circumstances in which the gene Y may not have been identified or may not be readily detectable.
Marker assisted breeding
Genetic markers can decipher the genomes of animals and crop plants. Genetic markers can aid a breeder in understanding, selecting and managing the genetic complexity of an agronomic or desirable trait. The agriculture world, for example, has a great deal of incentive to try to produce food with a rising number of desirable traits (high yield, disease resistance, taste, smell, color, texture, etc.) as consumer demand and expectations increase. However, many traits, even when the molecular mechanisms are known, are too difficult or costly to monitor during production. Readily detectable polymorphisms which are in close physical proximity to the desired genes can be used as a proxy to determine whether the desired trait is present or not in a particular organism. This provides an efficient screening tool which can accelerate the selective breeding process.
Pharmacogenomics
Genetic information can provide a powerful tool for doctors to determine what course of medicine is best for a particular patient. A recent Science paper entitled "Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring" (to be published 10/15/99, hereby incorporated by reference in its entirety for all purposes) discusses the use of genetic information discovered through the use of arrays to determine the specific type of cancer a particular patient has. The paper goes on to discuss the ways in which particular treatment options can then be tailored for each patient's particular type of cancer. Similar uses of genetic information for treatment plans have been disclosed for patients with HIV. (See US Patent Application 5,861,242).
The pharmaceutical industry is likewise interested in the area of pharmacogenomics. Every year pharmaceutical companies suffer large losses from drugs which fail clinical trials for one reason or another. Some of the most difficult cases are those drugs which, while being highly effective for a large percentage of the population, prove dangerous or even lethal for a very small percentage of the population. Pharmacogenomics can be used to correlate a specific genotype with specific responses to a drug. The basic idea is to get the right drug to the right patient. If pharmaceutical companies (and later, physicians) can accurately remove from the potential recipient pool those patients who would suffer adverse responses to a particular drug, many research efforts which are currently being dropped by pharmaceutical companies could be resurrected, saving hundreds of thousands of dollars for the companies and providing many currently unavailable medications to patients.
Similarly, some medications may be highly effective for only a very small percentage of the population while proving only slightly effective or even ineffective for a large percentage of patients. Pharmacogenomics allows pharmaceutical companies to predict which patients would be the ideal candidates for a particular drug, thereby dramatically reducing failure rates and providing greater incentive to companies to continue to conduct research into those drugs.
Forensics
The capacity to identify a distinguishing or unique set of forensic markers in an individual is useful for forensic analysis. For example, one can determine whether a blood sample from a suspect matches a blood or other tissue sample from a crime scene by determining whether the set of polymorphic forms occupying selected polymorphic sites is the same in the suspect and the sample. If the set of polymorphic markers does not match between a suspect and a sample, it can be concluded (barring experimental error) that the suspect was not the source of the sample. If the set of markers does match, one can conclude that the DNA from the suspect is consistent with that found at the crime scene. If frequencies of the polymorphic forms at the loci tested have been determined (e.g., by analysis of a suitable population of individuals), one can perform a statistical analysis to determine the probability that a match of suspect and crime scene sample would occur by chance.
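One common form of the statistical analysis mentioned above is the product rule. The sketch below is an assumption-laden illustration, not a method stated in the specification: it assumes Hardy-Weinberg proportions at each locus (p squared for a homozygote, 2pq for a heterozygote) and statistical independence between loci, and the allele frequencies and the function name `random_match_probability` are hypothetical.

```python
def random_match_probability(genotypes, allele_freqs):
    """Probability that a random individual shares the whole profile.

    genotypes    : list of (allele1, allele2) tuples, one per locus
    allele_freqs : list of dicts mapping allele -> population frequency,
                   in the same locus order as `genotypes`
    """
    if len(genotypes) != len(allele_freqs):
        raise ValueError("need one frequency table per locus")
    probability = 1.0
    for (a1, a2), freqs in zip(genotypes, allele_freqs):
        p, q = freqs[a1], freqs[a2]
        # Hardy-Weinberg genotype frequency at this locus.
        probability *= p * p if a1 == a2 else 2 * p * q
    return probability


if __name__ == "__main__":
    # Hypothetical two-locus SNP profile and population frequencies.
    profile = [("A", "G"), ("T", "T")]
    freqs = [{"A": 0.5, "G": 0.5}, {"T": 0.1, "C": 0.9}]
    print(random_match_probability(profile, freqs))
```

Loci in linkage disequilibrium violate the independence assumption, so a real analysis would choose unlinked markers or correct for the dependence.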
Paternity Testing/Determination of Relatedness
The object of paternity testing is usually to determine whether a male is the father of a child. In most cases, the mother of the child is known and thus the mother's contribution to the child's genotype can be traced. Paternity testing investigates whether the part of the child's genotype not attributable to the mother is consistent with that of the putative father. Paternity testing can be performed by analyzing sets of polymorphisms in the putative father and the child. Of course, the present invention can be expanded to the use of this procedure to determine if one individual is related to another. Even more broadly, the present invention can be employed to determine how related one individual is to another, for example, between races or species.
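The consistency check described above can be sketched per locus as follows. This is a simplified illustration under stated assumptions: genotypes are given as unordered allele pairs, mutation and genotyping error are ignored, and the function names are hypothetical.

```python
def paternal_candidates(child, mother):
    """Alleles of the child that could have come from the father.

    If one child allele can be attributed to the mother, the other one
    must be paternal. Both are candidates when either could be maternal.
    """
    c1, c2 = child
    candidates = set()
    if c1 in mother:
        candidates.add(c2)
    if c2 in mother:
        candidates.add(c1)
    return candidates


def locus_consistent(child, mother, father):
    """True if the putative father carries a possible paternal allele."""
    return any(allele in father for allele in paternal_candidates(child, mother))


def paternity_consistent(children, mothers, fathers):
    """True only if every tested locus is consistent with paternity."""
    return all(
        locus_consistent(c, m, f) for c, m, f in zip(children, mothers, fathers)
    )


if __name__ == "__main__":
    # Hypothetical single-locus case: the child's G cannot come from the mother.
    print(locus_consistent(("A", "G"), ("A", "A"), ("G", "T")))  # father has G
    print(locus_consistent(("A", "G"), ("A", "A"), ("A", "T")))  # father lacks G
```

A single inconsistent locus excludes paternity only under the no-mutation assumption; consistency at every locus does not prove paternity and would usually be followed by a likelihood calculation using allele frequencies.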
Conclusion
From the foregoing it can be seen that the advantage of the present invention is that it provides a flexible and scalable method for analyzing complex samples of DNA, such as genomic DNA. These methods are not limited to any particular type of nucleic acid sample: plant, bacterial, animal (including human) total genome DNA, RNA, cDNA and the like may be analyzed using some or all of the methods disclosed in this invention. This invention provides a powerful tool for analysis of complex nucleic acid samples. From experiment design to isolation of desired fragments and hybridization to an appropriate array, the above invention provides for faster, more efficient and less expensive methods of complex nucleic acid analysis.
All publications and patent applications cited above are incorporated by reference in their entirety for all purposes to the same extent as if each individual publication or patent application were specifically and individually indicated to be so incorporated by reference. Although the present invention has been described in some detail by way of illustration and example for purposes of clarity and understanding, it will be apparent that certain changes and modifications may be practiced within the scope of the appended claims.
EXHIBIT A
#!/internet/bin/perl5.002 -w
# Copyright (c) 1998
# Eugene Wang
# *** BEGIN ***
#----------------------------------------------------------------------
#input sequence (File 0) to compare
#----------------------------------------------------------------------
if ($#ARGV < 2) {die "argv < 2";}
open(EnzymeInput,$ARGV[0]) || die "Cannot open input file $ARGV[0]";
#print "Input Enzyme 1 sequence = ";
$E1sequence = <EnzymeInput>;      # line 1: enzyme 1 recognition sequence
chomp $E1sequence;
$lenE1Seq = length($E1sequence);
$E1sequence =~ tr/a-z/A-Z/;
$E1ExtLoc = <EnzymeInput>;        # line 2: enzyme 1 extension, printed as (N)$E1ExtLoc
chomp($E1ExtLoc);
$lenE1Total = $lenE1Seq + $E1ExtLoc;
#print "Input Enzyme 2 sequence = ";
$E2sequence = <EnzymeInput>;      # line 3: enzyme 2 recognition sequence
chomp $E2sequence;
$E2sequence = reverse($E2sequence);
$lenE2Seq = length($E2sequence);
$E2sequence =~ tr/a-z/A-Z/;
$E2ExtLoc = <EnzymeInput>;        # line 4: enzyme 2 extension
chomp($E2ExtLoc);
$lenE2Total = $lenE2Seq + $E2ExtLoc;
$lenE1Extra = $E2ExtLoc - $E1ExtLoc;
$E1SizeStart = <EnzymeInput>;     # line 5: smallest segment size to select
chomp($E1SizeStart);
$E1SizeEnd = <EnzymeInput>;       # line 6: largest segment size to select
chomp($E1SizeEnd);
#----------------------------------------------------------------------
#open input FASTA file (File 1)
#----------------------------------------------------------------------
#print "Input file name = ";
#$fname = <>;
#chomp $fname;
#$fname = "H DJ0167F23.aeq";
open(Infile,$ARGV[1]) || die "Cannot open input file $ARGV[1]";
#----------------------------------------------------------------------
#open output file (File 2)
#----------------------------------------------------------------------
open (Outfile,">$ARGV[2]") || die "Cannot open output file $ARGV[2]";
#open (Outfile,">output.txt");
#print Outfile "Qualifier\tSequence";
#----------------------------------------------------------------------
#read input FASTA file
#----------------------------------------------------------------------
$line = <Infile>;  #header line
print Outfile "$line";
$linecount = 0;
$FullSeq = "";
#----------------------------------------------------------------------
#check headerline format
#----------------------------------------------------------------------
chomp $line;
@fields = split (/\|/,$line);
$ntokens = 0;
foreach (@fields) {$ntokens++;}
#$ntokens = @fields;
if ($ntokens > 3) {$FragmentID = $fields[3];}
else {
  $line =~ s/^> />/;
  @fields = split (/ /,$line);
  $ntokens = 0;
  foreach (@fields) {$ntokens++;}
  if ($ntokens > 0) {$FragmentID = $fields[0]; $FragmentID =~ s/^>//;}
  else {$FragmentID = "UnknownFragment";}
}
while ($line = <Infile>)  #read in a line
{
  chomp $line;
  # print "$line\n";
  $linecount++;
  next if ($line eq "");
  if ($line =~ /^#/ || $line =~ /^>/)  ##if first char is a '#' or '>'
  {
    &CompareSeqWithEnzyme_ClassIIs();  ##compare the sequences before this line
    print Outfile "\n\n\n$line\n";
    $FullSeq = "";
    $linecount = 0;
    @fields = split (/\|/,$line);
    $FragmentID = $fields[3];
  }
  else {$FullSeq .= $line;}
}
#print Outfile "$FullSeq";
close (Infile);
#----------------------------------------------------------------------
#compare sequence with FASTA input
#----------------------------------------------------------------------
&CompareSeqWithEnzyme_ClassIIs();
#----------------------------------------------------------------------
#close output file
#----------------------------------------------------------------------
close (Outfile);
######################################################################
#compare sequence with FASTA input
######################################################################
sub CompareSeqWithEnzyme_ClassIIs()
{
  $lenFullSeq = length($FullSeq);
  if ($lenFullSeq <= 0) {return(0);}
  print Outfile "TotalLength:\t$lenFullSeq\n";
  print Outfile "Enzyme top strand: ";
  print Outfile "(5\'-$E1sequence";
  if ($E1ExtLoc>0) {print Outfile "(N)$E1ExtLoc";}
  print Outfile "-3\')";
  print Outfile "\n";
  print Outfile "Enzyme bottom strand: ";
  print Outfile "(5\'-";
  if ($E2ExtLoc>0) {print Outfile "(N)$E2ExtLoc";}
  print Outfile "$E2sequence-3\')";
  print Outfile " or ";
  my $ts = reverse($E2sequence);
  print Outfile "(3\'-$ts";
  if ($E2ExtLoc>0) {print Outfile "(N)$E2ExtLoc";}
  print Outfile "-5\')";
  print Outfile "\n";
  print Outfile "Segment size: $E1SizeStart - $E1SizeEnd\n";
  $minLen = $lenE1Total < $lenE2Total ? $lenE1Total : $lenE2Total;
  $maxLen = $lenE1Total > $lenE2Total ? $lenE1Total : $lenE2Total;
  $nMatchE1 = 0;
  $nSelected = 0;
  @EnzLocLeft = ();
  @EnzLocRight = ();
  @EnzTypeLeft = ();
  @EnzTypeRight = ();
  if ($minLen > 0)
  {
    # for ($i=0; $i <= $lenFullSeq-$lenE1Seq; $i++)
    for ($i=0; $i <= $lenFullSeq-$maxLen; $i++)
    {
      if (substr($FullSeq,$i,$lenE1Seq) eq $E1sequence)
      {
        # $EnzLocLeft[$nMatchE1] = $i + $lenE1Total;  ##have to use push()
        # $EnzTypeLeft[$nMatchE1] = 1;
        push(@EnzLocLeft,$i + $lenE1Total);
        push(@EnzTypeLeft,1);
        # print Outfile "$nMatchE1\t$i\t";
        # print Outfile "type 1\t";
        # print Outfile "$E1sequence\t";
        # print Outfile substr($FullSeq,$i,$lenE1Total);
        # print Outfile "\n";
        if ($nMatchE1 > 0)
        {
          # a new match closes the previous segment
          push(@EnzLocRight,$i + $lenE1Total-1);
          push(@EnzTypeRight,1);
        }
        $nMatchE1++;
      }
      # if (substr($FullSeq,$i+$E2ExtLoc,$lenE2Seq) eq $E2sequence)
      elsif (substr($FullSeq,$i+$E2ExtLoc,$lenE2Seq) eq $E2sequence)
      {
        # $EnzLocLeft[$nMatchE1] = $i;
        # $EnzCutLeft[$nMatchE1] = 2;
        push(@EnzLocLeft,$i);
        push(@EnzTypeLeft,2);
        # print Outfile "$nMatchE1\t$i\t";
        # print Outfile "type 2\t";
        # print Outfile "$E2sequence\t";
        # print Outfile substr($FullSeq,$i,$lenE2Total);
        # print Outfile "\n";
        if ($nMatchE1 > 0)
        {
          push(@EnzLocRight,$i-1);
          push(@EnzTypeRight,2);
        }
        $nMatchE1++;
      }
    }
  }
  if ($nMatchE1 > 0) {
    # close the final segment at the last scanned position
    push(@EnzLocRight,$i-1);
    push(@EnzTypeRight,2);
  }
  print Outfile "Number of segments: $nMatchE1\n";
  if ($nMatchE1 != ($#EnzLocRight+1)) {die ("Counting error...nMatchE1($nMatchE1) != $#EnzLocRight");}
  print Outfile "Matched loci:\n";
  for ($i=0; $i < $nMatchE1; $i++) {
    print Outfile "$EnzLocLeft[$i]\t";
  }
  print Outfile "\nSegment Size:\n";
  for ($i=0; $i < $nMatchE1-1; $i++) {
    $tmpSegSize = $EnzLocRight[$i] - $EnzLocLeft[$i] + 1;
    if ($tmpSegSize >= $E1SizeStart && $tmpSegSize <= $E1SizeEnd) {
      $SegSelected[$nSelected++] = $i;
    }
    print Outfile "$tmpSegSize\t";
  }
  ##--------------------------------------------------------------------
  ## print out the Segment (E1) sequences
  ##--------------------------------------------------------------------
  print Outfile "\nSegments Selected ($nSelected):";
  for ($i=0; $i < $nSelected; $i++) {
    $selSeq = $SegSelected[$i];
    $E1left = $EnzLocLeft[$selSeq];
    $E1right = $EnzLocRight[$selSeq];
    if ($lenE1Extra > 0) {$E1right += $lenE1Extra;}
    else {$E1left += $lenE1Extra;}
    $lenSelSeq = $E1right - $E1left + 1;
    $OutputHeaderLine = ">" . $FragmentID ."-" .$selSeq . "\tsize=" . $lenSelSeq;
    $OutputHeaderLine .= "\tLoci=" . $E1left . "-" . $E1right;
    $OutputHeaderLine .= "\tEnz$EnzTypeLeft[$selSeq]-Enz$EnzTypeRight[$selSeq]";
    print Outfile "\n$OutputHeaderLine";
    print "$OutputHeaderLine";
    # Segment sequence
    $SeqE1toNextE1 = substr($FullSeq,$E1left,$lenSelSeq);
    print Outfile "\n$SeqE1toNextE1\n";
    print "\n$SeqE1toNextE1\n";
  }
  return ($lenFullSeq);
}
EXHIBIT B
#!/internet/bin/perl5.002 -w
#****************************************************************
# Copyright (c) 1998
# Author: Eugene Wang
# Title: Ligate
# Purpose: Find matching segments/sequences in two files
#****************************************************************
if ($#ARGV != 2) {die "Number of argv ($#ARGV+1) != 3";}
#----------------------------------------------------------------------
#input file
#----------------------------------------------------------------------
open(InfileLigate,$ARGV[0]) or die "Open error...$ARGV[0]\n";
$locLigate = <InfileLigate>;      # line 1: offset of the adapter sequence from each end
chomp $locLigate;
$seqLigate = <InfileLigate>;      # line 2: adapter sequence to match
chomp $seqLigate;
close (InfileLigate);
#----------------------------------------------------------------------
#input FASTA file and output file
#----------------------------------------------------------------------
open(Infile,$ARGV[1]) or die "Open error...$ARGV[1]\n";
$OutName = $ARGV[2];
open (Outfile,">$OutName") or die("Open error...$OutName");
$alreadyReadOne = 0;
$sequence = "";
while ($line = <Infile>)  #read in a line
{
  chomp $line;
  next if ($line eq "");
  if ($line =~ /^#/ || $line =~ /^>/)  ##if first char is a '#' or '>'
  {
    if ($alreadyReadOne == 1) {
      if (&Ligate($sequence,$locLigate,$seqLigate) == 1) {
        print Outfile "$headerLine\n";
        print Outfile "$sequence\n";
      }
      $sequence = "";
    }
    $headerLine = $line;
    $alreadyReadOne = 1;
  }
  else {
    $sequence .= $line;
  }
}
# test the last record, which is not followed by another header line
if ($alreadyReadOne == 1) {
  if (&Ligate($sequence,$locLigate,$seqLigate) == 1) {
    print Outfile "$headerLine\n";
    print Outfile "$sequence\n";
  }
}
close (Infile);
close (Outfile);
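######################################################################
#compare sequence with Ligation Adapter sequence
######################################################################
sub Ligate()
{
  local $retcode = 0;
  local ($seq,$locLigate,$seqLigate) = @_;
  local $lenLigate = length($seqLigate);
  local $lenSeq = length($seq);
  # return 1 only if the adapter sequence is found at the given offset from both ends
  if ((substr($seq,$locLigate,$lenLigate) eq $seqLigate) &&
      (substr($seq,$lenSeq-$locLigate-$lenLigate,$lenLigate) eq $seqLigate))
  {
    $retcode = 1;
  }
  return $retcode;
}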
92/10092 and U.S. Pat. No. 5,424,186, each of which is hereby incorporated in its entirety by reference for all purposes. Techniques for the synthesis of these arrays using mechanical synthesis methods are described in, e.g., U.S. Pat. No. 5,384,261, incorporated herein by reference in its entirety for all purposes. Although a planar array surface is preferred, the array may be fabricated on a surface of virtually any shape or even a multiplicity of surfaces. Arrays may be nucleic acids on beads, fibers such as fiber optics, glass or any other appropriate substrate; see US Patent Nos. 5,770,358, 5,789,162, 5,708,153 and 5,800,992, which are hereby incorporated in their entirety for all purposes. Arrays may be packaged in such a manner as to allow for diagnostics or other manipulation in an all-inclusive device; see, for example, US Patent Nos. 5,856,174 and 5,922,591, incorporated in their entirety by reference for all purposes.
Hybridization probes are oligonucleotides capable of binding in a base-specific manner to a complementary strand of nucleic acid. Such probes include peptide nucleic acids, as described in Nielsen et al., Science 254, 1497-1500 (1991), and other nucleic acid analogs and nucleic acid mimetics. See US Patent Application No. 08/630,427 filed 4/3/96.
Hybridizations are usually performed under stringent conditions, for example, at a salt concentration of no more than 1 M and a temperature of at least 25 °C. For example, conditions of 5X SSPE (750 mM NaCl, 50 mM NaPhosphate, 5 mM EDTA, pH 7.4) and a temperature of 25-30 °C are suitable for allele-specific probe hybridizations. For stringent conditions, see, for example, Sambrook, Fritsch and Maniatis, "Molecular Cloning: A Laboratory Manual," 2nd Ed., Cold Spring Harbor Press (1989), which is hereby incorporated by reference in its entirety for all purposes above.
Polymorphism refers to the occurrence of two or more genetically determined alternative sequences or alleles in a population. A polymorphic marker or site is the locus at which divergence occurs. Preferred markers have at least two alleles, each occurring at a frequency of greater than 1%, and more preferably greater than 10% or 20% of a selected population. A polymorphism may comprise one or more base changes, an insertion, a repeat, or a deletion. A polymorphic locus may be as small as one base pair. Polymorphic markers include restriction fragment length polymorphisms, variable number of tandem repeats (VNTRs), hypervariable regions, minisatellites, dinucleotide repeats, trinucleotide repeats, tetranucleotide repeats, simple sequence repeats, and insertion elements such as Alu. The first identified allelic form is arbitrarily designated as the reference form and other allelic forms are designated as alternative or variant alleles. The allelic form occurring most frequently in a selected population is sometimes referred to as the wildtype form. Diploid organisms may be homozygous or heterozygous for allelic forms. A diallelic polymorphism has two forms. A triallelic polymorphism has three forms.
A single nucleotide polymorphism (SNP) occurs at a polymorphic site occupied by a single nucleotide, which is the site of variation between allelic sequences. The site is usually preceded by and followed by highly conserved sequences of the allele (e.g., sequences that vary in less than 1/100 or 1/1000 members of the populations).
A single nucleotide polymorphism usually arises due to substitution of one nucleotide for another at the polymorphic site. A transition is the replacement of one purine by another purine or one pyrimidine by another pyrimidine. A transversion is the replacement of a purine by a pyrimidine or vice versa. Single nucleotide polymorphisms can also arise from a deletion of a nucleotide or an insertion of a nucleotide relative to a reference allele.
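The transition/transversion distinction can be illustrated with a short sketch. The function name `classify_substitution` is hypothetical and is not part of the disclosure; it only restates the definitions given above for the four DNA bases.

```python
# Illustrative sketch only: classify a single-base substitution using the
# purine/pyrimidine definitions given in the text.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Return 'transition' or 'transversion' for a ref -> alt base change."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("reference and alternative bases are identical")
    # Same chemical class (purine<->purine or pyrimidine<->pyrimidine)
    # is a transition; a change of class is a transversion.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"
```

For instance, an A to G change is a transition, while an A to C change is a transversion.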
An individual is not limited to a human being, but may also include other organisms including but not limited to mammals, plants, bacteria or cells derived from any of the above.
General
The present invention provides for novel methods of sample preparation and analysis involving managing or reducing the complexity of a nucleic acid sample, such as genomic DNA, in a reproducible manner. The invention further provides for analysis of the above sample by hybridization to an array which may be specifically designed to interrogate the desired fragments for particular characteristics, such as, for example, the presence or absence of a polymorphism. The invention further provides for novel methods of using a computer system to model enzymatic reactions in order to determine experimental conditions before conducting any actual experiments. As an example, the present techniques are useful to identify new polymorphisms and to genotype individuals after polymorphisms have been identified.
Generally, the steps of the present invention involve reducing the complexity of a nucleic acid sample using the disclosed techniques alone or in combination. None of these techniques require multiplex PCR and most of them can be performed in a single tube. With one exception (AP PCR), the methods for complexity reduction involve fragmenting the nucleic acid sample, often, but not always, by restriction enzyme digest. The resulting fragments, or in the case of AP PCR, PCR products, of interest are then isolated. The isolation steps of the present invention vary but may involve size selection or direct amplification; often, adaptor sequences are employed to facilitate isolation. In a preferred embodiment the isolated sequences are then exposed to an array which may or may not have been specifically designed and manufactured to interrogate the isolated sequences. Design of both the complexity management steps and the arrays may be aided by the computer modeling techniques which are also described in the present invention.
Complexity management
The present invention provides for a number of novel methods of complexity management of nucleic acid samples such as genomic DNA. These methods are disclosed below.
A number of methods disclosed herein require the use of restriction enzymes to fragment the nucleic acid sample. Methods of using a restriction enzyme or enzymes to cut nucleic acids at a large number of sites and selecting a size range of restriction fragments for assay have been shown. This scheme is illustrated in Figure 1.
In one embodiment of the invention, schematically illustrated in Figure 2, restriction enzymes are used to cut the nucleic acids in the sample (Fig. 2, Step 1). In general, a restriction enzyme recognizes a specific nucleotide sequence of four to eight nucleotides (though this number can vary) and cuts a DNA molecule at a specific site. For example, the restriction enzyme Eco RI recognizes the sequence GAATTC and will cut a DNA molecule between the G and the first A. Many different restriction enzymes are known and appropriate restriction enzymes can be chosen for a desired result. For example, restriction enzymes can be purchased from suppliers such as New England Biolabs. Methods for conducting restriction digests will be known to those of skill in the art, but directions for each restriction enzyme are generally supplied with the restriction enzymes themselves. For a thorough explanation of the use of restriction enzymes, see for example, section 5, specifically pages 5.2-5.32 of Sambrook, et al., incorporated by reference above.
After restriction enzyme digestion, the method further requires that the pool of digested DNA fragments be separated by size and that DNA fragments of the desired size be selected (Figure 2, Step 2) and isolated (Figure 2, Step 3). Methods for separating DNA fragments after a restriction digest will be well known to those of skill in the art. As a non-limiting example, DNA fragments which have been digested with a restriction enzyme may be separated using gel electrophoresis, see for example, Maniatis, section 6. In this technique, DNA fragments are placed in a gel matrix. An electric field is applied across the gel and the DNA fragments migrate towards the positive end. The larger the DNA fragment, the more the fragment's migration is inhibited by the gel matrix. This allows for the separation of the DNA fragments by size. A size marker is run on the gel simultaneously with the DNA fragments so that the fragments of the desired size may be identified and isolated from the gel. Methods for purification of the DNA fragments from the gel matrix are also described in Sambrook et al.
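The digest and size-selection steps can be modeled in silico with a minimal sketch such as the one below. The function names `digest` and `size_select` are hypothetical, only the top strand is considered, and the cut offset of 1 for GAATTC follows the Eco RI example given above (cut between the G and the first A).

```python
# Illustrative sketch only: model a complete digest of one strand and a
# subsequent size selection. Names and parameters are assumptions.

def digest(sequence, site, cut_offset):
    """Cut `sequence` at every occurrence of `site`.

    `cut_offset` is the number of bases into the site at which the cut
    falls (1 for Eco RI, which cuts G^AATTC).
    """
    sequence = sequence.upper()
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])  # remainder after the last cut
    return fragments


def size_select(fragments, min_len, max_len):
    """Keep only fragments whose length lies in [min_len, max_len]."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


fragments = digest("AAGAATTCCCCCGAATTCTT", "GAATTC", 1)
selected = size_select(fragments, 5, 8)
```

A model of this kind lets the size window be chosen before any gel is run, which is the role the computer modeling described in this specification is meant to play.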
Any other non-destructive method of isolating DNA fragments of the desired size may be employed. For example, size-based chromatography, HPLC, dHPLC or a sucrose density gradient could be used to reduce the DNA pool to those fragments within a particular size range and then this smaller pool could be run on an electrophoresis gel.
After isolation, adaptor sequences are ligated to the fragments (Figure 2, Step 4). Adaptor sequences are generally oligonucleotides of at least 5 or 10 bases and preferably no more than 50 or 60 bases in length; however, adaptor sequences may be even longer, up to 100 or 200 bases, depending upon the desired result. For example, if the desired outcome is to prevent amplification of a particular fragment, longer adaptor sequences designed to form stem loops or other tertiary structures may be ligated to the fragment. Adaptor sequences may be synthesized using any methods known to those of skill in the art. For the purposes of this invention they may, as options, comprise templates for PCR primers and/or tag or recognition sequences. The design and use of tag sequences is described in US Patent No. 5,800,992 and US Provisional Patent Application No. 60/140,350, filed 6/23/99, both of which are incorporated by reference for all purposes. Adaptor sequences may be ligated to either blunt end or sticky end DNA. Methods of ligation will be known to those of skill in the art and are described, for example, in Sambrook et al. Methods include DNase digestion to "nick" the DNA, ligation with ddNTP and the use of polymerase I to fill in gaps or any other methods described in the art.
Further complexity reduction is achieved by adding a specific nucleotide on the 5' end of the PCR primer as illustrated in Figure 3. The specific nucleotide further reduces the complexity of the resulting DNA pool because only those fragments which have been isolated after restriction enzyme digestion and contain the complement of the specific nucleotide(s) incorporated in the PCR primer will be amplified. Figure 3A depicts the results of hybridization to an array after enzyme digestion, ligation to an adaptor and PCR amplification. Figs. 3B and 3C depict the results of hybridization to an array after enzyme digestion, ligation to an adaptor and PCR amplification where the PCR primers incorporated specific nucleotides in the 5' end of the primer. In Fig. 3B the 5' and 3' primers have different specific nucleotides incorporated. In Fig. 3C the 5' and 3' primers have the same nucleotides incorporated. The level of complexity in the isolated pool can be varied depending upon the identity and number of nucleotides incorporated into the PCR primers. A number of embodiments of the present invention involve amplification by PCR. Any of these embodiments may be further modified to reduce complexity using the above disclosed technique.
Various methods of conducting PCR amplification and primer design and construction for PCR amplification will be known to those of skill in the art. PCR is a method by which a specific polynucleotide sequence can be amplified in vitro. PCR is an extremely powerful technique for amplifying specific polynucleotide sequences, including genomic DNA, single-stranded cDNA, and mRNA among others. As described in U.S. Pat. Nos. 4,683,202, 4,683,195, and 4,800,159 (which are incorporated herein by reference), PCR typically comprises treating separate complementary strands of a target nucleic acid with two oligonucleotide primers to form complementary primer extension products on both strands that act as templates for synthesizing copies of the desired nucleic acid sequences. By repeating the separation and synthesis steps in an automated system, essentially exponential duplication of the target sequences can be achieved. Standard protocols may be found in, for example, Sambrook et al., which is hereby incorporated by reference for all purposes.
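The "essentially exponential duplication" can be made concrete with a small calculation. The sketch below uses the common idealization that the copy number is multiplied by (1 + efficiency) in each cycle; the function name `pcr_copies` and the `efficiency` parameter are assumptions introduced for illustration.

```python
# Illustrative sketch only: idealized PCR yield as a function of cycles.

def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Return the idealized copy number after `cycles` rounds of PCR.

    `efficiency` is the fraction of templates copied per cycle
    (1.0 means perfect doubling).
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return initial_copies * (1.0 + efficiency) ** cycles


ten_cycles = pcr_copies(1, 10)   # perfect doubling: 2**10 copies
```

In practice yields plateau in late cycles, so the formula is best read as an upper bound for the early, exponential phase.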
In another embodiment, schematically illustrated in Figure 4, the step of complexity management of the DNA samples comprises digestion with a Type IIs endonuclease, thereby creating sticky ends comprised of random nucleic acid sequences (Fig. 4, Step 1). Type-IIs endonucleases are generally commercially available and are well known in the art. A description of Type IIs endonucleases can be found in US Patent No. 5,710,000, which is hereby incorporated by reference for all purposes. Like their Type-II counterparts, Type-IIs endonucleases recognize specific sequences of nucleotide base pairs within a double stranded polynucleotide sequence. Upon recognizing that sequence, the endonuclease will cleave the polynucleotide sequence, generally leaving an overhang of one strand of the sequence, or "sticky end." Type-II endonucleases, however, generally require that the specific recognition site be palindromic. That is, reading in the 5' to 3' direction, the base pair sequence is the same for both strands of the recognition site. For example, the sequence
5'-G↓A-A-T-T-C-3'
3'-C-T-T-A-A↑G-5'
is the recognition site for the Type-II endonuclease EcoRI, where the arrows indicate the cleavage sites in each strand. This sequence is palindromic in that both strands of the sequence, when read in the 5' to 3' direction, are the same.
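The palindrome property described here is equivalent to a site being equal to its own reverse complement, which can be checked with a short sketch. The helper names `reverse_complement` and `is_palindromic` are hypothetical.

```python
# Illustrative sketch only: a recognition site is palindromic when it
# reads the same 5'->3' on both strands, i.e. it equals its reverse
# complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def is_palindromic(site):
    """True if `site` equals its own reverse complement."""
    return site.upper() == reverse_complement(site)


ecori_is_palindrome = is_palindromic("GAATTC")
ctcttc_is_palindrome = is_palindromic("CTCTTC")
```

The EcoRI site GAATTC passes this check, whereas a non-palindromic site such as CTCTTC does not.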
The Type-IIs endonucleases, on the other hand, generally do not require palindromic recognition sequences. Additionally, these Type-IIs endonucleases also generally cleave outside of their recognition sites. For example, the Type-IIs endonuclease EarI recognizes and cleaves in the following manner:
5'-C-T-C-T-T-C-N↓N-N-N-N-3'
3'-G-A-G-A-A-G-n-n-n-n↑n-5'
where the recognition sequence is -C-T-C-T-T-C-, N and n represent complementary, ambiguous base pairs and the arrows indicate the cleavage sites in each strand. As the example illustrates, the recognition sequence is non-palindromic, and the cleavage occurs outside of that recognition site.
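The cleavage geometry of a Type-IIs enzyme can be sketched in code. The sketch below is a simplified model with hypothetical names (`type_iis_cut_sites`, `overhang`): it scans only the top strand for the recognition sequence and places the cuts a fixed number of bases downstream of the site, using offsets of 1 (top strand) and 4 (bottom strand) as drawn for EarI above.

```python
# Illustrative sketch only: locate Type-IIs cut positions downstream of a
# recognition site on the top strand. Sites on the opposite strand are
# ignored in this simplified model.

def type_iis_cut_sites(sequence, site, top_offset, bottom_offset):
    """Return (top_cut, bottom_cut) index pairs, one per site occurrence.

    Offsets are counted in bases from the end of the recognition site;
    indices are 0-based positions at which the strand is cut.
    """
    sequence = sequence.upper()
    cuts = []
    pos = sequence.find(site)
    while pos != -1:
        end = pos + len(site)
        top, bottom = end + top_offset, end + bottom_offset
        if max(top, bottom) <= len(sequence):  # skip cuts past the end
            cuts.append((top, bottom))
        pos = sequence.find(site, pos + 1)
    return cuts


def overhang(sequence, top_cut, bottom_cut):
    """Return the single-stranded bases lying between the two cuts."""
    lo, hi = sorted((top_cut, bottom_cut))
    return sequence.upper()[lo:hi]


seq = "AACTCTTCGATCGG"
cuts = type_iis_cut_sites(seq, "CTCTTC", 1, 4)
sticky = overhang(seq, *cuts[0])
```

Because the overhang bases lie outside the recognition site, they vary from fragment to fragment, which is what produces the random sticky ends exploited in this embodiment.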
Specific Type-IIs endonucleases which are useful in the present invention include, e.g., EarI, MnlI, PleI, AlwI, BbsI, BsaI, BsmAI, BspMI, Esp3I, HgaI, SapI, SfaNI, BbvI, BsmFI, FokI, BseRI, HphI and MboII. The activity of these Type-IIs endonucleases is illustrated in FIG. 5, which shows the cleavage and recognition patterns of the Type-IIs endonucleases.
The sticky ends resulting from Type-IIs endonuclease digestion are then ligated to adaptor sequences (Fig. 4, Step 2). Those of skill in the art will be familiar with methods of ligation. Standard protocols can be found in, for example, Sambrook et al., hereby incorporated by reference for all purposes. Only those fragments containing the adaptor sequence are isolated (Figure 6). In addition to those methods of isolation discussed above, methods of isolation which take advantage of unique tag sequences which may be constructed in the adaptor sequences may be employed. These tag sequences may or may not be used as PCR primer templates. Fragments containing these tags can then be segregated from other non-tag bearing sequences using various methods of hybridization or any of the methods described in the above referenced application.
In another embodiment, depicted in Figure 18, the method of complexity reduction comprises digesting the DNA sample with two different restriction enzymes. The first restriction enzyme is a frequent base cutter, such as Mse I, which has a four base recognition site. The second restriction enzyme is a rare base cutter, such as Eco RI, which has a six base recognition site. This results in three possible categories of fragments: (most common) those which have been cut on both ends with the frequent base cutter, (least common) those which have been cut on both ends with the rare base cutter, and those which have been cut on one end with the frequent base cutter and on one end with the rare base cutter. Adaptors are ligated to the fragments and PCR primers are designed such that only those fragments which fall into the desired category or categories are amplified. This technique, employed with a six base cutter and a four base cutter, can reduce complexity 8-fold when only those fragments from the latter category are amplified. Other combinations of restriction enzymes may be employed to achieve the desired level of complexity.
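The three fragment categories produced by a double digest can be enumerated with a sketch like the one below. It is a simplified top-strand model with hypothetical function names; the recognition sites used in the example (TTAA cut after the first T for Mse I, GAATTC cut after the G for Eco RI) are standard values supplied here as assumptions, and the two terminal pieces of the input molecule are ignored because they have only one enzyme-generated end.

```python
# Illustrative sketch only: classify double-digest fragments by which
# enzyme generated each end.

def find_cuts(sequence, site, cut_offset, label):
    """Return (cut_position, label) for every occurrence of `site`."""
    cuts = []
    pos = sequence.find(site)
    while pos != -1:
        cuts.append((pos + cut_offset, label))
        pos = sequence.find(site, pos + 1)
    return cuts


def double_digest(sequence, frequent, rare):
    """Digest with two enzymes given as (site, cut_offset) tuples.

    Returns (fragment, left_label, right_label) for each internal
    fragment, in order along the sequence.
    """
    sequence = sequence.upper()
    cuts = sorted(
        find_cuts(sequence, frequent[0], frequent[1], "frequent")
        + find_cuts(sequence, rare[0], rare[1], "rare")
    )
    return [
        (sequence[start:end], left, right)
        for (start, left), (end, right) in zip(cuts, cuts[1:])
    ]


def category(left, right):
    """Name the fragment category from the enzymes that cut each end."""
    return f"{left}-{right}" if left == right else "frequent-rare"


fragments = double_digest("CCTTAAGGGGAATTCAATTAACC", ("TTAA", 1), ("GAATTC", 1))
categories = [category(left, right) for _, left, right in fragments]
```

Selecting only the mixed "frequent-rare" category is the case that the text above associates with the largest reduction in complexity.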
In another embodiment, the step of complexity management comprises removing repetitive sequences. Figure 10 depicts a schematic representation of this embodiment. The nucleic acid sample is first fragmented (Figure 10, Step 1). Various methods of fragmenting DNA will be known to those of skill in the art. These methods may be, for example, either chemical or physical in nature. Chemical fragmentation may include partial degradation with a DNAse, partial depurination with acid, the use of restriction enzymes or other enzymes which cleave DNA at known or unknown locations. Physical fragmentation methods may involve subjecting the DNA to a high shear rate. High shear rates may be produced, for example, by moving DNA through a chamber or channel with pits or spikes, or forcing the DNA sample through a restricted size flow passage, e.g., an aperture having a cross sectional dimension in the micron or submicron scale.
In a preferred embodiment adaptor sequences are ligated to the resulting fragments (Figure 10, Step 2). The fragments with or without adaptor sequences are then denatured (Figure 10, Step 3). Methods of denaturation will be well known to those of skill in the art. After denaturation, the fragments are then allowed to reanneal (Figure 10, Step 4). Annealing conditions may be altered as appropriate to obtain the level of repetitive sequence removal desired. Finally, double stranded sequences are removed (Figure 10, Step 5). Methods of removing double stranded sequences will be known to those of skill in the art and may include, without limitation, methods of digesting double stranded DNA such as double strand specific nucleases and exonucleases or methods of physical separation including, without limitation, gel based electrophoresis or size chromatography.
In another embodiment, the step of complexity management comprises performing an arbitrarily primed polymerase chain reaction (AP PCR) upon the sample. AP PCR is described in US Patent No. 5,487,985, which is hereby incorporated by reference in its entirety for all purposes. Figure 7 depicts a schematic illustration of this embodiment. Performing AP PCR with random primers which have specific nucleotides incorporated into the primers produces a reduced representation of genomic DNA in a reproducible manner. Figure 8 shows the level of complexity reduction of human genomic DNA resulting from AP PCR with various primers. Column 1 lists the primer name. Column 2 lists the primer sequence. Column 3 lists the annealing temperature. Column 4 lists the polymerase used. Column 5 lists the number correlated to a specific gene on the Hum6.8K GeneChip(R) probe array (Affymetrix, Inc., Santa Clara, CA). Column 6 lists the percentage of the human genes on the Hum6.8K GeneChip(R) probe array found by fragments whose complexity has been reduced by this method. Figure 9 shows the reproducibility of AP PCR. Independently prepared sample preps were subjected to AP PCR using the same primers. The gel bands show that the level of reproducibility between the samples is very high.
Primers may be designed using standard techniques. For example, a computer program is available on the internet at the Operon Technologies, Inc. website at http://www.operon.com. The Operon Oligo Toolkit allows a user to input a potential primer sequence into the webform. The site will instantly calculate a variety of attributes for the oligonucleotide including molecular weight, GC content, Tm, and primer-dimer sets. The user may also plot the oligonucleotide against a second sequence. PCR amplification techniques are described above in this application and will be well known to those of skill in the art.
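Two of the primer attributes mentioned above can be approximated with a few lines of code. The sketch below is not the Operon tool and makes no claim about how that tool computes its values; it uses the widely quoted Wallace rule of thumb, Tm ≈ 2(A+T) + 4(G+C) °C, which is only a rough estimate for short oligonucleotides. The function names are hypothetical.

```python
# Illustrative sketch only: rough primer attributes for a candidate
# oligonucleotide (A, C, G, T only).

def gc_content(primer):
    """Fraction of bases in `primer` that are G or C."""
    primer = primer.upper()
    if not primer:
        raise ValueError("primer must not be empty")
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer):
    """Approximate melting temperature in degrees C by the Wallace rule:
    2 degrees per A/T base plus 4 degrees per G/C base."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


candidate = "ACGTACGTAC"
candidate_gc = gc_content(candidate)
candidate_tm = wallace_tm(candidate)
```

Estimates of this kind are useful for screening candidate primers before a more detailed thermodynamic calculation or an empirical test.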
In another embodiment of the invention, the method of reducing the complexity of a nucleic acid sample comprises hybridizing the sample to a nucleic acid probe containing a desired sequence which is bound to a solid support, such as a magnetic bead. For a description of hybridization of nucleic acids to solid supports, see US Pat. No. 5,800,992, incorporated by reference above. This sequence may comprise, for example, a sequence containing a SNP, a cDNA fragment, a chromosome fragment, a subset of genomic DNA or a subset of a library. The sequence may comprise as few as 16 nucleotides and may comprise as many as 2,000, 3,000, 5,000 or more nucleotides in length. Methods of designing and making oligonucleotide probes will be well known to those of skill in the art. In one embodiment, the probe may contain a template sequence for a PCR primer. Solid supports suitable for the attachment of nucleic acid probe sequences will be well known to those of skill in the art but may include glass beads, magnetic beads, and/or planar surfaces. Magnetic beads are commercially available from, for example, Dynal (Oslo, Norway). The nucleic acid probes may be synthesized directly on the solid support or attached to the support as a full length sequence. Protocols for attaching magnetic beads to probes are included in US Patent No. 5,512,439, which is hereby incorporated by reference for all purposes. Standard hybridization protocols as discussed above may be employed.
Figure 11 depicts a schematic representation of one example of the above embodiment, wherein the complexity management step is utilized to facilitate genome wide genotyping. Much of the cost of genotyping comes from multiplex PCR. In this embodiment, the entire sample preparation can be performed in a single tube without the need for multiplex PCR. Because the desired result is to genotype a DNA sample, the desired sequence in Figure 11 contains a polymorphism. The oligonucleotide comprises 32 bases with the SNP in the center. A magnetic bead is attached to the oligonucleotide probe (Fig. 11, step 1). The probe is then exposed to, for example, fractionated genomic DNA (Fig. 11, step 2). Adaptor sequences are ligated to both ends of the fragments (Fig. 11, step 3). The fragments are then amplified (Fig. 11, step 4) and the PCR product containing the desired polymorphism may then be analyzed by various methods including, for example, hybridization to an array or single base extension (SBE). SBE is described in, for example, US Provisional Application 60/140,359, which is hereby incorporated by reference in its entirety for all purposes.
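Designing a capture probe with the SNP in the center can be sketched as a simple window extraction from a reference sequence. The function name `probe_around_snp` is hypothetical, and because a 32-base probe has no exact middle base, the sketch assumes the SNP is placed at 0-based position `length // 2` within the probe.

```python
# Illustrative sketch only: extract a probe of a given length from a
# reference sequence so that the SNP sits at the center of the probe.

def probe_around_snp(sequence, snp_index, length=32):
    """Return the `length`-base window of `sequence` centered on the SNP.

    `snp_index` is the 0-based position of the SNP in `sequence`. For an
    even `length`, the SNP is placed at probe position length // 2
    (an assumption, since an even-length probe has no single center).
    """
    start = snp_index - length // 2
    end = start + length
    if start < 0 or end > len(sequence):
        raise ValueError("not enough flanking sequence around the SNP")
    return sequence[start:end]


reference = "A" * 20 + "G" + "A" * 20      # toy reference, SNP at index 20
probe_32 = probe_around_snp(reference, 20)            # 32-base probe
probe_17 = probe_around_snp(reference, 20, length=17) # shorter probe
```

The same helper covers shorter probes by changing `length`, which matters when the probe length is chosen together with the cut position of a downstream enzyme.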
The method may further comprise exposing the hybridized sample to a single strand DNA nuclease to remove the single stranded DNA. This embodiment may further comprise ligating an adaptor sequence containing a Class IIS restriction enzyme site to the resulting duplexed DNA and digesting the duplex with the appropriate Class IIS restriction enzyme to release the attached sequences. The sequences are then isolated and a second adaptor sequence is ligated to the complex and the sequences are amplified.
Figures 12 and 13 depict schematic representations of an embodiment comprising the use of Class IIS endonucleases. Both figures depict methods which may be employed for single tube genotyping without the need for multiplex PCR. In Figures 12 and 13, the desired sequence is a SNP. The oligonucleotide probe in Figure 12 is 32 bases long and in Figure 13 is 17 bases long. In both figures the SNP is in the center of the oligonucleotide. The oligonucleotide probe is bound to a magnetic bead (Figs. 12 and 13, step 1). The probe is then hybridized to fragmented genomic DNA (Figs. 12 and 13, step 2). Single stranded DNA is digested with a single strand DNA nuclease, leaving a DNA duplex attached to the magnetic bead (Figs. 12 and 13, step 3). An adaptor sequence is then ligated to the duplex. The adaptor sequence contains a Class IIS restriction site. The probe length and Class IIS endonuclease are chosen such that the site where the duplex is cut is between the SNP and the magnetic bead. In Figure 12 the Class IIS endonuclease cuts directly adjacent to the SNP site, such that the SNP is part of the sticky end left by the endonuclease digestion (Fig. 12, step 5). In Figure 13 the endonuclease cuts closer to the magnetic bead, leaving a number of bases between the sticky end and the SNP site (Fig. 13, step 5). In either case, the magnetic bead is released and the sequences are isolated. Adaptor sequences are then ligated to the sticky ends (Figs. 12 and 13, step 6). In both Figures 12 and 13 the adaptor sequences contain templates for PCR probes. The fragments containing the SNP are then amplified (Figs. 12 and 13, step 7) and the PCR products may be analyzed in a number of different methods including hybridization to an array designed to detect SNPs or SBE.
In this embodiment, the adaptor sequence may further comprise a SNP identification sequence or tag. In this case, the array to which the PCR products are hybridized may be a generic tag array as described in the above referenced US Patent No. 5,800,992 and US Provisional Patent Application 60/140,359 or a chimeric probe array (Figure 14). A chimeric probe array contains probes which interrogate both for particular sequences characteristic of a genotype and for artificial sequences which have been ligated to specific fragments in the sample pool. This allows for higher specificity of hybridization and better differentiation between probes. This embodiment is depicted in Figure 15.
In another embodiment, depicted in Figure 16, the method of complexity reduction comprises hybridizing the DNA sample to a mismatch binding protein (Fig. 16, step 2). Mismatch binding proteins are described in Wagner R. and Radman, M. (1995) "Methods: A Companion to Methods in Enzymology" 7, 199-203 which is hereby incorporated by reference in its entirety for all purposes. Mismatch binding proteins preferentially bind to DNA duplexes which contain sequence mismatches. This allows for a relatively simple and rapid method to locate and identify SNPs. In this embodiment no prior knowledge of the SNP is required. Mismatch binding proteins are commercially available through GeneCheck (Ft. Collins, Co.). In a further embodiment, depicted in Figure 17, magnetic beads are attached to the mismatch binding proteins. Mismatch binding proteins attached to magnetic beads are commercially available through GeneCheck (Ft. Collins, Co.). After hybridization the sample is digested with a 3' to 5' exonuclease (Fig. 16, step 3). Remaining single stranded DNA is then removed with a nuclease (Fig. 16, step 4).
If it is desired to cut the duplex at the mismatch, then the enzyme resolvase may be used. See US Patent Nos. 5,958,692, 5,871,911 and 5,876,941 (each of which is incorporated by reference in its entirety for all purposes) for a description of various methods of cleaving nucleic acids. The resolvases (e.g. X-solvases of yeast and bacteriophage T4, Jensch et al. EMBO J. 8, 4325 (1989)) are nucleolytic enzymes capable of catalyzing the resolution of branched DNA intermediates (e.g., DNA cruciforms) which can involve hundreds of nucleotides. In general, these enzymes are active close to the site of DNA distortion (Bhattacharyya et al., J. Mol. Biol., 221, 1191, (1991)). T4 Endonuclease VII, the product of gene 49 of bacteriophage T4 (Kleff et al., The EMBO J. 7, 1527, (1988)) is a resolvase (West, Annu. Rev. Biochem. 61, 603, (1992)) which was first shown to resolve Holliday-structures (Mizuuchi et al., Cell 29, 357, (1982)). T4 Endonuclease VII has been shown to recognize DNA cruciforms (Bhattacharyya et al., supra; Mizuuchi et al., supra) and DNA loops (Kleff et al., supra), and it may be involved in patch repair. Bacteriophage T7 Endonuclease I has also been shown to recognize and cleave DNA cruciforms (West, Ann. Rev. Biochem. 61, 603, (1992)). Eukaryotic resolvases, particularly from the yeast Saccharomyces cerevisiae, have been shown to recognize and cleave cruciform DNA (West, supra; Jensch, et al., EMBO J. 8, 4325 (1989)). Other nucleases are known which recognize and cleave DNA mismatches. For example, S1 nuclease is capable of recognizing and cleaving DNA mismatches formed when a test DNA and a control DNA are annealed to form a heteroduplex (Shenk et al., Proc. Natl. Acad. Sci. 72, 989, (1975)). The Mut Y repair protein of E. coli is also capable of detecting and cleaving DNA mismatches.
Computer Implemented Analysis
In another embodiment a computer system is used to model the reactions discussed above to aid the user in selecting the correct experimental conditions. In this embodiment, the sequence of the DNA sample must be known. A computer program queries an electronic database containing the sequence of the DNA sample, looking for sites which will be recognized by the enzyme being used. The method of modeling experiments can be employed for a wide variety of experiments.
In one embodiment, the user can run multiple experiments altering various conditions. For example, if a user desires to isolate a particular sequence of interest in a fragment which has been digested with a restriction enzyme, the user can have the computer model the possible outcomes using a wide variety of restriction enzymes. The particular sequence which is selected may be chosen by specific criteria, i.e. because the region is believed to be associated with specific genes, polymorphisms, or phenotypes for example, or may be chosen at random. The user can then select the restriction enzyme which, for example, isolates the desired sequence in a fragment of unique size. Additionally or alternatively, if the user desires to reduce complexity using the type IIS nuclease/ligation technique described above, the user can experiment with the length and sequence of the adaptors to determine the optimal sequence for the adaptors' "sticky" ends. This enables the user to be confident that they will obtain a fragment containing a particular sequence of interest or to fine tune the level of complexity in the DNA pool. In another embodiment, a user could model the kinetics of the denaturing, reannealing technique for removal of repeated sequences discussed above to determine the conditions which allow for the desired result. For example, a user may desire the removal of only a certain percentage of repeated sequences.
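The enzyme-selection step described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the program of the Exhibits: the function names `digest` and `enzymes_isolating` are hypothetical, the sequence is treated as a single linear top strand, and the three enzymes listed (with their well-known recognition sites) are merely a sample panel. An enzyme is reported when it cuts the sequence at least once and leaves the sequence of interest intact inside a fragment whose size is shared by no other fragment.

```python
# Sample panel: name -> (recognition site, cut offset within the site on the top strand).
ENZYMES = {
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "BamHI": ("GGATCC", 1),    # G^GATCC
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
}

def digest(seq, site, offset):
    """Cut a linear top-strand sequence at every occurrence of `site`."""
    cuts = []
    i = seq.find(site)
    while i != -1:
        cuts.append(i + offset)
        i = seq.find(site, i + 1)
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]

def enzymes_isolating(seq, target, enzymes=ENZYMES):
    """Map enzyme name -> size of the uniquely sized fragment containing `target`."""
    hits = {}
    for name, (site, offset) in enzymes.items():
        frags = digest(seq, site, offset)
        if len(frags) < 2:
            continue  # enzyme does not cut this sequence
        sizes = [len(f) for f in frags]
        for frag in frags:
            if target in frag and sizes.count(len(frag)) == 1:
                hits[name] = len(frag)
    return hits

if __name__ == "__main__":
    demo = "AAAA" + "GAATTC" + "CCCTAGGTCC" + "GAATTC" + "TT" + "GGATCC" + "ACGT"
    print(enzymes_isolating(demo, "TAGG"))
```

A fuller model would also consider the bottom strand, circular molecules, and the resolution of the sizing method when deciding whether two fragment sizes are distinguishable.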
For example, virtual restriction digests may be performed by querying an electronic database which contains the sequence of DNA of interest. Because the database contains the nucleic acid sequence and restriction enzymes cut at known locations based on the DNA sequence, one can easily predict the sequence and size of fragments which will result from a restriction digest of the DNA. Ideally, restriction enzymes which produce no two fragments of the same or very similar size are desired. Combinations of restriction enzymes may be employed. Those of skill in the art will be familiar with electronic databases of DNA sequences. GenBank, for example, contains approximately 2,570,000,000 nucleic acid bases in 3,525,000 sequence records as of April 1999. A computer program searches the electronic database for a sequence which suits the requirements of the particular restriction enzyme. For example, the restriction enzyme Eco RI recognizes the sequence GAATTC and will cut a DNA molecule between the G and the first A. The computer program will query the chosen sequence for any occurrences of the sequence GAATTC and mark the site where the restriction enzyme will cut. The program will then provide the user with a display of the resulting fragments.
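As a minimal illustration of such a virtual digest (a sketch under stated assumptions, not the program of Exhibit 1), the following Python function scans a linear top-strand sequence for GAATTC and cuts between the G and the first A. The function name `virtual_digest` and its parameters are hypothetical, and only the top strand is modeled.

```python
import re

def virtual_digest(sequence, site="GAATTC", cut_offset=1):
    """Return the top-strand fragments produced by cutting at every `site`.

    cut_offset is the number of bases into the site at which the strand is
    cut (1 for Eco RI, which cuts G^AATTC).
    """
    sequence = sequence.upper()
    # A lookahead pattern finds every occurrence, including overlapping ones.
    pattern = "(?=" + re.escape(site) + ")"
    cut_positions = [m.start() + cut_offset for m in re.finditer(pattern, sequence)]
    fragments = []
    start = 0
    for pos in cut_positions:
        fragments.append(sequence[start:pos])
        start = pos
    fragments.append(sequence[start:])
    return fragments

if __name__ == "__main__":
    # Display each resulting fragment with its size.
    for frag in virtual_digest("TTGAATTCAAAAGAATTCGG"):
        print(len(frag), frag)
```

Sorting the fragment lengths and checking for near-duplicates is one simple way to test the stated preference for enzymes that yield no two fragments of similar size.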
Exhibit 1 is an example of a program to conduct this type of virtual enzyme digestion. Exhibit 2 is an example of a program to virtually model the ligation of two sequences to each other.
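A virtual ligation can be modeled in a similarly simple way. The sketch below is not Exhibit 2; it is a hypothetical illustration that assumes both ends carry 5' overhangs of equal length, that `left_top` is the top strand of the left-hand fragment up to the cut, and that `right_top` is the top strand of the right-hand fragment beginning with its own overhang. Two ends are joined only if their overhangs are complementary.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

def ligate(left_top, left_bottom_overhang, right_top):
    """Join two fragments if their 5' overhangs can base-pair.

    left_bottom_overhang is the 5' overhang carried by the bottom strand of
    the left fragment, written 5'->3'. Returns the ligated top strand, or
    None when the sticky ends are incompatible.
    """
    k = len(left_bottom_overhang)
    right_overhang = right_top[:k]
    if right_overhang == reverse_complement(left_bottom_overhang):
        return left_top + right_top
    return None

if __name__ == "__main__":
    # Two Eco RI ends (AATT overhangs) are compatible and restore GAATTC.
    print(ligate("TTG", "AATT", "AATTCGG"))
    # A GATC overhang cannot pair with an AATT overhang.
    print(ligate("TTG", "GATC", "AATTCGG"))
```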
In another embodiment, the method of modeling experiments in a computer system can be used to design probe arrays. A database may be interrogated for any desired sequence, for example, a polymorphism. Computer modeled reactions are then performed to help determine the method for isolating a fragment of DNA containing the sequence of interest. These methods may comprise any of the methods described above, alone or in combination. Arrays are then constructed which are designed to interrogate the resulting fragments. It is important to note that for the purpose of designing arrays, the virtual reactions need not be performed flawlessly, since the arrays may contain hundreds of thousands of sequences.
One embodiment of the invention relies on the use of virtual reactions to predetermine the sequence of chosen DNA fragments which have been subjected to various procedures. The sequence information for the chosen fragments is then used to design the probes which are to be attached to DNA arrays. Arrays may be designed and manufactured in any number of ways. For example, DNA arrays may be synthesized directly onto a solid support using methods described in, for example, US Patent Nos. 5,837,832, 5,744,305 and 5,800,992 and WO 95/11995, herein incorporated by reference for all purposes. See also, Fodor et al., Science, 251:767-777 (1991), Pirrung et al., U.S. Pat. No. 5,143,854 (see also PCT Application No. WO 90/15070) and Fodor et al., PCT Publication No. WO 92/10092 and U.S. Pat. No. 5,424,186, each of which is hereby incorporated in its entirety by reference for all purposes. Techniques for the synthesis of these arrays using mechanical synthesis methods are described in, e.g., U.S. Pat. No. 5,384,261, incorporated herein by reference in its entirety for all purposes.
Briefly, 5,837,832 describes a tiling method for array fabrication whereby probes are synthesized on a solid support. These arrays comprise a set of oligonucleotide probes such that, for each base in a specific reference sequence, the set includes a probe (called the "wild-type" or "WT" probe) that is exactly complementary to a section of the sequence of the chosen fragment including the base of interest and four additional probes (called "substitution probes"), which are identical to the WT probe except that the base of interest has been replaced by one of a predetermined set (typically 4) of nucleotides. Probes may be synthesized to query each base in the sequence of the chosen fragment. Hybridization of a target nucleic acid sequence to a substitution probe on the array indicates the presence of a single nucleotide polymorphism. Other applications describing methods of designing tiling arrays include US Patent Nos. 5,858,659 and 5,861,242, each of which is incorporated by reference in its entirety for all purposes. In a similar manner, arrays could be constructed to test for a variety of sequence variations including deletions, repeats or base changes greater than one nucleotide. US Patent Nos. 5,593,839 and 5,856,101 (each of which is incorporated by reference for all purposes) describe methods of using computers to design arrays and lithographic masks.
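The tiling scheme just described can be illustrated with a short Python sketch. It is a simplified, hypothetical example rather than the method of the cited patents: the function name `tiling_probes`, the short default probe length, and the choice to write probes in the same sense as the reference (a real probe would be the complement) are all assumptions made for readability. For each interior base it emits one probe per nucleotide of the substitution set, so the probe carrying the reference base plays the role of the WT probe.

```python
def tiling_probes(reference, probe_len=5, alphabet="ACGT"):
    """Return {position: {base: probe}} for every base that can be centered.

    probe_len must be odd so that the interrogated base sits in the middle.
    """
    if probe_len % 2 == 0:
        raise ValueError("probe_len must be odd")
    half = probe_len // 2
    tiles = {}
    for i in range(half, len(reference) - half):
        window = reference[i - half:i + half + 1]
        # One probe per candidate base at the central position.
        tiles[i] = {
            base: window[:half] + base + window[half + 1:]
            for base in alphabet
        }
    return tiles

if __name__ == "__main__":
    reference = "ACGTACG"
    for pos, probes in tiling_probes(reference).items():
        wild_type = probes[reference[pos]]
        print(pos, "WT:", wild_type, "set:", sorted(probes.values()))
```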
The label used to detect the target sequences will be determined, in part, by the detection methods being applied. Thus, the labeling method and label used are selected in combination with the actual detecting systems being used. Once a particular label has been selected, appropriate labeling protocols will be applied, as described below for specific embodiments. Standard labeling protocols for nucleic acids are described, e.g., in Maniatis; Kambara, H. et al. (1988) BioTechnology 6:816-821; Smith, L. et al. (1985) Nuc. Acids Res. 13:2399-2412; for polypeptides, see, e.g., Allen G. (1989) Sequencing of Proteins and Peptides, Elsevier, N.Y., especially chapter 5, and Greenstein and Winitz (1961) Chemistry of the Amino Acids, Wiley and Sons, N.Y. Carbohydrate labeling is described, e.g., in Chaplin and Kennedy (1986) Carbohydrate Analysis: A Practical Approach, IRL Press, Oxford. Other techniques such as TdT end labeling may likewise be employed. Techniques for labeling protocols for use with SBE are described in, e.g., US Provisional Patent Application 60/140,359 which is incorporated by reference above.
Generally, when using a DNA array a quickly and easily detectable signal is preferred. Fluorescent tagging of the target sequence is often preferred, but other suitable labels include heavy metal labels, magnetic probes, chromogenic labels (e.g., phosphorescent labels, dyes, and fluorophores), spectroscopic labels, enzyme linked labels, radioactive labels, and labeled binding proteins. Additional labels are described in U.S. Pat. Nos. 5,800,992 and 4,366,241, and published PCT Application WO 99/13319, which are incorporated herein by reference.
The hybridization conditions between probe and target should be selected such that the specific recognition interaction, i.e., hybridization, of the two molecules is both sufficiently specific and sufficiently stable. See, e.g., Hames and Higgins (1985) Nucleic Acid Hybridisation: A Practical Approach, IRL Press, Oxford. These conditions will be dependent both on the specific sequence and often on the guanine and cytosine (GC) content of the complementary hybrid strands. The conditions may often be selected to be universally equally stable independent of the specific sequences involved. This typically will make use of a reagent such as an alkylammonium buffer. See, Wood et al. (1985) "Base Composition-independent Hybridization in Tetramethylammonium Chloride: A Method for Oligonucleotide Screening of Highly Complex Gene Libraries," Proc. Natl. Acad. Sci. USA, 82:1585-1588; and Krupov et al. (1989) "An Oligonucleotide Hybridization Approach to DNA Sequencing," FEBS Letters, 256:118-122; each of which is hereby incorporated herein by reference. An alkylammonium buffer tends to minimize differences in hybridization rate and stability due to GC content. Because sequences then hybridize with approximately equal affinity and stability, there is relatively little bias in strength or kinetics of binding for particular sequences. Temperature and salt conditions along with other buffer parameters should be selected such that the kinetics of renaturation are essentially independent of the specific target subsequence or oligonucleotide probe involved. In order to ensure this, the hybridization reactions will usually be performed in a single incubation of all the substrate matrices together, exposed to the identical target probe solution under the same conditions. The hybridization conditions will usually be selected to be sufficiently specific such that the fidelity of base matching will be properly discriminated. Of course, control hybridizations should be included to determine the stringency and kinetics of hybridization. See for example, US Patent No. 5,871,928 which is hereby incorporated in its entirety for all purposes.
Another factor that can be adjusted to increase the ability of targets to hybridize to probes is the use of nucleic acid analogs or PNAs in the probes. They can be built into the probes to create a more uniform set of hybridization conditions across the entire array. See US Patent Application No. 08/630,427 incorporated by reference above.
The detection methods used to determine where hybridization has taken place will typically depend upon the label selected. Thus, for a fluorescent label a fluorescent detection apparatus will typically be used. Pirrung et al. (1992) U.S. Pat. No. 5,143,854 and Ser. No. 07/624,120, now abandoned, (both of which are hereby incorporated by reference for all purposes) describe apparatus and mechanisms for scanning a substrate matrix using fluorescence detection, but a similar apparatus is adaptable for other optically detectable labels. See also, US Patent Nos. 5,578,832, 5,834,758, and 5,837,832 each of which is incorporated by reference in its entirety for all purposes.
A variety of methods can be used to enhance detection of labeled targets bound to a probe attached to a solid support. In one embodiment, the protein MutS (from E. coli) or equivalent proteins such as yeast MSH1, MSH2, and MSH3; mouse Rep-3, and Streptococcus Hex-A, is used in conjunction with target hybridization to detect probe-target complexes that contain mismatched base pairs. The protein, labeled directly or indirectly, can be added during or after hybridization of target nucleic acid, and differentially binds to homo- and heteroduplex nucleic acid. A wide variety of dyes and other labels can be used for similar purposes. For instance, the dye YOYO-1 is known to bind preferentially to nucleic acids containing sequences comprising runs of 3 or more G residues. Signal amplification methods as described in US Patent Application No. 09/276,774 may likewise be used.
Various methods of hybridization detection will be known to those of skill in the art. See for example, US Patent Nos. 5,578,832, 5,631,734, 5,744,305 and 5,800,992 each of which is hereby incorporated in its entirety for all purposes.
Examples
Example 1 - Restriction Enzyme Digest/Sizing
The complexity of total genomic DNA from human and yeast was reproducibly reduced using a restriction enzyme digestion. For each species 0.5 ug genomic DNA was digested with 20 units of EcoRI in a total volume of 40 ul at 37 °C overnight (Figure 2, Step 1). The enzyme was inactivated by incubation at 65 °C for 10 minutes.
The DNA solution was mixed with 10 ul 5x loading buffer and separated by gel electrophoresis on a 2% agarose gel (Figure 2, Step 2). The gel was visualized by ethidium bromide staining. Fragments of 250 - 350 bp were excised from the gel and purified using a QIAquick gel extraction kit (Qiagen) (Figure 2, Step 3). Alternatively, fragments of the required size could have been isolated using HPLC.
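The effect of the sizing step on complexity can be estimated in silico from a list of predicted fragments. The sketch below is illustrative only: the function names `size_select` and `retained_fraction` are hypothetical, the 250 - 350 bp window is taken from this example, and the fragments are assumed to come from a virtual digest of a known sequence.

```python
def size_select(fragments, min_bp=250, max_bp=350):
    """Keep only fragments whose length falls inside the excised window."""
    return [f for f in fragments if min_bp <= len(f) <= max_bp]

def retained_fraction(fragments, min_bp=250, max_bp=350):
    """Fraction of all digested bases that survive size selection."""
    total = sum(len(f) for f in fragments)
    if total == 0:
        return 0.0
    kept = sum(len(f) for f in size_select(fragments, min_bp, max_bp))
    return kept / total

if __name__ == "__main__":
    # Placeholder fragments of 100, 300 and 600 bases.
    fragments = ["A" * 100, "C" * 300, "G" * 600]
    print(len(size_select(fragments)), retained_fraction(fragments))
```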
Adaptor sequences containing PCR primer template sequences were then ligated to the purified fragments using 100U T4 ligase in 1x T4 DNA ligase buffer (New England Biolabs) at 16 °C overnight. The adaptor sequences were 5'-d(pAATTCGAACCCCTTCGGATC)-3' and 5'-d(GATCCGAAGGGGTTCGAATT)-3' (Figure 2, Step 4). The ligase was then heat inactivated at 65 °C for 15 minutes.
The fragments were then subjected to PCR with one primer that corresponded to the PCR primer template sequence 5'-d(GATCCGAAGGGGTTCGAATT)-3' (Figure 2, Step 5). The PCR mixture contained approx. 1 ng ligated DNA fragments, 5 units AmpliTaq Gold polymerase (Perkin Elmer), 5 uM primer, 200 uM dNTPs, 15 mM Tris-HCl (pH 8.2), 50 mM KCl, 2.5 mM MgCl2 in a final volume of 50 ul. PCR was performed in a Perkin-Elmer 9600 thermocycler using an initial 10 minute denaturation at 95 °C, 35 cycles of a 1 minute denaturation at 94 °C, annealing for 1 minute at 57 °C and extension at 72 °C for ~; minutes. This is followed by a final 5 minute extension cycle at 72 °C.
The PCR products were then purified with QIAquick PCR Purification kit (Qiagen) according to the manufacturer's instructions and fragmented with DNase I. The remaining fragments were then labeled with biotin-N6-ddATP as follows:
In each tube, incubate 10 ug DNA with 0.3 unit DNase I (Promega) at 37 °C for 30 minutes in a 45 ul mixture also containing 10 mM Tris-Acetate (pH 7.5), 10 mM magnesium acetate and 50 mM potassium acetate. Stop the reaction by heating the sample to 95 °C for 15 minutes. Label the sample by adding 60 units terminal transferase and 4 pmol biotin-N6-ddATP (Dupont NEN) followed by incubation at 37 °C for 90 minutes and a final heat inactivation at 95 °C for 15 minutes.
The labeled DNA was then hybridized to an array in a hybridization mixture containing 80 ug labeled DNA, 160 ug human COT-1 DNA (GIBCO), 3.5 M tetramethylammonium chloride, 10 mM MES (pH 6.5), 0.01% Triton-100, 20 ug herring sperm DNA, 100 ug bovine serum albumin and 200 pM control oligomer at 44 °C for 40 hours on a rotisserie at 40 rpm. The arrays were then washed with 0.1 M NaCl in 10 mM MES at 44 °C for 30 minutes on a rotisserie at 40 rpm. The hybridized arrays were then stained with a staining solution [10 mM MES (pH 6.5), 1 M NaCl, 10 ug/ml streptavidin R-phycoerythrin, 0.5 mg/ml acetylated BSA, 0.01% Triton-100] at 40 °C for 15 minutes. The arrays were then washed with 6x SSPET [0.9 M NaCl, 60 mM NaH2PO4 (pH 7.4), 6 mM EDTA, 0.005% Triton-100] on a GeneChip Fluidics Station (Affymetrix, Inc., Santa Clara, CA) 10 times at 22 °C. The arrays were then anti-streptavidin antibody stained at 40 °C for 30 minutes with antibody solution [10 mM MES (pH 6.5), 1 M NaCl, 10 ug/ml streptavidin R-phycoerythrin, 0.5 mg/ml acetylated BSA, 0.01% Triton-100]. The arrays were then restained with staining solution for 15 minutes followed by 6x SSPET washing as above. The arrays were then scanned with a confocal scanner at 560 nm. The hybridization patterns were then screened for SNP detection with a computer program as described in D.G. Wang et al., Science 280, 1077-1082 (1998). The results of the hybridization can be seen in Figures 8A and 8B.
Example 2 - Digestion with a Type IIS Endonuclease and Selective Ligation
Complexity was reproducibly reduced after digestion with a type IIS endonuclease and selective ligation to an adaptor sequence. 2 ug of genomic DNA was digested with Bbv I at 37 °C overnight (Figure 3, Step 1). The enzyme was heat inactivated at 65 °C for 15 minutes.
Adaptors containing PCR primer template sequences were ligated in a 50 ul mixture of 400 ng digested genomic DNA, 10 pmol adaptor and 40 units T4 ligase in 1X T4 ligase buffer (Figure 3, Step 2). The adaptor sequences were as follows: 5'-d(pATNNGATCCGAAGGGGTTCGAATTC)-3' and 5'-d(GAATTCGAACCCCTTCGGATC)-3'. The ligation was conducted at 16 °C overnight. The ligase was inactivated by incubation at 65 °C for 15 minutes.
The fragments were then subjected to PCR with one primer that corresponded to the PCR primer template sequence 5'-d(GAATTCGAACCCCTTCGGATC)-3' in a 50 ul reaction containing 20 ng ligated DNA, 1 unit AmpliTaq Gold polymerase (Perkin Elmer), 3 uM primer, 200 uM dNTPs, 15 mM Tris-HCl (pH 8.0), 50 mM KCl, 2.5 mM MgCl2. PCR was performed in a Perkin-Elmer 9600 thermocycler using an initial 10 minute denaturation at 95 °C, 35 cycles of a 0.5 minute denaturation at 94 °C, annealing for 0.5 minute at 57 °C and extension at 72 °C for 2 minutes. This is followed by a final 5 minute extension cycle at 72 °C.
Example 3 - Double Digestion and Selective PCR
Human genomic DNA was digested in a 40 ul reaction at 37 °C for 1 hour. The reaction mixture contained 0.5 ug human genomic DNA, 0.5 mM DTT, 5 units EcoRI (New England Biolabs), 5 units Sau3AI (New England Biolabs), 0.5 ng/ul BSA, 10 mM Tris-Acetate (pH 7.5), 117 mM magnesium acetate and 50 mM potassium acetate. The enzymes were inactivated at 65 °C for 15 minutes.
The restriction fragments were then ligated to adaptor sequences. The ligation mixture contained: 5 pmol Eco R I adaptor [5'-d(pAATTCGAACCCCTTCGGATC)-3' and 5'-d(GATCCGAAGGGGTTCG)-3'], 50 pmol Sau3A I adaptor [5'-d(pGATCGCCCTATAGTGAGTCGTATTACAGTGGACCATCGAGGGTCA)-3'], 5 mM DTT, 0.5 ng/ul BSA, 100 units T4 DNA ligase, 1 mM ATP, 10 mM Tris-Acetate (pH 7.5), 10 mM magnesium acetate and 50 mM potassium acetate. The ligation mixture was incubated with the restriction fragments at 37 °C for 3 hours. The ligase was inactivated at 65 °C for 20 minutes.
The ligated DNA target was then amplified by PCR. The PCR mixture contained 12.5 ng ligated DNA, 1 unit AmpliTaq Gold polymerase (Perkin Elmer), 0.272 mM EcoRI selective primer (5'-AAGGGGTTCGGAATTCCC-3'; CC as the selective bases), 0.272 uM Sau3AI selective primer (5'-TCACTATAGGGCGATCTG-3'; TG as the selective bases), 200 uM dNTPs, 15 mM Tris-HCl (pH 8.0), 50 mM KCl, 2.5 mM MgCl2 in a final volume of 50 ul. PCR was performed in a Perkin-Elmer thermocycler using an initial 10 minute denaturation at 95 °C, 35 cycles of a 1 minute denaturation at 94 °C, annealing for 1 minute at 56 °C and extension at 72 °C for 2 minutes. This is followed by a final 5 minute extension at 72 °C.
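The selectivity of this PCR can be illustrated with a small model. The sketch below is a hypothetical simplification: a fragment is represented by its genomic top strand running from a complete EcoRI site to a complete Sau3AI site, the function name `amplifies` is an assumption, and perfect 3' matching of the selective bases is assumed to be required for priming. Under these assumptions a fragment amplifies only if the two bases inside the EcoRI site are CC and the two bases inside the Sau3AI site, read from the opposite strand, are TG. With two selective bases on each primer and equal base frequencies, roughly 1 in 256 EcoRI/Sau3AI fragments would be expected to qualify.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

def amplifies(fragment, eco_selective="CC", sau_selective="TG"):
    """True if both selective primers can prime on the fragment.

    The fragment may be supplied in either orientation.
    """
    left = "GAATTC" + eco_selective                       # EcoRI site + selective bases
    right = reverse_complement("GATC" + sau_selective)    # Sau3AI end on the top strand

    def one_orientation(seq):
        return seq.startswith(left) and seq.endswith(right)

    return one_orientation(fragment) or one_orientation(reverse_complement(fragment))

if __name__ == "__main__":
    print(amplifies("GAATTCCCAAAACAGATC"))   # selective bases match at both ends
    print(amplifies("GAATTCAGAAAACAGATC"))   # EcoRI-side bases are AG, not CC
```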
Example 4 - Arbitrarily Primed PCR
PCR primers were designed with the Operon Oligo Toolkit described in the specification above.
Human genomic DNA was amplified in a 100 ul reaction containing 100 ng genomic DNA, 1.25 units AmpliTaq Gold polymerase (Perkin Elmer), 10 uM arbitrary primer, 200 mM dNTPs, 10 mM Tris-HCl (pH 8.3), 50 mM KCl and 2.5 mM MgCl2.
PCR was performed in a Perkin-Elmer 9600 thermocycler using an initial 10 minute denaturation at 95 °C, 35 cycles of a 1 minute denaturation at 94 °C, annealing for 1 minute at 56 °C and extension at 72 °C for 2 minutes. This is followed by a final 7 minute extension at 72 °C.
The PCR product was then purified, fragmented, labeled and hybridized as described in the examples above.
Example 5 - SNP discovery - Generally
As an example, the present invention may be directed to a method for simplifying the detection of, or comparing the presence or absence of, SNPs among individuals, populations, species or between different species. This invention allows for a quick and cost-effective method of comparing polymorphism data between multiple individuals. First, a reduced representation of a nucleic acid sample is produced in a repeatable and highly reproducible manner from multiple individuals, using any of the above described techniques alone or in combination. Then, the data generated by hybridizing the DNA samples collected from multiple individuals to identical arrays, in order to detect the presence or absence of a number of sequence variants, is compared. Arrays are designed to detect specific SNPs or simply to detect the presence of a region known to frequently contain SNPs. In the latter case, other techniques such as sequencing could be employed to identify the SNP.
SNP discovery - Method 1
Typically, the detection of SNPs has been made using at least one procedure in which the nucleic acid sequence that may contain the SNP is amplified using PCR primers. This use can create an expense if many SNPs are to be evaluated or tested and it adds significantly more time to the experiment for primer design and selection and testing. The following example eliminates the need for the specific PCR amplification step or steps. First, using the methods provided in example 1 above, a restriction enzyme or enzymes is used to cut genomic DNA at a large number of sites and a size range of restriction fragments is selected for assay. An electronic database, such as GenBank, is queried to determine which sequences would be cut with the specific restriction enzyme(s) that were selected above. The sequences of the resulting fragments are then used to design DNA arrays which will screen the regions for the SNPs or other variants. The selected fragments are then subjected to further fragmentation and hybridized to the array for analysis.
SNP discovery - Method 2
Alternatively, the method provided in example 2 above may be employed: type IIS restriction enzymes cut genomic DNA from each individual and adaptor sequences are designed to ligate to specific fragments as desired. Adaptor sequences may include both random and specific nucleotide ends as required to produce the desired result. If desired, amplification primers may be designed to hybridize to the adaptor sequences, allowing for amplification of only the fragments of interest. An electronic database and computer modeling system may be used to aid in the selection of appropriate experimental conditions and to design the appropriate arrays. The fragments are then hybridized to the array for analysis.
SNP discovery - Method 3
As another alternative, MutS protein was used to isolate DNA containing SNPs for analysis on an array. 3 ug of DNA was fragmented with Eco R I (alternatively, DNase I could have been used). At this point an equal amount of control DNA was added (this step is optional).
0.5 ug of the fragments were denatured at 95 °C for 10 minutes and gradually cooled to 65 °C over a 60 minute period. The fragments were then incubated at 65 °C for 30 minutes and the temperature was ramped down to 25 °C over a 60 minute period.
1.5 ug MutS protein (Epicentre) was then added and allowed to incubate at room temperature for 15 minutes to allow for binding (Figure 7, Step 1). The bound fragments were then digested with 20 units T7 polymerase (New England Biolabs) at 30 °C for 30 minutes (Figure 7, Step 2). The T7 polymerase was inactivated by incubation at 65 °C for 10 minutes.
Single stranded DNA was trimmed with 100 units of nuclease S1 (Boehringer-Mannheim) at 16 °C for 15 minutes (Figure 7, Step 3). The enzyme was inactivated by adding 50 nmol EDTA and incubation at 65 °C for 15 minutes.
Adaptor sequences containing PCR primer templates were then ligated to the DNA sequences in a 10 ul ligation mixture: 1 ul DNA solution, 4 ul dH2O, 1 ul 10X T4 DNA ligase buffer, 3 ul 10 mM adaptor [5'-d(GATCCGAAGGGGTTCGAATT)-3' and 5'-d(pGAATTCGAACCCCTTCGGATC)-3'] and 1 ul 400 U/ul T4 DNA ligase. The mixture was incubated at 16 °C overnight and the ligase was then inactivated at 65 °C for 15 minutes (Figure 7, Step 4).
The sequences were amplified in a 25 ul reaction containing 0.25 pmol template DNA, 0.125 units AmpliTaq Gold polymerase (Perkin Elmer), 3 uM primer [5'-d(GATCCGAAGGGGTTCGAATT)-3'], 200 uM dNTPs, 15 mM Tris-HCl (pH 8.0), 50 mM KCl and 1.5 mM MgCl2.
PCR was performed in a MJ Research Tetrad thermocycler using an initial 10 minute denaturation at 95 °C, 35 cycles of a 0.5 minute denaturation at 94 °C, annealing for 0.5 minute at 57 °C and extension at 72 °C. This is followed by a final 5 minute extension at 72 °C.
The sequences were then labeled and hybridized to an array as described above.
SNP discovery - Method 4
As another alternative, oligonucleotides attached to magnetic beads may be used for allele specific SNP enrichment and genotyping. Synthesized biotin-tagged oligonucleotides containing sequences complementary to the regions of desired SNPs were mixed with target DNA in a 1000:1 ratio. (Alternatively, a 10:1, 20:1, 50:1, 250:1 or any other ratio could have been chosen.) The sample was then denatured at 95 °C for 10 minutes and allowed to reanneal by slowly cooling to room temperature.
The sample was then bound to streptavidin-magnetic beads (Promega) by mixing the sample and the beads and incubation at room temperature for 10 minutes. The beads were then washed with 1X MES with 1 M sodium chloride (NaCl) three times. The beads were then resuspended in 50 ul 1X mung bean nuclease buffer and mixed with 1 unit of mung bean nuclease. The beads were then incubated at 30 °C for 15 minutes. The mung bean nuclease was then inactivated by adding 1% SDS. The beads were then washed with 1X MES with 1 M NaCl three times.
The beads were then resuspended in ligation mixture containing T4 ligase in 1X T4 ligase buffer and 200 fold excess adaptor I sequence [5'-d(ATTAACCCTCACTAAAGCTGGAG)-3' and 5'-d(pCTCCAGCTTTAGTGAGGGTTAAT)-3'; BpmI recognition sites are highlighted in boldface] at 16 °C overnight. The ligase was then inactivated by incubation at 65 °C for 10 minutes.
The beads were then washed with 1X MES with 1 M NaCl three times and then resuspended in 50 ul 1X Bpm I restriction buffer. Bpm I was then added and the beads were incubated at 37 °C for 1 hr. The enzyme was inactivated by incubation at 65 °C for 10 minutes and the supernatant solution with the sequences containing the desired SNPs was collected.
A second set of adaptor sequences containing PCR template sequences [5'-d(pCTATAGTGAGTCGTATT)-3' and 5'-d(AATACGACTCACTATAGNN)-3'] and ligase were then added to the supernatant solution and incubated at 16 °C overnight. The ligase was then heat inactivated at 65 °C for 10 minutes.
The samples were then amplified with PCR using T3 [5'-d(ATTAACCCTCACTAAAG)-3'] and T7 [5'-d(TAATACGACTCACTATAGGG)-3'] sequencing primers (Operon) in a 50 ml reaction containing 106 copies of each target DNA, 1 unit AmpliTaq Gold polymerase (Perkin Elmer), 2 uM each primer, 200 uM dNTPs, 15 mM Tris-HCl (pH 8.0), 50 mM KCl and 2.5 mM MgCl2.
PCR was performed in a MJ Research Tetrad Thermocycler using an initial 10 minute denaturation at 95 °C, 45 cycles of a 0.5 minute denaturation at 94 °C, annealing for 0.5 minute at ~2 °C and extension at 72 °C for 1 minute. This is followed by a final 5 minute extension at 72 °C. The fragments were then labeled and hybridized to an array.
Methods of Use
The present methods of sample preparation and analysis are appropriate for a wide variety of applications. Any analysis of genomic DNA may benefit from a reproducible method of complexity management.
2o As a preferred embodiment, the present procedure can be used for SNP
discovery and to genotype individuals. For example, any of the procedures described above, alone or in combination, could be used to isolate the SNPs present in one or more specific regions of genomic DNA. Arrays could then be designed and manufactured on a large scale basis to interrogate only those fragments containing the regions of interest. Thereafter, a sample from one or more individuals would be obtained and prepared using the same techniques which were used to design the array.
Each sample can then be hybridized to a pre-designed array and the hybridization pattern can be analyzed to determine the genotype of each individual or of a population of individuals as a whole. Methods of use for polymorphisms can be found in, for example, co-pending U.S. application 08/813,159. Some methods of use are briefly discussed below.
Correlation of Polymorphisms with Phenotypic Traits
Some polymorphisms occur within a protein coding sequence and contribute to phenotype by affecting protein structure. The effect may be neutral, beneficial or detrimental, or both beneficial and detrimental, depending on the circumstances. For example, a heterozygous sickle cell mutation (which involves a single nucleotide polymorphism) confers resistance to malaria, but a homozygous sickle cell mutation is usually lethal. Other polymorphisms occur in noncoding regions but may exert phenotypic effects indirectly via influence on replication, transcription, and translation.
A single polymorphism may affect more than one phenotypic trait. Likewise, a single phenotypic trait may be affected by polymorphisms in different genes.
Further, some polymorphisms predispose an individual to a distinct mutation that is causally related to a certain phenotype.
Phenotypic traits include diseases that have known but hitherto unmapped genetic components (e.g., agammaglobulinemia, diabetes insipidus, Lesch-Nyhan syndrome, muscular dystrophy, Wiskott-Aldrich syndrome, Fabry's disease, familial hypercholesterolemia, polycystic kidney disease, hereditary spherocytosis, von Willebrand's disease, tuberous sclerosis, hereditary hemorrhagic telangiectasia, familial colonic polyposis, Ehlers-Danlos syndrome, osteogenesis imperfecta, and acute intermittent porphyria). Phenotypic traits also include symptoms of, or susceptibility to, multifactorial diseases of which a component is or may be genetic, such as autoimmune diseases, inflammation, cancer, diseases of the nervous system, and infection by pathogenic microorganisms. Some examples of autoimmune diseases include rheumatoid arthritis, multiple sclerosis, diabetes (insulin-dependent and non-dependent), systemic lupus erythematosus and Graves' disease. Some examples of cancers include cancers of the bladder, brain, breast, colon, esophagus, kidney, leukemia, liver, lung, oral cavity, ovary, pancreas, prostate, skin, stomach and uterus.
Phenotypic traits also include characteristics such as longevity, appearance (e.g., baldness, obesity), strength, speed, endurance, fertility, and susceptibility or receptivity to particular drugs or therapeutic treatments.
Correlation is performed for a population of individuals who have been tested for the presence or absence of a phenotypic trait of interest and for polymorphic marker sets. To perform such analysis, the presence or absence of a set of polymorphisms (i.e., a polymorphic set) is determined for a set of the individuals, some of whom exhibit a particular trait, and some of whom do not. The alleles of each polymorphism of the set are then reviewed to determine whether the presence or absence of a particular allele is associated with the trait of interest.
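The association check described above can be sketched as a chi-squared test on a 2x2 table of allele presence against trait presence. This is a minimal illustration, not the procedure prescribed by this disclosure: the function name and the counts are hypothetical, no continuity correction is applied, and 3.841 is the standard 5% critical value for one degree of freedom.

```python
def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic for a 2x2 table.

    a: allele present, trait present   b: allele present, trait absent
    c: allele absent,  trait present   d: allele absent,  trait absent
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


CRITICAL_5_PERCENT_1DF = 3.841  # chi-squared critical value, 1 degree of freedom

# Hypothetical counts for 100 individuals.
stat = chi_squared_2x2(30, 10, 20, 40)
print(round(stat, 3), stat > CRITICAL_5_PERCENT_1DF)
```

With these made-up counts the statistic is well above the critical value, so the allele and the trait would be reported as associated at the 5% level. A real study would also need to correct for testing many polymorphisms at once.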
Correlation can be performed by standard statistical methods such as a chi-squared test, and statistically significant correlations between polymorphic form(s) and phenotypic characteristics are noted. For example, it might be found that the presence of allele A1 at polymorphism A correlates with heart disease. As a further example, it might be found that the combined presence of allele A1 at polymorphism A and allele B1 at polymorphism B correlates with increased milk production of a farm animal.
(See Beitz et al., US 5,292,639.)
Genetic Mapping of Phenotypic Traits
Linkage analysis is useful for mapping a genetic locus associated with a phenotypic trait to a chromosomal position, and thereby cloning gene(s) responsible for the trait. See Lander et al., Proc. Natl. Acad. Sci. (USA) 83, 7353-7357 (1986); Lander et al., Proc. Natl. Acad. Sci. (USA) 84, 2363-2367 (1987); Donis-Keller et al., Cell 51, 319-337 (1987); Lander et al., Genetics 121, 185-199 (1989). Genes localized by linkage can be cloned by a process known as directional cloning. See Wainwright, Med. J. Australia 159, 170-174 (1993); Collins, Nature Genetics 1, 3-6 (1992) (each of which is incorporated by reference in its entirety for all purposes).
Linkage studies are typically performed on members of a family. Available members of the family are characterized for the presence or absence of a phenotypic trait and for a set of polymorphic markers. The distribution of polymorphic markers in an informative meiosis is then analyzed to determine which polymorphic markers co-segregate with a phenotypic trait. See, e.g., Kerem et al., Science 245, 1073-(1989); Monaco et al., Nature 316, 842 (1985); Yamoka et al., Neurology 40, (1990); Rossiter et al., FASEB Journal 5, 21-27 (1991).
Disequilibrium mapping of the entire genome
Linkage disequilibrium or allelic association is the preferential association of a particular allele or genetic marker with a specific allele or genetic marker at a nearby chromosomal location more frequently than expected by chance for any particular allele frequency in the population. For example, if locus X has alleles a and b, which occur equally frequently, and linked locus Y has alleles c and d, which occur equally frequently, one would expect the combination ac to occur with a frequency of 0.25. If ac occurs more frequently, then alleles a and c are in linkage disequilibrium.
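The worked example above can be expressed with the usual disequilibrium coefficient D, the observed haplotype frequency minus the product of the two allele frequencies. The sketch below is illustrative only; the function name and the haplotype counts are hypothetical.

```python
def ld_coefficient(haplotypes, allele_x, allele_y):
    """D = f(xy) - f(x) * f(y), computed from a list of (X allele, Y allele) haplotypes."""
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes supplied")
    f_x = sum(1 for x, _ in haplotypes if x == allele_x) / n
    f_y = sum(1 for _, y in haplotypes if y == allele_y) / n
    f_xy = sum(1 for h in haplotypes if h == (allele_x, allele_y)) / n
    return f_xy - f_x * f_y


# Hypothetical sample of 100 haplotypes: a and c are each at frequency 0.5,
# but the combination ac is seen at 0.40 instead of the expected 0.25.
sample = ([("a", "c")] * 40 + [("a", "d")] * 10 +
          [("b", "c")] * 10 + [("b", "d")] * 40)
print(round(ld_coefficient(sample, "a", "c"), 3))
```

A value of D near zero corresponds to the 0.25 expectation in the text; a positive value, as in this made-up sample, indicates that a and c occur together more often than chance predicts.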
Linkage disequilibrium may result from natural selection of certain combinations of alleles or because an allele has been introduced into a population too recently to have reached equilibrium with linked alleles.
A marker in linkage disequilibrium can be particularly useful in detecting susceptibility to disease (or other phenotype) notwithstanding that the marker does not cause the disease. For example, a marker (X) that is not itself a causative element of a disease, but which is in linkage disequilibrium with a gene (including regulatory sequences) (Y) that is a causative element of a phenotype, can be detected to indicate susceptibility to the disease in circumstances in which the gene Y may not have been identified or may not be readily detectable.
Marker assisted breeding
Genetic markers can decipher the genomes in animals and crop plants. Genetic markers can aid a breeder in the understanding, selecting and managing of the genetic complexity of an agronomic or desirable trait. The agriculture world, for example, has a great deal of incentive to try to produce food with a rising number of desirable traits (high yield, disease resistance, taste, smell, color, texture, etc.) as consumer demand and expectations increase. However, many traits, even when the molecular mechanisms are known, are too difficult or costly to monitor during production.
Readily detectable polymorphisms which are in close physical proximity to the desired genes can be used as a proxy to determine whether the desired trait is present or not in a particular organism. This provides an efficient screening tool which can accelerate the selective breeding process.
Pharmacogenomics
Genetic information can provide a powerful tool for doctors to determine what course of medicine is best for a particular patient. A recent Science paper entitled "Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring" (to be published 10/15/99, hereby incorporated by reference in its entirety for all purposes) discusses the use of genetic information discovered through the use of arrays to determine the specific type of cancer a particular patient has. The paper goes on to discuss the ways in which particular treatment options can then be tailored for each patient's particular type of cancer. Similar uses of genetic information for treatment plans have been disclosed for patients with HIV. (See US Patent Application 5,861,242.)
The pharmaceutical industry is likewise interested in the area of pharmacogenomics. Every year pharmaceutical companies suffer large losses from drugs which fail clinical trials for one reason or another. Some of the most difficult are those drugs which, while being highly effective for a large percentage of the population, prove dangerous or even lethal for a very small percentage of the population. Pharmacogenomics can be used to correlate a specific genotype with specific responses to a drug. The basic idea is to get the right drug to the right patient.
If pharmaceutical companies (and later, physicians) can accurately remove from the potential recipient pool those patients who would suffer adverse responses to a particular drug, many research efforts which are currently being dropped by pharmaceutical companies could be resurrected, saving hundreds of thousands of dollars for the companies and providing many currently unavailable medications to patients.
Similarly, some medications may be highly effective for only a very small percentage of the population while proving only slightly effective or even ineffective for a large percentage of patients. Pharmacogenomics allows pharmaceutical companies to predict which patients would be the ideal candidates for a particular drug, thereby dramatically reducing failure rates and providing greater incentive to companies to continue to conduct research into those drugs.
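The idea of correlating genotype with drug response, as discussed above, can be illustrated by tabulating adverse-response rates per genotype and flagging genotypes above a chosen threshold. Everything in this sketch is hypothetical: the function names, the genotype labels, the trial records and the 25% cut-off are made up for illustration and are not taken from this disclosure.

```python
from collections import defaultdict


def adverse_rate_by_genotype(records):
    """records: iterable of (genotype, had_adverse_response) pairs."""
    totals = defaultdict(int)
    adverse = defaultdict(int)
    for genotype, had_adverse_response in records:
        totals[genotype] += 1
        if had_adverse_response:
            adverse[genotype] += 1
    return {g: adverse[g] / totals[g] for g in totals}


def genotypes_to_exclude(records, max_rate):
    """Genotypes whose observed adverse-response rate exceeds max_rate."""
    rates = adverse_rate_by_genotype(records)
    return sorted(g for g, rate in rates.items() if rate > max_rate)


# Hypothetical trial data for one SNP with genotypes AA, AG and GG.
records = ([("AA", False)] * 48 + [("AA", True)] * 2 +
           [("AG", False)] * 27 + [("AG", True)] * 3 +
           [("GG", False)] * 4 + [("GG", True)] * 6)
print(genotypes_to_exclude(records, max_rate=0.25))
```

In practice the per-genotype rates would be tested for statistical significance before any patient group was excluded; this sketch only shows the bookkeeping.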
Forensics
The capacity to identify a distinguishing or unique set of forensic markers in an individual is useful for forensic analysis. For example, one can determine whether a blood sample from a suspect matches a blood or other tissue sample from a crime scene by determining whether the set of polymorphic forms occupying selected polymorphic sites is the same in the suspect and the sample. If the set of polymorphic markers does not match between a suspect and a sample, it can be concluded (barring experimental error) that the suspect was not the source of the sample. If the set of markers does match, one can conclude that the DNA from the suspect is consistent with that found at the crime scene. If frequencies of the polymorphic forms at the loci tested have been determined (e.g., by analysis of a suitable population of individuals), one can perform a statistical analysis to determine the probability that a match of suspect and crime scene sample would occur by chance.
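The chance-match probability mentioned above can be sketched as a product of genotype frequencies across loci. This illustration assumes Hardy-Weinberg proportions at each locus and statistical independence between loci; neither assumption is stated in the text, and the function names, locus names and allele frequencies are hypothetical.

```python
def genotype_frequency(allele1, allele2, allele_freqs):
    """Expected genotype frequency under Hardy-Weinberg proportions (an assumption)."""
    p, q = allele_freqs[allele1], allele_freqs[allele2]
    return p * p if allele1 == allele2 else 2 * p * q


def random_match_probability(profile, population_freqs):
    """Probability that an unrelated individual shares the whole profile.

    profile: {locus: (allele1, allele2)}
    population_freqs: {locus: {allele: frequency}}
    Loci are treated as independent, so per-locus frequencies are multiplied.
    """
    probability = 1.0
    for locus, (allele1, allele2) in profile.items():
        probability *= genotype_frequency(allele1, allele2, population_freqs[locus])
    return probability


# Hypothetical two-SNP profile and population allele frequencies.
profile = {"snp1": ("A", "G"), "snp2": ("C", "C")}
freqs = {"snp1": {"A": 0.5, "G": 0.5}, "snp2": {"C": 0.2, "T": 0.8}}
print(random_match_probability(profile, freqs))
```

Each additional independent marker multiplies the probability by a factor below one, which is why a larger marker set gives a more discriminating profile.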
Paternity Testing / Determination of Relatedness
The object of paternity testing is usually to determine whether a male is the father of a child. In most cases, the mother of the child is known and thus the mother's contribution to the child's genotype can be traced. Paternity testing investigates whether the part of the child's genotype not attributable to the mother is consistent with that of the putative father. Paternity testing can be performed by analyzing sets of polymorphisms in the putative father and the child. Of course, the present invention can be expanded to the use of this procedure to determine if one individual is related to another. Even more broadly, the present invention can be employed to determine how related one individual is to another, for example, between races or species.
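The consistency check described above can be sketched as follows: at each locus, work out which of the child's alleles could have come from the father once the mother's contribution is accounted for, then ask whether the putative father carries one of them. This is a simplified illustration with hypothetical function names and genotypes; it ignores mutation and genotyping error, which a real analysis would have to allow for.

```python
def paternal_candidates(child, mother):
    """Alleles at one locus that the child may have inherited from the father."""
    first, second = child
    candidates = set()
    if first in mother:       # mother may have supplied `first`
        candidates.add(second)
    if second in mother:      # mother may have supplied `second`
        candidates.add(first)
    if not candidates:
        raise ValueError("child genotype is inconsistent with the stated mother")
    return candidates


def father_excluded(child, mother, father):
    """True if any locus rules out the putative father.

    Each argument maps locus name -> (allele, allele).
    """
    for locus, child_genotype in child.items():
        needed = paternal_candidates(child_genotype, mother[locus])
        if not needed & set(father[locus]):
            return True
    return False


# Hypothetical genotypes at two SNP loci.
child = {"snp1": ("A", "G"), "snp2": ("C", "T")}
mother = {"snp1": ("A", "A"), "snp2": ("C", "C")}
print(father_excluded(child, mother, {"snp1": ("G", "T"), "snp2": ("T", "T")}))
print(father_excluded(child, mother, {"snp1": ("A", "A"), "snp2": ("T", "T")}))
```

A non-exclusion only shows that the man is consistent with being the father; weighing that result requires population allele frequencies.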
Conclusion
From the foregoing it can be seen that the advantage of the present invention is that it provides a flexible and scalable method for analyzing complex samples of DNA, such as genomic DNA. These methods are not limited to any particular type of nucleic acid sample: plant, bacterial, animal (including human) total genome DNA, RNA, cDNA and the like may be analyzed using some or all of the methods disclosed in this invention. This invention provides a powerful tool for analysis of complex nucleic acid samples. From experiment design to isolation of desired fragments and hybridization to an appropriate array, the above invention provides for faster, more efficient and less expensive methods of complex nucleic acid analysis.
All publications and patent applications cited above are incorporated by reference in their entirety for all purposes to the same extent as if each individual publication or patent application were specifically and individually indicated to be so incorporated by reference. Although the present invention has been described in some detail by way of illustration and example for purposes of clarity and understanding, it will be apparent that certain changes and modifications may be practiced within the scope of the appended claims.
EXHIBIT A
#!/internet/bin/perl5.002 -w
# Copyright (c) 1998
# Eugene Wang
# *** BEGIN ***
#---------------------------------------------------------------------
# input sequence (File 0) to compare
#---------------------------------------------------------------------
if ($#ARGV < 2) {die "argv < 2";}
open(EnzymeInput,$ARGV[0]) || die "Cannot open input file $ARGV[0]";
#print "Input Enzyme 1 sequence = ";
$E1sequence = <EnzymeInput>;
chomp $E1sequence;
$lenE1Seq = length($E1sequence);
$E1sequence =~ tr/a-z/A-Z/;
$E1ExtLoc = <EnzymeInput>;
chomp($E1ExtLoc);
$lenE1Total = $lenE1Seq + $E1ExtLoc;
#print "Input Enzyme 2 sequence = ";
$E2sequence = <EnzymeInput>;
chomp $E2sequence;
$E2sequence = reverse($E2sequence);
$lenE2Seq = length($E2sequence);
$E2sequence =~ tr/a-z/A-Z/;
$E2ExtLoc = <EnzymeInput>;
chomp($E2ExtLoc);
$lenE2Total = $lenE2Seq + $E2ExtLoc;
$lenE1Extra = $E2ExtLoc - $E1ExtLoc;
$E1SizeStart = <EnzymeInput>;
chomp($E1SizeStart);
$E1SizeEnd = <EnzymeInput>;
chomp($E1SizeEnd);
#---------------------------------------------------------------------
# open input FASTA file (File 1)
#---------------------------------------------------------------------
#print "Input file name = ";
#$fname = <>;
#chomp $fname;
#$fname = "H DJ0167F23.aeq";
open(Infile,$ARGV[1]) || die "Cannot open input file $ARGV[1]";
#---------------------------------------------------------------------
# open output file (File 2)
#---------------------------------------------------------------------
open(Outfile,">$ARGV[2]") || die "Cannot open output file $ARGV[2]";
#open (Outfile,">output.txt");
#print Outfile "Qualifier\tSequence";
#---------------------------------------------------------------------
# read input FASTA file
#---------------------------------------------------------------------
$line = <Infile>;    #header line
print Outfile "$line";
$linecount = 0;
$FullSeq = "";
#---------------------------------------------------------------------
# check headerline format
#---------------------------------------------------------------------
chomp $line;
@fields = split(/\|/,$line);
$ntokens = 0;
foreach (@fields) {$ntokens++;}
#$ntokens = @fields;
if ($ntokens > 3) {$FragmentID = $fields[3];}
else {
    $line =~ s/^> />/;
    @fields = split(/ /,$line);
    $ntokens = 0;
    foreach (@fields) {$ntokens++;}
    if ($ntokens > 0) {$FragmentID = $fields[0]; $FragmentID =~ s/^>//;}
    else {$FragmentID = "UnknownFragment";}
}
while ($line = <Infile>)    #read in a line
{
    chomp $line;
#   print "$line\n";
    $linecount++;
    next if ($line eq "");
    if ($line =~ /^#/ || $line =~ /^>/)    ##if first char is a '#' or '>'
    {
        &CompareSeqWithEnzyme_ClassIIs();    ##compare the sequences before this line
        print Outfile "\n\n\n$line\n";
        $FullSeq = "";
        $linecount = 0;
        @fields = split(/\|/,$line);
        $FragmentID = $fields[3];
    }
    else {
        $FullSeq .= $line;
    }
}
#print Outfile "$FullSeq";
close (Infile);
#---------------------------------------------------------------------
# compare sequence with FASTA input
#---------------------------------------------------------------------
&CompareSeqWithEnzyme_ClassIIs();
#---------------------------------------------------------------------
# close output file
#---------------------------------------------------------------------
close (Outfile);

#compare sequence with FASTA input
sub CompareSeqWithEnzyme_ClassIIs()
{
    $lenFullSeq = length($FullSeq);
    if ($lenFullSeq <= 0) {return(0);}
    print Outfile "TotalLength:\t$lenFullSeq\n";
    print Outfile "Enzyme top strand: ";
    print Outfile "(5\'-$E1sequence";
    if ($E1ExtLoc>0) {print Outfile "(N)$E1ExtLoc";}
    print Outfile "-3\')";
    print Outfile "\n";
    print Outfile "Enzyme bottom strand: ";
    print Outfile "(5\'-";
    if ($E2ExtLoc>0) {print Outfile "(N)$E2ExtLoc";}
    print Outfile "$E2sequence-3\')";
    print Outfile " or ";
    my $ts = reverse($E2sequence);
    print Outfile "(3\'-$ts";
    if ($E2ExtLoc>0) {print Outfile "(N)$E2ExtLoc";}
    print Outfile "-5\')";
    print Outfile "\n";
    print Outfile "Segment size: $E1SizeStart - $E1SizeEnd\n";
    $minLen = $lenE1Total < $lenE2Total ? $lenE1Total : $lenE2Total;
    $maxLen = $lenE1Total > $lenE2Total ? $lenE1Total : $lenE2Total;
    $nMatchE1 = 0;
    $nSelected = 0;
    @EnzLocLeft = ();
    @EnzLocRight = ();
    @EnzTypeLeft = ();
    @EnzTypeRight = ();
    if ($minLen > 0)
    {
#       for ($i=0; $i <= $lenFullSeq-$lenE1Seq; $i++)
        for ($i=0; $i <= $lenFullSeq-$maxLen; $i++)
        {
            if (substr($FullSeq,$i,$lenE1Seq) eq $E1sequence)
            {
#               $EnzLocLeft[$nMatchE1] = $i + $lenE1Total;
                ##have to use push()
#               $EnzTypeLeft[$nMatchE1] = 1;
                push(@EnzLocLeft,$i + $lenE1Total);
                push(@EnzTypeLeft,1);
#               print Outfile "$nMatchE1\t$i\t";
#               print Outfile "type 1\t";
#               print Outfile "$E1sequence\t";
#               print Outfile substr($FullSeq,$i,$lenE1Total);
#               print Outfile "\n";
                # every match after the first also closes the previous segment
                if ($nMatchE1 > 0)
                {
                    push(@EnzLocRight,$i + $lenE1Total-1);
                    push(@EnzTypeRight,1);
                }
                $nMatchE1++;
            }
#           if (substr($FullSeq,$i+$E2ExtLoc,$lenE2Seq) eq $E2sequence)
            elsif (substr($FullSeq,$i+$E2ExtLoc,$lenE2Seq) eq $E2sequence)
            {
#               $EnzLocLeft[$nMatchE1] = $i;
#               $EnzCutLeft[$nMatchE1] = 2;
                push(@EnzLocLeft,$i);
                push(@EnzTypeLeft,2);
#               print Outfile "$nMatchE1\t$i\t";
#               print Outfile "type 2\t";
#               print Outfile "$E2sequence\t";
#               print Outfile substr($FullSeq,$i,$lenE2Total);
#               print Outfile "\n";
                if ($nMatchE1 > 0)
                {
                    push(@EnzLocRight,$i-1);
                    push(@EnzTypeRight,2);
                }
                $nMatchE1++;
            }
        }
        # close the last open segment at the end of the scanned region
        if ($nMatchE1 > 0) {
            push(@EnzLocRight,$i-1);
            push(@EnzTypeRight,2);
        }
    }
    print Outfile "Number of segments: $nMatchE1\n";
    if ($nMatchE1 != ($#EnzLocRight+1)) {die ("Counting error...nMatchE1($nMatchE1) != $#EnzLocRight");}
    print Outfile "Matched loci:\n";
    for ($i=0; $i < $nMatchE1; $i++) {
        print Outfile "$EnzLocLeft[$i]\t";
    }
    print Outfile "\nSegment Size:\n";
    for ($i=0; $i < $nMatchE1-1; $i++) {
        $tmpSegSize = $EnzLocRight[$i] - $EnzLocLeft[$i] + 1;
        if ($tmpSegSize >= $E1SizeStart && $tmpSegSize <= $E1SizeEnd) {
            $SegSelected[$nSelected++] = $i;
        }
        print Outfile "$tmpSegSize\t";
    }
    ##-----------------------------------------------------------------
    ## print out the Segment (E1) sequences
    ##-----------------------------------------------------------------
    print Outfile "\nSegments Selected ($nSelected):";
    for ($i=0; $i < $nSelected; $i++) {
        $selSeq = $SegSelected[$i];
        $E1left = $EnzLocLeft[$selSeq];
        $E1right = $EnzLocRight[$selSeq];
        if ($lenE1Extra > 0) {$E1right += $lenE1Extra;}
        else {$E1left += $lenE1Extra;}
        $lenSelSeq = $E1right - $E1left + 1;
        $OutputHeaderLine = ">" . $FragmentID . "-" . $selSeq .
            "\tsize=" . $lenSelSeq;
        $OutputHeaderLine .= "\tLoci=" . $E1left . "-" . $E1right;
        $OutputHeaderLine .= "\tEnz$EnzTypeLeft[$selSeq]-Enz$EnzTypeRight[$selSeq]";
        print Outfile "\n$OutputHeaderLine";
        print "$OutputHeaderLine";
        # Segment sequence
        $SeqE1toNextE1 = substr($FullSeq,$E1left,$lenSelSeq);
        print Outfile "\n$SeqE1toNextE1\n";
        print "\n$SeqE1toNextE1\n";
    }
    return ($lenFullSeq);
}
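For readers less familiar with Perl, the core idea of Exhibit A (locating a recognition site, cutting at a fixed offset downstream of it, and keeping fragments within a size window) can be sketched in Python. This is a simplified illustration and not a port of the exhibit: it scans one strand for a single site, and the function names, the example site and the offset are hypothetical rather than the properties of any real enzyme.

```python
def find_sites(seq, site):
    """Start positions of every (possibly overlapping) occurrence of site."""
    seq, site = seq.upper(), site.upper()
    hits = []
    start = seq.find(site)
    while start != -1:
        hits.append(start)
        start = seq.find(site, start + 1)
    return hits


def digest(seq, site, cut_offset):
    """Cut cut_offset bases downstream of each site; return the fragments."""
    cuts = sorted({pos + len(site) + cut_offset
                   for pos in find_sites(seq, site)
                   if pos + len(site) + cut_offset <= len(seq)})
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def select_by_size(fragments, smallest, largest):
    """Keep fragments whose length falls inside the size window."""
    return [f for f in fragments if smallest <= len(f) <= largest]


# Hypothetical site "GGAT" cutting 2 bases past the end of the site.
fragments = digest("AAGGATCCTTTTGGATCCAAAAAA", "GGAT", 2)
print(fragments)
print(select_by_size(fragments, 7, 10))
```

The fragments that survive the size window are the ones an array designed from the same in-silico digest would need to interrogate.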
EXHIBIT B
#!/internet/bin/perl5.002 -w
#****************************************************************
# Copyright (c) 1998
# Author: Eugene Wang
# Title: Ligate
# Purpose: Find matching segments/sequences in two files
#****************************************************************
if ($#ARGV != 2) {die "Number of argv ($#ARGV+1) != 3";}
#---------------------------------------------------------------------
# input file
#---------------------------------------------------------------------
open(InfileLigate,$ARGV[0]) or die "Open error...$ARGV[0]\n";
$locLigate = <InfileLigate>;
chomp $locLigate;
$seqLigate = <InfileLigate>;
chomp $seqLigate;
close (InfileLigate);
#---------------------------------------------------------------------
# output file
#---------------------------------------------------------------------
open(Infile,$ARGV[1]) or die "Open error...$ARGV[1]\n";
$OutName = $ARGV[2];
open (Outfile,">$OutName") or die("Open error...$OutName");
$alreadyReadOne = 0;
$sequence = "";
while ($line = <Infile>)    #read in a line
{
    chomp $line;
    next if ($line eq "");
    if ($line =~ /^#/ || $line =~ /^>/)    ##if first char is a '#' or '>'
    {
        # a new header ends the previous record, so test that record now
        if ($alreadyReadOne == 1) {
            if (&Ligate($sequence,$locLigate,$seqLigate) == 1) {
                print Outfile "$headerLine\n";
                print Outfile "$sequence\n";
            };
            $sequence = "";
        }
        $headerLine = $line;
        $alreadyReadOne = 1;
    }
    else {
        $sequence .= $line;
    }
}
# test the final record in the file
if ($alreadyReadOne == 1) {
    if (&Ligate($sequence,$locLigate,$seqLigate) == 1) {
        print Outfile "$headerLine\n";
        print Outfile "$sequence\n";
    };
}
close (Infile);
close (Outfile);

#compare sequence with Ligation Adapter sequence
sub Ligate()
{
    local $retcode = 0;
    local ($seq,$locLigate,$seqLigate) = @_;
    local $lenLigate = length($seqLigate);
    local $lenSeq = length($seq);
    if ((substr($seq,$locLigate,$lenLigate) eq $seqLigate) &&
        (substr($seq,$lenSeq-$locLigate-$lenLigate,$lenLigate) eq $seqLigate)) {
        $retcode = 1;
    }
    return $retcode;
}
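The test performed by the Ligate routine in Exhibit B can be restated compactly in Python: a fragment passes only if the given sequence occurs at a fixed offset from both of its ends. This is an illustrative sketch with hypothetical function names and example sequences; like the exhibit, it compares the same literal string at both ends rather than a reverse complement.

```python
def matches_at_both_ends(seq, offset, adapter):
    """True if adapter occurs offset bases in from each end of seq."""
    n = len(adapter)
    right_start = len(seq) - offset - n
    if right_start < 0:            # fragment too short to hold the adapter
        return False
    left = seq[offset:offset + n]
    right = seq[right_start:right_start + n]
    return left == adapter and right == adapter


def filter_records(records, offset, adapter):
    """Keep (header, sequence) pairs whose sequence passes the end test."""
    return [(h, s) for h, s in records if matches_at_both_ends(s, offset, adapter)]


# Hypothetical FASTA-style records.
records = [(">frag1", "AGATCTTTTGATCA"), (">frag2", "AGATCTTTTTTTTT")]
print(filter_records(records, 1, "GATC"))
```

Only the first record is kept here, because the second carries the sequence at one end only.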
Claims (38)
1. A method of analyzing a first nucleic acid sample comprising:
providing said first nucleic acid sample;
reproducibly reducing the complexity of said first nucleic acid sample to produce a second nucleic acid sample which may comprise a plurality of non-identical sequences whereby said second nucleic acid sample is obtainable by:
fragmenting said first nucleic acid sample to produce fragments and ligating adaptor sequences to said fragments;
fragmenting said first nucleic acid sample to produce fragments, denaturing said fragments, allowing some of said fragments to reanneal to form double stranded DNA sequences and removing said double stranded DNA sequences;
amplification by arbitrarily primed PCR;
hybridizing said first nucleic acid sample to an oligonucleotide probe bound to a solid support;
hybridizing said first nucleic acid sequence to a mismatch binding protein;
providing a nucleic acid array;
hybridizing said second nucleic acid sample to said array; and analyzing a hybridization pattern resulting from said hybridization.
2. The method of claim 1 wherein said second nucleic acid sample comprises at least 0.5 % of said nucleic acid sample.
3. The method of claim 1 wherein said second nucleic acid sample comprises at least 3 % of said nucleic acid sample.
4. The method of claim 1 wherein said second nucleic acid sample comprises at least 12 % of said nucleic acid sample.
5. The method of claim 1 wherein said second nucleic acid sample comprises at least 50 % of said nucleic acid sample.
6. The method of claim 1 wherein each of said non-identical sequences differs from the other non-identical sequences by at least 5 nucleic acid bases.
7. The method of claim 1 wherein each of said non-identical sequences differs from the other non-identical sequences by at least 10 nucleic acid bases.
8. The method of claim 1 wherein each of said non-identical sequences differs from the other non-identical sequences by at least 50 nucleic acid bases.
9. The method of claim 1 wherein each of said non-identical sequences differs from the other non-identical sequences by at least 1000 nucleic acid bases.
10. The method of claim 1 wherein said NA sample is DNA.
11. The method of claim 1 wherein said NA sample is genomic DNA.
12. The method of claim 1 wherein said first nucleic acid sample is cDNA derived from RNA or mRNA.
13. The method of claim 1 further comprising the step of amplifying at least one of the non-identical sequences in said second nucleic acid sample.
14. The method of claim 13 wherein said step of amplifying is performed by a polymerase chain reaction (PCR).
15. The method of claim 1 wherein the entire method is performed in a single reaction vessel.
16. The method of claim 1 wherein said step of fragmenting the first nucleic acid sample comprises digestion with at least one restriction enzyme.
17. The method of claim 1 wherein said step of fragmenting the first nucleic acid sample comprises digestion with a type IIs endonuclease.
18. The method of claim 1 wherein said adaptor sequences comprise PCR primer template sequences.
19. The method of claim 1 wherein said adaptor sequences comprise tag sequences.
20. The method of claim 1 wherein said solid support is a magnetic bead.
21. The method of claim 1 wherein said mismatch binding protein is bound to a magnetic bead.
22. The method of claim 1 wherein said method for analyzing a nucleic acid sample comprises determining whether the nucleic acid sample contains sequence variations.
23. The method of claim 22 wherein said sequence variations are single nucleotide polymorphisms.
24. The method of claim 1 wherein the step of obtaining a DNA array comprises:
designing a DNA array to query DNA fragments which have been produced by the identical procedures used to obtain said second nucleic acid sample.
25. The method of claim 24 wherein the step of designing further requires predetermining the sequences contained in said second nucleic acid sample.
26. The method of claim wherein said step of predetermining the sequences contained in said second nucleic acid sample is conducted in a computer system.
27. The method of claim 23 wherein said second nucleic acid sample is obtainable by:
binding oligonucleotide probes containing a desired SNP sequence to magnetic beads to form probe-bead complexes; and hybridizing said probe-bead complexes to said DNA sample;
exposing said hybridized DNA sample to a single strand DNA nuclease to remove single stranded DNA thereby forming a DNA duplex;
ligating a double stranded adaptor sequence comprising a restriction enzyme site to said DNA duplex;
digesting said DNA duplex with a restriction enzyme to release the magnetic bead; and isolating only those fragments containing said SNP sequence.
28. The method of claim 25 wherein said restriction enzyme is a Class IIs endonuclease.
29. The method of claim 23 wherein said second nucleic acid sample is obtainable by:
exposing the DNA sample to a mismatch binding protein;
employing a 3' to 5' exonuclease to remove single stranded DNA; and employing a nuclease to remove single stranded DNA.
30. A method of screening for DNA sequence variations in an individual comprising:
providing said first nucleic acid sample from said individual;
providing a second nucleic acid sample by reproducibly reducing the complexity of said first nucleic acid sample to produce a second nucleic acid sample which may comprise a plurality of non-identical sequences whereby said second nucleic acid sample is obtainable by:
fragmenting said first nucleic acid sample to produce fragments and ligating adaptor sequences to said fragments;
fragmenting said first nucleic acid sample to produce fragments, denaturing said fragments, allowing some of said fragments to reanneal to form double stranded DNA sequences and removing said double stranded DNA sequences;
amplification by arbitrarily primed PCR;
hybridizing said first nucleic acid sample to an oligonucleotide probe bound to a solid support;
hybridizing said first nucleic acid sequence to a mismatch binding protein;
providing a nucleic acid array;
hybridizing said second nucleic acid sample to said array; and analyzing a hybridization pattern resulting from said hybridization.
31. The method of claim 30 wherein said sequence variation is a SNP.
32. The method of claim 31 wherein said SNP is associated with a disease.
33. The method of claim 31 wherein said SNP is associated with the efficacy of a drug.
34. A method of screening for DNA sequence variations in a population of individuals comprising:
providing a first nucleic acid sample from each of said individuals;
providing a second nucleic acid sample by reproducibly reducing the complexity of said first nucleic acid sample to produce a second nucleic acid sample which may comprise a plurality of non-identical sequences whereby said second nucleic acid sample is obtainable by:
fragmenting said first nucleic acid sample to produce fragments and ligating adaptor sequences to said fragments;
fragmenting said first nucleic acid sample to produce fragments, denaturing said fragments, allowing some of said fragments to reanneal to form double stranded DNA sequences and removing said double stranded DNA sequences;
amplification by arbitrarily primed PCR;
hybridizing said first nucleic acid sample to an oligonucleotide probe bound to a solid support;
hybridizing said first nucleic acid sequence to a mismatch binding protein;
providing a nucleic acid array;
hybridizing said second nucleic acid sample to said array; and analyzing a hybridization pattern resulting from said hybridization.
35. The method of claim 34 further comprising the step of compiling the analyses of each individual's hybridization pattern.
36. The method of claim 34 wherein said sequence variation is a SNP.
37. In a computer system, a method of designing an array comprising:
modeling specific enzymatic reactions between a known nucleic acid sequence and an enzyme;
obtaining the results of said modeled enzymatic reactions;
obtaining probe sequences based upon said results; and
designing an array to contain said probe sequences.
38. A method of analyzing a plurality of nucleic acid samples, comprising:
treating a first nucleic acid sample according to a defined procedure that produces a first population of fragments, the collective sequences of the fragments comprising a subset of the collective sequences present in the first nucleic acid sample;
determining abundance or composition of a subset of the first population of fragments;
treating a second nucleic acid sample according to the defined procedure to produce a second population of fragments containing corresponding fragments to the fragments in the first population;
determining abundance or composition of a subset of fragments in the second population having sequences corresponding to the subset of fragments in the first population.
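Claim 37 recites modeling an enzymatic reaction on a known sequence and deriving probe sequences from the modeled results. As an illustrative sketch only, and not the implementation disclosed in this application, the Python below models a complete restriction digest of a linear sequence, keeps the fragments that fall within a size window, and tiles fixed-length probes across them. The function names (`digest`, `select_by_size`, `tile_probes`, `design_array`), the size window, the probe length, and the tiling step are all assumptions introduced for illustration. The EcoRI site (G^AATTC) appears only as a familiar example enzyme.

```python
from typing import List


def digest(sequence: str, site: str, cut_offset: int) -> List[str]:
    """Model a complete digest of a linear sequence.

    `site` is the recognition sequence and `cut_offset` is the number of
    bases from the start of the site to the cut on the top strand
    (for example 1 for EcoRI, G^AATTC). Only the top strand is modeled.
    """
    sequence = sequence.upper()
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])
    # Drop empty pieces that arise when a cut falls at the very start or end.
    return [f for f in fragments if f]


def select_by_size(fragments: List[str], min_len: int, max_len: int) -> List[str]:
    """Keep fragments whose length lies in [min_len, max_len] (hypothetical window)."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


def tile_probes(fragment: str, probe_len: int, step: int) -> List[str]:
    """Tile probes of length `probe_len` across a fragment every `step` bases."""
    last_start = len(fragment) - probe_len
    return [fragment[i:i + probe_len] for i in range(0, last_start + 1, step)]


def design_array(sequence: str, site: str, cut_offset: int,
                 min_len: int, max_len: int,
                 probe_len: int, step: int) -> List[str]:
    """Return a de-duplicated, ordered probe list from the modeled digest."""
    probes: List[str] = []
    seen = set()
    selected = select_by_size(digest(sequence, site, cut_offset), min_len, max_len)
    for fragment in selected:
        for probe in tile_probes(fragment, probe_len, step):
            if probe not in seen:
                seen.add(probe)
                probes.append(probe)
    return probes


if __name__ == "__main__":
    toy_sequence = "AAAGAATTCCCCCGAATTCTT"  # made-up sequence, not from the source
    for probe in design_array(toy_sequence, "GAATTC", 1, 7, 10, 4, 3):
        print(probe)
```

A real design would also need to account for both strands, incomplete digestion, and probe cross-hybridization, none of which this sketch attempts.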
Applications Claiming Priority (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10586798P | 1998-10-27 | 1998-10-27 | |
US60/105,867 | 1998-10-27 | ||
US13612599P | 1999-05-26 | 1999-05-26 | |
US60/136,125 | 1999-05-26 | ||
PCT/US1999/025200 WO2000024939A1 (en) | 1998-10-27 | 1999-10-27 | Complexity management and analysis of genomic dna |
Publications (1)
Publication Number | Publication Date |
---|---|
CA2345441A1 (en) | 2000-05-04 |
Family ID=26803034
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CA002345441A Abandoned CA2345441A1 (en) | 1998-10-27 | 1999-10-27 | Complexity management and analysis of genomic dna |
Country Status (8)
Country | Link |
---|---|
US (3) | US6361947B1 (en) |
EP (1) | EP1124990B1 (en) |
JP (1) | JP2002528096A (en) |
AT (1) | ATE316152T1 (en) |
AU (1) | AU2144000A (en) |
CA (1) | CA2345441A1 (en) |
DE (1) | DE69929542T2 (en) |
WO (1) | WO2000024939A1 (en) |
Families Citing this family (181)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6974666B1 (en) * | 1994-10-21 | 2005-12-13 | Affymetrix, Inc. | Methods of enzymatic discrimination enhancement and surface-bound double-stranded DNA |
EP1032705B1 (en) | 1997-10-30 | 2011-12-14 | Cold Spring Harbor Laboratory | Probe arrays and methods of using probe arrays for distinguishing dna |
US6703228B1 (en) * | 1998-09-25 | 2004-03-09 | Massachusetts Institute Of Technology | Methods and products related to genotyping and DNA analysis |
EP1157131A2 (en) * | 1999-02-22 | 2001-11-28 | Lynx Therapeutics, Inc. | Polymorphic dna fragments and uses thereof |
US20060275782A1 (en) | 1999-04-20 | 2006-12-07 | Illumina, Inc. | Detection of nucleic acid reactions on bead arrays |
US20020119448A1 (en) * | 1999-06-23 | 2002-08-29 | Joseph A. Sorge | Methods of enriching for and identifying polymorphisms |
US6958225B2 (en) | 1999-10-27 | 2005-10-25 | Affymetrix, Inc. | Complexity management of genomic DNA |
WO2001051663A2 (en) | 2000-01-11 | 2001-07-19 | Maxygen, Inc. | Integrated systems and methods for diversity generation and screening |
CA2407731A1 (en) * | 2000-05-02 | 2001-11-08 | Centre National De La Recherche Scientifique | Identification of genetic markers |
US20020164634A1 (en) * | 2000-08-26 | 2002-11-07 | Perlegen Sciences, Inc. | Methods for reducing complexity of nucleic acid samples |
US20020137043A1 (en) * | 2000-08-26 | 2002-09-26 | Nila Patil | Method for reducing complexity of nucleic acid samples |
US20020055112A1 (en) * | 2000-08-26 | 2002-05-09 | Nila Patil | Methods for reducing complexity of nucleic acid samples |
AR031640A1 (en) * | 2000-12-08 | 2003-09-24 | Applied Research Systems | ISOTHERMAL AMPLIFICATION OF NUCLEIC ACIDS IN A SOLID SUPPORT |
WO2002061145A2 (en) * | 2001-01-31 | 2002-08-08 | Ambion, Inc. | Competitive amplification of fractionated targets from multiple nucleic acid samples |
WO2002090599A1 (en) * | 2001-05-09 | 2002-11-14 | Genetic Id, Inc. | Universal microarray system |
US20030009294A1 (en) * | 2001-06-07 | 2003-01-09 | Jill Cheng | Integrated system for gene expression analysis |
US6872529B2 (en) * | 2001-07-25 | 2005-03-29 | Affymetrix, Inc. | Complexity management of genomic DNA |
US7297778B2 (en) * | 2001-07-25 | 2007-11-20 | Affymetrix, Inc. | Complexity management of genomic DNA |
AU2002357249A1 (en) * | 2001-12-13 | 2003-07-09 | Blue Heron Biotechnology, Inc. | Methods for removal of double-stranded oligonucleotides containing sequence errors using mismatch recognition proteins |
US20030157700A1 (en) * | 2001-12-19 | 2003-08-21 | Affymetrix, Inc. | Apparatus and methods for constructing array plates |
DE10208333A1 (en) * | 2002-02-27 | 2003-09-04 | Axaron Bioscience Ag | Analysis of nucleic acid fragment mixtures |
CA2478985A1 (en) | 2002-03-13 | 2003-09-25 | Syngenta Participations Ag | Nucleic acid detection method |
EP1362929A3 (en) * | 2002-05-17 | 2004-05-19 | Affymetrix, Inc. | Methods for genotyping |
US7097976B2 (en) | 2002-06-17 | 2006-08-29 | Affymetrix, Inc. | Methods of analysis of allelic imbalance |
US9388459B2 (en) | 2002-06-17 | 2016-07-12 | Affymetrix, Inc. | Methods for genotyping |
US7108976B2 (en) * | 2002-06-17 | 2006-09-19 | Affymetrix, Inc. | Complexity management of genomic DNA by locus specific amplification |
US20040072217A1 (en) * | 2002-06-17 | 2004-04-15 | Affymetrix, Inc. | Methods of analysis of linkage disequilibrium |
US7504215B2 (en) | 2002-07-12 | 2009-03-17 | Affymetrix, Inc. | Nucleic acid labeling methods |
JP4471927B2 (en) * | 2002-09-30 | 2010-06-02 | ニンブルゲン システムズ インコーポレイテッド | Array parallel loading method |
CA2500783C (en) * | 2002-10-01 | 2012-07-17 | Nimblegen Systems, Inc. | Microarrays having multiple oligonucleotides in single array features |
US7459273B2 (en) * | 2002-10-04 | 2008-12-02 | Affymetrix, Inc. | Methods for genotyping selected polymorphism |
DE10246824A1 (en) * | 2002-10-08 | 2004-04-22 | Axaron Bioscience Ag | Analyzing nucleic acid mixture by hybridization to an array, useful e.g. for expression analysis, using sample mixture of labeled restriction fragments of uniform size |
EP1580269B1 (en) * | 2002-11-07 | 2008-07-02 | Yoichi Matsubara | Method of detecting gene mutation |
US20060133957A1 (en) * | 2003-01-17 | 2006-06-22 | Knapp Merrill A | Device and method for fragmenting material by hydrodynamic shear |
US7833706B2 (en) * | 2003-01-30 | 2010-11-16 | Celera Corporation | Genetic polymorphisms associated with rheumatoid arthritis, methods of detection and uses thereof |
US20040259125A1 (en) * | 2003-02-26 | 2004-12-23 | Omni Genetics, Inc. | Methods, systems and apparatus for identifying genetic differences in disease and drug response |
US20090124514A1 (en) * | 2003-02-26 | 2009-05-14 | Perlegen Sciences, Inc. | Selection probe amplification |
US20060183132A1 (en) * | 2005-02-14 | 2006-08-17 | Perlegen Sciences, Inc. | Selection probe amplification |
US7625699B2 (en) * | 2003-03-10 | 2009-12-01 | Celera Corporation | Genetic polymorphisms associated with coronary stenosis, methods of detection and uses thereof |
WO2004081187A2 (en) * | 2003-03-10 | 2004-09-23 | Applera Corporation | Genetic polymorphisms associated with myocardial infarction, methods of detection and uses thereof |
WO2004083403A2 (en) * | 2003-03-18 | 2004-09-30 | Applera Corporation | Genetic polymorphisms associated with rheumatoid arthritis, methods of detection and uses thereof |
US20060134638A1 (en) * | 2003-04-02 | 2006-06-22 | Blue Heron Biotechnology, Inc. | Error reduction in automated gene synthesis |
RU2390561C2 (en) | 2003-05-23 | 2010-05-27 | Колд Спринг Харбор Лэборетери | Virtual sets of fragments of nucleotide sequences |
WO2005000098A2 (en) | 2003-06-10 | 2005-01-06 | The Trustees Of Boston University | Detection methods for disorders of the lung |
US20040259100A1 (en) * | 2003-06-20 | 2004-12-23 | Illumina, Inc. | Methods and compositions for whole genome amplification and genotyping |
WO2005003304A2 (en) * | 2003-06-20 | 2005-01-13 | Illumina, Inc. | Methods and compositions for whole genome amplification and genotyping |
US20050181394A1 (en) * | 2003-06-20 | 2005-08-18 | Illumina, Inc. | Methods and compositions for whole genome amplification and genotyping |
US20050032102A1 (en) * | 2003-07-22 | 2005-02-10 | Affymetrix, Inc. | Mapping genomic rearrangements |
US8114978B2 (en) | 2003-08-05 | 2012-02-14 | Affymetrix, Inc. | Methods for genotyping selected polymorphism |
US20050100911A1 (en) * | 2003-08-06 | 2005-05-12 | Perlegen Sciences, Inc. | Methods for enriching populations of nucleic acid samples |
EP1675682B1 (en) * | 2003-10-24 | 2017-07-19 | Aushon Biosystems, Inc. | Apparatus and method for dispensing fluid, semi-solid and solid samples |
WO2005054516A2 (en) | 2003-11-26 | 2005-06-16 | Advandx, Inc. | Peptide nucleic acid probes for analysis of certain staphylococcus species |
US20050233354A1 (en) * | 2004-01-22 | 2005-10-20 | Affymetrix, Inc. | Genotyping degraded or mitochandrial DNA samples |
EP1564306B1 (en) | 2004-02-17 | 2013-08-07 | Affymetrix, Inc. | Methods for fragmenting and labeling DNA |
SE0401270D0 (en) * | 2004-05-18 | 2004-05-18 | Fredrik Dahl | Method for amplifying specific nucleic acids in parallel |
EP2290071B1 (en) | 2004-05-28 | 2014-12-31 | Asuragen, Inc. | Methods and compositions involving microRNA |
EP1623996A1 (en) * | 2004-08-06 | 2006-02-08 | Deutsches Krebsforschungszentrum Stiftung des öffentlichen Rechts | Improved method of selecting a desired protein from a library |
US20060073506A1 (en) | 2004-09-17 | 2006-04-06 | Affymetrix, Inc. | Methods for identifying biological samples |
EP1645640B1 (en) | 2004-10-05 | 2013-08-21 | Affymetrix, Inc. | Method for detecting chromosomal translocations |
JP2006126204A (en) | 2004-10-29 | 2006-05-18 | Affymetrix Inc | Automated method for manufacturing polymer array |
US7682782B2 (en) | 2004-10-29 | 2010-03-23 | Affymetrix, Inc. | System, method, and product for multiple wavelength detection using single source excitation |
EP2808389A1 (en) | 2004-11-12 | 2014-12-03 | Asuragen, Inc. | Methods and compositions involving MIRNA and MIRNA inhibitor molecules |
US20060166224A1 (en) * | 2005-01-24 | 2006-07-27 | Norviel Vernon A | Associations using genotypes and phenotypes |
JP2008537120A (en) | 2005-04-14 | 2008-09-11 | ザ トラスティーズ オブ ボストン ユニバーシティ | Diagnosis of lung injury using classification prediction |
US20060286571A1 (en) | 2005-04-28 | 2006-12-21 | Prometheus Laboratories, Inc. | Methods of predicting methotrexate efficacy and toxicity |
US7452671B2 (en) * | 2005-04-29 | 2008-11-18 | Affymetrix, Inc. | Methods for genotyping with selective adaptor ligation |
EP2292788B1 (en) | 2005-06-23 | 2012-05-09 | Keygene N.V. | Strategies for high throughput identification and detection of polymorphisms |
ES2357549T3 (en) * | 2005-06-23 | 2011-04-27 | Keygene N.V. | STRATEGIES FOR THE IDENTIFICATION AND DETECTION OF HIGH PERFORMANCE OF POLYMORPHISMS. |
EP1907577A4 (en) * | 2005-06-30 | 2009-05-13 | Syngenta Participations Ag | METHODS FOR SCREENING FOR GENE SPECIFIC HYBRIDIZATION POLYMORPHISMS (GSHPs) AND THEIR USE IN GENETIC MAPPING AND MARKER DEVELOPMENT |
US20070003938A1 (en) * | 2005-06-30 | 2007-01-04 | Perlegen Sciences, Inc. | Hybridization of genomic nucleic acid without complexity reduction |
GB0514910D0 (en) | 2005-07-20 | 2005-08-24 | Solexa Ltd | Method for sequencing a polynucleotide template |
GB0514935D0 (en) | 2005-07-20 | 2005-08-24 | Solexa Ltd | Methods for sequencing a polynucleotide template |
ATE453728T1 (en) | 2005-09-29 | 2010-01-15 | Keygene Nv | HIGH-THROUGHPUT SCREENING OF MUTAGENIZED POPULATIONS |
US10316364B2 (en) | 2005-09-29 | 2019-06-11 | Keygene N.V. | Method for identifying the source of an amplicon |
GB0522310D0 (en) | 2005-11-01 | 2005-12-07 | Solexa Ltd | Methods of preparing libraries of template polynucleotides |
GB0524069D0 (en) * | 2005-11-25 | 2006-01-04 | Solexa Ltd | Preparation of templates for solid phase amplification |
US7634363B2 (en) | 2005-12-07 | 2009-12-15 | Affymetrix, Inc. | Methods for high throughput genotyping |
US11306351B2 (en) | 2005-12-21 | 2022-04-19 | Affymetrix, Inc. | Methods for genotyping |
EP3404114B1 (en) * | 2005-12-22 | 2021-05-05 | Keygene N.V. | Method for high-throughput aflp-based polymorphism detection |
CA2641851A1 (en) | 2006-02-08 | 2007-08-16 | Eric Hans Vermaas | Method for sequencing a polynucleotide template |
EP1999472A2 (en) | 2006-03-09 | 2008-12-10 | The Trustees Of Boston University | Diagnostic and prognostic methods for lung disorders using gene expression profiles from nose epithelial cells |
US7951583B2 (en) * | 2006-03-10 | 2011-05-31 | Plc Diagnostics, Inc. | Optical scanning system |
US9976192B2 (en) | 2006-03-10 | 2018-05-22 | Ldip, Llc | Waveguide-based detection system with scanning light source |
US8288157B2 (en) * | 2007-09-12 | 2012-10-16 | Plc Diagnostics, Inc. | Waveguide-based optical scanning systems |
US9423397B2 (en) | 2006-03-10 | 2016-08-23 | Indx Lifecare, Inc. | Waveguide-based detection system with scanning light source |
US9528939B2 (en) | 2006-03-10 | 2016-12-27 | Indx Lifecare, Inc. | Waveguide-based optical scanning systems |
WO2007107710A1 (en) * | 2006-03-17 | 2007-09-27 | Solexa Limited | Isothermal methods for creating clonal single molecule arrays |
ES2645661T3 (en) | 2006-04-04 | 2017-12-07 | Keygene N.V. | High performance detection of molecular markers based on restriction fragments |
EP2049682A2 (en) * | 2006-07-31 | 2009-04-22 | Illumina Cambridge Limited | Method of library preparation avoiding the formation of adaptor dimers |
JP5244103B2 (en) | 2006-08-09 | 2013-07-24 | ホームステッド クリニカル コーポレイション | Organ-specific protein and method of use thereof |
US7754429B2 (en) | 2006-10-06 | 2010-07-13 | Illumina Cambridge Limited | Method for pair-wise sequencing a plurity of target polynucleotides |
US9845494B2 (en) | 2006-10-18 | 2017-12-19 | Affymetrix, Inc. | Enzymatic methods for genotyping on arrays |
AU2007325931A1 (en) | 2006-11-02 | 2008-06-05 | Yale University | Assessment of oocyte competence |
US8293684B2 (en) * | 2006-11-29 | 2012-10-23 | Exiqon | Locked nucleic acid reagents for labelling nucleic acids |
EP2121983A2 (en) * | 2007-02-02 | 2009-11-25 | Illumina Cambridge Limited | Methods for indexing samples and sequencing multiple nucleotide templates |
CN101743326A (en) * | 2007-05-14 | 2010-06-16 | 因赛特遗传学公司 | Methods of screening nucleic acids for single nucleotide variations |
US8200440B2 (en) * | 2007-05-18 | 2012-06-12 | Affymetrix, Inc. | System, method, and computer software product for genotype determination using probe array data |
US20080293589A1 (en) * | 2007-05-24 | 2008-11-27 | Affymetrix, Inc. | Multiplex locus specific amplification |
WO2009014848A2 (en) * | 2007-06-25 | 2009-01-29 | Affymetrix, Inc. | Patterned microcodes |
WO2009012984A1 (en) * | 2007-07-26 | 2009-01-29 | Roche Diagnostics Gmbh | Target preparation for parallel sequencing of complex genomes |
WO2009032167A1 (en) * | 2007-08-29 | 2009-03-12 | Illumina Cambridge | Method for sequencing a polynucleotide template |
US9388457B2 (en) | 2007-09-14 | 2016-07-12 | Affymetrix, Inc. | Locus specific amplification using array probes |
US8716190B2 (en) | 2007-09-14 | 2014-05-06 | Affymetrix, Inc. | Amplification and analysis of selected targets on solid supports |
EP2198050A1 (en) | 2007-09-14 | 2010-06-23 | Asuragen, INC. | Micrornas differentially expressed in cervical cancer and uses thereof |
US8124336B2 (en) * | 2007-09-26 | 2012-02-28 | Population Genetics Technologies Ltd | Methods and compositions for reducing the complexity of a nucleic acid sample |
US12060554B2 (en) | 2008-03-10 | 2024-08-13 | Illumina, Inc. | Method for selecting and amplifying polynucleotides |
US9074244B2 (en) | 2008-03-11 | 2015-07-07 | Affymetrix, Inc. | Array-based translocation and rearrangement assays |
US9012370B2 (en) | 2008-03-11 | 2015-04-21 | National Cancer Center | Method for measuring chromosome, gene or specific nucleotide sequence copy numbers using SNP array |
EP2285960B1 (en) | 2008-05-08 | 2015-07-08 | Asuragen, INC. | Compositions and methods related to mir-184 modulation of neovascularization or angiogenesis |
GB2461026B (en) * | 2008-06-16 | 2011-03-09 | Plc Diagnostics Inc | System and method for nucleic acids sequencing by phased synthesis |
US8309306B2 (en) * | 2008-11-12 | 2012-11-13 | Nodality, Inc. | Detection composition |
ES2403312T3 (en) | 2009-01-13 | 2013-05-17 | Keygene N.V. | New strategies for genome sequencing |
WO2010093465A1 (en) | 2009-02-11 | 2010-08-19 | Caris Mpi, Inc. | Molecular profiling of tumors |
ES2638779T3 (en) | 2009-03-16 | 2017-10-24 | Pangu Biopharma Limited | Compositions and procedures comprising variants of histidyl tarn synthetase splicing that have non-canonical biological activities |
CA2757289A1 (en) | 2009-03-31 | 2010-10-21 | Atyr Pharma, Inc. | Compositions and methods comprising aspartyl-trna synthetases having non-canonical biological activities |
EP2425286B1 (en) * | 2009-04-29 | 2020-06-24 | Ldip, Llc | Waveguide-based detection system with scanning light source |
EP2248914A1 (en) * | 2009-05-05 | 2010-11-10 | Max-Planck-Gesellschaft zur Förderung der Wissenschaften e.V. | The use of class IIB restriction endonucleases in 2nd generation sequencing applications |
GB0912909D0 (en) * | 2009-07-23 | 2009-08-26 | Olink Genomics Ab | Probes for specific analysis of nucleic acids |
US8445201B2 (en) * | 2009-07-31 | 2013-05-21 | Affymetrix, Inc. | Hybridization device, methods, and system using mixing beads |
US20110059453A1 (en) * | 2009-08-23 | 2011-03-10 | Affymetrix, Inc. | Poly(A) Tail Length Measurement by PCR |
CN102597256B (en) | 2009-08-25 | 2014-12-03 | 伊鲁米那股份有限公司 | Methods for selecting and amplifying polynucleotides |
CN102858995B (en) | 2009-09-10 | 2016-10-26 | 森特瑞隆技术控股公司 | Targeting sequence measurement |
US10174368B2 (en) | 2009-09-10 | 2019-01-08 | Centrillion Technology Holdings Corporation | Methods and systems for sequencing long nucleic acids |
US20160186266A1 (en) | 2009-10-27 | 2016-06-30 | Carislife Sciences, Inc. | Molecular profiling for personalized medicine |
US8501122B2 (en) | 2009-12-08 | 2013-08-06 | Affymetrix, Inc. | Manufacturing and processing polymer arrays |
WO2011071382A1 (en) | 2009-12-10 | 2011-06-16 | Keygene N.V. | Polymorfphic whole genome profiling |
JP5799484B2 (en) | 2009-12-14 | 2015-10-28 | トヨタ自動車株式会社 | Probe design method in DNA microarray, DNA microarray having probe designed by the method |
US8835358B2 (en) | 2009-12-15 | 2014-09-16 | Cellular Research, Inc. | Digital counting of individual molecules by stochastic attachment of diverse labels |
CN102656279A (en) | 2009-12-17 | 2012-09-05 | 凯津公司 | Restriction enzyme based whole genome sequencing |
GB0922377D0 (en) | 2009-12-22 | 2010-02-03 | Arab Gulf University The | Mutant LDL receptor |
CA2797093C (en) | 2010-04-26 | 2019-10-29 | Atyr Pharma, Inc. | Innovative discovery of therapeutic, diagnostic, and antibody compositions related to protein fragments of cysteinyl-trna synthetase |
US8961960B2 (en) | 2010-04-27 | 2015-02-24 | Atyr Pharma, Inc. | Innovative discovery of therapeutic, diagnostic, and antibody compositions related to protein fragments of isoleucyl tRNA synthetases |
US8993723B2 (en) | 2010-04-28 | 2015-03-31 | Atyr Pharma, Inc. | Innovative discovery of therapeutic, diagnostic, and antibody compositions related to protein fragments of alanyl-tRNA synthetases |
WO2011150279A2 (en) | 2010-05-27 | 2011-12-01 | Atyr Pharma, Inc. | Innovative discovery of therapeutic, diagnostic, and antibody compositions related to protein fragments of glutaminyl-trna synthetases |
EP2563912B1 (en) | 2010-04-29 | 2018-09-05 | aTyr Pharma, Inc. | Innovative discovery of therapeutic, diagnostic, and antibody compositions related to protein fragments of asparaginyl trna synthetases |
US9034320B2 (en) | 2010-04-29 | 2015-05-19 | Atyr Pharma, Inc. | Innovative discovery of therapeutic, diagnostic, and antibody compositions related to protein fragments of Valyl-tRNA synthetases |
US8961961B2 (en) | 2010-05-03 | 2015-02-24 | a Tyr Pharma, Inc. | Innovative discovery of therapeutic, diagnostic, and antibody compositions related protein fragments of arginyl-tRNA synthetases |
WO2011140135A2 (en) | 2010-05-03 | 2011-11-10 | Atyr Pharma, Inc. | Innovative discovery of therapeutic, diagnostic, and antibody compositions related to protein fragments of methionyl-trna synthetases |
US9034321B2 (en) | 2010-05-03 | 2015-05-19 | Atyr Pharma, Inc. | Innovative discovery of therapeutic, diagnostic, and antibody compositions related to protein fragments of phenylalanyl-alpha-tRNA synthetases |
JP6008844B2 (en) | 2010-05-04 | 2016-10-19 | エータイアー ファーマ, インコーポレイテッド | Innovative discovery of therapeutic, diagnostic and antibody compositions related to protein fragments of the p38 MULTI-tRNA synthetase complex |
AU2011252990B2 (en) | 2010-05-14 | 2017-04-20 | Pangu Biopharma Limited | Innovative discovery of therapeutic, diagnostic, and antibody compositions related to protein fragments of phenylalanyl-beta-tRNA synthetases |
AU2011261486B2 (en) | 2010-06-01 | 2017-02-23 | Pangu Biopharma Limited | Innovative discovery of therapeutic, diagnostic, and antibody compositions related to protein fragments of lysyl-tRNA synthetases |
CA2804416C (en) | 2010-07-12 | 2020-04-28 | Atyr Pharma, Inc. | Innovative discovery of therapeutic, diagnostic, and antibody compositions related to protein fragments of glycyl-trna synthetases |
WO2012027611A2 (en) | 2010-08-25 | 2012-03-01 | Atyr Pharma, Inc. | INNOVATIVE DISCOVERY OF THERAPEUTIC, DIAGNOSTIC, AND ANTIBODY COMPOSITIONS RELATED TO PROTEIN FRAGMENTS OF TYROSYL-tRNA SYNTHETASES |
US9518289B2 (en) * | 2010-09-20 | 2016-12-13 | Seegene, Inc. | Detection of target nucleic acid sequences by exonucleolytic activity using single-labeled immobilized probes on solid phase |
KR20140040697A (en) | 2011-01-14 | 2014-04-03 | 키진 엔.브이. | Paired end random sequence based genotyping |
CN103384832B (en) | 2011-02-24 | 2016-06-29 | 希尔氏宠物营养品公司 | For diagnosing and treat compositions and the method for renal dysfunction in felid |
US20120252682A1 (en) | 2011-04-01 | 2012-10-04 | Maples Corporate Services Limited | Methods and systems for sequencing nucleic acids |
AU2012271528B2 (en) | 2011-06-15 | 2015-06-25 | Hill's Pet Nutrition, Inc. | Compositions and methods for diagnosing and monitoring hyperthyroidism in a feline |
CA2848304A1 (en) | 2011-09-09 | 2013-03-14 | The Board Of Trustees Of The Leland Stanford Junior University | Methods for sequencing a polynucleotide |
US9644241B2 (en) | 2011-09-13 | 2017-05-09 | Interpace Diagnostics, Llc | Methods and compositions involving miR-135B for distinguishing pancreatic cancer from benign pancreatic disease |
JP6189857B2 (en) | 2011-12-19 | 2017-08-30 | ヒルズ・ペット・ニュートリシャン・インコーポレーテッド | Compositions and methods for diagnosing and treating hyperthyroidism in companion animals |
EP2802666B1 (en) | 2012-01-13 | 2018-09-19 | Data2Bio | Genotyping by next-generation sequencing |
MX356107B (en) | 2012-02-16 | 2018-05-15 | Atyr Pharma Inc | Histidyl-trna synthetases for treating autoimmune and inflammatory diseases. |
ES2663234T3 (en) | 2012-02-27 | 2018-04-11 | Cellular Research, Inc | Compositions and kits for molecular counting |
US11177020B2 (en) | 2012-02-27 | 2021-11-16 | The University Of North Carolina At Chapel Hill | Methods and uses for molecular tags |
DK2828218T3 (en) | 2012-03-20 | 2020-11-02 | Univ Washington Through Its Center For Commercialization | METHODS OF LOWERING THE ERROR RATE OF MASSIVELY PARALLEL DNA SEQUENCING USING DUPLEX CONSENSUS SEQUENCING |
WO2014071070A1 (en) | 2012-11-01 | 2014-05-08 | Pacific Biosciences Of California, Inc. | Compositions and methods for selection of nucleic acids |
WO2014085434A1 (en) | 2012-11-27 | 2014-06-05 | Pontificia Universidad Catolica De Chile | Compositions and methods for diagnosing thyroid tumors |
EP2935585B1 (en) * | 2012-12-21 | 2019-04-10 | Nanyang Technological University | Site-specific induction of bimolecular quadruplex-duplex hybrids and methods of using the same |
EP3037532A4 (en) * | 2013-08-21 | 2017-05-03 | Fujirebio Inc. | Method for measuring modified nucleobase using solid phase probe, and kit for same |
US10018566B2 (en) | 2014-02-28 | 2018-07-10 | Ldip, Llc | Partially encapsulated waveguide based sensing chips, systems and methods of use |
US10435685B2 (en) | 2014-08-19 | 2019-10-08 | Pacific Biosciences Of California, Inc. | Compositions and methods for enrichment of nucleic acids |
EP3183367B1 (en) | 2014-08-19 | 2019-06-26 | Pacific Biosciences Of California, Inc. | Compositions and methods for enrichment of nucleic acids |
WO2016138427A1 (en) | 2015-02-27 | 2016-09-01 | Indx Lifecare, Inc. | Waveguide-based detection system with scanning light source |
EP3362580B1 (en) | 2015-10-18 | 2021-02-17 | Affymetrix, Inc. | Multiallelic genotyping of single nucleotide polymorphisms and indels |
US11332784B2 (en) | 2015-12-08 | 2022-05-17 | Twinstrand Biosciences, Inc. | Adapters, methods, and compositions for duplex sequencing |
SG11202003885UA (en) | 2017-11-08 | 2020-05-28 | Twinstrand Biosciences Inc | Reagents and adapters for nucleic acid sequencing and methods for making such reagents and adapters |
AU2019300172A1 (en) | 2018-07-12 | 2021-01-28 | Twinstrand Biosciences, Inc. | Methods and reagents for characterizing genomic editing, clonal expansion, and associated applications |
WO2020109412A1 (en) | 2018-11-28 | 2020-06-04 | Keygene N.V. | Targeted enrichment by endonuclease protection |
JP7462632B2 (en) | 2018-11-30 | 2024-04-05 | カリス エムピーアイ インコーポレイテッド | Next-generation molecular profiling |
CN113474466A (en) | 2019-02-21 | 2021-10-01 | 主基因有限公司 | Polyploid genotyping |
CN114051537A (en) | 2019-04-17 | 2022-02-15 | 艾基诺米公司 | Improved method for early diagnosis of uterine leiomyomas and leiomyosarcoma |
WO2021112918A1 (en) | 2019-12-02 | 2021-06-10 | Caris Mpi, Inc. | Pan-cancer platinum response predictor |
WO2021116371A1 (en) | 2019-12-12 | 2021-06-17 | Keygene N.V. | Semi-solid state nucleic acid manipulation |
CA3161280A1 (en) | 2019-12-20 | 2021-06-24 | Rene Cornelis Josephus Hogers | Next-generation sequencing library preparation using covalently closed nucleic acid molecule ends |
WO2022074058A1 (en) | 2020-10-06 | 2022-04-14 | Keygene N.V. | Targeted sequence addition |
WO2022112316A1 (en) | 2020-11-24 | 2022-06-02 | Keygene N.V. | Targeted enrichment using nanopore selective sequencing |
WO2022112394A1 (en) | 2020-11-25 | 2022-06-02 | Koninklijke Nederlandse Akademie Van Wetenschappen | Ribosomal profiling in single cells |
WO2024121354A1 (en) | 2022-12-08 | 2024-06-13 | Keygene N.V. | Duplex sequencing with covalently closed dna ends |
WO2024209000A1 (en) | 2023-04-04 | 2024-10-10 | Keygene N.V. | Linkers for duplex sequencing |
Family Cites Families (54)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4675283A (en) * | 1984-07-19 | 1987-06-23 | Massachusetts Institute Of Technology | Detection and isolation of homologous, repeated and amplified nucleic acid sequences |
US4683202A (en) | 1985-03-28 | 1987-07-28 | Cetus Corporation | Process for amplifying nucleic acid sequences |
US4683195A (en) | 1986-01-30 | 1987-07-28 | Cetus Corporation | Process for amplifying, detecting, and/or-cloning nucleic acid sequences |
EP0224126A3 (en) | 1985-11-25 | 1989-02-01 | The University of Calgary | Covalently linked complementary oligodeoxynucleotides as universal nucleic acid sequencing primer linkers |
US4800159A (en) | 1986-02-07 | 1989-01-24 | Cetus Corporation | Process for amplifying, detecting, and/or cloning nucleic acid sequences |
US5093245A (en) | 1988-01-26 | 1992-03-03 | Applied Biosystems | Labeling by simultaneous ligation and restriction |
US5027877A (en) * | 1988-04-14 | 1991-07-02 | Bridgestone Corporation | Off-the-road heavy duty pneumatic radial tire |
US6107023A (en) | 1988-06-17 | 2000-08-22 | Genelabs Technologies, Inc. | DNA amplification and subtraction techniques |
US5512439A (en) | 1988-11-21 | 1996-04-30 | Dynal As | Oligonucleotide-linked magnetic particles and uses thereof |
ATE142272T1 (en) | 1989-01-19 | 1996-09-15 | Behringwerke Ag | NUCLEIC ACID AMPLIFICATION USING A SINGLE PRIMER |
US5508178A (en) | 1989-01-19 | 1996-04-16 | Rose; Samuel | Nucleic acid amplification using single primer |
JPH04504356A (en) | 1989-01-31 | 1992-08-06 | ユニバーシティ オブ マイアミ | Microdissection and amplification of chromosomal DNA |
AU6646190A (en) | 1989-10-16 | 1991-05-16 | Genelabs Technologies, Inc. | Non-specific dna amplification |
CA2036946C (en) | 1990-04-06 | 2001-10-16 | Kenneth V. Deugau | Indexing linkers |
WO1992007095A1 (en) | 1990-10-15 | 1992-04-30 | Stratagene | Arbitrarily primed polymerase chain reaction method for fingerprinting genomes |
PT969102E (en) | 1991-09-24 | 2008-03-25 | Keygene Nv | Primers, kits and sets of restriction fragments used in selective restriction fragment amplification |
WO1993014217A1 (en) * | 1992-01-10 | 1993-07-22 | Life Technologies, Inc. | Use of predetermined nucleotides having altered base pairing characteristics in the amplification of nucleic acid molecules |
EP0675966B1 (en) | 1992-02-19 | 2004-10-06 | The Public Health Research Institute Of The City Of New York, Inc. | Novel oligonucleotide arrays and their use for sorting, isolating, sequencing, and manipulating nucleic acids |
US5750335A (en) * | 1992-04-24 | 1998-05-12 | Massachusetts Institute Of Technology | Screening for genetic variation |
US5436142A (en) | 1992-11-12 | 1995-07-25 | Cold Spring Harbor Laboratory | Methods for producing probes capable of distingushing variant genomic sequences |
US6277606B1 (en) | 1993-11-09 | 2001-08-21 | Cold Spring Harbor Laboratory | Representational approach to DNA analysis |
US5650274A (en) * | 1993-06-25 | 1997-07-22 | Hitachi, Ltd. | DNA analyzing method |
US5837832A (en) * | 1993-06-25 | 1998-11-17 | Affymetrix, Inc. | Arrays of nucleic acid probes on biological chips |
US5759922A (en) * | 1993-08-25 | 1998-06-02 | Micron Technology, Inc. | Control of etch profiles during extended overetch |
EP0730663B1 (en) * | 1993-10-26 | 2003-09-24 | Affymetrix, Inc. | Arrays of nucleic acid probes on biological chips |
US6207373B1 (en) | 1998-02-25 | 2001-03-27 | Nanogen, Inc. | Methods for determining nature of repeat units in DNA |
US6027877A (en) | 1993-11-04 | 2000-02-22 | Gene Check, Inc. | Use of immobilized mismatch binding protein for detection of mutations and polymorphisms, purification of amplified DNA samples and allele identification |
WO1995025538A1 (en) | 1994-03-18 | 1995-09-28 | The General Hospital Corporation | Cleaved amplified rflp detection methods |
US5851770A (en) | 1994-04-25 | 1998-12-22 | Variagenics, Inc. | Detection of mismatches by resolvase cleavage using a magnetic bead support |
WO1996005222A1 (en) * | 1994-08-08 | 1996-02-22 | Wisconsin Alumni Research Foundation | Purification and pharmaceutical compositions containing type g botulinum neurotoxin |
US5710000A (en) | 1994-09-16 | 1998-01-20 | Affymetrix, Inc. | Capturing sequences adjacent to Type-IIs restriction sites for genomic library mapping |
US6013445A (en) | 1996-06-06 | 2000-01-11 | Lynx Therapeutics, Inc. | Massively parallel signature sequencing by ligation of encoded adaptors |
US5565340A (en) | 1995-01-27 | 1996-10-15 | Clontech Laboratories, Inc. | Method for suppressing DNA fragment amplification during PCR |
US5707807A (en) * | 1995-03-28 | 1998-01-13 | Research Development Corporation Of Japan | Molecular indexing for expressed gene analysis |
US5972693A (en) * | 1995-10-24 | 1999-10-26 | Curagen Corporation | Apparatus for identifying, classifying, or quantifying DNA sequences in a sample without sequencing |
DE69530215T2 (en) * | 1995-12-12 | 2003-11-13 | Societe Des Produits Nestle S.A., Vevey | ice cream |
US5712127A (en) | 1996-04-29 | 1998-01-27 | Genescape Inc. | Subtractive amplification |
DE19620874A1 (en) * | 1996-05-23 | 1997-11-27 | Bmw Rolls Royce Gmbh | Fuel injection for a staged gas turbine combustor |
US5763239A (en) | 1996-06-18 | 1998-06-09 | Diversa Corporation | Production and use of normalized DNA libraries |
EP0941366A2 (en) | 1996-11-06 | 1999-09-15 | Whitehead Institute For Biomedical Research | Biallelic markers |
US6060245A (en) | 1996-12-13 | 2000-05-09 | Stratagene | Methods and adaptors for generating specific nucleic acid populations |
US6060240A (en) | 1996-12-13 | 2000-05-09 | Arcaris, Inc. | Methods for measuring relative amounts of nucleic acids in a complex mixture and retrieval of specific sequences therefrom |
WO1998030721A1 (en) | 1997-01-10 | 1998-07-16 | Pioneer Hi-Bred International, Inc. | Hybridization-based genetic amplification and analysis |
US6027945A (en) | 1997-01-21 | 2000-02-22 | Promega Corporation | Methods of isolating biological target materials using silica magnetic particles |
EP0972078B1 (en) | 1997-03-20 | 2005-06-01 | Affymetrix, Inc. (a California Corporation) | Iterative resequencing |
WO1998056954A1 (en) | 1997-06-13 | 1998-12-17 | Affymetrix, Inc. | Method to detect gene polymorphisms and monitor allelic expression employing a probe array |
EP1032705B1 (en) * | 1997-10-30 | 2011-12-14 | Cold Spring Harbor Laboratory | Probe arrays and methods of using probe arrays for distinguishing dna |
US6033861A (en) | 1997-11-19 | 2000-03-07 | Incyte Genetics, Inc. | Methods for obtaining nucleic acid containing a mutation |
WO1999036571A2 (en) * | 1998-01-13 | 1999-07-22 | Biochip Technologies Gmbh | Method for the detection or nucleic acid of nucleic acid sequences |
US6306643B1 (en) | 1998-08-24 | 2001-10-23 | Affymetrix, Inc. | Methods of using an array of pooled probes in genetic analysis |
US6703228B1 (en) | 1998-09-25 | 2004-03-09 | Massachusetts Institute Of Technology | Methods and products related to genotyping and DNA analysis |
EP1001037A3 (en) | 1998-09-28 | 2003-10-01 | Whitehead Institute For Biomedical Research | Pre-selection and isolation of single nucleotide polymorphisms |
ATE405665T1 (en) * | 1999-03-11 | 2008-09-15 | Zeachem Inc | METHOD FOR PRODUCING ETHANOL |
US6906643B2 (en) * | 2003-04-30 | 2005-06-14 | Hewlett-Packard Development Company, L.P. | Systems and methods of viewing, modifying, and interacting with “path-enhanced” multimedia |
-
1999
- 1999-10-27 CA CA002345441A patent/CA2345441A1/en not_active Abandoned
- 1999-10-27 WO PCT/US1999/025200 patent/WO2000024939A1/en active IP Right Grant
- 1999-10-27 DE DE69929542T patent/DE69929542T2/en not_active Expired - Lifetime
- 1999-10-27 JP JP2000578491A patent/JP2002528096A/en not_active Withdrawn
- 1999-10-27 AT AT99965737T patent/ATE316152T1/en not_active IP Right Cessation
- 1999-10-27 AU AU21440/00A patent/AU2144000A/en not_active Abandoned
- 1999-10-27 US US09/428,350 patent/US6361947B1/en not_active Expired - Lifetime
- 1999-10-27 EP EP99965737A patent/EP1124990B1/en not_active Expired - Lifetime
-
2001
- 2001-07-12 US US09/904,039 patent/US7267966B2/en not_active Expired - Fee Related
-
2004
- 2004-09-16 US US10/942,364 patent/US20060063158A1/en not_active Abandoned
Also Published As
Publication number | Publication date |
---|---|
US7267966B2 (en) | 2007-09-11 |
DE69929542D1 (en) | 2006-04-06 |
US6361947B1 (en) | 2002-03-26 |
JP2002528096A (en) | 2002-09-03 |
DE69929542T2 (en) | 2006-09-14 |
ATE316152T1 (en) | 2006-02-15 |
EP1124990B1 (en) | 2006-01-18 |
AU2144000A (en) | 2000-05-15 |
WO2000024939A1 (en) | 2000-05-04 |
EP1124990A4 (en) | 2003-03-12 |
US20020142314A1 (en) | 2002-10-03 |
US20060063158A1 (en) | 2006-03-23 |
EP1124990A1 (en) | 2001-08-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7267966B2 (en) | Complexity management and analysis of genomic DNA | |
US9845490B2 (en) | Complexity management of genomic DNA | |
US8492121B2 (en) | Complexity management of genomic DNA | |
US7214490B2 (en) | Method of target enrichment and amplification | |
US7202039B2 (en) | Complexity management of genomic DNA | |
EP1362929A2 (en) | Methods for genotyping | |
US20040110153A1 (en) | Complexity management of genomic DNA by semi-specific amplification | |
CA2489733A1 (en) | Complexity management of genomic dna by locus specific amplification | |
EP1056889B1 (en) | Methods related to genotyping and dna analysis | |
US20070148636A1 (en) | Method, compositions and kits for preparation of nucleic acids | |
Kamberov et al. | Use of in vitro OmniPlex Libraries for high-throughput comparative genomics and molecular haplotyping | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
EEER | Examination request | ||
FZDE | Discontinued |