0001 Torelli,A. ADVANCE and ADAM: Two .. Comput.Appl.Bio 94 10(1):3-6 Torelli A; Robotti CA ADVANCE and ADAM: Two Algorithms for the Analysis of Global Similarity between Homologous Informational Sequences Pairwise comparison; Sequence proximity; Pairwise alignment; Italy; Similarity; Algorithm "Two algorithms for the analysis of global similarity between sequences of informational polymeric molecules (nucleic acids and proteins) are proposed: one (ADVANCE) merely gives a quantification of the global similarity between two sequences, and is very fast; the other (ADAM) also provides an alignment of the sequences. Both are new algorithms, implement Sellers' theorem, do not require parameters ... and are fast ...." Comput Appl Biosci 1994 10 1 3-6 0002 Ina,Y. ODEN: A Program Packag.. Comput.Appl.Bio 94 10(1):11-12 Ina Y ODEN: A Program Package for Molecular Evolutionary Analysis and Database Search of DNA and Amino Acid Sequences Database search; Phylogeny; JP; Program; DNA; Amino acid "To enable researchers to use both kinds of programs interactively, I developed a program package which integrates (about) 50 programs for database search and molecular evolutionary analysis. I named the package 'ODEN' after a Japanese food which is cooked with various materials marvelously harmonized." Comput Appl Biosci 1994 10 1 11-12 0003 Thompson,J.D. Improved Sensitivity o.. Comput.Appl.Bio 94 10(1):19-29 Thompson JD; Higgins DG; Gibson TJ Improved Sensitivity of Profile Searches through the Use of Sequence Weights and Gap Excision Match a pattern matrix; Database search; Sequence weight; DE; Gap; Profile "However, none of the published weighting schemes seemed ideal for the purpose of weighting profiles. We have developed a new method to weight the sequences, where more distant sequences are assigned higher weights than closely related ones according to branch length values of neighbour-joining trees prepared from the aligned sequences." 
Comput Appl Biosci 1994 10 1 19-29 0004 Doelz,R. Hierarchical Access Sy.. Comput.Appl.Bio 94 10(1):31-34 Doelz R Hierarchical Access System for Sequence Libraries in Europe (HASSLE): A Tool to Access Sequence Databases Remotely Database search; SWI; FASTA; Hierarchical; BLAST; Program; Sequence database "HASSLE focuses on the network aspect of the molecular biology computing and assumes that it is possible to have database applications available as remote 'services' (programs, program packages or utilities) which can be started by a simple command script after a suitable feed of datafiles. The current system provides these services for searching with programs like FASTA or BLAST ...." Comput Appl Biosci 1994 10 1 31-34 0005 Olsen,G.J. fastDNAml: A Tool for .. Comput.Appl.Bio 94 10(1):41-48 Olsen GJ; Matsuda H; Hagstrom R; Overbeek R fastDNAml: A Tool for Construction of Phylogenetic Trees of DNA Sequences using Maximum Likelihood Phylogeny; Parallel; USA; Likelihood; Evolutionary tree; DNA; Phylogenetic "The program can be run on a wide variety of computers ranging from Unix workstations to massively parallel systems .... Our program uses a maximum likelihood approach and is based on version 3.3 of Felsenstein's dnaml program. ... and phylogenetic estimates are possible even when hundreds of sequences exist." Comput Appl Biosci 1994 10 1 41-48 0006 Gast,F.U. A Macintosh Program fo.. Comput.Appl.Bio 94 10(1):49-51 Gast FU A Macintosh Program for the Versatile Generation of Random Nucleic Acid Sequences and their Structural Analysis Sequence analysis; Program; Nucleic acid; DE "The program 'MacStAn' for the Apple Macintosh generates random sequences and can analyze their tendency to form secondary structure or translation products as well as their mono-, di- and trinucleotide composition." Comput Appl Biosci 1994 10 1 49-51 0007 Fuchs,R. Fast Protein Block Sea.. 
Comput.Appl.Bio 94 10(1):79-80 Fuchs R Fast Protein Block Searches Database search; DE; Block search; Genome; Profile; Protein "Profile searches using aligned short protein blocks are an effective method for identifying putative protein functions. An algorithm is presented that accelerates block searches by a factor 2-5 with only limited lack of sensitivity; this algorithm is particularly suited for application in large-scale genome research." Comput Appl Biosci 1994 10 1 79-80 0008 Chapman,M.S. Sequence Similarity Sc.. Comput.Appl.Bio 94 10(2):111-119 Chapman MS Sequence Similarity Scores and the Inference of Structure - Function Relationships Multiple comparison; USA; Statistical; Sequence proximity; Function; Similarity; Structure; Score "Improved methods are described for the interpretation of two or more aligned protein or nucleic acid sequences. ... Improvements include the calculation of a position-dependent, gap-penalized similarity score; computer-assisted graphical association of sequence similarity with structural, functional or chemical properties of the sequences; and statistical comparisons of the sequence conservation or variability of different groups of residues." Comput Appl Biosci 1994 10 2 111-119 0009 Frohlich,K.U. Sequence Similarity Pr.. Comput.Appl.Bio 94 10(2):179-183 Frohlich KU Sequence Similarity Presenter: A Tool for the Graphic Display of Similarities of Long Sequences for Use in Presentations Pairwise comparison; DE; Display; Sequence alignment; Similarity; Graphic "A new method for the presentation of alignments of long sequences is described. The degree of identity for the aligned sequences is averaged for sections of a fixed number of residues. The resulting values are converted to shades of gray, with white corresponding to lack of identity and black corresponding to perfect identity. A sequence alignment is represented as a bar filled with varying shades of gray." Comput Appl Biosci 1994 10 2 179-183 0010 Laferriere,A. 
An RNA Pattern Matchin.. Comput.Appl.Bio 94 10(2):211-212 Laferriere A; Gautheret D; Cedergren R An RNA Pattern Matching Program with Enhanced Performance and Portability Database search; Pattern match; Sequence database; Program; String match; RNA; CA; Performance "We present here a significant improvement of the program RNAMOT which allows searches of primary and secondary structural patterns in sequence databases (Gautheret et al. 1990). An important performance enhancement was achieved using a faster string-matching algorithm and more efficient sequence scans." Comput Appl Biosci 1994 10 2 211-212 0011 Wishart,D.S. SEQSEE: A Comprehensiv.. Comput.Appl.Bio 94 10(2):121-132 Wishart DS; Boyko RF; Willard L; Richards FM; Sykes BD SEQSEE: A Comprehensive Program Suite for Protein Sequence Analysis Sequence analysis; CA; Display; Statistical; Sequence alignment; Pattern match; Program; Protein "SEQSEE (SEQuence SEEker) is a multi-purpose, menu-driven suite of programs designed to provide a fully integrated, state-of-the-art package for the analysis and display of protein sequences and protein databases. ... SEQSEE is capable of performing ... sequence/database searching, sequence retrieval, sequence entry and editing, statistical sequence analysis, multiple sequence alignment, flexible pattern matching, and secondary structure prediction." Comput Appl Biosci 1994 10 2 121-132 0012 Searls,D.B. Doing Sequence Analysi.. Comput.Appl.Bio 93 9(4):421-426 Searls DB Doing Sequence Analysis with your Printer Sequence analysis; USA "The software package RSVP (Rapid Sequence Visualization in PostScript) has a suite of visually oriented sequence analysis routines implemented entirely in the page description language PostScript .... RSVP is thus a relatively platform-independent tool for providing a 'quick look' at sequence data, using form and color to help point out patterns, in advance of more sophisticated sequence analyses." Comput Appl Biosci 1993 9 4 421-426 0013 Date,S. 
Multiple Alignment of .. Comput.Appl.Bio 93 9(4):397-402 Date S; Kulkarni R; Kulkarni B; Kulkarni-Kale U; Kolaskar AS Multiple Alignment of Sequences on Parallel Computers Multiple alignment; Clustering; Parallel; India; Program; Hierarchical "A software package that allows one to carry out multiple alignment of protein and nucleic acid sequences of almost unlimited length and number of sequences is developed on C-DAC parallel computer - a transputer-based machine. ... The speed gains are almost linear when the number of transputers is increased from 4 to 64." Comput Appl Biosci 1993 9 4 397-402 0014 Chao,K.M. Locating Well-Conserve.. Comput.Appl.Bio 93 9(4):387-396 Chao KM; Hardison RC; Miller W Locating Well-Conserved Regions Within a Pairwise Alignment Pairwise alignment; Significance; USA; Locally optimal; Suboptimal; Region "When alignments are so long that it is infeasible, or at least undesirable, to inspect them in complete detail, it is helpful to have an automatic process that computes information about the varying degree of conservation along the alignment and displays the information in a graphical representation that is readily assimilated. This paper presents methods for computing several such 'robustness measures' at each position of a given alignment." Comput Appl Biosci 1993 9 4 387-396 0015 Livingstone,C Protein Sequence Align.. Comput.Appl.Bio 93 9(6):745-756 Livingstone CD; Barton GJ Protein Sequence Alignments: A Strategy for the Hierarchical Analysis of Residue Conservation Multiple comparison; UK; Sequence alignment; Hierarchical; Consensus index; Protein; Residue "An algorithm is described for the systematic characterization of the physico-chemical properties seen at each position in a multiple protein sequence alignment. 
The new algorithm allows questions important in the design of mutagenesis experiments to be quickly answered since positions in the alignment that show unusual or interesting residue substitution patterns may be rapidly identified. The strategy is based on a flexible set-based description of amino acid properties, which is used to define the conservation between any group of amino acids." Comput Appl Biosci 1993 9 6 745-756 0016 Ortells,M.O. CEDIT: A C Interface a.. Comput.Appl.Bio 93 9(6):741-744 Ortells MO; Cockcroft VB; Lunt GG CEDIT: A C Interface and Macro Facility for Protein Sequence Alignment Editing in Colour with Microsoft Word 5.0 for PCs Multiple alignment; Display; UK; Sequence alignment; Editor; Editing; Word "CEDIT, a C interface and macro facility that provides for the colour editing of protein sequence alignments (up to 2000 sequences, 5000 residues each) using Microsoft Word 5.0 for PCs is presented. CEDIT uses the ability of MS-Word to display letters with the desired colour to easily identify conservative homologies across the sequences." Comput Appl Biosci 1993 9 6 741-744 0017 De Rijk,P. DCSE, an Interactive T.. Comput.Appl.Bio 93 9(6):735-740 De Rijk P; De Wachter R DCSE, an Interactive Tool for Sequence Alignment and Secondary Structure Research Multiple alignment; Belgium; Sequence alignment; Gap; Program; Editor; Structure; Secondary "DCSE provides a user-friendly package for the creation and editing of sequence alignments. ... It shifts characters or entire blocks of aligned characters, rather than inserting or deleting gaps in the sequences. Alignment of a new sequence to an existing alignment is partly automated. Although DCSE can be used on protein sequence alignments, it is especially targeted at the examination of RNA. The secondary structure for every sequence can be incorporated easily in the alignment." Comput Appl Biosci 1993 9 6 735-740 0018 Barton,G.J. An Efficient Algorithm.. 
Comput.Appl.Bio 93 9(6):729-734 Barton GJ An Efficient Algorithm to Locate all Locally Optimal Alignments between Two Sequences Allowing for Gaps Subalignment; UK; Gap; Locally optimal; Optimal; Algorithm "An efficient algorithm is described to locate locally optimal alignments between two sequences allowing for insertions and deletions. The algorithm is based on that of Smith and Waterman which returns the single best local alignment. However, the algorithm described here permits all non-intersecting locally optimal alignments to be determined in a single pass through the comparison matrix." Comput Appl Biosci 1993 9 6 729-734 0019 Liuni,S. SIMD Parallelization o.. Comput.Appl.Bio 93 9(6):701-707 Liuni S; Prunella N; Pesole G; D'Orazio T; Stella E; Distante A SIMD Parallelization of the WORDUP Algorithm for Detecting Statistically Significant Patterns in DNA Sequences Multiple comparison; Significance; Parallel; Pattern match; String match; Italy; Boyer-Moore; DNA; Algorithm "We study a method for parallelizing the algorithm WORDUP, which detects the presence of statistically significant patterns in DNA sequences. WORDUP implements an efficient method to identify the presence of statistically significant oligomers in a non-homologous group of sequences. It is based on a modified version of the Boyer-Moore algorithm, which is one of the fastest algorithms for string matching available in the literature." Comput Appl Biosci 1993 9 6 701-707 0020 Bordo,D. ENVIRON: A Software Pa.. Comput.Appl.Bio 93 9(6):639-645 Bordo D ENVIRON: A Software Package to Compare Protein Three-Dimensional Structures with Homologous Sequences using Local Structural Motifs Structure; Motif; Sequence alignment; Italy; Program; Protein "This work presents a method to compare local clusters of interacting residues as observed in a known three-dimensional protein structure with corresponding clusters inferred from homologous protein sequences, assuming conserved protein folding. 
For this purpose the local environment of a selected residue in a known protein structure is defined as the ensemble of amino acids in contact with it in the folded state. Using a multiple sequence alignment to identify corresponding residues in homologous proteins, a detailed comparison can be performed ...." Comput Appl Biosci 1993 9 6 639-645 0021 Fuchs,R. Block Searches on VAX .. Comput.Appl.Bio 93 9(5):587-591 Fuchs R Block Searches on VAX and Alpha Computer Systems Database search; Match a pattern matrix; Block search; DE; Pattern search "A new program, BlockSearch, is described that allows biologists to search protein sequences against the BLOCKS database of aligned protein blocks by converting these blocks to site-specific scoring matrices. It thus complements existing tools for standard similarity searches and pattern searches which aid in elucidating the function of newly determined protein-coding sequences." Comput Appl Biosci 1993 9 5 587-591 0022 Prunella,N. FASTPAT: A Fast and Ef.. Comput.Appl.Bio 93 9(5):541-545 Prunella N; Liuni S; Attimonelli M; Pesole G FASTPAT: A Fast and Efficient Algorithm for String Searching in DNA Sequences String match; Boyer-Moore; Italy; String search; DNA; Algorithm "A new string searching algorithm is presented aimed at searching for the occurrence of character patterns in longer character texts. The algorithm, specifically designed for nucleic acid sequence data, is essentially derived from the Boyer-Moore method .... Both pattern and text data are compressed so that the natural 4-letter alphabet of nucleic acid sequences is considerably enlarged." Comput Appl Biosci 1993 9 5 541-545 0023 Zhang,M.Q. A Weight Array Method .. 
Comput.Appl.Bio 93 9(5):499-509 Zhang MQ; Marr TG A Weight Array Method for Splicing Signal Analysis Match a pattern matrix; USA; Sequence analysis; Statistical; Signal "A new method of sequence analysis, using a weight array method (WAM), which generalizes the traditional Staden weight matrix method (WMM), is proposed. With the help of a statistical mechanical model, the discriminant function is identified with the energy function describing macromolecular interactions." Comput Appl Biosci 1993 9 5 499-509 0024 Sakamoto,N. Development of the Ove.. Comput.Appl.Bio 93 9(4):427-434 Sakamoto N; Takagi T; Sakaki Y Development of the Overlapping Oligonucleotide Database and its Application to Signal Sequence Search of the Human Genome Database search; Sequence database; Signal; JP; Sequence search; Genome "We have developed ODS (Overlapping Oligonucleotide Database for Signal Sequence Search) - the first relational database that integrates information on biological features into the search for signal sequences. ... Nucleotide sequences are transformed into overlapping oligonucleotides in order to facilitate the signal sequence search rapidly without the need for specific alignment programs. This transformation leads to a one-to-one correspondence between the nucleotide sequence and its biological feature." Comput Appl Biosci 1993 9 4 427-434 0025 Milosavljevic Discovering Simple DNA.. Comput.Appl.Bio 93 9(4):407-411 Milosavljevic A; Jurka J Discovering Simple DNA Sequences by the Algorithmic Significance Method Sequence analysis; Significance; Compression; USA; Dynamic programming; Repeat; DNA "The main idea is that patterns can be discovered by finding ways to encode the observed data concisely. ... The method is applied to discover significantly simple DNA sequences. We define DNA sequences to be simple if they contain repeated occurrences of certain 'words' and thus can be encoded in a small number of bits. ... 
A standard dynamic programming algorithm for data compression is applied to compute the minimal encoding lengths of sequences in linear time." Comput Appl Biosci 1993 9 4 407-411 0026 Fagin,B. A Special-Purpose Proc.. Comput.Appl.Bio 93 9(2):221-226 Fagin B; Watt JG; Gross R A Special-Purpose Processor for Gene Sequence Analysis Pairwise alignment; Hardware; USA; Sequence analysis; Sequence alignment; Needleman-Wunsch; Gene "For certain problems, special-purpose computers can achieve significant cost/performance gains over general-purpose machines. We describe one such computer here: a custom accelerator for gene sequence analysis. The accelerator implements a version of the Needleman-Wunsch algorithm for nucleotide sequence alignment. ... The boards ... yield a 15-fold performance improvement over an unassisted host." Comput Appl Biosci 1993 9 2 221-226 0027 Vogt,G. Profile Sequence Analy.. Comput.Appl.Bio 93 9(1):25-28 Vogt G; Argos P Profile Sequence Analysis and Database Searches on a Transputer Machine Connected to a Macintosh Computer Match a pattern matrix; Database search; Parallel; DE; Sequence analysis; Dynamic programming; Profile "An implementation of Profilesearch (a technique to search for relationships between a protein sequence and multiply aligned sequences) for a parallel computer is described. ... The program and environment are useful to search quickly and easily for similarities between a single sequence or sequence set and individual sequences contained in a large database. The alignment is determined by typical dynamic programming techniques." Comput Appl Biosci 1993 9 1 25-28 0028 Fuchs,R. EMBL-Search: A CD-ROM .. 
Comput.Appl.Bio 93 9(1):71-77 Fuchs R; Stoehr P EMBL-Search: A CD-ROM Based Database Query System Database search; DE; Sequence database; Query "This paper describes a system of generally applicable index files provided on the EMBL sequence databases CD-ROM to facilitate the development of front-end software to the sequence databases available on this CD-ROM. The index files are used by a new versatile and user-friendly database retrieval program for the Apple Macintosh, EMBL-Search, which allows the easy construction of complex database queries." Comput Appl Biosci 1993 9 1 71-77 0029 Balzarotti,V. An Algorithm for the I.. Comput.Appl.Bio 93 9(1):93-100 Balzarotti V; Colizzi V; Morante S; Parisi V An Algorithm for the Identification of Similar Oligopeptides between Amino Acid Sequences Locally optimal; Significance; Identification; Italy; Subalignment; Amino acid; Algorithm "We have developed a new algorithm capable of identifying pairs of similar oligopeptides irrespective of their length, number, location and ordering along the proteins, by locally comparing the two sequences of amino acids. The algorithm compares the actual number of similar pairs found in this way, with the number expected under the simplified assumption that the amino acids along the sequences are randomly distributed with a given occurrence frequency. The final step of the procedure consists in selecting the pairs of similar oligopeptides that are statistically significant ...." Comput Appl Biosci 1993 9 1 93-100 0030 Aho,A.V. Pattern Matching in St.. Formal Langua.. 80Academic Press Aho AV Pattern Matching in Strings Book R Formal Language Theory, Perspectives and Open Problems String match; Match complex patterns; Language; USA; Pattern match; Expression "This paper examines three basic classes of string patterns ... and analyzes some of the time-space tradeoffs inherent in searching for these classes of patterns. 
The three classes of patterns considered are (1) finite sets of strings, (2) regular expressions, and (3) regular expressions with back referencing. Efficient pattern matching algorithms for each of these classes are discussed." Academic Press New York 1980 325-347 0031 Aleksandrov,N Pattern Recognition in.. Mol.Biol.(Mosc. 89 23:988-999 Aleksandrov NN; Mironov AA Pattern Recognition in Computer Analysis of Nucleoside Sequences Pattern recognition; Discrimination; RU; Recognition Translated from Molekulyarnaya Biologiya, 23(5), 1248-1262, Sept.-Oct. 1989. "The results of using the 'generalized portrait' algorithm for pattern recognition to find an Escherichia coli promoter are presented. Related problems of feature selection, set selection and computing coordinates of the dividing vector are solved." Mol Biol (Mosc ) 23 23 988-999 0032 Almagor,H. A Markov Analysis of D.. J.Theor.Biol. 83 104:633-645 Almagor H A Markov Analysis of DNA Sequences Sequence analysis; Significance; Markov; IL; DNA "One of the basic questions to be asked (the 'correlation question') is to what extent are the 64 trinucleotide (triplet) frequencies measured in a sequence determined by the 16 doublet frequencies in the same sequence. The DNA is described here as a Markov process, with the nucleotides being outcomes of a sequence generator. ... Two natural DNA sequences ... are analysed as examples of the method." J Theor Biol 104 104 633-645 0033 Apostolico,A. The Myriad Virtues of .. Combinatorial.. 85Springer-Verlag Apostolico A The Myriad Virtues of Subword Trees Apostolico A Galil Z Combinatorial Algorithms on Words. NATO ASI Series F: Computer and System Sciences, vol. 12 Match complex patterns; Search tree; USA; Data structure; Compression; Regularities "Several nontrivial applications of subword trees have been developed since their first appearance. Some such applications depart considerably from the original motivations. A brief account of them is attempted here." 
Springer-Verlag Berlin 1985 85-96 0034 Arratia,R. An Extreme Value Theor.. Ann.Statist. 86 14(3):971-993 Arratia R; Gordon L; Waterman MS An Extreme Value Theory for Sequence Matching Pairwise comparison; Significance; USA; Sequence match; Longest common "Consider finite sequences X1...Xm and Y1...Yn .... We study the distribution of the longest contiguous run of matches between the X's and Y's, allowing at most k mismatches. The distribution is closely approximated by that of the maximum of (1-p)mn i.i.d. negative binomial random variables." Ann Statist 1986 14 3 971-993 0035 Arratia,R. Stochastic Scrabble: L.. J.Appl.Probab. 88 25:106-119 Arratia R; Morris P; Waterman MS Stochastic Scrabble: Large Deviations for Sequences with Scores Pairwise comparison; Significance; USA; Longest common; Markov; Stochastic; Score "A derivation of a law of large numbers for the highest-scoring matching subsequence is given." J Appl Probab 25 25 106-119 0036 Arratia,R. An Erdos-Renyi Law wit.. Adv.Math. 85 55:13-23 Arratia R; Waterman MS An Erdos-Renyi Law with Shifts Pairwise comparison; Significance; USA; Longest common; Markov "Motivated by the comparison of DNA sequences, a generalization is given of the result of Erdos and Renyi on the length Rn of the longest run of heads in the first n tosses of a coin." Adv Math 55 55 13-23 0037 Attimonelli,M Multisequence Comparis.. Cell Biophys. 85 7:239-250 Attimonelli M; Lanave C; Sbisa E; Preparata G; Saccone C Multisequence Comparisons in Protein Coding Genes: Search for Functional Constraints Multiple comparison; Region; Statistical; Significance; Italy; Coding; Protein; Gene "The problem ... is to find in a given sequence those regions showing anomalous persistence structure. Clearly the notion of persistence can only be defined in a comparative way, i. e., by considering homologous sequences belonging to different species .... 
In order to appreciate the statistical meaning of the observed values of the permanence densities, their expectations and their statistical fluctuations must be determined ...." Cell Biophys 7 7 239-250 0038 Baeza-Yates,R A New Approach to Text.. Proceedings o.. 89Association for Baeza-Yates RA; Gonnet GH A New Approach to Text Searching Belkin NJ Van Rijsbergen CJ Proceedings of the Twelfth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval Text search; Match with don't cares; Match with k mismatches; CA; String match "We introduce a family of simple and fast algorithms for solving the classical string matching problem, string matching with don't care symbols and complement symbols, and multiple patterns. In addition we solve the same problems allowing up to k mismatches." (Addendum in SIGIR Forum, 23(3,4), Spring/Summer 1989, p. 7.) Association for Computing Machinery New York 1989 168-175 0039 Bairoch,A. The PROSITE Dictionary.. Nucleic Acids R 93 21(13):3097-31 Bairoch A The PROSITE Dictionary of Sites and Patterns in Proteins, its Current Status Database search; Sequence database; Pattern library; Motif; Signature; Protein; PROSITE; SWI "PROSITE is a compilation of sites and patterns found in protein sequences: it can be used as a method of determining the function of uncharacterized proteins translated from genomic or cDNA sequences." Nucleic Acids Res 1993 21 13 3097-3103 0040 Bairoch,A. The SWISS-PROT Protein.. Nucleic Acids R 93 21(13):3093-30 Bairoch A; Boeckmann B The SWISS-PROT Protein Sequence Data Bank, Recent Developments Database search; Sequence database; SWI; Protein "SWISS-PROT is an annotated protein sequence database established in 1986 and maintained collaboratively, since 1988, by the Department of Medical Biochemistry of the University of Geneva and the EMBL Data Library." Nucleic Acids Res 1993 21 13 3093-3096 0041 Barker,W.C. The PIR-International .. 
Nucleic Acids R 93 21(13):3089-30 Barker WC; George DG; Mewes HW; Pfeiffer F; Tsugita A The PIR-International Databases Sequence database; USA "This paper briefly describes the architecture of the Protein Sequence Database, a number of other PIR-International databases, and mechanisms for providing access to and for distribution of these databases." Nucleic Acids Res 1993 21 13 3089-3092 0042 Bell,T. Longest-Match String S.. Software.Practi 93 23(7):757-771 Bell T; Kulp D Longest-Match String Searching for Ziv-Lempel Compression String match; NZ; Compression; String search; Data structure; Search tree; Boyer-Moore "Hashing, binary search trees, splay trees and the Boyer-Moore searching algorithm are traditionally used to search for exact matches, but we show how these can be adapted to find longest matches." Software Practice Experience 1993 23 7 757-771 0043 Benson,D. GenBank Nucleic Acids R 93 21(13):2963-29 Benson D; Lipman DJ; Ostell J GenBank Database search; Sequence database; USA; GenBank "The GenBank sequence database has undergone an expansion in data coverage, annotation content and the development of new services for the scientific community." Nucleic Acids Res 1993 21 13 2963-2965 0044 Bishop,M.J. Inference of Evolution.. Nucleic Acid .. 87IRL Press Bishop MJ; Friday AE; Thompson EA Inference of Evolutionary Relationships Bishop MJ Rawlings CJ Nucleic Acid and Protein Sequence Analysis: A Practical Approach Pairwise alignment; Likelihood; UK; Phylogeny "Much of the literature of molecular evolution is confused as to what constitute the data which have been observed, what constitutes the model ... and how to evaluate the relative merits of the competing hypotheses which are being considered. Outlining how to set about this is a practical matter ...." Describes a maximum likelihood method to align two sequences. IRL Press Oxford 1987 359-385 0045 Bodlaender,H. Parameterized Complexi.. First Interna.. 
94Steering Commit Bodlaender H; Downey RG; Fellows MR; Hallett MT; Wareham HT Parameterized Complexity Analysis in Computational Biology First International Workshop on Shape and Pattern Matching in Computational Biology Multiple alignment; Consensus sequence; Longest common; Complexity; CA; Parameterized "We describe some new results on the Longest Common Subsequence problem. In particular, we show that the problem is hard for W[t] for all t when parameterized by the number of strings and the size of the alphabet. Lower bounds on the complexity of this basic combinatorial problem imply lower bounds on more general sequence alignment and consensus discovery problems. We also describe a number of open problems pertaining to the parameterized complexity of problems in computational biology ...." Steering Committee of the 1994 IEEE Workshop on Shape and Pattern Matching in Computational Biology 1994 Yorktown Heights, NY 1994 PP:99-116 0046 Bodlaender,H. The Parameterized Comp.. Lecture Notes i 94 807:15-30 Bodlaender H; Downey RG; Fellows MR; Wareham HT The Parameterized Complexity of Sequence Alignment and Consensus Multiple alignment; Longest common; Complexity; CA; Sequence alignment; Parameterized 5th Annual Symposium, CPM 94. Asilomar, June 5-8, 1994. Proceedings. "The Longest Common Subsequence problem is examined from the point of view of parameterized computational complexity. ... Our main results show that: (1) The Longest Common Subsequence (LCS) parameterized by the number of sequences to be analyzed is hard for W[t] for all t. (2) The LCS problem, parameterized by the length of the common subsequence, belongs to W[P] and is hard for W[2]. (3) The LCS problem parameterized both by the number of sequences and the length of the common subsequence, is complete for W[1]." Lecture Notes in Comput Sci 807 807 15-30 0047 Bork,P. Mobile Modules and Mot.. 
Curr.Opin.Struc 92 2:413-421 Bork P Mobile Modules and Motifs Pattern recognition; DE; Motif; Module "Therefore, at the sequence level, [modules] can often be recognized only by comparison with specific motifs. These motifs are usually not characterized by just a few conserved amino acids, but rather are complex arrangements that will become increasingly blurred with the rapidly growing number of available sequences." Characteristics of modules. Use of motifs for identifying modules. Discerning homology from similarity. Proteins with new modular architecture. Curr Opin Struct Biol 2 2 413-421 0048 Bowie,J.U. A Method to Identify P.. Science 91 253(12 July):1 Bowie JU; Luthy R; Eisenberg D A Method to Identify Protein Sequences that Fold into a Known Three-Dimensional Structure Database search; USA; Structure; Match a pattern matrix; Profile; Protein; Fold "The inverse protein folding problem, the problem of finding which amino acid sequences fold into a known three-dimensional (3D) structure, can be effectively attacked by finding sequences that are most compatible with the environments of the residues in the 3D structure." From the known 3D structure of a protein P, construct a 3D structure profile; use it to search a database of protein sequences to identify proteins most likely to adopt a fold similar to P. Science 1991 253 12 July 164-170 0049 Breslauer,D. Tight Comparison Bound.. Inform.Process. 93 47:51-57 Breslauer D; Colussi L; Toniolo L Tight Comparison Bounds for the String Prefix-Matching Problem Match a prefix; Complexity; Italy; String match; Pattern match "In the string prefix-matching problem one is interested in finding the longest prefix of a pattern string of length m that occurs starting at each position of a text string of length n. ... In this paper we study the exact complexity of the string prefix-matching problem in the deterministic sequential comparison model." Inform Process Lett 47 47 51-57 0050 Chang,J.H. Parallel Parsing on a .. 
IEEE Trans.Comp 87 36(1):64-75 Chang JH; Ibarra OH; Palis MA Parallel Parsing on a One-Way Array of Finite-State Machines Language; Automata; Parallel; Sequence recognition; USA; Parsing; Longest common "We show that a one-way two-dimensional iterative array of finite-state machines (2-DIA) can recognize and parse strings of any context-free language in linear time. ... We also consider the problem of finding approximate patterns in strings, the string-to-string correction problem, and the longest common subsequence problem, and show that they can be solved in linear time on a 2-DIA." IEEE Trans Comput 1987 36 1 64-75 0051 Fickett,J.W. Development of a Datab.. Mathematical .. 89CRC Press Fickett JW; Burks C Development of a Database for Nucleotide Sequences Waterman MS Mathematical Methods for DNA Sequences Sequence database; USA; Nucleotide "We know of only two data banks currently attempting comprehensive coverage of nucleotide sequence data: the GenBank genetic sequence data bank in the U.S., and the data bank at EMBL (European Molecular Biology Laboratory at Heidelberg, West Germany). We will describe one approach, undertaken by the GenBank staff at LANL (Los Alamos National Laboratory), to the development of a database that does justice to the natural structure of the data, facilitates current applications, and allows expansion for the foreseeable future." CRC Press Boca Raton, FL 1989 1-34 0052 Chen,M.T. Efficient and Elegant .. Combinatorial.. 85Springer-Verlag Chen MT; Seiferas J Efficient and Elegant Subword-Tree Construction Apostolico A Galil Z Combinatorial Algorithms on Words. NATO ASI Series F: Computer and System Sciences, vol. 12 Match complex patterns; USA; Automata; Search tree "A clean version of Weiner's linear-time compact-subword-tree construction simultaneously also constructs the smallest deterministic finite automaton recognizing the reverse subwords." Springer-Verlag Berlin 1985 97-107 0053 Galil,Z. Open Problems in Strin.. 
Combinatorial.. 85Springer-Verlag Galil Z Open Problems in Stringology Apostolico A Galil Z Combinatorial Algorithms on Words. NATO ASI Series F: Computer and System Sciences, vol. 12 String match; USA "Several open problems concerning combinatorial algorithms on strings are described." Questions about string matching. Generalizations of string matching. Index construction. Miscellaneous problems. Springer-Verlag Berlin 1985 1-8 0054 Guibas,L.J. Periodicities in Strings Combinatorial.. 85Springer-Verlag Guibas LJ Periodicities in Strings Apostolico A Galil Z Combinatorial Algorithms on Words. NATO ASI Series F: Computer and System Sciences, vol. 12 Regularities; USA "In this talk we summarize what is known about the periodicities of strings. A period of a string is a shift that causes a string to match itself." For an expanded version of these results see Guibas and Odlyzko (1981). Springer-Verlag Berlin 1985 257-269 0055 Pinter,R.Y. Efficient String Match.. Combinatorial.. 85Springer-Verlag Pinter RY Efficient String Matching with Don't-care Patterns Apostolico A Galil Z Combinatorial Algorithms on Words. NATO ASI Series F: Computer and System Sciences, vol. 12 IL; String match "The main result of this paper is an algorithm to deal efficiently with patterns containing a definite number of don't-care symbols. Our method is to collect 'evidence' about the occurrences of the constant parts of the pattern in the text, using the algorithm of Aho and Corasick." Springer-Verlag Berlin 1985 11-29 0056 Day,W.H.E. Alignment, Comparison .. New Approache.. 94Springer-Verlag Day WHE; McMorris FR Alignment, Comparison and Consensus of Molecular Sequences Diday E Lechevallier Y; Schader M; Bertrand P; Burtschy B New Approaches in Classification and Data Analysis Sequence comparison; Review; CA "A rich and varied literature on sequence comparison has developed, one containing hundreds of theoretical or methodological contributions and thousands of applications. 
However, the focus of our review is on the theory and methodology of sequence comparison." 150 references Springer-Verlag Berlin 1994 327-346 0057 Fitch,W.M. Locating Gaps in Amino.. Biochem.Genet. 69 3:99-108 Fitch WM Locating Gaps in Amino Acid Sequences to Optimize the Homology Between Two Proteins Pairwise alignment; Gap; USA; Homology; Amino acid; Protein "A method for optimally locating gaps in the amino acid sequences of homologous proteins is presented. ... The major virtues of this procedure are that the assertion of homology does not depend upon the prior introduction of gaps and that a genetic rather than a chemical test is the basis for asserting a genetic relationship." Biochem Genet 3 3 99-108 0058 Amir,A. Dynamic Dictionary Mat.. J.Comput.System 94 49:208-222 Amir A; Farach M; Galil Z; Giancarlo R; Park K Dynamic Dictionary Matching Dictionary match; USA; Dynamic "We consider the dynamic dictionary matching problem. We are given a set of pattern strings (the dictionary) that can change over time; that is, we can insert a new pattern into the dictionary or delete a pattern from it. Moreover, given a text string, we must be able to find all occurrences of any pattern of the dictionary in the text. Let D0 be the empty dictionary. We present an algorithm that performs any sequence of the following operations in the given time bounds: (1) insert (p, Di-1) ... (2) delete (p, Di-1) ... (3) search (t, Di). Search text t[1,n] for all occurrences of the patterns of dictionary Di. The time complexity is O( ( n + tocc ) log |Di| ), where tocc is the total number of occurrences of patterns in the text." J Comput Systems Sci 49 49 208-222 0059 Heumann,K. A New Concept of Seque.. 
Comput.Appl.Bio 94 10(5):519-526 Heumann K; George D; Mewes HW A New Concept of Sequence Data Distribution on Wide Area Networks Sequence database; DE; Distribution; Network "Accepted concepts in distributed applications design have been applied in the development of a network-based system for the synchronization of remote sequence database access sites by an incremental update mechanism. Computer hardware requirements, network bandwidth, and stability considerations make centralized access to essential computerized resources undesirable. A network model has been developed to distribute access over a collection of remotely situated computer centers." Comput Appl Biosci 1994 10 5 519-526 0060 Gonnet,G.H. Text Algorithms. Chapt.. Handbook of A.. 91Addison-Wesley Gonnet GH; Baeza-Yates RA Text Algorithms. Chapter 7 in Handbook of Algorithms and Data Structures In Pascal and C Handbook of Algorithms and Data Structures in Pascal and C Text search; Review; SWI; Data structure; Structure; Algorithm "Text searching is the process of finding a pattern within a string of characters. ... We will divide the algorithms between those which search the text as given, those which require preprocessing of the text and other text algorithms." The entire book has references to 1350 published papers. Addison-Wesley Wokingham, UK 1991 2 251-288 0061 Gordon,L. An Extreme Value Theor.. Probab.Theory R 86 72:279-287 Gordon L; Schilling MF; Waterman MS An Extreme Value Theory for Long Head Runs Pairwise comparison; Significance; USA; Probabilistic "We show that the probabilistic behavior of the length of the longest pure head run (in the first n independent coin tosses) is closely approximated by that of the greatest integer function of the maximum of n(1-p) i.i.d. exponential random variables." Probab Theory Related Fields 72 72 279-287 0062 Jiang,T. Optimization Problems .. Advances in O.. 
93 Jiang T; Li M Optimization Problems in Molecular Biology Du DZ Sun J Advances in Optimization and Approximation Longest common; Multiple alignment; CA; Optimization Manuscript received 11 February 1994. Jiang, Lawler, Wang (1994), p. 768. "Rather than an extensive literature survey, the purpose of this article is to introduce in depth several prominent optimization problems arising in molecular biology. We will emphasize recent developments and provide proof sketches for the results whenever possible." 1993 0063 Johnson,M.S. Alignment and Searchin.. J.Mol.Biol. 93 231:735-752 Johnson MS; Overington JP; Blundell TL Alignment and Searching for Common Protein Folds Using a Data Bank of Structural Templates Multiple alignment; Database search; UK; Template; Protein; Fold "We introduce an approach to protein comparisons in which tertiary structure information is exploited in the alignment of a protein sequence of known tertiary structure, or an aligned set of sequences of known homologous structures, with one or more sequences. ... (The approach produces) a scoring template suitable for aligning sequences or searching sequence data banks." J Mol Biol 231 231 735-752 0064 Karlin,S. Applications and Stati.. Proc.Nat.Acad.S 93 90:5873-5877 Karlin S; Altschul SF Applications and Statistics for Multiple High-Scoring Segments in Molecular Sequences Sequence analysis; Significance; USA; Segment; Statistical; Sequence comparison; Scoring "Molecular sequences will frequently yield several high-scoring segments for which some combined assessment is in order. This paper describes the statistical distribution for the sum of the scores of multiple high-scoring segments and illustrates its application to the identification of possible transmembrane segments and the evaluation of sequence similarity." Proc Nat Acad Sci USA 90 90 5873-5877 0065 Karp,R.M. Rapid Identification o.. 
ACM Sympos.Theo 72 4:125-136 Karp RM; Miller RE; Rosenberg AL Rapid Identification of Repeated Patterns in Strings, Trees and Arrays Sequence analysis; Regularities; USA; Pattern discovery; Identification; Repeat "We describe a strategy for constructing efficient algorithms for solving two types of matching problems. ... Depth d Matches: Find all depth d substructures of S which occur at least twice in S (possibly overlapping), and find the position in S of each such repeated substructure. Maximum Matches: Find the maximum depth D for which S has a repeated depth D substructure ...." ACM Sympos Theory Comput 4 4 125-136 0066 Kashyap,R.L. An Effective Algorithm.. Inform.Sci. 81 23(3):201-217 Kashyap RL; Oommen BJ An Effective Algorithm for String Correction Using Generalized Edit Distances - II. Computational Complexity of the Algorithm and Some Applications Correction; USA; Edit; Complexity; Dictionary match; Distance; Algorithm "This paper deals with the problem of estimating an unknown transmitted string X, belonging to a finite dictionary H from its observable noisy version Y. ... We study the computational complexity of Algorithm I, and illustrate quantitatively the advantage Algorithm I has over the standard technique and other algorithms." Inform Sci 1981 23 3 201-217 0067 Kashyap,R.L. A Common Basis for Sim.. Internat.J.Comp 83 13:17-40 Kashyap RL; Oommen BJ A Common Basis for Similarity Measures Involving Two Strings Sequence proximity; USA; Edit; Longest common; Supersequence; Pairwise comparison; Similarity "We consider an abstract measure between strings X and Y, written as D(X,Y), defined in terms of two abstract operators + and * and a binary function d whose arguments are symbols of an alphabet A. ... Many new results are obtained using this abstract formulation, such as an explicit linear relationship between the LLCS and the LSCS between two strings." Internat J Comput Math 13 13 17-40 0068 Kashyap,R.L. Similarity Measures fo.. 
Internat.J.Comp 83 13:95-104 Kashyap RL; Oommen BJ Similarity Measures for Sets of Strings Multiple comparison; USA; Sequence proximity; Similarity "We extend the results (of Kashyap and Oommen 1983) to capture various numerical and nonnumerical measures involving more than two strings." Internat J Comput Math 13 13 95-104 0069 Lausen,B. Statistical Analysis o.. Classificatio.. 91Springer-Verlag Lausen B Statistical Analysis of Genetic Distance Data Bock HH Ihm P Classification, Data Analysis, and Knowledge Organization. Models and Methods with Applications Pairwise alignment; Significance; Dot; DE; Statistical; Genetic; Distance "A genetic distance may be computed from aligned genetic sequence data; e.g. DNA sequences. We discuss the dot-matrix plot as a possible graphical check of the goodness of the alignment. ... Therefore, we discuss aspects of an heuristic which allows the combined exploration of genetic distance between the sequences and of different positional variation. A tree structure is not assumed for such an exploration." Springer-Verlag Berlin 1991 254-261 0070 Majster,M.E. Efficient On-line Cons.. SIAM J.Comput. 80 9(4):785-807 Majster ME; Reiser A Efficient On-line Construction and Correction of Position Trees Pattern match; Search tree; DE; Correction; Data structure "This paper presents an on-line algorithm for the construction of position trees, i.e., an algorithm which constructs the position tree for a given string while reading the string from left to right. In addition, an on-line correction algorithm is presented which - upon a change in the string - can be used to construct the new position tree." SIAM J Comput 1980 9 4 785-807 0071 Manber,U. Suffix Arrays: A New M.. Proceedings o.. 
90Society for Ind Manber U; Myers EW Suffix Arrays: A New Method for On-line String Searches Proceedings of the First Annual ACM-SIAM Symposium on Discrete Algorithms String match; Search tree; USA; Data structure; String search; Suffix "A new and conceptually simple data structure, called a suffix array, for on-line string searches is introduced in this paper. ... Suffix arrays permit on-line string searches of the type, 'Is W a substring of A?' to be answered in time O(P+logN), where P is the length of W and N is the length of A, which is competitive with (and in some cases slightly better than) suffix trees." Society for Industrial and Applied Mathematics Philadelphia, PA 1990 319-327 0072 Morrison,D.R. PATRICIA - Practical A.. J.Assoc.Comput. 68 15(4):514-534 Morrison DR PATRICIA - Practical Algorithm to Retrieve Information Coded in Alphanumeric String match; Search tree; USA; Algorithm "PATRICIA is an algorithm which provides a flexible means of storing, indexing, and retrieving information in a large file .... It retrieves information in response to keys furnished by the user with a quantity of computation which has a bound which depends linearly on the length of keys and the number of their proper occurrences and is otherwise independent of the size of the library." Section 6 is on how PATRICIA detects the presence of a phrase and finds its proper occurrences. J Assoc Comput Mach 1968 15 4 514-534 0073 Myers,G. A Four Russians Algori.. J.Assoc.Comput. 92 39(4):430-448 Myers G A Four Russians Algorithm for Regular Expression Pattern Matching Match complex patterns; Language; USA; Pattern match; Expression; Automata; Algorithm "We present an O(PN/log N) worst-case time and space algorithm for determining if a word A of length N is in the language denoted by a regular expression R of length P." J Assoc Comput Mach 1992 39 4 430-448 0074 Owolabi,O. Efficient Pattern Sear.. Inform.Process. 
93 47:17-21 Owolabi O Efficient Pattern Searching over Large Dictionaries Database search; N-gram; Boyer-Moore; NI; Pattern match; Pattern search "A method is described which is suitable for on-line query term expansion. By using an efficient version of the N-gram method for similarity matching, a small set of strings from the dictionary is selected. From this set, all the strings relevant to the query term are then identified using the Boyer-Moore pattern matching algorithm." Inform Process Lett 47 47 17-21 0075 Pearson,W.R. Identifying Distantly .. Curr.Opin.Struc 91 1:321-326 Pearson WR Identifying Distantly Related Protein Sequences Database search; Review; Consensus sequence; Significance; USA; Sequence comparison; Statistical; Region; Protein "New methods for identifying distantly related proteins can be used to confirm sequence homology when only weak sequence similarity remains. These methods improve the selectivity of sequence comparison either by calculating the statistical significance of the most similar region, or by using consensus patterns rather than simple pairwise similarity scores." Curr Opin Struct Biol 1 1 321-326 0076 Perlwitz,M.D. Pattern Analysis of th.. Adv.Appl.Math. 88 9:7-21 Perlwitz MD; Burks C; Waterman MS Pattern Analysis of the Genetic Code Genetic; USA; Codon; Mapping "The genetic code is examined in a new and systematic fashion: we consider the code as a mapping of one finite set (the 64 codons) to another (the 20 amino acids). Given a class of mappings simpler than the actual code, we ask which mappings best approximate it." Adv Appl Math 9 9 7-21 0077 Rice,C.M. The EMBL Data Library Nucleic Acids R 93 21(13):2967-29 Rice CM; Fuchs R; Higgins DG; Stoehr PJ; Cameron GN The EMBL Data Library Database search; Sequence database; DE; EMBL "The principal role of the EMBL Data Library, since its inception in 1980, has been to maintain and distribute a database of nucleotide sequences (the EMBL Nucleotide Sequence Database). 
It also supports and maintains the protein sequence database SWISS-PROT and distributes other databases of interest to molecular biologists." Nucleic Acids Res 1993 21 13 2967-2971 0078 Sackin,M.J. Amino Acid Sequences i.. Biochem.J. 65 96:70P-71P Sackin MJ; Sneath PHA Amino Acid Sequences in Proteins: A Computer Study Pairwise comparison; UK; Amino acid; Protein "An ALGOL program for the Elliott 803 computer has been developed for comparing the amino acid sequences in two protein chains. It can detect similarities, deletions, insertions and inversions that would be hard to detect by eye. The method is to 'slide' the chains past each other one step at a time and to count the number of amino acids that match." Biochem J 96 96 70P-71P 0079 Slisenko,A.O. Determination in Real .. Soviet Math.Dok 80 21(2):392-395 Slisenko AO Determination in Real Time of all the Periodicities in a Word Regularities; Complexity; RU; Word Describes "the general properties of a construction upon which the proof of the following assertion is based: There exists an addressable machine that determines in real time all the periodicities in an input word. ... One can extract from the basic properties of the algorithm for finding the periodicities of a word in real time a complete solution to the problem of the complexity of a number of well-known problems concerning the determination of the subwords of an input word." Soviet Math Dokl 1980 21 2 392-395 0080 Steele,J.M. Long Common Subsequenc.. SIAM J.Appl.Mat 82 42(4):731-737 Steele JM Long Common Subsequences and Probability of Two Random Strings Pairwise comparison; Significance; USA; Probabilistic; Longest common; Subsequence; Probability "Let (x1,...xn) and (y1,...yn) be two strings from an alphabet A and let Ln denote their longest common subsequence. The probabilistic behavior of Ln is studied under various probability models for the x's and y's." SIAM J Appl Math 1982 42 4 731-737 0081 Unger,R. DNAMAT: An Efficient G.. 
Comput.Appl.Bio 86 2(4):283-289 Unger R; Harel D; Sussman JL DNAMAT: An Efficient Graphic Matrix Sequence Homology Algorithm and its Application to Structural Analysis Pairwise comparison; Multiple comparison; Dot; IL; Display; Homology; Algorithm; Graphic; Matrix "We present a fast algorithm to produce a graphic matrix representation of sequence homology. ... In addition we suggest a way to extend our approach to analyse a series of related DNA or RNA sequences, in order to determine certain common structural features. The analysis is done by 'summing' a set of dot-matrices to produce an overall matrix that displays structural elements common to most of the sequences." Comput Appl Biosci 1986 2 4 283-289 0082 Vishkin,U. Deterministic Sampling.. ACM Sympos.Theo 90 22:170-180 Vishkin U Deterministic Sampling - A New Technique for Fast Pattern Matching Parallel; USA; Pattern match; String match; Sampling "Consider the string matching problem. Given the pattern, we select carefully a sample of its positions .... Then, we search for the sample. For non-periodic patterns, the sample ... provides sparse verification. This approach enables to perform the text analysis ... in O(log* n) time and optimal speed-up on a PRAM. ... It also leads to a new linear time serial algorithm for string matching." ACM Sympos Theory Comput 22 22 170-180 0083 Weiner,P. Linear Pattern Matchin.. IEEE Sympos.Swi 73 14:1-11 Weiner P Linear Pattern Matching Algorithms String match; Dictionary match; Search tree; USA; Pattern match; Data structure; Algorithm 15-17 October 1973. "We introduce an interesting data structure called a bi-tree. A linear time algorithm for obtaining a compacted version of a bi-tree associated with a given string is presented. With this construction as the basic tool, we indicate how to solve several pattern matching problems ... in linear time." IEEE Sympos Switching Automata Theory 14 14 1-11 0084 Amir,A. Adaptive Dictionary Ma.. 
IEEE Sympos.Fou 91 32:760-766 Amir A; Farach M Adaptive Dictionary Matching Dictionary match; Suffix; USA "We present new semi-adaptive and fully-adaptive dictionary matching algorithms. In the fully adaptive algorithm, the dictionary is processed in time O( |D| log |D| ). Inserting a new pattern P into the dictionary can be done in time O( |P| log |D| ). A dictionary pattern can be deleted in time O( log |D| ). Text scanning is accomplished in time O( |T| log |D| ). We also present a parallel version of the algorithm with optimal speedup for the dictionary construction and pattern addition phase and a logarithmic overhead in the text scan phase. Our method incorporates a new way of using suffix trees ...." IEEE Sympos Found Comput Sci 32 32 760-766 0085 Commentz-Walt A String Matching Algo.. Lecture Notes i 79 71:118-132 Commentz-Walter B A String Matching Algorithm Fast on the Average. Extended Abstract Dictionary match; DE; String match; Algorithm Proceedings, 6th ICALP, International Colloquium on Automata, Languages and Programming. Graz, Austria, July 1979. "A user of the database specifies one or several words or phrases, so called keywords, describing the information sought. The answer will be the documents which contain all or some of the user specified keywords. It takes too much time to scan each document of the database for every user separately. Therefore, we introduce a sort of secondary index ... containing keyword fragments. Searching the index with the user specified keywords yields a superset of the documents required. ... Therefore, we scan the documents of the superset for the user specified keywords." Lecture Notes in Comput Sci 71 71 118-132 0086 Barton,G.J. LOPAL and SCAMP: Techn.. 
J.Mol.Graphics 88 6(Dec.):190-19 Barton GJ; Sternberg MJE LOPAL and SCAMP: Techniques for the Comparison and Display of Protein Sequences Structure; Program; UK; Display; Dynamic programming; Least squares; Protein "This paper describes two computer programs designed to assist in the comparison of protein structures. LOPAL (LOoP ALignment) applies a dynamic programming algorithm to the comparison of regions of protein three dimensional (3D) structure and gives a similarity score and suggested sequence alignment with that score. SCAMP (Structure Comparison and Alignment of Multiple Proteins) is an interactive graphics program ... that allows the simultaneous display, manipulation and pairwise least-squares fitting of up to nine independent structures." J Mol Graphics 1988 6 Dec. 190-196 0087 Taylor,W.R. Protein Structure Alig.. J.Mol.Biol. 89 208:1-22 Taylor WR; Orengo CA Protein Structure Alignment Structure; UK; Dynamic programming; Protein "A new method of comparing protein structures is described, based on distance plot analysis. ... When presented with the co-ordinate sets of two structures, the method will produce automatically an alignment of their sequences based on structural criteria. The method uses the dynamic programming optimization technique, which is widely used in the comparison of protein sequences and thus unifies the techniques of protein structure and sequence comparison." J Mol Biol 208 208 1-22 0088 Rodeh,M. Linear Algorithm for D.. J.Assoc.Comput. 81 28(1):16-24 Rodeh M; Pratt VR; Even S Linear Algorithm for Data Compression via String Matching Search tree; Compression; IL; String match; Algorithm "A linear implementation of the optimal universal data compression methods of Lempel and Ziv is described. The main tool is McCreight's algorithm for constructing suffix trees. Both bounded and unbounded memory are considered." J Assoc Comput Mach 1981 28 1 16-24 0089 Baeza-Yates,R Searching Subsequences Theoret.Comput. 
91 78:363-376 Baeza-Yates RA Searching Subsequences Subsequence; Automata; Longest common; CL "We define the directed acyclic subsequence graph of a text as the smallest deterministic partial finite automaton that recognizes all possible subsequences of that text. ... We show that it is possible to build this automaton using O(n log n) time and O(n) space for a text of size n. With this structure, we can search a subsequence in logarithmic time. We extend this construction to the case of multiple strings .... For the latter case, we discuss its application to the longest common subsequence problem. ... Our algorithm improves upon previous solutions for more than two strings." Theoret Comput Sci 78 78 363-376 0090 Irving,R.W. Two Algorithms for the.. Lecture Notes i 92 644:214-229 Irving RW; Fraser CB Two Algorithms for the Longest Common Subsequence of Three (or More) Strings Longest common; Subsequence; UK; Algorithm Third Annual Symposium, CPM 92. Tucson, Arizona, April 29 - May 1, 1992. Proceedings. "Various algorithms have been proposed, over the years, for the longest common subsequence problem on 2 strings (2-LCS), many of these improving, at least for some cases, on the classical dynamic programming approach. However, relatively little attention has been paid in the literature to the k-LCS problem for k > 2 .... In this paper, we describe and analyse two algorithms with particular reference to the 3-LCS problem, though each algorithm can be extended to solve the k-LCS problem for general k." Lecture Notes in Comput Sci 644 644 214-229 0091 Henikoff,S. Playing with Blocks: S.. New Biol. 91 3(12):1148-115 Henikoff S Playing with Blocks: Some Pitfalls of Forcing Multiple Alignments Multiple alignment; Review; USA; Region "Block alignments of multiple amino acid sequences are useful representations of regions thought to share common ancestry and function. 
Often the block alignments are motivated by the expectation that a protein of interest is similar in function to members of a family of proteins. However, when alignments are forced by using ad hoc methods, it is often difficult to decide whether the proposed relationship is valid. Visual examination can be deceptive, especially when alignments are not carried out in the context of controls subjected to similar procedures. ... When standard methods fail to find an interesting block alignment unaided by human intervention, then the result should be regarded with caution." New Biol 1991 3 12 1148-1154 0092 Breslauer,D. Efficient Comparison B.. J.Complexity 93 9(3):339-365 Breslauer D; Galil Z Efficient Comparison Based String Matching String match; Pattern match; NL "We study the exact number of symbol comparisons that are required to solve the string matching problem and present a family of efficient algorithms. Unlike previous string matching algorithms, the algorithms in this family do not 'forget' results of comparisons, what makes their analysis much simpler. In particular, we give a linear-time algorithm that finds all occurrences of a pattern of length m in a text of length n .... The pattern preprocessing takes linear time and makes at most 2m comparisons. This algorithm establishes that, in general, searching for a long pattern is easier than searching for a short one." J Complexity 1993 9 3 339-365 0093 Chao,K.M. Constrained Sequence A.. Bull.Math.Biol. 93 55(3):503-524 Chao KM; Hardison RC; Miller W Constrained Sequence Alignment Pairwise alignment; Dynamic programming; USA; Sequence alignment; Locally optimal; Gap "This paper presents a dynamic programming algorithm for aligning two sequences when the alignment is constrained to lie between two arbitrary boundary lines in the dynamic programming matrix. 
For affine gap penalties, the algorithm requires only O(F) computation time and O(M+N) space, where F is the area of the feasible region and M and N are the sequence lengths. The result extends to concave gap penalties, with somewhat increased time and space bounds." Bull Math Biol 1993 55 3 503-524 0094 Friemann,A. A New Approach for Dis.. Comput.Appl.Bio 92 8(3):261-265 Friemann A; Schmitz S A New Approach for Displaying Identities and Differences among Aligned Amino Acid Sequences Display; Consensus index; Sequence proximity; DE; Amino acid "An algorithm is presented for computing degrees of sequence conservation found among aligned amino acid sequences. Sequence identities are calculated for each position of an alignment and average identity values of neighboring positions are figured. The average identity value of the whole alignment is chosen as a limit to discriminate between well and less conserved sequence sections. A second algorithm is given to calculate the degree of divergence of individual sequences compared to the other sequences of the alignment." Comput Appl Biosci 1992 8 3 261-265 0095 Hardison,R. Use of Long Sequence A.. Mol.Biol.Evol. 93 10(1):73-102 Hardison R; Miller W Use of Long Sequence Alignments to Study the Evolution and Regulation of Mammalian Globin Gene Clusters Multiple alignment; USA; Segment; Genome; Sequence alignment; Evolution; Gene "The determination of long segments of DNA sequences encompassing the β- and α-globin gene clusters has provided an unprecedented data base for analysis of genome evolution and regulation of gene clusters. A newly developed computer tool kit generates local alignments between such long sequences in a space-efficient manner, helps the user analyze the alignments effectively, and finds consistently aligning blocks of sequences in multiple pairwise comparisons." Mol Biol Evol 1993 10 1 73-102 0096 Barker,W.C. Protein Sequence Datab.. 
Methods Enzymol 90 183:31-49 Barker WC; George DG; Hunt LT Protein Sequence Database Sequence database; USA; Protein "The Protein Sequence Database has been maintained by researchers at the National Biomedical Research Foundation (NBRF) since the early 1960s. ... Currently the NBRF effort is supported as part of the Protein Identification Resource (PIR) project funded by the NIH Division of Research Resources, the National Library of Medicine, and the National Institute of General Medical Sciences. The main purpose of this resource is to aid the research community in the identification and interpretation of protein sequence information." Methods Enzymol 183 183 31-49 0097 Kolaskar,A.S. Sequence Alignment App.. J.Mol.Biol. 92 223:1053-1061 Kolaskar AS; Kulkarni-Kale U Sequence Alignment Approach to Pick Up Conformationally Similar Protein Fragments Scoring; Substitution; Pairwise alignment; India; Sequence alignment; Fragment; Protein "A weight matrix, called Conformational Similarity Weight (CSW) matrix, was prepared using the conformational similarity index. This weight matrix was used to align sequences of 21 pairs of proteins whose crystal structures are known. ... Such an approach allows us to pick up conformationally similar protein fragments with more than 67% accuracy." J Mol Biol 223 223 1053-1061 0098 Clark,S.P. MALIGNED: A Multiple S.. Comput.Appl.Bio 92 8(6):535-538 Clark SP MALIGNED: A Multiple Sequence Alignment Editor Multiple alignment; Program; CA; Sequence alignment; Editor "A multiple sequence alignment editor is described which runs on a VAX/VMS system and can exchange data with a number of other programs, including those of the Genetics Computer Group (GCG). Up to 199 sequences can be aligned. The quality of the alignment can be easily judged during its development because the display attributes to each character are determined by the way it matches the other sequences." Comput Appl Biosci 1992 8 6 535-538 0099 Faulkner,D.V. 
Multiple Aligned Seque.. Trends Biochem. 88 13:321-322 Faulkner DV; Jurka J Multiple Aligned Sequence Editor (MASE) Multiple alignment; Program; Editor; Sequence analysis; USA "Cognitive capacities of the human brain can not, so far, be matched by computers. Even well optimized computer programs have limited flexibility in addressing the variety of problems associated with sequence analysis. Hence, we were motivated to design a Multiple Aligned Sequence Editor (MASE) which combines manual sequence manipulations with standard computer analysis." Trends Biochem Sci 13 13 321-322 0100 Stockwell,P.A HOMED: A Homologous Se.. Comput.Appl.Bio 87 3(1):37-43 Stockwell PA; Petersen GB HOMED: A Homologous Sequence Editor Program; NZ; Display; Consensus sequence; Parallel; Editor "The alignment of homologous sequences with each other and their display has proved a difficult task, despite a frequent requirement for this process. HOMED enables related sequences to be edited and listed in parallel with each other. ... HOMED provides functions for listing the sequences in a variety of formats and for generating a consensus sequence as well as providing a series of tools for maintenance of the sequence database." Comput Appl Biosci 1987 3 1 37-43 0101 Thirup,S. ALMA, An Editor for La.. Proteins Struct 90 7:291-295 Thirup S; Larsen NE ALMA, An Editor for Large Sequence Alignments Multiple alignment; Management; Program; DK; Sequence alignment; Display; Editor "A dedicated sequence editor, ALMA, was developed for aligning many sequences of proteins or RNA molecules or longer DNA fragments. Like previously published editors, ALMA is menu directed, screen oriented, and offers similarity and consensus display. ALMA has the additional features of collective movement of sequences, acceptance of input from many sources including structure files and databases, secondary structure display, and easy merging of alignments. ... 
The program allows interaction between manual and automatic alignment." Proteins Struct Funct Genet 7 7 291-295 0102 Knox,E.B. Chloroplast Genome Rea.. Mol.Biol.Evol. 93 10(2):414-430 Knox EB; Downie SR; Palmer JD Chloroplast Genome Rearrangements and the Evolution of Giant Lobelias from Herbaceous Ancestors Genome; Rearrangement; Deletion; Inversion; USA; Evolution; Chloroplast; Ancestor "Phylogenetic relationships among 16 species of Lobelia and single representatives of Monopsis and Sclerotheca (Lobeliaceae) were assessed by mapping restriction sites and major structural rearrangements (deletions and inversions) in the large single-copy region of the chloroplast genome. Eleven inversions and five different gene arrangements were found. A deletion involving ORF512 is associated with many of the inversions, and all inversion endpoints are located in intergenic spacer regions." Mol Biol Evol 1993 10 2 414-430 0103 Henikoff,S. Amino Acid Substitutio.. Proc.Nat.Acad.S 92 89:10915-10919 Henikoff S; Henikoff JG Amino Acid Substitution Matrices from Protein Blocks Sequence proximity; Substitution; USA; Scoring; Amino acid; Protein "Methods for alignment of protein sequences typically measure similarity by using a substitution matrix with scores for all possible exchanges of one amino acid with another. The most widely used matrices are based on the Dayhoff model of evolutionary rates. Using a different approach, we have derived substitution matrices from about 2000 blocks of aligned sequence segments characterizing more than 500 groups of related proteins. This led to marked improvements in alignments and in searches using queries from each of the groups." Proc Nat Acad Sci USA 89 89 10915-10919 0104 Fuchs,R. Molecular Biological D.. 
Trends Biotechn 92 10(1):61-66 Fuchs R; Rice P; Cameron GN Molecular Biological Databases - Present and Future Sequence database; DE; Genome; Mapping "The importance of databases as a research tool in molecular biology is growing steadily, and a wide range of databases relevant to genome research is currently available. However, the design of current databases is inadequate for accurate representation and analysis of the results of large-scale genome mapping and sequencing projects. A new generation of databases is required to master the challenges of the future." Challenges concerning data acquisition, data distribution, data interpretation, flexibility of data representation and database integration, database design. Trends Biotechnol 1992 10 1 61-66 0105 Orcutt,B.C. Protein and Nucleic Ac.. Annu.Rev.Biophy 83 12:419-441 Orcutt BC; George DG; Dayhoff MO Protein and Nucleic Acid Sequence Database Systems Database search; Sequence database; USA; Protein; Nucleic acid "Several groups currently collect data and maintain large-scale computerized nucleic acid sequence databases. These include the National Biomedical Research Foundation (NBRF), the Los Alamos National Laboratory, the European Molecular Biology Laboratory, and the Molecular Evolution group at Lyon. ... Only the NBRF group maintains a comprehensive protein data collection that is available on-line. In this review we primarily describe the NBRF system, which at the present time contains the largest and most comprehensive data collections and the most integrated on-line distribution system." Annu Rev Biophys Bioeng 12 12 419-441 0106 Gutell,R.R. Identifying Constraint.. 
Nucleic Acids R 92 20(21):5785-57 Gutell RR; Power A; Hertz GZ; Putz EJ; Stormo GD Identifying Constraints on the Higher-Order Structure of RNA: Continued Development and Application of Comparative Sequence Analysis Methods Multiple alignment; Structure; USA; Sequence analysis; RNA "Comparative sequence analysis addresses the problem of RNA folding and RNA structural diversity, and is responsible for determining the folding of many RNA molecules. ... Comparative structure analysis requires an alignment of those sequences that make up the collection. The better the alignment, the more meaningful the information that can be discerned. Initially sequences are aligned for maximum primary structure homology. As secondary structure elements are identified and phylogenetically proven, these features, in addition to primary structure conservation, serve to constrain the juxtaposition of sequences." Nucleic Acids Res 1992 20 21 5785-5795 0107 Marck,C. 'DNA Strider': A 'C' P.. Nucleic Acids R 88 16(5):1829-183 Marck C 'DNA Strider': A 'C' Program for the Fast Analysis of DNA and Protein Sequences on the Apple Macintosh Family of Computers Sequence analysis; FR; Program; Editor; Restriction; Dictionary match; Protein; DNA The program "has been designed as an easy to learn and use program as well as a fast and efficient tool for the day-to-day sequence analysis work. The program consists of a multi-window sequence editor and of various DNA and Protein analysis functions. ... The restriction sites search uses a newly designed fast hexamer look-ahead algorithm. Typical runtime for the search of all sites with a library of 130 restriction endonucleases is 1 second per 10000 bases." Nucleic Acids Res 1988 16 5 1829-1836 0108 Blum,A. Linear Approximation o.. J.Assoc.Comput. 94 41(4):630-647 Blum A; Jiang T; Li M; Tromp J; Yannakakis M Linear Approximation of Shortest Superstrings Supersequence; Shortest common; Approximation; USA Also Proc. 23rd ACM Symp. 
on Theory of Computing, 1991, 328-336. "We consider the following problem: given a collection of strings s1, ..., sm, find the shortest string s such that each si appears as a substring (a consecutive block) of s. Although this problem is known to be NP-hard, a simple greedy procedure appears to do quite well and is routinely used in DNA sequencing .... We show that the greedy algorithm does in fact achieve a constant factor approximation, proving an upper bound of 4n. Furthermore, we present a simple modified version of the greedy algorithm that we show produces a superstring of length at most 3n." J Assoc Comput Mach 1994 41 4 630-647 0109 Gallant,J. On Finding Minimal Len.. J.Comput.System 80 20:50-58 Gallant J; Maier D; Storer JA On Finding Minimal Length Superstrings Supersequence; Complexity; USA "The superstring problem is: Given a set S of strings and a positive integer K, does S have a superstring of length K? ... We consider the complexity of the superstring problem. NP-completeness results dealing with sets of strings over both finite and infinite alphabets are presented. Also, for a restricted version of the superstring problem, a linear time algorithm is given." J Comput Systems Sci 20 20 50-58 0110 Allison,L. Restriction Site Mappi.. Comput.Appl.Bio 88 4(1):97-101 Allison L; Yee CN Restriction Site Mapping is in Separation Theory Restriction; Mapping; AU "A computer algorithm for restriction-site mapping consists of a generator of partial maps and a consistency checker. This paper examines consistency checking and argues that a method based on separation theory extracts the maximum amount of information from fragment lengths in digest data. It results in the minimum number of false maps being generated." Comput Appl Biosci 1988 4 1 97-101 0111 Jiang,T. A Note on Shortest Sup.. Inform.Process. 
92 44(4):195-199 Jiang T; Li M; Du DZ A Note on Shortest Superstrings with Flipping Supersequence; CA; Approximation "This paper considers an interesting variation of the [shortest common superstring] problem: For a given set of strings S = {s1, ... , sm}, find a shortest superstring that contains either si or si^R for each i. The problem may have applications in DNA sequencing practice when orientations of the fragments in the target DNA molecule are unknown. We give a simple greedy algorithm and prove a 4n approximation bound for it." Inform Process Lett 1992 44 4 195-199 0112 Karp,R.M. Mapping the Genome: So.. ACM Sympos.Theo 93 25:278-285 Karp RM Mapping the Genome: Some Combinatorial Problems Arising in Molecular Biology Genome; Mapping; Combinatorial; Clone; USA "In order to construct a physical map of a large DNA molecule it is necessary to extract from it a large number of fragments called clones, obtain a 'fingerprint' of each clone, and then mathematically reassemble the DNA molecule by determining how the clones overlap. This reassembly process leads to a number of challenging algorithmic, combinatorial and probabilistic problems that are currently handled in a primitive way, and should be grist for the mills of theoretical computer scientists." ACM Sympos Theory Comput 25 25 278-285 0113 Middendorf,M. More on the Complexity.. Theoret.Comput. 94 125:205-228 Middendorf M More on the Complexity of Common Superstring and Supersequence Problems Supersequence; DE; Complexity The author obtains NP-completeness results concerning decision versions of the problems to find the Shortest Common Superstring, the Shortest Common Supersequence, and cyclic and permutation variants of them. Theoret Comput Sci 125 125 205-228 0114 Tarhio,J. A Greedy Approximation.. Theoret.Comput. 
88 57:131-145 Tarhio J; Ukkonen E A Greedy Approximation Algorithm for Constructing Shortest Common Superstrings Multiple comparison; Supersequence; Knuth-Morris-Pratt; FI; Approximation; Compression; Shortest common; Algorithm "An approximation algorithm for the shortest common superstring problem is developed, based on the Knuth-Morris-Pratt string-matching procedure and on the greedy heuristics for finding longest Hamiltonian paths in weighted graphs. Given a set R of strings, the algorithm constructs a common superstring for R in O(mn) steps where m is the number of strings in R and n is the total length of these strings. The performance of the algorithm is analysed in terms of the compression in the common superstrings constructed ...." Theoret Comput Sci 57 57 131-145 0115 Teng,S.H. Approximating Shortest.. IEEE Sympos.Fou 93 34:158-165 Teng SH; Yao F Approximating Shortest Superstrings Supersequence; Shortest common; Approximation; USA "The Shortest Superstring Problem is to find a shortest possible string that contains every string in a given set as substrings. This problem has applications to data compression and DNA sequencing. As the problem is NP-hard and MAX SNP-hard, approximation algorithms are of interest. We present a new algorithm which always finds a superstring that is at most 2.89 times as long as the shortest superstring. Our result improves the 3-approximation result of Blum, Jiang, Li, Tromp, and Yannakakis [1991]." IEEE Sympos Found Comput Sci 34 34 158-165 0116 Turner,J.S. Approximation Algorith.. Inform.Comput. 89 83(1):1-20 Turner JS Approximation Algorithms for the Shortest Common Superstring Problem Supersequence; Search tree; Approximation; USA; Shortest common; Algorithm "The object of the shortest common superstring problem (SCS) is to find the shortest possible string that contains every string in a given set as substrings. As the problem is NP-complete, approximation algorithms are of interest. ... 
We describe several approximation algorithms that produce solutions that are always within a factor of two of optimum with respect to the overlap measure. We also describe an efficient implementation of one of these, using McCreight's compact suffix tree construction algorithm." Inform Comput 1989 83 1 1-20 0117 Bork,P. Recognition of Functio.. FEBS Lett. 89 257(1):191-195 Bork P Recognition of Functional Regions in Primary Structures using a Set of Property Patterns Database search; Pattern library; DE; Region; Motif; Pattern definition; Structure; Recognition "32 consensus patterns for a set of functional regions and structural motifs in protein sequences were constructed. The pattern definition is heuristic and based on 11 selected steric and physicochemical properties. By comparison with these patterns, it was possible to identify, without false detection, 1532 sites in 8702 protein sequences of SWISSPROT. Screening against such a pattern library offers a considerable chance to identify functional regions or structural motifs in proteins from which only the sequence is known." FEBS Lett 1989 257 1 191-195 0118 Johnson,M.S. Comparisons of Protein.. Curr.Opin.Struc 91 1:334-344 Johnson MS Comparisons of Protein Structures Structure; Review; UK; Protein "The structures of proteins related by evolution are remarkably alike even when the observed sequence similarities are statistically marginal or seemingly non-existent. Similar protein substructures are found in proteins for which there is no evidence of common ancestry and no similarity in their global topology. Recent advances in the comparison of whole proteins, together with the comparison and analysis of their parts, have paved the way for the use of structural information in prediction and modelling, protein engineering, structure and sequence alignments, and investigations of protein evolution ...." Curr Opin Struct Biol 1 1 334-344 0119 Sali,A. From Comparison of Pro.. Trends Biochem. 
90 15:235-240 Sali A; Overington JP; Johnson MS; Blundell TL From Comparison of Protein Sequences and Structures to Protein Modelling and Design Sequence comparison; Structure; UK; Protein "A useful approach to modelling proteins exploits knowledge of three- dimensional structures determined by X-ray crystallography together with rules defined by their analysis and comparison. ... In this review we shall consider our own approach to protein modelling which can be completely automated, and in which all decisions are rule based." Trends Biochem Sci 15 15 235-240 0120 Gautheret,D. Pattern Searching/Alig.. Comput.Appl.Bio 90 6(4):325-331 Gautheret D; Major F; Cedergren R Pattern Searching/Alignment with RNA Primary and Secondary Structures: An Effective Descriptor for tRNA Pattern search; Sequence alignment; Motif; CA; Pattern match; Structure; RNA; Secondary "A convenient pattern-matching program using primary and higher-order structural features has been developed based on a 'backtracking' algorithm. A second implementation of the algorithm uses descriptors of structural features (including primary sequences) to align a list of homologous or highly similar sequences. An application of the pattern matcher to the search for tRNA and group I intron structural motifs in sequence data banks is presented." Comput Appl Biosci 1990 6 4 325-331 0121 Gartmann,C.J. SQUIRREL: Sequence QUe.. Nucleic Acids R 91 19(21):6033-60 Gartmann CJ; Grob U SQUIRREL: Sequence QUery, Information Retrieval and REporting Library. A Program Package for Analyzing Signals in Nucleic Acid Sequences for the VAX Multiple alignment; Segment; DE; Signal; Program; Nucleic acid; Retrieval; Query "A computer tool is described for comparison, analysis and search of genetic signals. The method is based on sequence consensus matrices. It assumes that a genetic signal (such as a promoter, enhancer or whatever) is composed of several signal blocks separated from each other by variable distances. 
A set of programs is presented to perform the analysis. ... The method is able to align large sets of sequences within a few minutes and to check the quality of the alignment." Nucleic Acids Res 1991 19 21 6033-6040 0122 Apostolico,A. Optimal Off-line Detec.. Theoret.Comput. 83 22:297-315 Apostolico A; Preparata FP Optimal Off-line Detection of Repetitions in a String Regularities; Search tree; USA; Data structure; Optimal; Repetition; Detection "An algorithm is presented to detect - within optimal time O(n log n) and space O(n), off-line on a RAM - all of the distinct repetitions in a given textstring on a finite alphabet. The proposed strategy is self-contained, as it depends more heavily on algorithmic design considerations than on the combinatorial properties of the output. It is based on a new data structure, the leaf-tree, which is particularly suited to exploit simple properties of the suffix tree associated with the string to be analyzed." Theoret Comput Sci 22 22 297-315 0123 Apostolico,A. Structural Properties .. J.Comput.System 85 31:394-411 Apostolico A; Preparata FP Structural Properties of the String Statistics Problem String match; Search tree; USA; Regularities "A suitably weighted index tree ... can be easily adapted to store, for a given string x and for all substrings w of x, the number of distinct instances of w along x. ... If the substring w has nontrivial periods, however, the number of distinct instances might differ from that of distinct non-overlapping occurrences along x. It is shown here that O(n log n) storage units - n standing for the length of x - are sufficient to organize this second kind of statistics, in such a way that the maximum number of nonoverlapping instances for arbitrary w along x can be retrieved in a number of character comparisons not exceeding the length of w." J Comput Systems Sci 31 31 394-411 0124 Blumer,A. The Smallest Automaton.. Theoret.Comput. 
85 40:31-55 Blumer A; Blumer J; Haussler D; Ehrenfeucht A; Chen MT; Seiferas J The Smallest Automaton Recognizing the Subwords of a Text Regularities; Automata; USA "Let a partial deterministic finite automaton be a DFA in which each state need not have a transition edge for each letter of the alphabet. We demonstrate that the smallest partial DFA for the set of all subwords of a given word w, |w| > 2, has at most 2 |w| - 2 states and 3 |w| - 4 transition edges, independently of the alphabet size. We give an algorithm to build this smallest partial DFA from the input w on-line in linear time." Theoret Comput Sci 40 40 31-55 0125 Hancart,C. On Simon's String Sear.. Inform.Process. 93 47(2):95-99 Hancart C On Simon's String Searching Algorithm String match; FR; Complexity; String search; Algorithm "Simon has recently designed [a string matching] algorithm which can be regarded as a compromise between the implementation of [a deterministic finite automaton] and [the algorithm of Knuth, Morris and Pratt]. ... In this paper, we extend Simon's work by studying the complexity of [variants of Simon's algorithm]." Inform Process Lett 1993 47 2 95-99 0126 Claverie,J.M. Heuristic Informationa.. Nucleic Acids R 86 14(1):179-196 Claverie JM; Bougueleret L Heuristic Informational Analysis of Sequences Sequence analysis; Pattern discovery; Information content; FR; Statistical; Profile; N-gram; Heuristic "Nucleotide or amino-acid sequences are interpreted as successions of words of length k (k-tuples) the frequencies of which are highly variable in different statistical populations of genes or proteins. After building k-tuple reference tables from coherent subsets or entire data banks, the local information content profile of individual sequences is drawn. Anomalous regions (peaks or depressions) of such a profile can lead to the discovery and identification of specific sequence patterns." Nucleic Acids Res 1986 14 1 179-196 0127 Prestridge,D. SIGNAL SCAN: A Compute.. 
Comput.Appl.Bio 91 7(2):203-206 Prestridge DS SIGNAL SCAN: A Computer Program that Scans DNA Sequences for Eukaryotic Transcriptional Elements Dictionary match; USA; Consensus sequence; Signal; Program; DNA "SIGNAL SCAN uses both specific sequence elements derived from biochemical characterization and elements from derived consensus sequences to match against a user input DNA sequence. ... The matching algorithm is a simple string match." Comput Appl Biosci 1991 7 2 203-206 0128 Heringa,J. A Method to Recognize .. Proteins Struct 93 17:391-411 Heringa J; Argos P A Method to Recognize Distant Repeats in Protein Sequences Regularities; Multiple alignment; DE; Display; Sequence alignment; Consensus sequence; Repeat; Protein "An automated algorithm is presented that delineates protein sequence fragments which display similarity. The method incorporates a selection of a number of local nonoverlapping sequence alignments with the highest similarity scores and a graph-theoretical approach to elucidate the consistent start and end points of the fragments comprising one or more ensembles of related subsequences. The procedure allows the simultaneous identification of different types of repeats within one sequence. A multiple alignment of the resulting fragments is performed and a consensus sequence derived from the ensemble(s)." Proteins Struct Funct Genet 17 17 391-411 0129 Vingron,M. Sequence Alignment and.. J.Mol.Biol. 94 235:1-12 Vingron M; Waterman MS Sequence Alignment and Penalty Choice. Review of Concepts, Case Studies and Implications Multiple alignment; Sequence proximity; USA; Sequence alignment; Gap; Review The paper reviews two recent advances in algorithms and probability that enable us to take a new approach to the question of selecting parameters for sequence alignment algorithms. "From this we gain a better understanding of the dependence of alignments on parameters in general. 
We propose novel criteria to detect biologically good alignments and highlight some specific features about the interaction between similarity matrices and gap penalties." J Mol Biol 235 235 1-12 0130 de Almeida,N. A String-Matching Algo.. Inform.Process. 93 47(5):257-259 de Almeida NF Jr; Barbosa VC A String-Matching Algorithm for the CREW PRAM Parallel; BR; String match; Pattern match; Algorithm "We present an algorithm for the CREW PRAM to find all occurrences of a pattern of size m in a text of size n. For a fixed alphabet and m = O(log2 n), the algorithm runs in O(log m) time on O(n / log m) processors. Under these restrictions, it is optimal and improves on the time complexity of previously known string-matching algorithms for the CREW PRAM." Inform Process Lett 1993 47 5 257-259 0131 Giancarlo,R. Fully Dynamic Dictiona.. 92 Giancarlo R; Amir A; Farach M; Galil Z; Park K Fully Dynamic Dictionary Matching BK - Dictionary match; USA; Dynamic Preprint, 19 pp. "We consider the dynamic dictionary matching problem. ... Our algorithms improve the deletion scheme presented in Amir and Farach's recent solution for the dynamic dictionary matching problem." Document No. 11272- 920311-12TM, AT&T Bell Laboratories, 600 Mountain Avenue, Murray Hill, NJ 07974-2070, USA, 19 pages. 1992 0132 Posfai,J. VISA: Visual Sequence .. Comput.Appl.Bio 94 10(5):537-544 Posfai J; Szaraz Z; Roberts RJ VISA: Visual Sequence Analysis for the Comparison of Multiple Amino Acid Sequences Sequence analysis; Multiple comparison; Amino acid; USA "VISA (VIsual Sequence Analysis) is a software package that displays global similarities within a set of related protein sequences. The program identifies amino acid patterns that are common to many members of the set of sequences and displays them as a series of histograms. Individual peaks on the display can be assigned a color and analogous peaks in the other sequences are then automatically marked in the same color. 
This can be repeated for each significant peak and leads to a display in which major matching segments of multiple amino acid sequences appear as dominant peaks of the histograms with matching colors." Comput Appl Biosci 1994 10 5 537-544 0133 Nakai,K. Gnome - an Internet-Ba.. Comput.Appl.Bio 94 10(5):547-550 Nakai K; Tokimori T; Ogiwara A; Uchiyama I; Niiyama T Gnome - an Internet-Based Sequence Analysis Tool Sequence analysis; Electronic mail; Genome; JP "Gnome (GenomeNet Open Mail-service Environment) is a sequence analysis tool that enables an end-user to make use of several Internet- (mainly e-mail) based services with an easy-to-use graphical user interface. Users can conduct homology and motif searches, and database-entry retrieval against the latest databases by emitting search requests to and receiving their results from a search-server by e-mail. The search results are viewed and managed efficiently with this system. The Macintosh and X (Motif) versions of the Gnome client and the UNIX version of the Gnome server are available to academic users free of charge." Comput Appl Biosci 1994 10 5 547-550 0134 Colussi,L. Fastest Pattern Matchi.. J.Algorithms 94 16:163-189 Colussi L Fastest Pattern Matching in Strings String match; Boyer-Moore; Pattern match; Italy "An algorithm is presented that substantially improves the algorithm of Boyer and Moore for pattern matching in strings, both in the worst case and in the average. ... The new algorithm performs 2n character comparisons in the worst case while the Boyer and Moore algorithm requires 3n comparisons; the new algorithm requires fewer comparisons than Boyer and Moore on the average .... As a shortcoming of the new algorithm, the preprocessing of the pattern requires O(m) time on the average but O(m^2) in the worst case." J Algorithms 16 16 163-189 0135 Bailey,T.A. Fast String Searching .. Inform.Process. 
80 11(3):130-133 Bailey TA; Dromey RG Fast String Searching by Finding Subkeys in Subtext String match; Boyer-Moore; String search "Our algorithm dominates the Boyer and Moore algorithm for binary alphabets, but is inferior for large alphabets and short keys. This algorithm is an application of the technique used by Aho and Corasick to a single key. ... The algorithm gains its speed by only looking at every bth character of the text." Inform Process Lett 1980 11 3 130-133 0136 Baker,T.P. A Technique for Extend.. SIAM J.Comput. 78 7(4):533-541 Baker TP A Technique for Extending Rapid Exact-Match String Matching to Arrays of more than One Dimension String match; Knuth-Morris-Pratt; USA; Multidimensional; Pattern match; Pattern recognition "A class of algorithms is presented for very rapid on-line detection of occurrences of a fixed set of pattern arrays as embedded subarrays in an input array. By reducing the array problem to a string matching problem in a natural way, it is shown that efficient string matching algorithms may be applied to arrays. This is illustrated by use of the string-matching algorithm of Knuth, Morris and Pratt." SIAM J Comput 1978 7 4 533-541 0137 Blumer,A. Complete Inverted File.. J.Assoc.Comput. 87 34(3):578-595 Blumer A; Blumer J; Haussler D; McConnell R; Ehrenfeucht A Complete Inverted Files for Efficient Text Retrieval and Analysis String match; Search tree; Automata; USA; Retrieval; Data structure "A data structure that implements a complete inverted file for [a finite set S of texts] that occupies linear space and can be built in linear time, using the uniform-cost RAM model, is given. Using this data structure, the time for each of the above query functions [find(w), freq(w), locations(w)] is optimal. To accomplish this, techniques from the theory of finite automata and the work on suffix trees are used to build a deterministic finite automaton that recognizes the set of all subwords of the set S." 
J Assoc Comput Mach 1987 34 3 578-595 0138 Bookstein,A. On Harrison's Substrin.. Comm.ACM 73 16(3):180-181 Bookstein A On Harrison's Substring Testing Technique Significance; USA; String match; Probabilistic "This note comments on a technique by Malcolm Harrison [1971] that tests whether a given string of characters, S1, is a substring of another string of characters, S2. ... We here note that, based on the assumptions inherent in Harrison's development, it is possible to derive a more exact expression for the probability of a false match." Comm ACM 1973 16 3 180-181 0139 Crochemore,M. An Optimal Algorithm f.. Inform.Process. 81 12(5):244-250 Crochemore M An Optimal Algorithm for Computing the Repetitions in a Word Regularities; Repetition; FR; Optimal; Word; Algorithm "This paper presents an algorithm to compute all the repetitions of primitive factors in a word x [of length n] in time O(n log n). A straightforward adaptation of the Knuth, Morris and Pratt's string-matching algorithm also allows to solve the problem, but in time O(n^2)." Inform Process Lett 1981 12 5 244-250 0140 Crochemore,M. Computing LCF in Linea.. EATCS Bull. 86 30:57-61 Crochemore M Computing LCF in Linear Time String match; Automata; FR "The LCF of two words u and v of A* is the length of a longest factor common to u and v. A linear algorithm to compute LCF is given, based on a linear time algorithm to build the minimal suffix automaton of a word. The algorithm yields a real-time string-matching algorithm." EATCS Bull 30 30 57-61 0141 Fitch,W.M. Unresolved Problems in.. Lect.Math.Life 86 17:1-18 Fitch WM Unresolved Problems in DNA Sequence Analysis Sequence analysis; Review; USA; Sequence comparison; Sequence alignment; Phylogeny; DNA "Problems in the analysis of DNA sequences can be of six classes. [Structure analysis. Sequence comparison. Sequence alignment. Phylogeny estimation. Sequence estimation, given a phylogeny. Analysis of mutation rates.] 
These problems are complicated by biological considerations such as that changes may occur in several ways that are context dependent. A number of unsolved problems in each of these classes are formulated." Lect Math Life Sci 17 17 1-18 0142 Dromey,R.G. A Fast Algorithm for T.. Austral.Comput. 79 11(2):63-67 Dromey RG A Fast Algorithm for Text Comparison Longest common; AU; String match; Edit; Algorithm "Two new algorithms for finding the longest unbroken common subsequence in a pair of text files are presented. The algorithms are simple to implement, economical on space requirements and they are highly efficient for the comparison of pairs of text files for all ranges of overlap both large and small." Austral Comput J 1979 11 2 63-67 0143 Ehrenfeucht,A A New Distance Metric .. Discrete Appl.M 88 20:191-203 Ehrenfeucht A; Haussler D A New Distance Metric on Strings Computable in Linear Time Sequence comparison; Sequence proximity; Correction; USA; Distance "We describe a new metric for sequence comparison that emphasizes global similarity over sequential matching at the local level. It has the advantage over the Levenshtein metric that strings of lengths n and m can be compared in time proportional to n+m instead of nm. Various mathematical properties of the metric are established." Discrete Appl Math 20 20 191-203 0144 Faloutsos,C. Access Methods for Text ACM Comput.Surv 85 17(1):49-74 Faloutsos C Access Methods for Text Retrieval; Review; CA; Compression; String match "This paper compares text retrieval methods intended for office systems. The operational requirements of the office environment are discussed, and retrieval methods from database systems and from information retrieval systems are examined. We classify these methods and examine the most interesting representatives of each class. Attempts to speed up retrieval with special purpose hardware are also presented, and issues such as approximate string matching and compression are discussed." 
ACM Comput Surveys 1985 17 1 49-74 0145 Gereb-Graus,M Three One-Way Heads Ca.. J.Comput.System 94 48(1):1-8 Gereb-Graus M; Li M Three One-Way Heads Cannot Do String Matching Automata; USA; String match "We prove that three-head one-way DFA [deterministic finite automata] cannot perform string matching, that is, no three-head one-way DFA accepts the language L = { x#y : x is a substring of y, where x,y are in {0,1}* }. This answers the k = 3 case of the question whether a k-head one-way DFA can perform string matching, raised by Galil and Seiferas." J Comput Systems Sci 1994 48 1 1-8 0146 Harrison,M.C. Implementation of the .. Comm.ACM 71 14(12):777-779 Harrison MC Implementation of the Substring Test by Hashing String match; USA "A technique is described for implementing the test which determines if one string is a substring of another. When there is low probability that the test will be satisfied, it is shown how the operation can be speeded up considerably if it is preceded by a test on appropriately chosen hash codes of the strings." Comm ACM 1971 14 12 777-779 0147 Kempf,M. Time Optimal Left to R.. Acta Inform. 87 24(4):461-474 Kempf M; Bayer R; Guntzer U Time Optimal Left to Right Construction of Position Trees String match; DE; Search tree; Optimal "We are presenting a new algorithm for the on-line construction of position trees. Reading a given input string from left to right we are generating its position tree with the aid of the general concept of infix trees. An additional chain structure within the trees, called tail node connection, enables us to construct the tree within the best possible time (proportional to the number of nodes). ... The position tree for a given text is a trie index spelling out for every position the shortest substring starting at that position and occurring nowhere else in the text." Acta Inform 1987 24 4 461-474 0148 Main,M.G. An O(n log n) Algorith.. 
J.Algorithms 84 5(3):422-432 Main MG; Lorentz RJ An O(n log n) Algorithm for Finding all Repetitions in a String Regularities; Knuth-Morris-Pratt; USA; Repetition; Algorithm "Any nonempty string of the form xx is called a repetition. ... The algorithm is based on a linear algorithm to find all the new repetitions formed when two strings are concatenated. This linear algorithm is possible because new repetitions of equal length must occur in blocks with consecutive starting positions. The linear algorithm uses a variation of the Knuth-Morris-Pratt algorithm to find all partial occurrences of a pattern within a text string. It is also shown that no algorithm based on comparisons of symbols can improve O(n log n)." J Algorithms 1984 5 3 422-432 0149 Meyer,B. Incremental String Mat.. Inform.Process. 85 21(5):219-227 Meyer B Incremental String Matching Dictionary match; USA; String match; Complexity "The problem studied in this paper is to search a given text for occurrences of certain strings, in the particular case where the set of strings may change as the search proceeds. ... We show how [the algorithm of Aho and Corasick] can be modified to allow incremental diagram construction, so that new keywords may be entered at any time during the search. The incremental algorithm presented essentially retains the time and space complexities of the non-incremental one." Inform Process Lett 1985 21 5 219-227 0150 Moller-Nielse Experiments with a Fas.. Inform.Process. 84 18(3):129-135 Moller-Nielsen P; Staunstrup J Experiments with a Fast String Searching Algorithm String match; Parallel; Boyer-Moore; DK; String search; Algorithm "Consider the problem of finding the first occurrence of a particular pattern in a (long) string of characters. Boyer and Moore (1977) found a fast algorithm for doing this. Here we consider how this algorithm behaves when executed on a multiprocessor. It is shown that a simple implementation performs very well.
This claim is based on experiments performed on the Multi-Maren multiprocessor by the present authors (1982)." Inform Process Lett 1984 18 3 129-135 0151 Franklin,N.C. Conservation of Genome.. J.Mol.Biol. 84 181:75-84 Franklin NC Conservation of Genome Form but not Sequence in the Transcription Antitermination Determinants of Bacteriophages λ, φ21 and P22 Genome; Rearrangement; Sequence comparison; USA "Comparisons are made among DNA sequences upstream from terminators in both leftwards and rightwards early operons of related coliphages λ, φ21 and P22. ... Despite almost total disparity of DNA sequence, the three genomes can be discerned to include the same elements in the same order and spacing ...." J Mol Biol 181 181 75-84 0152 Rivest,R.L. Partial-Match Retrieva.. SIAM J.Comput. 76 5(1):19-50 Rivest RL Partial-Match Retrieval Algorithms Partial match; Match with don't cares; USA; Retrieval; Algorithm "We examine the efficiency of hash-coding and tree-search algorithms for retrieving from a file of k-letter words all words which match a partially-specified input query word (for example, retrieving all six-letter English words of the form S**R*H where '*' is a 'don't care' character)." SIAM J Comput 1976 5 1 19-50 0153 Seiferas,J. Real-Time Recognition .. Math.Systems Th 77 11:111-146 Seiferas J; Galil Z Real-Time Recognition of Substring Repetition and Reversal Regularities; Automata; Language; USA; String match; Repetition; Reversal; Recognition "Real-time multitape Turing machine algorithms are presented for recognizing the languages { wxyxz : |w| = r|x|, |y| = s|x|, |z| = t|x| } and { wxyx^Rz : |w| = r|x|, |y| = s|x|, |z| = t|x| } for fixed r, s, and t and for string-matching with 'forced mismatches'." Math Systems Theory 11 11 111-146 0154 Slisenko,A.O. Recognizing a Symmetry.. Proc.Steklov In 73 129:25-208 Slisenko AO Recognizing a Symmetry Predicate by Multihead Turing Machines with Input Regularities; Automata; RU Only pages 25-28 and 208 are in the file.
"It is proved that a symmetry predicate [i.e., a palindrome] can be recognized by a certain six-head Turing Machine with input in real time." Proc Steklov Inst Math 129 129 25-208 0155 Tarhio,J. Boyer-Moore Approach t.. Lecture Notes i 90 447:348-359 Tarhio J; Ukkonen E Boyer-Moore Approach to Approximate String Matching Approximate match; Boyer-Moore; Match with k mismatches; Match with k differences; String match; FI Proceedings Scandinavian Workshop in Algorithmic Theory, SWAT'90. "The Boyer-Moore idea applied in exact string matching is generalized to approximate string matching. Two versions of the problem are considered. The k mismatches problem is to find all approximate occurrences of a pattern string in a text string with at most k mismatches. ... A related algorithm is developed for the k differences problem where the task is to find all approximate occurrences of a pattern in a text with <= k differences (insertions, deletions, changes)." Lecture Notes in Comput Sci 447 447 348-359 0156 Tharp,A.L. The Practicality of Te.. Software.Practi 82 12:35-44 Tharp AL; Tai KC The Practicality of Text Signatures for Accelerating String Searching String match; Signature; USA; Fingerprint; String search "This paper studies the use of text signatures in string searching. Text signatures are a coded representation of a unit of text formed by hashing substrings into bit positions which are, in turn, set to one. Then instead of searching an entire line of text exhaustively, the text signature may be examined first to determine if complete processing is warranted." Software Practice Experience 12 12 35-44 0157 Stormo,G.D. Use of the 'Perceptron.. Nucleic Acids R 82 10(9):2997-301 Stormo GD; Schneider TD; Gold L; Ehrenfeucht A Use of the 'Perceptron' Algorithm to Distinguish Translational Initiation Sites in E. coli Match a pattern matrix; Perceptron; USA; Algorithm "We have used a 'Perceptron' algorithm to find a weighting function which distinguishes E. 
coli translational initiation sites from all other sites in a library of over 78,000 nucleotides of mRNA sequence." Nucleic Acids Res 1982 10 9 2997-3011 0158 Staden,R. Finding Protein Coding.. Methods Enzymol 90 183:163-181 Staden R Finding Protein Coding Regions in Genomic Sequences Sequence analysis; UK; Region; Coding; Signal; Protein; Genomic "There are two types of information that can be used for finding protein coding regions. The first is to look for the special, so-called signal sequences, such as splice junctions and promoters, that surround coding regions. This is often called gene search by signal. The second is to examine long sections of the DNA to see if they look more like coding sequence than noncoding sequence. These latter methods are often described as gene search by content and are the subject of this chapter." Methods Enzymol 183 183 163-181 0159 Danckaert,A. 'Size Leap' Algorithm:.. Comput.Appl.Bio 91 7(4):509-513 Danckaert A; Chappey C; Hazout S 'Size Leap' Algorithm: An Efficient Extraction of the Longest Common Motifs from a Molecular Sequence Set. Application to the DNA Sequence Reconstruction. Multiple alignment; Reconstruct; FR; Motif; Region; Longest common; DNA; Sequence reconstruction; Algorithm "We propose a new method, called 'size leap' algorithm, of search for motifs of maximum size and common to two fragments at least. It allows the creation of a reduced database of motifs from a set of sequences whose size obeys the series of Fibonacci numbers. The convenience lies in the efficiency of the motif extraction. It can be applied in the establishment of overlap regions for DNA sequence reconstruction and multiple alignment of biological sequences." Comput Appl Biosci 1991 7 4 509-513 0160 Isono,K. A Computer Program Pac.. 
Nucleic Acids R 84 12(1):101-112 Isono K A Computer Program Package for Storing and Retrieving DNA/RNA and Protein Sequence Data Database search; DE; Program; Protein "Program DATBAS is for storing and improving DNA sequence data .... Programs NUCDAT and PROTEN are for analyzing DNA/RNA and protein sequence data, respectively. ... Program LITRAT enables users to prepare a scientific literature file convenient for writing scientific articles, and program STRAIN for storing information concerning bacterial and/or plasmid strains." Nucleic Acids Res 1984 12 1 101-112 0161 Churchill,G.A The Accuracy of DNA Se.. Genomics 92 14:89-98 Churchill GA; Waterman MS The Accuracy of DNA Sequences: Estimating Sequence Quality Reconstruct; Consensus sequence; USA; Statistical; Likelihood; Fragment; DNA; Accuracy "In this paper we describe a method of the statistical reconstruction of a large DNA sequence from a set of sequenced fragments. We assume that the fragments have been assembled and address the problem of determining the degree to which the constructed sequence is free from errors, i.e., its accuracy. ... A likelihood-based procedure for the estimation of the sequencing error rates, which utilizes an iterative EM algorithm, is described. ... We present three different approaches to the definition of a consensus sequence." Genomics 14 14 89-98 0162 Krawetz,S.A. Sequence Errors Descri.. Nucleic Acids R 89 17(10):3951-39 Krawetz SA Sequence Errors Described in GenBank: A Means to Determine the Accuracy of DNA Sequence Interpretation Sequence analysis; Significance; USA; Error; DNA; GenBank; Accuracy "The accuracy of nucleic acid sequence data interpretation was determined by assessing and quantifying the discrepancies reported in the GenBank database. This permitted the calculation of an Error Rate (ER) for nucleic acid sequence determination. ... 
This establishes the first set of limit boundaries of the ER for sequence interpretation and sequence errors within the GenBank database and provides the foundation for future assessments and the monitoring of sequence data accumulation." Nucleic Acids Res 1989 17 10 3951-3957 0163 States,D.J. Molecular Sequence Acc.. Proc.Nat.Acad.S 91 88:5518-5522 States DJ; Botstein D Molecular Sequence Accuracy and the Analysis of Protein Coding Regions Pairwise alignment; Error; Significance; USA; Region; Coding; Protein; Accuracy "We studied the impact of nucleic acid sequence errors on the ability to align predicted amino acid sequences with the sequences of related proteins. We found that with a simultaneous translation and alignment algorithm, identification of sequence homologies is resilient to the introduction of random errors. Proteins with >30% sequence identity can be reliably recognized even in the presence of 1% frameshifting (insertion or deletion) error rates and 5% base substitution rates." Proc Nat Acad Sci USA 88 88 5518-5522 0164 Hein,J. Reconstructing Evoluti.. Math.Biosci. 90 98:185-200 Hein J Reconstructing Evolution of Sequences Subject to Recombination Using Parsimony Phylogeny; USA; Segment; Dynamic programming; Parsimony; Evolution; Recombination "It is demonstrated that the appropriate structure to represent the evolution of sequences with recombinations is a family of trees each describing the evolution of a segment of the sequence. Two trees for neighboring segments will differ by exactly the transfer of a subtree within the whole tree. This leads to a metric between trees .... This metric is used to formulate a dynamic programming algorithm that finds the most parsimonious history that fits a given set of sequences. The algorithm is potentially very practical, since many groups of sequences defy analysis by methods that ignore recombinations." Math Biosci 98 98 185-200 0165 Olsen,G.J. Phylogenetic Analysis ..
Methods Enzymol 88 164:793-812 Olsen GJ Phylogenetic Analysis Using Ribosomal RNA Phylogeny; USA; RNA; Phylogenetic "The inference of phylogenetic relationships from molecular data (i.e., the field of molecular evolution) is contributing greatly to our understanding of the evolution of life on Earth. Although the discussion that follows is directed toward analyses based on rRNA sequences, nearly all of the concepts, and many of the details, are equally applicable to the other DNA, RNA, or protein sequences. ... The merits of rRNA for phylogenetic inference ... include universality, functional constancy, ease of identification and isolation, and apparent lack of lateral gene transfer." Methods Enzymol 164 164 793-812 0166 Claverie,J.M. Information Enhancemen.. Computers Chem. 93 17(2):191-201 Claverie JM; States DJ Information Enhancement Methods for Large Scale Sequence Analysis Database search; Significance; USA; Sequence analysis; Mask; Program "The improved efficiency of similarity search programs and the affordability of even faster computers allow studies where whole sequence databases can be the target of various comparisons with increasingly larger or numerous query sequences. However, the usefulness of those 'brute force' methods now becomes limited by the time it takes an experienced scientist to sift the biologically relevant matches from overwhelming, albeit 'statistically significant' outputs. ... We present two masking methods ... capable of eliminating most of the irrelevant outputs in a variety of large scale sequence analysis situations ...." Computers Chem 1993 17 2 191-201 0167 Lebbe,J. Local Predictability i.. Biochimie 93 75(5):371-378 Lebbe J; Vignes R Local Predictability in Biological Sequences, Algorithm and Applications Sequence analysis; Significance; Sequence prediction; FR; Algorithm "The goal of this paper is to propose an algorithm based on the k nearest neighbours to compute a local predictability measure in biological sequences. 
Some ideas about the usefulness of this measure are discussed on the basis of preliminary experimentations. ... Therefore we propose: to learn a system that predicts each letter of a sequence, to compare each predicted letter with the real so as to compute a local predictability measure, and to locate the zones where the letters are particularly well or badly predicted." Biochimie 1993 75 5 371-378 0168 Philippe,H. MUST, A Computer Packa.. Nucleic Acids R 93 21(22):5264-52 Philippe H MUST, A Computer Package of Management Utilities for Sequences and Trees Program; Management; Phylogeny; FR; Display "The MUST package is a phylogenetically oriented set of programs for data management and display, allowing one to handle both raw data (sequences) and results (trees, number of steps, bootstrap proportions). It is complementary to the main available software for phylogenetic analysis (PHYLIP, PAUP, HENNIG86, CLUSTAL) with which it is fully compatible." Nucleic Acids Res 1993 21 22 5264-5272 0169 Taylor,W.R. Deriving an Amino Acid.. J.Theor.Biol. 93 164(1):65-83 Taylor WR; Jones DT Deriving an Amino Acid Distance Matrix Sequence proximity; Substitution; UK; Sequence alignment; Distance; Amino acid; Matrix "Various methods were investigated to convert an amino acid similarity matrix into a low-dimensional metric distance matrix. Using projection techniques, no unique transformation was found and of the many inversion forms investigated, simple negation normalized by the diagonal elements produced a good fit to the original data. ... The derived forms might find applications in sequence alignment, including pattern-matching algorithms, and the construction of phylogenetic trees." J Theor Biol 1993 164 1 65-83 0170 Allison,L. Normalization of Affin.. J.Theor.Biol. 
93 161(2):263-269 Allison L Normalization of Affine Gap Costs Used in Optimal Sequence Alignment Sequence proximity; AU; Sequence alignment; Edit; Automata; Gap; Optimal "It is shown how to normalize the costs of an alignment algorithm that employs affine or linear gap costs. The normalized costs are interpreted as the -log probabilities of the instructions of a finite-state edit-machine. This gives an explicit model relating sequences that can be linked to processes of mutation and evolution." J Theor Biol 1993 161 2 263-269 0171 Vingron,M. Weighting in Sequence .. Proc.Nat.Acad.S 93 90(19):8777-87 Vingron M; Sibbald PR Weighting in Sequence Space: A Comparison of Methods in Terms of Generalized Sequences Multiple alignment; Significance; Sequence weight; USA "A geometric analysis based on a continuous sequence space is presented that provides a common framework in which to compare [four methods for weighting aligned biological sequences]. It is concluded that there are two 'best' methods. When the sequences are known to be phylogenetically related ..., the method of Altschul, Carroll and Lipman (1989) is appropriate. When the sequences are not known to be phylogenetically related ... a modification of the method of Sibbald and Argos (1990) is preferable." Proc Nat Acad Sci USA 1993 90 19 8777-8781 0172 Rinsma-Melche The Expected Number of.. N.Z.J.Bot. 93 31(3):219-230 Rinsma-Melchert I The Expected Number of Matches in Optimal Global Sequence Alignments Pairwise alignment; Significance; NZ; Sequence alignment; Optimal "This paper outlines how lattice walks and generating functions could be used to find the expected number of matches in the optimal alignment of two sequences, in several special cases. Solving the resulting equations proves difficult." N Z J Bot 1993 31 3 219-230 0173 Altschul,S.F. A Protein Alignment Sc.. J.Mol.Evol. 
93 36(3):290-300 Altschul SF A Protein Alignment Scoring System Sensitive at all Evolutionary Distances Sequence proximity; Substitution; USA; Statistical; Database search; Evolutionary distance; Scoring; Distance; Protein "Because in a database search it generally is not known a priori what evolutionary distances will characterize the similarities found, it is necessary to employ an appropriate range of [substitution] matrices in order not to overlook potential homologies. This paper formalizes this concept by defining a scoring system that is sensitive at all detectable evolutionary distances. The statistical behavior of this scoring system is analyzed, and it is shown that for a typical protein database search, estimating the originally unknown evolutionary distance appropriate to each alignment costs slightly over two bits of information ...." J Mol Evol 1993 36 3 290-300 0174 Saccone,C. Time and Biosequences J.Mol.Evol. 93 37(2):154-159 Saccone C; Lanave C; Pesole G Time and Biosequences Sequence proximity; Italy; Evolutionary distance "In both quantitative and qualitative measurements of the genetic distances [of biosequences], the compositional constraints of the nucleotide sequences play a very important role. We demonstrate that when homologous sequences significantly differ in base composition we get erratic branching order and/or wrong evaluation of the evolutionary distances." J Mol Evol 1993 37 2 154-159 0175 Sneath,P.H.A. A Proposal on Metrics .. FEMS Microbiol. 93 106(1):1-8 Sneath PHA A Proposal on Metrics for Identification using Nucleic Acid Sequences Sequence proximity; UK; Identification; Nucleic acid "The need is stressed for attempts to be made to permit diagnostic nucleic acid sequences to be used in a quantitative manner. Sequence differences or binding values should be converted to a distance measure and from this an ultrametric tree should be constructed. 
A single quantitative determination can yield considerable information about the likely identity of an unknown microorganism when the distance obtained from the sequence is compared with the tree." FEMS Microbiol Lett 1993 106 1 1-8 0176 Johnson,M.S. A Structural Basis for.. J.Mol.Biol. 93 233(4):716-738 Johnson MS; Overington JP A Structural Basis for Sequence Comparisons. An Evaluation of Scoring Methodologies Sequence proximity; Substitution; UK; Sequence comparison; Scoring "A residue-exchange matrix has been derived that is suitable for comparison of amino acid sequences. ... The majority of the data is from structural comparisons where there is between 15 and 40% sequence identity. As a result, a scoring matrix such as the one devised here should provide a sensitive basis for the comparison of amino acid sequences and the search for homologous sequences in amino acid databases. In order to assess the value of this matrix we have made a comparative analysis with 12 other published scoring matrices that have been used for the alignment of protein amino acid sequences." J Mol Biol 1993 233 4 716-738 0177 Apostolico,A. Guest Editor's Forewor.. Algorithmica 94 12(4/5):245-24 Apostolico A Guest Editor's Foreword. Special Issue on String Algorithmics and Its Applications String search; Approximate match; Edit; Distance; USA "Most of the past and current research in string algorithmics falls into one of the following problem categories: exact search, computation of edit distances, and approximate search. So does the majority of the papers in this special issue." Algorithmica 1994 12 4/5 245-246 0178 Gregor,J. Dynamic Programming Al..
IEEE Trans.Patt 93 15(2):129-135 Gregor J; Thomason MG Dynamic Programming Alignment of Sequences Representing Cyclic Patterns Pairwise alignment; USA; Dynamic programming; Complexity; Dynamic "String alignment by dynamic programming is generalized to include cyclic shift and corresponding optimal alignment cost for strings representing cyclic patterns. A guided search algorithm uses bounds on actual alignment costs to find all optimal cyclic shifts. ... Algorithmic complexity is analyzed for major stages in the search. Applicability of the method is illustrated with satellite DNA sequences and circularly permuted protein sequences." IEEE Trans Patt Anal Mach Intell 1993 15 2 129-135 0179 Blum,N. On Locally Optimal Ali.. Lecture Notes i 92 577:425-436 Blum N On Locally Optimal Alignments in Genetic Sequences Pairwise alignment; DE; Locally optimal; Optimal; Genetic Proceedings, Ninth Symposium on Theoretical Aspects of Computer Science (STACS). "We show how to compute all substrings of [a text string] x which have c-locally minimal [edit] distance from [a pattern string] y and all corresponding alignments in O(mn) time where n is the length of x and m is the length of y." Lecture Notes in Comput Sci 577 577 425-436 0180 Jacobson,G. Heaviest Increasing/Co.. Lecture Notes i 92 644:52-66 Jacobson G; Vo KP Heaviest Increasing/Common Subsequence Problems Longest common; Subsequence; USA "We define the heaviest increasing subsequence (HIS) and heaviest common subsequence (HCS) problems as natural generalizations of the well-studied longest increasing subsequence (LIS) and longest common subsequence (LCS) problems. We show how the famous Robinson-Schensted correspondence between permutations and pairs of Young tableaux can be extended to compute heaviest increasing subsequences. Then, we point out a simple weight-preserving correspondence between the HIS and HCS problems. From this duality ... 
the Hunt-Szymanski LCS algorithm can be seen as a special case of the Robinson-Schensted algorithm." Lecture Notes in Comput Sci 644 644 52-66 0181 Abarbanel,R.M Rapid Searches for Com.. Nucleic Acids R 84 12(1):263-280 Abarbanel RM; Wieneke PR; Mansfield E; Jaffe DA; Brutlag DL Rapid Searches for Complex Patterns in Biological Molecules Match complex patterns; Automata; USA; Pattern match; Database search "We have developed a tool called QUEST to allow the flexible exploration of sequences in a data bank. QUEST combines the flexibility of the UNIX pattern matching utilities and the speed of a finite state machine. In addition, QUEST allows patterns to be defined in terms of other patterns." Nucleic Acids Res 1984 12 1 263-280 0182 Abrahamson,K. Generalized String Mat.. SIAM J.Comput. 87 16(6):1039-105 Abrahamson K Generalized String Matching String match; CA; Language "This paper investigates a generalization of string matching, in which the pattern is a sequence of pattern elements, each compatible with a set of symbols. The alphabet of symbols is infinite, with its members encoded in a finite alphabet. ... The obvious algorithm for generalized string matching requires time O(NM), where N is the length of the encoding of the pattern, and M is that of the object string." Then a better algorithm is described. SIAM J Comput 1987 16 6 1039-1051 0183 Aho,A.V. Algorithms for Finding.. Handbook of T.. 90Elsevier Scienc Aho AV Algorithms for Finding Patterns in Strings van Leeuwen J Handbook of Theoretical Computer Science, Volume A, Algorithms and Complexity Pattern match; Review; USA; Language; Expression; String match; Algorithm Notations for patterns. Matching keywords. Matching sets of keywords. Matching regular expressions. Related problems, including a terse review of approximate string matching. "No single algorithm is known for the longest-common-subsequence problem that dominates all applications." Elsevier Science Amsterdam 1990 255-300 0184 Goldberg,T.
Faster Parallel String.. J.Algorithms 94 16:295-308 Goldberg T; Zwick U Faster Parallel String Matching via Larger Deterministic Samples Parallel; IL; String match "Building on previous results of Breslauer, Galil, and Vishkin, we obtain for every p(m) = O(log log m) an optimal speedup parallel string matching algorithm that can preprocess a pattern P of length m in time O( p(m) ) and can then find all occurrences of P in a text of an arbitrary length in time O(log log m / log p(m) )." J Algorithms 16 16 295-308 0185 Hein,J. An Algorithm Combining.. J.Theor.Biol. 94 167:169-174 Hein J An Algorithm Combining DNA and Protein Alignment Pairwise alignment; Coding; Protein; DNA; Genomic; DK; Algorithm "An algorithm is presented that aligns two DNA sequences minimizing the overall amount of evolution that the associated proteins have experienced. It is generalized to minimizing a weighted average of protein and DNA evolution. ... This algorithm could undoubtedly be generalized to align DNA with many coding frames in it. However, this would be very complicated, but highly practical as this could align genomic structures well." J Theor Biol 167 167 169-174 0186 Kim,J.Y. Fast String Matching u.. Software.Practi 94 24(1):79-88 Kim JY; Shawe-Taylor J Fast String Matching using an n-Gram Algorithm N-gram; Boyer-Moore; UK; String match; String search; Pattern match; Algorithm "Experimental results are given for the application of a new n-gram algorithm to substring searching in DNA strings. The results confirm theoretical predictions of expected running times based on the assumption that the data are drawn from a stationary ergodic source. They also confirm that the algorithms tested are the most efficient known for searches involving larger patterns." Software Practice Experience 1994 24 1 79-88 0187 Chaitin,G.J. On the Length of Progr.. J.Assoc.Comput. 
66 13:547-569 Chaitin GJ On the Length of Programs for Computing Finite Binary Sequences Sequence analysis; Significance; Information theory; USA; Automata; Program "The use of Turing machines for calculating finite binary sequences is studied from the point of view of information theory and the theory of recursive functions. Various results are obtained concerning the number of instructions in programs. A modified form of Turing machine is studied from the same point of view. An application to the problem of defining a patternless sequence is proposed in terms of the concepts here developed." J Assoc Comput Mach 13 13 547-569 0188 Morimoto,K. A Method of Compressin.. Software.Practi 94 24(3):265-288 Morimoto K; Iriguchi H; Aoe JI A Method of Compressing Trie Structures Database search; Search tree; JP; Data structure; Structure "A trie structure can immediately determine whether a desired key is in a given key set or not, and can find its longest match easily. ... However, the total number of states of a trie becomes large, so space requirements are not good for a huge key set. To resolve this disadvantage a new structure which reduces the total number of states in a traditional trie, called a double-trie, is introduced in this paper. Insertion and deletion operation, as well as key retrieval for this double-trie, are presented." Software Practice Experience 1994 24 3 265-288 0189 Smith,P.D. On Tuning the Boyer-Mo.. Software.Practi 94 24(4):435-436 Smith PD On Tuning the Boyer-Moore-Horspool String Searching Algorithm String match; Boyer-Moore; USA; String search; Algorithm "Experiments suggest that recently reported improvements to the Boyer-Moore-Horspool string searching algorithm may be due to compiler effects rather than to properties of the language being searched." Software Practice Experience 1994 24 4 435-436 0190 Wright,A.H. Approximate String Mat..
Software.Practi 94 24(4):337-362 Wright AH Approximate String Matching using Within-word Parallelism Match with k differences; USA; Dynamic programming; String match "An implementation of the dynamic programming algorithm for this problem is given that packs several characters and mod-4 integers into a computer word. Thus, it is a parallelization of the conventional implementation that runs on ordinary processors. Since a small alphabet means that characters have short binary codes, the degree of parallelism is greatest for small alphabets and for processors with long words. For an alphabet of size 8 or smaller and a 64 bit processor, a 21-fold parallelism over the conventional algorithm can be obtained." Software Practice Experience 1994 24 4 337-362 0191 Stormo,G.D. Probing Information Co.. Methods Enzymol 91 208:458-468 Stormo GD Probing Information Content of DNA-Binding Sites Consensus sequence; Information content; USA; Statistical Sauer, R.T., ed. Protein-DNA Interactions. San Diego: Academic Press. "An information content analysis of protein-binding sites gives a quantitative description of the specificity of the protein, independent of the mechanism of specificity. It gives useful information about the total specificity of the protein and about the individual positions within the binding sites. Information content is consistent with both thermodynamic and statistical analyses of specificity. When applied to a collection of known binding sites, the description provided may be limited by the sample size or by unknown constraints on those sites. Experimental procedures to determine the information content can give much more reliable measures." Methods Enzymol 208 208 458-468 0192 Russell,R.B. The Limits of Protein .. J.Mol.Biol.
93 234:951-957 Russell RB; Barton GJ The Limits of Protein Secondary Structure Prediction Accuracy from Multiple Sequence Alignment Multiple alignment; Structure; UK; Sequence alignment; Protein; Prediction; Secondary; Accuracy "The expected best residue-by-residue accuracies for secondary structure prediction from multiple protein sequence alignment have been determined by an analysis of known protein structural families. The results show substantial variation is possible among homologous protein structures, and that 100% agreement is unlikely between a consensus prediction and one member of a protein structural family. The study provides the range of agreement to be expected between a perfect secondary structure prediction from a multiple alignment and each protein within the alignment." J Mol Biol 234 234 951-957 0193 Orengo,C.A. A Local Alignment Meth.. J.Mol.Biol. 93 233:488-497 Orengo CA; Taylor WR A Local Alignment Method for Protein Structure Motifs Structure; UK; Motif; Dynamic programming; Sequence alignment; Protein "A method for the comparison of protein three-dimensional substructures was developed. The method employs the double dynamic programming method of Taylor and Orengo but identifies multiple local alignments rather than a single global alignment. A modification based on the Smith Waterman algorithm for sequence alignment enables the automatic identification and growth of the most structurally similar local alignments irrespective of length and composition." J Mol Biol 233 233 488-497 0194 Lefevre,C. A Fast Word Search Alg.. Nucleic Acids R 94 22(3):404-411 Lefevre C; Ikeda JE A Fast Word Search Algorithm for the Representation of Sequence Similarity in Genomic DNA Pairwise comparison; Dot; JP; Representation; Automata; Repetition; Similarity; DNA; Word; Genomic; Algorithm "Computation of [the dot matrix for comparing two biological sequences] has been reconsidered here.
An improvement is proposed through the preprocessing of the data into an automaton recognizing the word structure of a sequence. The main advantage of this approach is to systematically eliminate the repetitions during word comparison. Simple heuristics are also considered to greatly speed up pattern matching. As a result, large sequences are handled very efficiently." Nucleic Acids Res 1994 22 3 404-411 0195 Lake,J.A. Reconstructing Evoluti.. Proc.Nat.Acad.S 94 91(4):1455-145 Lake JA Reconstructing Evolutionary Trees from DNA and Protein Sequences: Paralinear Distances Phylogeny; Markov; USA; Evolutionary tree; Substitution; Distance; Protein; DNA "The reconstruction of phylogenetic trees from DNA and protein sequences is confounded by unequal rate effects. ... The algorithm presented here, called paralinear distances, is valid for a much broader class of substitution processes than previous algorithms and is accordingly less affected by unequal rate effects. It may be used with all nucleic acid, protein, or other sequences, provided that their evolution may be modeled as a succession of Markov processes. ... Paralinear distances can fail when sequences are misaligned or when site-to-site sequence variation of rates is extensive." Proc Nat Acad Sci USA 1994 91 4 1455-1459 0196 Kahn,P. EMBL Data Library Methods Enzymol 90 183:23-31 Kahn P; Cameron G EMBL Data Library Sequence database; DE; EMBL "The EMBL Data Library was established in 1980 to collect, organize, and distribute a database of nucleotide sequences and related descriptive information extracted from publications in scientific journals." Databases. Data acquisition. Data distribution. Methods Enzymol 183 183 23-31 0197 Hein,J. Genomic Alignment J.Mol.Evol. 
94 38:310-316 Hein J; Stovlbaek J Genomic Alignment Pairwise alignment; DK; Region; Coding; Frame; Genomic "A heuristic algorithm is presented that can compare DNA with both coding and noncoding regions, but that also can compare multiple reading frames and determine which exons are homologous. A program, GenAl (Genomic Alignment), was developed that implements the algorithm. Its use is demonstrated on two retroviruses." J Mol Evol 38 38 310-316 0198 Fristensky,B. Feature Expressions: C.. Nucleic Acids R 93 21(25):5997-60 Fristensky B Feature Expressions: Creating and Manipulating Sequence Datasets Database search; CA; Region; Expression; Coding "Annotation of features, such as introns, exons and protein coding regions in GenBank/EMBL/DDBJ entries is now standardized through use of the Features Table (FT) language. Because FT is intrinsic to the database definition, it can serve as a software- and platform-independent lingua franca for sequence manipulation. The XYLEM package makes it possible to create and manipulate sequence datasets using FT expressions." Nucleic Acids Res 1993 21 25 5997-6003 0199 Claverie,J.M. Detecting Frame Shifts.. J.Mol.Biol. 93 234:1140-1157 Claverie JM Detecting Frame Shifts by Amino Acid Sequence Comparison Sequence proximity; Substitution; Frame; USA; Sequence comparison; Scoring; Amino acid "I derive five new types of scoring matrix, each capable of detecting a specific frame shift (deletion, insertion and inversion in 3 frames) and use them with a regular local alignments program to detect amino acid sequences that may have derived from alternative reading frames of the same nucleotide sequence. Frame shifts are inferred from the sole comparison of the protein sequences. The five scoring matrices were used with the BLASTP program to compare all the protein sequences in the Swissprot database. Surprisingly, the searches revealed hundreds of highly significant frame shift matches." J Mol Biol 234 234 1140-1157 0200 Burks,C. 
GenBank: Current Statu.. Methods Enzymol 90 183:3-22 Burks C; Cinkosky MJ; Gilna P; Hayden JED; Abe Y; Atencio EJ; Barnhouse S; Benton D; Buenafe CA; Cumella KE; Davison DB; Emmert DB; Faulkner MJ; Fickett JW; Fischer WM; Good M; Horne DA; Houghton FK; Kelkar PM; Kelley TA; Kelly M; King MA; Langan BJ; Lauer JT; Lopez N; Lynch C; Lynch J; Marchi JB; Marr TG; Martinez FA; McLeod MJ; Medvick PA; Mishra SK; Moore J; Munk CA; Mondragon SM; Nasseri KK; Nelson D; Nelson W; Nguyen T; Reiss G; Rice J; Ryals J; Salazar MD; Stelts SR; Trujillo BL; Tomlinson LJ; Weiner MG; Welch FJ; Wiig SE; Yudin K; Zins LB GenBank: Current Status and Future Directions Sequence database; GenBank; USA "The GenBank database provides a collection of nucleotide sequences as well as relevant bibliographic and biological annotation. We present an updated view of the size and scope of the database, and we also describe recent developments in the strategies, protocols, and software for collecting, maintaining, and distributing the data." Methods Enzymol 183 183 3-22 0201 Boswell,D.R. Sequence Alignment by .. Trends Biochem. 87 12:279-280 Boswell DR Sequence Alignment by Word Processor Multiple alignment; UK; Sequence alignment; Program; Word "The word processing programs I have used for sequence alignment on IBM PC-compatible microprocessors are WORDSTAR and PCWRITE." The author discusses the suitability of these programs for rudimentary alignment of multiple sequences. Trends Biochem Sci 12 12 279-280 0202 Baldi,P. Hidden Markov Models o.. Proc.Nat.Acad.S 94 91:1059-1063 Baldi P; Chauvin Y; Hunkapiller T; McClure MA Hidden Markov Models of Biological Primary Sequence Information Sequence analysis; Markov; Multiple alignment; USA; Statistical; Motif; Model "Hidden Markov model (HMM) techniques are used to model families of biological sequences. ... The HMM approach is applied to three protein families ....
In all cases, the models derived capture the important statistical characteristics of the family and can be used for a number of tasks, including multiple alignments, motif detection, and classification. For K sequences of average length N, this approach yields an effective multiple-alignment algorithm which requires O(KN^2) operations, linear in the number of sequences." Proc Nat Acad Sci USA 91 91 1059-1063 0203 Luo,L. The Statistical Correl.. Bull.Math.Biol. 91 53(3):345-353 Luo L; Li H The Statistical Correlation of Nucleotides in Protein-Coding DNA Sequences Composition; Information content; Markov; Statistical; CN; Correlation; DNA; Nucleotide "The statistical correlation of nucleotides in a DNA sequence is described by a set of redundancies D1, D2, D3, .... By calculation of {Dn} of 2341 coding regions of nucleic acid sequences it is demonstrated that about 2/3 of sequences has correlation length <=2, 10% of sequences - correlation with 3-periodicity and others - long range aperiodic correlation. The implications of the results from the interactions of random mutation and natural selection are discussed briefly." Bull Math Biol 1991 53 3 345-353 0204 Churchill,G.A Stochastic Models for .. Bull.Math.Biol. 89 51(1):79-94 Churchill GA Stochastic Models for Heterogeneous DNA Sequences Composition; Information content; Markov; Likelihood; USA; Display; Stochastic; DNA; Model "In this paper, the DNA sequence is viewed as a stochastic process with local compositional properties determined by the states of a hidden Markov chain. The model used is a discrete-state, discrete-outcome version of a general model for non-stationary time series proposed by Kitagawa (1987). A smoothing algorithm is described which can be used to reconstruct the hidden process and produce graphic displays of the compositional structure of a sequence. The problem of parameter estimation is approached using likelihood methods ...." Bull Math Biol 1989 51 1 79-94 0205 Tavare,S.
Codon Preference and P.. Bull.Math.Biol. 89 51(1):95-115 Tavare S; Song B Codon Preference and Primary Sequence Structure in Protein-Coding Regions Composition; Information content; Markov; USA; Region; Codon; Complexity; Substitution; Structure "The stochastic complexity of a data base of 365 protein-coding regions is analysed. When the primary sequence is modeled as a spatially homogeneous Markov source, the fit to observed codon preference is very poor. The situation improves substantially when a non-homogeneous model is used. Some implications for the estimation of species phylogeny and substitution rates are discussed." Bull Math Biol 1989 51 1 95-115 0206 Sankoff,D. Probabilistic Models o.. Bull.Math.Biol. 89 51(1):117-124 Sankoff D; Goldstein M Probabilistic Models of Genome Shuffling Genome; Probabilistic; CA; Shuffling; Model "The comparison of entire genomes in evolutionary studies gives rise to alignments characterized by many intersections, or inversions in the order of two fragments in different genomes. To model this, we suggest a random migration process for fragments, and discuss its equilibrium distribution in the case of linear and circular genomes. Simulations are carried out to explore 'cut-off' behavior as the process approaches equilibrium. ... Questions of applicability of these models are discussed." Bull Math Biol 1989 51 1 117-124 0207 Arratia,R. Tutorial on Large Devi.. Bull.Math.Biol. 89 51(1):125-131 Arratia R; Gordon L Tutorial on Large Deviations for the Binomial Distribution Probabilistic; USA; Distribution "We present, in an easy to use form, the large deviation theory of the binomial distribution: how to approximate the probability of k or more successes in n independent trials, each with success probability p, when the specified fraction of successes, a = k/n, satisfies 0 < p < a < 1." Bull Math Biol 1989 51 1 125-131 0208 Zharkikh,A.A. VOSTORG: A Package of .. 
Gene 91 101:251-254 Zharkikh AA; Rzhetsky AY; Morosov PS; Sitnikova TL; Krushkal JS VOSTORG: A Package of Microcomputer Programs for Sequence Analysis and Construction of Phylogenetic Trees Sequence analysis; Phylogeny; RU; Program; Phylogenetic "VOSTORG is a new, versatile package of programs for the inference and presentation of phylogenetic trees, as well as an efficient tool for nucleotide (nt) and amino acid (aa) sequence analysis (sequence input, verification, alignment, construction of consensus, etc.). On appropriately equipped systems, these data can be displayed on a video monitor or printed as required. ... The package is designed to be easily handled by occasional computer users and yet it is powerful enough for experienced professionals." Gene 101 101 251-254 0209 Brutlag,D.L. BLAZE(TM): An Implemen.. Computers Chem. 93 17(2):203-207 Brutlag DL; Dautricourt JP; Diaz R; Fier J; Moxon B; Stamm R BLAZE(TM): An Implementation of the Smith-Waterman Sequence Comparison Algorithm on a Massively Parallel Computer Sequence comparison; Database search; Parallel; USA; FASTA; BLAST; Program; Algorithm "We have implemented the Smith and Waterman dynamic programming algorithm on the massively parallel MP1104 computer from MasPar and compared its ability to detect remote protein sequence homologies with that of other commonly used database search algorithms. ... We have found that the algorithms, in order of decreasing sensitivity are BLAZE, FASTDB, FASTA and BLAST. Hence the massively parallel computers allow one to have maximal sensitivity and search speed simultaneously." Computers Chem 1993 17 2 203-207 0210 States,D.J. Improved Sensitivity o..
Methods: Compan 91 3(1):66-70 States DJ; Gish W; Altschul SF Improved Sensitivity of Nucleic Acid Database Searches Using Application-Specific Scoring Matrices Database search; Sequence proximity; BLAST; Scoring; USA "Scoring matrices for nucleic acid sequence comparison that are based on models appropriate to the analysis of molecular sequencing errors or biological mutation processes are presented. In mammalian genomes, transition mutations occur significantly more frequently than transversions, and the optimal scoring of sequence alignments based on this substitution model differs from that derived assuming a uniform mutation model. ... Results of searches performed using BLASTN's default score matrix are compared with those using scores based on a mutational model in which transitions are more prevalent than transversions." Methods: Companion Methods Enzymol 1991 3 1 66-70 0211 Slisenko,A.O. String Matching in Rea.. Lecture Notes i 78 64:493-496 Slisenko AO String Matching in Real Time: Some Properties of the Data Structure Pattern match; Regularities; Search tree; RU; Data structure; String match; Complexity; Structure Mathematical Foundations of Computer Science, 1978: Proceedings, 7th. Zakopane, Poland, 4-8 September 1978. Edited by J. Winkowski. "The two main aims of this report are: (i) to claim new results on the complexity of a well-known problem, namely, string-matching; (ii) to make explicit those general properties of the data structure used in the real-time algorithm which provide its basic speed capacities. Let us consider the following three problems of the string-matching type: (1) recognize the set {uvw#v}, where u, v, w [are binary sequences]; (2) find a longest repetition in a given string; (3) find all the periodicities in a given string .... All these problems can be solved in real time." Lecture Notes in Comput Sci 64 64 493-496 0212 Nadeau,J.H. Lengths of Chromosomal..
Proc.Nat.Acad.S 84 81:814-818 Nadeau JH; Taylor BA Lengths of Chromosomal Segments Conserved since Divergence of Man and Mouse Chromosome; Rearrangement; Evolution; USA; Segment; Divergence "Linkage relationships of homologous loci in man and mouse were used to estimate the mean length of autosomal segments conserved during evolution. Comparison of the locations of >83 homologous loci revealed 13 conserved segments. ... Methods were developed for using this sample of conserved segments to estimate the mean length of all conserved autosomal segments in the genome. ... The mean length of conserved segments was also used to estimate the number of chromosomal rearrangements that have disrupted linkage since divergence of man and mouse. This estimate was shown to be 178 ± 39 rearrangements." Proc Nat Acad Sci USA 81 81 814-818 0213 Brooks,L.D. The Probabilities of S.. Genomics 88 3:207-216 Brooks LD; Weir BS; Schaffer HE The Probabilities of Similarities in DNA Sequence Comparisons Pairwise comparison; Significance; Statistical; USA; Sequence comparison; Similarity; Probability; DNA "We discuss the statistical significance of local similarities found between DNA sequences, and illustrate the procedure with reference to the Queen and Korn algorithm. ... A table is given to assess the significance of longest similarities in sequences of length up to 1000 bases. Quite long similarities are expected to occur by chance alone. The critical values we calculate for assessing significance are preferable to expected numbers of similarities used by some commercial computer packages. ... We have not found approximate formulas, such as those of Waterman (1986), to be applicable over a wide range of conditions." Genomics 3 3 207-216 0214 Hillis,D.M. Application and Accura..
Science 94 264(29 April): Hillis DM; Huelsenbeck JP; Cunningham CW Application and Accuracy of Molecular Phylogenies Phylogeny; Evolutionary tree; USA; Accuracy "The performance of methods of phylogenetic analysis can be assessed by numerical simulation studies and by the experimental evolution of organisms in controlled laboratory situations. Both kinds of assessment indicate that existing methods are effective at estimating phylogenies over a wide range of evolutionary conditions, especially if information about substitution bias is used to provide differential weightings for character transformations." Science 1994 264 29 April 671-677 0215 Sellers,P.H. Pattern Recognition in.. Lect.Math.Life 86 17:19-28 Sellers PH Pattern Recognition in DNA Pattern recognition; USA; DNA; Recognition "The possibility of DNA sequencing brings to biology an increased need for mathematical and computational tools. This need has been met at the Rockefeller University by developing computer programs for pattern recognition in DNA. The main point to be illustrated in this lecture is that the successful development of such programs requires a formal mathematical approach to the biological problems involved." Lect Math Life Sci 17 17 19-28 0216 Jones,D.T. The Rapid Generation o.. Comput.Appl.Bio 92 8(3):275-282 Jones DT; Taylor WR; Thornton JM The Rapid Generation of Mutation Data Matrices from Protein Sequences Sequence proximity; Scoring; UK; Protein "An efficient means for generating mutation data matrices from large numbers of protein sequences is presented here. By means of an approximate peptide-based sequence comparison algorithm, the set sequences are clustered at the 85% identity level. The closest relating pairs of sequences are aligned, and observed amino acid exchanges tallied in a matrix. The raw mutation frequency matrix is processed in a similar way to that described by Dayhoff et al. 
(1978), and so the resulting matrices may be easily used in current sequence analysis applications, in place of the standard mutation data matrices, which have not been updated for 13 years." Comput Appl Biosci 1992 8 3 275-282 0217 Karlin,S. Maximal Length of Comm.. Ann.Probab. 88 16(2):535-563 Karlin S; Ost F Maximal Length of Common Words Among Random Letter Sequences Longest common; Significance; Probabilistic; USA; Word "Consider random letter sequences ... based on a finite alphabet generated by uniformly mixing stationary processes. The asymptotic distributional properties of the length of the longest common word in r or more of the s sequences ... are investigated. When the probability measures of the different sequences are not too dissimilar, a classical extremal type limit law holds .... The distributional properties of other long-word relationships and patterns among the sequences are also discussed." Ann Probab 1988 16 2 535-563 0218 Nei,M. Methods for Computing .. Mol.Biol.Evol. 85 2(1):66-85 Nei M; Stephens JC; Saitou N Methods for Computing the Standard Errors of Branching Points in an Evolutionary Tree and Their Application to Molecular Data from Humans and Apes Evolutionary tree; Robustness; Analytical; Distance; UPGMA; USA; Error "Statistical methods for computing the standard errors of the branching points of an evolutionary tree are developed. These methods are for the unweighted pair-group method-determined (UPGMA) trees reconstructed from molecular data such as amino acid sequences, nucleotide sequences, restriction-sites data, and electrophoretic distances. They were applied to data for the human, chimpanzee, gorilla, orangutan, and gibbon species." Mol Biol Evol 1985 2 1 66-85 0219 Waterman,M.S. Probability Distributi..
Lect.Math.Life 86 17:29-56 Waterman MS Probability Distributions for DNA Sequence Comparisons Sequence comparison; Significance; Statistical; Markov; USA; Segment; Distributed; Distribution; Probability; DNA "Recently DNA sequence comparisons have focused on finding long matching segments between two sequences, rather than matching the entire sequences. Generalizations of the celebrated Erdos-Renyi law give laws of large numbers and extreme value distributions for random variables equal to the length of the longest exact match and longest approximate match between the sequences. The cases of independent, identically distributed sequences and of Markov chains are presented. In the final section, simulated sequences and sequences from bacteriophage lambda are analyzed in light of these theoretical results." Lect Math Life Sci 17 17 29-56 0220 Duret,L. HOVERGEN: A Database o.. Nucleic Acids R 94 22(12):2360-23 Duret L; Mouchiroud D; Gouy M HOVERGEN: A Database of Homologous Vertebrate Genes Sequence database; Database search; Gene; FR "Similarity search programs easily find genes homologous to a given sequence. However, only very tedious manual procedures allow the retrieval of all sets of homologous genes sequenced for a given set of species. Moreover, this search often generates errors due to the complexity of data to be managed simultaneously: phylogenetic trees, alignments, taxonomy, sequences and related information. HOVERGEN helps to solve these problems by integrating all this information. ... This graphical tool gives thus a rapid and simple access to all data necessary to interpret homology relationships between genes." Nucleic Acids Res 1994 22 12 2360-2365 0221 Collins,J.F. High-Efficiency Sequen.. Computers and.. 90Addison-Wesley Collins JF; Reddaway SF High-Efficiency Sequence Database Searching: Use of the Distributed Array Processor Bell G Marr T Computers and DNA, SFI Studies in the Sciences of Complexity, vol. 
VII Sequence database; Database search; Parallel; Distributed; UK "Careful mapping of the sequence comparison algorithm described by Coulson, Collins and Lyall [1987] has provided on the AMT DAP 510 machine a high-speed method of searching for local protein sequence similarities in databases. ... [Novel] methods will be required to maintain an adequate search-and-retrieval capability with the most powerful computers. Such a method that exploits the features of the DAP is described, whose performance should provide the basis for adequate searching even when the database has reached the size of the human genome, or 3 x 10^9 bases of genetic sequence." Addison-Wesley Reading, MA 1990 85-92 0222 Davison,D.B. Sequence Searching on .. Computers and.. 90Addison-Wesley Davison DB Sequence Searching on Supercomputers Bell G Marr T Computers and DNA, SFI Studies in the Sciences of Complexity, vol. VII Sequence search; Program; Performance; USA "There are a large number of machines available [at Los Alamos] .... The abundance of cycles, and the low cost, could lead one to think that optimization would be less important. That is not so. Precisely because the queries are larger and more involved, it is necessary to create code that is as efficient as possible. This paper will discuss the steps involved in taking an existing similarity code and improving its performance." Addison-Wesley Reading, MA 1990 93-97 0223 Lapedes,A. Application of Neural .. Computers and.. 90Addison-Wesley Lapedes A; Barnes C; Burks C; Farber R; Sirotkin K Application of Neural Networks and Other Machine Learning Algorithms to DNA Sequence Analysis Bell G Marr T Computers and DNA, SFI Studies in the Sciences of Complexity, vol. VII Sequence analysis; Neural; DNA; USA; Network; Learning; Algorithm "In this article we report initial, quantitative results on application of simple neural networks and simple machine learning methods to two problems in DNA sequence analysis. ...
(1) Determination of whether procaryotic and eucaryotic DNA sequences segments are translated to protein. ... (2) Determination of whether eucaryotic DNA sequence segments containing the dinucleotides 'AG' or 'GT' are transcribed to RNA splice junctions." Addison-Wesley Reading, MA 1990 157-181 0224 Fischer,M.J. String-Matching and Ot.. SIAM-AMS Proc. 74 7:113-125 Fischer MJ; Paterson MS String-Matching and Other Products String match; Approximate match; Don't care; USA In Complexity of Computation, Karp, R. M. (ed.). "The string-matching problem considered here is to find all occurrences of a given pattern as a substring of another longer string. ... The more difficult case where either string may have 'don't care' symbols which are deemed to match with all symbols is also considered. By exploiting the formal similarity of string-matching with integer multiplication, a new algorithm has been obtained with a running time which is only slightly worse than linear." SIAM-AMS Proc 1974 7 113-125 0225 Bird,R.S. Formal Derivation of a.. Sci.Comput.Prog 89 12:93-104 Bird RS; Gibbons J; Jones G Formal Derivation of a Pattern Matching Algorithm Pattern match; Knuth-Morris-Pratt; UK; Algorithm "This paper is devoted to the synthesis of a functional version of the Knuth-Morris-Pratt algorithm. ... However, we do assume some familiarity with the basic ideas of functional programming." Sci Comput Programming 12 12 93-104 0226 Wilbur,W.J. On the PAM Matrix Mode.. Mol.Biol.Evol. 85 2(5):434-447 Wilbur WJ On the PAM Matrix Model of Protein Evolution Scoring; PAM; Markov; USA; Evolution; Protein; Model; Matrix "The internal consistency of the PAM matrix model of protein evolution is here investigated. ... A discrepancy of more than two orders of magnitude is found between the predictions and the data when this is carried out. This is partly accounted for by an error in constructing the matrix. However, it also seems necessary that the basic model be modified. 
Several possibilities are considered. One of these is to incorporate a site-dependent spectrum of mutabilities associated with each amino acid." Mol Biol Evol 1985 2 5 434-447 0227 Waterman,M.S. Multiple Hypothesis Te.. Computers and.. 90Addison-Wesley Waterman MS; Gordon L Multiple Hypothesis Testing for Sequence Comparisons Bell G Marr T Computers and DNA, SFI Studies in the Sciences of Complexity, vol. VII Pairwise comparison; Significance; Locally optimal; Sequence comparison; USA "It is remarkable that these segmental matchings from random sequences are so long and score so well. Simulations such as this suggest that understanding the distribution of score (max Gij) under the null hypothesis of independence is an important goal. Otherwise if the analysis of 'interesting' alignments proceeds on an ad hoc basis, it is easy to be misled by statistically insignificant alignments. ... The examples of this paper are of DNA sequences, but the general theory allows analysis of protein and other sequences." Addison-Wesley Reading, MA 1990 127-135 0228 Hebrard,J.J. Calcul de la distance .. RAIRO Inform.Th 86 20(4):441-456 Hebrard JJ; Crochemore M Calcul de la distance par les sous-mots Sequence proximity; Automata; Longest common; FR; DE; Data structure; Distance "This paper gives two methods to compute the shortest subsequence which distinguishes two different words u and v. The use of automata together with data structures for 'Union-Find' questions leads to an algorithm almost linear in the length of uv." A distance between u and v can be based on the length of this subsequence. RAIRO Inform Theor Appl 1986 20 4 441-456 0229 Claverie,J.M. Database of Ancient Se.. Nature (Lond.) 93 364(1 July):19 Claverie JM Database of Ancient Sequences Sequence database; Region; Motif; USA; Ancient "Green et al. 
[1993] have introduced the concept of ancient conserved regions (ACRs), defined as contiguous amino-acid sequence segments predating the coelomate radiation 500-600 million years ago. ... I have estimated the total number and assembled a repertoire of these ancestral sequences from an analysis of the Swiss-Prot (21.0) protein database. ... This small 'ancestral' subset thus constitutes a convenient resource for the fast screening and identification of new sequences (for instance numerous cDNA partial sequences) and the definition of motifs. The 551-representative ACR set is available on e-mail request ...." Nature (Lond ) 1993 364 1 July 19-20 0230 Green,P. Ancient Conserved Regi.. Science 93 259 (19 March) Green P; Lipman D; Hillier L; Waterston R; States D; Claverie JM Ancient Conserved Regions in New Gene Sequences and the Protein Databases Database search; USA; Region; Gene; Protein; Ancient "Sets of new gene sequences from human, nematode, and yeast were compared with each other and with a set of Escherichia coli genes in order to detect ancient evolutionarily conserved regions (ACRs) in the encoded proteins. Nearly all of the ACRs so identified were found to be homologous to sequences in the protein databases. ... It is estimated that there are fewer than 900 ACRs in all." Science 1993 259 19 March 1711-1716 0231 Schneider,T.D A Design for Computer .. Nucleic Acids R 82 10(9):3013-302 Schneider TD; Stormo GD; Haemer JS; Gold L A Design for Computer Nucleic-Acid-Sequence Storage, Retrieval, and Manipulation Sequence database; Database search; Management; Program; USA; Retrieval "We have designed and built a data-base system for the storage of nucleic-acid sequences. The system consists of a data base ('the library') and software that manages and provides access to that data base ('the librarian')." Nucleic Acids Res 1982 10 9 3013-3024 0232 Breslauer,D. A Lower Bound for Para.. SIAM J.Comput.
92 21(5):856-862 Breslauer D; Galil Z A Lower Bound for Parallel String Matching String match; Parallel; USA; Complexity "This paper presents an Ω(log log m) lower bound on the number of rounds necessary for finding occurrences of a pattern string P[1..m] in a text string T[1..2m] in parallel using m comparisons in each round. This bound is within a constant factor of the fastest algorithm for this problem [Breslauer, Galil (1990)] and also holds for an m-processor CRCW-PRAM in the case of a general alphabet. Consequently, the paper derives the parallel complexity of the string matching problem using p processors for general alphabets ...." SIAM J Comput 1992 21 5 856-862 0233 Boguski,M.S. On Computer-Assisted A.. J.Lipid Res. 86 27:1011-1034 Boguski MS; Freeman M; Elshourbagy NA; Taylor JM; Gordon JI On Computer-Assisted Analysis of Biological Sequences: Proline Punctuation, Consensus Sequences, and Apolipoprotein Repeats Sequence alignment; Sequence comparison; Database search; Consensus sequence; Structure; Review; USA; Sequence analysis; Repeat "We describe a number of computer methods that have been applied to the analysis of apolipoprotein sequences. We discuss the suitability of these methods for particular problems, how the choice of initial 'parameters' can affect the results, and what the results can tell us about protein or gene sequences. We also identify some outstanding problems of apolipoprotein sequence analysis where further work is needed." J Lipid Res 27 27 1011-1034 0234 Tsai,W.H. Attributed String Matc.. IEEE Trans.Patt 85 7(4):453-462 Tsai WH; Yu SS Attributed String Matching with Merging for Shape Recognition String match; Sequence proximity; CN; Segment; Recognition "Each attributed string is an ordered sequence of shape boundary primitives, each representing a basic boundary structural unit, line segment, with two types of numerical attributes, length and direction.
A new type of primitive edit operation, called merge, is then introduced, which can be used to combine and then match any number of consecutive boundary primitives in one shape with those in another. The resulting attributed string matching with merging approach is shown useful for recognizing distorted shapes." IEEE Trans Patt Anal Mach Intell 1985 7 4 453-462 0235 Baeza-Yates,R On Boyer-Moore Automata Algorithmica 94 12(4/5):268-29 Baeza-Yates RA; Choffrut C; Gonnet GH On Boyer-Moore Automata String search; Pattern match; Automata; Boyer-Moore; CL "The notion of Boyer-Moore automaton was introduced by Knuth, Morris, and Pratt in their historical paper on fast pattern matching. It leads to an algorithm that requires more preprocessing but is more efficient than the original Boyer-Moore's algorithm. We formalize the notion of Boyer-Moore automaton and we give an efficient building algorithm. Also, bounds on the number of states are presented, and the concept of potential of a transition is introduced to improve the worst- and average-case behavior of these machines." Algorithmica 1994 12 4/5 268-292 0236 Zweig,S.E. Analysis of Large Nucl.. Nucleic Acids R 84 12(1):767-776 Zweig SE Analysis of Large Nucleic Acid Dot Matrices on Small Computers Sequence comparison; Dot; Program; USA; Compression "A UCSD Pascal program was developed which can analyze nucleic acid dot matrices of up to 9500 x 9500 in size on the Apple II computer. Although matrices of such size consume large amounts of computer memory, this program minimizes these problems by analyzing only small strips of the matrix at a time, and then transferring the results to a floppy disk or printer. Compression and memory efficient code further enhance the size of the matrix that can be analyzed." Nucleic Acids Res 1984 12 1 767-776 0237 Stephens,J.C. Statistical Methods of.. Mol.Biol.Evol. 
85 2(6):539-556 Stephens JC Statistical Methods of DNA Sequence Analysis: Detection of Intragenic Recombination or Gene Conversion Sequence analysis; Significance; Statistical; Phylogeny; USA; Gene; DNA; Recombination; Detection "Simple but exact statistical tests for detecting a cluster of associated nucleotide changes in DNA are presented. The tests are based on the linear distribution of a set of s sites among a total of n sites, where the s sites may be the variable sites, sites of insertion/deletion, or categorized in some other way. These tests are especially useful for detecting gene conversion and intragenic recombination in a sample of DNA sequences." Mol Biol Evol 1985 2 6 539-556 0238 Shepherd,J.C. Ancient Patterns in Nu.. Methods Enzymol 90 183:180-192 Shepherd JCW Ancient Patterns in Nucleic Acid Sequences Region; UK; Coding; Frame; Nucleic acid; Ancient "Here a brief summary is given of some of the evidence and reasoning leading to the conclusion that remnants of a primeval coding system still exist in present-day DNA sequences from all types of living organisms. ... A simple computer program is then described which looks for remnants of these primeval messages. Not only is it useful as a quick method of analysis of newly determined sequences by predicting the likely reading frame and, in some cases, the extent of existing genes, but it can be a guide to the nature of the genes and their past history." Methods Enzymol 1990 183 180-192 0239 Hunt,L.T. Usefulness of the PIR .. Methods in Pr..
91Birkhauser Hunt LT Usefulness of the PIR Database for Protein Comparisons Jornvall H Hoog JO; Gustavsson AM Methods in Protein Sequence Analysis Sequence database; Sequence analysis; USA; Sequence comparison; Protein; PIR "Innovative options being developed for the protein sequence databases of the PIR-International will aid sequence analysis by providing more rapid access to new data, facilitating information retrieval, incorporating new types of information and data representations, and adding to the programs for searching, comparison, and prediction. Protocols for sequence analysis are briefly outlined." Birkhauser Basel 1991 343-352 0240 Bleasby,A.J. Construction of Valida.. Protein Eng. 90 3(3):153-159 Bleasby AJ; Wootton JC Construction of Validated, Non-redundant Composite Protein Sequence Databases Sequence database; UK; Protein "A strategy has been developed for the construction of a validated, comprehensive composite protein sequence database. Entries are amalgamated from primary source data bases by a largely automated set of processes in which redundant and trivially different entries are eliminated. A modular approach has been adopted to allow scientific judgement to be used at each stage of database processing and amalgamation." Protein Eng 1990 3 3 153-159 0241 Chothia,C. The Relation Between t.. EMBO J. 86 5(4):823-826 Chothia C; Lesk AM The Relation Between the Divergence of Sequence and Structure in Proteins Sequence comparison; Structure; UK; Divergence; Protein "Here we report a systematic comparison of structures from eight different protein families. This shows that the extent of the structural changes is directly related to the extent of the sequence changes." EMBO J 1986 5 4 823-826 0242 Dayhoff,M.O. A Model of Evolutionar.. Atlas of Prot.. 
72National Biomed Dayhoff MO; Eck RV; Park CM A Model of Evolutionary Change in Proteins Dayhoff MO Atlas of Protein Sequence and Structure, 1972, Volume 5 Sequence proximity; Substitution; PAM; Scoring; USA; Protein; Model "What mutations are most likely to be accepted? Which amino acids are least likely to change? How does the passage of time affect the similarity of related protein sequences?" National Biomedical Research Foundation Washington, DC 1972 89-99 0243 Dayhoff,M.O. A Model of Evolutionar.. Atlas of Prot.. 68National Biomed Dayhoff MO; Eck RV A Model of Evolutionary Change in Proteins Dayhoff MO Eck RV Atlas of Protein Sequence and Structure, 1967-68 Sequence proximity; Substitution; PAM; Scoring; USA; Evolutionary distance; Protein; Model Accepted Point Mutations. Mutability of Amino Acids. Amino Acid Frequencies in the Mutation Data. Mutation Probability Matrix for the Evolutionary Distance of Two PAMs. Simulation of the Mutational Process. Mutation Probability Matrices. Estimation of Evolutionary Distance. Relatedness Odds Matrix. Computing Relationships Between Sequences. National Biomedical Research Foundation Silver Spring, MD 1968 33-41 0244 Levin,J.M. An Algorithm for Secon.. FEBS Lett. 86 205(2):303-308 Levin JM; Robson B; Garnier J An Algorithm for Secondary Structure Determination in Proteins Based on Sequence Similarity Sequence proximity; Structure; FR; Scoring; Similarity; Protein; Secondary; Algorithm "A secondary structure prediction algorithm is proposed on the hypothesis that short homologous sequences of amino acids have the same secondary structure tendencies. Comparisons are made with the secondary structure assignments of Kabsch and Sander from X-ray data ... and an empirically determined similarity matrix which assigns a sequence similarity score between any two sequences of 7 residues in length. This similarity matrix differs in many respects from that of the Dayhoff substitution matrix ...." 
FEBS Lett 1986 205 2 303-308 0245 Pevzner,P.A. l-Tuple DNA Sequencing.. J.Biomol.Struct 89 7(1):63-73 Pevzner PA l-Tuple DNA Sequencing: Computer Analysis Supersequence; Shortest common; Reconstruct; RU; DNA; Sequencing "In the present paper a necessary and sufficient condition for testing the uniqueness of sequence reconstruction is obtained and an efficient reconstruction algorithm is proposed. ... The problem of l-tuple DNA sequencing is a particular case of the problem about minimal superword: for a given set of words S one has to find the word of minimal length containing all words of the set S. Gallant et al. (1980) proved the NP-completeness of this problem. Hence one can hope to obtain the efficient algorithms for its solution only in particular cases." J Biomol Struct & Dyn 1989 7 1 63-73 0246 Suen,C.Y. n-Gram Statistics for .. IEEE Trans.Patt 79 1(2):164-172 Suen CY n-Gram Statistics for Natural Language Understanding and Text Processing Retrieval; N-gram; Statistical; CA; Language "n-Gram (n = 1 to 5) statistics and other properties of the English language were derived for applications in natural language understanding and text processing. They were computed from a well-known corpus composed of 1 million word samples. ... The positional distributions of n-grams obtained in the present study are discussed. Statistical studies on word length and trends of n-gram frequencies versus vocabulary are presented. In addition to a survey of n-gram statistics found in the literature, a collection of n-gram statistics obtained by other researchers is reviewed and compared." IEEE Trans Patt Anal Mach Intell 1979 1 2 164-172 0247 Kabsch,W. On the Use of Sequence.. 
Proc.Nat.Acad.S 84 81:1075-1078 Kabsch W; Sander C On the Use of Sequence Homologies to Predict Protein Structure: Identical Pentapeptides can have Completely Different Conformations Sequence comparison; Structure; DE; Homology; Protein "Pentapeptide structure within a protein is strongly dependent on sequence context, a fact essentially ignored in most protein structure prediction methods: just considering the local sequence of five residues is not sufficient to predict correctly the local conformation (secondary structure). ... Also, we are warned that in the growing practice of comparing a new protein sequence with a data base of known sequences, finding an identical pentapeptide sequence between two proteins is not a significant indication of structural similarity or of evolutionary kinship." Proc Nat Acad Sci USA 1984 81 1075-1078 0248 Van Bockstael Sequence Representation Biochimie 85 67:509-516 Van Bockstaele F Sequence Representation Representation; FR; Sequence analysis; Linguistic; Segment "This article deals with the definition of a method for analyzing sequences of symbols, especially biological sequences. We are mostly interested in finding representations of sequences, that could help to explicate relationship between their structure and their activity. Starting with automatically built rules, governing occurrences of symbols within sequences, we define ways of using these rules to determine different subsequences that we assume to be contexts. Labelled contexts provide a possible representation of sequences." Biochimie 1985 67 509-516 0249 Raiha,K.J. The Shortest Common Su.. Theoret.Comput. 81 16:187-198 Raiha KJ; Ukkonen E The Shortest Common Supersequence Problem over Binary Alphabet is NP-complete Supersequence; Complexity; FI; Shortest common "We consider the complexity of the Shortest Common Supersequence (SCS) problem .... The SCS problem is shown to be NP-complete for strings over an alphabet of size >= 2."
Theoret Comput Sci 1981 16 187-198 0250 Benson,G. A Space Efficient Algo.. Lecture Notes i 94 807:1-14 Benson G A Space Efficient Algorithm for Finding the Best Non-Overlapping Alignment Score Sequence alignment; Repeat; Region; USA; Score; Algorithm 5th Annual Symposium, CPM 94. Asilomar, June 5-8, 1994. Proceedings. "Repeating patterns make up a significant fraction of DNA and protein molecules. These repeating regions are important to biological function because they may act as catalytic, regulatory or evolutionary sites and because they have been implicated in human disease. .... In this paper, we present a space efficient algorithm for finding the maximum alignment score for any two substrings of a single string T under the condition that the substrings do not overlap. In a biological context, this corresponds to the largest repeating region in the molecule." Lecture Notes in Comput Sci 1994 807 1-14 0251 Chao,K.M. Computing all Suboptim.. Lecture Notes i 94 807:31-42 Chao KM Computing all Suboptimal Alignments in Linear Space Sequence alignment; Suboptimal; USA 5th Annual Symposium, CPM 94. Asilomar, June 5-8, 1994. Proceedings. "Recently, a new compact representation for suboptimal alignments was proposed by Naor and Brutlag (1993). The kernel of that representation is a minimal directed acyclic graph (DAG) containing all suboptimal alignments. In this paper, we propose a method that computes such a DAG in space linear to the graph size. ... To exploit the computed DAG, we employ a variant of Aho-Corasick pattern matching machine ... to locate all occurrences of specified patterns, and then find a path in the DAG that maximizes the sum of the scores of the non-overlapping patterns occurring in it." Lecture Notes in Comput Sci 1994 807 31-42 0252 Bafna,V. Approximation Algorith..
Lecture Notes i 94 807:43-53 Bafna V; Lawler EL; Pevzner PA Approximation Algorithms for Multiple Sequence Alignment Multiple alignment; Approximation; Sequence alignment; USA; Algorithm 5th Annual Symposium, CPM 94. Asilomar, June 5-8, 1994. Proceedings. "We consider the problem of aligning of k sequences of length n. The cost function is sum of pairs, and satisfies triangle inequality. ... We generalize this approach to assemble an alignment of k sequences from optimally aligned subsets of l < k sequences to obtain an improved performance guarantee. For arbitrary l < k, we devise deterministic and randomized algorithms yielding performance guarantees of 2 - l/k. For fixed l, the running times of these algorithms are polynomial in n and k." Lecture Notes in Comput Sci 1994 807 43-53 0253 Huang,X. A Context Dependent Me.. Lecture Notes i 94 807:54-63 Huang X A Context Dependent Method for Comparing Sequences Sequence comparison; Sequence proximity; Pairwise alignment; USA 5th Annual Symposium, CPM 94. Asilomar, June 5-8, 1994. Proceedings. "A scoring scheme is presented to measure the similarity score between two biological sequences, where matches are weighted dependent on their context. The scheme generalizes a widely used scoring scheme. A dynamic programming algorithm is developed to compute a largest-scoring alignment of two sequences .... Also developed is an algorithm for computing a largest-scoring local alignment between two sequences in quadratic time and linear space." Lecture Notes in Comput Sci 1994 807 54-63 0254 Cobbs,A.L. Fast Identification of.. Lecture Notes i 94 807:64-74 Cobbs AL Fast Identification of Approximately Matching Substrings Approximate match; Identification; USA 5th Annual Symposium, CPM 94. Asilomar, June 5-8, 1994. Proceedings. "We give an efficient algorithm for finding all maximal matches between [two strings]. The algorithm runs in time bounded by the sum of the lengths of the maximal matches ....
The main application is identifying homologous regions of protein sequences." Lecture Notes in Comput Sci 1994 807 64-74 0255 Huang,X. Parametric Recomputing.. Lecture Notes i 94 807:87-101 Huang X; Pevzner PA; Miller W Parametric Recomputing in Alignment Graphs Sequence alignment; Parametric; Graph; USA 5th Annual Symposium, CPM 94. Asilomar, June 5-8, 1994. Proceedings. "DNA/protein sequence alignments in computational molecular biology depend heavily on the settings of penalties for substitutions, insertions/deletions and gaps. Inappropriate choice of parameters causes irrelevant matches ('noise') to be reported .... This paper provides a computational underpinning for such iterative noise filtration in alignment graphs. Our main results assume that a preliminary noisy alignment, computed with reasonable but ad hoc parameters, is given; the problem is to modify the parameters to reduce noise." Lecture Notes in Comput Sci 1994 807 87-101 0256 Manber,U. A Text Compression Sch.. Lecture Notes i 94 807:113-124 Manber U A Text Compression Scheme that Allows Fast Searching Directly in the Compressed File Sequence search; Compression; String match; USA 5th Annual Symposium, CPM 94. Asilomar, June 5-8, 1994. Proceedings. "A new text compression scheme is presented in this paper. The main purpose of this scheme is to speed up string matching by searching the compressed file directly. The scheme requires no modification of the string-matching algorithm, which is used as a black box; any string-matching procedure can be used. Instead, the pattern is modified; only the outcome of the matching of the modified pattern against the compressed file is decompressed. Since the compressed file is smaller than the original file, the search is faster both in terms of I/O time and processing time than a search in the original file." Lecture Notes in Comput Sci 1994 807 113-124 0257 Lestree,L. Unit Route Upper Bound..
Lecture Notes i 94 807:136-145 Lestree L Unit Route Upper Bound for String-Matching on Hypercube String match; Parallel; FR 5th Annual Symposium, CPM 94. Asilomar, June 5-8, 1994. Proceedings. "We give here an algorithm of string matching on a hypercube with constant memory .... This algorithm is very close to the lower bound of the problem for this architecture. ... The model chosen here is a SIMD hypercube with free communication." Lecture Notes in Comput Sci 1994 807 136-145 0258 Kosaraju,S.R. Computation of Squares.. Lecture Notes i 94 807:146-150 Kosaraju SR Computation of Squares in a String (Preliminary Version) Regularities; Search tree; Square; USA 5th Annual Symposium, CPM 94. Asilomar, June 5-8, 1994. Proceedings. "We design a linear time algorithm for computing a square substring from each position of a given string over a finite alphabet. The algorithm exploits several subtle properties of suffix trees for strings." Lecture Notes in Comput Sci 1994 807 146-150 0259 Alexander,K.S Shortest Common Supers.. Lecture Notes i 94 807:164-172 Alexander KS Shortest Common Superstrings for Strings of Random Letters Supersequence; Shortest common; Compression; USA 5th Annual Symposium, CPM 94. Asilomar, June 5-8, 1994. Proceedings. "Given a finite collection of strings of letters from a fixed alphabet, it is of interest, in the contexts of data compression and DNA sequencing, to find the length of the shortest string which contains each of the given strings as a consecutive substring. In order to analyze the average behavior of the optimal superstring length, substrings with a specified collection of lengths are considered with the letters selected independently at random. An asymptotic expression, as the collection of lengths becomes large, is obtained for the savings from compression ...." Lecture Notes in Comput Sci 1994 807 164-172 0260 Irving,R.W. Maximal Common Subsequ..
Lecture Notes i 94 807:173-183 Irving RW; Fraser CB Maximal Common Subsequences and Minimal Common Supersequences Longest common; Subsequence; Shortest common; Supersequence; Approximation; UK 5th Annual Symposium, CPM 94. Asilomar, June 5-8, 1994. Proceedings. "Here we study the related problems of finding a minimum-length maximal common subsequence and a maximum-length minimal common supersequence. We describe dynamic programming algorithms for the case of two strings ..., which can be extended to any fixed number of strings. We also show that the minimum maximal common subsequence problem is NP-hard in general for k strings, and we prove a strong negative approximability result for this problem." Lecture Notes in Comput Sci 1994 807 173-183 0261 Breslauer,D. Dictionary-Matching on.. Lecture Notes i 94 807:184-197 Breslauer D Dictionary-Matching on Unbounded Alphabets: Uniform-Length Dictionaries Dictionary match; Multidimensional; Search tree; Italy 5th Annual Symposium, CPM 94. Asilomar, June 5-8, 1994. Proceedings. "This paper presents an efficient on-line dictionary-matching algorithm for the case where the patterns have uniform length and the input alphabet is unbounded. A tight lower bound establishes that our approach is optimal if the only access the algorithm has to the input strings is by pairwise symbol comparisons. In an immediate application, the new dictionary-matching algorithm can be used in a previously known higher-dimensional array-matching algorithm, improving the performance of this algorithm on unbounded alphabets." Lecture Notes in Comput Sci 1994 807 184-197 0262 Idury,R.M. Multiple Matching of P.. Lecture Notes i 94 807:226-239 Idury RM; Schaffer AA Multiple Matching of Parameterized Patterns Pattern match; Parameterized; Automata; USA 5th Annual Symposium, CPM 94. Asilomar, June 5-8, 1994. Proceedings. "We extend Baker's theory of parameterized pattern matching (1993) to algorithms that match multiple patterns in a text.
We first consider the case where the patterns are fixed and preprocessed once, and then the case where the pattern set can change by insertions and deletions. Baker's algorithms are based on suffix trees, whereas ours are based on pattern matching automata." Lecture Notes in Comput Sci 1994 807 226-239 0263 Akutsu,T. Approximate String Mat.. Lecture Notes i 94 807:240-249 Akutsu T Approximate String Matching with Don't Care Characters Approximate match; Match with don't cares; String match; JP 5th Annual Symposium, CPM 94. Asilomar, June 5-8, 1994. Proceedings. "This paper presents parallel and serial approximate matching algorithms for strings with don't care characters. They are based on Landau and Vishkin's approximate string matching algorithm and Fisher and Paterson's exact string matching algorithm with don't care characters. ... Several extensions are also described." Lecture Notes in Comput Sci 1994 807 240-249 0264 Chang,W.I. Approximate String Mat.. Lecture Notes i 94 807:259-273 Chang WI; Marr TG Approximate String Matching and Local Similarity Approximate match; Sequence proximity; String match; USA; Similarity 5th Annual Symposium, CPM 94. Asilomar, June 5-8, 1994. Proceedings. "In this paper, we describe how the distance-based sublinear expected time algorithm of Chang and Lawler can be extended to solve efficiently the local similarity problem. We present both a new theoretical result, polynomial-space, constant-fraction-error matching that is provably optimal, and a practical adaptation of it that produces nearly identical results as Smith-Waterman, at speedups of 2X ... or better. Further improvements are anticipated." Lecture Notes in Comput Sci 1994 807 259-273 0265 Kececioglu,J. Efficient Bounds for O.. Lecture Notes i 94 807:307-325 Kececioglu J; Sankoff D Efficient Bounds for Oriented Chromosome Inversion Distance Genome; Sequence proximity; Chromosome; Inversion; CA; Distance 5th Annual Symposium, CPM 94. Asilomar, June 5-8, 1994. Proceedings.
"We study the problem of comparing two circular chromosomes that have evolved by chromosome inversion, assuming that the order of corresponding genes is known, as well as their orientation. Determining the minimum number of inversions is equivalent to finding the minimum of reversals to sort a signed circular permutation, where a reversal takes an arbitrary substring of elements and reverses their order, as well as flipping their sign. We show that tight bounds on the minimum number of reversals can be found by simple and efficient algorithms." Lecture Notes in Comput Sci 807 807 307-325 0266 Computational Molecula.. 88Oxford Universi Computational Molecular Biology: Sources and Methods for Sequence Analysis Lesk AM BK - Sequence analysis; UK Table of contents (ix-x) and references (229-247) only. Introduction (1 chapter); databanks of protein sequences (3); databanks of nucleic acid sequences (2); databanks of three-dimensional structures (1); program systems (3); the technical background (3); scientific applications (6); prospects (1). Oxford University Press Oxford, UK 1988 1-249 0267 Chang,W.I. Sublinear Approximate .. Algorithmica 94 12(4/5):327-34 Chang WI; Lawler EL Sublinear Approximate String Matching and Biological Applications Pattern match; String match; Edit; Distance; Suffix; Approximate match; USA "Given a text string of length n and a pattern string of length m over a b-letter alphabet, the k differences approximate string matching problem asks for all locations in the text where the pattern occurs with at most k differences (substitutions, insertions, deletions). We treat k not as a constant but as a fraction of m (not necessarily constant-fraction). ... We give an algorithm that is sublinear time O( (n/m) k logb m ) when the text is random and k is bounded by the threshold m/( logb m + O(1) )." Algorithmica 1994 12 4/5 327-344 0268 Gingeras,T.R. Computer Programs for .. 
Nucleic Acids R 79 7(2):529-545 Gingeras TR; Milazzo JP; Sciaky D; Roberts RJ Computer Programs for the Assembly of DNA Sequences Supersequence; Shortest common; Reconstruct; Program; USA; DNA "A collection of user-interactive computer programs is described which aids in the assembly of DNA sequences. This is achieved by searching for the positions of overlapping common nucleotide sequences within the blocks of sequence obtained as primary data. Such overlapping segments are then melded into one continuous string of nucleotides. Strategies for determining the accuracy of the sequence being analyzed and reducing the error rate resulting from the manual manipulation of sequence data are discussed." Nucleic Acids Res 1979 7 2 529-545 0269 Taylor,W.R. A Holistic Approach to.. Protein Eng. 89 2(7):505-519 Taylor WR; Orengo CA A Holistic Approach to Protein Structure Alignment Structure; UK; Protein "A method of protein structure comparison developed previously is extended to incorporate other aspects of protein structure in addition to the inter-atomic vectors on which it was originally based. Each additional aspect ... was introduced separately and evaluated for its ability to improve alignment quality. The components were then combined, suitably weighted, to produce a more holistic comparison method." Protein Eng 1989 2 7 505-519 0270 Goldstein,L. Mapping DNA by Stochas.. Adv.Appl.Math. 87 8:194-207 Goldstein L; Waterman MS Mapping DNA by Stochastic Relaxation Digest; Mapping; USA; DNA; Stochastic "The multiple digest mapping problem arising in molecular biology can be stated roughly as follows. A linear or circular segment of DNA is cut at all occurrences of a specific short pattern by restriction enzymes. By using restriction enzymes singly and in combination it is required to construct a map showing the location of cleavage sites. In this paper we first consider the efficacy of a simulated annealing algorithm towards the solution to the multiple digest problem.
Second, the double digest problem ... is shown to admit an exponentially increasing number of solutions as a function of the length of the segment under a particular probability model. Next, the double digest problem is shown to lie in the class of NP complete problems ...." Adv Appl Math 1987 8 194-207 0271 Orengo,C.A. A Rapid Method of Prot.. J.Theor.Biol. 90 147:517-551 Orengo CA; Taylor WR A Rapid Method of Protein Structure Alignment Structure; UK; Protein "A reduction in the time required to compare two protein structures has been achieved for a previously developed structure alignment method, by reducing the number of residue pair comparisons which must be performed between the two structures." J Theor Biol 1990 147 517-551 0272 Orengo,C.A. Fast Structure Alignme.. Proteins Struct 92 14:139-167 Orengo CA; Brown NP; Taylor WR Fast Structure Alignment for Protein Databank Searching Database search; Structure; UK; Protein; Databank "A fast method is described for searching and analyzing the protein structure databank. It uses secondary structure followed by residue matching to compare protein structures and is developed from a previous structural alignment method based on dynamic programming." Proteins Struct Funct Genet 1992 14 139-167 0273 Huang,X. An Algorithm for Ident.. Comput.Appl.Bio 94 10(3):219-225 Huang X An Algorithm for Identifying Regions of a DNA Sequence that Satisfy a Content Requirement Sequence analysis; Composition; Dynamic programming; USA; Region; DNA; Algorithm "We present a dynamic programming algorithm for identifying regions of a DNA sequence that meet a user-specified compositional requirement. Applications of the algorithm include finding C+G-rich regions, locating TA+CG-deficient regions, identifying CpG islands, and finding regions rich in periodical three-base patterns. The algorithm has the advantage over the simple window method in that the algorithm shows the exact location of each identified region."
Comput Appl Biosci 1994 10 3 219-225 0274 Chou,P.Y. Prediction of the Seco.. Adv.Enzymol.Rel 78 47:45-148 Chou PY; Fasman GD Prediction of the Secondary Structure of Proteins from their Amino Acid Sequence Structure; USA; Protein; Amino acid; Prediction; Secondary Historical introduction. The Chou and Fasman predictive method. Definition of conformational regions. Refinement of conformational parameters. Application of Chou-Fasman method. Comparison of predictive methods. Computerized Chou-Fasman method. Future directions. Adv Enzymol Relat Areas Mol Biol 1978 47 45-148 0275 Rost,B. Prediction of Protein .. J.Mol.Biol. 93 232:584-599 Rost B; Sander C Prediction of Protein Secondary Structure at Better than 70% Accuracy Structure; Multiple alignment; DE; Neural; Sequence alignment; Protein; Prediction; Secondary; Accuracy "We have trained a two-layered feed-forward neural network on a non-redundant data base of 130 protein chains to predict the secondary structure of water-soluble proteins. A new key aspect is the use of evolutionary information in the form of multiple sequence alignments that are used as input in place of single sequences. The inclusion of protein family information in this form increases the prediction accuracy by six to eight percentage points." J Mol Biol 1993 232 584-599 0276 Huang,X. On Global Sequence Ali.. Comput.Appl.Bio 94 10(3):227-235 Huang X On Global Sequence Alignment Sequence alignment; Dynamic programming; USA; Gap "We present a dynamic programming algorithm for computing a best global alignment of two sequences. The proposed algorithm is robust in identifying any of several global relationships between two sequences. The algorithm delivers a best alignment of two sequences in linear space and quadratic time. We also describe a multiple alignment algorithm based on the pairwise algorithm. ...
Experimental results indicate that for a commonly used set of gap penalties, the new programs produce more satisfactory alignments on sequences of various lengths than some existing pairwise and multiple programs based on the dynamic programming algorithm of Needleman and Wunsch." Comput Appl Biosci 1994 10 3 227-235 0277 Bishop,M.J. Evolutionary Trees fro.. Proc.R.Soc.Lond 85 226:271-302 Bishop MJ; Friday AE Evolutionary Trees from Nucleic Acid and Protein Sequences Phylogeny; Evolutionary tree; Probabilistic; UK; Protein; Nucleic acid "The problem addressed is that of estimating evolutionary relationship by the comparative study of the nucleic acid or protein sequences of living organisms. The most important point made in this account is that estimation of evolutionary relationship should be based on clearly defined models the assumptions of which are open to test. The models should as far as possible conform to what is known about the processes of evolutionary change in the organisms concerned. Prevailing approaches, grouped here as divergence models, are stated below in such a way that it is clear that they involve unrealistic assumptions about the nature of evolutionary change. Emphasis is placed on the use of probabilistic models of evolutionary change." Proc R Soc Lond Ser B 1985 226 271-302 0278 Lake,J.A. A Rate-Independent Tec.. Mol.Biol.Evol. 87 4(2):167-191 Lake JA A Rate-Independent Technique for Analysis of Nucleic Acid Sequences: Evolutionary Parsimony Phylogeny; Sequence analysis; Character data; Invariant; Substitution; Parsimony; USA; Robustness; Analytical; Nucleic acid "The method of evolutionary parsimony - or operator invariants - is a technique of nucleic acid sequence analysis related to parsimony analysis and explicitly designed for determining evolutionary relationships among four distantly related taxa.
The method is independent of substitution rates because it is derived from consideration of the group properties of substitution operators rather than from an analysis of the probabilities of substitution in branches of a tree." Mol Biol Evol 1987 4 2 167-191 0279 Raita,T. Tuning the Boyer-Moore.. Software.Practi 92 22(10):879-884 Raita T Tuning the Boyer-Moore-Horspool String Searching Algorithm String match; Boyer-Moore; Regularities; Sequence search; Pattern match; FI; String search; Algorithm "Substring search is a common activity in computing. The fastest known search method is that of Boyer and Moore with the improvements introduced by Horspool. This paper presents a new implementation which takes advantage of the dependencies between the characters. The resulting code runs 25 per cent faster than the best currently-known routine." Software Practice Experience 1992 22 10 879-884 0280 Gyori,E. Stack of Pancakes Stud.Sci.Math.H 78 13:133-137 Gyori E; Turan G Stack of Pancakes Inversion; Reversal; Prefix; Genomic; HU Let p be a permutation of the number-set {1, ..., n}. Let an admissible step be the reversing of the 'end' of the sequence (permutation). ... Let f(p) be the minimal number of the admissible steps by means of which we get the permutation 1,2,...,n. Let f(n) be the maximum of f(p) over all permutations in the symmetric group Sn. Then f(n) <= (5n + 5)/3 for arbitrary n. See also Gates & Papadimitriou (1979). Stud Sci Math Hungarica 13 13 133-137 0281 Wu,S. Fast Text Searching Al.. Comm.ACM 92 35(10):83-91 Wu S; Manber U Fast Text Searching Allowing Errors Text search; Match with k differences; USA; Language; String match; Expression; Error "Many different approximate string-matching algorithms have been suggested. In this article we present a new algorithm which is very fast in practice, reasonably simple to implement, and supports a large number of variations of the approximate string-matching problem. 
The algorithm is based on a numeric scheme for exact string matching developed by Baeza-Yates and Gonnet (1992). The algorithm can handle most of the common types of queries, including arbitrary regular expressions, and several variations of closeness measures. ... [It] served as a basis for a software package for Unix called agrep, which has been in use since June 1991." Comm ACM 1992 35 10 83-91 0282 Baeza-Yates,R A New Approach to Text.. Comm.ACM 92 35(10):74-82 Baeza-Yates RA; Gonnet GH A New Approach to Text Searching Text search; Match with don't cares; Match with k mismatches; CL; String match; String search "The string-matching problem consists of finding all occurrences of a pattern of length m in a text of length n. We generalize the problem allowing don't care symbols, the complement of a symbol, and any finite class of symbols. We solve this problem for one or more patterns, with or without mismatches. For small patterns the worst-case time is linear in the size of the text (we say that a pattern is small if m is bounded by a constant)." Comm ACM 1992 35 10 74-82 0283 Barker,W.C. Detecting Distant Rela.. Atlas of Prot.. 72National Biomed Barker WC; Dayhoff MO Detecting Distant Relationships: Computer Methods and Results Dayhoff MO Atlas of Protein Sequence and Structure, 1972, Volume 5 Sequence proximity; Scoring; Significance; USA; Needleman-Wunsch "With [the mutation data] matrix and a strategy of comparison to randomly permuted sequences, we can definitely infer relationships as remote as those between cytochrome c551 of Pseudomonas and the cytochromes c of higher organisms, plant leghemoglobin and vertebrate hemoglobins, and a bacterial protease and the mammalian serine proteases. In this chapter we report the results of tests on many pairs of proteins for relationship to each other. ... We have used a computer algorithm designed by Needleman and Wunsch to test for remote relationships between proteins." 
National Biomedical Research Foundation Washington, DC 1972 101-110 0284 Cantor,C.R. The Occurrence of Gaps.. Biochem.Biophys 68 31(3):410-416 Cantor CR The Occurrence of Gaps in Protein Sequences Pairwise alignment; Gap; Significance; USA; Protein "A simple procedure has been developed to test whether a gap does in fact increase the apparent homology between two sequences. ... We have modified the procedures developed by Fitch (1966) to include the placement of a gap at every possible position in a protein sequence." Biochem Biophys Res Commun 1968 31 3 410-416 0285 Nussinov,R. Compositional Variatio.. Comput.Appl.Bio 91 7(3):287-293 Nussinov R Compositional Variations in DNA Sequences Sequence analysis; Composition; USA; Signal; N-gram; Pattern discovery; DNA "Biologically occurring nucleotide sequences differ from randomly generated ones. Here we describe general patterns found in prokaryotic and in eukaryotic DNA. In the accompanying paper (Nussinov, 1991) we also describe DNA signals recognized by their corresponding protein factors. In particular, we focus on modes of searches for such patterns and signals and on the potential properties such sequences may possess." Comput Appl Biosci 1991 7 3 287-293 0286 Nussinov,R. Some Rules in the Orde.. Nucleic Acids R 80 8(19):4545-456 Nussinov R Some Rules in the Ordering of Nucleotides in the DNA Composition; N-gram; IL; Dyad; DNA; Nucleotide "Natural DNA sequences contain distinct nearest neighbor patterns. Eukaryotic as well as prokaryotic sequences show a consistent hierarchy in the frequencies of appearance of most doublets." Nucleic Acids Res 1980 8 19 4545-4562 0287 Nussinov,R. Nearest Neighbor Nucle.. J.Biol.Chem. 81 256(16):8458-8 Nussinov R Nearest Neighbor Nucleotide Patterns: Structural and Biological Implications Composition; N-gram; USA; Codon; Dyad; Nucleotide "Recently, nearest neighbor patterns were observed in prokaryotic and eukaryotic DNA sequences. 
These are discussed with respect to some of their biological implications. It is suggested that their origins relate to different specific structures of nearest neighbor base pairs. These patterns strongly constrain the DNA sequence. As such, they 'explain' to some degree the amino acid codon choice and have direct bearing on questions related to evolution." J Biol Chem 1981 256 16 8458-8462 0288 Nussinov,R. Doublet Frequencies in.. Nucleic Acids R 84 12(3):1749-176 Nussinov R Doublet Frequencies in Evolutionary Distinct Groups Composition; N-gram; IL; Dyad "We analyze the dinucleotide frequencies of occurrence and preferences separately within the vertebrates, nonvertebrates, DNA viruses, .... Distinct patterns are observed. ... Doublets are the most basic ingredient of order in nucleotide sequences. We suggest that their preferences and the arrangement of nucleotides in the DNA in general is determined to a large extent by the conformational and packaging considerations of the double helix. Some principles of DNA conformation are viewed in light of our results." Nucleic Acids Res 1984 12 3 1749-1763 0289 Nussinov,R. Strong Doublet Prefere.. J.Mol.Evol. 84 20:111-119 Nussinov R Strong Doublet Preferences in Nucleotide Sequences and DNA Geometry Composition; N-gram; IL; Dyad; DNA; Nucleotide; Geometry "Analysis of the sequence data available today ... confirms the previously observed phenomenon that there are distinct dinucleotide preferences in DNA sequences. Consistent behaviour is observed in the major sequence groups analysed here in prokaryotes, eukaryotes and mitochondria. Some doublet preferences are common to all groups and are found in most sequences of the Los Alamos Library. The patterns seen in such large data sets are very significant statistically and biologically. Since they are present in numerous and diverse nucleotide sequences, one may infer that they confer evolutionary advantages on the organism." J Mol Evol 20 20 111-119 0290 Zhurkin,V.B. 
Periodicity in DNA Pri.. Nucleic Acids R 81 9(8):1963-1971 Zhurkin VB Periodicity in DNA Primary Structure is Defined by Secondary Structure of the Coded Protein Regularities; Structure; Linguistic; RU; Segment; Protein; DNA; Secondary "The repeating pattern of nucleotide sequences can be used for comparison of the DNA segments with low degree of homology. ... Such particular sites of DNA as promotors, origins of replication, etc. can be compared with the punctuation marks in a printed text - these are the elements, which determine the 'syntax' of the DNA language. To understand DNA functioning one should learn the laws regulating short-range order in the nucleotide sequence as well ('orthographical' laws). The periodicity found by Trifonov and Sussman [1980] is one of not numerous so far 'orthographical' laws of the DNA language ...." Nucleic Acids Res 1981 9 8 1963-1971 0291 Zhurkin,V.B. Periodicity in DNA Pri.. Stud.Biophys. 82 87(2/3):151-15 Zhurkin VB Periodicity in DNA Primary Structure and Specific Alignment of Nucleosomes Structure; RU; DNA "Reconstitution experiments with core histones have yielded preferred nucleosome positions which are dictated solely by DNA sequence. One of the explanations of this phenomenon is that the nucleosomes favour those segments of DNA which are more easily wrapped about the core. ... Here a new concept is suggested to explain the specific alignment of nucleosomes." Stud Biophys 1982 87 2/3 151-152 0292 Nussinov,R. Signals in DNA Sequenc.. Comput.Appl.Bio 91 7(3):295-299 Nussinov R Signals in DNA Sequences and their Potential Properties Sequence analysis; Signal; USA; Pattern discovery; DNA "To date, most signal searches have been focused on specific recurrences of nucleotide sequences. Much less attention has been directed towards the structure, flexibility and hydrogen-bonding patterns that recognition elements may possess. Here we review the various methods involved in such searches. 
In particular, however, we also address the searches for potential properties. In this regard it is of interest to inspect the asymmetry in the distributions of complementary oligomers near biological features." Comput Appl Biosci 1991 7 3 295-299 0293 Nussinov,R. Sequence Signals in Eu.. Crit.Rev.Bioche 90 25(3):185-224 Nussinov R Sequence Signals in Eukaryotic Upstream Regions Signal; Region; Consensus sequence; IL "Two DNA sequence elements are known to recur frequently upstream of eukaryotic polymerase II-transcribed genes. The TATAAA .... The GGCCAATCT .... Here, I discuss DNA structural considerations in upstream regions along with protein readout of the major and minor groove information content. These sequence-structure aspects are put in the general context of protein (factors)-DNA (elements) recognition and regulation." Crit Rev Biochem Mol Biol 1990 25 3 185-224 0294 Oliphant,A.R. Defining the Consensus.. Nucleic Acids R 88 16(15):7673-76 Oliphant AR; Struhl K Defining the Consensus Sequence of E. coli Promoter Elements by Random Selection Consensus sequence; USA; Selection "The consensus sequence of E. coli promoter elements was determined by the method of random selection. A large collection of hybrid molecules was produced in which random-sequence oligonucleotides were cloned in place of a wild-type promoter element, and functional -10 and -35 E. coli promoter elements were obtained by a genetic selection involving the expression of a structural gene. The DNA sequences ... for -10 and -35 elements were determined. The consensus sequences determined by this approach are very similar to those determined by comparing DNA sequences of naturally occurring E. coli promoters." Nucleic Acids Res 1988 16 15 7673-7683 0295 Volinia,S. The Frequency of Oligo.. 
Comput.Appl.Bio 89 5(1):33-40 Volinia S; Gambari R; Bernardi F; Barrai I The Frequency of Oligonucleotides in Mammalian Genic Regions Composition; Statistical; Significance; Region; Italy "We have prepared algorithms for the study of the frequency distribution of all oligonucleotides of length 2-6 in DNA sequences ... and have obtained the distribution of the ratio between the observed frequency of oligonucleotides and their expected frequency based on independent nucleotide probabilities. ... We observed that some oligonucleotides show a statistical behaviour and a regional distribution similar to that of known signal sequences. Moreover the frequency distribution of oligonucleotides of length 5 and 6 tends to become bimodal, indicating the existence of a population of very frequent oligonucleotides." Comput Appl Biosci 1989 5 1 33-40 0296 Huang,X. A Contig Assembly Prog.. Genomics 92 14:18-25 Huang X A Contig Assembly Program Based on Sensitive Detection of Fragment Overlaps Supersequence; Contig; USA; Dynamic programming; Program; Fragment; Detection "An effective computer program for assembling DNA fragments, the contig assembly program (CAP), has been developed. In the CAP program, a filter is used to eliminate quickly fragment pairs that could not possibly overlap, a dynamic programming algorithm is applied to compute the maximal-scoring overlapping alignment between each remaining pair of fragments, and a simple greedy approach is employed to assemble fragments in order of alignment scores." Genomics 14 14 18-25 0297 Dear,S. A Sequence Assembly an.. Nucleic Acids R 91 19(14):3907-39 Dear S; Staden R A Sequence Assembly and Editing Program for Efficient Management of Large Projects Management; Contig; Sequence alignment; UK; Display; Program; Editor; Editing; Sequence assembly "We describe a sequence assembly and editing program for managing large and small projects. 
It is being used to sequence complete cosmids and has substantially reduced the time taken to process the data. ... All editing is performed using a mouse operated contig editor that displays aligned sequences and their traces together on the screen. The editor ... permits rapid movement along the aligned sequences. Insertions, deletions and replacements can be made in individual aligned readings and global changes can be made by editing the consensus." Nucleic Acids Res 1991 19 14 3907-3911 0298 Peltola,H. SEQAID: A DNA Sequence.. Nucleic Acids R 84 12(1):307-321 Peltola H; Soderlund H; Ukkonen E SEQAID: A DNA Sequence Assembling Program Based on a Mathematical Model Management; Restriction; FI; Program; DNA; Model "The program automatically assembles long DNA sequences from short fragments with minimal user interaction. ... The main novel features of the system are that SEQAID implements several new well-behaved algorithms based on a mathematical model of the problem. It also utilizes available information on restriction fragments to detect illegitimate overlaps and to find relationships between separately assembled sequence blocks." Nucleic Acids Res 1984 12 1 307-321 0299 Brown,A.H.D. Analysis of Variation .. Statistical A.. 83Marcel Dekker Brown AHD; Clegg MT Analysis of Variation in Related DNA Sequences Weir BS Statistical Analysis of DNA Sequence Data Sequence analysis; AU; Genome; Statistical; DNA See Weir (1983) for the book's bibliography, pp. 231-248. "As the application of sequencing technology expands, numerous copies of a particular sequence will be available for comparison. For instance, different copies of a gene which is highly reiterated in the genome may be sequenced, or several copies of a particular single copy sequence drawn from different individuals may be analyzed. In this chapter we will consider the analysis of sequence variation in a sample of highly repeated genes. 
Our objective will be to illustrate several statistical questions which arise naturally from such data." Marcel Dekker New York, NY 1983 107-132 0300 Kanehisa,M.I. Pattern Recognition in.. Nucleic Acids R 82 10(1):265-278 Kanehisa MI; Goad WB Pattern Recognition in Nucleic Acid Sequences. II. An Efficient Method for Finding Locally Stable Secondary Structures Pattern recognition; Structure; USA; Nucleic acid; Secondary; Recognition "We present a method for calculating all possible single hairpin loop secondary structures in a nucleic acid sequence by the order of N^2 operations where N is the total number of bases. Each structure may contain any number of bulges and internal loops. Most natural sequences are found to be indistinguishable from random sequences in the potential of forming secondary structures, which is defined by the frequency of possible secondary structures calculated by the method. There is a strong correlation between the higher G+C content and the higher structure forming potential." Nucleic Acids Res 1982 10 1 265-278 0301 Korn,L.J. Computer Analysis of N.. Proc.Nat.Acad.S 77 74(10):4401-44 Korn LJ; Queen CL; Wegman MN Computer Analysis of Nucleic Acid Regulatory Sequences Sequence analysis; Program; Regularities; Repetition; USA; Dyad; Restriction A computer program designed to facilitate the analysis of nucleic acid sequences "can search several nucleotide sequences for oligonucleotides common to all of them. It can examine a DNA or RNA sequence for two kinds of homologous regions - repetitions and dyad symmetries. The homologies need not be perfect: mismatches and 'looping out' of nucleotides are allowed. The program also finds (A+T)- and (C+G)-rich regions, locates restriction enzyme recognition sites, determines the distribution of di- and trinucleotides, and performs various other functions." Proc Nat Acad Sci USA 1977 74 10 4401-4405 0302 McCallum,D. Computer Processing of.. J.Mol.Biol. 
77 116:29-30 McCallum D; Smith M Computer Processing of DNA Sequence Data Sequence analysis; Program; UK; Frame; Editor; Sequence search; DNA "In this Appendix we describe the basic features of the computer programs used in this study." Compilation and numbering of the sequence. Editing and revision. Search for specific sequences. Search for families of sequences. Translation into protein sequence simultaneously in all three reading frames. J Mol Biol 116 116 29-30 0303 Orcutt,B.C. Nucleic Acid Sequence .. Nucleic Acids R 82 10(1):157-174 Orcutt BC; George DG; Fredrickson JA; Dayhoff MO Nucleic Acid Sequence Database Computer System Sequence database; Database search; USA; Nucleic acid "On September 15, 1980, the Nucleic Acid Sequence Database Demonstration Project of the National Biomedical Research Foundation was made available to interested users through telephone access to our computer. ... The main retrieval program of the system is the nucleic acid query program (NAQ). The commands of this program and other ancillary programs of the system are designed so that similar nucleic acid sequences can be aligned readily, protein sequences or the complements of nucleic acid sequences can be constructed, stored, and aligned, and feature tables can be examined." Nucleic Acids Res 1982 10 1 157-174 0304 Queen,C.L. Computer Analysis of N.. Methods Enzymol 80 65:595-609 Queen CL; Korn LJ Computer Analysis of Nucleic Acids and Proteins Sequence analysis; Program; USA; Protein "In this article, we begin by outlining some general principles of computer utilization. Then we discuss the computer programs available for nucleic acid analysis and present in detail one comprehensive program that can be used for amino acid as well as nucleotide sequences. We conclude with a comment on the interpretation of computer-generated information." See also Korn, Queen, Wegman (1977). Methods Enzymol 65 65 595-609 0305 Staden,R. Sequence Data Handling.. 
Nucleic Acids R 77 4(11):4037-405 Staden R Sequence Data Handling by Computer Management; Sequence analysis; Program; UK "The speed of the new DNA sequencing techniques has created a need for computer programs to handle the data produced. This paper describes simple programs designed specifically for use by people with little or no computer experience. The programs are for use on small computers and provide facilities for storage, editing and analysis of both DNA and amino acid sequences." Nucleic Acids Res 1977 4 11 4037-4051 0306 Staden,R. Further Procedures for.. Nucleic Acids R 78 5(3):1013-1015 Staden R Further Procedures for Sequence Analysis by Computer Sequence analysis; Program; UK; Management "A previous paper [Staden (1977)] described programs for sequence data handling and analysis by computer. The facilities of this basic set are extended by further easily used programs." Nucleic Acids Res 1978 5 3 1013-1015 0307 Eigen,M. Statistical Geometry i.. Proc.Nat.Acad.S 88 85:5913-5917 Eigen M; Winkler-Oswatitsch R; Dress A Statistical Geometry in Sequence Space: A Method of Quantitative Comparative Sequence Analysis Multiple comparison; Significance; DE; Statistical; Sequence analysis; Segment; Invariant; Geometry "A statistical method of comparative sequence analysis that combines horizontal and vertical correlations among aligned sequences is introduced. It is based on the analysis mainly of quartet combinations of sequences considered as geometric configurations in sequence space. Numerical invariants related to relative internal segment lengths are assigned to each such configuration and statistical averages of these invariants are established. They are used for internal calibration of the topology of divergence and for quantitative determination of the noise level." Proc Nat Acad Sci USA 85 85 5913-5917 0308 Sankoff,D. Efficient Optimal Deco.. Math.Biosci. 
92 111:279-293 Sankoff D Efficient Optimal Decomposition of a Sequence into Disjoint Regions, Each Matched to Some Template in an Inventory Decomposition; CA; Region; Consensus sequence; Dynamic programming; Profile; Template; Optimal "Given an amino acid sequence, we discuss how to find efficiently an optimal set of disjoint regions (substrings, domains, modules, etc.), each of which can be matched to some element of a predefined inventory containing, for example, consensus sequences, protosequences, or protein family profiles. ... [The problem] can be solved in time quadratic in the length of the sequence and linear with the number of templates in the inventory, by a single pass of a dynamic programming algorithm." Math Biosci 111 111 279-293 0309 Auger,I.E. Algorithms for the Opt.. Bull.Math.Biol. 89 51:39-54 Auger IE; Lawrence CE Algorithms for the Optimal Identification of Segment Neighborhoods Sequence analysis; Common feature; Segment; Least squares; Likelihood; USA; Optimal; Identification; Algorithm "Two algorithms for the efficient identification of segment neighborhoods are presented. A segment neighborhood is a set of contiguous residues that share common features. Two procedures are developed to efficiently find estimates for the parameters of the model that describe these features and for the residues that define the boundaries of each segment neighborhood. The algorithms can accept nearly any model of segment neighborhood, and can be applied with a broad class of best fit functions including least squares and maximum likelihood." Bull Math Biol 51 51 39-54 0310 Smith,T.F. The History of the Gen.. Genomics 90 6:701-707 Smith TF The History of the Genetic Sequence Databases Sequence database; USA; Genetic A historical sketch with 25 references Genomics 6 6 701-707 0311 George,D.G. The Protein Identifica.. 
Nucleic Acids R 86 14(1):11-15 George DG; Barker WC; Hunt LT The Protein Identification Resource (PIR) Sequence database; USA; Coding; Identification; Protein; PIR "The Protein Identification Resource, which provides the scientific community with an efficient on-line computer system designed for the identification and analysis of protein sequences and their corresponding coding sequences, has been established. The resource consists of an integrated computer system composed of a number of protein and nucleic acid sequence databases and the software necessary to analyze this information effectively." Nucleic Acids Res 1986 14 1 11-15 0312 Chappey,C. A Method for Delineati.. Comput.Appl.Bio 92 8(3):255-260 Chappey C; Hazout S A Method for Delineating Structurally Homogeneous Regions in Protein Sequences FR; Region; Common feature; Sequence analysis; Profile; Protein "A homogeneous region in a protein sequence is a set of contiguous residues that share common features, concerning physico-chemical, structural and mutational information. This paper presents a method for identifying such homogeneous regions. From a profile describing a given type of biological information along the sequence, the algorithm allows the segmentation of the sequence by optimizing a criterion characterized by two user-defined control parameters: the 'homogenizing degree' of the regions and the 'site neighbourhood' size." Comput Appl Biosci 1992 8 3 255-260 0313 Claverie,J.M. Smoothing Profiles wit.. Comput.Appl.Bio 91 7(1):113-115 Claverie JM; Daulmerie C Smoothing Profiles with Sliding Windows: Better to Wear a Hat! Sequence analysis; Display; Profile; FR "A general way to analyze sequences is to turn them into lists of position-dependent numerical values, the graphical representation of which provides a sequence profile suitable for visual inspection and 'pattern recognition'. ... 
We examined here the merit of the 'triangular' window, which consists of associating linearly decreasing weights with the positions starting from the center ('hat' average). ... It is remarkable how such a minor change in the smoothing algorithm improves the profile readability." Comput Appl Biosci 1991 7 1 113-115 0314 Peltola,H. Algorithms for Some St.. Information P.. 83North-Holland Peltola H; Soderlund H; Tarhio J; Ukkonen E Algorithms for Some String Matching Problems Arising in Molecular Genetics Mason REA Information Processing 83. Proceedings of the IFIP 9th World Computer Congress. Paris, France, September 19-23, 1983 Composition; FI; String match; Genetic; Algorithm "With current laboratory techniques it is possible to determine the nucleotide order for relatively short fragments of a long DNA molecule while the total order for long molecules must be reconstructed from the fragments. ... We give for this problem a simple formulation as a string matching problem, and develop efficient algorithms for finding good approximate solutions." North-Holland Amsterdam 1983 59-64 0315 Gingeras,T.R. Steps Toward Computer .. Science 80 209(19 Sept.): Gingeras TR; Roberts RJ Steps Toward Computer Analysis of Nucleotide Sequences Sequence analysis; Review; USA; Clone; Nucleotide "Concomitant improvements in methods for nucleic acid sequencing have led many investigators to characterize their clones by sequencing them. This has resulted in the accumulation of such large amounts of sequence data that computer-assisted methods, with programs directed toward the manipulation of nucleic acid sequences, have become indispensable during the collection and analysis of that data. ... It is the intent of this article to report on the developing role of computer technology in this field." Science 1980 209 19 Sept. 1322-1328 0316 Stormo,G.D. Quantitative Analysis .. 
Nucleic Acids R 86 14(16):6661-66 Stormo GD; Schneider TD; Gold L Quantitative Analysis of the Relationship between Nucleotide Sequence and Functional Activity Function; Match a pattern matrix; USA; Nucleotide "Several recent papers have used matrices to evaluate nucleic acid sequences .... The different papers vary in their methods of assigning values to the elements of the matrix. In this paper we show how to use methods for solving simultaneous equations to find the matrix elements that give the best fit to a set of quantitative data." Nucleic Acids Res 1986 14 16 6661-6679 0317 Kashyap,R.L. Spelling Correction us.. Pattern Recogni 84 2(3):147-154 Kashyap RL; Oommen BJ Spelling Correction using Probabilistic Methods Correction; USA; Probabilistic; Edit "A probabilistic procedure is suggested for the automatic correction of spelling and typing errors in printed English texts. The heart of the procedure is a probabilistic model for the generation of the garbled word from the correct word. The garbler can delete or insert symbols in the word or substitute one or more symbols by other symbols. An expression is derived for P(Y | X), the probability of generating a garbled word Y from a correct word X. The model is probabilistically consistent." Pattern Recognition Lett 1984 2 3 147-154 0318 Peterson,J.L. Computer Programs for .. Comm.ACM 80 23(12):676-687 Peterson JL Computer Programs for Detecting and Correcting Spelling Errors Correction; USA; Error; Program "With the increase in word and text processing computer systems, programs which check and correct spelling will become more and more common. Peterson investigates the basic structure of several such existing programs and their approaches to solving the problems which arise when this type of program is created. The basic framework and background necessary to write a spelling checker or corrector are provided." Comm ACM 1980 23 12 676-687 0319 Riseman,E.M. A Contextual Postproce.. 
IEEE Trans.Comp 74 23(5):480-493 Riseman EM; Hanson AR A Contextual Postprocessing System for Error Correction using Binary n-Grams Correction; N-gram; USA; Pattern recognition; Error "The effectiveness of various forms of contextual information in a postprocessing system for detection and correction of errors in words is examined. Various algorithms utilizing context are considered, from a dictionary algorithm which has available the maximum amount of information, to a set of contextual algorithms utilizing positional binary n-gram statistics. ... This type of information is extremely compact and the computation for error correction is orders of magnitude less than that required by the dictionary algorithm." IEEE Trans Comput 1974 23 5 480-493 0320 Smith,T.F. Statistical Characteri.. Nucleic Acids R 83 11(7):2205-222 Smith TF; Waterman MS; Sadler JR Statistical Characterization of Nucleic Acid Sequence Functional Domains Function; USA; Statistical; Genome; Segment; Coding; Nucleic acid; Characterization "It has long been recognized that various genome classes were distinguishable on the basis of base composition and nearest neighbor frequencies. ... It is now clear that these and related statistics can uniquely characterize the various functional domains of the genome. In particular, peptide coding, intervening segments, structural RNA coding and mitochondrial domains of the vertebrate genome are uniquely characterizable. ... Here, we investigated the statistical measures most distinctive of the various domains and then linked them to our current understandings in so far as possible." Nucleic Acids Res 1983 11 7 2205-2220 0321 Liquori,A.M. Pattern Recognition of.. J.Mol.Evol. 
86 23:80-87 Liquori AM; Ripamonti A; Sadun C; Ottani S; Braga D Pattern Recognition of Sequence Similarities in Globular Proteins by Fourier Analysis: A Novel Approach to Molecular Evolution Pairwise comparison; Fourier; Pattern recognition; Segment; Italy; Similarity; Evolution; Protein; Recognition "A new algorithm is introduced for analyzing gene-duplication-independent (orthologous) and gene-duplication-dependent amino acid sequence similarities between proteins of different species. It is based on the calculation of an auto-correlation function D(x) as a Fourier series analogous to that used in crystal analysis by x-ray diffraction. ... This method allows satisfactory pattern recognition of homologies and internal duplications of an initial segment of the polypeptide chain." J Mol Evol 23 23 80-87 0322 Sankoff,D. Genomic Divergence thr.. Methods Enzymol 90 183:428-438 Sankoff D; Cedergren R; Abel Y Genomic Divergence through Gene Rearrangement Genome; Genomic; Probabilistic; Divergence; Gene; Rearrangement; CA "In this chapter we discuss simple probabilistic models for genome shuffling introduced by Sankoff and Goldstein [1989] and apply them to the assessment of relationships among a number of bacterial genomes. Lacking complete nucleotide sequences at this level, we assess our methodology on genetic map data." Methods Enzymol 183 183 428-438 0323 Hillis,D.M. Analysis of DNA Sequen.. Methods Enzymol 93 224:456-487 Hillis DM; Allard MW; Miyamoto MM Analysis of DNA Sequence Data: Phylogenetic Inference Phylogeny; Review; USA; DNA; Phylogenetic "Methods for inferring phylogeny from DNA sequences have proliferated greatly in the last few years. Unfortunately, decisions concerning which of many described methods will be used in a given study are rarely made by weighing the advantages and disadvantages of each approach; instead, issues of availability or historical inertia often dictate such choices. ... 
Our goal in this chapter is to present a practical guide to selecting a set of methods for phylogenetic analysis of nucleic acid sequences. We focus on the assumptions, advantages, disadvantages, and limitations of the various approaches. Space does not permit a description of each of the algorithms, but many of these are described in an excellent review paper by Swofford and Olsen (1990)." Methods Enzymol 224 224 456-487 0324 Zakharov,I.A. Quantitative Analysis .. Dokl.Biol.Sci. 88 301:443-447 Zakharov IA; Valeev AK Quantitative Analysis of Evolution of Mammalian Genomes by Comparison of Genetic Maps Genome; Genetic; Mapping; RU; Evolution Translated from Doklady Akademii Nauk SSSR, 301(5), 1213-1218, Aug. 1988. "The number of homologous genes with known location in different mammals is high enough to attempt to make a stricter analysis. Since no works are known to us in which an appropriate mathematical apparatus has been proposed, our first task was to formulate approaches to a quantitative analysis of similarities and differences in genetic maps and to test the possibilities of such analysis by comparing genetic maps of five species of mammals. ... Our first task was the determination of the measure of similarity between genetic maps compared." Four proximity measures are described. Dokl Biol Sci 301 301 443-447 0325 Middendorf,M. The Shortest Common No.. Theoret.Comput. 93 108:365-369 Middendorf M The Shortest Common Nonsubsequence Problem is NP-complete Nonsubsequence; DE; Complexity; Shortest common "The SCNS problem is shown to be NP-complete for strings over an alphabet of size >= 2." Theoret Comput Sci 108 108 365-369 0326 Hebrard,J.J. An Algorithm for Disti.. Theoret.Comput. 
91 82:35-49 Hebrard JJ An Algorithm for Distinguishing Efficiently Bit-Strings by their Subsequences Longest common; Sequence proximity; FR; Subsequence; Algorithm "A linear on-line algorithm for computing a shortest subsequence that distinguishes two different bit-strings is presented. The method is based on a special way of factorizing strings. ... One can also consider as a measure of similarity the greatest integer d(u,v) such that no string of length <= d(u,v) can distinguish u and v. This paper is devoted to the computation of d(u,v)." Theoret Comput Sci 82 82 35-49 0327 Claverie,J.M. k-Tuple Frequency Anal.. Methods Enzymol 90 183:237-252 Claverie JM; Sauvaget I; Bougueleret L k-Tuple Frequency Analysis: From Intron/Exon Discrimination to T-cell Epitope Mapping Sequence analysis; k-tuple; FR; N-gram; Discrimination; Mapping "There are many classes of functions for which neither a sufficient overall homology nor a discriminant functional signature can be found. ... For such problems, we have developed an alternative approach which does not depend on the a priori recognition of a single or a few specific, highly discriminant, patterns. Instead, the methods presented in this chapter take advantage of the frequencies of occurrence of all subsequences of length k (k-tuples) as computed from the sequence of interest." Methods Enzymol 183 183 237-252 0328 Searls,D.B. The Linguistics of DNA Am.Sci. 92 80(Nov.-Dec.): Searls DB The Linguistics of DNA Sequence analysis; Language; USA; Genome; Linguistic; DNA "Finding effective methods for reading the language of nucleic acids is rapidly becoming an issue of practical concern. ... Equipped with knowledge of the linguistic structure of the genome, one can endeavor to write a computer program that parses genes and other high-level features of DNA." Am Sci 1992 80 Nov.-Dec. 579-591 0329 Brendel,V. Genome Structure Descr.. 
Nucleic Acids R 84 12(5):2561-256 Brendel V; Busse HG Genome Structure Described by Formal Languages Genome; Language; Linguistic; DE; Automata; Structure "Nucleic acid sequences may be looked upon as words over the alphabet of nucleotides. Naturally occurring DNAs and RNAs form subsets of the set of all possible words. The use of formal languages is proposed to describe the structure of these subsets. Regular languages defined by finite automata are introduced to demonstrate the application of the concept on RNA-phages of group I. This approach permits a concise characterization of grammatical patterns in genetic information." Nucleic Acids Res 1984 12 5 2561-2568 0330 Collado-Vides A Transformational-Gra.. J.Theor.Biol. 89 136:403-425 Collado-Vides J A Transformational-Grammar Approach to the Study of the Regulation of Gene Expression Language; Linguistic; Genome; MEX; Expression; Gene "We propose generative grammar for constructing an integrative paradigm for the understanding of genome organization and the regulation of gene expression. Linguistic terms in molecular biology are defined. ... A general structure is presented for the grammar; the application of phrase-structure rules is justified by the existence of lexical categories. Transformational rules are utilized to represent loops of regulation. ... Finally, this approach is compared to other linguistic applications in molecular biology." J Theor Biol 136 136 403-425 0331 Head,T. Formal Language Theory.. Bull.Math.Biol. 87 49(6):737-759 Head T Formal Language Theory and DNA: An Analysis of the Generative Capacity of Specific Recombinant Behaviors Sequence analysis; Language; USA; DNA "A new manner of relating formal language theory to the study of informational macromolecules is initiated. ... The associated languages are analysed by means of a new generative formalism called a splicing system. 
A significant subclass of these languages, which we call the persistent splicing languages, is shown to coincide with a class of regular languages which have been previously studied in other contexts: the strictly locally testable languages. This study initiates the formal analysis of the generative power of recombinational behaviors in general." Bull Math Biol 1987 49 6 737-759 0332 Newberg,L.A. A Lower Bound on the N.. Adv.Appl.Math. 93 14(2):172-183 Newberg LA; Naor D A Lower Bound on the Number of Solutions to the Probed Partial Digest Problem Digest; Mapping; USA "The probed partial digestion mapping method partially digests a DNA strand with a restriction enzyme. A probe, which attaches to the DNA between two restriction enzyme cutting sites, is hybridized to the partially digested DNA, and the sizes of fragments to which the probe hybridizes are measured. The objective is to reconstruct the linear order of the restriction enzyme cutting sites from the multiset of measured lengths. ... This article shows that a multiset of N measured lengths can have as many as Ω(N^t) solutions for any t < ... 1.73." Adv Appl Math 1993 14 2 172-183 0333 Barth,G. Relating the Average-c.. Combinatorial.. 85Springer-Verlag Barth G Relating the Average-case Costs of the Brute-Force and Knuth-Morris-Pratt String Matching Algorithm Apostolico A Galil Z Combinatorial Algorithms on Words String match; Knuth-Morris-Pratt; Markov; DE; Algorithm "The main objective of this paper is to elaborate on this observation [that the Knuth-Morris-Pratt algorithm is not likely to be significantly faster than the brute-force method in most actual applications] and to present a detailed and accurate average-case analysis of both the brute-force and the KMP algorithm. The analysis exploits results from Markov chain theory. ... 
An accurate approximation for the ratio KMP/NAIVE, where KMP and NAIVE denote the average case complexities of the KMP and naive string matching algorithms, respectively, is given by the term 1 - (1/c) + (1/c^2)." Springer-Verlag Berlin 1985 45-58 0334 Odlyzko,A.M. Enumeration of Strings Combinatorial.. 85Springer-Verlag Odlyzko AM Enumeration of Strings Apostolico A Galil Z Combinatorial Algorithms on Words Enumeration; Pattern match; String match; Survey; USA "A survey is presented of some methods and results on counting words that satisfy various restrictions on subwords (i.e., blocks of consecutive symbols). Various applications to comma-free codes, games, pattern matching, and other subjects are indicated. The emphasis is on the unified treatment of those topics through the use of generating functions." Springer-Verlag Berlin 1985 205-228 0335 Main,M.G. Linear Time Recognitio.. Combinatorial.. 85Springer-Verlag Main MG; Lorentz RJ Linear Time Recognition of Squarefree Strings Apostolico A Galil Z Combinatorial Algorithms on Words Regularities; Square; USA; Recognition "This paper presents a new O(n log n) algorithm to determine whether a string of length n has a substring which is a square. The algorithm is not as general as some previous algorithms for finding all squares ..., but it does have a simplicity which the others lack. Also, for a fixed alphabet of size k, the algorithm can be improved by a factor of log_k(n), yielding an O(n) algorithm for determining whether a string contains a square." Springer-Verlag Berlin 1985 271-278 0336 Rabin,M.O. Discovering Repetition.. Combinatorial.. 85Springer-Verlag Rabin MO Discovering Repetitions in Strings Apostolico A Galil Z Combinatorial Algorithms on Words Regularities; Repetition; String match; Fingerprint; USA "In the present paper we employ the fingerprinting method to solve yet another string matching problem. Given a string y we want to find the earliest repetition, i.e. 
the shortest w and x such that y = wxxz. We shall call this the repetition problem." Springer-Verlag Berlin 1985 279-288 0337 Restivo,A. Some Decision Results .. Combinatorial.. 85Springer-Verlag Restivo A; Salemi S Some Decision Results on Nonrepetitive Words Apostolico A Galil Z Combinatorial Algorithms on Words Regularities; Repetition; Square; Italy; Word "The paper addresses some generalizations of the Thue Problem such as: given a word u, does there exist an infinite nonrepetitive overlap free (or square free) word having u as a prefix? A solution to this as well as to related problems is given for the case of overlap free words on a binary alphabet." Springer-Verlag Berlin 1985 289-295 0338 Combinatorial Algorith.. 85Springer-Verlag Combinatorial Algorithms on Words Apostolico A Galil Z BK - String match; Search tree; Enumeration; Regularities; USA; Compression; Combinatorial; Word; Algorithm Table of contents only. General (1 paper), string matching (4), subword trees (2), data compression (5), counting (4), periods and other regularities (4), miscellaneous (5). Springer-Verlag Berlin 1985 0-0 0339 Waterman,M.S. Foreword [Mathematical.. Bull.Math.Biol. 89 51(1):1-4 Waterman MS Foreword [Mathematical Analysis of Molecular Sequences. Special Issue] Sequence analysis; USA; Statistical; Region; Genome; Approximate match; Codon; Coding "The present issue is a collection of [ten] research papers in the area of mathematical analysis of molecular sequences." Approximate matching of sequences, fit models to protein sequences, statistical properties of a DNA sequence, codon preference in protein coding regions, genome comparison, large deviations for the binomial distribution. Bull Math Biol 1989 51 1 1-4 0340 Combinatorial Pattern .. 94Springer-Verlag Combinatorial Pattern Matching. 5th Annual Symposium, CPM 94. Proceedings. Lecture Notes in Computer Science, Volume 807. 
Crochemore M Gusfield D BK - Sequence alignment; Pattern match; FR; Language; Expression; Combinatorial Asilomar, June 5-8, 1994. "Combinatorial Pattern Matching addresses issues of searching and matching of strings and more complicated patterns such as trees, regular expressions, extended expressions, etc. The goal is to derive non-trivial combinatorial properties for such structures and then to exploit these properties in order to achieve superior performances for the corresponding computational problems." Springer-Verlag Berlin 1994 viii+326-0 0341 Molecular Evolution: C.. 90Academic Press Molecular Evolution: Computer Analysis of Protein and Nucleic Acid Sequences. Methods in Enzymology, Volume 183. Doolittle RF BK - Sequence database; Database search; Pattern match; Structure; Sequence alignment; Phylogeny; USA; Evolution; Protein; Nucleic acid Title page, table of contents only. Databases (4 papers), searching databases (5), patterns in nucleic acid sequences (7), predicting RNA secondary structure (3), aligning protein and nucleic acid sequences (12), estimating sequence divergence (5), phylogenetic trees (6). Academic Press San Diego 1990 1-707 0342 Angluin,D. Finding Patterns Commo.. ACM Sympos.Theo 79 11:130-141 Angluin D Finding Patterns Common to a Set of Strings (Extended Abstract) Pattern language; Pattern definition; USA "We motivate, formalize, and study a computational problem in concrete inductive inference. A 'pattern' is defined to be a concatenation of constants and variables, and the language of a pattern is defined to be the set of strings obtained by substituting constant strings for the variables. The problem we consider is, given a set of strings, find a minimal pattern language containing this set. This problem is shown to be effectively solvable in the general case and to lead to correct inference in the limit of the pattern languages. There exists a polynomial time algorithm for it in the restricted case of one-variable patterns." 
ACM Sympos Theory Comput 11 11 130-141 0343 Chrobak,M. Remarks on String-matc.. Inform.Process. 87 24(5):325-329 Chrobak M; Rytter W Remarks on String-matching and One-way Multihead Automata Pattern match; Automata; Complexity; PO; String match "The complexity of string-matching has been deeply investigated both in sequential and parallel models of computation. Since string-matching can be done in real time, there is no sense of talking about any lower bounds concerning its time complexity. However, one can ask what is the simplest possible device capable of doing string-matching." Inform Process Lett 1987 24 5 325-329 0344 Waterman,M.S. Interval Graphs and Ma.. Bull.Math.Biol. 86 48(2):189-195 Waterman MS; Griggs JR Interval Graphs and Maps of DNA Graph; Restriction; Mapping; USA; DNA "A special class of interval graphs is defined and characterized, and an algorithm is given for their construction. These graphs are motivated by an important representation of DNA called restriction maps by molecular biologists. Circular restriction maps are easily included." Bull Math Biol 1986 48 2 189-195 0345 Galil,Z. Two Fast Simulations w.. Inform.Process. 76 4(4):85-87 Galil Z Two Fast Simulations which Imply some Fast String Matching and Palindrome-Recognition Algorithms Pattern match; Regularities; USA; String match; Simulation; Algorithm "Theorems 1 and 3 imply some fast algorithms for string-matching and for palindrome-recognition which could not be derived directly by the previously known simulations." Inform Process Lett 1976 4 4 85-87 0346 Manacher,G. A New Linear-time "On-.. J.Assoc.Comput. 
75 22(3):346-351 Manacher G A New Linear-time "On-line" Algorithm for Finding the Smallest Initial Palindrome of a String Regularities; Automata; USA; Palindrome; Algorithm "Despite significant advances in linear-time scanning algorithms, particularly those based wholly or in part on either Cook's linear-time simulation of two-way deterministic pushdown automata or Weiner's algorithm, the problem of recognizing the initial leftmost nonvoid palindrome of a string in time proportional to the length N of the palindrome, examining no symbols other than those in the palindrome, has remained open. The present algorithm solves this problem, assuming that addition of two integers less than or equal to N may be performed in a single operation." J Assoc Comput Mach 1975 22 3 346-351 0347 Stephen,G.A. String Search 92 Stephen GA String Search BK - String match; Sequence proximity; Longest common; String search; UK Technical Report TR-92-gas-01, University College of North Wales, Bangor, Gwynedd, UK, 138 pp. "This report is concerned with a string searching problem which has arisen in the development of textual search methods for a proposed information processing system. The aim of the report is not to attempt to provide a definitive solution for the problem, although some ideas towards this end are presented, but rather to review existing string searching algorithms germane to the processing system in general and, in particular, to the specific problem." 1992 0348 Burkowski,F.J A Hardware Hashing Sch.. IEEE Trans.Comp 82 31(9):825-834 Burkowski FJ A Hardware Hashing Scheme in the Design of a Multiterm String Comparator Match with don't cares; Hardware; Retrieval; CA; Text search "This paper discusses the hardware design of a term detection unit which may be used in the scanning of text emanating from a serial source such as a disk or bubble memory. 
The main objective of this design is the implementation of a high performance unit which can detect any one of many terms (e.g., 1024 terms) while accepting source text at disk transfer rates." IEEE Trans Comput 1982 31 9 825-834 0349 Crochemore,M. Transducers and Repeti.. Theoret.Comput. 86 45:63-86 Crochemore M Transducers and Repetitions Regularities; Automata; FR; Search tree; Repetition "The factor transducer of a word associates to each of its factors (or subwords) their first occurrence. Optimal bounds on the size of minimal factor transducers together with an algorithm for building them are given. Analogue results and a simple algorithm are given for the case of subsequential suffix transducers. Algorithms are applied to repetition searching in words. ... Thanks to factor transducers we get an O(n) algorithm for finding a square in a word of length n on a fixed alphabet." Compare with Main, Lorentz (1984). Theoret Comput Sci 45 45 63-86 0350 Hirata,M. A Versatile Data Strin.. IEEE J.Solid-St 88 23(2):329-335 Hirata M; Yamada H; Nagai H; Takahashi K A Versatile Data String-Search VLSI Approximate match; Match with don't cares; Hardware; Automata; JP; VLSI; String search "A versatile data string-search VLSI has been described. An 8K content addressable memory and a 20K-gate finite-state automaton logic have been combined to execute data string search. This architecture allowed versatile operations, such as approximate-match and variable-length 'don't care' search at high speed." IEEE J Solid-State Circuits 1988 23 2 329-335 0351 Landau,G.M. Efficient String Match.. IEEE Sympos.Fou 85 26:126-136 Landau GM; Vishkin U Efficient String Matching in the Presence of Errors Approximate match; Error; String match; IL "Given a text of length n, a pattern of length m and an integer k, we present an algorithm for finding all occurrences of the pattern in the text, each with at most k differences. The algorithm runs in O( m^2 + k^2 n ) time. 
Given the same input we also present an algorithm for finding all occurrences of the pattern in the text, each with at most k mismatches (superfluous characters in either the text or the pattern are not allowed). This algorithm runs in O( k (m log m + n) ) time." IEEE Sympos Found Comput Sci 26 26 126-136 0352 Landau,G.M. Parallel Construction .. Lecture Notes i 87 267:314-325 Landau GM; Schieber B; Vishkin U Parallel Construction of a Suffix Tree (Extended Abstract) Search tree; Parallel; Suffix; IL Proceedings of the 14th ICALP. "Weiner's (1973) suffix tree is known to be a powerful tool for string manipulations. We present a parallel algorithm for constructing a suffix tree. The algorithm runs in O(log n) time and uses n processors. We also present applications for designing efficient parallel algorithms for several string problems." Lecture Notes in Comput Sci 267 267 314-325 0353 Cornish-Bowde Assessment of Protein .. J.Theor.Biol. 77 65:735-742 Cornish-Bowden A Assessment of Protein Sequence Identity from Amino Acid Composition Data Sequence comparison; Composition; Sequence proximity; UK; Protein; Amino acid "In this paper I shall show that for two proteins of equal length an index similar to that of Marchalonis & Weltman (1971) provides a direct and unbiased estimate of the number of differences between the two sequences. The precision of this estimate can be calculated a priori and so the reliability of deductions about ancestral relationships can be assessed. Straightforward interpretation of the indexes of Harris et al. (1969) and of Marchalonis & Weltman (1971) is now possible, because both can readily be converted into the new index with very little calculation." J Theor Biol 65 65 735-742 0354 Bishop,M.J. Preface [Nucleic Acid .. Nucleic Acid .. 
87IRL Press Bishop MJ; Rawlings CJ Preface [Nucleic Acid and Protein Sequence Analysis: A Practical Approach] Bishop MJ Rawlings CJ Nucleic Acid and Protein Sequence Analysis: A Practical Approach Sequence analysis; Database search; Sequence comparison; UK; Protein "This book is designed as a practical aid to biologists wishing to use computers for the acquisition, storage, or analysis of nucleic acid or protein sequences." IRL Press Oxford 1987 v-v 0355 Copeland,N.G. A Genetic Linkage Map .. Science 94 262 (1 Oct.):5 Copeland NG; Jenkins NA; Gilbert DJ; Eppig JT; Maltais LJ; Miller JC; Dietrich WF; Weaver A; Lincoln SE; Steen RG; Stein LD; Nadeau JH; Lander ES A Genetic Linkage Map of the Mouse: Current Applications and Future Prospects Genetic; Mapping; Genome; Evolution; Gene; USA "Technological advances have made possible the development of high-resolution genetic linkage maps for the mouse. These maps in turn offer exciting prospects for understanding mammalian genome evolution through comparative mapping, for developing mouse models of human disease, and for identifying the function of all genes in the organism." Science 1994 262 1 Oct. 57-66 0356 Staden,R. The Current Status and.. Nucleic Acids R 86 14(1):217-231 Staden R The Current Status and Portability of our Sequence Handling Software Management; Program; Sequence comparison; Dot; UK "The package contains a comprehensive suite of programs for managing large shotgun sequencing projects, a program containing 61 functions for analysing single sequences and a program for comparing pairs of sequences for similarity. ... I believe the programs will now run on any machine with a FORTRAN 77 compiler and sufficient memory." Nucleic Acids Res 1986 14 1 217-231 0357 Hamm,G.H. The EMBL Data Library Nucleic Acids R 86 14(1):5-9 Hamm GH; Cameron GN The EMBL Data Library Sequence database; EMBL; DE "The EMBL Data Library was the first internationally supported central resource for nucleic acid sequence data. 
Working in close collaboration with its American counterpart, GenBank, the library prepares and makes available to the scientific community a comprehensive collection of the published nucleic acid sequences. This paper describes briefly the contents of the database, how it is available, and possible future enhancements of Data Library services." Nucleic Acids Res 1986 14 1 5-9 0358 Chin,F.Y.L. A Fast Algorithm for C.. J.Inform.Proces 90 13(4):463-469 Chin FYL; Poon CK A Fast Algorithm for Computing Longest Common Subsequences of Small Alphabet Size Longest common; Subsequence; HK; Algorithm "This paper presents a new algorithm for [the LCS] problem .... This algorithm is particularly efficient when s (the alphabet size) is small. Different data structures are used to obtain variations of the basic algorithm that require different time and space complexities." J Inform Process 1990 13 4 463-469 0359 Aho,A.V. Efficient String Match.. Comm.ACM 75 18(6):333-340 Aho AV; Corasick MJ Efficient String Matching: An Aid to Bibliographic Search Dictionary match; USA; Pattern match; String match; Automata; Knuth-Morris-Pratt "This paper describes a simple, efficient algorithm to locate all occurrences of any of a finite number of keywords in a string of text. The algorithm consists of constructing a finite state pattern matching machine from the keywords and then using the pattern matching machine to process the text string in a single pass. ... Our approach combines the ideas in the Knuth-Morris-Pratt algorithm with those of finite state machines." Comm ACM 1975 18 6 333-340 0360 Aho,A.V. A Minimum Distance Err.. SIAM J.Comput. 
72 1(4):305-312 Aho AV; Peterson TG A Minimum Distance Error-correcting Parser for Context-free Languages Correction; Language; USA; Edit; Distance; Parser "We assume three types of syntax errors can debase the sentences of a language generated by a context-free language: the replacement of a symbol by an incorrect symbol, the insertion of an extraneous symbol, or the deletion of a symbol. We present an algorithm that will parse any input string to completion finding the fewest possible number of errors. On a random access computer the algorithm requires time proportional to the cube of the length of the input." SIAM J Comput 1972 1 4 305-312 0361 Alexandrov,N. Local Multiple Alignme.. Comput.Appl.Bio 92 8(4):339-345 Alexandrov NN Local Multiple Alignment by Consensus Matrix Multiple alignment; Gap; Consensus matrix; JP; Matrix A new algorithm for aligning several sequences based on the calculation of a consensus matrix and the comparison of all the sequences using this consensus matrix. Two modifications, corresponding to evolutionary and functional meanings of the alignment, depend on the specification of the gap penalty function. Interplay between consensus matrix and multiple alignment Comput Appl Biosci 1992 8 4 339-345 0362 Aho,A.V. Bounds on the Complexi.. J.Assoc.Comput. 76 23(1):1-12 Aho AV; Hirschberg DS; Ullman JD Bounds on the Complexity of the Longest Common Subsequence Problem Longest common; USA; Complexity; Subsequence "The difficulty of computing a longest common subsequence of two strings is examined using the decision tree model of computation, in which vertices represent 'equal-unequal' comparisons." A lower bound is the product of the lengths of the two strings J Assoc Comput Mach 1976 23 1 1-12 0363 Bairoch,A. PROSITE: A Dictionary .. 
Nucleic Acids R 91 19(Suppl.):224 Bairoch A PROSITE: A Dictionary of Sites and Patterns in Proteins Sequence database; SWI; Motif; Sequence analysis; Pattern library; Signature; Protein; PROSITE "PROSITE is a compilation of sites and patterns found in protein sequences. The use of protein sequence patterns (or motifs) to determine the function of proteins is becoming very rapidly one of the essential tools of sequence analysis. ... No attempt had been made until very recently to systematically collect biologically significant patterns or to discover new ones. It is for these reasons that we have developed, since 1988, a dictionary of sites and patterns which we call PROSITE." Nucleic Acids Res 1991 19 Suppl. 2241-2245 0364 Allison,L. A Bit-string Longest-c.. Inform.Process. 86 23(6):305-310 Allison L; Dix TI A Bit-string Longest-common-subsequence Algorithm Longest common; AU; Edit; Algorithm "A longest-common-subsequence algorithm is described which operates in terms of bit or bit-string operations. It offers a speedup of the order of the word-length on a conventional computer." Inform Process Lett 1986 23 6 305-310 0365 Allison,L. Finite-State Models in.. J.Mol.Evol. 92 35(1):77-89 Allison L; Wallace CS; Yee CN Finite-State Models in the Alignment of Macromolecules Pairwise alignment; Significance; AU; Message length; Information theory; Model "Minimum message length encoding is a technique of inductive inference with theoretical and practical advantages. It allows the posterior odds-ratio of two theories or hypotheses to be calculated. Here it is applied to problems of aligning or relating two strings, in particular two biological macromolecules." J Mol Evol 1992 35 1 77-89 0366 Allison,L. Minimum Message Length.. Bull.Math.Biol. 
90 52(3):431-453 Allison L; Yee CN Minimum Message Length Encoding and the Comparison of Macromolecules Pairwise alignment; Significance; AU; Message length; Information theory "The question of whether or not two strings are related and, if so, of how they are related and the problem of finding a good theory of string mutation are treated as inductive inference problems. The method allows the posterior odds-ratio of two string alignments or of two models of string mutation to be computed." Bull Math Biol 1990 52 3 431-453 0367 Altschul,S.F. Gap Costs for Multiple.. J.Theor.Biol. 89 138(3):297-309 Altschul SF Gap Costs for Multiple Sequence Alignment Multiple alignment; USA; Sequence alignment; Gap "A new definition of gap costs for multiple alignments is proposed and compared with previous ones. Since the new definition links a multiple alignment's cost to that of its pairwise projections, it allows knowledge gained about two-sequence alignments to bear on the multiple alignment problem. Also, such linkage is a key element of recent algorithms that have rendered practical the simultaneous alignment of as many as six sequences." J Theor Biol 1989 138 3 297-309 0368 Altschul,S.F. Amino Acid Substitutio.. J.Mol.Biol. 91 219(3):555-565 Altschul SF Amino Acid Substitution Matrices from an Information Theoretic Perspective Sequence proximity; Substitution; Information theory; USA; Scoring; Sequence comparison; Statistical; Sequence alignment; Amino acid "In the light of information theory, it is possible to express the scores of a substitution matrix in bits and to see that different matrices are better adapted to different purposes." Discusses the PAM-120, PAM-200, and PAM-250 matrices J Mol Biol 1991 219 3 555-565 0369 Altschul,S.F. Weights for Data Relat.. J.Mol.Biol. 
89 207(4):647-653 Altschul SF; Carroll RJ; Lipman DJ Weights for Data Related by a Tree Multiple alignment; Sequence weight; USA; Evolutionary tree "How can one characterize a set of data collected from different biological species, or indeed any set of data related by an evolutionary tree? The structure imposed by the tree implies that the data are not independent, and for most applications this should be taken into account. We describe strategies for weighting the data that circumvent some of the problems of dependency." J Mol Biol 1989 207 4 647-653 0370 Altschul,S.F. Significance of Nucleo.. Mol.Biol.Evol. 85 2(6):526-538 Altschul SF; Erickson BW Significance of Nucleotide Sequence Alignments: A Method for Random Sequence Permutation that Preserves Dinucleotide and Codon Usage Pairwise alignment; Significance; USA; Sequence alignment; Codon; Nucleotide; Permutation "It is important to avoid claiming that sequence similarity is the result of nucleotide order if it can be explained merely by nonrandom usage of dinucleotides and/or codons. ... This paper describes and illustrates a method that generates with equal probability all permutations with a given dinucleotide usage or dinucleotide and codon usage." Mol Biol Evol 1985 2 6 526-538 0371 Altschul,S.F. A Nonlinear Measure of.. Bull.Math.Biol. 86 48(5/6):617-63 Altschul SF; Erickson BW A Nonlinear Measure of Subalignment Similarity and its Significance Levels Subalignment; Locally optimal; USA; Significance; Pattern recognition; Similarity "A new measure of subalignment similarity is introduced. ... Previous algorithms can not use this measure to find locally optimal subalignments because, unlike Needleman-Wunsch and Sellers similarities, this measure is nonlinear. A new pattern recognition algorithm is described for finding all locally optimal subalignments of two nucleotide sequences." Bull Math Biol 1986 48 5/6 617-632 0372 Altschul,S.F. Locally Optimal Subali.. Bull.Math.Biol. 
86 48(5/6):633-66 Altschul SF; Erickson BW Locally Optimal Subalignments using Nonlinear Similarity Functions Subalignment; Sequence comparison; USA; Locally optimal; Optimal; Function; Similarity "Nonlinear similarity functions are often better than linear functions at distinguishing interesting subalignments from those due to chance. Nonlinear similarity functions useful for comparing biological sequences are developed. Several new algorithms are presented for finding locally optimal subalignments of two sequences." Bull Math Biol 1986 48 5/6 633-660 0373 Altschul,S.F. Optimal Sequence Align.. Bull.Math.Biol. 86 48(5/6):603-61 Altschul SF; Erickson BW Optimal Sequence Alignment using Affine Gap Costs Sequence proximity; Pairwise alignment; USA; Sequence alignment; Optimal; Gap "When comparing two biological sequences, it is often desirable for a gap to be assigned a cost not directly proportional to its length. If affine gap costs are employed, ... the algorithm of Gotoh (1982) finds the minimum cost of aligning two sequences in order MN steps." Since Gotoh's algorithm is flawed, the authors describe "an algorithm that finds all and only the optimal alignments." Bull Math Biol 1986 48 5/6 603-616 0374 Altschul,S.F. Significance Levels fo.. Bull.Math.Biol. 88 50(1):77-92 Altschul SF; Erickson BW Significance Levels for Biological Sequence Comparison using Non-linear Similarity Functions Subalignment; Significance; USA; Sequence comparison; Scoring; Distribution; Function; Similarity "A class of non-linear similarity functions s1 has been proposed for comparing subalignments of biological sequences. The distribution of maximal s1- alignments is well approximated by the extreme value distribution. The significance levels of s1 are studied for a variety of nucleotide frequency distributions as well as for several matrices of amino acid substitution costs." See Altschul, Erickson (1986) Bull Math Biol 1988 50 1 77-92 0375 Altschul,S.F. Basic Local Alignment .. 
J.Mol.Biol. 90 215:403-410 Altschul SF; Gish W; Miller W; Myers EW; Lipman DJ Basic Local Alignment Search Tool Subalignment; Database search; USA; Dynamic programming; Motif; Region; Locally optimal; BLAST A new method "which employs a measure based on well-defined mutation scores. It directly approximates the results that would be obtained by a dynamic programming algorithm for optimizing this measure. ... The basic algorithm ... can be ... applied in a variety of contexts including straightforward DNA and protein sequences database searches, motif searches, ... and in the analysis of multiple regions of similarity ...." J Mol Biol 1990 215 403-410 0376 Altschul,S.F. Trees, Stars, and Mult.. SIAM J.Appl.Mat 89 49(1):197-209 Altschul SF; Lipman DJ Trees, Stars, and Multiple Biological Sequence Alignment Multiple alignment; USA; Sequence alignment; Dynamic programming; Evolutionary tree "This paper presents an extension of Carrillo and Lipman's algorithm to the definition of multiple alignment cost as the cost of an evolutionary tree." SIAM J Appl Math 1989 49 1 197-209 0377 Altschul,S.F. Protein Database Searc.. Proc.Nat.Acad.S 90 87(14):5509-55 Altschul SF; Lipman DJ Protein Database Searches for Multiple Alignments Database search; USA; Statistical; Multiple alignment; Pattern recognition; Sequence comparison; Protein "By searching a database for multiple as opposed to pairwise alignments, distant relationships are much more easily distinguished from background noise. Recent statistical results permit the power of this approach to be analyzed. Given a typical query sequence, an algorithm described here permits the current protein database to be searched for three-sequence alignments in less than four minutes." Proc Nat Acad Sci USA 1990 87 14 5509-5513 0378 Aoe,J.I. An Efficient Implement..
IEEE Trans.Soft 89 15(8):1010-101 Aoe JI An Efficient Implementation of Static String Pattern Matching Machines Dictionary match; Automata; JP; Pattern match "A technique for implementing a static transition table of a string pattern matching machine which locates all occurrences of a finite number of keywords in a string is described. The approach is based on Johnson's storage and retrieval method of the transition table of a finite state machine." IEEE Trans Software Eng 1989 15 8 1010-1016 0379 Aoe,J. A Method for Improving.. IEEE Trans.Soft 84 10(1):116-120 Aoe J; Yamamoto Y; Shimada R A Method for Improving String Pattern Matching Machines Dictionary match; Automata; JP; Pattern match "This correspondence describes an efficient string pattern matching machine to locate all occurrences of any of a finite number of keywords and phrases in an arbitrary text string. Some conditions are defined on the states of the machine in order to improve the speed and size of the machine by Aho and Corasick" (1975) IEEE Trans Software Eng 1984 10 1 116-120 0380 Apostolico,A. Improving the Worst-ca.. Inform.Process. 86 23(2):63-69 Apostolico A Improving the Worst-case Performance of the Hunt-Szymanski Strategy for the Longest Common Subsequence of Two Strings Longest common; USA; Search tree; Data structure; Subsequence; Performance "The new algorithm presented here pursues a schedule of primitive operations quite close to the one inherent to the Hunt-Szymanski strategy, but with substantially enhanced efficiency. ... First, its worst case is never worse than linear in the product nm of the lengths of the two input strings. Second, its time bound does not always grow with the cardinality r of the set R of all pairs of matching positions of the input strings." Inform Process Lett 1986 23 2 63-69 0381 Apostolico,A. Remark on the Hsu-Du N.. Inform.Process. 
87 25(4):235-236 Apostolico A Remark on the Hsu-Du New Algorithm for the Longest Common Subsequence Problem Longest common; USA; Subsequence; Algorithm "One of the time bounds claimed for a recent algorithm [Hsu, Du (1984)] computing the longest common subsequence of two strings is shown not to be correct. While this fact considerably affects the performance of that algorithm, it also contributes to pose a few interesting questions." Inform Process Lett 1987 25 4 235-236 0382 Apostolico,A. Efficient CRCW-PRAM Al.. Theoret.Comput. 93 108:331-344 Apostolico A Efficient CRCW-PRAM Algorithms for Universal Substring Searching String match; Parallel; USA; Complexity; Sequence search; Algorithm "Thus, in particular, searching for any substring of a pattern of size m in any substring of a text of size n can be done in constant time with at most n + m processors, once both the text and the pattern have been put in standard form at a cost of O((n + m) log n) operations. This has the same global complexity as the early algorithm in [Galil (1985)], which, however, handled only one definite pattern at a time." Theoret Comput Sci 1993 108 331-344 0383 Apostolico,A. Fast Linear-space Comp.. Theoret.Comput. 92 92(1):3-17 Apostolico A; Browne S; Guerra C Fast Linear-space Computations of Longest Common Subsequences Longest common; USA; Complexity; Subsequence "This paper reviews linear-space LCS computations in connection with two classical paradigms originally designed to take less than quadratic time in favorable circumstances. The objective is to achieve the space reduction without alteration of the asymptotic time complexity of the original algorithm." One suits cases where the LCS is expected to be close to the shortest input string; another suits cases where one input is much shorter than the other Theoret Comput Sci 1992 92 1 3-17 0384 Apostolico,A. The Boyer-Moore-Galil .. SIAM J.Comput.
86 15(1):98-105 Apostolico A; Giancarlo R The Boyer-Moore-Galil String Searching Strategies Revisited Boyer-Moore; USA; Pattern match; String match; String search "Based on the Boyer-Moore-Galil approach, a new algorithm is proposed which requires a number of character comparisons bounded by 2n, regardless of the number of occurrences of the pattern in the text string. Preprocessing is only slightly more involved and still requires a time linear in the pattern size." SIAM J Comput 1986 15 1 98-105 0385 Apostolico,A. The Longest Common Sub.. Algorithmica 87 2:315-336 Apostolico A; Guerra C The Longest Common Subsequence Problem Revisited Longest common; USA; Data structure; Dynamic programming; Subsequence "This paper re-examines, in a unified framework, two classic approaches [of Hirschberg and Hunt-Szymanski] to the problem of finding a longest common subsequence (LCS) of two strings, and proposes faster implementations for both." Algorithmica 1987 2 315-336 0386 Apostolico,A. Parallel Construction .. Algorithmica 88 3:347-365 Apostolico A; Iliopoulos C; Landau GM; Schieber B; Vishkin U Parallel Construction of a Suffix Tree with Applications Match with k differences; Parallel; USA; Search tree; String match; Regularities; Suffix "In this paper a CRCW parallel RAM algorithm is presented that constructs the suffix tree associated with a string of n symbols in O(log n) time with n processors. ... Efficient parallel procedures are also given for some string problems that can be solved with suffix trees." On-line string matching. String matching with k differences Algorithmica 1988 3 347-365 0387 Argos,P. A Sensitive Procedure .. J.Mol.Biol. 87 193(2):385-396 Argos P A Sensitive Procedure to Compare Amino Acid Sequences Pairwise alignment; DE; Segment; Subalignment; Scoring; Amino acid "Methods are discussed that provide sensitive criteria for detection of weak sequence homologies.
They are based on the Dayhoff relatedness odds amino acid exchange matrix and certain residue physical characteristics. The search procedure uses several residue probe lengths in comparing all possible segments of two protein sequences ... Alignments are automatically effected using the highest search matrix values and without the necessity of gap penalties." J Mol Biol 1987 193 2 385-396 0388 Argos,P. Sensitivity Comparison.. Methods Enzymol 90 183:352-365 Argos P; Vingron M Sensitivity Comparisons of Protein Amino Acid Sequences Pairwise alignment; Multiple alignment; Dot; DE; Gap; Protein; Amino acid The classic alignment procedure for amino acid sequences is flawed, especially when alignments have amino acid identity at 35% or less of the matched positions, since multiple optimal alignments usually exist and are sensitive to choices of gap penalties. The authors describe strategies to overcome these problems for comparisons of two or multiple sequences. See Argos (1987), Rechid, Vingron, and Argos (1989), and Vingron and Argos (1989) for details Methods Enzymol 1990 183 352-365 0389 Argos,P. Protein Sequence Compa.. Protein Eng. 91 4(4):375-383 Argos P; Vingron M; Vogt G Protein Sequence Comparison: Methods and Significance Database search; Review; DE; Sequence comparison; Significance; Sequence alignment; Protein Single amino acid comparisons. Using sequence fragments for comparisons. Multiple sequence alignment. Problems. Comparison of methods. Recommendations. Future needs Protein Eng 1991 4 4 375-383 0390 Golding,B. Exploratory Analysis o.. Comput.Appl.Bio 94 10(3):243-247 Golding B Exploratory Analysis of Multiple Sequence Alignments using Phylogenies Multiple alignment; Significance; Phylogeny; CA; Sequence alignment; Region "Multiple alignment algorithms may produce an alignment between sequences even when they have little homology with other sequences. A program is presented that makes use of a phylogeny to explore the implications of an alignment. ...
The program also permits randomization of subsections of the sequences to determine the significance of the multiple alignment for these individual regions. The combination of these two simple methods permits rapid and interactive exploration of multiple sequence alignments." Comput Appl Biosci 1994 10 3 243-247 0391 Arratia,R. The Erdos-Renyi Law in.. Ann.Statist. 90 18(2):539-570 Arratia R; Gordon L; Waterman MS The Erdos-Renyi Law in Distribution, for Coin Tossing and Sequence Matching Pairwise comparison; Significance; USA; Statistical; Segment; Distribution "We consider the simplest problem of possible statistical interest, matching segments from two independent sequences of independent identically distributed letters. Surprisingly ... we shall see that even such a naive formulation might be useful in a biological context. Our main results ... give the asymptotic distribution of unusually rich matches between independent sequences." Ann Statist 1990 18 2 539-570 0392 Arratia,R. Critical Phenomena in .. Ann.Probab. 85 13(4):1236-124 Arratia R; Waterman MS Critical Phenomena in Sequence Matching Pairwise comparison; Significance; USA; Sequence match; Markov; Longest common "We give a generalization of the result of Erdos and Renyi on the length of the longest head run in the first n tosses of a coin. ... The results generalize to more than two sequences and to Markov chains. A strong law of large numbers is given for the proportion of letters within the longest matching word; the limiting proportion exhibits critical behavior ...." Ann Probab 1985 13 4 1236-1249 0393 Arratia,R. The Erdos-Renyi Strong.. Ann.Probab. 89 17(3):1152-116 Arratia R; Waterman MS The Erdos-Renyi Strong Law for Pattern Matching with a Given Proportion of Mismatches Pairwise comparison; Significance; USA; Pattern match; Longest common; Markov "Consider two random sequences ... of i.i.d. letters in which the probability that two distinct letters match is p > 0. 
For each value a between p and 1, the length of the longest contiguous matching between the two sequences, requiring only a proportion a of corresponding letters to match, satisfies a strong law analogous to the Erdos-Renyi law for coin tossing." Ann Probab 1989 17 3 1152-1169 0394 Attwood,T.K. Multiple Sequence Alig.. Gene 91 98:153-159 Attwood TK; Eliopoulos EE; Findlay JBC Multiple Sequence Alignment of Protein Families Showing Low Sequence Homology: A Methodological Approach Using Database Pattern-matching Discriminators for G-protein-linked Receptors Multiple alignment; UK; Sequence alignment; Pattern match; Discrimination; Profile; Homology; Protein "The approach ... does not rely on 3D-structure alignments, it does not include explicit definitions of secondary structure positions, and neither does it introduce gap penalties. The method relies instead on building up a character profile for each position in the discriminators, ... and is used in a qualitative manner to aid multiple sequence alignment." Gene 1991 98 153-159 0395 Bacon,D.J. Multiple Sequence Alig.. J.Mol.Biol. 86 191:153-161 Bacon DJ; Anderson WF Multiple Sequence Alignment Multiple alignment; Segment; CA; Sequence alignment; Statistical A multiple sequence alignment algorithm which is based on finding common subsequences J Mol Biol 1986 191 153-161 0396 Bacon,D.J. Multiple Sequence Comp.. Methods Enzymol 90 183:438-447 Bacon DJ; Anderson WF Multiple Sequence Comparison Multiple alignment; Segment; CA; Sequence comparison; Statistical Concerns "the problem of finding weak similarities or distant relationships among proteins for which only the sequences are known. Comparing just two sequences at a time by current methods does not allow for a sufficiently sensitive test of similarity. ... Simultaneous intercomparison of several sequences, on the other hand, often yields a significantly nonrandom signal that in turn provides a statistical basis for assertions about structure and/or function."
Methods Enzymol 1990 183 438-447 0397 Baeza-Yates,R Improved String Search.. Software.Practi 89 19(3):257-271 Baeza-Yates RA Improved String Searching String search; Boyer-Moore; CA; String match; Text search; Pattern match "We show that it is possible to improve the average time of the Boyer-Moore string matching algorithm using more space." Software Practice Experience 1989 19 3 257-271 0398 Baeza-Yates,R String Searching Algor.. Lecture Notes i 89 382:75-96 Baeza-Yates RA String Searching Algorithms Revisited String match; CA; Data structure; Boyer-Moore; Knuth-Morris-Pratt; String search; Algorithm Dehne,F., Sack,J.R., Santoro,N. (Eds.), Algorithms and Data Structures, Workshop WADS '89, Ottawa, Canada, 17-19 August 1989. "We present bounds for the average case of the Knuth-Morris-Pratt (KMP) algorithm and the Boyer-Moore-Horspool algorithm for random text. ... We also present a hybrid algorithm which combines the KMP and BMH algorithms, and which, in practice, is faster than the Boyer-Moore algorithm." Lecture Notes in Comput Sci 1989 382 75-96 0399 Baeza-Yates,R Average Running Time o.. Theoret.Comput. 92 92(1):19-31 Baeza-Yates RA; Regnier M Average Running Time of the Boyer-Moore-Horspool Algorithm String match; Boyer-Moore; CL; String search; Algorithm "We study Boyer-Moore-type string searching algorithms. We analyze the Horspool's variant. The searching time is linear. An exact expression of the linearity constant is derived and is proven to be asymptotically a, 1/c <= a <= 2/(c + 1), where c is the cardinality of the alphabet." Theoret Comput Sci 1992 92 1 19-31 0400 Bains,W. MULTAN: A Program to A.. Nucleic Acids R 86 14(1):159-177 Bains W MULTAN: A Program to Align Multiple DNA Sequences Multiple alignment; Consensus sequence; UK; Program; DNA The author describes a heuristic, iterative algorithm to align nucleic acid sequences. A basic step of the algorithm generates a consensus sequence from the current alignment of sequences.
Seven rules identify a consensus result at a position. Interplay between consensus sequence and multiple alignment Nucleic Acids Res 1986 14 1 159-177 0401 Bains,W. MULTAN (2), a Multiple.. Comput.Appl.Bio 89 5(1):51-52 Bains W MULTAN (2), a Multiple String Alignment Program for Nucleic Acids and Proteins Multiple alignment; Consensus sequence; UK; Program; Protein; Nucleic acid "Here I describe a generalization of Bains' (1986) MULTAN program to align any sequences made from an alphabet of <= 64 characters." Interplay between consensus sequence and multiple alignment Comput Appl Biosci 1989 5 1 51-52 0402 Bairoch,A. SEQANALREF: A Sequence.. Comput.Appl.Bio 91 7(2):268-268 Bairoch A SEQANALREF: A Sequence Analysis Bibliographic Reference Data Bank Sequence analysis; Bibliography; SWI "The majority of entries belong to one of the following categories: algorithms for protein and nucleic acid sequence analysis ...; algorithms for sequence-based phylogenetic analysis; description of biopolymer data banks ...; description of software packages; description of on-line services for molecular biologists." Comput Appl Biosci 1991 7 2 268-268 0403 Barron,S. A Bibliography on Comp.. Comput.Appl.Bio 91 7(2):269-269 Barron S; Witten M; Harkness R; Driver J A Bibliography on Computational Algorithms in Molecular Biology and Genetics Sequence analysis; Bibliography; USA; Program; Genetic; Algorithm "The purpose of this short note is to provide an announcement of an ongoing database of bibliographic references on the subject of computational algorithms in molecular biology and genetics. We have focused upon computer and mathematical aspects of molecular biology and genetics (interpreted in a liberal and broad sense)." Comput Appl Biosci 1991 7 2 269-269 0404 Barry,D. Asynchronous Distance .. 
Biometrics 87 43:261-276 Barry D; Hartigan JA Asynchronous Distance Between Homologous DNA Sequences Sequence proximity; IR; Statistical; Significance; Distance; DNA "The distance between homologous DNA sequences of two species is proposed to be -0.25 ln [det (P)], where P is the conditional probability matrix specifying the proportions of the various nucleotides in the second sequence, corresponding to each of the four nucleotides in the first sequence." Biometrics 1987 43 261-276 0405 Barth,G. An Alternative for the.. Inform.Process. 81 13(4-5):134-13 Barth G An Alternative for the Implementation of Knuth-Morris-Pratt Algorithm Knuth-Morris-Pratt; USA; String match; Algorithm "The new version of [the Knuth-Morris-Pratt algorithm] is optimal in the sense that for any input data a minimal amount of effort is spent to prepare for recoveries from possible mismatches." Inform Process Lett 1981 13 4-5 134-137 0406 Barth,G. An Analytical Comparis.. Inform.Process. 84 18(5):249-256 Barth G An Analytical Comparison of Two String Searching Algorithms String match; Knuth-Morris-Pratt; DE; Pattern match; String search; Markov; Analytical; Algorithm "Average case analyses of two algorithms to locate [a pattern in a text] are conducted in this paper. One algorithm is based on a straightforward trial-and-error approach, the other one [is due to Knuth-Morris-Pratt]." Inform Process Lett 1984 18 5 249-256 0407 Barton,G.J. Protein Multiple Seque.. Methods Enzymol 90 183:403-428 Barton GJ Protein Multiple Sequence Alignment and Flexible Pattern Matching Multiple alignment; Review; Match a pattern matrix; UK; Sequence alignment; Pattern match; Protein "In this chapter, a practical strategy for the rapid multiple alignment of protein sequences is described. Although not guaranteed to give the mathematically optimal alignment, the algorithm is able to cope with large numbers of sequences.
It is also a fast procedure that gives alignments generally as good or better than those obtained by pairwise methods." Methods Enzymol 1990 183 403-428 0408 Barton,G.J. Scanning Protein Seque.. Comput.Appl.Bio 91 7(1):85-88 Barton GJ Scanning Protein Sequence Databanks using a Distributed Processing Workstation Network Database search; Distributed; UK; Sequence comparison; Protein; Network; Databank "The programme pscan has been developed to distribute protein databank scans over a network of computers that share a common file system. pscan may be used in conjunction with most conventional sequence comparison programmes. ... Accordingly, pscan provides a low-cost, portable alternative to dedicated parallel processing computers." Comput Appl Biosci 1991 7 1 85-88 0409 Barton,G.J. Computer Speed and Seq.. Science 92 257(18 Sept.): Barton GJ Computer Speed and Sequence Comparison Database search; UK; Sequence comparison "Despite recent well-known advances in computer performance, it is still a commonly and erroneously held belief that rigorous sequence comparison method are too expensive to use for protein database searching." Science 1992 257 18 Sept. 1609-1609 0410 Barton,G.J. ALSCRIPT: A Tool to Fo.. Protein Eng. 93 6(1):37-40 Barton GJ ALSCRIPT: A Tool to Format Multiple Sequence Alignments Multiple alignment; UK; Sequence alignment; Display; Program "The ALSCRIPT program ... was developed specifically to allow the easy formatting and graphical display of large multiple sequence alignments." Protein Eng 1993 6 1 37-40 0411 Barton,G.J. A Strategy for the Rap.. J.Mol.Biol. 87 198:327-337 Barton GJ; Sternberg MJE A Strategy for the Rapid Multiple Alignment of Protein Sequences. Confidence Levels from Tertiary Structure Comparisons Multiple alignment; Clustering; UK; Structure; Significance; Confidence; Protein "An algorithm is presented for the multiple alignment of protein sequences that is both accurate and rapid computationally.
The approach is based on the conventional dynamic-programming method of pairwise alignment." J Mol Biol 1987 198 327-337 0412 Barton,G.J. Evaluation and Improve.. Protein Eng. 87 1(2):89-94 Barton GJ; Sternberg MJE Evaluation and Improvements in the Automatic Alignment of Protein Sequences Pairwise alignment; Significance; UK; Sequence alignment; Structure; Protein "The accuracy of protein sequence alignment obtained by applying a commonly used global sequence comparison algorithm is assessed. Alignments based on the superposition of the three-dimensional structures are used as a standard for testing the automatic, sequence-based methods." Protein Eng 1987 1 2 89-94 0413 Barton,G.J. Flexible Protein Seque.. J.Mol.Biol. 90 212(2):389-402 Barton GJ; Sternberg MJE Flexible Protein Sequence Patterns: A Sensitive Method to Detect Weak Structural Similarities Match a pattern matrix; UK; Pattern match; Pattern definition; Dynamic programming; Similarity; Protein "In contrast to conventional pattern matching, template or sequence alignment methods, flexible [protein sequence] patterns allow residue patterns typical of a complete protein fold to be developed in terms of residue positions (elements), separated by gaps of defined range. An efficient dynamic programming algorithm is presented to enable the best alignment(s) of a pattern with a sequence to be identified." J Mol Biol 1990 212 2 389-402 0414 Beanland,T.J. The Inference of Evolu.. Comp.Biochem.Ph 92 102B(4):643-65 Beanland TJ; Howe CJ The Inference of Evolutionary Trees from Molecular Data Phylogeny; Multiple alignment; UK; Review; Evolutionary tree; Significance "Procedures for multiple alignment of sequence data, subsequent phylogenetic inference, and testing of the trees derived are presented. The assumptions underlying different approaches and the extent to which they are valid are discussed." Comp Biochem Physiol B Comp Biochem 1992 102B 4 643-659 0415 Beckmann,J.S. Intervening Sequences ..
J.Biomol.Struct 86 4(3):391-400 Beckmann JS; Brendel V; Trifonov EN Intervening Sequences Exhibit Distinct Vocabulary Sequence analysis; Significance; IL; Linguistic "Little is known about the origin and function of eukaryotic introns. Application of a novel linguistic approach to the analysis of intervening sequences reveals, however, that they exhibit a specific non-random vocabulary whose major feature is the utilization of mirror-symmetrical words ('mirrorrim')." J Biomol Struct & Dyn 1986 4 3 391-400 0416 Benner,S.A. Response to Barton's L.. Science 92 257(18 Sept.): Benner SA; Cohen MA; Gonnet GH Response to Barton's Letter: Computer Speed and Sequence Comparison Database search; SWI; Sequence comparison; Sequence alignment "However, our approach and Barton's differ in three fundamental ways, all interesting to the general scientist who wants to use sequence alignments without becoming entangled in its mathematics." Science 1992 257 18 Sept. 1609-1610 0417 Benson,D.C. Digital Signal Process.. Nucleic Acids R 90 18(10):3001-30 Benson DC Digital Signal Processing Methods for Biological Sequence Comparison Pairwise comparison; Fourier; USA; Statistical; Signal; Sequence comparison "A method is discussed for DNA or protein sequence comparison using a finite field fast Fourier transform, a digital signal processing technique; and statistical methods are discussed for analyzing the output of this algorithm. [It] compares two sequences of length N in computing time proportional to N log N compared to N^2 for methods currently used." Nucleic Acids Res 1990 18 10 3001-3006 0418 Benson,D.C. Fourier Methods for Bi.. Nucleic Acids R 90 18(21):6305-63 Benson DC Fourier Methods for Biosequence Analysis Pairwise comparison; Fourier; USA; Gap "Novel methods are discussed for using fast Fourier transforms for DNA or protein sequence comparison. ...
Novel methods are given which (1) enable the detection of clusters of matching letters, (2) facilitate the insertion of gaps to enhance sequence similarity, and (3) accommodate to varying densities of letters in the input sequences." Nucleic Acids Res 1990 18 21 6305-6310 0419 Berg,O.G. Selection of DNA Bindi.. J.Mol.Biol. 87 193:723-750 Berg OG; von Hippel PH Selection of DNA Binding Sites by Regulatory Proteins. Statistical-mechanical Theory and Application to Operators and Promoters Match a pattern matrix; USA; Sequence analysis; Statistical; Protein; Selection; DNA; Binding "We present a statistical-mechanical selection theory for the sequence analysis of a set of specific DNA regulatory sites that makes it possible to predict the relationship between individual base-pair choices in the site and specific activity (affinity)." J Mol Biol 1987 193 723-750 0420 Berger,M.P. A Novel Randomized Ite.. Comput.Appl.Bio 91 7(4):479-484 Berger MP; Munson PJ A Novel Randomized Iterative Strategy for Aligning Multiple Protein Sequences Multiple alignment; USA; Program; Optimal; Needleman-Wunsch; Pairwise alignment; Protein "Our algorithm randomly divides a group of unaligned sequences into two subgroups, between which an optimal alignment is then obtained by a Needleman-Wunsch style of algorithm. ... The pairwise alignment process is repeated using different random divisions of the whole group into two subgroups." Comput Appl Biosci 1991 7 4 479-484 0421 Berkman,O. Highly Parallelizable .. ACM Sympos.Theo 89 21:309-319 Berkman O; Breslauer D; Galil Z; Schieber B; Vishkin U Highly Parallelizable Problems String match; Parallel; IL Seattle, WA, 15-17 May 1989. "In this section [4] we describe a parallel algorithm for finding all the occurrences of a pattern of length m in a text of length n over an arbitrary alphabet. The algorithm runs in O(log log m) time using n / log log m processors on a Common CRCW PRAM." ACM Sympos Theory Comput 1989 21 309-319 0422 Bertossi,A.A.
A VLSI System for Stri.. Integration, Th 90 9:129-139 Bertossi AA A VLSI System for String Matching Parallel; String match; Italy; VLSI Find all occurrences of a pattern of length m in a text of length n, where both strings are over a finite alphabet S. "In spite of its practical relevance, string matching has received very little attention so far in the VLSI literature. In this paper, we present a special purpose VLSI system for string matching which takes O(log n + log |S|) time and can be laid out with O(mn log m log n) area." Integration, The VLSI J 1990 9 129-139 0423 Bertossi,A.A. A Parallel Solution to.. Comput.J. 92 35(5):524-526 Bertossi AA; Luccio F; Pagli L; Lodi E A Parallel Solution to the Approximate String Matching Problem Match with k differences; Parallel; Dynamic programming; String match; Italy; Approximate match; VLSI "We have shown how the approximate string matching problem (ASMP) can be solved in parallel on a bounded degree network of elementary processors. The proposed parallelization scheme is very simple. It is based on a standard sequential method of dynamic programming, and attains optimal speedup. ... Our scheme is instead suitable for VLSI implementation, and takes into account a very general set of errors, for which no 'fast' algorithm is known." Comput J 1992 35 5 524-526 0424 Beyer,W.A. A Molecular Sequence M.. Math.Biosci. 74 19:9-25 Beyer WA; Stein ML; Smith TF; Ulam SM A Molecular Sequence Metric and Evolutionary Trees Sequence proximity; USA; Evolutionary tree "A precisely stated algorithm is given for reconstructing phylogenetic relationships from protein amino acid sequence data under the restriction that all distance measures be proper metrics. In conjunction with a general sequence metric, the algorithm is applied to the cytochrome c data used in earlier studies." Math Biosci 1974 19 9-25 0425 Bishop,M.J. Maximum Likelihood Ali.. J.Mol.Biol.
86 190(2):159-165 Bishop MJ; Thompson EA Maximum Likelihood Alignment of DNA Sequences Pairwise alignment; UK; Likelihood; Probabilistic; DNA "The optimal alignment problem for pairs of molecular sequences under a probabilistic model of evolutionary change is equivalent to the problem of estimating the maximum likelihood time required to transform one sequence to the other. When this time has been estimated, various alignments of high posterior probability may be written down. A simple model with two parameters is presented and a method is described by which the likelihood may be computed." J Mol Biol 1986 190 2 159-165 0426 Bishop,M. Fast Computer Search f.. Nucleic Acids R 84 12(13):5471-54 Bishop M; Thompson E Fast Computer Search for Similar DNA Sequences Database search; UK; Sequence database; Statistical; DNA "An extremely fast method of searching a nucleic acid sequence database against a probe sequence is described. The method is based on the detection of deviation from expected number and deviation from random spatial distribution of sub-sequences which are unique within a sequence, and shared between that sequence and the probe." Nucleic Acids Res 1984 12 13 5471-5474 0427 Blaisdell,B.E A Measure of the Simil.. Proc.Nat.Acad.S 86 83(14):5155-51 Blaisdell BE A Measure of the Similarity of Sets of Sequences not Requiring Sequence Alignment Sequence proximity; USA; Sequence alignment; Markov; N-gram; Coding; Needleman-Wunsch; Dot; Similarity "Determination of first- and second-order Markov chain homogeneity of sets of nuclear eukaryotic DNA sequences, both coding and noncoding, finds similarities imperceptible to the standard Needleman-Wunsch base matching or dot-matrix algorithms." Proc Nat Acad Sci USA 1986 83 14 5155-5159 0428 Blaisdell,B.E Average Values of a Di.. J.Mol.Evol. 
89 29(6):538-547 Blaisdell BE Average Values of a Dissimilarity Measure Not Requiring Sequence Alignment are Twice the Averages of Conventional Mismatch Counts Requiring Sequence Alignment for a Computer-generated Model System Sequence proximity; USA; Sequence alignment; N-gram; Statistical; Significance; Model "Three measures of sequence dissimilarity have been compared on a computer-generated model system in which substitutions in random sequences were made at randomly selected sites and the replacement character was chosen at random from the set of characters different from the original occupant of the site." J Mol Evol 1989 29 6 538-547 0429 Blaisdell,B.E Effectiveness of Measu.. J.Mol.Evol. 89 29(6):526-537 Blaisdell BE Effectiveness of Measures Requiring and Not Requiring Prior Sequence Alignment for Estimating the Dissimilarity of Natural Sequences Sequence proximity; USA; Sequence alignment; Least squares; Discrimination; Evolutionary tree; Consensus tree "Various measures of sequence dissimilarity have been evaluated by how well the additive least squares estimation of edges (branch lengths) of an unrooted evolutionary tree fit the observed pair-wise dissimilarity measures and by how consistent the trees are for different data sets derived from the same set of sequences. This evaluation provided sensitive discrimination among dissimilarity measures ...." J Mol Evol 1989 29 6 526-537 0430 Blaisdell,B.E Average Values of a Di.. J.Mol.Evol. 91 32(6):521-528 Blaisdell BE Average Values of a Dissimilarity Measure not Requiring Sequence Alignment are Twice the Averages of Conventional Mismatch Counts Requiring Sequence Alignment for a Variety of Computer-Generated Model Systems Sequence proximity; USA; Sequence alignment; Model "It has been found that two dissimilarity measures not requiring sequence alignment perform about as well for the inference of unrooted evolutionary trees as do conventional mismatch counts requiring prior sequence alignment. ... 
A reason for the success of one of the measures not requiring sequence alignment has been found." J Mol Evol 1991 32 6 521-528 0431 Blum,N. On Locally Optimal Ali.. 93 Blum N On Locally Optimal Alignments in Genetic Sequences (Revised Version) BK - Subalignment; Locally optimal; Optimal; Genetic; DE Report 8567-CS, Institut fur Informatik, Universitat Bonn, 23 pp. A c-locally minimal distance is defined. "We show how to compute all substrings of x which have c-locally minimal distance from y and all corresponding alignments in O(mn) time where n is the length of x and m is the length of y." 1993 0432 Blum,N. Efficient Computation .. 93 Blum N Efficient Computation of All Optimal Alignments of Two Genetic Sequences with Concave Weighting Functions BK - Pairwise alignment; Optimal; Function; Genetic; DE Report 8586-CS, Institut fur Informatik, Universitat Bonn, 28 pp. "We show for any concave weighting function, how to compute a compact representation of the distance graph of two genetic sequences x and y ...." 1993 0433 Blum,N. On Locally Optimal Loc.. 93 Blum N On Locally Optimal Local Alignments and Subalignments of Genetic Sequences with Concave Weighting Functions BK - Subalignment; Locally optimal; Optimal; Function; Genetic; DE Report 8587-CS, Institut fur Informatik, Universitat Bonn, 25 pp. "We show for any concave weighting function, how to compute a compact representation of the locally optimal local alignment graph of [sequences] x and y and of the locally optimal subalignment graph of x and y, respectively which contains exactly all locally optimal local alignments and all locally optimal subalignments of x and y ...." 1993 0434 Blum,N. Some Remarks on the Ac.. 93 Blum N Some Remarks on the Accurate Notion of Local Optimality in Genetic Sequences BK - Pairwise comparison; Review; Genetic; DE Report 8588-CS, Institut fur Informatik, Universitat Bonn, 14 pp.
"We give a comprehensive survey of old and new results with respect to locally optimal local alignments and locally optimal subalignments in genetic sequences." 1993 0435 Boguski,M.S. Computational Sequence.. J.Lipid Res. 92 33:957-974 Boguski MS Computational Sequence Analysis Revisited: New Databases, Software Tools, and the Research Opportunities They Engender Sequence analysis; Review; USA; Sequence database; Sequence search; Multiple alignment; Motif "Recent developments in fast database searching, multiple sequence alignment, and molecular modeling are discussed and windows-based, mouse-driven software for CD-ROM and network information retrieval are described." J Lipid Res 33 33 957-974 0436 Boguski,M.S. Analysis of Conserved .. New Biol. 92 4(3):247-260 Boguski MS; Hardison RC; Schwartz S; Miller W Analysis of Conserved Domains and Sequence Motifs in Cellular Regulatory Proteins and Locus Control Regions Using New Software Tools for Multiple Alignment and Visualization Multiple comparison; USA; Motif; Region; Sequence analysis; Multiple alignment; Display; Protein "Here we describe an integrated set of interactive Unix tools that combines several multiple-alignment techniques with traditional 'dot-plot' visualization to provide a flexible environment for approaching complex sequence analysis problems." New Biol 1992 4 3 247-260 0437 Bork,P. A Method for Property .. Stud.Biophys. 89 129(2/3):231-2 Bork P; Grunwald C A Method for Property Pattern Searches in Protein Sequence Data Bases, Demonstrated by Detection of GTP-binding Sites Consensus sequence; Match complex patterns; Database search; DE; Pattern search; Sequence database; Protein; Detection "We have developed a method for deriving patterns of such properties (i. e., consensus patterns) from alignments of related sequences and for the subsequent database search for sequence sections that match these patterns. 
The characterization of the residues of the patterns is based on ten physicochemical and steric properties given in Zvelebil et al. (1987)." Stud Biophys 1989 129 2/3 231-240 0438 Bork,P. Recognition of Differe.. Eur.J.Biochem. 90 191:347-358 Bork P; Grunwald C Recognition of Different Nucleotides-binding Sites in Primary Structures Using a Property-pattern Approach Pattern match; DE; Consensus sequence; Sequence recognition; Structure; Recognition "Consensus sequence patterns for b-a-b folds binding FAD, NAD and GTP were constructed on the basis of 11 steric and physicochemical properties. These property patterns permit detection and distinction of the respective nucleotide-binding sites on the basis of amino acid sequence analysis alone." Eur J Biochem 191 191 347-358 0439 Boswell,D.R. A Program for Template.. Comput.Appl.Bio 88 4(3):345-350 Boswell DR A Program for Template Matching of Protein Sequences Match complex patterns; NZ; Motif; Template; Program; Protein "The matching of a template to a protein sequence is simplified by treating it as a special case of sequence alignment. Restriction of the distances between motifs in the template controls against spurious matches within very long sequences. The program using this algorithm is fast enough to be used in scanning large databases for sequences matching a complex template." Comput Appl Biosci 1988 4 3 345-350 0440 Boswell,D.R. Sequence Comparison an.. Computational.. 88Oxford Universi Boswell DR; Lesk AM Sequence Comparison and Alignment: The Measurement and Interpretation of Sequence Similarity Lesk AM Computational Molecular Biology. Sources and Methods for Sequence Analysis Pairwise comparison; Review; NZ; Sequence comparison; Sequence alignment; Similarity "Sequence comparison and alignment are among the most important tools of computational molecular biology. ...
Sequence comparison merely detects common features; sequence alignment places residues of the sequences into the best one-to-one correspondence. ... Here we emphasize not the methods but the interpretation of the results." Oxford University Press Oxford 1988 161-178 0441 Boswell,D.R. Sequence Comparison by.. Nucleic Acids R 84 12(1):457-463 Boswell DR; McLachlan AD Sequence Comparison by Exponentially-Damped Alignment Pairwise alignment; UK; Sequence comparison; Dynamic programming Two "sequences are compared by calculating for each pair of residues a score which represents the best local alignment bringing those residues into correspondence; smooth localization is achieved by reducing the contribution of distant parts of the alignment path by a factor which decreases exponentially with their distance from the point in question." Nucleic Acids Res 1984 12 1 457-463 0442 Boyer,R.S. A Fast String-Searchin.. Comm.ACM 77 20(10):762-772 Boyer RS; Moore JS A Fast String-Searching Algorithm String match; Boyer-Moore; USA; String search; Algorithm "An algorithm is presented that searches for the location, i, of the first occurrence of a character string, pat, in another string, string. During the search operation, the characters of pat are matched starting with the last character of pat. The information gained by starting the match at the end of the pattern often allows the algorithm to proceed in large jumps through the text being searched." Comm ACM 1977 20 10 762-772 0443 Bradford,J.H. Sequence Matching with.. Inform.Process. 90 34(4):193-196 Bradford JH Sequence Matching with Binary Codes Sequence proximity; CA; Sequence match "This paper introduces an algorithm that encodes pairs of strings as binary numbers such that the Hamming distance between the binary code words is equal to the Levenshtein distance between the original strings." Inform Process Lett 1990 34 4 193-196 0444 Brendel,V. Linguistics of Nucleot..
J.Biomol.Struct 86 4(1):11-21 Brendel V; Beckmann JS; Trifonov EN Linguistics of Nucleotide Sequences: Morphology and Comparison of Vocabularies Sequence analysis; Significance; Linguistic; IL; Nucleotide "The concept of 'words' in continuous languages devoid of blanks is introduced and an operational definition of words given. With this novel concept nucleotide sequences become objects for linguistic analysis. The typical word size of the nucleotide language is found to be 3 to 5 ...." J Biomol Struct & Dyn 1986 4 1 11-21 0445 Brendel,V. Methods and Algorithms.. Proc.Nat.Acad.S 92 89:2002-2006 Brendel V; Bucher P; Nourbakhsh IR; Blaisdell BE; Karlin S Methods and Algorithms for Statistical Analysis of Protein Sequences Sequence analysis; Significance; USA; Statistical; Protein; Algorithm "We describe several protein sequence statistics designed to evaluate distinctive attributes of residue content and arrangement in primary structure. Considered are global compositional biases, local clustering of different residue types ..., long runs of charged or uncharged residues, periodic patterns, counts and distribution of homooligopeptides, and unusual spacings." Proc Nat Acad Sci USA 89 89 2002-2006 0446 Breslauer,D. An Optimal O(log log n.. SIAM J.Comput. 90 19(6):1051-105 Breslauer D; Galil Z An Optimal O(log log n) Time Parallel String Matching Algorithm Parallel; USA; String match; Optimal; Algorithm "An optimal O(log log n) time parallel algorithm for string matching on CRCW-PRAM is presented. It improves previous results of Galil (1985) and Vishkin (1985)." Since the algorithm requires n/log log n processors, the string matching problem belongs to one of the lowest parallel complexity classes SIAM J Comput 1990 19 6 1051-1058 0447 Breslauer,D. A Lower Bound for Para.. ACM Sympos.Theo 91 23:439-443 Breslauer D; Galil Z A Lower Bound for Parallel String Matching Parallel; USA; String match; Complexity New Orleans, LA, 6-8 May 1991.
"We present an O(log log m) lower bound on the number of rounds necessary for finding occurrences of a pattern string P[1..m] in a text string T[1..2m] in parallel using m comparisons in each round. This is the first lower bound for this problem. [It] is within a constant factor of the fastest algorithm ... and also holds for an m-processor CRCW-PRAM in the case of a general alphabet." ACM Sympos Theory Comput 23 23 439-443 0448 Brutlag,D.L. Improved Sensitivity o.. Comput.Appl.Bio 90 6(3):237-245 Brutlag DL; Dautricourt JP; Maulik S; Relph J Improved Sensitivity of Biological Sequence Database Searches Database search; USA; Sequence database; k-tuple "We have increased the sensitivity of DNA and protein sequence database searches by allowing similar but non-identical amino acids or nucleotides to match. In addition, one can match k-tuples or words instead of matching individual residues in order to speed the search. ... The concept of matching non-identical k-tuples also increases the power of DNA database searches." Comput Appl Biosci 1990 6 3 237-245 0449 Butler,R. Aligning Genetic Seque.. Strand: New C.. 90Englewood Cliff Butler R; Butler T; Foster I; Karonis N; Olson R; Overbeek R; Pfluger N; Price M; Tuecke S Aligning Genetic Sequences Foster I Taylor S Strand: New Concepts in Parallel Programming Multiple alignment; Segment; USA; Genetic "Our [multiple sequence alignment] algorithm is based on the notion of critical subsequences. ... When a critical subsequence occurs in two or more sequences, we call the set of occurrences a pin. Our algorithm will attempt to create an alignment in which as many pins as possible align exactly." Englewood Cliffs NJ ,Prentice Hall 1990 253-271 0450 Carrillo,H. The Multiple Sequence .. 
SIAM J.Appl.Mat 88 48(5):1073-108 Carrillo H; Lipman D The Multiple Sequence Alignment Problem in Biology Multiple alignment; USA; Sequence alignment; Dynamic programming; Complexity; Sequence comparison "The dynamic programming approach [to the multiple sequence alignment problem] has the limitation that its complexity scales up greatly with dimension .... In the following, we make observations on the problem of aligning sequences and that of aligning subsets of these sequences that reveal constraints of the problem that will prove useful in reducing computation in the dynamic programming method." SIAM J Appl Math 1988 48 5 1073-1082 0451 Cavener,D.R. Comparison of the Cons.. Nucleic Acids R 87 15(4):1353-136 Cavener DR Comparison of the Consensus Sequence Flanking Translational Start Sites in Drosophila and Vertebrates Consensus sequence; USA; Consensus method "An important issue germane to the analysis of nucleic acid sequences is the criteria used for consensus assignments. ... With these considerations in mind I have chosen the following criteria for the assignment of consensus sequences. ... However, the goal of this study was to obtain reliable consensus data which would not be significantly affected by a few errors." Nucleic Acids Res 1987 15 4 1353-1361 0452 Chan,S.C. Synthesis and Recognit.. IEEE Trans.Patt 91 13(12):1245-12 Chan SC; Wong AKC Synthesis and Recognition of Sequences Multiple alignment; Clustering; CA; Hierarchical; Entropy; Recognition "The synthesis of an ensemble of sequences is a 'sequence' of random elements that specify the probabilities of occurrence of the different symbols at the corresponding sites of the sequences. The synthesis is determined by a hierarchical sequence synthesis procedure ... which returns not only the taxonomic hierarchy of the whole ensemble of sequences but also the alignment ... of a group ... of the sequences at each level of the hierarchy." IEEE Trans Patt Anal Mach Intell 1991 13 12 1245-1255 0453 Chan,S.C. 
A Survey of Multiple S.. Bull.Math.Biol. 92 54(4):563-598 Chan SC; Wong AKC; Chiu DKY A Survey of Multiple Sequence Comparison Methods Multiple alignment; Survey; CA; Sequence comparison "This article presents a survey of the exhaustive (optimal) and heuristic (possibly sub-optimal) methods developed for the comparison of multiple macromolecular sequences. Emphasis is given to the different approaches of the heuristic methods." Bull Math Biol 1992 54 4 563-598 0454 Takezaki,N. Inconsistency of the M.. J.Mol.Evol. 94 39:210-218 Takezaki N; Nei M Inconsistency of the Maximum Parsimony Method When the Rate of Nucleotide Substitution is Constant Phylogeny; Parsimony; USA; Substitution; Rate; Nucleotide "The inconsistency of the maximum parsimony method is known to occur even when the rate of nucleotide substitution is constant. To understand why this inconsistency occurs, a mathematical study was conducted for the cases of five, six, and seven sequences. The results obtained indicate that this inconsistency occurs because the probability of occurrence of nucleotide configurations generated by one substitution on a short interior branch is often lower than that of configurations generated by more substitutions on other longer branches. The chance of occurrence of this event ... apparently increases as the number of sequences increases." J Mol Evol 39 39 210-218 0455 Chang,W.I. Approximate String Mat.. IEEE Sympos.Fou 90 31:116-124 Chang WI; Lawler EL Approximate String Matching in Sublinear Expected Time Match with k differences; USA; String match; Approximate match; Locally optimal 22-24 October 1990, St. Louis, MO. "We are interested in much faster algorithms for restricted cases of the [k differences approximate string matching] problem, such as when the text string is random and errors are not too frequent. We have devised an algorithm that, for k < m/(log m + O(1)), runs in time O((n/m)k log m) on the average. In the worst case, our algorithm is O(nk) ....
We define the approximate substring matching problem and give efficient algorithms based on our techniques." IEEE Sympos Found Comput Sci 31 31 116-124 0456 Chao,K.M. Aligning Two Sequences.. Comput.Appl.Bio 92 8(5):481-487 Chao KM; Pearson WR; Miller W Aligning Two Sequences Within a Specified Diagonal Band Pairwise alignment; USA; FASTA; Locally optimal "We describe an algorithm for aligning two sequences within a diagonal band that requires only O(NW) computation time and O(N) space, where N is the length of the shorter of the two sequences and W is the width of the band. ... This algorithm has been incorporated into the FASTA program package ...." Comput Appl Biosci 1992 8 5 481-487 0457 Chappey,C. MASH: An Interactive P.. Comput.Appl.Bio 91 7(2):195-202 Chappey C; Danckaert A; Dessen P; Hazout S MASH: An Interactive Program for Multiple Alignment and Consensus Sequence Construction for Biological Sequences Multiple alignment; FR; Motif; Consensus sequence; Program "... a method that allows the selection of the series of the common motifs to be aligned according to the 'alignment priority' criterion. This function depends on both the length and the occurrence frequency of the motifs, and allows the extraction of the total or local similarities, i.e. involving the whole set or only some sequences." Comput Appl Biosci 1991 7 2 195-202 0458 Chen,E.S. Parallel Alignment of .. Comput.Appl.Bio 93 9(3):375-375 Chen ES; Asano C; Davison DB Parallel Alignment of DNA Sequences on the Connection Machine CM-2 Database search; Parallel; USA; Program; Hardware; DNA "This code allows for searches of a query sequence against a library, while the Jones program [Jones (1992)] is best used for locating small patterns within a database." Comput Appl Biosci 1993 9 3 375-375 0459 Chiu,D.K.Y. Inferring Consensus St.. 
Comput.Appl.Bio 91 7(3):347-352 Chiu DKY; Kolodziejczak T Inferring Consensus Structure from Nucleic Acid Sequences Multiple alignment; Structure; CA; Nucleic acid "This paper presents an unsupervised inference method for determining the higher-order structure from sequence data. The method is general, but in this paper it is applied to nucleic acid sequences in determining the secondary (2-D) and tertiary (3-D) structure of the macromolecule." Comput Appl Biosci 1991 7 3 347-352 0460 Choffrut,C. An Optimal Algorithm f.. EATCS Bull. 90 40:217-225 Choffrut C An Optimal Algorithm for Building the Boyer-Moore Automaton String match; Boyer-Moore; Automata; FR; Optimal; Algorithm "The notion of Boyer-Moore automaton ... leads to an algorithm that requires more preprocessing but is more efficient than the original Boyer-Moore's algorithm. We give an optimal algorithm for computing the automaton and state an upper bound on the size of the automaton ...." EATCS Bull 40 40 217-225 0461 Chvatal,V. Longest Common Subsequ.. J.Appl.Probab. 75 12:306-315 Chvatal V; Sankoff D Longest Common Subsequence of Two Random Sequences Longest common; Significance; CA; Subsequence "Given two random k-ary sequences of length n, what is f(n, k), the expected length of their longest common subsequence? ... We study the limiting behaviour of n^(-1) f(n, k) and derive upper and lower bounds on these limits for all k." J Appl Probab 12 12 306-315 0462 Chvatal,V. An Upper-Bound Techniq.. Time Warps, S.. 83Addison-Wesley Chvatal V; Sankoff D An Upper-Bound Technique for Lengths of Common Subsequences Sankoff D Kruskal JB Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison Longest common; Significance; CA; Subsequence This chapter illustrates "the application of combinatorial argumentation to sequence-comparison problems, in deriving upper bounds for the expected length of the longest common subsequence of two random k-ary sequences of length n."
Addison-Wesley Reading, MA 1983 353-357 0463 Claverie,J.M. Assessing the Biologic.. Comput.Appl.Bio 85 1(2):95-104 Claverie JM; Sauvaget I Assessing the Biological Significance of Primary Structure Consensus Patterns using Sequence Databanks. I. Heat-shock and Glucocorticoid Control Elements in Eukaryotic Promoters Signal; FR; Significance; Program; Database search; Structure; Databank "We describe FORTRAN 77 software allowing for convenient searching of any segmented and ambiguous pattern in the currently available protein or nucleotide sequence databanks." Comput Appl Biosci 1985 1 2 95-104 0464 Cockwell,K.Y. Software Tools for Mot.. Comput.Appl.Bio 89 5(3):227-232 Cockwell KY; Giles IG Software Tools for Motif and Pattern Scanning: Program Descriptions including a Universal Sequence Reading Algorithm Dictionary match; UK; Motif; Gap; Program; Pattern match; Reading; Algorithm "Two programs, MOTIF and PATTERN, that scan sequences for matches to user-defined motifs and patterns of motifs based on identity and set membership are described." A pattern is a sequence of motifs separated by gaps Comput Appl Biosci 1989 5 3 227-232 0465 Cohen,D.N. Matching Code Sequence.. Math.Biosci. 75 24:25-30 Cohen DN; Reichert TA; Wong AKC Matching Code Sequences Utilizing Context Free Quality Measures Pairwise alignment; USA; Optimal; Needleman-Wunsch "A method is described herein for discovering the optimal correspondence of a pair of code sequences under generalized quality measures. The limits of both this algorithm and that of Needleman and Wunsch are presented." The algorithm described runs in O(M^2 N^2) time. Math Biosci 24 24 25-30 0466 Cole,R. Tighter Bounds on the .. IEEE Sympos.Fou 92 33:600-609 Cole R; Hariharan R Tighter Bounds on the Exact Complexity of String Matching Complexity; USA; String match 24-27 October 1992, Pittsburgh, PA. "This paper considers how many character comparisons are needed to find all occurrences of a pattern of length m in a text of length n."
Upper and lower bounds are obtained IEEE Sympos Found Comput Sci 33 33 600-609 0467 Collins,J.F. Applications of Parall.. Nucleic Acids R 84 12(1):181-192 Collins JF; Coulson AFW Applications of Parallel Processing Algorithms for DNA Sequence Analysis Pairwise comparison; Parallel; UK; Sequence analysis; Distributed; DNA; Algorithm "This paper explores the applicability of an ICL Distributed Array Processor ('DAP') to the general problem of finding similarities in DNA sequences." Programs are described for inspecting the match matrix and searching for alignments Nucleic Acids Res 1984 12 1 181-192 0468 Collins,J.F. Molecular Sequence Com.. Nucleic Acid .. 87IRL Press Collins JF; Coulson AFW Molecular Sequence Comparison and Alignment Bishop MJ Rawlings CJ Nucleic Acid and Protein Sequence Analysis: A Practical Approach Pairwise comparison; Review; UK; Sequence comparison "Similarity searches require a solution to one of three types of problem. ... Given two finite sequences, what pattern of indels makes the most plausibly similar alignment between them? ... Which sub-sequence(s) of an indefinitely long subsequence show(s) the greatest similarity to a short query sequence ...? ... Which pair(s) of sub-sequences ... show(s) the most plausible similarities ...?" IRL Press Oxford 1987 323-358 0469 Collins,J.F. Significance of Protei.. Methods Enzymol 90 183:474-487 Collins JF; Coulson AFW Significance of Protein Sequence Similarities Database search; Significance; UK; Locally optimal; Sequence comparison; Similarity; Protein "Given that a certain subsequence alignment has been found in comparing a query sequence against a database of sequences, what is the chance that an alignment of the same degree of similarity (or better) would have been found if the query sequence had been compared with a database just like the database which was used ... except that it contained no sequences which are significantly related to the query sequence?" 
Methods Enzymol 183 183 474-487 0470 Collins,J.F. The Significance of Pr.. Comput.Appl.Bio 88 4(1):67-71 Collins JF; Coulson AFW; Lyall A The Significance of Protein Sequence Similarities Subalignment; Significance; UK; Sequence comparison; Locally optimal; Similarity; Protein "A general method of assessing the significance of scored best local alignments, particularly suited to protein sequence comparisons, is described. The method establishes the parameters describing the distribution of the best results from any search program, provided that the set is sufficiently large and the majority of the alignments arise from unrelated sequences." Comput Appl Biosci 1988 4 1 67-71 0471 Colussi,L. Correctness and Effici.. Inform.Comput. 91 95(2):225-251 Colussi L Correctness and Efficiency of Pattern Matching Algorithms String match; Complexity; Pattern match; Italy; Algorithm "A few lines pattern matching algorithm is obtained by using the correctness proof of programs as a tool to the design of efficient algorithms. The new algorithm is obtained from a brute force algorithm by three refinement steps. ... [It] performs 1.5n character comparisons in the worst case and is sublinear on a random text for all patterns. ... [It] always works better than the classical [Knuth-Morris-Pratt] algorithm and, for some problems, is better than the [Boyer-Moore] algorithm too." Inform Comput 1991 95 2 225-251 0472 Colussi,L. On the Exact Complexit.. IEEE Sympos.Fou 90 31:135-144 Colussi L; Galil Z; Giancarlo R On the Exact Complexity of String Matching Complexity; USA; String match October 22-24, 1990, St. Louis, MO. "We investigate the maximal number of character comparisons made by a linear-time string matching algorithm, given a text of length n and a pattern of length m over a general alphabet. We denote it by c(n,m) or approximate it by (1+C)n, where C is a universal constant. We add the subscript 'on-line' when we restrict attention to on-line algorithms .... 
The upper bound was established 20 years ago ... and no progress has been made for 19 years. The only lower bound has been the obvious one .... We improve these bounds and determine C_on-line exactly." IEEE Sympos Found Comput Sci 31 31 135-144 0473 Consel,C. Partial Evaluation of .. Inform.Process. 89 30(2):79-86 Consel C; Danvy O Partial Evaluation of Pattern Matching in Strings String match; Knuth-Morris-Pratt; FR; Pattern match; Automata "This article describes how automatically specializing a fairly naive pattern matcher by partial evaluation leads to the Knuth, Morris & Pratt algorithm. Interestingly enough, no theorem proving is needed to achieve the partial evaluation, as was previously argued, and it is sufficient to identify a static component in the computation to get the result - a deterministic finite automaton. This experiment illustrates how a small insight and partial evaluation can achieve a nontrivial result." Inform Process Lett 1989 30 2 79-86 0474 Core,N.G. Supercomputers and Bio.. Comput.Biomed.R 89 22:497-515 Core NG; Edmiston EW; Saltz JH; Smith RM Supercomputers and Biological Sequence Comparison Algorithms Pairwise comparison; Parallel; USA; Dynamic programming; Sequence comparison; Algorithm "One method of increasing the speed of the calculations [to compare biological sequences] is to perform them in parallel. We present the results of initial investigations using two dynamic programming algorithms on the Intel iPSC hypercube and the Connection Machine as well as an inexpensive, heuristically-based algorithm on the Encore Multimax." Comput Biomed Res 22 22 497-515 0475 Cormen,T.H. String Matching Introduction .. 90MIT Press Cormen TH; Leiserson CE; Rivest RL String Matching Introduction to Algorithms Review; USA; Automata; String match; Knuth-Morris-Pratt; Boyer-Moore Chapter 34: The naive string-matching algorithm. The Rabin-Karp algorithm. String matching with finite automata. The Knuth-Morris-Pratt algorithm.
The Boyer-Moore algorithm MIT Press Cambridge, MA 1990 853-885 0476 Corpet,F. Multiple Sequence Alig.. Nucleic Acids R 88 16(22):10881-1 Corpet F Multiple Sequence Alignment with Hierarchical Clustering Multiple alignment; FR; Sequence alignment; Clustering; Hierarchical This approach is based on the conventional dynamic-programming method of pairwise alignment. Initially, a hierarchical clustering of the sequences is calculated from the matrix of the pairwise alignment scores. From the clustering a multiple sequence alignment is derived. The pairwise alignments from the multiple alignment form a new matrix of pairwise alignment scores. If the matrix is different from the previous one, iteration of the process can be performed Nucleic Acids Res 1988 16 22 10881-10890 0477 Coulson,A.F.W Protein and Nucleic Ac.. Comput.J. 87 30(5):420-424 Coulson AFW; Collins JF; Lyall A Protein and Nucleic Acid Sequence Database Searching: a Suitable Case for Parallel Processing Database search; Parallel; UK; Sequence database; Significance; Protein; Nucleic acid "Sequence analysis of protein and nucleic acid databases by exhaustive string-matching algorithms is effectively implemented on large processor-array machines, such as the I. C. L. DAP. An improved method of assessing the significance of the best alignments for proteins is described." Comput J 1987 30 5 420-424 0478 Crochemore,M. Two-Way String-Matching J.Assoc.Comput. 91 38(3):651-675 Crochemore M; Perrin D Two-Way String-Matching String match; Boyer-Moore; Knuth-Morris-Pratt; FR "A new string-matching algorithm is presented, which can be viewed as an intermediate between the classical algorithms of Knuth, Morris, and Pratt on the one hand and Boyer and Moore, on the other hand. The algorithm is linear in time and uses constant space as the algorithm of Galil and Seiferas. It [is] remarkably simple which consequently makes its analysis possible." J Assoc Comput Mach 1991 38 3 651-675 0479 Czelusniak,J. 
Maximum Parsimony Appr.. Methods Enzymol 90 183:601-615 Czelusniak J; Goodman M; Moncrief ND; Kehoe SM Maximum Parsimony Approach to Construction of Evolutionary Trees from Aligned Homologous Sequences Multiple alignment; Phylogeny; USA; Evolutionary tree; Parsimony "We feel we have demonstrated (by example but not by rigorous proof) that our heuristic maximum parsimony search procedures can approach the correct evolutionary tree." Methods Enzymol 183 183 601-615 0480 Dardel,F. DNAid: A Macintosh Ful.. Comput.Appl.Bio 88 4(4):483-486 Dardel F; Bensoussan P DNAid: A Macintosh Full Screen Editor Featuring a Built-in Regular Expression Interpreter for the Search of Specific Patterns in Biological Sequences using Finite State Automata Dictionary match; Automata; FR; Language; Pattern search; Pattern language; Editor; Expression "In addition to the classical editing capabilities, powerful analysis and search functions are available from within the editor. ... Furthermore a pattern matching language is included which allows searches for user-defined strict or fuzzy signals within biological sequences. Patterns are translated into finite state automata which allow very efficient searches." Comput Appl Biosci 1988 4 4 483-486 0481 Davies,G. Algorithms for Pattern.. Software.Practi 86 16(6):575-601 Davies G; Bowsher S Algorithms for Pattern Matching String match; Review; UK; Pattern match; Complexity; String search; Algorithm "This paper describes four algorithms of varying complexity used for pattern matching, and investigates their behaviour. The algorithms are tested using patterns of varying length from several alphabets." Software Practice Experience 1986 16 6 575-601 0482 Davison,D. Sequence Similarity ('.. Bull.Math.Biol. 85 47(4):437-474 Davison D Sequence Similarity ('Homology') Searching for Molecular Biologists Pairwise comparison; Review; USA; Sequence alignment; Similarity "Major types of sequence similarity searching ... 
are reviewed and examples of each are presented. The features and limitations of each type of program, and individual implementations of each type are discussed. Two pairs of sequences are used as examples to show how implementations of each type differ in their results and their presentation. Both local and global alignment programs are examined...." Bull Math Biol 1985 47 4 437-474 0483 Davison,D. A Non-metric Sequence .. Bull.Math.Biol. 84 46(4):579-590 Davison D; Thompson KH A Non-metric Sequence Alignment Program Pairwise alignment; USA; Sequence alignment; Region; Program "An algorithm for nucleic acid and protein sequence alignment is presented. It is a non-metric local similarity minimal-difference algorithm and in the current implementation, assembles the matching regions found into a pseudo-global format. Its strength are its speed of execution and the especially convenient presentation of its output. The algorithm is intended for use in sequence melding and local (small-region) similarity searching." Bull Math Biol 1984 46 4 579-590 0484 Day,G.R. Statistical Significan.. Nucleic Acids R 82 10(24):8323-83 Day GR; Blake RD Statistical Significance of Symmetrical and Repetitive Segments of DNA Sequence analysis; Significance; USA; Statistical; Repetition; Segment; DNA "Methods of computer analysis for the recurrence of symmetrical and repetitive elements in large numbers of DNA sequences are described, together with derivations of appropriate quantitative criteria for the evaluation of the statistical significance of these elements in DNAs of different base composition." Nucleic Acids Res 1982 10 24 8323-8339 0485 Day,W.H.E. Properties of Levensht.. Bull.Math.Biol. 84 46(2):327-332 Day WHE Properties of Levenshtein Metrics on Sequences Sequence proximity; CA "Those Levenshtein dissimilarity measures based on insertions and deletions are analyzed by a model involving valuations on a partially ordered set. 
The model reveals structural relationships among poset, valuation and dissimilarity measure. As a consequence, certain Levenshtein dissimilarity measures are shown to be metrics characterized by betweenness properties and computable in terms of well-known measures of sequence similarity." Bull Math Biol 1984 46 2 327-332 0486 Day,W.H.E. An Empirical Evaluatio.. New Approache.. 94Springer-Verlag Day WHE; Gordon AD An Empirical Evaluation of Consensus Rules for Molecular Sequences Diday E Lechevallier Y; Schader M; Bertrand P; Burtschy B New Approaches in Classification and Data Analysis Consensus sequence; Probabilistic; CA "We investigate relationships among several consensus methods for molecular sequences: c(P*), the containing subset method of Gordon [1994]; gp, the generalized plurality rule method of Day and McMorris (1992); and sp, a method based on the simple plurality rule." Springer-Verlag Berlin 1994 347-355 0487 Day,W.H.E. Consensus Sequences Ba.. Bull.Math.Biol. 92 54(6):1057-106 Day WHE; McMorris FR Consensus Sequences Based on Plurality Rule Consensus sequence; Plurality rule; CA; Consensus method "We apply concepts of social choice theory, in particular those concerning median and plurality rules, to investigate the problem of finding a consensus of aligned molecular sequences. ... Our results concern plurality rules which are median rules, are characterized by the Condorcet properties, and are efficient to calculate. Our approach is axiomatic." Bull Math Biol 1992 54 6 1057-1068 0488 Day,W.H.E. Critical Comparison of.. Nucleic Acids R 92 20(5):1093-109 Day WHE; McMorris FR Critical Comparison of Consensus Methods for Molecular Sequences Consensus sequence; Review; CA; Consensus method "We conducted a critical comparison of nine consensus methods for sequences, of which eight were used in papers appearing in this journal. 
We report the results of that comparison, and we make recommendations which we hope will assist researchers when they must select particular consensus methods for particular applications." Nucleic Acids Res 1992 20 5 1093-1099 0489 Day,W.H.E. Interpreting Consensus.. Math.Biosci. 92 111(2):231-247 Day WHE; McMorris FR Interpreting Consensus Sequences Based on Plurality Rule Consensus sequence; Plurality rule; CA; Consensus method; Profile "Our goal is to help researchers interpret the results of a function, based on the concept of plurality rule, that calculates a consensus of a profile of molecular bases. By expressing the plurality rule function as a composition of simpler functions, we obtain both an algorithm to calculate the consensus result and an upper bound on the number of nonequivalent results." Math Biosci 1992 111 2 231-247 0490 Day,W.H.E. Threshold Consensus Me.. J.Theor.Biol. 92 159(4):481-489 Day WHE; McMorris FR Threshold Consensus Methods for Molecular Sequences Consensus sequence; CA; Consensus method "We introduce a parameterized threshold consensus method ... for molecular sequences which is based on a majority-rule voting principle." J Theor Biol 1992 159 4 481-489 0491 Day,W.H.E. Analysing Molecular Se.. N.Z.J.Bot. 93 31(3):211-218 Day WHE; McMorris FR Analysing Molecular Sequences Using Consensus Consensus sequence; Review; CA; Consensus method "Methods for discovering consensus sequences are surveyed. Included are methods based on frequency thresholds, voting strategies, heuristics, neighbourhoods, and measures of inhomogeneity or information content." N Z J Bot 1993 31 3 211-218 0492 Day,W.H.E. Discovering Consensus .. Information a.. 
93Springer-Verlag Day WHE; McMorris FR Discovering Consensus Molecular Sequences Opitz O Lausen B; Klar R Information and Classification - Concepts, Methods and Applications Consensus sequence; Review; CA; Consensus method "We survey methods for discovering consensus sequences such as those based on frequency thresholds, voting strategies, heuristics, neighbourhoods, and measures of inhomogeneity or information content." Springer-Verlag Berlin 1993 393-402 0493 Day,W.H.E. On the Consistency of .. J.Classif. 94 11(2):??-?? Day WHE; McMorris FR On the Consistency of the Plurality Rule Consensus Function for Molecular Sequences Consensus sequence; Plurality rule; CA; Function; Profile; Consistency The plurality rule consensus function fails to satisfy properties of consistency that enable users to understand its behaviour for long profiles in terms of its behaviour for short profiles. Because consistency is a desirable feature of consensus functions, the authors explore the boundaries of its applicability to the plurality rule consensus function J Classif 1994 11 2 ??-?? 0494 Day,W.H.E. The Computation of Con.. Math.Comput.Mod 93 17(10):49-52 Day WHE; McMorris FR The Computation of Consensus Patterns in DNA Sequences Consensus sequence; Longest common; Complexity; CA; DNA "Two important consensus problems are closely related to two well-known sequence problems. M. Waterman's problem of finding consensus strings is a natural extension of the Longest Common Substring problem. The problem of identifying consensus subsequences is a natural extension of the Longest Common Subsequence problem, and thus is NP-hard." Math Comput Modelling 1993 17 10 49-52 0495 Day,W.H.E. On the Existence of Co.. 
J.Comput.Inform 91 2(2):123-137 Day WHE; Mirkin BG On the Existence of Constrained Partitions of Integers Consensus sequence; Plurality rule; CA; Consensus method "We confirm Day and McMorris's conjecture that the plurality-rule consensus function has exactly 26 nonequivalent results when it is used to analyse molecules with four bases." J Comput Inform 1991 2 2 123-137 0496 Dayhoff,M.O. Establishing Homologie.. Methods Enzymol 83 91:524-544 Dayhoff MO; Barker WC; Hunt LT Establishing Homologies in Protein Sequences Sequence proximity; Substitution; USA; Statistical; PAM; Homology; Protein In Hirs,C.H.W., Timasheff,S.N. (Eds.), Enzyme Structure, Part I. "We will be particularly concerned with statistical tests capable of illuminating even very distant relationships. ... From these tests we concluded that ... the MDM matrix [mutation data scoring matrix] for 250 PAMs (Fig. 3) is the best matrix for detecting distantly related sequences." Methods Enzymol 91 91 524-544 0497 Dayhoff,M.O. A Model of Evolutionar.. Atlas of Prot.. 78National Biomed Dayhoff MO; Schwartz RM; Orcutt BC A Model of Evolutionary Change in Proteins Dayhoff MO Atlas of Protein Sequence and Structure, Volume 5, Supplement 3, 1978 Sequence proximity; Substitution; PAM; Scoring; USA; Evolutionary distance; Protein; Model "The matrices derived from [protein] data that describe the amino acid replacement probabilities between two sequences at various evolutionary distances are more accurate and the scoring matrix that is derived is more sensitive in detecting distant relationships than the one that we previously derived." National Biomedical Research Foundation Washington, DC 1978 345-352 0498 Deken,J. Probabilistic Behavior.. Time Warps, S.. 
83Addison-Wesley Deken J Probabilistic Behavior of Longest-Common-Subsequence Length Sankoff D Kruskal JB Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison Longest common; Significance; USA; Probabilistic "For a fairly broad class of models for random sequences (cf. Deken 1979), it can be shown that the length of a longest common subsequence, divided by the total sequence length, approaches ... some constant ck that is a function of the random model used and the number k of letters in the alphabet. ... The counting method used to derive the upper bounds [of ck] is of some independent interest, and so is described below." Addison-Wesley Reading, MA 1983 359-362 0499 Deken,J.G. Some Limit Results for.. Discrete Math. 79 26:17-31 Deken JG Some Limit Results for Longest Common Subsequences Longest common; Significance; USA; Subsequence "We specialize to uniform random sequences and give some improvements on the lower bounds previously obtained [Chvatal, Sankoff 1975] for the proportion of digits which can be matched in the limit. The main result ... is a system of lower bounds which improve known results for alphabets of >2 letters." Discrete Math 26 26 17-31 0500 DeLisi,C. Assessing the Signific.. Math.Biosci. 84 69:77-85 DeLisi C; Kanehisa M Assessing the Significance of Local Sequence Homologies Sequence analysis; Significance; USA; Statistical; Probabilistic; Homology "When homology searches are performed against databases containing several hundred thousand residues, an important number to know is the probability that the homologous sequence would have occurred simply as the result of the large number of arrangements of short sequences that must be present in any large collection of disparate sequences. In this note we evaluate different versions of this question using different sets of rules." Math Biosci 69 69 77-85 0501 Depiereux,E. Simultaneous and Multi.. Protein Eng. 
91 4(6):603-613 Depiereux E; Feytmans E Simultaneous and Multivariate Alignment of Protein Sequences: Correspondence between Physicochemical Profiles and Structurally Conserved Regions (SCR) Multiple alignment; Belgium; Multivariate; Structure; Region; Profile; Protein "The method ... is based on two basic requirements for a meaningful alignment. First, each sequence or segment of a sequence is characterized by a multivariate physicochemical profile. Second, the alignment is performed by considering all the sequences simultaneously, and the algorithm detects those regions that form a set of similar profiles." Protein Eng 1991 4 6 603-613 0502 Depiereux,E. MATCH-BOX: A Fundament.. Comput.Appl.Bio 92 8(5):501-509 Depiereux E; Feytmans E MATCH-BOX: A Fundamentally New Algorithm for the Simultaneous Alignment of Several Protein Sequences Multiple alignment; Segment; Belgium; Gap; Protein; Algorithm The main problems in automatic procedures for multiple alignment are related to the successive pairwise alignment approach and to the choice of gap weighting. The authors' algorithm searches for complete matches common to all the sequences without performing pairwise alignment and regardless of gap weighting Comput Appl Biosci 1992 8 5 501-509 0503 Deshpande,A.S A Platform for Biologi.. Comput.Appl.Bio 91 7(2):237-247 Deshpande AS; Richards DS; Pearson WR A Platform for Biological Sequence Comparison on Parallel Computers Database search; Parallel; USA; Sequence comparison; Sequence database; FASTA; Program "We have written two programs for searching biological sequence databases that run on Intel hypercube computers. PSCANLIB compares a single sequence against a sequence library, and PCOMPLIB compares all the entries in one sequence library against a second library. ... We have implemented the rapid FASTA sequence comparison algorithm and the more rigorous Smith-Waterman algorithm within this framework." Comput Appl Biosci 1991 7 2 237-247 0504 Devereux,J. 
A Comprehensive Set of.. Nucleic Acids R 84 12(1):387-395 Devereux J; Haeberli P; Smithies O A Comprehensive Set of Sequence Analysis Programs for the VAX Sequence analysis; USA; Program; Sequence comparison "The University of Wisconsin Genetics Computer Group (UWGCG) has been organized to develop computational tools for the analysis and publication of biological sequence data. A group of programs that will interact with each other has been developed for the Digital Equipment Corporation VAX computer using the VMS system." Nucleic Acids Res 1984 12 1 387-395 0505 Doolittle,R.F Similar Amino Acid Seq.. Science 81 214(9 Oct.):14 Doolittle RF Similar Amino Acid Sequences: Chance or Common Ancestry Significance; USA; Sequence comparison; Amino acid "Sometimes the surviving similarities [between amino acid sequences] are so vague that even computer-based sequence comparison procedures are unable to validate relationships. In other cases similar sequences may appear in totally alien proteins as a result of mere chance or, occasionally, by the convergent evolution of sequences with special properties." Science 1981 214 9 Oct. 149-159 0506 Doolittle,R.F Searching through Sequ.. Methods Enzymol 90 183:99-110 Doolittle RF Searching through Sequence Databases Database search; USA; Sequence database; Sequence search; Significance "The results of a sequence search usually require that judgments be made about the significance of what has or has not been found. The primary aim of this chapter is to provide a few simple guidelines and hints about how to make these judgments. When it comes to low-level similarity, caution is always warranted." Methods Enzymol 183 183 99-110 0507 Doolittle,R.F Reconstructing History.. Protein Sci. 92 1(2):191-200 Doolittle RF Reconstructing History with Amino Acids Phylogeny; USA; Module; Mosaic; Shuffling "Among the factors that can confound the reconstruction of events, however, are occasional horizontal gene transfers and exon shuffling. 
The latter has led to a number of mosaic proteins, many of which contain various combinations of a relatively small set of modules like the epidermal growth factor domain." Protein Sci 1992 1 2 191-200 0508 Doolittle,R.F Nearest Neighbor Proce.. Methods Enzymol 90 183:659-669 Doolittle RF; Feng DF Nearest Neighbor Procedure for Relating Progressively Aligned Amino Acid Sequences Multiple alignment; Phylogeny; USA; Amino acid Application of the progressive alignment method (Feng and Doolittle 1987, 1990). It is a nearest neighbor method called PAPA: parsimony after progressive alignment Methods Enzymol 183 183 659-669 0509 Dumas,J.P. Efficient Algorithms f.. Nucleic Acids R 82 10(1):197-206 Dumas JP; Ninio J Efficient Algorithms for Folding and Comparing Nucleic Acid Sequences Pairwise comparison; FR; Repeat; Structure; Regularities; Nucleic acid; Folding; Algorithm "Fast algorithms for analysing sequence data are presented. An algorithm for strict homologies finds all common subsequences of length >= 6 in two given sequences. ... We shall describe, in its simplest form, an algorithm to search for strict repeats within a sequence. ... With minor changes the algorithm searches for palindromes or inverted repeats or searches for homologies, inverted homologies or complementarities between two sequences." Nucleic Acids Res 1982 10 1 197-206 0510 Edmiston,E.W. Parallel Processing of.. Internat.J.Para 88 17(3):259-275 Edmiston EW; Core NG; Saltz JH; Smith RM Parallel Processing of Biological Sequence Comparison Algorithms Pairwise alignment; Subalignment; Parallel; USA; Sequence comparison; Algorithm "We present the results of initial investigations using the Intel iPSC/1 hypercube and the Connection Machine (CM-I) for these comparisons [of biological sequences]. Since these machines have very different architectures, the issues and performance trade-offs discussed have a wide applicability for the parallel processing of biological sequence comparisons." 
Internat J Parallel Programming 1988 17 3 259-275 0511 Edmiston,E. Parallelization of the.. Proceedings o.. 87Penn State Pres Edmiston E; Wagner RA Parallelization of the Dynamic Programming Algorithm for Comparison of Sequences Proceedings of the 1987 International Conference on Parallel Processing Pairwise alignment; Parallel; USA; IL; Dynamic programming; Dynamic; Algorithm 17-21 August 1987, Chicago, IL. "We look at parallel algorithms for two similar problems: finding a best match between two sequences and finding a best match of a short sequence to a subsequence of a long sequence. A method for parallelizing the dynamic programming solutions to both of these problems is presented. ... The parallel algorithms can execute on an SIMD machine ...." Penn State Press Philadelphia, PA 1987 78-80 0512 Eilam-Tzoreff Matching Patterns in S.. Theoret.Comput. 88 60(3):231-254 Eilam-Tzoreff T; Vishkin U Matching Patterns in Strings Subject to Multi-linear Transformations String match; Pattern match; IL "Suppose we are given two strings of real numbers. ... We consider problems within the following framework. Suppose each symbol of the pattern was modified by any transformation which is a member in some family of transformations. Find all occurrences of the pattern in the text where the pattern may appear subject to any one of these transformations. Problems are introduced and efficient algorithms are given." Theoret Comput Sci 1988 60 3 231-254 0513 Eppstein,D. Sequence Comparison wi.. J.Algorithms 90 11(1):85-101 Eppstein D Sequence Comparison with Mixed Convex and Concave Costs Pairwise alignment; Sequence proximity; Gap; USA; Sequence comparison "Recently a number of algorithms have been developed for solving the minimum-weight edit sequence problem with non-linear costs for multiple insertions and deletions. We extend these algorithms to cost functions that are neither convex nor concave, but a mixture of both." J Algorithms 1990 11 1 85-101 0514 Eppstein,D. 
Speeding up Dynamic Pr.. IEEE Sympos.Fou 88 29:488-496 Eppstein D; Galil Z; Giancarlo R Speeding up Dynamic Programming Pairwise alignment; USA; Dynamic programming; Data structure; Dynamic 24-26 October 1988. "A number of important computational problems in molecular biology ... can be expressed as recurrences which have typically been solved with dynamic programming. By using more sophisticated data structures, and by taking advantage of further structure from the applications, we speed up the computation of several of these recurrences by one or two orders of magnitude." IEEE Sympos Found Comput Sci 29 29 488-496 0515 Eppstein,D. Sparse Dynamic Program.. J.Assoc.Comput. 92 39(3):519-545 Eppstein D; Galil Z; Giancarlo R; Italiano GF Sparse Dynamic Programming I: Linear Cost Functions Pairwise alignment; USA; Dynamic programming; Sequence comparison; Function; Dynamic "Dynamic programming solutions to a number of different recurrence equations for sequence comparison ... are considered. These recurrences are defined over a number of points that is quadratic in the input size; however only a sparse set matters for the result. Efficient algorithms for these problems are given, when the weight functions used in the recurrences are taken to be linear." J Assoc Comput Mach 1992 39 3 519-545 0516 Eppstein,D. Sparse Dynamic Program.. J.Assoc.Comput. 92 39(3):546-567 Eppstein D; Galil Z; Giancarlo R; Italiano GF Sparse Dynamic Programming II: Convex and Concave Cost Functions Pairwise alignment; USA; Dynamic programming; Gap; Function; Dynamic Continues Eppstein et al. (1992a). "Efficient algorithms are given for solving these problems, when the cost of a gap in the alignment ... is taken as a convex or concave function of the gap ... length." J Assoc Comput Mach 1992 39 3 546-567 0517 Erickson,B.W. Recognition of Pattern.. Time Warps, S.. 
83Addison-Wesley Erickson BW; Sellers PH Recognition of Patterns in Genetic Sequences Sankoff D Kruskal JB Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison Match with k differences; USA; Genetic; Recognition This chapter "deals with the question of how to find the consecutive string (or strings) in a longer sequence a with 'best possible agreement' to a shorter sequence b. ... It presents algorithms to solve two different versions of this question .... In one version, best agreement means that the string has the smallest possible distance to b. In the other, best agreement ... means that no substring or superstring has smaller distance to b." Addison-Wesley Reading, MA 1983 55-91 0518 Felsenstein,J Phylogenies from Molec.. Annu.Rev.Genet. 88 22:521-565 Felsenstein J Phylogenies from Molecular Sequences: Inference and Reliability Multiple alignment; Phylogeny; Statistical; Reliability; Analytical; Robustness; USA Estimating phylogenies. Methods for inferring phylogenies. Statistics and the justification of methods. Statistical tests of phylogenies. The bootstrap, the jackknife, and other resampling methods. Simulation studies Annu Rev Genet 22 22 521-565 0519 Felsenstein,J An Efficient Method fo.. Nucleic Acids R 82 10(1):133-139 Felsenstein J; Sawyer S; Kochin R An Efficient Method for Matching Nucleic Acid Sequences Pairwise comparison; USA; Fourier; Nucleic acid "A method of computing the fraction of matches between two nucleic acid sequences at all possible alignments is described. It makes use of the Fast Fourier Transform. ... This method will complement algorithms for efficiently finding the longest matching parts of two sequences, and is faster than existing algorithms for finding matches allowing deletions and insertions." Nucleic Acids Res 1982 10 1 133-139 0520 Feng,D.F. Progressive Sequence A.. J.Mol.Evol. 
87 25:351-360 Feng DF; Doolittle RF Progressive Sequence Alignment as a Prerequisite to Correct Phylogenetic Trees Multiple alignment; Clustering; USA; Sequence alignment; Needleman-Wunsch; Phylogenetic "A progressive alignment method is described that utilizes the Needleman and Wunsch pairwise alignment algorithm iteratively to achieve the multiple alignment of a set of protein sequences and to construct an evolutionary tree depicting their relationship. ... The method has the added virtue of providing multiple sequence alignments quickly and simply by completely objective criteria." J Mol Evol 25 25 351-360 0521 Feng,D.F. Progressive Alignment .. Methods Enzymol 90 183:375-387 Feng DF; Doolittle RF Progressive Alignment and Phylogenetic Tree Construction of Protein Sequences Multiple alignment; Clustering; USA; Gap; Protein; Phylogenetic "The ... progressive alignment method ... produces a multiple alignment for a set of protein sequences by iteratively acting on the sequences. The essence of the method is based on the simple rule, 'once a gap, always a gap.' Consequently, the order in which the sequences are arranged is crucial. In this regard, an approximate phylogenetic order of the sequences is first determined by a series of pairwise alignments by the Needleman and Wunsch method." Methods Enzymol 183 183 375-387 0522 Feng,D.F. Aligning Amino Acid Se.. J.Mol.Evol. 85 21:112-125 Feng DF; Johnson MS; Doolittle RF Aligning Amino Acid Sequences: Comparison of Commonly Used Methods Sequence proximity; Substitution; Review; USA; Amino acid "We examined two extensive families of protein sequences using four different alignment schemes that employ various degrees of 'weighting' in order to determine which approach is most sensitive in establishing relationships. All alignments used a similarity approach based on a general algorithm devised by Needleman and Wunsch." J Mol Evol 21 21 112-125 0523 Fickett,J.W. 
Fast optimal alignment Nucleic Acids R 84 12(1):175-179 Fickett JW Fast optimal alignment Pairwise alignment; USA; Sequence alignment; Optimal "We show how to speed up sequence alignment algorithms of the type introduced by Needleman and Wunsch .... What we do is reorder the computation of the usual alignment matrix so that the optimal alignment is ordinarily found when only a small fraction of the matrix is filled. The number of matrix elements which have to be computed is related to the distance between the sequences being aligned ...." Nucleic Acids Res 1984 12 1 175-179 0524 Fischel-Ghods Alignment of Protein S.. Protein Eng. 90 3(7):577-581 Fischel-Ghodsian F; Mathiowitz G; Smith TF Alignment of Protein Sequences using Secondary Structure: a Modified Dynamic Programming Method Multiple alignment; Structure; USA; Dynamic programming; Protein; Secondary; Dynamic "A method for comparison of protein sequences based on their primary and secondary structure is described. ... Sequences are compared with a dynamic programming method (STRALIGN) that includes a similarity matrix for both the amino acids and secondary structure." Protein Eng 1990 3 7 577-581 0525 Fitch,W.M. An Improved Method of .. J.Mol.Biol. 66 16:9-16 Fitch WM An Improved Method of Testing for Evolutionary Homology Sequence proximity; USA; Homology "A more sensitive method of searching for a homologous relation between two proteins is presented. The method depends on determining the minimum number of nucleotides which must be altered to permit the conversion of one sequence into the other." J Mol Biol 16 16 9-16 0526 Fitch,W.M. Further Improvements i.. J.Mol.Biol. 70 49:1-14 Fitch WM Further Improvements in the Method of Testing for Evolutionary Homology among Proteins Sequence proximity; USA; Homology; Protein "An earlier method for detecting significant genetic relatedness between two gene products (Fitch 1966) is improved upon through three specific measures. ... 
The third measure [also shows] how quickly the probability that a result would be ascribed to a chance event decreases as the length of the sequences being compared is increased." J Mol Biol 49 49 1-14 0527 Fitch,W.M. Random Sequences J.Mol.Biol. 83 163:171-176 Fitch WM Random Sequences Sequence analysis; Significance; USA "The meaning of random [sequence] is briefly discussed along with a distinction between representative sequences and shuffled sequences. The rationale in choosing between them and a method for shuffling a sequence while preserving nearest-neighbor frequencies is given." J Mol Biol 163 163 171-176 0528 Fitch,W.M. Optimal Sequence Align.. Proc.Nat.Acad.S 83 80:1382-1386 Fitch WM; Smith TF Optimal Sequence Alignments Sequence proximity; USA; Sequence alignment; Codon; Gap; Optimal "Current theory is adequate to the task of finding an optimal alignment between two character strings such as nucleic acids. Most algorithms currently in use must fail to find the homologous alignment between a set of codons for the chicken α- and β-hemoglobin sequence when it is in fact discoverable by a more general treatment of gaps. Fundamental reasons for this are discussed." Proc Nat Acad Sci USA 80 80 1382-1386 0529 Foulser,D.E. Parallel Computation o.. Comput.Biomed.R 90 23(4):310-331 Foulser DE; Core NG Parallel Computation of Multiple Biological Sequence Comparisons Multiple comparison; Common feature; Parallel; USA; Sequence comparison; Search tree "This paper presents a parallel computer implementation of a suffix tree-based method for rapid multiple sequence comparisons, as a variant on a method proposed recently by Karlin et al." See Karlin, Ghandour, Ost, Tavare, and Korn (1983), and Karlin, Morris, Ghandour, and Leung (1988) Comput Biomed Res 1990 23 4 310-331 0530 Fredman,M.L. Algorithms for Computi.. Bull.Math.Biol. 
84 46(4):553-566 Fredman ML Algorithms for Computing Evolutionary Similarity Measures with Length Independent Gap Penalties Sequence alignment; Multiple alignment; USA; Gap; Similarity; Algorithm "We give algorithms for computing the extent of similarity between two or three sequences of letters. The similarity measures we consider include a penalty for inserting gaps within the sequence in order to enhance similarity. The magnitude of the penalty for gaps is assumed to be independent of their size in order to accommodate certain biological applications." Bull Math Biol 1984 46 4 553-566 0531 Friedemann,T. Alignment of Multiple .. Comput.Appl.Bio 88 4(1):213-214 Friedemann T Alignment of Multiple DNA and Protein Sequence Data Multiple alignment; Review; USA; Sequence alignment; Protein; DNA Summary of seven multiple sequence alignment programs and their applicability Comput Appl Biosci 1988 4 1 213-214 0532 Frishman,D. Recognition of Distant.. J.Mol.Biol. 92 228:951-962 Frishman D; Argos P Recognition of Distantly Related Protein Sequences using Conserved Motifs and Neural Networks Match complex patterns; DE; Motif; Neural; Protein; Network; Recognition "A sensitive technique for protein sequence motif recognition based on neural networks has been developed. ... The objective of the present investigation is to develop an automatic and sensitive algorithm to delineate motifs in multiply aligned sequences and then to use these patterns in a search for other distantly related primary structures." J Mol Biol 228 228 951-962 0533 Fristensky,B. Improving the Efficien.. Nucleic Acids R 86 14(1):597-610 Fristensky B Improving the Efficiency of Dot-matrix Similarity Searches Through Use of an Oligomer Table Pairwise comparison; Dot; USA; Similarity; Oligomer "Dot-matrix sequence similarity searches can be greatly speeded up through use of a table listing all locations of short oligomers in one of the sequences to find potential similarities with a second sequence. 
The algorithm described finds similarities between two sequences ... [by] comparing L residues at a time ...." Nucleic Acids Res 1986 14 1 597-610 0534 Fuchs,R. MacPattern: Protein Pa.. Comput.Appl.Bio 91 7(1):105-106 Fuchs R MacPattern: Protein Pattern Searching on the Apple Macintosh Dictionary match; DE; Pattern definition; Protein "A program is described for rapid detection of protein sequence patterns on the Apple Macintosh which makes full use of the information contained in the PROSITE protein pattern database. ... The algorithm used for detecting patterns is based on the set-membership matrix concept (Cockwell, Giles (1989)), adapted to the PROSITE pattern definition syntax." Comput Appl Biosci 1991 7 1 105-106 0535 Fitch,W.M. The Usefulness of Amin.. Evol.Biol. 70 4:67-109 Fitch WM; Margoliash E The Usefulness of Amino Acid and Nucleotide Sequences in Evolutionary Studies Pairwise comparison; Phylogeny; USA; Nucleotide; Amino acid Introduction. Detection of Significant Similarities Between Sequences. Inferring Evolutionary Relationships from Sequence Information. Derived Evolutionary and Genetic Information. Evol Biol 4 4 67-109 0536 Galas,D.J. Rigorous Pattern-recog.. J.Mol.Biol. 85 186:117-128 Galas DJ; Eggert M; Waterman MS Rigorous Pattern-recognition Methods for DNA Sequences: Analysis of Promoter Sequences from Escherichia coli Consensus sequence; Neighbourhood; USA; Statistical; Significance; DNA "We have developed rigorous analytical methods for finding unknown patterns that occur imperfectly in a set of several sequences, and have used them to examine a set of bacterial promoters. ... We also have provided estimates for the statistical significance of common patterns discovered in sets of sequences." J Mol Biol 186 186 117-128 0537 Galil,Z. Real-Time Algorithms f.. 
ACM Sympos.Theo 76 8:161-173 Galil Z Real-Time Algorithms for String-Matching and Palindrome Recognition String match; Complexity; USA; Palindrome; Algorithm; Recognition Hershey, PA, 3-5 May 1976. "We give a sufficient condition when an on-line algorithm can be transformed into a real-time algorithm. We use this condition to construct real-time algorithms for string-matching and palindrome recognition problems by random access machines and by Turing machines." ACM Sympos Theory Comput 8 8 161-173 0538 Galil,Z. On Improving the Worst.. Comm.ACM 79 22(9):505-508 Galil Z On Improving the Worst Case Running Time of the Boyer-Moore String Matching Algorithm Boyer-Moore; IL; String match; Algorithm "It is shown how to modify the Boyer-Moore string matching algorithm so that its worst case running time is linear even when multiple occurrences of the pattern are present in the text." Comm ACM 1979 22 9 505-508 0539 Galil,Z. String Matching in Rea.. J.Assoc.Comput. 81 28(1):134-149 Galil Z String Matching in Real Time Knuth-Morris-Pratt; IL; String match "A sufficient condition for an on-line algorithm to be transformed into a real-time algorithm is given. This condition is used to construct real-time algorithms for various string-matching problems by random access machines and by Turing machines." Knuth-Morris-Pratt real-time algorithms are described for RAM and Turing machine J Assoc Comput Mach 1981 28 1 134-149 0540 Galil,Z. Optimal Parallel Algor.. ACM Sympos.Theo 84 16:240-248 Galil Z Optimal Parallel Algorithms for String Matching Parallel; IL; String match; Optimal; Algorithm Washington, DC, 30 April - 2 May 1984. "Let WRAM (PRAM) be a parallel computer with p processors (RAMs) which share a common memory and are allowed simultaneous reads and writes (only simultaneous reads). ... We design below families of parallel algorithms that solve the string matching problem .... 
Similar families are also obtained for the problem of finding all initial palindromes of a given string." ACM Sympos Theory Comput 16 16 240-248 0541 Galil,Z. Optimal Parallel Algor.. Inform.Control 85 67:144-157 Galil Z Optimal Parallel Algorithms for String Matching Parallel; IL; String match; Optimal; Algorithm "Let WRAM (PRAM) be a parallel computer with p processors (RAMs) which share a common memory and are allowed simultaneous reads and writes (only simultaneous reads). ... We design below families of parallel algorithms that solve the string matching problem .... Similar families are also obtained for the problem of finding all initial palindromes of a given string." Inform Control (Orlando) 67 67 144-157 0542 Galil,Z. A Constant-Time Optima.. ACM Sympos.Theo 92 24:69-76 Galil Z A Constant-Time Optimal Parallel String-Matching Algorithm String match; Parallel; USA; Optimal; Algorithm Victoria, BC, 4-6 May 1992. "Given a pattern string, we describe a way to preprocess it. We design a constant-time optimal parallel algorithm for finding all occurrences of the (preprocessed) pattern in any given text." ACM Sympos Theory Comput 24 24 69-76 0543 Galil,Z. Improved String Matchi.. SIGACT News 86 17(4, whole no Galil Z; Giancarlo R Improved String Matching with k Mismatches Match with k mismatches; IL; Data structure; String match; Search tree "Recently, an efficient algorithm for [matching with k mismatches] has been devised by [Landau and Vishkin]. ... Here we present a compact version of their algorithm .... The data structure that we use is the suffix tree of the pattern modified in order to support the static lowest common ancestor algorithm ...." SIGACT News 1986 17 4, whole no. 62 52-54 0544 Galil,Z. Parallel String Matchi.. Theoret.Comput. 87 51:341-348 Galil Z; Giancarlo R Parallel String Matching with k Mismatches Match with k mismatches; Parallel; USA; String match "Two improved algorithms for string matching with k mismatches are presented. 
One algorithm is based on fast integer multiplication algorithms whereas the other follows more closely classic string-matching techniques." Theoret Comput Sci 51 51 341-348 0545 Galil,Z. Data Structures and Al.. J.Complexity 88 4(1):33-72 Galil Z; Giancarlo R Data Structures and Algorithms for Approximate String Matching Review; USA; Data structure; String match; Parallel; Structure; Algorithm "This paper surveys techniques for designing efficient sequential and parallel approximate string matching algorithms. Special attention is given to the methods for the construction of data structures that efficiently support primitive operations needed in approximate string matching." J Complexity 1988 4 1 33-72 0546 Galil,Z. Speeding up Dynamic Pr.. Theoret.Comput. 89 64:107-118 Galil Z; Giancarlo R Speeding up Dynamic Programming with Applications to Molecular Biology Pairwise alignment; USA; Dynamic programming; Edit; Dynamic "Improved algorithms are obtained for dual convex-concave cases of a problem that arises in many applications. The algorithms speed up several dynamic programming routines that solve as a subproblem the given problem. One typical problem is to compute the edit distance between the two sequences, given substitution costs and a convex cost function for gaps." Theoret Comput Sci 64 64 107-118 0547 Galil,Z. On the Exact Complexit.. SIAM J.Comput. 91 20(6):1008-102 Galil Z; Giancarlo R On the Exact Complexity of String Matching: Lower Bounds Complexity; USA; String match "This paper provides several lower bounds on the number of character comparisons that any string matching algorithm must perform in the worst case in order to find occurrences of a pattern string in a text string. The class of algorithms that are considered need not know the alphabet." SIAM J Comput 1991 20 6 1008-1020 0548 Galil,Z. On the Exact Complexit.. SIAM J.Comput.
92 21(3):407-437 Galil Z; Giancarlo R On the Exact Complexity of String Matching: Upper Bounds Complexity; USA; String match "It is shown that, for any pattern of length m and for any text of length n, it is possible to find all occurrences of the pattern in the text in overall linear time and at most (4n - m)/3 character comparisons." SIAM J Comput 1992 21 3 407-437 0549 Galil,Z. A Linear-time Algorith.. Inform.Process. 90 33(6):309-311 Galil Z; Park K A Linear-time Algorithm for Concave One-dimensional Dynamic Programming Pairwise alignment; USA; Dynamic programming; Edit; Algorithm; Dynamic "The one-dimensional dynamic programming problem is defined .... The modified edit distance problem [Galil, Giancarlo (1989)], which arises in molecular biology, ... can be decomposed into 2n copies of the problem." Inform Process Lett 1990 33 6 309-311 0550 Galil,Z. An Improved Algorithm .. SIAM J.Comput. 90 19(6):989-999 Galil Z; Park K An Improved Algorithm for Approximate String Matching Match with k differences; USA; String match; Algorithm "Given a text string, a pattern string, and an integer k, a new algorithm for finding all occurrences of the pattern string in the text string with at most k differences is presented. Both its theoretical and practical variants improve upon the known algorithms." SIAM J Comput 1990 19 6 989-999 0551 Galil,Z. Dynamic Programming wi.. Theoret.Comput. 92 92:49-76 Galil Z; Park K Dynamic Programming with Convexity, Concavity and Sparsity Pairwise alignment; Dynamic programming; Review; Sequence proximity; Edit; IL; Sequence alignment; Longest common; Gap; Dynamic "In many applications dynamic programming problems satisfy additional conditions of convexity, concavity and sparsity. This paper presents a classification of dynamic programming problems and surveys efficient algorithms based on the three conditions." 
The string edit distance problem, the longest common subsequence problem, the sequence alignment problem, and the sequence alignment problem with linear gap costs are examples or variations of Problem 2.2 Theoret Comput Sci 92 92 49-76 0552 Galil,Z. Linear-time String-mat.. Theoret.Comput. 81 13(3):331-336 Galil Z; Seiferas J Linear-time String-matching using only a Fixed Number of Local Storage Locations String match; Knuth-Morris-Pratt; IL "In an earlier paper (Galil, Seiferas 1980), we asked whether any variant of the linear-time string-matching algorithm of Knuth, Morris, and Pratt (1977) could be implemented as a FORTRAN subroutine. ... In this note, we show that a fixed number of local storage locations does suffice, at least for an implementation which is slightly less 'straightforward.'" Theoret Comput Sci 1981 13 3 331-336 0553 Galil,Z. Time-space-optimal Str.. J.Comput.System 83 26(3):280-294 Galil Z; Seiferas J Time-space-optimal String Matching Knuth-Morris-Pratt; IL; String match "Any string-matching algorithm requires at least linear time and a constant number of local storage locations. We design and analyze an algorithm which realizes both asymptotic bounds simultaneously. This can be viewed as completely eliminating the need for the tabulated 'failure function' in the linear-time algorithm of Knuth, Morris, and Pratt." J Comput Systems Sci 1983 26 3 280-294 0554 Galil,Z. Saving Space in Fast S.. SIAM J.Comput. 80 9(2):417-438 Galil Z; Seiferas JI Saving Space in Fast String Matching Knuth-Morris-Pratt; IL; String match "Algorithms described in this paper reduce the extra space used by the Knuth-Morris-Pratt algorithm down to O(log |x|) [x being the pattern] ...." SIAM J Comput 1980 9 2 417-438 0555 Gatlin,L.L. The Information Conten.. J.Theor.Biol. 
66 10:281-300 Gatlin LL The Information Content of DNA Composition; Information content; USA; DNA "Recognition of the relevance of the transition probability matrix of the nearest neighbor experiment to the Shannon formula makes possible the calculation of the average information per symbol for a given kind of DNA. ... It is speculated that DNA sequences with highly asymmetric transition probabilities serve control functions in the genetic program." J Theor Biol 10 10 281-300 0556 George,D.G. Mutation Data Matrix a.. Methods Enzymol 90 183:333-351 George DG; Barker WC; Hunt LT Mutation Data Matrix and Its Uses Sequence proximity; Substitution; USA; Matrix Similarity scoring matrices. Dayhoff Mutation Data Matrix (MDM). Limitations of the model. Computer applications using MDM. New similarity matrices Methods Enzymol 183 183 333-351 0557 Gibbs,A.J. The Diagram, a Method .. Eur.J.Biochem. 70 16:1-11 Gibbs AJ; McIntyre GA The Diagram, a Method for Comparing Sequences Pairwise comparison; Dot; AU "We describe another alternative, the 'diagonal-match' or diagram method, which, we think has advantages over other methods in that it is basically simple (can be done by hand if necessary), and shows directly all the possible similarities between the sequences." Eur J Biochem 16 16 1-11 0558 Goad,W.B. Pattern Recognition in.. Nucleic Acids R 82 10(1):247-263 Goad WB; Kanehisa MI Pattern Recognition in Nucleic Acid Sequences. I. A General Method for Finding Local Homologies and Symmetries Subalignment; USA; Pattern recognition; Statistical; Subsequence; Needleman-Wunsch; Homology; Nucleic acid; Recognition "We present an algorithm - a generalization of the Needleman-Wunsch-Sellers algorithm - which finds within longer sequences all subsequences that resemble one another locally. The probability that so close a resemblance would occur by chance alone is calculated and used to classify these local homologies according to statistical significance."
Nucleic Acids Res 1982 10 1 247-263 0559 Goldstein,L. Poisson Approximation .. Comm.Statist.Th 90 19(11):4167-41 Goldstein L Poisson Approximation and DNA Sequence Matching Pairwise alignment; Significance; USA; Approximation; Sequence match; Poisson; DNA "A formal justification for the strong limit behavior of the log n law for the case of perfect matching between sequences is given in Arratia and Waterman (1985); how to obtain detailed information about the distributional behavior in the log n law using the Chen-Stein method is outlined below." Comm Statist Theory Methods 1990 19 11 4167-4179 0560 Goldstein,L. Poisson, Compound Pois.. Bull.Math.Biol. 92 54(5):785-812 Goldstein L; Waterman MS Poisson, Compound Poisson and Process Approximations for Testing Statistical Significance in Sequence Comparisons Pairwise alignment; Significance; USA; Approximation; Statistical; Sequence comparison; Poisson "Most ... algorithms search for the alignment of two sequences that optimizes some alignment score. It is an important problem to assess the statistical significance of a given score. In this paper we use newly developed methods for Poisson approximation to derive estimates of the statistical significance of k-word matches on a diagonal of a sequence comparison." Bull Math Biol 1992 54 5 785-812 0561 Gonnet,G.H. An Analysis of the Kar.. Inform.Process. 90 34(5):271-274 Gonnet GH; Baeza-Yates RA An Analysis of the Karp-Rabin String Matching Algorithm CA; Probabilistic; String match; String search; Algorithm "We present an average case analysis of the Karp-Rabin string matching algorithm. This algorithm is a probabilistic algorithm that adapts hashing techniques to string searching. We also propose an efficient implementation of this algorithm." Inform Process Lett 1990 34 5 271-274 0562 Gonnet,G.H. Exhaustive Matching of.. 
Science 92 256(5 June):14 Gonnet GH; Cohen MA; Benner SA Exhaustive Matching of the Entire Protein Sequence Database Database search; SWI; Sequence database; Gap; Protein "The entire protein sequence database has been exhaustively matched. Definitive mutation matrices and models for scoring gaps were obtained from the matching and used to organize the sequence database as sets of evolutionarily connected components. ... The key to matching in a reasonable time lies in the step preceding the application of the Needleman-Wunsch algorithm: a reorganization of the sequence data by indexing on a patricia tree." Science 1992 256 5 June 1443-1445 0563 Gordon,A.D. A Probabilistic Approa.. New Approache.. 94Springer-Verlag Gordon AD A Probabilistic Approach to Identifying Consensus in Molecular Sequences Diday E; Lechevallier Y; Schader M; Bertrand P; Burtschy B New Approaches in Classification and Data Analysis Consensus sequence; Probabilistic; Profile; UK "Given a profile of nucleic acid bases at a specified position in an aligned set of molecular sequences, a simple rule for defining ambiguity codes is presented: all bases whose frequency in the profile falls below the maximum profile frequency by no more than a specified number d are included in the ambiguity code. Ways are described of defining d so as to ensure that this 'containing subset' possesses desirable properties under the assumption of a multinomial model for the frequencies of bases in the profile." Springer-Verlag Berlin 1994 356-361 0564 Gotoh,O. An Improved Algorithm .. J.Mol.Biol. 82 162:705-708 Gotoh O An Improved Algorithm for Matching Biological Sequences Pairwise alignment; JP; Gap; Algorithm Waterman, Smith, and Beyer (1976) described an O(m^2 n) algorithm for aligning two sequences in which gaps of any length are allowed. This paper presents a new O(mn) algorithm in which gap weights have a special form J Mol Biol 162 162 705-708 0565 Gotoh,O. Alignment of Three Bio.. J.Theor.Biol.
86 121:327-337 Gotoh O Alignment of Three Biological Sequences with an Efficient Traceback Procedure Multiple alignment; JP; Dynamic programming "This paper describes a dynamic programming algorithm for aligning three sequences at a time." J Theor Biol 121 121 327-337 0566 Gotoh,O. Pattern Matching of Bi.. Comput.Appl.Bio 87 3(1):17-20 Gotoh O Pattern Matching of Biological Sequences with Limited Storage Subalignment; JP; Pattern match; Complexity A method is described for getting the locally best matched alignments between a pair of biological sequences which greatly reduces the storage requirement while maintaining the O(n^2) time complexity Comput Appl Biosci 1987 3 1 17-20 0567 Gotoh,O. Consistency of Optimal.. Bull.Math.Biol. 90 52(4):509-525 Gotoh O Consistency of Optimal Sequence Alignments Multiple alignment; Segment; JP; Sequence alignment; Region; Optimal; Consistency A previous method "is further extended so that the combination of pairwise alignments that gives the greatest consistency is found when possibly many alignments are equally optimal for each pairwise comparison. A method for acceleration of simultaneous multiple sequence alignment is proposed in which consistent regions serve as 'anchor points' limiting application of direct multi-way alignment to the rest of 'inconsistent' regions." Bull Math Biol 1990 52 4 509-525 0568 Gotoh,O. Optimal Sequence Align.. Bull.Math.Biol. 90 52(3):359-373 Gotoh O Optimal Sequence Alignment Allowing for Long Gaps Pairwise alignment; JP; Sequence alignment; Gap; Optimal "Because a long stretch in a biological sequence can be lost or added by a single mutational event such as unequal crossing-over or transposition of a movable element, the probability of occurrence of a long gap seems almost independent of the gap length, while short insertions or deletions would occur in a length-dependent frequency."
Thus a new algorithm is presented for optimal sequence alignment in which the gap weight function is a piecewise linear function Bull Math Biol 1990 52 3 359-373 0569 Gotoh,O. Optimal Alignment Betw.. Comput.Appl.Bio 93 9(3):361-370 Gotoh O Optimal Alignment Between Groups of Sequences and its Application to Multiple Sequence Alignment Multiple alignment; JP; Optimal; Sequence alignment "Four algorithms ... were developed to align two groups of biological sequences. ... The advantages and disadvantages of the four algorithms are discussed on the basis of the results of examinations of several protein families." Comput Appl Biosci 1993 9 3 361-370 0570 Gotoh,O. Sequence Search on a S.. Nucleic Acids R 86 14(1):57-64 Gotoh O; Tagashira Y Sequence Search on a Supercomputer Database search; JP; Sequence search; Gap "A set of programs was developed for searching nucleic acid and protein sequence data bases for sequences similar to a given sequence. The programs ... were optimized for vector processing on a Hitachi S810-20 supercomputer. ... The principal algorithm is that of Smith and Waterman (1981) modified to incorporate a linear gap weight (Gotoh 1982)." Nucleic Acids Res 1986 14 1 57-64 0571 Gribskov,M. The Language Metaphor .. Computers Chem. 92 16(2):85-88 Gribskov M The Language Metaphor in Sequence Analysis Sequence analysis; Linguistic; USA; Coding; Language "The metaphors of language and coding have provided a powerful framework for organizing molecular biology. Many techniques developed in the analysis of text and other communication channels have been successfully applied to macromolecular sequences with little or no change. ... A number of properties, such as long-range interactions, structural dynamics and the importance of sequence variation in modulation of function, are poorly modelled by the language metaphor." Computers Chem 1992 16 2 85-88 0572 Gribskov,M. Profile Scanning for T..
Comput.Appl.Bio 88 4(1):61-66 Gribskov M; Homyak M; Edenfield J; Eisenberg D Profile Scanning for Three-dimensional Structural Patterns in Protein Sequences Match a pattern matrix; USA; Dynamic programming; Profile; Protein "Profile analysis measures the similarity between a target sequence and a group of aligned sequences (the probe). The probe sequences are used to produce a position-specific scoring table (the profile) that can be aligned with any sequence (the target) using standard dynamic programming methods." Comput Appl Biosci 1988 4 1 61-66 0573 Gribskov,M. Profile Analysis Methods Enzymol 90 183:146-159 Gribskov M; Luthy R; Eisenberg D Profile Analysis Match a pattern matrix; USA; Profile; Motif "The profile method provides a convenient way to represent information about groups or families of sequences as well as a means to ask questions about the definition of protein families, the relationships between distantly related proteins, and the presence of sequence or structural motifs in proteins." Methods Enzymol 183 183 146-159 0574 Gribskov,M. Profile Analysis: Dete.. Proc.Nat.Acad.S 87 84(13):4355-43 Gribskov M; McLachlan AD; Eisenberg D Profile Analysis: Detection of Distantly Related Proteins Match a pattern matrix; USA; Profile; Sequence comparison; Protein; Detection "Profile analysis is a method for detecting distantly related proteins by sequence comparison. The basis for comparison is not only the customary Dayhoff mutational-distance matrix but also the results of structural studies and information implicit in the alignments of the sequences of families of similar proteins. This information is expressed in a position-specific scoring table (profile) ...." Proc Nat Acad Sci USA 1987 84 13 4355-4358 0575 Griggs,J.R. Sequence Alignments wi.. 
SIAM J.Algebrai 86 7(4):604-608 Griggs JR; Hanlon PJ; Waterman MS Sequence Alignments with Matched Sections Pairwise alignment; USA; Sequence alignment "In molecular biology, two finite sequences are compared by displaying one sequence written over another in an alignment. The number of alignments of two sequences is related to the Stanton-Cowan numbers. This paper gives asymptotics for the number of alignments of two sequences of length n with matching sections of size at least b." SIAM J Algebraic Discrete Methods 1986 7 4 604-608 0576 Grob,U. Recognition of Ill-def.. Comput.Appl.Bio 88 4(1):79-88 Grob U; Stuber K Recognition of Ill-defined Signals in Nucleic Acid Sequences Match a pattern matrix; DE; Signal; Nucleic acid; Recognition; Consensus matrix "A set of programs has been developed for the definition and handling of nucleic acid sequence consensus information. The sequences of known genetic control signals are combined in a matrix." Comput Appl Biosci 1988 4 1 79-88 0577 Grossi,R. A Fast VLSI Solution f.. Integration, Th 92 13(2):195-206 Grossi R A Fast VLSI Solution for Approximate String Matching Match with k differences; Parallel; String match; Italy; VLSI "A simple hardware algorithm is proposed for the approximate string matching problem .... The employed interconnection network is a classical mesh-of-trees, augmented with trees along the main diagonals of the mesh. The area and time bounds are shown, and compared favorably with previous solutions in many cases." Integration, The VLSI J 1992 13 2 195-206 0578 Grossi,R. Simple and Efficient S.. Inform.Process. 89 33:113-120 Grossi R; Luccio F Simple and Efficient String Matching with k Mismatches Match with k mismatches; Italy "We follow a new approach to [string matching with k mismatches] based on the determination of the permutations of [string] P in [string] T, and propose two algorithms for its solution. ... An extensive set of runs shows ...
that the running times are strongly reduced, thus making our algorithms important in practice." Inform Process Lett 33 33 113-120 0579 Guibas,L.J. A New Proof of the Lin.. SIAM J.Comput. 80 9(4):672-682 Guibas LJ; Odlyzko AM A New Proof of the Linearity of the Boyer-Moore String Searching Algorithm String match; Boyer-Moore; USA; String search; Algorithm "We study the combinatorial structure of periodic strings and use these results to derive a new proof of the linearity of the Boyer-Moore algorithm in the worst case. Our proof reduces the previously best known bound of 7n to 4n, where n is the length of the text." SIAM J Comput 1980 9 4 672-682 0580 Guibas,L.J. Periods in Strings J.Combin.Theory 81 30(1):19-42 Guibas LJ; Odlyzko AM Periods in Strings Regularities; USA "We explore the notion of periods of a string. A period can be thought of as a shift that causes the string to match over itself. ... This problem arose in connection with our work on string searching algorithms. ... The more sophisticated of these algorithms extract information from an unsuccessful match and use it to rule out other matches which have no chance of succeeding. These decisions invariably require knowledge of how the pattern matches over itself ...." J Combin Theory Ser A 1981 30 1 19-42 0581 Guibas,L.J. String Overlaps, Patte.. J.Combin.Theory 81 30(2):183-208 Guibas LJ; Odlyzko AM String Overlaps, Pattern Matching and Nontransitive Games String match; Complexity; USA; Pattern match "This paper studies several topics concerning the way strings can overlap. The key notion of the correlation of two strings is introduced, which is a representation of how the second string can overlap into the first. ... Another application shows that no algorithm can check for the presence of a given pattern in a text without examining essentially all characters of the text in the worst case." J Combin Theory Ser A 1981 30 2 183-208 0582 Gusfield,D. Efficient Methods for .. Bull.Math.Biol. 
93 55(1):141-154 Gusfield D Efficient Methods for Multiple Sequence Alignment with Guaranteed Error Bounds Multiple alignment; Complexity; Approximation; USA; Sequence alignment; Error "Several precise measures have been proposed for evaluating the goodness of a multiple alignment, but no efficient methods are known which compute the optimal alignment for any of these measures in any but small cases. In this paper, we consider two previously proposed measures, and give two computationally efficient multiple alignment methods (one for each measure) whose deviation from the optimal value is guaranteed to be less than a factor of two." Bull Math Biol 1993 55 1 141-154 0583 Haber,J.E. An Evaluation of the R.. J.Mol.Biol. 70 50:617-639 Haber JE; Koshland DE Jr An Evaluation of the Relatedness of Proteins based on Comparison of Amino Acid Sequences Pairwise alignment; Significance; USA; Statistical; Sequence comparison; Protein; Amino acid "A procedure based on comparing amino acid residues was developed to examine the statistical consequences of practices commonly applied in sequence comparisons. ... Rules of thumb were developed to indicate significant relatedness of two sequences beyond the expectations of chance." J Mol Biol 50 50 617-639 0584 Hall,J.D. A Software Tool for Fi.. Comput.Appl.Bio 88 4(1):35-40 Hall JD; Myers EW A Software Tool for Finding Locally Optimal Alignments in Protein and Nucleic Acid Sequences Subalignment; USA; Region; Locally optimal; Optimal; Protein; Nucleic acid "We describe software for aligning protein or nucleic acid sequences based on the concept of match density. This method is especially useful for locating regions of short similarity between two longer sequences which may be largely dissimilar (e.g. locating active site regions in distantly related proteins)." Comput Appl Biosci 1988 4 1 35-40 0585 Hall,P.A.V. Approximate String Mat.. 
ACM Comput.Surv 80 12(4):381-402 Hall PAV; Dowling GR Approximate String Matching Review; UK; Dynamic programming; String match "Approximate matching of strings is reviewed with the aim of surveying techniques suitable for finding an item in a database when there may be a spelling mistake or other error in the keyword. The methods found are classified as either equivalence or similarity problems. Equivalence problems are seen to be readily solved using canonical forms. For similarity problems difference measures are surveyed, with a full description of the well-established dynamic programming method ...." ACM Comput Surveys 1980 12 4 381-402 0586 Harr,R. Search Algorithm for P.. Nucleic Acids R 83 11(9):2943-295 Harr R; Haggstrom M; Gustafsson P Search Algorithm for Pattern Match Analysis of Nucleic Acid Sequences Match a pattern matrix; SWE; Statistical; Significance; Pattern match; Nucleic acid; Algorithm "The algorithm is of pattern match type and is based on the fact that genetic information often is a function of a predictable statistical occurrence of the four bases within parts of the sequence. The search algorithm compares the known statistical pattern of bases in e.g. a promoter, with an unknown sequence and calculates the statistical significance of the match at all positions in the unknown sequence." Nucleic Acids Res 1983 11 9 2943-2957 0587 Hashiguchi,K. String Matching Proble.. Inform.Comput. 92 101(2):131-149 Hashiguchi K; Yamada K String Matching Problems over Free Partially Commutative Monoids Knuth-Morris-Pratt; JP; String match "This paper studies two string matching problems over free partially commutative monoids. We analyze these two problems in detail, and present two efficient polynomial time algorithms for solving them. ... Thus our algorithms may be regarded as FPCM-versions of the Knuth-Morris-Pratt string matching algorithm." Inform Comput 1992 101 2 131-149 0588 Haskin,R.L. Operational Characteri.. 
ACM Trans.Datab 83 8(1):15-40 Haskin RL; Hollaar LA Operational Characteristics of a Hardware-based Pattern Matcher Match complex patterns; USA; Automata "The design and operation of a new class of hardware-based pattern matchers ... is presented. This recognizer is based on a unique implementation technique for finite state automata consisting of partitioning the state table among a number of simple digital machines." ACM Trans Database Systems 1983 8 1 15-40 0589 Hein,J. A New Method that Simu.. Mol.Biol.Evol. 89 6(6):649-668 Hein J A New Method that Simultaneously Aligns and Reconstructs Ancestral Sequences for any Number of Homologous Sequences, When the Phylogeny is Given Multiple alignment; Evolutionary tree; USA; Reconstruct; Phylogeny "Among the fundamental problems in molecular evolution and in the analysis of homologous sequences are alignment, phylogeny reconstruction, and the reconstruction of evolutionary sequences. This paper presents a fast, combined solution to these problems." Mol Biol Evol 1989 6 6 649-668 0590 Hein,J. A Tree Reconstruction .. Mol.Biol.Evol. 89 6(6):669-684 Hein J A Tree Reconstruction Method that is Economical in the Number of Pairwise Comparisons Used Multiple alignment; Phylogeny; USA; Pairwise comparison "A fast method for reconstructing phylogenies from distance data is presented. The method is economical in the number of pairwise comparisons needed. It can be combined with a new phylogenetic alignment procedure to yield an algorithm that gives a complete history of a set of homologous sequences." Mol Biol Evol 1989 6 6 669-684 0591 Hein,J. Unified Approach to Al.. Methods Enzymol 90 183:626-645 Hein J Unified Approach to Alignment and Phylogenies Multiple alignment; Evolutionary tree; Phylogeny; USA "Conventionally, the alignment problem involves two sequences and must consider both substitutions and insertions/deletions. 
The phylogeny problem involves more sequences but usually requires that the insertions/deletions be taken care of beforehand. The accomplishment of the method presented here is to solve both problems simultaneously." Methods Enzymol 183 183 626-645 0592 Hein,J. A Heuristic Method to .. J.Mol.Evol. 93 36(4):396-405 Hein J A Heuristic Method to Reconstruct the History of Sequences Subject to Recombination Multiple alignment; Phylogeny; JP; Reconstruct; Heuristic; Recombination "Sequences subject to recombination and gene conversion defy phylogenetic analysis by traditional methods since their evolutionary history cannot be adequately summarized by a tree. This study investigates ways to describe their evolutionary history and proposes a method giving a partial reconstruction of this history." J Mol Evol 1993 36 4 396-405 0593 Henikoff,S. Automated Assembly of .. Nucleic Acids R 91 19(23):6565-65 Henikoff S; Henikoff JG Automated Assembly of Protein Blocks for Database Searching Database search; USA; Region; Motif; Protein "Here we present a system that is designed to assemble a best set of blocks for a given group of related proteins. The blocks are extended from ungapped aligned regions discovered by the MOTIF algorithm of [Smith, Annau, Chandrasegaran 1990] which can rapidly detect very distant relationships among large groups of proteins. Many blocks might be found, and they might overlap or appear in different orders .... The best set of blocks among these is determined by a new algorithm ...." Nucleic Acids Res 1991 19 23 6565-6572 0594 Henikoff,S. Detection of Protein S.. Nucleic Acids R 88 16(13):6191-62 Henikoff S; Wallace JC Detection of Protein Similarities Using Nucleotide Sequence Databases Database search; USA; Sequence database; Frame; Similarity; Protein; Nucleotide; Detection "A simple procedure is described for finding similarities between proteins using nucleotide sequence databases. 
...[A probe] consisting of an unidentified open reading frame (ORF) ... was conceptually translated into protein and compared to every possible translated reading frame of every nucleotide sequence in the database." Nucleic Acids Res 1988 16 13 6191-6204 0595 Henikoff,S. Finding Protein Simila.. Methods Enzymol 90 183:111-132 Henikoff S; Wallace JC; Brown JP Finding Protein Similarities with Nucleotide Sequence Databases Database search; USA; Sequence database; Similarity; Protein; Nucleotide "It is worthwhile to search nucleotide sequence databases for protein similarities, since these databases are more complete and up-to-date. As illustrated in this chapter, it is advantageous to search these databases for amino acid rather than nucleotide sequence similarities. Therefore, we have adapted amino acid sequence searching procedures to detect similarities within nucleotide sequence databases." Methods Enzymol 183 183 111-132 0596 Henneke,C.M. A Multiple Sequence Al.. Comput.Appl.Bio 89 5(2):141-150 Henneke CM A Multiple Sequence Alignment Algorithm for Homologous Proteins using Secondary Structure Information and Optionally Keying Alignments to Functionally Important Sites Multiple alignment; Clustering; Sequence alignment; Structure; Protein; UK; Secondary; Algorithm "The programs described herein function as part of a suite of programs designed for pairwise alignment, multiple alignment, generation of randomized sequences, production of alignment scores and a sorting routine for analysis of the alignments produced." Comput Appl Biosci 1989 5 2 141-150 0597 Hertz,G.Z. Identification of Cons.. 
Comput.Appl.Bio 90 6(2):81-92 Hertz GZ; Hartzell GW III; Stormo GD Identification of Consensus Patterns in Unaligned DNA Sequences Known to be Functionally Related Consensus sequence; Information theory; USA; Identification; DNA The method identifies "consensus patterns in a set of unaligned DNA sequences known to bind a common protein or to have some other common biochemical function. The method is based on a matrix representation of binding site patterns. ... The goal of the method is to find the most significant matrix ... out of all the matrices that can be formed ...." Comput Appl Biosci 1990 6 2 81-92 0598 Higgins,D.G. Sequence Ordinations: .. Comput.Appl.Bio 92 8(1):15-22 Higgins DG Sequence Ordinations: A Multivariate Analysis Approach to Analysing Large Sequence Data Sets Multiple alignment; Multivariate; DE "This paper shows how to use principal coordinates analysis to find low-dimensional representations of distance matrices derived from aligned sets of sequences." Comput Appl Biosci 1992 8 1 15-22 0599 Higgins,D.G. CLUSTAL V: Improved So.. Comput.Appl.Bio 92 8(2):189-191 Higgins DG; Bleasby AJ; Fuchs R CLUSTAL V: Improved Software for Multiple Sequence Alignment Multiple alignment; Clustering; DE; Sequence alignment; Profile "... the multiple alignments are carried out in a progressive manner .... Sequences are aligned in larger and larger groups according to the branching order in a 'guide tree' ... constructed using the UPGMA method .... The strategy for aligning two alignments is a simple extension of the profile alignment method of Gribskov et al. (1987)." Comput Appl Biosci 1992 8 2 189-191 0600 Higgins,D.G. CLUSTAL: A Package for.. Gene 88 73:237-244 Higgins DG; Sharp PM CLUSTAL: A Package for Performing Multiple Sequence Alignment on a Microcomputer Multiple alignment; Clustering; IR; Sequence alignment "An approach for performing multiple alignments of large numbers of amino acid or nucleotide sequences is described.
The method is based on first deriving a phylogenetic tree from a matrix of all pairwise sequence similarity scores, obtained using a fast pairwise alignment algorithm. Then the multiple alignment is achieved from a series of pairwise alignments of clusters of sequences, following the order of branching in the tree." Gene 73 73 237-244 0601 Higgins,D.G. Fast and Sensitive Mul.. Comput.Appl.Bio 89 5(2):151-153 Higgins DG; Sharp PM Fast and Sensitive Multiple Sequence Alignments on a Microcomputer Multiple alignment; Clustering; IR; Sequence alignment "A strategy is described for the rapid alignment of many long nucleic acid or protein sequences on a microcomputer. ... The approach is based on progressively aligning sequences according to the branching order in an initial phylogenetic tree." Comput Appl Biosci 1989 5 2 151-153 0602 Higgins,D.G. EMBLSCAN: Fast Approxi.. Comput.Appl.Bio 92 8(2):137-139 Higgins DG; Stoehr P EMBLSCAN: Fast Approximate DNA Database Searches on Compact Disc Database search; DE; Distributed; DNA "An algorithm that allows rapid searching of nucleic acid sequences based on pregenerated index files is described. The programs and index files for searching the entire EMBL nucleotide sequence collection are being distributed on the EMBL Data Library's CD-ROM." Comput Appl Biosci 1992 8 2 137-139 0603 Hirosawa,M. MASCOT: Multiple Align.. Comput.Appl.Bio 93 9(2):161-167 Hirosawa M; Hoshida M; Ishikawa M; Toya T MASCOT: Multiple Alignment System for Protein Sequences Based on Three-way Dynamic Programming Multiple alignment; Clustering; JP; Simulated annealing; Protein; Dynamic programming; Dynamic "MASCOT achieves high-quality alignment by employing three-way alignment in addition to two-way alignment. The resultant alignments are refined by simulated annealing to higher quality. We also use a cluster analysis of sequences to produce highly reliable alignments." Comput Appl Biosci 1993 9 2 161-167 0604 Hirschberg,D. A Linear Space Algorit.. 
Comm.ACM 75 18(6):341-343 Hirschberg DS A Linear Space Algorithm for Computing Maximal Common Subsequences Longest common; USA; Subsequence; Algorithm "The problem of finding a longest common subsequence of two strings has been solved in quadratic time and space. An algorithm is presented which will solve this problem in quadratic time and in linear space." Comm ACM 1975 18 6 341-343 0605 Hirschberg,D. Algorithms for the Lon.. J.Assoc.Comput. 77 24(4):664-675 Hirschberg DS Algorithms for the Longest Common Subsequence Problem Longest common; USA; Subsequence; Algorithm "Two algorithms are presented that solve the longest common subsequence problem. The first algorithm is applicable in the general case and requires O(pn + n log n) time where p is the length of the longest common subsequence. ... In the common special case where p is close to m, [the second] algorithm takes much less time than n^2." J Assoc Comput Mach 1977 24 4 664-675 0606 Hirschberg,D. An Information-Theoret.. Inform.Process. 78 7(1):40-41 Hirschberg DS An Information-Theoretic Lower Bound for the Longest Common Subsequence Problem Longest common; Complexity; Information theory; USA; Subsequence "We shall prove that n log n is a lower bound on the number of "less than - equal - greater than" comparisons required to solve the LCS problem, assuming unrestricted alphabet size." Inform Process Lett 1978 7 1 40-41 0607 Hirschberg,D. Recent Results on the .. Time Warps, S.. 83Addison-Wesley Hirschberg DS Recent Results on the Complexity of Common-Subsequence Problems Sankoff D Kruskal JB Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison Longest common; Complexity; USA An overview of recent results in the solution of a variety of common-subsequence problems Addison-Wesley Reading, MA 1983 325-330 0608 Hirst,J.D. Prediction of ATP-bind.. Protein Eng.
91 4(6):615-623 Hirst JD; Sternberg MJE Prediction of ATP-binding Motifs: A Comparison of a Perceptron-type Neural Network and a Consensus Sequence Method Match a pattern matrix; UK; Pattern recognition; Motif; Neural; Consensus sequence; Statistical; Prediction; Network "In this paper, a two-layer feed-forward neural network has been trained to recognize ATP-binding local sequence motifs. The neural network correctly classified 78% of the 349 sequences used. This was much better than a simple motif-searching program. A more sophisticated statistical method was developed, however, which performed marginally better (80% correct classification) than the neural network." Protein Eng 1991 4 6 615-623 0609 Hodgman,T.C. The Elucidation of Pro.. Comput.Appl.Bio 89 5(1):1-13 Hodgman TC The Elucidation of Protein Function by Sequence Motif Analysis Sequence analysis; Review; UK; Motif; Function; Protein "Protein sequence motifs are acquiring increasing prominence in the area of sequence analysis. This review describes the current methods of their construction and their use in the determination of protein function, and offers guidelines on interpreting data obtained." Comput Appl Biosci 1989 5 1 1-13 0610 Hogeweg,P. The Alignment of Sets .. J.Mol.Evol. 84 20:175-186 Hogeweg P; Hesper B The Alignment of Sets of Sequences and the Construction of Phyletic Trees: An Integrated Method Multiple alignment; Evolutionary tree; NL "The alignment of sets of sequences and the construction of phyletic trees cannot be treated separately. The concept of 'good alignment' is meaningless without reference to a phyletic tree, and the construction of phyletic trees presupposes alignment of the sequences. We propose an integrated method that generates both an alignment of a set of sequences and a phyletic tree." J Mol Evol 20 20 175-186 0611 Horspool,R.N. Practical Fast Searchi.. 
Software.Practi 80 10:501-506 Horspool RN Practical Fast Searching in Strings String match; Boyer-Moore; CA "The problem of searching through text to find a specified substring is considered in a practical setting. It is discovered that a method developed by Boyer and Moore can outperform even special-purpose search instructions that may be built into the computer hardware. For very short substrings however, these special purpose instructions are fastest ...." Software Practice Experience 10 10 501-506 0612 Hsu,W.J. Computing a Longest Co.. BIT 84 24:45-59 Hsu WJ; Du MW Computing a Longest Common Subsequence for a Set of Strings Multiple alignment; CN; Dynamic programming; Longest common; Subsequence "The known 2-string LCS problem is generalized to finding a Longest Common Subsequence (LCS) for a set of strings. A new, general approach that systematically enumerates common subsequences is proposed for the solution. ... The proposed method may be considered to be much more efficient than the straightforward dynamic programming approach." BIT 24 24 45-59 0613 Hsu,W.J. New Algorithms for LCS.. J.Comput.System 84 29(2):133-152 Hsu WJ; Du MW New Algorithms for LCS Problem Longest common; CN; Algorithm "Two algorithms which improve two existing results, respectively, are presented. ... The [first] algorithm also exhibits desirable properties under conditions of sparse matches. [The second] also outperforms existing algorithms designed for sparsely-matched situations. ... The two algorithms provide interesting contrasts of different approaches to one problem ...." J Comput Systems Sci 1984 29 2 133-152 0614 Huang,X. A Lower Bound for the .. Inform.Process. 
88 27(6):319-321 Huang X A Lower Bound for the Edit-distance Problem Under an Arbitrary Cost Function Pairwise alignment; Complexity; USA; Edit; Longest common; Function "We show that any algorithm that can compute the edit distance of two strings under an arbitrary cost function must take time proportional to n^2 under the RAM model of computation, where n is the length of the strings. As a corollary, we observe that the Hunt-Szymanski algorithm for longest common subsequences cannot be extended to solve the general edit-distance problem." Inform Process Lett 1988 27 6 319-321 0615 Huang,X. A Space-efficient Para.. Internat.J.Para 89 18(3):223-239 Huang X A Space-efficient Parallel Sequence Comparison Algorithm for a Message-passing Multiprocessor Subalignment; Parallel; USA; Sequence comparison; Sequence alignment; Algorithm "We present a parallel algorithm for computing an optimal sequence alignment in efficient space. The algorithm is intended for a message-passing architecture with one-dimensional-array topology. ... Some experimental results on an Intel hypercube are provided." Internat J Parallel Programming 1989 18 3 223-239 0616 Huang,X. Computing Local Sequen.. Proceedings o.. 90 Huang X Computing Local Sequence Similarities on a Hypercube Proceedings of the 1990 International Conference on Parallel Processing, Vol. III Subalignment; Parallel; USA; Region; Similarity "Recently, a space efficient algorithm for finding similar regions of two sequences has been developed (Huang, Hardison, Miller 1990). In this paper, we consider parallelizing the algorithm on an Intel iPSC/2 hypercube. Experimental results show that high parallel efficiency is achieved." 1990 360-361 0617 Huang,X. A Space-efficient Algo..
Comput.Appl.Bio 90 6(4):373-381 Huang X; Hardison RC; Miller W A Space-efficient Algorithm for Local Similarities Subalignment; USA; Similarity; Algorithm "We describe a dynamic-programming local-similarity algorithm that needs only space proportional to the sum of the sequence lengths. The method can also find repeats within a single long sequence. ... Our linear-space local similarity algorithm combines the linear-space global alignment algorithm of Myers and Miller (1988) with techniques of Waterman and Eggert (1987)." Comput Appl Biosci 1990 6 4 373-381 0618 Huang,X. A Time-Efficient, Line.. Adv.Appl.Math. 91 12:337-357 Huang X; Miller W A Time-Efficient, Linear-Space Local Similarity Algorithm Subalignment; USA; Similarity; Algorithm "This paper presents a time-efficient algorithm that produces k best 'non-intersecting' local alignments for any chosen k. The algorithm's main strength is that it needs only O(M + N + K) space, where M and N are the lengths of the given sequences and K is the total length of the computed alignments." Adv Appl Math 12 12 337-357 0619 Huang,X. Parallelization of a L.. Comput.Appl.Bio 92 8(2):155-165 Huang X; Miller W; Schwartz S; Hardison RC Parallelization of a Local Similarity Algorithm Subalignment; Parallel; USA; Region; Sequence comparison; Similarity; Algorithm "We describe how to parallelize the new algorithm [to determine the similar regions within two given sequences] and present results of experimental studies on an Intel hypercube. The parallel method provides rapid, high-resolution alignments for users of our software toolkit for pairwise sequence comparison ...." Comput Appl Biosci 1992 8 2 155-165 0620 Hume,A.
A Tale of Two Greps Software.Practi 88 18(11):1063-10 Hume A A Tale of Two Greps Match complex patterns; USA; Text search; Program; Boyer-Moore "Text searching programs such as the UNIX system tools grep and egrep require more than just good algorithms; they need to make efficient use of system resources such as I/O. ... I also describe incorporating the Boyer-Moore algorithm into egrep; egrep is now typically 8-10 (for some common patterns 30-40) times faster than grep." Software Practice Experience 1988 18 11 1063-1072 0621 Hume,A. Fast String Searching Software.Practi 91 21(11):1221-12 Hume A; Sunday D Fast String Searching String match; Boyer-Moore; USA; String search The Boyer-Moore algorithm "has been the standard benchmark for the practical string search literature. Yet this yardstick compares badly with current practice. We describe two algorithms that perform 47% fewer comparisons and are about 4.5 times faster across a wide range of architectures and compilers. These new variants are members of a family of algorithms based on the skip loop structure of the preferred, but often neglected, fast form of Boyer-Moore." Software Practice Experience 1991 21 11 1221-1248 0622 Hunt,J.W. A Fast Algorithm for C.. Comm.ACM 77 20(5):350-353 Hunt JW; Szymanski TG A Fast Algorithm for Computing Longest Common Subsequences Longest common; USA; Dynamic programming; Subsequence; Algorithm This algorithm is not the dynamic programming approach: the authors "suggested extracting a longest common subsequence from the two strings and producing the editing changes from the subsequence." Comm ACM 1977 20 5 350-353 0623 Ibarra,O.H. String Editing on a On.. IEEE Trans.Comp 92 41(1):112-118 Ibarra OH; Jiang T; Wang H String Editing on a One-way Linear Array of Finite-state Machines Pairwise alignment; Parallel; USA; Longest common; Editing "We give an efficient parallel algorithm for the string edit problem.
The model of computation is a one-way linear array of identical finite-state machines (nodes). ... Our algorithm can produce the actual minimum-cost edit sequence in linear time. ... We also give applications to other problems such as the longest common subsequence and approximate pattern matching." IEEE Trans Comput 1992 41 1 112-118 0624 Ibarra,O.H. String Processing on t.. IEEE Trans.Acou 90 38(1):160-164 Ibarra OH; Pong TC; Sohn SM String Processing on the Hypercube Pairwise comparison; Parallel; USA; String match; Signal; Longest common "We give parallel algorithms for solving some string comparison problems on the hypercube. These algorithms are widely applicable to the problems of speech and signal processing." Problems considered: match a keyword, longest common subsequence, string edit, minimum-length time-warping IEEE Trans Acoustics Speech Signal Processing 1990 38 1 160-164 0625 Isenman,M.E. Performance and Archit.. IEEE Trans.Comp 90 39(2):238-250 Isenman ME; Shasha DE Performance and Architectural Issues for String Matching Knuth-Morris-Pratt; USA; String match; Performance "We introduce special heuristics to the Knuth-Morris-Pratt algorithm to reduce the time and space required to perform the string matching. We compare our hardware-based approach to the software approaches embodied in the UNIX System grep and fgrep commands. ... We concentrate on hardware that can handle variable length don't cares ...." IEEE Trans Comput 1990 39 2 238-250 0626 Ishikawa,M. Multiple Sequence Alig.. Comput.Appl.Bio 93 9(3):267-273 Ishikawa M; Toya T; Hoshida M; Nitta K; Ogiwara A; Kanehisa M Multiple Sequence Alignment by Parallel Simulated Annealing Multiple alignment; JP; Sequence alignment; Parallel; Simulated annealing "We have developed simulated annealing algorithms to solve the problem of multiple sequence alignment. ... To overcome long execution times for simulated annealing, we utilized a parallel computer. ... 
The algorithm is also useful for refining multiple alignments obtained by other heuristic methods." Comput Appl Biosci 1993 9 3 267-273 0627 Itoga,S.Y. The String Merging Pro.. BIT 81 21(1):20-30 Itoga SY The String Merging Problem Multiple comparison; USA; Correction; Longest common "The string merging problem is to determine a merged string from a given set of strings. ... Necessary and sufficient conditions are presented for the case where this solution matches the solution to the string-to-string correction problem. A special case where deletion is the only allowed edition [sic] operation is shown to have the longest common subsequence of the strings as its solution." BIT 1981 21 1 20-30 0628 Ivanov,A.G. Recognition of an Appr.. Math.USSR-Izv. 85 24(3):479-522 Ivanov AG Recognition of an Approximate Occurrence of Words on a Turing Machine in Real Time Match with k mismatches; RU; Word; Recognition "A Turing machine is constructed which in real time solves the problem of the approximate identification of occurrences of words with respect to a number of familiar metrics [e.g., Hamming, Minkowski]." Math USSR-Izv 1985 24 3 479-522 0629 Jagadeeswaran Interactive Computer P.. Nucleic Acids R 82 10(1):433-447 Jagadeeswaran P; McGuire PM Jr Interactive Computer Programs in Sequence Data Analysis Pairwise comparison; Dot; USA; Program "The first group of programs named MATCH ... is designed to generate the dot matrix which provides information on the homology between two sequences and direct and inverted repeats within a sequence." Nucleic Acids Res 1982 10 1 433-447 0630 Johnson,M.S. A Method for the Simul.. J.Mol.Evol. 86 23:267-278 Johnson MS; Doolittle RF A Method for the Simultaneous Alignment of Three or More Amino Acid Sequences Multiple alignment; Segment; USA; Amino acid "The basis of the approach is a progressive evaluation of selected segments from each sequence. 
Only a small subset of all possible segments from each sequence is compared, and a minimum of information is retained for the trace-back of the alignment. As a result, this method has the advantage of being both rapid and minimally consumptive of computer memory when constructing the alignment." J Mol Evol 23 23 267-278 0631 Jones,R. Sequence Pattern Match.. Comput.Appl.Bio 92 8(4):377-383 Jones R Sequence Pattern Matching on a Massively Parallel Computer Database search; Parallel; USA; Pattern match; Gap "A method is described for finding all occurrences of a sequence pattern within a database of molecular sequences. Implementation of this on a massively parallel computer [the CM-2] allows the user to perform very fast database searches using complex patterns. In particular, the software supports approximate pattern matching with score thresholds for either the entire pattern or specified elements thereof. Matches to individual elements can be linked by variable length gaps ...." Comput Appl Biosci 1992 8 4 377-383 0632 Jones,R. Protein Sequence Compa.. Computers and.. 90Addison-Wesley Jones R; Taylor W IV; Zhang X; Mesirov JP; Lander E Protein Sequence Comparison on the Connection Machine CM-2 Bell G Marr T Computers and DNA, SFI Studies in the Sciences of Complexity, Vol. VII Subalignment; Parallel; USA; Sequence comparison; Protein "The appropriate algorithm for searching a database is that of Smith and Waterman (1981), which locates the best common subsequence between two otherwise unrelated sequences. ... Here we present our implementation of this algorithm on the data parallel Connection Machine CM-2, manufactured by Thinking Machines Corporation." Addison-Wesley Reading, MA 1990 99-107 0633 Jukes,T.H. Evolution of Protein M.. Mammalian Pro.. 
69Academic Press Jukes TH; Cantor CR Evolution of Protein Molecules Munro HN Mammalian Protein Metabolism, Volume III Sequence proximity; Substitution; USA; Approximation; Evolution; Protein "From the triplet nature of the code one can see that certain amino acid interchanges are much more likely than others in the limit of small number of base changes. Depending on the amino acids involved, it can take either 1, 2, or 3 base changes in DNA (or RNA) to convert one amino acid to another. ... The approximation one must make is to say that all single base changes are equally probable." Academic Press New York 1969 21-132 0634 Kanaoka,M. Alignment of Protein S.. Protein Eng. 89 2(5):347-351 Kanaoka M; Kishimoto F; Ueki Y; Umeyama H Alignment of Protein Sequences using the Hydrophobic Core Scores Pairwise alignment; JP; Region; Gap; Protein; Score "To improve the accuracy of [pairwise] alignments, we introduced the concept of hydrophobic core scores, which restrains putting insertions/deletions in the hydrophobic core regions of the protein. ... The introduction of the hydrophobic core scores derived from the knowledge of the tertiary structure of one of each pair resulted in an improvement of the accuracy of the alignments." Protein Eng 1989 2 5 347-351 0635 Kanehisa,M. Use of Statistical Cri.. Nucleic Acids R 84 12(1):203-213 Kanehisa M Use of Statistical Criteria for Screening Potential Homologies in Nucleic Acid Sequences Subalignment; Significance; USA; Statistical; Segment; Monte Carlo; Homology; Nucleic acid "We proposed a simple formula to assess the statistical significance of homologous segments found in comparison of two nucleic acid sequences (Goad, Kanehisa 1982). This paper clarifies the basic assumptions of the formula and its reliability is examined by Monte Carlo calculations. The results were satisfactory for random sequences." Nucleic Acids Res 1984 12 1 203-213 0636 Karlin,S. Methods for Assessing .. 
Proc.Nat.Acad.S 90 87(6):2264-226 Karlin S; Altschul SF Methods for Assessing the Statistical Significance of Molecular Sequence Features by Using General Scoring Schemes Sequence analysis; Significance; USA; Statistical; Segment; Sequence alignment; Scoring "The distribution of the maximal segment score for randomly generated single or multiple protein sequences is available under broad conditions. Such results may serve as benchmarks of statistical significance. The results also provide a means for choosing suitable scoring schemes." Proc Nat Acad Sci USA 1990 87 6 2264-2268 0637 Karlin,S. Identification of Sign.. Methods Enzymol 90 183:388-402 Karlin S; Blaisdell BE; Brendel V Identification of Significant Sequence Patterns in Proteins Sequence analysis; Significance; USA; Identification; Pattern discovery; Protein "The methods described in this chapter identify statistically significant amino acid sequence configurations of many kinds. Our objective is to identify diagnostic sequence features that might provide insights into protein function and structure and ways of protein classification. ... Our focus here is to identify statistically significant clusters, runs, and periodic patterns of charge." Methods Enzymol 183 183 388-402 0638 Karlin,S. Chance and Statistical.. Science 92 257(3 July):39 Karlin S; Brendel V Chance and Statistical Significance in Protein and DNA Sequence Analysis Sequence analysis; Significance; USA; Statistical; Sequence comparison; Scoring; Protein; DNA "Statistical approaches help in the determination of significant configurations in ... sequence data. 
Three recent statistical methods are discussed: (i) score-based sequence analysis that provides a means for characterizing anomalies in local sequence text and for evaluating sequence comparisons; (ii) quantile distributions of amino acid usage that reveal general compositional biases in proteins ...; and (iii) r-scan statistics that can be applied to the analysis of spacings of sequence markers." Science 1992 257 3 July 39-49 0639 Karlin,S. Statistical Methods an.. Annu.Rev.Biophy 91 20:175-203 Karlin S; Bucher P; Brendel V; Altschul SF Statistical Methods and Insights for Protein and DNA Sequences Sequence analysis; Significance; Review; USA; Statistical; Sequence comparison; Clustering; Protein; DNA "This article focuses on the statistics of protein sequences and the insights they can provide to structure, function, and phylogenetic relatedness." Sequence concepts and statistical significance. Sequence comparisons and searches. Evaluation of clustering in protein sequences. Comparative compositional analysis of protein sequences. Unusual spacings between sequence letters or words Annu Rev Biophys Biophys Chem 20 20 175-203 0640 Karlin,S. Statistical Compositio.. Ann.Statist. 90 18(2):571-581 Karlin S; Dembo A; Kawabata T Statistical Composition of High-scoring Segments from Molecular Sequences Sequence analysis; Significance; USA; Statistical; Segment; Probabilistic; Scoring; Composition "We present new probabilistic formulas for characterizing statistically significant sequence configurations with respect to a general scoring scheme associated with letter attributes and for enabling varying degrees in letter matches. We describe the asymptotic extremal distribution of high aggregate segment scores and the letter composition of high-scoring segments." Ann Statist 1990 18 2 571-581 0641 Karlin,S. Comparative Statistics..
Proc.Nat.Acad.S 85 82(18):6186-61 Karlin S; Ghandour G Comparative Statistics for DNA and Protein Sequences: Multiple Sequence Analysis Sequence analysis; Significance; USA; Statistical; Multiple comparison; Protein; DNA "Concepts and methods [Karlin and Ghandour, 1985] for the analysis of patterns and relationships are extended to multiple DNA and protein sequences. Functionals include multiple sequence common word occurrence distributions, characterizations of high frequency shared words, and ascertainment of long block identities." Proc Nat Acad Sci USA 1985 82 18 6186-6190 0642 Karlin,S. Comparative Statistics.. Proc.Nat.Acad.S 85 82(17):5800-58 Karlin S; Ghandour G Comparative Statistics for DNA and Protein Sequences: Single Sequence Analysis Sequence analysis; Significance; USA; Statistical; Repeat; Protein; DNA "Four categories of data representations are used to help interpret structures and similarities of nucleic acid and protein sequences. Statistical significance of the observed relationships revealed by these representations are assessed by a hierarchy of permutation procedures and by comparisons with theoretical random models." Proc Nat Acad Sci USA 1985 82 17 5800-5804 0643 Karlin,S. Multiple-alphabet Amin.. Proc.Nat.Acad.S 85 82:8597-8601 Karlin S; Ghandour G Multiple-alphabet Amino Acid Sequence Comparisons of the Immunoglobulin k-chain Constant Domain Sequence analysis; Significance; USA; Statistical; Codon; Amino acid; Sequence comparison "We compare the amino acid sequences of the constant domains of the immunoglobulin k chain of human, mouse, and rabbit by using four ... 'alphabets' of the 20 amino acids based on their chemical, functional, charge, and structural properties." Proc Nat Acad Sci USA 82 82 8597-8601 0644 Karlin,S. The Use of Multiple Al.. EMBO J.
85 4(5):1217-1223 Karlin S; Ghandour G The Use of Multiple Alphabets in Kappa-gene Immunoglobulin DNA Sequence Comparisons Sequence analysis; Significance; USA; Statistical; DNA; Sequence comparison "Comparisons within and between [three] DNA sequences are carried out in terms of three two-letter nucleotide alphabets: ... (ii) P-Q alphabet which distinguishes purines ... from pyrimidines .... The P-Q alphabet comparisons reveal an abundance of statistically significant block identities not seen at the nucleotide level." EMBO J 1985 4 5 1217-1223 0645 Karlin,S. DNA Sequence Compariso.. Mol.Biol.Evol. 85 2(1):35-52 Karlin S; Ghandour G; Foulser DE DNA Sequence Comparisons of the Human, Mouse, and Rabbit Immunoglobulin Kappa Gene Sequence analysis; Significance; USA; Sequence comparison; Statistical; Longest common; Gene; DNA "New formulas for determining the expected length and variance of the longest block identity (a succession of matching nucleotides) between multiple random sequences are given and are used to establish statistical criteria for ascertaining the significance of block identities shared in r out of s sequences." Mol Biol Evol 1985 2 1 35-52 0646 Karlin,S. New Approaches for Com.. Proc.Nat.Acad.S 83 80(18):5660-56 Karlin S; Ghandour G; Ost F; Tavare S; Korn LJ New Approaches for Computer Analysis of Nucleic Acid Sequences Multiple comparison; Common feature; USA; Significance; Dyad; Statistical; Nucleic acid "A new ... algorithm is outlined that ascertains within and between nucleic acid and protein sequences all direct repeats, dyad symmetries, and other structural relationships. Large repeats, repeats of high frequency, dyad symmetries of specified stem length and loop distance, and their distributions are determined. Significance of homologies is assessed by a hierarchy of permutation procedures." Proc Nat Acad Sci USA 1983 80 18 5660-5664 0647 Karlin,S. Algorithms for Identif.. 
Comput.Appl.Bio 88 4(1):41-51 Karlin S; Morris M; Ghandour G; Leung MY Algorithms for Identifying Local Molecular Sequence Features Multiple comparison; Common feature; USA; Dyad; Repeat; Multiple alignment; Algorithm "Efficient algorithms are described for identifying local molecular sequence features including repeats, dyad symmetry pairings and aligned matches between sequences, while allowing for errors. ... A similar algorithm for multiple sequences identifies matches 'approximately aligned' with respect to some common location. [It] is useful for refining alignment maps based on coarser global analyses ...." Comput Appl Biosci 1988 4 1 41-51 0648 Karlin,S. Efficient Algorithms f.. Proc.Nat.Acad.S 88 85:841-845 Karlin S; Morris M; Ghandour G; Leung MY Efficient Algorithms for Molecular Sequence Analysis Multiple comparison; Common feature; USA; Sequence analysis; Multiple alignment; Dyad; Repeat; Algorithm "Efficient (linear time) algorithms are described for identifying global molecular sequence features allowing for errors including repeats, matches between sequences, dyad symmetry pairings, and other sequence patterns. A multiple sequence alignment algorithm is also described." Proc Nat Acad Sci USA 85 85 841-845 0649 Karlin,S. Counts of Long Aligned.. Adv.Appl.Probab 87 19:293-351 Karlin S; Ost F Counts of Long Aligned Word Matches Among Random Letter Sequences Multiple alignment; Significance; USA; Markov; Longest common; Statistical; Word "Asymptotic distributional properties of the maximal length aligned word (a contiguous set of letters) among multiple random Markov dependent sequences composed of letters from a finite alphabet are given. ... We shall concentrate in this paper on the random variable which is the length of the longest aligned matching word ... and also called the maximal length consensus segment." Adv Appl Probab 19 19 293-351 0650 Karlin,S. Patterns in DNA and Am.. Mathematical .. 
89CRC Press Karlin S; Ost F; Blaisdell BE Patterns in DNA and Amino Acid Sequences and Their Statistical Significance Waterman MS Mathematical Methods for DNA Sequences Sequence analysis; Significance; Review; USA; Statistical; Repeat; Longest common; Pattern discovery; Amino acid; DNA "Relative to sequence composition, word relationships can be characterized with reference to spacings, proximity to natural biological sites, unusual lengths, clustering attributes, .... The specification of ... such concepts is the first main objective .... Distinguishing significant features ... is important in sequence comparisons. ... A description of useful theoretical formulas and their interpretation on molecular sequence data is the second main objective ...." CRC Press Boca Raton, FL 1989 133-157 0651 Karp,R.M. Efficient Randomized P.. IBM J.Res.Devel 87 31(2):249-260 Karp RM; Rabin MO Efficient Randomized Pattern-matching Algorithms String match; USA; Fingerprint; Multidimensional; Pattern match; Algorithm "We present randomized algorithms to solve the [matching keywords] problem and some of its generalizations. The algorithms represent strings of length n by much shorter strings called fingerprints, and achieve their efficiency by manipulating fingerprints instead of longer strings. The algorithms require a constant number of storage locations, and essentially run in real time. ... The method readily generalizes to higher-dimensional pattern-matching problems." IBM J Res Develop 1987 31 2 249-260 0652 Kashyap,R.L. An Effective Algorithm.. Inform.Sci. 81 23(2):123-142 Kashyap RL; Oommen BJ An Effective Algorithm for String Correction Using Generalized Edit Distances - I. Description of the Algorithm and its Optimality Dictionary match; Correction; USA; Edit; Distance; Algorithm "This paper deals with the problem of estimating a transmitted string X, from the corresponding received string Y, which is a noisy version of X. 
We assume that Y contains any number of substitution, insertion, and deletion errors, and that no two consecutive symbols of X were deleted in transmission." Inform Sci 1981 23 2 123-142 0653 Kashyap,R.L. The Noisy Substring Ma.. IEEE Trans.Soft 83 9(3):365-370 Kashyap RL; Oommen BJ The Noisy Substring Matching Problem Dictionary match; Correction; USA; Edit "We considered the problem of estimating the set T(U), the subset of words in the dictionary H which contains U as a substring, using only Y, a noisy version of U. The suggested set estimate S*(Y) which needs cubic time for computation has relatively high accuracy as verified by experiments." IEEE Trans Software Eng 1983 9 3 365-370 0654 Keim,P. An Examination of the .. J.Mol.Biol. 81 151:179-197 Keim P; Heinrikson RL; Fitch WM An Examination of the Expected Degree of Sequence Similarity that might Arise in Proteins that have Converged to Similar Conformational States Pairwise comparison; Significance; USA; Statistical; Structure; Similarity; Protein "The influence of structural similarity on both the genetic tests for amino acid sequence similarity and the inference of homology was examined by statistical methods." J Mol Biol 151 151 179-197 0655 Kim,J.Y. An Approximate String-.. Theoret.Comput. 92 92(1):107-117 Kim JY; Shawe-Taylor J An Approximate String-Matching Algorithm Approximate match; UK; Data structure; Search tree; N-gram; String match; Algorithm "An approximate string-matching algorithm is described based on earlier attribute-matching algorithms. The algorithm involves building a trie from the text string .... Once this data structure has been built any number of approximate searches can be made .... The ideas employed in the algorithm have been shown effective in practice before, but have not previously received any theoretical analysis." Theoret Comput Sci 1992 92 1 107-117 0656 Kleffe,J. First and Second Momen.. 
Comput.Appl.Bio 92 8(5):433-441 Kleffe J; Borodovsky M First and Second Moment of Counts of Words in Random Texts Generated by Markov Chains Sequence analysis; Significance; Markov; DE; Word "An exact expression for the variance of random frequency that a given word has in text generated by a Markov chain is presented. The result is applied to periodic Markov chains, which describe the protein-coding DNA sequences better than simple Markov chains. A new solution to the problem of word overlap is proposed." Comput Appl Biosci 1992 8 5 433-441 0657 Kleffe,J. The Joint Distribution.. Comput.Appl.Bio 93 9(3):275-283 Kleffe J; Grau E The Joint Distribution of Patterns in Random Sequences with Application to the RC-measure for Expressivity Sequence analysis; Significance; DE; Markov; Distribution "A method was previously developed for computation of pattern probabilities in random sequences under Markov chain models. We extend this method to the calculation of the joint distribution for two patterns." Comput Appl Biosci 1993 9 3 275-283 0658 Kleffe,J. Exact Computation of P.. Comput.Appl.Bio 90 6(4):347-353 Kleffe J; Langbecker U Exact Computation of Pattern Probabilities in Random Sequences Generated by Markov Chains Sequence analysis; Significance; Markov; DE; Probability "Observed patterns in macromolecular sequences are often considered as words and compared with their probabilities of occurring in random sequences. Calculation of these probabilities, however, often lacks rigour. We have developed an algorithm for exact computation of such probabilities for stochastic sequences that follow a Markov chain model." Comput Appl Biosci 1990 6 4 347-353 0659 Knuth,D.E. Fast Pattern Matching .. SIAM J.Comput. 
77 6(2):323-350 Knuth DE; Morris JH; Pratt VR Fast Pattern Matching in Strings String match; Knuth-Morris-Pratt; USA; Pattern match "An algorithm is presented which finds all occurrences of one given string within another, in running time proportional to the sum of the lengths of the strings. The constant of proportionality is low enough to make this algorithm of practical use, and the procedure can also be extended to deal with some more general pattern-matching problems." SIAM J Comput 1977 6 2 323-350 0660 Konings,D.A.M Evolution of the Prima.. Mol.Biol.Evol. 87 4(3):300-314 Konings DAM; Hogeweg P; Hesper B Evolution of the Primary and Secondary Structures of the E1a mRNAs of the Adenovirus Multiple alignment; Evolutionary tree; NL; Structure; Evolution; Secondary "Sankoff et al. (1972) were the first to use an estimated genealogical relationship among sequences to assist in aligning multiple sequences. ... A modified Sankoff et al. procedure ... (Hogeweg and Hesper 1984) ... called TRIALS, modified and improved still further, is detailed here." Mol Biol Evol 1987 4 3 300-314 0661 Krishnan,G. DNA Sequence Analysis:.. Nucleic Acids R 86 14(1):543-550 Krishnan G; Kaul RK; Jagadeeswaran P DNA Sequence Analysis: A Procedure to Find Homologies Among Many Sequences Consensus sequence; Dot; USA; Sequence analysis; Program; Homology; DNA "SEQCMP, a program that analyzes and searches for homology among multiple nucleic acid sequences, is described. The sequences are compared by the dot matrix method and the consensus sequence is derived by superimposing all the dot matrices on one another." Nucleic Acids Res 1986 14 1 543-550 0662 Kruskal,J.B. An Overview of Sequenc.. Time Warps, S.. 
83Addison-Wesley Kruskal JB An Overview of Sequence Comparison Sankoff D Kruskal JB Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison Pairwise comparison; Review; USA; Sequence comparison Introduction to basic concepts, terminology, and notation (same as Kruskal 1983). It is the first of three chapters comprising a readable, self-contained exposition on sequence comparison (Kruskal, Liberman 1983; Kruskal, Sankoff 1983) Addison-Wesley Reading, MA 1983 1-44 0663 Kruskal,J.B. An Overview of Sequenc.. SIAM Rev. 83 25(2):201-237 Kruskal JB An Overview of Sequence Comparison: Time Warps, String Edits, and Macromolecules Pairwise comparison; Review; USA; Sequence comparison; Edit "A wide variety of different applications lead to problems in which sequences of different lengths must be compared, to see how different they are, and to see which elements in one sequence correspond to which elements in the other sequences. ... This paper surveys the applications, methods, and theory of sequence comparison." Same as Kruskal (1983) SIAM Rev 1983 25 2 201-237 0664 Kruskal,J.B. The Symmetric Time-war.. Time Warps, S.. 83Addison-Wesley Kruskal JB; Liberman M The Symmetric Time-warping Problem: From Continuous to Discrete Sankoff D Kruskal JB Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison Pairwise comparison; Review; USA Time-warping refers to the comparison of trajectories, or time-labeled curves in multidimensional space, where each trajectory is subject to both alteration by additive random error and variation in speed from one point to another. It is the second of three chapters comprising a readable, self-contained exposition on sequence comparison (Kruskal 1983; Kruskal, Sankoff 1983) Addison-Wesley Reading, MA 1983 125-161 0665 Kruskal,J.B. An Anthology of Algori.. Time Warps, S.. 
83Addison-Wesley Kruskal JB; Sankoff D An Anthology of Algorithms and Concepts for Sequence Comparison Sankoff D Kruskal JB Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison Pairwise comparison; Review; USA; Sequence comparison; Dynamic programming; Algorithm This chapter is for skeptics as yet unconvinced of the utility of the dynamic programming approach to sequence comparison. It is the third of three chapters comprising a readable, self-contained exposition on sequence comparison (Kruskal 1983; Kruskal, Liberman 1983) Addison-Wesley Reading, MA 1983 265-310 0666 Kumar,S.K. A Linear Space Algorit.. Acta Inform. 87 24:353-362 Kumar SK; Rangan CP A Linear Space Algorithm for the LCS Problem Longest common; India; Complexity; Algorithm "A new linear-space algorithm to solve the LCS problem is presented. The only other algorithm with linear-space complexity is by Hirschberg and has run time complexity O(mn). Our algorithm, based on the divide and conquer technique, has run time complexity O(n(m - p)), where p is the length of the LCS." Acta Inform 24 24 353-362 0667 Lake,J.A. The Order of Sequence .. Mol.Biol.Evol. 91 8(3):378-385 Lake JA The Order of Sequence Alignment can Bias the Selection of Tree Topology Multiple alignment; Phylogeny; USA; Selection; Bias; Topology; Sequence alignment "The order in which sequences are aligned can bias tree selection. To test the effect of alignment order, the classical four-taxon test has been applied to the 'tree of life' by using alternative alignments and three reconstruction algorithms (maximum parsimony, transversion parsimony, and evolutionary parsimony). ... Specific alignment orders systematically favor alternative trees." Mol Biol Evol 1991 8 3 378-385 0668 Landau,G.M. Efficient String Match.. Theoret.Comput. 
86 43:239-249 Landau GM; Vishkin U Efficient String Matching with k Mismatches Match with k mismatches; IL; String match "Given a text of length n, a pattern of length m, and an integer k, we present an algorithm for finding all occurrences of the pattern in the text, each with at most k mismatches." Theoret Comput Sci 43 43 239-249 0669 Landau,G.M. Introducing Efficient .. ACM Sympos.Theo 86 18:220-230 Landau GM; Vishkin U Introducing Efficient Parallelism into Approximate String Matching and a New Serial Algorithm Match with k differences; Parallel; IL; String match; Approximate match; Algorithm "Given a text of length n, a pattern of length m and an integer k, we present parallel and serial algorithms for finding all occurrences of the pattern in the text with at most k differences." ACM Sympos Theory Comput 18 18 220-230 0670 Landau,G.M. Fast String Matching w.. J.Comput.System 88 37(1):63-78 Landau GM; Vishkin U Fast String Matching with k Differences Match with k differences; IL; String match "Given a text of length n, a pattern of length m, and an integer k, we present an algorithm for finding all occurrences of the pattern in the text, each with at most k differences." J Comput Systems Sci 1988 37 1 63-78 0671 Landau,G.M. Fast Parallel and Seri.. J.Algorithms 89 10:157-169 Landau GM; Vishkin U Fast Parallel and Serial Approximate String Matching Match with k differences; Parallel; IL; String match; Approximate match "Given text of length n, a pattern of length m and an integer k, we present parallel and serial algorithms for finding all occurrences of the pattern in the text with at most k differences. The parallel algorithm requires O(log m + k) time using n processors. The serial algorithm runs in O(nk) time for an alphabet whose size is fixed." J Algorithms 10 10 157-169 0672 Landau,G.M. An Efficient String Ma.. 
Nucleic Acids R 86 14(1):31-46 Landau GM; Vishkin U; Nussinov R An Efficient String Matching Algorithm with k Differences for Nucleotide and Amino Acid Sequences Match with k differences; IL; String match; Amino acid; Nucleotide; Algorithm Given a pattern of length m, a text of length n, and an integer k, "we present a simple algorithm showing that sequences can be optimally aligned in O(k^2 n) time. For long sequences the gain factor over the currently used algorithms is very large." Nucleic Acids Res 1986 14 1 31-46 0673 Landau,G.M. An Efficient String Ma.. J.Theor.Biol. 87 126(4):483-490 Landau GM; Vishkin U; Nussinov R An Efficient String Matching Algorithm with k Substitutions for Nucleotide and Amino Acid Sequences Match with k mismatches; IL; String match; Substitution; Amino acid; Nucleotide; Algorithm "Given a text of length n, a pattern of length m and an integer k, we present an algorithm for finding all occurrences of the pattern in the text, each with at most k substitutions. The algorithm runs in O(k(m log m + n)) time, and requires O(nk) space. This algorithm has direct implications for nucleotide and amino acid sequence comparisons." J Theor Biol 1987 126 4 483-490 0674 Landau,G.M. Locating Alignments wi.. Comput.Appl.Bio 88 4(1):19-24 Landau GM; Vishkin U; Nussinov R Locating Alignments with k Differences for Nucleotide and Amino Acid Sequences Match with k differences; Subalignment; IL; Pairwise alignment; Approximate match; Amino acid; Nucleotide "Given two sequences, a pattern of length m, a text of length n and a positive integer k, we give two algorithms. The first finds all occurrences of the pattern in the text as long as these do not differ from each other by more than k differences. ... The second algorithm finds all subsequence alignments between the pattern and the text with at most k differences." Comput Appl Biosci 1988 4 1 19-24 0675 Landau,G.M. Fast Alignment of DNA .. 
Methods Enzymol 90 183:487-502 Landau GM; Vishkin U; Nussinov R Fast Alignment of DNA and Protein Sequences Match with k differences; Subalignment; IL; Approximate match; Protein; DNA "Searching [a database] for some 'key subsequences' with presumably special coding or regulatory function is a basic problem often arising in the analysis of the biological significance of these data. ... The numerous possible matchings of long sequences generate, for these data, a problem of computational difficulty. ... Our algorithms suggest a more efficient organization of the matching task and the involved bookkeeping of the quality of partial matchings." Methods Enzymol 183 183 487-502 0676 Lander,E. Study of Protein Seque.. J.Supercomput. 89 3:255-269 Lander E; Mesirov JP; Taylor W IV Study of Protein Sequence Comparison Metrics on the Connection Machine CM-2 Pairwise comparison; Parallel; USA; Sequence comparison; Statistical; Scoring; Dynamic programming; Protein "Software tools have been developed to do rapid, large-scale protein sequence comparisons on databases of amino acid sequences, using a data parallel computer architecture. ... We have used this software to analyze the effectiveness of various scoring metrics in determining sequence similarity, and to generate statistical information about the behavior of these scoring systems under the variation of certain parameters." J Supercomput 3 3 255-269 0677 Landes,C. A Comparison of Severa.. Nucleic Acids R 92 20(14):3631-36 Landes C; Henaut A; Risler JL A Comparison of Several Similarity Indexes Used in the Classification of Protein Sequences: A Multivariate Analysis Sequence proximity; FR; Multivariate; FASTA; Classification; Dot; Similarity; Protein "The present work describes an attempt to identify reliable criteria which could be used as distance indices between protein sequences." Seven criteria were tested. 
"Three criteria gave a classification consistent with known similarities between the sequences in the sets, namely the Z-scores from BESTFIT and FASTA and the multiple dot plot comparison distance index from DOCMA." Nucleic Acids Res 1992 20 14 3631-3637 0678 Landes,C. Dot-plot Comparisons b.. Comput.Appl.Bio 93 9(2):191-196 Landes C; Henaut A; Risler JL Dot-plot Comparisons by Multivariate Analysis (DOCMA): A Tool for Classifying Protein Sequences Multiple comparison; Dot; FR; Multivariate; Clustering; Protein "A method aimed at classifying protein sequences without resorting to pairwise alignment is presented. Called DOCMA (DOt-plot Comparisons by Multivariate Analysis), it is based on a multivariate analysis of the pairwise dot-plots between all the sequences in the set." Comput Appl Biosci 1993 9 2 191-196 0679 Landraud,A.M. An Algorithm for Findi.. IEEE Trans.Patt 89 11(8):890-895 Landraud AM; Avril JF; Chretienne P An Algorithm for Finding a Common Structure Shared by a Family of Strings Multiple alignment; Segment; FR; Repeat; Dynamic programming; Structure; Algorithm "Our [alignment] method works in two successive stages. First, we use a fast algorithm for drawing up a directory of exactly repeated patterns appearing in a given majority of strings. In the second stage, our algorithm constructs recursively 'anchoring patterns' by a 'divide-and-conquer' strategy and converges on a maximum number of alignments." IEEE Trans Patt Anal Mach Intell 1989 11 8 890-895 0680 Lawrence,C.B. Use of Homology Domain.. Methods Enzymol 90 183:133-146 Lawrence CB Use of Homology Domains in Sequence Similarity Detection Subalignment; USA; Region; Locally optimal; Pairwise alignment; Homology; Similarity; Detection "This chapter describes an approach to identifying sequence similarities that complements other standard methods. It is in the class of methods that finds local optimal alignments. 
The main strength of this approach is that it can identify the boundaries of homologous regions between two sequences with great precision." Methods Enzymol 183 183 133-146 0681 Lawrence,C.B. Definition and Identif.. Comput.Appl.Bio 88 4(1):25-33 Lawrence CB; Goldman DA Definition and Identification of Homology Domains Subalignment; Significance; USA; Region; Identification; Probabilistic; Scoring; Homology "The notion of a 'homology domain' is employed which defines the boundaries of a region of sequence homology containing no insertions or deletions. The relative significance of different potential homology domains is evaluated using a non-linear similarity score related to the probability of finding the observed level of similarity in the region by chance." Comput Appl Biosci 1988 4 1 25-33 0682 Lawrence,C.B. Optimized homology sea.. Bull.Math.Biol. 86 48(5/6):569-58 Lawrence CB; Goldman DA; Hood RT Optimized homology searches of the gene and protein sequence data banks Database search; USA; Homology; Gene; Protein "A strategy is presented for searching the gene and protein sequence data banks which combines the use of two previously described algorithms [Altschul, Erickson (1986); Lipman, Pearson (1985); Wilbur, Lipman (1983)]. The implementation of this strategy is thoroughly evaluated with respect to sensitivity, specificity and speed." Bull Math Biol 1986 48 5/6 569-583 0683 Lawrence,C.E. Maximum Likelihood Est.. J.Theor.Biol. 85 113:425-439 Lawrence CE; Reilly AA Maximum Likelihood Estimation of Subsequence Conservation Sequence proximity; USA; Likelihood; Statistical; Markov; Subsequence; Estimation "A statistical method is presented for comparing protein sequences by partitioning the polymers and estimating each subsegment's degree of conservation. Conservation is measured as a function of the number of transitions occurring in the underlying time homogeneous Markov process assumed to govern amino acid mutations. ... 
Partitioning and estimation are carried out via maximum likelihood. The method is contrasted with the ... percent homology measure." J Theor Biol 113 113 425-439 0684 Lawrence,C.E. An Expectation Maximiz.. Proteins Struct 90 7:41-51 Lawrence CE; Reilly AA An Expectation Maximization (EM) Algorithm for the Identification and Characterization of Common Sites in Unaligned Biopolymer Sequences Consensus sequence; Information theory; USA; Identification; Likelihood; Characterization; Expectation; Maximization; Algorithm "Statistical methodology for the identification and characterization of protein binding sites in a set of unaligned DNA fragments is presented. ... No alignment of the sites is required. Instead, the uncertainty in the location of the sites is handled by employing the missing information principle to develop an 'expectation maximization' (EM) algorithm." Proteins Struct Funct Genet 7 7 41-51 0685 Lecroq,T. A Variation on the Boy.. Theoret.Comput. 92 92:119-144 Lecroq T A Variation on the Boyer-Moore Algorithm String match; Boyer-Moore; FR; Algorithm "A new approach can possess the ability for a given position in the text to compute the length of the longest prefix of the word which ends at that position. When we know this length, we are able to compute a better shift than the Boyer-Moore approach. ... This leads to a linear-time algorithm which scans the text characters at most three times each." Theoret Comput Sci 92 92 119-144 0686 Lee,K.C. Design and Analysis of.. Lecture Notes i 89 368:215-229 Lee KC; Mak VW Design and Analysis of a Parallel VLSI String Search Algorithm String match; Parallel; USA; Pattern match; VLSI; String search; Algorithm In Boral, H., Faudemay, P. (Eds.), Database Machines. Proceedings of the Sixth International Workshop, IWDM '89, Deauville, France, 19-21 June 1989. "In this paper, we propose a parallel VLSI string search algorithm called the Data Parallel Pattern Matching (DPPM) algorithm." 
Lecture Notes in Comput Sci 368 368 215-229 0687 Lefevre,C. Pattern Recognition in.. Comput.Appl.Bio 93 9(3):349-354 Lefevre C; Ikeda JE Pattern Recognition in DNA Sequences and its Application to Consensus Foot-printing Multiple comparison; Common feature; JP; Pattern recognition; Motif; Significance; Repeat; DNA; Recognition "We consider the problem of comparing several nucleic acid sequences to identify words occurring imperfectly (patterns with no gap) with unusual frequency. Methods for computing, representing, and inspecting interactively the structure of such repeating motifs in nucleic acids and more generally any text are described." Comput Appl Biosci 1993 9 3 349-354 0688 Lefevre,C. The Position End-Set T.. Comput.Appl.Bio 93 9(3):343-348 Lefevre C; Ikeda JE The Position End-Set Tree: A Small Automaton for Word Recognition in Biological Sequences String match; JP; Search tree; Regularities; Automata; Word; Recognition "When one is expecting to do many substring searches it is worthwhile to build an auxiliary index to the sequence to aid in the search. We propose a method to generate a compact index that can be viewed as a small (partial) deterministic finite automaton recognizing the subword structure of a sequence." Comput Appl Biosci 1993 9 3 343-348 0689 Lesk,A.M. Homology Modelling: In.. Curr.Opin.Struc 92 2:242-247 Lesk AM; Boswell DR Homology Modelling: Inferences from Tables of Aligned Sequences Multiple alignment; Structure; NZ; Homology "The relationship between an individual amino acid sequence and its associated protein structure is deterministic, but has proved to be too subtle to understand in detail from studying the relationships between single amino acid sequences and structures. The patterns that appear in tables of aligned homologous sequences contain much more information, and study of them has led to success in several applications." Curr Opin Struct Biol 2 2 242-247 0690 Lesk,A.M. Alignment of the Amino.. Protein Eng. 
86 1(1):77-78 Lesk AM; Levitt M; Chothia C Alignment of the Amino Acid Sequences of Distantly Related Proteins using Variable Gap Penalties Sequence alignment; UK; Region; Gap; Needleman-Wunsch; Structure; Protein; Amino acid "Because of the importance of the stability of the structures of the packing of helix-helix interfaces, insertions and deletions are not observed to occur in the interiors of helical regions of proteins. ... It is possible to apply this insight ... to the alignment of distantly related sequences by a modification of the Needleman-Wunsch procedure." Protein Eng 1986 1 1 77-78 0691 Leung,M.Y. An Efficient Algorithm.. J.Mol.Biol. 91 221(4):1367-13 Leung MY; Blaisdell BE; Burge C; Karlin S An Efficient Algorithm for Identifying Matches With Errors in Multiple Long Molecular Sequences Multiple comparison; Common feature; USA; Error; Repeat; Algorithm "An efficient algorithm is described for finding matches, repeats and other word relations, allowing for errors, in large data sets of long molecular sequences. The algorithm entails hashing on fixed-size words in conjunction with the use of a linked list connecting all occurrences of the same word." J Mol Biol 1991 221 4 1367-1378 0692 Levenshtein,V Binary Codes Capable o.. Soviet Phys.Dok 66 10(8):707-710 Levenshtein VI Binary Codes Capable of Correcting Deletions, Insertions, and Reversals Sequence proximity; Correction; RU; Edit; Reversal; Deletion "Consider a function r(x, y) defined on pairs of binary words and equal to the smallest number of deletions and insertions that transform the word x into y. It is not difficult to show that the function r(x, y) is a metric .... It can be shown that the function r(x, y) defined on pairs of binary words as equal to the smallest number of deletions, insertions, and reversals that will transform x into y is a metric ...." Soviet Phys Dokl 1966 10 8 707-710 0693 Li,M. String-Matching Cannot.. Inform.Process. 
86 22(5):231-236 Li M; Yesha Y String-Matching Cannot be Done by a Two-head One-way Deterministic Finite Automaton String match; Complexity; USA; Automata "String-matching cannot be performed by a two-head one-way deterministic finite automaton (or even by a Turing machine with two one-way input heads and o(n) storage space)." Inform Process Lett 1986 22 5 231-236 0694 Lipman,D.J. A Tool for Multiple Se.. Proc.Nat.Acad.S 89 86:4412-4415 Lipman DJ; Altschul SF; Kececioglu JD A Tool for Multiple Sequence Alignment Multiple alignment; USA; Sequence alignment; Dynamic programming "We describe the design and application of a tool for multiple alignment of amino acid sequences that implements a new algorithm that greatly reduces the computational demands of dynamic programming. This tool is able to align in reasonable time as many as eight sequences the length of an average protein." Proc Nat Acad Sci USA 86 86 4412-4415 0695 Lipman,D.J. Comparative Analysis o.. Nucleic Acids R 82 10(8):2723-273 Lipman DJ; Maizel J Comparative Analysis of Nucleotide Acid Sequences by their General Constraints Sequence analysis; Significance; USA; Information theory; Nucleotide "We describe two measures of a nucleic acid sequence, derived from Information Theory, which characterize the constraints toward nonuniform base composition, and the constraints on the ordering of the bases." Nucleic Acids Res 1982 10 8 2723-2739 0696 Lipman,D.J. Rapid and Sensitive Pr.. Science 85 227(22 March): Lipman DJ; Pearson WR Rapid and Sensitive Protein Similarity Searches Database search; USA; Similarity; Protein "We have developed an algorithm, used in the computer program FASTP... In this article, we discuss the basis of the algorithm and its application to two proteins evolutionarily related to other sequences in the database. In addition, we show an example of a search which presented puzzling results and discuss criteria for evaluating such results." 
Science 1985 227 22 March 1435-1441 0697 Lipman,D.J. On the Statistical Sig.. Nucleic Acids R 84 12(1):215-226 Lipman DJ; Wilbur WJ; Smith TF; Waterman MS On the Statistical Significance of Nucleic Acid Similarities Subalignment; Significance; USA; Statistical; Similarity; Nucleic acid "The known statistical properties of nucleic acid sequences strongly affect the statistical distribution of similarity values when calculated by standard procedures. We propose a series of models which account for some of these known statistical properties. The utility of the method is demonstrated in evaluating high relative similarity scores in four specific cases in which there is little biological context by which to judge the similarities." Nucleic Acids Res 1984 12 1 215-226 0698 Lopresti,D.P. P-NAC: A Systolic Arra.. Computer 87 20(7):98-99 Lopresti DP P-NAC: A Systolic Array for Comparing Nucleic Acid Sequences Pairwise alignment; Parallel; USA; Dynamic programming; Nucleic acid "The Princeton Nucleic Acid Comparator (P-NAC) is a linear systolic array for comparing DNA sequences. The architecture is a parallel realization of a standard dynamic programming algorithm" (Wagner, Fischer 1974) Computer 1987 20 7 98-99 0699 Lowrance,R. An Extension of the St.. J.Assoc.Comput. 75 22(2):177-183 Lowrance R; Wagner RA An Extension of the String-to-string Correction Problem Pairwise alignment; Correction; USA; Edit The string-to-string correction problem of Wagner and Fischer (1974) used the edit operations of insertion, deletion, and mutation. "This paper extends the set of allowable edit operations to include the operation of interchanging the positions of two adjacent characters." J Assoc Comput Mach 1975 22 2 177-183 0700 Lu,S.Y. A Sentence-to-Sentence.. 
IEEE Trans.Syst 78 8(5):381-389 Lu SY; Fu KS A Sentence-to-Sentence Clustering Procedure for Pattern Analysis Sequence proximity; USA; Clustering; Probabilistic; Edit "The similarity between patterns is expressed in terms of the distance between their corresponding sentences. A weighted distance between two strings is defined and its probabilistic interpretation given. ... The following algorithm, which is an extension of Wagner and Fisher's algorithm, computes the weighted distance [in which insertions, deletions, and substitutions each have distinct weights] between two strings." IEEE Trans Systems Man Cybernet 1978 8 5 381-389 0701 Luthy,R. Secondary Structure-ba.. Proteins Struct 91 10:229-239 Luthy R; McLachlan AD; Eisenberg D Secondary Structure-based Profiles: Use of Structure-Conserving Scoring Tables in Searching Protein Sequence Databases for Structural Similarities Match a pattern matrix; USA; Sequence database; Sequence comparison; Structure; Profile; Scoring; Similarity; Protein; Secondary "The profile method, for detecting distantly related proteins by sequence comparison, has been extended to incorporate secondary structure information from known X-ray structures. ... As in the standard profile method [Gribskov, McLachlan, Eisenberg 1987], a position-dependent scoring table, termed a profile, is calculated from the aligned sequences." Proteins Struct Funct Genet 10 10 229-239 0702 Lyon,G. Syntax-Directed Least-.. Comm.ACM 74 17(1):3-14 Lyon G Syntax-Directed Least-Errors Analysis for Context-Free Languages: A Practical Approach Sequence recognition; Correction; Language; USA; Dynamic programming "A least-errors recognizer is developed informally using the well-known recognizer of Earley, along with elements of Bellman's dynamic programming. The analyzer takes a general class of context-free grammars as drivers, and any finite string as input. 
Recognition consists of a least-errors count for a corrected version of the input relative to the driver grammar." Comm ACM 1974 17 1 3-14 0703 Maes,M. On a Cyclic String-to-.. Inform.Process. 90 35(2):73-78 Maes M On a Cyclic String-to-string Correction Problem Pairwise alignment; Correction; NL; Longest common; Edit "This leads to the notion of a cyclic string, and in this paper we present an O(nm log m) algorithm to solve the string-to-string correction problem for cyclic strings." Inform Process Lett 1990 35 2 73-78 0704 Maier,D. The Complexity of Some.. J.Assoc.Comput. 78 25(2):322-336 Maier D The Complexity of Some Problems on Subsequences and Supersequences Longest common; Supersequence; Complexity; USA; Subsequence The problem of calculating a longest common subsequence in N sequences is NP-complete J Assoc Comput Mach 1978 25 2 322-336 0705 Maizel,J.V.,J Enhanced Graphic Matri.. Proc.Nat.Acad.S 81 78(12):7665-76 Maizel JV Jr; Lenk RP Enhanced Graphic Matrix Analysis of Nucleic Acid and Protein Sequences Pairwise comparison; Dot; DE; Regularities; Repeat; Palindrome; Protein; Nucleic acid; Graphic; Matrix The method "analyzes nucleic acid and amino acid sequences for features of possible biological interest and reveals the spatial patterns of such features. When a sequence is compared to itself the technique shows regions of self-complementarity, direct repeats, and palindromic subsequences. Comparison of two ... sequences ... showed domains of similarity, regions of divergence, and features explainable by transpositions." Proc Nat Acad Sci USA 1981 78 12 7665-7669 0706 Manber,U. An Algorithm for Strin.. Inform.Process. 
91 37:133-136 Manber U; Baeza-Yates R An Algorithm for String Matching with a Sequence of Don't Cares Match with don't cares; USA; String match; Text search; Don't care; Algorithm "We present an algorithm to search for a pattern containing a sequence of don't care symbols in a preprocessed text. This problem models proximity searching in text searching systems and special searching problems in biological sequences." Inform Process Lett 37 37 133-136 0707 Marck,C. Fast Analysis of DNA a.. Nucleic Acids R 86 14(1):583-590 Marck C Fast Analysis of DNA and Protein Sequence on Apple IIe: Restriction Sites Search, Alignment of Short Sequence and Dot Matrix Analysis Match with k differences; FR; Dot; Restriction; Gap; Protein; DNA; Matrix "The search for a short sequence (< 36 bases) within a longer one (up to 9999 bases) with a given number of mismatches or gaps allowed has also been written in assembly language." The algorithm is a simplification of one by Fickett (1984) Nucleic Acids Res 1986 14 1 583-590 0708 Martinez,H.M. An Efficient Method fo.. Nucleic Acids R 83 11(13):4629-46 Martinez HM An Efficient Method for Finding Repeats in Molecular Sequences Multiple alignment; Common feature; Regularities; USA; Complexity; Repeat; Dyad "The problem of finding repeats in molecular sequences is approached as a sorting problem. It leads to a method which is linear in space complexity and N log N in expected time complexity. ... Of particular interest is that several sequences can be treated as a single sequence. This leads to an efficient method ... for finding common features of many sequences, such as favorable alignments." Nucleic Acids Res 1983 11 13 4629-4634 0709 Martinez,H.M. A Flexible Multiple Se.. 
Nucleic Acids R 88 16(5):1683-169 Martinez HM A Flexible Multiple Sequence Alignment Program Multiple alignment; Clustering; USA; Sequence alignment; Region; Program "The 'regions' method for multisequence alignment used in the previously reported program MALIGN [Sobel, Martinez 1986] has been generalized to include recursive refinement so that unaligned portions between two regions at the current level of resolution can be handled with increased resolution. ... GENALIGN uses this improved regions method to execute fast pairwise alignments in the framework of Taylor's multisequence alignment procedure using clustered pairwise alignments." Nucleic Acids Res 1988 16 5 1683-1691 0710 Masek,W.J. A Faster Algorithm Com.. J.Comput.System 80 20(1):18-31 Masek WJ; Paterson MS A Faster Algorithm Computing String Edit Distances Pairwise alignment; USA; Edit; Longest common; Distance; Algorithm "The operations we admit are deleting, inserting and replacing one symbol at a time, with possibly different costs for each of these operations. ... We describe an algorithm for computing the edit distance between two strings of length n" which requires O(n2/log n) "steps whenever the costs of edit operations are integral multiples of a single positive real number and the alphabet for the strings is finite." J Comput Systems Sci 1980 20 1 18-31 0711 Masek,W.J. How to Compute String-.. Time Warps, S.. 83Addison-Wesley Masek WJ; Paterson MS How to Compute String-edit Distances Quickly Sankoff D Kruskal JB Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison Pairwise alignment; USA; Edit; Longest common; Distance "We present an algorithm with an asymptotically faster execution time, for example O(n2/log n) when both strings are of length n, providing that the alphabet for the strings is finite and all edit costs are integral multiples of some real number r." Addison-Wesley Reading, MA 1983 337-349 0712 McCaldon,P. Oligopeptide Biases in.. 
Proteins Struct 88 4:99-122 McCaldon P; Argos P Oligopeptide Biases in Protein Sequences and Their Use in Predicting Protein Coding Regions in Nucleotide Sequences String match; Significance; DE; Region; Coding; Frame; Protein; Nucleotide "We have examined oligopeptides with lengths ranging from 2 to 11 residues in protein sequences that show no obvious evolutionary relationship. ... The results, contrary to previous studies, show clear prejudices in protein sequences. The oligopeptide preferences were used to help decide the significance of sequence homologies ...." Proteins Struct Funct Genet 1988 4 99-122 0713 McCreight,E.M A Space-Economical Suf.. J.Assoc.Comput. 76 23(2):262-272 McCreight EM A Space-Economical Suffix Tree Construction Algorithm String match; Search tree; USA; Pattern match; Suffix; Algorithm "A new algorithm is presented for constructing auxiliary digital search trees to aid in exact-match substring searching. This algorithm has the same asymptotic running time bound as previously published algorithms, but is more economical in space. ... New work on ... (the update problem) is presented." J Assoc Comput Mach 1976 23 2 262-272 0714 McLachlan,A.D Tests for Comparing Re.. J.Mol.Biol. 71 61:409-424 McLachlan AD Tests for Comparing Related Amino Acid Sequences. Cytochrome c and Cytochrome c551 Pairwise comparison; Dot; UK; Segment; Statistical; Repeat; Amino acid "An improved method for testing similarities or repeats in protein sequences ... includes three features: a measure of similarity for amino acids, based on observed substitutions in homologous proteins; a search procedure which compares all pairs of segments of two proteins; new statistical tests which estimate the probabilities that observed correlations could have occurred by chance." J Mol Biol 1971 61 409-424 0715 McLachlan,A.D Analysis of Gene Dupli.. J.Mol.Biol.
83 169(1):15-30 McLachlan AD Analysis of Gene Duplication Repeats in the Myosin Rod Pairwise comparison; Dot; UK; Statistical; Significance; Repeat; Duplication; Gene "For the analysis of the myosin repeats we have needed to develop improved statistical methods which make accurate tests of significance under a range of assumptions about the nature of the sequence. These methods are outlined below. ... We begin below with a brief resume of the comparison matrix method and then describe the newer developments." J Mol Biol 1983 169 1 15-30 0716 McLachlan,A.D Confidence Limits for .. J.Mol.Biol. 85 185:39-49 McLachlan AD; Boswell DR Confidence Limits for Homology in Protein or Gene Sequences. The c-myc Oncogene and Adenovirus E1a Protein Pairwise alignment; Significance; UK; Statistical; Homology; Confidence; Protein; Gene "We describe new tests, of general application, for deciding whether two proteins or DNA sequences are significantly homologous, in cases where the relationship is neither evidently true nor evidently false." J Mol Biol 1985 185 39-49 0717 Mehldau,G. A System for Pattern M.. Comput.Appl.Bio 93 9(3):299-314 Mehldau G; Myers G A System for Pattern Matching Applications on Biosequences Match complex patterns; USA; Pattern match; Approximate match; Program; Pattern definition "ANREP is a system for finding matches to patterns .... ANREP provides a unified framework for almost all previously proposed biosequence patterns and extends them by providing approximate matching, a feature heretofore unavailable except for the limited case of individual sequences." Comput Appl Biosci 1993 9 3 299-314 0718 Mengeritsky,G Recognition of Charact..
Comput.Appl.Bio 87 3(3):223-227 Mengeritsky G; Smith TF Recognition of Characteristic Patterns in Sets of Functionally Equivalent DNA Sequences Consensus sequence; Statistical; USA; Pattern discovery; DNA; Recognition "An algorithm has been developed for the identification of unknown patterns which are distinctive for a set of short DNA sequences believed to be functionally equivalent. A pattern is defined as being a string, containing fully or partially specified nucleotides at each position of the string." Comput Appl Biosci 1987 3 3 223-227 0719 Miller,P.L. Parallel Computation a.. Comput.Appl.Bio 91 7(1):71-78 Miller PL; Nadkarni PM; Pearson WR Parallel Computation and FASTA: Confronting the Problem of Parallel Database Search for a Fast Sequence Comparison Algorithm Database search; Parallel; USA; FASTA; Sequence comparison; Algorithm "We have parallelized the FASTA algorithm for biological sequence comparison using Linda, a machine-independent parallel programming language. The resulting parallel program runs on a variety of different parallel machines." Comput Appl Biosci 1991 7 1 71-78 0720 Miller,P.L. Comparing Machine-inde.. Comput.Appl.Bio 92 8(2):167-175 Miller PL; Nadkarni PM; Pearson WR Comparing Machine-independent versus Machine-specific Parallelization of a Software Platform for Biological Sequence Comparison Database search; Parallel; USA; Sequence comparison "A platform program that performs biological sequence comparison provides a case study to compare the relative advantages of a machine-independent approach to parallel computation versus a machine-specific approach. ... In the benchmark tests reported, the benefits of the machine-independent approach were achieved with only a modest sacrifice in efficiency." Comput Appl Biosci 1992 8 2 167-175 0721 Miller,W. Building Multiple Alig.. 
Comput.Appl.Bio 93 9(2):169-176 Miller W Building Multiple Alignments from Pairwise Alignments Multiple alignment; USA; Pairwise alignment; Dot "Given a family of related sequences, one can first determine alignments between various pairs of those sequences, then construct a simultaneous alignment of all the sequences that is determined in a natural manner by the set of pairwise alignments. ... This paper presents an efficient algorithm for constructing a multiple alignment from a set of pairwise alignments." It makes five assumptions and is based on dot plots Comput Appl Biosci 1993 9 2 169-176 0722 Miller,W. A File Comparison Prog.. Software.Practi 85 15(11):1025-10 Miller W; Myers EW A File Comparison Program Longest common; Sequence proximity; USA; Program; Edit "This paper presents a simple method for computing a shortest sequence of insertion and deletion commands that converts one given file to another. The method is particularly efficient when the difference between the two files is small compared to the files' lengths. In experiments performed on typical files, the program often ran four times faster than the UNIX diff command." Software Practice Experience 1985 15 11 1025-1040 0723 Miller,W. Sequence Comparison wi.. Bull.Math.Biol. 88 50(2):97-120 Miller W; Myers EW Sequence Comparison with Concave Weighting Functions Pairwise alignment; USA; Sequence comparison; Gap; Function "We consider efficient methods for computing a difference metric between two sequences of symbols, where the cost of an operation to insert or delete a block of symbols is a concave function of the block's length. Alternatively, sequences can be optimally aligned when gap penalties are a concave function of the gap length. Two algorithms [based on Waterman 1984] are presented." Bull Math Biol 1988 50 2 97-120 0724 Mirkin,B. Consensus Functions an.. Bull.Math.Biol. 
93 55(4):695-713 Mirkin B; Roberts FS Consensus Functions and Patterns in Molecular Sequences Consensus sequence; Neighbourhood; RU; Function "We study a method of consensus originally due to Waterman et al. (1984) which is used to identify patterns or features in a molecular sequence where a pattern can be moved within a given 'window.' We show that some well-known consensus methods of the social sciences, the median and the mean, are special cases of this method .... The specific parameters used [by] Waterman et al. make their method equivalent to the median procedure ...." Bull Math Biol 1993 55 4 695-713 0725 Mironov,A.A. Statistical Method for.. Nucleic Acids R 88 16(11):5169-51 Mironov AA; Alexandrov NN Statistical Method for Rapid Homology Search Database search; RU; Statistical; Fragment; Significance; Homology "Sequences to be compared are divided into fragments with length N, where N is the minimal expected homology size. For each of the fragment pairs the distance r is calculated. If r occurs smaller then r0 - the cutoff value, [these] sequences can contain homologous fragments. ... This way of comparison may be used for preliminary selection of sequence pairs which are thought to be homologous and must be completed by the following construction of the optimal alignment." Nucleic Acids Res 1988 16 11 5169-5173 0726 Miyazawa,S. A New Substitution Mat.. Protein Eng. 93 6(3):267-278 Miyazawa S; Jernigan RL A New Substitution Matrix for Protein Sequence Searches Based on Contact Frequencies in Protein Structures Sequence proximity; Substitution; JP; Sequence search; Scoring; Structure; Protein; Matrix "In global and local homology searches, [our] scoring matrix tends to yield significantly higher alignment scores than either the unitary matrix or the genetic code matrix, and also may yield higher alignment scores for distantly related protein pairs than MDM78." Protein Eng 1993 6 3 267-278 0727 Mohana Rao,J. New Scoring Matrix for.. 
Internat.J.Pept 87 29:276-281 Mohana Rao JK New Scoring Matrix for Amino Acid Residue Exchanges Based on Residue Characteristic Physical Parameters Sequence proximity; Substitution; USA; Scoring; Amino acid; Matrix; Residue; Physical "When comparing protein sequences for detecting homologies, the use of [our new EMPAR scoring] matrix in place of the Dayhoff log-odds matrix yields results that reflect the topological similarities in the proteins. The use of EMPAR is equivalent to the parametric correlation coefficient approach of Ooi and his colleagues." Internat J Pept Protein Res 1987 29 276-281 0728 Moore,G.W. Alignment Statistic fo.. J.Mol.Evol. 77 9:121-130 Moore GW; Goodman M Alignment Statistic for Identifying Related Protein Sequences Pairwise alignment; Significance; USA; Statistical; Codon; Protein "Closely related proteins show an obvious kinship by having numerous matching amino acids in their aligned sequences. Kinship between anciently separated proteins requires a statistical evaluation to rule out fortuitous similarities. A simple statistic is developed which assumes equal probability for all codon pairs ...." J Mol Evol 1977 9 121-130 0729 Mott,R.F. Maximum-Likelihood Est.. Bull.Math.Biol. 92 54(1):59-75 Mott RF Maximum-Likelihood Estimation of the Statistical Distribution of Smith-Waterman Local Sequence Similarity Scores Subalignment; Significance; Likelihood; UK; Statistical; Database search; Distribution; Similarity; Estimation; Score "A method is described for estimating the distribution and hence testing the statistical significance of sequence similarity scores obtained during a data-bank search. Maximum-likelihood is used to fit a model to the scores, avoiding any costly simulation of random sequences. The method is applied in detail to the Smith-Waterman algorithm when gaps are allowed ...." Bull Math Biol 1992 54 1 59-75 0730 Mott,R.F. STATSEARCH: A GCG-comp..
Comput.Appl.Bio 90 6(3):293-295 Mott RF; Kirkwood TBL STATSEARCH: A GCG-compatible Program for Assessing Statistical Significance during DNA and Protein Databank Searches Sequence analysis; Significance; UK; Statistical; Program; Database search; Protein; DNA; Databank "We describe a program STATSEARCH which implements the method of [Mott, Kirkwood, Curnow (1989)] for searching DNA and protein sequence databanks for statistically significant similarities to a given query sequence." Comput Appl Biosci 1990 6 3 293-295 0731 Mott,R.F. A Test for the Statist.. Comput.Appl.Bio 89 5(2):123-131 Mott RF; Kirkwood TBL; Curnow RN A Test for the Statistical Significance of DNA Sequence Similarities for Application in Databank Searches Database search; Significance; UK; Statistical; Dyad; Similarity; DNA; Databank "A method is developed, based on word-searching, which provides a rapid test for the statistical significance of DNA sequence similarities for use in databank searching. The method makes allowance for the lengths and dinucleotide compositions of the sequences being compared." Comput Appl Biosci 1989 5 2 123-131 0732 Mott,R.F. An Accurate Approximat.. Bull.Math.Biol. 90 52(6):773-784 Mott RF; Kirkwood TBL; Curnow RN An Accurate Approximation to the Distribution of the Length of the Longest Matching Word between Two Random DNA Sequences Longest common; Significance; UK; Distribution; Statistical; Approximation; DNA; Word The derivation uses "only elementary probability arguments. The distribution is shown to be consistent with previous asymptotic results for the mean and variance of longest common words. The application of the distribution to assessing the statistical significance of sequence similarities is considered." Bull Math Biol 1990 52 6 773-784 0733 Mott,R.F. Tests for the Statisti.. Protein Eng. 
90 4(2):149-154 Mott RF; Kirkwood TBL; Curnow RN Tests for the Statistical Significance of Protein Sequence Similarities in Databank Searches Database search; Significance; UK; Statistical; Similarity; Protein; Databank "A suite of tests to evaluate the statistical significance of protein sequence similarities is developed for use in data bank searches. The tests are based on the Wilbur-Lipman word-search algorithm, and take into account the sequence lengths and compositions, and optionally the weighting of amino acid matches." Protein Eng 1990 4 2 149-154 0734 Mrazek,J. GLOBIC: A Very Fast Mi.. Comput.Appl.Bio 92 8(1):29-34 Mrazek J; Kypr J GLOBIC: A Very Fast Microcomputer Program for Fingerprinting, Characterization and Comparison of Long Nucleotide Sequences Pairwise comparison; CZ; Program; Fingerprint; N-gram; Nucleotide; Characterization "Instead of the nucleotide sequences themselves, GLOBIC compares the local nucleotide or short oligonucleotide compositions. GLOBIC presents two- dimensional maps of contour lines depicting the similarity of two different sequences, a sequence compared to itself, to its complementary sequence or to a random sequence." Comput Appl Biosci 1992 8 1 29-34 0735 Mrazek,J. UNIREP: A Microcompute.. Comput.Appl.Bio 93 9(3):355-360 Mrazek J; Kypr J UNIREP: A Microcomputer Program to Find Unique and Repetitive Nucleotide Sequences in Genomes Sequence analysis; CZ; Genome; Program; Repetition; Nucleotide The program "identifies repetitive and unique nucleotide sequences in genomes or parts of genomes. A key feature of the algorithm is an oligonucleotide representation in a numerical code to make possible a comparison of all pairs of oligonucleotides (including overlaps) occurring in the analyzed sequence." Comput Appl Biosci 1993 9 3 355-360 0736 Mukherjee,A. Hardware Algorithms fo.. 
IEEE Trans.Comp 89 38(4):600-603 Mukherjee A Hardware Algorithms for Determining Similarity Between Two Strings Longest common; USA; Complexity; Hardware; Pattern match; VLSI; Similarity; Algorithm "This paper presents pipelined hardware algorithms, with time complexity O(n + m), for determining similarity between two character strings expressed as the length of the longest common subsequence of the given pair of strings. ... Two methods are presented: a sequential method with serial text input and an alternating method in which both the pattern and the text are serially applied to the machine." IEEE Trans Comput 1989 38 4 600-603 0737 Mukhopadhyay, A Fast Algorithm for t.. Inform.Sci. 80 20:69-82 Mukhopadhyay A A Fast Algorithm for the Longest-Common-Subsequence Problem Longest common; USA; Complexity; Algorithm "A fast algorithm for the [LCS] problem is presented which runs in O((p + n) log n) time, where p is the total number of pairs of matched positions between the strings. Thus, the average performance of this algorithm is much better than those of the quadratic algorithms proposed earlier and takes only a linear amount of space." Inform Sci 1980 20 69-82 0738 Murata,M. Three-way Needleman-Wu.. Methods Enzymol 90 183:365-375 Murata M Three-way Needleman-Wunsch Algorithm Multiple alignment; AU; Needleman-Wunsch; Sequence alignment; Dynamic programming; Algorithm Murata, Richardson, Sussman (1985) extended "the method of Needleman and Wunsch so that three sequences could be compared simultaneously. ... Modifications to the program for the comparison of longer sequences are described in this chapter." Methods Enzymol 1990 183 365-375 0739 Murata,M. Simultaneous Compariso..
Proc.Nat.Acad.S 85 82:3073-3077 Murata M; Richardson JS; Sussman JL Simultaneous Comparison of Three Protein Sequences Multiple alignment; AU; Needleman-Wunsch; Sequence alignment; Dynamic programming; Protein "Here we present an algorithm for the simultaneous comparison of three biological sequences. The [dynamic programming] algorithm is an extension of the method developed by S. B. Needleman and C. D. Wunsch ...." Murata (1990) describes extensions Proc Nat Acad Sci USA 1985 82 3073-3077 0740 Myers,E.W. An O(ND) Difference Al.. Algorithmica 86 1:251-266 Myers EW An O(ND) Difference Algorithm and its Variations Pairwise alignment; USA; Longest common; Edit; Algorithm "The problems of finding a longest common subsequence of two sequences A and B and a shortest edit script for transforming A into B have long been known to be dual problems. In this paper, they are shown to be equivalent to finding a shortest/longest path in an edit graph." The algorithm was discovered independently by Ukkonen (1983, 1985) Algorithmica 1986 1 251-266 0741 Myers,E.W. Optimal Alignments in .. Comput.Appl.Bio 88 4(1):11-17 Myers EW; Miller W Optimal Alignments in Linear Space Pairwise alignment; USA; Sequence alignment; Gap; Optimal; Program "Space, not time, is often the limiting factor when computing optimal sequence alignments, and a number of recent papers in the biology literature have proposed space-saving strategies. However, [Hirschberg 1975] presented a method that is superior to the new proposals, both in theory and in practice. The goal of this paper is to give Hirschberg's idea the visibility it deserves by developing a linear-space version of Gotoh's algorithm, which accommodates affine gap penalties." Comput Appl Biosci 1988 4 1 11-17 0742 Myers,E.W. Approximate Matching o.. Bull.Math.Biol.
89 51(1):5-37 Myers EW; Miller W Approximate Matching of Regular Expressions Dictionary match; USA; Language; Approximate match; Expression; Sequence match "Given a sequence A and regular expression R, the approximate regular expression matching problem is to find a sequence matching R whose optimal alignment with A is the highest scoring of all such sequences. This paper develops an algorithm to solve the problem in time O(MN) .... Our method is superior to an earlier algorithm by Wagner and Seiferas in several ways." Bull Math Biol 1989 51 1 5-37 0743 Myers,E.W. Row Replacement Algori.. ACM Trans.Progr 89 11(1):33-56 Myers EW; Miller W Row Replacement Algorithms for Screen Editors Pairwise alignment; USA; Sequence comparison; Dynamic programming; Editor; Edit; Algorithm "Interactive screen editors repeatedly determine terminal command sequences to update a screen row. Computing an optimal command sequence differs from the traditional sequence comparison problem in that there is a cost for moving the cursor over unedited characters and the cost of an n-character command is not always the cost of n one-character commands." A dynamic programming algorithm is presented ACM Trans Programming Languages Systems 1989 11 1 33-56 0744 Myers,E.W. Computer Program for t.. Nucleic Acids R 86 14(1):501-508 Myers EW; Mount DW Computer Program for the IBM Personal Computer which Searches for Approximate Matches to Short Oligonucleotide Sequences in Long Target DNA Sequences Match with k differences; USA; Program; Approximate match; DNA "We describe a program which may be used to find approximate matches to a short predefined DNA sequence in a larger target DNA sequence." The algorithm is a refinement of one by Sellers (1980) Nucleic Acids Res 1986 14 1 501-508 0745 Nakatsu,N. A Longest Common Subse.. Acta Inform. 
82 18:171-179 Nakatsu N; Kambayashi Y; Yajima S A Longest Common Subsequence Algorithm Suitable for Similar Text Strings Longest common; JP; Subsequence; Retrieval; Algorithm Let m and n be lengths of two strings, m <= n, which have a longest common subsequence of length p. "In this paper, O(n(m - p)) algorithm is presented. When p is close to m (in other words, two given strings are similar), the algorithm presented here runs much faster than previously known algorithms." Acta Inform 1982 18 171-179 0746 Nedde,D.N. Visualizing Relationsh.. Comput.Appl.Bio 93 9(3):331-335 Nedde DN; Ward MO Visualizing Relationships between Nucleic Acid Sequences using Correlation Images Pairwise comparison; Dot; USA; Sequence comparison; Program; Display; Correlation "This paper describes a portable software package implementing a variation of the dot-matrix plot for genetic sequence comparison in conjunction with highly interactive image manipulation and examination techniques." Comput Appl Biosci 1993 9 3 331-335 0747 Needleman,S.B A General Method Appli.. J.Mol.Biol. 70 48(3):443-453 Needleman SB; Wunsch CD A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins Pairwise alignment; Dynamic programming; USA; Needleman-Wunsch; Edit; Similarity; Amino acid; Protein Dynamic programming is used with a similarity criterion to find an optimal alignment of two sequences. This paper is a standard reference for the comparison of two molecular sequences J Mol Biol 1970 48 3 443-453 0748 Niefind,K. Amino Acid Similarity .. J.Mol.Biol. 91 219(3):481-497 Niefind K; Schomburg D Amino Acid Similarity Coefficients for Protein Modeling and Sequence Alignment Derived from Main-chain Folding Angles Sequence proximity; Substitution; DE; Scoring; Sequence alignment; Similarity; Amino acid; Protein; Folding "A set of 'similarity-parameters' was calculated that reflects the influence of the proteinogenic amino acids on the structure of the protein backbone.
... [These parameters] should form a scoring matrix in protein sequence alignment superior to identity scoring. The usability of the 'structure derived correlation matrix (SCM)' for these purposes is assessed and demonstrated for some examples ...." J Mol Biol 1991 219 3 481-497 0749 Nielsen,P.T. On the Expected Durati.. IEEE Trans.Info 73 19:702-704 Nielsen PT On the Expected Duration of a Search for a Fixed Pattern in Random Data String match; DK; Regularities; Complexity "An expression is obtained for the expected duration of a search to find a given L-ary sequence in a semi-infinite stream of random L-ary data. The search time is found to be an increasing function of the lengths of the 'bifices' of the pattern, where the term bifix denotes a sequence which is both a prefix and a suffix." L is the cardinality of the alphabet IEEE Trans Inform Theory 1973 19 702-704 0750 Ninio,J. String Analysis and En.. J.Mol.Biol. 89 207:585-596 Ninio J; Mizraji E String Analysis and Energy Minimization in the Partition of DNA Sequences Sequence analysis; Significance; FR; Motif; Signal; DNA; Energy "While the recognition of particular signals in sequences relies on complex physical interactions, the problem is often analysed in terms of the presence or absence of literal motifs (strings) in the sequence. We present here a test-case for evaluating the potential of this approach." J Mol Biol 1989 207 585-596 0751 Nussinov,R. An Efficient Code Sear.. J.Theor.Biol. 83 100:319-328 Nussinov R An Efficient Code Searching for Sequence Homology and DNA Duplication Pairwise alignment; IL; Structure; Homology; Duplication; DNA "This paper presents a very simple and efficient algorithm that searches for sequence homology and gene duplication. The code finds the best alignment of two, short or long, sequences without having to specify how many unmatched bases are allowed to be looped out. ... The code runs in O(n3/2) units of time. ...
The present method is modeled after the planar folding algorithm ... which has been introduced to secondary structure of RNA .... In general, any good secondary structure algorithm can be converted to yield an algorithm for sequence alignment." J Theor Biol 1983 100 319-328 0752 O'Hara,P.J. PRIMEGEN, a Tool for D.. Comput.Appl.Bio 91 7(4):533-534 O'Hara PJ; Venezia D PRIMEGEN, a Tool for Designing Primers from Multiple Alignments Multiple alignment; Program; USA; Sequence alignment; Region "PRIMEGEN (for primer generator) can evaluate a multiple protein sequence alignment both for degree of conservation and for the oligonucleotide degeneracy necessary to encode every amino acid sequence with a given region of the alignment." Comput Appl Biosci 1991 7 4 533-534 0753 O'Neill,M.C. Consensus Methods for .. J.Mol.Biol. 89 207(2):301-310 O'Neill MC Consensus Methods for Finding and Ranking DNA Binding Sites. Application to Escherichia coli Promoters Match a pattern matrix; USA; Consensus method; DNA; Binding "There have been many different approaches employed to define the 'consensus' sequence of various DNA binding sites and to use the definition obtained to locate and rank members of a given sequence family. The analysis presented here enlists two of these approaches, each in modified form, to develop a highly efficient search protocol for Escherichia coli promoters ...." J Mol Biol 1989 207 2 301-310 0754 Okuda,T. A Method for the Corre.. IEEE Trans.Comp 76 25:172-177 Okuda T; Tanaka E; Kasai T A Method for the Correction of Garbled Words based on the Levenshtein Metric Sequence proximity; Correction; JP; Edit; Error; Word "In this paper we propose a new method for correcting garbled words based on Levenshtein distance and weighted Levenshtein distance [in which insertions, deletions, and substitutions each have distinct weights]." IEEE Trans Comput 1976 25 172-177 0755 Oommen,B.J. Constrained String Edi.. Inform.Sci.
87 40:267-284 Oommen BJ Constrained String Editing Pairwise alignment; CA; Editing "In this paper we consider the problem of transforming X to Y using any arbitrary edit constraint involving the number and type of edit operations to be performed. An algorithm is presented to compute the minimum distance associated with editing X to Y subject to the specified constraint. ... The technique to compute the optimal transformation is also presented." Inform Sci 1987 40 267-284 0756 Oommen,B.J. Recognition of Noisy S.. IEEE Trans.Patt 87 9(5):676-685 Oommen BJ Recognition of Noisy Subsequences Using Constrained Edit Distances String match; Correction; CA; Edit; Error; Subsequence; Distance; Recognition "Let X* be any unknown word from a finite dictionary H. Let U be any arbitrary subsequence of X*. We consider the problem of estimating X* by processing Y, which is a noisy version of U." IEEE Trans Patt Anal Mach Intell 1987 9 5 676-685 0757 Owens,J. A Fixed-point Alignmen.. Comput.Appl.Bio 88 4(1):73-77 Owens J; Chatterjee D; Nussinov R; Konopka AK; Maizel JV Jr A Fixed-point Alignment Technique for Detection of Recurrent and Common Sequence Motifs Associated with Biological Features Multiple comparison; Common feature; USA; Motif; Detection "A fixed-point alignment analysis technique is presented which is designed to locate common sequence motifs in collections of proteins or nucleic acids. Initially a program aligns a collection of sequences by a common sequence pattern or known biological feature. ... Once all alignment markers are located, the sequences are scanned for occurrences of given oligomers within a specified span both upstream and downstream of the fixed-point." Comput Appl Biosci 1988 4 1 73-77 0758 Owolabi,O. Fast Approximate Strin.. Software.Practi 88 18(4):387-393 Owolabi O; McGregor DR Fast Approximate String Matching Database search; N-gram; UK; String match; Approximate match Approximate string matching to entries in a stored dictionary.
"The first [stage] uses a very compact n-gram table to preselect sets of roughly similar strings. The second stage compares these with the input string using an accurate method to give an accurately matched set of strings. ... The resulting method is both computationally fast and storage-efficient." Software Practice Experience 1988 18 4 387-393 0759 Panjukov,V.V. Finding Steady Alignme.. Comput.Appl.Bio 93 9(3):285-290 Panjukov VV Finding Steady Alignments: Similarity and Distance Pairwise alignment; RU; Gap; Similarity; Distance "Certain alignments keep the optimum despite the weight parameters varying over a range of values. Alignments of this kind are called steady. A method finding all the steady optimal alignments of two sequences is presented providing that a gap penalty is directly proportional to gap length." Comput Appl Biosci 1993 9 3 285-290 0760 Parry-Smith,D SOMAP: a Novel Interac.. Comput.Appl.Bio 91 7(2):233-235 Parry-Smith DJ; Attwood TK SOMAP: a Novel Interactive Approach to Multiple Protein Sequences Alignment Multiple alignment; UK; Display; Sequence analysis; Protein "The approach used is essentially one of manual sequence manipulation, aided by built-in symbolic displays of identities and similarities, and strict and 'fuzzy' (ambiguous) pattern-matching facilities. Additional flexibility is provided by means of an interface to a publicly available automatic alignment system and to a comprehensive sequence analysis package." Comput Appl Biosci 1991 7 2 233-235 0761 Patthy,L. Detecting Homology of .. J.Mol.Biol. 87 198:567-577 Patthy L Detecting Homology of Distantly Related Proteins with Consensus Sequences Multiple alignment; Consensus sequence; HU; Clustering; Homology; Protein The multiple alignment algorithm iterates between generating a multiple alignment from a consensus sequence (by pairwise comparisons) and constructing a consensus sequence from aligned sequences. 
An initial grouping of similar sequences suggests that the method might be embedded in a SAHN clustering structure J Mol Biol 1987 198 567-577 0762 Pearson,W.R. Rapid and Sensitive Se.. Methods Enzymol 90 183:63-98 Pearson WR Rapid and Sensitive Sequence Comparison with FASTP and FASTA Database search; USA; FASTA "In this chapter, I show an example of a simple FASTA library search, describe the FASTA algorithm, and then discuss in detail a more problematic search, namely, one for members of the G-protein-coupled receptor family. Additional information about how to customize the scoring parameters and output from the FASTA programs is included in the appendices." Methods Enzymol 1990 183 63-98 0763 Pearson,W.R. Searching Protein Sequ.. Genomics 91 11(3):635-650 Pearson WR Searching Protein Sequence Libraries: Comparison of the Sensitivity and Selectivity of the Smith-Waterman and FASTA Algorithms Database search; USA; FASTA; Sequence comparison; Protein; Algorithm "Rapid sequence comparison algorithms such as FASTP (Lipman, Pearson, 1985) and FASTA (Pearson, Lipman, 1988; Pearson, 1990) have dramatically decreased the amount of time required to compare a newly determined protein sequence to a protein or DNA sequence database." "Several strategies for improving the sensitivity of FASTA were examined." Genomics 1991 11 3 635-650 0764 Pearson,W.R. Improved Tools for Bio.. Proc.Nat.Acad.S 88 85:2444-2448 Pearson WR; Lipman DJ Improved Tools for Biological Sequence Comparison Database search; USA; Sequence comparison; FASTA; Significance; Composition; Display; Region "The FASTA program is a more sensitive derivative of the FASTP program, which can be used to search protein or DNA sequence data bases .... The RFD2 program can be used to evaluate the significance of similarity scores using a shuffling method that preserves local sequence composition.
The LFASTA program can display all the regions of local similarity between two sequences with scores greater than a threshold ...." Proc Nat Acad Sci USA 85 85 2444-2448 0765 Pearson,W.R. Dynamic Programming Al.. Methods Enzymol 92 210:575-601 Pearson WR; Miller W Dynamic Programming Algorithms for Biological Sequence Comparison Pairwise comparison; Dynamic programming; Review; Sequence comparison; Profile; USA; Dynamic; Algorithm In Brand,L., and Johnson,M.L. (Eds.), Numerical Computer Methods. "We discuss several dynamic programming algorithms that have been applied to biological sequence comparison problems. ... We present efficient dynamic programming algorithms for calculating global and local similarity scores, and for comparing a sequence 'profile' or pattern to a sequence." Methods Enzymol 210 210 575-601 0766 Peltola,H. Algorithms for the Sea.. Nucleic Acids R 86 14(1):99-107 Peltola H; Soderlund H; Ukkonen E Algorithms for the Search of Amino Acid Patterns in Nucleic Acid Sequences Pairwise alignment; FI; Region; Dynamic programming; Sequence comparison; Codon; Amino acid; Nucleic acid; Algorithm "Some algorithms are described for the search of regions in a nucleic acid sequence that, when translated into amino acids, are homologous to a given amino acid pattern. All algorithms are modifications of the dynamic programming method for sequence comparison such that the translation of codons is taken into account." Nucleic Acids Res 1986 14 1 99-107 0767 Pesole,G. WORDUP: An Efficient A.. Nucleic Acids R 92 20(11):2871-28 Pesole G; Prunella N; Liuni S; Attimonelli M; Saccone C WORDUP: An Efficient Algorithm for Discovering Statistically Significant Patterns in DNA Sequences Sequence analysis; Significance; Statistical; Markov; Motif; Italy; DNA; Algorithm "We present here a fast and sensitive method designed to isolate short nucleotide sequences which have non-random statistical properties and may thus be biologically active. 
It is based on a first order Markov analysis and allows us to detect statistically significant sequence motifs from six to ten nucleotides long which are significantly shared (or avoided) in the sequences under investigation." Nucleic Acids Res 1992 20 11 2871-2875 0768 Petersen,S.B. Training Neural Networ.. Trends Biotechn 90 8(11):304-308 Petersen SB; Bohr H; Bohr J; Brunak S; Cotterill RMJ; Fredholm H; Lautrup B Training Neural Networks to Analyse Biological Sequences Sequence analysis; Neural; DK; Structure; Network Sequence homology measured by neural networks. Secondary structure prediction. Prediction of β-turns in proteins. Prediction of three-dimensional protein backbone conformation. Using neural networks on nucleic acid sequences Trends Biotechnol 1990 8 11 304-308 0769 Pevzner,P.A. Multiple Alignment, Co.. SIAM J.Appl.Mat 92 52(6):1763-177 Pevzner PA Multiple Alignment, Communication Cost, and Graph Matching Multiple alignment; Complexity; Approximation; USA; Graph "Although many algorithms for suboptimal alignment have been suggested, no 'performance guarantees' algorithms have been known until recently. A computationally efficient approximation multiple alignment algorithm with guaranteed error bounds equal to the normalized communication cost of a corresponding graph is given in this paper." See also Gusfield (1993) SIAM J Appl Math 1992 52 6 1763-1779 0770 Pevzner,P.A. Nucleotide Sequences v.. Computers Chem. 92 16(2):103-106 Pevzner PA Nucleotide Sequences versus Markov Models Sequence analysis; Significance; Markov; USA; Nucleotide; Model "There exist several peculiarities of nucleotide sequences that preclude their description by existing models and thus allow one to distinguish DNA sequences from random A, T, G, C-texts. ... This paper reviews some approaches to locate anomalous words and establishes links between the recent results on walking Markov models with strand symmetry ... and non-stationary words in DNA sequences ...."
Computers Chem 1992 16 2 103-106 0771 Pevzner,P.A. Statistical Distance B.. Comput.Appl.Bio 92 8(2):121-127 Pevzner PA Statistical Distance Between Texts and Filtration Methods in Sequence Comparison Database search; USA; Statistical; Sequence comparison; Complexity; Dynamic programming; Distance "Upon searching local similarities in long sequences, the necessity of a 'rapid' similarity search becomes acute. Quadratic complexity of dynamic programming algorithms forces the employment of filtration methods that allow elimination of the sequences with a low similarity level. The paper is devoted to the theoretical substantiations of the filtration method based on the statistical distance between texts." Comput Appl Biosci 1992 8 2 121-127 0772 Pevzner,P.A. Linguistics of Nucleot.. J.Biomol.Struct 89 6(5):1013-1026 Pevzner PA; Borodovsky MY; Mironov AA Linguistics of Nucleotide Sequences I: The Significance of Deviations from Mean Statistical Characteristics and Prediction of the Frequencies of Occurrence of Words Sequence analysis; Significance; Linguistic; RU; Statistical; Sequence prediction; Prediction; Nucleotide; Word "We propose a formula for the variance of the number of word's occurrences in the text, with allowance for word overlaps, making it possible to assess the significance of the deviations from the expected statistical characteristics. ... [Also a] new method for predicting the frequencies of occurrence of particular words ...." J Biomol Struct & Dyn 1989 6 5 1013-1026 0773 Pevzner,P.A. Linguistics of Nucleot.. J.Biomol.Struct 89 6(5):1027-1038 Pevzner PA; Borodovsky MY; Mironov AA Linguistics of Nucleotide Sequences II: Stationary Words in Genetic Texts and the Zonal Structure of DNA Sequence analysis; Significance; Linguistic; RU; Distributed; Genetic; Structure; DNA; Nucleotide; Word "Words are irregularly distributed in genetic texts. The analysis of this irregularity leads to the notion of stationary and non-stationary words. ... 
The distribution of stationary words suggests a method for partitioning DNA into zones." J Biomol Struct & Dyn 1989 6 5 1027-1038 0774 Pietrokovski, Linguistic Measure of .. J.Biomol.Struct 90 7(6):1251-1268 Pietrokovski S; Hirshon J; Trifonov EN Linguistic Measure of Taxonomic and Functional Relatedness of Nucleotide Sequences Sequence proximity; Linguistic; IL; Nucleotide "A single value, the linguistic similarity between the sequences, is suggested as a measure of sequence relatedness. ... The similarity value is shown to be very sensitive to the relatedness of the source species, thus providing a convenient tool for taxonomic classification of species by their sequence vocabularies. ... This can be a basis for a quick screening technique for functional characterization of the sequences ...." J Biomol Struct & Dyn 1990 7 6 1251-1268 0775 Pirklbauer,K. A Study of Pattern-Mat.. Structured Prog 92 13:89-98 Pirklbauer K A Study of Pattern-Matching Algorithms String match; Review; Austria; Algorithm "This paper does not present a new pattern-matching algorithm but offers a survey of well-known algorithms and compares their run-time behavior. ... The algorithms are compared by measuring the behavior of typical examples." Structured Programming 1992 13 89-98 0776 Pizzi,E. A Simple Method for Gl.. Nucleic Acids R 92 20(1):131-136 Pizzi E; Attimonelli M; Liuni S; Frontali C; Saccone C A Simple Method for Global Sequence Comparison Sequence proximity; Sequence comparison; Italy "We investigated the possibility of using a global approach to efficiently pre-screen a large database, in order to rapidly identify those sequences which are related to a given one." "A simple method of sequence comparison, based on a correlation analysis of oligonucleotide frequency distributions, is here shown to be a reliable test of overall sequence similarity." Nucleic Acids Res 1992 20 1 131-136 0777 Posfai,J. Predictive Motifs Deri..
Nucleic Acids R 89 17(7):2421-243 Posfai J; Bhagwat AS; Posfai G; Roberts RJ Predictive Motifs Derived from Cytosine Methyltransferases Multiple alignment; Segment; USA; Motif "To produce a global alignment of the set of similar sequences we developed a new procedure. ... In brief, information from both the amino acid and the nucleic acid sequences is used to produce the alignment. The program attempts to reproduce the method of alignment by eye, by directly locating globally conserved sequence features." Nucleic Acids Res 1989 17 7 2421-2435 0778 Pramanik,S. A Hardware Pattern Mat.. Comput.J. 85 28(3):264-269 Pramanik S; King CT A Hardware Pattern Matching Algorithm on a Dataflow Match complex patterns; Hardware; USA; Pattern match; Parallel; Algorithm "A hardware pattern matcher is presented, which searches for patterns on a data flow, such as characters read from a disk. The backing up on the data flow, for a general pattern matching, is avoided by means of a set of cells running in parallel. Each cell can search for a pattern independently, but requires only one one-character comparator." Comput J 1985 28 3 264-269 0779 Pustell,J. A High Speed, High Cap.. Nucleic Acids R 82 10(15):4765-47 Pustell J; Kafatos FC A High Speed, High Capacity Homology Matrix: Zooming through SV40 and Polyoma Pairwise comparison; Dot; USA; Compression; Homology; Matrix "We present a new homology matrix program which owes its basic conception to the two-dimensional dot matrices previously described ... but has important improvements and new features." It has a noise-filtration system, capacity for compression without much loss of information, and execution speed Nucleic Acids Res 1982 10 15 4765-4782 0780 Queen,C. Improvements to a Prog.. 
Nucleic Acids R 82 10(1):449-456 Queen C; Wegman MN; Korn LJ Improvements to a Program for DNA Analysis: A Procedure to Find Homologies Among Many Sequences Consensus sequence; Neighbourhood; USA; Program; Homology; DNA "Partial homologies among a set of sequences are not readily detected by a computer program that compares sequences two at a time, because each pair of sequences contains many homologies not shared by the others. We have therefore added to our program a procedure for detecting multi-sequence homologies, based on an algorithm that analyzes all the sequences simultaneously." Nucleic Acids Res 1982 10 1 449-456 0781 Rabani,Y. On the Space Complexit.. Theoret.Comput. 92 95:231-244 Rabani Y; Galil Z On the Space Complexity of Some Algorithms for Sequence Comparison Pairwise alignment; Complexity; IL; Sequence comparison; Gap; Algorithm "Recent algorithms for computing the modified edit distance given convex or concave gap cost functions are shown to require Ω(n^2) space for certain input." Theoret Comput Sci 1992 95 231-244 0782 Raiha,L. Approximate Sequence C.. Pattern Recogni 90 12(1/2):159-16 Raiha L Approximate Sequence Comparison: A Study with Histograms Pairwise alignment; FI; Sequence comparison "We have succeeded in generalizing the algorithm [Ukkonen (1985)] for non-negative cost functions. ... The alphabet of the sequences may be infinite. ... The cost function to weigh the editing operations must have at least two of the metric properties: non-negative values and the triangle inequality. ... The algorithm needs linear space." Pattern Recognition 1990 12 1/2 159-169 0783 Rechid,R. A New Interactive Prot..
Comput.Appl.Bio 89 5(2):107-113 Rechid R; Vingron M; Argos P A New Interactive Protein Sequence Alignment Program and Comparison of its Results with Widely Used Algorithms Pairwise alignment; DE; Sequence comparison; Display; Program; Protein; Sequence alignment; Algorithm "A computer program that allows interactive sequence comparison is described. It graphically displays a search matrix using residue physicochemical characteristics and multilength segmental comparisons." Comput Appl Biosci 1989 5 2 107-113 0784 Regnier,M. Knuth-Morris-Pratt Alg.. Lecture Notes i 89 379:431-444 Regnier M Knuth-Morris-Pratt Algorithm: An Analysis String match; Knuth-Morris-Pratt; FR; Algorithm In Kreczmar, A., Mirkowska, G. (Eds.), Mathematical Foundations of Computer Science 1989, MFCS '89, Porabka-Kozubnik, Poland, 28 August - 1 September 1989. "This paper deals with an average analysis of the Knuth-Morris-Pratt algorithm. ... An algebraic scheme is used, based on combinatorics on words and generating functions." Lecture Notes in Comput Sci 1989 379 431-444 0785 Reich,J.G. On the Statistical Ass.. Nucleic Acids R 84 12(13):5529-55 Reich JG; Drabsch H; Daumler A On the Statistical Assessment of Similarities in DNA Sequences Pairwise alignment; Significance; DE; Statistical; Gap; Similarity; DNA "The statistical behavior of the similarity score for unrelated DNA sequences calculated as letter-by-letter comparison or from various forms of optimal alignment was studied. ... This makes it possible to adopt a simple criterion for the rejection of fortuitous similarity. It is based on the mean and standard deviation of chance scores whose expected values, depending on chain length, gap penalty and probability of letter coincidence, may be calculated ...." Nucleic Acids Res 1984 12 13 5529-5543 0786 Reich,J.G. A Simple Statistical S..
Comput.Appl.Bio 87 3(1):25-30 Reich JG; Meiske W A Simple Statistical Significance Test of Window Scores in Large Dot Matrices Obtained from Protein or Nucleic Acid Sequences Pairwise comparison; Dot; DE; Statistical; Significance; Protein; Nucleic acid; Score "A test of the statistical significance of dot constellations as detected by window search in large dot matrices is described. The procedure takes the correlation between overlapping windows on the diagonals of a dot matrices into account. It is based on a confidence limit of the exact distribution of dot scores." Comput Appl Biosci 1987 3 1 25-30 0787 Reichert,T.A. An Application of Info.. J.Theor.Biol. 73 42(2):245-261 Reichert TA; Cohen DN; Wong AKC An Application of Information Theory to Genetic Mutations and the Matching of Polypeptide Sequences Sequence proximity; Information theory; USA; Genetic "An information-based methodology for determining the quality of an alignment of two code sequences is presented. ... In application, one needs to obtain estimates of the distribution of (a) the spacing between mutations, (b) the frequency of the four mutation operations, and (c) the inserted character frequencies and deletion lengths." J Theor Biol 1973 42 2 245-261 0788 Reizer,J. Possible Problems with.. Trends Biochem. 92 17(2):60-60 Reizer J; Saier MH; Reizer A Possible Problems with the Protein Sequence Comparison Program FASTA Database search; FASTA; USA; Sequence comparison; Program; Protein "A prerequisite to [computer aided comparisons of protein sequences] is an effective screening routine that reliably searches protein libraries and selects sequences for evaluation. The widely used FASTA program provides a rapid library search and sequence comparison algorithm for this purpose. Some problems that we have encountered while using FASTA should be brought to the attention of other users." Trends Biochem Sci 1992 17 2 60-60 0789 Rinsma,I. Distribution of the Nu.. Bull.Math.Biol. 
90 52(3):349-358 Rinsma I; Hendy M; Penny D Distribution of the Number of Matches between Nucleotide Sequences Pairwise alignment; Significance; NZ; Distribution; Nucleotide "A method is given for calculating the probability of observing m matches from two overlapping random sequences. [It is a] useful first step in evaluating the reliability of evolutionary trees .... It could also be used to determine how much better an optimal alignment is than expected by chance." Bull Math Biol 1990 52 3 349-358 0790 Risler,J.L. Amino Acid Substitutio.. J.Mol.Biol. 88 204:1019-1029 Risler JL; Delorme MO; Delacroix H; Henaut A Amino Acid Substitutions in Structurally Related Proteins. A Pattern Recognition Approach. Determination of a New and Efficient Scoring Matrix Sequence proximity; Substitution; FR; Pattern recognition; Scoring; Amino acid; Protein; Recognition; Matrix "Amino acid substitutions in evolutionarily related proteins have been studied from a structural point of view. ... The matrix of distances between amino acids, or scoring matrix, determined from this study is different from any other published matrix. ... [It] seems to be very efficient for aligning distantly related proteins." J Mol Biol 204 204 1019-1029 0791 Rivest,R.L. On the Worst-case Beha.. SIAM J.Comput. 77 6(4):669-674 Rivest RL On the Worst-case Behaviour of String-Searching Algorithms String match; Complexity; USA; Pattern match; Algorithm "Any algorithm for finding a pattern of length k in a string of length n must examine at least n - k + 1 of the characters of the string in the worst case. ... We prove that this is the best possible result. Therefore there do not exist pattern matching algorithms whose worst-case behavior is 'sublinear' in n (that is, linear with constant less than one), in contrast with the situation for average behavior ...." SIAM J Comput 1977 6 4 669-674 0792 Roberts,L. New Chip may Speed Gen.. 
Science 89 244(12 May):65 Roberts L New Chip may Speed Genome Analysis Match complex patterns; USA; Genome; Pattern match; Parallel "What they have come up with ... is 'a hardware solution to what is normally handled by investigators as a software problem' ... - a relatively inexpensive parallel processing system that can scan up to 10 million characters a second. ... What accounts for the speed of this system is that the instructions for pattern matching are hardwired into the processors." Science 1989 244 12 May 655-656 0793 Robson,B. Natural Sequence Code .. Comput.Appl.Bio 92 8(3):283-289 Robson B; Greaney PJ Natural Sequence Code Representations for Compression and Rapid Searching of Human-genome Style Databases Database search; UK; Compression; Representation "Numeric descriptions ('bio-informatic descriptions') of amino acid residues have been developed which will be of value whenever the quality and quantity of information in very large ... gene and protein sequences is to be compared or manipulated. ... Preliminary studies on both a supercomputer and smaller machines suggest a 'worst-case' speeding of [approximately] 4.5-fold." Comput Appl Biosci 1992 8 3 283-289 0794 Rohde,K. A Fast, Sensitive Patt.. Comput.Appl.Bio 93 9(2):183-189 Rohde K; Bork P A Fast, Sensitive Pattern-matching Approach for Protein Sequences Match a pattern matrix; DE; Dynamic programming; Protein "We present a fast, sensitive pattern-matching algorithm that describes a pattern by its physico-chemical properties rather than by occurrence of amino acids, using a fast, dynamic programming algorithm. ... This method leads to a better description of the pattern, as it is not simply a group of similar sequence positions but rather a structure with special properties and functions." Comput Appl Biosci 1993 9 2 183-189 0795 Roytberg,M.A. A Search for Common Pa.. 
Comput.Appl.Bio 92 8(1):57-64 Roytberg MA A Search for Common Patterns in Many Sequences Multiple comparison; Common feature; RU "A new approach to search for common patterns in many sequences is presented. The idea is that one sequence from the set of sequences to be compared is considered as a 'basic' one and all its similarities with other sequences are found. Multiple similarities are then reconstructed using these data." Comput Appl Biosci 1992 8 1 57-64 0796 Russell,R.B. Multiple Protein Seque.. Proteins Struct 92 14:309-323 Russell RB; Barton GJ Multiple Protein Sequence Alignment from Tertiary Structure Comparison: Assignment of Global and Residue Confidence Levels Multiple alignment; Structure; UK; Sequence alignment; Confidence; Protein; Residue "An algorithm is presented for the accurate and rapid generation of multiple protein sequence alignments from tertiary structure comparisons. ... In order to reduce the need for visual verification, two similarity indices are introduced to determine the quality of each generated structural alignment." Proteins Struct Funct Genet 1992 14 309-323 0797 Rytter,W. A Correct Preprocessin.. SIAM J.Comput. 80 9(3):509-512 Rytter W A Correct Preprocessing Algorithm for Boyer-Moore String-Searching String match; Boyer-Moore; MEX; Correction; Knuth-Morris-Pratt; Algorithm "We present the correction to Knuth's algorithm [Knuth, Morris, Pratt (1977)] for computing the table of pattern shifts later used in the Boyer-Moore algorithm [(1977)] for pattern matching." SIAM J Comput 1980 9 3 509-512 0798 Earley,J. An Efficient Context-f.. Comm.ACM 70 13(2):94-102 Earley J An Efficient Context-free Parsing Algorithm Parsing; Language; USA; Algorithm "A parsing algorithm which seems to be the most efficient general context-free algorithm known is described. It is similar to both Knuth's LR(k) algorithm and the familiar top-down algorithm. 
It has a time bound proportional to n^3 (where n is the length of the string being parsed) in general; it has an n^2 bound for unambiguous grammars; and it runs in linear time on a large class of grammars, which seems to include most practical context-free programming language grammars." Comm ACM 1970 13 2 94-102 0799 Younger,D.H. Recognition and Parsin.. Inform.Control 67 10(2):189-208 Younger DH Recognition and Parsing of Context-Free Languages in Time n^3. Sequence recognition; Language; Automata; USA; Parsing; Recognition "A recognition algorithm is exhibited whereby an arbitrary string over a given vocabulary can be tested for containment in a given context-free language. A special merit of this algorithm is that it is completed in a number of steps proportional to the cube of the number of symbols in the tested string. ... The recognition algorithm is then simulated on a Turing Machine. It is shown that this simulation likewise requires a number of steps proportional to only the cube of the test string length." Inform Control (Orlando) 1967 10 2 189-208 0800 Claverie,J.M. A New Protein Sequence.. Nature (Lond.) 85 318(7 Nov.):19 Claverie JM; Sauvaget I A New Protein Sequence Data Bank Sequence database; FR; Sequence analysis; Protein "A protein sequence data bank, called PGtrans, is available from our laboratory. This data bank is generated by automatic computer translation of the well-known nucleotide sequence library GenBank .... The main purpose of PGtrans is to offer a direct access to all amino-acid sequences coded among GenBank nucleotide sequences and thus to be an efficient tool for protein homology searches. Its format is compatible with the rest of our sequence analysis software and will be consistent with the recommendations of the CODATA task group on protein sequence data banks." Nature (Lond ) 1985 318 7 Nov. 19-19 0801 Chin,F. Performance Analysis o..
Algorithmica 94 12(4/5):293-31 Chin F; Poon CK Performance Analysis of Some Simple Heuristics for Computing Longest Common Subsequences Longest common; Subsequence; Performance; Heuristic; HK "Although the Longest Common Subsequence (LCS) Problem has been studied by many researchers for years, heuristic methods have not been investigated before. In this paper we present a simple heuristic which guarantees to return a common subsequence of length at least 1/s that of the longest where s is the number of different symbols in the input strings. Furthermore, we generalize the idea to several classes of heuristic algorithms. Surprisingly, we find that no other heuristic in these classes outperforms this simple algorithm." Algorithmica 1994 12 4/5 293-311 0802 Gribskov,M. The Codon Preference P.. Nucleic Acids R 84 12(1):539-549 Gribskov M; Devereux J; Burgess RR The Codon Preference Plot: Graphic Analysis of Protein Coding Sequences and Prediction of Gene Expression Sequence analysis; Codon; Expression; USA; Coding; Frame; Protein; Gene; Prediction; Graphic "The codon preference plot is useful for locating genes in sequenced DNA, predicting the relative level of their expression and for detecting DNA sequencing errors resulting in the insertion or deletion of bases within a coding sequence. The three possible reading frames are displayed in parallel along with the open reading frames and plots of the location of rare codons in each reading frame." Nucleic Acids Res 1984 12 1 539-549 0803 Orcutt,B.C. Searching the Protein .. Bull.Math.Biol. 84 46(4):545-552 Orcutt BC; Barker WC Searching the Protein Sequence Database Database search; Review; USA; Dynamic programming; Approximate match; Protein; Sequence database "As the volume of protein sequence data grows, rapid methods for searching the protein sequence database become of primary importance. Rigorous comparison of sequences is obtained with the well-known dynamic programming algorithms. 
However these algorithms are not rapid enough to use for routinely searching the entire database. In this paper we discuss some methods that can be used for rapid searches." The protein identification problem. Search for identical matching segments. Search for approximate matching segments (substitutions). Search for approximate matching segments (all mutations). Bull Math Biol 1984 46 4 545-552 0804 Tavare,S. Some Statistical Aspec.. Mathematical .. 89CRC Press Tavare S; Giddings BW Some Statistical Aspects of the Primary Structure of Nucleotide Sequences Waterman MS Mathematical Methods for DNA Sequences Sequence analysis; Markov; Fourier; Regularities; USA; Statistical; Structure; Nucleotide "The second section describes some Markov chain methods for assessing the dependence structure that exists in a sequence of nucleotides. Particular emphasis is placed on methods for estimating the order of the Markov dependence. ... The third part of our paper describes some methods for searching for repetitive or periodic patterns in a sequence. We base our analysis on the discrete Walsh transform and compare it to the more familiar Fourier methods." CRC Press Boca Raton, FL 1989 117-132 0805 Sackin,M.J. Crossassociation: A Me.. Biochem.Genet. 71 5:287-313 Sackin MJ Crossassociation: A Method of Comparing Protein Sequences Pairwise comparison; Significance; UK; Statistical; Protein "The method is to 'slide' the sequences past each other one step at a time and to count the number of amino acids that match. At each overlap position, the program prints the percentage match and statistical significance measures of the matching. ... The method includes computation of three overall similarity measures between sequences which should have use in both evolutionary and taxonomic studies." Biochem Genet 5 5 287-313 0806 Salemme,A. A Convenient Method fo.. 
Nucleic Acids R 84 12:257-262 Salemme A; Furano AV A Convenient Method for Locating Sets of Related Short Sequences in DNA Sequences of any Length Match with k mismatches; USA; DNA "With a single execution of each program, the user can search one or more sequences of any length for one or more short sequences. The user specifies either the number of mismatches (from 0 through N) or the positions of the mismatches allowed in each short sequence." Nucleic Acids Res 12 12 257-262 0807 Sander,C. Database of Homology-d.. Proteins Struct 91 9:56-68 Sander C; Schneider R Database of Homology-derived Protein Structures and the Structural Meaning of Sequence Alignment Pairwise alignment; Significance; Sequence alignment; Structure; Protein; Sequence weight; DE "The threshold of sequence similarity sufficient for structural homology depends strongly on the length of the alignment. Here, we first quantify the relation between sequence similarity, structure similarity, and alignment length by an exhaustive survey of alignments between proteins of known structure and report a homology threshold curve as a function of alignment length." Proteins Struct Funct Genet 9 9 56-68 0808 Sankoff,D. Matching Sequences und.. Proc.Nat.Acad.S 72 69(1):4-6 Sankoff D Matching Sequences under Deletion/Insertion Constraints Pairwise alignment; Dynamic programming; CA; Longest common The algorithm of Needleman and Wunsch (1970), for finding longest common subsequences without constraints, "is improved from the viewpoint of computational economy. An economical algorithm is then elaborated for finding subsequences satisfying deletion/insertion constraints." Proc Nat Acad Sci USA 1972 69 1 4-6 0809 Sankoff,D. Minimal Mutation Trees.. 
SIAM J.Appl.Mat 75 28(1):35-42 Sankoff D Minimal Mutation Trees of Sequences Multiple alignment; CA; Significance "Given a finite tree, some of whose vertices are identified with given finite sequences, we show how to construct sequences for all the remaining vertices simultaneously, so as to minimize the total edge-length of the tree. Edge-length is calculated by a metric whose biological significance is the mutational distance between two sequences." See Sankoff and Cedergren (1983) SIAM J Appl Math 1975 28 1 35-42 0810 Sankoff,D. Simultaneous Solution .. SIAM J.Appl.Mat 85 45(5):810-825 Sankoff D Simultaneous Solution of the RNA Folding, Alignment and Protosequence Problems Multiple alignment; Dynamic programming; CA; RNA; Folding "The alignment of finite sequences, the inference of ribonucleic acid secondary structures (folding), and the reconstruction of ancestral sequences on a phylogenetic tree, are three problems which have dynamic programming solutions, which we formulate in a common mathematical framework. Combining the objective functions for alignment ... and folding (free energy), we present an algorithm which solves all three problems simultaneously ...." SIAM J Appl Math 1985 45 5 810-825 0811 Sankoff,D. A Test for Nucleotide .. J.Mol.Biol. 73 77:159-164 Sankoff D; Cedergren RJ A Test for Nucleotide Sequence Homology Pairwise alignment; Significance; Monte Carlo; CA; Homology; Nucleotide With respect to alignments of two given sequences, "a test is developed which computes the significance of each deletion/insertion hypothesized, based on Monte-Carlo sampling of random sequences with the same base composition as the experimental sequences being tested." J Mol Biol 77 77 159-164 0812 Sankoff,D. Simultaneous Compariso.. Time Warps, S.. 
83Addison-Wesley Sankoff D; Cedergren RJ Simultaneous Comparison of Three or More Sequences Related by a Tree Sankoff D Kruskal JB Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison Multiple alignment; Dynamic programming; Character weight; CA The algorithm minimizes the length of the given evolutionary tree of the sequences. See Sankoff (1975) Addison-Wesley Reading, MA 1983 253-263 0813 Sankoff,D. Frequency of Insertion.. J.Mol.Evol. 76 7:133-149 Sankoff D; Cedergren RJ; Lapalme G Frequency of Insertion-Deletion, Transversion, and Transition in the Evolution of 5S Ribosomal RNA Multiple alignment; Dynamic programming; CA; Evolution; RNA; Transition "We present a dynamic programming algorithm which finds the optimal alignment for a set of N sequences simultaneously, where each sequence is associated with one of the N tips of a given evolutionary tree. Concurrently, protosequences are constructed corresponding to the ancestral nodes of the tree." J Mol Evol 1976 7 133-149 0814 Time Warps, String Edi.. 83Addison-Wesley Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison Sankoff D Kruskal JB BK - Sequence analysis; Dynamic programming; CA; Sequence comparison; Complexity; Edit An overview of sequence comparison based on dynamic programming. It includes major sections on: macromolecular sequences; time-warping, continuous functions, and speech processing; algorithms for related problems; computational complexity; random sequences Addison-Wesley Reading, MA 1983 xii+382-xii+382 0815 Sankoff,D. Gene Order Comparisons..
Proc.Nat.Acad.S 92 89:6575-6579 Sankoff D; Leduc G; Antoine N; Paquin B; Lang BF; Cedergren R Gene Order Comparisons for Phylogenetic Inference: Evolution of the Mitochondrial Genome Genome; CA; Evolution; Gene; Phylogenetic "We describe the construction of a database of 16 mitochondrial gene orders from fungi and other eukaryotes by using complete or nearly complete genomic sequences; propose a measure of gene order rearrangement based on the minimal set of chromosomal inversions, transpositions, insertions, and deletions necessary to convert the order in one genome to that of the other; report on algorithm design and the development of the DERANGE software for the calculation of this measure ...." Proc Nat Acad Sci USA 89 89 6575-6579 0816 Sankoff,D. Evolution of 5S RNA an.. Nat.New Biol. 73 245(24 Oct.):2 Sankoff D; Morel C; Cedergren RJ Evolution of 5S RNA and the Nonrandomness of Base Replacement Multiple alignment; CA; Evolution; RNA "The main problem is to align the various sequences so that bases in corresponding position in different sequences are fairly certain to reflect a common term in the ancestral sequence. ... This problem has recently been solved for the case where the evaluation is based on a minimal mutation criterion. ... For the case of three known sequences and one unknown sequence in the configuration it is quite easy to implement a computer program for the method." Nat New Biol 1973 245 24 Oct. 232-234 0817 Sankoff,D. Shortcuts, Diversions,.. Discrete Math. 73 4:287-293 Sankoff D; Sellers PH Shortcuts, Diversions, and Maximal Chains in Partially Ordered Sets Pairwise alignment; CA; Restriction "An algorithm is described for finding the maximal weight chain between two points in a locally finite partial order under the restriction that all but k (or fewer) successive pairs in the chain belong to a given subset of the partial order relation." Application to molecular genetics Discrete Math 4 4 287-293 0818 Santibanez,M. 
A Multiple Alignment P.. Comput.Appl.Bio 87 3(2):111-114 Santibanez M; Rohde K A Multiple Alignment Program for Protein Sequences Multiple alignment; Segment; DE; Program; Protein This program for multiple alignment of protein sequences is "an extension of the fast alignment program by Wilbur et al. (1984) into higher dimensions. The use of hash procedures on fragments of the protein sequences increases the speed of calculation. Thereby we also take into account fragments which are present in some, but not in all, sequences considered." Comput Appl Biosci 1987 3 2 111-114 0819 Saqi,M.A.S. A Simple Method to Gen.. J.Mol.Biol. 91 219(4):727-732 Saqi MAS; Sternberg MJE A Simple Method to Generate Non-trivial Alternate Alignments of Protein Sequences Pairwise alignment; Dynamic programming; UK; Protein "An algorithm is presented that finds suboptimal alignments of protein sequences by a simple modification to the standard dynamic programming method." J Mol Biol 1991 219 4 727-732 0820 Saroff,H.A. A Note on the Evaluati.. Bull.Math.Biol. 84 46(5/6):951-96 Saroff HA A Note on the Evaluation of Similarity (Homology) of Short Sequences with Long Sequences Pairwise comparison; Significance; Monte Carlo; USA; Gap; Similarity; Homology "Monte Carlo data on the comparison of a short sequence with a long one are developed in a manner to quantify the occurrence of gaps. ... In previous publications tables of expected frequencies of occurrence of similarities between a short sequence, length 10, and a long one, length 112, were presented .... This note develops the problem in more detail, particularly the results relating to gaps." Bull Math Biol 1984 46 5/6 951-961 0821 Saurin,W. Comparaison de plusieu.. C.R.Acad.Sci.Pa 86 303(13):541-54 Saurin W; Marliere P Comparaison de plusieurs sequences proteiques par reconnaissance de blocs conserves Multiple alignment; Segment; FR; Region; DE Simultaneous Alignment of Several Protein Sequences. 
"The sequences of related proteins show the alternance of conserved and variable regions. ... Although the exact meaning of such constraints remains elusive, conserved regions can be extracted from protein chains and used to align them. We developed a program that efficiently performs this task." C R Acad Sci Paris Ser III 1986 303 13 541-546 0822 Saurin,W. Matching Relational Pa.. Comput.Appl.Bio 87 3(2):115-120 Saurin W; Marliere P Matching Relational Patterns in Nucleic Acid Sequences Match complex patterns; FR; Nucleic acid "We describe a program that efficiently searches sequence data banks for complex patterns where sites are linked by common relations such as identity, complementarity or span. Its algorithm is closer to those of automatic demonstration than to the finite state machines used in fast pattern matching." Comput Appl Biosci 1987 3 2 115-120 0823 Schaback,R. On the Expected Sublin.. SIAM J.Comput. 88 17(4):648-658 Schaback R On the Expected Sublinearity of the Boyer-Moore Algorithm String match; Boyer-Moore; DE; Probabilistic; Algorithm "This paper analyzes the expected performance of a simplified version BM* of the Boyer-Moore string-matching algorithm. A probabilistic automaton A, which models the expected behavior of BM*, is set up under the assumption that both text and pattern are generated by a source which emits independent and uncorrelated symbols with an arbitrary distribution of probabilities." SIAM J Comput 1988 17 4 648-658 0824 Schneider,T.D Sequence Logos: A New .. Nucleic Acids R 90 18(20):6097-61 Schneider TD; Stephens RM Sequence Logos: A New Way to Display Consensus Sequences Consensus sequence; USA; Display "A graphical method is presented for displaying the patterns in a set of aligned sequences. The characters representing the sequence are stacked on top of each other for each position in the aligned sequences. 
The height of each letter is made proportional to its frequency, and the letters are sorted so the most common one is on top." Nucleic Acids Res 1990 18 20 6097-6100 0825 Schneider,T.D Information Content of.. J.Mol.Biol. 86 188:415-431 Schneider TD; Stormo GD; Gold L; Ehrenfeucht A Information Content of Binding Sites on Nucleotide Sequences Match a pattern matrix; Information content; USA; Distributed; Nucleotide; Binding "We define a measure of the information ... in the sequence patterns at binding sites. It allows one to investigate how information is distributed across the sites and to compare one site to another. One can also calculate the amount of information ... that would be required to locate the sites, given that they occur with some frequency in the genome." Matrices having a high information content J Mol Biol 1986 188 415-431 0826 Schoniger,M. A Local Algorithm for .. Bull.Math.Biol. 92 54(4):521-536 Schoniger M; Waterman MS A Local Algorithm for DNA Sequence Alignment with Inversions Subalignment; Dynamic programming; USA; Sequence alignment; Inversion; DNA; Algorithm "A dynamic programming algorithm to find all optimal alignments of DNA subsequences is described. The alignments use not only substitutions, insertions and deletions of nucleotides but also inversions (reversed complements) of substrings of the sequences. The inversion alignments themselves contain substitutions, insertions and deletions of nucleotides." Bull Math Biol 1992 54 4 521-536 0827 Schuler,G.D. A Workbench for Multip.. Proteins Struct 91 9(3):180-190 Schuler GD; Altschul SF; Lipman DJ A Workbench for Multiple Alignment Construction and Analysis Multiple alignment; Segment; USA; Sequence alignment "Multiple sequence alignment can be a useful technique for studying molecular evolution, as well as for analyzing relationships between structure or function and primary sequence. 
We have developed for this purpose an interactive program, MACAW ..., that allows the user to construct multiple alignments by locating, analyzing, editing, and combining 'blocks' of aligned sequence segments." Proteins Struct Funct Genet 1991 9 3 180-190 0828 Schwartz,R.M. Matrices for Detecting.. Atlas of Prot.. 78National Biomed Schwartz RM; Dayhoff MO Matrices for Detecting Distant Relationships Dayhoff MO Atlas of Protein Sequence and Structure, Volume 5, Supplement 3, 1978 Sequence proximity; Substitution; Scoring; USA; Genetic A comparison of four matrices for calculating scores of pairs of aligned protein sequences: unitary matrix (UM), genetic code matrix (GCM), alternative amino acids matrix (AAAM), and the mutation data matrix (MDM78). In the comparisons, MDM78 seems clearly to be the best National Biomedical Research Foundation Washington, DC 1978 353-358 0829 Schwartz,S. Software Tools for Ana.. Nucleic Acids R 91 19(17):4663-46 Schwartz S; Miller W; Yang CM; Hardison RC Software Tools for Analyzing Pairwise Alignments of Long Sequences Pairwise comparison; Dot; USA; Pairwise alignment "Computer tools are needed to summarize the information [from pairwise sequence comparisons], to assist in its analysis, and to report the findings. ... One tool prepares publication-quality pictorial representations of alignments, while another facilitates interactive browsing of pairwise alignment data." Nucleic Acids Res 1991 19 17 4663-4667 0830 Sege,R.D. A Statistical Test for.. Nucleic Acids R 82 10(1):375-389 Sege RD; Saxberg BEH A Statistical Test for Comparing Several Nucleotide Sequences Consensus sequence; Likelihood; USA; Statistical; Nucleotide "The general problem addressed here is to determine the level of information contained in a group of N bases, i.e., to examine the distribution of bases at one location among N sequences, or at N locations along one sequence .... 
A method will now be derived which will allow one to determine the level of confidence for rejecting the hypothesis that the observed data came by chance selection from the base pool." Nucleic Acids Res 1982 10 1 375-389 0831 Sellers,P.H. An Algorithm for the D.. J.Combin.Theory 74 16:253-258 Sellers PH An Algorithm for the Distance Between Two Finite Sequences Pairwise alignment; Sequence proximity; USA; Dynamic programming; Distance; Algorithm An equal-weights, distance-based, dynamic programming algorithm to compare two sequences. See Sellers (1974b) for the generalization to arbitrary weights J Combin Theory Ser A 1974 16 253-258 0832 Sellers,P.H. On the Theory and Comp.. SIAM J.Appl.Mat 74 26(4):787-793 Sellers PH On the Theory and Computation of Evolutionary Distances Pairwise alignment; Sequence proximity; USA; Evolutionary distance; Distance "This paper gives a formal definition of the biological concept of evolutionary distance and an algorithm to compute it." See Sellers (1974a) for the algorithm under the assumption of equal weights SIAM J Appl Math 1974 26 4 787-793 0833 Sellers,P.H. Pattern Recognition in.. Proc.Nat.Acad.S 79 76(7):3041-304 Sellers PH Pattern Recognition in Genetic Sequences Match with k differences; Pattern recognition; USA; Evolutionary distance; Genetic; Recognition "This paper announces an algorithm for finding pattern similarities between two given finite sequences. Two portions, one from each sequence, are similar if they are close in the metric space of evolutionary distances. ... This result lends itself to detecting similarities by computer between pairs of biological sequences, such as proteins and nucleic acids." See Sellers (1980) for details Proc Nat Acad Sci USA 1979 76 7 3041-3041 0834 Sellers,P.H. The Theory and Computa.. 
J.Algorithms 80 1:359-373 Sellers PH The Theory and Computation of Evolutionary Distances: Pattern Recognition Match with k differences; Pattern recognition; USA; Display; Evolutionary distance; Distance; Recognition "A method of finding pattern similarities between two sequences is given. Two portions, one from each sequence, are similar if they are close in the metric space of evolutionary distances. The method allows a complete list to be made of all pairs of intervals, one from each of two given sequences, such that each pair displays a maximum local degree of similarity ...." J Algorithms 1980 1 359-373 0835 Sellers,P.H. Pattern Recognition in.. Bull.Math.Biol. 84 46(4):501-514 Sellers PH Pattern Recognition in Genetic Sequences by Mismatch Density Subalignment; Dynamic programming; USA; Pattern recognition; Genetic; Recognition Computer program for similarity search. Find subsequences within the matrix exhibiting locally optimal alignment by maintaining a minimum match density. Based on the concept of match density as suggested by Goad and Kanehisa (1982) Bull Math Biol 1984 46 4 501-514 0836 Shapiro,B.A. An Interactive Dot Mat.. J.Biomol.Struct 87 4(5):697-706 Shapiro BA; Nussinov R; Lipkin LE; Maizel JV Jr An Interactive Dot Matrix System for Locating Potentially Significant Features in Nucleic Acid Molecules Sequence analysis; Significance; Dot; USA; Region; Probabilistic; Nucleic acid; Matrix "An interactive computer system using a dot matrix approach has been developed and used to determine potentially significant features due to distortions in the B-DNA helix .... Specifically, it was found that a pattern of alternating doublets of purines and pyrimidines appear to exist in regulatory regions. This result is shown to be beyond probabilistic expectation." J Biomol Struct & Dyn 1987 4 5 697-706 0837 Sibbald,P.R. Scrutineer: A Computer.. 
Comput.Appl.Bio 90 6(3):279-288 Sibbald PR; Argos P Scrutineer: A Computer Program that Flexibly Seeks and Describes Motifs and Profiles in Protein Sequence Databases Database search; DE; Motif; Sequence database; Program; Profile; Protein "Scrutineer is an interactive, user-friendly program designed to search for motifs, patterns and profiles in the Swissprot, Protein Identification Resource (PIR) or SeqDb protein sequence databases." Comput Appl Biosci 1990 6 3 279-288 0838 Sibbald,P.R. Weighting Aligned Prot.. J.Mol.Biol. 90 216(4):813-818 Sibbald PR; Argos P Weighting Aligned Protein or Nucleic Acid Sequences to Correct for Unequal Representation Sequence weight; DE; Correction; Database search; Representation; Profile; Protein; Nucleic acid "Aligned sequences from the same family ... are seldom representative of the entire family. ... For many applications, such as using alignments or profiles to perform database searches for distantly related family members, such unequal representation requires correction. An algorithm to perform appropriate weighting of individual sequences is presented along with examples illustrating its efficacy." J Mol Biol 1990 216 4 813-818 0839 Sibbald,P.R. Calculating Higher Ord.. J.Theor.Biol. 89 136:475-483 Sibbald PR; Banerjee S; Maze J Calculating Higher Order DNA Sequence Information Measures Sequence analysis; Information theory; CA; DNA "This paper re-examines the use of information theory as a tool for understanding DNA. Specifically we (a) refine Gatlin's application (1972) of information theory to DNA sequence analysis, (b) point out some recent misinterpretations of results obtained by Brooks et al. (1988), (c) reconsider the problem that the finite lengths of DNA sequences pose for the use of a theory designed for sequences of infinite length ...." J Theor Biol 1989 136 475-483 0840 Sittig,D.F. A Parallel Computing A.. 
Comput.Biomed.R 91 24(2):152-169 Sittig DF; Foulser D; Carriero N; McCorkle G; Miller PL A Parallel Computing Approach to Genetic Sequence Comparison: The Master-Worker Paradigm with Interworker Communication Pairwise alignment; Parallel; USA; Sequence comparison; Dynamic programming; Genetic "We have implemented a parallel version of a dynamic programming biological sequence comparison algorithm to study the potential applicability of using parallel computers for genetic sequence comparisons. Our parallel program ... was tested on both a 10 CPU Sequent Symmetry and a 64 CPU Intel Hypercube." A parallel version of Gotoh's (1982) algorithm Comput Biomed Res 1991 24 2 152-169 0841 Slisenko,A.O. Detection of Periodici.. J.Soviet Math. 83 22(3):1316-138 Slisenko AO Detection of Periodicities and String-matching in Real Time Pattern match; Complexity; Regularities; RU; Detection "This article contains a detailed description of an algorithm for finding all periodicities in real time on a machine with random memory access and registers of asymptotically minimal length. In fact, this construction gives a real-time algorithm for pattern matching, finding the longest repetitions, and so forth." J Soviet Math 1983 22 3 1316-1387 0842 Smit,G.de V. A Comparison of Three .. Software.Practi 82 12:57-66 Smit Gde V A Comparison of Three String Matching Algorithms Knuth-Morris-Pratt; Boyer-Moore; SA; String match; Complexity; Algorithm "Three string matching algorithms - straightforward, Knuth-Morris-Pratt and Boyer-Moore - are examined and their time complexities discussed. A comparison of their actual average behaviour is made, based on empirical data presented. It is shown that the Boyer-Moore algorithm is extremely efficient in most cases and that ... the Knuth-Morris-Pratt algorithm is not significantly better on the average than the straightforward algorithm." Software Practice Experience 1982 12 57-66 0843 Smith,H.O. Finding Sequence Motif.. 
Proc.Nat.Acad.S 90 87:826-830 Smith HO; Annau TM; Chandrasegaran S Finding Sequence Motifs in Groups of Functionally Related Proteins Consensus sequence; Statistical; USA; Motif; Segment; Protein "We have developed a method for rapidly finding patterns of conserved amino acid residues (motifs) in groups of functionally related proteins. ... Segments of the proteins containing those patterns that occur most frequently are aligned on each other by a scoring method that obtains an average relatedness value for all the amino acids in each column of the aligned sequence block ...." Proc Nat Acad Sci USA 1990 87 826-830 0844 Smith,P.D. Experiments with a Ver.. Software.Practi 91 21(10):1065-10 Smith PD Experiments with a Very Fast Substring Search Algorithm Pattern match; USA; String match; Algorithm "Sunday devised string matching methods that are generally faster than the Boyer-Moore algorithm. His fastest method used statistics of the language being scanned to determine the order in which character pairs are to be compared. In this paper the performances of similar, but language-independent, algorithms are examined. Results comparable with language-based algorithms can be achieved with an adaptive technique." Software Practice Experience 1991 21 10 1065-1074 0845 Smith,R. A Finite State Machine.. Comput.Appl.Bio 88 4(4):459-465 Smith R A Finite State Machine Algorithm for Finding Restriction Sites and other Pattern Matching Applications Dictionary match; Automata; USA; Restriction; Algorithm "Existing algorithms for finding restriction endonuclease recognition sites use brute-force algorithms which run in time O(NM) where N is the number of nucleotides in the sequence under analysis and M is the total number of nucleotides in all the different sites being searched for. This paper presents a deterministic finite state machine algorithm which runs in time O(N)." Comput Appl Biosci 1988 4 4 459-465 0846 Smith,R.F. Automatic Generation o.. 
Proc.Nat.Acad.S 90 87:118-122 Smith RF; Smith TF Automatic Generation of Primary Sequence Patterns from Sets of Related Protein Sequences Consensus sequence; USA; Clustering; Protein "We have developed a computer algorithm that can extract the pattern of conserved primary sequence elements common to all members of a homologous protein family. The method involves clustering the pairwise similarity scores among a set of related sequences to generate a binary dendrogram (tree). The tree is then reduced in a stepwise manner ... until only a single common 'root' pattern remains." Proc Nat Acad Sci USA 1990 87 118-122 0847 Smith,R.F. Pattern-induced Multi-.. Protein Eng. 92 5(1):35-41 Smith RF; Smith TF Pattern-induced Multi-sequence Alignment (PIMA) Algorithm Employing Secondary Structure-dependent Gap Penalties for Use in Comparative Protein Modelling Multiple alignment; USA; Sequence alignment; Gap; Protein; Secondary; Algorithm "A multiple sequence alignment algorithm is described that uses a dynamic programming-based pattern construction method to align a set of homologous sequences based on their common pattern of conserved sequence elements." Protein Eng 1992 5 1 35-41 0848 Smith,T.F. Comparison of Bioseque.. Adv.Appl.Math. 81 2:482-489 Smith TF; Waterman MS Comparison of Biosequences Pairwise alignment; USA; Segment; Needleman-Wunsch "The homology measure of Needleman and Wunsch (1970) is shown, under general conditions, to be equivalent to the distance measure of Sellers (1974). A new algorithm is given to find similar pairs of segments, one segment from each sequence. The new algorithm is compared to an earlier one due to Sellers (1980)." Adv Appl Math 1981 2 482-489 0849 Smith,T.F. Identification of Comm.. J.Mol.Biol. 
81 147:195-197 Smith TF; Waterman MS Identification of Common Molecular Subsequences Subalignment; USA; Segment; Identification; Subsequence "In this letter we extend the above ideas to find a pair of segments, one from each of two long sequences, such that there is no other pair of segments with greater similarity (homology). The similarity measure used here allows for arbitrary length deletions and insertions." J Mol Biol 1981 147 195-197 0850 Smith,T.F. The Statistical Distri.. Nucleic Acids R 85 13(2):645-656 Smith TF; Waterman MS; Burks C The Statistical Distribution of Nucleic Acid Similarities Pairwise comparison; Significance; USA; Statistical; Segment; Distributed; Distribution; Similarity; Nucleic acid "All pairs of a large set of known vertebrate DNA sequences were searched by computer for most similar segments. Analysis of this data shows that the computed similarity scores are distributed proportionally to the logarithm of the product of the lengths of the sequences involved. ... A simple rule is derived for determination of statistical significance of the similarity scores and to assist in relating statistical and biological significance." Nucleic Acids Res 1985 13 2 645-656 0851 Smith,T.F. Comparative Biosequenc.. J.Mol.Evol. 81 18(1):38-46 Smith TF; Waterman MS; Fitch WM Comparative Biosequence Metrics Pairwise alignment; USA; Sequence alignment; Needleman-Wunsch "The sequence alignment algorithms of Needleman and Wunsch (1970) and Sellers (1974) are compared. Although the former maximizes similarity and the latter minimizes differences, the two procedures are proven to be equivalent. The equivalence relations necessary for each procedure to give the same result are" described J Mol Evol 1981 18 1 38-46 0852 Sobel,E. A Multiple Sequence Al.. 
Nucleic Acids R 86 14(1):363-374 Sobel E; Martinez HM A Multiple Sequence Alignment Program Multiple alignment; Segment; USA; Sequence alignment; Statistical; Significance; Program "A program is described for simultaneously aligning two or more molecular sequences which is based on first finding common segments above a specified length and then piecing these together to maximize an alignment scoring function. Optimal as well as near-optimal alignments are found, and there is also provided a means for randomizing the given sequences for testing the statistical significance of an alignment." Nucleic Acids Res 1986 14 1 363-374 0853 Spouge,J.L. Improving Sequence-mat.. J.Mol.Biol. 85 181(1):137-138 Spouge JL Improving Sequence-matching Algorithms by Working from Both Ends Pairwise alignment; USA; Algorithm "Recent algorithms (e.g., Ukkonen, Fickett) align nucleic acid sequences (starting from the left) by bounding the allowed distance between subsequences by d, aligning, then incrementing d until all of both sequences are aligned. Aligning from both ends is more efficient." J Mol Biol 1985 181 1 137-138 0854 Spouge,J.L. Speeding up Dynamic Pr.. SIAM J.Appl.Mat 89 49(5):1552-156 Spouge JL Speeding up Dynamic Programming Algorithms for Finding Optimal Lattice Paths Pairwise alignment; Dynamic programming; USA; Optimal; Dynamic; Algorithm "Finding an optimal alignment between two sequences can ... be reduced to finding an optimal lattice path. Dynamic programming algorithms are generally well-suited to such problems, but can be slow and require too much storage .... Faster algorithms requiring less computer storage can often be constructed by restricting calculations to a 'computational volume' known to contain the optimal path." SIAM J Appl Math 1989 49 5 1552-1566 0855 Spouge,J.L. 
Fast Optimal Alignment Comput.Appl.Bio 91 7(1):1-7 Spouge JL Fast Optimal Alignment Pairwise alignment; USA; Gap; Optimal "A general principle underlies the efficiency of [the efficient alignment algorithms of Fickett and Ukkonen]: inequalities can direct computations to promising subalignments. Hence inequalities can be used to suggest alignment algorithms. Inequalities for unweighted end-gaps, affine and concave gap weights, etc., are discussed ...." Comput Appl Biosci 1991 7 1 1-7 0856 Sprizhitsky,Y Statistical Analysis o.. J.Biomol.Struct 88 6(2):345-358 Sprizhitsky YA; Nechipurenko YD; Alexandrov AA; Volkenstein MV Statistical Analysis of Nucleotide Runs in Coding and Noncoding DNA Sequences Sequence analysis; Significance; RU; Statistical; Coding; DNA; Nucleotide "There are considerable differences of run distributions in DNA sequences of procaryotes, invertebrates and vertebrates. There is an abundance of short runs (1-2 nucleotides long) in the coding sequences and there is a deficiency of such runs in the noncoding regions." J Biomol Struct & Dyn 1988 6 2 345-358 0857 Srinivas,Y.V. A Sheaf-theoretic Appr.. Theoret.Comput. 93 112:53-97 Srinivas YV A Sheaf-theoretic Approach to Pattern Matching and Related Problems String match; Knuth-Morris-Pratt; Pattern match; USA "We present a general theory of pattern matching by adopting an extensional, geometric view of patterns. ... We derive a generalized version of the Knuth-Morris-Pratt string-matching algorithm by gradually converting this extensional description into an intensional description, i.e., an algorithm." Theoret Comput Sci 1993 112 53-97 0858 Staden,R. An Interactive Graphic.. 
Nucleic Acids R 82 10(9):2951-296 Staden R An Interactive Graphics Program for Comparing and Aligning Nucleic Acid and Amino Acid Sequences Pairwise comparison; Dot; UK; Program; Amino acid; Nucleic acid; Graphic "This paper describes a computer program designed to look for similarities between pairs of nucleic or amino acid sequences. ... The basic principle ... was first described by Gibbs and McIntyre (1970) and involves producing a diagram that contains a representation of all the matches between a pair of sequences. This diagram is then scanned by eye and the human ability to recognize patterns used to detect any similarities that might be present." Nucleic Acids Res 1982 10 9 2951-2961 0859 Staden,R. Computer Methods to Lo.. Nucleic Acids R 84 12(1):505-519 Staden R Computer Methods to Locate Signals in Nucleic Acid Sequences Match a pattern matrix; UK; Signal; Nucleic acid "We describe a computer program that can be used to locate poorly defined recognition sequences .... Our methods ... assign separate values to each base at each position of the recognition sequence and can therefore indicate the relative importance of each base at each position. This is done by using a weight matrix to represent each type of recognition sequence." Nucleic Acids Res 1984 12 1 505-519 0860 Staden,R. Methods to Define and .. Comput.Appl.Bio 88 4(1):53-60 Staden R Methods to Define and Locate Patterns of Motifs in Sequences Dictionary match; UK; Motif "A method to define and search for complex patterns of motifs in nucleic acid and protein sequences is described. With this method nucleic acid motifs can be defined in eight different ways and protein motifs in six. A pattern is defined by a list of motifs. ... Programs to search for patterns in individual sequences and libraries of sequences are described." Comput Appl Biosci 1988 4 1 53-60 0861 Staden,R. Methods for Calculatin.. 
Comput.Appl.Bio 89 5(2):89-96 Staden R Methods for Calculating the Probabilities of Finding Patterns in Sequences Pattern recognition; Significance; UK; Probabilistic; Motif; Probability "This paper describes the use of probability-generating functions for calculating the probabilities of finding motifs in nucleic acid and protein sequences. Equations and algorithms are given for calculating the probabilities associated with nine different ways of defining motifs. Comparisons are made with searches of random sequences." Comput Appl Biosci 1989 5 2 89-96 0862 Staden,R. Methods for Discoverin.. Comput.Appl.Bio 89 5(4):293-298 Staden R Methods for Discovering Novel Motifs in Nucleic Acid Sequences Consensus sequence; Neighbourhood; UK; Motif; Nucleic acid "We describe a computer tool to aid the discovery of new motifs in nucleic acid sequences. ... The heart of the method is the creation of dictionaries of related subsequences [which] can then be analyzed to look for the commonest or best-defined subsequences, those that occur in the highest number of different sequences, or for those in equivalent positions within the family." Comput Appl Biosci 1989 5 4 293-298 0863 Staden,R. Searching for Patterns.. Methods Enzymol 90 183:193-211 Staden R Searching for Patterns in Protein and Nucleic Acid Sequences Match a pattern matrix; UK; Protein; Nucleic acid "There is a rapidly growing number of well-established patterns, especially in nucleic acid sequences, and some readers may wish only to know how to search for these. This chapter, however, describes a set of programs that not only perform searches for known patterns but which also enable users to define their own patterns. The patterns can be defined in many different ways ... and the search programs operate on individual sequences as well as whole libraries of sequences." See Staden (1988) for the programs Methods Enzymol 1990 183 193-211 0864 Staden,R. Screening Protein and .. 
DNA Seq.- J.DNA 91 1:369-374 Staden R Screening Protein and Nucleic Acid Sequences against Libraries of Patterns Match complex patterns; Pattern library; Motif; UK; Protein; Nucleic acid "We describe programs that can screen nucleic acid and protein sequences against libraries of motifs and patterns. Such comparisons are likely to play an important role in interpreting the function of sequences determined during large scale sequencing projects." DNA Seq - J DNA Seq Mapping 1991 1 369-374 0865 States,D.J. Similarity and Homology Sequence Anal.. 91W. H. Freeman States DJ; Boguski MS Similarity and Homology Gribskov M Devereux J Sequence Analysis Primer Sequence analysis; Review; Dot; Dynamic programming; Sequence alignment; Similarity; Homology; USA A review. Similarity versus Homology. Dot Matrix Methods. Dynamic Programming Methods. Scoring Systems. Multiple Sequence Alignment W H Freeman New York 1991 89-157 0866 Sternberg,M.J PROMOT: A FORTRAN Prog.. Comput.Appl.Bio 91 7(2):257-260 Sternberg MJE PROMOT: A FORTRAN Program to Scan Protein Sequences Against a Library of Known Motifs Dictionary match; Pattern library; Motif; UK; Program; Protein "Recently a database (PROSITE) has been established that contains 337 known motifs encoded as a list of allowed residue types at specific positions along the sequence. PROMOT is a FORTRAN computer program that takes a protein sequence and examines if it contains any of the motifs in PROSITE." Comput Appl Biosci 1991 7 2 257-260 0867 Sternberg,M.J Library of common prot.. Nature (Lond.) 91 349(10 Jan.):1 Sternberg MJE Library of common protein motifs Sequence analysis; Significance; Pattern library; Motif; UK; Protein "One problem facing molecular biologists is the evaluation of the significance of finding a common amino-acid sequence motif in different proteins. ... Bairoch has established a library, called PROSITE, of 337 protein motifs, based on the ... SWISSPROT14 database. ... 
Thus if one finds a known motif in a newly determined protein sequence and [the expected number of chance matches calculated from residue frequencies] < 0.5, then it is likely that this match detects a biologically meaningful relationship." Nature (Lond ) 1991 349 10 Jan. 111-111 0868 Sternberg,M.J Local Protein Sequence.. Protein Eng. 90 4(2):125-131 Sternberg MJE; Islam SA Local Protein Sequence Similarity Does Not Imply a Structural Relationship Subalignment; Significance; UK; Structure; Similarity; Protein "Thus local sequence [similarity] does not indicate a structural similarity when there is neither an evolutionary nor functional explanation to support this. Accordingly structure predictions based on finding a local sequence similarity with an evolutionary unrelated protein of known conformation are unlikely to be valid." Protein Eng 1990 4 2 125-131 0869 Sternberg,M.J Protein Sequences - Ho.. Trends Biotechn 91 9(9):300-302 Sternberg MJE; Islam SA Protein Sequences - Homologies and Motifs Database search; Review; UK; Motif; Homology; Protein "When a protein sequence is determined, it is standard procedure to perform database searches to identify any similarities that may exist to other sequences .... This article outlines some of the software techniques available for performing such searches." Trends Biotechnol 1991 9 9 300-302 0870 Stormo,G.D. Identifying Coding Seq.. Nucleic Acid .. 87IRL Press Stormo GD Identifying Coding Sequences Bishop MJ Rawlings CJ Nucleic Acid and Protein Sequence Analysis: A Practical Approach Match a pattern matrix; USA; Motif; Coding A survey of a variety of techniques which have been employed to describe and locate imprecisely defined motifs IRL Press Oxford, UK 1987 231-258 0871 Stormo,G.D. Computer Methods for A.. 
Annu.Rev.Biophy 88 17:241-263 Stormo GD Computer Methods for Analyzing Sequence Recognition of Nucleic Acids Match a pattern matrix; Consensus sequence; Review; USA; Sequence recognition; Information content; Nucleic acid; Recognition Perspectives and overview. Sequence patterns in nucleic acids: qualitative specificity, quantitative specificity. Qualitative analysis: finding consensus, matrix methods. Quantitative analysis: information content of binding sites, thermodynamics of recognition, activity matrices. Future directions. Conclusions Annu Rev Biophys Biophys Chem 1988 17 241-263 0872 Stormo,G.D. Consensus Patterns in .. Methods Enzymol 90 183:211-221 Stormo GD Consensus Patterns in DNA Consensus sequence; Information theory; USA; Region; Genome; DNA "This chapter describes computer-aided methods useful for the identification and analysis of regulatory sites. The goal of these methods is to extract from a set of known binding sites a pattern which describes the sites and serves to distinguish them from other regions of the genome that are not bound by the protein." Methods Enzymol 1990 183 211-221 0873 Stormo,G.D. Identifying Protein-bi.. Proc.Nat.Acad.S 89 86:1183-1187 Stormo GD; Hartzell GW III Identifying Protein-binding Sites from Unaligned DNA Fragments Consensus sequence; Information theory; Pattern recognition; USA; Fragment; DNA "We present a [consensus] method that can be applied to the problem of identifying the recognition pattern for a DNA-binding protein given only a collection of sequenced DNA fragments .... The method compares the 'information content' of a large number of possible binding site alignments to arrive at a matrix representation of the binding site pattern." Proc Nat Acad Sci USA 1989 86 1183-1187 0874 Streletc,V.B. Fast, Statistically Ba.. 
Comput.Appl.Bio 92 8(6):529-534 Streletc VB; Shindyalov IN; Kolchanov NA; Milanesi L Fast, Statistically Based Alignment of Amino Acid Sequences on the Base of Diagonal Fragments of Dot-matrices Pairwise alignment; Dot; RU; Statistical; Fragment; Amino acid "We present a new pairwise alignment algorithm that uses iterative statistical analysis of homologous subsequences. Apart from the classical conversion of the dot-matrix characteristic of the Needleman-Wunsch algorithm (NW), we used only those matrix elements that corresponded to the most non-random subsequence homologies." Comput Appl Biosci 1992 8 6 529-534 0875 Stuckle,E.E. Statistical Analysis o.. Nucleic Acids R 90 18(22):6641-66 Stuckle EE; Emmrich C; Grob U; Nielsen PJ Statistical Analysis of Nucleotide Sequences Sequence analysis; Significance; Markov; DE; Statistical; Signal; Nucleotide "In order to scan nucleic acid databases for potentially relevant but as yet unknown signals, we have developed an improved statistical model for pattern analysis of nucleic acid sequences by modifying previous methods based on Markov chains. ... The model allows the simultaneous analysis of several short sequences with unequal base frequencies and Markov order k not= 0 as is usually the case in databases." Nucleic Acids Res 1990 18 22 6641-6647 0876 Subbiah,S. A Method for Multiple .. J.Mol.Biol. 89 209:539-548 Subbiah S; Harrison SC A Method for Multiple Sequence Alignment with Gaps Multiple alignment; Clustering; USA; Sequence alignment; Gap; Needleman-Wunsch; Optimal "A method that performs multiple sequence alignment by cyclical use of the standard pairwise Needleman-Wunsch algorithm is presented. ... Comparison with the one known case where the optimal multiple sequence alignment has been rigorously determined shows that in practice the proposed method finds the mathematically optimal solution." J Mol Biol 209 209 539-548 0877 Suboch,G.M. Statistical Significan.. 
Comput.Appl.Bio 90 6(1):43-48 Suboch GM; Sprizhitsky YA Statistical Significance of Some Complex Nucleotide Combinations: A Comparison of DNA Models Sequence analysis; Significance; Markov; RU; Statistical; DNA; Nucleotide; Model "DNA is modelled as a sequence of nucleotide runs. This model is shown to provide a more adequate description of the observed frequencies of occurrence of local homopurine-homopyrimidine mirror repeats than the second-order homogeneous Markov chain model." Comput Appl Biosci 1990 6 1 43-48 0878 Sunday,D.M. A Very Fast Substring .. Comm.ACM 90 33(8):132-142 Sunday DM A Very Fast Substring Search Algorithm String match; USA; Algorithm "This article describes a substring search algorithm that is faster than the Boyer-Moore algorithm. This algorithm does not depend on scanning the pattern string in any particular order. Three variations of the algorithm are given that use three different pattern scan orders." Comm ACM 1990 33 8 132-142 0879 Swofford,D.L. Phylogeny Reconstruction Molecular Sys.. 90Sinauer Associa Swofford DL; Olsen GJ Phylogeny Reconstruction Hillis DM Moritz C Molecular Systematics Multiple alignment; Phylogeny; Evolutionary tree; Character data; Parsimony; USA "Inferring phylogenetic relationships from molecular data requires the selection of an appropriate method from the many techniques that have been described. Unfortunately, phylogenetic analysis is frequently treated as a black box into which data are fed and out of which 'The Tree' springs. Our goal in this chapter is to provide more than a cursory description of the available analytical methods; rather, we hope to develop a conceptual framework for understanding the theoretical and practical distinctions among alternative methodologies." Sinauer Associates Sunderland, MA 1990 411-501 0880 Tajima,F. Determination of Windo.. J.Mol.Evol. 
91 33:470-473 Tajima F Determination of Window Size for Analyzing DNA Sequences Sequence analysis; Significance; JP; Region; DNA "DNA sequences are generally not random sequences. To show such nonrandomness visually, DNA sequence data are often plotted as moving averages for a certain length of window slid along a sequence. Here a simple algorithm is presented for determining the window size and for finding a nonrandom region of sequence." J Mol Evol 33 33 470-473 0881 Tajima,K. A New Multiple Alignme.. J.Protein Chem. 88 7(3):292-293 Tajima K A New Multiple Alignment Algorithm for Protein and DNA Sequences Based on Vector-Scalar Matching Multiple alignment; Dynamic programming; Hardware; JP; Gap; Protein; DNA; Algorithm "The proposed new algorithm has two new features. First, the multiple alignment containing gaps is treated as a vector sequence. ... Second, vector and scalar sequences are compared using a dynamic programming approach. This vector-scalar matching enables us to align sequences globally." J Protein Chem 1988 7 3 292-293 0882 Tajima,K. Multiple DNA and Prote.. Comput.Appl.Bio 88 4(4):467-471 Tajima K Multiple DNA and Protein Sequence Alignment on a Workstation and a Supercomputer Multiple alignment; Evolutionary tree; JP; Sequence alignment; Protein; DNA This multiple sequence alignment "method is based on the alignment of a set of aligned sequences with the new sequence, and uses a recursive procedure of such alignment. ... In this paper we describe the method of multiple alignment based on a phylogenetic tree and its application to ... protein and DNA sequences with the use of a workstation and supercomputer." Comput Appl Biosci 1988 4 4 467-471 0883 Takaoka,T. An On-line Pattern Mat.. Inform.Process. 
86 22(6):329-330 Takaoka T An On-line Pattern Matching Algorithm String match; Knuth-Morris-Pratt; JP; Pattern match; Parallel; On-line; Algorithm The Boyer-Moore and Knuth-Morris-Pratt "algorithms are off-line ones in the sense that after the pattern is input, the actual matching algorithm runs. ... The present short article gives an algorithm for on-line pattern matching which does pattern matching in parallel with the action of reading input symbols." Inform Process Lett 1986 22 6 329-330 0884 Tanaka,E. A High-speed String Co.. IEEE Trans.Patt 87 9(6):806-815 Tanaka E; Kojima Y A High-speed String Correction Method Using a Hierarchical File Sequence comparison; Correction; JP; Hierarchical "We proposed a multistage hierarchical string correction method for large vocabulary. The lower bound of computational labor is estimated, and it is shown that a multistage string correction method using a special type of a hierarchical file can reduce computational labor greatly. ... Another application of this technique is a search for approximate matches in a large file." IEEE Trans Patt Anal Mach Intell 1987 9 6 806-815 0885 Tarhio,J. Approximate Boyer-Moor.. SIAM J.Comput. 93 22(2):243-260 Tarhio J; Ukkonen E Approximate Boyer-Moore String Matching Boyer-Moore; Match with k mismatches; Match with k differences; FI; String match "The Boyer-Moore idea applied in exact string matching is generalized to approximate string matching. Two versions of the problem are considered. ... [k mismatches and k differences.] ... The new algorithms are often significantly faster than the old ones. Both algorithms are functionally equivalent with the Horspool version of the Boyer-Moore algorithm when k = 0." SIAM J Comput 1993 22 2 243-260 0886 Taylor,P. A Fast Homology Progra.. 
Nucleic Acids R 84 12(1):447-455 Taylor P A Fast Homology Program for Aligning Biological Sequences Pairwise alignment; UK; Gap; Program; Evolutionary distance; Homology This paper describes improved algorithms for computing the alignment (evolutionary distance and optimal path) of a pair of sequences subject to constraints on the form of the gap weighting function. Compare with Gotoh (1982) and Waterman, Smith, and Beyer (1976) Nucleic Acids Res 1984 12 1 447-455 0887 Taylor,P. A New Method for Findi.. Comput.Appl.Bio 91 7(4):495-500 Taylor P; Rosenberg P; Samsonova MG A New Method for Finding Long Consensus Patterns in Nucleic Acid Sequences Consensus sequence; UK; Nucleic acid "We describe a fast computer algorithm for identifying consensus patterns in DNA sequences. The method requires no prior assumptions about the consensus pattern other than its length. ... [It permits] the analysis of long sequences for consensus patterns of up to 16 bases." Comput Appl Biosci 1991 7 4 495-500 0888 Taylor,W.R. Identification of Prot.. J.Mol.Biol. 86 188:233-258 Taylor WR Identification of Protein Sequence Homology by Consensus Template Alignment Multiple alignment; Pattern match; UK; Sequence alignment; Consensus sequence; Identification; Template; Homology; Protein A multiple sequence alignment algorithm which (like Bains 1986) relies on the iterative definition of a 'consensus' sequence that determines the register of all the sequences considered. Interplay between consensus sequence and multiple alignment J Mol Biol 188 188 233-258 0889 Taylor,W.R. Multiple Sequence Alig.. Comput.Appl.Bio 87 3(2):81-87 Taylor WR Multiple Sequence Alignment by a Pairwise Algorithm Multiple alignment; Clustering; UK; Sequence alignment; Algorithm "An algorithm is described that processes the results of a conventional pairwise sequence alignment program to automatically produce an unambiguous multiple alignment of many sequences. 
Unlike other, more complex, multiple alignment programs, the method described here is fast enough to be used on almost any multiple sequence alignment problem." Comput Appl Biosci 1987 3 2 81-87 0890 Taylor,W.R. A Flexible Method to A.. J.Mol.Evol. 88 28:161-169 Taylor WR A Flexible Method to Align Large Numbers of Biological Sequences Multiple alignment; UK; Consensus sequence; Clustering "A method for the alignment of two or more biological sequences is described. The method is a direct extension of the method of Taylor (1987) incorporating a consensus sequence approach and allows considerable freedom in the control of the clustering of the sequences." Interplay between consensus sequence and multiple alignment J Mol Evol 28 28 161-169 0891 Taylor,W.R. Pattern Matching Metho.. Protein Eng. 88 2(2):77-86 Taylor WR Pattern Matching Methods in Protein Sequence Comparison and Structure Prediction Match a pattern matrix; Review; UK; Pattern match; Sequence comparison; Structure; Dynamic programming; Protein; Prediction Review of template based methods, dynamic programming based methods, and fragment based methods Protein Eng 1988 2 2 77-86 0892 Taylor,W.R. A Template Based Metho.. Progress in Bio 89 54:159-252 Taylor WR A Template Based Method of Pattern Matching in Protein Sequences Match complex patterns; Pattern match; UK; Structure; Template; Protein "The following sections of the current work describe the "Template" program of Taylor (1986a) and some of its applications." Template Method: Specification and Matching of Simple Patterns. Combinatorics, Template Interactions and Domain Recognition. Secondary Structure Prediction. Match Sets, Multiple Sequences and Sources of Patterns Progress in Biophysics and Molecular Biology 54 54 159-252 0893 Taylor,W.R. Hierarchical Method to.. 
Methods Enzymol 90 183:456-474 Taylor WR Hierarchical Method to Align Large Numbers of Biological Sequences Multiple alignment; Clustering; UK; Hierarchical "In this chapter I describe the computer program that resulted in response to my own need to align more than two protein sequences." Methods Enzymol 183 183 456-474 0894 Taylor,W.R. Templates, Consensus P.. Curr.Opin.Struc 91 1:327-333 Taylor WR; Jones DT Templates, Consensus Patterns and Motifs Consensus sequence; Pattern match; Review; UK; Motif; Structure; Template "Current methods in pattern and consensus-sequence matching are reviewed. Attention is focused on those studies in which these methods have been applied to either known structures or structure prediction, including some applications that use machine learning and artificial intelligence." Curr Opin Struct Biol 1 1 327-333 0895 Thompson,K. Regular Expression Sea.. Comm.ACM 68 11(6):419-422 Thompson K Regular Expression Search Algorithm Match complex patterns; Automata; Language; USA; Expression; Signal; Algorithm "A method for locating specific character strings embedded in character text is described and an implementation of this method in the form of a compiler is discussed. ... The object program then accepts the text to be searched as input and produces a signal every time an embedded string in the text matches the given regular expression." Comm ACM 1968 11 6 419-422 0896 Thorne,J.L. Freeing Phylogenies fr.. Mol.Biol.Evol. 92 9(6):1148-1162 Thorne JL; Kishino H Freeing Phylogenies from Artifacts of Alignment Multiple alignment; Phylogeny; Likelihood; USA; Evolutionary distance; Evolutionary tree "Widely used methods for phylogenetic inference, both those that require and those that produce alignments, share certain weaknesses. ... A method that lacks them is introduced. For each pair of sequences in the data set, the method utilizes both insertion-deletion and amino acid replacement information to estimate a pairwise evolutionary distance. ... 
The distance matrix and standard error estimates are used to infer a phylogenetic tree." Mol Biol Evol 1992 9 6 1148-1162 0897 Thorne,J.L. An Evolutionary Model .. J.Mol.Evol. 91 33(2):114-124 Thorne JL; Kishino H; Felsenstein J An Evolutionary Model for Maximum Likelihood Alignment of DNA Sequences Pairwise alignment; Likelihood; Dynamic programming; USA; Statistical; DNA; Model "Most algorithms for the alignment of biological sequences are not derived from an evolutionary model. Consequently, these alignment algorithms lack a strong statistical basis. A maximum likelihood method for the alignment of two DNA sequences is presented. This method is based upon a statistical model of DNA sequence evolution for which we have obtained explicit transition probabilities." J Mol Evol 1991 33 2 114-124 0898 Thorne,J.L. Inching Toward Reality.. J.Mol.Evol. 92 34(1):3-16 Thorne JL; Kishino H; Felsenstein J Inching Toward Reality: An Improved Likelihood Model of Sequence Evolution Pairwise alignment; Likelihood; Dynamic programming; USA; Substitution; Evolution; Model "Our previous evolutionary model [1991] is generalized to permit approximate treatment of multiple-base insertions and deletions as well as regional heterogeneity of substitution rates. Parameter estimation and alignment procedures that incorporate these generalizations are developed." J Mol Evol 1992 34 1 3-16 0899 Tichy,W.F. The String-to-string C.. ACM Trans.Compu 84 2(4):309-321 Tichy WF The String-to-string Correction Problem with Block Moves Pairwise alignment; USA; Correction "An algorithm that produces the shortest edit sequence transforming one string into another is presented. The algorithm is optimal in the sense that it generates a minimal covering set of common substrings of one string with respect to another. Two improvements of the basic algorithm are developed. ... The block move algorithm ... runs in linear time and space." ACM Trans Comput Systems 1984 2 4 309-321 0900 Timkovskii,V. 
Complexity of Common S.. Cybernetics 90 25(5):565-580 Timkovskii VG Complexity of Common Subsequence and Supersequence Problems and Related Problems Longest common; Supersequence; Complexity; RU; Subsequence Translated from Kibernetika, No. 5, pp. 1-13, September-October, 1989. "In this paper, we consider old and new polynomial-time and NP-hard problems of finding longest common subsequences and subwords and shortest common supersequences and superwords .... The results provide a more complete characterization of the complexity of these problems. We also discuss the dual problems ...." Cybernetics 1990 25 5 565-580 0901 Tyler,E.C. A Review of Algorithms.. Comput.Biomed.R 91 24(1):72-96 Tyler EC; Horton MR; Krause PR A Review of Algorithms for Molecular Sequence Comparison Pairwise comparison; Review; USA; Sequence comparison; Algorithm "Most computer analyses of nucleic acid and protein sequences depend on comparisons between sequences. ... This paper reviews algorithms currently in use to solve comparison problems in molecular biology. Each algorithm is explained in detail and discussed in terms of the molecular biology problems it is most suited to solve." Comput Biomed Res 1991 24 1 72-96 0902 Tyson,H. Alignment of Nucleotid.. Comput.Methods 85 21:3-10 Tyson H; Haley B Alignment of Nucleotide or Amino Acid Sequences on Microcomputers, Using a Modification of Sellers' (1974) Algorithm which Avoids the Need for Calculation of the Complete Distance Matrix Pairwise alignment; CA; Gap; Distance; Amino acid; Nucleotide; Algorithm; Matrix "The Sellers algorithm for calculating distance between sequences has been modified to reduce its demands on microcomputer memory space by more than half. Gap penalties and mismatch scores are user-adjustable." Comput Methods Programs Biomed 21 21 3-10 0903 Ukkonen,E. On Approximate String .. 
Lecture Notes i 83 158:487-495 Ukkonen E On Approximate String Matching Pairwise alignment; FI; Edit; String match In Foundations of Computation Theory, Proceedings of the 1983 International FCT-Conference, Borgholm, Sweden, August 21-27, 1983. "An algorithm is given for computing the edit distance as well as the corresponding sequence of editing steps ... between two strings .... The algorithm needs time O(s min(m, n)) and space O(s^2) where s is the edit distance .... For small s this is a considerable improvement over the best previously known algorithm that needs time and space O(mn)." Lecture Notes in Comput Sci 158 158 487-495 0904 Ukkonen,E. Algorithms for Approxi.. Inform.Control 85 64:100-118 Ukkonen E Algorithms for Approximate String Matching Pairwise alignment; FI; String match; Algorithm This paper is a revised and expanded version of Ukkonen (1983). The author develops an improved algorithm to compute the Levenshtein distance between a pair of strings Inform Control (Orlando) 64 64 100-118 0905 Ukkonen,E. Finding Approximate Pa.. J.Algorithms 85 6(1):132-137 Ukkonen E Finding Approximate Patterns in Strings Match with k differences; Automata; FI; Edit "Let p (the pattern) be a string and t >= 0 an integer. The problem of locating in any string a substring whose edit distance from p is at most a given constant t is considered. An algorithm is presented to construct a deterministic finite-state automaton that solves the problem." J Algorithms 1985 6 1 132-137 0906 Ulam,S.M. Some Combinatorial Pro.. Applications .. 72Academic Press Ulam SM Some Combinatorial Problems Studied Experimentally on Computing Machines Zaremba SK Applications of Number Theory to Numerical Analysis Sequence proximity; USA; Coding; Combinatorial "Two classes of problems are discussed. In the first group the main questions concern the behavior of sequences of symbols, coding physical or biological properties. A fundamental question concerns the notion of a distance ... 
in the spaces of such sequences." Academic Press New York 1972 1-10 0907 Ulam,S.M. Some Ideas and Prospec.. Annu.Rev.Biophy 72 1:277-292 Ulam SM Some Ideas and Prospects in Biomathematics Sequence proximity; USA; Codon "A quantitative treatment of problems of morphology could employ the notion of a distance as a measure of difference between the elements of a set that constitutes the object of a study. ... We shall give here several examples. A fundamental one is the set of codons in a DNA chain." Annu Rev Biophys Bioeng 1 1 277-292 0908 Ullmann,J.R. A Binary N-gram Techni.. Comput.J. 77 20(2):141-147 Ullmann JR A Binary N-gram Technique for Automatic Correction of Substitution, Deletion, Insertion and Reversal Errors in Words Match with k differences; N-gram; Correction; UK; Coding; Error; Substitution; Word; Reversal; Deletion "This paper offers three basic contributions to n-gram technology. First, a method of reducing storage requirements by random superimposed coding. Second, an n-gram method for finding all dictionary words that differ from a given word by up to two errors. Third, an n-gram method for correcting up to two substitution, insertion, deletion and reversal errors without doing a separate computation for every possible pair of errors." Comput J 1977 20 2 141-147 0909 Dayhoff,M.O. Computer Analysis of P.. Sci.Am. 69 221(1):86-95 Dayhoff MO Computer Analysis of Protein Evolution Phylogeny; USA; Evolution; Protein "Amino acid sequences of similar proteins in different organisms contain information on relations among species. This information is analyzed to reconstruct in detail the history of living things." Sci Am 1969 221 1 86-95 0910 Crochemore,M. String-Matching on Ord.. Theoret.Comput. 92 92:33-47 Crochemore M String-Matching on Ordered Alphabets String match; Regularities; FR; Complexity "We present a new string-matching algorithm that exploits an ordering of the alphabet. 
The algorithm is linear in time and uses a fixed number of memory locations in addition to the text and the pattern. Therefore, it is time-space-optimal. Its main characteristic is that it scans the pattern from left to right. No preprocessing of the pattern is needed and the complexity is independent of the size of the pattern. An important consequence is the possibility of computing the periods of a word in linear time and constant space. The algorithm can also be turned into a real-time string-matching algorithm." Theoret Comput Sci 92 92 33-47 0911 van der Woude Playing with Patterns,.. Sci.Comput.Prog 89 12(3):177-190 van der Woude J Playing with Patterns, Searching for Strings String match; Knuth-Morris-Pratt; Regularities; NL; Pattern search "We present [an] exercise that is especially interesting for problems dealing with periodicity. In particular it enables us to treat preprocessing and search in the Knuth-Morris-Pratt pattern search algorithm as a unit. The main objective of this paper is the design, not the algorithm(s). ... Driven by correctness arguments we calculate the algorithm." Sci Comput Programming 1989 12 3 177-190 0912 van Heel,M. A New Family of Powerf.. J.Mol.Biol. 91 220(4):877-887 van Heel M A New Family of Powerful Multivariate Statistical Sequence Analysis Techniques Sequence analysis; Invariant; DE; Multivariate; Statistical "A novel multivariate statistical approach is presented for extracting and exploiting intrinsic information present in our ever-growing sequence data banks. The information extraction from the sequences avoids the pitfalls of intersequence alignment by analyzing secondary invariant functions derived from the sequences in the data bank rather than the sequences themselves. ... The ... principles can be used for a wide spectrum of sequence analysis problems ...." J Mol Biol 1991 220 4 877-887 0913 Venezia,D. Rapid Motif Compliance.. 
Comput.Appl.Bio 93 9(1):65-69 Venezia D; O'Hara PJ Rapid Motif Compliance Scoring with Match Weight Sets Match complex patterns; Motif; Language; USA; Expression; Scoring "The program MOTIF incorporates a weight matrix and a rapid, backtracking tree-search algorithm to score motif compliance with greatly enhanced performance while placing no constraints on the motif. ... MOTIF allows a choice of regular expression formats and can use both motif and sequence libraries as either targets or queries." Comput Appl Biosci 1993 9 1 65-69 0914 Vihinen,M. An Algorithm for Simul.. Comput.Appl.Bio 88 4(1):89-92 Vihinen M An Algorithm for Simultaneous Comparison of Several Sequences Multiple comparison; Dot; FI; Region; Algorithm A dot matrix approach. "Conserved regions of one sequence are located by doing pairwise comparisons with other sequences .... The observation matrices filled with scores of comparisons are superimposed and added together and those points having values greater than or equal to stringency are accepted." Comput Appl Biosci 1988 4 1 89-92 0915 Vihinen,M. Simultaneous Compariso.. Methods Enzymol 90 183:447-456 Vihinen M Simultaneous Comparison of Several Sequences Multiple comparison; Dot; FI; Region "The difference between sequence comparison and alignment is that the former indicates all similarities between the sequences whereas the latter method aligns the matching bases or residues. ... The comparisons give overall sequence similarity regardless of alignment. ... A new method to study sequence similarities by comparing one sequence with another was developed. In this approach pairwise comparisons of aligned sequences are superimposed to search conserved regions of the query sequence." Methods Enzymol 183 183 447-456 0916 Vihinen,M. MULTICOMP: a Program P.. 
Comput.Appl.Bio 92 8(1):35-38 Vihinen M; Euranto A; Luostarinen P; Nevalainen O MULTICOMP: a Program Package for Multiple Sequence Comparison Multiple comparison; Dot; FI; Sequence comparison; Program "The MULTICOMP program package includes several procedures with which one query sequence can be compared simultaneously to several DNA, RNA or amino acid sequences. The same technique was also introduced for comparing propensities of secondary structural features, which can be predicted on the basis of amino acid sequences." Comput Appl Biosci 1992 8 1 35-38 0917 Vingron,M. A Fast and Sensitive M.. Comput.Appl.Bio 89 5(2):115-121 Vingron M; Argos P A Fast and Sensitive Multiple Sequence Alignment Algorithm Multiple alignment; Segment; Sequence weight; DE; Sequence alignment; Algorithm "A two-step multiple alignment strategy is presented that allows rapid alignment of a set of homologous sequences and comparison of pre-aligned groups of sequences." Comput Appl Biosci 1989 5 2 115-121 0918 Vingron,M. Determination of Relia.. Protein Eng. 90 3(7):565-569 Vingron M; Argos P Determination of Reliable Regions in Protein Sequence Alignments Subalignment; Significance; DE; Region; Sequence alignment; Sequence comparison; Protein "Judging the significance of alignments is still a major problem in sequence comparison. We present a method to delineate reliable regions within an alignment. This differs from standard approaches in that it does not attempt to attribute one significance value to the alignment as a whole, but assesses alignment quality locally. An algorithm is provided that predicts which residue pairs in an alignment are likely to be correctly matched." Protein Eng 1990 3 7 565-569 0919 Vingron,M. Motif Recognition and .. J.Mol.Biol. 
91 218:33-43 Vingron M; Argos P Motif Recognition and Alignment for Many Sequences by Comparison of Dot-matrices Multiple alignment; Dot; Motif; DE; Region; Sequence alignment; Gap; Recognition "We present an algorithm to delineate dot-plot agreement. A novel procedure ... is developed to identify common patterns and reliably aligned regions in a set of distantly related sequences. The algorithm finds motifs independent of input sequence lengths and reduces the dependence on gap penalties. When sequences share greater similarity, the same approach converts to a multiple sequence alignment procedure." J Mol Biol 218 218 33-43 0920 Vintsyuk,T.K. Speech Discrimination .. Cybernetics 68 4(1):52-57 Vintsyuk TK Speech Discrimination by Dynamic Programming Pairwise alignment; Dynamic programming; RU; Discrimination; Signal; Dynamic Also (Russian) Kibernetika, 4(1), 81-88, 1968. "In our proposed algorithm for the recognition of words the greatest possible match between the readings of [a vector of spectral intensity readings] for the unknown signal and for the standard of its class is achieved .... The recognition of words is carried out through discrimination of the components of the word and is accomplished by the method of dynamic programming." Vintsyuk and Needleman and Wunsch (1970) are early users of dynamic programming to compare sequences Cybernetics 1968 4 1 52-57 0921 Vishkin,U. Optimal Parallel Patte.. Inform.Control 85 67:91-113 Vishkin U Optimal Parallel Pattern Matching in Strings String match; Parallel; IL; Pattern match; Optimal "Given a text of length n and a pattern of length m, we present a parallel linear algorithm for finding all occurrences of the pattern in the text. The algorithm runs in O(n/p) time using any number of p <= n/log m processors on a concurrent-read concurrent-write parallel random-access-machine." Inform Control (Orlando) 67 67 91-113 0922 Vishkin,U. Deterministic Sampling.. SIAM J.Comput. 
91 20(1):22-40 Vishkin U Deterministic Sampling - A New Technique for Fast Pattern Matching Parallel; IL; Pattern match; Pattern recognition; String match; Sampling "... This approach enables the text analysis ... to be performed in O(log* n) time and optimal speedup on a PRAM. This improves on the previous fastest optimal speedup result. It also leads to a new serial algorithm for string matching that runs in linear time including preprocessing. The approach is expected to be applicable for pragmatic pattern recognition problems." SIAM J Comput 1991 20 1 22-40 0923 Vogel,H. Generalization and Sim.. J.Mol.Evol. 78 10:339-348 Vogel H Generalization and Simplification of the Moore-Goodman Test for Significance of Alignment Homologies Pairwise alignment; Significance; FR; Codon; Homology "A test given by Moore and Goodman (1977) that checks the significance of a homology between protein sequences is generalized to any type of distance measure and to any classification of codon pairs or amino acids according to this measure." J Mol Evol 10 10 339-348 0924 Vogt,G. Searching for Distantl.. Comput.Appl.Bio 92 8(1):49-55 Vogt G; Argos P Searching for Distantly Related Protein Sequences in Large Databases by Parallel Processing on a Transputer Machine Database search; Parallel; DE; Sequence alignment; Protein "AliMac is an implementation of a sensitive sequence alignment algorithm on a parallel computer. The method achieves reliable alignments for very distantly related sequences from a combined use of amino acid exchange weights and physicochemical characteristics. ... This paper describes the AliMac hardware and software and discusses problems and peculiarities of parallel implementations, especially with transputers." Comput Appl Biosci 1992 8 1 49-55 0925 von Heijne,G. Computer Analysis of D.. Eur.J.Biochem. 
91 199:253-256 von Heijne G Computer Analysis of DNA and Protein Sequences Sequence analysis; Review; Sequence database; Motif; Sequence alignment; Neural; SWE; Protein; DNA "Some recent trends in the development of theoretical methods for DNA and protein sequence analysis are reviewed, with particular emphasis on the design of new databases, motif searches, sequence alignment algorithms and applications of neural networks." Sensitivity. Speed. Multiple alignments Eur J Biochem 199 199 253-256 0926 Wagner,R.A. Order-n Correction for.. Comm.ACM 74 17(5):265-268 Wagner RA Order-n Correction for Regular Languages Correction; Language; Automata; USA "A method is presented for calculating a string B, belonging to a given regular language L, which is 'nearest' (in number of edit operations) to a given input string a. B is viewed as a reasonable 'correction' for the possibly erroneous string a, where a was originally intended to be a string of L." Comm ACM 1974 17 5 265-268 0927 Wagner,R.A. On the Complexity of t.. ACM Sympos.Theo 75 7:218-223 Wagner RA On the Complexity of the Extended String-to-string Correction Problem Pairwise alignment; Complexity; USA; Correction Albuquerque, NM, 5-7 May 1975. "The Extended String-to-String Correction Problem (ESSCP) is defined as the problem of determining, for given strings A and B over alphabet V, a minimum-cost sequence S of edit operations such that S(A) = B. The sequence S may make use of the operations Change, Insert, Delete and Swap .... Thus, 'almost all' ESSCPs can be solved in deterministic polynomial time, but the general problem is NP-complete." ACM Sympos Theory Comput 7 7 218-223 0928 Wagner,R.A. On the Complexity of t.. Time Warps, S.. 
83Addison-Wesley Wagner RA On the Complexity of the Extended String-to-string Correction Problem Sankoff D Kruskal JB Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison Pairwise alignment; Complexity; USA; Correction Permits insert, replace, delete, and swap operations. "In the present chapter we analyze this algorithm and investigate its running time with respect to the length of the strings being compared and the relative costs of the various edit operations." See Wagner (1975) for an extended abstract with the same title Addison-Wesley Reading, MA 1983 215-235 0929 Wagner,R.A. The String-to-String C.. J.Assoc.Comput. 74 21(1):168-173 Wagner RA; Fischer MJ The String-to-String Correction Problem Pairwise alignment; Longest common; Correction; USA Permits the edit operations of insertion, deletion, and mutation - the Levenshtein distance. "An algorithm is presented which solves this problem in time proportional to the product of the lengths of the two strings." J Assoc Comput Mach 1974 21 1 168-173 0930 Wagner,R.A. Correcting Counter-Aut.. SIAM J.Comput. 78 7(3):357-375 Wagner RA; Seiferas JI Correcting Counter-Automaton-Recognizable Languages Dictionary match; Automata; Correction; USA; Language "The edit operations considered here are single-character deletions, single-character insertions, and single-character substitutions, each at an independent cost that does not depend on context. Employing a linear-time algorithm for solving single-origin graph shortest distance problems, it is shown how to correct a string of length n into the language accepted by a counter automaton in time proportional to n^2 on a RAM with unit operation cost function." SIAM J Comput 1978 7 3 357-375 0931 Wallace,J.C. PATMAT: A Searching an.. 
Comput.Appl.Bio 92 8(3):249-254 Wallace JC; Henikoff S PATMAT: A Searching and Extraction Program for Sequence, Pattern and Block Queries and Databases Database search; USA; Program; Query "A program has been developed that provides molecular biologists with multiple tools for searching databases, yet uses a very simple interface. PATMAT can use protein or (translated) DNA sequences, patterns or blocks of aligned proteins as queries of databases consisting of amino acid or nucleotide sequences, pattern or blocks." Comput Appl Biosci 1992 8 3 249-254 0932 Wallin,E. Fast Needleman-Wunsch .. Comput.Appl.Bio 93 9(1):117-118 Wallin E; Wettergren C; Hedman F; von Heijne G Fast Needleman-Wunsch Scanning of Sequence Databanks on a Massively Parallel Computer Pairwise alignment; Parallel; Needleman-Wunsch; SWE; Databank "We have implemented the Needleman-Wunsch (NW) algorithm on a massively parallel computer, an 8K CM-2 machine (Thinking Machines Co., Cambridge, MA)." Compare with Lander, Mesirov, Taylor (1989) Comput Appl Biosci 1993 9 1 117-118 0933 Wang,Y.P. Optimal Correspondence.. IEEE Trans.Patt 90 12(11):1080-10 Wang YP; Pavlidis T Optimal Correspondence of String Subsequences Longest common; Sequence proximity; Language; USA; Optimal; Subsequence "The problem of substring matching when ... the alphabet is infinite ... has received less attention. We present definitions of string distance, an effective way of computing them, and matching algorithms minimizing such distances. Our analysis also includes the matching of strings to regular expressions." IEEE Trans Patt Anal Mach Intell 1990 12 11 1080-1087 0934 Watanabe,K. Optimal Alignments of .. Comput.Appl.Bio 85 1(2):83-87 Watanabe K; Urano Y; Tamaoki T Optimal Alignments of Biological Sequences on a Microcomputer Pairwise alignment; CA; Optimal "An algorithm and a program have been developed which enable optimal alignments of biological sequences on an 8-bit microcomputer." 
The algorithm is based on Waterman, Smith, Beyer (1976) Comput Appl Biosci 1985 1 2 83-87 0935 Waterman,M.S. Sequence Alignments in.. Proc.Nat.Acad.S 83 80:3123-3124 Waterman MS Sequence Alignments in the Neighborhood of the Optimum with General Application to Dynamic Programming Pairwise alignment; Dynamic programming; Locally optimal; USA; Sequence alignment; Dynamic "There are sometimes unknown constraints on the sequences that cause the 'true' alignment to disagree with the optimum (computer) solution. To assist in overcoming these difficulties, an algorithm has been developed to produce all alignments within a specified distance of the optimum. The distance can be chosen after the optimum is computed, and the algorithm can be repeated at will." Proc Nat Acad Sci USA 80 80 3123-3124 0936 Waterman,M.S. Efficient Sequence Ali.. J.Theor.Biol. 84 108:333-337 Waterman MS Efficient Sequence Alignment Algorithms Pairwise alignment; Gap; USA; Sequence alignment; Algorithm Sequence alignments with "multiple insertion/deletions are known to increase computation time from O(n^2) to O(n^3) although Gotoh has presented an O(n^2) algorithm in the case the multiple insertion/deletion weighting function is linear. It is argued in this paper that it could be desirable to use concave weighting functions. For that case, an algorithm is derived that is conjectured to be O(n^2)." J Theor Biol 108 108 333-337 0937 Waterman,M.S. General Methods of Seq.. Bull.Math.Biol. 84 46(4):473-500 Waterman MS General Methods of Sequence Comparison Sequence comparison; Dynamic programming; Subalignment; Database search; Review; USA; Region; Segment "Mathematical methods for comparison of nucleic acid sequences are reviewed. There are two major methods of sequence comparison: dynamic programming and a method referred to here as the regions method.
The problem types discussed are comparison of two sequences, location of long matching segments, efficient database searches and comparison of several sequences." Bull Math Biol 1984 46 4 473-500 0938 Waterman,M.S. Multiple Sequence Alig.. Nucleic Acids R 86 14(22):9095-91 Waterman MS Multiple Sequence Alignment by Consensus Multiple alignment; Consensus sequence; USA; Sequence alignment Describes an algorithm for multiple sequence alignment that matches words of length and degree of mismatch chosen by the user. The method is based on the consensus sequence algorithm described by Waterman, Arratia, and Galas (1984) Nucleic Acids Res 1986 14 22 9095-9102 0939 Waterman,M.S. Computer Analysis of N.. Methods Enzymol 88 164:765-793 Waterman MS Computer Analysis of Nucleic Acid Sequences Sequence analysis; Sequence comparison; Consensus sequence; Structure; Review; USA In Noller,H.F.,Jr., Moldave,K. (Eds.), Ribosomes. "I make no attempt to survey the literature. Instead I try to describe some useful and interesting methods of sequence analysis that utilize the power of computers." Sequence comparisons. Consensus patterns. Secondary structure. Conclusions Methods Enzymol 164 164 765-793 0940 Waterman,M.S. Consensus Patterns in .. Mathematical .. 89CRC Press Waterman MS Consensus Patterns in Sequences Waterman MS Mathematical Methods for DNA Sequences Consensus sequence; Neighbourhood; USA Review of an approach to identifying DNA features that are not conserved precisely in location or in pattern. Applications to consensus words or palindromes in multiple sequences, consensus within one sequence, and long consensus patterns CRC Press Boca Raton, FL 1989 93-115 0941 Mathematical Methods f.. 
89CRC Press Mathematical Methods for DNA Sequences Waterman MS BK - Sequence analysis; Sequence comparison; Review; USA; Statistical; DNA The book has ten chapters on mathematical, statistical, and computer methods for analyzing DNA sequences CRC Press Boca Raton, FL 1989 x+283-x+283 0942 Waterman,M.S. Sequence Alignments Mathematical .. 89CRC Press Waterman MS Sequence Alignments Waterman MS Mathematical Methods for DNA Sequences Sequence alignment; Consensus sequence; Database search; Review; USA; Dynamic programming Review of the dynamic programming alignment of two or more sequences, the consensus alignment of multiple sequences, and the comparison of a sequence to a data base CRC Press Boca Raton, FL 1989 53-92 0943 Waterman,M.S. Pattern Recognition in.. Bull.Math.Biol. 84 46(4):515-527 Waterman MS; Arratia R; Galas DJ Pattern Recognition in Several Sequences: Consensus and Alignment Consensus sequence; Neighbourhood; USA; Pattern recognition; Statistical; Recognition "This paper gives a new and practical solution for finding unknown patterns that occur imperfectly above a preset frequency. Algorithms for finding the patterns are given as well as estimates of statistical significance." The consensus concept depends on window width, consensus sequence length, and neighbourhood specification Bull Math Biol 1984 46 4 515-527 0944 Waterman,M.S. A Dynamic Programming .. Math.Biosci. 85 77(1/2):179-18 Waterman MS; Byers TH A Dynamic Programming Algorithm to Find all Solutions in a Neighborhood of the Optimum Pairwise alignment; Locally optimal; Dynamic programming; USA; Dynamic; Algorithm "Just after he introduced dynamic programming, Richard Bellman with R. Kalaba in 1960 gave a method for finding Kth best policies. Their method has been modified since then, but it is still not practical for many problems. This paper describes a new technique which modifies the usual backtracking procedure and lists all near-optimal policies.
This practical algorithm is very much in the spirit of the original formulation of dynamic programming. An application to matching biological sequences is given." Math Biosci 1985 77 1/2 179-188 0945 Waterman,M.S. A New Algorithm for Be.. J.Mol.Biol. 87 197(4):723-728 Waterman MS; Eggert M A New Algorithm for Best Subsequence Alignments with Application to tRNA- rRNA Comparisons Subalignment; Locally optimal; USA; Gap; Subsequence; Algorithm "The algorithm of Smith and Waterman for identification of maximally similar subsequences is extended to allow identification of all non-intersecting similar subsequences with similarity score at or above some preset level. The resulting alignments are found in order of score, with the highest scoring alignment first. In the case of single gaps or multiple gaps weighted linear with gap length, the algorithm is extremely efficient ...." J Mol Biol 1987 197 4 723-728 0946 Waterman,M.S. Parametric Sequence Co.. Proc.Nat.Acad.S 92 89:6090-6093 Waterman MS; Eggert M; Lander E Parametric Sequence Comparisons Sequence comparison; Dynamic programming; USA; Parametric; Statistical Compare two sequences. "We present an algorithm to efficiently find the optimal alignments for all choices of the penalty parameters. It is then possible to systematically explore these alignments for those with the most biological or statistical interest." Proc Nat Acad Sci USA 89 89 6090-6093 0947 Waterman,M.S. Phase Transitions in S.. Proc.Nat.Acad.S 87 84(5):1239-124 Waterman MS; Gordon L; Arratia R Phase Transitions in Sequence Matches and Nucleic Acid Structure Sequence analysis; Sequence comparison; Significance; USA; Region; Sequence match; Structure; Nucleic acid; Transition "Extremal properties, such as longest helical region, can now be studied with a new family of probability distributions [Arratia, Gordon, Waterman, 1986]. Not only is such extremal behavior analyzed with great precision, but new phase transitions are determined. ... 
These results ... also have importance for significance tests in comparison of nucleic acid or protein sequences." Proc Nat Acad Sci USA 1987 84 5 1239-1243 0948 Waterman,M.S. Consensus Methods for .. Methods Enzymol 90 183:221-237 Waterman MS; Jones R Consensus Methods for DNA and Protein Sequence Alignment Consensus sequence; Neighbourhood; USA; Consensus method; Protein; Sequence alignment; DNA "The purpose of this chapter is to present some of the tools that we have created in order to analyze multiple sequences in a rigorous, efficient, and systematic way. ... Our approach is based on what we refer to as consensus analysis. ... The basis of the consensus method is an algorithm to find consensus words, with the degree of matching and alignment specified by the user of the program." Methods Enzymol 183 183 221-237 0949 Waterman,M.S. Computer Alignment of .. Phylogenetic .. 91Oxford Universi Waterman MS; Joyce J; Eggert M Computer Alignment of Sequences Miyamoto MM Cracraft J Phylogenetic Analysis of DNA Sequences Sequence alignment; Review; USA; Statistical Sequence alignment, aligning full sequences, maximum sequences, statistical distribution of alignment scores, multiple sequence alignment Oxford University Press New York 1991 59-72 0950 Waterman,M.S. Line Geometries for Se.. Bull.Math.Biol. 84 46(4):567-577 Waterman MS; Perlwitz MD Line Geometries for Sequence Comparisons Multiple alignment; Dynamic programming; Evolutionary tree; USA; Sequence comparison; Sequence alignment; Geometry "A simple generalization of the sequences makes it possible to obtain some results about the geometry of sequence alignments. These ideas suggest heuristic approaches to problems of comparing several sequences. If M sequences [of length n] are known to be related by a binary tree, they can be aligned in O(MN^2) time and O(N^2 + NM) storage." Bull Math Biol 1984 46 4 567-577 0951 Waterman,M.S. Some Biological Sequen.. Adv.Math.
76 20(3):367-387 Waterman MS; Smith TF; Beyer WA Some Biological Sequence Metrics Sequence proximity; Multiple alignment; Information theory; USA Section 8 "extends the notion of distance between two sequences to a distance among n sequences: the n-distance. An algorithm which computes this distance is given. The algorithm also gives the alignment of the n sequences which has least weight." Adv Math 1976 20 3 367-387 0952 Weir,B.S. Statistical Analysis o.. J.Nat.Cancer In 88 80(6):395-406 Weir BS Statistical Analysis of DNA Sequences Sequence analysis; Significance; Review; USA; Statistical; Markov; Region; DNA "Developments in the statistical analysis of DNA sequence data since 1984 are reviewed. Mathematical methods employing dynamic programming or incorporating Markov chain theory have been developed to search sequences for regions of similarity and to align sequences. When the biological forces of mutation and genetic drift are included in models, distances between aligned sequences allow the construction of evolutionary trees." J Nat Cancer Inst 1988 80 6 395-406 0953 White,C.T. The Diagonal-traverse .. Nucleic Acids R 84 12(1):751-766 White CT; Hardies SC; Hutchison CA III; Edgell MH The Diagonal-traverse Homology Search Algorithm for Locating Similarities between Two Sequences Pairwise comparison; Dot; USA; Display; Segment; Homology; Similarity; Algorithm "We present a fast computer algorithm for finding homology between two DNA sequences. It generates a two-dimensional display in which a diagonal string of dots represents a stretch of homology between the two segments. Our algorithm performs the search very rapidly, and has no internal data storage requirement except for the sequences themselves." Nucleic Acids Res 1984 12 1 751-766 0954 Wilbur,W.J. Rapid Similarity Searc.. 
Proc.Nat.Acad.S 83 80:726-730 Wilbur WJ; Lipman DJ Rapid Similarity Searches of Nucleic Acid and Protein Data Banks Database search; Pairwise alignment; k-tuple; USA; Similarity; Protein; Nucleic acid "We present an algorithm for the global comparison of sequences based on matching k-tuples of sequence elements for a fixed k. ... The algorithm has also been adapted, in a separate implementation, to produce rigorous sequence alignments." Proc Nat Acad Sci USA 80 80 726-730 0955 Wilbur,W.J. The Context Dependent .. SIAM J.Appl.Mat 84 44(3):557-567 Wilbur WJ; Lipman DJ The Context Dependent Comparison of Biological Sequences Pairwise alignment; Pairwise comparison; Sequence proximity; Dynamic programming; USA "A general method for comparing two macromolecules is developed. The method differs from more traditional procedures in that matches are evaluated dependent on sequence context. We first define a context dependent similarity score between sequences and give a dynamic programming algorithm for its calculation. Conditions are then described which allow the conversion of the similarity score to a metric distance. The class of metrics ... includes the Sellers metric." SIAM J Appl Math 1984 44 3 557-567 0956 Williams,P.L. Phylogeny Determinatio.. Methods Enzymol 90 183:615-627 Williams PL; Fitch WM Phylogeny Determination Using Dynamically Weighted Parsimony Method Phylogeny; Character weight; USA; Region; Parsimony When estimating phylogenies from sequences, workers may discard regions of sequences that are difficult to align, or weight transversions more than transitions. "Both kinds of weighting are subject to the charge of investigator bias in the absence of some procedure for assigning weights .... This chapter presents various methods for assigning both kinds of weights plus a method for evaluating trees given the weights." Methods Enzymol 183 183 615-627 0957 Wong,A.K.C. A Multiple Sequence Co.. Bull.Math.Biol. 
93 55(2):465-486 Wong AKC; Chan SC; Chiu DKY A Multiple Sequence Comparison Method Multiple alignment; Phylogeny; CA; Sequence comparison; Hierarchical "A new method for the comparison of multiple macromolecular sequences ... is based on a hierarchical sequence synthesis procedure that does not require any a priori knowledge of the molecular structure of the sequences or the phylogenetic relations among the sequences. It ... has the capability of ... aligning the sequences while the taxonomic tree of the sequences is being constructed in one single phase." Bull Math Biol 1993 55 2 465-486 0958 Wong,A.K.C. A Generalized Method f.. Comput.Biol.Med 74 4:43-57 Wong AKC; Reichert TA; Cohen DN; Aygun BO A Generalized Method for Matching Informational Macromolecular Code Sequences Pairwise alignment; Information theory; USA "This paper is concerned with the discovery of the best way of changing one sequence into another by assembling the set of alterations that minimize some quality measure. The quality measures presently incorporated into the algorithm assess the amount of information required to convert one sequence into another (Reichert, Cohen, Wong 1973). This procedure differs in a fundamental way from [previous work]." Comput Biol Med 4 4 43-57 0959 Wong,C.K. Bounds for the String .. J.Assoc.Comput. 76 23(1):13-16 Wong CK; Chandra AK Bounds for the String Editing Problem Pairwise comparison; Complexity; USA; Editing "The string editing problem is to determine the distance between two strings as measured by the minimal cost sequence of deletions, insertions, and changes of symbols needed to transform one string into the other. ... If the operations on symbols of the strings are restricted to tests of equality, then O(nm) operations are necessary (and sufficient) to compute the distance." J Assoc Comput Mach 1976 23 1 13-16 0960 Wu,S. An O(NP) Sequence Comp.. Inform.Process. 
90 35(6):317-323 Wu S; Manber U; Myers G; Miller W An O(NP) Sequence Comparison Algorithm Pairwise comparison; Longest common; USA; Sequence comparison; Edit; Algorithm "Let A and B be two sequences of lengths M and N, N >= M, let D be the length of a shortest insertion-deletion edit script, and let P be the number of deletions in such a script." "We present an algorithm for finding a shortest edit distance of A and B whose worst-case running time is O(NP) and whose expected running time is O(N + PD). ... It is nearly twice as fast as the O(ND) algorithm of Myers ...." Inform Process Lett 1990 35 6 317-323 0961 Yamada,H. A High-Speed String-Se.. IEEE J.Solid-St 87 22(5):829-834 Yamada H; Hirata M; Nagai H; Takahashi K A High-Speed String-Search Engine Database search; Parallel; Hardware; Automata; JP "This paper describes a newly developed VLSI character string-search engine (SSE) which uses a new architecture ... that combines finite-state automaton logic with a new content addressable memory to achieve a string comparison rate as fast as 80 million strings per second. This string-search performance is several times faster than any previously reported." IEEE J Solid-State Circuits 1987 22 5 829-834 0962 Yao,A.C.C. The Complexity of Patt.. SIAM J.Comput. 79 8(3):368-387 Yao ACC The Complexity of Pattern Matching for a Random String String match; Knuth-Morris-Pratt; Complexity; USA; Pattern match "In this paper, we study the average-case complexity of pattern matching in the model of [Knuth, Morris, Pratt (1977)]. ... These results in particular confirm [a conjecture of Knuth] when n >= 2m. We may add that the case m <= n <= 2m is mainly of theoretical interest, as the text strings are usually much longer than the patterns in practice." SIAM J Comput 1979 8 3 368-387 0963 Yee,C.N. Reconstruction of Stri..
Comput.Appl.Bio 93 9(1):1-7 Yee CN; Allison L Reconstruction of Strings Past Pairwise alignment; Automata; Message length; Austria "Minimum message length encoding, a method of inductive inference, is applied to the string-alignment problem. It leads to an alignment method that averages over all alignments in a weighted fashion. Experiments indicate that this method can recover the actual parameters of evolution with high accuracy and over a wide range of values, whereas the use of a single optimal alignment gives biased results." Comput Appl Biosci 1993 9 1 1-7 0964 Yianilos,P.N. A Dedicated Comparator.. Electronics 83 56(24):113-117 Yianilos PN A Dedicated Comparator Matches Symbol Strings Fast and Intelligently Database search; Parallel; Hardware; USA An integrated circuit "searches a data base and ranks the 16 best matches to a given string at up to 30,000 records per second." The measurement of proximity between strings is described on p. 115 Electronics 1983 56 24 113-117 0965 Zuker,M. Suboptimal Sequence Al.. J.Mol.Biol. 91 221(2):403-420 Zuker M Suboptimal Sequence Alignment in Molecular Biology: Alignment with Error Analysis Pairwise alignment; Dynamic programming; Dot; CA; Sequence alignment; Suboptimal; Error "A molecular sequence alignment algorithm based on dynamic programming has been extended to allow the computation of all pairs of residues that can be part of optimal and suboptimal sequence alignments. The uncertainties inherent in sequence alignment can be displayed using a new form of dot plot. The method ... can reveal what parts of the alignment are better determined than others." J Mol Biol 1991 221 2 403-420 0966 Zvelebil,M.J. Prediction of Protein .. J.Mol.Biol. 
87 195(4):957-961 Zvelebil MJ; Barton GJ; Taylor WR; Sternberg MJE Prediction of Protein Secondary Structure and Active Sites using the Alignment of Homologous Sequences Multiple alignment; Structure; UK; Protein; Prediction; Secondary "The prediction of protein secondary structure ... is improved by 9% to 66% using the information available from a family of homologous sequences." A method to align multiple sequences is described on page 958 J Mol Biol 1987 195 4 957-961 0967 Nakayama,S.I. Method for Clustering .. J.Chem.Inf.Comp 88 28:72-78 Nakayama SI; Shigezumi S; Yoshida M Method for Clustering Proteins by Use of All Possible Pairs of Amino Acids as Structural Descriptors Sequence proximity; N-gram; Clustering; JP; Dyad; Composition; Protein; Amino acid "Proteins were represented as vectors, of which components were all possible pairs of amino acids. From a distance matrix between any pairs of proteins thus represented, several clusters corresponding to connected components were generated. Application of this method to three different sets of proteins showed that it was suitable for clustering closely related proteins with respect to the sequential similarity defined by Dayhoff." J Chem Inf Comput Sci 28 28 72-78 0968 Liu,K.C. On String Pattern Matc.. SIAM J.Comput. 81 10(1):118-140 Liu KC On String Pattern Matching: A New Model with a Polynomial Time Algorithm Pattern match; USA; Parsing; Pattern definition; Language; String match; Model; Algorithm "A polynomial time algorithm is presented for string pattern matching. Earley's parsing algorithm is adapted for context-free patterns and is extended to allow the augmentation of the immediate assignment operation of SNOBOL4 and a powerful descriptive operator not previously implemented, set complement. Canonical pattern definition systems are defined to describe patterns for which our algorithm will perform pattern matching.
The languages generated by such systems are called extended context-free languages, and are shown to properly contain the family of context-free languages ...." SIAM J Comput 1981 10 1 118-140 0969 Taylor,W.R. The Classification of .. J.Theor.Biol. 86 119:205-218 Taylor WR The Classification of Amino Acid Conservation Consensus sequence; UK; Sequence alignment; Classification; Amino acid "A classification of amino acid type is described which is based on a synthesis of physico-chemical and mutation data. This is organised in the form of a Venn diagram from which sub-sets are derived that include groups of amino acids likely to be conserved for similar structural reasons. These sets are used to describe conservation in aligned sequences by allocating to each position the smallest set that contains all the residue types brought together by the alignment. This minimal set assignment provides a simple way of reducing the information contained in a sequence alignment to a form which can be analysed by computer yet remains readable." J Theor Biol 119 119 205-218 0970 Thornton,J.M. Protein Motifs and Dat.. Trends Biochem. 89 14:300-304 Thornton JM; Gardner SP Protein Motifs and Data-base Searching Sequence database; UK; Motif; Structure; Protein "Protein structure and sequence motifs are now recognized for many different protein families and topologies. To aid identification and use of these motifs in modelling and prediction, it has become necessary to establish consistent data bases of protein structure, including not only coordinates, but also derived data such as secondary structure location and solvent accessibilities. ... We will concentrate on structural motifs and structure- related sequence motifs and describe how these can be extracted from the newly established data bases of protein structure for use in prediction and modelling." Trends Biochem Sci 14 14 300-304 0971 Guenoche,A. Alignment and Hierarch.. Information a.. 
93Springer-Verlag Guenoche A Alignment and Hierarchical Clustering Method for Strings Opitz O Lausen B; Klar R Information and Classification. Concepts, Methods and Applications Multiple alignment; Longest common; FR; Clustering; Hierarchical "We develop a conceptual clustering method for strings to realize an multiple alignment of biological sequences. We associate to each cluster a common subsequence of its strings. Unfortunately, the longest common subsequence problem is NP-hard as soon as there are more than two strings. To avoid this difficulty, we present: a greedy alignment method; some improvements of the Hirschberg algorithm ...; an ascending clustering method, which provides a common subsequence that is longer than the one given by the greedy algorithm." Springer-Verlag Berlin 1993 403-412 0972 Andersson,A. The Complexity of Sear.. ACM Sympos.Theo 94 26:317-325 Andersson A; Hagerup T; Hastad J; Petersson O The Complexity of Searching a Sorted Array of Strings Sequence search; Complexity; SWE 23-25 May 1994, Montreal, Quebec. "We present an algorithm for finding a given k-character string in an array of n strings, arranged in alphabetical order, using O( ( k log log n / ( log log ( 4 + ( k log log n / log n ) ) ) ) + k + log n ) character comparisons. This improves significantly upon previous bounds." ACM Sympos Theory Comput 26 26 317-325 0973 Bodlaender,H. Beyond NP-Completeness.. ACM Sympos.Theo 94 26:449-458 Bodlaender HL; Fellows MR; Hallett MT Beyond NP-Completeness for Problems of Bounded Width: Hardness for the W Hierarchy (Extended Abstract) Longest common; Complexity; Parameterized; CA 23-25 May 1994, Montreal, Quebec. "The parameterized computational complexity of a collection of well-known problems including: ... LONGEST COMMON SUBSEQUENCE ... is explored. It is shown that these problems are hard for various levels of the W hierarchy. ... Theorem 2. LCS is hard for W[t] for all t." ACM Sympos Theory Comput 26 26 449-458 0974 Hagerup,T. 
Optimal Parallel Strin.. ACM Sympos.Theo 94 26:382-391 Hagerup T Optimal Parallel String Algorithms: Merging, Sorting and Computing the Minimum Merge; Sort; Parallel; DE; Optimal; Algorithm 23-25 May 1994, Montreal, Quebec. "We study fundamental comparison problems on strings of characters, equipped with the usual lexicographical ordering. For each problem studied, we give a parallel algorithm that is optimal with respect to at least one criterion for which no optimal algorithm was previously known." The results concern: merging two sets of strings, sorting a sequence of strings, and finding the minimum string in a sequence of strings. ACM Sympos Theory Comput 26 26 382-391 0975 Hagerup,T. Merging and Sorting St.. Lecture Notes i 92 629:298-306 Hagerup T; Petersson O Merging and Sorting Strings in Parallel Merge; Sort; Parallel; DE Proceedings, 17th International Symposium on Mathematical Foundations of Computer Science. "We show that strings of characters, equipped with the usual lexicographical ordering, can be merged and sorted in parallel as efficiently as integers, although with some loss in speed." The models of computation considered are the CRCW PRAM and the EREW PRAM. Lecture Notes in Comput Sci 629 629 298-306 0976 Clift,B. Sequence Landscapes Nucleic Acids R 86 14(1):141-158 Clift B; Haussler D; McConnell R; Schneider TD; Stormo GD Sequence Landscapes Regularities; Display; USA; Repeat "We describe a method for representing the structure of repeating sequences in nucleic-acids, proteins and other texts. A portion of the sequence is presented at the bottom of a CRT screen. Above the sequence is its landscape, which looks like a mountain range. Each mountain corresponds to a subsequence of the sequence. At the peak of every mountain is written the number of times that the subsequence appears. ... Using sequence landscapes, one can quickly locate significant repeats." Nucleic Acids Res 1986 14 1 141-158 0977 Hariharan,R. Optimal Parallel Suffi.. 
ACM Sympos.Theo 94 26:290-299 Hariharan R Optimal Parallel Suffix Tree Construction Search tree; Parallel; USA; Optimal; Suffix "An O(m)-work O(log^4 m)-time common CRCW-PRAM algorithm for constructing the suffix tree of a string s of length m drawn from any fixed alphabet set is obtained. The algorithm takes O(m) space and is the first known work and space optimal parallel algorithm for this problem. It can be generalized to a string s drawn from any general alphabet ...." ACM Sympos Theory Comput 26 26 290-299 0978 Jiang,T. Aligning Sequences via.. ACM Sympos.Theo 94 26:760-769 Jiang T; Lawler EL; Wang L Aligning Sequences via an Evolutionary Tree: Complexity and Approximation Multiple alignment; Evolutionary tree; Complexity; Approximation; CA "It is shown that tree alignment is NP-hard and generalized tree alignment is MAX SNP-hard. On the positive side, we design an efficient approximation algorithm with performance ratio 2 for tree alignment. The algorithm is then extended to a polynomial-time approximation scheme. ... The contrast between the approximability of tree alignment and generalized tree alignment shows that a phylogenetic tree can indeed help in multiple alignment." ACM Sympos Theory Comput 26 26 760-769 0979 Lander,E.S. Mapping and Interpreti.. Comm.ACM 91 34(11):33-39 Lander ES; Langridge R; Saccocio DM Mapping and Interpreting Biological Information Sequence analysis; Database search; Structure; USA; Mapping "This article summarizes some of the key computational challenges in three major areas as discussed by workshop participants: sequence analysis, information storage and retrieval, and protein structure prediction. Successfully meeting the challenges in all these problem areas is likely to require unprecedented collaboration between the computer and life sciences." Comm ACM 1991 34 11 33-39 0980 Kosaraju,S.R. Real-Time Pattern Matc..
ACM Sympos.Theo 94 26:310-316 Kosaraju SR Real-Time Pattern Matching and Quasi-Real-Time Construction of Suffix Trees. Preliminary Version Pattern match; Search tree; USA; Suffix "We design simple real-time algorithms for the following problems for any text string T and pattern string P: (a) given T#P as input, test whether P^R is a substring of T, and (b) Given T#P as input, test whether P is a substring of T. Even though these results were claimed in a voluminous paper by Slisenko, the design of a convincing and understandable solution is a well-known open problem. Our algorithm is based on a novel top-down suffix tree construction algorithm. This algorithm ... constructs enough of the suffix tree in real-time so that it can respond to pattern match queries in real-time." ACM Sympos Theory Comput 26 26 310-316 0981 Cole,R. Optimally Fast Paralle.. IEEE Sympos.Fou 93 34:248-258 Cole R; Crochemore M; Galil Z; Gasieniec L; Hariharan R; Muthukrishnan S; Park K; Rytter W Optimally Fast Parallel Algorithms for Preprocessing and Pattern Matching in One and Two Dimensions Pattern match; Parallel; USA; Algorithm The authors obtain these results for a pattern of length m: "1. Improving the preprocessing of the constant-time text search algorithm [Galil 1992] .... 2. A constant-time deterministic string-matching algorithm in the case that the text length n satisfies n = Omega(m^(1+epsilon)) for a constant epsilon > 0. 3. A simple probabilistic string-matching algorithm that has constant time with high probability for random input. 4. A constant expected time Las-Vegas algorithm for computing the period of the pattern and all witnesses and thus string matching itself, solving the main open problem remaining in string matching." IEEE Sympos Found Comput Sci 34 34 248-258 0982 Muthukrishnan String Matching Under .. Lecture Notes i 92 652:356-367 Muthukrishnan S; Ramesh H String Matching Under a General Matching Relation String match; Complexity; USA Proc. 12th FST & TCS, India.
"In standard string matching, each symbol matches only itself. In other string matching problems, e.g., the string matching with 'don't cares' problem, a symbol may match several symbols. In general, an arbitrary many-to-many matching relation might hold between symbols. We consider a general string matching problem in which such a matching relation is specified and those text positions are sought at which the pattern matches under this relation. Depending upon the existence of a simple, easily recognizable property in the given matching relation, we show that string matching either requires time linear in the text and pattern lengths or is at least as hard as boolean multiplication." Lecture Notes in Comput Sci 652 652 356-367 0983 Muthukrishnan Non-Standard Stringolo.. ACM Sympos.Theo 94 26:770-779 Muthukrishnan S; Palem K Non-Standard Stringology: Algorithms and Complexity String match; Match with don't cares; Complexity; USA; Algorithm "Non-standard stringology concerns string matching problems, wherein a position in the 'text' (of size n) matches one in the 'pattern' (of size m), based on very general relationships between the corresponding 'symbols'. For example, string matching with don't cares is a simple non-standard string matching problem .... The main results in this paper concern the inherent complexity of a variety of non-standard string matching problems, characterized in terms of algebraic convolutions." ACM Sympos Theory Comput 26 26 770-779 0984 Penotti,F.E. A Distributed System f.. Comput.Appl.Bio 94 10(3):277-280 Penotti FE A Distributed System for DNA/Protein Database Similarity Searches Database search; Distributed; Italy; Similarity "A distributed system for exhaustive alignment similarity searches on DNA/protein databases is presented. The system makes it possible to share the computational burden on diverse computers, provided they are interconnected by a network supporting TCP/IP communication." 
Comput Appl Biosci 1994 10 3 277-280 0985 Sahinalp,S.C. Symmetry Breaking for .. ACM Sympos.Theo 94 26:300-309 Sahinalp SC; Vishkin U Symmetry Breaking for Suffix Tree Construction String match; Search tree; Parallel; USA; Suffix "There are several serial algorithms for suffix tree construction which run in linear time, but the number of operations in the only parallel algorithm available ... is proportional to n log n. ... We show how to break symmetries that occur in the process of assigning labels ... and thereby reduce the number of labeled substrings to linear. We give several algorithms for suffix tree construction. One of them runs in O(log^2 n) parallel time and O(n) work for input strings whose characters are drawn from a constant size alphabet." ACM Sympos Theory Comput 26 26 300-309 0986 Bairoch,A. The SWISS-PROT Protein.. Nucleic Acids R 91 19:2247-2249 Bairoch A; Boeckmann B The SWISS-PROT Protein Sequence Data Bank Sequence database; SWI; Protein; SWISS-PROT "SWISS-PROT is an annotated protein sequence database established in 1986 and maintained collaboratively, since 1988, by the Department of Medical Biochemistry of the University of Geneva and the EMBL Data Library." Sources of the sequence data. Format. What distinguishes SWISS-PROT from other protein sequence databases? Annotation. Minimal redundancy. Integration with other databases. Content of the current release. Distribution. Nucleic Acids Res 19 19 2247-2249 0987 Barker,W.C. The PIR-International .. Nucleic Acids R 92 20:2023-2026 Barker WC; George DG; Mewes HW; Tsugita A The PIR-International Protein Sequence Database Sequence database; Protein; PIR; USA PIR-International. The protein sequence database. The superfamily concept and placement in the database. Standardization within and among databases. Work in progress. Complementary role of the protein sequence database and Geninfo. Data distribution on magnetic tapes and cd-rom. On-line access and e-mail servers. 
How to obtain PIR-International databases, software, and newsletters. Nucleic Acids Res 20 20 2023-2026 0988 Staden,R. Indexing the Sequence .. DNA Seq.- J.DNA 92 3:99-105 Staden R; Dear S Indexing the Sequence Libraries: Software Providing a Common Indexing System for all the Standard Sequence Libraries Database search; Program; UK "We describe a set of programs for creating and using indexes for the distributed forms of the major sequence libraries. The indexes conform to the specification of those distributed on cd-rom by the EMBL sequence library. The programs create entry name, accession number, author and freetext indexes and a brief directory index. If a suitable application program is given an entry name or accession number these indexes allow rapid retrieval of sequences or annotation. ... We also describe the organisation and use of the different sequence libraries and their index files." DNA Seq - J DNA Seq Mapping 1992 3 99-105 0989 Bernstein,M. Reducing the Man-Machi.. Comput.Appl.Bio 87 3(3):229-232 Bernstein M Reducing the Man-Machine Barrier: The Sequence Analysis Workbench Sequence analysis; Program; USA "Direct manipulation offers an alternative paradigm in which the scientist uses the computer as a scientific instrument for examining and modifying the data directly, rather than by issuing instructions to an agent. The Sequence Analysis Workbench provides several experimental tools for direct manipulation of sequence data; object-oriented programming makes it possible to construct sophisticated tools quickly, and facilitates critical examination and review of scientific software." Comput Appl Biosci 1987 3 3 229-232 0990 Sonnhammer,E. A Workbench for Large-.. 
Comput.Appl.Bio 94 10(3):301-307 Sonnhammer ELL; Durbin R A Workbench for Large-Scale Sequence Homology Analysis Sequence analysis; Program; UK; Sequence alignment; Database search; BLAST; Homology "To reduce the tedious browsing of large quantities of protein similarities, two programs, MSPcrunch and Blixem, were developed, which assist in processing the results from the database search programs in the BLAST suite. MSPcrunch removes biased composition and redundant matches while keeping weak matches that are consistent with a larger gapped alignment. ... Blixem is a multiple sequence alignment viewer for X-windows which makes it significantly easier to scan and evaluate the matches ratified by MSPcrunch." Comput Appl Biosci 1994 10 3 301-307 0991 Cornish-Bowde How Reliably do Amino .. J.Theor.Biol. 79 76:369-386 Cornish-Bowden A How Reliably do Amino Acid Composition Comparisons Predict Sequence Similarities between Proteins? Sequence proximity; Sequence comparison; Composition; UK; Similarity; Amino acid; Protein "A method for comparing amino acid compositions of proteins (Cornish-Bowden 1977) has been extended to allow proteins of unequal lengths to be compared. ... It tends to exaggerate the amount of difference between unrelated proteins. ... When applied to related proteins the method gives results in good agreement with those predicted." J Theor Biol 76 76 369-386 0992 Strelets,V.B. Data Bank Homology Sea.. Comput.Appl.Bio 94 10(3):319-322 Strelets VB; Ptitsyn AA; Milanesi L; Lim HA Data Bank Homology Search Algorithm with Linear Computation Complexity Database search; k-tuple; USA; Region; Complexity; Homology; Algorithm "The principal advantages of the new algorithm are: (i) linear computation complexity; (ii) low memory requirements; (iii) high sensitivity to the presence of local region homology. 
The algorithm first calculates indicative matrices of k-tuple 'realization' in the query sequence and then searches for an appropriate number of matching k-tuples within a narrow range in database sequences." Comput Appl Biosci 1994 10 3 319-322 0993 Branscomb,E. Optimizing Restriction.. Genomics 90 8:351-366 Branscomb E; Slezak T; Pae R; Galas D; Carrano AV; Waterman M Optimizing Restriction Fragment Fingerprinting Methods for Ordering Large Genomic Libraries Genome; Fingerprint; Contig; Likelihood; Statistical; USA; Restriction; Fragment; Genomic "We present a statistical analysis of the problem of ordering large genomic cloned libraries through overlap detection based on restriction fingerprinting. ... To this end, we adopt a statistical approach that uses the likelihood ratio as a statistic to detect overlap. ... This estimate is a critical tool for the accurate, automatic assembly of overlapping sets of fragments into islands called 'contigs.' These contigs must subsequently be connected by other methods to provide an ordered set of overlapping fragments covering the entire genome." Genomics 8 8 351-366 0994 Mott,R. Algorithms and Softwar.. Nucleic Acids R 93 21(8):1965-197 Mott R; Grigoriev A; Maier E; Hoheisel J; Lehrach H Algorithms and Software Tools for Ordering Clone Libraries: Application to the Mapping of the Genome of Schizosaccharomyces pombe Genome; Clone; Mapping; Program; UK; Simulated annealing; Algorithm "A complete set of software tools to aid the physical mapping of a genome has been developed and successfully applied .... Two approaches were used for ordering single-copy hybridisation probes: one was based on the simulated annealing algorithm to order all probes, and another on inferring the minimum-spanning subset of the probes using a heuristic filtering procedure. ... In addition to these programs and the database management software, tools for visualizing and editing the data are described." 
Nucleic Acids Res 1993 21 8 1965-1974 0995 Olson,M.V. Random-Clone Strategy .. Proc.Nat.Acad.S 86 83:7826-7830 Olson MV; Dutchik JE; Graham MY; Brodeur GM; Helms C; Frank M; MacCollin M; Scheinman R; Frank T Random-Clone Strategy for Genomic Restriction Mapping in Yeast Genome; Restriction; USA; Mapping; Clone; Genomic "An approach to global restriction mapping is described that is applicable to any complex source DNA. By analyzing a single restriction digest for each member of a redundant set of lambda clones, a data base is constructed that contains fragment-size lists for all the clones. The clones are then grouped into subsets, each member of which is related to at least one other member by a significant overlap. Finally, a tree-searching algorithm seeks restriction maps that are consistent with the fragment-size lists for all the clones in each subset." Proc Nat Acad Sci USA 83 83 7826-7830 0996 Zhang,P. An Algorithm Based on .. Comput.Appl.Bio 94 10(3):309-317 Zhang P; Schon EA; Fischer SG; Cayanis E; Weiss J; Kistler S; Bourne PE An Algorithm Based on Graph Theory for the Assembly of Contigs in Physical Mapping of DNA Genome; USA; Graph; Contig; Mapping; DNA; Algorithm; Physical mapping; Physical "An algorithm is described for mapping DNA contigs based on an interval graph (IG) representation. In general terms, the input to the algorithm is a set of binary overlapping relations among finite intervals spread along a real line, from which the algorithm generates sets of ordered overlapping fragments spanning that line. The implications of a more general case of the IG, called a probe interval graph (PIG), in which only a subset of cosmids are used as probes, are also discussed. ... CPU time is essentially linear with respect to the number of cosmids analyzed." Comput Appl Biosci 1994 10 3 309-317 0997 Schmitt,W. Multiple Solutions of .. Adv.Appl.Math. 
91 12:412-427 Schmitt W; Waterman MS Multiple Solutions of DNA Restriction Mapping Problems Restriction; Digest; Mapping; USA; DNA "The construction of a restriction map of a DNA molecule from fragment length data is known to be NP hard. However, it is also known that under a simple model of randomness the number of solutions to the mapping problem increases exponentially with the length of the DNA molecule. In this paper, we define a hierarchy of equivalence relations on the set of all solutions to the mapping problem and study the combinatorics and characterization of the equivalence classes." Adv Appl Math 12 12 412-427 0998 Watterson,G.A The Chromosome Inversi.. J.Theor.Biol. 82 99:1-7 Watterson GA; Ewens WJ; Hall TE; Morgan A The Chromosome Inversion Problem Chromosome; Inversion; Genomic; AU "We wish to calculate a measure of distance between two species for the purpose of constructing a phylogenetic tree. The data from which the distance measure is to be calculated is the order of the sequence of gene loci around a circular chromosome, and the distance between any two species is the minimum number of chromosomal inversions necessary to make the two sequences identical. There is no top or bottom to the chromosome so mirror image sequences are regarded as being identical. There is also no fixed 12 o'clock position. Various algorithms are considered which yield upper and lower bounds to the distance measure required but no algorithm giving the exact value has been found." J Theor Biol 99 99 1-7 0999 Galil,Z. Saving Space in Fast S.. IEEE Sympos.Fou 77 18:179-188 Galil Z; Seiferas J Saving Space in Fast String-Matching String match; IL "The inspiration for this paper was an attempt to implement the fast string-matching algorithm of Knuth, Morris and Pratt (1977) as a Fortran subroutine. ... We show [for pattern x and text y] how to reduce the additional space utilization by the fast algorithm down to O( log |x| ) memory locations. ... 
We show how to reduce the running time of the naive algorithm all the way down to O( |x|^epsilon (|x| + |y|) ) for any fixed epsilon > 0. Thus we get an almost linear-time algorithm which can be implemented without any dynamic storage allocation at all." IEEE Sympos Found Comput Sci 18 18 179-188 1000 Guibas,L.J. A New Proof of the Lin.. IEEE Sympos.Fou 77 18:189-195 Guibas LJ; Odlyzko AM A New Proof of the Linearity of the Boyer-Moore String Searching Algorithm String match; Boyer-Moore; String search; USA; Algorithm "The main result of this paper is a new proof of the linearity of the Boyer-Moore algorithm. We have paid close attention to details and to clarity of presentation. We have also improved the worst case bound from 6n to 4n. In the process we have developed considerable combinatorial machinery dealing with the occurrence of periods in strings, much of it of interest in its own right. Undoubtedly the same or similar machinery will come in handy in the analysis of other questions concerning pattern matching. [Possibly] the true worst case bound for the algorithm is 2n." IEEE Sympos Found Comput Sci 18 18 189-195 1001 Bafna,V. Genome Rearrangements .. IEEE Sympos.Fou 93 34:148-157 Bafna V; Pevzner PA Genome Rearrangements and Sorting by Reversals Genome; Rearrangement; Sort; Reversal; USA "Sequence comparison in molecular biology is in the beginning of a major paradigm shift - a shift from gene comparison based on local mutations to chromosome comparison based on global rearrangements. In the simplest form the problem of gene rearrangements corresponds to sorting by reversals, i.e., sorting of an array using reversals of arbitrary fragments." Theoretical results and approximation algorithms are given for the cases of unsigned and signed permutations. IEEE Sympos Found Comput Sci 34 34 148-157 1002 Skiena,S.S. A Partial Digest Appro.. Bull.Math.Biol. 
94 56(2):275-294 Skiena SS; Sundaram G A Partial Digest Approach to Restriction Site Mapping Restriction; Mapping; DNA; USA; Digest "We present a new, practical algorithm to resolve the experimental data in restriction site analysis, which is a common technique for mapping DNA. Specifically, we assert that multiple digestions with a single restriction enzyme can provide sufficient information to identify the positions of the restriction sites with high probability. The motivation for the new approach comes from combinatorial results on the number of mutually homeometric sets in one dimension, where two sets of n points are homeometric if the multiset of n(n-1)/2 distances they determine are the same." Bull Math Biol 1994 56 2 275-294 1003 Galil,Z. Time-Space-Optimal Str.. ACM Sympos.Theo 81 13:106-113 Galil Z; Seiferas J Time-Space-Optimal String Matching (preliminary report) String match; Complexity; Automata; IL "In this paper we describe a new linear-time string-matching algorithm requiring neither dynamic storage allocation nor other high-level capabilities. The algorithm can be implemented to run in linear time even on a six-head two-way finite automaton. Moreover, the automaton requires only =, not= branching. (Decisions depend on which of the six scanned pattern or text symbols and positions are the same, but not on the particular symbols or how many symbols there are. Hence the same algorithm works even for an infinite alphabet.)" ACM Sympos Theory Comput 13 13 106-113 1004 Hirschberg,D. The Least Weight Subse.. IEEE Sympos.Fou 85 26:137-143 Hirschberg DS; Larmore LL The Least Weight Subsequence Problem - extended abstract Least weight; Subsequence; USA "The least weight subsequence (LWS) problem is introduced, and is shown to be equivalent to the classic minimum path problem for directed graphs. A special case of the LWS problem is shown to be solvable in O( n log n ) time generally and, for certain weight functions, in linear time. 
A number of applications are given, including an optimum paragraph formation problem and the problem of finding a minimum height B-tree, whose solutions realize improvement in asymptotic time complexity." IEEE Sympos Found Comput Sci 26 26 137-143 1005 Fitch,W.M. Mapping the Order of D.. Gene 83 22:19-29 Fitch WM; Smith TF; Ralph WW Mapping the Order of DNA Restriction Fragments Restriction; Mapping; DNA; USA; Fragment "A straightforward method was designed for mapping the order of DNA restriction fragments obtained by a double and two single digestions, without the necessity of using a computer or a radioactive label. All possible solutions compatible with a pre-set level of error in the determination of sequence lengths are obtained. The primary assumptions are given, and the appropriate modifications of the algorithm are presented as a function of any assumptions one is unable (or unwilling) to make. Use of the method in connection with end-labeled fragments is also described." Gene 22 22 19-29 1006 Li,M. Towards a DNA Sequenci.. IEEE Sympos.Fou 90 31:125-134 Li M Towards a DNA Sequencing Theory (Learning a String) (Preliminary Version) Sequence analysis; Supersequence; Shortest common; Approximation; DNA; CA; Sequencing; Learning "We model the DNA sequencing problem as learning a superstring from its randomly drawn substrings. ... One major obstacle to our approach turns out to be a quite well-known open question on how to approximate the shortest common superstring of a set of strings .... We give the first provably good algorithm which approximates the shortest superstring of length n by a superstring of length O( n log n )." IEEE Sympos Found Comput Sci 31 31 125-134 1007 Kannan,S. Inferring Evolutionary.. IEEE Sympos.Fou 90 31(I):362-371 Kannan S; Warnow T Inferring Evolutionary History from DNA Sequences (Extended Abstract) Phylogeny; DNA; USA "We are interested here in two related problems. 
The first is determining whether we can triangulate a vertex-colored graph without introducing edges between vertices of the same color. This is related to a fundamental problem for geneticists, that of using character state information to construct evolutionary trees. We demonstrate the polynomial equivalence of these problems. An important subproblem arises when the characters are based upon DNA sequences. We present an O( n^2 k ) algorithm for this case where n is the number of species and k is the number of characters." IEEE Sympos Found Comput Sci 1990 31 I 362-371 1008 Blum,A. Linear Approximation o.. ACM Sympos.Theo 91 23:328-336 Blum A; Jiang T; Li M; Tromp J; Yannakakis M Linear Approximation of Shortest Superstrings Supersequence; Shortest common; Approximation; USA "Although [the shortest common superstring] problem is known to be NP-hard, a simple greedy procedure appears to do quite well .... We show that the greedy algorithm does in fact achieve a constant factor approximation, proving an upper bound of 4n. Furthermore, we present a simple modified version of the greedy algorithm that we show produces a superstring of length at most 3n. We also show the superstring problem to be MAX SNP-hard, which implies that a polynomial-time approximation scheme for this problem is unlikely." ACM Sympos Theory Comput 23 23 328-336 1009 Galil,Z. Truly Alphabet-Indepen.. IEEE Sympos.Fou 92 33:247-256 Galil Z; Park K Truly Alphabet-Independent Two-Dimensional Pattern Matching Pattern match; Multidimensional; USA "We present an algorithm [for two-dimensional pattern matching with a pattern of size m^2 and a text of size n^2] that is truly independent of the alphabet and takes linear O( m^2 + n^2 ) time. As in the Knuth-Morris-Pratt algorithm, the only operation on the alphabet is the equality test of two symbols." IEEE Sympos Found Comput Sci 33 33 247-256 1010 Jiang,T. k One-way Heads Cannot.. 
ACM Sympos.Theo 93 25:62-70 Jiang T; Li M k One-way Heads Cannot do String-Matching String match; Automata; CA "We settle a conjecture raised by Z. Galil and J. Seiferas [1981] 12 years ago: k-head one-way deterministic finite automata cannot perform string-matching (i.e., accept the language { x#y : there exists u and there exists v such that y = uxv } ), for any k." ACM Sympos Theory Comput 25 25 62-70 1011 Baker,B.S. A Theory of Parameteri.. ACM Sympos.Theo 93 25:71-80 Baker BS A Theory of Parameterized Pattern Matching: Algorithms and Applications (Extended Abstract) Pattern match; Parameterized; Suffix; USA; Algorithm "This paper develops a theory and algorithms for an application problem arising in software maintenance: to track down duplication in a large software system. We want to find ... parameterized matches, where a parameterized match between two sections of code means that one section can be transformed into the other by [a one-to-one replacement of parameter names]. This paper formalizes this problem in terms of parameterized strings and parameterized pattern matching and defines a new data structure (parameterized suffix tree) suitable for parameterized pattern matching." ACM Sympos Theory Comput 25 25 71-80 1012 Blum,N. Speeding up Dynamic Pr.. 94Institut fur In Blum N Speeding up Dynamic Programming without Omitting any Optimal Solution and some Applications in Molecular Biology BK - Pairwise alignment; Dynamic programming; DE; Complexity; Optimal; Dynamic "We extend the algorithm of Galil and Giancarlo, which sped up dynamic programming in the case of concave cost functions such that a compact representation of all optimal solutions is computed. The time complexity grows only by a small constant factor. 
Under the assumption that such a compact representation is given, we develop efficient algorithms for the solution of problems in molecular biology concerning the computation of all optimal local alignments and all optimal subalignments in genetic sequences." Institut fur Informatik Universitat Bonn ,Bonn 1994 1-37 1013 Doolittle,R.F Similar Amino Acid Seq.. Trends Biochem. 89 14:244-245 Doolittle RF Similar Amino Acid Sequences Revisited Sequence comparison; Significance; USA; Amino acid "The rapid accumulation of protein sequences, many bearing unexpected resemblances to each other, is providing a new perspective on evolution." See also Doolittle (1981). Trends Biochem Sci 14 14 244-245 1014 Staden,R. Graphic Methods to Det.. Nucleic Acids R 84 12(1):521-538 Staden R Graphic Methods to Determine the Function of Nucleic Acid Sequences Function; Sequence analysis; UK; Region; Display; Nucleic acid; Graphic "We have described a single program [ANALYSEQ] that contains the traditional sequence analysis techniques plus some new methods that can locate particular sequence features or regions that are of interest because they are unusual. Most of the routines display their results graphically which has a number of advantages. Graphical output is clearer than marking listings of sequences; allows superposition, and therefore easy comparison, of the results of many different and often independent forms of analysis; and it allows us to see regions of sequences that may perform more than one function." Nucleic Acids Res 1984 12 1 521-538 1015 Hillis,D.M. Ribosomal DNA: Molecul.. Q.Rev.Biol. 91 66(4):411-453 Hillis DM; Dixon MT Ribosomal DNA: Molecular Evolution and Phylogenetic Inference Phylogeny; Review; USA; Evolution; DNA; Phylogenetic "Studies of rDNA sequences have been used to infer phylogenetic history across a very broad spectrum .... 
The reasons for the systematic versatility of rDNA include the numerous rates of evolution among different regions of rDNA ..., the presence of many copies of most rDNA sequences per genome, and the pattern of concerted evolution that occurs among repeated copies. These features facilitate the analysis of rDNA by direct RNA sequencing, DNA sequencing ..., and restriction enzyme methodologies. Constraints imposed by secondary structure of rRNA and concerted evolution need to be considered in phylogenetic analyses, but these constraints do not appear to impede seriously the usefulness of rDNA." Q Rev Biol 1991 66 4 411-453 1016 Konopka,A. Is the Information Con.. J.Theor.Biol. 84 107:697-704 Konopka A Is the Information Content of DNA Evolutionarily Significant? Composition; Information content; DE; DNA "It has been suggested (Subba Rao, Hamid & Subba Rao, 1979; Subba Rao, Geevan & Subba Rao, 1982) that the information content of the coding regions in DNA tends to increase with evolution and, therefore, is a suitable indicator of evolutionary progress. In order to re-examine this hypothesis, I have modified the method used by Subba Rao et al. (1982) in such a way that the numerical results are much less sensitive to the amino acid composition of the polypeptide corresponding to the DNA sequences under consideration. By using this modified procedure, I present evidence that the hypothesis of Subba Rao et al. (1982) is not valid for a wide range of evolving genes." J Theor Biol 107 107 697-704 1017 Shulman,M.J. The Coding Function of.. J.Theor.Biol. 81 88:409-420 Shulman MJ; Steinberg CM; Westmoreland N The Coding Function of Nucleotide Sequences can be Discerned by Statistical Analysis Sequence analysis; Function; Statistical; Coding; SWI; Nucleotide "The nucleotide sequences of the RNA phage MS2 and the DNA phage phiX were subjected to statistical analysis. 
This analysis alone indicates (a) that the genetic code is a non-overlapping triplet code and (b) what the correct reading frame is. The application of these methods to identify structure in sequences of unknown function is discussed." J Theor Biol 88 88 409-420 1018 Subba Rao,J. Significance of the In.. J.Theor.Biol. 82 96:571-577 Subba Rao J; Geevan CP; Subba Rao G Significance of the Information Content of DNA in Mutations and Evolution Information content; Composition; Significance; India; Evolution; DNA "One point mutations in human haemoglobins have been analysed and it is seen that most of these mutations satisfy the condition P1 > P2, where P1 is the probability of occurrence of the codon that mutates and P2 is that of the codon it mutates to. Further, it is shown that the hypothesis that the information content of DNA is a reasonable evolutionary measure is consistent with the above condition." J Theor Biol 96 96 571-577 1019 Rzhetsky,A. A Simple Method for Es.. Mol.Biol.Evol. 92 9(5):945-967 Rzhetsky A; Nei M A Simple Method for Estimating and Testing Minimum-Evolution Trees Phylogeny; Evolutionary tree; Statistical; Minimum evolution; USA "A simple method for estimating and testing phylogenetic trees under the principle of minimum evolution (ME) is presented. The basic procedure of this method is first to obtain the neighbor-joining (NJ) tree by Saitou and Nei's method and then to search for a tree with the minimum value of the sum (S) of branch lengths by examining all trees that are closely related to the NJ tree. Once the ME tree is identified, a statistical test is conducted for the difference in S between this tree and other closely related trees. The mathematical method required for conducting this test is developed by using the least-squares approach." Mol Biol Evol 1992 9 5 945-967 1020 Barber,A.M. SequenceEditingAligner.. 
Gene.Anal.Techn 90 7:39-45 Barber AM; Maizel JV Jr SequenceEditingAligner: A Multiple Sequence Editor and Aligner Display; Program; Editor; Sequence alignment; Consensus sequence; USA "Here we present the SequenceEditingAligner system for editing multiple, aligned genetic sequences. This is an interactive multi-window color system that displays more than 3500 nucleotides or amino acids. The system handles nucleic acid or protein sequences with or without secondary structure data. More than 300 sequences, each more than 1500 elements in length, may be analyzed together. With the system scientists can classify elements, align sequences, edit them, find consensus patterns, and simultaneously generate oligomer frequency histograms and other statistics." Gene Anal Techn Appl 7 7 39-45 1021 Stockwell,P.A HOMED: A Homologous Se.. Trends Biochem. 88 13:322-324 Stockwell PA HOMED: A Homologous Sequence Editor Display; Program; Editor; NZ "Since the initial publication of the HOMED HOMologous sequence EDitor in CABIOS (1987) a number of further enhancements have been made so that an updated report of the current capabilities is desirable." Trends Biochem Sci 13 13 322-324 1022 Fuchs,R. Free Molecular Biologi.. Comput.Appl.Bio 90 6(2):120-121 Fuchs R Free Molecular Biological Software Available from the EMBL File Server Program; Sequence analysis; DE; EMBL; Server "A new service provided by EMBL (EMBL Software File Server) is described that will make free molecular biology software available to anyone with computer network access. MS-DOS, Apple Macintosh and VAX/VMS are supported at the moment. The programs will be delivered by normal electronic mail; conversion mechanisms will transform binary files to ASCII to allow mail transfer. The service will also help authors to distribute their software conveniently." Comput Appl Biosci 1990 6 2 120-121 1023 Heckel,P. A Technique for Isolat.. 
Comm.ACM 78 21(4):264-268 Heckel P A Technique for Isolating Differences between Files Sequence proximity; Longest common; Subsequence; USA "A simple algorithm is described for isolating the differences between two files. ... The algorithm isolates differences in a way that corresponds closely to our intuitive notion of difference, is easy to implement, and is computationally efficient, with time linear in the file length. For most applications the algorithm isolates differences similar to those isolated by the longest common subsequence." Comm ACM 1978 21 4 264-268 1024 Salamon,P. A Maximum Entropy Prin.. Computers Chem. 92 16(2):117-124 Salamon P; Konopka AK A Maximum Entropy Principle for the Distribution of Local Complexity in Naturally Occurring Nucleotide Sequences Composition; Linguistic; Entropy; Complexity; Distribution; USA; Nucleotide "A maximum entropy principle (MEP) governing the distribution of complexity of short oligonucleotides from large collections of functionally equivalent sequences is presented. The principle is seen to work well in both translated regions (exons and bacterial genes) and introns from various genomes. It also works in cases of sample sequences from various genomes and even a representative sample of the entire GenBank. This suggests that all naturally occurring DNA sequences are likely to follow the MEP described in this report." Computers Chem 1992 16 2 117-124 1025 Bell,G.I. Roles of Repetitive Se.. Computers Chem. 92 16(2):135-143 Bell GI Roles of Repetitive Sequences Regularities; Repeat; Repetition; USA "Repetitive sequences are ubiquitous in the DNA of eukaryotes, some as tandem arrays and others interspersed widely in the genome. Repetitive sequences have special roles in genome evolution, which increasingly detailed sequence information is helping to elucidate. 
Processes, including meiotic crossing over (equal and unequal), unequal mitotic sister chromatid exchange, gene conversion and transposition, with or without multiplication, can foster homogeneity of the members of a repeat family (concerted evolution) and turnover of the whole genome. Some examples are considered." Computers Chem 1992 16 2 135-143 1026 Konopka,A.K. Sequences, Codes and F.. Computers Chem. 92 16(2):83-84 Konopka AK Sequences, Codes and Functions Sequence analysis; USA; Coding; Linguistic; Function This editorial introduces a special issue entitled "Open Problems of Computational Molecular Biology." The papers were presented at the Open Problems of Computational Molecular Biology Workshop, Telluride, 2-8 June 1991; they are devoted to aspects of the biological coding problem. Contents: Overviews and Opinions (3 papers), Mathematical Models in Biomolecular Linguistics (3), Examples of Encoded Biological Functions (3), Computational Experiments (2), Models and Proposals (2). Computers Chem 1992 16 2 83-84 1027 Churchill,G.A Hidden Markov Chains a.. Computers Chem. 92 16(2):107-115 Churchill GA Hidden Markov Chains and the Analysis of Genome Structure Genome; Statistical; Markov; USA; Structure "In this paper, statistical methods based on a hidden Markov chain model are used to study the structure of some small complete genomes and a human genome segment. A variety of discrete compositional domains are discovered and their correlations with genome function are explored." Computers Chem 1992 16 2 107-115 1028 Argos,P. The Language of Protei.. Computers Chem. 92 16(2):93-102 Argos P The Language of Protein Folding: Many Forked Tongues Structure; Linguistic; DE; Protein; Language; Folding "Protein folding is discussed in analogy with language. The protein primary sequence, a string of successive amino acids in one letter code, is a sentence. A sentence contains words, or subsequences, which are the local secondary structures of the protein. 
Each of the words can have several meanings but, when collected together and read in context, convey a central idea; namely, the folded and functional protein. Deciphering the fold or meaning of the protein sentence from only a knowledge of the ordered letters is a difficult task." Computers Chem 1992 16 2 93-102 1029 Claverie,J.M. Sequence "Signals": Ar.. Computers Chem. 92 16(2):89-91 Claverie JM Sequence "Signals": Artifact or Reality? Signal; USA; Linguistic "I first review the concept of molecular sequence signal and the various forms it can take in the literature. I then comment on the limitation of this concept on the grounds that sequence signals are neither independent of a subjective representation, nor complete and/or verifiable correlates of the associated biological phenomena." Computers Chem 1992 16 2 89-91 1030 Konopka,A.K. Computational Molecula.. Computers Chem. 93 17(2):v-vi Konopka AK Computational Molecular Biology: From Sequence Research to Software Development Sequence analysis; USA This editorial introduces a special issue entitled "Open Problems of Computational Molecular Biology (2)." The papers were presented at the Second International Workshop on Open Problems in Computational Molecular Biology, Telluride, 19 July - 2 August 1992; they are devoted to aspects of computational molecular biology (the goal of which is to understand biological phenomena through computational experiments and plausible reasoning). Contents: Overviews and Opinions (2 papers), Mathematical Models (3), Computational Experiments and Molecular Evolution (3), Data Analysis and Software Development (4). Computers Chem 1993 17 2 v-vi 1031 Taylor,W.R. Protein Structure Pred.. Computers Chem. 
93 17(2):117-122 Taylor WR Protein Structure Prediction from Sequence Structure; Sequence alignment; Pattern match; UK; Protein; Prediction "The problem of protein tertiary structure prediction from sequence is reviewed, emphasizing that practical solutions are most likely to come from the recognition of existing (known) structures that fit the sequence of the protein of unknown structure. Fit can be defined in terms of sequence alone - by simple alignment in the more obvious problems or pattern matching where the similarity is remote and fragmentary. More remote similarities can be recognized by matching the sequence directly onto a known structure. This threading method is outlined ...." Computers Chem 1993 17 2 117-122 1032 Gribskov,M. A Mechanistic View of .. Computers Chem. 93 17(2):113-116 Gribskov M A Mechanistic View of Proteins and Their Sequences Sequence analysis; Mechanistic; USA; Protein "I consider the application of a mechanical analogy to sequence analysis and in particular to protein sequences and structures. The mechanistic metaphor is easily recognized as one of the fundamental concepts behind experimental disciplines such as biochemistry, genetics and cell biology. Its application to analysis of protein sequences is most clearly seen in the application of comparative approaches to associating structure with function." Computers Chem 1993 17 2 113-116 1033 Salamon,P. On the Robustness of M.. Computers Chem. 93 17(2):135-148 Salamon P; Wootton JC; Konopka AK; Hansen LK On the Robustness of Maximum Entropy Relationships for Complexity Distributions of Nucleotide Sequences Composition; Entropy; Complexity; Distribution; USA; Robustness; Nucleotide "Given a functionally equivalent set of natural nucleotide sequences, the distribution of local compositional complexity among all subsequences of this set appears to be as random as possible consistent with the mean complexity of such subsequences. 
The robustness of this relationship and its possible causes have been explored ...." Computers Chem 1993 17 2 135-148 1034 Wootton,J.C. Statistics of Local Co.. Computers Chem. 93 17(2):149-163 Wootton JC; Federhen S Statistics of Local Complexity in Amino Acid Sequences and Sequence Databases Sequence analysis; Sequence database; Statistical; Complexity; Segment; USA; Amino acid "Protein sequences contain surprisingly many local regions of low compositional complexity. ... Several different formal definitions of local complexity and probability are presented here and are compared for their utility in algorithms for localization of such regions in amino acid sequences and sequence databases. ... These measures ... are shown to be broadly similar for first-pass, approximate localization of low-complexity regions in protein sequences, but they give significantly different results when applied in optimal segmentation algorithms." Computers Chem 1993 17 2 149-163 1035 Bell,G.I. Repetitive DNA Sequenc.. Computers Chem. 93 17(2):185-190 Bell GI; Torney DC Repetitive DNA Sequences: Some Considerations for Simple Sequence Repeats Regularities; Repeat; USA; DNA "(1) Can the polymorphism evident in the length of many simple sequence repeats (SSRs) or microsatellites be explained as a result of unequal mitotic crossing over? [Probably not.] ... (2) Some results are presented on the number of mono- and di-nucleotide repeats in the human genome. For each high scoring locus, an optimal alignment is made of the actual with an ideal SSR; for such alignments, the relative numbers of insertions, deletions (indels), transitions and transversions are obtained for each class of SSR. (3) An elementary derivation of the number of equivalence classes of SSRs of any word length, n, is given." Computers Chem 1993 17 2 185-190 1036 Wu,C.H. Classification Neural .. Computers Chem. 
93 17(2):219-227 Wu CH Classification Neural Networks for Rapid Sequence Annotation and Automated Database Organization Sequence database; Neural; Classification; N-gram; USA; Network "A neural network classification method has been developed as an alternative approach to the search/organization problem of large molecular databases. Two artificial neural systems have been implemented on a Cray for rapid protein/nucleic acid classification of unknown sequences. The system employs a n-gram hashing function for sequence encoding and modular back-propagation networks for classification. The protein system, which classifies proteins into PIR superfamilies, has achieved 82-100% sensitivity at a speed that is about an order of magnitude faster than other search methods." Computers Chem 1993 17 2 219-227 1037 Miura,R.M. Preface [Some Mathemat.. Lect.Math.Life 86 17:ix-x Miura RM Preface [Some Mathematical Questions in Biology - DNA Sequence Analysis] Sequence analysis; CA; Pattern recognition; Sequence comparison; Statistical; Probabilistic; Structure; DNA "This volume contains papers based on lectures which were presented at the Eighteenth Annual Symposium on Some Mathematical Questions in Biology - DNA Sequence Analysis. The Symposium was held on May 28, 1984 in New York City in conjunction with the Annual Meeting of the American Association for the Advancement of Science." Lect Math Life Sci 17 17 ix-x 1038 Cole,R. Tight Bounds on the Co.. SIAM J.Comput. 94 23(5):1075-109 Cole R Tight Bounds on the Complexity of the Boyer-Moore String Matching Algorithm String match; Boyer-Moore; Pattern match; USA; Complexity; Algorithm "The problem of finding all occurrences of a pattern of length m in a text of length n is considered. It is shown that the Boyer-Moore string matching algorithm performs roughly 3n comparisons and that this bound is tight up to O(n/m) .... 
While the upper bound is somewhat involved, its main elements provide a simple proof of a 4n upper bound for the same algorithm." SIAM J Comput 1994 23 5 1075-1091 1039 Crochemore,M. Speeding Up Two String.. Algorithmica 94 12(4/5):247-26 Crochemore M; Czumaj A; Gasieniec L; Jarominek S; Lecroq T; Plandowski W; Rytter W Speeding Up Two String-Matching Algorithms String match; Pattern match; Suffix; Automata; Repetition; Boyer-Moore; Factor; PO; Algorithm "We show how to speed up two string-matching algorithms: the Boyer-Moore algorithm (BM algorithm), and its version called here the reverse factor algorithm (RF algorithm). The RF algorithm is based on factor graphs for the reverse of the pattern. The main feature of both algorithms is that they scan the text right-to-left from the supposed right position of the pattern. ... We show that it is enough to remember the last matched segment ... to speed up the RF algorithm considerably ... and to speed up the BM algorithm (to make at most 2n comparisons)." Algorithmica 1994 12 4/5 247-267 1040 Vishkin,U. Optimal Parallel Patte.. Lecture Notes i 85 194:497-508 Vishkin U Optimal Parallel Pattern Matching in Strings (Extended Summary) Pattern match; Parallel; String match; Optimal; IL Proceedings, ICALP'85. "Given a text of length n and a pattern, we present a parallel linear algorithm for finding all occurrences of the pattern in the text. The algorithm runs in O(n/p) time using any number of p <= n/log n processors on a concurrent-read concurrent-write parallel random-access-machine." Lecture Notes in Comput Sci 194 194 497-508 1041 Moore,D. An Optimal Algorithm t.. Inform.Process. 94 50:239-246 Moore D; Smyth WF An Optimal Algorithm to Compute all the Covers of a String Cover; AU; Optimal; Algorithm "Let x denote a given nonempty string of length n >= 1. A string u is a cover of x if and only if every position of x lies within an occurrence of u within x. Thus x is always a cover of itself. 
In this paper we characterize all the covers of x in terms of an easily computed normal form for x. The characterization theorem then gives rise to a simple recursive algorithm which computes all the covers of x in time Θ(n)." Inform Process Lett 50 50 239-246 1042 Perleberg,C.H Single Character Searc.. Inform.Process. 94 50:269-275 Perleberg CH Single Character Searching Methods and the Shift-Or Pattern-Matching Algorithm Sequence search; Pattern match; CL; Algorithm "Single character searching (SCS) methods have wide application in text processing, since many text processing algorithms need to search for a single character in a text string. In this paper, we compare three SCS methods. Two SCS methods are applied to the shift-or pattern matching algorithm of Baeza-Yates and Gonnet (1992), and the performance of the different versions of the algorithm are compared. ... Finally, the shift-or implementations are compared to the Tuned Boyer-Moore implementation of Hume and Sunday (1991)." Inform Process Lett 50 50 269-275 1043 Breslauer,D. Testing String Superpr.. Inform.Process. 94 49:235-241 Breslauer D Testing String Superprimitivity in Parallel String match; Cover; Parallel; Regularities; Italy "A string w covers another string z if every symbol of z is within some occurrence of w in z. A string is called superprimitive if it is covered only by itself .... This paper presents an optimal ... CRCW-PRAM algorithm that tests if a string z is superprimitive ...." Inform Process Lett 49 49 235-241 1044 Apostolico,A. Optimal Superprimitivi.. Inform.Process. 91 39(1):17-20 Apostolico A; Farach M; Iliopoulos CS Optimal Superprimitivity Testing for Strings Cover; Optimal; Regularities; USA "A string w covers another string z if every position of z is within some occurrence of w in z. Clearly, every string is covered by itself. A string that is covered only by itself is superprimitive. 
We show that the property of being superprimitive is testable on a string of n symbols in O(n) time and space." Inform Process Lett 1991 39 1 17-20 1045 Breslauer,D. An On-line String Supe.. Inform.Process. 92 44(6):345-347 Breslauer D An On-line String Superprimitivity Test Cover; Prefix; Regularities; USA; On-line "We present an on-line linear-time algorithm that tests if each prefix of an input string is superprimitive while the string is given a symbol at a time." Inform Process Lett 1992 44 6 345-347 1046 Amir,A. Alphabet Dependence in.. Inform.Process. 94 49:111-115 Amir A; Farach M; Muthukrishnan S Alphabet Dependence in Parameterized Matching Parameterized; Pattern match; USA; String match "In this paper we provide an algorithm to find all occurrences of a pattern string of length m in a text string of length n under the parameterized pattern matching model. ... Our algorithm is optimal since we show that [a type of] dependence ... is inherent to any algorithm for this problem in the comparison model." Inform Process Lett 49 49 111-115 1047 Kannan,S.K. Inferring Evolutionary.. SIAM J.Comput. 94 23(4):713-737 Kannan SK; Warnow TJ Inferring Evolutionary History from DNA Sequences Phylogeny; Evolutionary tree; Graph; USA; DNA "One of the longstanding problems in computational molecular biology is the Character Compatibility Problem, which is concerned with the construction of phylogenetic trees for species sets, where the species are defined by characters. The character compatibility problem is NP-complete in general. In this paper an O(n^2 k) time algorithm is described for the case where the species are described by quaternary characters. This algorithm can be used to construct phylogenetic trees from DNA sequences." SIAM J Comput 1994 23 4 713-737 1048 Baeza-Yates,R Fast String Matching w.. Inform.Comput. 
94 108(2):187-199 Baeza-Yates RA; Gonnet GH Fast String Matching with Mismatches Match with k mismatches; String match; Boyer-Moore; Automata; CL "We describe and analyze three simple and fast algorithms on the average for solving the problem of string matching with bounded number of mismatches. These are the naive algorithm, an algorithm based on the Boyer-Moore approach, and ad hoc deterministic finite automata searching. We include simulation results that compare these algorithms to previous works." Inform Comput 1994 108 2 187-199 1049 Baeza-Yates,R Fast and Practical App.. Lecture Notes i 92 644:185-192 Baeza-Yates RA; Perleberg CH Fast and Practical Approximate String Matching Approximate match; String match; CL Third Annual Symposium, CPM 92. Tucson, Arizona, April 29 - May 1, 1992. Proceedings. "We present new algorithms for approximate string matching based in simple, but efficient, ideas. First, we present an algorithm for string matching with mismatches based in arithmetical operations that runs in linear worst case time for most practical cases. This is a new approach to string searching. Second, we present an algorithm for string matching with errors based on partitioning the pattern that requires linear expected time for typical inputs." Lecture Notes in Comput Sci 644 644 185-192 1050 Baeza-Yates,R Technical Corresponden.. Comm.ACM 92 35(4):132-137 Baeza-Yates R; Krogh FT; Ziegler B; Sibbald PR; Sunday DM Technical Correspondence. Notes on a Very Fast Substring Search Algorithm String search; String match; USA; Algorithm Four letters criticizing the paper by Sunday (1990), and his response. Comm ACM 1992 35 4 132-137 1051 Manber,U. Approximate Pattern Ma.. Byte 92 17(12, Nov.):2 Manber U; Wu S Approximate Pattern Matching Pattern match; Approximate match; USA "Most text editors and search programs do not support approximate searches because of the complexity involved in implementing such a procedure. But some new algorithms may change that. 
Below, we describe one such algorithm in sufficient detail to enable you to include it in your own programs. We also describe agrep, a Unix software tool for approximate pattern matching that we developed. Agrep includes many options that make searching powerful and convenient." Byte 1992 17 12, Nov. 281-292 1052 Lipton,R.J. Computational Approach.. Proc.IEEE 89 77(7):1056-106 Lipton RJ; Marr TG; Welsh JD Computational Approaches to Discovering Semantics in Molecular Biology Sequence comparison; USA "One of the central questions of molecular biology is the discovery of the semantics of DNA. This discovery relies in a critical way on a variety of expensive computations. In order to solve these computations, both parallel computers and special-purpose hardware play a major role. ... In this paper we discuss the basic methodology involved in discovering the evolutionary structure of both DNA and proteins. ... The fundamental questions are just how the sequences are to be compared. The implementation issues concern the vast amounts of computation required to execute the known algorithms." Proc IEEE 1989 77 7 1056-1060 1053 Richards,F.M. The Protein Folding Pr.. Sci.Am. 91 264(1):54-63 Richards FM The Protein Folding Problem Structure; USA; Protein; Folding "In theory, all one needs to know in order to fold a protein into its biologically active shape is the sequence of its constituent amino acids. Why has nobody been able to put theory into practice?" Sci Am 1991 264 1 54-63 1054 Crochemore,M. Foreword [Selected Pap.. Theoret.Comput. 92 92(1):1-1 Crochemore M Foreword [Selected Papers of the Combinatorial Pattern Matching School, Paris, July 1990] Pattern match; FR; Combinatorial "This volume contains a selection of papers that have been presented at the first Combinatorial Pattern Matching school, held in Paris during July 1990. 
The school presented to young researchers a wide variety of combinatorial methods used in the domain of Pattern Recognition, through lectures delivered by A. V. Aho, A. Apostolico, M. Crochemore, Z. Galil and E. Ukkonen. Other researchers have also presented their own works and the whole result is a kind of panorama of what is being done in the domain of Pattern Matching and its applications." Theoret Comput Sci 1992 92 1 1-1 1055 Crochemore,M. Foreword [Combinatoria.. Lecture Notes i 94 807:iii-iii Crochemore M; Gusfield D Foreword [Combinatorial Pattern Matching. 5th Annual Symposium, CPM 94] Pattern match; FR Asilomar, CA, USA, June 5-8, 1994. Proceedings. "Combinatorial Pattern Matching addresses issues of searching and matching of strings and more complicated patterns such as trees, regular expressions, extended expressions, etc. The goal is to derive non-trivial combinatorial properties for such structures and then to exploit these properties in order to achieve superior performances for the corresponding computational problems." Contents: alignments (7 papers), various matchings (5), combinatorial aspects (7), more bio-informatics (7). Lecture Notes in Comput Sci 807 807 iii-iii 1056 Lander,E.S. Genomic Mapping by Fin.. Genomics 88 2:231-239 Lander ES; Waterman MS Genomic Mapping by Fingerprinting Random Clones: A Mathematical Analysis Genome; Fingerprint; Mapping; Clone; USA; Genomic "The physical map is assembled by first 'fingerprinting' a large number of clones chosen at random from a recombinant library and then inferring overlaps between clones with sufficiently similar fingerprints. Although the basic approach is the same, there are many possible choices for the fingerprint used to characterize the clones and the rules for declaring overlap. In this paper, we derive simple formulas showing how the progress of a physical mapping project is affected by the nature of the fingerprinting scheme. 
Using these formulas, we discuss the analytic considerations involved in selecting an appropriate fingerprinting scheme for a particular project." Genomics 2 2 231-239 1057 Michiels,F. Molecular Approaches t.. Comput.Appl.Bio 87 3(3):203-210 Michiels F; Craig AG; Zehetner G; Smith GP; Lehrach H Molecular Approaches to Genome Analysis: A Strategy for the Construction of Ordered Overlapping Clone Libraries Genome; Statistical; Clone; DE "Here we describe progress on a series of molecular techniques designed to bridge the gap between genetic and molecular distances in mammals. ... We summarize approaches for the physical and molecular analysis of genetic distances and describe the experimental, statistical and computational basis of a new approach to create ordered libraries of overlapping clones from large genomes." Comput Appl Biosci 1987 3 3 203-210 1058 Sulston,J. Software for Genome Ma.. Comput.Appl.Bio 88 4(1):125-132 Sulston J; Mallett F; Staden R; Durbin R; Horsnell T; Coulson A Software for Genome Mapping by Fingerprinting Techniques Genome; Fingerprint; Program; UK; Mapping; Fragment "A genome mapping package has been developed for reading and assembling data from clones analysed by restriction enzyme fragmentation and polyacrylamide gel electrophoresis. The package comprises: data entry; matching; assembly; statistical analysis; modelling." Comput Appl Biosci 1988 4 1 125-132 1059 Baeza-Yates,R Proximity Matching usi.. Lecture Notes i 94 807:198-212 Baeza-Yates R; Cunto W; Manber U; Wu S Proximity Matching using Fixed-Queries Trees Approximate match; Search tree; Data structure; CL 5th Annual Symposium, CPM 94. Asilomar, June 5-8, 1994. Proceedings. "We present a new data structure, called the fixed-queries tree, for the problem of finding all elements of a fixed set that are close, under some distance function, to a query element. 
Fixed-queries trees can be used for any distance function, not necessarily even a metric, as long as it satisfies the triangle inequality. ... Fixed-queries trees are particularly efficient for applications in which comparing two elements is expensive." Lecture Notes in Comput Sci 807 807 198-212 1060 Ukkonen,E. Approximate String-Mat.. Theoret.Comput. 92 92:191-211 Ukkonen E Approximate String-Matching with q-Grams and Maximal Matches Approximate match; N-gram; Sequence proximity; Longest common; Edit; FI; String match "We study approximate string-matching in connection with two string distance functions that are computable in linear time. The first function is based on the so-called q-grams. An algorithm is given for the associated string-matching problem that finds the locally best approximate occurrences of pattern P, |P| = m, in text T, |T| = n, in time O(n log(m-q)). The other distance function is based on finding maximal common substrings and allows a form of approximate string-matching in time O(n). Both distances give a lower bound for the edit distance ... which leads to fast hybrid algorithms for the edit distance based string-matching." Theoret Comput Sci 92 92 191-211 1061 Crochemore,M. String Matching with C.. Lecture Notes i 88 324:44-58 Crochemore M String Matching with Constraints String match; FR Proceedings, MFCS'88 Symposium, Carlsbad, Czechoslovakia. "In this paper, two string-matching algorithms belonging to the second family [fixed word, variable text] are presented. ... The first algorithm ... processes the text in real-time. The delay only depends on the size of the alphabet. Our algorithm heavily relies on properties of minimal automata recognizing the suffixes of a word. ... We present ... an algorithm which requires only constant additional memory space during all its phases .... It makes use of a deep theorem on words ... known as the critical factorization theorem." Lecture Notes in Comput Sci 324 324 44-58 1062 Galil,Z. 
An Improved Algorithm .. Lecture Notes i 89 372:394-404 Galil Z; Park K An Improved Algorithm for Approximate String Matching Approximate match; Automata; String match; USA; Match with k differences; Algorithm Automata, Languages, and Programming (ICALP'89), 16th International Colloquium. "Given a text string, a pattern string, and an integer k, a new algorithm for finding all occurrences of the pattern string in the text string with at most k differences is presented. Both its theoretical and practical variants improve the known algorithms." Lecture Notes in Comput Sci 372 372 394-404 1063 Quong,R.W. Fast Average-Case Patt.. Theoret.Comput. 92 92:165-179 Quong RW Fast Average-Case Pattern Matching by Multiplexing Sparse Tables Pattern match; Match with k mismatches; USA; Pattern recognition "Pattern matching consists of finding occurrences of a pattern in some data. One general approach is to sample the data collecting evidence about possible matches. By sampling appropriately, we force matches to be sparse and can encode a table of size m as a series of smaller tables .... This method yields practical algorithms with fast average-case running times for a wide variety of pattern matching and pattern recognition problems. We apply our technique of multiplexing sparse tables to the k-mismatches string searching problem ...." Theoret Comput Sci 92 92 165-179 1064 Baeza-Yates,R Algorithms for String .. SIGIR Forum 89 23(3,4):34-58 Baeza-Yates RA Algorithms for String Searching: A Survey Sequence search; Survey; CA; String search; Algorithm "We present the most important algorithms for string matching: the naive algorithm, the Knuth-Morris-Pratt algorithm, different variants of the Boyer-Moore algorithm, the shift-or algorithm, and the Karp-Rabin algorithm (a probabilistic one). Experimental results for random text and one sample of English text are included. We also survey the main theoretical results for each algorithm. 
We use the C programming language to present our algorithms. ... An extensive bibliography is also included." SIGIR Forum 1989 23 3,4 34-58 1065 Graham,S.L. On Line Context Free L.. ACM Sympos.Theo 76 8:112-120 Graham SL; Harrison MA; Ruzzo WL On Line Context Free Language Recognition in less than Cubic Time Sequence recognition; Language; USA; On-line; Recognition "A new on-line context free language recognition algorithm is presented which is derived from Earley's algorithm and has several advantages over the original. First, the new algorithm not only is conceptually simpler than Earley's, but also allows significant speed improvements. Second, our algorithm serves to explain the connections between Earley's algorithm and the Cocke-Kasami-Younger algorithm. Third, our algorithm allows an implementation which uses only ... O(n^3/log n) operations on a RAM. This makes it the fastest known on-line context free language recognition algorithm." ACM Sympos Theory Comput 8 8 112-120 1066 Hirschberg,D. The Least Weight Subse.. SIAM J.Comput. 87 16(4):628-638 Hirschberg DS; Larmore LL The Least Weight Subsequence Problem Least weight; Subsequence; Dynamic programming; USA "The least weight subsequence (LWS) problem is introduced, and is shown to be equivalent to the classic minimum path problem for directed graphs. A special case of the LWS problem is shown to be solvable in O(n log n) time generally, and for certain weight functions, in linear time. A number of applications are given, including an optimum paragraph formation problem and the problem of finding a minimum height B-tree, whose solutions realize improvement in asymptotic time complexity." SIAM J Comput 1987 16 4 628-638 1067 Valiant,L.G. General Context-Free R.. 
J.Comput.System 75 10:308-315 Valiant LG General Context-Free Recognition in Less than Cubic Time Sequence recognition; Language; USA; Recognition "By a succession of reductions we show that context-free recognition, for n character input strings, can be carried out at least as fast as multiplication for n x n Boolean matrices. Using Strassen's method for matrix multiplication, an indirect algorithm for general context-free recognition can be derived that has time complexity O(n^2.81). This is asymptotically more efficient than any of the best previously known recognition schemes ... all of which require O(n^3) time in the worst case. The crucial result on which the new algorithm depends is a general one that is applicable to a wide class of matrix computations." J Comput Systems Sci 10 10 308-315 1068 Wilber,R. The Concave Least-Weig.. J.Algorithms 88 9:418-425 Wilber R The Concave Least-Weight Subsequence Problem Revisited Least weight; Subsequence; USA "D. S. Hirschberg and L. L. Larmore (1987) showed that the concave least-weight subsequence problem can be solved in O(n log n) time and that if a certain extra condition is imposed it can be solved in O(n) time. Here we show that the concave least weight subsequence problem can always be solved in O(n) time, without any extra conditions." J Algorithms 9 9 418-425 1069 Chang,J.H. Parallel Parsing on a .. Proceedings o.. 86IEEE Computer S Chang JH; Ibarra OH; Palis MA Parallel Parsing on a One-Way Array of Finite-State Machines Hwang K Jacobs SM; Swartzlander EE Proceedings of the 1986 International Conference on Parallel Processing, August 19-22, 1986 Sequence recognition; Language; Automata; USA; Parallel; Parsing "We show that a one-way two-dimensional iterative array of finite-state machines can recognize and parse strings of any context-free language in linear time. 
What makes this result interesting and rather surprising is the fact that each processor of the array holds only a fixed amount of information (independent of the size of the input) and communicates with its neighbors in only one direction. This makes for a simple VLSI implementation." IEEE Computer Society Press Washington, DC 1986 887-894 1070 Chiang,Y.T. Parallel Parsing Algor.. IEEE Trans.Patt 84 6(3):302-314 Chiang YT; Fu KS Parallel Parsing Algorithms and VLSI Implementations for Syntactic Pattern Recognition Sequence recognition; Language; Parallel; Pattern recognition; Parsing; VLSI; USA; Algorithm; Recognition "Earley's algorithm has been commonly used for the parsing of general context-free languages and the error-correcting parsing in syntactic pattern recognition. ... This paper presents a parallel Earley's recognition algorithm in terms of an 'X*' operator. ... Simulation results show that this system can recognize a string with length n in 2n+1 system time. We also present a parallel parse-extraction algorithm, a complete parsing algorithm, and an error-correcting recognition algorithm. ... These parallel algorithms are especially useful for syntactic pattern recognition." IEEE Trans Patt Anal Mach Intell 1984 6 3 302-314 1071 Kosaraju,S.R. Speed of Recognition o.. SIAM J.Comput. 75 4(3):331-340 Kosaraju SR Speed of Recognition of Context-Free Languages by Array Automata Sequence recognition; Language; Automata; USA; Recognition "The recognition speed of context-free languages (CFL's) using arrays of finite state machines is considered. It is shown that CFL's can be recognized by 2-dimensional arrays in linear time and by 1-dimensional arrays in time n^2." SIAM J Comput 1975 4 3 331-340 1072 Dolev,D. Parallel Computation o.. Parallel Proc.. 
88Elsevier Scienc Dolev D; Gil J Parallel Computation of Edit Distance Chiricozzi E D'Amico A Parallel Processing and Applications Longest common; Edit; Parallel; IL; Distance "The subject of the paper is the edit distance between two strings of characters. We will introduce two parallel algorithms to compute the elementary operations useful in evaluating the edit distance. ... The parallel model used is CRCW-PRAM .... The main algorithm in the paper is an efficient algorithm for solving the editing distance with arbitrary weights for the operations replace, delete, and insert. ... When the weights of the three functions are ..., the editing distance reduces to the problem of finding the longest common subsequence (LCS) of the two strings." Elsevier Science Amsterdam 1988 265-275 1073 Burnett,L. Development of a Super.. Nucleic Acids R 86 14(1):47-55 Burnett L Development of a Superior Strategy for Computer-Assisted Nucleotide Sequence Analysis Pairwise comparison; Region; AU; Sequence analysis; Nucleotide "A new strategy for high-resolution nucleotide sequence analysis has been developed. The strategy involves an exhaustive tree-searching algorithm which examines all possible combinations of short regions of sequence alignments, followed by culling of unsuitable sequence relationships. The new algorithm can detect sequence homologies invisible to existing algorithms, and is capable of detecting all possible sequence relationships." Nucleic Acids Res 1986 14 1 47-55 1074 Lawrence,C.B. Data structures for DN.. Nucleic Acids R 86 14(1):205-216 Lawrence CB Data structures for DNA Sequence Manipulation Data structure; Sequence database; USA; Structure; DNA "Two data structures designated Fragment and Construct are described. The Fragment data structure defines a continuous nucleic acid sequence from a unique genetic origin. The Construct defines a continuous sequence composed of sequences from multiple genetic origins. 
These data structures are manipulated by a set of software tools to simulate the construction of mosaic recombinant DNA molecules. They are also used as an interface between sequence data banks and analytical programs." Nucleic Acids Res 1986 14 1 205-216 1075 Roberts,R.J. Preface [Special issue.. Nucleic Acids R 86 14(1):0-0 Roberts RJ; Soll D Preface [Special issue devoted to the applications of computers to research on nucleic acids] Sequence analysis; Program; USA "This is the third special issue of Nucleic Acids Research devoted to the applications of computers to research on nucleic acids. ... The aim of these special issues has been to heighten the awareness of both scientists and programmers to the broad range of software that is currently available." Table of contents: 63 papers, 620 pages. Nucleic Acids Res 1986 14 1 0-0 1076 Roberts,R.J. Preface [Issue devoted.. Nucleic Acids R 82 10(1):0-0 Roberts RJ; Soll D Preface [Issue devoted to the applications of computers to research on nucleic acids] Sequence analysis; Program; USA "The programs described range from straightforward algorithms based on simple search routines to some highly sophisticated packages for data base management and analysis." Table of contents: 38 papers, 456 pages. Nucleic Acids Res 1982 10 1 0-0 1077 Pearson,W.R. Automatic Construction.. Nucleic Acids R 82 10(1):217-227 Pearson WR Automatic Construction of Restriction Site Maps Restriction; Program; USA "A computer program is described which constructs maps of restriction endonuclease cleavage sites in DNA molecules, given only the fragment lengths. The program utilizes fragment length data from single and double restriction enzyme digests to generate maps for linear or circular molecules." Nucleic Acids Res 1982 10 1 217-227 1078 Roberts,R.J. Preface [Issue devoted.. 
Nucleic Acids R 84 12(1):0-0 Roberts RJ; Soll D Preface [Issue devoted to the applications of computers to research on nucleic acids] Sequence analysis; Program; USA "It is the aim of this second special issue to heighten the awareness of both scientists and programmers to the broad range of software that is currently available." Table of contents: part 1, 37 papers; part 2, 42 papers. Nucleic Acids Res 1984 12 1 0-0 1079 Waterman,M.S. Algorithms for Restric.. Nucleic Acids R 84 12(1):237-242 Waterman MS; Smith TF; Katcher HL Algorithms for Restriction Map Comparisons Restriction; Mapping; USA; Algorithm "An algorithm is presented which compares two restriction maps, yielding a measure of distance between the maps and relating the maps by an alignment. This new algorithm finds the minimum weighted sum of genetic events required to convert one map into the other, where the genetic events are the appearance/disappearance of restriction sites and changes in the number of bases between restriction sites." Nucleic Acids Res 1984 12 1 237-242 1080 Bucher,P. Signal Search Analysis.. Nucleic Acids R 84 12(1):287-305 Bucher P; Bryan B Signal Search Analysis: A New Method to Localize and Characterize Functionally Important DNA Sequences Signal; Pattern discovery; N-gram; SWI; DNA "The generation of 'signal search data' represents a general method of describing the common properties of a set of DNA sequences presumed to be functionally analogous. Besides the detailed description of this method we present two computer programs which use signal search data as input data: One that processes them to a 'constraint profile' and another one which lists over-represented 'signals' of potential functional relevance." Nucleic Acids Res 1984 12 1 287-305 1081 Sankoff,D. A Strategy for Sequenc..
Nucleic Acids R 82 10(1):421-431 Sankoff D; Cedergren RJ; McKay W A Strategy for Sequence Phylogeny Research Phylogeny; CA; Statistical "The proliferation of sequence data eventually exceeds the capacity of rigorous minimal mutation methods. Rather than having recourse to rapid suboptimal or matrix methods, which lead to uncertain, ambiguous and non-unique results, we suggest here a way of combining reasonable degrees of biological and/or statistical certainty about the data with absolute optimization procedures. This reduces the computing problem without the disadvantages of suboptimal methods." Nucleic Acids Res 1982 10 1 421-431 1082 Fuchs,R. New Services of the EM.. Nucleic Acids R 90 18(15):4319-43 Fuchs R; Stoehr P; Rice P; Omond R; Cameron G New Services of the EMBL Data Library Sequence database; Database search; FASTA; DE; EMBL "The EMBL File Server has been reorganised, and many new databases and other information relevant to biologists are now accessible via global computer networks. A broad range of software for molecular biology is freely available for different popular computer systems, including the EMBL enhancements to the Wisconsin (GCG) Package. The new Mail-Quicksearch and Mail-FastA services give access to the latest sequence data for database searches by ordinary electronic mail." Nucleic Acids Res 1990 18 15 4319-4323 1083 Brendel,V. A Computer Algorithm f.. Nucleic Acids R 84 12(10):4411-44 Brendel V; Trifonov EN A Computer Algorithm for Testing Potential Prokaryotic Terminators Match a pattern matrix; IL; Algorithm "An algorithm to locate terminators in templates of known nucleotide sequence has been constructed on the basis of correlation to the distribution of dinucleotides along the aligned signal sequences. The algorithm has been tested on natural sequences of a total length of about 11,500 N. It finds all known independent terminators and only a few other sites, including some of the rho-dependent and putative terminators."
Nucleic Acids Res 1984 12 10 4411-4427 1084 Mulligan,M.E. Escherichia coli Pro .. Nucleic Acids R 84 12(1):789-800 Mulligan ME; Hawley DK; Entriken R; McClure WR Escherichia coli Promoter Sequences Predict in vitro RNA Polymerase Selectivity Match complex patterns; Sequence proximity; USA; Match a pattern matrix; RNA "We describe a simple algorithm for computing a homology score for Escherichia coli promoters based on DNA sequence alone. ... The search for a possible promoter site within a DNA sequence occurs in two steps. Initially the locations of sequences homologous to the consensus sequence of the two most highly conserved regions are identified. ... The second stage of the promoter search is the combination of -35 sequences with -10 sequences to form potential promoters. ... Once a potential promoter has been located, it is then evaluated according to a weighting scheme." Nucleic Acids Res 1984 12 1 789-800 1085 Lake,J.A. Determining Evolutiona.. J.Mol.Evol. 87 26:59-73 Lake JA Determining Evolutionary Distances from Highly Diverged Nucleic Acid Sequences: Operator Metrics Phylogeny; Invariant; Statistical; USA; Substitution; Evolutionary distance; Distance "Operator metrics are explicitly designed to measure evolutionary distances from nucleic acid sequences when substitution rates differ greatly among the organisms being compared, or when substitutions have been extensive. Unlike lengths calculated by the distance matrix and parsimony methods, in which substitutions in one branch of a tree can alter the measured length of another branch, lengths determined by operator metrics are not affected by substitutions outside the branch." J Mol Evol 26 26 59-73 1086 Klotz,L.C. Calculation of Evoluti.. 
Proc.Nat.Acad.S 79 76(9):4516-452 Klotz LC; Komar N; Blanken RL; Mitchell RM Calculation of Evolutionary Trees from Sequence Data Phylogeny; USA; Evolutionary tree; Substitution "In this paper we present a method for calculating evolutionary trees from sequence data that uses, along with the difference matrix [constructed from the sequence differences between pairs of sequences from the organisms], the rate of evolution of the various sequences from their common ancestor. It is proven analytically that this method uniquely determines both the correct tree topology and root in theory for unequal rates of sequence evolution. How one would estimate an ancestral sequence to be used in the method is discussed ...." Proc Nat Acad Sci USA 1979 76 9 4516-4520 1087 Waterman,M.S. Parametric and Ensembl.. Bull.Math.Biol. 94 56(4):743-767 Waterman MS Parametric and Ensemble Sequence Alignment Algorithms Sequence alignment; Dynamic programming; Locally optimal; USA; Parametric; Algorithm "Recently algorithms for parametric alignment ... find optimal scores for all penalty parameters, both for global and local sequence alignment. This paper reviews those techniques. Then in the main part of this paper dynamic programming methods are used to compute ensemble alignment, finding all alignment scores for all parameters. Both global and local ensemble alignments are studied, and parametric alignment is used to compute near optimal ensemble alignments." Bull Math Biol 1994 56 4 743-767 1088 Felsenstein,J Evolutionary Trees fro.. J.Mol.Evol. 81 17:368-376 Felsenstein J Evolutionary Trees from DNA Sequences: A Maximum Likelihood Approach Phylogeny; Statistical; Likelihood; Character data; Evolutionary rate; Program; USA; Evolutionary tree; DNA "The application of maximum likelihood techniques to the estimation of evolutionary trees from nucleic acid sequence data is discussed. 
A computationally feasible method for finding such maximum likelihood estimates is developed, and a computer program is available. The method has advantages over the traditional parsimony algorithms, which can give misleading results if rates of evolution differ in different lineages. It also allows the testing of hypotheses about the constancy of evolutionary rates by likelihood ratio tests, and gives rough indication of the error of the estimate of the tree." J Mol Evol 17 17 368-376 1089 Felsenstein,J Inferring Evolutionary.. Statistical A.. 83Marcel Dekker Felsenstein J Inferring Evolutionary Trees from DNA Sequences Weir BS Statistical Analysis of DNA Sequence Data Statistical; Phylogeny; Likelihood; USA; Evolutionary tree; Analytical; Robustness; DNA See Weir (1983) for the book's bibliography, pp. 231-248. Introduction. Parsimony methods. Maximum likelihood methods. Alternatives to Likelihood. The state of the problem Marcel Dekker New York 1983 133-150 1090 Tavare,S. Some Probabilistic and.. Lect.Math.Life 86 17:57-86 Tavare S Some Probabilistic and Statistical Problems in the Analysis of DNA Sequences Probabilistic; Statistical; Sequence analysis; Substitution; USA; DNA "This paper concentrates on statistical aspects of the estimation of substitution rates and divergence times on the basis of DNA sequence data. A new method of estimation is suggested, and exhibited using data from serum albumin and a-fetoprotein. The divergence time of the rat and mouse is estimated using a tree calibrated by the human-rat divergence time. Some inherent difficulties in these methods are highlighted by statistical analysis of the sequences." Lect Math Life Sci 17 17 57-86 1091 Naor,D. On Suboptimal Alignmen.. Lecture Notes i 93 684:179-196 Naor D; Brutlag D On Suboptimal Alignments of Biological Sequences Sequence alignment; Suboptimal; Enumeration; USA 4th Annual Symposium, CPM 93. Padova, Italy, June 2-4, 1993. Proceedings. 
"We present a method for representing all alignments whose score is within any given delta from the optimal score. It represents a large number of alignments by a compact graph which makes it easy to impose additional biological constraints and select one desirable alignment from this large set. ... We define a set of 'canonical' suboptimal alignments, and argue that these are the essential ones since any other suboptimal alignment is a combination of few canonical ones. We then show how to efficiently enumerate suboptimal alignments in order of their score, and count their numbers." Lecture Notes in Comput Sci 684 684 179-196 1092 Klotz,L.C. A Practical Method for.. J.Theor.Biol. 81 91:261-272 Klotz LC; Blanken RL A Practical Method for Calculating Evolutionary Trees from Sequence Data Phylogeny; Evolutionary tree; USA "In a previous paper (Klotz et al., 1979) we described a method for determining evolutionary trees from sequence data when rates of evolution of the sequences might differ greatly. ... However, the method is impractical to use in most situations because it requires some knowledge of the ancestor. In this present paper we describe another method, related to the previous one, in which a present-day sequence can serve temporarily as an ancestor for purposes of determining the evolutionary tree regardless of the rates of evolution of the sequences involved." J Theor Biol 91 91 261-272 1093 Zharkikh,A.A. Rapid Evaluation of Nu.. Dokl.Biol.Sci. 89 308:611-613 Zharkikh AA; Rzhetskii AY Rapid Evaluation of Nucleotide Sequence Homology by Oligonucleotide Frequency Analysis Sequence comparison; Composition; Homology; Database search; N-gram; RU; Nucleotide Translated from Doklady Akademii Nauk SSSR, 308(5), 1232-1235, Oct. 1989. "The rapid evaluation of homologies between two or more DNA sequences is important in searching for homologs in databases, in molecular taxonomy, and for other purposes. 
The complex and time-consuming nature of homology analysis is such that it is appropriate to substitute standard methods with faster but less accurate methods using a relatively small number of their integral characteristics. For this purpose we report the use of a measure of the similarity of the oligonucleotide composition of DNA sequences." Dokl Biol Sci 308 308 611-613 1094 Kimura,M. Estimation of Evolutio.. Proc.Nat.Acad.S 81 78(1):454-458 Kimura M Estimation of Evolutionary Distances between Homologous Nucleotide Sequences Substitution; Codon; Pairwise comparison; Evolutionary distance; Statistical; Distance; JP; Nucleotide; Estimation "By using two models of evolutionary base substitutions - 'three-substitution-type' and 'two-frequency-class' models - some formulae are derived which permit a simple estimation of the evolutionary distances (and also the evolutionary rates when the divergence times are known) through comparative studies of DNA (and RNA) sequences. These formulae are applied to estimate the base substitution rates at the first, second, and third positions of codons .... Also, formulae for estimating the synonymous component (at the third codon position) and the standard errors are obtained." Proc Nat Acad Sci USA 1981 78 1 454-458 1095 Amir,A. An Alphabet Independen.. SIAM J.Comput. 94 23(2):313-323 Amir A; Benson G; Farach M An Alphabet Independent Approach to Two-Dimensional Pattern Matching Pattern match; Regularities; Multidimensional; USA; String match "The authors show an algorithm for two-dimensional matching with an O(n^2) text-scanning phase. Furthermore, the text scan requires no special assumptions about the alphabet, i.e., it runs on the same model as the standard linear-time string-matching algorithm. The pattern preprocessing requires an ordered alphabet and runs with the same alphabet dependency as the previously known algorithms." SIAM J Comput 1994 23 2 313-323 1096 Amir,A. Two-dimensional Dictio.. Inform.Process.
92 44(5):233-239 Amir A; Farach M Two-dimensional Dictionary Matching Dictionary match; Multidimensional; USA "In this paper, we present an algorithm for the Two-Dimensional Dictionary Problem. [It] is that of finding each occurrence of a set of two-dimensional patterns in a text." Inform Process Lett 1992 44 5 233-239 1097 Bird,R.S. Two Dimensional Patter.. Inform.Process. 77 6(5):168-170 Bird RS Two Dimensional Pattern Matching Pattern match; Multidimensional; Knuth-Morris-Pratt; UK "In this case the problem is to determine where, if anywhere, the pattern occurs as a subarray of the text. Our purpose is to give an algorithm for the two dimensional case, one which follows the general approach of the [Knuth-Morris-Pratt algorithm], and indeed uses the KMP as a subprogram." Inform Process Lett 1977 6 5 168-170 1098 Zhu,R.F. A Technique for Two-Di.. Comm.ACM 89 32(9):1110-112 Zhu RF; Takaoka T A Technique for Two-Dimensional Pattern Matching Pattern match; Multidimensional; JP "By reducing an array matching problem to a string matching problem in a natural way, it is shown that efficient string matching algorithms can be applied to arrays, assuming that a linear preprocessing is made on the text. ... In this article we first present an efficient pattern matching algorithm for the two-dimensional case, one which is a combination of the [Knuth-Morris-Pratt] and [Rabin-Karp] algorithms. ... Computer experiments show that for various pattern sizes the average cost for either of our algorithms is much less than that of the algorithm proposed by Bird (1977)." Comm ACM 1989 32 9 1110-1120 1099 Stanfill,C. Parallel Free-Text Sea..
Comm.ACM 86 29(12):1229-12 Stanfill C; Kahle B Parallel Free-Text Search on the Connection Machine System Parallel; Text search; Database search; USA; Hardware "A new implementation of free-text search using a new parallel computer - the Connection Machine - makes possible the application of exhaustive methods not previously feasible for large databases." Comm ACM 1986 29 12 1229-1239 1100 Reeves,P.R. MULTICOMP: A Program f.. Comput.Appl.Bio 94 10(3):281-284 Reeves PR; Farnell L; Lan R MULTICOMP: A Program for Preparing Sequence Data for Phylogenetic Analysis Phylogeny; Management; AU; Program; Phylogenetic "MULTICOMP is a program that assists in the phylogenetic analysis of DNA sequences. It streamlines sequence handling and analysis. Input is from either individual sequence files or a file of aligned sequences. It produces data on variation at DNA and amino acid sequence level and can also convert sequences to data formats suitable for PHYLIP, PAUP and MacClade phylogenetic inference programs. Further, two tree-building programs, NEIGHBOR and DNAPARS, of PHYLIP can be directly run from within it." Comput Appl Biosci 1994 10 3 281-284 1101 Sadler,J.R. Regulatory Pattern Ide.. Nucleic Acids R 83 11(7):2221-223 Sadler JR; Waterman MS; Smith TF Regulatory Pattern Identification in Nucleic Acid Sequences Pattern recognition; Identification; USA; Nucleic acid "A critique of the often employed consensus and local homology methods suggests the need for new tools. In particular, such new methods should use the positional and structural data now becoming available on exactly what it is that is recognized in the DNA sequence by sequence-specific binding proteins." Nucleic Acids Res 1983 11 7 2221-2231 1102 Breen,S. Renewal Theory for Sev.. J.Appl.Probab. 
85 22:228-234 Breen S; Waterman MS; Zhang N Renewal Theory for Several Patterns Sequence analysis; Statistical; USA "Discrete renewal theory is generalized to study the occurrence of a collection of patterns in random sequences, where a renewal is defined to be the occurrence of one of the patterns in the collection which does not overlap an earlier renewal. The action of restriction enzymes on DNA sequences provided motivation for this work. Related results of Guibas and Odlyzko are discussed." J Appl Probab 22 22 228-234 1103 Waterman,M.S. Consensus Methods for .. Mathematical .. 89CRC Press Waterman MS Consensus Methods for Folding Single-Stranded Nucleic Acids Waterman MS Mathematical Methods for DNA Sequences Consensus method; Structure; USA; Nucleic acid; Folding "The structure of single-stranded RNA macromolecules is crucial to the functioning of an organism. ... Other than by guessing or inspection, there seem to be two major techniques for prediction of secondary structure: the minimum energy method and the comparative method. The previous chapter gives an extensive treatment of the important minimum energy method, which utilizes dynamic programming. After briefly discussing the minimum energy approach we turn to the main topic of this chapter, comparative or consensus analysis of folding." CRC Press Boca Raton, FL 1989 185-224 1104 Waterman,M.S. Genomic Sequence Datab.. Genomics 90 6:700-701 Waterman MS Genomic Sequence Databases Genome; Sequence database; USA; Genomic "Collecting and managing data that are growing so rapidly, that require constant correction, and that must be adapted to new definitions are major tasks. Cooperation between databases has obvious scientific and political difficulties, even within one country. When we factor in problems of international cooperation, the reality of a unified set of biological databases seems even more remote. These areas require policy decisions that will affect the progress of international science. 
Who should make these decisions? Who will actually make them? National and international databases must be coordinated. ... We cannot leave the future of information management in biology to chance." Genomics 6 6 700-701 1105 Kececioglu,J. Reconstructing a Histo.. ACM-SIAM Sympos 94 5:471-480 Kececioglu J; Gusfield D Reconstructing a History of Recombinations from a Set of Sequences Phylogeny; Sequence analysis; Genomic; Recombination; Edit; Distance; USA Preprint, 12 pp. "One of the classic problems in computational biology is the reconstruction of evolutionary histories. A recent trend is toward increasing the explanatory power of the models by incorporating higher-order evolutionary events that more accurately reflect the full range of mutation at the molecular level. In this paper, we take a step in this direction by considering the problem of reconstructing an evolutionary history for a set of genetic sequences that have evolved by recombination. Recombination produces a new sequence by crossing two parent sequences, and is among the most important mechanisms of high-order molecular mutation." ACM-SIAM Sympos Discrete Algorithms 1994 5 471-480 1106 Pevzner,P.A. Matrix Longest Subsequ.. Lecture Notes i 92 644:79-89 Pevzner PA; Waterman MS Matrix Longest Subsequence Problem, Duality and Hilbert Bases Longest common; Subsequence; USA; Duality; Matrix Third Annual Symposium, CPM 92. Tucson, Arizona, April 29 - May 1, 1992. Proceedings. "Although a number of efficient algorithms for the longest common subsequence (LCS) problem have been suggested since the 1970's, there is no duality theorem for the LCS problem. In the present paper a simple duality theorem is proved for the LCS problem and for a wide class of partial orders generalizing the notion of common subsequence. An algorithm for finding generalized LCS is suggested which has the classical dynamic programming algorithm as a special case." Lecture Notes in Comput Sci 644 644 79-89 1107 Pevzner,P.A. 
Generalized Sequence A.. Adv.Appl.Math. 93 14(2):139-171 Pevzner PA; Waterman MS Generalized Sequence Alignment and Duality Sequence alignment; Longest common; Duality; USA "Although a number of efficient algorithms for the longest common subsequence (LCS) problem have been suggested since the 1970s, there is no duality theorem for the LCS problem. In the present paper a simple duality theorem is proved for the LCS problem and for a wide class of partial orders generalizing the notion of common subsequence and sequence alignment. An algorithm for finding generalized alignment is suggested which has the classical dynamic programming approach for alignment problems as a special case. The algorithm covers both local and global alignment as well as a variety of gap functions." Adv Appl Math 1993 14 2 139-171 1108 Pevzner,P.A. A Fast Filtration for .. Lecture Notes i 93 684:197-214 Pevzner PA; Waterman MS A Fast Filtration for the Substring Matching Problem String match; Match with k mismatches; Approximate match; USA 4th Annual Symposium, CPM 93. Padova, Italy, June 2-4, 1993. Proceedings. "Given a text of length n and a query of length q we present an algorithm for finding all locations of m-tuples in the text and in the query that differ by at most k mismatches. ... In the case q=m the problem coincides with the classical approximate string matching with k mismatches problem. We present a new approach to this problem based on multiple filtration which may have advantages over some sophisticated and theoretically efficient methods that have been proposed." Lecture Notes in Comput Sci 684 684 197-214 1109 Arratia,R. A Phase Transition for.. 
Ann.Appl.Probab 94 4(1):200-225 Arratia R; Waterman MS A Phase Transition for the Score in Matching Random Sequences Allowing Deletions Pairwise alignment; Scoring; Significance; Subsequence; Longest common; USA; Sequence match; Transition; Deletion; Score "We consider a sequence matching problem involving the optimal alignment score for contiguous subsequences, rewarding matches and penalizing for deletions and mismatches. This score is used by biologists comparing pairs of DNA or protein sequences. We prove that for two sequences of length n, as n goes to infinity, there is a phase transition between linear growth in n, when the penalty parameters are small, and logarithmic growth in n, when the penalties are large. The results are valid for independent sequences with iid or Markov letters. ... The longest common subsequence problem of Chvatal and Sankoff is a special case of our setup." Ann Appl Probab 1994 4 1 200-225 1110 Waterman,M.S. Rapid and Accurate Est.. Proc.Nat.Acad.S 94 91:4625-4628 Waterman MS; Vingron M Rapid and Accurate Estimates of Statistical Significance for Sequence Data Base Searches Database search; Statistical; Significance; Sequence database; USA "A central question in sequence comparison is the statistical significance of an observed similarity. For local alignment containing gaps to optimize sequence similarity this problem has so far not been solved mathematically. Using as a basis the Chen-Stein theory of Poisson approximation, we present a practical method to approximate the probability that a local alignment score is a result of chance alone. For a set of similarity scores and gap penalties only one simulation of random alignments needs to be calculated to derive the key information allowing us to estimate the significance of any alignment calculated under this setting. We present applications to data base searching and the analysis of pairwise and self-comparisons of proteins." Proc Nat Acad Sci USA 91 91 4625-4628 1111 Wang,J.T.L. 
Discovering Active Mot.. Nucleic Acids R 94 22(14):2769-27 Wang JTL; Marr TG; Shasha D; Shapiro BA; Chirn GW Discovering Active Motifs in Sets of Related Protein Sequences and Using Them for Classification Sequence search; Motif; Classification; USA; Protein "We describe a method for discovering active motifs in a set of related protein sequences. The method is an automatic two step process: (1) find candidate motifs in a small sample of the sequences; (2) test whether these motifs are approximately present in all the sequences. To reduce the running time, we develop two optimization heuristics based on statistical estimation and pattern matching techniques. ... By combining the discovered motifs with an existing fingerprint technique, we develop a protein classifier." Nucleic Acids Res 1994 22 14 2769-2775 1112 Britten,R.J. Repeated Sequences in .. Science 68 161(9 Aug.):52 Britten RJ; Kohne DE Repeated Sequences in DNA Regularities; Repeat; USA; DNA "Hundreds of thousands of copies of DNA sequences have been incorporated into the genomes of higher organisms. ... In this article we describe selected measurements that show most clearly the presence of repeated sequences and indicate some of their properties." Science 1968 161 9 Aug. 529-540 1113 Britten,R.J. Analysis of Repeating .. Methods Enzymol 74 29:363-418 Britten RJ; Graham DE; Neufeld BR Analysis of Repeating DNA Sequences by Reassociation Regularities; Repetition; USA; DNA "Repetitive DNA occurs widely, if not universally, among higher organisms. A variety of procedures has been developed or adapted to examine its characteristics, and a body of concepts and language has grown up to deal with its complexities. This chapter attempts to summarize this body of knowledge and technique. ... The chapter consists of an introduction in the form of a glossary and descriptions of techniques and a method for the evaluation of rate constants." Methods Enzymol 29 29 363-418 1114 Felsenstein,J Numerical Methods for ..
Q.Rev.Biol. 82 57(4):379-404 Felsenstein J Numerical Methods for Inferring Evolutionary Trees Evolutionary tree; Phylogeny; Review; USA Parsimony methods. Correlated characters. Compatibility. Clustering methods. Pairwise methods. Explicitly statistical methods. Nucleotide and protein sequence data. Q Rev Biol 1982 57 4 379-404 1115 Kashyap,R.L. Statistical Estimation.. J.Theor.Biol. 74 47:75-101 Kashyap RL; Subas S Statistical Estimation of Parameters in a Phylogenetic Tree Using a Dynamic Model of the Substitutional Process Statistical; Phylogeny; Likelihood; USA; Model; Phylogenetic; Estimation; Dynamic "Using a modified version of the substitutional process proposed by Neyman, we estimate the parameters of the phylogenetic tree made up of three species .... The parameters estimated are the rate of substitution of amino acids along a protein and the ratio of the times of divergence of the species .... A method is given for determining the tree structure when it is not known. Both the maximum likelihood and Bayes methods are used in the estimation. ... Next we consider the construction of the correct phylogenetic tree made up of three or more taxonomic categories ...." J Theor Biol 47 47 75-101 1116 Kececioglu,J. Of Mice and Men: Algor.. ACM-SIAM Sympos 95 6:???-??? Kececioglu JD; Ravi R Of Mice and Men: Algorithms for Evolutionary Distances between Genomes with Translocation Evolutionary distance; Genome; Translocation; Rearrangement; Inversion; USA; Distance; Algorithm Preprint, 10 pp. "In this paper, we begin the algorithmic study of genome rearrangement by translocation. ... We model this as a process that exchanges prefixes and suffixes of strings, where each string represents a sequence of distinct markers along a chromosome in the genome. For the general problem of determining the translocation distance between two such sets of strings, we present a 2-approximation algorithm. ... We also examine for the first time two types of rearrangements in concert.
... For genomes that have evolved by translocation and inversion, we show there is a simple 2-approximation algorithm for data in which the orientation of markers is unknown, and a 3/2-approximation algorithm when orientation is known." ACM-SIAM Sympos Discrete Algorithms 1995 6 ???-??? 1117 Felsenstein,J The Number of Evolutio.. Syst.Zool. 78 27:27-33 Felsenstein J The Number of Evolutionary Trees Evolutionary tree; USA "A simple method of counting the number of possible evolutionary trees is presented. The trees are assumed to be rooted, with labelled tips but unlabelled root and unlabelled interior nodes. The method allows multifurcations as well as bifurcations. It makes use of a simple recurrence relation for T(n,m), the number of trees with n labelled tips and m unlabelled interior nodes." Syst Zool 27 27 27-33 1118 Felsenstein,J Statistical Inference .. J.Roy.Statist.S 83 146(3):246-272 Felsenstein J Statistical Inference of Phylogenies Statistical; Phylogeny; Markov; Likelihood; Review; USA "Statistical work on inferring phylogenies has concentrated on two cases: nucleic acid sequence data, modelled by a stochastic process with four discrete states, and gene frequency data, modelled by Brownian motion. A review of this work is presented. There are many unsolved problems, the most important of which is to persuade biologists to think of the problem of inferring phylogenies as being basically statistical, and to abandon deductive frameworks that are used as a justification for 'parsimony' methods." J Roy Statist Soc Ser A 1983 146 3 246-272 1119 Fitch,W.M. Construction of Phylog.. Science 67 155(20 Jan.):2 Fitch WM; Margoliash E Construction of Phylogenetic Trees Phylogeny; Evolutionary tree; Clustering; Distance; USA; Phylogenetic "A method based on mutation distances as estimated from cytochrome c sequences is of general applicability. ... 
The mutation distance between two cytochromes is defined here as the minimal number of nucleotides that would need to be altered in order for the gene for one cytochrome to code for the other." Science 1967 155 20 Jan. 279-284 1120 Fitch,W.M. Toward Defining the Co.. Syst.Zool. 71 20:406-416 Fitch WM Toward Defining the Course of Evolution: Minimum Change for a Specific Tree Topology Evolutionary tree; Phylogeny; USA; Evolution; Character optimization; Topology "A method is presented that is asserted to provide all hypothetical ancestral character states that are consistent with describing the descent of the present-day character states in a minimum number of changes of state using a predetermined phylogenetic relationship among the taxa represented. The character states used as examples are the four messenger RNA nucleotides encoding the amino acid sequences of proteins, but the method is general." Syst Zool 20 20 406-416 1121 Hartigan,J.A. Minimum Mutation Fits .. Biometrics 73 29:53-65 Hartigan JA Minimum Mutation Fits to a Given Tree Evolutionary tree; Phylogeny; USA "A number of objects, such as species, lie at the ends of a known evolutionary tree. A variable taking a finite number of possible values is specified on this set of objects. How can the values of the variable be estimated for the ancestors of the objects? One way is to assign to the ancestors those values which have the minimum number of mutations (or changes) in going from ancestors to their immediate descendants. In this paper, a method of generating all such minimum mutation fits is described. ... Most relevantly, a recent paper by Fitch (1971) ... specifies rules for constructing a minimum mutation fit to a given binary tree. The advance of this paper is in specifying construction rules for a general tree, and in proving that the rules do give a minimum mutation fit." Biometrics 29 29 53-65 1122 Kaplan,N. Statistical Analysis o.. Statistical A.. 
83Marcel Dekker Kaplan N Statistical Analysis of Restriction Enzyme Map Data and Nucleotide Sequence Data Weir BS Statistical Analysis of DNA Sequence Data Statistical; Restriction; Mapping; USA; Nucleotide See Weir (1983) for the book's bibliography, pp. 231-248. "Restriction enzyme map data provide only a limited amount of information about the similarities of homologous DNA sequences. Complete information is in hand when the DNA is totally sequenced. ... The analysis of the data generated by these new techniques has required the development of new statistical methodology. Several authors have used this data to estimate evolutionary distance between two DNA sequences having a common ancestor .... Others have used the data to estimate DNA sequence variation within populations .... The purpose of this chapter is to survey these statistical methods." Marcel Dekker New York 1983 75-106 1123 Kimura,M. A Simple Method for Es.. J.Mol.Evol. 80 16:111-120 Kimura M A Simple Method for Estimating Evolutionary Rates of Base Substitutions Through Comparative Studies of Nucleotide Sequences Substitution; Evolutionary distance; Sequence comparison; JP; Evolutionary rate; Rate; Nucleotide "Some simple formulae were obtained which enable us to estimate evolutionary distances in terms of the number of nucleotide substitutions (and, also, the evolutionary rates when the divergence times are known). In comparing a pair of nucleotide sequences, we distinguish two types of differences; [transitions, transversions]. ... Also, formulae for standard errors were obtained. Some examples were worked out using reported globin sequences to show that synonymous substitutions occur at much higher rates than amino acid-altering substitutions in evolution." J Mol Evol 16 16 111-120 1124 Kimura,M. On the Stochastic Mode.. J.Mol.Evol. 
72 2:87-90 Kimura M; Ohta T On the Stochastic Model for Estimation of Mutational Distance between Homologous Proteins Substitution; Evolutionary distance; Stochastic; JP; Distance; Protein; Model; Estimation "A set of simple equations is derived which gives the relationship between the observed amino acid differences per 100 codons and the evolutionary distance per 100 codons using Holmquist's stochastic model of molecular evolution." J Mol Evol 2 2 87-90 1125 Farris,J.S. On the Phenetic Approa.. Major Pattern.. 77Plenum Press Farris JS On the Phenetic Approach to Vertebrate Classification Hecht MK Goody PC; Hecht BM Major Patterns in Vertebrate Evolution. NATO ASI Series, Vol. 14 Classification; Clustering; Hierarchical; Evolutionary tree; USA "The topological errors [of an inferred phylogeny] might be remedied, however, by using a correction called the transformed distance method (Farris 1977; Klotz et al. 1979). In brief, this method uses an outgroup as reference to make corrections for unequal rates of evolution among the lineages under study and then applies UPGMA to the new distance matrix to infer the topology of the tree." -- Li, Graur (1991), p. 109. Plenum Press New York 1977 823-850 1126 Fitch,W.M. Toward Finding the Tre.. Proceedings o.. 75Freeman Fitch WM Toward Finding the Tree of Maximum Parsimony Estabrook G Proceedings of the Eighth International Conference on Numerical Taxonomy Evolutionary tree; Phylogeny; USA; Parsimony "Since the general solution to the problem [of finding a tree of maximum parsimony] is not at hand, I shall consider procedures that are useful, either in making the problem more tractable or in pointing toward the most parsimonious tree, as well as noting pitfalls. Two major parts of this are, 1, the concept of a discordancy and, 2, a natural interpretation of a Prim-Kruskal network ... in terms of phylogeny." This is a preliminary version of Fitch (1977). Freeman San Francisco 1975 189-230 1127 Fitch,W.M. 
On the Problem of Disc.. Am.Nat. 77 111(No. 978):2 Fitch WM On the Problem of Discovering the Most Parsimonious Tree Evolutionary tree; Phylogeny; USA "Since the general solution to the problem [of finding a tree of maximum parsimony] is not in hand, I shall consider procedures that are useful, either in making the problem more tractable or in pointing toward the most parsimonious tree, as well as noting pitfalls. Two major parts of this are: (1) the concept of a discordancy and (2) a natural interpretation of a Prim-Kruskal ... or other single linkage network in terms of phylogeny." Fitch (1975) is a preliminary version of this paper. Am Nat 1977 111 No. 978 223-257 1128 Fitch,W.M. Evolutionary Trees wit.. J.Mol.Evol. 74 3:263-278 Fitch WM; Farris JS Evolutionary Trees with Minimum Nucleotide Replacements from Amino Acid Sequences Evolutionary tree; Phylogeny; USA; Nucleotide; Amino acid "The problem of determining the minimum number of nucleotide substitutions required to account for the descent of a set of amino acid sequences given their ancestral relationships (phylogeny) has been studied. A method expanding upon the earlier work of Fitch (1971) for a set of nucleotide sequences is presented and its merits compared to the method of Moore et al. (1973)." J Mol Evol 3 3 263-278 1129 Fitch,W.M. A Non-Sequential Metho.. J.Mol.Evol. 81 18:30-37 Fitch WM A Non-Sequential Method for Constructing Trees and Hierarchical Classifications Hierarchical; Classification; Phylogeny; Evolutionary tree; Clustering; Distance; USA "A procedure is presented that forms an unrooted tree-like structure from a matrix of pairwise differences. The tree is not formed a portion at a time, as methods now in use generally do, but is formed en toto without intervening estimates of branch lengths. The method is based on a relaxed additivity (four-point metric) constraint. From the tree, a classification may be formed." J Mol Evol 18 18 30-37 1130 Li,W.H. Simple Method for Cons.. 
Proc.Nat.Acad.S 81 78(2):1085-108 Li WH Simple Method for Constructing Phylogenetic Trees from Distance Matrices Phylogeny; Evolutionary tree; USA; Distance; Phylogenetic "A simple method is proposed for constructing phylogenetic trees from distance matrices. The procedure for constructing tree topologies is similar to that of the unweighted pair-group method (UPG method) but makes corrections for unequal rates of evolution among lineages. The procedure for estimating branch lengths is the same as that of the Fitch and Margoliash method (F-M method) except that it allows no negative branch lengths. The performance of the present procedure for the construction of tree topologies is compared with that of the UPG method, the F-M method, Farris' method, and the modified Farris method ...." Proc Nat Acad Sci USA 1981 78 2 1085-1089 1131 Saitou,N. The Neighbor-Joining M.. Mol.Biol.Evol. 87 4(4):406-425 Saitou N; Nei M The Neighbor-Joining Method: A New Method for Reconstructing Phylogenetic Trees Phylogeny; Evolutionary tree; Clustering; Distance; USA; Phylogenetic; Neighbor joining "A new method called the neighbor-joining method is proposed for reconstructing phylogenetic trees from evolutionary distance data. The principle of this method is to find pairs of operational taxonomic units (OTUs) that minimize the total branch length at each stage of clustering of OTUs starting with a starlike tree. The branch lengths as well as the topology of a parsimonious tree can quickly be obtained by using this method. Using computer simulation, we studied the efficiency of this method in obtaining the correct unrooted tree in comparison with that of five other tree-making methods ...." See Studier, Keppler (1988) for clarifications and a correction. Mol Biol Evol 1987 4 4 406-425 1132 Sattath,S. Additive Similarity Tr.. 
Psychometrika 77 42(3):319-345 Sattath S; Tversky A Additive Similarity Trees Clustering; Hierarchical; Additive tree; Program; Similarity; Distance; IL "Similarity data can be represented by additive trees. ... The additive tree is less restrictive than the ultrametric tree, commonly known as the hierarchical clustering scheme. The two representations are characterized and compared. A computer program, ADDTREE, for the construction of additive trees is described and applied to several sets of data." Psychometrika 1977 42 3 319-345 1133 Studier,J.A. A Note on the Neighbor.. Mol.Biol.Evol. 88 5(6):729-731 Studier JA; Keppler KJ A Note on the Neighbor-Joining Algorithm of Saitou and Nei Phylogeny; Evolutionary tree; USA; Error; Neighbor joining; Algorithm "The minimum running time of the algorithm as formulated by Saitou and Nei is unclear. We present an alternative formulation that runs in time O(N^3), where N is the number of operational taxonomic units (OTUs). ... The proof given by Saitou and Nei that the correct tree is recovered if D is treelike is incorrect. We describe the error and supply a correct proof below." Mol Biol Evol 1988 5 6 729-731 1134 Szpankowski,W (Un)Expected Behavior .. ACM-SIAM Sympos 92 3:422-431 Szpankowski W (Un)Expected Behavior of Typical Suffix Trees String match; Suffix; Search tree; USA "Recently, Chang and Lawler have designed a sublinear expected time algorithm for approximate string matching using simple estimates of some parameters of suffix trees. ... In this paper, we use a novel technique called string ruler approach to provide a characterization of several basic parameters of suffix trees .... These findings are used to ... provide new insights and generalizations of string matching algorithms, particularly the one by Chang and Lawler." ACM-SIAM Sympos Discrete Algorithms 1992 3 422-431 1135 Nussinov,R. Theoretical Molecular .. J.Theor.Biol. 
87 125:219-235 Nussinov R Theoretical Molecular Biology: Prospectives and Perspectives String match; Pattern match; Approximate match; Consensus method; IL "I briefly discuss some aspects of theoretical molecular biology. Specifically, I include the issues of searches for homologies via string matchings, for patterns of specific nucleotide groupings and of sequence-structure relationship. The various approaches developed in order to achieve this end are described, attempting to convey some of the excitement in this quickly growing field." Pattern recognition in symbolic strings. Algorithms for string matching and for finding consensus sequences. Nearest neighbour patterns in nucleotide sequences. Consensus sequences. Structural Implications. Some further considerations. J Theor Biol 125 125 219-235 1136 Senapathy,P. Splice Junctions, Bran.. Methods Enzymol 90 183:252-278 Senapathy P; Shapiro MB; Harris NL Splice Junctions, Branch Point Sites, and Exons: Sequence Statistics, Identification, and Applications to Genome Project Pattern discovery; Identification; Genome; USA; Exon "We have used the tabulated consensus scoring matrices to find the most probable splice sites in a given sequence. ... A method was developed to predict potential exons in an uncharacterized sequence based on splice site scores and by using other parameters of exons and eukaryotic coding sequences. However, although this method could identify some complete exons of a gene, it cannot identify all the exons of a gene completely. ... Thus the problem of identifying complete genes is an order of magnitude more complex than finding individual exons ...." Methods Enzymol 183 183 252-278 1137 Eigen,M. Statistical Geometry o.. 
Methods Enzymol 90 183:505-530 Eigen M; Winkler-Oswatitsch R Statistical Geometry on Sequence Space Multiple comparison; Statistical; Sequence analysis; Sequence proximity; DE; Geometry "Alignment as such represents a two-dimensional matrix and thus invites horizontal and vertical inspection. Distance is calculated by horizontal summing of differences between two sequences. Positional nonuniformities of mutation or fixation manifest themselves in vertical deviations from consensus occupation. In this chapter we describe methods of comparative sequence analysis that combine horizontal and vertical criteria. They are used to construct geometries that are more complex, but at the same time also more informative than simple distance dendrograms. We start by introducing the concept of sequence space, a high-dimensional space that is most appropriate for representing sequence relations." Methods Enzymol 183 183 505-530 1138 Gojobori,T. Statistical Methods fo.. Methods Enzymol 90 183:531-550 Gojobori T; Moriyama EN; Kimura M Statistical Methods for Estimating Sequence Divergence Evolutionary distance; Substitution; Statistical; JP; Divergence "The observed number of nucleotide differences between the two DNA sequences is thus frequently different from the total number of nucleotide substitutions that have actually occurred during their divergence. Statistical methods for estimating the number of nucleotide substitutions are therefore required for comparative studies of DNA sequences. In this chapter, we first describe various methods for estimating the number of nucleotide substitutions and then discuss the advantages and disadvantages of these methods." Methods Enzymol 183 183 531-550 1139 Saccone,C. Influence of Base Comp.. 
Methods Enzymol 90 183:570-583 Saccone C; Lanave C; Pesole G; Preparata G Influence of Base Composition on Quantitative Estimates of Gene Evolution Evolutionary distance; Composition; Stochastic; Markov; Italy; Evolution; Gene "The measure of the genetic distance between organisms is one of the most challenging and difficult issues in molecular evolution. ... The construction of simple models of molecular evolution appears as a necessary and scientifically appropriate first step in any methodological approach. A few years ago we proposed a simple stochastic model of molecular evolution, the stationary Markov model, and we demonstrated that it is at work in a large variety of types of evolutionary dynamics operating at the gene level. In this chapter we present the theoretical basis of the model, its mathematical formulation, and a few experimental applications." Methods Enzymol 183 183 570-583 1140 Saitou,N. Maximum Likelihood Met.. Methods Enzymol 90 183:584-598 Saitou N Maximum Likelihood Methods Evolutionary distance; Phylogeny; Likelihood; JP "Application of the maximum likelihood (ML) method to the problem of phylogenetic tree reconstruction was first studied for the case of gene frequency data. Later, an ML algorithm for constructing unrooted phylogenetic trees from nucleotide sequence data was developed by Felsenstein (1981). Recently, Saitou (1988) proposed a stepwise tree-searching algorithm for the ML method. This is similar to that of the neighbor-joining method (Saitou, Nei 1987), in which distance matrices are used." Methods Enzymol 183 183 584-598 1141 Felsenstein,J PHYLIP - Phylogeny Inf.. Cladistics 89 5(2):164-166 Felsenstein J PHYLIP - Phylogeny Inference Package (Version 3.2) Phylogeny; USA; Program "This is a free package of programs for inferring phylogenies and carrying out certain related tasks. At present it contains 29 programs, which carry out different algorithms on different kinds of data. The programs in the package are: ..." 
Programs for molecular sequence data (10), Programs for distance matrix data (2), Programs for Gene Frequencies (2), Programs for discrete state data (10), Programs for plotting trees and consensus trees (5). [The current version is available from the author by ftp (file transfer program).] Cladistics 1989 5 2 164-166 1142 Blanken,R.L. Computer Comparison of.. J.Mol.Evol. 82 19:9-19 Blanken RL; Klotz LC; Hinnebusch AG Computer Comparison of New and Existing Criteria for Constructing Evolutionary Trees from Sequence Data Phylogeny; Evolutionary tree; USA "Three new methods for constructing evolutionary trees from molecular sequence data are presented. These methods are based on a theory for correcting for non-constant evolutionary rates (Klotz et al. 1979; Klotz and Blanken 1981). Extensive computer simulations were run to compare these new methods to the commonly used criteria of Dayhoff (1978) and Fitch and Margoliash (1967). ... However, no method yielded the correct topology all of the time, which demonstrated the need to determine confidence estimates in a particular result when evolutionary trees are determined from sequence data." J Mol Evol 19 19 9-19 1143 Winkler-Oswat Comparative Sequence A.. Chemica Scripta 86 26B:59-66 Winkler-Oswatitsch R; Dress A; Eigen M Comparative Sequence Analysis Exemplified with tRNA and 5S rRNA Sequence analysis; Evolutionary tree; DE "The advent of new sequencing techniques has brought a sudden increase in the data available for the study of evolutionary history on a quantitative basis. Criteria are put forward and methods are developed that allow an optimal alignment of sequences, a determination of the topology of their kinship relations, a reconstitution of precursors and a reliable establishment of their randomization. The criteria developed are tested by comparison to a large bulk of data from both tRNA and ribosomal 5S RNA sequences." Chemica Scripta 1986 26B 59-66 1144 Wheeler,W.C. Paired Sequence Differ.. Mol.Biol.Evol. 
88 5(1):90-96 Wheeler WC; Honeycutt RL Paired Sequence Difference in Ribosomal RNAs: Evolutionary and Phylogenetic Implications Phylogeny; USA; RNA; Phylogenetic "Ribosomal RNAs have secondary structures that are maintained by internal Watson-Crick pairing. ... [W]e show that Darwinian selection operates on these nucleotide sequences to maintain functionally important secondary structure. Insect phylogenies based on nucleotide positions involved in pairing and the production of secondary structure are incongruent with those constructed on the basis of positions that are not. Furthermore, phylogeny reconstruction using these nonpairing bases is concordant with other, morphological data." Mol Biol Evol 1988 5 1 90-96 1145 McLachlan,A.D Repeating Sequences an.. J.Mol.Biol. 72 64:417-437 McLachlan AD Repeating Sequences and Gene Duplication in Proteins Repeat; Duplication; Repetition; UK; Gene; Protein "The theory that proteins have evolved by repeated internal duplication of short segments of polypeptide chains has been tested by looking for repeats and near repeats in over 50 different proteins, many of them of known structure. The probability that the observed repeats could arise by chance has been calculated. The search does not yield a single new example where the evidence for gene duplication is compelling. No protein shows a unique internally consistent pattern of repeats which both correlates with repeats in the structure and cannot be explained by chance." J Mol Biol 64 64 417-437 1146 Kaplan,N. A New Estimate of Sequ.. J.Mol.Evol. 79 13:295-304 Kaplan N; Langley CH A New Estimate of Sequence Divergence of Mitochondrial DNA Using Restriction Endonuclease Mappings Evolutionary distance; Restriction; Mapping; USA; Divergence; DNA "A new estimate of the sequence divergence of mitochondrial DNA in related species using restriction enzyme maps is constructed. 
The estimate is derived assuming a simple Poisson-like model for the evolutionary process and is chosen to maximize an expression which is a reasonable approximation to the true likelihood of the restriction map data. Using this estimate, four sets of mitochondrial DNA data are analyzed and discussed." J Mol Evol 13 13 295-304 1147 Peacock,D. Use of Amino Acid Sequ.. J.Mol.Biol. 75 95:513-527 Peacock D; Boulter D Use of Amino Acid Sequence Data in Phylogeny and Evaluation of Methods using Computer Simulation Phylogeny; UK; Simulation; Amino acid "The advantages and disadvantages of the use of amino acid sequence data for phylogenetic studies are critically examined. The accuracy of two of the main methods currently used to construct phylogenetic relationships from amino acid sequences is evaluated, using a computer program that produces model sequences by simulating the process of protein evolution." The methods compared were those of Dayhoff & Eck (1966) and of Moore, Goodman & Barnabas (1973). J Mol Biol 95 95 513-527 1148 Saitou,N. Property and Efficienc.. J.Mol.Evol. 88 27:261-273 Saitou N Property and Efficiency of the Maximum Likelihood Method for Molecular Phylogeny Phylogeny; Likelihood; USA "The maximum likelihood (ML) method for constructing phylogenetic trees (both rooted and unrooted trees) from DNA sequence data was studied. Although there is some theoretical problem in the comparison of ML values conditional for each topology, it is possible to make a heuristic argument to justify the method. Based on this argument, a new algorithm for estimating the ML tree is presented." J Mol Evol 27 27 261-273 1149 Sourdis,J. Accuracy of Phylogenet.. Mol.Biol.Evol. 87 4(2):159-166 Sourdis J; Krimbas C Accuracy of Phylogenetic Trees Estimated from DNA Sequence Data Phylogeny; Evolutionary tree; GR; DNA; Phylogenetic; Accuracy "The relative merits of four different tree-making methods in obtaining the correct topology were studied by using computer simulation. 
The methods studied were the unweighted pair-group method with arithmetic mean (UPGMA), Fitch and Margoliash's (FM) method, the distance Wagner (DW) method, and Tateno et al.'s modified Farris (MF) method. ... The results obtained can be summarized as follows: (1) The probability of obtaining the correct rooted or unrooted tree is low unless a large number of nucleotide differences exists between different sequences. (2) ..." Mol Biol Evol 1987 4 2 159-166 1150 Tateno,Y. Statistical Properties.. J.Mol.Evol. 86 23:354-361 Tateno Y; Tajima F Statistical Properties of Molecular Tree Construction Methods under the Neutral Mutation Model Phylogeny; Evolutionary tree; Statistical; JP; Model "The statistical properties of three molecular tree construction methods - the unweighted pair-group arithmetic average clustering (UPG), Farris, and modified Farris methods - are examined under the neutral mutation model of evolution. The methods are compared for accuracy in construction of the topology and estimation of the branch lengths, using statistics of these two aspects. The distribution of the statistic concerning topological construction is shown to be as important as its mean and variance for the comparison." J Mol Evol 23 23 354-361 1151 Tateno,Y. Accuracy of Estimated .. J.Mol.Evol. 82 18:387-404 Tateno Y; Nei M; Tajima F Accuracy of Estimated Phylogenetic Trees from Molecular Data. I. Distantly Related Species Phylogeny; Evolutionary tree; USA; Phylogenetic; Accuracy "The accuracies and efficiencies of four different methods for constructing phylogenetic trees from molecular data were examined by using computer simulation. The methods examined are UPGMA, Fitch and Margoliash's (1967) (F/M) method, Farris' (1972) method, and the modified Farris method (Tateno, Nei, and Tajima, this paper). ... 
Nevertheless, any tree-making method is likely to make errors in obtaining the correct topology with a high probability, unless all branch lengths of the true tree are sufficiently long. ... The agreement between patristic and observed distances is not a good indicator of the goodness of the tree obtained." J Mol Evol 18 18 387-404 1152 Thompson,E.A. Likelihood and Parsimo.. Cladistics 86 2(1):43-52 Thompson EA Likelihood and Parsimony: Comparison of Criteria and Solutions Phylogeny; Likelihood; UK; Parsimony "This paper investigates the effects of alternative levels of assumption upon resulting estimates of evolutionary history, in the context of a particular problem - analysis of varying allele frequencies between polymorphic alleles existing in all the populations under consideration." Cladistics 1986 2 1 43-52 1153 Hirschberg,D. The Set-Set LCS Problem Algorithmica 89 4(4):503-510 Hirschberg DS; Larmore LL The Set-Set LCS Problem Longest common; Subsequence; USA; Dynamic programming "An efficient algorithm is presented that solves a generalization of the Longest Common Subsequence problem, in which both of the input strings consists of sets of symbols which may be permuted." Algorithmica 1989 4 4 503-510 1154 Ukkonen,E. A Linear-Time Algorith.. Algorithmica 90 5(3):313-323 Ukkonen E A Linear-Time Algorithm for Finding Approximate Shortest Common Superstrings Supersequence; Shortest common; FI; Algorithm "Approximate shortest common superstrings for a given set R of strings can be constructed by applying the greedy heuristics for finding a longest Hamiltonian path in the weighted graph that represents the pairwise overlaps between the strings in R. We develop an efficient implementation of this idea using a modified Aho-Corasick string-matching automaton." Algorithmica 1990 5 3 313-323 1155 Apostolico,A. Optimal Parallel Detec.. 
Algorithmica 92 8(4):285-319 Apostolico A Optimal Parallel Detection of Squares in Strings Regularities; Optimal; Parallel; Square; Italy; Detection "A string is square-free if it has no nonempty substring of the form ww. It is shown that the square-freedom of a string of n symbols over an arbitrary alphabet can be tested by a CRCW PRAM with n processors in O(log n) time and linear auxiliary space. ... More elaborate constructions lead to a CRCW PRAM algorithm for detecting, within the same n-processors bounds, all positioned squares in x in time O(log n) and using linear auxiliary space. The fastest sequential algorithms solve this problem in O(n log n) time, and such a performance is known to be optimal." Algorithmica 1992 8 4 285-319 1156 Lam,T.W. Finding Least-Weight S.. Algorithmica 93 9(6):615-628 Lam TW; Chan KF Finding Least-Weight Subsequences with Fewer Processors Least weight; Subsequence; HK "In this paper we show that if the weight function satisfies the inverse quadrangle inequality, the [least-weight subsequence] problem can be solved on a CREW PRAM in O(log^2 n log log n) time with n/log log n processors, or in O(log^2 n) time with n log n processors. Notice that the processor-time complexity of our algorithm is much closer to the almost linear-time complexity of the best-known sequential algorithm." Algorithmica 1993 9 6 615-628 1157 Ukkonen,E. Approximate String Mat.. Algorithmica 93 10(5):353-364 Ukkonen E; Wood D Approximate String Matching with Suffix Automata Approximate match; String match; Automata; FI; Suffix "The approximate string matching problem is, given a text string, a pattern string, and an integer k, to find in the text all approximate occurrences of the pattern. An approximate occurrence means a substring of the text with edit distance at most k from the pattern. We give a new O(kn) algorithm for this problem, where n is the length of the text. 
The algorithm is based on the suffix automaton with failure transitions and on the diagonalwise monotonicity of the edit distance table. Some experiments showing that the algorithm has a small overhead are reported." Algorithmica 1993 10 5 353-364 1158 Steele,J.M. An Efron-Stein Inequal.. Ann.Statist. 86 14(2):753-758 Steele JM An Efron-Stein Inequality for Nonsymmetric Statistics Longest common; Subsequence; USA "Finally the inequality is applied to a problem of string comparisons by means of long common subsequences, a problem considered at length in Sankoff and Kruskal (1983). The best known bound on the variance of the longest common subsequence is improved, and a new k string comparison problem is introduced." Ann Statist 1986 14 2 753-758 1159 Kececioglu,J. Combinatorial Algorith.. Algorithmica 94 12:???-??? Kececioglu JD; Myers EW Combinatorial Algorithms for DNA Sequence Assembly Approximation; Fragment; Sequence reconstruction; Sequence assembly; USA; Combinatorial; DNA; Algorithm Preprint, 45 pp. "The sequence reconstruction problem that we take as our formulation of DNA sequence assembly is a variation of the shortest common superstring problem, complicated by the presence of sequencing errors and reverse complements of fragments. Since the simpler superstring problem is NP-hard, any efficient reconstruction procedure must resort to heuristics. In this paper, however, a four phase approach based on rigorous design criteria is presented, and has been found to be very accurate in practice." Algorithmica 12 12 ???-??? 1160 Pevzner,P.A. Open Combinatorial Pro.. Israel Sympos.T 95 3:???-??? Pevzner PA; Waterman MS Open Combinatorial Problems in Computational Molecular Biology Genome; Rearrangement; Mapping; Sequencing; Sequence comparison; USA; Combinatorial Preprint, 16 pp. "In the last few years theoretical computer scientists have found new challenges in computational molecular biology. 
We discuss recent advances and present some open combinatorial problems in different areas of computational molecular biology such as genome rearrangements, DNA physical mapping, DNA sequencing and sequence comparison." Israel Sympos Theor Comput Systems 3 3 ???-??? 1161 Cheng,H.D. VLSI Architectures for.. Pattern Recogni 87 20(1):125-141 Cheng HD; Fu KS VLSI Architectures for String Matching and Pattern Matching VLSI; String match; Pattern match; Hardware; USA "In this paper, we discuss string-matching and dynamic time-warp pattern- matching. ... We propose a VLSI architecture based on the space-time domain expansion approach which can compute the string distance and also give the matching index-pairs which correspond to the edit sequence. ... We also propose a VLSI architecture for dynamic time-warping based on the space-time expansion method which can obtain a high throughput by using extensive pipelining and parallelism." Pattern Recognition 1987 20 1 125-141 1162 Hollaar,L.A. Text Retrieval Computers Computer 79 12(3):40-50 Hollaar LA Text Retrieval Computers Hardware; Retrieval; USA "The hardware required for efficient text retrieval differs from that required for retrieval of formatted data. Here is an examination of such hardware, particularly term comparators." Computer 1979 12 3 40-50 1163 Iyengar,S.S. A String Searching Alg.. Appl.Math.Compu 80 6:123-131 Iyengar SS; Alia V A String Searching Algorithm String search; USA; Algorithm "This paper is an attempt to develop a string searching algorithm that begins the search for a match in the middle of the strings being compared. The algorithm uses information gained from mismatches and the location of the search area in the large string, to make decisions and direct the search. Several elements of this algorithm can be useful in string searching applications." Appl Math Comput 6 6 123-131 1164 Felsenstein,J Confidence Limits on P.. 
Evolution 85 39(4):783-791 Felsenstein J Confidence Limits on Phylogenies: An Approach Using the Bootstrap Evolutionary tree; Robustness; Resampling; Bootstrap; Confidence; USA; Statistical; Phylogeny "The recently-developed statistical method known as the 'bootstrap' can be used to place confidence intervals on phylogenies. ... In the case of phylogenies, it is argued that the proper method of resampling is to keep all of the original species while sampling characters with replacement, under the assumption that the characters have been independently drawn by the systematist and have evolved independently. Majority-rule consensus trees can be used to construct a phylogeny showing all of the inferred monophyletic groups that occurred in a majority of the bootstrap samples." Evolution 1985 39 4 783-791 1165 Aoe,J.I. An Efficient Implement.. SIGIR Forum 89 23(3,4):22-33 Aoe JI An Efficient Implementation of String Pattern Matching Machines for a Finite Number of Keywords Pattern match; String match; Automata; Data structure; JP "This paper describes a method of implementing a static transition table of a string pattern matching machine to locate all occurrences of a finite number of keywords in a text string. The scheme combines the fast access of an array representation with the compactness of a list structure. Each transition can be computed from the present data structure in O(1) time and the storage is as small as the list structure. The construction and pattern matching programs associated with the present data structure are provided and the efficiency is evaluated by empirical results." SIGIR Forum 1989 23 3,4 22-33 1166 Kuo,S. An Improved Algorithm .. SIGIR Forum 89 23(3,4):89-99 Kuo S; Cross GR An Improved Algorithm to Find the Length of the Longest Common Subsequence of Two Strings Pairwise comparison; Longest common; Subsequence; USA; Algorithm "We present an improvement to this algorithm [Hunt, Szymanski (1977)] .... 
Some experimental results show dramatic improvements for large n." SIGIR Forum 1989 23 3,4 89-99 1167 Staden,R. An Improved Sequence H.. Comput.Appl.Bio 90 6(4):387-393 Staden R An Improved Sequence Handling Package that Runs on the Apple Macintosh Sequence analysis; Program; UK "We report improvements to our sequence analysis package and adaptation to run on the Apple Macintosh range of machines. ... In addition to a large number of small but useful extra features, some important new analytical functions have been devised. These include sequence and contig editors; optimal alignment and comparison methods; and a new method for comparing the observed and expected frequencies of selected oligonucleotides." Comput Appl Biosci 1990 6 4 387-393 1168 Gleeson,T.J. An X Windows and UNIX .. Comput.Appl.Bio 91 7(3):398-0 Gleeson TJ; Staden R An X Windows and UNIX Implementation of Our Sequence Analysis Package Sequence analysis; Program; UK "Our comprehensive package of programs for handling and analysing sequences (references in Staden, 1990) has been used on VAX machines running under the VMS operating system for many years. ... Further modifications to the original FORTRAN and an additional set of routines written in C have enabled us to produce two new versions of the programs to run under the X Window System. The first runs under the terminal emulator xterm, and the second runs directly under X." Comput Appl Biosci 1991 7 3 398-0 1169 Stephen,G.A. String Searching Algor.. 94World Scientifi Stephen GA String Searching Algorithms BK - String search; String match; Approximate match; Search tree; Distance; Repeat; UK; Algorithm "This book presents a bibliographic overview of the field and an anthology of detailed descriptions of the principal algorithms available. 
The aim is twofold: on the one hand, to provide an easy-to-read comparison of the available techniques in each area, and on the other, to furnish the reader with a reference to in-depth descriptions of the major algorithms. Topics covered include methods for finding exact and approximate string matches, calculating 'edit' distances between strings, finding common sequences and finding the longest repetitions within strings." World Scientific Publishing Singapore 1994 xii+243-0 1170 Ukkonen,E. Approximate String-Mat.. Lecture Notes i 93 684:228-242 Ukkonen E Approximate String-Matching over Suffix Trees Approximate match; String match; Search tree; FI; Suffix 4th Annual Symposium, CPM 93. Padova, Italy, June 2-4, 1993. Proceedings. "The classical approximate string-matching problem ... is considered. We concentrate on the special case in which [the text] T is available for preprocessing before the searches with varying [pattern] P and [neighbourhood] k. It is shown how the searches can be done fast using the suffix tree of T augmented with the suffix links as the preprocessed form of T and applying dynamic programming over the tree. Three variations of the search algorithm are developed ...." Lecture Notes in Comput Sci 684 684 228-242 1171 Vingron,M. Multiple Sequence Comp.. Lecture Notes i 93 684:243-253 Vingron M; Pevzner PA Multiple Sequence Comparison and n-Dimensional Image Reconstruction Sequence comparison; Dot; Multiple alignment; USA 4th Annual Symposium, CPM 93. Padova, Italy, June 2-4, 1993. Proceedings. "In recent studies the usefulness of dot-matrices for multiple sequence alignment has been proved. Viewing dot-matrices as projections of unknown n- dimensional points, we consider the multiple alignment problem (for n sequences) as an n-dimensional image reconstruction problem with noise. From this perspective we introduce and develop the filtering method due to Vingron and Argos (1991). ... 
An improved version of the original algorithm is introduced that avoids costly dot-matrix multiplications ...." Lecture Notes in Comput Sci 684 684 243-253 1172 Breslauer,D. Tight Comparison Bound.. Lecture Notes i 93 684:11-19 Breslauer D; Colussi L; Toniolo L Tight Comparison Bounds for the String Prefix-Matching Problem String match; Prefix; Knuth-Morris-Pratt; Italy 4th Annual Symposium, CPM 93. Padova, Italy, June 2-4, 1993. Proceedings. "This is a natural generalization of the string matching problem where only occurrences of the whole pattern are sought. The Knuth-Morris-Pratt string matching algorithm can be easily adapted to solve the string prefix-matching problem without making additional comparisons. In this paper we study the exact complexity of the string prefix-matching problem in the deterministic sequential comparison model. Our bounds do not account for comparisons made in a pattern preprocessing step." Lecture Notes in Comput Sci 684 684 11-19 1173 Iliopoulos,C. Covering a String Lecture Notes i 93 684:54-62 Iliopoulos CS; Moore DWG; Park K Covering a String Repetition; Regularities; AU 4th Annual Symposium, CPM 93. Padova, Italy, June 2-4, 1993. Proceedings. "We consider the problem of finding the repetitive structures of a given string x. The period u of the string x grasps the repetitiveness of x, since x is a prefix of a string constructed by concatenations of u. We generalize the concept of repetitiveness as follows: A string w covers a string x if there exists a string constructed by concatenations and superpositions of w of which x is a substring. A substring w of x is called a seed of x if w covers x. We present an O(n log n) time algorithm for finding all the seeds of a given string of length n." Lecture Notes in Comput Sci 684 684 54-62 1174 Irving,R.W. On the Worst-Case Beha.. 
Lecture Notes i 93 684:63-73 Irving RW; Fraser CB On the Worst-Case Behaviour of some Approximation Algorithms for the Shortest Common Supersequence of k Strings Supersequence; Shortest common; UK; Approximation; Algorithm 4th Annual Symposium, CPM 93. Padova, Italy, June 2-4, 1993. Proceedings. "Two natural polynomial-time approximation algorithms for the shortest common supersequence (SCS) of k strings are analysed from the point of view of worst-case performance guarantee. Both algorithms behave badly in the worst case, whether the underlying alphabet is unbounded or of fixed size." Lecture Notes in Comput Sci 684 684 63-73 1175 Kannan,S.K. An Algorithm for Locat.. Lecture Notes i 93 684:74-86 Kannan SK; Myers EW An Algorithm for Locating Non-Overlapping Regions of Maximum Alignment Score Sequence alignment; Repeat; Region; USA; Score; Algorithm 4th Annual Symposium, CPM 93. Padova, Italy, June 2-4, 1993. Proceedings. "In this paper we present an O(N^2 log^2 N) algorithm for finding the two non-overlapping substrings of a given string of length N which have the highest-scoring alignment between them. This significantly improves the previously best known bound of O(N^3) for the worst-case complexity of this problem." Lecture Notes in Comput Sci 684 684 74-86 1176 Kececioglu,J. Exact and Approximatio.. Lecture Notes i 93 684:87-105 Kececioglu J; Sankoff D Exact and Approximation Algorithms for the Inversion Distance between Two Chromosomes Genome; Sequence proximity; Chromosome; Inversion; CA; Approximation; Distance; Reversal; Transposition; Translocation; Algorithm (To appear in Algorithmica, 1994, as "Exact and Approximation Algorithms for Sorting by Reversals, with Application to Genome Rearrangements.") 4th Annual Symposium, CPM 93. Padova, Italy, June 2-4, 1993. Proceedings. 
"Motivated by the problem in computational biology of reconstructing the series of chromosome inversions by which one organism evolved from another, we consider the problem of computing the shortest series of reversals that transform one permutation to another. The permutations describe the order of genes on corresponding chromosomes, and a reversal takes an arbitrary substring of elements and reverses their order. For this problem we develop two algorithms: a greedy approximation algorithm ... and a branch and bound exact algorithm that finds an optimal solution ...." Lecture Notes in Comput Sci 684 684 87-105 1177 Kececioglu,J. The Maximum Weight Tra.. Lecture Notes i 93 684:106-119 Kececioglu J The Maximum Weight Trace Problem in Multiple Sequence Alignment Sequence alignment; Multiple alignment; USA 4th Annual Symposium, CPM 93. Padova, Italy, June 2-4, 1993. Proceedings. "We define a new problem in multiple sequence alignment, called maximum weight trace. The problem formalizes in a natural way the common practice of merging pairwise alignments to form multiple sequence alignments, and contains a version of the minimum sum of pairs alignment problem as a special case. ... We develop a branch and bound algorithm for maximum weight trace. Though the problem is NP- complete, an implementation of the algorithm shows we can solve instances on as many as 6 sequences of length 250 in a few minutes." Lecture Notes in Comput Sci 684 684 106-119 1178 Landau,G.M. An Algorithm for Appro.. Lecture Notes i 93 684:120-133 Landau GM; Schmidt JP An Algorithm for Approximate Tandem Repeats Repeat; Approximate match; USA; Algorithm 4th Annual Symposium, CPM 93. Padova, Italy, June 2-4, 1993. Proceedings. "A perfect tandem repeat within a string S is a substring r = uv for which u = v. An approximate tandem repeat is a substring r = uv for which u and v are similar. 
In this paper we consider two criterions of similarity: the Hamming distance (k mismatches) and the edit distance (k differences). For a string S of length n and an integer k our algorithm reports all locally optimal approximate repeats ...." Lecture Notes in Comput Sci 684 684 120-133 1179 Louchard,G. Analysis of a String E.. Lecture Notes i 93 684:152-163 Louchard G; Szpankowski W Analysis of a String Edit Problem in a Probabilistic Framework (Extended Abstract) Edit; Sequence proximity; Probabilistic; Belgium 4th Annual Symposium, CPM 93. Padova, Italy, June 2-4, 1993. Proceedings. "We consider a string edit problem in a probabilistic framework. ... In particular, we observe that the [edit distance] is asymptotically almost surely (a.s.) equal to an where a is a constant and n is the sum of lengths of both strings. We also obtained some bounds on a in the so called independent model in which all weights ... are assumed to be independent. More importantly, we show that the edit distance is well concentrated around its average value. As a by- product of our results, we also present a precise estimate of the number of alignments between two strings." Lecture Notes in Comput Sci 684 684 152-163 1180 Muthukrishnan Detecting False Matche.. Lecture Notes i 93 684:164-178 Muthukrishnan S Detecting False Matches in String Matching Algorithms String match; Parallel; USA; Algorithm 4th Annual Symposium, CPM 93. Padova, Italy, June 2-4, 1993. Proceedings. "Consider a text string of length n, a pattern string of length m and a match vector of length n which declares each location in the text to be either a match ... or a potential match. ... We investigate the complexity of two problems in this context, namely, checking if there is any false match, and identifying all the false matches in the match vector. We present an algorithm on the CRCW PRAM that checks if there exists any false match in O(1) time using O(n) processors. 
Since string matching takes Ω(log log m) time on the CRCW PRAM, checking for false matches is provably simpler than string matching." Lecture Notes in Comput Sci 684 684 164-178 1181 Szpankowski,W Probabilistic Analysis.. Lecture Notes i 92 644:1-14 Szpankowski W Probabilistic Analysis of Generalized Suffix Trees (Extended Abstract) Search tree; Data structure; String match; Probabilistic; USA; Suffix Third Annual Symposium, CPM 92. Tucson, Arizona, April 29 - May 1, 1992. Proceedings. "Suffix trees find several applications in computer science and telecommunications, most notably in algorithms on strings, data compressions and codes. We consider in a probabilistic framework a family of generalized suffix trees - called b-suffix trees - built from the first n suffixes of a random word. ... Several parameters of b-suffix trees are of interest, namely the typical depth, the depth of insertion, the height, the external path length, and so forth. We establish some results concerning typical, that is, almost sure (a.s.), behavior of these parameters." Lecture Notes in Comput Sci 644 644 1-14 1182 Regnier,M. A Language Approach to.. Lecture Notes i 92 644:15-26 Regnier M A Language Approach to String Searching Evaluation String search; Probabilistic; Knuth-Morris-Pratt; Boyer-Moore; Markov; FR; Language Third Annual Symposium, CPM 92. Tucson, Arizona, April 29 - May 1, 1992. Proceedings. "We propose a general framework to derive average performance of string searching algorithms that preprocess the pattern. It relies mainly on languages and combinatorics on words, joined to some probabilistic tools. The approach is quite powerful: although we concentrate here on Morris-Pratt and Boyer-Moore-Horspool, it applies to a large class of algorithms. A fairly general character distribution is assumed, namely a Markovian one, suitable for applications such as natural languages or biological databases searching." Lecture Notes in Comput Sci 644 644 15-26 1183 Atallah,M.J. 
Pattern Matching With .. Lecture Notes i 92 644:27-40 Atallah MJ; Jacquet P; Szpankowski W Pattern Matching With Mismatches: A Probabilistic Analysis and a Randomized Algorithm (Extended Abstract) Pattern match; Approximate match; Probabilistic; USA; Algorithm Third Annual Symposium, CPM 92. Tucson, Arizona, April 29 - May 1, 1992. Proceedings. "Given a text of length n and a pattern of length m over some (possibly unbounded) alphabet, we consider the problem of finding all positions in the text at which the pattern 'almost occurs'. Here by 'almost occurs' we mean that at least some fixed fraction r of the characters of the pattern ... are equal to their corresponding characters in the text. We design a randomized algorithm that has O(n log m) worst-case time complexity and computes with high probability all of the almost-occurrences of the pattern in the text." Lecture Notes in Comput Sci 644 644 27-40 1184 Kim,J.Y. Fast Multiple Keyword .. Lecture Notes i 92 644:41-51 Kim JY; Shawe-Taylor J Fast Multiple Keyword Searching Dictionary match; N-gram; UK Third Annual Symposium, CPM 92. Tucson, Arizona, April 29 - May 1, 1992. Proceedings. "A new multiple keyword searching algorithm is presented as a generalization of a fast substring matching algorithm based on an n-gram technique. The expected searching time complexity is shown to be O((N/m + ml) log lm) under reasonable assumptions about the keywords together with the assumption that the text is drawn from a stationary ergodic source, where N is the text size, l the number of keywords and m the smallest keyword size." Lecture Notes in Comput Sci 644 644 41-51 1185 Knight,J.R. Approximate Regular Ex.. Lecture Notes i 92 644:67-78 Knight JR; Myers EW Approximate Regular Expression Pattern Matching with Concave Gap Penalties Pattern match; Approximate match; Language; Gap; USA; Expression Third Annual Symposium, CPM 92. Tucson, Arizona, April 29 - May 1, 1992. Proceedings. 
"Given a sequence A of length M and a regular expression R of length P, an approximate regular expression pattern matching algorithm computes the score of the best alignment between A and one of the sequences exactly matched by R. There are a variety of schemes for scoring alignments. ... In this paper we present an O(MP(log M + log2 P)) algorithm for approximate regular expression matching for an arbitrary [function scoring each aligned pair of symbols] and any concave [gap weighting function]." Lecture Notes in Comput Sci 644 644 67-78 1186 Fischetti,V.A Identifying Periodic O.. Lecture Notes i 92 644:111-120 Fischetti VA; Landau GM; Schmidt JP; Sellers PH Identifying Periodic Occurrences of a Template with Applications to Protein Structure Regularities; Template; Match a pattern matrix; Structure; USA; Protein Third Annual Symposium, CPM 92. Tucson, Arizona, April 29 - May 1, 1992. Proceedings. "We consider a string matching problem where the pattern is a template that matches many different strings with various degrees of perfection. ... For a text T of length n, and a template P of length m, we wish to find the best alignment of T with Pn, which is the concatenation of n copies of P, (m will typically be much smaller than n). ... We show that the structure of Pn can be exploited and the problem reduced to essentially solving a dynamic programming of size O(mn)." Lecture Notes in Comput Sci 644 644 111-120 1187 Sankoff,D. Edit Distance for Geno.. Lecture Notes i 92 644:121-135 Sankoff D Edit Distance for Genome Comparison Based on Non-Local Operations Genome; Edit; Distance; Rearrangement; CA Third Annual Symposium, CPM 92. Tucson, Arizona, April 29 - May 1, 1992. Proceedings. Motivated by "the feasibility of evolutionary inference based on the macrostructure of entire genomes, rather than on the traditional comparison of homologous versions of a single gene in different organisms. 
In this paper, we define a number of measures of gene order rearrangement, describe algorithm design and software development for the calculation of some of these quantities in single-chromosome genomes, and report on the results of applying these tools to a database of mitochondrial gene orders inferred from genomic sequences." Lecture Notes in Comput Sci 644 644 121-135 1188 Chang,W.I. Theoretical and Empiri.. Lecture Notes i 92 644:175-184 Chang WI; Lampe J Theoretical and Empirical Comparisons of Approximate String Matching Algorithms String match; Approximate match; Match with k differences; Probabilistic; USA; Algorithm Third Annual Symposium, CPM 92. Tucson, Arizona, April 29 - May 1, 1992. Proceedings. "We study in depth a model of non-exact pattern matching based on edit distance .... More precisely, the k differences approximate string matching problem specifies .... We have carefully implemented and analyzed various O(kn) algorithms based on dynamic programming (DP).... A new algorithm is presented that computes much fewer entries of the DP table. ... We give a probabilistic analysis of the DP table in order to prove that the expected running time of our algorithm ... is O(kn) for random text." Lecture Notes in Comput Sci 644 644 175-184 1189 Pevzner,P.A. Multiple Alignment wit.. Lecture Notes i 92 644:205-213 Pevzner PA Multiple Alignment with Guaranteed Error Bounds and Communication Cost Multiple alignment; Error; Graph; USA Third Annual Symposium, CPM 92. Tucson, Arizona, April 29 - May 1, 1992. Proceedings. "Dynamic programming for optimal multiple alignment requires too much time to be practical. Although many algorithms for suboptimal alignment have been suggested, no 'performance guarantees' have been known until recently. We give an approximation multiple alignment algorithm with guaranteed error bounds equal to the normalized communication cost of a corresponding graph." Lecture Notes in Comput Sci 644 644 205-213 1190 Hui,L.C.K. 
Color Set Size Problem.. Lecture Notes i 92 644:230-243 Hui LCK Color Set Size Problem with Applications to String Matching String match; Longest common; Multiple comparison; USA Third Annual Symposium, CPM 92. Tucson, Arizona, April 29 - May 1, 1992. Proceedings. "This paper gives an optimal sequential solution of the color set size problem and string matching applications including a linear time algorithm for the problem of finding the longest substring common to at least k out of m input strings for all k between 1 and m. In addition, parallel solutions to the above problems are given. These solutions may shed light on problems in computational biology, such as the multiple string alignment problem." Lecture Notes in Comput Sci 644 644 230-243 1191 Mehta,D.P. Computing Display Conf.. Lecture Notes i 92 644:244-261 Mehta DP; Sahni S Computing Display Conflicts in String and Circular String Visualization Sequence analysis; Display; Graph; Data structure; USA Third Annual Symposium, CPM 92. Tucson, Arizona, April 29 - May 1, 1992. Proceedings. "We have proposed a model for the visualization of strings and circular strings, where we introduced the problem of display conflicts. In this paper, we provide efficient algorithms for computing display conflicts in linear strings. These algorithms make use of the scdawg data structure for linear strings. We also extend the scdawg data structure to represent circular strings. The resulting data structure may now be employed to compute display conflicts in circular strings by using the algorithms for computing conflicts in linear strings." Lecture Notes in Comput Sci 644 644 244-261 1192 Amir,A. Efficient Randomized D.. Lecture Notes i 92 644:262-275 Amir A; Farach M; Matias Y Efficient Randomized Dictionary Matching Algorithms (Extended Abstract) Dictionary match; USA; Algorithm Third Annual Symposium, CPM 92. Tucson, Arizona, April 29 - May 1, 1992. Proceedings. 
"In string matching, randomized algorithms have primarily made use of randomized hashing functions which convert strings into 'signatures' or 'finger prints'. We explore the use of finger prints in conjunction with other randomized and deterministic techniques and data structures. We present several new algorithms for dictionary matching, along with parallel algorithms which are simpler or more efficient than previously known algorithms." Lecture Notes in Comput Sci 644 644 262-275 1193 Idury,R.M. Dynamic Dictionary Mat.. Lecture Notes i 92 644:276-287 Idury RM; Schaffer AA Dynamic Dictionary Matching with Failure Functions Dictionary match; USA; Function; Dynamic Third Annual Symposium, CPM 92. Tucson, Arizona, April 29 - May 1, 1992. Proceedings. "Amir, Farach, Galil, Giancarlo, and Park used an automaton based on suffix trees to solve the dynamic [dictionary matching] problem. We show how to match their time bounds for update and search using a failure function framework, similar to that used by Aho and Corasick to solve the static dictionary matching problem. We then show that our approach allows us to achieve faster search times at the expense of the update times. Finally, we show how to speed up the initial dictionary construction." Lecture Notes in Comput Sci 644 644 276-287 1194 Libertini,G. "Reconstruction of Anc.. J.Mol.Evol. 94 39:219-229 Libertini G; Di Donato A "Reconstruction of Ancestral Sequences by the Inferential Method, a Tool for Protein Engineering Studies Phylogeny; Evolutionary tree; Italy; Protein "This paper describes the inferential method, an approach for reconstructing protein and nucleotide sequences of ancestral species, starting from known, homologous, contemporary sequences. The method requires knowledge of the topology of the phylogenetic tree, whose nodes are the species to whom the reconstructed sequences belong. 
The method has been tested by computer simulation of speciation and nucleotide substitutions, starting from a single ancestral sequence, and by subsequent reconstruction of nodal sequences." J Mol Evol 39 39 219-229 1195 Huelsenbeck,J Success of Phylogeneti.. Syst.Biol. 93 42(3):247-264 Huelsenbeck JP; Hillis DM Success of Phylogenetic Methods in the Four-Taxon Case Phylogeny; USA; Phylogenetic "The success of 16 methods of phylogenetic inference was examined using consistency and simulation analysis. Success - the frequency with which a tree-making method correctly identified the true phylogeny - was examined for an unrooted four-taxon tree. In this study, tree-making methods were examined under a large number of branch-length conditions and under three models of sequence evolution. The results are plotted to facilitate comparisons among the methods. The consistency analysis indicated which methods converge on the correct tree given infinite sample size." Syst Biol 1993 42 3 247-264 1196 Miyata,T. Two Types of Amino Aci.. J.Mol.Evol. 79 12:219-236 Miyata T; Miyazawa S; Yasunaga T Two Types of Amino Acid Substitutions in Protein Evolution Substitution; Amino acid; Protein; Evolution; JP "The frequency of amino acid substitutions, relative to the frequency expected by chance, decreases linearly with the increase in physico-chemical differences between amino acid pairs involved in a substitution. This correlation does not apply to abnormal human hemoglobins. Since abnormal hemoglobins mostly reflect the process of mutation rather than selection, the correlation manifest during protein evolution between substitution frequency and physico-chemical difference in amino acids can be attributed to natural selection. ... From this analysis, we can show that there exists another type of substitution which depends less on the extent of physico-chemical properties of substituted amino acids." J Mol Evol 12 12 219-236 1197 Kishino,H. Evaluation of the Maxi.. J.Mol.Evol. 
89 29:170-179 Kishino H; Hasegawa M Evaluation of the Maximum Likelihood Estimate of the Evolutionary Tree Topologies from DNA Sequence Data, and the Branching Order in Hominoidea Phylogeny; Evolutionary tree; Likelihood; JP; Robustness; Analytical; DNA; Topology "In evaluating the extent to which the maximum likelihood tree is a significantly better representation of the true tree, it is important to estimate the variance of the difference between log likelihood of different tree topologies. Bootstrap resampling can be used for this purpose ... but it imposes a great computation burden. To overcome this difficulty, we developed a new method for estimating the variance by expressing it directly." J Mol Evol 29 29 170-179 1198 Saitou,N. Relative Efficiencies .. Mol.Biol.Evol. 89 6(5):514-525 Saitou N; Imanishi T Relative Efficiencies of the Fitch-Margoliash, Maximum-Parsimony, Maximum-Likelihood, Minimum-Evolution, and Neighbor-joining Methods of Phylogenetic Tree Construction in Obtaining the Correct Tree Phylogeny; Evolutionary tree; Clustering; Distance; JP; Minimum evolution; Phylogenetic; Neighbor joining "The relative efficiencies of several tree-making methods for obtaining the correct phylogenetic tree were studied by using computer simulation. ... If one considers the computational time involved, the [neighbor-joining] method seems to be a method of choice." Mol Biol Evol 1989 6 5 514-525 1199 Sourdis,J. Relative Efficiencies .. Mol.Biol.Evol. 88 5(3):298-311 Sourdis J; Nei M Relative Efficiencies of the Maximum Parsimony and Distance-Matrix Methods in Obtaining the Correct Phylogenetic Tree Phylogeny; Evolutionary tree; Distance; Parsimony; USA "The relative efficiencies of the maximum parsimony (MP) and distance-matrix methods in obtaining the correct tree (topology) were studied by using computer simulation. The distance-matrix methods examined are the neighbor-joining, distance-Wagner, Tateno et al. modified Farris, Faith, and Li methods." 
Mol Biol Evol 1988 5 3 298-311 1200 Tajima,F. Estimation of Evolutio.. Mol.Biol.Evol. 94 11(2):278-286 Tajima F; Takezaki N Estimation of Evolutionary Distance for Reconstructing Molecular Phylogenetic Trees Phylogeny; Evolutionary distance; Evolutionary tree; JP; Distance; Phylogenetic; Estimation "The most commonly used measure of evolutionary distance in molecular phylogenetics is the number of nucleotide substitutions per site. However, this number is not necessarily most efficient for reconstructing a phylogenetic tree. In order to evaluate the accuracy of evolutionary distance for obtaining the correct tree topology, an accuracy index, A(t), was proposed. ... Using A(t), namely, finding the condition under which A(t) gives the maximum value, we can obtain an evolutionary distance which is efficient for obtaining the correct topology." Mol Biol Evol 1994 11 2 278-286 1201 Tateno,Y. Relative Efficiencies .. Mol.Biol.Evol. 94 11(2):261-277 Tateno Y; Takezaki N; Nei M Relative Efficiencies of the Maximum-Likelihood, Neighbor-Joining, and Maximum-Parsimony Methods When Substitution Rate Varies with Site Phylogeny; Substitution; Joining; Likelihood; Parsimony; USA; Rate; Neighbor joining "The relative efficiencies of the maximum-likelihood (ML), neighbor- joining (NJ), and maximum-parsimony (MP) methods in obtaining the correct topology and in estimating the branch lengths for the case of four DNA sequences were studied by computer simulation, under the assumption either that there is variation in substitution rate among different nucleotide sites or that there is no variation." Mol Biol Evol 1994 11 2 261-277 1202 Hendy,M.D. Spectral Analysis of P.. J.Classif. 93 10:5-24 Hendy MD; Penny D Spectral Analysis of Phylogenetic Data Phylogeny; Evolutionary tree; NZ; Spectral analysis; Phylogenetic "The spectral analysis of sequence and distance data is a new approach to phylogenetic analysis. ... 
We develop an optimality selection procedure using a least squares best fit, to find the phylogenetic tree whose tree spectrum most closely matches the conjugate spectrum. An inferred sequence spectrum can be derived from the selected tree spectrum using the inverse Hadamard conjugation to allow a comparison with the original sequence spectrum." J Classif 10 10 5-24 1203 Cavender,J.A. Invariants of Phylogen.. J.Classif. 87 4:57-71 Cavender JA; Felsenstein J Invariants of Phylogenies in a Simple Case with Discrete States Phylogeny; Invariant; Statistical; USA "Under a simple model of transition between two states, we can work out the probabilities of different data outcomes in four species with any given phylogeny. For a given tree topology, if all characters are evolving under the same probabilistic model, there are two quadratic forms in the frequencies of outcomes that must be zero. It may be possible to test the null hypothesis that the tree is of a particular topology by testing whether these quadratic forms are zero. One of the tests is a test for independence in a simple 2 x 2 contingency table." J Classif 4 4 57-71 1204 Sankoff,D. Designer Invariants fo.. Mol.Biol.Evol. 90 7(3):255-269 Sankoff D Designer Invariants for Large Phylogenies Phylogeny; Invariant; Markov; CA "The Cavender-Felsenstein edge-length invariants for binary characters on 4-trees provide the starting point for the development of 'customized' invariants for evaluating and comparing phylogenetic hypotheses. The binary character invariants may be generalized to k-valued characters without losing the quadratic nature of the invariants .... The key to the approach is that certain sets of these configurations constitute events which are probabilistically independent from other such sets, under the symmetric Markov change models studied. By introducing more complex sets of configurations, we find the quadratic invariants for 5-trees in the binary model ...." 
Mol Biol Evol 1990 7 3 255-269 1205 Penny,D. Trees from Sequences: .. Austral.Syst.Bo 90 3(10 Aug.):21- Penny D; Hendy MD; Zimmer EA; Hamby RK Trees from Sequences: Panacea or Pandora's Box? Phylogeny; Reliability; Robustness; Consistency; NZ "There are however still many problems estimating the reliability of the results of tree reconstruction. These are discussed, with examples, under the three headings of sampling error, methodological problems, and human errors. The methodological problems are the hardest to solve. They include the large number of trees, incomplete use of information, inconsistency (converging to an incorrect tree), problems derived from unknown selection pressures on sequences, and trees being an inappropriate model. To overcome these problems, a good method for reconstructing trees should have the properties of being fast, efficient, consistent, robust and falsifiable." Austral Syst Bot 1990 3 10 Aug. 21-38 1206 Hendy,M.D. The Relationship Betwe.. Syst.Zool. 89 38:310-321 Hendy MD The Relationship Between Simple Evolutionary Tree Models and Observable Sequence Data Phylogeny; Evolutionary tree; NZ; Model "Cavender (1978) introduced a model of an evolutionary branching process on a sequence of characters, where the characters take either of two states with symmetric probabilities of change between them. From this model ... we show how to derive some properties of the resulting sequences and distance measures between pairs of taxa. These can be used to test the effectiveness of current algorithms for recovering [a given evolutionary tree], such as parsimony or distance methods. The relationships are described in terms of two matrices of exponential order." Syst Zool 38 38 310-321 1207 Penny,D. Reliability of Evoluti.. 
Cold Spring Har 87 52:857-862 Penny D; Hendy MD; Henderson IM Reliability of Evolutionary Trees Phylogeny; Evolutionary tree; NZ; Reliability "We describe a simple method for maximum likelihood for 2-state characters, using Hadamard matrices. Because these matrices are easily inverted, we can, for a given tree, calculate rates of evolution directly from the data. The method has allowed us to compare maximum likelihood, minimal length, and distance methods for reconstructing evolutionary trees. ... We have recently described ... a new likelihood method that seems to be particularly suitable for the long sequences of ribosomes. These sequences are sufficiently long to test for convergence to a single tree and to allow estimates of deviations from a simple model." Cold Spring Harbor Sympos Quant Biol 52 52 857-862 1208 Hendy,M.D. A Framework for the Qu.. Syst.Zool. 89 38(4):297-309 Hendy MD; Penny D A Framework for the Quantitative Study of Evolutionary Trees Phylogeny; Evolutionary tree; NZ "A direct method for calculating expected data from an evolutionary model for two state characters is described. ... These relationships have been used to analyse the behaviour of tree building algorithms under conditions when there are sufficient data. ... With equal rates of evolution ... we show that for n=4 taxa, parsimony will always converge to the correct tree, but we give examples with n=5 where parsimony will converge on an incorrect tree, even for equal rates of evolution. A further example with n=6 shows convergence to an incorrect tree with equal but arbitrarily small rates of change." Syst Zool 1989 38 4 297-309 1209 Penny,D. Testing the Theory of .. Nature (Lond.) 
82 297(20 May):19 Penny D; Foulds LR; Hendy MD Testing the Theory of Evolution by Comparing Phylogenetic Trees Constructed from five Different Protein Sequences Evolutionary tree; NZ; Evolution; Protein; Phylogenetic "The theory of evolution predicts that similar phylogenetic trees should be obtained from different sets of character data. We have tested this prediction using sequence data for 5 proteins from 11 species. Our results are consistent with the theory of evolution. ... The general conclusions from the present work are that (1) it is possible to make falsifiable predictions from the hypothesis that species have been linked in the past by an evolutionary tree and (2) there is strong support from these five sequences for the theory of evolution." Nature (Lond ) 1982 297 20 May 197-200 1210 Cavender,J.A. Mechanized Derivation .. Mol.Biol.Evol. 89 6(3):301-316 Cavender JA Mechanized Derivation of Linear Invariants Phylogeny; Invariant; Markov; USA "Linear invariants, discovered by Lake, promise to provide a versatile way of inferring phylogenies on the basis of nucleic acid sequences .... A semigroup of Markov transition matrices embodies the assumptions underlying the method, and alternative semigroups exist. The set of all linear invariants may be derived from the semigroup by using an algorithm described here. Under assumptions no stronger than Lake's, there are >50 independent linear invariants for each of the 15 rooted trees linking four species." Mol Biol Evol 1989 6 3 301-316 1211 Cavender,J.A. Taxonomy with Confidence Math.Biosci. 78 40:271-280 Cavender JA Taxonomy with Confidence Phylogeny; Evolutionary tree; USA; Confidence "There are essentially three ways in which four species may be related in a phylogenetic tree graph. It is usual to compute for each of these three possibilities the smallest number of mutations that could have brought about the observed distribution of characteristics among the four species. 
The graph that minimizes this number is then preferred. In fact, the hypothesis that the graph chosen in this way is correct may be accepted with confidence if the minimum is strong in a sense described here. In principle, the theory could be extended to treat sets of more than four species." See the erratum on page 309. Math Biosci 40 40 271-280 1212 Cavender,J.A. Tests of Phylogenetic .. Math.Biosci. 81 54:217-229 Cavender JA Tests of Phylogenetic Hypotheses under Generalized Models Phylogeny; Evolutionary tree; Statistical; USA; Model; Phylogenetic "Prospective topologies of phylogenetic trees can be tested as hypotheses using statistics based on parsimony. If the unknown branch lengths of the trees are different for different characters, the method still works. When the transition probabilities between the states of characters are unequal in a known or unknown degree, the method still works. Hybridization or horizontal gene transfer in the history of a group can never be rejected; whether it can be confidently detected is problematical. Only four species are treated here and only binary characters." Math Biosci 54 54 217-229 1213 Moore,G.W. A Method for Construct.. J.Theor.Biol. 73 38:459-485 Moore GW; Barnabas J; Goodman M A Method for Constructing Maximum Parsimony Ancestral Amino Acid Sequences on a Given Network Phylogeny; Evolutionary tree; USA; Parsimony; Amino acid; Network "A solution is presented for the problem of how to find ancestral codons which minimize the number of mutations over a given network of species for which character-states of aligned amino acid sequences among the contemporary species are known. Three theorems which allow this 'maximum parsimony' problem to be solved are proved; then the use of these theorems in finding maximum parsimony ancestral codons is illustrated on a network of chicken and mammalian alpha globin amino acid sequences at two alignment positions." J Theor Biol 38 38 459-485 1214 Farris,J.S. A Probability Model fo.. 
Syst.Zool. 73 22:250-256 Farris JS A Probability Model for Inferring Evolutionary Trees Phylogeny; Evolutionary tree; Statistical; Stochastic; USA; Probability; Model "Estimation of evolutionary trees should be treated as a problem in statistical inference, but such treatment requires the explicit formulation of a stochastic model of the evolutionary process. Because an evolutionary inference procedure is likely to be put to such uses as deciding the issue of whether rates of evolution are homogeneous, the stochastic model underlying the inference procedure should not assume homogeneity over time of the evolutionary process .... Such a model is constructed, and it is shown that most parsimonious trees are maximum-likelihood estimated evolutionary trees under the stochastic model." Syst Zool 22 22 250-256 1215 Zuckerkandl,E Molecules as Documents.. J.Theor.Biol. 65 8:357-366 Zuckerkandl E; Pauling L Molecules as Documents of Evolutionary History Phylogeny; USA "Different types of molecules are discussed in relation to their fitness for providing the basis for a molecular phylogeny. Best fit are the 'semantides', i.e. the different types of macromolecules that carry the genetic information or a very extensive translation thereof. The fact that more than one coding triplet may code for a given amino acid residue in a polypeptide leads to the notion of 'isosemantic substitutions' in genic and messenger polynucleotides. Such substitutions lead to differences in nucleotide sequence that are not expressed by differences in amino acid sequence. Some possible consequences of isosemanticism are discussed." J Theor Biol 8 8 357-366 1216 Felsenstein,J Maximum Likelihood and.. Syst.Zool. 
73 22:240-249 Felsenstein J Maximum Likelihood and Minimum-Steps Methods for Estimating Evolutionary Trees from Data on Discrete Characters Phylogeny; Likelihood; Evolutionary tree; Statistical; USA "The general maximum likelihood approach to the statistical estimation of phylogenies is outlined, for data in which there are a number of discrete states for each character. The details of the maximum likelihood method will depend on the details of the probabilistic model of evolution assumed. There are a very large number of possible models of evolution. For a few of the simpler models, the calculation of the likelihood of an evolutionary tree is outlined. For these models, the maximum likelihood tree will be the same as the 'most parsimonious' tree if the probability of change during the evolution of the group is assumed a priori to be very small." Syst Zool 22 22 240-249 1217 Felsenstein,J Cases in which Parsimo.. Syst.Zool. 78 27:401-410 Felsenstein J Cases in which Parsimony or Compatibility Methods will be Positively Misleading Phylogeny; Evolutionary tree; Likelihood; USA; Parsimony; Compatibility Republished as Felsenstein (1984). "For some simple three- and four-species cases involving a character with two states, it is determined under what conditions several methods of phylogenetic inference will fail to converge to the true phylogeny as more and more data are accumulated. The methods are the Camin-Sokal parsimony method, the compatibility method, and Farris's unrooted Wagner tree parsimony method. In all cases the conditions for this failure (which is the failure to be statistically consistent) are essentially that parallel changes exceed informative, nonparallel changes." Syst Zool 27 27 401-410 1218 Felsenstein,J A Likelihood Approach .. 
Biol.J.Linn.Soc 81 16:183-196 Felsenstein J A Likelihood Approach to Character Weighting and What It Tells Us about Parsimony and Compatibility Character weight; Statistical; Likelihood; Phylogeny; USA; Parsimony; Compatibility "The statistical framework of maximum likelihood estimation is used to examine character weighting in inferring phylogenies. A simple probabilistic model of evolution is used, in which each character evolves independently among two states, and different lineages evolve independently. When different characters have different known probabilities of change, all sufficiently small, the proper maximum likelihood method of estimating phylogenies is a weighted parsimony method in which the weights are logarithmically related to the rates of change. When rates of change are taken extremely small, the weights become more equal and unweighted parsimony methods are obtained." Biol J Linn Soc 16 16 183-196 1219 Felsenstein,J Parsimony in Systemati.. Annu.Rev.Ecol.S 83 14:313-333 Felsenstein J Parsimony in Systematics: Biological and Statistical Issues Phylogeny; Statistical; USA; Parsimony; Systematics "Quite recently 'parsimony' has become the favored method for inferring phylogenies (evolutionary trees). The accompanying article by Elliott Sober discusses the philosophical issues relating to the status of parsimony from a somewhat different perspective than that adopted here. This review will discuss parsimony, its origins, its major variants, and its biological assumptions." Annu Rev Ecol Syst 14 14 313-333 1220 Swofford,D.L. When are Phylogeny Est.. Phylogenetic .. 91Oxford Universi Swofford DL When are Phylogeny Estimates from Molecular and Morphological Data Incongruent? Miyamoto MM Cracraft J Phylogenetic Analysis of DNA Sequences Phylogeny; Significance; USA "To the extent that character sets 'tell the truth' about their past, phylogenies inferred from different character sets should be congruent with the true tree and therefore with each other. 
... In practice, however, the ideal of perfect congruence is frequently not achieved. ... This chapter has two purposes. First, in keeping with the general theme of this volume, I will review several methods currently being used to assess levels of congruence. Second, I will suggest some additional procedures, facilitated by recent improvements in computer software, that allow a more comprehensive examination of the question posed in the title." Oxford University Press New York 1991 295-333 1221 Navidi,W.C. The Effect of Unequal .. Mol.Biol.Evol. 92 9(6):1163-1175 Navidi WC; Beckett-Lemus L The Effect of Unequal Transversion Rates on the Accuracy of Evolutionary Parsimony Phylogeny; Parsimony; USA; Rate; Transversion; Accuracy "Evolutionary parsimony is an easy-to-use method of phylogenetic inference that is based on nucleic acid sequences and that does not require the assumption that evolutionary processes in the various sites on the molecule are identical. It does, however, require a parameter constraint, known as the 'balanced transversion' assumption. We show that the accuracy of the procedure is fairly insensitive to moderate violations of this assumption - and that the procedure thus is applicable under more general conditions than previously thought." Mol Biol Evol 1992 9 6 1163-1175 1222 Miyata,T. Molecular Evolution of.. J.Mol.Evol. 80 16:23-36 Miyata T; Yasunaga T Molecular Evolution of mRNA: A Method for Estimating Evolutionary Rates of Synonymous and Amino Acid Substitutions from Homologous Nucleotide Sequences and Its Application Substitution; JP; Evolution; Evolutionary rate; Synonymous; Amino acid; Rate; Nucleotide "A method for estimating the evolutionary rates of synonymous and amino acid substitutions from homologous nucleotide sequences is presented." J Mol Evol 16 16 23-36 1223 Lanave,C. A New Method for Calcu.. J.Mol.Evol. 
84 20:86-93 Lanave C; Preparata G; Saccone C; Serio G A New Method for Calculating Evolutionary Substitution Rates Substitution; Stochastic; Markov; Italy; Rate "In this paper we present a new method for analysing molecular evolution in homologous genes based on a general stationary Markov process. The elaborate statistical analysis necessary to apply the method effectively has been performed using Monte Carlo techniques. We have applied our method to the silent third position of the codon of the five mitochondrial genes coding for identified proteins of four mammalian species (rat, mouse, cow and man). We found that the method applies satisfactorily to the three former species, while the last appears to be outside the scope of the present approach." J Mol Evol 20 20 86-93 1224 Jin,L. Limitations of the Evo.. Mol.Biol.Evol. 90 7(1):82-102 Jin L; Nei M Limitations of the Evolutionary Parsimony Method of Phylogenetic Analysis Phylogeny; Parsimony; Invariant; USA; Phylogenetic "Lake's evolutionary parsimony (EP) method of constructing a phylogenetic tree is primarily applied to four DNA sequences. ... However, Lake's method depends on a number of unrealistic assumptions. We therefore examined the theoretical basis of his method and reached the following conclusions: ... (6) As long as a proper distance measure is used, the NJ method is better than the EP and MP methods whether there is a transition/transversion bias or whether there is variation in substitution rate among different nucleotide sites." Mol Biol Evol 1990 7 1 82-102 1225 Navidi,W.C. Methods for Inferring .. Mol.Biol.Evol. 91 8(1):128-143 Navidi WC; Churchill GA; von Haeseler A Methods for Inferring Phylogenies from Nucleic Acid Sequence Data by Using Maximum Likelihood and Linear Invariants Phylogeny; Likelihood; Invariant; Statistical; Significance; USA; Nucleic acid "A likelihood-ratio test may be used to determine the feasibility of any tree for which the maximum likelihood can be computed. 
The method of linear invariants described by Cavender, which includes Lake's method of evolutionary parsimony as a special case, is essentially a form of the likelihood-ratio method. In the case of a small number of species (four or five), these methods may be used to find a confidence set for the correct tree. An exact version of Lake's asymptotic chi-square test has been mentioned by Holmquist et al. Under very general assumptions, a one-sided exact test is appropriate, which greatly increases power." Mol Biol Evol 1991 8 1 128-143 1226 Navidi,W.C. Phylogenetic Inference.. Biometrics 93 49(2):543-555 Navidi WC; Churchill GA; von Haeseler A Phylogenetic Inference: Linear Invariants and Maximum Likelihood Phylogeny; Likelihood; Invariant; Statistical; USA; Phylogenetic "We develop a new statistical method for inferring phylogenies, based on a likelihood ratio test. This method does not require parameter constraints but does require identical evolutionary processes in the sites considered. ... We describe a sound mathematical basis for the use of linear invariants. We show that the validity of the method requires parameter constraints, but does not require that the evolutionary processes in differing sites be identical. We show that the method of linear invariants is asymptotically equivalent to a less powerful version of our likelihood ratio test, and is thus essentially a maximum likelihood technique." Biometrics 1993 49 2 543-555 1227 Staden,R. Automation of the Comp.. Nucleic Acids R 82 10(15):4731-47 Staden R Automation of the Computer Handling of Gel Reading Data Produced by the Shotgun Method of DNA Sequencing Supersequence; Shortest common; Reconstruct; UK; DNA; Reading; Sequencing "This paper describes a computer method for handling gel reading data produced by the shotgun method of DNA sequencing. 
The method greatly reduces the time the sequencer needs to spend checking and editing his data and yet it produces a consensus sequence for which the accuracy of determination of every base can be clearly shown. ... No information is lost in this process as alignments are achieved by making only insertions and because all the individual gel readings are added to a database from which they can be retrieved and displayed lined up one above the other." Nucleic Acids Res 1982 10 15 4731-4751 1228 Hillis,D.M. Molecular Versus Morph.. Annu.Rev.Ecol.S 87 18:23-42 Hillis DM Molecular Versus Morphological Approaches to Systematics Phylogeny; Review; USA; Systematics "In this review, I first outline the advantages of both morphological and molecular approaches to systematics. I then discuss some common differences in assumptions and methods of analysis that can lead to spurious conflict between studies, especially those concerning phylogenetic reconstruction. A major impediment in comparing the two approaches is that the histories of the application of the two techniques to systematic problems differ to a large extent. ... Finally, I discuss ways in which conflicting studies can be reconciled, and I argue for the increased combination of molecular and morphological data in order to maximize phylogenetic information." Annu Rev Ecol Syst 18 18 23-42 1229 Cesari,Y. Une caracterisation de.. C.R.Acad.Sci.Pa 78 286(24):1175-1 Cesari Y; Vincent M Une caracterisation des mots periodiques [A Characterization of Periodic Words] Regularities; Cover; Repetition; FR; DE "We establish the periodicity of words in which all letters admit a double covering." C R Acad Sci Paris Ser A 1978 286 24 1175-1177 1230 Bean,D.R. Avoidable Patterns in .. Pacific J.Math. 79 85(2):261-294 Bean DR; Ehrenfeucht A; McNulty GF Avoidable Patterns in Strings of Symbols String match; Pattern match; USA "A word is just a finite string of letters. The word W avoids the word U provided no substitution instance of U is a subword of W. 
W is avoidable if on some finite alphabet there is an infinite collection of words each of which avoids W. ... Next we examine avoidable words in general and prove that all words of length at least 2n on an alphabet with n letters are simultaneously avoidable. We show that on any finite alphabet the collection of avoidable words is simultaneously avoidable. We provide an effective (recursive) characterization of avoidability." Pacific J Math 1979 85 2 261-294 1231 Tarhio,J. A Greedy Algorithm for.. Lecture Notes i 86 233:602-610 Tarhio J; Ukkonen E A Greedy Algorithm for Constructing Shortest Common Superstrings Supersequence; Shortest common; Reconstruct; FI; Algorithm "An algorithm for constructing shortest common superstrings for a given set R of strings is developed, based on Knuth-Morris-Pratt string matching procedure and on the greedy heuristics for finding longest Hamiltonian paths in weighted graphs. The algorithm runs in O(mn + m^2 log m) steps where m is the number of strings in R and n is the total length of these strings. The compression in the common superstring constructed by the algorithm is shown to be at least half of the compression in a shortest superstring." Lecture Notes in Comput Sci 233 233 602-610 1232 Apostolico,A. On Context Constrained.. RAIRO Inform.Th 84 18(2):147-159 Apostolico A On Context Constrained Squares and Repetitions in a String Repetition; Square; Regularities; Italy "Some combinatorial and computational problems concerning repetitions and repetition roots in a string x on a finite alphabet - that are characterized in general by an O(n log n) bound in terms of the length n of x - are shown to admit of a linear bound when approached in particular contexts." RAIRO Inform Theor 1984 18 2 147-159 1233 Apostolico,A. Efficient Parallel Alg.. SIAM J.Comput. 
90 19(5):968-988 Apostolico A; Atallah MJ; Larmore LL; McFaddin S Efficient Parallel Algorithms for String Editing and Related Problems Editing; Distance; Sequence comparison; Parallel; USA; Algorithm "The string editing problem ... has a well-known O(|x||y|) time-sequential solution. Efficient PRAM parallel algorithms for the string editing problem are given." SIAM J Comput 1990 19 5 968-988 1234 Crochemore,M. Recherche lineaire d'u.. C.R.Acad.Sci.Pa 83 296(18):781-78 Crochemore M Recherche lineaire d'un carre dans un mot [Linear Searching for a Square in a Word] Regularities; Square; Repetition; FR "The search for a square in a word may be implemented in time proportional to the length of the word on a random access machine provided the alphabet is fixed." C R Acad Sci Paris Ser I 1983 296 18 781-784 1235 Hirschberg,D. The Set LCS Problem Algorithmica 87 2:91-95 Hirschberg DS; Larmore LL The Set LCS Problem Longest common; Subsequence; Dynamic programming; USA "An efficient algorithm is presented that solves a generalization of the Longest Common Subsequence problem, in which one of the two input strings contains sets of symbols which may be permuted. This problem arises from a music application." Algorithmica 2 2 91-95 1236 Patterson,C. Homology in Classical .. Mol.Biol.Evol. 88 5(6):603-625 Patterson C Homology in Classical and Molecular Biology Sequence comparison; Homology; UK "Hypotheses of homology are the basis of comparative morphology and comparative molecular biology. The kinds of homologous and nonhomologous relations in classical and molecular biology are explored through the three tests that may be applied to a hypothesis of homology: congruence, conjunction, and similarity. The same three tests apply in molecular comparisons and in morphology, and in each field they differentiate eight kinds of relation. These various relations are discussed and compared." Mol Biol Evol 1988 5 6 603-625 1237 Li,W.H. A Statistical Test of .. Mol.Biol.Evol. 
89 6(4):424-435 Li WH A Statistical Test of Phylogenies Estimated from Sequence Data Evolutionary tree; Significance; Statistical; USA; Phylogeny "A simple approach to testing the significance of the branching order, estimated from protein or DNA sequence data, of three taxa is proposed. The branching order is inferred by the transformed-distance method, under the assumption that one or two outgroups are available, and the branch lengths are estimated by the least-squares method. The inferred branching order is considered significant if the estimated internodal distance is significantly greater than zero. To test this, a formula for the variance of the internodal distance has been developed. The statistical test proposed has been checked by computer simulation." Mol Biol Evol 1989 6 4 424-435 1238 Shoemaker,J.S Evidence from Nuclear .. Mol.Biol.Evol. 89 6(3):270-289 Shoemaker JS; Fitch WM Evidence from Nuclear Sequences that Invariable Sites should be Considered when Sequence Divergence is Calculated Sequence proximity; USA; Divergence "It has long been known, from the distribution of multiple amino acid replacements, that not all amino acids of a sequence are replaceable. More recently, the phenomenon was observed at the nucleotide level in mitochondrial DNA even after allowing for different rates of transition and transversion substitutions. We have extended the search to globin gene sequences from various organisms, with the following results. ... (5) The fit in the latter case suggests, if the assumptions are correct and at all common, that current procedures for estimating the total number of nucleotide substitutions in two genes since their divergence from their common ancestor could be low by as much as an order of magnitude." Mol Biol Evol 1989 6 3 270-289 1239 Fitch,W.M. Correcting Parsimoniou.. Mol.Biol.Evol. 
90 7(5):438-443 Fitch WM; Beintema JJ Correcting Parsimonious Trees for Unseen Nucleotide Substitutions: The Effect of Dense Branching as Exemplified by Ribonuclease Evolutionary rate; Substitution; Sequence proximity; USA; Nucleotide "In a study of mammalian ribonuclease evolutionary rates, we applied the Fitch-Bruschi correction to reduce the bias caused by an unequal sampling of taxa in different lineages. The correction was clearly appropriate but only up to a point. The analysis showed that the sampling of taxa within the pecora was sufficiently intense that no correction for unseen, amino acid-changing, nucleotide substitutions was required." Mol Biol Evol 1990 7 5 438-443 1240 Fitch,W.M. The Evolution of Proka.. Mol.Biol.Evol. 87 4(4):381-394 Fitch WM; Bruschi M The Evolution of Prokaryotic Ferredoxins - With a General Method Correcting for Unobserved Substitutions in Less Branched Lineages Evolutionary rate; Substitution; Correction; Evolution; USA "Appendix. Correction of Limb Length on Most-Parsimonious Trees. It is well recognized that, in most-parsimonious trees, the number of nucleotide substitutions (or amino acid replacements) observed between a sequence and a remote ancestor of it is an increasing function of the number of branching events between the two of them. The effect is to cause lineages with fewer branchings to appear to evolve more slowly. ... We present here a new and simpler method that also corrects for this problem." Mol Biol Evol 1987 4 4 381-394 1241 Tajima,F. A Simple Graphic Metho.. Mol.Biol.Evol. 90 7(6):578-588 Tajima F A Simple Graphic Method for Reconstructing Phylogenetic Trees from Molecular Data Phylogeny; Evolutionary tree; JP; Phylogenetic; Graphic "A simple graphic method is proposed for reconstructing phylogenetic trees from molecular data. 
This method is similar to the unweighted pair-group method with arithmetic mean, but the process of computation of average distances and reconstruction of new matrices, required in the latter method, is eliminated from this new method, so that one can reconstruct a phylogenetic tree without using a computer, unless the number of operational taxonomic units is very large. Furthermore, this method allows a phylogenetic tree to have multifurcating branches whenever there is ambiguity with bifurcation." Mol Biol Evol 1990 7 6 578-588 1242 Bulmer,M. Use of the Method of G.. Mol.Biol.Evol. 91 8(6):868-883 Bulmer M Use of the Method of Generalized Least Squares in Reconstructing Phylogenies from Sequence Data Phylogeny; Least squares; UK; Square "The method of generalized least squares provides a flexible method of phylogenetic reconstruction from sequence data, after reducing them to pairwise distances between species, corrected for multiple and back mutation. It gives efficient estimates of the branch lengths of a given tree. It also provides a natural measure of the departure of the observed from the predicted set of distances which has a chi-square distribution under the true topology; this fact is used to construct a significance test on the topology and so to determine a 'confidence interval' for the set of trees which are compatible with the data." Mol Biol Evol 1991 8 6 868-883 1243 Olsen,G.J. Systematic Underestima.. Mol.Biol.Evol. 91 8(5):592-608 Olsen GJ Systematic Underestimation of Tree Branch Lengths by Lake's Operator Metrics: An Effect of Position-dependent Substitution Rates Substitution; Evolutionary distance; USA; Rate; Systematics "It is shown analytically that operator metrics does not yield the claimed estimate of transversion sequence differences when sequence positions differ in their nucleotide substitution rate, in which case the method underestimates tree branch lengths. 
The site-to-site variations in substitution rate that have been characterized by previous authors are of sufficient magnitude to explain the problems observed in the operator-metrics branch length estimates. Transversion substitutions estimated using Kimura's two-parameter (transition/transversion) model are less subject to this problem and are more consistent with directly observed differences." Mol Biol Evol 1991 8 5 592-608 1244 Takahata,N. Sampling Errors in Phy.. Mol.Biol.Evol. 91 8(4):494-502 Takahata N; Tajima F Sampling Errors in Phylogeny Evolutionary distance; Evolutionary tree; Statistical; Significance; JP; Error; Sampling; Phylogeny "The sampling variance of nucleotide diversity or branch length in a phylogenetic tree constructed by any distance method provides a criterion to judge whether a deduction or an inference made from data is statistically significant. However, computation of the sampling variance is usually tedious .... In this paper, we derive simple formulas for the minimum and maximum values of the sampling variance, which are independent of underlying substitution models. Application of these formulas demonstrates satisfactorily accurate estimates of the sampling variances and therefore their practical use." Mol Biol Evol 1991 8 4 494-502 1245 Jin,L. Relative Efficiencies .. Mol.Biol.Evol. 91 8(3):356-365 Jin L; Nei M Relative Efficiencies of the Maximum-Parsimony and Distance-Matrix Methods of Phylogeny Construction for Restriction Data Phylogeny; Restriction; USA; Joining; Parsimony; UPGMA "The relative efficiencies of the maximum-parsimony (MP), UPGMA, and neighbor-joining (NJ) methods in obtaining the correct tree (topology) for restriction-site and restriction-fragment data were studied by computer simulation." Mol Biol Evol 1991 8 3 356-365 1246 Bafna,V. Sorting by Transpositi.. 94 Bafna V; Pevzner PA Sorting by Transpositions BK - Rearrangement; Transposition; Genomic; USA Preprint received 7 Nov. 1994, 15 pp. 
"The paper addresses the problem of genome comparison versus classical gene comparison and presents algorithms to analyse rearrangements in genomes evolving by transpositions. In the simplest form the problem corresponds to sorting by transpositions, i.e., sorting of an array using transpositions of arbitrary fragments. We derive lower bounds on transposition distance between permutations and present approximation algorithms for sorting by transpositions. The algorithms also imply a non-trivial upper bound on the transposition diameter of the symmetric group." 1994 1247 Lewontin,R.C. Inferring the Number o.. Mol.Biol.Evol. 89 6(1):15-32 Lewontin RC Inferring the Number of Evolutionary Events from DNA Coding Sequence Differences Evolutionary distance; Coding; USA; DNA "The estimation of the amount of evolutionary divergence that has taken place between two DNA coding sequences depends strongly on the degree of constraint on amino acid replacements. If amino acid replacements are relatively unconstrained, the individual nucleotide is the appropriate unit of analysis and the method of Tajima and Nei can be used. If amino acid replacements are constrained, however, this method is shown to be inapplicable. For sequences with strong amino acid constraints, a method is outlined analogous to the Tajima and Nei method using codons as the unit of analysis. Only synonymous substitutions are used." Mol Biol Evol 1989 6 1 15-32 1248 Tajima,F. Statistical Method for.. Mol.Biol.Evol. 
92 9(1):168-181 Tajima F Statistical Method for Estimating the Standard Errors of Branch Lengths in a Phylogenetic Tree Reconstructed without Assuming Equal Rates of Nucleotide Substitution among Different Lineages Evolutionary tree; Evolutionary distance; Statistical; Error; Substitution; JP; Rate; Nucleotide; Phylogenetic "A statistical method is developed for estimating the standard errors of branch lengths in a phylogenetic tree reconstructed without assuming equal rates of nucleotide substitution among different lineages. This method can be easily used for testing whether the length of an interior branch in a reconstructed tree is positive, i.e., whether the topology of the tree is correct. Computer simulations indicate that this method is appropriate for a statistical test. ... The results obtained show that the present method provides a powerful statistical test." Mol Biol Evol 1992 9 1 168-181 1249 DeBry,R.W. The Consistency of Sev.. Mol.Biol.Evol. 92 9(3):537-551 DeBry RW The Consistency of Several Phylogeny-Inference Methods under Varying Evolutionary Rates Phylogeny; Evolutionary rate; USA; Consistency; Rate "A phylogenetic method is a consistent estimator of phylogeny if and only if it is guaranteed to give the correct tree, given that sufficient (possibly infinite) independent data are examined. The following methods are examined for consistency: UPGMA (unweighted pair-group, averages), NJ (neighbor joining), MF (modified Farris), and P (parsimony). A two-parameter model of nucleotide sequence substitution is used, and the expected distribution of character states is calculated. Without perfect correction for superimposed substitutions, all four methods may be inconsistent if there is but one branch evolving at a faster rate than the other branches." Mol Biol Evol 1992 9 3 537-551 1250 Tamura,K. Estimation of the Numb.. Mol.Biol.Evol. 
92 9(4):678-687 Tamura K Estimation of the Number of Nucleotide Substitutions When There Are Strong Transition-Transversion and G+C-Content Biases Substitution; JP; Nucleotide; Estimation "A simple mathematical method is developed to estimate the number of nucleotide substitutions per site between two DNA sequences, by extending Kimura's (1980) two-parameter method to the case where a G+C-content bias exists. This method will be useful when there are strong transition-transversion and G+C-content biases, as in the case of Drosophila mitochondrial DNA." Mol Biol Evol 1992 9 4 678-687 1251 Clark,A.G. Sequencing Errors and .. Mol.Biol.Evol. 92 9(4):744-752 Clark AG; Whittam TS Sequencing Errors and Molecular Evolutionary Analysis Phylogeny; Substitution; Error; USA; Sequencing "Heuristic approaches were used to quantify the influence that sequencing errors have on estimates of nucleotide diversity, substitution rate, and the construction of genealogies. Error rates of < 1 nucleotide/kb probably have little effect on conclusions about evolutionary history of highly polymorphic organisms such as Drosophila and Escherichia coli, but organisms with very low nucleotide diversity, such as humans, require greater sequencing accuracy. A scan of GenBank for corrections of previous errors reveals that sequencing errors are highly nonrandom." Mol Biol Evol 1992 9 4 744-752 1252 Churchill,G.A Sample Size for a Phyl.. Mol.Biol.Evol. 92 9(4):753-769 Churchill GA; von Haeseler A; Navidi WC Sample Size for a Phylogenetic Inference Evolutionary distance; Statistical; Significance; USA; Phylogenetic "The objective of this work is to describe sample-size calculations for the inference of a nonzero central branch length in an unrooted four-species phylogeny. Attention is restricted to independent binary characters, such as might be obtained from an alignment of the purine-pyrimidine sequences of a nucleic acid molecule. 
A statistical test based on a multinomial model for character-state configurations is described. The importance of including invariable sites in models for sequence change is demonstrated, and their effect on sample size is quantified." Mol Biol Evol 1992 9 4 753-769 1253 Allard,M.W. Testing Phylogenetic A.. Mol.Biol.Evol. 92 9(5):778-786 Allard MW; Miyamoto MM Testing Phylogenetic Approaches with Empirical Data, as Illustrated with the Parsimony Method Phylogeny; Evolutionary tree; Significance; USA; Parsimony; Phylogenetic "In the present study, the evolutionary relationships of lipotyphlan insectivores ... are investigated with new mitochondrial DNA sequences of the 12S ribosomal RNA gene. A single phylogeny based on parsimony analyses of these sequences is accepted as well supported according to different criteria, although an exception to this conclusion is noted. This exception forms the basis for an investigation of why an incorrect solution is obtained by the parsimony method in this particular case." Mol Biol Evol 1992 9 5 778-786 1254 Zharkikh,A. Statistical Properties.. Mol.Biol.Evol. 92 9(6):1119-1147 Zharkikh A; Li WH Statistical Properties of Bootstrap Estimation of Phylogenetic Variability from Nucleotide Sequences. I. Four Taxa with a Molecular Clock Evolutionary tree; Bootstrap; Statistical; USA; Clock; Nucleotide; Phylogenetic; Estimation "The statistical properties of sample estimation and bootstrap estimation of phylogenetic variability from a sample of nucleotide sequences are studied by using model trees of three taxa with an outgroup and by assuming a constant rate of nucleotide substitution. The maximum-parsimony method of tree reconstruction is used. An analytic formula is derived for estimating the sequence length that is required if P, the probability of obtaining the true tree from the sampled sequences, is to be equal to or higher than a given value." Mol Biol Evol 1992 9 6 1119-1147 1255 Schoniger,M. A Simple Method to Imp.. 
Mol.Biol.Evol. 93 10(2):471-483 Schoniger M; von Haeseler A A Simple Method to Improve the Reliability of Tree Reconstruction Phylogeny; Reliability; USA "The efficiencies of distance-matrix methods for correct tree reconstruction under a variety of substitution rates, transition-transversion biases, and different model trees were studied. ... We show that a combination of combinatorial weighting by Williams and Fitch (1990) and the Jukes-Cantor (1969) correction significantly increases the efficiency of tree-reconstruction methods, for a large fraction of evolutionary parameters. We explain why this approach is superior to any other weighting/correction scheme tested, as long as .... An approximate threshold for switching to a different weighting scheme is given." Mol Biol Evol 1993 10 2 471-483 1256 Tajima,F. Unbiased Estimation of.. Mol.Biol.Evol. 93 10(3):677-688 Tajima F Unbiased Estimation of Evolutionary Distance between Nucleotide Sequences Evolutionary distance; Substitution; JP; Distance; Nucleotide; Estimation "A new algorithm for estimating the number of nucleotide substitutions per site (i.e., the evolutionary distance) between two nucleotide sequences is presented. This algorithm can be applied to many estimation methods, such as Jukes and Cantor's method (1969), Kimura's transition/transversion method (1980), and Tajima and Nei's method (1984). Unlike ordinary methods, this algorithm is always applicable. Numerical computations and computer simulations indicate that this algorithm gives an almost unbiased estimate of the evolutionary distance, unless the evolutionary distance is very large." Mol Biol Evol 1993 10 3 677-688 1257 Rzhetsky,A. Theoretical Foundation.. Mol.Biol.Evol. 
93 10(5):1073-109 Rzhetsky A; Nei M Theoretical Foundation of the Minimum-Evolution Method of Phylogenetic Inference Phylogeny; Minimum evolution; USA; Phylogenetic "The minimum-evolution (ME) method of phylogenetic inference is based on the assumption that the tree with the smallest sum of branch length estimates is most likely to be the true one. In the past this assumption has been used without mathematical proof. Here we present the theoretical basis of this method by showing that the expectation of the sum of branch length estimates for the true tree is smallest among all possible trees, provided that the evolutionary distances used are statistically unbiased and that the branch lengths are estimated by the ordinary least-squares method." Mol Biol Evol 1993 10 5 1073-1095 1258 Yang,Z. Maximum-Likelihood Est.. Mol.Biol.Evol. 93 10(6):1396-140 Yang Z Maximum-Likelihood Estimation of Phylogeny from DNA Sequences when Substitution Rates Differ over Sites Phylogeny; Likelihood; Substitution; CN; DNA; Rate; Estimation "Felsenstein's (1981) maximum-likelihood approach for inferring phylogeny from DNA sequences assumes that the rate of nucleotide substitution is constant over different nucleotide sites. This assumption is sometimes unrealistic, as has been revealed by analysis of real sequence data. In the present paper Felsenstein's method is extended to the case where substitution rates over sites are described by the gamma distribution. A numerical example is presented to show that the method fits the data better than do previous models." Mol Biol Evol 1993 10 6 1396-1401 1259 Tamura,K. Model Selection in the.. Mol.Biol.Evol. 
94 11(1):154-157 Tamura K Model Selection in the Estimation of the Number of Nucleotide Substitutions Evolutionary distance; Likelihood; USA; Substitution; Selection; Model; Nucleotide; Estimation "Tamura and Nei (1993) recently published a new mathematical model for estimating the number of nucleotide substitutions per site to analyze mitochondrial DNA (mtDNA) control-region sequences from humans and chimpanzees. Although this mathematical model fitted the observed pattern of nucleotide substitution quite well, the goodness of fit of the model has not been tested statistically. In the present communication, I would like to examine Horai et al.'s (1992) data on the coding region of mtDNA and show that Tamura and Nei's model fits observed data better than does Hasegawa et al.'s (1985) model." Mol Biol Evol 1994 11 1 154-157 1260 Yang,Z. Comparison of Models f.. Mol.Biol.Evol. 94 11(2):316-324 Yang Z; Goldman N; Friday A Comparison of Models for Nucleotide Substitution Used in Maximum-Likelihood Phylogenetic Estimation Phylogeny; Likelihood; Substitution; UK; Model; Nucleotide; Phylogenetic; Estimation "Using real sequence data, we evaluate the adequacy of assumptions made in evolutionary models of nucleotide substitution and the effects that these assumptions have on estimation of evolutionary trees. Two aspects of the assumptions are evaluated. The first concerns the pattern of nucleotide substitution, including equilibrium base frequencies and the transition/transversion-rate ratio. The second concerns the variation of substitution rates over sites. The maximum-likelihood estimate of tree topology appears quite robust to both these aspects of the assumptions of the models, but evaluation of the reliability of the estimated tree by using simpler, less realistic models can be misleading." Mol Biol Evol 1994 11 2 316-324 1261 Wakeley,J. Substitution-Rate Vari.. Mol.Biol.Evol. 
94 11(3):436-442 Wakeley J Substitution-Rate Variation among Sites and the Estimation of Transition Bias Substitution; Sequence comparison; USA; Transition; Bias; Estimation "Substitution-rate variation among sites and differences in the probabilities of change among the four nucleotides are conflated in DNA sequence comparisons. When variation in rate exists among sites but is ignored, biases in the rates of change among nucleotides are underestimated. This paper provides a quantification of this effect when the observed proportions of transitions, P, and transversions, Q, between two sequences are used to estimate transition bias. The utility of P/Q as an estimator is examined both with and without rate variation among sites." Mol Biol Evol 1994 11 3 436-442 1262 Kuhner,M.K. A Simulation Compariso.. Mol.Biol.Evol. 94 11(3):459-468 Kuhner MK; Felsenstein J A Simulation Comparison of Phylogeny Algorithms under Equal and Unequal Evolutionary Rates Phylogeny; Simulation; Evolutionary rate; Parsimony; Likelihood; USA; Rate; Algorithm "Using simulated data, we compared five methods of phylogenetic tree estimation: parsimony, compatibility, maximum-likelihood, Fitch-Margoliash, and neighbor joining. ... Maximum likelihood was the most successful method overall, although for short sequences Fitch-Margoliash and neighbor joining were sometimes better. ... Parsimony and compatibility had particular difficulty with inaccuracy and bias when substitution rates varied among different branches. When rates of evolution varied among different sites, all methods showed signs of inaccuracy and bias." Mol Biol Evol 1994 11 3 459-468 1263 Zharkikh,A. Inconsistency of the M.. Syst.Biol. 
93 42(2):113-125 Zharkikh A; Li WH Inconsistency of the Maximum-Parsimony Method: The Case of Five Taxa with a Molecular Clock Phylogeny; Parsimony; Simulation; Monte Carlo; Joining; USA; Clock "The inconsistency of the maximum-parsimony method for the case of five taxa with a molecular clock was studied using an analytical approach and Monte Carlo simulation. The inconsistency occurs in the case of a symmetrical tree with short internal branches and long external branches but can be avoided by using slowly evolving sequences. The neighbor-joining method is consistent if the evolutionary distances between taxa are estimated accurately." Syst Biol 1993 42 2 113-125 1264 Farris,J.S. A Successive Approxima.. Syst.Zool. 69 18:374-385 Farris JS A Successive Approximations Approach to Character Weighting Character weight; USA; Approximation "Characters that are reliable for cladistic inference are those that are consistent with the true phyletic relationships, that is, those that have little homoplasy. ... A technique that infers cladistic relationship by successively weighting characters according to apparent cladistic reliability is suggested, and computer simulation tests of the technique are described. Results indicate that the successive weighting procedure can be highly successful, even when cladistically reliable characters are heavily outnumbered by unreliable ones." Syst Zool 18 18 374-385 1265 Gojobori,T. Estimation of Average .. J.Mol.Evol. 
82 18:414-422 Gojobori T; Ishii K; Nei M Estimation of Average Number of Nucleotide Substitutions When the Rate of Substitution Varies with Nucleotide Evolutionary distance; Substitution; Statistical; Pairwise comparison; USA; Nucleotide; Rate; Estimation "A formal mathematical analysis of Kimura's (1981) six-parameter model of nucleotide substitution for the case of unequal substitution rates among different pairs of nucleotides is conducted, and new formulae for estimating the number of nucleotide substitutions and its standard error are obtained. By using computer simulation, the validities and utilities of Jukes and Cantor's (1969) one-parameter formula, Takahata and Kimura's (1981) four-parameter formula, and our six-parameter formula for estimating the number of nucleotide substitutions are examined under three different schemes of nucleotide substitution." J Mol Evol 18 18 414-422 1266 Golding,G.B. Estimates of DNA and P.. Mol.Biol.Evol. 83 1(1):125-142 Golding GB Estimates of DNA and Protein Sequence Divergence: An Examination of Some Assumptions Evolutionary distance; Statistical; Divergence; USA; Protein; DNA "Some of the assumptions underlying estimates of DNA and protein sequence divergence are examined. A solution for the variance of these estimates that allows for different mutation rates and different population sizes in each species and for an arbitrary structure in the initial population is obtained. It is shown that these conditions do not strongly affect estimates of divergence. In general, they cause the variance of divergence to be smaller than a binomial variance. Thus, the binomial variance that is usually assumed for these estimates is safely conservative." Mol Biol Evol 1983 1 1 125-142 1267 Li,W.H. A New Method for Estim.. Mol.Biol.Evol. 
85 2(2):150-174 Li WH; Wu CI; Luo CC A New Method for Estimating Synonymous and Nonsynonymous Rates of Nucleotide Substitution Considering the Relative Likelihood of Nucleotide and Codon Changes Substitution; Likelihood; Evolutionary distance; Codon; Synonymous; Rate; Nucleotide; USA "A new method is proposed for estimating the number of synonymous and nonsynonymous nucleotide substitutions between homologous genes. In this method, a nucleotide site is classified as nondegenerate, twofold degenerate, or fourfold degenerate, depending on how often nucleotide substitutions will result in amino acid replacement; nucleotide changes are classified as either transitional or transversional, and changes between codons are assumed to occur with different probabilities, which are determined by their relative frequencies among more than 3,000 changes in mammalian genes." Mol Biol Evol 1985 2 2 150-174 1268 Nei,M. Simple Methods for Est.. Mol.Biol.Evol. 86 3(5):418-426 Nei M; Gojobori T Simple Methods for Estimating the Numbers of Synonymous and Nonsynonymous Nucleotide Substitutions Evolutionary distance; Pairwise comparison; Statistical; Substitution; Codon; USA; Synonymous; Nucleotide "Two simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions are presented. Although they give no weights to different types of codon substitutions, these methods give essentially the same results as those obtained by Miyata and Yasunaga's [1980] and by Li et al.'s [1985] methods. Computer simulation indicates that estimates of synonymous substitutions obtained by the two methods are quite accurate unless the number of nucleotide substitutions per site is very large. It is shown that all available methods tend to give an underestimate of the number of nonsynonymous substitutions when the number is large." Mol Biol Evol 1986 3 5 418-426 1269 Takahata,N. A Model of Evolutionar.. 
Genetics 81 98(Jul.):641-6 Takahata N; Kimura M A Model of Evolutionary Base Substitutions and its Application with Special Reference to Rapid Change of Pseudogenes Evolutionary distance; Substitution; Statistical; Pairwise comparison; JP; Model; Pseudogene "A model of evolutionary base substitutions that can incorporate different substitutional rates between the four bases and that takes into account unequal composition of bases in DNA sequences is proposed. Using this model, we derived formulae that enable us to estimate the evolutionary distances in terms of the number of nucleotide substitutions through comparative studies of nucleotide sequences. In order to check the validity of various formulae, Monte Carlo experiments were performed. These formulae were applied to analyze data on DNA sequences from diverse organisms." Genetics 1981 98 Jul. 641-657 1270 Tamura,K. Estimation of the Numb.. Mol.Biol.Evol. 93 10(3):512-526 Tamura K; Nei M Estimation of the Number of Nucleotide Substitutions in the Control Region of Mitochondrial DNA in Humans and Chimpanzees Evolutionary distance; Substitution; Region; DNA; Nucleotide; USA; Estimation "Examining the pattern of nucleotide substitution for the control region of mitochondrial DNA (mtDNA) in humans and chimpanzees, we developed a new mathematical method for estimating the number of transitional and transversional substitutions per site, as well as the total number of nucleotide substitutions. In this method, excess transition, unequal nucleotide frequencies, and variation of substitution rate among different sites are all taken into account." Mol Biol Evol 1993 10 3 512-526 1271 Tajima,F. Estimation of Evolutio.. Mol.Biol.Evol. 
84 1(3):269-285 Tajima F; Nei M Estimation of Evolutionary Distance between Nucleotide Sequences Evolutionary distance; Statistical; Pairwise comparison; Distance; USA; Nucleotide; Estimation "A mathematical formula for estimating the average number of nucleotide substitutions per site (d) between two homologous DNA sequences is developed by taking into account unequal rates of substitution among different nucleotide pairs. Although this formula is obtained for the equal-input model of nucleotide substitution, computer simulations have shown that it gives a reasonably good estimate for a wide range of nucleotide substitution patterns as long as d <= 1. ... A statistical method for estimating the number of nucleotide changes due to deletion and insertion is also developed." Mol Biol Evol 1984 1 3 269-285 1272 Cavalli-Sforz Phylogenetic Analysis:.. Am.J.Hum.Genet. 67 19(3), Part I: Cavalli-Sforza LL; Edwards AWF Phylogenetic Analysis: Models and Estimation Procedures Phylogeny; Likelihood; Evolution; Evolutionary tree; Clustering; Distance; Italy; Model; Phylogenetic; Estimation See also Evolution, 21:550-570(1967). "Acceptance of the theory of evolution as the means of explaining observed similarities and differences among organisms invites the construction of trees of descent purporting to show evolutionary relationships. Whether such trees are based on fossil or living specimens, they may often be criticized for having a subjective element. The purpose of this paper is to show how suitable evolutionary models can be constructed and applied objectively. In it we amplify and extend the methods we have given in previous communications ...." Am J Hum Genet 1967 19 3 233-257 (Part I) 1273 Cavalli-Sforz Phylogenetic Analysis:.. 
Evolution 67 21:550-570 Cavalli-Sforza LL; Edwards AWF Phylogenetic Analysis: Models and Estimation Procedures Phylogeny; Likelihood; Evolution; Evolutionary tree; Clustering; Distance; Italy; Model; Phylogenetic; Estimation See also American Journal of Human Genetics, 19(3), Part I:233-257(1967). "Acceptance of the theory of evolution as the means of explaining observed similarities and differences among organisms invites the construction of trees of descent purporting to show evolutionary relationships. Whether such trees are based on fossil or living specimens, they may often be criticized for having a subjective element. The purpose of this paper is to show how suitable evolutionary models can be constructed and applied objectively. In it we amplify and extend the methods we have given in previous communications ...." Evolution 21 21 550-570 1274 Goldman,N. Maximum Likelihood Inf.. Syst.Zool. 90 39(4):345-361 Goldman N Maximum Likelihood Inference of Phylogenetic Trees, with Special Reference to a Poisson Process Model of DNA Substitution and to Parsimony Analysis Phylogeny; Likelihood; Parsimony; Substitution; UK; Poisson; DNA; Model; Phylogenetic "Maximum likelihood inference is discussed, and some of its advantages and disadvantages are noted. The application of maximum likelihood inference to phylogenetics is examined, and a simple Poisson process model of DNA substitution is used as one example. Further examples follow from the clarification of implicit models underlying traditional 'parsimony' and 'compatibility' analyses. From the elucidation of these models and analyses, it is seen that Poisson process analysis gives a statistically consistent estimate of phylogeny, and that parsimony methods do indeed have a maximum likelihood foundation but give potentially incorrect estimates of phylogeny." Syst Zool 1990 39 4 345-361 1275 Goldman,N. Statistical Tests of M.. J.Mol.Evol. 
93 36:182-198 Goldman N Statistical Tests of Models of DNA Substitution Substitution; Statistical; Phylogeny; Likelihood; Clock; UK; DNA; Model "A test statistic suggested by Cox is employed to test the adequacy of some statistical models of DNA sequence evolution used in the phylogenetic inference method introduced by Felsenstein. Monte Carlo simulations are used to assess significance levels. The resulting statistical tests provide an objective and very general assessment of all the components of a DNA substitution model; more specific versions of the test are devised to test individual components of a model. In all cases, the new analyses have the additional advantage that values of phylogenetic parameters do not have to be assumed in order to perform the tests." J Mol Evol 36 36 182-198 1276 Fukami-Kobaya Robustness of Maximum .. J.Mol.Evol. 91 32:79-91 Fukami-Kobayashi K; Tateno Y Robustness of Maximum Likelihood Tree Estimation Against Different Patterns of Base Substitutions Phylogeny; Likelihood; Substitution; Robustness; JP; Estimation "We first evaluated the robustness of the maximum likelihood (ML) method in the estimation of molecular trees against different nucleotide substitution patterns, including Jukes and Cantor's .... Namely, we conducted computer simulations in which we could set up various evolutionary models of a hypothetical gene, and define a true tree to which an estimated tree by the ML method was to be compared. The results show that topology estimation by the ML method is considerably robust against different ratios of transitions to transversions and different GC contents .... The ML tree estimation based on Jukes and Cantor's model is also revealed to be resistant to GC content, but rather sensitive to the ratio of transitions to transversions." J Mol Evol 32 32 79-91 1277 Nei,M. Relative Efficiencies .. Phylogenetic .. 
91Oxford Universi Nei M Relative Efficiencies of Different Tree-Making Methods for Molecular Data Miyamoto MM Cracraft J Phylogenetic Analysis of DNA Sequences Phylogeny; Reliability; Recovery; Review; USA "There are many different tree-making methods that can be used for molecular data. Each of these methods has some advantages and disadvantages, and the overall relative efficiencies of the methods in recovering the correct phylogenetic tree are still controversial. ... In the late 1970s we initiated a comprehensive study of this problem, considering DNA sequences (Tateno et al., 1982). ... In this chapter, a summary of the results of these studies is presented. Before the discussion of these results, however, the theoretical basis of each tree-making method that is used for molecular data will be presented." Oxford University Press New York 1991 90-128 1278 Camin,J.H. A Method for Deducing .. Evolution 65 19:311-326 Camin JH; Sokal RR A Method for Deducing Branching Sequences in Phylogeny Phylogeny; Parsimony; USA "... those trees which most closely resembled the true cladistics invariably required for their construction the least number of postulated evolutionary steps for the characters studied. Subsequently we examined the possibility of reconstructing cladistics by the principle of evolutionary parsimony. ... A method is described for reconstructing presumed cladistic evolutionary sequences of recent organisms and its implications are discussed. ... The reconstruction proceeds on the hypothesis that the minimum number of evolutionary steps yields the correct cladogram. The method has been programmed for computer processing." Evolution 19 19 311-326 1279 Archie,J.W. A Randomization Test f.. Syst.Zool. 
89 38(3):239-252 Archie JW A Randomization Test for Phylogenetic Information in Systematic Data Phylogenetic; Significance; Statistical; USA; Systematics "A randomization procedure is proposed to determine if sets of data used for phylogenetic analysis contain phylogenetically nonrandom information. The method compares the observed number of steps on a minimum length tree with the mean number of steps on minimum length trees derived from the same data set after character state assignments have been randomly permuted within each character. Such randomized data sets will exhibit exactly the same character state distributions as the original data but no phylogenetic information." Syst Zool 1989 38 3 239-252 1280 Huelsenbeck,J Tree-Length Distributi.. Syst.Zool. 91 40(3):257-270 Huelsenbeck JP Tree-Length Distribution Skewness: An Indicator of Phylogenetic Information Phylogenetic; Statistical; Significance; Simulation; USA; Distribution "Computer simulations in which phylogenies were generated under various conditions were used to examine the relationship between the phylogenetic signal of a character data set, the skewness of the tree-length distribution, and the position of the real tree relative to the most parsimonious tree for a four-character-state system. Character data that are consistent with one phylogenetic hypothesis produce tree-length distributions that are highly skewed to the left, whereas character data consistent with many phylogenetic hypotheses produce more symmetrical tree-length distributions that cannot be distinguished from tree-length distributions produced by random character data." Syst Zool 1991 40 3 257-270 1281 Faith,D.P. Could a Cladogram This.. 
Cladistics 91 7(1):1-28 Faith DP; Cranston PS Could a Cladogram This Short Have Arisen by Chance Alone?: On Permutation Tests for Cladistic Structure Phylogenetic; Cladistic; Statistical; Significance; AU; Structure; Permutation "The length of the most-parsimonious tree reflects the degree to which the observed characters co-vary such that a single tree topology can explain shared character states among the taxa. This 'cladistic covariation' can be quantified by comparing the length of the most parsimonious tree for the observed data set to that found for data sets with random covariation of characters. ... The cladistic permutation tail probability, PTP, is defined as the estimate of the proportion of times that a tree can be found as short or shorter than the original tree. Significant cladistic covariation exists if the PTP is less than a prescribed value, for example, 0.05." Cladistics 1991 7 1 1-28 1282 Fitch,W.M. Cautionary Remarks on .. Syst.Zool. 79 28(3):375-379 Fitch WM Cautionary Remarks on Using Gene Expression Events in Parsimony Procedures Phylogeny; Gene; Expression; Parsimony; USA "It can be seen that the distribution [of tree lengths] is normal .... I have never seen a normal distribution for real sequences before. ... But while the distribution is not skewed in the more usual fashion, that does not prevent the figure from illustrating an important feature that is not sufficiently recognized. That feature is that there may be more than one most parsimonious tree topology, that there may be many tree topologies that are less parsimonious by only a very small number of substitutions, and that the total range of substitutions from the best to the worst tree will most commonly be much less than the number of possible trees if the taxa are reasonably numerous." Syst Zool 1979 28 3 375-379 1283 Hillis,D.M. Discriminating Between.. Phylogenetic .. 
91Oxford Universi Hillis DM Discriminating Between Phylogenetic Signal and Random Noise in DNA Sequences Miyamoto MM Cracraft J Phylogenetic Analysis of DNA Sequences Phylogenetic; Signal; USA; DNA "In parsimony analysis, changes at nucleotide positions among aligned sequences are mapped onto a tree, and the number of evolutionary changes required to accommodate that tree with the data is calculated as the tree length. For any given data set, this procedure may be repeated for many thousands of trees .... The optimal tree is thus the one that requires the fewest number of evolutionary changes. In this chapter, I argue that the shape of the distribution of tree lengths contains information useful in deciding whether or not the data set contains phylogenetic signal." Oxford University Press New York 1991 278-294 1284 Goodman,M. Further Remarks on the.. Syst.Zool. 79 28(3):379-385 Goodman M; Czelusniak J; Moore GW Further Remarks on the Parameter of Gene Duplication and Expression Events in Parsimony Reconstructions Phylogeny; Expression; Gene; Duplication; Parsimony; USA "We share the concern expressed by Walter Fitch (1979) in his cautionary remarks about possible pitfalls in the parsimony procedure advocated by us (Goodman et al., 1979). Thus, while we do not fully agree with all his points, our present remarks are intended to be complementary to his and to thereby help elucidate the problem of constructing a correct genealogical tree from amino acid sequence data. ... Thus, in contrast to the usual maximum parsimony reconstruction which only minimizes the number of nucleotide replacements throughout the tree, the new procedure minimizes the sum of nucleotide replacements, gene duplications, and gene expression events." Syst Zool 1979 28 3 379-385 1285 Hillis,D.M. Signal, Noise, and Rel.. J.Hered. 
92 83:189-195 Hillis DM; Huelsenbeck JP Signal, Noise, and Reliability in Molecular Phylogenetic Analyses Phylogenetic; Statistical; Significance; Signal; Reliability; USA "DNA sequences and other molecular data compared among organisms may contain phylogenetic signal, or they may be randomized with respect to phylogenetic history. Some method is needed to distinguish phylogenetic signal from random noise to avoid analysis of data that have been randomized with respect to the historical relationships of the taxa being compared. ... The distribution of tree lengths of all tree topologies (or a random sample thereof) provides a sensitive measure of phylogenetic signal: data matrices with phylogenetic signal produce tree-length distributions that are strongly skewed to the left .... Tables of critical values of a skewness test statistic, g1, are provided ...." J Hered 1992 83 189-195 1286 Hasegawa,M. Dating of the Human-Ap.. J.Mol.Evol. 85 22:160-174 Hasegawa M; Kishino H; Yano T Dating of the Human-Ape Splitting by a Molecular Clock of Mitochondrial DNA Evolutionary divergence; Statistical; Markov; Clock; JP; DNA "A new statistical method for estimating divergence dates of species from DNA sequence data by a molecular clock approach is developed. This method takes into account effectively the information contained in a set of DNA sequence data." J Mol Evol 1985 22 160-174 1287 Perler,F. The Evolution of Genes.. Cell (Cambridge 80 20:555-566 Perler F; Efstratiadis A; Lomedico P; Gilbert W; Kolodner R; Dodgson J The Evolution of Genes: The Chicken Preproinsulin Gene Evolutionary divergence; Evolution; Gene; Clock; USA "The divergences between insulin gene sequences, and also between globin genes, show that changes at introns and silent positions in coding regions appear very rapidly ..., but that the accumulation of changes in these sites saturates, although not completely, after about 100 million years. 
From this we conclude that not all of these sites are neutral and that they do not behave as accurate evolutionary clocks over long periods of time. However, nucleotide substitutions leading to amino acid replacements are an excellent clock. Our analysis indicates that this clock is driven by selection." Cell (Cambridge, Mass ) 1980 20 555-566 1288 Farris,J.S. Estimating Phylogeneti.. Am.Nat. 72 106(Sept.-Oct. Farris JS Estimating Phylogenetic Trees from Distance Matrices Clustering; Phylogeny; Evolutionary tree; USA; Distance; Phylogenetic "In this paper I shall describe a modification of the Wagner tree-constructing technique of Kluge and Farris. The new procedure operates only upon an OTU x OTU matrix of phenetic differences and has no need to reference a character-state matrix." Am Nat 1972 106 Sept.-Oct. 645-668 1289 Sneath,P.H.A. Numerical Taxonomy: Th.. 73W. H. Freeman Sneath PHA; Sokal RR Numerical Taxonomy: The Principles and Practice of Numerical Classification BK - Hierarchical; Classification; Clustering; Distance; UPGMA; UK See section 5.5 (Sequential, Agglomerative, Hierarchic, Nonoverlapping Clustering Methods, pp. 214-245) and in particular the discussion of UPGMA on pages 230-234. W H Freeman San Francisco 1973 pp. xv+573-0 1290 Holmquist,R. Analysis of Higher-Pri.. Mol.Biol.Evol. 88 5(3):217-236 Holmquist R; Miyamoto MM; Goodman M Analysis of Higher-Primate Phylogeny from Transversion Differences in Nuclear and Mitochondrial DNA by Lake's Methods of Evolutionary Parsimony and Operator Metrics Phylogeny; Evolutionary tree; Character data; Invariant; Parsimony; USA; DNA; Transversion "We concluded that there is no agreement on either the correct branching order or differential rates of evolution among the higher primates .... 
Recently, Lake developed two novel methods, based on group properties of transition and transversion operators, that (a) permit, in principle, objective resolution of problems of the above type and (b) attach a statistical significance level to the conclusions drawn. In the present paper, we develop formulas for using these two methods in tandem and apply them to study transversion differences in nuclear [and] mitochondrial DNA ...." Mol Biol Evol 1988 5 3 217-236 1291 Dixon,M.T. Ribosomal RNA Secondar.. Mol.Biol.Evol. 93 10(1):256-267 Dixon MT; Hillis DM Ribosomal RNA Secondary Structure: Compensatory Mutations and Implications for Phylogenetic Analysis Character weight; Phylogeny; Structure; RNA; USA; Phylogenetic; Secondary "Using sequence data from the 28S ribosomal RNA (rRNA) genes of selected vertebrates, we investigated the effects that constraints imposed by secondary structure have on the phylogenetic analysis of rRNA sequence data. Our analysis indicates that characters from both base-pairing regions (stems) and non-base-pairing regions (loops) contain phylogenetic information, as judged by the level of support of the phylogenetic results compared with a well-established tree based on both morphological and molecular data." Mol Biol Evol 1993 10 1 256-267 1292 Kallersjo,M. Skewness and Permutation Cladistics 92 8(3):275-287 Kallersjo M; Farris JS; Kluge AG; Bult C Skewness and Permutation Phylogenetic; Statistical; Significance; USA; Permutation "Following Fitch's (1979) early suggestion, Le Quesne (1989), Huelsenbeck (1991) and Hillis (1991) have all recommended assessing the phylogenetic structure in systematic data according to the skewness of the distribution of tree lengths. We point out here that such evaluations can be misleading; arguments for that approach are not well-considered. ... To resolve this problem we introduce a test based on a new measure - total support - which takes multiple most parsimonious trees into account. 
Our fast method for approximating support may prove useful in analyses of very large data matrices." Cladistics 1992 8 3 275-287 1293 Sidow,A. Compositional Statisti.. J.Mol.Evol. 90 31:51-68 Sidow A; Wilson AC Compositional Statistics: An Improvement of Evolutionary Parsimony and its Application to Deep Branches in the Tree of Life Phylogeny; Evolutionary tree; Invariant; Parsimony; Statistical; USA "We present compositional statistics, a new method of phylogenetic inference, which is an extension of evolutionary parsimony. Compositional statistics takes account of the base composition of the compared sequences by using nucleotide positions that evolutionary parsimony ignores. It shares with evolutionary parsimony the features of rate invariance and the fundamental distinction between transitions and transversions. Of the presently available methods of phylogenetic inference, compositional statistics is based on the fewest and mildest assumptions about the mode of DNA sequence evolution." J Mol Evol 1990 31 51-68 1294 Eddy,S.R. RNA Sequence Analysis .. Nucleic Acids R 94 22(11):2079-20 Eddy SR; Durbin R RNA Sequence Analysis using Covariance Models Sequence analysis; Probabilistic; Consensus sequence; Structure; UK; RNA; Covariance; Model "We describe a general approach to several RNA sequence analysis problems using probabilistic models that flexibly describe the secondary structure and primary sequence consensus of an RNA sequence family. We call these models 'covariance models'. A covariance model of tRNA sequences is an extremely sensitive and discriminative tool for searching for additional tRNAs and tRNA-related sequences in sequence databases. A model can be built automatically from an existing sequence alignment." Nucleic Acids Res 1994 22 11 2079-2088 1295 Felsenstein,J Confidence Limits on P.. Syst.Zool. 
85 34(2):152-161 Felsenstein J Confidence Limits on Phylogenies With a Molecular Clock Evolutionary tree; Robustness; Analytical; Statistical; Clock; USA; Confidence; Phylogeny "For three species in the presence of a molecular clock, it is possible to compute how many steps a phylogeny must have to be significantly worse than the most parsimonious phylogeny. ... The distribution of two statistics is obtained by direct enumeration of all possible outcomes .... The two statistics are the number of fewer steps in the best tree than in the next best tree, and the number of 'phylogenetically informative' characters supporting the best tree. These two statistics prove to be approximately equivalent in statistical power, and tables of 95%-significance values are provided for each." Syst Zool 1985 34 2 152-161 1296 Williams,S.A. A Statistical Test tha.. Mol.Biol.Evol. 89 6(4):325-330 Williams SA; Goodman M A Statistical Test that Supports a Human/Chimpanzee Clade Based on Noncoding DNA Sequence Data Evolutionary tree; Robustness; Analytical; USA; Statistical; DNA "Using the aligned DNA sequence data of Miyamoto et al. [1988] and Maeda et al. [1988], all noncoding genetic material, and a simple statistical test, we show that a Homo/Pan clade is supported at approximately the 3% level of significance. The method accommodates polymorphism and different evolutionary rates for different sites. All assumptions on which the statistical study is based are made explicit." Mol Biol Evol 1989 6 4 325-330 1297 Felsenstein,J Distance Methods: A Re.. Cladistics 86 2(2):130-143 Felsenstein J Distance Methods: A Reply to Farris Phylogeny; Statistical; USA; Distance "Farris (1985) claimed that my assertions about unbiasedness and consistency of estimates of a phylogeny obtained by least squares fitting are in error. ... 
It is argued, contrary to Farris's claims, that one need not avoid nonmetric distances, and that one should avoid negative branch lengths in estimates of phylogenies from distance data. Statistical tests of clockness, and, to a limited extent, of alternative phylogenies can be constructed, and these are demonstrated by example. ... Information on phylogenies is present in distance data, as in other kinds of data, and statistical methods can be developed to extract it." Cladistics 1986 2 2 130-143 1298 Hillis,D.M. An Empirical Test of B.. Syst.Biol. 93 42(2):182-192 Hillis DM; Bull JJ An Empirical Test of Bootstrapping as a Method for Assessing Confidence in Phylogenetic Analysis Evolutionary tree; Robustness; Resampling; Bootstrap; Confidence; USA; Phylogenetic "Although bootstrapping was first applied in phylogenetics to assess the repeatability of a given result, bootstrap results are commonly interpreted as a measure of the probability that a phylogenetic estimate represents the true phylogeny. Here we use computer simulations and a laboratory-generated phylogeny to test bootstrapping results of parsimony analyses, both as measures of repeatability (i.e., the probability of repeating a result given a new sample of characters) and accuracy (i.e., the probability that a result represents the true phylogeny)." Syst Biol 1993 42 2 182-192 1299 Lanyon,S.M. Detecting Internal Inc.. Syst.Zool. 85 34(4):397-403 Lanyon SM Detecting Internal Inconsistencies in Distance Data Evolutionary tree; Robustness; Jackknife; USA; Distance "Phylogenetic trees, derived from distance measures, may be of variable reliability due to variance in the quality of the data sets from which they are produced. Such trees, therefore, are of questionable value as a means of summarizing large data sets. 
To improve our confidence in these trees, a jackknife technique is presented that, in combination with existing consensus techniques, identifies those portions of evolutionary history that are poorly known due to inconsistencies in the data. ... The approach is a simple modification of existing tree-generating methods." Syst Zool 1985 34 4 397-403 1300 Penny,D. Estimating the Reliabi.. Mol.Biol.Evol. 86 3(5):403-417 Penny D; Hendy M Estimating the Reliability of Evolutionary Trees Evolutionary tree; Robustness; Resampling; Jackknife; Reliability; NZ "Six protein sequences from the same 11 mammalian taxa were used to estimate the accuracy and reliability of phylogenetic trees using real, rather than simulated, data. ... It was concluded that it is possible to give a reasonable estimate of the reliability of the final tree, at least when several sequences are combined. ... In our opinion, it is unreasonable to publish an evolutionary tree derived from sequence data without giving an idea of the reliability of the tree." Mol Biol Evol 1986 3 5 403-417 1301 Sanderson,M.J Confidence Limits on P.. Cladistics 89 5(2):113-129 Sanderson MJ Confidence Limits on Phylogenies: The Bootstrap Revisited Evolutionary tree; Robustness; Resampling; Bootstrap; Confidence; USA; Phylogeny "The bootstrap, a non-parametric statistical analysis, can be used to assess confidence limits on phylogenies. The method most widely used tests the monophyly of individual clades. This paper proposes additional applications of the bootstrap which provide useful information about phylogeny even when many clades are found not to be supported with confidence (as often occurs in practice). In such cases it is still possible to place a constraint on the phylogenetic position of taxa by examining the relative size of the smallest monophyletic groups that contain them." Cladistics 1989 5 2 113-129 1302 Penny,D. Testing Methods of Evo.. 
Cladistics 85 1(3):266-278 Penny D; Hendy MD Testing Methods of Evolutionary Tree Construction Phylogeny; Evolutionary tree; Confidence; Robustness; Character weight; NZ "Evaluating the reliability of methods for reconstructing evolutionary trees is discussed under the four headings of: evaluating criteria for an optimal tree, finding the optimal tree for the criterion selected, detecting reliable and unreliable data, and estimating the error range for the final tree. ... An objective weighting of columns (characters) can lead to an improved tree by giving less weight to columns that are closer to a random order. The weighting of characters is derived from the ratio of the observed to expected number of incompatibilities for each column. Several forms of character weighting give better trees ...." Cladistics 1985 1 3 266-278 1303 Phylogenetic Analysis .. 91Oxford Universi Phylogenetic Analysis of DNA Sequences Miyamoto MM Cracraft J BK - Sequence analysis; Phylogeny; Evolution; USA; DNA; Phylogenetic "This volume has assembled an internationally recognized group of investigators representing different theoretical viewpoints and disciplines to address critically a diversity of questions about DNA systematics. ... This book has its roots in the symposium 'Recent Advances in Phylogenetic Studies of DNA Sequences,' which was part of a special centennial celebration of the American Society of Zoologists, held in conjunction with the Society of Systematic Zoology, on December 26-30, 1989 in Boston Massachusetts. ... Each participant concentrated on the strengths, limitations, and assumptions of their approaches relative to others." Oxford University Press New York 1991 x+358-0 1304 Crochemore,M. Pattern Matching in St.. Image Analysi.. 
88Plenum Crochemore M; Perrin D Pattern Matching in Strings Cantoni V Di Gesu V; Levialdi S Image Analysis and Processing II String match; Pattern match; Factorization; FR "In this paper, we present a new method for pattern matching in strings. ... From the practical viewpoint, its merits consist in requiring only constant additional memory space. It can therefore be compared with the algorithm of [Galil, Seiferas (1983)] but it is faster and simpler. From the theoretical viewpoint, its main feature is that it makes use of a deep theorem on words known as the critical factorization theorem due to Cesari, Duval, Vincent .... It is also amusing that the new algorithm can be considered as a compromise between Knuth-Morris-Pratt's and Boyer-Moore's algorithms." Plenum New York 1988 67-79 1305 Crochemore,M. Constant-space String-.. Foundations o.. 88Springer-Verlag Crochemore M Constant-space String-matching Nori ? Kumar ? Foundations of Software Technology and Theoretical Computer Science String match; Factorization; FR "We present a string-matching algorithm with the following properties: it is linear in time with a small multiplicative constant during all its phases; it processes the searched text with constant memory space in addition to the string. ... During its first phase the algorithm computes the smallest period of the pattern, in some situations. The computation succeeds when this period is not too great. The question remains whether there exists an algorithm computing the smallest period of a word in linear time with constant extra memory space." Springer-Verlag New York 1988 80-87 1306 Crochemore,M. String-Matching and Pe.. EATCS Bull. 89 39:149-153 Crochemore M String-Matching and Periods String match; Regularities; FR "We present a new string-matching algorithm based on a computation of periods of the pattern. It is linear in time and uses a fixed number of memory locations in addition to the text and the pattern. 
Therefore it is time-space-optimal as the algorithms of [Galil & Seiferas 1983] and [Crochemore & Perrin 1989]. Its main characteristic is that it scans the pattern from left to right as [Knuth, Morris & Pratt 1977] does. No preprocessing of the pattern is needed and the complexity is independent of the size of the pattern." EATCS Bull 1989 39 149-153 1307 Bafna,V. Genome Rearrangements .. 94 Bafna V; Pevzner PA Genome Rearrangements and Sorting by Reversals BK - Genome; Rearrangement; Reversal; USA Preprint dated 14 Oct. 1994, 28 pp. "Recently, Kececioglu and Sankoff gave the first approximation algorithm for sorting by reversals with guaranteed error bound 2 and identified open problems related to chromosome rearrangements. One of these problems is Gollan's conjecture on the reversal diameter of the symmetric group. This paper proves the conjecture. Further the problem of expected reversal distance between two random permutations is investigated. ... An approximation algorithm for signed permutations is presented, which provides a performance guarantee of 3/2. Finally, using the signed permutations approach, an approximation algorithm for sorting by reversals is described, which achieves a performance guarantee of 7/4." 1994 1308 Waterman,M. Sequence Comparison Si.. 94 Waterman M; Vingron M Sequence Comparison Significance and Poisson Approximation BK - Sequence comparison; Poisson; Approximation; Chen-Stein; USA; Significance Preprint received 7 Dec. 1994, 44 pp. "The Chen-Stein method of Poisson approximation has been used to establish theorems about comparison of two DNA or protein sequences. The most useful result for sequence alignment applies to alignment scoring for aligned letters and no gaps. However there has not been a valid method to assign statistical significance to alignment scores with gaps. In this paper we extend Poisson approximation techniques using the Aldous clumping heuristic to a practical method of estimating statistical significance." 
1994 1309 Day,W.H.E. Estimating Phylogenies.. Classificatio.. 91Springer-Verlag Day WHE Estimating Phylogenies with Invariant Functions of Data Bock HH Ihm P Classification, Data Analysis, and Knowledge Organization: Models and Methods with Applications Phylogeny; Invariant; CA; Function "What is encouraging, however, is that researchers are beginning to develop methods of estimating phylogenies which may be robust under conditions where parsimony is not. A strategy shared by some of these methods (Cavender, Felsenstein (1987), Lake (1987)) is to use invariant functions of the data to identify the correct topology of the corresponding phylogeny. But which invariants, and how? What assumptions underlie these approaches? I discuss these issues and indicate the direction this research seems to be taking." Springer-Verlag Berlin 1991 248-253 1310 Wolf,K. Variance Estimation in.. Classificatio.. 91Springer-Verlag Wolf K; Degens PO Variance Estimation in the Additive Tree Model Bock HH Ihm P Classification, Data Analysis, and Knowledge Organization: Models and Methods with Applications Additive tree; Robustness; Confidence; Likelihood; Distance; DE; Variance; Model; Estimation "By the use of stochastic models it is possible to judge procedures for fitting additive trees to dissimilarity data. We use the simple additive error model ... to analyse the accuracy of an estimated additive tree by estimating its variance, too. Analogously to the three-object variance estimator in the ultrametric case ... we propose a four-object variance estimator based on the simple maximum-likelihood ... variance estimation for all subsets consisting of any four objects of an additive tree. In contrast to variance estimation using the residual sum of squares this new estimator is not based on the assumed i.e. estimated structure of the given dissimilarity data." Springer-Verlag Berlin 1991 262-269 1311 Archie,J.W. Homoplasy Excess Stati.. Syst.Zool. 
90 39(2):169-174 Archie JW Homoplasy Excess Statistics and Retention Indices: A Reply to Farris Evolutionary tree; Character data; Phylogenetic; USA "In a recent paper (Archie, 1989) I introduced a new approach to measuring levels of homoplasy in phylogenetic data sets in which the observed level of homoplasy is compared to the maximum achievable for a data set containing no phylogenetic information. ... The properties and behavior of the homoplasy excess statistics ... were compared to the consistency index (Kluge and Farris, 1969). ... Farris (1990) discusses several minor points regarding Archie (1989) which deserve comment ...." Syst Zool 1990 39 2 169-174 1312 Sanderson,M.J Flexible Phylogeny Rec.. Syst.Zool. 90 39(4):414-420 Sanderson MJ Flexible Phylogeny Reconstruction: A Review of Phylogenetic Inference Packages Using Parsimony Phylogeny; Parsimony; Program; Review; USA; Phylogenetic A review of PHYLIP, Hennig86 and PAUP. "In short, hastily formed opinions are not hard to find, and it is desirable to have a set of standards by which to judge these programs while minimizing the personal biases that inevitably intrude." Criteria: efficiency, documentation, flexibility, synthesis, benchmark comparisons. Syst Zool 1990 39 4 414-420 1313 Faith,D.P. Probability, Parsimony.. Syst.Biol. 92 41(2):252-257 Faith DP; Cranston PS Probability, Parsimony, and Popper Cladistic; Phylogenetic; Character data; Resampling; AU; Probability; Parsimony "Randomization tests for cladistic structure ... compare the minimum length of the tree found for the original character data to the length of the tree achieved for corresponding randomized data sets. ... In practice, the proportion of all data sets (observed and random) having a tree length as short or shorter than that of the observed tree is defined as the 'cladistic permutation tail probability' or PTP (Faith, Cranston 1991). ... 
In this paper, we formalize this connection between PTP tests and degree of corroboration by developing a link between the tail probability associated with PTP tests and the concept of corroboration as developed by Popper (1959)." Syst Biol 1992 41 2 252-257 1314 Knight,A. Substitution Bias, Wei.. Syst.Biol. 93 42(1):18-31 Knight A; Mindell DP Substitution Bias, Weighting of DNA Sequence Evolution, and the Phylogenetic Position of Fea's Viper Substitution; Character weight; USA; Evolution; DNA; Phylogenetic; Bias "Character state weights were assigned prior to phylogenetic analysis in proportion to the ratio of expected to observed nucleotide differences in pairwise comparisons of sequences. ... This method may help in resolving rapid radiations and other relationships obscured by homoplasy." Syst Biol 1993 42 1 18-31 1315 Steel,M.A. Parsimony can be Consi.. Syst.Biol. 93 42(4):581-587 Steel MA; Hendy MD; Penny D Parsimony can be Consistent! Phylogeny; Parsimony; Consistency; NZ "A desired property of any method for reconstructing evolutionary trees is that it be consistent, i.e., as sequences become longer the method will recover the correct tree with probability tending to 1. ... We report here that the original conclusion is too sweeping in that the problem is not with the parsimony criterion itself but rather with the implementation of the criterion. ... Many criteria, including parsimony and compatibility, are consistent after appropriate nonlinear transformations that adjust for multiple hits (Penny et al., 1993)." Syst Biol 1993 42 4 581-587 1316 Penny,D. Some Recent Progress w.. N.Z.J.Bot. 93 31(3):275-288 Penny D; Watson EE; Hickson RE; Lockhart PJ Some Recent Progress with Methods for Evolutionary Trees Phylogeny; Evolutionary tree; Consistency; Spectral analysis; Parsimony; NZ "We discuss methods for inferring evolutionary trees from these patterns or signals under five properties desired for an ideal method. 
These five desiderata are that the methods be efficient (fast), consistent, powerful, robust, and falsifiable. Our conclusion is that corrections for multiple changes in sequences are the most important factor for any method to be consistent. Most optimality criteria, including compatibility and parsimony, become consistent when the sequences have appropriate corrections for multiple changes. Conversely, virtually no methods are consistent without adjustments for multiple changes." N Z J Bot 1993 31 3 275-288 1317 Schoniger,M. More Reliable Phylogen.. Information a.. 93Springer-Verlag Schoniger M; von Haeseler A More Reliable Phylogenies by Properly Weighted Nucleotide Substitutions Opitz O Lausen B; Klar R Information and Classification: Concepts, Methods and Applications Phylogeny; Substitution; Evolutionary rate; DE; Nucleotide "The efficiency of the neighbor-joining method under a variety of substitution rates, transition-transversion biases and model trees is studied. If substitution rates vary considerably and the ratio of transitions and transversions is large, even a Kimura (1980) two-parameter correction cannot guarantee reconstruction of the model tree. We show that application of the combinatorial weighting method by Williams and Fitch (1990) together with the Jukes-Cantor (1969) correction significantly improves the efficiency of tree reconstructions for a wide range of evolutionary parameters." Springer-Verlag Berlin 1993 413-420 1318 Trifonov,E.N. Nucleotide Sequences a.. Classificatio.. 88Elsevier Scienc Trifonov EN Nucleotide Sequences as a Language: Morphological Classes of Words Bock HH Classification and Related Methods of Data Analysis Sequence analysis; Linguistic; IL; Nucleotide; Word; Language "Like every known written language the nucleotide sequences are repetitive, i.e. certain words (strings) of the four letter alphabet ... occur frequently, while other combinations of letters are avoided. 
There are several morphologically distinct classes of words (morphemes) in this language of the nucleotide sequences (Gnomic language). ... Oligonucleotide ('syllabic') composition of words of semantic dictionary of Gnomic language is discussed. Gnomic 'speech apparatus' appears to favor certain combinations of letters." Elsevier Science Publishers B V (North Holland) Amsterdam 1988 57-64 1319 Vach,W. The Jukes-Cantor Trans.. Analyzing and.. 92Springer-Verlag Vach W The Jukes-Cantor Transformation and Additivity of Estimated Genetic Distances Schader M Analyzing and Modeling Data and Knowledge Substitution; Evolutionary distance; DE; Genetic; Distance "We give a simple derivation for the Jukes-Cantor transformation. The importance of the transformation with respect to distance-based tree constructing methods is demonstrated. We show that it is not justified to expect that the transformation improves the additivity of the estimated distance if non-additivity is measured by the degree of violation of the equation in the four-point condition. Finally, some effects of model violation are discussed." Springer-Verlag Berlin 1992 141-150 1320 Wheeler,W.C. Nucleic Acid Sequence .. Cladistics 90 6(4):363-367 Wheeler WC Nucleic Acid Sequence Phylogeny and Random Outgroups Phylogeny; Character data; Outgroup; USA; Nucleic acid "When divergent taxa are used to root networks, it is assumed that the character states in the outgroup have historical similarity to those in the ingroup. Yet, if the data are nucleic acid sequences, the character states shared by a divergent outgroup may be based not on history but on random similarity. A simple procedure is proposed to test this possibility. In the absence of an appropriate outgroup, root position can be estimated with the use of an asymmetrical character transformation matrix. If the matrix is sufficiently biased, it can supply the polarity information usually derived from an outgroup." Cladistics 1990 6 4 363-367 1321 Wheeler,W.C. 
Combinatorial Weights .. Cladistics 90 6(3):269-275 Wheeler WC Combinatorial Weights in Phylogenetic Analysis: A Statistical Parsimony Procedure Phylogeny; Character weight; Statistical; Parsimony; USA; Combinatorial; Phylogenetic "A data dependent weighting procedure is developed to allow the comparison of phylogenetic trees based on nucleic acid sequence data. The sampling error of this cladogram 'cost' is then examined, permitting statistical evaluation of the cost differential." Cladistics 1990 6 3 269-275 1322 Felsenstein,J Distance Methods for I.. Evolution 84 38(1):16-24 Felsenstein J Distance Methods for Inferring Phylogenies: A Justification Phylogeny; USA; Distance "There are at least two different logical frameworks underlying distance methods. Farris (1981) has presented a major critique of distance methods, finding a number of them to be ill-justified. We shall see that his conclusions come from adopting one of the two logical frameworks, and that when the other is adopted, many of these methods do turn out to have a coherent logical basis. The Path Length Interpretation. ... A Statistical Framework for Distance Methods." Evolution 1984 38 1 16-24 1323 Drolet,S. Quadratic Tree Invaria.. J.Theor.Biol. 90 144:117-129 Drolet S; Sankoff D Quadratic Tree Invariants for Multivalued Characters Phylogeny; Invariant; Character data; CA "Generalization to characters other than binary is difficult because of the computational size of the problem - when the Cavender-Felsenstein method is applied directly to the case of three-valued characters, a quartic polynomial involving 22,050 terms results. Algebraic manipulation with the help of MACSYMA, however, shows that there are quadratic branch-length invariants in this case as well. Similarities in the form of the binary and trinary character invariants suggests a form for the case of four-valued characters and numerous tests confirm this. 
It is this case which will be of use in phylogenetic reconstruction based on nucleotide sequence data." J Theor Biol 1990 144 117-129 1324 De Soete,G. A Least Squares Algori.. Psychometrika 83 48(4):621-626 De Soete G A Least Squares Algorithm for Fitting Additive Trees to Proximity Data Phylogeny; Additive tree; Distance; Least squares; Belgium; Square; Algorithm "A least squares algorithm for fitting additive trees to proximity data is described. The algorithm uses a penalty function to enforce the four point condition on the estimated path length distances. The algorithm is evaluated in a small Monte Carlo study. Finally, an illustrative application is presented." Psychometrika 1983 48 4 621-626 1325 Buneman,P. The Recovery of Trees .. Mathematics i.. 71Edinburgh Unive Buneman P The Recovery of Trees from Measures of Dissimilarity Hodson FR Kendall DG; Tautu P Mathematics in the Archaeological and Historical Sciences Phylogeny; Additive tree; Distance; UK; Recovery "The problem of inferring an evolutionary tree from a set of measurements is one that crops up in various fields .... For example, amino-acid sequences of the same protein extracted from different organisms can be determined, and one can attempt, from the dissimilarities between these sequences, to construct a phylogenetic tree of these organisms. ... The object of this paper is to show that there is a method for inferring a tree from a [dissimilarity coefficient] which has properties that may make it rather more attractive than other currently available methods." Definition of the four-point condition for additive tree metrics. Edinburgh University Press Edinburgh 1971 387-395 1326 Le Quesne,W.J Frequency Distribution.. 
Cladistics 89 5(4):395-407 Le Quesne WJ Frequency Distributions of Lengths of Possible Networks from a Data Matrix Phylogeny; Statistical; Distribution; UK; Network; Matrix "The aim of the present work has been to examine the frequency distributions of the unrooted tree lengths obtained with real and random data, and to see whether any assessment of the information content of the data matrix could be made from the characteristics of this distribution." Cladistics 1989 5 4 395-407 1327 Prager,E.M. Construction of Phylog.. J.Mol.Evol. 78 11:129-142 Prager EM; Wilson AC Construction of Phylogenetic Trees for Proteins and Nucleic Acids: Empirical Evaluation of Alternative Matrix Methods Phylogeny; Distance; Parsimony; UPGMA; USA; Protein; Nucleic acid; Phylogenetic; Matrix "The methods of Fitch and Margoliash and of Farris for the construction of phylogenetic trees were compared. It is suggested that where input data are likely to include overestimates as well as true estimates and underestimates of the actual distances between taxonomic units, the F-M method is the most reasonable to use for constructing phylogenies from distance matrices. ... By contrast, where it is known that each input datum is indeed either a true estimate or an underestimate of the actual distance between 2 taxonomic units, the Farris procedure appears, on theoretical grounds, to be the method of choice. Amino acid and nucleotide sequence data are in this category." J Mol Evol 1978 11 129-142 1328 Farris,J.S. Distance Data Revisited Cladistics 85 1(1):67-85 Farris JS Distance Data Revisited Phylogeny; USA; Distance "Objections to my earlier demonstration, that the branch lengths of trees fitted to distance matrices have no physical interpretation, are shown to be ill-founded. ... A method is introduced for constructing multiple trees of optimal or near-optimal fit to distance data, and this is found to give better performance than previous methods. 
Most published trees based on distances have been poorly chosen. Consensus trees of several trees with near-optimal fit are found to be quite poorly resolved, and it appears that molecular distances seldom provide much useful information on phylogenetic relationships." Cladistics 1985 1 1 67-85 1329 Farris,J.S. Distance Data in Phylo.. Advances in C.. 81New York Botani Farris JS Distance Data in Phylogenetic Analysis Funk VA Brooks DR Advances in Cladistics: Proceedings of the First Meeting of the Willi Hennig Society Phylogeny; USA; Distance; Phylogenetic "It is my aim here to concentrate on the methodological issues posed by distance analyses. I shall begin by tracing the development of techniques in the context of immunological distance, perhaps the most common type of distance data. Later I shall show how that discussion can be extended to other kinds of molecular distances, and to distances generally." Reconstructing Genealogies. Optimal Branch Length Fitting. Improving Fit. Measures of Fit. Clocks and Ultrametrics. Path Length Interpretations. Euclidean Distances. Manhattan Distances. New York Botanical Garden Bronx, NY 1981 3-23 1330 Swofford,D.L. On the Utility of the .. Advances in C.. 81New York Botani Swofford DL On the Utility of the Distance Wagner Procedure Funk VA Brooks DR Advances in Cladistics: Proceedings of the First Meeting of the Willi Hennig Society Phylogeny; USA; Distance "I find little empirical justification for the notion that the F-M [Fitch-Margoliash] method is preferable to the distance Wagner procedure. Indeed, I will show here that the same data sets and trees used by Prager and Wilson (1978) to demonstrate the supposed superiority of the F-M method can in fact be used to support exactly the opposite conclusion. That is, when the two methods are compared fairly, the distance Wagner procedure outperforms the F-M method." New York Botanical Garden Bronx, NY 1981 25-43 1331 Felsenstein,J PHYLIP (Phylogeny Infe.. 
94 Felsenstein J PHYLIP (Phylogeny Inference Package) Version 3.5c: Executables for Macintosh BK - Phylogeny; Program; USA This README document introduces executable programs that are available from J. Felsenstein by anonymous ftp (file transfer protocol). For information on how to obtain the programs, request the file 'Getting PHYLIP 3.5 by ftp' from userid 'joe' at electronic mail address 'genetics.washington.edu' 1994 1332 Farris,J.S. Distances and Statistics Cladistics 86 2(2):144-157 Farris JS Distances and Statistics Phylogeny; Statistical; USA; Distance "Felsenstein's claim of approximate additivity for sequence differences is based on an unjustified model, as is his proposed nonadditive fitting method. His advocacy of the nonnegativity restriction on fitted branch lengths rests on the false premise that distances are additive. His proposed significance test confounds sampling error with departures from additivity and rests on false assumptions of additivity and of independence of distances. His additive fitting program lacks any useful facility for recognizing ambiguities in distance data." Cladistics 1986 2 2 144-157 1333 Penny,D. Towards a Basis for Cl.. J.Theor.Biol. 82 96:129-142 Penny D Towards a Basis for Classification: The Incompleteness of Distance Measures, Incompatibility Analysis and Phenetic Classification Classification; Character data; NZ; Distance "It is shown that information is lost in converting the original data to distances in that it is in general not possible to recover the original data from a distance matrix. ... It is concluded that because of this loss of information the methods of phenetic classification are inherently weaker than methods that retain the original data. An indication is given of how information is lost in transforming to distances. Incompatibility matrices are shown not to contain all the original information but these methods usually retain the original data for tree building." 
J Theor Biol 1982 96 129-142 1334 Vingron,M. Towards Integration of.. 94 Vingron M; von Haeseler A Towards Integration of Multiple Alignment and Phylogenetic Tree Construction BK - Multiple alignment; Evolutionary tree; Phylogeny; DE; Phylogenetic Preprint dated June 1994, 16 pp. "The central problem in the study of molecular evolution is the reconstruction of the history of a set of biological sequences in the form of a phylogenetic tree. One of the steps in calculating this tree is computation of a multiple alignment of the set of sequences. Most existing approaches treat the two problems of multiple alignment and tree construction as separate while in fact they influence each other. Based on three-way alignments of pre-aligned groups of sequences we adapt a commonly used tree construction procedure to produce both tree and multiple alignment simultaneously. A sufficient criterion to prevent the introduction of edges with negative length reduces the number of three-way alignments that need to be computed." 1994 1335 De Soete,G. On the Construction of.. Z.Naturforsch.T 83 38:156-158 De Soete G On the Construction of 'Optimal' Phylogenetic Trees Phylogeny; Distance; Belgium; Phylogenetic "An iterative algorithm for constructing the optimal phylogenetic tree from a given set of dissimilarity data is described. The procedure is applied for illustrative purposes to a data set compiled by Fitch and Margoliash." Z Naturforsch Teil C 1983 38 156-158 1336 Waterman,M.S. Additive Evolutionary .. J.Theor.Biol. 77 64:199-213 Waterman MS; Smith TF; Singh M; Beyer WA Additive Evolutionary Trees Evolutionary tree; Phylogeny; Additive tree; USA "Metric trees are dendrograms which ... have numerical values attached to the branches. ... Metric trees and additive matrices are discussed and the uniqueness of the metric tree for an additive dissimilarity matrix is shown. A simple algorithm is given to generate the metric tree for an additive dissimilarity matrix. 
This algorithm is extended to non-additive dissimilarity matrices through the use of linear programming." J Theor Biol 1977 64 199-213 1337 Swofford,D.L. Reconstructing Ancestr.. Math.Biosci. 87 87:199-229 Swofford DL; Maddison WP Reconstructing Ancestral Character States Under Wagner Parsimony Evolutionary tree; Phylogeny; Character optimization; USA; Parsimony "The problem of assigning optimal character states to the hypothetical ancestors of an evolutionary tree under the Wagner parsimony criterion is examined. A proof is provided for the correctness of Farris's well-known, but previously unproven, algorithm for solving this problem. However, the solution is not, in general, unique, and Farris's method obtains only a subset (generally only one) of the possible solutions. Algorithms that discover other solutions and that resolve ambiguities through the imposition of ancillary criteria are developed and discussed." Math Biosci 1987 87 199-229 1338 Maddison,W.P. MacClade: Analysis of .. 92Sinauer Associa Maddison WP; Maddison DR MacClade: Analysis of Phylogeny and Character Evolution. Version 3 BK - Phylogeny; Character data; Program; USA; Evolution "This book is both a manual for the computer program MacClade, describing its features and potential uses, as well as a portrayal of a phylogenetic approach to studying diversity and evolution. It is relatively easy to see the diversity of living organisms, but it has proved more difficult to see that diversity in terms of its history; the slow development of a thoroughly phylogenetic perspective in biology attests to this challenge. Together this book and program present methods for analyzing and exploring phylogenetic hypotheses, including hypotheses about character evolution." Sinauer Associates Inc, Sunderland, MA 1992 xi+398-0 1339 Sober,E. Parsimony in Systemati.. 
Annu.Rev.Ecol.S 83 14:335-357 Sober E Parsimony in Systematics: Philosophical Issues Phylogeny; Parsimony; Likelihood; USA; Systematics "If one is entirely ignorant of the contingent properties of the evolutionary process that generated a set of taxa (e.g. what forces acted when and with what intensities, or what the probabilities were of certain evolutionary transitions), can one still reasonably use parsimony? Cladists generally say yes, while their critics disagree." Competing methods of phylogenetic inference. A priori defences of parsimony. A posteriori criticisms of parsimony. A likelihood justification of parsimony. Summary. Annu Rev Ecol Syst 1983 14 335-357 1340 Sober,E. Reconstructing the Pas.. 88MIT Press Sober E Reconstructing the Past. Parsimony, Evolution, and Inference BK - Phylogeny; Parsimony; Likelihood; USA; Evolution The biological problem of phylogenetic inference. The philosophical problem of simplicity. The principle of the common cause. Cladistics and the limits of hypothetico-deductivism. Parsimony, likelihood, and Consistency. A model branching process. MIT Press Cambridge, MA 1988 xviii+265-0 1341 Maddison,W. Reconstructing Charact.. Cladistics 89 5(4):365-377 Maddison W Reconstructing Character Evolution on Polytomous Cladograms Phylogeny; Character data; USA; Evolution "New algorithms for both ordered and unordered characters are presented to reconstruct character evolution under the uncertain-resolution interpretation of polytomies. These algorithms allow the cladogram to resolve itself so as to be favourable for the character whose evolution is being reconstructed. Because different characters may have different favourable resolutions, it is not possible in general to use these algorithms to determine the total parsimony of a polytomous cladogram ..., for which the only adequate approach is to find a most parsimonious dichotomous resolution of the cladogram." Cladistics 1989 5 4 365-377 1342 Wang,C. A Subgraph Problem fro.. 
J.Comput.Biol. 94 1(3):227-234 Wang C A Subgraph Problem from Restriction Maps of DNA Restriction; Mapping; DNA; USA "Computing the minimum number of edge removals needed to convert a bipartite graph into an interval graph was proposed by Waterman and Griggs in the study of restriction maps of DNA. We show that this problem is NP-complete and we give a polynomial algorithm that finds an edge-maximum interval subgraph for trees. Then various heuristics can be devised using this algorithm." J Comput Biol 1994 1 3 227-234 1343 Nanney,D.L. Shifting Ditypic Site .. J.Mol.Evol. 89 28:451-459 Nanney DL; Preparata RM; Preparata FP; Meyer EB; Simon EM Shifting Ditypic Site Analysis: Heuristics for Expanding the Phylogenetic Range of Nucleotide Sequences in Sankoff Analyses Phylogeny; Character data; USA; Nucleotide; Heuristic; Phylogenetic "We describe and illustrate a simple heuristic approach to the Sankoff methods for construction of parsimonious evolutionary trees from nucleotide sequence data. The procedure is intended to permit more valid inferences, particularly from relatively short sequences, concerning relationships among taxa separated for long time intervals. The procedure is based on the great variability of evolutionary plasticity among sites in the molecules and removes from consideration the more highly variable sites. ... Only 'ditypic sites,' i.e., sites observed in only two evolutionary states within the array, are used in making phylogenetic inferences." J Mol Evol 1989 28 451-459 1344 Han,J. Over-Representation of.. Nucleic Acids R 94 22(9):1735-174 Han J; Hsu C; Zhu Z; Longshore JW; Finley WH Over-Representation of the Disease Associated (CAG) and (CGG) Repeats in the Human Genome Repeat; Repetition; Genome; Sequence analysis; USA "Expansion of trimer repeats has recently been described as a new type of human mutation. Of the 64 possible trimer compositions, only the CGG and CAG repeats have been implicated in genetic diseases. 
This study intends to address two questions: (1) What makes the CGG and CAG repeats unique? (2) Could other trimer repeats be involved in this type of mutation? ... The computer aided sequence analysis studies reported here may help to understand the molecular mechanisms of trimer repeat expansion." Nucleic Acids Res 1994 22 9 1735-1740 1345 Zhang,Z. An Exponential Example.. J.Comput.Biol. 94 1(3):235-239 Zhang Z An Exponential Example for a Partial Digest Mapping Algorithm Digest; Mapping; USA; Algorithm "The partial digest problem for small-scale DNA physical mapping is known in computer science as the turnpike reconstruction problem. Although no polynomial algorithm for this problem is known, a simple backtracking algorithm of Skiena et al. works well in practice. Weiss raises the question whether an exponential example exists for this algorithm. This paper presents such an exponential example for this backtracking algorithm." J Comput Biol 1994 1 3 235-239 1346 Lake,J.A. Origin of the Eukaryot.. Nature (Lond.) 88 331(14 Jan.):1 Lake JA Origin of the Eukaryotic Nucleus Determined by Rate-Invariant Analysis of rRNA Sequences Phylogeny; Invariant; USA "The second application [of Lake's (1987) method of evolutionary parsimony] to more taxa is to infer a fully resolved branching of many species (Lake 1988); however, it has yet to be described in sufficient detail to be reproduced." - Swofford, Olsen (1990), p. 474. Nature (Lond ) 1988 331 14 Jan. 184-186 1347 Martin,D.R. Equivalence Classes fo.. J.Comput.Biol. 94 1(3):241-253 Martin DR Equivalence Classes for the Double-Digest Problem with Coincident Cut Sites Restriction; Mapping; Digest; DNA; USA "Pevzner (1994) completely characterized the solutions to the [Double Digest Problem (DDP)] in the case of no coincident cut sites by associating solutions to DDP with alternating Eulerian paths in an edge-bicolored graph. 
In this paper we extend the definition of cassettes and their transformations to the general case allowing coincident cut sites. Solutions to the DDP in the general case are again characterized by associating solutions to the DDP with alternating Eulerian cycles in an extended graph." J Comput Biol 1994 1 3 241-253 1348 Fukami,K. On the Maximum Likelih.. J.Mol.Evol. 89 28:460-464 Fukami K; Tateno Y On the Maximum Likelihood Method for Estimating Molecular Trees: Uniqueness of the Likelihood Point Phylogeny; Likelihood; JP "Studies are carried out on the uniqueness of the stationary point on the likelihood function for estimating molecular phylogenetic trees, yielding proof that there exists at most one stationary point, i.e., the maximum point, in the parameter range for the one parameter model of nucleotide substitution. The proof is simple yet applicable to any type of tree topology with an arbitrary number of operational taxonomic units (OTUs). The proof ensures that any valid approximation algorithm be able to reach the unique maximum point under the conditions mentioned above." J Mol Evol 1989 28 460-464 1349 Hendy,M.D. Branch and Bound Algor.. Math.Biosci. 82 59:277-290 Hendy MD; Penny D Branch and Bound Algorithms to Determine Minimal Evolutionary Trees Evolutionary tree; Phylogeny; Program; NZ; Algorithm "Two practical branch and bound algorithms for determining minimal and near-minimal phylogenetic trees from protein sequence data are presented. A mathematical description and analysis of phylogenetic trees introduces these algorithms. A comment on efficiency and fine tuning completes the paper. An example is cited where computer time was reduced from an estimated 55 days for a total search, to just under 5 minutes." Math Biosci 1982 59 277-290 1350 Sharkey,M.J. A Hypothesis-Independe.. 
Cladistics 89 5(1):63-86 Sharkey MJ A Hypothesis-Independent Method of Character Weighting for Cladistic Analysis Character weight; Compatibility; CA; Cladistic "A hypothesis-independent method of weighting cladistic characters, based on character compatibility, is proposed. The method is used in two fashions, to generate cladograms, and to select from multiple minimum length cladograms. ... The method is contrasted with other weighting techniques which are generally found to be hypothesis dependent." Cladistics 1989 5 1 63-86 1351 Hendy,M.D. Identification of Phyl.. J.Theor.Biol. 78 71:441-452 Hendy MD; Penny D; Foulds LR Identification of Phylogenetic Trees of Minimal Length Phylogeny; Parsimony; NZ; Identification; Phylogenetic "The problem of determining an optimal phylogenetic tree from a set of data is an example of the Steiner problem in graphs. There is no efficient algorithm for solving this problem with reasonably large data sets. In the present paper an approach is described that proves in some cases that a given tree is optimal without testing all possible trees. ... We simultaneously attempt to reduce the total length of the tree and increase the lower bound. When these are equal it is not possible to make a shorter tree with a given data set and given criterion." J Theor Biol 1978 71 441-452 1352 Hendy,M.D. Proving Phylogenetic T.. Math.Biosci. 80 51:71-88 Hendy MD; Foulds LR; Penny D Proving Phylogenetic Trees Minimal with l-Clustering and Set Partitioning Phylogeny; Parsimony; Evolutionary tree; NZ; Phylogenetic "The problem of determining a minimal length phylogenetic (evolutionary) tree from a set of aligned protein sequences is defined mathematically. Although this is an example of the Steiner problem in graphs (SPG), we exploit the special nature of the character sequences to solve it more efficiently than by using SPG algorithms. This is done by using a key theorem concerning partitions of the data. 
All optimal solutions for problems with less than six species are classified. In problems where optimality is not immediately achieved, the data must be partitioned." Math Biosci 1980 51 71-88 1353 Lopez-Ortiz,A Linear Pattern Matchin.. SIGACT News 94 25(3):114-121 Lopez-Ortiz A Linear Pattern Matching of Repeated Substrings Pattern match; Repeat; String match; Data structure; Search tree; CA "Weiner (1973) presented a very original algorithm that performs linear time recognition of repeated instances of a substring in a string. Weiner's approach to this problem was as important as the solution to the problem itself. ... Unfortunately, Weiner's paper may be difficult for modern readers. Familiar objects such as trees and other data structures are described using notation drawn from automata theory. Typographical errors and overloading of terms contribute to the difficulty. This paper attempts to explain Weiner's result in a more accessible manner." SIGACT News 1994 25 3 114-121 1354 Moore,G.M. An Iterative Approach .. J.Theor.Biol. 73 38:423-457 Moore GM; Goodman M; Barnabas J An Iterative Approach from the Standpoint of the Additive Hypothesis to the Dendrogram Problem Posed by Molecular Data Sets Phylogeny; Distance; USA "The problem of constructing a dendrogram depicting phylogenetic relationships for a collection of contemporary species is considered. An approach was developed based on the additive hypothesis in which each 'length' between two species can be described by the shortest sum of lengths for the individual links on the dendrogram topology which connect the two species. The additive hypothesis holds equally well if the dendrogram is replaced by its corresponding (rootless) network. Network topologies are defined set theoretically in terms of the initial, contemporary species ...." J Theor Biol 1973 38 423-457 1355 Maddison,W.P. Interactive Analysis o.. Folia Primatol. 
89 53(1-4):190-20 Maddison WP; Maddison DR Interactive Analysis of Phylogeny and Character Evolution Using the Computer Program MacClade Phylogeny; Character data; Program; USA; Evolution "Computer programs for phylogenetic analysis have been important tools in systematics and evolutionary biology, but most have been designed primarily for the reconstruction of phylogenetic trees and not the interpretation of patterns of character evolution. Described here is the computer program MacClade, designed for interactive analysis of character evolution and phylogeny." Folia Primatol 1989 53 1-4 190-202 1356 Sober,E. A Likelihood Justifica.. Cladistics 85 1(3):209-233 Sober E A Likelihood Justification of Parsimony Phylogeny; Parsimony; Likelihood; USA "A connection is established between maximally parsimonious cladograms and trees of highest likelihood. The assumptions needed to prove this are derivable from the structure of evolutionary theory and are independent of the frequency of homoplasy. The bearing of this justification on alternative methods of phylogenetic inference and on Felsenstein's (1978) proof that parsimony and other phylogenetic methods can be statistically inconsistent is discussed." Cladistics 1985 1 3 209-233 1357 Sober,E. Parsimony and Characte.. Cladistics 86 2(1):28-42 Sober E Parsimony and Character Weighting Phylogeny; Character weight; Likelihood; USA; Parsimony "The likelihood justification of parsimony proposed in Sober (1983, 1984) is applied to some problems posed by character weighting. An argument is provided for thinking that the point frequency of a character is not a good descriptor for parsimonious reconstructions of a phylogeny. The idea that a good character will be conservative or nonadaptive is also examined from a likelihood point of view." Cladistics 1986 2 1 28-42 1358 McGuire,J.B. On the Reconstruction .. J.Theor.Biol. 
78 75:141-147 McGuire JB; Thompson CJ On the Reconstruction of an Evolutionary Order Phylogeny; Axiomatic; USA "The problem of reconstructing an evolutionary order from various taxonomic criteria may be thought of as specifying a computer program or evolution function e that takes as input the taxonomic orders based on the criteria and produces a single composite evolutionary order as output. We specify four conditions that any such e should satisfy. Taken separately the conditions seem reasonable but taken together they are inconsistent." J Theor Biol 1978 75 141-147 1359 Vach,W. Least Squares Approxim.. CSQ - Comput.St 91 3:203-218 Vach W; Degens PO Least Squares Approximation of Additive Trees to Dissimilarities - Characterizations and Algorithms Additive tree; Distance; Least squares; Square; Approximation; Monte Carlo; DE; Characterization; Algorithm "We consider the problem of fitting an additive tree to a given dissimilarity matrix by least squares approximation. We present several characterizations of the local solutions to this approximation problem. One of them leads directly to a new algorithmic approach extending the agglomerative construction principle which is well established in hierarchical clustering. Some new and traditional tree constructing methods are compared in a Monte Carlo study with regard to their ability to redetect substructures of a true tree." CSQ - Comput Statist Quart 1991 3 203-218 1360 Rodrigo,A.G. A Modification to Whee.. Cladistics 92 8(2):165-170 Rodrigo AG A Modification to Wheeler's Combinatorial Weights Calculations Substitution; Character weight; Phylogeny; NZ; Combinatorial "Wheeler (1990) proposed a procedure for weighting the transformations between all nucleotide pairs, from a set of aligned nucleotide sequences. ... In this paper I show that the normalization procedure estimates the conditional probability ... instead of the 'total' probability .... 
I argue that an estimate of the latter probability is more appropriate for phylogenetic analysis and I present a modification of Wheeler's method. Finally, I show how we may estimate asymmetric transformation probabilities using an outgroup. If there is a reasonable outgroup available, this method may be preferable to the other described here." Cladistics 1992 8 2 165-170 1361 Rodrigo,A.G. An Information-Rich Ch.. N.Z.Nat.Sci. 89 16:97-103 Rodrigo AG An Information-Rich Character Weighting Procedure for Parsimony Analysis Phylogeny; Character weight; Parsimony; NZ "A weighting procedure is proposed which takes account of prior information pertaining to the characters used in a parsimony analysis. This information comes from specific knowledge about the biology of the group in question, as well as general evolutionary theory. ... The procedure is an iterative one, and can be terminated once the resultant tree has converged to a 'constant' value, or after a predetermined number of runs. The resultant tree may or may not be as short as the most parsimonious tree. It is argued that in taking account of prior information, the proposed procedure is information-rich (IR)." N Z Nat Sci 1989 16 97-103 1362 Albert,V.A. On the Rationale and U.. Cladistics 92 8(1):73-83 Albert VA; Mishler BD On the Rationale and Utility of Weighting Nucleotide Sequence Data Phylogeny; Character weight; USA; Nucleotide "These issues are germane to the weighting scheme recently proposed by W. Wheeler (1990) for use with nucleotide sequence data. We will address Wheeler's weighting approach from two angles: (i) the assumptions that must be made in order to justify its use; and (ii) its convergence, albeit in an inferior manner, to the within-character weighting approach already developed by David Sankoff and colleagues ...." Cladistics 1992 8 1 73-83 1363 Mishler,B.D. The Use of Nucleic Aci.. 
Taxon 88 37:391-395 Mishler BD; Bremer K; Humphries CJ; Churchill SP The Use of Nucleic Acid Sequence Data in Phylogenetic Reconstruction Phylogeny; USA; Phylogenetic; Nucleic acid "Considerable interest has recently been focused on nucleic acid sequence data as a source of phylogenetic information. ... With respect to higher-level relationships of green plants, 5S RNA sequences provide the most information at the present time .... In an earlier paper ... we attempted to apply these data to a cladistic analysis of the green plants, but were discouraged with the results because of considerable homoplasy in the data. Steele et al. (1988) have objected to our rejection of these particular data in that analysis. We wish to respond to their concerns and more generally discuss prospects and problems with nucleic acid sequences as systematic evidence." Taxon 1988 37 391-395 1364 Wheeler,W. Quo Vadis? Cladistics 92 8(1):85-86 Wheeler W Quo Vadis? Phylogeny; Character weight; USA "Albert and Mishler (1992) raise several points in the criticism of my two papers (Wheeler, 1990a,b) describing and using combinatorial weights. Overall, the points raised are divisible into two types, those which arise from a misconception as to the meaning of the weights and those which are based on a probabilistic model of their construction. I will discuss their criticism in this light." Cladistics 1992 8 1 85-86 1365 Bryant,H.N. The Role of Permutatio.. Syst.Biol. 92 41(2):258-263 Bryant HN The Role of Permutation Tail Probability Tests in Phylogenetic Systematics Phylogenetic; Statistical; Probability; CA; Permutation; Systematics "Faith and Cranston (1992) attempted to forge a formal link between the cladistic permutation tail probability (PTP) associated with their randomization test for cladistic structure (Faith and Cranston, 1991) and Karl Popper's (1959) concept of corroboration. ... 
As a reviewer of an earlier version of their paper, I argued that the null model - namely that the characters in the data matrix will covary at random - is contrary to the basic axioms of phylogenetic systematics. ... Discussion of these problems leads to a slightly different interpretation of the role of PTP testing in the evaluation of most-parsimonious cladograms." Syst Biol 1992 41 2 258-263 1366 Felsenstein,J Methods for Inferring .. Numerical Tax.. 83Springer-Verlag Felsenstein J Methods for Inferring Phylogenies: A Statistical View Felsenstein J Numerical Taxonomy. NATO ASI Series, Vol. G1 Phylogeny; Statistical; USA "While throughout the rest of science it is generally accepted that statistics is the framework within which inferences from data ought to be made, in systematics this is a minority viewpoint. Nonstatistical principles such as parsimony are usually invoked as underlying the logic of the inferences. I think that these principles are preferred precisely because they have an aura of certainty that a statistical framework cannot provide. The remainder of this paper will explore the implications of a statistical viewpoint on inferring phylogenies. Readers who want a more extended account can consult my recent review of methods for inferring phylogenies (Felsenstein 1982)." Springer-Verlag Berlin 1983 315-334 1367 Felsenstein,J Parsimony and Likeliho.. Syst.Zool. 86 35(4):617-626 Felsenstein J; Sober E Parsimony and Likelihood: An Exchange Phylogeny; Likelihood; Parsimony; USA "This is intended as an exploration of the differences between the authors on the relationship between parsimony and likelihood. We have used the format pioneered by Harper and Platnick (1978) of an exchange of comments, each by one of the authors." Syst Zool 1986 35 4 617-626 1368 Neff,N.A. A Rational Basis for A.. Syst.Zool. 
86 35(1):110-123 Neff NA A Rational Basis for A Priori Character Weighting Character weight; USA "Previously presented arguments for and against character weighting in systematic analyses are briefly reviewed and the bases for different weighting methods summarized. A priori and a posteriori methods are defined. I conclude that a priori weighting is the only noncircular approach for weighting of characters in the construction or recognition of groups of taxa, but that no objective method of a priori weighting has been proposed to date. A hypothetico-deductive methodology for character analysis completely prior to and independent of cladistic analysis (or phylogeny reconstruction) is briefly summarized." Syst Zool 1986 35 1 110-123 1369 Saitou,N. A Theoretical Study of.. Syst.Zool. 89 38(1):1-6 Saitou N A Theoretical Study of the Underestimation of Branch Lengths by the Maximum Parsimony Principle Phylogeny; Evolutionary tree; Parsimony; JP "The degree of underestimation of branch lengths by the maximum parsimony principle is studied. The expected number of nucleotide changes per site under the maximum parsimony principle is computed, and it is compared with the expected number of nucleotide substitutions. ... It is shown that as long as the evolutionary distance is less than 0.2, the maximum parsimony principle gives good estimates of nucleotide substitutions. When the evolutionary distance is greater than 0.2, however, the method gives gross underestimates of nucleotide substitutions." Syst Zool 1989 38 1 1-6 1370 Goloboff,P.A. Character Optimization.. Cladistics 93 9(4):433-436 Goloboff PA Character Optimization and Calculation of Tree Lengths Evolutionary tree; Character optimization; USA; Optimization "In cladistics, character optimization (Farris, 1970) is the process of finding the possible assignments of states to the internal nodes of a tree such that the steps, or length, for the character are the minimum possible ....
For non-additive characters, all the states that occur in at least one possible optimization can be found using Fitch's (1971) two-pass algorithm. For additive characters, the only published algorithm is Swofford and Maddison's (1987). I describe here another algorithm which, like Swofford and Maddison's, deals only with dichotomous trees, but is possibly more efficient and simpler to program." Cladistics 1993 9 4 433-436 1371 Farris,J.S. Methods for Computing .. Syst.Zool. 70 19:83-92 Farris JS Methods for Computing Wagner Trees Phylogeny; Evolutionary tree; Character optimization; USA "The article derives some properties of Wagner Trees and Networks and describes computational procedures for Prim Networks, the Wagner Method, Rootless Wagner Method and optimization of hypothetical intermediates." Syst Zool 1970 19 83-92 1372 Swofford,D.L. Parsimony, Character-s.. Systematics, .. 92 Swofford DL; Maddison WP Parsimony, Character-state Reconstructions, and Evolutionary Inferences Mayden RL Systematics, Historical Ecology, and North American Freshwater Fishes Phylogeny; Parsimony; Character optimization; USA Goloboff (1993), p. 436 1992 186-223 1373 Goloboff,P.A. Estimating Character W.. Cladistics 93 9(1):83-91 Goloboff PA Estimating Character Weights during Tree Search Phylogeny; Evolutionary tree; Character weight; USA "A new method for weighting characters according to their homoplasy is proposed; the method is non-iterative and does not require independent estimations of weights. It is based on searching trees with maximum total fit, with character fits defined as a concave function of homoplasy. Then, when comparing trees, differences in steps occurring in characters which show more homoplasy on the trees are less influential. The reliability of the characters is estimated, during the analysis, as a logical implication of the trees being compared.
The 'fittest' trees imply that the characters are maximally reliable and, given character conflict, have fewer steps for the characters which fit the tree better." Cladistics 1993 9 1 83-91 1374 Hide,W. Biological Evaluation .. J.Comput.Biol. 94 1(3):199-215 Hide W; Burke J; Davison DB Biological Evaluation of d2, an Algorithm for High-Performance Sequence Comparison Sequence comparison; Database search; Sequence proximity; Sequence database; USA; Algorithm "A number of algorithms exist for searching sequence databases for biologically significant similarities based on the primary sequence similarity of aligned sequences. We have determined the biological sensitivity and selectivity of d2, a high-performance comparison algorithm that rapidly determines the relative dissimilarity of large datasets of genetic sequences. d2 uses sequence-word multiplicity as a simple measure of dissimilarity. It is not constrained by the comparison of direct sequence alignments and so can use word contexts to yield new information on relationships. ... A theoretical analysis of the expectation for scores is presented." J Comput Biol 1994 1 3 199-215 1375 Davis,J.I. Character Removal as a.. Cladistics 93 9(2):201-210 Davis JI Character Removal as a Means for Assessing Stability of Clades Evolutionary tree; Robustness; USA "The stability of each clade resolved by a data set can be assessed as the minimum number of characters that, when removed, cause resolution of the clade to be lost; a clade is regarded as having been lost when it does not occur in the strict consensus tree. The clade stability index (CSI) is the ratio of this minimum number of characters to the number of informative characters in the data set. The CSI of a clade can range from 0 (absence from the consensus tree of the complete data set) to 1 (all informative characters must be removed for the clade to fail to be resolved)." Cladistics 1993 9 2 201-210 1376 Cracraft,J. Parsimony and Phylogen.. Phylogenetic .. 
91Oxford Universi Cracraft J; Helm-Bychowski K Parsimony and Phylogenetic Inference using DNA Sequences: Some Methodological Strategies Miyamoto MM Cracraft J Phylogenetic Analysis of DNA Sequences Phylogeny; Parsimony; Reliability; Informativeness; USA; DNA; Phylogenetic "This chapter addresses two of the more general problems involving the use of parsimony procedures in phylogenetic inference: (1) given that there are physico-chemical/functional constraints on sequence evolution, especially in sequences coding for proteins or structural RNAs, how might parsimony be applied in order to infer phylogenetic relationships, and (2) how might we judge the phylogenetic informativeness of sequence data?" Oxford University Press New York 1991 184-220 1377 Fitch,W.M. The Estimate of Total .. Phil.Trans.R.So 86 312:317-324 Fitch WM The Estimate of Total Nucleotide Substitutions from Pairwise Differences is Biased Evolutionary rate; Evolutionary divergence; Substitution; USA; Nucleotide "A nomographic method is presented that estimates the number of nucleotide substitutions since the common ancestor of two nucleotide sequences with no assumption about the proportion of transition and transversion substitutions except that it is constant over time. ... Mitochondrial data provide evidence that, for this and probably other current models correcting for superimposed substitutions, one or more of the underlying assumptions is incorrect. This is because there is some unknown systematic bias affecting this evolutionary process. It is suggested that at least part of the bias arises from incorrectly assuming that all sites are variable." Phil Trans R Soc Lond Ser B 1986 312 317-324 1378 Zharkikh,A. Statistical Properties.. J.Mol.Evol. 92 35(4):356-366 Zharkikh A; Li WH Statistical Properties of Bootstrap Estimation of Phylogenetic Variability from Nucleotide Sequences. II.
Four Taxa Without a Molecular Clock Evolutionary tree; Bootstrap; Statistical; Clock; USA; Nucleotide; Phylogenetic; Estimation Zharkikh, Li (1993), p. 125 J Mol Evol 1992 35 4 356-366 1379 Hedges,S.B. The Number of Replicat.. Mol.Biol.Evol. 92 9(2):366-369 Hedges SB The Number of Replications Needed for Accurate Estimation of the Bootstrap P Value in Phylogenetic Studies Phylogeny; Bootstrap; USA; Phylogenetic; Estimation "The bootstrap is a statistical method for obtaining a nonparametric estimate of error. ... The application of bootstrapping to phylogeny estimation is a tradeoff between the maximum number of replications that can be performed by the researcher in a reasonable amount of time and the minimum number of replications needed for accurate estimation of the bootstrap P value (BP). The purpose of the present report is to explore the variance (and hence the accuracy) of the phylogenetic BP and to establish guidelines for efficient bootstrap sampling." Mol Biol Evol 1992 9 2 366-369 1380 Mindell,D.P. Aligning DNA Sequences.. Phylogenetic .. 91Oxford Universi Mindell DP Aligning DNA Sequences: Homology and Phylogenetic Weighting Miyamoto MM Cracraft J Phylogenetic Analysis of DNA Sequences Multiple alignment; Sequence weight; Homology; USA; DNA; Phylogenetic "The object of this chapter is to examine the practice of DNA sequence alignment in the context of homology assessment. I point out that species sequences should be aligned in descending order of phylogenetic relationship (phylogenetic weighting of alignments) to maintain the continuity of information which forms the basis of relationships of homology. Using mitochondrial ribosomal RNA (rRNA) sequences, I also show how shuffling the order of input for sequences in multiple alignments may be used to help determine phylogenetic relationships among taxa whose divergences occurred relatively close together in time." Oxford University Press New York 1991 73-89 1381 Sidow,A. Compositional Statisti.. 
Phylogenetic .. 91Oxford Universi Sidow A; Wilson AC Compositional Statistics Evaluated by Computer Simulations Miyamoto MM Cracraft J Phylogenetic Analysis of DNA Sequences Phylogeny; Evolutionary tree; Invariant; Simulation; USA "This chapter is about a new method called compositional statistics, which is most suitable for elucidating relationships among highly diverged sequences. In contrast to most other methods, it takes into account the sequences' base compositions. Our discussion emphasizes the idea that biases in the base composition of the compared sequences may affect a phylogenetic analysis and produce systematic errors if not properly corrected for. ... In order to point out the most useful applications of compositional statistics, we first discuss some important strengths and weaknesses of the most commonly used methods of phylogenetic inference." Oxford University Press New York 1991 129-146 1382 Felsenstein,J Is There Something Wro.. Syst.Biol. 93 42(2):193-200 Felsenstein J; Kishino H Is There Something Wrong with the Bootstrap on Phylogenies? A Reply to Hillis and Bull Phylogeny; Bootstrap; Reliability; Evolutionary tree; USA "We argue that these phenomena are not a result of using the bootstrap but are a result of summarizing the evidence for a group by using a P value. Hillis and Bull's phenomena are rather precisely duplicated in a much simpler model of estimating where the mean of a normal distribution is, a model having no bootstrapping. As in empirical studies, we can often get a clearer picture by considering a simple example. Finally, we show that there is another straightforward meaning of the P value that is not invalidated by Hillis and Bull's criticisms and that can be taken as the 'real' meaning of the bootstrap P value." Syst Biol 1993 42 2 193-200 1383 Sanderson,M.J MacClade, Version 3.0 Syst.Biol. 
93 42(2):218-220 Sanderson MJ MacClade, Version 3.0 Phylogeny; Character data; Review; Program; USA "Aside from a few minor bugs, the only real deficiency is the lack of support for System 7, the recent update of the Macintosh operating system. ... In the meantime, they have done an admirable job of not only satisfying most workers' requirements but also of making a statement about the level of sophistication necessary in studies of character evolution - and they have provided the means to achieve it." Syst Biol 1993 42 2 218-220 1384 Felsenstein,J Counting Phylogenetic .. J.Theor.Biol. 91 152:357-376 Felsenstein J Counting Phylogenetic Invariants in Some Simple Cases Phylogeny; Invariant; Evolutionary tree; USA; Phylogenetic "An informal degrees of freedom argument is used to count the number of phylogenetic invariants in cases where we have three or four species and can assume a Jukes-Cantor model of base substitution with or without a molecular clock. ... Two new classes of invariants are found: non-phylogenetic cubic invariants testing independence of evolutionary events in different lineages, and linear phylogenetic invariants which occur when there is a molecular clock. Most of the linear invariants found by Cavender (1989) turn out in the Jukes-Cantor case to be simple tests of symmetry of the substitution model, and not phylogenetic invariants." J Theor Biol 1991 152 357-376 1385 Hendy,M.D. Hadamard Conjugation: .. N.Z.J.Bot. 93 31(3):231-237 Hendy MD; Charleston MA Hadamard Conjugation: A Versatile Tool for Modelling Nucleotide Sequence Evolution Phylogeny; Hadamard; Invariant; NZ; Evolution; Nucleotide "Hadamard conjugation has proved to be a useful tool in examining some of the properties of the patterns of nucleotide sequences arising from the evolution of the taxa they represent. It has a considerable advantage in that the formulae are independent of the phylogenetic structure under consideration ....
Hadamard conjugation is outlined and four applications are introduced. [They] are the theoretical examination of tree building methods, the generation of sample sequences under various models for simulation studies, the identification of some phylogenetic invariants, and the closest tree method for inferring phylogenetic trees and their edge lengths." N Z J Bot 1993 31 3 231-237 1386 Fu,Y.X. Construction of Linear.. Math.Biosci. 92 109:201-228 Fu YX; Li WH Construction of Linear Invariants in Phylogenetic Inference Phylogeny; Evolutionary tree; Invariant; USA; Phylogenetic "An analytical method is presented for constructing linear invariants. All linear invariants of a k-species tree can be derived from those of (k-1)-species trees using this method. The new method is simpler than that of Cavender, which relies on numerical computations. Moreover, the new method provides a convenient tool to study the relationships between linear invariants of the same tree or of different trees. All linear invariants of trees of up to five species are derived in this study. ... The number of linear invariants for a tree is found to increase rapidly with the number of species." Math Biosci 1992 109 201-228 1387 Penny,D. Progress with Methods .. Trends Ecol.Evo 92 7(3):73-79 Penny D; Hendy MD; Steel MA Progress with Methods for Constructing Evolutionary Trees Phylogeny; Invariant; Review; Evolutionary tree; NZ "Evolutionists dream of a tree-reconstruction method that is efficient (fast), powerful, consistent, robust and falsifiable. These criteria are at present conflicting in that the fastest methods are weak (in their use of information in the sequences) and inconsistent (even with very long sequences they may lead to an incorrect tree). But there has been exciting progress in new approaches to tree inference, in understanding general properties of methods, and in developing ideas for estimating the reliability of trees.
New phylogenetic invariant methods allow selected parameters of the underlying model to be estimated directly from sequences." Trends Ecol Evol 1992 7 3 73-79 1388 Penny,D. Testing the Theory of .. Phylogenetic .. 91Oxford Universi Penny D; Hendy MD; Steel MA Testing the Theory of Descent Miyamoto MM Cracraft J Phylogenetic Analysis of DNA Sequences Phylogeny; Spectral analysis; NZ "In this chapter we review our approach to the study of evolutionary trees. This has been developed within a strong Popperian framework ... of aiming to develop falsifiable hypotheses. After discussing some of the general issues involved, we then discuss the question of how good methods are for inferring trees, particularly from molecular data." Oxford University Press New York 1991 155-183 1389 Steel,M.A. Spectral Analysis and .. Appl.Math.Lett. 92 5(6):63-67 Steel MA; Hendy MD; Szekely LA; Erdos PL Spectral Analysis and a Closest Tree Method for Genetic Sequences Phylogeny; Genetic; Spectral analysis; Consistency; NZ "We describe a new method for estimating the evolutionary tree linking a collection of species from their aligned four-state genetic sequences. This method, which can be adapted to provide a branch-and-bound algorithm, is statistically consistent provided the sequences have evolved according to a standard stochastic model of nucleotide mutation. Our approach exploits a recent group-theoretic description of this model." Appl Math Lett 1992 5 6 63-67 1390 Kim,J. Multiple Sequence Alig.. Comput.Appl.Bio 94 10(4):419-426 Kim J; Pramanik S; Chung MJ Multiple Sequence Alignment using Simulated Annealing Multiple alignment; Simulated annealing; USA; Sequence alignment "An algorithm called Multiple Sequence Alignment using Simulated Annealing (MSASA) has been developed. The computational complexity of MSASA is significantly reduced by replacing the high-temperature phase of the annealing process by a fast heuristic algorithm. ...
Compared to the dynamic programming approach, MSASA can (i) use natural gap costs which can generate better solution, (ii) align more sequences and (iii) take less computation time." Comput Appl Biosci 1994 10 4 419-426 1391 Gotoh,O. Further Improvement in.. Comput.Appl.Bio 94 10(4):379-387 Gotoh O Further Improvement in Methods of Group-to-Group Sequence Alignment with Generalized Profile Operations Multiple alignment; Profile; JP "It has previously been shown that rigorous optimization of alignment between two groups of sequences in the sense of minimal sum of pairs (SP) score with a linear gap-weighting function can be achieved by an extended version of the dynamic programming algorithm. The major drawback of this algorithm was that the computation time grows in proportion to the product of the numbers (M and N) of sequences comprising the two groups. A new algorithm presented in this paper achieves the same rigorous alignment in a time complexity much less dependent on the sizes of the two groups." Comput Appl Biosci 1994 10 4 379-387 1392 Fuchs,R. Sequence Analysis by E.. Comput.Appl.Bio 94 10(4):413-417 Fuchs R Sequence Analysis by Electronic Mail: A Tool for Accessing Internet E-mail Servers Sequence analysis; Program; Electronic mail; DE; Server; Internet "A new utility program, MSU, is described that simplifies the use of electronic mail servers for sequence analysis. Service descriptions are defined in external control files which can be changed without affecting the main program. This approach makes MSU a highly flexible tool that allows easy modification, extension and customization of service descriptions to suit users' personal requirements." Comput Appl Biosci 1994 10 4 413-417 1393 Coulson,A. Extracting the Informa.. 
Trends Biotechn 93 11:223-227 Coulson A Extracting the Information - Sequence Analysis Software Design Evolves Sequence analysis; Program; UK "In the past few years, techniques for electronic-data storage, retrieval and analysis have become essential research tools in molecular biology. The starting point for this development was the invention of techniques for the rapid cloning and sequencing of genes; frequently, the sequence of a gene will be available long before the gene product has been isolated, or even before there are any experimental methods available for its study. It is therefore important to extract as much information as possible from the sequence itself, both for its own sake, and to provide guidance in selecting subsequent experimental approaches." Trends Biotechnol 1993 11 223-227 1394 Davison,D.B. The GenBank-Server at .. Nucleic Acids R 90 18(6):1571-157 Davison DB; Chappelear JE The GenBank-Server at the University of Houston Sequence database; Electronic mail; Program; USA "The University of Houston GenBank-Server is an electronic mail facility which has been successfully in service for the last 14 months. It provides locus id and accession number access to GenBank data. In addition, it also holds the contributed-software archives previously kept at BIONET." Nucleic Acids Res 1990 18 6 1571-1572 1395 Henikoff,S. Sequence Analysis by E.. Trends Biochem. 93 18(Jul.):267-2 Henikoff S Sequence Analysis by Electronic Mail Server Sequence analysis; Electronic mail; USA; Server "Sequence analysis tasks account for much of the recent popularity of e-mail servers for biologists. ... Some e-mail servers make available programs that are not generally found in sequence analysis packages. ... Table I lists several e-mail servers for sequence analysis tasks. Amos Bairoch's more complete description of e-mail servers can itself be obtained from an e-mail server (send the message 'get doc:serv_ema.txt' to netservembl-heidelberg.de)."
Trends Biochem Sci 1993 18 Jul. 267-268 1396 Tatusov,R.L. A Simple Tool to Searc.. Comput.Appl.Bio 94 10(4):457-459 Tatusov RL; Koonin EV A Simple Tool to Search for Sequence Motifs that are Conserved in BLAST Outputs Database search; Motif; BLAST; USA "An obvious way to augment the selectivity of 'weak' motifs without compromising the specificity is to search for motifs that are conserved in groups of related sequences. Such conservation serves as a filter that cuts off fortuitous occurrences of motifs. We describe here a simple program, Bla, that searches the output of BLAST, the widespread fast database-searching program ..., for conserved motifs." Comput Appl Biosci 1994 10 4 457-459 1397 Miller,W. A Note about Computing.. Comput.Appl.Bio 94 10(4):455-456 Miller W; Boguski M A Note about Computing All Local Alignments Pairwise alignment; Locally optimal; USA "A recent paper in this journal by G. Barton proposed an efficient algorithm for locating locally optimal alignments between two sequences. Although the paper claims that all such alignments are found, the approach frequently fails to detect some of the significant matches. This note explains the deficiency." Comput Appl Biosci 1994 10 4 455-456 1398 Landes,C. Fast Databank Searchin.. Comput.Appl.Bio 94 10(4):453-454 Landes C; Risler JL Fast Databank Searching with a Reduced Amino-Acid Alphabet Database search; FR; Amino acid; Databank "Fast sequence databanks search algorithms generally make use of hash tables and look for exactly matching words. An increased sensitivity - at the expense of a decreased selectivity - can be attained in the case of proteins by using a reduced amino acid alphabet. We propose here an alphabet reduced to 10 symbols, that we used in modified versions of the FASTP and SCAN programs. An application ... shows that this technique may be useful in detecting distant relationships between proteins." Comput Appl Biosci 1994 10 4 453-454 1399 Rzhetsky,A. METREE: A Program Pack..
Comput.Appl.Bio 94 10(4):409-412 Rzhetsky A; Nei M METREE: A Program Package for Inferring and Testing Minimum-Evolution Trees Phylogeny; Evolutionary tree; Minimum evolution; Program; USA "The METREE program package for estimating phylogenetic trees with the minimum evolution method is written in Turbo C 2.0 and is intended to be used on any IBM-compatible personal computers that have a mathematical coprocessor. The package is simple to use and is menu driven. A program for visualizing and printing out the final tree is also included." Comput Appl Biosci 1994 10 4 409-412 1400 Corpet,F. RNAlign Program: Align.. Comput.Appl.Bio 94 10(4):389-399 Corpet F; Michot B RNAlign Program: Alignment of RNA Sequences using both Primary and Secondary Structures Sequence alignment; Sequence database; FR; Program; Structure; RNA; Secondary "We have developed an algorithm and a computer program for aligning new RNA sequences with a bank of aligned homologous RNA sequences. Given a common folding structure for the bank, the program performs an alignment between the bank and a new sequence, optimal both in terms of primary and secondary structure. This method is useful to align sequences that present a common folding structure despite extensive divergence of their primary structures. It allows these preserved regions to be precisely distinguished from domains with more variable secondary structure." Comput Appl Biosci 1994 10 4 389-399 1401 Warnow,T.J. Constructing Phylogene.. N.Z.J.Bot. 93 31(3):239-248 Warnow TJ Constructing Phylogenetic Trees Efficiently using Compatibility Criteria Phylogeny; Evolutionary tree; Compatibility; USA; Phylogenetic "The Character Compatibility Problem is a classical problem in computational biology concerned with constructing phylogenetic trees of minimum possible evolution from qualitative character sets. This problem arose in the 1970s, and until recently the only cases for which efficient algorithms were found were for binary (i.e. 
two-state) characters and for two characters at a time, while the complexity of the general problem remained open. In this paper we will discuss the remarkable progress on this problem since 1990." N Z J Bot 1993 31 3 239-248 1402 Yang,Z. Maximum Likelihood Phy.. J.Mol.Evol. 94 39(3):306-314 Yang Z Maximum Likelihood Phylogenetic Estimation from DNA Sequences with Variable Rates over Sites: Approximate Methods Phylogeny; Likelihood; Evolutionary rate; Approximation; UK; DNA; Rate; Phylogenetic; Estimation "Two approximate methods are proposed for maximum likelihood phylogenetic estimation, which allow variable rates of substitution across nucleotide sites. Three data sets with quite different characteristics were analyzed to examine empirically the performance of these methods. ... The computational requirements of the two methods are comparable to that of Felsenstein's (1981) model, which assumes a single rate for all the sites." J Mol Evol 1994 39 3 306-314 1403 Zharkikh,A. Estimation of Evolutio.. J.Mol.Evol. 94 39(3):315-329 Zharkikh A Estimation of Evolutionary Distances Between Nucleotide Sequences Evolutionary distance; Distance; Markov; Substitution; USA; Nucleotide; Estimation "A formal mathematical analysis of the substitution process in nucleotide sequence evolution was done in terms of the Markov process. ... Extensive computer simulation was used to compare the accuracy and effectiveness of various methods for estimating the evolutionary distance between two nucleotide sequences. It was shown that the multiparameter methods of Lanave et al. (1984), Gojobori et al. (1982), and Barry and Hartigan (1987) are preferable to others for the purpose of phylogenetic analysis when the sequences are long. However, when sequences are short and the evolutionary distance is large, Tajima and Nei's (1984) method is superior to others." J Mol Evol 1994 39 3 315-329 1404 Statistical Analysis o.. 
83Marcel Dekker, Statistical Analysis of DNA Sequence Data Weir BS BK - Sequence analysis; Statistical; USA; DNA I have only the preface and bibliography. "This book is intended to survey the rapidly growing field of statistical analysis of DNA sequence data. The authors are all engaged in such analyses, and several of them are also involved in the generation of DNA data. They have pointed to current problems in the interpretation of the new genetic information and have shown possible approaches to solving these problems. We all hope that the book will serve as a timely and convenient reference for molecular, population and evolutionary geneticists and will also serve to stimulate statisticians to become involved in one of the most exciting areas of modern science." Marcel Dekker Inc, New York 1983 pp. ix+255-0 1405 McClure,M.A. Comparative Analysis o.. Mol.Biol.Evol. 94 11(4):571-592 McClure MA; Vasi TK; Fitch WM Comparative Analysis of Multiple Protein-Sequence Alignment Methods Multiple alignment; Sequence alignment; Survey; Protein; Motif; USA "We have analyzed a total of 12 different global and local multiple protein-sequence alignment methods. The purpose of this study is to evaluate each method's ability to correctly identify the ordered series of motifs found among all members of a given protein family. ... The performance of all 12 methods was affected by (1) the number of sequences in the test sets, (2) the degree of similarity among the sequences, and (3) the number of indels required to produce a multiple alignment. Global methods generally performed better than local methods in the detection of motif patterns." Mol Biol Evol 1994 11 4 571-592 1406 Waddell,P.J. The Sampling Distribut.. Mol.Biol.Evol.
94 11(4):630-642 Waddell PJ; Penny D; Hendy MD; Arnold G The Sampling Distributions and Covariance Matrix of Phylogenetic Spectra Phylogeny; Genetic; Hadamard; Spectral analysis; Distribution; Covariance; NZ; Sampling; Phylogenetic; Matrix "We extend recent advances in computing variance-covariance matrices from genetic distances to a sequence method of phylogenetic analysis. These matrices, together with other statistical properties of corrected sequence spectra, are studied as a foundation for more powerful and testable methods in phylogenetics. ... Our results extend naturally to four-color (nucleotide) spectra." Mol Biol Evol 1994 11 4 630-642 1407 Gaut,B.S. Detecting Substitution.. Mol.Biol.Evol. 94 11(4):620-629 Gaut BS; Weir BS Detecting Substitution-Rate Heterogeneity among Regions of a Nucleotide Sequence Substitution; Region; Likelihood; Gene; USA; Nucleotide "Likelihood-ratio statistics are proposed to test for heterogeneity in nucleotide substitution rate among regions of a DNA sequence. The tests examine three-sequence phylogenies, and two specific tests are proposed: a test to detect rate heterogeneity among genic regions within a sequence, over all evolutionary lineages; and a test to detect rate heterogeneity among regions in a specific evolutionary lineage. Simulations examine the ability of tests to detect a single region that varies in nucleotide substitution rate relative to the remainder of the sequence." Mol Biol Evol 1994 11 4 620-629 1408 Ota,T. Variance and Covarianc.. Mol.Biol.Evol. 94 11(4):613-619 Ota T; Nei M Variance and Covariances of the Numbers of Synonymous and Nonsynonymous Substitutions per Site Substitution; Synonymous; Statistical; Variance; USA; Covariance "Nei and Gojobori (1986) developed a simple method to estimate the numbers of synonymous (ds) and nonsynonymous (dn) substitutions per site.
In the present paper, we have developed a method for computing variances and covariances of ds's and dn's and of the proportions of synonymous (ps) and nonsynonymous (pn) differences. We also have developed a method for computing the variances of mean ds, dn, ps, pn, without constructing a phylogenetic tree of the genes. We have conducted computer simulations based on simple evolutionary models and have shown that the new method gives good estimates of variances and covariances." Mol Biol Evol 1994 11 4 613-619 1409 Lockhart,P.J. Recovering Evolutionar.. Mol.Biol.Evol. 94 11(4):605-612 Lockhart PJ; Steel MA; Hendy MD; Penny D Recovering Evolutionary Trees under a More Realistic Model of Sequence Evolution Phylogeny; Evolutionary tree; Stochastic; Evolution; NZ; Model "We report a new transformation, the LogDet, that is consistent for sequences with differing nucleotide composition and that have arisen under simple but asymmetric stochastic models of evolution. This transformation is required because existing methods tend to group sequences on the basis of their nucleotide composition, irrespective of their evolutionary history. ... The overall conclusions from this study are that irregular A, C, G, T compositions are an important and possible general cause of patterns that can mislead tree-reconstruction methods, even when high bootstrap values are obtained." Mol Biol Evol 1994 11 4 605-612 1410 Neuwald,A.F. Detecting Patterns in .. J.Mol.Biol. 94 239:698-712 Neuwald AF; Green P Detecting Patterns in Protein Sequences Pattern recognition; Motif; Statistical; Significance; Sequence alignment; Sequence comparison; USA; Protein "The detection of conserved sequence patterns (motifs) in related proteins often yields valuable structural and functional insights. We describe a method that utilizes rigorous statistics and a depth-first search procedure to efficiently and exhaustively search a set of proteins for significant patterns up to a specified length.
Additional procedures classify related patterns into groups and identify protein segments most likely to share a common motif." J Mol Biol 239 239 698-712 1411 Zhang,C.T. A Graphic Approach to .. J.Mol.Biol. 94 238:1-8 Zhang CT; Chou KC A Graphic Approach to Analyzing Codon Usage in 1562 Escherichia coli Protein Coding Sequences Coding; Codon; Sequence analysis; CN; Protein; Graphic "The occurrence frequencies of the four bases ... at each of the three codon positions for 1562 E. coli protein coding sequences have been calculated. The 1562 x 4 x 3 = 18,744 data thus obtained have been analyzed by a graphic method .... The results of our analysis indicate that the patterns for the first two codon positions reflect the origin for producing native folding structures of proteins. We thus come to the conclusion that the distribution patterns for the first two codon positions should be basically species-independent, as confirmed by studies for a number of other species. However, the distribution pattern for the third codon position is species-dependent." J Mol Biol 238 238 1-8 1412 Idury,R.M. Dynamic Dictionary Mat.. Theoret.Comput. 94 131:295-310 Idury RM; Schaffer AA Dynamic Dictionary Matching with Failure Functions Dictionary match; Pattern match; USA; Function; Dynamic "Amir and Farach (1991) and Amir et al. (to appear) recently initiated the study of the dynamic dictionary pattern matching problem. ... Amir et al. (to appear) used an automaton based on suffix trees to solve the dynamic problem. ... We show that the same bounds can be achieved using a framework based on failure functions. We then show that our approach also allows us to achieve faster search times at the expense of the update times. ... This is advantageous if the search texts are much larger than the dictionary or searches are more frequent than updates." Theoret Comput Sci 131 131 295-310 1413 Miyamoto,M.M. A Congruence Test of R.. Syst.Biol. 
94 43(2):236-249 Miyamoto MM; Allard MW; Adkins RM; Janecek LL; Honeycutt RL A Congruence Test of Reliability using Linked Mitochondrial DNA Sequences Phylogeny; Reliability; Congruence; USA; DNA "In the absence of certainty, well-corroborated hypotheses of species relationships serve as the best estimates of the true phylogenies of groups. This approach was extended to linked mitochondrial DNA (mtDNA) sequences that share the same gene phylogenies because of nonrecombination. This expectation of congruence forms the basis to test the reliability of unequal weighting for different base positions and changes of DNA sequences. ... Heavy weighting for stems and first/second codon positions and for transversions were first evaluated against the molecular evolutionary properties of the three genes and then evaluated by congruence ...." Syst Biol 1994 43 2 236-249 1414 Bairoch,A. List of Molecular Biol.. 93 Bairoch A List of Molecular Biology Email Servers BK - Electronic mail; Sequence database; Database search; Program; SWI; Server Document serv_ema.txt (version 1.70, 10 Dec. 1993) which is available from netserv@embl-heidelberg.de. "This document briefly describes the various email servers that are available to molecular biologists. The servers described in this document generally fall into one of the following two categories: (1) Servers that provide an analytical function. ... (2) Servers that allow you to retrieve all or part of a database." 1993 1415 Felsenstein,J Cases in which Parsimo.. Conceptual Is.. 84MIT Press Felsenstein J Cases in which Parsimony or Compatibility Methods will be Positively Misleading Sober E Conceptual Issues in Evolutionary Biology. An Anthology Phylogeny; Evolutionary tree; Likelihood; Parsimony; Compatibility; USA Originally published as Felsenstein (1978). 
"For some simple three- and four-species cases involving a character with two states, it is determined under what conditions several methods of phylogenetic inference will fail to converge to the true phylogeny as more and more data are accumulated. The methods are the Camin-Sokal parsimony method, the compatibility method, and Farris's unrooted Wagner tree parsimony method. In all cases the conditions for this failure (which is the failure to be statistically consistent) are essentially that parallel changes exceed informative, nonparallel changes." MIT Press Cambridge, MA 1984 663-674 1416 Felsenstein,J Phylogenies and the Co.. Am.Nat. 85 125(1):1-15 Felsenstein J Phylogenies and the Comparative Method Phylogeny; Statistical; Correlation; USA "Recent years have seen a growth in numerical studies using the comparative method. The method usually involves a comparison of two phenotypes across a range of species or higher taxa, or a comparison of one phenotype with an environmental variable. ... My intention is to point out a serious statistical problem with this approach, a problem that affects all of these studies. It arises from the fact that species are part of a hierarchically structured phylogeny, and thus cannot be regarded for statistical purposes as if drawn independently from the same distribution." Am Nat 1985 125 1 1-15 1417 Felsenstein,J Perils of Molecular In.. Nature (Lond.) 88 335 (8 Sept.): Felsenstein J Perils of Molecular Introspection Phylogeny; Likelihood; Parsimony; Invariant; Distance; USA This is a brief overview of methods for analysing the phylogeny of the apes. "We can either use all the information with a highly specific evolutionary model, as likelihood methods do, or trade some of that information for robustness by looking at a smaller subset of the data, as invariants, parsimony and distance methods each does in different ways." Nature (Lond ) 1988 335 8 Sept. 118-118 1418 Felsenstein,J Phylogenies and Quanti.. 
Annu.Rev.Ecol.S 88 19:445-471 Felsenstein J Phylogenies and Quantitative Characters Phylogeny; Character data; USA "My argument is that the methods used to study the evolution of quantitative characters within populations can profitably be used on a phylogenetic scale to illumine the connection between pattern and process. ... The moment seems ripe to consider the issue." Annu Rev Ecol Syst 19 19 445-471 1419 Trelles-Salaz On an Efficient Parall.. Comput.Appl.Bio 94 10(5):509-511 Trelles-Salazar O; Zapata EL; Carazo JM On an Efficient Parallelization of Exhaustive Sequence Comparison Algorithms on Message Passing Architectures Sequence comparison; Database search; Parallel; SP; Algorithm "We present a new parallel computing approach to the case of exhaustive sequential sequence comparison algorithms on message-passing architectures. In this context a modification of guided self-scheduling as well as efficient buffering strategies are presented. We discuss two specific implementations, one on the Paramid parallel computer, and the other on a cluster of workstations running PVM. In both cases the parallel performance is higher than with any other method presented so far." Comput Appl Biosci 1994 10 5 509-511 1420 Felsenstein,J Phylogenies from Restr.. Evolution 92 46(1):159-173 Felsenstein J Phylogenies from Restriction Sites: A Maximum-Likelihood Approach Phylogeny; Restriction; Likelihood; USA "Restriction sites data can be analyzed by maximum likelihood to obtain estimates of phylogenies. The likelihood methods of Smouse and Li, who were able to compute likelihoods for up to four species under a simplified model of base change, can be extended numerically to deal with any number of species. The computational methods for doing so are outlined. The resulting algorithms are slow but take multiple gains and losses of restriction sites fully into account, unlike parsimony methods. ... The present method is available in a computer program." 
Evolution 1992 46 1 159-173 1421 Archie,J.W. The Number of Evolutio.. Theoret.Pop.Bio 93 43:52-79 Archie JW; Felsenstein J The Number of Evolutionary Steps on Random and Minimum Length Trees for Random Evolutionary Data Evolutionary tree; Statistical; USA "A model of evolutionarily uninformative data is derived and two separate character state distributions, one with two states (0,1) and one with missing-value data (0,1,{0,1}), are obtained. The expectation of number of steps on random trees is derived for both types of data and the variance in number of steps is derived for missing-value data. It is conjectured that the number of steps on random trees for these data should be asymptotically normal. Computer simulation is used to find approximations for the expected number and variance in number of steps of minimum length trees for both types of random evolutionary data." Theoret Pop Biol 43 43 52-79 1422 Steel,M. A Complete Family of P.. N.Z.J.Bot. 93 31(3):289-296 Steel M; Szekely L; Erdos PL; Waddell P A Complete Family of Phylogenetic Invariants for Any Number of Taxa Under Kimura's 3ST Model Phylogeny; Invariant; Spectral analysis; NZ; Model; Phylogenetic "We describe a new family of phylogenetic invariants that arise from the recently developed spectral analysis approach to tree reconstruction. These invariants, which are valid for Kimura's 3ST model, possess four important properties - they are defined equally easily for any number of taxa, their description is tree-independent, they apply even when the distribution of the four nucleotides in the ancestral taxon is unknown, and they can be modified to deal with sequence sites that do not mutate independently with identical distribution." N Z J Bot 1993 31 3 289-296 1423 Weir,B.S. Variances for Distance.. N.Z.J.Bot. 
93 31(3):317-321 Weir BS; Gaut BS Variances for Distances Between Plant Sequences Phylogeny; Pairwise comparison; Distance; Character data; USA; Variance "When the data consist of DNA sequences, appropriate distances can be defined to reflect the mutation model and to have expected values proportional to the time of divergence of the sequences. With the growing amount of sequence data, it is also necessary to incorporate within-species sequence variation into measures of distance between sequences. This additional feature requires attention to be paid to drift and recombination as well as mutation, and greatly increases the difficulty of estimating variances of estimated distances. Numerical resampling seems appropriate." N Z J Bot 1993 31 3 317-321 1424 Fitch,W.M. Weighted Parsimony: Do.. Phylogenetic .. 91Oxford Universi Fitch WM; Ye J Weighted Parsimony: Does it Work? Miyamoto MM Cracraft J Phylogenetic Analysis of DNA Sequences Phylogeny; Character weight; USA; Parsimony "In 1969, Farris suggested a relatively unbiased way of weighting the value of a character for systematic purposes based upon the proposition that characters that frequently change their state are unreliable guides to relationships .... In nucleotide sequences, one can apply the same philosophy not only to the various characters ... but to the character changes as well. [See Williams and Fitch (1989, 1990).] A computer program has been developed that permits one to perform either kind of weighting, or both simultaneously .... In this work, we use simulation to test whether the principle works in practice." Oxford University Press New York 1991 147-154 1425 Li,W.H. Statistical Methods fo.. Phylogenetic .. 
91Oxford Universi Li WH; Gouy M Statistical Methods for Testing Molecular Phylogenies Miyamoto MM Cracraft J Phylogenetic Analysis of DNA Sequences Phylogeny; Statistical; Confidence; USA "Fortunately, the rapid accumulation of DNA sequence data has stimulated a strong trend to make phylogenetic reconstruction more statistical. Statistical tests can be classified as analytical or resampling. Resampling methods (e.g., bootstrapping, jacknifing) resample the data to infer empirically the variability of the estimate obtained by a tree-making method. [See Felsenstein 1988.] In this chapter, we discuss only analytical methods. Analytical tests can be based on parsimony methods, distance methods, likelihood methods, or invariant methods. The last approach, which includes the evolutionary parsimony method, has been reviewed in Felsenstein (1988). We discuss only the other approaches." Oxford University Press New York 1991 249-277 1426 Swofford,D.L. PAUP: Phylogenetic Ana.. 91Illinois Natura Swofford DL PAUP: Phylogenetic Analysis Using Parsimony, Version 3.0s. BK - Phylogeny; Character data; Program; Parsimony; Invariant; USA; Phylogenetic Draft version of the User's Manual for PAUP 3.0, dated 6 Dec. 1991. "Version 3 of PAUP contains many significant improvements over earlier versions of the program. From a scientific standpoint, the most important enhancement is support for a wider variety of parsimony models, including the well-known Dollo and Camin-Sokal variants and a 'generalized' method that allows the specification of user-defined character types. These methods supplement the ordered-reversible (Wagner) and unordered (Fitch) parsimony methods of the earlier versions. In addition, the method of invariants for nucleotide sequence data ('evolutionary parsimony') developed by James Lake has been incorporated." Illinois Natural History Survey Champaign, IL, USA 1991 pp.vii+178-0 1427 Martino,R.L. Parallel Computing in .. 
Science 94 265(12 Aug.):9 Martino RL; Johnson CA; Suh EB; Trus BL; Yap TK Parallel Computing in Biomedical Research Parallel; Hardware; Database search; Protein; Structure; Prediction; USA "Scalable parallel computer architectures provide the computational performance needed for advanced biomedical computing problems. The National Institutes of Health have developed a number of parallel algorithms and techniques useful in determining biological structure and function. These applications include ... searching for homologous DNA or amino acid sequences in large biological databases. Timing results demonstrate substantial performance improvements with parallel implementations compared with conventional sequential systems." Science 1994 265 12 Aug. 902-907 1428 Zhang,Z. Chaining Multiple-Alig.. J.Comput.Biol. 94 1(3):217-226 Zhang Z; Raghavachari B; Hardison RC; Miller W Chaining Multiple-Alignment Blocks Multiple alignment; Block search; Footprint; USA "We derive a time-efficient method for building a multiple alignment consisting of a highest-scoring chain of 'blocks,' i.e., short gap-free alignments. Besides executing faster than a general-purpose multiple-alignment program, the method may be particularly appropriate when discovery of blocks meeting a certain criterion is the main reason for aligning the sequences. Utility of the method is illustrated by locating a chain of 'phylogenetic footprints' (specifically, exact matches of length 6 or more) in the 5'-flanking regions of six mammalian e-globin genes." J Comput Biol 1994 1 3 217-226 1429 Yamauchi,K. The Sequence Flanking .. Nucleic Acids R 91 19(10):2715-27 Yamauchi K The Sequence Flanking Translational Initiation Site in Protozoa Consensus method; JP "If a specific nucleotide was observed at a frequency greater than 50% at a specific position, it was defined as consensus nucleotide. 
If the sum of the frequencies of two nucleotides was greater than 75% and neither nucleotide met the criteria for a single consensus, they were assigned as co-consensus nucleotides." Nucleic Acids Res 1991 19 10 2715-2720 1430 Shapiro,M.B. RNA Splice Junctions o.. Nucleic Acids R 87 15(17):7155-71 Shapiro MB; Senapathy P RNA Splice Junctions of Different Classes of Eukaryotes: Sequence Statistics and Functional Implications in Gene Expression Consensus method; USA; RNA; Gene; Expression "The following simple rule was used in arriving at a consensus sequence at each location: if the highest percentage computed for a particular nucleotide site equals or exceeds 40, choose the corresponding nucleotide; choose also the nucleotide with the second highest percentage if it equals or exceeds 30 and is at least twice as large as the third highest percentage." Nucleic Acids Res 1987 15 17 7155-7174 1431 Arratia,R. Two Moments Suffice fo.. Ann.Probab. 89 17(1):9-25 Arratia R; Goldstein L; Gordon L Two Moments Suffice for Poisson Approximations: The Chen-Stein Method Statistical; Significance; Poisson; Chen-Stein; USA; Approximation "Convergence to the Poisson distribution, for the number of occurrences of dependent events, can often be established by computing only first and second moments, but not higher ones. This remarkable result is due to Chen (1975). The method also provides an upper bound on the total variation distance to the Poisson distribution, and succeeds in cases where third and higher moments blow up. This paper presents Chen's results in a form that is easy to use and gives a multivariable extension, which gives an upper bound on the total variation distance between a sequence of dependent indicator functions and a Poisson process with the same intensity." Ann Probab 1989 17 1 9-25 1432 Wilson,A.C. 
Biochemical Evolution Annu.Rev.Bioche 77 46:573-639 Wilson AC; Carlson SS; White TJ Biochemical Evolution Evolution; Clock; Evolutionary rate; USA "This review deals with the contributions of comparative studies on the nucleic acids and proteins of present-day organisms to knowledge of evolution. ... The main concern of this review is with the rates at which base substitutions and amino acid substitutions have been fixed and with the relationship between these rates and the rates of organismal evolution. We consider topics that have not been comprehensively reviewed before, such as molecular evolution in primates, the generation-time hypothesis, stochastic variation in the evolutionary clock, and the relationship between sequence evolution and organismal evolution." Annu Rev Biochem 46 46 573-639 1433 Day,W.H.E. Sequence Analysis and .. CSNA Newsletter 94 37(Nov.):0-0 Day WHE Sequence Analysis and Comparison: A Bibliography. Version 4.0 - 5 October 1994 Sequence analysis; Sequence comparison; Bibliography; CA "I am maintaining, in electronic form, a bibliography of papers on the theory or methodology of sequence analysis, alignment, comparison or consensus. The bibliography includes many papers on the estimation of phylogenies from sequences. It has only a few papers on the alignment, comparison or prediction of sequence structures." Included are instructions on: how to perform free-text searching of version 3.0 with the Wide-Area Information Server (WAIS), and how to obtain a text-only file of the bibliography by anonymous ftp (file transfer protocol). CSNA Newsletter 1994 37 Nov. 0-0 1434 Dembo,A. Strong Limit Theorems .. Ann.Probab. 91 19(4):1737-175 Dembo A; Karlin S Strong Limit Theorems of Empirical Functionals for Large Exceedances of Partial Sums of I.I.D. Variables Statistical; Significance; Scoring; Sequence analysis; USA The paper's results are applied "in characterizing the composition of high scoring segments in letter sequences .... 
The [problem is] of interest in connection with molecular (DNA and protein) sequence comparisons (see Section 4, Karlin and Altschul (1990) and Karlin, Dembo and Kawabata (1990)) ...." Ann Probab 1991 19 4 1737-1755 1435 Karlin,S. Limit Distributions of.. Adv.Appl.Probab 92 24:113-140 Karlin S; Dembo A Limit Distributions of Maximal Segmental Score Among Markov-Dependent Partial Sums Sequence analysis; Statistical; Significance; Markov; USA; Distribution; Score "In this paper we derive new probabilistic formulas useful for assessing statistical significance (unusual high values) of a sequence segment composition allowing a general scoring scheme in letter values in the context of Markov-dependent sequences. (For biological discussions and applications, see Karlin & Altschul (1990) and Karlin, Bucher, Brendel & Altschul (1991).) The formulas have been incorporated into computer software that are now effective in the analysis of biomolecular sequence data (e.g. Altschul, Gish, Miller, Myers & Lipman (1990), Altschul & Lipman (1990), Karlin, Bucher, Brendel & Altschul (1991))." Adv Appl Probab 24 24 113-140 1436 Gusfield,D. Parametric Optimizatio.. ACM-SIAM Sympos 92 3:432-439 Gusfield D; Balasubramanian K; Naor D Parametric Optimization of Sequence Alignment Pairwise alignment; Parametric; Sequence alignment; USA; Optimization "Parametric Sequence Alignment is the problem of computing the optimal valued alignment between two sequences as a function of variable weights for matches, mismatches, spaces and gaps. The goal is to partition the parameter space into regions (which are necessarily convex) such that in each region one alignment is optimal throughout and such that the regions are maximal for this property. In this paper we are primarily concerned with the structure of this convex decomposition, and secondarily with the complexity of computing the decomposition." ACM-SIAM Sympos Discrete Algorithms 1992 3 432-439 1437 Arratia,R. Poisson Approximation .. 
Stat.Sci. 90 5(4):403-434 Arratia R; Goldstein L; Gordon L Poisson Approximation and the Chen-Stein Method Pairwise comparison; Statistical; Significance; Poisson; Chen-Stein; USA; Approximation Includes commentaries by J. M. Steele, A. D. Barbour, M. S. Waterman and L. H. Y. Chen, and also a rejoinder by the authors. "The Chen-Stein method of Poisson approximation is a powerful tool for computing an error bound when approximating probabilities using the Poisson distribution. In many cases, this bound may be given in terms of first and second moments alone. We present a background of the method and state some fundamental Poisson approximation theorems. The body of this paper is an illustration, through varied examples, of the wide applicability and utility of the Chen-Stein method. ... We conclude with an application to molecular biology." Stat Sci 1990 5 4 403-434 1438 Li,W.H. Fundamentals of Molecu.. 91Sinauer Associa Li WH; Graur D Fundamentals of Molecular Evolution BK - Gene; Evolution; Genome; Phylogeny; USA "We have set out to write a book for 'beginners' in molecular evolution. At the same time, we have tried to maintain the standards of the scientific method and to include quantitative treatments of the issues at hand. Therefore, in describing evolutionary phenomena and mechanisms at the molecular level, both mathematical and intuitive explanations are provided. Neither is meant to be at the expense of the other; rather, the two approaches are intended to complement each other and to help the reader achieve a better grasp of the issues. We have not attempted to attain encyclopedic completeness, but have provided a large number of examples to support and clarify the many theoretical arguments and discussions." Sinauer Associates Inc ,Sunderland, MA 1991 xv+284-0 1439 Combinatorial Pattern .. 93Springer-Verlag Combinatorial Pattern Matching. 4th Annual Symposium, CPM 93. Proceedings. Lecture Notes in Computer Science, Volume 684. 
Apostolico A Crochemore M; Galil Z; Manber U BK - Pattern match; Italy; Combinatorial Padova, Italy, June 1993. "Combinatorial Pattern Matching addresses issues of searching and matching of strings and more complicated patterns such as trees, regular expressions, extended expressions, etc. The goal is to derive nontrivial combinatorial properties for such structures and then to exploit these properties in order to achieve superior performances for the corresponding computational problems." Springer-Verlag Berlin 1993 viii+265-0 1440 Combinatorial Pattern .. 92Springer-Verlag Combinatorial Pattern Matching. Third Annual Symposium. Proceedings. Lecture Notes in Computer Science, Volume 644. Apostolico A Crochemore M; Galil Z; Manber U BK - Pattern match; Pattern search; USA; Combinatorial Tucson, Arizona, April/May 1992. "Combinatorial Pattern Matching addresses issues of searching and matching of strings and more complicated patterns such as trees, regular expressions, extended expressions, etc. The goal is to derive nontrivial combinatorial properties for such structures and then to exploit these properties in order to achieve superior performances for the corresponding computational problems. In recent years, a steady flow of high-quality scientific study of this subject has changed a sparse set of isolated results into a full-fledged area of algorithmics. Still, there is currently no central place for disseminating results in this area. We hope that CPM can grow to serve as the focus point." Springer-Verlag Berlin 1992 x+287-0 1441 Allison,L. Using Hirschberg's Alg.. Inform.Process. 94 51(5):251-255 Allison L Using Hirschberg's Algorithm to Generate Random Alignments of Strings Pairwise alignment; Monte Carlo; Edit; Distance; Subsequence; Longest common; AU; Algorithm "Hirschberg [(1975)] gave an alignment algorithm for the longest common subsequence problem that uses O(n^2) time and O(n) space for two strings of length n. 
A simple modification of the algorithm can sample string alignments at random according to their probability distribution. This is useful for statistical estimation of evolutionary distances of a family of strings, e.g. DNA strings. The algorithm's time and space complexity are unchanged." Inform Process Lett 1994 51 5 251-255 1442 Borodovsky,M. Deriving Non-Homogeneo.. Computers Chem. 94 18(3):259-267 Borodovsky M; Peresetsky A Deriving Non-Homogeneous DNA Markov Chain Models by Cluster Analysis Algorithm Minimizing Multiple Alignment Entropy Pattern discovery; Markov; Multiple alignment; Clustering; Statistical; USA; DNA; Entropy; Model; Algorithm "Non-homogeneous Markov chain models can represent biologically important regions of DNA sequences. The statistical pattern that is described by these models is usually weak and was found primarily because of strong biological indications. The general method for extracting similar patterns is presented in the current paper. The algorithm incorporates cluster analysis, multiple alignment and entropy minimization. ... These Markov models were already employed in the GeneMark gene prediction algorithm, which is used in genome sequencing projects." Computers Chem 1994 18 3 259-267 1443 Brendel,V. Applications of Statis.. Computers Chem. 94 18(3):251-253 Brendel V; Karlin S Applications of Statistical Criteria in Protein Sequence Analysis: Case Study of Yeast RNA Polymerase II Subunits Pattern discovery; Statistical; Sequence analysis; USA; Protein; RNA "We have recently proposed statistical techniques to identify unusual protein sequence features [Brendel, Bucher, Nourbakhsh, Blaisdell, Karlin (1992); Karlin, Brendel (1992); Karlin, Blaisdell, Bucher (1992)]. Extensive mapping of these features to particular groups of proteins may afford new ways of protein classification. 
Here we present a case study of such analysis by discussing special features of the amino acid sequences of yeast RNA polymerase II, the first eukaryotic RNA polymerase for which all subunits have been sequenced." Computers Chem 1994 18 3 251-253 1444 Karlin,S. A Method to Identify D.. J.Mol.Biol. 89 205(1):165-177 Karlin S; Blaisdell BE; Mocarski ES; Brendel V A Method to Identify Distinctive Charge Configurations in Protein Sequences, with Application to Human Herpesvirus Polypeptides Pattern discovery; Statistical; Significance; Protein; USA; Charge "Charge interactions are of great importance for protein function and structure, and for a variety of cellular and biochemical processes. We present a systematic approach to the detection of distinctive clusters, runs and periodic patterns of charged residues in a protein sequence. Criteria and formulae are set forth to assess statistical significance of these charge configurations. ... The statistics developed in this paper apply more generally to other than charge properties of a protein and should aid in the evaluation of a large variety of sequence features." J Mol Biol 1989 205 1 165-177 1445 Karlin,S. Statistical Analyses o.. Nucleic Acids R 92 20(6):1363-137 Karlin S; Burge C; Campbell AM Statistical Analyses of Counts and Distributions of Restriction Sites in DNA Sequences Pattern discovery; Statistical; Significance; Restriction; Distribution; DNA; USA "Counts and spacings of all 4- and 6-bp palindromes in DNA sequences from a broad range of organisms were investigated. Both 4- and 6-bp average palindrome counts were significantly low in all bacteriophages except one, probably as a means of avoiding restriction enzyme cleavage. ... The counts and distributions of 4-bp and 6-bp restriction sites in bacterial species are variable. ... 
Interpretations of these results are given in terms of restriction/methylation regimes, recombination and transcription processes, and possible structural and regulatory roles of 4- and 6-bp palindromes." Nucleic Acids Res 1992 20 6 1363-1370 1446 Karlin,S. Quantile Distributions.. Protein Eng. 92 5(8):729-738 Karlin S; Blaisdell BE; Bucher P Quantile Distributions of Amino Acid Usage in Protein Classes Pattern discovery; Statistical; Significance; Distribution; Protein; USA; Amino acid "A comparative study of the compositional properties of various protein sets from both cellular and viral organisms is presented. Invariants and contrasts of amino acid usages have been discerned for different protein function classes and for different species using robust statistical methods based on quantile distributions and stochastic ordering relationships. In addition, a quantitative criterion to assess amino acid compositional extremes relative to a reference protein set is proposed and applied." Protein Eng 1992 5 8 729-738 1447 Karlin,S. Patchiness and Correla.. Science 93 259 (29 Jan.): Karlin S; Brendel V Patchiness and Correlations in DNA Sequences Pattern discovery; Statistical; DNA; Correlation; Stochastic; USA "The highly nonrandom character of genomic DNA can confound attempts at modeling DNA sequence variation by standard stochastic processes (including random walk or fractal models). In particular, the mosaic character of DNA consisting of patches of different composition can fully account for apparent long-range correlations in DNA." Science 1993 259 29 Jan. 677-680 1448 Dembo,A. Central Limit Theorems.. 
Stochastic Proc 93 45(2):259-271 Dembo A; Karlin S Central Limit Theorems of Partial Sums for Large Segmental Values Pattern discovery; Statistical; Significance; Central limit; USA "Many random structures of theoretical and practical importance are associated with sequences of real random variables of high aggregate values having small probability, often exponentially small. In this context we set forth a class of Gaussian distributional limit theorems conditioned on rare events. The results can be construed as a central limit theorem in the context of large deviation theory. ... Our motivation stems from biomolecular sequence comparisons, Karlin and Altschul (1990), Karlin et al. (1990)." Stochastic Processes and Their Applications 1993 45 2 259-271 1449 Lawrence,C.E. Detecting Subtle Seque.. Science 93 262 (8 Oct.):2 Lawrence CE; Altschul SF; Boguski MS; Liu JS; Neuwald AF; Wootton JC Detecting Subtle Sequence Signals: A Gibbs Sampling Strategy for Multiple Alignment Pattern discovery; Multiple alignment; Repeat; Signal; USA; Sampling "A wealth of protein and DNA sequence data is being generated .... A crucial barrier to deciphering these sequences and understanding the relations among them is the difficulty of detecting subtle local residue patterns common to multiple sequences. Such patterns frequently reflect similar molecular structures and biological properties. A mathematical definition of this 'local multiple alignment' problem suitable for full computer automation has been used to develop a new and sensitive algorithm, based on the statistical method of iterative sampling." Science 1993 262 8 Oct. 208-214 1450 Apostolico,A. Optimal Canonization o.. Inform.Comput. 91 95(1):76-95 Apostolico A; Crochemore M Optimal Canonization of All Substrings of a String Sequence analysis; Factor; Optimal; Italy "Any word can be decomposed uniquely into lexicographically nonincreasing factors each one of which is a Lyndon word. 
This paper addresses the relationship between the Lyndon decomposition of a word x and a canonical rotation of x, i.e., a rotation w of x that is lexicographically smallest among all rotations of x. The main combinatorial result is a characterization of the Lyndon factor of x with which w must start. As an application, faster on-line algorithms for finding the canonical rotation(s) of x are developed by nontrivial extension of known Lyndon factorization strategies." Inform Comput 1991 95 1 76-95 1451 Apostolico,A. Efficient Detection of.. Theoret.Comput. 93 119(2):247-265 Apostolico A; Ehrenfeucht A Efficient Detection of Quasiperiodicities in Strings Sequence analysis; Regularities; USA; Detection "A string z is quasiperiodic if there is a second string w ≠ z such that the occurrences of w in z cover z entirely, i.e., every position of z falls within some occurrence of w in z. It is shown here that all maximal quasiperiodic substrings of a string x of n symbols can be detected in time O(n log^2 n)." Theoret Comput Sci 1993 119 2 247-265 1452 Apostolico,A. Self-Alignments in Wor.. J.Algorithms 92 13(3):446-467 Apostolico A; Szpankowski W Self-Alignments in Words and Their Applications Self alignment; Word; Repetition; Suffix; USA "Some quantities associated with periodicities in words are analyzed within the Bernoulli probabilistic model. In particular, the following problem is addressed. Assume that a string X is given, with symbols emitted randomly but independently according to some known distribution of probabilities. Then, for each pair (W,Z) of distinct suffixes of X, the expected length of the longest common prefix of W and Z is sought. The collection of these lengths, that are called here self-alignments, plays a crucial role in several algorithmic problems on words, such as building suffix trees or inverted files, detecting squares and other regularities, computing substring statistics, etc."
J Algorithms 1992 13 3 446-467 1453 Baeza-Yates,R Fast Two-Dimensional P.. Inform.Process. 93 45(1):51-57 Baeza-Yates R; Regnier M Fast Two-Dimensional Pattern Matching Pattern match; Multidimensional; CL "An algorithm for searching for a two-dimensional m x m pattern in a two-dimensional n x n text is presented. It performs on the average less comparisons than the size of the text: n^2/m using m^2 extra space. Basically, it uses multiple string matching on only n/m rows of the text. It runs in at most 2n^2 time and is close to the optimal n^2 time for many patterns. It steadily extends to an alphabet-independent algorithm with a similar worst case. Experimental results are included for a practical version." Inform Process Lett 1993 45 1 51-57 1454 Bairoch,A. PROSITE: Recent Develo.. Nucleic Acids R 94 22(17):3583-35 Bairoch A; Bucher P PROSITE: Recent Developments Sequence database; PROSITE; Protein; Pattern library; SWI "PROSITE is a compilation of sites and patterns found in protein sequences; it can be used as a method of determining the function of uncharacterized proteins translated from genomic or cDNA sequences." Nucleic Acids Res 1994 22 17 3583-3589 1455 Bairoch,A. The SWISS-PROT Protein.. Nucleic Acids R 94 22(17):3578-35 Bairoch A; Boeckmann B The SWISS-PROT Protein Sequence Data Bank: Current Status Sequence database; SWISS-PROT; Protein; SWI "SWISS-PROT is an annotated protein sequence database established in 1986 and maintained collaboratively, since 1988, by the Department of Medical Biochemistry of the University of Geneva and the EMBL Data Library. The SWISS-PROT protein sequence data bank consists of sequence entries. Sequence entries are composed of different line types, each with their own format. For standardization purposes the format of SWISS-PROT follows as closely as possible that of the EMBL Nucleotide Sequence Database." Nucleic Acids Res 1994 22 17 3578-3580 1456 Bairoch,A. PROSITE: A Dictionary ..
Nucleic Acids R 92 20(suppl):2013 Bairoch A PROSITE: A Dictionary of Sites and Patterns in Proteins Sequence database; PROSITE; Protein; Pattern library; SWI "PROSITE is a compilation of sites and patterns found in protein sequences. The use of protein sequence patterns (or motifs) to determine the function of proteins is becoming very rapidly one of the essential tools of sequence analysis. ... While there have been a number of recent reports that review published patterns, no attempt had been made until very recently to systematically collect biologically significant patterns or to discover new ones. It is for these reasons that we have developed, since 1988, a dictionary of sites and pattern which we call PROSITE." Nucleic Acids Res 1992 20 suppl 2013-2018 1457 Bairoch,A. The SWISS-PROT Protein.. Nucleic Acids R 92 20(suppl):2019 Bairoch A; Boeckmann B The SWISS-PROT Protein Sequence Data Bank Sequence database; SWISS-PROT; Protein; SWI "SWISS-PROT is an annotated protein sequence database established in 1986 and maintained collaboratively, since 1988, by the Department of Medical Biochemistry of the University of Geneva and the EMBL Data Library." Nucleic Acids Res 1992 20 suppl 2019-2022 1458 Appel,R.D. A New Generation of In.. Trends Biochem. 94 19(6):258-260 Appel RD; Bairoch A; Hochstrasser DF A New Generation of Information Retrieval Tools for Biologists: The Example of the ExPASy WWW Server Retrieval; WWW; Server; SWI "ExPASy is a WWW server set up at the University Hospital of Geneva and the Medical Biochemistry Department of Geneva University, and is dedicated to molecular biology with an emphasis on data relevant to proteins. The two main entry points on the server give access to the SWISS-PROT database of annotated protein sequences and the SWISS-2DPAGE database of two-dimensional gel electrophoresis images. 
SWISS-PROT can be searched by protein description, entry name or accession number, or referenced author name, as well as by performing a full text search on all the annotation fields." Trends Biochem Sci 1994 19 6 258-260 1459 Crochemore,M. Two-Dimensional Patter.. Inform.Process. 93 46(4):159-162 Crochemore M; Gasieniec L; Rytter W Two-Dimensional Pattern Matching by Sampling Pattern match; Multidimensional; Sampling; FR "We extend the concept of deterministic sampling to the two-dimensional pattern matching problem. We show that almost all patterns have a logarithmic deterministic sample. There are 2D-matching algorithms which work efficiently for almost all patterns. They solve the 2D-matching problem in linear sequential time with O(1) space, or, alternatively in O(1) parallel time with linear number of processors. This is the first attempt to reduce the space for two-dimensional pattern matching." Inform Process Lett 1993 46 4 159-162 1460 Crochemore,M. Efficient Parallel Alg.. Inform.Process. 91 38(2):57-60 Crochemore M; Rytter W Efficient Parallel Algorithms to Test Square-Freeness and Factorize Strings Regularities; Square; Parallel; Factor; FR; Algorithm "A string is square-free iff it does not contain a nonempty subword of the form ww. We give an algorithm testing square-freeness of strings in log n time with n processors of a CRCW PRAM. The input alphabet is not bounded. The best sequential time algorithm for this problem takes O(n log n) time. Hence the total number of operations in our parallel algorithm matches that of the best sequential algorithm. The algorithm relies on an efficient parallel computation of a factorization of words in text compression." Inform Process Lett 1991 38 2 57-60 1461 Crochemore,M. Parallel Construction .. Inform.Process. 
90 35(3):121-128 Crochemore M; Rytter W Parallel Construction of Minimal Suffix and Factor Automata Automata; Factor; Suffix; Parallel; FR "We show that the constructions of directed acyclic word graphs (dawg's) and of minimal suffix and minimal factor automata can be done by almost optimal parallel algorithms (optimal within logarithmic factor). In the concurrent-write model our algorithms work in log n time and in the exclusive-write model they work in log^2 n time. The number of employed processors is linear. Hence our constructions have the same complexity as the best known parallel algorithms computing suffix trees. A relationship between dawg's and suffix trees is exploited ...." Inform Process Lett 1990 35 3 121-128 1462 Neraud,J. A String-Matching Inte.. Theoret.Comput. 92 92(1):145-164 Neraud J; Crochemore M A String-Matching Interpretation of the Equation x^m y^n = z^p String match; Pattern match; On-line; FR "We consider the following problem. Instance: a finite alphabet A, a biprefix code X = {x,y} whose elements are primitive, a word w in A*. Question: find all maximal factors of w which are prefixes of a word of X*. We present an on-line algorithm which solves the problem in time linear in the length of w, after a preprocessing phase applied to the set X." Theoret Comput Sci 1992 92 1 145-164 1463 Crochemore,M. Usefulness of the Karp.. Theoret.Comput. 91 88(1):59-82 Crochemore M; Rytter W Usefulness of the Karp-Miller-Rosenberg Algorithm in Parallel Computations on Strings and Arrays String match; Parallel; Multidimensional; Sequence analysis; Repeat; FR; Algorithm "The Karp-Miller-Rosenberg (1972) algorithm was one of the first efficient (almost linear) sequential algorithms for finding repeated patterns and for string matching. In the area of efficient sequential computations on strings it was soon superseded by more efficient (and more sophisticated) algorithms.
We show that the Karp-Miller-Rosenberg algorithm (KMR) must be considered as a basic technique in parallel computations. For many problems, variations of KMR give the (known) most efficient parallel algorithms." Theoret Comput Sci 1991 88 1 59-82 1464 Fuchs,R. Molecular Biological D.. Progress in Bio 91 56(3):215-245 Fuchs R; Cameron GN Molecular Biological Databases: The Challenge of the Genome Era Sequence database; Genome; DE "In this article we discuss the implications which the advances in sequencing technology and the genome analysis projects will have for the existing sequence databanks and how they can react to the challenges of the future. ... The first section provides some basic information on sequence databases and genome projects in order to improve the understanding of the problems which the databanks will have to face in the coming years. Then, the consequences of large-scale sequencing and genome analysis projects are explained in detail and it is shown that they require fundamental changes to the work of the sequence databanks. Next, different approaches and strategies for coping with the forthcoming problems are outlined, and finally we present a model for a next generation of sequence and other biological databases which requires a conceptional reorganization of these databases, but which offers good chances for successfully mastering the challenges of the future." Progress in Biophysics and Molecular Biology 1991 56 3 215-245 1465 Higgins,D.G. The EMBL Data Library Nucleic Acids R 92 20(suppl):2071 Higgins DG; Fuchs R; Stoehr PJ; Cameron GN The EMBL Data Library Sequence database; EMBL; Nucleotide; DE "The EMBL Data Library is part of the European Molecular Biology Laboratory in Heidelberg, Germany. It was established in 1980 and its principal role is to maintain and distribute a database of nucleotide sequences (the EMBL Nucleotide Sequence Database).
It is also involved in maintaining other biological databases such as the protein sequence database SWISS-PROT and distributes other databases of interest to molecular biologists." Nucleic Acids Res 1992 20 suppl 2071-2074 1466 Luttke,A. MacP12: A Protein Prop.. Comput.Appl.Bio 93 9(6):760-761 Luttke A; Fuchs R MacP12: A Protein Property Multi-Profile Plot Program for the Apple Macintosh Sequence analysis; Display; Profile; Protein; Amino acid; DE; Program "MacP12, a program for the Apple Macintosh, allows simultaneous plotting of two protein property profiles selectable from 12 built-in amino acid property scales. Various parameters such as the region to be analyzed, the size of the sliding window, the weighting function and the size of the graphical output can be easily adjusted by the user, which makes this program appropriate for diverse research questions. Since built-in scales can be simply exchanged, MacP12 is adaptable to the specific needs of the individual user." Comput Appl Biosci 1993 9 6 760-761 1467 Benner,S.A. Empirical and Structur.. J.Mol.Biol. 93 229(4):1065-10 Benner SA; Cohen MA; Gonnet GH Empirical and Structural Models for Insertions and Deletions in the Divergent Evolution of Proteins Evolution; Protein; Model; Indel; SWI; Deletion "The exhaustive matching of the protein sequence database makes possible a broadly based study of insertions and deletions (indels) during divergent evolution. In this study, the probability of a gap in an alignment of a pair of homologous protein sequences was found to increase with the evolutionary distance measured in PAM units (number of accepted point mutations per 100 amino acid residues). A relationship between the average number of amino acid residues between indels and evolutionary distance suggests that a unit 30 to 40 amino acid residues in length remains, on average, undisrupted by indels during divergent evolution." J Mol Biol 1993 229 4 1065-1082 1468 Gusfield,D. Faster Implementation ..
Inform.Process. 94 51(5):271-274 Gusfield D Faster Implementation of a Shortest Superstring Approximation Supersequence; Shortest common; Approximation; Data structure; Regularities; USA "The shortest superstring problem has recently received renewed attention due to its connection to problems in sequencing long pieces of DNA. Most recently, Teng and Yao (1993) developed an approximation algorithm for the shortest superstring problem which has a smaller error bound than the previously best approximation due to Blum, Jiang, Li, Tromp and Yannakakis (1991). ... In this paper we reduce the worst case running time for the new approximation method of Teng and Yao, making its running time competitive with the approximation method of Blum et al. We exploit suffix trees and properties of the periodicity of strings." Inform Process Lett 1994 51 5 271-274 1469 Gusfield,D. An Efficient Algorithm.. Inform.Process. 92 41(4):181-185 Gusfield D; Landau GM; Schieber B An Efficient Algorithm for the All Pairs Suffix-Prefix Problem Sequence analysis; Suffix; Prefix; USA; Algorithm "For a pair of strings (S1, S2), define the suffix-prefix match of (S1, S2) to be the longest suffix of string S1 that matches a prefix of string S2. The following problem is considered in this paper. Given a collection of strings S1, S2, ..., Sk of total length m, find the suffix-prefix match for each of the k(k-1) ordered pairs of strings. We present an algorithm that solves the problem in O(m + k^2) time, for any fixed alphabet. Since the size of the input is Ω(m) and the size of the output is Ω(k^2) this solution is optimal." Inform Process Lett 1992 41 4 181-185 1470 Gusfield,D. Parametric Optimizatio..
Algorithmica 94 12(4/5):312-32 Gusfield D; Balasubramanian K; Naor D Parametric Optimization of Sequence Alignment Sequence alignment; Pairwise alignment; Parametric; Edit; Distance; Optimization; USA "Parametric sequence alignment is the problem of computing the optimal-valued alignment between two sequences as a function of variable weights for matches, mismatches, spaces, and gaps. ... In this paper we are primarily concerned with the structure of this convex decomposition, and secondarily with the complexity of computing the decomposition. The most striking results are the following: For the special case where only matches, mismatches, and spaces are counted, and where spaces are counted throughout the alignment, we show that the decomposition is surprisingly simple: all regions are infinite; there are at most n^(2/3) regions; the lines that bound the regions are all of the form b = c + (c + 0.5)a; ..." Algorithmica 1994 12 4/5 312-326 1471 Hendy,M.D. A Combinatorial Descri.. Discrete Math. 91 96(1):51-58 Hendy MD A Combinatorial Description of the Closest Tree Algorithm for Finding Evolutionary Trees Phylogeny; Combinatorial; Evolutionary tree; NZ; Algorithm "The closest tree algorithm for estimating the evolutionary history of n species, from a set of homologous DNA or RNA sequences is designed to avoid the problem of inconsistency inherent in current methods. ... In this paper, a new description of the algorithm is given, exploiting a combinatorial inverse pair relationship. As a consequence, the algorithm can be improved in efficiency, to be O(n 2^n) for some classes of sequences. This improvement makes the algorithm practical for problems involving up to n = 20 species." Discrete Math 1991 96 1 51-58 1472 Hendy,M.D. A Discrete Fourier Ana..
Proc.Nat.Acad.S 94 91(8):3339-334 Hendy MD; Penny D; Steel MA A Discrete Fourier Analysis for Evolutionary Trees Evolutionary tree; Fourier; Spectral analysis; NZ "Discrete Fourier transformations have recently been developed to model the evolution of two-state characters (the Cavender/Farris model). We report here the extension of these transformations to provide invertible relationships between a phylogenetic tree T (with three probability parameters of nucleotide substitution on each edge corresponding to Kimura's 3ST model) and the expected frequencies of the nucleotide patterns in the sequences. We refer to these relationships as spectral analysis." Proc Nat Acad Sci USA 1994 91 8 3339-3343 1473 Feldman,W. Gray Code Masks for Se.. Genomics 94 23:233-235 Feldman W; Pevzner P Gray Code Masks for Sequencing by Hybridization Sequencing; Hybridization; Error; USA; Mask "In light-directed synthesis of high-density oligonucleotide arrays for sequencing by hybridization, synthesis errors result from the unintended illumination of chip regions that should remain dark. Most synthesis errors occur at the borders of illuminated regions, where light diffraction, internal reflection, and scattering produce the most unintended illumination. A combinatorial synthesis strategy based on two-dimensional Gray codes was devised to reduce the overall lengths of these borders in masks for photolithographic chip design. This article describes an application of two-dimensional Gray codes ...." Genomics 1994 23 233-235 1474 Shields,D.C. GCWIND: A Microcompute.. Comput.Appl.Bio 92 8(5):521-523 Shields DC; Higgins DG; Sharp PM GCWIND: A Microcomputer Program for Identifying Open Reading Frames According to Codon Positional G+C Content Frame; Reading; Codon; Program; IR "GCWIND is a microcomputer (IBM-PC compatible) program for the identification of protein-coding open reading frames. The program is similar to the FRAME program ....
The base compositions (%G+C) for each of the three possible reading phases through the DNA sequence are displayed separately, together with the positions of potential translation initiation and termination codons (on the leading and complementary strands), to provide an immediate representation of those regions within the sequence that have coding potential." Comput Appl Biosci 1992 8 5 521-523 1475 Kishino,H. Maximum Likelihood Inf.. J.Mol.Evol. 90 31(2):151-160 Kishino H; Miyata T; Hasegawa M Maximum Likelihood Inference of Protein Phylogeny and the Origin of Chloroplasts Phylogeny; Likelihood; Protein; Markov; JP; Chloroplast "A maximum likelihood method for inferring protein phylogeny was developed. It is based on a Markov model that takes into account the unequal transition probabilities among pairs of amino acids and does not assume constancy of rate among different lineages. Therefore, this method is expected to be powerful in inferring phylogeny among distantly related proteins, either orthologous or paralogous, where the evolutionary rate may deviate from constancy. Not only amino acid substitutions but also insertion/deletion events during evolution were incorporated into the Markov model." J Mol Evol 1990 31 2 151-160 1476 Myers,E.W. A Sublinear Algorithm .. Algorithmica 94 12(4/5):345-37 Myers EW A Sublinear Algorithm for Approximate Keyword Searching Approximate match; Database search; Dynamic programming; USA; Algorithm "Given a relatively short query string W of length P, a long subject string A of length N, and a threshold D, the approximate keyword search problem is to find all substrings of A that align with W with not more than D insertions, deletions, and mismatches. ... In this paper we present an algorithm that given a precomputed index of the database A, finds rare matches in time that is sublinear in N, i.e., N^c for some c < 1. The sequence A must be over a finite alphabet Σ. ...
In preliminary practical experiments, the approach gives a 50- to 500-fold improvement over previous algorithms for problems of interest in molecular biology." Algorithmica 1994 12 4/5 345-374 1477 Fischetti,V.A Identifying Periodic O.. Inform.Process. 93 45(1):11-18 Fischetti VA; Landau GM; Schmidt JP; Sellers PH Identifying Periodic Occurrences of a Template with Applications to Protein Structure Protein; Structure; Template; Regularities; String match; USA Author sequence and affiliations were corrected on page 157 of Inform. Process. Lett. 46 (1993). "Consider a template P of size m in which each character matches many different characters with various degrees of perfection. Given a text T of size n, we present a simple and practical algorithm that finds the substring of T, which best matches some substring of Pn (Pn is the concatenation of an arbitrary number of copies of P). The algorithm produces the matched pair and their alignment in O(mn) time." Inform Process Lett 1993 45 1 11-18 1478 Li,W.H. Unbiased Estimation of.. J.Mol.Evol. 93 36(1):96-99 Li WH Unbiased Estimation of the Rates of Synonymous and Nonsynonymous Substitution Substitution; Synonymous; Rate; Transition; Transversion; USA; Estimation "The current convention in estimating the number of substitutions per synonymous site (KS) and per nonsynonymous site (KA) between two protein-coding genes is to count each twofold degenerate site as one-third synonymous and two-thirds nonsynonymous because one of the three possible changes at such a site is synonymous and the other two are nonsynonymous. This counting rule can considerably overestimate the KS value .... A new method that gives unbiased estimates is proposed." J Mol Evol 1993 36 1 96-99 1479 Gates,W.H. Bounds for Sorting by .. Discrete Math.
79 27:47-57 Gates WH; Papadimitriou CH Bounds for Sorting by Prefix Reversal Inversion; Prefix; Reversal; Genomic; USA "For a permutation s of the integers from 1 to n, let f(s) be the smallest number of prefix reversals that will transform s to the identity permutation, and let f(n) be the largest such f(s) for all s in (the symmetric group) S_n. We show that f(n) <= (5n + 5)/3, and that f(n) >= 17n/16 for n a multiple of 16. If, furthermore, each integer is required to participate in an even number of reversed prefixes, the corresponding function g(n) is shown to obey 3n/2 - 1 <= g(n) <= 2n + 3." Discrete Math 1979 27 47-57 1480 Myers,E.W. An O(N**2 log N) Restr.. Bull.Math.Biol. 92 54(4):599-618 Myers EW; Huang X An O(N^2 log N) Restriction Map Comparison and Search Algorithm Restriction; Mapping; USA; Algorithm "We present an O(R log P) time, O(M + P^2) space algorithm for searching a restriction map with M sites for the best matches to a shorter map with P sites, where R, the number of matching site pairs, is bounded by MP. As first proposed by Waterman et al. (1984) the objective function used to score matches is additive in the number of unaligned sites and the discrepancies in the distances between adjacent aligned sites. Our algorithm is basically a sparse dynamic programming computation in which 'candidate lists' are used to model the future contribution of all previously computed entries to those yet to be computed." Bull Math Biol 1992 54 4 599-618 1481 Pevzner,P.A. Optimal Chips for Mega.. Mol.Biol.(Mosc. 91 25(2 part 2):4 Pevzner PA; Lysov YuP; Khrapko KR; Belyavskii AV; Florentev VL; Mirzabekov AD Optimal Chips for Megabase DNA Sequencing Sequencing; DNA; Optimal; RU Translated from Molekulyarnaya Biologiya, 25(2), 552-562, March-April 1991. "A new approach to DNA sequencing associated with hybridization of a DNA fragment with oligonucleotides immobilized on a two-dimensional matrix (the SHOM method) was proposed in 1988.
The first SHOM studies were directed at creating a sequence matrix containing all 65,536 octanucleotides. A new family of sequencing matrices has now been proposed, making it possible to reduce the number of oligonucleotides to be synthesized by a factor of 5-15 with virtually no decrease in method resolution." Mol Biol (Mosc ) 1991 25 2 part 2 459-467 1482 Gelfand,M.S. Extendable Words in Nu.. Comput.Appl.Bio 92 8(2):129-135 Gelfand MS; Kozhukhin CG; Pevzner PA Extendable Words in Nucleotide Sequences Sequence analysis; Word; Statistical; Nucleotide; Linguistic; RU "Previous statistical analyses revealed several peculiarities of nucleotide sequences that preclude their description by existing models and thus allow one to distinguish DNA and RNA sequences from random A,T,C,G-texts. This is a consequence of the unusual distribution of certain words in nucleotide sequences: while the distribution of (most) words is consistent with Markov models of small orders, the distribution of certain words cannot be described by any previous model .... In this work we introduce a probabilistic approach that is partly motivated by analogy with linguistics." Comput Appl Biosci 1992 8 2 129-135 1483 Sankoff,D. Analytical Approaches .. Biochimie 93 75(5):409-413 Sankoff D Analytical Approaches to Genomic Evolution Genomic; Evolution; Analytical; CA "We model the non-local mechanisms of genomic evolution and propose methods for studying the evolutionary divergence of species based on these models. Mechanisms include the movement of segments of genomes within a single chromosome (transpositions), the reciprocal translocation of segments between two chromosomes, and the inversion of segments. Each of these is studied in the context of a different type of genomic data. We introduce the theory of phylogenetic invariants for evolutionary inference based on very long macromolecular sequences." Biochimie 1993 75 5 409-413 1484 Ferretti,V. The Empirical Discover.. 
Adv.Appl.Probab 93 25(2):290-302 Ferretti V; Sankoff D The Empirical Discovery of Phylogenetic Invariants Phylogeny; Invariant; Markov; CA; Phylogenetic "An invariant F of a tree T under a k-state Markov model, where the time parameter is identified with the edges of T, allows us to recognize whether data on N observed species can be associated with the N terminal vertices of T in the sense of having been generated on T rather than on any other tree with N terminals. The invariance is with respect to the (time) lengths associated with the edges of the tree. We propose a general method of finding invariants of a parametrized functional form. ... We apply this to the case of quadratic invariants of unrooted binary trees with four terminals, for all k, using the Jukes-Cantor type of Markov matrix." Adv Appl Probab 1993 25 2 290-302 1485 Sibbald,P.R. Overseer: A Nucleotide.. Comput.Appl.Bio 92 8(1):45-48 Sibbald PR; Sommerfeldt H; Argos P Overseer: A Nucleotide Sequence Searching Tool Sequence search; Nucleotide; Database search; DE "Overseer is a computer program that searches databases of nucleic acid sequences for objects of interest to the user. Such objects may consist of any number of simpler building blocks such as repeats, palindromes or stem-loops, strings of particular bases with or without mismatches, etc. Written in standard Pascal, this program runs under Unix and VMS and should also run under other operating systems." Comput Appl Biosci 1992 8 1 45-48 1486 Guigo,R. Inferring Correlation .. IEEE Trans.Patt 93 15(10):1030-10 Guigo R; Smith TF Inferring Correlation between Database Queries: Analysis of Protein Sequence Patterns Database search; Correlation; Protein; Pattern discovery; USA; Query "Given a subset P of a database, we address the problem of finding the query f in a given database attribute having the closest extension to P. 
In the particular case that we outline, P is the set of protein sequences in a protein sequence database matching a given protein sequence pattern, whereas f is a query in the annotation of the database. Ideally, f is the description of a biological function. If the extension of f is very similar to P, we may infer association between the pattern and the biological function described by the query." IEEE Trans Patt Anal Mach Intell 1993 15 10 1030-1041 1487 Guigo,R. Automatic Evaluation o.. Comput.Appl.Bio 91 7(3):309-315 Guigo R; Johansson A; Smith TF Automatic Evaluation of Protein Sequence Functional Patterns Protein; Pattern discovery; USA "A procedure that automatically provides an evaluation of the diagnostic ability of a protein sequence functional pattern is described. The procedure relies on the identification of the closest definable set in terms of a (protein sequence) database functional annotation to the set of database instances containing a given pattern. Assuming annotation correctness and completeness in the protein sequence database, the degree of statistical association between these sets provides an appropriate measure of the diagnostic ability of the pattern." Comput Appl Biosci 1991 7 3 309-315 1488 Lamperti,E.D. Corruption of Genomic .. Nucleic Acids R 92 20(11):2741-27 Lamperti ED; Kittelberger JM; Smith TF; Villa-Komaroff L Corruption of Genomic Databases with Anomalous Sequence Sequence database; Reliability; Genomic; USA "We describe evidence that DNA sequences from vectors used for cloning and sequencing have been incorporated accidentally into eukaryotic entries in the GenBank database. These incorporations were not restricted to one type of vector or to a single mechanism. Many minor instances may have been the result of simple editing errors, but some entries contained large blocks of vector sequence that had been incorporated by contamination or other accidents during cloning. 
Some cases involved unusual rearrangements and areas of vector distant from the normal insertion sites." Nucleic Acids Res 1992 20 11 2741-2747 1489 Steel,M.A. Confidence in Evolutio.. Nature (Lond.) 93 364 (29 Jul.): Steel MA; Lockhart PJ; Penny D Confidence in Evolutionary Trees from Biological Sequence Data Evolutionary tree; Confidence; Significance; NZ "Where genomes have independently acquired similar G+C base compositions, signals in the data arise that cause methods of evolutionary tree reconstruction to estimate the wrong tree by grouping together sequences with similar G+C content. Under these conditions randomization tests can lead to both the rejection of the correct evolutionary hypothesis and acceptance of an incorrect hypothesis .... We have proposed one approach to testing for the G+C content problem. Here we present a formalization of this method, a frequency-dependent significance test, which has general application." Nature (Lond ) 1993 364 29 Jul. 440-442 1490 Szekely,L.A. Fourier Calculus on Ev.. Adv.Appl.Math. 93 14(2):200-216 Szekely LA; Steel MA; Erdos PL Fourier Calculus on Evolutionary Trees Evolutionary tree; Fourier; Calculus; Phylogeny; HU "We describe a Fourier analysis approach to the reconstruction theory of evolutionary trees that is based on Kimura's model of molecular evolution." Adv Appl Math 1993 14 2 200-216 1491 Prestridge,D. SIGNAL SCAN 3.0: New D.. Comput.Appl.Bio 93 9(1):113-115 Prestridge DS; Stormo G SIGNAL SCAN 3.0: New Database and Program Features Database search; Program; DNA; USA; Signal "SIGNAL SCAN is a program that utilizes a transcription factor database to find potential transcription factor binding sites in DNA sequences. ... SIGNAL SCAN is now network compatible and is available for IBM-compatible PC, Unix and VMA platforms." Comput Appl Biosci 1993 9 1 113-115 1492 Snyder,E.E. Identification of Codi.. 
Nucleic Acids R 93 21(3):607-613 Snyder EE; Stormo GD Identification of Coding Regions in Genomic DNA Sequences: An Application of Dynamic Programming and Neural Networks Pattern discovery; Coding; Region; Dynamic programming; Neural; Identification; DNA; Genomic; USA; Network; Dynamic "Dynamic programming (DP) is applied to the problem of precisely identifying internal exons and introns in genomic DNA sequences. The program GeneParser first scores the sequence of interest for splice sites and for these intron- and exon-specific content measures: codon usage, local compositional complexity, 6-tuple frequency, length distribution and periodic asymmetry. This information is then organized for interpretation by DP. GeneParser employs the DP algorithm to enforce the constraints that introns and exons must be adjacent and non-overlapping and finds the highest scoring combination of introns and exons subject to these constraints." Nucleic Acids Res 1993 21 3 607-613 1493 Huelsenbeck,J Is Character Weighting.. Syst.Biol. 94 43(2):288-291 Huelsenbeck JP; Swofford DL; Cunningham CW; Bull JJ; Waddell PJ Is Character Weighting a Panacea for the Problem of Data Heterogeneity in Phylogenetic Analysis? Phylogeny; Character data; Character weight; USA; Phylogenetic "Although we fully agree that a weighting strategy that correctly down-weights unreliable characters can improve accuracy of phylogenetic estimation (Bull et al., 1993:394), we find the position that weighting can take into account all forms of heterogeneity to be overly optimistic. Even if that position were supportable, however, we do not understand the claim that conditions under which differential weighting fails to solve the problem would also cause our approach to fail." Syst Biol 1994 43 2 288-291 1494 Taylor,W.R. Compensating Changes i.. Protein Eng.
94 7(3):341-348 Taylor WR; Hatrick K Compensating Changes in Protein Multiple Sequence Alignments Multiple alignment; Protein; Sequence alignment; Structure; UK "A method was developed to identify compensating changes between residues at positions in a multiple sequence alignment. (For example, one position might always contain a positively charged residue when the other is negatively charged and vice versa.) A correlation-based method was used to measure the compensation found in the four residues at a pair of positions in any two sequences in a multiple alignment. All possible sequence pairings were measured at the pair of positions and the resulting matrix analysed to give a measure of cooperativity among the pairs." Protein Eng 1994 7 3 341-348 1495 Taylor,W.R. Protein Fold Refinemen.. Protein Eng. 93 6(6):593-604 Taylor WR Protein Fold Refinement: Building Models from Idealized Folds using Motif Constraints and Multiple Sequence Data Protein; Fold; Model; Motif; Multiple alignment; Distance; UK "A general solution to the problem of directly incorporating data from multiple sequence alignments into the construction of molecular models was approached through the calculation of an estimated pairwise distance based on conserved hydrophobicity. A scaling method was developed that allowed the required bulk geometric properties of the estimated pairwise distances (mean and mean squared) to mimic those expected in a globular protein. These properties were maintained independently of the composition, length, number or degree of conservation of the original sequences." Protein Eng 1993 6 6 593-604 1496 Lerman,I.C. Classification of Alig.. New Approache.. 
94Springer-Verlag Lerman IC; Nicolas J; Tallur B; Peter P Classification of Aligned Biological Sequences Diday E Lechevallier Y; Schader M; Bertrand P; Burtschy B New Approaches in Classification and Data Analysis Classification; Clustering; Similarity; Sequence comparison; FR "We considered the problem of classifying aligned sequences and applied our methods to two families - a family of 68 cytochrome sequences and that of 42 globin sequences. ... The main interest of this paper is to show the interactions between mathematical representation of similarities among these complex data structures and the outcome of clustering within the common framework of Likelihood Linkage Analysis (L.L.A.) (Lerman, Lerman et al. (1993)). Applying LLA methodology, fine and relevant results have been obtained for both data sets." Springer-Verlag Berlin 1994 370-377 1497 Huang,X. Dynamic Programming Al.. Comput.Appl.Bio 92 8(5):511-520 Huang X; Waterman MS Dynamic Programming Algorithms for Restriction Map Comparison Restriction; Mapping; Dynamic programming; USA; Dynamic; Algorithm "For most sequence comparison problems there is a corresponding map comparison algorithm. While map data may appear to be incompatible with dynamic programming, we show in this paper that the rigor and efficiency of dynamic programming algorithms carry over to the map comparison algorithms. We present algorithms for restriction map comparison .... The new algorithms are a natural extension of a previous map comparison model. Dynamic programming algorithms for computing optimal global and local alignments under the new model are described." Comput Appl Biosci 1992 8 5 511-520 1498 Crochemore,M. On Two-Dimensional Pat.. Theoret.Comput. 
94 132:403-414 Crochemore M; Rytter W On Two-Dimensional Pattern Matching by Optimal Parallel Algorithms Pattern match; Parallel; Multidimensional; FR; Optimal; Algorithm "Simplified versions of Kedem-Landau-Palem algorithms for parallel one-dimensional and two-dimensional pattern-matching on a CRCW PRAM are presented. ... A novel algorithm for two-dimensional matching is presented which is more directly designed for two-dimensional objects." Theoret Comput Sci 132 132 403-414 1499 Wu,C. Back-Propagation and C.. Nucleic Acids R 94 22(20):4291-42 Wu C; Shivakumar S Back-Propagation and Counter-Propagation Neural Networks for Phylogenetic Classification of Ribosomal RNA Sequences Phylogeny; Classification; Neural; RNA; USA; Phylogenetic; Network "A neural network system has been developed for rapid and accurate classification of ribosomal RNA sequences according to phylogenetic relationship. The molecular sequences are encoded into neural input vectors using an n-gram hashing method. A SVD (singular value decomposition) method is used to compress and reduce the size of long and sparse n-gram input vectors. The neural networks used are three-layered, feed-forward networks that employ supervised learning paradigms, including the back-propagation algorithm and a modified counter-propagation algorithm." Nucleic Acids Res 1994 22 20 4291-4299 1500 Goldman,N. A Codon-based Model of.. Mol.Biol.Evol. 94 11(5):725-736 Goldman N; Yang Z A Codon-based Model of Nucleotide Substitution for Protein-coding DNA Sequences Codon; Model; Substitution; DNA; Phylogeny; Markov; UK; Nucleotide "A codon-based model for the evolution of protein-coding DNA sequences is presented for use in phylogenetic estimation. A Markov process is used to describe substitutions between codons.
Transition/transversion rate bias and codon usage bias are allowed in the model, and selective restraints at the protein level are accommodated using physicochemical distances between the amino acids coded for by the codons." Mol Biol Evol 1994 11 5 725-736 1501 Muse,S.V. A Likelihood Approach .. Mol.Biol.Evol. 94 11(5):715-724 Muse SV; Gaut BS A Likelihood Approach for Comparing Synonymous and Nonsynonymous Nucleotide Substitution Rates, with Application to the Chloroplast Genome Likelihood; Synonymous; Substitution; Genome; Model; DNA; Evolution; USA; Nucleotide; Rate; Chloroplast "A model of DNA sequence evolution applicable to coding regions is presented. This represents the first evolutionary model that accounts for dependencies among nucleotides within a codon. The model uses the codon, as opposed to the nucleotide, as the unit of evolution, and is parameterized in terms of synonymous and nonsynonymous nucleotide substitution rates. One of the model's advantages ... is that it completely corrects for multiple hits at a codon, rather than taking a parsimony approach and considering only pathways of minimum change between homologous codons." Mol Biol Evol 1994 11 5 715-724 1502 Konopka,A.K. Computational Experime.. Computers Chem. 94 18(3):v-viii Konopka AK Computational Experiments in Molecular Biology: Searching for the 'Big Picture' Sequence analysis; USA This editorial introduces a special issue entitled "Open Problems of Computational Molecular Biology (3)." Many of the papers were presented at the Third International Workshop on Open Problems in Computational Molecular Biology, Telluride, 11-25 July 1993. Contents: Computational Issues in Genome Research (3 papers), Protein Structure Prediction (4), Mathematical Techniques in Sequence Research (4), Modelling Evolution and Development (3), Overviews and Opinions (2). Computers Chem 1994 18 3 v-viii 1503 Claverie,J.M. Some Useful Statistica.. Computers Chem.
94 18(3):287-294 Claverie JM Some Useful Statistical Properties of Position-Weight Matrices Sequence analysis; Pattern discovery; Statistical; Significance; Profile; Scoring; USA "Position-weight matrices (or profiles) are simple mathematical objects traditionally used to capture the information about local sequence patterns (or motifs) characteristic of a given structure or function. Although weight matrices can lead to fast database scanning algorithms their usage has been limited, due to the lack of a reliable method to assess the statistical significance of the matching scores. In this article I first review three different computation schemes for designing weight matrices .... I then show that, for patterns spanning 10 positions or more, the best scores expected from matching random sequences are distributed according to the extreme value (Gumbel) distribution." Computers Chem 1994 18 3 287-294 1504 Lawrence,C. Toward the Unification.. Computers Chem. 94 18(3):255-258 Lawrence C Toward the Unification of Sequence and Structural Data for Identification of Structural and Functional Constraints Pattern discovery; Structure; Function; Protein; USA; Identification "The identification and characterization of local residue patterns or conserved segments shared by a set of biopolymers has provided a number of insights in molecular biology. Biopolymer sequences are observations from macromolecules that share common structural or functional features. The approach taken here rests on the notion that information may be most efficiently extracted from these observations through the use of a model that faithfully represents macromolecular characteristics. Accordingly, our efforts are focused on statistical models which attempt to capture central features of protein structure, function, and change. Here the assumptions that underlie two new methods for the analysis of protein sequence data are explicitly delineated." Computers Chem 1994 18 3 255-258 1505 Fickett,J.W.
Inferring Genes from O.. Computers Chem. 94 18(3):203-205 Fickett JW Inferring Genes from Open Reading Frames Gene; Protein; Coding; Reading; ORF; USA; Frame "One expects that in DNA without protein coding function, stop codons (which constitute three of the 64 possible codons) should occur frequently in all reading frames, and that a long open reading frame (ORF) can be interpreted as a sign for the existence of a gene. We make a beginning on introducing quantitative measures of confidence into this inference - taking Saccharomyces cerevisiae as a sample case - and show that some common assumptions can reasonably be questioned. In particular we show that statistical support for the biological function of shorter ORFs listed as putative genes in recent papers is in fact very weak." Computers Chem 1994 18 3 203-205 1506 Day,W.H.E. The Asymptotic Plurali.. Math.Comput.Mod 95 0:0-0 Day WHE; Kubicka E; Kubicki G; McMorris FR The Asymptotic Plurality Rule for Molecular Sequences Consensus method; Plurality rule; Sequence analysis; Characterization; USA "The asymptotic plurality rule, apl, is a consensus function which maps each profile P of length n (i.e., each sequence of n bases appearing at an aligned position of n molecules) to a set apl ( P ) of consensus results (i.e., ambiguity codes) that is a descriptive summary of P. Our main result is to characterize each consensus result X = apl ( P ) in terms of the frequencies with which the bases in P occur. We then use these characterizations to investigate features (e.g., strong consistency, length independence) of apl that researchers may find useful for the interpretation of apl's consensus results." Math Comput Modelling 0 0 0-0 1507 Baeza-Yates,R Analysis of Boyer-Moor.. ACM-SIAM Sympos 90 1:328-343 Baeza-Yates RA; Gonnet GH; Regnier M Analysis of Boyer-Moore-Type String Searching Algorithms String search; Boyer-Moore; Automata; CL; Algorithm "We study Boyer-Moore-type string searching algorithms. 
First, we analyze the Horspool's variant. The [average-case] searching time is linear. An exact expression of the linearity constant is derived and is proven to be asymptotically 1/c, where c is the cardinality of the alphabet. ... We also study Boyer-Moore automata, a notion that we formalize. This approach appears to be faster than any other known algorithm, in both the worst and average case number of inspections." ACM-SIAM Sympos Discrete Algorithms 1990 1 328-343 1508 Amir,A. Efficient Pattern Matc.. ACM-SIAM Sympos 90 1:344-357 Amir A; Landau GM; Vishkin U Efficient Pattern Matching with Scaling Pattern match; Multidimensional; USA "The problem of pattern matching with scaling is defined. The input for the two-dimensional version of the problem consists of an n x n 'text' matrix and an m x m 'pattern' matrix. We want to find all occurrences of the pattern in the text, scaled to all natural multiples. ... This problem is useful for some tasks in computer vision. Our main contribution is a linear time algorithm for the problem." ACM-SIAM Sympos Discrete Algorithms 1990 1 344-357 1509 Amir,A. Efficient 2-dimensiona.. ACM-SIAM Sympos 91 2:212-223 Amir A; Farach M Efficient 2-dimensional Approximate Matching of Non-rectangular Figures Pattern match; Multidimensional; Approximate match; USA "Finding all occurrences of a non-rectangular pattern of height m and area a in an n x n text with no more than k mismatch, insertion, and deletion errors is an important problem in computer vision. It can be solved using a dynamic programming approach in time O( a n**2 ). We show a O( k n**2 ( m log m )**(1/2) ( k log k )**(1/2) + k**2 n**2 ) algorithm which combines convolutions with dynamic programming. At the heart of the algorithm are the Smaller Matching Problem and the k-Aligned Ones with Location Problem. Efficient algorithms to solve both these problems are presented." ACM-SIAM Sympos Discrete Algorithms 1991 2 212-223 1510 Cole,R. Tight Bounds on the Co..
ACM-SIAM Sympos 91 2:224-233 Cole R Tight Bounds on the Complexity of the Boyer-Moore String Matching Algorithm String match; Boyer-Moore; Complexity; USA; Algorithm "The problem of finding all occurrences of a pattern of length m in a text of length n is considered. It is shown that the Boyer-Moore string matching algorithm performs roughly 3n comparisons and that this bound is tight up to O( n/m ); more precisely, an upper bound of 3n - n/m comparisons is shown, as is a lower bound of 3n( 1 - o(1) ) comparisons, as n/m goes to infinity and m goes to infinity. While the upper bound is somewhat involved, its main elements provide a quite simple proof of a 4n upper bound for the same algorithm." ACM-SIAM Sympos Discrete Algorithms 1991 2 224-233 1511 Amir,A. Two-Dimensional Period.. ACM-SIAM Sympos 92 3:440-452 Amir A; Benson G Two-Dimensional Periodicity and its Applications Pattern match; Multidimensional; Regularities; USA "This paper presents a new algorithmic technique for two-dimensional matching, that of periodicity analysis. This paper's main contribution is defining and analysing two-dimensional periodicity in rectangular arrays. In addition, we introduce a new pattern matching paradigm - Compressed Matching. A text array T and a pattern array P are given in compressed forms c(T) and c(P). We seek all appearances of P in T, without decompressing T." ACM-SIAM Sympos Discrete Algorithms 1992 3 440-452 1512 Waterman,M.S. Introduction to Comput.. 95Chapman Hall Waterman MS Introduction to Computational Biology: Maps, Sequences and Genomes BK - Sequence analysis; Mapping; Genome; USA Not published as of March 1995. Chapman Hall 1995 0-0 1513 Sankoff,D. Steiner Points in the .. 95 Sankoff D; Sundaram G; Kececioglu J Steiner Points in the Space of Genome Rearrangements (in preparation) BK - Genomic; Rearrangement; CA; Genome Preliminary version presented at Workshop on Genome Rearrangements, University of Southern California, March 1994. 1995 1514 Luo,L. 
A Stochastic Evolution.. J.Theor.Biol. 92 157:83-94 Luo L; Trainor LEH A Stochastic Evolutionary Model of Molecular Sequences Sequence analysis; Stochastic; Evolution; Model; CA "A stochastic evolutionary model of molecular sequences is proposed. The basic forces in evolution are supposed to be mutation and selection. ... The selective force is divided into two parts: a slowly-varying part and a rapidly-changing fluctuation. The latter influences the distribution of sequences and results in an equation of motion along the flow line. The former plays a more important role in the emergence of evolutionary order. It is demonstrated that the asymmetry of selective forces would lead to a definite order of the system." J Theor Biol 157 157 83-94 1515 Jiang,T. Approximating Shortest.. Theoret.Comput. 94 134(2):473-491 Jiang T; Li M Approximating Shortest Superstrings with Constraints Supersequence; Shortest common; Data structure; CA Also Proc. 3rd Workshop on Algorithms and Data Structures, 1993, pp. 385-396. "Various versions of the shortest common superstring problem play important roles in data compression and DNA sequencing. ... We present polynomial-time approximation algorithms that produce consistent superstrings of length O(n), for two important special cases: (a) when no negative strings contain positive strings as substrings; (b) when there are only a constant number of negative strings. The algorithms are obtained by making an essential use of the Hungarian algorithm, which can find an optimal cycle cover on weighted graphs." Theoret Comput Sci 1994 134 2 473-491 1516 Bandelt,H.J. Split Decomposition: A.. Mol.Phylogenet.
92 1(3):242-252 Bandelt HJ; Dress AWM Split Decomposition: A New and Useful Approach to Phylogenetic Analysis of Distance Data Phylogenetic; Evolutionary distance; Distance; DE; Decomposition "In order to analyze the structure inherent to a matrix of dissimilarities (such as evolutionary distances) we propose to use a new technique called split decomposition. This method accurately dissects the given dissimilarity measure as a sum of elementary 'split' metrics plus a (small) residue. The split summands identify related groups which are susceptible to further interpretation when casted against the available biological information. Reanalysis of previously published ribosomal RNA data sets using split decomposition illustrate the potential of this approach." Mol Phylogenet Evol 1992 1 3 242-252 1517 Zhang,M.Q. Alignment of Molecular.. J.Theor.Biol. 95 174(2):119-129 Zhang MQ; Marr TG Alignment of Molecular Sequences Seen as Random Path Analysis Sequence alignment; Random path; Dynamic programming; USA "We propose a generating functional method - random path analysis (RPA) - that generalizes the classical dynamic programming (DP) method widely used in sequence alignments. For a given cost function, DP is a deterministic method that finds an optimal alignment by minimizing the total cost function for all possible alignments. By allowing uncertainty, RPA is a statistical method that weights fluctuating alignments by probabilities. Therefore, DP may be thought of as the deterministic limit of RPA when the fluctuations approach zero. ... Here we focus on deriving a mathematically rigorous solution to RPA both in its combinatorial form and in its graphical representation ...." J Theor Biol 1995 174 2 119-129 1518 Orengo,C.A. A Review of Methods fo.. Patterns in P.. 92Springer-Verlag Orengo CA A Review of Methods for Protein Structure Comparison Taylor WR Patterns in Protein Sequence and Structure Structure; Review; Protein Vol. 7, Springer series in Biophysics. 
Orengo, Taylor (1993), p. 497 Springer-Verlag Heidelberg 1992 159-188 1519 Goldstein,L. Approximations to Prof.. J.Comput.Biol. 94 1(2):93-104 Goldstein L; Waterman MS Approximations to Profile Score Distributions Scoring; Statistical; Significance; Approximation; Profile; Distribution; USA; Score "Profiles, which are summaries of multiple alignments of a sequence family, are used to find new instances of the family in databases. In this paper, we study the maximum score M obtained when the profile is aligned without indels at all possible positions of a random sequence. The main theorem gives an approximation to the distribution function of M with an explicit bound on the error. This theorem implies that M has a limiting extreme value distribution." J Comput Biol 1994 1 2 93-104 1520 Searls,D.B. The Computational Ling.. Artificial In.. 93AAAI Press Searls DB The Computational Linguistics of Biological Sequences Hunter L Artificial Intelligence and Molecular Biology Sequence analysis; Language; USA; Linguistic Searls (1992), p. 591 who mentions 1992, Snyder & Stormo (1995), p. 17. AAAI Press Cambridge, MA 1993 47-120 1521 Luo,L. The Maximum Informatio.. J.Theor.Biol. 95 174(2):131-136 Luo L; Bai G The Maximum Information Principle and the Evolution of Nucleotide Sequences Sequence analysis; Composition; Sequence prediction; Markov; Probability; Information theory; CN; Evolution; Nucleotide "The probability distributions of bases in nucleotide sequences are deduced from the maximum information principle by maximizing the entropy (due to random mutation of bases) under certain constraints (Markovian entropy, G+C content, etc., due to selection). Two formulations are given with respect to different selective constraints. The deviations of theoretical distributions from experimental data are lower than 10% for most sequences. It is shown that the Lagrange multipliers change from species to species systematically - i.e., selective constraints correlate with evolution." 
J Theor Biol 1995 174 2 131-136 1522 Aho,A.V. Bounds on the Complexi.. IEEE Sympos.Fou 74 15:104-109 Aho AV; Hirschberg DS; Ullman JD Bounds on the Complexity of the Longest Common Subsequence Problem Longest common; Complexity; Subsequence; USA David S. Johnson. Not at DIMACS. Published in J. ACM, 23(1), 1-12 (1976) IEEE Sympos Found Comput Sci 15 15 104-109 1523 Griggs,J.R. On the Number of Align.. Graphs Combin. 90 6:133-146 Griggs JR; Hanlon P; Odlyzko AM; Waterman MS On the Number of Alignments of k Sequences Multiple alignment; Sequence alignment; Combinatorial; USA "Numerous studies by molecular biologists concern the relationships between several long DNA sequences, which are listed in rows with some gaps inserted and with similar positions aligned vertically. This motivates our interest in estimating the number of possible arrangements of such sequences. We say that a k sequence alignment of size n is obtained by inserting some (or no) 0's into k sequences of n 1's so that every sequence has the same length and so that there is no position which is 0 in all sequences. We show by a combinatorial argument that for any fixed k >= 1, the number f(k,n) of k alignments of length n grows like (c-sub-k)**n as n [goes to infinity] ...." Graphs Combin 6 6 133-146 1524 Neyman,J. Molecular Studies of E.. Statistical D.. 71Academic Press Neyman J Molecular Studies of Evolution: A Source of Novel Statistical Problems Gupta SS Yackel J Statistical Decision Theory and Related Topics; Proceedings of a Symposium Held at Purdue University, November 23-25, 1970 Statistical; Likelihood; Phylogeny; Evolution; USA "The recently opened and rapidly developing field of evolution research, conducted on the level of molecules, is a novel source of interesting statistical and probabilistic problems. The biological studies are concerned with macromolecules which, in organisms as diverse as Man, Monkey, Carp, Whale and Yeast, perform similar functions and have similar structures. 
The apparently inconsequential differences among such homologous macromolecules, their sites and their frequencies, are at the base of current efforts to establish lineages linking the species studied to a common ancestor. The nature of statistical problems originating from such biological studies is illustrated on two tentative stochastic models of 'inconsequential' substitutions in the macromolecules." Academic Press New York 1971 1-27 1525 Bisant,D. Identification of Ribo.. Nucleic Acids R 95 23(9):1632-163 Bisant D; Maizel J Identification of Ribosome Binding Sites in Escherichia coli Using Neural Network Models RNA; Binding; Neural; USA; Identification; Ribosome; Network; Model "This study investigated the use of neural networks in the identification of Escherichia coli ribosome binding sites. ... Feedforward backpropagation networks were applied to their identification. Perceptrons were also applied, since they have been the previous best method since 1982. Evaluation of performance for all the neural networks and perceptrons was determined by ROC [receiver-operating-characteristic] analysis. The neural network provided significant improvement in the recognition of these sites when compared with the previous best method, finding less than half the number of false positives when both models were adjusted to find an equal number of actual sites." Nucleic Acids Res 1995 23 9 1632-1639 1526 Andersson,A. Efficient Implementati.. Software.Practi 95 25(2):129-141 Andersson A; Nilsson S Efficient Implementation of Suffix Trees Suffix; Search tree; Compression; SWE "We study the problem of string searching using the traditional approach of storing all unique substrings of the text in a suffix tree. The methods of path compression, level compression and data compression are combined to build a simple, compact and efficient implementation of a suffix tree. 
Based on a comparative discussion and extensive experiments, we argue that our new data structure is superior to previous methods in many practical situations." Software Practice Experience 1995 25 2 129-141 1527 Charleston,M. The Effects of Sequenc.. J.Comput.Biol. 94 1(2):133-151 Charleston MA; Hendy MD; Penny D The Effects of Sequence Length, Tree Topology, and Number of Taxa on the Performance of Phylogenetic Methods Phylogeny; Performance; Phylogenetic; Simulation; NZ; Topology "Simulations were used to study the performance of several character-based and distance-based phylogenetic methods in obtaining the correct tree from pseudo-randomly generated input data. The study included all the topologies of unrooted binary trees with from 4 to 10 pendant vertices (taxa) inclusive. The length of the character sequences used ranged from 10 to 10**5 characters exponentially. The methods studied include Closest Tree, Compatibility, Li's method, Maximum Parsimony, Neighbor-joining, Neighborliness, and UPGMA." J Comput Biol 1994 1 2 133-151 1528 Steel,M.A. Reconstructing Trees W.. J.Comput.Biol. 94 1(2):153-163 Steel MA; Szekely LA; Hendy MD Reconstructing Trees When Sequence Sites Evolve at Variable Rates Phylogeny; Rate; Markov; Spectral analysis; NZ "For a sequence of colors independently evolving on a tree under a simple Markov model, we consider conditions under which the tree can be uniquely recovered from the 'sequence spectrum' -- the expected frequencies of the various leaf colorations. This is relevant for phylogenetic analysis (where colors represent nucleotides or amino acids; leaves represent extant taxa) as the sequence spectrum is estimated directly from a collection of aligned sequences. ... Hence there is a logical barrier to accurate, consistent phylogenetic inference for these models when assumptions about the rate distribution are not made." J Comput Biol 1994 1 2 153-163 1529 Fasman,K.H. Restructuring the Geno.. J.Comput.Biol. 
94 1(2):165-171 Fasman KH Restructuring the Genome Data Base: A Model for a Federation of Biological Databases Sequence database; Genome; Model; USA "The creation of a federation of public biological databases has been proposed. Formerly independent systems will need to be modified to interoperate better within this federation. This will enable the federated system to provide biologists with an integrated view of biological data. The GDB Human Genome Database is being restructured to participate in the proposed federation. GDB itself will be organized into a collection of related data sets in support of human gene mapping. The techniques that will be used to link these data sets will be applicable to the federation as a whole." J Comput Biol 1994 1 2 165-171 1530 Nei,M. Molecular Evolutionary.. 87Columbia Univer Nei M Molecular Evolutionary Genetics BK - Evolutionary distance; Substitution; Evolution; Population; Genetic; JP "During the last ten years, spectacular progress has occurred in the study of molecular evolution and variation mainly because of the introduction of new biochemical techniques such as gene cloning, DNA sequencing, and restriction enzyme methods. ... Furthermore, the molecular approach is now being used for studying the evolution of morphological, physiological, and behavioral characters. The purpose of this book is to summarize and review recent developments in this area of study. Previously, molecular evolution and population genetics were studied as separate scientific disciplines. In this book, an attempt will be made to unify these two disciplines into one which may be called molecular evolutionary genetics." Bibliography: pp. 433-495. Columbia University Press New York 1987 x+512-0 1531 Krichevsky,R. Occam's Razor, Partial.. Inform.Comput. 
94 108(1):158-174 Krichevsky RE Occam's Razor, Partially Specified Boolean Functions, String Matching, and Independent Sets String match; Probability; RU; Function "An algorithm transforming any partially specified boolean function into an asymptotically shortest program to compute it is presented. The algorithm runs in quadratic time. As corollaries, two methods are developed. The first of them solves the string-matching problem in a randomized way with a smaller false match probability than previously known methods. The second produces an asymptotically largest family of independent sets in nearly minimal time." Inform Comput 1994 108 1 158-174 1532 Altschul,S.F. Issues in Searching Mo.. Nature Genetics 94 6(2):119-129 Altschul SF; Boguski MS; Gish W; Wootton JC Issues in Searching Molecular Sequence Databases Sequence database; Database search; Statistical; Sequence alignment; Scoring; USA "Sequence similarity search programs are versatile tools for the molecular biologist, frequently able to identify possible DNA coding regions and to provide clues to gene and protein structure and function. While much attention had been paid to the precise algorithms these programs employ and to their relative speeds, there is a constellation of associated issues that are equally important to realize the full potential of these methods. Here, we consider a number of these issues, including the choice of scoring systems, the statistical significance of alignments, the masking of uninformative or potentially confounding sequence regions, the nature and extent of sequence redundancy in the databases and network access to similarity search services." Nature Genetics 1994 6 2 119-129 1533 Fitch,W.M. A Hidden Bias in the E.. Evolutionary ..
86Academic Press Fitch WM A Hidden Bias in the Estimate of Total Nucleotide Substitutions from Pairwise Differences Karlin S Nevo E Evolutionary Processes and Theory Substitution; Evolutionary divergence; Nucleotide; Bias; USA Swofford, Olsen (1990), p. 537 Academic Press Orlando, FL 1986 315-328 1534 Cavender,J.A. Necessary Conditions f.. Math.Biosci. 91 103:69-75 Cavender JA Necessary Conditions for the Method of Inferring Phylogeny by Linear Invariants Evolutionary tree; Invariant; Phylogeny; Markov; USA "It is known that if all the Markov transition matrices that govern the substitution of one nucleotide for another satisfy six linear constraints, then equations can be derived that permit one to infer evolutionary trees from nucleic acid sequences by the method of linear invariants. These sufficient conditions are also necessary. Any relaxation of them results in the loss of all linear invariants. Necessary conditions for any given set of linear invariants can be derived by examining conditions a matrix must satisfy to map a certain set of matrices into itself. To the extent that necessary conditions are incorrect, a method is not reliable." Math Biosci 103 103 69-75 1535 Karlin,S. Computational DNA Sequ.. Annu.Rev.Microb 94 48:619-654 Karlin S; Cardon LR Computational DNA Sequence Analysis Sequence analysis; DNA; Protein; Statistical; Evolution; k-tuple; USA "This paper reviews several new developments in computer and statistical analysis of DNA and protein sequences. We present criteria and describe means for assessing and interpreting genomic inhomogeneities within and between sequences. 
These include: (a) characterizations of short oligonucleotide biases and general compositional tendencies; (b) molecular evolutionary reconstructions based on dinucleotide relative abundance distance measures and partial orderings; (c) the application of r-scan statistics, quantile distributions, and score-based analyses to identify clustering, overdispersion, and excessive evenness in the distribution of a marker array along a sequence." Annu Rev Microbiol 48 48 619-654 1536 Li,W.H. Reconstruction of Phyl.. Cold Spring Har 87 52:847-856 Li WH; Wolfe KH; Sourdis J; Sharp PM Reconstruction of Phylogenetic Trees and Estimation of Divergence Times under Nonconstant Rates of Evolution Evolutionary rate; Divergence; Evolution; Rate; Phylogenetic; Estimation; USA "Phylogenetic reconstruction is extremely difficult when the rates of evolution differ greatly among lineages and when the taxa or DNA sequences under study are distantly related. We have studied this problem for the simple case of only four taxa. We used computer simulation to compare the performance of several methods to see which are most effective against unequal rates of evolution .... It is commonly thought that phylogenetic reconstruction becomes much simpler when one or more outgroups are available .... However, the usefulness of an outgroup depends on its distance from the taxa under study. We therefore studied how quickly the reliability of an outgroup reference decreases with that distance." Cold Spring Harbor Sympos Quant Biol 52 52 847-856 1537 Golding,B. A Maximum Likelihood A.. J.Mol.Evol. 90 31:511-523 Golding B; Felsenstein J A Maximum Likelihood Approach to the Detection of Selection from a Phylogeny Evolutionary tree; Likelihood; Selection; Phylogeny; CA; Detection "A large amount of information is contained within the phylogenetic relationships between species. ... The influence that deleterious selection might have is determined here. 
The likelihood of different phylogenies in the presence of selection is explored to determine the properties of such a likelihood surface. The calculation of likelihoods for a phylogeny in the presence and absence of selection, permits the application of a likelihood ratio test to search for selection. It is shown that even a single selected site can have a strong effect on the likelihood." J Mol Evol 31 31 511-523 1538 Taylor,W.R. Protein Structure Mode.. J.Biotechnol. 94 35(2/3):281-29 Taylor WR Protein Structure Modelling from Remote Sequence Similarity Review; Protein; Structure; Model; Sequence proximity; Amino acid; UK; Similarity "Many methods exist for taking a sequence that exhibits similarity to another of known structure and building a molecular model. However, when the sequence similarity is very remote and fragmentary, this "modelling-by-homology" approach is less reliable. Current methods that tackle this problem are reviewed below, taking as an example the construction of a predicted model for the retroviral protease. ... Because of the rapid proliferation of methods and their variants, an exhaustive review of the literature has not been possible and the following survey concentrates on the developments of the author and colleagues to explain the basic methods." J Biotechnol 1994 35 2/3 281-291 1539 Fukami-Kobaya Estimation of Evolutio.. Mol.Biol.Evol. 94 11(1):99-105 Fukami-Kobayashi K Estimation of Evolutionary Distance between Distantly Related Sequences of Amino Acids, Taking Account of Patterns of Amino Acid Replacement Evolutionary distance; Amino acid; Distance; Likelihood; Phylogenetic; JP; Estimation "A method called the 'similarity distance method' (SD method) was developed to obtain maximum-likelihood estimates of evolutionary distance between amino acid sequences, on the basis of a given pattern of amino acid replacement. 
Computer simulation revealed that, by using the new method, evolutionary distance can be estimated efficiently even when the expected identity between the sequences is as low as 0.14 and the length of the sequences is only 50 amino acid residues." Mol Biol Evol 1994 11 1 99-105 1540 Agarwala,R. A Polynomial-time Algo.. SIAM J.Comput. 94 23(6):1216-122 Agarwala R; Fernandez-Baca D A Polynomial-time Algorithm for the Perfect Phylogeny Problem when the Number of Character States is Fixed Phylogeny; Character data; Compatibility; Evolutionary tree; USA; Algorithm "This paper presents a polynomial-time algorithm for determining whether a set of species, described by the characters they exhibit, has a perfect phylogeny, assuming the maximum number of possible states for a character is fixed. This solves a longstanding open problem. This result should be contrasted with the proof by Steel (1992) and Bodlaender, Fellows and Warnow (1992) that the perfect phylogeny problem is NP-complete in general." SIAM J Comput 1994 23 6 1216-1224 1541 Apostolico,A. Parallel Detection of .. Theoret.Comput. 95 141:163-173 Apostolico A; Breslauer D; Galil Z Parallel Detection of all Palindromes in a String String search; Palindrome; Parallel; USA; Detection "This paper presents two efficient concurrent-read concurrent-write parallel algorithms that find all palindromes in a given string. ... These new results improve on the known parallel palindrome detection algorithms by using smaller auxiliary space and either by making fewer operations or by achieving a faster running time." Theoret Comput Sci 141 141 163-173 1542 Bafna,V. Sorting by Reversals: .. Mol.Biol.Evol. 
95 12(2):239-246 Bafna V; Pevzner PA Sorting by Reversals: Genome Rearrangements in Plant Organelles and Evolutionary History of X Chromosome Reversal; Genome; Sequence comparison; Inversion; USA; Rearrangement; Chromosome "The paper addresses the problem of genome comparison versus classical gene comparison and presents algorithms to analyze rearrangements in genomes evolving by inversions. In the simplest form the problem corresponds to sorting by reversals, that is sorting of an array using reversals of arbitrary fragments. We describe algorithms to analyze genomes evolving by inversions and discuss applications of these algorithms in molecular evolution." Mol Biol Evol 1995 12 2 239-246 1543 Breslauer,D. Fast Parallel String P.. Theoret.Comput. 95 137:269-278 Breslauer D Fast Parallel String Prefix-matching String search; Prefix; Parallel; String match; DK "An O( log log m ) time n log m / log log m - processor CRCW-PRAM algorithm for the string prefix-matching problem over general alphabets is presented. The algorithm can also be used to compute the KMP [Knuth-Morris-Pratt] failure function in O( log log m ) time on m log m / log log m processors. These results improve on the running time of the best previous algorithm for both problems, which was O( log m ), while preserving the same number of processors." Theoret Comput Sci 137 137 269-278 1544 Breslauer,D. Dictionary-Matching on.. J.Algorithms 95 18:278-295 Breslauer D Dictionary-Matching on Unbounded Alphabets: Uniform Length Dictionaries Dictionary match; On-line; Multidimensional; Italy "In the string-matching problem one is interested in all occurrences of a short pattern string in a longer text string. Dictionary-matching is a generalization of this problem where one is looking simultaneously for all occurrences of several patterns in a single text. 
This paper presents an efficient on-line dictionary-matching algorithm for the case where the patterns have uniform length and the input alphabet is unbounded. A tight lower bound establishes that our approach is optimal if the only access the algorithm has to the input strings is by pairwise symbol comparisons." J Algorithms 18 18 278-295 1545 Cole,R. Tighter Lower Bounds o.. SIAM J.Comput. 95 24(1):30-45 Cole R; Hariharan R; Paterson M; Zwick U Tighter Lower Bounds on the Exact Complexity of String Matching String match; Complexity; On-line; Pattern match; USA "This paper considers the exact number of character comparisons needed to find all occurrences of a pattern of length m in a text of length n using on-line and general algorithms. ... These lower bounds complement an on-line upper bound ... obtained recently by Cole and Hariharan. The lower bounds are obtained by finding patterns with interesting combinatorial properties. It is also shown that for some patterns off-line algorithms can be more efficient than on-line algorithms." SIAM J Comput 1995 24 1 30-45 1546 DeBry,R.W. The Relationship betwe.. Mol.Biol.Evol. 95 12(2):291-297 DeBry RW; Abele LG The Relationship between Parsimony and Maximum-Likelihood Analyses: Tree Scores and Confidence Estimates for Three Real Data Sets Parsimony; Likelihood; Phylogeny; Evolutionary tree; Confidence; USA; Score "However, it is not known how frequently the most parsimonious topology will be the same as the maximum-likelihood topology with real data sets. Three 18S nucleotide sequence data sets are examined, each consisting of seven crustacean taxa. For each data set, under both parsimony and likelihood, scores are determined for all 945 topologies, complete confidence sets are estimated by methods that account for variance in the phylogenetic estimate, and bootstrap resampling is performed." Mol Biol Evol 1995 12 2 291-297 1547 Ferretti,V. Phylogenetic Invariant.. J.Theor.Biol. 
95 173:147-162 Ferretti V; Sankoff D Phylogenetic Invariants for More General Evolutionary Models Phylogenetic; Invariant; Phylogeny; Evolutionary tree; CA; Model "In this paper, we apply a general method of finding invariants of a parameterized functional form to find low-degree polynomial invariants for different models. Quadratic invariants are obtained for the Kimura two-parameter model, for a model allowing evolutionary dependence between positions in the sequences and for an asymmetric model that allows for A+T versus G+C asymmetries in DNA base composition. Those invariants are found for trees (unrooted in the case of the Kimura model and rooted for the others) with N=3 or N=4 terminal vertices." J Theor Biol 173 173 147-162 1548 Gascuel,O. A Note on Sattath and .. Mol.Biol.Evol. 94 11(6):961-963 Gascuel O A Note on Sattath and Tversky's, Saitou and Nei's, and Studier and Keppler's Algorithms for Inferring Phylogenies from Evolutionary Distances Phylogeny; Reconstruct; Distance; Additive tree; Evolutionary tree; Evolutionary distance; FR; Algorithm "Several simulations ... have shown a high relative efficiency of ADDTREE and of the NJ method in recovering the true topology. These studies have also shown that ADDTREE and the NJ method, whose principles seem very different, are in fact close and usually provide identical or similar trees. ... In this note, we account for this proximity regardless of the number of taxa, and we show that the minimum evolution principle, as employed in the NJ method, is very close to the neighborliness used by Sattath and Tversky (1977) and by Fitch (1981) in a nonagglomerative way." Mol Biol Evol 1994 11 6 961-963 1549 Gaut,B.S. Success of Maximum Lik.. Mol.Biol.Evol. 
95 12(1):152-162 Gaut BS; Lewis PO Success of Maximum Likelihood Phylogeny Inference in the Four-Taxon Case Likelihood; Phylogeny; Evolutionary tree; USA "We used simulated data to investigate a number of properties of maximum-likelihood (ML) phylogenetic tree estimation for the case of four taxa. ... Data were analyzed in the ML framework with two different substitution models, and we compared the ability of the two models to reconstruct the correct topology. Although both models were inconsistent for some branch-length combinations in the presence of site-to-site variation, the models were efficient predictors of topology under most simulation conditions." Mol Biol Evol 1995 12 1 152-162 1550 Karlin,S. Which Bacterium is the.. Proc.Nat.Acad.S 94 91:12842-12846 Karlin S; Campbell AM Which Bacterium is the Ancestor of the Animal Mitochondrial Genome? Genome; Composition; Nucleotide; Bias; USA; Ancestor "We present considerable data supporting the hypothesis that a Sulfolobus- or Mycoplasma-like endosymbiont, rather than an a-proteobacterium, is the ancestor of animal mitochondrial genomes. This hypothesis is based on pronounced similarities in oligonucleotide relative abundance extremes common to animal mtDNA, Sulfolobus, and Mycoplasma capricolum and pronounced discrepancies of these relative abundance values with respect to a-proteobacteria." Proc Nat Acad Sci USA 91 91 12842-12846 1551 Karlin,S. Heterogeneity of Genom.. Proc.Nat.Acad.S 94 91:12837-12841 Karlin S; Ladunga I; Blaisdell BE Heterogeneity of Genomes: Measures and Values Genome; Nucleotide; Distance; USA "Genomic homogeneity is investigated for a broad base of DNA sequences in terms of dinucleotide relative abundance distances (abbreviated d-distances) and of oligonucleotide compositional extremes. 
It is shown that d-distances between different genomic sequences in the same species are low, only about 2 or 3 times the distance found in random DNA, and are generally smaller than the between-species d-distances." Proc Nat Acad Sci USA 91 91 12837-12841 1552 Karlin,S. Comparisons of Eukaryo.. Proc.Nat.Acad.S 94 91:12832-12836 Karlin S; Ladunga I Comparisons of Eukaryotic Genomic Sequences Genome; Sequence comparison; USA; Genomic "A method for assessing genomic similarity based on relative abundances of short oligonucleotides in large DNA samples is introduced. The method requires neither homologous sequences nor prior sequence alignments. The analysis centers on (i) dinucleotide (and tri- and tetra-) relative abundance extremes in genomic sequences, (ii) distances between sequences based on all dinucleotide relative abundance values, and (iii) a multidimensional partial ordering protocol. The emphasis in this paper is on assessments of general relatedness of genomes as distinguished from phylogenetic reconstructions." Proc Nat Acad Sci USA 91 91 12832-12836 1553 Kelly,C. A Test of the Markovia.. Biometrics 94 50:653-664 Kelly C A Test of the Markovian Model of DNA Evolution Markov; Evolution; DNA; USA; Model "The Markov model of molecular evolution has recently received a significant amount of interest because its statistical nature allows for the testing of a number of evolutionary hypotheses. Here we propose a test which assesses whether data from two species sharing a common ancestor will fit a general Markovian model. We illustrate the test with two examples of data which appear at first glance not to fit a Markov model." Biometrics 50 50 653-664 1554 Lento,G.M. Use of Spectral Analys.. Mol.Biol.Evol. 
95 12(1):28-52 Lento GM; Hickson RE; Chambers GK; Penny D Use of Spectral Analysis to Test Hypotheses on the Origin of Pinnipeds Spectral analysis; Evolution; DNA; NZ "We inferred phylogenetic reconstructions from DNA sequence data using standard parsimony and neighbor-joining algorithms for phylogenetic inference as well as a new method called spectral analysis (Hendy and Penny) in which phylogenetic information is displayed independently of any selected tree. We identified and compensated for potential sources of error known to lead to selection of incorrect phylogenetic trees. These include sampling error, unequal evolutionary rates on lineages, unequal nucleotide composition among lineages, unequal rates of change at different sites, and inappropriate tree selection criteria." Mol Biol Evol 1995 12 1 28-52 1555 Perna,N.T. Unequal Base Frequenci.. Mol.Biol.Evol. 95 12(2):359-361 Perna NT; Kocher TD Unequal Base Frequencies and the Estimation of Substitution Rates Substitution; Nucleotide; Sequence comparison; USA; Rate; Estimation "The model assumes a stationary process (i.e., the observed base composition reflects the nucleotide frequencies at equilibrium). In order to maintain the equilibrium composition, substitutions are weighted by the frequency of the mutant base. The motivation for this weighting arises from an analysis of the patterns and relative rates of base substitutions inferred by parsimony from a distance-based tree. We wish to discuss this analysis of substitution patterns, as well as the sensitivity of divergence estimates to the equilibrium nucleotide frequencies which are assumed." Mol Biol Evol 1995 12 2 359-361 1556 Rzhetsky,A. Tests of Applicability.. Mol.Biol.Evol. 
95 12(1):131-151 Rzhetsky A; Nei M Tests of Applicability of Several Substitution Models for DNA Sequence Data Substitution; DNA; Sequence analysis; Invariant; Phylogeny; USA; Model "Using linear invariants for various models of nucleotide substitution, we developed test statistics for examining the applicability of a specific model to a given dataset in phylogenetic inference. ... The test statistics developed are independent of evolutionary time and phylogeny, although the variances of the statistics contain phylogenetic information. Therefore, these statistics can be used before a phylogenetic tree is estimated. Our objective is to find the simplest model that is applicable to a given dataset, keeping in mind that a simple model usually gives an estimate of evolutionary distance ... with a smaller variance than a complicated model when the simple model is correct." Mol Biol Evol 1995 12 1 131-151 1557 Sitnikova,T. Interior-Branch and Bo.. Mol.Biol.Evol. 95 12(2):319-333 Sitnikova T; Rzhetsky A; Nei M Interior-Branch and Bootstrap Tests of Phylogenetic Trees Phylogeny; Bootstrap; Statistical; USA; Phylogenetic "We have compared statistical properties of the interior-branch and bootstrap tests of phylogenetic trees when the neighbor-joining tree-building method is used. ... Actually, the bootstrap test usually underestimates the extent of statistical support of species clusters. The relationship between the confidence values obtained by the two tests varies with both the topology and expected branch lengths of the true (model) tree." Mol Biol Evol 1995 12 2 319-333 1558 Strumpen,V. Coupling Hundreds of W.. Software.Practi 95 25(3):291-304 Strumpen V Coupling Hundreds of Workstations for Parallel Molecular Sequence Analysis Parallel; Distributed; Sequence analysis; SWI "We present a highly scalable approach to distributed parallel computing on workstations in the Internet which provides significant speed-up to molecular biology sequence analysis. 
Recent developments show that smaller numbers of workstations connected via a local area network can be used efficiently for parallel computing. This work emphasizes scalability with respect to the number of workstations employed." Software Practice Experience 1995 25 3 291-304 1559 Tillier,E.R.M Neighbor Joining and M.. Mol.Biol.Evol. 95 12(1):7-15 Tillier ERM; Collins RA Neighbor Joining and Maximum Likelihood with RNA Sequences: Addressing the Interdependence of Sites Likelihood; RNA; Sequence analysis; CA; Joining; Neighbor joining "We analyze a new probabilistic model for the evolution of double-stranded RNA molecules that considers substitutions of the base pairs rather than of each of the bases independently. The new model, called the double-stranded model, was incorporated into the neighbor-joining distance and maximum likelihood methods. Computer simulations show that maximum likelihood is very robust to the violation of the assumption of the independence of sites. In contrast, the neighbor-joining method is sensitive to such violations ...." Mol Biol Evol 1995 12 1 7-15 1560 Overton,G.C. QGB: A System for Quer.. J.Comput.Biol. 94 1(1):3-14 Overton GC; Aaronson JS; Haas J; Adams J QGB: A System for Querying Sequence Database Fields and Features Database search; Sequence database; USA "We have developed a general system, QGB, for performing complex queries on the information in the DDBJ/EMBL/Genbank databases, including queries over the structural features of sequences implied in the FEATURE TABLE. Queries are formed in a Structured Query Language (SQL)-like syntax with language extensions to support complex types ... appropriate for representing and querying sequence data. A novel aspect of QGB is its ability to deduce missing features and infer relationships among features as a consequence of constructing a parse tree of sequence structure from information described in the FEATURE TABLE." J Comput Biol 1994 1 1 3-14 1561 States,D.J. 
Combined Use of Sequen.. J.Comput.Biol. 94 1(1):39-50 States DJ; Gish W Combined Use of Sequence Similarity and Codon Bias for Coding Region Identification Codon; Bias; Coding; Region; Sequence proximity; USA; Similarity; Identification "A computer program called BLASTX was previously shown to be effective in identifying and assigning putative function to likely protein coding regions by detecting significant similarity between a conceptually translated nucleotide query sequence and members of a protein sequence database. We present and assess the sensitivity of a new option to this software tool, herein called BLASTC, which employs information obtained from biases in codon utilization, along with the information obtained from sequence similarity." J Comput Biol 1994 1 1 39-50 1562 Miller,W. Constructing Aligned S.. J.Comput.Biol. 94 1(1):51-64 Miller W; Boguski M; Raghavachari B; Zhang Z; Hardison RC Constructing Aligned Sequence Blocks Multiple alignment; Sequence alignment; Pairwise alignment; USA "This paper presents an efficient method for constructing aligned blocks (i.e., gap-free multiple alignments) from a set of pairwise alignments. The method is more sensitive than some earlier block-constructing methods for detecting conserved sequence regions. The technique is applied to analyze conserved regions in protein prenyltransferases and to detect regulatory elements in the 5' flank of the b-globin gene." J Comput Biol 1994 1 1 51-64 1563 Ferretti,V. Skewed Base Compositio.. J.Comput.Biol. 94 1(1):77-92 Ferretti V; Lang BF; Sankoff D Skewed Base Compositions, Asymmetric Transition Matrices, and Phylogenetic Invariants Phylogenetic; Invariant; Transition; Evolutionary tree; CA; Composition "Evolutionary inference methods that assume equal DNA base compositions and symmetric nucleotide substitution matrices, where these assumptions do not hold, are likely to group species on the basis of similar base compositions rather than true phylogenetic relationships. 
We propose an invariants-based method for dealing with this problem. ... We apply a general 'empirical' method of finding invariants of a parameterized functional form. ... We discuss the problems of finding asymmetric models satisfying the property of semigroup closure, of finding asymmetric models that admit invariants at all, and of the computational complexity of the method." J Comput Biol 1994 1 1 77-92 1564 Godzik,A. Flexible Algorithm for.. Comput.Appl.Bio 94 10(6):587-596 Godzik A; Skolnick J Flexible Algorithm for Direct Multiple Alignment of Protein Structures and Sequences Protein; Structure; Multiple alignment; Sequence alignment; USA; Algorithm "The recently described equivalence between the alignment of two proteins and a conformation of a lattice chain on a two-dimensional square lattice is extended to multiple alignments. The search for the optimal multiple alignment between several proteins, which is equivalent to finding the energy minimum in the conformational space of a multi-dimensional lattice chain, is studied by the Monte Carlo approach. This method ... can accept arbitrary scoring functions, including non-local ones, and its speed decreases slowly with increasing number of dimensions." Comput Appl Biosci 1994 10 6 587-596 1565 Kondrakhin,Y. Construction of a Gene.. Comput.Appl.Bio 94 10(6):597-603 Kondrakhin YV; Shamin VV; Kolchanov NA Construction of a Generalized Consensus Matrix for Recognition of Vertebrate Pre-mRNA 3'-terminal Processing Sites Consensus matrix; RNA; RU; Recognition; Matrix "Using a set of sequences of 63 cleavage / polyadenylation sites of vertebrate pre-mRNA, a generalized consensus matrix was constructed. The elements of the matrix were the absolute frequencies of oligonucleotides of length l at the ith position of sites. The cleavage point of each site was assigned the same position number. 
To recognize a polyadenylation site in a nucleotide sequence, a multiplicative measure was obtained using the elements of the generalized consensus matrix as weight factors." Comput Appl Biosci 1994 10 6 597-603 1566 Shepelev,V.A. Multidimensional Dot-m.. Comput.Appl.Bio 94 10(6):605-611 Shepelev VA; Yanishevsky NV Multidimensional Dot-matrices Multidimensional; Dot; RU "A generalization of the dot-matrix of similarity for n sequences is proposed. For the visualization of the n-dimensional dot-matrix, the n projections onto the plane passing through the main diagonal and each of the n axes of Euclidean space En are displayed. The projection is compressed so that the points at the coordinates ... are depicted on the plane. The common regions of similarity are revealed as segments of straight lines parallel to the main diagonal." Comput Appl Biosci 1994 10 6 605-611 1567 Schneider,G. Artificial Neural Netw.. Comput.Appl.Bio 94 10(6):635-645 Schneider G; Schuchhardt J; Wrede P Artificial Neural Networks and Simulated Molecular Evolution are Potential Tools for Sequence-oriented Protein Design Neural; Feature extraction; Evolution; Simulation; Protein; DE; Network "The potential of artificial neural filter systems for feature extraction from amino acid sequences is discussed. Analysis of signal peptidase I cleavage-sites in protein precursor sequences serves as an example application. Trained neural networks can be used as the fitness function in an evolutionary protein design cycle termed 'simulated molecular evolution' which is an entirely computer-based method for the rational design of locally encoded amino acid sequence features." Comput Appl Biosci 1994 10 6 635-645 1568 Smith,S.W. The Genetic Data Envir.. 
Comput.Appl.Bio 94 10(6):671-675 Smith SW; Overbeek R; Woese CR; Gilbert W; Gillevet PM The Genetic Data Environment: An Expandable GUI for Multiple Sequence Analysis GUI; Sequence analysis; Multiple comparison; USA; Genetic "An X-Windows-based graphic user interface is presented which allows the seamless integration of numerous existing biomolecular programs into a single analysis environment. This environment is based on a core multiple sequence editor that is linked to external programs by a user-expandable menu system and is supported on Sun and DEC workstations. There is no limitation to the number of external functions that can be linked to the interface. The length and number of sequences that can be handled are limited only by the size of virtual memory present on the workstation." Comput Appl Biosci 1994 10 6 671-675 1569 Wishart,D.S. Constrained Multiple S.. Comput.Appl.Bio 94 10(6):687-688 Wishart DS; Boyko RF; Sykes BD Constrained Multiple Sequence Alignment using XALIGN Sequence alignment; Multiple alignment; CA "In response to this need, we have developed the program XALIGN (X-ray ALIGNment), a menu-driven, modular program designed to perform up to six different alignment functions. These include: 1. Pairwise protein sequence alignment. 2. Multiple (>500) sequence alignment. 3. Pairwise sequence / structure alignments. 4. Multiple (>500) sequence / structure alignments. 5. Multi-residue clustering (for editing and alignment). 6. Multi-residue anchoring (for editing and alignment)." Comput Appl Biosci 1994 10 6 687-688 1570 Vingron,M. Multiple Sequence Comp.. Adv.Appl.Math. 95 16(1):1-22 Vingron M; Pevzner PA Multiple Sequence Comparison and Consistency on Multipartite Graphs Sequence comparison; Multiple comparison; Consistency; Graph; USA "Calculation of dot-matrices is a widespread tool in biological sequence comparison. 
As a visual aid they are used in pairwise sequence comparison but so far have been of little help in the simultaneous comparison of several sequences. Viewing dot-matrices as projections of unknown n-dimensional points we consider the multiple alignment problem (for n sequences) as an n-dimensional image reconstruction problem with noise. We model this situation using a multipartite graph and introduce a notion of 'consistency' on such a graph." Adv Appl Math 1995 16 1 1-22 1571 Bunke,H. An Improved Algorithm .. Inform.Process. 95 54:93-96 Bunke H; Csirik J An Improved Algorithm for Computing the Edit Distance of Run-length Coded Strings Edit; Distance; Approximate match; String match; SWI; Algorithm "Recently, an algorithm for computing the edit distance of run-length coded strings was proposed. ... In this paper, we propose a different approach. Our new algorithm will be also based on a division of the edit matrix into blocks. However, no subdivision of these blocks will ever be required. ... The new algorithm is restricted, however, to the special cost function under which the cost of any insertion and deletion is equal to 1, and the cost of any substitution is equal to 2. The algorithm described in [Bunke & Csirik 1993] can additionally handle the case where all edit operations have unit cost." Inform Process Lett 54 54 93-96 1572 Idury,R.M. Multiple Matching of R.. Inform.Comput. 95 117(1):78-90 Idury RM; Schaffer AA Multiple Matching of Rectangular Patterns String match; Multidimensional; Range search; USA; Rectangular "We describe the first worst-case efficient algorithm for simultaneously matching multiple rectangular patterns of varying sizes and aspect ratios in a rectangular text. Efficient means significantly more efficient asymptotically than applying known algorithms that handle one height (or width or aspect ratio) at a time for each height. 
Our algorithm features an interesting use of multidimensional range searching, as well as new adaptations of several known techniques for two-dimensional string matching." Inform Comput 1995 117 1 78-90 1573 Sheng,K.N. Pattern Matching betwe.. Bull.Math.Biol. 94 56(6):1143-116 Sheng KN; Naus JI Pattern Matching between Two Non-aligned Random Sequences Pattern match; Sequence match; Probabilistic; Longest common; USA "Given two independent sequences of letters, we seek the probability distribution of the length of the longest matching word. This word can be in different positions in the two sequences and we consider both perfect and nearly perfect matching. We derive bounds and approximations for the probability and compare them with other bounds and approximations. The results can be applied to DNA sequences in molecular biology and generalized matching between two independent random sequences." Bull Math Biol 1994 56 6 1143-1162 1574 Knight,A. Weighting of Nucleotid.. Syst.Biol. 95 44(1):112-116 Knight A; Mindell DP Weighting of Nucleotide Sequences: A Reply Character weight; DNA; USA; Nucleotide "In an earlier paper (Knight and Mindell, 1993), we proposed and applied a method for calculating weights for DNA sequence characters prior to phylogenetic analysis. Our weighting scheme uses the ratio of expected to observed (EOR) nucleotide differences in comparisons of sequences .... Collins et al. (1994) expanded on the EOR method using the same premises mentioned above. ... We argue here that ... their use of a random model in calculating expected values for the EOR is unrealistic and therefore inappropriate in light of the extensive evidence indicating the nonrandom nature of sequence evolution ...." Syst Biol 1995 44 1 112-116 1575 Collins,T.M. Compositional Effects .. Syst.Biol. 
94 43(3):449-459 Collins TM; Kraus F; Estabrook G Compositional Effects and Weighting of Nucleotide Sequences for Phylogenetic Analysis Character weight; DNA; Phylogenetic; USA; Nucleotide "Knight and Mindell (1993) proposed a new method for weighting classes of nucleotide substitution for application to phylogenetic analysis using DNA sequences. ... However, the method proposed by K&M for addressing the effects of compositional bias when weighting is lacking in several respects. ... We suggest a general approach to generating weighting schemes of this kind that is consistent with our comments above and show one example of such a scheme that we believe follows the intent of K&M." Syst Biol 1994 43 3 449-459 1576 Chao,K.M. Recent Developments in.. J.Comput.Biol. 94 1(4):271-291 Chao KM; Hardison RC; Miller W Recent Developments in Linear-Space Alignment Methods: A Survey Sequence analysis; Dynamic programming; Multiple alignment; Survey; USA "A dynamic-programming strategy for sequence alignment first proposed in 1975 by Dan Hirschberg can be adapted to yield a number of extremely space-efficient algorithms. ... Three of our recent extensions of the technique are then outlined. ... We also describe two linear-space methods for computing k best local ... alignments, where k >= 1. ... Finally, we describe programs that implement various combinations of these techniques to provide a multisequence alignment method that is especially suited to handling a few very long sequences." J Comput Biol 1994 1 4 271-291 1577 Rioux,P.A. A Portable Search Engi.. J.Comput.Biol. 94 1(4):293-295 Rioux PA; Gilbert WA; Littlejohn TG A Portable Search Engine and Browser for the Entrez Database Database search; Sequence database; CA "Entrez is a molecular sequence and reference database. We present a tool called CLEVER which permits flexible access to the Entrez database by other applications and interactively by users. 
In this way, CLEVER is both a search engine and a command-line browser of the Entrez database." J Comput Biol 1994 1 4 293-295 1578 Taylor,W.R. Motif-Biased Protein S.. J.Comput.Biol. 94 1(4):297-310 Taylor WR Motif-Biased Protein Sequence Alignment Protein; Sequence alignment; Multiple alignment; Motif; Gap; UK "A method was developed for pairwise protein sequence alignment to emulate the effect of structural knowledge or multiple sequences. Runs of matches of the preferred length were emphasized through the use of a product-bias allowing short motifs to influence the alignment to a degree that was a realistic reflection of their infrequency of occurrence. This gave motifs a locally high scoring match, making their alignment relatively less sensitive to the value of the gap penalty. This property should be a great advantage when a large number of sequence comparisons are made with a fixed set of parameter values, as typically occurs in the scan of a sequence databank with a probe or in the development of a multiple alignment." J Comput Biol 1994 1 4 297-310 1579 Wang,L. On the Complexity of M.. J.Comput.Biol. 94 1(4):337-348 Wang L; Jiang T On the Complexity of Multiple Sequence Alignment Sequence alignment; Multiple alignment; Complexity; Approximation; Evolutionary tree; CA "We study the computational complexity of two popular problems in multiple sequence alignment: multiple alignment with SP-score and multiple tree alignment. It is shown that the first problem is NP-complete and the second is MAX SNP-hard. The complexity of tree alignment with a given phylogeny is also considered." J Comput Biol 1994 1 4 337-348 1580 Naor,D. On Near-Optimal Alignm.. J.Comput.Biol. 
94 1(4):349-366 Naor D; Brutlag DL On Near-Optimal Alignments of Biological Sequences Sequence alignment; Pairwise alignment; Suboptimal; Dynamic programming; Edit; Distance; IL "A near-optimal alignment between a pair of sequences is an alignment whose score lies within the neighborhood of the optimal score. We present an efficient method for representing all alignments whose score is within any given delta from the optimal score. The representation is a compact graph that makes it easy to impose additional biological constraints and select one desirable alignment from the large set of alignments. We study the combinatorial nature of near-optimal alignments, and define a set of 'canonical' near-optimal alignments." J Comput Biol 1994 1 4 349-366 1581 Julich,A. Implementations of BLA.. Comput.Appl.Bio 95 11(1):3-6 Julich A Implementations of BLAST for Parallel Computers BLAST; Parallel; Complexity; DE "The BLAST sequence comparison programs have been ported to a variety of parallel computers - the shared memory machine Cray Y-MP 8/864 and the distributed memory architectures Intel iPSC/860 and nCUBE. Additionally, the programs were ported to run on workstation clusters. We explain the parallelization techniques and consider the pros and cons of these methods. The BLAST programs are very well suited for parallelization for a moderate number of processors. We illustrate our results using the program blastp as an example." Comput Appl Biosci 1995 11 1 3-6 1582 Resenchuk,S.M ALIGNMENT SERVICE: Cre.. Comput.Appl.Bio 95 11(1):7-11 Resenchuk SM; Blinov VM ALIGNMENT SERVICE: Creation and Processing of Alignments of Sequences of Unlimited Length Multiple alignment; Sequence alignment; Editor; RU "A package for the creation and processing of multiple sequence alignment is described. There is no limit on the lengths of the processed nucleotide or amino acid sequences, and the number of sequences in the alignment is also unlimited. 
The main groups of functions are: a semi-automatic alignment editor; a wide set of functions for technical processing of alignments; nucleotide alignment mapping and translation; and similarity search functions. A user-friendly interface and a set of generally used file actions provide a special operational subsystem for everyday tasks." Comput Appl Biosci 1995 11 1 7-11 1583 Hirosawa,M. Comprehensive Study on.. Comput.Appl.Bio 95 11(1):13-18 Hirosawa M; Totoki Y; Hoshida M; Ishikawa M Comprehensive Study on Iterative Algorithms of Multiple Sequence Alignment Sequence alignment; Multiple alignment; Heuristic; Approximation; JP; Algorithm "Recently, an effective new class of algorithms has been developed. These algorithms iteratively apply dynamic programming to partially aligned sequences to improve their alignment quality. ... This paper reports our comprehensive comparison of iterative algorithms. We proved that performance improves remarkably when using a tree-based iterative method, which iteratively refines an alignment whenever two subalignments are merged in a tree-based way. We propose a tree-dependent, restricted partitioning technique to efficiently reduce the execution time of iterative algorithms." Comput Appl Biosci 1995 11 1 13-18 1584 Granjeon,E. Detection of Compositi.. Comput.Appl.Bio 95 11(1):29-37 Granjeon E; Tarroux P Detection of Compositional Constraints in Nucleic Acid Sequences Using Neural Networks Exon; Intron; Composition; Neural; FR; Detection; Nucleic acid; Network "We describe in this paper a neural network method for the detection of compositional constraints in introns and exons. ... As with the previous approaches, this technique discriminates introns and exons .... Moreover, using junk DNA sequences in the learning phase allows one to detect constrained regions inside the intron and the exon sequences (i.e., sequences that differ, by their nucleic acid compositions, from junk DNA).
The application of our approach could be useful in the study of the internal organization of these sequences." Comput Appl Biosci 1995 11 1 29-37 1585 Eroshkin,A.M. PROANAL version 2: Mul.. Comput.Appl.Bio 95 11(1):39-44 Eroshkin AM; Fomin VI; Zhilkin PA; Ivanisenko VV; Kondrakhin YV PROANAL version 2: Multifunctional Program for Analysis of Multiple Protein Sequence Alignments and for Studying the Structure - Activity Relationships in Protein Families Protein; Multiple alignment; Sequence alignment; RU; Program; Structure "A new version of the program PROANAL is described. A multiple linear regression analysis of the protein structure - activity relationship allows one to investigate the combinations of protein sites and factors influencing the activity. ... PROANAL2 may be useful in the simulation of protein-engineering experiments and in the search of a number of protein regions such as functional sites, secondary structures, solvent-exposed regions, T- and B-cell antigenic determinants, etc." Comput Appl Biosci 1995 11 1 39-44 1586 Bodlaender,H. Parameterized Complexi.. Comput.Appl.Bio 95 11(1):49-57 Bodlaender HL; Downey RG; Fellows MR; Hallett MT; Wareham HT Parameterized Complexity Analysis in Computational Biology Longest common; Sequence alignment; Consensus discovery; Complexity; CA; Parameterized "Many computational problems in biology involve parameters for which a small range of values cover important applications. We argue that for many problems in this setting, parameterized computational complexity rather than NP-completeness is the appropriate tool for studying apparent intractability. ... In addition to surveying this complexity framework, we describe a new result for the Longest Common Subsequence problem. ... Lower bounds on the complexity of this basic combinatorial problem imply lower bounds on more general sequence alignment and consensus discovery problems." Comput Appl Biosci 1995 11 1 49-57 1587 Sagot,M.F. Finding Flexible Patte..
Comput.Appl.Bio 95 11(1):59-70 Sagot MF; Viari A; Pothier J; Soldano H Finding Flexible Patterns in a Text: An Application to Three-dimensional Molecular Matching Pattern match; Multidimensional; Protein; Structure; FR "Finding certain regularities in a text is an important problem in many areas, e.g., in the analysis of biological molecules such as nucleic acids or proteins. In the latter case, the text may be sequences of amino acids or a linear coding of three-dimensional structures, and the regularities then correspond to lexical or structural motifs common to two, or more, proteins. We first recall an earlier algorithm that found these regularities in a flexible way. Then we introduce a generalized version of this algorithm designed for the particular case of protein three-dimensional structures, since these structures present a few peculiarities that make them computationally harder to process." Comput Appl Biosci 1995 11 1 59-70 1588 Perochon-Dori RNA_d2: A Computer Pro.. Comput.Appl.Bio 95 11(1):101-109 Perochon-Dorisse J; Chetouani F; Aurel S; Iscolo N; Bichot B RNA_d2: A Computer Program for Editing and Display of RNA Secondary Structures RNA; Editing; Display; Secondary; Structure; FR; Program "RNA_d2 is a user-friendly program developed for interactively generating aesthetic and non-overlapping drawings of RNA secondary structures. It is designed so that the drawings can be edited in a very natural and intuitive way, in order to emphasize structural homologies between several molecules, as well as the foldings themselves to update the base-pair sets according to new data. ... RNA_d2 allows easy untangling and editing of RNA molecules > 1000 nucleotides long." Comput Appl Biosci 1995 11 1 101-109 1589 Schoniger,M. Simulating Efficiently.. 
Comput.Appl.Bio 95 11(1):111-115 Schoniger M; von Haeseler A Simulating Efficiently the Evolution of DNA Sequences Simulation; Evolution; DNA; Stochastic; DE "Two menu-driven FORTRAN programs are described that simulate the evolution of DNA sequences in accordance with a user-specified model. This general stochastic model allows for an arbitrary stationary nucleotide composition and any transition-transversion bias during the process of base substitution. In addition, the user may define any hypothetical model tree according to which a family of sequences evolves. The programs suggest the computationally most inexpensive approach to generate nucleotide substitutions. Either reproducible or non-repeatable simulations ... can be performed." Comput Appl Biosci 1995 11 1 111-115 1590 Xu,Y. Correcting Sequencing .. Comput.Appl.Bio 95 11(2):117-124 Xu Y; Mural RJ; Uberbacher EC Correcting Sequencing Errors in DNA Coding Regions Using a Dynamic Programming Approach Error; Correction; Sequence analysis; DNA; Coding; Region; Dynamic programming; USA; Sequencing; Dynamic "This paper presents an algorithm for detecting and 'correcting' sequencing errors that occur in DNA coding regions. The types of sequencing errors addressed are insertions and deletions (indels) of DNA bases. The goal is to provide a capability which makes single-pass or low-redundancy sequence data more informative, reducing the need for high-redundancy sequencing for gene identification and characterization purposes. ... On a test set consisting of 68 human DNA sequences with 1% randomly generated indels in coding regions, the algorithm detected and corrected 76% of the indels." Comput Appl Biosci 1995 11 2 117-124 1591 Chao,K.M. A Local Alignment Tool.. 
Comput.Appl.Bio 95 11(2):147-153 Chao KM; Zhang J; Ostell J; Miller W A Local Alignment Tool for Very Long DNA Sequences Sequence alignment; Pairwise alignment; DNA; USA "This paper presents a practical program, called sim2, for building local alignments of two sequences, each of which may be hundreds of kilobases long. sim2 first constructs n best non-intersecting chains of 'fragments', such as all occurrences of identical 5-tuples in each of two DNA sequences, for any specified n >= 1. Each chain is then refined by delivering an optimal alignment in a region delimited by the chain. sim2 requires only space proportional to the size of the input sequences and the output alignments, and the same source code runs on Unix machines, on Macintoshes, on PCs, and on DEC Alpha PCs." Comput Appl Biosci 1995 11 2 147-153 1592 Watanabe,H. A Comprehensive Repres.. Comput.Appl.Bio 95 11(2):159-166 Watanabe H; Otsuka J A Comprehensive Representation of Extensive Similarity Linkage between Large Numbers of Proteins Protein; Multiple comparison; Sequence comparison; Clustering; JP; Representation; Similarity "A method is described for the representation of a bird's-eye view of similarity relationships between large numbers of proteins. With the aid of single-linkage clustering, proteins are clustered into groups on the basis of various types of similarity such as sequence similarity estimated between all the protein pairs. Proteins in a group are directly or indirectly connected to all proteins in the same group by similarities higher than a given threshold and show no similarity higher than the threshold to any proteins outside the group." Comput Appl Biosci 1995 11 2 159-166 1593 Thompson,J.D. Introducing Variable G.. 
Comput.Appl.Bio 95 11(2):181-186 Thompson JD Introducing Variable Gap Penalties to Sequence Alignment in Linear Space Sequence alignment; Pairwise alignment; Gap; Complexity; DE "The problem of finding an optimal sequence alignment has been solved by Hirschberg (1975) in quadratic time and linear space. Myers and Miller (1988) presented an implementation of this algorithm for aligning biological sequences, incorporating affine gap penalties. ... This paper presents a further development of the Myers and Miller algorithm. Here, we maximize similarity scores and, more significantly, introduce position-specific gap penalties. Thus, residue-dependent information such as structure preferences and existing gaps in a partial alignment can be applied to the solution of the alignment problem." Comput Appl Biosci 1995 11 2 181-186 1594 Doelz,R. A Compression Mechanis.. Comput.Appl.Bio 95 11(2):219-223 Doelz R; Eggenberger F A Compression Mechanism for Sequence Databases to Improve the Efficiency of Conventional Tools Sequence database; Compression; SWI "This paper describes a method to compress molecular biology databases that are characterized by an increasing proportion of data derived from genome projects. The performance of our tool has been tested on various data files of the EMBL nucleotide sequence database. ... The compression of sequence database updates was tested in combination with the common Unix compression program 'compress'. Our tool improved the efficiency of 'compress' on average by 16%." Comput Appl Biosci 1995 11 2 219-223 1595 Allison,L. The Posterior Probabil.. J.Mol.Evol. 
94 39:418-430 Allison L; Wallace CS The Posterior Probability Distribution of Alignments and its Application to Parameter Estimation of Evolutionary Trees and to Optimization of Multiple Alignments Evolutionary tree; Probability; Sequence alignment; Multiple alignment; Optimal; Simulated annealing; AU; Distribution; Optimization; Estimation "How to sample alignments from their posterior probability distribution given two strings is shown. This is extended to sampling alignments of more than two strings. The result is first applied to the estimation of the edges of a given evolutionary tree over several strings. Second, when used in conjunction with simulated annealing, it gives a stochastic search method for an optimal multiple alignment." J Mol Evol 39 39 418-430 1596 Benson,D.A. GenBank Nucleic Acids R 94 22(17):3441-34 Benson DA; Boguski M; Lipman DJ; Ostell J GenBank DNA; Sequence database; GenBank; USA "The GenBank sequence database continues to expand its data coverage, quality control, annotation content and retrieval services for the scientific community. Besides handling direct submissions of sequence data from authors, GenBank also incorporates DNA sequences from all available public sources; an integrated retrieval system, known as Entrez, also makes available data from the major protein sequence and structural databases, and from U.S. and European patents. MEDLINE abstracts from published articles describing the sequences are also included as an additional source of biological annotation for sequence entries." Nucleic Acids Res 1994 22 17 3441-3444 1597 Benson,G. A Method for Fast Data.. Nucleic Acids R 94 22(22):4828-48 Benson G; Waterman MS A Method for Fast Database Search for all k-Nucleotide Repeats Database search; DNA; Repeat; k-tuple; Region; USA "A significant portion of DNA consists of repeating patterns of various sizes, from very small (one, two and three nucleotides) to very large (over 300 nucleotides). ... 
It would be useful to search for such regions in the DNA database in order that they may be studied more fully. ... Therefore, any program to look for repeats must be efficient and fast. In this paper, we present some new techniques that are useful in recognizing repeating patterns and describe a new program for rapidly detecting repeat regions in the DNA database where the basic unit of the repeat has size up to 32 nucleotides." Nucleic Acids Res 1994 22 22 4828-4836 1598 Bonfield,J.K. The Application of Num.. Nucleic Acids R 95 23(8):1406-141 Bonfield JK; Staden R The Application of Numerical Estimates of Base Calling Accuracy to DNA Sequencing Projects DNA; Sequence recognition; Sequencing; Accuracy; Consensus method; UK "During DNA sequencing projects one of the most labour intensive and highly skilled tasks is to view the original trace descriptions of gels and to adjudicate between conflicting readings. Given the current methods of calculating a consensus, the majority of the time employed in viewing traces and editing readings is actually devoted to making the poorer data fit the good data. We propose new consensus calculation algorithms that employ numerical estimates of base calling accuracy and which when used in conjunction with an automatic detector of contradictory data should greatly reduce the time spent checking and editing readings and hence improve DNA sequencing productivity." Nucleic Acids Res 1995 23 8 1406-1410 1599 Borodovsky,M. Intrinsic and Extrinsi.. Nucleic Acids R 94 22(22):4756-47 Borodovsky M; Rudd KE; Koonin EV Intrinsic and Extrinsic Approaches for Detecting Genes in a Bacterial Genome Gene; Identification; Detection; Markov; BLAST; Motif; Sequence comparison; USA; Genome "The unannotated regions of the E. 
coli genome DNA sequence from the EcoSeq6 database, totaling 1,278 'intergenic' sequences of the combined length of 359,279 basepairs, were analyzed using computer-assisted methods with the aim of identifying putative unknown genes. The proposed strategy for finding new genes includes two key elements: (i) prediction of expressed open reading frames (ORFs) using the GeneMark method based on Markov chain models for coding and non-coding regions of E. coli DNA, and (ii) search for protein sequence similarities using programs based on the BLAST algorithm and programs for motif identification." Nucleic Acids Res 1994 22 22 4756-4767 1600 Cserzo,M. New Alignment Strategy.. J.Mol.Biol. 94 243:388-396 Cserzo M; Bernassau JM; Simon I; Maigret B New Alignment Strategy for Transmembrane Proteins Protein; Sequence alignment; Homology; Model; FR "In this paper an algorithm which locates helical transmembrane segments is described. It is shown that given the location of transmembrane helices of a protein, corresponding helices in another membrane related protein can be pinpointed. The method seems to be extremely insensitive to sequence identity but highly sensitive to the property of a sequence to assume transmembrane helical structure. ... There are indications that hint at the broader range of applicability of the presented method." J Mol Biol 243 243 388-396 1601 Emmert,D.B. The European Bioinform.. Nucleic Acids R 94 22(17):3445-34 Emmert DB; Stoehr PJ; Stoesser G; Cameron GN The European Bioinformatics Institute (EBI) Databases Database search; Sequence database; Sequence search; UK "This paper describes the databases and services of the European Bioinformatics Institute (EBI). In collaboration with DDBJ and GenBank/NCBI, the EBI maintains and distributes the EMBL Nucleotide Sequence Database, Europe's primary nucleotide sequence data resource. ... 
Over thirty additional specialist molecular biology databases, as well as software and documentation of interest to molecular biologists, are also available. The EBI network services include database searching, entry retrieval, and sequence similarity searching facilities." Nucleic Acids Res 1994 22 17 3445-3449 1602 Fasman,K.H. The GDB(TM) Human Geno.. Nucleic Acids R 94 22(17):3462-34 Fasman KH; Cuticchia AJ; Kingsbury DT The GDB(TM) Human Genome Data Base anno 1994 Genome; Database search; Mapping; Data acquisition; USA "In 1991 the Genome Data Base at Johns Hopkins University School of Medicine was selected as the central repository for mapping data from the Human Genome Project .... It is even more important that GDB provide leadership in the genome informatics enterprise. Three themes described here are dominant in our future plans and represent the essence of the major changes made in the past year. They include: enhanced data acquisition, better map representation, and full integration into the collection of genomic databases." Nucleic Acids Res 1994 22 17 3462-3469 1603 Fu,Y.X. Linear Invariants unde.. J.Theor.Biol. 95 173:339-352 Fu YX Linear Invariants under Jukes' and Cantor's One-parameter Model Invariant; Phylogenetic; Evolutionary tree; USA; Model "Linear invariants are random variables with zero expectations under certain assumptions. In this paper, linear invariants under Jukes' and Cantor's one-parameter model, both with and without the assumption that nucleotide frequencies are at equilibrium, are studied using the method developed in a previous paper. Phylogenetic linear invariants ... for trees with up to seven species are derived and bases of phylogenetic linear invariant spaces for unrooted trees with four, five and six species are presented." J Theor Biol 173 173 339-352 1604 Gautheret,D. Identification of Base.. J.Mol.Biol. 
95 248:27-43 Gautheret D; Damberger SH; Gutell RR Identification of Base-triples in RNA using Comparative Sequence Analysis Sequence analysis; Sequence comparison; RNA; Structure; USA; Identification "Comparative sequence analysis has proven to be a very efficient tool for the determination of RNA secondary structure and certain tertiary interactions. However, base-triples, an important RNA structural element, cannot be predicted accurately from sequence data. We show here that the poor base correlations observed at base-triple positions are the result of two factors. (1) Base covariation is not as strictly required in triples as it is in Watson-Crick pairs. (2) Base-triple structures are less conserved among homologous molecules." J Mol Biol 248 248 27-43 1605 George,D.G. The PIR-International .. Nucleic Acids R 94 22(17):3569-35 George DG; Barker WC; Mewes HW; Pfeiffer F; Tsugita A The PIR-International Protein Sequence Database Protein; Sequence database; USA "PIR-International is an association of macromolecular sequence data collection centers dedicated to fostering international cooperation as an essential element in the development of scientific databases. A major objective of PIR-International is to continue the development of the Protein Sequence Database as an essential public resource for protein sequence information. This paper briefly describes the architecture of the Protein Sequence Database and how it and associated data sets are distributed and can be accessed electronically." Nucleic Acids Res 1994 22 17 3569-3573 1606 Gu,X. The Size Distribution .. J.Mol.Evol. 95 40:464-473 Gu X; Li WH The Size Distribution of Insertions and Deletions in Human and Rodent Pseudogenes Suggests the Logarithmic Gap Penalty for Sequence Alignment Pseudogene; Gap; Sequence alignment; Indel; USA; Distribution; Deletion "The size distributions of deletions, insertions, and indels ... were studied, using 78 human processed pseudogenes and other published data sets. 
The following results were obtained: (1) Deletions occur more frequently than do insertions in sequence evolution ... (2) Empirically, the size distributions of deletions, insertions, and indels can be described well by a power law ... (5) The linear gap penalty, which is most commonly used in sequence alignment, is not supported by our analysis; rather, ... an appropriate gap penalty is wk = a + b ln k, where a is the gap creation cost and b ln k is the gap extension cost ...." J Mol Evol 40 40 464-473 1607 Hein,J. A Maximum-Likelihood A.. J.Mol.Evol. 95 40:181-189 Hein J; Stovlbaek J A Maximum-Likelihood Approach to Analyzing Nonoverlapping and Overlapping Reading Frames Likelihood; Reading; Frame; Evolution; Sequence analysis; DK "A model is presented for sequence evolution on the basis of which one can analyze combinations of noncoding, singly coding, and multiply coding regions of aligned homologous DNA sequences. It is a generalization of Kimura's (1980) and Li, Wu & Luo's (1985) transition-transversion models with selection on replacement substitutions. Based on a hierarchy of hypotheses, one will be able to estimate selection factors and transition and transversion distances for different combinations of regions ...." J Mol Evol 40 40 181-189 1608 Henikoff,S. Position-based Sequenc.. J.Mol.Biol. 94 243:574-578 Henikoff S; Henikoff JG Position-based Sequence Weights Multiple alignment; Sequence alignment; Sequence weight; Profile; Database search; Protein; Block search; USA "Sequence weighting methods have been used to reduce redundancy and emphasize diversity in multiple sequence alignment and searching applications. Each of these methods is based on a notion of distance between a sequence and an ancestral or generalized sequence. We describe a different approach, which bases weights on the diversity observed at each position in the alignment, rather than on a sequence distance measure. 
These position-based weights make minimal assumptions, are simple to compute, and perform well in comprehensive evaluations." J Mol Biol 243 243 574-578 1609 Huelsenbeck,J Performance of Phyloge.. Syst.Biol. 95 44(1):17-48 Huelsenbeck JP Performance of Phylogenetic Methods in Simulation Phylogenetic; Evolutionary tree; Simulation; Performance; USA "In this study, I examined the performance of 26 commonly used methods of phylogenetic inference for three statistical criteria: consistency, efficiency, and robustness. ... The performance of methods was examined under three models of DNA substitution for four taxa. The branch lengths of the four-taxon trees were varied extensively in this simulation. The results indicate that most methods perform well (i.e., estimate the correct tree >= 95% of the time) over a large portion of the four-taxon parameter space. In general, maximum likelihood performed best, followed by the additive distance methods and the parsimony methods." Syst Biol 1995 44 1 17-48 1610 Ina,Y. New Methods for Estima.. J.Mol.Evol. 95 40:190-226 Ina Y New Methods for Estimating the Numbers of Synonymous and Nonsynonymous Substitutions Evolutionary tree; Distance; Substitution; Statistical; JP; Synonymous "New methods for estimating the numbers of synonymous and nonsynonymous substitutions per site were developed. The methods are unweighted pathway methods based on Kimura's two-parameter model. Computer simulations were conducted to evaluate the accuracies of the new methods, Nei and Gojobori's (NG) method, Miyata and Yasunaga's (MY) method, Li, Wu, and Luo's (LWL) method, and Pamilo, Bianchi, and Li's (PBL) method. ... The NG, MY, and LWL methods give overestimates of the number of synonymous substitutions and underestimates of the number of nonsynonymous substitutions." J Mol Evol 40 40 190-226 1611 Krogh,A. A Hidden Markov Model .. Nucleic Acids R 94 22(22):4768-47 Krogh A; Mian IS; Haussler D A Hidden Markov Model that finds Genes in E.
coli DNA Markov; Gene; DNA; Statistical; Protein; USA; Model "A hidden Markov model (HMM) has been developed to find protein coding genes in E. coli DNA using E. coli genome DNA sequence from the EcoSeq6 database maintained by Kenn Rudd. This HMM includes states that model the codons and their frequencies in E. coli genes, as well as the patterns found in the intergenic region, including repetitive extragenic palindromic sequences and the Shine-Dalgarno motif. ... The HMM finds the exact locations of about 80% of the known E. coli genes, and approximate locations for about 10%. It also finds several potentially new genes, and locates several places where insertion or deletion errors and/or frameshifts may be present in the contigs." Nucleic Acids Res 1994 22 22 4768-4778 1612 Li,W.H. Statistical Tests of D.. Syst.Biol. 95 44(1):49-63 Li WH; Zharkikh A Statistical Tests of DNA Phylogenies Statistical; DNA; Phylogeny; Parsimony; Bootstrap; Minimum evolution; Bias; Neighbor joining; USA "In this article, we review (1) statistical tests of DNA phylogenies inferred by the maximum-parsimony method ... (2) statistical tests based on the minimum-evolution criterion ..., and (3) the bootstrap technique for estimating the confidence level of a phylogenetic hypothesis based on either the maximum-parsimony or the neighbor-joining method. We explain why the bootstrap technique usually gives biased estimates and how to correct the bias." Syst Biol 1995 44 1 49-63 1613 Maidak,B.L. The Ribosomal Database.. Nucleic Acids R 94 22(17):3485-34 Maidak BL; Larsen N; McCaughey MJ; Overbeek R; Olsen GJ; Fogel K; Blandy J; Woese CR The Ribosomal Database Project Ribosome; RNA; Sequence database; USA; rdp.life.uiuc.edu "The Ribosomal Database Project (RDP) is a curated database that offers ribosome-related data, analysis services, and associated computer programs.
The offerings include phylogenetically ordered alignments of ribosomal RNA (rRNA) sequences, derived phylogenetic trees, rRNA secondary structure diagrams, and various software for handling, analyzing and displaying alignments and trees. The data are available via anonymous ftp (rdp.life.uiuc.edu), electronic mail (server@rdp.life.uiuc.edu) and gopher (rdpgopher.life.uiuc.edu)." Nucleic Acids Res 1994 22 17 3485-3487 1614 Sakakibara,Y. Stochastic Context-fre.. Nucleic Acids R 94 22(23):5112-51 Sakakibara Y; Brown M; Hughey R; Mian IS; Sjolander K; Underwood RC; Haussler D Stochastic Context-free Grammars for tRNA Modeling Stochastic; Language; Grammar; Sequence alignment; Markov; Structure; Secondary; USA "Stochastic context-free grammars (SCFGs) are applied to the problems of folding, aligning and modeling families of tRNA sequences. SCFGs capture the sequences' common primary and secondary structure and generalize the hidden Markov models (HMMs) used in related work on protein and DNA. Results show that after having been trained on as few as 20 tRNA sequences from only two tRNA subfamilies ..., the model can discern general tRNA from similar-length RNA sequences of other kinds, can find secondary structure of new tRNA sequences, and can produce multiple alignments of large sets of tRNA sequences." Nucleic Acids Res 1994 22 23 5112-5120 1615 Sander,C. The HSSP Database of P.. Nucleic Acids R 94 22(17):3597-35 Sander C; Schneider R The HSSP Database of Protein Structure - Sequence Alignments Protein; Structure; Sequence alignment; Sequence database; Homology; DE "HSSP (homology-derived structures of proteins) is a derived database merging structural (2-D and 3-D) and sequence information (1-D). For each protein of known 3D structure from the Protein Data Bank, the database has a file with all sequence homologues, properly aligned to the PDB protein. Homologues are very likely to have the same 3D structure as the PDB protein to which they have been aligned.
As a result, the database is not only a database of aligned sequence families, but it is also a database of implied secondary and tertiary structures." Nucleic Acids Res 1994 22 17 3597-3599 1616 Snyder,E.E. Identification of Prot.. J.Mol.Biol. 95 248:1-18 Snyder EE; Stormo GD Identification of Protein Coding Regions in Genomic DNA Protein; Coding; Region; Genomic; DNA; Gene; Identification; USA "We have developed a computer program, GeneParser, which identifies and determines the fine structure of protein genes in genomic DNA sequences. ... Using this method, we can rapidly generate ranked suboptimal solutions, each of which is the optimum solution containing a given intron-exon junction. We have tested the system on a large collection of human genes. ... We have also quantified the robustness of the method to substitution and frame-shift errors and show how the system can be optimized for performance on sequences with known levels of sequencing errors." J Mol Biol 248 248 1-18 1617 Strelets,V.B. Analysis of Peptides f.. J.Mol.Evol. 94 39:625-630 Strelets VB; Shindyalov IN; Lim HA Analysis of Peptides from Known Proteins: Clusterization in Sequence Space Protein; Sequence proximity; k-tuple; Evolution; Clustering; USA "A combinatorial sequence space (CSS) model was introduced to represent sequences as a set of overlapping k-tuples of some fixed length which correspond to points in the CSS. The aim was to analyze clusterization of protein sequences in the CSS and to test various hypotheses about the possible evolutionary basis of this clusterization. The authors developed an easy-to-use technique which can reveal and analyze such a characterization in a multidimensional CSS. Application of the technique led to an unexpectedly high clusterization of points in the CSS corresponding to k-tuples from known proteins." J Mol Evol 39 39 625-630 1618 Thompson,J.D. CLUSTAL W: Improving t..
Nucleic Acids R 94 22(22):4673-46 Thompson JD; Higgins DG; Gibson TJ CLUSTAL W: Improving the Sensitivity of Progressive Multiple Sequence Alignment through Sequence Weighting, Position-specific Gap Penalties and Weight Matrix Choice Multiple alignment; Sequence alignment; Sequence weight; Gap; Substitution; DE; Matrix "The sensitivity of the commonly used progressive multiple sequence alignment method has been greatly improved for the alignment of divergent protein sequences. Firstly, individual weights are assigned to each sequence in a partial alignment in order to down-weight near-duplicate sequences and up-weight the most divergent ones. Secondly, amino acid substitution matrices are varied at different alignment stages according to the divergence of the sequences to be aligned. Thirdly .... Fourthly .... These modifications are incorporated into a new program, CLUSTAL W which is freely available." Nucleic Acids Res 1994 22 22 4673-4680 1619 Thorne,J.L. Estimation and Reliabi.. Biometrics 95 51(1):100-113 Thorne JL; Churchill GA Estimation and Reliability of Molecular Sequence Alignments Estimation; Reliability; Sequence alignment; Stochastic; Evolution; Likelihood; USA "The problem of estimating the relatedness of a pair of biological sequences is addressed. A stochastic model of sequence evolution is described that allows insertion and deletion as well as replacement of amino acid residues ... over time. An expectation-maximization (EM) algorithm that obtains maximum likelihood estimates of the model parameters is introduced. The method assumes that the sequences are related by descent from a common ancestor but the alignment (i.e., the precise evolutionary correspondence between residues in each sequence) is unknown. Results from the E-step of the EM algorithm are used to assess the likelihood that any two residues are related by direct descent from a common ancestor." Biometrics 1995 51 1 100-113 1620 Tillier,E.R.M Maximum Likelihood wit.. J.Mol.Evol. 
94 39:409-417 Tillier ERM Maximum Likelihood with Multiparameter Models of Substitution Likelihood; Substitution; Model; Evolution; CA "Maximum-likelihood approaches to phylogenetic estimation have the potential of great flexibility, even though current implementations are highly constrained. One such constraint has been the limitation to one-parameter models of substitution. A general implementation of Newton's maximization procedure was developed that allows the maximum likelihood method to be used with multiparameter models. The Estimate and Maximize (EM) algorithm was also used to obtain a good approximation to the maximum likelihood for a certain class of multiparameter models." J Mol Evol 39 39 409-417 1621 Ulyanov,A.V. Multi-alphabet Consens.. Nucleic Acids R 95 23(8):1434-144 Ulyanov AV; Stormo GD Multi-alphabet Consensus Algorithm for Identification of Low Specificity Protein-DNA Interactions Consensus discovery; Identification; Signal; DNA; USA; Algorithm "A method for the identification and characterization of protein-DNA interactions is presented. We have developed an approach for finding unknown multiple patterns that occur imperfectly in a set of several sequences. The pattern may contain letters from the nucleotide alphabet (A,C,G,T) including ambiguous characters (A/C, A/G, etc.). This method reveals weak DNA signals on an unaligned set of DNA fragments known to be functionally related and assumes no prior information on the sequences' alignment." Nucleic Acids Res 1995 23 8 1434-1440 1622 Levitt,M. Accurate Modeling of P.. J.Mol.Biol. 92 226:507-533 Levitt M Accurate Modeling of Protein Conformation by Automatic Segment Matching Protein; Segment; Match complex patterns; Sequence match; Database search; Model; USA "Segment match modeling uses a data base of highly refined known protein X-ray structures to build an unknown target structure from its amino acid sequence and the atomic coordinates of a few of its atoms .... 
The target structure is first broken into a set of short segments. The data base is then searched for matching segments, which are fitted onto the framework of the target structure. Three criteria are used for choosing a matching data base segment: amino acid sequence similarity, conformational similarity (atomic coordinates), and compatibility with the target structure (van der Waals' interactions)." J Mol Biol 226 226 507-533 1623 Cardon,L.R. Expectation Maximizati.. J.Mol.Biol. 92 223:159-170 Cardon LR; Stormo GD Expectation Maximization Algorithm for Identifying Protein-binding Sites with Variable Lengths from Unaligned DNA Fragments DNA; Expectation; Maximization; Protein; Binding; Multiple alignment; Consensus sequence; Fragment; USA; Algorithm "An Expectation Maximization algorithm for identification of DNA binding sites is presented. The approach predicts the location of binding regions while allowing variable length spacers within the sites. In addition to predicting the most likely spacer length for a set of DNA fragments, the method identifies individual sites that differ in spacer size. No alignment of DNA sequences is necessary. The method is illustrated by application to 231 E. coli DNA fragments known to contain promoters with variable spacings between their consensus regions." J Mol Biol 223 223 159-170 1624 Saroff,H.A. The Uniqueness of Prot.. Bull.Math.Biol. 84 46(4):661-672 Saroff HA The Uniqueness of Protein Sequences. Uniqueness Diagrams for the Dayhoff File - 1984 Protein; Sequence analysis; k-tuple; Monte Carlo; Approximation; USA "Protein sequences of the Dayhoff databank of 1984 have been analyzed to evaluate the occurrences of the 400 dipeptides and 8000 tripeptides. Expected values and standard deviations for the di- and tri-peptides were determined by Monte Carlo and binomial approximation. A condensed format containing this information, labeled a uniqueness diagram, is presented and made available in the form of a microfiche." 
Bull Math Biol 1984 46 4 661-672 1625 Bougueleret,L Objective Comparison o.. Nucleic Acids R 88 16(5):1729-173 Bougueleret L; Tekaia F; Sauvaget I; Claverie JM Objective Comparison of Exon and Intron Sequences by the Mean of 2-Dimensional Data Analysis Methods Exon; Intron; Sequence comparison; Sequence analysis; FR "Here we advocate the use of 2-dimensional data representation in the context of the informational approach of sequence analysis (Claverie & Bougueleret (1986) NAR 14, 179-196) by applying these methods to the problem of intron/exon discrimination. Two main findings are reported: (i) oligonucleotide patterns complementary to the U1 small nuclear RNA are specifically avoided in exon sequences, (ii) vertebrate intron sequences, to the exclusion of other eukaryotic phyla, are characterized by a peculiar distribution of CpG containing patterns." Nucleic Acids Res 1988 16 5 1729-1738 1626 Fields,C.A. Gm: A Practical Tool f.. Comput.Appl.Bio 90 6:263-270 Fields CA; Soderlund CA Gm: A Practical Tool for Automating DNA Sequence Analysis Sequence analysis; DNA Snyder & Stormo (1995), p. 17 Comput Appl Biosci 6 6 263-270 1627 Gish,W. Identification of Prot.. Nature Genetics 93 3:266-272 Gish W; States DJ Identification of Protein Coding Regions by Database Similarity Search Protein; Database search; Similarity; Identification; Coding; Region Snyder & Stormo (1995), p. 17 Nature Genetics 3 3 266-272 1628 Knight,J.R. Super-Pattern Matching Algorithmica 95 13(1/2):211-24 Knight JR; Myers EW Super-Pattern Matching Pattern match; USA UnCover SICI Code: 0178-4617(19950101)13:1:2L.211:SM;1- Algorithmica 1995 13 1/2 211-243 1629 Zuker,M. On Finding all Subopti.. Science 89 244(7 April):4 Zuker M On Finding all Suboptimal Foldings of an RNA Molecule RNA; Folding; Secondary; Structure; Suboptimal; CA "An algorithm and a computer program have been prepared for determining RNA secondary structures within any prescribed increment of the computed global minimum free energy. 
The mathematical problem of determining how well defined a minimum energy folding is can now be solved. All predicted base pairs that can participate in suboptimal structures may be displayed and analyzed graphically. Representative suboptimal foldings are generated by selecting these base pairs one at a time and computing the best foldings that contain them." Science 1989 244 7 April 48-52 1630 Uberbacher,E. Locating Protein-codin.. Proc.Nat.Acad.S 91 88:11261-11265 Uberbacher EC; Mural RJ Locating Protein-coding Regions in Human DNA Sequences by a Multiple Sensor-Neural Network Approach DNA; Neural; Network; Coding; Protein; Region; USA "Identifying genes within large regions of uncharacterized DNA is a difficult undertaking and is currently the focus of many research efforts. We describe a reliable computational approach for locating protein-coding portions of genes in anonymous DNA sequence. Using a concept suggested by robotic environmental sensing, our method combines a set of sensor algorithms and a neural network to localize the coding regions. Several algorithms that report local characteristics of the DNA sequence, and therefore act as sensors, are also described." Proc Nat Acad Sci USA 88 88 11261-11265 1631 Jiang,T. On the Complexity of L.. Theoret.Comput. 93 119:363-371 Jiang T; Li M On the Complexity of Learning Strings and Sequences Sequence analysis; Complexity; Model; Learning; CA "It is shown that strings (sequences) cannot be learned by strings (sequences) in Valiant's distribution-free (pac-) learning model, assuming RP not= NP." Theoret Comput Sci 119 119 363-371 1632 Amir,A. Improved Dynamic Dicti.. Inform.Comput. 95 119(2):258-282 Amir A; Farach M; Schaffer AA Improved Dynamic Dictionary Matching Dictionary match; Dynamic; USA UnCover SICI Code: 0890-5401(19950601)119:2L.258:IDDM;1- Inform Comput 1995 119 2 258-282 1633 Henikoff,S. Performance Evaluation.. 
Proteins Struct 93 17:49-61 Henikoff S; Henikoff JG Performance Evaluation of Amino Acid Substitution Matrices Substitution; Amino acid; Performance Henikoff & Henikoff (1994), p.578 Proteins Struct Funct Genet 17 17 49-61 1634 Luthy,R. Improving the Sensitiv.. Protein Sci. 94 3:139-146 Luthy R; Xenarios I; Bucher P Improving the Sensitivity of the Sequence Profile Method Profile; Protein Henikoff & Henikoff (1994), p.578 Protein Sci 3 3 139-146 1635 Evans,S.N. Invariants of Some Pro.. Ann.Statist. 93 21(1):355-377 Evans SN; Speed TP Invariants of Some Probability Models Used in Phylogenetic Inference Invariant; Phylogenetic; Fourier; Probability; Model; USA "The so-called method of invariants is a technique in the field of molecular evolution for inferring phylogenetic relations among a number of species on the basis of nucleotide sequence data. An invariant is a polynomial function of the probability distribution defined by a stochastic model for the observed nucleotide sequence. ... For a wide class of models found in the literature, we present a simple algebraic formalism for recognising whether or not a function is an invariant and for generating all possible invariants. Our work is based on recognising an underlying group structure and using discrete Fourier analysis." Ann Statist 1993 21 1 355-377 1636 Fu,Y.X. Necessary and Sufficie.. Math.Biosci. 91 105:229-238 Fu YX; Li WH Necessary and Sufficient Conditions for the Existence of Certain Quadratic Invariants under a Phylogenetic Tree Invariant; Phylogenetic; Evolutionary tree; Substitution; Model; USA "Invariants are functions of the probabilities of state configurations among lineages, with expected values equal to zero under certain phylogenies. For two-state sequences, the existence of certain quadratic invariants requires a symmetric substitution model. 
For sequences with more than two states, the necessary condition for the existence of certain quadratic invariants in terms of independent events is much stronger than symmetry. For DNA sequences, only three parameters are allowed in the substitution model, which includes Kimura's two-parameter model as a special case." Math Biosci 105 105 229-238 1637 Nguyen,T. A Derivation of All Li.. J.Mol.Evol. 92 35:60-76 Nguyen T; Speed TP A Derivation of All Linear Invariants for a Nonbalanced Transversion Model Invariant; Transversion; Phylogenetic; Model; Evolutionary tree; USA "Cavender noted a generalization of linear invariants to certain more general substitution models. In this paper we give a simple explicit description of a basis for all linear invariants for a slight variant of Cavender's more general model, which applies to rooted trees linking any number of species. Bases for rooted trees linking five species are enumerated and the method applied to a problem concerning RNA polymerase sequence data." J Mol Evol 35 35 60-76 1638 Bunke,H. An Algorithm for Match.. Computing 93 50:297-314 Bunke H; Csirik J An Algorithm for Matching Run-Length Coded Strings String match; Edit; Distance; Longest common; Sequence comparison; SWI; Algorithm "An algorithm for the computation of the edit distance of run-length coded strings is given. In run-length coding, not all individual symbols in a string are explicitly listed. Instead, one run of identical consecutive symbols is coded by giving one representative symbol together with its multiplicity. The algorithm determines the minimum cost sequence of edit operations transforming one string into another. In the worst case, the algorithm has a time complexity of O(nm), where n and m give the lengths of the strings to be compared. In the best case, the time complexity is O(kl), where k and l are the numbers of runs of identical symbols in the two strings under comparison." Computing 50 50 297-314 1639 Bock,H.H. 
Consensus Rules for Mo.. Learning and .. 95Springer-Verlag Bock HH; Day WHE; McMorris FR Consensus Rules for Molecular Sequences: Open Problems Bock HH Polasek W Learning and Knowledge Consensus discovery; Consensus method; Sequence analysis; DE Preprint: "Any set of n aligned molecular (e.g., DNA, protein) sequences can be represented by a matrix of n rows and m columns in which the n symbols (e.g., bases, amino acids) in a given column represent homologous states of a biological character. At each column of the matrix, a problem of consensus description is to determine a set of symbols (e.g., ambiguity codes for DNA) that best represents the n symbols in the column. Although consensus sequences appear frequently in the biological literature, the features or relevance of rules for deriving consensus molecular sequences are largely unexplored. To encourage further research, we summarize recent work and pose mathematical and biological problems regarding consensus rules for molecular sequences." Springer-Verlag Heidelberg 1995 1-11 1640 Overington,J. Environment-specific A.. Protein Sci. 92 1:216-226 Overington J; Donnelly D; Johnson MS; Sali A; Blundell TL Environment-specific Amino Acid Substitution Tables: Tertiary Templates and Prediction of Protein Folds Amino acid; Substitution; Template; Prediction; Protein; Fold Altschul, Boguski, Gish & Wootton (1994), p. 129 Protein Sci 1 1 216-226 1641 Pascarella,S. Analysis of Insertions.. J.Mol.Biol. 92 224:461-471 Pascarella S; Argos P Analysis of Insertions/Deletions in Protein Structures Indel; Protein; Structure; Sequence analysis; Evolution; DE "An analysis of insertions and deletions (indels) occurring in a databank of multiple sequence alignments based on protein tertiary structure is reported. Indels prefer to be short (1 to 5 residues). 
The average intervening sequence length between them versus the percentage of residue identity in pairwise alignments shows an exponential behaviour, suggesting a stochastic process such that nearly every loop in an ancestral structure is a possible target for indels during evolution." J Mol Biol 224 224 461-471 1642 Sikela,J.M. Finding New Genes Fast.. Nature Genetics 93 3:189-191 Sikela JM; Auffray C Finding New Genes Faster than Ever Gene Altschul, Boguski, Gish & Wootton (1994), p. 129 Nature Genetics 3 3 189-191 1643 Karlin,S. Charge Configurations .. Proc.Nat.Acad.S 88 85:9396-9400 Karlin S; Brendel V Charge Configurations in Viral Proteins Sequence search; Charge; Statistical; Protein; USA; Viral "The spatial distribution of the charged residues of a protein is of interest with respect to potential electrostatic interactions. We have examined the proteins of a large number of representative eukaryotic and prokaryotic viruses for the occurrence of significant clusters, runs, and periodic patterns of charge. ..." Proc Nat Acad Sci USA 85 85 9396-9400 1644 Luo,L.F. Informational Paramete.. J.Theor.Biol. 88 130:351-361 Luo LF; Tsai L; Zhou YM Informational Parameters of Nucleic Acid and Molecular Evolution Information theory; Evolution; Statistical; Probability; Nucleic acid; Evolutionary distance; Grammar; CN "From the point of view of information theory, a statistical analysis of 2000 nucleic acid sequences ... is given. The sequences are grouped into 20 categories. The probability-order-difference (POD) matrix is defined which is used to analyse the evolutionary distance of any two categories of sequences. The informational parameters ... are calculated for each sequence and averaged in each category. The statistical dependence of these parameters on molecular evolution is discussed. It is found that (X) is a good statistical quantity which describes the vocabulary compositions as well as the grammatical constructions of the genetic language." 
J Theor Biol 130 130 351-361 1645 Rowe,G.W. On the Informational C.. J.Theor.Biol. 83 101:151-170 Rowe GW; Trainor LEH On the Informational Content of Viral DNA Sequence analysis; Information content; DNA; Genome; Codon; Bias; CA; Viral "This paper is concerned primarily with how information is stored in viral DNA. The general problem of defining information content is discussed and a procedure for analysis extended from that of Gatlin (1972) is developed. Long range correlations in base sequences are analyzed for several viral genomes. The relationship of these correlations to the existence of strong codon biases is examined and the consequences discussed." J Theor Biol 101 101 151-170 1646 Stuckle,E.E. Probability of Occurre.. J.Theor.Biol. 92 159:299-306 Stuckle EE; Nielsen PJ; Grob U Probability of Occurrence of Specific Oligomers Sequence analysis; Statistical; Oligomer; Probability; DE "We improved an already existing formula for calculating the probability of occurrence of specific oligomers (Grob and Stuber, 1987) by taking into account unequal base distribution. This method identifies specific oligomers in a given sequence as candidates for biological signals." J Theor Biol 159 159 299-306 1647 Karlin,S. Molecular Evolution of.. J.Virol. 94 68(3):1886-190 Karlin S; Mocarski ES; Schachtel GA Molecular Evolution of Herpesviruses: Genomic and Protein Sequence Comparisons Evolution; Sequence comparison; Genomic; Protein; Phylogenetic; Sequence alignment; USA "Phylogenetic reconstruction of herpesvirus evolution is generally founded on amino acid sequence comparisons of specific proteins. These are relevant to the evolution of the specific gene (or set of genes), but the resulting phylogeny may vary depending on the particular sequence chosen for analysis (or comparison). 
In the first part of this report, we compare 13 herpesvirus genomes by using a new multidimensional methodology based on distance measures and partial orderings of dinucleotide relative abundances. ... In the second part of this report, evolutionary relationships among the 13 herpesvirus genomes are evaluated on the basis of recent methods of amino acid alignment applied to four essential protein sequences. ... By our methods, evolutionary relationships derived from genomic comparisons versus protein comparisons differ to some extent. The dinucleotide relative abundance distances appear to discriminate DNA structure specificity more than sequence specificity. The evolutionary development of genes among viruses (and species) is more dependent on each individual gene." J Virol 1994 68 3 1886-1902 1648 Karlin,S. Statistical Studies of.. Phil.Trans.R.So 94 344(1310):391- Karlin S Statistical Studies of Biomolecular Sequences: Score-based Methods Sequence analysis; Score; Statistical; Genome; Multiple comparison; Segment; USA "This presentation reviews the method of score-based sequence analysis with the objectives of discerning distinctive segments in single sequences and identifying significant common segments in sequence comparisons. A number of new results are described here for both the theory and its applications. These include distributional theory involving several high scoring segments in single sequences, distribution formulas for general scoring regimes in multiple sequence comparisons, bounds for periodic scoring assignments, sensitivity analysis of genome composition and refinements on predicting exons and genes in DNA sequences." Phil Trans R Soc Lond Ser B 1994 344 1310 391-402 1649 Burge,C. Over- and Under-Repres.. 
Proc.Nat.Acad.S 92 89:1358-1362 Burge C; Campbell AM; Karlin S Over- and Under-Representation of Short Oligonucleotides in DNA Sequences Sequence analysis; DNA; Statistical; k-tuple; Genomic; USA "Strand-symmetric relative abundance functionals for di-, tri-, and tetranucleotides are introduced and applied to sequences encompassing a broad phylogenetic range to discern tendencies and anomalies in the occurrences of these short oligonucleotides within and between genomic sequences. ... Explanations for these over- and under-representations in terms of DNA/RNA structures and regulatory mechanisms are considered." Proc Nat Acad Sci USA 89 89 1358-1362 1650 Gutell,R.R. Comparative Studies of.. Curr.Opin.Struc 93 3:313-322 Gutell RR Comparative Studies of RNA: Inferring Higher-order Structure from Patterns of Sequence Variation RNA; Structure; Sequence comparison; USA Gautheret, Damberger & Gutell (1995), p. 42. Curr Opin Struct Biol 3 3 313-322 1651 Olsen,G.J. Comparative Analysis o.. 83University Micr Olsen GJ Comparative Analysis of Nucleotide Sequence Data BK - Sequence comparison; RNA; Secondary; Sequence analysis; Phylogeny; USA; Nucleotide "By examining the divergence of the sequences it is possible to study the evolutionary relationships of the sequences, and thus infer the relationships between the organisms carrying them. By examining the similarities of the sequences, it is possible to identify conserved sequence features, which, due to their retention through evolution, we infer to be important to the function of the sequences in vivo. This thesis contains three major divisions, each emphasizing a different type of comparative sequence analysis. The first division addresses the analysis of nucleotide sequence phylogeny. ... The second division explores the application of comparative analysis to the deduction of RNA secondary structure. ... 
The final division presents a general method for inferring the relationship between the nucleotide appearing at one position in a molecule and the nucleotide appearing at a second position in the same molecule." University Microfilms International Ann Arbor, MI 48106, USA 1983 xii+1-163 1652 Winker,S. Structure Detection th.. Comput.Appl.Bio 90 6:365-371 Winker S; Overbeek R; Woese CR; Olsen GJ; Pfluger N Structure Detection through Automated Covariance Search Structure; Sequence comparison; Covariance; Detection Gautheret, Damberger & Gutell (1995), p. 43. Comput Appl Biosci 6 6 365-371 1653 Rooman,M.J. Identification of Pred.. Nature (Lond.) 88 335(6185):45-4 Rooman MJ; Wodak SJ Identification of Predictive Sequence Motifs Limited by Protein Structure Data Base Size Protein; Structure; Database search; Motif; Sequence database; Identification; Belgium "Associations between short amino acid sequence patterns and protein secondary structure classes can be found by searching a data base of known protein structures. Analysis of these associations suggests that secondary structure of proteins can be determined locally by sequence motifs of high predictive value, but at present our ability to find these motifs is limited by the size of the available data bases." Nature (Lond ) 1988 335 6185 45-49 1654 Steel,M. Recovering the Correct.. Mol.Phylogenet. 95 0:0-0 Steel M; Hendy M; Lockhart PJ; Penny D Recovering the Correct Tree Under a More Realistic Model of Sequence Evolution Evolutionary tree; Sequence analysis; NZ; Model; Evolution Li & Zharkikh (1995), p. 63. 28.07.95: not yet on shelf. Mol Phylogenet Evol 0 0 0-0 1655 Rawlings,D. Adjacencies in Words Adv.Appl.Math. 95 16(2):206-218 Rawlings D Adjacencies in Words String search; Common feature; Word UnCover SICI Code: 0196-8858(19950601)16:2L.206:AW;1- Adv Appl Math 1995 16 2 206-218 1656 Kececioglu,J. Combinatorial Algorith.. 
Algorithmica 95 13(1/2):7-51 Kececioglu JD; Myers EW Combinatorial Algorithms for DNA Sequence Assembly Sequence assembly; USA; Combinatorial; Algorithm; DNA UnCover SICI Code: 0178-4617(19950101)13:1:2L.7:CADS;1- Algorithmica 1995 13 1/2 7-51 1657 Alizadeh,F. Physical Mapping of Ch.. Algorithmica 95 13(1/2):52-76 Alizadeh F; Karp RM; Weisser DK Physical Mapping of Chromosomes: A Combinatorial Problem in Molecular Biology Chromosome; Physical mapping; Mapping; Combinatorial; Physical UnCover SICI Code: 0178-4617(19950101)13:1:2L.52:PMCC;1- Algorithmica 1995 13 1/2 52-76 1658 Pevzner,P.A. DNA Physical Mapping a.. Algorithmica 95 13(1/2):77-105 Pevzner PA DNA Physical Mapping and Alternating Eulerian Cycles in Colored Graphs Physical mapping; Graph; DNA; USA; Mapping; Physical UnCover SICI Code: 0178-4617(19950101)13:1:2L.77:DPMA;1- Algorithmica 1995 13 1/2 77-105 1659 Chao,K.M. Linear-Space Algorithm.. Algorithmica 95 13(1/2):106-13 Chao KM; Miller W Linear-Space Algorithms that Build Local Alignments from Fragments Sequence alignment; Fragment; Algorithm UnCover SICI Code: 0178-4617(19950101)13:1:2L.106:LATB;1- Algorithmica 1995 13 1/2 106-134 1660 Pevzner,P.A. Multiple Filtration an.. Algorithmica 95 13(1/2):135-15 Pevzner PA; Waterman MS Multiple Filtration and Approximate Pattern Matching Pattern match; Approximate match; USA UnCover SICI Code: 0178-4617(19950101)13:1:2L.135:MFAP;1- Algorithmica 1995 13 1/2 135-154 1661 Farach,M. A Robust Model for Fin.. Algorithmica 95 13(1/2):155-17 Farach M; Kannan S; Warnow T A Robust Model for Finding Optimal Evolutionary Trees Evolutionary tree; Phylogeny; Model; USA; Optimal UnCover SICI Code: 0178-4617(19950101)13:1:2L.155:RMFO;1- Algorithmica 1995 13 1/2 155-179 1662 Kececioglu,J. Exact and Approximatio.. 
Algorithmica 95 13(1/2):180-21 Kececioglu J; Sankoff D Exact and Approximation Algorithms for Sorting by Reversals, with Application to Genome Rearrangement Genome; Rearrangement; Reversal; Algorithm; USA; Approximation UnCover SICI Code: 0178-4617(19950101)13:1:2L.180:EAAS;1- Algorithmica 1995 13 1/2 180-210 1663 de Rezende,P. Point Set Pattern Matc.. Algorithmica 95 13(4):387-403? de Rezende PJ; Lee DT Point Set Pattern Matching in d-Dimensions Pattern match; Multidimensional UnCover SICI Code: 0178-4617(19950401)13:4L.387:PSPM;1- Algorithmica 1995 13 4 387-403 1664 Crochemore,M. Squares, Cubes, and Ti.. Algorithmica 95 13(5):405-425 Crochemore M; Rytter W Squares, Cubes, and Time-Space Efficient String Searching String search; Square UnCover SICI Code: 0178-4617(19950501)13:5L.405:SCTE;1- Algorithmica 1995 13 5 405-425 1665 Knight,J.R. Approximate Regular Ex.. Algorithmica 95 14(1):85-121? Knight JR; Myers EW Approximate Regular Expression Pattern Matching with Concave Gap Penalties Pattern match; Gap; Language; Expression UnCover SICI Code: 0178-4617(19950701)14:1L.85:AREP;1- Algorithmica 1995 14 1 85-121 1666 Golumbic,M.C. On the Complexity of D.. Adv.Appl.Math. 94 15(3):251-??? Golumbic MC; Kaplan H; Shamir R On the Complexity of DNA Physical Mapping Physical mapping; Complexity; DNA; Mapping; Physical UnCover SICI Code: 0196-8858(19940901)15:3L.251:CDP;1- Adv Appl Math 1994 15 3 251-??? 1667 Amir,A. Efficient 2-Dimensiona.. Inform.Comput. 95 118(1):1-11 Amir A; Farach M Efficient 2-Dimensional Approximate Matching of Half-Rectangular Figures Pattern match; Multidimensional; Rectangular; Approximate match UnCover SICI Code: 0890-5401(19950401)118:1L.1:E2AM;1- Inform Comput 1995 118 1 1-11 1668 Karlin,S. Contrasts in Codon Usa.. J.Virol. 
90 64(9):4264-427 Karlin S; Blaisdell BE; Schachtel GA Contrasts in Codon Usage of Latent versus Productive Genes of Epstein-Barr Virus: Data and Hypotheses Codon; Gene; Sequence analysis; Bias; USA "Epstein-Barr virus (EBV) has two different modes of existence: latent and productive. ... It is shown that the EBV genes known to be expressed during latency display codon usage strikingly different from that of genes that are expressed during lytic growth. In particular, the percentage of S3 (G or C in codon site 3) is persistently lower (about 20%) in all latent genes than in nonlatent genes. ... Two principal explanations to account for the EBV latent versus productive gene codon disparity are proposed." J Virol 1990 64 9 4264-4273 1669 Karlin,S. Dinucleotide Relative .. Trends in Genet 95 11(7):283-??? Karlin S; Burge C Dinucleotide Relative Abundance Extremes: A Genomic Signature Genomic; k-tuple; Sequence analysis; USA; Signature UnCover SICI Code: 0168-9479(19950701)11:7L.283:DRAE;1- 27.07.95: not yet on shelf. Trends in Genetics 1995 11 7 283-??? 1670 Karlin,S. Measuring Residue Asso.. J.Mol.Biol. 94 239(2):227-248 Karlin S; Zuker M; Brocchieri L Measuring Residue Associations in Protein Structures. Possible Implications for Protein Folding. Protein; Folding; Residue; Association; Structure; Distance; USA "We propose a number of distance measures between residues in protein structures based on average, minimum and maximum distances of all atom (backbone and side-chain) coordinates or with respect to side-chain atom coordinates only. ... For each distance measure, averaging and normalizing over representative protein structures, association values and closeness orderings for all amino acid types are determined. The expected associations of side-chain interactions between oppositely charged residues, among hydrophobic residues and of cysteine with cysteine are confirmed." J Mol Biol 1994 239 2 227-248 1671 Karlin,S. Significant Similarity.. Mol.Biol.Evol. 
92 9(1):152-167 Karlin S; Brendel V; Bucher P Significant Similarity and Dissimilarity in Homologous Proteins Protein; Significance; Sequence comparison; Statistical; Similarity; USA "Common practice emphasizes significant sequence similarities between different members of protein families. These similarities presumably reflect on evolutionary conservation of structurally and functionally essential residues. The nonconserved regions, on the other hand, may be either selectively neutral or differentiated. We propose several distributional sequence statistics (e.g., clustering of charged residues, compositional biases, and repetitive patterns) as indicators of differentiation events." Mol Biol Evol 1992 9 1 152-167 1672 Karlin,S. Some Statistical Probl.. JASA 91 86(413):27-?? Karlin S; Macken C Some Statistical Problems in the Assessment of Inhomogeneities of DNA Sequence Data Statistical; DNA; Sequence analysis UnCover: MAR 01 1991 v 86 n 413 JASA 1991 86 413 27-?? 1673 Dembo,A. Critical Phenomena for.. Ann.Probab. 94 22(4):1993-??? Dembo A; Karlin S; Zeitouni O Critical Phenomena for Sequence Matching with Scoring Statistical; Scoring; Sequence match; USA UnCover SICI Code: 0091-1798(19941001)22:4L.1993:CPSM;1- Ann Probab 1994 22 4 1993-???? 1674 Dembo,A. Limit Distribution of .. Ann.Probab. 94 22(4):2022-??? Dembo A; Karlin S; Zeitouni O Limit Distribution of Maximal Non-aligned Two-sequence Segmental Score Statistical; Score; Segment; Pairwise comparison; USA; Distribution UnCover SICI Code: 0091-1798(19941001)22:4L.2022:LDMT;1- Ann Probab 1994 22 4 2022-???? 1675 Port,E. Genomic Mapping by End.. Genomics 95 26(1):84-100 Port E; Sun F; Martin D; Waterman MS Genomic Mapping by End-Characterized Random Clones: A Mathematical Analysis Genomic; Mapping; Clone; Physical; Fingerprint; USA "Physical maps can be constructed by 'fingerprinting' a large number of random clones and inferring overlap between clones when the fingerprints are sufficiently similar. E. 
Lander and M. Waterman (1988) gave a mathematical analysis of such mapping strategies. ... Recently it has been proposed that ends of clones rather than the entire clone be fingerprinted or characterized. Such fingerprints ... require a mathematical analysis deeper than that of Lander-Waterman. This paper studies clone islands, which can include uncharacterized regions, and also the islands that are formed entirely from the ends of clones." Genomics 1995 26 1 84-100 1676 Penner,R.C. Spaces of RNA Secondar.. Adv.Math. 93 101(1):31-49 Penner RC; Waterman MS Spaces of RNA Secondary Structures RNA; Secondary; Structure; Topology; USA "We prove two topological theorems in physical chemistry. ... In fact, our primary motivation here is to study secondary structures on RNA. This imposes the further restriction that there can be at most one base-pair supported at a given site of underlying linear macromolecule, and imposing this restriction leads to the class of 'binary macromolecules.' Our main results here assert the sphericity of certain topological spaces of both arbitrary and binary macromolecules, and it is the latter which we hope may have applications to RNA." Adv Math 1993 101 1 31-49 1677 Waterman,M.S. Designer Algorithms fo.. N.Z.J.Bot. 93 31(3):269-273 Waterman MS; von Haeseler A Designer Algorithms for Cryptogene Searches Gene; Database search; RNA; Dynamic programming; Algorithm; Editing; USA "RNA editing in the mitochondria of kinetoplastid protozoa describes the insertion and (or) deletion of precise numbers of uridines at precise locations in the transcribed RNA. Such genes are known as cryptogenes. We describe dynamic programming algorithms to search for unknown cryptogenes and for the sequences that template the editing, gRNAs." N Z J Bot 1993 31 3 269-273 1678 Waterman,M. Estimating Statistical.. 
Phil.Trans.R.So 94 344(1310):383- Waterman M Estimating Statistical Significance of Sequence Alignments Sequence alignment; Statistical; Significance; Pairwise comparison; Segment; USA "Algorithms that compare two proteins or DNA sequences and produce an alignment of the best matching segments are widely used in molecular biology. These algorithms produce scores that when comparing random sequences of length n grow proportional to n or to log(n) depending on the algorithm parameters. The Azuma-Hoeffding inequality gives an upper bound on the probability of large deviations of the score from its mean in the linear case. Poisson approximation can be applied in the logarithmic case." Phil Trans R Soc Lond Ser B 1994 344 1310 383-390 1679 Frank-Kamenet Fractality of DNA Texts J.Biomol.Struct 94 12(3):655-670 Frank-Kamenetskii MD; Borovik AS; Grosberg AY Fractality of DNA Texts Fractal; Sequence analysis; DNA UnCover SICI Code: 0739-1102(19941201)12:3L.655:FDT;1- J Biomol Struct & Dyn 1994 12 3 655-670 1680 Chelvanayagam Easy Adaptation of Pro.. Protein Eng. 94 7(2):173-184 Chelvanayagam G; Roy G; Argos P Easy Adaptation of Protein Structure to Sequence Protein; Structure; Sequence analysis UnCover SICI Code: 0269-2139(19940201)7:2L.173:EAPS;1- Protein Eng 1994 7 2 173-184 1681 Vriend,G. A Novel Search Method .. Protein Eng. 94 7(1):23-30 Vriend G; Sander C; Stouten PFW A Novel Search Method for Protein Sequence-Structure Relations using Property Profiles Protein; Sequence search; Structure; Profile UnCover SICI Code: 0269-2139(19940101)7:1L.23:NSMP;1- Protein Eng 1994 7 1 23-30 1682 Flores,T.P. An Algorithm for Autom.. Protein Eng. 94 7(1):31-38 Flores TP; Moss DS; Thornton JM An Algorithm for Automatically Generating Protein Topology Cartoons Protein; Topology; Structure; Algorithm UnCover SICI Code: 0269-2139(19940101)7:1L.31:AAGP;1- Protein Eng 1994 7 1 31-38 1683 Saqi,S.A.M. Identification of Sequ.. Protein Eng. 
94 7(2):165-172 Saqi SAM; Sternberg MJE Identification of Sequence Motifs from a Set of Proteins with Related Function Protein; Sequence analysis; Motif; Identification; Function UnCover SICI Code: 0269-2139(19940201)7:2L.165:ISMF;1- Protein Eng 1994 7 2 165-172 1684 Laughton,A.C. A Study of Simulated A.. Protein Eng. 94 7(2):235-242 Laughton AC A Study of Simulated Annealing Protocols for Use with Molecular Dynamics in Protein Structure Prediction Protein; Simulated annealing; Structure; Prediction; Dynamic UnCover SICI Code: 0269-2139(19940201)7:2L.235:SSAP;1- Protein Eng 1994 7 2 235-242 1685 Mao,B. Protein Folding Classe.. Protein Eng. 94 7(3):319-330 Mao B; Chou KC; Zhang CT Protein Folding Classes: A Geometric Interpretation of the Amino Acid Composition of Globular Proteins Protein; Folding; Amino acid; Composition; Geometry UnCover SICI Code: 0269-2139(19940301)7:3L.319:PFCG;1- Protein Eng 1994 7 3 319-330 1686 Attwood,T.K. PRINTS - A Protein Mot.. Protein Eng. 94 7(7):841-848 Attwood TK; Beck ME PRINTS - A Protein Motif Fingerprint Database Protein; Motif; Fingerprint; Database search UnCover SICI Code: 0269-2139(19940701)7:7L.841:PPMF;1- Protein Eng 1994 7 7 841-848 1687 Fidelis,K. Comparison of Systemat.. Protein Eng. 94 7(8):953-960 Fidelis K; Stern PS; Moult J Comparison of Systematic Search and Database Methods for Construction Segment of Protein Structure Protein; Structure; Database search; Segment; Systematics UnCover SICI Code: 0269-2139(19940801)7:8L.953:CSSD;1- Protein Eng 1994 7 8 953-960 1688 Lathrop,R.H. The Protein Threading .. Protein Eng. 94 7(9):1059-1068 Lathrop RH The Protein Threading Problem with Sequence Amino Acid Interaction Preferences is NP-complete Protein; Sequence proximity; Amino acid; Complexity UnCover SICI Code: 0269-2139(19940901)7:9L.1059:PTPW;1- Protein Eng 1994 7 9 1059-1068 1689 De Filippis,V Predicting Local Struc.. Protein Eng. 
94 7(10):1203-120 De Filippis V; Sander C; Vriend G Predicting Local Structural Changes that Result from Point Mutations Protein; Structure; Prediction UnCover SICI Code: 0269-2139(19941001)7:10L.1203:PLSC;1- Protein Eng 1994 7 10 1203-1208 1690 Shindyalov,I. Macromolecular Query L.. Protein Eng. 94 7(11):1311-132 Shindyalov IN; Chang W; Bourne PE Macromolecular Query Language (MMQL): Prototype Data Model and Implementation Protein; Model; Language; Query UnCover SICI Code: 0269-2139(19941101)7:11L.1311:MQL(;1- Protein Eng 1994 7 11 1311-1322 1691 Horimoto,K. A Simple Procedure for.. Protein Eng. 94 7(12):1433-144 Horimoto K; Yamamoto H; Otsuka J A Simple Procedure for Assigning a Sequence Motif with an Obscure Pattern: Application to the Basic/Helix-Loop-Helix Motif Protein; Sequence analysis; Motif UnCover SICI Code: 0269-2139(19941201)7:12L.1433:SPAS;1- Protein Eng 1994 7 12 1433-1440 1692 Zhu,Z.Y. A New Approach to the .. Protein Eng. 95 8(2):103-108 Zhu ZY A New Approach to the Evaluation of Protein Secondary Structure Predictions at the Level of the Elements of Secondary Structure Protein; Secondary; Structure; Prediction UnCover SICI Code: 0269-2139(19950201)8:2L.103:NAEP;1- Protein Eng 1995 8 2 103-108 1693 Milik,M. Neural Network System .. Protein Eng. 95 8(3):225-236 Milik M; Kolinski A; Skolnick J Neural Network System for the Evaluation of Side-chain Packing in Protein Structures Neural; Network; Protein; Structure UnCover SICI Code: 0269-2139(19950301)8:3L.225:NNSE;1- Protein Eng 1995 8 3 225-236 1694 Kolinski,A. Monte Carlo Simulation.. Proteins Struct 94 18(4):338-352 Kolinski A; Skolnick J Monte Carlo Simulations of Protein Folding. I. Lattice Model and Interaction Scheme Protein; Folding; Monte Carlo; Simulation; Model UnCover SICI Code: 0887-3585(1994)18:4L.338:MCSP;1- Proteins Struct Funct Genet 1994 18 4 338-352 1695 Rose,G.D. Protein Folding: Predi.. 
Proteins Struct 94 19(1):1-3 Rose GD; Creamer TP Protein Folding: Predicting Predicting Protein; Folding; Prediction UnCover SICI Code: 0887-3585(1994)19:1L.1:PFPP;1- Proteins Struct Funct Genet 1994 19 1 1-3 1696 Rost,B. Combining Evolutionary.. Proteins Struct 94 19(1):55-72 Rost B; Sander C Combining Evolutionary Information and Neural Networks to Predict Protein Secondary Structure Protein; Secondary; Structure; Evolution; Neural; Network UnCover SICI Code: 0887-3585(1994)19:1L.55:CEIN;1- Proteins Struct Funct Genet 1994 19 1 55-72 1697 Abagyan,R. Recognition of Distant.. Proteins Struct 94 19(2):132-140 Abagyan R; Frishman D; Argos P Recognition of Distantly Related Proteins Through Energy Calculations Protein; Recognition; Energy UnCover SICI Code: 0887-3585(1994)19:2L.132:RDRP;1- Proteins Struct Funct Genet 1994 19 2 132-140 1698 Holm,L. Searching Protein Stru.. Proteins Struct 94 19(3):165-173 Holm L; Sander C Searching Protein Structure Databases Has Come of Age Protein; Structure; Database search UnCover SICI Code: 0887-3585(1994)19:3L.165:SPSD;1- Proteins Struct Funct Genet 1994 19 3 165-173 1699 Holm,L. Parser for Protein Fol.. Proteins Struct 94 19(3):256-268 Holm L; Sander C Parser for Protein Folding Units Protein; Folding; Parser UnCover SICI Code: 0887-3585(1994)19:3L.256:PPFU;1- Proteins Struct Funct Genet 1994 19 3 256-268 1700 Chou,K.C. A Novel Approach to Pr.. Proteins Struct 95 21(4):319-344 Chou KC A Novel Approach to Predicting Protein Structural Classes in a (20-1) Amino Acid Composition Space Protein; Structure; Prediction; Amino acid; Composition UnCover SICI Code: 0887-3585(1995)21:4L.319:NAPP;1- Proteins Struct Funct Genet 1995 21 4 319-344 1701 Rost,B. Improved Prediction of.. 
Proc.Nat.Acad.S 93 90(16):7558-75 Rost B; Sander C Improved Prediction of Protein Secondary Structure by Use of Sequence Profiles and Neural Networks Neural; Secondary; Structure; Prediction; Protein; DE; Profile; Network "The explosive accumulation of protein sequences in the wake of large-scale sequencing projects is in stark contrast to the much slower experimental determination of protein structures. Improved methods of structure prediction from the gene sequence alone are therefore needed. Here, we report a substantial increase in both the accuracy and quality of secondary-structure predictions, using a neural-network algorithm. The main improvements come from the use of multiple sequence alignments (better overall accuracy), from "balanced training" (better prediction of beta-strands), and from "structure context training" (better prediction of helix and strand lengths)." Proc Nat Acad Sci USA 1993 90 16 7558-7562 1702 Benson,G. A Space Efficient Algo.. Theoret.Comput. 95 145:357-369 Benson G A Space Efficient Algorithm for Finding the Best Nonoverlapping Alignment Score Sequence analysis; Sequence alignment; Repeat; Score; Algorithm; USA "Repeating patterns make up a significant fraction of DNA and protein molecules. ... In this paper, we present a space efficient algorithm for finding the maximum alignment score for any two substrings of a single string T under the condition that the substrings do not overlap. In a biological context, this corresponds to the largest repeating region in the molecule. The algorithm runs in O(n**2 log**2 n) time and uses only O(n**2) space." Theoret Comput Sci 1995 145 357-369 1703 Brenner,S.E. Network Sequence Retri.. Trends in Genet 95 11(6):247-248 Brenner SE Network Sequence Retrieval Sequence search; Database search; Electronic mail; World Wide Web; Retrieval; UK; Network "Retrieving DNA and protein sequences from a database is one of the common computer tasks for molecular biologists and should be one of the simplest. 
... But for scientists who wish to spend their research time at the bench and not at the computer, even the trouble of obtaining current versions of the software, installing them and learning about them can be a distressingly large time investment. ... A World Wide Web (WWW) client can provide a one-piece solution. ... Time spent learning how to use the WWW is a good investment." Trends in Genetics 1995 11 6 247-248 1704 Brown,N.P. Identification and Ana.. J.Mol.Biol. 95 249:342-359 Brown NP; Whittaker AJ; Newell WR; Rawlings CJ; Beck S Identification and Analysis of Multigene Families by Comparison of Exon Fingerprints Gene; Sequence alignment; Sequence comparison; Dynamic programming; Fingerprint; Exon; Genomic; UK; Identification "Gene families are often recognised by sequence homology using similarity searching to find relationships, however, genomic sequence data provides gene architectural information not used by conventional search methods. ... A fast search technique capable of detecting possible weak sequence homologies apparent at the intron/exon level of gene organization is presented for comparing spliceosomal genes and gene fragments." J Mol Biol 1995 249 342-359 1705 Charleston,M. Neighbor-Joining Uses .. Mol.Phylogenet. 93 2(1):6-12 Charleston MA; Hendy MD; Penny D Neighbor-Joining Uses the Optimal Weight for Net Divergence Phylogenetic; Neighbor joining; Distance; NZ; Optimal; Divergence "A class of phylogenetic clustering methods which calculate net divergences from distance data, but assign differing weights to the net divergences, is defined. ... The accuracy of some of these methods is studied by computer simulation for the case of four taxa under the additive tree hypothesis. Of these methods and under this hypothesis, it is proved that Neighbor-Joining uses the only weighting for net divergence which is consistent, so that it is the only method in the class which is expected to converge to the correct tree as more data are added." 
Mol Phylogenet Evol 1993 2 1 6-12 1706 Yang,Z. Evaluation of Several .. J.Mol.Evol. 95 40:689-697 Yang Z Evaluation of Several Methods for Estimating Phylogenetic Trees when Substitution Rates Differ over Nucleotide Sites Phylogenetic; Evolutionary tree; Substitution; Rate; Nucleotide; UK "Several maximum likelihood and distance matrix methods for estimating phylogenetic trees from homologous DNA sequences were compared when substitution rates at sites were assumed to follow a gamma distribution. Computer simulations were performed to estimate the probabilities that various tree estimation methods recover the true tree topology. The case of four species was considered, and a few combinations of parameters were examined. Attention was applied to discriminating among different sources of error in tree reconstruction ...." J Mol Evol 1995 40 689-697 1707 Zhang,L. On the Approximation o.. Theoret.Comput. 95 143:353-362 Zhang L On the Approximation of Longest Common Nonsupersequences and Shortest Common Nonsubsequences Longest common; Shortest common; Subsequence; Supersequence; CA; Approximation; Nonsubsequence "The longest common nonsupersequence (LCNS) problem is shown to be NP-complete over the binary alphabet, and Max SNP-hard, in general. Although it is open whether this problem and the shortest common nonsubsequence problem are Max SNP-hard over the binary alphabet, we show that their generalizations (the mixed supersequence and the mixed subsequence problems) indeed remain Max SNP-hard over the binary alphabet." Theoret Comput Sci 1995 143 353-362 1708 Gatesy,J. Alignment-Ambiguous Nu.. Mol.Phylogenet. 93 2(2):152-157 Gatesy J; DeSalle R; Wheeler W Alignment-Ambiguous Nucleotide Sites and the Exclusion of Systematic Data Phylogenetic; Indel; Sequence alignment; Multiple alignment; DNA; USA; Nucleotide; Systematics "Molecular systematists generally rely on computer algorithms to establish the alignment of DNA sequences. 
However, when alignment regions are characterized by multiple insertions and deletions, these gap-filled stretches of DNA are often excised before phylogenetic reconstruction. This exclusion of systematic data is generally determined by subjective criteria. We explore a replicable methodology in which the comparison of several multiple sequence alignments can be used to eliminate regions of unstable sequence alignment." Mol Phylogenet Evol 1993 2 2 152-157 1709 Gu,X. Maximum Likelihood Est.. Mol.Biol.Evol. 95 12(4):546-557 Gu X; Fu YX; Li WH Maximum Likelihood Estimation of the Heterogeneity of Substitution Rate among Nucleotide Sites Likelihood; Estimation; Substitution; Rate; Nucleotide; Distance; USA "This paper presents a maximum likelihood approach to estimating the variation of substitution rate among nucleotide sites. We assume that the rate varies among sites according to an invariant+gamma distribution, which has two parameters: the gamma parameter alpha and the proportion of invariable sites theta. Theoretical treatments on three, four, and five sequences have been conducted, and computer programs have been developed. ... Extensive simulations show that ...." Mol Biol Evol 1995 12 4 546-557 1710 Harper,R. World Wide Web Resourc.. Trends in Genet 95 11(6):223-228 Harper R World Wide Web Resources for the Biologist WWW; Network; Electronic mail; Internet; UK; World Wide Web "The World Wide Web is currently the major networking resource for biologists. It has passed Gopher and simple electronic mail (email) servers in popularity. In the 1990's, the advent of client-server software will be the main driving force in bioinformatics. During the past few years, biologists have used the Internet increasingly to distribute data, and the methods of doing this have become more and more sophisticated as the speed with which network links can be made has increased." Trends in Genetics 1995 11 6 223-228 1711 Hasegawa,M. Relative Efficiencies .. Mol.Phylogenet. 
93 2(1):1-5 Hasegawa M; Fujiwara M Relative Efficiencies of the Maximum Likelihood, Maximum Parsimony, and Neighbor-Joining Methods for Estimating Protein Phylogeny Likelihood; Parsimony; Neighbor joining; Protein; Phylogeny; Simulation; JP "The relative efficiencies of the maximum likelihood (ML), maximum parsimony (MP), and neighbor-joining (NJ) methods for protein phylogeny in obtaining the correct tree topology were studied by using computer simulation. Furthermore, the robustness of the methods against departures from the assumed underlying model was studied." Mol Phylogenet Evol 1993 2 1 1-5 1712 Huynen,M.A. Pattern Generation in .. J.Mol.Evol. 94 39:71-79 Huynen MA; Hogeweg P Pattern Generation in Molecular Evolution: Exploitation of the Variation in RNA Landscapes RNA; Pattern definition; Evolution; Secondary; Structure; Simulation; Statistical; NL "Evolution of RNA secondary structure is studied using simulation techniques and statistical analysis of fitness landscapes. The transition from RNA sequence to RNA secondary structure leads to fitness landscapes that have local variations in their 'ruggedness.' Evolution exploits these variations. In stable environments it moves the quasispecies toward relatively 'flat' peaks, where not only the master sequence but also its mutants have a high fitness. In a rapidly changing environment, the situation is reversed: evolution moves the quasispecies to a region where the correlation between secondary structures of 'neighboring' RNA sequences is relatively low." J Mol Evol 1994 39 71-79 1713 Jiang,T. Shortest Consistent Su.. Theoret.Comput. 
95 143:113-122 Jiang T; Timkovsky VG Shortest Consistent Superstrings Computable in Polynomial Time Shortest consistent; Shortest common; Supersequence; Complexity; Sequencing; Hybridization; CA "The shortest consistent superstring problem is, given a set of positive strings and a set of negative strings, finding a shortest string including every positive string and no negative string as a substring. The problem is NP-hard and arises in DNA sequencing by hybridization. It is also an extension of the well-known shortest common superstring problem which corresponds to the case when the set of negative strings is empty. In this paper we show that a shortest consistent superstring can be found in polynomial time if (i) a longest common nonsuperstring for the set of negative strings exists or (ii) the number of positive strings is bounded and every symbol of the alphabet appears at the end of some negative string." Theoret Comput Sci 1995 143 113-122 1714 Karlin,S. Correlation Analysis o.. Proc.Nat.Acad.S 92 89:12165-12169 Karlin S; Bucher P Correlation Analysis of Amino Acid Usage in Protein Classes Correlation; Protein; Amino acid; Viral; Genome; USA "We present a comparative study of residue usage correlations of various organism protein sets of diverse phylogenetic species and of open reading frames of several large human viral genomes. Our correlation analysis reveals three major tendencies: .... Discussion and speculations relate amino acid usage correlations to protein function/structure, cellular localization, proximity in amino acid biosynthetic pathways, amino acid relative abundances, tRNA and aminoacyl synthetase availabilities, and evolutionary processes." Proc Nat Acad Sci USA 1992 89 12165-12169 1715 May,A.C.W. The Recognition of Pro.. 
Phil.Trans.R.So 94 344:373-381 May ACW; Johnson MS; Rufino SD; Wako H; Zhu ZY; Sowdhamini R; Srinivasan N; Rodionov MA; Blundell TL The Recognition of Protein Structure and Function from Sequence: Adding Value to Genome Data Protein; Structure; Function; Sequence analysis; Genome; UK; Recognition "The explosion of DNA sequence data from genome projects presents many challenges. For instance, we must extend our current knowledge of protein structure and function so that it can be applied to these new sequences. The derivation of rules for the relationships between sequence and structure allow us to recognize a common fold by the use of tertiary templates. New techniques enable us to begin to meet the challenge of rule-based modelling of distantly related proteins. This paper describes an integrated and knowledge-based approach to the prediction of protein structure and function which can maximize the value of sequence information." Phil Trans R Soc Lond Ser B 1994 344 373-381 1716 Middendorf,M. On Finding Minimal, Ma.. Theoret.Comput. 95 145:317-327 Middendorf M On Finding Minimal, Maximal, and Consistent Sequences over a Binary Alphabet Longest common; Subsequence; Shortest common; Supersequence; Complexity; DE "In this paper we investigate the complexity of finding various kinds of common super- and subsequences with respect to one or two given sets of strings. We show that Longest Minimal Common Supersequence, Shortest Maximal Common Subsequence, and Shortest Maximal Common Non-Supersequence are MAX SNP-hard over a binary alphabet. ... We show how these problems can be related to finding sequences consistent with respect to two given sets of strings. This leads to a unified approach for characterizing the complexity of such problems." Theoret Comput Sci 1995 145 317-327 1717 Miramontes,P. Structural and Thermod.. J.Mol.Evol. 
95 40:698-704 Miramontes P; Medrano L; Cerpa C; Cedergren R; Ferbeyre G; Cocho G Structural and Thermodynamic Properties of DNA Uncover Different Evolutionary Histories DNA; Structure; Thermodynamic; Evolutionary tree; Genomic; CA "We propose an index of DNA homogeneity (IDH) based on a binary distribution model that quantifies structural and thermodynamic aggregates present in DNA primary structures. Extensive analysis of sequence databases with the IDH uncovers significant constraints on DNA sequence other than those derived from codon usage or protein function. This index clearly distinguishes between organisms of different evolutive origins and places them in disjoint domains of DNA sequence space." J Mol Evol 1995 40 698-704 1718 Pesole,G. A Statistical Method f.. Mol.Phylogenet. 92 1(2):91-96 Pesole G; Attimonelli M; Preparata G; Saccone C A Statistical Method for Detecting Regions with Different Evolutionary Dynamics in Multialigned Sequences Statistical; Region; Evolution; Multiple alignment; Sequence alignment; Stochastic; Gene; Italy; Dynamic "We describe a stochastic method for tracing the evolutionary pattern of multialigned sequences. This method allows us to detect gene regions with distinct evolutionary dynamics, e.g., regions that significantly deviate from the expected behavior. Accurate detection of hypervariable or hyperconstrained regions may provide useful information on the structure/function relationship of biosequences. This information can help localize functional constraints. In addition, the selection of distinct evolutionary dynamics may assist in the correct use of biosequences as reliable molecular clocks." Mol Phylogenet Evol 1992 1 2 91-96 1719 Ragan,M.A. Phylogenetic Inference.. Mol.Phylogenet. 
92 1(1):53-58 Ragan MA Phylogenetic Inference Based on Matrix Representation of Trees Phylogenetic; Evolutionary tree; Consensus tree; CA; Matrix; Representation "Rooted phylogenetic trees can be represented as matrices in which the rows correspond to termini, and columns correspond to internal nodes .... Parsimony analysis of such a matrix will fully recover the topology of the original tree. The maximum size of the represented matrix depends only on the number of termini in the tree .... Representations of multiple trees ... can readily be combined into a single matrix .... Parsimony analysis of the resulting composite matrix yields a hybrid supertree which typically provides greater resolution than conventional consensus trees." Mol Phylogenet Evol 1992 1 1 53-58 1720 Schoniger,M. A Stochastic Model for.. Mol.Phylogenet. 94 3(3):240-247 Schoniger M; von Haeseler A A Stochastic Model for the Evolution of Autocorrelated DNA Sequences Stochastic; Model; Evolution; DNA; Substitution; DE "Currently used stochastic models of DNA sequence evolution assume independent and identically distributed nucleotide sites. They are too simple to account for dependence structures obviously present in molecular data. Up to now more realistic stochastic models for nucleotide substitutions have been considered intractable. In this paper a procedure that accounts for non-overlapping correlations among pairs of sites of a DNA sequence is developed. We show that currently used models that ignore correlated sites underestimate distances inferred from observed sequence dissimilarities." Mol Phylogenet Evol 1994 3 3 240-247 1721 Steel,M. A Frequency-Dependent .. Mol.Phylogenet. 
95 4(1):64-71 Steel M; Lockhart PJ; Penny D A Frequency-Dependent Significance Test for Parsimony Evolutionary tree; Parsimony; Significance; Statistical; Sequence comparison; NZ "We describe techniques for assessing evolutionary trees constructed by the parsimony criteria, when sequences exhibit irregular base compositions. In particular, we extend a recently described frequency-dependent significance test to handle any number of taxa and describe a modification of the Kishino-Hasegawa sites test. These modifications are useful for detecting historical signals beyond those patterns which arise purely from irregular base compositions between the compared sequences. ... We also describe how the techniques can be modified to determine how 'tree-like' data are, given independent variation in the base frequencies." Mol Phylogenet Evol 1995 4 1 64-71 1722 van Batenburg An APL-programmed Gene.. J.Theor.Biol. 95 174:269-280 van Batenburg FHD; Gultyaev AP; Pleij CWA An APL-programmed Genetic Algorithm for the Prediction of RNA Secondary Structure Genetic; Algorithm; Prediction; RNA; Secondary; Structure; NL "The possibilities of using a genetic algorithm for the prediction of RNA secondary structure were investigated. The algorithm, using the procedure of stepwise selection of the most fit structures (similarly to natural evolution), allows different models of fitness or driving forces determining RNA structure to be easily introduced. This can be used for simulation of the RNA folding process and for the investigation of possible folding pathways. Such an algorithm needs several modifications before it can predict RNA secondary structures. After modification, a fair number of correct stems are predicted, even when using computationally quick, but very crude, fitness criteria ...." J Theor Biol 1995 174 269-280 1723 Vogt,G. An Assessment of Amino.. J.Mol.Biol. 
95 249:816-831 Vogt G; Etzold T; Argos P An Assessment of Amino Acid Exchange Matrices in Aligning Protein Sequences: The Twilight Zone Revisited Amino acid; Protein; Sequence alignment; Residue; Gap; Substitution; Score; Matrix; DE "The sensitivity of most protein sequence alignment methods depends strongly on the quality of the comparison matrices used. These matrices, which assign weights or similarity scores to every possible amino acid substitution pair, are utilized to differentiate amongst the various possible alignments of two or more sequences. There are many ways to generate these exchange weights and new matrices are constantly published. There has been no overall assessment of these various matrices when applied in different alignment techniques and over many protein folds and families, both close and distant and with the use of several gap penalty values. In this work, a set of amino acid sequences matched by superposition of known protein tertiary topologies is used to test the alignment accuracy of the different method/matrix/penalty combinations." J Mol Biol 1995 249 816-831 1724 Wheeler,W.C. Elision: A Method for .. Mol.Phylogenet. 95 4(1):1-9 Wheeler WC; Gatesy J; DeSalle R Elision: A Method for Accommodating Multiple Molecular Sequence Alignments with Alignment-Ambiguous Sites Sequence alignment; Multiple alignment; Character weight; Consensus alignment; Phylogenetic; USA "Multiple alignments are frequently nonunique. Two sources of these multiple alignments are analysis based on different sets of alignment parameter values ... and nonunique equally costly alignments based on a single set of alignment parameters. By 'eliding' these individual alignments into a single grand alignment, phylogeny that is weighted toward those positions that align more consistently can be reconstructed. Positions that show greater variation among alignments will be relatively downweighted. 
The technique results in a weighting procedure that is a posteriori and based on the evidence established from the original sequence alignments." Mol Phylogenet Evol 1995 4 1 1-9 1725 Yang,Z. Estimating the Pattern.. J.Mol.Evol. 94 39:105-111 Yang Z Estimating the Pattern of Nucleotide Substitution Nucleotide; Substitution; Model; Markov; Likelihood; DNA; Phylogenetic; UK "In this paper a model-based maximum likelihood approach is proposed for estimating substitution patterns in real sequences. Nucleotide substitution is assumed to follow a homogeneous Markov process, and the general reversible process model (REV) and the unrestricted model without the reversibility assumption are used. These models are also applied to examine the adequacy of the [HKY85] model of Hasegawa, Kishino and Yano (1985). ... It is concluded that the use of the REV model in phylogenetic analysis can be recommended, especially for large data sets or for sequences with extreme substitution patterns, while HKY85 may be expected to provide a good approximation." J Mol Evol 1994 39 105-111 1726 Hart,W.E. Fast Protein Folding i.. ACM Sympos.Theo 95 27:157-168 Hart WE; Istrail S Fast Protein Folding in the Hydrophobic-hydrophilic Model Within Three-eighths of Optimal (Extended Abstract) Protein; Folding; Complexity; USA; Model; Optimal "We present performance-guaranteed approximation algorithms for the protein folding problem in the hydrophobic-hydrophilic model, Dill (1985). ... The protein is modeled as a chain of amino acids of length n which are of two types: H (hydrophobic, i.e., nonpolar) and P (hydrophilic, i.e., polar). ... Our algorithms have linear (3n) time and achieve a three-dimensional protein conformation that has a guaranteed free energy within 3/8 of optimal. ... Equally important, the folding pathway and final conformations of our algorithms are biologically plausible." ACM Sympos Theory Comput 1995 27 157-168 1727 Kosaraju,S.R. Large-Scale Assembly o.. 
ACM Sympos.Theo 95 27:169-177 Kosaraju SR; Delcher AL Large-Scale Assembly of DNA Strings and Space-Efficient Construction of Suffix Trees Supersequence; Sequence assembly; Shortest common; Suffix; Complexity; Ancestor; USA; DNA "We consider the problem of assembling a given set of DNA strings into a small set of strings. A simple version of this problem is known as the superstring problem. ... We first give a linear-time algorithm for the greedy heuristic to construct a superstring. We then generalize the problem to several DNA string assembly problems and develop greedy implementations for them. We also describe efficient algorithms to compute the suffix tree for strings over unbounded alphabets and to compute nearest common ancestors in trees." ACM Sympos Theory Comput 1995 27 169-177 1728 Hannenhalli,S Transforming Cabbage i.. ACM Sympos.Theo 95 27:178-189 Hannenhalli S; Pevzner P Transforming Cabbage into Turnip (Polynomial Algorithm for Sorting Signed Permutations by Reversals) Complexity; Reversal; Sort; Signed; Permutation; Genome; USA; Algorithm "Analysis of genomes evolving by inversions leads to a combinatorial problem of sorting by reversals studied in detail recently. ... We study sorting of signed permutations by reversals, a problem which adequately models rearrangements in small genomes like chloroplast or mitochondrial DNA. The previously suggested performance guarantee algorithms for sorting signed permutations by reversals approximate the reversal distance between permutations with an astonishing accuracy for both simulated and biological data. We prove a duality theorem explaining this intriguing performance and show that there exists a 'hidden' parameter which allows one to efficiently compute the reversal distance between signed permutations." ACM Sympos Theory Comput 1995 27 178-189 1729 Ferragina,P. A Fully-Dynamic Data S.. 
ACM Sympos.Theo 95 27:693-702 Ferragina P; Grossi R A Fully-Dynamic Data Structure for External Substring Search Data structure; String search; Dynamic; Pattern search; Suffix; Italy; Structure "We address the issue of efficiently searching on external dynamic data structures for strings, introducing the External Dynamic Substring Search [EDSS] problem. ... We introduce the SB-Tree data structure for [a set of external text strings], which is the first fully-dynamic data structure allowing the EDSS problem to be solved with provably good worst-case and amortized I/O bounds. ... In this paper, we address the issue of efficiently finding all the occurrences of a pattern string as a substring of many text strings." ACM Sympos Theory Comput 1995 27 693-702 1730 Farach,M. String Matching in Lem.. ACM Sympos.Theo 95 27:703-712 Farach M; Thorup M String Matching in Lempel-Ziv Compressed Strings String match; Compression; Lempel-Ziv; USA "String matching and Compression are two widely studied areas of computer science. ... Data structures from string matching can be used to derive fast implementations of many important compression schemes, most notably the Lempel-Ziv (LZ1) algorithm. ... The Compressed Matching Problem is that of performing string matching in a compressed text, without uncompressing it. ... In this paper, we give the first non-trivial compressed matching algorithm for the classic compression scheme, the LZ1 algorithm." ACM Sympos Theory Comput 1995 27 703-712 1731 Czumaj,A. Work-Time-Optimal Para.. ACM Sympos.Theo 95 27:713-722 Czumaj A; Galil Z; Gasieniec L; Park K; Plandowski W Work-Time-Optimal Parallel Algorithms for String Problems Parallel; Algorithm; String match; Pattern match; Regularities; Palindrome; Square; PO "A parallel algorithm is work-optimal if it uses the smallest possible work; a work-optimal algorithm is work-time-optimal if it also uses the smallest possible time. 
We design work-time-optimal algorithms for a number of string processing problems on the EREW-PRAM and the hypercube. They include string matching and two dimensional pattern matching." ACM Sympos Theory Comput 1995 27 713-722 1732 Vivarelli,F. LGANN: A Parallel Syst.. Comput.Appl.Bio 95 11(3):253-260 Vivarelli F; Giusti G; Villani M; Campanini R; Fariselli P; Compiani M; Casadio R LGANN: A Parallel System Combining a Local Genetic Algorithm and Neural Networks for the Prediction of Secondary Structure of Proteins Parallel; Genetic; Algorithm; Neural; Network; Prediction; Secondary; Structure; Protein; Italy "In this work we describe a parallel system consisting of feed-forward neural networks supervised by a local genetic algorithm. The system is implemented in a transputer architecture and is used to predict the secondary structures of globular proteins. This method allows a wide search in the parameter space of the neural networks and the determination of their optimal topology for the predictive task. Different neural network topologies are selected by the genetic algorithm on the basis of minimal values of mean square errors on the testing set." Comput Appl Biosci 1995 11 3 253-260 1733 Cantalloube,H Automat and BLAST: Com.. Comput.Appl.Bio 95 11(3):261-272 Cantalloube H; Labesse G; Chomilier J; Nahum C; Cho YY; Chams V; Achour A; Lachgar A; Mbika JP; Issing W; Mornon JP; Bizzini B; Zagury D; Zagury JF Automat and BLAST: Comparison of Two Protein Sequence Similarity Search Programs Protein; Sequence search; Similarity; BLAST; FR; Program "Since the early 1980s, protein/DNA sequence similarity search has become of major importance to biologists, and the need for fast and efficient tools grows with the size of databanks. Two programs use the strategy of finite state deterministic automatons to accomplish these searches. One of these two is BLAST, which is now widely used, and the other Automat, which has just been published. 
The differences and similarities in their basic principles, their use and their performances are analysed in this paper in order to allow optimal use of these important softwares." Comput Appl Biosci 1995 11 3 261-272 1734 Fondrat,C. A Rapid Access Motif D.. Comput.Appl.Bio 95 11(3):273-279 Fondrat C; Dessen P A Rapid Access Motif Database (RAMdb) with a Search Algorithm for the Retrieval Patterns in Nucleic Acids or Protein Databanks Databank; Motif; Database search; Retrieval; Pattern search; Nucleic acid; Protein; FR; Algorithm "We present here a codification structure, entirely interfaced with the main packages for biomolecule database management, associated with a new search algorithm to retrieve quickly a sequence in a database. This system is derived from a method previously proposed for homology search in databanks with a preprocessed codification of an entire database in which all the overlapping subsequences of a specific length in a sequence were converted into a code and stored in a hash-coding file. This new algorithm is designed for an improved use of the codification." Comput Appl Biosci 1995 11 3 273-279 1735 Bansal,M. NUPARM and NUCGEN: Sof.. Comput.Appl.Bio 95 11(3):281-287 Bansal M; Bhattacharyya D; Ravi B NUPARM and NUCGEN: Software for Analysis and Generation of Sequence Dependent Nucleic Acid Structures DNA; Structure; Nucleic acid; Geometry; RNA; India "Software packages NUPARM and NUCGEN are described, which can be used to understand sequence directed structural variations in nucleic acids, by analysis and generation of non-uniform structures. A set of local inter basepair parameters ... have been defined, which use geometry and coordinates of two successive basepairs only and can be used to generate polymeric structures with varying geometries for each of the 16 possible dinucleotide steps. ... NUPARM can be used to analyse both DNA and RNA structures, with single as well as double stranded helices." 
Comput Appl Biosci 1995 11 3 281-287 1736 Trelles-Salaz An Image-processing Ap.. Comput.Appl.Bio 95 11(3):301-308 Trelles-Salazar O; Zapata EL; Dopazo J; Coulson AFW; Carazo JM An Image-processing Approach to Dotplots: An X-Window-based Program for Interactive Analysis of Dotplots Derived from Sequence and Structural Data Dot; Sequence comparison; Sequence analysis; Structure; SP; Program "We present an approach to the study of the relationships between biological sequences and structures applying image analysis methods to dotplots. We introduce a set of analytical tools based on different types of digital image-processing filters that are new within the context of dotplots. We have reformulated some of the usual approaches in dotplot analysis as mathematical operations on images within the framework of mathematical morphology. An X-Window-based implementation of this new approach has been developed and is available by anonymous FTP." Comput Appl Biosci 1995 11 3 301-308 1737 Reczko,M. A Parallel Neural Netw.. Comput.Appl.Bio 95 11(3):309-315 Reczko M; Hatzigeorgiou A; Mache N; Zell A; Suhai S A Parallel Neural Network Simulator on the Connection Machine CM-5 Parallel; Neural; Network; Simulation; Pattern discovery; Prediction; DE "We here present a parallel implementation of artificial neural networks on the connection machine CM-5 and compare it with other parallel implementations on SIMD and MIMD architectures. This parallel implementation was developed with the goal of efficiently training large neural networks with huge training pattern sets for applications in molecular biology, in particular the prediction of coding regions in DNA sequences." Comput Appl Biosci 1995 11 3 309-315 1738 Singh,G.B. DNAView: A Quality Ass.. 
Comput.Appl.Bio 95 11(3):317-319 Singh GB; Krawetz SA DNAView: A Quality Assessment Tool for the Visualization of Large Sequenced Regions Sequence analysis; Display; Region; Graphic; Nucleic acid; DNA; USA "This communication describes DNAView, a graphical tool for the visualization and printing of large nucleic acid sequences. DNAView uses color coding to compactly display genomic segments of up to 100kb on a single printed page. The specific color schemes integrated into DNAView can highlight 'local aggregate' properties of large segments of DNA. We have also incorporated a confidence expression for the assigned sequence. This is represented by base color intensity that is proportional to the number of times that base was sequenced." Comput Appl Biosci 1995 11 3 317-319 1739 Mironov,A.A. DNASUN: A Package of C.. Comput.Appl.Bio 95 11(3):331-335 Mironov AA; Alexandrov NN; Bogodarova NY; Grigorjev A; Lebedev VF; Lunovskaya LV; Truchan ME; Pevzner PA DNASUN: A Package of Computer Programs for the Biotechnology Laboratory DNA; Gene; Program; Sequence analysis; Sequence alignment; Physical mapping; Sequencing; Protein; Nucleotide; RU "The paper describes a new software package DNASUN developed for supporting gene engineering laboratories. The package provides a user-friendly interface for experimental researches and supports the traditional nucleotide/protein sequence analysis as well as physical mapping, sequencing, plasmid manipulations, optimal oligonucleotide probe selection and other common molecular biology procedures." Comput Appl Biosci 1995 11 3 331-335 1740 Ferran,E.A. A Hybrid Method to Clu.. 
Comput.Appl.Bio 93 9(6):671-680 Ferran EA; Pflugfelder B A Hybrid Method to Cluster Protein Sequences based on Statistics and Artificial Neural Networks Clustering; Protein; Sequence analysis; Statistical; Neural; FR; Network "We have recently proposed a method, based on artificial neural networks (ANNs) to cluster protein sequences into families according to their degree of sequence similarity. The network was trained with an unsupervised learning algorithm, using, as inputs, matrix patterns derived from the bipeptide composition of the protein sequences. We describe here some further improvements to that approach. ... Finally, we propose a new hybrid method of the statistical and ANN approaches, in which the results of the statistical method are used to choose the number of neurons and inputs of the network." Comput Appl Biosci 1993 9 6 671-680 1741 Kumar,S. MEGA: Molecular Evolut.. Comput.Appl.Bio 94 10(2):189-191 Kumar S; Tamura K; Nei M MEGA: Molecular Evolutionary Genetics Analysis Software for Microcomputers Evolution; Genetic; Evolutionary distance; Phylogenetic; Statistical; UPGMA; Parsimony; Neighbor joining; USA "A computer program package called MEGA has been developed for estimating evolutionary distances, reconstructing phylogenetic trees and computing basic statistical quantities from molecular data. ... In this program, various methods for estimating evolutionary distances from nucleotide and amino acid sequence data, three different methods of phylogenetic inference (UPGMA, neighbor-joining and maximum parsimony) and two statistical tests of topological differences are included." Comput Appl Biosci 1994 10 2 189-191 1742 DeSalle,R. Implications of Ancien.. Experientia 94 50(6):543-550 DeSalle R Implications of Ancient DNA for Phylogenetic Studies DNA; Ancient; Phylogenetic; Review; Cladistic; USA "The utility of DNA sequence characters from fossil specimens is examined from a phylogenetic perspective. 
Four ways that fossil characters can alter phylogenetic hypotheses are discussed. ... Fossil DNA sequences as characters will be affected by the problem of missing data and missing taxa. In general, cladogram accuracy will be more greatly affected by missing taxa and cladogram resolution will be affected more acutely by missing data. Due to these points, an examination of the importance of the phylogenetic question being addressed, the utility of the fossil DNA sequences and the rarity of the fossil should be considered before damage of a fossil is undertaken." Experientia 1994 50 6 543-550 1743 Archakov,A.I. Structural Classificat.. Biochem.Mol.Bio 93 31(6):1071-108 Archakov AI; Degtyare? KN Structural Classification of the P450 Superfamily based on Consensus Sequence Comparison Protein; Structure; Classification; Consensus sequence; Sequence comparison; Superfamily CSNA Service 23(1994). Biochem Mol Biol Intl 1993 31 6 1071-1080 1744 Pietrokovski, Comparing Nucleotide a.. J.Biotechnol. 94 35(2/3):257-27 Pietrokovski S Comparing Nucleotide and Protein Sequences by Linguistic Methods Nucleotide; Protein; Sequence comparison; Linguistic CSNA Service 23(1994). J Biotechnol 1994 35 2/3 257-272 1745 Marshall,C.R. Dollo's Law and the De.. Proc.Nat.Acad.S 94 91(25):12283-1 Marshall CR; Raff EC; Raff RA Dollo's Law and the Death and Resurrection of Genes Dollo; Gene; Genome; Evolution; Genetic; Protein; USA "Dollo's law, the concept that evolution is not substantively reversible, implies that the degradation of genetic information is sufficiently fast that genes or developmental pathways released from selective pressure will rapidly become nonfunctional. 
Using empirical data to assess the rate of loss of coding information in genes for proteins with varying degrees of tolerance to mutational change, we show that, in fact, there is a significant probability over evolutionary time scales of 0.5-6 million years for successful reactivation of silenced genes or 'lost' developmental programs. Conversely, the reactivation of long (>10 million years)-unexpressed genes and dormant developmental pathways is not possible unless function is maintained by other selective constraints ...." Proc Nat Acad Sci USA 1994 91 25 12283-12287 1746 Schuster,P. From Sequences to Shap.. Proc.R.Soc.Lond 94 255(1344):279- Schuster P; Fontana W; Stadler PF; Hofacker IL From Sequences to Shapes and Back: A Case-Study in RNA Secondary Structures RNA; Secondary; Structure; Folding; Sequence analysis; USA "RNA folding is viewed here as a map assigning secondary structures to sequences. ... By using an algorithm for inverse folding, we show that sequences sharing the same structure are distributed randomly over sequence space. All common structures can be accessed from an arbitrary sequence by a number of mutations much smaller than the chain length. ... Implications for evolutionary adaptation and for applied molecular evolution are evident: finding a particular structure by mutation and selection is much simpler than expected and, even if catalytic activity should turn out to be sparse in the space of RNA structures, it can hardly be missed by evolutionary processes." Proc R Soc Lond Ser B 1994 255 1344 279-284 1747 Olmstead,R.G. Chloroplast DNA System.. Am.J.Bot. 94 81(9):1205-122 Olmstead RG; Palmer JD Chloroplast DNA Systematics: A Review of Methods and Data Analysis DNA; Chloroplast; Systematics; Review; Restriction; Mapping; Sequence comparison; Phylogenetic; USA "The field of plant molecular systematics is expanding rapidly, and with it new and refined methods are coming into use. 
This paper reviews recent advances in experimental methods and data analysis, as applied to the chloroplast genome. ... The relative advantages and disadvantages of comparative restriction site mapping and DNA sequencing are reviewed. For both methods, the analysis of resulting data requires sufficient taxon and character sampling to achieve the best possible estimate of phylogenetic relationships. Parsimony analysis is particularly sensitive to the issue of taxon sampling due to the problem of long branches attracting on a tree." Am J Bot 1994 81 9 1205-1224 1748 Brower,A.V.Z. Practical and Theoreti.. Ann.Entomol.Soc 94 87(6):702-716 Brower AVZ; DeSalle R Practical and Theoretical Considerations for Choice of a DNA-Sequence Region in Insect Molecular Systematics, with a Short Review of Published Studies using Nuclear Gene Regions (Review) DNA; Region; Gene; Review; Insect; Systematics CSNA Service 23(1994). Ann Entomol Soc Am 1994 87 6 702-716 1749 Dong,S. Gene Structure Predict.. Genomics 94 23(3):540-551 Dong S; Searls DB Gene Structure Prediction by Linguistic Methods Gene; Structure; Prediction; Linguistic CSNA Service 23(1994). Genomics 1994 23 3 540-551 1750 Wheeler,W.C. Malign - A Multiple Se.. J.Hered. 94 85(5):417-418 Wheeler WC; Gladstei? DS Malign - A Multiple Sequence Alignment Program (Technical Note) Multiple alignment; Sequence alignment; Program CSNA Service 23(1994). J Hered 1994 85 5 417-418 1751 Adell,J.C. Monte Carlo Simulation.. J.Mol.Evol. 94 38(3):305-309 Adell JC; Dopazo J Monte Carlo Simulation in Phylogenies: An Application to Test the Constancy of Evolutionary Rates Monte Carlo; Simulation; Phylogenetic; Evolutionary rate; Bootstrap; Clock; Least squares; SP; Phylogeny; Rate "Monte Carlo simulation has commonly been used in phylogenetic studies to test different tree-reconstruction methods, and consequently, its application for testing evolutionary models can be considered as a natural extension of this usage. 
Repetitive simulation of a given evolutionary process, under the restrictions imposed by the model to be tested, along a determinate tree topology allow the estimate of probability distributions for the desired parameters. Next, the phylogenetic tree can be reconstructed again without the constraints of the model, and the parameter of interest, derived from this tree, can be compared to the corresponding probability distribution derived from the restricted, simulated trees." J Mol Evol 1994 38 3 305-309 1752 Krogh,A. Hidden Markov Models i.. J.Mol.Biol. 94 235(5):1501-15 Krogh A; Brown M; Mian IS; Sjolander K; Haussler D Hidden Markov Models in Computational Biology - Applications to Protein Modeling Protein; Model; Markov; Sequence alignment; Multiple alignment; Database search; Statistical; USA "Hidden Markov Models (HMMs) are applied to the problems of statistical modeling, database searching and multiple sequence alignment of protein families and protein domains. These methods are demonstrated on the globin family, the protein kinase catalytic domain, and the EF-hand calcium binding motif. ... The HMM produces multiple alignments of good quality that agree closely with the alignments produced by programs that incorporate three-dimensional structural information." J Mol Biol 1994 235 5 1501-1531 1753 Hess,S.T. Wide Variations in Nei.. J.Mol.Biol. 94 236(4):1022-10 Hess ST; Blake JD; Blake RD Wide Variations in Neighbor-dependent Substitution Rates Substitution; Rate; Bias; Indel; USA "The pattern of 20,200 point substitutions in the 16 unique neighbor-pair environments has been determined from aligned gene/pseudogene sequences in the current database of human DNA sequences. Substitution rates, representing averages over those for different regions of the genome, are distributed over a 60-fold range with strong biases in particular neighbor-pair environments. ... 
Characteristic biases of the content and arrangement of oligonucleotide strings or tuples in all sequence elements, but particularly in non-coding regions, appear to be due to the pattern of different neighbor-dependent substitution rates." J Mol Biol 1994 236 4 1022-1033 1754 Smith,A.B. Paleontological Data a.. Paleobiology 94 20(3):259-273 Smith AB; Littlewood DTJ Paleontological Data and Molecular Phylogenetic Analysis Phylogenetic; Sequence analysis; Rate; Evolution; UK "Molecular data are becoming an indispensable tool for the reconstruction of phylogenies. Fossil molecular data remain scarce, but have the potential to resolve patterns of deep branching and provide empirical tests of tree reconstruction techniques. A total evidence approach, combining and comparing complementary morphological, molecular and stratigraphical data from both recent and fossil taxa, is advocated as the most promising way forward because there are several well-established problems that can afflict the analysis of molecular sequence data sometimes resulting in spurious tree topologies." Paleobiology 1994 20 3 259-273 1755 Conklin,D. Knowledge Discovery in.. IEEE Trans.Know 93 5(6):985-987 Conklin D; Fortier S; Glasgow J Knowledge Discovery in Molecular Databases (Letter) Knowledge; Sequence database; Database search CSNA Service 23(1994). IEEE Trans Knowledge Data Eng 1993 5 6 985-987 1756 Du,M.W. An Approach to Designi.. IEEE Trans.Know 94 6(4):620-633 Du MW; Chang SC An Approach to Designing Very Fast Approximate String-Matching Algorithms String match; Approximate match; Algorithm CSNA Service 23(1994). IEEE Trans Knowledge Data Eng 1994 6 4 620-633 1757 Bertossi,A.A. Parallel String-Matchi.. J.Parallel Dist 94 22(2):229-234 Bertossi AA; Logi F Parallel String-Matching with Variable-Length Don't Cares Parallel; String match; Don't care CSNA Service 23(1994). J Parallel Distrib Comput 1994 22 2 229-234 1758 Pande,V.S. Nonrandomness in Prote.. 
Proc.Nat.Acad.S 94 91(26):12972-1 Pande VS; Grosberg AY; Tanaka T Nonrandomness in Protein Sequences: Evidence for a Physically Driven Stage of Evolution? Evolution; Protein; Sequence analysis; DNA; Statistical; Genetic "The sequences, or primary structures, of existing biopolymers, in particular proteins, are believed to be a product of evolution. Are the sequences random? If not, what is the character of this nonrandomness? To explore the statistics of protein sequences, we use the idea of mapping the sequence onto the trajectory of a random walk, originally proposed by Peng et al. in their analysis of DNA sequences. Using three different mappings, corresponding to three basic physical interactions between amino acids, we found pronounced deviations from pure randomness, and these deviations seem directed toward minimization of the energy of the three-dimensional structure. We consider this result as evidence for a physically driven stage of evolution." Proc Nat Acad Sci USA 1994 91 26 12972-12975 1759 Attwood,T.K. PRINTS - A Database of.. Nucleic Acids R 94 22(17):3590-35 Attwood TK; Beck ME; Bleasby AJ; Parry-Smith DJ PRINTS - A Database of Protein Motif Fingerprints Database search; Protein; Fingerprint; UK; Motif "PRINTS is a compendium of protein motif 'fingerprints'. A fingerprint is defined as a group of motifs excised from conserved regions of a sequence alignment, whose diagnostic power or potency is refined by iterative database scanning (in this case the OWL composite sequence database). ... The use of groups of independent, linearly- or spatially-distinct motifs allows protein folds and functionalities to be characterised more flexibly and powerfully than conventional single-component patterns or regular expressions. ... The information contained within PRINTS is distinct from, but complementary to the consensus expressions stored in the widely-used PROSITE dictionary of patterns." Nucleic Acids Res 1994 22 17 3590-3596 1760 Rost,B. 
Structure Prediction o.. Curr.Opin.Biote 94 5(4):372-380 Rost B; Sander C Structure Prediction of Proteins - Where are We Now? Protein; Structure; Nucleotide; Sequence analysis; Secondary; DE; Prediction "Although the 'structure from sequence' prediction problem remains fundamentally unsolved, new and promising methods in one, two and three dimensions have reopened the field. Significantly improved one-dimensional prediction of secondary structure from multiple sequence alignments is now in routine use. In the two-dimensional approach, inter-residue contacts can be detected by analysis of correlated mutations, albeit with low accuracy. Finally, three-dimensional methods, in which pseudopotentials or information values are derived from the databases, are proving their value for distinguishing between correct and incorrect models." Curr Opin Biotechnol 1994 5 4 372-380