<?xml version="1.0"?>
<BIB>
<SEQ>
<UI>0001   Torelli,A.    ADVANCE and ADAM: Two .. Comput.Appl.Bio 94 
10(1):3-6
</UI>
<AU>Torelli A;
    Robotti CA
</AU>
<TI>ADVANCE and ADAM: Two Algorithms for the Analysis of Global Similarity
between Homologous Informational Sequences
</TI>
<SU>Pairwise comparison;
    Sequence proximity;
    Pairwise alignment;
    Italy;
    Similarity;
    Algorithm
</SU>
<AB>"Two algorithms for the analysis of global similarity between sequences of
informational polymeric molecules (nucleic acids and proteins) are proposed: one
(ADVANCE) merely gives a quantification of the global similarity between two
sequences, and is very fast; the other (ADAM) also provides an alignment of the
sequences. Both are new algorithms, implement Sellers' theorem, do not require
parameters ... and are fast ...."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1994</PY>
<VO>10</VO>
<NO>1</NO>
<PP>3-6</PP>
</SEQ>

<SEQ>
<UI>0002   Ina,Y.        ODEN: A Program Packag.. Comput.Appl.Bio 94 
10(1):11-12
</UI>
<AU>Ina Y
</AU>
<TI>ODEN: A Program Package for Molecular Evolutionary Analysis and Database
Search of DNA and Amino Acid Sequences
</TI>
<SU>Database search;
    Phylogeny;
    JP;
    Program;
    DNA;
    Amino acid
</SU>
<AB>"To enable researchers to use both kinds of programs interactively, I
developed a program package which integrates (about) 50 programs for database
search and molecular evolutionary analysis. I named the package 'ODEN' after a
Japanese food which is cooked with various materials marvelously harmonized."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1994</PY>
<VO>10</VO>
<NO>1</NO>
<PP>11-12</PP>
</SEQ>

<SEQ>
<UI>0003   Thompson,J.D. Improved Sensitivity o.. Comput.Appl.Bio 94 
10(1):19-29
</UI>
<AU>Thompson JD;
    Higgins DG;
    Gibson TJ
</AU>
<TI>Improved Sensitivity of Profile Searches through the Use of Sequence
Weights and Gap Excision
</TI>
<SU>Match a pattern matrix;
    Database search;
    Sequence weight;
    DE;
    Gap;
    Profile
</SU>
<AB>"However, none of the published weighting schemes seemed ideal for the
purpose of weighting profiles. We have developed a new method to weight the
sequences, where more distant sequences are assigned higher weights than closely
related ones according to branch length values of neighbour-joining trees
prepared from the aligned sequences."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1994</PY>
<VO>10</VO>
<NO>1</NO>
<PP>19-29</PP>
</SEQ>

<SEQ>
<UI>0004   Doelz,R.      Hierarchical Access Sy.. Comput.Appl.Bio 94 
10(1):31-34
</UI>
<AU>Doelz R
</AU>
<TI>Hierarchical Access System for Sequence Libraries in Europe (HASSLE): A
Tool to Access Sequence Databases Remotely
</TI>
<SU>Database search;
    SWI;
    FASTA;
    Hierarchical;
    BLAST;
    Program;
    Sequence database
</SU>
<AB>"HASSLE focuses on the network aspect of the molecular biology computing
and assumes that it is possible to have database applications available as
remote 'services' (programs, program packages or utilities) which can be started
by a simple command script after a suitable feed of datafiles. The current
system provides these services for searching with programs like FASTA or BLAST
...."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1994</PY>
<VO>10</VO>
<NO>1</NO>
<PP>31-34</PP>
</SEQ>

<SEQ>
<UI>0005   Olsen,G.J.    fastDNAml: A Tool for .. Comput.Appl.Bio 94 
10(1):41-48
</UI>
<AU>Olsen GJ;
    Matsuda H;
    Hagstrom R;
    Overbeek R
</AU>
<TI>fastDNAml: A Tool for Construction of Phylogenetic Trees of DNA Sequences
using Maximum Likelihood
</TI>
<SU>Phylogeny;
    Parallel;
    USA;
    Likelihood;
    Evolutionary tree;
    DNA;
    Phylogenetic
</SU>
<AB>"The program can be run on a wide variety of computers ranging from Unix
workstations to massively parallel systems .... Our program uses a maximum
likelihood approach and is based on version 3.3 of Felsenstein's dnaml program.
... and phylogenetic estimates are possible even when hundreds of sequences
exist."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1994</PY>
<VO>10</VO>
<NO>1</NO>
<PP>41-48</PP>
</SEQ>

<SEQ>
<UI>0006   Gast,F.U.     A Macintosh Program fo.. Comput.Appl.Bio 94 
10(1):49-51
</UI>
<AU>Gast FU
</AU>
<TI>A Macintosh Program for the Versatile Generation of Random Nucleic Acid
Sequences and their Structural Analysis
</TI>
<SU>Sequence analysis;
    Program;
    Nucleic acid;
    DE
</SU>
<AB>"The program 'MacStAn' for the Apple Macintosh generates random sequences
and can analyze their tendency to form secondary structure or translation
products as well as their mono-, di- and trinucleotide composition."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1994</PY>
<VO>10</VO>
<NO>1</NO>
<PP>49-51</PP>
</SEQ>

<SEQ>
<UI>0007   Fuchs,R.      Fast Protein Block Sea.. Comput.Appl.Bio 94 
10(1):79-80
</UI>
<AU>Fuchs R
</AU>
<TI>Fast Protein Block Searches
</TI>
<SU>Database search;
    DE;
    Block search;
    Genome;
    Profile;
    Protein
</SU>
<AB>"Profile searches using aligned short protein blocks are an effective
method for identifying putative protein functions. An algorithm is presented
that accelerates block searches by a factor 2-5 with only limited lack of
sensitivity; this algorithm is particularly suited for application in
large-scale genome research."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1994</PY>
<VO>10</VO>
<NO>1</NO>
<PP>79-80</PP>
</SEQ>

<SEQ>
<UI>0008   Chapman,M.S.  Sequence Similarity Sc.. Comput.Appl.Bio 94 
10(2):111-119
</UI>
<AU>Chapman MS
</AU>
<TI>Sequence Similarity Scores and the Inference of Structure-Function
Relationships
</TI>
<SU>Multiple comparison;
    USA;
    Statistical;
    Sequence proximity;
    Function;
    Similarity;
    Structure;
    Score
</SU>
<AB>"Improved methods are described for the interpretation of two or more
aligned protein or nucleic acid sequences. ... Improvements include the
calculation of a position-dependent, gap-penalized similarity score;
computer-assisted graphical association of sequence similarity with structural,
functional or chemical properties of the sequences; and statistical comparisons
of the sequence conservation or variability of different groups of residues."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1994</PY>
<VO>10</VO>
<NO>2</NO>
<PP>111-119</PP>
</SEQ>

<SEQ>
<UI>0009   Frohlich,K.U. Sequence Similarity Pr.. Comput.Appl.Bio 94 
10(2):179-183
</UI>
<AU>Frohlich KU
</AU>
<TI>Sequence Similarity Presenter: A Tool for the Graphic Display of
Similarities of Long Sequences for Use in Presentations
</TI>
<SU>Pairwise comparison;
    DE;
    Display;
    Sequence alignment;
    Similarity;
    Graphic
</SU>
<AB>"A new method for the presentation of alignments of long sequences is
described. The degree of identity for the aligned sequences is averaged for
sections of a fixed number of residues. The resulting values are converted to
shades of gray, with white corresponding to lack of identity and black
corresponding to perfect identity. A sequence alignment is represented as a bar
filled with varying shades of gray."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1994</PY>
<VO>10</VO>
<NO>2</NO>
<PP>179-183</PP>
</SEQ>
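
<!--
Sketch (not the published Sequence Similarity Presenter): the averaging step described in the abstract above can be illustrated in Python. Assumptions: the two sequences are already aligned and of equal length, gaps are written as '-', a gap position counts as a non-identity, and shades use a hypothetical 0 to 255 gray scale with 255 as white (no identity) and 0 as black (perfect identity). The name window_shades is hypothetical.

```python
def window_shades(seq_a: str, seq_b: str, window: int) -> list[int]:
    """Average identity over consecutive fixed-size windows of an alignment
    and map each value to a gray level (255 = white, no identity;
    0 = black, perfect identity)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    if window <= 0:
        raise ValueError("window must be positive")
    shades = []
    for start in range(0, len(seq_a), window):
        a = seq_a[start:start + window]
        b = seq_b[start:start + window]
        # A gap opposite anything is treated as a mismatch (assumption).
        identical = sum(1 for x, y in zip(a, b) if x == y and x != "-")
        fraction = identical / len(a)  # the last window may be shorter
        shades.append(round(255 * (1.0 - fraction)))
    return shades


print(window_shades("ACGTACGTAC", "ACGTTTTTAC", 5))  # [51, 102]
```

The first window (4 of 5 identical) maps to 51 and the second (3 of 5) to 102. A renderer would then fill each segment of the bar with that gray value.
-->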

<SEQ>
<UI>0010   Laferriere,A. An RNA Pattern Matchin.. Comput.Appl.Bio 94 
10(2):211-212
</UI>
<AU>Laferriere A;
    Gautheret D;
    Cedergren R
</AU>
<TI>An RNA Pattern Matching Program with Enhanced Performance and Portability
</TI>
<SU>Database search;
    Pattern match;
    Sequence database;
    Program;
    String match;
    RNA;
    CA;
    Performance
</SU>
<AB>"We present here a significant improvement of the program RNAMOT which
allows searches of primary and secondary structural patterns in sequence
databases (Gautheret et al. 1990). An important performance enhancement was
achieved using a faster string-matching algorithm and more efficient sequence
scans."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1994</PY>
<VO>10</VO>
<NO>2</NO>
<PP>211-212</PP>
</SEQ>

<SEQ>
<UI>0011   Wishart,D.S.  SEQSEE: A Comprehensiv.. Comput.Appl.Bio 94 
10(2):121-132
</UI>
<AU>Wishart DS;
    Boyko RF;
    Willard L;
    Richards FM;
    Sykes BD
</AU>
<TI>SEQSEE: A Comprehensive Program Suite for Protein Sequence Analysis
</TI>
<SU>Sequence analysis;
    CA;
    Display;
    Statistical;
    Sequence alignment;
    Pattern match;
    Program;
    Protein
</SU>
<AB>"SEQSEE (SEQuence SEEker) is a multi-purpose, menu-driven suite of
programs designed to provide a fully integrated, state-of-the-art package for
the analysis and display of protein sequences and protein databases. ... SEQSEE
is capable of performing ... sequence/database searching, sequence retrieval,
sequence entry and editing, statistical sequence analysis, multiple sequence
alignment, flexible pattern matching, and secondary structure prediction."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1994</PY>
<VO>10</VO>
<NO>2</NO>
<PP>121-132</PP>
</SEQ>

<SEQ>
<UI>0012   Searls,D.B.   Doing Sequence Analysi.. Comput.Appl.Bio 93 
9(4):421-426
</UI>
<AU>Searls DB
</AU>
<TI>Doing Sequence Analysis with your Printer
</TI>
<SU>Sequence analysis;
    USA
</SU>
<AB>"The software package RSVP (Rapid Sequence Visualization in PostScript)
has a suite of visually oriented sequence analysis routines implemented entirely
in the page description language PostScript .... RSVP is thus a relatively
platform-independent tool for providing a 'quick look' at sequence data, using
form and color to help point out patterns, in advance of more sophisticated
sequence analyses."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1993</PY>
<VO>9</VO>
<NO>4</NO>
<PP>421-426</PP>
</SEQ>

<SEQ>
<UI>0013   Date,S.       Multiple Alignment of .. Comput.Appl.Bio 93 
9(4):397-402
</UI>
<AU>Date S;
    Kulkarni R;
    Kulkarni B;
    Kulkarni-Kale U;
    Kolaskar AS
</AU>
<TI>Multiple Alignment of Sequences on Parallel Computers
</TI>
<SU>Multiple alignment;
    Clustering;
    Parallel;
    India;
    Program;
    Hierarchical
</SU>
<AB>"A software package that allows one to carry out multiple alignment of
protein and nucleic acid sequences of almost unlimited length and number of
sequences is developed on C-DAC parallel computer - a transputer-based machine.
... The speed gains are almost linear when the number of transputers is
increased from 4 to 64."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1993</PY>
<VO>9</VO>
<NO>4</NO>
<PP>397-402</PP>
</SEQ>

<SEQ>
<UI>0014   Chao,K.M.     Locating Well-Conserve.. Comput.Appl.Bio 93 
9(4):387-396
</UI>
<AU>Chao KM;
    Hardison RC;
    Miller W
</AU>
<TI>Locating Well-Conserved Regions Within a Pairwise Alignment
</TI>
<SU>Pairwise alignment;
    Significance;
    USA;
    Locally optimal;
    Suboptimal;
    Region
</SU>
<AB>"When alignments are so long that it is infeasible, or at least
undesirable, to inspect them in complete detail, it is helpful to have an
automatic process that computes information about the varying degree of
conservation along the alignment and displays the information in a graphical
representation that is readily assimilated. This paper presents methods for
computing several such 'robustness measures' at each position of a given
alignment."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1993</PY>
<VO>9</VO>
<NO>4</NO>
<PP>387-396</PP>
</SEQ>

<SEQ>
<UI>0015   Livingstone,C Protein Sequence Align.. Comput.Appl.Bio 93 
9(6):745-756
</UI>
<AU>Livingstone CD;
    Barton GJ
</AU>
<TI>Protein Sequence Alignments: A Strategy for the Hierarchical Analysis of
Residue Conservation
</TI>
<SU>Multiple comparison;
    UK;
    Sequence alignment;
    Hierarchical;
    Consensus index;
    Protein;
    Residue
</SU>
<AB>"An algorithm is described for the systematic characterization of the
physico-chemical properties seen at each position in a multiple protein sequence
alignment. The new algorithm allows questions important in the design of
mutagenesis experiments to be quickly answered since positions in the alignment
that show unusual or interesting residue substitution patterns may be rapidly
identified. The strategy is based on a flexible set-based description of amino
acid properties, which is used to define the conservation between any group of
amino acids."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1993</PY>
<VO>9</VO>
<NO>6</NO>
<PP>745-756</PP>
</SEQ>

<SEQ>
<UI>0016   Ortells,M.O.  CEDIT: A C Interface a.. Comput.Appl.Bio 93 
9(6):741-744
</UI>
<AU>Ortells MO;
    Cockcroft VB;
    Lunt GG
</AU>
<TI>CEDIT: A C Interface and Macro Facility for Protein Sequence Alignment
Editing in Colour with Microsoft Word 5.0 for PCs
</TI>
<SU>Multiple alignment;
    Display;
    UK;
    Sequence alignment;
    Editor;
    Editing;
    Word
</SU>
<AB>"CEDIT, a C interface and macro facility that provides for the colour
editing of protein sequence alignments (up to 2000 sequences, 5000 residues
each) using Microsoft Word 5.0 for PCs is presented. CEDIT uses the ability of
MS-Word to display letters with the desired colour to easily identify
conservative homologies across the sequences."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1993</PY>
<VO>9</VO>
<NO>6</NO>
<PP>741-744</PP>
</SEQ>

<SEQ>
<UI>0017   De Rijk,P.    DCSE, an Interactive T.. Comput.Appl.Bio 93 
9(6):735-740
</UI>
<AU>De Rijk P;
    De Wachter R
</AU>
<TI>DCSE, an Interactive Tool for Sequence Alignment and Secondary Structure
Research
</TI>
<SU>Multiple alignment;
    Belgium;
    Sequence alignment;
    Gap;
    Program;
    Editor;
    Structure;
    Secondary
</SU>
<AB>"DCSE provides a user-friendly package for the creation and editing of
sequence alignments. ... It shifts characters or entire blocks of aligned
characters, rather than inserting or deleting gaps in the sequences. Alignment
of a new sequence to an existing alignment is partly automated. Although DCSE
can be used on protein sequence alignments, it is especially targeted at the
examination of RNA. The secondary structure for every sequence can be
incorporated easily in the alignment."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1993</PY>
<VO>9</VO>
<NO>6</NO>
<PP>735-740</PP>
</SEQ>

<SEQ>
<UI>0018   Barton,G.J.   An Efficient Algorithm.. Comput.Appl.Bio 93 
9(6):729-734
</UI>
<AU>Barton GJ
</AU>
<TI>An Efficient Algorithm to Locate all Locally Optimal Alignments between
Two Sequences Allowing for Gaps
</TI>
<SU>Subalignment;
    UK;
    Gap;
    Locally optimal;
    Optimal;
    Algorithm
</SU>
<AB>"An efficient algorithm is described to locate locally optimal alignments
between two sequences allowing for insertions and deletions. The algorithm is
based on that of Smith and Waterman which returns the single best local
alignment. However, the algorithm described here permits all non-intersecting
locally optimal alignments to be determined in a single pass through the
comparison matrix."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1993</PY>
<VO>9</VO>
<NO>6</NO>
<PP>729-734</PP>
</SEQ>
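
<!--
The abstract above builds on the Smith and Waterman local alignment recurrence. The following minimal Python sketch implements that baseline and returns only the single best local score. It is not Barton's algorithm for finding all non-intersecting locally optimal alignments. The scoring values (match +2, mismatch -1, linear gap penalty -2) and the name smith_waterman_score are illustrative assumptions.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score with a linear gap penalty."""
    # prev[j] holds H[i-1][j]; cells are floored at 0 so an alignment
    # can start anywhere (the defining feature of local alignment).
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap       # gap in b
            left = cur[j - 1] + gap  # gap in a
            cur[j] = max(0, diag, up, left)
            best = max(best, cur[j])
        prev = cur
    return best
```

Only two rows of the comparison matrix are kept. Recovering the alignment itself, or several alignments as in the paper, would require keeping traceback information.
-->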

<SEQ>
<UI>0019   Liuni,S.      SIMD Parallelization o.. Comput.Appl.Bio 93 
9(6):701-707
</UI>
<AU>Liuni S;
    Prunella N;
    Pesole G;
    D'Orazio T;
    Stella E;
    Distante A
</AU>
<TI>SIMD Parallelization of the WORDUP Algorithm for Detecting Statistically
Significant Patterns in DNA Sequences
</TI>
<SU>Multiple comparison;
    Significance;
    Parallel;
    Pattern match;
    String match;
    Italy;
    Boyer-Moore;
    DNA;
    Algorithm
</SU>
<AB>"We study a method for parallelizing the algorithm WORDUP, which detects
the presence of statistically significant patterns in DNA sequences. WORDUP
implements an efficient method to identify the presence of statistically
significant oligomers in a non-homologous group of sequences. It is based on a
modified version of the Boyer-Moore algorithm, which is one of the fastest
algorithms for string matching available in the literature."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1993</PY>
<VO>9</VO>
<NO>6</NO>
<PP>701-707</PP>
</SEQ>

<SEQ>
<UI>0020   Bordo,D.      ENVIRON: A Software Pa.. Comput.Appl.Bio 93 
9(6):639-645
</UI>
<AU>Bordo D
</AU>
<TI>ENVIRON: A Software Package to Compare Protein Three-Dimensional
Structures with Homologous Sequences using Local Structural Motifs
</TI>
<SU>Structure;
    Motif;
    Sequence alignment;
    Italy;
    Program;
    Protein
</SU>
<AB>"This work presents a method to compare local clusters of interacting
residues as observed in a known three-dimensional protein structure with
corresponding clusters inferred from homologous protein sequences, assuming
conserved protein folding. For this purpose the local environment of a selected
residue in a known protein structure is defined as the ensemble of amino acids
in contact with it in the folded state. Using a multiple sequence alignment to
identify corresponding residues in homologous proteins, a detailed comparison
can be performed ...."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1993</PY>
<VO>9</VO>
<NO>6</NO>
<PP>639-645</PP>
</SEQ>

<SEQ>
<UI>0021   Fuchs,R.      Block Searches on VAX .. Comput.Appl.Bio 93 
9(5):587-591
</UI>
<AU>Fuchs R
</AU>
<TI>Block Searches on VAX and Alpha Computer Systems
</TI>
<SU>Database search;
    Match a pattern matrix;
    Block search;
    DE;
    Pattern search
</SU>
<AB>"A new program, BlockSearch, is described that allows biologists to search
protein sequences against the BLOCKS database of aligned protein blocks by
converting these blocks to site-specific scoring matrices. It thus complements
existing tools for standard similarity searches and pattern searches which aid
in elucidating the function of newly determined protein-coding sequences."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1993</PY>
<VO>9</VO>
<NO>5</NO>
<PP>587-591</PP>
</SEQ>

<SEQ>
<UI>0022   Prunella,N.   FASTPAT: A Fast and Ef.. Comput.Appl.Bio 93 
9(5):541-545
</UI>
<AU>Prunella N;
    Liuni S;
    Attimonelli M;
    Pesole G
</AU>
<TI>FASTPAT: A Fast and Efficient Algorithm for String Searching in DNA
Sequences
</TI>
<SU>String match;
    Boyer-Moore;
    Italy;
    String search;
    DNA;
    Algorithm
</SU>
<AB>"A new string searching algorithm is presented aimed at searching for the
occurrence of character patterns in longer character texts. The algorithm,
specifically designed for nucleic acid sequence data, is essentially derived
from the Boyer-Moore method .... Both pattern and text data are compressed so
that the natural 4-letter alphabet of nucleic acid sequences is considerably
enlarged."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1993</PY>
<VO>9</VO>
<NO>5</NO>
<PP>541-545</PP>
</SEQ>
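
<!--
FASTPAT (and WORDUP, above) derive from the Boyer-Moore method. The Python sketch below shows the simplified Boyer-Moore-Horspool variant, which uses only the bad-character shift. It is neither FASTPAT nor its compressed-alphabet encoding, and the name horspool_search is hypothetical.

```python
def horspool_search(pattern: str, text: str) -> list[int]:
    """Return every start index of pattern in text (Boyer-Moore-Horspool)."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Shift table: distance from the last occurrence of each character
    # (excluding the final pattern position) to the end of the pattern.
    shift = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
    hits = []
    pos = 0
    while pos <= n - m:
        # Compare right to left, as in Boyer-Moore.
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            hits.append(pos)
        # Shift by the text character under the pattern's last position.
        pos += shift.get(text[pos + m - 1], m)
    return hits


print(horspool_search("GATA", "CGATAGATAAGATA"))  # [1, 5, 10]
```

On a 4-letter alphabet most shifts are short. This is the motivation FASTPAT's abstract gives for compressing pattern and text into a larger alphabet.
-->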

<SEQ>
<UI>0023   Zhang,M.Q.    A Weight Array Method .. Comput.Appl.Bio 93 
9(5):499-509
</UI>
<AU>Zhang MQ;
    Marr TG
</AU>
<TI>A Weight Array Method for Splicing Signal Analysis
</TI>
<SU>Match a pattern matrix;
    USA;
    Sequence analysis;
    Statistical;
    Signal
</SU>
<AB>"A new method of sequence analysis, using a weight array method (WAM),
which generalizes the traditional Staden weight matrix method (WMM), is
proposed. With the help of a statistical mechanical model, the discriminant
function is identified with the energy function describing macromolecular
interactions."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1993</PY>
<VO>9</VO>
<NO>5</NO>
<PP>499-509</PP>
</SEQ>
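
<!--
This is a rough sketch of the idea, not Zhang and Marr's exact estimator. A weight matrix method (WMM) scores each position independently, whereas a weight array method conditions each position on the preceding nucleotide. The Python below estimates first-order, position-specific conditional frequencies from aligned example sites. It then scores a candidate as a log-odds sum against a background. The pseudocount of 1, the uniform 0.25 background, and the names train_wam and wam_score are assumptions.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def train_wam(sites: list[str], pseudo: float = 1.0):
    """Estimate P(x_1) and P(x_i | x_(i-1)) for each position i of equal-length sites."""
    length = len(sites[0])
    first = Counter(s[0] for s in sites)
    p_first = {a: (first[a] + pseudo) / (len(sites) + 4 * pseudo) for a in ALPHABET}
    cond = []  # cond[i-1][(prev, cur)] = P(cur at i | prev at i-1)
    for i in range(1, length):
        pairs = Counter((s[i - 1], s[i]) for s in sites)
        prev_totals = Counter(s[i - 1] for s in sites)
        cond.append({(p, c): (pairs[(p, c)] + pseudo) / (prev_totals[p] + 4 * pseudo)
                     for p in ALPHABET for c in ALPHABET})
    return p_first, cond


def wam_score(model, seq: str, background: float = 0.25) -> float:
    """Log-odds of seq under the WAM versus an i.i.d. uniform background."""
    p_first, cond = model
    score = math.log(p_first[seq[0]] / background)
    for i in range(1, len(seq)):
        score += math.log(cond[i - 1][(seq[i - 1], seq[i])] / background)
    return score


model = train_wam(["AGGT", "AGGT", "CGGT", "AGAT"])
print(wam_score(model, "AGGT") > wam_score(model, "TTTT"))  # True
```

Replacing each conditional table with a plain per-position frequency would reduce this to a WMM. The extra dependency on the preceding base is what the abstract calls the generalization.
-->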

<SEQ>
<UI>0024   Sakamoto,N.   Development of the Ove.. Comput.Appl.Bio 93 
9(4):427-434
</UI>
<AU>Sakamoto N;
    Takagi T;
    Sakaki Y
</AU>
<TI>Development of the Overlapping Oligonucleotide Database and its
Application to Signal Sequence Search of the Human Genome
</TI>
<SU>Database search;
    Sequence database;
    Signal;
    JP;
    Sequence search;
    Genome
</SU>
<AB>"We have developed ODS (Overlapping Oligonucleotide Database for Signal
Sequence Search) - the first relational database that integrates information on
biological features into the search for signal sequences. ... Nucleotide
sequences are transformed into overlapping oligonucleotides in order to
facilitate the signal sequence search rapidly without the need for specific
alignment programs. This transformation leads to a one-to-one correspondence
between the nucleotide sequence and its biological feature."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1993</PY>
<VO>9</VO>
<NO>4</NO>
<PP>427-434</PP>
</SEQ>
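
<!--
The transformation into overlapping oligonucleotides can be sketched in Python as a k-mer index that maps each oligonucleotide to its start positions. A signal of length k can then be looked up without running an alignment program. The choice of k, the in-memory dictionary (standing in for ODS's relational tables), and the function names are hypothetical.

```python
from collections import defaultdict


def overlapping_oligos(seq: str, k: int) -> list[str]:
    """All overlapping oligonucleotides (k-mers) of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_index(seq: str, k: int) -> dict[str, list[int]]:
    """Map each k-mer to the 0-based positions where it starts."""
    index = defaultdict(list)
    for pos, oligo in enumerate(overlapping_oligos(seq, k)):
        index[oligo].append(pos)
    return dict(index)


def find_signal(index: dict[str, list[int]], signal: str) -> list[int]:
    """Exact lookup of a k-length signal sequence, with no alignment step."""
    return index.get(signal, [])


idx = build_index("TATAAATATAAA", 4)
print(find_signal(idx, "TATA"))  # [0, 6]
```
-->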

<SEQ>
<UI>0025   Milosavljevic Discovering Simple DNA.. Comput.Appl.Bio 93 
9(4):407-411
</UI>
<AU>Milosavljevic A;
    Jurka J
</AU>
<TI>Discovering Simple DNA Sequences by the Algorithmic Significance Method
</TI>
<SU>Sequence analysis;
    Significance;
    Compression;
    USA;
    Dynamic programming;
    Repeat;
    DNA
</SU>
<AB>"The main idea is that patterns can be discovered by finding ways to
encode the observed data concisely. ... The method is applied to discover
significantly simple DNA sequences. We define DNA sequences to be simple if they
contain repeated occurrences of certain 'words' and thus can be encoded in a
small number of bits. ... A standard dynamic programming algorithm for data
compression is applied to compute the minimal encoding lengths of sequences in
linear time."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1993</PY>
<VO>9</VO>
<NO>4</NO>
<PP>407-411</PP>
</SEQ>

<SEQ>
<UI>0026   Fagin,B.      A Special-Purpose Proc.. Comput.Appl.Bio 93 
9(2):221-226
</UI>
<AU>Fagin B;
    Watt JG;
    Gross R
</AU>
<TI>A Special-Purpose Processor for Gene Sequence Analysis
</TI>
<SU>Pairwise alignment;
    Hardware;
    USA;
    Sequence analysis;
    Sequence alignment;
    Needleman-Wunsch;
    Gene
</SU>
<AB>"For certain problems, special-purpose computers can achieve significant
cost/performance gains over general-purpose machines. We describe one such
computer here: a custom accelerator for gene sequence analysis. The accelerator
implements a version of the Needleman-Wunsch algorithm for nucleotide sequence
alignment. ... The boards ... yield a 15-fold performance improvement over an
unassisted host."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1993</PY>
<VO>9</VO>
<NO>2</NO>
<PP>221-226</PP>
</SEQ>
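
<!--
For reference, the global alignment recurrence that the accelerator implements in hardware can be written in a few lines of Python. This sketch computes only the optimal score with a linear gap penalty. The scoring values (match +1, mismatch -1, gap -2) and the name needleman_wunsch_score are illustrative assumptions. The abstract does not give the board's exact variant or scoring scheme.

```python
def needleman_wunsch_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Optimal global alignment score with a linear gap penalty."""
    # Row 0: aligning a prefix of b against nothing costs one gap per character.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]
```

Each cell depends only on its left, upper, and upper-left neighbours. Cells along an anti-diagonal are therefore independent, which is what makes the recurrence a good fit for special-purpose parallel hardware.
-->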

<SEQ>
<UI>0027   Vogt,G.       Profile Sequence Analy.. Comput.Appl.Bio 93 
9(1):25-28
</UI>
<AU>Vogt G;
    Argos P
</AU>
<TI>Profile Sequence Analysis and Database Searches on a Transputer Machine
Connected to a Macintosh Computer
</TI>
<SU>Match a pattern matrix;
    Database search;
    Parallel;
    DE;
    Sequence analysis;
    Dynamic programming;
    Profile
</SU>
<AB>"An implementation of Profilesearch (a technique to search for
relationships between a protein sequence and multiply aligned sequences) for a
parallel computer is described. ... The program and environment are useful to
search quickly and easily for similarities between a single sequence or sequence
set and individual sequences contained in a large database. The alignment is
determined by typical dynamic programming techniques."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1993</PY>
<VO>9</VO>
<NO>1</NO>
<PP>25-28</PP>
</SEQ>

<SEQ>
<UI>0028   Fuchs,R.      EMBL-Search: A CD-ROM .. Comput.Appl.Bio 93 
9(1):71-77
</UI>
<AU>Fuchs R;
    Stoehr P
</AU>
<TI>EMBL-Search: A CD-ROM Based Database Query System
</TI>
<SU>Database search;
    DE;
    Sequence database;
    Query
</SU>
<AB>"This paper describes a system of generally applicable index files
provided on the EMBL sequence databases CD-ROM to facilitate the development of
front-end software to the sequence databases available on this CD-ROM. The index
files are used by a new versatile and user-friendly database retrieval program
for the Apple Macintosh, EMBL-Search, which allows the easy construction of
complex database queries."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1993</PY>
<VO>9</VO>
<NO>1</NO>
<PP>71-77</PP>
</SEQ>

<SEQ>
<UI>0029   Balzarotti,V. An Algorithm for the I.. Comput.Appl.Bio 93 
9(1):93-100
</UI>
<AU>Balzarotti V;
    Colizzi V;
    Morante S;
    Parisi V
</AU>
<TI>An Algorithm for the Identification of Similar Oligopeptides between
Amino Acid Sequences
</TI>
<SU>Locally optimal;
    Significance;
    Identification;
    Italy;
    Subalignment;
    Amino acid;
    Algorithm
</SU>
<AB>"We have developed a new algorithm capable of identifying pairs of similar
oligopeptides irrespective of their length, number, location and ordering along
the proteins, by locally comparing the two sequences of amino acids. The
algorithm compares the actual number of similar pairs found in this way, with
the number expected under the simplified assumption that the amino acids along
the sequences are randomly distributed with a given occurrence frequency. The
final step of the procedure consists in selecting the pairs of similar
oligopeptides that are statistically significant ...."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1993</PY>
<VO>9</VO>
<NO>1</NO>
<PP>93-100</PP>
</SEQ>

<SEQ>
<UI>0030   Aho,A.V.      Pattern Matching in St.. Formal Langua.. 80Academic 
Press
</UI>
<AU>Aho AV
</AU>
<TI>Pattern Matching in Strings
</TI>
<ED>Book R
</ED>
<BK>Formal Language Theory, Perspectives and Open Problems
</BK>
<SU>String match;
    Match complex patterns;
    Language;
    USA;
    Pattern match;
    Expression
</SU>
<AB>"This paper examines three basic classes of string patterns ... and
analyzes some of the time-space tradeoffs inherent in searching for these
classes of patterns. The three classes of patterns considered are (1) finite
sets of strings, (2) regular expressions, and (3) regular expressions with back
referencing. Efficient pattern matching algorithms for each of these classes are
discussed."
</AB>
<PU>Academic Press</PU>
<PL>New York</PL>
<PY>1980</PY>
<PP>325-347</PP>
</SEQ>

<SEQ>
<UI>0031   Aleksandrov,N Pattern Recognition in.. Mol.Biol.(Mosc. 89 
23:988-999
</UI>
<AU>Aleksandrov NN;
    Mironov AA
</AU>
<TI>Pattern Recognition in Computer Analysis of Nucleoside Sequences
</TI>
<SU>Pattern recognition;
    Discrimination;
    RU;
    Recognition
</SU>
<AB>Translated from Molekulyarnaya Biologiya, 23(5), 1248-1262, Sept.-Oct.
1989. "The results of using the 'generalized portrait' algorithm for pattern
recognition to find an Escherichia coli promoter are presented. Related problems
of feature selection, set selection and computing coordinates of the dividing
vector are solved."
</AB>
<JT>Mol Biol (Mosc)</JT>
<PY>1989</PY>
<VO>23</VO>
<PP>988-999</PP>
</SEQ>

<SEQ>
<UI>0032   Almagor,H.    A Markov Analysis of D.. J.Theor.Biol.   83 
104:633-645
</UI>
<AU>Almagor H
</AU>
<TI>A Markov Analysis of DNA Sequences
</TI>
<SU>Sequence analysis;
    Significance;
    Markov;
    IL;
    DNA
</SU>
<AB>"One of the basic questions to be asked (the 'correlation question') is to
what extent are the 64 trinucleotide (triplet) frequencies measured in a
sequence determined by the 16 doublet frequencies in the same sequence. The DNA
is described here as a Markov process, with the nucleotides being outcomes of a
sequence generator. ... Two natural DNA sequences ... are analysed as examples
of the method."
</AB>
<JT>J Theor Biol</JT>
<PY>1983</PY>
<VO>104</VO>
<PP>633-645</PP>
</SEQ>
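
<!--
The 'correlation question' above can be made concrete with a standard first-order Markov estimate. This is an assumption, and Almagor's exact estimator may differ. P(z | y) is estimated from doublet counts as N(yz) / N(y.), where N(y.) is the number of doublets that start with y. The expected count of triplet xyz is then roughly N(xy) N(yz) / N(y.). The Python sketch pairs each observed triplet count with that expectation. The function name is hypothetical.

```python
from collections import Counter


def triplet_expectations(seq: str) -> dict[str, tuple[int, float]]:
    """Observed triplet counts versus the counts expected from doublets alone.

    Under a first-order Markov chain, P(z | y) is estimated as N(yz) / N(y.),
    where N(y.) is the number of doublets that start with y, so
    E[N(xyz)] is approximately N(xy) * N(yz) / N(y.).
    """
    doublets = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    triplets = Counter(seq[i:i + 3] for i in range(len(seq) - 2))
    starts = Counter()
    for d, n in doublets.items():
        starts[d[0]] += n
    result = {}
    for x in "ACGT":
        for y in "ACGT":
            for z in "ACGT":
                if starts[y] == 0:
                    continue  # no doublet starts with y: expectation undefined
                expected = doublets[x + y] * doublets[y + z] / starts[y]
                result[x + y + z] = (triplets[x + y + z], expected)
    return result
```

Large gaps between observed and expected counts suggest that the triplet frequencies are not determined by the doublet frequencies alone. Edge effects make the two counts differ slightly even for short sequences that fit the model.
-->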

<SEQ>
<UI>0033   Apostolico,A. The Myriad Virtues of .. Combinatorial.. 
85Springer-Verlag
</UI>
<AU>Apostolico A
</AU>
<TI>The Myriad Virtues of Subword Trees
</TI>
<ED>Apostolico A;
    Galil Z
</ED>
<BK>Combinatorial Algorithms on Words. NATO ASI Series F: Computer and System
Sciences, vol. 12
</BK>
<SU>Match complex patterns;
    Search tree;
    USA;
    Data structure;
    Compression;
    Regularities
</SU>
<AB>"Several nontrivial applications of subword trees have been developed
since their first appearance. Some such applications depart considerably from
the original motivations. A brief account of them is attempted here."
</AB>
<PU>Springer-Verlag</PU>
<PL>Berlin</PL>
<PY>1985</PY>
<PP>85-96</PP>
</SEQ>

<SEQ>
<UI>0034   Arratia,R.    An Extreme Value Theor.. Ann.Statist.    86 
14(3):971-993
</UI>
<AU>Arratia R;
    Gordon L;
    Waterman MS
</AU>
<TI>An Extreme Value Theory for Sequence Matching
</TI>
<SU>Pairwise comparison;
    Significance;
    USA;
    Sequence match;
    Longest common
</SU>
<AB>"Consider finite sequences X1...Xm and Y1...Yn .... We study the
distribution of the longest contiguous run of matches between the X's and Y's,
allowing at most k mismatches. The distribution is closely approximated by that
of the maximum of (1-p)mn i.i.d. negative binomial random variables."
</AB>
<JT>Ann Statist</JT>
<PY>1986</PY>
<VO>14</VO>
<NO>3</NO>
<PP>971-993</PP>
</SEQ>
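
<!--
The statistic whose distribution is studied above is the longest contiguous run of matches between the X's and Y's, allowing at most k mismatches. It can be computed directly by sliding a window along every diagonal, that is, every fixed offset between X and Y. The Python sketch below does this in O(mn) time. It computes only the observed value and does not implement the paper's negative binomial approximation. The function name is hypothetical.

```python
def longest_run_k_mismatches(x: str, y: str, k: int) -> int:
    """Length of the longest aligned (gap-free) stretch of x and y
    containing at most k mismatched positions."""
    m, n = len(x), len(y)
    best = 0
    # Each diagonal pairs x[i] with y[i + offset].
    for offset in range(-(m - 1), n):
        start_i = max(0, -offset)
        end_i = min(m, n - offset)  # exclusive
        left = start_i
        mismatches = 0
        for i in range(start_i, end_i):
            if x[i] != y[i + offset]:
                mismatches += 1
            # Shrink the window from the left until it has <= k mismatches.
            while mismatches > k:
                if x[left] != y[left + offset]:
                    mismatches -= 1
                left += 1
            best = max(best, i - left + 1)
    return best
```

With k = 0 and a single diagonal, this reduces to the longest run of heads in coin tossing, which is the Erdos-Renyi setting generalized in the entry below.
-->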

<SEQ>
<UI>0035   Arratia,R.    Stochastic Scrabble: L.. J.Appl.Probab.  88 
25:106-119
</UI>
<AU>Arratia R;
    Morris P;
    Waterman MS
</AU>
<TI>Stochastic Scrabble: Large Deviations for Sequences with Scores
</TI>
<SU>Pairwise comparison;
    Significance;
    USA;
    Longest common;
    Markov;
    Stochastic;
    Score
</SU>
<AB>"A derivation of a law of large numbers for the highest-scoring matching
subsequence is given."
</AB>
<JT>J Appl Probab</JT>
<PY>1988</PY>
<VO>25</VO>
<PP>106-119</PP>
</SEQ>

<SEQ>
<UI>0036   Arratia,R.    An Erdos-Renyi Law wit.. Adv.Math.       85 55:13-23
</UI>
<AU>Arratia R;
    Waterman MS
</AU>
<TI>An Erdos-Renyi Law with Shifts
</TI>
<SU>Pairwise comparison;
    Significance;
    USA;
    Longest common;
    Markov
</SU>
<AB>"Motivated by the comparison of DNA sequences, a generalization is given
of the result of Erdos and Renyi on the length Rn of the longest run of heads in
the first n tosses of a coin."
</AB>
<JT>Adv Math</JT>
<PY>1985</PY>
<VO>55</VO>
<PP>13-23</PP>
</SEQ>

<SEQ>
<UI>0037   Attimonelli,M Multisequence Comparis.. Cell Biophys.   85 
7:239-250
</UI>
<AU>Attimonelli M;
    Lanave C;
    Sbisa E;
    Preparata G;
    Saccone C
</AU>
<TI>Multisequence Comparisons in Protein Coding Genes: Search for Functional
Constraints
</TI>
<SU>Multiple comparison;
    Region;
    Statistical;
    Significance;
    Italy;
    Coding;
    Protein;
    Gene
</SU>
<AB>"The problem ... is to find in a given sequence those regions showing
anomalous persistence structure. Clearly the notion of persistence can only be
defined in a comparative way, i.e., by considering homologous sequences
belonging to different species .... In order to appreciate the statistical
meaning of the observed values of the permanence densities, their expectations
and their statistical fluctuations must be determined ...."
</AB>
<JT>Cell Biophys</JT>
<PY>1985</PY>
<VO>7</VO>
<PP>239-250</PP>
</SEQ>

<SEQ>
<UI>0038   Baeza-Yates,R A New Approach to Text.. Proceedings o.. 
89Association for
</UI>
<AU>Baeza-Yates RA;
    Gonnet GH
</AU>
<TI>A New Approach to Text Searching
</TI>
<ED>Belkin NJ;
    Van Rijsbergen CJ
</ED>
<BK>Proceedings of the Twelfth Annual International ACM SIGIR Conference on
Research and Development in Information Retrieval
</BK>
<SU>Text search;
    Match with don't cares;
    Match with k mismatches;
    CA;
    String match
</SU>
<AB>"We introduce a family of simple and fast algorithms for solving the
classical string matching problem, string matching with don't care symbols and
complement symbols, and multiple patterns. In addition we solve the same
problems allowing up to k mismatches." (Addendum in SIGIR Forum, 23(3,4),
Spring/Summer 1989, p. 7.)
</AB>
<PU>Association for Computing Machinery</PU>
<PL>New York</PL>
<PY>1989</PY>
<PP>168-175</PP>
</SEQ>
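
<!--
The approach introduced in this paper is now usually called the Shift-Or algorithm, or Shift-And when the bits are inverted. The Python sketch below shows the exact-matching Shift-And form. It omits the don't-care, complement-symbol, multiple-pattern, and k-mismatch extensions listed in the abstract. Python's unbounded integers stand in for a machine word, and the function name is hypothetical.

```python
def shift_and(pattern: str, text: str) -> list[int]:
    """Start indices of exact occurrences of pattern in text (Shift-And)."""
    m = len(pattern)
    if m == 0:
        return []
    # masks[c] has bit j set when pattern[j] == c.
    masks: dict[str, int] = {}
    for j, c in enumerate(pattern):
        masks[c] = masks.get(c, 0) | (1 << j)
    accept = 1 << (m - 1)
    state = 0  # bit j set: pattern[0..j] matches the text ending here
    hits = []
    for i, c in enumerate(text):
        # Extend every partial match by one character, and start a new one.
        state = ((state << 1) | 1) & masks.get(c, 0)
        if state & accept:
            hits.append(i - m + 1)
    return hits


print(shift_and("ACA", "ACACAGACA"))  # [0, 2, 6]
```

All partial matches advance together in a single shift and mask operation per text character. The same bit-parallel idea is what lets the paper's variants handle classes of characters and mismatches cheaply.
-->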

<SEQ>
<UI>0039   Bairoch,A.    The PROSITE Dictionary.. Nucleic Acids R 93 
21(13):3097-31
</UI>
<AU>Bairoch A
</AU>
<TI>The PROSITE Dictionary of Sites and Patterns in Proteins, its Current
Status
</TI>
<SU>Database search;
    Sequence database;
    Pattern library;
    Motif;
    Signature;
    Protein;
    PROSITE;
    SWI
</SU>
<AB>"PROSITE is a compilation of sites and patterns found in protein
sequences: it can be used as a method of determining the function of
uncharacterized proteins translated from genomic or cDNA sequences."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1993</PY>
<VO>21</VO>
<NO>13</NO>
<PP>3097-3103</PP>
</SEQ>

<SEQ>
<UI>0040   Bairoch,A.    The SWISS-PROT Protein.. Nucleic Acids R 93 
21(13):3093-30
</UI>
<AU>Bairoch A;
    Boeckmann B
</AU>
<TI>The SWISS-PROT Protein Sequence Data Bank, Recent Developments
</TI>
<SU>Database search;
    Sequence database;
    SWI;
    Protein
</SU>
<AB>"SWISS-PROT is an annotated protein sequence database established in 1986
and maintained collaboratively, since 1988, by the Department of Medical
Biochemistry of the University of Geneva and the EMBL Data Library."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1993</PY>
<VO>21</VO>
<NO>13</NO>
<PP>3093-3096</PP>
</SEQ>

<SEQ>
<UI>0041   Barker,W.C.   The PIR-International .. Nucleic Acids R 93 21(13):3089-30
</UI>
<AU>Barker WC;
    George DG;
    Mewes HW;
    Pfeiffer F;
    Tsugita A
</AU>
<TI>The PIR-International Databases
</TI>
<SU>Sequence database;
    USA
</SU>
<AB>"This paper briefly describes the architecture of the Protein Sequence
Database, a number of other PIR-International databases, and mechanisms for
providing access to and for distribution of these databases."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1993</PY>
<VO>21</VO>
<NO>13</NO>
<PP>3089-3092</PP>
</SEQ>

<SEQ>
<UI>0042   Bell,T.       Longest-Match String S.. Software.Practi 93 23(7):757-771
</UI>
<AU>Bell T;
    Kulp D
</AU>
<TI>Longest-Match String Searching for Ziv-Lempel Compression
</TI>
<SU>String match;
    NZ;
    Compression;
    String search;
    Data structure;
    Search tree;
    Boyer-Moore
</SU>
<AB>"Hashing, binary search trees, splay trees and the Boyer-Moore searching
algorithm are traditionally used to search for exact matches, but we show how
these can be adapted to find longest matches."
</AB>
<JT>Software Practice Experience </JT>
<PY>1993</PY>
<VO>23</VO>
<NO>7</NO>
<PP>757-771</PP>
</SEQ>

<SEQ>
<UI>0043   Benson,D.     GenBank                  Nucleic Acids R 93 21(13):2963-29
</UI>
<AU>Benson D;
    Lipman DJ;
    Ostell J
</AU>
<TI>GenBank
</TI>
<SU>Database search;
    Sequence database;
    USA;
    GenBank
</SU>
<AB>"The GenBank sequence database has undergone an expansion in data
coverage, annotation content and the development of new services for the
scientific community."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1993</PY>
<VO>21</VO>
<NO>13</NO>
<PP>2963-2965</PP>
</SEQ>

<SEQ>
<UI>0044   Bishop,M.J.   Inference of Evolution.. Nucleic Acid .. 87IRL Press
</UI>
<AU>Bishop MJ;
    Friday AE;
    Thompson EA
</AU>
<TI>Inference of Evolutionary Relationships
</TI>
<ED>Bishop MJ;
    Rawlings CJ
</ED>
<BK>Nucleic Acid and Protein Sequence Analysis: A Practical Approach
</BK>
<SU>Pairwise alignment;
    Likelihood;
    UK;
    Phylogeny
</SU>
<AB>"Much of the literature of molecular evolution is confused as to what
constitute the data which have been observed, what constitutes the model ... and
how to evaluate the relative merits of the competing hypotheses which are being
considered. Outlining how to set about this is a practical matter ...."
Describes a maximum likelihood method to align two sequences.
</AB>
<PU>IRL Press </PU>
<PL>Oxford </PL>
<PY>1987</PY>
<PP>359-385</PP>
</SEQ>

<SEQ>
<UI>0045   Bodlaender,H. Parameterized Complexi.. First Interna.. 94Steering Commit
</UI>
<AU>Bodlaender H;
    Downey RG;
    Fellows MR;
    Hallett MT;
    Wareham HT
</AU>
<TI>Parameterized Complexity Analysis in Computational Biology
</TI>
<BK>First International Workshop on Shape and Pattern Matching in
Computational Biology
</BK>
<SU>Multiple alignment;
    Consensus sequence;
    Longest common;
    Complexity;
    CA;
    Parameterized
</SU>
<AB>"We describe some new results on the Longest Common Subsequence problem.
In particular, we show that the problem is hard for W[t] for all t when
parameterized by the number of strings and the size of the alphabet. Lower
bounds on the complexity of this basic combinatorial problem imply lower bounds
on more general sequence alignment and consensus discovery problems. We also
describe a number of open problems pertaining to the parameterized complexity of
problems in computational biology ...."
</AB>
<PU>Steering Committee of the 1994 IEEE Workshop on Shape and Pattern
Matching in Computational Biology</PU>
<PL>Yorktown Heights, NY</PL>
<PY>1994</PY>
<PP>99-116</PP>
</SEQ>

<SEQ>

<UI>0046   Bodlaender,H. The Parameterized Comp.. Lecture Notes i 94 807:15-30
</UI>
<AU>Bodlaender H;
    Downey RG;
    Fellows MR;
    Wareham HT
</AU>
<TI>The Parameterized Complexity of Sequence Alignment and Consensus
</TI>
<SU>Multiple alignment;
    Longest common;
    Complexity;
    CA;
    Sequence alignment;
    Parameterized
</SU>
<AB>5th Annual Symposium, CPM 94. Asilomar, June 5-8, 1994. Proceedings. "The
Longest Common Subsequence problem is examined from the point of view of
parameterized computational complexity. ... Our main results show that: (1) The
Longest Common Subsequence (LCS) parameterized by the number of sequences to be
analyzed is hard for W[t] for all t. (2) The LCS problem, parameterized by the
length of the common subsequence, belongs to W[P] and is hard for W[2]. (3) The
LCS problem parameterized both by the number of sequences and the length of the
common subsequence, is complete for W[1]."
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>1994</PY>
<VO>807</VO>
<PP>15-30</PP>
</SEQ>

<SEQ>
<UI>0047   Bork,P.       Mobile Modules and Mot.. Curr.Opin.Struc 92 2:413-421
</UI>
<AU>Bork P
</AU>
<TI>Mobile Modules and Motifs
</TI>
<SU>Pattern recognition;
    DE;
    Motif;
    Module
</SU>
<AB>"Therefore, at the sequence level, [modules] can often be recognized only
by comparison with specific motifs. These motifs are usually not characterized
by just a few conserved amino acids, but rather are complex arrangements that
will become increasingly blurred with the rapidly growing number of available
sequences." Characteristics of modules. Use of motifs for identifying modules.
Discerning homology from similarity. Proteins with new modular architecture.
</AB>
<JT>Curr Opin Struct Biol</JT>
<PY>1992</PY>
<VO>2</VO>
<PP>413-421</PP>
</SEQ>

<SEQ>
<UI>0048   Bowie,J.U.    A Method to Identify P.. Science         91 253(12 July):1
</UI>
<AU>Bowie JU;
    Luthy R;
    Eisenberg D
</AU>
<TI>A Method to Identify Protein Sequences that Fold into a Known
Three-Dimensional Structure
</TI>
<SU>Database search;
    USA;
    Structure;
    Match a pattern matrix;
    Profile;
    Protein;
    Fold
</SU>
<AB>"The inverse protein folding problem, the problem of finding which amino
acid sequences fold into a known three-dimensional (3D) structure, can be
effectively attacked by finding sequences that are most compatible with the
environments of the residues in the 3D structure." From the known 3D structure
of a protein P, construct a 3D structure profile; use it to search a database of
protein sequences to identify proteins most likely to adopt a fold similar to P.
</AB>
<JT>Science </JT>
<PY>1991</PY>
<VO>253</VO>
<NO>12 July</NO>
<PP>164-170</PP>
</SEQ>

<SEQ>
<UI>0049   Breslauer,D.  Tight Comparison Bound.. Inform.Process. 93 47:51-57
</UI>
<AU>Breslauer D;
    Colussi L;
    Toniolo L
</AU>
<TI>Tight Comparison Bounds for the String Prefix-Matching Problem
</TI>
<SU>Match a prefix;
    Complexity;
    Italy;
    String match;
    Pattern match
</SU>
<AB>"In the string prefix-matching problem one is interested in finding the
longest prefix of a pattern string of length m that occurs starting at each
position of a text string of length n. ... In this paper we study the exact
complexity of the string prefix-matching problem in the deterministic sequential
comparison model."
</AB>
<JT>Inform Process Lett</JT>
<PY>1993</PY>
<VO>47</VO>
<PP>51-57</PP>
</SEQ>

<SEQ>
<UI>0050   Chang,J.H.    Parallel Parsing on a .. IEEE Trans.Comp 87 36(1):64-75
</UI>
<AU>Chang JH;
    Ibarra OH;
    Palis MA
</AU>
<TI>Parallel Parsing on a One-Way Array of Finite-State Machines
</TI>
<SU>Language;
    Automata;
    Parallel;
    Sequence recognition;
    USA;
    Parsing;
    Longest common
</SU>
<AB>"We show that a one-way two-dimensional iterative array of finite-state
machines (2-DIA) can recognize and parse strings of any context-free language in
linear time. ... We also consider the problem of finding approximate patterns in
strings, the string-to-string correction problem, and the longest common
subsequence problem, and show that they can be solved in linear time on a
2-DIA."
</AB>
<JT>IEEE Trans Comput</JT>
<PY>1987</PY>
<VO>36</VO>
<NO>1</NO>
<PP>64-75</PP>
</SEQ>

<SEQ>
<UI>0051   Fickett,J.W.  Development of a Datab.. Mathematical .. 89CRC Press
</UI>
<AU>Fickett JW;
    Burks C
</AU>
<TI>Development of a Database for Nucleotide Sequences
</TI>
<ED>Waterman MS
</ED>
<BK>Mathematical Methods for DNA Sequences
</BK>
<SU>Sequence database;
    USA;
    Nucleotide
</SU>
<AB>"We know of only two data banks currently attempting comprehensive
coverage of nucleotide sequence data: the GenBank genetic sequence data bank in
the U.S., and the data bank at EMBL (European Molecular Biology Laboratory at
Heidelberg, West Germany). We will describe one approach, undertaken by the
GenBank staff at LANL (Los Alamos National Laboratory), to the development of a
database that does justice to the natural structure of the data, facilitates
current applications, and allows expansion for the foreseeable future."
</AB>
<PU>CRC Press </PU>
<PL>Boca Raton, FL </PL>
<PY>1989</PY>
<PP>1-34</PP>
</SEQ>

<SEQ>
<UI>0052   Chen,M.T.     Efficient and Elegant .. Combinatorial.. 85Springer-Verlag
</UI>
<AU>Chen MT;
    Seiferas J
</AU>
<TI>Efficient and Elegant Subword-Tree Construction
</TI>
<ED>Apostolico A;
    Galil Z
</ED>
<BK>Combinatorial Algorithms on Words. NATO ASI Series F: Computer and System
Sciences, vol. 12
</BK>
<SU>Match complex patterns;
    USA;
    Automata;
    Search tree
</SU>
<AB>"A clean version of Weiner's linear-time compact-subword-tree construction
simultaneously also constructs the smallest deterministic finite automaton
recognizing the reverse subwords."
</AB>
<PU>Springer-Verlag </PU>
<PL>Berlin </PL>
<PY>1985</PY>
<PP>97-107</PP>
</SEQ>

<SEQ>
<UI>0053   Galil,Z.      Open Problems in Strin.. Combinatorial.. 85Springer-Verlag
</UI>
<AU>Galil Z
</AU>
<TI>Open Problems in Stringology
</TI>
<ED>Apostolico A;
    Galil Z
</ED>
<BK>Combinatorial Algorithms on Words. NATO ASI Series F: Computer and System
Sciences, vol. 12
</BK>
<SU>String match;
    USA
</SU>
<AB>"Several open problems concerning combinatorial algorithms on strings are
described." Questions about string matching. Generalizations of string matching.
Index construction. Miscellaneous problems.
</AB>
<PU>Springer-Verlag </PU>
<PL>Berlin </PL>
<PY>1985</PY>
<PP>1-8</PP>
</SEQ>

<SEQ>
<UI>0054   Guibas,L.J.   Periodicities in Strings Combinatorial.. 85Springer-Verlag
</UI>
<AU>Guibas LJ
</AU>
<TI>Periodicities in Strings
</TI>
<ED>Apostolico A;
    Galil Z
</ED>
<BK>Combinatorial Algorithms on Words. NATO ASI Series F: Computer and System
Sciences, vol. 12
</BK>
<SU>Regularities;
    USA
</SU>
<AB>"In this talk we summarize what is known about the periodicities of
strings. A period of a string is a shift that causes a string to match itself."
For an expanded version of these results see Guibas and Odlyzko (1981).
</AB>
<PU>Springer-Verlag </PU>
<PL>Berlin </PL>
<PY>1985</PY>
<PP>257-269</PP>
</SEQ>

<SEQ>
<UI>0055   Pinter,R.Y.   Efficient String Match.. Combinatorial.. 85Springer-Verlag
</UI>
<AU>Pinter RY
</AU>
<TI>Efficient String Matching with Don't-care Patterns
</TI>
<ED>Apostolico A;
    Galil Z
</ED>
<BK>Combinatorial Algorithms on Words. NATO ASI Series F: Computer and System
Sciences, vol. 12
</BK>
<SU>IL;
    String match
</SU>
<AB>"The main result of this paper is an algorithm to deal efficiently with
patterns containing a definite number of don't-care symbols. Our method is to
collect 'evidence' about the occurrences of the constant parts of the pattern in
the text, using the algorithm of Aho and Corasick."
</AB>
<PU>Springer-Verlag </PU>
<PL>Berlin </PL>
<PY>1985</PY>
<PP>11-29</PP>
</SEQ>

<SEQ>
<UI>0056   Day,W.H.E.    Alignment, Comparison .. New Approache.. 94Springer-Verlag
</UI>
<AU>Day WHE;
    McMorris FR
</AU>
<TI>Alignment, Comparison and Consensus of Molecular Sequences
</TI>
<ED>Diday E;
    Lechevallier Y;
    Schader M;
    Bertrand P;
    Burtschy B
</ED>
<BK>New Approaches in Classification and Data Analysis
</BK>
<SU>Sequence comparison;
    Review;
    CA
</SU>
<AB>"A rich and varied literature on sequence comparison has developed, one
containing hundreds of theoretical or methodological contributions and thousands
of applications. However, the focus of our review is on the theory and
methodology of sequence comparison." 150 references.
</AB>
<PU>Springer-Verlag </PU>
<PL>Berlin </PL>
<PY>1994</PY>
<PP>327-346</PP>
</SEQ>

<SEQ>
<UI>0057   Fitch,W.M.    Locating Gaps in Amino.. Biochem.Genet.  69 3:99-108
</UI>
<AU>Fitch WM
</AU>
<TI>Locating Gaps in Amino Acid Sequences to Optimize the Homology Between
Two Proteins
</TI>
<SU>Pairwise alignment;
    Gap;
    USA;
    Homology;
    Amino acid;
    Protein
</SU>
<AB>"A method for optimally locating gaps in the amino acid sequences of
homologous proteins is presented. ... The major virtues of this procedure are
that the assertion of homology does not depend upon the prior introduction of
gaps and that a genetic rather than a chemical test is the basis for
asserting a genetic relationship."
</AB>
<JT>Biochem Genet</JT>
<PY>1969</PY>
<VO>3</VO>
<PP>99-108</PP>
</SEQ>

<SEQ>
<UI>0058   Amir,A.       Dynamic Dictionary Mat.. J.Comput.System 94 49:208-222
</UI>
<AU>Amir A;
    Farach M;
    Galil Z;
    Giancarlo R;
    Park K
</AU>
<TI>Dynamic Dictionary Matching
</TI>
<SU>Dictionary match;
    USA;
    Dynamic
</SU>
<AB>"We consider the dynamic dictionary matching problem. We are given a set
of pattern strings (the dictionary) that can change over time; that is, we can
insert a new pattern into the dictionary or delete a pattern from it. Moreover,
given a text string, we must be able to find all occurrences of any pattern of
the dictionary in the text. Let D0 be the empty dictionary. We present an
algorithm that performs any sequence of the following operations in the given
time bounds: (1) insert (p, Di-1) ... (2) delete (p, Di-1) ... (3) search (t,
Di). Search text t[1,n] for all occurrences of the patterns of dictionary Di.
The time complexity is O( ( n + tocc ) log |Di| ), where tocc is the total
number of occurrences of patterns in the text."
</AB>
<JT>J Comput Systems Sci</JT>
<PY>1994</PY>
<VO>49</VO>
<PP>208-222</PP>
</SEQ>

<SEQ>
<UI>0059   Heumann,K.    A New Concept of Seque.. Comput.Appl.Bio 94 10(5):519-526
</UI>
<AU>Heumann K;
    George D;
    Mewes HW
</AU>
<TI>A New Concept of Sequence Data Distribution on Wide Area Networks
</TI>
<SU>Sequence database;
    DE;
    Distribution;
    Network
</SU>
<AB>"Accepted concepts in distributed applications design have been applied in
the development of a network-based system for the synchronization of remote
sequence database access sites by an incremental update mechanism. Computer
hardware requirements, network bandwidth, and stability considerations make
centralized access to essential computerized resources undesirable. A network
model has been developed to distribute access over a collection of remotely
situated computer centers."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1994</PY>
<VO>10</VO>
<NO>5</NO>
<PP>519-526</PP>
</SEQ>

<SEQ>
<UI>0060   Gonnet,G.H.   Text Algorithms. Chapt.. Handbook of A.. 91Addison-Wesley
</UI>
<AU>Gonnet GH;
    Baeza-Yates RA
</AU>
<TI>Text Algorithms. Chapter 7 in Handbook of Algorithms and Data Structures
In Pascal and C
</TI>
<BK>Handbook of Algorithms and Data Structures in Pascal and C
</BK>
<SU>Text search;
    Review;
    SWI;
    Data structure;
    Structure;
    Algorithm
</SU>
<AB>"Text searching is the process of finding a pattern within a string of
characters. ... We will divide the algorithms between those which search the
text as given, those which require preprocessing of the text and other text
algorithms." The entire book has references to 1350 published papers.
</AB>
<PU>Addison-Wesley</PU>
<PL>Wokingham, UK</PL>
<PY>1991</PY>
<VO>2</VO>
<PP>251-288</PP>
</SEQ>

<SEQ>
<UI>0061   Gordon,L.     An Extreme Value Theor.. Probab.Theory R 86 72:279-287
</UI>
<AU>Gordon L;
    Schilling MF;
    Waterman MS
</AU>
<TI>An Extreme Value Theory for Long Head Runs
</TI>
<SU>Pairwise comparison;
    Significance;
    USA;
    Probabilistic
</SU>
<AB>"We show that the probabilistic behavior of the length of the longest pure
head run (in the first n independent coin tosses) is closely approximated by
that of the greatest integer function of the maximum of n(1-p) i.i.d.
exponential random variables."
</AB>
<JT>Probab Theory Related Fields </JT>
<PY>1986</PY>
<VO>72</VO>
<PP>279-287</PP>
</SEQ>

<SEQ>
<UI>0062   Jiang,T.      Optimization Problems .. Advances in O.. 93
</UI>
<AU>Jiang T;
    Li M
</AU>
<TI>Optimization Problems in Molecular Biology
</TI>
<ED>Du DZ;
    Sun J
</ED>
<BK>Advances in Optimization and Approximation
</BK>
<SU>Longest common;
    Multiple alignment;
    CA;
    Optimization
</SU>
<AB>Manuscript received 11 February 1994. Jiang, Lawler, Wang (1994), p. 768.
"Rather than an extensive literature survey, the purpose of this article is to
introduce in depth several prominent optimization problems arising in molecular
biology. We will emphasize recent developments and provide proof sketches for
the results whenever possible."
</AB>
<PY>1993</PY>
</SEQ>

<SEQ>
<UI>0063   Johnson,M.S.  Alignment and Searchin.. J.Mol.Biol.     93 231:735-752
</UI>
<AU>Johnson MS;
    Overington JP;
    Blundell TL
</AU>
<TI>Alignment and Searching for Common Protein Folds Using a Data Bank of
Structural Templates
</TI>
<SU>Multiple alignment;
    Database search;
    UK;
    Template;
    Protein;
    Fold
</SU>
<AB>"We introduce an approach to protein comparisons in which tertiary
structure information is exploited in the alignment of a protein sequence of
known tertiary structure, or an aligned set of sequences of known homologous
structures, with one or more sequences. ... (The approach produces) a scoring
template suitable for aligning sequences or searching sequence data banks."
</AB>
<JT>J Mol Biol</JT>
<PY>1993</PY>
<VO>231</VO>
<PP>735-752</PP>
</SEQ>

<SEQ>
<UI>0064   Karlin,S.     Applications and Stati.. Proc.Nat.Acad.S 93 90:5873-5877
</UI>
<AU>Karlin S;
    Altschul SF
</AU>
<TI>Applications and Statistics for Multiple High-Scoring Segments in
Molecular Sequences
</TI>
<SU>Sequence analysis;
    Significance;
    USA;
    Segment;
    Statistical;
    Sequence comparison;
    Scoring
</SU>
<AB>"Molecular sequences will frequently yield several high-scoring segments
for which some combined assessment is in order. This paper describes the
statistical distribution for the sum of the scores of multiple high-scoring
segments and illustrates its application to the identification of possible
transmembrane segments and the evaluation of sequence similarity."
</AB>
<JT>Proc Nat Acad Sci USA </JT>
<PY>1993</PY>
<VO>90</VO>
<PP>5873-5877</PP>
</SEQ>

<SEQ>
<UI>0065   Karp,R.M.     Rapid Identification o.. ACM Sympos.Theo 72 4:125-136
</UI>
<AU>Karp RM;
    Miller RE;
    Rosenberg AL
</AU>
<TI>Rapid Identification of Repeated Patterns in Strings, Trees and Arrays
</TI>
<SU>Sequence analysis;
    Regularities;
    USA;
    Pattern discovery;
    Identification;
    Repeat
</SU>
<AB>"We describe a strategy for constructing efficient algorithms for solving
two types of matching problems. ... Depth d Matches: Find all depth d
substructures of S which occur at least twice in S (possibly overlapping), and
find the position in S of each such repeated substructure. Maximum Matches: Find
the maximum depth D for which S has a repeated depth D substructure ... ."
</AB>
<JT>ACM Sympos Theory Comput</JT>
<PY>1972</PY>
<VO>4</VO>
<PP>125-136</PP>
</SEQ>

<SEQ>
<UI>0066   Kashyap,R.L.  An Effective Algorithm.. Inform.Sci.     81 23(3):201-217
</UI>
<AU>Kashyap RL;
    Oommen BJ
</AU>
<TI>An Effective Algorithm for String Correction Using Generalized Edit
Distances - II. Computational Complexity of the Algorithm and Some Applications
</TI>
<SU>Correction;
    USA;
    Edit;
    Complexity;
    Dictionary match;
    Distance;
    Algorithm
</SU>
<AB>"This paper deals with the problem of estimating an unknown transmitted
string X, belonging to a finite dictionary H from its observable noisy version
Y. ... We study the computational complexity of Algorithm I, and illustrate
quantitatively the advantage Algorithm I has over the standard technique and
other algorithms."
</AB>
<JT>Inform Sci</JT>
<PY>1981</PY>
<VO>23</VO>
<NO>3</NO>
<PP>201-217</PP>
</SEQ>

<SEQ>
<UI>0067   Kashyap,R.L.  A Common Basis for Sim.. Internat.J.Comp 83 13:17-40
</UI>
<AU>Kashyap RL;
    Oommen BJ
</AU>
<TI>A Common Basis for Similarity Measures Involving Two Strings
</TI>
<SU>Sequence proximity;
    USA;
    Edit;
    Longest common;
    Supersequence;
    Pairwise comparison;
    Similarity
</SU>
<AB>"We consider an abstract measure between strings X and Y, written as
D(X,Y), defined in terms of two abstract operators + and * and a binary function
d whose arguments are symbols of an alphabet A. ... Many new results are
obtained using this abstract formulation, such as an explicit linear
relationship between the LLCS and the LSCS between two strings."
</AB>
<JT>Internat J Comput Math</JT>
<PY>1983</PY>
<VO>13</VO>
<PP>17-40</PP>
</SEQ>

<SEQ>
<UI>0068   Kashyap,R.L.  Similarity Measures fo.. Internat.J.Comp 83 13:95-104
</UI>
<AU>Kashyap RL;
    Oommen BJ
</AU>
<TI>Similarity Measures for Sets of Strings
</TI>
<SU>Multiple comparison;
    USA;
    Sequence proximity;
    Similarity
</SU>
<AB>"We extend the results (of Kashyap and Oommen 1983) to capture various
numerical and nonnumerical measures involving more than two strings."
</AB>
<JT>Internat J Comput Math</JT>
<PY>1983</PY>
<VO>13</VO>
<PP>95-104</PP>
</SEQ>

<SEQ>
<UI>0069   Lausen,B.     Statistical Analysis o.. Classificatio.. 91Springer-Verlag
</UI>
<AU>Lausen B
</AU>
<TI>Statistical Analysis of Genetic Distance Data
</TI>
<ED>Bock HH;
    Ihm P
</ED>
<BK>Classification, Data Analysis, and Knowledge Organization. Models and
Methods with Applications
</BK>
<SU>Pairwise alignment;
    Significance;
    Dot;
    DE;
    Statistical;
    Genetic;
    Distance
</SU>
<AB>"A genetic distance may be computed from aligned genetic sequence data;
e.g. DNA sequences. We discuss the dot-matrix plot as a possible graphical check
of the goodness of the alignment. ... Therefore, we discuss aspects of an
heuristic which allows the combined exploration of genetic distance between the
sequences and of different positional variation. A tree structure is not assumed
for such an exploration."
</AB>
<PU>Springer-Verlag </PU>
<PL>Berlin </PL>
<PY>1991</PY>
<PP>254-261</PP>
</SEQ>

<SEQ>
<UI>0070   Majster,M.E.  Efficient On-line Cons.. SIAM J.Comput.  80 9(4):785-807
</UI>
<AU>Majster ME;
    Reiser A
</AU>
<TI>Efficient On-line Construction and Correction of Position Trees
</TI>
<SU>Pattern match;
    Search tree;
    DE;
    Correction;
    Data structure
</SU>
<AB>"This paper presents an on-line algorithm for the construction of position
trees, i.e., an algorithm which constructs the position tree for a given string
while reading the string from left to right. In addition, an on-line correction
algorithm is presented which - upon a change in the string - can be used to
construct the new position tree."
</AB>
<JT>SIAM J Comput</JT>
<PY>1980</PY>
<VO>9</VO>
<NO>4</NO>
<PP>785-807</PP>
</SEQ>

<SEQ>
<UI>0071   Manber,U.     Suffix Arrays: A New M.. Proceedings o.. 90Society for Ind
</UI>
<AU>Manber U;
    Myers EW
</AU>
<TI>Suffix Arrays: A New Method for On-line String Searches
</TI>
<BK>Proceedings of the First Annual ACM-SIAM Symposium on Discrete Algorithms
</BK>
<SU>String match;
    Search tree;
    USA;
    Data structure;
    String search;
    Suffix
</SU>
<AB>"A new and conceptually simple data structure, called a suffix array, for
on-line string searches is introduced in this paper. ... Suffix arrays permit
on-line string searches of the type, 'Is W a substring of A?' to be answered in
time O(P+logN), where P is the length of W and N is the length of A, which is
competitive with (and in some cases slightly better than) suffix trees."
</AB>
<PU>Society for Industrial and Applied Mathematics</PU>
<PL>Philadelphia, PA</PL>
<PY>1990</PY>
<PP>319-327</PP>
</SEQ>

<SEQ>
<UI>0072   Morrison,D.R. PATRICIA - Practical A.. J.Assoc.Comput. 68 15(4):514-534
</UI>
<AU>Morrison DR
</AU>
<TI>PATRICIA - Practical Algorithm to Retrieve Information Coded in Alphanumeric
</TI>
<SU>String match;
    Search tree;
    USA;
    Algorithm
</SU>
<AB>"PATRICIA is an algorithm which provides a flexible means of storing,
indexing, and retrieving information in a large file .... It retrieves
information in response to keys furnished by the user with a quantity of
computation which has a bound which depends linearly on the length of keys and
the number of their proper occurrences and is otherwise independent of the size
of the library." Section 6 is on how PATRICIA detects the presence of a phrase
and finds its proper occurrences.
</AB>
<JT>J Assoc Comput Mach</JT>
<PY>1968</PY>
<VO>15</VO>
<NO>4</NO>
<PP>514-534</PP>
</SEQ>

<SEQ>
<UI>0073   Myers,G.      A Four Russians Algori.. J.Assoc.Comput. 92 39(4):430-448
</UI>
<AU>Myers G
</AU>
<TI>A Four Russians Algorithm for Regular Expression Pattern Matching
</TI>
<SU>Match complex patterns;
    Language;
    USA;
    Pattern match;
    Expression;
    Automata;
    Algorithm
</SU>
<AB>"We present an O(PN/log N) worst-case time and space algorithm for
determining if a word A of length N is in the language denoted by a regular
expression R of length P."
</AB>
<JT>J Assoc Comput Mach</JT>
<PY>1992</PY>
<VO>39</VO>
<NO>4</NO>
<PP>430-448</PP>
</SEQ>

<SEQ>
<UI>0074   Owolabi,O.    Efficient Pattern Sear.. Inform.Process. 93 47:17-21
</UI>
<AU>Owolabi O
</AU>
<TI>Efficient Pattern Searching over Large Dictionaries
</TI>
<SU>Database search;
    N-gram;
    Boyer-Moore;
    NI;
    Pattern match;
    Pattern search
</SU>
<AB>"A method is described which is suitable for on-line query term expansion.
By using an efficient version of the N-gram method for similarity matching, a
small set of strings from the dictionary is selected. From this set, all the
strings relevant to the query term are then identified using the Boyer-Moore
pattern matching algorithm."
</AB>
<JT>Inform Process Lett</JT>
<PY>1993</PY>
<VO>47</VO>
<PP>17-21</PP>
</SEQ>

<SEQ>
<UI>0075   Pearson,W.R.  Identifying Distantly .. Curr.Opin.Struc 91 1:321-326
</UI>
<AU>Pearson WR
</AU>
<TI>Identifying Distantly Related Protein Sequences
</TI>
<SU>Database search;
    Review;
    Consensus sequence;
    Significance;
    USA;
    Sequence comparison;
    Statistical;
    Region;
    Protein
</SU>
<AB>"New methods for identifying distantly related proteins can be used to
confirm sequence homology when only weak sequence similarity remains. These
methods improve the selectivity of sequence comparison either by calculating the
statistical significance of the most similar region, or by using consensus
patterns rather than simple pairwise similarity scores."
</AB>
<JT>Curr Opin Struct Biol</JT>
<PY>1991</PY>
<VO>1</VO>
<PP>321-326</PP>
</SEQ>

<SEQ>
<UI>0076   Perlwitz,M.D. Pattern Analysis of th.. Adv.Appl.Math.  88 9:7-21
</UI>
<AU>Perlwitz MD;
    Burks C;
    Waterman MS
</AU>
<TI>Pattern Analysis of the Genetic Code
</TI>
<SU>Genetic;
    USA;
    Codon;
    Mapping
</SU>
<AB>"The genetic code is examined in a new and systematic fashion: we consider
the code as a mapping of one finite set (the 64 codons) to another (the 20 amino
acids). Given a class of mappings simpler than the actual code, we ask which
mappings best approximate it."
</AB>
<JT>Adv Appl Math</JT>
<PY>1988</PY>
<VO>9</VO>
<PP>7-21</PP>
</SEQ>

<SEQ>
<UI>0077   Rice,C.M.     The EMBL Data Library    Nucleic Acids R 93 21(13):2967-29
</UI>
<AU>Rice CM;
    Fuchs R;
    Higgins DG;
    Stoehr PJ;
    Cameron GN
</AU>
<TI>The EMBL Data Library
</TI>
<SU>Database search;
    Sequence database;
    DE;
    EMBL
</SU>
<AB>"The principal role of the EMBL Data Library, since its inception in 1980,
has been to maintain and distribute a database of nucleotide sequences (the EMBL
Nucleotide Sequence Database). It also supports and maintains the protein
sequence database SWISS-PROT and distributes other databases of interest to
molecular biologists."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1993</PY>
<VO>21</VO>
<NO>13</NO>
<PP>2967-2971</PP>
</SEQ>

<SEQ>
<UI>0078   Sackin,M.J.   Amino Acid Sequences i.. Biochem.J.      65 96:70P-71P
</UI>
<AU>Sackin MJ;
    Sneath PHA
</AU>
<TI>Amino Acid Sequences in Proteins: A Computer Study
</TI>
<SU>Pairwise comparison;
    UK;
    Amino acid;
    Protein
</SU>
<AB>"An ALGOL program for the Elliott 803 computer has been developed for
comparing the amino acid sequences in two protein chains. It can detect
similarities, deletions, insertions and inversions that would be hard to detect
by eye. The method is to 'slide' the chains past each other one step at a time
and to count the number of amino acids that match."
</AB>
<JT>Biochem J</JT>
<PY>1965</PY>
<VO>96</VO>
<PP>70P-71P</PP>
</SEQ>

<SEQ>
<UI>0079   Slisenko,A.O. Determination in Real .. Soviet Math.Dok 80 21(2):392-395
</UI>
<AU>Slisenko AO
</AU>
<TI>Determination in Real Time of all the Periodicities in a Word
</TI>
<SU>Regularities;
    Complexity;
    RU;
    Word
</SU>
<AB>Describes "the general properties of a construction upon which the proof
of the following assertion is based: There exists an addressable machine that
determines in real time all the periodicities in an input word. ... One can
extract from the basic properties of the algorithm for finding the periodicities
of a word in real time a complete solution to the problem of the complexity of a
number of well-known problems concerning the determination of the subwords of an
input word."
</AB>
<JT>Soviet Math Dokl</JT>
<PY>1980</PY>
<VO>21</VO>
<NO>2</NO>
<PP>392-395</PP>
</SEQ>

<SEQ>
<UI>0080   Steele,J.M.   Long Common Subsequenc.. SIAM J.Appl.Mat 82 42(4):731-737
</UI>
<AU>Steele JM
</AU>
<TI>Long Common Subsequences and Probability of Two Random Strings
</TI>
<SU>Pairwise comparison;
    Significance;
    USA;
    Probabilistic;
    Longest common;
    Subsequence;
    Probability
</SU>
<AB>"Let (x1,...xn) and (y1,...yn) be two strings from an alphabet A and let
Ln denote their longest common subsequence. The probabilistic behavior of Ln is
studied under various probability models for the x's and y's."
</AB>
<JT>SIAM J Appl Math</JT>
<PY>1982</PY>
<VO>42</VO>
<NO>4</NO>
<PP>731-737</PP>
</SEQ>

<SEQ>
<UI>0081   Unger,R.      DNAMAT: An Efficient G.. Comput.Appl.Bio 86 2(4):283-289
</UI>
<AU>Unger R;
    Harel D;
    Sussman JL
</AU>
<TI>DNAMAT: An Efficient Graphic Matrix Sequence Homology Algorithm and its
Application to Structural Analysis
</TI>
<SU>Pairwise comparison;
    Multiple comparison;
    Dot;
    IL;
    Display;
    Homology;
    Algorithm;
    Graphic;
    Matrix
</SU>
<AB>"We present a fast algorithm to produce a graphic matrix representation of
sequence homology. ... In addition we suggest a way to extend our approach to
analyse a series of related DNA or RNA sequences, in order to determine certain
common structural features. The analysis is done by 'summing' a set of dot-
matrices to produce an overall matrix that displays structural elements common
to most of the sequences."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1986</PY>
<VO>2</VO>
<NO>4</NO>
<PP>283-289</PP>
</SEQ>

<SEQ>
<UI>0082   Vishkin,U.    Deterministic Sampling.. ACM Sympos.Theo 90 22:170-180
</UI>
<AU>Vishkin U
</AU>
<TI>Deterministic Sampling - A New Technique for Fast Pattern Matching
</TI>
<SU>Parallel;
    USA;
    Pattern match;
    String match;
    Sampling
</SU>
<AB>"Consider the string matching problem. Given the pattern, we select
carefully a sample of its positions .... Then, we search for the sample. For
non-periodic patterns, the sample ... provides sparse verification. This
approach enables to perform the text analysis ... in O(log* n) time and optimal
speed-up on a PRAM. ... It also leads to a new linear time serial algorithm for
string matching."
</AB>
<JT>ACM Sympos Theory Comput</JT>
<PY>1990</PY>
<VO>22</VO>
<PP>170-180</PP>
</SEQ>

<SEQ>
<UI>0083   Weiner,P.     Linear Pattern Matchin.. IEEE Sympos.Swi 73 14:1-11
</UI>
<AU>Weiner P
</AU>
<TI>Linear Pattern Matching Algorithms
</TI>
<SU>String match;
    Dictionary match;
    Search tree;
    USA;
    Pattern match;
    Data structure;
    Algorithm
</SU>
<AB>15-17 October 1973. "We introduce an interesting data structure called a
bi-tree. A linear time algorithm for obtaining a compacted version of a bi-tree
associated with a given string is presented. With this construction as the basic
tool, we indicate how to solve several pattern matching problems ... in linear
time."
</AB>
<JT>IEEE Sympos Switching Automata Theory</JT>
<PY>1973</PY>
<VO>14</VO>
<PP>1-11</PP>
</SEQ>

<SEQ>
<UI>0084   Amir,A.       Adaptive Dictionary Ma.. IEEE Sympos.Fou 91 
32:760-766
</UI>
<AU>Amir A;
    Farach M
</AU>
<TI>Adaptive Dictionary Matching
</TI>
<SU>Dictionary match;
    Suffix;
    USA
</SU>
<AB>"We present new semi-adaptive and fully-adaptive dictionary matching
algorithms. In the fully adaptive algorithm, the dictionary is processed in time
O(|D| log |D|). Inserting a new pattern P into the dictionary can be done in
time O(|P| log |D|). A dictionary pattern can be deleted in time O(log |D|).
Text scanning is accomplished in time O(|T| log |D|). We also present a
parallel version of the algorithm with optimal speedup for the dictionary
construction and pattern addition phase and a logarithmic overhead in the text
scan phase. Our method incorporates a new way of using suffix trees ...."
</AB>
<JT>IEEE Sympos Found Comput Sci</JT>
<PY>1991</PY>
<VO>32</VO>
<PP>760-766</PP>
</SEQ>

<SEQ>
<UI>0085   Commentz-Walt A String Matching Algo.. Lecture Notes i 79 
71:118-132
</UI>
<AU>Commentz-Walter B
</AU>
<TI>A String Matching Algorithm Fast on the Average. Extended Abstract
</TI>
<SU>Dictionary match;
    DE;
    String match;
    Algorithm
</SU>
<AB>Proceedings, 6th ICALP, International Colloquium on Automata, Languages
and Programming. Graz, Austria, July 1979. "A user of the database specifies one
or several words or phrases, so called keywords, describing the information
sought. The answer will be the documents which contain all or some of the user
specified keywords. It takes too much time to scan each document of the database
for every user separately. Therefore, we introduce a sort of secondary index ...
containing keyword fragments. Searching the index with the user specified
keywords yields a superset of the documents required. ... Therefore, we scan the
documents of the superset for the user specified keywords."
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>1979</PY>
<VO>71</VO>
<PP>118-132</PP>
</SEQ>

<SEQ>
<UI>0086   Barton,G.J.   LOPAL and SCAMP: Techn.. J.Mol.Graphics  88 
6(Dec.):190-19
</UI>
<AU>Barton GJ;
    Sternberg MJE
</AU>
<TI>LOPAL and SCAMP: Techniques for the Comparison and Display of Protein
Sequences
</TI>
<SU>Structure;
    Program;
    UK;
    Display;
    Dynamic programming;
    Least squares;
    Protein
</SU>
<AB>"This paper describes two computer programs designed to assist in the
comparison of protein structures. LOPAL (LOoP ALignment) applies a dynamic
programming algorithm to the comparison of regions of protein three dimensional
(3D) structure and gives a similarity score and suggested sequence alignment
with that score. SCAMP (Structure Comparison and Alignment of Multiple Proteins)
is an interactive graphics program ... that allows the simultaneous display,
manipulation and pairwise least-squares fitting of up to nine independent
structures."
</AB>
<JT>J Mol Graphics</JT>
<PY>1988</PY>
<VO>6</VO>
<NO>Dec.</NO>
<PP>190-196</PP>
</SEQ>

<SEQ>
<UI>0087   Taylor,W.R.   Protein Structure Alig.. J.Mol.Biol.     89 208:1-22
</UI>
<AU>Taylor WR;
    Orengo CA
</AU>
<TI>Protein Structure Alignment
</TI>
<SU>Structure;
    UK;
    Dynamic programming;
    Protein
</SU>
<AB>"A new method of comparing protein structures is described, based on
distance plot analysis. ... When presented with the co-ordinate sets of two
structures, the method will produce automatically an alignment of their
sequences based on structural criteria. The method uses the dynamic programming
optimization technique, which is widely used in the comparison of protein
sequences and thus unifies the techniques of protein structure and sequence
comparison."
</AB>
<JT>J Mol Biol</JT>
<PY>1989</PY>
<VO>208</VO>
<PP>1-22</PP>
</SEQ>

<SEQ>
<UI>0088   Rodeh,M.      Linear Algorithm for D.. J.Assoc.Comput. 81 
28(1):16-24
</UI>
<AU>Rodeh M;
    Pratt VR;
    Even S
</AU>
<TI>Linear Algorithm for Data Compression via String Matching
</TI>
<SU>Search tree;
    Compression;
    IL;
    String match;
    Algorithm
</SU>
<AB>"A linear implementation of the optimal universal data compression methods
of Lempel and Ziv is described. The main tool is McCreight's algorithm for
constructing suffix trees. Both bounded and unbounded memory are considered."
</AB>
<JT>J Assoc Comput Mach</JT>
<PY>1981</PY>
<VO>28</VO>
<NO>1</NO>
<PP>16-24</PP>
</SEQ>

<SEQ>
<UI>0089   Baeza-Yates,R Searching Subsequences   Theoret.Comput. 91 
78:363-376
</UI>
<AU>Baeza-Yates RA
</AU>
<TI>Searching Subsequences
</TI>
<SU>Subsequence;
    Automata;
    Longest common;
    CL
</SU>
<AB>"We define the directed acyclic subsequence graph of a text as the
smallest deterministic partial finite automaton that recognizes all possible
subsequences of that text. ... We show that it is possible to build this
automaton using O(n log n) time and O(n) space for a text of size n. With this
structure, we can search a subsequence in logarithmic time. We extend this
construction to the case of multiple strings .... For the latter case, we
discuss its application to the longest common subsequence problem. ... Our
algorithm improves upon previous solutions for more than two strings."
</AB>
<JT>Theoret Comput Sci</JT>
<PY>1991</PY>
<VO>78</VO>
<PP>363-376</PP>
</SEQ>

<SEQ>
<UI>0090   Irving,R.W.   Two Algorithms for the.. Lecture Notes i 92 
644:214-229
</UI>
<AU>Irving RW;
    Fraser CB
</AU>
<TI>Two Algorithms for the Longest Common Subsequence of Three (or More)
Strings
</TI>
<SU>Longest common;
    Subsequence;
    UK;
    Algorithm
</SU>
<AB>Third Annual Symposium, CPM 92. Tucson, Arizona, April 29 - May 1, 1992.
Proceedings. "Various algorithms have been proposed, over the years, for the
longest common subsequence problem on 2 strings (2-LCS), many of these
improving, at least for some cases, on the classical dynamic programming
approach. However, relatively little attention has been paid in the literature
to the k-LCS problem for k &gt; 2 .... In this paper, we describe and analyse two
algorithms with particular reference to the 3-LCS problem, though each algorithm
can be extended to solve the k-LCS problem for general k."
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>1992</PY>
<VO>644</VO>
<PP>214-229</PP>
</SEQ>

<SEQ>
<UI>0091   Henikoff,S.   Playing with Blocks: S.. New Biol.       91 
3(12):1148-115
</UI>
<AU>Henikoff S
</AU>
<TI>Playing with Blocks: Some Pitfalls of Forcing Multiple Alignments
</TI>
<SU>Multiple alignment;
    Review;
    USA;
    Region
</SU>
<AB>"Block alignments of multiple amino acid sequences are useful
representations of regions thought to share common ancestry and function. Often
the block alignments are motivated by the expectation that a protein of interest
is similar in function to members of a family of proteins. However, when
alignments are forced by using ad hoc methods, it is often difficult to decide
whether the proposed relationship is valid. Visual examination can be deceptive,
especially when alignments are not carried out in the context of controls
subjected to similar procedures. ... When standard methods fail to find an
interesting block alignment unaided by human intervention, then the result
should be regarded with caution."
</AB>
<JT>New Biol</JT>
<PY>1991</PY>
<VO>3</VO>
<NO>12</NO>
<PP>1148-1154</PP>
</SEQ>

<SEQ>
<UI>0092   Breslauer,D.  Efficient Comparison B.. J.Complexity    93 
9(3):339-365
</UI>
<AU>Breslauer D;
    Galil Z
</AU>
<TI>Efficient Comparison Based String Matching
</TI>
<SU>String match;
    Pattern match;
    NL
</SU>
<AB>"We study the exact number of symbol comparisons that are required to
solve the string matching problem and present a family of efficient algorithms.
Unlike previous string matching algorithms, the algorithms in this family do not
'forget' results of comparisons, what makes their analysis much simpler. In
particular, we give a linear-time algorithm that finds all occurrences of a
pattern of length m in a text of length n .... The pattern preprocessing takes
linear time and makes at most 2m comparisons. This algorithm establishes that,
in general, searching for a long pattern is easier than searching for a short
one."
</AB>
<JT>J Complexity</JT>
<PY>1993</PY>
<VO>9</VO>
<NO>3</NO>
<PP>339-365</PP>
</SEQ>

<SEQ>
<UI>0093   Chao,K.M.     Constrained Sequence A.. Bull.Math.Biol. 93 
55(3):503-524
</UI>
<AU>Chao KM;
    Hardison RC;
    Miller W
</AU>
<TI>Constrained Sequence Alignment
</TI>
<SU>Pairwise alignment;
    Dynamic programming;
    USA;
    Sequence alignment;
    Locally optimal;
    Gap
</SU>
<AB>"This paper presents a dynamic programming algorithm for aligning two
sequences when the alignment is constrained to lie between two arbitrary
boundary lines in the dynamic programming matrix. For affine gap penalties, the
algorithm requires only O(F) computation time and O(M+N) space, where F is the
area of the feasible region and M and N are the sequence lengths. The result
extends to concave gap penalties, with somewhat increased time and space
bounds."
</AB>
<JT>Bull Math Biol</JT>
<PY>1993</PY>
<VO>55</VO>
<NO>3</NO>
<PP>503-524</PP>
</SEQ>

<SEQ>
<UI>0094   Friemann,A.   A New Approach for Dis.. Comput.Appl.Bio 92 
8(3):261-265
</UI>
<AU>Friemann A;
    Schmitz S
</AU>
<TI>A New Approach for Displaying Identities and Differences among Aligned
Amino Acid Sequences
</TI>
<SU>Display;
    Consensus index;
    Sequence proximity;
    DE;
    Amino acid
</SU>
<AB>"An algorithm is presented for computing degrees of sequence conservation
found among aligned amino acid sequences. Sequence identities are calculated for
each position of an alignment and average identity values of neighboring
positions are figured. The average identity value of the whole alignment is
chosen as a limit to discriminate between well and less conserved sequence
sections. A second algorithm is given to calculate the degree of divergence of
individual sequences compared to the other sequences of the alignment."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1992</PY>
<VO>8</VO>
<NO>3</NO>
<PP>261-265</PP>
</SEQ>

<SEQ>
<UI>0095   Hardison,R.   Use of Long Sequence A.. Mol.Biol.Evol.  93 
10(1):73-102
</UI>
<AU>Hardison R;
    Miller W
</AU>
<TI>Use of Long Sequence Alignments to Study the Evolution and Regulation of
Mammalian Globin Gene Clusters
</TI>
<SU>Multiple alignment;
    USA;
    Segment;
    Genome;
    Sequence alignment;
    Evolution;
    Gene
</SU>
<AB>"The determination of long segments of DNA sequences encompassing the β- and
α-globin gene clusters has provided an unprecedented data base for analysis
of genome evolution and regulation of gene clusters. A newly developed computer
tool kit generates local alignments between such long sequences in a
space-efficient manner, helps the user analyze the alignments effectively, and finds
consistently aligning blocks of sequences in multiple pairwise comparisons."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1993</PY>
<VO>10</VO>
<NO>1</NO>
<PP>73-102</PP>
</SEQ>

<SEQ>
<UI>0096   Barker,W.C.   Protein Sequence Datab.. Methods Enzymol 90 
183:31-49
</UI>
<AU>Barker WC;
    George DG;
    Hunt LT
</AU>
<TI>Protein Sequence Database
</TI>
<SU>Sequence database;
    USA;
    Protein
</SU>
<AB>"The Protein Sequence Database has been maintained by researchers at the
National Biomedical Research Foundation (NBRF) since the early 1960s. ...
Currently the NBRF effort is supported as part of the Protein Identification
Resource (PIR) project funded by the NIH Division of Research Resources, the
National Library of Medicine, and the National Institute of General Medical
Sciences. The main purpose of this resource is to aid the research community in
the identification and interpretation of protein sequence information."
</AB>
<JT>Methods Enzymol</JT>
<PY>1990</PY>
<VO>183</VO>
<PP>31-49</PP>
</SEQ>

<SEQ>
<UI>0097   Kolaskar,A.S. Sequence Alignment App.. J.Mol.Biol.     92 
223:1053-1061
</UI>
<AU>Kolaskar AS;
    Kulkarni-Kale U
</AU>
<TI>Sequence Alignment Approach to Pick Up Conformationally Similar Protein
Fragments
</TI>
<SU>Scoring;
    Substitution;
    Pairwise alignment;
    India;
    Sequence alignment;
    Fragment;
    Protein
</SU>
<AB>"A weight matrix, called Conformational Similarity Weight (CSW) matrix,
was prepared using the conformational similarity index. This weight matrix was
used to align sequences of 21 pairs of proteins whose crystal structures are
known. ... Such an approach allows us to pick up conformationally similar
protein fragments with more than 67% accuracy."
</AB>
<JT>J Mol Biol</JT>
<PY>1992</PY>
<VO>223</VO>
<PP>1053-1061</PP>
</SEQ>

<SEQ>
<UI>0098   Clark,S.P.    MALIGNED: A Multiple S.. Comput.Appl.Bio 92 
8(6):535-538
</UI>
<AU>Clark SP
</AU>
<TI>MALIGNED: A Multiple Sequence Alignment Editor
</TI>
<SU>Multiple alignment;
    Program;
    CA;
    Sequence alignment;
    Editor
</SU>
<AB>"A multiple sequence alignment editor is described which runs on a VAX/VMS
system and can exchange data with a number of other programs, including those of
the Genetics Computer Group (GCG). Up to 199 sequences can be aligned. The
quality of the alignment can be easily judged during its development because the
display attributes to each character are determined by the way it matches the
other sequences."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1992</PY>
<VO>8</VO>
<NO>6</NO>
<PP>535-538</PP>
</SEQ>

<SEQ>
<UI>0099   Faulkner,D.V. Multiple Aligned Seque.. Trends Biochem. 88 
13:321-322
</UI>
<AU>Faulkner DV;
    Jurka J
</AU>
<TI>Multiple Aligned Sequence Editor (MASE)
</TI>
<SU>Multiple alignment;
    Program;
    Editor;
    Sequence analysis;
    USA
</SU>
<AB>"Cognitive capacities of the human brain can not, so far, be matched by
computers. Even well optimized computer programs have limited flexibility in
addressing the variety of problems associated with sequence analysis. Hence, we
were motivated to design a Multiple Aligned Sequence Editor (MASE) which
combines manual sequence manipulations with standard computer analysis."
</AB>
<JT>Trends Biochem Sci</JT>
<PY>1988</PY>
<VO>13</VO>
<PP>321-322</PP>
</SEQ>

<SEQ>
<UI>0100   Stockwell,P.A HOMED: A Homologous Se.. Comput.Appl.Bio 87 
3(1):37-43
</UI>
<AU>Stockwell PA;
    Petersen GB
</AU>
<TI>HOMED: A Homologous Sequence Editor
</TI>
<SU>Program;
    NZ;
    Display;
    Consensus sequence;
    Parallel;
    Editor
</SU>
<AB>"The alignment of homologous sequences with each other and their display
has proved a difficult task, despite a frequent requirement for this process.
HOMED enables related sequences to be edited and listed in parallel with each
other. ... HOMED provides functions for listing the sequences in a variety of
formats and for generating a consensus sequence as well as providing a series of
tools for maintenance of the sequence database."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1987</PY>
<VO>3</VO>
<NO>1</NO>
<PP>37-43</PP>
</SEQ>

<SEQ>
<UI>0101   Thirup,S.     ALMA, An Editor for La.. Proteins Struct 90 
7:291-295
</UI>
<AU>Thirup S;
    Larsen NE
</AU>
<TI>ALMA, An Editor for Large Sequence Alignments
</TI>
<SU>Multiple alignment;
    Management;
    Program;
    DK;
    Sequence alignment;
    Display;
    Editor
</SU>
<AB>"A dedicated sequence editor, ALMA, was developed for aligning many
sequences of proteins or RNA molecules or longer DNA fragments. Like previously
published editors, ALMA is menu directed, screen oriented, and offers similarity
and consensus display. ALMA has the additional features of collective movement
of sequences, acceptance of input from many sources including structure files
and databases, secondary structure display, and easy merging of alignments. ...
The program allows interaction between manual and automatic alignment."
</AB>
<JT>Proteins Struct Funct Genet</JT>
<PY>1990</PY>
<VO>7</VO>
<PP>291-295</PP>
</SEQ>

<SEQ>
<UI>0102   Knox,E.B.     Chloroplast Genome Rea.. Mol.Biol.Evol.  93 
10(2):414-430
</UI>
<AU>Knox EB;
    Downie SR;
    Palmer JD
</AU>
<TI>Chloroplast Genome Rearrangements and the Evolution of Giant Lobelias from
Herbaceous Ancestors
</TI>
<SU>Genome;
    Rearrangement;
    Deletion;
    Inversion;
    USA;
    Evolution;
    Chloroplast;
    Ancestor
</SU>
<AB>"Phylogenetic relationships among 16 species of Lobelia and single
representatives of Monopsis and Sclerotheca (Lobeliaceae) were assessed by
mapping restriction sites and major structural rearrangements (deletions and
inversions) in the large single-copy region of the chloroplast genome. Eleven
inversions and five different gene arrangements were found. A deletion involving
ORF512 is associated with many of the inversions, and all inversion endpoints
are located in intergenic spacer regions."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1993</PY>
<VO>10</VO>
<NO>2</NO>
<PP>414-430</PP>
</SEQ>

<SEQ>
<UI>0103   Henikoff,S.   Amino Acid Substitutio.. Proc.Nat.Acad.S 92 
89:10915-10919
</UI>
<AU>Henikoff S;
    Henikoff JG
</AU>
<TI>Amino Acid Substitution Matrices from Protein Blocks
</TI>
<SU>Sequence proximity;
    Substitution;
    USA;
    Scoring;
    Amino acid;
    Protein
</SU>
<AB>"Methods for alignment of protein sequences typically measure similarity
by using a substitution matrix with scores for all possible exchanges of one
amino acid with another. The most widely used matrices are based on the Dayhoff
model of evolutionary rates. Using a different approach, we have derived
substitution matrices from about 2000 blocks of aligned sequence segments
characterizing more than 500 groups of related proteins. This led to marked
improvements in alignments and in searches using queries from each of the
groups."
</AB>
<JT>Proc Nat Acad Sci USA</JT>
<PY>1992</PY>
<VO>89</VO>
<PP>10915-10919</PP>
</SEQ>

<SEQ>
<UI>0104   Fuchs,R.      Molecular Biological D.. Trends Biotechn 92 
10(1):61-66
</UI>
<AU>Fuchs R;
    Rice P;
    Cameron GN
</AU>
<TI>Molecular Biological Databases - Present and Future
</TI>
<SU>Sequence database;
    DE;
    Genome;
    Mapping
</SU>
<AB>"The importance of databases as a research tool in molecular biology is
growing steadily, and a wide range of databases relevant to genome research is
currently available. However, the design of current databases is inadequate for
accurate representation and analysis of the results of large-scale genome
mapping and sequencing projects. A new generation of databases is required to
master the challenges of the future." The challenges discussed concern data
acquisition, data distribution, data interpretation, flexibility of data
representation, database integration and database design.
</AB>
<JT>Trends Biotechnol</JT>
<PY>1992</PY>
<VO>10</VO>
<NO>1</NO>
<PP>61-66</PP>
</SEQ>

<SEQ>
<UI>0105   Orcutt,B.C.   Protein and Nucleic Ac.. Annu.Rev.Biophy 83 
12:419-441
</UI>
<AU>Orcutt BC;
    George DG;
    Dayhoff MO
</AU>
<TI>Protein and Nucleic Acid Sequence Database Systems
</TI>
<SU>Database search;
    Sequence database;
    USA;
    Protein;
    Nucleic acid
</SU>
<AB>"Several groups currently collect data and maintain large-scale
computerized nucleic acid sequence databases. These include the National
Biomedical Research Foundation (NBRF), the Los Alamos National Laboratory, the
European Molecular Biology Laboratory, and the Molecular Evolution group at
Lyon. ... Only the NBRF group maintains a comprehensive protein data collection
that is available on-line. In this review we primarily describe the NBRF system,
which at the present time contains the largest and most comprehensive data
collections and the most integrated on-line distribution system."
</AB>
<JT>Annu Rev Biophys Bioeng</JT>
<PY>1983</PY>
<VO>12</VO>
<PP>419-441</PP>
</SEQ>

<SEQ>
<UI>0106   Gutell,R.R.   Identifying Constraint.. Nucleic Acids R 92 
20(21):5785-57
</UI>
<AU>Gutell RR;
    Power A;
    Hertz GZ;
    Putz EJ;
    Stormo GD
</AU>
<TI>Identifying Constraints on the Higher-Order Structure of RNA: Continued
Development and Application of Comparative Sequence Analysis Methods
</TI>
<SU>Multiple alignment;
    Structure;
    USA;
    Sequence analysis;
    RNA
</SU>
<AB>"Comparative sequence analysis addresses the problem of RNA folding and
RNA structural diversity, and is responsible for determining the folding of many
RNA molecules. ... Comparative structure analysis requires an alignment of those
sequences that make up the collection. The better the alignment, the more
meaningful the information that can be discerned. Initially sequences are
aligned for maximum primary structure homology. As secondary structure elements
are identified and phylogenetically proven, these features, in addition to
primary structure conservation, serve to constrain the juxtaposition of
sequences."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1992</PY>
<VO>20</VO>
<NO>21</NO>
<PP>5785-5795</PP>
</SEQ>

<SEQ>
<UI>0107   Marck,C.      'DNA Strider': A 'C' P.. Nucleic Acids R 88 
16(5):1829-183
</UI>
<AU>Marck C
</AU>
<TI>'DNA Strider': A 'C' Program for the Fast Analysis of DNA and Protein
Sequences on the Apple Macintosh Family of Computers
</TI>
<SU>Sequence analysis;
    FR;
    Program;
    Editor;
    Restriction;
    Dictionary match;
    Protein;
    DNA
</SU>
<AB>The program "has been designed as an easy to learn and use program as well
as a fast and efficient tool for the day-to-day sequence analysis work. The
program consists of a multi-window sequence editor and of various DNA and
Protein analysis functions. ... The restriction sites search uses a newly
designed fast hexamer look-ahead algorithm. Typical runtime for the search of
all sites with a library of 130 restriction endonucleases is 1 second per 10000
bases."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1988</PY>
<VO>16</VO>
<NO>5</NO>
<PP>1829-1836</PP>
</SEQ>

<SEQ>
<UI>0108   Blum,A.       Linear Approximation o.. J.Assoc.Comput. 94 
41(4):630-647
</UI>
<AU>Blum A;
    Jiang T;
    Li M;
    Tromp J;
    Yannakakis M
</AU>
<TI>Linear Approximation of Shortest Superstrings
</TI>
<SU>Supersequence;
    Shortest common;
    Approximation;
    USA
</SU>
<AB>Also Proc. 23rd ACM Symp. on Theory of Computing, 1991, 328-336. "We
consider the following problem: given a collection of strings s1, ..., sm, find
the shortest string s such that each si appears as a substring (a consecutive
block) of s. Although this problem is known to be NP-hard, a simple greedy
procedure appears to do quite well and is routinely used in DNA sequencing ....
We show that the greedy algorithm does in fact achieve a constant factor
approximation, proving an upper bound of 4n. Furthermore, we present a simple
modified version of the greedy algorithm that we show produces a superstring of
length at most 3n."
</AB>
<JT>J Assoc Comput Mach</JT>
<PY>1994</PY>
<VO>41</VO>
<NO>4</NO>
<PP>630-647</PP>
</SEQ>

<SEQ>
<UI>0109   Gallant,J.    On Finding Minimal Len.. J.Comput.System 80 20:50-58
</UI>
<AU>Gallant J;
    Maier D;
    Storer JA
</AU>
<TI>On Finding Minimal Length Superstrings
</TI>
<SU>Supersequence;
    Complexity;
    USA
</SU>
<AB>"The superstring problem is: Given a set S of strings and a positive
integer K, does S have a superstring of length K? ... We consider the complexity
of the superstring problem. NP-completeness results dealing with sets of strings
over both finite and infinite alphabets are presented. Also, for a restricted
version of the superstring problem, a linear time algorithm is given."
</AB>
<JT>J Comput Systems Sci</JT>
<PY>1980</PY>
<VO>20</VO>
<PP>50-58</PP>
</SEQ>

<SEQ>
<UI>0110   Allison,L.    Restriction Site Mappi.. Comput.Appl.Bio 88 
4(1):97-101
</UI>
<AU>Allison L;
    Yee CN
</AU>
<TI>Restriction Site Mapping is in Separation Theory
</TI>
<SU>Restriction;
    Mapping;
    AU
</SU>
<AB>"A computer algorithm for restriction-site mapping consists of a generator
of partial maps and a consistency checker. This paper examines consistency
checking and argues that a method based on separation theory extracts the
maximum amount of information from fragment lengths in digest data. It results
in the minimum number of false maps being generated."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1988</PY>
<VO>4</VO>
<NO>1</NO>
<PP>97-101</PP>
</SEQ>

<SEQ>
<UI>0111   Jiang,T.      A Note on Shortest Sup.. Inform.Process. 92 
44(4):195-199
</UI>
<AU>Jiang T;
    Li M;
    Du DZ
</AU>
<TI>A Note on Shortest Superstrings with Flipping
</TI>
<SU>Supersequence;
    CA;
    Approximation
</SU>
<AB>"This paper considers an interesting variation of the [shortest common
superstring] problem: For a given set of strings S = {s1, ... , sm}, find a
shortest superstring that contains either si or siR for each i. The problem may
have applications in DNA sequencing practice when orientations of the fragments
in the target DNA molecule are unknown. We give a simple greedy algorithm and
prove a 4n approximation bound for it."
</AB>
<JT>Inform Process Lett</JT>
<PY>1992</PY>
<VO>44</VO>
<NO>4</NO>
<PP>195-199</PP>
</SEQ>

<SEQ>
<UI>0112   Karp,R.M.     Mapping the Genome: So.. ACM Sympos.Theo 93 
25:278-285
</UI>
<AU>Karp RM
</AU>
<TI>Mapping the Genome: Some Combinatorial Problems Arising in Molecular
Biology
</TI>
<SU>Genome;
    Mapping;
    Combinatorial;
    Clone;
    USA
</SU>
<AB>"In order to construct a physical map of a large DNA molecule it is
necessary to extract from it a large number of fragments called clones, obtain a
'fingerprint' of each clone, and then mathematically reassemble the DNA molecule
by determining how the clones overlap. This reassembly process leads to a number
of challenging algorithmic, combinatorial and probabilistic problems that are
currently handled in a primitive way, and should be grist for the mills of
theoretical computer scientists."
</AB>
<JT>ACM Sympos Theory Comput</JT>
<PY>1993</PY>
<VO>25</VO>
<PP>278-285</PP>
</SEQ>

<SEQ>
<UI>0113   Middendorf,M. More on the Complexity.. Theoret.Comput. 94 
125:205-228
</UI>
<AU>Middendorf M
</AU>
<TI>More on the Complexity of Common Superstring and Supersequence Problems
</TI>
<SU>Supersequence;
    DE;
    Complexity
</SU>
<AB>The author obtains NP-completeness results concerning decision versions of
the problems to find the Shortest Common Superstring, the Shortest Common
Supersequence, and cyclic and permutation variants of them.
</AB>
<JT>Theoret Comput Sci</JT>
<PY>1994</PY>
<VO>125</VO>
<PP>205-228</PP>
</SEQ>

<SEQ>
<UI>0114   Tarhio,J.     A Greedy Approximation.. Theoret.Comput. 88 
57:131-145
</UI>
<AU>Tarhio J;
    Ukkonen E
</AU>
<TI>A Greedy Approximation Algorithm for Constructing Shortest Common
Superstrings
</TI>
<SU>Multiple comparison;
    Supersequence;
    Knuth-Morris-Pratt;
    FI;
    Approximation;
    Compression;
    Shortest common;
    Algorithm
</SU>
<AB>"An approximation algorithm for the shortest common superstring problem is
developed, based on the Knuth-Morris-Pratt string-matching procedure and on the
greedy heuristics for finding longest Hamiltonian paths in weighted graphs.
Given a set R of strings, the algorithm constructs a common superstring for R in
O(mn) steps where m is the number of strings in R and n is the total length of
these strings. The performance of the algorithm is analysed in terms of the
compression in the common superstrings constructed ...."
</AB>
<JT>Theoret Comput Sci</JT>
<PY>1988</PY>
<VO>57</VO>
<PP>131-145</PP>
</SEQ>

<SEQ>
<UI>0115   Teng,S.H.     Approximating Shortest.. IEEE Sympos.Fou 93 
34:158-165
</UI>
<AU>Teng SH;
    Yao F
</AU>
<TI>Approximating Shortest Superstrings
</TI>
<SU>Supersequence;
    Shortest common;
    Approximation;
    USA
</SU>
<AB>"The Shortest Superstring Problem is to find a shortest possible string
that contains every string in a given set as substrings. This problem has
applications to data compression and DNA sequencing. As the problem is NP-hard
and MAX SNP-hard, approximation algorithms are of interest. We present a new
algorithm which always finds a superstring that is at most 2.89 times as long as
the shortest superstring. Our result improves the 3-approximation result of
Blum, Jiang, Li, Tromp, and Yannakakis [1991]."
</AB>
<JT>IEEE Sympos Found Comput Sci</JT>
<PY>1993</PY>
<VO>34</VO>
<PP>158-165</PP>
</SEQ>

<SEQ>
<UI>0116   Turner,J.S.   Approximation Algorith.. Inform.Comput.  89 
83(1):1-20
</UI>
<AU>Turner JS
</AU>
<TI>Approximation Algorithms for the Shortest Common Superstring Problem
</TI>
<SU>Supersequence;
    Search tree;
    Approximation;
    USA;
    Shortest common;
    Algorithm
</SU>
<AB>"The object of the shortest common superstring problem (SCS) is to find
the shortest possible string that contains every string in a given set as
substrings. As the problem is NP-complete, approximation algorithms are of
interest. ... We describe several approximation algorithms that produce
solutions that are always within a factor of two of optimum with respect to the
overlap measure. We also describe an efficient implementation of one of these,
using McCreight's compact suffix tree construction algorithm."
</AB>
<JT>Inform Comput</JT>
<PY>1989</PY>
<VO>83</VO>
<NO>1</NO>
<PP>1-20</PP>
</SEQ>

<SEQ>
<UI>0117   Bork,P.       Recognition of Functio.. FEBS Lett.      89 
257(1):191-195
</UI>
<AU>Bork P
</AU>
<TI>Recognition of Functional Regions in Primary Structures using a Set of
Property Patterns
</TI>
<SU>Database search;
    Pattern library;
    DE;
    Region;
    Motif;
    Pattern definition;
    Structure;
    Recognition
</SU>
<AB>"32 consensus patterns for a set of functional regions and structural
motifs in protein sequences were constructed. The pattern definition is
heuristic and based on 11 selected steric and physicochemical properties. By
comparison with these patterns, it was possible to identify, without false
detection, 1532 sites in 8702 protein sequences of SWISSPROT. Screening against
such a pattern library offers a considerable chance to identify functional
regions or structural motifs in proteins from which only the sequence is known."
</AB>
<JT>FEBS Lett</JT>
<PY>1989</PY>
<VO>257</VO>
<NO>1</NO>
<PP>191-195</PP>
</SEQ>

<SEQ>
<UI>0118   Johnson,M.S.  Comparisons of Protein.. Curr.Opin.Struc 91 
1:334-344
</UI>
<AU>Johnson MS
</AU>
<TI>Comparisons of Protein Structures
</TI>
<SU>Structure;
    Review;
    UK;
    Protein
</SU>
<AB>"The structures of proteins related by evolution are remarkably alike even
when the observed sequence similarities are statistically marginal or seemingly
non-existent. Similar protein substructures are found in proteins for which
there is no evidence of common ancestry and no similarity in their global
topology. Recent advances in the comparison of whole proteins, together with the
comparison and analysis of their parts, have paved the way for the use of
structural information in prediction and modelling, protein engineering,
structure and sequence alignments, and investigations of protein evolution ...."
</AB>
<JT>Curr Opin Struct Biol</JT>
<PY>1991</PY>
<VO>1</VO>
<PP>334-344</PP>
</SEQ>

<SEQ>
<UI>0119   Sali,A.       From Comparison of Pro.. Trends Biochem. 90 
15:235-240
</UI>
<AU>Sali A;
    Overington JP;
    Johnson MS;
    Blundell TL
</AU>
<TI>From Comparison of Protein Sequences and Structures to Protein Modelling
and Design
</TI>
<SU>Sequence comparison;
    Structure;
    UK;
    Protein
</SU>
<AB>"A useful approach to modelling proteins exploits knowledge of three-dimensional
structures determined by X-ray crystallography together with rules
defined by their analysis and comparison. ... In this review we shall consider
our own approach to protein modelling which can be completely automated, and in
which all decisions are rule based."
</AB>
<JT>Trends Biochem Sci</JT>
<PY>1990</PY>
<VO>15</VO>
<PP>235-240</PP>
</SEQ>

<SEQ>
<UI>0120   Gautheret,D.  Pattern Searching/Alig.. Comput.Appl.Bio 90 
6(4):325-331
</UI>
<AU>Gautheret D;
    Major F;
    Cedergren R
</AU>
<TI>Pattern Searching/Alignment with RNA Primary and Secondary Structures: An
Effective Descriptor for tRNA
</TI>
<SU>Pattern search;
    Sequence alignment;
    Motif;
    CA;
    Pattern match;
    Structure;
    RNA;
    Secondary
</SU>
<AB>"A convenient pattern-matching program using primary and higher-order
structural features has been developed based on a 'backtracking' algorithm. A
second implementation of the algorithm uses descriptors of structural features
(including primary sequences) to align a list of homologous or highly similar
sequences. An application of the pattern matcher to the search for tRNA and
group I intron structural motifs in sequence data banks is presented."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1990</PY>
<VO>6</VO>
<NO>4</NO>
<PP>325-331</PP>
</SEQ>

<SEQ>
<UI>0121   Gartmann,C.J. SQUIRREL: Sequence QUe.. Nucleic Acids R 91 
19(21):6033-60
</UI>
<AU>Gartmann CJ;
    Grob U
</AU>
<TI>SQUIRREL: Sequence QUery, Information Retrieval and REporting Library. A
Program Package for Analyzing Signals in Nucleic Acid Sequences for the VAX
</TI>
<SU>Multiple alignment;
    Segment;
    DE;
    Signal;
    Program;
    Nucleic acid;
    Retrieval;
    Query
</SU>
<AB>"A computer tool is described for comparison, analysis and search of
genetic signals. The method is based on sequence consensus matrices. It assumes
that a genetic signal (such as a promoter, enhancer or whatever) is composed of
several signal blocks separated from each other by variable distances. A set of
programs is presented to perform the analysis. ... The method is able to align
large sets of sequences within a few minutes and to check the quality of the
alignment."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1991</PY>
<VO>19</VO>
<NO>21</NO>
<PP>6033-6040</PP>
</SEQ>

<SEQ>
<UI>0122   Apostolico,A. Optimal Off-line Detec.. Theoret.Comput. 83 
22:297-315
</UI>
<AU>Apostolico A;
    Preparata FP
</AU>
<TI>Optimal Off-line Detection of Repetitions in a String
</TI>
<SU>Regularities;
    Search tree;
    USA;
    Data structure;
    Optimal;
    Repetition;
    Detection
</SU>
<AB>"An algorithm is presented to detect - within optimal time O(n log n) and
space O(n), off-line on a RAM - all of the distinct repetitions in a given
textstring on a finite alphabet. The proposed strategy is self-contained, as it
depends more heavily on algorithmic design considerations than on the
combinatorial properties of the output. It is based on a new data structure, the
leaf-tree, which is particularly suited to exploit simple properties of the
suffix tree associated with the string to be analyzed."
</AB>
<JT>Theoret Comput Sci</JT>
<PY>1983</PY>
<VO>22</VO>
<PP>297-315</PP>
</SEQ>

<SEQ>
<UI>0123   Apostolico,A. Structural Properties .. J.Comput.System 85 
31:394-411
</UI>
<AU>Apostolico A;
    Preparata FP
</AU>
<TI>Structural Properties of the String Statistics Problem
</TI>
<SU>String match;
    Search tree;
    USA;
    Regularities
</SU>
<AB>"A suitably weighted index tree ... can be easily adapted to store, for a
given string x and for all substrings w of x, the number of distinct instances
of w along x. ... If the substring w has nontrivial periods, however, the number
of distinct instances might differ from that of distinct non-overlapping
occurrences along x. It is shown here that O(n log n) storage units - n standing
for the length of x - are sufficient to organize this second kind of statistics,
in such a way that the maximum number of nonoverlapping instances for arbitrary
w along x can be retrieved in a number of character comparisons not exceeding
the length of w."
</AB>
<JT>J Comput Systems Sci</JT>
<PY>1985</PY>
<VO>31</VO>
<PP>394-411</PP>
</SEQ>

<SEQ>
<UI>0124   Blumer,A.     The Smallest Automaton.. Theoret.Comput. 85 40:31-55
</UI>
<AU>Blumer A;
    Blumer J;
    Haussler D;
    Ehrenfeucht A;
    Chen MT;
    Seiferas J
</AU>
<TI>The Smallest Automaton Recognizing the Subwords of a Text
</TI>
<SU>Regularities;
    Automata;
    USA
</SU>
<AB>"Let a partial deterministic finite automaton be a DFA in which each state
need not have a transition edge for each letter of the alphabet. We demonstrate
that the smallest partial DFA for the set of all subwords of a given word w,
|w| &gt; 2, has at most 2 |w| - 2 states and 3 |w| - 4 transition edges, independently
of the alphabet size. We give an algorithm to build this smallest partial DFA
from the input w on-line in linear time."
</AB>
<JT>Theoret Comput Sci</JT>
<PY>1985</PY>
<VO>40</VO>
<PP>31-55</PP>
</SEQ>

<SEQ>
<UI>0125   Hancart,C.    On Simon's String Sear.. Inform.Process. 93 
47(2):95-99
</UI>
<AU>Hancart C
</AU>
<TI>On Simon's String Searching Algorithm
</TI>
<SU>String match;
    FR;
    Complexity;
    String search;
    Algorithm
</SU>
<AB>"Simon has recently designed [a string matching] algorithm which can be
regarded as a compromise between the implementation of [a deterministic finite
automaton] and [the algorithm of Knuth, Morris and Pratt]. ... In this paper, we
extend Simon's work by studying the complexity of [variants of Simon's
algorithm]."
</AB>
<JT>Inform Process Lett</JT>
<PY>1993</PY>
<VO>47</VO>
<NO>2</NO>
<PP>95-99</PP>
</SEQ>

<SEQ>
<UI>0126   Claverie,J.M. Heuristic Informationa.. Nucleic Acids R 86 
14(1):179-196
</UI>
<AU>Claverie JM;
    Bougueleret L
</AU>
<TI>Heuristic Informational Analysis of Sequences
</TI>
<SU>Sequence analysis;
    Pattern discovery;
    Information content;
    FR;
    Statistical;
    Profile;
    N-gram;
    Heuristic
</SU>
<AB>"Nucleotide or amino-acid sequences are interpreted as successions of
words of length k (k-tuples) the frequencies of which are highly variable in
different statistical populations of genes or proteins. After building k-tuple
reference tables from coherent subsets or entire data banks, the local
information content profile of individual sequences is drawn. Anomalous regions
(peaks or depressions) of such a profile can lead to the discovery and
identification of specific sequence patterns."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1986</PY>
<VO>14</VO>
<NO>1</NO>
<PP>179-196</PP>
</SEQ>

<SEQ>
<UI>0127   Prestridge,D. SIGNAL SCAN: A Compute.. Comput.Appl.Bio 91 
7(2):203-206
</UI>
<AU>Prestridge DS
</AU>
<TI>SIGNAL SCAN: A Computer Program that Scans DNA Sequences for Eukaryotic
Transcriptional Elements
</TI>
<SU>Dictionary match;
    USA;
    Consensus sequence;
    Signal;
    Program;
    DNA
</SU>
<AB>"SIGNAL SCAN uses both specific sequence elements derived from biochemical
characterization and elements from derived consensus sequences to match against
a user input DNA sequence. ... The matching algorithm is a simple string match."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1991</PY>
<VO>7</VO>
<NO>2</NO>
<PP>203-206</PP>
</SEQ>

<SEQ>
<UI>0128   Heringa,J.    A Method to Recognize .. Proteins Struct 93 
17:391-411
</UI>
<AU>Heringa J;
    Argos P
</AU>
<TI>A Method to Recognize Distant Repeats in Protein Sequences
</TI>
<SU>Regularities;
    Multiple alignment;
    DE;
    Display;
    Sequence alignment;
    Consensus sequence;
    Repeat;
    Protein
</SU>
<AB>"An automated algorithm is presented that delineates protein sequence
fragments which display similarity. The method incorporates a selection of a
number of local nonoverlapping sequence alignments with the highest similarity
scores and a graph-theoretical approach to elucidate the consistent start and
end points of the fragments comprising one or more ensembles of related
subsequences. The procedure allows the simultaneous identification of different
types of repeats within one sequence. A multiple alignment of the resulting
fragments is performed and a consensus sequence derived from the ensemble(s)."
</AB>
<JT>Proteins Struct Funct Genet</JT>
<PY>1993</PY>
<VO>17</VO>
<PP>391-411</PP>
</SEQ>

<SEQ>
<UI>0129   Vingron,M.    Sequence Alignment and.. J.Mol.Biol.     94 235:1-12
</UI>
<AU>Vingron M;
    Waterman MS
</AU>
<TI>Sequence Alignment and Penalty Choice. Review of Concepts, Case Studies
and Implications
</TI>
<SU>Multiple alignment;
    Sequence proximity;
    USA;
    Sequence alignment;
    Gap;
    Review
</SU>
<AB>The paper reviews two recent advances in algorithms and probability that
enable us to take a new approach to the question of selecting parameters for
sequence alignment algorithms. "From this we gain a better understanding of the
dependence of alignments on parameters in general. We propose novel criteria to
detect biologically good alignments and highlight some specific features about
the interaction between similarity matrices and gap penalties."
</AB>
<JT>J Mol Biol</JT>
<PY>1994</PY>
<VO>235</VO>
<PP>1-12</PP>
</SEQ>

<SEQ>
<UI>0130   de Almeida,N. A String-Matching Algo.. Inform.Process. 93 
47(5):257-259
</UI>
<AU>de Almeida NF Jr;
    Barbosa VC
</AU>
<TI>A String-Matching Algorithm for the CREW PRAM
</TI>
<SU>Parallel;
    BR;
    String match;
    Pattern match;
    Algorithm
</SU>
<AB>"We present an algorithm for the CREW PRAM to find all occurrences of a
pattern of size m in a text of size n. For a fixed alphabet and m = O(log2 n),
the algorithm runs in O(log m) time on O(n / log m) processors. Under these
restrictions, it is optimal and improves on the time complexity of previously
known string-matching algorithms for the CREW PRAM."
</AB>
<JT>Inform Process Lett</JT>
<PY>1993</PY>
<VO>47</VO>
<NO>5</NO>
<PP>257-259</PP>
</SEQ>

<SEQ>
<UI>0131   Giancarlo,R.  Fully Dynamic Dictiona..                 92
</UI>
<AU>Giancarlo R;
    Amir A;
    Farach M;
    Galil Z;
    Park K
</AU>
<TI>Fully Dynamic Dictionary Matching
</TI>
<SU>Dictionary match;
    USA;
    Dynamic
</SU>
<AB>Preprint, 19 pp. "We consider the dynamic dictionary matching problem. ...
Our algorithms improve the deletion scheme presented in Amir and Farach's recent
solution for the dynamic dictionary matching problem." Document No.
11272-920311-12TM, AT&amp;T Bell Laboratories, 600 Mountain Avenue, Murray Hill,
NJ 07974-2070, USA.
</AB>
<PY>1992</PY>
</SEQ>

<SEQ>
<UI>0132   Posfai,J.     VISA: Visual Sequence .. Comput.Appl.Bio 94 
10(5):537-544
</UI>
<AU>Posfai J;
    Szaraz Z;
    Roberts RJ
</AU>
<TI>VISA: Visual Sequence Analysis for the Comparison of Multiple Amino Acid
Sequences
</TI>
<SU>Sequence analysis;
    Multiple comparison;
    Amino acid;
    USA
</SU>
<AB>"VISA (VIsual Sequence Analysis) is a software package that displays
global similarities within a set of related protein sequences. The program
identifies amino acid patterns that are common to many members of the set of
sequences and displays them as a series of histograms. Individual peaks on the
display can be assigned a color and analogous peaks in the other sequences are
then automatically marked in the same color. This can be repeated for each
significant peak and leads to a display in which major matching segments of
multiple amino acid sequences appear as dominant peaks of the histograms with
matching colors."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1994</PY>
<VO>10</VO>
<NO>5</NO>
<PP>537-544</PP>
</SEQ>

<SEQ>
<UI>0133   Nakai,K.      Gnome - an Internet-Ba.. Comput.Appl.Bio 94 
10(5):547-550
</UI>
<AU>Nakai K;
    Tokimori T;
    Ogiwara A;
    Uchiyama I;
    Niiyama T
</AU>
<TI>Gnome - an Internet-Based Sequence Analysis Tool
</TI>
<SU>Sequence analysis;
    Electronic mail;
    Genome;
    JP
</SU>
<AB>"Gnome (GenomeNet Open Mail-service Environment) is a sequence analysis
tool that enables an end-user to make use of several Internet- (mainly e-mail)
based services with an easy-to-use graphical user interface. Users can conduct
homology and motif searches, and database-entry retrieval against the latest
databases by emitting search requests to and receiving their results from a
search-server by e-mail. The search results are viewed and managed efficiently
with this system. The Macintosh and X (Motif) versions of the Gnome client and
the UNIX version of the Gnome server are available to academic users free of
charge."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1994</PY>
<VO>10</VO>
<NO>5</NO>
<PP>547-550</PP>
</SEQ>

<SEQ>
<UI>0134   Colussi,L.    Fastest Pattern Matchi.. J.Algorithms    94 
16:163-189
</UI>
<AU>Colussi L
</AU>
<TI>Fastest Pattern Matching in Strings
</TI>
<SU>String match;
    Boyer-Moore;
    Pattern match;
    Italy
</SU>
<AB>"An algorithm is presented that substantially improves the algorithm of
Boyer and Moore for pattern matching in strings, both in the worst case and in
the average. ... The new algorithm performs 2n character comparisons in the
worst case while the Boyer and Moore algorithm requires 3n comparisons; the new
algorithm requires fewer comparisons than Boyer and Moore on the average .... As
a shortcoming of the new algorithm, the preprocessing of the pattern requires
O(m) time on the average but O(m^2) in the worst case."
</AB>
<JT>J Algorithms</JT>
<PY>1994</PY>
<VO>16</VO>
<PP>163-189</PP>
</SEQ>

<SEQ>
<UI>0135   Bailey,T.A.   Fast String Searching .. Inform.Process. 80 
11(3):130-133
</UI>
<AU>Bailey TA;
    Dromey RG
</AU>
<TI>Fast String Searching by Finding Subkeys in Subtext
</TI>
<SU>String match;
    Boyer-Moore;
    String search
</SU>
<AB>"Our algorithm dominates the Boyer and Moore algorithm for binary
alphabets, but is inferior for large alphabets and short keys. This algorithm is
an application of the technique used by Aho and Corasick to a single key. ...
The algorithm gains its speed by only looking at every bth character of the
text."
</AB>
<JT>Inform Process Lett</JT>
<PY>1980</PY>
<VO>11</VO>
<NO>3</NO>
<PP>130-133</PP>
</SEQ>

<SEQ>
<UI>0136   Baker,T.P.    A Technique for Extend.. SIAM J.Comput.  78 
7(4):533-541
</UI>
<AU>Baker TP
</AU>
<TI>A Technique for Extending Rapid Exact-Match String Matching to Arrays of
more than One Dimension
</TI>
<SU>String match;
    Knuth-Morris-Pratt;
    USA;
    Multidimensional;
    Pattern match;
    Pattern recognition
</SU>
<AB>"A class of algorithms is presented for very rapid on-line detection of
occurrences of a fixed set of pattern arrays as embedded subarrays in an input
array. By reducing the array problem to a string matching problem in a natural
way, it is shown that efficient string matching algorithms may be applied to
arrays. This is illustrated by use of the string-matching algorithm of Knuth,
Morris and Pratt."
</AB>
<JT>SIAM J Comput</JT>
<PY>1978</PY>
<VO>7</VO>
<NO>4</NO>
<PP>533-541</PP>
</SEQ>

<SEQ>
<UI>0137   Blumer,A.     Complete Inverted File.. J.Assoc.Comput. 87 
34(3):578-595
</UI>
<AU>Blumer A;
    Blumer J;
    Haussler D;
    McConnell R;
    Ehrenfeucht A
</AU>
<TI>Complete Inverted Files for Efficient Text Retrieval and Analysis
</TI>
<SU>String match;
    Search tree;
    Automata;
    USA;
    Retrieval;
    Data structure
</SU>
<AB>"A data structure that implements a complete inverted file for [a finite
set S of texts] that occupies linear space and can be built in linear time,
using the uniform-cost RAM model, is given. Using this data structure, the time
for each of the above query functions [find(w), freq(w), locations(w)] is
optimal. To accomplish this, techniques from the theory of finite automata and
the work on suffix trees are used to build a deterministic finite automaton that
recognizes the set of all subwords of the set S."
</AB>
<JT>J Assoc Comput Mach</JT>
<PY>1987</PY>
<VO>34</VO>
<NO>3</NO>
<PP>578-595</PP>
</SEQ>

<SEQ>
<UI>0138   Bookstein,A.  On Harrison's Substrin.. Comm.ACM        73 
16(3):180-181
</UI>
<AU>Bookstein A
</AU>
<TI>On Harrison's Substring Testing Technique
</TI>
<SU>Significance;
    USA;
    String match;
    Probabilistic
</SU>
<AB>"This note comments on a technique by Malcolm Harrison [1971] that tests
whether a given string of characters, S1, is a substring of another string of
characters, S2. ... We here note that, based on the assumptions inherent in
Harrison's development, it is possible to derive a more exact expression for the
probability of a false match."
</AB>
<JT>Comm ACM</JT>
<PY>1973</PY>
<VO>16</VO>
<NO>3</NO>
<PP>180-181</PP>
</SEQ>

<SEQ>
<UI>0139   Crochemore,M. An Optimal Algorithm f.. Inform.Process. 81 
12(5):244-250
</UI>
<AU>Crochemore M
</AU>
<TI>An Optimal Algorithm for Computing the Repetitions in a Word
</TI>
<SU>Regularities;
    Repetition;
    FR;
    Optimal;
    Word;
    Algorithm
</SU>
<AB>"This paper presents an algorithm to compute all the repetitions of
primitive factors in a word x [of length n] in time O(n log n). A
straightforward adaptation of the Knuth, Morris and Pratt's string-matching
algorithm also allows to solve the problem, but in time O(n^2)."
</AB>
<JT>Inform Process Lett</JT>
<PY>1981</PY>
<VO>12</VO>
<NO>5</NO>
<PP>244-250</PP>
</SEQ>

<SEQ>
<UI>0140   Crochemore,M. Computing LCF in Linea.. EATCS Bull.     86 30:57-61
</UI>
<AU>Crochemore M
</AU>
<TI>Computing LCF in Linear Time
</TI>
<SU>String match;
    Automata;
    FR
</SU>
<AB>"The LCF of two words u and v of A* is the length of a longest factor
common to u and v. A linear algorithm to compute LCF is given, based on a linear
time algorithm to build the minimal suffix automaton of a word. The algorithm
yields a real-time string-matching algorithm."
</AB>
<JT>EATCS Bull</JT>
<PY>1986</PY>
<VO>30</VO>
<PP>57-61</PP>
</SEQ>

<SEQ>
<UI>0141   Fitch,W.M.    Unresolved Problems in.. Lect.Math.Life  86 17:1-18
</UI>
<AU>Fitch WM
</AU>
<TI>Unresolved Problems in DNA Sequence Analysis
</TI>
<SU>Sequence analysis;
    Review;
    USA;
    Sequence comparison;
    Sequence alignment;
    Phylogeny;
    DNA
</SU>
<AB>"Problems in the analysis of DNA sequences can be of six classes.
[Structure analysis. Sequence comparison. Sequence alignment. Phylogeny
estimation. Sequence estimation, given a phylogeny. Analysis of mutation rates.]
These problems are complicated by biological considerations such as that changes
may occur in several ways that are context dependent. A number of unsolved
problems in each of these classes are formulated."
</AB>
<JT>Lect Math Life Sci</JT>
<PY>1986</PY>
<VO>17</VO>
<PP>1-18</PP>
</SEQ>

<SEQ>
<UI>0142   Dromey,R.G.   A Fast Algorithm for T.. Austral.Comput. 79 
11(2):63-67
</UI>
<AU>Dromey RG
</AU>
<TI>A Fast Algorithm for Text Comparison
</TI>
<SU>Longest common;
    AU;
    String match;
    Edit;
    Algorithm
</SU>
<AB>"Two new algorithms for finding the longest unbroken common subsequence in
a pair of text files are presented. The algorithms are simple to implement,
economical on space requirements and they are highly efficient for the
comparison of pairs of text files for all ranges of overlap both large and
small."
</AB>
<JT>Austral Comput J</JT>
<PY>1979</PY>
<VO>11</VO>
<NO>2</NO>
<PP>63-67</PP>
</SEQ>

<SEQ>
<UI>0143   Ehrenfeucht,A A New Distance Metric .. Discrete Appl.M 88 
20:191-203
</UI>
<AU>Ehrenfeucht A;
    Haussler D
</AU>
<TI>A New Distance Metric on Strings Computable in Linear Time
</TI>
<SU>Sequence comparison;
    Sequence proximity;
    Correction;
    USA;
    Distance
</SU>
<AB>"We describe a new metric for sequence comparison that emphasizes global
similarity over sequential matching at the local level. It has the advantage
over the Levenshtein metric that strings of lengths n and m can be compared in
time proportional to n+m instead of nm. Various mathematical properties of the
metric are established."
</AB>
<JT>Discrete Appl Math</JT>
<PY>1988</PY>
<VO>20</VO>
<PP>191-203</PP>
</SEQ>

<SEQ>
<UI>0144   Faloutsos,C.  Access Methods for Text  ACM Comput.Surv 85 
17(1):49-74
</UI>
<AU>Faloutsos C
</AU>
<TI>Access Methods for Text
</TI>
<SU>Retrieval;
    Review;
    CA;
    Compression;
    String match
</SU>
<AB>"This paper compares text retrieval methods intended for office systems.
The operational requirements of the office environment are discussed, and
retrieval methods from database systems and from information retrieval systems
are examined. We classify these methods and examine the most interesting
representatives of each class. Attempts to speed up retrieval with special
purpose hardware are also presented, and issues such as approximate string
matching and compression are discussed."
</AB>
<JT>ACM Comput Surveys</JT>
<PY>1985</PY>
<VO>17</VO>
<NO>1</NO>
<PP>49-74</PP>
</SEQ>

<SEQ>
<UI>0145   Gereb-Graus,M Three One-Way Heads Ca.. J.Comput.System 94 
48(1):1-8
</UI>
<AU>Gereb-Graus M;
    Li M
</AU>
<TI>Three One-Way Heads Cannot Do String Matching
</TI>
<SU>Automata;
    USA;
    String match
</SU>
<AB>"We prove that three-head one-way DFA [deterministic finite automata]
cannot perform string matching, that is, no three-head one-way DFA accepts the
language L = { x#y : x is a substring of y, where x,y are in {0,1}* }. This
answers the k = 3 case of the question whether a k-head one-way DFA can perform
string matching, raised by Galil and Seiferas."
</AB>
<JT>J Comput Systems Sci</JT>
<PY>1994</PY>
<VO>48</VO>
<NO>1</NO>
<PP>1-8</PP>
</SEQ>

<SEQ>
<UI>0146   Harrison,M.C. Implementation of the .. Comm.ACM        71 
14(12):777-779
</UI>
<AU>Harrison MC
</AU>
<TI>Implementation of the Substring Test by Hashing
</TI>
<SU>String match;
    USA
</SU>
<AB>"A technique is described for implementing the test which determines if
one string is a substring of another. When there is low probability that the
test will be satisfied, it is shown how the operation can be speeded up
considerably if it is preceded by a test on appropriately chosen hash codes of
the strings."
</AB>
<JT>Comm ACM</JT>
<PY>1971</PY>
<VO>14</VO>
<NO>12</NO>
<PP>777-779</PP>
</SEQ>

<SEQ>
<UI>0147   Kempf,M.      Time Optimal Left to R.. Acta Inform.    87 
24(4):461-474
</UI>
<AU>Kempf M;
    Bayer R;
    Guntzer U
</AU>
<TI>Time Optimal Left to Right Construction of Position Trees
</TI>
<SU>String match;
    DE;
    Search tree;
    Optimal
</SU>
<AB>"We are presenting a new algorithm for the on-line construction of
position trees. Reading a given input string from left to right we are
generating its position tree with the aid of the general concept of infix trees.
An additional chain structure within the trees, called tail node connection,
enables us to construct the tree within the best possible time (proportional to
the number of nodes). ... The position tree for a given text is a trie index
spelling out for every position the shortest substring starting at that position
and occurring nowhere else in the text."
</AB>
<JT>Acta Inform</JT>
<PY>1987</PY>
<VO>24</VO>
<NO>4</NO>
<PP>461-474</PP>
</SEQ>

<SEQ>
<UI>0148   Main,M.G.     An O(n log n) Algorith.. J.Algorithms    84 
5(3):422-432
</UI>
<AU>Main MG;
    Lorentz RJ
</AU>
<TI>An O(n log n) Algorithm for Finding all Repetitions in a String
</TI>
<SU>Regularities;
    Knuth-Morris-Pratt;
    USA;
    Repetition;
    Algorithm
</SU>
<AB>"Any nonempty string of the form xx is called a repetition. ... The
algorithm is based on a linear algorithm to find all the new repetitions formed
when two strings are concatenated. This linear algorithm is possible because new
repetitions of equal length must occur in blocks with consecutive starting
positions. The linear algorithm uses a variation of the Knuth-Morris-Pratt
algorithm to find all partial occurrences of a pattern within a text string. It
is also shown that no algorithm based on comparisons of symbols can improve
O(n log n)."
</AB>
<JT>J Algorithms</JT>
<PY>1984</PY>
<VO>5</VO>
<NO>3</NO>
<PP>422-432</PP>
</SEQ>

<SEQ>
<UI>0149   Meyer,B.      Incremental String Mat.. Inform.Process. 85 
21(5):219-227
</UI>
<AU>Meyer B
</AU>
<TI>Incremental String Matching
</TI>
<SU>Dictionary match;
    USA;
    String match;
    Complexity
</SU>
<AB>"The problem studied in this paper is to search a given text for
occurrences of certain strings, in the particular case where the set of strings
may change as the search proceeds. ... We show how [the algorithm of Aho and
Corasick] can be modified to allow incremental diagram construction, so that new
keywords may be entered at any time during the search. The incremental algorithm
presented essentially retains the time and space complexities of the
non-incremental one."
</AB>
<JT>Inform Process Lett</JT>
<PY>1985</PY>
<VO>21</VO>
<NO>5</NO>
<PP>219-227</PP>
</SEQ>

<SEQ>
<UI>0150   Moller-Nielse Experiments with a Fas.. Inform.Process. 84 
18(3):129-135
</UI>
<AU>Moller-Nielsen P;
    Staunstrup J
</AU>
<TI>Experiments with a Fast String Searching Algorithm
</TI>
<SU>String match;
    Parallel;
    Boyer-Moore;
    DK;
    String search;
    Algorithm
</SU>
<AB>"Consider the problem of finding the first occurrence of a particular
pattern in a (long) string of characters. Boyer and Moore (1977) found a fast
algorithm for doing this. Here we consider how this algorithm behaves when
executed on a multiprocessor. It is shown that a simple implementation performs
very well. This claim is based on experiments performed on the Multi-Maren
multiprocessor by the present authors (1982)."
</AB>
<JT>Inform Process Lett</JT>
<PY>1984</PY>
<VO>18</VO>
<NO>3</NO>
<PP>129-135</PP>
</SEQ>

<SEQ>
<UI>0151   Franklin,N.C. Conservation of Genome.. J.Mol.Biol.     84 
181:75-84
</UI>
<AU>Franklin NC
</AU>
<TI>Conservation of Genome Form but not Sequence in the Transcription
Antitermination Determinants of Bacteriophages λ, φ21 and P22
</TI>
<SU>Genome;
    Rearrangement;
    Sequence comparison;
    USA
</SU>
<AB>"Comparisons are made among DNA sequences upstream from terminators in
both leftwards and rightwards early operons of related coliphages λ, φ21 and
P22. ... Despite almost total disparity of DNA sequence, the three genomes can
be discerned to include the same elements in the same order and spacing ...."
</AB>
<JT>J Mol Biol</JT>
<PY>1984</PY>
<VO>181</VO>
<PP>75-84</PP>
</SEQ>

<SEQ>
<UI>0152   Rivest,R.L.   Partial-Match Retrieva.. SIAM J.Comput.  76 
5(1):19-50
</UI>
<AU>Rivest RL
</AU>
<TI>Partial-Match Retrieval Algorithms
</TI>
<SU>Partial match;
    Match with don't cares;
    USA;
    Retrieval;
    Algorithm
</SU>
<AB>"We examine the efficiency of hash-coding and tree-search algorithms for
retrieving from a file of k-letter words all words which match a
partially-specified input query word (for example, retrieving all six-letter
English words of the form S**R*H where '*' is a 'don't care' character)."
</AB>
<JT>SIAM J Comput</JT>
<PY>1976</PY>
<VO>5</VO>
<NO>1</NO>
<PP>19-50</PP>
</SEQ>

<SEQ>
<UI>0153   Seiferas,J.   Real-Time Recognition .. Math.Systems Th 77 
11:111-146
</UI>
<AU>Seiferas J;
    Galil Z
</AU>
<TI>Real-Time Recognition of Substring Repetition and Reversal
</TI>
<SU>Regularities;
    Automata;
    Language;
    USA;
    String match;
    Repetition;
    Reversal;
    Recognition
</SU>
<AB>"Real-time multitape Turing machine algorithms are presented for
recognizing the languages { wxyxz : |w| = r|x|, |y| = s|x|, |z| = t|x| } and
{ wxyx^Rz : |w| = r|x|, |y| = s|x|, |z| = t|x| } for fixed r, s, and t and for
string-matching with 'forced mismatches'."
</AB>
<JT>Math Systems Theory</JT>
<PY>1977</PY>
<VO>11</VO>
<PP>111-146</PP>
</SEQ>

<SEQ>
<UI>0154   Slisenko,A.O. Recognizing a Symmetry.. Proc.Steklov In 73 
129:25-208
</UI>
<AU>Slisenko AO
</AU>
<TI>Recognizing a Symmetry Predicate by Multihead Turing Machines with Input
</TI>
<SU>Regularities;
    Automata;
    RU
</SU>
<AB>Only pages 25-28 and 208 are in the file. "It is proved that a symmetry
predicate [i.e., a palindrome] can be recognized by a certain six-head Turing
Machine with input in real time."
</AB>
<JT>Proc Steklov Inst Math</JT>
<PY>1973</PY>
<VO>129</VO>
<PP>25-208</PP>
</SEQ>

<SEQ>
<UI>0155   Tarhio,J.     Boyer-Moore Approach t.. Lecture Notes i 90 
447:348-359
</UI>
<AU>Tarhio J;
    Ukkonen E
</AU>
<TI>Boyer-Moore Approach to Approximate String Matching
</TI>
<SU>Approximate match;
    Boyer-Moore;
    Match with k mismatches;
    Match with k differences;
    String match;
    FI
</SU>
<AB>Proceedings Scandinavian Workshop in Algorithmic Theory, SWAT'90. "The
Boyer-Moore idea applied in exact string matching is generalized to approximate
string matching. Two versions of the problem are considered. The k mismatches
problem is to find all approximate occurrences of a pattern string in a text
string with at most k mismatches. ... A related algorithm is developed for the k
differences problem where the task is to find all approximate occurrences of a
pattern in a text with &lt;= k differences (insertions, deletions, changes)."
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>1990</PY>
<VO>447</VO>
<PP>348-359</PP>
</SEQ>

<SEQ>
<UI>0156   Tharp,A.L.    The Practicality of Te.. Software.Practi 82 12:35-44
</UI>
<AU>Tharp AL;
    Tai KC
</AU>
<TI>The Practicality of Text Signatures for Accelerating String Searching
</TI>
<SU>String match;
    Signature;
    USA;
    Fingerprint;
    String search
</SU>
<AB>"This paper studies the use of text signatures in string searching. Text
signatures are a coded representation of a unit of text formed by hashing
substrings into bit positions which are, in turn, set to one. Then instead of
searching an entire line of text exhaustively, the text signature may be
examined first to determine if complete processing is warranted."
</AB>
<JT>Software Practice Experience</JT>
<PY>1982</PY>
<VO>12</VO>
<PP>35-44</PP>
</SEQ>

<SEQ>
<UI>0157   Stormo,G.D.   Use of the 'Perceptron.. Nucleic Acids R 82 
10(9):2997-301
</UI>
<AU>Stormo GD;
    Schneider TD;
    Gold L;
    Ehrenfeucht A
</AU>
<TI>Use of the 'Perceptron' Algorithm to Distinguish Translational Initiation
Sites in E. coli
</TI>
<SU>Match a pattern matrix;
    Perceptron;
    USA;
    Algorithm
</SU>
<AB>"We have used a 'Perceptron' algorithm to find a weighting function which
distinguishes E. coli translational initiation sites from all other sites in a
library of over 78,000 nucleotides of mRNA sequence."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1982</PY>
<VO>10</VO>
<NO>9</NO>
<PP>2997-3011</PP>
</SEQ>

<SEQ>
<UI>0158   Staden,R.     Finding Protein Coding.. Methods Enzymol 90 
183:163-181
</UI>
<AU>Staden R
</AU>
<TI>Finding Protein Coding Regions in Genomic Sequences
</TI>
<SU>Sequence analysis;
    UK;
    Region;
    Coding;
    Signal;
    Protein;
    Genomic
</SU>
<AB>"There are two types of information that can be used for finding protein
coding regions. The first is to look for the special, so-called signal
sequences, such as splice junctions and promoters, that surround coding regions.
This is often called gene search by signal. The second is to examine long
sections of the DNA to see if they look more like coding sequence than noncoding
sequence. These latter methods are often described as gene search by content and
are the subject of this chapter."
</AB>
<JT>Methods Enzymol</JT>
<PY>1990</PY>
<VO>183</VO>
<PP>163-181</PP>
</SEQ>

<SEQ>
<UI>0159   Danckaert,A.  'Size Leap' Algorithm:.. Comput.Appl.Bio 91 
7(4):509-513
</UI>
<AU>Danckaert A;
    Chappey C;
    Hazout S
</AU>
<TI>'Size Leap' Algorithm: An Efficient Extraction of the Longest Common
Motifs from a Molecular Sequence Set. Application to the DNA Sequence
Reconstruction.
</TI>
<SU>Multiple alignment;
    Reconstruct;
    FR;
    Motif;
    Region;
    Longest common;
    DNA;
    Sequence reconstruction;
    Algorithm
</SU>
<AB>"We propose a new method, called 'size leap' algorithm, of search for
motifs of maximum size and common to two fragments at least. It allows the
creation of a reduced database of motifs from a set of sequences whose size
obeys the series of Fibonacci numbers. The convenience lies in the efficiency of
the motif extraction. It can be applied in the establishment of overlap regions
for DNA sequence reconstruction and multiple alignment of biological sequences."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1991</PY>
<VO>7</VO>
<NO>4</NO>
<PP>509-513</PP>
</SEQ>

<SEQ>
<UI>0160   Isono,K.      A Computer Program Pac.. Nucleic Acids R 84 
12(1):101-112
</UI>
<AU>Isono K
</AU>
<TI>A Computer Program Package for Storing and Retrieving DNA/RNA and Protein
Sequence Data
</TI>
<SU>Database search;
    DE;
    Program;
    Protein
</SU>
<AB>"Program DATBAS is for storing and improving DNA sequence data ....
Programs NUCDAT and PROTEN are for analyzing DNA/RNA and protein sequence data,
respectively. ... Program LITRAT enables users to prepare a scientific
literature file convenient for writing scientific articles, and program STRAIN
for storing information concerning bacterial and/or plasmid strains."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1984</PY>
<VO>12</VO>
<NO>1</NO>
<PP>101-112</PP>
</SEQ>

<SEQ>
<UI>0161   Churchill,G.A The Accuracy of DNA Se.. Genomics        92 14:89-98
</UI>
<AU>Churchill GA;
    Waterman MS
</AU>
<TI>The Accuracy of DNA Sequences: Estimating Sequence Quality
</TI>
<SU>Reconstruct;
    Consensus sequence;
    USA;
    Statistical;
    Likelihood;
    Fragment;
    DNA;
    Accuracy
</SU>
<AB>"In this paper we describe a method of the statistical reconstruction of a
large DNA sequence from a set of sequenced fragments. We assume that the
fragments have been assembled and address the problem of determining the degree
to which the constructed sequence is free from errors, i.e., its accuracy. ... A
likelihood-based procedure for the estimation of the sequencing error rates,
which utilizes an iterative EM algorithm, is described. ... We present three
different approaches to the definition of a consensus sequence."
</AB>
<JT>Genomics</JT>
<PY>1992</PY>
<VO>14</VO>
<PP>89-98</PP>
</SEQ>

<SEQ>
<UI>0162   Krawetz,S.A.  Sequence Errors Descri.. Nucleic Acids R 89 
17(10):3951-39
</UI>
<AU>Krawetz SA
</AU>
<TI>Sequence Errors Described in GenBank: A Means to Determine the Accuracy of
DNA Sequence Interpretation
</TI>
<SU>Sequence analysis;
    Significance;
    USA;
    Error;
    DNA;
    GenBank;
    Accuracy
</SU>
<AB>"The accuracy of nucleic acid sequence data interpretation was determined
by assessing and quantifying the discrepancies reported in the GenBank database.
This permitted the calculation of an Error Rate (ER) for nucleic acid sequence
determination. ... This establishes the first set of limit boundaries of the ER
for sequence interpretation and sequence errors within the GenBank database and
provides the foundation for future assessments and the monitoring of sequence
data accumulation."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1989</PY>
<VO>17</VO>
<NO>10</NO>
<PP>3951-3957</PP>
</SEQ>

<SEQ>
<UI>0163   States,D.J.   Molecular Sequence Acc.. Proc.Nat.Acad.S 91 
88:5518-5522
</UI>
<AU>States DJ;
    Botstein D
</AU>
<TI>Molecular Sequence Accuracy and the Analysis of Protein Coding Regions
</TI>
<SU>Pairwise alignment;
    Error;
    Significance;
    USA;
    Region;
    Coding;
    Protein;
    Accuracy
</SU>
<AB>"We studied the impact of nucleic acid sequence errors on the ability to
align predicted amino acid sequences with the sequences of related proteins. We
found that with a simultaneous translation and alignment algorithm,
identification of sequence homologies is resilient to the introduction of random
errors. Proteins with &gt;30% sequence identity can be reliably recognized even in
the presence of 1% frameshifting (insertion or deletion) error rates and 5% base
substitution rates."
</AB>
<JT>Proc Nat Acad Sci USA</JT>
<PY>1991</PY>
<VO>88</VO>
<PP>5518-5522</PP>
</SEQ>

<SEQ>
<UI>0164   Hein,J.       Reconstructing Evoluti.. Math.Biosci.    90 
98:185-200
</UI>
<AU>Hein J
</AU>
<TI>Reconstructing Evolution of Sequences Subject to Recombination Using
Parsimony
</TI>
<SU>Phylogeny;
    USA;
    Segment;
    Dynamic programming;
    Parsimony;
    Evolution;
    Recombination
</SU>
<AB>"It is demonstrated that the appropriate structure to represent the
evolution of sequences with recombinations is a family of trees each describing
the evolution of a segment of the sequence. Two trees for neighboring segments
will differ by exactly the transfer of a subtree within the whole tree. This
leads to a metric between trees .... This metric is used to formulate a dynamic
programming algorithm that finds the most parsimonious history that fits a given
set of sequences. The algorithm is potentially very practical, since many groups
of sequences defy analysis by methods that ignore recombinations."
</AB>
<JT>Math Biosci</JT>
<PY>1990</PY>
<VO>98</VO>
<PP>185-200</PP>
</SEQ>

<SEQ>
<UI>0165   Olsen,G.J.    Phylogenetic Analysis .. Methods Enzymol 88 
164:793-812
</UI>
<AU>Olsen GJ
</AU>
<TI>Phylogenetic Analysis Using Ribosomal RNA
</TI>
<SU>Phylogeny;
    USA;
    RNA;
    Phylogenetic
</SU>
<AB>"The inference of phylogenetic relationships from molecular data (i.e.,
the field of molecular evolution) is contributing greatly to our understanding
of the evolution of life on Earth. Although the discussion that follows is
directed toward analyses based on rRNA sequences, nearly all of the concepts,
and many of the details, are equally applicable to the other DNA, RNA, or
protein sequences. ... The merits of rRNA for phylogenetic inference ... include
universality, functional constancy, ease of identification and isolation, and
apparent lack of lateral gene transfer."
</AB>
<JT>Methods Enzymol</JT>
<PY>1988</PY>
<VO>164</VO>
<PP>793-812</PP>
</SEQ>

<SEQ>
<UI>0166   Claverie,J.M. Information Enhancemen.. Computers Chem. 93 
17(2):191-201
</UI>
<AU>Claverie JM;
    States DJ
</AU>
<TI>Information Enhancement Methods for Large Scale Sequence Analysis
</TI>
<SU>Database search;
    Significance;
    USA;
    Sequence analysis;
    Mask;
    Program
</SU>
<AB>"The improved efficiency of similarity search programs and the
affordability of even faster computers allow studies where whole sequence
databases can be the target of various comparisons with increasingly larger or
numerous query sequences. However, the usefulness of those 'brute force' methods
now becomes limited by the time it takes an experienced scientist to sift the
biologically relevant matches from overwhelming, albeit 'statistically
significant' outputs. ... We present two masking methods ... capable of
eliminating most of the irrelevant outputs in a variety of large scale sequence
analysis situations ...."
</AB>
<JT>Computers Chem</JT>
<PY>1993</PY>
<VO>17</VO>
<NO>2</NO>
<PP>191-201</PP>
</SEQ>

<SEQ>
<UI>0167   Lebbe,J.      Local Predictability i.. Biochimie       93 
75(5):371-378
</UI>
<AU>Lebbe J;
    Vignes R
</AU>
<TI>Local Predictability in Biological Sequences, Algorithm and Applications
</TI>
<SU>Sequence analysis;
    Significance;
    Sequence prediction;
    FR;
    Algorithm
</SU>
<AB>"The goal of this paper is to propose an algorithm based on the k nearest
neighbours to compute a local predictability measure in biological sequences.
Some ideas about the usefulness of this measure are discussed on the basis of
preliminary experimentations. ... Therefore we propose: to learn a system that
predicts each letter of a sequence, to compare each predicted letter with the
real so as to compute a local predictability measure, and to locate the zones
where the letters are particularly well or badly predicted."
</AB>
<JT>Biochimie</JT>
<PY>1993</PY>
<VO>75</VO>
<NO>5</NO>
<PP>371-378</PP>
</SEQ>

<SEQ>
<UI>0168   Philippe,H.   MUST, A Computer Packa.. Nucleic Acids R 93 
21(22):5264-52
</UI>
<AU>Philippe H
</AU>
<TI>MUST, A Computer Package of Management Utilities for Sequences and Trees
</TI>
<SU>Program;
    Management;
    Phylogeny;
    FR;
    Display
</SU>
<AB>"The MUST package is a phylogenetically oriented set of programs for data
management and display, allowing one to handle both raw data (sequences) and
results (trees, number of steps, bootstrap proportions). It is complementary to
the main available software for phylogenetic analysis (PHYLIP, PAUP, HENNIG86,
CLUSTAL) with which it is fully compatible."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1993</PY>
<VO>21</VO>
<NO>22</NO>
<PP>5264-5272</PP>
</SEQ>

<SEQ>
<UI>0169   Taylor,W.R.   Deriving an Amino Acid.. J.Theor.Biol.   93 
164(1):65-83
</UI>
<AU>Taylor WR;
    Jones DT
</AU>
<TI>Deriving an Amino Acid Distance Matrix
</TI>
<SU>Sequence proximity;
    Substitution;
    UK;
    Sequence alignment;
    Distance;
    Amino acid;
    Matrix
</SU>
<AB>"Various methods were investigated to convert an amino acid similarity
matrix into a low-dimensional metric distance matrix. Using projection
techniques, no unique transformation was found and of the many inversion forms
investigated, simple negation normalized by the diagonal elements produced a
good fit to the original data. ... The derived forms might find applications in
sequence alignment, including pattern-matching algorithms, and the construction
of phylogenetic trees."
</AB>
<JT>J Theor Biol</JT>
<PY>1993</PY>
<VO>164</VO>
<NO>1</NO>
<PP>65-83</PP>
</SEQ>

<SEQ>
<UI>0170   Allison,L.    Normalization of Affin.. J.Theor.Biol.   93 
161(2):263-269
</UI>
<AU>Allison L
</AU>
<TI>Normalization of Affine Gap Costs Used in Optimal Sequence Alignment
</TI>
<SU>Sequence proximity;
    AU;
    Sequence alignment;
    Edit;
    Automata;
    Gap;
    Optimal
</SU>
<AB>"It is shown how to normalize the costs of an alignment algorithm that
employs affine or linear gap costs. The normalized costs are interpreted as the
-log probabilities of the instructions of a finite-state edit-machine. This
gives an explicit model relating sequences that can be linked to processes of
mutation and evolution."
</AB>
<JT>J Theor Biol</JT>
<PY>1993</PY>
<VO>161</VO>
<NO>2</NO>
<PP>263-269</PP>
</SEQ>
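
The interpretation in the abstract can be illustrated with a one-state edit machine. If each instruction has probability p, its cost is -log2 p bits, so the cost of an alignment (a sequence of instructions) is minus the log of its probability. This sketch assumes hypothetical instruction probabilities and linear rather than affine gaps. An affine scheme would need a separate instruction distribution for each machine state, which is not shown here.

```python
import math

# Hypothetical probabilities for the instructions of a one-state edit machine.
PROBS = {"match": 0.7, "change": 0.1, "insert": 0.1, "delete": 0.1}


def costs_in_bits(probs):
    """Convert instruction probabilities (which must sum to 1) into costs of -log2 p."""
    if not math.isclose(sum(probs.values()), 1.0):
        raise ValueError("instruction probabilities must sum to 1")
    return {op: -math.log2(p) for op, p in probs.items()}


def alignment_cost(ops, costs):
    """Total cost of an alignment written as a list of instructions."""
    return sum(costs[op] for op in ops)


costs = costs_in_bits(PROBS)
ops = ["match", "match", "change", "insert", "match"]
prob = math.prod(PROBS[op] for op in ops)
# The additive cost equals the negative log-probability of the instruction sequence.
assert math.isclose(alignment_cost(ops, costs), -math.log2(prob))
```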

<SEQ>
<UI>0171   Vingron,M.    Weighting in Sequence .. Proc.Nat.Acad.S 93 
90(19):8777-87
</UI>
<AU>Vingron M;
    Sibbald PR
</AU>
<TI>Weighting in Sequence Space: A Comparison of Methods in Terms of
Generalized Sequences
</TI>
<SU>Multiple alignment;
    Significance;
    Sequence weight;
    USA
</SU>
<AB>"A geometric analysis based on a continuous sequence space is presented
that provides a common framework in which to compare [four methods for weighting
aligned biological sequences]. It is concluded that there are two 'best'
methods. When the sequences are known to be phylogenetically related ..., the
method of Altschul, Carroll and Lipman (1989) is appropriate. When the sequences
are not known to be phylogenetically related ... a modification of the method of
Sibbald and Argos (1990) is preferable."
</AB>
<JT>Proc Nat Acad Sci USA</JT>
<PY>1993</PY>
<VO>90</VO>
<NO>19</NO>
<PP>8777-8781</PP>
</SEQ>

<SEQ>
<UI>0172   Rinsma-Melche The Expected Number of.. N.Z.J.Bot.      93 
31(3):219-230
</UI>
<AU>Rinsma-Melchert I
</AU>
<TI>The Expected Number of Matches in Optimal Global Sequence Alignments
</TI>
<SU>Pairwise alignment;
    Significance;
    NZ;
    Sequence alignment;
    Optimal
</SU>
<AB>"This paper outlines how lattice walks and generating functions could be
used to find the expected number of matches in the optimal alignment of two
sequences, in several special cases. Solving the resulting equations proves
difficult."
</AB>
<JT>N Z J Bot</JT>
<PY>1993</PY>
<VO>31</VO>
<NO>3</NO>
<PP>219-230</PP>
</SEQ>

<SEQ>
<UI>0173   Altschul,S.F. A Protein Alignment Sc.. J.Mol.Evol.     93 
36(3):290-300
</UI>
<AU>Altschul SF
</AU>
<TI>A Protein Alignment Scoring System Sensitive at all Evolutionary Distances
</TI>
<SU>Sequence proximity;
    Substitution;
    USA;
    Statistical;
    Database search;
    Evolutionary distance;
    Scoring;
    Distance;
    Protein
</SU>
<AB>"Because in a database search it generally is not known a priori what
evolutionary distances will characterize the similarities found, it is
necessary to employ an appropriate range of [substitution] matrices in order not
to overlook potential homologies. This paper formalizes this concept by defining
a scoring system that is sensitive at all detectable evolutionary distances. The
statistical behavior of this scoring system is analyzed, and it is shown that
for a typical protein database search, estimating the originally unknown
evolutionary distance appropriate to each alignment costs slightly over two bits
of information ...."
</AB>
<JT>J Mol Evol</JT>
<PY>1993</PY>
<VO>36</VO>
<NO>3</NO>
<PP>290-300</PP>
</SEQ>

<SEQ>
<UI>0174   Saccone,C.    Time and Biosequences    J.Mol.Evol.     93 
37(2):154-159
</UI>
<AU>Saccone C;
    Lanave C;
    Pesole G
</AU>
<TI>Time and Biosequences
</TI>
<SU>Sequence proximity;
    Italy;
    Evolutionary distance
</SU>
<AB>"In both quantitative and qualitative measurements of the genetic
distances [of biosequences], the compositional constraints of the nucleotide
sequences play a very important role. We demonstrate that when homologous
sequences significantly differ in base composition we get erratic branching
order and/or wrong evaluation of the evolutionary distances."
</AB>
<JT>J Mol Evol</JT>
<PY>1993</PY>
<VO>37</VO>
<NO>2</NO>
<PP>154-159</PP>
</SEQ>

<SEQ>
<UI>0175   Sneath,P.H.A. A Proposal on Metrics .. FEMS Microbiol. 93 
106(1):1-8
</UI>
<AU>Sneath PHA
</AU>
<TI>A Proposal on Metrics for Identification using Nucleic Acid Sequences
</TI>
<SU>Sequence proximity;
    UK;
    Identification;
    Nucleic acid
</SU>
<AB>"The need is stressed for attempts to be made to permit diagnostic nucleic
acid sequences to be used in a quantitative manner. Sequence differences or
binding values should be converted to a distance measure and from this an
ultrametric tree should be constructed. A single quantitative determination can
yield considerable information about the likely identity of an unknown
microorganism when the distance obtained from the sequence is compared with the
tree."
</AB>
<JT>FEMS Microbiol Lett</JT>
<PY>1993</PY>
<VO>106</VO>
<NO>1</NO>
<PP>1-8</PP>
</SEQ>

<SEQ>
<UI>0176   Johnson,M.S.  A Structural Basis for.. J.Mol.Biol.     93 
233(4):716-738
</UI>
<AU>Johnson MS;
    Overington JP
</AU>
<TI>A Structural Basis for Sequence Comparisons. An Evaluation of Scoring
Methodologies
</TI>
<SU>Sequence proximity;
    Substitution;
    UK;
    Sequence comparison;
    Scoring
</SU>
<AB>"A residue-exchange matrix has been derived that is suitable for
comparison of amino acid sequences. ... The majority of the data is from
structural comparisons where there is between 15 and 40% sequence identity. As a
result, a scoring matrix such as the one devised here should provide a sensitive
basis for the comparison of amino acid sequences and the search for homologous
sequences in amino acid databases. In order to assess the value of this matrix we
have made a comparative analysis with 12 other published scoring matrices that
have been used for the alignment of protein amino acid sequences."
</AB>
<JT>J Mol Biol</JT>
<PY>1993</PY>
<VO>233</VO>
<NO>4</NO>
<PP>716-738</PP>
</SEQ>

<SEQ>
<UI>0177   Apostolico,A. Guest Editor's Forewor.. Algorithmica    94 
12(4/5):245-24
</UI>
<AU>Apostolico A
</AU>
<TI>Guest Editor's Foreword. Special Issue on String Algorithmics and Its
Applications
</TI>
<SU>String search;
    Approximate match;
    Edit;
    Distance;
    USA
</SU>
<AB>"Most of the past and current research in string algorithmics falls into
one of the following problem categories: exact search, computation of edit
distances, and approximate search. So does the majority of the papers in this
special issue."
</AB>
<JT>Algorithmica</JT>
<PY>1994</PY>
<VO>12</VO>
<NO>4/5</NO>
<PP>245-246</PP>
</SEQ>

<SEQ>
<UI>0178   Gregor,J.     Dynamic Programming Al.. IEEE Trans.Patt 93 
15(2):129-135
</UI>
<AU>Gregor J;
    Thomason MG
</AU>
<TI>Dynamic Programming Alignment of Sequences Representing Cyclic Patterns
</TI>
<SU>Pairwise alignment;
    USA;
    Dynamic programming;
    Complexity;
    Dynamic
</SU>
<AB>"String alignment by dynamic programming is generalized to include cyclic
shift and corresponding optimal alignment cost for strings representing cyclic
patterns. A guided search algorithm uses bounds on actual alignment costs to
find all optimal cyclic shifts. ... Algorithmic complexity is analyzed for major
stages in the search. Applicability of the method is illustrated with satellite
DNA sequences and circularly permuted protein sequences."
</AB>
<JT>IEEE Trans Patt Anal Mach Intell</JT>
<PY>1993</PY>
<VO>15</VO>
<NO>2</NO>
<PP>129-135</PP>
</SEQ>
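
A brute-force Python sketch of the problem the abstract addresses, assuming unit-cost edit distance. It tries every cyclic shift of the second string and keeps all shifts that reach the minimum cost. This is not the guided search with cost bounds that the paper describes. It only shows what "all optimal cyclic shifts" means, at the price of one full dynamic-programming alignment per shift (O(m n^2) time overall). The helper names are hypothetical.

```python
def edit_distance(a, b):
    """Unit-cost Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j - 1] + (ca != cb),  # match or substitution
                           prev[j] + 1,               # deletion from a
                           cur[j - 1] + 1))           # insertion into a
        prev = cur
    return prev[-1]


def optimal_cyclic_shifts(a, b):
    """Return (cost, shifts): the minimum edit distance of a against any rotation
    b[s:] + b[:s] of b, and every shift s that attains it."""
    if not b:
        return edit_distance(a, b), [0]
    scored = [(edit_distance(a, b[s:] + b[:s]), s) for s in range(len(b))]
    best = min(cost for cost, _ in scored)
    return best, [s for cost, s in scored if cost == best]
```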

<SEQ>
<UI>0179   Blum,N.       On Locally Optimal Ali.. Lecture Notes i 92 
577:425-436
</UI>
<AU>Blum N
</AU>
<TI>On Locally Optimal Alignments in Genetic Sequences
</TI>
<SU>Pairwise alignment;
    DE;
    Locally optimal;
    Optimal;
    Genetic
</SU>
<AB>Proceedings, Ninth Symposium on Theoretical Aspects of Computer Science
(STACS). "We show how to compute all substrings of [a text string] x which have
c-locally minimal [edit] distance from [a pattern string] y and all
corresponding alignments in O(mn) time where n is the length of x and m is the
length of y."
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>1992</PY>
<VO>577</VO>
<PP>425-436</PP>
</SEQ>

<SEQ>
<UI>0180   Jacobson,G.   Heaviest Increasing/Co.. Lecture Notes i 92 
644:52-66
</UI>
<AU>Jacobson G;
    Vo KP
</AU>
<TI>Heaviest Increasing/Common Subsequence Problems
</TI>
<SU>Longest common;
    Subsequence;
    USA
</SU>
<AB>"We define the heaviest increasing subsequence (HIS) and heaviest common
subsequence (HCS) problems as natural generalizations of the well-studied
longest increasing subsequence (LIS) and longest common subsequence (LCS)
problems. We show how the famous Robinson-Schensted correspondence between
permutations and pairs of Young tableaux can be extended to compute heaviest
increasing subsequences. Then, we point out a simple weight-preserving
correspondence between the HIS and HCS problems. From this duality ... the
Hunt-Szymanski LCS algorithm can be seen as a special case of the
Robinson-Schensted algorithm."
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>1992</PY>
<VO>644</VO>
<PP>52-66</PP>
</SEQ>
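
A minimal Python sketch of the heaviest increasing subsequence (HIS) problem defined in the abstract. It uses a plain O(n^2) dynamic program rather than the Robinson-Schensted-based method the paper develops, and it assumes nonnegative weights. With every weight equal to 1, it reduces to the longest increasing subsequence (LIS).

```python
def heaviest_increasing_subsequence(values, weights):
    """Return (total_weight, subsequence) for a heaviest strictly increasing subsequence.

    Simple quadratic dynamic programming; assumes nonnegative weights.
    """
    n = len(values)
    if n == 0:
        return 0, []
    best = list(weights)   # best[i]: heaviest weight of an increasing run ending at i
    prev = [None] * n      # back-pointers used to rebuild the subsequence
    for i in range(n):
        for j in range(i):
            if values[i] > values[j] and best[j] + weights[i] > best[i]:
                best[i] = best[j] + weights[i]
                prev[i] = j
    end = max(range(n), key=lambda k: best[k])
    seq = []
    k = end
    while k is not None:
        seq.append(values[k])
        k = prev[k]
    return best[end], seq[::-1]
```

A single heavy element can outweigh a longer run. For example, values 5, 1, 2, 3 with weights 10, 1, 1, 1 give the one-element subsequence [5].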

<SEQ>
<UI>0181   Abarbanel,R.M Rapid Searches for Com.. Nucleic Acids R 84 
12(1):263-280
</UI>
<AU>Abarbanel RM;
    Wieneke PR;
    Mansfield E;
    Jaffe DA;
    Brutlag DL
</AU>
<TI>Rapid Searches for Complex Patterns in Biological Molecules
</TI>
<SU>Match complex patterns;
    Automata;
    USA;
    Pattern match;
    Database search
</SU>
<AB>"We have developed a tool called QUEST to allow the flexible exploration
of sequences in a data bank. QUEST combines the flexibility of the UNIX pattern
matching utilities and the speed of a finite state machine. In addition, QUEST
allows patterns to be defined in terms of other patterns."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1984</PY>
<VO>12</VO>
<NO>1</NO>
<PP>263-280</PP>
</SEQ>

<SEQ>
<UI>0182   Abrahamson,K. Generalized String Mat.. SIAM J.Comput.  87 
16(6):1039-105
</UI>
<AU>Abrahamson K
</AU>
<TI>Generalized String Matching
</TI>
<SU>String match;
    CA;
    Language
</SU>
<AB>"This paper investigates a generalization of string matching, in which the
pattern is a sequence of pattern elements, each compatible with a set of
symbols. The alphabet of symbols is infinite, with its members encoded in a
finite alphabet. ... The obvious algorithm for generalized string matching
requires time O(NM), where N is the length of the encoding of the pattern, and
M is that of the object string." An asymptotically faster algorithm is then
described.
</AB>
<JT>SIAM J Comput</JT>
<PY>1987</PY>
<VO>16</VO>
<NO>6</NO>
<PP>1039-1051</PP>
</SEQ>
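
The "obvious" O(NM) algorithm mentioned in the abstract can be sketched in Python as follows. The sketch assumes each pattern element is given directly as a set of allowed symbols, rather than through the finite-alphabet encoding the paper uses. The function name and examples are hypothetical, and the improved algorithm the paper goes on to describe is not reproduced here.

```python
def generalized_match(pattern, text):
    """Start positions where every text symbol lies in the corresponding pattern set.

    pattern: list of sets of allowed symbols. Naive scan: for each start position,
    test each pattern element, i.e. O(len(pattern) * len(text)) membership checks.
    """
    m = len(pattern)
    return [
        start
        for start in range(len(text) - m + 1)
        if all(text[start + i] in allowed for i, allowed in enumerate(pattern))
    ]
```

In a DNA setting, a set such as {"C", "T"} plays the role of the IUPAC ambiguity code Y.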

<SEQ>
<UI>0183   Aho,A.V.      Algorithms for Finding.. Handbook of T.. 90Elsevier 
Scienc
</UI>
<AU>Aho AV
</AU>
<TI>Algorithms for Finding Patterns in Strings
</TI>
<ED>van Leeuwen J
</ED>
<BK>Handbook of Theoretical Computer Science, Volume A, Algorithms and
Complexity
</BK>
<SU>Pattern match;
    Review;
    USA;
    Language;
    Expression;
    String match;
    Algorithm
</SU>
<AB>Notations for patterns. Matching keywords. Matching sets of keywords.
Matching regular expressions. Related problems, including a terse review of
approximate string matching. "No single algorithm is known for the longest-
common-subsequence problem that dominates all applications."
</AB>
<PU>Elsevier Science</PU>
<PL>Amsterdam</PL>
<PY>1990</PY>
<PP>255-300</PP>
</SEQ>

<SEQ>
<UI>0184   Goldberg,T.   Faster Parallel String.. J.Algorithms    94 
16:295-308
</UI>
<AU>Goldberg T;
    Zwick U
</AU>
<TI>Faster Parallel String Matching via Larger Deterministic Samples
</TI>
<SU>Parallel;
    IL;
    String match
</SU>
<AB>"Building on previous results of Breslauer, Galil, and Vishkin, we obtain
for every p(m) = O(log log m) an optimal speedup parallel string matching
algorithm that can preprocess a pattern P of length m in time O( p(m) ) and can
then find all occurrences of P in a text of an arbitrary length in time O(log
log m / log p(m) )."
</AB>
<JT>J Algorithms</JT>
<PY>1994</PY>
<VO>16</VO>
<PP>295-308</PP>
</SEQ>

<SEQ>
<UI>0185   Hein,J.       An Algorithm Combining.. J.Theor.Biol.   94 
167:169-174
</UI>
<AU>Hein J
</AU>
<TI>An Algorithm Combining DNA and Protein Alignment
</TI>
<SU>Pairwise alignment;
    Coding;
    Protein;
    DNA;
    Genomic;
    DK;
    Algorithm
</SU>
<AB>"An algorithm is presented that aligns two DNA sequences minimizing the
overall amount of evolution that the associated proteins have experienced. It is
generalized to minimizing a weighted average of protein and DNA evolution. ...
This algorithm could undoubtedly be generalized to align DNA with many coding
frames in it. However, this would be very complicated, but highly practical as
this could align genomic structures well."
</AB>
<JT>J Theor Biol</JT>
<PY>1994</PY>
<VO>167</VO>
<PP>169-174</PP>
</SEQ>

<SEQ>
<UI>0186   Kim,J.Y.      Fast String Matching u.. Software.Practi 94 
24(1):79-88
</UI>
<AU>Kim JY;
    Shawe-Taylor J
</AU>
<TI>Fast String Matching using an n-Gram Algorithm
</TI>
<SU>N-gram;
    Boyer-Moore;
    UK;
    String match;
    String search;
    Pattern match;
    Algorithm
</SU>
<AB>"Experimental results are given for the application of a new n-gram
algorithm to substring searching in DNA strings. The results confirm theoretical
predictions of expected running times based on the assumption that the data are
drawn from a stationary ergodic source. They also confirm that the algorithms
tested are the most efficient known for searches involving larger patterns."
</AB>
<JT>Software Practice Experience</JT>
<PY>1994</PY>
<VO>24</VO>
<NO>1</NO>
<PP>79-88</PP>
</SEQ>

<SEQ>
<UI>0187   Chaitin,G.J.  On the Length of Progr.. J.Assoc.Comput. 66 
13:547-569
</UI>
<AU>Chaitin GJ
</AU>
<TI>On the Length of Programs for Computing Finite Binary Sequences
</TI>
<SU>Sequence analysis;
    Significance;
    Information theory;
    USA;
    Automata;
    Program
</SU>
<AB>"The use of Turing machines for calculating finite binary sequences is
studied from the point of view of information theory and the theory of recursive
functions. Various results are obtained concerning the number of instructions in
programs. A modified form of Turing machine is studied from the same point of
view. An application to the problem of defining a patternless sequence is
proposed in terms of the concepts here developed."
</AB>
<JT>J Assoc Comput Mach</JT>
<PY>1966</PY>
<VO>13</VO>
<PP>547-569</PP>
</SEQ>

<SEQ>
<UI>0188   Morimoto,K.   A Method of Compressin.. Software.Practi 94 
24(3):265-288
</UI>
<AU>Morimoto K;
    Iriguchi H;
    Aoe JI
</AU>
<TI>A Method of Compressing Trie Structures
</TI>
<SU>Database search;
    Search tree;
    JP;
    Data structure;
    Structure
</SU>
<AB>"A trie structure can immediately determine whether a desired key is in a
given key set or not, and can find its longest match easily. ... However, the
total number of states of a trie becomes large, so space requirements are not
good for a huge key set. To resolve this disadvantage a new structure which
reduces the total number of states in a traditional trie, called a double-trie,
is introduced in this paper. Insertion and deletion operation, as well as key
retrieval for this double-trie, are presented."
</AB>
<JT>Software Practice Experience</JT>
<PY>1994</PY>
<VO>24</VO>
<NO>3</NO>
<PP>265-288</PP>
</SEQ>

<SEQ>
<UI>0189   Smith,P.D.    On Tuning the Boyer-Mo.. Software.Practi 94 
24(4):435-436
</UI>
<AU>Smith PD
</AU>
<TI>On Tuning the Boyer-Moore-Horspool String Searching Algorithm
</TI>
<SU>String match;
    Boyer-Moore;
    USA;
    String search;
    Algorithm
</SU>
<AB>"Experiments suggest that recently reported improvements to the
Boyer-Moore-Horspool string searching algorithm may be due to compiler effects
rather than to properties of the language being searched."
</AB>
<JT>Software Practice Experience</JT>
<PY>1994</PY>
<VO>24</VO>
<NO>4</NO>
<PP>435-436</PP>
</SEQ>
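
For reference, a textbook Boyer-Moore-Horspool search can be sketched in Python as follows. It compares each window with a slice instead of a right-to-left character loop, so it illustrates the shift rule rather than any tuned variant. Given the abstract's point about compiler effects, timings of code like this reflect the implementation more than the algorithm. The function name is hypothetical.

```python
def horspool_search(text, pattern):
    """Return start indices of all occurrences of pattern in text (Boyer-Moore-Horspool)."""
    m, n = len(pattern), len(text)
    if m == 0:
        return list(range(n + 1))
    # Shift table: for each character in pattern[:-1], the distance from its
    # last occurrence there to the final pattern position.
    shift = {}
    for i, ch in enumerate(pattern[:-1]):
        shift[ch] = m - 1 - i
    hits = []
    pos = 0
    while n - m >= pos:
        if text[pos:pos + m] == pattern:
            hits.append(pos)
        # Shift by the entry for the text character under the last pattern
        # position; characters absent from the table allow a full shift of m.
        pos += shift.get(text[pos + m - 1], m)
    return hits
```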

<SEQ>
<UI>0190   Wright,A.H.   Approximate String Mat.. Software.Practi 94 
24(4):337-362
</UI>
<AU>Wright AH
</AU>
<TI>Approximate String Matching using Within-word Parallelism
</TI>
<SU>Match with k differences;
    USA;
    Dynamic programming;
    String match
</SU>
<AB>"An implementation of the dynamic programming algorithm for this problem is
given that packs several characters and mod-4 integers into a computer word.
Thus, it is a parallelization of the conventional implementation that runs on
ordinary processors. Since a small alphabet means that characters have short
binary codes, the degree of parallelism is greatest for small alphabets and for
processors with long words. For an alphabet of size 8 or smaller and a 64 bit
processor, a 21-fold parallelism over the conventional algorithm can be
obtained."
</AB>
<JT>Software Practice Experience</JT>
<PY>1994</PY>
<VO>24</VO>
<NO>4</NO>
<PP>337-362</PP>
</SEQ>
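
The conventional dynamic-programming algorithm that the paper parallelizes can be sketched in Python as follows. The sketch assumes unit costs for substitutions, insertions, and deletions. It reports every 1-based text position where an occurrence of the pattern with at most k differences ends. It processes one column per text character and does no within-word packing. The name k_differences is hypothetical.

```python
def k_differences(pattern, text, k):
    """1-based end positions in text of substrings within k unit-cost edits of pattern.

    Column-by-column dynamic programming: row 0 is always 0 so that an
    occurrence may start at any text position.
    """
    m = len(pattern)
    col = list(range(m + 1))  # column for the empty text prefix: D[i][0] = i
    ends = []
    for j, tc in enumerate(text, start=1):
        new = [0] * (m + 1)  # D[0][j] = 0
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == tc else 1
            new[i] = min(col[i - 1] + cost,  # match or substitution
                         col[i] + 1,         # text character left unmatched
                         new[i - 1] + 1)     # pattern character left unmatched
        col = new
        if k >= col[m]:
            ends.append(j)
    return ends
```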

<SEQ>
<UI>0191   Stormo,G.D.   Probing Information Co.. Methods Enzymol 91 
208:458-468
</UI>
<AU>Stormo GD
</AU>
<TI>Probing Information Content of DNA-Binding Sites
</TI>
<SU>Consensus sequence;
    Information content;
    USA;
    Statistical
</SU>
<AB>Sauer, R.T., ed. Protein-DNA Interactions. San Diego: Academic Press. "An
information content analysis of protein-binding sites gives a quantitative
description of the specificity of the protein, independent of the mechanism of
specificity. It gives useful information about the total specificity of the
protein and about the individual positions within the binding sites. Information
content is consistent with both thermodynamic and statistical analyses of
specificity. When applied to a collection of known binding sites, the
description provided may be limited by the sample size or by unknown constraints
on those sites. Experimental procedures to determine the information content can
give much more reliable measures."
</AB>
<JT>Methods Enzymol</JT>
<PY>1991</PY>
<VO>208</VO>
<PP>458-468</PP>
</SEQ>
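
A minimal Python sketch of a per-position information content calculation for aligned DNA binding sites. It assumes an equiprobable base background, so each position carries at most 2 bits. It also ignores the small-sample correction that bears on the sample-size limitation noted in the abstract. The function name and example sites are hypothetical.

```python
import math
from collections import Counter


def information_content(sites):
    """Per-position information (bits) of aligned DNA sites: 2 - H(position).

    Assumes an equiprobable background and omits any small-sample correction.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("sites must be aligned to equal length")
    profile = []
    for pos in range(length):
        counts = Counter(s[pos] for s in sites)
        total = sum(counts.values())
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        profile.append(2.0 - entropy)
    return profile
```

Summing the profile gives a total for the whole site, which is the kind of single specificity figure the abstract describes.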

<SEQ>
<UI>0192   Russell,R.B.  The Limits of Protein .. J.Mol.Biol.     93 
234:951-957
</UI>
<AU>Russell RB;
    Barton GJ
</AU>
<TI>The Limits of Protein Secondary Structure Prediction Accuracy from
Multiple Sequence Alignment
</TI>
<SU>Multiple alignment;
    Structure;
    UK;
    Sequence alignment;
    Protein;
    Prediction;
    Secondary;
    Accuracy
</SU>
<AB>"The expected best residue-by-residue accuracies for secondary structure
prediction from multiple protein sequence alignment have been determined by an
analysis of known protein structural families. The results show substantial
variation is possible among homologous protein structures, and that 100%
agreement is unlikely between a consensus prediction and one member of a protein
structural family. The study provides the range of agreement to be expected
between a perfect secondary structure prediction from a multiple alignment and
each protein within the alignment."
</AB>
<JT>J Mol Biol</JT>
<PY>1993</PY>
<VO>234</VO>
<PP>951-957</PP>
</SEQ>

<SEQ>
<UI>0193   Orengo,C.A.   A Local Alignment Meth.. J.Mol.Biol.     93 
233:488-497
</UI>
<AU>Orengo CA;
    Taylor WR
</AU>
<TI>A Local Alignment Method for Protein Structure Motifs
</TI>
<SU>Structure;
    UK;
    Motif;
    Dynamic programming;
    Sequence alignment;
    Protein
</SU>
<AB>"A method for the comparison of protein three-dimensional substructures
was developed. The method employs the double dynamic programming method of
Taylor and Orengo but identifies multiple local alignments rather than a single
global alignment. A modification based on the Smith Waterman algorithm for
sequence alignment enables the automatic identification and growth of the most
structurally similar local alignments irrespective of length and composition."
</AB>
<JT>J Mol Biol</JT>
<PY>1993</PY>
<VO>233</VO>
<PP>488-497</PP>
</SEQ>

<SEQ>
<UI>0194   Lefevre,C.    A Fast Word Search Alg.. Nucleic Acids R 94 
22(3):404-411
</UI>
<AU>Lefevre C;
    Ikeda JE
</AU>
<TI>A Fast Word Search Algorithm for the Representation of Sequence Similarity
in Genomic DNA
</TI>
<SU>Pairwise comparison;
    Dot;
    JP;
    Representation;
    Automata;
    Repetition;
    Similarity;
    DNA;
    Word;
    Genomic;
    Algorithm
</SU>
<AB>"Computation of [the dot matrix for comparing two biological sequences]
has been reconsidered here. An improvement is proposed through the
preprocessing of the data into an automaton recognizing the word structure of a
sequence. The main advantage of this approach is to systematically eliminate the
repetitions during word comparison. Simple heuristics are also considered to
greatly speed up pattern matching. As a result, large sequences are handled very
efficiently."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1994</PY>
<VO>22</VO>
<NO>3</NO>
<PP>404-411</PP>
</SEQ>
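
The word-based idea behind fast dot matrices can be sketched in Python using a dictionary that indexes k-letter words, assuming exact word matches only. This is a hash-table stand-in, not the automaton the paper builds. Each distinct word of the first sequence is stored once, together with all of its positions. Each word of the second sequence then needs a single lookup instead of a comparison against every position. The function name is hypothetical.

```python
from collections import defaultdict


def word_dots(seq_a, seq_b, k):
    """Return (i, j) start pairs where seq_a and seq_b share an identical k-letter word."""
    index = defaultdict(list)
    for i in range(len(seq_a) - k + 1):
        index[seq_a[i:i + k]].append(i)  # every position of each distinct word
    dots = []
    for j in range(len(seq_b) - k + 1):
        for i in index.get(seq_b[j:j + k], ()):
            dots.append((i, j))
    return dots
```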

<SEQ>
<UI>0195   Lake,J.A.     Reconstructing Evoluti.. Proc.Nat.Acad.S 94 
91(4):1455-145
</UI>
<AU>Lake JA
</AU>
<TI>Reconstructing Evolutionary Trees from DNA and Protein Sequences:
Paralinear Distances
</TI>
<SU>Phylogeny;
    Markov;
    USA;
    Evolutionary tree;
    Substitution;
    Distance;
    Protein;
    DNA
</SU>
<AB>"The reconstruction of phylogenetic trees from DNA and protein sequences
is confounded by unequal rate effects. ... The algorithm presented here, called
paralinear distances, is valid for a much broader class of substitution
processes than previous algorithms and is accordingly less affected by unequal
rate effects. It may be used with all nucleic acid, protein, or other sequences,
provided that their evolution may be modeled as a succession of Markov
processes. ... Paralinear distances can fail when sequences are misaligned or
when site-to-site sequence variation of rates is extensive."
</AB>
<JT>Proc Nat Acad Sci USA</JT>
<PY>1994</PY>
<VO>91</VO>
<NO>4</NO>
<PP>1455-1459</PP>
</SEQ>

<SEQ>
<UI>0196   Kahn,P.       EMBL Data Library        Methods Enzymol 90 
183:23-31
</UI>
<AU>Kahn P;
    Cameron G
</AU>
<TI>EMBL Data Library
</TI>
<SU>Sequence database;
    DE;
    EMBL
</SU>
<AB>"The EMBL Data Library was established in 1980 to collect, organize, and
distribute a database of nucleotide sequences and related descriptive
information extracted from publications in scientific journals." Databases.
Data acquisition. Data distribution.
</AB>
<JT>Methods Enzymol</JT>
<PY>1990</PY>
<VO>183</VO>
<PP>23-31</PP>
</SEQ>

<SEQ>
<UI>0197   Hein,J.       Genomic Alignment        J.Mol.Evol.     94 
38:310-316
</UI>
<AU>Hein J;
    Stovlbaek J
</AU>
<TI>Genomic Alignment
</TI>
<SU>Pairwise alignment;
    DK;
    Region;
    Coding;
    Frame;
    Genomic
</SU>
<AB>"A heuristic algorithm is presented that can compare DNA with both coding
and noncoding regions, but that also can compare multiple reading frames and
determine which exons are homologous. A program, GenAl (Genomic Alignment), was
developed that implements the algorithm. Its use is demonstrated on two
retroviruses."
</AB>
<JT>J Mol Evol</JT>
<PY>1994</PY>
<VO>38</VO>
<PP>310-316</PP>
</SEQ>

<SEQ>
<UI>0198   Fristensky,B. Feature Expressions: C.. Nucleic Acids R 93 
21(25):5997-60
</UI>
<AU>Fristensky B
</AU>
<TI>Feature Expressions: Creating and Manipulating Sequence Datasets
</TI>
<SU>Database search;
    CA;
    Region;
    Expression;
    Coding
</SU>
<AB>"Annotation of features, such as introns, exons and protein coding regions
in GenBank/EMBL/DDBJ entries is now standardized through use of the Features
Table (FT) language. Because FT is intrinsic to the database definition, it can
serve as a software- and platform-independent lingua franca for sequence
manipulation. The XYLEM package makes it possible to create and manipulate
sequence datasets using FT expressions."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1993</PY>
<VO>21</VO>
<NO>25</NO>
<PP>5997-6003</PP>
</SEQ>

<SEQ>
<UI>0199   Claverie,J.M. Detecting Frame Shifts.. J.Mol.Biol.     93 
234:1140-1157
</UI>
<AU>Claverie JM
</AU>
<TI>Detecting Frame Shifts by Amino Acid Sequence Comparison
</TI>
<SU>Sequence proximity;
    Substitution;
    Frame;
    USA;
    Sequence comparison;
    Scoring;
    Amino acid
</SU>
<AB>"I derive five new types of scoring matrix, each capable of detecting a
specific frame shift (deletion, insertion and inversion in 3 frames) and use
them with a regular local alignments program to detect amino acid sequences that
may have derived from alternative reading frames of the same nucleotide
sequence. Frame shifts are inferred from the sole comparison of the protein
sequences. The five scoring matrices were used with the BLASTP program to
compare all the protein sequences in the Swissprot database. Surprisingly, the
searches revealed hundreds of highly significant frame shift matches."
</AB>
<JT>J Mol Biol</JT>
<PY>1993</PY>
<VO>234</VO>
<PP>1140-1157</PP>
</SEQ>

<SEQ>
<UI>0200   Burks,C.      GenBank: Current Statu.. Methods Enzymol 90 183:3-22
</UI>
<AU>Burks C;
    Cinkosky MJ;
    Gilna P;
    Hayden JED;
    Abe Y;
    Atencio EJ;
    Barnhouse S;
    Benton D;
    Buenafe CA;
    Cumella KE;
    Davison DB;
    Emmert DB;
    Faulkner MJ;
    Fickett JW;
    Fischer WM;
    Good M;
    Horne DA;
    Houghton FK;
    Kelkar PM;
    Kelley TA;
    Kelly M;
    King MA;
    Langan BJ;
    Lauer JT;
    Lopez N;
    Lynch C;
    Lynch J;
    Marchi JB;
    Marr TG;
    Martinez FA;
    McLeod MJ;
    Medvick PA;
    Mishra SK;
    Moore J;
    Munk CA;
    Mondragon SM;
    Nasseri KK;
    Nelson D;
    Nelson W;
    Nguyen T;
    Reiss G;
    Rice J;
    Ryals J;
    Salazar MD;
    Stelts SR;
    Trujillo BL;
    Tomlinson LJ;
    Weiner MG;
    Welch FJ;
    Wiig SE;
    Yudin K;
    Zins LB
</AU>
<TI>GenBank: Current Status and Future Directions
</TI>
<SU>Sequence database;
    GenBank;
    USA
</SU>
<AB>"The GenBank database provides a collection of nucleotide sequences as
well as relevant bibliographic and biological annotation. We present an updated
view of the size and scope of the database, and we also describe recent
developments in the strategies, protocols, and software for collecting,
maintaining, and distributing the data."
</AB>
<JT>Methods Enzymol</JT>
<PY>1990</PY>
<VO>183</VO>
<PP>3-22</PP>
</SEQ>

<SEQ>
<UI>0201   Boswell,D.R.  Sequence Alignment by .. Trends Biochem. 87 
12:279-280
</UI>
<AU>Boswell DR
</AU>
<TI>Sequence Alignment by Word Processor
</TI>
<SU>Multiple alignment;
    UK;
    Sequence alignment;
    Program;
    Word
</SU>
<AB>"The word processing programs I have used for sequence alignment on IBM
PC-compatible microprocessors are WORDSTAR and PCWRITE." The author discusses
the suitability of these programs for rudimentary alignment of multiple sequences.
</AB>
<JT>Trends Biochem Sci</JT>
<PY>1987</PY>
<VO>12</VO>
<PP>279-280</PP>
</SEQ>

<SEQ>
<UI>0202   Baldi,P.      Hidden Markov Models o.. Proc.Nat.Acad.S 94 
91:1059-1063
</UI>
<AU>Baldi P;
    Chauvin Y;
    Hunkapiller T;
    McClure MA
</AU>
<TI>Hidden Markov Models of Biological Primary Sequence Information
</TI>
<SU>Sequence analysis;
    Markov;
    Multiple alignment;
    USA;
    Statistical;
    Motif;
    Model
</SU>
<AB>"Hidden Markov model (HMM) techniques are used to model families of
biological sequences. ... The HMM approach is applied to three protein families
.... In all cases, the models derived capture the important statistical
characteristics of the family and can be used for a number of tasks, including
multiple alignments, motif detection, and classification. For K sequences of
average length N, this approach yields an effective multiple-alignment algorithm
which requires O(KN^2) operations, linear in the number of sequences."
</AB>
<JT>Proc Nat Acad Sci USA </JT>
<PY>1994</PY>
<VO>91</VO>
<PP>1059-1063</PP>
</SEQ>

<SEQ>
<UI>0203   Luo,L.        The Statistical Correl.. Bull.Math.Biol. 91 
53(3):345-353
</UI>
<AU>Luo L;
    Li H
</AU>
<TI>The Statistical Correlation of Nucleotides in Protein-Coding DNA 
Sequences
</TI>
<SU>Composition;
    Information content;
    Markov;
    Statistical;
    CN;
    Correlation;
    DNA;
    Nucleotide
</SU>
<AB>"The statistical correlation of nucleotides in a DNA sequence is described
by a set of redundancies D1, D2, D3, .... By calculation of {Dn} of 2341 coding
regions of nucleic acid sequences it is demonstrated that about 2/3 of sequences
has correlation length &lt;=2, 10% of sequences - correlation with 3-periodicity and
others - long range aperiodic correlation. The implications of the results from
the interactions of random mutation and natural selection are discussed
briefly."
</AB>
<JT>Bull Math Biol</JT>
<PY>1991</PY>
<VO>53</VO>
<NO>3</NO>
<PP>345-353</PP>
</SEQ>

<SEQ>
<UI>0204   Churchill,G.A Stochastic Models for .. Bull.Math.Biol. 89 
51(1):79-94
</UI>
<AU>Churchill GA
</AU>
<TI>Stochastic Models for Heterogeneous DNA Sequences
</TI>
<SU>Composition;
    Information content;
    Markov;
    Likelihood;
    USA;
    Display;
    Stochastic;
    DNA;
    Model
</SU>
<AB>"In this paper, the DNA sequence is viewed as a stochastic process with
local compositional properties determined by the states of a hidden Markov
chain. The model used is a discrete-state, discrete-outcome version of a general
model for non-stationary time series proposed by Kitagawa (1987). A smoothing
algorithm is described which can be used to reconstruct the hidden process and
produce graphic displays of the compositional structure of a sequence. The
problem of parameter estimation is approached using likelihood methods ...."
</AB>
<JT>Bull Math Biol</JT>
<PY>1989</PY>
<VO>51</VO>
<NO>1</NO>
<PP>79-94</PP>
</SEQ>

<SEQ>
<UI>0205   Tavare,S.     Codon Preference and P.. Bull.Math.Biol. 89 
51(1):95-115
</UI>
<AU>Tavare S;
    Song B
</AU>
<TI>Codon Preference and Primary Sequence Structure in Protein-Coding Regions
</TI>
<SU>Composition;
    Information content;
    Markov;
    USA;
    Region;
    Codon;
    Complexity;
    Substitution;
    Structure
</SU>
<AB>"The stochastic complexity of a data base of 365 protein-coding regions is
analysed. When the primary sequence is modeled as a spatially homogeneous Markov
source, the fit to observed codon preference is very poor. The situation
improves substantially when a non-homogeneous model is used. Some implications
for the estimation of species phylogeny and substitution rates are discussed."
</AB>
<JT>Bull Math Biol</JT>
<PY>1989</PY>
<VO>51</VO>
<NO>1</NO>
<PP>95-115</PP>
</SEQ>

<SEQ>
<UI>0206   Sankoff,D.    Probabilistic Models o.. Bull.Math.Biol. 89 
51(1):117-124
</UI>
<AU>Sankoff D;
    Goldstein M
</AU>
<TI>Probabilistic Models of Genome Shuffling
</TI>
<SU>Genome;
    Probabilistic;
    CA;
    Shuffling;
    Model
</SU>
<AB>"The comparison of entire genomes in evolutionary studies gives rise to
alignments characterized by many intersections, or inversions in the order of
two fragments in different genomes. To model this, we suggest a random migration
process for fragments, and discuss its equilibrium distribution in the case of
linear and circular genomes. Simulations are carried out to explore 'cut-off'
behavior as the process approaches equilibrium. ... Questions of applicability
of these models are discussed."
</AB>
<JT>Bull Math Biol</JT>
<PY>1989</PY>
<VO>51</VO>
<NO>1</NO>
<PP>117-124</PP>
</SEQ>

<SEQ>
<UI>0207   Arratia,R.    Tutorial on Large Devi.. Bull.Math.Biol. 89 
51(1):125-131
</UI>
<AU>Arratia R;
    Gordon L
</AU>
<TI>Tutorial on Large Deviations for the Binomial Distribution
</TI>
<SU>Probabilistic;
    USA;
    Distribution
</SU>
<AB>"We present, in an easy to use form, the large deviation theory of the
binomial distribution: how to approximate the probability of k or more successes
in n independent trials, each with success probability p, when the specified
fraction of successes, a = k/n, satisfies 0 &lt; p &lt; a &lt; 1."
</AB>
<JT>Bull Math Biol</JT>
<PY>1989</PY>
<VO>51</VO>
<NO>1</NO>
<PP>125-131</PP>
</SEQ>

<SEQ>
<UI>0208   Zharkikh,A.A. VOSTORG: A Package of .. Gene            91 
101:251-254
</UI>
<AU>Zharkikh AA;
    Rzhetsky AY;
    Morosov PS;
    Sitnikova TL;
    Krushkal JS
</AU>
<TI>VOSTORG: A Package of Microcomputer Programs for Sequence Analysis and
Construction of Phylogenetic Trees
</TI>
<SU>Sequence analysis;
    Phylogeny;
    RU;
    Program;
    Phylogenetic
</SU>
<AB>"VOSTORG is a new, versatile package of programs for the inference and
presentation of phylogenetic trees, as well as an efficient tool for nucleotide
(nt) and amino acid (aa) sequence analysis (sequence input, verification,
alignment, construction of consensus, etc.). On appropriately equipped systems,
these data can be displayed on a video monitor or printed as required. ... The
package is designed to be easily handled by occasional computer users and yet it
is powerful enough for experienced professionals."
</AB>
<JT>Gene </JT>
<PY>1991</PY>
<VO>101</VO>
<PP>251-254</PP>
</SEQ>

<SEQ>
<UI>0209   Brutlag,D.L.  BLAZE(TM): An Implemen.. Computers Chem. 93 
17(2):203-207
</UI>
<AU>Brutlag DL;
    Dautricourt JP;
    Diaz R;
    Fier J;
    Moxon B;
    Stamm R
</AU>
<TI>BLAZE(TM): An Implementation of the Smith-Waterman Sequence Comparison
Algorithm on a Massively Parallel Computer
</TI>
<SU>Sequence comparison;
    Database search;
    Parallel;
    USA;
    FASTA;
    BLAST;
    Program;
    Algorithm
</SU>
<AB>"We have implemented the Smith and Waterman dynamic programming algorithm
on the massively parallel MP1104 computer from MasPar and compared its ability
to detect remote protein sequence homologies with that of other commonly used
database search algorithms. ... We have found that the algorithms, in order of
decreasing sensitivity are BLAZE, FASTDB, FASTA and BLAST. Hence the massively
parallel computers allow one to have maximal sensitivity and search speed
simultaneously."
</AB>
<JT>Computers Chem</JT>
<PY>1993</PY>
<VO>17</VO>
<NO>2</NO>
<PP>203-207</PP>
</SEQ>

<SEQ>
<UI>0210   States,D.J.   Improved Sensitivity o.. Methods: Compan 91 
3(1):66-70
</UI>
<AU>States DJ;
    Gish W;
    Altschul SF
</AU>
<TI>Improved Sensitivity of Nucleic Acid Database Searches Using
Application-Specific Scoring Matrices
</TI>
<SU>Database search;
    Sequence proximity;
    BLAST;
    Scoring;
    USA
</SU>
<AB>"Scoring matrices for nucleic acid sequence comparison that are based on
models appropriate to the analysis of molecular sequencing errors or biological
mutation processes are presented. In mammalian genomes, transition mutations
occur significantly more frequently than transversions, and the optimal scoring
of sequence alignments based on this substitution model differs from that
derived assuming a uniform mutation model. ... Results of searches performed
using BLASTN's default score matrix are compared with those using scores based
on a mutational model in which transitions are more prevalent than
transversions."
</AB>
<JT>Methods: Companion Methods Enzymol </JT>
<PY>1991</PY>
<VO>3</VO>
<NO>1</NO>
<PP>66-70</PP>
</SEQ>

<SEQ>
<UI>0211   Slisenko,A.O. String Matching in Rea.. Lecture Notes i 78 
64:493-496
</UI>
<AU>Slisenko AO
</AU>
<TI>String Matching in Real Time: Some Properties of the Data Structure
</TI>
<SU>Pattern match;
    Regularities;
    Search tree;
    RU;
    Data structure;
    String match;
    Complexity;
    Structure
</SU>
<AB>Mathematical Foundations of Computer Science, 1978: Proceedings, 7th.
Zakopane, Poland, 4-8 September 1978. Edited by J. Winkowski. "The two main aims
of this report are: (i) to claim new results on the complexity of a well-known
problem, namely, string-matching; (ii) to make explicit those general properties
of the data structure used in the real-time algorithm which provide its basic
speed capacities. Let us consider the following three problems of the
string-matching type: (1) recognize the set {uvw#v}, where u, v, w [are binary
sequences]; (2) find a longest repetition in a given string; (3) find all the
periodicities in a given string .... All these problems can be solved in real
time."
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>1978</PY>
<VO>64</VO>
<PP>493-496</PP>
</SEQ>

<SEQ>
<UI>0212   Nadeau,J.H.   Lengths of Chromosomal.. Proc.Nat.Acad.S 84 
81:814-818
</UI>
<AU>Nadeau JH;
    Taylor BA
</AU>
<TI>Lengths of Chromosomal Segments Conserved since Divergence of Man and
Mouse
</TI>
<SU>Chromosome;
    Rearrangement;
    Evolution;
    USA;
    Segment;
    Divergence
</SU>
<AB>"Linkage relationships of homologous loci in man and mouse were used to
estimate the mean length of autosomal segments conserved during evolution.
Comparison of the locations of &gt;83 homologous loci revealed 13 conserved
segments. ... Methods were developed for using this sample of conserved segments
to estimate the mean length of all conserved autosomal segments in the genome.
... The mean length of conserved segments was also used to estimate the number
of chromosomal rearrangements that have disrupted linkage since divergence of
man and mouse. This estimate was shown to be 178 ± 39 rearrangements."
</AB>
<JT>Proc Nat Acad Sci USA </JT>
<PY>1984</PY>
<VO>81</VO>
<PP>814-818</PP>
</SEQ>

<SEQ>
<UI>0213   Brooks,L.D.   The Probabilities of S.. Genomics        88 
3:207-216
</UI>
<AU>Brooks LD;
    Weir BS;
    Schaffer HE
</AU>
<TI>The Probabilities of Similarities in DNA Sequence Comparisons
</TI>
<SU>Pairwise comparison;
    Significance;
    Statistical;
    USA;
    Sequence comparison;
    Similarity;
    Probability;
    DNA
</SU>
<AB>"We discuss the statistical significance of local similarities found
between DNA sequences, and illustrate the procedure with reference to the Queen
and Korn algorithm. ... A table is given to assess the significance of longest
similarities in sequences of length up to 1000 bases. Quite long similarities
are expected to occur by chance alone. The critical values we calculate for
assessing significance are preferable to expected numbers of similarities used
by some commercial computer packages. ... We have not found approximate
formulas, such as those of Waterman (1986), to be applicable over a wide range
of conditions."
</AB>
<JT>Genomics </JT>
<PY>1988</PY>
<VO>3</VO>
<PP>207-216</PP>
</SEQ>

<SEQ>
<UI>0214   Hillis,D.M.   Application and Accura.. Science         94 264(29 
April):
</UI>
<AU>Hillis DM;
    Huelsenbeck JP;
    Cunningham CW
</AU>
<TI>Application and Accuracy of Molecular Phylogenies
</TI>
<SU>Phylogeny;
    Evolutionary tree;
    USA;
    Accuracy
</SU>
<AB>"The performance of methods of phylogenetic analysis can be assessed by
numerical simulation studies and by the experimental evolution of organisms in
controlled laboratory situations. Both kinds of assessment indicate that
existing methods are effective at estimating phylogenies over a wide range of
evolutionary conditions, especially if information about substitution bias is
used to provide differential weightings for character transformations."
</AB>
<JT>Science </JT>
<PY>1994</PY>
<VO>264</VO>
<NO>29 April</NO>
<PP>671-677</PP>
</SEQ>

<SEQ>
<UI>0215   Sellers,P.H.  Pattern Recognition in.. Lect.Math.Life  86 17:19-28
</UI>
<AU>Sellers PH
</AU>
<TI>Pattern Recognition in DNA
</TI>
<SU>Pattern recognition;
    USA;
    DNA;
    Recognition
</SU>
<AB>"The possibility of DNA sequencing brings to biology an increased need for
mathematical and computational tools. This need has been met at the Rockefeller
University by developing computer programs for pattern recognition in DNA. The
main point to be illustrated in this lecture is that the successful development
of such programs requires a formal mathematical approach to the biological
problems involved."
</AB>
<JT>Lect Math Life Sci</JT>
<PY>1986</PY>
<VO>17</VO>
<PP>19-28</PP>
</SEQ>

<SEQ>
<UI>0216   Jones,D.T.    The Rapid Generation o.. Comput.Appl.Bio 92 
8(3):275-282
</UI>
<AU>Jones DT;
    Taylor WR;
    Thornton JM
</AU>
<TI>The Rapid Generation of Mutation Data Matrices from Protein Sequences
</TI>
<SU>Sequence proximity;
    Scoring;
    UK;
    Protein
</SU>
<AB>"An efficient means for generating mutation data matrices from large
numbers of protein sequences is presented here. By means of an approximate
peptide-based sequence comparison algorithm, the set of sequences are clustered at
the 85% identity level. The closest relating pairs of sequences are aligned, and
observed amino acid exchanges tallied in a matrix. The raw mutation frequency
matrix is processed in a similar way to that described by Dayhoff et al. (1978),
and so the resulting matrices may be easily used in current sequence analysis
applications, in place of the standard mutation data matrices, which have not
been updated for 13 years."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1992</PY>
<VO>8</VO>
<NO>3</NO>
<PP>275-282</PP>
</SEQ>

<SEQ>
<UI>0217   Karlin,S.     Maximal Length of Comm.. Ann.Probab.     88 
16(2):535-563
</UI>
<AU>Karlin S;
    Ost F
</AU>
<TI>Maximal Length of Common Words Among Random Letter Sequences
</TI>
<SU>Longest common;
    Significance;
    Probabilistic;
    USA;
    Word
</SU>
<AB>"Consider random letter sequences ... based on a finite alphabet generated
by uniformly mixing stationary processes. The asymptotic distributional
properties of the length of the longest common word in r or more of the s
sequences ... are investigated. When the probability measures of the different
sequences are not too dissimilar, a classical extremal type limit law holds ....
The distributional properties of other long-word relationships and patterns
among the sequences are also discussed."
</AB>
<JT>Ann Probab</JT>
<PY>1988</PY>
<VO>16</VO>
<NO>2</NO>
<PP>535-563</PP>
</SEQ>

<SEQ>
<UI>0218   Nei,M.        Methods for Computing .. Mol.Biol.Evol.  85 
2(1):66-85
</UI>
<AU>Nei M;
    Stephens JC;
    Saitou N
</AU>
<TI>Methods for Computing the Standard Errors of Branching Points in an
Evolutionary Tree and Their Application to Molecular Data from Humans and Apes
</TI>
<SU>Evolutionary tree;
    Robustness;
    Analytical;
    Distance;
    UPGMA;
    USA;
    Error
</SU>
<AB>"Statistical methods for computing the standard errors of the branching
points of an evolutionary tree are developed. These methods are for the
unweighted pair-group method-determined (UPGMA) trees reconstructed from
molecular data such as amino acid sequences, nucleotide sequences, restriction-sites
data, and electrophoretic distances. They were applied to data for the
human, chimpanzee, gorilla, orangutan, and gibbon species."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1985</PY>
<VO>2</VO>
<NO>1</NO>
<PP>66-85</PP>
</SEQ>

<SEQ>
<UI>0219   Waterman,M.S. Probability Distributi.. Lect.Math.Life  86 17:29-56
</UI>
<AU>Waterman MS
</AU>
<TI>Probability Distributions for DNA Sequence Comparisons
</TI>
<SU>Sequence comparison;
    Significance;
    Statistical;
    Markov;
    USA;
    Segment;
    Distributed;
    Distribution;
    Probability;
    DNA
</SU>
<AB>"Recently DNA sequence comparisons have focused on finding long matching
segments between two sequences, rather than matching the entire sequences.
Generalizations of the celebrated Erdos-Renyi law give laws of large numbers and
extreme value distributions for random variables equal to the length of the
longest exact match and longest approximate match between the sequences. The
cases of independent, identically distributed sequences and of Markov chains are
presented. In the final section, simulated sequences and sequences from
bacteriophage lambda are analyzed in light of these theoretical results."
</AB>
<JT>Lect Math Life Sci</JT>
<PY>17</PY>
<VO>17</VO>
<PP>29-56</PP>
</SEQ>

<SEQ>
<UI>0220   Duret,L.      HOVERGEN: A Database o.. Nucleic Acids R 94 
22(12):2360-23
</UI>
<AU>Duret L;
    Mouchiroud D;
    Gouy M
</AU>
<TI>HOVERGEN: A Database of Homologous Vertebrate Genes
</TI>
<SU>Sequence database;
    Database search;
    Gene;
    FR
</SU>
<AB>"Similarity search programs easily find genes homologous to a given
sequence. However, only very tedious manual procedures allow the retrieval of
all sets of homologous genes sequenced for a given set of species. Moreover,
this search often generates errors due to the complexity of data to be managed
simultaneously: phylogenetic trees, alignments, taxonomy, sequences and related
information. HOVERGEN helps to solve these problems by integrating all this
information. ... This graphical tool gives thus a rapid and simple access to all
data necessary to interpret homology relationships between genes."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1994</PY>
<VO>22</VO>
<NO>12</NO>
<PP>2360-2365</PP>
</SEQ>

<SEQ>
<UI>0221   Collins,J.F.  High-Efficiency Sequen.. Computers and.. 
90Addison-Wesley
</UI>
<AU>Collins JF;
    Reddaway SF
</AU>
<TI>High-Efficiency Sequence Database Searching: Use of the Distributed Array
Processor
</TI>
<ED>Bell G
    Marr T
</ED>
<BK>Computers and DNA, SFI Studies in the Sciences of Complexity, vol. VII
</BK>
<SU>Sequence database;
    Database search;
    Parallel;
    Distributed;
    UK
</SU>
<AB>"Careful mapping of the sequence comparison algorithm described by
Coulson, Collins and Lyall [1987] has provided on the AMT DAP 510 machine a
high-speed method of searching for local protein sequence similarities in
databases. ... [Novel] methods will be required to maintain an adequate
search-and-retrieval capability with the most powerful computers. Such a method that
exploits the features of the DAP is described, whose performance should provide
the basis for adequate searching even when the database has reached the size of
the human genome, or 3 x 10^9 bases of genetic sequence."
</AB>
<PU>Addison-Wesley </PU>
<PL>Reading, MA </PL>
<PY>1990</PY>
<PP>85-92</PP>
</SEQ>

<SEQ>
<UI>0222   Davison,D.B.  Sequence Searching on .. Computers and.. 
90Addison-Wesley
</UI>
<AU>Davison DB
</AU>
<TI>Sequence Searching on Supercomputers
</TI>
<ED>Bell G
    Marr T
</ED>
<BK>Computers and DNA, SFI Studies in the Sciences of Complexity, vol. VII
</BK>
<SU>Sequence search;
    Program;
    Performance;
    USA
</SU>
<AB>"There are a large number of machines available [at Los Alamos] .... The
abundance of cycles, and the low cost, could lead one to think that optimization
would be less important. That is not so. Precisely because the queries are
larger and more involved, it is necessary to create code that is as efficient as
possible. This paper will discuss the steps involved in taking an existing
similarity code and improving its performance."
</AB>
<PU>Addison-Wesley </PU>
<PL>Reading, MA </PL>
<PY>1990</PY>
<PP>93-97</PP>
</SEQ>

<SEQ>
<UI>0223   Lapedes,A.    Application of Neural .. Computers and.. 
90Addison-Wesley
</UI>
<AU>Lapedes A;
    Barnes C;
    Burks C;
    Farber R;
    Sirotkin K
</AU>
<TI>Application of Neural Networks and Other Machine Learning Algorithms to
DNA Sequence Analysis
</TI>
<ED>Bell G
    Marr T
</ED>
<BK>Computers and DNA, SFI Studies in the Sciences of Complexity, vol. VII
</BK>
<SU>Sequence analysis;
    Neural;
    DNA;
    USA;
    Network;
    Learning;
    Algorithm
</SU>
<AB>"In this article we report initial, quantitative results on application of
simple neural networks and simple machine learning methods to two problems in
DNA sequence analysis. ... (1) Determination of whether procaryotic and
eucaryotic DNA sequence segments are translated to protein. ... (2)
Determination of whether eucaryotic DNA sequence segments containing the
dinucleotides 'AG' or 'GT' are transcribed to RNA splice junctions."
</AB>
<PU>Addison-Wesley </PU>
<PL>Reading, MA </PL>
<PY>1990</PY>
<PP>157-181</PP>
</SEQ>

<SEQ>
<UI>0224   Fischer,M.J.  String-Matching and Ot.. SIAM-AMS Proc.  74 
7:113-125
</UI>
<AU>Fischer MJ;
    Paterson MS
</AU>
<TI>String-Matching and Other Products
</TI>
<SU>String match;
    Approximate match;
    Don't care;
    USA
</SU>
<AB>In Complexity of Computation, Karp, R. M. (ed.). "The string-matching
problem considered here is to find all occurrences of a given pattern as a
substring of another longer string. ... The more difficult case where either
string may have 'don't care' symbols which are deemed to match with all symbols
is also considered. By exploiting the formal similarity of string-matching with
integer multiplication, a new algorithm has been obtained with a running time
which is only slightly worse than linear."
</AB>
<BK>SIAM-AMS Proc </BK>
<PY>1974</PY>
<VO>7</VO>
<PP>113-125</PP>
</SEQ>

<SEQ>
<UI>0225   Bird,R.S.     Formal Derivation of a.. Sci.Comput.Prog 89 
12:93-104
</UI>
<AU>Bird RS;
    Gibbons J;
    Jones G
</AU>
<TI>Formal Derivation of a Pattern Matching Algorithm
</TI>
<SU>Pattern match;
    Knuth-Morris-Pratt;
    UK;
    Algorithm
</SU>
<AB>"This paper is devoted to the synthesis of a functional version of the
Knuth-Morris-Pratt algorithm. ... However, we do assume some familiarity with
the basic ideas of functional programming."
</AB>
<JT>Sci Comput Programming </JT>
<PY>1989</PY>
<VO>12</VO>
<PP>93-104</PP>
</SEQ>

<SEQ>
<UI>0226   Wilbur,W.J.   On the PAM Matrix Mode.. Mol.Biol.Evol.  85 
2(5):434-447
</UI>
<AU>Wilbur WJ
</AU>
<TI>On the PAM Matrix Model of Protein Evolution
</TI>
<SU>Scoring;
    PAM;
    Markov;
    USA;
    Evolution;
    Protein;
    Model;
    Matrix
</SU>
<AB>"The internal consistency of the PAM matrix model of protein evolution is
here investigated. ... A discrepancy of more than two orders of magnitude is
found between the predictions and the data when this is carried out. This is
partly accounted for by an error in constructing the matrix. However, it also
seems necessary that the basic model be modified. Several possibilities are
considered. One of these is to incorporate a site-dependent spectrum of
mutabilities associated with each amino acid."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1985</PY>
<VO>2</VO>
<NO>5</NO>
<PP>434-447</PP>
</SEQ>

<SEQ>
<UI>0227   Waterman,M.S. Multiple Hypothesis Te.. Computers and.. 
90Addison-Wesley
</UI>
<AU>Waterman MS;
    Gordon L
</AU>
<TI>Multiple Hypothesis Testing for Sequence Comparisons
</TI>
<ED>Bell G
    Marr T
</ED>
<BK>Computers and DNA, SFI Studies in the Sciences of Complexity, vol. VII
</BK>
<SU>Pairwise comparison;
    Significance;
    Locally optimal;
    Sequence comparison;
    USA
</SU>
<AB>"It is remarkable that these segmental matchings from random sequences are
so long and score so well. Simulations such as this suggest that understanding
the distribution of score (max Gij) under the null hypothesis of independence is
an important goal. Otherwise if the analysis of 'interesting' alignments
proceeds on an ad hoc basis, it is easy to be misled by statistically
insignificant alignments. ... The examples of this paper are of DNA sequences,
but the general theory allows analysis of protein and other sequences."
</AB>
<PU>Addison-Wesley </PU>
<PL>Reading, MA </PL>
<PY>1990</PY>
<PP>127-135</PP>
</SEQ>

<SEQ>
<UI>0228   Hebrard,J.J.  Calcul de la distance .. RAIRO Inform.Th 86 
20(4):441-456
</UI>
<AU>Hebrard JJ;
    Crochemore M
</AU>
<TI>Calcul de la distance par les sous-mots
</TI>
<SU>Sequence proximity;
    Automata;
    Longest common;
    FR;
    DE;
    Data structure;
    Distance
</SU>
<AB>"This paper gives two methods to compute the shortest subsequence which
distinguishes two different words u and v. The use of automata together with
data structures for 'Union-Find' questions leads to an algorithm almost linear
in the length of uv." A distance between u and v can be based on the length of
this subsequence.
</AB>
<JT>RAIRO Inform Theor Appl</JT>
<PY>1986</PY>
<VO>20</VO>
<NO>4</NO>
<PP>441-456</PP>
</SEQ>

<SEQ>
<UI>0229   Claverie,J.M. Database of Ancient Se.. Nature (Lond.)  93 364(1 
July):19
</UI>
<AU>Claverie JM
</AU>
<TI>Database of Ancient Sequences
</TI>
<SU>Sequence database;
    Region;
    Motif;
    USA;
    Ancient
</SU>
<AB>"Green et al. [1993] have introduced the concept of ancient conserved
regions (ACRs), defined as contiguous amino-acid sequence segments predating the
coelomate radiation 500-600 million years ago. ... I have estimated the total
number and assembled a repertoire of these ancestral sequences from an analysis
of the Swiss-Prot (21.0) protein database. ... This small 'ancestral' subset
thus constitutes a convenient resource for the fast screening and identification
of new sequences (for instance numerous cDNA partial sequences) and the
definition of motifs. The 551-representative ACR set is available on e-mail
request ...."
</AB>
<JT>Nature (Lond ) </JT>
<PY>1993</PY>
<VO>364</VO>
<NO>1 July</NO>
<PP>19-20</PP>
</SEQ>

<SEQ>
<UI>0230   Green,P.      Ancient Conserved Regi.. Science         93 259 (19 
March)
</UI>
<AU>Green P;
    Lipman D;
    Hillier L;
    Waterston R;
    States D;
    Claverie JM
</AU>
<TI>Ancient Conserved Regions in New Gene Sequences and the Protein Databases
</TI>
<SU>Database search;
    USA;
    Region;
    Gene;
    Protein;
    Ancient
</SU>
<AB>"Sets of new gene sequences from human, nematode, and yeast were compared
with each other and with a set of Escherichia coli genes in order to detect
ancient evolutionarily conserved regions (ACRs) in the encoded proteins. Nearly
all of the ACRs so identified were found to be homologous to sequences in the
protein databases. ... It is estimated that there are fewer than 900 ACRs in
all."
</AB>
<JT>Science </JT>
<PY>1993</PY>
<VO>259</VO>
<NO>19 March</NO>
<PP>1711-1716</PP>
</SEQ>

<SEQ>
<UI>0231   Schneider,T.D A Design for Computer .. Nucleic Acids R 82 
10(9):3013-302
</UI>
<AU>Schneider TD;
    Stormo GD;
    Haemer JS;
    Gold L
</AU>
<TI>A Design for Computer Nucleic-Acid-Sequence Storage, Retrieval, and
Manipulation
</TI>
<SU>Sequence database;
    Database search;
    Management;
    Program;
    USA;
    Retrieval
</SU>
<AB>"We have designed and built a data-base system for the storage of
nucleic-acid sequences. The system consists of a data base ('the library') and software
that manages and provides access to that data base ('the librarian')."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1982</PY>
<VO>10</VO>
<NO>9</NO>
<PP>3013-3024</PP>
</SEQ>

<SEQ>
<UI>0232   Breslauer,D.  A Lower Bound for Para.. SIAM J.Comput.  92 
21(5):856-862
</UI>
<AU>Breslauer D;
    Galil Z
</AU>
<TI>A Lower Bound for Parallel String Matching
</TI>
<SU>String match;
    Parallel;
    USA;
    Complexity
</SU>
<AB>"This paper presents an Ω(log log m) lower bound on the number of rounds
necessary for finding occurrences of a pattern string P[1..m] in a text string
T[1..2m] in parallel using m comparisons in each round. This bound is within a
constant factor of the fastest algorithm for this problem [Breslauer, Galil
(1990)] and also holds for an m-processor CRCW-PRAM in the case of a general
alphabet. Consequently, the paper derives the parallel complexity of the string
matching problem using p processors for general alphabets ...."
</AB>
<JT>SIAM J Comput</JT>
<PY>1992</PY>
<VO>21</VO>
<NO>5</NO>
<PP>856-862</PP>
</SEQ>

<SEQ>
<UI>0233   Boguski,M.S.  On Computer-Assisted A.. J.Lipid Res.    86 
27:1011-1034
</UI>
<AU>Boguski MS;
    Freeman M;
    Elshourbagy NA;
    Taylor JM;
    Gordon JI
</AU>
<TI>On Computer-Assisted Analysis of Biological Sequences: Proline
Punctuation, Consensus Sequences, and Apolipoprotein Repeats
</TI>
<SU>Sequence alignment;
    Sequence comparison;
    Database search;
    Consensus sequence;
    Structure;
    Review;
    USA;
    Sequence analysis;
    Repeat
</SU>
<AB>"We describe a number of computer methods that have been applied to the
analysis of apolipoprotein sequences. We discuss the suitability of these
methods for particular problems, how the choice of initial 'parameters' can
affect the results, and what the results can tell us about protein or gene
sequences. We also identify some outstanding problems of apolipoprotein
sequence analysis where further work is needed."
</AB>
<JT>J Lipid Res</JT>
<PY>1986</PY>
<VO>27</VO>
<PP>1011-1034</PP>
</SEQ>

<SEQ>
<UI>0234   Tsai,W.H.     Attributed String Matc.. IEEE Trans.Patt 85 
7(4):453-462
</UI>
<AU>Tsai WH;
    Yu SS
</AU>
<TI>Attributed String Matching with Merging for Shape Recognition
</TI>
<SU>String match;
    Sequence proximity;
    CN;
    Segment;
    Recognition
</SU>
<AB>"Each attributed string is an ordered sequence of shape boundary
primitives, each representing a basic boundary structural unit, line segment,
with two types of numerical attributes, length and direction. A new type of
primitive edit operation, called merge, is then introduced, which can be used
to combine and then match any number of consecutive boundary primitives in one
shape with those in another. The resulting attributed string matching with
merging approach is shown useful for recognizing distorted shapes."
</AB>
<JT>IEEE Trans Patt Anal Mach Intell</JT>
<PY>1985</PY>
<VO>7</VO>
<NO>4</NO>
<PP>453-462</PP>
</SEQ>

<SEQ>
<UI>0235   Baeza-Yates,R On Boyer-Moore Automata  Algorithmica    94 
12(4/5):268-29
</UI>
<AU>Baeza-Yates RA;
    Choffrut C;
    Gonnet GH
</AU>
<TI>On Boyer-Moore Automata
</TI>
<SU>String search;
    Pattern match;
    Automata;
    Boyer-Moore;
    CL
</SU>
<AB>"The notion of Boyer-Moore automaton was introduced by Knuth, Morris, and
Pratt in their historical paper on fast pattern matching. It leads to an
algorithm that requires more preprocessing but is more efficient than the
original Boyer-Moore's algorithm. We formalize the notion of Boyer-Moore
automaton and we give an efficient building algorithm. Also, bounds on the
number of states are presented, and the concept of potential of a transition is
introduced to improve the worst- and average-case behavior of these machines."
</AB>
<JT>Algorithmica</JT>
<PY>1994</PY>
<VO>12</VO>
<NO>4/5</NO>
<PP>268-292</PP>
</SEQ>

<SEQ>
<UI>0236   Zweig,S.E.    Analysis of Large Nucl.. Nucleic Acids R 84 
12(1):767-776
</UI>
<AU>Zweig SE
</AU>
<TI>Analysis of Large Nucleic Acid Dot Matrices on Small Computers
</TI>
<SU>Sequence comparison;
    Dot;
    Program;
    USA;
    Compression
</SU>
<AB>"A UCSD Pascal program was developed which can analyze nucleic acid dot
matrices of up to 9500 x 9500 in size on the Apple II computer. Although
matrices of such size consume large amounts of computer memory, this program
minimizes these problems by analyzing only small strips of the matrix at a
time, and then transferring the results to a floppy disk or printer. Compression and
memory efficient code further enhance the size of the matrix that can be
analyzed."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1984</PY>
<VO>12</VO>
<NO>1</NO>
<PP>767-776</PP>
</SEQ>

<SEQ>
<UI>0237   Stephens,J.C. Statistical Methods of.. Mol.Biol.Evol.  85 
2(6):539-556
</UI>
<AU>Stephens JC
</AU>
<TI>Statistical Methods of DNA Sequence Analysis: Detection of Intragenic
Recombination or Gene Conversion
</TI>
<SU>Sequence analysis;
    Significance;
    Statistical;
    Phylogeny;
    USA;
    Gene;
    DNA;
    Recombination;
    Detection
</SU>
<AB>"Simple but exact statistical tests for detecting a cluster of associated
nucleotide changes in DNA are presented. The tests are based on the linear
distribution of a set of s sites among a total of n sites, where the s sites
may be the variable sites, sites of insertion/deletion, or categorized in some
other way. These tests are especially useful for detecting gene conversion and
intragenic recombination in a sample of DNA sequences."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1985</PY>
<VO>2</VO>
<NO>6</NO>
<PP>539-556</PP>
</SEQ>

<SEQ>
<UI>0238   Shepherd,J.C. Ancient Patterns in Nu.. Methods Enzymol 90 
183:180-192
</UI>
<AU>Shepherd JCW
</AU>
<TI>Ancient Patterns in Nucleic Acid Sequences
</TI>
<SU>Region;
    UK;
    Coding;
    Frame;
    Nucleic acid;
    Ancient
</SU>
<AB>"Here a brief summary is given of some of the evidence and reasoning
leading to the conclusion that remnants of a primeval coding system still exist
in present-day DNA sequences from all types of living organisms. ... A simple
computer program is then described which looks for remnants of these primeval
messages. Not only is it useful as a quick method of analysis of newly
determined sequences by predicting the likely reading frame and, in some cases,
the extent of existing genes, but it can be a guide to the nature of the genes
and their past history."
</AB>
<JT>Methods Enzymol</JT>
<PY>1990</PY>
<VO>183</VO>
<PP>180-192</PP>
</SEQ>

<SEQ>
<UI>0239   Hunt,L.T.     Usefulness of the PIR .. Methods in Pr.. 
91Birkhauser
</UI>
<AU>Hunt LT
</AU>
<TI>Usefulness of the PIR Database for Protein Comparisons
</TI>
<ED>Jornvall H;
    Hoog JO;
    Gustavsson AM
</ED>
<BK>Methods in Protein Sequence Analysis
</BK>
<SU>Sequence database;
    Sequence analysis;
    USA;
    Sequence comparison;
    Protein;
    PIR
</SU>
<AB>"Innovative options being developed for the protein sequence databases of
the PIR-International will aid sequence analysis by providing more rapid access
to new data, facilitating information retrieval, incorporating new types of
information and data representations, and adding to the programs for searching,
comparison, and prediction. Protocols for sequence analysis are briefly
outlined."
</AB>
<PU>Birkhauser</PU>
<PL>Basel</PL>
<PY>1991</PY>
<PP>343-352</PP>
</SEQ>

<SEQ>
<UI>0240   Bleasby,A.J.  Construction of Valida.. Protein Eng.    90 
3(3):153-159
</UI>
<AU>Bleasby AJ;
    Wootton JC
</AU>
<TI>Construction of Validated, Non-redundant Composite Protein Sequence
Databases
</TI>
<SU>Sequence database;
    UK;
    Protein
</SU>
<AB>"A strategy has been developed for the construction of a validated,
comprehensive composite protein sequence database. Entries are amalgamated from
primary source data bases by a largely automated set of processes in which
redundant and trivially different entries are eliminated. A modular approach
has been adopted to allow scientific judgement to be used at each stage of database
processing and amalgamation."
</AB>
<JT>Protein Eng</JT>
<PY>1990</PY>
<VO>3</VO>
<NO>3</NO>
<PP>153-159</PP>
</SEQ>

<SEQ>
<UI>0241   Chothia,C.    The Relation Between t.. EMBO J.         86 
5(4):823-826
</UI>
<AU>Chothia C;
    Lesk AM
</AU>
<TI>The Relation Between the Divergence of Sequence and Structure in Proteins
</TI>
<SU>Sequence comparison;
    Structure;
    UK;
    Divergence;
    Protein
</SU>
<AB>"Here we report a systematic comparison of structures from eight different
protein families. This shows that the extent of the structural changes is
directly related to the extent of the sequence changes."
</AB>
<JT>EMBO J</JT>
<PY>1986</PY>
<VO>5</VO>
<NO>4</NO>
<PP>823-826</PP>
</SEQ>

<SEQ>
<UI>0242   Dayhoff,M.O.  A Model of Evolutionar.. Atlas of Prot.. 72National 
Biomed
</UI>
<AU>Dayhoff MO;
    Eck RV;
    Park CM
</AU>
<TI>A Model of Evolutionary Change in Proteins
</TI>
<ED>Dayhoff MO
</ED>
<BK>Atlas of Protein Sequence and Structure, 1972, Volume 5
</BK>
<SU>Sequence proximity;
    Substitution;
    PAM;
    Scoring;
    USA;
    Protein;
    Model
</SU>
<AB>"What mutations are most likely to be accepted? Which amino acids are
least likely to change? How does the passage of time affect the similarity of
related protein sequences?"
</AB>
<PU>National Biomedical Research Foundation</PU>
<PL>Washington, DC</PL>
<PY>1972</PY>
<PP>89-99</PP>
</SEQ>

<SEQ>
<UI>0243   Dayhoff,M.O.  A Model of Evolutionar.. Atlas of Prot.. 68National 
Biomed
</UI>
<AU>Dayhoff MO;
    Eck RV
</AU>
<TI>A Model of Evolutionary Change in Proteins
</TI>
<ED>Dayhoff MO;
    Eck RV
</ED>
<BK>Atlas of Protein Sequence and Structure, 1967-68
</BK>
<SU>Sequence proximity;
    Substitution;
    PAM;
    Scoring;
    USA;
    Evolutionary distance;
    Protein;
    Model
</SU>
<AB>Accepted Point Mutations. Mutability of Amino Acids. Amino Acid
Frequencies in the Mutation Data. Mutation Probability Matrix for the
Evolutionary Distance of Two PAMs. Simulation of the Mutational Process.
Mutation Probability Matrices. Estimation of Evolutionary Distance. Relatedness
Odds Matrix. Computing Relationships Between Sequences.
</AB>
<PU>National Biomedical Research Foundation</PU>
<PL>Silver Spring, MD</PL>
<PY>1968</PY>
<PP>33-41</PP>
</SEQ>

<SEQ>
<UI>0244   Levin,J.M.    An Algorithm for Secon.. FEBS Lett.      86 
205(2):303-308
</UI>
<AU>Levin JM;
    Robson B;
    Garnier J
</AU>
<TI>An Algorithm for Secondary Structure Determination in Proteins Based on
Sequence Similarity
</TI>
<SU>Sequence proximity;
    Structure;
    FR;
    Scoring;
    Similarity;
    Protein;
    Secondary;
    Algorithm
</SU>
<AB>"A secondary structure prediction algorithm is proposed on the hypothesis
that short homologous sequences of amino acids have the same secondary
structure tendencies. Comparisons are made with the secondary structure assignments of
Kabsch and Sander from X-ray data ... and an empirically determined similarity
matrix which assigns a sequence similarity score between any two sequences of 7
residues in length. This similarity matrix differs in many respects from that
of the Dayhoff substitution matrix ...."
</AB>
<JT>FEBS Lett</JT>
<PY>1986</PY>
<VO>205</VO>
<NO>2</NO>
<PP>303-308</PP>
</SEQ>

<SEQ>
<UI>0245   Pevzner,P.A.  l-Tuple DNA Sequencing.. J.Biomol.Struct 89 
7(1):63-73
</UI>
<AU>Pevzner PA
</AU>
<TI>l-Tuple DNA Sequencing: Computer Analysis
</TI>
<SU>Supersequence;
    Shortest common;
    Reconstruct;
    RU;
    DNA;
    Sequencing
</SU>
<AB>"In the present paper a necessary and sufficient condition for testing the
uniqueness of sequence reconstruction is obtained and an efficient
reconstruction algorithm is proposed. ... The problem of l-tuple DNA sequencing
is a particular case of the problem about minimal superword: for a given set of
words S one has to find the word of minimal length containing all words of the
set S. Gallant et al. (1980) proved the NP-completeness of this problem. Hence
one can hope to obtain the efficient algorithms for its solution only in
particular cases."
</AB>
<JT>J Biomol Struct &amp; Dyn</JT>
<PY>1989</PY>
<VO>7</VO>
<NO>1</NO>
<PP>63-73</PP>
</SEQ>

<SEQ>
<UI>0246   Suen,C.Y.     n-Gram Statistics for .. IEEE Trans.Patt 79 
1(2):164-172
</UI>
<AU>Suen CY
</AU>
<TI>n-Gram Statistics for Natural Language Understanding and Text Processing
</TI>
<SU>Retrieval;
    N-gram;
    Statistical;
    CA;
    Language
</SU>
<AB>"n-Gram (n = 1 to 5) statistics and other properties of the English
language were derived for applications in natural language understanding and
text processing. They were computed from a well-known corpus composed of 1
million word samples. ... The positional distributions of n-grams obtained in
the present study are discussed. Statistical studies on word length and trends
of n-gram frequencies versus vocabulary are presented. In addition to a survey
of n-gram statistics found in the literature, a collection of n-gram statistics
obtained by other researchers is reviewed and compared."
</AB>
<JT>IEEE Trans Patt Anal Mach Intell</JT>
<PY>1979</PY>
<VO>1</VO>
<NO>2</NO>
<PP>164-172</PP>
</SEQ>

<SEQ>
<UI>0247   Kabsch,W.     On the Use of Sequence.. Proc.Nat.Acad.S 84 
81:1075-1078
</UI>
<AU>Kabsch W;
    Sander C
</AU>
<TI>On the Use of Sequence Homologies to Predict Protein Structure: Identical
Pentapeptides can have Completely Different Conformations
</TI>
<SU>Sequence comparison;
    Structure;
    DE;
    Homology;
    Protein
</SU>
<AB>"Pentapeptide structure within a protein is strongly dependent on sequence
context, a fact essentially ignored in most protein structure prediction
methods: just considering the local sequence of five residues is not sufficient
to predict correctly the local conformation (secondary structure). ... Also, we
are warned that in the growing practice of comparing a new protein sequence
with a data base of known sequences, finding an identical pentapeptide sequence
between two proteins is not a significant indication of structural similarity
or of evolutionary kinship."
</AB>
<JT>Proc Nat Acad Sci USA</JT>
<PY>1984</PY>
<VO>81</VO>
<PP>1075-1078</PP>
</SEQ>

<SEQ>
<UI>0248   Van Bockstael Sequence Representation  Biochimie       85 
67:509-516
</UI>
<AU>Van Bockstaele F
</AU>
<TI>Sequence Representation
</TI>
<SU>Representation;
    FR;
    Sequence analysis;
    Linguistic;
    Segment
</SU>
<AB>"This article deals with the definition of a method for analyzing
sequences of symbols, especially biological sequences. We are mostly interested
in finding representations of sequences, that could help to explicate
relationship between their structure and their activity. Starting with
automatically built rules, governing occurrences of symbols within sequences,
we define ways of using these rules to determine different subsequences that we
assume to be contexts. Labelled contexts provide a possible representation of
sequences."
</AB>
<JT>Biochimie</JT>
<PY>1985</PY>
<VO>67</VO>
<PP>509-516</PP>
</SEQ>

<SEQ>
<UI>0249   Raiha,K.J.    The Shortest Common Su.. Theoret.Comput. 81 
16:187-198
</UI>
<AU>Raiha KJ;
    Ukkonen E
</AU>
<TI>The Shortest Common Supersequence Problem over Binary Alphabet is NP-complete
</TI>
<SU>Supersequence;
    Complexity;
    FI;
    Shortest common
</SU>
<AB>"We consider the complexity of the Shortest Common Supersequence (SCS)
problem .... The SCS problem is shown to be NP-complete for strings over an
alphabet of size &gt;= 2."
</AB>
<JT>Theoret Comput Sci</JT>
<PY>1981</PY>
<VO>16</VO>
<PP>187-198</PP>
</SEQ>

<SEQ>
<UI>0250   Benson,G.     A Space Efficient Algo.. Lecture Notes i 94 807:1-14
</UI>
<AU>Benson G
</AU>
<TI>A Space Efficient Algorithm for Finding the Best Non-Overlapping Alignment
Score
</TI>
<SU>Sequence alignment;
    Repeat;
    Region;
    USA;
    Score;
    Algorithm
</SU>
<AB>5th Annual Symposium, CPM 94. Asilomar, June 5-8, 1994. Proceedings.
"Repeating patterns make up a significant fraction of DNA and protein molecules.
These repeating regions are important to biological function because they may
act as catalytic, regulatory or evolutionary sites and because they have been
implicated in human disease. .... In this paper, we present a space efficient
algorithm for finding the maximum alignment score for any two substrings of a
single string T under the condition that the substrings do not overlap. In a
biological context, this corresponds to the largest repeating region in the
molecule."
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>1994</PY>
<VO>807</VO>
<PP>1-14</PP>
</SEQ>

<SEQ>
<UI>0251   Chao,K.M.     Computing all Suboptim.. Lecture Notes i 94 
807:31-42
</UI>
<AU>Chao KM
</AU>
<TI>Computing all Suboptimal Alignments in Linear Space
</TI>
<SU>Sequence alignment;
    Suboptimal;
    USA
</SU>
<AB>5th Annual Symposium, CPM 94. Asilomar, June 5-8, 1994. Proceedings.
"Recently, a new compact representation for suboptimal alignments was proposed
by Naor and Brutlag (1993). The kernel of that representation is a minimal
directed acyclic graph (DAG) containing all suboptimal alignments. In this
paper, we propose a method that computes such a DAG in space linear to the
graph size. ... To exploit the computed DAG, we employ a variant of Aho-Corasick
pattern matching machine ... to locate all occurrences of specified patterns,
and then find a path in the DAG that maximizes the sum of the scores of the
non-overlapping patterns occurring in it."
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>1994</PY>
<VO>807</VO>
<PP>31-42</PP>
</SEQ>

<SEQ>
<UI>0252   Bafna,V.      Approximation Algorith.. Lecture Notes i 94 
807:43-53
</UI>
<AU>Bafna V;
    Lawler EL;
    Pevzner PA
</AU>
<TI>Approximation Algorithms for Multiple Sequence Alignment
</TI>
<SU>Multiple alignment;
    Approximation;
    Sequence alignment;
    USA;
    Algorithm
</SU>
<AB>5th Annual Symposium, CPM 94. Asilomar, June 5-8, 1994. Proceedings. "We
consider the problem of aligning of k sequences of length n. The cost function
is sum of pairs, and satisfies triangle inequality. ... We generalize this
approach to assemble an alignment of k sequences from optimally aligned subsets
of l &lt; k sequences to obtain an improved performance guarantee. For arbitrary l
&lt; k, we devise deterministic and randomized algorithms yielding performance
guarantees of 2 - l/k. For fixed l, the running times of these algorithms are
polynomial in n and k."
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>1994</PY>
<VO>807</VO>
<PP>43-53</PP>
</SEQ>

<SEQ>
<UI>0253   Huang,X.      A Context Dependent Me.. Lecture Notes i 94 
807:54-63
</UI>
<AU>Huang X
</AU>
<TI>A Context Dependent Method for Comparing Sequences
</TI>
<SU>Sequence comparison;
    Sequence proximity;
    Pairwise alignment;
    USA
</SU>
<AB>5th Annual Symposium, CPM 94. Asilomar, June 5-8, 1994. Proceedings. "A
scoring scheme is presented to measure the similarity score between two
biological sequences, where matches are weighted dependent on their context.
The scheme generalizes a widely used scoring scheme. A dynamic programming
algorithm is developed to compute a largest-scoring alignment of two sequences .... Also
developed is an algorithm for computing a largest-scoring local alignment
between two sequences in quadratic time and linear space."
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>1994</PY>
<VO>807</VO>
<PP>54-63</PP>
</SEQ>

<SEQ>
<UI>0254   Cobbs,A.L.    Fast Identification of.. Lecture Notes i 94 
807:64-74
</UI>
<AU>Cobbs AL
</AU>
<TI>Fast Identification of Approximately Matching Substrings
</TI>
<SU>Approximate match;
    Identification;
    USA
</SU>
<AB>5th Annual Symposium, CPM 94. Asilomar, June 5-8, 1994. Proceedings. "We
give an efficient algorithm for finding all maximal matches between [two
strings]. The algorithm runs in time bounded by the sum of the lengths of the
maximal matches .... The main application is identifying homologous regions of
protein sequences."
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>1994</PY>
<VO>807</VO>
<PP>64-74</PP>
</SEQ>

<SEQ>
<UI>0255   Huang,X.      Parametric Recomputing.. Lecture Notes i 94 
807:87-101
</UI>
<AU>Huang X;
    Pevzner PA;
    Miller W
</AU>
<TI>Parametric Recomputing in Alignment Graphs
</TI>
<SU>Sequence alignment;
    Parametric;
    Graph;
    USA
</SU>
<AB>5th Annual Symposium, CPM 94. Asilomar, June 5-8, 1994. Proceedings.
"DNA/protein sequence alignments in computational molecular biology depend
heavily on the settings of penalties for substitutions, insertions/deletions
and gaps. Inappropriate choice of parameters causes irrelevant matches ('noise') to
be reported .... This paper provides a computational underpinning for such
iterative noise filtration in alignment graphs. Our main results assume that a
preliminary noisy alignment, computed with reasonable but ad hoc parameters, is
given; the problem is to modify the parameters to reduce noise."
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>1994</PY>
<VO>807</VO>
<PP>87-101</PP>
</SEQ>

<SEQ>
<UI>0256   Manber,U.     A Text Compression Sch.. Lecture Notes i 94 
807:113-124
</UI>
<AU>Manber U
</AU>
<TI>A Text Compression Scheme that Allows Fast Searching Directly in the
Compressed File
</TI>
<SU>Sequence search;
    Compression;
    String match;
    USA
</SU>
<AB>5th Annual Symposium, CPM 94. Asilomar, June 5-8, 1994. Proceedings. "A
new text compression scheme is presented in this paper. The main purpose of
this scheme is to speed up string matching by searching the compressed file
directly. The scheme requires no modification of the string-matching algorithm, which is
used as a black box; any string-matching procedure can be used. Instead, the
pattern is modified; only the outcome of the matching of the modified pattern
against the compressed file is decompressed. Since the compressed file is
smaller than the original file, the search is faster both in terms of I/O time
and processing time than a search in the original file."
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>1994</PY>
<VO>807</VO>
<PP>113-124</PP>
</SEQ>

<SEQ>
<UI>0257   Lestree,L.    Unit Route Upper Bound.. Lecture Notes i 94 
807:136-145
</UI>
<AU>Lestree L
</AU>
<TI>Unit Route Upper Bound for String-Matching on Hypercube
</TI>
<SU>String match;
    Parallel;
    FR
</SU>
<AB>5th Annual Symposium, CPM 94. Asilomar, June 5-8, 1994. Proceedings. "We
give here an algorithm of string matching on a hypercube with constant memory
.... This algorithm is very close to the lower bound of the problem for this
architecture. ... The model chosen here is a SIMD hypercube with free
communication."
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>1994</PY>
<VO>807</VO>
<PP>136-145</PP>
</SEQ>

<SEQ>
<UI>0258   Kosaraju,S.R. Computation of Squares.. Lecture Notes i 94 
807:146-150
</UI>
<AU>Kosaraju SR
</AU>
<TI>Computation of Squares in a String (Preliminary Version)
</TI>
<SU>Regularities;
    Search tree;
    Square;
    USA
</SU>
<AB>5th Annual Symposium, CPM 94. Asilomar, June 5-8, 1994. Proceedings. "We
design a linear time algorithm for computing a square substring from each
position of a given string over a finite alphabet. The algorithm exploits
several subtle properties of suffix trees for strings."
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>1994</PY>
<VO>807</VO>
<PP>146-150</PP>
</SEQ>

<SEQ>
<UI>0259   Alexander,K.S Shortest Common Supers.. Lecture Notes i 94 
807:164-172
</UI>
<AU>Alexander KS
</AU>
<TI>Shortest Common Superstrings for Strings of Random Letters
</TI>
<SU>Supersequence;
    Shortest common;
    Compression;
    USA
</SU>
<AB>5th Annual Symposium, CPM 94. Asilomar, June 5-8, 1994. Proceedings.
"Given a finite collection of strings of letters from a fixed alphabet, it is of
interest, in the contexts of data compression and DNA sequencing, to find the
length of the shortest string which contains each of the given strings as a
consecutive substring. In order to analyze the average behavior of the optimal
superstring length, substrings with a specified collection of lengths are
considered with the letters selected independently at random. An asymptotic
expression, as the collection of lengths becomes large, is obtained for the
savings from compression ...."
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>1994</PY>
<VO>807</VO>
<PP>164-172</PP>
</SEQ>

<SEQ>
<UI>0260   Irving,R.W.   Maximal Common Subsequ.. Lecture Notes i 94 
807:173-183
</UI>
<AU>Irving RW;
    Fraser CB
</AU>
<TI>Maximal Common Subsequences and Minimal Common Supersequences
</TI>
<SU>Longest common;
    Subsequence;
    Shortest common;
    Supersequence;
    Approximation;
    UK
</SU>
<AB>5th Annual Symposium, CPM 94. Asilomar, June 5-8, 1994. Proceedings. "Here
we study the related problems of finding a minimum-length maximal common
subsequence and a maximum-length minimal common supersequence. We describe
dynamic programming algorithms for the case of two strings ..., which can be
extended to any fixed number of strings. We also show that the minimum maximal
common subsequence problem is NP-hard in general for k strings, and we prove a
strong negative approximability result for this problem."
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>1994</PY>
<VO>807</VO>
<PP>173-183</PP>
</SEQ>

<SEQ>
<UI>0261   Breslauer,D.  Dictionary-Matching on.. Lecture Notes i 94 
807:184-197
</UI>
<AU>Breslauer D
</AU>
<TI>Dictionary-Matching on Unbounded Alphabets: Uniform-Length Dictionaries
</TI>
<SU>Dictionary match;
    Multidimensional;
    Search tree;
    Italy
</SU>
<AB>5th Annual Symposium, CPM 94. Asilomar, June 5-8, 1994. Proceedings. "This
paper presents an efficient on-line dictionary-matching algorithm for the case
where the patterns have uniform length and the input alphabet is unbounded. A
tight lower bound establishes that our approach is optimal if the only access
the algorithm has to the input strings is by pairwise symbol comparisons. In an
immediate application, the new dictionary-matching algorithm can be used in a
previously known higher-dimensional array-matching algorithm, improving the
performance of this algorithm on unbounded alphabets."
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>1994</PY>
<VO>807</VO>
<PP>184-197</PP>
</SEQ>

<SEQ>
<UI>0262   Idury,R.M.    Multiple Matching of P.. Lecture Notes i 94 
807:226-239
</UI>
<AU>Idury RM;
    Schaffer AA
</AU>
<TI>Multiple Matching of Parameterized Patterns
</TI>
<SU>Pattern match;
    Parameterized;
    Automata;
    USA
</SU>
<AB>5th Annual Symposium, CPM 94. Asilomar, June 5-8, 1994. Proceedings. "We
extend Baker's theory of parameterized pattern matching (1993) to algorithms
that match multiple patterns in a text. We first consider the case where the
patterns are fixed and preprocessed once, and then the case where the pattern
set can change by insertions and deletions. Baker's algorithms are based on
suffix trees, whereas ours are based on pattern matching automata."
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>1994</PY>
<VO>807</VO>
<PP>226-239</PP>
</SEQ>

<SEQ>
<UI>0263   Akutsu,T.     Approximate String Mat.. Lecture Notes i 94 
807:240-249
</UI>
<AU>Akutsu T
</AU>
<TI>Approximate String Matching with Don't Care Characters
</TI>
<SU>Approximate match;
    Match with don't cares;
    String match;
    JP
</SU>
<AB>5th Annual Symposium, CPM 94. Asilomar, June 5-8, 1994. Proceedings. "This
paper presents parallel and serial approximate matching algorithms for strings
with don't care characters. They are based on Landau and Vishkin's approximate
string matching algorithm and Fisher and Paterson's exact string matching
algorithm with don't care characters. ... Several extensions are also
described."
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>1994</PY>
<VO>807</VO>
<PP>240-249</PP>
</SEQ>

<SEQ>
<UI>0264   Chang,W.I.    Approximate String Mat.. Lecture Notes i 94 
807:259-273
</UI>
<AU>Chang WI;
    Marr TG
</AU>
<TI>Approximate String Matching and Local Similarity
</TI>
<SU>Approximate match;
    Sequence proximity;
    String match;
    USA;
    Similarity
</SU>
<AB>5th Annual Symposium, CPM 94. Asilomar, June 5-8, 1994. Proceedings. "In
this paper, we describe how the distance-based sublinear expected time
algorithm of Chang and Lawler can be extended to solve efficiently the local similarity
problem. We present both a new theoretical result, polynomial-space,
constant-fraction-error matching that is provably optimal, and a practical adaptation of
it that produces nearly identical results as Smith-Waterman, at speedups of 2X
... or better. Further improvements are anticipated."
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>1994</PY>
<VO>807</VO>
<PP>259-273</PP>
</SEQ>

<SEQ>
<UI>0265   Kececioglu,J. Efficient Bounds for O.. Lecture Notes i 94 
807:307-325
</UI>
<AU>Kececioglu J;
    Sankoff D
</AU>
<TI>Efficient Bounds for Oriented Chromosome Inversion Distance
</TI>
<SU>Genome;
    Sequence proximity;
    Chromosome;
    Inversion;
    CA;
    Distance
</SU>
<AB>5th Annual Symposium, CPM 94. Asilomar, June 5-8, 1994. Proceedings. "We
study the problem of comparing two circular chromosomes that have evolved by
chromosome inversion, assuming that the order of corresponding genes is known,
as well as their orientation. Determining the minimum number of inversions is
equivalent to finding the minimum of reversals to sort a signed circular
permutation, where a reversal takes an arbitrary substring of elements and
reverses their order, as well as flipping their sign. We show that tight bounds
on the minimum number of reversals can be found by simple and efficient
algorithms."
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>1994</PY>
<VO>807</VO>
<PP>307-325</PP>
</SEQ>

<SEQ>
<UI>0266                 Computational Molecula..                 88Oxford 
Universi
</UI>
<TI>Computational Molecular Biology: Sources and Methods for Sequence 
Analysis
</TI>
<ED>Lesk AM
</ED>
<SU>Sequence analysis;
    UK
</SU>
<AB>Table of contents (ix-x) and references (229-247) only. Introduction (1
chapter); databanks of protein sequences (3); databanks of nucleic acid
sequences (2); databanks of three-dimensional structures (1); program systems
(3); the technical background (3); scientific applications (6); prospects (1).
</AB>
<PU>Oxford University Press</PU>
<PL>Oxford, UK</PL>
<PY>1988</PY>
<PP>1-249</PP>
</SEQ>

<SEQ>
<UI>0267   Chang,W.I.    Sublinear Approximate .. Algorithmica    94 
12(4/5):327-34
</UI>
<AU>Chang WI;
    Lawler EL
</AU>
<TI>Sublinear Approximate String Matching and Biological Applications
</TI>
<SU>Pattern match;
    String match;
    Edit;
    Distance;
    Suffix;
    Approximate match;
    USA
</SU>
<AB>"Given a text string of length n and a pattern string of length m over a
b-letter alphabet, the k differences approximate string matching problem asks
for all locations in the text where the pattern occurs with at most k
differences (substitutions, insertions, deletions). We treat k not as a
constant but as a fraction of m (not necessarily constant-fraction). ... We give an
algorithm that is sublinear time O((n/m) k log_b m) when the text is random and
k is bounded by the threshold m/(log_b m + O(1))."
</AB>
<JT>Algorithmica</JT>
<PY>1994</PY>
<VO>12</VO>
<NO>4/5</NO>
<PP>327-344</PP>
</SEQ>

<SEQ>
<UI>0268   Gingeras,T.R. Computer Programs for .. Nucleic Acids R 79 
7(2):529-545
</UI>
<AU>Gingeras TR;
    Milazzo JP;
    Sciaky D;
    Roberts RJ
</AU>
<TI>Computer Programs for the Assembly of DNA Sequences
</TI>
<SU>Supersequence;
    Shortest common;
    Reconstruct;
    Program;
    USA;
    DNA
</SU>
<AB>"A collection of user-interactive computer programs is described which
aids in the assembly of DNA sequences. This is achieved by searching for the
positions of overlapping common nucleotide sequences within the blocks of
sequence obtained as primary data. Such overlapping segments are then melded
into one continuous string of nucleotides. Strategies for determining the
accuracy of the sequence being analyzed and reducing the error rate resulting
from the manual manipulation of sequence data are discussed."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1979</PY>
<VO>7</VO>
<NO>2</NO>
<PP>529-545</PP>
</SEQ>

<SEQ>
<UI>0269   Taylor,W.R.   A Holistic Approach to.. Protein Eng.    89 
2(7):505-519
</UI>
<AU>Taylor WR;
    Orengo CA
</AU>
<TI>A Holistic Approach to Protein Structure Alignment
</TI>
<SU>Structure;
    UK;
    Protein
</SU>
<AB>"A method of protein structure comparison developed previously is extended
to incorporate other aspects of protein structure in addition to the
inter-atomic vectors on which it was originally based. Each additional aspect ... was
introduced separately and evaluated for its ability to improve alignment
quality. The components were then combined, suitably weighted, to produce a
more holistic comparison method."
</AB>
<JT>Protein Eng</JT>
<PY>1989</PY>
<VO>2</VO>
<NO>7</NO>
<PP>505-519</PP>
</SEQ>

<SEQ>
<UI>0270   Goldstein,L.  Mapping DNA by Stochas.. Adv.Appl.Math.  87 
8:194-207
</UI>
<AU>Goldstein L;
    Waterman MS
</AU>
<TI>Mapping DNA by Stochastic Relaxation
</TI>
<SU>Digest;
    Mapping;
    USA;
    DNA;
    Stochastic
</SU>
<AB>"The multiple digest mapping problem arising in molecular biology can be
stated roughly as follows. A linear or circular segment of DNA is cut at all
occurrences of a specific short pattern by restriction enzymes. By using
restriction enzymes singly and in combination it is required to construct a map
showing the location of cleavage sites. In this paper we first consider the
efficacy of a simulated annealing algorithm towards the solution to the multiple
digest problem. Second, the double digest problem ... is shown to admit an
exponentially increasing number of solutions as a function of the length of the
segment under a particular probability model. Next, the double digest problem is
shown to lie in the class of NP complete problems ...."
</AB>
<JT>Adv Appl Math</JT>
<PY>1987</PY>
<VO>8</VO>
<PP>194-207</PP>
</SEQ>

<SEQ>
<UI>0271   Orengo,C.A.   A Rapid Method of Prot.. J.Theor.Biol.   90 
147:517-551
</UI>
<AU>Orengo CA;
    Taylor WR
</AU>
<TI>A Rapid Method of Protein Structure Alignment
</TI>
<SU>Structure;
    UK;
    Protein
</SU>
<AB>"A reduction in the time required to compare two protein structures has
been achieved for a previously developed structure alignment method, by reducing
the number of residue pair comparisons which must be performed between the two
structures."
</AB>
<JT>J Theor Biol</JT>
<PY>1990</PY>
<VO>147</VO>
<PP>517-551</PP>
</SEQ>

<SEQ>
<UI>0272   Orengo,C.A.   Fast Structure Alignme.. Proteins Struct 92 
14:139-167
</UI>
<AU>Orengo CA;
    Brown NP;
    Taylor WR
</AU>
<TI>Fast Structure Alignment for Protein Databank Searching
</TI>
<SU>Database search;
    Structure;
    UK;
    Protein;
    Databank
</SU>
<AB>"A fast method is described for searching and analyzing the protein
structure databank. It uses secondary structure followed by residue matching to
compare protein structures and is developed from a previous structural alignment
method based on dynamic programming."
</AB>
<JT>Proteins Struct Funct Genet</JT>
<PY>1992</PY>
<VO>14</VO>
<PP>139-167</PP>
</SEQ>

<SEQ>
<UI>0273   Huang,X.      An Algorithm for Ident.. Comput.Appl.Bio 94 
10(3):219-225
</UI>
<AU>Huang X
</AU>
<TI>An Algorithm for Identifying Regions of a DNA Sequence that Satisfy a
Content Requirement
</TI>
<SU>Sequence analysis;
    Composition;
    Dynamic programming;
    USA;
    Region;
    DNA;
    Algorithm
</SU>
<AB>"We present a dynamic programming algorithm for identifying regions of a
DNA sequence that meet a user-specified compositional requirement. Applications
of the algorithm include finding C+G-rich regions, locating TA+CG-deficient
regions, identifying CpG islands, and finding regions rich in periodical
three-base patterns. The algorithm has the advantage over the simple window method in
that the algorithm shows the exact location of each identified region."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1994</PY>
<VO>10</VO>
<NO>3</NO>
<PP>219-225</PP>
</SEQ>

<SEQ>
<UI>0274   Chou,P.Y.     Prediction of the Seco.. Adv.Enzymol.Rel 78 
47:45-148
</UI>
<AU>Chou PY;
    Fasman GD
</AU>
<TI>Prediction of the Secondary Structure of Proteins from their Amino Acid
Sequence
</TI>
<SU>Structure;
    USA;
    Protein;
    Amino acid;
    Prediction;
    Secondary
</SU>
<AB>Historical introduction. The Chou and Fasman predictive method. Definition
of conformational regions. Refinement of conformational parameters. Application
of Chou-Fasman method. Comparison of predictive methods. Computerized
Chou-Fasman method. Future directions.
</AB>
<JT>Adv Enzymol Relat Areas Mol Biol</JT>
<PY>1978</PY>
<VO>47</VO>
<PP>45-148</PP>
</SEQ>

<SEQ>
<UI>0275   Rost,B.       Prediction of Protein .. J.Mol.Biol.     93 
232:584-599
</UI>
<AU>Rost B;
    Sander C
</AU>
<TI>Prediction of Protein Secondary Structure at Better than 70% Accuracy
</TI>
<SU>Structure;
    Multiple alignment;
    DE;
    Neural;
    Sequence alignment;
    Protein;
    Prediction;
    Secondary;
    Accuracy
</SU>
<AB>"We have trained a two-layered feed-forward neural network on a
non-redundant data base of 130 protein chains to predict the secondary structure of
water-soluble proteins. A new key aspect is the use of evolutionary information
in the form of multiple sequence alignments that are used as input in place of
single sequences. The inclusion of protein family information in this form
increases the prediction accuracy by six to eight percentage points."
</AB>
<JT>J Mol Biol</JT>
<PY>1993</PY>
<VO>232</VO>
<PP>584-599</PP>
</SEQ>

<SEQ>
<UI>0276   Huang,X.      On Global Sequence Ali.. Comput.Appl.Bio 94 
10(3):227-235
</UI>
<AU>Huang X
</AU>
<TI>On Global Sequence Alignment
</TI>
<SU>Sequence alignment;
    Dynamic programming;
    USA;
    Gap
</SU>
<AB>"We present a dynamic programming algorithm for computing a best global
alignment of two sequences. The proposed algorithm is robust in identifying any
of several global relationships between two sequences. The algorithm delivers a
best alignment of two sequences in linear space and quadratic time. We also
describe a multiple alignment algorithm based on the pairwise algorithm. ...
Experimental results indicate that for a commonly used set of gap penalties, the
new programs produce more satisfactory alignments on sequences of various
lengths than some existing pairwise and multiple programs based on the dynamic
programming algorithm of Needleman and Wunsch."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1994</PY>
<VO>10</VO>
<NO>3</NO>
<PP>227-235</PP>
</SEQ>

<SEQ>
<UI>0277   Bishop,M.J.   Evolutionary Trees fro.. Proc.R.Soc.Lond 85 
226:271-302
</UI>
<AU>Bishop MJ;
    Friday AE
</AU>
<TI>Evolutionary Trees from Nucleic Acid and Protein Sequences
</TI>
<SU>Phylogeny;
    Evolutionary tree;
    Probabilistic;
    UK;
    Protein;
    Nucleic acid
</SU>
<AB>"The problem addressed is that of estimating evolutionary relationship by
the comparative study of the nucleic acid or protein sequences of living
organisms. The most important point made in this account is that estimation of
evolutionary relationship should be based on clearly defined models the
assumptions of which are open to test. The models should as far as possible
conform to what is known about the processes of evolutionary change in the
organisms concerned. Prevailing approaches, grouped here as divergence models,
are stated below in such a way that it is clear that they involve unrealistic
assumptions about the nature of evolutionary change. Emphasis is placed on the
use of probabilistic models of evolutionary change."
</AB>
<JT>Proc R Soc Lond Ser B</JT>
<PY>1985</PY>
<VO>226</VO>
<PP>271-302</PP>
</SEQ>

<SEQ>
<UI>0278   Lake,J.A.     A Rate-Independent Tec.. Mol.Biol.Evol.  87 
4(2):167-191
</UI>
<AU>Lake JA
</AU>
<TI>A Rate-Independent Technique for Analysis of Nucleic Acid Sequences:
Evolutionary Parsimony
</TI>
<SU>Phylogeny;
    Sequence analysis;
    Character data;
    Invariant;
    Substitution;
    Parsimony;
    USA;
    Robustness;
    Analytical;
    Nucleic acid
</SU>
<AB>"The method of evolutionary parsimony - or operator invariants - is a
technique of nucleic acid sequence analysis related to parsimony analysis and
explicitly designed for determining evolutionary relationships among four
distantly related taxa. The method is independent of substitution rates because
it is derived from consideration of the group properties of substitution
operators rather than from an analysis of the probabilities of substitution in
branches of a tree."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1987</PY>
<VO>4</VO>
<NO>2</NO>
<PP>167-191</PP>
</SEQ>

<SEQ>
<UI>0279   Raita,T.      Tuning the Boyer-Moore.. Software.Practi 92 
22(10):879-884
</UI>
<AU>Raita T
</AU>
<TI>Tuning the Boyer-Moore-Horspool String Searching Algorithm
</TI>
<SU>String match;
    Boyer-Moore;
    Regularities;
    Sequence search;
    Pattern match;
    FI;
    String search;
    Algorithm
</SU>
<AB>"Substring search is a common activity in computing. The fastest known
search method is that of Boyer and Moore with the improvements introduced by
Horspool. This paper presents a new implementation which takes advantage of the
dependencies between the characters. The resulting code runs 25 per cent faster
than the best currently-known routine."
</AB>
<JT>Software Practice Experience</JT>
<PY>1992</PY>
<VO>22</VO>
<NO>10</NO>
<PP>879-884</PP>
</SEQ>

<SEQ>
<UI>0280   Gyori,E.      Stack of Pancakes        Stud.Sci.Math.H 78 
13:133-137
</UI>
<AU>Gyori E;
    Turan G
</AU>
<TI>Stack of Pancakes
</TI>
<SU>Inversion;
    Reversal;
    Prefix;
    Genomic;
    HU
</SU>
<AB>Let p be a permutation of the number-set {1, ..., n}. Let an admissible
step be the reversing of the 'end' of the sequence (permutation). ... Let f(p)
be the minimal number of the admissible steps by means of which we get the
permutation 1,2,...,n. Let f(n) be the maximum of f(p) over all permutations in
the symmetric group S_n. Then f(n) &lt;= (5n + 5)/3 for arbitrary n. See also Gates &amp;
Papadimitriou (1979).
</AB>
<JT>Stud Sci Math Hungarica</JT>
<PY>1978</PY>
<VO>13</VO>
<PP>133-137</PP>
</SEQ>

<SEQ>
<UI>0281   Wu,S.         Fast Text Searching Al.. Comm.ACM        92 
35(10):83-91
</UI>
<AU>Wu S;
    Manber U
</AU>
<TI>Fast Text Searching Allowing Errors
</TI>
<SU>Text search;
    Match with k differences;
    USA;
    Language;
    String match;
    Expression;
    Error
</SU>
<AB>"Many different approximate string-matching algorithms have been
suggested. In this article we present a new algorithm which is very fast in
practice, reasonably simple to implement, and supports a large number of
variations of the approximate string-matching problem. The algorithm is based on
a numeric scheme for exact string matching developed by Baeza-Yates and Gonnet
(1992). The algorithm can handle most of the common types of queries, including
arbitrary regular expressions, and several variations of closeness measures. ...
[It] served as a basis for a software package for Unix called agrep, which has
been in use since June 1991."
</AB>
<JT>Comm ACM</JT>
<PY>1992</PY>
<VO>35</VO>
<NO>10</NO>
<PP>83-91</PP>
</SEQ>

<SEQ>
<UI>0282   Baeza-Yates,R A New Approach to Text.. Comm.ACM        92 
35(10):74-82
</UI>
<AU>Baeza-Yates RA;
    Gonnet GH
</AU>
<TI>A New Approach to Text Searching
</TI>
<SU>Text search;
    Match with don't cares;
    Match with k mismatches;
    CL;
    String match;
    String search
</SU>
<AB>"The string-matching problem consists of finding all occurrences of a
pattern of length m in a text of length n. We generalize the problem allowing
don't care symbols, the complement of a symbol, and any finite class of symbols.
We solve this problem for one or more patterns, with or without mismatches. For
small patterns the worst-case time is linear in the size of the text (we say
that a pattern is small if m is bounded by a constant)."
</AB>
<JT>Comm ACM</JT>
<PY>1992</PY>
<VO>35</VO>
<NO>10</NO>
<PP>74-82</PP>
</SEQ>

<SEQ>
<UI>0283   Barker,W.C.   Detecting Distant Rela.. Atlas of Prot.. 72National 
Biomed
</UI>
<AU>Barker WC;
    Dayhoff MO
</AU>
<TI>Detecting Distant Relationships: Computer Methods and Results
</TI>
<ED>Dayhoff MO
</ED>
<BK>Atlas of Protein Sequence and Structure, 1972, Volume 5
</BK>
<SU>Sequence proximity;
    Scoring;
    Significance;
    USA;
    Needleman-Wunsch
</SU>
<AB>"With [the mutation data] matrix and a strategy of comparison to randomly
permuted sequences, we can definitely infer relationships as remote as those
between cytochrome c551 of Pseudomonas and the cytochromes c of higher
organisms, plant leghemoglobin and vertebrate hemoglobins, and a bacterial
protease and the mammalian serine proteases. In this chapter we report the
results of tests on many pairs of proteins for relationship to each other. ...
We have used a computer algorithm designed by Needleman and Wunsch to test for
remote relationships between proteins."
</AB>
<PU>National Biomedical Research Foundation</PU>
<PL>Washington, DC</PL>
<PY>1972</PY>
<PP>101-110</PP>
</SEQ>

<SEQ>
<UI>0284   Cantor,C.R.   The Occurrence of Gaps.. Biochem.Biophys 68 
31(3):410-416
</UI>
<AU>Cantor CR
</AU>
<TI>The Occurrence of Gaps in Protein Sequences
</TI>
<SU>Pairwise alignment;
    Gap;
    Significance;
    USA;
    Protein
</SU>
<AB>"A simple procedure has been developed to test whether a gap does in fact
increase the apparent homology between two sequences. ... We have modified the
procedures developed by Fitch (1966) to include the placement of a gap at every
possible position in a protein sequence."
</AB>
<JT>Biochem Biophys Res Commun</JT>
<PY>1968</PY>
<VO>31</VO>
<NO>3</NO>
<PP>410-416</PP>
</SEQ>

<SEQ>
<UI>0285   Nussinov,R.   Compositional Variatio.. Comput.Appl.Bio 91 
7(3):287-293
</UI>
<AU>Nussinov R
</AU>
<TI>Compositional Variations in DNA Sequences
</TI>
<SU>Sequence analysis;
    Composition;
    USA;
    Signal;
    N-gram;
    Pattern discovery;
    DNA
</SU>
<AB>"Biologically occurring nucleotide sequences differ from randomly
generated ones. Here we describe general patterns found in prokaryotic and in
eukaryotic DNA. In the accompanying paper (Nussinov, 1991) we also describe DNA
signals recognized by their corresponding protein factors. In particular, we
focus on modes of searches for such patterns and signals and on the potential
properties such sequences may possess."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1991</PY>
<VO>7</VO>
<NO>3</NO>
<PP>287-293</PP>
</SEQ>

<SEQ>
<UI>0286   Nussinov,R.   Some Rules in the Orde.. Nucleic Acids R 80 
8(19):4545-456
</UI>
<AU>Nussinov R
</AU>
<TI>Some Rules in the Ordering of Nucleotides in the DNA
</TI>
<SU>Composition;
    N-gram;
    IL;
    Dyad;
    DNA;
    Nucleotide
</SU>
<AB>"Natural DNA sequences contain distinct nearest neighbor patterns.
Eukaryotic as well as prokaryotic sequences show a consistent hierarchy in the
frequencies of appearance of most doublets."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1980</PY>
<VO>8</VO>
<NO>19</NO>
<PP>4545-4562</PP>
</SEQ>

<SEQ>
<UI>0287   Nussinov,R.   Nearest Neighbor Nucle.. J.Biol.Chem.    81 
256(16):8458-8
</UI>
<AU>Nussinov R
</AU>
<TI>Nearest Neighbor Nucleotide Patterns: Structural and Biological
Implications
</TI>
<SU>Composition;
    N-gram;
    USA;
    Codon;
    Dyad;
    Nucleotide
</SU>
<AB>"Recently, nearest neighbor patterns were observed in prokaryotic and
eukaryotic DNA sequences. These are discussed with respect to some of their
biological implications. It is suggested that their origins relate to different
specific structures of nearest neighbor base pairs. These patterns strongly
constrain the DNA sequence. As such, they 'explain' to some degree the amino
acid codon choice and have direct bearing on questions related to evolution."
</AB>
<JT>J Biol Chem</JT>
<PY>1981</PY>
<VO>256</VO>
<NO>16</NO>
<PP>8458-8462</PP>
</SEQ>

<SEQ>
<UI>0288   Nussinov,R.   Doublet Frequencies in.. Nucleic Acids R 84 
12(3):1749-176
</UI>
<AU>Nussinov R
</AU>
<TI>Doublet Frequencies in Evolutionary Distinct Groups
</TI>
<SU>Composition;
    N-gram;
    IL;
    Dyad
</SU>
<AB>"We analyze the dinucleotide frequencies of occurrence and preferences
separately within the vertebrates, nonvertebrates, DNA viruses, .... Distinct
patterns are observed. ... Doublets are the most basic ingredient of order in
nucleotide sequences. We suggest that their preferences and the arrangement of
nucleotides in the DNA in general is determined to a large extent by the
conformational and packaging considerations of the double helix. Some principles
of DNA conformation are viewed in light of our results."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1984</PY>
<VO>12</VO>
<NO>3</NO>
<PP>1749-1763</PP>
</SEQ>

<SEQ>
<UI>0289   Nussinov,R.   Strong Doublet Prefere.. J.Mol.Evol.     84 
20:111-119
</UI>
<AU>Nussinov R
</AU>
<TI>Strong Doublet Preferences in Nucleotide Sequences and DNA Geometry
</TI>
<SU>Composition;
    N-gram;
    IL;
    Dyad;
    DNA;
    Nucleotide;
    Geometry
</SU>
<AB>"Analysis of the sequence data available today ... confirms the previously
observed phenomenon that there are distinct dinucleotide preferences in DNA
sequences. Consistent behaviour is observed in the major sequence groups
analysed here in prokaryotes, eukaryotes and mitochondria. Some doublet
preferences are common to all groups and are found in most sequences of the Los
Alamos Library. The patterns seen in such large data sets are very significant
statistically and biologically. Since they are present in numerous and diverse
nucleotide sequences, one may infer that they confer evolutionary advantages on
the organism."
</AB>
<JT>J Mol Evol</JT>
<PY>1984</PY>
<VO>20</VO>
<PP>111-119</PP>
</SEQ>

<SEQ>
<UI>0290   Zhurkin,V.B.  Periodicity in DNA Pri.. Nucleic Acids R 81 
9(8):1963-1971
</UI>
<AU>Zhurkin VB
</AU>
<TI>Periodicity in DNA Primary Structure is Defined by Secondary Structure of
the Coded Protein
</TI>
<SU>Regularities;
    Structure;
    Linguistic;
    RU;
    Segment;
    Protein;
    DNA;
    Secondary
</SU>
<AB>"The repeating pattern of nucleotide sequences can be used for comparison
of the DNA segments with low degree of homology. ... Such particular sites of
DNA as promotors, origins of replication, etc. can be compared with the
punctuation marks in a printed text - these are the elements, which determine
the 'syntax' of the DNA language. To understand DNA functioning one should learn
the laws regulating short-range order in the nucleotide sequence as well
('orthographical' laws). The periodicity found by Trifonov and Sussman [1980] is
one of not numerous so far 'orthographical' laws of the DNA language ...."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1981</PY>
<VO>9</VO>
<NO>8</NO>
<PP>1963-1971</PP>
</SEQ>

<SEQ>
<UI>0291   Zhurkin,V.B.  Periodicity in DNA Pri.. Stud.Biophys.   82 
87(2/3):151-15
</UI>
<AU>Zhurkin VB
</AU>
<TI>Periodicity in DNA Primary Structure and Specific Alignment of Nucleosomes
</TI>
<SU>Structure;
    RU;
    DNA
</SU>
<AB>"Reconstitution experiments with core histones have yielded preferred
nucleosome positions which are dictated solely by DNA sequence. One of the
explanations of this phenomenon is that the nucleosomes favour those segments of
DNA which are more easily wrapped about the core. ... Here a new concept is
suggested to explain the specific alignment of nucleosomes."
</AB>
<JT>Stud Biophys</JT>
<PY>1982</PY>
<VO>87</VO>
<NO>2/3</NO>
<PP>151-152</PP>
</SEQ>

<SEQ>
<UI>0292   Nussinov,R.   Signals in DNA Sequenc.. Comput.Appl.Bio 91 
7(3):295-299
</UI>
<AU>Nussinov R
</AU>
<TI>Signals in DNA Sequences and their Potential Properties
</TI>
<SU>Sequence analysis;
    Signal;
    USA;
    Pattern discovery;
    DNA
</SU>
<AB>"To date, most signal searches have been focused on specific recurrences
of nucleotide sequences. Much less attention has been directed towards the
structure, flexibility and hydrogen-bonding patterns that recognition elements
may possess. Here we review the various methods involved in such searches. In
particular, however, we also address the searches for potential properties. In
this regard it is of interest to inspect the asymmetry in the distributions of
complementary oligomers near biological features."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1991</PY>
<VO>7</VO>
<NO>3</NO>
<PP>295-299</PP>
</SEQ>

<SEQ>
<UI>0293   Nussinov,R.   Sequence Signals in Eu.. Crit.Rev.Bioche 90 
25(3):185-224
</UI>
<AU>Nussinov R
</AU>
<TI>Sequence Signals in Eukaryotic Upstream Regions
</TI>
<SU>Signal;
    Region;
    Consensus sequence;
    IL
</SU>
<AB>"Two DNA sequence elements are known to recur frequently upstream of
eukaryotic polymerase II-transcribed genes. The TATAAA .... The GGCCAATCT ....
Here, I discuss DNA structural considerations in upstream regions along with
protein readout of the major and minor groove information content. These
sequence-structure aspects are put in the general context of protein
(factors)-DNA (elements) recognition and regulation."
</AB>
<JT>Crit Rev Biochem Mol Biol</JT>
<PY>1990</PY>
<VO>25</VO>
<NO>3</NO>
<PP>185-224</PP>
</SEQ>

<SEQ>
<UI>0294   Oliphant,A.R. Defining the Consensus.. Nucleic Acids R 88 
16(15):7673-76
</UI>
<AU>Oliphant AR;
    Struhl K
</AU>
<TI>Defining the Consensus Sequence of E. coli Promoter Elements by Random
Selection
</TI>
<SU>Consensus sequence;
    USA;
    Selection
</SU>
<AB>"The consensus sequence of E. coli promoter elements was determined by the
method of random selection. A large collection of hybrid molecules was produced
in which random-sequence oligonucleotides were cloned in place of a wild-type
promoter element, and functional -10 and -35 E. coli promoter elements were
obtained by a genetic selection involving the expression of a structural gene.
The DNA sequences ... for -10 and -35 elements were determined. The consensus
sequences determined by this approach are very similar to those determined by
comparing DNA sequences of naturally occurring E. coli promoters."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1988</PY>
<VO>16</VO>
<NO>15</NO>
<PP>7673-7683</PP>
</SEQ>

<SEQ>
<UI>0295   Volinia,S.    The Frequency of Oligo.. Comput.Appl.Bio 89 
5(1):33-40
</UI>
<AU>Volinia S;
    Gambari R;
    Bernardi F;
    Barrai I
</AU>
<TI>The Frequency of Oligonucleotides in Mammalian Genic Regions
</TI>
<SU>Composition;
    Statistical;
    Significance;
    Region;
    Italy
</SU>
<AB>"We have prepared algorithms for the study of the frequency distribution
of all oligonucleotides of length 2-6 in DNA sequences ... and have obtained the
distribution of the ratio between the observed frequency of oligonucleotides and
their expected frequency based on independent nucleotide probabilities. ... We
observed that some oligonucleotides show a statistical behaviour and a regional
distribution similar to that of known signal sequences. Moreover the frequency
distribution of oligonucleotides of length 5 and 6 tends to become bimodal,
indicating the existence of a population of very frequent oligonucleotides."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1989</PY>
<VO>5</VO>
<NO>1</NO>
<PP>33-40</PP>
</SEQ>

<SEQ>
<UI>0296   Huang,X.      A Contig Assembly Prog.. Genomics        92 14:18-25
</UI>
<AU>Huang X
</AU>
<TI>A Contig Assembly Program Based on Sensitive Detection of Fragment
Overlaps
</TI>
<SU>Supersequence;
    Contig;
    USA;
    Dynamic programming;
    Program;
    Fragment;
    Detection
</SU>
<AB>"An effective computer program for assembling DNA fragments, the contig
assembly program (CAP), has been developed. In the CAP program, a filter is used
to eliminate quickly fragment pairs that could not possibly overlap, a dynamic
programming algorithm is applied to compute the maximal-scoring overlapping
alignment between each remaining pair of fragments, and a simple greedy approach
is employed to assemble fragments in order of alignment scores."
</AB>
<JT>Genomics</JT>
<PY>1992</PY>
<VO>14</VO>
<PP>18-25</PP>
</SEQ>

<SEQ>
<UI>0297   Dear,S.       A Sequence Assembly an.. Nucleic Acids R 91 
19(14):3907-39
</UI>
<AU>Dear S;
    Staden R
</AU>
<TI>A Sequence Assembly and Editing Program for Efficient Management of Large
Projects
</TI>
<SU>Management;
    Contig;
    Sequence alignment;
    UK;
    Display;
    Program;
    Editor;
    Editing;
    Sequence assembly
</SU>
<AB>"We describe a sequence assembly and editing program for managing large
and small projects. It is being used to sequence complete cosmids and has
substantially reduced the time taken to process the data. ... All editing is
performed using a mouse operated contig editor that displays aligned sequences
and their traces together on the screen. The editor ... permits rapid movement
along the aligned sequences. Insertions, deletions and replacements can be made
in individual aligned readings and global changes can be made by editing the
consensus."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1991</PY>
<VO>19</VO>
<NO>14</NO>
<PP>3907-3911</PP>
</SEQ>

<SEQ>
<UI>0298   Peltola,H.    SEQAID: A DNA Sequence.. Nucleic Acids R 84 
12(1):307-321
</UI>
<AU>Peltola H;
    Soderlund H;
    Ukkonen E
</AU>
<TI>SEQAID: A DNA Sequence Assembling Program Based on a Mathematical Model
</TI>
<SU>Management;
    Restriction;
    FI;
    Program;
    DNA;
    Model
</SU>
<AB>"The program automatically assembles long DNA sequences from short
fragments with minimal user interaction. ... The main novel features of the
system are that SEQAID implements several new well-behaved algorithms based on a
mathematical model of the problem. It also utilizes available information on
restriction fragments to detect illegitimate overlaps and to find relationships
between separately assembled sequence blocks."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1984</PY>
<VO>12</VO>
<NO>1</NO>
<PP>307-321</PP>
</SEQ>

<SEQ>
<UI>0299   Brown,A.H.D.  Analysis of Variation .. Statistical A.. 83Marcel 
Dekker
</UI>
<AU>Brown AHD;
    Clegg MT
</AU>
<TI>Analysis of Variation in Related DNA Sequences
</TI>
<ED>Weir BS
</ED>
<BK>Statistical Analysis of DNA Sequence Data
</BK>
<SU>Sequence analysis;
    AU;
    Genome;
    Statistical;
    DNA
</SU>
<AB>See Weir (1983) for the book's bibliography, pp. 231-248. "As the
application of sequencing technology expands, numerous copies of a particular
sequence will be available for comparison. For instance, different copies of a
gene which is highly reiterated in the genome may be sequenced, or several
copies of a particular single copy sequence drawn from different individuals may
be analyzed. In this chapter we will consider the analysis of sequence variation
in a sample of highly repeated genes. Our objective will be to illustrate
several statistical questions which arise naturally from such data."
</AB>
<PU>Marcel Dekker</PU>
<PL>New York, NY</PL>
<PY>1983</PY>
<PP>107-132</PP>
</SEQ>

<SEQ>
<UI>0300   Kanehisa,M.I. Pattern Recognition in.. Nucleic Acids R 82 
10(1):265-278
</UI>
<AU>Kanehisa MI;
    Goad WB
</AU>
<TI>Pattern Recognition in Nucleic Acid Sequences. II. An Efficient Method for
Finding Locally Stable Secondary Structures
</TI>
<SU>Pattern recognition;
    Structure;
    USA;
    Nucleic acid;
    Secondary;
    Recognition
</SU>
<AB>"We present a method for calculating all possible single hairpin loop
secondary structures in a nucleic acid sequence by the order of N^2 operations
where N is the total number of bases. Each structure may contain any number of
bulges and internal loops. Most natural sequences are found to be
indistinguishable from random sequences in the potential of forming secondary
structures, which is defined by the frequency of possible secondary structures
calculated by the method. There is a strong correlation between the higher G+C
content and the higher structure forming potential."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1982</PY>
<VO>10</VO>
<NO>1</NO>
<PP>265-278</PP>
</SEQ>

<SEQ>
<UI>0301   Korn,L.J.     Computer Analysis of N.. Proc.Nat.Acad.S 77 
74(10):4401-44
</UI>
<AU>Korn LJ;
    Queen CL;
    Wegman MN
</AU>
<TI>Computer Analysis of Nucleic Acid Regulatory Sequences
</TI>
<SU>Sequence analysis;
    Program;
    Regularities;
    Repetition;
    USA;
    Dyad;
    Restriction
</SU>
<AB>A computer program designed to facilitate the analysis of nucleic acid
sequences "can search several nucleotide sequences for oligonucleotides common
to all of them. It can examine a DNA or RNA sequence for two kinds of homologous
regions - repetitions and dyad symmetries. The homologies need not be perfect:
mismatches and 'looping out' of nucleotides are allowed. The program also finds
(A+T)- and (C+G)-rich regions, locates restriction enzyme recognition sites,
determines the distribution of di- and trinucleotides, and performs various
other functions."
</AB>
<JT>Proc Nat Acad Sci USA</JT>
<PY>1977</PY>
<VO>74</VO>
<NO>10</NO>
<PP>4401-4405</PP>
</SEQ>

<SEQ>
<UI>0302   McCallum,D.   Computer Processing of.. J.Mol.Biol.     77 
116:29-30
</UI>
<AU>McCallum D;
    Smith M
</AU>
<TI>Computer Processing of DNA Sequence Data
</TI>
<SU>Sequence analysis;
    Program;
    UK;
    Frame;
    Editor;
    Sequence search;
    DNA
</SU>
<AB>"In this Appendix we describe the basic features of the computer programs
used in this study." Compilation and numbering of the sequence. Editing and
revision. Search for specific sequences. Search for families of sequences.
Translation into protein sequence simultaneously in all three reading frames.
</AB>
<JT>J Mol Biol</JT>
<PY>1977</PY>
<VO>116</VO>
<PP>29-30</PP>
</SEQ>

<SEQ>
<UI>0303   Orcutt,B.C.   Nucleic Acid Sequence .. Nucleic Acids R 82 
10(1):157-174
</UI>
<AU>Orcutt BC;
    George DG;
    Fredrickson JA;
    Dayhoff MO
</AU>
<TI>Nucleic Acid Sequence Database Computer System
</TI>
<SU>Sequence database;
    Database search;
    USA;
    Nucleic acid
</SU>
<AB>"On September 15, 1980, the Nucleic Acid Sequence Database Demonstration
Project of the National Biomedical Research Foundation was made available to
interested users through telephone access to our computer. ... The main
retrieval program of the system is the nucleic acid query program (NAQ). The
commands of this program and other ancillary programs of the system are designed
so that similar nucleic acid sequences can be aligned readily, protein sequences
or the complements of nucleic acid sequences can be constructed, stored, and
aligned, and feature tables can be examined."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1982</PY>
<VO>10</VO>
<NO>1</NO>
<PP>157-174</PP>
</SEQ>

<SEQ>
<UI>0304   Queen,C.L.    Computer Analysis of N.. Methods Enzymol 80 
65:595-609
</UI>
<AU>Queen CL;
    Korn LJ
</AU>
<TI>Computer Analysis of Nucleic Acids and Proteins
</TI>
<SU>Sequence analysis;
    Program;
    USA;
    Protein
</SU>
<AB>"In this article, we begin by outlining some general principles of
computer utilization. Then we discuss the computer programs available for
nucleic acid analysis and present in detail one comprehensive program that can
be used for amino acid as well as nucleotide sequences. We conclude with a
comment on the interpretation of computer-generated information." See also Korn,
Queen, Wegman (1977).
</AB>
<JT>Methods Enzymol</JT>
<PY>1980</PY>
<VO>65</VO>
<PP>595-609</PP>
</SEQ>

<SEQ>
<UI>0305   Staden,R.     Sequence Data Handling.. Nucleic Acids R 77 
4(11):4037-405
</UI>
<AU>Staden R
</AU>
<TI>Sequence Data Handling by Computer
</TI>
<SU>Management;
    Sequence analysis;
    Program;
    UK
</SU>
<AB>"The speed of the new DNA sequencing techniques has created a need for
computer programs to handle the data produced. This paper describes simple
programs designed specifically for use by people with little or no computer
experience. The programs are for use on small computers and provide facilities
for storage, editing and analysis of both DNA and amino acid sequences."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1977</PY>
<VO>4</VO>
<NO>11</NO>
<PP>4037-4051</PP>
</SEQ>

<SEQ>
<UI>0306   Staden,R.     Further Procedures for.. Nucleic Acids R 78 
5(3):1013-1015
</UI>
<AU>Staden R
</AU>
<TI>Further Procedures for Sequence Analysis by Computer
</TI>
<SU>Sequence analysis;
    Program;
    UK;
    Management
</SU>
<AB>"A previous paper [Staden (1977)] described programs for sequence data
handling and analysis by computer. The facilities of this basic set are extended
by further easily used programs."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1978</PY>
<VO>5</VO>
<NO>3</NO>
<PP>1013-1015</PP>
</SEQ>

<SEQ>
<UI>0307   Eigen,M.      Statistical Geometry i.. Proc.Nat.Acad.S 88 
85:5913-5917
</UI>
<AU>Eigen M;
    Winkler-Oswatitsch R;
    Dress A
</AU>
<TI>Statistical Geometry in Sequence Space: A Method of Quantitative
Comparative Sequence Analysis
</TI>
<SU>Multiple comparison;
    Significance;
    DE;
    Statistical;
    Sequence analysis;
    Segment;
    Invariant;
    Geometry
</SU>
<AB>"A statistical method of comparative sequence analysis that combines
horizontal and vertical correlations among aligned sequences is introduced. It
is based on the analysis mainly of quartet combinations of sequences considered
as geometric configurations in sequence space. Numerical invariants related to
relative internal segment lengths are assigned to each such configuration and
statistical averages of these invariants are established. They are used for
internal calibration of the topology of divergence and for quantitative
determination of the noise level."
</AB>
<JT>Proc Nat Acad Sci USA</JT>
<PY>1988</PY>
<VO>85</VO>
<PP>5913-5917</PP>
</SEQ>

<SEQ>
<UI>0308   Sankoff,D.    Efficient Optimal Deco.. Math.Biosci.    92 111:279-293
</UI>
<AU>Sankoff D
</AU>
<TI>Efficient Optimal Decomposition of a Sequence into Disjoint Regions, Each
Matched to Some Template in an Inventory
</TI>
<SU>Decomposition;
    CA;
    Region;
    Consensus sequence;
    Dynamic programming;
    Profile;
    Template;
    Optimal
</SU>
<AB>"Given an amino acid sequence, we discuss how to find efficiently an
optimal set of disjoint regions (substrings, domains, modules, etc.), each of
which can be matched to some element of a predefined inventory containing, for
example, consensus sequences, protosequences, or protein family profiles. ...
[The problem] can be solved in time quadratic in the length of the sequence and
linear with the number of templates in the inventory, by a single pass of a
dynamic programming algorithm."
</AB>
<JT>Math Biosci</JT>
<PY>1992</PY>
<VO>111</VO>
<PP>279-293</PP>
</SEQ>

<SEQ>
<UI>0309   Auger,I.E.    Algorithms for the Opt.. Bull.Math.Biol. 89 51:39-54
</UI>
<AU>Auger IE;
    Lawrence CE
</AU>
<TI>Algorithms for the Optimal Identification of Segment Neighborhoods
</TI>
<SU>Sequence analysis;
    Common feature;
    Segment;
    Least squares;
    Likelihood;
    USA;
    Optimal;
    Identification;
    Algorithm
</SU>
<AB>"Two algorithms for the efficient identification of segment neighborhoods
are presented. A segment neighborhood is a set of contiguous residues that share
common features. Two procedures are developed to efficiently find estimates for
the parameters of the model that describe these features and for the residues
that define the boundaries of each segment neighborhood. The algorithms can
accept nearly any model of segment neighborhood, and can be applied with a broad
class of best fit functions including least squares and maximum likelihood."
</AB>
<JT>Bull Math Biol</JT>
<PY>1989</PY>
<VO>51</VO>
<PP>39-54</PP>
</SEQ>
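<!--
The segment-neighbourhood idea in the Auger and Lawrence abstract can be made concrete with a small dynamic program. This is a sketch under stated assumptions, not the authors' code: it assumes a numeric profile (one value per residue), models each segment by its mean, and uses a least-squares fit function, which is one of the choices the abstract mentions. The function name segment_neighbourhoods is hypothetical. It finds the best split into exactly K contiguous segments in O(K n^2) time, with prefix sums making each segment cost O(1).

```python
# Illustrative sketch: least-squares segment neighbourhoods (names are hypothetical).
def segment_neighbourhoods(values, K):
    """Split values into K contiguous segments minimising total squared error.

    Returns (cost, starts), where starts lists the first index of each segment.
    """
    n = len(values)
    if not 1 <= K <= n:
        raise ValueError("need 1 <= K <= len(values)")
    s1 = [0.0] * (n + 1)  # prefix sums of the values
    s2 = [0.0] * (n + 1)  # prefix sums of the squared values
    for i, v in enumerate(values):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def sse(i, j):
        """Squared error of values[i:j] about its own mean."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    INF = float("inf")
    # best[k][j]: least cost of splitting values[:j] into k segments.
    best = [[INF] * (n + 1) for _ in range(K + 1)]
    back = [[0] * (n + 1) for _ in range(K + 1)]
    best[0][0] = 0.0
    for k in range(1, K + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):  # last segment is values[i:j]
                cand = best[k - 1][i] + sse(i, j)
                if cand < best[k][j]:
                    best[k][j], back[k][j] = cand, i
    # Trace the segment starts back from the end of the sequence.
    starts, j = [], n
    for k in range(K, 0, -1):
        j = back[k][j]
        starts.append(j)
    return best[K][n], starts[::-1]
```

For example, segment_neighbourhoods([1, 1, 1, 5, 5, 5, 2, 2], 3) returns a cost of 0.0 with segment starts [0, 3, 6]. Replacing sse with another per-segment cost, such as a negative log-likelihood, leaves the recursion unchanged.
-->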

<SEQ>
<UI>0310   Smith,T.F.    The History of the Gen.. Genomics        90 6:701-707
</UI>
<AU>Smith TF
</AU>
<TI>The History of the Genetic Sequence Databases
</TI>
<SU>Sequence database;
    USA;
    Genetic
</SU>
<AB>A historical sketch with 25 references.
</AB>
<JT>Genomics</JT>
<PY>1990</PY>
<VO>6</VO>
<PP>701-707</PP>
</SEQ>

<SEQ>
<UI>0311   George,D.G.   The Protein Identifica.. Nucleic Acids R 86 14(1):11-15
</UI>
<AU>George DG;
    Barker WC;
    Hunt LT
</AU>
<TI>The Protein Identification Resource (PIR)
</TI>
<SU>Sequence database;
    USA;
    Coding;
    Identification;
    Protein;
    PIR
</SU>
<AB>"The Protein Identification Resource, which provides the scientific
community with an efficient on-line computer system designed for the
identification and analysis of protein sequences and their corresponding coding
sequences, has been established. The resource consists of an integrated computer
system composed of a number of protein and nucleic acid sequence databases and
the software necessary to analyze this information effectively."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1986</PY>
<VO>14</VO>
<NO>1</NO>
<PP>11-15</PP>
</SEQ>

<SEQ>
<UI>0312   Chappey,C.    A Method for Delineati.. Comput.Appl.Bio 92 8(3):255-260
</UI>
<AU>Chappey C;
    Hazout S
</AU>
<TI>A Method for Delineating Structurally Homogeneous Regions in Protein
Sequences
</TI>
<SU>FR;
    Region;
    Common feature;
    Sequence analysis;
    Profile;
    Protein
</SU>
<AB>"A homogeneous region in a protein sequence is a set of contiguous
residues that share common features, concerning physico-chemical, structural and
mutational information. This paper presents a method for identifying such
homogeneous regions. From a profile describing a given type of biological
information along the sequence, the algorithm allows the segmentation of the
sequence by optimizing a criterion characterized by two user-defined control
parameters: the 'homogenizing degree' of the regions and the 'site
neighbourhood' size."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1992</PY>
<VO>8</VO>
<NO>3</NO>
<PP>255-260</PP>
</SEQ>

<SEQ>
<UI>0313   Claverie,J.M. Smoothing Profiles wit.. Comput.Appl.Bio 91 7(1):113-115
</UI>
<AU>Claverie JM;
    Daulmerie C
</AU>
<TI>Smoothing Profiles with Sliding Windows: Better to Wear a Hat!
</TI>
<SU>Sequence analysis;
    Display;
    Profile;
    FR
</SU>
<AB>"A general way to analyze sequences is to turn them into lists of
position-dependent numerical values, the graphical representation of which
provides a sequence profile suitable for visual inspection and 'pattern
recognition'. ... We examined here the merit of the 'triangular' window, which
consists of associating linearly decreasing weights with the positions starting
from the center ('hat' average). ... It is remarkable how such a minor change in
the smoothing algorithm improves the profile readability."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1991</PY>
<VO>7</VO>
<NO>1</NO>
<PP>113-115</PP>
</SEQ>
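<!--
The 'hat' average in the Claverie and Daulmerie abstract can be sketched as a weighted moving average. This is a minimal illustration under assumptions the abstract does not state. The window has half-width h, its weights fall linearly from h + 1 at the centre to 1 at the edges, and windows that run off either end are truncated and renormalised. The names hat_smooth and flat_smooth are hypothetical. flat_smooth is the ordinary rectangular window, included only for comparison.

```python
# Illustrative sketch: triangular ("hat") versus flat window smoothing.
# Function names and end-of-sequence handling are assumptions.
def hat_smooth(values, h):
    """Triangular moving average with half-width h.

    Offset d from the centre gets weight h + 1 - |d|; windows that run past
    either end of the profile are truncated and renormalised.
    """
    n = len(values)
    smoothed = []
    for i in range(n):
        total = 0.0
        weight_sum = 0.0
        for offset in range(-h, h + 1):
            j = i + offset
            if 0 <= j < n:
                w = h + 1 - abs(offset)  # linearly decreasing from the centre
                total += w * values[j]
                weight_sum += w
        smoothed.append(total / weight_sum)
    return smoothed


def flat_smooth(values, h):
    """Ordinary (rectangular) moving average with half-width h, for comparison."""
    n = len(values)
    out = []
    for i in range(n):
        window = values[max(0, i - h):min(n, i + h + 1)]
        out.append(sum(window) / len(window))
    return out
```

On the spike profile [0, 0, 0, 9, 0, 0, 0] with h = 2, hat_smooth gives [0.0, 1.125, 2.0, 3.0, 2.0, 1.125, 0.0] and keeps the maximum at the spike. flat_smooth gives [0.0, 2.25, 1.8, 1.8, 1.8, 2.25, 0.0]. Because its truncated end windows average fewer zeros, positions two steps away score higher than the spike itself.
-->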

<SEQ>
<UI>0314   Peltola,H.    Algorithms for Some St.. Information P.. 83 North-Holland
</UI>
<AU>Peltola H;
    Soderlund H;
    Tarhio J;
    Ukkonen E
</AU>
<TI>Algorithms for Some String Matching Problems Arising in Molecular Genetics
</TI>
<ED>Mason REA
</ED>
<BK>Information Processing 83. Proceedings of the IFIP 9th World Computer
Congress. Paris, France, September 19-23, 1983
</BK>
<SU>Composition;
    FI;
    String match;
    Genetic;
    Algorithm
</SU>
<AB>"With current laboratory techniques it is possible to determine the
nucleotide order for relatively short fragments of a long DNA molecule while the
total order for long molecules must be reconstructed from the fragments. ... We
give for this problem a simple formulation as a string matching problem, and
develop efficient algorithms for finding good approximate solutions."
</AB>
<PU>North-Holland</PU>
<PL>Amsterdam</PL>
<PY>1983</PY>
<PP>59-64</PP>
</SEQ>

<SEQ>
<UI>0315   Gingeras,T.R. Steps Toward Computer .. Science         80 209(19 Sept.):
</UI>
<AU>Gingeras TR;
    Roberts RJ
</AU>
<TI>Steps Toward Computer Analysis of Nucleotide Sequences
</TI>
<SU>Sequence analysis;
    Review;
    USA;
    Clone;
    Nucleotide
</SU>
<AB>"Concomitant improvements in methods for nucleic acid sequencing have led
many investigators to characterize their clones by sequencing them. This has
resulted in the accumulation of such large amounts of sequence data that
computer-assisted methods, with programs directed toward the manipulation of
nucleic acid sequences, have become indispensable during the collection and
analysis of that data. ... It is the intent of this article to report on the
developing role of computer technology in this field."
</AB>
<JT>Science</JT>
<PY>1980</PY>
<VO>209</VO>
<NO>19 Sept.</NO>
<PP>1322-1328</PP>
</SEQ>

<SEQ>
<UI>0316   Stormo,G.D.   Quantitative Analysis .. Nucleic Acids R 86 14(16):6661-66
</UI>
<AU>Stormo GD;
    Schneider TD;
    Gold L
</AU>
<TI>Quantitative Analysis of the Relationship between Nucleotide Sequence and
Functional Activity
</TI>
<SU>Function;
    Match a pattern matrix;
    USA;
    Nucleotide
</SU>
<AB>"Several recent papers have used matrices to evaluate nucleic acid
sequences .... The different papers vary in their methods of assigning values to
the elements of the matrix. In this paper we show how to use methods for solving
simultaneous equations to find the matrix elements that give the best fit to a
set of quantitative data."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1986</PY>
<VO>14</VO>
<NO>16</NO>
<PP>6661-6679</PP>
</SEQ>

<SEQ>
<UI>0317   Kashyap,R.L.  Spelling Correction us.. Pattern Recogni 84 2(3):147-154
</UI>
<AU>Kashyap RL;
    Oommen BJ
</AU>
<TI>Spelling Correction using Probabilistic Methods
</TI>
<SU>Correction;
    USA;
    Probabilistic;
    Edit
</SU>
<AB>"A probabilistic procedure is suggested for the automatic correction of
spelling and typing errors in printed English texts. The heart of the procedure
is a probabilistic model for the generation of the garbled word from the correct
word. The garbler can delete or insert symbols in the word or substitute one or
more symbols by other symbols. An expression is derived for P(Y | X), the
probability of generating a garbled word Y from a correct word X. The model is
probabilistically consistent."
</AB>
<JT>Pattern Recognition Lett</JT>
<PY>1984</PY>
<VO>2</VO>
<NO>3</NO>
<PP>147-154</PP>
</SEQ>

<SEQ>
<UI>0318   Peterson,J.L. Computer Programs for .. Comm.ACM        80 23(12):676-687
</UI>
<AU>Peterson JL
</AU>
<TI>Computer Programs for Detecting and Correcting Spelling Errors
</TI>
<SU>Correction;
    USA;
    Error;
    Program
</SU>
<AB>"With the increase in word and text processing computer systems, programs
which check and correct spelling will become more and more common. Peterson
investigates the basic structure of several such existing programs and their
approaches to solving the problems which arise when this type of program is
created. The basic framework and background necessary to write a spelling
checker or corrector are provided."
</AB>
<JT>Comm ACM</JT>
<PY>1980</PY>
<VO>23</VO>
<NO>12</NO>
<PP>676-687</PP>
</SEQ>

<SEQ>
<UI>0319   Riseman,E.M.  A Contextual Postproce.. IEEE Trans.Comp 74 23(5):480-493
</UI>
<AU>Riseman EM;
    Hanson AR
</AU>
<TI>A Contextual Postprocessing System for Error Correction using Binary n-Grams
</TI>
<SU>Correction;
    N-gram;
    USA;
    Pattern recognition;
    Error
</SU>
<AB>"The effectiveness of various forms of contextual information in a
postprocessing system for detection and correction of errors in words is
examined. Various algorithms utilizing context are considered, from a dictionary
algorithm which has available the maximum amount of information, to a set of
contextual algorithms utilizing positional binary n-gram statistics. ... This
type of information is extremely compact and the computation for error
correction is orders of magnitude less than that required by the dictionary
algorithm."
</AB>
<JT>IEEE Trans Comput</JT>
<PY>1974</PY>
<VO>23</VO>
<NO>5</NO>
<PP>480-493</PP>
</SEQ>
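<!--
Positional binary n-grams can be illustrated with the n = 2 case (positional digrams). The sketch below assumes a dictionary of words that all have the same length, and it covers only error detection, not the correction step. The names build_digram_table and flag_word are hypothetical. A set of (i, j, a, b) tuples stands in for the binary array: a tuple is present exactly when some dictionary word has letter a at position i and letter b at position j.

```python
# Illustrative sketch: error detection with positional binary digrams.
# Assumes equal-length dictionary words; names are hypothetical.
from itertools import combinations


def build_digram_table(dictionary):
    """Collect every (i, j, letter_i, letter_j) with i < j seen in the dictionary.

    Membership in this set plays the role of a 1 in the binary n-gram array.
    """
    table = set()
    for word in dictionary:
        for i, j in combinations(range(len(word)), 2):
            table.add((i, j, word[i], word[j]))
    return table


def flag_word(word, table):
    """Return the position pairs whose letter pair never occurs in the dictionary."""
    return [
        (i, j)
        for i, j in combinations(range(len(word)), 2)
        if (i, j, word[i], word[j]) not in table
    ]
```

With the dictionary ["cat", "cot", "dog"], flag_word("cog", table) returns [(0, 2)], because no dictionary word starts with c and ends with g. A non-word whose every positional pair occurs somewhere in the dictionary would pass undetected, which is the price of the compact representation the abstract describes.
-->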

<SEQ>
<UI>0320   Smith,T.F.    Statistical Characteri.. Nucleic Acids R 83 11(7):2205-222
</UI>
<AU>Smith TF;
    Waterman MS;
    Sadler JR
</AU>
<TI>Statistical Characterization of Nucleic Acid Sequence Functional Domains
</TI>
<SU>Function;
    USA;
    Statistical;
    Genome;
    Segment;
    Coding;
    Nucleic acid;
    Characterization
</SU>
<AB>"It has long been recognized that various genome classes were
distinguishable on the basis of base composition and nearest neighbor
frequencies. ... It is now clear that these and related statistics can uniquely
characterize the various functional domains of the genome. In particular,
peptide coding, intervening segments, structural RNA coding and mitochondrial
domains of the vertebrate genome are uniquely characterizable. ... Here, we
investigated the statistical measures most distinctive of the various domains
and then linked them to our current understandings in so far as possible."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1983</PY>
<VO>11</VO>
<NO>7</NO>
<PP>2205-2220</PP>
</SEQ>

<SEQ>
<UI>0321   Liquori,A.M.  Pattern Recognition of.. J.Mol.Evol.     86 23:80-87
</UI>
<AU>Liquori AM;
    Ripamonti A;
    Sadun C;
    Ottani S;
    Braga D
</AU>
<TI>Pattern Recognition of Sequence Similarities in Globular Proteins by
Fourier Analysis: A Novel Approach to Molecular Evolution
</TI>
<SU>Pairwise comparison;
    Fourier;
    Pattern recognition;
    Segment;
    Italy;
    Similarity;
    Evolution;
    Protein;
    Recognition
</SU>
<AB>"A new algorithm is introduced for analyzing gene-duplication-independent
(orthologous) and gene-duplication-dependent amino acid sequence similarities
between proteins of different species. It is based on the calculation of an
auto-correlation function D(x) as a Fourier series analogous to that used in
crystal analysis by x-ray diffraction. ... This method allows satisfactory
pattern recognition of homologies and internal duplications of an initial
segment of the polypeptide chain."
</AB>
<JT>J Mol Evol</JT>
<PY>1986</PY>
<VO>23</VO>
<PP>80-87</PP>
</SEQ>

<SEQ>
<UI>0322   Sankoff,D.    Genomic Divergence thr.. Methods Enzymol 90 183:428-438
</UI>
<AU>Sankoff D;
    Cedergren R;
    Abel Y
</AU>
<TI>Genomic Divergence through Gene Rearrangement
</TI>
<SU>Genome;
    Genomic;
    Probabilistic;
    Divergence;
    Gene;
    Rearrangement;
    CA
</SU>
<AB>"In this chapter we discuss simple probabilistic models for genome
shuffling introduced by Sankoff and Goldstein [1989] and apply them to the
assessment of relationships among a number of bacterial genomes. Lacking
complete nucleotide sequences at this level, we assess our methodology on
genetic map data."
</AB>
<JT>Methods Enzymol</JT>
<PY>1990</PY>
<VO>183</VO>
<PP>428-438</PP>
</SEQ>

<SEQ>
<UI>0323   Hillis,D.M.   Analysis of DNA Sequen.. Methods Enzymol 93 224:456-487
</UI>
<AU>Hillis DM;
    Allard MW;
    Miyamoto MM
</AU>
<TI>Analysis of DNA Sequence Data: Phylogenetic Inference
</TI>
<SU>Phylogeny;
    Review;
    USA;
    DNA;
    Phylogenetic
</SU>
<AB>"Methods for inferring phylogeny from DNA sequences have proliferated
greatly in the last few years. Unfortunately, decisions concerning which of many
described methods will be used in a given study are rarely made by weighing the
advantages and disadvantages of each approach; instead, issues of availability
or historical inertia often dictate such choices. ... Our goal in this chapter
is to present a practical guide to selecting a set of methods for phylogenetic
analysis of nucleic acid sequences. We focus on the assumptions, advantages,
disadvantages, and limitations of the various approaches. Space does not permit
a description of each of the algorithms, but many of these are described in an
excellent review paper by Swofford and Olsen (1990)."
</AB>
<JT>Methods Enzymol</JT>
<PY>1993</PY>
<VO>224</VO>
<PP>456-487</PP>
</SEQ>

<SEQ>
<UI>0324   Zakharov,I.A. Quantitative Analysis .. Dokl.Biol.Sci.  88 301:443-447
</UI>
<AU>Zakharov IA;
    Valeev AK
</AU>
<TI>Quantitative Analysis of Evolution of Mammalian Genomes by Comparison of
Genetic Maps
</TI>
<SU>Genome;
    Genetic;
    Mapping;
    RU;
    Evolution
</SU>
<AB>Translated from Doklady Akademii Nauk SSSR, 301(5), 1213-1218, Aug. 1988.
"The number of homologous genes with known location in different mammals is high
enough to attempt to make a stricter analysis. Since no works are known to us in
which an appropriate mathematical apparatus has been proposed, our first task
was to formulate approaches to a quantitative analysis of similarities and
differences in genetic maps and to test the possibilities of such analysis by
comparing genetic maps of five species of mammals. ... Our first task was the
determination of the measure of similarity between genetic maps compared." Four
proximity measures are described.
</AB>
<JT>Dokl Biol Sci</JT>
<PY>1988</PY>
<VO>301</VO>
<PP>443-447</PP>
</SEQ>

<SEQ>
<UI>0325   Middendorf,M. The Shortest Common No.. Theoret.Comput. 93 108:365-369
</UI>
<AU>Middendorf M
</AU>
<TI>The Shortest Common Nonsubsequence Problem is NP-complete
</TI>
<SU>Nonsubsequence;
    DE;
    Complexity;
    Shortest common
</SU>
<AB>"The SCNS problem is shown to be NP-complete for strings over an alphabet
of size &gt;= 2."
</AB>
<JT>Theoret Comput Sci</JT>
<PY>1993</PY>
<VO>108</VO>
<PP>365-369</PP>
</SEQ>

<SEQ>
<UI>0326   Hebrard,J.J.  An Algorithm for Disti.. Theoret.Comput. 91 82:35-49
</UI>
<AU>Hebrard JJ
</AU>
<TI>An Algorithm for Distinguishing Efficiently Bit-Strings by their
Subsequences
</TI>
<SU>Longest common;
    Sequence proximity;
    FR;
    Subsequence;
    Algorithm
</SU>
<AB>"A linear on-line algorithm for computing a shortest subsequence that
distinguishes two different bit-strings is presented. The method is based on a
special way of factorizing strings. ... One can also consider as a measure of
similarity the greatest integer d(u,v) such that no string of length &lt;= d(u,v)
can distinguish u and v. This paper is devoted to the computation of d(u,v)."
</AB>
<JT>Theoret Comput Sci</JT>
<PY>1991</PY>
<VO>82</VO>
<PP>35-49</PP>
</SEQ>
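<!--
To make the definition of d(u,v) in the Hebrard abstract concrete, the brute-force sketch below searches strings in order of increasing length for one that is a subsequence of exactly one of u and v. d(u,v) is then that length minus one. The search takes exponential time and only illustrates the quantity. It is not Hebrard's linear on-line algorithm. The function names are hypothetical.

```python
# Illustrative brute-force sketch of d(u, v); not the paper's linear algorithm.
from itertools import product


def is_subsequence(w, s):
    """True if w can be obtained from s by deleting symbols."""
    it = iter(s)
    return all(ch in it for ch in w)  # each 'in' consumes the iterator


def shortest_distinguisher(u, v, alphabet="01"):
    """Return a shortest string that is a subsequence of exactly one of u, v.

    Returns None when u == v. For u != v over the given alphabet, the longer
    string (or u itself when the lengths are equal) distinguishes them, so
    the search ends by length max(len(u), len(v)).
    """
    if u == v:
        return None
    for length in range(1, max(len(u), len(v)) + 1):
        for letters in product(alphabet, repeat=length):
            w = "".join(letters)
            if is_subsequence(w, u) != is_subsequence(w, v):
                return w
    return None  # only reached if u or v uses symbols outside the alphabet


def d(u, v):
    """Greatest d such that no string of length <= d distinguishes u and v."""
    w = shortest_distinguisher(u, v)
    return None if w is None else len(w) - 1
```

For instance, "01" and "10" share every subsequence of length 1, and "01" separates them, so d("01", "10") is 1.
-->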

<SEQ>
<UI>0327   Claverie,J.M. k-Tuple Frequency Anal.. Methods Enzymol 90 183:237-252
</UI>
<AU>Claverie JM;
    Sauvaget I;
    Bougueleret L
</AU>
<TI>k-Tuple Frequency Analysis: From Intron/Exon Discrimination to T-cell
Epitope Mapping
</TI>
<SU>Sequence analysis;
    k-tuple;
    FR;
    N-gram;
    Discrimination;
    Mapping
</SU>
<AB>"There are many classes of functions for which neither a sufficient
overall homology nor a discriminant functional signature can be found. ... For
such problems, we have developed an alternative approach which does not depend
on the a priori recognition of a single or a few specific, highly discriminant,
patterns. Instead, the methods presented in this chapter take advantage of the
frequencies of occurrence of all subsequences of length k (k-tuples) as computed
from the sequence of interest."
</AB>
<JT>Methods Enzymol</JT>
<PY>183</PY>
<VO>183</VO>
<PP>237-252</PP>
</SEQ>
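<!--
The first step of the k-tuple approach is counting every overlapping subsequence of length k, and it can be sketched as below. Converting the counts to relative frequencies over all possible k-tuples, the DNA alphabet default, and the function names are illustrative assumptions, not the chapter's exact procedure.

```python
# Illustrative sketch: k-tuple counts and a fixed-order frequency profile.
from collections import Counter
from itertools import product


def ktuple_counts(seq, k):
    """Count all overlapping substrings of length k (k-tuples) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def ktuple_profile(seq, k, alphabet="ACGT"):
    """Relative frequency of every possible k-tuple over the alphabet.

    Tuples that do not occur get 0.0, so the profiles of different sequences
    can be compared entry by entry.
    """
    counts = ktuple_counts(seq, k)
    total = sum(counts.values())
    keys = ["".join(p) for p in product(alphabet, repeat=k)]
    return {key: (counts[key] / total if total else 0.0) for key in keys}
```

For example, ktuple_counts("ACGTAC", 2) counts AC twice and CG, GT and TA once each.
-->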

<SEQ>
<UI>0328   Searls,D.B.   The Linguistics of DNA   Am.Sci.         92 80(Nov.-Dec.):
</UI>
<AU>Searls DB
</AU>
<TI>The Linguistics of DNA
</TI>
<SU>Sequence analysis;
    Language;
    USA;
    Genome;
    Linguistic;
    DNA
</SU>
<AB>"Finding effective methods for reading the language of nucleic acids is
rapidly becoming an issue of practical concern. ... Equipped with knowledge of
the linguistic structure of the genome, one can endeavor to write a computer
program that parses genes and other high-level features of DNA."
</AB>
<JT>Am Sci</JT>
<PY>1992</PY>
<VO>80</VO>
<NO>Nov.-Dec.</NO>
<PP>579-591</PP>
</SEQ>

<SEQ>
<UI>0329   Brendel,V.    Genome Structure Descr.. Nucleic Acids R 84 12(5):2561-256
</UI>
<AU>Brendel V;
    Busse HG
</AU>
<TI>Genome Structure Described by Formal Languages
</TI>
<SU>Genome;
    Language;
    Linguistic;
    DE;
    Automata;
    Structure
</SU>
<AB>"Nucleic acid sequences may be looked upon as words over the alphabet of
nucleotides. Naturally occurring DNAs and RNAs form subsets of the set of all
possible words. The use of formal languages is proposed to describe the
structure of these subsets. Regular languages defined by finite automata are
introduced to demonstrate the application of the concept on RNA-phages of group
I. This approach permits a concise characterization of grammatical patterns in
genetic information."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1984</PY>
<VO>12</VO>
<NO>5</NO>
<PP>2561-2568</PP>
</SEQ>

<SEQ>
<UI>0330   Collado-Vides A Transformational-Gra.. J.Theor.Biol.   89 136:403-425
</UI>
<AU>Collado-Vides J
</AU>
<TI>A Transformational-Grammar Approach to the Study of the Regulation of Gene
Expression
</TI>
<SU>Language;
    Linguistic;
    Genome;
    MEX;
    Expression;
    Gene
</SU>
<AB>"We propose generative grammar for constructing an integrative paradigm
for the understanding of genome organization and the regulation of gene
expression. Linguistic terms in molecular biology are defined. ... A general
structure is presented for the grammar; the application of phrase-structure rules
is justified by the existence of lexical categories. Transformational rules are
utilized to represent loops of regulation. ... Finally, this approach is
compared to other linguistic applications in molecular biology."
</AB>
<JT>J Theor Biol</JT>
<PY>1989</PY>
<VO>136</VO>
<PP>403-425</PP>
</SEQ>

<SEQ>
<UI>0331   Head,T.       Formal Language Theory.. Bull.Math.Biol. 87 49(6):737-759
</UI>
<AU>Head T
</AU>
<TI>Formal Language Theory and DNA: An Analysis of the Generative Capacity of
Specific Recombinant Behaviors
</TI>
<SU>Sequence analysis;
    Language;
    USA;
    DNA
</SU>
<AB>"A new manner of relating formal language theory to the study of
informational macromolecules is initiated. ... The associated languages are
analysed by means of a new generative formalism called a splicing system. A
significant subclass of these languages, which we call the persistent splicing
languages, is shown to coincide with a class of regular languages which have
been previously studied in other contexts: the strictly locally testable
languages. This study initiates the formal analysis of the generative power of
recombinational behaviors in general."
</AB>
<JT>Bull Math Biol</JT>
<PY>1987</PY>
<VO>49</VO>
<NO>6</NO>
<PP>737-759</PP>
</SEQ>

<SEQ>
<UI>0332   Newberg,L.A.  A Lower Bound on the N.. Adv.Appl.Math.  93 14(2):172-183
</UI>
<AU>Newberg LA;
    Naor D
</AU>
<TI>A Lower Bound on the Number of Solutions to the Probed Partial Digest
Problem
</TI>
<SU>Digest;
    Mapping;
    USA
</SU>
<AB>"The probed partial digestion mapping method partially digests a DNA
strand with a restriction enzyme. A probe, which attaches to the DNA between two
restriction enzyme cutting sites, is hybridized to the partially digested DNA,
and the sizes of fragments to which the probe hybridizes are measured. The
objective is to reconstruct the linear order of the restriction enzyme cutting
sites from the multiset of measured lengths. ... This article shows that a
multiset of N measured lengths can have as many as Ω(N^t) solutions for any t &lt;
... 1.73."
</AB>
<JT>Adv Appl Math</JT>
<PY>1993</PY>
<VO>14</VO>
<NO>2</NO>
<PP>172-183</PP>
</SEQ>

<SEQ>
<UI>0333   Barth,G.      Relating the Average-c.. Combinatorial.. 85 Springer-Verlag
</UI>
<AU>Barth G
</AU>
<TI>Relating the Average-case Costs of the Brute-Force and Knuth-Morris-Pratt
String Matching Algorithm
</TI>
<ED>Apostolico A
    Galil Z
</ED>
<BK>Combinatorial Algorithms on Words
</BK>
<SU>String match;
    Knuth-Morris-Pratt;
    Markov;
    DE;
    Algorithm
</SU>
<AB>"The main objective of this paper is to elaborate on this observation
[that the Knuth-Morris-Pratt algorithm is not likely to be significantly faster
than the brute-force method in most actual applications] and to present a
detailed and accurate average-case analysis of both the brute-force and the KMP
algorithm. The analysis exploits results from Markov chain theory. ... An
accurate approximation for the ratio KMP/NAIVE, where KMP and NAIVE denote the
average case complexities of the KMP and naive string matching algorithms,
respectively, is given by the term 1 - (1/c) + (1/c^2)."
</AB>
<PU>Springer-Verlag</PU>
<PL>Berlin</PL>
<PY>1985</PY>
<PP>45-58</PP>
</SEQ>
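<!--
The ratio Barth studies, average-case character comparisons made by KMP versus the naive matcher, can be explored empirically with the instrumented sketch below. Counting one comparison per text-pattern character test is an assumed cost model, and the paper's accounting may differ. The functions are illustrative and require a non-empty pattern.

```python
# Illustrative sketch: naive and KMP matching with comparison counters.
def naive_search(text, pattern):
    """Brute-force matching; returns (match positions, character comparisons)."""
    n, m = len(text), len(pattern)
    matches, comparisons = [], 0
    for start in range(n - m + 1):
        j = 0
        while j < m:
            comparisons += 1
            if text[start + j] != pattern[j]:
                break
            j += 1
        if j == m:
            matches.append(start)
    return matches, comparisons


def failure_function(pattern):
    """fail[j] = length of the longest proper border of pattern[:j + 1]."""
    fail = [0] * len(pattern)
    k = 0
    for j in range(1, len(pattern)):
        while k > 0 and pattern[j] != pattern[k]:
            k = fail[k - 1]
        if pattern[j] == pattern[k]:
            k += 1
        fail[j] = k
    return fail


def kmp_search(text, pattern):
    """Knuth-Morris-Pratt; returns (match positions, text-pattern comparisons)."""
    fail = failure_function(pattern)
    m = len(pattern)
    matches, comparisons, k = [], 0, 0
    for i, ch in enumerate(text):
        while True:
            comparisons += 1
            if ch == pattern[k]:
                k += 1
                break
            if k == 0:
                break
            k = fail[k - 1]  # fall back to the next shorter border
        if k == m:
            matches.append(i - m + 1)
            k = fail[k - 1]
    return matches, comparisons
```

Dividing kmp_search(t, p)[1] by naive_search(t, p)[1] over many random texts t gives an empirical estimate of the KMP/NAIVE ratio discussed in the abstract.
-->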

<SEQ>
<UI>0334   Odlyzko,A.M.  Enumeration of Strings   Combinatorial.. 85 Springer-Verlag
</UI>
<AU>Odlyzko AM
</AU>
<TI>Enumeration of Strings
</TI>
<ED>Apostolico A
    Galil Z
</ED>
<BK>Combinatorial Algorithms on Words
</BK>
<SU>Enumeration;
    Pattern match;
    String match;
    Survey;
    USA
</SU>
<AB>"A survey is presented of some methods and results on counting words that
satisfy various restrictions on subwords (i.e., blocks of consecutive symbols).
Various applications to comma-free codes, games, pattern matching, and other
subjects are indicated. The emphasis is on the unified treatment of those topics
through the use of generating functions."
</AB>
<PU>Springer-Verlag</PU>
<PL>Berlin</PL>
<PY>1985</PY>
<PP>205-228</PP>
</SEQ>
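<!--
A small instance of the generating-function method the Odlyzko survey emphasises is counting the strings of length n over a q-letter alphabet that avoid a fixed pattern p. The sketch uses the standard Guibas-Odlyzko autocorrelation formula F(z) = c(z) / (z^k + (1 - qz) c(z)), where k = |p| and c(z) is the autocorrelation polynomial of p. Whether the survey states it in exactly this form is not confirmed by the excerpt. The sketch expands F(z) as a power series and checks the result against brute-force enumeration. The function names are hypothetical.

```python
# Illustrative sketch: counting pattern-avoiding strings via a generating function.
from itertools import product


def autocorrelation(pattern):
    """c[i] = 1 if pattern shifted right by i agrees with itself on the overlap."""
    k = len(pattern)
    return [1 if pattern[i:] == pattern[:k - i] else 0 for i in range(k)]


def avoiding_counts(pattern, q, n_max):
    """Coefficients f(0..n_max) of F(z) = c(z) / (z^k + (1 - q z) c(z))."""
    k = len(pattern)
    c = autocorrelation(pattern)
    size = n_max + 1
    numer = [0] * size
    denom = [0] * size
    for i, ci in enumerate(c):
        if i < size:
            numer[i] += ci
            denom[i] += ci          # the c(z) term
        if i + 1 < size:
            denom[i + 1] -= q * ci  # the -q z c(z) term
    if k < size:
        denom[k] += 1               # the z^k term
    # Power-series division; denom[0] = c[0] = 1, so no fractions arise.
    f = [0] * size
    for n in range(size):
        f[n] = numer[n] - sum(denom[j] * f[n - j] for j in range(1, n + 1))
    return f


def brute_force_counts(pattern, alphabet, n_max):
    """Count strings of each length up to n_max that do not contain pattern."""
    return [
        sum(1 for t in product(alphabet, repeat=n) if pattern not in "".join(t))
        for n in range(n_max + 1)
    ]
```

For the pattern "aa" over two letters, avoiding_counts("aa", 2, 5) returns the Fibonacci-like sequence [1, 2, 3, 5, 8, 13].
-->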

<SEQ>
<UI>0335   Main,M.G.     Linear Time Recognitio.. Combinatorial.. 85 Springer-Verlag
</UI>
<AU>Main MG;
    Lorentz RJ
</AU>
<TI>Linear Time Recognition of Squarefree Strings
</TI>
<ED>Apostolico A
    Galil Z
</ED>
<BK>Combinatorial Algorithms on Words
</BK>
<SU>Regularities;
    Square;
    USA;
    Recognition
</SU>
<AB>"This paper presents a new O(n log n) algorithm to determine whether a
string of length n has a substring which is a square. The algorithm is not as
general as some previous algorithms for finding all squares ..., but it does
have a simplicity which the others lack. Also, for a fixed alphabet of size k,
the algorithm can be improved by a factor of log_k(n), yielding an O(n)
algorithm for determining whether a string contains a square."
</AB>
<PU>Springer-Verlag</PU>
<PL>Berlin</PL>
<PY>1985</PY>
<PP>271-278</PP>
</SEQ>

<SEQ>
<UI>0336   Rabin,M.O.    Discovering Repetition.. Combinatorial.. 85 Springer-Verlag
</UI>
<AU>Rabin MO
</AU>
<TI>Discovering Repetitions in Strings
</TI>
<ED>Apostolico A
    Galil Z
</ED>
<BK>Combinatorial Algorithms on Words
</BK>
<SU>Regularities;
    Repetition;
    String match;
    Fingerprint;
    USA
</SU>
<AB>"In the present paper we employ the fingerprinting method to solve yet
another string matching problem. Given a string y we want to find the earliest
repetition, i.e. the shortest w and x such that y = wxxz. We shall call this the
repetition problem."
</AB>
<PU>Springer-Verlag</PU>
<PL>Berlin</PL>
<PY>1985</PY>
<PP>279-288</PP>
</SEQ>
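<!--
The sketch below illustrates fingerprinting on the repetition problem from the Rabin abstract. It reads "earliest" as the square xx that is completed first when y is scanned left to right, with ties broken by the shortest x. That reading is an assumption about the paper's definition. Substrings are compared through a polynomial rolling hash modulo the prime 2^61 - 1, and every hash match is verified directly, so a collision can cost time but never gives a wrong answer. The sketch tries all lengths at each end position and is therefore quadratic. It is not the paper's algorithm, and the function name is hypothetical.

```python
# Illustrative sketch: earliest square via rolling-hash fingerprints.
def find_earliest_square(y, base=257, mod=(1 << 61) - 1):
    """Return (w, x) with y = w + x + x + z for the square completed first.

    Returns None if y contains no square (is square-free).
    """
    n = len(y)
    # prefix[i] is the fingerprint of y[:i]; power[i] is base**i mod p.
    prefix = [0] * (n + 1)
    power = [1] * (n + 1)
    for i, ch in enumerate(y):
        prefix[i + 1] = (prefix[i] * base + ord(ch)) % mod
        power[i + 1] = (power[i] * base) % mod

    def fingerprint(i, j):
        """Fingerprint of y[i:j]."""
        return (prefix[j] - prefix[i] * power[j - i]) % mod

    for end in range(2, n + 1):             # end of the candidate square
        for half in range(1, end // 2 + 1):  # length of x
            mid, start = end - half, end - 2 * half
            if fingerprint(start, mid) == fingerprint(mid, end):
                if y[start:mid] == y[mid:end]:  # rule out a hash collision
                    return y[:start], y[start:mid]
    return None
```

For example, find_earliest_square("abcbcx") returns ("a", "bc"), and a square-free string such as "abcacb" gives None.
-->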

<SEQ>
<UI>0337   Restivo,A.    Some Decision Results .. Combinatorial.. 85 Springer-Verlag
</UI>
<AU>Restivo A;
    Salemi S
</AU>
<TI>Some Decision Results on Nonrepetitive Words
</TI>
<ED>Apostolico A
    Galil Z
</ED>
<BK>Combinatorial Algorithms on Words
</BK>
<SU>Regularities;
    Repetition;
    Square;
    Italy;
    Word
</SU>
<AB>"The paper addresses some generalizations of the Thue Problem such as:
given a word u, does there exist an infinite nonrepetitive overlap free (or
square free) word having u as a prefix? A solution to this as well as to related
problems is given for the case of overlap free words on a binary alphabet."
</AB>
<PU>Springer-Verlag</PU>
<PL>Berlin</PL>
<PY>1985</PY>
<PP>289-295</PP>
</SEQ>

<SEQ>
<UI>0338                 Combinatorial Algorith..                 85 Springer-Verlag
</UI>
<TI>Combinatorial Algorithms on Words
</TI>
<ED>Apostolico A
    Galil Z
</ED>
<SU>String match;
    Search tree;
    Enumeration;
    Regularities;
    USA;
    Compression;
    Combinatorial;
    Word;
    Algorithm
</SU>
<AB>Table of contents only. General (1 paper), string matching (4), subword
trees (2), data compression (5), counting (4), periods and other regularities
(4), miscellaneous (5).
</AB>
<PU>Springer-Verlag</PU>
<PL>Berlin</PL>
<PY>1985</PY>
<PP>0-0</PP>
</SEQ>

<SEQ>
<UI>0339   Waterman,M.S. Foreword [Mathematical.. Bull.Math.Biol. 89 51(1):1-4
</UI>
<AU>Waterman MS
</AU>
<TI>Foreword [Mathematical Analysis of Molecular Sequences. Special Issue]
</TI>
<SU>Sequence analysis;
    USA;
    Statistical;
    Region;
    Genome;
    Approximate match;
    Codon;
    Coding
</SU>
<AB>"The present issue is a collection of [ten] research papers in the area of
mathematical analysis of molecular sequences." Topics: approximate matching of
sequences, fitting models to protein sequences, statistical properties of a DNA
sequence, codon preference in protein coding regions, genome comparison, and large
deviations for the binomial distribution.
</AB>
<JT>Bull Math Biol</JT>
<PY>1989</PY>
<VO>51</VO>
<NO>1</NO>
<PP>1-4</PP>
</SEQ>

<SEQ>
<UI>0340                 Combinatorial Pattern ..                 94 Springer-Verlag
</UI>
<TI>Combinatorial Pattern Matching. 5th Annual Symposium, CPM 94. Proceedings.
Lecture Notes in Computer Science, Volume 807.
</TI>
<ED>Crochemore M
    Gusfield D
</ED>
<SU>Sequence alignment;
    Pattern match;
    FR;
    Language;
    Expression;
    Combinatorial
</SU>
<AB>Asilomar, June 5-8, 1994. "Combinatorial Pattern Matching addresses issues
of searching and matching of strings and more complicated patterns such as
trees, regular expressions, extended expressions, etc. The goal is to derive
non-trivial combinatorial properties for such structures and then to exploit
these properties in order to achieve superior performances for the corresponding
computational problems."
</AB>
<PU>Springer-Verlag</PU>
<PL>Berlin</PL>
<PY>1994</PY>
<PP>viii+326-0</PP>
</SEQ>

<SEQ>
<UI>0341                 Molecular Evolution: C..                 90 Academic Press
</UI>
<TI>Molecular Evolution: Computer Analysis of Protein and Nucleic Acid
Sequences. Methods in Enzymology, Volume 183.
</TI>
<ED>Doolittle RF
</ED>
<SU>Sequence database;
    Database search;
    Pattern match;
    Structure;
    Sequence alignment;
    Phylogeny;
    USA;
    Evolution;
    Protein;
    Nucleic acid
</SU>
<AB>Title page, table of contents only. Databases (4 papers), searching
databases (5), patterns in nucleic acid sequences (7), predicting RNA secondary
structure (3), aligning protein and nucleic acid sequences (12), estimating
sequence divergence (5), phylogenetic trees (6).
</AB>
<PU>Academic Press</PU>
<PL>San Diego</PL>
<PY>1990</PY>
<PP>1-707</PP>
</SEQ>

<SEQ>
<UI>0342   Angluin,D.    Finding Patterns Commo.. ACM Sympos.Theo 79 11:130-141
</UI>
<AU>Angluin D
</AU>
<TI>Finding Patterns Common to a Set of Strings (Extended Abstract)
</TI>
<SU>Pattern language;
    Pattern definition;
    USA
</SU>
<AB>"We motivate, formalize, and study a computational problem in concrete
inductive inference. A 'pattern' is defined to be a concatenation of constants
and variables, and the language of a pattern is defined to be the set of strings
obtained by substituting constant strings for the variables. The problem we
consider is, given a set of strings, find a minimal pattern language containing
this set. This problem is shown to be effectively solvable in the general case
and to lead to correct inference in the limit of the pattern languages. There
exists a polynomial time algorithm for it in the restricted case of one-variable
patterns."
</AB>
<JT>ACM Sympos Theory Comput</JT>
<PY>1979</PY>
<VO>11</VO>
<PP>130-141</PP>
</SEQ>

<SEQ>
<UI>0343   Chrobak,M.    Remarks on String-matc.. Inform.Process. 87 24(5):325-329
</UI>
<AU>Chrobak M;
    Rytter W
</AU>
<TI>Remarks on String-matching and One-way Multihead Automata
</TI>
<SU>Pattern match;
    Automata;
    Complexity;
    PO;
    String match
</SU>
<AB>"The complexity of string-matching has been deeply investigated both in
sequential and parallel models of computation. Since string-matching can be done
in real time, there is no sense of talking about any lower bounds concerning its
time complexity. However, one can ask what is the simplest possible device
capable of doing string-matching."
</AB>
<JT>Inform Process Lett</JT>
<PY>1987</PY>
<VO>24</VO>
<NO>5</NO>
<PP>325-329</PP>
</SEQ>

<SEQ>
<UI>0344   Waterman,M.S. Interval Graphs and Ma.. Bull.Math.Biol. 86 48(2):189-195
</UI>
<AU>Waterman MS;
    Griggs JR
</AU>
<TI>Interval Graphs and Maps of DNA
</TI>
<SU>Graph;
    Restriction;
    Mapping;
    USA;
    DNA
</SU>
<AB>"A special class of interval graphs is defined and characterized, and an
algorithm is given for their construction. These graphs are motivated by an
important representation of DNA called restriction maps by molecular biologists.
Circular restriction maps are easily included."
</AB>
<JT>Bull Math Biol</JT>
<PY>1986</PY>
<VO>48</VO>
<NO>2</NO>
<PP>189-195</PP>
</SEQ>

<SEQ>
<UI>0345   Galil,Z.      Two Fast Simulations w.. Inform.Process. 76 4(4):85-87
</UI>
<AU>Galil Z
</AU>
<TI>Two Fast Simulations which Imply some Fast String Matching and
Palindrome-Recognition Algorithms
</TI>
<SU>Pattern match;
    Regularities;
    USA;
    String match;
    Simulation;
    Algorithm
</SU>
<AB>"Theorems 1 and 3 imply some fast algorithms for string-matching and for
palindrome-recognition which could not be derived directly by the previously
known simulations."
</AB>
<JT>Inform Process Lett</JT>
<PY>1976</PY>
<VO>4</VO>
<NO>4</NO>
<PP>85-87</PP>
</SEQ>

<SEQ>
<UI>0346   Manacher,G.   A New Linear-time "On-.. J.Assoc.Comput. 75 
22(3):346-351
</UI>
<AU>Manacher G
</AU>
<TI>A New Linear-time "On-line" Algorithm for Finding the Smallest Initial
Palindrome of a String
</TI>
<SU>Regularities;
    Automata;
    USA;
    Palindrome;
    Algorithm
</SU>
<AB>"Despite significant advances in linear-time scanning algorithms,
particularly those based wholly or in part on either Cook's linear-time
simulation of two-way deterministic pushdown automata or Weiner's algorithm, the
problem of recognizing the initial leftmost nonvoid palindrome of a string in
time proportional to the length N of the palindrome, examining no symbols other
than those in the palindrome, has remained open. The present algorithm solves
this problem, assuming that addition of two integers less than or equal to N may
be performed in a single operation."
</AB>
<JT>J Assoc Comput Mach</JT>
<PY>1975</PY>
<VO>22</VO>
<NO>3</NO>
<PP>346-351</PP>
</SEQ>
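
Manacher's entry above concerns the smallest initial (prefix) palindrome. The Python sketch below assumes palindromes of even length, because otherwise a single character would always be a trivial answer. It uses the radius bookkeeping now usually called Manacher's algorithm to compute, for every gap between characters, the radius of the longest even palindrome centred there, and then picks the shortest palindromic prefix. Unlike the on-line procedure of the paper, it reads the whole string before answering. The function names are illustrative only.

```python
def even_palindrome_radii(s: str) -> list[int]:
    """rad[i] is the largest k such that s[i-k:i+k] is a palindrome,
    i.e. an even palindrome centred between s[i-1] and s[i]."""
    n = len(s)
    rad = [0] * (n + 1)
    left, right = 0, 0  # span s[left:right] of the rightmost palindrome so far
    for i in range(1, n):
        k = 0
        if right > i:
            # Start from the radius at the mirror centre, clipped to the span.
            k = min(right - i, rad[left + right - i])
        # Extend outwards one character pair at a time.
        while i - k >= 1 and n > i + k and s[i - k - 1] == s[i + k]:
            k += 1
        rad[i] = k
        if i + k > right:
            left, right = i - k, i + k
    return rad


def smallest_initial_palindrome(s: str):
    """Length of the shortest nonvoid even palindromic prefix of s, or None."""
    rad = even_palindrome_radii(s)
    for i in range(1, len(s) // 2 + 1):
        if rad[i] >= i:  # s[0:2*i] is a palindrome centred at i
            return 2 * i
    return None


print(smallest_initial_palindrome("abbaabba"))  # prints 4
```

Each comparison either fails or moves `right` forward. The radius pass is therefore linear in the length of the string.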

<SEQ>
<UI>0347   Stephen,G.A.  String Search                            92
</UI>
<AU>Stephen GA
</AU>
<TI>String Search
</TI>
<SU>String match;
    Sequence proximity;
    Longest common;
    String search;
    UK
</SU>
<AB>Technical Report TR-92-gas-01, University College of North Wales, Bangor,
Gwynedd, UK, 138 pp. "This report is concerned with a string searching problem
which has arisen in the development of textual search methods for a proposed
information processing system. The aim of the report is not to attempt to
provide a definitive solution for the problem, although some ideas towards this
end are presented, but rather to review existing string searching algorithms
germane to the processing system in general and, in particular, to the specific
problem."
</AB>
<PY>1992</PY>
</SEQ>

<SEQ>
<UI>0348   Burkowski,F.J A Hardware Hashing Sch.. IEEE Trans.Comp 82 
31(9):825-834
</UI>
<AU>Burkowski FJ
</AU>
<TI>A Hardware Hashing Scheme in the Design of a Multiterm String Comparator
</TI>
<SU>Match with don't cares;
    Hardware;
    Retrieval;
    CA;
    Text search
</SU>
<AB>"This paper discusses the hardware design of a term detection unit which
may be used in the scanning of text emanating from a serial source such as a
disk or bubble memory. The main objective of this design is the implementation
of a high performance unit which can detect any one of many terms (e.g., 1024
terms) while accepting source text at disk transfer rates."
</AB>
<JT>IEEE Trans Comput</JT>
<PY>1982</PY>
<VO>31</VO>
<NO>9</NO>
<PP>825-834</PP>
</SEQ>

<SEQ>
<UI>0349   Crochemore,M. Transducers and Repeti.. Theoret.Comput. 86 45:63-86
</UI>
<AU>Crochemore M
</AU>
<TI>Transducers and Repetitions
</TI>
<SU>Regularities;
    Automata;
    FR;
    Search tree;
    Repetition
</SU>
<AB>"The factor transducer of a word associates to each of its factors (or
subwords) their first occurrence. Optimal bounds on the size of minimal factor
transducers together with an algorithm for building them are given. Analogue
results and a simple algorithm are given for the case of subsequential suffix
transducers. Algorithms are applied to repetition searching in words. ... Thanks
to factor transducers we get an O(n) algorithm for finding a square in a word of
length n on a fixed alphabet." Compare with Main, Lorentz (1984).
</AB>
<JT>Theoret Comput Sci</JT>
<PY>1986</PY>
<VO>45</VO>
<PP>63-86</PP>
</SEQ>

<SEQ>
<UI>0350   Hirata,M.     A Versatile Data Strin.. IEEE J.Solid-St 88 
23(2):329-335
</UI>
<AU>Hirata M;
    Yamada H;
    Nagai H;
    Takahashi K
</AU>
<TI>A Versatile Data String-Search VLSI
</TI>
<SU>Approximate match;
    Match with don't cares;
    Hardware;
    Automata;
    JP;
    VLSI;
    String search
</SU>
<AB>"A versatile data string-search VLSI has been described. An 8K content
addressable memory and a 20K-gate finite-state automaton logic have been
combined to execute data string search. This architecture allowed versatile
operations, such as approximate-match and variable-length 'don't care' search at
high speed."
</AB>
<JT>IEEE J Solid-State Circuits</JT>
<PY>1988</PY>
<VO>23</VO>
<NO>2</NO>
<PP>329-335</PP>
</SEQ>

<SEQ>
<UI>0351   Landau,G.M.   Efficient String Match.. IEEE Sympos.Fou 85 
26:126-136
</UI>
<AU>Landau GM;
    Vishkin U
</AU>
<TI>Efficient String Matching in the Presence of Errors
</TI>
<SU>Approximate match;
    Error;
    String match;
    IL
</SU>
<AB>"Given a text of length n, a pattern of length m and an integer k, we
present an algorithm for finding all occurrences of the pattern in the text,
each with at most k differences. The algorithm runs in O(m^2 + k^2 n) time.
Given the same input we also present an algorithm for finding all occurrences of
the pattern in the text, each with at most k mismatches (superfluous characters
in either the text or the pattern are not allowed). This algorithm runs in
O(k(m log m + n)) time."
</AB>
<JT>IEEE Sympos Found Comput Sci</JT>
<PY>1985</PY>
<VO>26</VO>
<PP>126-136</PP>
</SEQ>
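
The k-differences problem stated in the Landau-Vishkin entry above can also be solved more slowly by the classical O(mn) dynamic program. In that program the cost of starting a match is zero at every text position. The Python sketch below is that simple baseline, not the Landau-Vishkin algorithm. It keeps one column of the table and reports every text position where an occurrence with at most k differences (substitutions, insertions or deletions) ends. The function name is hypothetical.

```python
def k_difference_ends(pattern: str, text: str, k: int) -> list[int]:
    """0-based end positions j in text such that some substring of text
    ending at j matches pattern with at most k differences."""
    m = len(pattern)
    # col[i] = fewest differences between pattern[:i] and a substring of
    # text ending at the current position; before any text it is i.
    col = list(range(m + 1))
    ends = []
    for j, t in enumerate(text):
        prev_diag, col[0] = col[0], 0  # an occurrence may start anywhere
        for i in range(1, m + 1):
            cur = min(
                col[i] + 1,                         # text character left unmatched
                col[i - 1] + 1,                     # pattern character left unmatched
                prev_diag + (pattern[i - 1] != t),  # match or substitution
            )
            prev_diag, col[i] = col[i], cur
        if k >= col[m]:
            ends.append(j)
    return ends


print(k_difference_ends("abc", "xxabdxx", 1))  # prints [3, 4]
```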

<SEQ>
<UI>0352   Landau,G.M.   Parallel Construction .. Lecture Notes i 87 
267:314-325
</UI>
<AU>Landau GM;
    Schieber B;
    Vishkin U
</AU>
<TI>Parallel Construction of a Suffix Tree (Extended Abstract)
</TI>
<SU>Search tree;
    Parallel;
    Suffix;
    IL
</SU>
<AB>Proceedings of the 14th ICALP. "Weiner's (1973) suffix tree is known to be
a powerful tool for string manipulations. We present a parallel algorithm for
constructing a suffix tree. The algorithm runs in O(log n) time and uses n
processors. We also present applications for designing efficient parallel
algorithms for several string problems."
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>1987</PY>
<VO>267</VO>
<PP>314-325</PP>
</SEQ>

<SEQ>
<UI>0353   Cornish-Bowde Assessment of Protein .. J.Theor.Biol.   77 
65:735-742
</UI>
<AU>Cornish-Bowden A
</AU>
<TI>Assessment of Protein Sequence Identity from Amino Acid Composition Data
</TI>
<SU>Sequence comparison;
    Composition;
    Sequence proximity;
    UK;
    Protein;
    Amino acid
</SU>
<AB>"In this paper I shall show that for two proteins of equal length an index
similar to that of Marchalonis &amp; Weltman (1971) provides a direct and unbiased
estimate of the number of differences between the two sequences. The precision
of this estimate can be calculated a priori and so the reliability of deductions
about ancestral relationships can be assessed. Straightforward interpretation of
the indexes of Harris et al. (1969) and of Marchalonis &amp; Weltman (1971) is now
possible, because both can readily be converted into the new index with very
little calculation."
</AB>
<JT>J Theor Biol</JT>
<PY>1977</PY>
<VO>65</VO>
<PP>735-742</PP>
</SEQ>

<SEQ>
<UI>0354   Bishop,M.J.   Preface [Nucleic Acid .. Nucleic Acid .. 87IRL Press
</UI>
<AU>Bishop MJ;
    Rawlings CJ
</AU>
<TI>Preface [Nucleic Acid and Protein Sequence Analysis: A Practical Approach]
</TI>
<ED>Bishop MJ;
    Rawlings CJ
</ED>
<BK>Nucleic Acid and Protein Sequence Analysis: A Practical Approach
</BK>
<SU>Sequence analysis;
    Database search;
    Sequence comparison;
    UK;
    Protein
</SU>
<AB>"This book is designed as a practical aid to biologists wishing to use
computers for the acquisition, storage, or analysis of nucleic acid or protein
sequences."
</AB>
<PU>IRL Press</PU>
<PL>Oxford</PL>
<PY>1987</PY>
<PP>v-v</PP>
</SEQ>

<SEQ>
<UI>0355   Copeland,N.G. A Genetic Linkage Map .. Science         94 262 (1 
Oct.):5
</UI>
<AU>Copeland NG;
    Jenkins NA;
    Gilbert DJ;
    Eppig JT;
    Maltais LJ;
    Miller JC;
    Dietrich WF;
    Weaver A;
    Lincoln SE;
    Steen RG;
    Stein LD;
    Nadeau JH;
    Lander ES
</AU>
<TI>A Genetic Linkage Map of the Mouse: Current Applications and Future
Prospects
</TI>
<SU>Genetic;
    Mapping;
    Genome;
    Evolution;
    Gene;
    USA
</SU>
<AB>"Technological advances have made possible the development of
high-resolution genetic linkage maps for the mouse. These maps in turn offer
exciting prospects for understanding mammalian genome evolution through
comparative mapping, for developing mouse models of human disease, and for
identifying the function of all genes in the organism."
</AB>
<JT>Science</JT>
<PY>1993</PY>
<VO>262</VO>
<NO>1 Oct.</NO>
<PP>57-66</PP>
</SEQ>

<SEQ>
<UI>0356   Staden,R.     The Current Status and.. Nucleic Acids R 86 
14(1):217-231
</UI>
<AU>Staden R
</AU>
<TI>The Current Status and Portability of our Sequence Handling Software
</TI>
<SU>Management;
    Program;
    Sequence comparison;
    Dot;
    UK
</SU>
<AB>"The package contains a comprehensive suite of programs for managing large
shotgun sequencing projects, a program containing 61 functions for analysing
single sequences and a program for comparing pairs of sequences for similarity.
... I believe the programs will now run on any machine with a FORTRAN 77
compiler and sufficient memory."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1986</PY>
<VO>14</VO>
<NO>1</NO>
<PP>217-231</PP>
</SEQ>

<SEQ>
<UI>0357   Hamm,G.H.     The EMBL Data Library    Nucleic Acids R 86 
14(1):5-9
</UI>
<AU>Hamm GH;
    Cameron GN
</AU>
<TI>The EMBL Data Library
</TI>
<SU>Sequence database;
    EMBL;
    DE
</SU>
<AB>"The EMBL Data Library was the first internationally supported central
resource for nucleic acid sequence data. Working in close collaboration with its
American counterpart, GenBank, the library prepares and makes available to the
scientific community a comprehensive collection of the published nucleic acid
sequences. This paper describes briefly the contents of the database, how it is
available, and possible future enhancements of Data Library services."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1986</PY>
<VO>14</VO>
<NO>1</NO>
<PP>5-9</PP>
</SEQ>

<SEQ>
<UI>0358   Chin,F.Y.L.   A Fast Algorithm for C.. J.Inform.Proces 90 
13(4):463-469
</UI>
<AU>Chin FYL;
    Poon CK
</AU>
<TI>A Fast Algorithm for Computing Longest Common Subsequences of Small
Alphabet Size
</TI>
<SU>Longest common;
    Subsequence;
    HK;
    Algorithm
</SU>
<AB>"This paper presents a new algorithm for [the LCS] problem .... This
algorithm is particularly efficient when s (the alphabet size) is small.
Different data structures are used to obtain variations of the basic algorithm
that require different time and space complexities."
</AB>
<JT>J Inform Process</JT>
<PY>1990</PY>
<VO>13</VO>
<NO>4</NO>
<PP>463-469</PP>
</SEQ>

<SEQ>
<UI>0359   Aho,A.V.      Efficient String Match.. Comm.ACM        75 
18(6):333-340
</UI>
<AU>Aho AV;
    Corasick MJ
</AU>
<TI>Efficient String Matching: An Aid to Bibliographic Search
</TI>
<SU>Dictionary match;
    USA;
    Pattern match;
    String match;
    Automata;
    Knuth-Morris-Pratt
</SU>
<AB>"This paper describes a simple, efficient algorithm to locate all
occurrences of any of a finite number of keywords in a string of text. The
algorithm consists of constructing a finite state pattern matching machine from
the keywords and then using the pattern matching machine to process the text
string in a single pass. ... Our approach combines the ideas in the
Knuth-Morris-Pratt algorithm with those of finite state machines."
</AB>
<JT>Comm ACM</JT>
<PY>1975</PY>
<VO>18</VO>
<NO>6</NO>
<PP>333-340</PP>
</SEQ>
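
The construction described in the Aho-Corasick entry above can be sketched in Python as follows. Goto, failure and output functions are built from the keywords, and then the text is processed in a single pass. The dictionary-based transitions and the function names are illustrative choices and are not taken from the paper.

```python
from collections import deque


def build_machine(keywords):
    """Goto, failure and output functions of an Aho-Corasick-style machine."""
    goto = [{}]      # goto[state][char] -> next state; state 0 is the root
    fail = [0]       # failure transition of each state
    out = [set()]    # keywords recognised on entering each state
    for word in keywords:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(word)
    # Breadth-first order finishes shallower failure targets first.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] |= out[fail[nxt]]
    return goto, fail, out


def find_keywords(text, keywords):
    """All (start, keyword) occurrences, found in a single pass over text."""
    goto, fail, out = build_machine(keywords)
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in out[state]:
            hits.append((i - len(word) + 1, word))
    return sorted(hits)


print(find_keywords("ushers", ["he", "she", "his", "hers"]))
# prints [(1, 'she'), (2, 'he'), (2, 'hers')]
```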

<SEQ>
<UI>0360   Aho,A.V.      A Minimum Distance Err.. SIAM J.Comput.  72 
1(4):305-312
</UI>
<AU>Aho AV;
    Peterson TG
</AU>
<TI>A Minimum Distance Error-correcting Parser for Context-free Languages
</TI>
<SU>Correction;
    Language;
    USA;
    Edit;
    Distance;
    Parser
</SU>
<AB>"We assume three types of syntax errors can debase the sentences of a
language generated by a context-free grammar: the replacement of a symbol by an
incorrect symbol, the insertion of an extraneous symbol, or the deletion of a
symbol. We present an algorithm that will parse any input string to completion
finding the fewest possible number of errors. On a random access computer the
algorithm requires time proportional to the cube of the length of the input."
</AB>
<JT>SIAM J Comput</JT>
<PY>1972</PY>
<VO>1</VO>
<NO>4</NO>
<PP>305-312</PP>
</SEQ>

<SEQ>
<UI>0361   Alexandrov,N. Local Multiple Alignme.. Comput.Appl.Bio 92 
8(4):339-345
</UI>
<AU>Alexandrov NN
</AU>
<TI>Local Multiple Alignment by Consensus Matrix
</TI>
<SU>Multiple alignment;
    Gap;
    Consensus matrix;
    JP;
    Matrix
</SU>
<AB>A new algorithm for aligning several sequences, based on computing a
consensus matrix and comparing all the sequences with this consensus matrix. Two
modifications, corresponding to the evolutionary and functional meanings of the
alignment, depend on the specification of the gap penalty function. The paper
also covers the interplay between the consensus matrix and the multiple
alignment.
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1992</PY>
<VO>8</VO>
<NO>4</NO>
<PP>339-345</PP>
</SEQ>

<SEQ>
<UI>0362   Aho,A.V.      Bounds on the Complexi.. J.Assoc.Comput. 76 
23(1):1-12
</UI>
<AU>Aho AV;
    Hirschberg DS;
    Ullman JD
</AU>
<TI>Bounds on the Complexity of the Longest Common Subsequence Problem
</TI>
<SU>Longest common;
    USA;
    Complexity;
    Subsequence
</SU>
<AB>"The difficulty of computing a longest common subsequence of two strings
is examined using the decision tree model of computation, in which vertices
represent 'equal-unequal' comparisons." The number of such comparisons needed
in the worst case is shown to grow as the product of the lengths of the two
strings.
</AB>
<JT>J Assoc Comput Mach</JT>
<PY>1976</PY>
<VO>23</VO>
<NO>1</NO>
<PP>1-12</PP>
</SEQ>

<SEQ>
<UI>0363   Bairoch,A.    PROSITE: A Dictionary .. Nucleic Acids R 91 
19(Suppl.):224
</UI>
<AU>Bairoch A
</AU>
<TI>PROSITE: A Dictionary of Sites and Patterns in Proteins
</TI>
<SU>Sequence database;
    SWI;
    Motif;
    Sequence analysis;
    Pattern library;
    Signature;
    Protein;
    PROSITE
</SU>
<AB>"PROSITE is a compilation of sites and patterns found in protein
sequences. The use of protein sequence patterns (or motifs) to determine the
function of proteins is becoming very rapidly one of the essential tools of
sequence analysis. ... No attempt had been made until very recently to
systematically collect biologically significant patterns or to discover new
ones. It is for these reasons that we have developed, since 1988, a dictionary
of sites and patterns which we call PROSITE."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1991</PY>
<VO>19</VO>
<NO>Suppl.</NO>
<PP>2241-2245</PP>
</SEQ>

<SEQ>
<UI>0364   Allison,L.    A Bit-string Longest-c.. Inform.Process. 86 
23(6):305-310
</UI>
<AU>Allison L;
    Dix TI
</AU>
<TI>A Bit-string Longest-common-subsequence Algorithm
</TI>
<SU>Longest common;
    AU;
    Edit;
    Algorithm
</SU>
<AB>"A longest-common-subsequence algorithm is described which operates in
terms of bit or bit-string operations. It offers a speedup of the order of the
word-length on a conventional computer."
</AB>
<JT>Inform Process Lett</JT>
<PY>1986</PY>
<VO>23</VO>
<NO>6</NO>
<PP>305-310</PP>
</SEQ>
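
The bit-parallel idea in the Allison-Dix entry above can be sketched in Python. Python's unbounded integers stand in for machine words. The row update used here is x = row OR M[c], followed by row = x AND ((x - ((row shifted left by one) OR 1)) XOR x). This is the formulation commonly attributed to Allison and Dix, but treat it as an assumption rather than a transcription of the paper. Bit j of row is set where the LCS length increases at position j of the first string, so the LCS length is the number of set bits. Bitwise AND and left shift are written with `operator.and_` and `operator.lshift`.

```python
from operator import and_, lshift


def bit_lcs_length(a: str, b: str) -> int:
    """Length of a longest common subsequence of a and b, using one bit per
    position of a (Python ints act as arbitrarily long bit strings)."""
    # match[c] has bit j set exactly when a[j] == c.
    match = {}
    for j, ch in enumerate(a):
        match[ch] = match.get(ch, 0) | lshift(1, j)
    row = 0
    for ch in b:
        x = row | match.get(ch, 0)
        y = lshift(row, 1) | 1
        # Keep only the lowest set bit of x in each segment that starts at a bit of y.
        row = and_(x, (x - y) ^ x)
    return bin(row).count("1")


print(bit_lcs_length("ABCBDAB", "BDCABA"))  # prints 4
```

Each symbol of the second string costs a constant number of word operations per machine word of the first. This is the source of the speedup of the order of the word length mentioned in the abstract.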

<SEQ>
<UI>0365   Allison,L.    Finite-State Models in.. J.Mol.Evol.     92 
35(1):77-89
</UI>
<AU>Allison L;
    Wallace CS;
    Yee CN
</AU>
<TI>Finite-State Models in the Alignment of Macromolecules
</TI>
<SU>Pairwise alignment;
    Significance;
    AU;
    Message length;
    Information theory;
    Model
</SU>
<AB>"Minimum message length encoding is a technique of inductive inference
with theoretical and practical advantages. It allows the posterior odds-ratio of
two theories or hypotheses to be calculated. Here it is applied to problems of
aligning or relating two strings, in particular two biological macromolecules."
</AB>
<JT>J Mol Evol</JT>
<PY>1992</PY>
<VO>35</VO>
<NO>1</NO>
<PP>77-89</PP>
</SEQ>

<SEQ>
<UI>0366   Allison,L.    Minimum Message Length.. Bull.Math.Biol. 90 
52(3):431-453
</UI>
<AU>Allison L;
    Yee CN
</AU>
<TI>Minimum Message Length Encoding and the Comparison of Macromolecules
</TI>
<SU>Pairwise alignment;
    Significance;
    AU;
    Message length;
    Information theory
</SU>
<AB>"The question of whether or not two strings are related and, if so, of how
they are related and the problem of finding a good theory of string mutation are
treated as inductive inference problems. The method allows the posterior
odds-ratio of two string alignments or of two models of string mutation to be
computed."
</AB>
<JT>Bull Math Biol</JT>
<PY>1990</PY>
<VO>52</VO>
<NO>3</NO>
<PP>431-453</PP>
</SEQ>

<SEQ>
<UI>0367   Altschul,S.F. Gap Costs for Multiple.. J.Theor.Biol.   89 
138(3):297-309
</UI>
<AU>Altschul SF
</AU>
<TI>Gap Costs for Multiple Sequence Alignment
</TI>
<SU>Multiple alignment;
    USA;
    Sequence alignment;
    Gap
</SU>
<AB>"A new definition of gap costs for multiple alignments is proposed and
compared with previous ones. Since the new definition links a multiple
alignment's cost to that of its pairwise projections, it allows knowledge gained
about two-sequence alignments to bear on the multiple alignment problem. Also,
such linkage is a key element of recent algorithms that have rendered practical
the simultaneous alignment of as many as six sequences."
</AB>
<JT>J Theor Biol</JT>
<PY>1989</PY>
<VO>138</VO>
<NO>3</NO>
<PP>297-309</PP>
</SEQ>

<SEQ>
<UI>0368   Altschul,S.F. Amino Acid Substitutio.. J.Mol.Biol.     91 
219(3):555-565
</UI>
<AU>Altschul SF
</AU>
<TI>Amino Acid Substitution Matrices from an Information Theoretic Perspective
</TI>
<SU>Sequence proximity;
    Substitution;
    Information theory;
    USA;
    Scoring;
    Sequence comparison;
    Statistical;
    Sequence alignment;
    Amino acid
</SU>
<AB>"In the light of information theory, it is possible to express the scores
of a substitution matrix in bits and to see that different matrices are better
adapted to different purposes." Discusses the PAM-120, PAM-200, and PAM-250
matrices.
</AB>
<JT>J Mol Biol</JT>
<PY>1991</PY>
<VO>219</VO>
<NO>3</NO>
<PP>555-565</PP>
</SEQ>
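
In the usual log-odds formulation, the scores in the Altschul entry above are expressed in bits as s_ij = log2(q_ij / (p_i p_j)). Here q_ij is the target frequency of the aligned pair (i, j), and p_i and p_j are background frequencies. The relative entropy H, the sum of q_ij s_ij, then gives the average information per aligned position in bits. The Python sketch below applies these formulas to a made-up two-letter alphabet. The frequencies are hypothetical and are not taken from any PAM matrix.

```python
import math


def bit_scores(p, q):
    """Log-odds scores s_ij = log2(q_ij / (p_i * p_j)) and relative entropy H.

    p maps each residue to its background frequency; q maps each ordered
    aligned pair (i, j) to its target frequency (the q values sum to 1).
    """
    scores = {(i, j): math.log2(qij / (p[i] * p[j])) for (i, j), qij in q.items()}
    entropy = sum(qij * scores[pair] for pair, qij in q.items())  # bits per position
    return scores, entropy


# Hypothetical two-letter alphabet in which identical pairs are enriched.
p = {"A": 0.5, "B": 0.5}
q = {("A", "A"): 0.375, ("B", "B"): 0.375, ("A", "B"): 0.125, ("B", "A"): 0.125}
scores, h = bit_scores(p, q)
print(scores[("A", "B")], round(h, 4))  # prints -1.0 0.1887
```

Matrices with higher H suit the detection of closer relationships. This is one way to read the remark that different matrices are adapted to different purposes.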

<SEQ>
<UI>0369   Altschul,S.F. Weights for Data Relat.. J.Mol.Biol.     89 
207(4):647-653
</UI>
<AU>Altschul SF;
    Carroll RJ;
    Lipman DJ
</AU>
<TI>Weights for Data Related by a Tree
</TI>
<SU>Multiple alignment;
    Sequence weight;
    USA;
    Evolutionary tree
</SU>
<AB>"How can one characterize a set of data collected from different
biological species, or indeed any set of data related by an evolutionary tree?
The structure imposed by the tree implies that the data are not independent, and
for most applications this should be taken into account. We describe strategies
for weighting the data that circumvent some of the problems of dependency."
</AB>
<JT>J Mol Biol</JT>
<PY>1989</PY>
<VO>207</VO>
<NO>4</NO>
<PP>647-653</PP>
</SEQ>

<SEQ>
<UI>0370   Altschul,S.F. Significance of Nucleo.. Mol.Biol.Evol.  85 
2(6):526-538
</UI>
<AU>Altschul SF;
    Erickson BW
</AU>
<TI>Significance of Nucleotide Sequence Alignments: A Method for Random
Sequence Permutation that Preserves Dinucleotide and Codon Usage
</TI>
<SU>Pairwise alignment;
    Significance;
    USA;
    Sequence alignment;
    Codon;
    Nucleotide;
    Permutation
</SU>
<AB>"It is important to avoid claiming that sequence similarity is the result
of nucleotide order if it can be explained merely by nonrandom usage of
dinucleotides and/or codons. ... This paper describes and illustrates a method
that generates with equal probability all permutations with a given dinucleotide
usage or dinucleotide and codon usage."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1985</PY>
<VO>2</VO>
<NO>6</NO>
<PP>526-538</PP>
</SEQ>
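
The method in the Altschul-Erickson entry above treats each dinucleotide as an edge of a directed multigraph on the nucleotides and produces a random Eulerian path through it. The Python sketch below follows that outline under stated assumptions:

- A random "last exit" edge is drawn for every vertex except the final nucleotide.
- These edges are redrawn until they lead to the final nucleotide without cycles.
- The remaining edges at each vertex are shuffled.
- A walk from the first nucleotide then uses every edge exactly once.

The sketch preserves dinucleotide counts only, not codon usage. The function names are hypothetical.

```python
import random
from collections import defaultdict


def _leads_to(u, last, edges, last_edge):
    """True if the chosen last-exit edges lead from u to `last` without a cycle."""
    seen = set()
    while u != last:
        if u in seen:
            return False
        seen.add(u)
        u = edges[u][last_edge[u]]
    return True


def dinucleotide_shuffle(seq: str, rng: random.Random) -> str:
    """Random rearrangement of seq with the same dinucleotide counts
    (the first and last letters are kept as well)."""
    if 2 >= len(seq):
        return seq
    edges = defaultdict(list)  # edges[u] lists v once per occurrence of "uv"
    for u, v in zip(seq, seq[1:]):
        edges[u].append(v)
    last = seq[-1]
    # One last-exit edge per other vertex; retry until they form a tree into `last`.
    while True:
        last_edge = {u: rng.randrange(len(vs)) for u, vs in edges.items() if u != last}
        if all(_leads_to(u, last, edges, last_edge) for u in last_edge):
            break
    # Shuffle each vertex's remaining edges and put its last-exit edge at the end.
    order = {}
    for u, vs in edges.items():
        rest = list(vs)
        tail = [rest.pop(last_edge[u])] if u in last_edge else []
        rng.shuffle(rest)
        order[u] = rest + tail
    # Walk from the first letter, using each vertex's edges in that order.
    out, used, cur = [seq[0]], defaultdict(int), seq[0]
    for _ in range(len(seq) - 1):
        nxt = order[cur][used[cur]]
        used[cur] += 1
        out.append(nxt)
        cur = nxt
    return "".join(out)


print(dinucleotide_shuffle("ATGCGATACGCTTGACCGTA", random.Random(1)))
```

Placing each vertex's last-exit edge at the end of its list is what guarantees the walk never gets stuck before all edges are used.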

<SEQ>
<UI>0371   Altschul,S.F. A Nonlinear Measure of.. Bull.Math.Biol. 86 
48(5/6):617-63
</UI>
<AU>Altschul SF;
    Erickson BW
</AU>
<TI>A Nonlinear Measure of Subalignment Similarity and its Significance Levels
</TI>
<SU>Subalignment;
    Locally optimal;
    USA;
    Significance;
    Pattern recognition;
    Similarity
</SU>
<AB>"A new measure of subalignment similarity is introduced. ... Previous
algorithms can not use this measure to find locally optimal subalignments
because, unlike Needleman-Wunsch and Sellers similarities, this measure is
nonlinear. A new pattern recognition algorithm is described for finding all
locally optimal subalignments of two nucleotide sequences."
</AB>
<JT>Bull Math Biol</JT>
<PY>1986</PY>
<VO>48</VO>
<NO>5/6</NO>
<PP>617-632</PP>
</SEQ>

<SEQ>
<UI>0372   Altschul,S.F. Locally Optimal Subali.. Bull.Math.Biol. 86 
48(5/6):633-66
</UI>
<AU>Altschul SF;
    Erickson BW
</AU>
<TI>Locally Optimal Subalignments using Nonlinear Similarity Functions
</TI>
<SU>Subalignment;
    Sequence comparison;
    USA;
    Locally optimal;
    Optimal;
    Function;
    Similarity
</SU>
<AB>"Nonlinear similarity functions are often better than linear functions at
distinguishing interesting subalignments from those due to chance. Nonlinear
similarity functions useful for comparing biological sequences are developed.
Several new algorithms are presented for finding locally optimal subalignments
of two sequences."
</AB>
<JT>Bull Math Biol</JT>
<PY>1986</PY>
<VO>48</VO>
<NO>5/6</NO>
<PP>633-660</PP>
</SEQ>

<SEQ>
<UI>0373   Altschul,S.F. Optimal Sequence Align.. Bull.Math.Biol. 86 
48(5/6):603-61
</UI>
<AU>Altschul SF;
    Erickson BW
</AU>
<TI>Optimal Sequence Alignment using Affine Gap Costs
</TI>
<SU>Sequence proximity;
    Pairwise alignment;
    USA;
    Sequence alignment;
    Optimal;
    Gap
</SU>
<AB>"When comparing two biological sequences, it is often desirable for a gap
to be assigned a cost not directly proportional to its length. If affine gap
costs are employed, ... the algorithm of Gotoh (1982) finds the minimum cost of
aligning two sequences in order MN steps." Since Gotoh's procedure for
recovering the alignments themselves is flawed, the authors describe "an
algorithm that finds all and only the optimal alignments."
</AB>
<JT>Bull Math Biol</JT>
<PY>1986</PY>
<VO>48</VO>
<NO>5/6</NO>
<PP>603-616</PP>
</SEQ>
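
The minimum-cost computation attributed to Gotoh (1982) in the Altschul-Erickson entry above keeps three costs for every pair of prefixes: alignments ending in a match or substitution, in a gap in the second sequence, and in a gap in the first. The Python sketch below assumes that a gap of length k costs v + u*k and that every mismatch has the same cost; the default parameter values are illustrative. It computes only the optimal cost. It does not attempt the traceback of all optimal alignments that the paper is concerned with.

```python
import math


def affine_alignment_cost(a: str, b: str, v: float = 3, u: float = 1,
                          mismatch: float = 1) -> float:
    """Minimum cost of aligning a and b when a gap of length k costs v + u*k."""
    inf = math.inf
    n, m = len(a), len(b)
    # D: a[i-1] aligned with b[j-1]; P: ends with a[i-1] against a gap;
    # Q: ends with b[j-1] against a gap.
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    P = [[inf] * (m + 1) for _ in range(n + 1)]
    Q = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0
    for i in range(1, n + 1):
        P[i][0] = v + u * i
    for j in range(1, m + 1):
        Q[0][j] = v + u * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(D[i - 1][j - 1], P[i - 1][j - 1], Q[i - 1][j - 1])
            D[i][j] = best_prev + (0 if a[i - 1] == b[j - 1] else mismatch)
            # Either open a new gap (v + u) or extend the current one (u).
            P[i][j] = min(min(D[i - 1][j], Q[i - 1][j]) + v + u, P[i - 1][j] + u)
            Q[i][j] = min(min(D[i][j - 1], P[i][j - 1]) + v + u, Q[i][j - 1] + u)
    return min(D[n][m], P[n][m], Q[n][m])


print(affine_alignment_cost("ACGT", "AT"))  # prints 5
```

With three tables the work stays proportional to MN, as the abstract states, even though gap costs are not linear in length.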

<SEQ>
<UI>0374   Altschul,S.F. Significance Levels fo.. Bull.Math.Biol. 88 
50(1):77-92
</UI>
<AU>Altschul SF;
    Erickson BW
</AU>
<TI>Significance Levels for Biological Sequence Comparison using Non-linear
Similarity Functions
</TI>
<SU>Subalignment;
    Significance;
    USA;
    Sequence comparison;
    Scoring;
    Distribution;
    Function;
    Similarity
</SU>
<AB>"A class of non-linear similarity functions s1 has been proposed for
comparing subalignments of biological sequences. The distribution of maximal
s1-alignments is well approximated by the extreme value distribution. The
significance levels of s1 are studied for a variety of nucleotide frequency
distributions as well as for several matrices of amino acid substitution costs."
See Altschul, Erickson (1986).
</AB>
<JT>Bull Math Biol</JT>
<PY>1988</PY>
<VO>50</VO>
<NO>1</NO>
<PP>77-92</PP>
</SEQ>

<SEQ>
<UI>0375   Altschul,S.F. Basic Local Alignment .. J.Mol.Biol.     90 
215:403-410
</UI>
<AU>Altschul SF;
    Gish W;
    Miller W;
    Myers EW;
    Lipman DJ
</AU>
<TI>Basic Local Alignment Search Tool
</TI>
<SU>Subalignment;
    Database search;
    USA;
    Dynamic programming;
    Motif;
    Region;
    Locally optimal;
    BLAST
</SU>
<AB>A new method "which employs a measure based on well-defined mutation
scores. It directly approximates the results that would be obtained by a dynamic
programming algorithm for optimizing this measure. ... The basic algorithm ...
can be ... applied in a variety of contexts including straightforward DNA and
protein sequences database searches, motif searches, ... and in the analysis of
multiple regions of similarity ...."
</AB>
<JT>J Mol Biol</JT>
<PY>1990</PY>
<VO>215</VO>
<PP>403-410</PP>
</SEQ>

<SEQ>
<UI>0376   Altschul,S.F. Trees, Stars, and Mult.. SIAM J.Appl.Mat 89 
49(1):197-209
</UI>
<AU>Altschul SF;
    Lipman DJ
</AU>
<TI>Trees, Stars, and Multiple Biological Sequence Alignment
</TI>
<SU>Multiple alignment;
    USA;
    Sequence alignment;
    Dynamic programming;
    Evolutionary tree
</SU>
<AB>"This paper presents an extension of Carrillo and Lipman's algorithm to
the definition of multiple alignment cost as the cost of an evolutionary tree."
</AB>
<JT>SIAM J Appl Math</JT>
<PY>1989</PY>
<VO>49</VO>
<NO>1</NO>
<PP>197-209</PP>
</SEQ>

<SEQ>
<UI>0377   Altschul,S.F. Protein Database Searc.. Proc.Nat.Acad.S 90 
87(14):5509-55
</UI>
<AU>Altschul SF;
    Lipman DJ
</AU>
<TI>Protein Database Searches for Multiple Alignments
</TI>
<SU>Database search;
    USA;
    Statistical;
    Multiple alignment;
    Pattern recognition;
    Sequence comparison;
    Protein
</SU>
<AB>"By searching a database for multiple as opposed to pairwise alignments,
distant relationships are much more easily distinguished from background noise.
Recent statistical results permit the power of this approach to be analyzed.
Given a typical query sequence, an algorithm described here permits the current
protein database to be searched for three-sequence alignments in less than four
minutes."
</AB>
<JT>Proc Nat Acad Sci USA</JT>
<PY>1990</PY>
<VO>87</VO>
<NO>14</NO>
<PP>5509-5513</PP>
</SEQ>

<SEQ>
<UI>0378   Aoe,J.I.      An Efficient Implement.. IEEE Trans.Soft 89 
15(8):1010-101
</UI>
<AU>Aoe JI
</AU>
<TI>An Efficient Implementation of Static String Pattern Matching Machines
</TI>
<SU>Dictionary match;
    Automata;
    JP;
    Pattern match
</SU>
<AB>"A technique for implementing a static transition table of a string
pattern matching machine which locates all occurrences of a finite number of
keywords in a string is described. The approach is based on Johnson's storage
and retrieval method of the transition table of a finite state machine."
</AB>
<JT>IEEE Trans Software Eng</JT>
<PY>1989</PY>
<VO>15</VO>
<NO>8</NO>
<PP>1010-1016</PP>
</SEQ>

<SEQ>
<UI>0379   Aoe,J.        A Method for Improving.. IEEE Trans.Soft 84 
10(1):116-120
</UI>
<AU>Aoe J;
    Yamamoto Y;
    Shimada R
</AU>
<TI>A Method for Improving String Pattern Matching Machines
</TI>
<SU>Dictionary match;
    Automata;
    JP;
    Pattern match
</SU>
<AB>"This correspondence describes an efficient string pattern matching
machine to locate all occurrences of any of a finite number of keywords and
phrases in an arbitrary text string. Some conditions are defined on the states
of the machine in order to improve the speed and size of the machine by Aho and
Corasick" (1975).
</AB>
<JT>IEEE Trans Software Eng</JT>
<PY>1984</PY>
<VO>10</VO>
<NO>1</NO>
<PP>116-120</PP>
</SEQ>

<SEQ>
<UI>0380   Apostolico,A. Improving the Worst-ca.. Inform.Process. 86 
23(2):63-69
</UI>
<AU>Apostolico A
</AU>
<TI>Improving the Worst-case Performance of the Hunt-Szymanski Strategy for
the Longest Common Subsequence of Two Strings
</TI>
<SU>Longest common;
    USA;
    Search tree;
    Data structure;
    Subsequence;
    Performance
</SU>
<AB>"The new algorithm presented here pursues a schedule of primitive
operations quite close to the one inherent to the Hunt-Szymanski strategy, but
with substantially enhanced efficiency. ... First, its worst case is never worse
than linear in the product nm of the lengths of the two input strings. Second,
its time bound does not always grow with the cardinality r of the set R of all
pairs of matching positions of the input strings."
</AB>
<JT>Inform Process Lett</JT>
<PY>1986</PY>
<VO>23</VO>
<NO>2</NO>
<PP>63-69</PP>
</SEQ>
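
For reference, the Python sketch below shows the basic Hunt-Szymanski strategy that the Apostolico entry above improves on. For each symbol of the first string, the matching positions in the second string are processed in decreasing order. A sorted threshold list, updated by binary search, records for each length k the smallest position in the second string at which a common subsequence of length k can end. This is the textbook form, running in roughly O((r + n) log n) time for r matching pairs. It is not Apostolico's improved algorithm, and the names are illustrative.

```python
from bisect import bisect_left
from collections import defaultdict


def hunt_szymanski_lcs_length(a: str, b: str) -> int:
    """LCS length via match lists and a sorted threshold list."""
    where = defaultdict(list)  # where[c] = positions of c in b, increasing
    for j, ch in enumerate(b):
        where[ch].append(j)
    # thresh[k] = smallest position in b at which a common subsequence of
    # length k + 1 can end, over the prefix of a processed so far.
    thresh = []
    for ch in a:
        # Decreasing order stops one symbol of a from extending its own match.
        for j in reversed(where[ch]):
            k = bisect_left(thresh, j)
            if k == len(thresh):
                thresh.append(j)
            else:
                thresh[k] = j
    return len(thresh)


print(hunt_szymanski_lcs_length("ABCBDAB", "BDCABA"))  # prints 4
```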

<SEQ>
<UI>0381   Apostolico,A. Remark on the Hsu-Du N.. Inform.Process. 87 
25(4):235-236
</UI>
<AU>Apostolico A
</AU>
<TI>Remark on the Hsu-Du New Algorithm for the Longest Common Subsequence
Problem
</TI>
<SU>Longest common;
    USA;
    Subsequence;
    Algorithm
</SU>
<AB>"One of the time bounds claimed for a recent algorithm [Hsu, Du (1984)]
computing the longest common subsequence of two strings is shown not to be
correct. While this fact considerably affects the performance of that algorithm,
it also contributes to pose a few interesting questions."
</AB>
<JT>Inform Process Lett</JT>
<PY>1987</PY>
<VO>25</VO>
<NO>4</NO>
<PP>235-236</PP>
</SEQ>

<SEQ>
<UI>0382   Apostolico,A. Efficient CRCW-PRAM Al.. Theoret.Comput. 93 
108:331-344
</UI>
<AU>Apostolico A
</AU>
<TI>Efficient CRCW-PRAM Algorithms for Universal Substring Searching
</TI>
<SU>String match;
    Parallel;
    USA;
    Complexity;
    Sequence search;
    Algorithm
</SU>
<AB>"Thus, in particular, searching for any substring of a pattern of size m
in any substring of a text of size n can be done in constant time with at most
n + m processors, once both the text and the pattern have been put in standard
form at a cost of O((n + m) log n) operations. This has the same global
complexity as the early algorithm in [Galil (1985)], which, however, handled
only one definite pattern at a time."
</AB>
<JT>Theoret Comput Sci</JT>
<PY>1993</PY>
<VO>108</VO>
<PP>331-344</PP>
</SEQ>

<SEQ>
<UI>0383   Apostolico,A. Fast Linear-space Comp.. Theoret.Comput. 92 
92(1):3-17
</UI>
<AU>Apostolico A;
    Browne S;
    Guerra C
</AU>
<TI>Fast Linear-space Computations of Longest Common Subsequences
</TI>
<SU>Longest common;
    USA;
    Complexity;
    Subsequence
</SU>
<AB>"This paper reviews linear-space LCS computations in connection with two
classical paradigms originally designed to take less than quadratic time in
favorable circumstances. The objective is to achieve the space reduction without
alteration of the asymptotic time complexity of the original algorithm." One
variant suits cases where the LCS is expected to be close in length to the
shorter input string; the other suits cases where one input is much shorter
than the other.
</AB>
<JT>Theoret Comput Sci</JT>
<PY>1992</PY>
<VO>92</VO>
<NO>1</NO>
<PP>3-17</PP>
</SEQ>
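
The space reduction discussed in the Apostolico-Browne-Guerra entry above starts from a simple observation: the quadratic LCS table can be filled row by row while keeping only two rows. This gives the LCS length in space linear in the shorter string. Recovering the subsequence itself needs more work, for example Hirschberg's divide-and-conquer. The Python sketch below shows only this basic two-row computation, not the two paradigms the paper analyses.

```python
def lcs_length_linear_space(a: str, b: str) -> int:
    """LCS length keeping only two rows of the dynamic-programming table."""
    if len(b) > len(a):
        a, b = b, a  # rows are indexed by the shorter string b
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0] * (len(b) + 1)
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                cur[j] = prev[j - 1] + 1
            else:
                cur[j] = max(prev[j], cur[j - 1])
        prev = cur
    return prev[-1]


print(lcs_length_linear_space("ABCBDAB", "BDCABA"))  # prints 4
```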

<SEQ>
<UI>0384   Apostolico,A. The Boyer-Moore-Galil .. SIAM J.Comput.  86 
15(1):98-105
</UI>
<AU>Apostolico A;
    Giancarlo R
</AU>
<TI>The Boyer-Moore-Galil String Searching Strategies Revisited
</TI>
<SU>Boyer-Moore;
    USA;
    Pattern match;
    String match;
    String search
</SU>
<AB>"Based on the Boyer-Moore-Galil approach, a new algorithm is proposed
which requires a number of character comparisons bounded by 2n, regardless of
the number of occurrences of the pattern in the text string. Preprocessing is
only slightly more involved and still requires a time linear in the pattern
size."
</AB>
<JT>SIAM J Comput</JT>
<PY>1986</PY>
<VO>15</VO>
<NO>1</NO>
<PP>98-105</PP>
</SEQ>
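
The Apostolico-Giancarlo entry above refines the Boyer-Moore approach, which compares the pattern against the text from right to left and shifts it by precomputed amounts. The Python sketch below swaps in the simpler Horspool variant for illustration, using only the bad-character shift. It does not include the good-suffix rule, Galil's refinement, or the bookkeeping that yields the 2n comparison bound.

```python
def horspool_find_all(pattern: str, text: str) -> list[int]:
    """Start positions of pattern in text, using Boyer-Moore-Horspool shifts."""
    m, n = len(pattern), len(text)
    if m == 0:
        return list(range(n + 1))
    # Shift for a character = distance from its last occurrence in
    # pattern[:-1] to the end of the pattern; other characters shift by m.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    hits, pos = [], 0
    while n - m >= pos:
        j = m - 1
        while j >= 0 and pattern[j] == text[pos + j]:  # compare right to left
            j -= 1
        if j == -1:
            hits.append(pos)
        pos += shift.get(text[pos + m - 1], m)
    return hits


print(horspool_find_all("aba", "abababa"))  # prints [0, 2, 4]
```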

<SEQ>
<UI>0385   Apostolico,A. The Longest Common Sub.. Algorithmica    87 
2:315-336
</UI>
<AU>Apostolico A;
    Guerra C
</AU>
<TI>The Longest Common Subsequence Problem Revisited
</TI>
<SU>Longest common;
    USA;
    Data structure;
    Dynamic programming;
    Subsequence
</SU>
<AB>"This paper re-examines, in a unified framework, two classic approaches
[of Hirschberg and Hunt-Szymanski] to the problem of finding a longest common
subsequence (LCS) of two strings, and proposes faster implementations for both."
</AB>
<JT>Algorithmica</JT>
<PY>1987</PY>
<VO>2</VO>
<PP>315-336</PP>
</SEQ>

<SEQ>
<UI>0386   Apostolico,A. Parallel Construction .. Algorithmica    88 
3:347-365
</UI>
<AU>Apostolico A;
    Iliopoulos C;
    Landau GM;
    Schieber B;
    Vishkin U
</AU>
<TI>Parallel Construction of a Suffix Tree with Applications
</TI>
<SU>Match with k differences;
    Parallel;
    USA;
    Search tree;
    String match;
    Regularities;
    Suffix
</SU>
<AB>"In this paper a CRCW parallel RAM algorithm is presented that constructs
the suffix tree associated with a string of n symbols in O(log n) time with n
processors. ... Efficient parallel procedures are also given for some string
problems that can be solved with suffix trees." The applications covered include
on-line string matching and string matching with k differences.
</AB>
<JT>Algorithmica</JT>
<PY>1988</PY>
<VO>3</VO>
<PP>347-365</PP>
</SEQ>

<SEQ>
<UI>0387   Argos,P.      A Sensitive Procedure .. J.Mol.Biol.     87 
193(2):385-396
</UI>
<AU>Argos P
</AU>
<TI>A Sensitive Procedure to Compare Amino Acid Sequences
</TI>
<SU>Pairwise alignment;
    DE;
    Segment;
    Subalignment;
    Scoring;
    Amino acid
</SU>
<AB>"Methods are discussed that provide sensitive criteria for detection of
weak sequence homologies. They are based on the Dayhoff relatedness odds amino
acid exchange matrix and certain residue physical characteristics. The search
procedure uses several residue probe lengths in comparing all possible segments
of two protein sequences ...Alignments are automatically effected using the
highest search matrix values and without the necessity of gap penalties."
</AB>
<JT>J Mol Biol</JT>
<PY>1987</PY>
<VO>193</VO>
<NO>2</NO>
<PP>385-396</PP>
</SEQ>

<SEQ>
<UI>0388   Argos,P.      Sensitivity Comparison.. Methods Enzymol 90 
183:352-365
</UI>
<AU>Argos P;
    Vingron M
</AU>
<TI>Sensitivity Comparisons of Protein Amino Acid Sequences
</TI>
<SU>Pairwise alignment;
    Multiple alignment;
    Dot;
    DE;
    Gap;
    Protein;
    Amino acid
</SU>
<AB>The classic alignment procedure for amino acid sequences is flawed,
especially when alignments have amino acid identity at 35% or less of the
matched positions, since multiple optimal alignments usually exist and are
sensitive to choices of gap penalties. The authors describe strategies to
overcome these problems for comparisons of two or multiple sequences. See Argos
(1987), Rechid, Vingron, and Argos (1989), and Vingron and Argos (1989) for
details
</AB>
<JT>Methods Enzymol</JT>
<PY>1990</PY>
<VO>183</VO>
<PP>352-365</PP>
</SEQ>

<SEQ>
<UI>0389   Argos,P.      Protein Sequence Compa.. Protein Eng.    91 
4(4):375-383
</UI>
<AU>Argos P;
    Vingron M;
    Vogt G
</AU>
<TI>Protein Sequence Comparison: Methods and Significance
</TI>
<SU>Database search;
    Review;
    DE;
    Sequence comparison;
    Significance;
    Sequence alignment;
    Protein
</SU>
<AB>Single amino acid comparisons. Using sequence fragments for comparisons.
Multiple sequence alignment. Problems. Comparison of methods. Recommendations.
Future needs
</AB>
<JT>Protein Eng</JT>
<PY>1991</PY>
<VO>4</VO>
<NO>4</NO>
<PP>375-383</PP>
</SEQ>

<SEQ>
<UI>0390   Golding,B.    Exploratory Analysis o.. Comput.Appl.Bio 94 
10(3):243-247
</UI>
<AU>Golding B
</AU>
<TI>Exploratory Analysis of Multiple Sequence Alignments using Phylogenies
</TI>
<SU>Multiple alignment;
    Significance;
    Phylogeny;
    CA;
    Sequence alignment;
    Region
</SU>
<AB>"Multiple alignment algorithms may produce an alignment between sequences
even when they have little homology with other sequences. A program is presented
that makes use of a phylogeny to explore the implications of an alignment. ...
The program also permits randomization of subsections of the sequences to
determine the significance of the multiple alignment for these individual
regions. The combination of these two simple methods permits rapid and
interactive exploration of multiple sequence alignments."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1994</PY>
<VO>10</VO>
<NO>3</NO>
<PP>243-247</PP>
</SEQ>

<SEQ>
<UI>0391   Arratia,R.    The Erdos-Renyi Law in.. Ann.Statist.    90 
18(2):539-570
</UI>
<AU>Arratia R;
    Gordon L;
    Waterman MS
</AU>
<TI>The Erdos-Renyi Law in Distribution, for Coin Tossing and Sequence
Matching
</TI>
<SU>Pairwise comparison;
    Significance;
    USA;
    Statistical;
    Segment;
    Distribution
</SU>
<AB>"We consider the simplest problem of possible statistical interest,
matching segments from two independent sequences of independent identically
distributed letters. Surprisingly ... we shall see that even such a naive
formulation might be useful in a biological context. Our main results ... give
the asymptotic distribution of unusually rich matches between independent
sequences."
</AB>
<JT>Ann Statist</JT>
<PY>1990</PY>
<VO>18</VO>
<NO>2</NO>
<PP>539-570</PP>
</SEQ>

<SEQ>
<UI>0392   Arratia,R.    Critical Phenomena in .. Ann.Probab.     85 
13(4):1236-124
</UI>
<AU>Arratia R;
    Waterman MS
</AU>
<TI>Critical Phenomena in Sequence Matching
</TI>
<SU>Pairwise comparison;
    Significance;
    USA;
    Sequence match;
    Markov;
    Longest common
</SU>
<AB>"We give a generalization of the result of Erdos and Renyi on the length
of the longest head run in the first n tosses of a coin. ... The results
generalize to more than two sequences and to Markov chains. A strong law of
large numbers is given for the proportion of letters within the longest matching
word; the limiting proportion exhibits critical behavior ...."
</AB>
<JT>Ann Probab</JT>
<PY>1985</PY>
<VO>13</VO>
<NO>4</NO>
<PP>1236-1249</PP>
</SEQ>

<SEQ>
<UI>0393   Arratia,R.    The Erdos-Renyi Strong.. Ann.Probab.     89 
17(3):1152-116
</UI>
<AU>Arratia R;
    Waterman MS
</AU>
<TI>The Erdos-Renyi Strong Law for Pattern Matching with a Given Proportion of
Mismatches
</TI>
<SU>Pairwise comparison;
    Significance;
    USA;
    Pattern match;
    Longest common;
    Markov
</SU>
<AB>"Consider two random sequences ... of i.i.d. letters in which the
probability that two distinct letters match is p &gt; 0. For each value a between
p and 1, the length of the longest contiguous matching between the two sequences,
requiring only a proportion a of corresponding letters to match, satisfies a
strong law analogous to the Erdos-Renyi law for coin tossing."
</AB>
<JT>Ann Probab</JT>
<PY>1989</PY>
<VO>17</VO>
<NO>3</NO>
<PP>1152-1169</PP>
</SEQ>

<SEQ>
<UI>0394   Attwood,T.K.  Multiple Sequence Alig.. Gene            91 
98:153-159
</UI>
<AU>Attwood TK;
    Eliopoulos EE;
    Findlay JBC
</AU>
<TI>Multiple Sequence Alignment of Protein Families Showing Low Sequence
Homology: A Methodological Approach Using Database Pattern-matching
Discriminators for G-protein-linked Receptors
</TI>
<SU>Multiple alignment;
    UK;
    Sequence alignment;
    Pattern match;
    Discrimination;
    Profile;
    Homology;
    Protein
</SU>
<AB>"The approach ... does not rely on 3D-structure alignments, it does not
include explicit definitions of secondary structure positions, and neither does
it introduce gap penalties. The method relies instead on building up a character
profile for each position in the discriminators, ... and is used in a
qualitative manner to aid multiple sequence alignment."
</AB>
<JT>Gene</JT>
<PY>1991</PY>
<VO>98</VO>
<PP>153-159</PP>
</SEQ>

<SEQ>
<UI>0395   Bacon,D.J.    Multiple Sequence Alig.. J.Mol.Biol.     86 
191:153-161
</UI>
<AU>Bacon DJ;
    Anderson WF
</AU>
<TI>Multiple Sequence Alignment
</TI>
<SU>Multiple alignment;
    Segment;
    CA;
    Sequence alignment;
    Statistical
</SU>
<AB>A multiple sequence alignment algorithm which is based on finding common
subsequences
</AB>
<JT>J Mol Biol</JT>
<PY>1986</PY>
<VO>191</VO>
<PP>153-161</PP>
</SEQ>

<SEQ>
<UI>0396   Bacon,D.J.    Multiple Sequence Comp.. Methods Enzymol 90 
183:438-447
</UI>
<AU>Bacon DJ;
    Anderson WF
</AU>
<TI>Multiple Sequence Comparison
</TI>
<SU>Multiple alignment;
    Segment;
    CA;
    Sequence comparison;
    Statistical
</SU>
<AB>Concerns "the problem of finding weak similarities or distant
relationships among proteins for which only the sequences are known. Comparing
just two sequences at a time by current methods does not allow for a
sufficiently sensitive test of similarity. ... Simultaneous intercomparison of
several sequences, on the other hand, often yields a significantly nonrandom
signal that in turn provides a statistical basis for assertions about structure
and/or function."
</AB>
<JT>Methods Enzymol</JT>
<PY>1990</PY>
<VO>183</VO>
<PP>438-447</PP>
</SEQ>

<SEQ>
<UI>0397   Baeza-Yates,R Improved String Search.. Software.Practi 89 
19(3):257-271
</UI>
<AU>Baeza-Yates RA
</AU>
<TI>Improved String Searching
</TI>
<SU>String search;
    Boyer-Moore;
    CA;
    String match;
    Text search;
    Pattern match
</SU>
<AB>"We show that it is possible to improve the average time of the
Boyer-Moore string matching algorithm using more space."
</AB>
<JT>Software Practice Experience</JT>
<PY>1989</PY>
<VO>19</VO>
<NO>3</NO>
<PP>257-271</PP>
</SEQ>

<SEQ>
<UI>0398   Baeza-Yates,R String Searching Algor.. Lecture Notes i 89 
382:75-96
</UI>
<AU>Baeza-Yates RA
</AU>
<TI>String Searching Algorithms Revisited
</TI>
<SU>String match;
    CA;
    Data structure;
    Boyer-Moore;
    Knuth-Morris-Pratt;
    String search;
    Algorithm
</SU>
<AB>Dehne,F., Sack,J.R., Santoro,N. (Eds.), Algorithms and Data Structures,
Workshop WADS '89, Ottawa, Canada, 17-19 August 1989. "We present bounds for the
average case of the Knuth-Morris-Pratt (KMP) algorithm and the
Boyer-Moore-Horspool algorithm for random text. ... We also present a hybrid
algorithm which combines the KMP and BMH algorithms, and which, in practice, is
faster than the Boyer-Moore algorithm."
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>1989</PY>
<VO>382</VO>
<PP>75-96</PP>
</SEQ>

<SEQ>
<UI>0399   Baeza-Yates,R Average Running Time o.. Theoret.Comput. 92 
92(1):19-31
</UI>
<AU>Baeza-Yates RA;
    Regnier M
</AU>
<TI>Average Running Time of the Boyer-Moore-Horspool Algorithm
</TI>
<SU>String match;
    Boyer-Moore;
    CL;
    String search;
    Algorithm
</SU>
<AB>"We study Boyer-Moore-type string searching algorithms. We analyze the
Horspool's variant. The searching time is linear. An exact expression of the
linearity constant is derived and is proven to be asymptotically a, 1/c &lt;= a &lt;=
2/(c + 1), where c is the cardinality of the alphabet."
</AB>
<JT>Theoret Comput Sci</JT>
<PY>1992</PY>
<VO>92</VO>
<NO>1</NO>
<PP>19-31</PP>
</SEQ>

<SEQ>
<UI>0400   Bains,W.      MULTAN: A Program to A.. Nucleic Acids R 86 
14(1):159-177
</UI>
<AU>Bains W
</AU>
<TI>MULTAN: A Program to Align Multiple DNA Sequences
</TI>
<SU>Multiple alignment;
    Consensus sequence;
    UK;
    Program;
    DNA
</SU>
<AB>The author describes a heuristic, iterative algorithm to align nucleic
acid sequences. A basic step of the algorithm generates a consensus sequence
from the current alignment of sequences. Seven rules identify a consensus result
at a position. Interplay between consensus sequence and multiple alignment
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1986</PY>
<VO>14</VO>
<NO>1</NO>
<PP>159-177</PP>
</SEQ>

<SEQ>
<UI>0401   Bains,W.      MULTAN (2), a Multiple.. Comput.Appl.Bio 89 
5(1):51-52
</UI>
<AU>Bains W
</AU>
<TI>MULTAN (2), a Multiple String Alignment Program for Nucleic Acids and
Proteins
</TI>
<SU>Multiple alignment;
    Consensus sequence;
    UK;
    Program;
    Protein;
    Nucleic acid
</SU>
<AB>"Here I describe a generalization of Bains' (1986) MULTAN program to align
any sequences made from an alphabet of &lt;= 64 characters." Interplay between
consensus sequence and multiple alignment
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1989</PY>
<VO>5</VO>
<NO>1</NO>
<PP>51-52</PP>
</SEQ>

<SEQ>
<UI>0402   Bairoch,A.    SEQANALREF: A Sequence.. Comput.Appl.Bio 91 
7(2):268-268
</UI>
<AU>Bairoch A
</AU>
<TI>SEQANALREF: A Sequence Analysis Bibliographic Reference Data Bank
</TI>
<SU>Sequence analysis;
    Bibliography;
    SWI
</SU>
<AB>"The majority of entries belong to one of the following categories:
algorithms for protein and nucleic acid sequence analysis ...; algorithms for
sequence-based phylogenetic analysis; description of biopolymer data banks ...;
description of software packages; description of on-line services for molecular
biologists."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1991</PY>
<VO>7</VO>
<NO>2</NO>
<PP>268-268</PP>
</SEQ>

<SEQ>
<UI>0403   Barron,S.     A Bibliography on Comp.. Comput.Appl.Bio 91 
7(2):269-269
</UI>
<AU>Barron S;
    Witten M;
    Harkness R;
    Driver J
</AU>
<TI>A Bibliography on Computational Algorithms in Molecular Biology and
Genetics
</TI>
<SU>Sequence analysis;
    Bibliography;
    USA;
    Program;
    Genetic;
    Algorithm
</SU>
<AB>"The purpose of this short note is to provide an announcement of an
ongoing database of bibliographic references on the subject of computational
algorithms in molecular biology and genetics. We have focused upon computer and
mathematical aspects of molecular biology and genetics (interpreted in a liberal
and broad sense)."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1991</PY>
<VO>7</VO>
<NO>2</NO>
<PP>269-269</PP>
</SEQ>

<SEQ>
<UI>0404   Barry,D.      Asynchronous Distance .. Biometrics      87 
43:261-276
</UI>
<AU>Barry D;
    Hartigan JA
</AU>
<TI>Asynchronous Distance Between Homologous DNA Sequences
</TI>
<SU>Sequence proximity;
    IR;
    Statistical;
    Significance;
    Distance;
    DNA
</SU>
<AB>"The distance between homologous DNA sequences of two species is proposed
to be -0.25 ln [det (P)], where P is the conditional probability matrix
specifying the proportions of the various nucleotides in the second sequence,
corresponding to each of the four nucleotides in the first sequence."
</AB>
<JT>Biometrics</JT>
<PY>1987</PY>
<VO>43</VO>
<PP>261-276</PP>
</SEQ>

<SEQ>
<UI>0405   Barth,G.      An Alternative for the.. Inform.Process. 81 
13(4-5):134-13
</UI>
<AU>Barth G
</AU>
<TI>An Alternative for the Implementation of Knuth-Morris-Pratt Algorithm
</TI>
<SU>Knuth-Morris-Pratt;
    USA;
    String match;
    Algorithm
</SU>
<AB>"The new version of [the Knuth-Morris-Pratt algorithm] is optimal in the
sense that for any input data a minimal amount of effort is spent to prepare for
recoveries from possible mismatches."
</AB>
<JT>Inform Process Lett</JT>
<PY>1981</PY>
<VO>13</VO>
<NO>4-5</NO>
<PP>134-137</PP>
</SEQ>

<SEQ>
<UI>0406   Barth,G.      An Analytical Comparis.. Inform.Process. 84 
18(5):249-256
</UI>
<AU>Barth G
</AU>
<TI>An Analytical Comparison of Two String Searching Algorithms
</TI>
<SU>String match;
    Knuth-Morris-Pratt;
    DE;
    Pattern match;
    String search;
    Markov;
    Analytical;
    Algorithm
</SU>
<AB>"Average case analyses of two algorithms to locate [a pattern in a text]
are conducted in this paper. One algorithm is based on a straightforward
trial-and-error approach, the other one [is due to Knuth-Morris-Pratt]."
</AB>
<JT>Inform Process Lett</JT>
<PY>1984</PY>
<VO>18</VO>
<NO>5</NO>
<PP>249-256</PP>
</SEQ>

<SEQ>
<UI>0407   Barton,G.J.   Protein Multiple Seque.. Methods Enzymol 90 
183:403-428
</UI>
<AU>Barton GJ
</AU>
<TI>Protein Multiple Sequence Alignment and Flexible Pattern Matching
</TI>
<SU>Multiple alignment;
    Review;
    Match a pattern matrix;
    UK;
    Sequence alignment;
    Pattern match;
    Protein
</SU>
<AB>"In this chapter, a practical strategy for the rapid multiple alignment of
protein sequences is described. Although not guaranteed to give the
mathematically optimal alignment, the algorithm is able to cope with large
numbers of sequences. It is also a fast procedure that gives alignments
generally as good or better than those obtained by pairwise methods."
</AB>
<JT>Methods Enzymol</JT>
<PY>1990</PY>
<VO>183</VO>
<PP>403-428</PP>
</SEQ>

<SEQ>
<UI>0408   Barton,G.J.   Scanning Protein Seque.. Comput.Appl.Bio 91 
7(1):85-88
</UI>
<AU>Barton GJ
</AU>
<TI>Scanning Protein Sequence Databanks using a Distributed Processing
Workstation Network
</TI>
<SU>Database search;
    Distributed;
    UK;
    Sequence comparison;
    Protein;
    Network;
    Databank
</SU>
<AB>"The programme pscan has been developed to distribute protein databank
scans over a network of computers that share a common file system. pscan may be
used in conjunction with most conventional sequence comparison programmes. ...
Accordingly, pscan provides a low-cost, portable alternative to dedicated
parallel processing computers."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1991</PY>
<VO>7</VO>
<NO>1</NO>
<PP>85-88</PP>
</SEQ>

<SEQ>
<UI>0409   Barton,G.J.   Computer Speed and Seq.. Science         92 257(18 
Sept.):
</UI>
<AU>Barton GJ
</AU>
<TI>Computer Speed and Sequence Comparison
</TI>
<SU>Database search;
    UK;
    Sequence comparison
</SU>
<AB>"Despite recent well-known advances in computer performance, it is still a
commonly and erroneously held belief that rigorous sequence comparison methods
are too expensive to use for protein database searching."
</AB>
<JT>Science</JT>
<PY>1992</PY>
<VO>257</VO>
<NO>18 Sept.</NO>
<PP>1609-1609</PP>
</SEQ>

<SEQ>
<UI>0410   Barton,G.J.   ALSCRIPT: A Tool to Fo.. Protein Eng.    93 
6(1):37-40
</UI>
<AU>Barton GJ
</AU>
<TI>ALSCRIPT: A Tool to Format Multiple Sequence Alignments
</TI>
<SU>Multiple alignment;
    UK;
    Sequence alignment;
    Display;
    Program
</SU>
<AB>"The ALSCRIPT program ... was developed specifically to allow the easy
formatting and graphical display of large multiple sequence alignments."
</AB>
<JT>Protein Eng</JT>
<PY>1993</PY>
<VO>6</VO>
<NO>1</NO>
<PP>37-40</PP>
</SEQ>

<SEQ>
<UI>0411   Barton,G.J.   A Strategy for the Rap.. J.Mol.Biol.     87 
198:327-337
</UI>
<AU>Barton GJ;
    Sternberg MJE
</AU>
<TI>A Strategy for the Rapid Multiple Alignment of Protein Sequences.
Confidence Levels from Tertiary Structure Comparisons
</TI>
<SU>Multiple alignment;
    Clustering;
    UK;
    Structure;
    Significance;
    Confidence;
    Protein
</SU>
<AB>"An algorithm is presented for the multiple alignment of protein sequences
that is both accurate and rapid computationally. The approach is based on the
conventional dynamic-programming method of pairwise alignment."
</AB>
<JT>J Mol Biol</JT>
<PY>1987</PY>
<VO>198</VO>
<PP>327-337</PP>
</SEQ>

<SEQ>
<UI>0412   Barton,G.J.   Evaluation and Improve.. Protein Eng.    87 
1(2):89-94
</UI>
<AU>Barton GJ;
    Sternberg MJE
</AU>
<TI>Evaluation and Improvements in the Automatic Alignment of Protein
Sequences
</TI>
<SU>Pairwise alignment;
    Significance;
    UK;
    Sequence alignment;
    Structure;
    Protein
</SU>
<AB>"The accuracy of protein sequence alignment obtained by applying a
commonly used global sequence comparison algorithm is assessed. Alignments based
on the superposition of the three-dimensional structures are used as a standard
for testing the automatic, sequence-based methods."
</AB>
<JT>Protein Eng</JT>
<PY>1987</PY>
<VO>1</VO>
<NO>2</NO>
<PP>89-94</PP>
</SEQ>

<SEQ>
<UI>0413   Barton,G.J.   Flexible Protein Seque.. J.Mol.Biol.     90 
212(2):389-402
</UI>
<AU>Barton GJ;
    Sternberg MJE
</AU>
<TI>Flexible Protein Sequence Patterns: A Sensitive Method to Detect Weak
Structural Similarities
</TI>
<SU>Match a pattern matrix;
    UK;
    Pattern match;
    Pattern definition;
    Dynamic programming;
    Similarity;
    Protein
</SU>
<AB>"In contrast to conventional pattern matching, template or sequence
alignment methods, flexible [protein sequence] patterns allow residue patterns
typical of a complete protein fold to be developed in terms of residue positions
(elements), separated by gaps of defined range. An efficient dynamic programming
algorithm is presented to enable the best alignment(s) of a pattern with a
sequence to be identified."
</AB>
<JT>J Mol Biol</JT>
<PY>1990</PY>
<VO>212</VO>
<NO>2</NO>
<PP>389-402</PP>
</SEQ>

<SEQ>
<UI>0414   Beanland,T.J. The Inference of Evolu.. Comp.Biochem.Ph 92 
102B(4):643-65
</UI>
<AU>Beanland TJ;
    Howe CJ
</AU>
<TI>The Inference of Evolutionary Trees from Molecular Data
</TI>
<SU>Phylogeny;
    Multiple alignment;
    UK;
    Review;
    Evolutionary tree;
    Significance
</SU>
<AB>"Procedures for multiple alignment of sequence data, subsequent
phylogenetic inference, and testing of the trees derived are presented. The
assumptions underlying different approaches and the extent to which they are
valid are discussed."
</AB>
<JT>Comp Biochem Physiol B Comp Biochem</JT>
<PY>1992</PY>
<VO>102B</VO>
<NO>4</NO>
<PP>643-659</PP>
</SEQ>

<SEQ>
<UI>0415   Beckmann,J.S. Intervening Sequences .. J.Biomol.Struct 86 
4(3):391-400
</UI>
<AU>Beckmann JS;
    Brendel V;
    Trifonov EN
</AU>
<TI>Intervening Sequences Exhibit Distinct Vocabulary
</TI>
<SU>Sequence analysis;
    Significance;
    IL;
    Linguistic
</SU>
<AB>"Little is known about the origin and function of eukaryotic introns.
Application of a novel linguistic approach to the analysis of intervening
sequences reveals, however, that they exhibit a specific non-random vocabulary
whose major feature is the utilization of mirror-symmetrical words
('mirrorrim')."
</AB>
<JT>J Biomol Struct &amp; Dyn</JT>
<PY>1986</PY>
<VO>4</VO>
<NO>3</NO>
<PP>391-400</PP>
</SEQ>

<SEQ>
<UI>0416   Benner,S.A.   Response to Barton's L.. Science         92 257(18 
Sept.):
</UI>
<AU>Benner SA;
    Cohen MA;
    Gonnet GH
</AU>
<TI>Response to Barton's Letter: Computer Speed and Sequence Comparison
</TI>
<SU>Database search;
    SWI;
    Sequence comparison;
    Sequence alignment
</SU>
<AB>"However, our approach and Barton's differ in three fundamental ways, all
interesting to the general scientist who wants to use sequence alignments
without becoming entangled in its mathematics."
</AB>
<JT>Science</JT>
<PY>1992</PY>
<VO>257</VO>
<NO>18 Sept.</NO>
<PP>1609-1610</PP>
</SEQ>

<SEQ>
<UI>0417   Benson,D.C.   Digital Signal Process.. Nucleic Acids R 90 
18(10):3001-30
</UI>
<AU>Benson DC
</AU>
<TI>Digital Signal Processing Methods for Biological Sequence Comparison
</TI>
<SU>Pairwise comparison;
    Fourier;
    USA;
    Statistical;
    Signal;
    Sequence comparison
</SU>
<AB>"A method is discussed for DNA or protein sequence comparison using a
finite field fast Fourier transform, a digital signal processing technique; and
statistical methods are discussed for analyzing the output of this algorithm.
[It] compares two sequences of length N in computing time proportional to
N log N compared to N^2 for methods currently used."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1990</PY>
<VO>18</VO>
<NO>10</NO>
<PP>3001-3006</PP>
</SEQ>

<SEQ>
<UI>0418   Benson,D.C.   Fourier Methods for Bi.. Nucleic Acids R 90 
18(21):6305-63
</UI>
<AU>Benson DC
</AU>
<TI>Fourier Methods for Biosequence Analysis
</TI>
<SU>Pairwise comparison;
    Fourier;
    USA;
    Gap
</SU>
<AB>"Novel methods are discussed for using fast Fourier transforms for DNA or
protein sequence comparison. ... Novel methods are given which (1) enable the
detection of clusters of matching letters, (2) facilitate the insertion of gaps
to enhance sequence similarity, and (3) accommodate to varying densities of
letters in the input sequences."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1990</PY>
<VO>18</VO>
<NO>21</NO>
<PP>6305-6310</PP>
</SEQ>

<SEQ>
<UI>0419   Berg,O.G.     Selection of DNA Bindi.. J.Mol.Biol.     87 
193:723-750
</UI>
<AU>Berg OG;
    von Hippel PH
</AU>
<TI>Selection of DNA Binding Sites by Regulatory Proteins.
Statistical-mechanical Theory and Application to Operators and Promoters
</TI>
<SU>Match a pattern matrix;
    USA;
    Sequence analysis;
    Statistical;
    Protein;
    Selection;
    DNA;
    Binding
</SU>
<AB>"We present a statistical-mechanical selection theory for the sequence
analysis of a set of specific DNA regulatory sites that makes it possible to
predict the relationship between individual base-pair choices in the site and
specific activity (affinity)."
</AB>
<JT>J Mol Biol</JT>
<PY>1987</PY>
<VO>193</VO>
<PP>723-750</PP>
</SEQ>

<SEQ>
<UI>0420   Berger,M.P.   A Novel Randomized Ite.. Comput.Appl.Bio 91 
7(4):479-484
</UI>
<AU>Berger MP;
    Munson PJ
</AU>
<TI>A Novel Randomized Iterative Strategy for Aligning Multiple Protein
Sequences
</TI>
<SU>Multiple alignment;
    USA;
    Program;
    Optimal;
    Needleman-Wunsch;
    Pairwise alignment;
    Protein
</SU>
<AB>"Our algorithm randomly divides a group of unaligned sequences into two
subgroups, between which an optimal alignment is then obtained by a
Needleman-Wunsch style of algorithm. ... The pairwise alignment process is
repeated using
different random divisions of the whole group into two subgroups."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1991</PY>
<VO>7</VO>
<NO>4</NO>
<PP>479-484</PP>
</SEQ>

<SEQ>
<UI>0421   Berkman,O.    Highly Parallelizable .. ACM Sympos.Theo 89 
21:309-319
</UI>
<AU>Berkman O;
    Breslauer D;
    Galil Z;
    Schieber B;
    Vishkin U
</AU>
<TI>Highly Parallelizable Problems
</TI>
<SU>String match;
    Parallel;
    IL
</SU>
<AB>Seattle, WA, 15-17 May 1989. "In this section [4] we describe a parallel
algorithm for finding all the occurrences of a pattern of length m in a text of
length n over an arbitrary alphabet. The algorithm runs in O(log log m) time
using n / log log m processors on a Common CRCW PRAM."
</AB>
<JT>ACM Sympos Theory Comput</JT>
<PY>1989</PY>
<VO>21</VO>
<PP>309-319</PP>
</SEQ>

<SEQ>
<UI>0422   Bertossi,A.A. A VLSI System for Stri.. Integration, Th 90 
9:129-139
</UI>
<AU>Bertossi AA
</AU>
<TI>A VLSI System for String Matching
</TI>
<SU>Parallel;
    String match;
    Italy;
    VLSI
</SU>
<AB>Find all occurrences of a pattern of length m in a text of length n, where
both strings are over a finite alphabet S. "In spite of its practical relevance,
string matching has received very little attention so far in the VLSI
literature. In this paper, we present a special purpose VLSI system for string
matching which takes O(log n + log |S|) time and can be laid out with
O(mn log m log n) area."
</AB>
<JT>Integration, The VLSI J</JT>
<PY>1990</PY>
<VO>9</VO>
<PP>129-139</PP>
</SEQ>

<SEQ>
<UI>0423   Bertossi,A.A. A Parallel Solution to.. Comput.J.       92 
35(5):524-526
</UI>
<AU>Bertossi AA;
    Luccio F;
    Pagli L;
    Lodi E
</AU>
<TI>A Parallel Solution to the Approximate String Matching Problem
</TI>
<SU>Match with k differences;
    Parallel;
    Dynamic programming;
    String match;
    Italy;
    Approximate match;
    VLSI
</SU>
<AB>"We have shown how the approximate string matching problem (ASMP) can be
solved in parallel on a bounded degree network of elementary processors. The
proposed parallelization scheme is very simple. It is based on a standard
sequential method of dynamic programming, and attains optimal speedup. ... Our
scheme is instead suitable for VLSI implementation, and takes into account a
very general set of errors, for which no 'fast' algorithm is known."
</AB>
<JT>Comput J</JT>
<PY>1992</PY>
<VO>35</VO>
<NO>5</NO>
<PP>524-526</PP>
</SEQ>

<SEQ>
<UI>0424   Beyer,W.A.    A Molecular Sequence M.. Math.Biosci.    74 19:9-25
</UI>
<AU>Beyer WA;
    Stein ML;
    Smith TF;
    Ulam SM
</AU>
<TI>A Molecular Sequence Metric and Evolutionary Trees
</TI>
<SU>Sequence proximity;
    USA;
    Evolutionary tree
</SU>
<AB>"A precisely stated algorithm is given for reconstructing phylogenetic
relationships from protein amino acid sequence data under the restriction that
all distance measures be proper metrics. In conjunction with a general sequence
metric, the algorithm is applied to the cytochrome c data used in earlier
studies."
</AB>
<JT>Math Biosci</JT>
<PY>1974</PY>
<VO>19</VO>
<PP>9-25</PP>
</SEQ>

<SEQ>
<UI>0425   Bishop,M.J.   Maximum Likelihood Ali.. J.Mol.Biol.     86 
190(2):159-165
</UI>
<AU>Bishop MJ;
    Thompson EA
</AU>
<TI>Maximum Likelihood Alignment of DNA Sequences
</TI>
<SU>Pairwise alignment;
    UK;
    Likelihood;
    Probabilistic;
    DNA
</SU>
<AB>"The optimal alignment problem for pairs of molecular sequences under a
probabilistic model of evolutionary change is equivalent to the problem of
estimating the maximum likelihood time required to transform one sequence to the
other. When this time has been estimated, various alignments of high posterior
probability may be written down. A simple model with two parameters is presented
and a method is described by which the likelihood may be computed."
</AB>
<JT>J Mol Biol</JT>
<PY>1986</PY>
<VO>190</VO>
<NO>2</NO>
<PP>159-165</PP>
</SEQ>

<SEQ>
<UI>0426   Bishop,M.     Fast Computer Search f.. Nucleic Acids R 84 
12(13):5471-54
</UI>
<AU>Bishop M;
    Thompson E
</AU>
<TI>Fast Computer Search for Similar DNA Sequences
</TI>
<SU>Database search;
    UK;
    Sequence database;
    Statistical;
    DNA
</SU>
<AB>"An extremely fast method of searching a nucleic acid sequence database
against a probe sequence is described. The method is based on the detection of
deviation from expected number and deviation from random spatial distribution of
sub-sequences which are unique within a sequence, and shared between that
sequence and the probe."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1984</PY>
<VO>12</VO>
<NO>13</NO>
<PP>5471-5474</PP>
</SEQ>

<SEQ>
<UI>0427   Blaisdell,B.E A Measure of the Simil.. Proc.Nat.Acad.S 86 
83(14):5155-51
</UI>
<AU>Blaisdell BE
</AU>
<TI>A Measure of the Similarity of Sets of Sequences not Requiring Sequence
Alignment
</TI>
<SU>Sequence proximity;
    USA;
    Sequence alignment;
    Markov;
    N-gram;
    Coding;
    Needleman-Wunsch;
    Dot;
    Similarity
</SU>
<AB>"Determination of first- and second-order Markov chain homogeneity of sets
of nuclear eukaryotic DNA sequences, both coding and noncoding, finds
similarities imperceptible to the standard Needleman-Wunsch base matching or
dot-matrix algorithms."
</AB>
<JT>Proc Nat Acad Sci USA</JT>
<PY>1986</PY>
<VO>83</VO>
<NO>14</NO>
<PP>5155-5159</PP>
</SEQ>

<SEQ>
<UI>0428   Blaisdell,B.E Average Values of a Di.. J.Mol.Evol.     89 
29(6):538-547
</UI>
<AU>Blaisdell BE
</AU>
<TI>Average Values of a Dissimilarity Measure Not Requiring Sequence Alignment
are Twice the Averages of Conventional Mismatch Counts Requiring Sequence
Alignment for a Computer-generated Model System
</TI>
<SU>Sequence proximity;
    USA;
    Sequence alignment;
    N-gram;
    Statistical;
    Significance;
    Model
</SU>
<AB>"Three measures of sequence dissimilarity have been compared on a
computer-generated model system in which substitutions in random sequences were
made at randomly selected sites and the replacement character was chosen at
random from the set of characters different from the original occupant of the
site."
</AB>
<JT>J Mol Evol</JT>
<PY>1989</PY>
<VO>29</VO>
<NO>6</NO>
<PP>538-547</PP>
</SEQ>

<SEQ>
<UI>0429   Blaisdell,B.E Effectiveness of Measu.. J.Mol.Evol.     89 
29(6):526-537
</UI>
<AU>Blaisdell BE
</AU>
<TI>Effectiveness of Measures Requiring and Not Requiring Prior Sequence
Alignment for Estimating the Dissimilarity of Natural Sequences
</TI>
<SU>Sequence proximity;
    USA;
    Sequence alignment;
    Least squares;
    Discrimination;
    Evolutionary tree;
    Consensus tree
</SU>
<AB>"Various measures of sequence dissimilarity have been evaluated by how
well the additive least squares estimation of edges (branch lengths) of an
unrooted evolutionary tree fit the observed pair-wise dissimilarity measures and
by how consistent the trees are for different data sets derived from the same
set of sequences. This evaluation provided sensitive discrimination among
dissimilarity measures ...."
</AB>
<JT>J Mol Evol</JT>
<PY>1989</PY>
<VO>29</VO>
<NO>6</NO>
<PP>526-537</PP>
</SEQ>

<SEQ>
<UI>0430   Blaisdell,B.E Average Values of a Di.. J.Mol.Evol.     91 
32(6):521-528
</UI>
<AU>Blaisdell BE
</AU>
<TI>Average Values of a Dissimilarity Measure not Requiring Sequence Alignment
are Twice the Averages of Conventional Mismatch Counts Requiring Sequence
Alignment for a Variety of Computer-Generated Model Systems
</TI>
<SU>Sequence proximity;
    USA;
    Sequence alignment;
    Model
</SU>
<AB>"It has been found that two dissimilarity measures not requiring sequence
alignment perform about as well for the inference of unrooted evolutionary trees
as do conventional mismatch counts requiring prior sequence alignment. ... A
reason for the success of one of the measures not requiring sequence alignment
has been found."
</AB>
<JT>J Mol Evol</JT>
<PY>1991</PY>
<VO>32</VO>
<NO>6</NO>
<PP>521-528</PP>
</SEQ>

<SEQ>
<UI>0431   Blum,N.       On Locally Optimal Ali..                 93
</UI>
<AU>Blum N
</AU>
<TI>On Locally Optimal Alignments in Genetic Sequences (Revised Version)
</TI>
<SU>Subalignment;
    Locally optimal;
    Optimal;
    Genetic;
    DE
</SU>
<AB>Report 8567-CS, Institut fur Informatik, Universitat Bonn, 23 pp. A
c-locally minimal distance is defined. "We show how to compute all substrings of
x which have c-locally minimal distance from y and all corresponding alignments
in O(mn) time where n is the length of x and m is the length of y."
</AB>
<PY>1993</PY>
</SEQ>

<SEQ>
<UI>0432   Blum,N.       Efficient Computation ..                 93
</UI>
<AU>Blum N
</AU>
<TI>Efficient Computation of All Optimal Alignments of Two Genetic Sequences
with Concave Weighting Functions
</TI>
<SU>Pairwise alignment;
    Optimal;
    Function;
    Genetic;
    DE
</SU>
<AB>Report 8586-CS, Institut fur Informatik, Universitat Bonn, 28 pp. "We show
for any concave weighting function, how to compute a compact representation of
the distance graph of two genetic sequences x and y ...."
</AB>
<PY>1993</PY>
</SEQ>

<SEQ>
<UI>0433   Blum,N.       On Locally Optimal Loc..                 93
</UI>
<AU>Blum N
</AU>
<TI>On Locally Optimal Local Alignments and Subalignments of Genetic Sequences
with Concave Weighting Functions
</TI>
<SU>Subalignment;
    Locally optimal;
    Optimal;
    Function;
    Genetic;
    DE
</SU>
<AB>Report 8587-CS, Institut fur Informatik, Universitat Bonn, 25 pp. "We show
for any concave weighting function, how to compute a compact representation of
the locally optimal local alignment graph of [sequences] x and y and of the
locally optimal subalignment graph of x and y, respectively which contains
exactly all locally optimal local alignments and all locally optimal
subalignments of x and y ...."
</AB>
<PY>1993</PY>
</SEQ>

<SEQ>
<UI>0434   Blum,N.       Some Remarks on the Ac..                 93
</UI>
<AU>Blum N
</AU>
<TI>Some Remarks on the Accurate Notion of Local Optimality in Genetic
Sequences
</TI>
<SU>Pairwise comparison;
    Review;
    Genetic;
    DE
</SU>
<AB>Report 8588-CS, Institut fur Informatik, Universitat Bonn, 14 pp. "We give
a comprehensive survey of old and new results with respect to locally optimal
local alignments and locally optimal subalignments in genetic sequences."
</AB>
<PY>1993</PY>
</SEQ>

<SEQ>
<UI>0435   Boguski,M.S.  Computational Sequence.. J.Lipid Res.    92 
33:957-974
</UI>
<AU>Boguski MS
</AU>
<TI>Computational Sequence Analysis Revisited: New Databases, Software Tools,
and the Research Opportunities They Engender
</TI>
<SU>Sequence analysis;
    Review;
    USA;
    Sequence database;
    Sequence search;
    Multiple alignment;
    Motif
</SU>
<AB>"Recent developments in fast database searching, multiple sequence
alignment, and molecular modeling are discussed and windows-based, mouse-driven
software for CD-ROM and network information retrieval are described."
</AB>
<JT>J Lipid Res</JT>
<PY>1992</PY>
<VO>33</VO>
<PP>957-974</PP>
</SEQ>

<SEQ>
<UI>0436   Boguski,M.S.  Analysis of Conserved .. New Biol.       92 
4(3):247-260
</UI>
<AU>Boguski MS;
    Hardison RC;
    Schwartz S;
    Miller W
</AU>
<TI>Analysis of Conserved Domains and Sequence Motifs in Cellular Regulatory
Proteins and Locus Control Regions Using New Software Tools for Multiple
Alignment and Visualization
</TI>
<SU>Multiple comparison;
    USA;
    Motif;
    Region;
    Sequence analysis;
    Multiple alignment;
    Display;
    Protein
</SU>
<AB>"Here we describe an integrated set of interactive Unix tools that
combines several multiple-alignment techniques with traditional 'dot-plot'
visualization to provide a flexible environment for approaching complex sequence
analysis problems."
</AB>
<JT>New Biol</JT>
<PY>1992</PY>
<VO>4</VO>
<NO>3</NO>
<PP>247-260</PP>
</SEQ>

<SEQ>
<UI>0437   Bork,P.       A Method for Property .. Stud.Biophys.   89 
129(2/3):231-2
</UI>
<AU>Bork P;
    Grunwald C
</AU>
<TI>A Method for Property Pattern Searches in Protein Sequence Data Bases,
Demonstrated by Detection of GTP-binding Sites
</TI>
<SU>Consensus sequence;
    Match complex patterns;
    Database search;
    DE;
    Pattern search;
    Sequence database;
    Protein;
    Detection
</SU>
<AB>"We have developed a method for deriving patterns of such properties (i.e.,
consensus patterns) from alignments of related sequences and for the
subsequent database search for sequence sections that match these patterns. The
characterization of the residues of the patterns is based on ten physicochemical
and steric properties given in Zvelebil et al. (1987)."
</AB>
<JT>Stud Biophys</JT>
<PY>1989</PY>
<VO>129</VO>
<NO>2/3</NO>
<PP>231-240</PP>
</SEQ>

<SEQ>
<UI>0438   Bork,P.       Recognition of Differe.. Eur.J.Biochem.  90 
191:347-358
</UI>
<AU>Bork P;
    Grunwald C
</AU>
<TI>Recognition of Different Nucleotide-binding Sites in Primary Structures
Using a Property-pattern Approach
</TI>
<SU>Pattern match;
    DE;
    Consensus sequence;
    Sequence recognition;
    Structure;
    Recognition
</SU>
<AB>"Consensus sequence patterns for β-α-β folds binding FAD, NAD and GTP were
constructed on the basis of 11 steric and physicochemical properties. These
property patterns permit detection and distinction of the respective
nucleotide-binding sites on the basis of amino acid sequence analysis alone."
</AB>
<JT>Eur J Biochem</JT>
<PY>1990</PY>
<VO>191</VO>
<PP>347-358</PP>
</SEQ>

<SEQ>
<UI>0439   Boswell,D.R.  A Program for Template.. Comput.Appl.Bio 88 
4(3):345-350
</UI>
<AU>Boswell DR
</AU>
<TI>A Program for Template Matching of Protein Sequences
</TI>
<SU>Match complex patterns;
    NZ;
    Motif;
    Template;
    Program;
    Protein
</SU>
<AB>"The matching of a template to a protein sequence is simplified by
treating it as a special case of sequence alignment. Restriction of the
distances between motifs in the template controls against spurious matches
within very long sequences. The program using this algorithm is fast enough to
be used in scanning large databases for sequences matching a complex template."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1988</PY>
<VO>4</VO>
<NO>3</NO>
<PP>345-350</PP>
</SEQ>

<SEQ>
<UI>0440   Boswell,D.R.  Sequence Comparison an.. Computational.. 88Oxford 
Universi
</UI>
<AU>Boswell DR;
    Lesk AM
</AU>
<TI>Sequence Comparison and Alignment: The Measurement and Interpretation of
Sequence Similarity
</TI>
<ED>Lesk AM
</ED>
<BK>Computational Molecular Biology: Sources and Methods for Sequence Analysis
</BK>
<SU>Pairwise comparison;
    Review;
    NZ;
    Sequence comparison;
    Sequence alignment;
    Similarity
</SU>
<AB>"Sequence comparison and alignment are among the most important tools of
computational molecular biology. ... Sequence comparison merely detects common
features; sequence alignment places residues of the sequences into the best
one-to-one correspondence. ... Here we emphasize not the methods but the
interpretation of the results."
</AB>
<PU>Oxford University Press </PU>
<PL>Oxford </PL>
<PY>1988</PY>
<PP>161-178</PP>
</SEQ>

<SEQ>
<UI>0441   Boswell,D.R.  Sequence Comparison by.. Nucleic Acids R 84 
12(1):457-463
</UI>
<AU>Boswell DR;
    McLachlan AD
</AU>
<TI>Sequence Comparison by Exponentially-Damped Alignment
</TI>
<SU>Pairwise alignment;
    UK;
    Sequence comparison;
    Dynamic programming
</SU>
<AB>Two "sequences are compared by calculating for each pair of residues a
score which represents the best local alignment bringing those residues into
correspondence; smooth localization is achieved by reducing the contribution of
distant parts of the alignment path by a factor which decreases exponentially
with their distance from the point in question."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1984</PY>
<VO>12</VO>
<NO>1</NO>
<PP>457-463</PP>
</SEQ>

<SEQ>
<UI>0442   Boyer,R.S.    A Fast String-Searchin.. Comm.ACM        77 
20(10):762-772
</UI>
<AU>Boyer RS;
    Moore JS
</AU>
<TI>A Fast String-Searching Algorithm
</TI>
<SU>String match;
    Boyer-Moore;
    USA;
    String search;
    Algorithm
</SU>
<AB>"An algorithm is presented that searches for the location, i, of the first
occurrence of a character string, pat, in another string, string. During the
search operation, the characters of pat are matched starting with the last
character of pat. The information gained by starting the match at the end of the
pattern often allows the algorithm to proceed in large jumps through the text
being searched."
</AB>
<JT>Comm ACM </JT>
<PY>1977</PY>
<VO>20</VO>
<NO>10</NO>
<PP>762-772</PP>
</SEQ>

<SEQ>
<UI>0443   Bradford,J.H. Sequence Matching with.. Inform.Process. 90 
34(4):193-196
</UI>
<AU>Bradford JH
</AU>
<TI>Sequence Matching with Binary Codes
</TI>
<SU>Sequence proximity;
    CA;
    Sequence match
</SU>
<AB>"This paper introduces an algorithm that encodes pairs of strings as
binary numbers such that the Hamming distance between the binary code words is
equal to the Levenshtein distance between the original strings."
</AB>
<JT>Inform Process Lett</JT>
<PY>1990</PY>
<VO>34</VO>
<NO>4</NO>
<PP>193-196</PP>
</SEQ>

<SEQ>
<UI>0444   Brendel,V.    Linguistics of Nucleot.. J.Biomol.Struct 86 
4(1):11-21
</UI>
<AU>Brendel V;
    Beckmann JS;
    Trifonov EN
</AU>
<TI>Linguistics of Nucleotide Sequences: Morphology and Comparison of
Vocabularies
</TI>
<SU>Sequence analysis;
    Significance;
    Linguistic;
    IL;
    Nucleotide
</SU>
<AB>"The concept of 'words' in continuous languages devoid of blanks is
introduced and an operational definition of words given. With this novel concept
nucleotide sequences become objects for linguistic analysis. The typical word
size of the nucleotide language is found to be 3 to 5 ...."
</AB>
<JT>J Biomol Struct &amp; Dyn </JT>
<PY>1986</PY>
<VO>4</VO>
<NO>1</NO>
<PP>11-21</PP>
</SEQ>

<SEQ>
<UI>0445   Brendel,V.    Methods and Algorithms.. Proc.Nat.Acad.S 92 
89:2002-2006
</UI>
<AU>Brendel V;
    Bucher P;
    Nourbakhsh IR;
    Blaisdell BE;
    Karlin S
</AU>
<TI>Methods and Algorithms for Statistical Analysis of Protein Sequences
</TI>
<SU>Sequence analysis;
    Significance;
    USA;
    Statistical;
    Protein;
    Algorithm
</SU>
<AB>"We describe several protein sequence statistics designed to evaluate
distinctive attributes of residue content and arrangement in primary structure.
Considered are global compositional biases, local clustering of different
residue types ..., long runs of charged or uncharged residues, periodic
patterns, counts and distribution of homooligopeptides, and unusual spacings."
</AB>
<JT>Proc Nat Acad Sci USA </JT>
<PY>1992</PY>
<VO>89</VO>
<PP>2002-2006</PP>
</SEQ>

<SEQ>
<UI>0446   Breslauer,D.  An Optimal O(log log n.. SIAM J.Comput.  90 
19(6):1051-105
</UI>
<AU>Breslauer D;
    Galil Z
</AU>
<TI>An Optimal O(log log n) Time Parallel String Matching Algorithm
</TI>
<SU>Parallel;
    USA;
    String match;
    Optimal;
    Algorithm
</SU>
<AB>"An optimal O(log log n) time parallel algorithm for string matching on
CRCW-PRAM is presented. It improves previous results of Galil (1985) and Vishkin
(1985)." The algorithm uses n/log log n processors, so its total work
(O(log log n) time on n/log log n processors) is O(n); the string matching
problem therefore belongs to one of the lowest parallel complexity classes.
</AB>
<JT>SIAM J Comput</JT>
<PY>1990</PY>
<VO>19</VO>
<NO>6</NO>
<PP>1051-1058</PP>
</SEQ>

<SEQ>
<UI>0447   Breslauer,D.  A Lower Bound for Para.. ACM Sympos.Theo 91 
23:439-443
</UI>
<AU>Breslauer D;
    Galil Z
</AU>
<TI>A Lower Bound for Parallel String Matching
</TI>
<SU>Parallel;
    USA;
    String match;
    Complexity
</SU>
<AB>New Orleans, LA, 6-8 May 1991. "We present an O(log log m) lower bound on
the number of rounds necessary for finding occurrences of a pattern string
P[1..m] in a text string T[1..2m] in parallel using m comparisons in each round.
This is the first lower bound for this problem. [It] is within a constant factor
of the fastest algorithm ... and also holds for an m-processor CRCW-PRAM in the
case of a general alphabet."
</AB>
<JT>ACM Sympos Theory Comput</JT>
<PY>1991</PY>
<VO>23</VO>
<PP>439-443</PP>
</SEQ>

<SEQ>
<UI>0448   Brutlag,D.L.  Improved Sensitivity o.. Comput.Appl.Bio 90 
6(3):237-245
</UI>
<AU>Brutlag DL;
    Dautricourt JP;
    Maulik S;
    Relph J
</AU>
<TI>Improved Sensitivity of Biological Sequence Database Searches
</TI>
<SU>Database search;
    USA;
    Sequence database;
    k-tuple
</SU>
<AB>"We have increased the sensitivity of DNA and protein sequence database
searches by allowing similar but non-identical amino acids or nucleotides to
match. In addition, one can match k-tuples or words instead of matching
individual residues in order to speed the search. ... The concept of matching
non-identical k-tuples also increases the power of DNA database searches."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1990</PY>
<VO>6</VO>
<NO>3</NO>
<PP>237-245</PP>
</SEQ>

<SEQ>
<UI>0449   Butler,R.     Aligning Genetic Seque.. Strand: New C.. 
90Englewood Cliff
</UI>
<AU>Butler R;
    Butler T;
    Foster I;
    Karonis N;
    Olson R;
    Overbeek R;
    Pfluger N;
    Price M;
    Tuecke S
</AU>
<TI>Aligning Genetic Sequences
</TI>
<ED>Foster I
    Taylor S
</ED>
<BK>Strand: New Concepts in Parallel Programming
</BK>
<SU>Multiple alignment;
    Segment;
    USA;
    Genetic
</SU>
<AB>"Our [multiple sequence alignment] algorithm is based on the notion of
critical subsequences. ... When a critical subsequence occurs in two or more
sequences, we call the set of occurrences a pin. Our algorithm will attempt to
create an alignment in which as many pins as possible align exactly."
</AB>
<PU>Prentice Hall </PU>
<PL>Englewood Cliffs, NJ </PL>
<PY>1990</PY>
<PP>253-271</PP>
</SEQ>

<SEQ>
<UI>0450   Carrillo,H.   The Multiple Sequence .. SIAM J.Appl.Mat 88 
48(5):1073-108
</UI>
<AU>Carrillo H;
    Lipman D
</AU>
<TI>The Multiple Sequence Alignment Problem in Biology
</TI>
<SU>Multiple alignment;
    USA;
    Sequence alignment;
    Dynamic programming;
    Complexity;
    Sequence comparison
</SU>
<AB>"The dynamic programming approach [to the multiple sequence alignment
problem] has the limitation that its complexity scales up greatly with dimension
.... In the following, we make observations on the problem of aligning sequences
and that of aligning subsets of these sequences that reveal constraints of the
problem that will prove useful in reducing computation in the dynamic
programming method."
</AB>
<JT>SIAM J Appl Math</JT>
<PY>1988</PY>
<VO>48</VO>
<NO>5</NO>
<PP>1073-1082</PP>
</SEQ>

<SEQ>
<UI>0451   Cavener,D.R.  Comparison of the Cons.. Nucleic Acids R 87 
15(4):1353-136
</UI>
<AU>Cavener DR
</AU>
<TI>Comparison of the Consensus Sequence Flanking Translational Start Sites in
Drosophila and Vertebrates
</TI>
<SU>Consensus sequence;
    USA;
    Consensus method
</SU>
<AB>"An important issue germane to the analysis of nucleic acid sequences is
the criteria used for consensus assignments. ... With these considerations in
mind I have chosen the following criteria for the assignment of consensus
sequences. ... However, the goal of this study was to obtain reliable consensus
data which would not be significantly affected by a few errors."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1987</PY>
<VO>15</VO>
<NO>4</NO>
<PP>1353-1361</PP>
</SEQ>

<SEQ>
<UI>0452   Chan,S.C.     Synthesis and Recognit.. IEEE Trans.Patt 91 
13(12):1245-12
</UI>
<AU>Chan SC;
    Wong AKC
</AU>
<TI>Synthesis and Recognition of Sequences
</TI>
<SU>Multiple alignment;
    Clustering;
    CA;
    Hierarchical;
    Entropy;
    Recognition
</SU>
<AB>"The synthesis of an ensemble of sequences is a 'sequence' of random
elements that specify the probabilities of occurrence of the different symbols
at the corresponding sites of the sequences. The synthesis is determined by a
hierarchical sequence synthesis procedure ... which returns not only the
taxonomic hierarchy of the whole ensemble of sequences but also the alignment
... of a group ... of the sequences at each level of the hierarchy."
</AB>
<JT>IEEE Trans Patt Anal Mach Intell</JT>
<PY>1991</PY>
<VO>13</VO>
<NO>12</NO>
<PP>1245-1255</PP>
</SEQ>

<SEQ>
<UI>0453   Chan,S.C.     A Survey of Multiple S.. Bull.Math.Biol. 92 
54(4):563-598
</UI>
<AU>Chan SC;
    Wong AKC;
    Chiu DKY
</AU>
<TI>A Survey of Multiple Sequence Comparison Methods
</TI>
<SU>Multiple alignment;
    Survey;
    CA;
    Sequence comparison
</SU>
<AB>"This article presents a survey of the exhaustive (optimal) and heuristic
(possibly sub-optimal) methods developed for the comparison of multiple
macromolecular sequences. Emphasis is given to the different approaches of the
heuristic methods."
</AB>
<JT>Bull Math Biol</JT>
<PY>1992</PY>
<VO>54</VO>
<NO>4</NO>
<PP>563-598</PP>
</SEQ>

<SEQ>
<UI>0454   Takezaki,N.   Inconsistency of the M.. J.Mol.Evol.     94 
39:210-218
</UI>
<AU>Takezaki N;
    Nei M
</AU>
<TI>Inconsistency of the Maximum Parsimony Method When the Rate of Nucleotide
Substitution is Constant
</TI>
<SU>Phylogeny;
    Parsimony;
    USA;
    Substitution;
    Rate;
    Nucleotide
</SU>
<AB>"The inconsistency of the maximum parsimony method is known to occur even
when the rate of nucleotide substitution is constant. To understand why this
inconsistency occurs, a mathematical study was conducted for the cases of five,
six, and seven sequences. The results obtained indicate that this inconsistency
occurs because the probability of occurrence of nucleotide configurations
generated by one substitution on a short interior branch is often lower than
that of configurations generated by more substitutions on other longer branches.
The chance of occurrence of this event ... apparently increases as the number of
sequences increases."
</AB>
<JT>J Mol Evol</JT>
<PY>1994</PY>
<VO>39</VO>
<PP>210-218</PP>
</SEQ>

<SEQ>
<UI>0455   Chang,W.I.    Approximate String Mat.. IEEE Sympos.Fou 90 
31:116-124
</UI>
<AU>Chang WI;
    Lawler EL
</AU>
<TI>Approximate String Matching in Sublinear Expected Time
</TI>
<SU>Match with k differences;
    USA;
    String match;
    Approximate match;
    Locally optimal
</SU>
<AB>22-24 October 1990, St. Louis, MO. "We are interested in much faster
algorithms for restricted cases of the [k differences approximate string
matching] problem, such as when the text string is random and errors are not too
frequent. We have devised an algorithm that, for k &lt; m/(log m + O(1)), runs in
time O((n/m)k log m) on the average. In the worst case, our algorithm is O(nk)
.... We define the approximate substring matching problem and give efficient
algorithms based on our techniques."
</AB>
<JT>IEEE Sympos Found Comput Sci</JT>
<PY>1990</PY>
<VO>31</VO>
<PP>116-124</PP>
</SEQ>

<SEQ>
<UI>0456   Chao,K.M.     Aligning Two Sequences.. Comput.Appl.Bio 92 
8(5):481-487
</UI>
<AU>Chao KM;
    Pearson WR;
    Miller W
</AU>
<TI>Aligning Two Sequences Within a Specified Diagonal Band
</TI>
<SU>Pairwise alignment;
    USA;
    FASTA;
    Locally optimal
</SU>
<AB>"We describe an algorithm for aligning two sequences within a diagonal
band that requires only O(NW) computation time and O(N) space, where N is the
length of the shorter of the two sequences and W is the width of the band. ...
This algorithm has been incorporated into the FASTA program package ...."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1992</PY>
<VO>8</VO>
<NO>5</NO>
<PP>481-487</PP>
</SEQ>

<SEQ>
<UI>0457   Chappey,C.    MASH: An Interactive P.. Comput.Appl.Bio 91 
7(2):195-202
</UI>
<AU>Chappey C;
    Danckaert A;
    Dessen P;
    Hazout S
</AU>
<TI>MASH: An Interactive Program for Multiple Alignment and Consensus Sequence
Construction for Biological Sequences
</TI>
<SU>Multiple alignment;
    FR;
    Motif;
    Consensus sequence;
    Program
</SU>
<AB>"... a method that allows the selection of the series of the common motifs
to be aligned according to the 'alignment priority' criterion. This function
depends on both the length and the occurrence frequency of the motifs, and
allows the extraction of the total or local similarities, i.e. involving the
whole set or only some sequences."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1991</PY>
<VO>7</VO>
<NO>2</NO>
<PP>195-202</PP>
</SEQ>

<SEQ>
<UI>0458   Chen,E.S.     Parallel Alignment of .. Comput.Appl.Bio 93 
9(3):375-375
</UI>
<AU>Chen ES;
    Asano C;
    Davison DB
</AU>
<TI>Parallel Alignment of DNA Sequences on the Connection Machine CM-2
</TI>
<SU>Database search;
    Parallel;
    USA;
    Program;
    Hardware;
    DNA
</SU>
<AB>"This code allows for searches of a query sequence against a library,
while the Jones program [Jones (1992)] is best used for locating small patterns
within a database."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1993</PY>
<VO>9</VO>
<NO>3</NO>
<PP>375-375</PP>
</SEQ>

<SEQ>
<UI>0459   Chiu,D.K.Y.   Inferring Consensus St.. Comput.Appl.Bio 91 
7(3):347-352
</UI>
<AU>Chiu DKY;
    Kolodziejczak T
</AU>
<TI>Inferring Consensus Structure from Nucleic Acid Sequences
</TI>
<SU>Multiple alignment;
    Structure;
    CA;
    Nucleic acid
</SU>
<AB>"This paper presents an unsupervised inference method for determining the
higher-order structure from sequence data. The method is general, but in this
paper it is applied to nucleic acid sequences in determining the secondary (2-D)
and tertiary (3-D) structure of the macromolecule."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1991</PY>
<VO>7</VO>
<NO>3</NO>
<PP>347-352</PP>
</SEQ>

<SEQ>
<UI>0460   Choffrut,C.   An Optimal Algorithm f.. EATCS Bull.     90 
40:217-225
</UI>
<AU>Choffrut C
</AU>
<TI>An Optimal Algorithm for Building the Boyer-Moore Automaton
</TI>
<SU>String match;
    Boyer-Moore;
    Automata;
    FR;
    Optimal;
    Algorithm
</SU>
<AB>"The notion of Boyer-Moore automaton ... leads to an algorithm that
requires more preprocessing but is more efficient than the original
Boyer-Moore's algorithm. We give an optimal algorithm for computing the automaton and
state an upper bound on the size of the automaton ...."
</AB>
<JT>EATCS Bull</JT>
<PY>1990</PY>
<VO>40</VO>
<PP>217-225</PP>
</SEQ>

<SEQ>
<UI>0461   Chvatal,V.    Longest Common Subsequ.. J.Appl.Probab.  75 
12:306-315
</UI>
<AU>Chvatal V;
    Sankoff D
</AU>
<TI>Longest Common Subsequence of Two Random Sequences
</TI>
<SU>Longest common;
    Significance;
    CA;
    Subsequence
</SU>
<AB>"Given two random k-ary sequences of length n, what is f(n, k), the
expected length of their longest common subsequence? ... We study the limiting
behaviour of n^(-1) f(n, k) and derive upper and lower bounds on these limits for
all k."
</AB>
<JT>J Appl Probab</JT>
<PY>1975</PY>
<VO>12</VO>
<PP>306-315</PP>
</SEQ>

<SEQ>
<UI>0462   Chvatal,V.    An Upper-Bound Techniq.. Time Warps, S.. 
83Addison-Wesley
</UI>
<AU>Chvatal V;
    Sankoff D
</AU>
<TI>An Upper-Bound Technique for Lengths of Common Subsequences
</TI>
<ED>Sankoff D
    Kruskal JB
</ED>
<BK>Time Warps, String Edits, and Macromolecules: The Theory and Practice of
Sequence Comparison
</BK>
<SU>Longest common;
    Significance;
    CA;
    Subsequence
</SU>
<AB>This chapter illustrates "the application of combinatorial argumentation
to sequence-comparison problems, in deriving upper bounds for the expected
length of the longest common subsequence of two random k-ary sequences of length
n."
</AB>
<PU>Addison-Wesley </PU>
<PL>Reading, MA </PL>
<PY>1983</PY>
<PP>353-357</PP>
</SEQ>

<SEQ>
<UI>0463   Claverie,J.M. Assessing the Biologic.. Comput.Appl.Bio 85 
1(2):95-104
</UI>
<AU>Claverie JM;
    Sauvaget I
</AU>
<TI>Assessing the Biological Significance of Primary Structure Consensus
Patterns using Sequence Databanks. I. Heat-shock and Glucocorticoid Control
Elements in Eukaryotic Promoters
</TI>
<SU>Signal;
    FR;
    Significance;
    Program;
    Database search;
    Structure;
    Databank
</SU>
<AB>"We describe FORTRAN 77 software allowing for convenient searching of any
segmented and ambiguous pattern in the currently available protein or nucleotide
sequence databanks."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1985</PY>
<VO>1</VO>
<NO>2</NO>
<PP>95-104</PP>
</SEQ>

<SEQ>
<UI>0464   Cockwell,K.Y. Software Tools for Mot.. Comput.Appl.Bio 89 
5(3):227-232
</UI>
<AU>Cockwell KY;
    Giles IG
</AU>
<TI>Software Tools for Motif and Pattern Scanning: Program Descriptions
including a Universal Sequence Reading Algorithm
</TI>
<SU>Dictionary match;
    UK;
    Motif;
    Gap;
    Program;
    Pattern match;
    Reading;
    Algorithm
</SU>
<AB>"Two programs, MOTIF and PATTERN, that scan sequences for matches to
user-defined motifs and patterns of motifs based on identity and set membership
are described." A pattern is a sequence of motifs separated by gaps.
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1989</PY>
<VO>5</VO>
<NO>3</NO>
<PP>227-232</PP>
</SEQ>

<SEQ>
<UI>0465   Cohen,D.N.    Matching Code Sequence.. Math.Biosci.    75 24:25-30
</UI>
<AU>Cohen DN;
    Reichert TA;
    Wong AKC
</AU>
<TI>Matching Code Sequences Utilizing Context Free Quality Measures
</TI>
<SU>Pairwise alignment;
    USA;
    Optimal;
    Needleman-Wunsch
</SU>
<AB>"A method is described herein for discovering the optimal correspondence
of a pair of code sequences under generalized quality measures. The limits of
both this algorithm and that of Needleman and Wunsch are presented." The
algorithm described runs in O(M^2 N^2) time.
</AB>
<JT>Math Biosci</JT>
<PY>1975</PY>
<VO>24</VO>
<PP>25-30</PP>
</SEQ>

<SEQ>
<UI>0466   Cole,R.       Tighter Bounds on the .. IEEE Sympos.Fou 92 
33:600-609
</UI>
<AU>Cole R;
    Hariharan R
</AU>
<TI>Tighter Bounds on the Exact Complexity of String Matching
</TI>
<SU>Complexity;
    USA;
    String match
</SU>
<AB>24-27 October 1992, Pittsburgh, PA. "This paper considers how many
character comparisons are needed to find all occurrences of a pattern of length
m in a text of length n." Upper and lower bounds on this number are obtained.
</AB>
<JT>IEEE Sympos Found Comput Sci</JT>
<PY>1992</PY>
<VO>33</VO>
<PP>600-609</PP>
</SEQ>

<SEQ>
<UI>0467   Collins,J.F.  Applications of Parall.. Nucleic Acids R 84 
12(1):181-192
</UI>
<AU>Collins JF;
    Coulson AFW
</AU>
<TI>Applications of Parallel Processing Algorithms for DNA Sequence Analysis
</TI>
<SU>Pairwise comparison;
    Parallel;
    UK;
    Sequence analysis;
    Distributed;
    DNA;
    Algorithm
</SU>
<AB>"This paper explores the applicability of an ICL Distributed Array
Processor ('DAP') to the general problem of finding similarities in DNA
sequences." Programs are described for inspecting the match matrix and searching
for alignments.
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1984</PY>
<VO>12</VO>
<NO>1</NO>
<PP>181-192</PP>
</SEQ>

<SEQ>
<UI>0468   Collins,J.F.  Molecular Sequence Com.. Nucleic Acid .. 87IRL Press
</UI>
<AU>Collins JF;
    Coulson AFW
</AU>
<TI>Molecular Sequence Comparison and Alignment
</TI>
<ED>Bishop MJ
    Rawlings CJ
</ED>
<BK>Nucleic Acid and Protein Sequence Analysis: A Practical Approach
</BK>
<SU>Pairwise comparison;
    Review;
    UK;
    Sequence comparison
</SU>
<AB>"Similarity searches require a solution to one of three types of problem.
... Given two finite sequences, what pattern of indels makes the most plausibly
similar alignment between them? ... Which sub-sequence(s) of an indefinitely
long subsequence show(s) the greatest similarity to a short query sequence ...?
... Which pair(s) of sub-sequences ... show(s) the most plausible similarities
...?"
</AB>
<PU>IRL Press </PU>
<PL>Oxford </PL>
<PY>1987</PY>
<PP>323-358</PP>
</SEQ>

<SEQ>
<UI>0469   Collins,J.F.  Significance of Protei.. Methods Enzymol 90 
183:474-487
</UI>
<AU>Collins JF;
    Coulson AFW
</AU>
<TI>Significance of Protein Sequence Similarities
</TI>
<SU>Database search;
    Significance;
    UK;
    Locally optimal;
    Sequence comparison;
    Similarity;
    Protein
</SU>
<AB>"Given that a certain subsequence alignment has been found in comparing a
query sequence against a database of sequences, what is the chance that an
alignment of the same degree of similarity (or better) would have been found if
the query sequence had been compared with a database just like the database
which was used ... except that it contained no sequences which are significantly
related to the query sequence?"
</AB>
<JT>Methods Enzymol</JT>
<PY>1990</PY>
<VO>183</VO>
<PP>474-487</PP>
</SEQ>

<SEQ>
<UI>0470   Collins,J.F.  The Significance of Pr.. Comput.Appl.Bio 88 
4(1):67-71
</UI>
<AU>Collins JF;
    Coulson AFW;
    Lyall A
</AU>
<TI>The Significance of Protein Sequence Similarities
</TI>
<SU>Subalignment;
    Significance;
    UK;
    Sequence comparison;
    Locally optimal;
    Similarity;
    Protein
</SU>
<AB>"A general method of assessing the significance of scored best local
alignments, particularly suited to protein sequence comparisons, is described.
The method establishes the parameters describing the distribution of the best
results from any search program, provided that the set is sufficiently large and
the majority of the alignments arise from unrelated sequences."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1988</PY>
<VO>4</VO>
<NO>1</NO>
<PP>67-71</PP>
</SEQ>

<SEQ>
<UI>0471   Colussi,L.    Correctness and Effici.. Inform.Comput.  91 
95(2):225-251
</UI>
<AU>Colussi L
</AU>
<TI>Correctness and Efficiency of Pattern Matching Algorithms
</TI>
<SU>String match;
    Complexity;
    Pattern match;
    Italy;
    Algorithm
</SU>
<AB>"A few lines pattern matching algorithm is obtained by using the
correctness proof of programs as a tool to the design of efficient algorithms.
The new algorithm is obtained from a brute force algorithm by three refinement
steps. ... [It] performs 1.5n character comparisons in the worst case and is
sublinear on a random text for all patterns. ... [It] always works better than
the classical [Knuth-Morris-Pratt] algorithm and, for some problems, is better
than the [Boyer-Moore] algorithm too."
</AB>
<JT>Inform Comput</JT>
<PY>1991</PY>
<VO>95</VO>
<NO>2</NO>
<PP>225-251</PP>
</SEQ>

<SEQ>
<UI>0472   Colussi,L.    On the Exact Complexit.. IEEE Sympos.Fou 90 
31:135-144
</UI>
<AU>Colussi L;
    Galil Z;
    Giancarlo R
</AU>
<TI>On the Exact Complexity of String Matching
</TI>
<SU>Complexity;
    USA;
    String match
</SU>
<AB>October 22-24, 1990, St. Louis, MO. "We investigate the maximal number of
character comparisons made by a linear-time string matching algorithm, given a
text of length n and a pattern of length m over a general alphabet. We denote it
by c(n,m) or approximate it by (1+C)n, where C is a universal constant. We add
the subscript 'on-line' when we restrict attention to on-line algorithms ....
The upper bound was established 20 years ago ... and no progress has been made
for 19 years. The only lower bound has been the obvious one .... We improve
these bounds and determine C_on-line exactly."
</AB>
<JT>IEEE Sympos Found Comput Sci</JT>
<PY>1990</PY>
<VO>31</VO>
<PP>135-144</PP>
</SEQ>

<SEQ>
<UI>0473   Consel,C.     Partial Evaluation of .. Inform.Process. 89 
30(2):79-86
</UI>
<AU>Consel C;
    Danvy O
</AU>
<TI>Partial Evaluation of Pattern Matching in Strings
</TI>
<SU>String match;
    Knuth-Morris-Pratt;
    FR;
    Pattern match;
    Automata
</SU>
<AB>"This article describes how automatically specializing a fairly naive
pattern matcher by partial evaluation leads to the Knuth, Morris &amp; Pratt
algorithm. Interestingly enough, no theorem proving is needed to achieve the
partial evaluation, as was previously argued, and it is sufficient to identify a
static component in the computation to get the result - a deterministic finite
automaton. This experiment illustrates how a small insight and partial
evaluation can achieve a nontrivial result."
</AB>
<JT>Inform Process Lett</JT>
<PY>1989</PY>
<VO>30</VO>
<NO>2</NO>
<PP>79-86</PP>
</SEQ>

<SEQ>
<UI>0474   Core,N.G.     Supercomputers and Bio.. Comput.Biomed.R 89 
22:497-515
</UI>
<AU>Core NG;
    Edmiston EW;
    Saltz JH;
    Smith RM
</AU>
<TI>Supercomputers and Biological Sequence Comparison Algorithms
</TI>
<SU>Pairwise comparison;
    Parallel;
    USA;
    Dynamic programming;
    Sequence comparison;
    Algorithm
</SU>
<AB>"One method of increasing the speed of the calculations [to compare
biological sequences] is to perform them in parallel. We present the results of
initial investigations using two dynamic programming algorithms on the Intel
iPSC hypercube and the Connection Machine as well as an inexpensive,
heuristically-based algorithm on the Encore Multimax."
</AB>
<JT>Comput Biomed Res</JT>
<PY>1989</PY>
<VO>22</VO>
<PP>497-515</PP>
</SEQ>

<SEQ>
<UI>0475   Cormen,T.H.   String Matching          Introduction .. 90MIT Press
</UI>
<AU>Cormen TH;
    Leiserson CE;
    Rivest RL
</AU>
<TI>String Matching
</TI>
<BK>Introduction to Algorithms
</BK>
<SU>Review;
    USA;
    Automata;
    String match;
    Knuth-Morris-Pratt;
    Boyer-Moore
</SU>
<AB>Chapter 34: The naive string-matching algorithm. The Rabin-Karp
algorithm. String matching with finite automata. The Knuth-Morris-Pratt
algorithm. The Boyer-Moore algorithm.
</AB>
<PU>MIT Press </PU>
<PL>Cambridge, MA </PL>
<PY>1990</PY>
<PP>853-885</PP>
</SEQ>

<SEQ>
<UI>0476   Corpet,F.     Multiple Sequence Alig.. Nucleic Acids R 88 
16(22):10881-1
</UI>
<AU>Corpet F
</AU>
<TI>Multiple Sequence Alignment with Hierarchical Clustering
</TI>
<SU>Multiple alignment;
    FR;
    Sequence alignment;
    Clustering;
    Hierarchical
</SU>
<AB>This approach is based on the conventional dynamic-programming method of
pairwise alignment. Initially, a hierarchical clustering of the sequences is
calculated from the matrix of the pairwise alignment scores. From the clustering
a multiple sequence alignment is derived. The pairwise alignments from the
multiple alignment form a new matrix of pairwise alignment scores. If this matrix
differs from the previous one, the process can be iterated.
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1988</PY>
<VO>16</VO>
<NO>22</NO>
<PP>10881-10890</PP>
</SEQ>

<SEQ>
<UI>0477   Coulson,A.F.W Protein and Nucleic Ac.. Comput.J.       87 
30(5):420-424
</UI>
<AU>Coulson AFW;
    Collins JF;
    Lyall A
</AU>
<TI>Protein and Nucleic Acid Sequence Database Searching: a Suitable Case for
Parallel Processing
</TI>
<SU>Database search;
    Parallel;
    UK;
    Sequence database;
    Significance;
    Protein;
    Nucleic acid
</SU>
<AB>"Sequence analysis of protein and nucleic acid databases by exhaustive
string-matching algorithms is effectively implemented on large processor-array
machines, such as the I. C. L. DAP. An improved method of assessing the
significance of the best alignments for proteins is described."
</AB>
<JT>Comput J</JT>
<PY>1987</PY>
<VO>30</VO>
<NO>5</NO>
<PP>420-424</PP>
</SEQ>

<SEQ>
<UI>0478   Crochemore,M. Two-Way String-Matching  J.Assoc.Comput. 91 
38(3):651-675
</UI>
<AU>Crochemore M;
    Perrin D
</AU>
<TI>Two-Way String-Matching
</TI>
<SU>String match;
    Boyer-Moore;
    Knuth-Morris-Pratt;
    FR
</SU>
<AB>"A new string-matching algorithm is presented, which can be viewed as an
intermediate between the classical algorithms of Knuth, Morris, and Pratt on the
one hand and Boyer and Moore, on the other hand. The algorithm is linear in time
and uses constant space as the algorithm of Galil and Seiferas. It [is]
remarkably simple which consequently makes its analysis possible."
</AB>
<JT>J Assoc Comput Mach</JT>
<PY>1991</PY>
<VO>38</VO>
<NO>3</NO>
<PP>651-675</PP>
</SEQ>

<SEQ>
<UI>0479   Czelusniak,J. Maximum Parsimony Appr.. Methods Enzymol 90 
183:601-615
</UI>
<AU>Czelusniak J;
    Goodman M;
    Moncrief ND;
    Kehoe SM
</AU>
<TI>Maximum Parsimony Approach to Construction of Evolutionary Trees from
Aligned Homologous Sequences
</TI>
<SU>Multiple alignment;
    Phylogeny;
    USA;
    Evolutionary tree;
    Parsimony
</SU>
<AB>"We feel we have demonstrated (by example but not by rigorous proof) that
our heuristic maximum parsimony search procedures can approach the correct
evolutionary tree."
</AB>
<JT>Methods Enzymol</JT>
<PY>1990</PY>
<VO>183</VO>
<PP>601-615</PP>
</SEQ>

<SEQ>
<UI>0480   Dardel,F.     DNAid: A Macintosh Ful.. Comput.Appl.Bio 88 
4(4):483-486
</UI>
<AU>Dardel F;
    Bensoussan P
</AU>
<TI>DNAid: A Macintosh Full Screen Editor Featuring a Built-in Regular
Expression Interpreter for the Search of Specific Patterns in Biological
Sequences using Finite State Automata
</TI>
<SU>Dictionary match;
    Automata;
    FR;
    Language;
    Pattern search;
    Pattern language;
    Editor;
    Expression
</SU>
<AB>"In addition to the classical editing capabilities, powerful analysis and
search functions are available from within the editor. ... Furthermore a pattern
matching language is included which allows searches for user-defined strict or
fuzzy signals within biological sequences. Patterns are translated into finite
state automata which allow very efficient searches."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1988</PY>
<VO>4</VO>
<NO>4</NO>
<PP>483-486</PP>
</SEQ>

<SEQ>
<UI>0481   Davies,G.     Algorithms for Pattern.. Software.Practi 86 
16(6):575-601
</UI>
<AU>Davies G;
    Bowsher S
</AU>
<TI>Algorithms for Pattern Matching
</TI>
<SU>String match;
    Review;
    UK;
    Pattern match;
    Complexity;
    String search;
    Algorithm
</SU>
<AB>"This paper describes four algorithms of varying complexity used for
pattern matching, and investigates their behaviour. The algorithms are tested
using patterns of varying length from several alphabets."
</AB>
<JT>Software Practice Experience </JT>
<PY>1986</PY>
<VO>16</VO>
<NO>6</NO>
<PP>575-601</PP>
</SEQ>

<SEQ>
<UI>0482   Davison,D.    Sequence Similarity ('.. Bull.Math.Biol. 85 
47(4):437-474
</UI>
<AU>Davison D
</AU>
<TI>Sequence Similarity ('Homology') Searching for Molecular Biologists
</TI>
<SU>Pairwise comparison;
    Review;
    USA;
    Sequence alignment;
    Similarity
</SU>
<AB>"Major types of sequence similarity searching ... are reviewed and
examples of each are presented. The features and limitations of each type of
program, and individual implementations of each type are discussed. Two pairs of
sequences are used as examples to show how implementations of each type differ
in their results and their presentation. Both local and global alignment
programs are examined...."
</AB>
<JT>Bull Math Biol</JT>
<PY>1985</PY>
<VO>47</VO>
<NO>4</NO>
<PP>437-474</PP>
</SEQ>

<SEQ>
<UI>0483   Davison,D.    A Non-metric Sequence .. Bull.Math.Biol. 84 
46(4):579-590
</UI>
<AU>Davison D;
    Thompson KH
</AU>
<TI>A Non-metric Sequence Alignment Program
</TI>
<SU>Pairwise alignment;
    USA;
    Sequence alignment;
    Region;
    Program
</SU>
<AB>"An algorithm for nucleic acid and protein sequence alignment is
presented. It is a non-metric local similarity minimal-difference algorithm and
in the current implementation, assembles the matching regions found into a
pseudo-global format. Its strength[s] are its speed of execution and the especially
convenient presentation of its output. The algorithm is intended for use in
sequence melding and local (small-region) similarity searching."
</AB>
<JT>Bull Math Biol</JT>
<PY>1984</PY>
<VO>46</VO>
<NO>4</NO>
<PP>579-590</PP>
</SEQ>

<SEQ>
<UI>0484   Day,G.R.      Statistical Significan.. Nucleic Acids R 82 
10(24):8323-83
</UI>
<AU>Day GR;
    Blake RD
</AU>
<TI>Statistical Significance of Symmetrical and Repetitive Segments of DNA
</TI>
<SU>Sequence analysis;
    Significance;
    USA;
    Statistical;
    Repetition;
    Segment;
    DNA
</SU>
<AB>"Methods of computer analysis for the recurrence of symmetrical and
repetitive elements in large numbers of DNA sequences are described, together
with derivations of appropriate quantitative criteria for the evaluation of the
statistical significance of these elements in DNAs of different base
composition."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1982</PY>
<VO>10</VO>
<NO>24</NO>
<PP>8323-8339</PP>
</SEQ>

<SEQ>
<UI>0485   Day,W.H.E.    Properties of Levensht.. Bull.Math.Biol. 84 
46(2):327-332
</UI>
<AU>Day WHE
</AU>
<TI>Properties of Levenshtein Metrics on Sequences
</TI>
<SU>Sequence proximity;
    CA
</SU>
<AB>"Those Levenshtein dissimilarity measures based on insertions and
deletions are analyzed by a model involving valuations on a partially ordered
set. The model reveals structural relationships among poset, valuation and
dissimilarity measure. As a consequence, certain Levenshtein dissimilarity
measures are shown to be metrics characterized by betweenness properties and
computable in terms of well-known measures of sequence similarity."
</AB>
<JT>Bull Math Biol</JT>
<PY>1984</PY>
<VO>46</VO>
<NO>2</NO>
<PP>327-332</PP>
</SEQ>

<SEQ>
<UI>0486   Day,W.H.E.    An Empirical Evaluatio.. New Approache.. 
94Springer-Verlag
</UI>
<AU>Day WHE;
    Gordon AD
</AU>
<TI>An Empirical Evaluation of Consensus Rules for Molecular Sequences
</TI>
<ED>Diday E;
    Lechevallier Y;
    Schader M;
    Bertrand P;
    Burtschy B
</ED>
<BK>New Approaches in Classification and Data Analysis
</BK>
<SU>Consensus sequence;
    Probabilistic;
    CA
</SU>
<AB>"We investigate relationships among several consensus methods for
molecular sequences: c(P*), the containing subset method of Gordon [1994]; gp,
the generalized plurality rule method of Day and McMorris (1992); and sp, a
method based on the simple plurality rule."
</AB>
<PU>Springer-Verlag </PU>
<PL>Berlin </PL>
<PY>1994</PY>
<PP>347-355</PP>
</SEQ>

<SEQ>
<UI>0487   Day,W.H.E.    Consensus Sequences Ba.. Bull.Math.Biol. 92 
54(6):1057-106
</UI>
<AU>Day WHE;
    McMorris FR
</AU>
<TI>Consensus Sequences Based on Plurality Rule
</TI>
<SU>Consensus sequence;
    Plurality rule;
    CA;
    Consensus method
</SU>
<AB>"We apply concepts of social choice theory, in particular those concerning
median and plurality rules, to investigate the problem of finding a consensus of
aligned molecular sequences. ... Our results concern plurality rules which are
median rules, are characterized by the Condorcet properties, and are efficient
to calculate. Our approach is axiomatic."
</AB>
<JT>Bull Math Biol</JT>
<PY>1992</PY>
<VO>54</VO>
<NO>6</NO>
<PP>1057-1068</PP>
</SEQ>

<SEQ>
<UI>0488   Day,W.H.E.    Critical Comparison of.. Nucleic Acids R 92 
20(5):1093-109
</UI>
<AU>Day WHE;
    McMorris FR
</AU>
<TI>Critical Comparison of Consensus Methods for Molecular Sequences
</TI>
<SU>Consensus sequence;
    Review;
    CA;
    Consensus method
</SU>
<AB>"We conducted a critical comparison of nine consensus methods for
sequences, of which eight were used in papers appearing in this journal. We
report the results of that comparison, and we make recommendations which we 
hope
will assist researchers when they must select particular consensus methods for
particular applications."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1992</PY>
<VO>20</VO>
<NO>5</NO>
<PP>1093-1099</PP>
</SEQ>

<SEQ>
<UI>0489   Day,W.H.E.    Interpreting Consensus.. Math.Biosci.    92 
111(2):231-247
</UI>
<AU>Day WHE;
    McMorris FR
</AU>
<TI>Interpreting Consensus Sequences Based on Plurality Rule
</TI>
<SU>Consensus sequence;
    Plurality rule;
    CA;
    Consensus method;
    Profile
</SU>
<AB>"Our goal is to help researchers interpret the results of a function,
based on the concept of plurality rule, that calculates a consensus of a profile
of molecular bases. By expressing the plurality rule function as a composition
of simpler functions, we obtain both an algorithm to calculate the consensus
result and an upper bound on the number of nonequivalent results."
</AB>
<JT>Math Biosci</JT>
<PY>1992</PY>
<VO>111</VO>
<NO>2</NO>
<PP>231-247</PP>
</SEQ>

<SEQ>
<UI>0490   Day,W.H.E.    Threshold Consensus Me.. J.Theor.Biol.   92 
159(4):481-489
</UI>
<AU>Day WHE;
    McMorris FR
</AU>
<TI>Threshold Consensus Methods for Molecular Sequences
</TI>
<SU>Consensus sequence;
    CA;
    Consensus method
</SU>
<AB>"We introduce a parameterized threshold consensus method ... for molecular
sequences which is based on a majority-rule voting principle."
</AB>
<JT>J Theor Biol</JT>
<PY>1992</PY>
<VO>159</VO>
<NO>4</NO>
<PP>481-489</PP>
</SEQ>

<SEQ>
<UI>0491   Day,W.H.E.    Analysing Molecular Se.. N.Z.J.Bot.      93 
31(3):211-218
</UI>
<AU>Day WHE;
    McMorris FR
</AU>
<TI>Analysing Molecular Sequences Using Consensus
</TI>
<SU>Consensus sequence;
    Review;
    CA;
    Consensus method
</SU>
<AB>"Methods for discovering consensus sequences are surveyed. Included are
methods based on frequency thresholds, voting strategies, heuristics,
neighbourhoods, and measures of inhomogeneity or information content."
</AB>
<JT>N Z J Bot</JT>
<PY>1993</PY>
<VO>31</VO>
<NO>3</NO>
<PP>211-218</PP>
</SEQ>

<SEQ>
<UI>0492   Day,W.H.E.    Discovering Consensus .. Information a.. 
93Springer-Verlag
</UI>
<AU>Day WHE;
    McMorris FR
</AU>
<TI>Discovering Consensus Molecular Sequences
</TI>
<ED>Opitz O;
    Lausen B;
    Klar R
</ED>
<BK>Information and Classification - Concepts, Methods and Applications
</BK>
<SU>Consensus sequence;
    Review;
    CA;
    Consensus method
</SU>
<AB>"We survey methods for discovering consensus sequences such as those based
on frequency thresholds, voting strategies, heuristics, neighbourhoods, and
measures of inhomogeneity or information content."
</AB>
<PU>Springer-Verlag </PU>
<PL>Berlin </PL>
<PY>1993</PY>
<PP>393-402</PP>
</SEQ>

<SEQ>
<UI>0493   Day,W.H.E.    On the Consistency of .. J.Classif.      94 
11(2):??-??
</UI>
<AU>Day WHE;
    McMorris FR
</AU>
<TI>On the Consistency of the Plurality Rule Consensus Function for Molecular
Sequences
</TI>
<SU>Consensus sequence;
    Plurality rule;
    CA;
    Function;
    Profile;
    Consistency
</SU>
<AB>The plurality rule consensus function fails to satisfy properties of
consistency that enable users to understand its behaviour for long profiles in
terms of its behaviour for short profiles. Because consistency is a desirable
feature of consensus functions, the authors explore the boundaries of its
applicability to the plurality rule consensus function.
</AB>
<JT>J Classif</JT>
<PY>1994</PY>
<VO>11</VO>
<NO>2</NO>
<PP>??-??</PP>
</SEQ>

<SEQ>
<UI>0494   Day,W.H.E.    The Computation of Con.. Math.Comput.Mod 93 
17(10):49-52
</UI>
<AU>Day WHE;
    McMorris FR
</AU>
<TI>The Computation of Consensus Patterns in DNA Sequences
</TI>
<SU>Consensus sequence;
    Longest common;
    Complexity;
    CA;
    DNA
</SU>
<AB>"Two important consensus problems are closely related to two well-known
sequence problems. M. Waterman's problem of finding consensus strings is a
natural extension of the Longest Common Substring problem. The problem of
identifying consensus subsequences is a natural extension of the Longest Common
Subsequence problem, and thus is NP-hard."
</AB>
<JT>Math Comput Modelling </JT>
<PY>1993</PY>
<VO>17</VO>
<NO>10</NO>
<PP>49-52</PP>
</SEQ>

<SEQ>
<UI>0495   Day,W.H.E.    On the Existence of Co.. J.Comput.Inform 91 
2(2):123-137
</UI>
<AU>Day WHE;
    Mirkin BG
</AU>
<TI>On the Existence of Constrained Partitions of Integers
</TI>
<SU>Consensus sequence;
    Plurality rule;
    CA;
    Consensus method
</SU>
<AB>"We confirm Day and McMorris's conjecture that the plurality-rule
consensus function has exactly 26 nonequivalent results when it is used to
analyse molecules with four bases."
</AB>
<JT>J Comput Inform</JT>
<PY>1991</PY>
<VO>2</VO>
<NO>2</NO>
<PP>123-137</PP>
</SEQ>

<SEQ>
<UI>0496   Dayhoff,M.O.  Establishing Homologie.. Methods Enzymol 83 
91:524-544
</UI>
<AU>Dayhoff MO;
    Barker WC;
    Hunt LT
</AU>
<TI>Establishing Homologies in Protein Sequences
</TI>
<SU>Sequence proximity;
    Substitution;
    USA;
    Statistical;
    PAM;
    Homology;
    Protein
</SU>
<AB>In Hirs,C.H.W., Timasheff,S.N. (Eds.), Enzyme Structure, Part I. "We will
be particularly concerned with statistical tests capable of illuminating even
very distant relationships. ... From these tests we concluded that ... the MDM
matrix [mutation data scoring matrix] for 250 PAMs (Fig. 3) is the best matrix
for detecting distantly related sequences."
</AB>
<JT>Methods Enzymol</JT>
<PY>1983</PY>
<VO>91</VO>
<PP>524-544</PP>
</SEQ>

<SEQ>
<UI>0497   Dayhoff,M.O.  A Model of Evolutionar.. Atlas of Prot.. 78National 
Biomed
</UI>
<AU>Dayhoff MO;
    Schwartz RM;
    Orcutt BC
</AU>
<TI>A Model of Evolutionary Change in Proteins
</TI>
<ED>Dayhoff MO
</ED>
<BK>Atlas of Protein Sequence and Structure, Volume 5, Supplement 3, 1978
</BK>
<SU>Sequence proximity;
    Substitution;
    PAM;
    Scoring;
    USA;
    Evolutionary distance;
    Protein;
    Model
</SU>
<AB>"The matrices derived from [protein] data that describe the amino acid
replacement probabilities between two sequences at various evolutionary
distances are more accurate and the scoring matrix that is derived is more
sensitive in detecting distant relationships than the one that we previously
derived."
</AB>
<PU>National Biomedical Research Foundation </PU>
<PL>Washington, DC </PL>
<PY>1978</PY>
<PP>345-352</PP>
</SEQ>

<SEQ>
<UI>0498   Deken,J.      Probabilistic Behavior.. Time Warps, S.. 
83Addison-Wesley
</UI>
<AU>Deken J
</AU>
<TI>Probabilistic Behavior of Longest-Common-Subsequence Length
</TI>
<ED>Sankoff D;
    Kruskal JB
</ED>
<BK>Time Warps, String Edits, and Macromolecules: The Theory and Practice of
Sequence Comparison
</BK>
<SU>Longest common;
    Significance;
    USA;
    Probabilistic
</SU>
<AB>"For a fairly broad class of models for random sequences (cf. Deken 1979),
it can be shown that the length of a longest common subsequence, divided by the
total sequence length, approaches ... some constant c_k that is a function of the
random model used and the number k of letters in the alphabet. ... The counting
method used to derive the upper bounds [of c_k] is of some independent interest,
and so is described below."
</AB>
<PU>Addison-Wesley </PU>
<PL>Reading, MA </PL>
<PY>1983</PY>
<PP>359-362</PP>
</SEQ>

<SEQ>
<UI>0499   Deken,J.G.    Some Limit Results for.. Discrete Math.  79 26:17-31
</UI>
<AU>Deken JG
</AU>
<TI>Some Limit Results for Longest Common Subsequences
</TI>
<SU>Longest common;
    Significance;
    USA;
    Subsequence
</SU>
<AB>"We specialize to uniform random sequences and give some improvements on
the lower bounds previously obtained [Chvatal, Sankoff 1975] for the proportion
of digits which can be matched in the limit. The main result ... is a system of
lower bounds which improve known results for alphabets of &gt;2 letters."
</AB>
<JT>Discrete Math</JT>
<PY>1979</PY>
<VO>26</VO>
<PP>17-31</PP>
</SEQ>

<SEQ>
<UI>0500   DeLisi,C.     Assessing the Signific.. Math.Biosci.    84 69:77-85
</UI>
<AU>DeLisi C;
    Kanehisa M
</AU>
<TI>Assessing the Significance of Local Sequence Homologies
</TI>
<SU>Sequence analysis;
    Significance;
    USA;
    Statistical;
    Probabilistic;
    Homology
</SU>
<AB>"When homology searches are performed against databases containing several
hundred thousand residues, an important number to know is the probability that
the homologous sequence would have occurred simply as the result of the large
number of arrangements of short sequences that must be present in any large
collection of disparate sequences. In this note we evaluate different versions
of this question using different sets of rules."
</AB>
<JT>Math Biosci</JT>
<PY>1984</PY>
<VO>69</VO>
<PP>77-85</PP>
</SEQ>

<SEQ>
<UI>0501   Depiereux,E.  Simultaneous and Multi.. Protein Eng.    91 
4(6):603-613
</UI>
<AU>Depiereux E;
    Feytmans E
</AU>
<TI>Simultaneous and Multivariate Alignment of Protein Sequences:
Correspondence between Physicochemical Profiles and Structurally Conserved
Regions (SCR)
</TI>
<SU>Multiple alignment;
    Belgium;
    Multivariate;
    Structure;
    Region;
    Profile;
    Protein
</SU>
<AB>"The method ... is based on two basic requirements for a meaningful
alignment. First, each sequence or segment of a sequence is characterized by a
multivariate physicochemical profile. Second, the alignment is performed by
considering all the sequences simultaneously, and the algorithm detects those
regions that form a set of similar profiles."
</AB>
<JT>Protein Eng</JT>
<PY>1991</PY>
<VO>4</VO>
<NO>6</NO>
<PP>603-613</PP>
</SEQ>

<SEQ>
<UI>0502   Depiereux,E.  MATCH-BOX: A Fundament.. Comput.Appl.Bio 92 
8(5):501-509
</UI>
<AU>Depiereux E;
    Feytmans E
</AU>
<TI>MATCH-BOX: A Fundamentally New Algorithm for the Simultaneous Alignment of
Several Protein Sequences
</TI>
<SU>Multiple alignment;
    Segment;
    Belgium;
    Gap;
    Protein;
    Algorithm
</SU>
<AB>The main problems in automatic procedures for multiple alignment are
related to the successive pairwise alignment approach and to the choice of gap
weighting. The authors' algorithm searches for complete matches common to all
the sequences without performing pairwise alignment and regardless of gap
weighting.
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1992</PY>
<VO>8</VO>
<NO>5</NO>
<PP>501-509</PP>
</SEQ>

<SEQ>
<UI>0503   Deshpande,A.S A Platform for Biologi.. Comput.Appl.Bio 91 
7(2):237-247
</UI>
<AU>Deshpande AS;
    Richards DS;
    Pearson WR
</AU>
<TI>A Platform for Biological Sequence Comparison on Parallel Computers
</TI>
<SU>Database search;
    Parallel;
    USA;
    Sequence comparison;
    Sequence database;
    FASTA;
    Program
</SU>
<AB>"We have written two programs for searching biological sequence databases
that run on Intel hypercube computers. PSCANLIB compares a single sequence
against a sequence library, and PCOMPLIB compares all the entries in one
sequence library against a second library. ... We have implemented the rapid
FASTA sequence comparison algorithm and the more rigorous Smith-Waterman
algorithm within this framework."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1991</PY>
<VO>7</VO>
<NO>2</NO>
<PP>237-247</PP>
</SEQ>

<SEQ>
<UI>0504   Devereux,J.   A Comprehensive Set of.. Nucleic Acids R 84 
12(1):387-395
</UI>
<AU>Devereux J;
    Haeberli P;
    Smithies O
</AU>
<TI>A Comprehensive Set of Sequence Analysis Programs for the VAX
</TI>
<SU>Sequence analysis;
    USA;
    Program;
    Sequence comparison
</SU>
<AB>"The University of Wisconsin Genetics Computer Group (UWGCG) has been
organized to develop computational tools for the analysis and publication of
biological sequence data. A group of programs that will interact with each other
has been developed for the Digital Equipment Corporation VAX computer using the
VMS system."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1984</PY>
<VO>12</VO>
<NO>1</NO>
<PP>387-395</PP>
</SEQ>

<SEQ>
<UI>0505   Doolittle,R.F Similar Amino Acid Seq.. Science         81 214(9 
Oct.):14
</UI>
<AU>Doolittle RF
</AU>
<TI>Similar Amino Acid Sequences: Chance or Common Ancestry
</TI>
<SU>Significance;
    USA;
    Sequence comparison;
    Amino acid
</SU>
<AB>"Sometimes the surviving similarities [between amino acid sequences] are
so vague that even computer-based sequence comparison procedures are unable to
validate relationships. In other cases similar sequences may appear in totally
alien proteins as a result of mere chance or, occasionally, by the convergent
evolution of sequences with special properties."
</AB>
<JT>Science </JT>
<PY>1981</PY>
<VO>214</VO>
<NO>9 Oct.</NO>
<PP>149-159</PP>
</SEQ>

<SEQ>
<UI>0506   Doolittle,R.F Searching through Sequ.. Methods Enzymol 90 
183:99-110
</UI>
<AU>Doolittle RF
</AU>
<TI>Searching through Sequence Databases
</TI>
<SU>Database search;
    USA;
    Sequence database;
    Sequence search;
    Significance
</SU>
<AB>"The results of a sequence search usually require that judgments be made
about the significance of what has or has not been found. The primary aim of
this chapter is to provide a few simple guidelines and hints about how to make
these judgments. When it comes to low-level similarity, caution is always
warranted."
</AB>
<JT>Methods Enzymol</JT>
<PY>1990</PY>
<VO>183</VO>
<PP>99-110</PP>
</SEQ>

<SEQ>
<UI>0507   Doolittle,R.F Reconstructing History.. Protein Sci.    92 
1(2):191-200
</UI>
<AU>Doolittle RF
</AU>
<TI>Reconstructing History with Amino Acids
</TI>
<SU>Phylogeny;
    USA;
    Module;
    Mosaic;
    Shuffling
</SU>
<AB>"Among the factors that can confound the reconstruction of events,
however, are occasional horizontal gene transfers and exon shuffling. The latter
has led to a number of mosaic proteins, many of which contain various
combinations of a relatively small set of modules like the epidermal growth
factor domain."
</AB>
<JT>Protein Sci</JT>
<PY>1992</PY>
<VO>1</VO>
<NO>2</NO>
<PP>191-200</PP>
</SEQ>

<SEQ>
<UI>0508   Doolittle,R.F Nearest Neighbor Proce.. Methods Enzymol 90 
183:659-669
</UI>
<AU>Doolittle RF;
    Feng DF
</AU>
<TI>Nearest Neighbor Procedure for Relating Progressively Aligned Amino Acid
Sequences
</TI>
<SU>Multiple alignment;
    Phylogeny;
    USA;
    Amino acid
</SU>
<AB>An application of the progressive alignment method (Feng and Doolittle 1987,
1990): a nearest neighbor method called PAPA (parsimony after progressive
alignment).
</AB>
<JT>Methods Enzymol</JT>
<PY>1990</PY>
<VO>183</VO>
<PP>659-669</PP>
</SEQ>

<SEQ>
<UI>0509   Dumas,J.P.    Efficient Algorithms f.. Nucleic Acids R 82 
10(1):197-206
</UI>
<AU>Dumas JP;
    Ninio J
</AU>
<TI>Efficient Algorithms for Folding and Comparing Nucleic Acid Sequences
</TI>
<SU>Pairwise comparison;
    FR;
    Repeat;
    Structure;
    Regularities;
    Nucleic acid;
    Folding;
    Algorithm
</SU>
<AB>"Fast algorithms for analysing sequence data are presented. An algorithm
for strict homologies finds all common subsequences of length &gt;= 6 in two given
sequences. ... We shall describe, in its simplest form, an algorithm to search
for strict repeats within a sequence. ... With minor changes the algorithm
searches for palindromes or inverted repeats or searches for homologies,
inverted homologies or complementarities between two sequences."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1982</PY>
<VO>10</VO>
<NO>1</NO>
<PP>197-206</PP>
</SEQ>

<SEQ>
<UI>0510   Edmiston,E.W. Parallel Processing of.. Internat.J.Para 88 
17(3):259-275
</UI>
<AU>Edmiston EW;
    Core NG;
    Saltz JH;
    Smith RM
</AU>
<TI>Parallel Processing of Biological Sequence Comparison Algorithms
</TI>
<SU>Pairwise alignment;
    Subalignment;
    Parallel;
    USA;
    Sequence comparison;
    Algorithm
</SU>
<AB>"We present the results of initial investigations using the Intel iPSC/1
hypercube and the Connection Machine (CM-I) for these comparisons [of biological
sequences]. Since these machines have very different architectures, the issues
and performance trade-offs discussed have a wide applicability for the parallel
processing of biological sequence comparisons."
</AB>
<JT>Internat J Parallel Programming </JT>
<PY>1988</PY>
<VO>17</VO>
<NO>3</NO>
<PP>259-275</PP>
</SEQ>

<SEQ>
<UI>0511   Edmiston,E.   Parallelization of the.. Proceedings o.. 87Penn 
State Pres
</UI>
<AU>Edmiston E;
    Wagner RA
</AU>
<TI>Parallelization of the Dynamic Programming Algorithm for Comparison of
Sequences
</TI>
<BK>Proceedings of the 1987 International Conference on Parallel Processing
</BK>
<SU>Pairwise alignment;
    Parallel;
    USA;
    IL;
    Dynamic programming;
    Dynamic;
    Algorithm
</SU>
<AB>17-21 August 1987, Chicago, IL. "We look at parallel algorithms for two
similar problems: finding a best match between two sequences and finding a best
match of a short sequence to a subsequence of a long sequence. A method for
parallelizing the dynamic programming solutions to both of these problems is
presented. ... The parallel algorithms can execute on an SIMD machine ...."
</AB>
<PU>Penn State Press </PU>
<PL>Philadelphia, PA </PL>
<PY>1987</PY>
<PP>78-80</PP>
</SEQ>

<SEQ>
<UI>0512   Eilam-Tzoreff Matching Patterns in S.. Theoret.Comput. 88 
60(3):231-254
</UI>
<AU>Eilam-Tzoreff T;
    Vishkin U
</AU>
<TI>Matching Patterns in Strings Subject to Multi-linear Transformations
</TI>
<SU>String match;
    Pattern match;
    IL
</SU>
<AB>"Suppose we are given two strings of real numbers. ... We consider
problems within the following framework. Suppose each symbol of the pattern was
modified by any transformation which is a member in some family of
transformations. Find all occurrences of the pattern in the text where the
pattern may appear subject to any one of these transformations. Problems are
introduced and efficient algorithms are given."
</AB>
<JT>Theoret Comput Sci</JT>
<PY>1988</PY>
<VO>60</VO>
<NO>3</NO>
<PP>231-254</PP>
</SEQ>

<SEQ>
<UI>0513   Eppstein,D.   Sequence Comparison wi.. J.Algorithms    90 
11(1):85-101
</UI>
<AU>Eppstein D
</AU>
<TI>Sequence Comparison with Mixed Convex and Concave Costs
</TI>
<SU>Pairwise alignment;
    Sequence proximity;
    Gap;
    USA;
    Sequence comparison
</SU>
<AB>"Recently a number of algorithms have been developed for solving the
minimum-weight edit sequence problem with non-linear costs for multiple
insertions and deletions. We extend these algorithms to cost functions that are
neither convex nor concave, but a mixture of both."
</AB>
<JT>J Algorithms </JT>
<PY>1990</PY>
<VO>11</VO>
<NO>1</NO>
<PP>85-101</PP>
</SEQ>

<SEQ>
<UI>0514   Eppstein,D.   Speeding up Dynamic Pr.. IEEE Sympos.Fou 88 
29:488-496
</UI>
<AU>Eppstein D;
    Galil Z;
    Giancarlo R
</AU>
<TI>Speeding up Dynamic Programming
</TI>
<SU>Pairwise alignment;
    USA;
    Dynamic programming;
    Data structure;
    Dynamic
</SU>
<AB>24-26 October 1988. "A number of important computational problems in
molecular biology ... can be expressed as recurrences which have typically been
solved with dynamic programming. By using more sophisticated data structures,
and by taking advantage of further structure from the applications, we speed up
the computation of several of these recurrences by one or two orders of
magnitude."
</AB>
<JT>IEEE Sympos Found Comput Sci</JT>
<PY>1988</PY>
<VO>29</VO>
<PP>488-496</PP>
</SEQ>

<SEQ>
<UI>0515   Eppstein,D.   Sparse Dynamic Program.. J.Assoc.Comput. 92 
39(3):519-545
</UI>
<AU>Eppstein D;
    Galil Z;
    Giancarlo R;
    Italiano GF
</AU>
<TI>Sparse Dynamic Programming I: Linear Cost Functions
</TI>
<SU>Pairwise alignment;
    USA;
    Dynamic programming;
    Sequence comparison;
    Function;
    Dynamic
</SU>
<AB>"Dynamic programming solutions to a number of different recurrence
equations for sequence comparison ... are considered. These recurrences are
defined over a number of points that is quadratic in the input size; however
only a sparse set matters for the result. Efficient algorithms for these
problems are given, when the weight functions used in the recurrences are taken
to be linear."
</AB>
<JT>J Assoc Comput Mach</JT>
<PY>1992</PY>
<VO>39</VO>
<NO>3</NO>
<PP>519-545</PP>
</SEQ>

<SEQ>
<UI>0516   Eppstein,D.   Sparse Dynamic Program.. J.Assoc.Comput. 92 
39(3):546-567
</UI>
<AU>Eppstein D;
    Galil Z;
    Giancarlo R;
    Italiano GF
</AU>
<TI>Sparse Dynamic Programming II: Convex and Concave Cost Functions
</TI>
<SU>Pairwise alignment;
    USA;
    Dynamic programming;
    Gap;
    Function;
    Dynamic
</SU>
<AB>Continues Eppstein et al. (1992a). "Efficient algorithms are given for
solving these problems, when the cost of a gap in the alignment ... is taken as
a convex or concave function of the gap ... length."
</AB>
<JT>J Assoc Comput Mach</JT>
<PY>1992</PY>
<VO>39</VO>
<NO>3</NO>
<PP>546-567</PP>
</SEQ>

<SEQ>
<UI>0517   Erickson,B.W. Recognition of Pattern.. Time Warps, S.. 
83Addison-Wesley
</UI>
<AU>Erickson BW;
    Sellers PH
</AU>
<TI>Recognition of Patterns in Genetic Sequences
</TI>
<ED>Sankoff D;
    Kruskal JB
</ED>
<BK>Time Warps, String Edits, and Macromolecules: The Theory and Practice of
Sequence Comparison
</BK>
<SU>Match with k differences;
    USA;
    Genetic;
    Recognition
</SU>
<AB>This chapter "deals with the question of how to find the consecutive
string (or strings) in a longer sequence a with 'best possible agreement' to a
shorter sequence b. ... It presents algorithms to solve two different versions
of this question .... In one version, best agreement means that the string has
the smallest possible distance to b. In the other, best agreement ... means that
no substring or superstring has smaller distance to b."
</AB>
<PU>Addison-Wesley </PU>
<PL>Reading, MA </PL>
<PY>1983</PY>
<PP>55-91</PP>
</SEQ>

<SEQ>
<UI>0518   Felsenstein,J Phylogenies from Molec.. Annu.Rev.Genet. 88 
22:521-565
</UI>
<AU>Felsenstein J
</AU>
<TI>Phylogenies from Molecular Sequences: Inference and Reliability
</TI>
<SU>Multiple alignment;
    Phylogeny;
    Statistical;
    Reliability;
    Analytical;
    Robustness;
    USA
</SU>
<AB>Estimating phylogenies. Methods for inferring phylogenies. Statistics and
the justification of methods. Statistical tests of phylogenies. The bootstrap,
the jackknife, and other resampling methods. Simulation studies.
</AB>
<JT>Annu Rev Genet</JT>
<PY>1988</PY>
<VO>22</VO>
<PP>521-565</PP>
</SEQ>

<SEQ>
<UI>0519   Felsenstein,J An Efficient Method fo.. Nucleic Acids R 82 
10(1):133-139
</UI>
<AU>Felsenstein J;
    Sawyer S;
    Kochin R
</AU>
<TI>An Efficient Method for Matching Nucleic Acid Sequences
</TI>
<SU>Pairwise comparison;
    USA;
    Fourier;
    Nucleic acid
</SU>
<AB>"A method of computing the fraction of matches between two nucleic acid
sequences at all possible alignments is described. It makes use of the Fast
Fourier Transform. ... This method will complement algorithms for efficiently
finding the longest matching parts of two sequences, and is faster than existing
algorithms for finding matches allowing deletions and insertions."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1982</PY>
<VO>10</VO>
<NO>1</NO>
<PP>133-139</PP>
</SEQ>

<SEQ>
<UI>0520   Feng,D.F.     Progressive Sequence A.. J.Mol.Evol.     87 
25:351-360
</UI>
<AU>Feng DF;
    Doolittle RF
</AU>
<TI>Progressive Sequence Alignment as a Prerequisite to Correct Phylogenetic
Trees
</TI>
<SU>Multiple alignment;
    Clustering;
    USA;
    Sequence alignment;
    Needleman-Wunsch;
    Phylogenetic
</SU>
<AB>"A progressive alignment method is described that utilizes the Needleman
and Wunsch pairwise alignment algorithm iteratively to achieve the multiple
alignment of a set of protein sequences and to construct an evolutionary tree
depicting their relationship. ... The method has the added virtue of providing
multiple sequence alignments quickly and simply by completely objective
criteria."
</AB>
<JT>J Mol Evol</JT>
<PY>1987</PY>
<VO>25</VO>
<PP>351-360</PP>
</SEQ>

<SEQ>
<UI>0521   Feng,D.F.     Progressive Alignment .. Methods Enzymol 90 
183:375-387
</UI>
<AU>Feng DF;
    Doolittle RF
</AU>
<TI>Progressive Alignment and Phylogenetic Tree Construction of Protein
Sequences
</TI>
<SU>Multiple alignment;
    Clustering;
    USA;
    Gap;
    Protein;
    Phylogenetic
</SU>
<AB>"The ... progressive alignment method ... produces a multiple alignment
for a set of protein sequences by iteratively acting on the sequences. The
essence of the method is based on the simple rule, 'once a gap, always a gap.'
Consequently, the order in which the sequences are arranged is crucial. In this
regard, an approximate phylogenetic order of the sequences is first determined
by a series of pairwise alignments by the Needleman and Wunsch method."
</AB>
<JT>Methods Enzymol</JT>
<PY>1990</PY>
<VO>183</VO>
<PP>375-387</PP>
</SEQ>

<SEQ>
<UI>0522   Feng,D.F.     Aligning Amino Acid Se.. J.Mol.Evol.     85 
21:112-125
</UI>
<AU>Feng DF;
    Johnson MS;
    Doolittle RF
</AU>
<TI>Aligning Amino Acid Sequences: Comparison of Commonly Used Methods
</TI>
<SU>Sequence proximity;
    Substitution;
    Review;
    USA;
    Amino acid
</SU>
<AB>"We examined two extensive families of protein sequences using four
different alignment schemes that employ various degrees of 'weighting' in order
to determine which approach is most sensitive in establishing relationships. All
alignments used a similarity approach based on a general algorithm devised by
Needleman and Wunsch."
</AB>
<JT>J Mol Evol</JT>
<PY>1985</PY>
<VO>21</VO>
<PP>112-125</PP>
</SEQ>

<SEQ>
<UI>0523   Fickett,J.W.  Fast optimal alignment   Nucleic Acids R 84 
12(1):175-179
</UI>
<AU>Fickett JW
</AU>
<TI>Fast optimal alignment
</TI>
<SU>Pairwise alignment;
    USA;
    Sequence alignment;
    Optimal
</SU>
<AB>"We show how to speed up sequence alignment algorithms of the type
introduced by Needleman and Wunsch .... What we do is reorder the computation of
the usual alignment matrix so that the optimal alignment is ordinarily found
when only a small fraction of the matrix is filled. The number of matrix
elements which have to be computed is related to the distance between the
sequences being aligned ...."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1984</PY>
<VO>12</VO>
<NO>1</NO>
<PP>175-179</PP>
</SEQ>

<SEQ>
<UI>0524   Fischel-Ghods Alignment of Protein S.. Protein Eng.    90 
3(7):577-581
</UI>
<AU>Fischel-Ghodsian F;
    Mathiowitz G;
    Smith TF
</AU>
<TI>Alignment of Protein Sequences using Secondary Structure: a Modified
Dynamic Programming Method
</TI>
<SU>Multiple alignment;
    Structure;
    USA;
    Dynamic programming;
    Protein;
    Secondary;
    Dynamic
</SU>
<AB>"A method for comparison of protein sequences based on their primary and
secondary structure is described. ... Sequences are compared with a dynamic
programming method (STRALIGN) that includes a similarity matrix for both the
amino acids and secondary structure."
</AB>
<JT>Protein Eng</JT>
<PY>1990</PY>
<VO>3</VO>
<NO>7</NO>
<PP>577-581</PP>
</SEQ>

<SEQ>
<UI>0525   Fitch,W.M.    An Improved Method of .. J.Mol.Biol.     66 16:9-16
</UI>
<AU>Fitch WM
</AU>
<TI>An Improved Method of Testing for Evolutionary Homology
</TI>
<SU>Sequence proximity;
    USA;
    Homology
</SU>
<AB>"A more sensitive method of searching for a homologous relation between
two proteins is presented. The method depends on determining the minimum number
of nucleotides which must be altered to permit the conversion of one sequence
into the other."
</AB>
<JT>J Mol Biol</JT>
<PY>1966</PY>
<VO>16</VO>
<PP>9-16</PP>
</SEQ>

<SEQ>
<UI>0526   Fitch,W.M.    Further Improvements i.. J.Mol.Biol.     70 49:1-14
</UI>
<AU>Fitch WM
</AU>
<TI>Further Improvements in the Method of Testing for Evolutionary Homology
among Proteins
</TI>
<SU>Sequence proximity;
    USA;
    Homology;
    Protein
</SU>
<AB>"An earlier method for detecting significant genetic relatedness between
two gene products (Fitch 1966) is improved upon through three specific measures.
... The third measure [also shows] how quickly the probability that a result
would be ascribed to a chance event decreases as the length of the sequences
being compared is increased."
</AB>
<JT>J Mol Biol</JT>
<PY>1970</PY>
<VO>49</VO>
<PP>1-14</PP>
</SEQ>

<SEQ>
<UI>0527   Fitch,W.M.    Random Sequences         J.Mol.Biol.     83 
163:171-176
</UI>
<AU>Fitch WM
</AU>
<TI>Random Sequences
</TI>
<SU>Sequence analysis;
    Significance;
    USA
</SU>
<AB>"The meaning of random [sequence] is briefly discussed along with a
distinction between representative sequences and shuffled sequences. The
rationale in choosing between them and a method for shuffling a sequence while
preserving nearest-neighbor frequencies is given."
</AB>
<JT>J Mol Biol</JT>
<PY>1983</PY>
<VO>163</VO>
<PP>171-176</PP>
</SEQ>

<SEQ>
<UI>0528   Fitch,W.M.    Optimal Sequence Align.. Proc.Nat.Acad.S 83 
80:1382-1386
</UI>
<AU>Fitch WM;
    Smith TF
</AU>
<TI>Optimal Sequence Alignments
</TI>
<SU>Sequence proximity;
    USA;
    Sequence alignment;
    Codon;
    Gap;
    Optimal
</SU>
<AB>"Current theory is adequate to the task of finding an optimal alignment
between two character strings such as nucleic acids. Most algorithms currently
in use must fail to find the homologous alignment between a set of codons for
the chicken α- and β-hemoglobin sequence when it is in fact discoverable by a
more general treatment of gaps. Fundamental reasons for this are discussed."
</AB>
<JT>Proc Nat Acad Sci USA</JT>
<PY>1983</PY>
<VO>80</VO>
<PP>1382-1386</PP>
</SEQ>

<SEQ>
<UI>0529   Foulser,D.E.  Parallel Computation o.. Comput.Biomed.R 90 
23(4):310-331
</UI>
<AU>Foulser DE;
    Core NG
</AU>
<TI>Parallel Computation of Multiple Biological Sequence Comparisons
</TI>
<SU>Multiple comparison;
    Common feature;
    Parallel;
    USA;
    Sequence comparison;
    Search tree
</SU>
<AB>"This paper presents a parallel computer implementation of a suffix tree-based
method for rapid multiple sequence comparisons, as a variant on a method
proposed recently by Karlin et al." See Karlin, Ghandour, Ost, Tavare, and Korn
(1983), and Karlin, Morris, Ghandour, and Leung (1988).
</AB>
<JT>Comput Biomed Res</JT>
<PY>1990</PY>
<VO>23</VO>
<NO>4</NO>
<PP>310-331</PP>
</SEQ>

<SEQ>
<UI>0530   Fredman,M.L.  Algorithms for Computi.. Bull.Math.Biol. 84 
46(4):553-566
</UI>
<AU>Fredman ML
</AU>
<TI>Algorithms for Computing Evolutionary Similarity Measures with Length
Independent Gap Penalties
</TI>
<SU>Sequence alignment;
    Multiple alignment;
    USA;
    Gap;
    Similarity;
    Algorithm
</SU>
<AB>"We give algorithms for computing the extent of similarity between two or
three sequences of letters. The similarity measures we consider include a
penalty for inserting gaps within the sequence in order to enhance similarity.
The magnitude of the penalty for gaps is assumed to be independent of their size
in order to accommodate certain biological applications."
</AB>
<JT>Bull Math Biol</JT>
<PY>1984</PY>
<VO>46</VO>
<NO>4</NO>
<PP>553-566</PP>
</SEQ>

<SEQ>
<UI>0531   Friedemann,T. Alignment of Multiple .. Comput.Appl.Bio 88 
4(1):213-214
</UI>
<AU>Friedemann T
</AU>
<TI>Alignment of Multiple DNA and Protein Sequence Data
</TI>
<SU>Multiple alignment;
    Review;
    USA;
    Sequence alignment;
    Protein;
    DNA
</SU>
<AB>Summary of seven multiple sequence alignment programs and their applicability.
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1988</PY>
<VO>4</VO>
<NO>1</NO>
<PP>213-214</PP>
</SEQ>

<SEQ>
<UI>0532   Frishman,D.   Recognition of Distant.. J.Mol.Biol.     92 
228:951-962
</UI>
<AU>Frishman D;
    Argos P
</AU>
<TI>Recognition of Distantly Related Protein Sequences using Conserved Motifs
and Neural Networks
</TI>
<SU>Match complex patterns;
    DE;
    Motif;
    Neural;
    Protein;
    Network;
    Recognition
</SU>
<AB>"A sensitive technique for protein sequence motif recognition based on
neural networks has been developed. ... The objective of the present
investigation is to develop an automatic and sensitive algorithm to delineate
motifs in multiply aligned sequences and then to use these patterns in a search
for other distantly related primary structures."
</AB>
<JT>J Mol Biol</JT>
<PY>1992</PY>
<VO>228</VO>
<PP>951-962</PP>
</SEQ>

<SEQ>
<UI>0533   Fristensky,B. Improving the Efficien.. Nucleic Acids R 86 
14(1):597-610
</UI>
<AU>Fristensky B
</AU>
<TI>Improving the Efficiency of Dot-matrix Similarity Searches Through Use of
an Oligomer Table
</TI>
<SU>Pairwise comparison;
    Dot;
    USA;
    Similarity;
    Oligomer
</SU>
<AB>"Dot-matrix sequence similarity searches can be greatly speeded up through
use of a table listing all locations of short oligomers in one of the sequences
to find potential similarities with a second sequence. The algorithm described
finds similarities between two sequences ... [by] comparing L residues at a time
...."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1986</PY>
<VO>14</VO>
<NO>1</NO>
<PP>597-610</PP>
</SEQ>
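
<!--
A minimal Python sketch of the oligomer-table idea: index every length-L oligomer of one sequence, then look up each oligomer of the second sequence, so only position pairs that share an exact L-residue word are reported as dots. The names oligomer_table and dot_matches are hypothetical, and this is not Fristensky's program.

```python
from collections import defaultdict


def oligomer_table(seq, L):
    """Map every length-L oligomer in seq to the list of its 0-based start positions."""
    table = defaultdict(list)
    for i in range(len(seq) - L + 1):
        table[seq[i:i + L]].append(i)
    return table


def dot_matches(seq_a, seq_b, L):
    """Return sorted (i, j) pairs where seq_a[i:i+L] == seq_b[j:j+L].

    Only oligomers of seq_b found in the table built from seq_a are
    examined, so position pairs with no shared L-mer are never visited.
    """
    table = oligomer_table(seq_a, L)
    dots = []
    for j in range(len(seq_b) - L + 1):
        for i in table.get(seq_b[j:j + L], ()):  # .get does not add keys
            dots.append((i, j))
    return sorted(dots)
```
-->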

<SEQ>
<UI>0534   Fuchs,R.      MacPattern: Protein Pa.. Comput.Appl.Bio 91 
7(1):105-106
</UI>
<AU>Fuchs R
</AU>
<TI>MacPattern: Protein Pattern Searching on the Apple Macintosh
</TI>
<SU>Dictionary match;
    DE;
    Pattern definition;
    Protein
</SU>
<AB>"A program is described for rapid detection of protein sequence patterns
on the Apple Macintosh which makes full use of the information contained in the
PROSITE protein pattern database. ... The algorithm used for detecting patterns
is based on the set-membership matrix concept (Cockwell, Giles (1989)), adapted
to the PROSITE pattern definition syntax."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1991</PY>
<VO>7</VO>
<NO>1</NO>
<PP>105-106</PP>
</SEQ>

<SEQ>
<UI>0535   Fitch,W.M.    The Usefulness of Amin.. Evol.Biol.      70 4:67-109
</UI>
<AU>Fitch WM;
    Margoliash E
</AU>
<TI>The Usefulness of Amino Acid and Nucleotide Sequences in Evolutionary
Studies
</TI>
<SU>Pairwise comparison;
    Phylogeny;
    USA;
    Nucleotide;
    Amino acid
</SU>
<AB>Introduction. Detection of Significant Similarities Between Sequences.
Inferring Evolutionary Relationships from Sequence Information. Derived
Evolutionary and Genetic Information.
</AB>
<JT>Evol Biol</JT>
<PY>1970</PY>
<VO>4</VO>
<PP>67-109</PP>
</SEQ>

<SEQ>
<UI>0536   Galas,D.J.    Rigorous Pattern-recog.. J.Mol.Biol.     85 
186:117-128
</UI>
<AU>Galas DJ;
    Eggert M;
    Waterman MS
</AU>
<TI>Rigorous Pattern-recognition Methods for DNA Sequences: Analysis of
Promoter Sequences from Escherichia coli
</TI>
<SU>Consensus sequence;
    Neighbourhood;
    USA;
    Statistical;
    Significance;
    DNA
</SU>
<AB>"We have developed rigorous analytical methods for finding unknown
patterns that occur imperfectly in a set of several sequences, and have used
them to examine a set of bacterial promoters. ... We also have provided
estimates for the statistical significance of common patterns discovered in sets
of sequences."
</AB>
<JT>J Mol Biol</JT>
<PY>1985</PY>
<VO>186</VO>
<PP>117-128</PP>
</SEQ>

<SEQ>
<UI>0537   Galil,Z.      Real-Time Algorithms f.. ACM Sympos.Theo 76 
8:161-173
</UI>
<AU>Galil Z
</AU>
<TI>Real-Time Algorithms for String-Matching and Palindrome Recognition
</TI>
<SU>String match;
    Complexity;
    USA;
    Palindrome;
    Algorithm;
    Recognition
</SU>
<AB>Hershey, PA, 3-5 May 1976. "We give a sufficient condition when an on-line
algorithm can be transformed into a real-time algorithm. We use this condition
to construct real-time algorithms for string-matching and palindrome recognition
problems by random access machines and by Turing machines."
</AB>
<JT>ACM Sympos Theory Comput</JT>
<PY>1976</PY>
<VO>8</VO>
<PP>161-173</PP>
</SEQ>

<SEQ>
<UI>0538   Galil,Z.      On Improving the Worst.. Comm.ACM        79 
22(9):505-508
</UI>
<AU>Galil Z
</AU>
<TI>On Improving the Worst Case Running Time of the Boyer-Moore String
Matching Algorithm
</TI>
<SU>Boyer-Moore;
    IL;
    String match;
    Algorithm
</SU>
<AB>"It is shown how to modify the Boyer-Moore string matching algorithm so
that its worst case running time is linear even when multiple occurrences of the
pattern are present in the text."
</AB>
<JT>Comm ACM</JT>
<PY>1979</PY>
<VO>22</VO>
<NO>9</NO>
<PP>505-508</PP>
</SEQ>

<SEQ>
<UI>0539   Galil,Z.      String Matching in Rea.. J.Assoc.Comput. 81 
28(1):134-149
</UI>
<AU>Galil Z
</AU>
<TI>String Matching in Real Time
</TI>
<SU>Knuth-Morris-Pratt;
    IL;
    String match
</SU>
<AB>"A sufficient condition for an on-line algorithm to be transformed into a
real-time algorithm is given. This condition is used to construct real-time
algorithms for various string-matching problems by random access machines and by
Turing machines." Real-time Knuth-Morris-Pratt algorithms are described for
random access machines and Turing machines.
</AB>
<JT>J Assoc Comput Mach</JT>
<PY>1981</PY>
<VO>28</VO>
<NO>1</NO>
<PP>134-149</PP>
</SEQ>
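
<!--
For reference, a Python sketch of the standard Knuth-Morris-Pratt search that these real-time results start from. It runs in linear total time but is not real-time: a single text character can trigger several failure-link steps in the inner while loop, which is the per-character delay that a real-time construction must bound by a constant. The function names are hypothetical.

```python
def failure_function(pattern):
    """fail[i] = length of the longest proper border of pattern[:i + 1]."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]  # fall back to the next shorter border
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text, pattern):
    """Return 0-based starts of all occurrences of a non-empty pattern in text."""
    fail = failure_function(pattern)
    hits, k = [], 0
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]  # possibly several steps for one character
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = fail[k - 1]  # continue so overlapping matches are found
    return hits
```
-->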

<SEQ>
<UI>0540   Galil,Z.      Optimal Parallel Algor.. ACM Sympos.Theo 84 
16:240-248
</UI>
<AU>Galil Z
</AU>
<TI>Optimal Parallel Algorithms for String Matching
</TI>
<SU>Parallel;
    IL;
    String match;
    Optimal;
    Algorithm
</SU>
<AB>Washington, DC, 30 April - 2 May 1984. "Let WRAM (PRAM) be a parallel
computer with p processors (RAMs) which share a common memory and are allowed
simultaneous reads and writes (only simultaneous reads). ... We design below
families of parallel algorithms that solve the string matching problem ....
Similar families are also obtained for the problem of finding all initial
palindromes of a given string."
</AB>
<JT>ACM Sympos Theory Comput</JT>
<PY>1984</PY>
<VO>16</VO>
<PP>240-248</PP>
</SEQ>

<SEQ>
<UI>0541   Galil,Z.      Optimal Parallel Algor.. Inform.Control  85 
67:144-157
</UI>
<AU>Galil Z
</AU>
<TI>Optimal Parallel Algorithms for String Matching
</TI>
<SU>Parallel;
    IL;
    String match;
    Optimal;
    Algorithm
</SU>
<AB>"Let WRAM (PRAM) be a parallel computer with p processors (RAMs) which
share a common memory and are allowed simultaneous reads and writes (only
simultaneous reads). ... We design below families of parallel algorithms that
solve the string matching problem .... Similar families are also obtained for
the problem of finding all initial palindromes of a given string."
</AB>
<JT>Inform Control (Orlando)</JT>
<PY>1985</PY>
<VO>67</VO>
<PP>144-157</PP>
</SEQ>

<SEQ>
<UI>0542   Galil,Z.      A Constant-Time Optima.. ACM Sympos.Theo 92 24:69-76
</UI>
<AU>Galil Z
</AU>
<TI>A Constant-Time Optimal Parallel String-Matching Algorithm
</TI>
<SU>String match;
    Parallel;
    USA;
    Optimal;
    Algorithm
</SU>
<AB>Victoria, BC, 4-6 May 1992. "Given a pattern string, we describe a way to
preprocess it. We design a constant-time optimal parallel algorithm for finding
all occurrences of the (preprocessed) pattern in any given text."
</AB>
<JT>ACM Sympos Theory Comput</JT>
<PY>1992</PY>
<VO>24</VO>
<PP>69-76</PP>
</SEQ>

<SEQ>
<UI>0543   Galil,Z.      Improved String Matchi.. SIGACT News     86 17(4, 
whole no
</UI>
<AU>Galil Z;
    Giancarlo R
</AU>
<TI>Improved String Matching with k Mismatches
</TI>
<SU>Match with k mismatches;
    IL;
    Data structure;
    String match;
    Search tree
</SU>
<AB>"Recently, an efficient algorithm for [matching with k mismatches] has
been devised by [Landau and Vishkin]. ... Here we present a compact version of
their algorithm .... The data structure that we use is the suffix tree of the
pattern modified in order to support the static lowest common ancestor algorithm
...."
</AB>
<JT>SIGACT News</JT>
<PY>1986</PY>
<VO>17</VO>
<NO>4, whole no. 62</NO>
<PP>52-54</PP>
</SEQ>

<SEQ>
<UI>0544   Galil,Z.      Parallel String Matchi.. Theoret.Comput. 87 
51:341-348
</UI>
<AU>Galil Z;
    Giancarlo R
</AU>
<TI>Parallel String Matching with k Mismatches
</TI>
<SU>Match with k mismatches;
    Parallel;
    USA;
    String match
</SU>
<AB>"Two improved algorithms for string matching with k mismatches are
presented. One algorithm is based on fast integer multiplication algorithms
whereas the other follows more closely classic string-matching techniques."
</AB>
<JT>Theoret Comput Sci</JT>
<PY>1987</PY>
<VO>51</VO>
<PP>341-348</PP>
</SEQ>

<SEQ>
<UI>0545   Galil,Z.      Data Structures and Al.. J.Complexity    88 
4(1):33-72
</UI>
<AU>Galil Z;
    Giancarlo R
</AU>
<TI>Data Structures and Algorithms for Approximate String Matching
</TI>
<SU>Review;
    USA;
    Data structure;
    String match;
    Parallel;
    Structure;
    Algorithm
</SU>
<AB>"This paper surveys techniques for designing efficient sequential and
parallel approximate string matching algorithms. Special attention is given to
the methods for the construction of data structures that efficiently support
primitive operations needed in approximate string matching."
</AB>
<JT>J Complexity</JT>
<PY>1988</PY>
<VO>4</VO>
<NO>1</NO>
<PP>33-72</PP>
</SEQ>

<SEQ>
<UI>0546   Galil,Z.      Speeding up Dynamic Pr.. Theoret.Comput. 89 
64:107-118
</UI>
<AU>Galil Z;
    Giancarlo R
</AU>
<TI>Speeding up Dynamic Programming with Applications to Molecular Biology
</TI>
<SU>Pairwise alignment;
    USA;
    Dynamic programming;
    Edit;
    Dynamic
</SU>
<AB>"Improved algorithms are obtained for dual convex-concave cases of a
problem that arises in many applications. The algorithms speed up several
dynamic programming routines that solve as a subproblem the given problem. One
typical problem is to compute the edit distance between the two sequences, given
substitution costs and a convex cost function for gaps."
</AB>
<JT>Theoret Comput Sci</JT>
<PY>1989</PY>
<VO>64</VO>
<PP>107-118</PP>
</SEQ>

<SEQ>
<UI>0547   Galil,Z.      On the Exact Complexit.. SIAM J.Comput.  91 
20(6):1008-102
</UI>
<AU>Galil Z;
    Giancarlo R
</AU>
<TI>On the Exact Complexity of String Matching: Lower Bounds
</TI>
<SU>Complexity;
    USA;
    String match
</SU>
<AB>"This paper provides several lower bounds on the number of character
comparisons that any string matching algorithm must perform in the worst case in
order to find occurrences of a pattern string in a text string. The class of
algorithms that are considered need not know the alphabet."
</AB>
<JT>SIAM J Comput</JT>
<PY>1991</PY>
<VO>20</VO>
<NO>6</NO>
<PP>1008-1020</PP>
</SEQ>

<SEQ>
<UI>0548   Galil,Z.      On the Exact Complexit.. SIAM J.Comput.  92 
21(3):407-437
</UI>
<AU>Galil Z;
    Giancarlo R
</AU>
<TI>On the Exact Complexity of String Matching: Upper Bounds
</TI>
<SU>Complexity;
    USA;
    String match
</SU>
<AB>"It is shown that, for any pattern of length m and for any text of length
n, it is possible to find all occurrences of the pattern in the text in overall
linear time and at most (4n - m)/3 character comparisons."
</AB>
<JT>SIAM J Comput</JT>
<PY>1992</PY>
<VO>21</VO>
<NO>3</NO>
<PP>407-437</PP>
</SEQ>

<SEQ>
<UI>0549   Galil,Z.      A Linear-time Algorith.. Inform.Process. 90 
33(6):309-311
</UI>
<AU>Galil Z;
    Park K
</AU>
<TI>A Linear-time Algorithm for Concave One-dimensional Dynamic Programming
</TI>
<SU>Pairwise alignment;
    USA;
    Dynamic programming;
    Edit;
    Algorithm;
    Dynamic
</SU>
<AB>"The one-dimensional dynamic programming problem is defined .... The
modified edit distance problem [Galil, Giancarlo (1989)], which arises in
molecular biology, ... can be decomposed into 2n copies of the problem."
</AB>
<JT>Inform Process Lett</JT>
<PY>1990</PY>
<VO>33</VO>
<NO>6</NO>
<PP>309-311</PP>
</SEQ>

<SEQ>
<UI>0550   Galil,Z.      An Improved Algorithm .. SIAM J.Comput.  90 
19(6):989-999
</UI>
<AU>Galil Z;
    Park K
</AU>
<TI>An Improved Algorithm for Approximate String Matching
</TI>
<SU>Match with k differences;
    USA;
    String match;
    Algorithm
</SU>
<AB>"Given a text string, a pattern string, and an integer k, a new algorithm
for finding all occurrences of the pattern string in the text string with at
most k differences is presented. Both its theoretical and practical variants
improve upon the known algorithms."
</AB>
<JT>SIAM J Comput</JT>
<PY>1990</PY>
<VO>19</VO>
<NO>6</NO>
<PP>989-999</PP>
</SEQ>

<SEQ>
<UI>0551   Galil,Z.      Dynamic Programming wi.. Theoret.Comput. 92 92:49-76
</UI>
<AU>Galil Z;
    Park K
</AU>
<TI>Dynamic Programming with Convexity, Concavity and Sparsity
</TI>
<SU>Pairwise alignment;
    Dynamic programming;
    Review;
    Sequence proximity;
    Edit;
    IL;
    Sequence alignment;
    Longest common;
    Gap;
    Dynamic
</SU>
<AB>"In many applications dynamic programming problems satisfy additional
conditions of convexity, concavity and sparsity. This paper presents a
classification of dynamic programming problems and surveys efficient algorithms
based on the three conditions." The string edit distance problem, the longest
common subsequence problem, the sequence alignment problem, and the sequence
alignment problem with linear gap costs are examples or variations of Problem 2.2.
</AB>
<JT>Theoret Comput Sci</JT>
<PY>1992</PY>
<VO>92</VO>
<PP>49-76</PP>
</SEQ>

<SEQ>
<UI>0552   Galil,Z.      Linear-time String-mat.. Theoret.Comput. 81 
13(3):331-336
</UI>
<AU>Galil Z;
    Seiferas J
</AU>
<TI>Linear-time String-matching using only a Fixed Number of Local Storage
Locations
</TI>
<SU>String match;
    Knuth-Morris-Pratt;
    IL
</SU>
<AB>"In an earlier paper (Galil, Seiferas 1980), we asked whether any variant
of the linear-time string-matching algorithm of Knuth, Morris, and Pratt (1977)
could be implemented as a FORTRAN subroutine. ... In this note, we show that a
fixed number of local storage locations does suffice, at least for an
implementation which is slightly less 'straightforward.'"
</AB>
<JT>Theoret Comput Sci</JT>
<PY>1981</PY>
<VO>13</VO>
<NO>3</NO>
<PP>331-336</PP>
</SEQ>

<SEQ>
<UI>0553   Galil,Z.      Time-space-optimal Str.. J.Comput.System 83 
26(3):280-294
</UI>
<AU>Galil Z;
    Seiferas J
</AU>
<TI>Time-space-optimal String Matching
</TI>
<SU>Knuth-Morris-Pratt;
    IL;
    String match
</SU>
<AB>"Any string-matching algorithm requires at least linear time and a
constant number of local storage locations. We design and analyze an algorithm
which realizes both asymptotic bounds simultaneously. This can be viewed as
completely eliminating the need for the tabulated 'failure function' in the
linear-time algorithm of Knuth, Morris, and Pratt."
</AB>
<JT>J Comput Systems Sci</JT>
<PY>1983</PY>
<VO>26</VO>
<NO>3</NO>
<PP>280-294</PP>
</SEQ>

<SEQ>
<UI>0554   Galil,Z.      Saving Space in Fast S.. SIAM J.Comput.  80 
9(2):417-438
</UI>
<AU>Galil Z;
    Seiferas JI
</AU>
<TI>Saving Space in Fast String Matching
</TI>
<SU>Knuth-Morris-Pratt;
    IL;
    String match
</SU>
<AB>"Algorithms described in this paper reduce the extra space used by the
Knuth-Morris-Pratt algorithm down to O(log |x|) [x being the pattern] ...."
</AB>
<JT>SIAM J Comput</JT>
<PY>1980</PY>
<VO>9</VO>
<NO>2</NO>
<PP>417-438</PP>
</SEQ>

<SEQ>
<UI>0555   Gatlin,L.L.   The Information Conten.. J.Theor.Biol.   66 
10:281-300
</UI>
<AU>Gatlin LL
</AU>
<TI>The Information Content of DNA
</TI>
<SU>Composition;
    Information content;
    USA;
    DNA
</SU>
<AB>"Recognition of the relevance of the transition probability matrix of the
nearest neighbor experiment to the Shannon formula makes possible the
calculation of the average information per symbol for a given kind of DNA. ...
It is speculated that DNA sequences with highly asymmetric transition
probabilities serve control functions in the genetic program."
</AB>
<JT>J Theor Biol</JT>
<PY>1966</PY>
<VO>10</VO>
<PP>281-300</PP>
</SEQ>
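
<!--
A minimal Python sketch, assuming a first-order Markov model estimated from overlapping nearest-neighbor pairs, of the average information per symbol H = -Σ_i p(i) Σ_j p(j|i) log2 p(j|i). Gatlin's paper builds further measures on this quantity; the sketch computes only the entropy itself, and the function name is hypothetical.

```python
import math
from collections import Counter


def markov_information_per_symbol(seq):
    """Conditional entropy H(X_{t+1} | X_t) in bits from nearest-neighbor pairs.

    p(i, j) is the frequency of the overlapping pair ij, and p(j | i) is
    that pair count divided by the number of pairs starting with i.
    """
    pairs = Counter(zip(seq, seq[1:]))
    firsts = Counter(seq[:-1])
    total = len(seq) - 1
    h = 0.0
    for (x, y), c in pairs.items():
        p_xy = c / total
        p_y_given_x = c / firsts[x]
        h -= p_xy * math.log2(p_y_given_x)
    return h
```

A strictly alternating sequence such as "ACACACAC" has fully predictable transitions and therefore zero information per symbol under this model.
-->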

<SEQ>
<UI>0556   George,D.G.   Mutation Data Matrix a.. Methods Enzymol 90 
183:333-351
</UI>
<AU>George DG;
    Barker WC;
    Hunt LT
</AU>
<TI>Mutation Data Matrix and Its Uses
</TI>
<SU>Sequence proximity;
    Substitution;
    USA;
    Matrix
</SU>
<AB>Similarity scoring matrices. Dayhoff Mutation Data Matrix (MDM).
Limitations of the model. Computer applications using MDM. New similarity matrices.
</AB>
<JT>Methods Enzymol</JT>
<PY>1990</PY>
<VO>183</VO>
<PP>333-351</PP>
</SEQ>

<SEQ>
<UI>0557   Gibbs,A.J.    The Diagram, a Method .. Eur.J.Biochem.  70 16:1-11
</UI>
<AU>Gibbs AJ;
    McIntyre GA
</AU>
<TI>The Diagram, a Method for Comparing Sequences
</TI>
<SU>Pairwise comparison;
    Dot;
    AU
</SU>
<AB>"We describe another alternative, the 'diagonal-match' or diagram method,
which, we think has advantages over other methods in that it is basically simple
(can be done by hand if necessary), and shows directly all the possible
similarities between the sequences."
</AB>
<JT>Eur J Biochem</JT>
<PY>1970</PY>
<VO>16</VO>
<PP>1-11</PP>
</SEQ>

<SEQ>
<UI>0558   Goad,W.B.     Pattern Recognition in.. Nucleic Acids R 82 
10(1):247-263
</UI>
<AU>Goad WB;
    Kanehisa MI
</AU>
<TI>Pattern Recognition in Nucleic Acid Sequences. I. A General Method for
Finding Local Homologies and Symmetries
</TI>
<SU>Subalignment;
    USA;
    Pattern recognition;
    Statistical;
    Subsequence;
    Needleman-Wunsch;
    Homology;
    Nucleic acid;
    Recognition
</SU>
<AB>"We present an algorithm - a generalization of the Needleman-Wunsch-Sellers
algorithm - which finds within longer sequences all subsequences that
resemble one another locally. The probability that so close a resemblance would
occur by chance alone is calculated and used to classify these local homologies
according to statistical significance."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1982</PY>
<VO>10</VO>
<NO>1</NO>
<PP>247-263</PP>
</SEQ>

<SEQ>
<UI>0559   Goldstein,L.  Poisson Approximation .. Comm.Statist.Th 90 
19(11):4167-41
</UI>
<AU>Goldstein L
</AU>
<TI>Poisson Approximation and DNA Sequence Matching
</TI>
<SU>Pairwise alignment;
    Significance;
    USA;
    Approximation;
    Sequence match;
    Poisson;
    DNA
</SU>
<AB>"A formal justification for the strong limit behavior of the log n law for
the case of perfect matching between sequences is given in Arratia and Waterman
(1985); how to obtain detailed information about the distributional behavior in
the log n law using the Chen-Stein method is outlined below."
</AB>
<JT>Comm Statist Theory Methods</JT>
<PY>1990</PY>
<VO>19</VO>
<NO>11</NO>
<PP>4167-4179</PP>
</SEQ>

<SEQ>
<UI>0560   Goldstein,L.  Poisson, Compound Pois.. Bull.Math.Biol. 92 
54(5):785-812
</UI>
<AU>Goldstein L;
    Waterman MS
</AU>
<TI>Poisson, Compound Poisson and Process Approximations for Testing
Statistical Significance in Sequence Comparisons
</TI>
<SU>Pairwise alignment;
    Significance;
    USA;
    Approximation;
    Statistical;
    Sequence comparison;
    Poisson
</SU>
<AB>"Most ... algorithms search for the alignment of two sequences that
optimizes some alignment score. It is an important problem to assess the
statistical significance of a given score. In this paper we use newly developed
methods for Poisson approximation to derive estimates of the statistical
significance of k-word matches on a diagonal of a sequence comparison."
</AB>
<JT>Bull Math Biol</JT>
<PY>1992</PY>
<VO>54</VO>
<NO>5</NO>
<PP>785-812</PP>
</SEQ>

<SEQ>
<UI>0561   Gonnet,G.H.   An Analysis of the Kar.. Inform.Process. 90 
34(5):271-274
</UI>
<AU>Gonnet GH;
    Baeza-Yates RA
</AU>
<TI>An Analysis of the Karp-Rabin String Matching Algorithm
</TI>
<SU>CA;
    Probabilistic;
    String match;
    String search;
    Algorithm
</SU>
<AB>"We present an average case analysis of the Karp-Rabin string matching
algorithm. This algorithm is a probabilistic algorithm that adapts hashing
techniques to string searching. We also propose an efficient implementation of
this algorithm."
</AB>
<JT>Inform Process Lett</JT>
<PY>1990</PY>
<VO>34</VO>
<NO>5</NO>
<PP>271-274</PP>
</SEQ>
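
<!--
A Python sketch of Karp-Rabin matching with a rolling polynomial hash. Each hash hit is confirmed by direct comparison, so the output is exact and the hash only filters candidate positions. The base and modulus defaults are arbitrary illustrative choices, not values from Gonnet and Baeza-Yates, and the function name is hypothetical.

```python
def karp_rabin(text, pattern, base=256, modulus=1_000_000_007):
    """Return 0-based starts of pattern in text using a rolling hash."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, modulus)  # weight of the window's leading character
    hp = ht = 0
    for i in range(m):
        hp = (hp * base + ord(pattern[i])) % modulus
        ht = (ht * base + ord(text[i])) % modulus
    hits = []
    for s in range(n - m + 1):
        # a hash match is only a candidate; verify it character by character
        if hp == ht and text[s:s + m] == pattern:
            hits.append(s)
        if s + m < n:
            # drop text[s], shift the window, and append text[s + m]
            ht = ((ht - ord(text[s]) * high) * base + ord(text[s + m])) % modulus
    return hits
```

Python's % operator always returns a non-negative result for a positive modulus, so the subtraction in the rolling update needs no extra correction.
-->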

<SEQ>
<UI>0562   Gonnet,G.H.   Exhaustive Matching of.. Science         92 256(5 
June):14
</UI>
<AU>Gonnet GH;
    Cohen MA;
    Benner SA
</AU>
<TI>Exhaustive Matching of the Entire Protein Sequence Database
</TI>
<SU>Database search;
    SWI;
    Sequence database;
    Gap;
    Protein
</SU>
<AB>"The entire protein sequence database has been exhaustively matched.
Definitive mutation matrices and models for scoring gaps were obtained from the
matching and used to organize the sequence database as sets of evolutionarily
connected components. ... The key to matching in a reasonable time lies in the
step preceding the application of the Needleman-Wunsch algorithm: a
reorganization of the sequence data by indexing on a patricia tree."
</AB>
<JT>Science</JT>
<PY>1992</PY>
<VO>256</VO>
<NO>5 June</NO>
<PP>1443-1445</PP>
</SEQ>

<SEQ>
<UI>0563   Gordon,A.D.   A Probabilistic Approa.. New Approache.. 
94Springer-Verlag
</UI>
<AU>Gordon AD
</AU>
<TI>A Probabilistic Approach to Identifying Consensus in Molecular Sequences
</TI>
<ED>Diday E;
    Lechevallier Y;
    Schader M;
    Bertrand P;
    Burtschy B
</ED>
<BK>New Approaches in Classification and Data Analysis
</BK>
<SU>Consensus sequence;
    Probabilistic;
    Profile;
    UK
</SU>
<AB>"Given a profile of nucleic acid bases at a specified position in an
aligned set of molecular sequences, a simple rule for defining ambiguity codes
is presented: all bases whose frequency in the profile falls below the maximum
profile frequency by no more than a specified number d are included in the
ambiguity code. Ways are described of defining d so as to ensure that this
'containing subset' possesses desirable properties under the assumption of a
multinomial model for the frequencies of bases in the profile."
</AB>
<PU>Springer-Verlag</PU>
<PL>Berlin</PL>
<PY>1994</PY>
<PP>356-361</PP>
</SEQ>
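
<!--
A minimal Python sketch of the rule quoted above, taking d as given: choosing d under the multinomial model is the paper's contribution and is not attempted here. The function name and the count-dictionary input are assumptions.

```python
def containing_subset(counts, d):
    """Bases whose profile frequency is within d of the maximum frequency.

    counts maps each base to its count at one alignment column; d is a
    threshold on frequencies (proportions), not on raw counts.
    """
    total = sum(counts.values())
    freqs = {base: c / total for base, c in counts.items()}
    top = max(freqs.values())
    return sorted(b for b, f in freqs.items() if top - f <= d)
```

With counts A 6, G 3, C 1, T 0 and d = 0.35, the containing subset is A and G, which would be written with the IUPAC ambiguity code R.
-->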

<SEQ>
<UI>0564   Gotoh,O.      An Improved Algorithm .. J.Mol.Biol.     82 
162:705-708
</UI>
<AU>Gotoh O
</AU>
<TI>An Improved Algorithm for Matching Biological Sequences
</TI>
<SU>Pairwise alignment;
    JP;
    Gap;
    Algorithm
</SU>
<AB>Waterman, Smith, and Beyer (1976) described an O(m²n) algorithm for
aligning two sequences in which gaps of any length are allowed. This paper
presents a new O(mn) algorithm for the special case in which the weight of a
gap of length k has the affine form uk + v.
</AB>
<JT>J Mol Biol</JT>
<PY>1982</PY>
<VO>162</VO>
<PP>705-708</PP>
</SEQ>
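
<!--
A minimal Python sketch of the three-table O(mn) recursion for gaps whose cost is v + uk for length k, written as a global similarity score to be maximized. The scoring defaults are arbitrary, the function name is hypothetical, and the paper's own sign conventions and notation may differ.

```python
NEG = float("-inf")


def gotoh_score(a, b, match=1, mismatch=-1, v=2, u=1):
    """Best global alignment score of a and b; a gap of length k costs v + u*k.

    D ends in a match/mismatch column, P ends with a[i-1] against a gap,
    Q ends with b[j-1] against a gap.
    """
    m, n = len(a), len(b)
    D = [[NEG] * (n + 1) for _ in range(m + 1)]
    P = [[NEG] * (n + 1) for _ in range(m + 1)]
    Q = [[NEG] * (n + 1) for _ in range(m + 1)]
    D[0][0] = 0
    for i in range(1, m + 1):
        P[i][0] = -(v + u * i)  # leading gap of length i
    for j in range(1, n + 1):
        Q[0][j] = -(v + u * j)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            D[i][j] = max(D[i - 1][j - 1], P[i - 1][j - 1], Q[i - 1][j - 1]) + s
            # open a new gap (pay v + u) or extend the current one (pay u)
            P[i][j] = max(D[i - 1][j] - (v + u), Q[i - 1][j] - (v + u), P[i - 1][j] - u)
            Q[i][j] = max(D[i][j - 1] - (v + u), P[i][j - 1] - (v + u), Q[i][j - 1] - u)
    return max(D[m][n], P[m][n], Q[m][n])
```

With the defaults, gotoh_score("ACGT", "AT") returns -2: two matches and one gap of length 2 (cost 4) beat two separate gaps of length 1 (cost 3 each).
-->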

<SEQ>
<UI>0565   Gotoh,O.      Alignment of Three Bio.. J.Theor.Biol.   86 
121:327-337
</UI>
<AU>Gotoh O
</AU>
<TI>Alignment of Three Biological Sequences with an Efficient Traceback
Procedure
</TI>
<SU>Multiple alignment;
    JP;
    Dynamic programming
</SU>
<AB>"This paper describes a dynamic programming algorithm for aligning three
sequences at a time."
</AB>
<JT>J Theor Biol</JT>
<PY>1986</PY>
<VO>121</VO>
<PP>327-337</PP>
</SEQ>

<SEQ>
<UI>0566   Gotoh,O.      Pattern Matching of Bi.. Comput.Appl.Bio 87 
3(1):17-20
</UI>
<AU>Gotoh O
</AU>
<TI>Pattern Matching of Biological Sequences with Limited Storage
</TI>
<SU>Subalignment;
    JP;
    Pattern match;
    Complexity
</SU>
<AB>A method is described for getting the locally best matched alignments
between a pair of biological sequences which greatly reduces the storage
requirement while maintaining the O(n^2) time complexity.
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1987</PY>
<VO>3</VO>
<NO>1</NO>
<PP>17-20</PP>
</SEQ>

<SEQ>
<UI>0567   Gotoh,O.      Consistency of Optimal.. Bull.Math.Biol. 90 
52(4):509-525
</UI>
<AU>Gotoh O
</AU>
<TI>Consistency of Optimal Sequence Alignments
</TI>
<SU>Multiple alignment;
    Segment;
    JP;
    Sequence alignment;
    Region;
    Optimal;
    Consistency
</SU>
<AB>A previous method "is further extended so that the combination of pairwise
alignments that gives the greatest consistency is found when possibly many
alignments are equally optimal for each pairwise comparison. A method for
acceleration of simultaneous multiple sequence alignment is proposed in which
consistent regions serve as 'anchor points' limiting application of direct
multi-way alignment to the rest of 'inconsistent' regions."
</AB>
<JT>Bull Math Biol</JT>
<PY>1990</PY>
<VO>52</VO>
<NO>4</NO>
<PP>509-525</PP>
</SEQ>

<SEQ>
<UI>0568   Gotoh,O.      Optimal Sequence Align.. Bull.Math.Biol. 90 
52(3):359-373
</UI>
<AU>Gotoh O
</AU>
<TI>Optimal Sequence Alignment Allowing for Long Gaps
</TI>
<SU>Pairwise alignment;
    JP;
    Sequence alignment;
    Gap;
    Optimal
</SU>
<AB>"Because a long stretch in a biological sequence can be lost or added by a
single mutational event such as unequal crossing-over or transposition of a
movable element, the probability of occurrence of a long gap seems almost
independent of the gap length, while short insertions or deletions would occur
in a length-dependent frequency." A new algorithm is therefore presented for
optimal sequence alignment in which the gap weight function is piecewise linear.
</AB>
<JT>Bull Math Biol</JT>
<PY>1990</PY>
<VO>52</VO>
<NO>3</NO>
<PP>359-373</PP>
</SEQ>

<SEQ>
<UI>0569   Gotoh,O.      Optimal Alignment Betw.. Comput.Appl.Bio 93 
9(3):361-370
</UI>
<AU>Gotoh O
</AU>
<TI>Optimal Alignment Between Groups of Sequences and its Application to
Multiple Sequence Alignment
</TI>
<SU>Multiple alignment;
    JP;
    Optimal;
    Sequence alignment
</SU>
<AB>"Four algorithms ... were developed to align two groups of biological
sequences. ... The advantages and disadvantages of the four algorithms are
discussed on the basis of the results of examinations of several protein
families."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1993</PY>
<VO>9</VO>
<NO>3</NO>
<PP>361-370</PP>
</SEQ>

<SEQ>
<UI>0570   Gotoh,O.      Sequence Search on a S.. Nucleic Acids R 86 
14(1):57-64
</UI>
<AU>Gotoh O;
    Tagashira Y
</AU>
<TI>Sequence Search on a Supercomputer
</TI>
<SU>Database search;
    JP;
    Sequence search;
    Gap
</SU>
<AB>"A set of programs was developed for searching nucleic acid and protein
sequence data bases for sequences similar to a given sequence. The programs ...
were optimized for vector processing on a Hitachi S810-20 supercomputer. ... The
principal algorithm is that of Smith and Waterman (1981) modified to incorporate
a linear gap weight (Gotoh 1982)."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1986</PY>
<VO>14</VO>
<NO>1</NO>
<PP>57-64</PP>
</SEQ>

<SEQ>
<UI>0571   Gribskov,M.   The Language Metaphor .. Computers Chem. 92 
16(2):85-88
</UI>
<AU>Gribskov M
</AU>
<TI>The Language Metaphor in Sequence Analysis
</TI>
<SU>Sequence analysis;
    Linguistic;
    USA;
    Coding;
    Language
</SU>
<AB>"The metaphors of language and coding have provided a powerful framework
for organizing molecular biology. Many techniques developed in the analysis of
text and other communication channels have been successfully applied to
macromolecular sequences with little or no change. ... A number of properties,
such as long-range interactions, structural dynamics and the importance of
sequence variation in modulation of function, are poorly modelled by the
language metaphor."
</AB>
<JT>Computers Chem</JT>
<PY>1992</PY>
<VO>16</VO>
<NO>2</NO>
<PP>85-88</PP>
</SEQ>

<SEQ>
<UI>0572   Gribskov,M.   Profile Scanning for T.. Comput.Appl.Bio 88 
4(1):61-66
</UI>
<AU>Gribskov M;
    Homyak M;
    Edenfield J;
    Eisenberg D
</AU>
<TI>Profile Scanning for Three-dimensional Structural Patterns in Protein
Sequences
</TI>
<SU>Match a pattern matrix;
    USA;
    Dynamic programming;
    Profile;
    Protein
</SU>
<AB>"Profile analysis measures the similarity between a target sequence and a
group of aligned sequences (the probe). The probe sequences are used to produce
a position-specific scoring table (the profile) that can be aligned with any
sequence (the target) using standard dynamic programming methods."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1988</PY>
<VO>4</VO>
<NO>1</NO>
<PP>61-66</PP>
</SEQ>

<SEQ>
<UI>0573   Gribskov,M.   Profile Analysis         Methods Enzymol 90 
183:146-159
</UI>
<AU>Gribskov M;
    Luthy R;
    Eisenberg D
</AU>
<TI>Profile Analysis
</TI>
<SU>Match a pattern matrix;
    USA;
    Profile;
    Motif
</SU>
<AB>"The profile method provides a convenient way to represent information
about groups or families of sequences as well as a means to ask questions about
the definition of protein families, the relationships between distantly related
proteins, and the presence of sequence or structural motifs in proteins."
</AB>
<JT>Methods Enzymol</JT>
<PY>1990</PY>
<VO>183</VO>
<PP>146-159</PP>
</SEQ>

<SEQ>
<UI>0574   Gribskov,M.   Profile Analysis: Dete.. Proc.Nat.Acad.S 87 
84(13):4355-43
</UI>
<AU>Gribskov M;
    McLachlan AD;
    Eisenberg D
</AU>
<TI>Profile Analysis: Detection of Distantly Related Proteins
</TI>
<SU>Match a pattern matrix;
    USA;
    Profile;
    Sequence comparison;
    Protein;
    Detection
</SU>
<AB>"Profile analysis is a method for detecting distantly related proteins by
sequence comparison. The basis for comparison is not only the customary Dayhoff
mutational-distance matrix but also the results of structural studies and
information implicit in the alignments of the sequences of families of similar
proteins. This information is expressed in a position-specific scoring table
(profile) ...."
</AB>
<JT>Proc Nat Acad Sci USA </JT>
<PY>1987</PY>
<VO>84</VO>
<NO>13</NO>
<PP>4355-4358</PP>
</SEQ>

<SEQ>
<UI>0575   Griggs,J.R.   Sequence Alignments wi.. SIAM J.Algebrai 86 
7(4):604-608
</UI>
<AU>Griggs JR;
    Hanlon PJ;
    Waterman MS
</AU>
<TI>Sequence Alignments with Matched Sections
</TI>
<SU>Pairwise alignment;
    USA;
    Sequence alignment
</SU>
<AB>"In molecular biology, two finite sequences are compared by displaying one
sequence written over another in an alignment. The number of alignments of two
sequences is related to the Stanton-Cowan numbers. This paper gives asymptotics
for the number of alignments of two sequences of length n with matching sections
of size at least b."
</AB>
<JT>SIAM J Algebraic Discrete Methods </JT>
<PY>1986</PY>
<VO>7</VO>
<NO>4</NO>
<PP>604-608</PP>
</SEQ>

<SEQ>
<UI>0576   Grob,U.       Recognition of Ill-def.. Comput.Appl.Bio 88 
4(1):79-88
</UI>
<AU>Grob U;
    Stuber K
</AU>
<TI>Recognition of Ill-defined Signals in Nucleic Acid Sequences
</TI>
<SU>Match a pattern matrix;
    DE;
    Signal;
    Nucleic acid;
    Recognition;
    Consensus matrix
</SU>
<AB>"A set of programs has been developed for the definition and handling of
nucleic acid sequence consensus information. The sequences of known genetic
control signals are combined in a matrix."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1988</PY>
<VO>4</VO>
<NO>1</NO>
<PP>79-88</PP>
</SEQ>

<SEQ>
<UI>0577   Grossi,R.     A Fast VLSI Solution f.. Integration, Th 92 
13(2):195-206
</UI>
<AU>Grossi R
</AU>
<TI>A Fast VLSI Solution for Approximate String Matching
</TI>
<SU>Match with k differences;
    Parallel;
    String match;
    Italy;
    VLSI
</SU>
<AB>"A simple hardware algorithm is proposed for the approximate string
matching problem .... The employed interconnection network is a classical
mesh-of-trees, augmented with trees along the main diagonals of the mesh. The area
and time bounds are shown, and compared favorably with previous solutions in
many cases."
</AB>
<JT>Integration, The VLSI J </JT>
<PY>1992</PY>
<VO>13</VO>
<NO>2</NO>
<PP>195-206</PP>
</SEQ>

<SEQ>
<UI>0578   Grossi,R.     Simple and Efficient S.. Inform.Process. 89 
33:113-120
</UI>
<AU>Grossi R;
    Luccio F
</AU>
<TI>Simple and Efficient String Matching with k Mismatches
</TI>
<SU>Match with k mismatches;
    Italy
</SU>
<AB>"We follow a new approach to [string matching with k mismatches] based on
the determination of the permutations of [string] P in [string] T, and propose
two algorithms for its solution. ... An extensive set of runs shows ... that the
running times are strongly reduced, thus making our algorithms important in
practice."
</AB>
<JT>Inform Process Lett</JT>
<PY>1989</PY>
<VO>33</VO>
<PP>113-120</PP>
</SEQ>

<SEQ>
<UI>0579   Guibas,L.J.   A New Proof of the Lin.. SIAM J.Comput.  80 
9(4):672-682
</UI>
<AU>Guibas LJ;
    Odlyzko AM
</AU>
<TI>A New Proof of the Linearity of the Boyer-Moore String Searching Algorithm
</TI>
<SU>String match;
    Boyer-Moore;
    USA;
    String search;
    Algorithm
</SU>
<AB>"We study the combinatorial structure of periodic strings and use these
results to derive a new proof of the linearity of the Boyer-Moore algorithm in
the worst case. Our proof reduces the previously best known bound of 7n to 4n,
where n is the length of the text."
</AB>
<JT>SIAM J Comput</JT>
<PY>1980</PY>
<VO>9</VO>
<NO>4</NO>
<PP>672-682</PP>
</SEQ>

<SEQ>
<UI>0580   Guibas,L.J.   Periods in Strings       J.Combin.Theory 81 
30(1):19-42
</UI>
<AU>Guibas LJ;
    Odlyzko AM
</AU>
<TI>Periods in Strings
</TI>
<SU>Regularities;
    USA
</SU>
<AB>"We explore the notion of periods of a string. A period can be thought of
as a shift that causes the string to match over itself. ... This problem arose
in connection with our work on string searching algorithms. ... The more
sophisticated of these algorithms extract information from an unsuccessful match
and use it to rule out other matches which have no chance of succeeding. These
decisions invariably require knowledge of how the pattern matches over itself
...."
</AB>
<JT>J Combin Theory Ser A </JT>
<PY>1981</PY>
<VO>30</VO>
<NO>1</NO>
<PP>19-42</PP>
</SEQ>

<SEQ>
<UI>0581   Guibas,L.J.   String Overlaps, Patte.. J.Combin.Theory 81 
30(2):183-208
</UI>
<AU>Guibas LJ;
    Odlyzko AM
</AU>
<TI>String Overlaps, Pattern Matching and Nontransitive Games
</TI>
<SU>String match;
    Complexity;
    USA;
    Pattern match
</SU>
<AB>"This paper studies several topics concerning the way strings can overlap.
The key notion of the correlation of two strings is introduced, which is a
representation of how the second string can overlap into the first. ... Another
application shows that no algorithm can check for the presence of a given
pattern in a text without examining essentially all characters of the text in
the worst case."
</AB>
<JT>J Combin Theory Ser A </JT>
<PY>1981</PY>
<VO>30</VO>
<NO>2</NO>
<PP>183-208</PP>
</SEQ>

<SEQ>
<UI>0582   Gusfield,D.   Efficient Methods for .. Bull.Math.Biol. 93 
55(1):141-154
</UI>
<AU>Gusfield D
</AU>
<TI>Efficient Methods for Multiple Sequence Alignment with Guaranteed Error
Bounds
</TI>
<SU>Multiple alignment;
    Complexity;
    Approximation;
    USA;
    Sequence alignment;
    Error
</SU>
<AB>"Several precise measures have been proposed for evaluating the goodness
of a multiple alignment, but no efficient methods are known which compute the
optimal alignment for any of these measures in any but small cases. In this
paper, we consider two previously proposed measures, and give two
computationally efficient multiple alignment methods (one for each measure)
whose deviation from the optimal value is guaranteed to be less than a factor
of two."
</AB>
<JT>Bull Math Biol</JT>
<PY>1993</PY>
<VO>55</VO>
<NO>1</NO>
<PP>141-154</PP>
</SEQ>

<SEQ>
<UI>0583   Haber,J.E.    An Evaluation of the R.. J.Mol.Biol.     70 
50:617-639
</UI>
<AU>Haber JE;
    Koshland DE Jr
</AU>
<TI>An Evaluation of the Relatedness of Proteins based on Comparison of Amino
Acid Sequences
</TI>
<SU>Pairwise alignment;
    Significance;
    USA;
    Statistical;
    Sequence comparison;
    Protein;
    Amino acid
</SU>
<AB>"A procedure based on comparing amino acid residues was developed to
examine the statistical consequences of practices commonly applied in sequence
comparisons. ... Rules of thumb were developed to indicate significant
relatedness of two sequences beyond the expectations of chance."
</AB>
<JT>J Mol Biol</JT>
<PY>1970</PY>
<VO>50</VO>
<PP>617-639</PP>
</SEQ>

<SEQ>
<UI>0584   Hall,J.D.     A Software Tool for Fi.. Comput.Appl.Bio 88 
4(1):35-40
</UI>
<AU>Hall JD;
    Myers EW
</AU>
<TI>A Software Tool for Finding Locally Optimal Alignments in Protein and
Nucleic Acid Sequences
</TI>
<SU>Subalignment;
    USA;
    Region;
    Locally optimal;
    Optimal;
    Protein;
    Nucleic acid
</SU>
<AB>"We describe software for aligning protein or nucleic acid sequences based
on the concept of match density. This method is especially useful for locating
regions of short similarity between two longer sequences which may be largely
dissimilar (e.g. locating active site regions in distantly related proteins)."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1988</PY>
<VO>4</VO>
<NO>1</NO>
<PP>35-40</PP>
</SEQ>

<SEQ>
<UI>0585   Hall,P.A.V.   Approximate String Mat.. ACM Comput.Surv 80 
12(4):381-402
</UI>
<AU>Hall PAV;
    Dowling GR
</AU>
<TI>Approximate String Matching
</TI>
<SU>Review;
    UK;
    Dynamic programming;
    String match
</SU>
<AB>"Approximate matching of strings is reviewed with the aim of surveying
techniques suitable for finding an item in a database when there may be a
spelling mistake or other error in the keyword. The methods found are classified
as either equivalence or similarity problems. Equivalence problems are seen to
be readily solved using canonical forms. For similarity problems difference
measures are surveyed, with a full description of the well-established dynamic
programming method ...."
</AB>
<JT>ACM Comput Surveys </JT>
<PY>1980</PY>
<VO>12</VO>
<NO>4</NO>
<PP>381-402</PP>
</SEQ>

<SEQ>
<UI>0586   Harr,R.       Search Algorithm for P.. Nucleic Acids R 83 
11(9):2943-295
</UI>
<AU>Harr R;
    Haggstrom M;
    Gustafsson P
</AU>
<TI>Search Algorithm for Pattern Match Analysis of Nucleic Acid Sequences
</TI>
<SU>Match a pattern matrix;
    SWE;
    Statistical;
    Significance;
    Pattern match;
    Nucleic acid;
    Algorithm
</SU>
<AB>"The algorithm is of pattern match type and is based on the fact that
genetic information often is a function of a predictable statistical occurrence
of the four bases within parts of the sequence. The search algorithm compares
the known statistical pattern of bases in e.g. a promoter, with an unknown
sequence and calculates the statistical significance of the match at all
positions in the unknown sequence."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1983</PY>
<VO>11</VO>
<NO>9</NO>
<PP>2943-2957</PP>
</SEQ>

<SEQ>
<UI>0587   Hashiguchi,K. String Matching Proble.. Inform.Comput.  92 
101(2):131-149
</UI>
<AU>Hashiguchi K;
    Yamada K
</AU>
<TI>String Matching Problems over Free Partially Commutative Monoids
</TI>
<SU>Knuth-Morris-Pratt;
    JP;
    String match
</SU>
<AB>"This paper studies two string matching problems over free partially
commutative monoids. We analyze these two problems in detail, and present two
efficient polynomial time algorithms for solving them. ... Thus our algorithms
may be regarded as FPCM-versions of the Knuth-Morris-Pratt string matching
algorithm."
</AB>
<JT>Inform Comput</JT>
<PY>1992</PY>
<VO>101</VO>
<NO>2</NO>
<PP>131-149</PP>
</SEQ>

<SEQ>
<UI>0588   Haskin,R.L.   Operational Characteri.. ACM Trans.Datab 83 
8(1):15-40
</UI>
<AU>Haskin RL;
    Hollaar LA
</AU>
<TI>Operational Characteristics of a Hardware-based Pattern Matcher
</TI>
<SU>Match complex patterns;
    USA;
    Automata
</SU>
<AB>"The design and operation of a new class of hardware-based pattern
matchers ... is presented. This recognizer is based on a unique implementation
technique for finite state automata consisting of partitioning the state table
among a number of simple digital machines."
</AB>
<JT>ACM Trans Database Systems </JT>
<PY>1983</PY>
<VO>8</VO>
<NO>1</NO>
<PP>15-40</PP>
</SEQ>

<SEQ>
<UI>0589   Hein,J.       A New Method that Simu.. Mol.Biol.Evol.  89 
6(6):649-668
</UI>
<AU>Hein J
</AU>
<TI>A New Method that Simultaneously Aligns and Reconstructs Ancestral
Sequences for any Number of Homologous Sequences, When the Phylogeny is Given
</TI>
<SU>Multiple alignment;
    Evolutionary tree;
    USA;
    Reconstruct;
    Phylogeny
</SU>
<AB>"Among the fundamental problems in molecular evolution and in the analysis
of homologous sequences are alignment, phylogeny reconstruction, and the
reconstruction of evolutionary sequences. This paper presents a fast, combined
solution to these problems."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1989</PY>
<VO>6</VO>
<NO>6</NO>
<PP>649-668</PP>
</SEQ>

<SEQ>
<UI>0590   Hein,J.       A Tree Reconstruction .. Mol.Biol.Evol.  89 
6(6):669-684
</UI>
<AU>Hein J
</AU>
<TI>A Tree Reconstruction Method that is Economical in the Number of Pairwise
Comparisons Used
</TI>
<SU>Multiple alignment;
    Phylogeny;
    USA;
    Pairwise comparison
</SU>
<AB>"A fast method for reconstructing phylogenies from distance data is
presented. The method is economical in the number of pairwise comparisons
needed. It can be combined with a new phylogenetic alignment procedure to yield
an algorithm that gives a complete history of a set of homologous sequences."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1989</PY>
<VO>6</VO>
<NO>6</NO>
<PP>669-684</PP>
</SEQ>

<SEQ>
<UI>0591   Hein,J.       Unified Approach to Al.. Methods Enzymol 90 
183:626-645
</UI>
<AU>Hein J
</AU>
<TI>Unified Approach to Alignment and Phylogenies
</TI>
<SU>Multiple alignment;
    Evolutionary tree;
    Phylogeny;
    USA
</SU>
<AB>"Conventionally, the alignment problem involves two sequences and must
consider both substitutions and insertions/deletions. The phylogeny problem
involves more sequences but usually requires that the insertions/deletions be
taken care of beforehand. The accomplishment of the method presented here is to
solve both problems simultaneously."
</AB>
<JT>Methods Enzymol</JT>
<PY>1990</PY>
<VO>183</VO>
<PP>626-645</PP>
</SEQ>

<SEQ>
<UI>0592   Hein,J.       A Heuristic Method to .. J.Mol.Evol.     93 
36(4):396-405
</UI>
<AU>Hein J
</AU>
<TI>A Heuristic Method to Reconstruct the History of Sequences Subject to
Recombination
</TI>
<SU>Multiple alignment;
    Phylogeny;
    JP;
    Reconstruct;
    Heuristic;
    Recombination
</SU>
<AB>"Sequences subject to recombination and gene conversion defy phylogenetic
analysis by traditional methods since their evolutionary history cannot be
adequately summarized by a tree. This study investigates ways to describe their
evolutionary history and proposes a method giving a partial reconstruction of
this history."
</AB>
<JT>J Mol Evol</JT>
<PY>1993</PY>
<VO>36</VO>
<NO>4</NO>
<PP>396-405</PP>
</SEQ>

<SEQ>
<UI>0593   Henikoff,S.   Automated Assembly of .. Nucleic Acids R 91 
19(23):6565-65
</UI>
<AU>Henikoff S;
    Henikoff JG
</AU>
<TI>Automated Assembly of Protein Blocks for Database Searching
</TI>
<SU>Database search;
    USA;
    Region;
    Motif;
    Protein
</SU>
<AB>"Here we present a system that is designed to assemble a best set of
blocks for a given group of related proteins. The blocks are extended from
ungapped aligned regions discovered by the MOTIF algorithm of [Smith, Annau,
Chandrasegaran 1990] which can rapidly detect very distant relationships among
large groups of proteins. Many blocks might be found, and they might overlap or
appear in different orders .... The best set of blocks among these is determined
by a new algorithm ...."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1991</PY>
<VO>19</VO>
<NO>23</NO>
<PP>6565-6572</PP>
</SEQ>

<SEQ>
<UI>0594   Henikoff,S.   Detection of Protein S.. Nucleic Acids R 88 
16(13):6191-62
</UI>
<AU>Henikoff S;
    Wallace JC
</AU>
<TI>Detection of Protein Similarities Using Nucleotide Sequence Databases
</TI>
<SU>Database search;
    USA;
    Sequence database;
    Frame;
    Similarity;
    Protein;
    Nucleotide;
    Detection
</SU>
<AB>"A simple procedure is described for finding similarities between proteins
using nucleotide sequence databases. ...[A probe] consisting of an unidentified
open reading frame (ORF) ... was conceptually translated into protein and
compared to every possible translated reading frame of every nucleotide sequence
in the database."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1988</PY>
<VO>16</VO>
<NO>13</NO>
<PP>6191-6204</PP>
</SEQ>

<SEQ>
<UI>0595   Henikoff,S.   Finding Protein Simila.. Methods Enzymol 90 
183:111-132
</UI>
<AU>Henikoff S;
    Wallace JC;
    Brown JP
</AU>
<TI>Finding Protein Similarities with Nucleotide Sequence Databases
</TI>
<SU>Database search;
    USA;
    Sequence database;
    Similarity;
    Protein;
    Nucleotide
</SU>
<AB>"It is worthwhile to search nucleotide sequence databases for protein
similarities, since these databases are more complete and up-to-date. As
illustrated in this chapter, it is advantageous to search these databases for
amino acid rather than nucleotide sequence similarities. Therefore, we have
adapted amino acid sequence searching procedures to detect similarities within
nucleotide sequence databases."
</AB>
<JT>Methods Enzymol</JT>
<PY>1990</PY>
<VO>183</VO>
<PP>111-132</PP>
</SEQ>

<SEQ>
<UI>0596   Henneke,C.M.  A Multiple Sequence Al.. Comput.Appl.Bio 89 
5(2):141-150
</UI>
<AU>Henneke CM
</AU>
<TI>A Multiple Sequence Alignment Algorithm for Homologous Proteins using
Secondary Structure Information and Optionally Keying Alignments to Functionally
Important Sites
</TI>
<SU>Multiple alignment;
    Clustering;
    Sequence alignment;
    Structure;
    Protein;
    UK;
    Secondary;
    Algorithm
</SU>
<AB>"The programs described herein function as part of a suite of programs
designed for pairwise alignment, multiple alignment, generation of randomized
sequences, production of alignment scores and a sorting routine for analysis of
the alignments produced."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1989</PY>
<VO>5</VO>
<NO>2</NO>
<PP>141-150</PP>
</SEQ>

<SEQ>
<UI>0597   Hertz,G.Z.    Identification of Cons.. Comput.Appl.Bio 90 
6(2):81-92
</UI>
<AU>Hertz GZ;
    Hartzell GW III;
    Stormo GD
</AU>
<TI>Identification of Consensus Patterns in Unaligned DNA Sequences Known to
be Functionally Related
</TI>
<SU>Consensus sequence;
    Information theory;
    USA;
    Identification;
    DNA
</SU>
<AB>The method identifies "consensus patterns in a set of unaligned DNA
sequences known to bind a common protein or to have some other common
biochemical function. The method is based on a matrix representation of binding
site patterns. ... The goal of the method is to find the most significant matrix
... out of all the matrices that can be formed ...."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1990</PY>
<VO>6</VO>
<NO>2</NO>
<PP>81-92</PP>
</SEQ>

<SEQ>
<UI>0598   Higgins,D.G.  Sequence Ordinations: .. Comput.Appl.Bio 92 
8(1):15-22
</UI>
<AU>Higgins DG
</AU>
<TI>Sequence Ordinations: A Multivariate Analysis Approach to Analysing Large
Sequence Data Sets
</TI>
<SU>Multiple alignment;
    Multivariate;
    DE
</SU>
<AB>"This paper shows how to use principal coordinates analysis to find
low-dimensional representations of distance matrices derived from aligned sets of
sequences."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1992</PY>
<VO>8</VO>
<NO>1</NO>
<PP>15-22</PP>
</SEQ>

<SEQ>
<UI>0599   Higgins,D.G.  CLUSTAL V: Improved So.. Comput.Appl.Bio 92 
8(2):189-191
</UI>
<AU>Higgins DG;
    Bleasby AJ;
    Fuchs R
</AU>
<TI>CLUSTAL V: Improved Software for Multiple Sequence Alignment
</TI>
<SU>Multiple alignment;
    Clustering;
    DE;
    Sequence alignment;
    Profile
</SU>
<AB>"... the multiple alignments are carried out in a progressive manner ....
Sequences are aligned in larger and larger groups according to the branching
order in a 'guide tree' ... constructed using the UPGMA method .... The strategy
for aligning two alignments is a simple extension of the profile alignment
method of Gribskov et al. (1987)."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1992</PY>
<VO>8</VO>
<NO>2</NO>
<PP>189-191</PP>
</SEQ>

<SEQ>
<UI>0600   Higgins,D.G.  CLUSTAL: A Package for.. Gene            88 
73:237-244
</UI>
<AU>Higgins DG;
    Sharp PM
</AU>
<TI>CLUSTAL: A Package for Performing Multiple Sequence Alignment on a
Microcomputer
</TI>
<SU>Multiple alignment;
    Clustering;
    IR;
    Sequence alignment
</SU>
<AB>"An approach for performing multiple alignments of large numbers of amino
acid or nucleotide sequences is described. The method is based on first deriving
a phylogenetic tree from a matrix of all pairwise sequence similarity scores,
obtained using a fast pairwise alignment algorithm. Then the multiple alignment
is achieved from a series of pairwise alignments of clusters of sequences,
following the order of branching in the tree."
</AB>
<JT>Gene </JT>
<PY>1988</PY>
<VO>73</VO>
<PP>237-244</PP>
</SEQ>

<SEQ>
<UI>0601   Higgins,D.G.  Fast and Sensitive Mul.. Comput.Appl.Bio 89 
5(2):151-153
</UI>
<AU>Higgins DG;
    Sharp PM
</AU>
<TI>Fast and Sensitive Multiple Sequence Alignments on a Microcomputer
</TI>
<SU>Multiple alignment;
    Clustering;
    IR;
    Sequence alignment
</SU>
<AB>"A strategy is described for the rapid alignment of many long nucleic acid
or protein sequences on a microcomputer. ... The approach is based on
progressively aligning sequences according to the branching order in an initial
phylogenetic tree."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1989</PY>
<VO>5</VO>
<NO>2</NO>
<PP>151-153</PP>
</SEQ>

<SEQ>
<UI>0602   Higgins,D.G.  EMBLSCAN: Fast Approxi.. Comput.Appl.Bio 92 
8(2):137-139
</UI>
<AU>Higgins DG;
    Stoehr P
</AU>
<TI>EMBLSCAN: Fast Approximate DNA Database Searches on Compact Disc
</TI>
<SU>Database search;
    DE;
    Distributed;
    DNA
</SU>
<AB>"An algorithm that allows rapid searching of nucleic acid sequences based
on pregenerated index files is described. The programs and index files for
searching the entire EMBL nucleotide sequence collection are being distributed
on the EMBL Data Library's CD-ROM."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1992</PY>
<VO>8</VO>
<NO>2</NO>
<PP>137-139</PP>
</SEQ>

<SEQ>
<UI>0603   Hirosawa,M.   MASCOT: Multiple Align.. Comput.Appl.Bio 93 
9(2):161-167
</UI>
<AU>Hirosawa M;
    Hoshida M;
    Ishikawa M;
    Toya T
</AU>
<TI>MASCOT: Multiple Alignment System for Protein Sequences Based on Three-way
Dynamic Programming
</TI>
<SU>Multiple alignment;
    Clustering;
    JP;
    Simulated annealing;
    Protein;
    Dynamic programming;
    Dynamic
</SU>
<AB>"MASCOT achieves high-quality alignment by employing three-way alignment
in addition to two-way alignment. The resultant alignments are refined by
simulated annealing to higher quality. We also use a cluster analysis of
sequences to produce highly reliable alignments."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1993</PY>
<VO>9</VO>
<NO>2</NO>
<PP>161-167</PP>
</SEQ>

<SEQ>
<UI>0604   Hirschberg,D. A Linear Space Algorit.. Comm.ACM        75 
18(6):341-343
</UI>
<AU>Hirschberg DS
</AU>
<TI>A Linear Space Algorithm for Computing Maximal Common Subsequences
</TI>
<SU>Longest common;
    USA;
    Subsequence;
    Algorithm
</SU>
<AB>"The problem of finding a longest common subsequence of two strings has
been solved in quadratic time and space. An algorithm is presented which will
solve this problem in quadratic time and in linear space."
</AB>
<JT>Comm ACM </JT>
<PY>1975</PY>
<VO>18</VO>
<NO>6</NO>
<PP>341-343</PP>
</SEQ>

<SEQ>
<UI>0605   Hirschberg,D. Algorithms for the Lon.. J.Assoc.Comput. 77 
24(4):664-675
</UI>
<AU>Hirschberg DS
</AU>
<TI>Algorithms for the Longest Common Subsequence Problem
</TI>
<SU>Longest common;
    USA;
    Subsequence;
    Algorithm
</SU>
<AB>"Two algorithms are presented that solve the longest common subsequence
problem. The first algorithm is applicable in the general case and requires
O(pn + n log n) time where p is the length of the longest common subsequence. ...
In the common special case where p is close to m, [the second] algorithm takes
much less time than n^2."
</AB>
<JT>J Assoc Comput Mach</JT>
<PY>1977</PY>
<VO>24</VO>
<NO>4</NO>
<PP>664-675</PP>
</SEQ>

<SEQ>
<UI>0606   Hirschberg,D. An Information-Theoret.. Inform.Process. 78 
7(1):40-41
</UI>
<AU>Hirschberg DS
</AU>
<TI>An Information-Theoretic Lower Bound for the Longest Common Subsequence
Problem
</TI>
<SU>Longest common;
    Complexity;
    Information theory;
    USA;
    Subsequence
</SU>
<AB>"We shall prove that n log n is a lower bound on the number of
'less than - equal - greater than' comparisons required to solve the LCS
problem, assuming unrestricted alphabet size."
</AB>
<JT>Inform Process Lett</JT>
<PY>1978</PY>
<VO>7</VO>
<NO>1</NO>
<PP>40-41</PP>
</SEQ>

<SEQ>
<UI>0607   Hirschberg,D. Recent Results on the .. Time Warps, S.. 
83Addison-Wesley
</UI>
<AU>Hirschberg DS
</AU>
<TI>Recent Results on the Complexity of Common-Subsequence Problems
</TI>
<ED>Sankoff D
    Kruskal JB
</ED>
<BK>Time Warps, String Edits, and Macromolecules: The Theory and Practice of
Sequence Comparison
</BK>
<SU>Longest common;
    Complexity;
    USA
</SU>
<AB>An overview of recent results in the solution of a variety of
common-subsequence problems.
</AB>
<PU>Addison-Wesley </PU>
<PL>Reading, MA </PL>
<PY>1983</PY>
<PP>325-330</PP>
</SEQ>

<SEQ>
<UI>0608   Hirst,J.D.    Prediction of ATP-bind.. Protein Eng.    91 
4(6):615-623
</UI>
<AU>Hirst JD;
    Sternberg MJE
</AU>
<TI>Prediction of ATP-binding Motifs: A Comparison of a Perceptron-type Neural
Network and a Consensus Sequence Method
</TI>
<SU>Match a pattern matrix;
    UK;
    Pattern recognition;
    Motif;
    Neural;
    Consensus sequence;
    Statistical;
    Prediction;
    Network
</SU>
<AB>"In this paper, a two-layer feed-forward neural network has been trained
to recognize ATP-binding local sequence motifs. The neural network correctly
classified 78% of the 349 sequences used. This was much better than a simple
motif-searching program. A more sophisticated statistical method was developed,
however, which performed marginally better (80% correct classification) than the
neural network."
</AB>
<JT>Protein Eng</JT>
<PY>1991</PY>
<VO>4</VO>
<NO>6</NO>
<PP>615-623</PP>
</SEQ>

<SEQ>
<UI>0609   Hodgman,T.C.  The Elucidation of Pro.. Comput.Appl.Bio 89 
5(1):1-13
</UI>
<AU>Hodgman TC
</AU>
<TI>The Elucidation of Protein Function by Sequence Motif Analysis
</TI>
<SU>Sequence analysis;
    Review;
    UK;
    Motif;
    Function;
    Protein
</SU>
<AB>"Protein sequence motifs are acquiring increasing prominence in the area
of sequence analysis. This review describes the current methods of their
construction and their use in the determination of protein function, and offers
guidelines on interpreting data obtained."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1989</PY>
<VO>5</VO>
<NO>1</NO>
<PP>1-13</PP>
</SEQ>

<SEQ>
<UI>0610   Hogeweg,P.    The Alignment of Sets .. J.Mol.Evol.     84 
20:175-186
</UI>
<AU>Hogeweg P;
    Hesper B
</AU>
<TI>The Alignment of Sets of Sequences and the Construction of Phyletic Trees:
An Integrated Method
</TI>
<SU>Multiple alignment;
    Evolutionary tree;
    NL
</SU>
<AB>"The alignment of sets of sequences and the construction of phyletic trees
cannot be treated separately. The concept of 'good alignment' is meaningless
without reference to a phyletic tree, and the construction of phyletic trees
presupposes alignment of the sequences. We propose an integrated method that
generates both an alignment of a set of sequences and a phyletic tree."
</AB>
<JT>J Mol Evol</JT>
<PY>1984</PY>
<VO>20</VO>
<PP>175-186</PP>
</SEQ>

<SEQ>
<UI>0611   Horspool,R.N. Practical Fast Searchi.. Software.Practi 80 
10:501-506
</UI>
<AU>Horspool RN
</AU>
<TI>Practical Fast Searching in Strings
</TI>
<SU>String match;
    Boyer-Moore;
    CA
</SU>
<AB>"The problem of searching through text to find a specified substring is
considered in a practical setting. It is discovered that a method developed by
Boyer and Moore can outperform even special-purpose search instructions that may
be built into the computer hardware. For very short substrings however, these
special purpose instructions are fastest ...."
</AB>
<JT>Software Practice Experience </JT>
<PY>1980</PY>
<VO>10</VO>
<PP>501-506</PP>
</SEQ>
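<!--
Editorial sketch, not code from the paper: a minimal Python version of the
simplified Boyer-Moore search usually known as Horspool's algorithm. It keeps
only a bad-character shift table, keyed on the text character that sits under
the last pattern position, and compares right to left. The function name
horspool_search is hypothetical.

```python
def horspool_search(text: str, pattern: str) -> list[int]:
    """Return the start offsets of every occurrence of pattern in text."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Shift table: for each character in pattern[:-1], the distance from its
    # last occurrence there to the final pattern position. Others shift by m.
    shift = {}
    for i, ch in enumerate(pattern[:-1]):
        shift[ch] = m - 1 - i
    hits = []
    pos = 0
    while pos <= n - m:
        # Compare the window right to left.
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            hits.append(pos)
        # Shift on the text character aligned with the last pattern cell.
        pos += shift.get(text[pos + m - 1], m)
    return hits
```

For example, horspool_search("GATTACAGATTACA", "TACA") returns [3, 10]. Dropping
the good-suffix rule makes preprocessing trivial; the worst case is still
proportional to the product of text and pattern lengths.
-->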

<SEQ>
<UI>0612   Hsu,W.J.      Computing a Longest Co.. BIT             84 24:45-59
</UI>
<AU>Hsu WJ;
    Du MW
</AU>
<TI>Computing a Longest Common Subsequence for a Set of Strings
</TI>
<SU>Multiple alignment;
    CN;
    Dynamic programming;
    Longest common;
    Subsequence
</SU>
<AB>"The known 2-string LCS problem is generalized to finding a Longest Common
Subsequence (LCS) for a set of strings. A new, general approach that
systematically enumerates common subsequences is proposed for the solution. ...
The proposed method may be considered to be much more efficient than the
straightforward dynamic programming approach."
</AB>
<JT>BIT </JT>
<PY>1984</PY>
<VO>24</VO>
<PP>45-59</PP>
</SEQ>
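<!--
Editorial sketch of the baseline the abstract compares against, not the
authors' enumeration method: straightforward k-dimensional dynamic programming
for a longest common subsequence of a set of strings, written as a memoized
recursion. The number of subproblems is the product of the (length + 1) values,
which is the cost the proposed method aims to avoid. The function name
lcs_of_set is hypothetical.

```python
from functools import lru_cache


def lcs_of_set(strings: list[str]) -> str:
    """Return one longest common subsequence of all the given strings."""
    if not strings:
        return ""
    k = len(strings)

    @lru_cache(maxsize=None)
    def best(idx: tuple[int, ...]) -> str:
        # idx[i] counts the characters of strings[i] already consumed.
        if any(idx[i] == len(strings[i]) for i in range(k)):
            return ""
        heads = {strings[i][idx[i]] for i in range(k)}
        if len(heads) == 1:
            # Every remaining suffix starts with the same letter: some LCS uses it.
            return heads.pop() + best(tuple(x + 1 for x in idx))
        # Otherwise at least one string's next letter is unused: try skipping each.
        return max((best(idx[:i] + (idx[i] + 1,) + idx[i + 1:]) for i in range(k)),
                   key=len)

    return best(tuple(0 for _ in strings))
```

For example, lcs_of_set(["AGGTAB", "GXTXAYB", "AGTB"]) returns "GTB".
-->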

<SEQ>
<UI>0613   Hsu,W.J.      New Algorithms for LCS.. J.Comput.System 84 
29(2):133-152
</UI>
<AU>Hsu WJ;
    Du MW
</AU>
<TI>New Algorithms for LCS Problem
</TI>
<SU>Longest common;
    CN;
    Algorithm
</SU>
<AB>"Two algorithms which improve two existing results, respectively, are
presented. ... The [first] algorithm also exhibits desirable properties under
conditions of sparse matches. [The second] also outperforms existing algorithms
designed for sparsely-matched situations. ... The two algorithms provide
interesting contrasts of different approaches to one problem ...."
</AB>
<JT>J Comput Systems Sci</JT>
<PY>1984</PY>
<VO>29</VO>
<NO>2</NO>
<PP>133-152</PP>
</SEQ>

<SEQ>
<UI>0614   Huang,X.      A Lower Bound for the .. Inform.Process. 88 
27(6):319-321
</UI>
<AU>Huang X
</AU>
<TI>A Lower Bound for the Edit-distance Problem Under an Arbitrary Cost
Function
</TI>
<SU>Pairwise alignment;
    Complexity;
    USA;
    Edit;
    Longest common;
    Function
</SU>
<AB>"We show that any algorithm that can compute the edit distance of two
strings under an arbitrary cost function must take time proportional to n^2 under
the RAM model of computation, where n is the length of the strings. As a
corollary, we observe that the Hunt-Szymanski algorithm for longest common
subsequences cannot be extended to solve the general edit-distance problem."
</AB>
<JT>Inform Process Lett</JT>
<PY>1988</PY>
<VO>27</VO>
<NO>6</NO>
<PP>319-321</PP>
</SEQ>
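<!--
The bound in the abstract applies to the textbook dynamic program for edit
distance with arbitrary costs, which fills an (n + 1) x (m + 1) table. A
minimal Python sketch of that standard recurrence follows (editorial, not code
from the paper). The caller supplies the cost functions; the names
edit_distance, sub and indel are hypothetical, and only two table rows are kept.

```python
from typing import Callable


def edit_distance(a: str, b: str,
                  sub: Callable[[str, str], float],
                  indel: Callable[[str], float]) -> float:
    """Minimum total cost to turn a into b with substitutions (sub) and
    single-character insertions or deletions (indel)."""
    m = len(b)
    prev = [0.0] * (m + 1)
    for j in range(1, m + 1):
        prev[j] = prev[j - 1] + indel(b[j - 1])
    for i in range(1, len(a) + 1):
        cur = [prev[0] + indel(a[i - 1])] + [0.0] * m
        for j in range(1, m + 1):
            cur[j] = min(prev[j - 1] + sub(a[i - 1], b[j - 1]),  # match or substitute
                         prev[j] + indel(a[i - 1]),              # delete a[i - 1]
                         cur[j - 1] + indel(b[j - 1]))           # insert b[j - 1]
        prev = cur
    return prev[m]
```

With unit costs (sub returns 0 for equal letters and 1 otherwise, indel returns
1), edit_distance("kitten", "sitting", ...) is 3.
-->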

<SEQ>
<UI>0615   Huang,X.      A Space-efficient Para.. Internat.J.Para 89 
18(3):223-239
</UI>
<AU>Huang X
</AU>
<TI>A Space-efficient Parallel Sequence Comparison Algorithm for a
Message-passing Multiprocessor
</TI>
<SU>Subalignment;
    Parallel;
    USA;
    Sequence comparison;
    Sequence alignment;
    Algorithm
</SU>
<AB>"We present a parallel algorithm for computing an optimal sequence
alignment in efficient space. The algorithm is intended for a message-passing
architecture with one-dimensional-array topology. ... Some experimental results
on an Intel hypercube are provided."
</AB>
<JT>Internat J Parallel Programming </JT>
<PY>1989</PY>
<VO>18</VO>
<NO>3</NO>
<PP>223-239</PP>
</SEQ>

<SEQ>
<UI>0616   Huang,X.      Computing Local Sequen.. Proceedings o.. 90
</UI>
<AU>Huang X
</AU>
<TI>Computing Local Sequence Similarities on a Hypercube
</TI>
<BK>Proceedings of the 1990 International Conference on Parallel Processing,
Vol. III
</BK>
<SU>Subalignment;
    Parallel;
    USA;
    Region;
    Similarity
</SU>
<AB>"Recently, a space efficient algorithm for finding similar regions of two
sequences has been developed (Huang, Hardison, Miller 1990). In this paper, we
consider parallelizing the algorithm on an Intel iPSC/2 hypercube. Experimental
results show that high parallel efficiency is achieved."
</AB>
<PY>1990</PY>
<PP>360-361</PP>
</SEQ>

<SEQ>
<UI>0617   Huang,X.      A Space-efficient Algo.. Comput.Appl.Bio 90 
6(4):373-381
</UI>
<AU>Huang X;
    Hardison RC;
    Miller W
</AU>
<TI>A Space-efficient Algorithm for Local Similarities
</TI>
<SU>Subalignment;
    USA;
    Similarity;
    Algorithm
</SU>
<AB>"We describe a dynamic-programming local-similarity algorithm that needs
only space proportional to the sum of the sequence lengths. The method can also
find repeats within a single long sequence. ... Our linear-space local
similarity algorithm combines the linear-space global alignment algorithm of
Myers and Miller (1988) with techniques of Waterman and Eggert (1987)."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1990</PY>
<VO>6</VO>
<NO>4</NO>
<PP>373-381</PP>
</SEQ>
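<!--
The memory saving described in the abstract starts from the fact that the
Smith-Waterman local-similarity recurrence reads only the previous matrix row.
The Python sketch below is an editorial simplification, not the authors'
algorithm. It keeps two rows to report the best local score and where it ends,
using O(len(b)) space. Recovering the alignment itself in linear space needs
the Myers-Miller divide-and-conquer step the abstract mentions, which is
omitted. The scores (match 2, mismatch -1, linear gap -2) and the name
best_local_score are assumptions.

```python
def best_local_score(a: str, b: str, match: int = 2, mismatch: int = -1,
                     gap: int = -2) -> tuple[int, int, int]:
    """Return (score, end_i, end_j): the best local alignment score and the
    1-based positions in a and b where one such alignment ends."""
    m = len(b)
    prev = [0] * (m + 1)
    best = (0, 0, 0)
    for i in range(1, len(a) + 1):
        cur = [0] * (m + 1)
        for j in range(1, m + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            if cur[j] > best[0]:
                best = (cur[j], i, j)
        prev = cur  # only the previous row is ever needed
    return best
```

For example, best_local_score("AAAGGGTTT", "CCGGGCC") returns (6, 6, 5): the
shared GGG ends at position 6 of the first sequence and position 5 of the second.
-->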

<SEQ>
<UI>0618   Huang,X.      A Time-Efficient, Line.. Adv.Appl.Math.  91 
12:337-357
</UI>
<AU>Huang X;
    Miller W
</AU>
<TI>A Time-Efficient, Linear-Space Local Similarity Algorithm
</TI>
<SU>Subalignment;
    USA;
    Similarity;
    Algorithm
</SU>
<AB>"This paper presents a time-efficient algorithm that produces k best
'non-intersecting' local alignments for any chosen k. The algorithm's main strength
is that it needs only O(M + N + K) space, where M and N are the lengths of the
given sequences and K is the total length of the computed alignments."
</AB>
<JT>Adv Appl Math</JT>
<PY>1991</PY>
<VO>12</VO>
<PP>337-357</PP>
</SEQ>

<SEQ>
<UI>0619   Huang,X.      Parallelization of a L.. Comput.Appl.Bio 92 
8(2):155-165
</UI>
<AU>Huang X;
    Miller W;
    Schwartz S;
    Hardison RC
</AU>
<TI>Parallelization of a Local Similarity Algorithm
</TI>
<SU>Subalignment;
    Parallel;
    USA;
    Region;
    Sequence comparison;
    Similarity;
    Algorithm
</SU>
<AB>"We describe how to parallelize the new algorithm [to determine the
similar regions within two given sequences] and present results of experimental
studies on an Intel hypercube. The parallel method provides rapid,
high-resolution alignments for users of our software toolkit for pairwise sequence
comparison ...."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1992</PY>
<VO>8</VO>
<NO>2</NO>
<PP>155-165</PP>
</SEQ>

<SEQ>
<UI>0620   Hume,A.       A Tale of Two Greps      Software.Practi 88 
18(11):1063-10
</UI>
<AU>Hume A
</AU>
<TI>A Tale of Two Greps
</TI>
<SU>Match complex patterns;
    USA;
    Text search;
    Program;
    Boyer-Moore
</SU>
<AB>"Text searching programs such as the UNIX system tools grep and egrep
require more than just good algorithms; they need to make efficient use of
system resources such as I/O. ... I also describe incorporating the Boyer-Moore
algorithm into egrep; egrep is now typically 8-10 (for some common patterns
30-40) times faster than grep."
</AB>
<JT>Software Practice Experience </JT>
<PY>1988</PY>
<VO>18</VO>
<NO>11</NO>
<PP>1063-1072</PP>
</SEQ>

<SEQ>
<UI>0621   Hume,A.       Fast String Searching    Software.Practi 91 
21(11):1221-12
</UI>
<AU>Hume A;
    Sunday D
</AU>
<TI>Fast String Searching
</TI>
<SU>String match;
    Boyer-Moore;
    USA;
    String search
</SU>
<AB>The Boyer-Moore algorithm "has been the standard benchmark for the
practical string search literature. Yet this yardstick compares badly with
current practice. We describe two algorithms that perform 47% fewer comparisons
and are about 4.5 times faster across a wide range of architectures and
compilers. These new variants are members of a family of algorithms based on the
skip loop structure of the preferred, but often neglected, fast form of
Boyer-Moore."
</AB>
<JT>Software Practice Experience </JT>
<PY>1991</PY>
<VO>21</VO>
<NO>11</NO>
<PP>1221-1248</PP>
</SEQ>

<SEQ>
<UI>0622   Hunt,J.W.     A Fast Algorithm for C.. Comm.ACM        77 
20(5):350-353
</UI>
<AU>Hunt JW;
    Szymanski TG
</AU>
<TI>A Fast Algorithm for Computing Longest Common Subsequences
</TI>
<SU>Longest common;
    USA;
    Dynamic programming;
    Subsequence;
    Algorithm
</SU>
<AB>This algorithm does not use the dynamic programming approach; the authors
"suggested extracting a longest common subsequence from the two strings and
producing the editing changes from the subsequence."
</AB>
<JT>Comm ACM </JT>
<PY>1977</PY>
<VO>20</VO>
<NO>5</NO>
<PP>350-353</PP>
</SEQ>
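<!--
A minimal Python sketch of the threshold idea behind the Hunt-Szymanski
algorithm, computing only the LCS length (editorial, not the paper's code).
For each character of a, it visits the matching positions in b in decreasing
order and updates a sorted threshold list by binary search. The running time
therefore depends on the number r of matching pairs, roughly O((r + n) log n),
instead of on the full table size. Recovering the subsequence itself would need
back-pointers, which are omitted. The function name is hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict


def hunt_szymanski_lcs_length(a: str, b: str) -> int:
    """Length of a longest common subsequence of a and b.

    thresh[k] is the smallest index in b at which a common subsequence of
    length k + 1 can end, given the prefix of a processed so far.
    """
    positions = defaultdict(list)
    for j, ch in enumerate(b):
        positions[ch].append(j)
    thresh: list[int] = []
    for ch in a:
        # Decreasing j ensures one character of a extends at most one subsequence.
        for j in reversed(positions.get(ch, [])):
            k = bisect_left(thresh, j)
            if k == len(thresh):
                thresh.append(j)
            else:
                thresh[k] = j
    return len(thresh)
```

For example, hunt_szymanski_lcs_length("ABCBDAB", "BDCABA") returns 4. The
method is fast when matches are sparse; over a small alphabet such as DNA,
nearly every pair can match and the advantage disappears.
-->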

<SEQ>
<UI>0623   Ibarra,O.H.   String Editing on a On.. IEEE Trans.Comp 92 
41(1):112-118
</UI>
<AU>Ibarra OH;
    Jiang T;
    Wang H
</AU>
<TI>String Editing on a One-way Linear Array of Finite-state Machines
</TI>
<SU>Pairwise alignment;
    Parallel;
    USA;
    Longest common;
    Editing
</SU>
<AB>"We give an efficient parallel algorithm for the string edit problem. The
model of computation is a one-way linear array of identical finite-state
machines (nodes). ... Our algorithm can produce the actual minimum-cost edit
sequence in linear time. ... We also give applications to other problems such as
the longest common subsequence and approximate pattern matching."
</AB>
<JT>IEEE Trans Comput</JT>
<PY>1992</PY>
<VO>41</VO>
<NO>1</NO>
<PP>112-118</PP>
</SEQ>

<SEQ>
<UI>0624   Ibarra,O.H.   String Processing on t.. IEEE Trans.Acou 90 
38(1):160-164
</UI>
<AU>Ibarra OH;
    Pong TC;
    Sohn SM
</AU>
<TI>String Processing on the Hypercube
</TI>
<SU>Pairwise comparison;
    Parallel;
    USA;
    String match;
    Signal;
    Longest common
</SU>
<AB>"We give parallel algorithms for solving some string comparison problems
on the hypercube. These algorithms are widely applicable to the problems of
speech and signal processing." Problems considered: keyword matching, longest
common subsequence, string editing, and minimum-length time warping.
</AB>
<JT>IEEE Trans Acoustics Speech Signal Processing </JT>
<PY>1990</PY>
<VO>38</VO>
<NO>1</NO>
<PP>160-164</PP>
</SEQ>

<SEQ>
<UI>0625   Isenman,M.E.  Performance and Archit.. IEEE Trans.Comp 90 
39(2):238-250
</UI>
<AU>Isenman ME;
    Shasha DE
</AU>
<TI>Performance and Architectural Issues for String Matching
</TI>
<SU>Knuth-Morris-Pratt;
    USA;
    String match;
    Performance
</SU>
<AB>"We introduce special heuristics to the Knuth-Morris-Pratt algorithm to
reduce the time and space required to perform the string matching. We compare
our hardware-based approach to the software approaches embodied in the UNIX
System grep and fgrep commands. ... We concentrate on hardware that can handle
variable length don't cares ...."
</AB>
<JT>IEEE Trans Comput</JT>
<PY>1990</PY>
<VO>39</VO>
<NO>2</NO>
<PP>238-250</PP>
</SEQ>
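<!--
For reference, a minimal Python sketch of the plain software
Knuth-Morris-Pratt algorithm that the paper starts from. It does not include
the authors' hardware heuristics or variable length don't cares. The failure
table stores, for each pattern prefix, the length of its longest proper border
(a prefix that is also a suffix), so the text is scanned once without backing
up. The function name kmp_search is hypothetical.

```python
def kmp_search(text: str, pattern: str) -> list[int]:
    """Return the start offsets of every occurrence of pattern in text."""
    if not pattern:
        return []
    # fail[i] = length of the longest proper border of pattern[:i + 1].
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    hits, k = [], 0
    for i, ch in enumerate(text):
        # Fall back through borders instead of re-reading text characters.
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = fail[k - 1]  # allow overlapping occurrences
    return hits
```

For example, kmp_search("ABABDABACDABABCABAB", "ABABCABAB") returns [10].
-->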

<SEQ>
<UI>0626   Ishikawa,M.   Multiple Sequence Alig.. Comput.Appl.Bio 93 
9(3):267-273
</UI>
<AU>Ishikawa M;
    Toya T;
    Hoshida M;
    Nitta K;
    Ogiwara A;
    Kanehisa M
</AU>
<TI>Multiple Sequence Alignment by Parallel Simulated Annealing
</TI>
<SU>Multiple alignment;
    JP;
    Sequence alignment;
    Parallel;
    Simulated annealing
</SU>
<AB>"We have developed simulated annealing algorithms to solve the problem of
multiple sequence alignment. ... To overcome long execution times for simulated
annealing, we utilized a parallel computer. ... The algorithm is also useful for
refining multiple alignments obtained by other heuristic methods."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1993</PY>
<VO>9</VO>
<NO>3</NO>
<PP>267-273</PP>
</SEQ>

<SEQ>
<UI>0627   Itoga,S.Y.    The String Merging Pro.. BIT             81 
21(1):20-30
</UI>
<AU>Itoga SY
</AU>
<TI>The String Merging Problem
</TI>
<SU>Multiple comparison;
    USA;
    Correction;
    Longest common
</SU>
<AB>"The string merging problem is to determine a merged string from a given
set of strings. ... Necessary and sufficient conditions are presented for the
case where this solution matches the solution to the string-to-string correction
problem. A special case where deletion is the only allowed edition [sic]
operation is shown to have the longest common subsequence of the strings as its
solution."
</AB>
<JT>BIT </JT>
<PY>1981</PY>
<VO>21</VO>
<NO>1</NO>
<PP>20-30</PP>
</SEQ>

<SEQ>
<UI>0628   Ivanov,A.G.   Recognition of an Appr.. Math.USSR-Izv.  85 
24(3):479-522
</UI>
<AU>Ivanov AG
</AU>
<TI>Recognition of an Approximate Occurrence of Words on a Turing Machine in
Real Time
</TI>
<SU>Match with k mismatches;
    RU;
    Word;
    Recognition
</SU>
<AB>"A Turing machine is constructed which in real time solves the problem of
the approximate identification of occurrences of words with respect to a number
of familiar metrics [e.g., Hamming, Minkowski]."
</AB>
<JT>Math USSR-Izv </JT>
<PY>1985</PY>
<VO>24</VO>
<NO>3</NO>
<PP>479-522</PP>
</SEQ>

<SEQ>
<UI>0629   Jagadeeswaran Interactive Computer P.. Nucleic Acids R 82 
10(1):433-447
</UI>
<AU>Jagadeeswaran P;
    McGuire PM Jr
</AU>
<TI>Interactive Computer Programs in Sequence Data Analysis
</TI>
<SU>Pairwise comparison;
    Dot;
    USA;
    Program
</SU>
<AB>"The first group of programs named MATCH ... is designed to generate the
dot matrix which provides information on the homology between two sequences and
direct and inverted repeats within a sequence."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1982</PY>
<VO>10</VO>
<NO>1</NO>
<PP>433-447</PP>
</SEQ>
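<!--
A minimal Python sketch of a dot matrix comparison in the spirit of the
abstract. This is an editorial illustration, not the MATCH program. The window
and stringency filter is a common dot-plot refinement assumed here; with
window=1 and stringency=1, every pair of identical residues gets a dot. Inverted
repeats within one DNA sequence show up when the sequence is plotted against
its reverse complement. All names are hypothetical.

```python
def dot_matrix(a: str, b: str, window: int = 1,
               stringency: int = 1) -> list[list[bool]]:
    """Cell [i][j] is True when a[i:i + window] and b[j:j + window] agree in
    at least `stringency` positions."""
    rows = max(len(a) - window + 1, 0)
    cols = max(len(b) - window + 1, 0)
    return [[sum(a[i + k] == b[j + k] for k in range(window)) >= stringency
             for j in range(cols)]
            for i in range(rows)]


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase ACGT string."""
    return seq[::-1].translate(str.maketrans("ACGT", "TGCA"))


def render(matrix: list[list[bool]]) -> str:
    """Text plot: '*' for a dot, '.' otherwise; rows follow a, columns follow b."""
    return "\n".join("".join("*" if cell else "." for cell in row) for row in matrix)
```

For example, print(render(dot_matrix("ACG", "ACG"))) draws three dots on the
main diagonal, and dot_matrix(s, reverse_complement(s)) exposes inverted repeats
in s as diagonals.
-->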

<SEQ>
<UI>0630   Johnson,M.S.  A Method for the Simul.. J.Mol.Evol.     86 
23:267-278
</UI>
<AU>Johnson MS;
    Doolittle RF
</AU>
<TI>A Method for the Simultaneous Alignment of Three or More Amino Acid
Sequences
</TI>
<SU>Multiple alignment;
    Segment;
    USA;
    Amino acid
</SU>
<AB>"The basis of the approach is a progressive evaluation of selected
segments from each sequence. Only a small subset of all possible segments from
each sequence is compared, and a minimum of information is retained for the
trace-back of the alignment. As a result, this method has the advantage of being
both rapid and minimally consumptive of computer memory when constructing the
alignment."
</AB>
<JT>J Mol Evol</JT>
<PY>1986</PY>
<VO>23</VO>
<PP>267-278</PP>
</SEQ>

<SEQ>
<UI>0631   Jones,R.      Sequence Pattern Match.. Comput.Appl.Bio 92 
8(4):377-383
</UI>
<AU>Jones R
</AU>
<TI>Sequence Pattern Matching on a Massively Parallel Computer
</TI>
<SU>Database search;
    Parallel;
    USA;
    Pattern match;
    Gap
</SU>
<AB>"A method is described for finding all occurrences of a sequence pattern
within a database of molecular sequences. Implementation of this on a massively
parallel computer [the CM-2] allows the user to perform very fast database
searches using complex patterns. In particular, the software supports
approximate pattern matching with score thresholds for either the entire pattern
or specified elements thereof. Matches to individual elements can be linked by
variable length gaps ...."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1992</PY>
<VO>8</VO>
<NO>4</NO>
<PP>377-383</PP>
</SEQ>

<SEQ>
<UI>0632   Jones,R.      Protein Sequence Compa.. Computers and.. 
90Addison-Wesley
</UI>
<AU>Jones R;
    Taylor W IV;
    Zhang X;
    Mesirov JP;
    Lander E
</AU>
<TI>Protein Sequence Comparison on the Connection Machine CM-2
</TI>
<ED>Bell G
    Marr T
</ED>
<BK>Computers and DNA, SFI Studies in the Sciences of Complexity, Vol. VII
</BK>
<SU>Subalignment;
    Parallel;
    USA;
    Sequence comparison;
    Protein
</SU>
<AB>"The appropriate algorithm for searching a database is that of Smith and
Waterman (1981), which locates the best common subsequence between two otherwise
unrelated sequences. ... Here we present our implementation of this algorithm on
the data parallel Connection Machine CM-2, manufactured by Thinking Machines
Corporation."
</AB>
<PU>Addison-Wesley </PU>
<PL>Reading, MA </PL>
<PY>1990</PY>
<PP>99-107</PP>
</SEQ>

<SEQ>
<UI>0633   Jukes,T.H.    Evolution of Protein M.. Mammalian Pro.. 69Academic 
Press
</UI>
<AU>Jukes TH;
    Cantor CR
</AU>
<TI>Evolution of Protein Molecules
</TI>
<ED>Munro HN
</ED>
<BK>Mammalian Protein Metabolism, Volume III
</BK>
<SU>Sequence proximity;
    Substitution;
    USA;
    Approximation;
    Evolution;
    Protein
</SU>
<AB>"From the triplet nature of the code one can see that certain amino acid
interchanges are much more likely than others in the limit of small number of
base changes. Depending on the amino acids involved, it can take either 1, 2, or
3 base changes in DNA (or RNA) to convert one amino acid to another. ... The
approximation one must make is to say that all single base changes are equally
probable."
</AB>
<PU>Academic Press </PU>
<PL>New York </PL>
<PY>1969</PY>
<PP>21-132</PP>
</SEQ>
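<!--
The abstract's point that converting one amino acid to another takes 1, 2, or
3 base changes can be checked directly against the standard genetic code. The
Python sketch below is an editorial illustration, and its names are
hypothetical. It counts the fewest nucleotide substitutions between any codon
of one amino acid and any codon of another, using the NCBI standard code
(translation table 1), with '*' marking stop codons.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons are ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def codons_for(aa: str) -> list[str]:
    """All codons that encode the one-letter amino acid aa."""
    return [codon for codon, a in CODE.items() if a == aa]


def min_base_changes(aa1: str, aa2: str) -> int:
    """Fewest single-base substitutions turning a codon of aa1 into a codon of aa2."""
    return min(sum(x != y for x, y in zip(c1, c2))
               for c1 in codons_for(aa1) for c2 in codons_for(aa2))
```

For example, min_base_changes("F", "L") is 1, min_base_changes("W", "F") is 2,
and min_base_changes("W", "N") is 3.
-->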

<SEQ>
<UI>0634   Kanaoka,M.    Alignment of Protein S.. Protein Eng.    89 
2(5):347-351
</UI>
<AU>Kanaoka M;
    Kishimoto F;
    Ueki Y;
    Umeyama H
</AU>
<TI>Alignment of Protein Sequences using the Hydrophobic Core Scores
</TI>
<SU>Pairwise alignment;
    JP;
    Region;
    Gap;
    Protein;
    Score
</SU>
<AB>"To improve the accuracy of [pairwise] alignments, we introduced the
concept of hydrophobic core scores, which restrains putting insertions/deletions
in the hydrophobic core regions of the protein. ... The introduction of the
hydrophobic core scores derived from the knowledge of the tertiary structure of
one of each pair resulted in an improvement of the accuracy of the alignments."
</AB>
<JT>Protein Eng</JT>
<PY>1989</PY>
<VO>2</VO>
<NO>5</NO>
<PP>347-351</PP>
</SEQ>

<SEQ>
<UI>0635   Kanehisa,M.   Use of Statistical Cri.. Nucleic Acids R 84 
12(1):203-213
</UI>
<AU>Kanehisa M
</AU>
<TI>Use of Statistical Criteria for Screening Potential Homologies in Nucleic
Acid Sequences
</TI>
<SU>Subalignment;
    Significance;
    USA;
    Statistical;
    Segment;
    Monte Carlo;
    Homology;
    Nucleic acid
</SU>
<AB>"We proposed a simple formula to assess the statistical significance of
homologous segments found in comparison of two nucleic acid sequences (Goad,
Kanehisa 1982). This paper clarifies the basic assumptions of the formula and
its reliability is examined by Monte Carlo calculations. The results were
satisfactory for random sequences."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1984</PY>
<VO>12</VO>
<NO>1</NO>
<PP>203-213</PP>
</SEQ>

<SEQ>
<UI>0636   Karlin,S.     Methods for Assessing .. Proc.Nat.Acad.S 90 
87(6):2264-226
</UI>
<AU>Karlin S;
    Altschul SF
</AU>
<TI>Methods for Assessing the Statistical Significance of Molecular Sequence
Features by Using General Scoring Schemes
</TI>
<SU>Sequence analysis;
    Significance;
    USA;
    Statistical;
    Segment;
    Sequence alignment;
    Scoring
</SU>
<AB>"The distribution of the maximal segment score for randomly generated
single or multiple protein sequences is available under broad conditions. Such
results may serve as benchmarks of statistical significance. The results also
provide a means for choosing suitable scoring schemes."
</AB>
<JT>Proc Nat Acad Sci USA </JT>
<PY>1990</PY>
<VO>87</VO>
<NO>6</NO>
<PP>2264-2268</PP>
</SEQ>
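<!--
In Karlin-Altschul theory, given a negative expected score and at least one
positive score, the key parameter is lambda, the unique positive solution of
sum over i, j of p_i p_j exp(lambda s_ij) = 1. The expected number of segment
pairs scoring at least S between sequences of lengths m and n is then
approximately K m n exp(-lambda S). The same theory treats the scores as
log-odds scores with implicit target frequencies p_i p_j exp(lambda s_ij), which
is what makes it useful for choosing a scoring scheme. The Python sketch below
is editorial, and its names are hypothetical. It finds lambda by bisection;
computing K is more involved and is not attempted.

```python
import math


def karlin_altschul_lambda(freqs: dict[str, float],
                           score: dict[tuple[str, str], float]) -> float:
    """Positive root of sum_ij p_i * p_j * exp(lam * s_ij) = 1, by bisection."""
    pairs = [(freqs[x] * freqs[y], score[(x, y)]) for x in freqs for y in freqs]
    if (sum(p * s for p, s in pairs) >= 0
            or max((s for p, s in pairs if p > 0), default=0) <= 0):
        raise ValueError("need a negative expected score and a positive score")

    def excess(lam: float) -> float:
        # Convex in lam, zero at 0, negative just above 0, positive past the root.
        return sum(p * math.exp(lam * s) for p, s in pairs) - 1.0

    lo, hi = 0.0, 1.0
    while excess(hi) < 0:   # widen the bracket until it contains the root
        hi *= 2.0
    for _ in range(100):    # plain bisection
        mid = (lo + hi) / 2.0
        if excess(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

For uniform DNA frequencies with match +1 and mismatch -1, the equation
e^lambda / 4 + 3 e^(-lambda) / 4 = 1 gives e^lambda = 3, so lambda = ln 3 ≈ 1.0986.
-->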

<SEQ>
<UI>0637   Karlin,S.     Identification of Sign.. Methods Enzymol 90 
183:388-402
</UI>
<AU>Karlin S;
    Blaisdell BE;
    Brendel V
</AU>
<TI>Identification of Significant Sequence Patterns in Proteins
</TI>
<SU>Sequence analysis;
    Significance;
    USA;
    Identification;
    Pattern discovery;
    Protein
</SU>
<AB>"The methods described in this chapter identify statistically significant
amino acid sequence configurations of many kinds. Our objective is to identify
diagnostic sequence features that might provide insights into protein function
and structure and ways of protein classification. ... Our focus here is to
identify statistically significant clusters, runs, and periodic patterns of
charge."
</AB>
<JT>Methods Enzymol</JT>
<PY>1990</PY>
<VO>183</VO>
<PP>388-402</PP>
</SEQ>

<SEQ>
<UI>0638   Karlin,S.     Chance and Statistical.. Science         92 257(3 
July):39
</UI>
<AU>Karlin S;
    Brendel V
</AU>
<TI>Chance and Statistical Significance in Protein and DNA Sequence Analysis
</TI>
<SU>Sequence analysis;
    Significance;
    USA;
    Statistical;
    Sequence comparison;
    Scoring;
    Protein;
    DNA
</SU>
<AB>"Statistical approaches help in the determination of significant
configurations in ... sequence data. Three recent statistical methods are
discussed: (i) score-based sequence analysis that provides a means for
characterizing anomalies in local sequence text and for evaluation sequence
comparisons; (ii) quantile distributions of amino acid usage that reveal general
compositional biases in proteins ...; and (iii) r-scan statistics that can be
applied to the analysis of spacings of sequence markers."
</AB>
<JT>Science </JT>
<PY>1992</PY>
<VO>257</VO>
<NO>3 July</NO>
<PP>39-49</PP>
</SEQ>

<SEQ>
<UI>0639   Karlin,S.     Statistical Methods an.. Annu.Rev.Biophy 91 
20:175-203
</UI>
<AU>Karlin S;
    Bucher P;
    Brendel V;
    Altschul SF
</AU>
<TI>Statistical Methods and Insights for Protein and DNA Sequences
</TI>
<SU>Sequence analysis;
    Significance;
    Review;
    USA;
    Statistical;
    Sequence comparison;
    Clustering;
    Protein;
    DNA
</SU>
<AB>"This article focuses on the statistics of protein sequences and the
insights they can provide to structure, function, and phylogenetic relatedness."
Topics covered: sequence concepts and statistical significance; sequence
comparisons and searches; evaluation of clustering in protein sequences;
comparative compositional analysis of protein sequences; unusual spacings
between sequence letters or words.
</AB>
<JT>Annu Rev Biophys Biophys Chem</JT>
<PY>1991</PY>
<VO>20</VO>
<PP>175-203</PP>
</SEQ>

<SEQ>
<UI>0640   Karlin,S.     Statistical Compositio.. Ann.Statist.    90 
18(2):571-581
</UI>
<AU>Karlin S;
    Dembo A;
    Kawabata T
</AU>
<TI>Statistical Composition of High-scoring Segments from Molecular Sequences
</TI>
<SU>Sequence analysis;
    Significance;
    USA;
    Statistical;
    Segment;
    Probabilistic;
    Scoring;
    Composition
</SU>
<AB>"We present new probabilistic formulas for characterizing statistically
significant sequence configurations with respect to a general scoring scheme
associated with letter attributes and for enabling varying degrees in letter
matches. We describe the asymptotic extremal distribution of high aggregate
segment scores and the letter composition of high-scoring segments."
</AB>
<JT>Ann Statist</JT>
<PY>1990</PY>
<VO>18</VO>
<NO>2</NO>
<PP>571-581</PP>
</SEQ>

<SEQ>
<UI>0641   Karlin,S.     Comparative Statistics.. Proc.Nat.Acad.S 85 
82(18):6186-61
</UI>
<AU>Karlin S;
    Ghandour G
</AU>
<TI>Comparative Statistics for DNA and Protein Sequences: Multiple Sequence
Analysis
</TI>
<SU>Sequence analysis;
    Significance;
    USA;
    Statistical;
    Multiple comparison;
    Protein;
    DNA
</SU>
<AB>"Concepts and methods [Karlin and Ghandour, 1985] for the analysis of
patterns and relationships are extended to multiple DNA and protein sequences.
Functionals include multiple sequence common word occurrence distributions,
characterizations of high frequency shared words, and ascertainment of long
block identities."
</AB>
<JT>Proc Nat Acad Sci USA </JT>
<PY>1985</PY>
<VO>82</VO>
<NO>18</NO>
<PP>6186-6190</PP>
</SEQ>

<SEQ>
<UI>0642   Karlin,S.     Comparative Statistics.. Proc.Nat.Acad.S 85 
82(17):5800-58
</UI>
<AU>Karlin S;
    Ghandour G
</AU>
<TI>Comparative Statistics for DNA and Protein Sequences: Single Sequence
Analysis
</TI>
<SU>Sequence analysis;
    Significance;
    USA;
    Statistical;
    Repeat;
    Protein;
    DNA
</SU>
<AB>"Four categories of data representations are used to help interpret
structures and similarities of nucleic acid and protein sequences. Statistical
significance of the observed relationships revealed by these representations are
assessed by a hierarchy of permutation procedures and by comparisons with
theoretical random models."
</AB>
<JT>Proc Nat Acad Sci USA </JT>
<PY>1985</PY>
<VO>82</VO>
<NO>17</NO>
<PP>5800-5804</PP>
</SEQ>

<SEQ>
<UI>0643   Karlin,S.     Multiple-alphabet Amin.. Proc.Nat.Acad.S 85 
82:8597-8601
</UI>
<AU>Karlin S;
    Ghandour G
</AU>
<TI>Multiple-alphabet Amino Acid Sequence Comparisons of the Immunoglobulin
Kappa-chain Constant Domain
</TI>
<SU>Sequence analysis;
    Significance;
    USA;
    Statistical;
    Codon;
    Amino acid;
    Sequence comparison
</SU>
<AB>"We compare the amino acid sequences of the constant domains of the
immunoglobulin kappa chain of human, mouse, and rabbit by using four ... 'alphabets'
of the 20 amino acids based on their chemical, functional, charge, and
structural properties."
</AB>
<JT>Proc Nat Acad Sci USA </JT>
<PY>1985</PY>
<VO>82</VO>
<PP>8597-8601</PP>
</SEQ>

<SEQ>
<UI>0644   Karlin,S.     The Use of Multiple Al.. EMBO J.         85 
4(5):1217-1223
</UI>
<AU>Karlin S;
    Ghandour G
</AU>
<TI>The Use of Multiple Alphabets in Kappa-gene Immunoglobulin DNA Sequence
Comparisons
</TI>
<SU>Sequence analysis;
    Significance;
    USA;
    Statistical;
    DNA;
    Sequence comparison
</SU>
<AB>"Comparisons within and between [three] DNA sequences are carried out in
terms of three two-letter nucleotide alphabets: ... (ii) P-Q alphabet which
distinguishes purines ... from pyrimidines .... The P-Q alphabet comparisons
reveal an abundance of statistically significant block identities not seen at
the nucleotide level."
</AB>
<JT>EMBO J</JT>
<PY>1985</PY>
<VO>4</VO>
<NO>5</NO>
<PP>1217-1223</PP>
</SEQ>
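<!--
A minimal Python sketch of the P-Q comparison the abstract describes. Each DNA
sequence is recoded into a two-letter purine/pyrimidine alphabet, and the
longest block identity (the longest run of consecutive matching letters) shared
by two sequences is then found. The letter assignment P = purine, Q = pyrimidine
is an assumption, the paper's significance assessment is not included, and the
function names are hypothetical.

```python
def to_pq(seq: str) -> str:
    """Recode DNA: P for a purine (A, G), Q for a pyrimidine (C, T); assumed letters."""
    table = {"A": "P", "G": "P", "C": "Q", "T": "Q"}
    return "".join(table[base] for base in seq.upper())


def longest_block_identity(a: str, b: str) -> tuple[int, int, int]:
    """Longest run of consecutive matching letters shared by a and b,
    returned as (length, start_in_a, start_in_b) with 0-based starts."""
    best = (0, 0, 0)
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the run ending at the previous diagonal cell.
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[0]:
                    best = (cur[j], i - cur[j], j - cur[j])
        prev = cur
    return best
```

For example, "AGCT" and "GATC" share no block longer than one nucleotide, but
both recode to "PPQQ", so longest_block_identity(to_pq("AGCT"), to_pq("GATC"))
returns (4, 0, 0).
-->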

<SEQ>
<UI>0645   Karlin,S.     DNA Sequence Compariso.. Mol.Biol.Evol.  85 
2(1):35-52
</UI>
<AU>Karlin S;
    Ghandour G;
    Foulser DE
</AU>
<TI>DNA Sequence Comparisons of the Human, Mouse, and Rabbit Immunoglobulin
Kappa Gene
</TI>
<SU>Sequence analysis;
    Significance;
    USA;
    Sequence comparison;
    Statistical;
    Longest common;
    Gene;
    DNA
</SU>
<AB>"New formulas for determining the expected length and variance of the
longest block identity (a succession of matching nucleotides) between multiple
random sequences are given and are used to establish statistical criteria for
ascertaining the significance of block identities shared in r out of s
sequences."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1985</PY>
<VO>2</VO>
<NO>1</NO>
<PP>35-52</PP>
</SEQ>

<SEQ>
<UI>0646   Karlin,S.     New Approaches for Com.. Proc.Nat.Acad.S 83 
80(18):5660-56
</UI>
<AU>Karlin S;
    Ghandour G;
    Ost F;
    Tavare S;
    Korn LJ
</AU>
<TI>New Approaches for Computer Analysis of Nucleic Acid Sequences
</TI>
<SU>Multiple comparison;
    Common feature;
    USA;
    Significance;
    Dyad;
    Statistical;
    Nucleic acid
</SU>
<AB>"A new ... algorithm is outlined that ascertains within and between
nucleic acid and protein sequences all direct repeats, dyad symmetries, and
other structural relationships. Large repeats, repeats of high frequency, dyad
symmetries of specified stem length and loop distance, and their distributions
are determined. Significance of homologies is assessed by a hierarchy of
permutation procedures."
</AB>
<JT>Proc Nat Acad Sci USA </JT>
<PY>1983</PY>
<VO>80</VO>
<NO>18</NO>
<PP>5660-5664</PP>
</SEQ>

<SEQ>
<UI>0647   Karlin,S.     Algorithms for Identif.. Comput.Appl.Bio 88 
4(1):41-51
</UI>
<AU>Karlin S;
    Morris M;
    Ghandour G;
    Leung MY
</AU>
<TI>Algorithms for Identifying Local Molecular Sequence Features
</TI>
<SU>Multiple comparison;
    Common feature;
    USA;
    Dyad;
    Repeat;
    Multiple alignment;
    Algorithm
</SU>
<AB>"Efficient algorithms are described for identifying local molecular
sequence features including repeats, dyad symmetry pairings and aligned matches
between sequences, while allowing for errors. ... A similar algorithm for
multiple sequences identifies matches 'approximately aligned' with respect to
some common location. [It] is useful for refining alignment maps based on
coarser global analyses ...."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1988</PY>
<VO>4</VO>
<NO>1</NO>
<PP>41-51</PP>
</SEQ>

<SEQ>
<UI>0648   Karlin,S.     Efficient Algorithms f.. Proc.Nat.Acad.S 88 
85:841-845
</UI>
<AU>Karlin S;
    Morris M;
    Ghandour G;
    Leung MY
</AU>
<TI>Efficient Algorithms for Molecular Sequence Analysis
</TI>
<SU>Multiple comparison;
    Common feature;
    USA;
    Sequence analysis;
    Multiple alignment;
    Dyad;
    Repeat;
    Algorithm
</SU>
<AB>"Efficient (linear time) algorithms are described for identifying global
molecular sequence features allowing for errors including repeats, matches
between sequences, dyad symmetry pairings, and other sequence patterns. A
multiple sequence alignment algorithm is also described."
</AB>
<JT>Proc Nat Acad Sci USA </JT>
<PY>1988</PY>
<VO>85</VO>
<PP>841-845</PP>
</SEQ>

<SEQ>
<UI>0649   Karlin,S.     Counts of Long Aligned.. Adv.Appl.Probab 87 
19:293-351
</UI>
<AU>Karlin S;
    Ost F
</AU>
<TI>Counts of Long Aligned Word Matches Among Random Letter Sequences
</TI>
<SU>Multiple alignment;
    Significance;
    USA;
    Markov;
    Longest common;
    Statistical;
    Word
</SU>
<AB>"Asymptotic distributional properties of the maximal length aligned word
(a contiguous set of letters) among multiple random Markov dependent sequences
composed of letters from a finite alphabet are given. ... We shall concentrate
in this paper on the random variable which is the length of the longest aligned
matching word ... and also called the maximal length consensus segment."
</AB>
<JT>Adv Appl Probab</JT>
<PY>1987</PY>
<VO>19</VO>
<PP>293-351</PP>
</SEQ>

<SEQ>
<UI>0650   Karlin,S.     Patterns in DNA and Am.. Mathematical .. 89CRC Press
</UI>
<AU>Karlin S;
    Ost F;
    Blaisdell BE
</AU>
<TI>Patterns in DNA and Amino Acid Sequences and Their Statistical
Significance
</TI>
<ED>Waterman MS
</ED>
<BK>Mathematical Methods for DNA Sequences
</BK>
<SU>Sequence analysis;
    Significance;
    Review;
    USA;
    Statistical;
    Repeat;
    Longest common;
    Pattern discovery;
    Amino acid;
    DNA
</SU>
<AB>"Relative to sequence composition, word relationships can be characterized
with reference to spacings, proximity to natural biological sites, unusual
lengths, clustering attributes, .... The specification of ... such concepts is
the first main objective .... Distinguishing significant features ... is
important in sequence comparisons. ... A description of useful theoretical
formulas and their interpretation on molecular sequence data is the second main
objective ...."
</AB>
<PU>CRC Press </PU>
<PL>Boca Raton, FL </PL>
<PY>1989</PY>
<PP>133-157</PP>
</SEQ>

<SEQ>
<UI>0651   Karp,R.M.     Efficient Randomized P.. IBM J.Res.Devel 87 
31(2):249-260
</UI>
<AU>Karp RM;
    Rabin MO
</AU>
<TI>Efficient Randomized Pattern-matching Algorithms
</TI>
<SU>String match;
    USA;
    Fingerprint;
    Multidimensional;
    Pattern match;
    Algorithm
</SU>
<AB>"We present randomized algorithms to solve the [matching keywords] problem
and some of its generalizations. The algorithms represent strings of length n by
much shorter strings called fingerprints, and achieve their efficiency by
manipulating fingerprints instead of longer strings. The algorithms require a
constant number of storage locations, and essentially run in real time. ... The
method readily generalizes to higher-dimensional pattern-matching problems."
</AB>
<JT>IBM J Res Develop</JT>
<PY>1987</PY>
<VO>31</VO>
<NO>2</NO>
<PP>249-260</PP>
</SEQ>
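<!--
A minimal Python sketch of fingerprint search in the Karp-Rabin style. Each
length-m window of the text is reduced to a polynomial hash that is updated in
constant time as the window slides. Karp and Rabin choose the fingerprint
function at random to obtain their probabilistic guarantees. This sketch
instead assumes a fixed prime modulus (1_000_000_007) and base 256, and it
verifies every fingerprint hit, so it never reports a false match. The function
name is hypothetical.

```python
def karp_rabin_search(text: str, pattern: str, base: int = 256,
                      modulus: int = 1_000_000_007) -> list[int]:
    """Return the start offsets of every occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, modulus)  # weight of the character leaving the window
    target = window = 0
    for i in range(m):
        target = (target * base + ord(pattern[i])) % modulus
        window = (window * base + ord(text[i])) % modulus
    hits = []
    for start in range(n - m + 1):
        # Equal fingerprints are only candidates; confirm to rule out collisions.
        if window == target and text[start:start + m] == pattern:
            hits.append(start)
        if start + m < n:
            # Roll the window: drop text[start], append text[start + m].
            window = ((window - ord(text[start]) * high) * base
                      + ord(text[start + m])) % modulus
    return hits
```

For example, karp_rabin_search("GATTACAGATTACA", "TACA") returns [3, 10].
-->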

<SEQ>
<UI>0652   Kashyap,R.L.  An Effective Algorithm.. Inform.Sci.     81 
23(2):123-142
</UI>
<AU>Kashyap RL;
    Oommen BJ
</AU>
<TI>An Effective Algorithm for String Correction Using Generalized Edit
Distances - I. Description of the Algorithm and its Optimality
</TI>
<SU>Dictionary match;
    Correction;
    USA;
    Edit;
    Distance;
    Algorithm
</SU>
<AB>"This paper deals with the problem of estimating a transmitted string X,
from the corresponding received string Y, which is a noisy version of X. We
assume that Y contains any number of substitution, insertion, and deletion
errors, and that no two consecutive symbols of X were deleted in transmission."
</AB>
<JT>Inform Sci</JT>
<PY>1981</PY>
<VO>23</VO>
<NO>2</NO>
<PP>123-142</PP>
</SEQ>

<SEQ>
<UI>0653   Kashyap,R.L.  The Noisy Substring Ma.. IEEE Trans.Soft 83 
9(3):365-370
</UI>
<AU>Kashyap RL;
    Oommen BJ
</AU>
<TI>The Noisy Substring Matching Problem
</TI>
<SU>Dictionary match;
    Correction;
    USA;
    Edit
</SU>
<AB>"We considered the problem of estimating the set T(U), the subset of words
in the dictionary H which contains U as a substring, using only Y, a noisy
version of U. The suggested set estimate S*(Y) which needs cubic time for
computation has relatively high accuracy as verified by experiments."
</AB>
<JT>IEEE Trans Software Eng</JT>
<PY>1983</PY>
<VO>9</VO>
<NO>3</NO>
<PP>365-370</PP>
</SEQ>

<SEQ>
<UI>0654   Keim,P.       An Examination of the .. J.Mol.Biol.     81 
151:179-197
</UI>
<AU>Keim P;
    Heinrikson RL;
    Fitch WM
</AU>
<TI>An Examination of the Expected Degree of Sequence Similarity that might
Arise in Proteins that have Converged to Similar Conformational States
</TI>
<SU>Pairwise comparison;
    Significance;
    USA;
    Statistical;
    Structure;
    Similarity;
    Protein
</SU>
<AB>"The influence of structural similarity on both the genetic tests for
amino acid sequence similarity and the inference of homology was examined by
statistical methods."
</AB>
<JT>J Mol Biol</JT>
<PY>1981</PY>
<VO>151</VO>
<PP>179-197</PP>
</SEQ>

<SEQ>
<UI>0655   Kim,J.Y.      An Approximate String-.. Theoret.Comput. 92 
92(1):107-117
</UI>
<AU>Kim JY;
    Shawe-Taylor J
</AU>
<TI>An Approximate String-Matching Algorithm
</TI>
<SU>Approximate match;
    UK;
    Data structure;
    Search tree;
    N-gram;
    String match;
    Algorithm
</SU>
<AB>"An approximate string-matching algorithm is described based on earlier
attribute-matching algorithms. The algorithm involves building a trie from the
text string .... Once this data structure has been built any number of
approximate searches can be made .... The ideas employed in the algorithm have
been shown effective in practice before, but have not previously received any
theoretical analysis."
</AB>
<JT>Theoret Comput Sci</JT>
<PY>1992</PY>
<VO>92</VO>
<NO>1</NO>
<PP>107-117</PP>
</SEQ>

<SEQ>
<UI>0656   Kleffe,J.     First and Second Momen.. Comput.Appl.Bio 92 
8(5):433-441
</UI>
<AU>Kleffe J;
    Borodovsky M
</AU>
<TI>First and Second Moment of Counts of Words in Random Texts Generated by
Markov Chains
</TI>
<SU>Sequence analysis;
    Significance;
    Markov;
    DE;
    Word
</SU>
<AB>"An exact expression for the variance of random frequency that a given
word has in text generated by a Markov chain is presented. The result is applied
to periodic Markov chains, which describe the protein-coding DNA sequences
better than simple Markov chains. A new solution to the problem of word overlap
is proposed."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1992</PY>
<VO>8</VO>
<NO>5</NO>
<PP>433-441</PP>
</SEQ>

<SEQ>
<UI>0657   Kleffe,J.     The Joint Distribution.. Comput.Appl.Bio 93 
9(3):275-283
</UI>
<AU>Kleffe J;
    Grau E
</AU>
<TI>The Joint Distribution of Patterns in Random Sequences with Application to
the RC-measure for Expressivity
</TI>
<SU>Sequence analysis;
    Significance;
    DE;
    Markov;
    Distribution
</SU>
<AB>"A method was previously developed for computation of pattern
probabilities in random sequences under Markov chain models. We extend this
method to the calculation of the joint distribution for two patterns."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1993</PY>
<VO>9</VO>
<NO>3</NO>
<PP>275-283</PP>
</SEQ>

<SEQ>
<UI>0658   Kleffe,J.     Exact Computation of P.. Comput.Appl.Bio 90 
6(4):347-353
</UI>
<AU>Kleffe J;
    Langbecker U
</AU>
<TI>Exact Computation of Pattern Probabilities in Random Sequences Generated
by Markov Chains
</TI>
<SU>Sequence analysis;
    Significance;
    Markov;
    DE;
    Probability
</SU>
<AB>"Observed patterns in macromolecular sequences are often considered as
words and compared with their probabilities of occurring in random sequences.
Calculation of these probabilities, however, often lacks rigour. We have
developed an algorithm for exact computation of such probabilities for
stochastic sequences that follow a Markov chain model."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1990</PY>
<VO>6</VO>
<NO>4</NO>
<PP>347-353</PP>
</SEQ>

<SEQ>
<UI>0659   Knuth,D.E.    Fast Pattern Matching .. SIAM J.Comput.  77 
6(2):323-350
</UI>
<AU>Knuth DE;
    Morris JH;
    Pratt VR
</AU>
<TI>Fast Pattern Matching in Strings
</TI>
<SU>String match;
    Knuth-Morris-Pratt;
    USA;
    Pattern match
</SU>
<AB>"An algorithm is presented which finds all occurrences of one given string
within another, in running time proportional to the sum of the lengths of the
strings. The constant of proportionality is low enough to make this algorithm of
practical use, and the procedure can also be extended to deal with some more
general pattern-matching problems."
</AB>
<JT>SIAM J Comput</JT>
<PY>1977</PY>
<VO>6</VO>
<NO>2</NO>
<PP>323-350</PP>
</SEQ>

<SEQ>
<UI>0660   Konings,D.A.M Evolution of the Prima.. Mol.Biol.Evol.  87 
4(3):300-314
</UI>
<AU>Konings DAM;
    Hogeweg P;
    Hesper B
</AU>
<TI>Evolution of the Primary and Secondary Structures of the E1a mRNAs of the
Adenovirus
</TI>
<SU>Multiple alignment;
    Evolutionary tree;
    NL;
    Structure;
    Evolution;
    Secondary
</SU>
<AB>"Sankoff et al. (1972) were the first to use an estimated genealogical
relationship among sequences to assist in aligning multiple sequences. ... A
modified Sankoff et al. procedure ... (Hogeweg and Hesper 1984) ... called
TRIALS, modified and improved still further, is detailed here."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1987</PY>
<VO>4</VO>
<NO>3</NO>
<PP>300-314</PP>
</SEQ>

<SEQ>
<UI>0661   Krishnan,G.   DNA Sequence Analysis:.. Nucleic Acids R 86 
14(1):543-550
</UI>
<AU>Krishnan G;
    Kaul RK;
    Jagadeeswaran P
</AU>
<TI>DNA Sequence Analysis: A Procedure to Find Homologies Among Many Sequences
</TI>
<SU>Consensus sequence;
    Dot;
    USA;
    Sequence analysis;
    Program;
    Homology;
    DNA
</SU>
<AB>"SEQCMP, a program that analyzes and searches for homology among multiple
nucleic acid sequences, is described. The sequences are compared by the dot
matrix method and the consensus sequence is derived by superimposing all the dot
matrices on one another."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1986</PY>
<VO>14</VO>
<NO>1</NO>
<PP>543-550</PP>
</SEQ>

<SEQ>
<UI>0662   Kruskal,J.B.  An Overview of Sequenc.. Time Warps, S.. 
83Addison-Wesley
</UI>
<AU>Kruskal JB
</AU>
<TI>An Overview of Sequence Comparison
</TI>
<ED>Sankoff D;
    Kruskal JB
</ED>
<BK>Time Warps, String Edits, and Macromolecules: The Theory and Practice of
Sequence Comparison
</BK>
<SU>Pairwise comparison;
    Review;
    USA;
    Sequence comparison
</SU>
<AB>Introduction to basic concepts, terminology, and notation (same as Kruskal
1983). It is the first of three chapters comprising a readable, self-contained
exposition on sequence comparison (Kruskal, Liberman 1983; Kruskal, Sankoff
1983).
</AB>
<PU>Addison-Wesley </PU>
<PL>Reading, MA </PL>
<PY>1983</PY>
<PP>1-44</PP>
</SEQ>

<SEQ>
<UI>0663   Kruskal,J.B.  An Overview of Sequenc.. SIAM Rev.       83 
25(2):201-237
</UI>
<AU>Kruskal JB
</AU>
<TI>An Overview of Sequence Comparison: Time Warps, String Edits, and
Macromolecules
</TI>
<SU>Pairwise comparison;
    Review;
    USA;
    Sequence comparison;
    Edit
</SU>
<AB>"A wide variety of different applications lead to problems in which
sequences of different lengths must be compared, to see how different they are,
and to see which elements in one sequence correspond to which elements in the
other sequences. ... This paper surveys the applications, methods, and theory of
sequence comparison." Same as Kruskal (1983).
</AB>
<JT>SIAM Rev</JT>
<PY>1983</PY>
<VO>25</VO>
<NO>2</NO>
<PP>201-237</PP>
</SEQ>

<SEQ>
<UI>0664   Kruskal,J.B.  The Symmetric Time-war.. Time Warps, S.. 
83Addison-Wesley
</UI>
<AU>Kruskal JB;
    Liberman M
</AU>
<TI>The Symmetric Time-warping Problem: From Continuous to Discrete
</TI>
<ED>Sankoff D;
    Kruskal JB
</ED>
<BK>Time Warps, String Edits, and Macromolecules: The Theory and Practice of
Sequence Comparison
</BK>
<SU>Pairwise comparison;
    Review;
    USA
</SU>
<AB>Time-warping refers to the comparison of trajectories, or time-labeled
curves in multidimensional space, where each trajectory is subject to both
alteration by additive random error and variation in speed from one point to
another. It is the second of three chapters comprising a readable,
self-contained exposition on sequence comparison (Kruskal 1983; Kruskal, Sankoff
1983).
</AB>
<PU>Addison-Wesley </PU>
<PL>Reading, MA </PL>
<PY>1983</PY>
<PP>125-161</PP>
</SEQ>

<SEQ>
<UI>0665   Kruskal,J.B.  An Anthology of Algori.. Time Warps, S.. 
83Addison-Wesley
</UI>
<AU>Kruskal JB;
    Sankoff D
</AU>
<TI>An Anthology of Algorithms and Concepts for Sequence Comparison
</TI>
<ED>Sankoff D;
    Kruskal JB
</ED>
<BK>Time Warps, String Edits, and Macromolecules: The Theory and Practice of
Sequence Comparison
</BK>
<SU>Pairwise comparison;
    Review;
    USA;
    Sequence comparison;
    Dynamic programming;
    Algorithm
</SU>
<AB>This chapter is for skeptics as yet unconvinced of the utility of the
dynamic programming approach to sequence comparison. It is the third of three
chapters comprising a readable, self-contained exposition on sequence comparison
(Kruskal 1983; Kruskal, Liberman 1983).
</AB>
<PU>Addison-Wesley </PU>
<PL>Reading, MA </PL>
<PY>1983</PY>
<PP>265-310</PP>
</SEQ>

<SEQ>
<UI>0666   Kumar,S.K.    A Linear Space Algorit.. Acta Inform.    87 
24:353-362
</UI>
<AU>Kumar SK;
    Rangan CP
</AU>
<TI>A Linear Space Algorithm for the LCS Problem
</TI>
<SU>Longest common;
    India;
    Complexity;
    Algorithm
</SU>
<AB>"A new linear-space algorithm to solve the LCS problem is presented. The
only other algorithm with linear-space complexity is by Hirschberg and has run
time complexity O(mn). Our algorithm, based on the divide and conquer technique,
has run time complexity O(n(m - p)), where p is the length of the LCS."
</AB>
<JT>Acta Inform</JT>
<PY>1987</PY>
<VO>24</VO>
<PP>353-362</PP>
</SEQ>

<SEQ>
<UI>0667   Lake,J.A.     The Order of Sequence .. Mol.Biol.Evol.  91 
8(3):378-385
</UI>
<AU>Lake JA
</AU>
<TI>The Order of Sequence Alignment can Bias the Selection of Tree Topology
</TI>
<SU>Multiple alignment;
    Phylogeny;
    USA;
    Selection;
    Bias;
    Topology;
    Sequence alignment
</SU>
<AB>"The order in which sequences are aligned can bias tree selection. To test
the effect of alignment order, the classical four-taxon test has been applied to
the 'tree of life' by using alternative alignments and three reconstruction
algorithms (maximum parsimony, transversion parsimony, and evolutionary
parsimony). ... Specific alignment orders systematically favor alternative
trees."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1991</PY>
<VO>8</VO>
<NO>3</NO>
<PP>378-385</PP>
</SEQ>

<SEQ>
<UI>0668   Landau,G.M.   Efficient String Match.. Theoret.Comput. 86 
43:239-249
</UI>
<AU>Landau GM;
    Vishkin U
</AU>
<TI>Efficient String Matching with k Mismatches
</TI>
<SU>Match with k mismatches;
    IL;
    String match
</SU>
<AB>"Given a text of length n, a pattern of length m, and an integer k, we
present an algorithm for finding all occurrences of the pattern in the text,
each with at most k mismatches."
</AB>
<JT>Theoret Comput Sci</JT>
<PY>1986</PY>
<VO>43</VO>
<PP>239-249</PP>
</SEQ>

<SEQ>
<UI>0669   Landau,G.M.   Introducing Efficient .. ACM Sympos.Theo 86 
18:220-230
</UI>
<AU>Landau GM;
    Vishkin U
</AU>
<TI>Introducing Efficient Parallelism into Approximate String Matching and a
New Serial Algorithm
</TI>
<SU>Match with k differences;
    Parallel;
    IL;
    String match;
    Approximate match;
    Algorithm
</SU>
<AB>"Given a text of length n, a pattern of length m and an integer k, we
present parallel and serial algorithms for finding all occurrences of the
pattern in the text with at most k differences."
</AB>
<JT>ACM Sympos Theory Comput</JT>
<PY>1986</PY>
<VO>18</VO>
<PP>220-230</PP>
</SEQ>

<SEQ>
<UI>0670   Landau,G.M.   Fast String Matching w.. J.Comput.System 88 
37(1):63-78
</UI>
<AU>Landau GM;
    Vishkin U
</AU>
<TI>Fast String Matching with k Differences
</TI>
<SU>Match with k differences;
    IL;
    String match
</SU>
<AB>"Given a text of length n, a pattern of length m, and an integer k, we
present an algorithm for finding all occurrences of the pattern in the text,
each with at most k differences."
</AB>
<JT>J Comput Systems Sci</JT>
<PY>1988</PY>
<VO>37</VO>
<NO>1</NO>
<PP>63-78</PP>
</SEQ>

<SEQ>
<UI>0671   Landau,G.M.   Fast Parallel and Seri.. J.Algorithms    89 
10:157-169
</UI>
<AU>Landau GM;
    Vishkin U
</AU>
<TI>Fast Parallel and Serial Approximate String Matching
</TI>
<SU>Match with k differences;
    Parallel;
    IL;
    String match;
    Approximate match
</SU>
<AB>"Given text of length n, a pattern of length m and an integer k, we
present parallel and serial algorithms for finding all occurrences of the
pattern in the text with at most k differences. The parallel algorithm requires
O(log m + k) time using n processors. The serial algorithm runs in O(nk) time
for an alphabet whose size is fixed."
</AB>
<JT>J Algorithms </JT>
<PY>1989</PY>
<VO>10</VO>
<PP>157-169</PP>
</SEQ>

<SEQ>
<UI>0672   Landau,G.M.   An Efficient String Ma.. Nucleic Acids R 86 
14(1):31-46
</UI>
<AU>Landau GM;
    Vishkin U;
    Nussinov R
</AU>
<TI>An Efficient String Matching Algorithm with k Differences for Nucleotide
and Amino Acid Sequences
</TI>
<SU>Match with k differences;
    IL;
    String match;
    Amino acid;
    Nucleotide;
    Algorithm
</SU>
<AB>Given a pattern of length m, a text of length n, and an integer k, "we
present a simple algorithm showing that sequences can be optimally aligned in
O(k^2 n) time. For long sequences the gain factor over the currently used
algorithms is very large."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1986</PY>
<VO>14</VO>
<NO>1</NO>
<PP>31-46</PP>
</SEQ>

<SEQ>
<UI>0673   Landau,G.M.   An Efficient String Ma.. J.Theor.Biol.   87 
126(4):483-490
</UI>
<AU>Landau GM;
    Vishkin U;
    Nussinov R
</AU>
<TI>An Efficient String Matching Algorithm with k Substitutions for 
Nucleotide
and Amino Acid Sequences
</TI>
<SU>Match with k mismatches;
    IL;
    String match;
    Substitution;
    Amino acid;
    Nucleotide;
    Algorithm
</SU>
<AB>"Given a text of length n, a pattern of length m and an integer k, we
present an algorithm for finding all occurrences of the pattern in the text,
each with at most k substitutions. The algorithm runs in O(k(m log m + n)) time,
and requires O(nk) space. This algorithm has direct implications for nucleotide
and amino acid sequence comparisons."
</AB>
<JT>J Theor Biol</JT>
<PY>1987</PY>
<VO>126</VO>
<NO>4</NO>
<PP>483-490</PP>
</SEQ>

<SEQ>
<UI>0674   Landau,G.M.   Locating Alignments wi.. Comput.Appl.Bio 88 
4(1):19-24
</UI>
<AU>Landau GM;
    Vishkin U;
    Nussinov R
</AU>
<TI>Locating Alignments with k Differences for Nucleotide and Amino Acid
Sequences
</TI>
<SU>Match with k differences;
    Subalignment;
    IL;
    Pairwise alignment;
    Approximate match;
    Amino acid;
    Nucleotide
</SU>
<AB>"Given two sequences, a pattern of length m, a text of length n and a
positive integer k, we give two algorithms. The first finds all occurrences of
the pattern in the text as long as these do not differ from each other by more
than k differences. ... The second algorithm finds all subsequence alignments
between the pattern and the text with at most k differences."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1988</PY>
<VO>4</VO>
<NO>1</NO>
<PP>19-24</PP>
</SEQ>

<SEQ>
<UI>0675   Landau,G.M.   Fast Alignment of DNA .. Methods Enzymol 90 
183:487-502
</UI>
<AU>Landau GM;
    Vishkin U;
    Nussinov R
</AU>
<TI>Fast Alignment of DNA and Protein Sequences
</TI>
<SU>Match with k differences;
    Subalignment;
    IL;
    Approximate match;
    Protein;
    DNA
</SU>
<AB>"Searching [a database] for some 'key subsequences' with presumably
special coding or regulatory function is a basic problem often arising in the
analysis of the biological significance of these data. ... The numerous possible
matchings of long sequences generate, for these data, a problem of computational
difficulty. ... Our algorithms suggest a more efficient organization of the
matching task and the involved bookkeeping of the quality of partial matchings."
</AB>
<JT>Methods Enzymol</JT>
<PY>1990</PY>
<VO>183</VO>
<PP>487-502</PP>
</SEQ>

<SEQ>
<UI>0676   Lander,E.     Study of Protein Seque.. J.Supercomput.  89 
3:255-269
</UI>
<AU>Lander E;
    Mesirov JP;
    Taylor W IV
</AU>
<TI>Study of Protein Sequence Comparison Metrics on the Connection Machine CM-2
</TI>
<SU>Pairwise comparison;
    Parallel;
    USA;
    Sequence comparison;
    Statistical;
    Scoring;
    Dynamic programming;
    Protein
</SU>
<AB>"Software tools have been developed to do rapid, large-scale protein
sequence comparisons on databases of amino acid sequences, using a data parallel
computer architecture. ... We have used this software to analyze the
effectiveness of various scoring metrics in determining sequence similarity, and
to generate statistical information about the behavior of these scoring systems
under the variation of certain parameters."
</AB>
<JT>J Supercomput</JT>
<PY>1989</PY>
<VO>3</VO>
<PP>255-269</PP>
</SEQ>

<SEQ>
<UI>0677   Landes,C.     A Comparison of Severa.. Nucleic Acids R 92 
20(14):3631-36
</UI>
<AU>Landes C;
    Henaut A;
    Risler JL
</AU>
<TI>A Comparison of Several Similarity Indexes Used in the Classification of
Protein Sequences: A Multivariate Analysis
</TI>
<SU>Sequence proximity;
    FR;
    Multivariate;
    FASTA;
    Classification;
    Dot;
    Similarity;
    Protein
</SU>
<AB>"The present work describes an attempt to identify reliable criteria which
could be used as distance indices between protein sequences." Seven criteria
were tested. "Three criteria gave a classification consistent with known
similarities between the sequences in the sets, namely the Z-scores from BESTFIT
and FASTA and the multiple dot plot comparison distance index from DOCMA."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1992</PY>
<VO>20</VO>
<NO>14</NO>
<PP>3631-3637</PP>
</SEQ>

<SEQ>
<UI>0678   Landes,C.     Dot-plot Comparisons b.. Comput.Appl.Bio 93 
9(2):191-196
</UI>
<AU>Landes C;
    Henaut A;
    Risler JL
</AU>
<TI>Dot-plot Comparisons by Multivariate Analysis (DOCMA): A Tool for
Classifying Protein Sequences
</TI>
<SU>Multiple comparison;
    Dot;
    FR;
    Multivariate;
    Clustering;
    Protein
</SU>
<AB>"A method aimed at classifying protein sequences without resorting to
pairwise alignment is presented. Called DOCMA (DOt-plot Comparisons by
Multivariate Analysis), it is based on a multivariate analysis of the pairwise
dot-plots between all the sequences in the set."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1993</PY>
<VO>9</VO>
<NO>2</NO>
<PP>191-196</PP>
</SEQ>

<SEQ>
<UI>0679   Landraud,A.M. An Algorithm for Findi.. IEEE Trans.Patt 89 
11(8):890-895
</UI>
<AU>Landraud AM;
    Avril JF;
    Chretienne P
</AU>
<TI>An Algorithm for Finding a Common Structure Shared by a Family of Strings
</TI>
<SU>Multiple alignment;
    Segment;
    FR;
    Repeat;
    Dynamic programming;
    Structure;
    Algorithm
</SU>
<AB>"Our [alignment] method works in two successive stages. First, we use a
fast algorithm for drawing up a directory of exactly repeated patterns appearing
in a given majority of strings. In the second stage, our algorithm constructs
recursively 'anchoring patterns' by a 'divide-and-conquer' strategy and
converges on a maximum number of alignments."
</AB>
<JT>IEEE Trans Patt Anal Mach Intell</JT>
<PY>1989</PY>
<VO>11</VO>
<NO>8</NO>
<PP>890-895</PP>
</SEQ>

<SEQ>
<UI>0680   Lawrence,C.B. Use of Homology Domain.. Methods Enzymol 90 
183:133-146
</UI>
<AU>Lawrence CB
</AU>
<TI>Use of Homology Domains in Sequence Similarity Detection
</TI>
<SU>Subalignment;
    USA;
    Region;
    Locally optimal;
    Pairwise alignment;
    Homology;
    Similarity;
    Detection
</SU>
<AB>"This chapter describes an approach to identifying sequence similarities
that complements other standard methods. It is in the class of methods that
finds local optimal alignments. The main strength of this approach is that it
can identify the boundaries of homologous regions between two sequences with
great precision."
</AB>
<JT>Methods Enzymol</JT>
<PY>1990</PY>
<VO>183</VO>
<PP>133-146</PP>
</SEQ>

<SEQ>
<UI>0681   Lawrence,C.B. Definition and Identif.. Comput.Appl.Bio 88 
4(1):25-33
</UI>
<AU>Lawrence CB;
    Goldman DA
</AU>
<TI>Definition and Identification of Homology Domains
</TI>
<SU>Subalignment;
    Significance;
    USA;
    Region;
    Identification;
    Probabilistic;
    Scoring;
    Homology
</SU>
<AB>"The notion of a 'homology domain' is employed which defines the
boundaries of a region of sequence homology containing no insertions or
deletions. The relative significance of different potential homology domains is
evaluated using a non-linear similarity score related to the probability of
finding the observed level of similarity in the region by chance."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1988</PY>
<VO>4</VO>
<NO>1</NO>
<PP>25-33</PP>
</SEQ>

<SEQ>
<UI>0682   Lawrence,C.B. Optimized homology sea.. Bull.Math.Biol. 86 
48(5/6):569-58
</UI>
<AU>Lawrence CB;
    Goldman DA;
    Hood RT
</AU>
<TI>Optimized homology searches of the gene and protein sequence data banks
</TI>
<SU>Database search;
    USA;
    Homology;
    Gene;
    Protein
</SU>
<AB>"A strategy is presented for searching the gene and protein sequence data
banks which combines the use of two previously described algorithms [Altschul,
Erickson (1986); Lipman, Pearson (1985); Wilbur, Lipman (1983)]. The
implementation of this strategy is thoroughly evaluated with respect to
sensitivity, specificity and speed."
</AB>
<JT>Bull Math Biol</JT>
<PY>1986</PY>
<VO>48</VO>
<NO>5/6</NO>
<PP>569-583</PP>
</SEQ>

<SEQ>
<UI>0683   Lawrence,C.E. Maximum Likelihood Est.. J.Theor.Biol.   85 
113:425-439
</UI>
<AU>Lawrence CE;
    Reilly AA
</AU>
<TI>Maximum Likelihood Estimation of Subsequence Conservation
</TI>
<SU>Sequence proximity;
    USA;
    Likelihood;
    Statistical;
    Markov;
    Subsequence;
    Estimation
</SU>
<AB>"A statistical method is presented for comparing protein sequences by
partitioning the polymers and estimating each subsegment's degree of
conservation. Conservation is measured as a function of the number of
transitions occurring in the underlying time homogeneous Markov process assumed
to govern amino acid mutations. ... Partitioning and estimation are carried out
via maximum likelihood. The method is contrasted with the ... percent homology
measure."
</AB>
<JT>J Theor Biol</JT>
<PY>1985</PY>
<VO>113</VO>
<PP>425-439</PP>
</SEQ>

<SEQ>
<UI>0684   Lawrence,C.E. An Expectation Maximiz.. Proteins Struct 90 7:41-51
</UI>
<AU>Lawrence CE;
    Reilly AA
</AU>
<TI>An Expectation Maximization (EM) Algorithm for the Identification and
Characterization of Common Sites in Unaligned Biopolymer Sequences
</TI>
<SU>Consensus sequence;
    Information theory;
    USA;
    Identification;
    Likelihood;
    Characterization;
    Expectation;
    Maximization;
    Algorithm
</SU>
<AB>"Statistical methodology for the identification and characterization of
protein binding sites in a set of unaligned DNA fragments is presented. ... No
alignment of the sites is required. Instead, the uncertainty in the location of
the sites is handled by employing the missing information principle to develop
an 'expectation maximization' (EM) algorithm."
</AB>
<JT>Proteins Struct Funct Genet</JT>
<PY>1990</PY>
<VO>7</VO>
<PP>41-51</PP>
</SEQ>

<SEQ>
<UI>0685   Lecroq,T.     A Variation on the Boy.. Theoret.Comput. 92 
92:119-144
</UI>
<AU>Lecroq T
</AU>
<TI>A Variation on the Boyer-Moore Algorithm
</TI>
<SU>String match;
    Boyer-Moore;
    FR;
    Algorithm
</SU>
<AB>"A new approach can possess the ability for a given position in the text
to compute the length of the longest prefix of the word which ends at that
position. When we know this length, we are able to compute a better shift than
the Boyer-Moore approach. ... This leads to a linear-time algorithm which scans
the text characters at most three times each."
</AB>
<JT>Theoret Comput Sci</JT>
<PY>1992</PY>
<VO>92</VO>
<PP>119-144</PP>
</SEQ>

<SEQ>
<UI>0686   Lee,K.C.      Design and Analysis of.. Lecture Notes i 89 
368:215-229
</UI>
<AU>Lee KC;
    Mak VW
</AU>
<TI>Design and Analysis of a Parallel VLSI String Search Algorithm
</TI>
<SU>String match;
    Parallel;
    USA;
    Pattern match;
    VLSI;
    String search;
    Algorithm
</SU>
<AB>In Boral, H., Faudemay, P. (Eds.), Database Machines. Proceedings of the
Sixth International Workshop, IWDM '89, Deauville, France, 19-21 June 1989. "In
this paper, we propose a parallel VLSI string search algorithm called the Data
Parallel Pattern Matching (DPPM) algorithm."
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>1989</PY>
<VO>368</VO>
<PP>215-229</PP>
</SEQ>

<SEQ>
<UI>0687   Lefevre,C.    Pattern Recognition in.. Comput.Appl.Bio 93 
9(3):349-354
</UI>
<AU>Lefevre C;
    Ikeda JE
</AU>
<TI>Pattern Recognition in DNA Sequences and its Application to Consensus
Foot-printing
</TI>
<SU>Multiple comparison;
    Common feature;
    JP;
    Pattern recognition;
    Motif;
    Significance;
    Repeat;
    DNA;
    Recognition
</SU>
<AB>"We consider the problem of comparing several nucleic acid sequences to
identify words occurring imperfectly (patterns with no gap) with unusual
frequency. Methods for computing, representing, and inspecting interactively the
structure of such repeating motifs in nucleic acids and more generally any text
are described."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1993</PY>
<VO>9</VO>
<NO>3</NO>
<PP>349-354</PP>
</SEQ>

<SEQ>
<UI>0688   Lefevre,C.    The Position End-Set T.. Comput.Appl.Bio 93 
9(3):343-348
</UI>
<AU>Lefevre C;
    Ikeda JE
</AU>
<TI>The Position End-Set Tree: A Small Automaton for Word Recognition in
Biological Sequences
</TI>
<SU>String match;
    JP;
    Search tree;
    Regularities;
    Automata;
    Word;
    Recognition
</SU>
<AB>"When one is expecting to do many substring searches it is worthwhile to
build an auxiliary index to the sequence to aid in the search. We propose a
method to generate a compact index that can be viewed as a small (partial)
deterministic finite automaton recognizing the subword structure of a sequence."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1993</PY>
<VO>9</VO>
<NO>3</NO>
<PP>343-348</PP>
</SEQ>

<SEQ>
<UI>0689   Lesk,A.M.     Homology Modelling: In.. Curr.Opin.Struc 92 
2:242-247
</UI>
<AU>Lesk AM;
    Boswell DR
</AU>
<TI>Homology Modelling: Inferences from Tables of Aligned Sequences
</TI>
<SU>Multiple alignment;
    Structure;
    NZ;
    Homology
</SU>
<AB>"The relationship between an individual amino acid sequence and its
associated protein structure is deterministic, but has proved to be too subtle
to understand in detail from studying the relationships between single amino
acid sequences and structures. The patterns that appear in tables of aligned
homologous sequences contain much more information, and study of them has led to
success in several applications."
</AB>
<JT>Curr Opin Struct Biol</JT>
<PY>1992</PY>
<VO>2</VO>
<PP>242-247</PP>
</SEQ>

<SEQ>
<UI>0690   Lesk,A.M.     Alignment of the Amino.. Protein Eng.    86 
1(1):77-78
</UI>
<AU>Lesk AM;
    Levitt M;
    Chothia C
</AU>
<TI>Alignment of the Amino Acid Sequences of Distantly Related Proteins using
Variable Gap Penalties
</TI>
<SU>Sequence alignment;
    UK;
    Region;
    Gap;
    Needleman-Wunsch;
    Structure;
    Protein;
    Amino acid
</SU>
<AB>"Because of the importance of the stability of the structures of the
packing of helix-helix interfaces, insertions and deletions are not observed to
occur in the interiors of helical regions of proteins. ... It is possible to
apply this insight ... to the alignment of distantly related sequences by a
modification of the Needleman-Wunsch procedure."
</AB>
<JT>Protein Eng</JT>
<PY>1986</PY>
<VO>1</VO>
<NO>1</NO>
<PP>77-78</PP>
</SEQ>

<SEQ>
<UI>0691   Leung,M.Y.    An Efficient Algorithm.. J.Mol.Biol.     91 
221(4):1367-13
</UI>
<AU>Leung MY;
    Blaisdell BE;
    Burge C;
    Karlin S
</AU>
<TI>An Efficient Algorithm for Identifying Matches With Errors in Multiple
Long Molecular Sequences
</TI>
<SU>Multiple comparison;
    Common feature;
    USA;
    Error;
    Repeat;
    Algorithm
</SU>
<AB>"An efficient algorithm is described for finding matches, repeats and
other word relations, allowing for errors, in large data sets of long molecular
sequences. The algorithm entails hashing on fixed-size words in conjunction with
the use of a linked list connecting all occurrences of the same word."
</AB>
<JT>J Mol Biol</JT>
<PY>1991</PY>
<VO>221</VO>
<NO>4</NO>
<PP>1367-1378</PP>
</SEQ>

<SEQ>
<UI>0692   Levenshtein,V Binary Codes Capable o.. Soviet Phys.Dok 66 
10(8):707-710
</UI>
<AU>Levenshtein VI
</AU>
<TI>Binary Codes Capable of Correcting Deletions, Insertions, and Reversals
</TI>
<SU>Sequence proximity;
    Correction;
    RU;
    Edit;
    Reversal;
    Deletion
</SU>
<AB>"Consider a function r(x, y) defined on pairs of binary words and equal to
the smallest number of deletions and insertions that transform the word x into
y. It is not difficult to show that the function r(x, y) is a metric .... It can
be shown that the function r(x, y) defined on pairs of binary words as equal to
the smallest number of deletions, insertions, and reversals that will transform
x into y is a metric ...."
</AB>
<JT>Soviet Phys Dokl</JT>
<PY>1966</PY>
<VO>10</VO>
<NO>8</NO>
<PP>707-710</PP>
</SEQ>

<SEQ>
<UI>0693   Li,M.         String-Matching Cannot.. Inform.Process. 86 
22(5):231-236
</UI>
<AU>Li M;
    Yesha Y
</AU>
<TI>String-Matching Cannot be Done by a Two-head One-way Deterministic Finite
Automaton
</TI>
<SU>String match;
    Complexity;
    USA;
    Automata
</SU>
<AB>"String-matching cannot be performed by a two-head one-way deterministic
finite automaton (or even by a Turing machine with two one-way input heads and
o(n) storage space)."
</AB>
<JT>Inform Process Lett</JT>
<PY>1986</PY>
<VO>22</VO>
<NO>5</NO>
<PP>231-236</PP>
</SEQ>

<SEQ>
<UI>0694   Lipman,D.J.   A Tool for Multiple Se.. Proc.Nat.Acad.S 89 
86:4412-4415
</UI>
<AU>Lipman DJ;
    Altschul SF;
    Kececioglu JD
</AU>
<TI>A Tool for Multiple Sequence Alignment
</TI>
<SU>Multiple alignment;
    USA;
    Sequence alignment;
    Dynamic programming
</SU>
<AB>"We describe the design and application of a tool for multiple alignment
of amino acid sequences that implements a new algorithm that greatly reduces the
computational demands of dynamic programming. This tool is able to align in
reasonable time as many as eight sequences the length of an average protein."
</AB>
<JT>Proc Nat Acad Sci USA </JT>
<PY>1989</PY>
<VO>86</VO>
<PP>4412-4415</PP>
</SEQ>

<SEQ>
<UI>0695   Lipman,D.J.   Comparative Analysis o.. Nucleic Acids R 82 
10(8):2723-273
</UI>
<AU>Lipman DJ;
    Maizel J
</AU>
<TI>Comparative Analysis of Nucleic Acid Sequences by their General
Constraints
</TI>
<SU>Sequence analysis;
    Significance;
    USA;
    Information theory;
    Nucleotide
</SU>
<AB>"We describe two measures of a nucleic acid sequence, derived from
Information Theory, which characterize the constraints toward nonuniform base
composition, and the constraints on the ordering of the bases."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1982</PY>
<VO>10</VO>
<NO>8</NO>
<PP>2723-2739</PP>
</SEQ>

<SEQ>
<UI>0696   Lipman,D.J.   Rapid and Sensitive Pr.. Science         85 227(22 
March):
</UI>
<AU>Lipman DJ;
    Pearson WR
</AU>
<TI>Rapid and Sensitive Protein Similarity Searches
</TI>
<SU>Database search;
    USA;
    Similarity;
    Protein
</SU>
<AB>"We have developed an algorithm, used in the computer program FASTP... In
this article, we discuss the basis of the algorithm and its application to two
proteins evolutionarily related to other sequences in the database. In addition,
we show an example of a search which presented puzzling results and discuss
criteria for evaluating such results."
</AB>
<JT>Science </JT>
<PY>1985</PY>
<VO>227</VO>
<NO>22 March</NO>
<PP>1435-1441</PP>
</SEQ>

<SEQ>
<UI>0697   Lipman,D.J.   On the Statistical Sig.. Nucleic Acids R 84 12(1):215-226
</UI>
<AU>Lipman DJ;
    Wilbur WJ;
    Smith TF;
    Waterman MS
</AU>
<TI>On the Statistical Significance of Nucleic Acid Similarities
</TI>
<SU>Subalignment;
    Significance;
    USA;
    Statistical;
    Similarity;
    Nucleic acid
</SU>
<AB>"The known statistical properties of nucleic acid sequences strongly
affect the statistical distribution of similarity values when calculated by
standard procedures. We propose a series of models which account for some of
these known statistical properties. The utility of the method is demonstrated in
evaluating high relative similarity scores in four specific cases in which there
is little biological context by which to judge the similarities."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1984</PY>
<VO>12</VO>
<NO>1</NO>
<PP>215-226</PP>
</SEQ>

<SEQ>
<UI>0698   Lopresti,D.P. P-NAC: A Systolic Arra.. Computer        87 20(7):98-99
</UI>
<AU>Lopresti DP
</AU>
<TI>P-NAC: A Systolic Array for Comparing Nucleic Acid Sequences
</TI>
<SU>Pairwise alignment;
    Parallel;
    USA;
    Dynamic programming;
    Nucleic acid
</SU>
<AB>"The Princeton Nucleic Acid Comparator (P-NAC) is a linear systolic array
for comparing DNA sequences. The architecture is a parallel realization of a
standard dynamic programming algorithm" (Wagner, Fischer 1974)
</AB>
<JT>Computer </JT>
<PY>1987</PY>
<VO>20</VO>
<NO>7</NO>
<PP>98-99</PP>
</SEQ>

<SEQ>
<UI>0699   Lowrance,R.   An Extension of the St.. J.Assoc.Comput. 75 22(2):177-183
</UI>
<AU>Lowrance R;
    Wagner RA
</AU>
<TI>An Extension of the String-to-string Correction Problem
</TI>
<SU>Pairwise alignment;
    Correction;
    USA;
    Edit
</SU>
<AB>The string-to-string correction problem of Wagner and Fischer (1974) used
the edit operations of insertion, deletion, and mutation. "This paper extends
the set of allowable edit operations to include the operation of interchanging
the positions of two adjacent characters."
</AB>
<JT>J Assoc Comput Mach</JT>
<PY>1975</PY>
<VO>22</VO>
<NO>2</NO>
<PP>177-183</PP>
</SEQ>

<SEQ>
<UI>0700   Lu,S.Y.       A Sentence-to-Sentence.. IEEE Trans.Syst 78 8(5):381-389
</UI>
<AU>Lu SY;
    Fu KS
</AU>
<TI>A Sentence-to-Sentence Clustering Procedure for Pattern Analysis
</TI>
<SU>Sequence proximity;
    USA;
    Clustering;
    Probabilistic;
    Edit
</SU>
<AB>"The similarity between patterns is expressed in terms of the distance
between their corresponding sentences. A weighted distance between two strings
is defined and its probabilistic interpretation given. ... The following
algorithm, which is an extension of Wagner and Fischer's algorithm, computes the
weighted distance [in which insertions, deletions, and substitutions each have
distinct weights] between two strings."
</AB>
<JT>IEEE Trans Systems Man Cybernet</JT>
<PY>1978</PY>
<VO>8</VO>
<NO>5</NO>
<PP>381-389</PP>
</SEQ>

<SEQ>
<UI>0701   Luthy,R.      Secondary Structure-ba.. Proteins Struct 91 10:229-239
</UI>
<AU>Luthy R;
    McLachlan AD;
    Eisenberg D
</AU>
<TI>Secondary Structure-based Profiles: Use of Structure-Conserving Scoring
Tables in Searching Protein Sequence Databases for Structural Similarities
</TI>
<SU>Match a pattern matrix;
    USA;
    Sequence database;
    Sequence comparison;
    Structure;
    Profile;
    Scoring;
    Similarity;
    Protein;
    Secondary
</SU>
<AB>"The profile method, for detecting distantly related proteins by sequence
comparison, has been extended to incorporate secondary structure information
from known X-ray structures. ... As in the standard profile method [Gribskov,
McLachlan, Eisenberg 1987], a position-dependent scoring table, termed a
profile, is calculated from the aligned sequences."
</AB>
<JT>Proteins Struct Funct Genet</JT>
<PY>1991</PY>
<VO>10</VO>
<PP>229-239</PP>
</SEQ>

<SEQ>
<UI>0702   Lyon,G.       Syntax-Directed Least-.. Comm.ACM        74 17(1):3-14
</UI>
<AU>Lyon G
</AU>
<TI>Syntax-Directed Least-Errors Analysis for Context-Free Languages: A
Practical Approach
</TI>
<SU>Sequence recognition;
    Correction;
    Language;
    USA;
    Dynamic programming
</SU>
<AB>"A least-errors recognizer is developed informally using the well-known
recognizer of Earley, along with elements of Bellman's dynamic programming. The
analyzer takes a general class of context-free grammars as drivers, and any
finite string as input. Recognition consists of a least-errors count for a
corrected version of the input relative to the driver grammar."
</AB>
<JT>Comm ACM </JT>
<PY>1974</PY>
<VO>17</VO>
<NO>1</NO>
<PP>3-14</PP>
</SEQ>

<SEQ>
<UI>0703   Maes,M.       On a Cyclic String-to-.. Inform.Process. 90 35(2):73-78
</UI>
<AU>Maes M
</AU>
<TI>On a Cyclic String-to-string Correction Problem
</TI>
<SU>Pairwise alignment;
    Correction;
    NL;
    Longest common;
    Edit
</SU>
<AB>"This leads to the notion of a cyclic string, and in this paper we present
an O(nm log m) algorithm to solve the string-to-string correction problem for
cyclic strings."
</AB>
<JT>Inform Process Lett</JT>
<PY>1990</PY>
<VO>35</VO>
<NO>2</NO>
<PP>73-78</PP>
</SEQ>

<SEQ>
<UI>0704   Maier,D.      The Complexity of Some.. J.Assoc.Comput. 78 25(2):322-336
</UI>
<AU>Maier D
</AU>
<TI>The Complexity of Some Problems on Subsequences and Supersequences
</TI>
<SU>Longest common;
    Supersequence;
    Complexity;
    USA;
    Subsequence
</SU>
<AB>The problem of calculating a longest common subsequence of N sequences, for
arbitrary N, is NP-complete
</AB>
<JT>J Assoc Comput Mach</JT>
<PY>1978</PY>
<VO>25</VO>
<NO>2</NO>
<PP>322-336</PP>
</SEQ>

<SEQ>
<UI>0705   Maizel,J.V.,J Enhanced Graphic Matri.. Proc.Nat.Acad.S 81 78(12):7665-76
</UI>
<AU>Maizel JV Jr;
    Lenk RP
</AU>
<TI>Enhanced Graphic Matrix Analysis of Nucleic Acid and Protein Sequences
</TI>
<SU>Pairwise comparison;
    Dot;
    DE;
    Regularities;
    Repeat;
    Palindrome;
    Protein;
    Nucleic acid;
    Graphic;
    Matrix
</SU>
<AB>The method "analyzes nucleic acid and amino acid sequences for features of
possible biological interest and reveals the spatial patterns of such features.
When a sequence is compared to itself the technique shows regions of
self-complementarity, direct repeats, and palindromic subsequences. Comparison of
two ... sequences ... showed domains of similarity, regions of divergence, and
features explainable by transpositions."
</AB>
<JT>Proc Nat Acad Sci USA </JT>
<PY>1981</PY>
<VO>78</VO>
<NO>12</NO>
<PP>7665-7669</PP>
</SEQ>

<SEQ>
<UI>0706   Manber,U.     An Algorithm for Strin.. Inform.Process. 91 37:133-136
</UI>
<AU>Manber U;
    Baeza-Yates R
</AU>
<TI>An Algorithm for String Matching with a Sequence of Don't Cares
</TI>
<SU>Match with don't cares;
    USA;
    String match;
    Text search;
    Don't care;
    Algorithm
</SU>
<AB>"We present an algorithm to search for a pattern containing a sequence of
don't care symbols in a preprocessed text. This problem models proximity
searching in text searching systems and special searching problems in biological
sequences."
</AB>
<JT>Inform Process Lett</JT>
<PY>1991</PY>
<VO>37</VO>
<PP>133-136</PP>
</SEQ>

<SEQ>
<UI>0707   Marck,C.      Fast Analysis of DNA a.. Nucleic Acids R 86 14(1):583-590
</UI>
<AU>Marck C
</AU>
<TI>Fast Analysis of DNA and Protein Sequence on Apple IIe: Restriction Sites
Search, Alignment of Short Sequence and Dot Matrix Analysis
</TI>
<SU>Match with k differences;
    FR;
    Dot;
    Restriction;
    Gap;
    Protein;
    DNA;
    Matrix
</SU>
<AB>"The search for a short sequence (&lt; 36 bases) within a longer one (up to
9999 bases) with a given number of mismatches or gaps allowed has also been
written in assembly language." The algorithm is a simplification of one by
Fickett (1984)
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1986</PY>
<VO>14</VO>
<NO>1</NO>
<PP>583-590</PP>
</SEQ>

<SEQ>
<UI>0708   Martinez,H.M. An Efficient Method fo.. Nucleic Acids R 83 11(13):4629-46
</UI>
<AU>Martinez HM
</AU>
<TI>An Efficient Method for Finding Repeats in Molecular Sequences
</TI>
<SU>Multiple alignment;
    Common feature;
    Regularities;
    USA;
    Complexity;
    Repeat;
    Dyad
</SU>
<AB>"The problem of finding repeats in molecular sequences is approached as a
sorting problem. It leads to a method which is linear in space complexity and N
log N in expected time complexity. ... Of particular interest is that several
sequences can be treated as a single sequence. This leads to an efficient method
... for finding common features of many sequences, such as favorable
alignments."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1983</PY>
<VO>11</VO>
<NO>13</NO>
<PP>4629-4634</PP>
</SEQ>

<SEQ>
<UI>0709   Martinez,H.M. A Flexible Multiple Se.. Nucleic Acids R 88 16(5):1683-169
</UI>
<AU>Martinez HM
</AU>
<TI>A Flexible Multiple Sequence Alignment Program
</TI>
<SU>Multiple alignment;
    Clustering;
    USA;
    Sequence alignment;
    Region;
    Program
</SU>
<AB>"The 'regions' method for multisequence alignment used in the previously
reported program MALIGN [Sobel, Martinez 1986] has been generalized to include
recursive refinement so that unaligned portions between two regions at the
current level of resolution can be handled with increased resolution. ...
GENALIGN uses this improved regions method to execute fast pairwise alignments
in the framework of Taylor's multisequence alignment procedure using clustered
pairwise alignments."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1988</PY>
<VO>16</VO>
<NO>5</NO>
<PP>1683-1691</PP>
</SEQ>

<SEQ>
<UI>0710   Masek,W.J.    A Faster Algorithm Com.. J.Comput.System 80 20(1):18-31
</UI>
<AU>Masek WJ;
    Paterson MS
</AU>
<TI>A Faster Algorithm Computing String Edit Distances
</TI>
<SU>Pairwise alignment;
    USA;
    Edit;
    Longest common;
    Distance;
    Algorithm
</SU>
<AB>"The operations we admit are deleting, inserting and replacing one symbol
at a time, with possibly different costs for each of these operations. ... We
describe an algorithm for computing the edit distance between two strings of
length n" which requires O(n^2/log n) "steps whenever the costs of edit
operations are integral multiples of a single positive real number and the
alphabet for the strings is finite."
</AB>
<JT>J Comput Systems Sci</JT>
<PY>1980</PY>
<VO>20</VO>
<NO>1</NO>
<PP>18-31</PP>
</SEQ>

<SEQ>
<UI>0711   Masek,W.J.    How to Compute String-.. Time Warps, S.. 83 Addison-Wesley
</UI>
<AU>Masek WJ;
    Paterson MS
</AU>
<TI>How to Compute String-edit Distances Quickly
</TI>
<ED>Sankoff D;
    Kruskal JB
</ED>
<BK>Time Warps, String Edits, and Macromolecules: The Theory and Practice of
Sequence Comparison
</BK>
<SU>Pairwise alignment;
    USA;
    Edit;
    Longest common;
    Distance
</SU>
<AB>"We present an algorithm with an asymptotically faster execution time, for
example O(n^2/log n) when both strings are of length n, providing that the
alphabet for the strings is finite and all edit costs are integral multiples of
some real number r."
</AB>
<PU>Addison-Wesley </PU>
<PL>Reading, MA </PL>
<PY>1983</PY>
<PP>337-349</PP>
</SEQ>

<SEQ>
<UI>0712   McCaldon,P.   Oligopeptide Biases in.. Proteins Struct 88 4:99-122
</UI>
<AU>McCaldon P;
    Argos P
</AU>
<TI>Oligopeptide Biases in Protein Sequences and Their Use in Predicting
Protein Coding Regions in Nucleotide Sequences
</TI>
<SU>String match;
    Significance;
    DE;
    Region;
    Coding;
    Frame;
    Protein;
    Nucleotide
</SU>
<AB>"We have examined oligopeptides with lengths ranging from 2 to 11 residues
in protein sequences that show no obvious evolutionary relationship. ... The
results, contrary to previous studies, show clear prejudices in protein
sequences. The oligopeptide preferences were used to help decide the
significance of sequence homologies ...."
</AB>
<JT>Proteins Struct Funct Genet</JT>
<PY>1988</PY>
<VO>4</VO>
<PP>99-122</PP>
</SEQ>

<SEQ>
<UI>0713   McCreight,E.M A Space-Economical Suf.. J.Assoc.Comput. 76 23(2):262-272
</UI>
<AU>McCreight EM
</AU>
<TI>A Space-Economical Suffix Tree Construction Algorithm
</TI>
<SU>String match;
    Search tree;
    USA;
    Pattern match;
    Suffix;
    Algorithm
</SU>
<AB>"A new algorithm is presented for constructing auxiliary digital search
trees to aid in exact-match substring searching. This algorithm has the same
asymptotic running time bound as previously published algorithms, but is more
economical in space. ... New work on ... (the update problem) is presented."
</AB>
<JT>J Assoc Comput Mach</JT>
<PY>1976</PY>
<VO>23</VO>
<NO>2</NO>
<PP>262-272</PP>
</SEQ>

<SEQ>
<UI>0714   McLachlan,A.D Tests for Comparing Re.. J.Mol.Biol.     71 61:409-424
</UI>
<AU>McLachlan AD
</AU>
<TI>Tests for Comparing Related Amino Acid Sequences. Cytochrome c and
Cytochrome c551
</TI>
<SU>Pairwise comparison;
    Dot;
    UK;
    Segment;
    Statistical;
    Repeat;
    Amino acid
</SU>
<AB>"An improved method for testing similarities or repeats in protein
sequences ... includes three features: a measure of similarity for amino acids,
based on observed substitutions in homologous proteins; a search procedure which
compares all pairs of segments of two proteins; new statistical tests which
estimate the probabilities that observed correlations could have occurred by
chance."
</AB>
<JT>J Mol Biol</JT>
<PY>1971</PY>
<VO>61</VO>
<PP>409-424</PP>
</SEQ>

<SEQ>
<UI>0715   McLachlan,A.D Analysis of Gene Dupli.. J.Mol.Biol.     83 169(1):15-30
</UI>
<AU>McLachlan AD
</AU>
<TI>Analysis of Gene Duplication Repeats in the Myosin Rod
</TI>
<SU>Pairwise comparison;
    Dot;
    UK;
    Statistical;
    Significance;
    Repeat;
    Duplication;
    Gene
</SU>
<AB>"For the analysis of the myosin repeats we have needed to develop improved
statistical methods which make accurate tests of significance under a range of
assumptions about the nature of the sequence. These methods are outlined below.
... We begin below with a brief resume of the comparison matrix method and then
describe the newer developments."
</AB>
<JT>J Mol Biol</JT>
<PY>1983</PY>
<VO>169</VO>
<NO>1</NO>
<PP>15-30</PP>
</SEQ>

<SEQ>
<UI>0716   McLachlan,A.D Confidence Limits for .. J.Mol.Biol.     85 185:39-49
</UI>
<AU>McLachlan AD;
    Boswell DR
</AU>
<TI>Confidence Limits for Homology in Protein or Gene Sequences. The c-myc
Oncogene and Adenovirus E1a Protein
</TI>
<SU>Pairwise alignment;
    Significance;
    UK;
    Statistical;
    Homology;
    Confidence;
    Protein;
    Gene
</SU>
<AB>"We describe new tests, of general application, for deciding whether two
proteins or DNA sequences are significantly homologous, in cases where the
relationship is neither evidently true nor evidently false."
</AB>
<JT>J Mol Biol</JT>
<PY>1985</PY>
<VO>185</VO>
<PP>39-49</PP>
</SEQ>

<SEQ>
<UI>0717   Mehldau,G.    A System for Pattern M.. Comput.Appl.Bio 93 9(3):299-314
</UI>
<AU>Mehldau G;
    Myers G
</AU>
<TI>A System for Pattern Matching Applications on Biosequences
</TI>
<SU>Match complex patterns;
    USA;
    Pattern match;
    Approximate match;
    Program;
    Pattern definition
</SU>
<AB>"ANREP is a system for finding matches to patterns .... ANREP provides a
unified framework for almost all previously proposed biosequence patterns and
extends them by providing approximate matching, a feature heretofore unavailable
except for the limited case of individual sequences."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1993</PY>
<VO>9</VO>
<NO>3</NO>
<PP>299-314</PP>
</SEQ>

<SEQ>
<UI>0718   Mengeritsky,G Recognition of Charact.. Comput.Appl.Bio 87 3(3):223-227
</UI>
<AU>Mengeritsky G;
    Smith TF
</AU>
<TI>Recognition of Characteristic Patterns in Sets of Functionally Equivalent
DNA Sequences
</TI>
<SU>Consensus sequence;
    Statistical;
    USA;
    Pattern discovery;
    DNA;
    Recognition
</SU>
<AB>"An algorithm has been developed for the identification of unknown
patterns which are distinctive for a set of short DNA sequences believed to be
functionally equivalent. A pattern is defined as being a string, containing
fully or partially specified nucleotides at each position of the string."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1987</PY>
<VO>3</VO>
<NO>3</NO>
<PP>223-227</PP>
</SEQ>

<SEQ>
<UI>0719   Miller,P.L.   Parallel Computation a.. Comput.Appl.Bio 91 7(1):71-78
</UI>
<AU>Miller PL;
    Nadkarni PM;
    Pearson WR
</AU>
<TI>Parallel Computation and FASTA: Confronting the Problem of Parallel
Database Search for a Fast Sequence Comparison Algorithm
</TI>
<SU>Database search;
    Parallel;
    USA;
    FASTA;
    Sequence comparison;
    Algorithm
</SU>
<AB>"We have parallelized the FASTA algorithm for biological sequence
comparison using Linda, a machine-independent parallel programming language. The
resulting parallel program runs on a variety of different parallel machines."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1991</PY>
<VO>7</VO>
<NO>1</NO>
<PP>71-78</PP>
</SEQ>

<SEQ>
<UI>0720   Miller,P.L.   Comparing Machine-inde.. Comput.Appl.Bio 92 8(2):167-175
</UI>
<AU>Miller PL;
    Nadkarni PM;
    Pearson WR
</AU>
<TI>Comparing Machine-independent versus Machine-specific Parallelization of a
Software Platform for Biological Sequence Comparison
</TI>
<SU>Database search;
    Parallel;
    USA;
    Sequence comparison
</SU>
<AB>"A platform program that performs biological sequence comparison provides
a case study to compare the relative advantages of a machine-independent
approach to parallel computation versus a machine-specific approach. ... In the
benchmark tests reported, the benefits of the machine-independent approach were
achieved with only a modest sacrifice in efficiency."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1992</PY>
<VO>8</VO>
<NO>2</NO>
<PP>167-175</PP>
</SEQ>

<SEQ>
<UI>0721   Miller,W.     Building Multiple Alig.. Comput.Appl.Bio 93 9(2):169-176
</UI>
<AU>Miller W
</AU>
<TI>Building Multiple Alignments from Pairwise Alignments
</TI>
<SU>Multiple alignment;
    USA;
    Pairwise alignment;
    Dot
</SU>
<AB>"Given a family of related sequences, one can first determine alignments
between various pairs of those sequences, then construct a simultaneous
alignment of all the sequences that is determined in a natural manner by the set
of pairwise alignments. ... This paper presents an efficient algorithm for
constructing a multiple alignment from a set of pairwise alignments." It makes
five assumptions and is based on dot plots
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1993</PY>
<VO>9</VO>
<NO>2</NO>
<PP>169-176</PP>
</SEQ>

<SEQ>
<UI>0722   Miller,W.     A File Comparison Prog.. Software.Practi 85 15(11):1025-10
</UI>
<AU>Miller W;
    Myers EW
</AU>
<TI>A File Comparison Program
</TI>
<SU>Longest common;
    Sequence proximity;
    USA;
    Program;
    Edit
</SU>
<AB>"This paper presents a simple method for computing a shortest sequence of
insertion and deletion commands that converts one given file to another. The
method is particularly efficient when the difference between the two files is
small compared to the files' lengths. In experiments performed on typical files,
the program often ran four times faster than the UNIX diff command."
</AB>
<JT>Software Practice Experience </JT>
<PY>1985</PY>
<VO>15</VO>
<NO>11</NO>
<PP>1025-1040</PP>
</SEQ>

<SEQ>
<UI>0723   Miller,W.     Sequence Comparison wi.. Bull.Math.Biol. 88 50(2):97-120
</UI>
<AU>Miller W;
    Myers EW
</AU>
<TI>Sequence Comparison with Concave Weighting Functions
</TI>
<SU>Pairwise alignment;
    USA;
    Sequence comparison;
    Gap;
    Function
</SU>
<AB>"We consider efficient methods for computing a difference metric between
two sequences of symbols, where the cost of an operation to insert or delete a
block of symbols is a concave function of the block's length. Alternatively,
sequences can be optimally aligned when gap penalties are a concave function of
the gap length. Two algorithms [based on Waterman 1984] are presented."
</AB>
<JT>Bull Math Biol</JT>
<PY>1988</PY>
<VO>50</VO>
<NO>2</NO>
<PP>97-120</PP>
</SEQ>

<SEQ>
<UI>0724   Mirkin,B.     Consensus Functions an.. Bull.Math.Biol. 93 55(4):695-713
</UI>
<AU>Mirkin B;
    Roberts FS
</AU>
<TI>Consensus Functions and Patterns in Molecular Sequences
</TI>
<SU>Consensus sequence;
    Neighbourhood;
    RU;
    Function
</SU>
<AB>"We study a method of consensus originally due to Waterman et al. (1984)
which is used to identify patterns or features in a molecular sequence where a
pattern can be moved within a given 'window.' We show that some well-known
consensus methods of the social sciences, the median and the mean, are special
cases of this method .... The specific parameters used [by] Waterman et al. make
their method equivalent to the median procedure ...."
</AB>
<JT>Bull Math Biol</JT>
<PY>1993</PY>
<VO>55</VO>
<NO>4</NO>
<PP>695-713</PP>
</SEQ>

<SEQ>
<UI>0725   Mironov,A.A.  Statistical Method for.. Nucleic Acids R 88 16(11):5169-51
</UI>
<AU>Mironov AA;
    Alexandrov NN
</AU>
<TI>Statistical Method for Rapid Homology Search
</TI>
<SU>Database search;
    RU;
    Statistical;
    Fragment;
    Significance;
    Homology
</SU>
<AB>"Sequences to be compared are divided into fragments with length N, where
N is the minimal expected homology size. For each of the fragment pairs the
distance r is calculated. If r occurs smaller than r0 - the cutoff value,
[these] sequences can contain homologous fragments. ... This way of comparison
may be used for preliminary selection of sequence pairs which are thought to be
homologous and must be completed by the following construction of the optimal
alignment."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1988</PY>
<VO>16</VO>
<NO>11</NO>
<PP>5169-5173</PP>
</SEQ>

<SEQ>
<UI>0726   Miyazawa,S.   A New Substitution Mat.. Protein Eng.    93 6(3):267-278
</UI>
<AU>Miyazawa S;
    Jernigan RL
</AU>
<TI>A New Substitution Matrix for Protein Sequence Searches Based on Contact
Frequencies in Protein Structures
</TI>
<SU>Sequence proximity;
    Substitution;
    JP;
    Sequence search;
    Scoring;
    Structure;
    Protein;
    Matrix
</SU>
<AB>"In global and local homology searches, [our] scoring matrix tends to
yield significantly higher alignment scores than either the unitary matrix or
the genetic code matrix, and also may yield higher alignment scores for
distantly related protein pairs than MDM78."
</AB>
<JT>Protein Eng</JT>
<PY>1993</PY>
<VO>6</VO>
<NO>3</NO>
<PP>267-278</PP>
</SEQ>

<SEQ>
<UI>0727   Mohana Rao,J. New Scoring Matrix for.. Internat.J.Pept 87 29:276-281
</UI>
<AU>Mohana Rao JK
</AU>
<TI>New Scoring Matrix for Amino Acid Residue Exchanges Based on Residue
Characteristic Physical Parameters
</TI>
<SU>Sequence proximity;
    Substitution;
    USA;
    Scoring;
    Amino acid;
    Matrix;
    Residue;
    Physical
</SU>
<AB>"When comparing protein sequences for detecting homologies, the use of
[our new EMPAR scoring] matrix in place of the Dayhoff log-odds matrix yields
results that reflect the topological similarities in the proteins. The use of
EMPAR is equivalent to the parametric correlation coefficient approach of Ooi
and his colleagues."
</AB>
<JT>Internat J Pept Protein Res</JT>
<PY>1987</PY>
<VO>29</VO>
<PP>276-281</PP>
</SEQ>

<SEQ>
<UI>0728   Moore,G.W.    Alignment Statistic fo.. J.Mol.Evol.     77 9:121-130
</UI>
<AU>Moore GW;
    Goodman M
</AU>
<TI>Alignment Statistic for Identifying Related Protein Sequences
</TI>
<SU>Pairwise alignment;
    Significance;
    USA;
    Statistical;
    Codon;
    Protein
</SU>
<AB>"Closely related proteins show an obvious kinship by having numerous
matching amino acids in their aligned sequences. Kinship between anciently
separated proteins requires a statistical evaluation to rule out fortuitous
similarities. A simple statistic is developed which assumes equal probability
for all codon pairs ...."
</AB>
<JT>J Mol Evol</JT>
<PY>1977</PY>
<VO>9</VO>
<PP>121-130</PP>
</SEQ>

<SEQ>
<UI>0729   Mott,R.F.     Maximum-Likelihood Est.. Bull.Math.Biol. 92 54(1):59-75
</UI>
<AU>Mott RF
</AU>
<TI>Maximum-Likelihood Estimation of the Statistical Distribution of
Smith-Waterman Local Sequence Similarity Scores
</TI>
<SU>Subalignment;
    Significance;
    Likelihood;
    UK;
    Statistical;
    Database search;
    Distribution;
    Similarity;
    Estimation;
    Score
</SU>
<AB>"A method is described for estimating the distribution and hence testing
the statistical significance of sequence similarity scores obtained during a
data-bank search. Maximum-likelihood is used to fit a model to the scores,
avoiding any costly simulation of random sequences. The method is applied in
detail to the Smith-Waterman algorithm when gaps are allowed ...."
</AB>
<JT>Bull Math Biol</JT>
<PY>1992</PY>
<VO>54</VO>
<NO>1</NO>
<PP>59-75</PP>
</SEQ>

<SEQ>
<UI>0730   Mott,R.F.     STATSEARCH: A GCG-comp.. Comput.Appl.Bio 90 6(3):293-295
</UI>
<AU>Mott RF;
    Kirkwood TBL
</AU>
<TI>STATSEARCH: A GCG-compatible Program for Assessing Statistical
Significance during DNA and Protein Databank Searches
</TI>
<SU>Sequence analysis;
    Significance;
    UK;
    Statistical;
    Program;
    Database search;
    Protein;
    DNA;
    Databank
</SU>
<AB>"We describe a program STATSEARCH which implements the method of [Mott,
Kirkwood, Curnow (1989)] for searching DNA and protein sequence databanks for
statistically significant similarities to a given query sequence."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1990</PY>
<VO>6</VO>
<NO>3</NO>
<PP>293-295</PP>
</SEQ>

<SEQ>
<UI>0731   Mott,R.F.     A Test for the Statist.. Comput.Appl.Bio 89 5(2):123-131
</UI>
<AU>Mott RF;
    Kirkwood TBL;
    Curnow RN
</AU>
<TI>A Test for the Statistical Significance of DNA Sequence Similarities for
Application in Databank Searches
</TI>
<SU>Database search;
    Significance;
    UK;
    Statistical;
    Dyad;
    Similarity;
    DNA;
    Databank
</SU>
<AB>"A method is developed, based on word-searching, which provides a rapid
test for the statistical significance of DNA sequence similarities for use in
databank searching. The method makes allowance for the lengths and dinucleotide
compositions of the sequences being compared."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1989</PY>
<VO>5</VO>
<NO>2</NO>
<PP>123-131</PP>
</SEQ>

<SEQ>
<UI>0732   Mott,R.F.     An Accurate Approximat.. Bull.Math.Biol. 90 52(6):773-784
</UI>
<AU>Mott RF;
    Kirkwood TBL;
    Curnow RN
</AU>
<TI>An Accurate Approximation to the Distribution of the Length of the Longest
Matching Word between Two Random DNA Sequences
</TI>
<SU>Longest common;
    Significance;
    UK;
    Distribution;
    Statistical;
    Approximation;
    DNA;
    Word
</SU>
<AB>The derivation uses "only elementary probability arguments. The
distribution is shown to be consistent with previous asymptotic results for the
mean and variance of longest common words. The application of the distribution
to assessing the statistical significance of sequence similarities is
considered."
</AB>
<JT>Bull Math Biol</JT>
<PY>1990</PY>
<VO>52</VO>
<NO>6</NO>
<PP>773-784</PP>
</SEQ>

<SEQ>
<UI>0733   Mott,R.F.     Tests for the Statisti.. Protein Eng.    90 4(2):149-154
</UI>
<AU>Mott RF;
    Kirkwood TBL;
    Curnow RN
</AU>
<TI>Tests for the Statistical Significance of Protein Sequence Similarities in
Databank Searches
</TI>
<SU>Database search;
    Significance;
    UK;
    Statistical;
    Similarity;
    Protein;
    Databank
</SU>
<AB>"A suite of tests to evaluate the statistical significance of protein
sequence similarities is developed for use in data bank searches. The tests are
based on the Wilbur-Lipman word-search algorithm, and take into account the
sequence lengths and compositions, and optionally the weighting of amino acid
matches."
</AB>
<JT>Protein Eng</JT>
<PY>1990</PY>
<VO>4</VO>
<NO>2</NO>
<PP>149-154</PP>
</SEQ>

<SEQ>
<UI>0734   Mrazek,J.     GLOBIC: A Very Fast Mi.. Comput.Appl.Bio 92 8(1):29-34
</UI>
<AU>Mrazek J;
    Kypr J
</AU>
<TI>GLOBIC: A Very Fast Microcomputer Program for Fingerprinting,
Characterization and Comparison of Long Nucleotide Sequences
</TI>
<SU>Pairwise comparison;
    CZ;
    Program;
    Fingerprint;
    N-gram;
    Nucleotide;
    Characterization
</SU>
<AB>"Instead of the nucleotide sequences themselves, GLOBIC compares the local
nucleotide or short oligonucleotide compositions. GLOBIC presents
two-dimensional maps of contour lines depicting the similarity of two different
sequences, a sequence compared to itself, to its complementary sequence or to a
random sequence."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1992</PY>
<VO>8</VO>
<NO>1</NO>
<PP>29-34</PP>
</SEQ>

<SEQ>
<UI>0735   Mrazek,J.     UNIREP: A Microcompute.. Comput.Appl.Bio 93 9(3):355-360
</UI>
<AU>Mrazek J;
    Kypr J
</AU>
<TI>UNIREP: A Microcomputer Program to Find Unique and Repetitive Nucleotide
Sequences in Genomes
</TI>
<SU>Sequence analysis;
    CZ;
    Genome;
    Program;
    Repetition;
    Nucleotide
</SU>
<AB>The program "identifies repetitive and unique nucleotide sequences in
genomes or parts of genomes. A key feature of the algorithm is an
oligonucleotide representation in a numerical code to make possible a comparison
of all pairs of oligonucleotides (including overlaps) occurring in the analyzed
sequence."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1993</PY>
<VO>9</VO>
<NO>3</NO>
<PP>355-360</PP>
</SEQ>

<SEQ>
<UI>0736   Mukherjee,A.  Hardware Algorithms fo.. IEEE Trans.Comp 89 38(4):600-603
</UI>
<AU>Mukherjee A
</AU>
<TI>Hardware Algorithms for Determining Similarity Between Two Strings
</TI>
<SU>Longest common;
    USA;
    Complexity;
    Hardware;
    Pattern match;
    VLSI;
    Similarity;
    Algorithm
</SU>
<AB>"This paper presents pipelined hardware algorithms, with time complexity
O(n + m), for determining similarity between two character strings expressed as
the length of the longest common subsequence of the given pair of strings. ...
Two methods are presented: a sequential method with serial text input and an
alternating method in which both the pattern and the text are serially applied
to the machine."
</AB>
<JT>IEEE Trans Comput</JT>
<PY>1989</PY>
<VO>38</VO>
<NO>4</NO>
<PP>600-603</PP>
</SEQ>

<SEQ>
<UI>0737   Mukhopadhyay, A Fast Algorithm for t.. Inform.Sci.     80 20:69-82
</UI>
<AU>Mukhopadhyay A
</AU>
<TI>A Fast Algorithm for the Longest-Common-Subsequence Problem
</TI>
<SU>Longest common;
    USA;
    Complexity;
    Algorithm
</SU>
<AB>"A fast algorithm for the [LCS] problem is presented which runs in O((p +
n) log n) time, where p is the total number of pairs of matched positions
between the strings. Thus, the average performance of this algorithm is much
better than those of the quadratic algorithms proposed earlier and takes only a
linear amount of space."
</AB>
<JT>Inform Sci</JT>
<PY>1980</PY>
<VO>20</VO>
<PP>69-82</PP>
</SEQ>

<SEQ>
<UI>0738   Murata,M.     Three-way Needleman-Wu.. Methods Enzymol 90 183:365-375
</UI>
<AU>Murata M
</AU>
<TI>Three-way Needleman-Wunsch Algorithm
</TI>
<SU>Multiple alignment;
    AU;
    Needleman-Wunsch;
    Sequence alignment;
    Dynamic programming;
    Algorithm
</SU>
<AB>Murata, Richardson, Sussman (1985) extended "the method of Needleman and
Wunsch so that three sequences could be compared simultaneously.
...Modifications to the program for the comparison of longer sequences are
described in this chapter."
</AB>
<JT>Methods Enzymol</JT>
<PY>1990</PY>
<VO>183</VO>
<PP>365-375</PP>
</SEQ>

<SEQ>
<UI>0739   Murata,M.     Simultaneous Compariso.. Proc.Nat.Acad.S 85 82:3073-3077
</UI>
<AU>Murata M;
    Richardson JS;
    Sussman JL
</AU>
<TI>Simultaneous Comparison of Three Protein Sequences
</TI>
<SU>Multiple alignment;
    AU;
    Needleman-Wunsch;
    Sequence alignment;
    Dynamic programming;
    Protein
</SU>
<AB>"Here we present an algorithm for the simultaneous comparison of three
biological sequences. The [dynamic programming] algorithm is an extension of the
method developed by S. B. Needleman and C. D. Wunsch ...." Murata (1990)
describes extensions for the comparison of longer sequences
</AB>
<JT>Proc Nat Acad Sci USA </JT>
<PY>1985</PY>
<VO>82</VO>
<PP>3073-3077</PP>
</SEQ>

<SEQ>
<UI>0740   Myers,E.W.    An O(ND) Difference Al.. Algorithmica    86 
1:251-266
</UI>
<AU>Myers EW
</AU>
<TI>An O(ND) Difference Algorithm and its Variations
</TI>
<SU>Pairwise alignment;
    USA;
    Longest common;
    Edit;
    Algorithm
</SU>
<AB>"The problems of finding a longest common subsequence of two sequences A
and B and a shortest edit script for transforming A into B have long been known
to be dual problems. In this paper, they are shown to be equivalent to finding a
shortest/longest path in an edit graph." The algorithm was discovered
independently by Ukkonen (1983, 1985)
</AB>
<JT>Algorithmica </JT>
<PY>1986</PY>
<VO>1</VO>
<PP>251-266</PP>
</SEQ>

<SEQ>
<UI>0741   Myers,E.W.    Optimal Alignments in .. Comput.Appl.Bio 88 
4(1):11-17
</UI>
<AU>Myers EW;
    Miller W
</AU>
<TI>Optimal Alignments in Linear Space
</TI>
<SU>Pairwise alignment;
    USA;
    Sequence alignment;
    Gap;
    Optimal;
    Program
</SU>
<AB>"Space, not time, is often the limiting factor when computing optimal
sequence alignments, and a number of recent papers in the biology literature
have proposed space-saving strategies. However, [Hirschberg 1975] presented a
method that is superior to the new proposals, both in theory and in practice.
The goal of this paper is to give Hirschberg's idea the visibility it deserves
by developing a linear-space version of Gotoh's algorithm, which accommodates
affine gap penalties."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1988</PY>
<VO>4</VO>
<NO>1</NO>
<PP>11-17</PP>
</SEQ>

<SEQ>
<UI>0742   Myers,E.W.    Approximate Matching o.. Bull.Math.Biol. 89 
51(1):5-37
</UI>
<AU>Myers EW;
    Miller W
</AU>
<TI>Approximate Matching of Regular Expressions
</TI>
<SU>Dictionary match;
    USA;
    Language;
    Approximate match;
    Expression;
    Sequence match
</SU>
<AB>"Given a sequence A and regular expression R, the approximate regular
expression matching problem is to find a sequence matching R whose optimal
alignment with A is the highest scoring of all such sequences. This paper
develops an algorithm to solve the problem in time O(MN) .... Our method is
superior to an earlier algorithm by Wagner and Seiferas in several ways."
</AB>
<JT>Bull Math Biol</JT>
<PY>1989</PY>
<VO>51</VO>
<NO>1</NO>
<PP>5-37</PP>
</SEQ>

<SEQ>
<UI>0743   Myers,E.W.    Row Replacement Algori.. ACM Trans.Progr 89 
11(1):33-56
</UI>
<AU>Myers EW;
    Miller W
</AU>
<TI>Row Replacement Algorithms for Screen Editors
</TI>
<SU>Pairwise alignment;
    USA;
    Sequence comparison;
    Dynamic programming;
    Editor;
    Edit;
    Algorithm
</SU>
<AB>"Interactive screen editors repeatedly determine terminal command
sequences to update a screen row. Computing an optimal command sequence differs
from the traditional sequence comparison problem in that there is a cost for
moving the cursor over unedited characters and the cost of an n-character
command is not always the cost of n one-character commands." A dynamic
programming algorithm is presented
</AB>
<JT>ACM Trans Programming Languages Systems </JT>
<PY>1989</PY>
<VO>11</VO>
<NO>1</NO>
<PP>33-56</PP>
</SEQ>

<SEQ>
<UI>0744   Myers,E.W.    Computer Program for t.. Nucleic Acids R 86 
14(1):501-508
</UI>
<AU>Myers EW;
    Mount DW
</AU>
<TI>Computer Program for the IBM Personal Computer which Searches for
Approximate Matches to Short Oligonucleotide Sequences in Long Target DNA
Sequences
</TI>
<SU>Match with k differences;
    USA;
    Program;
    Approximate match;
    DNA
</SU>
<AB>"We describe a program which may be used to find approximate matches to a
short predefined DNA sequence in a larger target DNA sequence." The algorithm is
a refinement of one by Sellers (1980)
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1986</PY>
<VO>14</VO>
<NO>1</NO>
<PP>501-508</PP>
</SEQ>

<SEQ>
<UI>0745   Nakatsu,N.    A Longest Common Subse.. Acta Inform.    82 
18:171-179
</UI>
<AU>Nakatsu N;
    Kambayashi Y;
    Yajima S
</AU>
<TI>A Longest Common Subsequence Algorithm Suitable for Similar Text Strings
</TI>
<SU>Longest common;
    JP;
    Subsequence;
    Retrieval;
    Algorithm
</SU>
<AB>Let m and n be the lengths of two strings, m &lt;= n, which have a longest
common subsequence of length p. "In this paper, O(n(m - p)) algorithm is presented.
When p is close to m (in other words, two given strings are similar), the
algorithm presented here runs much faster than previously known algorithms."
</AB>
<JT>Acta Inform</JT>
<PY>1982</PY>
<VO>18</VO>
<PP>171-179</PP>
</SEQ>

<SEQ>
<UI>0746   Nedde,D.N.    Visualizing Relationsh.. Comput.Appl.Bio 93 
9(3):331-335
</UI>
<AU>Nedde DN;
    Ward MO
</AU>
<TI>Visualizing Relationships between Nucleic Acid Sequences using Correlation
Images
</TI>
<SU>Pairwise comparison;
    Dot;
    USA;
    Sequence comparison;
    Program;
    Display;
    Correlation
</SU>
<AB>"This paper describes a portable software package implementing a variation
of the dot-matrix plot for genetic sequence comparison in conjunction with
highly interactive image manipulation and examination techniques."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1993</PY>
<VO>9</VO>
<NO>3</NO>
<PP>331-335</PP>
</SEQ>

<SEQ>
<UI>0747   Needleman,S.B A General Method Appli.. J.Mol.Biol.     70 
48(3):443-453
</UI>
<AU>Needleman SB;
    Wunsch CD
</AU>
<TI>A General Method Applicable to the Search for Similarities in the Amino
Acid Sequence of Two Proteins
</TI>
<SU>Pairwise alignment;
    Dynamic programming;
    USA;
    Needleman-Wunsch;
    Edit;
    Similarity;
    Amino acid;
    Protein
</SU>
<AB>Dynamic programming is used with a similarity criterion to find an optimal
alignment of two sequences. This paper is a standard reference for the
comparison of two molecular sequences
</AB>
<JT>J Mol Biol</JT>
<PY>1970</PY>
<VO>48</VO>
<NO>3</NO>
<PP>443-453</PP>
</SEQ>

<SEQ>
<UI>0748   Niefind,K.    Amino Acid Similarity .. J.Mol.Biol.     91 
219(3):481-497
</UI>
<AU>Niefind K;
    Schomburg D
</AU>
<TI>Amino Acid Similarity Coefficients for Protein Modeling and Sequence
Alignment Derived from Main-chain Folding Angles
</TI>
<SU>Sequence proximity;
    Substitution;
    DE;
    Scoring;
    Sequence alignment;
    Similarity;
    Amino acid;
    Protein;
    Folding
</SU>
<AB>"A set of 'similarity-parameters' was calculated that reflects the
influence of the proteinogenic amino acids on the structure of the protein
backbone. ... [These parameters] should form a scoring matrix in protein
sequence alignment superior to identity scoring. The usability of the 'structure
derived correlation matrix (SCM)' for these purposes is assessed and
demonstrated for some examples ...."
</AB>
<JT>J Mol Biol</JT>
<PY>1991</PY>
<VO>219</VO>
<NO>3</NO>
<PP>481-497</PP>
</SEQ>

<SEQ>
<UI>0749   Nielsen,P.T.  On the Expected Durati.. IEEE Trans.Info 73 
19:702-704
</UI>
<AU>Nielsen PT
</AU>
<TI>On the Expected Duration of a Search for a Fixed Pattern in Random Data
</TI>
<SU>String match;
    DK;
    Regularities;
    Complexity
</SU>
<AB>"An expression is obtained for the expected duration of a search to find a
given L-ary sequence in a semi-infinite stream of random L-ary data. The search
time is found to be an increasing function of the lengths of the 'bifices' of
the pattern, where the term bifix denotes a sequence which is both a prefix and a
suffix." L is the cardinality of the alphabet
</AB>
<JT>IEEE Trans Inform Theory </JT>
<PY>1973</PY>
<VO>19</VO>
<PP>702-704</PP>
</SEQ>

<SEQ>
<UI>0750   Ninio,J.      String Analysis and En.. J.Mol.Biol.     89 
207:585-596
</UI>
<AU>Ninio J;
    Mizraji E
</AU>
<TI>String Analysis and Energy Minimization in the Partition of DNA Sequences
</TI>
<SU>Sequence analysis;
    Significance;
    FR;
    Motif;
    Signal;
    DNA;
    Energy
</SU>
<AB>"While the recognition of particular signals in sequences relies on
complex physical interactions, the problem is often analysed in terms of the
presence or absence of literal motifs (strings) in the sequence. We present here
a test-case for evaluating the potential of this approach."
</AB>
<JT>J Mol Biol</JT>
<PY>1989</PY>
<VO>207</VO>
<PP>585-596</PP>
</SEQ>

<SEQ>
<UI>0751   Nussinov,R.   An Efficient Code Sear.. J.Theor.Biol.   83 
100:319-328
</UI>
<AU>Nussinov R
</AU>
<TI>An Efficient Code Searching for Sequence Homology and DNA Duplication
</TI>
<SU>Pairwise alignment;
    IL;
    Structure;
    Homology;
    Duplication;
    DNA
</SU>
<AB>"This paper presents a very simple and efficient algorithm that searches
for sequence homology and gene duplication. The code finds the best alignment of
two, short or long, sequences without having to specify how many unmatched bases
are allowed to be looped out. ... The code runs in O(n3/2) units of time. ...
The present method is modeled after the planar folding algorithm ... which has
been introduced to secondary structure of RNA .... In general, any good
secondary structure algorithm can be converted to yield an algorithm for
sequence alignment."
</AB>
<JT>J Theor Biol</JT>
<PY>1983</PY>
<VO>100</VO>
<PP>319-328</PP>
</SEQ>

<SEQ>
<UI>0752   O'Hara,P.J.   PRIMEGEN, a Tool for D.. Comput.Appl.Bio 91 
7(4):533-534
</UI>
<AU>O'Hara PJ;
    Venezia D
</AU>
<TI>PRIMEGEN, a Tool for Designing Primers from Multiple Alignments
</TI>
<SU>Multiple alignment;
    Program;
    USA;
    Sequence alignment;
    Region
</SU>
<AB>"PRIMEGEN (for primer generator) can evaluate a multiple protein sequence
alignment both for degree of conservation and for the oligonucleotide degeneracy
necessary to encode every amino acid sequence with a given region of the
alignment."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1991</PY>
<VO>7</VO>
<NO>4</NO>
<PP>533-534</PP>
</SEQ>

<SEQ>
<UI>0753   O'Neill,M.C.  Consensus Methods for .. J.Mol.Biol.     89 
207(2):301-310
</UI>
<AU>O'Neill MC
</AU>
<TI>Consensus Methods for Finding and Ranking DNA Binding Sites. Application
to Escherichia coli Promoters
</TI>
<SU>Match a pattern matrix;
    USA;
    Consensus method;
    DNA;
    Binding
</SU>
<AB>"There have been many different approaches employed to define the
'consensus' sequence of various DNA binding sites and to use the definition
obtained to locate and rank members of a given sequence family. The analysis
presented here enlists two of these approaches, each in modified form, to
develop a highly efficient search protocol for Escherichia coli promoters ...."
</AB>
<JT>J Mol Biol</JT>
<PY>1989</PY>
<VO>207</VO>
<NO>2</NO>
<PP>301-310</PP>
</SEQ>

<SEQ>
<UI>0754   Okuda,T.      A Method for the Corre.. IEEE Trans.Comp 76 
25:172-177
</UI>
<AU>Okuda T;
    Tanaka E;
    Kasai T
</AU>
<TI>A Method for the Correction of Garbled Words based on the Levenshtein
Metric
</TI>
<SU>Sequence proximity;
    Correction;
    JP;
    Edit;
    Error;
    Word
</SU>
<AB>"In this paper we propose a new method for correcting garbled words based
on Levenshtein distance and weighted Levenshtein distance [in which insertions,
deletions, and substitutions each have distinct weights]."
</AB>
<JT>IEEE Trans Comput</JT>
<PY>1976</PY>
<VO>25</VO>
<PP>172-177</PP>
</SEQ>

<SEQ>
<UI>0755   Oommen,B.J.   Constrained String Edi.. Inform.Sci.     87 
40:267-284
</UI>
<AU>Oommen BJ
</AU>
<TI>Constrained String Editing
</TI>
<SU>Pairwise alignment;
    CA;
    Editing
</SU>
<AB>"In this paper we consider the problem of transforming X to Y using any
arbitrary edit constraint involving the number and type of edit operations to be
performed. An algorithm is presented to compute the minimum distance associated
with editing X to Y subject to the specified constraint. ... The technique to
compute the optimal transformation is also presented."
</AB>
<JT>Inform Sci</JT>
<PY>1987</PY>
<VO>40</VO>
<PP>267-284</PP>
</SEQ>

<SEQ>
<UI>0756   Oommen,B.J.   Recognition of Noisy S.. IEEE Trans.Patt 87 
9(5):676-685
</UI>
<AU>Oommen BJ
</AU>
<TI>Recognition of Noisy Subsequences Using Constrained Edit Distances
</TI>
<SU>String match;
    Correction;
    CA;
    Edit;
    Error;
    Subsequence;
    Distance;
    Recognition
</SU>
<AB>"Let X* be any unknown word from a finite dictionary H. Let U be any
arbitrary subsequence of X*. We consider the problem of estimating X* by
processing Y, which is a noisy version of U."
</AB>
<JT>IEEE Trans Patt Anal Mach Intell</JT>
<PY>1987</PY>
<VO>9</VO>
<NO>5</NO>
<PP>676-685</PP>
</SEQ>

<SEQ>
<UI>0757   Owens,J.      A Fixed-point Alignmen.. Comput.Appl.Bio 88 
4(1):73-77
</UI>
<AU>Owens J;
    Chatterjee D;
    Nussinov R;
    Konopka AK;
    Maizel JV Jr
</AU>
<TI>A Fixed-point Alignment Technique for Detection of Recurrent and Common
Sequence Motifs Associated with Biological Features
</TI>
<SU>Multiple comparison;
    Common feature;
    USA;
    Motif;
    Detection
</SU>
<AB>"A fixed-point alignment analysis technique is presented which is designed
to locate common sequence motifs in collections of proteins or nucleic acids.
Initially a program aligns a collection of sequences by a common sequence
pattern or known biological feature. ... Once all alignment markers are located,
the sequences are scanned for occurrences of given oligomers within a specified
span both upstream and downstream of the fixed-point."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1988</PY>
<VO>4</VO>
<NO>1</NO>
<PP>73-77</PP>
</SEQ>

<SEQ>
<UI>0758   Owolabi,O.    Fast Approximate Strin.. Software.Practi 88 
18(4):387-393
</UI>
<AU>Owolabi O;
    McGregor DR
</AU>
<TI>Fast Approximate String Matching
</TI>
<SU>Database search;
    N-gram;
    UK;
    String match;
    Approximate match
</SU>
<AB>Approximate string matching to entries in a stored dictionary. "The first
[stage] uses a very compact n-gram table to preselect sets of roughly similar
strings. The second stage compares these with the input string using an accurate
method to give an accurately matched set of strings. ... The resulting method is
both computationally fast and storage-efficient."
</AB>
<JT>Software Practice Experience </JT>
<PY>1988</PY>
<VO>18</VO>
<NO>4</NO>
<PP>387-393</PP>
</SEQ>

<SEQ>
<UI>0759   Panjukov,V.V. Finding Steady Alignme.. Comput.Appl.Bio 93 
9(3):285-290
</UI>
<AU>Panjukov VV
</AU>
<TI>Finding Steady Alignments: Similarity and Distance
</TI>
<SU>Pairwise alignment;
    RU;
    Gap;
    Similarity;
    Distance
</SU>
<AB>"Certain alignments keep the optimum despite the weight parameters varying
over a range of values. Alignments of this kind are called steady. A method
finding all the steady optimal alignments of two sequences is presented
providing that a gap penalty is directly proportional to gap length."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1993</PY>
<VO>9</VO>
<NO>3</NO>
<PP>285-290</PP>
</SEQ>

<SEQ>
<UI>0760   Parry-Smith,D SOMAP: a Novel Interac.. Comput.Appl.Bio 91 
7(2):233-235
</UI>
<AU>Parry-Smith DJ;
    Attwood TK
</AU>
<TI>SOMAP: a Novel Interactive Approach to Multiple Protein Sequences
Alignment
</TI>
<SU>Multiple alignment;
    UK;
    Display;
    Sequence analysis;
    Protein
</SU>
<AB>"The approach used is essentially one of manual sequence manipulation,
aided by built-in symbolic displays of identities and similarities, and strict
and 'fuzzy' (ambiguous) pattern-matching facilities. Additional flexibility is
provided by means of an interface to a publicly available automatic alignment
system and to a comprehensive sequence analysis package."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1991</PY>
<VO>7</VO>
<NO>2</NO>
<PP>233-235</PP>
</SEQ>

<SEQ>
<UI>0761   Patthy,L.     Detecting Homology of .. J.Mol.Biol.     87 
198:567-577
</UI>
<AU>Patthy L
</AU>
<TI>Detecting Homology of Distantly Related Proteins with Consensus Sequences
</TI>
<SU>Multiple alignment;
    Consensus sequence;
    HU;
    Clustering;
    Homology;
    Protein
</SU>
<AB>The multiple alignment algorithm iterates between generating a multiple
alignment from a consensus sequence (by pairwise comparisons) and constructing a
consensus sequence from aligned sequences. An initial grouping of similar
sequences suggests that the method might be embedded in a SAHN clustering
structure
</AB>
<JT>J Mol Biol</JT>
<PY>1987</PY>
<VO>198</VO>
<PP>567-577</PP>
</SEQ>

<SEQ>
<UI>0762   Pearson,W.R.  Rapid and Sensitive Se.. Methods Enzymol 90 
183:63-98
</UI>
<AU>Pearson WR
</AU>
<TI>Rapid and Sensitive Sequence Comparison with FASTP and FASTA
</TI>
<SU>Database search;
    USA;
    FASTA
</SU>
<AB>"In this chapter, I show an example of a simple FASTA library search,
describe the FASTA algorithm, and then discuss in detail a more problematic
search, namely, one for members of the G-protein-coupled receptor family.
Additional information about how to customize the scoring parameters and output
from the FASTA programs is included in the appendices."
</AB>
<JT>Methods Enzymol</JT>
<PY>1990</PY>
<VO>183</VO>
<PP>63-98</PP>
</SEQ>

<SEQ>
<UI>0763   Pearson,W.R.  Searching Protein Sequ.. Genomics        91 
11(3):635-650
</UI>
<AU>Pearson WR
</AU>
<TI>Searching Protein Sequence Libraries: Comparison of the Sensitivity and
Selectivity of the Smith-Waterman and FASTA Algorithms
</TI>
<SU>Database search;
    USA;
    FASTA;
    Sequence comparison;
    Protein;
    Algorithm
</SU>
<AB>"Rapid sequence comparison algorithms such as FASTP (Lipman, Pearson,
1985) and FASTA (Pearson, Lipman, 1988; Pearson, 1990) have dramatically
decreased the amount of time required to compare a newly determined protein
sequence to a protein or DNA sequence database." "Several strategies for
improving the sensitivity of FASTA were examined."
</AB>
<JT>Genomics </JT>
<PY>1991</PY>
<VO>11</VO>
<NO>3</NO>
<PP>635-650</PP>
</SEQ>

<SEQ>
<UI>0764   Pearson,W.R.  Improved Tools for Bio.. Proc.Nat.Acad.S 88 
85:2444-2448
</UI>
<AU>Pearson WR;
    Lipman DJ
</AU>
<TI>Improved Tools for Biological Sequence Comparison
</TI>
<SU>Database search;
    USA;
    Sequence comparison;
    FASTA;
    Significance;
    Composition;
    Display;
    Region
</SU>
<AB>"The FASTA program is a more sensitive derivative of the FASTP program,
which can be used to search protein or DNA sequence data bases .... The RFD2
program can be used to evaluate the significance of similarity scores using a
shuffling method that preserves local sequence composition. The LFASTA program
can display all the regions of local similarity between two sequences with
scores greater than a threshold ...."
</AB>
<JT>Proc Nat Acad Sci USA </JT>
<PY>1988</PY>
<VO>85</VO>
<PP>2444-2448</PP>
</SEQ>

<SEQ>
<UI>0765   Pearson,W.R.  Dynamic Programming Al.. Methods Enzymol 92 
210:575-601
</UI>
<AU>Pearson WR;
    Miller W
</AU>
<TI>Dynamic Programming Algorithms for Biological Sequence Comparison
</TI>
<SU>Pairwise comparison;
    Dynamic programming;
    Review;
    Sequence comparison;
    Profile;
    USA;
    Dynamic;
    Algorithm
</SU>
<AB>In Brand,L., and Johnson,M.L. (Eds.), Numerical Computer Methods. "We
discuss several dynamic programming algorithms that have been applied to
biological sequence comparison problems. ... We present efficient dynamic
programming algorithms for calculating global and local similarity scores, and
for comparing a sequence 'profile' or pattern to a sequence."
</AB>
<JT>Methods Enzymol</JT>
<PY>1992</PY>
<VO>210</VO>
<PP>575-601</PP>
</SEQ>

<SEQ>
<UI>0766   Peltola,H.    Algorithms for the Sea.. Nucleic Acids R 86 
14(1):99-107
</UI>
<AU>Peltola H;
    Soderlund H;
    Ukkonen E
</AU>
<TI>Algorithms for the Search of Amino Acid Patterns in Nucleic Acid 
Sequences
</TI>
<SU>Pairwise alignment;
    FI;
    Region;
    Dynamic programming;
    Sequence comparison;
    Codon;
    Amino acid;
    Nucleic acid;
    Algorithm
</SU>
<AB>"Some algorithms are described for the search of regions in a nucleic acid
sequence that, when translated into amino acids, are homologous to a given amino
acid pattern. All algorithms are modifications of the dynamic programming method
for sequence comparison such that the translation of codons is taken into
account."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1986</PY>
<VO>14</VO>
<NO>1</NO>
<PP>99-107</PP>
</SEQ>

<SEQ>
<UI>0767   Pesole,G.     WORDUP: An Efficient A.. Nucleic Acids R 92 
20(11):2871-28
</UI>
<AU>Pesole G;
    Prunella N;
    Liuni S;
    Attimonelli M;
    Saccone C
</AU>
<TI>WORDUP: An Efficient Algorithm for Discovering Statistically Significant
Patterns in DNA Sequences
</TI>
<SU>Sequence analysis;
    Significance;
    Statistical;
    Markov;
    Motif;
    Italy;
    DNA;
    Algorithm
</SU>
<AB>"We present here a fast and sensitive method designed to isolate short
nucleotide sequences which have non-random statistical properties and may thus
be biologically active. It is based on a first order Markov analysis and allows
us to detect statistically significant sequence motifs from six to ten
nucleotides long which are significantly shared (or avoided) in the sequences
under investigation."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1992</PY>
<VO>20</VO>
<NO>11</NO>
<PP>2871-2875</PP>
</SEQ>

<SEQ>
<UI>0768   Petersen,S.B. Training Neural Networ.. Trends Biotechn 90 
8(11):304-308
</UI>
<AU>Petersen SB;
    Bohr H;
    Bohr J;
    Brunak S;
    Cotterill RMJ;
    Fredholm H;
    Lautrup B
</AU>
<TI>Training Neural Networks to Analyse Biological Sequences
</TI>
<SU>Sequence analysis;
    Neural;
    DK;
    Structure;
    Network
</SU>
<AB>Sequence homology measured by neural networks. Secondary structure
prediction. Prediction of β-turns in proteins. Prediction of three-dimensional
protein backbone conformation. Using neural networks on nucleic acid sequences
</AB>
<JT>Trends Biotechnol</JT>
<PY>1990</PY>
<VO>8</VO>
<NO>11</NO>
<PP>304-308</PP>
</SEQ>

<SEQ>
<UI>0769   Pevzner,P.A.  Multiple Alignment, Co.. SIAM J.Appl.Mat 92 
52(6):1763-177
</UI>
<AU>Pevzner PA
</AU>
<TI>Multiple Alignment, Communication Cost, and Graph Matching
</TI>
<SU>Multiple alignment;
    Complexity;
    Approximation;
    USA;
    Graph
</SU>
<AB>"Although many algorithms for suboptimal alignment have been suggested, no
'performance guarantees' algorithms have been known until recently. A
computationally efficient approximation multiple alignment algorithm with
guaranteed error bounds equal to the normalized communication cost of a
corresponding graph is given in this paper." See also Gusfield (1993)
</AB>
<JT>SIAM J Appl Math</JT>
<PY>1992</PY>
<VO>52</VO>
<NO>6</NO>
<PP>1763-1779</PP>
</SEQ>

<SEQ>
<UI>0770   Pevzner,P.A.  Nucleotide Sequences v.. Computers Chem. 92 
16(2):103-106
</UI>
<AU>Pevzner PA
</AU>
<TI>Nucleotide Sequences versus Markov Models
</TI>
<SU>Sequence analysis;
    Significance;
    Markov;
    USA;
    Nucleotide;
    Model
</SU>
<AB>"There exist several peculiarities of nucleotide sequences that preclude
their description by existing models and thus allow one to distinguish DNA
sequences from random A, T, G, C-texts. ... This paper reviews some approaches
to locate anomalous words and establishes links between the recent results on
walking Markov models with strand symmetry ... and non-stationary words in DNA
sequences ...."
</AB>
<JT>Computers Chem</JT>
<PY>1992</PY>
<VO>16</VO>
<NO>2</NO>
<PP>103-106</PP>
</SEQ>

<SEQ>
<UI>0771   Pevzner,P.A.  Statistical Distance B.. Comput.Appl.Bio 92 
8(2):121-127
</UI>
<AU>Pevzner PA
</AU>
<TI>Statistical Distance Between Texts and Filtration Methods in Sequence
Comparison
</TI>
<SU>Database search;
    USA;
    Statistical;
    Sequence comparison;
    Complexity;
    Dynamic programming;
    Distance
</SU>
<AB>"Upon searching local similarities in long sequences, the necessity of a
'rapid' similarity search becomes acute. Quadratic complexity of dynamic
programming algorithms forces the employment of filtration methods that allow
elimination of the sequences with a low similarity level. The paper is devoted
to the theoretical substantiations of the filtration method based on the
statistical distance between texts."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1992</PY>
<VO>8</VO>
<NO>2</NO>
<PP>121-127</PP>
</SEQ>

<SEQ>
<UI>0772   Pevzner,P.A.  Linguistics of Nucleot.. J.Biomol.Struct 89 
6(5):1013-1026
</UI>
<AU>Pevzner PA;
    Borodovsky MY;
    Mironov AA
</AU>
<TI>Linguistics of Nucleotide Sequences I: The Significance of Deviations from
Mean Statistical Characteristics and Prediction of the Frequencies of Occurrence
of Words
</TI>
<SU>Sequence analysis;
    Significance;
    Linguistic;
    RU;
    Statistical;
    Sequence prediction;
    Prediction;
    Nucleotide;
    Word
</SU>
<AB>"We propose a formula for the variance of the number of word's occurrences
in the text, with allowance for word overlaps, making it possible to assess the
significance of the deviations from the expected statistical characteristics.
... [Also a] new method for predicting the frequencies of occurrence of
particular words ...."
</AB>
<JT>J Biomol Struct &amp; Dyn </JT>
<PY>1989</PY>
<VO>6</VO>
<NO>5</NO>
<PP>1013-1026</PP>
</SEQ>

<SEQ>
<UI>0773   Pevzner,P.A.  Linguistics of Nucleot.. J.Biomol.Struct 89 
6(5):1027-1038
</UI>
<AU>Pevzner PA;
    Borodovsky MY;
    Mironov AA
</AU>
<TI>Linguistics of Nucleotide Sequences II: Stationary Words in Genetic Texts
and the Zonal Structure of DNA
</TI>
<SU>Sequence analysis;
    Significance;
    Linguistic;
    RU;
    Distributed;
    Genetic;
    Structure;
    DNA;
    Nucleotide;
    Word
</SU>
<AB>"Words are irregularly distributed in genetic texts. The analysis of this
irregularity leads to the notion of stationary and non-stationary words. ... The
distribution of stationary words suggests a method for partitioning DNA into
zones."
</AB>
<JT>J Biomol Struct &amp; Dyn </JT>
<PY>1989</PY>
<VO>6</VO>
<NO>5</NO>
<PP>1027-1038</PP>
</SEQ>

<SEQ>
<UI>0774   Pietrokovski, Linguistic Measure of .. J.Biomol.Struct 90 
7(6):1251-1268
</UI>
<AU>Pietrokovski S;
    Hirshon J;
    Trifonov EN
</AU>
<TI>Linguistic Measure of Taxonomic and Functional Relatedness of Nucleotide
Sequences
</TI>
<SU>Sequence proximity;
    Linguistic;
    IL;
    Nucleotide
</SU>
<AB>"A single value, the linguistic similarity between the sequences, is
suggested as a measure of sequence relatedness. ... The similarity value is
shown to be very sensitive to the relatedness of the source species, thus
providing a convenient tool for taxonomic classification of species by their
sequence vocabularies. ... This can be a basis for a quick screening technique
for functional characterization of the sequences ...."
</AB>
<JT>J Biomol Struct &amp; Dyn </JT>
<PY>1990</PY>
<VO>7</VO>
<NO>6</NO>
<PP>1251-1268</PP>
</SEQ>

<SEQ>
<UI>0775   Pirklbauer,K. A Study of Pattern-Mat.. Structured Prog 92 13:89-98
</UI>
<AU>Pirklbauer K
</AU>
<TI>A Study of Pattern-Matching Algorithms
</TI>
<SU>String match;
    Review;
    Austria;
    Algorithm
</SU>
<AB>"This paper does not present a new pattern-matching algorithm but offers a
survey of well-known algorithms and compares their run-time behavior. ... The
algorithms are compared by measuring the behavior of typical examples."
</AB>
<JT>Structured Programming </JT>
<PY>1992</PY>
<VO>13</VO>
<PP>89-98</PP>
</SEQ>

<SEQ>
<UI>0776   Pizzi,E.      A Simple Method for Gl.. Nucleic Acids R 92 
20(1):131-136
</UI>
<AU>Pizzi E;
    Attimonelli M;
    Liuni S;
    Frontali C;
    Saccone C
</AU>
<TI>A Simple Method for Global Sequence Comparison
</TI>
<SU>Sequence proximity;
    Sequence comparison;
    Italy
</SU>
<AB>"We investigated the possibility of using a global approach to efficiently
pre-screen a large database, in order to rapidly identify those sequences which
are related to a given one." "A simple method of sequence comparison, based on a
correlation analysis of oligonucleotide frequency distributions, is here shown
to be a reliable test of overall sequence similarity."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1992</PY>
<VO>20</VO>
<NO>1</NO>
<PP>131-136</PP>
</SEQ>

<SEQ>
<UI>0777   Posfai,J.     Predictive Motifs Deri.. Nucleic Acids R 89 
17(7):2421-243
</UI>
<AU>Posfai J;
    Bhagwat AS;
    Posfai G;
    Roberts RJ
</AU>
<TI>Predictive Motifs Derived from Cytosine Methyltransferases
</TI>
<SU>Multiple alignment;
    Segment;
    USA;
    Motif
</SU>
<AB>"To produce a global alignment of the set of similar sequences we
developed a new procedure. ... In brief, information from both the amino acid
and the nucleic acid sequences is used to produce the alignment. The program
attempts to reproduce the method of alignment by eye, by directly locating
globally conserved sequence features."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1989</PY>
<VO>17</VO>
<NO>7</NO>
<PP>2421-2435</PP>
</SEQ>

<SEQ>
<UI>0778   Pramanik,S.   A Hardware Pattern Mat.. Comput.J.       85 
28(3):264-269
</UI>
<AU>Pramanik S;
    King CT
</AU>
<TI>A Hardware Pattern Matching Algorithm on a Dataflow
</TI>
<SU>Match complex patterns;
    Hardware;
    USA;
    Pattern match;
    Parallel;
    Algorithm
</SU>
<AB>"A hardware pattern matcher is presented, which searches for patterns on a
data flow, such as characters read from a disk. The backing up on the data flow,
for a general pattern matching, is avoided by means of a set of cells running in
parallel. Each cell can search for a pattern independently, but requires only
one one-character comparator."
</AB>
<JT>Comput J</JT>
<PY>1985</PY>
<VO>28</VO>
<NO>3</NO>
<PP>264-269</PP>
</SEQ>

<SEQ>
<UI>0779   Pustell,J.    A High Speed, High Cap.. Nucleic Acids R 82 
10(15):4765-47
</UI>
<AU>Pustell J;
    Kafatos FC
</AU>
<TI>A High Speed, High Capacity Homology Matrix: Zooming through SV40 and Polyoma
</TI>
<SU>Pairwise comparison;
    Dot;
    USA;
    Compression;
    Homology;
    Matrix
</SU>
<AB>"We present a new homology matrix program which owes its basic conception
to the two-dimensional dot matrices previously described ... but has important
improvements and new features." It has a noise-filtration system, a capacity for
compression without much loss of information, and high execution speed
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1982</PY>
<VO>10</VO>
<NO>15</NO>
<PP>4765-4782</PP>
</SEQ>

<SEQ>
<UI>0780   Queen,C.      Improvements to a Prog.. Nucleic Acids R 82 
10(1):449-456
</UI>
<AU>Queen C;
    Wegman MN;
    Korn LJ
</AU>
<TI>Improvements to a Program for DNA Analysis: A Procedure to Find Homologies
Among Many Sequences
</TI>
<SU>Consensus sequence;
    Neighbourhood;
    USA;
    Program;
    Homology;
    DNA
</SU>
<AB>"Partial homologies among a set of sequences are not readily detected by a
computer program that compares sequences two at a time, because each pair of
sequences contains many homologies not shared by the others. We have therefore
added to our program a procedure for detecting multi-sequence homologies, based
on an algorithm that analyzes all the sequences simultaneously."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1982</PY>
<VO>10</VO>
<NO>1</NO>
<PP>449-456</PP>
</SEQ>

<SEQ>
<UI>0781   Rabani,Y.     On the Space Complexit.. Theoret.Comput. 92 
95:231-244
</UI>
<AU>Rabani Y;
    Galil Z
</AU>
<TI>On the Space Complexity of Some Algorithms for Sequence Comparison
</TI>
<SU>Pairwise alignment;
    Complexity;
    IL;
    Sequence comparison;
    Gap;
    Algorithm
</SU>
<AB>"Recent algorithms for computing the modified edit distance given convex
or concave gap cost functions are shown to require Ω(n^2) space for certain
input."
</AB>
<JT>Theoret Comput Sci</JT>
<PY>1992</PY>
<VO>95</VO>
<PP>231-244</PP>
</SEQ>

<SEQ>
<UI>0782   Raiha,L.      Approximate Sequence C.. Pattern Recogni 90 
12(1/2):159-16
</UI>
<AU>Raiha L
</AU>
<TI>Approximate Sequence Comparison: A Study with Histograms
</TI>
<SU>Pairwise alignment;
    FI;
    Sequence comparison
</SU>
<AB>"We have succeeded in generalizing the algorithm [Ukkonen (1985)] for
non-negative cost functions. ... The alphabet of the sequences may be infinite. ...
The cost function to weigh the editing operations must have at least two of the
metric properties: non-negative values and the triangle inequality. ... The
algorithm needs linear space."
</AB>
<JT>Pattern Recognition </JT>
<PY>1990</PY>
<VO>12</VO>
<NO>1/2</NO>
<PP>159-169</PP>
</SEQ>

<SEQ>
<UI>0783   Rechid,R.     A New Interactive Prot.. Comput.Appl.Bio 89 
5(2):107-113
</UI>
<AU>Rechid R;
    Vingron M;
    Argos P
</AU>
<TI>A New Interactive Protein Sequence Alignment Program and Comparison of its
Results with Widely Used Algorithms
</TI>
<SU>Pairwise alignment;
    DE;
    Sequence comparison;
    Display;
    Program;
    Protein;
    Sequence alignment;
    Algorithm
</SU>
<AB>"A computer program that allows interactive sequence comparison is
described. It graphically displays a search matrix using residue physicochemical
characteristics and multilength segmental comparisons."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1989</PY>
<VO>5</VO>
<NO>2</NO>
<PP>107-113</PP>
</SEQ>

<SEQ>
<UI>0784   Regnier,M.    Knuth-Morris-Pratt Alg.. Lecture Notes i 89 
379:431-444
</UI>
<AU>Regnier M
</AU>
<TI>Knuth-Morris-Pratt Algorithm: An Analysis
</TI>
<SU>String match;
    Knuth-Morris-Pratt;
    FR;
    Algorithm
</SU>
<AB>In Kreczmar, A., Mirkowska, G. (Eds.), Mathematical Foundations of
Computer Science 1989, MFCS '89, Porabka-Kozubnik, Poland, 28 August - 1
September 1989. "This paper deals with an average analysis of the
Knuth-Morris-Pratt algorithm. ... An algebraic scheme is used, based on combinatorics on
words and generating functions."
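Illustration (not from the paper): Regnier studies the average cost of Knuth-Morris-Pratt. The sketch below shows the textbook form of the algorithm being analyzed: a failure (border) table lets the search fall back inside the pattern instead of moving the text pointer backwards. Function names are hypothetical.

```python
def failure_function(pattern: str) -> list[int]:
    """fail[i] = length of the longest proper border of pattern[:i + 1]."""
    fail = [0] * len(pattern)
    k = 0  # length of the border currently being extended
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]  # fall back to the next shorter border
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text: str, pattern: str) -> list[int]:
    """Start index of every (possibly overlapping) occurrence of a non-empty pattern."""
    fail = failure_function(pattern)
    hits = []
    k = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = fail[k - 1]  # keep going to find overlapping matches
    return hits


print(kmp_search("ABABABCABAB", "ABAB"))  # [0, 2, 7]
```

Each text character is read once and the fall-backs are paid for by earlier advances, so the worst case is linear; the paper's question is the expected number of comparisons.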
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>1989</PY>
<VO>379</VO>
<PP>431-444</PP>
</SEQ>

<SEQ>
<UI>0785   Reich,J.G.    On the Statistical Ass.. Nucleic Acids R 84 
12(13):5529-55
</UI>
<AU>Reich JG;
    Drabsch H;
    Daumler A
</AU>
<TI>On the Statistical Assessment of Similarities in DNA Sequences
</TI>
<SU>Pairwise alignment;
    Significance;
    DE;
    Statistical;
    Gap;
    Similarity;
    DNA
</SU>
<AB>"The statistical behavior of the similarity score for unrelated DNA
sequences calculated as letter-by-letter comparison or from various forms of
optimal alignment was studied. ... This makes it possible to adopt a simple
criterion for the rejection of fortuitous similarity. It is based on the mean
and standard deviation of chance scores whose expected values, depending on
chain length, gap penalty and probability of letter coincidence, may be
calculated ...."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1984</PY>
<VO>12</VO>
<NO>13</NO>
<PP>5529-5543</PP>
</SEQ>

<SEQ>
<UI>0786   Reich,J.G.    A Simple Statistical S.. Comput.Appl.Bio 87 
3(1):25-30
</UI>
<AU>Reich JG;
    Meiske W
</AU>
<TI>A Simple Statistical Significance Test of Window Scores in Large Dot
Matrices Obtained from Protein or Nucleic Acid Sequences
</TI>
<SU>Pairwise comparison;
    Dot;
    DE;
    Statistical;
    Significance;
    Protein;
    Nucleic acid;
    Score
</SU>
<AB>"A test of the statistical significance of dot constellations as detected
by window search in large dot matrices is described. The procedure takes the
correlation between overlapping windows on the diagonals of a dot matrix into
account. It is based on a confidence limit of the exact distribution of dot
scores."
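Illustration (not from the paper): the window scores that such a test is applied to can be computed by sliding a window of length w along each diagonal of the dot matrix and updating the count incrementally. This sketch assumes identity dots (a dot wherever two residues are equal). The exact-distribution confidence limit is not reproduced, and the function name and any cutoff are hypothetical.

```python
def window_scores(a: str, b: str, w: int) -> dict[tuple[int, int], int]:
    """Identity count for every length-w diagonal window of the dot matrix of a and b.

    Key (i, j) is the window aligning a[i:i + w] with b[j:j + w]. Consecutive
    windows on one diagonal share w - 1 dots, so each score is updated from the
    previous one (which is also why neighbouring scores are correlated).
    """
    if 1 > w or w > min(len(a), len(b)):
        return {}
    scores = {}
    for d in range(-(len(b) - w), len(a) - w + 1):  # diagonal d = i - j
        i, j = max(d, 0), max(-d, 0)
        s = sum(a[i + k] == b[j + k] for k in range(w))
        scores[(i, j)] = s
        while len(a) > i + w and len(b) > j + w:
            # Slide one step: add the dot entering the window, drop the one leaving.
            s += (a[i + w] == b[j + w]) - (a[i] == b[j])
            i, j = i + 1, j + 1
            scores[(i, j)] = s
    return scores


full = [k for k, v in window_scores("GATTACAGATT", "TTACAG", 3).items() if v == 3]
print(sorted(full))
```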
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1987</PY>
<VO>3</VO>
<NO>1</NO>
<PP>25-30</PP>
</SEQ>

<SEQ>
<UI>0787   Reichert,T.A. An Application of Info.. J.Theor.Biol.   73 
42(2):245-261
</UI>
<AU>Reichert TA;
    Cohen DN;
    Wong AKC
</AU>
<TI>An Application of Information Theory to Genetic Mutations and the Matching
of Polypeptide Sequences
</TI>
<SU>Sequence proximity;
    Information theory;
    USA;
    Genetic
</SU>
<AB>"An information-based methodology for determining the quality of an
alignment of two code sequences is presented. ... In application, one needs to
obtain estimates of the distribution of (a) the spacing between mutations, (b)
the frequency of the four mutation operations, and (c) the inserted character
frequencies and deletion lengths."
</AB>
<JT>J Theor Biol</JT>
<PY>1973</PY>
<VO>42</VO>
<NO>2</NO>
<PP>245-261</PP>
</SEQ>

<SEQ>
<UI>0788   Reizer,J.     Possible Problems with.. Trends Biochem. 92 
17(2):60-60
</UI>
<AU>Reizer J;
    Saier MH;
    Reizer A
</AU>
<TI>Possible Problems with the Protein Sequence Comparison Program FASTA
</TI>
<SU>Database search;
    FASTA;
    USA;
    Sequence comparison;
    Program;
    Protein
</SU>
<AB>"A prerequisite to [computer aided comparisons of protein sequences] is an
effective screening routine that reliably searches protein libraries and selects
sequences for evaluation. The widely used FASTA program provides a rapid library
search and sequence comparison algorithm for this purpose. Some problems that we
have encountered while using FASTA should be brought to the attention of other
users."
</AB>
<JT>Trends Biochem Sci</JT>
<PY>1992</PY>
<VO>17</VO>
<NO>2</NO>
<PP>60-60</PP>
</SEQ>

<SEQ>
<UI>0789   Rinsma,I.     Distribution of the Nu.. Bull.Math.Biol. 90 
52(3):349-358
</UI>
<AU>Rinsma I;
    Hendy M;
    Penny D
</AU>
<TI>Distribution of the Number of Matches between Nucleotide Sequences
</TI>
<SU>Pairwise alignment;
    Significance;
    NZ;
    Distribution;
    Nucleotide
</SU>
<AB>"A method is given for calculating the probability of observing m matches
from two overlapping random sequences. [It is a] useful first step in evaluating
the reliability of evolutionary trees .... It could also be used to determine
how much better an optimal alignment is than expected by chance."
</AB>
<JT>Bull Math Biol</JT>
<PY>1990</PY>
<VO>52</VO>
<NO>3</NO>
<PP>349-358</PP>
</SEQ>

<SEQ>
<UI>0790   Risler,J.L.   Amino Acid Substitutio.. J.Mol.Biol.     88 
204:1019-1029
</UI>
<AU>Risler JL;
    Delorme MO;
    Delacroix H;
    Henaut A
</AU>
<TI>Amino Acid Substitutions in Structurally Related Proteins. A Pattern
Recognition Approach. Determination of a New and Efficient Scoring Matrix
</TI>
<SU>Sequence proximity;
    Substitution;
    FR;
    Pattern recognition;
    Scoring;
    Amino acid;
    Protein;
    Recognition;
    Matrix
</SU>
<AB>"Amino acid substitutions in evolutionarily related proteins have been
studied from a structural point of view. ... The matrix of distances between
amino acids, or scoring matrix, determined from this study is different from any
other published matrix. ... [It] seems to be very efficient for aligning
distantly related proteins."
</AB>
<JT>J Mol Biol</JT>
<PY>1988</PY>
<VO>204</VO>
<PP>1019-1029</PP>
</SEQ>

<SEQ>
<UI>0791   Rivest,R.L.   On the Worst-case Beha.. SIAM J.Comput.  77 
6(4):669-674
</UI>
<AU>Rivest RL
</AU>
<TI>On the Worst-case Behaviour of String-Searching Algorithms
</TI>
<SU>String match;
    Complexity;
    USA;
    Pattern match;
    Algorithm
</SU>
<AB>"Any algorithm for finding a pattern of length k in a string of length n
must examine at least n - k + 1 of the characters of the string in the worst
case. ... We prove that this is the best possible result. Therefore there do not
exist pattern matching algorithms whose worst-case behavior is 'sublinear' in n
(that is, linear with constant less than one), in contrast with the situation
for average behavior ...."
</AB>
<JT>SIAM J Comput</JT>
<PY>1977</PY>
<VO>6</VO>
<NO>4</NO>
<PP>669-674</PP>
</SEQ>

<SEQ>
<UI>0792   Roberts,L.    New Chip may Speed Gen.. Science         89 244(12 
May):65
</UI>
<AU>Roberts L
</AU>
<TI>New Chip may Speed Genome Analysis
</TI>
<SU>Match complex patterns;
    USA;
    Genome;
    Pattern match;
    Parallel
</SU>
<AB>"What they have come up with ... is 'a hardware solution to what is
normally handled by investigators as a software problem' ... - a relatively
inexpensive parallel processing system that can scan up to 10 million characters
a second. ... What accounts for the speed of this system is that the
instructions for pattern matching are hardwired into the processors."
</AB>
<JT>Science </JT>
<PY>1989</PY>
<VO>244</VO>
<NO>12 May</NO>
<PP>655-656</PP>
</SEQ>

<SEQ>
<UI>0793   Robson,B.     Natural Sequence Code .. Comput.Appl.Bio 92 
8(3):283-289
</UI>
<AU>Robson B;
    Greaney PJ
</AU>
<TI>Natural Sequence Code Representations for Compression and Rapid Searching
of Human-genome Style Databases
</TI>
<SU>Database search;
    UK;
    Compression;
    Representation
</SU>
<AB>"Numeric descriptions ('bio-informatic descriptions') of amino acid
residues have been developed which will be of value whenever the quality and
quantity of information in very large ... gene and protein sequences is to be
compared or manipulated. ... Preliminary studies on both a supercomputer and
smaller machines suggest a 'worst-case' speeding of [approximately] 4.5-fold."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1992</PY>
<VO>8</VO>
<NO>3</NO>
<PP>283-289</PP>
</SEQ>

<SEQ>
<UI>0794   Rohde,K.      A Fast, Sensitive Patt.. Comput.Appl.Bio 93 
9(2):183-189
</UI>
<AU>Rohde K;
    Bork P
</AU>
<TI>A Fast, Sensitive Pattern-matching Approach for Protein Sequences
</TI>
<SU>Match a pattern matrix;
    DE;
    Dynamic programming;
    Protein
</SU>
<AB>"We present a fast, sensitive pattern-matching algorithm that describes a
pattern by its physico-chemical properties rather than by occurrence of amino
acids, using a fast, dynamic programming algorithm. ... This method leads to a
better description of the pattern, as it is not simply a group of similar
sequence positions but rather a structure with special properties and
functions."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1993</PY>
<VO>9</VO>
<NO>2</NO>
<PP>183-189</PP>
</SEQ>

<SEQ>
<UI>0795   Roytberg,M.A. A Search for Common Pa.. Comput.Appl.Bio 92 
8(1):57-64
</UI>
<AU>Roytberg MA
</AU>
<TI>A Search for Common Patterns in Many Sequences
</TI>
<SU>Multiple comparison;
    Common feature;
    RU
</SU>
<AB>"A new approach to search for common patterns in many sequences is
presented. The idea is that one sequence from the set of sequences to be
compared is considered as a 'basic' one and all its similarities with other
sequences are found. Multiple similarities are then reconstructed using these
data."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1992</PY>
<VO>8</VO>
<NO>1</NO>
<PP>57-64</PP>
</SEQ>

<SEQ>
<UI>0796   Russell,R.B.  Multiple Protein Seque.. Proteins Struct 92 
14:309-323
</UI>
<AU>Russell RB;
    Barton GJ
</AU>
<TI>Multiple Protein Sequence Alignment from Tertiary Structure Comparison:
Assignment of Global and Residue Confidence Levels
</TI>
<SU>Multiple alignment;
    Structure;
    UK;
    Sequence alignment;
    Confidence;
    Protein;
    Residue
</SU>
<AB>"An algorithm is presented for the accurate and rapid generation of
multiple protein sequence alignments from tertiary structure comparisons. ... In
order to reduce the need for visual verification, two similarity indices are
introduced to determine the quality of each generated structural alignment."
</AB>
<JT>Proteins Struct Funct Genet</JT>
<PY>1992</PY>
<VO>14</VO>
<PP>309-323</PP>
</SEQ>

<SEQ>
<UI>0797   Rytter,W.     A Correct Preprocessin.. SIAM J.Comput.  80 
9(3):509-512
</UI>
<AU>Rytter W
</AU>
<TI>A Correct Preprocessing Algorithm for Boyer-Moore String-Searching
</TI>
<SU>String match;
    Boyer-Moore;
    MEX;
    Correction;
    Knuth-Morris-Pratt;
    Algorithm
</SU>
<AB>"We present the correction to Knuth's algorithm [Knuth, Morris, Pratt
(1977)] for computing the table of pattern shifts later used in the Boyer-Moore
algorithm [(1977)] for pattern matching."
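Illustration (not Rytter's procedure): a slow but easily checked way to get the strong good-suffix shifts is to compute them straight from their definition, which is handy as a test oracle for a linear-time preprocessing routine. The indexing convention below is an assumption and may be offset from the table in the paper. The search combines it with the bad-character rule, taking the larger of the two safe shifts. Names are hypothetical.

```python
def good_suffix_table(x: str) -> list[int]:
    """Strong good-suffix shifts for a non-empty pattern, computed from the definition.

    gs[j + 1] is the shift when x[j + 1:] has matched the text and x[j] has
    just mismatched; gs[0] (j = -1) is the shift after a complete match.
    Brute force, O(m^3) in the worst case: a test oracle, not a fast method.
    """
    m = len(x)

    def safe(j: int, s: int) -> bool:
        # The matched suffix must still agree with the pattern after shifting
        # by s; positions pushed off the left end are unconstrained.
        for k in range(j + 1, m):
            if k - s >= 0 and x[k - s] != x[k]:
                return False
        # Strong rule: the pattern character moved under the mismatch must
        # differ from the character that just failed there.
        if j - s >= 0 and x[j - s] == x[j]:
            return False
        return True

    return [next(s for s in range(1, m + 1) if safe(j, s)) for j in range(-1, m)]


def bm_search(text: str, x: str) -> list[int]:
    """Boyer-Moore search using the larger of the bad-character and good-suffix shifts."""
    n, m = len(text), len(x)
    gs = good_suffix_table(x)
    last = {c: i for i, c in enumerate(x)}  # rightmost position of each character
    hits, i = [], 0
    while n - m >= i:
        j = m - 1
        while j >= 0 and x[j] == text[i + j]:  # compare right to left
            j -= 1
        if j == -1:
            hits.append(i)
            i += gs[0]
        else:
            bad_char = j - last.get(text[i + j], -1)
            i += max(gs[j + 1], bad_char)
    return hits


print(good_suffix_table("ABAB"), bm_search("ABABABCABAB", "ABAB"))
```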
</AB>
<JT>SIAM J Comput</JT>
<PY>1980</PY>
<VO>9</VO>
<NO>3</NO>
<PP>509-512</PP>
</SEQ>

<SEQ>
<UI>0798   Earley,J.     An Efficient Context-f.. Comm.ACM        70 
13(2):94-102
</UI>
<AU>Earley J
</AU>
<TI>An Efficient Context-free Parsing Algorithm
</TI>
<SU>Parsing;
    Language;
    USA;
    Algorithm
</SU>
<AB>"A parsing algorithm which seems to be the most efficient general
context-free algorithm known is described. It is similar to both Knuth's LR(k)
algorithm and the familiar top-down algorithm. It has a time bound proportional
to n^3 (where n is the length of the string being parsed) in general; it has an n^2
bound for unambiguous grammars; and it runs in linear time on a large class of
grammars, which seems to include most practical context-free programming
language grammars."
</AB>
<JT>Comm ACM </JT>
<PY>1970</PY>
<VO>13</VO>
<NO>2</NO>
<PP>94-102</PP>
</SEQ>

<SEQ>
<UI>0799   Younger,D.H.  Recognition and Parsin.. Inform.Control  67 
10(2):189-208
</UI>
<AU>Younger DH
</AU>
<TI>Recognition and Parsing of Context-Free Languages in Time n^3
</TI>
<SU>Sequence recognition;
    Language;
    Automata;
    USA;
    Parsing;
    Recognition
</SU>
<AB>"A recognition algorithm is exhibited whereby an arbitrary string over a
given vocabulary can be tested for containment in a given context-free language.
A special merit of this algorithm is that it is completed in a number of steps
proportional to the cube of the number of symbols in the tested string. ... The
recognition algorithm is then simulated on a Turing Machine. It is shown that
this simulation likewise requires a number of steps proportional to only the
cube of the test string length."
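Illustration (not from the paper): the abstract does not spell out the algorithm, so this sketch uses the tabular recognizer usually credited to Cocke, Younger and Kasami (CYK). It assumes the grammar is already in Chomsky normal form. The three nested loops over span length, span start and split point give the cubic step count. The Turing-machine simulation is not reproduced, and names are hypothetical.

```python
from itertools import product


def cyk_recognize(word, unary, binary, start="S"):
    """Tabular cubic-time recognizer for a grammar in Chomsky normal form.

    unary:  terminal -> set of nonterminals A with a rule A -> terminal
    binary: (B, C)   -> set of nonterminals A with a rule A -> B C
    """
    n = len(word)
    if n == 0:
        return False  # this sketch assumes no rule deriving the empty string
    # table[i][l - 1] = nonterminals that derive the span word[i:i + l]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(word):
        table[i][0] = set(unary.get(ch, ()))
    for length in range(2, n + 1):           # span length
        for i in range(n - length + 1):      # span start
            cell = table[i][length - 1]
            for split in range(1, length):   # length of the left part
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for b, c in product(left, right):
                    cell |= binary.get((b, c), set())
    return start in table[0][n - 1]


# Example grammar for a^k b^k (k >= 1): S -> A B | A T, T -> S B, A -> a, B -> b
unary = {"a": {"A"}, "b": {"B"}}
binary = {("A", "B"): {"S"}, ("A", "T"): {"S"}, ("S", "B"): {"T"}}
print(cyk_recognize("aabb", unary, binary), cyk_recognize("abab", unary, binary))
```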
</AB>
<JT>Inform Control (Orlando) </JT>
<PY>1967</PY>
<VO>10</VO>
<NO>2</NO>
<PP>189-208</PP>
</SEQ>

<SEQ>
<UI>0800   Claverie,J.M. A New Protein Sequence.. Nature (Lond.)  85 318(7 
Nov.):19
</UI>
<AU>Claverie JM;
    Sauvaget I
</AU>
<TI>A New Protein Sequence Data Bank
</TI>
<SU>Sequence database;
    FR;
    Sequence analysis;
    Protein
</SU>
<AB>"A protein sequence data bank, called PGtrans, is available from our
laboratory. This data bank is generated by automatic computer translation of the
well-known nucleotide sequence library GenBank .... The main purpose of PGtrans
is to offer a direct access to all amino-acid sequences coded among GenBank
nucleotide sequences and thus to be an efficient tool for protein homology
searches. Its format is compatible with the rest of our sequence analysis
software and will be consistent with the recommendations of the CODATA task
group on protein sequence data banks."
</AB>
<JT>Nature (Lond ) </JT>
<PY>1985</PY>
<VO>318</VO>
<NO>7 Nov.</NO>
<PP>19-19</PP>
</SEQ>

<SEQ>
<UI>0801   Chin,F.       Performance Analysis o.. Algorithmica    94 
12(4/5):293-31
</UI>
<AU>Chin F;
    Poon CK
</AU>
<TI>Performance Analysis of Some Simple Heuristics for Computing Longest
Common Subsequences
</TI>
<SU>Longest common;
    Subsequence;
    Performance;
    Heuristic;
    HK
</SU>
<AB>"Although the Longest Common Subsequence (LCS) Problem has been studied by
many researchers for years, heuristic methods have not been investigated before.
In this paper we present a simple heuristic which guarantees to return a common
subsequence of length at least 1/s that of the longest where s is the number of
different symbols in the input strings. Furthermore, we generalize the idea to
several classes of heuristic algorithms. Surprisingly, we find that no other
heuristic in these classes outperforms this simple algorithm."
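Illustration: the abstract does not state which heuristic is meant. One simple heuristic with exactly this 1/s guarantee, assumed here and possibly different from the paper's, outputs the single symbol c that maximizes k = min(count of c in each string), repeated k times. Any LCS uses each symbol at most that many times, so its length is at most s times k. The exact quadratic dynamic programme is included for comparison. Names are hypothetical.

```python
from collections import Counter


def single_symbol_lcs(a: str, b: str) -> str:
    """Heuristic common subsequence: the symbol c maximizing k = min(#c in a, #c in b),
    repeated k times. Any LCS uses each symbol at most that min, so the result
    is at least 1/s of the optimum for an s-symbol alphabet."""
    ca, cb = Counter(a), Counter(b)
    best, k = "", 0
    for c in ca:
        if min(ca[c], cb[c]) > k:
            best, k = c, min(ca[c], cb[c])
    return best * k


def lcs_length(a: str, b: str) -> int:
    """Exact LCS length by the usual quadratic dynamic programme (two rows)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b):
            cur.append(prev[j] + 1 if x == y else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]


a, b = "ABCBDAB", "BDCABA"
print(single_symbol_lcs(a, b), lcs_length(a, b))  # AA 4
```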
</AB>
<JT>Algorithmica </JT>
<PY>1994</PY>
<VO>12</VO>
<NO>4/5</NO>
<PP>293-311</PP>
</SEQ>

<SEQ>
<UI>0802   Gribskov,M.   The Codon Preference P.. Nucleic Acids R 84 
12(1):539-549
</UI>
<AU>Gribskov M;
    Devereux J;
    Burgess RR
</AU>
<TI>The Codon Preference Plot: Graphic Analysis of Protein Coding Sequences
and Prediction of Gene Expression
</TI>
<SU>Sequence analysis;
    Codon;
    Expression;
    USA;
    Coding;
    Frame;
    Protein;
    Gene;
    Prediction;
    Graphic
</SU>
<AB>"The codon preference plot is useful for locating genes in sequenced DNA,
predicting the relative level of their expression and for detecting DNA
sequencing errors resulting in the insertion or deletion of bases within a
coding sequence. The three possible reading frames are displayed in parallel
along with the open reading frames and plots of the location of rare codons in
each reading frame."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1984</PY>
<VO>12</VO>
<NO>1</NO>
<PP>539-549</PP>
</SEQ>

<SEQ>
<UI>0803   Orcutt,B.C.   Searching the Protein .. Bull.Math.Biol. 84 
46(4):545-552
</UI>
<AU>Orcutt BC;
    Barker WC
</AU>
<TI>Searching the Protein Sequence Database
</TI>
<SU>Database search;
    Review;
    USA;
    Dynamic programming;
    Approximate match;
    Protein;
    Sequence database
</SU>
<AB>"As the volume of protein sequence data grows, rapid methods for searching
the protein sequence database become of primary importance. Rigorous comparison
of sequences is obtained with the well-known dynamic programming algorithms.
However these algorithms are not rapid enough to use for routinely searching the
entire database. In this paper we discuss some methods that can be used for
rapid searches." Topics covered: the protein identification problem; search for
identical matching segments; search for approximate matching segments
(substitutions); search for approximate matching segments (all mutations).
</AB>
<JT>Bull Math Biol</JT>
<PY>1984</PY>
<VO>46</VO>
<NO>4</NO>
<PP>545-552</PP>
</SEQ>

<SEQ>
<UI>0804   Tavare,S.     Some Statistical Aspec.. Mathematical .. 89CRC Press
</UI>
<AU>Tavare S;
    Giddings BW
</AU>
<TI>Some Statistical Aspects of the Primary Structure of Nucleotide Sequences
</TI>
<ED>Waterman MS
</ED>
<BK>Mathematical Methods for DNA Sequences
</BK>
<SU>Sequence analysis;
    Markov;
    Fourier;
    Regularities;
    USA;
    Statistical;
    Structure;
    Nucleotide
</SU>
<AB>"The second section describes some Markov chain methods for assessing the
dependence structure that exists in a sequence of nucleotides. Particular
emphasis is placed on methods for estimating the order of the Markov dependence.
... The third part of our paper describes some methods for searching for
repetitive or periodic patterns in a sequence. We base our analysis on the
discrete Walsh transform and compare it to the more familiar Fourier methods."
</AB>
<PU>CRC Press </PU>
<PL>Boca Raton, FL </PL>
<PY>1989</PY>
<PP>117-132</PP>
</SEQ>

<SEQ>
<UI>0805   Sackin,M.J.   Crossassociation: A Me.. Biochem.Genet.  71 
5:287-313
</UI>
<AU>Sackin MJ
</AU>
<TI>Crossassociation: A Method of Comparing Protein Sequences
</TI>
<SU>Pairwise comparison;
    Significance;
    UK;
    Statistical;
    Protein
</SU>
<AB>"The method is to 'slide' the sequences past each other one step at a time
and to count the number of amino acids that match. At each overlap position, the
program prints the percentage match and statistical significance measures of the
matching. ... The method includes computation of three overall similarity
measures between sequences which should have use in both evolutionary and
taxonomic studies."
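Illustration (not the paper's program): the sliding count described above can be sketched as follows. For every relative offset the sketch reports the overlap length, the number of identical residues and the percentage match. The statistical significance measures and the three overall similarity measures are not reproduced, and the function name is hypothetical.

```python
def cross_association(a: str, b: str) -> list[tuple[int, int, int, float]]:
    """Slide b past a one step at a time and count identical residues.

    Returns (offset, overlap, matches, percent) for every offset d at which the
    sequences overlap, where offset d pairs a[i] with b[i - d].
    Both sequences must be non-empty.
    """
    rows = []
    for d in range(-(len(b) - 1), len(a)):
        idx = range(max(0, d), min(len(a), len(b) + d))  # overlapping positions of a
        matches = sum(a[i] == b[i - d] for i in idx)
        rows.append((d, len(idx), matches, 100.0 * matches / len(idx)))
    return rows


best = max(cross_association("MKVLAG", "KVLG"), key=lambda row: row[2])
print(best)  # (1, 4, 3, 75.0)
```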
</AB>
<JT>Biochem Genet</JT>
<PY>1971</PY>
<VO>5</VO>
<PP>287-313</PP>
</SEQ>

<SEQ>
<UI>0806   Salemme,A.    A Convenient Method fo.. Nucleic Acids R 84 
12:257-262
</UI>
<AU>Salemme A;
    Furano AV
</AU>
<TI>A Convenient Method for Locating Sets of Related Short Sequences in DNA
Sequences of any Length
</TI>
<SU>Match with k mismatches;
    USA;
    DNA
</SU>
<AB>"With a single execution of each program, the user can search one or more
sequences of any length for one or more short sequences. The user specifies
either the number of mismatches (from 0 through N) or the positions of the
mismatches allowed in each short sequence."
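Illustration (not the authors' programs): a brute-force sketch of the two search modes described above, assuming substitutions only (no insertions or deletions). The caller gives either a maximum mismatch count or the set of probe positions at which mismatches are tolerated. Parameter and function names are hypothetical.

```python
from typing import Optional


def find_with_mismatches(seq: str, probe: str, max_mismatches: int = 0,
                         allowed_positions: Optional[set[int]] = None) -> list[tuple[int, int]]:
    """Return (start, mismatches) for every placement of probe in seq that passes.

    If allowed_positions is given, mismatches are accepted only at those
    (0-based) probe positions; otherwise up to max_mismatches are accepted
    anywhere.
    """
    hits = []
    for i in range(len(seq) - len(probe) + 1):
        window = seq[i:i + len(probe)]
        diffs = [k for k, (x, y) in enumerate(zip(window, probe)) if x != y]
        if allowed_positions is not None:
            ok = set(diffs).issubset(allowed_positions)
        else:
            ok = max_mismatches >= len(diffs)
        if ok:
            hits.append((i, len(diffs)))
    return hits


print(find_with_mismatches("GAATTCGATTC", "GATTC", max_mismatches=1))
print(find_with_mismatches("GAATTCGATTC", "GATTC", allowed_positions={2}))
```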
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1984</PY>
<VO>12</VO>
<PP>257-262</PP>
</SEQ>

<SEQ>
<UI>0807   Sander,C.     Database of Homology-d.. Proteins Struct 91 9:56-68
</UI>
<AU>Sander C;
    Schneider R
</AU>
<TI>Database of Homology-derived Protein Structures and the Structural Meaning
of Sequence Alignment
</TI>
<SU>Pairwise alignment;
    Significance;
    Sequence alignment;
    Structure;
    Protein;
    Sequence weight;
    DE
</SU>
<AB>"The threshold of sequence similarity sufficient for structural homology
depends strongly on the length of the alignment. Here, we first quantify the
relation between sequence similarity, structure similarity, and alignment length
by an exhaustive survey of alignments between proteins of known structure and
report a homology threshold curve as a function of alignment length."
</AB>
<JT>Proteins Struct Funct Genet</JT>
<PY>1991</PY>
<VO>9</VO>
<PP>56-68</PP>
</SEQ>

<SEQ>
<UI>0808   Sankoff,D.    Matching Sequences und.. Proc.Nat.Acad.S 72 
69(1):4-6
</UI>
<AU>Sankoff D
</AU>
<TI>Matching Sequences under Deletion/Insertion Constraints
</TI>
<SU>Pairwise alignment;
    Dynamic programming;
    CA;
    Longest common
</SU>
<AB>The algorithm of Needleman and Wunsch (1970), for finding longest common
subsequences without constraints, "is improved from the viewpoint of
computational economy. An economical algorithm is then elaborated for finding
subsequences satisfying deletion/insertion constraints."
</AB>
<JT>Proc Nat Acad Sci USA </JT>
<PY>1972</PY>
<VO>69</VO>
<NO>1</NO>
<PP>4-6</PP>
</SEQ>

<SEQ>
<UI>0809   Sankoff,D.    Minimal Mutation Trees.. SIAM J.Appl.Mat 75 
28(1):35-42
</UI>
<AU>Sankoff D
</AU>
<TI>Minimal Mutation Trees of Sequences
</TI>
<SU>Multiple alignment;
    CA;
    Significance
</SU>
<AB>"Given a finite tree, some of whose vertices are identified with given
finite sequences, we show how to construct sequences for all the remaining
vertices simultaneously, so as to minimize the total edge-length of the tree.
Edge-length is calculated by a metric whose biological significance is the
mutational distance between two sequences." See Sankoff and Cedergren (1983)
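Illustration: the paper treats whole sequences, including insertions and deletions. The sketch below covers only a reduced case, assuming the sequences are already aligned and scoring one site at a time with substitution costs alone. A bottom-up pass computes, for each node and each possible character, the cheapest cost of the subtree below. A top-down pass then picks ancestral characters that achieve it. All names are hypothetical.

```python
import math


def sankoff_site(tree, root, leaf_states, alphabet, cost):
    """Minimum total substitution cost of one aligned site on a rooted tree.

    tree: node -> list of children (leaves have no entry)
    leaf_states: leaf -> observed character
    cost(x, y): cost of changing x into y along one edge
    Returns (minimum cost, node -> reconstructed character).
    """
    table = {}  # node -> {state: cheapest cost of the subtree below, given state}

    def fill(node):  # bottom-up pass
        children = tree.get(node, [])
        if not children:
            table[node] = {s: 0.0 if s == leaf_states[node] else math.inf for s in alphabet}
            return
        for child in children:
            fill(child)
        table[node] = {
            s: sum(min(cost(s, t) + table[c][t] for t in alphabet) for c in children)
            for s in alphabet
        }

    assignment = {}

    def assign(node, state):  # top-down pass: pick a cheapest state for each child
        assignment[node] = state
        for c in tree.get(node, []):
            assign(c, min(alphabet, key=lambda t: cost(state, t) + table[c][t]))

    fill(root)
    best = min(alphabet, key=lambda s: table[root][s])
    assign(root, best)
    return table[root][best], assignment


def unit(x, y):
    return 0 if x == y else 1


tree = {"root": ["u", "v"], "u": ["L1", "L2"], "v": ["L3", "L4"]}
leaves = {"L1": "A", "L2": "A", "L3": "G", "L4": "C"}
print(sankoff_site(tree, "root", leaves, "ACGT", unit))
```

Running the same two passes column by column gives a tree length for aligned sequences; handling gaps as in the paper needs the full sequence-level recursion.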
</AB>
<JT>SIAM J Appl Math</JT>
<PY>1975</PY>
<VO>28</VO>
<NO>1</NO>
<PP>35-42</PP>
</SEQ>

<SEQ>
<UI>0810   Sankoff,D.    Simultaneous Solution .. SIAM J.Appl.Mat 85 
45(5):810-825
</UI>
<AU>Sankoff D
</AU>
<TI>Simultaneous Solution of the RNA Folding, Alignment and Protosequence
Problems
</TI>
<SU>Multiple alignment;
    Dynamic programming;
    CA;
    RNA;
    Folding
</SU>
<AB>"The alignment of finite sequences, the inference of ribonucleic acid
secondary structures (folding), and the reconstruction of ancestral sequences 
on
a phylogenetic tree, are three problems which have dynamic programming
solutions, which we formulate in a common mathematical framework. Combining the
objective functions for alignment ... and folding (free energy), we present an
algorithm which solves all three problems simultaneously ...."
</AB>
<JT>SIAM J Appl Math</JT>
<PY>1985</PY>
<VO>45</VO>
<NO>5</NO>
<PP>810-825</PP>
</SEQ>

<SEQ>
<UI>0811   Sankoff,D.    A Test for Nucleotide .. J.Mol.Biol.     73 
77:159-164
</UI>
<AU>Sankoff D;
    Cedergren RJ
</AU>
<TI>A Test for Nucleotide Sequence Homology
</TI>
<SU>Pairwise alignment;
    Significance;
    Monte Carlo;
    CA;
    Homology;
    Nucleotide
</SU>
<AB>With respect to alignments of two given sequences, "a test is developed
which computes the significance of each deletion/insertion hypothesized, based
on Monte-Carlo sampling of random sequences with the same base composition as
the experimental sequences being tested."
</AB>
<JT>J Mol Biol</JT>
<PY>1973</PY>
<VO>77</VO>
<PP>159-164</PP>
</SEQ>

<SEQ>
<UI>0812   Sankoff,D.    Simultaneous Compariso.. Time Warps, S.. 
83Addison-Wesley
</UI>
<AU>Sankoff D;
    Cedergren RJ
</AU>
<TI>Simultaneous Comparison of Three or More Sequences Related by a Tree
</TI>
<ED>Sankoff D;
    Kruskal JB
</ED>
<BK>Time Warps, String Edits, and Macromolecules: The Theory and Practice of
Sequence Comparison
</BK>
<SU>Multiple alignment;
    Dynamic programming;
    Character weight;
    CA
</SU>
<AB>The algorithm minimizes the length of the given evolutionary tree of the
sequences. See Sankoff (1975)
</AB>
<PU>Addison-Wesley </PU>
<PL>Reading, MA </PL>
<PY>1983</PY>
<PP>253-263</PP>
</SEQ>

<SEQ>
<UI>0813   Sankoff,D.    Frequency of Insertion.. J.Mol.Evol.     76 
7:133-149
</UI>
<AU>Sankoff D;
    Cedergren RJ;
    Lapalme G
</AU>
<TI>Frequency of Insertion-Deletion, Transversion, and Transition in the
Evolution of 5S Ribosomal RNA
</TI>
<SU>Multiple alignment;
    Dynamic programming;
    CA;
    Evolution;
    RNA;
    Transition
</SU>
<AB>"We present a dynamic programming algorithm which finds the optimal
alignment for a set of N sequences simultaneously, where each sequence is
associated with one of the N tips of a given evolutionary tree. Concurrently,
protosequences are constructed corresponding to the ancestral nodes of the
tree."
</AB>
<JT>J Mol Evol</JT>
<PY>1976</PY>
<VO>7</VO>
<PP>133-149</PP>
</SEQ>

<SEQ>
<UI>0814                 Time Warps, String Edi..                 
83Addison-Wesley
</UI>
<TI>Time Warps, String Edits, and Macromolecules: The Theory and Practice of
Sequence Comparison
</TI>
<ED>Sankoff D;
    Kruskal JB
</ED>
<SU>Sequence analysis;
    Dynamic programming;
    CA;
    Sequence comparison;
    Complexity;
    Edit
</SU>
<AB>An overview of sequence comparison based on dynamic programming. It
includes major sections on: macromolecular sequences; time-warping, continuous
functions, and speech processing; algorithms for related problems; computational
complexity; random sequences.
</AB>
<PU>Addison-Wesley </PU>
<PL>Reading, MA </PL>
<PY>1983</PY>
<PP>xii+382</PP>
</SEQ>

<SEQ>
<UI>0815   Sankoff,D.    Gene Order Comparisons.. Proc.Nat.Acad.S 92 
89:6575-6579
</UI>
<AU>Sankoff D;
    Leduc G;
    Antoine N;
    Paquin B;
    Lang BF;
    Cedergren R
</AU>
<TI>Gene Order Comparisons for Phylogenetic Inference: Evolution of the
Mitochondrial Genome
</TI>
<SU>Genome;
    CA;
    Evolution;
    Gene;
    Phylogenetic
</SU>
<AB>"We describe the construction of a database of 16 mitochondrial gene
orders from fungi and other eukaryotes by using complete or nearly complete
genomic sequences; propose a measure of gene order rearrangement based on the
minimal set of chromosomal inversions, transpositions, insertions, and deletions
necessary to convert the order in one genome to that of the other; report on
algorithm design and the development of the DERANGE software for the calculation
of this measure ...."
</AB>
<JT>Proc Nat Acad Sci USA </JT>
<PY>1992</PY>
<VO>89</VO>
<PP>6575-6579</PP>
</SEQ>

<SEQ>
<UI>0816   Sankoff,D.    Evolution of 5S RNA an.. Nat.New Biol.   73 245(24 
Oct.):2
</UI>
<AU>Sankoff D;
    Morel C;
    Cedergren RJ
</AU>
<TI>Evolution of 5S RNA and the Nonrandomness of Base Replacement
</TI>
<SU>Multiple alignment;
    CA;
    Evolution;
    RNA
</SU>
<AB>"The main problem is to align the various sequences so that bases in
corresponding position in different sequences are fairly certain to reflect a
common term in the ancestral sequence. ... This problem has recently been solved
for the case where the evaluation is based on a minimal mutation criterion. ...
For the case of three known sequences and one unknown sequence in the
configuration it is quite easy to implement a computer program for the method."
</AB>
<JT>Nat New Biol</JT>
<PY>1973</PY>
<VO>245</VO>
<NO>24 Oct.</NO>
<PP>232-234</PP>
</SEQ>

<SEQ>
<UI>0817   Sankoff,D.    Shortcuts, Diversions,.. Discrete Math.  73 
4:287-293
</UI>
<AU>Sankoff D;
    Sellers PH
</AU>
<TI>Shortcuts, Diversions, and Maximal Chains in Partially Ordered Sets
</TI>
<SU>Pairwise alignment;
    CA;
    Restriction
</SU>
<AB>"An algorithm is described for finding the maximal weight chain between
two points in a locally finite partial order under the restriction that all but
k (or fewer) successive pairs in the chain belong to a given subset of the
partial order relation." Application to molecular genetics
</AB>
<JT>Discrete Math</JT>
<PY>1973</PY>
<VO>4</VO>
<PP>287-293</PP>
</SEQ>

<SEQ>
<UI>0818   Santibanez,M. A Multiple Alignment P.. Comput.Appl.Bio 87 
3(2):111-114
</UI>
<AU>Santibanez M;
    Rohde K
</AU>
<TI>A Multiple Alignment Program for Protein Sequences
</TI>
<SU>Multiple alignment;
    Segment;
    DE;
    Program;
    Protein
</SU>
<AB>This program for multiple alignment of protein sequences is "an extension
of the fast alignment program by Wilbur et al. (1984) into higher dimensions.
The use of hash procedures on fragments of the protein sequences increases the
speed of calculation. Thereby we also take into account fragments which are
present in some, but not in all, sequences considered."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1987</PY>
<VO>3</VO>
<NO>2</NO>
<PP>111-114</PP>
</SEQ>

<SEQ>
<UI>0819   Saqi,M.A.S.   A Simple Method to Gen.. J.Mol.Biol.     91 
219(4):727-732
</UI>
<AU>Saqi MAS;
    Sternberg MJE
</AU>
<TI>A Simple Method to Generate Non-trivial Alternate Alignments of Protein
Sequences
</TI>
<SU>Pairwise alignment;
    Dynamic programming;
    UK;
    Protein
</SU>
<AB>"An algorithm is presented that finds suboptimal alignments of protein
sequences by a simple modification to the standard dynamic programming method."
</AB>
<JT>J Mol Biol</JT>
<PY>1991</PY>
<VO>219</VO>
<NO>4</NO>
<PP>727-732</PP>
</SEQ>

<SEQ>
<UI>0820   Saroff,H.A.   A Note on the Evaluati.. Bull.Math.Biol. 84 
46(5/6):951-96
</UI>
<AU>Saroff HA
</AU>
<TI>A Note on the Evaluation of Similarity (Homology) of Short Sequences with
Long Sequences
</TI>
<SU>Pairwise comparison;
    Significance;
    Monte Carlo;
    USA;
    Gap;
    Similarity;
    Homology
</SU>
<AB>"Monte Carlo data on the comparison of a short sequence with a long one
are developed in a manner to quantify the occurrence of gaps. ... In previous
publications tables of expected frequencies of occurrence of similarities
between a short sequence, length 10, and a long one, length 112, were presented
.... This note develops the problem in more detail, particularly the results
relating to gaps."
</AB>
<JT>Bull Math Biol</JT>
<PY>1984</PY>
<VO>46</VO>
<NO>5/6</NO>
<PP>951-961</PP>
</SEQ>

<SEQ>
<UI>0821   Saurin,W.     Comparaison de plusieu.. C.R.Acad.Sci.Pa 86 
303(13):541-54
</UI>
<AU>Saurin W;
    Marliere P
</AU>
<TI>Comparaison de plusieurs sequences proteiques par reconnaissance de blocs
conserves
</TI>
<SU>Multiple alignment;
    Segment;
    FR;
    Region;
    DE
</SU>
<AB>Simultaneous Alignment of Several Protein Sequences. "The sequences of
related proteins show the alternance of conserved and variable regions. ...
Although the exact meaning of such constraints remains elusive, conserved
regions can be extracted from protein chains and used to align them. We
developed a program that efficiently performs this task."
</AB>
<JT>C R Acad Sci Paris Ser III </JT>
<PY>1986</PY>
<VO>303</VO>
<NO>13</NO>
<PP>541-546</PP>
</SEQ>

<SEQ>
<UI>0822   Saurin,W.     Matching Relational Pa.. Comput.Appl.Bio 87 
3(2):115-120
</UI>
<AU>Saurin W;
    Marliere P
</AU>
<TI>Matching Relational Patterns in Nucleic Acid Sequences
</TI>
<SU>Match complex patterns;
    FR;
    Nucleic acid
</SU>
<AB>"We describe a program that efficiently searches sequence data banks for
complex patterns where sites are linked by common relations such as identity,
complementarity or span. Its algorithm is closer to those of automatic
demonstration than to the finite state machines used in fast pattern matching."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1987</PY>
<VO>3</VO>
<NO>2</NO>
<PP>115-120</PP>
</SEQ>

<SEQ>
<UI>0823   Schaback,R.   On the Expected Sublin.. SIAM J.Comput.  88 
17(4):648-658
</UI>
<AU>Schaback R
</AU>
<TI>On the Expected Sublinearity of the Boyer-Moore Algorithm
</TI>
<SU>String match;
    Boyer-Moore;
    DE;
    Probabilistic;
    Algorithm
</SU>
<AB>"This paper analyzes the expected performance of a simplified version BM*
of the Boyer-Moore string-matching algorithm. A probabilistic automaton A, which
models the expected behavior of BM*, is set up under the assumption that both
text and pattern are generated by a source which emits independent and
uncorrelated symbols with an arbitrary distribution of probabilities."
</AB>
<JT>SIAM J Comput</JT>
<PY>1988</PY>
<VO>17</VO>
<NO>4</NO>
<PP>648-658</PP>
</SEQ>

<SEQ>
<UI>0824   Schneider,T.D Sequence Logos: A New .. Nucleic Acids R 90 
18(20):6097-61
</UI>
<AU>Schneider TD;
    Stephens RM
</AU>
<TI>Sequence Logos: A New Way to Display Consensus Sequences
</TI>
<SU>Consensus sequence;
    USA;
    Display
</SU>
<AB>"A graphical method is presented for displaying the patterns in a set of
aligned sequences. The characters representing the sequence are stacked on top
of each other for each position in the aligned sequences. The height of each
letter is made proportional to its frequency, and the letters are sorted so the
most common one is on top."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1990</PY>
<VO>18</VO>
<NO>20</NO>
<PP>6097-6100</PP>
</SEQ>

<SEQ>
<UI>0825   Schneider,T.D Information Content of.. J.Mol.Biol.     86 
188:415-431
</UI>
<AU>Schneider TD;
    Stormo GD;
    Gold L;
    Ehrenfeucht A
</AU>
<TI>Information Content of Binding Sites on Nucleotide Sequences
</TI>
<SU>Match a pattern matrix;
    Information content;
    USA;
    Distributed;
    Nucleotide;
    Binding
</SU>
<AB>"We define a measure of the information ... in the sequence patterns at
binding sites. It allows one to investigate how information is distributed
across the sites and to compare one site to another. One can also calculate the
amount of information ... that would be required to locate the sites, given that
they occur with some frequency in the genome." Matrices having a high
information content
</AB>
<JT>J Mol Biol</JT>
<PY>1986</PY>
<VO>188</VO>
<PP>415-431</PP>
</SEQ>

<SEQ>
<UI>0826   Schoniger,M.  A Local Algorithm for .. Bull.Math.Biol. 92 
54(4):521-536
</UI>
<AU>Schoniger M;
    Waterman MS
</AU>
<TI>A Local Algorithm for DNA Sequence Alignment with Inversions
</TI>
<SU>Subalignment;
    Dynamic programming;
    USA;
    Sequence alignment;
    Inversion;
    DNA;
    Algorithm
</SU>
<AB>"A dynamic programming algorithm to find all optimal alignments of DNA
subsequences is described. The alignments use not only substitutions, insertions
and deletions of nucleotides but also inversions (reversed complements) of
substrings of the sequences. The inversion alignments themselves contain
substitutions, insertions and deletions of nucleotides."
</AB>
<JT>Bull Math Biol</JT>
<PY>1992</PY>
<VO>54</VO>
<NO>4</NO>
<PP>521-536</PP>
</SEQ>

<SEQ>
<UI>0827   Schuler,G.D.  A Workbench for Multip.. Proteins Struct 91 
9(3):180-190
</UI>
<AU>Schuler GD;
    Altschul SF;
    Lipman DJ
</AU>
<TI>A Workbench for Multiple Alignment Construction and Analysis
</TI>
<SU>Multiple alignment;
    Segment;
    USA;
    Sequence alignment
</SU>
<AB>"Multiple sequence alignment can be a useful technique for studying
molecular evolution, as well as for analyzing relationships between structure or
function and primary sequence. We have developed for this purpose an interactive
program, MACAW ..., that allows the user to construct multiple alignments by
locating, analyzing, editing, and combining 'blocks' of aligned sequence
segments."
</AB>
<JT>Proteins Struct Funct Genet</JT>
<PY>1991</PY>
<VO>9</VO>
<NO>3</NO>
<PP>180-190</PP>
</SEQ>

<SEQ>
<UI>0828   Schwartz,R.M. Matrices for Detecting.. Atlas of Prot.. 78National 
Biomed
</UI>
<AU>Schwartz RM;
    Dayhoff MO
</AU>
<TI>Matrices for Detecting Distant Relationships
</TI>
<ED>Dayhoff MO
</ED>
<BK>Atlas of Protein Sequence and Structure, Volume 5, Supplement 3, 1978
</BK>
<SU>Sequence proximity;
    Substitution;
    Scoring;
    USA;
    Genetic
</SU>
<AB>A comparison of four matrices for calculating scores of pairs of aligned
protein sequences: unitary matrix (UM), genetic code matrix (GCM), alternative
amino acids matrix (AAAM), and the mutation data matrix (MDM78). In the
comparisons, MDM78 seems clearly to be the best
</AB>
<PU>National Biomedical Research Foundation </PU>
<PL>Washington, DC </PL>
<PY>1978</PY>
<PP>353-358</PP>
</SEQ>

<SEQ>
<UI>0829   Schwartz,S.   Software Tools for Ana.. Nucleic Acids R 91 
19(17):4663-46
</UI>
<AU>Schwartz S;
    Miller W;
    Yang CM;
    Hardison RC
</AU>
<TI>Software Tools for Analyzing Pairwise Alignments of Long Sequences
</TI>
<SU>Pairwise comparison;
    Dot;
    USA;
    Pairwise alignment
</SU>
<AB>"Computer tools are needed to summarize the information [from pairwise
sequence comparisons], to assist in its analysis, and to report the findings.
... One tool prepares publication-quality pictorial representations of
alignments, while another facilitates interactive browsing of pairwise alignment
data."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1991</PY>
<VO>19</VO>
<NO>17</NO>
<PP>4663-4667</PP>
</SEQ>

<SEQ>
<UI>0830   Sege,R.D.     A Statistical Test for.. Nucleic Acids R 82 
10(1):375-389
</UI>
<AU>Sege RD;
    Saxberg BEH
</AU>
<TI>A Statistical Test for Comparing Several Nucleotide Sequences
</TI>
<SU>Consensus sequence;
    Likelihood;
    USA;
    Statistical;
    Nucleotide
</SU>
<AB>"The general problem addressed here is to determine the level of
information contained in a group of N bases, i.e., to examine the distribution
of bases at one location among N sequences, or at N locations along one sequence
.... A method will now be derived which will allow one to determine the level of
confidence for rejecting the hypothesis that the observed data came by chance
selection from the base pool."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1982</PY>
<VO>10</VO>
<NO>1</NO>
<PP>375-389</PP>
</SEQ>

<SEQ>
<UI>0831   Sellers,P.H.  An Algorithm for the D.. J.Combin.Theory 74 
16:253-258
</UI>
<AU>Sellers PH
</AU>
<TI>An Algorithm for the Distance Between Two Finite Sequences
</TI>
<SU>Pairwise alignment;
    Sequence proximity;
    USA;
    Dynamic programming;
    Distance;
    Algorithm
</SU>
<AB>An equal-weights, distance-based, dynamic programming algorithm to compare
two sequences. See Sellers (1974b) for the generalization to arbitrary weights
</AB>
<JT>J Combin Theory Ser A </JT>
<PY>1974</PY>
<VO>16</VO>
<PP>253-258</PP>
</SEQ>

<SEQ>
<UI>0832   Sellers,P.H.  On the Theory and Comp.. SIAM J.Appl.Mat 74 
26(4):787-793
</UI>
<AU>Sellers PH
</AU>
<TI>On the Theory and Computation of Evolutionary Distances
</TI>
<SU>Pairwise alignment;
    Sequence proximity;
    USA;
    Evolutionary distance;
    Distance
</SU>
<AB>"This paper gives a formal definition of the biological concept of
evolutionary distance and an algorithm to compute it." See Sellers (1974a) for
the algorithm under the assumption of equal weights
</AB>
<JT>SIAM J Appl Math</JT>
<PY>1974</PY>
<VO>26</VO>
<NO>4</NO>
<PP>787-793</PP>
</SEQ>

<SEQ>
<UI>0833   Sellers,P.H.  Pattern Recognition in.. Proc.Nat.Acad.S 79 
76(7):3041-304
</UI>
<AU>Sellers PH
</AU>
<TI>Pattern Recognition in Genetic Sequences
</TI>
<SU>Match with k differences;
    Pattern recognition;
    USA;
    Evolutionary distance;
    Genetic;
    Recognition
</SU>
<AB>"This paper announces an algorithm for finding pattern similarities
between two given finite sequences. Two portions, one from each sequence, are
similar if they are close in the metric space of evolutionary distances. ...
This result lends itself to detecting similarities by computer between pairs of
biological sequences, such as proteins and nucleic acids." See Sellers (1980)
for details
</AB>
<JT>Proc Nat Acad Sci USA </JT>
<PY>1979</PY>
<VO>76</VO>
<NO>7</NO>
<PP>3041-3041</PP>
</SEQ>

<SEQ>
<UI>0834   Sellers,P.H.  The Theory and Computa.. J.Algorithms    80 
1:359-373
</UI>
<AU>Sellers PH
</AU>
<TI>The Theory and Computation of Evolutionary Distances: Pattern Recognition
</TI>
<SU>Match with k differences;
    Pattern recognition;
    USA;
    Display;
    Evolutionary distance;
    Distance;
    Recognition
</SU>
<AB>"A method of finding pattern similarities between two sequences is given.
Two portions, one from each sequence, are similar if they are close in the
metric space of evolutionary distances. The method allows a complete list to be
made of all pairs of intervals, one from each of two given sequences, such that
each pair displays a maximum local degree of similarity ...."
</AB>
<JT>J Algorithms </JT>
<PY>1980</PY>
<VO>1</VO>
<PP>359-373</PP>
</SEQ>

<SEQ>
<UI>0835   Sellers,P.H.  Pattern Recognition in.. Bull.Math.Biol. 84 
46(4):501-514
</UI>
<AU>Sellers PH
</AU>
<TI>Pattern Recognition in Genetic Sequences by Mismatch Density
</TI>
<SU>Subalignment;
    Dynamic programming;
    USA;
    Pattern recognition;
    Genetic;
    Recognition
</SU>
<AB>Computer program for similarity search. Finds subsequences within the
dynamic programming matrix that exhibit locally optimal alignment by
maintaining a minimum match density. Based on the concept of match density
suggested by Goad and Kanehisa (1982)
</AB>
<JT>Bull Math Biol</JT>
<PY>1984</PY>
<VO>46</VO>
<NO>4</NO>
<PP>501-514</PP>
</SEQ>

<SEQ>
<UI>0836   Shapiro,B.A.  An Interactive Dot Mat.. J.Biomol.Struct 87 
4(5):697-706
</UI>
<AU>Shapiro BA;
    Nussinov R;
    Lipkin LE;
    Maizel JV Jr
</AU>
<TI>An Interactive Dot Matrix System for Locating Potentially Significant
Features in Nucleic Acid Molecules
</TI>
<SU>Sequence analysis;
    Significance;
    Dot;
    USA;
    Region;
    Probabilistic;
    Nucleic acid;
    Matrix
</SU>
<AB>"An interactive computer system using a dot matrix approach has been
developed and used to determine potentially significant features due to
distortions in the B-DNA helix .... Specifically, it was found that a pattern of
alternating doublets of purines and pyrimidines appear to exist in regulatory
regions. This result is shown to be beyond probabilistic expectation."
</AB>
<JT>J Biomol Struct &amp; Dyn </JT>
<PY>1987</PY>
<VO>4</VO>
<NO>5</NO>
<PP>697-706</PP>
</SEQ>

<SEQ>
<UI>0837   Sibbald,P.R.  Scrutineer: A Computer.. Comput.Appl.Bio 90 
6(3):279-288
</UI>
<AU>Sibbald PR;
    Argos P
</AU>
<TI>Scrutineer: A Computer Program that Flexibly Seeks and Describes Motifs
and Profiles in Protein Sequence Databases
</TI>
<SU>Database search;
    DE;
    Motif;
    Sequence database;
    Program;
    Profile;
    Protein
</SU>
<AB>"Scrutineer is an interactive, user-friendly program designed to search
for motifs, patterns and profiles in the Swissprot, Protein Identification
Resource (PIR) or SeqDb protein sequence databases."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1990</PY>
<VO>6</VO>
<NO>3</NO>
<PP>279-288</PP>
</SEQ>

<SEQ>
<UI>0838   Sibbald,P.R.  Weighting Aligned Prot.. J.Mol.Biol.     90 
216(4):813-818
</UI>
<AU>Sibbald PR;
    Argos P
</AU>
<TI>Weighting Aligned Protein or Nucleic Acid Sequences to Correct for Unequal
Representation
</TI>
<SU>Sequence weight;
    DE;
    Correction;
    Database search;
    Representation;
    Profile;
    Protein;
    Nucleic acid
</SU>
<AB>"Aligned sequences from the same family ... are seldom representative of
the entire family. ... For many applications, such as using alignments or
profiles to perform database searches for distantly related family members, such
unequal representation requires correction. An algorithm to perform appropriate
weighting of individual sequences is presented along with examples illustrating
its efficacy."
</AB>
<JT>J Mol Biol</JT>
<PY>1990</PY>
<VO>216</VO>
<NO>4</NO>
<PP>813-818</PP>
</SEQ>

<SEQ>
<UI>0839   Sibbald,P.R.  Calculating Higher Ord.. J.Theor.Biol.   89 
136:475-483
</UI>
<AU>Sibbald PR;
    Banerjee S;
    Maze J
</AU>
<TI>Calculating Higher Order DNA Sequence Information Measures
</TI>
<SU>Sequence analysis;
    Information theory;
    CA;
    DNA
</SU>
<AB>"This paper re-examines the use of information theory as a tool for
understanding DNA. Specifically we (a) refine Gatlin's application (1972) of
information theory to DNA sequence analysis, (b) point out some recent
misinterpretations of results obtained by Brooks et al. (1988), (c) reconsider
the problem that the finite lengths of DNA sequences pose for the use of a
theory designed for sequences of infinite length ...."
</AB>
<JT>J Theor Biol</JT>
<PY>1989</PY>
<VO>136</VO>
<PP>475-483</PP>
</SEQ>

<SEQ>
<UI>0840   Sittig,D.F.   A Parallel Computing A.. Comput.Biomed.R 91 
24(2):152-169
</UI>
<AU>Sittig DF;
    Foulser D;
    Carriero N;
    McCorkle G;
    Miller PL
</AU>
<TI>A Parallel Computing Approach to Genetic Sequence Comparison: The Master-
Worker Paradigm with Interworker Communication
</TI>
<SU>Pairwise alignment;
    Parallel;
    USA;
    Sequence comparison;
    Dynamic programming;
    Genetic
</SU>
<AB>"We have implemented a parallel version of a dynamic programming
biological sequence comparison algorithm to study the potential applicability of
using parallel computers for genetic sequence comparisons. Our parallel program
... was tested on both a 10 CPU Sequent Symmetry and a 64 CPU Intel Hypercube."
A parallel version of Gotoh's (1982) algorithm
</AB>
<JT>Comput Biomed Res</JT>
<PY>1991</PY>
<VO>24</VO>
<NO>2</NO>
<PP>152-169</PP>
</SEQ>

<SEQ>
<UI>0841   Slisenko,A.O. Detection of Periodici.. J.Soviet Math.  83 
22(3):1316-138
</UI>
<AU>Slisenko AO
</AU>
<TI>Detection of Periodicities and String-matching in Real Time
</TI>
<SU>Pattern match;
    Complexity;
    Regularities;
    RU;
    Detection
</SU>
<AB>"This article contains a detailed description of an algorithm for finding
all periodicities in real time on a machine with random memory access and
registers of asymptotically minimal length. In fact, this construction gives a
real-time algorithm for pattern matching, finding the longest repetitions, and
so forth."
</AB>
<JT>J Soviet Math</JT>
<PY>1983</PY>
<VO>22</VO>
<NO>3</NO>
<PP>1316-1387</PP>
</SEQ>

<SEQ>
<UI>0842   Smit,G.de V.  A Comparison of Three .. Software.Practi 82 12:57-66
</UI>
<AU>Smit Gde V
</AU>
<TI>A Comparison of Three String Matching Algorithms
</TI>
<SU>Knuth-Morris-Pratt;
    Boyer-Moore;
    SA;
    String match;
    Complexity;
    Algorithm
</SU>
<AB>"Three string matching algorithms - straightforward, Knuth-Morris-Pratt
and Boyer-Moore - are examined and their time complexities discussed. A
comparison of their actual average behaviour is made, based on empirical data
presented. It is shown that the Boyer-Moore algorithm is extremely efficient in
most cases and that ... the Knuth-Morris-Pratt algorithm is not significantly
better on the average than the straightforward algorithm."
</AB>
<JT>Software Practice Experience </JT>
<PY>1982</PY>
<VO>12</VO>
<PP>57-66</PP>
</SEQ>

<SEQ>
<UI>0843   Smith,H.O.    Finding Sequence Motif.. Proc.Nat.Acad.S 90 
87:826-830
</UI>
<AU>Smith HO;
    Annau TM;
    Chandrasegaran S
</AU>
<TI>Finding Sequence Motifs in Groups of Functionally Related Proteins
</TI>
<SU>Consensus sequence;
    Statistical;
    USA;
    Motif;
    Segment;
    Protein
</SU>
<AB>"We have developed a method for rapidly finding patterns of conserved
amino acid residues (motifs) in groups of functionally related proteins. ...
Segments of the proteins containing those patterns that occur most frequently
are aligned on each other by a scoring method that obtains an average
relatedness value for all the amino acids in each column of the aligned sequence
block ...."
</AB>
<JT>Proc Nat Acad Sci USA </JT>
<PY>1990</PY>
<VO>87</VO>
<PP>826-830</PP>
</SEQ>

<SEQ>
<UI>0844   Smith,P.D.    Experiments with a Ver.. Software.Practi 91 
21(10):1065-10
</UI>
<AU>Smith PD
</AU>
<TI>Experiments with a Very Fast Substring Search Algorithm
</TI>
<SU>Pattern match;
    USA;
    String match;
    Algorithm
</SU>
<AB>"Sunday devised string matching methods that are generally faster than the
Boyer-Moore algorithm. His fastest method used statistics of the language being
scanned to determine the order in which character pairs are to be compared. In
this paper the performances of similar, but language-independent, algorithms are
examined. Results comparable with language-based algorithms can be achieved with
an adaptive technique."
</AB>
<JT>Software Practice Experience </JT>
<PY>1991</PY>
<VO>21</VO>
<NO>10</NO>
<PP>1065-1074</PP>
</SEQ>

<SEQ>
<UI>0845   Smith,R.      A Finite State Machine.. Comput.Appl.Bio 88 
4(4):459-465
</UI>
<AU>Smith R
</AU>
<TI>A Finite State Machine Algorithm for Finding Restriction Sites and other
Pattern Matching Applications
</TI>
<SU>Dictionary match;
    Automata;
    USA;
    Restriction;
    Algorithm
</SU>
<AB>"Existing algorithms for finding restriction endonuclease recognition
sites use brute-force algorithms which run in time O(NM) where N is the number
of nucleotides in the sequence under analysis and M is the total number of
nucleotides in all the different sites being searched for. This paper presents a
deterministic finite state machine algorithm which runs in time O(N)."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1988</PY>
<VO>4</VO>
<NO>4</NO>
<PP>459-465</PP>
</SEQ>

<SEQ>
<UI>0846   Smith,R.F.    Automatic Generation o.. Proc.Nat.Acad.S 90 
87:118-122
</UI>
<AU>Smith RF;
    Smith TF
</AU>
<TI>Automatic Generation of Primary Sequence Patterns from Sets of Related
Protein Sequences
</TI>
<SU>Consensus sequence;
    USA;
    Clustering;
    Protein
</SU>
<AB>"We have developed a computer algorithm that can extract the pattern of
conserved primary sequence elements common to all members of a homologous
protein family. The method involves clustering the pairwise similarity scores
among a set of related sequences to generate a binary dendrogram (tree). The
tree is then reduced in a stepwise manner ... until only a single common 'root'
pattern remains."
</AB>
<JT>Proc Nat Acad Sci USA </JT>
<PY>1990</PY>
<VO>87</VO>
<PP>118-122</PP>
</SEQ>

<SEQ>
<UI>0847   Smith,R.F.    Pattern-induced Multi-.. Protein Eng.    92 
5(1):35-41
</UI>
<AU>Smith RF;
    Smith TF
</AU>
<TI>Pattern-induced Multi-sequence Alignment (PIMA) Algorithm Employing
Secondary Structure-dependent Gap Penalties for Use in Comparative Protein
Modelling
</TI>
<SU>Multiple alignment;
    USA;
    Sequence alignment;
    Gap;
    Protein;
    Secondary;
    Algorithm
</SU>
<AB>"A multiple sequence alignment algorithm is described that uses a dynamic
programming-based pattern construction method to align a set of homologous
sequences based on their common pattern of conserved sequence elements."
</AB>
<JT>Protein Eng</JT>
<PY>1992</PY>
<VO>5</VO>
<NO>1</NO>
<PP>35-41</PP>
</SEQ>

<SEQ>
<UI>0848   Smith,T.F.    Comparison of Bioseque.. Adv.Appl.Math.  81 
2:482-489
</UI>
<AU>Smith TF;
    Waterman MS
</AU>
<TI>Comparison of Biosequences
</TI>
<SU>Pairwise alignment;
    USA;
    Segment;
    Needleman-Wunsch
</SU>
<AB>"The homology measure of Needleman and Wunsch (1970) is shown, under
general conditions, to be equivalent to the distance measure of Sellers (1974).
A new algorithm is given to find similar pairs of segments, one segment from
each sequence. The new algorithm is compared to an earlier one due to Sellers
(1980)."
</AB>
<JT>Adv Appl Math</JT>
<PY>1981</PY>
<VO>2</VO>
<PP>482-489</PP>
</SEQ>

<SEQ>
<UI>0849   Smith,T.F.    Identification of Comm.. J.Mol.Biol.     81 
147:195-197
</UI>
<AU>Smith TF;
    Waterman MS
</AU>
<TI>Identification of Common Molecular Subsequences
</TI>
<SU>Subalignment;
    USA;
    Segment;
    Identification;
    Subsequence
</SU>
<AB>"In this letter we extend the above ideas to find a pair of segments, one
from each of two long sequences, such that there is no other pair of segments
with greater similarity (homology). The similarity measure used here allows for
arbitrary length deletions and insertions."
</AB>
<JT>J Mol Biol</JT>
<PY>1981</PY>
<VO>147</VO>
<PP>195-197</PP>
</SEQ>

<SEQ>
<UI>0850   Smith,T.F.    The Statistical Distri.. Nucleic Acids R 85 
13(2):645-656
</UI>
<AU>Smith TF;
    Waterman MS;
    Burks C
</AU>
<TI>The Statistical Distribution of Nucleic Acid Similarities
</TI>
<SU>Pairwise comparison;
    Significance;
    USA;
    Statistical;
    Segment;
    Distributed;
    Distribution;
    Similarity;
    Nucleic acid
</SU>
<AB>"All pairs of a large set of known vertebrate DNA sequences were searched
by computer for most similar segments. Analysis of this data shows that the
computed similarity scores are distributed proportionally to the logarithm of
the product of the lengths of the sequences involved. ... A simple rule is
derived for determination of statistical significance of the similarity scores
and to assist in relating statistical and biological significance."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1985</PY>
<VO>13</VO>
<NO>2</NO>
<PP>645-656</PP>
</SEQ>

<SEQ>
<UI>0851   Smith,T.F.    Comparative Biosequenc.. J.Mol.Evol.     81 
18(1):38-46
</UI>
<AU>Smith TF;
    Waterman MS;
    Fitch WM
</AU>
<TI>Comparative Biosequence Metrics
</TI>
<SU>Pairwise alignment;
    USA;
    Sequence alignment;
    Needleman-Wunsch
</SU>
<AB>"The sequence alignment algorithms of Needleman and Wunsch (1970) and
Sellers (1974) are compared. Although the former maximizes similarity and the
latter minimizes differences, the two procedures are proven to be equivalent.
The equivalence relations necessary for each procedure to give the same result
are [described]."
</AB>
<JT>J Mol Evol</JT>
<PY>1981</PY>
<VO>18</VO>
<NO>1</NO>
<PP>38-46</PP>
</SEQ>

<SEQ>
<UI>0852   Sobel,E.      A Multiple Sequence Al.. Nucleic Acids R 86 
14(1):363-374
</UI>
<AU>Sobel E;
    Martinez HM
</AU>
<TI>A Multiple Sequence Alignment Program
</TI>
<SU>Multiple alignment;
    Segment;
    USA;
    Sequence alignment;
    Statistical;
    Significance;
    Program
</SU>
<AB>"A program is described for simultaneously aligning two or more molecular
sequences which is based on first finding common segments above a specified
length and then piecing these together to maximize an alignment scoring
function. Optimal as well as near-optimal alignments are found, and there is
also provided a means for randomizing the given sequences for testing the
statistical significance of an alignment."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1986</PY>
<VO>14</VO>
<NO>1</NO>
<PP>363-374</PP>
</SEQ>

<SEQ>
<UI>0853   Spouge,J.L.   Improving Sequence-mat.. J.Mol.Biol.     85 
181(1):137-138
</UI>
<AU>Spouge JL
</AU>
<TI>Improving Sequence-matching Algorithms by Working from Both Ends
</TI>
<SU>Pairwise alignment;
    USA;
    Algorithm
</SU>
<AB>"Recent algorithms (e.g., Ukkonen, Fickett) align nucleic acid sequences
(starting from the left) by bounding the allowed distance between subsequences
by d, aligning, then incrementing d until all of both sequences are aligned.
Aligning from both ends is more efficient."
</AB>
<JT>J Mol Biol</JT>
<PY>1985</PY>
<VO>181</VO>
<NO>1</NO>
<PP>137-138</PP>
</SEQ>

<SEQ>
<UI>0854   Spouge,J.L.   Speeding up Dynamic Pr.. SIAM J.Appl.Mat 89 
49(5):1552-156
</UI>
<AU>Spouge JL
</AU>
<TI>Speeding up Dynamic Programming Algorithms for Finding Optimal Lattice
Paths
</TI>
<SU>Pairwise alignment;
    Dynamic programming;
    USA;
    Optimal;
    Dynamic;
    Algorithm
</SU>
<AB>"Finding an optimal alignment between two sequences can ... be reduced to
finding an optimal lattice path. Dynamic programming algorithms are generally
well-suited to such problems, but can be slow and require too much storage ....
Faster algorithms requiring less computer storage can often be constructed by
restricting calculations to a 'computational volume' known to contain the
optimal path."
</AB>
<JT>SIAM J Appl Math</JT>
<PY>1989</PY>
<VO>49</VO>
<NO>5</NO>
<PP>1552-1566</PP>
</SEQ>

<SEQ>
<UI>0855   Spouge,J.L.   Fast Optimal Alignment   Comput.Appl.Bio 91 7(1):1-7
</UI>
<AU>Spouge JL
</AU>
<TI>Fast Optimal Alignment
</TI>
<SU>Pairwise alignment;
    USA;
    Gap;
    Optimal
</SU>
<AB>"A general principle underlies the efficiency of [the efficient alignment
algorithms of Fickett and Ukkonen]: inequalities can direct computations to
promising subalignments. Hence inequalities can be used to suggest alignment
algorithms. Inequalities for unweighted end-gaps, affine and concave gap
weights, etc., are discussed ...."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1991</PY>
<VO>7</VO>
<NO>1</NO>
<PP>1-7</PP>
</SEQ>

<SEQ>
<UI>0856   Sprizhitsky,Y Statistical Analysis o.. J.Biomol.Struct 88 
6(2):345-358
</UI>
<AU>Sprizhitsky YA;
    Nechipurenko YD;
    Alexandrov AA;
    Volkenstein MV
</AU>
<TI>Statistical Analysis of Nucleotide Runs in Coding and Noncoding DNA
Sequences
</TI>
<SU>Sequence analysis;
    Significance;
    RU;
    Statistical;
    Coding;
    DNA;
    Nucleotide
</SU>
<AB>"There are considerable differences of run distributions in DNA sequences
of procaryotes, invertebrates and vertebrates. There is an abundance of short
runs (1-2 nucleotides long) in the coding sequences and there is a deficiency of
such runs in the noncoding regions."
</AB>
<JT>J Biomol Struct &amp; Dyn </JT>
<PY>1988</PY>
<VO>6</VO>
<NO>2</NO>
<PP>345-358</PP>
</SEQ>

<SEQ>
<UI>0857   Srinivas,Y.V. A Sheaf-theoretic Appr.. Theoret.Comput. 93 
112:53-97
</UI>
<AU>Srinivas YV
</AU>
<TI>A Sheaf-theoretic Approach to Pattern Matching and Related Problems
</TI>
<SU>String match;
    Knuth-Morris-Pratt;
    Pattern match;
    USA
</SU>
<AB>"We present a general theory of pattern matching by adopting an
extensional, geometric view of patterns. ... We derive a generalized version of
the Knuth-Morris-Pratt string-matching algorithm by gradually converting this
extensional description into an intensional description, i.e., an algorithm."
</AB>
<JT>Theoret Comput Sci</JT>
<PY>1993</PY>
<VO>112</VO>
<PP>53-97</PP>
</SEQ>

<SEQ>
<UI>0858   Staden,R.     An Interactive Graphic.. Nucleic Acids R 82 
10(9):2951-296
</UI>
<AU>Staden R
</AU>
<TI>An Interactive Graphics Program for Comparing and Aligning Nucleic Acid
and Amino Acid Sequences
</TI>
<SU>Pairwise comparison;
    Dot;
    UK;
    Program;
    Amino acid;
    Nucleic acid;
    Graphic
</SU>
<AB>"This paper describes a computer program designed to look for similarities
between pairs of nucleic or amino acid sequences. ... The basic principle ...
was first described by Gibbs and McIntyre (1970) and involves producing a
diagram that contains a representation of all the matches between a pair of
sequences. This diagram is then scanned by eye and the human ability to
recognize patterns used to detect any similarities that might be present."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1982</PY>
<VO>10</VO>
<NO>9</NO>
<PP>2951-2961</PP>
</SEQ>

<SEQ>
<UI>0859   Staden,R.     Computer Methods to Lo.. Nucleic Acids R 84 
12(1):505-519
</UI>
<AU>Staden R
</AU>
<TI>Computer Methods to Locate Signals in Nucleic Acid Sequences
</TI>
<SU>Match a pattern matrix;
    UK;
    Signal;
    Nucleic acid
</SU>
<AB>"We describe a computer program that can be used to locate poorly defined
recognition sequences .... Our methods ... assign separate values to each base
at each position of the recognition sequence and can therefore indicate the
relative importance of each base at each position. This is done by using a
weight matrix to represent each type of recognition sequence."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1984</PY>
<VO>12</VO>
<NO>1</NO>
<PP>505-519</PP>
</SEQ>

<SEQ>
<UI>0860   Staden,R.     Methods to Define and .. Comput.Appl.Bio 88 
4(1):53-60
</UI>
<AU>Staden R
</AU>
<TI>Methods to Define and Locate Patterns of Motifs in Sequences
</TI>
<SU>Dictionary match;
    UK;
    Motif
</SU>
<AB>"A method to define and search for complex patterns of motifs in nucleic
acid and protein sequences is described. With this method nucleic acid motifs
can be defined in eight different ways and protein motifs in six. A pattern is
defined by a list of motifs. ... Programs to search for patterns in individual
sequences and libraries of sequences are described."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1988</PY>
<VO>4</VO>
<NO>1</NO>
<PP>53-60</PP>
</SEQ>

<SEQ>
<UI>0861   Staden,R.     Methods for Calculatin.. Comput.Appl.Bio 89 
5(2):89-96
</UI>
<AU>Staden R
</AU>
<TI>Methods for Calculating the Probabilities of Finding Patterns in Sequences
</TI>
<SU>Pattern recognition;
    Significance;
    UK;
    Probabilistic;
    Motif;
    Probability
</SU>
<AB>"This paper describes the use of probability-generating functions for
calculating the probabilities of finding motifs in nucleic acid and protein
sequences. Equations and algorithms are given for calculating the probabilities
associated with nine different ways of defining motifs. Comparisons are made
with searches of random sequences."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1989</PY>
<VO>5</VO>
<NO>2</NO>
<PP>89-96</PP>
</SEQ>

<SEQ>
<UI>0862   Staden,R.     Methods for Discoverin.. Comput.Appl.Bio 89 
5(4):293-298
</UI>
<AU>Staden R
</AU>
<TI>Methods for Discovering Novel Motifs in Nucleic Acid Sequences
</TI>
<SU>Consensus sequence;
    Neighbourhood;
    UK;
    Motif;
    Nucleic acid
</SU>
<AB>"We describe a computer tool to aid the discovery of new motifs in nucleic
acid sequences. ... The heart of the method is the creation of dictionaries of
related subsequences [which] can then be analyzed to look for the commonest or
best-defined subsequences, those that occur in the highest number of different
sequences, or for those in equivalent positions within the family."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1989</PY>
<VO>5</VO>
<NO>4</NO>
<PP>293-298</PP>
</SEQ>

<SEQ>
<UI>0863   Staden,R.     Searching for Patterns.. Methods Enzymol 90 
183:193-211
</UI>
<AU>Staden R
</AU>
<TI>Searching for Patterns in Protein and Nucleic Acid Sequences
</TI>
<SU>Match a pattern matrix;
    UK;
    Protein;
    Nucleic acid
</SU>
<AB>"There is a rapidly growing number of well-established patterns,
especially in nucleic acid sequences, and some readers may wish only to know how
to search for these. This chapter, however, describes a set of programs that not
only perform searches for known patterns but which also enable users to define
their own patterns. The patterns can be defined in many different ways ... and
the search programs operate on individual sequences as well as whole libraries
of sequences." See Staden (1988) for the programs
</AB>
<JT>Methods Enzymol</JT>
<PY>1990</PY>
<VO>183</VO>
<PP>193-211</PP>
</SEQ>

<SEQ>
<UI>0864   Staden,R.     Screening Protein and .. DNA Seq.- J.DNA 91 
1:369-374
</UI>
<AU>Staden R
</AU>
<TI>Screening Protein and Nucleic Acid Sequences against Libraries of Patterns
</TI>
<SU>Match complex patterns;
    Pattern library;
    Motif;
    UK;
    Protein;
    Nucleic acid
</SU>
<AB>"We describe programs that can screen nucleic acid and protein sequences
against libraries of motifs and patterns. Such comparisons are likely to play an
important role in interpreting the function of sequences determined during large
scale sequencing projects."
</AB>
<JT>DNA Seq - J DNA Seq Mapping</JT>
<PY>1991</PY>
<VO>1</VO>
<PP>369-374</PP>
</SEQ>

<SEQ>
<UI>0865   States,D.J.   Similarity and Homology  Sequence Anal.. 91W. H. 
Freeman
</UI>
<AU>States DJ;
    Boguski MS
</AU>
<TI>Similarity and Homology
</TI>
<ED>Gribskov M
    Devereux J
</ED>
<BK>Sequence Analysis Primer
</BK>
<SU>Sequence analysis;
    Review;
    Dot;
    Dynamic programming;
    Sequence alignment;
    Similarity;
    Homology;
    USA
</SU>
<AB>A review. Similarity versus Homology. Dot Matrix Methods. Dynamic
Programming Methods. Scoring Systems. Multiple Sequence Alignment
</AB>
<PU>W H Freeman </PU>
<PL>New York </PL>
<PY>1991</PY>
<PP>89-157</PP>
</SEQ>

<SEQ>
<UI>0866   Sternberg,M.J PROMOT: A FORTRAN Prog.. Comput.Appl.Bio 91 
7(2):257-260
</UI>
<AU>Sternberg MJE
</AU>
<TI>PROMOT: A FORTRAN Program to Scan Protein Sequences Against a Library of
Known Motifs
</TI>
<SU>Dictionary match;
    Pattern library;
    Motif;
    UK;
    Program;
    Protein
</SU>
<AB>"Recently a database (PROSITE) has been established that contains 337
known motifs encoded as a list of allowed residue types at specific positions
along the sequence. PROMOT is a FORTRAN computer program that takes a protein
sequence and examines if it contains any of the motifs in PROSITE."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1991</PY>
<VO>7</VO>
<NO>2</NO>
<PP>257-260</PP>
</SEQ>

<SEQ>
<UI>0867   Sternberg,M.J Library of common prot.. Nature (Lond.)  91 349(10 
Jan.):1
</UI>
<AU>Sternberg MJE
</AU>
<TI>Library of common protein motifs
</TI>
<SU>Sequence analysis;
    Significance;
    Pattern library;
    Motif;
    UK;
    Protein
</SU>
<AB>"One problem facing molecular biologists is the evaluation of the
significance of finding a common amino-acid sequence motif in different
proteins. ... Bairoch has established a library, called PROSITE, of 337 protein
motifs, based on the ... SWISSPROT14 database. ... Thus if one finds a known
motif in a newly determined protein sequence and [the expected number of chance
matches calculated from residue frequencies] &lt; 0.5, then it is likely that this
match detects a biologically meaningful relationship."
</AB>
<JT>Nature (Lond ) </JT>
<PY>1991</PY>
<VO>349</VO>
<NO>10 Jan.</NO>
<PP>111-111</PP>
</SEQ>

<SEQ>
<UI>0868   Sternberg,M.J Local Protein Sequence.. Protein Eng.    90 
4(2):125-131
</UI>
<AU>Sternberg MJE;
    Islam SA
</AU>
<TI>Local Protein Sequence Similarity Does Not Imply a Structural 
Relationship
</TI>
<SU>Subalignment;
    Significance;
    UK;
    Structure;
    Similarity;
    Protein
</SU>
<AB>"Thus local sequence [similarity] does not indicate a structural
similarity when there is neither an evolutionary nor functional explanation to
support this. Accordingly structure predictions based on finding a local
sequence similarity with an evolutionary unrelated protein of known conformation
are unlikely to be valid."
</AB>
<JT>Protein Eng</JT>
<PY>1990</PY>
<VO>4</VO>
<NO>2</NO>
<PP>125-131</PP>
</SEQ>

<SEQ>
<UI>0869   Sternberg,M.J Protein Sequences - Ho.. Trends Biotechn 91 
9(9):300-302
</UI>
<AU>Sternberg MJE;
    Islam SA
</AU>
<TI>Protein Sequences - Homologies and Motifs
</TI>
<SU>Database search;
    Review;
    UK;
    Motif;
    Homology;
    Protein
</SU>
<AB>"When a protein sequence is determined, it is standard procedure to
perform database searches to identify any similarities that may exist to other
sequences .... This article outlines some of the software techniques available
for performing such searches."
</AB>
<JT>Trends Biotechnol</JT>
<PY>1991</PY>
<VO>9</VO>
<NO>9</NO>
<PP>300-302</PP>
</SEQ>

<SEQ>
<UI>0870   Stormo,G.D.   Identifying Coding Seq.. Nucleic Acid .. 87IRL Press
</UI>
<AU>Stormo GD
</AU>
<TI>Identifying Coding Sequences
</TI>
<ED>Bishop MJ
    Rawlings CJ
</ED>
<BK>Nucleic Acid and Protein Sequence Analysis: A Practical Approach
</BK>
<SU>Match a pattern matrix;
    USA;
    Motif;
    Coding
</SU>
<AB>A survey of a variety of techniques which have been employed to describe
and locate imprecisely defined motifs
</AB>
<PU>IRL Press </PU>
<PL>Oxford, UK </PL>
<PY>1987</PY>
<PP>231-258</PP>
</SEQ>

<SEQ>
<UI>0871   Stormo,G.D.   Computer Methods for A.. Annu.Rev.Biophy 88 
17:241-263
</UI>
<AU>Stormo GD
</AU>
<TI>Computer Methods for Analyzing Sequence Recognition of Nucleic Acids
</TI>
<SU>Match a pattern matrix;
    Consensus sequence;
    Review;
    USA;
    Sequence recognition;
    Information content;
    Nucleic acid;
    Recognition
</SU>
<AB>Perspectives and overview. Sequence patterns in nucleic acids: qualitative
specificity, quantitative specificity. Qualitative analysis: finding consensus,
matrix methods. Quantitative analysis: information content of binding sites,
thermodynamics of recognition, activity matrices. Future directions. Conclusions
</AB>
<JT>Annu Rev Biophys Biophys Chem</JT>
<PY>1988</PY>
<VO>17</VO>
<PP>241-263</PP>
</SEQ>

<SEQ>
<UI>0872   Stormo,G.D.   Consensus Patterns in .. Methods Enzymol 90 
183:211-221
</UI>
<AU>Stormo GD
</AU>
<TI>Consensus Patterns in DNA
</TI>
<SU>Consensus sequence;
    Information theory;
    USA;
    Region;
    Genome;
    DNA
</SU>
<AB>"This chapter describes computer-aided methods useful for the
identification and analysis of regulatory sites. The goal of these methods is to
extract from a set of known binding sites a pattern which describes the sites
and serves to distinguish them from other regions of the genome that are not
bound by the protein."
</AB>
<JT>Methods Enzymol</JT>
<PY>1990</PY>
<VO>183</VO>
<PP>211-221</PP>
</SEQ>

<SEQ>
<UI>0873   Stormo,G.D.   Identifying Protein-bi.. Proc.Nat.Acad.S 89 
86:1183-1187
</UI>
<AU>Stormo GD;
    Hartzell GW III
</AU>
<TI>Identifying Protein-binding Sites from Unaligned DNA Fragments
</TI>
<SU>Consensus sequence;
    Information theory;
    Pattern recognition;
    USA;
    Fragment;
    DNA
</SU>
<AB>"We present a [consensus] method that can be applied to the problem of
identifying the recognition pattern for a DNA-binding protein given only a
collection of sequenced DNA fragments .... The method compares the 'information
content' of a large number of possible binding site alignments to arrive at a
matrix representation of the binding site pattern."
</AB>
<JT>Proc Nat Acad Sci USA </JT>
<PY>1989</PY>
<VO>86</VO>
<PP>1183-1187</PP>
</SEQ>

<SEQ>
<UI>0874   Streletc,V.B. Fast, Statistically Ba.. Comput.Appl.Bio 92 
8(6):529-534
</UI>
<AU>Streletc VB;
    Shindyalov IN;
    Kolchanov NA;
    Milanesi L
</AU>
<TI>Fast, Statistically Based Alignment of Amino Acid Sequences on the Base of
Diagonal Fragments of Dot-matrices
</TI>
<SU>Pairwise alignment;
    Dot;
    RU;
    Statistical;
    Fragment;
    Amino acid
</SU>
<AB>"We present a new pairwise alignment algorithm that uses iterative
statistical analysis of homologous subsequences. Apart from the classical
conversion of the dot-matrix characteristic of the Needleman-Wunsch algorithm
(NW), we used only those matrix elements that corresponded to the most non-
random subsequence homologies."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1992</PY>
<VO>8</VO>
<NO>6</NO>
<PP>529-534</PP>
</SEQ>

<SEQ>
<UI>0875   Stuckle,E.E.  Statistical Analysis o.. Nucleic Acids R 90 
18(22):6641-66
</UI>
<AU>Stuckle EE;
    Emmrich C;
    Grob U;
    Nielsen PJ
</AU>
<TI>Statistical Analysis of Nucleotide Sequences
</TI>
<SU>Sequence analysis;
    Significance;
    Markov;
    DE;
    Statistical;
    Signal;
    Nucleotide
</SU>
<AB>"In order to scan nucleic acid databases for potentially relevant but as
yet unknown signals, we have developed an improved statistical model for pattern
analysis of nucleic acid sequences by modifying previous methods based on Markov
chains. ... The model allows the simultaneous analysis of several short
sequences with unequal base frequencies and Markov order k ≠ 0 as is usually the
case in databases."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1990</PY>
<VO>18</VO>
<NO>22</NO>
<PP>6641-6647</PP>
</SEQ>

<SEQ>
<UI>0876   Subbiah,S.    A Method for Multiple .. J.Mol.Biol.     89 
209:539-548
</UI>
<AU>Subbiah S;
    Harrison SC
</AU>
<TI>A Method for Multiple Sequence Alignment with Gaps
</TI>
<SU>Multiple alignment;
    Clustering;
    USA;
    Sequence alignment;
    Gap;
    Needleman-Wunsch;
    Optimal
</SU>
<AB>"A method that performs multiple sequence alignment by cyclical use of the
standard pairwise Needleman-Wunsch algorithm is presented. ... Comparison with
the one known case where the optimal multiple sequence alignment has been
rigorously determined shows that in practice the proposed method finds the
mathematically optimal solution."
</AB>
<JT>J Mol Biol</JT>
<PY>1989</PY>
<VO>209</VO>
<PP>539-548</PP>
</SEQ>

<SEQ>
<UI>0877   Suboch,G.M.   Statistical Significan.. Comput.Appl.Bio 90 
6(1):43-48
</UI>
<AU>Suboch GM;
    Sprizhitsky YA
</AU>
<TI>Statistical Significance of Some Complex Nucleotide Combinations: A
Comparison of DNA Models
</TI>
<SU>Sequence analysis;
    Significance;
    Markov;
    RU;
    Statistical;
    DNA;
    Nucleotide;
    Model
</SU>
<AB>"DNA is modelled as a sequence of nucleotide runs. This model is shown to
provide a more adequate description of the observed frequencies of occurrence of
local homopurine-homopyrimidine mirror repeats than the second-order homogeneous
Markov chain model."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1990</PY>
<VO>6</VO>
<NO>1</NO>
<PP>43-48</PP>
</SEQ>

<SEQ>
<UI>0878   Sunday,D.M.   A Very Fast Substring .. Comm.ACM        90 
33(8):132-142
</UI>
<AU>Sunday DM
</AU>
<TI>A Very Fast Substring Search Algorithm
</TI>
<SU>String match;
    USA;
    Algorithm
</SU>
<AB>"This article describes a substring search algorithm that is faster than
the Boyer-Moore algorithm. This algorithm does not depend on scanning the
pattern string in any particular order. Three variations of the algorithm are
given that use three different pattern scan orders."
</AB>
<JT>Comm ACM </JT>
<PY>1990</PY>
<VO>33</VO>
<NO>8</NO>
<PP>132-142</PP>
</SEQ>

<SEQ>
<UI>0879   Swofford,D.L. Phylogeny Reconstruction Molecular Sys.. 90Sinauer 
Associa
</UI>
<AU>Swofford DL;
    Olsen GJ
</AU>
<TI>Phylogeny Reconstruction
</TI>
<ED>Hillis DM
    Moritz C
</ED>
<BK>Molecular Systematics
</BK>
<SU>Multiple alignment;
    Phylogeny;
    Evolutionary tree;
    Character data;
    Parsimony;
    USA
</SU>
<AB>"Inferring phylogenetic relationships from molecular data requires the
selection of an appropriate method from the many techniques that have been
described. Unfortunately, phylogenetic analysis is frequently treated as a black
box into which data are fed and out of which 'The Tree' springs. Our goal in
this chapter is to provide more than a cursory description of the available
analytical methods; rather, we hope to develop a conceptual framework for
understanding the theoretical and practical distinctions among alternative
methodologies."
</AB>
<PU>Sinauer Associates </PU>
<PL>Sunderland, MA </PL>
<PY>1990</PY>
<PP>411-501</PP>
</SEQ>

<SEQ>
<UI>0880   Tajima,F.     Determination of Windo.. J.Mol.Evol.     91 
33:470-473
</UI>
<AU>Tajima F
</AU>
<TI>Determination of Window Size for Analyzing DNA Sequences
</TI>
<SU>Sequence analysis;
    Significance;
    JP;
    Region;
    DNA
</SU>
<AB>"DNA sequences are generally not random sequences. To show such
nonrandomness visually, DNA sequence data are often plotted as moving averages
for a certain length of window slid along a sequence. Here a simple algorithm is
presented for determining the window size and for finding a nonrandom region of
sequence."
</AB>
<JT>J Mol Evol</JT>
<PY>1991</PY>
<VO>33</VO>
<PP>470-473</PP>
</SEQ>

<SEQ>
<UI>0881   Tajima,K.     A New Multiple Alignme.. J.Protein Chem. 88 
7(3):292-293
</UI>
<AU>Tajima K
</AU>
<TI>A New Multiple Alignment Algorithm for Protein and DNA Sequences Based on
Vector-Scalar Matching
</TI>
<SU>Multiple alignment;
    Dynamic programming;
    Hardware;
    JP;
    Gap;
    Protein;
    DNA;
    Algorithm
</SU>
<AB>"The proposed new algorithm has two new features. First, the multiple
alignment containing gaps is treated as a vector sequence. ... Second, vector
and scalar sequences are compared using a dynamic programming approach. This
vector-scalar matching enables us to align sequences globally."
</AB>
<JT>J Protein Chem</JT>
<PY>1988</PY>
<VO>7</VO>
<NO>3</NO>
<PP>292-293</PP>
</SEQ>

<SEQ>
<UI>0882   Tajima,K.     Multiple DNA and Prote.. Comput.Appl.Bio 88 
4(4):467-471
</UI>
<AU>Tajima K
</AU>
<TI>Multiple DNA and Protein Sequence Alignment on a Workstation and a
Supercomputer
</TI>
<SU>Multiple alignment;
    Evolutionary tree;
    JP;
    Sequence alignment;
    Protein;
    DNA
</SU>
<AB>This multiple sequence alignment "method is based on the alignment of a
set of aligned sequences with the new sequence, and uses a recursive procedure
of such alignment. ... In this paper we describe the method of multiple
alignment based on a phylogenetic tree and its application to ... protein and
DNA sequences with the use of a workstation and supercomputer."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1988</PY>
<VO>4</VO>
<NO>4</NO>
<PP>467-471</PP>
</SEQ>

<SEQ>
<UI>0883   Takaoka,T.    An On-line Pattern Mat.. Inform.Process. 86 
22(6):329-330
</UI>
<AU>Takaoka T
</AU>
<TI>An On-line Pattern Matching Algorithm
</TI>
<SU>String match;
    Knuth-Morris-Pratt;
    JP;
    Pattern match;
    Parallel;
    On-line;
    Algorithm
</SU>
<AB>The Boyer-Moore and Knuth-Morris-Pratt "algorithms are off-line ones in
the sense that after the pattern is input, the actual matching algorithm runs.
... The present short article gives an algorithm for on-line pattern matching
which does pattern matching in parallel with the action of reading input
symbols."
</AB>
<JT>Inform Process Lett</JT>
<PY>1986</PY>
<VO>22</VO>
<NO>6</NO>
<PP>329-330</PP>
</SEQ>

<SEQ>
<UI>0884   Tanaka,E.     A High-speed String Co.. IEEE Trans.Patt 87 
9(6):806-815
</UI>
<AU>Tanaka E;
    Kojima Y
</AU>
<TI>A High-speed String Correction Method Using a Hierarchical File
</TI>
<SU>Sequence comparison;
    Correction;
    JP;
    Hierarchical
</SU>
<AB>"We proposed a multistage hierarchical string correction method for large
vocabulary. The lower bound of computational labor is estimated, and it is shown
that a multistage string correction method using a special type of a
hierarchical file can reduce computational labor greatly. ... Another
application of this technique is a search for approximate matches in a large
file."
</AB>
<JT>IEEE Trans Patt Anal Mach Intell</JT>
<PY>1987</PY>
<VO>9</VO>
<NO>6</NO>
<PP>806-815</PP>
</SEQ>

<SEQ>
<UI>0885   Tarhio,J.     Approximate Boyer-Moor.. SIAM J.Comput.  93 
22(2):243-260
</UI>
<AU>Tarhio J;
    Ukkonen E
</AU>
<TI>Approximate Boyer-Moore String Matching
</TI>
<SU>Boyer-Moore;
    Match with k mismatches;
    Match with k differences;
    FI;
    String match
</SU>
<AB>"The Boyer-Moore idea applied in exact string matching is generalized to
approximate string matching. Two versions of the problem are considered. ... [k
mismatches and k differences.] ... The new algorithms are often significantly
faster than the old ones. Both algorithms are functionally equivalent with the
Horspool version of the Boyer-Moore algorithm when k = 0."
</AB>
<JT>SIAM J Comput</JT>
<PY>1993</PY>
<VO>22</VO>
<NO>2</NO>
<PP>243-260</PP>
</SEQ>

<SEQ>
<UI>0886   Taylor,P.     A Fast Homology Progra.. Nucleic Acids R 84 
12(1):447-455
</UI>
<AU>Taylor P
</AU>
<TI>A Fast Homology Program for Aligning Biological Sequences
</TI>
<SU>Pairwise alignment;
    UK;
    Gap;
    Program;
    Evolutionary distance;
    Homology
</SU>
<AB>This paper describes improved algorithms for computing the alignment
(evolutionary distance and optimal path) of a pair of sequences subject to
constraints on the form of the gap weighting function. Compare with Gotoh (1982)
and Waterman, Smith, and Beyer (1976)
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1984</PY>
<VO>12</VO>
<NO>1</NO>
<PP>447-455</PP>
</SEQ>

<SEQ>
<UI>0887   Taylor,P.     A New Method for Findi.. Comput.Appl.Bio 91 
7(4):495-500
</UI>
<AU>Taylor P;
    Rosenberg P;
    Samsonova MG
</AU>
<TI>A New Method for Finding Long Consensus Patterns in Nucleic Acid 
Sequences
</TI>
<SU>Consensus sequence;
    UK;
    Nucleic acid
</SU>
<AB>"We describe a fast computer algorithm for identifying consensus patterns
in DNA sequences. The method requires no prior assumptions about the consensus
pattern other than its length. ... [It permits] the analysis of long sequences
for consensus patterns of up to 16 bases."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1991</PY>
<VO>7</VO>
<NO>4</NO>
<PP>495-500</PP>
</SEQ>

<SEQ>
<UI>0888   Taylor,W.R.   Identification of Prot.. J.Mol.Biol.     86 
188:233-258
</UI>
<AU>Taylor WR
</AU>
<TI>Identification of Protein Sequence Homology by Consensus Template
Alignment
</TI>
<SU>Multiple alignment;
    Pattern match;
    UK;
    Sequence alignment;
    Consensus sequence;
    Identification;
    Template;
    Homology;
    Protein
</SU>
<AB>A multiple sequence alignment algorithm which (like Bains 1986) relies on
the iterative definition of a 'consensus' sequence that determines the register
of all the sequences considered. Interplay between consensus sequence and
multiple alignment
</AB>
<JT>J Mol Biol</JT>
<PY>1986</PY>
<VO>188</VO>
<PP>233-258</PP>
</SEQ>

<SEQ>
<UI>0889   Taylor,W.R.   Multiple Sequence Alig.. Comput.Appl.Bio 87 
3(2):81-87
</UI>
<AU>Taylor WR
</AU>
<TI>Multiple Sequence Alignment by a Pairwise Algorithm
</TI>
<SU>Multiple alignment;
    Clustering;
    UK;
    Sequence alignment;
    Algorithm
</SU>
<AB>"An algorithm is described that processes the results of a conventional
pairwise sequence alignment program to automatically produce an unambiguous
multiple alignment of many sequences. Unlike other, more complex, multiple
alignment programs, the method described here is fast enough to be used on
almost any multiple sequence alignment problem."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1987</PY>
<VO>3</VO>
<NO>2</NO>
<PP>81-87</PP>
</SEQ>

<SEQ>
<UI>0890   Taylor,W.R.   A Flexible Method to A.. J.Mol.Evol.     88 
28:161-169
</UI>
<AU>Taylor WR
</AU>
<TI>A Flexible Method to Align Large Numbers of Biological Sequences
</TI>
<SU>Multiple alignment;
    UK;
    Consensus sequence;
    Clustering
</SU>
<AB>"A method for the alignment of two or more biological sequences is
described. The method is a direct extension of the method of Taylor (1987)
incorporating a consensus sequence approach and allows considerable freedom in
the control of the clustering of the sequences." Interplay between consensus
sequence and multiple alignment
</AB>
<JT>J Mol Evol</JT>
<PY>1988</PY>
<VO>28</VO>
<PP>161-169</PP>
</SEQ>

<SEQ>
<UI>0891   Taylor,W.R.   Pattern Matching Metho.. Protein Eng.    88 
2(2):77-86
</UI>
<AU>Taylor WR
</AU>
<TI>Pattern Matching Methods in Protein Sequence Comparison and Structure
Prediction
</TI>
<SU>Match a pattern matrix;
    Review;
    UK;
    Pattern match;
    Sequence comparison;
    Structure;
    Dynamic programming;
    Protein;
    Prediction
</SU>
<AB>Review of template based methods, dynamic programming based methods, and
fragment based methods
</AB>
<JT>Protein Eng</JT>
<PY>1988</PY>
<VO>2</VO>
<NO>2</NO>
<PP>77-86</PP>
</SEQ>

<SEQ>
<UI>0892   Taylor,W.R.   A Template Based Metho.. Progress in Bio 89 
54:159-252
</UI>
<AU>Taylor WR
</AU>
<TI>A Template Based Method of Pattern Matching in Protein Sequences
</TI>
<SU>Match complex patterns;
    Pattern match;
    UK;
    Structure;
    Template;
    Protein
</SU>
<AB>"The following sections of the current work describe the 'Template'
program of Taylor (1986a) and some of its applications." Template Method:
Specification and Matching of Simple Patterns. Combinatorics, Template
Interactions and Domain Recognition. Secondary Structure Prediction. Match Sets,
Multiple Sequences and Sources of Patterns
</AB>
<JT>Progress in Biophysics and Molecular Biology </JT>
<PY>1989</PY>
<VO>54</VO>
<PP>159-252</PP>
</SEQ>

<SEQ>
<UI>0893   Taylor,W.R.   Hierarchical Method to.. Methods Enzymol 90 
183:456-474
</UI>
<AU>Taylor WR
</AU>
<TI>Hierarchical Method to Align Large Numbers of Biological Sequences
</TI>
<SU>Multiple alignment;
    Clustering;
    UK;
    Hierarchical
</SU>
<AB>"In this chapter I describe the computer program that resulted in response
to my own need to align more than two protein sequences."
</AB>
<JT>Methods Enzymol</JT>
<PY>1990</PY>
<VO>183</VO>
<PP>456-474</PP>
</SEQ>

<SEQ>
<UI>0894   Taylor,W.R.   Templates, Consensus P.. Curr.Opin.Struc 91 
1:327-333
</UI>
<AU>Taylor WR;
    Jones DT
</AU>
<TI>Templates, Consensus Patterns and Motifs
</TI>
<SU>Consensus sequence;
    Pattern match;
    Review;
    UK;
    Motif;
    Structure;
    Template
</SU>
<AB>"Current methods in pattern and consensus-sequence matching are reviewed.
Attention is focused on those studies in which these methods have been applied
to either known structures or structure prediction, including some applications
that use machine learning and artificial intelligence."
</AB>
<JT>Curr Opin Struct Biol</JT>
<PY>1991</PY>
<VO>1</VO>
<PP>327-333</PP>
</SEQ>

<SEQ>
<UI>0895   Thompson,K.   Regular Expression Sea.. Comm.ACM        68 
11(6):419-422
</UI>
<AU>Thompson K
</AU>
<TI>Regular Expression Search Algorithm
</TI>
<SU>Match complex patterns;
    Automata;
    Language;
    USA;
    Expression;
    Signal;
    Algorithm
</SU>
<AB>"A method for locating specific character strings embedded in character
text is described and an implementation of this method in the form of a compiler
is discussed. ... The object program then accepts the text to be searched as
input and produces a signal every time an embedded string in the text matches
the given regular expression."
</AB>
<JT>Comm ACM </JT>
<PY>1968</PY>
<VO>11</VO>
<NO>6</NO>
<PP>419-422</PP>
</SEQ>

<SEQ>
<UI>0896   Thorne,J.L.   Freeing Phylogenies fr.. Mol.Biol.Evol.  92 
9(6):1148-1162
</UI>
<AU>Thorne JL;
    Kishino H
</AU>
<TI>Freeing Phylogenies from Artifacts of Alignment
</TI>
<SU>Multiple alignment;
    Phylogeny;
    Likelihood;
    USA;
    Evolutionary distance;
    Evolutionary tree
</SU>
<AB>"Widely used methods for phylogenetic inference, both those that require
and those that produce alignments, share certain weaknesses. ... A method that
lacks them is introduced. For each pair of sequences in the data set, the method
utilizes both insertion-deletion and amino acid replacement information to
estimate a pairwise evolutionary distance. ... The distance matrix and standard
error estimates are used to infer a phylogenetic tree."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1992</PY>
<VO>9</VO>
<NO>6</NO>
<PP>1148-1162</PP>
</SEQ>

<SEQ>
<UI>0897   Thorne,J.L.   An Evolutionary Model .. J.Mol.Evol.     91 
33(2):114-124
</UI>
<AU>Thorne JL;
    Kishino H;
    Felsenstein J
</AU>
<TI>An Evolutionary Model for Maximum Likelihood Alignment of DNA Sequences
</TI>
<SU>Pairwise alignment;
    Likelihood;
    Dynamic programming;
    USA;
    Statistical;
    DNA;
    Model
</SU>
<AB>"Most algorithms for the alignment of biological sequences are not derived
from an evolutionary model. Consequently, these alignment algorithms lack a
strong statistical basis. A maximum likelihood method for the alignment of two
DNA sequences is presented. This method is based upon a statistical model of DNA
sequence evolution for which we have obtained explicit transition
probabilities."
</AB>
<JT>J Mol Evol</JT>
<PY>1991</PY>
<VO>33</VO>
<NO>2</NO>
<PP>114-124</PP>
</SEQ>

<SEQ>
<UI>0898   Thorne,J.L.   Inching Toward Reality.. J.Mol.Evol.     92 
34(1):3-16
</UI>
<AU>Thorne JL;
    Kishino H;
    Felsenstein J
</AU>
<TI>Inching Toward Reality: An Improved Likelihood Model of Sequence 
Evolution
</TI>
<SU>Pairwise alignment;
    Likelihood;
    Dynamic programming;
    USA;
    Substitution;
    Evolution;
    Model
</SU>
<AB>"Our previous evolutionary model [1991] is generalized to permit
approximate treatment of multiple-base insertions and deletions as well as
regional heterogeneity of substitution rates. Parameter estimation and alignment
procedures that incorporate these generalizations are developed."
</AB>
<JT>J Mol Evol</JT>
<PY>1992</PY>
<VO>34</VO>
<NO>1</NO>
<PP>3-16</PP>
</SEQ>

<SEQ>
<UI>0899   Tichy,W.F.    The String-to-string C.. ACM Trans.Compu 84 
2(4):309-321
</UI>
<AU>Tichy WF
</AU>
<TI>The String-to-string Correction Problem with Block Moves
</TI>
<SU>Pairwise alignment;
    USA;
    Correction
</SU>
<AB>"An algorithm that produces the shortest edit sequence transforming one
string into another is presented. The algorithm is optimal in the sense that it
generates a minimal covering set of common substrings of one string with respect
to another. Two improvements of the basic algorithm are developed. ... The block
move algorithm ... runs in linear time and space."
</AB>
<JT>ACM Trans Comput Systems </JT>
<PY>1984</PY>
<VO>2</VO>
<NO>4</NO>
<PP>309-321</PP>
</SEQ>

<SEQ>
<UI>0900   Timkovskii,V. Complexity of Common S.. Cybernetics     90 
25(5):565-580
</UI>
<AU>Timkovskii VG
</AU>
<TI>Complexity of Common Subsequence and Supersequence Problems and Related
Problems
</TI>
<SU>Longest common;
    Supersequence;
    Complexity;
    RU;
    Subsequence
</SU>
<AB>Translated from Kibernetika, No. 5, pp. 1-13, September-October, 1989. "In
this paper, we consider old and new polynomial-time and NP-hard problems of
finding longest common subsequences and subwords and shortest common
supersequences and superwords .... The results provide a more complete
characterization of the complexity of these problems. We also discuss the dual
problems ...."
</AB>
<JT>Cybernetics </JT>
<PY>1990</PY>
<VO>25</VO>
<NO>5</NO>
<PP>565-580</PP>
</SEQ>

<SEQ>
<UI>0901   Tyler,E.C.    A Review of Algorithms.. Comput.Biomed.R 91 
24(1):72-96
</UI>
<AU>Tyler EC;
    Horton MR;
    Krause PR
</AU>
<TI>A Review of Algorithms for Molecular Sequence Comparison
</TI>
<SU>Pairwise comparison;
    Review;
    USA;
    Sequence comparison;
    Algorithm
</SU>
<AB>"Most computer analyses of nucleic acid and protein sequences depend on
comparisons between sequences. ... This paper reviews algorithms currently in
use to solve comparison problems in molecular biology. Each algorithm is
explained in detail and discussed in terms of the molecular biology problems it
is most suited to solve."
</AB>
<JT>Comput Biomed Res</JT>
<PY>1991</PY>
<VO>24</VO>
<NO>1</NO>
<PP>72-96</PP>
</SEQ>

<SEQ>
<UI>0902   Tyson,H.      Alignment of Nucleotid.. Comput.Methods  85 21:3-10
</UI>
<AU>Tyson H;
    Haley B
</AU>
<TI>Alignment of Nucleotide or Amino Acid Sequences on Microcomputers, Using a
Modification of Sellers' (1974) Algorithm which Avoids the Need for Calculation
of the Complete Distance Matrix
</TI>
<SU>Pairwise alignment;
    CA;
    Gap;
    Distance;
    Amino acid;
    Nucleotide;
    Algorithm;
    Matrix
</SU>
<AB>"The Sellers algorithm for calculating distance between sequences has been
modified to reduce its demands on microcomputer memory space by more than half.
Gap penalties and mismatch scores are user-adjustable."
</AB>
<JT>Comput Methods Programs Biomed</JT>
<PY>1985</PY>
<VO>21</VO>
<PP>3-10</PP>
</SEQ>

<SEQ>
<UI>0903   Ukkonen,E.    On Approximate String .. Lecture Notes i 83 
158:487-495
</UI>
<AU>Ukkonen E
</AU>
<TI>On Approximate String Matching
</TI>
<SU>Pairwise alignment;
    FI;
    Edit;
    String match
</SU>
<AB>In Foundations of Computation Theory, Proceedings of the 1983
International FCT-Conference, Borgholm, Sweden, August 21-27, 1983. "An
algorithm is given for computing the edit distance as well as the corresponding
sequence of editing steps ... between two strings .... The algorithm needs time
O(s min(m, n)) and space O(s^2) where s is the edit distance .... For small s
this is a considerable improvement over the best previously known algorithm that
needs time and space O(mn)."
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>1983</PY>
<VO>158</VO>
<PP>487-495</PP>
</SEQ>

<SEQ>
<UI>0904   Ukkonen,E.    Algorithms for Approxi.. Inform.Control  85 
64:100-118
</UI>
<AU>Ukkonen E
</AU>
<TI>Algorithms for Approximate String Matching
</TI>
<SU>Pairwise alignment;
    FI;
    String match;
    Algorithm
</SU>
<AB>This paper is a revised and expanded version of Ukkonen (1983). The author
develops an improved algorithm to compute the Levenshtein distance between a
pair of strings
</AB>
<JT>Inform Control (Orlando) </JT>
<PY>1985</PY>
<VO>64</VO>
<PP>100-118</PP>
</SEQ>

<SEQ>
<UI>0905   Ukkonen,E.    Finding Approximate Pa.. J.Algorithms    85 
6(1):132-137
</UI>
<AU>Ukkonen E
</AU>
<TI>Finding Approximate Patterns in Strings
</TI>
<SU>Match with k differences;
    Automata;
    FI;
    Edit
</SU>
<AB>"Let p (the pattern) be a string and t &gt;= 0 an integer. The problem of
locating in any string a substring whose edit distance from p is at most a given
constant t is considered. An algorithm is presented to construct a deterministic
finite-state automaton that solves the problem."
</AB>
<JT>J Algorithms </JT>
<PY>1985</PY>
<VO>6</VO>
<NO>1</NO>
<PP>132-137</PP>
</SEQ>

<SEQ>
<UI>0906   Ulam,S.M.     Some Combinatorial Pro.. Applications .. 72Academic 
Press
</UI>
<AU>Ulam SM
</AU>
<TI>Some Combinatorial Problems Studied Experimentally on Computing Machines
</TI>
<ED>Zaremba SK
</ED>
<BK>Applications of Number Theory to Numerical Analysis
</BK>
<SU>Sequence proximity;
    USA;
    Coding;
    Combinatorial
</SU>
<AB>"Two classes of problems are discussed. In the first group the main
questions concern the behavior of sequences of symbols, coding physical or
biological properties. A fundamental question concerns the notion of a distance
... in the spaces of such sequences."
</AB>
<PU>Academic Press </PU>
<PL>New York </PL>
<PY>1972</PY>
<PP>1-10</PP>
</SEQ>

<SEQ>
<UI>0907   Ulam,S.M.     Some Ideas and Prospec.. Annu.Rev.Biophy 72 
1:277-292
</UI>
<AU>Ulam SM
</AU>
<TI>Some Ideas and Prospects in Biomathematics
</TI>
<SU>Sequence proximity;
    USA;
    Codon
</SU>
<AB>"A quantitative treatment of problems of morphology could employ the
notion of a distance as a measure of difference between the elements of a set
that constitutes the object of a study. ... We shall give here several examples.
A fundamental one is the set of codons in a DNA chain."
</AB>
<JT>Annu Rev Biophys Bioeng</JT>
<PY>1972</PY>
<VO>1</VO>
<PP>277-292</PP>
</SEQ>

<SEQ>
<UI>0908   Ullmann,J.R.  A Binary N-gram Techni.. Comput.J.       77 
20(2):141-147
</UI>
<AU>Ullmann JR
</AU>
<TI>A Binary N-gram Technique for Automatic Correction of Substitution,
Deletion, Insertion and Reversal Errors in Words
</TI>
<SU>Match with k differences;
    N-gram;
    Correction;
    UK;
    Coding;
    Error;
    Substitution;
    Word;
    Reversal;
    Deletion
</SU>
<AB>"This paper offers three basic contributions to n-gram technology. First,
a method of reducing storage requirements by random superimposed coding. Second,
an n-gram method for finding all dictionary words that differ from a given word
by up to two errors. Third, an n-gram method for correcting up to two
substitution, insertion, deletion and reversal errors without doing a separate
computation for every possible pair of errors."
</AB>
<JT>Comput J</JT>
<PY>1977</PY>
<VO>20</VO>
<NO>2</NO>
<PP>141-147</PP>
</SEQ>

<SEQ>
<UI>0909   Dayhoff,M.O.  Computer Analysis of P.. Sci.Am.         69 
221(1):86-95
</UI>
<AU>Dayhoff MO
</AU>
<TI>Computer Analysis of Protein Evolution
</TI>
<SU>Phylogeny;
    USA;
    Evolution;
    Protein
</SU>
<AB>"Amino acid sequences of similar proteins in different organisms contain
information on relations among species. This information is analyzed to
reconstruct in detail the history of living things."
</AB>
<JT>Sci Am</JT>
<PY>1969</PY>
<VO>221</VO>
<NO>1</NO>
<PP>86-95</PP>
</SEQ>

<SEQ>
<UI>0910   Crochemore,M. String-Matching on Ord.. Theoret.Comput. 92 92:33-47
</UI>
<AU>Crochemore M
</AU>
<TI>String-Matching on Ordered Alphabets
</TI>
<SU>String match;
    Regularities;
    FR;
    Complexity
</SU>
<AB>"We present a new string-matching algorithm that exploits an ordering of
the alphabet. The algorithm is linear in time and uses a fixed number of memory
locations in addition to the text and the pattern. Therefore, it is time-space-
optimal. Its main characteristic is that it scans the pattern from left to
right. No preprocessing of the pattern is needed and the complexity is
independent of the size of the pattern. An important consequence is the
possibility of computing the periods of a word in linear time and constant
space. The algorithm can also be turned into a real-time string-matching
algorithm."
</AB>
<JT>Theoret Comput Sci</JT>
<PY>1992</PY>
<VO>92</VO>
<PP>33-47</PP>
</SEQ>

<SEQ>
<UI>0911   van der Woude Playing with Patterns,.. Sci.Comput.Prog 89 
12(3):177-190
</UI>
<AU>van der Woude J
</AU>
<TI>Playing with Patterns, Searching for Strings
</TI>
<SU>String match;
    Knuth-Morris-Pratt;
    Regularities;
    NL;
    Pattern search
</SU>
<AB>"We present [an] exercise that is especially interesting for problems
dealing with periodicity. In particular it enables us to treat preprocessing and
search in the Knuth-Morris-Pratt pattern search algorithm as a unit. The main
objective of this paper is the design, not the algorithm(s). ... Driven by
correctness arguments we calculate the algorithm."
</AB>
<JT>Sci Comput Programming </JT>
<PY>1989</PY>
<VO>12</VO>
<NO>3</NO>
<PP>177-190</PP>
</SEQ>
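
<!--
A minimal Python sketch, added here as an illustration and not van der Woude's
calculational derivation, of one way to treat KMP preprocessing and search as a
single computation: the border (failure) function of pattern + separator + text
reaches the pattern length exactly where an occurrence ends. The separator
character and the names `borders` and `kmp_search` are assumptions; the separator
must occur in neither string.

```python
def borders(s):
    """b[i] = length of the longest proper border of s[:i + 1]."""
    b = [0] * len(s)
    k = 0
    for i in range(1, len(s)):
        while k > 0 and s[i] != s[k]:
            k = b[k - 1]          # fall back to the next shorter border
        if s[i] == s[k]:
            k += 1
        b[i] = k
    return b


def kmp_search(text, pattern, sep="\0"):
    """Start indices of every occurrence of a non-empty pattern in text.

    Preprocessing and search are one pass of `borders` over
    pattern + sep + text; sep is assumed to occur in neither string.
    """
    m = len(pattern)
    b = borders(pattern + sep + text)
    # A border of length m ending at index i of the combined string marks an
    # occurrence ending at text index i - m - 1, i.e. starting at i - 2*m.
    return [i - 2 * m for i, length in enumerate(b) if length == m]


print(kmp_search("abababca", "abab"))  # [0, 2]
```

Because the separator cannot be part of any border, no border in the text
region can exceed the pattern length, which is what lets one function do both jobs.
-->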

<SEQ>
<UI>0912   van Heel,M.   A New Family of Powerf.. J.Mol.Biol.     91 
220(4):877-887
</UI>
<AU>van Heel M
</AU>
<TI>A New Family of Powerful Multivariate Statistical Sequence Analysis
Techniques
</TI>
<SU>Sequence analysis;
    Invariant;
    DE;
    Multivariate;
    Statistical
</SU>
<AB>"A novel multivariate statistical approach is presented for extracting and
exploiting intrinsic information present in our ever-growing sequence data
banks. The information extraction from the sequences avoids the pitfalls of
intersequence alignment by analyzing secondary invariant functions derived from
the sequences in the data bank rather than the sequences themselves. ... The ...
principles can be used for a wide spectrum of sequence analysis problems ...."
</AB>
<JT>J Mol Biol</JT>
<PY>1991</PY>
<VO>220</VO>
<NO>4</NO>
<PP>877-887</PP>
</SEQ>

<SEQ>
<UI>0913   Venezia,D.    Rapid Motif Compliance.. Comput.Appl.Bio 93 
9(1):65-69
</UI>
<AU>Venezia D;
    O'Hara PJ
</AU>
<TI>Rapid Motif Compliance Scoring with Match Weight Sets
</TI>
<SU>Match complex patterns;
    Motif;
    Language;
    USA;
    Expression;
    Scoring
</SU>
<AB>"The program MOTIF incorporates a weight matrix and a rapid, backtracking
tree-search algorithm to score motif compliance with greatly enhanced
performance while placing no constraints on the motif. ... MOTIF allows a choice
of regular expression formats and can use both motif and sequence libraries as
either targets or queries."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1993</PY>
<VO>9</VO>
<NO>1</NO>
<PP>65-69</PP>
</SEQ>

<SEQ>
<UI>0914   Vihinen,M.    An Algorithm for Simul.. Comput.Appl.Bio 88 
4(1):89-92
</UI>
<AU>Vihinen M
</AU>
<TI>An Algorithm for Simultaneous Comparison of Several Sequences
</TI>
<SU>Multiple comparison;
    Dot;
    FI;
    Region;
    Algorithm
</SU>
<AB>A dot matrix approach. "Conserved regions of one sequence are located by
doing pairwise comparisons with other sequences .... The observation matrices
filled with scores of comparisons are superimposed and added together and those
points having values greater than or equal to stringency are accepted."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1988</PY>
<VO>4</VO>
<NO>1</NO>
<PP>89-92</PP>
</SEQ>
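
<!--
A rough Python sketch of the superposition step described in the abstract,
assuming identity scoring (1 for an identical residue, 0 otherwise, no window)
and assuming the comparison sequences share one length (for example, because they
are pre-aligned) so their matrices line up cell by cell. The function name and
the scoring scheme are hypothetical, not taken from the author's program.

```python
def superimposed_dots(query, others, stringency):
    """Cells (i, j) whose summed identity score over all comparisons
    reaches `stringency`.

    Assumes every sequence in `others` has the same length (e.g. pre-aligned),
    so the individual comparison matrices can be superimposed cell by cell.
    """
    width = len(others[0])
    if any(len(s) != width for s in others):
        raise ValueError("comparison sequences must share one length")
    total = [[0] * width for _ in query]
    for other in others:                  # one pairwise comparison per sequence
        for i, a in enumerate(query):
            for j, b in enumerate(other):
                if a == b:                # identity scoring: 1 for a match
                    total[i][j] += 1      # superimpose by adding the matrices
    return [(i, j)
            for i in range(len(query))
            for j in range(width)
            if total[i][j] >= stringency]


print(superimposed_dots("ACGT", ["ACGA", "TCGT"], 2))  # [(1, 1), (2, 2)]
```

Raising the stringency toward the number of comparison sequences keeps only
points conserved in all of them.
-->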

<SEQ>
<UI>0915   Vihinen,M.    Simultaneous Compariso.. Methods Enzymol 90 
183:447-456
</UI>
<AU>Vihinen M
</AU>
<TI>Simultaneous Comparison of Several Sequences
</TI>
<SU>Multiple comparison;
    Dot;
    FI;
    Region
</SU>
<AB>"The difference between sequence comparison and alignment is that the
former indicates all similarities between the sequences whereas the latter
method aligns the matching bases or residues. ... The comparisons give overall
sequence similarity regardless of alignment. ... A new method to study sequence
similarities by comparing one sequence with another was developed. In this
approach pairwise comparisons of aligned sequences are superimposed to search
conserved regions of the query sequence."
</AB>
<JT>Methods Enzymol</JT>
<PY>1990</PY>
<VO>183</VO>
<PP>447-456</PP>
</SEQ>

<SEQ>
<UI>0916   Vihinen,M.    MULTICOMP: a Program P.. Comput.Appl.Bio 92 
8(1):35-38
</UI>
<AU>Vihinen M;
    Euranto A;
    Luostarinen P;
    Nevalainen O
</AU>
<TI>MULTICOMP: a Program Package for Multiple Sequence Comparison
</TI>
<SU>Multiple comparison;
    Dot;
    FI;
    Sequence comparison;
    Program
</SU>
<AB>"The MULTICOMP program package includes several procedures with which one
query sequence can be compared simultaneously to several DNA, RNA or amino acid
sequences. The same technique was also introduced for comparing propensities of
secondary structural features, which can be predicted on the basis of amino acid
sequences."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1992</PY>
<VO>8</VO>
<NO>1</NO>
<PP>35-38</PP>
</SEQ>

<SEQ>
<UI>0917   Vingron,M.    A Fast and Sensitive M.. Comput.Appl.Bio 89 
5(2):115-121
</UI>
<AU>Vingron M;
    Argos P
</AU>
<TI>A Fast and Sensitive Multiple Sequence Alignment Algorithm
</TI>
<SU>Multiple alignment;
    Segment;
    Sequence weight;
    DE;
    Sequence alignment;
    Algorithm
</SU>
<AB>"A two-step multiple alignment strategy is presented that allows rapid
alignment of a set of homologous sequences and comparison of pre-aligned groups
of sequences."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1989</PY>
<VO>5</VO>
<NO>2</NO>
<PP>115-121</PP>
</SEQ>

<SEQ>
<UI>0918   Vingron,M.    Determination of Relia.. Protein Eng.    90 
3(7):565-569
</UI>
<AU>Vingron M;
    Argos P
</AU>
<TI>Determination of Reliable Regions in Protein Sequence Alignments
</TI>
<SU>Subalignment;
    Significance;
    DE;
    Region;
    Sequence alignment;
    Sequence comparison;
    Protein
</SU>
<AB>"Judging the significance of alignments is still a major problem in
sequence comparison. We present a method to delineate reliable regions within an
alignment. This differs from standard approaches in that it does not attempt to
attribute one significance value to the alignment as a whole, but assesses
alignment quality locally. An algorithm is provided that predicts which residue
pairs in an alignment are likely to be correctly matched."
</AB>
<JT>Protein Eng</JT>
<PY>1990</PY>
<VO>3</VO>
<NO>7</NO>
<PP>565-569</PP>
</SEQ>

<SEQ>
<UI>0919   Vingron,M.    Motif Recognition and .. J.Mol.Biol.     91 
218:33-43
</UI>
<AU>Vingron M;
    Argos P
</AU>
<TI>Motif Recognition and Alignment for Many Sequences by Comparison of
Dot-matrices
</TI>
<SU>Multiple alignment;
    Dot;
    Motif;
    DE;
    Region;
    Sequence alignment;
    Gap;
    Recognition
</SU>
<AB>"We present an algorithm to delineate dot-plot agreement. A novel
procedure ... is developed to identify common patterns and reliably aligned
regions in a set of distantly related sequences. The algorithm finds motifs
independent of input sequence lengths and reduces the dependence on gap
penalties. When sequences share greater similarity, the same approach converts
to a multiple sequence alignment procedure."
</AB>
<JT>J Mol Biol</JT>
<PY>1991</PY>
<VO>218</VO>
<PP>33-43</PP>
</SEQ>

<SEQ>
<UI>0920   Vintsyuk,T.K. Speech Discrimination .. Cybernetics     68 
4(1):52-57
</UI>
<AU>Vintsyuk TK
</AU>
<TI>Speech Discrimination by Dynamic Programming
</TI>
<SU>Pairwise alignment;
    Dynamic programming;
    RU;
    Discrimination;
    Signal;
    Dynamic
</SU>
<AB>Also (Russian) Kibernetika, 4(1), 81-88, 1968. "In our proposed algorithm
for the recognition of words the greatest possible match between the readings of
[a vector of spectral intensity readings] for the unknown signal and for the
standard of its class is achieved .... The recognition of words is carried out
through discrimination of the components of the word and is accomplished by the
method of dynamic programming." Vintsyuk, like Needleman and Wunsch (1970), was
an early user of dynamic programming to compare sequences
</AB>
<JT>Cybernetics </JT>
<PY>1968</PY>
<VO>4</VO>
<NO>1</NO>
<PP>52-57</PP>
</SEQ>
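
<!--
A minimal dynamic time warping sketch in Python, illustrating the kind of
dynamic programming match described here between a sequence of spectral readings
for an unknown signal and a stored class standard. The Euclidean frame distance,
the three-way step pattern and the name `dtw_distance` are assumptions; Vintsyuk's
own recursion, path constraints and normalization may differ.

```python
import math


def dtw_distance(x, y):
    """Smallest cumulative frame distance over monotone warping paths
    that align frame sequence x with frame sequence y."""
    n, m = len(x), len(y)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(x[i - 1], y[j - 1])   # Euclidean frame distance
            # Extend the best of: repeat a y frame, repeat an x frame, advance both.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


unknown = [(0.0,), (1.0,), (2.0,)]
standard = [(0.0,), (0.0,), (1.0,), (2.0,)]
print(dtw_distance(unknown, standard))  # 0.0
```

Classifying an unknown word would then mean picking the class standard with the
smallest warped distance.
-->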

<SEQ>
<UI>0921   Vishkin,U.    Optimal Parallel Patte.. Inform.Control  85 
67:91-113
</UI>
<AU>Vishkin U
</AU>
<TI>Optimal Parallel Pattern Matching in Strings
</TI>
<SU>String match;
    Parallel;
    IL;
    Pattern match;
    Optimal
</SU>
<AB>"Given a text of length n and a pattern of length m, we present a parallel
linear algorithm for finding all occurrences of the pattern in the text. The
algorithm runs in O(n/p) time using any number of p &lt;= n/log m processors on a
concurrent-read concurrent-write parallel random-access-machine."
</AB>
<JT>Inform Control (Orlando) </JT>
<PY>1985</PY>
<VO>67</VO>
<PP>91-113</PP>
</SEQ>

<SEQ>
<UI>0922   Vishkin,U.    Deterministic Sampling.. SIAM J.Comput.  91 
20(1):22-40
</UI>
<AU>Vishkin U
</AU>
<TI>Deterministic Sampling - A New Technique for Fast Pattern Matching
</TI>
<SU>Parallel;
    IL;
    Pattern match;
    Pattern recognition;
    String match;
    Sampling
</SU>
<AB>"... This approach enables the text analysis ... to be performed in O(log* n)
time and optimal speedup on a PRAM. This improves on the previous fastest
optimal speedup result. It also leads to a new serial algorithm for string
matching that runs in linear time including preprocessing. The approach is
expected to be applicable for pragmatic pattern recognition problems."
</AB>
<JT>SIAM J Comput</JT>
<PY>1991</PY>
<VO>20</VO>
<NO>1</NO>
<PP>22-40</PP>
</SEQ>

<SEQ>
<UI>0923   Vogel,H.      Generalization and Sim.. J.Mol.Evol.     78 
10:339-348
</UI>
<AU>Vogel H
</AU>
<TI>Generalization and Simplification of the Moore-Goodman Test for
Significance of Alignment Homologies
</TI>
<SU>Pairwise alignment;
    Significance;
    FR;
    Codon;
    Homology
</SU>
<AB>"A test given by Moore and Goodman (1977) that checks the significance of
a homology between protein sequences is generalized to any type of distance
measure and to any classification of codon pairs or amino acids according to
this measure."
</AB>
<JT>J Mol Evol</JT>
<PY>1978</PY>
<VO>10</VO>
<PP>339-348</PP>
</SEQ>

<SEQ>
<UI>0924   Vogt,G.       Searching for Distantl.. Comput.Appl.Bio 92 
8(1):49-55
</UI>
<AU>Vogt G;
    Argos P
</AU>
<TI>Searching for Distantly Related Protein Sequences in Large Databases by
Parallel Processing on a Transputer Machine
</TI>
<SU>Database search;
    Parallel;
    DE;
    Sequence alignment;
    Protein
</SU>
<AB>"AliMac is an implementation of a sensitive sequence alignment algorithm
on a parallel computer. The method achieves reliable alignments for very
distantly related sequences from a combined use of amino acid exchange weights
and physicochemical characteristics. ... This paper describes the AliMac
hardware and software and discusses problems and peculiarities of parallel
implementations, especially with transputers."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1992</PY>
<VO>8</VO>
<NO>1</NO>
<PP>49-55</PP>
</SEQ>

<SEQ>
<UI>0925   von Heijne,G. Computer Analysis of D.. Eur.J.Biochem.  91 
199:253-256
</UI>
<AU>von Heijne G
</AU>
<TI>Computer Analysis of DNA and Protein Sequences
</TI>
<SU>Sequence analysis;
    Review;
    Sequence database;
    Motif;
    Sequence alignment;
    Neural;
    SWE;
    Protein;
    DNA
</SU>
<AB>"Some recent trends in the development of theoretical methods for DNA and
protein sequence analysis are reviewed, with particular emphasis on the design
of new databases, motif searches, sequence alignment algorithms and applications
of neural networks." Sensitivity. Speed. Multiple alignments
</AB>
<JT>Eur J Biochem</JT>
<PY>1991</PY>
<VO>199</VO>
<PP>253-256</PP>
</SEQ>

<SEQ>
<UI>0926   Wagner,R.A.   Order-n Correction for.. Comm.ACM        74 
17(5):265-268
</UI>
<AU>Wagner RA
</AU>
<TI>Order-n Correction for Regular Languages
</TI>
<SU>Correction;
    Language;
    Automata;
    USA
</SU>
<AB>"A method is presented for calculating a string B, belonging to a given
regular language L, which is 'nearest' (in number of edit operations) to a given
input string a. B is viewed as a reasonable 'correction' for the possibly
erroneous string a, where a was originally intended to be a string of L."
</AB>
<JT>Comm ACM </JT>
<PY>1974</PY>
<VO>17</VO>
<NO>5</NO>
<PP>265-268</PP>
</SEQ>

<SEQ>
<UI>0927   Wagner,R.A.   On the Complexity of t.. ACM Sympos.Theo 75 
7:218-223
</UI>
<AU>Wagner RA
</AU>
<TI>On the Complexity of the Extended String-to-string Correction Problem
</TI>
<SU>Pairwise alignment;
    Complexity;
    USA;
    Correction
</SU>
<AB>Albuquerque, NM, 5-7 May 1975. "The Extended String-to-String Correction
Problem (ESSCP) is defined as the problem of determining, for given strings A
and B over alphabet V, a minimum-cost sequence S of edit operations such that
S(A) = B. The sequence S may make use of the operations Change, Insert, Delete
and Swap .... Thus, 'almost all' ESSCPs can be solved in deterministic
polynomial time, but the general problem is NP-complete."
</AB>
<JT>ACM Sympos Theory Comput</JT>
<PY>1975</PY>
<VO>7</VO>
<PP>218-223</PP>
</SEQ>
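
<!--
A Python sketch of a deliberately restricted case, assuming unit costs for
Change, Insert, Delete and Swap, and assuming a swapped pair of adjacent
characters is never edited again (often called the optimal string alignment
distance). This restriction keeps the recurrence polynomial; it does not solve
the general ESSCP with arbitrary relative costs, which the abstract reports to be
NP-complete. The name `osa_distance` is hypothetical.

```python
def osa_distance(a, b):
    """Unit-cost Change/Insert/Delete/Swap distance, restricted so that no
    character is edited again after taking part in a swap."""
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i                       # delete all of a[:i]
    for j in range(m + 1):
        d[0][j] = j                       # insert all of b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            change = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,            # Delete
                          d[i][j - 1] + 1,            # Insert
                          d[i - 1][j - 1] + change)   # Change (or keep)
            if (i > 1 and j > 1 and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)   # Swap
    return d[n][m]


print(osa_distance("ab", "ba"))   # 1
print(osa_distance("ca", "abc"))  # 3 under the restriction
```

The second call shows the cost of the restriction: swapping "ca" to "ac" and then
inserting "b" between the swapped characters would need only 2 operations, but
that edits a swapped pair again, which this recurrence does not allow.
-->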

<SEQ>
<UI>0928   Wagner,R.A.   On the Complexity of t.. Time Warps, S.. 
83Addison-Wesley
</UI>
<AU>Wagner RA
</AU>
<TI>On the Complexity of the Extended String-to-string Correction Problem
</TI>
<ED>Sankoff D
    Kruskal JB
</ED>
<BK>Time Warps, String Edits, and Macromolecules: The Theory and Practice of
Sequence Comparison
</BK>
<SU>Pairwise alignment;
    Complexity;
    USA;
    Correction
</SU>
<AB>Permits insert, replace, delete, and swap operations. "In the present
chapter we analyze this algorithm and investigate its running time with respect
to the length of the strings being compared and the relative costs of the
various edit operations." See Wagner (1975) for an extended abstract with the
same title
</AB>
<PU>Addison-Wesley </PU>
<PL>Reading, MA </PL>
<PY>1983</PY>
<PP>215-235</PP>
</SEQ>

<SEQ>
<UI>0929   Wagner,R.A.   The String-to-String C.. J.Assoc.Comput. 74 
21(1):168-173
</UI>
<AU>Wagner RA;
    Fischer MJ
</AU>
<TI>The String-to-String Correction Problem
</TI>
<SU>Pairwise alignment;
    Longest common;
    Correction;
    USA
</SU>
<AB>Permits the edit operations of insertion, deletion, and mutation - the
Levenshtein distance. "An algorithm is presented which solves this problem in
time proportional to the product of the lengths of the two strings."
</AB>
<JT>J Assoc Comput Mach</JT>
<PY>1974</PY>
<VO>21</VO>
<NO>1</NO>
<PP>168-173</PP>
</SEQ>
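
<!--
A minimal Python sketch of the dynamic programming recurrence for unit-cost
insertion, deletion and change; its running time is proportional to
len(a) * len(b), matching the bound quoted above. Keeping only two rows is an
added space saving, and recovering the edit sequence itself would need the full
table. The name `edit_distance` is an assumption.

```python
def edit_distance(a, b):
    """Minimum number of insertions, deletions and changes turning a into b."""
    prev = list(range(len(b) + 1))        # row for the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            curr[j] = min(prev[j] + 1,                # delete ca
                          curr[j - 1] + 1,            # insert cb
                          prev[j - 1] + (ca != cb))   # change (free if equal)
        prev = curr
    return prev[-1]


print(edit_distance("kitten", "sitting"))  # 3
```
-->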

<SEQ>
<UI>0930   Wagner,R.A.   Correcting Counter-Aut.. SIAM J.Comput.  78 
7(3):357-375
</UI>
<AU>Wagner RA;
    Seiferas JI
</AU>
<TI>Correcting Counter-Automaton-Recognizable Languages
</TI>
<SU>Dictionary match;
    Automata;
    Correction;
    USA;
    Language
</SU>
<AB>"The edit operations considered here are single-character deletions,
single-character insertions, and single-character substitutions, each at an
independent cost that does not depend on context. Employing a linear-time
algorithm for solving single-origin graph shortest distance problems, it is
shown how to correct a string of length n into the language accepted by a
counter automaton in time proportional to n^2 on a RAM with unit operation cost
function."
</AB>
<JT>SIAM J Comput</JT>
<PY>1978</PY>
<VO>7</VO>
<NO>3</NO>
<PP>357-375</PP>
</SEQ>

<SEQ>
<UI>0931   Wallace,J.C.  PATMAT: A Searching an.. Comput.Appl.Bio 92 
8(3):249-254
</UI>
<AU>Wallace JC;
    Henikoff S
</AU>
<TI>PATMAT: A Searching and Extraction Program for Sequence, Pattern and Block
Queries and Databases
</TI>
<SU>Database search;
    USA;
    Program;
    Query
</SU>
<AB>"A program has been developed that provides molecular biologists with
multiple tools for searching databases, yet uses a very simple interface. PATMAT
can use protein or (translated) DNA sequences, patterns or blocks of aligned
proteins as queries of databases consisting of amino acid or nucleotide
sequences, pattern or blocks."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1992</PY>
<VO>8</VO>
<NO>3</NO>
<PP>249-254</PP>
</SEQ>

<SEQ>
<UI>0932   Wallin,E.     Fast Needleman-Wunsch .. Comput.Appl.Bio 93 
9(1):117-118
</UI>
<AU>Wallin E;
    Wettergren C;
    Hedman F;
    von Heijne G
</AU>
<TI>Fast Needleman-Wunsch Scanning of Sequence Databanks on a Massively
Parallel Computer
</TI>
<SU>Pairwise alignment;
    Parallel;
    Needleman-Wunsch;
    SWE;
    Databank
</SU>
<AB>"We have implemented the Needleman-Wunsch (NW) algorithm on a massively
parallel computer, an 8K CM-2 machine (Thinking Machines Co., Cambridge, MA)."
Compare with Lander, Mesirov, Taylor (1989)
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1993</PY>
<VO>9</VO>
<NO>1</NO>
<PP>117-118</PP>
</SEQ>
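
<!--
A serial Python sketch of Needleman-Wunsch global alignment scoring with a linear
gap penalty. The scores (match +1, mismatch -1, gap -2) and the name `nw_score`
are illustrative assumptions, not the parameters used on the CM-2, and nothing
here models how the work was spread across processors.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best global alignment score of a and b with a linear gap penalty."""
    prev = [j * gap for j in range(len(b) + 1)]   # a-prefix empty: all gaps
    for i, ca in enumerate(a, start=1):
        curr = [i * gap] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


print(nw_score("AC", "AGC"))  # 0: A-C against AGC scores 1 - 2 + 1
```

Scanning a databank then means scoring the query against every entry, e.g.
`max(bank, key=lambda s: nw_score(query, s))` for a hypothetical list `bank`.
-->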

<SEQ>
<UI>0933   Wang,Y.P.     Optimal Correspondence.. IEEE Trans.Patt 90 
12(11):1080-10
</UI>
<AU>Wang YP;
    Pavlidis T
</AU>
<TI>Optimal Correspondence of String Subsequences
</TI>
<SU>Longest common;
    Sequence proximity;
    Language;
    USA;
    Optimal;
    Subsequence
</SU>
<AB>"The problem of substring matching when ... the alphabet is infinite ...
has received less attention. We present definitions of string distance, an
effective way of computing them, and matching algorithms minimizing such
distances. Our analysis also includes the matching of strings to regular
expressions."
</AB>
<JT>IEEE Trans Patt Anal Mach Intell</JT>
<PY>1990</PY>
<VO>12</VO>
<NO>11</NO>
<PP>1080-1087</PP>
</SEQ>

<SEQ>
<UI>0934   Watanabe,K.   Optimal Alignments of .. Comput.Appl.Bio 85 
1(2):83-87
</UI>
<AU>Watanabe K;
    Urano Y;
    Tamaoki T
</AU>
<TI>Optimal Alignments of Biological Sequences on a Microcomputer
</TI>
<SU>Pairwise alignment;
    CA;
    Optimal
</SU>
<AB>"An algorithm and a program have been developed which enable optimal
alignments of biological sequences on an 8-bit microcomputer." The algorithm is
based on Waterman, Smith, Beyer (1976)
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1985</PY>
<VO>1</VO>
<NO>2</NO>
<PP>83-87</PP>
</SEQ>

<SEQ>
<UI>0935   Waterman,M.S. Sequence Alignments in.. Proc.Nat.Acad.S 83 
80:3123-3124
</UI>
<AU>Waterman MS
</AU>
<TI>Sequence Alignments in the Neighborhood of the Optimum with General
Application to Dynamic Programming
</TI>
<SU>Pairwise alignment;
    Dynamic programming;
    Locally optimal;
    USA;
    Sequence alignment;
    Dynamic
</SU>
<AB>"There are sometimes unknown constraints on the sequences that cause the
'true' alignment to disagree with the optimum (computer) solution. To assist in
overcoming these difficulties, an algorithm has been developed to produce all
alignments within a specified distance of the optimum. The distance can be
chosen after the optimum is computed, and the algorithm can be repeated at
will."
</AB>
<JT>Proc Nat Acad Sci USA </JT>
<PY>1983</PY>
<VO>80</VO>
<PP>3123-3124</PP>
</SEQ>

<SEQ>
<UI>0936   Waterman,M.S. Efficient Sequence Ali.. J.Theor.Biol.   84 
108:333-337
</UI>
<AU>Waterman MS
</AU>
<TI>Efficient Sequence Alignment Algorithms
</TI>
<SU>Pairwise alignment;
    Gap;
    USA;
    Sequence alignment;
    Algorithm
</SU>
<AB>Sequence alignments with "multiple insertion/deletions are known to
increase computation time from O(n^2) to O(n^3) although Gotoh has presented an
O(n^2) algorithm in the case the multiple insertion/deletion weighting function
is linear. It is argued in this paper that it could be desirable to use concave
weighting functions. For that case, an algorithm is derived that is conjectured
to be O(n^2)."
</AB>
<JT>J Theor Biol</JT>
<PY>1984</PY>
<VO>108</VO>
<PP>333-337</PP>
</SEQ>

<SEQ>
<UI>0937   Waterman,M.S. General Methods of Seq.. Bull.Math.Biol. 84 
46(4):473-500
</UI>
<AU>Waterman MS
</AU>
<TI>General Methods of Sequence Comparison
</TI>
<SU>Sequence comparison;
    Dynamic programming;
    Subalignment;
    Database search;
    Review;
    USA;
    Region;
    Segment
</SU>
<AB>"Mathematical methods for comparison of nucleic acid sequences are
reviewed. There are two major methods of sequence comparison: dynamic
programming and a method referred to here as the regions method. The problem
types discussed are comparison of two sequences, location of long matching
segments, efficient database searches and comparison of several sequences."
</AB>
<JT>Bull Math Biol</JT>
<PY>1984</PY>
<VO>46</VO>
<NO>4</NO>
<PP>473-500</PP>
</SEQ>

<SEQ>
<UI>0938   Waterman,M.S. Multiple Sequence Alig.. Nucleic Acids R 86 
14(22):9095-91
</UI>
<AU>Waterman MS
</AU>
<TI>Multiple Sequence Alignment by Consensus
</TI>
<SU>Multiple alignment;
    Consensus sequence;
    USA;
    Sequence alignment
</SU>
<AB>Describes an algorithm for multiple sequence alignment that matches words
of length and degree of mismatch chosen by the user. The method is based on the
consensus sequence algorithm described by Waterman, Arratia, and Galas (1984)
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1986</PY>
<VO>14</VO>
<NO>22</NO>
<PP>9095-9102</PP>
</SEQ>

<SEQ>
<UI>0939   Waterman,M.S. Computer Analysis of N.. Methods Enzymol 88 
164:765-793
</UI>
<AU>Waterman MS
</AU>
<TI>Computer Analysis of Nucleic Acid Sequences
</TI>
<SU>Sequence analysis;
    Sequence comparison;
    Consensus sequence;
    Structure;
    Review;
    USA
</SU>
<AB>In Noller,H.F.,Jr., Moldave,K. (Eds.), Ribosomes. "I make no attempt to
survey the literature. Instead I try to describe some useful and interesting
methods of sequence analysis that utilize the power of computers." Sequence
comparisons. Consensus patterns. Secondary structure. Conclusions
</AB>
<JT>Methods Enzymol</JT>
<PY>1988</PY>
<VO>164</VO>
<PP>765-793</PP>
</SEQ>

<SEQ>
<UI>0940   Waterman,M.S. Consensus Patterns in .. Mathematical .. 89CRC Press
</UI>
<AU>Waterman MS
</AU>
<TI>Consensus Patterns in Sequences
</TI>
<ED>Waterman MS
</ED>
<BK>Mathematical Methods for DNA Sequences
</BK>
<SU>Consensus sequence;
    Neighbourhood;
    USA
</SU>
<AB>Review of an approach to identifying DNA features that are not conserved
precisely in location or in pattern. Applications to consensus words or
palindromes in multiple sequences, consensus within one sequence, and long
consensus patterns
</AB>
<PU>CRC Press </PU>
<PL>Boca Raton, FL </PL>
<PY>1989</PY>
<PP>93-115</PP>
</SEQ>

<SEQ>
<UI>0941                 Mathematical Methods f..                 89CRC Press
</UI>
<TI>Mathematical Methods for DNA Sequences
</TI>
<ED>Waterman MS
</ED>
<SU>Sequence analysis;
    Sequence comparison;
    Review;
    USA;
    Statistical;
    DNA
</SU>
<AB>The book has ten chapters on mathematical, statistical, and computer
methods for analyzing DNA sequences
</AB>
<PU>CRC Press </PU>
<PL>Boca Raton, FL </PL>
<PY>1989</PY>
<PP>x+283</PP>
</SEQ>

<SEQ>
<UI>0942   Waterman,M.S. Sequence Alignments      Mathematical .. 89CRC Press
</UI>
<AU>Waterman MS
</AU>
<TI>Sequence Alignments
</TI>
<ED>Waterman MS
</ED>
<BK>Mathematical Methods for DNA Sequences
</BK>
<SU>Sequence alignment;
    Consensus sequence;
    Database search;
    Review;
    USA;
    Dynamic programming
</SU>
<AB>Review of the dynamic programming alignment of two or more sequences, the
consensus alignment of multiple sequences, and the comparison of a sequence to a
database
</AB>
<PU>CRC Press </PU>
<PL>Boca Raton, FL </PL>
<PY>1989</PY>
<PP>53-92</PP>
</SEQ>

<SEQ>
<UI>0943   Waterman,M.S. Pattern Recognition in.. Bull.Math.Biol. 84 
46(4):515-527
</UI>
<AU>Waterman MS;
    Arratia R;
    Galas DJ
</AU>
<TI>Pattern Recognition in Several Sequences: Consensus and Alignment
</TI>
<SU>Consensus sequence;
    Neighbourhood;
    USA;
    Pattern recognition;
    Statistical;
    Recognition
</SU>
<AB>"This paper gives a new and practical solution for finding unknown
patterns that occur imperfectly above a preset frequency. Algorithms for finding
the patterns are given as well as estimates of statistical significance." The
consensus concept depends on window width, consensus sequence length, and
neighbourhood specification
</AB>
<JT>Bull Math Biol</JT>
<PY>1984</PY>
<VO>46</VO>
<NO>4</NO>
<PP>515-527</PP>
</SEQ>

<SEQ>
<UI>0944   Waterman,M.S. A Dynamic Programming .. Math.Biosci.    85 
77(1/2):179-18
</UI>
<AU>Waterman MS;
    Byers TH
</AU>
<TI>A Dynamic Programming Algorithm to Find all Solutions in a Neighborhood of
the Optimum
</TI>
<SU>Pairwise alignment;
    Locally optimal;
    Dynamic programming;
    USA;
    Dynamic;
    Algorithm
</SU>
<AB>"Just after he introduced dynamic programming, Richard Bellman with R.
Kalaba in 1960 gave a method for finding Kth best policies. Their method has
been modified since then, but it is still not practical for many problems. This
paper describes a new technique which modifies the usual backtracking procedure
and lists all near-optimal policies. This practical algorithm is very much in
the spirit of the original formulation of dynamic programming. An application to
matching biological sequences is given."
</AB>
<JT>Math Biosci</JT>
<PY>1985</PY>
<VO>77</VO>
<NO>1/2</NO>
<PP>179-188</PP>
</SEQ>

<SEQ>
<UI>0945   Waterman,M.S. A New Algorithm for Be.. J.Mol.Biol.     87 
197(4):723-728
</UI>
<AU>Waterman MS;
    Eggert M
</AU>
<TI>A New Algorithm for Best Subsequence Alignments with Application to
tRNA-rRNA Comparisons
</TI>
<SU>Subalignment;
    Locally optimal;
    USA;
    Gap;
    Subsequence;
    Algorithm
</SU>
<AB>"The algorithm of Smith and Waterman for identification of maximally
similar subsequences is extended to allow identification of all non-intersecting
similar subsequences with similarity score at or above some preset level. The
resulting alignments are found in order of score, with the highest scoring
alignment first. In the case of single gaps or multiple gaps weighted linear
with gap length, the algorithm is extremely efficient ...."
</AB>
<JT>J Mol Biol</JT>
<PY>1987</PY>
<VO>197</VO>
<NO>4</NO>
<PP>723-728</PP>
</SEQ>
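
<!--
A minimal Python sketch of the Smith-Waterman local similarity score that this
paper extends, with a linear gap penalty. The scores (match +2, mismatch -1,
gap -2) and the name `sw_score` are illustrative assumptions. It returns only the
single best score; the paper's extension to list every non-intersecting
alignment above a threshold, in score order, is not shown.

```python
def sw_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b (Smith-Waterman, linear gaps)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0] * (len(b) + 1)
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # Clamping at 0 lets an alignment start afresh anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(sw_score("XXXABCYYY", "ZZABCZZ"))  # 6: the shared ABC block
```
-->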

<SEQ>
<UI>0946   Waterman,M.S. Parametric Sequence Co.. Proc.Nat.Acad.S 92 
89:6090-6093
</UI>
<AU>Waterman MS;
    Eggert M;
    Lander E
</AU>
<TI>Parametric Sequence Comparisons
</TI>
<SU>Sequence comparison;
    Dynamic programming;
    USA;
    Parametric;
    Statistical
</SU>
<AB>Compare two sequences. "We present an algorithm to efficiently find the
optimal alignments for all choices of the penalty parameters. It is then
possible to systematically explore these alignments for those with the most
biological or statistical interest."
</AB>
<JT>Proc Nat Acad Sci USA </JT>
<PY>1992</PY>
<VO>89</VO>
<PP>6090-6093</PP>
</SEQ>

<SEQ>
<UI>0947   Waterman,M.S. Phase Transitions in S.. Proc.Nat.Acad.S 87 
84(5):1239-124
</UI>
<AU>Waterman MS;
    Gordon L;
    Arratia R
</AU>
<TI>Phase Transitions in Sequence Matches and Nucleic Acid Structure
</TI>
<SU>Sequence analysis;
    Sequence comparison;
    Significance;
    USA;
    Region;
    Sequence match;
    Structure;
    Nucleic acid;
    Transition
</SU>
<AB>"Extremal properties, such as longest helical region, can now be studied
with a new family of probability distributions [Arratia, Gordon, Waterman,
1986]. Not only is such extremal behavior analyzed with great precision, but new
phase transitions are determined. ... These results ... also have importance for
significance tests in comparison of nucleic acid or protein sequences."
</AB>
<JT>Proc Nat Acad Sci USA </JT>
<PY>1987</PY>
<VO>84</VO>
<NO>5</NO>
<PP>1239-1243</PP>
</SEQ>

<SEQ>
<UI>0948   Waterman,M.S. Consensus Methods for .. Methods Enzymol 90 
183:221-237
</UI>
<AU>Waterman MS;
    Jones R
</AU>
<TI>Consensus Methods for DNA and Protein Sequence Alignment
</TI>
<SU>Consensus sequence;
    Neighbourhood;
    USA;
    Consensus method;
    Protein;
    Sequence alignment;
    DNA
</SU>
<AB>"The purpose of this chapter is to present some of the tools that we have
created in order to analyze multiple sequences in a rigorous, efficient, and
systematic way. ... Our approach is based on what we refer to as consensus
analysis. ... The basis of the consensus method is an algorithm to find
consensus words, with the degree of matching and alignment specified by the user
of the program."
</AB>
<JT>Methods Enzymol</JT>
<PY>1990</PY>
<VO>183</VO>
<PP>221-237</PP>
</SEQ>

<SEQ>
<UI>0949   Waterman,M.S. Computer Alignment of .. Phylogenetic .. 91Oxford 
Universi
</UI>
<AU>Waterman MS;
    Joyce J;
    Eggert M
</AU>
<TI>Computer Alignment of Sequences
</TI>
<ED>Miyamoto MM
    Cracraft J
</ED>
<BK>Phylogenetic Analysis of DNA Sequences
</BK>
<SU>Sequence alignment;
    Review;
    USA;
    Statistical
</SU>
<AB>Sequence alignment, aligning full sequences, maximum sequences,
statistical distribution of alignment scores, multiple sequence alignment
</AB>
<PU>Oxford University Press</PU>
<PL>New York</PL>
<PY>1991</PY>
<PP>59-72</PP>
</SEQ>

<SEQ>
<UI>0950   Waterman,M.S. Line Geometries for Se.. Bull.Math.Biol. 84 
46(4):567-577
</UI>
<AU>Waterman MS;
    Perlwitz MD
</AU>
<TI>Line Geometries for Sequence Comparisons
</TI>
<SU>Multiple alignment;
    Dynamic programming;
    Evolutionary tree;
    USA;
    Sequence comparison;
    Sequence alignment;
    Geometry
</SU>
<AB>"A simple generalization of the sequences makes it possible to obtain some
results about the geometry of sequence alignments. These ideas suggest heuristic
approaches to problems of comparing several sequences. If M sequences [of length
N] are known to be related by a binary tree, they can be aligned in O(MN^2) time
and O(N^2 + NM) storage."
</AB>
<JT>Bull Math Biol</JT>
<PY>1984</PY>
<VO>46</VO>
<NO>4</NO>
<PP>567-577</PP>
</SEQ>

<SEQ>
<UI>0951   Waterman,M.S. Some Biological Sequen.. Adv.Math.       76 
20(3):367-387
</UI>
<AU>Waterman MS;
    Smith TF;
    Beyer WA
</AU>
<TI>Some Biological Sequence Metrics
</TI>
<SU>Sequence proximity;
    Multiple alignment;
    Information theory;
    USA
</SU>
<AB>Section 8 "extends the notion of distance between two sequences to a
distance among n sequences: the n-distance. An algorithm which computes this
distance is given. The algorithm also gives the alignment of the n sequences
which has least weight."
</AB>
<JT>Adv Math</JT>
<PY>1976</PY>
<VO>20</VO>
<NO>3</NO>
<PP>367-387</PP>
</SEQ>

<SEQ>
<UI>0952   Weir,B.S.     Statistical Analysis o.. J.Nat.Cancer In 88 
80(6):395-406
</UI>
<AU>Weir BS
</AU>
<TI>Statistical Analysis of DNA Sequences
</TI>
<SU>Sequence analysis;
    Significance;
    Review;
    USA;
    Statistical;
    Markov;
    Region;
    DNA
</SU>
<AB>"Developments in the statistical analysis of DNA sequence data since 1984
are reviewed. Mathematical methods employing dynamic programming or
incorporating Markov chain theory have been developed to search sequences for
regions of similarity and to align sequences. When the biological forces of
mutation and genetic drift are included in models, distances between aligned
sequences allow the construction of evolutionary trees."
</AB>
<JT>J Nat Cancer Inst</JT>
<PY>1988</PY>
<VO>80</VO>
<NO>6</NO>
<PP>395-406</PP>
</SEQ>

<SEQ>
<UI>0953   White,C.T.    The Diagonal-traverse .. Nucleic Acids R 84 
12(1):751-766
</UI>
<AU>White CT;
    Hardies SC;
    Hutchison CA III;
    Edgell MH
</AU>
<TI>The Diagonal-traverse Homology Search Algorithm for Locating Similarities
between Two Sequences
</TI>
<SU>Pairwise comparison;
    Dot;
    USA;
    Display;
    Segment;
    Homology;
    Similarity;
    Algorithm
</SU>
<AB>"We present a fast computer algorithm for finding homology between two DNA
sequences. It generates a two-dimensional display in which a diagonal string of
dots represents a stretch of homology between the two segments. Our algorithm
performs the search very rapidly, and has no internal data storage requirement
except for the sequences themselves."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1984</PY>
<VO>12</VO>
<NO>1</NO>
<PP>751-766</PP>
</SEQ>
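
<!--
A Python sketch of the diagonal-traverse idea, assuming exact identity matching
and a minimum run length `min_len` (both assumptions, as is the name
`diagonal_runs`). Each diagonal of the implied dot matrix is walked in turn and
only a running counter is kept, so no matrix is stored. This mirrors the storage
claim in the abstract but is not the authors' program.

```python
def diagonal_runs(a, b, min_len):
    """Yield (i, j, length) for each maximal run of identical characters
    along a diagonal of the a-versus-b dot matrix with length >= min_len."""
    n, m = len(a), len(b)
    for offset in range(-(n - 1), m):       # offset = j - i
        i = max(0, -offset)
        j = i + offset
        run = 0
        while i < n and j < m:
            if a[i] == b[j]:
                run += 1
            else:
                if run >= min_len:
                    yield (i - run, j - run, run)
                run = 0
            i += 1
            j += 1
        if run >= min_len:                  # run reaching the matrix edge
            yield (i - run, j - run, run)


print(list(diagonal_runs("GATTACA", "TTAC", 3)))  # [(2, 0, 4)]
```

Each yielded triple corresponds to one diagonal string of dots in the display.
-->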

<SEQ>
<UI>0954   Wilbur,W.J.   Rapid Similarity Searc.. Proc.Nat.Acad.S 83 
80:726-730
</UI>
<AU>Wilbur WJ;
    Lipman DJ
</AU>
<TI>Rapid Similarity Searches of Nucleic Acid and Protein Data Banks
</TI>
<SU>Database search;
    Pairwise alignment;
    k-tuple;
    USA;
    Similarity;
    Protein;
    Nucleic acid
</SU>
<AB>"We present an algorithm for the global comparison of sequences based on
matching k-tuples of sequence elements for a fixed k. ... The algorithm has also
been adapted, in a separate implementation, to produce rigorous sequence
alignments."
</AB>
<JT>Proc Nat Acad Sci USA </JT>
<PY>1983</PY>
<VO>80</VO>
<PP>726-730</PP>
</SEQ>
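
<!--
A minimal Python sketch of the k-tuple step: index every k-tuple of one
sequence, then count matches of the other sequence's k-tuples by diagonal
(offset i - j). Heavily hit diagonals point to regions worth aligning. The
function name and the use of plain counts, with no further windowing or scoring,
are assumptions rather than details of the published method.

```python
from collections import Counter, defaultdict


def ktuple_diagonal_counts(a, b, k):
    """Count exact k-tuple matches between a and b on each diagonal i - j."""
    index = defaultdict(list)             # k-tuple -> start positions in b
    for j in range(len(b) - k + 1):
        index[b[j:j + k]].append(j)
    counts = Counter()
    for i in range(len(a) - k + 1):
        for j in index.get(a[i:i + k], ()):
            counts[i - j] += 1
    return counts


hits = ktuple_diagonal_counts("ACGTACGT", "ACGT", 2)
print(hits[0], hits[4])  # 3 3
```

Here the repeated ACGT block shows up as three hits on diagonal 0 and three on
diagonal 4; a larger k makes the index more selective at some cost in sensitivity.
-->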

<SEQ>
<UI>0955   Wilbur,W.J.   The Context Dependent .. SIAM J.Appl.Mat 84 
44(3):557-567
</UI>
<AU>Wilbur WJ;
    Lipman DJ
</AU>
<TI>The Context Dependent Comparison of Biological Sequences
</TI>
<SU>Pairwise alignment;
    Pairwise comparison;
    Sequence proximity;
    Dynamic programming;
    USA
</SU>
<AB>"A general method for comparing two macromolecules is developed. The
method differs from more traditional procedures in that matches are evaluated
dependent on sequence context. We first define a context dependent similarity
score between sequences and give a dynamic programming algorithm for its
calculation. Conditions are then described which allow the conversion of the
similarity score to a metric distance. The class of metrics ... includes the
Sellers metric."
</AB>
<JT>SIAM J Appl Math</JT>
<PY>1984</PY>
<VO>44</VO>
<NO>3</NO>
<PP>557-567</PP>
</SEQ>

<SEQ>
<UI>0956   Williams,P.L. Phylogeny Determinatio.. Methods Enzymol 90 
183:615-627
</UI>
<AU>Williams PL;
    Fitch WM
</AU>
<TI>Phylogeny Determination Using Dynamically Weighted Parsimony Method
</TI>
<SU>Phylogeny;
    Character weight;
    USA;
    Region;
    Parsimony
</SU>
<AB>When estimating phylogenies from sequences, workers may discard regions of
sequences that are difficult to align, or weight transversions more than
transitions. "Both kinds of weighting are subject to the charge of investigator
bias in the absence of some procedure for assigning weights .... This chapter
presents various methods for assigning both kinds of weights plus a method for
evaluating trees given the weights."
</AB>
<JT>Methods Enzymol</JT>
<PY>1990</PY>
<VO>183</VO>
<PP>615-627</PP>
</SEQ>

<SEQ>
<UI>0957   Wong,A.K.C.   A Multiple Sequence Co.. Bull.Math.Biol. 93 
55(2):465-486
</UI>
<AU>Wong AKC;
    Chan SC;
    Chiu DKY
</AU>
<TI>A Multiple Sequence Comparison Method
</TI>
<SU>Multiple alignment;
    Phylogeny;
    CA;
    Sequence comparison;
    Hierarchical
</SU>
<AB>"A new method for the comparison of multiple macromolecular sequences ...
is based on a hierarchical sequence synthesis procedure that does not require
any a priori knowledge of the molecular structure of the sequences or the
phylogenetic relations among the sequences. It ... has the capability of ...
aligning the sequences while the taxonomic tree of the sequences is being
constructed in one single phase."
</AB>
<JT>Bull Math Biol</JT>
<PY>1993</PY>
<VO>55</VO>
<NO>2</NO>
<PP>465-486</PP>
</SEQ>

<SEQ>
<UI>0958   Wong,A.K.C.   A Generalized Method f.. Comput.Biol.Med 74 4:43-57
</UI>
<AU>Wong AKC;
    Reichert TA;
    Cohen DN;
    Aygun BO
</AU>
<TI>A Generalized Method for Matching Informational Macromolecular Code
Sequences
</TI>
<SU>Pairwise alignment;
    Information theory;
    USA
</SU>
<AB>"This paper is concerned with the discovery of the best way of changing
one sequence into another by assembling the set of alterations that minimize
some quality measure. The quality measures presently incorporated into the
algorithm assess the amount of information required to convert one sequence into
another (Reichert, Cohen, Wong 1973). This procedure differs in a fundamental
way from [previous work]."
</AB>
<JT>Comput Biol Med</JT>
<PY>1974</PY>
<VO>4</VO>
<PP>43-57</PP>
</SEQ>

<SEQ>
<UI>0959   Wong,C.K.     Bounds for the String .. J.Assoc.Comput. 76 
23(1):13-16
</UI>
<AU>Wong CK;
    Chandra AK
</AU>
<TI>Bounds for the String Editing Problem
</TI>
<SU>Pairwise comparison;
    Complexity;
    USA;
    Editing
</SU>
<AB>"The string editing problem is to determine the distance between two
strings as measured by the minimal cost sequence of deletions, insertions, and
changes of symbols needed to transform one string into the other. ... If the
operations on symbols of the strings are restricted to tests of equality, then
O(nm) operations are necessary (and sufficient) to compute the distance."
</AB>
<JT>J Assoc Comput Mach</JT>
<PY>1976</PY>
<VO>23</VO>
<NO>1</NO>
<PP>13-16</PP>
</SEQ>

<SEQ>
<UI>0960   Wu,S.         An O(NP) Sequence Comp.. Inform.Process. 90 
35(6):317-323
</UI>
<AU>Wu S;
    Manber U;
    Myers G;
    Miller W
</AU>
<TI>An O(NP) Sequence Comparison Algorithm
</TI>
<SU>Pairwise comparison;
    Longest common;
    USA;
    Sequence comparison;
    Edit;
    Algorithm
</SU>
<AB>"Let A and B be two sequences of lengths M and N, N &gt;= M, let D be the
length of a shortest insertion-deletion edit script, and let P be the number of
deletions in such a script." "We present an algorithm for finding a shortest edit
distance of A and B whose worst-case running time is O(NP) and whose expected
running time is O(N + PD). ... It is nearly twice as fast as the O(ND) algorithm
of Myers ...."
</AB>
<JT>Inform Process Lett</JT>
<PY>1990</PY>
<VO>35</VO>
<NO>6</NO>
<PP>317-323</PP>
</SEQ>

<SEQ>
<UI>0961   Yamada,H.     A High-Speed String-Se.. IEEE J.Solid-St 87 
22(5):829-834
</UI>
<AU>Yamada H;
    Hirata M;
    Nagai H;
    Takahashi K
</AU>
<TI>A High-Speed String-Search Engine
</TI>
<SU>Database search;
    Parallel;
    Hardware;
    Automata;
    JP
</SU>
<AB>"This paper describes a newly developed VLSI character string-search
engine (SSE) which uses a new architecture ... that combines finite-state
automaton logic with a new content addressable memory to achieve a string
comparison rate as fast as 80 million strings per second. This string-search
performance is several times faster than any previously reported."
</AB>
<JT>IEEE J Solid-State Circuits</JT>
<PY>1987</PY>
<VO>22</VO>
<NO>5</NO>
<PP>829-834</PP>
</SEQ>

<SEQ>
<UI>0962   Yao,A.C.C.    The Complexity of Patt.. SIAM J.Comput.  79 
8(3):368-387
</UI>
<AU>Yao ACC
</AU>
<TI>The Complexity of Pattern Matching for a Random String
</TI>
<SU>String match;
    Knuth-Morris-Pratt;
    Complexity;
    USA;
    Pattern match
</SU>
<AB>"In this paper, we study the average-case complexity of pattern matching
in the model of [Knuth, Morris, Pratt (1977)]. ... These results in particular
confirm [a conjecture of Knuth] when n &gt;= 2m. We may add that the case m &lt;= n &lt;= 2m
is mainly of theoretical interest, as the text strings are usually much longer
than the patterns in practice."
</AB>
<JT>SIAM J Comput</JT>
<PY>1979</PY>
<VO>8</VO>
<NO>3</NO>
<PP>368-387</PP>
</SEQ>

<SEQ>
<UI>0963   Yee,C.N.      Reconstruction of Stri.. Comput.Appl.Bio 93 9(1):1-7
</UI>
<AU>Yee CN;
    Allison L
</AU>
<TI>Reconstruction of Strings Past
</TI>
<SU>Pairwise alignment;
    Automata;
    Message length;
    Australia
</SU>
<AB>"Minimum message length encoding, a method of inductive inference, is
applied to the string-alignment problem. It leads to an alignment method that
averages over all alignments in a weighted fashion. Experiments indicate that
this method can recover the actual parameters of evolution with high accuracy
and over a wide range of values, whereas the use of a single optimal alignment
gives biased results."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1993</PY>
<VO>9</VO>
<NO>1</NO>
<PP>1-7</PP>
</SEQ>

<SEQ>
<UI>0964   Yianilos,P.N. A Dedicated Comparator.. Electronics     83 
56(24):113-117
</UI>
<AU>Yianilos PN
</AU>
<TI>A Dedicated Comparator Matches Symbol Strings Fast and Intelligently
</TI>
<SU>Database search;
    Parallel;
    Hardware;
    USA
</SU>
<AB>An integrated circuit "searches a data base and ranks the 16 best matches
to a given string at up to 30,000 records per second." The measurement of
proximity between strings is described on p. 115.
</AB>
<JT>Electronics</JT>
<PY>1983</PY>
<VO>56</VO>
<NO>24</NO>
<PP>113-117</PP>
</SEQ>

<SEQ>
<UI>0965   Zuker,M.      Suboptimal Sequence Al.. J.Mol.Biol.     91 
221(2):403-420
</UI>
<AU>Zuker M
</AU>
<TI>Suboptimal Sequence Alignment in Molecular Biology: Alignment with Error
Analysis
</TI>
<SU>Pairwise alignment;
    Dynamic programming;
    Dot;
    CA;
    Sequence alignment;
    Suboptimal;
    Error
</SU>
<AB>"A molecular sequence alignment algorithm based on dynamic programming has
been extended to allow the computation of all pairs of residues that can be part
of optimal and suboptimal sequence alignments. The uncertainties inherent in
sequence alignment can be displayed using a new form of dot plot. The method ...
can reveal what parts of the alignment are better determined than others."
</AB>
<JT>J Mol Biol</JT>
<PY>1991</PY>
<VO>221</VO>
<NO>2</NO>
<PP>403-420</PP>
</SEQ>

<SEQ>
<UI>0966   Zvelebil,M.J. Prediction of Protein .. J.Mol.Biol.     87 
195(4):957-961
</UI>
<AU>Zvelebil MJ;
    Barton GJ;
    Taylor WR;
    Sternberg MJE
</AU>
<TI>Prediction of Protein Secondary Structure and Active Sites using the
Alignment of Homologous Sequences
</TI>
<SU>Multiple alignment;
    Structure;
    UK;
    Protein;
    Prediction;
    Secondary
</SU>
<AB>"The prediction of protein secondary structure ... is improved by 9% to
66% using the information available from a family of homologous sequences." A
method to align multiple sequences is described on page 958.
</AB>
<JT>J Mol Biol</JT>
<PY>1987</PY>
<VO>195</VO>
<NO>4</NO>
<PP>957-961</PP>
</SEQ>

<SEQ>
<UI>0967   Nakayama,S.I. Method for Clustering .. J.Chem.Inf.Comp 88 28:72-78
</UI>
<AU>Nakayama SI;
    Shigezumi S;
    Yoshida M
</AU>
<TI>Method for Clustering Proteins by Use of All Possible Pairs of Amino Acids
as Structural Descriptors
</TI>
<SU>Sequence proximity;
    N-gram;
    Clustering;
    JP;
    Dyad;
    Composition;
    Protein;
    Amino acid
</SU>
<AB>"Proteins were represented as vectors, of which components were all
possible pairs of amino acids. From a distance matrix between any pairs of
proteins thus represented, several clusters corresponding to connected
components were generated. Application of this method to three different sets of
proteins showed that it was suitable for clustering closely related proteins
with respect to the sequential similarity defined by Dayhoff."
</AB>
<JT>J Chem Inf Comput Sci</JT>
<PY>1988</PY>
<VO>28</VO>
<PP>72-78</PP>
</SEQ>

<SEQ>
<UI>0968   Liu,K.C.      On String Pattern Matc.. SIAM J.Comput.  81 
10(1):118-140
</UI>
<AU>Liu KC
</AU>
<TI>On String Pattern Matching: A New Model with a Polynomial Time Algorithm
</TI>
<SU>Pattern match;
    USA;
    Parsing;
    Pattern definition;
    Language;
    String match;
    Model;
    Algorithm
</SU>
<AB>"A polynomial time algorithm is presented for string pattern matching.
Earley's parsing algorithm is adapted for context-free patterns and is extended
to allow the augmentation of the immediate assignment operation of SNOBOL4 and a
powerful descriptive operator not previously implemented, set complement.
Canonical pattern definition systems are defined to describe patterns for which
our algorithm will perform pattern matching. The languages generated by such
systems are called extended context-free languages, and are shown to properly
contain the family of context-free languages ...."
</AB>
<JT>SIAM J Comput</JT>
<PY>1981</PY>
<VO>10</VO>
<NO>1</NO>
<PP>118-140</PP>
</SEQ>

<SEQ>
<UI>0969   Taylor,W.R.   The Classification of .. J.Theor.Biol.   86 
119:205-218
</UI>
<AU>Taylor WR
</AU>
<TI>The Classification of Amino Acid Conservation
</TI>
<SU>Consensus sequence;
    UK;
    Sequence alignment;
    Classification;
    Amino acid
</SU>
<AB>"A classification of amino acid type is described which is based on a
synthesis of physico-chemical and mutation data. This is organised in the form
of a Venn diagram from which sub-sets are derived that include groups of amino
acids likely to be conserved for similar structural reasons. These sets are used
to describe conservation in aligned sequences by allocating to each position the
smallest set that contains all the residue types brought together by the
alignment. This minimal set assignment provides a simple way of reducing the
information contained in a sequence alignment to a form which can be analysed by
computer yet remains readable."
</AB>
<JT>J Theor Biol</JT>
<PY>1986</PY>
<VO>119</VO>
<PP>205-218</PP>
</SEQ>

<SEQ>
<UI>0970   Thornton,J.M. Protein Motifs and Dat.. Trends Biochem. 89 
14:300-304
</UI>
<AU>Thornton JM;
    Gardner SP
</AU>
<TI>Protein Motifs and Data-base Searching
</TI>
<SU>Sequence database;
    UK;
    Motif;
    Structure;
    Protein
</SU>
<AB>"Protein structure and sequence motifs are now recognized for many
different protein families and topologies. To aid identification and use of
these motifs in modelling and prediction, it has become necessary to establish
consistent data bases of protein structure, including not only coordinates, but
also derived data such as secondary structure location and solvent
accessibilities. ... We will concentrate on structural motifs and
structure-related sequence motifs and describe how these can be extracted from the newly
established data bases of protein structure for use in prediction and
modelling."
</AB>
<JT>Trends Biochem Sci</JT>
<PY>1989</PY>
<VO>14</VO>
<PP>300-304</PP>
</SEQ>

<SEQ>
<UI>0971   Guenoche,A.   Alignment and Hierarch.. Information a.. 
93Springer-Verlag
</UI>
<AU>Guenoche A
</AU>
<TI>Alignment and Hierarchical Clustering Method for Strings
</TI>
<ED>Opitz O;
    Lausen B;
    Klar R
</ED>
<BK>Information and Classification. Concepts, Methods and Applications
</BK>
<SU>Multiple alignment;
    Longest common;
    FR;
    Clustering;
    Hierarchical
</SU>
<AB>"We develop a conceptual clustering method for strings to realize a
multiple alignment of biological sequences. We associate to each cluster a
common subsequence of its strings. Unfortunately, the longest common subsequence
problem is NP-hard as soon as there are more than two strings. To avoid this
difficulty, we present: a greedy alignment method; some improvements of the
Hirschberg algorithm ...; an ascending clustering method, which provides a
common subsequence that is longer than the one given by the greedy algorithm."
</AB>
<PU>Springer-Verlag</PU>
<PL>Berlin</PL>
<PY>1993</PY>
<PP>403-412</PP>
</SEQ>

<SEQ>
<UI>0972   Andersson,A.  The Complexity of Sear.. ACM Sympos.Theo 94 
26:317-325
</UI>
<AU>Andersson A;
    Hagerup T;
    Hastad J;
    Petersson O
</AU>
<TI>The Complexity of Searching a Sorted Array of Strings
</TI>
<SU>Sequence search;
    Complexity;
    SWE
</SU>
<AB>23-25 May 1994, Montreal, Quebec. "We present an algorithm for finding a
given k-character string in an array of n strings, arranged in alphabetical
order, using O((k log log n) / (log log(4 + (k log log n) / (log n))) + k + log n)
character comparisons. This improves significantly upon previous
bounds."
</AB>
<JT>ACM Sympos Theory Comput</JT>
<PY>1994</PY>
<VO>26</VO>
<PP>317-325</PP>
</SEQ>

<SEQ>
<UI>0973   Bodlaender,H. Beyond NP-Completeness.. ACM Sympos.Theo 94 
26:449-458
</UI>
<AU>Bodlaender HL;
    Fellows MR;
    Hallett MT
</AU>
<TI>Beyond NP-Completeness for Problems of Bounded Width: Hardness for the W
Hierarchy (Extended Abstract)
</TI>
<SU>Longest common;
    Complexity;
    Parameterized;
    CA
</SU>
<AB>23-25 May 1994, Montreal, Quebec. "The parameterized computational
complexity of a collection of well-known problems including: ... LONGEST COMMON
SUBSEQUENCE ... is explored. It is shown that these problems are hard for
various levels of the W hierarchy. ... Theorem 2. LCS is hard for W[t] for all
t."
</AB>
<JT>ACM Sympos Theory Comput</JT>
<PY>1994</PY>
<VO>26</VO>
<PP>449-458</PP>
</SEQ>

<SEQ>
<UI>0974   Hagerup,T.    Optimal Parallel Strin.. ACM Sympos.Theo 94 
26:382-391
</UI>
<AU>Hagerup T
</AU>
<TI>Optimal Parallel String Algorithms: Merging, Sorting and Computing the
Minimum
</TI>
<SU>Merge;
    Sort;
    Parallel;
    DE;
    Optimal;
    Algorithm
</SU>
<AB>23-25 May 1994, Montreal, Quebec. "We study fundamental comparison
problems on strings of characters, equipped with the usual lexicographical
ordering. For each problem studied, we give a parallel algorithm that is optimal
with respect to at least one criterion for which no optimal algorithm was
previously known." The results concern: merging two sets of strings, sorting a
sequence of strings, and finding the minimum string in a sequence of strings.
</AB>
<JT>ACM Sympos Theory Comput</JT>
<PY>1994</PY>
<VO>26</VO>
<PP>382-391</PP>
</SEQ>

<SEQ>
<UI>0975   Hagerup,T.    Merging and Sorting St.. Lecture Notes i 92 
629:298-306
</UI>
<AU>Hagerup T;
    Petersson O
</AU>
<TI>Merging and Sorting Strings in Parallel
</TI>
<SU>Merge;
    Sort;
    Parallel;
    DE
</SU>
<AB>Proceedings, 17th International Symposium on Mathematical Foundations of
Computer Science. "We show that strings of characters, equipped with the usual
lexicographical ordering, can be merged and sorted in parallel as efficiently as
integers, although with some loss in speed." The models of computation
considered are the CRCW PRAM and the EREW PRAM.
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>1992</PY>
<VO>629</VO>
<PP>298-306</PP>
</SEQ>

<SEQ>
<UI>0976   Clift,B.      Sequence Landscapes      Nucleic Acids R 86 
14(1):141-158
</UI>
<AU>Clift B;
    Haussler D;
    McConnell R;
    Schneider TD;
    Stormo GD
</AU>
<TI>Sequence Landscapes
</TI>
<SU>Regularities;
    Display;
    USA;
    Repeat
</SU>
<AB>"We describe a method for representing the structure of repeating
sequences in nucleic-acids, proteins and other texts. A portion of the sequence
is presented at the bottom of a CRT screen. Above the sequence is its landscape,
which looks like a mountain range. Each mountain corresponds to a subsequence of
the sequence. At the peak of every mountain is written the number of times that
the subsequence appears. ... Using sequence landscapes, one can quickly locate
significant repeats."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1986</PY>
<VO>14</VO>
<NO>1</NO>
<PP>141-158</PP>
</SEQ>

<SEQ>
<UI>0977   Hariharan,R.  Optimal Parallel Suffi.. ACM Sympos.Theo 94 
26:290-299
</UI>
<AU>Hariharan R
</AU>
<TI>Optimal Parallel Suffix Tree Construction
</TI>
<SU>Search tree;
    Parallel;
    USA;
    Optimal;
    Suffix
</SU>
<AB>"An O(m)-work O(log^4 m)-time common CRCW-PRAM algorithm for constructing
the suffix tree of a string s of length m drawn from any fixed alphabet set is
obtained. The algorithm takes O(m) space and is the first known work and space
optimal parallel algorithm for this problem. It can be generalized to a string s
drawn from any general alphabet ...."
</AB>
<JT>ACM Sympos Theory Comput</JT>
<PY>1994</PY>
<VO>26</VO>
<PP>290-299</PP>
</SEQ>

<SEQ>
<UI>0978   Jiang,T.      Aligning Sequences via.. ACM Sympos.Theo 94 
26:760-769
</UI>
<AU>Jiang T;
    Lawler EL;
    Wang L
</AU>
<TI>Aligning Sequences via an Evolutionary Tree: Complexity and Approximation
</TI>
<SU>Multiple alignment;
    Evolutionary tree;
    Complexity;
    Approximation;
    CA
</SU>
<AB>"It is shown that tree alignment is NP-hard and generalized tree alignment
is MAX SNP-hard. On the positive side, we design an efficient approximation
algorithm with performance ratio 2 for tree alignment. The algorithm is then
extended to a polynomial-time approximation scheme. ... The contrast between the
approximability of tree alignment and generalized tree alignment shows that a
phylogenetic tree can indeed help in multiple alignment."
</AB>
<JT>ACM Sympos Theory Comput</JT>
<PY>1994</PY>
<VO>26</VO>
<PP>760-769</PP>
</SEQ>

<SEQ>
<UI>0979   Lander,E.S.   Mapping and Interpreti.. Comm.ACM        91 
34(11):33-39
</UI>
<AU>Lander ES;
    Langridge R;
    Saccocio DM
</AU>
<TI>Mapping and Interpreting Biological Information
</TI>
<SU>Sequence analysis;
    Database search;
    Structure;
    USA;
    Mapping
</SU>
<AB>"This article summarizes some of the key computational challenges in three
major areas as discussed by workshop participants: sequence analysis,
information storage and retrieval, and protein structure prediction.
Successfully meeting the challenges in all these problem areas is likely to
require unprecedented collaboration between the computer and life sciences."
</AB>
<JT>Comm ACM</JT>
<PY>1991</PY>
<VO>34</VO>
<NO>11</NO>
<PP>33-39</PP>
</SEQ>

<SEQ>
<UI>0980   Kosaraju,S.R. Real-Time Pattern Matc.. ACM Sympos.Theo 94 
26:310-316
</UI>
<AU>Kosaraju SR
</AU>
<TI>Real-Time Pattern Matching and Quasi-Real-Time Construction of Suffix
Trees. Preliminary Version
</TI>
<SU>Pattern match;
    Search tree;
    USA;
    Suffix
</SU>
<AB>"We design simple real-time algorithms for the following problems for any
text string T and pattern string P: (a) given T#P as input, test whether P^R
[the reverse of P] is a substring of T, and (b) given T#P as input, test whether
P is a substring of T. Even though these results were claimed in a voluminous
paper by Slisenko, the design of a convincing and understandable solution is a
well-known open problem.
Our algorithm is based on a novel top-down suffix tree construction algorithm.
This algorithm ... constructs enough of the suffix tree in real-time so that it
can respond to pattern match queries in real-time."
</AB>
<JT>ACM Sympos Theory Comput</JT>
<PY>1994</PY>
<VO>26</VO>
<PP>310-316</PP>
</SEQ>

<SEQ>
<UI>0981   Cole,R.       Optimally Fast Paralle.. IEEE Sympos.Fou 93 
34:248-258
</UI>
<AU>Cole R;
    Crochemore M;
    Galil Z;
    Gasieniec L;
    Hariharan R;
    Muthukrishnan S;
    Park K;
    Rytter W
</AU>
<TI>Optimally Fast Parallel Algorithms for Preprocessing and Pattern Matching
in One and Two Dimensions
</TI>
<SU>Pattern match;
    Parallel;
    USA;
    Algorithm
</SU>
<AB>The authors obtain these results for a pattern of length m: "1. Improving
the preprocessing of the constant-time text search algorithm [Galil 1992] ....
2. A constant-time deterministic string-matching algorithm in the case that the
text length n satisfies n = Ω(m^(1+ε)) for a constant ε &gt; 0. 3. A simple
probabilistic string-matching algorithm that has constant time with high
probability for random input. 4. A constant expected time Las-Vegas algorithm
for computing the period of the pattern and all witnesses and thus string
matching itself, solving the main open problem remaining in string matching."
</AB>
<JT>IEEE Sympos Found Comput Sci</JT>
<PY>1993</PY>
<VO>34</VO>
<PP>248-258</PP>
</SEQ>

<SEQ>
<UI>0982   Muthukrishnan String Matching Under .. Lecture Notes i 92 
652:356-367
</UI>
<AU>Muthukrishnan S;
    Ramesh H
</AU>
<TI>String Matching Under a General Matching Relation
</TI>
<SU>String match;
    Complexity;
    USA
</SU>
<AB>Proc. 12th FST &amp; TCS, India. "In standard string matching, each symbol
matches only itself. In other string matching problems, e.g., the string
matching with 'don't cares' problem, a symbol may match several symbols. In
general, an arbitrary many-to-many matching relation might hold between symbols.
We consider a general string matching problem in which such a matching relation
is specified and those text positions are sought at which the pattern matches
under this relation. Depending upon the existence of a simple, easily
recognizable property in the given matching relation, we show that string
matching either requires time linear in the text and pattern lengths or is at
least as hard as boolean multiplication."
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>1992</PY>
<VO>652</VO>
<PP>356-367</PP>
</SEQ>

<SEQ>
<UI>0983   Muthukrishnan Non-Standard Stringolo.. ACM Sympos.Theo 94 
26:770-779
</UI>
<AU>Muthukrishnan S;
    Palem K
</AU>
<TI>Non-Standard Stringology: Algorithms and Complexity
</TI>
<SU>String match;
    Match with don't cares;
    Complexity;
    USA;
    Algorithm
</SU>
<AB>"Non-standard stringology concerns string matching problems, wherein a
position in the 'text' (of size n) matches one in the 'pattern' (of size m),
based on very general relationships between the corresponding 'symbols'. For
example, string matching with don't cares is a simple non-standard string
matching problem .... The main results in this paper concern the inherent
complexity of a variety of non-standard string matching problems, characterized
in terms of algebraic convolutions."
</AB>
<JT>ACM Sympos Theory Comput</JT>
<PY>1994</PY>
<VO>26</VO>
<PP>770-779</PP>
</SEQ>

<SEQ>
<UI>0984   Penotti,F.E.  A Distributed System f.. Comput.Appl.Bio 94 
10(3):277-280
</UI>
<AU>Penotti FE
</AU>
<TI>A Distributed System for DNA/Protein Database Similarity Searches
</TI>
<SU>Database search;
    Distributed;
    Italy;
    Similarity
</SU>
<AB>"A distributed system for exhaustive alignment similarity searches on
DNA/protein databases is presented. The system makes it possible to share the
computational burden on diverse computers, provided they are interconnected by a
network supporting TCP/IP communication."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1994</PY>
<VO>10</VO>
<NO>3</NO>
<PP>277-280</PP>
</SEQ>

<SEQ>
<UI>0985   Sahinalp,S.C. Symmetry Breaking for .. ACM Sympos.Theo 94 
26:300-309
</UI>
<AU>Sahinalp SC;
    Vishkin U
</AU>
<TI>Symmetry Breaking for Suffix Tree Construction
</TI>
<SU>String match;
    Search tree;
    Parallel;
    USA;
    Suffix
</SU>
<AB>"There are several serial algorithms for suffix tree construction which
run in linear time, but the number of operations in the only parallel algorithm
available ... is proportional to n log n. ... We show how to break symmetries
that occur in the process of assigning labels ... and thereby reduce the number
of labeled substrings to linear. We give several algorithms for suffix tree
construction. One of them runs in O(log^2 n) parallel time and O(n) work for
input strings whose characters are drawn from a constant size alphabet."
</AB>
<JT>ACM Sympos Theory Comput</JT>
<PY>1994</PY>
<VO>26</VO>
<PP>300-309</PP>
</SEQ>

<SEQ>
<UI>0986   Bairoch,A.    The SWISS-PROT Protein.. Nucleic Acids R 91 
19:2247-2249
</UI>
<AU>Bairoch A;
    Boeckmann B
</AU>
<TI>The SWISS-PROT Protein Sequence Data Bank
</TI>
<SU>Sequence database;
    SWI;
    Protein;
    SWISS-PROT
</SU>
<AB>"SWISS-PROT is an annotated protein sequence database established in 1986
and maintained collaboratively, since 1988, by the Department of Medical
Biochemistry of the University of Geneva and the EMBL Data Library." Sources of
the sequence data. Format. What distinguishes SWISS-PROT from other protein
sequence databases? Annotation. Minimal redundancy. Integration with other
databases. Content of the current release. Distribution.
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1991</PY>
<VO>19</VO>
<PP>2247-2249</PP>
</SEQ>

<SEQ>
<UI>0987   Barker,W.C.   The PIR-International .. Nucleic Acids R 92 
20:2023-2026
</UI>
<AU>Barker WC;
    George DG;
    Mewes HW;
    Tsugita A
</AU>
<TI>The PIR-International Protein Sequence Database
</TI>
<SU>Sequence database;
    Protein;
    PIR;
    USA
</SU>
<AB>PIR-International. The protein sequence database. The superfamily concept
and placement in the database. Standardization within and among databases. Work
in progress. Complementary role of the protein sequence database and Geninfo.
Data distribution on magnetic tapes and cd-rom. On-line access and e-mail
servers. How to obtain PIR-International databases, software, and newsletters.
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1992</PY>
<VO>20</VO>
<PP>2023-2026</PP>
</SEQ>

<SEQ>
<UI>0988   Staden,R.     Indexing the Sequence .. DNA Seq.- J.DNA 92 3:99-105
</UI>
<AU>Staden R;
    Dear S
</AU>
<TI>Indexing the Sequence Libraries: Software Providing a Common Indexing
System for all the Standard Sequence Libraries
</TI>
<SU>Database search;
    Program;
    UK
</SU>
<AB>"We describe a set of programs for creating and using indexes for the
distributed forms of the major sequence libraries. The indexes conform to the
specification of those distributed on cd-rom by the EMBL sequence library. The
programs create entry name, accession number, author and freetext indexes and a
brief directory index. If a suitable application program is given an entry name
or accession number these indexes allow rapid retrieval of sequences or
annotation. ... We also describe the organisation and use of the different
sequence libraries and their index files."
</AB>
<JT>DNA Seq - J DNA Seq Mapping</JT>
<PY>1992</PY>
<VO>3</VO>
<PP>99-105</PP>
</SEQ>

<SEQ>
<UI>0989   Bernstein,M.  Reducing the Man-Machi.. Comput.Appl.Bio 87 
3(3):229-232
</UI>
<AU>Bernstein M
</AU>
<TI>Reducing the Man-Machine Barrier: The Sequence Analysis Workbench
</TI>
<SU>Sequence analysis;
    Program;
    USA
</SU>
<AB>"Direct manipulation offers an alternative paradigm in which the scientist
uses the computer as a scientific instrument for examining and modifying the
data directly, rather than by issuing instructions to an agent. The Sequence
Analysis Workbench provides several experimental tools for direct manipulation
of sequence data; object-oriented programming makes it possible to construct
sophisticated tools quickly, and facilitates critical examination and review of
scientific software."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1987</PY>
<VO>3</VO>
<NO>3</NO>
<PP>229-232</PP>
</SEQ>

<SEQ>
<UI>0990   Sonnhammer,E. A Workbench for Large-.. Comput.Appl.Bio 94 
10(3):301-307
</UI>
<AU>Sonnhammer ELL;
    Durbin R
</AU>
<TI>A Workbench for Large-Scale Sequence Homology Analysis
</TI>
<SU>Sequence analysis;
    Program;
    UK;
    Sequence alignment;
    Database search;
    BLAST;
    Homology
</SU>
<AB>"To reduce the tedious browsing of large quantities of protein
similarities, two programs, MSPcrunch and Blixem, were developed, which assist
in processing the results from the database search programs in the BLAST suite.
MSPcrunch removes biased composition and redundant matches while keeping weak
matches that are consistent with a larger gapped alignment. ... Blixem is a
multiple sequence alignment viewer for X-windows which makes it significantly
easier to scan and evaluate the matches ratified by MSPcrunch."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1994</PY>
<VO>10</VO>
<NO>3</NO>
<PP>301-307</PP>
</SEQ>

<SEQ>
<UI>0991   Cornish-Bowde How Reliably do Amino .. J.Theor.Biol.   79 
76:369-386
</UI>
<AU>Cornish-Bowden A
</AU>
<TI>How Reliably do Amino Acid Composition Comparisons Predict Sequence
Similarities between Proteins?
</TI>
<SU>Sequence proximity;
    Sequence comparison;
    Composition;
    UK;
    Similarity;
    Amino acid;
    Protein
</SU>
<AB>"A method for comparing amino acid compositions of proteins
(Cornish-Bowden 1977) has been extended to allow proteins of unequal lengths to be
compared. ... It tends to exaggerate the amount of difference between unrelated
proteins. ... When applied to related proteins the method gives results in good
agreement with those predicted."
</AB>
<JT>J Theor Biol</JT>
<PY>1979</PY>
<VO>76</VO>
<PP>369-386</PP>
</SEQ>

<SEQ>
<UI>0992   Strelets,V.B. Data Bank Homology Sea.. Comput.Appl.Bio 94 
10(3):319-322
</UI>
<AU>Strelets VB;
    Ptitsyn AA;
    Milanesi L;
    Lim HA
</AU>
<TI>Data Bank Homology Search Algorithm with Linear Computation Complexity
</TI>
<SU>Database search;
    k-tuple;
    USA;
    Region;
    Complexity;
    Homology;
    Algorithm
</SU>
<AB>"The principal advantages of the new algorithm are: (i) linear computation
complexity; (ii) low memory requirements; (iii) high sensitivity to the presence
of local region homology. The algorithm first calculates indicative matrices of
k-tuple 'realization' in the query sequence and then searches for an appropriate
number of matching k-tuples within a narrow range in database sequences."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1994</PY>
<VO>10</VO>
<NO>3</NO>
<PP>319-322</PP>
</SEQ>

<SEQ>
<UI>0993   Branscomb,E.  Optimizing Restriction.. Genomics        90 
8:351-366
</UI>
<AU>Branscomb E;
    Slezak T;
    Pae R;
    Galas D;
    Carrano AV;
    Waterman M
</AU>
<TI>Optimizing Restriction Fragment Fingerprinting Methods for Ordering Large
Genomic Libraries
</TI>
<SU>Genome;
    Fingerprint;
    Contig;
    Likelihood;
    Statistical;
    USA;
    Restriction;
    Fragment;
    Genomic
</SU>
<AB>"We present a statistical analysis of the problem of ordering large
genomic cloned libraries through overlap detection based on restriction
fingerprinting. ... To this end, we adopt a statistical approach that uses the
likelihood ratio as a statistic to detect overlap. ... This estimate is a
critical tool for the accurate, automatic assembly of overlapping sets of
fragments into islands called 'contigs.' These contigs must subsequently be
connected by other methods to provide an ordered set of overlapping fragments
covering the entire genome."
</AB>
<JT>Genomics</JT>
<PY>1990</PY>
<VO>8</VO>
<PP>351-366</PP>
</SEQ>

<SEQ>
<UI>0994   Mott,R.       Algorithms and Softwar.. Nucleic Acids R 93 
21(8):1965-197
</UI>
<AU>Mott R;
    Grigoriev A;
    Maier E;
    Hoheisel J;
    Lehrach H
</AU>
<TI>Algorithms and Software Tools for Ordering Clone Libraries: Application to
the Mapping of the Genome of Schizosaccharomyces pombe
</TI>
<SU>Genome;
    Clone;
    Mapping;
    Program;
    UK;
    Simulated annealing;
    Algorithm
</SU>
<AB>"A complete set of software tools to aid the physical mapping of a genome
has been developed and successfully applied .... Two approaches were used for
ordering single-copy hybridisation probes: one was based on the simulated
annealing algorithm to order all probes, and another on inferring the
minimum-spanning subset of the probes using a heuristic filtering procedure. ... In
addition to these programs and the database management software, tools for
visualizing and editing the data are described."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1993</PY>
<VO>21</VO>
<NO>8</NO>
<PP>1965-1974</PP>
</SEQ>

<SEQ>
<UI>0995   Olson,M.V.    Random-Clone Strategy .. Proc.Nat.Acad.S 86 
83:7826-7830
</UI>
<AU>Olson MV;
    Dutchik JE;
    Graham MY;
    Brodeur GM;
    Helms C;
    Frank M;
    MacCollin M;
    Scheinman R;
    Frank T
</AU>
<TI>Random-Clone Strategy for Genomic Restriction Mapping in Yeast
</TI>
<SU>Genome;
    Restriction;
    USA;
    Mapping;
    Clone;
    Genomic
</SU>
<AB>"An approach to global restriction mapping is described that is applicable
to any complex source DNA. By analyzing a single restriction digest for each
member of a redundant set of λ clones, a data base is constructed that contains
fragment-size lists for all the clones. The clones are then grouped into
subsets, each member of which is related to at least one other member by a
significant overlap. Finally, a tree-searching algorithm seeks restriction maps
that are consistent with the fragment-size lists for all the clones in each
subset."
</AB>
<JT>Proc Nat Acad Sci USA</JT>
<PY>1986</PY>
<VO>83</VO>
<PP>7826-7830</PP>
</SEQ>

<SEQ>
<UI>0996   Zhang,P.      An Algorithm Based on .. Comput.Appl.Bio 94 
10(3):309-317
</UI>
<AU>Zhang P;
    Schon EA;
    Fischer SG;
    Cayanis E;
    Weiss J;
    Kistler S;
    Bourne PE
</AU>
<TI>An Algorithm Based on Graph Theory for the Assembly of Contigs in Physical
Mapping of DNA
</TI>
<SU>Genome;
    USA;
    Graph;
    Contig;
    Mapping;
    DNA;
    Algorithm;
    Physical mapping;
    Physical
</SU>
<AB>"An algorithm is described for mapping DNA contigs based on an interval
graph (IG) representation. In general terms, the input to the algorithm is a set
of binary overlapping relations among finite intervals spread along a real line,
from which the algorithm generates sets of ordered overlapping fragments
spanning that line. The implications of a more general case of the IG, called a
probe interval graph (PIG), in which only a subset of cosmids are used as
probes, are also discussed. ... CPU time is essentially linear with respect to
the number of cosmids analyzed."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1994</PY>
<VO>10</VO>
<NO>3</NO>
<PP>309-317</PP>
</SEQ>

<SEQ>
<UI>0997   Schmitt,W.    Multiple Solutions of .. Adv.Appl.Math.  91 
12:412-427
</UI>
<AU>Schmitt W;
    Waterman MS
</AU>
<TI>Multiple Solutions of DNA Restriction Mapping Problems
</TI>
<SU>Restriction;
    Digest;
    Mapping;
    USA;
    DNA
</SU>
<AB>"The construction of a restriction map of a DNA molecule from fragment
length data is known to be NP hard. However, it is also known that under a
simple model of randomness the number of solutions to the mapping problem
increases exponentially with the length of the DNA molecule. In this paper, we
define a hierarchy of equivalence relations on the set of all solutions to the
mapping problem and study the combinatorics and characterization of the
equivalence classes."
</AB>
<JT>Adv Appl Math</JT>
<PY>1991</PY>
<VO>12</VO>
<PP>412-427</PP>
</SEQ>

<SEQ>
<UI>0998   Watterson,G.A The Chromosome Inversi.. J.Theor.Biol.   82 99:1-7
</UI>
<AU>Watterson GA;
    Ewens WJ;
    Hall TE;
    Morgan A
</AU>
<TI>The Chromosome Inversion Problem
</TI>
<SU>Chromosome;
    Inversion;
    Genomic;
    AU
</SU>
<AB>"We wish to calculate a measure of distance between two species for the
purpose of constructing a phylogenetic tree. The data from which the distance
measure is to be calculated is the order of the sequence of gene loci around a
circular chromosome, and the distance between any two species is the minimum
number of chromosomal inversions necessary to make the two sequences identical.
There is no top or bottom to the chromosome so mirror image sequences are
regarded as being identical. There is also no fixed 12 o'clock position. Various
algorithms are considered which yield upper and lower bounds to the distance
measure required but no algorithm giving the exact value has been found."
</AB>
<JT>J Theor Biol</JT>
<PY>1982</PY>
<VO>99</VO>
<PP>1-7</PP>
</SEQ>

<SEQ>
<UI>0999   Galil,Z.      Saving Space in Fast S.. IEEE Sympos.Fou 77 
18:179-188
</UI>
<AU>Galil Z;
    Seiferas J
</AU>
<TI>Saving Space in Fast String-Matching
</TI>
<SU>String match;
    IL
</SU>
<AB>"The inspiration for this paper was an attempt to implement the fast
string-matching algorithm of Knuth, Morris and Pratt (1977) as a Fortran
subroutine. ... We show [for pattern x and text y] how to reduce the additional
space utilization by the fast algorithm down to O( log |x| ) memory locations.
... We show how to reduce the running time of the naive algorithm all the way
down to O( |x|^ε (|x| + |y|) ) for any fixed ε &gt; 0. Thus we get an almost
linear-time algorithm which can be implemented without any dynamic storage
allocation at all."
</AB>
<JT>IEEE Sympos Found Comput Sci</JT>
<PY>1977</PY>
<VO>18</VO>
<PP>179-188</PP>
</SEQ>

<SEQ>
<UI>1000   Guibas,L.J.   A New Proof of the Lin.. IEEE Sympos.Fou 77 
18:189-195
</UI>
<AU>Guibas LJ;
    Odlyzko AM
</AU>
<TI>A New Proof of the Linearity of the Boyer-Moore String Searching Algorithm
</TI>
<SU>String match;
    Boyer-Moore;
    String search;
    USA;
    Algorithm
</SU>
<AB>"The main result of this paper is a new proof of the linearity of the
Boyer-Moore algorithm. We have paid close attention to details and to clarity of
presentation. We have also improved the worst case bound from 6n to 4n. In the
process we have developed considerable combinatorial machinery dealing with the
occurrence of periods in strings, much of it of interest in its own right.
Undoubtedly the same or similar machinery will come in handy in the analysis of
other questions concerning pattern matching. [Possibly] the true worst case
bound for the algorithm is 2n."
</AB>
<JT>IEEE Sympos Found Comput Sci</JT>
<PY>1977</PY>
<VO>18</VO>
<PP>189-195</PP>
</SEQ>

<SEQ>
<UI>1001   Bafna,V.      Genome Rearrangements .. IEEE Sympos.Fou 93 
34:148-157
</UI>
<AU>Bafna V;
    Pevzner PA
</AU>
<TI>Genome Rearrangements and Sorting by Reversals
</TI>
<SU>Genome;
    Rearrangement;
    Sort;
    Reversal;
    USA
</SU>
<AB>"Sequence comparison in molecular biology is in the beginning of a major
paradigm shift - a shift from gene comparison based on local mutations to
chromosome comparison based on global rearrangements. In the simplest form the
problem of gene rearrangements corresponds to sorting by reversals, i.e.,
sorting of an array using reversals of arbitrary fragments." Theoretical results
and approximation algorithms are given for the cases of unsigned and signed
permutations.
</AB>
<JT>IEEE Sympos Found Comput Sci</JT>
<PY>1993</PY>
<VO>34</VO>
<PP>148-157</PP>
</SEQ>

<SEQ>
<UI>1002   Skiena,S.S.   A Partial Digest Appro.. Bull.Math.Biol. 94 
56(2):275-294
</UI>
<AU>Skiena SS;
    Sundaram G
</AU>
<TI>A Partial Digest Approach to Restriction Site Mapping
</TI>
<SU>Restriction;
    Mapping;
    DNA;
    USA;
    Digest
</SU>
<AB>"We present a new, practical algorithm to resolve the experimental data in
restriction site analysis, which is a common technique for mapping DNA.
Specifically, we assert that multiple digestions with a single restriction
enzyme can provide sufficient information to identify the positions of the
restriction sites with high probability. The motivation for the new approach
comes from combinatorial results on the number of mutually homeometric sets in
one dimension, where two sets of n points are homeometric if the multiset of
n(n-1)/2 distances they determine are the same."
</AB>
<JT>Bull Math Biol</JT>
<PY>1994</PY>
<VO>56</VO>
<NO>2</NO>
<PP>275-294</PP>
</SEQ>

<SEQ>
<UI>1003   Galil,Z.      Time-Space-Optimal Str.. ACM Sympos.Theo 81 
13:106-113
</UI>
<AU>Galil Z;
    Seiferas J
</AU>
<TI>Time-Space-Optimal String Matching (preliminary report)
</TI>
<SU>String match;
    Complexity;
    Automata;
    IL
</SU>
<AB>"In this paper we describe a new linear-time string-matching algorithm
requiring neither dynamic storage allocation nor other high-level capabilities.
The algorithm can be implemented to run in linear time even on a six-head
two-way finite automaton. Moreover, the automaton requires only =, ≠ branching.
(Decisions depend on which of the six scanned pattern or text symbols and
positions are the same, but not on the particular symbols or how many symbols
there are. Hence the same algorithm works even for an infinite alphabet.)"
</AB>
<JT>ACM Sympos Theory Comput</JT>
<PY>1981</PY>
<VO>13</VO>
<PP>106-113</PP>
</SEQ>

<SEQ>
<UI>1004   Hirschberg,D. The Least Weight Subse.. IEEE Sympos.Fou 85 
26:137-143
</UI>
<AU>Hirschberg DS;
    Larmore LL
</AU>
<TI>The Least Weight Subsequence Problem - extended abstract
</TI>
<SU>Least weight;
    Subsequence;
    USA
</SU>
<AB>"The least weight subsequence (LWS) problem is introduced, and is shown to
be equivalent to the classic minimum path problem for directed graphs. A special
case of the LWS problem is shown to be solvable in O( n log n ) time generally
and, for certain weight functions, in linear time. A number of applications are
given, including an optimum paragraph formation problem and the problem of
finding a minimum height B-tree, whose solutions realize improvement in
asymptotic time complexity."
</AB>
<JT>IEEE Sympos Found Comput Sci</JT>
<PY>1985</PY>
<VO>26</VO>
<PP>137-143</PP>
</SEQ>

<SEQ>
<UI>1005   Fitch,W.M.    Mapping the Order of D.. Gene            83 22:19-29
</UI>
<AU>Fitch WM;
    Smith TF;
    Ralph WW
</AU>
<TI>Mapping the Order of DNA Restriction Fragments
</TI>
<SU>Restriction;
    Mapping;
    DNA;
    USA;
    Fragment
</SU>
<AB>"A straightforward method was designed for mapping the order of DNA
restriction fragments obtained by a double and two single digestions, without
the necessity of using a computer or a radioactive label. All possible solutions
compatible with a pre-set level of error in the determination of sequence
lengths are obtained. The primary assumptions are given, and the appropriate
modifications of the algorithm are presented as a function of any assumptions
one is unable (or unwilling) to make. Use of the method in connection with
end-labeled fragments is also described."
</AB>
<JT>Gene</JT>
<PY>1983</PY>
<VO>22</VO>
<PP>19-29</PP>
</SEQ>

<SEQ>
<UI>1006   Li,M.         Towards a DNA Sequenci.. IEEE Sympos.Fou 90 
31:125-134
</UI>
<AU>Li M
</AU>
<TI>Towards a DNA Sequencing Theory (Learning a String) (Preliminary Version)
</TI>
<SU>Sequence analysis;
    Supersequence;
    Shortest common;
    Approximation;
    DNA;
    CA;
    Sequencing;
    Learning
</SU>
<AB>"We model the DNA sequencing problem as learning a superstring from its
randomly drawn substrings. ... One major obstacle to our approach turns out to
be a quite well-known open question on how to approximate the shortest common
superstring of a set of strings .... We give the first provably good algorithm
which approximates the shortest superstring of length n by a superstring of
length O( n log n )."
</AB>
<JT>IEEE Sympos Found Comput Sci</JT>
<PY>1990</PY>
<VO>31</VO>
<PP>125-134</PP>
</SEQ>

<SEQ>
<UI>1007   Kannan,S.     Inferring Evolutionary.. IEEE Sympos.Fou 90 
31(I):362-371
</UI>
<AU>Kannan S;
    Warnow T
</AU>
<TI>Inferring Evolutionary History from DNA Sequences (Extended Abstract)
</TI>
<SU>Phylogeny;
    DNA;
    USA
</SU>
<AB>"We are interested here in two related problems. The first is determining
whether we can triangulate a vertex-colored graph without introducing edges
between vertices of the same color. This is related to a fundamental problem for
geneticists, that of using character state information to construct evolutionary
trees. We demonstrate the polynomial equivalence of these problems. An important
subproblem arises when the characters are based upon DNA sequences. We present
an O( n^2 k ) algorithm for this case where n is the number of species and k is
the number of characters."
</AB>
<JT>IEEE Sympos Found Comput Sci</JT>
<PY>1990</PY>
<VO>31</VO>
<NO>I</NO>
<PP>362-371</PP>
</SEQ>

<SEQ>
<UI>1008   Blum,A.       Linear Approximation o.. ACM Sympos.Theo 91 
23:328-336
</UI>
<AU>Blum A;
    Jiang T;
    Li M;
    Tromp J;
    Yannakakis M
</AU>
<TI>Linear Approximation of Shortest Superstrings
</TI>
<SU>Supersequence;
    Shortest common;
    Approximation;
    USA
</SU>
<AB>"Although [the shortest common superstring] problem is known to be NP-hard,
a simple greedy procedure appears to do quite well .... We show that the
greedy algorithm does in fact achieve a constant factor approximation, proving
an upper bound of 4n. Furthermore, we present a simple modified version of the
greedy algorithm that we show produces a superstring of length at most 3n. We
also show the superstring problem to be MAX SNP-hard, which implies that a
polynomial-time approximation scheme for this problem is unlikely."
</AB>
<JT>ACM Sympos Theory Comput</JT>
<PY>1991</PY>
<VO>23</VO>
<PP>328-336</PP>
</SEQ>

<SEQ>
<UI>1009   Galil,Z.      Truly Alphabet-Indepen.. IEEE Sympos.Fou 92 
33:247-256
</UI>
<AU>Galil Z;
    Park K
</AU>
<TI>Truly Alphabet-Independent Two-Dimensional Pattern Matching
</TI>
<SU>Pattern match;
    Multidimensional;
    USA
</SU>
<AB>"We present an algorithm [for two-dimensional pattern matching with a
pattern of size m^2 and a text of size n^2] that is truly independent of the
alphabet and takes linear O( m^2 + n^2 ) time. As in the Knuth-Morris-Pratt
algorithm, the only operation on the alphabet is the equality test of two
symbols."
</AB>
<JT>IEEE Sympos Found Comput Sci</JT>
<PY>1992</PY>
<VO>33</VO>
<PP>247-256</PP>
</SEQ>

<SEQ>
<UI>1010   Jiang,T.      k One-way Heads Cannot.. ACM Sympos.Theo 93 25:62-70
</UI>
<AU>Jiang T;
    Li M
</AU>
<TI>k One-way Heads Cannot do String-Matching
</TI>
<SU>String match;
    Automata;
    CA
</SU>
<AB>"We settle a conjecture raised by Z. Galil and J. Seiferas [1981] 12 years
ago: k-head one-way deterministic finite automata cannot perform string-matching
(i.e., accept the language { x#y : there exists u and there exists v such that
y = uxv } ), for any k."
</AB>
<JT>ACM Sympos Theory Comput</JT>
<PY>1993</PY>
<VO>25</VO>
<PP>62-70</PP>
</SEQ>

<SEQ>
<UI>1011   Baker,B.S.    A Theory of Parameteri.. ACM Sympos.Theo 93 25:71-80
</UI>
<AU>Baker BS
</AU>
<TI>A Theory of Parameterized Pattern Matching: Algorithms and Applications
(Extended Abstract)
</TI>
<SU>Pattern match;
    Parameterized;
    Suffix;
    USA;
    Algorithm
</SU>
<AB>"This paper develops a theory and algorithms for an application problem
arising in software maintenance: to track down duplication in a large software
system. We want to find ... parameterized matches, where a parameterized match
between two sections of code means that one section can be transformed into the
other by [a one-to-one replacement of parameter names]. This paper formalizes
this problem in terms of parameterized strings and parameterized pattern
matching and defines a new data structure (parameterized suffix tree) suitable
for parameterized pattern matching."
</AB>
<JT>ACM Sympos Theory Comput</JT>
<PY>1993</PY>
<VO>25</VO>
<PP>71-80</PP>
</SEQ>

<SEQ>
<UI>1012   Blum,N.       Speeding up Dynamic Pr..                 94Institut 
fur In
</UI>
<AU>Blum N
</AU>
<TI>Speeding up Dynamic Programming without Omitting any Optimal Solution and
some Applications in Molecular Biology
</TI>
<SU>Pairwise alignment;
    Dynamic programming;
    DE;
    Complexity;
    Optimal;
    Dynamic
</SU>
<AB>"We extend the algorithm of Galil and Giancarlo, which sped up dynamic
programming in the case of concave cost functions such that a compact
representation of all optimal solutions is computed. The time complexity grows
only by a small constant factor. Under the assumption that such a compact
representation is given, we develop efficient algorithms for the solution of
problems in molecular biology concerning the computation of all optimal local
alignments and all optimal subalignments in genetic sequences."
</AB>
<PU>Institut für Informatik</PU>
<PL>Universität Bonn, Bonn</PL>
<PY>1994</PY>
<PP>1-37</PP>
</SEQ>

<SEQ>
<UI>1013   Doolittle,R.F Similar Amino Acid Seq.. Trends Biochem. 89 
14:244-245
</UI>
<AU>Doolittle RF
</AU>
<TI>Similar Amino Acid Sequences Revisited
</TI>
<SU>Sequence comparison;
    Significance;
    USA;
    Amino acid
</SU>
<AB>"The rapid accumulation of protein sequences, many bearing unexpected
resemblances to each other, is providing a new perspective on evolution." See
also Doolittle (1981).
</AB>
<JT>Trends Biochem Sci</JT>
<PY>1989</PY>
<VO>14</VO>
<PP>244-245</PP>
</SEQ>

<SEQ>
<UI>1014   Staden,R.     Graphic Methods to Det.. Nucleic Acids R 84 
12(1):521-538
</UI>
<AU>Staden R
</AU>
<TI>Graphic Methods to Determine the Function of Nucleic Acid Sequences
</TI>
<SU>Function;
    Sequence analysis;
    UK;
    Region;
    Display;
    Nucleic acid;
    Graphic
</SU>
<AB>"We have described a single program [ANALYSEQ] that contains the
traditional sequence analysis techniques plus some new methods that can locate
particular sequence features or regions that are of interest because they are
unusual. Most of the routines display their results graphically which has a
number of advantages. Graphical output is clearer than marking listings of
sequences; allows superposition, and therefore easy comparison, of the results
of many different and often independent forms of analysis; and it allows us to
see regions of sequences that may perform more than one function."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1984</PY>
<VO>12</VO>
<NO>1</NO>
<PP>521-538</PP>
</SEQ>

<SEQ>
<UI>1015   Hillis,D.M.   Ribosomal DNA: Molecul.. Q.Rev.Biol.     91 
66(4):411-453
</UI>
<AU>Hillis DM;
    Dixon MT
</AU>
<TI>Ribosomal DNA: Molecular Evolution and Phylogenetic Inference
</TI>
<SU>Phylogeny;
    Review;
    USA;
    Evolution;
    DNA;
    Phylogenetic
</SU>
<AB>"Studies of rDNA sequences have been used to infer phylogenetic history
across a very broad spectrum .... The reasons for the systematic versatility of
rDNA include the numerous rates of evolution among different regions of rDNA
..., the presence of many copies of most rDNA sequences per genome, and the
pattern of concerted evolution that occurs among repeated copies. These features
facilitate the analysis of rDNA by direct RNA sequencing, DNA sequencing ...,
and restriction enzyme methodologies. Constraints imposed by secondary structure
of rRNA and concerted evolution need to be considered in phylogenetic analyses,
but these constraints do not appear to impede seriously the usefulness of rDNA."
</AB>
<JT>Q Rev Biol</JT>
<PY>1991</PY>
<VO>66</VO>
<NO>4</NO>
<PP>411-453</PP>
</SEQ>

<SEQ>
<UI>1016   Konopka,A.    Is the Information Con.. J.Theor.Biol.   84 
107:697-704
</UI>
<AU>Konopka A
</AU>
<TI>Is the Information Content of DNA Evolutionarily Significant?
</TI>
<SU>Composition;
    Information content;
    DE;
    DNA
</SU>
<AB>"It has been suggested (Subba Rao, Hamid &amp; Subba Rao, 1979; Subba Rao,
Geevan &amp; Subba Rao, 1982) that the information content of the coding regions in
DNA tends to increase with evolution and, therefore, is a suitable indicator of
evolutionary progress. In order to re-examine this hypothesis, I have modified
the method used by Subba Rao et al. (1982) in such a way that the numerical
results are much less sensitive to the amino acid composition of the polypeptide
corresponding to the DNA sequences under consideration. By using this modified
procedure, I present evidence that the hypothesis of Subba Rao et al. (1982) is
not valid for a wide range of evolving genes."
</AB>
<JT>J Theor Biol</JT>
<PY>1984</PY>
<VO>107</VO>
<PP>697-704</PP>
</SEQ>

<SEQ>
<UI>1017   Shulman,M.J.  The Coding Function of.. J.Theor.Biol.   81 
88:409-420
</UI>
<AU>Shulman MJ;
    Steinberg CM;
    Westmoreland N
</AU>
<TI>The Coding Function of Nucleotide Sequences can be Discerned by
Statistical Analysis
</TI>
<SU>Sequence analysis;
    Function;
    Statistical;
    Coding;
    SWI;
    Nucleotide
</SU>
<AB>"The nucleotide sequences of the RNA phage MS2 and the DNA phage φX were
subjected to statistical analysis. This analysis alone indicates (a) that the
genetic code is a non-overlapping triplet code and (b) what the correct reading
frame is. The application of these methods to identify structure in sequences of
unknown function is discussed."
</AB>
<JT>J Theor Biol</JT>
<PY>1981</PY>
<VO>88</VO>
<PP>409-420</PP>
</SEQ>

<SEQ>
<UI>1018   Subba Rao,J.  Significance of the In.. J.Theor.Biol.   82 
96:571-577
</UI>
<AU>Subba Rao J;
    Geevan CP;
    Subba Rao G
</AU>
<TI>Significance of the Information Content of DNA in Mutations and Evolution
</TI>
<SU>Information content;
    Composition;
    Significance;
    India;
    Evolution;
    DNA
</SU>
<AB>"One point mutations in human haemoglobins have been analysed and it is
seen that most of these mutations satisfy the condition P1 &gt; P2, where P1 is the
probability of occurrence of the codon that mutates and P2 is that of the codon
it mutates to. Further, it is shown that the hypothesis that the information
content of DNA is a reasonable evolutionary measure is consistent with the above
condition."
</AB>
<JT>J Theor Biol</JT>
<PY>1982</PY>
<VO>96</VO>
<PP>571-577</PP>
</SEQ>

<SEQ>
<UI>1019   Rzhetsky,A.   A Simple Method for Es.. Mol.Biol.Evol.  92 
9(5):945-967
</UI>
<AU>Rzhetsky A;
    Nei M
</AU>
<TI>A Simple Method for Estimating and Testing Minimum-Evolution Trees
</TI>
<SU>Phylogeny;
    Evolutionary tree;
    Statistical;
    Minimum evolution;
    USA
</SU>
<AB>"A simple method for estimating and testing phylogenetic trees under the
principle of minimum evolution (ME) is presented. The basic procedure of this
method is first to obtain the neighbor-joining (NJ) tree by Saitou and Nei's
method and then to search for a tree with the minimum value of the sum (S) of
branch lengths by examining all trees that are closely related to the NJ tree.
Once the ME tree is identified, a statistical test is conducted for the
difference in S between this tree and other closely related trees. The
mathematical method required for conducting this test is developed by using the
least-squares approach."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1992</PY>
<VO>9</VO>
<NO>5</NO>
<PP>945-967</PP>
</SEQ>

<SEQ>
<UI>1020   Barber,A.M.   SequenceEditingAligner.. Gene.Anal.Techn 90 7:39-45
</UI>
<AU>Barber AM;
    Maizel JV Jr
</AU>
<TI>SequenceEditingAligner: A Multiple Sequence Editor and Aligner
</TI>
<SU>Display;
    Program;
    Editor;
    Sequence alignment;
    Consensus sequence;
    USA
</SU>
<AB>"Here we present the SequenceEditingAligner system for editing multiple,
aligned genetic sequences. This is an interactive multi-window color system that
displays more than 3500 nucleotides or amino acids. The system handles nucleic
acid or protein sequences with or without secondary structure data. More than
300 sequences, each more than 1500 elements in length, may be analyzed together.
With the system scientists can classify elements, align sequences, edit them,
find consensus patterns, and simultaneously generate oligomer frequency
histograms and other statistics."
</AB>
<JT>Gene Anal Techn Appl</JT>
<PY>1990</PY>
<VO>7</VO>
<PP>39-45</PP>
</SEQ>

<SEQ>
<UI>1021   Stockwell,P.A HOMED: A Homologous Se.. Trends Biochem. 88 
13:322-324
</UI>
<AU>Stockwell PA
</AU>
<TI>HOMED: A Homologous Sequence Editor
</TI>
<SU>Display;
    Program;
    Editor;
    NZ
</SU>
<AB>"Since the initial publication of the HOMED HOMologous sequence EDitor in
CABIOS (1987) a number of further enhancements have been made so that an updated
report of the current capabilities is desirable."
</AB>
<JT>Trends Biochem Sci</JT>
<PY>1988</PY>
<VO>13</VO>
<PP>322-324</PP>
</SEQ>

<SEQ>
<UI>1022   Fuchs,R.      Free Molecular Biologi.. Comput.Appl.Bio 90 
6(2):120-121
</UI>
<AU>Fuchs R
</AU>
<TI>Free Molecular Biological Software Available from the EMBL File Server
</TI>
<SU>Program;
    Sequence analysis;
    DE;
    EMBL;
    Server
</SU>
<AB>"A new service provided by EMBL (EMBL Software File Server) is described
that will make free molecular biology software available to anyone with computer
network access. MS-DOS, Apple Macintosh and VAX/VMS are supported at the moment.
The programs will be delivered by normal electronic mail; conversion mechanisms
will transform binary files to ASCII to allow mail transfer. The service will
also help authors to distribute their software conveniently."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1990</PY>
<VO>6</VO>
<NO>2</NO>
<PP>120-121</PP>
</SEQ>

<SEQ>
<UI>1023   Heckel,P.     A Technique for Isolat.. Comm.ACM        78 
21(4):264-268
</UI>
<AU>Heckel P
</AU>
<TI>A Technique for Isolating Differences between Files
</TI>
<SU>Sequence proximity;
    Longest common;
    Subsequence;
    USA
</SU>
<AB>"A simple algorithm is described for isolating the differences between two
files. ... The algorithm isolates differences in a way that corresponds closely
to our intuitive notion of difference, is easy to implement, and is
computationally efficient, with time linear in the file length. For most
applications the algorithm isolates differences similar to those isolated by the
longest common subsequence."
</AB>
<JT>Comm ACM</JT>
<PY>1978</PY>
<VO>21</VO>
<NO>4</NO>
<PP>264-268</PP>
</SEQ>

<SEQ>
<UI>1024   Salamon,P.    A Maximum Entropy Prin.. Computers Chem. 92 
16(2):117-124
</UI>
<AU>Salamon P;
    Konopka AK
</AU>
<TI>A Maximum Entropy Principle for the Distribution of Local Complexity in
Naturally Occurring Nucleotide Sequences
</TI>
<SU>Composition;
    Linguistic;
    Entropy;
    Complexity;
    Distribution;
    USA;
    Nucleotide
</SU>
<AB>"A maximum entropy principle (MEP) governing the distribution of
complexity of short oligonucleotides from large collections of functionally
equivalent sequences is presented. The principle is seen to work well in both
translated regions (exons and bacterial genes) and introns from various genomes.
It also works in cases of sample sequences from various genomes and even a
representative sample of the entire GenBank. This suggests that all naturally
occurring DNA sequences are likely to follow the MEP described in this report."
</AB>
<JT>Computers Chem</JT>
<PY>1992</PY>
<VO>16</VO>
<NO>2</NO>
<PP>117-124</PP>
</SEQ>

<SEQ>
<UI>1025   Bell,G.I.     Roles of Repetitive Se.. Computers Chem. 92 
16(2):135-143
</UI>
<AU>Bell GI
</AU>
<TI>Roles of Repetitive Sequences
</TI>
<SU>Regularities;
    Repeat;
    Repetition;
    USA
</SU>
<AB>"Repetitive sequences are ubiquitous in the DNA of eukaryotes, some as
tandem arrays and others interspersed widely in the genome. Repetitive sequences
have special roles in genome evolution, which increasingly detailed sequence
information is helping to elucidate. Processes, including meiotic crossing over
(equal and unequal), unequal mitotic sister chromatid exchange, gene conversion
and transposition, with or without multiplication, can foster homogeneity of the
members of a repeat family (concerted evolution) and turnover of the whole
genome. Some examples are considered."
</AB>
<JT>Computers Chem</JT>
<PY>1992</PY>
<VO>16</VO>
<NO>2</NO>
<PP>135-143</PP>
</SEQ>

<SEQ>
<UI>1026   Konopka,A.K.  Sequences, Codes and F.. Computers Chem. 92 
16(2):83-84
</UI>
<AU>Konopka AK
</AU>
<TI>Sequences, Codes and Functions
</TI>
<SU>Sequence analysis;
    USA;
    Coding;
    Linguistic;
    Function
</SU>
<AB>This editorial introduces a special issue entitled "Open Problems of
Computational Molecular Biology." The papers were presented at the Open Problems
of Computational Molecular Biology Workshop, Telluride, 2-8 June 1991; they are
devoted to aspects of the biological coding problem. Contents: Overviews and
Opinions (3 papers), Mathematical Models in Biomolecular Linguistics (3),
Examples of Encoded Biological Functions (3), Computational Experiments (2),
Models and Proposals (2).
</AB>
<JT>Computers Chem</JT>
<PY>1992</PY>
<VO>16</VO>
<NO>2</NO>
<PP>83-84</PP>
</SEQ>

<SEQ>
<UI>1027   Churchill,G.A Hidden Markov Chains a.. Computers Chem. 92 
16(2):107-115
</UI>
<AU>Churchill GA
</AU>
<TI>Hidden Markov Chains and the Analysis of Genome Structure
</TI>
<SU>Genome;
    Statistical;
    Markov;
    USA;
    Structure
</SU>
<AB>"In this paper, statistical methods based on a hidden Markov chain model
are used to study the structure of some small complete genomes and a human
genome segment. A variety of discrete compositional domains are discovered and
their correlations with genome function are explored."
</AB>
<JT>Computers Chem</JT>
<PY>1992</PY>
<VO>16</VO>
<NO>2</NO>
<PP>107-115</PP>
</SEQ>

<SEQ>
<UI>1028   Argos,P.      The Language of Protei.. Computers Chem. 92 
16(2):93-102
</UI>
<AU>Argos P
</AU>
<TI>The Language of Protein Folding: Many Forked Tongues
</TI>
<SU>Structure;
    Linguistic;
    DE;
    Protein;
    Language;
    Folding
</SU>
<AB>"Protein folding is discussed in analogy with language. The protein
primary sequence, a string of successive amino acids in one letter code, is a
sentence. A sentence contains words, or subsequences, which are the local
secondary structures of the protein. Each of the words can have several meanings
but, when collected together and read in context, convey a central idea; namely,
the folded and functional protein. Deciphering the fold or meaning of the
protein sentence from only a knowledge of the ordered letters is a difficult
task."
</AB>
<JT>Computers Chem</JT>
<PY>1992</PY>
<VO>16</VO>
<NO>2</NO>
<PP>93-102</PP>
</SEQ>

<SEQ>
<UI>1029   Claverie,J.M. Sequence "Signals": Ar.. Computers Chem. 92 
16(2):89-91
</UI>
<AU>Claverie JM
</AU>
<TI>Sequence "Signals": Artifact or Reality?
</TI>
<SU>Signal;
    USA;
    Linguistic
</SU>
<AB>"I first review the concept of molecular sequence signal and the various
forms it can take in the literature. I then comment on the limitation of this
concept on the grounds that sequence signals are neither independent of a
subjective representation, nor complete and/or verifiable correlates of the
associated biological phenomena."
</AB>
<JT>Computers Chem</JT>
<PY>1992</PY>
<VO>16</VO>
<NO>2</NO>
<PP>89-91</PP>
</SEQ>

<SEQ>
<UI>1030   Konopka,A.K.  Computational Molecula.. Computers Chem. 93 
17(2):v-vi
</UI>
<AU>Konopka AK
</AU>
<TI>Computational Molecular Biology: From Sequence Research to Software
Development
</TI>
<SU>Sequence analysis;
    USA
</SU>
<AB>This editorial introduces a special issue entitled "Open Problems of
Computational Molecular Biology (2)." The papers were presented at the Second
International Workshop on Open Problems in Computational Molecular Biology,
Telluride, 19 July - 2 August 1992; they are devoted to aspects of computational
molecular biology (the goal of which is to understand biological phenomena
through computational experiments and plausible reasoning). Contents: Overviews
and Opinions (2 papers), Mathematical Models (3), Computational Experiments and
Molecular Evolution (3), Data Analysis and Software Development (4).
</AB>
<JT>Computers Chem</JT>
<PY>1993</PY>
<VO>17</VO>
<NO>2</NO>
<PP>v-vi</PP>
</SEQ>

<SEQ>
<UI>1031   Taylor,W.R.   Protein Structure Pred.. Computers Chem. 93 
17(2):117-122
</UI>
<AU>Taylor WR
</AU>
<TI>Protein Structure Prediction from Sequence
</TI>
<SU>Structure;
    Sequence alignment;
    Pattern match;
    UK;
    Protein;
    Prediction
</SU>
<AB>"The problem of protein tertiary structure prediction from sequence is
reviewed, emphasizing that practical solutions are most likely to come from the
recognition of existing (known) structures that fit the sequence of the protein
of unknown structure. Fit can be defined in terms of sequence alone - by simple
alignment in the more obvious problems or pattern matching where the similarity
is remote and fragmentary. More remote similarities can be recognized by
matching the sequence directly onto a known structure. This threading method is
outlined ...."
</AB>
<JT>Computers Chem</JT>
<PY>1993</PY>
<VO>17</VO>
<NO>2</NO>
<PP>117-122</PP>
</SEQ>

<SEQ>
<UI>1032   Gribskov,M.   A Mechanistic View of .. Computers Chem. 93 
17(2):113-116
</UI>
<AU>Gribskov M
</AU>
<TI>A Mechanistic View of Proteins and Their Sequences
</TI>
<SU>Sequence analysis;
    Mechanistic;
    USA;
    Protein
</SU>
<AB>"I consider the application of a mechanical analogy to sequence analysis
and in particular to protein sequences and structures. The mechanistic metaphor
is easily recognized as one of the fundamental concepts behind experimental
disciplines such as biochemistry, genetics and cell biology. Its application to
analysis of protein sequences is most clearly seen in the application of
comparative approaches to associating structure with function."
</AB>
<JT>Computers Chem</JT>
<PY>1993</PY>
<VO>17</VO>
<NO>2</NO>
<PP>113-116</PP>
</SEQ>

<SEQ>
<UI>1033   Salamon,P.    On the Robustness of M.. Computers Chem. 93 
17(2):135-148
</UI>
<AU>Salamon P;
    Wootton JC;
    Konopka AK;
    Hansen LK
</AU>
<TI>On the Robustness of Maximum Entropy Relationships for Complexity
Distributions of Nucleotide Sequences
</TI>
<SU>Composition;
    Entropy;
    Complexity;
    Distribution;
    USA;
    Robustness;
    Nucleotide
</SU>
<AB>"Given a functionally equivalent set of natural nucleotide sequences, the
distribution of local compositional complexity among all subsequences of this
set appears to be as random as possible consistent with the mean complexity of
such subsequences. The robustness of this relationship and its possible causes
have been explored ...."
</AB>
<JT>Computers Chem</JT>
<PY>1993</PY>
<VO>17</VO>
<NO>2</NO>
<PP>135-148</PP>
</SEQ>

<SEQ>
<UI>1034   Wootton,J.C.  Statistics of Local Co.. Computers Chem. 93 
17(2):149-163
</UI>
<AU>Wootton JC;
    Federhen S
</AU>
<TI>Statistics of Local Complexity in Amino Acid Sequences and Sequence
Databases
</TI>
<SU>Sequence analysis;
    Sequence database;
    Statistical;
    Complexity;
    Segment;
    USA;
    Amino acid
</SU>
<AB>"Protein sequences contain surprisingly many local regions of low
compositional complexity. ... Several different formal definitions of local
complexity and probability are presented here and are compared for their
utility in algorithms for localization of such regions in amino acid sequences and
sequence databases. ... These measures ... are shown to be broadly similar for
first-pass, approximate localization of low-complexity regions in protein
sequences, but they give significantly different results when applied in
optimal segmentation algorithms."
</AB>
<JT>Computers Chem</JT>
<PY>1993</PY>
<VO>17</VO>
<NO>2</NO>
<PP>149-163</PP>
</SEQ>

<SEQ>
<UI>1035   Bell,G.I.     Repetitive DNA Sequenc.. Computers Chem. 93 
17(2):185-190
</UI>
<AU>Bell GI;
    Torney DC
</AU>
<TI>Repetitive DNA Sequences: Some Considerations for Simple Sequence Repeats
</TI>
<SU>Regularities;
    Repeat;
    USA;
    DNA
</SU>
<AB>"(1) Can the polymorphism evident in the length of many simple sequence
repeats (SSRs) or microsatellites be explained as a result of unequal mitotic
crossing over? [Probably not.] ... (2) Some results are presented on the number
of mono- and di-nucleotide repeats in the human genome. For each high scoring
locus, an optimal alignment is made of the actual with an ideal SSR; for such
alignments, the relative numbers of insertions, deletions (indels), transitions
and transversions are obtained for each class of SSR. (3) An elementary
derivation of the number of equivalence classes of SSRs of any word length, n,
is given."
</AB>
<JT>Computers Chem</JT>
<PY>1993</PY>
<VO>17</VO>
<NO>2</NO>
<PP>185-190</PP>
</SEQ>

<SEQ>
<UI>1036   Wu,C.H.       Classification Neural .. Computers Chem. 93 
17(2):219-227
</UI>
<AU>Wu CH
</AU>
<TI>Classification Neural Networks for Rapid Sequence Annotation and Automated
Database Organization
</TI>
<SU>Sequence database;
    Neural;
    Classification;
    N-gram;
    USA;
    Network
</SU>
<AB>"A neural network classification method has been developed as an
alternative approach to the search/organization problem of large molecular
databases. Two artificial neural systems have been implemented on a Cray for
rapid protein/nucleic acid classification of unknown sequences. The system
employs a n-gram hashing function for sequence encoding and modular back-
propagation networks for classification. The protein system, which classifies
proteins into PIR superfamilies, has achieved 82-100% sensitivity at a speed
that is about an order of magnitude faster than other search methods."
</AB>
<JT>Computers Chem</JT>
<PY>1993</PY>
<VO>17</VO>
<NO>2</NO>
<PP>219-227</PP>
</SEQ>

<SEQ>
<UI>1037   Miura,R.M.    Preface [Some Mathemat.. Lect.Math.Life  86 17:ix-x
</UI>
<AU>Miura RM
</AU>
<TI>Preface [Some Mathematical Questions in Biology - DNA Sequence Analysis]
</TI>
<SU>Sequence analysis;
    CA;
    Pattern recognition;
    Sequence comparison;
    Statistical;
    Probabilistic;
    Structure;
    DNA
</SU>
<AB>"This volume contains papers based on lectures which were presented at the
Eighteenth Annual Symposium on Some Mathematical Questions in Biology - DNA
Sequence Analysis. The Symposium was held on May 28, 1984 in New York City in
conjunction with the Annual Meeting of the American Association for the
Advancement of Science."
</AB>
<JT>Lect Math Life Sci</JT>
<PY>1986</PY>
<VO>17</VO>
<PP>ix-x</PP>
</SEQ>

<SEQ>
<UI>1038   Cole,R.       Tight Bounds on the Co.. SIAM J.Comput.  94 
23(5):1075-109
</UI>
<AU>Cole R
</AU>
<TI>Tight Bounds on the Complexity of the Boyer-Moore String Matching
Algorithm
</TI>
<SU>String match;
    Boyer-Moore;
    Pattern match;
    USA;
    Complexity;
    Algorithm
</SU>
<AB>"The problem of finding all occurrences of a pattern of length m in a text
of length n is considered. It is shown that the Boyer-Moore string matching
algorithm performs roughly 3n comparisons and that this bound is tight up to
O(n/m) .... While the upper bound is somewhat involved, its main elements
provide a simple proof of a 4n upper bound for the same algorithm."
</AB>
<JT>SIAM J Comput</JT>
<PY>1994</PY>
<VO>23</VO>
<NO>5</NO>
<PP>1075-1091</PP>
</SEQ>

<SEQ>
<UI>1039   Crochemore,M. Speeding Up Two String.. Algorithmica    94 
12(4/5):247-26
</UI>
<AU>Crochemore M;
    Czumaj A;
    Gasieniec L;
    Jarominek S;
    Lecroq T;
    Plandowski W;
    Rytter W
</AU>
<TI>Speeding Up Two String-Matching Algorithms
</TI>
<SU>String match;
    Pattern match;
    Suffix;
    Automata;
    Repetition;
    Boyer-Moore;
    Factor;
    PO;
    Algorithm
</SU>
<AB>"We show how to speed up two string-matching algorithms: the Boyer-Moore
algorithm (BM algorithm), and its version called here the reverse factor
algorithm (RF algorithm). The RF algorithm is based on factor graphs for the
reverse of the pattern. The main feature of both algorithms is that they scan
the text right-to-left from the supposed right position of the pattern. ... We
show that it is enough to remember the last matched segment ... to speed up the
RF algorithm considerably ... and to speed up the BM algorithm (to make at most
2n comparisons)."
</AB>
<JT>Algorithmica</JT>
<PY>1994</PY>
<VO>12</VO>
<NO>4/5</NO>
<PP>247-267</PP>
</SEQ>

<SEQ>
<UI>1040   Vishkin,U.    Optimal Parallel Patte.. Lecture Notes i 85 
194:497-508
</UI>
<AU>Vishkin U
</AU>
<TI>Optimal Parallel Pattern Matching in Strings (Extended Summary)
</TI>
<SU>Pattern match;
    Parallel;
    String match;
    Optimal;
    IL
</SU>
<AB>Proceedings, ICALP'85. "Given a text of length n and a pattern, we present
a parallel linear algorithm for finding all occurrences of the pattern in the
text. The algorithm runs in O(n/p) time using any number of p &lt;= n/log n
processors on a concurrent-read concurrent-write parallel random-access-
machine."
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>194</PY>
<VO>194</VO>
<PP>497-508</PP>
</SEQ>

<SEQ>
<UI>1041   Moore,D.      An Optimal Algorithm t.. Inform.Process. 94 
50:239-246
</UI>
<AU>Moore D;
    Smyth WF
</AU>
<TI>An Optimal Algorithm to Compute all the Covers of a String
</TI>
<SU>Cover;
    AU;
    Optimal;
    Algorithm
</SU>
<AB>"Let x denote a given nonempty string of length n &gt;= 1. A string u is a
cover of x if and only if every position of x lies within an occurrence of u
within x. Thus x is always a cover of itself. In this paper we characterize all
the covers of x in terms of an easily computed normal form for x. The
characterization theorem then gives rise to a simple recursive algorithm which
computes all the covers of x in time Θ(n)."
</AB>
<JT>Inform Process Lett</JT>
<PY>1994</PY>
<VO>50</VO>
<PP>239-246</PP>
</SEQ>

<SEQ>
<UI>1042   Perleberg,C.H Single Character Searc.. Inform.Process. 94 
50:269-275
</UI>
<AU>Perleberg CH
</AU>
<TI>Single Character Searching Methods and the Shift-Or Pattern-Matching
Algorithm
</TI>
<SU>Sequence search;
    Pattern match;
    CL;
    Algorithm
</SU>
<AB>"Single character searching (SCS) methods have wide application in text
processing, since many text processing algorithms need to search for a single
character in a text string. In this paper, we compare three SCS methods. Two SCS
methods are applied to the shift-or pattern matching algorithm of Baeza-Yates
and Gonnet (1992), and the performance of the different versions of the
algorithm are compared. ... Finally, the shift-or implementations are compared
to the Tuned Boyer-Moore implementation of Hume and Sunday (1991)."
</AB>
<JT>Inform Process Lett</JT>
<PY>1994</PY>
<VO>50</VO>
<PP>269-275</PP>
</SEQ>

<SEQ>
<UI>1043   Breslauer,D.  Testing String Superpr.. Inform.Process. 94 
49:235-241
</UI>
<AU>Breslauer D
</AU>
<TI>Testing String Superprimitivity in Parallel
</TI>
<SU>String match;
    Cover;
    Parallel;
    Regularities;
    Italy
</SU>
<AB>"A string w covers another string z if every symbol of z is within some
occurrence of w in z. A string is called superprimitive if it is covered only by
itself .... This paper presents an optimal ... CRCW-PRAM algorithm that tests if
a string z is superprimitive ...."
</AB>
<JT>Inform Process Lett</JT>
<PY>1994</PY>
<VO>49</VO>
<PP>235-241</PP>
</SEQ>

<SEQ>
<UI>1044   Apostolico,A. Optimal Superprimitivi.. Inform.Process. 91 
39(1):17-20
</UI>
<AU>Apostolico A;
    Farach M;
    Iliopoulos CS
</AU>
<TI>Optimal Superprimitivity Testing for Strings
</TI>
<SU>Cover;
    Optimal;
    Regularities;
    USA
</SU>
<AB>"A string w covers another string z if every position of z is within some
occurrence of w in z. Clearly, every string is covered by itself. A string that
is covered only by itself is superprimitive. We show that the property of being
superprimitive is testable on a string of n symbols in O(n) time and space."
</AB>
<JT>Inform Process Lett</JT>
<PY>1991</PY>
<VO>39</VO>
<NO>1</NO>
<PP>17-20</PP>
</SEQ>

<SEQ>
<UI>1045   Breslauer,D.  An On-line String Supe.. Inform.Process. 92 
44(6):345-347
</UI>
<AU>Breslauer D
</AU>
<TI>An On-line String Superprimitivity Test
</TI>
<SU>Cover;
    Prefix;
    Regularities;
    USA;
    On-line
</SU>
<AB>"We present an on-line linear-time algorithm that tests if each prefix of
an input string is superprimitive while the string is given a symbol at a time."
</AB>
<JT>Inform Process Lett</JT>
<PY>1992</PY>
<VO>44</VO>
<NO>6</NO>
<PP>345-347</PP>
</SEQ>

<SEQ>
<UI>1046   Amir,A.       Alphabet Dependence in.. Inform.Process. 94 
49:111-115
</UI>
<AU>Amir A;
    Farach M;
    Muthukrishnan S
</AU>
<TI>Alphabet Dependence in Parameterized Matching
</TI>
<SU>Parameterized;
    Pattern match;
    USA;
    String match
</SU>
<AB>"In this paper we provide an algorithm to find all occurrences of a
pattern string of length m in a text string of length n under the parameterized
pattern matching model. ... Our algorithm is optimal since we show that [a type
of] dependence ... is inherent to any algorithm for this problem in the
comparison model."
</AB>
<JT>Inform Process Lett</JT>
<PY>1994</PY>
<VO>49</VO>
<PP>111-115</PP>
</SEQ>

<SEQ>
<UI>1047   Kannan,S.K.   Inferring Evolutionary.. SIAM J.Comput.  94 
23(4):713-737
</UI>
<AU>Kannan SK;
    Warnow TJ
</AU>
<TI>Inferring Evolutionary History from DNA Sequences
</TI>
<SU>Phylogeny;
    Evolutionary tree;
    Graph;
    USA;
    DNA
</SU>
<AB>"One of the longstanding problems in computational molecular biology is
the Character Compatibility Problem, which is concerned with the construction of
phylogenetic trees for species sets, where the species are defined by
characters. The character compatibility problem is NP-complete in general. In
this paper an O(n^2 k) time algorithm is described for the case where the species
are described by quaternary characters. This algorithm can be used to construct
phylogenetic trees from DNA sequences."
</AB>
<JT>SIAM J Comput</JT>
<PY>1994</PY>
<VO>23</VO>
<NO>4</NO>
<PP>713-737</PP>
</SEQ>

<SEQ>
<UI>1048   Baeza-Yates,R Fast String Matching w.. Inform.Comput.  94 
108(2):187-199
</UI>
<AU>Baeza-Yates RA;
    Gonnet GH
</AU>
<TI>Fast String Matching with Mismatches
</TI>
<SU>Match with k mismatches;
    String match;
    Boyer-Moore;
    Automata;
    CL
</SU>
<AB>"We describe and analyze three simple and fast algorithms on the average
for solving the problem of string matching with bounded number of mismatches.
These are the naive algorithm, an algorithm based on the Boyer-Moore approach,
and ad hoc deterministic finite automata searching. We include simulation
results that compare these algorithms to previous works."
</AB>
<JT>Inform Comput</JT>
<PY>1994</PY>
<VO>108</VO>
<NO>2</NO>
<PP>187-199</PP>
</SEQ>

<SEQ>
<UI>1049   Baeza-Yates,R Fast and Practical App.. Lecture Notes i 92 
644:185-192
</UI>
<AU>Baeza-Yates RA;
    Perleberg CH
</AU>
<TI>Fast and Practical Approximate String Matching
</TI>
<SU>Approximate match;
    String match;
    CL
</SU>
<AB>Third Annual Symposium, CPM 92. Tucson, Arizona, April 29 - May 1, 1992.
Proceedings. "We present new algorithms for approximate string matching based in
simple, but efficient, ideas. First, we present an algorithm for string matching
with mismatches based in arithmetical operations that runs in linear worst case
time for most practical cases. This is a new approach to string searching.
Second, we present an algorithm for string matching with errors based on
partitioning the pattern that requires linear expected time for typical inputs."
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>1992</PY>
<VO>644</VO>
<PP>185-192</PP>
</SEQ>

<SEQ>
<UI>1050   Baeza-Yates,R Technical Corresponden.. Comm.ACM        92 
35(4):132-137
</UI>
<AU>Baeza-Yates R;
    Krogh FT;
    Ziegler B;
    Sibbald PR;
    Sunday DM
</AU>
<TI>Technical Correspondence. Notes on a Very Fast Substring Search Algorithm
</TI>
<SU>String search;
    String match;
    USA;
    Algorithm
</SU>
<AB>Four letters criticizing the paper by Sunday (1990), and his response.
</AB>
<JT>Comm ACM</JT>
<PY>1992</PY>
<VO>35</VO>
<NO>4</NO>
<PP>132-137</PP>
</SEQ>

<SEQ>
<UI>1051   Manber,U.     Approximate Pattern Ma.. Byte            92 17(12, 
Nov.):2
</UI>
<AU>Manber U;
    Wu S
</AU>
<TI>Approximate Pattern Matching
</TI>
<SU>Pattern match;
    Approximate match;
    USA
</SU>
<AB>"Most text editors and search programs do not support approximate searches
because of the complexity involved in implementing such a procedure. But some
new algorithms may change that. Below, we describe one such algorithm in
sufficient detail to enable you to include it in your own programs. We also
describe agrep, a Unix software tool for approximate pattern matching that we
developed. Agrep includes many options that make searching powerful and
convenient."
</AB>
<JT>Byte</JT>
<PY>1992</PY>
<VO>17</VO>
<NO>12, Nov.</NO>
<PP>281-292</PP>
</SEQ>

<SEQ>
<UI>1052   Lipton,R.J.   Computational Approach.. Proc.IEEE       89 
77(7):1056-106
</UI>
<AU>Lipton RJ;
    Marr TG;
    Welsh JD
</AU>
<TI>Computational Approaches to Discovering Semantics in Molecular Biology
</TI>
<SU>Sequence comparison;
    USA
</SU>
<AB>"One of the central questions of molecular biology is the discovery of the
semantics of DNA. This discovery relies in a critical way on a variety of
expensive computations. In order to solve these computations, both parallel
computers and special-purpose hardware play a major role. ... In this paper we
discuss the basic methodology involved in discovering the evolutionary structure
of both DNA and proteins. ... The fundamental questions are just how the
sequences are to be compared. The implementation issues concern the vast amounts
of computation required to execute the known algorithms."
</AB>
<JT>Proc IEEE</JT>
<PY>1989</PY>
<VO>77</VO>
<NO>7</NO>
<PP>1056-1060</PP>
</SEQ>

<SEQ>
<UI>1053   Richards,F.M. The Protein Folding Pr.. Sci.Am.         91 
264(1):54-63
</UI>
<AU>Richards FM
</AU>
<TI>The Protein Folding Problem
</TI>
<SU>Structure;
    USA;
    Protein;
    Folding
</SU>
<AB>"In theory, all one needs to know in order to fold a protein into its
biologically active shape is the sequence of its constituent amino acids. Why
has nobody been able to put theory into practice?"
</AB>
<JT>Sci Am</JT>
<PY>1991</PY>
<VO>264</VO>
<NO>1</NO>
<PP>54-63</PP>
</SEQ>

<SEQ>
<UI>1054   Crochemore,M. Foreword [Selected Pap.. Theoret.Comput. 92 
92(1):1-1
</UI>
<AU>Crochemore M
</AU>
<TI>Foreword [Selected Papers of the Combinatorial Pattern Matching School,
Paris, July 1990]
</TI>
<SU>Pattern match;
    FR;
    Combinatorial
</SU>
<AB>"This volume contains a selection of papers that have been presented at
the first Combinatorial Pattern Matching school, held in Paris during July 1990.
The school presented to young researchers a wide variety of combinatorial
methods used in the domain of Pattern Recognition, through lectures delivered by
A. V. Aho, A. Apostolico, M. Crochemore, Z. Galil and E. Ukkonen. Other
researchers have also presented their own works and the whole result is a kind
of panorama of what is being done in the domain of Pattern Matching and its
applications."
</AB>
<JT>Theoret Comput Sci</JT>
<PY>1992</PY>
<VO>92</VO>
<NO>1</NO>
<PP>1-1</PP>
</SEQ>

<SEQ>
<UI>1055   Crochemore,M. Foreword [Combinatoria.. Lecture Notes i 94 
807:iii-iii
</UI>
<AU>Crochemore M;
    Gusfield D
</AU>
<TI>Foreword [Combinatorial Pattern Matching. 5th Annual Symposium, CPM 94]
</TI>
<SU>Pattern match;
    FR
</SU>
<AB>Asilomar, CA, USA, June 5-8, 1994. Proceedings. "Combinatorial Pattern
Matching addresses issues of searching and matching of strings and more
complicated patterns such as trees, regular expressions, extended expressions,
etc. The goal is to derive non-trivial combinatorial properties for such
structures and then to exploit these properties in order to achieve superior
performances for the corresponding computational problems." Contents:
alignments (7 papers), various matchings (5), combinatorial aspects (7), more
bio-informatics (7).
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>1994</PY>
<VO>807</VO>
<PP>iii-iii</PP>
</SEQ>

<SEQ>
<UI>1056   Lander,E.S.   Genomic Mapping by Fin.. Genomics        88 
2:231-239
</UI>
<AU>Lander ES;
    Waterman MS
</AU>
<TI>Genomic Mapping by Fingerprinting Random Clones: A Mathematical Analysis
</TI>
<SU>Genome;
    Fingerprint;
    Mapping;
    Clone;
    USA;
    Genomic
</SU>
<AB>"The physical map is assembled by first 'fingerprinting' a large number of
clones chosen at random from a recombinant library and then inferring overlaps
between clones with sufficiently similar fingerprints. Although the basic
approach is the same, there are many possible choices for the fingerprint used
to characterize the clones and the rules for declaring overlap. In this paper,
we derive simple formulas showing how the progress of a physical mapping project
is affected by the nature of the fingerprinting scheme. Using these formulas, we
discuss the analytic considerations involved in selecting an appropriate
fingerprinting scheme for a particular project."
</AB>
<JT>Genomics</JT>
<PY>1988</PY>
<VO>2</VO>
<PP>231-239</PP>
</SEQ>

<SEQ>
<UI>1057   Michiels,F.   Molecular Approaches t.. Comput.Appl.Bio 87 
3(3):203-210
</UI>
<AU>Michiels F;
    Craig AG;
    Zehetner G;
    Smith GP;
    Lehrach H
</AU>
<TI>Molecular Approaches to Genome Analysis: A Strategy for the Construction
of Ordered Overlapping Clone Libraries
</TI>
<SU>Genome;
    Statistical;
    Clone;
    DE
</SU>
<AB>"Here we describe progress on a series of molecular techniques designed to
bridge the gap between genetic and molecular distances in mammals. ... We
summarize approaches for the physical and molecular analysis of genetic
distances and describe the experimental, statistical and computational basis of
a new approach to create ordered libraries of overlapping clones from large
genomes."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1987</PY>
<VO>3</VO>
<NO>3</NO>
<PP>203-210</PP>
</SEQ>

<SEQ>
<UI>1058   Sulston,J.    Software for Genome Ma.. Comput.Appl.Bio 88 
4(1):125-132
</UI>
<AU>Sulston J;
    Mallett F;
    Staden R;
    Durbin R;
    Horsnell T;
    Coulson A
</AU>
<TI>Software for Genome Mapping by Fingerprinting Techniques
</TI>
<SU>Genome;
    Fingerprint;
    Program;
    UK;
    Mapping;
    Fragment
</SU>
<AB>"A genome mapping package has been developed for reading and assembling
data from clones analysed by restriction enzyme fragmentation and polyacrylamide
gel electrophoresis. The package comprises: data entry; matching; assembly;
statistical analysis; modelling."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1988</PY>
<VO>4</VO>
<NO>1</NO>
<PP>125-132</PP>
</SEQ>

<SEQ>
<UI>1059   Baeza-Yates,R Proximity Matching usi.. Lecture Notes i 94 
807:198-212
</UI>
<AU>Baeza-Yates R;
    Cunto W;
    Manber U;
    Wu S
</AU>
<TI>Proximity Matching using Fixed-Queries Trees
</TI>
<SU>Approximate match;
    Search tree;
    Data structure;
    CL
</SU>
<AB>5th Annual Symposium, CPM 94. Asilomar, June 5-8, 1994. Proceedings. "We
present a new data structure, called the fixed-queries tree, for the problem of
finding all elements of a fixed set that are close, under some distance
function, to a query element. Fixed-queries trees can be used for any distance
function, not necessarily even a metric, as long as it satisfies the triangle
inequality. ... Fixed-queries trees are particularly efficient for applications
in which comparing two elements is expensive."
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>1994</PY>
<VO>807</VO>
<PP>198-212</PP>
</SEQ>

<SEQ>
<UI>1060   Ukkonen,E.    Approximate String-Mat.. Theoret.Comput. 92 
92:191-211
</UI>
<AU>Ukkonen E
</AU>
<TI>Approximate String-Matching with q-Grams and Maximal Matches
</TI>
<SU>Approximate match;
    N-gram;
    Sequence proximity;
    Longest common;
    Edit;
    FI;
    String match
</SU>
<AB>"We study approximate string-matching in connection with two string
distance functions that are computable in linear time. The first function is
based on the so-called q-grams. An algorithm is given for the associated
string-matching problem that finds the locally best approximate occurrences of pattern
P, |P| = m, in text T, |T| = n, in time O(n log(m-q)). The other distance
function is based on finding maximal common substrings and allows a form of
approximate string-matching in time O(n). Both distances give a lower bound for
the edit distance ... which leads to fast hybrid algorithms for the edit
distance based string-matching."
</AB>
<JT>Theoret Comput Sci</JT>
<PY>1992</PY>
<VO>92</VO>
<PP>191-211</PP>
</SEQ>

<SEQ>
<UI>1061   Crochemore,M. String Matching with C.. Lecture Notes i 88 
324:44-58
</UI>
<AU>Crochemore M
</AU>
<TI>String Matching with Constraints
</TI>
<SU>String match;
    FR
</SU>
<AB>Proceedings, MFCS'88 Symposium, Carlsbad, Czechoslovakia. "In this paper,
two string-matching algorithms belonging to the second family [fixed word,
variable text] are presented. ... The first algorithm ... processes the text in
real-time. The delay only depends on the size of the alphabet. Our algorithm
heavily relies on properties of minimal automata recognizing the suffixes of a
word. ... We present ... an algorithm which requires only constant additional
memory space during all its phases .... It makes use of a deep theorem on words
... known as the critical factorization theorem."
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>1988</PY>
<VO>324</VO>
<PP>44-58</PP>
</SEQ>

<SEQ>
<UI>1062   Galil,Z.      An Improved Algorithm .. Lecture Notes i 89 
372:394-404
</UI>
<AU>Galil Z;
    Park K
</AU>
<TI>An Improved Algorithm for Approximate String Matching
</TI>
<SU>Approximate match;
    Automata;
    String match;
    USA;
    Match with k differences;
    Algorithm
</SU>
<AB>Automata, Languages, and Programming (ICALP'89), 16th International
Colloquium. "Given a text string, a pattern string, and an integer k, a new
algorithm for finding all occurrences of the pattern string in the text string
with at most k differences is presented. Both its theoretical and practical
variants improve the known algorithms."
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>1989</PY>
<VO>372</VO>
<PP>394-404</PP>
</SEQ>

<SEQ>
<UI>1063   Quong,R.W.    Fast Average-Case Patt.. Theoret.Comput. 92 
92:165-179
</UI>
<AU>Quong RW
</AU>
<TI>Fast Average-Case Pattern Matching by Multiplexing Sparse Tables
</TI>
<SU>Pattern match;
    Match with k mismatches;
    USA;
    Pattern recognition
</SU>
<AB>"Pattern matching consists of finding occurrences of a pattern in some
data. One general approach is to sample the data collecting evidence about
possible matches. By sampling appropriately, we force matches to be sparse and
can encode a table of size m as a series of smaller tables .... This method
yields practical algorithms with fast average-case running times for a wide
variety of pattern matching and pattern recognition problems. We apply our
technique of multiplexing sparse tables to the k-mismatches string searching
problem ...."
</AB>
<JT>Theoret Comput Sci</JT>
<PY>1992</PY>
<VO>92</VO>
<PP>165-179</PP>
</SEQ>

<SEQ>
<UI>1064   Baeza-Yates,R Algorithms for String .. SIGIR Forum     89 
23(3,4):34-58
</UI>
<AU>Baeza-Yates RA
</AU>
<TI>Algorithms for String Searching: A Survey
</TI>
<SU>Sequence search;
    Survey;
    CA;
    String search;
    Algorithm
</SU>
<AB>"We present the most important algorithms for string matching: the naive
algorithm, the Knuth-Morris-Pratt algorithm, different variants of the Boyer-
Moore algorithm, the shift-or algorithm, and the Karp-Rabin algorithm (a
probabilistic one). Experimental results for random text and one sample of
English text are included. We also survey the main theoretical results for each
algorithm. We use the C programming language to present our algorithms. ... An
extensive bibliography is also included."
</AB>
<JT>SIGIR Forum</JT>
<PY>1989</PY>
<VO>23</VO>
<NO>3,4</NO>
<PP>34-58</PP>
</SEQ>

<SEQ>
<UI>1065   Graham,S.L.   On Line Context Free L.. ACM Sympos.Theo 76 
8:112-120
</UI>
<AU>Graham SL;
    Harrison MA;
    Ruzzo WL
</AU>
<TI>On Line Context Free Language Recognition in less than Cubic Time
</TI>
<SU>Sequence recognition;
    Language;
    USA;
    On-line;
    Recognition
</SU>
<AB>"A new on-line context free language recognition algorithm is presented
which is derived from Earley's algorithm and has several advantages over the
original. First, the new algorithm not only is conceptually simpler than
Earley's, but also allows significant speed improvements. Second, our algorithm
serves to explain the connections between Earley's algorithm and the Cocke-
Kasami-Younger algorithm. Third, our algorithm allows an implementation which
uses only ... O(n^3/log n) operations on a RAM. This makes it the fastest known
on-line context free language recognition algorithm."
</AB>
<JT>ACM Sympos Theory Comput</JT>
<PY>1976</PY>
<VO>8</VO>
<PP>112-120</PP>
</SEQ>

<SEQ>
<UI>1066   Hirschberg,D. The Least Weight Subse.. SIAM J.Comput.  87 
16(4):628-638
</UI>
<AU>Hirschberg DS;
    Larmore LL
</AU>
<TI>The Least Weight Subsequence Problem
</TI>
<SU>Least weight;
    Subsequence;
    Dynamic programming;
    USA
</SU>
<AB>"The least weight subsequence (LWS) problem is introduced, and is shown to
be equivalent to the classic minimum path problem for directed graphs. A special
case of the LWS problem is shown to be solvable in O(n log n) time generally,
and for certain weight functions, in linear time. A number of applications are
given, including an optimum paragraph formation problem and the problem of
finding a minimum height B-tree, whose solutions realize improvement in
asymptotic time complexity."
</AB>
<JT>SIAM J Comput</JT>
<PY>1987</PY>
<VO>16</VO>
<NO>4</NO>
<PP>628-638</PP>
</SEQ>

<SEQ>
<UI>1067   Valiant,L.G.  General Context-Free R.. J.Comput.System 75 
10:308-315
</UI>
<AU>Valiant LG
</AU>
<TI>General Context-Free Recognition in Less than Cubic Time
</TI>
<SU>Sequence recognition;
    Language;
    USA;
    Recognition
</SU>
<AB>"By a succession of reductions we show that context-free recognition, for
n character input strings, can be carried out at least as fast as multiplication
for n x n Boolean matrices. Using Strassen's method for matrix multiplication,
an indirect algorithm for general context-free recognition can be derived that
has time complexity O(n^2.81). This is asymptotically more efficient than any of
the best previously known recognition schemes ... all of which require O(n^3)
time in the worst case. The crucial result on which the new algorithm depends is
a general one that is applicable to a wide class of matrix computations."
</AB>
<JT>J Comput Systems Sci</JT>
<PY>1975</PY>
<VO>10</VO>
<PP>308-315</PP>
</SEQ>

<SEQ>
<UI>1068   Wilber,R.     The Concave Least-Weig.. J.Algorithms    88 
9:418-425
</UI>
<AU>Wilber R
</AU>
<TI>The Concave Least-Weight Subsequence Problem Revisited
</TI>
<SU>Least weight;
    Subsequence;
    USA
</SU>
<AB>"D. S. Hirschberg and L. L. Larmore (1987) showed that the concave least-
weight subsequence problem can be solved in O(n log n) time and that if a
certain extra condition is imposed it can be solved in O(n) time. Here we show
that the concave least weight subsequence problem can always be solved in O(n)
time, without any extra conditions."
</AB>
<JT>J Algorithms</JT>
<PY>1988</PY>
<VO>9</VO>
<PP>418-425</PP>
</SEQ>

<SEQ>
<UI>1069   Chang,J.H.    Parallel Parsing on a .. Proceedings o.. 86IEEE 
Computer S
</UI>
<AU>Chang JH;
    Ibarra OH;
    Palis MA
</AU>
<TI>Parallel Parsing on a One-Way Array of Finite-State Machines
</TI>
<ED>Hwang K;
    Jacobs SM;
    Swartzlander EE
</ED>
<BK>Proceedings of the 1986 International Conference on Parallel Processing,
August 19-22, 1986
</BK>
<SU>Sequence recognition;
    Language;
    Automata;
    USA;
    Parallel;
    Parsing
</SU>
<AB>"We show that a one-way two-dimensional iterative array of finite-state
machines can recognize and parse strings of any context-free language in linear
time. What makes this result interesting and rather surprising is the fact that
each processor of the array holds only a fixed amount of information
(independent of the size of the input) and communicates with its neighbors in
only one direction. This makes for a simple VLSI implementation."
</AB>
<PU>IEEE Computer Society Press</PU>
<PL>Washington, DC</PL>
<PY>1986</PY>
<PP>887-894</PP>
</SEQ>

<SEQ>
<UI>1070   Chiang,Y.T.   Parallel Parsing Algor.. IEEE Trans.Patt 84 
6(3):302-314
</UI>
<AU>Chiang YT;
    Fu KS
</AU>
<TI>Parallel Parsing Algorithms and VLSI Implementations for Syntactic Pattern
Recognition
</TI>
<SU>Sequence recognition;
    Language;
    Parallel;
    Pattern recognition;
    Parsing;
    VLSI;
    USA;
    Algorithm;
    Recognition
</SU>
<AB>"Earley's algorithm has been commonly used for the parsing of general
context-free languages and the error-correcting parsing in syntactic pattern
recognition. ... This paper presents a parallel Earley's recognition algorithm
in terms of an 'X*' operator. ... Simulation results show that this system can
recognize a string with length n in 2n+1 system time. We also present a parallel
parse-extraction algorithm, a complete parsing algorithm, and an error-
correcting recognition algorithm. ... These parallel algorithms are especially
useful for syntactic pattern recognition."
</AB>
<JT>IEEE Trans Patt Anal Mach Intell</JT>
<PY>1984</PY>
<VO>6</VO>
<NO>3</NO>
<PP>302-314</PP>
</SEQ>

<SEQ>
<UI>1071   Kosaraju,S.R. Speed of Recognition o.. SIAM J.Comput.  75 
4(3):331-340
</UI>
<AU>Kosaraju SR
</AU>
<TI>Speed of Recognition of Context-Free Languages by Array Automata
</TI>
<SU>Sequence recognition;
    Language;
    Automata;
    USA;
    Recognition
</SU>
<AB>"The recognition speed of context-free languages (CFL's) using arrays of
finite state machines is considered. It is shown that CFL's can be recognized by
2-dimensional arrays in linear time and by 1-dimensional arrays in time n^2."
</AB>
<JT>SIAM J Comput</JT>
<PY>1975</PY>
<VO>4</VO>
<NO>3</NO>
<PP>331-340</PP>
</SEQ>

<SEQ>
<UI>1072   Dolev,D.      Parallel Computation o.. Parallel Proc.. 88Elsevier 
Scienc
</UI>
<AU>Dolev D;
    Gil J
</AU>
<TI>Parallel Computation of Edit Distance
</TI>
<ED>Chiricozzi E;
    D'Amico A
</ED>
<BK>Parallel Processing and Applications
</BK>
<SU>Longest common;
    Edit;
    Parallel;
    IL;
    Distance
</SU>
<AB>"The subject of the paper is the edit distance between two strings of
characters. We will introduce two parallel algorithms to compute the elementary
operations useful in evaluating the edit distance. ... The parallel model used
is CRCW-PRAM .... The main algorithm in the paper is an efficient algorithm for
solving the editing distance with arbitrary weights for the operations replace,
delete, and insert. ... When the weights of the three functions are ..., the
editing distance reduces to the problem of finding the longest common
subsequence (LCS) of the two strings."
</AB>
<PU>Elsevier Science</PU>
<PL>Amsterdam</PL>
<PY>1988</PY>
<PP>265-275</PP>
</SEQ>

<SEQ>
<UI>1073   Burnett,L.    Development of a Super.. Nucleic Acids R 86 
14(1):47-55
</UI>
<AU>Burnett L
</AU>
<TI>Development of a Superior Strategy for Computer-Assisted Nucleotide
Sequence Analysis
</TI>
<SU>Pairwise comparison;
    Region;
    AU;
    Sequence analysis;
    Nucleotide
</SU>
<AB>"A new strategy for high-resolution nucleotide sequence analysis has been
developed. The strategy involves an exhaustive tree-searching algorithm which
examines all possible combinations of short regions of sequence alignments,
followed by culling of unsuitable sequence relationships. The new algorithm can
detect sequence homologies invisible to existing algorithms, and is capable of
detecting all possible sequence relationships."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1986</PY>
<VO>14</VO>
<NO>1</NO>
<PP>47-55</PP>
</SEQ>

<SEQ>
<UI>1074   Lawrence,C.B. Data structures for DN.. Nucleic Acids R 86 
14(1):205-216
</UI>
<AU>Lawrence CB
</AU>
<TI>Data structures for DNA Sequence Manipulation
</TI>
<SU>Data structure;
    Sequence database;
    USA;
    Structure;
    DNA
</SU>
<AB>"Two data structures designated Fragment and Construct are described. The
Fragment data structure defines a continuous nucleic acid sequence from a unique
genetic origin. The Construct defines a continuous sequence composed of
sequences from multiple genetic origins. These data structures are manipulated
by a set of software tools to simulate the construction of mosaic recombinant
DNA molecules. They are also used as an interface between sequence data banks
and analytical programs."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1986</PY>
<VO>14</VO>
<NO>1</NO>
<PP>205-216</PP>
</SEQ>

<SEQ>
<UI>1075   Roberts,R.J.  Preface [Special issue.. Nucleic Acids R 86 
14(1):0-0
</UI>
<AU>Roberts RJ;
    Soll D
</AU>
<TI>Preface [Special issue devoted to the applications of computers to
research on nucleic acids]
</TI>
<SU>Sequence analysis;
    Program;
    USA
</SU>
<AB>"This is the third special issue of Nucleic Acids Research devoted to the
applications of computers to research on nucleic acids. ... The aim of these
special issues has been to heighten the awareness of both scientists and
programmers to the broad range of software that is currently available." Table
of contents: 63 papers, 620 pages.
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1986</PY>
<VO>14</VO>
<NO>1</NO>
<PP>0-0</PP>
</SEQ>

<SEQ>
<UI>1076   Roberts,R.J.  Preface [Issue devoted.. Nucleic Acids R 82 
10(1):0-0
</UI>
<AU>Roberts RJ;
    Soll D
</AU>
<TI>Preface [Issue devoted to the applications of computers to research on
nucleic acids]
</TI>
<SU>Sequence analysis;
    Program;
    USA
</SU>
<AB>"The programs described range from straightforward algorithms based on
simple search routines to some highly sophisticated packages for data base
management and analysis." Table of contents: 38 papers, 456 pages.
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1982</PY>
<VO>10</VO>
<NO>1</NO>
<PP>0-0</PP>
</SEQ>

<SEQ>
<UI>1077   Pearson,W.R.  Automatic Construction.. Nucleic Acids R 82 
10(1):217-227
</UI>
<AU>Pearson WR
</AU>
<TI>Automatic Construction of Restriction Site Maps
</TI>
<SU>Restriction;
    Program;
    USA
</SU>
<AB>"A computer program is described which constructs maps of restriction
endonuclease cleavage sites in DNA molecules, given only the fragment lengths.
The program utilizes fragment length data from single and double restriction
enzyme digests to generate maps for linear or circular molecules."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1982</PY>
<VO>10</VO>
<NO>1</NO>
<PP>217-227</PP>
</SEQ>

<SEQ>
<UI>1078   Roberts,R.J.  Preface [Issue devoted.. Nucleic Acids R 84 
12(1):0-0
</UI>
<AU>Roberts RJ;
    Soll D
</AU>
<TI>Preface [Issue devoted to the applications of computers to research on
nucleic acids]
</TI>
<SU>Sequence analysis;
    Program;
    USA
</SU>
<AB>"It is the aim of this second special issue to heighten the awareness of
both scientists and programmers to the broad range of software that is currently
available." Table of contents: part 1, 37 papers; part 2, 42 papers.
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1984</PY>
<VO>12</VO>
<NO>1</NO>
<PP>0-0</PP>
</SEQ>

<SEQ>
<UI>1079   Waterman,M.S. Algorithms for Restric.. Nucleic Acids R 84 
12(1):237-242
</UI>
<AU>Waterman MS;
    Smith TF;
    Katcher HL
</AU>
<TI>Algorithms for Restriction Map Comparisons
</TI>
<SU>Restriction;
    Mapping;
    USA;
    Algorithm
</SU>
<AB>"An algorithm is presented which compares two restriction maps, yielding a
measure of distance between the maps and relating the maps by an alignment. This
new algorithm finds the minimum weighted sum of genetic events required to
convert one map into the other, where the genetic events are the
appearance/disappearance of restriction sites and changes in the number of bases
between restriction sites."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1984</PY>
<VO>12</VO>
<NO>1</NO>
<PP>237-242</PP>
</SEQ>

<SEQ>
<UI>1080   Bucher,P.     Signal Search Analysis.. Nucleic Acids R 84 
12(1):287-305
</UI>
<AU>Bucher P;
    Bryan B
</AU>
<TI>Signal Search Analysis: A New Method to Localize and Characterize
Functionally Important DNA Sequences
</TI>
<SU>Signal;
    Pattern discovery;
    N-gram;
    SWI;
    DNA
</SU>
<AB>"The generation of 'signal search data' represents a general method of
describing the common properties of a set of DNA sequences presumed to be
functionally analogous. Besides the detailed description of this method we
present two computer programs which use signal search data as input data: One
that processes them to a 'constraint profile' and another one which lists over-
represented 'signals' of potential functional relevance."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1984</PY>
<VO>12</VO>
<NO>1</NO>
<PP>287-305</PP>
</SEQ>

<SEQ>
<UI>1081   Sankoff,D.    A Strategy for Sequenc.. Nucleic Acids R 82 
10(1):421-431
</UI>
<AU>Sankoff D;
    Cedergren RJ;
    McKay W
</AU>
<TI>A Strategy for Sequence Phylogeny Research
</TI>
<SU>Phylogeny;
    CA;
    Statistical
</SU>
<AB>"The proliferation of sequence data eventually exceeds the capacity of
rigorous minimal mutation methods. Rather than having recourse to rapid
suboptimal or matrix methods, which lead to uncertain, ambiguous and non-unique
results, we suggest here a way of combining reasonable degrees of biological
and/or statistical certainty about the data with absolute optimization
procedures. This reduces the computing problem without the disadvantages of
suboptimal methods."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1982</PY>
<VO>10</VO>
<NO>1</NO>
<PP>421-431</PP>
</SEQ>

<SEQ>
<UI>1082   Fuchs,R.      New Services of the EM.. Nucleic Acids R 90 
18(15):4319-43
</UI>
<AU>Fuchs R;
    Stoehr P;
    Rice P;
    Omond R;
    Cameron G
</AU>
<TI>New Services of the EMBL Data Library
</TI>
<SU>Sequence database;
    Database search;
    FASTA;
    DE;
    EMBL
</SU>
<AB>"The EMBL File Server has been reorganised, and many new databases and
other information relevant to biologists are now accessible via global computer
networks. A broad range of software for molecular biology is freely available
for different popular computer systems, including the EMBL enhancements to the
Wisconsin (GCG) Package. The new Mail-Quicksearch and Mail-FastA services give
access to the latest sequence data for database searches by ordinary electronic
mail."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1990</PY>
<VO>18</VO>
<NO>15</NO>
<PP>4319-4323</PP>
</SEQ>

<SEQ>
<UI>1083   Brendel,V.    A Computer Algorithm f.. Nucleic Acids R 84 
12(10):4411-44
</UI>
<AU>Brendel V;
    Trifonov EN
</AU>
<TI>A Computer Algorithm for Testing Potential Prokaryotic Terminators
</TI>
<SU>Match a pattern matrix;
    IL;
    Algorithm
</SU>
<AB>"An algorithm to locate terminators in templates of known nucleotide
sequence has been constructed on the basis of correlation to the distribution of
dinucleotides along the aligned signal sequences. The algorithm has been tested
on natural sequences of a total length of about 11,500 N. It finds all known
independent terminators and only a few other sites, including some of the rho-
dependent and putative terminators."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1984</PY>
<VO>12</VO>
<NO>10</NO>
<PP>4411-4427</PP>
</SEQ>

<SEQ>
<UI>1084   Mulligan,M.E. Escherichia coli Pro  .. Nucleic Acids R 84 
12(1):789-800
</UI>
<AU>Mulligan ME;
    Hawley DK;
    Entriken R;
    McClure WR
</AU>
<TI>Escherichia coli Promoter Sequences Predict in vitro RNA Polymerase
Selectivity
</TI>
<SU>Match complex patterns;
    Sequence proximity;
    USA;
    Match a pattern matrix;
    RNA
</SU>
<AB>"We describe a simple algorithm for computing a homology score for
Escherichia coli promoters based on DNA sequence alone. ... The search for a
possible promoter site within a DNA sequence occurs in two steps. Initially the
locations of sequences homologous to the consensus sequence of the two most
highly conserved regions are identified. ... The second stage of the promoter
search is the combination of -35 sequences with -10 sequences to form potential
promoters. ... Once a potential promoter has been located, it is then evaluated
according to a weighting scheme."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1984</PY>
<VO>12</VO>
<NO>1</NO>
<PP>789-800</PP>
</SEQ>

<SEQ>
<UI>1085   Lake,J.A.     Determining Evolutiona.. J.Mol.Evol.     87 26:59-73
</UI>
<AU>Lake JA
</AU>
<TI>Determining Evolutionary Distances from Highly Diverged Nucleic Acid
Sequences: Operator Metrics
</TI>
<SU>Phylogeny;
    Invariant;
    Statistical;
    USA;
    Substitution;
    Evolutionary distance;
    Distance
</SU>
<AB>"Operator metrics are explicitly designed to measure evolutionary
distances from nucleic acid sequences when substitution rates differ greatly
among the organisms being compared, or when substitutions have been extensive.
Unlike lengths calculated by the distance matrix and parsimony methods, in which
substitutions in one branch of a tree can alter the measured length of another
branch, lengths determined by operator metrics are not affected by substitutions
outside the branch."
</AB>
<JT>J Mol Evol</JT>
<PY>1987</PY>
<VO>26</VO>
<PP>59-73</PP>
</SEQ>

<SEQ>
<UI>1086   Klotz,L.C.    Calculation of Evoluti.. Proc.Nat.Acad.S 79 
76(9):4516-452
</UI>
<AU>Klotz LC;
    Komar N;
    Blanken RL;
    Mitchell RM
</AU>
<TI>Calculation of Evolutionary Trees from Sequence Data
</TI>
<SU>Phylogeny;
    USA;
    Evolutionary tree;
    Substitution
</SU>
<AB>"In this paper we present a method for calculating evolutionary trees from
sequence data that uses, along with the difference matrix [constructed from the
sequence differences between pairs of sequences from the organisms], the rate of
evolution of the various sequences from their common ancestor. It is proven
analytically that this method uniquely determines both the correct tree topology
and root in theory for unequal rates of sequence evolution. How one would
estimate an ancestral sequence to be used in the method is discussed ...."
</AB>
<JT>Proc Nat Acad Sci USA</JT>
<PY>1979</PY>
<VO>76</VO>
<NO>9</NO>
<PP>4516-4520</PP>
</SEQ>

<SEQ>
<UI>1087   Waterman,M.S. Parametric and Ensembl.. Bull.Math.Biol. 94 
56(4):743-767
</UI>
<AU>Waterman MS
</AU>
<TI>Parametric and Ensemble Sequence Alignment Algorithms
</TI>
<SU>Sequence alignment;
    Dynamic programming;
    Locally optimal;
    USA;
    Parametric;
    Algorithm
</SU>
<AB>"Recently algorithms for parametric alignment ... find optimal scores for
all penalty parameters, both for global and local sequence alignment. This paper
reviews those techniques. Then in the main part of this paper dynamic
programming methods are used to compute ensemble alignment, finding all
alignment scores for all parameters. Both global and local ensemble alignments
are studied, and parametric alignment is used to compute near optimal ensemble
alignments."
</AB>
<JT>Bull Math Biol</JT>
<PY>1994</PY>
<VO>56</VO>
<NO>4</NO>
<PP>743-767</PP>
</SEQ>

<SEQ>
<UI>1088   Felsenstein,J Evolutionary Trees fro.. J.Mol.Evol.     81 
17:368-376
</UI>
<AU>Felsenstein J
</AU>
<TI>Evolutionary Trees from DNA Sequences: A Maximum Likelihood Approach
</TI>
<SU>Phylogeny;
    Statistical;
    Likelihood;
    Character data;
    Evolutionary rate;
    Program;
    USA;
    Evolutionary tree;
    DNA
</SU>
<AB>"The application of maximum likelihood techniques to the estimation of
evolutionary trees from nucleic acid sequence data is discussed. A
computationally feasible method for finding such maximum likelihood estimates is
developed, and a computer program is available. The method has advantages over
the traditional parsimony algorithms, which can give misleading results if rates
of evolution differ in different lineages. It also allows the testing of
hypotheses about the constancy of evolutionary rates by likelihood ratio tests,
and gives rough indication of the error of the estimate of the tree."
</AB>
<JT>J Mol Evol</JT>
<PY>1981</PY>
<VO>17</VO>
<PP>368-376</PP>
</SEQ>

<SEQ>
<UI>1089   Felsenstein,J Inferring Evolutionary.. Statistical A.. 83Marcel 
Dekker
</UI>
<AU>Felsenstein J
</AU>
<TI>Inferring Evolutionary Trees from DNA Sequences
</TI>
<ED>Weir BS
</ED>
<BK>Statistical Analysis of DNA Sequence Data
</BK>
<SU>Statistical;
    Phylogeny;
    Likelihood;
    USA;
    Evolutionary tree;
    Analytical;
    Robustness;
    DNA
</SU>
<AB>See Weir (1983) for the book's bibliography, pp. 231-248. Introduction.
Parsimony methods. Maximum likelihood methods. Alternatives to Likelihood. The
state of the problem.
</AB>
<PU>Marcel Dekker</PU>
<PL>New York</PL>
<PY>1983</PY>
<PP>133-150</PP>
</SEQ>

<SEQ>
<UI>1090   Tavare,S.     Some Probabilistic and.. Lect.Math.Life  86 17:57-86
</UI>
<AU>Tavare S
</AU>
<TI>Some Probabilistic and Statistical Problems in the Analysis of DNA
Sequences
</TI>
<SU>Probabilistic;
    Statistical;
    Sequence analysis;
    Substitution;
    USA;
    DNA
</SU>
<AB>"This paper concentrates on statistical aspects of the estimation of
substitution rates and divergence times on the basis of DNA sequence data. A new
method of estimation is suggested, and exhibited using data from serum albumin
and α-fetoprotein. The divergence time of the rat and mouse is estimated using a
tree calibrated by the human-rat divergence time. Some inherent difficulties in
these methods are highlighted by statistical analysis of the sequences."
</AB>
<JT>Lect Math Life Sci</JT>
<PY>1986</PY>
<VO>17</VO>
<PP>57-86</PP>
</SEQ>

<SEQ>
<UI>1091   Naor,D.       On Suboptimal Alignmen.. Lecture Notes i 93 
684:179-196
</UI>
<AU>Naor D;
    Brutlag D
</AU>
<TI>On Suboptimal Alignments of Biological Sequences
</TI>
<SU>Sequence alignment;
    Suboptimal;
    Enumeration;
    USA
</SU>
<AB>4th Annual Symposium, CPM 93. Padova, Italy, June 2-4, 1993. Proceedings.
"We present a method for representing all alignments whose score is within any
given delta from the optimal score. It represents a large number of alignments
by a compact graph which makes it easy to impose additional biological
constraints and select one desirable alignment from this large set. ... We
define a set of 'canonical' suboptimal alignments, and argue that these are the
essential ones since any other suboptimal alignment is a combination of few
canonical ones. We then show how to efficiently enumerate suboptimal alignments
in order of their score, and count their numbers."
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>1993</PY>
<VO>684</VO>
<PP>179-196</PP>
</SEQ>

<SEQ>
<UI>1092   Klotz,L.C.    A Practical Method for.. J.Theor.Biol.   81 
91:261-272
</UI>
<AU>Klotz LC;
    Blanken RL
</AU>
<TI>A Practical Method for Calculating Evolutionary Trees from Sequence Data
</TI>
<SU>Phylogeny;
    Evolutionary tree;
    USA
</SU>
<AB>"In a previous paper (Klotz et al., 1979) we described a method for
determining evolutionary trees from sequence data when rates of evolution of the
sequences might differ greatly. ... However, the method is impractical to use in
most situations because it requires some knowledge of the ancestor. In this
present paper we describe another method, related to the previous one, in which
a present-day sequence can serve temporarily as an ancestor for purposes of
determining the evolutionary tree regardless of the rates of evolution of the
sequences involved."
</AB>
<JT>J Theor Biol</JT>
<PY>1981</PY>
<VO>91</VO>
<PP>261-272</PP>
</SEQ>

<SEQ>
<UI>1093   Zharkikh,A.A. Rapid Evaluation of Nu.. Dokl.Biol.Sci.  89 
308:611-613
</UI>
<AU>Zharkikh AA;
    Rzhetskii AY
</AU>
<TI>Rapid Evaluation of Nucleotide Sequence Homology by Oligonucleotide
Frequency Analysis
</TI>
<SU>Sequence comparison;
    Composition;
    Homology;
    Database search;
    N-gram;
    RU;
    Nucleotide
</SU>
<AB>Translated from Doklady Akademii Nauk SSSR, 308(5), 1232-1235, Oct. 1989.
"The rapid evaluation of homologies between two or more DNA sequences is
important in searching for homologs in databases, in molecular taxonomy, and for
other purposes. The complex and time-consuming nature of homology analysis is
such that it is appropriate to substitute standard methods with faster but less
accurate methods using a relatively small number of their integral
characteristics. For this purpose we report the use of a measure of the
similarity of the oligonucleotide composition of DNA sequences."
</AB>
<JT>Dokl Biol Sci</JT>
<PY>1989</PY>
<VO>308</VO>
<PP>611-613</PP>
</SEQ>

<SEQ>
<UI>1094   Kimura,M.     Estimation of Evolutio.. Proc.Nat.Acad.S 81 
78(1):454-458
</UI>
<AU>Kimura M
</AU>
<TI>Estimation of Evolutionary Distances between Homologous Nucleotide
Sequences
</TI>
<SU>Substitution;
    Codon;
    Pairwise comparison;
    Evolutionary distance;
    Statistical;
    Distance;
    JP;
    Nucleotide;
    Estimation
</SU>
<AB>"By using two models of evolutionary base substitutions - 'three-
substitution-type' and 'two-frequency-class' models - some formulae are derived
which permit a simple estimation of the evolutionary distances (and also the
evolutionary rates when the divergence times are known) through comparative
studies of DNA (and RNA) sequences. These formulae are applied to estimate the
base substitution rates at the first, second, and third positions of codons ....
Also, formulae for estimating the synonymous component (at the third codon
position) and the standard errors are obtained."
</AB>
<JT>Proc Nat Acad Sci USA</JT>
<PY>1981</PY>
<VO>78</VO>
<NO>1</NO>
<PP>454-458</PP>
</SEQ>

<SEQ>
<UI>1095   Amir,A.       An Alphabet Independen.. SIAM J.Comput.  94 
23(2):313-323
</UI>
<AU>Amir A;
    Benson G;
    Farach M
</AU>
<TI>An Alphabet Independent Approach to Two-Dimensional Pattern Matching
</TI>
<SU>Pattern match;
    Regularities;
    Multidimensional;
    USA;
    String match
</SU>
<AB>"The authors show an algorithm for two-dimensional matching with an O(n^2)
text-scanning phase. Furthermore, the text scan requires no special assumptions
about the alphabet, i.e., it runs on the same model as the standard linear-time
string-matching algorithm. The pattern preprocessing requires an ordered
alphabet and runs with the same alphabet dependency as the previously known
algorithms."
</AB>
<JT>SIAM J Comput</JT>
<PY>1994</PY>
<VO>23</VO>
<NO>2</NO>
<PP>313-323</PP>
</SEQ>

<SEQ>
<UI>1096   Amir,A.       Two-dimensional Dictio.. Inform.Process. 92 
44(5):233-239
</UI>
<AU>Amir A;
    Farach M
</AU>
<TI>Two-dimensional Dictionary Matching
</TI>
<SU>Dictionary match;
    Multidimensional;
    USA
</SU>
<AB>"In this paper, we present an algorithm for the Two-Dimensional Dictionary
Problem. [It] is that of finding each occurrence of a set of two-dimensional
patterns in a text."
</AB>
<JT>Inform Process Lett</JT>
<PY>1992</PY>
<VO>44</VO>
<NO>5</NO>
<PP>233-239</PP>
</SEQ>

<SEQ>
<UI>1097   Bird,R.S.     Two Dimensional Patter.. Inform.Process. 77 
6(5):168-170
</UI>
<AU>Bird RS
</AU>
<TI>Two Dimensional Pattern Matching
</TI>
<SU>Pattern match;
    Multidimensional;
    Knuth-Morris-Pratt;
    UK
</SU>
<AB>"In this case the problem is to determine where, if anywhere, the pattern
occurs as a subarray of the text. Our purpose is to give an algorithm for the
two dimensional case, one which follows the general approach of the [Knuth-
Morris-Pratt algorithm], and indeed uses the KMP as a subprogram."
</AB>
<JT>Inform Process Lett</JT>
<PY>1977</PY>
<VO>6</VO>
<NO>5</NO>
<PP>168-170</PP>
</SEQ>

<SEQ>
<UI>1098   Zhu,R.F.      A Technique for Two-Di.. Comm.ACM        89 
32(9):1110-112
</UI>
<AU>Zhu RF;
    Takaoka T
</AU>
<TI>A Technique for Two-Dimensional Pattern Matching
</TI>
<SU>Pattern match;
    Multidimensional;
    JP
</SU>
<AB>"By reducing an array matching problem to a string matching problem in a
natural way, it is shown that efficient string matching algorithms can be
applied to arrays, assuming that a linear preprocessing is made on the text. ...
In this article we first present an efficient pattern matching algorithm for the
two-dimensional case, one which is a combination of the [Knuth-Morris-Pratt] and
[Rabin-Karp] algorithms. ... Computer experiments show that for various pattern
sizes the average cost for either of our algorithms is much less than that of
the algorithm proposed by Bird (1977)."
</AB>
<JT>Comm ACM</JT>
<PY>1989</PY>
<VO>32</VO>
<NO>9</NO>
<PP>1110-1120</PP>
</SEQ>

<SEQ>
<UI>1099   Stanfill,C.   Parallel Free-Text Sea.. Comm.ACM        86 
29(12):1229-12
</UI>
<AU>Stanfill C;
    Kahle B
</AU>
<TI>Parallel Free-Text Search on the Connection Machine System
</TI>
<SU>Parallel;
    Text search;
    Database search;
    USA;
    Hardware
</SU>
<AB>"A new implementation of free-text search using a new parallel computer -
the Connection Machine - makes possible the application of exhaustive methods
not previously feasible for large databases."
</AB>
<JT>Comm ACM</JT>
<PY>1986</PY>
<VO>29</VO>
<NO>12</NO>
<PP>1229-1239</PP>
</SEQ>

<SEQ>
<UI>1100   Reeves,P.R.   MULTICOMP: A Program f.. Comput.Appl.Bio 94 
10(3):281-284
</UI>
<AU>Reeves PR;
    Farnell L;
    Lan R
</AU>
<TI>MULTICOMP: A Program for Preparing Sequence Data for Phylogenetic Analysis
</TI>
<SU>Phylogeny;
    Management;
    AU;
    Program;
    Phylogenetic
</SU>
<AB>"MULTICOMP is a program that assists in the phylogenetic analysis of DNA
sequences. It streamlines sequence handling and analysis. Input is from either
individual sequence files or a file of aligned sequences. It produces data on
variation at DNA and amino acid sequence level and can also convert sequences to
data formats suitable for PHYLIP, PAUP and MacClade phylogenetic inference
programs. Further, two tree-building programs, NEIGHBOR and DNAPARS, of PHYLIP
can be directly run from within it."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1994</PY>
<VO>10</VO>
<NO>3</NO>
<PP>281-284</PP>
</SEQ>

<SEQ>
<UI>1101   Sadler,J.R.   Regulatory Pattern Ide.. Nucleic Acids R 83 
11(7):2221-223
</UI>
<AU>Sadler JR;
    Waterman MS;
    Smith TF
</AU>
<TI>Regulatory Pattern Identification in Nucleic Acid Sequences
</TI>
<SU>Pattern recognition;
    Identification;
    USA;
    Nucleic acid
</SU>
<AB>"A critique of the often employed consensus and local homology methods
suggests the need for new tools. In particular, such new methods should use the
positional and structural data now becoming available on exactly what it is that
is recognized in the DNA sequence by sequence-specific binding proteins."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1983</PY>
<VO>11</VO>
<NO>7</NO>
<PP>2221-2231</PP>
</SEQ>

<SEQ>
<UI>1102   Breen,S.      Renewal Theory for Sev.. J.Appl.Probab.  85 
22:228-234
</UI>
<AU>Breen S;
    Waterman MS;
    Zhang N
</AU>
<TI>Renewal Theory for Several Patterns
</TI>
<SU>Sequence analysis;
    Statistical;
    USA
</SU>
<AB>"Discrete renewal theory is generalized to study the occurrence of a
collection of patterns in random sequences, where a renewal is defined to be the
occurrence of one of the patterns in the collection which does not overlap an
earlier renewal. The action of restriction enzymes on DNA sequences provided
motivation for this work. Related results of Guibas and Odlyzko are discussed."
</AB>
<JT>J Appl Probab</JT>
<PY>1985</PY>
<VO>22</VO>
<PP>228-234</PP>
</SEQ>

<SEQ>
<UI>1103   Waterman,M.S. Consensus Methods for .. Mathematical .. 89CRC Press
</UI>
<AU>Waterman MS
</AU>
<TI>Consensus Methods for Folding Single-Stranded Nucleic Acids
</TI>
<ED>Waterman MS
</ED>
<BK>Mathematical Methods for DNA Sequences
</BK>
<SU>Consensus method;
    Structure;
    USA;
    Nucleic acid;
    Folding
</SU>
<AB>"The structure of single-stranded RNA macromolecules is crucial to the
functioning of an organism. ... Other than by guessing or inspection, there seem
to be two major techniques for prediction of secondary structure: the minimum
energy method and the comparative method. The previous chapter gives an
extensive treatment of the important minimum energy method, which utilizes
dynamic programming. After briefly discussing the minimum energy approach we
turn to the main topic of this chapter, comparative or consensus analysis of
folding."
</AB>
<PU>CRC Press</PU>
<PL>Boca Raton, FL</PL>
<PY>1989</PY>
<PP>185-224</PP>
</SEQ>

<SEQ>
<UI>1104   Waterman,M.S. Genomic Sequence Datab.. Genomics        90 
6:700-701
</UI>
<AU>Waterman MS
</AU>
<TI>Genomic Sequence Databases
</TI>
<SU>Genome;
    Sequence database;
    USA;
    Genomic
</SU>
<AB>"Collecting and managing data that are growing so rapidly, that require
constant correction, and that must be adapted to new definitions are major
tasks. Cooperation between databases has obvious scientific and political
difficulties, even within one country. When we factor in problems of
international cooperation, the reality of a unified set of biological databases
seems even more remote. These areas require policy decisions that will affect
the progress of international science. Who should make these decisions? Who will
actually make them? National and international databases must be coordinated.
... We cannot leave the future of information management in biology to chance."
</AB>
<JT>Genomics</JT>
<PY>1990</PY>
<VO>6</VO>
<PP>700-701</PP>
</SEQ>

<SEQ>
<UI>1105   Kececioglu,J. Reconstructing a Histo.. ACM-SIAM Sympos 94 
5:471-480
</UI>
<AU>Kececioglu J;
    Gusfield D
</AU>
<TI>Reconstructing a History of Recombinations from a Set of Sequences
</TI>
<SU>Phylogeny;
    Sequence analysis;
    Genomic;
    Recombination;
    Edit;
    Distance;
    USA
</SU>
<AB>Preprint, 12 pp. "One of the classic problems in computational biology is
the reconstruction of evolutionary histories. A recent trend is toward
increasing the explanatory power of the models by incorporating higher-order
evolutionary events that more accurately reflect the full range of mutation at
the molecular level. In this paper, we take a step in this direction by
considering the problem of reconstructing an evolutionary history for a set of
genetic sequences that have evolved by recombination. Recombination produces a
new sequence by crossing two parent sequences, and is among the most important
mechanisms of high-order molecular mutation."
</AB>
<BK>ACM-SIAM Sympos Discrete Algorithms</BK>
<PY>1994</PY>
<VO>5</VO>
<PP>471-480</PP>
</SEQ>

<SEQ>
<UI>1106   Pevzner,P.A.  Matrix Longest Subsequ.. Lecture Notes i 92 
644:79-89
</UI>
<AU>Pevzner PA;
    Waterman MS
</AU>
<TI>Matrix Longest Subsequence Problem, Duality and Hilbert Bases
</TI>
<SU>Longest common;
    Subsequence;
    USA;
    Duality;
    Matrix
</SU>
<AB>Third Annual Symposium, CPM 92. Tucson, Arizona, April 29 - May 1, 1992.
Proceedings. "Although a number of efficient algorithms for the longest common
subsequence (LCS) problem have been suggested since the 1970's, there is no
duality theorem for the LCS problem. In the present paper a simple duality
theorem is proved for the LCS problem and for a wide class of partial orders
generalizing the notion of common subsequence. An algorithm for finding
generalized LCS is suggested which has the classical dynamic programming
algorithm as a special case."
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>1992</PY>
<VO>644</VO>
<PP>79-89</PP>
</SEQ>

<SEQ>
<UI>1107   Pevzner,P.A.  Generalized Sequence A.. Adv.Appl.Math.  93 
14(2):139-171
</UI>
<AU>Pevzner PA;
    Waterman MS
</AU>
<TI>Generalized Sequence Alignment and Duality
</TI>
<SU>Sequence alignment;
    Longest common;
    Duality;
    USA
</SU>
<AB>"Although a number of efficient algorithms for the longest common
subsequence (LCS) problem have been suggested since the 1970s, there is no
duality theorem for the LCS problem. In the present paper a simple duality
theorem is proved for the LCS problem and for a wide class of partial orders
generalizing the notion of common subsequence and sequence alignment. An
algorithm for finding generalized alignment is suggested which has the classical
dynamic programming approach for alignment problems as a special case. The
algorithm covers both local and global alignment as well as a variety of gap
functions."
</AB>
<JT>Adv Appl Math</JT>
<PY>1993</PY>
<VO>14</VO>
<NO>2</NO>
<PP>139-171</PP>
</SEQ>

<SEQ>
<UI>1108   Pevzner,P.A.  A Fast Filtration for .. Lecture Notes i 93 
684:197-214
</UI>
<AU>Pevzner PA;
    Waterman MS
</AU>
<TI>A Fast Filtration for the Substring Matching Problem
</TI>
<SU>String match;
    Match with k mismatches;
    Approximate match;
    USA
</SU>
<AB>4th Annual Symposium, CPM 93. Padova, Italy, June 2-4, 1993. Proceedings.
"Given a text of length n and a query of length q we present an algorithm for
finding all locations of m-tuples in the text and in the query that differ by at
most k mismatches. ... In the case q=m the problem coincides with the classical
approximate string matching with k mismatches problem. We present a new approach
to this problem based on multiple filtration which may have advantages over some
sophisticated and theoretically efficient methods that have been proposed."
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>1993</PY>
<VO>684</VO>
<PP>197-214</PP>
</SEQ>

<SEQ>
<UI>1109   Arratia,R.    A Phase Transition for.. Ann.Appl.Probab 94 
4(1):200-225
</UI>
<AU>Arratia R;
    Waterman MS
</AU>
<TI>A Phase Transition for the Score in Matching Random Sequences Allowing
Deletions
</TI>
<SU>Pairwise alignment;
    Scoring;
    Significance;
    Subsequence;
    Longest common;
    USA;
    Sequence match;
    Transition;
    Deletion;
    Score
</SU>
<AB>"We consider a sequence matching problem involving the optimal alignment
score for contiguous subsequences, rewarding matches and penalizing for
deletions and mismatches. This score is used by biologists comparing pairs of
DNA or protein sequences. We prove that for two sequences of length n, as n goes
to infinity, there is a phase transition between linear growth in n, when the
penalty parameters are small, and logarithmic growth in n, when the penalties
are large. The results are valid for independent sequences with iid or Markov
letters. ... The longest common subsequence problem of Chvatal and Sankoff is a
special case of our setup."
</AB>
<JT>Ann Appl Probab</JT>
<PY>1994</PY>
<VO>4</VO>
<NO>1</NO>
<PP>200-225</PP>
</SEQ>

<SEQ>
<UI>1110   Waterman,M.S. Rapid and Accurate Est.. Proc.Nat.Acad.S 94 
91:4625-4628
</UI>
<AU>Waterman MS;
    Vingron M
</AU>
<TI>Rapid and Accurate Estimates of Statistical Significance for Sequence Data
Base Searches
</TI>
<SU>Database search;
    Statistical;
    Significance;
    Sequence database;
    USA
</SU>
<AB>"A central question in sequence comparison is the statistical significance
of an observed similarity. For local alignment containing gaps to optimize
sequence similarity this problem has so far not been solved mathematically.
Using as a basis the Chen-Stein theory of Poisson approximation, we present a
practical method to approximate the probability that a local alignment score is
a result of chance alone. For a set of similarity scores and gap penalties only
one simulation of random alignments needs to be calculated to derive the key
information allowing us to estimate the significance of any alignment calculated
under this setting. We present applications to data base searching and the
analysis of pairwise and self-comparisons of proteins."
</AB>
<JT>Proc Nat Acad Sci USA </JT>
<PY>1994</PY>
<VO>91</VO>
<PP>4625-4628</PP>
</SEQ>

<SEQ>
<UI>1111   Wang,J.T.L.   Discovering Active Mot.. Nucleic Acids R 94 
22(14):2769-27
</UI>
<AU>Wang JTL;
    Marr TG;
    Shasha D;
    Shapiro BA;
    Chirn GW
</AU>
<TI>Discovering Active Motifs in Sets of Related Protein Sequences and Using
Them for Classification
</TI>
<SU>Sequence search;
    Motif;
    Classification;
    USA;
    Protein
</SU>
<AB>"We describe a method for discovering active motifs in a set of related
protein sequences. The method is an automatic two step process: (1) find
candidate motifs in a small sample of the sequences; (2) test whether these
motifs are approximately present in all the sequences. To reduce the running
time, we develop two optimization heuristics based on statistical estimation and
pattern matching techniques. ... By combining the discovered motifs with an
existing fingerprint technique, we develop a protein classifier."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1994</PY>
<VO>22</VO>
<NO>14</NO>
<PP>2769-2775</PP>
</SEQ>

<SEQ>
<UI>1112   Britten,R.J.  Repeated Sequences in .. Science         68 161(9 
Aug.):52
</UI>
<AU>Britten RJ;
    Kohne DE
</AU>
<TI>Repeated Sequences in DNA
</TI>
<SU>Regularities;
    Repeat;
    USA;
    DNA
</SU>
<AB>"Hundreds of thousands of copies of DNA sequences have been incorporated
into the genomes of higher organisms. ... In this article we describe selected
measurements that show most clearly the presence of repeated sequences and
indicate some of their properties."
</AB>
<JT>Science </JT>
<PY>1968</PY>
<VO>161</VO>
<NO>9 Aug.</NO>
<PP>529-540</PP>
</SEQ>

<SEQ>
<UI>1113   Britten,R.J.  Analysis of Repeating .. Methods Enzymol 74 
29:363-418
</UI>
<AU>Britten RJ;
    Graham DE;
    Neufeld BR
</AU>
<TI>Analysis of Repeating DNA Sequences by Reassociation
</TI>
<SU>Regularities;
    Repetition;
    USA;
    DNA
</SU>
<AB>"Repetitive DNA occurs widely, if not universally, among higher organisms.
A variety of procedures has been developed or adapted to examine its
characteristics, and a body of concepts and language has grown up to deal with
its complexities. This chapter attempts to summarize this body of knowledge and
technique. ... The chapter consists of an introduction in the form of a glossary
and descriptions of techniques and a method for the evaluation of rate
constants."
</AB>
<JT>Methods Enzymol</JT>
<PY>1974</PY>
<VO>29</VO>
<PP>363-418</PP>
</SEQ>

<SEQ>
<UI>1114   Felsenstein,J Numerical Methods for .. Q.Rev.Biol.     82 
57(4):379-404
</UI>
<AU>Felsenstein J
</AU>
<TI>Numerical Methods for Inferring Evolutionary Trees
</TI>
<SU>Evolutionary tree;
    Phylogeny;
    Review;
    USA
</SU>
<AB>Parsimony methods. Correlated characters. Compatibility. Clustering
methods. Pairwise methods. Explicitly statistical methods. Nucleotide and
protein sequence data.
</AB>
<JT>Q Rev Biol</JT>
<PY>1982</PY>
<VO>57</VO>
<NO>4</NO>
<PP>379-404</PP>
</SEQ>

<SEQ>
<UI>1115   Kashyap,R.L.  Statistical Estimation.. J.Theor.Biol.   74 
47:75-101
</UI>
<AU>Kashyap RL;
    Subas S
</AU>
<TI>Statistical Estimation of Parameters in a Phylogenetic Tree Using a
Dynamic Model of the Substitutional Process
</TI>
<SU>Statistical;
    Phylogeny;
    Likelihood;
    USA;
    Model;
    Phylogenetic;
    Estimation;
    Dynamic
</SU>
<AB>"Using a modified version of the substitutional process proposed by
Neyman, we estimate the parameters of the phylogenetic tree made up of three
species .... The parameters estimated are the rate of substitution of amino
acids along a protein and the ratio of the times of divergence of the species
.... A method is given for determining the tree structure when it is not known.
Both the maximum likelihood and Bayes methods are used in the estimation. ...
Next we consider the construction of the correct phylogenetic tree made up of
three or more taxonomic categories ...."
</AB>
<JT>J Theor Biol</JT>
<PY>1974</PY>
<VO>47</VO>
<PP>75-101</PP>
</SEQ>

<SEQ>
<UI>1116   Kececioglu,J. Of Mice and Men: Algor.. ACM-SIAM Sympos 95 
6:???-???
</UI>
<AU>Kececioglu JD;
    Ravi R
</AU>
<TI>Of Mice and Men: Algorithms for Evolutionary Distances between Genomes
with Translocation
</TI>
<SU>Evolutionary distance;
    Genome;
    Translocation;
    Rearrangement;
    Inversion;
    USA;
    Distance;
    Algorithm
</SU>
<AB>Preprint, 10 pp. "In this paper, we begin the algorithmic study of genome
rearrangement by translocation. ... We model this as a process that exchanges
prefixes and suffixes of strings, where each string represents a sequence of
distinct markers along a chromosome in the genome. For the general problem of
determining the translocation distance between two such sets of strings, we
present a 2-approximation algorithm. ... We also examine for the first time two
types of rearrangements in concert. ... For genomes that have evolved by
translocation and inversion, we show there is a simple 2-approximation algorithm
for data in which the orientation of markers is unknown, and a 3/2-approximation
algorithm when orientation is known."
</AB>
<BK>ACM-SIAM Sympos Discrete Algorithms</BK>
<PY>1995</PY>
<VO>6</VO>
<PP>???-???</PP>
</SEQ>

<SEQ>
<UI>1117   Felsenstein,J The Number of Evolutio.. Syst.Zool.      78 27:27-33
</UI>
<AU>Felsenstein J
</AU>
<TI>The Number of Evolutionary Trees
</TI>
<SU>Evolutionary tree;
    USA
</SU>
<AB>"A simple method of counting the number of possible evolutionary trees is
presented. The trees are assumed to be rooted, with labelled tips but unlabelled
root and unlabelled interior nodes. The method allows multifurcations as well as
bifurcations. It makes use of a simple recurrence relation for T(n,m), the
number of trees with n labelled tips and m unlabelled interior nodes."
</AB>
<JT>Syst Zool</JT>
<PY>1978</PY>
<VO>27</VO>
<PP>27-33</PP>
</SEQ>

<SEQ>
<UI>1118   Felsenstein,J Statistical Inference .. J.Roy.Statist.S 83 
146(3):246-272
</UI>
<AU>Felsenstein J
</AU>
<TI>Statistical Inference of Phylogenies
</TI>
<SU>Statistical;
    Phylogeny;
    Markov;
    Likelihood;
    Review;
    USA
</SU>
<AB>"Statistical work on inferring phylogenies has concentrated on two cases:
nucleic acid sequence data, modelled by a stochastic process with four discrete
states, and gene frequency data, modelled by Brownian motion. A review of this
work is presented. There are many unsolved problems, the most important of which
is to persuade biologists to think of the problem of inferring phylogenies as
being basically statistical, and to abandon deductive frameworks that are used
as a justification for 'parsimony' methods."
</AB>
<JT>J Roy Statist Soc Ser A </JT>
<PY>1983</PY>
<VO>146</VO>
<NO>3</NO>
<PP>246-272</PP>
</SEQ>

<SEQ>
<UI>1119   Fitch,W.M.    Construction of Phylog.. Science         67 155(20 
Jan.):2
</UI>
<AU>Fitch WM;
    Margoliash E
</AU>
<TI>Construction of Phylogenetic Trees
</TI>
<SU>Phylogeny;
    Evolutionary tree;
    Clustering;
    Distance;
    USA;
    Phylogenetic
</SU>
<AB>"A method based on mutation distances as estimated from cytochrome c
sequences is of general applicability. ... The mutation distance between two
cytochromes is defined here as the minimal number of nucleotides that would need
to be altered in order for the gene for one cytochrome to code for the other."
</AB>
<JT>Science </JT>
<PY>1967</PY>
<VO>155</VO>
<NO>20 Jan.</NO>
<PP>279-284</PP>
</SEQ>

<SEQ>
<UI>1120   Fitch,W.M.    Toward Defining the Co.. Syst.Zool.      71 
20:406-416
</UI>
<AU>Fitch WM
</AU>
<TI>Toward Defining the Course of Evolution: Minimum Change for a Specific
Tree Topology
</TI>
<SU>Evolutionary tree;
    Phylogeny;
    USA;
    Evolution;
    Character optimization;
    Topology
</SU>
<AB>"A method is presented that is asserted to provide all hypothetical
ancestral character states that are consistent with describing the descent of
the present-day character states in a minimum number of changes of state using a
predetermined phylogenetic relationship among the taxa represented. The
character states used as examples are the four messenger RNA nucleotides
encoding the amino acid sequences of proteins, but the method is general."
</AB>
<JT>Syst Zool</JT>
<PY>1971</PY>
<VO>20</VO>
<PP>406-416</PP>
</SEQ>

<SEQ>
<UI>1121   Hartigan,J.A. Minimum Mutation Fits .. Biometrics      73 29:53-65
</UI>
<AU>Hartigan JA
</AU>
<TI>Minimum Mutation Fits to a Given Tree
</TI>
<SU>Evolutionary tree;
    Phylogeny;
    USA
</SU>
<AB>"A number of objects, such as species, lie at the ends of a known
evolutionary tree. A variable taking a finite number of possible values is
specified on this set of objects. How can the values of the variable be
estimated for the ancestors of the objects? One way is to assign to the
ancestors those values which have the minimum number of mutations (or changes)
in going from ancestors to their immediate descendants. In this paper, a method
of generating all such minimum mutation fits is described. ... Most relevantly,
a recent paper by Fitch (1971) ... specifies rules for constructing a minimum
mutation fit to a given binary tree. The advance of this paper is in specifying
construction rules for a general tree, and in proving that the rules do give a
minimum mutation fit."
</AB>
<JT>Biometrics </JT>
<PY>1973</PY>
<VO>29</VO>
<PP>53-65</PP>
</SEQ>

<SEQ>
<UI>1122   Kaplan,N.     Statistical Analysis o.. Statistical A.. 83Marcel 
Dekker
</UI>
<AU>Kaplan N
</AU>
<TI>Statistical Analysis of Restriction Enzyme Map Data and Nucleotide
Sequence Data
</TI>
<ED>Weir BS
</ED>
<BK>Statistical Analysis of DNA Sequence Data
</BK>
<SU>Statistical;
    Restriction;
    Mapping;
    USA;
    Nucleotide
</SU>
<AB>See Weir (1983) for the book's bibliography, pp. 231-248. "Restriction
enzyme map data provide only a limited amount of information about the
similarities of homologous DNA sequences. Complete information is in hand when
the DNA is totally sequenced. ... The analysis of the data generated by these
new techniques has required the development of new statistical methodology.
Several authors have used this data to estimate evolutionary distance between
two DNA sequences having a common ancestor .... Others have used the data to
estimate DNA sequence variation within populations .... The purpose of this
chapter is to survey these statistical methods."
</AB>
<PU>Marcel Dekker </PU>
<PL>New York </PL>
<PY>1983</PY>
<PP>75-106</PP>
</SEQ>

<SEQ>
<UI>1123   Kimura,M.     A Simple Method for Es.. J.Mol.Evol.     80 
16:111-120
</UI>
<AU>Kimura M
</AU>
<TI>A Simple Method for Estimating Evolutionary Rates of Base Substitutions
Through Comparative Studies of Nucleotide Sequences
</TI>
<SU>Substitution;
    Evolutionary distance;
    Sequence comparison;
    JP;
    Evolutionary rate;
    Rate;
    Nucleotide
</SU>
<AB>"Some simple formulae were obtained which enable us to estimate
evolutionary distances in terms of the number of nucleotide substitutions (and,
also, the evolutionary rates when the divergence times are known). In comparing
a pair of nucleotide sequences, we distinguish two types of differences;
[transitions, transversions]. ... Also, formulae for standard errors were
obtained. Some examples were worked out using reported globin sequences to show
that synonymous substitutions occur at much higher rates than amino acid-altering
substitutions in evolution."
</AB>
<JT>J Mol Evol</JT>
<PY>1980</PY>
<VO>16</VO>
<PP>111-120</PP>
</SEQ>

<SEQ>
<UI>1124   Kimura,M.     On the Stochastic Mode.. J.Mol.Evol.     72 2:87-90
</UI>
<AU>Kimura M;
    Ohta T
</AU>
<TI>On the Stochastic Model for Estimation of Mutational Distance between
Homologous Proteins
</TI>
<SU>Substitution;
    Evolutionary distance;
    Stochastic;
    JP;
    Distance;
    Protein;
    Model;
    Estimation
</SU>
<AB>"A set of simple equations is derived which gives the relationship between
the observed amino acid differences per 100 codons and the evolutionary distance
per 100 codons using Holmquist's stochastic model of molecular evolution."
</AB>
<JT>J Mol Evol</JT>
<PY>1972</PY>
<VO>2</VO>
<PP>87-90</PP>
</SEQ>

<SEQ>
<UI>1125   Farris,J.S.   On the Phenetic Approa.. Major Pattern.. 77Plenum 
Press
</UI>
<AU>Farris JS
</AU>
<TI>On the Phenetic Approach to Vertebrate Classification
</TI>
<ED>Hecht MK;
    Goody PC;
    Hecht BM
</ED>
<BK>Major Patterns in Vertebrate Evolution. NATO ASI Series, Vol. 14
</BK>
<SU>Classification;
    Clustering;
    Hierarchical;
    Evolutionary tree;
    USA
</SU>
<AB>"The topological errors [of an inferred phylogeny] might be remedied,
however, by using a correction called the transformed distance method (Farris
1977; Klotz et al. 1979). In brief, this method uses an outgroup as reference to
make corrections for unequal rates of evolution among the lineages under study
and then applies UPGMA to the new distance matrix to infer the topology of the
tree." -- Li, Graur (1991), p. 109.
</AB>
<PU>Plenum Press </PU>
<PL>New York </PL>
<PY>1977</PY>
<PP>823-850</PP>
</SEQ>

<SEQ>
<UI>1126   Fitch,W.M.    Toward Finding the Tre.. Proceedings o.. 75Freeman
</UI>
<AU>Fitch WM
</AU>
<TI>Toward Finding the Tree of Maximum Parsimony
</TI>
<ED>Estabrook G
</ED>
<BK>Proceedings of the Eighth International Conference on Numerical Taxonomy
</BK>
<SU>Evolutionary tree;
    Phylogeny;
    USA;
    Parsimony
</SU>
<AB>"Since the general solution to the problem [of finding a tree of maximum
parsimony] is not at hand, I shall consider procedures that are useful, either
in making the problem more tractable or in pointing toward the most parsimonious
tree, as well as noting pitfalls. Two major parts of this are, 1, the concept of
a discordancy and, 2, a natural interpretation of a Prim-Kruskal network ... in
terms of phylogeny." This is a preliminary version of Fitch (1977).
</AB>
<PU>Freeman </PU>
<PL>San Francisco </PL>
<PY>1975</PY>
<PP>189-230</PP>
</SEQ>

<SEQ>
<UI>1127   Fitch,W.M.    On the Problem of Disc.. Am.Nat.         77 111(No. 
978):2
</UI>
<AU>Fitch WM
</AU>
<TI>On the Problem of Discovering the Most Parsimonious Tree
</TI>
<SU>Evolutionary tree;
    Phylogeny;
    USA
</SU>
<AB>"Since the general solution to the problem [of finding a tree of maximum
parsimony] is not in hand, I shall consider procedures that are useful, either
in making the problem more tractable or in pointing toward the most parsimonious
tree, as well as noting pitfalls. Two major parts of this are: (1) the concept
of a discordancy and (2) a natural interpretation of a Prim-Kruskal ... or other
single linkage network in terms of phylogeny." Fitch (1975) is a preliminary
version of this paper.
</AB>
<JT>Am Nat</JT>
<PY>1977</PY>
<VO>111</VO>
<NO>No. 978</NO>
<PP>223-257</PP>
</SEQ>

<SEQ>
<UI>1128   Fitch,W.M.    Evolutionary Trees wit.. J.Mol.Evol.     74 
3:263-278
</UI>
<AU>Fitch WM;
    Farris JS
</AU>
<TI>Evolutionary Trees with Minimum Nucleotide Replacements from Amino Acid
Sequences
</TI>
<SU>Evolutionary tree;
    Phylogeny;
    USA;
    Nucleotide;
    Amino acid
</SU>
<AB>"The problem of determining the minimum number of nucleotide substitutions
required to account for the descent of a set of amino acid sequences given their
ancestral relationships (phylogeny) has been studied. A method expanding upon
the earlier work of Fitch (1971) for a set of nucleotide sequences is presented
and its merits compared to the method of Moore et al. (1973)."
</AB>
<JT>J Mol Evol</JT>
<PY>1974</PY>
<VO>3</VO>
<PP>263-278</PP>
</SEQ>

<SEQ>
<UI>1129   Fitch,W.M.    A Non-Sequential Metho.. J.Mol.Evol.     81 18:30-37
</UI>
<AU>Fitch WM
</AU>
<TI>A Non-Sequential Method for Constructing Trees and Hierarchical
Classifications
</TI>
<SU>Hierarchical;
    Classification;
    Phylogeny;
    Evolutionary tree;
    Clustering;
    Distance;
    USA
</SU>
<AB>"A procedure is presented that forms an unrooted tree-like structure from
a matrix of pairwise differences. The tree is not formed a portion at a time, as
methods now in use generally do, but is formed en toto without intervening
estimates of branch lengths. The method is based on a relaxed additivity
(four-point metric) constraint. From the tree, a classification may be formed."
</AB>
<JT>J Mol Evol</JT>
<PY>1981</PY>
<VO>18</VO>
<PP>30-37</PP>
</SEQ>

<SEQ>
<UI>1130   Li,W.H.       Simple Method for Cons.. Proc.Nat.Acad.S 81 
78(2):1085-108
</UI>
<AU>Li WH
</AU>
<TI>Simple Method for Constructing Phylogenetic Trees from Distance Matrices
</TI>
<SU>Phylogeny;
    Evolutionary tree;
    USA;
    Distance;
    Phylogenetic
</SU>
<AB>"A simple method is proposed for constructing phylogenetic trees from
distance matrices. The procedure for constructing tree topologies is similar to
that of the unweighted pair-group method (UPG method) but makes corrections for
unequal rates of evolution among lineages. The procedure for estimating branch
lengths is the same as that of the Fitch and Margoliash method (F-M method)
except that it allows no negative branch lengths. The performance of the present
procedure for the construction of tree topologies is compared with that of the
UPG method, the F-M method, Farris' method, and the modified Farris method ...."
</AB>
<JT>Proc Nat Acad Sci USA </JT>
<PY>1981</PY>
<VO>78</VO>
<NO>2</NO>
<PP>1085-1089</PP>
</SEQ>

<SEQ>
<UI>1131   Saitou,N.     The Neighbor-Joining M.. Mol.Biol.Evol.  87 
4(4):406-425
</UI>
<AU>Saitou N;
    Nei M
</AU>
<TI>The Neighbor-Joining Method: A New Method for Reconstructing Phylogenetic
Trees
</TI>
<SU>Phylogeny;
    Evolutionary tree;
    Clustering;
    Distance;
    USA;
    Phylogenetic;
    Neighbor joining
</SU>
<AB>"A new method called the neighbor-joining method is proposed for
reconstructing phylogenetic trees from evolutionary distance data. The principle
of this method is to find pairs of operational taxonomic units (OTUs) that
minimize the total branch length at each stage of clustering of OTUs starting
with a starlike tree. The branch lengths as well as the topology of a
parsimonious tree can quickly be obtained by using this method. Using computer
simulation, we studied the efficiency of this method in obtaining the correct
unrooted tree in comparison with that of five other tree-making methods ...."
See Studier, Keppler (1988) for clarifications and a correction.
</AB>
<JT>Mol Biol Evol</JT>
<PY>1987</PY>
<VO>4</VO>
<NO>4</NO>
<PP>406-425</PP>
</SEQ>

<SEQ>
<UI>1132   Sattath,S.    Additive Similarity Tr.. Psychometrika   77 
42(3):319-345
</UI>
<AU>Sattath S;
    Tversky A
</AU>
<TI>Additive Similarity Trees
</TI>
<SU>Clustering;
    Hierarchical;
    Additive tree;
    Program;
    Similarity;
    Distance;
    IL
</SU>
<AB>"Similarity data can be represented by additive trees. ... The additive
tree is less restrictive than the ultrametric tree, commonly known as the
hierarchical clustering scheme. The two representations are characterized and
compared. A computer program, ADDTREE, for the construction of additive trees is
described and applied to several sets of data."
</AB>
<JT>Psychometrika </JT>
<PY>1977</PY>
<VO>42</VO>
<NO>3</NO>
<PP>319-345</PP>
</SEQ>

<SEQ>
<UI>1133   Studier,J.A.  A Note on the Neighbor.. Mol.Biol.Evol.  88 
5(6):729-731
</UI>
<AU>Studier JA;
    Keppler KJ
</AU>
<TI>A Note on the Neighbor-Joining Algorithm of Saitou and Nei
</TI>
<SU>Phylogeny;
    Evolutionary tree;
    USA;
    Error;
    Neighbor joining;
    Algorithm
</SU>
<AB>"The minimum running time of the algorithm as formulated by Saitou and Nei
is unclear. We present an alternative formulation that runs in time O(N^3), where
N is the number of operational taxonomic units (OTUs). ... The proof given by
Saitou and Nei that the correct tree is recovered if D is treelike is incorrect.
We describe the error and supply a correct proof below."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1988</PY>
<VO>5</VO>
<NO>6</NO>
<PP>729-731</PP>
</SEQ>

<SEQ>
<UI>1134   Szpankowski,W (Un)Expected Behavior .. ACM-SIAM Sympos 92 
3:422-431
</UI>
<AU>Szpankowski W
</AU>
<TI>(Un)Expected Behavior of Typical Suffix Trees
</TI>
<SU>String match;
    Suffix;
    Search tree;
    USA
</SU>
<AB>"Recently, Chang and Lawler have designed a sublinear expected time
algorithm for approximate string matching using simple estimates of some
parameters of suffix trees. ... In this paper, we use a novel technique called
string ruler approach to provide a characterization of several basic parameters
of suffix trees .... These findings are used to ... provide new insights and
generalizations of string matching algorithms, particularly the one by Chang and
Lawler."
</AB>
<BK>ACM-SIAM Sympos Discrete Algorithms</BK>
<PY>1992</PY>
<VO>3</VO>
<PP>422-431</PP>
</SEQ>

<SEQ>
<UI>1135   Nussinov,R.   Theoretical Molecular .. J.Theor.Biol.   87 
125:219-235
</UI>
<AU>Nussinov R
</AU>
<TI>Theoretical Molecular Biology: Prospectives and Perspectives
</TI>
<SU>String match;
    Pattern match;
    Approximate match;
    Consensus method;
    IL
</SU>
<AB>"I briefly discuss some aspects of theoretical molecular biology.
Specifically, I include the issues of searches for homologies via string
matchings, for patterns of specific nucleotide groupings and of
sequence-structure relationship. The various approaches developed in order to achieve
this end are described, attempting to convey some of the excitement in this
quickly growing field." Pattern recognition in symbolic strings. Algorithms for
string matching and for finding consensus sequences. Nearest neighbour patterns
in nucleotide sequences. Consensus sequences. Structural Implications. Some
further considerations.
</AB>
<JT>J Theor Biol</JT>
<PY>1987</PY>
<VO>125</VO>
<PP>219-235</PP>
</SEQ>

<SEQ>
<UI>1136   Senapathy,P.  Splice Junctions, Bran.. Methods Enzymol 90 
183:252-278
</UI>
<AU>Senapathy P;
    Shapiro MB;
    Harris NL
</AU>
<TI>Splice Junctions, Branch Point Sites, and Exons: Sequence Statistics,
Identification, and Applications to Genome Project
</TI>
<SU>Pattern discovery;
    Identification;
    Genome;
    USA;
    Exon
</SU>
<AB>"We have used the tabulated consensus scoring matrices to find the most
probable splice sites in a given sequence. ... A method was developed to predict
potential exons in an uncharacterized sequence based on splice site scores and
by using other parameters of exons and eukaryotic coding sequences. However,
although this method could identify some complete exons of a gene, it cannot
identify all the exons of a gene completely. ... Thus the problem of identifying
complete genes is an order of magnitude more complex than finding individual
exons ...."
</AB>
<JT>Methods Enzymol</JT>
<PY>1990</PY>
<VO>183</VO>
<PP>252-278</PP>
</SEQ>

<SEQ>
<UI>1137   Eigen,M.      Statistical Geometry o.. Methods Enzymol 90 
183:505-530
</UI>
<AU>Eigen M;
    Winkler-Oswatitsch R
</AU>
<TI>Statistical Geometry on Sequence Space
</TI>
<SU>Multiple comparison;
    Statistical;
    Sequence analysis;
    Sequence proximity;
    DE;
    Geometry
</SU>
<AB>"Alignment as such represents a two-dimensional matrix and thus invites
horizontal and vertical inspection. Distance is calculated by horizontal summing
of differences between two sequences. Positional nonuniformities of mutation or
fixation manifest themselves in vertical deviations from consensus occupation.
In this chapter we describe methods of comparative sequence analysis that
combine horizontal and vertical criteria. They are used to construct geometries
that are more complex, but at the same time also more informative than simple
distance dendrograms. We start by introducing the concept of sequence space, a
high-dimensional space that is most appropriate for representing sequence
relations."
</AB>
<JT>Methods Enzymol</JT>
<PY>1990</PY>
<VO>183</VO>
<PP>505-530</PP>
</SEQ>

<SEQ>
<UI>1138   Gojobori,T.   Statistical Methods fo.. Methods Enzymol 90 
183:531-550
</UI>
<AU>Gojobori T;
    Moriyama EN;
    Kimura M
</AU>
<TI>Statistical Methods for Estimating Sequence Divergence
</TI>
<SU>Evolutionary distance;
    Substitution;
    Statistical;
    JP;
    Divergence
</SU>
<AB>"The observed number of nucleotide differences between the two DNA
sequences is thus frequently different from the total number of nucleotide
substitutions that have actually occurred during their divergence. Statistical
methods for estimating the number of nucleotide substitutions are therefore
required for comparative studies of DNA sequences. In this chapter, we first
describe various methods for estimating the number of nucleotide substitutions
and then discuss the advantages and disadvantages of these methods."
</AB>
<JT>Methods Enzymol</JT>
<PY>1990</PY>
<VO>183</VO>
<PP>531-550</PP>
</SEQ>

<SEQ>
<UI>1139   Saccone,C.    Influence of Base Comp.. Methods Enzymol 90 
183:570-583
</UI>
<AU>Saccone C;
    Lanave C;
    Pesole G;
    Preparata G
</AU>
<TI>Influence of Base Composition on Quantitative Estimates of Gene Evolution
</TI>
<SU>Evolutionary distance;
    Composition;
    Stochastic;
    Markov;
    Italy;
    Evolution;
    Gene
</SU>
<AB>"The measure of the genetic distance between organisms is one of the most
challenging and difficult issues in molecular evolution. ... The construction of
simple models of molecular evolution appears as a necessary and scientifically
appropriate first step in any methodological approach. A few years ago we
proposed a simple stochastic model of molecular evolution, the stationary Markov
model, and we demonstrated that it is at work in a large variety of types of
evolutionary dynamics operating at the gene level. In this chapter we present
the theoretical basis of the model, its mathematical formulation, and a few
experimental applications."
</AB>
<JT>Methods Enzymol</JT>
<PY>1990</PY>
<VO>183</VO>
<PP>570-583</PP>
</SEQ>

<SEQ>
<UI>1140   Saitou,N.     Maximum Likelihood Met.. Methods Enzymol 90 
183:584-598
</UI>
<AU>Saitou N
</AU>
<TI>Maximum Likelihood Methods
</TI>
<SU>Evolutionary distance;
    Phylogeny;
    Likelihood;
    JP
</SU>
<AB>"Application of the maximum likelihood (ML) method to the problem of
phylogenetic tree reconstruction was first studied for the case of gene
frequency data. Later, an ML algorithm for constructing unrooted phylogenetic
trees from nucleotide sequence data was developed by Felsenstein (1981).
Recently, Saitou (1988) proposed a stepwise tree-searching algorithm for the ML
method. This is similar to that of the neighbor-joining method (Saitou, Nei
1987), in which distance matrices are used."
</AB>
<JT>Methods Enzymol</JT>
<PY>1990</PY>
<VO>183</VO>
<PP>584-598</PP>
</SEQ>

<SEQ>
<UI>1141   Felsenstein,J PHYLIP - Phylogeny Inf.. Cladistics      89 
5(2):164-166
</UI>
<AU>Felsenstein J
</AU>
<TI>PHYLIP - Phylogeny Inference Package (Version 3.2)
</TI>
<SU>Phylogeny;
    USA;
    Program
</SU>
<AB>"This is a free package of programs for inferring phylogenies and carrying
out certain related tasks. At present it contains 29 programs, which carry out
different algorithms on different kinds of data. The programs in the package
are: ..." Programs for molecular sequence data (10), Programs for distance
matrix data (2), Programs for Gene Frequencies (2), Programs for discrete state
data (10), Programs for plotting trees and consensus trees (5). [The current
version is available from the author by ftp (file transfer protocol).]
</AB>
<JT>Cladistics </JT>
<PY>1989</PY>
<VO>5</VO>
<NO>2</NO>
<PP>164-166</PP>
</SEQ>

<SEQ>
<UI>1142   Blanken,R.L.  Computer Comparison of.. J.Mol.Evol.     82 19:9-19
</UI>
<AU>Blanken RL;
    Klotz LC;
    Hinnebusch AG
</AU>
<TI>Computer Comparison of New and Existing Criteria for Constructing
Evolutionary Trees from Sequence Data
</TI>
<SU>Phylogeny;
    Evolutionary tree;
    USA
</SU>
<AB>"Three new methods for constructing evolutionary trees from molecular
sequence data are presented. These methods are based on a theory for correcting
for non-constant evolutionary rates (Klotz et al. 1979; Klotz and Blanken 1981).
Extensive computer simulations were run to compare these new methods to the
commonly used criteria of Dayhoff (1978) and Fitch and Margoliash (1967). ...
However, no method yielded the correct topology all of the time, which
demonstrated the need to determine confidence estimates in a particular result
when evolutionary trees are determined from sequence data."
</AB>
<JT>J Mol Evol</JT>
<PY>1982</PY>
<VO>19</VO>
<PP>9-19</PP>
</SEQ>

<SEQ>
<UI>1143   Winkler-Oswat Comparative Sequence A.. Chemica Scripta 86 
26B:59-66
</UI>
<AU>Winkler-Oswatitsch R;
    Dress A;
    Eigen M
</AU>
<TI>Comparative Sequence Analysis Exemplified with tRNA and 5S rRNA
</TI>
<SU>Sequence analysis;
    Evolutionary tree;
    DE
</SU>
<AB>"The advent of new sequencing techniques has brought a sudden increase in
the data available for the study of evolutionary history on a quantitative
basis. Criteria are put forward and methods are developed that allow an optimal
alignment of sequences, a determination of the topology of their kinship
relations, a reconstitution of precursors and a reliable establishment of their
randomization. The criteria developed are tested by comparison to a large bulk
of data from both tRNA and ribosomal 5S RNA sequences."
</AB>
<JT>Chemica Scripta</JT>
<PY>1986</PY>
<VO>26B</VO>
<PP>59-66</PP>
</SEQ>

<SEQ>
<UI>1144   Wheeler,W.C.  Paired Sequence Differ.. Mol.Biol.Evol.  88 
5(1):90-96
</UI>
<AU>Wheeler WC;
    Honeycutt RL
</AU>
<TI>Paired Sequence Difference in Ribosomal RNAs: Evolutionary and
Phylogenetic Implications
</TI>
<SU>Phylogeny;
    USA;
    RNA;
    Phylogenetic
</SU>
<AB>"Ribosomal RNAs have secondary structures that are maintained by internal
Watson-Crick pairing. ... [W]e show that Darwinian selection operates on these
nucleotide sequences to maintain functionally important secondary structure.
Insect phylogenies based on nucleotide positions involved in pairing and the
production of secondary structure are incongruent with those constructed on the
basis of positions that are not. Furthermore, phylogeny reconstruction using
these nonpairing bases is concordant with other, morphological data."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1988</PY>
<VO>5</VO>
<NO>1</NO>
<PP>90-96</PP>
</SEQ>

<SEQ>
<UI>1145   McLachlan,A.D Repeating Sequences an.. J.Mol.Biol.     72 
64:417-437
</UI>
<AU>McLachlan AD
</AU>
<TI>Repeating Sequences and Gene Duplication in Proteins
</TI>
<SU>Repeat;
    Duplication;
    Repetition;
    UK;
    Gene;
    Protein
</SU>
<AB>"The theory that proteins have evolved by repeated internal duplication of
short segments of polypeptide chains has been tested by looking for repeats and
near repeats in over 50 different proteins, many of them of known structure. The
probability that the observed repeats could arise by chance has been calculated.
The search does not yield a single new example where the evidence for gene
duplication is compelling. No protein shows a unique internally consistent
pattern of repeats which both correlates with repeats in the structure and
cannot be explained by chance."
</AB>
<JT>J Mol Biol</JT>
<PY>1972</PY>
<VO>64</VO>
<PP>417-437</PP>
</SEQ>

<SEQ>
<UI>1146   Kaplan,N.     A New Estimate of Sequ.. J.Mol.Evol.     79 
13:295-304
</UI>
<AU>Kaplan N;
    Langley CH
</AU>
<TI>A New Estimate of Sequence Divergence of Mitochondrial DNA Using
Restriction Endonuclease Mappings
</TI>
<SU>Evolutionary distance;
    Restriction;
    Mapping;
    USA;
    Divergence;
    DNA
</SU>
<AB>"A new estimate of the sequence divergence of mitochondrial DNA in related
species using restriction enzyme maps is constructed. The estimate is derived
assuming a simple Poisson-like model for the evolutionary process and is chosen
to maximize an expression which is a reasonable approximation to the true
likelihood of the restriction map data. Using this estimate, four sets of
mitochondrial DNA data are analyzed and discussed."
</AB>
<JT>J Mol Evol</JT>
<PY>1979</PY>
<VO>13</VO>
<PP>295-304</PP>
</SEQ>

<SEQ>
<UI>1147   Peacock,D.    Use of Amino Acid Sequ.. J.Mol.Biol.     75 
95:513-527
</UI>
<AU>Peacock D;
    Boulter D
</AU>
<TI>Use of Amino Acid Sequence Data in Phylogeny and Evaluation of Methods
using Computer Simulation
</TI>
<SU>Phylogeny;
    UK;
    Simulation;
    Amino acid
</SU>
<AB>"The advantages and disadvantages of the use of amino acid sequence data
for phylogenetic studies are critically examined. The accuracy of two of the
main methods currently used to construct phylogenetic relationships from amino
acid sequences is evaluated, using a computer program that produces model
sequences by simulating the process of protein evolution." The methods compared
were those of Dayhoff &amp; Eck (1966) and of Moore, Goodman &amp; Barnabas (1973).
</AB>
<JT>J Mol Biol</JT>
<PY>1975</PY>
<VO>95</VO>
<PP>513-527</PP>
</SEQ>

<SEQ>
<UI>1148   Saitou,N.     Property and Efficienc.. J.Mol.Evol.     88 
27:261-273
</UI>
<AU>Saitou N
</AU>
<TI>Property and Efficiency of the Maximum Likelihood Method for Molecular
Phylogeny
</TI>
<SU>Phylogeny;
    Likelihood;
    USA
</SU>
<AB>"The maximum likelihood (ML) method for constructing phylogenetic trees
(both rooted and unrooted trees) from DNA sequence data was studied. Although
there is some theoretical problem in the comparison of ML values conditional for
each topology, it is possible to make a heuristic argument to justify the
method. Based on this argument, a new algorithm for estimating the ML tree is
presented."
</AB>
<JT>J Mol Evol</JT>
<PY>1988</PY>
<VO>27</VO>
<PP>261-273</PP>
</SEQ>

<SEQ>
<UI>1149   Sourdis,J.    Accuracy of Phylogenet.. Mol.Biol.Evol.  87 
4(2):159-166
</UI>
<AU>Sourdis J;
    Krimbas C
</AU>
<TI>Accuracy of Phylogenetic Trees Estimated from DNA Sequence Data
</TI>
<SU>Phylogeny;
    Evolutionary tree;
    GR;
    DNA;
    Phylogenetic;
    Accuracy
</SU>
<AB>"The relative merits of four different tree-making methods in obtaining
the correct topology were studied by using computer simulation. The methods
studied were the unweighted pair-group method with arithmetic mean (UPGMA),
Fitch and Margoliash's (FM) method, the distance Wagner (DW) method, and Tateno
et al.'s modified Farris (MF) method. ... The results obtained can be summarized
as follows: (1) The probability of obtaining the correct rooted or unrooted tree
is low unless a large number of nucleotide differences exists between different
sequences. (2) ..."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1987</PY>
<VO>4</VO>
<NO>2</NO>
<PP>159-166</PP>
</SEQ>

<SEQ>
<UI>1150   Tateno,Y.     Statistical Properties.. J.Mol.Evol.     86 
23:354-361
</UI>
<AU>Tateno Y;
    Tajima F
</AU>
<TI>Statistical Properties of Molecular Tree Construction Methods under the
Neutral Mutation Model
</TI>
<SU>Phylogeny;
    Evolutionary tree;
    Statistical;
    JP;
    Model
</SU>
<AB>"The statistical properties of three molecular tree construction methods -
the unweighted pair-group arithmetic average clustering (UPG), Farris, and
modified Farris methods - are examined under the neutral mutation model of
evolution. The methods are compared for accuracy in construction of the topology
and estimation of the branch lengths, using statistics of these two aspects. The
distribution of the statistic concerning topological construction is shown to be
as important as its mean and variance for the comparison."
</AB>
<JT>J Mol Evol</JT>
<PY>1986</PY>
<VO>23</VO>
<PP>354-361</PP>
</SEQ>

<SEQ>
<UI>1151   Tateno,Y.     Accuracy of Estimated .. J.Mol.Evol.     82 
18:387-404
</UI>
<AU>Tateno Y;
    Nei M;
    Tajima F
</AU>
<TI>Accuracy of Estimated Phylogenetic Trees from Molecular Data. I. Distantly
Related Species
</TI>
<SU>Phylogeny;
    Evolutionary tree;
    USA;
    Phylogenetic;
    Accuracy
</SU>
<AB>"The accuracies and efficiencies of four different methods for
constructing phylogenetic trees from molecular data were examined by using
computer simulation. The methods examined are UPGMA, Fitch and Margoliash's
(1967) (F/M) method, Farris' (1972) method, and the modified Farris method
(Tateno, Nei, and Tajima, this paper). ... Nevertheless, any tree-making method
is likely to make errors in obtaining the correct topology with a high
probability, unless all branch lengths of the true tree are sufficiently long.
... The agreement between patristic and observed distances is not a good
indicator of the goodness of the tree obtained."
</AB>
<JT>J Mol Evol</JT>
<PY>1982</PY>
<VO>18</VO>
<PP>387-404</PP>
</SEQ>

<SEQ>
<UI>1152   Thompson,E.A. Likelihood and Parsimo.. Cladistics      86 
2(1):43-52
</UI>
<AU>Thompson EA
</AU>
<TI>Likelihood and Parsimony: Comparison of Criteria and Solutions
</TI>
<SU>Phylogeny;
    Likelihood;
    UK;
    Parsimony
</SU>
<AB>"This paper investigates the effects of alternative levels of assumption
upon resulting estimates of evolutionary history, in the context of a particular
problem - analysis of varying allele frequencies between polymorphic alleles
existing in all the populations under consideration."
</AB>
<JT>Cladistics</JT>
<PY>1986</PY>
<VO>2</VO>
<NO>1</NO>
<PP>43-52</PP>
</SEQ>

<SEQ>
<UI>1153   Hirschberg,D. The Set-Set LCS Problem  Algorithmica    89 
4(4):503-510
</UI>
<AU>Hirschberg DS;
    Larmore LL
</AU>
<TI>The Set-Set LCS Problem
</TI>
<SU>Longest common;
    Subsequence;
    USA;
    Dynamic programming
</SU>
<AB>"An efficient algorithm is presented that solves a generalization of the
Longest Common Subsequence problem, in which both of the input strings consists
of sets of symbols which may be permuted."
</AB>
<JT>Algorithmica</JT>
<PY>1989</PY>
<VO>4</VO>
<NO>4</NO>
<PP>503-510</PP>
</SEQ>

<SEQ>
<UI>1154   Ukkonen,E.    A Linear-Time Algorith.. Algorithmica    90 
5(3):313-323
</UI>
<AU>Ukkonen E
</AU>
<TI>A Linear-Time Algorithm for Finding Approximate Shortest Common
Superstrings
</TI>
<SU>Supersequence;
    Shortest common;
    FI;
    Algorithm
</SU>
<AB>"Approximate shortest common superstrings for a given set R of strings can
be constructed by applying the greedy heuristics for finding a longest
Hamiltonian path in the weighted graph that represents the pairwise overlaps
between the strings in R. We develop an efficient implementation of this idea
using a modified Aho-Corasick string-matching automaton."
</AB>
<JT>Algorithmica</JT>
<PY>1990</PY>
<VO>5</VO>
<NO>3</NO>
<PP>313-323</PP>
</SEQ>

<SEQ>
<UI>1155   Apostolico,A. Optimal Parallel Detec.. Algorithmica    92 
8(4):285-319
</UI>
<AU>Apostolico A
</AU>
<TI>Optimal Parallel Detection of Squares in Strings
</TI>
<SU>Regularities;
    Optimal;
    Parallel;
    Square;
    Italy;
    Detection
</SU>
<AB>"A string is square-free if it has no nonempty substring of the form ww.
It is shown that the square-freedom of a string of n symbols over an arbitrary
alphabet can be tested by a CRCW PRAM with n processors in O(log n) time and
linear auxiliary space. ... More elaborate constructions lead to a CRCW PRAM
algorithm for detecting, within the same n-processors bounds, all positioned
squares in x in time O(log n) and using linear auxiliary space. The fastest
sequential algorithms solve this problem in O(n log n) time, and such a
performance is known to be optimal."
</AB>
<JT>Algorithmica</JT>
<PY>1992</PY>
<VO>8</VO>
<NO>4</NO>
<PP>285-319</PP>
</SEQ>

<SEQ>
<UI>1156   Lam,T.W.      Finding Least-Weight S.. Algorithmica    93 
9(6):615-628
</UI>
<AU>Lam TW;
    Chan KF
</AU>
<TI>Finding Least-Weight Subsequences with Fewer Processors
</TI>
<SU>Least weight;
    Subsequence;
    HK
</SU>
<AB>"In this paper we show that if the weight function satisfies the inverse
quadrangle inequality, the [least-weight subsequence] problem can be solved on a
CREW PRAM in O(log^2 n log log n) time with n/log log n processors, or in
O(log^2 n) time with n log n processors. Notice that the processor-time
complexity of our algorithm is much closer to the almost linear-time complexity
of the best-known sequential algorithm."
</AB>
<JT>Algorithmica</JT>
<PY>1993</PY>
<VO>9</VO>
<NO>6</NO>
<PP>615-628</PP>
</SEQ>

<SEQ>
<UI>1157   Ukkonen,E.    Approximate String Mat.. Algorithmica    93 
10(5):353-364
</UI>
<AU>Ukkonen E;
    Wood D
</AU>
<TI>Approximate String Matching with Suffix Automata
</TI>
<SU>Approximate match;
    String match;
    Automata;
    FI;
    Suffix
</SU>
<AB>"The approximate string matching problem is, given a text string, a
pattern string, and an integer k, to find in the text all approximate
occurrences of the pattern. An approximate occurrence means a substring of the
text with edit distance at most k from the pattern. We give a new O(kn)
algorithm for this problem, where n is the length of the text. The algorithm is
based on the suffix automaton with failure transitions and on the diagonalwise
monotonicity of the edit distance table. Some experiments showing that the
algorithm has a small overhead are reported."
</AB>
<JT>Algorithmica</JT>
<PY>1993</PY>
<VO>10</VO>
<NO>5</NO>
<PP>353-364</PP>
</SEQ>

<SEQ>
<UI>1158   Steele,J.M.   An Efron-Stein Inequal.. Ann.Statist.    86 
14(2):753-758
</UI>
<AU>Steele JM
</AU>
<TI>An Efron-Stein Inequality for Nonsymmetric Statistics
</TI>
<SU>Longest common;
    Subsequence;
    USA
</SU>
<AB>"Finally the inequality is applied to a problem of string comparisons by
means of long common subsequences, a problem considered at length in Sankoff and
Kruskal (1983). The best known bound on the variance of the longest common
subsequence is improved, and a new k string comparison problem is introduced."
</AB>
<JT>Ann Statist</JT>
<PY>1986</PY>
<VO>14</VO>
<NO>2</NO>
<PP>753-758</PP>
</SEQ>

<SEQ>
<UI>1159   Kececioglu,J. Combinatorial Algorith.. Algorithmica    94 
12:???-???
</UI>
<AU>Kececioglu JD;
    Myers EW
</AU>
<TI>Combinatorial Algorithms for DNA Sequence Assembly
</TI>
<SU>Approximation;
    Fragment;
    Sequence reconstruction;
    Sequence assembly;
    USA;
    Combinatorial;
    DNA;
    Algorithm
</SU>
<AB>Preprint, 45 pp. "The sequence reconstruction problem that we take as our
formulation of DNA sequence assembly is a variation of the shortest common
superstring problem, complicated by the presence of sequencing errors and
reverse complements of fragments. Since the simpler superstring problem is NP-
hard, any efficient reconstruction procedure must resort to heuristics. In this
paper, however, a four phase approach based on rigorous design criteria is
presented, and has been found to be very accurate in practice."
</AB>
<JT>Algorithmica</JT>
<PY>1994</PY>
<VO>12</VO>
<PP>???-???</PP>
</SEQ>

<SEQ>
<UI>1160   Pevzner,P.A.  Open Combinatorial Pro.. Israel Sympos.T 95 
3:???-???
</UI>
<AU>Pevzner PA;
    Waterman MS
</AU>
<TI>Open Combinatorial Problems in Computational Molecular Biology
</TI>
<SU>Genome;
    Rearrangement;
    Mapping;
    Sequencing;
    Sequence comparison;
    USA;
    Combinatorial
</SU>
<AB>Preprint, 16 pp. "In the last few years theoretical computer scientists
have found new challenges in computational molecular biology. We discuss recent
advances and present some open combinatorial problems in different areas of
computational molecular biology such as genome rearrangements, DNA physical
mapping, DNA sequencing and sequence comparison."
</AB>
<JT>Israel Sympos Theor Comput Systems</JT>
<PY>1995</PY>
<VO>3</VO>
<PP>???-???</PP>
</SEQ>

<SEQ>
<UI>1161   Cheng,H.D.    VLSI Architectures for.. Pattern Recogni 87 
20(1):125-141
</UI>
<AU>Cheng HD;
    Fu KS
</AU>
<TI>VLSI Architectures for String Matching and Pattern Matching
</TI>
<SU>VLSI;
    String match;
    Pattern match;
    Hardware;
    USA
</SU>
<AB>"In this paper, we discuss string-matching and dynamic time-warp pattern-
matching. ... We propose a VLSI architecture based on the space-time domain
expansion approach which can compute the string distance and also give the
matching index-pairs which correspond to the edit sequence. ... We also propose
a VLSI architecture for dynamic time-warping based on the space-time expansion
method which can obtain a high throughput by using extensive pipelining and
parallelism."
</AB>
<JT>Pattern Recognition</JT>
<PY>1987</PY>
<VO>20</VO>
<NO>1</NO>
<PP>125-141</PP>
</SEQ>

<SEQ>
<UI>1162   Hollaar,L.A.  Text Retrieval Computers Computer        79 
12(3):40-50
</UI>
<AU>Hollaar LA
</AU>
<TI>Text Retrieval Computers
</TI>
<SU>Hardware;
    Retrieval;
    USA
</SU>
<AB>"The hardware required for efficient text retrieval differs from that
required for retrieval of formatted data. Here is an examination of such
hardware, particularly term comparators."
</AB>
<JT>Computer</JT>
<PY>1979</PY>
<VO>12</VO>
<NO>3</NO>
<PP>40-50</PP>
</SEQ>

<SEQ>
<UI>1163   Iyengar,S.S.  A String Searching Alg.. Appl.Math.Compu 80 
6:123-131
</UI>
<AU>Iyengar SS;
    Alia V
</AU>
<TI>A String Searching Algorithm
</TI>
<SU>String search;
    USA;
    Algorithm
</SU>
<AB>"This paper is an attempt to develop a string searching algorithm that
begins the search for a match in the middle of the strings being compared. The
algorithm uses information gained from mismatches and the location of the search
area in the large string, to make decisions and direct the search. Several
elements of this algorithm can be useful in string searching applications."
</AB>
<JT>Appl Math Comput</JT>
<PY>1980</PY>
<VO>6</VO>
<PP>123-131</PP>
</SEQ>

<SEQ>
<UI>1164   Felsenstein,J Confidence Limits on P.. Evolution       85 
39(4):783-791
</UI>
<AU>Felsenstein J
</AU>
<TI>Confidence Limits on Phylogenies: An Approach Using the Bootstrap
</TI>
<SU>Evolutionary tree;
    Robustness;
    Resampling;
    Bootstrap;
    Confidence;
    USA;
    Statistical;
    Phylogeny
</SU>
<AB>"The recently-developed statistical method known as the 'bootstrap' can be
used to place confidence intervals on phylogenies. ... In the case of
phylogenies, it is argued that the proper method of resampling is to keep all of
the original species while sampling characters with replacement, under the
assumption that the characters have been independently drawn by the systematist
and have evolved independently. Majority-rule consensus trees can be used to
construct a phylogeny showing all of the inferred monophyletic groups that
occurred in a majority of the bootstrap samples."
</AB>
<JT>Evolution</JT>
<PY>1985</PY>
<VO>39</VO>
<NO>4</NO>
<PP>783-791</PP>
</SEQ>

<SEQ>
<UI>1165   Aoe,J.I.      An Efficient Implement.. SIGIR Forum     89 
23(3,4):22-33
</UI>
<AU>Aoe JI
</AU>
<TI>An Efficient Implementation of String Pattern Matching Machines for a
Finite Number of Keywords
</TI>
<SU>Pattern match;
    String match;
    Automata;
    Data structure;
    JP
</SU>
<AB>"This paper describes a method of implementing a static transition table
of a string pattern matching machine to locate all occurrences of a finite
number of keywords in a text string. The scheme combines the fast access of an
array representation with the compactness of a list structure. Each transition
can be computed from the present data structure in O(1) time and the storage is
as small as the list structure. The construction and pattern matching programs
associated with the present data structure are provided and the efficiency is
evaluated by empirical results."
</AB>
<JT>SIGIR Forum</JT>
<PY>1989</PY>
<VO>23</VO>
<NO>3,4</NO>
<PP>22-33</PP>
</SEQ>

<SEQ>
<UI>1166   Kuo,S.        An Improved Algorithm .. SIGIR Forum     89 
23(3,4):89-99
</UI>
<AU>Kuo S;
    Cross GR
</AU>
<TI>An Improved Algorithm to Find the Length of the Longest Common Subsequence
of Two Strings
</TI>
<SU>Pairwise comparison;
    Longest common;
    Subsequence;
    USA;
    Algorithm
</SU>
<AB>"We present an improvement to this algorithm [Hunt, Szymanski (1977)] ....
Some experimental results show dramatic improvements for large n."
</AB>
<JT>SIGIR Forum</JT>
<PY>1989</PY>
<VO>23</VO>
<NO>3,4</NO>
<PP>89-99</PP>
</SEQ>

<SEQ>
<UI>1167   Staden,R.     An Improved Sequence H.. Comput.Appl.Bio 90 
6(4):387-393
</UI>
<AU>Staden R
</AU>
<TI>An Improved Sequence Handling Package that Runs on the Apple Macintosh
</TI>
<SU>Sequence analysis;
    Program;
    UK
</SU>
<AB>"We report improvements to our sequence analysis package and adaptation to
run on the Apple Macintosh range of machines. ... In addition to a large number
of small but useful extra features, some important new analytical functions have
been devised. These include sequence and contig editors; optimal alignment and
comparison methods; and a new method for comparing the observed and expected
frequencies of selected oligonucleotides."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1990</PY>
<VO>6</VO>
<NO>4</NO>
<PP>387-393</PP>
</SEQ>

<SEQ>
<UI>1168   Gleeson,T.J.  An X Windows and UNIX .. Comput.Appl.Bio 91 
7(3):398-0
</UI>
<AU>Gleeson TJ;
    Staden R
</AU>
<TI>An X Windows and UNIX Implementation of Our Sequence Analysis Package
</TI>
<SU>Sequence analysis;
    Program;
    UK
</SU>
<AB>"Our comprehensive package of programs for handling and analysing
sequences (references in Staden, 1990) has been used on VAX machines running
under the VMS operating system for many years. ... Further modifications to the
original FORTRAN and an additional set of routines written in C have enabled us
to produce two new versions of the programs to run under the X Window System.
The first runs under the terminal emulator xterm, and the second runs directly
under X."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1991</PY>
<VO>7</VO>
<NO>3</NO>
<PP>398</PP>
</SEQ>

<SEQ>
<UI>1169   Stephen,G.A.  String Searching Algor..                 94World 
Scientifi
</UI>
<AU>Stephen GA
</AU>
<TI>String Searching Algorithms
</TI>
<SU>String search;
    String match;
    Approximate match;
    Search tree;
    Distance;
    Repeat;
    UK;
    Algorithm
</SU>
<AB>"This book presents a bibliographic overview of the field and an anthology
of detailed descriptions of the principal algorithms available. The aim is
twofold: on the one hand, to provide an easy-to-read comparison of the available
techniques in each area, and on the other, to furnish the reader with a
reference to in-depth descriptions of the major algorithms. Topics covered
include methods for finding exact and approximate string matches, calculating
'edit' distances between strings, finding common sequences and finding the
longest repetitions within strings."
</AB>
<PU>World Scientific Publishing</PU>
<PL>Singapore</PL>
<PY>1994</PY>
<PP>xii+243</PP>
</SEQ>

<SEQ>
<UI>1170   Ukkonen,E.    Approximate String-Mat.. Lecture Notes i 93 
684:228-242
</UI>
<AU>Ukkonen E
</AU>
<TI>Approximate String-Matching over Suffix Trees
</TI>
<SU>Approximate match;
    String match;
    Search tree;
    FI;
    Suffix
</SU>
<AB>4th Annual Symposium, CPM 93. Padova, Italy, June 2-4, 1993. Proceedings.
"The classical approximate string-matching problem ... is considered. We
concentrate on the special case in which [the text] T is available for
preprocessing before the searches with varying [pattern] P and [neighbourhood]
k. It is shown how the searches can be done fast using the suffix tree of T
augmented with the suffix links as the preprocessed form of T and applying
dynamic programming over the tree. Three variations of the search algorithm are
developed ...."
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>1993</PY>
<VO>684</VO>
<PP>228-242</PP>
</SEQ>

<SEQ>
<UI>1171   Vingron,M.    Multiple Sequence Comp.. Lecture Notes i 93 
684:243-253
</UI>
<AU>Vingron M;
    Pevzner PA
</AU>
<TI>Multiple Sequence Comparison and n-Dimensional Image Reconstruction
</TI>
<SU>Sequence comparison;
    Dot;
    Multiple alignment;
    USA
</SU>
<AB>4th Annual Symposium, CPM 93. Padova, Italy, June 2-4, 1993. Proceedings.
"In recent studies the usefulness of dot-matrices for multiple sequence
alignment has been proved. Viewing dot-matrices as projections of unknown
n-dimensional points, we consider the multiple alignment problem (for n sequences)
as an n-dimensional image reconstruction problem with noise. From this
perspective we introduce and develop the filtering method due to Vingron and
Argos (1991). ... An improved version of the original algorithm is introduced
that avoids costly dot-matrix multiplications ...."
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>1993</PY>
<VO>684</VO>
<PP>243-253</PP>
</SEQ>

<SEQ>
<UI>1172   Breslauer,D.  Tight Comparison Bound.. Lecture Notes i 93 
684:11-19
</UI>
<AU>Breslauer D;
    Colussi L;
    Toniolo L
</AU>
<TI>Tight Comparison Bounds for the String Prefix-Matching Problem
</TI>
<SU>String match;
    Prefix;
    Knuth-Morris-Pratt;
    Italy
</SU>
<AB>4th Annual Symposium, CPM 93. Padova, Italy, June 2-4, 1993. Proceedings.
"This is a natural generalization of the string matching problem where only
occurrences of the whole pattern are sought. The Knuth-Morris-Pratt string
matching algorithm can be easily adapted to solve the string prefix-matching
problem without making additional comparisons. In this paper we study the exact
complexity of the string prefix-matching problem in the deterministic sequential
comparison model. Our bounds do not account for comparisons made in a pattern
preprocessing step."
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>1993</PY>
<VO>684</VO>
<PP>11-19</PP>
</SEQ>

<SEQ>
<UI>1173   Iliopoulos,C. Covering a String        Lecture Notes i 93 
684:54-62
</UI>
<AU>Iliopoulos CS;
    Moore DWG;
    Park K
</AU>
<TI>Covering a String
</TI>
<SU>Repetition;
    Regularities;
    AU
</SU>
<AB>4th Annual Symposium, CPM 93. Padova, Italy, June 2-4, 1993. Proceedings.
"We consider the problem of finding the repetitive structures of a given string
x. The period u of the string x grasps the repetitiveness of x, since x is a
prefix of a string constructed by concatenations of u. We generalize the concept
of repetitiveness as follows: A string w covers a string x if there exists a
string constructed by concatenations and superpositions of w of which x is a
substring. A substring w of x is called a seed of x if w covers x. We present an
O(n log n) time algorithm for finding all the seeds of a given string of length n."
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>1993</PY>
<VO>684</VO>
<PP>54-62</PP>
</SEQ>

<SEQ>
<UI>1174   Irving,R.W.   On the Worst-Case Beha.. Lecture Notes i 93 
684:63-73
</UI>
<AU>Irving RW;
    Fraser CB
</AU>
<TI>On the Worst-Case Behaviour of some Approximation Algorithms for the
Shortest Common Supersequence of k Strings
</TI>
<SU>Supersequence;
    Shortest common;
    UK;
    Approximation;
    Algorithm
</SU>
<AB>4th Annual Symposium, CPM 93. Padova, Italy, June 2-4, 1993. Proceedings.
"Two natural polynomial-time approximation algorithms for the shortest common
supersequence (SCS) of k strings are analysed from the point of view of worst-
case performance guarantee. Both algorithms behave badly in the worst case,
whether the underlying alphabet is unbounded or of fixed size."
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>1993</PY>
<VO>684</VO>
<PP>63-73</PP>
</SEQ>

<SEQ>
<UI>1175   Kannan,S.K.   An Algorithm for Locat.. Lecture Notes i 93 
684:74-86
</UI>
<AU>Kannan SK;
    Myers EW
</AU>
<TI>An Algorithm for Locating Non-Overlapping Regions of Maximum Alignment
Score
</TI>
<SU>Sequence alignment;
    Repeat;
    Region;
    USA;
    Score;
    Algorithm
</SU>
<AB>4th Annual Symposium, CPM 93. Padova, Italy, June 2-4, 1993. Proceedings.
"In this paper we present an O(N^2 log^2 N) algorithm for finding the two
non-overlapping substrings of a given string of length N which have the
highest-scoring alignment between them. This significantly improves the
previously best known bound of O(N^3) for the worst-case complexity of this problem."
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>1993</PY>
<VO>684</VO>
<PP>74-86</PP>
</SEQ>

<SEQ>
<UI>1176   Kececioglu,J. Exact and Approximatio.. Lecture Notes i 93 
684:87-105
</UI>
<AU>Kececioglu J;
    Sankoff D
</AU>
<TI>Exact and Approximation Algorithms for the Inversion Distance between Two
Chromosomes
</TI>
<SU>Genome;
    Sequence proximity;
    Chromosome;
    Inversion;
    CA;
    Approximation;
    Distance;
    Reversal;
    Transposition;
    Translocation;
    Algorithm
</SU>
<AB>(To appear in Algorithmica, 1994, as "Exact and Approximation Algorithms
for Sorting by Reversals, with Application to Genome Rearrangements.") 4th
Annual Symposium, CPM 93. Padova, Italy, June 2-4, 1993. Proceedings. "Motivated
by the problem in computational biology of reconstructing the series of
chromosome inversions by which one organism evolved from another, we consider
the problem of computing the shortest series of reversals that transform one
permutation to another. The permutations describe the order of genes on
corresponding chromosomes, and a reversal takes an arbitrary substring of
elements and reverses their order. For this problem we develop two algorithms: a
greedy approximation algorithm ... and a branch and bound exact algorithm that
finds an optimal solution ...."
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>1993</PY>
<VO>684</VO>
<PP>87-105</PP>
</SEQ>

<SEQ>
<UI>1177   Kececioglu,J. The Maximum Weight Tra.. Lecture Notes i 93 
684:106-119
</UI>
<AU>Kececioglu J
</AU>
<TI>The Maximum Weight Trace Problem in Multiple Sequence Alignment
</TI>
<SU>Sequence alignment;
    Multiple alignment;
    USA
</SU>
<AB>4th Annual Symposium, CPM 93. Padova, Italy, June 2-4, 1993. Proceedings.
"We define a new problem in multiple sequence alignment, called maximum weight
trace. The problem formalizes in a natural way the common practice of merging
pairwise alignments to form multiple sequence alignments, and contains a version
of the minimum sum of pairs alignment problem as a special case. ... We develop
a branch and bound algorithm for maximum weight trace. Though the problem is
NP-complete, an implementation of the algorithm shows we can solve instances on as
many as 6 sequences of length 250 in a few minutes."
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>1993</PY>
<VO>684</VO>
<PP>106-119</PP>
</SEQ>

<SEQ>
<UI>1178   Landau,G.M.   An Algorithm for Appro.. Lecture Notes i 93 
684:120-133
</UI>
<AU>Landau GM;
    Schmidt JP
</AU>
<TI>An Algorithm for Approximate Tandem Repeats
</TI>
<SU>Repeat;
    Approximate match;
    USA;
    Algorithm
</SU>
<AB>4th Annual Symposium, CPM 93. Padova, Italy, June 2-4, 1993. Proceedings.
"A perfect tandem repeat within a string S is a substring r = uv for which u =
v. An approximate tandem repeat is a substring r = uv for which u and v are
similar. In this paper we consider two criterions of similarity: the Hamming
distance (k mismatches) and the edit distance (k differences). For a string S of
length n and an integer k our algorithm reports all locally optimal approximate
repeats ...."
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>1993</PY>
<VO>684</VO>
<PP>120-133</PP>
</SEQ>

<SEQ>
<UI>1179   Louchard,G.   Analysis of a String E.. Lecture Notes i 93 
684:152-163
</UI>
<AU>Louchard G;
    Szpankowski W
</AU>
<TI>Analysis of a String Edit Problem in a Probabilistic Framework (Extended
Abstract)
</TI>
<SU>Edit;
    Sequence proximity;
    Probabilistic;
    Belgium
</SU>
<AB>4th Annual Symposium, CPM 93. Padova, Italy, June 2-4, 1993. Proceedings.
"We consider a string edit problem in a probabilistic framework. ... In
particular, we observe that the [edit distance] is asymptotically almost surely
(a.s.) equal to αn where α is a constant and n is the sum of lengths of both
strings. We also obtained some bounds on α in the so called independent model in
which all weights ... are assumed to be independent. More importantly, we show
that the edit distance is well concentrated around its average value. As a by-
product of our results, we also present a precise estimate of the number of
alignments between two strings."
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>1993</PY>
<VO>684</VO>
<PP>152-163</PP>
</SEQ>

<SEQ>
<UI>1180   Muthukrishnan Detecting False Matche.. Lecture Notes i 93 
684:164-178
</UI>
<AU>Muthukrishnan S
</AU>
<TI>Detecting False Matches in String Matching Algorithms
</TI>
<SU>String match;
    Parallel;
    USA;
    Algorithm
</SU>
<AB>4th Annual Symposium, CPM 93. Padova, Italy, June 2-4, 1993. Proceedings.
"Consider a text string of length n, a pattern string of length m and a match
vector of length n which declares each location in the text to be either a match
... or a potential match. ... We investigate the complexity of two problems in
this context, namely, checking if there is any false match, and identifying all
the false matches in the match vector. We present an algorithm on the CRCW PRAM
that checks if there exists any false match in O(1) time using O(n) processors.
Since string matching takes Ω(log log m) time on the CRCW PRAM, checking for
false matches is provably simpler than string matching."
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>1993</PY>
<VO>684</VO>
<PP>164-178</PP>
</SEQ>

<SEQ>
<UI>1181   Szpankowski,W Probabilistic Analysis.. Lecture Notes i 92 644:1-14
</UI>
<AU>Szpankowski W
</AU>
<TI>Probabilistic Analysis of Generalized Suffix Trees (Extended Abstract)
</TI>
<SU>Search tree;
    Data structure;
    String match;
    Probabilistic;
    USA;
    Suffix
</SU>
<AB>Third Annual Symposium, CPM 92. Tucson, Arizona, April 29 - May 1, 1992.
Proceedings. "Suffix trees find several applications in computer science and
telecommunications, most notably in algorithms on strings, data compressions and
codes. We consider in a probabilistic framework a family of generalized suffix
trees - called b-suffix trees - built from the first n suffixes of a random
word. ... Several parameters of b-suffix trees are of interest, namely the
typical depth, the depth of insertion, the height, the external path length, and
so forth. We establish some results concerning typical, that is, almost sure
(a.s.), behavior of these parameters."
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>1992</PY>
<VO>644</VO>
<PP>1-14</PP>
</SEQ>

<SEQ>
<UI>1182   Regnier,M.    A Language Approach to.. Lecture Notes i 92 
644:15-26
</UI>
<AU>Regnier M
</AU>
<TI>A Language Approach to String Searching Evaluation
</TI>
<SU>String search;
    Probabilistic;
    Knuth-Morris-Pratt;
    Boyer-Moore;
    Markov;
    FR;
    Language
</SU>
<AB>Third Annual Symposium, CPM 92. Tucson, Arizona, April 29 - May 1, 1992.
Proceedings. "We propose a general framework to derive average performance of
string searching algorithms that preprocess the pattern. It relies mainly on
languages and combinatorics on words, joined to some probabilistic tools. The
approach is quite powerful: although we concentrate here on Morris-Pratt and
Boyer-Moore-Horspool, it applies to a large class of algorithms. A fairly
general character distribution is assumed, namely a Markovian one, suitable for
applications such as natural languages or biological databases searching."
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>1992</PY>
<VO>644</VO>
<PP>15-26</PP>
</SEQ>

<SEQ>
<UI>1183   Atallah,M.J.  Pattern Matching With .. Lecture Notes i 92 
644:27-40
</UI>
<AU>Atallah MJ;
    Jacquet P;
    Szpankowski W
</AU>
<TI>Pattern Matching With Mismatches: A Probabilistic Analysis and a
Randomized Algorithm (Extended Abstract)
</TI>
<SU>Pattern match;
    Approximate match;
    Probabilistic;
    USA;
    Algorithm
</SU>
<AB>Third Annual Symposium, CPM 92. Tucson, Arizona, April 29 - May 1, 1992.
Proceedings. "Given a text of length n and a pattern of length m over some
(possibly unbounded) alphabet, we consider the problem of finding all positions
in the text at which the pattern 'almost occurs'. Here by 'almost occurs' we
mean that at least some fixed fraction r of the characters of the pattern ...
are equal to their corresponding characters in the text. We design a randomized
algorithm that has O(n log m) worst-case time complexity and computes with high
probability all of the almost-occurrences of the pattern in the text."
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>1992</PY>
<VO>644</VO>
<PP>27-40</PP>
</SEQ>

<SEQ>
<UI>1184   Kim,J.Y.      Fast Multiple Keyword .. Lecture Notes i 92 
644:41-51
</UI>
<AU>Kim JY;
    Shawe-Taylor J
</AU>
<TI>Fast Multiple Keyword Searching
</TI>
<SU>Dictionary match;
    N-gram;
    UK
</SU>
<AB>Third Annual Symposium, CPM 92. Tucson, Arizona, April 29 - May 1, 1992.
Proceedings. "A new multiple keyword searching algorithm is presented as a
generalization of a fast substring matching algorithm based on an n-gram
technique. The expected searching time complexity is shown to be O((N/m + ml)
log lm) under reasonable assumptions about the keywords together with the
assumption that the text is drawn from a stationary ergodic source, where N is
the text size, l the number of keywords and m the smallest keyword size."
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>1992</PY>
<VO>644</VO>
<PP>41-51</PP>
</SEQ>

<SEQ>
<UI>1185   Knight,J.R.   Approximate Regular Ex.. Lecture Notes i 92 
644:67-78
</UI>
<AU>Knight JR;
    Myers EW
</AU>
<TI>Approximate Regular Expression Pattern Matching with Concave Gap Penalties
</TI>
<SU>Pattern match;
    Approximate match;
    Language;
    Gap;
    USA;
    Expression
</SU>
<AB>Third Annual Symposium, CPM 92. Tucson, Arizona, April 29 - May 1, 1992.
Proceedings. "Given a sequence A of length M and a regular expression R of
length P, an approximate regular expression pattern matching algorithm computes
the score of the best alignment between A and one of the sequences exactly
matched by R. There are a variety of schemes for scoring alignments. ... In this
paper we present an O(MP(log M + log² P)) algorithm for approximate regular
expression matching for an arbitrary [function scoring each aligned pair of
symbols] and any concave [gap weighting function]."
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>1992</PY>
<VO>644</VO>
<PP>67-78</PP>
</SEQ>

<SEQ>
<UI>1186   Fischetti,V.A Identifying Periodic O.. Lecture Notes i 92 
644:111-120
</UI>
<AU>Fischetti VA;
    Landau GM;
    Schmidt JP;
    Sellers PH
</AU>
<TI>Identifying Periodic Occurrences of a Template with Applications to
Protein Structure
</TI>
<SU>Regularities;
    Template;
    Match a pattern matrix;
    Structure;
    USA;
    Protein
</SU>
<AB>Third Annual Symposium, CPM 92. Tucson, Arizona, April 29 - May 1, 1992.
Proceedings. "We consider a string matching problem where the pattern is a
template that matches many different strings with various degrees of perfection.
... For a text T of length n, and a template P of length m, we wish to find the
best alignment of T with P^n, which is the concatenation of n copies of P, (m
will typically be much smaller than n). ... We show that the structure of P^n can
be exploited and the problem reduced to essentially solving a dynamic
programming of size O(mn)."
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>1992</PY>
<VO>644</VO>
<PP>111-120</PP>
</SEQ>

<SEQ>
<UI>1187   Sankoff,D.    Edit Distance for Geno.. Lecture Notes i 92 
644:121-135
</UI>
<AU>Sankoff D
</AU>
<TI>Edit Distance for Genome Comparison Based on Non-Local Operations
</TI>
<SU>Genome;
    Edit;
    Distance;
    Rearrangement;
    CA
</SU>
<AB>Third Annual Symposium, CPM 92. Tucson, Arizona, April 29 - May 1, 1992.
Proceedings. Motivated by "the feasibility of evolutionary inference based on
the macrostructure of entire genomes, rather than on the traditional comparison
of homologous versions of a single gene in different organisms. In this paper,
we define a number of measures of gene order rearrangement, describe algorithm
design and software development for the calculation of some of these quantities
in single-chromosome genomes, and report on the results of applying these tools
to a database of mitochondrial gene orders inferred from genomic sequences."
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>644</PY>
<VO>644</VO>
<PP>121-135</PP>
</SEQ>

<SEQ>
<UI>1188   Chang,W.I.    Theoretical and Empiri.. Lecture Notes i 92 
644:175-184
</UI>
<AU>Chang WI;
    Lampe J
</AU>
<TI>Theoretical and Empirical Comparisons of Approximate String Matching
Algorithms
</TI>
<SU>String match;
    Approximate match;
    Match with k differences;
    Probabilistic;
    USA;
    Algorithm
</SU>
<AB>Third Annual Symposium, CPM 92. Tucson, Arizona, April 29 - May 1, 1992.
Proceedings. "We study in depth a model of non-exact pattern matching based on
edit distance .... More precisely, the k differences approximate string matching
problem specifies .... We have carefully implemented and analyzed various O(kn)
algorithms based on dynamic programming (DP).... A new algorithm is presented
that computes much fewer entries of the DP table. ... We give a probabilistic
analysis of the DP table in order to prove that the expected running time of our
algorithm ... is O(kn) for random text."
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>1992</PY>
<VO>644</VO>
<PP>175-184</PP>
</SEQ>

<SEQ>
<UI>1189   Pevzner,P.A.  Multiple Alignment wit.. Lecture Notes i 92 
644:205-213
</UI>
<AU>Pevzner PA
</AU>
<TI>Multiple Alignment with Guaranteed Error Bounds and Communication Cost
</TI>
<SU>Multiple alignment;
    Error;
    Graph;
    USA
</SU>
<AB>Third Annual Symposium, CPM 92. Tucson, Arizona, April 29 - May 1, 1992.
Proceedings. "Dynamic programming for optimal multiple alignment requires too
much time to be practical. Although many algorithms for suboptimal alignment
have been suggested, no 'performance guarantees' have been known until recently.
We give an approximation multiple alignment algorithm with guaranteed error
bounds equal to the normalized communication cost of a corresponding graph."
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>1992</PY>
<VO>644</VO>
<PP>205-213</PP>
</SEQ>

<SEQ>
<UI>1190   Hui,L.C.K.    Color Set Size Problem.. Lecture Notes i 92 
644:230-243
</UI>
<AU>Hui LCK
</AU>
<TI>Color Set Size Problem with Applications to String Matching
</TI>
<SU>String match;
    Longest common;
    Multiple comparison;
    USA
</SU>
<AB>Third Annual Symposium, CPM 92. Tucson, Arizona, April 29 - May 1, 1992.
Proceedings. "This paper gives an optimal sequential solution of the color set
size problem and string matching applications including a linear time algorithm
for the problem of finding the longest substring common to at least k out of m
input strings for all k between 1 and m. In addition, parallel solutions to the
above problems are given. These solutions may shed light on problems in
computational biology, such as the multiple string alignment problem."
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>1992</PY>
<VO>644</VO>
<PP>230-243</PP>
</SEQ>

<SEQ>
<UI>1191   Mehta,D.P.    Computing Display Conf.. Lecture Notes i 92 
644:244-261
</UI>
<AU>Mehta DP;
    Sahni S
</AU>
<TI>Computing Display Conflicts in String and Circular String Visualization
</TI>
<SU>Sequence analysis;
    Display;
    Graph;
    Data structure;
    USA
</SU>
<AB>Third Annual Symposium, CPM 92. Tucson, Arizona, April 29 - May 1, 1992.
Proceedings. "We have proposed a model for the visualization of strings and
circular strings, where we introduced the problem of display conflicts. In this
paper, we provide efficient algorithms for computing display conflicts in linear
strings. These algorithms make use of the scdawg data structure for linear
strings. We also extend the scdawg data structure to represent circular strings.
The resulting data structure may now be employed to compute display conflicts in
circular strings by using the algorithms for computing conflicts in linear
strings."
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>1992</PY>
<VO>644</VO>
<PP>244-261</PP>
</SEQ>

<SEQ>
<UI>1192   Amir,A.       Efficient Randomized D.. Lecture Notes i 92 
644:262-275
</UI>
<AU>Amir A;
    Farach M;
    Matias Y
</AU>
<TI>Efficient Randomized Dictionary Matching Algorithms (Extended Abstract)
</TI>
<SU>Dictionary match;
    USA;
    Algorithm
</SU>
<AB>Third Annual Symposium, CPM 92. Tucson, Arizona, April 29 - May 1, 1992.
Proceedings. "In string matching, randomized algorithms have primarily made use
of randomized hashing functions which convert strings into 'signatures' or
'finger prints'. We explore the use of finger prints in conjunction with other
randomized and deterministic techniques and data structures. We present several
new algorithms for dictionary matching, along with parallel algorithms which are
simpler or more efficient than previously known algorithms."
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>1992</PY>
<VO>644</VO>
<PP>262-275</PP>
</SEQ>

<SEQ>
<UI>1193   Idury,R.M.    Dynamic Dictionary Mat.. Lecture Notes i 92 
644:276-287
</UI>
<AU>Idury RM;
    Schaffer AA
</AU>
<TI>Dynamic Dictionary Matching with Failure Functions
</TI>
<SU>Dictionary match;
    USA;
    Function;
    Dynamic
</SU>
<AB>Third Annual Symposium, CPM 92. Tucson, Arizona, April 29 - May 1, 1992.
Proceedings. "Amir, Farach, Galil, Giancarlo, and Park used an automaton based
on suffix trees to solve the dynamic [dictionary matching] problem. We show how
to match their time bounds for update and search using a failure function
framework, similar to that used by Aho and Corasick to solve the static
dictionary matching problem. We then show that our approach allows us to achieve
faster search times at the expense of the update times. Finally, we show how to
speed up the initial dictionary construction."
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>1992</PY>
<VO>644</VO>
<PP>276-287</PP>
</SEQ>

<SEQ>
<UI>1194   Libertini,G.  "Reconstruction of Anc.. J.Mol.Evol.     94 
39:219-229
</UI>
<AU>Libertini G;
    Di Donato A
</AU>
<TI>Reconstruction of Ancestral Sequences by the Inferential Method, a Tool
for Protein Engineering Studies
</TI>
<SU>Phylogeny;
    Evolutionary tree;
    Italy;
    Protein
</SU>
<AB>"This paper describes the inferential method, an approach for
reconstructing protein and nucleotide sequences of ancestral species, starting
from known, homologous, contemporary sequences. The method requires knowledge of
the topology of the phylogenetic tree, whose nodes are the species to whom the
reconstructed sequences belong. The method has been tested by computer
simulation of speciation and nucleotide substitutions, starting from a single
ancestral sequence, and by subsequent reconstruction of nodal sequences."
</AB>
<JT>J Mol Evol</JT>
<PY>1994</PY>
<VO>39</VO>
<PP>219-229</PP>
</SEQ>

<SEQ>
<UI>1195   Huelsenbeck,J Success of Phylogeneti.. Syst.Biol.      93 
42(3):247-264
</UI>
<AU>Huelsenbeck JP;
    Hillis DM
</AU>
<TI>Success of Phylogenetic Methods in the Four-Taxon Case
</TI>
<SU>Phylogeny;
    USA;
    Phylogenetic
</SU>
<AB>"The success of 16 methods of phylogenetic inference was examined using
consistency and simulation analysis. Success - the frequency with which a tree-
making method correctly identified the true phylogeny - was examined for an
unrooted four-taxon tree. In this study, tree-making methods were examined under
a large number of branch-length conditions and under three models of sequence
evolution. The results are plotted to facilitate comparisons among the methods.
The consistency analysis indicated which methods converge on the correct tree
given infinite sample size."
</AB>
<JT>Syst Biol</JT>
<PY>1993</PY>
<VO>42</VO>
<NO>3</NO>
<PP>247-264</PP>
</SEQ>

<SEQ>
<UI>1196   Miyata,T.     Two Types of Amino Aci.. J.Mol.Evol.     79 
12:219-236
</UI>
<AU>Miyata T;
    Miyazawa S;
    Yasunaga T
</AU>
<TI>Two Types of Amino Acid Substitutions in Protein Evolution
</TI>
<SU>Substitution;
    Amino acid;
    Protein;
    Evolution;
    JP
</SU>
<AB>"The frequency of amino acid substitutions, relative to the frequency
expected by chance, decreases linearly with the increase in physico-chemical
differences between amino acid pairs involved in a substitution. This
correlation does not apply to abnormal human hemoglobins. Since abnormal
hemoglobins mostly reflect the process of mutation rather than selection, the
correlation manifest during protein evolution between substitution frequency and
physico-chemical difference in amino acids can be attributed to natural
selection. ... From this analysis, we can show that there exists another type of
substitution which depends less on the extent of physico-chemical properties of
substituted amino acids."
</AB>
<JT>J Mol Evol</JT>
<PY>1979</PY>
<VO>12</VO>
<PP>219-236</PP>
</SEQ>

<SEQ>
<UI>1197   Kishino,H.    Evaluation of the Maxi.. J.Mol.Evol.     89 
29:170-179
</UI>
<AU>Kishino H;
    Hasegawa M
</AU>
<TI>Evaluation of the Maximum Likelihood Estimate of the Evolutionary Tree
Topologies from DNA Sequence Data, and the Branching Order in Hominoidea
</TI>
<SU>Phylogeny;
    Evolutionary tree;
    Likelihood;
    JP;
    Robustness;
    Analytical;
    DNA;
    Topology
</SU>
<AB>"In evaluating the extent to which the maximum likelihood tree is a
significantly better representation of the true tree, it is important to
estimate the variance of the difference between log likelihood of different tree
topologies. Bootstrap resampling can be used for this purpose ... but it imposes
a great computation burden. To overcome this difficulty, we developed a new
method for estimating the variance by expressing it directly."
</AB>
<JT>J Mol Evol</JT>
<PY>1989</PY>
<VO>29</VO>
<PP>170-179</PP>
</SEQ>

<SEQ>
<UI>1198   Saitou,N.     Relative Efficiencies .. Mol.Biol.Evol.  89 
6(5):514-525
</UI>
<AU>Saitou N;
    Imanishi T
</AU>
<TI>Relative Efficiencies of the Fitch-Margoliash, Maximum-Parsimony,
Maximum-Likelihood, Minimum-Evolution, and Neighbor-Joining Methods of
Phylogenetic Tree Construction in Obtaining the Correct Tree
</TI>
<SU>Phylogeny;
    Evolutionary tree;
    Clustering;
    Distance;
    JP;
    Minimum evolution;
    Phylogenetic;
    Neighbor joining
</SU>
<AB>"The relative efficiencies of several tree-making methods for obtaining
the correct phylogenetic tree were studied by using computer simulation. ... If
one considers the computational time involved, the [neighbor-joining] method
seems to be a method of choice."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1989</PY>
<VO>6</VO>
<NO>5</NO>
<PP>514-525</PP>
</SEQ>

<SEQ>
<UI>1199   Sourdis,J.    Relative Efficiencies .. Mol.Biol.Evol.  88 
5(3):298-311
</UI>
<AU>Sourdis J;
    Nei M
</AU>
<TI>Relative Efficiencies of the Maximum Parsimony and Distance-Matrix Methods
in Obtaining the Correct Phylogenetic Tree
</TI>
<SU>Phylogeny;
    Evolutionary tree;
    Distance;
    Parsimony;
    USA
</SU>
<AB>"The relative efficiencies of the maximum parsimony (MP) and distance-
matrix methods in obtaining the correct tree (topology) were studied by using
computer simulation. The distance-matrix methods examined are the
neighbor-joining, distance-Wagner, Tateno et al. modified Farris, Faith, and Li
methods."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1988</PY>
<VO>5</VO>
<NO>3</NO>
<PP>298-311</PP>
</SEQ>

<SEQ>
<UI>1200   Tajima,F.     Estimation of Evolutio.. Mol.Biol.Evol.  94 
11(2):278-286
</UI>
<AU>Tajima F;
    Takezaki N
</AU>
<TI>Estimation of Evolutionary Distance for Reconstructing Molecular
Phylogenetic Trees
</TI>
<SU>Phylogeny;
    Evolutionary distance;
    Evolutionary tree;
    JP;
    Distance;
    Phylogenetic;
    Estimation
</SU>
<AB>"The most commonly used measure of evolutionary distance in molecular
phylogenetics is the number of nucleotide substitutions per site. However, this
number is not necessarily most efficient for reconstructing a phylogenetic tree.
In order to evaluate the accuracy of evolutionary distance for obtaining the
correct tree topology, an accuracy index, A(t), was proposed. ... Using A(t),
namely, finding the condition under which A(t) gives the maximum value, we can
obtain an evolutionary distance which is efficient for obtaining the correct
topology."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1994</PY>
<VO>11</VO>
<NO>2</NO>
<PP>278-286</PP>
</SEQ>

<SEQ>
<UI>1201   Tateno,Y.     Relative Efficiencies .. Mol.Biol.Evol.  94 
11(2):261-277
</UI>
<AU>Tateno Y;
    Takezaki N;
    Nei M
</AU>
<TI>Relative Efficiencies of the Maximum-Likelihood, Neighbor-Joining, and
Maximum-Parsimony Methods When Substitution Rate Varies with Site
</TI>
<SU>Phylogeny;
    Substitution;
    Joining;
    Likelihood;
    Parsimony;
    USA;
    Rate;
    Neighbor joining
</SU>
<AB>"The relative efficiencies of the maximum-likelihood (ML), neighbor-
joining (NJ), and maximum-parsimony (MP) methods in obtaining the correct
topology and in estimating the branch lengths for the case of four DNA sequences
were studied by computer simulation, under the assumption either that there is
variation in substitution rate among different nucleotide sites or that there is
no variation."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1994</PY>
<VO>11</VO>
<NO>2</NO>
<PP>261-277</PP>
</SEQ>

<SEQ>
<UI>1202   Hendy,M.D.    Spectral Analysis of P.. J.Classif.      93 10:5-24
</UI>
<AU>Hendy MD;
    Penny D
</AU>
<TI>Spectral Analysis of Phylogenetic Data
</TI>
<SU>Phylogeny;
    Evolutionary tree;
    NZ;
    Spectral analysis;
    Phylogenetic
</SU>
<AB>"The spectral analysis of sequence and distance data is a new approach to
phylogenetic analysis. ... We develop an optimality selection procedure using a
least squares best fit, to find the phylogenetic tree whose tree spectrum most
closely matches the conjugate spectrum. An inferred sequence spectrum can be
derived from the selected tree spectrum using the inverse Hadamard conjugation
to allow a comparison with the original sequence spectrum."
</AB>
<JT>J Classif</JT>
<PY>1993</PY>
<VO>10</VO>
<PP>5-24</PP>
</SEQ>

<SEQ>
<UI>1203   Cavender,J.A. Invariants of Phylogen.. J.Classif.      87 4:57-71
</UI>
<AU>Cavender JA;
    Felsenstein J
</AU>
<TI>Invariants of Phylogenies in a Simple Case with Discrete States
</TI>
<SU>Phylogeny;
    Invariant;
    Statistical;
    USA
</SU>
<AB>"Under a simple model of transition between two states, we can work out
the probabilities of different data outcomes in four species with any given
phylogeny. For a given tree topology, if all characters are evolving under the
same probabilistic model, there are two quadratic forms in the frequencies of
outcomes that must be zero. It may be possible to test the null hypothesis that
the tree is of a particular topology by testing whether these quadratic forms
are zero. One of the tests is a test for independence in a simple 2 x 2
contingency table."
</AB>
<JT>J Classif</JT>
<PY>1987</PY>
<VO>4</VO>
<PP>57-71</PP>
</SEQ>

<SEQ>
<UI>1204   Sankoff,D.    Designer Invariants fo.. Mol.Biol.Evol.  90 
7(3):255-269
</UI>
<AU>Sankoff D
</AU>
<TI>Designer Invariants for Large Phylogenies
</TI>
<SU>Phylogeny;
    Invariant;
    Markov;
    CA
</SU>
<AB>"The Cavender-Felsenstein edge-length invariants for binary characters on
4-trees provide the starting point for the development of 'customized'
invariants for evaluating and comparing phylogenetic hypotheses. The binary
character invariants may be generalized to k-valued characters without losing
the quadratic nature of the invariants .... The key to the approach is that
certain sets of these configurations constitute events which are
probabilistically independent from other such sets, under the symmetric Markov
change models studied. By introducing more complex sets of configurations, we
find the quadratic invariants for 5-trees in the binary model ...."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1990</PY>
<VO>7</VO>
<NO>3</NO>
<PP>255-269</PP>
</SEQ>

<SEQ>
<UI>1205   Penny,D.      Trees from Sequences: .. Austral.Syst.Bo 90 3(10 
Aug.):21-
</UI>
<AU>Penny D;
    Hendy MD;
    Zimmer EA;
    Hamby RK
</AU>
<TI>Trees from Sequences: Panacea or Pandora's Box?
</TI>
<SU>Phylogeny;
    Reliability;
    Robustness;
    Consistency;
    NZ
</SU>
<AB>"There are however still many problems estimating the reliability of the
results of tree reconstruction. These are discussed, with examples, under the
three headings of sampling error, methodological problems, and human errors. The
methodological problems are the hardest to solve. They include the large number
of trees, incomplete use of information, inconsistency (converging to an incorrect
tree), problems derived from unknown selection pressures on sequences, and trees
being an inappropriate model. To overcome these problems, a good method for
reconstructing trees should have the properties of being fast, efficient,
consistent, robust and falsifiable."
</AB>
<JT>Austral Syst Bot</JT>
<PY>1990</PY>
<VO>3</VO>
<NO>10 Aug.</NO>
<PP>21-38</PP>
</SEQ>

<SEQ>
<UI>1206   Hendy,M.D.    The Relationship Betwe.. Syst.Zool.      89 
38:310-321
</UI>
<AU>Hendy MD
</AU>
<TI>The Relationship Between Simple Evolutionary Tree Models and Observable
Sequence Data
</TI>
<SU>Phylogeny;
    Evolutionary tree;
    NZ;
    Model
</SU>
<AB>"Cavender (1978) introduced a model of an evolutionary branching process
on a sequence of characters, where the characters take either of two states with
symmetric probabilities of change between them. From this model ... we show how
to derive some properties of the resulting sequences and distance measures
between pairs of taxa. These can be used to test the effectiveness of current
algorithms for recovering [a given evolutionary tree], such as parsimony or
distance methods. The relationships are described in terms of two matrices of
exponential order."
</AB>
<JT>Syst Zool</JT>
<PY>1989</PY>
<VO>38</VO>
<PP>310-321</PP>
</SEQ>

<SEQ>
<UI>1207   Penny,D.      Reliability of Evoluti.. Cold Spring Har 87 
52:857-862
</UI>
<AU>Penny D;
    Hendy MD;
    Henderson IM
</AU>
<TI>Reliability of Evolutionary Trees
</TI>
<SU>Phylogeny;
    Evolutionary tree;
    NZ;
    Reliability
</SU>
<AB>"We describe a simple method for maximum likelihood for 2-state
characters, using Hadamard matrices. Because these matrices are easily inverted,
we can, for a given tree, calculate rates of evolution directly from the data.
The method has allowed us to compare maximum likelihood, minimal length, and
distance methods for reconstructing evolutionary trees. ... We have recently
described ... a new likelihood method that seems to be particularly suitable for
the long sequences of ribosomes. These sequences are sufficiently long to test
for convergence to a single tree and to allow estimates of deviations from a
simple model."
</AB>
<JT>Cold Spring Harbor Sympos Quant Biol</JT>
<PY>1987</PY>
<VO>52</VO>
<PP>857-862</PP>
</SEQ>

<SEQ>
<UI>1208   Hendy,M.D.    A Framework for the Qu.. Syst.Zool.      89 
38(4):297-309
</UI>
<AU>Hendy MD;
    Penny D
</AU>
<TI>A Framework for the Quantitative Study of Evolutionary Trees
</TI>
<SU>Phylogeny;
    Evolutionary tree;
    NZ
</SU>
<AB>"A direct method for calculating expected data from an evolutionary model
for two state characters is described. ... These relationships have been used to
analyse the behaviour of tree building algorithms under conditions when there
are sufficient data. ... With equal rates of evolution ... we show that for n=4
taxa, parsimony will always converge to the correct tree, but we give examples
with n=5 where parsimony will converge on an incorrect tree, even for equal
rates of evolution. A further example with n=6 shows convergence to an incorrect
tree with equal but arbitrarily small rates of change."
</AB>
<JT>Syst Zool</JT>
<PY>1989</PY>
<VO>38</VO>
<NO>4</NO>
<PP>297-309</PP>
</SEQ>

<SEQ>
<UI>1209   Penny,D.      Testing the Theory of .. Nature (Lond.)  82 297(20 
May):19
</UI>
<AU>Penny D;
    Foulds LR;
    Hendy MD
</AU>
<TI>Testing the Theory of Evolution by Comparing Phylogenetic Trees
Constructed from five Different Protein Sequences
</TI>
<SU>Evolutionary tree;
    NZ;
    Evolution;
    Protein;
    Phylogenetic
</SU>
<AB>"The theory of evolution predicts that similar phylogenetic trees should
be obtained from different sets of character data. We have tested this
prediction using sequence data for 5 proteins from 11 species. Our results are
consistent with the theory of evolution. ... The general conclusions from the
present work are that (1) it is possible to make falsifiable predictions from
the hypothesis that species have been linked in the past by an evolutionary tree
and (2) there is strong support from these five sequences for the theory of
evolution."
</AB>
<JT>Nature (Lond)</JT>
<PY>1982</PY>
<VO>297</VO>
<NO>20 May</NO>
<PP>197-200</PP>
</SEQ>

<SEQ>
<UI>1210   Cavender,J.A. Mechanized Derivation .. Mol.Biol.Evol.  89 
6(3):301-316
</UI>
<AU>Cavender JA
</AU>
<TI>Mechanized Derivation of Linear Invariants
</TI>
<SU>Phylogeny;
    Invariant;
    Markov;
    USA
</SU>
<AB>"Linear invariants, discovered by Lake, promise to provide a versatile way
of inferring phylogenies on the basis of nucleic acid sequences .... A semigroup
of Markov transition matrices embodies the assumptions underlying the method,
and alternative semigroups exist. The set of all linear invariants may be
derived from the semigroup by using an algorithm described here. Under
assumptions no stronger than Lake's, there are &gt;50 independent linear invariants
for each of the 15 rooted trees linking four species."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1989</PY>
<VO>6</VO>
<NO>3</NO>
<PP>301-316</PP>
</SEQ>

<SEQ>
<UI>1211   Cavender,J.A. Taxonomy with Confidence Math.Biosci.    78 
40:271-280
</UI>
<AU>Cavender JA
</AU>
<TI>Taxonomy with Confidence
</TI>
<SU>Phylogeny;
    Evolutionary tree;
    USA;
    Confidence
</SU>
<AB>"There are essentially three ways in which four species may be related in
a phylogenetic tree graph. It is usual to compute for each of these three
possibilities the smallest number of mutations that could have brought about the
observed distribution of characteristics among the four species. The graph that
minimizes this number is then preferred. In fact, the hypothesis that the graph
chosen in this way is correct may be accepted with confidence if the minimum is
strong in a sense described here. In principle, the theory could be extended to
treat sets of more than four species." See the erratum on page 309.
</AB>
<JT>Math Biosci</JT>
<PY>1978</PY>
<VO>40</VO>
<PP>271-280</PP>
</SEQ>

<SEQ>
<UI>1212   Cavender,J.A. Tests of Phylogenetic .. Math.Biosci.    81 
54:217-229
</UI>
<AU>Cavender JA
</AU>
<TI>Tests of Phylogenetic Hypotheses under Generalized Models
</TI>
<SU>Phylogeny;
    Evolutionary tree;
    Statistical;
    USA;
    Model;
    Phylogenetic
</SU>
<AB>"Prospective topologies of phylogenetic trees can be tested as hypotheses
using statistics based on parsimony. If the unknown branch lengths of the trees
are different for different characters, the method still works. When the
transition probabilities between the states of characters are unequal in a known
or unknown degree, the method still works. Hybridization or horizontal gene
transfer in the history of a group can never be rejected; whether it can be
confidently detected is problematical. Only four species are treated here and
only binary characters."
</AB>
<JT>Math Biosci</JT>
<PY>54</PY>
<VO>54</VO>
<PP>217-229</PP>
</SEQ>

<SEQ>
<UI>1213   Moore,G.W.    A Method for Construct.. J.Theor.Biol.   73 
38:459-485
</UI>
<AU>Moore GW;
    Barnabas J;
    Goodman M
</AU>
<TI>A Method for Constructing Maximum Parsimony Ancestral Amino Acid
Sequences on a Given Network
</TI>
<SU>Phylogeny;
    Evolutionary tree;
    USA;
    Parsimony;
    Amino acid;
    Network
</SU>
<AB>"A solution is presented for the problem of how to find ancestral codons
which minimize the number of mutations over a given network of species for which
character-states of aligned amino acid sequences among the contemporary species
are known. Three theorems which allow this 'maximum parsimony' problem to be
solved are proved; then the use of these theorems in finding maximum parsimony
ancestral codons is illustrated on a network of chicken and mammalian alpha
globin amino acid sequences at two alignment positions."
</AB>
<JT>J Theor Biol</JT>
<PY>1973</PY>
<VO>38</VO>
<PP>459-485</PP>
</SEQ>

<SEQ>
<UI>1214   Farris,J.S.   A Probability Model fo.. Syst.Zool.      73 
22:250-256
</UI>
<AU>Farris JS
</AU>
<TI>A Probability Model for Inferring Evolutionary Trees
</TI>
<SU>Phylogeny;
    Evolutionary tree;
    Statistical;
    Stochastic;
    USA;
    Probability;
    Model
</SU>
<AB>"Estimation of evolutionary trees should be treated as a problem in
statistical inference, but such treatment requires the explicit formulation of a
stochastic model of the evolutionary process. Because an evolutionary inference
procedure is likely to be put to such uses as deciding the issue of whether
rates of evolution are homogeneous, the stochastic model underlying the
inference procedure should not assume homogeneity over time of the evolutionary
process .... Such a model is constructed, and it is shown that most parsimonious
trees are maximum-likelihood estimated evolutionary trees under the stochastic
model."
</AB>
<JT>Syst Zool</JT>
<PY>1973</PY>
<VO>22</VO>
<PP>250-256</PP>
</SEQ>

<SEQ>
<UI>1215   Zuckerkandl,E Molecules as Documents.. J.Theor.Biol.   65 
8:357-366
</UI>
<AU>Zuckerkandl E;
    Pauling L
</AU>
<TI>Molecules as Documents of Evolutionary History
</TI>
<SU>Phylogeny;
    USA
</SU>
<AB>"Different types of molecules are discussed in relation to their fitness
for providing the basis for a molecular phylogeny. Best fit are the
'semantides', i.e. the different types of macromolecules that carry the genetic
information or a very extensive translation thereof. The fact that more than one
coding triplet may code for a given amino acid residue in a polypeptide leads to
the notion of 'isosemantic substitutions' in genic and messenger
polynucleotides. Such substitutions lead to differences in nucleotide sequence
that are not expressed by differences in amino acid sequence. Some possible
consequences of isosemanticism are discussed."
</AB>
<JT>J Theor Biol</JT>
<PY>1965</PY>
<VO>8</VO>
<PP>357-366</PP>
</SEQ>

<SEQ>
<UI>1216   Felsenstein,J Maximum Likelihood and.. Syst.Zool.      73 
22:240-249
</UI>
<AU>Felsenstein J
</AU>
<TI>Maximum Likelihood and Minimum-Steps Methods for Estimating Evolutionary
Trees from Data on Discrete Characters
</TI>
<SU>Phylogeny;
    Likelihood;
    Evolutionary tree;
    Statistical;
    USA
</SU>
<AB>"The general maximum likelihood approach to the statistical estimation of
phylogenies is outlined, for data in which there are a number of discrete states
for each character. The details of the maximum likelihood method will depend on
the details of the probabilistic model of evolution assumed. There are a very
large number of possible models of evolution. For a few of the simpler models,
the calculation of the likelihood of an evolutionary tree is outlined. For these
models, the maximum likelihood tree will be the same as the 'most parsimonious'
tree if the probability of change during the evolution of the group is assumed a
priori to be very small."
</AB>
<JT>Syst Zool</JT>
<PY>1973</PY>
<VO>22</VO>
<PP>240-249</PP>
</SEQ>

<SEQ>
<UI>1217   Felsenstein,J Cases in which Parsimo.. Syst.Zool.      78 
27:401-410
</UI>
<AU>Felsenstein J
</AU>
<TI>Cases in which Parsimony or Compatibility Methods will be Positively
Misleading
</TI>
<SU>Phylogeny;
    Evolutionary tree;
    Likelihood;
    USA;
    Parsimony;
    Compatibility
</SU>
<AB>Republished as Felsenstein (1984). "For some simple three- and four-species
cases involving a character with two states, it is determined under what
conditions several methods of phylogenetic inference will fail to converge to
the true phylogeny as more and more data are accumulated. The methods are the
Camin-Sokal parsimony method, the compatibility method, and Farris's unrooted
Wagner tree parsimony method. In all cases the conditions for this failure
(which is the failure to be statistically consistent) are essentially that
parallel changes exceed informative, nonparallel changes."
</AB>
<JT>Syst Zool</JT>
<PY>1978</PY>
<VO>27</VO>
<PP>401-410</PP>
</SEQ>

<SEQ>
<UI>1218   Felsenstein,J A Likelihood Approach .. Biol.J.Linn.Soc 81 
16:183-196
</UI>
<AU>Felsenstein J
</AU>
<TI>A Likelihood Approach to Character Weighting and What It Tells Us about
Parsimony and Compatibility
</TI>
<SU>Character weight;
    Statistical;
    Likelihood;
    Phylogeny;
    USA;
    Parsimony;
    Compatibility
</SU>
<AB>"The statistical framework of maximum likelihood estimation is used to
examine character weighting in inferring phylogenies. A simple probabilistic
model of evolution is used, in which each character evolves independently among
two states, and different lineages evolve independently. When different
characters have different known probabilities of change, all sufficiently small,
the proper maximum likelihood method of estimating phylogenies is a weighted
parsimony method in which the weights are logarithmically related to the rates
of change. When rates of change are taken extremely small, the weights become
more equal and unweighted parsimony methods are obtained."
</AB>
<JT>Biol J Linn Soc</JT>
<PY>1981</PY>
<VO>16</VO>
<PP>183-196</PP>
</SEQ>

<SEQ>
<UI>1219   Felsenstein,J Parsimony in Systemati.. Annu.Rev.Ecol.S 83 
14:313-333
</UI>
<AU>Felsenstein J
</AU>
<TI>Parsimony in Systematics: Biological and Statistical Issues
</TI>
<SU>Phylogeny;
    Statistical;
    USA;
    Parsimony;
    Systematics
</SU>
<AB>"Quite recently 'parsimony' has become the favored method for inferring
phylogenies (evolutionary trees). The accompanying article by Elliott Sober
discusses the philosophical issues relating to the status of parsimony from a
somewhat different perspective than that adopted here. This review will discuss
parsimony, its origins, its major variants, and its biological assumptions."
</AB>
<JT>Annu Rev Ecol Syst</JT>
<PY>1983</PY>
<VO>14</VO>
<PP>313-333</PP>
</SEQ>

<SEQ>
<UI>1220   Swofford,D.L. When are Phylogeny Est.. Phylogenetic .. 91Oxford 
Universi
</UI>
<AU>Swofford DL
</AU>
<TI>When are Phylogeny Estimates from Molecular and Morphological Data
Incongruent?
</TI>
<ED>Miyamoto MM
    Cracraft J
</ED>
<BK>Phylogenetic Analysis of DNA Sequences
</BK>
<SU>Phylogeny;
    Significance;
    USA
</SU>
<AB>"To the extent that character sets 'tell the truth' about their past,
phylogenies inferred from different character sets should be congruent with the
true tree and therefore with each other. ... In practice, however, the ideal of
perfect congruence is frequently not achieved. ... This chapter has two
purposes. First, in keeping with the general theme of this volume, I will review
several methods currently being used to assess levels of congruence. Second, I
will suggest some additional procedures, facilitated by recent improvements in
computer software, that allow a more comprehensive examination of the question
posed in the title."
</AB>
<PU>Oxford University Press </PU>
<PL>New York </PL>
<PY>1991</PY>
<PP>295-333</PP>
</SEQ>

<SEQ>
<UI>1221   Navidi,W.C.   The Effect of Unequal .. Mol.Biol.Evol.  92 
9(6):1163-1175
</UI>
<AU>Navidi WC;
    Beckett-Lemus L
</AU>
<TI>The Effect of Unequal Transversion Rates on the Accuracy of Evolutionary
Parsimony
</TI>
<SU>Phylogeny;
    Parsimony;
    USA;
    Rate;
    Transversion;
    Accuracy
</SU>
<AB>"Evolutionary parsimony is an easy-to-use method of phylogenetic inference
that is based on nucleic acid sequences and that does not require the assumption
that evolutionary processes in the various sites on the molecule are identical.
It does, however, require a parameter constraint, known as the 'balanced
transversion' assumption. We show that the accuracy of the procedure is fairly
insensitive to moderate violations of this assumption - and that the procedure
thus is applicable under more general conditions than previously thought."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1992</PY>
<VO>9</VO>
<NO>6</NO>
<PP>1163-1175</PP>
</SEQ>

<SEQ>
<UI>1222   Miyata,T.     Molecular Evolution of.. J.Mol.Evol.     80 16:23-36
</UI>
<AU>Miyata T;
    Yasunaga T
</AU>
<TI>Molecular Evolution of mRNA: A Method for Estimating Evolutionary Rates of
Synonymous and Amino Acid Substitutions from Homologous Nucleotide Sequences and
Its Application
</TI>
<SU>Substitution;
    JP;
    Evolution;
    Evolutionary rate;
    Synonymous;
    Amino acid;
    Rate;
    Nucleotide
</SU>
<AB>"A method for estimating the evolutionary rates of synonymous and amino
acid substitutions from homologous nucleotide sequences is presented."
</AB>
<JT>J Mol Evol</JT>
<PY>1980</PY>
<VO>16</VO>
<PP>23-36</PP>
</SEQ>

<SEQ>
<UI>1223   Lanave,C.     A New Method for Calcu.. J.Mol.Evol.     84 20:86-93
</UI>
<AU>Lanave C;
    Preparata G;
    Saccone C;
    Serio G
</AU>
<TI>A New Method for Calculating Evolutionary Substitution Rates
</TI>
<SU>Substitution;
    Stochastic;
    Markov;
    Italy;
    Rate
</SU>
<AB>"In this paper we present a new method for analysing molecular evolution
in homologous genes based on a general stationary Markov process. The elaborate
statistical analysis necessary to apply the method effectively has been
performed using Monte Carlo techniques. We have applied our method to the silent
third position of the codon of the five mitochondrial genes coding for
identified proteins of four mammalian species (rat, mouse, cow and man). We
found that the method applies satisfactorily to the three former species, while
the last appears to be outside the scope of the present approach."
</AB>
<JT>J Mol Evol</JT>
<PY>1984</PY>
<VO>20</VO>
<PP>86-93</PP>
</SEQ>

<SEQ>
<UI>1224   Jin,L.        Limitations of the Evo.. Mol.Biol.Evol.  90 
7(1):82-102
</UI>
<AU>Jin L;
    Nei M
</AU>
<TI>Limitations of the Evolutionary Parsimony Method of Phylogenetic Analysis
</TI>
<SU>Phylogeny;
    Parsimony;
    Invariant;
    USA;
    Phylogenetic
</SU>
<AB>"Lake's evolutionary parsimony (EP) method of constructing a phylogenetic
tree is primarily applied to four DNA sequences. ... However, Lake's method
depends on a number of unrealistic assumptions. We therefore examined the
theoretical basis of his method and reached the following conclusions: ... (6)
As long as a proper distance measure is used, the NJ method is better than the
EP and MP methods whether there is a transition/transversion bias or whether
there is variation in substitution rate among different nucleotide sites."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1990</PY>
<VO>7</VO>
<NO>1</NO>
<PP>82-102</PP>
</SEQ>

<SEQ>
<UI>1225   Navidi,W.C.   Methods for Inferring .. Mol.Biol.Evol.  91 
8(1):128-143
</UI>
<AU>Navidi WC;
    Churchill GA;
    von Haeseler A
</AU>
<TI>Methods for Inferring Phylogenies from Nucleic Acid Sequence Data by Using
Maximum Likelihood and Linear Invariants
</TI>
<SU>Phylogeny;
    Likelihood;
    Invariant;
    Statistical;
    Significance;
    USA;
    Nucleic acid
</SU>
<AB>"A likelihood-ratio test may be used to determine the feasibility of any
tree for which the maximum likelihood can be computed. The method of linear
invariants described by Cavender, which includes Lake's method of evolutionary
parsimony as a special case, is essentially a form of the likelihood-ratio
method. In the case of a small number of species (four or five), these methods
may be used to find a confidence set for the correct tree. An exact version of
Lake's asymptotic χ^2 test has been mentioned by Holmquist et al. Under very
general assumptions, a one-sided exact test is appropriate, which greatly
increases power."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1991</PY>
<VO>8</VO>
<NO>1</NO>
<PP>128-143</PP>
</SEQ>

<SEQ>
<UI>1226   Navidi,W.C.   Phylogenetic Inference.. Biometrics      93 
49(2):543-555
</UI>
<AU>Navidi WC;
    Churchill GA;
    von Haeseler A
</AU>
<TI>Phylogenetic Inference: Linear Invariants and Maximum Likelihood
</TI>
<SU>Phylogeny;
    Likelihood;
    Invariant;
    Statistical;
    USA;
    Phylogenetic
</SU>
<AB>"We develop a new statistical method for inferring phylogenies, based on a
likelihood ratio test. This method does not require parameter constraints but
does require identical evolutionary processes in the sites considered. ... We
describe a sound mathematical basis for the use of linear invariants. We show
that the validity of the method requires parameter constraints, but does not
require that the evolutionary processes in differing sites be identical. We show
that the method of linear invariants is asymptotically equivalent to a less
powerful version of our likelihood ratio test, and is thus essentially a maximum
likelihood technique."
</AB>
<JT>Biometrics </JT>
<PY>1993</PY>
<VO>49</VO>
<NO>2</NO>
<PP>543-555</PP>
</SEQ>

<SEQ>
<UI>1227   Staden,R.     Automation of the Comp.. Nucleic Acids R 82 
10(15):4731-47
</UI>
<AU>Staden R
</AU>
<TI>Automation of the Computer Handling of Gel Reading Data Produced by the
Shotgun Method of DNA Sequencing
</TI>
<SU>Supersequence;
    Shortest common;
    Reconstruct;
    UK;
    DNA;
    Reading;
    Sequencing
</SU>
<AB>"This paper describes a computer method for handling gel reading data
produced by the shotgun method of DNA sequencing. The method greatly reduces the
time the sequencer needs to spend checking and editing his data and yet it
produces a consensus sequence for which the accuracy of determination of every
base can be clearly shown. ... No information is lost in this process as
alignments are achieved by making only insertions and because all the individual
gel readings are added to a database from which they can be retrieved and
displayed lined up one above the other."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1982</PY>
<VO>10</VO>
<NO>15</NO>
<PP>4731-4751</PP>
</SEQ>

<SEQ>
<UI>1228   Hillis,D.M.   Molecular Versus Morph.. Annu.Rev.Ecol.S 87 18:23-42
</UI>
<AU>Hillis DM
</AU>
<TI>Molecular Versus Morphological Approaches to Systematics
</TI>
<SU>Phylogeny;
    Review;
    USA;
    Systematics
</SU>
<AB>"In this review, I first outline the advantages of both morphological and
molecular approaches to systematics. I then discuss some common differences in
assumptions and methods of analysis that can lead to spurious conflict between
studies, especially those concerning phylogenetic reconstruction. A major
impediment in comparing the two approaches is that the histories of the
application of the two techniques to systematic problems differ to a large
extent. ... Finally, I discuss ways in which conflicting studies can be
reconciled, and I argue for the increased combination of molecular and
morphological data in order to maximize phylogenetic information."
</AB>
<JT>Annu Rev Ecol Syst</JT>
<PY>1987</PY>
<VO>18</VO>
<PP>23-42</PP>
</SEQ>

<SEQ>
<UI>1229   Cesari,Y.     Une caracterisation de.. C.R.Acad.Sci.Pa 78 
286(24):1175-1
</UI>
<AU>Cesari Y;
    Vincent M
</AU>
<TI>Une caracterisation des mots periodiques [A Characterization of Periodic
Words]
</TI>
<SU>Regularities;
    Cover;
    Repetition;
    FR;
    DE
</SU>
<AB>"We establish the periodicity of words in which all letters admit a double
covering."
</AB>
<JT>C R Acad Sci Paris Ser A </JT>
<PY>1978</PY>
<VO>286</VO>
<NO>24</NO>
<PP>1175-1177</PP>
</SEQ>

<SEQ>
<UI>1230   Bean,D.R.     Avoidable Patterns in .. Pacific J.Math. 79 
85(2):261-294
</UI>
<AU>Bean DR;
    Ehrenfeucht A;
    McNulty GF
</AU>
<TI>Avoidable Patterns in Strings of Symbols
</TI>
<SU>String match;
    Pattern match;
    USA
</SU>
<AB>"A word is just a finite string of letters. The word W avoids the word U
provided no substitution instance of U is a subword of W. W is avoidable if on
some finite alphabet there is an infinite collection of words each of which
avoids W. ... Next we examine avoidable words in general and prove that all
words of length at least 2^n on an alphabet with n letters are simultaneously
avoidable. We show that on any finite alphabet the collection of avoidable words
is simultaneously avoidable. We provide an effective (recursive)
characterization of avoidability."
</AB>
<JT>Pacific J Math</JT>
<PY>1979</PY>
<VO>85</VO>
<NO>2</NO>
<PP>261-294</PP>
</SEQ>

<SEQ>
<UI>1231   Tarhio,J.     A Greedy Algorithm for.. Lecture Notes i 86 
233:602-610
</UI>
<AU>Tarhio J;
    Ukkonen E
</AU>
<TI>A Greedy Algorithm for Constructing Shortest Common Superstrings
</TI>
<SU>Supersequence;
    Shortest common;
    Reconstruct;
    FI;
    Algorithm
</SU>
<AB>"An algorithm for constructing shortest common superstrings for a given
set R of strings is developed, based on Knuth-Morris-Pratt string matching
procedure and on the greedy heuristics for finding longest Hamiltonian paths in
weighted graphs. The algorithm runs in O(mn + m^2 log m) steps where m is the
number of strings in R and n is the total length of these strings. The
compression in the common superstring constructed by the algorithm is shown to
be at least half of the compression in a shortest superstring."
</AB>
<JT>Lecture Notes in Comput Sci</JT>
<PY>1986</PY>
<VO>233</VO>
<PP>602-610</PP>
</SEQ>

<SEQ>
<UI>1232   Apostolico,A. On Context Constrained.. RAIRO Inform.Th 84 
18(2):147-159
</UI>
<AU>Apostolico A
</AU>
<TI>On Context Constrained Squares and Repetitions in a String
</TI>
<SU>Repetition;
    Square;
    Regularities;
    Italy
</SU>
<AB>"Some combinatorial and computational problems concerning repetitions and
repetition roots in a string x on a finite alphabet - that are characterized in
general by an O(n log n) bound in terms of the length n of x - are shown to
admit of a linear bound when approached in particular contexts."
</AB>
<JT>RAIRO Inform Theor</JT>
<PY>1984</PY>
<VO>18</VO>
<NO>2</NO>
<PP>147-159</PP>
</SEQ>

<SEQ>
<UI>1233   Apostolico,A. Efficient Parallel Alg.. SIAM J.Comput.  90 
19(5):968-988
</UI>
<AU>Apostolico A;
    Atallah MJ;
    Larmore LL;
    McFaddin S
</AU>
<TI>Efficient Parallel Algorithms for String Editing and Related Problems
</TI>
<SU>Editing;
    Distance;
    Sequence comparison;
    Parallel;
    USA;
    Algorithm
</SU>
<AB>"The string editing problem ... has a well-known O(|x||y|) time-sequential
solution. Efficient PRAM parallel algorithms for the string editing problem are
given."
</AB>
<JT>SIAM J Comput</JT>
<PY>1990</PY>
<VO>19</VO>
<NO>5</NO>
<PP>968-988</PP>
</SEQ>

<SEQ>
<UI>1234   Crochemore,M. Recherche lineaire d'u.. C.R.Acad.Sci.Pa 83 
296(18):781-78
</UI>
<AU>Crochemore M
</AU>
<TI>Recherche lineaire d'un carre dans un mot [Linear Searching for a Square
in a Word]
</TI>
<SU>Regularities;
    Square;
    Repetition;
    FR
</SU>
<AB>"The search for a square in a word may be implemented in time proportional
to the length of the word on a random access machine provided the alphabet is
fixed."
</AB>
<JT>C R Acad Sci Paris Ser I </JT>
<PY>1983</PY>
<VO>296</VO>
<NO>18</NO>
<PP>781-784</PP>
</SEQ>

<SEQ>
<UI>1235   Hirschberg,D. The Set LCS Problem      Algorithmica    87 2:91-95
</UI>
<AU>Hirschberg DS;
    Larmore LL
</AU>
<TI>The Set LCS Problem
</TI>
<SU>Longest common;
    Subsequence;
    Dynamic programming;
    USA
</SU>
<AB>"An efficient algorithm is presented that solves a generalization of the
Longest Common Subsequence problem, in which one of the two input strings
contains sets of symbols which may be permuted. This problem arises from a music
application."
</AB>
<JT>Algorithmica </JT>
<PY>1987</PY>
<VO>2</VO>
<PP>91-95</PP>
</SEQ>

<SEQ>
<UI>1236   Patterson,C.  Homology in Classical .. Mol.Biol.Evol.  88 
5(6):603-625
</UI>
<AU>Patterson C
</AU>
<TI>Homology in Classical and Molecular Biology
</TI>
<SU>Sequence comparison;
    Homology;
    UK
</SU>
<AB>"Hypotheses of homology are the basis of comparative morphology and
comparative molecular biology. The kinds of homologous and nonhomologous
relations in classical and molecular biology are explored through the three
tests that may be applied to a hypothesis of homology: congruence, conjunction,
and similarity. The same three tests apply in molecular comparisons and in
morphology, and in each field they differentiate eight kinds of relation. These
various relations are discussed and compared."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1988</PY>
<VO>5</VO>
<NO>6</NO>
<PP>603-625</PP>
</SEQ>

<SEQ>
<UI>1237   Li,W.H.       A Statistical Test of .. Mol.Biol.Evol.  89 
6(4):424-435
</UI>
<AU>Li WH
</AU>
<TI>A Statistical Test of Phylogenies Estimated from Sequence Data
</TI>
<SU>Evolutionary tree;
    Significance;
    Statistical;
    USA;
    Phylogeny
</SU>
<AB>"A simple approach to testing the significance of the branching order,
estimated from protein or DNA sequence data, of three taxa is proposed. The
branching order is inferred by the transformed-distance method, under the
assumption that one or two outgroups are available, and the branch lengths are
estimated by the least-squares method. The inferred branching order is
considered significant if the estimated internodal distance is significantly
greater than zero. To test this, a formula for the variance of the internodal
distance has been developed. The statistical test proposed has been checked by
computer simulation."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1989</PY>
<VO>6</VO>
<NO>4</NO>
<PP>424-435</PP>
</SEQ>

<SEQ>
<UI>1238   Shoemaker,J.S Evidence from Nuclear .. Mol.Biol.Evol.  89 
6(3):270-289
</UI>
<AU>Shoemaker JS;
    Fitch WM
</AU>
<TI>Evidence from Nuclear Sequences that Invariable Sites should be Considered
when Sequence Divergence is Calculated
</TI>
<SU>Sequence proximity;
    USA;
    Divergence
</SU>
<AB>"It has long been known, from the distribution of multiple amino acid
replacements, that not all amino acids of a sequence are replaceable. More
recently, the phenomenon was observed at the nucleotide level in mitochondrial
DNA even after allowing for different rates of transition and transversion
substitutions. We have extended the search to globin gene sequences from various
organisms, with the following results. ... (5) The fit in the latter case
suggests, if the assumptions are correct and at all common, that current
procedures for estimating the total number of nucleotide substitutions in two
genes since their divergence from their common ancestor could be low by as much
as an order of magnitude."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1989</PY>
<VO>6</VO>
<NO>3</NO>
<PP>270-289</PP>
</SEQ>

<SEQ>
<UI>1239   Fitch,W.M.    Correcting Parsimoniou.. Mol.Biol.Evol.  90 
7(5):438-443
</UI>
<AU>Fitch WM;
    Beintema JJ
</AU>
<TI>Correcting Parsimonious Trees for Unseen Nucleotide Substitutions: The
Effect of Dense Branching as Exemplified by Ribonuclease
</TI>
<SU>Evolutionary rate;
    Substitution;
    Sequence proximity;
    USA;
    Nucleotide
</SU>
<AB>"In a study of mammalian ribonuclease evolutionary rates, we applied the
Fitch-Bruschi correction to reduce the bias caused by an unequal sampling of
taxa in different lineages. The correction was clearly appropriate but only up
to a point. The analysis showed that the sampling of taxa within the pecora was
sufficiently intense that no correction for unseen, amino acid-changing,
nucleotide substitutions was required."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1990</PY>
<VO>7</VO>
<NO>5</NO>
<PP>438-443</PP>
</SEQ>

<SEQ>
<UI>1240   Fitch,W.M.    The Evolution of Proka.. Mol.Biol.Evol.  87 
4(4):381-394
</UI>
<AU>Fitch WM;
    Bruschi M
</AU>
<TI>The Evolution of Prokaryotic Ferredoxins - With a General Method
Correcting for Unobserved Substitutions in Less Branched Lineages
</TI>
<SU>Evolutionary rate;
    Substitution;
    Correction;
    Evolution;
    USA
</SU>
<AB>"Appendix. Correction of Limb Length on Most-Parsimonious Trees. It is
well recognized that, in most-parsimonious trees, the number of nucleotide
substitutions (or amino acid replacements) observed between a sequence and a
remote ancestor of it is an increasing function of the number of branching
events between the two of them. The effect is to cause lineages with fewer
branchings to appear to evolve more slowly. ... We present here a new and
simpler method that also corrects for this problem."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1987</PY>
<VO>4</VO>
<NO>4</NO>
<PP>381-394</PP>
</SEQ>

<SEQ>
<UI>1241   Tajima,F.     A Simple Graphic Metho.. Mol.Biol.Evol.  90 
7(6):578-588
</UI>
<AU>Tajima F
</AU>
<TI>A Simple Graphic Method for Reconstructing Phylogenetic Trees from
Molecular Data
</TI>
<SU>Phylogeny;
    Evolutionary tree;
    JP;
    Phylogenetic;
    Graphic
</SU>
<AB>"A simple graphic method is proposed for reconstructing phylogenetic trees
from molecular data. This method is similar to the unweighted pair-group method
with arithmetic mean, but the process of computation of average distances and
reconstruction of new matrices, required in the latter method, is eliminated
from this new method, so that one can reconstruct a phylogenetic tree without
using a computer, unless the number of operational taxonomic units is very
large. Furthermore, this method allows a phylogenetic tree to have
multifurcating branches whenever there is ambiguity with bifurcation."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1990</PY>
<VO>7</VO>
<NO>6</NO>
<PP>578-588</PP>
</SEQ>

<SEQ>
<UI>1242   Bulmer,M.     Use of the Method of G.. Mol.Biol.Evol.  91 
8(6):868-883
</UI>
<AU>Bulmer M
</AU>
<TI>Use of the Method of Generalized Least Squares in Reconstructing
Phylogenies from Sequence Data
</TI>
<SU>Phylogeny;
    Least squares;
    UK;
    Square
</SU>
<AB>"The method of generalized least squares provides a flexible method of
phylogenetic reconstruction from sequence data, after reducing them to pairwise
distances between species, corrected for multiple and back mutation. It gives
efficient estimates of the branch lengths of a given tree. It also provides a
natural measure of the departure of the observed from the predicted set of
distances which has a χ^2 distribution under the true topology; this fact is used
to construct a significance test on the topology and so to determine a
'confidence interval' for the set of trees which are compatible with the data."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1991</PY>
<VO>8</VO>
<NO>6</NO>
<PP>868-883</PP>
</SEQ>

<SEQ>
<UI>1243   Olsen,G.J.    Systematic Underestima.. Mol.Biol.Evol.  91 
8(5):592-608
</UI>
<AU>Olsen GJ
</AU>
<TI>Systematic Underestimation of Tree Branch Lengths by Lake's Operator
Metrics: An Effect of Position-dependent Substitution Rates
</TI>
<SU>Substitution;
    Evolutionary distance;
    USA;
    Rate;
    Systematics
</SU>
<AB>"It is shown analytically that operator metrics does not yield the claimed
estimate of transversion sequence differences when sequence positions differ in
their nucleotide substitution rate, in which case the method underestimates tree
branch lengths. The site-to-site variations in substitution rate that have been
characterized by previous authors are of sufficient magnitude to explain the
problems observed in the operator-metrics branch length estimates. Transversion
substitutions estimated using Kimura's two-parameter (transition/transversion)
model are less subject to this problem and are more consistent with directly
observed differences."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1991</PY>
<VO>8</VO>
<NO>5</NO>
<PP>592-608</PP>
</SEQ>

<SEQ>
<UI>1244   Takahata,N.   Sampling Errors in Phy.. Mol.Biol.Evol.  91 
8(4):494-502
</UI>
<AU>Takahata N;
    Tajima F
</AU>
<TI>Sampling Errors in Phylogeny
</TI>
<SU>Evolutionary distance;
    Evolutionary tree;
    Statistical;
    Significance;
    JP;
    Error;
    Sampling;
    Phylogeny
</SU>
<AB>"The sampling variance of nucleotide diversity or branch length in a
phylogenetic tree constructed by any distance method provides a criterion to
judge whether a deduction or an inference made from data is statistically
significant. However, computation of the sampling variance is usually tedious
.... In this paper, we derive simple formulas for the minimum and maximum values
of the sampling variance, which are independent of underlying substitution
models. Application of these formulas demonstrates satisfactorily accurate
estimates of the sampling variances and therefore their practical use."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1991</PY>
<VO>8</VO>
<NO>4</NO>
<PP>494-502</PP>
</SEQ>

<SEQ>
<UI>1245   Jin,L.        Relative Efficiencies .. Mol.Biol.Evol.  91 
8(3):356-365
</UI>
<AU>Jin L;
    Nei M
</AU>
<TI>Relative Efficiencies of the Maximum-Parsimony and Distance-Matrix Methods
of Phylogeny Construction for Restriction Data
</TI>
<SU>Phylogeny;
    Restriction;
    USA;
    Joining;
    Parsimony;
    UPGMA
</SU>
<AB>"The relative efficiencies of the maximum-parsimony (MP), UPGMA, and
neighbor-joining (NJ) methods in obtaining the correct tree (topology) for
restriction-site and restriction-fragment data were studied by computer
simulation."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1991</PY>
<VO>8</VO>
<NO>3</NO>
<PP>356-365</PP>
</SEQ>

<SEQ>
<UI>1246   Bafna,V.      Sorting by Transpositi..                 94
</UI>
<AU>Bafna V;
    Pevzner PA
</AU>
<TI>Sorting by Transpositions
</TI>
<SU>Rearrangement;
    Transposition;
    Genomic;
    USA
</SU>
<AB>Preprint received 7 Nov. 1994, 15 pp. "The paper addresses the problem of
genome comparison versus classical gene comparison and presents algorithms to
analyse rearrangements in genomes evolving by transpositions. In the simplest
form the problem corresponds to sorting by transpositions, i.e., sorting of an
array using transpositions of arbitrary fragments. We derive lower bounds on
transposition distance between permutations and present approximation algorithms
for sorting by transpositions. The algorithms also imply a non-trivial upper
bound on the transposition diameter of the symmetric group."
</AB>
<PY>1994</PY>
</SEQ>

<SEQ>
<UI>1247   Lewontin,R.C. Inferring the Number o.. Mol.Biol.Evol.  89 
6(1):15-32
</UI>
<AU>Lewontin RC
</AU>
<TI>Inferring the Number of Evolutionary Events from DNA Coding Sequence
Differences
</TI>
<SU>Evolutionary distance;
    Coding;
    USA;
    DNA
</SU>
<AB>"The estimation of the amount of evolutionary divergence that has taken
place between two DNA coding sequences depends strongly on the degree of
constraint on amino acid replacements. If amino acid replacements are relatively
unconstrained, the individual nucleotide is the appropriate unit of analysis and
the method of Tajima and Nei can be used. If amino acid replacements are
constrained, however, this method is shown to be inapplicable. For sequences
with strong amino acid constraints, a method is outlined analogous to the Tajima
and Nei method using codons as the unit of analysis. Only synonymous
substitutions are used."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1989</PY>
<VO>6</VO>
<NO>1</NO>
<PP>15-32</PP>
</SEQ>

<SEQ>
<UI>1248   Tajima,F.     Statistical Method for.. Mol.Biol.Evol.  92 
9(1):168-181
</UI>
<AU>Tajima F
</AU>
<TI>Statistical Method for Estimating the Standard Errors of Branch Lengths in
a Phylogenetic Tree Reconstructed without Assuming Equal Rates of Nucleotide
Substitution among Different Lineages
</TI>
<SU>Evolutionary tree;
    Evolutionary distance;
    Statistical;
    Error;
    Substitution;
    JP;
    Rate;
    Nucleotide;
    Phylogenetic
</SU>
<AB>"A statistical method is developed for estimating the standard errors of
branch lengths in a phylogenetic tree reconstructed without assuming equal rates
of nucleotide substitution among different lineages. This method can be easily
used for testing whether the length of an interior branch in a reconstructed
tree is positive, i.e., whether the topology of the tree is correct. Computer
simulations indicate that this method is appropriate for a statistical test. ...
The results obtained show that the present method provides a powerful
statistical test."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1992</PY>
<VO>9</VO>
<NO>1</NO>
<PP>168-181</PP>
</SEQ>

<SEQ>
<UI>1249   DeBry,R.W.    The Consistency of Sev.. Mol.Biol.Evol.  92 
9(3):537-551
</UI>
<AU>DeBry RW
</AU>
<TI>The Consistency of Several Phylogeny-Inference Methods under Varying
Evolutionary Rates
</TI>
<SU>Phylogeny;
    Evolutionary rate;
    USA;
    Consistency;
    Rate
</SU>
<AB>"A phylogenetic method is a consistent estimator of phylogeny if and only
if it is guaranteed to give the correct tree, given that sufficient (possibly
infinite) independent data are examined. The following methods are examined for
consistency: UPGMA (unweighted pair-group, averages), NJ (neighbor joining), MF
(modified Farris), and P (parsimony). A two-parameter model of nucleotide
sequence substitution is used, and the expected distribution of character states
is calculated. Without perfect correction for superimposed substitutions, all
four methods may be inconsistent if there is but one branch evolving at a faster
rate than the other branches."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1992</PY>
<VO>9</VO>
<NO>3</NO>
<PP>537-551</PP>
</SEQ>

<SEQ>
<UI>1250   Tamura,K.     Estimation of the Numb.. Mol.Biol.Evol.  92 
9(4):678-687
</UI>
<AU>Tamura K
</AU>
<TI>Estimation of the Number of Nucleotide Substitutions When There Are Strong
Transition-Transversion and G+C-Content Biases
</TI>
<SU>Substitution;
    JP;
    Nucleotide;
    Estimation
</SU>
<AB>"A simple mathematical method is developed to estimate the number of
nucleotide substitutions per site between two DNA sequences, by extending
Kimura's (1980) two-parameter method to the case where a G+C-content bias
exists. This method will be useful when there are strong
transition-transversion and G+C-content biases, as in the case of Drosophila
mitochondrial DNA."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1992</PY>
<VO>9</VO>
<NO>4</NO>
<PP>678-687</PP>
</SEQ>

<SEQ>
<UI>1251   Clark,A.G.    Sequencing Errors and .. Mol.Biol.Evol.  92 
9(4):744-752
</UI>
<AU>Clark AG;
    Whittam TS
</AU>
<TI>Sequencing Errors and Molecular Evolutionary Analysis
</TI>
<SU>Phylogeny;
    Substitution;
    Error;
    USA;
    Sequencing
</SU>
<AB>"Heuristic approaches were used to quantify the influence that sequencing
errors have on estimates of nucleotide diversity, substitution rate, and the
construction of genealogies. Error rates of &lt; 1 nucleotide/kb probably have
little effect on conclusions about evolutionary history of highly polymorphic
organisms such as Drosophila and Escherichia coli, but organisms with very low
nucleotide diversity, such as humans, require greater sequencing accuracy. A
scan of GenBank for corrections of previous errors reveals that sequencing
errors are highly nonrandom."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1992</PY>
<VO>9</VO>
<NO>4</NO>
<PP>744-752</PP>
</SEQ>

<SEQ>
<UI>1252   Churchill,G.A Sample Size for a Phyl.. Mol.Biol.Evol.  92 
9(4):753-769
</UI>
<AU>Churchill GA;
    von Haeseler A;
    Navidi WC
</AU>
<TI>Sample Size for a Phylogenetic Inference
</TI>
<SU>Evolutionary distance;
    Statistical;
    Significance;
    USA;
    Phylogenetic
</SU>
<AB>"The objective of this work is to describe sample-size calculations for
the inference of a nonzero central branch length in an unrooted four-species
phylogeny. Attention is restricted to independent binary characters, such as
might be obtained from an alignment of the purine-pyrimidine sequences of a
nucleic acid molecule. A statistical test based on a multinomial model for
character-state configurations is described. The importance of including
invariable sites in models for sequence change is demonstrated, and their
effect on sample size is quantified."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1992</PY>
<VO>9</VO>
<NO>4</NO>
<PP>753-769</PP>
</SEQ>

<SEQ>
<UI>1253   Allard,M.W.   Testing Phylogenetic A.. Mol.Biol.Evol.  92 
9(5):778-786
</UI>
<AU>Allard MW;
    Miyamoto MM
</AU>
<TI>Testing Phylogenetic Approaches with Empirical Data, as Illustrated with
the Parsimony Method
</TI>
<SU>Phylogeny;
    Evolutionary tree;
    Significance;
    USA;
    Parsimony;
    Phylogenetic
</SU>
<AB>"In the present study, the evolutionary relationships of lipotyphlan
insectivores ... are investigated with new mitochondrial DNA sequences of the
12S ribosomal RNA gene. A single phylogeny based on parsimony analyses of these
sequences is accepted as well supported according to different criteria,
although an exception to this conclusion is noted. This exception forms the
basis for an investigation of why an incorrect solution is obtained by the
parsimony method in this particular case."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1992</PY>
<VO>9</VO>
<NO>5</NO>
<PP>778-786</PP>
</SEQ>

<SEQ>
<UI>1254   Zharkikh,A.   Statistical Properties.. Mol.Biol.Evol.  92 
9(6):1119-1147
</UI>
<AU>Zharkikh A;
    Li WH
</AU>
<TI>Statistical Properties of Bootstrap Estimation of Phylogenetic
Variability from Nucleotide Sequences. I. Four Taxa with a Molecular Clock
</TI>
<SU>Evolutionary tree;
    Bootstrap;
    Statistical;
    USA;
    Clock;
    Nucleotide;
    Phylogenetic;
    Estimation
</SU>
<AB>"The statistical properties of sample estimation and bootstrap estimation
of phylogenetic variability from a sample of nucleotide sequences are studied
by using model trees of three taxa with an outgroup and by assuming a constant
rate of nucleotide substitution. The maximum-parsimony method of tree reconstruction
is used. An analytic formula is derived for estimating the sequence length that
is required if P, the probability of obtaining the true tree from the sampled
sequences, is to be equal to or higher than a given value."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1992</PY>
<VO>9</VO>
<NO>6</NO>
<PP>1119-1147</PP>
</SEQ>

<SEQ>
<UI>1255   Schoniger,M.  A Simple Method to Imp.. Mol.Biol.Evol.  93 
10(2):471-483
</UI>
<AU>Schoniger M;
    von Haeseler A
</AU>
<TI>A Simple Method to Improve the Reliability of Tree Reconstruction
</TI>
<SU>Phylogeny;
    Reliability;
    USA
</SU>
<AB>"The efficiencies of distance-matrix methods for correct tree
reconstruction under a variety of substitution rates, transition-transversion
biases, and different model trees were studied. ... We show that a combination
of combinatorial weighting by Williams and Fitch (1990) and the Jukes-Cantor
(1969) correction significantly increases the efficiency of tree-reconstruction
methods, for a large fraction of evolutionary parameters. We explain why this
approach is superior to any other weighting/correction scheme tested, as long
as .... An approximate threshold for switching to a different weighting scheme is
given."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1993</PY>
<VO>10</VO>
<NO>2</NO>
<PP>471-483</PP>
</SEQ>

<SEQ>
<UI>1256   Tajima,F.     Unbiased Estimation of.. Mol.Biol.Evol.  93 
10(3):677-688
</UI>
<AU>Tajima F
</AU>
<TI>Unbiased Estimation of Evolutionary Distance between Nucleotide Sequences
</TI>
<SU>Evolutionary distance;
    Substitution;
    JP;
    Distance;
    Nucleotide;
    Estimation
</SU>
<AB>"A new algorithm for estimating the number of nucleotide substitutions
per site (i.e., the evolutionary distance) between two nucleotide sequences is
presented. This algorithm can be applied to many estimation methods, such as
Jukes and Cantor's method (1969), Kimura's transition/transversion method
(1980), and Tajima and Nei's method (1984). Unlike ordinary methods, this
algorithm is always applicable. Numerical computations and computer simulations
indicate that this algorithm gives an almost unbiased estimate of the
evolutionary distance, unless the evolutionary distance is very large."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1993</PY>
<VO>10</VO>
<NO>3</NO>
<PP>677-688</PP>
</SEQ>

<SEQ>
<UI>1257   Rzhetsky,A.   Theoretical Foundation.. Mol.Biol.Evol.  93 
10(5):1073-109
</UI>
<AU>Rzhetsky A;
    Nei M
</AU>
<TI>Theoretical Foundation of the Minimum-Evolution Method of Phylogenetic
Inference
</TI>
<SU>Phylogeny;
    Minimum evolution;
    USA;
    Phylogenetic
</SU>
<AB>"The minimum-evolution (ME) method of phylogenetic inference is based on
the assumption that the tree with the smallest sum of branch length estimates
is most likely to be the true one. In the past this assumption has been used
without mathematical proof. Here we present the theoretical basis of this
method by showing that the expectation of the sum of branch length estimates for the
true tree is smallest among all possible trees, provided that the evolutionary
distances used are statistically unbiased and that the branch lengths are
estimated by the ordinary least-squares method."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1993</PY>
<VO>10</VO>
<NO>5</NO>
<PP>1073-1095</PP>
</SEQ>

<SEQ>
<UI>1258   Yang,Z.       Maximum-Likelihood Est.. Mol.Biol.Evol.  93 
10(6):1396-140
</UI>
<AU>Yang Z
</AU>
<TI>Maximum-Likelihood Estimation of Phylogeny from DNA Sequences when
Substitution Rates Differ over Sites
</TI>
<SU>Phylogeny;
    Likelihood;
    Substitution;
    CN;
    DNA;
    Rate;
    Estimation
</SU>
<AB>"Felsenstein's (1981) maximum-likelihood approach for inferring phylogeny
from DNA sequences assumes that the rate of nucleotide substitution is constant
over different nucleotide sites. This assumption is sometimes unrealistic, as
has been revealed by analysis of real sequence data. In the present paper
Felsenstein's method is extended to the case where substitution rates over
sites are described by the gamma distribution. A numerical example is presented to show
that the method fits the data better than do previous models."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1993</PY>
<VO>10</VO>
<NO>6</NO>
<PP>1396-1401</PP>
</SEQ>

<SEQ>
<UI>1259   Tamura,K.     Model Selection in the.. Mol.Biol.Evol.  94 
11(1):154-157
</UI>
<AU>Tamura K
</AU>
<TI>Model Selection in the Estimation of the Number of Nucleotide
Substitutions
</TI>
<SU>Evolutionary distance;
    Likelihood;
    USA;
    Substitution;
    Selection;
    Model;
    Nucleotide;
    Estimation
</SU>
<AB>"Tamura and Nei (1993) recently published a new mathematical model for
estimating the number of nucleotide substitutions per site to analyze
mitochondrial DNA (mtDNA) control-region sequences from humans and chimpanzees.
Although this mathematical model fitted the observed pattern of nucleotide
substitution quite well, the goodness of fit of the model has not been tested
statistically. In the present communication, I would like to examine Horai et
al.'s (1992) data on the coding region of mtDNA and show that Tamura and Nei's
model fits observed data better than does Hasegawa et al.'s (1985) model."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1994</PY>
<VO>11</VO>
<NO>1</NO>
<PP>154-157</PP>
</SEQ>

<SEQ>
<UI>1260   Yang,Z.       Comparison of Models f.. Mol.Biol.Evol.  94 
11(2):316-324
</UI>
<AU>Yang Z;
    Goldman N;
    Friday A
</AU>
<TI>Comparison of Models for Nucleotide Substitution Used in
Maximum-Likelihood Phylogenetic Estimation
</TI>
<SU>Phylogeny;
    Likelihood;
    Substitution;
    UK;
    Model;
    Nucleotide;
    Phylogenetic;
    Estimation
</SU>
<AB>"Using real sequence data, we evaluate the adequacy of assumptions made
in evolutionary models of nucleotide substitution and the effects that these
assumptions have on estimation of evolutionary trees. Two aspects of the
assumptions are evaluated. The first concerns the pattern of nucleotide
substitution, including equilibrium base frequencies and the
transition/transversion-rate ratio. The second concerns the variation of
substitution rates over sites. The maximum-likelihood estimate of tree topology
appears quite robust to both these aspects of the assumptions of the models,
but evaluation of the reliability of the estimated tree by using simpler, less
realistic models can be misleading."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1994</PY>
<VO>11</VO>
<NO>2</NO>
<PP>316-324</PP>
</SEQ>

<SEQ>
<UI>1261   Wakeley,J.    Substitution-Rate Vari.. Mol.Biol.Evol.  94 
11(3):436-442
</UI>
<AU>Wakeley J
</AU>
<TI>Substitution-Rate Variation among Sites and the Estimation of Transition
Bias
</TI>
<SU>Substitution;
    Sequence comparison;
    USA;
    Transition;
    Bias;
    Estimation
</SU>
<AB>"Substitution-rate variation among sites and differences in the
probabilities of change among the four nucleotides are conflated in DNA
sequence comparisons. When variation in rate exists among sites but is ignored,
biases in the rates of change among nucleotides are underestimated. This paper provides a
quantification of this effect when the observed proportions of transitions, P,
and transversions, Q, between two sequences are used to estimate transition
bias. The utility of P/Q as an estimator is examined both with and without rate
variation among sites."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1994</PY>
<VO>11</VO>
<NO>3</NO>
<PP>436-442</PP>
</SEQ>

<SEQ>
<UI>1262   Kuhner,M.K.   A Simulation Compariso.. Mol.Biol.Evol.  94 
11(3):459-468
</UI>
<AU>Kuhner MK;
    Felsenstein J
</AU>
<TI>A Simulation Comparison of Phylogeny Algorithms under Equal and Unequal
Evolutionary Rates
</TI>
<SU>Phylogeny;
    Simulation;
    Evolutionary rate;
    Parsimony;
    Likelihood;
    USA;
    Rate;
    Algorithm
</SU>
<AB>"Using simulated data, we compared five methods of phylogenetic tree
estimation: parsimony, compatibility, maximum-likelihood, Fitch-Margoliash, and
neighbor joining. ... Maximum likelihood was the most successful method
overall, although for short sequences Fitch-Margoliash and neighbor joining were
sometimes better. ... Parsimony and compatibility had particular difficulty
with inaccuracy and bias when substitution rates varied among different branches.
When rates of evolution varied among different sites, all methods showed signs
of inaccuracy and bias."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1994</PY>
<VO>11</VO>
<NO>3</NO>
<PP>459-468</PP>
</SEQ>

<SEQ>
<UI>1263   Zharkikh,A.   Inconsistency of the M.. Syst.Biol.      93 
42(2):113-125
</UI>
<AU>Zharkikh A;
    Li WH
</AU>
<TI>Inconsistency of the Maximum-Parsimony Method: The Case of Five Taxa with
a Molecular Clock
</TI>
<SU>Phylogeny;
    Parsimony;
    Simulation;
    Monte Carlo;
    Joining;
    USA;
    Clock
</SU>
<AB>"The inconsistency of the maximum-parsimony method for the case of five
taxa with a molecular clock was studied using an analytical approach and Monte
Carlo simulation. The inconsistency occurs in the case of a symmetrical tree
with short internal branches and long external branches but can be avoided by
using slowly evolving sequences. The neighbor-joining method is consistent if
the evolutionary distances between taxa are estimated accurately."
</AB>
<JT>Syst Biol</JT>
<PY>1993</PY>
<VO>42</VO>
<NO>2</NO>
<PP>113-125</PP>
</SEQ>

<SEQ>
<UI>1264   Farris,J.S.   A Successive Approxima.. Syst.Zool.      69 
18:374-385
</UI>
<AU>Farris JS
</AU>
<TI>A Successive Approximations Approach to Character Weighting
</TI>
<SU>Character weight;
    USA;
    Approximation
</SU>
<AB>"Characters that are reliable for cladistic inference are those that are
consistent with the true phyletic relationships, that is, those that have
little homoplasy. ... A technique that infers cladistic relationship by successively
weighting characters according to apparent cladistic reliability is suggested,
and computer simulation tests of the technique are described. Results indicate
that the successive weighting procedure can be highly successful, even when
cladistically reliable characters are heavily outnumbered by unreliable ones."
</AB>
<JT>Syst Zool</JT>
<PY>1969</PY>
<VO>18</VO>
<PP>374-385</PP>
</SEQ>

<SEQ>
<UI>1265   Gojobori,T.   Estimation of Average .. J.Mol.Evol.     82 
18:414-422
</UI>
<AU>Gojobori T;
    Ishii K;
    Nei M
</AU>
<TI>Estimation of Average Number of Nucleotide Substitutions When the Rate of
Substitution Varies with Nucleotide
</TI>
<SU>Evolutionary distance;
    Substitution;
    Statistical;
    Pairwise comparison;
    USA;
    Nucleotide;
    Rate;
    Estimation
</SU>
<AB>"A formal mathematical analysis of Kimura's (1981) six-parameter model of
nucleotide substitution for the case of unequal substitution rates among
different pairs of nucleotides is conducted, and new formulae for estimating
the number of nucleotide substitutions and its standard error are obtained. By
using computer simulation, the validities and utilities of Jukes and Cantor's (1969)
one-parameter formula, Takahata and Kimura's (1981) four-parameter formula, and
our six-parameter formula for estimating the number of nucleotide substitutions
are examined under three different schemes of nucleotide substitution."
</AB>
<JT>J Mol Evol</JT>
<PY>1982</PY>
<VO>18</VO>
<PP>414-422</PP>
</SEQ>

<SEQ>
<UI>1266   Golding,G.B.  Estimates of DNA and P.. Mol.Biol.Evol.  83 
1(1):125-142
</UI>
<AU>Golding GB
</AU>
<TI>Estimates of DNA and Protein Sequence Divergence: An Examination of Some
Assumptions
</TI>
<SU>Evolutionary distance;
    Statistical;
    Divergence;
    USA;
    Protein;
    DNA
</SU>
<AB>"Some of the assumptions underlying estimates of DNA and protein sequence
divergence are examined. A solution for the variance of these estimates that
allows for different mutation rates and different population sizes in each
species and for an arbitrary structure in the initial population is obtained.
It is shown that these conditions do not strongly affect estimates of divergence.
In general, they cause the variance of divergence to be smaller than a binomial
variance. Thus, the binomial variance that is usually assumed for these
estimates is safely conservative."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1983</PY>
<VO>1</VO>
<NO>1</NO>
<PP>125-142</PP>
</SEQ>

<SEQ>
<UI>1267   Li,W.H.       A New Method for Estim.. Mol.Biol.Evol.  85 
2(2):150-174
</UI>
<AU>Li WH;
    Wu CI;
    Luo CC
</AU>
<TI>A New Method for Estimating Synonymous and Nonsynonymous Rates of
Nucleotide Substitution Considering the Relative Likelihood of Nucleotide and
Codon Changes
</TI>
<SU>Substitution;
    Likelihood;
    Evolutionary distance;
    Codon;
    Synonymous;
    Rate;
    Nucleotide;
    USA
</SU>
<AB>"A new method is proposed for estimating the number of synonymous and
nonsynonymous nucleotide substitutions between homologous genes. In this
method, a nucleotide site is classified as nondegenerate, twofold degenerate, or
fourfold degenerate, depending on how often nucleotide substitutions will
result in amino acid replacement; nucleotide changes are classified as either
transitional or transversional, and changes between codons are assumed to occur
with different probabilities, which are determined by their relative
frequencies among more than 3,000 changes in mammalian genes."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1985</PY>
<VO>2</VO>
<NO>2</NO>
<PP>150-174</PP>
</SEQ>

<SEQ>
<UI>1268   Nei,M.        Simple Methods for Est.. Mol.Biol.Evol.  86 
3(5):418-426
</UI>
<AU>Nei M;
    Gojobori T
</AU>
<TI>Simple Methods for Estimating the Numbers of Synonymous and Nonsynonymous
Nucleotide Substitutions
</TI>
<SU>Evolutionary distance;
    Pairwise comparison;
    Statistical;
    Substitution;
    Codon;
    USA;
    Synonymous;
    Nucleotide
</SU>
<AB>"Two simple methods for estimating the numbers of synonymous and
nonsynonymous nucleotide substitutions are presented. Although they give no
weights to different types of codon substitutions, these methods give
essentially the same results as those obtained by Miyata and Yasunaga's [1980]
and by Li et al.'s [1985] methods. Computer simulation indicates that estimates
of synonymous substitutions obtained by the two methods are quite accurate
unless the number of nucleotide substitutions per site is very large. It is
shown that all available methods tend to give an underestimate of the number of
nonsynonymous substitutions when the number is large."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1986</PY>
<VO>3</VO>
<NO>5</NO>
<PP>418-426</PP>
</SEQ>

<SEQ>
<UI>1269   Takahata,N.   A Model of Evolutionar.. Genetics        81 
98(Jul.):641-6
</UI>
<AU>Takahata N;
    Kimura M
</AU>
<TI>A Model of Evolutionary Base Substitutions and its Application with
Special Reference to Rapid Change of Pseudogenes
</TI>
<SU>Evolutionary distance;
    Substitution;
    Statistical;
    Pairwise comparison;
    JP;
    Model;
    Pseudogene
</SU>
<AB>"A model of evolutionary base substitutions that can incorporate
different substitutional rates between the four bases and that takes into account unequal
composition of bases in DNA sequences is proposed. Using this model, we derived
formulae that enable us to estimate the evolutionary distances in terms of the
number of nucleotide substitutions through comparative studies of nucleotide
sequences. In order to check the validity of various formulae, Monte Carlo
experiments were performed. These formulae were applied to analyze data on DNA
sequences from diverse organisms."
</AB>
<JT>Genetics</JT>
<PY>1981</PY>
<VO>98</VO>
<NO>Jul.</NO>
<PP>641-657</PP>
</SEQ>

<SEQ>
<UI>1270   Tamura,K.     Estimation of the Numb.. Mol.Biol.Evol.  93 
10(3):512-526
</UI>
<AU>Tamura K;
    Nei M
</AU>
<TI>Estimation of the Number of Nucleotide Substitutions in the Control
Region of Mitochondrial DNA in Humans and Chimpanzees
</TI>
<SU>Evolutionary distance;
    Substitution;
    Region;
    DNA;
    Nucleotide;
    USA;
    Estimation
</SU>
<AB>"Examining the pattern of nucleotide substitution for the control region
of mitochondrial DNA (mtDNA) in humans and chimpanzees, we developed a new
mathematical method for estimating the number of transitional and
transversional substitutions per site, as well as the total number of nucleotide
substitutions. In this method, excess transition, unequal nucleotide frequencies,
and variation of substitution rate among different sites are all taken into account."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1993</PY>
<VO>10</VO>
<NO>3</NO>
<PP>512-526</PP>
</SEQ>

<SEQ>
<UI>1271   Tajima,F.     Estimation of Evolutio.. Mol.Biol.Evol.  84 
1(3):269-285
</UI>
<AU>Tajima F;
    Nei M
</AU>
<TI>Estimation of Evolutionary Distance between Nucleotide Sequences
</TI>
<SU>Evolutionary distance;
    Statistical;
    Pairwise comparison;
    Distance;
    USA;
    Nucleotide;
    Estimation
</SU>
<AB>"A mathematical formula for estimating the average number of nucleotide
substitutions per site (d) between two homologous DNA sequences is developed by
taking into account unequal rates of substitution among different nucleotide
pairs. Although this formula is obtained for the equal-input model of
nucleotide substitution, computer simulations have shown that it gives a
reasonably good estimate for a wide range of nucleotide substitution patterns as
long as d &lt;= 1.
... A statistical method for estimating the number of nucleotide changes due to
deletion and insertion is also developed."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1984</PY>
<VO>1</VO>
<NO>3</NO>
<PP>269-285</PP>
</SEQ>

<SEQ>
<UI>1272   Cavalli-Sforz Phylogenetic Analysis:.. Am.J.Hum.Genet. 67 19(3), 
Part I:
</UI>
<AU>Cavalli-Sforza LL;
    Edwards AWF
</AU>
<TI>Phylogenetic Analysis: Models and Estimation Procedures
</TI>
<SU>Phylogeny;
    Likelihood;
    Evolution;
    Evolutionary tree;
    Clustering;
    Distance;
    Italy;
    Model;
    Phylogenetic;
    Estimation
</SU>
<AB>See also Evolution, 21:550-570(1967). "Acceptance of the theory of
evolution as the means of explaining observed similarities and differences among
organisms invites the construction of trees of descent purporting to show
evolutionary relationships. Whether such trees are based on fossil or living
specimens, they may often be criticized for having a subjective element. The
purpose of this paper is to show how suitable evolutionary models can be
constructed and applied objectively. In it we amplify and extend the methods we
have given in previous communications ...."
</AB>
<JT>Am J Hum Genet</JT>
<PY>1967</PY>
<VO>19</VO>
<NO>3</NO>
<PP>233-257 (Part I)</PP>
</SEQ>

<SEQ>
<UI>1273   Cavalli-Sforz Phylogenetic Analysis:.. Evolution       67 
21:550-570
</UI>
<AU>Cavalli-Sforza LL;
    Edwards AWF
</AU>
<TI>Phylogenetic Analysis: Models and Estimation Procedures
</TI>
<SU>Phylogeny;
    Likelihood;
    Evolution;
    Evolutionary tree;
    Clustering;
    Distance;
    Italy;
    Model;
    Phylogenetic;
    Estimation
</SU>
<AB>See also American Journal of Human Genetics, 19(3), Part I:233-257(1967).
"Acceptance of the theory of evolution as the means of explaining observed
similarities and differences among organisms invites the construction of trees
of descent purporting to show evolutionary relationships. Whether such trees are
based on fossil or living specimens, they may often be criticized for having a
subjective element. The purpose of this paper is to show how suitable
evolutionary models can be constructed and applied objectively. In it we amplify
and extend the methods we have given in previous communications ...."
</AB>
<JT>Evolution</JT>
<PY>1967</PY>
<VO>21</VO>
<PP>550-570</PP>
</SEQ>

<SEQ>
<UI>1274   Goldman,N.    Maximum Likelihood Inf.. Syst.Zool.      90 
39(4):345-361
</UI>
<AU>Goldman N
</AU>
<TI>Maximum Likelihood Inference of Phylogenetic Trees, with Special Reference
to a Poisson Process Model of DNA Substitution and to Parsimony Analysis
</TI>
<SU>Phylogeny;
    Likelihood;
    Parsimony;
    Substitution;
    UK;
    Poisson;
    DNA;
    Model;
    Phylogenetic
</SU>
<AB>"Maximum likelihood inference is discussed, and some of its advantages and
disadvantages are noted. The application of maximum likelihood inference to
phylogenetics is examined, and a simple Poisson process model of DNA
substitution is used as one example. Further examples follow from the
clarification of implicit models underlying traditional 'parsimony' and
'compatibility' analyses. From the elucidation of these models and analyses, it
is seen that Poisson process analysis gives a statistically consistent estimate
of phylogeny, and that parsimony methods do indeed have a maximum likelihood
foundation but give potentially incorrect estimates of phylogeny."
</AB>
<JT>Syst Zool</JT>
<PY>1990</PY>
<VO>39</VO>
<NO>4</NO>
<PP>345-361</PP>
</SEQ>

<SEQ>
<UI>1275   Goldman,N.    Statistical Tests of M.. J.Mol.Evol.     93 
36:182-198
</UI>
<AU>Goldman N
</AU>
<TI>Statistical Tests of Models of DNA Substitution
</TI>
<SU>Substitution;
    Statistical;
    Phylogeny;
    Likelihood;
    Clock;
    UK;
    DNA;
    Model
</SU>
<AB>"A test statistic suggested by Cox is employed to test the adequacy of
some statistical models of DNA sequence evolution used in the phylogenetic
inference method introduced by Felsenstein. Monte Carlo simulations are used to
assess significance levels. The resulting statistical tests provide an objective
and very general assessment of all the components of a DNA substitution model;
more specific versions of the test are devised to test individual components of
a model. In all cases, the new analyses have the additional advantage that
values of phylogenetic parameters do not have to be assumed in order to perform
the tests."
</AB>
<JT>J Mol Evol</JT>
<PY>1993</PY>
<VO>36</VO>
<PP>182-198</PP>
</SEQ>

<SEQ>
<UI>1276   Fukami-Kobaya Robustness of Maximum .. J.Mol.Evol.     91 32:79-91
</UI>
<AU>Fukami-Kobayashi K;
    Tateno Y
</AU>
<TI>Robustness of Maximum Likelihood Tree Estimation Against Different
Patterns of Base Substitutions
</TI>
<SU>Phylogeny;
    Likelihood;
    Substitution;
    Robustness;
    JP;
    Estimation
</SU>
<AB>"We first evaluated the robustness of the maximum likelihood (ML) method
in the estimation of molecular trees against different nucleotide substitution
patterns, including Jukes and Cantor's .... Namely, we conducted computer
simulations in which we could set up various evolutionary models of a
hypothetical gene, and define a true tree to which an estimated tree by the ML
method was to be compared. The results show that topology estimation by the ML
method is considerably robust against different ratios of transitions to
transversions and different GC contents .... The ML tree estimation based on
Jukes and Cantor's model is also revealed to be resistant to GC content, but
rather sensitive to the ratio of transitions to transversions."
</AB>
<JT>J Mol Evol</JT>
<PY>1991</PY>
<VO>32</VO>
<PP>79-91</PP>
</SEQ>

<SEQ>
<UI>1277   Nei,M.        Relative Efficiencies .. Phylogenetic .. 91Oxford 
Universi
</UI>
<AU>Nei M
</AU>
<TI>Relative Efficiencies of Different Tree-Making Methods for Molecular Data
</TI>
<ED>Miyamoto MM
    Cracraft J
</ED>
<BK>Phylogenetic Analysis of DNA Sequences
</BK>
<SU>Phylogeny;
    Reliability;
    Recovery;
    Review;
    USA
</SU>
<AB>"There are many different tree-making methods that can be used for
molecular data. Each of these methods has some advantages and disadvantages, and
the overall relative efficiencies of the methods in recovering the correct
phylogenetic tree are still controversial. ... In the late 1970s we initiated a
comprehensive study of this problem, considering DNA sequences (Tateno et al.,
1982). ... In this chapter, a summary of the results of these studies is
presented. Before the discussion of these results, however, the theoretical
basis of each tree-making method that is used for molecular data will be
presented."
</AB>
<PU>Oxford University Press</PU>
<PL>New York</PL>
<PY>1991</PY>
<PP>90-128</PP>
</SEQ>

<SEQ>
<UI>1278   Camin,J.H.    A Method for Deducing .. Evolution       65 
19:311-326
</UI>
<AU>Camin JH;
    Sokal RR
</AU>
<TI>A Method for Deducing Branching Sequences in Phylogeny
</TI>
<SU>Phylogeny;
    Parsimony;
    USA
</SU>
<AB>"... those trees which most closely resembled the true cladistics
invariably required for their construction the least number of postulated
evolutionary steps for the characters studied. Subsequently we examined the
possibility of reconstructing cladistics by the principle of evolutionary
parsimony. ... A method is described for reconstructing presumed cladistic
evolutionary sequences of recent organisms and its implications are discussed.
... The reconstruction proceeds on the hypothesis that the minimum number of
evolutionary steps yields the correct cladogram. The method has been programmed
for computer processing."
</AB>
<JT>Evolution</JT>
<PY>1965</PY>
<VO>19</VO>
<PP>311-326</PP>
</SEQ>

<SEQ>
<UI>1279   Archie,J.W.   A Randomization Test f.. Syst.Zool.      89 
38(3):239-252
</UI>
<AU>Archie JW
</AU>
<TI>A Randomization Test for Phylogenetic Information in Systematic Data
</TI>
<SU>Phylogenetic;
    Significance;
    Statistical;
    USA;
    Systematics
</SU>
<AB>"A randomization procedure is proposed to determine if sets of data used
for phylogenetic analysis contain phylogenetically nonrandom information. The
method compares the observed number of steps on a minimum length tree with the
mean number of steps on minimum length trees derived from the same data set
after character state assignments have been randomly permuted within each
character. Such randomized data sets will exhibit exactly the same character
state distributions as the original data but no phylogenetic information."
</AB>
<JT>Syst Zool</JT>
<PY>1989</PY>
<VO>38</VO>
<NO>3</NO>
<PP>239-252</PP>
</SEQ>

<SEQ>
<UI>1280   Huelsenbeck,J Tree-Length Distributi.. Syst.Zool.      91 
40(3):257-270
</UI>
<AU>Huelsenbeck JP
</AU>
<TI>Tree-Length Distribution Skewness: An Indicator of Phylogenetic
Information
</TI>
<SU>Phylogenetic;
    Statistical;
    Significance;
    Simulation;
    USA;
    Distribution
</SU>
<AB>"Computer simulations in which phylogenies were generated under various
conditions were used to examine the relationship between the phylogenetic signal
of a character data set, the skewness of the tree-length distribution, and the
position of the real tree relative to the most parsimonious tree for a
four-character-state system. Character data that are consistent with one phylogenetic
hypothesis produce tree-length distributions that are highly skewed to the left,
whereas character data consistent with many phylogenetic hypotheses produce more
symmetrical tree-length distributions that cannot be distinguished from
tree-length distributions produced by random character data."
</AB>
<JT>Syst Zool</JT>
<PY>1991</PY>
<VO>40</VO>
<NO>3</NO>
<PP>257-270</PP>
</SEQ>

<SEQ>
<UI>1281   Faith,D.P.    Could a Cladogram This.. Cladistics      91 
7(1):1-28
</UI>
<AU>Faith DP;
    Cranston PS
</AU>
<TI>Could a Cladogram This Short Have Arisen by Chance Alone?: On Permutation
Tests for Cladistic Structure
</TI>
<SU>Phylogenetic;
    Cladistic;
    Statistical;
    Significance;
    AU;
    Structure;
    Permutation
</SU>
<AB>"The length of the most-parsimonious tree reflects the degree to which the
observed characters co-vary such that a single tree topology can explain shared
character states among the taxa. This 'cladistic covariation' can be quantified
by comparing the length of the most parsimonious tree for the observed data set
to that found for data sets with random covariation of characters. ... The
cladistic permutation tail probability, PTP, is defined as the estimate of the
proportion of times that a tree can be found as short or shorter than the
original tree. Significant cladistic covariation exists if the PTP is less than
a prescribed value, for example, 0.05."
</AB>
<JT>Cladistics</JT>
<PY>1991</PY>
<VO>7</VO>
<NO>1</NO>
<PP>1-28</PP>
</SEQ>

<SEQ>
<UI>1282   Fitch,W.M.    Cautionary Remarks on .. Syst.Zool.      79 
28(3):375-379
</UI>
<AU>Fitch WM
</AU>
<TI>Cautionary Remarks on Using Gene Expression Events in Parsimony Procedures
</TI>
<SU>Phylogeny;
    Gene;
    Expression;
    Parsimony;
    USA
</SU>
<AB>"It can be seen that the distribution [of tree lengths] is normal .... I
have never seen a normal distribution for real sequences before. ... But while
the distribution is not skewed in the more usual fashion, that does not prevent
the figure from illustrating an important feature that is not sufficiently
recognized. That feature is that there may be more than one most parsimonious
tree topology, that there may be many tree topologies that are less parsimonious
by only a very small number of substitutions, and that the total range of
substitutions from the best to the worst tree will most commonly be much less
than the number of possible trees if the taxa are reasonably numerous."
</AB>
<JT>Syst Zool</JT>
<PY>1979</PY>
<VO>28</VO>
<NO>3</NO>
<PP>375-379</PP>
</SEQ>

<SEQ>
<UI>1283   Hillis,D.M.   Discriminating Between.. Phylogenetic .. 91Oxford 
Universi
</UI>
<AU>Hillis DM
</AU>
<TI>Discriminating Between Phylogenetic Signal and Random Noise in DNA
Sequences
</TI>
<ED>Miyamoto MM;
    Cracraft J
</ED>
<BK>Phylogenetic Analysis of DNA Sequences
</BK>
<SU>Phylogenetic;
    Signal;
    USA;
    DNA
</SU>
<AB>"In parsimony analysis, changes at nucleotide positions among aligned
sequences are mapped onto a tree, and the number of evolutionary changes
required to accommodate that tree with the data is calculated as the tree
length. For any given data set, this procedure may be repeated for many
thousands of trees .... The optimal tree is thus the one that requires the
fewest number of evolutionary changes. In this chapter, I argue that the shape
of the distribution of tree lengths contains information useful in deciding
whether or not the data set contains phylogenetic signal."
</AB>
<PU>Oxford University Press </PU>
<PL>New York </PL>
<PY>1991</PY>
<PP>278-294</PP>
</SEQ>

<SEQ>
<UI>1284   Goodman,M.    Further Remarks on the.. Syst.Zool.      79 
28(3):379-385
</UI>
<AU>Goodman M;
    Czelusniak J;
    Moore GW
</AU>
<TI>Further Remarks on the Parameter of Gene Duplication and Expression Events
in Parsimony Reconstructions
</TI>
<SU>Phylogeny;
    Expression;
    Gene;
    Duplication;
    Parsimony;
    USA
</SU>
<AB>"We share the concern expressed by Walter Fitch (1979) in his cautionary
remarks about possible pitfalls in the parsimony procedure advocated by us
(Goodman et al., 1979). Thus, while we do not fully agree with all his points,
our present remarks are intended to be complementary to his and to thereby help
elucidate the problem of constructing a correct genealogical tree from amino
acid sequence data. ... Thus, in contrast to the usual maximum parsimony
reconstruction which only minimizes the number of nucleotide replacements
throughout the tree, the new procedure minimizes the sum of nucleotide
replacements, gene duplications, and gene expression events."
</AB>
<JT>Syst Zool</JT>
<PY>1979</PY>
<VO>28</VO>
<NO>3</NO>
<PP>379-385</PP>
</SEQ>

<SEQ>
<UI>1285   Hillis,D.M.   Signal, Noise, and Rel.. J.Hered.        92 
83:189-195
</UI>
<AU>Hillis DM;
    Huelsenbeck JP
</AU>
<TI>Signal, Noise, and Reliability in Molecular Phylogenetic Analyses
</TI>
<SU>Phylogenetic;
    Statistical;
    Significance;
    Signal;
    Reliability;
    USA
</SU>
<AB>"DNA sequences and other molecular data compared among organisms may
contain phylogenetic signal, or they may be randomized with respect to
phylogenetic history. Some method is needed to distinguish phylogenetic signal
from random noise to avoid analysis of data that have been randomized with
respect to the historical relationships of the taxa being compared. ... The
distribution of tree lengths of all tree topologies (or a random sample
thereof) provides a sensitive measure of phylogenetic signal: data matrices with
phylogenetic signal produce tree-length distributions that are strongly skewed
to the left .... Tables of critical values of a skewness test statistic, g1, are
provided ...."
</AB>
<JT>J Hered</JT>
<PY>1992</PY>
<VO>83</VO>
<PP>189-195</PP>
</SEQ>

<SEQ>
<UI>1286   Hasegawa,M.   Dating of the Human-Ap.. J.Mol.Evol.     85 
22:160-174
</UI>
<AU>Hasegawa M;
    Kishino H;
    Yano T
</AU>
<TI>Dating of the Human-Ape Splitting by a Molecular Clock of Mitochondrial
DNA
</TI>
<SU>Evolutionary divergence;
    Statistical;
    Markov;
    Clock;
    JP;
    DNA
</SU>
<AB>"A new statistical method for estimating divergence dates of species from
DNA sequence data by a molecular clock approach is developed. This method takes
into account effectively the information contained in a set of DNA sequence
data."
</AB>
<JT>J Mol Evol</JT>
<PY>1985</PY>
<VO>22</VO>
<PP>160-174</PP>
</SEQ>

<SEQ>
<UI>1287   Perler,F.     The Evolution of Genes.. Cell (Cambridge 80 
20:555-566
</UI>
<AU>Perler F;
    Efstratiadis A;
    Lomedico P;
    Gilbert W;
    Kolodner R;
    Dodgson J
</AU>
<TI>The Evolution of Genes: The Chicken Preproinsulin Gene
</TI>
<SU>Evolutionary divergence;
    Evolution;
    Gene;
    Clock;
    USA
</SU>
<AB>"The divergences between insulin gene sequences, and also between globin
genes, show that changes at introns and silent positions in coding regions
appear very rapidly ..., but that the accumulation of changes in these sites
saturates, although not completely, after about 100 million years. From this we
conclude that not all of these sites are neutral and that they do not behave as
accurate evolutionary clocks over long periods of time. However, nucleotide
substitutions leading to amino acid replacements are an excellent clock. Our
analysis indicates that this clock is driven by selection."
</AB>
<JT>Cell (Cambridge, Mass )</JT>
<PY>1980</PY>
<VO>20</VO>
<PP>555-566</PP>
</SEQ>

<SEQ>
<UI>1288   Farris,J.S.   Estimating Phylogeneti.. Am.Nat.         72 
106(Sept.-Oct.
</UI>
<AU>Farris JS
</AU>
<TI>Estimating Phylogenetic Trees from Distance Matrices
</TI>
<SU>Clustering;
    Phylogeny;
    Evolutionary tree;
    USA;
    Distance;
    Phylogenetic
</SU>
<AB>"In this paper I shall describe a modification of the Wagner
tree-constructing technique of Kluge and Farris. The new procedure operates only
upon an OTU x OTU matrix of phenetic differences and has no need to reference a
character-state matrix."
</AB>
<JT>Am Nat</JT>
<PY>1972</PY>
<VO>106</VO>
<NO>Sept.-Oct.</NO>
<PP>645-668</PP>
</SEQ>

<SEQ>
<UI>1289   Sneath,P.H.A. Numerical Taxonomy: Th..                 73W. H. 
Freeman
</UI>
<AU>Sneath PHA;
    Sokal RR
</AU>
<TI>Numerical Taxonomy: The Principles and Practice of Numerical
Classification
</TI>
<SU>Hierarchical;
    Classification;
    Clustering;
    Distance;
    UPGMA;
    UK
</SU>
<AB>See section 5.5 (Sequential, Agglomerative, Hierarchic, Nonoverlapping
Clustering Methods, pp. 214-245) and in particular the discussion of UPGMA on
pages 230-234.
</AB>
<PU>W H Freeman </PU>
<PL>San Francisco </PL>
<PY>1973</PY>
<PP>xv+573</PP>
</SEQ>

<SEQ>
<UI>1290   Holmquist,R.  Analysis of Higher-Pri.. Mol.Biol.Evol.  88 
5(3):217-236
</UI>
<AU>Holmquist R;
    Miyamoto MM;
    Goodman M
</AU>
<TI>Analysis of Higher-Primate Phylogeny from Transversion Differences in
Nuclear and Mitochondrial DNA by Lake's Methods of Evolutionary Parsimony and
Operator Metrics
</TI>
<SU>Phylogeny;
    Evolutionary tree;
    Character data;
    Invariant;
    Parsimony;
    USA;
    DNA;
    Transversion
</SU>
<AB>"We concluded that there is no agreement on either the correct branching
order or differential rates of evolution among the higher primates ....
Recently, Lake developed two novel methods, based on group properties of
transition and transversion operators, that (a) permit, in principle, objective
resolution of problems of the above type and (b) attach a statistical
significance level to the conclusions drawn. In the present paper, we develop
formulas for using these two methods in tandem and apply them to study
transversion differences in nuclear [and] mitochondrial DNA ...."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1988</PY>
<VO>5</VO>
<NO>3</NO>
<PP>217-236</PP>
</SEQ>

<SEQ>
<UI>1291   Dixon,M.T.    Ribosomal RNA Secondar.. Mol.Biol.Evol.  93 
10(1):256-267
</UI>
<AU>Dixon MT;
    Hillis DM
</AU>
<TI>Ribosomal RNA Secondary Structure: Compensatory Mutations and Implications
for Phylogenetic Analysis
</TI>
<SU>Character weight;
    Phylogeny;
    Structure;
    RNA;
    USA;
    Phylogenetic;
    Secondary
</SU>
<AB>"Using sequence data from the 28S ribosomal RNA (rRNA) genes of selected
vertebrates, we investigated the effects that constraints imposed by secondary
structure have on the phylogenetic analysis of rRNA sequence data. Our analysis
indicates that characters from both base-pairing regions (stems) and
non-base-pairing regions (loops) contain phylogenetic information, as judged by
the level of support of the phylogenetic results compared with a
well-established tree based on both morphological and molecular data."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1993</PY>
<VO>10</VO>
<NO>1</NO>
<PP>256-267</PP>
</SEQ>

<SEQ>
<UI>1292   Kallersjo,M.  Skewness and Permutation Cladistics      92 
8(3):275-287
</UI>
<AU>Kallersjo M;
    Farris JS;
    Kluge AG;
    Bult C
</AU>
<TI>Skewness and Permutation
</TI>
<SU>Phylogenetic;
    Statistical;
    Significance;
    USA;
    Permutation
</SU>
<AB>"Following Fitch's (1979) early suggestion, Le Quesne (1989), Huelsenbeck
(1991) and Hillis (1991) have all recommended assessing the phylogenetic
structure in systematic data according to the skewness of the distribution of
tree lengths. We point out here that such evaluations can be misleading;
arguments for that approach are not well-considered. ... To resolve this problem
we introduce a test based on a new measure - total support - which takes
multiple most parsimonious trees into account. Our fast method for approximating
support may prove useful in analyses of very large data matrices."
</AB>
<JT>Cladistics </JT>
<PY>1992</PY>
<VO>8</VO>
<NO>3</NO>
<PP>275-287</PP>
</SEQ>

<SEQ>
<UI>1293   Sidow,A.      Compositional Statisti.. J.Mol.Evol.     90 31:51-68
</UI>
<AU>Sidow A;
    Wilson AC
</AU>
<TI>Compositional Statistics: An Improvement of Evolutionary Parsimony and its
Application to Deep Branches in the Tree of Life
</TI>
<SU>Phylogeny;
    Evolutionary tree;
    Invariant;
    Parsimony;
    Statistical;
    USA
</SU>
<AB>"We present compositional statistics, a new method of phylogenetic
inference, which is an extension of evolutionary parsimony. Compositional
statistics takes account of the base composition of the compared sequences by
using nucleotide positions that evolutionary parsimony ignores. It shares with
evolutionary parsimony the features of rate invariance and the fundamental
distinction between transitions and transversions. Of the presently available
methods of phylogenetic inference, compositional statistics is based on the
fewest and mildest assumptions about the mode of DNA sequence evolution."
</AB>
<JT>J Mol Evol</JT>
<PY>1990</PY>
<VO>31</VO>
<PP>51-68</PP>
</SEQ>

<SEQ>
<UI>1294   Eddy,S.R.     RNA Sequence Analysis .. Nucleic Acids R 94 
22(11):2079-20
</UI>
<AU>Eddy SR;
    Durbin R
</AU>
<TI>RNA Sequence Analysis using Covariance Models
</TI>
<SU>Sequence analysis;
    Probabilistic;
    Consensus sequence;
    Structure;
    UK;
    RNA;
    Covariance;
    Model
</SU>
<AB>"We describe a general approach to several RNA sequence analysis problems
using probabilistic models that flexibly describe the secondary structure and
primary sequence consensus of an RNA sequence family. We call these models
'covariance models'. A covariance model of tRNA sequences is an extremely
sensitive and discriminative tool for searching for additional tRNAs and
tRNA-related sequences in sequence databases. A model can be built automatically
from an existing sequence alignment."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1994</PY>
<VO>22</VO>
<NO>11</NO>
<PP>2079-2088</PP>
</SEQ>

<SEQ>
<UI>1295   Felsenstein,J Confidence Limits on P.. Syst.Zool.      85 
34(2):152-161
</UI>
<AU>Felsenstein J
</AU>
<TI>Confidence Limits on Phylogenies With a Molecular Clock
</TI>
<SU>Evolutionary tree;
    Robustness;
    Analytical;
    Statistical;
    Clock;
    USA;
    Confidence;
    Phylogeny
</SU>
<AB>"For three species in the presence of a molecular clock, it is possible to
compute how many steps a phylogeny must have to be significantly worse than the
most parsimonious phylogeny. ... The distribution of two statistics is obtained
by direct enumeration of all possible outcomes .... The two statistics are the
number of fewer steps in the best tree than in the next best tree, and the
number of 'phylogenetically informative' characters supporting the best tree.
These two statistics prove to be approximately equivalent in statistical power,
and tables of 95%-significance values are provided for each."
</AB>
<JT>Syst Zool</JT>
<PY>1985</PY>
<VO>34</VO>
<NO>2</NO>
<PP>152-161</PP>
</SEQ>

<SEQ>
<UI>1296   Williams,S.A. A Statistical Test tha.. Mol.Biol.Evol.  89 
6(4):325-330
</UI>
<AU>Williams SA;
    Goodman M
</AU>
<TI>A Statistical Test that Supports a Human/Chimpanzee Clade Based on
Noncoding DNA Sequence Data
</TI>
<SU>Evolutionary tree;
    Robustness;
    Analytical;
    USA;
    Statistical;
    DNA
</SU>
<AB>"Using the aligned DNA sequence data of Miyamoto et al. [1988] and Maeda
et al. [1988], all noncoding genetic material, and a simple statistical test, we
show that a Homo/Pan clade is supported at approximately the 3% level of
significance. The method accommodates polymorphism and different evolutionary
rates for different sites. All assumptions on which the statistical study is
based are made explicit."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1989</PY>
<VO>6</VO>
<NO>4</NO>
<PP>325-330</PP>
</SEQ>

<SEQ>
<UI>1297   Felsenstein,J Distance Methods: A Re.. Cladistics      86 
2(2):130-143
</UI>
<AU>Felsenstein J
</AU>
<TI>Distance Methods: A Reply to Farris
</TI>
<SU>Phylogeny;
    Statistical;
    USA;
    Distance
</SU>
<AB>"Farris (1985) claimed that my assertions about unbiasedness and
consistency of estimates of a phylogeny obtained by least squares fitting are in
error. ... It is argued, contrary to Farris's claims, that one need not avoid
nonmetric distances, and that one should avoid negative branch lengths in
estimates of phylogenies from distance data. Statistical tests of clockness,
and, to a limited extent, of alternative phylogenies can be constructed, and
these are demonstrated by example. ... Information on phylogenies is present in
distance data, as in other kinds of data, and statistical methods can be
developed to extract it."
</AB>
<JT>Cladistics </JT>
<PY>1986</PY>
<VO>2</VO>
<NO>2</NO>
<PP>130-143</PP>
</SEQ>

<SEQ>
<UI>1298   Hillis,D.M.   An Empirical Test of B.. Syst.Biol.      93 
42(2):182-192
</UI>
<AU>Hillis DM;
    Bull JJ
</AU>
<TI>An Empirical Test of Bootstrapping as a Method for Assessing Confidence in
Phylogenetic Analysis
</TI>
<SU>Evolutionary tree;
    Robustness;
    Resampling;
    Bootstrap;
    Confidence;
    USA;
    Phylogenetic
</SU>
<AB>"Although bootstrapping was first applied in phylogenetics to assess the
repeatability of a given result, bootstrap results are commonly interpreted as a
measure of the probability that a phylogenetic estimate represents the true
phylogeny. Here we use computer simulations and a laboratory-generated phylogeny
to test bootstrapping results of parsimony analyses, both as measures of
repeatability (i.e., the probability of repeating a result given a new sample of
characters) and accuracy (i.e., the probability that a result represents the
true phylogeny)."
</AB>
<JT>Syst Biol</JT>
<PY>1993</PY>
<VO>42</VO>
<NO>2</NO>
<PP>182-192</PP>
</SEQ>

<SEQ>
<UI>1299   Lanyon,S.M.   Detecting Internal Inc.. Syst.Zool.      85 
34(4):397-403
</UI>
<AU>Lanyon SM
</AU>
<TI>Detecting Internal Inconsistencies in Distance Data
</TI>
<SU>Evolutionary tree;
    Robustness;
    Jackknife;
    USA;
    Distance
</SU>
<AB>"Phylogenetic trees, derived from distance measures, may be of variable
reliability due to variance in the quality of the data sets from which they are
produced. Such trees, therefore, are of questionable value as a means of
summarizing large data sets. To improve our confidence in these trees, a
jackknife technique is presented that, in combination with existing consensus
techniques, identifies those portions of evolutionary history that are poorly
known due to inconsistencies in the data. ... The approach is a simple
modification of existing tree-generating methods."
</AB>
<JT>Syst Zool</JT>
<PY>1985</PY>
<VO>34</VO>
<NO>4</NO>
<PP>397-403</PP>
</SEQ>

<SEQ>
<UI>1300   Penny,D.      Estimating the Reliabi.. Mol.Biol.Evol.  86 
3(5):403-417
</UI>
<AU>Penny D;
    Hendy M
</AU>
<TI>Estimating the Reliability of Evolutionary Trees
</TI>
<SU>Evolutionary tree;
    Robustness;
    Resampling;
    Jackknife;
    Reliability;
    NZ
</SU>
<AB>"Six protein sequences from the same 11 mammalian taxa were used to
estimate the accuracy and reliability of phylogenetic trees using real, rather
than simulated, data. ... It was concluded that it is possible to give a
reasonable estimate of the reliability of the final tree, at least when several
sequences are combined. ... In our opinion, it is unreasonable to publish an
evolutionary tree derived from sequence data without giving an idea of the
reliability of the tree."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1986</PY>
<VO>3</VO>
<NO>5</NO>
<PP>403-417</PP>
</SEQ>

<SEQ>
<UI>1301   Sanderson,M.J Confidence Limits on P.. Cladistics      89 
5(2):113-129
</UI>
<AU>Sanderson MJ
</AU>
<TI>Confidence Limits on Phylogenies: The Bootstrap Revisited
</TI>
<SU>Evolutionary tree;
    Robustness;
    Resampling;
    Bootstrap;
    Confidence;
    USA;
    Phylogeny
</SU>
<AB>"The bootstrap, a non-parametric statistical analysis, can be used to
assess confidence limits on phylogenies. The method most widely used tests the
monophyly of individual clades. This paper proposes additional applications of
the bootstrap which provide useful information about phylogeny even when many
clades are found not to be supported with confidence (as often occurs in
practice). In such cases it is still possible to place a constraint on the
phylogenetic position of taxa by examining the relative size of the smallest
monophyletic groups that contain them."
</AB>
<JT>Cladistics </JT>
<PY>1989</PY>
<VO>5</VO>
<NO>2</NO>
<PP>113-129</PP>
</SEQ>

<SEQ>
<UI>1302   Penny,D.      Testing Methods of Evo.. Cladistics      85 
1(3):266-278
</UI>
<AU>Penny D;
    Hendy MD
</AU>
<TI>Testing Methods of Evolutionary Tree Construction
</TI>
<SU>Phylogeny;
    Evolutionary tree;
    Confidence;
    Robustness;
    Character weight;
    NZ
</SU>
<AB>"Evaluating the reliability of methods for reconstructing evolutionary
trees is discussed under the four headings of: evaluating criteria for an
optimal tree, finding the optimal tree for the criterion selected, detecting
reliable and unreliable data, and estimating the error range for the final tree.
... An objective weighting of columns (characters) can lead to an improved tree
by giving less weight to columns that are closer to a random order. The
weighting of characters is derived from the ratio of the observed to expected
number of incompatabilities for each column. Several forms of character
weighting give better trees ...."
</AB>
<JT>Cladistics </JT>
<PY>1985</PY>
<VO>1</VO>
<NO>3</NO>
<PP>266-278</PP>
</SEQ>

<SEQ>
<UI>1303                 Phylogenetic Analysis ..                 91Oxford 
Universi
</UI>
<TI>Phylogenetic Analysis of DNA Sequences
</TI>
<ED>Miyamoto MM;
    Cracraft J
</ED>
<SU>Sequence analysis;
    Phylogeny;
    Evolution;
    USA;
    DNA;
    Phylogenetic
</SU>
<AB>"This volume has assembled an internationally recognized group of
investigators representing different theoretical viewpoints and disciplines to
address critically a diversity of questions about DNA systematics. ... This book
has its roots in the symposium 'Recent Advances in Phylogenetic Studies of DNA
Sequences,' which was part of a special centennial celebration of the American
Society of Zoologists, held in conjunction with the Society of Systematic
Zoology, on December 26-30, 1989 in Boston Massachusetts. ... Each participant
concentrated on the strengths, limitations, and assumptions of their approaches
relative to others."
</AB>
<PU>Oxford University Press </PU>
<PL>New York </PL>
<PY>1991</PY>
<PP>x+358</PP>
</SEQ>

<SEQ>
<UI>1304   Crochemore,M. Pattern Matching in St.. Image Analysi.. 88Plenum
</UI>
<AU>Crochemore M;
    Perrin D
</AU>
<TI>Pattern Matching in Strings
</TI>
<ED>Cantoni V;
    Di Gesu V;
    Levialdi S
</ED>
<BK>Image Analysis and Processing II
</BK>
<SU>String match;
    Pattern match;
    Factorization;
    FR
</SU>
<AB>"In this paper, we present a new method for pattern matching in strings.
... From the practical viewpoint, its merits consist in requiring only constant
additional memory space. It can therefore be compared with the algorithm of
[Galil, Seiferas (1983)] but it is faster and simpler. From the theoretical
viewpoint, its main feature is that it makes use of a deep theorem on words
known as the critical factorization theorem due to Cesari, Duval, Vincent ....
It is also amusing that the new algorithm can be considered as a compromise
between Knuth-Morris-Pratt's and Boyer-Moore's algorithms."
</AB>
<PU>Plenum </PU>
<PL>New York </PL>
<PY>1988</PY>
<PP>67-79</PP>
</SEQ>

<SEQ>
<UI>1305   Crochemore,M. Constant-space String-.. Foundations o.. 
88Springer-Verlag
</UI>
<AU>Crochemore M
</AU>
<TI>Constant-space String-matching
</TI>
<ED>Nori ?;
    Kumar ?
</ED>
<BK>Foundations of Software Technology and Theoretical Computer Science
</BK>
<SU>String match;
    Factorization;
    FR
</SU>
<AB>"We present a string-matching algorithm with the following properties: it
is linear in time with a small multiplicative constant during all its phases; it
processes the searched text with constant memory space in addition to the
string. ... During its first phase the algorithm computes the smallest period of
the pattern, in some situations. The computation succeeds when this period is
not too great. The question remains whether there exists an algorithm computing
the smallest period of a word in linear time with constant extra memory space."
</AB>
<PU>Springer-Verlag </PU>
<PL>New York </PL>
<PY>1988</PY>
<PP>80-87</PP>
</SEQ>

<SEQ>
<UI>1306   Crochemore,M. String-Matching and Pe.. EATCS Bull.     89 
39:149-153
</UI>
<AU>Crochemore M
</AU>
<TI>String-Matching and Periods
</TI>
<SU>String match;
    Regularities;
    FR
</SU>
<AB>"We present a new string-matching algorithm based on a computation of
periods of the pattern. It is linear in time and uses a fixed number of memory
locations in addition to the text and the pattern. Therefore it is
time-space-optimal as the algorithms of [Galil &amp; Seiferas 1983] and
[Crochemore &amp; Perrin 1989]. Its main characteristic is that it scans the
pattern from left to right as [Knuth, Morris &amp; Pratt 1977] does. No
preprocessing of the pattern is needed and the complexity is independent of the
size of the pattern."
</AB>
<JT>EATCS Bull</JT>
<PY>1989</PY>
<VO>39</VO>
<PP>149-153</PP>
</SEQ>

<SEQ>
<UI>1307   Bafna,V.      Genome Rearrangements ..                 94
</UI>
<AU>Bafna V;
    Pevzner PA
</AU>
<TI>Genome Rearrangements and Sorting by Reversals
</TI>
<SU>Genome;
    Rearrangement;
    Reversal;
    USA
</SU>
<AB>Preprint dated 14 Oct. 1994, 28 pp. "Recently, Kececioglu and Sankoff gave
the first approximation algorithm for sorting by reversals with guaranteed error
bound 2 and identified open problems related to chromosome rearrangements. One
of these problems is Gollan's conjecture on the reversal diameter of the
symmetric group. This paper proves the conjecture. Further the problem of
expected reversal distance between two random permutations is investigated. ...
An approximation algorithm for signed permutations is presented, which provides
a performance guarantee of 3/2. Finally, using the signed permutations approach,
an approximation algorithm for sorting by reversals is described, which achieves
a performance guarantee of 7/4."
</AB>
<PY>1994</PY>
</SEQ>

<SEQ>
<UI>1308   Waterman,M.   Sequence Comparison Si..                 94
</UI>
<AU>Waterman M;
    Vingron M
</AU>
<TI>Sequence Comparison Significance and Poisson Approximation
</TI>
<SU>Sequence comparison;
    Poisson;
    Approximation;
    Chen-Stein;
    USA;
    Significance
</SU>
<AB>Preprint received 7 Dec. 1994, 44 pp. "The Chen-Stein method of Poisson
approximation has been used to establish theorems about comparison of two DNA or
protein sequences. The most useful result for sequence alignment applies to
alignment scoring for aligned letters and no gaps. However there has not been a
valid method to assign statistical significance to alignment scores with gaps.
In this paper we extend Poisson approximation techniques using the Aldous
clumping heuristic to a practical method of estimating statistical
significance."
</AB>
<PY>1994</PY>
</SEQ>

<SEQ>
<UI>1309   Day,W.H.E.    Estimating Phylogenies.. Classificatio.. 
91Springer-Verlag
</UI>
<AU>Day WHE
</AU>
<TI>Estimating Phylogenies with Invariant Functions of Data
</TI>
<ED>Bock HH;
    Ihm P
</ED>
<BK>Classification, Data Analysis, and Knowledge Organization: Models and
Methods with Applications
</BK>
<SU>Phylogeny;
    Invariant;
    CA;
    Function
</SU>
<AB>"What is encouraging, however, is that researchers are beginning to
develop methods of estimating phylogenies which may be robust under conditions
where parsimony is not. A strategy shared by some of these methods (Cavender,
Felsenstein (1987), Lake (1987)) is to use invariant functions of the data to
identify the correct topology of the corresponding phylogeny. But which
invariants, and how? What assumptions underlie these approaches? I discuss these
issues and indicate the direction this research seems to be taking."
</AB>
<PU>Springer-Verlag </PU>
<PL>Berlin </PL>
<PY>1991</PY>
<PP>248-253</PP>
</SEQ>

<SEQ>
<UI>1310   Wolf,K.       Variance Estimation in.. Classificatio.. 
91Springer-Verlag
</UI>
<AU>Wolf K;
    Degens PO
</AU>
<TI>Variance Estimation in the Additive Tree Model
</TI>
<ED>Bock HH;
    Ihm P
</ED>
<BK>Classification, Data Analysis, and Knowledge Organization: Models and
Methods with Applications
</BK>
<SU>Additive tree;
    Robustness;
    Confidence;
    Likelihood;
    Distance;
    DE;
    Variance;
    Model;
    Estimation
</SU>
<AB>"By the use of stochastic models it is possible to judge procedures for
fitting additive trees to dissimilarity data. We use the simple additive error
model ... to analyse the accuracy of an estimated additive tree by estimating
its variance, too. Analogously to the three-object variance estimator in the
ultrametric case ... we propose a four-object variance estimator based on the
simple maximum-likelihood ... variance estimation for all subsets consisting of
any four objects of an additive tree. In contrast to variance estimation using
the residual sum of squares this new estimator is not based on the assumed i.e.
estimated structure of the given dissimilarity data."
</AB>
<PU>Springer-Verlag </PU>
<PL>Berlin </PL>
<PY>1991</PY>
<PP>262-269</PP>
</SEQ>

<SEQ>
<UI>1311   Archie,J.W.   Homoplasy Excess Stati.. Syst.Zool.      90 
39(2):169-174
</UI>
<AU>Archie JW
</AU>
<TI>Homoplasy Excess Statistics and Retention Indices: A Reply to Farris
</TI>
<SU>Evolutionary tree;
    Character data;
    Phylogenetic;
    USA
</SU>
<AB>"In a recent paper (Archie, 1989) I introduced a new approach to measuring
levels of homoplasy in phylogenetic data sets in which the observed level of
homoplasy is compared to the maximum achievable for a data set containing no
phylogenetic information. ... The properties and behavior of the homoplasy
excess statistics ... were compared to the consistency index (Kluge and Farris,
1969). ... Farris (1990) discusses several minor points regarding Archie (1989)
which deserve comment ...."
</AB>
<JT>Syst Zool</JT>
<PY>1990</PY>
<VO>39</VO>
<NO>2</NO>
<PP>169-174</PP>
</SEQ>

<SEQ>
<UI>1312   Sanderson,M.J Flexible Phylogeny Rec.. Syst.Zool.      90 
39(4):414-420
</UI>
<AU>Sanderson MJ
</AU>
<TI>Flexible Phylogeny Reconstruction: A Review of Phylogenetic Inference
Packages Using Parsimony
</TI>
<SU>Phylogeny;
    Parsimony;
    Program;
    Review;
    USA;
    Phylogenetic
</SU>
<AB>A review of PHYLIP, Hennig86 and PAUP. "In short, hastily formed opinions
are not hard to find, and it is desirable to have a set of standards by which to
judge these programs while minimizing the personal biases that inevitably
intrude." Criteria: efficiency, documentation, flexibility, synthesis,
benchmark comparisons.
</AB>
<JT>Syst Zool</JT>
<PY>1990</PY>
<VO>39</VO>
<NO>4</NO>
<PP>414-420</PP>
</SEQ>

<SEQ>
<UI>1313   Faith,D.P.    Probability, Parsimony.. Syst.Biol.      92 
41(2):252-257
</UI>
<AU>Faith DP;
    Cranston PS
</AU>
<TI>Probability, Parsimony, and Popper
</TI>
<SU>Cladistic;
    Phylogenetic;
    Character data;
    Resampling;
    AU;
    Probability;
    Parsimony
</SU>
<AB>"Randomization tests for cladistic structure ... compare the minimum
length of the tree found for the original character data to the length of the
tree achieved for corresponding randomized data sets. ... In practice, the
proportion of all data sets (observed and random) having a tree length as short
or shorter than that of the observed tree is defined as the 'cladistic
permutation tail probability' or PTP (Faith, Cranston 1991). ... In this paper,
we formalize this connection between PTP tests and degree of corroboration by
developing a link between the tail probability associated with PTP tests and the
concept of corroboration as developed by Popper (1959)."
</AB>
<JT>Syst Biol</JT>
<PY>1992</PY>
<VO>41</VO>
<NO>2</NO>
<PP>252-257</PP>
</SEQ>

<SEQ>
<UI>1314   Knight,A.     Substitution Bias, Wei.. Syst.Biol.      93 
42(1):18-31
</UI>
<AU>Knight A;
    Mindell DP
</AU>
<TI>Substitution Bias, Weighting of DNA Sequence Evolution, and the
Phylogenetic Position of Fea's Viper
</TI>
<SU>Substitution;
    Character weight;
    USA;
    Evolution;
    DNA;
    Phylogenetic;
    Bias
</SU>
<AB>"Character state weights were assigned prior to phylogenetic analysis in
proportion to the ratio of expected to observed nucleotide differences in
pairwise comparisons of sequences. ... This method may help in resolving rapid
radiations and other relationships obscured by homoplasy."
</AB>
<JT>Syst Biol</JT>
<PY>1993</PY>
<VO>42</VO>
<NO>1</NO>
<PP>18-31</PP>
</SEQ>

<SEQ>
<UI>1315   Steel,M.A.    Parsimony can be Consi.. Syst.Biol.      93 
42(4):581-587
</UI>
<AU>Steel MA;
    Hendy MD;
    Penny D
</AU>
<TI>Parsimony can be Consistent!
</TI>
<SU>Phylogeny;
    Parsimony;
    Consistency;
    NZ
</SU>
<AB>"A desired property of any method for reconstructing evolutionary trees is
that it be consistent, i.e., as sequences become longer the method will recover
the correct tree with probability tending to 1. ... We report here that the
original conclusion is too sweeping in that the problem is not with the
parsimony criterion itself but rather with the implementation of the criterion.
... Many criteria, including parsimony and compatibility, are consistent after
appropriate nonlinear transformations that adjust for multiple hits (Penny et
al., 1993)."
</AB>
<JT>Syst Biol</JT>
<PY>1993</PY>
<VO>42</VO>
<NO>4</NO>
<PP>581-587</PP>
</SEQ>

<SEQ>
<UI>1316   Penny,D.      Some Recent Progress w.. N.Z.J.Bot.      93 
31(3):275-288
</UI>
<AU>Penny D;
    Watson EE;
    Hickson RE;
    Lockhart PJ
</AU>
<TI>Some Recent Progress with Methods for Evolutionary Trees
</TI>
<SU>Phylogeny;
    Evolutionary tree;
    Consistency;
    Spectral analysis;
    Parsimony;
    NZ
</SU>
<AB>"We discuss methods for inferring evolutionary trees from these patterns
or signals under five properties desired for an ideal method. These five
desiderata are that the methods be efficient (fast), consistent, powerful,
robust, and falsifiable. Our conclusion is that corrections for multiple changes
in sequences are the most important factor for any method to be consistent. Most
optimality criteria, including compatibility and parsimony, become consistent
when the sequences have appropriate corrections for multiple changes.
Conversely, virtually no methods are consistent without adjustments for multiple
changes."
</AB>
<JT>N Z J Bot</JT>
<PY>1993</PY>
<VO>31</VO>
<NO>3</NO>
<PP>275-288</PP>
</SEQ>

<SEQ>
<UI>1317   Schoniger,M.  More Reliable Phylogen.. Information a.. 
93Springer-Verlag
</UI>
<AU>Schoniger M;
    von Haeseler A
</AU>
<TI>More Reliable Phylogenies by Properly Weighted Nucleotide Substitutions
</TI>
<ED>Opitz O;
    Lausen B;
    Klar R
</ED>
<BK>Information and Classification: Concepts, Methods and Applications
</BK>
<SU>Phylogeny;
    Substitution;
    Evolutionary rate;
    DE;
    Nucleotide
</SU>
<AB>"The efficiency of the neighbor-joining method under a variety of
substitution rates, transition-transversion biases and model trees is studied.
If substitution rates vary considerably and the ratio of transitions and
transversions is large, even a Kimura (1980) two-parameter correction cannot
guarantee reconstruction of the model tree. We show that application of the
combinatorial weighting method by Williams and Fitch (1990) together with the
Jukes-Cantor (1969) correction significantly improves the efficiency of tree
reconstructions for a wide range of evolutionary parameters."
</AB>
<PU>Springer-Verlag </PU>
<PL>Berlin </PL>
<PY>1993</PY>
<PP>413-420</PP>
</SEQ>

<SEQ>
<UI>1318   Trifonov,E.N. Nucleotide Sequences a.. Classificatio.. 88Elsevier 
Scienc
</UI>
<AU>Trifonov EN
</AU>
<TI>Nucleotide Sequences as a Language: Morphological Classes of Words
</TI>
<ED>Bock HH
</ED>
<BK>Classification and Related Methods of Data Analysis
</BK>
<SU>Sequence analysis;
    Linguistic;
    IL;
    Nucleotide;
    Word;
    Language
</SU>
<AB>"Like every known written language the nucleotide sequences are
repetitive, i.e. certain words (strings) of the four letter alphabet ... occur
frequently, while other combinations of letters are avoided. There are several
morphologically distinct classes of words (morphemes) in this language of the
nucleotide sequences (Gnomic language). ... Oligonucleotide ('syllabic')
composition of words of semantic dictionary of Gnomic language is discussed.
Gnomic 'speech apparatus' appears to favor certain combinations of letters."
</AB>
<PU>Elsevier Science Publishers B V (North Holland)</PU>
<PL>Amsterdam</PL>
<PY>1988</PY>
<PP>57-64</PP>
</SEQ>

<SEQ>
<UI>1319   Vach,W.       The Jukes-Cantor Trans.. Analyzing and.. 
92Springer-Verlag
</UI>
<AU>Vach W
</AU>
<TI>The Jukes-Cantor Transformation and Additivity of Estimated Genetic
Distances
</TI>
<ED>Schader M
</ED>
<BK>Analyzing and Modeling Data and Knowledge
</BK>
<SU>Substitution;
    Evolutionary distance;
    DE;
    Genetic;
    Distance
</SU>
<AB>"We give a simple derivation for the Jukes-Cantor transformation. The
importance of the transformation with respect to distance-based tree
constructing methods is demonstrated. We show that it is not justified to expect
that the transformation improves the additivity of the estimated distance if
non-additivity is measured by the degree of violation of the equation in the
four-point condition. Finally, some effects of model violation are discussed."
</AB>
<PU>Springer-Verlag </PU>
<PL>Berlin </PL>
<PY>1992</PY>
<PP>141-150</PP>
</SEQ>

<SEQ>
<UI>1320   Wheeler,W.C.  Nucleic Acid Sequence .. Cladistics      90 
6(4):363-367
</UI>
<AU>Wheeler WC
</AU>
<TI>Nucleic Acid Sequence Phylogeny and Random Outgroups
</TI>
<SU>Phylogeny;
    Character data;
    Outgroup;
    USA;
    Nucleic acid
</SU>
<AB>"When divergent taxa are used to root networks, it is assumed that the
character states in the outgroup have historical similarity to those in the
ingroup. Yet, if the data are nucleic acid sequences, the character states
shared by a divergent outgroup may be based not on history but on random
similarity. A simple procedure is proposed to test this possibility. In the
absence of an appropriate outgroup, root position can be estimated with the use
of an asymmetrical character transformation matrix. If the matrix is
sufficiently biased, it can supply the polarity information usually derived from
an outgroup."
</AB>
<JT>Cladistics </JT>
<PY>1990</PY>
<VO>6</VO>
<NO>4</NO>
<PP>363-367</PP>
</SEQ>

<SEQ>
<UI>1321   Wheeler,W.C.  Combinatorial Weights .. Cladistics      90 
6(3):269-275
</UI>
<AU>Wheeler WC
</AU>
<TI>Combinatorial Weights in Phylogenetic Analysis: A Statistical Parsimony
Procedure
</TI>
<SU>Phylogeny;
    Character weight;
    Statistical;
    Parsimony;
    USA;
    Combinatorial;
    Phylogenetic
</SU>
<AB>"A data dependent weighting procedure is developed to allow the comparison
of phylogenetic trees based on nucleic acid sequence data. The sampling error of
this cladogram 'cost' is then examined, permitting statistical evaluation of the
cost differential."
</AB>
<JT>Cladistics </JT>
<PY>1990</PY>
<VO>6</VO>
<NO>3</NO>
<PP>269-275</PP>
</SEQ>

<SEQ>
<UI>1322   Felsenstein,J Distance Methods for I.. Evolution       84 
38(1):16-24
</UI>
<AU>Felsenstein J
</AU>
<TI>Distance Methods for Inferring Phylogenies: A Justification
</TI>
<SU>Phylogeny;
    USA;
    Distance
</SU>
<AB>"There are at least two different logical frameworks underlying distance
methods. Farris (1981) has presented a major critique of distance methods,
finding a number of them to be ill-justified. We shall see that his conclusions
come from adopting one of the two logical frameworks, and that when the other is
adopted, many of these methods do turn out to have a coherent logical basis.
The Path Length Interpretation. ... A Statistical Framework for Distance Methods."
</AB>
<JT>Evolution </JT>
<PY>1984</PY>
<VO>38</VO>
<NO>1</NO>
<PP>16-24</PP>
</SEQ>

<SEQ>
<UI>1323   Drolet,S.     Quadratic Tree Invaria.. J.Theor.Biol.   90 
144:117-129
</UI>
<AU>Drolet S;
    Sankoff D
</AU>
<TI>Quadratic Tree Invariants for Multivalued Characters
</TI>
<SU>Phylogeny;
    Invariant;
    Character data;
    CA
</SU>
<AB>"Generalization to characters other than binary is difficult because of
the computational size of the problem - when the Cavender-Felsenstein method is
applied directly to the case of three-valued characters, a quartic polynomial
involving 22,050 terms results. Algebraic manipulation with the help of MACSYMA,
however, shows that there are quadratic branch-length invariants in this case as
well. Similarities in the form of the binary and trinary character invariants
suggest a form for the case of four-valued characters and numerous tests
confirm this. It is this case which will be of use in phylogenetic
reconstruction based on nucleotide sequence data."
</AB>
<JT>J Theor Biol</JT>
<PY>1990</PY>
<VO>144</VO>
<PP>117-129</PP>
</SEQ>

<SEQ>
<UI>1324   De Soete,G.   A Least Squares Algori.. Psychometrika   83 
48(4):621-626
</UI>
<AU>De Soete G
</AU>
<TI>A Least Squares Algorithm for Fitting Additive Trees to Proximity Data
</TI>
<SU>Phylogeny;
    Additive tree;
    Distance;
    Least squares;
    Belgium;
    Square;
    Algorithm
</SU>
<AB>"A least squares algorithm for fitting additive trees to proximity data is
described. The algorithm uses a penalty function to enforce the four point
condition on the estimated path length distances. The algorithm is evaluated in
a small Monte Carlo study. Finally, an illustrative application is presented."
</AB>
<JT>Psychometrika </JT>
<PY>1983</PY>
<VO>48</VO>
<NO>4</NO>
<PP>621-626</PP>
</SEQ>

<SEQ>
<UI>1325   Buneman,P.    The Recovery of Trees .. Mathematics i.. 
71Edinburgh Unive
</UI>
<AU>Buneman P
</AU>
<TI>The Recovery of Trees from Measures of Dissimilarity
</TI>
<ED>Hodson FR;
    Kendall DG;
    Tautu P
</ED>
<BK>Mathematics in the Archaeological and Historical Sciences
</BK>
<SU>Phylogeny;
    Additive tree;
    Distance;
    UK;
    Recovery
</SU>
<AB>"The problem of inferring an evolutionary tree from a set of measurements
is one that crops up in various fields .... For example, amino-acid sequences of
the same protein extracted from different organisms can be determined, and one
can attempt, from the dissimilarities between these sequences, to construct a
phylogenetic tree of these organisms. ... The object of this paper is to show
that there is a method for inferring a tree from a [dissimilarity coefficient]
which has properties that may make it rather more attractive than other
currently available methods." Definition of the four-point condition for
additive tree metrics.
</AB>
<PU>Edinburgh University Press </PU>
<PL>Edinburgh </PL>
<PY>1971</PY>
<PP>387-395</PP>
</SEQ>

<SEQ>
<UI>1326   Le Quesne,W.J Frequency Distribution.. Cladistics      89 
5(4):395-407
</UI>
<AU>Le Quesne WJ
</AU>
<TI>Frequency Distributions of Lengths of Possible Networks from a Data Matrix
</TI>
<SU>Phylogeny;
    Statistical;
    Distribution;
    UK;
    Network;
    Matrix
</SU>
<AB>"The aim of the present work has been to examine the frequency
distributions of the unrooted tree lengths obtained with real and random data,
and to see whether any assessment of the information content of the data matrix
could be made from the characteristics of this distribution."
</AB>
<JT>Cladistics </JT>
<PY>1989</PY>
<VO>5</VO>
<NO>4</NO>
<PP>395-407</PP>
</SEQ>

<SEQ>
<UI>1327   Prager,E.M.   Construction of Phylog.. J.Mol.Evol.     78 
11:129-142
</UI>
<AU>Prager EM;
    Wilson AC
</AU>
<TI>Construction of Phylogenetic Trees for Proteins and Nucleic Acids:
Empirical Evaluation of Alternative Matrix Methods
</TI>
<SU>Phylogeny;
    Distance;
    Parsimony;
    UPGMA;
    USA;
    Protein;
    Nucleic acid;
    Phylogenetic;
    Matrix
</SU>
<AB>"The methods of Fitch and Margoliash and of Farris for the construction of
phylogenetic trees were compared. It is suggested that where input data are
likely to include overestimates as well as true estimates and underestimates of
the actual distances between taxonomic units, the F-M method is the most
reasonable to use for constructing phylogenies from distance matrices. ... By
contrast, where it is known that each input datum is indeed either a true
estimate or an underestimate of the actual distance between 2 taxonomic units,
the Farris procedure appears, on theoretical grounds, to be the method of
choice. Amino acid and nucleotide sequence data are in this category."
</AB>
<JT>J Mol Evol</JT>
<PY>1978</PY>
<VO>11</VO>
<PP>129-142</PP>
</SEQ>

<SEQ>
<UI>1328   Farris,J.S.   Distance Data Revisited  Cladistics      85 
1(1):67-85
</UI>
<AU>Farris JS
</AU>
<TI>Distance Data Revisited
</TI>
<SU>Phylogeny;
    USA;
    Distance
</SU>
<AB>"Objections to my earlier demonstration, that the branch lengths of trees
fitted to distance matrices have no physical interpretation, are shown to be
ill-founded. ... A method is introduced for constructing multiple trees of
optimal or near-optimal fit to distance data, and this is found to give better
performance than previous methods. Most published trees based on distances have
been poorly chosen. Consensus trees of several trees with near-optimal fit are
found to be quite poorly resolved, and it appears that molecular distances
seldom provide much useful information on phylogenetic relationships."
</AB>
<JT>Cladistics </JT>
<PY>1985</PY>
<VO>1</VO>
<NO>1</NO>
<PP>67-85</PP>
</SEQ>

<SEQ>
<UI>1329   Farris,J.S.   Distance Data in Phylo.. Advances in C.. 81New York 
Botani
</UI>
<AU>Farris JS
</AU>
<TI>Distance Data in Phylogenetic Analysis
</TI>
<ED>Funk VA;
    Brooks DR
</ED>
<BK>Advances in Cladistics: Proceedings of the First Meeting of the Willi
Hennig Society
</BK>
<SU>Phylogeny;
    USA;
    Distance;
    Phylogenetic
</SU>
<AB>"It is my aim here to concentrate on the methodological issues posed by
distance analyses. I shall begin by tracing the development of techniques in the
context of immunological distance, perhaps the most common type of distance
data. Later I shall show how that discussion can be extended to other kinds of
molecular distances, and to distances generally." Reconstructing Genealogies.
Optimal Branch Length Fitting. Improving Fit. Measures of Fit. Clocks and
Ultrametrics. Path Length Interpretations. Euclidean Distances. Manhattan
Distances.
</AB>
<PU>New York Botanical Garden </PU>
<PL>Bronx, NY </PL>
<PY>1981</PY>
<PP>3-23</PP>
</SEQ>

<SEQ>
<UI>1330   Swofford,D.L. On the Utility of the .. Advances in C.. 81New York 
Botani
</UI>
<AU>Swofford DL
</AU>
<TI>On the Utility of the Distance Wagner Procedure
</TI>
<ED>Funk VA;
    Brooks DR
</ED>
<BK>Advances in Cladistics: Proceedings of the First Meeting of the Willi
Hennig Society
</BK>
<SU>Phylogeny;
    USA;
    Distance
</SU>
<AB>"I find little empirical justification for the notion that the F-M
[Fitch-Margoliash] method is preferable to the distance Wagner procedure. Indeed, I
will show here that the same data sets and trees used by Prager and Wilson
(1978) to demonstrate the supposed superiority of the F-M method can in fact be
used to support exactly the opposite conclusion. That is, when the two methods
are compared fairly, the distance Wagner procedure outperforms the F-M method."
</AB>
<PU>New York Botanical Garden </PU>
<PL>Bronx, NY </PL>
<PY>1981</PY>
<PP>25-43</PP>
</SEQ>

<SEQ>
<UI>1331   Felsenstein,J PHYLIP (Phylogeny Infe..                 94
</UI>
<AU>Felsenstein J
</AU>
<TI>PHYLIP (Phylogeny Inference Package) Version 3.5c: Executables for
Macintosh
</TI>
<SU>Phylogeny;
    Program;
    USA
</SU>
<AB>This README document introduces executable programs that are available
from J. Felsenstein by anonymous ftp (file transfer protocol). For information
on how to obtain the programs, request the file 'Getting PHYLIP 3.5 by ftp' from
userid 'joe' at electronic mail address 'genetics.washington.edu'.
</AB>
<PY>1994</PY>
</SEQ>

<SEQ>
<UI>1332   Farris,J.S.   Distances and Statistics Cladistics      86 
2(2):144-157
</UI>
<AU>Farris JS
</AU>
<TI>Distances and Statistics
</TI>
<SU>Phylogeny;
    Statistical;
    USA;
    Distance
</SU>
<AB>"Felsenstein's claim of approximate additivity for sequence differences is
based on an unjustified model, as is his proposed nonadditive fitting method.
His advocacy of the nonnegativity restriction on fitted branch lengths rests on
the false premise that distances are additive. His proposed significance test
confounds sampling error with departures from additivity and rests on false
assumptions of additivity and of independence of distances. His additive fitting
program lacks any useful facility for recognizing ambiguities in distance data."
</AB>
<JT>Cladistics </JT>
<PY>1986</PY>
<VO>2</VO>
<NO>2</NO>
<PP>144-157</PP>
</SEQ>

<SEQ>
<UI>1333   Penny,D.      Towards a Basis for Cl.. J.Theor.Biol.   82 
96:129-142
</UI>
<AU>Penny D
</AU>
<TI>Towards a Basis for Classification: The Incompleteness of Distance
Measures, Incompatibility Analysis and Phenetic Classification
</TI>
<SU>Classification;
    Character data;
    NZ;
    Distance
</SU>
<AB>"It is shown that information is lost in converting the original data to
distances in that it is in general not possible to recover the original data
from a distance matrix. ... It is concluded that because of this loss of
information the methods of phenetic classification are inherently weaker than
methods that retain the original data. An indication is given of how information
is lost in transforming to distances. Incompatibility matrices are shown not to
contain all the original information but these methods usually retain the
original data for tree building."
</AB>
<JT>J Theor Biol</JT>
<PY>1982</PY>
<VO>96</VO>
<PP>129-142</PP>
</SEQ>

<SEQ>
<UI>1334   Vingron,M.    Towards Integration of..                 94
</UI>
<AU>Vingron M;
    von Haeseler A
</AU>
<TI>Towards Integration of Multiple Alignment and Phylogenetic Tree
Construction
</TI>
<SU>Multiple alignment;
    Evolutionary tree;
    Phylogeny;
    DE;
    Phylogenetic
</SU>
<AB>Preprint dated June 1994, 16 pp. "The central problem in the study of
molecular evolution is the reconstruction of the history of a set of biological
sequences in the form of a phylogenetic tree. One of the steps in calculating
this tree is computation of a multiple alignment of the set of sequences. Most
existing approaches treat the two problems of multiple alignment and tree
construction as separate while in fact they influence each other. Based on
three-way alignments of pre-aligned groups of sequences we adapt a commonly used
tree construction procedure to produce both tree and multiple alignment
simultaneously. A sufficient criterion to prevent the introduction of edges with
negative length reduces the number of three-way alignments that need to be
computed."
</AB>
<PY>1994</PY>
</SEQ>

<SEQ>
<UI>1335   De Soete,G.   On the Construction of.. Z.Naturforsch.T 83 
38:156-158
</UI>
<AU>De Soete G
</AU>
<TI>On the Construction of 'Optimal' Phylogenetic Trees
</TI>
<SU>Phylogeny;
    Distance;
    Belgium;
    Phylogenetic
</SU>
<AB>"An iterative algorithm for constructing the optimal phylogenetic tree
from a given set of dissimilarity data is described. The procedure is applied
for illustrative purposes to a data set compiled by Fitch and Margoliash."
</AB>
<JT>Z Naturforsch Teil C </JT>
<PY>1983</PY>
<VO>38</VO>
<PP>156-158</PP>
</SEQ>

<SEQ>
<UI>1336   Waterman,M.S. Additive Evolutionary .. J.Theor.Biol.   77 
64:199-213
</UI>
<AU>Waterman MS;
    Smith TF;
    Singh M;
    Beyer WA
</AU>
<TI>Additive Evolutionary Trees
</TI>
<SU>Evolutionary tree;
    Phylogeny;
    Additive tree;
    USA
</SU>
<AB>"Metric trees are dendrograms which ... have numerical values attached to
the branches. ... Metric trees and additive matrices are discussed and the
uniqueness of the metric tree for an additive dissimilarity matrix is shown. A
simple algorithm is given to generate the metric tree for an additive
dissimilarity matrix. This algorithm is extended to non-additive dissimilarity
matrices through the use of linear programming."
</AB>
<JT>J Theor Biol</JT>
<PY>1977</PY>
<VO>64</VO>
<PP>199-213</PP>
</SEQ>

<SEQ>
<UI>1337   Swofford,D.L. Reconstructing Ancestr.. Math.Biosci.    87 
87:199-229
</UI>
<AU>Swofford DL;
    Maddison WP
</AU>
<TI>Reconstructing Ancestral Character States Under Wagner Parsimony
</TI>
<SU>Evolutionary tree;
    Phylogeny;
    Character optimization;
    USA;
    Parsimony
</SU>
<AB>"The problem of assigning optimal character states to the hypothetical
ancestors of an evolutionary tree under the Wagner parsimony criterion is
examined. A proof is provided for the correctness of Farris's well-known, but
previously unproven, algorithm for solving this problem. However, the solution
is not, in general, unique, and Farris's method obtains only a subset (generally
only one) of the possible solutions. Algorithms that discover other solutions
and that resolve ambiguities through the imposition of ancillary criteria are
developed and discussed."
</AB>
<JT>Math Biosci</JT>
<PY>1987</PY>
<VO>87</VO>
<PP>199-229</PP>
</SEQ>

<SEQ>
<UI>1338   Maddison,W.P. MacClade: Analysis of ..                 92Sinauer 
Associa
</UI>
<AU>Maddison WP;
    Maddison DR
</AU>
<TI>MacClade: Analysis of Phylogeny and Character Evolution. Version 3
</TI>
<SU>Phylogeny;
    Character data;
    Program;
    USA;
    Evolution
</SU>
<AB>"This book is both a manual for the computer program MacClade, describing
its features and potential uses, as well as a portrayal of a phylogenetic
approach to studying diversity and evolution. It is relatively easy to see the
diversity of living organisms, but it has proved more difficult to see that
diversity in terms of its history; the slow development of a thoroughly
phylogenetic perspective in biology attests to this challenge. Together this
book and program present methods for analyzing and exploring phylogenetic
hypotheses, including hypotheses about character evolution."
</AB>
<PU>Sinauer Associates, Inc.</PU>
<PL>Sunderland, MA</PL>
<PY>1992</PY>
<PP>xi+398-0</PP>
</SEQ>

<SEQ>
<UI>1339   Sober,E.      Parsimony in Systemati.. Annu.Rev.Ecol.S 83 
14:335-357
</UI>
<AU>Sober E
</AU>
<TI>Parsimony in Systematics: Philosophical Issues
</TI>
<SU>Phylogeny;
    Parsimony;
    Likelihood;
    USA;
    Systematics
</SU>
<AB>"If one is entirely ignorant of the contingent properties of the
evolutionary process that generated a set of taxa (e.g. what forces acted when
and with what intensities, or what the probabilities were of certain
evolutionary transitions), can one still reasonably use parsimony? Cladists
generally say yes, while their critics disagree." Competing methods of
phylogenetic inference. A priori defences of parsimony. A posteriori criticisms
of parsimony. A likelihood justification of parsimony. Summary.
</AB>
<JT>Annu Rev Ecol Syst</JT>
<PY>1983</PY>
<VO>14</VO>
<PP>335-357</PP>
</SEQ>

<SEQ>
<UI>1340   Sober,E.      Reconstructing the Pas..                 88MIT Press
</UI>
<AU>Sober E
</AU>
<TI>Reconstructing the Past. Parsimony, Evolution, and Inference
</TI>
<SU>Phylogeny;
    Parsimony;
    Likelihood;
    USA;
    Evolution
</SU>
<AB>The biological problem of phylogenetic inference. The philosophical
problem of simplicity. The principle of the common cause. Cladistics and the
limits of hypothetico-deductivism. Parsimony, likelihood, and Consistency. A
model branching process.
</AB>
<PU>MIT Press </PU>
<PL>Cambridge, MA </PL>
<PY>1988</PY>
<PP>xviii+265-0</PP>
</SEQ>

<SEQ>
<UI>1341   Maddison,W.   Reconstructing Charact.. Cladistics      89 
5(4):365-377
</UI>
<AU>Maddison W
</AU>
<TI>Reconstructing Character Evolution on Polytomous Cladograms
</TI>
<SU>Phylogeny;
    Character data;
    USA;
    Evolution
</SU>
<AB>"New algorithms for both ordered and unordered characters are presented to
reconstruct character evolution under the uncertain-resolution interpretation of
polytomies. These algorithms allow the cladogram to resolve itself so as to be
favourable for the character whose evolution is being reconstructed. Because
different characters may have different favourable resolutions, it is not
possible in general to use these algorithms to determine the total parsimony of
a polytomous cladogram ..., for which the only adequate approach is to find a
most parsimonious dichotomous resolution of the cladogram."
</AB>
<JT>Cladistics </JT>
<PY>1989</PY>
<VO>5</VO>
<NO>4</NO>
<PP>365-377</PP>
</SEQ>

<SEQ>
<UI>1342   Wang,C.       A Subgraph Problem fro.. J.Comput.Biol.  94 
1(3):227-234
</UI>
<AU>Wang C
</AU>
<TI>A Subgraph Problem from Restriction Maps of DNA
</TI>
<SU>Restriction;
    Mapping;
    DNA;
    USA
</SU>
<AB>"Computing the minimum number of edge removals needed to convert a
bipartite graph into an interval graph was proposed by Waterman and Griggs in
the study of restriction maps of DNA. We show that this problem is NP-complete
and we give a polynomial algorithm that finds an edge-maximum interval subgraph
for trees. Then various heuristics can be devised using this algorithm."
</AB>
<JT>J Comput Biol</JT>
<PY>1994</PY>
<VO>1</VO>
<NO>3</NO>
<PP>227-234</PP>
</SEQ>

<SEQ>
<UI>1343   Nanney,D.L.   Shifting Ditypic Site .. J.Mol.Evol.     89 
28:451-459
</UI>
<AU>Nanney DL;
    Preparata RM;
    Preparata FP;
    Meyer EB;
    Simon EM
</AU>
<TI>Shifting Ditypic Site Analysis: Heuristics for Expanding the Phylogenetic
Range of Nucleotide Sequences in Sankoff Analyses
</TI>
<SU>Phylogeny;
    Character data;
    USA;
    Nucleotide;
    Heuristic;
    Phylogenetic
</SU>
<AB>"We describe and illustrate a simple heuristic approach to the Sankoff
methods for construction of parsimonious evolutionary trees from nucleotide
sequence data. The procedure is intended to permit more valid inferences,
particularly from relatively short sequences, concerning relationships among
taxa separated for long time intervals. The procedure is based on the great
variability of evolutionary plasticity among sites in the molecules and removes
from consideration the more highly variable sites. ... Only 'ditypic sites,'
i.e., sites observed in only two evolutionary states within the array, are used
in making phylogenetic inferences."
</AB>
<JT>J Mol Evol</JT>
<PY>1989</PY>
<VO>28</VO>
<PP>451-459</PP>
</SEQ>

<SEQ>
<UI>1344   Han,J.        Over-Representation of.. Nucleic Acids R 94 
22(9):1735-174
</UI>
<AU>Han J;
    Hsu C;
    Zhu Z;
    Longshore JW;
    Finley WH
</AU>
<TI>Over-Representation of the Disease Associated (CAG) and (CGG) Repeats in
the Human Genome
</TI>
<SU>Repeat;
    Repetition;
    Genome;
    Sequence analysis;
    USA
</SU>
<AB>"Expansion of trimer repeats has recently been described as a new type of
human mutation. Of the 64 possible trimer compositions, only the CGG and CAG
repeats have been implicated in genetic diseases. This study intends to address
two questions: (1) What makes the CGG and CAG repeats unique? (2) Could other
trimer repeats be involved in this type of mutation? ... The computer aided
sequence analysis studies reported here may help to understand the molecular
mechanisms of trimer repeat expansion."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1994</PY>
<VO>22</VO>
<NO>9</NO>
<PP>1735-1740</PP>
</SEQ>

<SEQ>
<UI>1345   Zhang,Z.      An Exponential Example.. J.Comput.Biol.  94 
1(3):235-239
</UI>
<AU>Zhang Z
</AU>
<TI>An Exponential Example for a Partial Digest Mapping Algorithm
</TI>
<SU>Digest;
    Mapping;
    USA;
    Algorithm
</SU>
<AB>"The partial digest problem for small-scale DNA physical mapping is known
in computer science as the turnpike reconstruction problem. Although no
polynomial algorithm for this problem is known, a simple backtracking algorithm
of Skiena et al. works well in practice. Weiss raises the question whether an
exponential example exists for this algorithm. This paper presents such an
exponential example for this backtracking algorithm."
</AB>
<JT>J Comput Biol</JT>
<PY>1994</PY>
<VO>1</VO>
<NO>3</NO>
<PP>235-239</PP>
</SEQ>

<SEQ>
<UI>1346   Lake,J.A.     Origin of the Eukaryot.. Nature (Lond.)  88 331(14 
Jan.):1
</UI>
<AU>Lake JA
</AU>
<TI>Origin of the Eukaryotic Nucleus Determined by Rate-Invariant Analysis of
rRNA Sequences
</TI>
<SU>Phylogeny;
    Invariant;
    USA
</SU>
<AB>"The second application [of Lake's (1987) method of evolutionary
parsimony] to more taxa is to infer a fully resolved branching of many species
(Lake 1988); however, it has yet to be described in sufficient detail to be
reproduced." - Swofford and Olsen (1990), p. 474.
</AB>
<JT>Nature (Lond ) </JT>
<PY>1988</PY>
<VO>331</VO>
<NO>14 Jan.</NO>
<PP>184-186</PP>
</SEQ>

<SEQ>
<UI>1347   Martin,D.R.   Equivalence Classes fo.. J.Comput.Biol.  94 
1(3):241-253
</UI>
<AU>Martin DR
</AU>
<TI>Equivalence Classes for the Double-Digest Problem with Coincident Cut
Sites
</TI>
<SU>Restriction;
    Mapping;
    Digest;
    DNA;
    USA
</SU>
<AB>"Pevzner (1994) completely characterized the solutions to the [Double
Digest Problem (DDP)] in the case of no coincident cut sites by associating
solutions to DDP with alternating Eulerian paths in an edge-bicolored graph. In
this paper we extend the definition of cassettes and their transformations to
the general case allowing coincident cut sites. Solutions to the DDP in the
general case are again characterized by associating solutions to the DDP with
alternating Eulerian cycles in an extended graph."
</AB>
<JT>J Comput Biol</JT>
<PY>1994</PY>
<VO>1</VO>
<NO>3</NO>
<PP>241-253</PP>
</SEQ>

<SEQ>
<UI>1348   Fukami,K.     On the Maximum Likelih.. J.Mol.Evol.     89 
28:460-464
</UI>
<AU>Fukami K;
    Tateno Y
</AU>
<TI>On the Maximum Likelihood Method for Estimating Molecular Trees:
Uniqueness of the Likelihood Point
</TI>
<SU>Phylogeny;
    Likelihood;
    JP
</SU>
<AB>"Studies are carried out on the uniqueness of the stationary point on the
likelihood function for estimating molecular phylogenetic trees, yielding proof
that there exists at most one stationary point, i.e., the maximum point, in the
parameter range for the one parameter model of nucleotide substitution. The
proof is simple yet applicable to any type of tree topology with an arbitrary
number of operational taxonomic units (OTUs). The proof ensures that any valid
approximation algorithm be able to reach the unique maximum point under the
conditions mentioned above."
</AB>
<JT>J Mol Evol</JT>
<PY>1989</PY>
<VO>28</VO>
<PP>460-464</PP>
</SEQ>

<SEQ>
<UI>1349   Hendy,M.D.    Branch and Bound Algor.. Math.Biosci.    82 
59:277-290
</UI>
<AU>Hendy MD;
    Penny D
</AU>
<TI>Branch and Bound Algorithms to Determine Minimal Evolutionary Trees
</TI>
<SU>Evolutionary tree;
    Phylogeny;
    Program;
    NZ;
    Algorithm
</SU>
<AB>"Two practical branch and bound algorithms for determining minimal and
near-minimal phylogenetic trees from protein sequence data are presented. A
mathematical description and analysis of phylogenetic trees introduces these
algorithms. A comment on efficiency and fine tuning completes the paper. An
example is cited where computer time was reduced from an estimated 55 days for a
total search, to just under 5 minutes."
</AB>
<JT>Math Biosci</JT>
<PY>1982</PY>
<VO>59</VO>
<PP>277-290</PP>
</SEQ>

<SEQ>
<UI>1350   Sharkey,M.J.  A Hypothesis-Independe.. Cladistics      89 
5(1):63-86
</UI>
<AU>Sharkey MJ
</AU>
<TI>A Hypothesis-Independent Method of Character Weighting for Cladistic
Analysis
</TI>
<SU>Character weight;
    Compatibility;
    CA;
    Cladistic
</SU>
<AB>"A hypothesis-independent method of weighting cladistic characters, based
on character compatibility, is proposed. The method is used in two fashions, to
generate cladograms, and to select from multiple minimum length cladograms. ...
The method is contrasted with other weighting techniques which are generally
found to be hypothesis dependent."
</AB>
<JT>Cladistics </JT>
<PY>1989</PY>
<VO>5</VO>
<NO>1</NO>
<PP>63-86</PP>
</SEQ>

<SEQ>
<UI>1351   Hendy,M.D.    Identification of Phyl.. J.Theor.Biol.   78 
71:441-452
</UI>
<AU>Hendy MD;
    Penny D;
    Foulds LR
</AU>
<TI>Identification of Phylogenetic Trees of Minimal Length
</TI>
<SU>Phylogeny;
    Parsimony;
    NZ;
    Identification;
    Phylogenetic
</SU>
<AB>"The problem of determining an optimal phylogenetic tree from a set of
data is an example of the Steiner problem in graphs. There is no efficient
algorithm for solving this problem with reasonably large data sets. In the
present paper an approach is described that proves in some cases that a given
tree is optimal without testing all possible trees. ... We simultaneously
attempt to reduce the total length of the tree and increase the lower bound.
When these are equal it is not possible to make a shorter tree with a given data
set and given criterion."
</AB>
<JT>J Theor Biol</JT>
<PY>1978</PY>
<VO>71</VO>
<PP>441-452</PP>
</SEQ>

<SEQ>
<UI>1352   Hendy,M.D.    Proving Phylogenetic T.. Math.Biosci.    80 51:71-88
</UI>
<AU>Hendy MD;
    Foulds LR;
    Penny D
</AU>
<TI>Proving Phylogenetic Trees Minimal with l-Clustering and Set Partitioning
</TI>
<SU>Phylogeny;
    Parsimony;
    Evolutionary tree;
    NZ;
    Phylogenetic
</SU>
<AB>"The problem of determining a minimal length phylogenetic (evolutionary)
tree from a set of aligned protein sequences is defined mathematically. Although
this is an example of the Steiner problem in graphs (SPG), we exploit the
special nature of the character sequences to solve it more efficiently than by
using SPG algorithms. This is done by using a key theorem concerning partitions
of the data. All optimal solutions for problems with less than six species are
classified. In problems where optimality is not immediately achieved, the data
must be partitioned."
</AB>
<JT>Math Biosci</JT>
<PY>1980</PY>
<VO>51</VO>
<PP>71-88</PP>
</SEQ>

<SEQ>
<UI>1353   Lopez-Ortiz,A Linear Pattern Matchin.. SIGACT News     94 
25(3):114-121
</UI>
<AU>Lopez-Ortiz A
</AU>
<TI>Linear Pattern Matching of Repeated Substrings
</TI>
<SU>Pattern match;
    Repeat;
    String match;
    Data structure;
    Search tree;
    CA
</SU>
<AB>"Weiner (1973) presented a very original algorithm that performs linear
time recognition of repeated instances of a substring in a string. Weiner's
approach to this problem was as important as the solution to the problem
itself. ... Unfortunately, Weiner's paper may be difficult for modern readers. Familiar
objects such as trees and other data structures are described using notation
drawn from automata theory. Typographical errors and overloading of terms
contribute to the difficulty. This paper attempts to explain Weiner's result in
a more accessible manner."
</AB>
<JT>SIGACT News </JT>
<PY>1994</PY>
<VO>25</VO>
<NO>3</NO>
<PP>114-121</PP>
</SEQ>

<SEQ>
<UI>1354   Moore,G.M.    An Iterative Approach .. J.Theor.Biol.   73 
38:423-457
</UI>
<AU>Moore GM;
    Goodman M;
    Barnabas J
</AU>
<TI>An Iterative Approach from the Standpoint of the Additive Hypothesis to
the Dendrogram Problem Posed by Molecular Data Sets
</TI>
<SU>Phylogeny;
    Distance;
    USA
</SU>
<AB>"The problem of constructing a dendrogram depicting phylogenetic
relationships for a collection of contemporary species is considered. An
approach was developed based on the additive hypothesis in which each 'length'
between two species can be described by the shortest sum of lengths for the
individual links on the dendrogram topology which connect the two species. The
additive hypothesis holds equally well if the dendrogram is replaced by its
corresponding (rootless) network. Network topologies are defined set
theoretically in terms of the initial, contemporary species ...."
</AB>
<JT>J Theor Biol</JT>
<PY>1973</PY>
<VO>38</VO>
<PP>423-457</PP>
</SEQ>

<SEQ>
<UI>1355   Maddison,W.P. Interactive Analysis o.. Folia Primatol. 89 
53(1-4):190-20
</UI>
<AU>Maddison WP;
    Maddison DR
</AU>
<TI>Interactive Analysis of Phylogeny and Character Evolution Using the
Computer Program MacClade
</TI>
<SU>Phylogeny;
    Character data;
    Program;
    USA;
    Evolution
</SU>
<AB>"Computer programs for phylogenetic analysis have been important tools in
systematics and evolutionary biology, but most have been designed primarily for
the reconstruction of phylogenetic trees and not the interpretation of patterns
of character evolution. Described here is the computer program MacClade,
designed for interactive analysis of character evolution and phylogeny."
</AB>
<JT>Folia Primatol</JT>
<PY>1989</PY>
<VO>53</VO>
<NO>1-4</NO>
<PP>190-202</PP>
</SEQ>

<SEQ>
<UI>1356   Sober,E.      A Likelihood Justifica.. Cladistics      85 
1(3):209-233
</UI>
<AU>Sober E
</AU>
<TI>A Likelihood Justification of Parsimony
</TI>
<SU>Phylogeny;
    Parsimony;
    Likelihood;
    USA
</SU>
<AB>"A connection is established between maximally parsimonious cladograms and
trees of highest likelihood. The assumptions needed to prove this are derivable
from the structure of evolutionary theory and are independent of the frequency
of homoplasy. The bearing of this justification on alternative methods of
phylogenetic inference and on Felsenstein's (1978) proof that parsimony and
other phylogenetic methods can be statistically inconsistent is discussed."
</AB>
<JT>Cladistics </JT>
<PY>1985</PY>
<VO>1</VO>
<NO>3</NO>
<PP>209-233</PP>
</SEQ>

<SEQ>
<UI>1357   Sober,E.      Parsimony and Characte.. Cladistics      86 
2(1):28-42
</UI>
<AU>Sober E
</AU>
<TI>Parsimony and Character Weighting
</TI>
<SU>Phylogeny;
    Character weight;
    Likelihood;
    USA;
    Parsimony
</SU>
<AB>"The likelihood justification of parsimony proposed in Sober (1983, 1984)
is applied to some problems posed by character weighting. An argument is
provided for thinking that the point frequency of a character is not a good
descriptor for parsimonious reconstructions of a phylogeny. The idea that a good
character will be conservative or nonadaptive is also examined from a likelihood
point of view."
</AB>
<JT>Cladistics </JT>
<PY>1986</PY>
<VO>2</VO>
<NO>1</NO>
<PP>28-42</PP>
</SEQ>

<SEQ>
<UI>1358   McGuire,J.B.  On the Reconstruction .. J.Theor.Biol.   78 
75:141-147
</UI>
<AU>McGuire JB;
    Thompson CJ
</AU>
<TI>On the Reconstruction of an Evolutionary Order
</TI>
<SU>Phylogeny;
    Axiomatic;
    USA
</SU>
<AB>"The problem of reconstructing an evolutionary order from various
taxonomic criteria may be thought of as specifying a computer program or
evolution function e that takes as input the taxonomic orders based on the
criteria and produces a single composite evolutionary order as output. We
specify four conditions that any such e should satisfy. Taken separately the
conditions seem reasonable but taken together they are inconsistent."
</AB>
<JT>J Theor Biol</JT>
<PY>1978</PY>
<VO>75</VO>
<PP>141-147</PP>
</SEQ>

<SEQ>
<UI>1359   Vach,W.       Least Squares Approxim.. CSQ - Comput.St 91 
3:203-218
</UI>
<AU>Vach W;
    Degens PO
</AU>
<TI>Least Squares Approximation of Additive Trees to Dissimilarities -
Characterizations and Algorithms
</TI>
<SU>Additive tree;
    Distance;
    Least squares;
    Square;
    Approximation;
    Monte Carlo;
    DE;
    Characterization;
    Algorithm
</SU>
<AB>"We consider the problem of fitting an additive tree to a given
dissimilarity matrix by least squares approximation. We present several
characterizations of the local solutions to this approximation problem. One of
them leads directly to a new algorithmic approach extending the agglomerative
construction principle which is well established in hierarchical clustering.
Some new and traditional tree constructing methods are compared in a Monte Carlo
study with regard to their ability to redetect substructures of a true tree."
</AB>
<JT>CSQ - Comput Statist Quart</JT>
<PY>1991</PY>
<VO>3</VO>
<PP>203-218</PP>
</SEQ>

<SEQ>
<UI>1360   Rodrigo,A.G.  A Modification to Whee.. Cladistics      92 
8(2):165-170
</UI>
<AU>Rodrigo AG
</AU>
<TI>A Modification to Wheeler's Combinatorial Weights Calculations
</TI>
<SU>Substitution;
    Character weight;
    Phylogeny;
    NZ;
    Combinatorial
</SU>
<AB>"Wheeler (1990) proposed a procedure for weighting the transformations
between all nucleotide pairs, from a set of aligned nucleotide sequences. ...
In this paper I show that the normalization procedure estimates the conditional
probability ... instead of the 'total' probability .... I argue that an
estimate of the latter probability is more appropriate for phylogenetic analysis and I
present a modification of Wheeler's method. Finally, I show how we may estimate
asymmetric transformation probabilities using an outgroup. If there is a
reasonable outgroup available, this method may be preferable to the other
described here."
</AB>
<JT>Cladistics </JT>
<PY>1992</PY>
<VO>8</VO>
<NO>2</NO>
<PP>165-170</PP>
</SEQ>

<SEQ>
<UI>1361   Rodrigo,A.G.  An Information-Rich Ch.. N.Z.Nat.Sci.    89 
16:97-103
</UI>
<AU>Rodrigo AG
</AU>
<TI>An Information-Rich Character Weighting Procedure for Parsimony Analysis
</TI>
<SU>Phylogeny;
    Character weight;
    Parsimony;
    NZ
</SU>
<AB>"A weighting procedure is proposed which takes account of prior
information pertaining to the characters used in a parsimony analysis. This
information comes from specific knowledge about the biology of the group in
question, as well as general evolutionary theory. ... The procedure is an
iterative one, and can be terminated once the resultant tree has converged to a
'constant' value, or after a predetermined number of runs. The resultant tree
may or may not be as short as the most parsimonious tree. It is argued that in
taking account of prior information, the proposed procedure is information-rich
(IR)."
</AB>
<JT>N Z Nat Sci</JT>
<PY>1989</PY>
<VO>16</VO>
<PP>97-103</PP>
</SEQ>

<SEQ>
<UI>1362   Albert,V.A.   On the Rationale and U.. Cladistics      92 
8(1):73-83
</UI>
<AU>Albert VA;
    Mishler BD
</AU>
<TI>On the Rationale and Utility of Weighting Nucleotide Sequence Data
</TI>
<SU>Phylogeny;
    Character weight;
    USA;
    Nucleotide
</SU>
<AB>"These issues are germane to the weighting scheme recently proposed by W.
Wheeler (1990) for use with nucleotide sequence data. We will address Wheeler's
weighting approach from two angles: (i) the assumptions that must be made in
order to justify its use; and (ii) its convergence, albeit in an inferior
manner, to the within-character weighting approach already developed by David
Sankoff and colleagues ...."
</AB>
<JT>Cladistics </JT>
<PY>1992</PY>
<VO>8</VO>
<NO>1</NO>
<PP>73-83</PP>
</SEQ>

<SEQ>
<UI>1363   Mishler,B.D.  The Use of Nucleic Aci.. Taxon           88 
37:391-395
</UI>
<AU>Mishler BD;
    Bremer K;
    Humphries CJ;
    Churchill SP
</AU>
<TI>The Use of Nucleic Acid Sequence Data in Phylogenetic Reconstruction
</TI>
<SU>Phylogeny;
    USA;
    Phylogenetic;
    Nucleic acid
</SU>
<AB>"Considerable interest has recently been focused on nucleic acid sequence
data as a source of phylogenetic information. ... With respect to higher-level
relationships of green plants, 5S RNA sequences provide the most information at
the present time .... In an earlier paper ... we attempted to apply these data
to a cladistic analysis of the green plants, but were discouraged with the
results because of considerable homoplasy in the data. Steele et al. (1988) have
objected to our rejection of these particular data in that analysis. We wish to
respond to their concerns and more generally discuss prospects and problems with
nucleic acid sequences as systematic evidence."
</AB>
<JT>Taxon </JT>
<PY>1988</PY>
<VO>37</VO>
<PP>391-395</PP>
</SEQ>

<SEQ>
<UI>1364   Wheeler,W.    Quo Vadis?               Cladistics      92 
8(1):85-86
</UI>
<AU>Wheeler W
</AU>
<TI>Quo Vadis?
</TI>
<SU>Phylogeny;
    Character weight;
    USA
</SU>
<AB>"Albert and Mishler (1992) raise several points in the criticism of my two
papers (Wheeler, 1990a,b) describing and using combinatorial weights. Overall,
the points raised are divisible into two types, those which arise from a
misconception as to the meaning of the weights and those which are based on a
probabilistic model of their construction. I will discuss their criticism in
this light."
</AB>
<JT>Cladistics </JT>
<PY>1992</PY>
<VO>8</VO>
<NO>1</NO>
<PP>85-86</PP>
</SEQ>

<SEQ>
<UI>1365   Bryant,H.N.   The Role of Permutatio.. Syst.Biol.      92 
41(2):258-263
</UI>
<AU>Bryant HN
</AU>
<TI>The Role of Permutation Tail Probability Tests in Phylogenetic Systematics
</TI>
<SU>Phylogenetic;
    Statistical;
    Probability;
    CA;
    Permutation;
    Systematics
</SU>
<AB>"Faith and Cranston (1992) attempted to forge a formal link between the
cladistic permutation tail probability (PTP) associated with their randomization
test for cladistic structure (Faith and Cranston, 1991) and Karl Popper's (1959)
concept of corroboration. ... As a reviewer of an earlier version of their
paper, I argued that the null model - namely that the characters in the data
matrix will covary at random - is contrary to the basic axioms of phylogenetic
systematics. ... Discussion of these problems leads to a slightly different
interpretation of the role of PTP testing in the evaluation of most-parsimonious
cladograms."
</AB>
<JT>Syst Biol</JT>
<PY>1992</PY>
<VO>41</VO>
<NO>2</NO>
<PP>258-263</PP>
</SEQ>

<SEQ>
<UI>1366   Felsenstein,J Methods for Inferring .. Numerical Tax.. 
83Springer-Verlag
</UI>
<AU>Felsenstein J
</AU>
<TI>Methods for Inferring Phylogenies: A Statistical View
</TI>
<ED>Felsenstein J
</ED>
<BK>Numerical Taxonomy. NATO ASI Series, Vol. G1
</BK>
<SU>Phylogeny;
    Statistical;
    USA
</SU>
<AB>"While throughout the rest of science it is generally accepted that
statistics is the framework within which inferences from data ought to be made,
in systematics this is a minority viewpoint. Nonstatistical principles such as
parsimony are usually invoked as underlying the logic of the inferences. I think
that these principles are preferred precisely because they have an aura of
certainty that a statistical framework cannot provide. The remainder of this
paper will explore the implications of a statistical viewpoint on inferring
phylogenies. Readers who want a more extended account can consult my recent
review of methods for inferring phylogenies (Felsenstein 1982)."
</AB>
<PU>Springer-Verlag </PU>
<PL>Berlin </PL>
<PY>1983</PY>
<PP>315-334</PP>
</SEQ>

<SEQ>
<UI>1367   Felsenstein,J Parsimony and Likeliho.. Syst.Zool.      86 
35(4):617-626
</UI>
<AU>Felsenstein J;
    Sober E
</AU>
<TI>Parsimony and Likelihood: An Exchange
</TI>
<SU>Phylogeny;
    Likelihood;
    Parsimony;
    USA
</SU>
<AB>"This is intended as an exploration of the differences between the authors
on the relationship between parsimony and likelihood. We have used the format
pioneered by Harper and Platnick (1978) of an exchange of comments, each by one
of the authors."
</AB>
<JT>Syst Zool</JT>
<PY>1986</PY>
<VO>35</VO>
<NO>4</NO>
<PP>617-626</PP>
</SEQ>

<SEQ>
<UI>1368   Neff,N.A.     A Rational Basis for A.. Syst.Zool.      86 
35(1):110-123
</UI>
<AU>Neff NA
</AU>
<TI>A Rational Basis for A Priori Character Weighting
</TI>
<SU>Character weight;
    USA
</SU>
<AB>"Previously presented arguments for and against character weighting in
systematic analyses are briefly reviewed and the bases for different weighting
methods summarized. A priori and a posteriori methods are defined. I conclude
that a priori weighting is the only noncircular approach for weighting of
characters in the construction or recognition of groups of taxa, but that no
objective method of a priori weighting has been proposed to date. A
hypothetico-deductive methodology for character analysis completely prior to and
independent of cladistic analysis (or phylogeny reconstruction) is briefly summarized."
</AB>
<JT>Syst Zool</JT>
<PY>1986</PY>
<VO>35</VO>
<NO>1</NO>
<PP>110-123</PP>
</SEQ>

<SEQ>
<UI>1369   Saitou,N.     A Theoretical Study of.. Syst.Zool.      89 
38(1):1-6
</UI>
<AU>Saitou N
</AU>
<TI>A Theoretical Study of the Underestimation of Branch Lengths by the
Maximum Parsimony Principle
</TI>
<SU>Phylogeny;
    Evolutionary tree;
    Parsimony;
    JP
</SU>
<AB>"The degree of underestimation of branch lengths by the maximum parsimony
principle is studied. The expected number of nucleotide changes per site under
the maximum parsimony principle is computed, and it is compared with the
expected number of nucleotide substitutions. ... It is shown that as long as the
evolutionary distance is less than 0.2, the maximum parsimony principle gives
good estimates of nucleotide substitutions. When the evolutionary distance is
greater than 0.2, however, the method gives gross underestimates of nucleotide
substitutions."
</AB>
<JT>Syst Zool</JT>
<PY>1989</PY>
<VO>38</VO>
<NO>1</NO>
<PP>1-6</PP>
</SEQ>

<SEQ>
<UI>1370   Goloboff,P.A. Character Optimization.. Cladistics      93 
9(4):433-436
</UI>
<AU>Goloboff PA
</AU>
<TI>Character Optimization and Calculation of Tree Lengths
</TI>
<SU>Evolutionary tree;
    Character optimization;
    USA;
    Optimization
</SU>
<AB>"In cladistics, character optimization (Farris, 1970) is the process of
finding the possible assignments of states to the internal nodes of a tree such
that the steps, or length, for the character are the minimum possible .... For
non-additive characters, all the states that occur in at least one possible
optimization can be found using Fitch's (1971) two-pass algorithm. For additive
characters, the only published algorithm is Swofford and Maddison's (1987). I
describe here another algorithm which, like Swofford and Maddison's, deals only
with dichotomous trees, but is possibly more efficient and simpler to program."
</AB>
<JT>Cladistics </JT>
<PY>1993</PY>
<VO>9</VO>
<NO>4</NO>
<PP>433-436</PP>
</SEQ>

<SEQ>
<UI>1371   Farris,J.S.   Methods for Computing .. Syst.Zool.      70 19:83-92
</UI>
<AU>Farris JS
</AU>
<TI>Methods for Computing Wagner Trees
</TI>
<SU>Phylogeny;
    Evolutionary tree;
    Character optimization;
    USA
</SU>
<AB>"The article derives some properties of Wagner Trees and Networks and
describes computational procedures for Prim Networks, the Wagner Method,
Rootless Wagner Method and optimization of hypothetical intermediates."
</AB>
<JT>Syst Zool</JT>
<PY>1970</PY>
<VO>19</VO>
<PP>83-92</PP>
</SEQ>

<SEQ>
<UI>1372   Swofford,D.L. Parsimony, Character-s.. Systematics, .. 92
</UI>
<AU>Swofford DL;
    Maddison WP
</AU>
<TI>Parsimony, Character-state Reconstructions, and Evolutionary Inferences
</TI>
<ED>Mayden RL
</ED>
<BK>Systematics, Historical Ecology, and North American Freshwater Fishes
</BK>
<SU>Phylogeny;
    Parsimony;
    Character optimization;
    USA
</SU>
<AB>Goloboff (1993), p. 436
</AB>
<PY>1992</PY>
<PP>186-223</PP>
</SEQ>

<SEQ>
<UI>1373   Goloboff,P.A. Estimating Character W.. Cladistics      93 
9(1):83-91
</UI>
<AU>Goloboff PA
</AU>
<TI>Estimating Character Weights during Tree Search
</TI>
<SU>Phylogeny;
    Evolutionary tree;
    Character weight;
    USA
</SU>
<AB>"A new method for weighting characters according to their homoplasy is
proposed; the method is non-iterative and does not require independent
estimations of weights. It is based on searching trees with maximum total fit,
with character fits defined as a concave function of homoplasy. Then, when
comparing trees, differences in steps occurring in characters which show more
homoplasy on the trees are less influential. The reliability of the characters
is estimated, during the analysis, as a logical implication of the trees being
compared. The 'fittest' trees imply that the characters are maximally reliable
and, given character conflict, have fewer steps for the characters which fit the
tree better."
</AB>
<JT>Cladistics </JT>
<PY>1993</PY>
<VO>9</VO>
<NO>1</NO>
<PP>83-91</PP>
</SEQ>

<SEQ>
<UI>1374   Hide,W.       Biological Evaluation .. J.Comput.Biol.  94 
1(3):199-215
</UI>
<AU>Hide W;
    Burke J;
    Davison DB
</AU>
<TI>Biological Evaluation of d2, an Algorithm for High-Performance Sequence
Comparison
</TI>
<SU>Sequence comparison;
    Database search;
    Sequence proximity;
    Sequence database;
    USA;
    Algorithm
</SU>
<AB>"A number of algorithms exist for searching sequence databases for
biologically significant similarities based on the primary sequence similarity
of aligned sequences. We have determined the biological sensitivity and
selectivity of d2, a high-performance comparison algorithm that rapidly
determines the relative dissimilarity of large datasets of genetic sequences.
d2 uses sequence-word multiplicity as a simple measure of dissimilarity. It is not
constrained by the comparison of direct sequence alignments and so can use word
contexts to yield new information on relationships. ... A theoretical analysis
of the expectation for scores is presented."
</AB>
<JT>J Comput Biol</JT>
<PY>1994</PY>
<VO>1</VO>
<NO>3</NO>
<PP>199-215</PP>
</SEQ>

<SEQ>
<UI>1375   Davis,J.I.    Character Removal as a.. Cladistics      93 
9(2):201-210
</UI>
<AU>Davis JI
</AU>
<TI>Character Removal as a Means for Assessing Stability of Clades
</TI>
<SU>Evolutionary tree;
    Robustness;
    USA
</SU>
<AB>"The stability of each clade resolved by a data set can be assessed as the
minimum number of characters that, when removed, cause resolution of the clade
to be lost; a clade is regarded as having been lost when it does not occur in
the strict consensus tree. The clade stability index (CSI) is the ratio of this
minimum number of characters to the number of informative characters in the data
set. The CSI of a clade can range from 0 (absence from the consensus tree of the
complete data set) to 1 (all informative characters must be removed for the
clade to fail to be resolved)."
</AB>
<JT>Cladistics </JT>
<PY>1993</PY>
<VO>9</VO>
<NO>2</NO>
<PP>201-210</PP>
</SEQ>

<SEQ>
<UI>1376   Cracraft,J.   Parsimony and Phylogen.. Phylogenetic .. 91Oxford 
Universi
</UI>
<AU>Cracraft J;
    Helm-Bychowski K
</AU>
<TI>Parsimony and Phylogenetic Inference using DNA Sequences: Some
Methodological Strategies
</TI>
<ED>Miyamoto MM
    Cracraft J
</ED>
<BK>Phylogenetic Analysis of DNA Sequences
</BK>
<SU>Phylogeny;
    Parsimony;
    Reliability;
    Informativeness;
    USA;
    DNA;
    Phylogenetic
</SU>
<AB>"This chapter addresses two of the more general problems involving the use
of parsimony procedures in phylogenetic inference: (1) given that there are
physico-chemical/functional constraints on sequence evolution, especially in
sequences coding for proteins or structural RNAs, how might parsimony be applied
in order to infer phylogenetic relationships, and (2) how might we judge the
phylogenetic informativeness of sequence data?"
</AB>
<PU>Oxford University Press </PU>
<PL>New York </PL>
<PY>1991</PY>
<PP>184-220</PP>
</SEQ>

<SEQ>
<UI>1377   Fitch,W.M.    The Estimate of Total .. Phil.Trans.R.So 86 
312:317-324
</UI>
<AU>Fitch WM
</AU>
<TI>The Estimate of Total Nucleotide Substitutions from Pairwise Differences
is Biased
</TI>
<SU>Evolutionary rate;
    Evolutionary divergence;
    Substitution;
    USA;
    Nucleotide
</SU>
<AB>"A nomographic method is presented that estimates the number of nucleotide
substitutions since the common ancestor of two nucleotide sequences with no
assumption about the proportion of transition and transversion substitutions
except that it is constant over time. ... Mitochondrial data provide evidence
that, for this and probably other current models correcting for superimposed
substitutions, one or more of the underlying assumptions is incorrect. This is
because there is some unknown systematic bias affecting this evolutionary
process. It is suggested that at least part of the bias arises from incorrectly
assuming that all sites are variable."
</AB>
<JT>Phil Trans R Soc Lond Ser B </JT>
<PY>1986</PY>
<VO>312</VO>
<PP>317-324</PP>
</SEQ>

<SEQ>
<UI>1378   Zharkikh,A.   Statistical Properties.. J.Mol.Evol.     92 
35(4):356-366
</UI>
<AU>Zharkikh A;
    Li WH
</AU>
<TI>Statistical Properties of Bootstrap Estimation of Phylogenetic Variability
from Nucleotide Sequences. II. Four Taxa Without a Molecular Clock
</TI>
<SU>Evolutionary tree;
    Bootstrap;
    Statistical;
    Clock;
    USA;
    Nucleotide;
    Phylogenetic;
    Estimation
</SU>
<AB>Zharkikh, Li (1993), p. 125
</AB>
<JT>J Mol Evol</JT>
<PY>1992</PY>
<VO>35</VO>
<NO>4</NO>
<PP>356-366</PP>
</SEQ>

<SEQ>
<UI>1379   Hedges,S.B.   The Number of Replicat.. Mol.Biol.Evol.  92 
9(2):366-369
</UI>
<AU>Hedges SB
</AU>
<TI>The Number of Replications Needed for Accurate Estimation of the Bootstrap
P Value in Phylogenetic Studies
</TI>
<SU>Phylogeny;
    Bootstrap;
    USA;
    Phylogenetic;
    Estimation
</SU>
<AB>"The bootstrap is a statistical method for obtaining a nonparametric
estimate of error. ... The application of bootstrapping to phylogeny estimation
is a tradeoff between the maximum number of replications that can be performed
by the researcher in a reasonable amount of time and the minimum number of
replications needed for accurate estimation of the bootstrap P value (BP). The
purpose of the present report is to explore the variance (and hence the
accuracy) of the phylogenetic BP and to establish guidelines for efficient
bootstrap sampling."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1992</PY>
<VO>9</VO>
<NO>2</NO>
<PP>366-369</PP>
</SEQ>

<SEQ>
<UI>1380   Mindell,D.P.  Aligning DNA Sequences.. Phylogenetic .. 91Oxford 
Universi
</UI>
<AU>Mindell DP
</AU>
<TI>Aligning DNA Sequences: Homology and Phylogenetic Weighting
</TI>
<ED>Miyamoto MM
    Cracraft J
</ED>
<BK>Phylogenetic Analysis of DNA Sequences
</BK>
<SU>Multiple alignment;
    Sequence weight;
    Homology;
    USA;
    DNA;
    Phylogenetic
</SU>
<AB>"The object of this chapter is to examine the practice of DNA sequence
alignment in the context of homology assessment. I point out that species
sequences should be aligned in descending order of phylogenetic relationship
(phylogenetic weighting of alignments) to maintain the continuity of information
which forms the basis of relationships of homology. Using mitochondrial
ribosomal RNA (rRNA) sequences, I also show how shuffling the order of input for
sequences in multiple alignments may be used to help determine phylogenetic
relationships among taxa whose divergences occurred relatively close together in
time."
</AB>
<PU>Oxford University Press </PU>
<PL>New York </PL>
<PY>1991</PY>
<PP>73-89</PP>
</SEQ>

<SEQ>
<UI>1381   Sidow,A.      Compositional Statisti.. Phylogenetic .. 91Oxford 
Universi
</UI>
<AU>Sidow A;
    Wilson AC
</AU>
<TI>Compositional Statistics Evaluated by Computer Simulations
</TI>
<ED>Miyamoto MM
    Cracraft J
</ED>
<BK>Phylogenetic Analysis of DNA Sequences
</BK>
<SU>Phylogeny;
    Evolutionary tree;
    Invariant;
    Simulation;
    USA
</SU>
<AB>"This chapter is about a new method called compositional statistics, which
is most suitable for elucidating relationships among highly diverged sequences.
In contrast to most other methods, it takes into account the sequences' base
compositions. Our discussion emphasizes the idea that biases in the base
composition of the compared sequences may affect a phylogenetic analysis and
produce systematic errors if not properly corrected for. ... In order to point
out the most useful applications of compositional statistics, we first discuss
some important strengths and weaknesses of the most commonly used methods of
phylogenetic inference."
</AB>
<PU>Oxford University Press </PU>
<PL>New York </PL>
<PY>1991</PY>
<PP>129-146</PP>
</SEQ>

<SEQ>
<UI>1382   Felsenstein,J Is There Something Wro.. Syst.Biol.      93 
42(2):193-200
</UI>
<AU>Felsenstein J;
    Kishino H
</AU>
<TI>Is There Something Wrong with the Bootstrap on Phylogenies? A Reply to
Hillis and Bull
</TI>
<SU>Phylogeny;
    Bootstrap;
    Reliability;
    Evolutionary tree;
    USA
</SU>
<AB>"We argue that these phenomena are not a result of using the bootstrap but
are a result of summarizing the evidence for a group by using a P value. Hillis
and Bull's phenomena are rather precisely duplicated in a much simpler model of
estimating where the mean of a normal distribution is, a model having no
bootstrapping. As in empirical studies, we can often get a clearer picture by
considering a simple example. Finally, we show that there is another
straightforward meaning of the P value that is not invalidated by Hillis and
Bull's criticisms and that can be taken as the 'real' meaning of the bootstrap P
value."
</AB>
<JT>Syst Biol</JT>
<PY>1993</PY>
<VO>42</VO>
<NO>2</NO>
<PP>193-200</PP>
</SEQ>

<SEQ>
<UI>1383   Sanderson,M.J MacClade, Version 3.0    Syst.Biol.      93 
42(2):218-220
</UI>
<AU>Sanderson MJ
</AU>
<TI>MacClade, Version 3.0
</TI>
<SU>Phylogeny;
    Character data;
    Review;
    Program;
    USA
</SU>
<AB>"Aside from a few minor bugs, the only real deficiency is the lack of
support for System 7, the recent update of the Macintosh operating system. ...
In the meantime, they have done an admirable job of not only satisfying most
workers' requirements but also of making a statement about the level of
sophistication necessary in studies of character evolution - and they have
provided the means to achieve it."
</AB>
<JT>Syst Biol</JT>
<PY>1993</PY>
<VO>42</VO>
<NO>2</NO>
<PP>218-220</PP>
</SEQ>

<SEQ>
<UI>1384   Felsenstein,J Counting Phylogenetic .. J.Theor.Biol.   91 
152:357-376
</UI>
<AU>Felsenstein J
</AU>
<TI>Counting Phylogenetic Invariants in Some Simple Cases
</TI>
<SU>Phylogeny;
    Invariant;
    Evolutionary tree;
    USA;
    Phylogenetic
</SU>
<AB>"An informal degrees of freedom argument is used to count the number of
phylogenetic invariants in cases where we have three or four species and can
assume a Jukes-Cantor model of base substitution with or without a molecular
clock. ... Two new classes of invariants are found: non-phylogenetic cubic
invariants testing independence of evolutionary events in different lineages,
and linear phylogenetic invariants which occur when there is a molecular clock.
Most of the linear invariants found by Cavender (1989) turn out in the Jukes-
Cantor case to be simple tests of symmetry of the substitution model, and not
phylogenetic invariants."
</AB>
<JT>J Theor Biol</JT>
<PY>1991</PY>
<VO>152</VO>
<PP>357-376</PP>
</SEQ>

<SEQ>
<UI>1385   Hendy,M.D.    Hadamard Conjugation: .. N.Z.J.Bot.      93 
31(3):231-237
</UI>
<AU>Hendy MD;
    Charleston MA
</AU>
<TI>Hadamard Conjugation: A Versatile Tool for Modelling Nucleotide Sequence
Evolution
</TI>
<SU>Phylogeny;
    Hadamard;
    Invariant;
    NZ;
    Evolution;
    Nucleotide
</SU>
<AB>"Hadamard conjugation has proved to be a useful tool in examining some of
the properties of the patterns of nucleotide sequences arising from the
evolution of the taxa they represent. It has a considerable advantage in that
the formulae are independent of the phylogenetic structure under consideration
.... Hadamard conjugation is outlined and four applications are introduced.
[They] are the theoretical examination of tree building methods, the generation
of sample sequences under various models for simulation studies, the
identification of some phylogenetic invariants, and the closest tree method for
inferring phylogenetic trees and their edge lengths."
</AB>
<JT>N Z J Bot</JT>
<PY>1993</PY>
<VO>31</VO>
<NO>3</NO>
<PP>231-237</PP>
</SEQ>

<SEQ>
<UI>1386   Fu,Y.X.       Construction of Linear.. Math.Biosci.    92 
109:201-228
</UI>
<AU>Fu YX;
    Li WH
</AU>
<TI>Construction of Linear Invariants in Phylogenetic Inference
</TI>
<SU>Phylogeny;
    Evolutionary tree;
    Invariant;
    USA;
    Phylogenetic
</SU>
<AB>"An analytical method is presented for constructing linear invariants. All
linear invariants of a k-species tree can be derived from those of (k-1)-species
trees using this method. The new method is simpler than that of Cavender, which
relies on numerical computations. Moreover, the new method provides a convenient
tool to study the relationships between linear invariants of the same tree or of
different trees. All linear invariants of trees of up to five species are
derived in this study. ... The number of linear invariants for a tree is found
to increase rapidly with the number of species."
</AB>
<JT>Math Biosci</JT>
<PY>1992</PY>
<VO>109</VO>
<PP>201-228</PP>
</SEQ>

<SEQ>
<UI>1387   Penny,D.      Progress with Methods .. Trends Ecol.Evo 92 
7(3):73-79
</UI>
<AU>Penny D;
    Hendy MD;
    Steel MA
</AU>
<TI>Progress with Methods for Constructing Evolutionary Trees
</TI>
<SU>Phylogeny;
    Invariant;
    Review;
    Evolutionary tree;
    NZ
</SU>
<AB>"Evolutionists dream of a tree-reconstruction method that is efficient
(fast), powerful, consistent, robust and falsifiable. These criteria are at
present conflicting in that the fastest methods are weak (in their use of
information in the sequences) and inconsistent (even with very long sequences
they may lead to an incorrect tree). But there has been exciting progress in new
approaches to tree inference, in understanding general properties of methods,
and in developing ideas for estimating the reliability of trees. New
phylogenetic invariant methods allow selected parameters of the underlying model
to be estimated directly from sequences."
</AB>
<JT>Trends Ecol Evol</JT>
<PY>1992</PY>
<VO>7</VO>
<NO>3</NO>
<PP>73-79</PP>
</SEQ>

<SEQ>
<UI>1388   Penny,D.      Testing the Theory of .. Phylogenetic .. 91Oxford 
Universi
</UI>
<AU>Penny D;
    Hendy MD;
    Steel MA
</AU>
<TI>Testing the Theory of Descent
</TI>
<ED>Miyamoto MM
    Cracraft J
</ED>
<BK>Phylogenetic Analysis of DNA Sequences
</BK>
<SU>Phylogeny;
    Spectral analysis;
    NZ
</SU>
<AB>"In this chapter we review our approach to the study of evolutionary
trees. This has been developed within a strong Popperian framework ... of aiming
to develop falsifiable hypotheses. After discussing some of the general issues
involved, we then discuss the question of how good methods are for inferring
trees, particularly from molecular data."
</AB>
<PU>Oxford University Press </PU>
<PL>New York </PL>
<PY>1991</PY>
<PP>155-183</PP>
</SEQ>

<SEQ>
<UI>1389   Steel,M.A.    Spectral Analysis and .. Appl.Math.Lett. 92 
5(6):63-67
</UI>
<AU>Steel MA;
    Hendy MD;
    Szekely LA;
    Erdos PL
</AU>
<TI>Spectral Analysis and a Closest Tree Method for Genetic Sequences
</TI>
<SU>Phylogeny;
    Genetic;
    Spectral analysis;
    Consistency;
    NZ
</SU>
<AB>"We describe a new method for estimating the evolutionary tree linking a
collection of species from their aligned four-state genetic sequences. This
method, which can be adapted to provide a branch-and-bound algorithm, is
statistically consistent provided the sequences have evolved according to a
standard stochastic model of nucleotide mutation. Our approach exploits a recent
group-theoretic description of this model."
</AB>
<JT>Appl Math Lett</JT>
<PY>1992</PY>
<VO>5</VO>
<NO>6</NO>
<PP>63-67</PP>
</SEQ>

<SEQ>
<UI>1390   Kim,J.        Multiple Sequence Alig.. Comput.Appl.Bio 94 
10(4):419-426
</UI>
<AU>Kim J;
    Pramanik S;
    Chung MJ
</AU>
<TI>Multiple Sequence Alignment using Simulated Annealing
</TI>
<SU>Multiple alignment;
    Simulated annealing;
    USA;
    Sequence alignment
</SU>
<AB>"An algorithm called Multiple Sequence Alignment using Simulated Annealing
(MSASA) has been developed. The computational complexity of MSASA is
significantly reduced by replacing the high-temperature phase of the annealing
process by a fast heuristic algorithm. ... Compared to the dynamic programming
approach, MSASA can (i) use natural gap costs which can generate better
solution, (ii) align more sequences and (iii) take less computation time."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1994</PY>
<VO>10</VO>
<NO>4</NO>
<PP>419-426</PP>
</SEQ>

<SEQ>
<UI>1391   Gotoh,O.      Further Improvement in.. Comput.Appl.Bio 94 
10(4):379-387
</UI>
<AU>Gotoh O
</AU>
<TI>Further Improvement in Methods of Group-to-Group Sequence Alignment with
Generalized Profile Operations
</TI>
<SU>Multiple alignment;
    Profile;
    JP
</SU>
<AB>"It has previously been shown that rigorous optimization of alignment
between two groups of sequences in the sense of minimal sum of pairs (SP) score
with a linear gap-weighting function can be achieved by an extended version of
the dynamic programming algorithm. The major drawback of this algorithm was that
the computation time grows in proportion to the product of the numbers (M and N)
of sequences comprising the two groups. A new algorithm presented in this paper
achieves the same rigorous alignment in a time complexity much less dependent on
the sizes of the two groups."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1994</PY>
<VO>10</VO>
<NO>4</NO>
<PP>379-387</PP>
</SEQ>

<SEQ>
<UI>1392   Fuchs,R.      Sequence Analysis by E.. Comput.Appl.Bio 94 
10(4):413-417
</UI>
<AU>Fuchs R
</AU>
<TI>Sequence Analysis by Electronic Mail: A Tool for Accessing Internet E-mail
Servers
</TI>
<SU>Sequence analysis;
    Program;
    Electronic mail;
    DE;
    Server;
    Internet
</SU>
<AB>"A new utility program, MSU, is described that simplifies the use of
electronic mail servers for sequence analysis. Service descriptions are defined
in external control files which can be changed without affecting the main
program. This approach makes MSU a highly flexible tool that allows easy
modification, extension and customization of service descriptions to suit users'
personal requirements."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1994</PY>
<VO>10</VO>
<NO>4</NO>
<PP>413-417</PP>
</SEQ>

<SEQ>
<UI>1393   Coulson,A.    Extracting the Informa.. Trends Biotechn 93 
11:223-227
</UI>
<AU>Coulson A
</AU>
<TI>Extracting the Information - Sequence Analysis Software Design Evolves
</TI>
<SU>Sequence analysis;
    Program;
    UK
</SU>
<AB>"In the past few years, techniques for electronic-data storage, retrieval
and analysis have become essential research tools in molecular biology. The
starting point for this development was the invention of techniques for the
rapid cloning and sequencing of genes; frequently, the sequence of a gene will
be available long before the gene product has been isolated, or even before
there are any experimental methods available for its study. It is therefore
important to extract as much information as possible from the sequence itself,
both for its own sake, and to provide guidance in selecting subsequent
experimental approaches."
</AB>
<JT>Trends Biotechnol</JT>
<PY>1993</PY>
<VO>11</VO>
<PP>223-227</PP>
</SEQ>

<SEQ>
<UI>1394   Davison,D.B.  The GenBank-Server at .. Nucleic Acids R 90 
18(6):1571-157
</UI>
<AU>Davison DB;
    Chappelear JE
</AU>
<TI>The GenBank-Server at the University of Houston
</TI>
<SU>Sequence database;
    Electronic mail;
    Program;
    USA
</SU>
<AB>"The University of Houston GenBank-Server is an electronic mail facility
which has been successfully in service for the last 14 months. It provides locus
id and accession number access to GenBank data. In addition, it also holds the
contributed-software archives previously kept at BIONET."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1990</PY>
<VO>18</VO>
<NO>6</NO>
<PP>1571-1572</PP>
</SEQ>

<SEQ>
<UI>1395   Henikoff,S.   Sequence Analysis by E.. Trends Biochem. 93 
18(Jul.):267-2
</UI>
<AU>Henikoff S
</AU>
<TI>Sequence Analysis by Electronic Mail Server
</TI>
<SU>Sequence analysis;
    Electronic mail;
    USA;
    Server
</SU>
<AB>"Sequence analysis tasks account for much of the recent popularity of e-
mail servers for biologists. ... Some e-mail servers make available programs
that are not generally found in sequence analysis packages. ... Table I lists
several e-mail servers for sequence analysis tasks. Amos Bairoch's more complete
description of e-mail servers can itself be obtained from an e-mail server (send
the message 'get doc:serv_ema.txt' to netserv@embl-heidelberg.de)."
</AB>
<JT>Trends Biochem Sci</JT>
<PY>1993</PY>
<VO>18</VO>
<NO>Jul.</NO>
<PP>267-268</PP>
</SEQ>

<SEQ>
<UI>1396   Tatusov,R.L.  A Simple Tool to Searc.. Comput.Appl.Bio 94 
10(4):457-459
</UI>
<AU>Tatusov RL;
    Koonin EV
</AU>
<TI>A Simple Tool to Search for Sequence Motifs that are Conserved in BLAST
Outputs
</TI>
<SU>Database search;
    Motif;
    BLAST;
    USA
</SU>
<AB>"An obvious way to augment the selectivity of 'weak' motifs without
compromising the specificity is to search for motifs that are conserved in
groups of related sequences. Such conservation serves as a filter that cuts off
fortuitous occurrences of motifs. We describe here a simple program, Bla, that
searches the output of BLAST, the widespread fast database-searching program
..., for conserved motifs."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1994</PY>
<VO>10</VO>
<NO>4</NO>
<PP>457-459</PP>
</SEQ>

<SEQ>
<UI>1397   Miller,W.     A Note about Computing.. Comput.Appl.Bio 94 
10(4):455-456
</UI>
<AU>Miller W;
    Boguski M
</AU>
<TI>A Note about Computing All Local Alignments
</TI>
<SU>Pairwise alignment;
    Locally optimal;
    USA
</SU>
<AB>"A recent paper in this journal by G. Barton proposed an efficient
algorithm for locating locally optimal alignments between two sequences.
Although the paper claims that all such alignments are found, the approach
frequently fails to detect some of the significant matches. This note explains
the deficiency."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1994</PY>
<VO>10</VO>
<NO>4</NO>
<PP>455-456</PP>
</SEQ>

<SEQ>
<UI>1398   Landes,C.     Fast Databank Searchin.. Comput.Appl.Bio 94 
10(4):453-454
</UI>
<AU>Landes C;
    Risler JL
</AU>
<TI>Fast Databank Searching with a Reduced Amino-Acid Alphabet
</TI>
<SU>Database search;
    FR;
    Amino acid;
    Databank
</SU>
<AB>"Fast sequence databanks search algorithms generally make use of hash
tables and look for exactly matching words. An increased sensitivity - at the
expense of a decreased selectivity - can be attained in the case of proteins by
using a reduced amino acid alphabet. We propose here an alphabet reduced to 10
symbols, that we used in modified versions of the FASTP and SCAN programs. An
application ... shows that this technique may be useful in detecting distant
relationships between proteins."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1994</PY>
<VO>10</VO>
<NO>4</NO>
<PP>453-454</PP>
</SEQ>

<SEQ>
<UI>1399   Rzhetsky,A.   METREE: A Program Pack.. Comput.Appl.Bio 94 
10(4):409-412
</UI>
<AU>Rzhetsky A;
    Nei M
</AU>
<TI>METREE: A Program Package for Inferring and Testing Minimum-Evolution
Trees
</TI>
<SU>Phylogeny;
    Evolutionary tree;
    Minimum evolution;
    Program;
    USA
</SU>
<AB>"The METREE program package for estimating phylogenetic trees with the
minimum evolution method is written in Turbo C 2.0 and is intended to be used on
any IBM-compatible personal computers that have a mathematical coprocessor. The
package is simple to use and is menu driven. A program for visualizing and
printing out the final tree is also included."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1994</PY>
<VO>10</VO>
<NO>4</NO>
<PP>409-412</PP>
</SEQ>

<SEQ>
<UI>1400   Corpet,F.     RNAlign Program: Align.. Comput.Appl.Bio 94 
10(4):389-399
</UI>
<AU>Corpet F;
    Michot B
</AU>
<TI>RNAlign Program: Alignment of RNA Sequences using both Primary and
Secondary Structures
</TI>
<SU>Sequence alignment;
    Sequence database;
    FR;
    Program;
    Structure;
    RNA;
    Secondary
</SU>
<AB>"We have developed an algorithm and a computer program for aligning new
RNA sequences with a bank of aligned homologous RNA sequences. Given a common
folding structure for the bank, the program performs an alignment between the
bank and a new sequence, optimal both in terms of primary and secondary
structure. This method is useful to align sequences that present a common
folding structure despite extensive divergence of their primary structures. It
allows these preserved regions to be precisely distinguished from domains with
more variable secondary structure."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1994</PY>
<VO>10</VO>
<NO>4</NO>
<PP>389-399</PP>
</SEQ>

<SEQ>
<UI>1401   Warnow,T.J.   Constructing Phylogene.. N.Z.J.Bot.      93 
31(3):239-248
</UI>
<AU>Warnow TJ
</AU>
<TI>Constructing Phylogenetic Trees Efficiently using Compatibility Criteria
</TI>
<SU>Phylogeny;
    Evolutionary tree;
    Compatibility;
    USA;
    Phylogenetic
</SU>
<AB>"The Character Compatibility Problem is a classical problem in
computational biology concerned with constructing phylogenetic trees of minimum
possible evolution from qualitative character sets. This problem arose in the
1970s, and until recently the only cases for which efficient algorithms were
found were for binary (i.e. two-state) characters and for two characters at a
time, while the complexity of the general problem remained open. In this paper
we will discuss the remarkable progress on this problem since 1990."
</AB>
<JT>N Z J Bot</JT>
<PY>1993</PY>
<VO>31</VO>
<NO>3</NO>
<PP>239-248</PP>
</SEQ>

<SEQ>
<UI>1402   Yang,Z.       Maximum Likelihood Phy.. J.Mol.Evol.     94 
39(3):306-314
</UI>
<AU>Yang Z
</AU>
<TI>Maximum Likelihood Phylogenetic Estimation from DNA Sequences with
Variable Rates over Sites: Approximate Methods
</TI>
<SU>Phylogeny;
    Likelihood;
    Evolutionary rate;
    Approximation;
    UK;
    DNA;
    Rate;
    Phylogenetic;
    Estimation
</SU>
<AB>"Two approximate methods are proposed for maximum likelihood phylogenetic
estimation, which allow variable rates of substitution across nucleotide sites.
Three data sets with quite different characteristics were analyzed to examine
empirically the performance of these methods. ... The computational requirements
of the two methods are comparable to that of Felsenstein's (1981) model, which
assumes a single rate for all the sites."
</AB>
<JT>J Mol Evol</JT>
<PY>1994</PY>
<VO>39</VO>
<NO>3</NO>
<PP>306-314</PP>
</SEQ>

<SEQ>
<UI>1403   Zharkikh,A.   Estimation of Evolutio.. J.Mol.Evol.     94 
39(3):315-329
</UI>
<AU>Zharkikh A
</AU>
<TI>Estimation of Evolutionary Distances Between Nucleotide Sequences
</TI>
<SU>Evolutionary distance;
    Distance;
    Markov;
    Substitution;
    USA;
    Nucleotide;
    Estimation
</SU>
<AB>"A formal mathematical analysis of the substitution process in nucleotide
sequence evolution was done in terms of the Markov process. ... Extensive
computer simulation was used to compare the accuracy and effectiveness of
various methods for estimating the evolutionary distance between two nucleotide
sequences. It was shown that the multiparameter methods of Lanave et al. (1984),
Gojobori et al. (1982), and Barry and Hartigan (1987) are preferable to others
for the purpose of phylogenetic analysis when the sequences are long. However,
when sequences are short and the evolutionary distance is large, Tajima and
Nei's (1984) method is superior to others."
</AB>
<JT>J Mol Evol</JT>
<PY>1994</PY>
<VO>39</VO>
<NO>3</NO>
<PP>315-329</PP>
</SEQ>

<SEQ>
<UI>1404                 Statistical Analysis o..                 83Marcel 
Dekker,
</UI>
<TI>Statistical Analysis of DNA Sequence Data
</TI>
<ED>Weir BS
</ED>
<SU>Sequence analysis;
    Statistical;
    USA;
    DNA
</SU>
<AB>I have only the preface and bibliography. "This book is intended to survey
the rapidly growing field of statistical analysis of DNA sequence data. The
authors are all engaged in such analyses, and several of them are also involved
in the generation of DNA data. They have pointed to current problems in the
interpretation of the new genetic information and have shown possible approaches
to solving these problems. We all hope that the book will serve as a timely and
convenient reference for molecular, population and evolutionary geneticists and
will also serve to stimulate statisticians to become involved in one of the most
exciting areas of modern science."
</AB>
<PU>Marcel Dekker, Inc.</PU>
<PL>New York</PL>
<PY>1983</PY>
<PP>pp. ix+255</PP>
</SEQ>

<SEQ>
<UI>1405   McClure,M.A.  Comparative Analysis o.. Mol.Biol.Evol.  94 
11(4):571-592
</UI>
<AU>McClure MA;
    Vasi TK;
    Fitch WM
</AU>
<TI>Comparative Analysis of Multiple Protein-Sequence Alignment Methods
</TI>
<SU>Multiple alignment;
    Sequence alignment;
    Survey;
    Protein;
    Motif;
    USA
</SU>
<AB>"We have analyzed a total of 12 different global and local multiple
protein-sequence alignment methods. The purpose of this study is to evaluate
each method's ability to correctly identify the ordered series of motifs found
among all members of a given protein family. ... The performance of all 12
methods was affected by (1) the number of sequences in the test sets, (2) the
degree of similarity among the sequences, and (3) the number of indels required
to produce a multiple alignment. Global methods generally performed better than
local methods in the detection of motif patterns."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1994</PY>
<VO>11</VO>
<NO>4</NO>
<PP>571-592</PP>
</SEQ>

<SEQ>
<UI>1406   Waddell,P.J.  The Sampling Distribut.. Mol.Biol.Evol.  94 
11(4):630-642
</UI>
<AU>Waddell PJ;
    Penny D;
    Hendy MD;
    Arnold G
</AU>
<TI>The Sampling Distributions and Covariance Matrix of Phylogenetic Spectra
</TI>
<SU>Phylogeny;
    Genetic;
    Hadamard;
    Spectral analysis;
    Distribution;
    Covariance;
    NZ;
    Sampling;
    Phylogenetic;
    Matrix
</SU>
<AB>"We extend recent advances in computing variance-covariance matrices from
genetic distances to a sequence method of phylogenetic analysis. These matrices,
together with other statistical properties of corrected sequence spectra, are
studied as a foundation for more powerful and testable methods in phylogenetics.
... Our results extend naturally to four-color (nucleotide) spectra."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1994</PY>
<VO>11</VO>
<NO>4</NO>
<PP>630-642</PP>
</SEQ>

<SEQ>
<UI>1407   Gaut,B.S.     Detecting Substitution.. Mol.Biol.Evol.  94 
11(4):620-629
</UI>
<AU>Gaut BS;
    Weir BS
</AU>
<TI>Detecting Substitution-Rate Heterogeneity among Regions of a Nucleotide
Sequence
</TI>
<SU>Substitution;
    Region;
    Likelihood;
    Gene;
    USA;
    Nucleotide
</SU>
<AB>"Likelihood-ratio statistics are proposed to test for heterogeneity in
nucleotide substitution rate among regions of a DNA sequence. The tests examine
three-sequence phylogenies, and two specific tests are proposed: a test to
detect rate heterogeneity among genic regions within a sequence, over all
evolutionary lineages; and a test to detect rate heterogeneity among regions in
a specific evolutionary lineage. Simulations examine the ability of tests to
detect a single region that varies in nucleotide substitution rate relative to
the remainder of the sequence."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1994</PY>
<VO>11</VO>
<NO>4</NO>
<PP>620-629</PP>
</SEQ>

<SEQ>
<UI>1408   Ota,T.        Variance and Covarianc.. Mol.Biol.Evol.  94 
11(4):613-619
</UI>
<AU>Ota T;
    Nei M
</AU>
<TI>Variance and Covariances of the Numbers of Synonymous and Nonsynonymous
Substitutions per Site
</TI>
<SU>Substitution;
    Synonymous;
    Statistical;
    Variance;
    USA;
    Covariance
</SU>
<AB>"Nei and Gojobori (1986) developed a simple method to estimate the numbers
of synonymous (ds) and nonsynonymous (dn) substitutions per site. In the present
paper, we have developed a method for computing variances and covariances of
ds's and dn's and of the proportions of synonymous (ps) and nonsynonymous (pn)
differences. We also have developed a method for computing the variances of mean
ds, dn, ps, pn, without constructing a phylogenetic tree of the genes. We have
conducted computer simulations based on simple evolutionary models and have
shown that the new method gives good estimates of variances and covariances."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1994</PY>
<VO>11</VO>
<NO>4</NO>
<PP>613-619</PP>
</SEQ>

<SEQ>
<UI>1409   Lockhart,P.J. Recovering Evolutionar.. Mol.Biol.Evol.  94 
11(4):605-612
</UI>
<AU>Lockhart PJ;
    Steel MA;
    Hendy MD;
    Penny D
</AU>
<TI>Recovering Evolutionary Trees under a More Realistic Model of Sequence
Evolution
</TI>
<SU>Phylogeny;
    Evolutionary tree;
    Stochastic;
    Evolution;
    NZ;
    Model
</SU>
<AB>"We report a new transformation, the LogDet, that is consistent for
sequences with differing nucleotide composition and that have arisen under
simple but asymmetric stochastic models of evolution. This transformation is
required because existing methods tend to group sequences on the basis of their
nucleotide composition, irrespective of their evolutionary history. ... The
overall conclusions from this study are that irregular A, C, G, T compositions
are an important and possibly general cause of patterns that can mislead
tree-reconstruction methods, even when high bootstrap values are obtained."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1994</PY>
<VO>11</VO>
<NO>4</NO>
<PP>605-612</PP>
</SEQ>

<SEQ>
<UI>1410   Neuwald,A.F.  Detecting Patterns in .. J.Mol.Biol.     94 
239:698-712
</UI>
<AU>Neuwald AF;
    Green P
</AU>
<TI>Detecting Patterns in Protein Sequences
</TI>
<SU>Pattern recognition;
    Motif;
    Statistical;
    Significance;
    Sequence alignment;
    Sequence comparison;
    USA;
    Protein
</SU>
<AB>"The detection of conserved sequence patterns (motifs) in related proteins
often yields valuable structural and functional insights. We describe a method
that utilizes rigorous statistics and a depth-first search procedure to
efficiently and exhaustively search a set of proteins for significant patterns
up to a specified length. Additional procedures classify related patterns into
groups and identify protein segments most likely to share a common motif."
</AB>
<JT>J Mol Biol</JT>
<PY>1994</PY>
<VO>239</VO>
<PP>698-712</PP>
</SEQ>

<SEQ>
<UI>1411   Zhang,C.T.    A Graphic Approach to .. J.Mol.Biol.     94 238:1-8
</UI>
<AU>Zhang CT;
    Chou KC
</AU>
<TI>A Graphic Approach to Analyzing Codon Usage in 1562 Escherichia coli
Protein Coding Sequences
</TI>
<SU>Coding;
    Codon;
    Sequence analysis;
    CN;
    Protein;
    Graphic
</SU>
<AB>"The occurrence frequencies of the four bases ... at each of the three
codon positions for 1562 E. coli protein coding sequences have been calculated.
The 1562 x 4 x 3 = 18,744 data thus obtained have been analyzed by a graphic
method .... The results of our analysis indicate that the patterns for the first
two codon positions reflect the origin for producing native folding structures
of proteins. We thus come to the conclusion that the distribution patterns for
the first two codon positions should be basically species-independent, as
confirmed by studies for a number of other species. However, the distribution
pattern for the third codon position is species-dependent."
</AB>
<JT>J Mol Biol</JT>
<PY>1994</PY>
<VO>238</VO>
<PP>1-8</PP>
</SEQ>

<SEQ>
<UI>1412   Idury,R.M.    Dynamic Dictionary Mat.. Theoret.Comput. 94 
131:295-310
</UI>
<AU>Idury RM;
    Schaffer AA
</AU>
<TI>Dynamic Dictionary Matching with Failure Functions
</TI>
<SU>Dictionary match;
    Pattern match;
    USA;
    Function;
    Dynamic
</SU>
<AB>"Amir and Farach (1991) and Amir et al. (to appear) recently initiated the
study of the dynamic dictionary pattern matching problem. ... Amir et al. (to
appear) used an automaton based on suffix trees to solve the dynamic problem.
... We show that the same bounds can be achieved using a framework based on
failure functions. We then show that our approach also allows us to achieve
faster search times at the expense of the update times. ... This is advantageous
if the search texts are much larger than the dictionary or searches are more
frequent than updates."
</AB>
<JT>Theoret Comput Sci</JT>
<PY>1994</PY>
<VO>131</VO>
<PP>295-310</PP>
</SEQ>

<SEQ>
<UI>1413   Miyamoto,M.M. A Congruence Test of R.. Syst.Biol.      94 
43(2):236-249
</UI>
<AU>Miyamoto MM;
    Allard MW;
    Adkins RM;
    Janecek LL;
    Honeycutt RL
</AU>
<TI>A Congruence Test of Reliability using Linked Mitochondrial DNA Sequences
</TI>
<SU>Phylogeny;
    Reliability;
    Congruence;
    USA;
    DNA
</SU>
<AB>"In the absence of certainty, well-corroborated hypotheses of species
relationships serve as the best estimates of the true phylogenies of groups.
This approach was extended to linked mitochondrial DNA (mtDNA) sequences that
share the same gene phylogenies because of nonrecombination. This expectation of
congruence forms the basis to test the reliability of unequal weighting for
different base positions and changes of DNA sequences. ... Heavy weighting for
stems and first/second codon positions and for transversions were first
evaluated against the molecular evolutionary properties of the three genes and
then evaluated by congruence ...."
</AB>
<JT>Syst Biol</JT>
<PY>1994</PY>
<VO>43</VO>
<NO>2</NO>
<PP>236-249</PP>
</SEQ>

<SEQ>
<UI>1414   Bairoch,A.    List of Molecular Biol..                 93
</UI>
<AU>Bairoch A
</AU>
<TI>List of Molecular Biology Email Servers
</TI>
<SU>Electronic mail;
    Sequence database;
    Database search;
    Program;
    SWI;
    Server
</SU>
<AB>Document serv_ema.txt (version 1.70, 10 Dec. 1993), which is available from
netserv@embl-heidelberg.de. "This document briefly describes the various email
servers that are available to molecular biologists. The servers described in
this document generally fall into one of the following two categories: (1)
Servers that provide an analytical function. ... (2) Servers that allow you to
retrieve all or part of a database."
</AB>
<PY>1993</PY>
</SEQ>

<SEQ>
<UI>1415   Felsenstein,J Cases in which Parsimo.. Conceptual Is.. 84MIT Press
</UI>
<AU>Felsenstein J
</AU>
<TI>Cases in which Parsimony or Compatibility Methods will be Positively
Misleading
</TI>
<ED>Sober E
</ED>
<BK>Conceptual Issues in Evolutionary Biology. An Anthology
</BK>
<SU>Phylogeny;
    Evolutionary tree;
    Likelihood;
    Parsimony;
    Compatibility;
    USA
</SU>
<AB>Originally published as Felsenstein (1978). "For some simple three- and
four-species cases involving a character with two states, it is determined under
what conditions several methods of phylogenetic inference will fail to converge
to the true phylogeny as more and more data are accumulated. The methods are the
Camin-Sokal parsimony method, the compatibility method, and Farris's unrooted
Wagner tree parsimony method. In all cases the conditions for this failure
(which is the failure to be statistically consistent) are essentially that
parallel changes exceed informative, nonparallel changes."
</AB>
<PU>MIT Press </PU>
<PL>Cambridge, MA </PL>
<PY>1984</PY>
<PP>663-674</PP>
</SEQ>

<SEQ>
<UI>1416   Felsenstein,J Phylogenies and the Co.. Am.Nat.         85 
125(1):1-15
</UI>
<AU>Felsenstein J
</AU>
<TI>Phylogenies and the Comparative Method
</TI>
<SU>Phylogeny;
    Statistical;
    Correlation;
    USA
</SU>
<AB>"Recent years have seen a growth in numerical studies using the
comparative method. The method usually involves a comparison of two phenotypes
across a range of species or higher taxa, or a comparison of one phenotype with
an environmental variable. ... My intention is to point out a serious
statistical problem with this approach, a problem that affects all of these
studies. It arises from the fact that species are part of a hierarchically
structured phylogeny, and thus cannot be regarded for statistical purposes as if
drawn independently from the same distribution."
</AB>
<JT>Am Nat</JT>
<PY>1985</PY>
<VO>125</VO>
<NO>1</NO>
<PP>1-15</PP>
</SEQ>

<SEQ>
<UI>1417   Felsenstein,J Perils of Molecular In.. Nature (Lond.)  88 335 (8 
Sept.):
</UI>
<AU>Felsenstein J
</AU>
<TI>Perils of Molecular Introspection
</TI>
<SU>Phylogeny;
    Likelihood;
    Parsimony;
    Invariant;
    Distance;
    USA
</SU>
<AB>This is a brief overview of methods for analysing the phylogeny of the
apes. "We can either use all the information with a highly specific evolutionary
model, as likelihood methods do, or trade some of that information for
robustness by looking at a smaller subset of the data, as invariants, parsimony
and distance methods each does in different ways."
</AB>
<JT>Nature (Lond)</JT>
<PY>1988</PY>
<VO>335</VO>
<NO>8 Sept.</NO>
<PP>118</PP>
</SEQ>

<SEQ>
<UI>1418   Felsenstein,J Phylogenies and Quanti.. Annu.Rev.Ecol.S 88 
19:445-471
</UI>
<AU>Felsenstein J
</AU>
<TI>Phylogenies and Quantitative Characters
</TI>
<SU>Phylogeny;
    Character data;
    USA
</SU>
<AB>"My argument is that the methods used to study the evolution of
quantitative characters within populations can profitably be used on a
phylogenetic scale to illumine the connection between pattern and process. ...
The moment seems ripe to consider the issue."
</AB>
<JT>Annu Rev Ecol Syst</JT>
<PY>1988</PY>
<VO>19</VO>
<PP>445-471</PP>
</SEQ>

<SEQ>
<UI>1419   Trelles-Salaz On an Efficient Parall.. Comput.Appl.Bio 94 
10(5):509-511
</UI>
<AU>Trelles-Salazar O;
    Zapata EL;
    Carazo JM
</AU>
<TI>On an Efficient Parallelization of Exhaustive Sequence Comparison
Algorithms on Message Passing Architectures
</TI>
<SU>Sequence comparison;
    Database search;
    Parallel;
    SP;
    Algorithm
</SU>
<AB>"We present a new parallel computing approach to the case of exhaustive
sequential sequence comparison algorithms on message-passing architectures. In
this context a modification of guided self-scheduling as well as efficient
buffering strategies are presented. We discuss two specific implementations, one
on the Paramid parallel computer, and the other on a cluster of workstations
running PVM. In both cases the parallel performance is higher than with any
other method presented so far."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1994</PY>
<VO>10</VO>
<NO>5</NO>
<PP>509-511</PP>
</SEQ>

<SEQ>
<UI>1420   Felsenstein,J Phylogenies from Restr.. Evolution       92 
46(1):159-173
</UI>
<AU>Felsenstein J
</AU>
<TI>Phylogenies from Restriction Sites: A Maximum-Likelihood Approach
</TI>
<SU>Phylogeny;
    Restriction;
    Likelihood;
    USA
</SU>
<AB>"Restriction sites data can be analyzed by maximum likelihood to obtain
estimates of phylogenies. The likelihood methods of Smouse and Li, who were able
to compute likelihoods for up to four species under a simplified model of base
change, can be extended numerically to deal with any number of species. The
computational methods for doing so are outlined. The resulting algorithms are
slow but take multiple gains and losses of restriction sites fully into account,
unlike parsimony methods. ... The present method is available in a computer
program."
</AB>
<JT>Evolution </JT>
<PY>1992</PY>
<VO>46</VO>
<NO>1</NO>
<PP>159-173</PP>
</SEQ>

<SEQ>
<UI>1421   Archie,J.W.   The Number of Evolutio.. Theoret.Pop.Bio 93 43:52-79
</UI>
<AU>Archie JW;
    Felsenstein J
</AU>
<TI>The Number of Evolutionary Steps on Random and Minimum Length Trees for
Random Evolutionary Data
</TI>
<SU>Evolutionary tree;
    Statistical;
    USA
</SU>
<AB>"A model of evolutionarily uninformative data is derived and two separate
character state distributions, one with two states (0,1) and one with missing-
value data (0,1,{0,1}), are obtained. The expectation of number of steps on
random trees is derived for both types of data and the variance in number of
steps is derived for missing-value data. It is conjectured that the number of
steps on random trees for these data should be asymptotically normal. Computer
simulation is used to find approximations for the expected number and variance
in number of steps of minimum length trees for both types of random evolutionary
data."
</AB>
<JT>Theoret Pop Biol</JT>
<PY>1993</PY>
<VO>43</VO>
<PP>52-79</PP>
</SEQ>

<SEQ>
<UI>1422   Steel,M.      A Complete Family of P.. N.Z.J.Bot.      93 
31(3):289-296
</UI>
<AU>Steel M;
    Szekely L;
    Erdos PL;
    Waddell P
</AU>
<TI>A Complete Family of Phylogenetic Invariants for Any Number of Taxa Under
Kimura's 3ST Model
</TI>
<SU>Phylogeny;
    Invariant;
    Spectral analysis;
    NZ;
    Model;
    Phylogenetic
</SU>
<AB>"We describe a new family of phylogenetic invariants that arise from the
recently developed spectral analysis approach to tree reconstruction. These
invariants, which are valid for Kimura's 3ST model, possess four important
properties - they are defined equally easily for any number of taxa, their
description is tree-independent, they apply even when the distribution of the
four nucleotides in the ancestral taxon is unknown, and they can be modified to
deal with sequence sites that do not mutate independently with identical
distribution."
</AB>
<JT>N Z J Bot</JT>
<PY>1993</PY>
<VO>31</VO>
<NO>3</NO>
<PP>289-296</PP>
</SEQ>

<SEQ>
<UI>1423   Weir,B.S.     Variances for Distance.. N.Z.J.Bot.      93 
31(3):317-321
</UI>
<AU>Weir BS;
    Gaut BS
</AU>
<TI>Variances for Distances Between Plant Sequences
</TI>
<SU>Phylogeny;
    Pairwise comparison;
    Distance;
    Character data;
    USA;
    Variance
</SU>
<AB>"When the data consist of DNA sequences, appropriate distances can be
defined to reflect the mutation model and to have expected values proportional
to the time of divergence of the sequences. With the growing amount of sequence
data, it is also necessary to incorporate within-species sequence variation into
measures of distance between sequences. This additional feature requires
attention to be paid to drift and recombination as well as mutation, and greatly
increases the difficulty of estimating variances of estimated distances.
Numerical resampling seems appropriate."
</AB>
<JT>N Z J Bot</JT>
<PY>1993</PY>
<VO>31</VO>
<NO>3</NO>
<PP>317-321</PP>
</SEQ>

<SEQ>
<UI>1424   Fitch,W.M.    Weighted Parsimony: Do.. Phylogenetic .. 91Oxford 
Universi
</UI>
<AU>Fitch WM;
    Ye J
</AU>
<TI>Weighted Parsimony: Does it Work?
</TI>
<ED>Miyamoto MM
    Cracraft J
</ED>
<BK>Phylogenetic Analysis of DNA Sequences
</BK>
<SU>Phylogeny;
    Character weight;
    USA;
    Parsimony
</SU>
<AB>"In 1969, Farris suggested a relatively unbiased way of weighting the
value of a character for systematic purposes based upon the proposition that
characters that frequently change their state are unreliable guides to
relationships .... In nucleotide sequences, one can apply the same philosophy
not only to the various characters ... but to the character changes as well.
[See Williams and Fitch (1989, 1990).] A computer program has been developed
that permits one to perform either kind of weighting, or both simultaneously
.... In this work, we use simulation to test whether the principle works in
practice."
</AB>
<PU>Oxford University Press </PU>
<PL>New York </PL>
<PY>1991</PY>
<PP>147-154</PP>
</SEQ>

<SEQ>
<UI>1425   Li,W.H.       Statistical Methods fo.. Phylogenetic .. 91Oxford 
Universi
</UI>
<AU>Li WH;
    Gouy M
</AU>
<TI>Statistical Methods for Testing Molecular Phylogenies
</TI>
<ED>Miyamoto MM
    Cracraft J
</ED>
<BK>Phylogenetic Analysis of DNA Sequences
</BK>
<SU>Phylogeny;
    Statistical;
    Confidence;
    USA
</SU>
<AB>"Fortunately, the rapid accumulation of DNA sequence data has stimulated a
strong trend to make phylogenetic reconstruction more statistical. Statistical
tests can be classified as analytical or resampling. Resampling methods (e.g.,
bootstrapping, jacknifing) resample the data to infer empirically the
variability of the estimate obtained by a tree-making method. [See Felsenstein
1988.] In this chapter, we discuss only analytical methods. Analytical tests can
be based on parsimony methods, distance methods, likelihood methods, or
invariant methods. The last approach, which includes the evolutionary parsimony
method, has been reviewed in Felsenstein (1988). We discuss only the other
approaches."
</AB>
<PU>Oxford University Press </PU>
<PL>New York </PL>
<PY>1991</PY>
<PP>249-277</PP>
</SEQ>

<SEQ>
<UI>1426   Swofford,D.L. PAUP: Phylogenetic Ana..                 91Illinois 
Natura
</UI>
<AU>Swofford DL
</AU>
<TI>PAUP: Phylogenetic Analysis Using Parsimony, Version 3.0s.
</TI>
<SU>Phylogeny;
    Character data;
    Program;
    Parsimony;
    Invariant;
    USA;
    Phylogenetic
</SU>
<AB>Draft version of the User's Manual for PAUP 3.0, dated 6 Dec. 1991.
"Version 3 of PAUP contains many significant improvements over earlier versions
of the program. From a scientific standpoint, the most important enhancement is
support for a wider variety of parsimony models, including the well-known Dollo
and Camin-Sokal variants and a 'generalized' method that allows the
specification of user-defined character types. These methods supplement the
ordered-reversible (Wagner) and unordered (Fitch) parsimony methods of the
earlier versions. In addition, the method of invariants for nucleotide sequence
data ('evolutionary parsimony') developed by James Lake has been incorporated."
</AB>
<PU>Illinois Natural History Survey </PU>
<PL>Champaign, IL, USA </PL>
<PY>1991</PY>
<PP>vii+178-0</PP>
</SEQ>

<SEQ>
<UI>1427   Martino,R.L.  Parallel Computing in .. Science         94 265(12 
Aug.):9
</UI>
<AU>Martino RL;
    Johnson CA;
    Suh EB;
    Trus BL;
    Yap TK
</AU>
<TI>Parallel Computing in Biomedical Research
</TI>
<SU>Parallel;
    Hardware;
    Database search;
    Protein;
    Structure;
    Prediction;
    USA
</SU>
<AB>"Scalable parallel computer architectures provide the computational
performance needed for advanced biomedical computing problems. The National
Institutes of Health have developed a number of parallel algorithms and
techniques useful in determining biological structure and function. These
applications include ... searching for homologous DNA or amino acid sequences in
large biological databases. Timing results demonstrate substantial performance
improvements with parallel implementations compared with conventional sequential
systems."
</AB>
<JT>Science </JT>
<PY>1994</PY>
<VO>265</VO>
<NO>12 Aug.</NO>
<PP>902-907</PP>
</SEQ>

<SEQ>
<UI>1428   Zhang,Z.      Chaining Multiple-Alig.. J.Comput.Biol.  94 
1(3):217-226
</UI>
<AU>Zhang Z;
    Raghavachari B;
    Hardison RC;
    Miller W
</AU>
<TI>Chaining Multiple-Alignment Blocks
</TI>
<SU>Multiple alignment;
    Block search;
    Footprint;
    USA
</SU>
<AB>"We derive a time-efficient method for building a multiple alignment
consisting of a highest-scoring chain of 'blocks,' i.e., short gap-free
alignments. Besides executing faster than a general-purpose multiple-alignment
program, the method may be particularly appropriate when discovery of blocks
meeting a certain criterion is the main reason for aligning the sequences.
Utility of the method is illustrated by locating a chain of 'phylogenetic
footprints' (specifically, exact matches of length 6 or more) in the 5'-flanking
regions of six mammalian ε-globin genes."
</AB>
<JT>J Comput Biol</JT>
<PY>1994</PY>
<VO>1</VO>
<NO>3</NO>
<PP>217-226</PP>
</SEQ>
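<!--
A minimal sketch of the chaining objective described in the abstract above, assuming each block is given as a tuple of start positions (one per sequence), a common positive length, and a score, and that a block may follow another only if it starts after the other ends in every sequence. This is a plain quadratic dynamic program chosen for clarity; it is not the time-efficient method the authors derive, and the names `Block`, `precedes` and `best_chain` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical block format: (start position in each sequence, length, score).
Block = Tuple[Tuple[int, ...], int, float]


def precedes(a: Block, b: Block) -> bool:
    """True if block a ends before block b begins in every sequence."""
    return all(sa + a[1] <= sb for sa, sb in zip(a[0], b[0]))


def best_chain(blocks: List[Block]) -> Tuple[float, List[int]]:
    """Quadratic DP: highest total score over chains of mutually ordered blocks.

    Returns (score, indices of the chosen blocks in chain order).
    """
    if not blocks:
        return 0.0, []
    # Blocks have positive length, so any predecessor starts strictly
    # earlier in sequence 0; sorting by that start visits predecessors first.
    order = sorted(range(len(blocks)), key=lambda i: blocks[i][0][0])
    best: Dict[int, float] = {}
    prev: Dict[int, Optional[int]] = {}
    for pos, i in enumerate(order):
        best[i], prev[i] = blocks[i][2], None
        for j in order[:pos]:
            if precedes(blocks[j], blocks[i]) and best[j] + blocks[i][2] > best[i]:
                best[i], prev[i] = best[j] + blocks[i][2], j
    # Walk the back-pointers from the best-scoring final block.
    end: Optional[int] = max(best, key=lambda i: best[i])
    chain: List[int] = []
    while end is not None:
        chain.append(end)
        end = prev[end]
    chain.reverse()
    return best[chain[-1]], chain
```

A faster method would replace the inner loop over all earlier blocks with a range query; the quadratic version only makes the recurrence explicit.
-->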

<SEQ>
<UI>1429   Yamauchi,K.   The Sequence Flanking .. Nucleic Acids R 91 
19(10):2715-27
</UI>
<AU>Yamauchi K
</AU>
<TI>The Sequence Flanking Translational Initiation Site in Protozoa
</TI>
<SU>Consensus method;
    JP
</SU>
<AB>"If a specific nucleotide was observed at a frequency greater than 50% at
a specific position, it was defined as consensus nucleotide. If the sum of the
frequencies of two nucleotides was greater than 75% and neither nucleotide met
the criteria for a single consensus, they were assigned as co-consensus
nucleotides."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1991</PY>
<VO>19</VO>
<NO>10</NO>
<PP>2715-2720</PP>
</SEQ>
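<!--
The consensus rule quoted above can be sketched in Python as follows. The function names are hypothetical, and the input is assumed to be equal-length, gap-free aligned sequences. Only the two most frequent nucleotides need to be checked for co-consensus: two qualifying pairs cannot be disjoint (their total would exceed 100%), and if they shared a nucleotide, that nucleotide would have to exceed 50% on its own. Frequencies are compared with integer arithmetic so the 50% and 75% cut-offs are exact.

```python
from collections import Counter
from typing import List, Tuple


def column_consensus(column: str) -> Tuple[str, ...]:
    """Return (nt,) for a consensus, a sorted pair for co-consensus, else ()."""
    n = len(column)
    ranked = Counter(column).most_common()
    top, c1 = ranked[0]
    if 2 * c1 > n:                      # frequency greater than 50%
        return (top,)
    if len(ranked) > 1:
        second, c2 = ranked[1]
        if 4 * (c1 + c2) > 3 * n:       # pair frequency greater than 75%
            return tuple(sorted((top, second)))
    return ()


def consensus(seqs: List[str]) -> List[Tuple[str, ...]]:
    """Apply the rule to every column of equal-length aligned sequences."""
    return [column_consensus("".join(col)) for col in zip(*seqs)]
```

For example, a column reading G, G, C, C has no single consensus but yields the co-consensus pair ("C", "G").
-->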

<SEQ>
<UI>1430   Shapiro,M.B.  RNA Splice Junctions o.. Nucleic Acids R 87 
15(17):7155-71
</UI>
<AU>Shapiro MB;
    Senapathy P
</AU>
<TI>RNA Splice Junctions of Different Classes of Eukaryotes: Sequence
Statistics and Functional Implications in Gene Expression
</TI>
<SU>Consensus method;
    USA;
    RNA;
    Gene;
    Expression
</SU>
<AB>"The following simple rule was used in arriving at a consensus sequence at
each location: if the highest percentage computed for a particular nucleotide
site equals or exceeds 40, choose the corresponding nucleotide; choose also the
nucleotide with the second highest percentage if it equals or exceeds 30 and is
at least twice as large as the third highest percentage."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1987</PY>
<VO>15</VO>
<NO>17</NO>
<PP>7155-7174</PP>
</SEQ>
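<!--
A sketch of the Shapiro and Senapathy rule for a single aligned column. It assumes the second nucleotide is considered only when the first qualifies (one reading of "choose also"), and that a missing second or third nucleotide counts as 0%. Percentages are compared with integer arithmetic to avoid rounding at the 40% and 30% thresholds; `ss_column` is a hypothetical name.

```python
from collections import Counter
from typing import Tuple


def ss_column(column: str) -> Tuple[str, ...]:
    """Consensus for one column under the 40% / 30% / twice-the-third rule."""
    n = len(column)
    # Pad so that a missing second or third nucleotide counts as 0%.
    ranked = Counter(column).most_common() + [("", 0), ("", 0)]
    (b1, c1), (b2, c2), (_, c3) = ranked[:3]
    if 5 * c1 < 2 * n:                       # highest percentage below 40
        return ()
    if 10 * c2 >= 3 * n and c2 >= 2 * c3:    # second >= 30% and >= 2x third
        return (b1, b2)
    return (b1,)
```

With counts A 5, C 4, G 1 in ten sequences, both A (50%) and C (40%, at least twice G's 10%) are chosen.
-->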

<SEQ>
<UI>1431   Arratia,R.    Two Moments Suffice fo.. Ann.Probab.     89 
17(1):9-25
</UI>
<AU>Arratia R;
    Goldstein L;
    Gordon L
</AU>
<TI>Two Moments Suffice for Poisson Approximations: The Chen-Stein Method
</TI>
<SU>Statistical;
    Significance;
    Poisson;
    Chen-Stein;
    USA;
    Approximation
</SU>
<AB>"Convergence to the Poisson distribution, for the number of occurrences of
dependent events, can often be established by computing only first and second
moments, but not higher ones. This remarkable result is due to Chen (1975). The
method also provides an upper bound on the total variation distance to the
Poisson distribution, and succeeds in cases where third and higher moments blow
up. This paper presents Chen's results in a form that is easy to use and gives a
multivariable extension, which gives an upper bound on the total variation
distance between a sequence of dependent indicator functions and a Poisson
process with the same intensity."
</AB>
<JT>Ann Probab</JT>
<PY>1989</PY>
<VO>17</VO>
<NO>1</NO>
<PP>9-25</PP>
</SEQ>

<SEQ>
<UI>1432   Wilson,A.C.   Biochemical Evolution    Annu.Rev.Bioche 77 
46:573-639
</UI>
<AU>Wilson AC;
    Carlson SS;
    White TJ
</AU>
<TI>Biochemical Evolution
</TI>
<SU>Evolution;
    Clock;
    Evolutionary rate;
    USA
</SU>
<AB>"This review deals with the contributions of comparative studies on the
nucleic acids and proteins of present-day organisms to knowledge of evolution.
... The main concern of this review is with the rates at which base
substitutions and amino acid substitutions have been fixed and with the
relationship between these rates and the rates of organismal evolution. We
consider topics that have not been comprehensively reviewed before, such as
molecular evolution in primates, the generation-time hypothesis, stochastic
variation in the evolutionary clock, and the relationship between sequence
evolution and organismal evolution."
</AB>
<JT>Annu Rev Biochem</JT>
<PY>1977</PY>
<VO>46</VO>
<PP>573-639</PP>
</SEQ>

<SEQ>
<UI>1433   Day,W.H.E.    Sequence Analysis and .. CSNA Newsletter 94 
37(Nov.):0-0
</UI>
<AU>Day WHE
</AU>
<TI>Sequence Analysis and Comparison: A Bibliography. Version 4.0 - 5 October
1994
</TI>
<SU>Sequence analysis;
    Sequence comparison;
    Bibliography;
    CA
</SU>
<AB>"I am maintaining, in electronic form, a bibliography of papers on the
theory or methodology of sequence analysis, alignment, comparison or consensus.
The bibliography includes many papers on the estimation of phylogenies from
sequences. It has only a few papers on the alignment, comparison or prediction
of sequence structures." Included are instructions on: how to perform free-text
searching of version 3.0 with the Wide-Area Information Server (WAIS), and how
to obtain a text-only file of the bibliography by anonymous ftp (file transfer
protocol).
</AB>
<JT>CSNA Newsletter </JT>
<PY>1994</PY>
<VO>37</VO>
<NO>Nov.</NO>
<PP>0-0</PP>
</SEQ>

<SEQ>
<UI>1434   Dembo,A.      Strong Limit Theorems .. Ann.Probab.     91 
19(4):1737-175
</UI>
<AU>Dembo A;
    Karlin S
</AU>
<TI>Strong Limit Theorems of Empirical Functionals for Large Exceedances of
Partial Sums of I.I.D. Variables
</TI>
<SU>Statistical;
    Significance;
    Scoring;
    Sequence analysis;
    USA
</SU>
<AB>The paper's results are applied "in characterizing the composition of high
scoring segments in letter sequences .... The [problem is] of interest in
connection with molecular (DNA and protein) sequence comparisons (see Section 4,
Karlin and Altschul (1990) and Karlin, Dembo and Kawabata (1990)) ...."
</AB>
<JT>Ann Probab</JT>
<PY>1991</PY>
<VO>19</VO>
<NO>4</NO>
<PP>1737-1755</PP>
</SEQ>

<SEQ>
<UI>1435   Karlin,S.     Limit Distributions of.. Adv.Appl.Probab 92 
24:113-140
</UI>
<AU>Karlin S;
    Dembo A
</AU>
<TI>Limit Distributions of Maximal Segmental Score Among Markov-Dependent
Partial Sums
</TI>
<SU>Sequence analysis;
    Statistical;
    Significance;
    Markov;
    USA;
    Distribution;
    Score
</SU>
<AB>"In this paper we derive new probabilistic formulas useful for assessing
statistical significance (unusual high values) of a sequence segment composition
allowing a general scoring scheme in letter values in the context of
Markov-dependent sequences. (For biological discussions and applications, see Karlin &amp;
Altschul (1990) and Karlin, Bucher, Brendel &amp; Altschul (1991).) The formulas
have been incorporated into computer software that are now effective in the
analysis of biomolecular sequence data (e.g. Altschul, Gish, Miller, Myers &amp;
Lipman (1990), Altschul &amp; Lipman (1990), Karlin, Bucher, Brendel &amp; Altschul
(1991))."
</AB>
<JT>Adv Appl Probab</JT>
<PY>1992</PY>
<VO>24</VO>
<PP>113-140</PP>
</SEQ>
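<!--
The statistic whose limit distribution this paper studies is the maximal segmental score. Its observed value for a single sequence needs only one pass with the reset-at-zero recursion shown here. The letter scores are an assumed example, and the code does not implement the paper's probabilistic formulas or its Markov-dependent setting; `max_segment_score` is a hypothetical name.

```python
from typing import Dict, Sequence, Tuple


def max_segment_score(seq: Sequence[str], score: Dict[str, float]) -> Tuple[float, int, int]:
    """Highest-scoring contiguous segment as (score, start, end), end exclusive.

    Uses S_j = max(0, S_(j-1) + s(x_j)); returns (0.0, 0, 0) when no
    segment has a positive score.
    """
    best, best_start, best_end = 0.0, 0, 0
    running, start = 0.0, 0
    for j, letter in enumerate(seq):
        running += score[letter]
        if running <= 0:
            # A non-positive running sum cannot help a later segment: restart.
            running, start = 0.0, j + 1
        elif running > best:
            best, best_start, best_end = running, start, j + 1
    return best, best_start, best_end
```

Assessing whether such a maximum is unusually high is the part that needs the paper's limit distributions.
-->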

<SEQ>
<UI>1436   Gusfield,D.   Parametric Optimizatio.. ACM-SIAM Sympos 92 
3:432-439
</UI>
<AU>Gusfield D;
    Balasubramanian K;
    Naor D
</AU>
<TI>Parametric Optimization of Sequence Alignment
</TI>
<SU>Pairwise alignment;
    Parametric;
    Sequence alignment;
    USA;
    Optimization
</SU>
<AB>"Parametric Sequence Alignment is the problem of computing the optimal
valued alignment between two sequences as a function of variable weights for
matches, mismatches, spaces and gaps. The goal is to partition the parameter
space into regions (which are necessarily convex) such that in each region one
alignment is optimal throughout and such that the regions are maximal for this
property. In this paper we are primarily concerned with the structure of this
convex decomposition, and secondarily with the complexity of computing the
decomposition."
</AB>
<BK>ACM-SIAM Sympos Discrete Algorithms</BK>
<PY>1992</PY>
<VO>3</VO>
<PP>432-439</PP>
</SEQ>

<SEQ>
<UI>1437   Arratia,R.    Poisson Approximation .. Stat.Sci.       90 
5(4):403-434
</UI>
<AU>Arratia R;
    Goldstein L;
    Gordon L
</AU>
<TI>Poisson Approximation and the Chen-Stein Method
</TI>
<SU>Pairwise comparison;
    Statistical;
    Significance;
    Poisson;
    Chen-Stein;
    USA;
    Approximation
</SU>
<AB>Includes commentaries by J. M. Steele, A. D. Barbour, M. S. Waterman and
L. H. Y. Chen, and also a rejoinder by the authors. "The Chen-Stein method of
Poisson approximation is a powerful tool for computing an error bound when
approximating probabilities using the Poisson distribution. In many cases, this
bound may be given in terms of first and second moments alone. We present a
background of the method and state some fundamental Poisson approximation
theorems. The body of this paper is an illustration, through varied examples, of
the wide applicability and utility of the Chen-Stein method. ... We conclude
with an application to molecular biology."
</AB>
<JT>Stat Sci</JT>
<PY>1990</PY>
<VO>5</VO>
<NO>4</NO>
<PP>403-434</PP>
</SEQ>

<SEQ>
<UI>1438   Li,W.H.       Fundamentals of Molecu..                 91Sinauer 
Associa
</UI>
<AU>Li WH;
    Graur D
</AU>
<TI>Fundamentals of Molecular Evolution
</TI>
<SU>Gene;
    Evolution;
    Genome;
    Phylogeny;
    USA
</SU>
<AB>"We have set out to write a book for 'beginners' in molecular evolution.
At the same time, we have tried to maintain the standards of the scientific
method and to include quantitative treatments of the issues at hand. Therefore,
in describing evolutionary phenomena and mechanisms at the molecular level, both
mathematical and intuitive explanations are provided. Neither is meant to be at
the expense of the other; rather, the two approaches are intended to complement
each other and to help the reader achieve a better grasp of the issues. We have
not attempted to attain encyclopedic completeness, but have provided a large
number of examples to support and clarify the many theoretical arguments and
discussions."
</AB>
<PU>Sinauer Associates, Inc.</PU>
<PL>Sunderland, MA</PL>
<PY>1991</PY>
<PP>xv+284-0</PP>
</SEQ>

<SEQ>
<UI>1439                 Combinatorial Pattern ..                 
93Springer-Verlag
</UI>
<TI>Combinatorial Pattern Matching. 4th Annual Symposium, CPM 93. Proceedings.
Lecture Notes in Computer Science, Volume 684.
</TI>
<ED>Apostolico A
    Crochemore M;
    Galil Z;
    Manber U
</ED>
<SU>Pattern match;
    Italy;
    Combinatorial
</SU>
<AB>Padova, Italy, June 1993. "Combinatorial Pattern Matching addresses issues
of searching and matching of strings and more complicated patterns such as
trees, regular expressions, extended expressions, etc. The goal is to derive
nontrivial combinatorial properties for such structures and then to exploit
these properties in order to achieve superior performances for the corresponding
computational problems."
</AB>
<PU>Springer-Verlag </PU>
<PL>Berlin </PL>
<PY>1993</PY>
<PP>viii+265-0</PP>
</SEQ>

<SEQ>
<UI>1440                 Combinatorial Pattern ..                 
92Springer-Verlag
</UI>
<TI>Combinatorial Pattern Matching. Third Annual Symposium. Proceedings.
Lecture Notes in Computer Science, Volume 644.
</TI>
<ED>Apostolico A
    Crochemore M;
    Galil Z;
    Manber U
</ED>
<SU>Pattern match;
    Pattern search;
    USA;
    Combinatorial
</SU>
<AB>Tucson, Arizona, April/May 1992. "Combinatorial Pattern Matching addresses
issues of searching and matching of strings and more complicated patterns such
as trees, regular expressions, extended expressions, etc. The goal is to derive
nontrivial combinatorial properties for such structures and then to exploit
these properties in order to achieve superior performances for the corresponding
computational problems. In recent years, a steady flow of high-quality
scientific study of this subject has changed a sparse set of isolated results
into a full-fledged area of algorithmics. Still, there is currently no central
place for disseminating results in this area. We hope that CPM can grow to serve
as the focus point."
</AB>
<PU>Springer-Verlag </PU>
<PL>Berlin </PL>
<PY>1992</PY>
<PP>x+287-0</PP>
</SEQ>

<SEQ>
<UI>1441   Allison,L.    Using Hirschberg's Alg.. Inform.Process. 94 
51(5):251-255
</UI>
<AU>Allison L
</AU>
<TI>Using Hirschberg's Algorithm to Generate Random Alignments of Strings
</TI>
<SU>Pairwise alignment;
    Monte Carlo;
    Edit;
    Distance;
    Subsequence;
    Longest common;
    AU;
    Algorithm
</SU>
<AB>"Hirschberg [(1975)] gave an alignment algorithm for the longest common
subsequence problem that uses O(n^2) time and O(n) space for two strings of
length n. A simple modification of the algorithm can sample string alignments at
random according to their probability distribution. This is useful for
statistical estimation of evolutionary distances of a family of strings, e.g.
DNA strings. The algorithm's time and space complexity are unchanged."
</AB>
<JT>Inform Process Lett</JT>
<PY>1994</PY>
<VO>51</VO>
<NO>5</NO>
<PP>251-255</PP>
</SEQ>
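<!--
For reference, a sketch of Hirschberg's linear-space divide-and-conquer recursion for the longest common subsequence, on which the abstract builds. It returns one LCS rather than sampling alignments at random, so Allison's modification is not shown; `lcs_row` and `hirschberg` are hypothetical names. The first string is split in half, and a forward and a reversed score row locate where an optimal alignment crosses the middle, so only O(n) table cells are held at a time.

```python
from typing import List


def lcs_row(a: str, b: str) -> List[int]:
    """Last row of the LCS length table for a against b, in O(len(b)) space."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0] * (len(b) + 1)
        for j, bj in enumerate(b, 1):
            cur[j] = prev[j - 1] + 1 if ch == bj else max(prev[j], cur[j - 1])
        prev = cur
    return prev


def hirschberg(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b using linear space."""
    if not a or not b:
        return ""
    if len(a) == 1:
        return a if a in b else ""
    mid = len(a) // 2
    left = lcs_row(a[:mid], b)                          # LCS(a[:mid], b[:j])
    right = lcs_row(a[mid:][::-1], b[::-1])[::-1]       # LCS(a[mid:], b[j:])
    # Split b where the two halves together achieve the maximum.
    k = max(range(len(b) + 1), key=lambda j: left[j] + right[j])
    return hirschberg(a[:mid], b[:k]) + hirschberg(a[mid:], b[k:])
```

Total time stays O(mn) because each level of recursion does at most a constant multiple of the work of the level above.
-->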

<SEQ>
<UI>1442   Borodovsky,M. Deriving Non-Homogeneo.. Computers Chem. 94 
18(3):259-267
</UI>
<AU>Borodovsky M;
    Peresetsky A
</AU>
<TI>Deriving Non-Homogeneous DNA Markov Chain Models by Cluster Analysis
Algorithm Minimizing Multiple Alignment Entropy
</TI>
<SU>Pattern discovery;
    Markov;
    Multiple alignment;
    Clustering;
    Statistical;
    USA;
    DNA;
    Entropy;
    Model;
    Algorithm
</SU>
<AB>"Non-homogeneous Markov chain models can represent biologically important
regions of DNA sequences. The statistical pattern that is described by these
models is usually weak and was found primarily because of strong biological
indications. The general method for extracting similar patterns is presented in
the current paper. The algorithm incorporates cluster analysis, multiple
alignment and entropy minimization. ... These Markov models were already
employed in the GeneMark gene prediction algorithm, which is used in genome
sequencing projects."
</AB>
<JT>Computers Chem</JT>
<PY>1994</PY>
<VO>18</VO>
<NO>3</NO>
<PP>259-267</PP>
</SEQ>
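<!--
The abstract does not define its entropy objective, so this sketch assumes one common choice: the sum over columns of the Shannon entropy (in bits) of the letter frequencies in a gap-free multiple alignment. It only shows the kind of quantity a clustering step could try to minimize, not the authors' algorithm; `alignment_entropy` is a hypothetical name.

```python
import math
from collections import Counter
from typing import List


def alignment_entropy(seqs: List[str]) -> float:
    """Sum over columns of the Shannon entropy (bits) of letter frequencies."""
    total = 0.0
    for column in zip(*seqs):
        n = len(column)
        for count in Counter(column).values():
            p = count / n
            total -= p * math.log2(p)
    return total
```

A perfectly conserved column contributes 0 bits and a column split evenly between two letters contributes 1 bit.
-->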

<SEQ>
<UI>1443   Brendel,V.    Applications of Statis.. Computers Chem. 94 
18(3):251-253
</UI>
<AU>Brendel V;
    Karlin S
</AU>
<TI>Applications of Statistical Criteria in Protein Sequence Analysis: Case
Study of Yeast RNA Polymerase II Subunits
</TI>
<SU>Pattern discovery;
    Statistical;
    Sequence analysis;
    USA;
    Protein;
    RNA
</SU>
<AB>"We have recently proposed statistical techniques to identify unusual
protein sequence features [Brendel, Bucher, Nourbakhsh, Blaisdell, Karlin
(1992); Karlin, Brendel (1992); Karlin, Blaisdell, Bucher (1992)]. Extensive
mapping of these features to particular groups of proteins may afford new ways
of protein classification. Here we present a case study of such analysis by
discussing special features of the amino acid sequences of yeast RNA polymerase
II, the first eukaryotic RNA polymerase for which all subunits have been
sequenced."
</AB>
<JT>Computers Chem</JT>
<PY>1994</PY>
<VO>18</VO>
<NO>3</NO>
<PP>251-253</PP>
</SEQ>

<SEQ>
<UI>1444   Karlin,S.     A Method to Identify D.. J.Mol.Biol.     89 
205(1):165-177
</UI>
<AU>Karlin S;
    Blaisdell BE;
    Mocarski ES;
    Brendel V
</AU>
<TI>A Method to Identify Distinctive Charge Configurations in Protein
Sequences, with Application to Human Herpesvirus Polypeptides
</TI>
<SU>Pattern discovery;
    Statistical;
    Significance;
    Protein;
    USA;
    Charge
</SU>
<AB>"Charge interactions are of great importance for protein function and
structure, and for a variety of cellular and biochemical processes. We present a
systematic approach to the detection of distinctive clusters, runs and periodic
patterns of charged residues in a protein sequence. Criteria and formulae are
set forth to assess statistical significance of these charge configurations. ...
The statistics developed in this paper apply more generally to other than charge
properties of a protein and should aid in the evaluation of a large variety of
sequence features."
</AB>
<JT>J Mol Biol</JT>
<PY>1989</PY>
<VO>205</VO>
<NO>1</NO>
<PP>165-177</PP>
</SEQ>

<SEQ>
<UI>1445   Karlin,S.     Statistical Analyses o.. Nucleic Acids R 92 
20(6):1363-137
</UI>
<AU>Karlin S;
    Burge C;
    Campbell AM
</AU>
<TI>Statistical Analyses of Counts and Distributions of Restriction Sites in
DNA Sequences
</TI>
<SU>Pattern discovery;
    Statistical;
    Significance;
    Restriction;
    Distribution;
    DNA;
    USA
</SU>
<AB>"Counts and spacings of all 4- and 6-bp palindromes in DNA sequences from
a broad range of organisms were investigated. Both 4- and 6-bp average
palindrome counts were significantly low in all bacteriophages except one,
probably as a means of avoiding restriction enzyme cleavage. ... The counts and
distributions of 4-bp and 6-bp restriction sites in bacterial species are
variable. ... Interpretations of these results are given in terms of
restriction/methylation regimes, recombination and transcription processes, and
possible structural and regulatory roles of 4- and 6-bp palindromes."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1992</PY>
<VO>20</VO>
<NO>6</NO>
<PP>1363-1370</PP>
</SEQ>
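<!--
Counting the sites studied above reduces to testing whether each window equals its own reverse complement (for example GAATTC). This is a minimal sketch that ignores case and skips windows containing letters other than A, C, G and T. It only counts sites and does not reproduce the paper's statistical comparison with expected counts; the names are hypothetical. Odd k never yields a palindrome in this sense, because the middle base would have to equal its own complement.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def is_palindrome(site: str) -> bool:
    """True if site equals its reverse complement, e.g. GAATTC or GATC."""
    return site == site.translate(COMPLEMENT)[::-1]


def palindrome_counts(seq: str, k: int) -> Counter:
    """Count each k-bp palindrome in seq, skipping windows with non-ACGT letters."""
    seq = seq.upper()
    windows = (seq[i:i + k] for i in range(len(seq) - k + 1))
    return Counter(w for w in windows if set(w) <= set("ACGT") and is_palindrome(w))
```

Calling `palindrome_counts(seq, 4)` and `palindrome_counts(seq, 6)` gives the two count tables the abstract compares across organisms.
-->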

<SEQ>
<UI>1446   Karlin,S.     Quantile Distributions.. Protein Eng.    92 
5(8):729-738
</UI>
<AU>Karlin S;
    Blaisdell BE;
    Bucher P
</AU>
<TI>Quantile Distributions of Amino Acid Usage in Protein Classes
</TI>
<SU>Pattern discovery;
    Statistical;
    Significance;
    Distribution;
    Protein;
    USA;
    Amino acid
</SU>
<AB>"A comparative study of the compositional properties of various protein
sets from both cellular and viral organisms is presented. Invariants and
contrasts of amino acid usages have been discerned for different protein
function classes and for different species using robust statistical methods
based on quantile distributions and stochastic ordering relationships. In
addition, a quantitative criterion to assess amino acid compositional extremes
relative to a reference protein set is proposed and applied."
</AB>
<JT>Protein Eng</JT>
<PY>1992</PY>
<VO>5</VO>
<NO>8</NO>
<PP>729-738</PP>
</SEQ>

<SEQ>
<UI>1447   Karlin,S.     Patchiness and Correla.. Science         93 259 (29 
Jan.):
</UI>
<AU>Karlin S;
    Brendel V
</AU>
<TI>Patchiness and Correlations in DNA Sequences
</TI>
<SU>Pattern discovery;
    Statistical;
    DNA;
    Correlation;
    Stochastic;
    USA
</SU>
<AB>"The highly nonrandom character of genomic DNA can confound attempts at
modeling DNA sequence variation by standard stochastic processes (including
random walk or fractal models). In particular, the mosaic character of DNA
consisting of patches of different composition can fully account for apparent
long-range correlations in DNA."
</AB>
<JT>Science </JT>
<PY>1993</PY>
<VO>259</VO>
<NO>29 Jan.</NO>
<PP>677-680</PP>
</SEQ>

<SEQ>
<UI>1448   Dembo,A.      Central Limit Theorems.. Stochastic Proc 93 
45(2):259-271
</UI>
<AU>Dembo A;
    Karlin S
</AU>
<TI>Central Limit Theorems of Partial Sums for Large Segmental Values
</TI>
<SU>Pattern discovery;
    Statistical;
    Significance;
    Central limit;
    USA
</SU>
<AB>"Many random structures of theoretical and practical importance are
associated with sequences of real random variables of high aggregate values
having small probability, often exponentially small. In this context we set
forth a class of Gaussian distributional limit theorems conditioned on rare
events. The results can be construed as a central limit theorem in the context
of large deviation theory. ... Our motivation stems from biomolecular sequence
comparisons, Karlin and Altschul (1990), Karlin et al. (1990)."
</AB>
<JT>Stochastic Processes and Their Applications </JT>
<PY>1993</PY>
<VO>45</VO>
<NO>2</NO>
<PP>259-271</PP>
</SEQ>

<SEQ>
<UI>1449   Lawrence,C.E. Detecting Subtle Seque.. Science         93 262 (8 
Oct.):2
</UI>
<AU>Lawrence CE;
    Altschul SF;
    Boguski MS;
    Liu JS;
    Neuwald AF;
    Wootton JC
</AU>
<TI>Detecting Subtle Sequence Signals: A Gibbs Sampling Strategy for Multiple
Alignment
</TI>
<SU>Pattern discovery;
    Multiple alignment;
    Repeat;
    Signal;
    USA;
    Sampling
</SU>
<AB>"A wealth of protein and DNA sequence data is being generated .... A
crucial barrier to deciphering these sequences and understanding the relations
among them is the difficulty of detecting subtle local residue patterns common
to multiple sequences. Such patterns frequently reflect similar molecular
structures and biological properties. A mathematical definition of this 'local
multiple alignment' problem suitable for full computer automation has been used
to develop a new and sensitive algorithm, based on the statistical method of
iterative sampling."
</AB>
<JT>Science </JT>
<PY>1993</PY>
<VO>262</VO>
<NO>8 Oct.</NO>
<PP>208-214</PP>
</SEQ>

<SEQ>
<UI>1450   Apostolico,A. Optimal Canonization o.. Inform.Comput.  91 
95(1):76-95
</UI>
<AU>Apostolico A;
    Crochemore M
</AU>
<TI>Optimal Canonization of All Substrings of a String
</TI>
<SU>Sequence analysis;
    Factor;
    Optimal;
    Italy
</SU>
<AB>"Any word can be decomposed uniquely into lexicographically nonincreasing
factors each one of which is a Lyndon word. This paper addresses the
relationship between the Lyndon decomposition of a word x and a canonical
rotation of x, i.e., a rotation w of x that is lexicographically smallest among
all rotations of x. The main combinatorial result is a characterization of the
Lyndon factor of x with which w must start. As an application, faster on-line
algorithms for finding the canonical rotation(s) of x are developed by
nontrivial extension of known Lyndon factorization strategies."
</AB>
<JT>Inform Comput</JT>
<PY>1991</PY>
<VO>95</VO>
<NO>1</NO>
<PP>76-95</PP>
</SEQ>
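<!--
A sketch of the two objects the abstract relates: Duval's algorithm for the Lyndon factorization, and the least (canonical) rotation obtained by running the same scan over xx. This is the standard textbook approach rather than the authors' on-line algorithm, and the function names are hypothetical. Both functions run in linear time.

```python
from typing import List


def lyndon_factorization(s: str) -> List[str]:
    """Duval's algorithm: split s into nonincreasing Lyndon words."""
    factors: List[str] = []
    i, n = 0, len(s)
    while i < n:
        j, k = i + 1, i
        # Extend while s[i:j] stays a prefix of a power of a Lyndon word.
        while j < n and s[k] <= s[j]:
            k = i if s[k] < s[j] else k + 1
            j += 1
        # Emit each copy of the repeated Lyndon word of length j - k.
        while i <= k:
            factors.append(s[i:i + j - k])
            i += j - k
    return factors


def min_rotation(s: str) -> str:
    """Lexicographically smallest rotation, found by the same scan over s + s."""
    t, n = s + s, len(s)
    i = ans = 0
    while i < n:
        ans = i
        j, k = i + 1, i
        while j < 2 * n and t[k] <= t[j]:
            k = i if t[k] < t[j] else k + 1
            j += 1
        while i <= k:
            i += j - k
    return t[ans:ans + n]
```

For example, "banana" factors as b, an, an, a, and its canonical rotation is "abanan".
-->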

<SEQ>
<UI>1451   Apostolico,A. Efficient Detection of.. Theoret.Comput. 93 
119(2):247-265
</UI>
<AU>Apostolico A;
    Ehrenfeucht A
</AU>
<TI>Efficient Detection of Quasiperiodicities in Strings
</TI>
<SU>Sequence analysis;
    Regularities;
    USA;
    Detection
</SU>
<AB>"A string z is quasiperiodic if there is a second string w ≠ z such that
the occurrences of w in z cover z entirely, i.e., every position of z falls
within some occurrence of w in z. It is shown here that all maximal
quasiperiodic substrings of a string x of n symbols can be detected in time
O(n log^2 n)."
</AB>
<JT>Theoret Comput Sci</JT>
<PY>1993</PY>
<VO>119</VO>
<NO>2</NO>
<PP>247-265</PP>
</SEQ>
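<!--
A brute-force check of the definition above, for illustration only. A cover of z must be a prefix of z (position 0 has to be covered), so it suffices to try each proper prefix and scan its occurrences for gaps. This is far slower than O(n log^2 n), and unlike the paper it tests a whole string rather than finding all maximal quasiperiodic substrings; `covers` and `is_quasiperiodic` are hypothetical names.

```python
def covers(w: str, z: str) -> bool:
    """True if the occurrences of w in z together cover every position of z."""
    if not w or len(w) > len(z):
        return False
    reach = 0                      # positions before `reach` are covered so far
    start = z.find(w)
    while start != -1:
        if start > reach:          # an uncovered gap before this occurrence
            return False
        reach = start + len(w)
        start = z.find(w, start + 1)
    return reach == len(z)


def is_quasiperiodic(z: str) -> bool:
    """True if some proper prefix w != z covers z."""
    return any(covers(z[:m], z) for m in range(1, len(z)))
```

For instance, "aba" covers "ababa" through its overlapping occurrences at positions 0 and 2.
-->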

<SEQ>
<UI>1452   Apostolico,A. Self-Alignments in Wor.. J.Algorithms    92 
13(3):446-467
</UI>
<AU>Apostolico A;
    Szpankowski W
</AU>
<TI>Self-Alignments in Words and Their Applications
</TI>
<SU>Self alignment;
    Word;
    Repetition;
    Suffix;
    USA
</SU>
<AB>"Some quantities associated with periodicities in words are analyzed
within the Bernoulli probabilistic model. In particular, the following problem
is addressed. Assume that a string X is given, with symbols emitted randomly but
independently according to some known distribution of probabilities. Then, for
each pair (W,Z) of distinct suffixes of X, the expected length of the longest
common prefix of W and Z is sought. The collection of these lengths, that are
called here self-alignments, plays a crucial role in several algorithmic
problems on words, such as building suffix trees or inverted files, detecting
squares and other regularities, computing substring statistics, etc."
</AB>
<JT>J Algorithms </JT>
<PY>1992</PY>
<VO>13</VO>
<NO>3</NO>
<PP>446-467</PP>
</SEQ>

<SEQ>
<UI>1453   Baeza-Yates,R Fast Two-Dimensional P.. Inform.Process. 93 
45(1):51-57
</UI>
<AU>Baeza-Yates R;
    Regnier M
</AU>
<TI>Fast Two-Dimensional Pattern Matching
</TI>
<SU>Pattern match;
    Multidimensional;
    CL
</SU>
<AB>"An algorithm for searching for a two-dimensional m x m pattern in a
two-dimensional n x n text is presented. It performs on the average less comparisons
than the size of the text: n^2/m using m^2 extra space. Basically, it uses
multiple string matching on only n/m rows of the text. It runs in at most 2n^2
time and is close to the optimal n^2 time for many patterns. It steadily extends
to an alphabet-independent algorithm with a similar worst case. Experimental
results are included for a practical version."
</AB>
<JT>Inform Process Lett</JT>
<PY>1993</PY>
<VO>45</VO>
<NO>1</NO>
<PP>51-57</PP>
</SEQ>

<SEQ>
<UI>1454   Bairoch,A.    PROSITE: Recent Develo.. Nucleic Acids R 94 
22(17):3583-35
</UI>
<AU>Bairoch A;
    Bucher P
</AU>
<TI>PROSITE: Recent Developments
</TI>
<SU>Sequence database;
    PROSITE;
    Protein;
    Pattern library;
    SWI
</SU>
<AB>"PROSITE is a compilation of sites and patterns found in protein
sequences; it can be used as a method of determining the function of
uncharacterized proteins translated from genomic or cDNA sequences."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1994</PY>
<VO>22</VO>
<NO>17</NO>
<PP>3583-3589</PP>
</SEQ>

<SEQ>
<UI>1455   Bairoch,A.    The SWISS-PROT Protein.. Nucleic Acids R 94 
22(17):3578-35
</UI>
<AU>Bairoch A;
    Boeckmann B
</AU>
<TI>The SWISS-PROT Protein Sequence Data Bank: Current Status
</TI>
<SU>Sequence database;
    SWISS-PROT;
    Protein;
    SWI
</SU>
<AB>"SWISS-PROT is an annotated protein sequence database established in 1986
and maintained collaboratively, since 1988, by the Department of Medical
Biochemistry of the University of Geneva and the EMBL Data Library. The
SWISS-PROT protein sequence data bank consists of sequence entries. Sequence entries
are composed of different line types, each with their own format. For
standardization purposes the format of SWISS-PROT follows as closely as possible
that of the EMBL Nucleotide Sequence Database."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1994</PY>
<VO>22</VO>
<NO>17</NO>
<PP>3578-3580</PP>
</SEQ>

<SEQ>
<UI>1456   Bairoch,A.    PROSITE: A Dictionary .. Nucleic Acids R 92 
20(suppl):2013
</UI>
<AU>Bairoch A
</AU>
<TI>PROSITE: A Dictionary of Sites and Patterns in Proteins
</TI>
<SU>Sequence database;
    PROSITE;
    Protein;
    Pattern library;
    SWI
</SU>
<AB>"PROSITE is a compilation of sites and patterns found in protein
sequences. The use of protein sequence patterns (or motifs) to determine the
function of proteins is becoming very rapidly one of the essential tools of
sequence analysis. ... While there have been a number of recent reports that
review published patterns, no attempt had been made until very recently to
systematically collect biologically significant patterns or to discover new
ones. It is for these reasons that we have developed, since 1988, a dictionary
of sites and pattern which we call PROSITE."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1992</PY>
<VO>20</VO>
<NO>suppl</NO>
<PP>2013-2018</PP>
</SEQ>

<SEQ>
<UI>1457   Bairoch,A.    The SWISS-PROT Protein.. Nucleic Acids R 92 
20(suppl):2019
</UI>
<AU>Bairoch A;
    Boeckmann B
</AU>
<TI>The SWISS-PROT Protein Sequence Data Bank
</TI>
<SU>Sequence database;
    SWISS-PROT;
    Protein;
    SWI
</SU>
<AB>"SWISS-PROT is an annotated protein sequence database established in 1986
and maintained collaboratively, since 1988, by the Department of Medical
Biochemistry of the University of Geneva and the EMBL Data Library."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1992</PY>
<VO>20</VO>
<NO>suppl</NO>
<PP>2019-2022</PP>
</SEQ>

<SEQ>
<UI>1458   Appel,R.D.    A New Generation of In.. Trends Biochem. 94 
19(6):258-260
</UI>
<AU>Appel RD;
    Bairoch A;
    Hochstrasser DF
</AU>
<TI>A New Generation of Information Retrieval Tools for Biologists: The
Example of the ExPASy WWW Server
</TI>
<SU>Retrieval;
    WWW;
    Server;
    SWI
</SU>
<AB>"ExPASy is a WWW server set up at the University Hospital of Geneva and
the Medical Biochemistry Department of Geneva University, and is dedicated to
molecular biology with an emphasis on data relevant to proteins. The two main
entry points on the server give access to the SWISS-PROT database of annotated
protein sequences and the SWISS-2DPAGE database of two-dimensional gel
electrophoresis images. SWISS-PROT can be searched by protein description,
entry name or accession number, or referenced author name, as well as by
performing a full text search on all the annotation fields."
</AB>
<JT>Trends Biochem Sci</JT>
<PY>1994</PY>
<VO>19</VO>
<NO>6</NO>
<PP>258-260</PP>
</SEQ>

<SEQ>
<UI>1459   Crochemore,M. Two-Dimensional Patter.. Inform.Process. 93 
46(4):159-162
</UI>
<AU>Crochemore M;
    Gasieniec L;
    Rytter W
</AU>
<TI>Two-Dimensional Pattern Matching by Sampling
</TI>
<SU>Pattern match;
    Multidimensional;
    Sampling;
    FR
</SU>
<AB>"We extend the concept of deterministic sampling to the two-dimensional
pattern matching problem. We show that almost all patterns have a logarithmic
deterministic sample. There are 2D-matching algorithms which work efficiently
for almost all patterns. They solve the 2D-matching problem in linear sequential
time with O(1) space, or, alternatively in O(1) parallel time with linear number
of processors. This is the first attempt to reduce the space for two-dimensional
pattern matching."
</AB>
<JT>Inform Process Lett</JT>
<PY>1993</PY>
<VO>46</VO>
<NO>4</NO>
<PP>159-162</PP>
</SEQ>

<SEQ>
<UI>1460   Crochemore,M. Efficient Parallel Alg.. Inform.Process. 91 
38(2):57-60
</UI>
<AU>Crochemore M;
    Rytter W
</AU>
<TI>Efficient Parallel Algorithms to Test Square-Freeness and Factorize
Strings
</TI>
<SU>Regularities;
    Square;
    Parallel;
    Factor;
    FR;
    Algorithm
</SU>
<AB>"A string is square-free iff it does not contain a nonempty subword of the
form ww. We give an algorithm testing square-freeness of strings in log n time
with n processors of a CRCW PRAM. The input alphabet is not bounded. The best
sequential time algorithm for this problem takes O(n log n) time. Hence the
total number of operations in our parallel algorithm matches that of the best
sequential algorithm. The algorithm relies on an efficient parallel computation
of a factorization of words in text compression."
</AB>
<JT>Inform Process Lett</JT>
<PY>1991</PY>
<VO>38</VO>
<NO>2</NO>
<PP>57-60</PP>
</SEQ>

<SEQ>
<UI>1461   Crochemore,M. Parallel Construction .. Inform.Process. 90 
35(3):121-128
</UI>
<AU>Crochemore M;
    Rytter W
</AU>
<TI>Parallel Construction of Minimal Suffix and Factor Automata
</TI>
<SU>Automata;
    Factor;
    Suffix;
    Parallel;
    FR
</SU>
<AB>"We show that the constructions of directed acyclic word graphs (dawg's)
and of minimal suffix and minimal factor automata can be done by almost optimal
parallel algorithms (optimal within logarithmic factor). In the concurrent-write
model our algorithms work in log n time and in the exclusive-write model they
work in log^2 n time. The number of employed processors is linear. Hence our
constructions have the same complexity as the best known parallel algorithms
computing suffix trees. A relationship between dawg's and suffix trees is
exploited ...."
</AB>
<JT>Inform Process Lett</JT>
<PY>1990</PY>
<VO>35</VO>
<NO>3</NO>
<PP>121-128</PP>
</SEQ>

<SEQ>
<UI>1462   Neraud,J.     A String-Matching Inte.. Theoret.Comput. 92 
92(1):145-164
</UI>
<AU>Neraud J;
    Crochemore M
</AU>
<TI>A String-Matching Interpretation of the Equation x^m y^n = z^p
</TI>
<SU>String match;
    Pattern match;
    On-line;
    FR
</SU>
<AB>"We consider the following problem. Instance: a finite alphabet A, a
biprefix code X = {x,y} whose elements are primitive, a word w in A*. Question:
find all maximal factors of w which are prefixes of a word of X*. We present an
on-line algorithm which solves the problem in time linear in the length of w,
after a preprocessing phase applied to the set X."
</AB>
<JT>Theoret Comput Sci</JT>
<PY>1992</PY>
<VO>92</VO>
<NO>1</NO>
<PP>145-164</PP>
</SEQ>

<SEQ>
<UI>1463   Crochemore,M. Usefulness of the Karp.. Theoret.Comput. 91 
88(1):59-82
</UI>
<AU>Crochemore M;
    Rytter W
</AU>
<TI>Usefulness of the Karp-Miller-Rosenberg Algorithm in Parallel Computations
on Strings and Arrays
</TI>
<SU>String match;
    Parallel;
    Multidimensional;
    Sequence analysis;
    Repeat;
    FR;
    Algorithm
</SU>
<AB>"The Karp-Miller-Rosenberg (1972) algorithm was one of the first efficient
(almost linear) sequential algorithms for finding repeated patterns and for
string matching. In the area of efficient sequential computations on strings it
was soon superseded by more efficient (and more sophisticated) algorithms. We
show that the Karp-Miller-Rosenberg algorithm (KMR) must be considered as a
basic technique in parallel computations. For many problems, variations of KMR
give the (known) most efficient parallel algorithms."
</AB>
<JT>Theoret Comput Sci</JT>
<PY>1991</PY>
<VO>88</VO>
<NO>1</NO>
<PP>59-82</PP>
</SEQ>

<SEQ>
<UI>1464   Fuchs,R.      Molecular Biological D.. Progress in Bio 91 
56(3):215-245
</UI>
<AU>Fuchs R;
    Cameron GN
</AU>
<TI>Molecular Biological Databases: The Challenge of the Genome Era
</TI>
<SU>Sequence database;
    Genome;
    DE
</SU>
<AB>"In this article we discuss the implications which the advances in
sequencing technology and the genome analysis projects will have for the
existing sequence databanks and how they can react to the challenges of the
future. ... The first section provides some basic information on sequence
databases and genome projects in order to improve the understanding of the
problems which the databanks will have to face in the coming years. Then, the
consequences of large-scale sequencing and genome analysis projects are
explained in detail and it is shown that they require fundamental changes to
the work of the sequence databanks. Next, different approaches and strategies
for coping with the forthcoming problems are outlined, and finally we present a
model for a next generation of sequence and other biological databases which
requires a conceptual reorganization of these databases, but which offers good
chances for successfully mastering the challenges of the future."
</AB>
<JT>Progress in Biophysics and Molecular Biology </JT>
<PY>1991</PY>
<VO>56</VO>
<NO>3</NO>
<PP>215-245</PP>
</SEQ>

<SEQ>
<UI>1465   Higgins,D.G.  The EMBL Data Library    Nucleic Acids R 92 
20(suppl):2071
</UI>
<AU>Higgins DG;
    Fuchs R;
    Stoehr PJ;
    Cameron GN
</AU>
<TI>The EMBL Data Library
</TI>
<SU>Sequence database;
    EMBL;
    Nucleotide;
    DE
</SU>
<AB>"The EMBL Data Library is part of the European Molecular Biology
Laboratory in Heidelberg, Germany. It was established in 1980 and its principal
role is to maintain and distribute a database of nucleotide sequences (the EMBL
Nucleotide Sequence Database). It is also involved in maintaining other
biological databases such as the protein sequence database SWISS-PROT and
distributes other databases of interest to molecular biologists."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1992</PY>
<VO>20</VO>
<NO>suppl</NO>
<PP>2071-2074</PP>
</SEQ>

<SEQ>
<UI>1466   Luttke,A.     MacP12: A Protein Prop.. Comput.Appl.Bio 93 
9(6):760-761
</UI>
<AU>Luttke A;
    Fuchs R
</AU>
<TI>MacP12: A Protein Property Multi-Profile Plot Program for the Apple
Macintosh
</TI>
<SU>Sequence analysis;
    Display;
    Profile;
    Protein;
    Amino acid;
    DE;
    Program
</SU>
<AB>"MacP12, a program for the Apple Macintosh, allows simultaneous plotting
of two protein property profiles selectable from 12 built-in amino acid property
scales. Various parameters such as the region to be analyzed, the size of the
sliding window, the weighting function and the size of the graphical output can
be easily adjusted by the user, which makes this program appropriate for diverse
research questions. Since built-in scales can be simply exchanged, MacP12 is
adaptable to the specific needs of the individual user."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1993</PY>
<VO>9</VO>
<NO>6</NO>
<PP>760-761</PP>
</SEQ>

<SEQ>
<UI>1467   Benner,S.A.   Empirical and Structur.. J.Mol.Biol.     93 
229(4):1065-10
</UI>
<AU>Benner SA;
    Cohen MA;
    Gonnet GH
</AU>
<TI>Empirical and Structural Models for Insertions and Deletions in the
Divergent Evolution of Proteins
</TI>
<SU>Evolution;
    Protein;
    Model;
    Indel;
    SWI;
    Deletion
</SU>
<AB>"The exhaustive matching of the protein sequence database makes possible a
broadly based study of insertions and deletions (indels) during divergent
evolution. In this study, the probability of a gap in an alignment of a pair of
homologous protein sequences was found to increase with the evolutionary
distance measured in PAM units (number of accepted point mutations per 100 amino
acid residues). A relationship between the average number of amino acid residues
between indels and evolutionary distance suggests that a unit 30 to 40 amino
acid residues in length remains, on average, undisrupted by indels during
divergent evolution."
</AB>
<JT>J Mol Biol</JT>
<PY>1993</PY>
<VO>229</VO>
<NO>4</NO>
<PP>1065-1082</PP>
</SEQ>

<SEQ>
<UI>1468   Gusfield,D.   Faster Implementation .. Inform.Process. 94 
51(5):271-274
</UI>
<AU>Gusfield D
</AU>
<TI>Faster Implementation of a Shortest Superstring Approximation
</TI>
<SU>Supersequence;
    Shortest common;
    Approximation;
    Data structure;
    Regularities;
    USA
</SU>
<AB>"The shortest superstring problem has recently received renewed attention
due to its connection to problems in sequencing long pieces of DNA. Most
recently, Teng and Yao (1993) developed an approximation algorithm for the
shortest superstring problem which has a smaller error bound than the previously
best approximation due to Blum, Jiang, Li, Tromp and Yannakakis (1991). ... In
this paper we reduce the worst case running time for the new approximation
method of Teng and Yao, making its running time competitive with the
approximation method of Blum et al. We exploit suffix trees and properties of
the periodicity of strings."
</AB>
<JT>Inform Process Lett</JT>
<PY>1994</PY>
<VO>51</VO>
<NO>5</NO>
<PP>271-274</PP>
</SEQ>

<SEQ>
<UI>1469   Gusfield,D.   An Efficient Algorithm.. Inform.Process. 92 
41(4):181-185
</UI>
<AU>Gusfield D;
    Landau GM;
    Schieber B
</AU>
<TI>An Efficient Algorithm for the All Pairs Suffix-Prefix Problem
</TI>
<SU>Sequence analysis;
    Suffix;
    Prefix;
    USA;
    Algorithm
</SU>
<AB>"For a pair of strings (S1, S2), define the suffix-prefix match of (S1,
S2) to be the longest suffix of string S1 that matches a prefix of string S2.
The following problem is considered in this paper. Given a collection of strings
S1, S2, ..., Sk of total length m, find the suffix-prefix match for each of the
k(k-1) ordered pairs of strings. We present an algorithm that solves the problem
in O(m + k^2) time, for any fixed alphabet. Since the size of the input is Ω(m)
and the size of the output is Ω(k^2), this solution is optimal."
</AB>
<JT>Inform Process Lett</JT>
<PY>1992</PY>
<VO>41</VO>
<NO>4</NO>
<PP>181-185</PP>
</SEQ>

<SEQ>
<UI>1470   Gusfield,D.   Parametric Optimizatio.. Algorithmica    94 
12(4/5):312-32
</UI>
<AU>Gusfield D;
    Balasubramanian K;
    Naor D
</AU>
<TI>Parametric Optimization of Sequence Alignment
</TI>
<SU>Sequence alignment;
    Pairwise alignment;
    Parametric;
    Edit;
    Distance;
    Optimization;
    USA
</SU>
<AB>"Parametric sequence alignment is the problem of computing the optimal-valued
alignment between two sequences as a function of variable weights for
matches, mismatches, spaces, and gaps. ... In this paper we are primarily
concerned with the structure of this convex decomposition, and secondarily with
the complexity of computing the decomposition. The most striking results are the
following: For the special case where only matches, mismatches, and spaces are
counted, and where spaces are counted throughout the alignment, we show that the
decomposition is surprisingly simple: all regions are infinite; there are at
most n^(2/3) regions; the lines that bound the regions are all of the form
b = c + (c + 0.5)a; ..."
</AB>
<JT>Algorithmica </JT>
<PY>1994</PY>
<VO>12</VO>
<NO>4/5</NO>
<PP>312-326</PP>
</SEQ>

<SEQ>
<UI>1471   Hendy,M.D.    A Combinatorial Descri.. Discrete Math.  91 
96(1):51-58
</UI>
<AU>Hendy MD
</AU>
<TI>A Combinatorial Description of the Closest Tree Algorithm for Finding
Evolutionary Trees
</TI>
<SU>Phylogeny;
    Combinatorial;
    Evolutionary tree;
    NZ;
    Algorithm
</SU>
<AB>"The closest tree algorithm for estimating the evolutionary history of n
species, from a set of homologous DNA or RNA sequences is designed to avoid the
problem of inconsistency inherent in current methods. ... In this paper, a new
description of the algorithm is given, exploiting a combinatorial inverse pair
relationship. As a consequence, the algorithm can be improved in efficiency, to
be O(n 2^n) for some classes of sequences. This improvement makes the algorithm
practical for problems involving up to n = 20 species."
</AB>
<JT>Discrete Math</JT>
<PY>1991</PY>
<VO>96</VO>
<NO>1</NO>
<PP>51-58</PP>
</SEQ>

<SEQ>
<UI>1472   Hendy,M.D.    A Discrete Fourier Ana.. Proc.Nat.Acad.S 94 
91(8):3339-334
</UI>
<AU>Hendy MD;
    Penny D;
    Steel MA
</AU>
<TI>A Discrete Fourier Analysis for Evolutionary Trees
</TI>
<SU>Evolutionary tree;
    Fourier;
    Spectral analysis;
    NZ
</SU>
<AB>"Discrete Fourier transformations have recently been developed to model
the evolution of two-state characters (the Cavender/Farris model). We report
here the extension of these transformations to provide invertible relationships
between a phylogenetic tree T (with three probability parameters of nucleotide
substitution on each edge corresponding to Kimura's 3ST model) and the expected
frequencies of the nucleotide patterns in the sequences. We refer to these
relationships as spectral analysis."
</AB>
<JT>Proc Nat Acad Sci USA </JT>
<PY>1994</PY>
<VO>91</VO>
<NO>8</NO>
<PP>3339-3343</PP>
</SEQ>

<SEQ>
<UI>1473   Feldman,W.    Gray Code Masks for Se.. Genomics        94 
23:233-235
</UI>
<AU>Feldman W;
    Pevzner P
</AU>
<TI>Gray Code Masks for Sequencing by Hybridization
</TI>
<SU>Sequencing;
    Hybridization;
    Error;
    USA;
    Mask
</SU>
<AB>"In light-directed synthesis of high-density oligonucleotide arrays for
sequencing by hybridization, synthesis errors result from the unintended
illumination of chip regions that should remain dark. Most synthesis errors
occur at the borders of illuminated regions, where light diffraction, internal
reflection, and scattering produce the most unintended illumination. A
combinatorial synthesis strategy based on two-dimensional Gray codes was devised
to reduce the overall lengths of these borders in masks for photolithographic
chip design. This article describes an application of two-dimensional Gray codes
...."
</AB>
<JT>Genomics </JT>
<PY>1994</PY>
<VO>23</VO>
<PP>233-235</PP>
</SEQ>

<SEQ>
<UI>1474   Shields,D.C.  GCWIND: A Microcompute.. Comput.Appl.Bio 92 
8(5):521-523
</UI>
<AU>Shields DC;
    Higgins DG;
    Sharp PM
</AU>
<TI>GCWIND: A Microcomputer Program for Identifying Open Reading Frames
According to Codon Positional G+C Content
</TI>
<SU>Frame;
    Reading;
    Codon;
    Program;
    IR
</SU>
<AB>"GCWIND is a microcomputer (IBM-PC compatible) program for the
identification of protein-coding open reading frames. The program is similar to
the FRAME program .... The base compositions (%G+C) for each of the three
possible reading phases through the DNA sequence are displayed separately,
together with the positions of potential translation initiation and termination
codons (on the leading and complementary strands), to provide an immediate
representation of those regions within the sequence that have coding potential."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1992</PY>
<VO>8</VO>
<NO>5</NO>
<PP>521-523</PP>
</SEQ>

<SEQ>
<UI>1475   Kishino,H.    Maximum Likelihood Inf.. J.Mol.Evol.     90 
31(2):151-160
</UI>
<AU>Kishino H;
    Miyata T;
    Hasegawa M
</AU>
<TI>Maximum Likelihood Inference of Protein Phylogeny and the Origin of
Chloroplasts
</TI>
<SU>Phylogeny;
    Likelihood;
    Protein;
    Markov;
    JP;
    Chloroplast
</SU>
<AB>"A maximum likelihood method for inferring protein phylogeny was
developed. It is based on a Markov model that takes into account the unequal
transition probabilities among pairs of amino acids and does not assume
constancy of rate among different lineages. Therefore, this method is expected
to be powerful in inferring phylogeny among distantly related proteins, either
orthologous or paralogous, where the evolutionary rate may deviate from
constancy. Not only amino acid substitutions but also insertion/deletion events
during evolution were incorporated into the Markov model."
</AB>
<JT>J Mol Evol</JT>
<PY>1990</PY>
<VO>31</VO>
<NO>2</NO>
<PP>151-160</PP>
</SEQ>

<SEQ>
<UI>1476   Myers,E.W.    A Sublinear Algorithm .. Algorithmica    94 
12(4/5):345-37
</UI>
<AU>Myers EW
</AU>
<TI>A Sublinear Algorithm for Approximate Keyword Searching
</TI>
<SU>Approximate match;
    Database search;
    Dynamic programming;
    USA;
    Algorithm
</SU>
<AB>"Given a relatively short query string W of length P, a long subject
string A of length N, and a threshold D, the approximate keyword search problem
is to find all substrings of A that align with W with not more than D
insertions, deletions, and mismatches. ... In this paper we present an algorithm
that given a precomputed index of the database A, finds rare matches in time
that is sublinear in N, i.e., N^c for some c &lt; 1. The sequence A must be over a
finite alphabet Σ. ... In preliminary practical experiments, the approach gives
a 50- to 500-fold improvement over previous algorithms for problems of interest
in molecular biology."
</AB>
<JT>Algorithmica </JT>
<PY>1994</PY>
<VO>12</VO>
<NO>4/5</NO>
<PP>345-374</PP>
</SEQ>

<SEQ>
<UI>1477   Fischetti,V.A Identifying Periodic O.. Inform.Process. 93 
45(1):11-18
</UI>
<AU>Fischetti VA;
    Landau GM;
    Schmidt JP;
    Sellers PH
</AU>
<TI>Identifying Periodic Occurrences of a Template with Applications to
Protein Structure
</TI>
<SU>Protein;
    Structure;
    Template;
    Regularities;
    String match;
    USA
</SU>
<AB>Author sequence and affiliations were corrected on page 157 of Inform.
Process. Lett. 46 (1993). "Consider a template P of size m in which each
character matches many different characters with various degrees of perfection.
Given a text T of size n, we present a simple and practical algorithm that finds
the substring of T, which best matches some substring of Pn (Pn is the
concatenation of an arbitrary number of copies of P). The algorithm produces the
matched pair and their alignment in O(mn) time."
</AB>
<JT>Inform Process Lett</JT>
<PY>1993</PY>
<VO>45</VO>
<NO>1</NO>
<PP>11-18</PP>
</SEQ>

<SEQ>
<UI>1478   Li,W.H.       Unbiased Estimation of.. J.Mol.Evol.     93 
36(1):96-99
</UI>
<AU>Li WH
</AU>
<TI>Unbiased Estimation of the Rates of Synonymous and Nonsynonymous
Substitution
</TI>
<SU>Substitution;
    Synonymous;
    Rate;
    Transition;
    Transversion;
    USA;
    Estimation
</SU>
<AB>"The current convention in estimating the number of substitutions per
synonymous site (KS) and per nonsynonymous site (KA) between two protein-coding
genes is to count each twofold degenerate site as one-third synonymous and
two-thirds nonsynonymous because one of the three possible changes at such a
site is synonymous and the other two are nonsynonymous. This counting rule can
considerably overestimate the KS value .... A new method that gives unbiased
estimates is proposed."
</AB>
<JT>J Mol Evol</JT>
<PY>1993</PY>
<VO>36</VO>
<NO>1</NO>
<PP>96-99</PP>
</SEQ>

<SEQ>
<UI>1479   Gates,W.H.    Bounds for Sorting by .. Discrete Math.  79 27:47-57
</UI>
<AU>Gates WH;
    Papadimitriou CH
</AU>
<TI>Bounds for Sorting by Prefix Reversal
</TI>
<SU>Inversion;
    Prefix;
    Reversal;
    Genomic;
    USA
</SU>
<AB>"For a permutation σ of the integers from 1 to n, let f(σ) be the smallest
number of prefix reversals that will transform σ to the identity permutation,
and let f(n) be the largest such f(σ) for all σ in (the symmetric group) Sn. We
show that f(n) &lt;= (5n + 5)/3, and that f(n) &gt;= 17n/16 for n a multiple of
16. If, furthermore, each integer is required to participate in an even number
of reversed prefixes, the corresponding function g(n) is shown to obey
3n/2 - 1 &lt;= g(n) &lt;= 2n + 3."
</AB>
<JT>Discrete Math</JT>
<PY>1979</PY>
<VO>27</VO>
<PP>47-57</PP>
</SEQ>

<SEQ>
<UI>1480   Myers,E.W.    An O(N**2 log N) Restr.. Bull.Math.Biol. 92 
54(4):599-618
</UI>
<AU>Myers EW;
    Huang X
</AU>
<TI>An O(N^2 log N) Restriction Map Comparison and Search Algorithm
</TI>
<SU>Restriction;
    Mapping;
    USA;
    Algorithm
</SU>
<AB>"We present an O(R log P) time, O(M + P^2) space algorithm for searching a
restriction map with M sites for the best matches to a shorter map with P sites,
where R, the number of matching site pairs, is bounded by MP. As first proposed
by Waterman et al. (1984) the objective function used to score matches is
additive in the number of unaligned sites and the discrepancies in the distances
between adjacent aligned sites. Our algorithm is basically a sparse dynamic
programming computation in which 'candidate lists' are used to model the future
contribution of all previously computed entries to those yet to be computed."
</AB>
<JT>Bull Math Biol</JT>
<PY>1992</PY>
<VO>54</VO>
<NO>4</NO>
<PP>599-618</PP>
</SEQ>

<SEQ>
<UI>1481   Pevzner,P.A.  Optimal Chips for Mega.. Mol.Biol.(Mosc. 91 25(2 
part 2):4
</UI>
<AU>Pevzner PA;
    Lysov YuP;
    Khrapko KR;
    Belyavskii AV;
    Florentev VL;
    Mirzabekov AD
</AU>
<TI>Optimal Chips for Megabase DNA Sequencing
</TI>
<SU>Sequencing;
    DNA;
    Optimal;
    RU
</SU>
<AB>Translated from Molekulyarnaya Biologiya, 25(2), 552-562, March-April
1991. "A new approach to DNA sequencing associated with hybridization of a DNA
fragment with oligonucleotides immobilized on a two-dimensional matrix (the SHOM
method) was proposed in 1988. The first SHOM studies were directed at creating a
sequence matrix containing all 65,536 octanucleotides. A new family of
sequencing matrices has now been proposed, making it possible to reduce the
number of oligonucleotides to be synthesized by a factor of 5-15 with virtually
no decrease in method resolution."
</AB>
<JT>Mol Biol (Mosc ) </JT>
<PY>1991</PY>
<VO>25</VO>
<NO>2 part 2</NO>
<PP>459-467</PP>
</SEQ>

<SEQ>
<UI>1482   Gelfand,M.S.  Extendable Words in Nu.. Comput.Appl.Bio 92 
8(2):129-135
</UI>
<AU>Gelfand MS;
    Kozhukhin CG;
    Pevzner PA
</AU>
<TI>Extendable Words in Nucleotide Sequences
</TI>
<SU>Sequence analysis;
    Word;
    Statistical;
    Nucleotide;
    Linguistic;
    RU
</SU>
<AB>"Previous statistical analyses revealed several peculiarities of
nucleotide sequences that preclude their description by existing models and thus
allow one to distinguish DNA and RNA sequences from random A,T,C,G-texts. This
is a consequence of the unusual distribution of certain words in nucleotide
sequences: while the distribution of (most) words is consistent with Markov
models of small orders, the distribution of certain words cannot be described by
any previous model .... In this work we introduce a probabilistic approach that
is partly motivated by analogy with linguistics."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1992</PY>
<VO>8</VO>
<NO>2</NO>
<PP>129-135</PP>
</SEQ>

<SEQ>
<UI>1483   Sankoff,D.    Analytical Approaches .. Biochimie       93 
75(5):409-413
</UI>
<AU>Sankoff D
</AU>
<TI>Analytical Approaches to Genomic Evolution
</TI>
<SU>Genomic;
    Evolution;
    Analytical;
    CA
</SU>
<AB>"We model the non-local mechanisms of genomic evolution and propose
methods for studying the evolutionary divergence of species based on these
models. Mechanisms include the movement of segments of genomes within a single
chromosome (transpositions), the reciprocal translocation of segments between
two chromosomes, and the inversion of segments. Each of these is studied in the
context of a different type of genomic data. We introduce the theory of
phylogenetic invariants for evolutionary inference based on very long
macromolecular sequences."
</AB>
<JT>Biochimie </JT>
<PY>1993</PY>
<VO>75</VO>
<NO>5</NO>
<PP>409-413</PP>
</SEQ>

<SEQ>
<UI>1484   Ferretti,V.   The Empirical Discover.. Adv.Appl.Probab 93 
25(2):290-302
</UI>
<AU>Ferretti V;
    Sankoff D
</AU>
<TI>The Empirical Discovery of Phylogenetic Invariants
</TI>
<SU>Phylogeny;
    Invariant;
    Markov;
    CA;
    Phylogenetic
</SU>
<AB>"An invariant F of a tree T under a k-state Markov model, where the time
parameter is identified with the edges of T, allows us to recognize whether data
on N observed species can be associated with the N terminal vertices of T in the
sense of having been generated on T rather than on any other tree with N
terminals. The invariance is with respect to the (time) lengths associated with
the edges of the tree. We propose a general method of finding invariants of a
parametrized functional form. ... We apply this to the case of quadratic
invariants of unrooted binary trees with four terminals, for all k, using the
Jukes-Cantor type of Markov matrix."
</AB>
<JT>Adv Appl Probab</JT>
<PY>1993</PY>
<VO>25</VO>
<NO>2</NO>
<PP>290-302</PP>
</SEQ>

<SEQ>
<UI>1485   Sibbald,P.R.  Overseer: A Nucleotide.. Comput.Appl.Bio 92 
8(1):45-48
</UI>
<AU>Sibbald PR;
    Sommerfeldt H;
    Argos P
</AU>
<TI>Overseer: A Nucleotide Sequence Searching Tool
</TI>
<SU>Sequence search;
    Nucleotide;
    Database search;
    DE
</SU>
<AB>"Overseer is a computer program that searches databases of nucleic acid
sequences for objects of interest to the user. Such objects may consist of any
number of simpler building blocks such as repeats, palindromes or stem-loops,
strings of particular bases with or without mismatches, etc. Written in standard
Pascal, this program runs under Unix and VMS and should also run under other
operating systems."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1992</PY>
<VO>8</VO>
<NO>1</NO>
<PP>45-48</PP>
</SEQ>

<SEQ>
<UI>1486   Guigo,R.      Inferring Correlation .. IEEE Trans.Patt 93 
15(10):1030-10
</UI>
<AU>Guigo R;
    Smith TF
</AU>
<TI>Inferring Correlation between Database Queries: Analysis of Protein
Sequence Patterns
</TI>
<SU>Database search;
    Correlation;
    Protein;
    Pattern discovery;
    USA;
    Query
</SU>
<AB>"Given a subset P of a database, we address the problem of finding the
query f in a given database attribute having the closest extension to P. In the
particular case that we outline, P is the set of protein sequences in a protein
sequence database matching a given protein sequence pattern, whereas f is a
query in the annotation of the database. Ideally, f is the description of a
biological function. If the extension of f is very similar to P, we may infer
association between the pattern and the biological function described by the
query."
</AB>
<JT>IEEE Trans Patt Anal Mach Intell</JT>
<PY>1993</PY>
<VO>15</VO>
<NO>10</NO>
<PP>1030-1041</PP>
</SEQ>

<SEQ>
<UI>1487   Guigo,R.      Automatic Evaluation o.. Comput.Appl.Bio 91 
7(3):309-315
</UI>
<AU>Guigo R;
    Johansson A;
    Smith TF
</AU>
<TI>Automatic Evaluation of Protein Sequence Functional Patterns
</TI>
<SU>Protein;
    Pattern discovery;
    USA
</SU>
<AB>"A procedure that automatically provides an evaluation of the diagnostic
ability of a protein sequence functional pattern is described. The procedure
relies on the identification of the closest definable set in terms of a (protein
sequence) database functional annotation to the set of database instances
containing a given pattern. Assuming annotation correctness and completeness in
the protein sequence database, the degree of statistical association between
these sets provides an appropriate measure of the diagnostic ability of the
pattern."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1991</PY>
<VO>7</VO>
<NO>3</NO>
<PP>309-315</PP>
</SEQ>

<SEQ>
<UI>1488   Lamperti,E.D. Corruption of Genomic .. Nucleic Acids R 92 
20(11):2741-27
</UI>
<AU>Lamperti ED;
    Kittelberger JM;
    Smith TF;
    Villa-Komaroff L
</AU>
<TI>Corruption of Genomic Databases with Anomalous Sequence
</TI>
<SU>Sequence database;
    Reliability;
    Genomic;
    USA
</SU>
<AB>"We describe evidence that DNA sequences from vectors used for cloning and
sequencing have been incorporated accidentally into eukaryotic entries in the
GenBank database. These incorporations were not restricted to one type of vector
or to a single mechanism. Many minor instances may have been the result of
simple editing errors, but some entries contained large blocks of vector
sequence that had been incorporated by contamination or other accidents during
cloning. Some cases involved unusual rearrangements and areas of vector distant
from the normal insertion sites."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1992</PY>
<VO>20</VO>
<NO>11</NO>
<PP>2741-2747</PP>
</SEQ>

<SEQ>
<UI>1489   Steel,M.A.    Confidence in Evolutio.. Nature (Lond.)  93 364 (29 
Jul.):
</UI>
<AU>Steel MA;
    Lockhart PJ;
    Penny D
</AU>
<TI>Confidence in Evolutionary Trees from Biological Sequence Data
</TI>
<SU>Evolutionary tree;
    Confidence;
    Significance;
    NZ
</SU>
<AB>"Where genomes have independently acquired similar G+C base compositions,
signals in the data arise that cause methods of evolutionary tree reconstruction
to estimate the wrong tree by grouping together sequences with similar G+C
content. Under these conditions randomization tests can lead to both the
rejection of the correct evolutionary hypothesis and acceptance of an incorrect
hypothesis .... We have proposed one approach to testing for the G+C content
problem. Here we present a formalization of this method, a frequency-dependent
significance test, which has general application."
</AB>
<JT>Nature (Lond ) </JT>
<PY>1993</PY>
<VO>364</VO>
<NO>29 Jul.</NO>
<PP>440-442</PP>
</SEQ>

<SEQ>
<UI>1490   Szekely,L.A.  Fourier Calculus on Ev.. Adv.Appl.Math.  93 
14(2):200-216
</UI>
<AU>Szekely LA;
    Steel MA;
    Erdos PL
</AU>
<TI>Fourier Calculus on Evolutionary Trees
</TI>
<SU>Evolutionary tree;
    Fourier;
    Calculus;
    Phylogeny;
    HU
</SU>
<AB>"We describe a Fourier analysis approach to the reconstruction theory of
evolutionary trees that is based on Kimura's model of molecular evolution."
</AB>
<JT>Adv Appl Math</JT>
<PY>1993</PY>
<VO>14</VO>
<NO>2</NO>
<PP>200-216</PP>
</SEQ>

<SEQ>
<UI>1491   Prestridge,D. SIGNAL SCAN 3.0: New D.. Comput.Appl.Bio 93 
9(1):113-115
</UI>
<AU>Prestridge DS;
    Stormo G
</AU>
<TI>SIGNAL SCAN 3.0: New Database and Program Features
</TI>
<SU>Database search;
    Program;
    DNA;
    USA;
    Signal
</SU>
<AB>"SIGNAL SCAN is a program that utilizes a transcription factor database to
find potential transcription factor binding sites in DNA sequences. ... SIGNAL
SCAN is now network compatible and is available for IBM-compatible PC, Unix and
VMS platforms."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1993</PY>
<VO>9</VO>
<NO>1</NO>
<PP>113-115</PP>
</SEQ>

<SEQ>
<UI>1492   Snyder,E.E.   Identification of Codi.. Nucleic Acids R 93 
21(3):607-613
</UI>
<AU>Snyder EE;
    Stormo GD
</AU>
<TI>Identification of Coding Regions in Genomic DNA Sequences: An Application
of Dynamic Programming and Neural Networks
</TI>
<SU>Pattern discovery;
    Coding;
    Region;
    Dynamic programming;
    Neural;
    Identification;
    DNA;
    Genomic;
    USA;
    Network;
    Dynamic
</SU>
<AB>"Dynamic programming (DP) is applied to the problem of precisely
identifying internal exons and introns in genomic DNA sequences. The program
GeneParser first scores the sequence of interest for splice sites and for these
intron- and exon-specific content measures: codon usage, local compositional
complexity, 6-tuple frequency, length distribution and periodic asymmetry. This
information is then organized for interpretation by DP. GeneParser employs the
DP algorithm to enforce the constraints that introns and exons must be adjacent
and non-overlapping and finds the highest scoring combination of introns and
exons subject to these constraints."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1993</PY>
<VO>21</VO>
<NO>3</NO>
<PP>607-613</PP>
</SEQ>

<SEQ>
<UI>1493   Huelsenbeck,J Is Character Weighting.. Syst.Biol.      94 
43(2):288-291
</UI>
<AU>Huelsenbeck JP;
    Swofford DL;
    Cunningham CW;
    Bull JJ;
    Waddell PJ
</AU>
<TI>Is Character Weighting a Panacea for the Problem of Data Heterogeneity in
Phylogenetic Analysis?
</TI>
<SU>Phylogeny;
    Character data;
    Character weight;
    USA;
    Phylogenetic
</SU>
<AB>"Although we fully agree that a weighting strategy that correctly
down-weights unreliable characters can improve accuracy of phylogenetic estimation
(Bull et al., 1993:394), we find the position that weighting can take into
account all forms of heterogeneity to be overly optimistic. Even if that
position were supportable, however, we do not understand the claim that
conditions under which differential weighting fails to solve the problem would
also cause our approach to fail."
</AB>
<JT>Syst Biol</JT>
<PY>1994</PY>
<VO>43</VO>
<NO>2</NO>
<PP>288-291</PP>
</SEQ>

<SEQ>
<UI>1494   Taylor,W.R.   Compensating Changes i.. Protein Eng.    94 
7(3):341-348
</UI>
<AU>Taylor WR;
    Hatrick K
</AU>
<TI>Compensating Changes in Protein Multiple Sequence Alignments
</TI>
<SU>Multiple alignment;
    Protein;
    Sequence alignment;
    Structure;
    UK
</SU>
<AB>"A method was developed to identify compensating changes between residues
at positions in a multiple sequence alignment. (For example, one position might
always contain a positively charged residue when the other is negatively charged
and vice versa.) A correlation-based method was used to measure the compensation
found in the four residues at a pair of positions in any two sequences in a
multiple alignment. All possible sequence pairings were measured at the pair of
positions and the resulting matrix analysed to give a measure of cooperativity
among the pairs."
</AB>
<JT>Protein Eng</JT>
<PY>1994</PY>
<VO>7</VO>
<NO>3</NO>
<PP>341-348</PP>
</SEQ>

<SEQ>
<UI>1495   Taylor,W.R.   Protein Fold Refinemen.. Protein Eng.    93 
6(6):593-604
</UI>
<AU>Taylor WR
</AU>
<TI>Protein Fold Refinement: Building Models from Idealized Folds using Motif
Constraints and Multiple Sequence Data
</TI>
<SU>Protein;
    Fold;
    Model;
    Motif;
    Multiple alignment;
    Distance;
    UK
</SU>
<AB>"A general solution to the problem of directly incorporating data from
multiple sequence alignments into the construction of molecular models was
approached through the calculation of an estimated pairwise distance based on
conserved hydrophobicity. A scaling method was developed that allowed the
required bulk geometric properties of the estimated pairwise distances (mean and
mean squared) to mimic those expected in a globular protein. These properties
were maintained independently of the composition, length, number or degree of
conservation of the original sequences."
</AB>
<JT>Protein Eng</JT>
<PY>1993</PY>
<VO>6</VO>
<NO>6</NO>
<PP>593-604</PP>
</SEQ>

<SEQ>
<UI>1496   Lerman,I.C.   Classification of Alig.. New Approache.. 
94Springer-Verlag
</UI>
<AU>Lerman IC;
    Nicolas J;
    Tallur B;
    Peter P
</AU>
<TI>Classification of Aligned Biological Sequences
</TI>
<ED>Diday E;
    Lechevallier Y;
    Schader M;
    Bertrand P;
    Burtschy B
</ED>
<BK>New Approaches in Classification and Data Analysis
</BK>
<SU>Classification;
    Clustering;
    Similarity;
    Sequence comparison;
    FR
</SU>
<AB>"We considered the problem of classifying aligned sequences and applied
our methods to two families - a family of 68 cytochrome sequences and that of 42
globin sequences. ... The main interest of this paper is to show the
interactions between mathematical representation of similarities among these
complex data structures and the outcome of clustering within the common
framework of Likelihood Linkage Analysis (L.L.A.) (Lerman, Lerman et al.
(1993)). Applying LLA methodology, fine and relevant results have been obtained
for both data sets."
</AB>
<PU>Springer-Verlag </PU>
<PL>Berlin </PL>
<PY>1994</PY>
<PP>370-377</PP>
</SEQ>

<SEQ>
<UI>1497   Huang,X.      Dynamic Programming Al.. Comput.Appl.Bio 92 
8(5):511-520
</UI>
<AU>Huang X;
    Waterman MS
</AU>
<TI>Dynamic Programming Algorithms for Restriction Map Comparison
</TI>
<SU>Restriction;
    Mapping;
    Dynamic programming;
    USA;
    Dynamic;
    Algorithm
</SU>
<AB>"For most sequence comparison problems there is a corresponding map
comparison algorithm. While map data may appear to be incompatible with dynamic
programming, we show in this paper that the rigor and efficiency of dynamic
programming algorithms carry over to the map comparison algorithms. We present
algorithms for restriction map comparison .... The new algorithms are a natural
extension of a previous map comparison model. Dynamic programming algorithms for
computing optimal global and local alignments under the new model are
described."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1992</PY>
<VO>8</VO>
<NO>5</NO>
<PP>511-520</PP>
</SEQ>

<SEQ>
<UI>1498   Crochemore,M. On Two-Dimensional Pat.. Theoret.Comput. 94 
132:403-414
</UI>
<AU>Crochemore M;
    Rytter W
</AU>
<TI>On Two-Dimensional Pattern Matching by Optimal Parallel Algorithms
</TI>
<SU>Pattern match;
    Parallel;
    Multidimensional;
    FR;
    Optimal;
    Algorithm
</SU>
<AB>"Simplified versions of Kedem-Landau-Palem algorithms for parallel
one-dimensional and two-dimensional pattern-matching on a CRCW PRAM are presented.
... A novel algorithm for two-dimensional matching is presented which is more
directly designed for two-dimensional objects."
</AB>
<JT>Theoret Comput Sci</JT>
<PY>1994</PY>
<VO>132</VO>
<PP>403-414</PP>
</SEQ>

<SEQ>
<UI>1499   Wu,C.         Back-Propagation and C.. Nucleic Acids R 94 
22(20):4291-42
</UI>
<AU>Wu C;
    Shivakumar S
</AU>
<TI>Back-Propagation and Counter-Propagation Neural Networks for Phylogenetic
Classification of Ribosomal RNA Sequences
</TI>
<SU>Phylogeny;
    Classification;
    Neural;
    RNA;
    USA;
    Phylogenetic;
    Network
</SU>
<AB>"A neural network system has been developed for rapid and accurate
classification of ribosomal RNA sequences according to phylogenetic
relationship. The molecular sequences are encoded into neural input vectors
using an n-gram hashing method. A SVD (singular value decomposition) method is
used to compress and reduce the size of long and sparse n-gram input vectors.
The neural networks used are three-layered, feed-forward networks that employ
supervised learning paradigms, including the back-propagation algorithm and a
modified counter-propagation algorithm."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1994</PY>
<VO>22</VO>
<NO>20</NO>
<PP>4291-4299</PP>
</SEQ>

<SEQ>
<UI>1500   Goldman,N.    A Codon-based Model of.. Mol.Biol.Evol.  94 
11(5):725-736
</UI>
<AU>Goldman N;
    Yang Z
</AU>
<TI>A Codon-based Model of Nucleotide Substitution for Protein-coding DNA
Sequences
</TI>
<SU>Codon;
    Model;
    Substitution;
    DNA;
    Phylogeny;
    Markov;
    UK;
    Nucleotide
</SU>
<AB>"A codon-based model for the evolution of protein-coding DNA sequences is
presented for use in phylogenetic estimation. A Markov process is used to
describe substitutions between codons. Transition/transversion rate bias and
codon usage bias are allowed in the model, and selective restraints at the
protein level are accommodated using physicochemical distances between the 
amino
acids coded for by the codons."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1994</PY>
<VO>11</VO>
<NO>5</NO>
<PP>725-736</PP>
</SEQ>

<SEQ>
<UI>1501   Muse,S.V.     A Likelihood Approach .. Mol.Biol.Evol.  94 
11(5):715-724
</UI>
<AU>Muse SV;
    Gaut BS
</AU>
<TI>A Likelihood Approach for Comparing Synonymous and Nonsynonymous
Nucleotide Substitution Rates, with Application to the Chloroplast Genome
</TI>
<SU>Likelihood;
    Synonymous;
    Substitution;
    Genome;
    Model;
    DNA;
    Evolution;
    USA;
    Nucleotide;
    Rate;
    Chloroplast
</SU>
<AB>"A model of DNA sequence evolution applicable to coding regions is
presented. This represents the first evolutionary model that accounts for
dependencies among nucleotides within a codon. The model uses the codon, as
opposed to the nucleotide, as the unit of evolution, and is parameterized in
terms of synonymous and nonsynonymous nucleotide substitution rates. One of the
model's advantages ... is that it completely corrects for multiple hits at a
codon, rather than taking a parsimony approach and considering only pathways of
minimum change between homologous codons."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1994</PY>
<VO>11</VO>
<NO>5</NO>
<PP>715-724</PP>
</SEQ>

<SEQ>
<UI>1502   Konopka,A.K.  Computational Experime.. Computers Chem. 94 
18(3):v-viii
</UI>
<AU>Konopka AK
</AU>
<TI>Computational Experiments in Molecular Biology: Searching for the 'Big
Picture'
</TI>
<SU>Sequence analysis;
    USA
</SU>
<AB>This editorial introduces a special issue entitled "Open Problems of
Computational Molecular Biology (3)." Many of the papers were presented at the
Third International Workshop on Open Problems in Computational Molecular
Biology, Telluride, 11-25 July 1993. Contents: Computational Issues in Genome
Research (3 papers), Protein Structure Prediction (4), Mathematical Techniques
in Sequence Research (4), Modelling Evolution and Development (3), Overviews and
Opinions (2).
</AB>
<JT>Computers Chem</JT>
<PY>1994</PY>
<VO>18</VO>
<NO>3</NO>
<PP>v-viii</PP>
</SEQ>

<SEQ>
<UI>1503   Claverie,J.M. Some Useful Statistica.. Computers Chem. 94 
18(3):287-294
</UI>
<AU>Claverie JM
</AU>
<TI>Some Useful Statistical Properties of Position-Weight Matrices
</TI>
<SU>Sequence analysis;
    Pattern discovery;
    Statistical;
    Significance;
    Profile;
    Scoring;
    USA
</SU>
<AB>"Position-weight matrices (or profiles) are simple mathematical objects
traditionally used to capture the information about local sequence patterns (or
motifs) characteristic of a given structure or function. Although weight
matrices can lead to fast database scanning algorithms their usage has been
limited, due to the lack of a reliable method to assess the statistical
significance of the matching scores. In this article I first review three
different computation schemes for designing weight matrices .... I then show
that, for patterns spanning 10 positions or more, the best scores expected from
matching random sequences are distributed according to the extreme value
(Gumbel) distribution."
</AB>
<JT>Computers Chem</JT>
<PY>1994</PY>
<VO>18</VO>
<NO>3</NO>
<PP>287-294</PP>
</SEQ>

<SEQ>
<UI>1504   Lawrence,C.   Toward the Unification.. Computers Chem. 94 
18(3):255-258
</UI>
<AU>Lawrence C
</AU>
<TI>Toward the Unification of Sequence and Structural Data for Identification
of Structural and Functional Constraints
</TI>
<SU>Pattern discovery;
    Structure;
    Function;
    Protein;
    USA;
    Identification
</SU>
<AB>"The identification and characterization of local residue patterns or
conserved segments shared by a set of biopolymers has provided a number of
insights in molecular biology. Biopolymer sequences are observations from
macromolecules that share common structural or function features. The approach
taken here rests on the notion that information may be most efficiently
extracted from these observations through the use of a model that faithfully
represents macromolecular characteristics. Accordingly, our efforts are focused
on statistical models which attempt to capture central features of protein
structure, function,
and change. Here the assumptions that underlie two new methods for the analysis
of protein sequence data are explicitly delineated."
</AB>
<JT>Computers Chem</JT>
<PY>1994</PY>
<VO>18</VO>
<NO>3</NO>
<PP>255-258</PP>
</SEQ>

<SEQ>
<UI>1505   Fickett,J.W.  Inferring Genes from O.. Computers Chem. 94 
18(3):203-205
</UI>
<AU>Fickett JW
</AU>
<TI>Inferring Genes from Open Reading Frames
</TI>
<SU>Gene;
    Protein;
    Coding;
    Reading;
    ORF;
    USA;
    Frame
</SU>
<AB>"One expects that in DNA without protein coding function, stop codons
(which constitute three of the 64 possible codons) should occur frequently in
all reading frames, and that a long open reading frame (ORF) can be interpreted
as a sign for the existence of a gene. We make a beginning on introducing
quantitative measures of confidence into this inference - taking Saccharomyces
cerevisiae as a sample case - and show that some common assumptions can
reasonably be questioned. In particular we show that statistical support for the
biological function of shorter ORFs listed as putative genes in recent papers is
in fact very weak."
</AB>
<JT>Computers Chem</JT>
<PY>1994</PY>
<VO>18</VO>
<NO>3</NO>
<PP>203-205</PP>
</SEQ>

<SEQ>
<UI>1506   Day,W.H.E.    The Asymptotic Plurali.. Math.Comput.Mod 95 0:0-0
</UI>
<AU>Day WHE;
    Kubicka E;
    Kubicki G;
    McMorris FR
</AU>
<TI>The Asymptotic Plurality Rule for Molecular Sequences
</TI>
<SU>Consensus method;
    Plurality rule;
    Sequence analysis;
    Characterization;
    USA
</SU>
<AB>"The asymptotic plurality rule, apl, is a consensus function which maps
each profile P of length n (i.e., each sequence of n bases appearing at an
aligned position of n molecules) to a set apl(P) of consensus results (i.e.,
ambiguity codes) that is a descriptive summary of P. Our main result is to
characterize each consensus result X = apl(P) in terms of the frequencies
with which the bases in P occur. We then use these characterizations to
investigate features (e.g., strong consistency, length independence) of apl that
researchers may find useful for the interpretation of apl's consensus results."
</AB>
<JT>Math Comput Modelling </JT>
<PY>1995</PY>
<VO>0</VO>
<PP>0-0</PP>
</SEQ>

<SEQ>
<UI>1507   Baeza-Yates,R Analysis of Boyer-Moor.. ACM-SIAM Sympos 90 
1:328-343
</UI>
<AU>Baeza-Yates RA;
    Gonnet GH;
    Regnier M
</AU>
<TI>Analysis of Boyer-Moore-Type String Searching Algorithms
</TI>
<SU>String search;
    Boyer-Moore;
    Automata;
    CL;
    Algorithm
</SU>
<AB>"We study Boyer-Moore-type string searching algorithms. First, we analyze
the Horspool's variant. The [average-case] searching time is linear. An exact
expression of the linearity constant is derived and is proven to be
asymptotically 1/c, where c is the cardinality of the alphabet. ... We also
study Boyer-Moore automata, a notion that we formalize. This approach appears to
be faster than any other known algorithm, in both the worst and average case
number of inspections."
</AB>
<BK>ACM-SIAM Sympos Discrete Algorithms</BK>
<PY>1990</PY>
<VO>1</VO>
<PP>328-343</PP>
</SEQ>

<SEQ>
<UI>1508   Amir,A.       Efficient Pattern Matc.. ACM-SIAM Sympos 90 
1:344-357
</UI>
<AU>Amir A;
    Landau GM;
    Vishkin U
</AU>
<TI>Efficient Pattern Matching with Scaling
</TI>
<SU>Pattern match;
    Multidimensional;
    USA
</SU>
<AB>"The problem of pattern matching with scaling is defined. The input for
the two-dimensional version of the problem consists of an n x n 'text' matrix
and an m x m 'pattern' matrix. We want to find all occurrences of the pattern in
the text, scaled to all natural multiples. ... This problem is useful for some
tasks in computer vision. Our main contribution is a linear time algorithm for
the problem."
</AB>
<BK>ACM-SIAM Sympos Discrete Algorithms</BK>
<PY>1990</PY>
<VO>1</VO>
<PP>344-357</PP>
</SEQ>

<SEQ>
<UI>1509   Amir,A.       Efficient 2-dimensiona.. ACM-SIAM Sympos 91 
2:212-223
</UI>
<AU>Amir A;
    Farach M
</AU>
<TI>Efficient 2-dimensional Approximate Matching of Non-rectangular Figures
</TI>
<SU>Pattern match;
    Multidimensional;
    Approximate match;
    USA
</SU>
<AB>"Finding all occurrences of a non-rectangular pattern of height m and area
a in an n x n text with no more than k mismatch, insertion, and deletion errors
is an important problem in computer vision. It can be solved using a dynamic
programming approach in time O(a n**2). We show a
O(k n**2 (m log m)**(1/2) (k log k)**(1/2) + k**2 n**2) algorithm which combines
convolutions with dynamic programming. At the heart of the algorithm are the
Smaller Matching Problem and the k-Aligned Ones with Location Problem. Efficient
algorithms to solve both these problems are presented."
</AB>
<BK>ACM-SIAM Sympos Discrete Algorithms</BK>
<PY>1991</PY>
<VO>2</VO>
<PP>212-223</PP>
</SEQ>

<SEQ>
<UI>1510   Cole,R.       Tight Bounds on the Co.. ACM-SIAM Sympos 91 
2:224-233
</UI>
<AU>Cole R
</AU>
<TI>Tight Bounds on the Complexity of the Boyer-Moore String Matching
Algorithm
</TI>
<SU>String match;
    Boyer-Moore;
    Complexity;
    USA;
    Algorithm
</SU>
<AB>"The problem of finding all occurrences of a pattern of length m in a text
of length n is considered. It is shown that the Boyer-Moore string matching
algorithm performs roughly 3n comparisons and that this bound is tight up to
O(n/m); more precisely, an upper bound of 3n - n/m comparisons is shown, as is a
lower bound of 3n(1 - o(1)) comparisons, as n/m goes to infinity and m goes to
infinity. While the upper bound is somewhat involved, its main elements provide
a quite simple proof of a 4n upper bound for the same algorithm."
</AB>
<BK>ACM-SIAM Sympos Discrete Algorithms</BK>
<PY>1991</PY>
<VO>2</VO>
<PP>224-233</PP>
</SEQ>

<SEQ>
<UI>1511   Amir,A.       Two-Dimensional Period.. ACM-SIAM Sympos 92 
3:440-452
</UI>
<AU>Amir A;
    Benson G
</AU>
<TI>Two-Dimensional Periodicity and its Applications
</TI>
<SU>Pattern match;
    Multidimensional;
    Regularities;
    USA
</SU>
<AB>"This paper presents a new algorithmic technique for two-dimensional
matching, that of periodicity analysis. This paper's main contribution is
defining and analysing two-dimensional periodicity in rectangular arrays. In
addition, we introduce a new pattern matching paradigm - Compressed Matching. A
text array T and a pattern array P are given in compressed forms c(T) and c(P).
We seek all appearances of P in T, without decompressing T."
</AB>
<BK>ACM-SIAM Sympos Discrete Algorithms</BK>
<PY>1992</PY>
<VO>3</VO>
<PP>440-452</PP>
</SEQ>

<SEQ>
<UI>1512   Waterman,M.S. Introduction to Comput..                 95Chapman 
Hall
</UI>
<AU>Waterman MS
</AU>
<TI>Introduction to Computational Biology: Maps, Sequences and Genomes
</TI>
<SU>Sequence analysis;
    Mapping;
    Genome;
    USA
</SU>
<AB>Not published as of March 1995.
</AB>
<PU>Chapman Hall </PU>
<PL> </PL>
<PY>1995</PY>
<PP>0-0</PP>
</SEQ>

<SEQ>
<UI>1513   Sankoff,D.    Steiner Points in the ..                 95
</UI>
<AU>Sankoff D;
    Sundaram G;
    Kececioglu J
</AU>
<TI>Steiner Points in the Space of Genome Rearrangements (in preparation)
</TI>
<SU>Genomic;
    Rearrangement;
    CA;
    Genome
</SU>
<AB>Preliminary version presented at Workshop on Genome Rearrangements,
University of Southern California, March 1994.
</AB>
<PY>1995</PY>
</SEQ>

<SEQ>
<UI>1514   Luo,L.        A Stochastic Evolution.. J.Theor.Biol.   92 
157:83-94
</UI>
<AU>Luo L;
    Trainor LEH
</AU>
<TI>A Stochastic Evolutionary Model of Molecular Sequences
</TI>
<SU>Sequence analysis;
    Stochastic;
    Evolution;
    Model;
    CA
</SU>
<AB>"A stochastic evolutionary model of molecular sequences is proposed. The
basic forces in evolution are supposed to be mutation and selection. ... The
selective force is divided into two parts: a slowly-varying part and a
rapidly-changing fluctuation. The latter influences the distribution of sequences and
results in an equation of motion along the flow line. The former plays a more
important role in the emergence of evolutionary order. It is demonstrated that
the asymmetry of selective forces would lead to a definite order of the system."
</AB>
<JT>J Theor Biol</JT>
<PY>1992</PY>
<VO>157</VO>
<PP>83-94</PP>
</SEQ>

<SEQ>
<UI>1515   Jiang,T.      Approximating Shortest.. Theoret.Comput. 94 
134(2):473-491
</UI>
<AU>Jiang T;
    Li M
</AU>
<TI>Approximating Shortest Superstrings with Constraints
</TI>
<SU>Supersequence;
    Shortest common;
    Data structure;
    CA
</SU>
<AB>Also Proc. 3rd Workshop on Algorithms and Data Structures, 1993,
pp. 385-396. "Various versions of the shortest common superstring problem play
important roles in data compression and DNA sequencing. ... We present
polynomial-time approximation algorithms that produce consistent superstrings of
length O(n), for two important special cases: (a) when no negative strings
contain positive strings as substrings; (b) when there are only a constant
number of negative strings. The algorithms are obtained by making an essential
use of the Hungarian algorithm, which can find an optimal cycle cover on
weighted graphs."
</AB>
<JT>Theoret Comput Sci</JT>
<PY>1994</PY>
<VO>134</VO>
<NO>2</NO>
<PP>473-491</PP>
</SEQ>

<SEQ>
<UI>1516   Bandelt,H.J.  Split Decomposition: A.. Mol.Phylogenet. 92 
1(3):242-252
</UI>
<AU>Bandelt HJ;
    Dress AWM
</AU>
<TI>Split Decomposition: A New and Useful Approach to Phylogenetic Analysis of
Distance Data
</TI>
<SU>Phylogenetic;
    Evolutionary distance;
    Distance;
    DE;
    Decomposition
</SU>
<AB>"In order to analyze the structure inherent to a matrix of dissimilarities
(such as evolutionary distances) we propose to use a new technique called split
decomposition. This method accurately dissects the given dissimilarity measure
as a sum of elementary 'split' metrics plus a (small) residue. The split
summands identify related groups which are susceptible to further interpretation
when casted against the available biological information. Reanalysis of
previously published ribosomal RNA data sets using split decomposition
illustrate the potential of this approach."
</AB>
<JT>Mol Phylogenet Evol</JT>
<PY>1992</PY>
<VO>1</VO>
<NO>3</NO>
<PP>242-252</PP>
</SEQ>

<SEQ>
<UI>1517   Zhang,M.Q.    Alignment of Molecular.. J.Theor.Biol.   95 
174(2):119-129
</UI>
<AU>Zhang MQ;
    Marr TG
</AU>
<TI>Alignment of Molecular Sequences Seen as Random Path Analysis
</TI>
<SU>Sequence alignment;
    Random path;
    Dynamic programming;
    USA
</SU>
<AB>"We propose a generating functional method - random path analysis (RPA) -
that generalizes the classical dynamic programming (DP) method widely used in
sequence alignments. For a given cost function, DP is a deterministic method
that finds an optimal alignment by minimizing the total cost function for all
possible alignments. By allowing uncertainty, RPA is a statistical method that
weights fluctuating alignments by probabilities. Therefore, DP may be thought of
as the deterministic limit of RPA when the fluctuations approach zero. ... Here
we focus on deriving a mathematically rigorous solution to RPA both in its
combinatorial form and in its graphical representation ...."
</AB>
<JT>J Theor Biol</JT>
<PY>1995</PY>
<VO>174</VO>
<NO>2</NO>
<PP>119-129</PP>
</SEQ>

<SEQ>
<UI>1518   Orengo,C.A.   A Review of Methods fo.. Patterns in P.. 
92Springer-Verlag
</UI>
<AU>Orengo CA
</AU>
<TI>A Review of Methods for Protein Structure Comparison
</TI>
<ED>Taylor WR
</ED>
<BK>Patterns in Protein Sequence and Structure
</BK>
<SU>Structure;
    Review;
    Protein
</SU>
<AB>Vol. 7, Springer series in Biophysics. Orengo, Taylor (1993), p. 497
</AB>
<PU>Springer-Verlag </PU>
<PL>Heidelberg </PL>
<PY>1992</PY>
<PP>159-188</PP>
</SEQ>

<SEQ>
<UI>1519   Goldstein,L.  Approximations to Prof.. J.Comput.Biol.  94 
1(2):93-104
</UI>
<AU>Goldstein L;
    Waterman MS
</AU>
<TI>Approximations to Profile Score Distributions
</TI>
<SU>Scoring;
    Statistical;
    Significance;
    Approximation;
    Profile;
    Distribution;
    USA;
    Score
</SU>
<AB>"Profiles, which are summaries of multiple alignments of a sequence
family, are used to find new instances of the family in databases. In this
paper, we study the maximum score M obtained when the profile is aligned without
indels at all possible positions of a random sequence. The main theorem gives an
approximation to the distribution function of M with an explicit bound on the
error. This theorem implies that M has a limiting extreme value distribution."
</AB>
<JT>J Comput Biol</JT>
<PY>1994</PY>
<VO>1</VO>
<NO>2</NO>
<PP>93-104</PP>
</SEQ>

<SEQ>
<UI>1520   Searls,D.B.   The Computational Ling.. Artificial In.. 93AAAI 
Press
</UI>
<AU>Searls DB
</AU>
<TI>The Computational Linguistics of Biological Sequences
</TI>
<ED>Hunter L
</ED>
<BK>Artificial Intelligence and Molecular Biology
</BK>
<SU>Sequence analysis;
    Language;
    USA;
    Linguistic
</SU>
<AB>Searls (1992), p. 591 who mentions 1992, Snyder &amp; Stormo (1995), p. 17.
</AB>
<PU>AAAI Press </PU>
<PL>Cambridge, MA </PL>
<PY>1993</PY>
<PP>47-120</PP>
</SEQ>

<SEQ>
<UI>1521   Luo,L.        The Maximum Informatio.. J.Theor.Biol.   95 
174(2):131-136
</UI>
<AU>Luo L;
    Bai G
</AU>
<TI>The Maximum Information Principle and the Evolution of Nucleotide
Sequences
</TI>
<SU>Sequence analysis;
    Composition;
    Sequence prediction;
    Markov;
    Probability;
    Information theory;
    CN;
    Evolution;
    Nucleotide
</SU>
<AB>"The probability distributions of bases in nucleotide sequences are
deduced from the maximum information principle by maximizing the entropy (due to
random mutation of bases) under certain constraints (Markovian entropy, G+C
content, etc., due to selection). Two formulations are given with respect to
different selective constraints. The deviations of theoretical distributions
from experimental data are lower than 10% for most sequences. It is shown that
the Lagrange multipliers change from species to species systematically - i.e.,
selective constraints correlate with evolution."
</AB>
<JT>J Theor Biol</JT>
<PY>1995</PY>
<VO>174</VO>
<NO>2</NO>
<PP>131-136</PP>
</SEQ>

<SEQ>
<UI>1522   Aho,A.V.      Bounds on the Complexi.. IEEE Sympos.Fou 74 
15:104-109
</UI>
<AU>Aho AV;
    Hirschberg DS;
    Ullman JD
</AU>
<TI>Bounds on the Complexity of the Longest Common Subsequence Problem
</TI>
<SU>Longest common;
    Complexity;
    Subsequence;
    USA
</SU>
<AB>David S. Johnson. Not at DIMACS. Published in J. ACM, 23(1), 1-12 (1976)
</AB>
<JT>IEEE Sympos Found Comput Sci</JT>
<PY>1974</PY>
<VO>15</VO>
<PP>104-109</PP>
</SEQ>

<SEQ>
<UI>1523   Griggs,J.R.   On the Number of Align.. Graphs Combin.  90 
6:133-146
</UI>
<AU>Griggs JR;
    Hanlon P;
    Odlyzko AM;
    Waterman MS
</AU>
<TI>On the Number of Alignments of k Sequences
</TI>
<SU>Multiple alignment;
    Sequence alignment;
    Combinatorial;
    USA
</SU>
<AB>"Numerous studies by molecular biologists concern the relationships
between several long DNA sequences, which are listed in rows with some gaps
inserted and with similar positions aligned vertically. This motivates our
interest in estimating the number of possible arrangements of such sequences. We
say that a k sequence alignment of size n is obtained by inserting some (or no)
0's into k sequences of n 1's so that every sequence has the same length and so
that there is no position which is 0 in all sequences. We show by a
combinatorial argument that for any fixed k &gt;= 1, the number f(k,n) of k
alignments of length n grows like (c-sub-k)**n as n [goes to infinity] ...."
</AB>
<JT>Graphs Combin</JT>
<PY>1990</PY>
<VO>6</VO>
<PP>133-146</PP>
</SEQ>

<SEQ>
<UI>1524   Neyman,J.     Molecular Studies of E.. Statistical D.. 71Academic 
Press
</UI>
<AU>Neyman J
</AU>
<TI>Molecular Studies of Evolution: A Source of Novel Statistical Problems
</TI>
<ED>Gupta SS;
    Yackel J
</ED>
<BK>Statistical Decision Theory and Related Topics; Proceedings of a Symposium
Held at Purdue University, November 23-25, 1970
</BK>
<SU>Statistical;
    Likelihood;
    Phylogeny;
    Evolution;
    USA
</SU>
<AB>"The recently opened and rapidly developing field of evolution research,
conducted on the level of molecules, is a novel source of interesting
statistical and probabilistic problems. The biological studies are concerned
with macromolecules which, in organisms as diverse as Man, Monkey, Carp, Whale
and Yeast, perform similar functions and have similar structures. The apparently
inconsequential differences among such homologous macromolecules, their sites
and their frequencies, are at the base of current efforts to establish lineages
linking the species studied to a common ancestor. The nature of statistical
problems originating from such biological studies is illustrated on two
tentative stochastic models of 'inconsequential' substitutions in the
macromolecules."
</AB>
<PU>Academic Press </PU>
<PL>New York </PL>
<PY>1971</PY>
<PP>1-27</PP>
</SEQ>

<SEQ>
<UI>1525   Bisant,D.     Identification of Ribo.. Nucleic Acids R 95 
23(9):1632-163
</UI>
<AU>Bisant D;
    Maizel J
</AU>
<TI>Identification of Ribosome Binding Sites in Escherichia coli Using Neural
Network Models
</TI>
<SU>RNA;
    Binding;
    Neural;
    USA;
    Identification;
    Ribosome;
    Network;
    Model
</SU>
<AB>"This study investigated the use of neural networks in the identification
of Escherichia coli ribosome binding sites. ... Feedforward backpropagation
networks were applied to their identification. Perceptrons were also applied,
since they have been the previous best method since 1982. Evaluation of
performance for all the neural networks and perceptrons was determined by ROC
[receiver-operating-characteristic] analysis. The neural network provided
significant improvement in the recognition of these sites when compared with the
previous best method, finding less than half the number of false positives when
both models were adjusted to find an equal number of actual sites."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1995</PY>
<VO>23</VO>
<NO>9</NO>
<PP>1632-1639</PP>
</SEQ>

<SEQ>
<UI>1526   Andersson,A.  Efficient Implementati.. Software.Practi 95 
25(2):129-141
</UI>
<AU>Andersson A;
    Nilsson S
</AU>
<TI>Efficient Implementation of Suffix Trees
</TI>
<SU>Suffix;
    Search tree;
    Compression;
    SWE
</SU>
<AB>"We study the problem of string searching using the traditional approach
of storing all unique substrings of the text in a suffix tree. The methods of
path compression, level compression and data compression are combined to build a
simple, compact and efficient implementation of a suffix tree. Based on a
comparative discussion and extensive experiments, we argue that our new data
structure is superior to previous methods in many practical situations."
</AB>
<JT>Software Practice Experience </JT>
<PY>1995</PY>
<VO>25</VO>
<NO>2</NO>
<PP>129-141</PP>
</SEQ>

<SEQ>
<UI>1527   Charleston,M. The Effects of Sequenc.. J.Comput.Biol.  94 
1(2):133-151
</UI>
<AU>Charleston MA;
    Hendy MD;
    Penny D
</AU>
<TI>The Effects of Sequence Length, Tree Topology, and Number of Taxa on the
Performance of Phylogenetic Methods
</TI>
<SU>Phylogeny;
    Performance;
    Phylogenetic;
    Simulation;
    NZ;
    Topology
</SU>
<AB>"Simulations were used to study the performance of several character-based
and distance-based phylogenetic methods in obtaining the correct tree from
pseudo-randomly generated input data. The study included all the topologies of
unrooted binary trees with from 4 to 10 pendant vertices (taxa) inclusive. The
length of the character sequences used ranged from 10 to 10**5 characters
exponentially. The methods studied include Closest Tree, Compatibility, Li's
method, Maximum Parsimony, Neighbor-joining, Neighborliness, and UPGMA."
</AB>
<JT>J Comput Biol</JT>
<PY>1994</PY>
<VO>1</VO>
<NO>2</NO>
<PP>133-151</PP>
</SEQ>

<SEQ>
<UI>1528   Steel,M.A.    Reconstructing Trees W.. J.Comput.Biol.  94 
1(2):153-163
</UI>
<AU>Steel MA;
    Szekely LA;
    Hendy MD
</AU>
<TI>Reconstructing Trees When Sequence Sites Evolve at Variable Rates
</TI>
<SU>Phylogeny;
    Rate;
    Markov;
    Spectral analysis;
    NZ
</SU>
<AB>"For a sequence of colors independently evolving on a tree under a simple
Markov model, we consider conditions under which the tree can be uniquely
recovered from the 'sequence spectrum' -- the expected frequencies of the
various leaf colorations. This is relevant for phylogenetic analysis (where
colors represent nucleotides or amino acids; leaves represent extant taxa) as
the sequence spectrum is estimated directly from a collection of aligned
sequences. ... Hence there is a logical barrier to accurate, consistent
phylogenetic inference for these models when assumptions about the rate
distribution are not made."
</AB>
<JT>J Comput Biol</JT>
<PY>1994</PY>
<VO>1</VO>
<NO>2</NO>
<PP>153-163</PP>
</SEQ>

<SEQ>
<UI>1529   Fasman,K.H.   Restructuring the Geno.. J.Comput.Biol.  94 
1(2):165-171
</UI>
<AU>Fasman KH
</AU>
<TI>Restructuring the Genome Data Base: A Model for a Federation of Biological
Databases
</TI>
<SU>Sequence database;
    Genome;
    Model;
    USA
</SU>
<AB>"The creation of a federation of public biological databases has been
proposed. Formerly independent systems will need to be modified to interoperate
better within this federation. This will enable the federated system to provide
biologists with an integrated view of biological data. The GDB Human Genome
Database is being restructured to participate in the proposed federation. GDB
itself will be organized into a collection of related data sets in support of
human gene mapping. The techniques that will be used to link these data sets
will be applicable to the federation as a whole."
</AB>
<JT>J Comput Biol</JT>
<PY>1994</PY>
<VO>1</VO>
<NO>2</NO>
<PP>165-171</PP>
</SEQ>

<SEQ>
<UI>1530   Nei,M.        Molecular Evolutionary..                 87Columbia 
Univer
</UI>
<AU>Nei M
</AU>
<TI>Molecular Evolutionary Genetics
</TI>
<SU>Evolutionary distance;
    Substitution;
    Evolution;
    Population;
    Genetic;
    JP
</SU>
<AB>"During the last ten years, spectacular progress has occurred in the study
of molecular evolution and variation mainly because of the introduction of new
biochemical techniques such as gene cloning, DNA sequencing, and restriction
enzyme methods. ... Furthermore, the molecular approach is now being used for
studying the evolution of morphological, physiological, and behavioral
characters. The purpose of this book is to summarize and review recent
developments in this area of study. Previously, molecular evolution and
population genetics were studied as separate scientific disciplines. In this
book, an attempt will be made to unify these two disciplines into one which may
be called molecular evolutionary genetics." Bibliography: pp. 433-495.
</AB>
<PU>Columbia University Press </PU>
<PL>New York </PL>
<PY>1987</PY>
<PP>x+512</PP>
</SEQ>

<SEQ>
<UI>1531   Krichevsky,R. Occam's Razor, Partial.. Inform.Comput.  94 
108(1):158-174
</UI>
<AU>Krichevsky RE
</AU>
<TI>Occam's Razor, Partially Specified Boolean Functions, String Matching, and
Independent Sets
</TI>
<SU>String match;
    Probability;
    RU;
    Function
</SU>
<AB>"An algorithm transforming any partially specified boolean function into
an asymptotically shortest program to compute it is presented. The algorithm
runs in quadratic time. As corollaries, two methods are developed. The first of
them solves the string-matching problem in a randomized way with a smaller false
match probability than previously known methods. The second produces an
asymptotically largest family of independent sets in nearly minimal time."
</AB>
<JT>Inform Comput</JT>
<PY>1994</PY>
<VO>108</VO>
<NO>1</NO>
<PP>158-174</PP>
</SEQ>

<SEQ>
<UI>1532   Altschul,S.F. Issues in Searching Mo.. Nature Genetics 94 
6(2):119-129
</UI>
<AU>Altschul SF;
    Boguski MS;
    Gish W;
    Wootton JC
</AU>
<TI>Issues in Searching Molecular Sequence Databases
</TI>
<SU>Sequence database;
    Database search;
    Statistical;
    Sequence alignment;
    Scoring;
    USA
</SU>
<AB>"Sequence similarity search programs are versatile tools for the molecular
biologist, frequently able to identify possible DNA coding regions and to
provide clues to gene and protein structure and function. While much attention
had been paid to the precise algorithms these programs employ and to their
relative speeds, there is a constellation of associated issues that are equally
important to realize the full potential of these methods. Here, we consider a
number of these issues, including the choice of scoring systems, the statistical
significance of alignments, the masking of uninformative or potentially
confounding sequence regions, the nature and extent of sequence redundancy in
the databases and network access to similarity search services."
</AB>
<JT>Nature Genetics </JT>
<PY>1994</PY>
<VO>6</VO>
<NO>2</NO>
<PP>119-129</PP>
</SEQ>

<SEQ>
<UI>1533   Fitch,W.M.    A Hidden Bias in the E.. Evolutionary .. 86Academic 
Press
</UI>
<AU>Fitch WM
</AU>
<TI>A Hidden Bias in the Estimate of Total Nucleotide Substitutions from
Pairwise Differences
</TI>
<ED>Karlin S
    Nevo E
</ED>
<BK>Evolutionary Processes and Theory
</BK>
<SU>Substitution;
    Evolutionary divergence;
    Nucleotide;
    Bias;
    USA
</SU>
<AB>Swofford, Olsen (1990), p. 537
</AB>
<PU>Academic Press </PU>
<PL>Orlando, FL </PL>
<PY>1986</PY>
<PP>315-328</PP>
</SEQ>

<SEQ>
<UI>1534   Cavender,J.A. Necessary Conditions f.. Math.Biosci.    91 
103:69-75
</UI>
<AU>Cavender JA
</AU>
<TI>Necessary Conditions for the Method of Inferring Phylogeny by Linear
Invariants
</TI>
<SU>Evolutionary tree;
    Invariant;
    Phylogeny;
    Markov;
    USA
</SU>
<AB>"It is known that if all the Markov transition matrices that govern the
substitution of one nucleotide for another satisfy six linear constraints, then
equations can be derived that permit one to infer evolutionary trees from
nucleic acid sequences by the method of linear invariants. These sufficient
conditions are also necessary. Any relaxation of them results in the loss of all
linear invariants. Necessary conditions for any given set of linear invariants
can be derived by examining conditions a matrix must satisfy to map a certain
set of matrices into itself. To the extent that necessary conditions are
incorrect, a method is not reliable."
</AB>
<JT>Math Biosci</JT>
<PY>1991</PY>
<VO>103</VO>
<PP>69-75</PP>
</SEQ>

<SEQ>
<UI>1535   Karlin,S.     Computational DNA Sequ.. Annu.Rev.Microb 94 
48:619-654
</UI>
<AU>Karlin S;
    Cardon LR
</AU>
<TI>Computational DNA Sequence Analysis
</TI>
<SU>Sequence analysis;
    DNA;
    Protein;
    Statistical;
    Evolution;
    k-tuple;
    USA
</SU>
<AB>"This paper reviews several new developments in computer and statistical
analysis of DNA and protein sequences. We present criteria and describe means
for assessing and interpreting genomic inhomogeneities within and between
sequences. These include: (a) characterizations of short oligonucleotide biases
and general compositional tendencies; (b) molecular evolutionary reconstructions
based on dinucleotide relative abundance distance measures and partial
orderings; (c) the application of r-scan statistics, quantile distributions, and
score-based analyses to identify clustering, overdispersion, and excessive
evenness in the distribution of a marker array along a sequence."
</AB>
<JT>Annu Rev Microbiol</JT>
<PY>1994</PY>
<VO>48</VO>
<PP>619-654</PP>
</SEQ>

<SEQ>
<UI>1536   Li,W.H.       Reconstruction of Phyl.. Cold Spring Har 87 
52:847-856
</UI>
<AU>Li WH;
    Wolfe KH;
    Sourdis J;
    Sharp PM
</AU>
<TI>Reconstruction of Phylogenetic Trees and Estimation of Divergence Times
under Nonconstant Rates of Evolution
</TI>
<SU>Evolutionary rate;
    Divergence;
    Evolution;
    Rate;
    Phylogenetic;
    Estimation;
    USA
</SU>
<AB>"Phylogenetic reconstruction is extremely difficult when the rates of
evolution differ greatly among lineages and when the taxa or DNA sequences under
study are distantly related. We have studied this problem for the simple case of
only four taxa. We used computer simulation to compare the performance of
several methods to see which are most effective against unequal rates of
evolution .... It is commonly thought that phylogenetic reconstruction becomes
much simpler when one or more outgroups are available .... However, the
usefulness of an outgroup depends on its distance from the taxa under study. We
therefore studied how quickly the reliability of an outgroup reference decreases
with that distance."
</AB>
<JT>Cold Spring Harbor Sympos Quant Biol</JT>
<PY>1987</PY>
<VO>52</VO>
<PP>847-856</PP>
</SEQ>

<SEQ>
<UI>1537   Golding,B.    A Maximum Likelihood A.. J.Mol.Evol.     90 
31:511-523
</UI>
<AU>Golding B;
    Felsenstein J
</AU>
<TI>A Maximum Likelihood Approach to the Detection of Selection from a
Phylogeny
</TI>
<SU>Evolutionary tree;
    Likelihood;
    Selection;
    Phylogeny;
    CA;
    Detection
</SU>
<AB>"A large amount of information is contained within the phylogenetic
relationships between species. ... The influence that deleterious selection
might have is determined here. The likelihood of different phylogenies in the
presence of selection is explored to determine the properties of such a
likelihood surface. The calculation of likelihoods for a phylogeny in the
presence and absence of selection permits the application of a likelihood ratio
test to search for selection. It is shown that even a single selected site can
have a strong effect on the likelihood."
</AB>
<JT>J Mol Evol</JT>
<PY>1990</PY>
<VO>31</VO>
<PP>511-523</PP>
</SEQ>

<SEQ>
<UI>1538   Taylor,W.R.   Protein Structure Mode.. J.Biotechnol.   94 
35(2/3):281-29
</UI>
<AU>Taylor WR
</AU>
<TI>Protein Structure Modelling from Remote Sequence Similarity
</TI>
<SU>Review;
    Protein;
    Structure;
    Model;
    Sequence proximity;
    Amino acid;
    UK;
    Similarity
</SU>
<AB>"Many methods exist for taking a sequence that exhibits similarity to
another of known structure and building a molecular model. However, when the
sequence similarity is very remote and fragmentary, this 'modelling-by-homology'
approach is less reliable. Current methods that tackle this problem are reviewed
below, taking as an example the construction of a predicted model for the
retroviral protease. ... Because of the rapid proliferation of methods and their
variants, an exhaustive review of the literature has not been possible and the
following survey concentrates on the developments of the author and colleagues
to explain the basic methods."
</AB>
<JT>J Biotechnol</JT>
<PY>1994</PY>
<VO>35</VO>
<NO>2/3</NO>
<PP>281-291</PP>
</SEQ>

<SEQ>
<UI>1539   Fukami-Kobaya Estimation of Evolutio.. Mol.Biol.Evol.  94 
11(1):99-105
</UI>
<AU>Fukami-Kobayashi K
</AU>
<TI>Estimation of Evolutionary Distance between Distantly Related Sequences of
Amino Acids, Taking Account of Patterns of Amino Acid Replacement
</TI>
<SU>Evolutionary distance;
    Amino acid;
    Distance;
    Likelihood;
    Phylogenetic;
    JP;
    Estimation
</SU>
<AB>"A method called the 'similarity distance method' (SD method) was
developed to obtain maximum-likelihood estimates of evolutionary distance
between amino acid sequences, on the basis of a given pattern of amino acid
replacement. Computer simulation revealed that, by using the new method,
evolutionary distance can be estimated efficiently even when the expected
identity between the sequences is as low as 0.14 and the length of the sequences
is only 50 amino acid residues."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1994</PY>
<VO>11</VO>
<NO>1</NO>
<PP>99-105</PP>
</SEQ>

<SEQ>
<UI>1540   Agarwala,R.   A Polynomial-time Algo.. SIAM J.Comput.  94 
23(6):1216-122
</UI>
<AU>Agarwala R;
    Fernandez-Baca D
</AU>
<TI>A Polynomial-time Algorithm for the Perfect Phylogeny Problem when the
Number of Character States is Fixed
</TI>
<SU>Phylogeny;
    Character data;
    Compatibility;
    Evolutionary tree;
    USA;
    Algorithm
</SU>
<AB>"This paper presents a polynomial-time algorithm for determining whether a
set of species, described by the characters they exhibit, has a perfect
phylogeny, assuming the maximum number of possible states for a character is
fixed. This solves a longstanding open problem. This result should be contrasted
with the proof by Steel (1992) and Bodlaender, Fellows and Warnow (1992) that
the perfect phylogeny problem is NP-complete in general."
</AB>
<JT>SIAM J Comput</JT>
<PY>1994</PY>
<VO>23</VO>
<NO>6</NO>
<PP>1216-1224</PP>
</SEQ>

<SEQ>
<UI>1541   Apostolico,A. Parallel Detection of .. Theoret.Comput. 95 
141:163-173
</UI>
<AU>Apostolico A;
    Breslauer D;
    Galil Z
</AU>
<TI>Parallel Detection of all Palindromes in a String
</TI>
<SU>String search;
    Palindrome;
    Parallel;
    USA;
    Detection
</SU>
<AB>"This paper presents two efficient concurrent-read concurrent-write
parallel algorithms that find all palindromes in a given string. ... These new
results improve on the known parallel palindrome detection algorithms by using
smaller auxiliary space and either by making fewer operations or by achieving a
faster running time."
</AB>
<JT>Theoret Comput Sci</JT>
<PY>1995</PY>
<VO>141</VO>
<PP>163-173</PP>
</SEQ>

<SEQ>
<UI>1542   Bafna,V.      Sorting by Reversals: .. Mol.Biol.Evol.  95 
12(2):239-246
</UI>
<AU>Bafna V;
    Pevzner PA
</AU>
<TI>Sorting by Reversals: Genome Rearrangements in Plant Organelles and
Evolutionary History of X Chromosome
</TI>
<SU>Reversal;
    Genome;
    Sequence comparison;
    Inversion;
    USA;
    Rearrangement;
    Chromosome
</SU>
<AB>"The paper addresses the problem of genome comparison versus classical
gene comparison and presents algorithms to analyze rearrangements in genomes
evolving by inversions. In the simplest form the problem corresponds to sorting
by reversals, that is sorting of an array using reversals of arbitrary
fragments. We describe algorithms to analyze genomes evolving by inversions and
discuss applications of these algorithms in molecular evolution."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1995</PY>
<VO>12</VO>
<NO>2</NO>
<PP>239-246</PP>
</SEQ>

<SEQ>
<UI>1543   Breslauer,D.  Fast Parallel String P.. Theoret.Comput. 95 
137:269-278
</UI>
<AU>Breslauer D
</AU>
<TI>Fast Parallel String Prefix-matching
</TI>
<SU>String search;
    Prefix;
    Parallel;
    String match;
    DK
</SU>
<AB>"An O(log log m)-time, (n log m / log log m)-processor CRCW-PRAM
algorithm for the string prefix-matching problem over general alphabets is
presented. The algorithm can also be used to compute the KMP [Knuth-Morris-Pratt]
failure function in O(log log m) time on m log m / log log m processors.
These results improve on the running time of the best previous algorithm for
both problems, which was O(log m), while preserving the same number of
processors."
</AB>
<JT>Theoret Comput Sci</JT>
<PY>1995</PY>
<VO>137</VO>
<PP>269-278</PP>
</SEQ>

<SEQ>
<UI>1544   Breslauer,D.  Dictionary-Matching on.. J.Algorithms    95 
18:278-295
</UI>
<AU>Breslauer D
</AU>
<TI>Dictionary-Matching on Unbounded Alphabets: Uniform Length Dictionaries
</TI>
<SU>Dictionary match;
    On-line;
    Multidimensional;
    Italy
</SU>
<AB>"In the string-matching problem one is interested in all occurrences of a
short pattern string in a longer text string. Dictionary-matching is a
generalization of this problem where one is looking simultaneously for all
occurrences of several patterns in a single text. This paper presents an
efficient on-line dictionary-matching algorithm for the case where the patterns
have uniform length and the input alphabet is unbounded. A tight lower bound
establishes that our approach is optimal if the only access the algorithm has to
the input strings is by pairwise symbol comparisons."
</AB>
<JT>J Algorithms </JT>
<PY>1995</PY>
<VO>18</VO>
<PP>278-295</PP>
</SEQ>

<SEQ>
<UI>1545   Cole,R.       Tighter Lower Bounds o.. SIAM J.Comput.  95 
24(1):30-45
</UI>
<AU>Cole R;
    Hariharan R;
    Paterson M;
    Zwick U
</AU>
<TI>Tighter Lower Bounds on the Exact Complexity of String Matching
</TI>
<SU>String match;
    Complexity;
    On-line;
    Pattern match;
    USA
</SU>
<AB>"This paper considers the exact number of character comparisons needed to
find all occurrences of a pattern of length m in a text of length n using on-line
and general algorithms. ... These lower bounds complement an on-line upper
bound ... obtained recently by Cole and Hariharan. The lower bounds are obtained
by finding patterns with interesting combinatorial properties. It is also shown
that for some patterns off-line algorithms can be more efficient than on-line
algorithms."
</AB>
<JT>SIAM J Comput</JT>
<PY>1995</PY>
<VO>24</VO>
<NO>1</NO>
<PP>30-45</PP>
</SEQ>

<SEQ>
<UI>1546   DeBry,R.W.    The Relationship betwe.. Mol.Biol.Evol.  95 
12(2):291-297
</UI>
<AU>DeBry RW;
    Abele LG
</AU>
<TI>The Relationship between Parsimony and Maximum-Likelihood Analyses: Tree
Scores and Confidence Estimates for Three Real Data Sets
</TI>
<SU>Parsimony;
    Likelihood;
    Phylogeny;
    Evolutionary tree;
    Confidence;
    USA;
    Score
</SU>
<AB>"However, it is not known how frequently the most parsimonious topology
will be the same as the maximum-likelihood topology with real data sets. Three
18S nucleotide sequence data sets are examined, each consisting of seven
crustacean taxa. For each data set, under both parsimony and likelihood, scores
are determined for all 945 topologies, complete confidence sets are estimated by
methods that account for variance in the phylogenetic estimate, and bootstrap
resampling is performed."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1995</PY>
<VO>12</VO>
<NO>2</NO>
<PP>291-297</PP>
</SEQ>

<SEQ>
<UI>1547   Ferretti,V.   Phylogenetic Invariant.. J.Theor.Biol.   95 
173:147-162
</UI>
<AU>Ferretti V;
    Sankoff D
</AU>
<TI>Phylogenetic Invariants for More General Evolutionary Models
</TI>
<SU>Phylogenetic;
    Invariant;
    Phylogeny;
    Evolutionary tree;
    CA;
    Model
</SU>
<AB>"In this paper, we apply a general method of finding invariants of a
parameterized functional form to find low-degree polynomial invariants for
different models. Quadratic invariants are obtained for the Kimura two-parameter
model, for a model allowing evolutionary dependence between positions in the
sequences and for an asymmetric model that allows for A+T versus G+C asymmetries
in DNA base composition. Those invariants are found for trees (unrooted in the
case of the Kimura model and rooted for the others) with N=3 or N=4 terminal
vertices."
</AB>
<JT>J Theor Biol</JT>
<PY>173</PY>
<VO>173</VO>
<PP>147-162</PP>
</SEQ>

<SEQ>
<UI>1548   Gascuel,O.    A Note on Sattath and .. Mol.Biol.Evol.  94 
11(6):961-963
</UI>
<AU>Gascuel O
</AU>
<TI>A Note on Sattath and Tversky's, Saitou and Nei's, and Studier and
Keppler's Algorithms for Inferring Phylogenies from Evolutionary Distances
</TI>
<SU>Phylogeny;
    Reconstruct;
    Distance;
    Additive tree;
    Evolutionary tree;
    Evolutionary distance;
    FR;
    Algorithm
</SU>
<AB>"Several simulations ... have shown a high relative efficiency of ADDTREE
and of the NJ method in recovering the true topology. These studies have also
shown that ADDTREE and the NJ method, whose principles seem very different, are
in fact close and usually provide identical or similar trees. ... In this note,
we account for this proximity regardless of the number of taxa, and we show that
the minimum evolution principle, as employed in the NJ method, is very close to
the neighborliness used by Sattath and Tversky (1977) and by Fitch (1981) in a
nonagglomerative way."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1994</PY>
<VO>11</VO>
<NO>6</NO>
<PP>961-963</PP>
</SEQ>

<SEQ>
<UI>1549   Gaut,B.S.     Success of Maximum Lik.. Mol.Biol.Evol.  95 
12(1):152-162
</UI>
<AU>Gaut BS;
    Lewis PO
</AU>
<TI>Success of Maximum Likelihood Phylogeny Inference in the Four-Taxon Case
</TI>
<SU>Likelihood;
    Phylogeny;
    Evolutionary tree;
    USA
</SU>
<AB>"We used simulated data to investigate a number of properties of maximum-likelihood
(ML) phylogenetic tree estimation for the case of four taxa. ... Data
were analyzed in the ML framework with two different substitution models, and we
compared the ability of the two models to reconstruct the correct topology.
Although both models were inconsistent for some branch-length combinations in
the presence of site-to-site variation, the models were efficient predictors of
topology under most simulation conditions."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1995</PY>
<VO>12</VO>
<NO>1</NO>
<PP>152-162</PP>
</SEQ>

<SEQ>
<UI>1550   Karlin,S.     Which Bacterium is the.. Proc.Nat.Acad.S 94 
91:12842-12846
</UI>
<AU>Karlin S;
    Campbell AM
</AU>
<TI>Which Bacterium is the Ancestor of the Animal Mitochondrial Genome?
</TI>
<SU>Genome;
    Composition;
    Nucleotide;
    Bias;
    USA;
    Ancestor
</SU>
<AB>"We present considerable data supporting the hypothesis that a Sulfolobus-
or Mycoplasma-like endosymbiont, rather than an α-proteobacterium, is the
ancestor of animal mitochondrial genomes. This hypothesis is based on pronounced
similarities in oligonucleotide relative abundance extremes common to animal
mtDNA, Sulfolobus, and Mycoplasma capricolum and pronounced discrepancies of
these relative abundance values with respect to α-proteobacteria."
</AB>
<JT>Proc Nat Acad Sci USA </JT>
<PY>1994</PY>
<VO>91</VO>
<PP>12842-12846</PP>
</SEQ>

<SEQ>
<UI>1551   Karlin,S.     Heterogeneity of Genom.. Proc.Nat.Acad.S 94 
91:12837-12841
</UI>
<AU>Karlin S;
    Ladunga I;
    Blaisdell BE
</AU>
<TI>Heterogeneity of Genomes: Measures and Values
</TI>
<SU>Genome;
    Nucleotide;
    Distance;
    USA
</SU>
<AB>"Genomic homogeneity is investigated for a broad base of DNA sequences in
terms of dinucleotide relative abundance distances (abbreviated d-distances) and
of oligonucleotide compositional extremes. It is shown that d-distances between
different genomic sequences in the same species are low, only about 2 or 3 times
the distance found in random DNA, and are generally smaller than the between-
species d-distances."
</AB>
<JT>Proc Nat Acad Sci USA </JT>
<PY>1994</PY>
<VO>91</VO>
<PP>12837-12841</PP>
</SEQ>

<SEQ>
<UI>1552   Karlin,S.     Comparisons of Eukaryo.. Proc.Nat.Acad.S 94 
91:12832-12836
</UI>
<AU>Karlin S;
    Ladunga I
</AU>
<TI>Comparisons of Eukaryotic Genomic Sequences
</TI>
<SU>Genome;
    Sequence comparison;
    USA;
    Genomic
</SU>
<AB>"A method for assessing genomic similarity based on relative abundances of
short oligonucleotides in large DNA samples is introduced. The method requires
neither homologous sequences nor prior sequence alignments. The analysis centers
on (i) dinucleotide (and tri- and tetra-) relative abundance extremes in genomic
sequences, (ii) distances between sequences based on all dinucleotide relative
abundance values, and (iii) a multidimensional partial ordering protocol. The
emphasis in this paper is on assessments of general relatedness of genomes as
distinguished from phylogenetic reconstructions."
</AB>
<JT>Proc Nat Acad Sci USA </JT>
<PY>1994</PY>
<VO>91</VO>
<PP>12832-12836</PP>
</SEQ>

<SEQ>
<UI>1553   Kelly,C.      A Test of the Markovia.. Biometrics      94 
50:653-664
</UI>
<AU>Kelly C
</AU>
<TI>A Test of the Markovian Model of DNA Evolution
</TI>
<SU>Markov;
    Evolution;
    DNA;
    USA;
    Model
</SU>
<AB>"The Markov model of molecular evolution has recently received a
significant amount of interest because its statistical nature allows for the
testing of a number of evolutionary hypotheses. Here we propose a test which
assesses whether data from two species sharing a common ancestor will fit a
general Markovian model. We illustrate the test with two examples of data which
appear at first glance not to fit a Markov model."
</AB>
<JT>Biometrics </JT>
<PY>1994</PY>
<VO>50</VO>
<PP>653-664</PP>
</SEQ>

<SEQ>
<UI>1554   Lento,G.M.    Use of Spectral Analys.. Mol.Biol.Evol.  95 
12(1):28-52
</UI>
<AU>Lento GM;
    Hickson RE;
    Chambers GK;
    Penny D
</AU>
<TI>Use of Spectral Analysis to Test Hypotheses on the Origin of Pinnipeds
</TI>
<SU>Spectral analysis;
    Evolution;
    DNA;
    NZ
</SU>
<AB>"We inferred phylogenetic reconstructions from DNA sequence data using
standard parsimony and neighbor-joining algorithms for phylogenetic inference as
well as a new method called spectral analysis (Hendy and Penny) in which
phylogenetic information is displayed independently of any selected tree. We
identified and compensated for potential sources of error known to lead to
selection of incorrect phylogenetic trees. These include sampling error, unequal
evolutionary rates on lineages, unequal nucleotide composition among lineages,
unequal rates of change at different sites, and inappropriate tree selection
criteria."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1995</PY>
<VO>12</VO>
<NO>1</NO>
<PP>28-52</PP>
</SEQ>

<SEQ>
<UI>1555   Perna,N.T.    Unequal Base Frequenci.. Mol.Biol.Evol.  95 
12(2):359-361
</UI>
<AU>Perna NT;
    Kocher TD
</AU>
<TI>Unequal Base Frequencies and the Estimation of Substitution Rates
</TI>
<SU>Substitution;
    Nucleotide;
    Sequence comparison;
    USA;
    Rate;
    Estimation
</SU>
<AB>"The model assumes a stationary process (i.e., the observed base
composition reflects the nucleotide frequencies at equilibrium). In order to
maintain the equilibrium composition, substitutions are weighted by the
frequency of the mutant base. The motivation for this weighting arises from an
analysis of the patterns and relative rates of base substitutions inferred by
parsimony from a distance-based tree. We wish to discuss this analysis of
substitution patterns, as well as the sensitivity of divergence estimates to the
equilibrium nucleotide frequencies which are assumed."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1995</PY>
<VO>12</VO>
<NO>2</NO>
<PP>359-361</PP>
</SEQ>

<SEQ>
<UI>1556   Rzhetsky,A.   Tests of Applicability.. Mol.Biol.Evol.  95 
12(1):131-151
</UI>
<AU>Rzhetsky A;
    Nei M
</AU>
<TI>Tests of Applicability of Several Substitution Models for DNA Sequence
Data
</TI>
<SU>Substitution;
    DNA;
    Sequence analysis;
    Invariant;
    Phylogeny;
    USA;
    Model
</SU>
<AB>"Using linear invariants for various models of nucleotide substitution, we
developed test statistics for examining the applicability of a specific model to
a given dataset in phylogenetic inference. ... The test statistics developed are
independent of evolutionary time and phylogeny, although the variances of the
statistics contain phylogenetic information. Therefore, these statistics can be
used before a phylogenetic tree is estimated. Our objective is to find the
simplest model that is applicable to a given dataset, keeping in mind that a
simple model usually gives an estimate of evolutionary distance ... with a
smaller variance than a complicated model when the simple model is correct."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1995</PY>
<VO>12</VO>
<NO>1</NO>
<PP>131-151</PP>
</SEQ>

<SEQ>
<UI>1557   Sitnikova,T.  Interior-Branch and Bo.. Mol.Biol.Evol.  95 
12(2):319-333
</UI>
<AU>Sitnikova T;
    Rzhetsky A;
    Nei M
</AU>
<TI>Interior-Branch and Bootstrap Tests of Phylogenetic Trees
</TI>
<SU>Phylogeny;
    Bootstrap;
    Statistical;
    USA;
    Phylogenetic
</SU>
<AB>"We have compared statistical properties of the interior-branch and
bootstrap tests of phylogenetic trees when the neighbor-joining tree-building
method is used. ... Actually, the bootstrap test usually underestimates the
extent of statistical support of species clusters. The relationship between the
confidence values obtained by the two tests varies with both the topology and
expected branch lengths of the true (model) tree."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1995</PY>
<VO>12</VO>
<NO>2</NO>
<PP>319-333</PP>
</SEQ>

<SEQ>
<UI>1558   Strumpen,V.   Coupling Hundreds of W.. Software.Practi 95 
25(3):291-304
</UI>
<AU>Strumpen V
</AU>
<TI>Coupling Hundreds of Workstations for Parallel Molecular Sequence Analysis
</TI>
<SU>Parallel;
    Distributed;
    Sequence analysis;
    SWI
</SU>
<AB>"We present a highly scalable approach to distributed parallel computing
on workstations in the Internet which provides significant speed-up to molecular
biology sequence analysis. Recent developments show that smaller numbers of
workstations connected via a local area network can be used efficiently for
parallel computing. This work emphasizes scalability with respect to the number
of workstations employed."
</AB>
<JT>Software Practice Experience </JT>
<PY>1995</PY>
<VO>25</VO>
<NO>3</NO>
<PP>291-304</PP>
</SEQ>

<SEQ>
<UI>1559   Tillier,E.R.M Neighbor Joining and M.. Mol.Biol.Evol.  95 
12(1):7-15
</UI>
<AU>Tillier ERM;
    Collins RA
</AU>
<TI>Neighbor Joining and Maximum Likelihood with RNA Sequences: Addressing the
Interdependence of Sites
</TI>
<SU>Likelihood;
    RNA;
    Sequence analysis;
    CA;
    Joining;
    Neighbor joining
</SU>
<AB>"We analyze a new probabilistic model for the evolution of double-stranded
RNA molecules that considers substitutions of the base pairs rather than of each
of the bases independently. The new model, called the double-stranded model, was
incorporated into the neighbor-joining distance and maximum likelihood methods.
Computer simulations show that maximum likelihood is very robust to the
violation of the assumption of the independence of sites. In contrast, the
neighbor-joining method is sensitive to such violations ...."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1995</PY>
<VO>12</VO>
<NO>1</NO>
<PP>7-15</PP>
</SEQ>

<SEQ>
<UI>1560   Overton,G.C.  QGB: A System for Quer.. J.Comput.Biol.  94 
1(1):3-14
</UI>
<AU>Overton GC;
    Aaronson JS;
    Haas J;
    Adams J
</AU>
<TI>QGB: A System for Querying Sequence Database Fields and Features
</TI>
<SU>Database search;
    Sequence database;
    USA
</SU>
<AB>"We have developed a general system, QGB, for performing complex queries
on the information in the DDBJ/EMBL/Genbank databases, including queries over
the structural features of sequences implied in the FEATURE TABLE. Queries are
formed in a Structured Query Language (SQL)-like syntax with language extensions
to support complex types ... appropriate for representing and querying sequence
data. A novel aspect of QGB is its ability to deduce missing features and infer
relationships among features as a consequence of constructing a parse tree of
sequence structure from information described in the FEATURE TABLE."
</AB>
<JT>J Comput Biol</JT>
<PY>1994</PY>
<VO>1</VO>
<NO>1</NO>
<PP>3-14</PP>
</SEQ>

<SEQ>
<UI>1561   States,D.J.   Combined Use of Sequen.. J.Comput.Biol.  94 
1(1):39-50
</UI>
<AU>States DJ;
    Gish W
</AU>
<TI>Combined Use of Sequence Similarity and Codon Bias for Coding Region
Identification
</TI>
<SU>Codon;
    Bias;
    Coding;
    Region;
    Sequence proximity;
    USA;
    Similarity;
    Identification
</SU>
<AB>"A computer program called BLASTX was previously shown to be effective in
identifying and assigning putative function to likely protein coding regions by
detecting significant similarity between a conceptually translated nucleotide
query sequence and members of a protein sequence database. We present and assess
the sensitivity of a new option to this software tool, herein called BLASTC,
which employs information obtained from biases in codon utilization, along with
the information obtained from sequence similarity."
</AB>
<JT>J Comput Biol</JT>
<PY>1994</PY>
<VO>1</VO>
<NO>1</NO>
<PP>39-50</PP>
</SEQ>

<SEQ>
<UI>1562   Miller,W.     Constructing Aligned S.. J.Comput.Biol.  94 
1(1):51-64
</UI>
<AU>Miller W;
    Boguski M;
    Raghavachari B;
    Zhang Z;
    Hardison RC
</AU>
<TI>Constructing Aligned Sequence Blocks
</TI>
<SU>Multiple alignment;
    Sequence alignment;
    Pairwise alignment;
    USA
</SU>
<AB>"This paper presents an efficient method for constructing aligned blocks
(i.e., gap-free multiple alignments) from a set of pairwise alignments. The
method is more sensitive than some earlier block-constructing methods for
detecting conserved sequence regions. The technique is applied to analyze
conserved regions in protein prenyltransferases and to detect regulatory
elements in the 5' flank of the β-globin gene."
</AB>
<JT>J Comput Biol</JT>
<PY>1994</PY>
<VO>1</VO>
<NO>1</NO>
<PP>51-64</PP>
</SEQ>

<SEQ>
<UI>1563   Ferretti,V.   Skewed Base Compositio.. J.Comput.Biol.  94 
1(1):77-92
</UI>
<AU>Ferretti V;
    Lang BF;
    Sankoff D
</AU>
<TI>Skewed Base Compositions, Asymmetric Transition Matrices, and Phylogenetic
Invariants
</TI>
<SU>Phylogenetic;
    Invariant;
    Transition;
    Evolutionary tree;
    CA;
    Composition
</SU>
<AB>"Evolutionary inference methods that assume equal DNA base compositions
and symmetric nucleotide substitution matrices, where these assumptions do not
hold, are likely to group species on the basis of similar base compositions
rather than true phylogenetic relationships. We propose an invariants-based
method for dealing with this problem. ... We apply a general 'empirical' method
of finding invariants of a parameterized functional form. ... We discuss the
problems of finding asymmetric models satisfying the property of semigroup
closure, of finding asymmetric models that admit invariants at all, and of the
computational complexity of the method."
</AB>
<JT>J Comput Biol</JT>
<PY>1994</PY>
<VO>1</VO>
<NO>1</NO>
<PP>77-92</PP>
</SEQ>

<SEQ>
<UI>1564   Godzik,A.     Flexible Algorithm for.. Comput.Appl.Bio 94 
10(6):587-596
</UI>
<AU>Godzik A;
    Skolnick J
</AU>
<TI>Flexible Algorithm for Direct Multiple Alignment of Protein Structures and
Sequences
</TI>
<SU>Protein;
    Structure;
    Multiple alignment;
    Sequence alignment;
    USA;
    Algorithm
</SU>
<AB>"The recently described equivalence between the alignment of two proteins
and a conformation of a lattice chain on a two-dimensional square lattice is
extended to multiple alignments. The search for the optimal multiple alignment
between several proteins, which is equivalent to finding the energy minimum in
the conformational space of a multi-dimensional lattice chain, is studied by the
Monte Carlo approach. This method ... can accept arbitrary scoring functions,
including non-local ones, and its speed decreases slowly with increasing number
of dimensions."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1994</PY>
<VO>10</VO>
<NO>6</NO>
<PP>587-596</PP>
</SEQ>

<SEQ>
<UI>1565   Kondrakhin,Y. Construction of a Gene.. Comput.Appl.Bio 94 
10(6):597-603
</UI>
<AU>Kondrakhin YV;
    Shamin VV;
    Kolchanov NA
</AU>
<TI>Construction of a Generalized Consensus Matrix for Recognition of
Vertebrate Pre-mRNA 3'-terminal Processing Sites
</TI>
<SU>Consensus matrix;
    RNA;
    RU;
    Recognition;
    Matrix
</SU>
<AB>"Using a set of sequences of 63 cleavage / polyadenylation sites of
vertebrate pre-mRNA, a generalized consensus matrix was constructed. The
elements of the matrix were the absolute frequencies of oligonucleotides of
length l at the ith position of sites. The cleavage point of each site was
assigned the same position number. To recognize a polyadenylation site in a
nucleotide sequence, a multiplicative measure was obtained using the elements of
the generalized consensus matrix as weight factors."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1994</PY>
<VO>10</VO>
<NO>6</NO>
<PP>597-603</PP>
</SEQ>

<SEQ>
<UI>1566   Shepelev,V.A. Multidimensional Dot-m.. Comput.Appl.Bio 94 
10(6):605-611
</UI>
<AU>Shepelev VA;
    Yanishevsky NV
</AU>
<TI>Multidimensional Dot-matrices
</TI>
<SU>Multidimensional;
    Dot;
    RU
</SU>
<AB>"A generalization of the dot-matrix of similarity for n sequences is
proposed. For the visualization of the n-dimensional dot-matrix, the n
projections onto the plane passing through the main diagonal and each of the n
axes of Euclidean space En are displayed. The projection is compressed so that
the points at the coordinates ... are depicted on the plane. The common regions
of similarity are revealed as segments of straight lines parallel to the main
diagonal."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1994</PY>
<VO>10</VO>
<NO>6</NO>
<PP>605-611</PP>
</SEQ>

<SEQ>
<UI>1567   Schneider,G.  Artificial Neural Netw.. Comput.Appl.Bio 94 
10(6):635-645
</UI>
<AU>Schneider G;
    Schuchhardt J;
    Wrede P
</AU>
<TI>Artificial Neural Networks and Simulated Molecular Evolution are Potential
Tools for Sequence-oriented Protein Design
</TI>
<SU>Neural;
    Feature extraction;
    Evolution;
    Simulation;
    Protein;
    DE;
    Network
</SU>
<AB>"The potential of artificial neural filter systems for feature extraction
from amino acid sequences is discussed. Analysis of signal peptidase I
cleavage sites in protein precursor sequences serves as an example application. Trained
neural networks can be used as the fitness function in an evolutionary protein
design cycle termed 'simulated molecular evolution' which is an entirely
computer-based method for the rational design of locally encoded amino acid
sequence features."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1994</PY>
<VO>10</VO>
<NO>6</NO>
<PP>635-645</PP>
</SEQ>

<SEQ>
<UI>1568   Smith,S.W.    The Genetic Data Envir.. Comput.Appl.Bio 94 
10(6):671-675
</UI>
<AU>Smith SW;
    Overbeek R;
    Woese CR;
    Gilbert W;
    Gillevet PM
</AU>
<TI>The Genetic Data Environment: An Expandable GUI for Multiple Sequence
Analysis
</TI>
<SU>GUI;
    Sequence analysis;
    Multiple comparison;
    USA;
    Genetic
</SU>
<AB>"An X-Windows-based graphic user interface is presented which allows the
seamless integration of numerous existing biomolecular programs into a single
analysis environment. This environment is based on a core multiple sequence
editor that is linked to external programs by a user-expandable menu system and
is supported on Sun and DEC workstations. There is no limitation to the number
of external functions that can be linked to the interface. The length and number
of sequences that can be handled are limited only by the size of virtual memory
present on the workstation."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1994</PY>
<VO>10</VO>
<NO>6</NO>
<PP>671-675</PP>
</SEQ>

<SEQ>
<UI>1569   Wishart,D.S.  Constrained Multiple S.. Comput.Appl.Bio 94 
10(6):687-688
</UI>
<AU>Wishart DS;
    Boyko RF;
    Sykes BD
</AU>
<TI>Constrained Multiple Sequence Alignment using XALIGN
</TI>
<SU>Sequence alignment;
    Multiple alignment;
    CA
</SU>
<AB>"In response to this need, we have developed the program XALIGN (X-ray
ALIGNment), a menu-driven, modular program designed to perform up to six
different alignment functions. These include: 1. Pairwise protein sequence
alignment. 2. Multiple (&gt;500) sequence alignment. 3. Pairwise sequence /
structure alignments. 4. Multiple (&gt;500) sequence / structure alignments. 5.
Multi-residue clustering (for editing and alignment). 6. Multi-residue anchoring
(for editing and alignment)."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1994</PY>
<VO>10</VO>
<NO>6</NO>
<PP>687-688</PP>
</SEQ>

<SEQ>
<UI>1570   Vingron,M.    Multiple Sequence Comp.. Adv.Appl.Math.  95 
16(1):1-22
</UI>
<AU>Vingron M;
    Pevzner PA
</AU>
<TI>Multiple Sequence Comparison and Consistency on Multipartite Graphs
</TI>
<SU>Sequence comparison;
    Multiple comparison;
    Consistency;
    Graph;
    USA
</SU>
<AB>"Calculation of dot-matrices is a widespread tool in biological sequence
comparison. As a visual aid they are used in pairwise sequence comparison but so
far have been of little help in the simultaneous comparison of several
sequences. Viewing dot-matrices as projections of unknown n-dimensional points
we consider the multiple alignment problem (for n sequences) as an n-dimensional
image reconstruction problem with noise. We model this situation using a
multipartite graph and introduce a notion of 'consistency' on such a graph."
</AB>
<JT>Adv Appl Math</JT>
<PY>1995</PY>
<VO>16</VO>
<NO>1</NO>
<PP>1-22</PP>
</SEQ>

<SEQ>
<UI>1571   Bunke,H.      An Improved Algorithm .. Inform.Process. 95 54:93-96
</UI>
<AU>Bunke H;
    Csirik J
</AU>
<TI>An Improved Algorithm for Computing the Edit Distance of Run-length Coded
Strings
</TI>
<SU>Edit;
    Distance;
    Approximate match;
    String match;
    SWI;
    Algorithm
</SU>
<AB>"Recently, an algorithm for computing the edit distance of run-length
coded strings was proposed. ... In this paper, we propose a different approach.
Our new algorithm will be also based on a division of the edit matrix into
blocks. However, no subdivision of these blocks will ever be required. ... The
new algorithm is restricted, however, to the special cost function under which
the cost of any insertion and deletion is equal to 1, and the cost of any
substitution is equal to 2. The algorithm described in [Bunke &amp; Csirik 1993] can
additionally handle the case where all edit operations have unit cost."
</AB>
<JT>Inform Process Lett</JT>
<PY>1995</PY>
<VO>54</VO>
<PP>93-96</PP>
</SEQ>

<SEQ>
<UI>1572   Idury,R.M.    Multiple Matching of R.. Inform.Comput.  95 
117(1):78-90
</UI>
<AU>Idury RM;
    Schaffer AA
</AU>
<TI>Multiple Matching of Rectangular Patterns
</TI>
<SU>String match;
    Multidimensional;
    Range search;
    USA;
    Rectangular
</SU>
<AB>"We describe the first worst-case efficient algorithm for simultaneously
matching multiple rectangular patterns of varying sizes and aspect ratios in a
rectangular text. Efficient means significantly more efficient asymptotically
than applying known algorithms that handle one height (or width or aspect ratio)
at a time for each height. Our algorithm features an interesting use of
multidimensional range searching, as well as new adaptations of several known
techniques for two-dimensional string matching."
</AB>
<JT>Inform Comput</JT>
<PY>1995</PY>
<VO>117</VO>
<NO>1</NO>
<PP>78-90</PP>
</SEQ>

<SEQ>
<UI>1573   Sheng,K.N.    Pattern Matching betwe.. Bull.Math.Biol. 94 
56(6):1143-116
</UI>
<AU>Sheng KN;
    Naus JI
</AU>
<TI>Pattern Matching between Two Non-aligned Random Sequences
</TI>
<SU>Pattern match;
    Sequence match;
    Probabilistic;
    Longest common;
    USA
</SU>
<AB>"Given two independent sequences of letters, we seek the probability
distribution of the length of the longest matching word. This word can be in
different positions in the two sequences and we consider both perfect and nearly
perfect matching. We derive bounds and approximations for the probability and
compare them with other bounds and approximations. The results can be applied to
DNA sequences in molecular biology and generalized matching between two
independent random sequences."
</AB>
<JT>Bull Math Biol</JT>
<PY>1994</PY>
<VO>56</VO>
<NO>6</NO>
<PP>1143-1162</PP>
</SEQ>

<SEQ>
<UI>1574   Knight,A.     Weighting of Nucleotid.. Syst.Biol.      95 
44(1):112-116
</UI>
<AU>Knight A;
    Mindell DP
</AU>
<TI>Weighting of Nucleotide Sequences: A Reply
</TI>
<SU>Character weight;
    DNA;
    USA;
    Nucleotide
</SU>
<AB>"In an earlier paper (Knight and Mindell, 1993), we proposed and applied a
method for calculating weights for DNA sequence characters prior to phylogenetic
analysis. Our weighting scheme uses the ratio of expected to observed (EOR)
nucleotide differences in comparisons of sequences .... Collins et al. (1994)
expanded on the EOR method using the same premises mentioned above. ... We argue
here that ... their use of a random model in calculating expected values for the
EOR is unrealistic and therefore inappropriate in light of the extensive
evidence indicating the nonrandom nature of sequence evolution ...."
</AB>
<JT>Syst Biol</JT>
<PY>1995</PY>
<VO>44</VO>
<NO>1</NO>
<PP>112-116</PP>
</SEQ>

<SEQ>
<UI>1575   Collins,T.M.  Compositional Effects .. Syst.Biol.      94 
43(3):449-459
</UI>
<AU>Collins TM;
    Kraus F;
    Estabrook G
</AU>
<TI>Compositional Effects and Weighting of Nucleotide Sequences for
Phylogenetic Analysis
</TI>
<SU>Character weight;
    DNA;
    Phylogenetic;
    USA;
    Nucleotide
</SU>
<AB>"Knight and Mindell (1993) proposed a new method for weighting classes of
nucleotide substitution for application to phylogenetic analysis using DNA
sequences. ... However, the method proposed by K&amp;M for addressing the effects of
compositional bias when weighting is lacking in several respects. ... We suggest
a general approach to generating weighting schemes of this kind that is
consistent with our comments above and show one example of such a scheme that we
believe follows the intent of K&amp;M."
</AB>
<JT>Syst Biol</JT>
<PY>1994</PY>
<VO>43</VO>
<NO>3</NO>
<PP>449-459</PP>
</SEQ>

<SEQ>
<UI>1576   Chao,K.M.     Recent Developments in.. J.Comput.Biol.  94 
1(4):271-291
</UI>
<AU>Chao KM;
    Hardison RC;
    Miller W
</AU>
<TI>Recent Developments in Linear-Space Alignment Methods: A Survey
</TI>
<SU>Sequence analysis;
    Dynamic programming;
    Multiple alignment;
    Survey;
    USA
</SU>
<AB>"A dynamic-programming strategy for sequence alignment first proposed in
1975 by Dan Hirschberg can be adapted to yield a number of extremely
space-efficient algorithms. ... Three of our recent extensions of the technique are
then outlined. ... We also describe two linear-space methods for computing k
best local ... alignments, where k &gt;= 1. ... Finally, we describe programs that
implement various combinations of these techniques to provide a multisequence
alignment method that is especially suited to handling a few very long
sequences."
</AB>
<JT>J Comput Biol</JT>
<PY>1994</PY>
<VO>1</VO>
<NO>4</NO>
<PP>271-291</PP>
</SEQ>

<SEQ>
<UI>1577   Rioux,P.A.    A Portable Search Engi.. J.Comput.Biol.  94 
1(4):293-295
</UI>
<AU>Rioux PA;
    Gilbert WA;
    Littlejohn TG
</AU>
<TI>A Portable Search Engine and Browser for the Entrez Database
</TI>
<SU>Database search;
    Sequence database;
    CA
</SU>
<AB>"Entrez is a molecular sequence and reference database. We present a tool
called CLEVER which permits flexible access to the Entrez database by other
applications and interactively by users. In this way, CLEVER is both a search
engine and a command-line browser of the Entrez database."
</AB>
<JT>J Comput Biol</JT>
<PY>1994</PY>
<VO>1</VO>
<NO>4</NO>
<PP>293-295</PP>
</SEQ>

<SEQ>
<UI>1578   Taylor,W.R.   Motif-Biased Protein S.. J.Comput.Biol.  94 
1(4):297-310
</UI>
<AU>Taylor WR
</AU>
<TI>Motif-Biased Protein Sequence Alignment
</TI>
<SU>Protein;
    Sequence alignment;
    Multiple alignment;
    Motif;
    Gap;
    UK
</SU>
<AB>"A method was developed for pairwise protein sequence alignment to emulate
the effect of structural knowledge or multiple sequences. Runs of matches of the
preferred length were emphasized through the use of a product-bias allowing
short motifs to influence the alignment to a degree that was a realistic
reflection of their infrequency of occurrence. This gave motifs a locally high
scoring match, making their alignment relatively less sensitive to the value of
the gap penalty. This property should be a great advantage when a large number
of sequence comparisons are made with a fixed set of parameter values, as
typically occurs in the scan of a sequence databank with a probe or in the
development of a multiple alignment."
</AB>
<JT>J Comput Biol</JT>
<PY>1994</PY>
<VO>1</VO>
<NO>4</NO>
<PP>297-310</PP>
</SEQ>

<SEQ>
<UI>1579   Wang,L.       On the Complexity of M.. J.Comput.Biol.  94 
1(4):337-348
</UI>
<AU>Wang L;
    Jiang T
</AU>
<TI>On the Complexity of Multiple Sequence Alignment
</TI>
<SU>Sequence alignment;
    Multiple alignment;
    Complexity;
    Approximation;
    Evolutionary tree;
    CA
</SU>
<AB>"We study the computational complexity of two popular problems in multiple
sequence alignment: multiple alignment with SP-score and multiple tree
alignment. It is shown that the first problem is NP-complete and the second is
MAX SNP-hard. The complexity of tree alignment with a given phylogeny is also
considered."
</AB>
<JT>J Comput Biol</JT>
<PY>1994</PY>
<VO>1</VO>
<NO>4</NO>
<PP>337-348</PP>
</SEQ>

<SEQ>
<UI>1580   Naor,D.       On Near-Optimal Alignm.. J.Comput.Biol.  94 
1(4):349-366
</UI>
<AU>Naor D;
    Brutlag DL
</AU>
<TI>On Near-Optimal Alignments of Biological Sequences
</TI>
<SU>Sequence alignment;
    Pairwise alignment;
    Suboptimal;
    Dynamic programming;
    Edit;
    Distance;
    IL
</SU>
<AB>"A near-optimal alignment between a pair of sequences is an alignment
whose score lies within the neighborhood of the optimal score. We present an
efficient method for representing all alignments whose score is within any given
delta from the optimal score. The representation is a compact graph that makes
it easy to impose additional biological constraints and select one desirable
alignment from the large set of alignments. We study the combinatorial nature of
near-optimal alignments, and define a set of 'canonical' near-optimal
alignments."
</AB>
<JT>J Comput Biol</JT>
<PY>1994</PY>
<VO>1</VO>
<NO>4</NO>
<PP>349-366</PP>
</SEQ>

<SEQ>
<UI>1581   Julich,A.     Implementations of BLA.. Comput.Appl.Bio 95 
11(1):3-6
</UI>
<AU>Julich A
</AU>
<TI>Implementations of BLAST for Parallel Computers
</TI>
<SU>BLAST;
    Parallel;
    Complexity;
    DE
</SU>
<AB>"The BLAST sequence comparison programs have been ported to a variety of
parallel computers - the shared memory machine Cray Y-MP 8/864 and the
distributed memory architectures Intel iPSC/860 and nCUBE. Additionally, the
programs were ported to run on workstation clusters. We explain the
parallelization techniques and consider the pros and cons of these methods. The
BLAST programs are very well suited for parallelization for a moderate number of
processors. We illustrate our results using the program blastp as an example."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1995</PY>
<VO>11</VO>
<NO>1</NO>
<PP>3-6</PP>
</SEQ>

<SEQ>
<UI>1582   Resenchuk,S.M ALIGNMENT SERVICE: Cre.. Comput.Appl.Bio 95 
11(1):7-11
</UI>
<AU>Resenchuk SM;
    Blinov VM
</AU>
<TI>ALIGNMENT SERVICE: Creation and Processing of Alignments of Sequences of
Unlimited Length
</TI>
<SU>Multiple alignment;
    Sequence alignment;
    Editor;
    RU
</SU>
<AB>"A package for the creation and processing of multiple sequence alignment
is described. There is no limit on the lengths of the processed nucleotide or
amino acid sequences, and the number of sequences in the alignment is also
unlimited. The main groups of functions are: a semi-automatic alignment editor;
a wide set of functions for technical processing of alignments; nucleotide
alignment mapping and translation; and similarity search functions. A
user-friendly interface and a set of generally used file actions provide a special
operational subsystem for everyday tasks."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1995</PY>
<VO>11</VO>
<NO>1</NO>
<PP>7-11</PP>
</SEQ>

<SEQ>
<UI>1583   Hirosawa,M.   Comprehensive Study on.. Comput.Appl.Bio 95 
11(1):13-18
</UI>
<AU>Hirosawa M;
    Totoki Y;
    Hoshida M;
    Ishikawa M
</AU>
<TI>Comprehensive Study on Iterative Algorithms of Multiple Sequence Alignment
</TI>
<SU>Sequence alignment;
    Multiple alignment;
    Heuristic;
    Approximation;
    JP;
    Algorithm
</SU>
<AB>"Recently, an effective new class of algorithms has been developed. These
algorithms iteratively apply dynamic programming to partially aligned sequences
to improve their alignment quality. ... This paper reports our comprehensive
comparison of iterative algorithms. We proved that performance improves
remarkably when using a tree-based iterative method, which iteratively refines
an alignment whenever two subalignments are merged in a tree-based way. We
propose a tree-dependent, restricted partitioning technique to efficiently
reduce the execution time of iterative algorithms."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1995</PY>
<VO>11</VO>
<NO>1</NO>
<PP>13-18</PP>
</SEQ>

<SEQ>
<UI>1584   Granjeon,E.   Detection of Compositi.. Comput.Appl.Bio 95 
11(1):29-37
</UI>
<AU>Granjeon E;
    Tarroux P
</AU>
<TI>Detection of Compositional Constraints in Nucleic Acid Sequences Using
Neural Networks
</TI>
<SU>Exon;
    Intron;
    Composition;
    Neural;
    FR;
    Detection;
    Nucleic acid;
    Network
</SU>
<AB>"We describe in this paper a neural network method for the detection of
compositional constraints in introns and exons. ... As with the previous
approaches, this technique discriminates introns and exons .... Moreover, using
junk DNA sequences in the learning phase allows one to detect constrained
regions inside the intron and the exon sequences (i.e., sequences that differ,
by their nucleic acid compositions, from junk DNA). The application of our
approach could be useful in the study of the internal organization of these
sequences."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1995</PY>
<VO>11</VO>
<NO>1</NO>
<PP>29-37</PP>
</SEQ>

<SEQ>
<UI>1585   Eroshkin,A.M. PROANAL version 2: Mul.. Comput.Appl.Bio 95 
11(1):39-44
</UI>
<AU>Eroshkin AM;
    Fomin VI;
    Zhilkin PA;
    Ivanisenko VV;
    Kondrakhin YV
</AU>
<TI>PROANAL version 2: Multifunctional Program for Analysis of Multiple
Protein Sequence Alignments and for Studying the Structure - Activity
Relationships in Protein Families
</TI>
<SU>Protein;
    Multiple alignment;
    Sequence alignment;
    RU;
    Program;
    Structure
</SU>
<AB>"A new version of the program PROANAL is described. A multiple linear
regression analysis of the protein structure - activity relationship allows one
to investigate the combinations of protein sites and factors influencing the
activity. ... PROANAL2 may be useful in the simulation of protein-engineering
experiments and in the search of a number of protein regions such as functional
sites, secondary structures, solvent-exposed regions, T- and B-cell antigenic
determinants, etc."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1995</PY>
<VO>11</VO>
<NO>1</NO>
<PP>39-44</PP>
</SEQ>

<SEQ>
<UI>1586   Bodlaender,H. Parameterized Complexi.. Comput.Appl.Bio 95 
11(1):49-57
</UI>
<AU>Bodlaender HL;
    Downey RG;
    Fellows MR;
    Hallett MT;
    Wareham HT
</AU>
<TI>Parameterized Complexity Analysis in Computational Biology
</TI>
<SU>Longest common;
    Sequence alignment;
    Consensus discovery;
    Complexity;
    CA;
    Parameterized
</SU>
<AB>"Many computational problems in biology involve parameters for which a
small range of values cover important applications. We argue that for many
problems in this setting, parameterized computational complexity rather than
NP-completeness is the appropriate tool for studying apparent intractability. ...
In addition to surveying this complexity framework, we describe a new result for
the Longest Common Subsequence problem. ... Lower bounds on the complexity of
this basic combinatorial problem imply lower bounds on more general sequence
alignment and consensus discovery problems."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1995</PY>
<VO>11</VO>
<NO>1</NO>
<PP>49-57</PP>
</SEQ>

<SEQ>
<UI>1587   Sagot,M.F.    Finding Flexible Patte.. Comput.Appl.Bio 95 
11(1):59-70
</UI>
<AU>Sagot MF;
    Viari A;
    Pothier J;
    Soldano H
</AU>
<TI>Finding Flexible Patterns in a Text: An Application to Three-dimensional
Molecular Matching
</TI>
<SU>Pattern match;
    Multidimensional;
    Protein;
    Structure;
    FR
</SU>
<AB>"Finding certain regularities in a text is an important problem in many
areas, e.g., in the analysis of biological molecules such as nucleic acids or
proteins. In the latter case, the text may be sequences of amino acids or a
linear coding of three-dimensional structures, and the regularities then
correspond to lexical or structural motifs common to two, or more, proteins. We
first recall an earlier algorithm that found these regularities in a flexible
way. Then we introduce a generalized version of this algorithm designed for the
particular case of protein three-dimensional structures, since these structures
present a few peculiarities that make them computationally harder to process."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1995</PY>
<VO>11</VO>
<NO>1</NO>
<PP>59-70</PP>
</SEQ>

<SEQ>
<UI>1588   Perochon-Dori RNA_d2: A Computer Pro.. Comput.Appl.Bio 95 
11(1):101-109
</UI>
<AU>Perochon-Dorisse J;
    Chetouani F;
    Aurel S;
    Iscolo N;
    Bichot B
</AU>
<TI>RNA_d2: A Computer Program for Editing and Display of RNA Secondary
Structures
</TI>
<SU>RNA;
    Editing;
    Display;
    Secondary;
    Structure;
    FR;
    Program
</SU>
<AB>"RNA_d2 is a user-friendly program developed for interactively generating
aesthetic and non-overlapping drawings of RNA secondary structures. It is
designed so that the drawings can be edited in a very natural and intuitive way,
in order to emphasize structural homologies between several molecules, as well
as the foldings themselves to update the base-pair sets according to new data.
... RNA_d2 allows easy untangling and editing of RNA molecules &gt; 1000
nucleotides long."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1995</PY>
<VO>11</VO>
<NO>1</NO>
<PP>101-109</PP>
</SEQ>

<SEQ>
<UI>1589   Schoniger,M.  Simulating Efficiently.. Comput.Appl.Bio 95 
11(1):111-115
</UI>
<AU>Schoniger M;
    von Haeseler A
</AU>
<TI>Simulating Efficiently the Evolution of DNA Sequences
</TI>
<SU>Simulation;
    Evolution;
    DNA;
    Stochastic;
    DE
</SU>
<AB>"Two menu-driven FORTRAN programs are described that simulate the
evolution of DNA sequences in accordance with a user-specified model. This
general stochastic model allows for an arbitrary stationary nucleotide
composition and any transition-transversion bias during the process of base
substitution. In addition, the user may define any hypothetical model tree
according to which a family of sequences evolves. The programs suggest the
computationally most inexpensive approach to generate nucleotide substitutions.
Either reproducible or non-repeatable simulations ... can be performed."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1995</PY>
<VO>11</VO>
<NO>1</NO>
<PP>111-115</PP>
</SEQ>

<SEQ>
<UI>1590   Xu,Y.         Correcting Sequencing .. Comput.Appl.Bio 95 
11(2):117-124
</UI>
<AU>Xu Y;
    Mural RJ;
    Uberbacher EC
</AU>
<TI>Correcting Sequencing Errors in DNA Coding Regions Using a Dynamic
Programming Approach
</TI>
<SU>Error;
    Correction;
    Sequence analysis;
    DNA;
    Coding;
    Region;
    Dynamic programming;
    USA;
    Sequencing;
    Dynamic
</SU>
<AB>"This paper presents an algorithm for detecting and 'correcting'
sequencing errors that occur in DNA coding regions. The types of sequencing
errors addressed are insertions and deletions (indels) of DNA bases. The goal is
to provide a capability which makes single-pass or low-redundancy sequence data
more informative, reducing the need for high-redundancy sequencing for gene
identification and characterization purposes. ... On a test set consisting of 68
human DNA sequences with 1% randomly generated indels in coding regions, the
algorithm detected and corrected 76% of the indels."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1995</PY>
<VO>11</VO>
<NO>2</NO>
<PP>117-124</PP>
</SEQ>

<SEQ>
<UI>1591   Chao,K.M.     A Local Alignment Tool.. Comput.Appl.Bio 95 
11(2):147-153
</UI>
<AU>Chao KM;
    Zhang J;
    Ostell J;
    Miller W
</AU>
<TI>A Local Alignment Tool for Very Long DNA Sequences
</TI>
<SU>Sequence alignment;
    Pairwise alignment;
    DNA;
    USA
</SU>
<AB>"This paper presents a practical program, called sim2, for building local
alignments of two sequences, each of which may be hundreds of kilobases long.
sim2 first constructs n best non-intersecting chains of 'fragments', such as all
occurrences of identical 5-tuples in each of two DNA sequences, for any
specified n &gt;= 1. Each chain is then refined by delivering an optimal alignment
in a region delimited by the chain. sim2 requires only space proportional to the
size of the input sequences and the output alignments, and the same source code
runs on Unix machines, on Macintoshes, on PCs, and on DEC Alpha PCs."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1995</PY>
<VO>11</VO>
<NO>2</NO>
<PP>147-153</PP>
</SEQ>

<SEQ>
<UI>1592   Watanabe,H.   A Comprehensive Repres.. Comput.Appl.Bio 95 
11(2):159-166
</UI>
<AU>Watanabe H;
    Otsuka J
</AU>
<TI>A Comprehensive Representation of Extensive Similarity Linkage between
Large Numbers of Proteins
</TI>
<SU>Protein;
    Multiple comparison;
    Sequence comparison;
    Clustering;
    JP;
    Representation;
    Similarity
</SU>
<AB>"A method is described for the representation of a bird's-eye view of
similarity relationships between large numbers of proteins. With the aid of
single-linkage clustering, proteins are clustered into groups on the basis of
various types of similarity such as sequence similarity estimated between all
the protein pairs. Proteins in a group are directly or indirectly connected to
all proteins in the same group by similarities higher than a given threshold and
show no similarity higher than the threshold to any proteins outside the group."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1995</PY>
<VO>11</VO>
<NO>2</NO>
<PP>159-166</PP>
</SEQ>

<SEQ>
<UI>1593   Thompson,J.D. Introducing Variable G.. Comput.Appl.Bio 95 
11(2):181-186
</UI>
<AU>Thompson JD
</AU>
<TI>Introducing Variable Gap Penalties to Sequence Alignment in Linear Space
</TI>
<SU>Sequence alignment;
    Pairwise alignment;
    Gap;
    Complexity;
    DE
</SU>
<AB>"The problem of finding an optimal sequence alignment has been solved by
Hirschberg (1975) in quadratic time and linear space. Myers and Miller (1988)
presented an implementation of this algorithm for aligning biological sequences,
incorporating affine gap penalties. ... This paper presents a further
development of the Myers and Miller algorithm. Here, we maximize similarity
scores and, more significantly, introduce position-specific gap penalties. Thus,
residue-dependent information such as structure preferences and existing gaps in
a partial alignment can be applied to the solution of the alignment problem."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1995</PY>
<VO>11</VO>
<NO>2</NO>
<PP>181-186</PP>
</SEQ>

<SEQ>
<UI>1594   Doelz,R.      A Compression Mechanis.. Comput.Appl.Bio 95 
11(2):219-223
</UI>
<AU>Doelz R;
    Eggenberger F
</AU>
<TI>A Compression Mechanism for Sequence Databases to Improve the Efficiency
of Conventional Tools
</TI>
<SU>Sequence database;
    Compression;
    SWI
</SU>
<AB>"This paper describes a method to compress molecular biology databases
that are characterized by an increasing proportion of data derived from genome
projects. The performance of our tool has been tested on various data files of
the EMBL nucleotide sequence database. ... The compression of sequence database
updates was tested in combination with the common Unix compression program
'compress'. Our tool improved the efficiency of 'compress' on average by 16%."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1995</PY>
<VO>11</VO>
<NO>2</NO>
<PP>219-223</PP>
</SEQ>

<SEQ>
<UI>1595   Allison,L.    The Posterior Probabil.. J.Mol.Evol.     94 
39:418-430
</UI>
<AU>Allison L;
    Wallace CS
</AU>
<TI>The Posterior Probability Distribution of Alignments and its Application
to Parameter Estimation of Evolutionary Trees and to Optimization of Multiple
Alignments
</TI>
<SU>Evolutionary tree;
    Probability;
    Sequence alignment;
    Multiple alignment;
    Optimal;
    Simulated annealing;
    AU;
    Distribution;
    Optimization;
    Estimation
</SU>
<AB>"How to sample alignments from their posterior probability distribution
given two strings is shown. This is extended to sampling alignments of more than
two strings. The result is first applied to the estimation of the edges of a
given evolutionary tree over several strings. Second, when used in conjunction
with simulated annealing, it gives a stochastic search method for an optimal
multiple alignment."
</AB>
<JT>J Mol Evol</JT>
<PY>1994</PY>
<VO>39</VO>
<PP>418-430</PP>
</SEQ>

<SEQ>
<UI>1596   Benson,D.A.   GenBank                  Nucleic Acids R 94 
22(17):3441-34
</UI>
<AU>Benson DA;
    Boguski M;
    Lipman DJ;
    Ostell J
</AU>
<TI>GenBank
</TI>
<SU>DNA;
    Sequence database;
    GenBank;
    USA
</SU>
<AB>"The GenBank sequence database continues to expand its data coverage,
quality control, annotation content and retrieval services for the scientific
community. Besides handling direct submissions of sequence data from authors,
GenBank also incorporates DNA sequences from all available public sources; an
integrated retrieval system, known as Entrez, also makes available data from the
major protein sequence and structural databases, and from U.S. and European
patents. MEDLINE abstracts from published articles describing the sequences are
also included as an additional source of biological annotation for sequence
entries."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1994</PY>
<VO>22</VO>
<NO>17</NO>
<PP>3441-3444</PP>
</SEQ>

<SEQ>
<UI>1597   Benson,G.     A Method for Fast Data.. Nucleic Acids R 94 
22(22):4828-48
</UI>
<AU>Benson G;
    Waterman MS
</AU>
<TI>A Method for Fast Database Search for all k-Nucleotide Repeats
</TI>
<SU>Database search;
    DNA;
    Repeat;
    k-tuple;
    Region;
    USA
</SU>
<AB>"A significant portion of DNA consists of repeating patterns of various
sizes, from very small (one, two and three nucleotides) to very large (over 300
nucleotides). ... It would be useful to search for such regions in the DNA
database in order that they may be studied more fully.  ... Therefore, any
program to look for repeats must be efficient and fast. In this paper, we
present some new techniques that are useful in recognizing repeating patterns
and describe a new program for rapidly detecting repeat regions in the DNA
database where the basic unit of the repeat has size up to 32 nucleotides."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1994</PY>
<VO>22</VO>
<NO>22</NO>
<PP>4828-4836</PP>
</SEQ>

<SEQ>
<UI>1598   Bonfield,J.K. The Application of Num.. Nucleic Acids R 95 
23(8):1406-141
</UI>
<AU>Bonfield JK;
    Staden R
</AU>
<TI>The Application of Numerical Estimates of Base Calling Accuracy to DNA
Sequencing Projects
</TI>
<SU>DNA;
    Sequence recognition;
    Sequencing;
    Accuracy;
    Consensus method;
    UK
</SU>
<AB>"During DNA sequencing projects one of the most labour intensive and
highly skilled tasks is to view the original trace descriptions of gels and to
adjudicate between conflicting readings. Given the current methods of
calculating a consensus, the majority of the time employed in viewing traces and
editing readings is actually devoted to making the poorer data fit the good
data. We propose new consensus calculation algorithms that employ numerical
estimates of base calling accuracy and which when used in conjunction with an
automatic detector of contradictory data should greatly reduce the time spent
checking and editing readings and hence improve DNA sequencing productivity."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1995</PY>
<VO>23</VO>
<NO>8</NO>
<PP>1406-1410</PP>
</SEQ>

<SEQ>
<UI>1599   Borodovsky,M. Intrinsic and Extrinsi.. Nucleic Acids R 94 
22(22):4756-47
</UI>
<AU>Borodovsky M;
    Rudd KE;
    Koonin EV
</AU>
<TI>Intrinsic and Extrinsic Approaches for Detecting Genes in a Bacterial
Genome
</TI>
<SU>Gene;
    Identification;
    Detection;
    Markov;
    BLAST;
    Motif;
    Sequence comparison;
    USA;
    Genome
</SU>
<AB>"The unannotated regions of the E. coli genome DNA sequence from the
EcoSeq6 database, totaling 1,278 'intergenic' sequences of the combined length
of 359,279 basepairs, were analyzed using computer-assisted methods with the aim
of identifying putative unknown genes. The proposed strategy for finding new
genes includes two key elements: (i) prediction of expressed open reading frames
(ORFs) using the GeneMark method based on Markov chain models for coding and
non-coding regions of E. coli DNA, and (ii) search for protein sequence
similarities using programs based on the BLAST algorithm and programs for motif
identification."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1994</PY>
<VO>22</VO>
<NO>22</NO>
<PP>4756-4767</PP>
</SEQ>

<SEQ>
<UI>1600   Cserzo,M.     New Alignment Strategy.. J.Mol.Biol.     94 
243:388-396
</UI>
<AU>Cserzo M;
    Bernassau JM;
    Simon I;
    Maigret B
</AU>
<TI>New Alignment Strategy for Transmembrane Proteins
</TI>
<SU>Protein;
    Sequence alignment;
    Homology;
    Model;
    FR
</SU>
<AB>"In this paper an algorithm which locates helical transmembrane segments
is described. It is shown that given the location of transmembrane helices of a
protein, corresponding helices in another membrane related protein can be
pinpointed. The method seems to be extremely insensitive to sequence identity
but highly sensitive to the property of a sequence to assume transmembrane
helical structure. ... There are indications that hint at the broader range of
applicability of the presented method."
</AB>
<JT>J Mol Biol</JT>
<PY>1994</PY>
<VO>243</VO>
<PP>388-396</PP>
</SEQ>

<SEQ>
<UI>1601   Emmert,D.B.   The European Bioinform.. Nucleic Acids R 94 
22(17):3445-34
</UI>
<AU>Emmert DB;
    Stoehr PJ;
    Stoesser G;
    Cameron GN
</AU>
<TI>The European Bioinformatics Institute (EBI) Databases
</TI>
<SU>Database search;
    Sequence database;
    Sequence search;
    UK
</SU>
<AB>"This paper describes the databases and services of the European
Bioinformatics Institute (EBI). In collaboration with DDBJ and GenBank/NCBI, the
EBI maintains and distributes the EMBL Nucleotide Sequence Database, Europe's
primary nucleotide sequence data resource. ... Over thirty additional specialist
molecular biology databases, as well as software and documentation of interest
to molecular biologists, are also available. The EBI network services include
database searching, entry retrieval, and sequence similarity searching
facilities."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1994</PY>
<VO>22</VO>
<NO>17</NO>
<PP>3445-3449</PP>
</SEQ>

<SEQ>
<UI>1602   Fasman,K.H.   The GDB(TM) Human Geno.. Nucleic Acids R 94 
22(17):3462-34
</UI>
<AU>Fasman KH;
    Cuticchia AJ;
    Kingsbury DT
</AU>
<TI>The GDB(TM) Human Genome Data Base anno 1994
</TI>
<SU>Genome;
    Database search;
    Mapping;
    Data acquisition;
    USA
</SU>
<AB>"In 1991 the Genome Data Base at Johns Hopkins University School of
Medicine was selected as the central repository for mapping data from the Human
Genome Project .... It is even more important that GDB provide leadership in the
genome informatics enterprise. Three themes described here are dominant in our
future plans and represent the essence of the major changes made in the past
year. They include: enhanced data acquisition, better map representation, and
full integration into the collection of genomic databases."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1994</PY>
<VO>22</VO>
<NO>17</NO>
<PP>3462-3469</PP>
</SEQ>

<SEQ>
<UI>1603   Fu,Y.X.       Linear Invariants unde.. J.Theor.Biol.   95 
173:339-352
</UI>
<AU>Fu YX
</AU>
<TI>Linear Invariants under Jukes' and Cantor's One-parameter Model
</TI>
<SU>Invariant;
    Phylogenetic;
    Evolutionary tree;
    USA;
    Model
</SU>
<AB>"Linear invariants are random variables with zero expectations under
certain assumptions. In this paper, linear invariants under Jukes' and Cantor's
one-parameter model, both with and without the assumption that nucleotide
frequencies are at equilibrium, are studied using the method developed in a
previous paper. Phylogenetic linear invariants ... for trees with up to seven
species are derived and bases of phylogenetic linear invariant spaces for
unrooted trees with four, five and six species are presented."
</AB>
<JT>J Theor Biol</JT>
<PY>1995</PY>
<VO>173</VO>
<PP>339-352</PP>
</SEQ>

<SEQ>
<UI>1604   Gautheret,D.  Identification of Base.. J.Mol.Biol.     95 
248:27-43
</UI>
<AU>Gautheret D;
    Damberger SH;
    Gutell RR
</AU>
<TI>Identification of Base-triples in RNA using Comparative Sequence Analysis
</TI>
<SU>Sequence analysis;
    Sequence comparison;
    RNA;
    Structure;
    USA;
    Identification
</SU>
<AB>"Comparative sequence analysis has proven to be a very efficient tool for
the determination of RNA secondary structure and certain tertiary interactions.
However, base-triples, an important RNA structural element, cannot be predicted
accurately from sequence data. We show here that the poor base correlations
observed at base-triple positions are the result of two factors. (1) Base
covariation is not as strictly required in triples as it is in Watson-Crick
pairs. (2) Base-triple structures are less conserved among homologous
molecules."
</AB>
<JT>J Mol Biol</JT>
<PY>1995</PY>
<VO>248</VO>
<PP>27-43</PP>
</SEQ>

<SEQ>
<UI>1605   George,D.G.   The PIR-International .. Nucleic Acids R 94 
22(17):3569-35
</UI>
<AU>George DG;
    Barker WC;
    Mewes HW;
    Pfeiffer F;
    Tsugita A
</AU>
<TI>The PIR-International Protein Sequence Database
</TI>
<SU>Protein;
    Sequence database;
    USA
</SU>
<AB>"PIR-International is an association of macromolecular sequence data
collection centers dedicated to fostering international cooperation as an
essential element in the development of scientific databases. A major objective
of PIR-International is to continue the development of the Protein Sequence
Database as an essential public resource for protein sequence information. This
paper briefly describes the architecture of the Protein Sequence Database and
how it and associated data sets are distributed and can be accessed
electronically."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1994</PY>
<VO>22</VO>
<NO>17</NO>
<PP>3569-3573</PP>
</SEQ>

<SEQ>
<UI>1606   Gu,X.         The Size Distribution .. J.Mol.Evol.     95 
40:464-473
</UI>
<AU>Gu X;
    Li WH
</AU>
<TI>The Size Distribution of Insertions and Deletions in Human and Rodent
Pseudogenes Suggests the Logarithmic Gap Penalty for Sequence Alignment
</TI>
<SU>Pseudogene;
    Gap;
    Sequence alignment;
    Indel;
    USA;
    Distribution;
    Deletion
</SU>
<AB>"The size distributions of deletions, insertions, and indels ... were
studied, using 78 human processed pseudogenes and other published data sets. The
following results were obtained: (1) Deletions occur more frequently than do
insertions in sequence evolution ... (2) Empirically, the size distributions of
deletions, insertions, and indels can be described well by a power law ... (5)
The linear gap penalty, which is most commonly used in sequence alignment, is
not supported by our analysis; rather, ... an appropriate gap penalty is
w_k = a + b ln k, where a is the gap creation cost and b ln k is the gap extension cost
...."
</AB>
<JT>J Mol Evol</JT>
<PY>1995</PY>
<VO>40</VO>
<PP>464-473</PP>
</SEQ>

<SEQ>
<UI>1607   Hein,J.       A Maximum-Likelihood A.. J.Mol.Evol.     95 
40:181-189
</UI>
<AU>Hein J;
    Stovlbaek J
</AU>
<TI>A Maximum-Likelihood Approach to Analyzing Nonoverlapping and Overlapping
Reading Frames
</TI>
<SU>Likelihood;
    Reading;
    Frame;
    Evolution;
    Sequence analysis;
    DK
</SU>
<AB>"A model is presented for sequence evolution on the basis of which one can
analyze combinations of noncoding, singly coding, and multiply coding regions of
aligned homologous DNA sequences. It is a generalization of Kimura's (1980) and
Li, Wu &amp; Luo's (1985) transition-transversion models with selection on
replacement substitutions. Based on a hierarchy of hypotheses, one will be able
to estimate selection factors and transition and transversion distances for
different combinations of regions ...."
</AB>
<JT>J Mol Evol</JT>
<PY>1995</PY>
<VO>40</VO>
<PP>181-189</PP>
</SEQ>

<SEQ>
<UI>1608   Henikoff,S.   Position-based Sequenc.. J.Mol.Biol.     94 
243:574-578
</UI>
<AU>Henikoff S;
    Henikoff JG
</AU>
<TI>Position-based Sequence Weights
</TI>
<SU>Multiple alignment;
    Sequence alignment;
    Sequence weight;
    Profile;
    Database search;
    Protein;
    Block search;
    USA
</SU>
<AB>"Sequence weighting methods have been used to reduce redundancy and
emphasize diversity in multiple sequence alignment and searching applications.
Each of these methods is based on a notion of distance between a sequence and an
ancestral or generalized sequence. We describe a different approach, which bases
weights on the diversity observed at each position in the alignment, rather than
on a sequence distance measure. These position-based weights make minimal
assumptions, are simple to compute, and perform well in comprehensive
evaluations."
</AB>
<JT>J Mol Biol</JT>
<PY>1994</PY>
<VO>243</VO>
<PP>574-578</PP>
</SEQ>

<SEQ>
<UI>1609   Huelsenbeck,J Performance of Phyloge.. Syst.Biol.      95 
44(1):17-48
</UI>
<AU>Huelsenbeck JP
</AU>
<TI>Performance of Phylogenetic Methods in Simulation
</TI>
<SU>Phylogenetic;
    Evolutionary tree;
    Simulation;
    Performance;
    USA
</SU>
<AB>"In this study, I examined the performance of 26 commonly used methods of
phylogenetic inference for three statistical criteria: consistency, efficiency,
and robustness. ... The performance of methods was examined under three models
of DNA substitution for four taxa. The branch lengths of the four-taxon trees
were varied extensively in this simulation. The results indicate that most
methods perform well (i.e., estimate the correct tree &gt;= 95% of the time) over a
large portion of the four-taxon parameter space. In general, maximum likelihood
performed best, followed by the additive distance methods and the parsimony
methods."
</AB>
<JT>Syst Biol</JT>
<PY>1995</PY>
<VO>44</VO>
<NO>1</NO>
<PP>17-48</PP>
</SEQ>

<SEQ>
<UI>1610   Ina,Y.        New Methods for Estima.. J.Mol.Evol.     95 
40:190-226
</UI>
<AU>Ina Y
</AU>
<TI>New Methods for Estimating the Numbers of Synonymous and Nonsynonymous
Substitutions
</TI>
<SU>Evolutionary tree;
    Distance;
    Substitution;
    Statistical;
    JP;
    Synonymous
</SU>
<AB>"New methods for estimating the numbers of synonymous and nonsynonymous
substitutions per site were developed. The methods are unweighted pathway
methods based on Kimura's two-parameter model. Computer simulations were
conducted to evaluate the accuracies of the new methods, Nei and Gojobori's (NG)
method, Miyata and Yasunaga's (MY) method, Li, Wu, and Luo's (LWL) method, and
Pamilo, Bianchi, and Li's (PBL) method. ... The NG, MY, and LWL methods give
overestimates of the number of synonymous substitutions and underestimates of
the number of nonsynonymous substitutions."
</AB>
<JT>J Mol Evol</JT>
<PY>1995</PY>
<VO>40</VO>
<PP>190-226</PP>
</SEQ>

<SEQ>
<UI>1611   Krogh,A.      A Hidden Markov Model .. Nucleic Acids R 94 
22(22):4768-47
</UI>
<AU>Krogh A;
    Mian IS;
    Haussler D
</AU>
<TI>A Hidden Markov Model that finds Genes in E. coli DNA
</TI>
<SU>Markov;
    Gene;
    DNA;
    Statistical;
    Protein;
    USA;
    Model
</SU>
<AB>"A hidden Markov model (HMM) has been developed to find protein coding
genes in E. coli DNA using E. coli genome DNA sequence from the EcoSeq6 database
maintained by Kenn Rudd. This HMM includes states that model the codons and
their frequencies in E. coli genes, as well as the patterns found in the
intergenic region, including repetitive extragenic palindromic sequences and the
Shine-Delgarno motif. ... The HMM finds the exact locations of about 80% of the
known E. coli genes, and approximate locations for about 10%. It also finds
several potentially new genes, and locates several places where insertion or
deletion errors and/or frameshifts may be present in the contigs."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1994</PY>
<VO>22</VO>
<NO>22</NO>
<PP>4768-4778</PP>
</SEQ>

<SEQ>
<UI>1612   Li,W.H.       Statistical Tests of D.. Syst.Biol.      95 
44(1):49-63
</UI>
<AU>Li WH;
    Zharkikh A
</AU>
<TI>Statistical Tests of DNA Phylogenies
</TI>
<SU>Statistical;
    DNA;
    Phylogeny;
    Parsimony;
    Bootstrap;
    Minimum evolution;
    Bias;
    Neighbor joining;
    USA
</SU>
<AB>"In this article, we review (1) statistical tests of DNA phylogenies
inferred by the maximum-parsimony method ... (2) statistical tests based on the
minimum-evolution criterion ..., and (3) the bootstrap technique for estimating
the confidence level of a phylogenetic hypothesis based on either the
maximum-parsimony or the neighbor-joining method. We explain why the bootstrap technique
usually gives biased estimates and how to correct the bias."
</AB>
<JT>Syst Biol</JT>
<PY>1995</PY>
<VO>44</VO>
<NO>1</NO>
<PP>49-63</PP>
</SEQ>

<SEQ>
<UI>1613   Maidak,B.L.   The Ribosomal Database.. Nucleic Acids R 94 
22(17):3485-34
</UI>
<AU>Maidak BL;
    Larsen N;
    McCaughey MJ;
    Overbeek R;
    Olsen GJ;
    Fogel K;
    Blandy J;
    Woese CR
</AU>
<TI>The Ribosomal Database Project
</TI>
<SU>Ribosome;
    RNA;
    Sequence database;
    USA;
    rdp.life.uiuc.edu
</SU>
<AB>"The Ribosomal Database Project (RDP) is a curated database that offers
ribosome-related data, analysis services, and associated computer programs. The
offerings include phylogenetically ordered alignments of ribosomal RNA (rRNA)
sequences, derived phylogenetic trees, rRNA secondary structure diagrams, and
various software for handling, analyzing and displaying alignments and trees.
The data are available via anonymous ftp (rdp.life.uiuc.edu), electronic mail
(server@rdp.life.uiuc.edu) and gopher (rdpgopher.life.uiuc.edu)."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1994</PY>
<VO>22</VO>
<NO>17</NO>
<PP>3485-3487</PP>
</SEQ>

<SEQ>
<UI>1614   Sakakibara,Y. Stochastic Context-fre.. Nucleic Acids R 94 
22(23):5112-51
</UI>
<AU>Sakakibara Y;
    Brown M;
    Hughey R;
    Mian IS;
    Sjolander K;
    Underwood RC;
    Haussler D
</AU>
<TI>Stochastic Context-free Grammars for tRNA Modeling
</TI>
<SU>Stochastic;
    Language;
    Grammar;
    Sequence alignment;
    Markov;
    Structure;
    Secondary;
    USA
</SU>
<AB>"Stochastic context-free grammars (SCFGs) are applied to the problems of
folding, aligning and modeling families of tRNA sequences. SCFGs capture the
sequences' common primary and secondary structure and generalize the hidden
Markov models (HMMs) used in related work on protein and DNA. Results show that
after having been trained on as few as 20 tRNA sequences from only two tRNA
subfamilies ..., the model can discern general tRNA from similar-length RNA
sequences of other kinds, can find secondary structure of new tRNA sequences,
and can produce multiple alignments of large sets of tRNA sequences."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1994</PY>
<VO>22</VO>
<NO>23</NO>
<PP>5112-5120</PP>
</SEQ>

<SEQ>
<UI>1615   Sander,C.     The HSSP Database of P.. Nucleic Acids R 94 
22(17):3597-35
</UI>
<AU>Sander C;
    Schneider R
</AU>
<TI>The HSSP Database of Protein Structure - Sequence Alignments
</TI>
<SU>Protein;
    Structure;
    Sequence alignment;
    Sequence database;
    Homology;
    DE
</SU>
<AB>"HSSP (homology-derived structures of proteins) is a derived database
merging structural (2-D and 3-D) and sequence information (1-D). For each
protein of known 3D structure from the Protein Data Bank, the database has a
file with all sequence homologues, properly aligned to the PDB protein.
Homologues are very likely to have the same 3D structure as the PDB protein to
which they have been aligned. As a result, the database is not only a database
of sequence aligned sequence families, but it is also a database of implied
secondary and tertiary structures."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1994</PY>
<VO>22</VO>
<NO>17</NO>
<PP>3597-3599</PP>
</SEQ>

<SEQ>
<UI>1616   Snyder,E.E.   Identification of Prot.. J.Mol.Biol.     95 248:1-18
</UI>
<AU>Snyder EE;
    Stormo GD
</AU>
<TI>Identification of Protein Coding Regions in Genomic DNA
</TI>
<SU>Protein;
    Coding;
    Region;
    Genomic;
    DNA;
    Gene;
    Identification;
    USA
</SU>
<AB>"We have developed a computer program, GeneParser, which identifies and
determines the fine structure of protein genes in genomic DNA sequences. ...
Using this method, we can rapidly generate ranked suboptimal solutions, each of
which is the optimum solution containing a given intron-exon junction. We have
tested the system on a large collection of human genes. ... We have also
quantified the robustness of the method to substitution and frame-shift errors
and show how the system can be optimized for performance on sequences with known
levels of sequencing errors."
</AB>
<JT>J Mol Biol</JT>
<PY>1995</PY>
<VO>248</VO>
<PP>1-18</PP>
</SEQ>

<SEQ>
<UI>1617   Strelets,V.B. Analysis of Peptides f.. J.Mol.Evol.     94 
39:625-630
</UI>
<AU>Strelets VB;
    Shindyalov IN;
    Lim HA
</AU>
<TI>Analysis of Peptides from Known Proteins: Clusterization in Sequence Space
</TI>
<SU>Protein;
    Sequence proximity;
    k-tuple;
    Evolution;
    Clustering;
    USA
</SU>
<AB>"A combinatorial sequence space (CSS) model was introduced to represent
sequences as a set of overlapping k-tuples of some fixed length which correspond
to points in the CSS. The aim was to analyze clusterization of protein sequences
in the CSS and to test various hypotheses about the possible evolutionary basis
of this clusterization. The authors developed an easy-to-use technique which can
reveal and analyze such a characterization in a multidimensional CSS.
Application of the technique led to an unexpectedly high clusterization of
points in the CSS corresponding to k-tuples from known proteins."
</AB>
<JT>J Mol Evol</JT>
<PY>1994</PY>
<VO>39</VO>
<PP>625-630</PP>
</SEQ>

<SEQ>
<UI>1618   Thompson,J.D. CLUSTAL W: Improving t.. Nucleic Acids R 94 
22(22):4673-46
</UI>
<AU>Thompson JD;
    Higgins DG;
    Gibson TJ
</AU>
<TI>CLUSTAL W: Improving the Sensitivity of Progressive Multiple Sequence
Alignment through Sequence Weighting, Position-specific Gap Penalties and Weight
Matrix Choice
</TI>
<SU>Multiple alignment;
    Sequence alignment;
    Sequence weight;
    Gap;
    Substitution;
    DE;
    Matrix
</SU>
<AB>"The sensitivity of the commonly used progressive multiple sequence
alignment method has been greatly improved for the alignment of divergent
protein sequences. Firstly, individual weights are assigned to each sequence in
a partial alignment in order to down-weight near-duplicate sequences and
up-weight the most divergent ones. Secondly, amino acid substitution matrices are
varied at different alignment stages according to the divergence of the
sequences to be aligned. Thirdly .... Fourthly .... These modifications are
incorporated into a new program, CLUSTAL W which is freely available."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1994</PY>
<VO>22</VO>
<NO>22</NO>
<PP>4673-4680</PP>
</SEQ>

<SEQ>
<UI>1619   Thorne,J.L.   Estimation and Reliabi.. Biometrics      95 
51(1):100-113
</UI>
<AU>Thorne JL;
    Churchill GA
</AU>
<TI>Estimation and Reliability of Molecular Sequence Alignments
</TI>
<SU>Estimation;
    Reliability;
    Sequence alignment;
    Stochastic;
    Evolution;
    Likelihood;
    USA
</SU>
<AB>"The problem of estimating the relatedness of a pair of biological
sequences is addressed. A stochastic model of sequence evolution is described
that allows insertion and deletion as well as replacement of amino acid residues
... over time. An expectation-maximization (EM) algorithm that obtains maximum
likelihood estimates of the model parameters is introduced. The method assumes
that the sequences are related by descent from a common ancestor but the
alignment (i.e., the precise evolutionary correspondence between residues in
each sequence) is unknown. Results from the E-step of the EM algorithm are used
to assess the likelihood that any two residues are related by direct descent
from a common ancestor."
</AB>
<JT>Biometrics</JT>
<PY>1995</PY>
<VO>51</VO>
<NO>1</NO>
<PP>100-113</PP>
</SEQ>

<SEQ>
<UI>1620   Tillier,E.R.M Maximum Likelihood wit.. J.Mol.Evol.     94 
39:409-417
</UI>
<AU>Tillier ERM
</AU>
<TI>Maximum Likelihood with Multiparameter Models of Substitution
</TI>
<SU>Likelihood;
    Substitution;
    Model;
    Evolution;
    CA
</SU>
<AB>"Maximum-likelihood approaches to phylogenetic estimation have the
potential of great flexibility, even though current implementations are highly
constrained. One such constraint has been the limitation to one-parameter models
of substitution. A general implementation of Newton's maximization procedure was
developed that allows the maximum likelihood method to be used with
multiparameter models. The Estimate and Maximize (EM) algorithm was also used to
obtain a good approximation to the maximum likelihood for a certain class of
multiparameter models."
</AB>
<JT>J Mol Evol</JT>
<PY>1994</PY>
<VO>39</VO>
<PP>409-417</PP>
</SEQ>

<SEQ>
<UI>1621   Ulyanov,A.V.  Multi-alphabet Consens.. Nucleic Acids R 95 
23(8):1434-144
</UI>
<AU>Ulyanov AV;
    Stormo GD
</AU>
<TI>Multi-alphabet Consensus Algorithm for Identification of Low Specificity
Protein-DNA Interactions
</TI>
<SU>Consensus discovery;
    Identification;
    Signal;
    DNA;
    USA;
    Algorithm
</SU>
<AB>"A method for the identification and characterization of protein-DNA
interactions is presented. We have developed an approach for finding unknown
multiple patterns that occur imperfectly in a set of several sequences. The
pattern may contain letters from the nucleotide alphabet (A,C,G,T) including
ambiguous characters (A/C, A/G, etc.). This method reveals weak DNA signals on
an unaligned set of DNA fragments known to be functionally related and assumes
no prior information on the sequences' alignment."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1995</PY>
<VO>23</VO>
<NO>8</NO>
<PP>1434-1440</PP>
</SEQ>

<SEQ>
<UI>1622   Levitt,M.     Accurate Modeling of P.. J.Mol.Biol.     92 
226:507-533
</UI>
<AU>Levitt M
</AU>
<TI>Accurate Modeling of Protein Conformation by Automatic Segment Matching
</TI>
<SU>Protein;
    Segment;
    Match complex patterns;
    Sequence match;
    Database search;
    Model;
    USA
</SU>
<AB>"Segment match modeling uses a data base of highly refined known protein
X-ray structures to build an unknown target structure from its amino acid
sequence and the atomic coordinates of a few of its atoms .... The target
structure is first broken into a set of short segments. The data base is then
searched for matching segments, which are fitted onto the framework of the
target structure. Three criteria are used for choosing a matching data base
segment: amino acid sequence similarity, conformational similarity (atomic
coordinates), and compatibility with the target structure (van der Waals'
interactions)."
</AB>
<JT>J Mol Biol</JT>
<PY>1992</PY>
<VO>226</VO>
<PP>507-533</PP>
</SEQ>

<SEQ>
<UI>1623   Cardon,L.R.   Expectation Maximizati.. J.Mol.Biol.     92 
223:159-170
</UI>
<AU>Cardon LR;
    Stormo GD
</AU>
<TI>Expectation Maximization Algorithm for Identifying Protein-binding Sites
with Variable Lengths from Unaligned DNA Fragments
</TI>
<SU>DNA;
    Expectation;
    Maximization;
    Protein;
    Binding;
    Multiple alignment;
    Consensus sequence;
    Fragment;
    USA;
    Algorithm
</SU>
<AB>"An Expectation Maximization algorithm for identification of DNA binding
sites is presented. The approach predicts the location of binding regions while
allowing variable length spacers within the sites. In addition to predicting the
most likely spacer length for a set of DNA fragments, the method identifies
individual sites that differ in spacer size. No alignment of DNA sequences is
necessary. The method is illustrated by application to 231 E. coli DNA fragments
known to contain promoters with variable spacings between their consensus
regions."
</AB>
<JT>J Mol Biol</JT>
<PY>1992</PY>
<VO>223</VO>
<PP>159-170</PP>
</SEQ>

<SEQ>
<UI>1624   Saroff,H.A.   The Uniqueness of Prot.. Bull.Math.Biol. 84 
46(4):661-672
</UI>
<AU>Saroff HA
</AU>
<TI>The Uniqueness of Protein Sequences. Uniqueness Diagrams for the Dayhoff
File - 1984
</TI>
<SU>Protein;
    Sequence analysis;
    k-tuple;
    Monte Carlo;
    Approximation;
    USA
</SU>
<AB>"Protein sequences of the Dayhoff databank of 1984 have been analyzed to
evaluate the occurrences of the 400 dipeptides and 8000 tripeptides. Expected
values and standard deviations for the di- and tri-peptides were determined by
Monte Carlo and binomial approximation. A condensed format containing this
information, labeled a uniqueness diagram, is presented and made available in
the form of a microfiche."
</AB>
<JT>Bull Math Biol</JT>
<PY>1984</PY>
<VO>46</VO>
<NO>4</NO>
<PP>661-672</PP>
</SEQ>

<SEQ>
<UI>1625   Bougueleret,L Objective Comparison o.. Nucleic Acids R 88 
16(5):1729-173
</UI>
<AU>Bougueleret L;
    Tekaia F;
    Sauvaget I;
    Claverie JM
</AU>
<TI>Objective Comparison of Exon and Intron Sequences by the Mean of 2-Dimensional
Data Analysis Methods
</TI>
<SU>Exon;
    Intron;
    Sequence comparison;
    Sequence analysis;
    FR
</SU>
<AB>"Here we advocate the use of 2-dimensional data representation in the
context of the informational approach of sequence analysis (Claverie &amp;
Bougueleret (1986) NAR 14, 179-196) by applying these methods to the problem of
intron/exon discrimination. Two main findings are reported: (i) oligonucleotide
patterns complementary to the U1 small nuclear RNA are specifically avoided in
exon sequences, (ii) vertebrate intron sequences, to the exclusion of other
eukariotic phyla, are characterized by a peculiar distribution of CpG containing
patterns."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1988</PY>
<VO>16</VO>
<NO>5</NO>
<PP>1729-1738</PP>
</SEQ>

<SEQ>
<UI>1626   Fields,C.A.   Gm: A Practical Tool f.. Comput.Appl.Bio 90 
6:263-270
</UI>
<AU>Fields CA;
    Soderlund CA
</AU>
<TI>Gm: A Practical Tool for Automating DNA Sequence Analysis
</TI>
<SU>Sequence analysis;
    DNA
</SU>
<AB>Snyder &amp; Stormo (1995), p. 17
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1990</PY>
<VO>6</VO>
<PP>263-270</PP>
</SEQ>

<SEQ>
<UI>1627   Gish,W.       Identification of Prot.. Nature Genetics 93 
3:266-272
</UI>
<AU>Gish W;
    States DJ
</AU>
<TI>Identification of Protein Coding Regions by Database Similarity Search
</TI>
<SU>Protein;
    Database search;
    Similarity;
    Identification;
    Coding;
    Region
</SU>
<AB>Snyder &amp; Stormo (1995), p. 17
</AB>
<JT>Nature Genetics</JT>
<PY>1993</PY>
<VO>3</VO>
<PP>266-272</PP>
</SEQ>

<SEQ>
<UI>1628   Knight,J.R.   Super-Pattern Matching   Algorithmica    95 
13(1/2):211-24
</UI>
<AU>Knight JR;
    Myers EW
</AU>
<TI>Super-Pattern Matching
</TI>
<SU>Pattern match;
    USA
</SU>
<AB>UnCover SICI Code:  0178-4617(19950101)13:1:2L.211:SM;1-
</AB>
<JT>Algorithmica</JT>
<PY>1995</PY>
<VO>13</VO>
<NO>1/2</NO>
<PP>211-243</PP>
</SEQ>

<SEQ>
<UI>1629   Zuker,M.      On Finding all Subopti.. Science         89 244(7 
April):4
</UI>
<AU>Zuker M
</AU>
<TI>On Finding all Suboptimal Foldings of an RNA Molecule
</TI>
<SU>RNA;
    Folding;
    Secondary;
    Structure;
    Suboptimal;
    CA
</SU>
<AB>"An algorithm and a computer program have been prepared for determining
RNA secondary structures within any prescribed increment of the computed global
minimum free energy. The mathematical problem of determining how well defined a
minimum energy folding is can now be solved. All predicted base pairs that can
participate in suboptimal structures may be displayed and analyzed graphically.
Representative suboptimal foldings are generated by selecting these base pairs
one at a time and computing the best foldings that contain them."
</AB>
<JT>Science</JT>
<PY>1989</PY>
<VO>244</VO>
<NO>7 April</NO>
<PP>48-52</PP>
</SEQ>

<SEQ>
<UI>1630   Uberbacher,E. Locating Protein-codin.. Proc.Nat.Acad.S 91 
88:11261-11265
</UI>
<AU>Uberbacher EC;
    Mural RJ
</AU>
<TI>Locating Protein-coding Regions in Human DNA Sequences by a Multiple
Sensor-Neural Network Approach
</TI>
<SU>DNA;
    Neural;
    Network;
    Coding;
    Protein;
    Region;
    USA
</SU>
<AB>"Identifying genes within large regions of uncharacterized DNA is a
difficult undertaking and is currently the focus of many research efforts. We
describe a reliable computational approach for locating protein-coding portions
of genes in anonymous DNA sequence. Using a concept suggested by robotic
environmental sensing, our method combines a set of sensor algorithms and a
neural network to localize the coding regions. Several algorithms that report
local characteristics of the DNA sequence, and therefore act as sensors, are
also described."
</AB>
<JT>Proc Nat Acad Sci USA </JT>
<PY>1991</PY>
<VO>88</VO>
<PP>11261-11265</PP>
</SEQ>

<SEQ>
<UI>1631   Jiang,T.      On the Complexity of L.. Theoret.Comput. 93 
119:363-371
</UI>
<AU>Jiang T;
    Li M
</AU>
<TI>On the Complexity of Learning Strings and Sequences
</TI>
<SU>Sequence analysis;
    Complexity;
    Model;
    Learning;
    CA
</SU>
<AB>"It is shown that strings (sequences) cannot be learned by strings
(sequences) in Valiant's distribution-free (pac-) learning model, assuming
RP ≠ NP."
</AB>
<JT>Theoret Comput Sci</JT>
<PY>1993</PY>
<VO>119</VO>
<PP>363-371</PP>
</SEQ>

<SEQ>
<UI>1632   Amir,A.       Improved Dynamic Dicti.. Inform.Comput.  95 
119(2):258-282
</UI>
<AU>Amir A;
    Farach M;
    Schaffer AA
</AU>
<TI>Improved Dynamic Dictionary Matching
</TI>
<SU>Dictionary match;
    Dynamic;
    USA
</SU>
<AB>UnCover SICI Code: 0890-5401(19950601)119:2L.258:IDDM;1-
</AB>
<JT>Inform Comput</JT>
<PY>1995</PY>
<VO>119</VO>
<NO>2</NO>
<PP>258-282</PP>
</SEQ>

<SEQ>
<UI>1633   Henikoff,S.   Performance Evaluation.. Proteins Struct 93 17:49-61
</UI>
<AU>Henikoff S;
    Henikoff JG
</AU>
<TI>Performance Evaluation of Amino Acid Substitution Matrices
</TI>
<SU>Substitution;
    Amino acid;
    Performance
</SU>
<AB>Henikoff &amp; Henikoff (1994), p.578
</AB>
<JT>Proteins Struct Funct Genet</JT>
<PY>1993</PY>
<VO>17</VO>
<PP>49-61</PP>
</SEQ>

<SEQ>
<UI>1634   Luthy,R.      Improving the Sensitiv.. Protein Sci.    94 
3:139-146
</UI>
<AU>Luthy R;
    Xenarios I;
    Bucher P
</AU>
<TI>Improving the Sensitivity of the Sequence Profile Method
</TI>
<SU>Profile;
    Protein
</SU>
<AB>Henikoff &amp; Henikoff (1994), p.578
</AB>
<JT>Protein Sci</JT>
<PY>1994</PY>
<VO>3</VO>
<PP>139-146</PP>
</SEQ>

<SEQ>
<UI>1635   Evans,S.N.    Invariants of Some Pro.. Ann.Statist.    93 
21(1):355-377
</UI>
<AU>Evans SN;
    Speed TP
</AU>
<TI>Invariants of Some Probability Models Used in Phylogenetic Inference
</TI>
<SU>Invariant;
    Phylogenetic;
    Fourier;
    Probability;
    Model;
    USA
</SU>
<AB>"The so-called method of invariants is a technique in the field of
molecular evolution for inferring phylogenetic relations among a number of
species on the basis of nucleotide sequence data. An invariant is a polynomial
function of the probability distribution defined by a stochastic model for the
observed nucleotide sequence. ... For a wide class of models found in the
literature, we present a simple algebraic formalism for recognising whether or
not a function is an invariant and for generating all possible invariants. Our
work is based on recognising an underlying group structure and using discrete
Fourier analysis."
</AB>
<JT>Ann Statist</JT>
<PY>1993</PY>
<VO>21</VO>
<NO>1</NO>
<PP>355-377</PP>
</SEQ>

<SEQ>
<UI>1636   Fu,Y.X.       Necessary and Sufficie.. Math.Biosci.    91 
105:229-238
</UI>
<AU>Fu YX;
    Li WH
</AU>
<TI>Necessary and Sufficient Conditions for the Existence of Certain Quadratic
Invariants under a Phylogenetic Tree
</TI>
<SU>Invariant;
    Phylogenetic;
    Evolutionary tree;
    Substitution;
    Model;
    USA
</SU>
<AB>"Invariants are functions of the probabilities of state configurations
among lineages, with expected values equal to zero under certain phylogenies.
For two-state sequences, the existence of certain quadratic invariants requires
a symmetric substitution model. For sequences with more than two states, the
necessary condition for the existence of certain quadratic invariants in terms
of independent events is much stronger than symmetry. For DNA sequences, only
three parameters are allowed in the substitution model, which includes Kimura's
two-parameter model as a special case."
</AB>
<JT>Math Biosci</JT>
<PY>1991</PY>
<VO>105</VO>
<PP>229-238</PP>
</SEQ>

<SEQ>
<UI>1637   Nguyen,T.     A Derivation of All Li.. J.Mol.Evol.     92 35:60-76
</UI>
<AU>Nguyen T;
    Speed TP
</AU>
<TI>A Derivation of All Linear Invariants for a Nonbalanced Transversion Model
</TI>
<SU>Invariant;
    Transversion;
    Phylogenetic;
    Model;
    Evolutionary tree;
    USA
</SU>
<AB>"Cavender noted a generalization of linear invariants to certain more
general substitution models. In this paper we give a simple explicit description
of a basis for all linear invariants for a slight variant of Cavender's more
general model, which applies to rooted trees linking any number of species.
Bases for rooted trees linking five species are enumerated and the method
applied to a problem concerning RNA polymerase sequence data."
</AB>
<JT>J Mol Evol</JT>
<PY>1992</PY>
<VO>35</VO>
<PP>60-76</PP>
</SEQ>

<SEQ>
<UI>1638   Bunke,H.      An Algorithm for Match.. Computing       93 
50:297-314
</UI>
<AU>Bunke H;
    Csirik J
</AU>
<TI>An Algorithm for Matching Run-Length Coded Strings
</TI>
<SU>String match;
    Edit;
    Distance;
    Longest common;
    Sequence comparison;
    SWI;
    Algorithm
</SU>
<AB>"An algorithm for the computation of the edit distance of run-length coded
strings is given. In run-length coding, not all individual symbols in a string
are explicitly listed. Instead, one run of identical consecutive symbols is
coded by giving one representative symbol together with its multiplicity. The
algorithm determines the minimum cost sequence of edit operations transforming
one string into another. In the worst case, the algorithm has a time complexity
of O(nm), where n and m give the lengths of the strings to be compared. In the
best case, the time complexity is O(kl), where k and l are the numbers of runs
of identical symbols in the two strings under comparison."
</AB>
<JT>Computing </JT>
<PY>1993</PY>
<VO>50</VO>
<PP>297-314</PP>
</SEQ>

<SEQ>
<UI>1639   Bock,H.H.     Consensus Rules for Mo.. Learning and .. 
95Springer-Verlag
</UI>
<AU>Bock HH;
    Day WHE;
    McMorris FR
</AU>
<TI>Consensus Rules for Molecular Sequences: Open Problems
</TI>
<ED>Bock HH;
    Polasek W
</ED>
<BK>Learning and Knowledge
</BK>
<SU>Consensus discovery;
    Consensus method;
    Sequence analysis;
    DE
</SU>
<AB>Preprint: "Any set of n aligned molecular (e.g., DNA, protein) sequences
can be represented by a matrix of n rows and m columns in which the n symbols
(e.g., bases, amino acids) in a given column represent homologous states of a
biological character. At each column of the matrix, a problem of consensus
description is to determine a set of symbols (e.g., ambiguity codes for DNA)
that best represents the n symbols in the column. Although consensus sequences
appear frequently in the biological literature, the features or relevance of
rules for deriving consensus molecular sequences are largely unexplored. To
encourage further research, we summarize recent work and pose mathematical and
biological problems regarding consensus rules for molecular sequences."
</AB>
<PU>Springer-Verlag </PU>
<PL>Heidelberg </PL>
<PY>1995</PY>
<PP>1-11</PP>
</SEQ>

<SEQ>
<UI>1640   Overington,J. Environment-specific A.. Protein Sci.    92 
1:216-226
</UI>
<AU>Overington J;
    Donnelly D;
    Johnson MS;
    Sali A;
    Blundell TL
</AU>
<TI>Environment-specific Amino Acid Substitution Tables: Tertiary Templates
and Prediction of Protein Folds
</TI>
<SU>Amino acid;
    Substitution;
    Template;
    Prediction;
    Protein;
    Fold
</SU>
<AB>Altschul, Boguski, Gish &amp; Wootton (1994), p. 129
</AB>
<JT>Protein Sci</JT>
<PY>1992</PY>
<VO>1</VO>
<PP>216-226</PP>
</SEQ>

<SEQ>
<UI>1641   Pascarella,S. Analysis of Insertions.. J.Mol.Biol.     92 
224:461-471
</UI>
<AU>Pascarella S;
    Argos P
</AU>
<TI>Analysis of Insertions/Deletions in Protein Structures
</TI>
<SU>Indel;
    Protein;
    Structure;
    Sequence analysis;
    Evolution;
    DE
</SU>
<AB>"An analysis of insertions and deletions (indels) occurring in a databank
of multiple sequence alignments based on protein tertiary structure is reported.
Indels prefer to be short (1 to 5 residues). The average intervening sequence
length between them versus the percentage of residue identity in pairwise
alignments shows an exponential behaviour, suggesting a stochastic process such
that nearly every loop in an ancestral structure is a possible target for indels
during evolution."
</AB>
<JT>J Mol Biol</JT>
<PY>1992</PY>
<VO>224</VO>
<PP>461-471</PP>
</SEQ>

<SEQ>
<UI>1642   Sikela,J.M.   Finding New Genes Fast.. Nature Genetics 93 
3:189-191
</UI>
<AU>Sikela JM;
    Auffray C
</AU>
<TI>Finding New Genes Faster than Ever
</TI>
<SU>Gene
</SU>
<AB>Altschul, Boguski, Gish &amp; Wootton (1994), p. 129
</AB>
<JT>Nature Genetics </JT>
<PY>1993</PY>
<VO>3</VO>
<PP>189-191</PP>
</SEQ>

<SEQ>
<UI>1643   Karlin,S.     Charge Configurations .. Proc.Nat.Acad.S 88 
85:9396-9400
</UI>
<AU>Karlin S;
    Brendel V
</AU>
<TI>Charge Configurations in Viral Proteins
</TI>
<SU>Sequence search;
    Charge;
    Statistical;
    Protein;
    USA;
    Viral
</SU>
<AB>"The spatial distribution of the charged residues of a protein is of
interest with respect to potential electrostatic interactions. We have examined
the proteins of a large number of representative eukaryotic and prokaryotic
viruses for the occurrence of significant clusters, runs, and periodic patterns
of charge. ..."
</AB>
<JT>Proc Nat Acad Sci USA </JT>
<PY>1988</PY>
<VO>85</VO>
<PP>9396-9400</PP>
</SEQ>

<SEQ>
<UI>1644   Luo,L.F.      Informational Paramete.. J.Theor.Biol.   88 
130:351-361
</UI>
<AU>Luo LF;
    Tsai L;
    Zhou YM
</AU>
<TI>Informational Parameters of Nucleic Acid and Molecular Evolution
</TI>
<SU>Information theory;
    Evolution;
    Statistical;
    Probability;
    Nucleic acid;
    Evolutionary distance;
    Grammar;
    CN
</SU>
<AB>"From the point of view of information theory, a statistical analysis of
2000 nucleic acid sequences ... is given. The sequences are grouped into 20
categories. The probability-order-difference (POD) matrix is defined which is
used to analyse the evolutionary distance of any two categories of sequences.
The informational parameters ... are calculated for each sequence and averaged
in each category. The statistical dependence of these parameters on molecular
evolution is discussed. It is found that (X) is a good statistical quantity
which describes the vocabulary compositions as well as the grammatical
constructions of the genetic language."
</AB>
<JT>J Theor Biol</JT>
<PY>1988</PY>
<VO>130</VO>
<PP>351-361</PP>
</SEQ>

<SEQ>
<UI>1645   Rowe,G.W.     On the Informational C.. J.Theor.Biol.   83 
101:151-170
</UI>
<AU>Rowe GW;
    Trainor LEH
</AU>
<TI>On the Informational Content of Viral DNA
</TI>
<SU>Sequence analysis;
    Information content;
    DNA;
    Genome;
    Codon;
    Bias;
    CA;
    Viral
</SU>
<AB>"This paper is concerned primarily with how information is stored in viral
DNA. The general problem of defining information content is discussed and a
procedure for analysis extended from that of Gatlin (1972) is developed. Long
range correlations in base sequences are analyzed for several viral genomes. The
relationship of these correlations to the existence of strong codon biases is
examined and the consequences discussed."
</AB>
<JT>J Theor Biol</JT>
<PY>1983</PY>
<VO>101</VO>
<PP>151-170</PP>
</SEQ>

<SEQ>
<UI>1646   Stuckle,E.E.  Probability of Occurre.. J.Theor.Biol.   92 
159:299-306
</UI>
<AU>Stuckle EE;
    Nielsen PJ;
    Grob U
</AU>
<TI>Probability of Occurrence of Specific Oligomers
</TI>
<SU>Sequence analysis;
    Statistical;
    Oligomer;
    Probability;
    DE
</SU>
<AB>"We improved an already existing formula for calculating the probability
of occurrence of specific oligomers (Grob and Stuber, 1987) by taking into
account unequal base distribution. This method identifies specific oligomers in
a given sequence as candidates for biological signals."
</AB>
<JT>J Theor Biol</JT>
<PY>1992</PY>
<VO>159</VO>
<PP>299-306</PP>
</SEQ>

<SEQ>
<UI>1647   Karlin,S.     Molecular Evolution of.. J.Virol.        94 
68(3):1886-190
</UI>
<AU>Karlin S;
    Mocarski ES;
    Schachtel GA
</AU>
<TI>Molecular Evolution of Herpesviruses: Genomic and Protein Sequence
Comparisons
</TI>
<SU>Evolution;
    Sequence comparison;
    Genomic;
    Protein;
    Phylogenetic;
    Sequence alignment;
    USA
</SU>
<AB>"Phylogenetic reconstruction of herpesvirus evolution is generally founded
on amino acid sequence comparisons of specific proteins. These are relevant to
the evolution of the specific gene (or set of genes), but the resulting
phylogeny may vary depending on the particular sequence chosen for analysis (or
comparison). In the first part of this report, we compare 13 herpesvirus genomes
by using a new multidimensional methodology based on distance measures and
partial orderings of dinucleotide relative abundances. ... In the second part of
this report, evolutionary relationships among the 13 herpesvirus genomes are
evaluated on the basis of recent methods of amino acid alignment applied to four
essential protein sequences. ... By our methods, evolutionary relationships
derived from genomic comparisons versus protein comparisons differ to some
extent. The dinucleotide relative abundance distances appear to discriminate DNA
structure specificity more than sequence specificity. The evolutionary
development of genes among viruses (and species) is more dependent on each
individual gene."
</AB>
<JT>J Virol</JT>
<PY>1994</PY>
<VO>68</VO>
<NO>3</NO>
<PP>1886-1902</PP>
</SEQ>

<SEQ>
<UI>1648   Karlin,S.     Statistical Studies of.. Phil.Trans.R.So 94 
344(1310):391-
</UI>
<AU>Karlin S
</AU>
<TI>Statistical Studies of Biomolecular Sequences: Score-based Methods
</TI>
<SU>Sequence analysis;
    Score;
    Statistical;
    Genome;
    Multiple comparison;
    Segment;
    USA
</SU>
<AB>"This presentation reviews the method of score-based sequence analysis
with the objectives of discerning distinctive segments in single sequences and
identifying significant common segments in sequence comparisons. A number of new
results are described here for both the theory and its applications. These
include distributional theory involving several high scoring segments in single
sequences, distribution formulas for general scoring regimes in multiple
sequence comparisons, bounds for periodic scoring assignments, sensitivity
analysis of genome composition and refinements on predicting exons and genes in
DNA sequences."
</AB>
<JT>Phil Trans R Soc Lond Ser B </JT>
<PY>1994</PY>
<VO>344</VO>
<NO>1310</NO>
<PP>391-402</PP>
</SEQ>

<SEQ>
<UI>1649   Burge,C.      Over- and Under-Repres.. Proc.Nat.Acad.S 92 
89:1358-1362
</UI>
<AU>Burge C;
    Campbell AM;
    Karlin S
</AU>
<TI>Over- and Under-Representation of Short Oligonucleotides in DNA Sequences
</TI>
<SU>Sequence analysis;
    DNA;
    Statistical;
    k-tuple;
    Genomic;
    USA
</SU>
<AB>"Strand-symmetric relative abundance functionals for di-, tri-, and
tetranucleotides are introduced and applied to sequences encompassing a broad
phylogenetic range to discern tendencies and anomalies in the occurrences of
these short oligonucleotides within and between genomic sequences. ...
Explanations for these over- and under-representations in terms of DNA/RNA
structures and regulatory mechanisms are considered."
</AB>
<JT>Proc Nat Acad Sci USA </JT>
<PY>1992</PY>
<VO>89</VO>
<PP>1358-1362</PP>
</SEQ>

<SEQ>
<UI>1650   Gutell,R.R.   Comparative Studies of.. Curr.Opin.Struc 93 
3:313-322
</UI>
<AU>Gutell RR
</AU>
<TI>Comparative Studies of RNA: Inferring Higher-order Structure from Patterns
of Sequence Variation
</TI>
<SU>RNA;
    Structure;
    Sequence comparison;
    USA
</SU>
<AB>Gautheret, Damberger &amp; Gutell (1995), p. 42.
</AB>
<JT>Curr Opin Struct Biol</JT>
<PY>1993</PY>
<VO>3</VO>
<PP>313-322</PP>
</SEQ>

<SEQ>
<UI>1651   Olsen,G.J.    Comparative Analysis o..                 
83University Micr
</UI>
<AU>Olsen GJ
</AU>
<TI>Comparative Analysis of Nucleotide Sequence Data
</TI>
<SU>Sequence comparison;
    RNA;
    Secondary;
    Sequence analysis;
    Phylogeny;
    USA;
    Nucleotide
</SU>
<AB>"By examining the divergence of the sequences it is possible to study the
evolutionary relationships of the sequences, and thus infer the relationships
between the organisms carrying them. By examining the similarities of the
sequences, it is possible to identify conserved sequence features, which, due to
their retention through evolution, we infer to be important to the function of
the sequences in vivo. This thesis contains three major divisions, each
emphasizing a different type of comparative sequence analysis. The first
division addresses the analysis of nucleotide sequence phylogeny. ... The second
division explores the application of comparative analysis to the deduction of
RNA secondary structure. ... The final division presents a general method for
inferring the relationship between the nucleotide appearing at one position in a
molecule and the nucleotide appearing at a second position in the same
molecule."
</AB>
<PU>University Microfilms International</PU>
<PL>Ann Arbor, MI 48106, USA</PL>
<PY>1983</PY>
<PP>xii+1-163</PP>
</SEQ>

<SEQ>
<UI>1652   Winker,S.     Structure Detection th.. Comput.Appl.Bio 90 
6:365-371
</UI>
<AU>Winker S;
    Overbeek R;
    Woese CR;
    Olsen GJ;
    Pfluger N
</AU>
<TI>Structure Detection through Automated Covariance Search
</TI>
<SU>Structure;
    Sequence comparison;
    Covariance;
    Detection
</SU>
<AB>Gautheret, Damberger &amp; Gutell (1995), p. 43.
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1990</PY>
<VO>6</VO>
<PP>365-371</PP>
</SEQ>

<SEQ>
<UI>1653   Rooman,M.J.   Identification of Pred.. Nature (Lond.)  88 
335(6185):45-4
</UI>
<AU>Rooman MJ;
    Wodak SJ
</AU>
<TI>Identification of Predictive Sequence Motifs Limited by Protein Structure
Data Base Size
</TI>
<SU>Protein;
    Structure;
    Database search;
    Motif;
    Sequence database;
    Identification;
    Belgium
</SU>
<AB>"Associations between short amino acid sequence patterns and protein
secondary structure classes can be found by searching a data base of known
protein structures. Analysis of these associations suggests that secondary
structure of proteins can be determined locally by sequence motifs of high
predictive value, but at present our ability to find these motifs is limited by
the size of the available data bases."
</AB>
<JT>Nature (Lond ) </JT>
<PY>1988</PY>
<VO>335</VO>
<NO>6185</NO>
<PP>45-49</PP>
</SEQ>

<SEQ>
<UI>1654   Steel,M.      Recovering the Correct.. Mol.Phylogenet. 95 0:0-0
</UI>
<AU>Steel M;
    Hendy M;
    Lockhart PJ;
    Penny D
</AU>
<TI>Recovering the Correct Tree Under a More Realistic Model of Sequence
Evolution
</TI>
<SU>Evolutionary tree;
    Sequence analysis;
    NZ;
    Model;
    Evolution
</SU>
<AB>Li &amp; Zharkikh (1995), p. 63. 28.07.95: not yet on shelf.
</AB>
<JT>Mol Phylogenet Evol</JT>
<PY>0</PY>
<VO>0</VO>
<PP>0-0</PP>
</SEQ>

<SEQ>
<UI>1655   Rawlings,D.   Adjacencies in Words     Adv.Appl.Math.  95 
16(2):206-218
</UI>
<AU>Rawlings D
</AU>
<TI>Adjacencies in Words
</TI>
<SU>String search;
    Common feature;
    Word
</SU>
<AB>UnCover SICI Code:  0196-8858(19950601)16:2L.206:AW;1-
</AB>
<JT>Adv Appl Math</JT>
<PY>1995</PY>
<VO>16</VO>
<NO>2</NO>
<PP>206-218</PP>
</SEQ>

<SEQ>
<UI>1656   Kececioglu,J. Combinatorial Algorith.. Algorithmica    95 
13(1/2):7-51
</UI>
<AU>Kececioglu JD;
    Myers EW
</AU>
<TI>Combinatorial Algorithms for DNA Sequence Assembly
</TI>
<SU>Sequence assembly;
    USA;
    Combinatorial;
    Algorithm;
    DNA
</SU>
<AB>UnCover SICI Code:  0178-4617(19950101)13:1:2L.7:CADS;1-
</AB>
<JT>Algorithmica </JT>
<PY>1995</PY>
<VO>13</VO>
<NO>1/2</NO>
<PP>7-51</PP>
</SEQ>

<SEQ>
<UI>1657   Alizadeh,F.   Physical Mapping of Ch.. Algorithmica    95 
13(1/2):52-76
</UI>
<AU>Alizadeh F;
    Karp RM;
    Weisser DK
</AU>
<TI>Physical Mapping of Chromosomes: A Combinatorial Problem in Molecular
Biology
</TI>
<SU>Chromosome;
    Physical mapping;
    Mapping;
    Combinatorial;
    Physical
</SU>
<AB>UnCover SICI Code:  0178-4617(19950101)13:1:2L.52:PMCC;1-
</AB>
<JT>Algorithmica </JT>
<PY>1995</PY>
<VO>13</VO>
<NO>1/2</NO>
<PP>52-76</PP>
</SEQ>

<SEQ>
<UI>1658   Pevzner,P.A.  DNA Physical Mapping a.. Algorithmica    95 
13(1/2):77-105
</UI>
<AU>Pevzner PA
</AU>
<TI>DNA Physical Mapping and Alternating Eulerian Cycles in Colored Graphs
</TI>
<SU>Physical mapping;
    Graph;
    DNA;
    USA;
    Mapping;
    Physical
</SU>
<AB>UnCover SICI Code:  0178-4617(19950101)13:1:2L.77:DPMA;1-
</AB>
<JT>Algorithmica </JT>
<PY>1995</PY>
<VO>13</VO>
<NO>1/2</NO>
<PP>77-105</PP>
</SEQ>

<SEQ>
<UI>1659   Chao,K.M.     Linear-Space Algorithm.. Algorithmica    95 
13(1/2):106-13
</UI>
<AU>Chao KM;
    Miller W
</AU>
<TI>Linear-Space Algorithms that Build Local Alignments from Fragments
</TI>
<SU>Sequence alignment;
    Fragment;
    Algorithm
</SU>
<AB>UnCover SICI Code:  0178-4617(19950101)13:1:2L.106:LATB;1-
</AB>
<JT>Algorithmica </JT>
<PY>1995</PY>
<VO>13</VO>
<NO>1/2</NO>
<PP>106-134</PP>
</SEQ>

<SEQ>
<UI>1660   Pevzner,P.A.  Multiple Filtration an.. Algorithmica    95 
13(1/2):135-15
</UI>
<AU>Pevzner PA;
    Waterman MS
</AU>
<TI>Multiple Filtration and Approximate Pattern Matching
</TI>
<SU>Pattern match;
    Approximate match;
    USA
</SU>
<AB>UnCover SICI Code:  0178-4617(19950101)13:1:2L.135:MFAP;1-
</AB>
<JT>Algorithmica </JT>
<PY>1995</PY>
<VO>13</VO>
<NO>1/2</NO>
<PP>135-154</PP>
</SEQ>

<SEQ>
<UI>1661   Farach,M.     A Robust Model for Fin.. Algorithmica    95 
13(1/2):155-17
</UI>
<AU>Farach M;
    Kannan S;
    Warnow T
</AU>
<TI>A Robust Model for Finding Optimal Evolutionary Trees
</TI>
<SU>Evolutionary tree;
    Phylogeny;
    Model;
    USA;
    Optimal
</SU>
<AB>UnCover SICI Code:  0178-4617(19950101)13:1:2L.155:RMFO;1-
</AB>
<JT>Algorithmica </JT>
<PY>1995</PY>
<VO>13</VO>
<NO>1/2</NO>
<PP>155-179</PP>
</SEQ>

<SEQ>
<UI>1662   Kececioglu,J. Exact and Approximatio.. Algorithmica    95 
13(1/2):180-21
</UI>
<AU>Kececioglu J;
    Sankoff D
</AU>
<TI>Exact and Approximation Algorithms for Sorting by Reversals, with
Application to Genome Rearrangement
</TI>
<SU>Genome;
    Rearrangement;
    Reversal;
    Algorithm;
    USA;
    Approximation
</SU>
<AB>UnCover SICI Code:  0178-4617(19950101)13:1:2L.180:EAAS;1-
</AB>
<JT>Algorithmica </JT>
<PY>1995</PY>
<VO>13</VO>
<NO>1/2</NO>
<PP>180-210</PP>
</SEQ>

<SEQ>
<UI>1663   de Rezende,P. Point Set Pattern Matc.. Algorithmica    95 
13(4):387-403?
</UI>
<AU>de Rezende PJ;
    Lee DT
</AU>
<TI>Point Set Pattern Matching in d-Dimensions
</TI>
<SU>Pattern match;
    Multidimensional
</SU>
<AB>UnCover SICI Code: 0178-4617(19950401)13:4L.387:PSPM;1-
</AB>
<JT>Algorithmica </JT>
<PY>1995</PY>
<VO>13</VO>
<NO>4</NO>
<PP>387-403</PP>
</SEQ>

<SEQ>
<UI>1664   Crochemore,M. Squares, Cubes, and Ti.. Algorithmica    95 
13(5):405-425
</UI>
<AU>Crochemore M;
    Rytter W
</AU>
<TI>Squares, Cubes, and Time-Space Efficient String Searching
</TI>
<SU>String search;
    Square
</SU>
<AB>UnCover SICI Code: 0178-4617(19950501)13:5L.405:SCTE;1-
</AB>
<JT>Algorithmica </JT>
<PY>1995</PY>
<VO>13</VO>
<NO>5</NO>
<PP>405-425</PP>
</SEQ>

<SEQ>
<UI>1665   Knight,J.R.   Approximate Regular Ex.. Algorithmica    95 
14(1):85-121?
</UI>
<AU>Knight JR;
    Myers EW
</AU>
<TI>Approximate Regular Expression Pattern Matching with Concave Gap Penalties
</TI>
<SU>Pattern match;
    Gap;
    Language;
    Expression
</SU>
<AB>UnCover SICI Code: 0178-4617(19950701)14:1L.85:AREP;1-
</AB>
<JT>Algorithmica </JT>
<PY>1995</PY>
<VO>14</VO>
<NO>1</NO>
<PP>85-121</PP>
</SEQ>

<SEQ>
<UI>1666   Golumbic,M.C. On the Complexity of D.. Adv.Appl.Math.  94 
15(3):251-???
</UI>
<AU>Golumbic MC;
    Kaplan H;
    Shamir R
</AU>
<TI>On the Complexity of DNA Physical Mapping
</TI>
<SU>Physical mapping;
    Complexity;
    DNA;
    Mapping;
    Physical
</SU>
<AB>UnCover SICI Code: 0196-8858(19940901)15:3L.251:CDP;1-
</AB>
<JT>Adv Appl Math</JT>
<PY>1994</PY>
<VO>15</VO>
<NO>3</NO>
<PP>251-???</PP>
</SEQ>

<SEQ>
<UI>1667   Amir,A.       Efficient 2-Dimensiona.. Inform.Comput.  95 
118(1):1-11
</UI>
<AU>Amir A;
    Farach M
</AU>
<TI>Efficient 2-Dimensional Approximate Matching of Half-Rectangular Figures
</TI>
<SU>Pattern match;
    Multidimensional;
    Rectangular;
    Approximate match
</SU>
<AB>UnCover SICI Code: 0890-5401(19950401)118:1L.1:E2AM;1-
</AB>
<JT>Inform Comput</JT>
<PY>1995</PY>
<VO>118</VO>
<NO>1</NO>
<PP>1-11</PP>
</SEQ>

<SEQ>
<UI>1668   Karlin,S.     Contrasts in Codon Usa.. J.Virol.        90 
64(9):4264-427
</UI>
<AU>Karlin S;
    Blaisdell BE;
    Schachtel GA
</AU>
<TI>Contrasts in Codon Usage of Latent versus Productive Genes of Epstein-Barr
Virus: Data and Hypotheses
</TI>
<SU>Codon;
    Gene;
    Sequence analysis;
    Bias;
    USA
</SU>
<AB>"Epstein-Barr virus (EBV) has two different modes of existence: latent and
productive. ... It is shown that the EBV genes known to be expressed during
latency display codon usage strikingly different from that of genes that are
expressed during lytic growth. In particular, the percentage of S3 (G or C in
codon site 3) is persistently lower (about 20%) in all latent genes than in
nonlatent genes. ... Two principal explanations to account for the EBV latent
versus productive gene codon disparity are proposed."
</AB>
<JT>J Virol</JT>
<PY>1990</PY>
<VO>64</VO>
<NO>9</NO>
<PP>4264-4273</PP>
</SEQ>

<SEQ>
<UI>1669   Karlin,S.     Dinucleotide Relative .. Trends in Genet 95 
11(7):283-???
</UI>
<AU>Karlin S;
    Burge C
</AU>
<TI>Dinucleotide Relative Abundance Extremes: A Genomic Signature
</TI>
<SU>Genomic;
    k-tuple;
    Sequence analysis;
    USA;
    Signature
</SU>
<AB>UnCover SICI Code: 0168-9479(19950701)11:7L.283:DRAE;1-  27.07.95: not yet
on shelf.
</AB>
<JT>Trends in Genetics </JT>
<PY>1995</PY>
<VO>11</VO>
<NO>7</NO>
<PP>283-???</PP>
</SEQ>

<SEQ>
<UI>1670   Karlin,S.     Measuring Residue Asso.. J.Mol.Biol.     94 
239(2):227-248
</UI>
<AU>Karlin S;
    Zuker M;
    Brocchieri L
</AU>
<TI>Measuring Residue Associations in Protein Structures. Possible
Implications for Protein Folding
</TI>
<SU>Protein;
    Folding;
    Residue;
    Association;
    Structure;
    Distance;
    USA
</SU>
<AB>"We propose a number of distance measures between residues in protein
structures based on average, minimum and maximum distances of all atom (backbone
and side-chain) coordinates or with respect to side-chain atom coordinates only.
... For each distance measure, averaging and normalizing over representative
protein structures, association values and closeness orderings for all amino
acid types are determined. The expected associations of side-chain interactions
between oppositely charged residues, among hydrophobic residues and of cysteine
with cysteine are confirmed."
</AB>
<JT>J Mol Biol</JT>
<PY>1994</PY>
<VO>239</VO>
<NO>2</NO>
<PP>227-248</PP>
</SEQ>

<SEQ>
<UI>1671   Karlin,S.     Significant Similarity.. Mol.Biol.Evol.  92 
9(1):152-167
</UI>
<AU>Karlin S;
    Brendel V;
    Bucher P
</AU>
<TI>Significant Similarity and Dissimilarity in Homologous Proteins
</TI>
<SU>Protein;
    Significance;
    Sequence comparison;
    Statistical;
    Similarity;
    USA
</SU>
<AB>"Common practice emphasizes significant sequence similarities between
different members of protein families. These similarities presumably reflect on
evolutionary conservation of structurally and functionally essential residues.
The nonconserved regions, on the other hand, may be either selectively neutral
or differentiated. We propose several distributional sequence statistics (e.g.,
clustering of charged residues, compositional biases, and repetitive patterns)
as indicators of differentiation events."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1992</PY>
<VO>9</VO>
<NO>1</NO>
<PP>152-167</PP>
</SEQ>

<SEQ>
<UI>1672   Karlin,S.     Some Statistical Probl.. JASA            91 
86(413):27-??
</UI>
<AU>Karlin S;
    Macken C
</AU>
<TI>Some Statistical Problems in the Assessment of Inhomogeneities of DNA
Sequence Data
</TI>
<SU>Statistical;
    DNA;
    Sequence analysis
</SU>
<AB>UnCover: MAR 01 1991 v 86 n 413
</AB>
<JT>JASA </JT>
<PY>1991</PY>
<VO>86</VO>
<NO>413</NO>
<PP>27-??</PP>
</SEQ>

<SEQ>
<UI>1673   Dembo,A.      Critical Phenomena for.. Ann.Probab.     94 
22(4):1993-???
</UI>
<AU>Dembo A;
    Karlin S;
    Zeitouni O
</AU>
<TI>Critical Phenomena for Sequence Matching with Scoring
</TI>
<SU>Statistical;
    Scoring;
    Sequence match;
    USA
</SU>
<AB>UnCover SICI Code: 0091-1798(19941001)22:4L.1993:CPSM;1-
</AB>
<JT>Ann Probab</JT>
<PY>1994</PY>
<VO>22</VO>
<NO>4</NO>
<PP>1993-????</PP>
</SEQ>

<SEQ>
<UI>1674   Dembo,A.      Limit Distribution of .. Ann.Probab.     94 
22(4):2022-???
</UI>
<AU>Dembo A;
    Karlin S;
    Zeitouni O
</AU>
<TI>Limit Distribution of Maximal Non-aligned Two-sequence Segmental Score
</TI>
<SU>Statistical;
    Score;
    Segment;
    Pairwise comparison;
    USA;
    Distribution
</SU>
<AB>UnCover SICI Code: 0091-1798(19941001)22:4L.2022:LDMT;1-
</AB>
<JT>Ann Probab</JT>
<PY>1994</PY>
<VO>22</VO>
<NO>4</NO>
<PP>2022-????</PP>
</SEQ>

<SEQ>
<UI>1675   Port,E.       Genomic Mapping by End.. Genomics        95 
26(1):84-100
</UI>
<AU>Port E;
    Sun F;
    Martin D;
    Waterman MS
</AU>
<TI>Genomic Mapping by End-Characterized Random Clones: A Mathematical
Analysis
</TI>
<SU>Genomic;
    Mapping;
    Clone;
    Physical;
    Fingerprint;
    USA
</SU>
<AB>"Physical maps can be constructed by 'fingerprinting' a large number of
random clones and inferring overlap between clones when the fingerprints are
sufficiently similar. E. Lander and M. Waterman (1988) gave a mathematical
analysis of such mapping strategies. ... Recently it has been proposed that ends
of clones rather than the entire clone be fingerprinted or characterized. Such
fingerprints ... require a mathematical analysis deeper than that of
Lander-Waterman. This paper studies clone islands, which can include
uncharacterized regions, and also the islands that are formed entirely from the
ends of clones."
</AB>
<JT>Genomics </JT>
<PY>1995</PY>
<VO>26</VO>
<NO>1</NO>
<PP>84-100</PP>
</SEQ>

<SEQ>
<UI>1676   Penner,R.C.   Spaces of RNA Secondar.. Adv.Math.       93 
101(1):31-49
</UI>
<AU>Penner RC;
    Waterman MS
</AU>
<TI>Spaces of RNA Secondary Structures
</TI>
<SU>RNA;
    Secondary;
    Structure;
    Topology;
    USA
</SU>
<AB>"We prove two topological theorems in physical chemistry. ... In fact, our
primary motivation here is to study secondary structures on RNA. This imposes
the further restriction that there can be at most one base-pair supported at a
given site of underlying linear macromolecule, and imposing this restriction
leads to the class of 'binary macromolecules.' Our main results here assert the
sphericity of certain topological spaces of both arbitrary and binary
macromolecules, and it is the latter which we hope may have applications to
RNA."
</AB>
<JT>Adv Math</JT>
<PY>1993</PY>
<VO>101</VO>
<NO>1</NO>
<PP>31-49</PP>
</SEQ>

<SEQ>
<UI>1677   Waterman,M.S. Designer Algorithms fo.. N.Z.J.Bot.      93 
31(3):269-273
</UI>
<AU>Waterman MS;
    von Haeseler A
</AU>
<TI>Designer Algorithms for Cryptogene Searches
</TI>
<SU>Gene;
    Database search;
    RNA;
    Dynamic programming;
    Algorithm;
    Editing;
    USA
</SU>
<AB>"RNA editing in the mitochondria of kinetoplastid protozoa describes the
insertion and (or) deletion of precise numbers of uridines at precise locations
in the transcribed RNA. Such genes are known as cryptogenes. We describe dynamic
programming algorithms to search for unknown cryptogenes and for the sequences
that template the editing, gRNAs."
</AB>
<JT>N Z J Bot</JT>
<PY>1993</PY>
<VO>31</VO>
<NO>3</NO>
<PP>269-273</PP>
</SEQ>

<SEQ>
<UI>1678   Waterman,M.   Estimating Statistical.. Phil.Trans.R.So 94 
344(1310):383-
</UI>
<AU>Waterman M
</AU>
<TI>Estimating Statistical Significance of Sequence Alignments
</TI>
<SU>Sequence alignment;
    Statistical;
    Significance;
    Pairwise comparison;
    Segment;
    USA
</SU>
<AB>"Algorithms that compare two proteins or DNA sequences and produce an
alignment of the best matching segments are widely used in molecular biology.
These algorithms produce scores that when comparing random sequences of length n
grow proportional to n or to log(n) depending on the algorithm parameters. The
Azuma-Hoeffding inequality gives an upper bound on the probability of large
deviations of the score from its mean in the linear case. Poisson approximation
can be applied in the logarithmic case."
</AB>
<JT>Phil Trans R Soc Lond Ser B </JT>
<PY>1994</PY>
<VO>344</VO>
<NO>1310</NO>
<PP>383-390</PP>
</SEQ>

<SEQ>
<UI>1679   Frank-Kamenet Fractality of DNA Texts  J.Biomol.Struct 94 
12(3):655-670
</UI>
<AU>Frank-Kamenetskii MD;
    Borovik AS;
    Grosberg AY
</AU>
<TI>Fractality of DNA Texts
</TI>
<SU>Fractal;
    Sequence analysis;
    DNA
</SU>
<AB>UnCover SICI Code: 0739-1102(19941201)12:3L.655:FDT;1-
</AB>
<JT>J Biomol Struct &amp; Dyn </JT>
<PY>1994</PY>
<VO>12</VO>
<NO>3</NO>
<PP>655-670</PP>
</SEQ>

<SEQ>
<UI>1680   Chelvanayagam Easy Adaptation of Pro.. Protein Eng.    94 
7(2):173-184
</UI>
<AU>Chelvanayagam G;
    Roy G;
    Argos P
</AU>
<TI>Easy Adaptation of Protein Structure to Sequence
</TI>
<SU>Protein;
    Structure;
    Sequence analysis
</SU>
<AB>UnCover SICI Code: 0269-2139(19940201)7:2L.173:EAPS;1-
</AB>
<JT>Protein Eng</JT>
<PY>1994</PY>
<VO>7</VO>
<NO>2</NO>
<PP>173-184</PP>
</SEQ>

<SEQ>
<UI>1681   Vriend,G.     A Novel Search Method .. Protein Eng.    94 
7(1):23-30
</UI>
<AU>Vriend G;
    Sander C;
    Stouten PFW
</AU>
<TI>A Novel Search Method for Protein Sequence-Structure Relations using
Property Profiles
</TI>
<SU>Protein;
    Sequence search;
    Structure;
    Profile
</SU>
<AB>UnCover SICI Code: 0269-2139(19940101)7:1L.23:NSMP;1-
</AB>
<JT>Protein Eng</JT>
<PY>1994</PY>
<VO>7</VO>
<NO>1</NO>
<PP>23-30</PP>
</SEQ>

<SEQ>
<UI>1682   Flores,T.P.   An Algorithm for Autom.. Protein Eng.    94 
7(1):31-38
</UI>
<AU>Flores TP;
    Moss DS;
    Thornton JM
</AU>
<TI>An Algorithm for Automatically Generating Protein Topology Cartoons
</TI>
<SU>Protein;
    Topology;
    Structure;
    Algorithm
</SU>
<AB>UnCover SICI Code: 0269-2139(19940101)7:1L.31:AAGP;1-
</AB>
<JT>Protein Eng</JT>
<PY>1994</PY>
<VO>7</VO>
<NO>1</NO>
<PP>31-38</PP>
</SEQ>

<SEQ>
<UI>1683   Saqi,S.A.M.   Identification of Sequ.. Protein Eng.    94 
7(2):165-172
</UI>
<AU>Saqi SAM;
    Sternberg MJE
</AU>
<TI>Identification of Sequence Motifs from a Set of Proteins with Related
Function
</TI>
<SU>Protein;
    Sequence analysis;
    Motif;
    Identification;
    Function
</SU>
<AB>UnCover SICI Code: 0269-2139(19940201)7:2L.165:ISMF;1-
</AB>
<JT>Protein Eng</JT>
<PY>1994</PY>
<VO>7</VO>
<NO>2</NO>
<PP>165-172</PP>
</SEQ>

<SEQ>
<UI>1684   Laughton,A.C. A Study of Simulated A.. Protein Eng.    94 
7(2):235-242
</UI>
<AU>Laughton AC
</AU>
<TI>A Study of Simulated Annealing Protocols for Use with Molecular Dynamics
in Protein Structure Prediction
</TI>
<SU>Protein;
    Simulated annealing;
    Structure;
    Prediction;
    Dynamic
</SU>
<AB>UnCover SICI Code: 0269-2139(19940201)7:2L.235:SSAP;1-
</AB>
<JT>Protein Eng</JT>
<PY>1994</PY>
<VO>7</VO>
<NO>2</NO>
<PP>235-242</PP>
</SEQ>

<SEQ>
<UI>1685   Mao,B.        Protein Folding Classe.. Protein Eng.    94 
7(3):319-330
</UI>
<AU>Mao B;
    Chou KC;
    Zhang CT
</AU>
<TI>Protein Folding Classes: A Geometric Interpretation of the Amino Acid
Composition of Globular Proteins
</TI>
<SU>Protein;
    Folding;
    Amino acid;
    Composition;
    Geometry
</SU>
<AB>UnCover SICI Code: 0269-2139(19940301)7:3L.319:PFCG;1-
</AB>
<JT>Protein Eng</JT>
<PY>1994</PY>
<VO>7</VO>
<NO>3</NO>
<PP>319-330</PP>
</SEQ>

<SEQ>
<UI>1686   Attwood,T.K.  PRINTS - A Protein Mot.. Protein Eng.    94 
7(7):841-848
</UI>
<AU>Attwood TK;
    Beck ME
</AU>
<TI>PRINTS - A Protein Motif Fingerprint Database
</TI>
<SU>Protein;
    Motif;
    Fingerprint;
    Database search
</SU>
<AB>UnCover SICI Code: 0269-2139(19940701)7:7L.841:PPMF;1-
</AB>
<JT>Protein Eng</JT>
<PY>1994</PY>
<VO>7</VO>
<NO>7</NO>
<PP>841-848</PP>
</SEQ>

<SEQ>
<UI>1687   Fidelis,K.    Comparison of Systemat.. Protein Eng.    94 
7(8):953-960
</UI>
<AU>Fidelis K;
    Stern PS;
    Moult J
</AU>
<TI>Comparison of Systematic Search and Database Methods for Constructing
Segments of Protein Structure
</TI>
<SU>Protein;
    Structure;
    Database search;
    Segment;
    Systematics
</SU>
<AB>UnCover SICI Code: 0269-2139(19940801)7:8L.953:CSSD;1-
</AB>
<JT>Protein Eng</JT>
<PY>1994</PY>
<VO>7</VO>
<NO>8</NO>
<PP>953-960</PP>
</SEQ>

<SEQ>
<UI>1688   Lathrop,R.H.  The Protein Threading .. Protein Eng.    94 
7(9):1059-1068
</UI>
<AU>Lathrop RH
</AU>
<TI>The Protein Threading Problem with Sequence Amino Acid Interaction
Preferences is NP-complete
</TI>
<SU>Protein;
    Sequence proximity;
    Amino acid;
    Complexity
</SU>
<AB>UnCover SICI Code: 0269-2139(19940901)7:9L.1059:PTPW;1-
</AB>
<JT>Protein Eng</JT>
<PY>1994</PY>
<VO>7</VO>
<NO>9</NO>
<PP>1059-1068</PP>
</SEQ>

<SEQ>
<UI>1689   De Filippis,V Predicting Local Struc.. Protein Eng.    94 
7(10):1203-120
</UI>
<AU>De Filippis V;
    Sander C;
    Vriend G
</AU>
<TI>Predicting Local Structural Changes that Result from Point Mutations
</TI>
<SU>Protein;
    Structure;
    Prediction
</SU>
<AB>UnCover SICI Code: 0269-2139(19941001)7:10L.1203:PLSC;1-
</AB>
<JT>Protein Eng</JT>
<PY>1994</PY>
<VO>7</VO>
<NO>10</NO>
<PP>1203-1208</PP>
</SEQ>

<SEQ>
<UI>1690   Shindyalov,I. Macromolecular Query L.. Protein Eng.    94 
7(11):1311-132
</UI>
<AU>Shindyalov IN;
    Chang W;
    Bourne PE
</AU>
<TI>Macromolecular Query Language (MMQL): Prototype Data Model and
Implementation
</TI>
<SU>Protein;
    Model;
    Language;
    Query
</SU>
<AB>UnCover SICI Code: 0269-2139(19941101)7:11L.1311:MQL(;1-
</AB>
<JT>Protein Eng</JT>
<PY>1994</PY>
<VO>7</VO>
<NO>11</NO>
<PP>1311-1322</PP>
</SEQ>

<SEQ>
<UI>1691   Horimoto,K.   A Simple Procedure for.. Protein Eng.    94 
7(12):1433-144
</UI>
<AU>Horimoto K;
    Yamamoto H;
    Otsuka J
</AU>
<TI>A Simple Procedure for Assigning a Sequence Motif with an Obscure Pattern:
Application to the Basic/Helix-Loop-Helix Motif
</TI>
<SU>Protein;
    Sequence analysis;
    Motif
</SU>
<AB>UnCover SICI Code: 0269-2139(19941201)7:12L.1433:SPAS;1-
</AB>
<JT>Protein Eng</JT>
<PY>1994</PY>
<VO>7</VO>
<NO>12</NO>
<PP>1433-1440</PP>
</SEQ>

<SEQ>
<UI>1692   Zhu,Z.Y.      A New Approach to the .. Protein Eng.    95 
8(2):103-108
</UI>
<AU>Zhu ZY
</AU>
<TI>A New Approach to the Evaluation of Protein Secondary Structure
Predictions at the Level of the Elements of Secondary Structure
</TI>
<SU>Protein;
    Secondary;
    Structure;
    Prediction
</SU>
<AB>UnCover SICI Code: 0269-2139(19950201)8:2L.103:NAEP;1-
</AB>
<JT>Protein Eng</JT>
<PY>1995</PY>
<VO>8</VO>
<NO>2</NO>
<PP>103-108</PP>
</SEQ>

<SEQ>
<UI>1693   Milik,M.      Neural Network System .. Protein Eng.    95 
8(3):225-236
</UI>
<AU>Milik M;
    Kolinski A;
    Skolnick J
</AU>
<TI>Neural Network System for the Evaluation of Side-chain Packing in Protein
Structures
</TI>
<SU>Neural;
    Network;
    Protein;
    Structure
</SU>
<AB>UnCover SICI Code: 0269-2139(19950301)8:3L.225:NNSE;1-
</AB>
<JT>Protein Eng</JT>
<PY>1995</PY>
<VO>8</VO>
<NO>3</NO>
<PP>225-236</PP>
</SEQ>

<SEQ>
<UI>1694   Kolinski,A.   Monte Carlo Simulation.. Proteins Struct 94 
18(4):338-352
</UI>
<AU>Kolinski A;
    Skolnick J
</AU>
<TI>Monte Carlo Simulations of Protein Folding. I. Lattice Model and
Interaction Scheme
</TI>
<SU>Protein;
    Folding;
    Monte Carlo;
    Simulation;
    Model
</SU>
<AB>UnCover SICI Code: 0887-3585(1994)18:4L.338:MCSP;1-
</AB>
<JT>Proteins Struct Funct Genet</JT>
<PY>1994</PY>
<VO>18</VO>
<NO>4</NO>
<PP>338-352</PP>
</SEQ>

<SEQ>
<UI>1695   Rose,G.D.     Protein Folding: Predi.. Proteins Struct 94 
19(1):1-3
</UI>
<AU>Rose GD;
    Creamer TP
</AU>
<TI>Protein Folding: Predicting Predicting
</TI>
<SU>Protein;
    Folding;
    Prediction
</SU>
<AB>UnCover SICI Code: 0887-3585(1994)19:1L.1:PFPP;1-
</AB>
<JT>Proteins Struct Funct Genet</JT>
<PY>1994</PY>
<VO>19</VO>
<NO>1</NO>
<PP>1-3</PP>
</SEQ>

<SEQ>
<UI>1696   Rost,B.       Combining Evolutionary.. Proteins Struct 94 
19(1):55-72
</UI>
<AU>Rost B;
    Sander C
</AU>
<TI>Combining Evolutionary Information and Neural Networks to Predict Protein
Secondary Structure
</TI>
<SU>Protein;
    Secondary;
    Structure;
    Evolution;
    Neural;
    Network
</SU>
<AB>UnCover SICI Code: 0887-3585(1994)19:1L.55:CEIN;1-
</AB>
<JT>Proteins Struct Funct Genet</JT>
<PY>1994</PY>
<VO>19</VO>
<NO>1</NO>
<PP>55-72</PP>
</SEQ>

<SEQ>
<UI>1697   Abagyan,R.    Recognition of Distant.. Proteins Struct 94 
19(2):132-140
</UI>
<AU>Abagyan R;
    Frishman D;
    Argos P
</AU>
<TI>Recognition of Distantly Related Proteins Through Energy Calculations
</TI>
<SU>Protein;
    Recognition;
    Energy
</SU>
<AB>UnCover SICI Code: 0887-3585(1994)19:2L.132:RDRP;1-
</AB>
<JT>Proteins Struct Funct Genet</JT>
<PY>1994</PY>
<VO>19</VO>
<NO>2</NO>
<PP>132-140</PP>
</SEQ>

<SEQ>
<UI>1698   Holm,L.       Searching Protein Stru.. Proteins Struct 94 
19(3):165-173
</UI>
<AU>Holm L;
    Sander C
</AU>
<TI>Searching Protein Structure Databases Has Come of Age
</TI>
<SU>Protein;
    Structure;
    Database search
</SU>
<AB>UnCover SICI Code: 0887-3585(1994)19:3L.165:SPSD;1-
</AB>
<JT>Proteins Struct Funct Genet</JT>
<PY>1994</PY>
<VO>19</VO>
<NO>3</NO>
<PP>165-173</PP>
</SEQ>

<SEQ>
<UI>1699   Holm,L.       Parser for Protein Fol.. Proteins Struct 94 
19(3):256-268
</UI>
<AU>Holm L;
    Sander C
</AU>
<TI>Parser for Protein Folding Units
</TI>
<SU>Protein;
    Folding;
    Parser
</SU>
<AB>UnCover SICI Code: 0887-3585(1994)19:3L.256:PPFU;1-
</AB>
<JT>Proteins Struct Funct Genet</JT>
<PY>1994</PY>
<VO>19</VO>
<NO>3</NO>
<PP>256-268</PP>
</SEQ>

<SEQ>
<UI>1700   Chou,K.C.     A Novel Approach to Pr.. Proteins Struct 95 
21(4):319-344
</UI>
<AU>Chou KC
</AU>
<TI>A Novel Approach to Predicting Protein Structural Classes in a (20-1)
Amino Acid Composition Space
</TI>
<SU>Protein;
    Structure;
    Prediction;
    Amino acid;
    Composition
</SU>
<AB>UnCover SICI Code: 0887-3585(1995)21:4L.319:NAPP;1-
</AB>
<JT>Proteins Struct Funct Genet</JT>
<PY>1995</PY>
<VO>21</VO>
<NO>4</NO>
<PP>319-344</PP>
</SEQ>

<SEQ>
<UI>1701   Rost,B.       Improved Prediction of.. Proc.Nat.Acad.S 93 
90(16):7558-75
</UI>
<AU>Rost B;
    Sander C
</AU>
<TI>Improved Prediction of Protein Secondary Structure by Use of Sequence
Profiles and Neural Networks
</TI>
<SU>Neural;
    Secondary;
    Structure;
    Prediction;
    Protein;
    DE;
    Profile;
    Network
</SU>
<AB>"The explosive accumulation of protein sequences in the wake of large-scale
sequencing projects is in stark contrast to the much slower experimental
determination of protein structures. Improved methods of structure prediction
from the gene sequence alone are therefore needed. Here, we report a substantial
increase in both the accuracy and quality of secondary-structure predictions,
using a neural-network algorithm. The main improvements come from the use of
multiple sequence alignments (better overall accuracy), from "balanced training"
(better prediction of beta-strands), and from "structure context training"
(better prediction of helix and strand lengths)."
</AB>
<JT>Proc Nat Acad Sci USA </JT>
<PY>1993</PY>
<VO>90</VO>
<NO>16</NO>
<PP>7558-7562</PP>
</SEQ>

<SEQ>
<UI>1702   Benson,G.     A Space Efficient Algo.. Theoret.Comput. 95 
145:357-369
</UI>
<AU>Benson G
</AU>
<TI>A Space Efficient Algorithm for Finding the Best Nonoverlapping Alignment
Score
</TI>
<SU>Sequence analysis;
    Sequence alignment;
    Repeat;
    Score;
    Algorithm;
    USA
</SU>
<AB>"Repeating patterns make up a significant fraction of DNA and protein
molecules. ... In this paper, we present a space efficient algorithm for finding
the maximum alignment score for any two substrings of a single string T under
the condition that the substrings do not overlap. In a biological context, this
corresponds to the largest repeating region in the molecule. The algorithm runs
in O(n**2 log**2 n) time and uses only O(n**2) space."
</AB>
<JT>Theoret Comput Sci</JT>
<PY>1995</PY>
<VO>145</VO>
<PP>357-369</PP>
</SEQ>

<SEQ>
<UI>1703   Brenner,S.E.  Network Sequence Retri.. Trends in Genet 95 
11(6):247-248
</UI>
<AU>Brenner SE
</AU>
<TI>Network Sequence Retrieval
</TI>
<SU>Sequence search;
    Database search;
    Electronic mail;
    World Wide Web;
    Retrieval;
    UK;
    Network
</SU>
<AB>"Retrieving DNA and protein sequences from a database is one of the common
computer tasks for molecular biologists and should be one of the simplest.  ...
But for scientists who wish to spend their research time at the bench and not at
the computer, even the trouble of obtaining current versions of the software,
installing them and learning about them can be a distressingly large time
investment. ... A World Wide Web (WWW) client can provide a one-piece solution.
... Time spent learning how to use the WWW is a good investment."
</AB>
<JT>Trends in Genetics </JT>
<PY>1995</PY>
<VO>11</VO>
<NO>6</NO>
<PP>247-248</PP>
</SEQ>

<SEQ>
<UI>1704   Brown,N.P.    Identification and Ana.. J.Mol.Biol.     95 
249:342-359
</UI>
<AU>Brown NP;
    Whittaker AJ;
    Newell WR;
    Rawlings CJ;
    Beck S
</AU>
<TI>Identification and Analysis of Multigene Families by Comparison of Exon
Fingerprints
</TI>
<SU>Gene;
    Sequence alignment;
    Sequence comparison;
    Dynamic programming;
    Fingerprint;
    Exon;
    Genomic;
    UK;
    Identification
</SU>
<AB>"Gene families are often recognised by sequence homology using similarity
searching to find relationships, however, genomic sequence data provides gene
architectural information not used by conventional search methods. ... A fast
search technique capable of detecting possible weak sequence homologies apparent
at the intron/exon level of gene organization is presented for comparing
spliceosomal genes and gene fragments."
</AB>
<JT>J Mol Biol</JT>
<PY>1995</PY>
<VO>249</VO>
<PP>342-359</PP>
</SEQ>

<SEQ>
<UI>1705   Charleston,M. Neighbor-Joining Uses .. Mol.Phylogenet. 93 
2(1):6-12
</UI>
<AU>Charleston MA;
    Hendy MD;
    Penny D
</AU>
<TI>Neighbor-Joining Uses the Optimal Weight for Net Divergence
</TI>
<SU>Phylogenetic;
    Neighbor joining;
    Distance;
    NZ;
    Optimal;
    Divergence
</SU>
<AB>"A class of phylogenetic clustering methods which calculate net
divergences from distance data, but assign differing weights to the net
divergences, is defined. ... The accuracy of some of these methods is studied by
computer simulation for the case of four taxa under the additive tree
hypothesis. Of these methods and under this hypothesis, it is proved that
Neighbor-Joining uses the only weighting for net divergence which is consistent,
so that it is the only method in the class which is expected to converge to the
correct tree as more data are added."
</AB>
<JT>Mol Phylogenet Evol</JT>
<PY>1993</PY>
<VO>2</VO>
<NO>1</NO>
<PP>6-12</PP>
</SEQ>

<SEQ>
<UI>1706   Yang,Z.       Evaluation of Several .. J.Mol.Evol.     95 
40:689-697
</UI>
<AU>Yang Z
</AU>
<TI>Evaluation of Several Methods for Estimating Phylogenetic Trees when
Substitution Rates Differ over Nucleotide Sites
</TI>
<SU>Phylogenetic;
    Evolutionary tree;
    Substitution;
    Rate;
    Nucleotide;
    UK
</SU>
<AB>"Several maximum likelihood and distance matrix methods for estimating
phylogenetic trees from homologous DNA sequences were compared when substitution
rates at sites were assumed to follow a gamma distribution. Computer simulations
were performed to estimate the probabilities that various tree estimation
methods recover the true tree topology. The case of four species was considered,
and a few combinations of parameters were examined. Attention was applied to
discriminating among different sources of error in tree reconstruction ...."
</AB>
<JT>J Mol Evol</JT>
<PY>1995</PY>
<VO>40</VO>
<PP>689-697</PP>
</SEQ>

<SEQ>
<UI>1707   Zhang,L.      On the Approximation o.. Theoret.Comput. 95 
143:353-362
</UI>
<AU>Zhang L
</AU>
<TI>On the Approximation of Longest Common Nonsupersequences and Shortest
Common Nonsubsequences
</TI>
<SU>Longest common;
    Shortest common;
    Subsequence;
    Supersequence;
    CA;
    Approximation;
    Nonsubsequence
</SU>
<AB>"The longest common nonsupersequence (LCNS) problem is shown to be NP-complete
over the binary alphabet, and Max SNP-hard, in general. Although it is
open whether this problem and the shortest common nonsubsequence problem are
Max SNP-hard over the binary alphabet, we show that their generalizations (the mixed
supersequence and the mixed subsequence problems) indeed remain Max SNP-hard
over the binary alphabet."
</AB>
<JT>Theoret Comput Sci</JT>
<PY>1995</PY>
<VO>143</VO>
<PP>353-362</PP>
</SEQ>

<SEQ>
<UI>1708   Gatesy,J.     Alignment-Ambiguous Nu.. Mol.Phylogenet. 93 
2(2):152-157
</UI>
<AU>Gatesy J;
    DeSalle R;
    Wheeler W
</AU>
<TI>Alignment-Ambiguous Nucleotide Sites and the Exclusion of Systematic Data
</TI>
<SU>Phylogenetic;
    Indel;
    Sequence alignment;
    Multiple alignment;
    DNA;
    USA;
    Nucleotide;
    Systematics
</SU>
<AB>"Molecular systematists generally rely on computer algorithms to establish
the alignment of DNA sequences. However, when alignment regions are
characterized by multiple insertions and deletions, these gap-filled stretches
of DNA are often excised before phylogenetic reconstruction. This exclusion of
systematic data is generally determined by subjective criteria. We explore a
replicable methodology in which the comparison of several multiple sequence
alignments can be used to eliminate regions of unstable sequence alignment."
</AB>
<JT>Mol Phylogenet Evol</JT>
<PY>1993</PY>
<VO>2</VO>
<NO>2</NO>
<PP>152-157</PP>
</SEQ>

<SEQ>
<UI>1709   Gu,X.         Maximum Likelihood Est.. Mol.Biol.Evol.  95 
12(4):546-557
</UI>
<AU>Gu X;
    Fu YX;
    Li WH
</AU>
<TI>Maximum Likelihood Estimation of the Heterogeneity of Substitution Rate
among Nucleotide Sites
</TI>
<SU>Likelihood;
    Estimation;
    Substitution;
    Rate;
    Nucleotide;
    Distance;
    USA
</SU>
<AB>"This paper presents a maximum likelihood approach to estimating the
variation of substitution rate among nucleotide sites. We assume that the rate
varies among sites according to an invariant+gamma distribution, which has two
parameters: the gamma parameter alpha and the proportion of invariable sites
theta. Theoretical treatments on three, four, and five sequences have been
conducted, and computer programs have been developed. ... Extensive simulations
show that ...."
</AB>
<JT>Mol Biol Evol</JT>
<PY>1995</PY>
<VO>12</VO>
<NO>4</NO>
<PP>546-557</PP>
</SEQ>

<SEQ>
<UI>1710   Harper,R.     World Wide Web Resourc.. Trends in Genet 95 
11(6):223-228
</UI>
<AU>Harper R
</AU>
<TI>World Wide Web Resources for the Biologist
</TI>
<SU>WWW;
    Network;
    Electronic mail;
    Internet;
    UK;
    World Wide Web
</SU>
<AB>"The World Wide Web is currently the major networking resource for
biologists. It has passed Gopher and simple electronic mail (email) servers in
popularity. In the 1990's, the advent of client-server software will be the main
driving force in bioinformatics. During the past few years, biologists have used
the Internet increasingly to distribute data, and the methods of doing this have
become more and more sophisticated as the speed with which network links can be
made has increased."
</AB>
<JT>Trends in Genetics </JT>
<PY>1995</PY>
<VO>11</VO>
<NO>6</NO>
<PP>223-228</PP>
</SEQ>

<SEQ>
<UI>1711   Hasegawa,M.   Relative Efficiencies .. Mol.Phylogenet. 93 2(1):1-5
</UI>
<AU>Hasegawa M;
    Fujiwara M
</AU>
<TI>Relative Efficiencies of the Maximum Likelihood, Maximum Parsimony, and
Neighbor-Joining Methods for Estimating Protein Phylogeny
</TI>
<SU>Likelihood;
    Parsimony;
    Neighbor joining;
    Protein;
    Phylogeny;
    Simulation;
    JP
</SU>
<AB>"The relative efficiencies of the maximum likelihood (ML), maximum
parsimony (MP), and neighbor-joining (NJ) methods for protein phylogeny in
obtaining the correct tree topology were studied by using computer simulation.
Furthermore, the robustness of the methods against departures from the assumed
underlying model was studied."
</AB>
<JT>Mol Phylogenet Evol</JT>
<PY>1993</PY>
<VO>2</VO>
<NO>1</NO>
<PP>1-5</PP>
</SEQ>

<SEQ>
<UI>1712   Huynen,M.A.   Pattern Generation in .. J.Mol.Evol.     94 39:71-79
</UI>
<AU>Huynen MA;
    Hogeweg P
</AU>
<TI>Pattern Generation in Molecular Evolution: Exploitation of the Variation
in RNA Landscapes
</TI>
<SU>RNA;
    Pattern definition;
    Evolution;
    Secondary;
    Structure;
    Simulation;
    Statistical;
    NL
</SU>
<AB>"Evolution of RNA secondary structure is studied using simulation
techniques and statistical analysis of fitness landscapes. The transition from
RNA sequence to RNA secondary structure leads to fitness landscapes that have
local variations in their 'ruggedness.' Evolution exploits these variations. In
stable environments it moves the quasispecies toward relatively 'flat' peaks,
where not only the master sequence but also its mutants have a high fitness. In
a rapidly changing environment, the situation is reversed: evolution moves the
quasispecies to a region where the correlation between secondary structures of
'neighboring' RNA sequences is relatively low."
</AB>
<JT>J Mol Evol</JT>
<PY>1994</PY>
<VO>39</VO>
<PP>71-79</PP>
</SEQ>

<SEQ>
<UI>1713   Jiang,T.      Shortest Consistent Su.. Theoret.Comput. 95 
143:113-122
</UI>
<AU>Jiang T;
    Timkovsky VG
</AU>
<TI>Shortest Consistent Superstrings Computable in Polynomial Time
</TI>
<SU>Shortest consistent;
    Shortest common;
    Supersequence;
    Complexity;
    Sequencing;
    Hybridization;
    CA
</SU>
<AB>"The shortest consistent superstring problem is, given a set of positive
strings and a set of negative strings, finding a shortest string including every
positive string and no negative string as a substring. The problem is NP-hard
and arises in DNA sequencing by hybridization. It is also an extension of the
well-known shortest common superstring problem which corresponds to the case
when the set of negative strings is empty. In this paper we show that a shortest
consistent superstring can be found in polynomial time if (i) a longest common
nonsuperstring for the set of negative strings exists or (ii) the number of
positive strings is bounded and every symbol of the alphabet appears at the end
of some negative string."
</AB>
<JT>Theoret Comput Sci</JT>
<PY>1995</PY>
<VO>143</VO>
<PP>113-122</PP>
</SEQ>

<SEQ>
<UI>1714   Karlin,S.     Correlation Analysis o.. Proc.Nat.Acad.S 92 
89:12165-12169
</UI>
<AU>Karlin S;
    Bucher P
</AU>
<TI>Correlation Analysis of Amino Acid Usage in Protein Classes
</TI>
<SU>Correlation;
    Protein;
    Amino acid;
    Viral;
    Genome;
    USA
</SU>
<AB>"We present a comparative study of residue usage correlations of various
organism protein sets of diverse phylogenetic species and of open reading frames
of several large human viral genomes. Our correlation analysis reveals three
major tendencies: .... Discussion and speculations relate amino acid usage
correlations to protein function/structure, cellular localization, proximity in
amino acid biosynthetic pathways, amino acid relative abundances, tRNA and
aminoacyl synthetase availabilities, and evolutionary processes."
</AB>
<JT>Proc Nat Acad Sci USA </JT>
<PY>1992</PY>
<VO>89</VO>
<PP>12165-12169</PP>
</SEQ>

<SEQ>
<UI>1715   May,A.C.W.    The Recognition of Pro.. Phil.Trans.R.So 94 
344:373-381
</UI>
<AU>May ACW;
    Johnson MS;
    Rufino SD;
    Wako H;
    Zhu ZY;
    Sowdhamini R;
    Srinivasan N;
    Rodionov MA;
    Blundell TL
</AU>
<TI>The Recognition of Protein Structure and Function from Sequence: Adding
Value to Genome Data
</TI>
<SU>Protein;
    Structure;
    Function;
    Sequence analysis;
    Genome;
    UK;
    Recognition
</SU>
<AB>"The explosion of DNA sequence data from genome projects presents many
challenges. For instance, we must extend our current knowledge of protein
structure and function so that it can be applied to these new sequences. The
derivation of rules for the relationships between sequence and structure allow
us to recognize a common fold by the use of tertiary templates. New techniques
enable us to begin to meet the challenge of rule-based modelling of distantly
related proteins. This paper describes an integrated and knowledge-based
approach to the prediction of protein structure and function which can maximize
the value of sequence information."
</AB>
<JT>Phil Trans R Soc Lond Ser B </JT>
<PY>1994</PY>
<VO>344</VO>
<PP>373-381</PP>
</SEQ>

<SEQ>
<UI>1716   Middendorf,M. On Finding Minimal, Ma.. Theoret.Comput. 95 
145:317-327
</UI>
<AU>Middendorf M
</AU>
<TI>On Finding Minimal, Maximal, and Consistent Sequences over a Binary
Alphabet
</TI>
<SU>Longest common;
    Subsequence;
    Shortest common;
    Supersequence;
    Complexity;
    DE
</SU>
<AB>"In this paper we investigate the complexity of finding various kinds of
common super- and subsequences with respect to one or two given sets of strings.
We show that Longest Minimal Common Supersequence, Shortest Maximal Common
Subsequence, and Shortest Maximal Common Non-Supersequence are MAX SNP-hard over
a binary alphabet. ... We show how these problems can be related to finding
sequences consistent with respect to two given sets of strings. This leads to a
unified approach for characterizing the complexity of such problems."
</AB>
<JT>Theoret Comput Sci</JT>
<PY>1995</PY>
<VO>145</VO>
<PP>317-327</PP>
</SEQ>

<SEQ>
<UI>1717   Miramontes,P. Structural and Thermod.. J.Mol.Evol.     95 
40:698-704
</UI>
<AU>Miramontes P;
    Medrano L;
    Cerpa C;
    Cedergren R;
    Ferbeyre G;
    Cocho G
</AU>
<TI>Structural and Thermodynamic Properties of DNA Uncover Different
Evolutionary Histories
</TI>
<SU>DNA;
    Structure;
    Thermodynamic;
    Evolutionary tree;
    Genomic;
    CA
</SU>
<AB>"We propose an index of DNA homogeneity (IDH) based on a binary
distribution model that quantifies structural and thermodynamic aggregates
present in DNA primary structures. Extensive analysis of sequence databases with
the IDH uncovers significant constraints on DNA sequence other than those
derived from codon usage or protein function. This index clearly distinguishes
between organisms of different evolutive origins and places them in disjoint
domains of DNA sequence space."
</AB>
<JT>J Mol Evol</JT>
<PY>1995</PY>
<VO>40</VO>
<PP>698-704</PP>
</SEQ>

<SEQ>
<UI>1718   Pesole,G.     A Statistical Method f.. Mol.Phylogenet. 92 
1(2):91-96
</UI>
<AU>Pesole G;
    Attimonelli M;
    Preparata G;
    Saccone C
</AU>
<TI>A Statistical Method for Detecting Regions with Different Evolutionary
Dynamics in Multialigned Sequences
</TI>
<SU>Statistical;
    Region;
    Evolution;
    Multiple alignment;
    Sequence alignment;
    Stochastic;
    Gene;
    Italy;
    Dynamic
</SU>
<AB>"We describe a stochastic method for tracing the evolutionary pattern of
multialigned sequences. This method allows us to detect gene regions with
distinct evolutionary dynamics, e.g., regions that significantly deviate from
the expected behavior. Accurate detection of hypervariable or hyperconstrained
regions may provide useful information on the structure/function relationship of
biosequences. This information can help localize functional constraints. In
addition, the selection of distinct evolutionary dynamics may assist in the
correct use of biosequences as reliable molecular clocks."
</AB>
<JT>Mol Phylogenet Evol</JT>
<PY>1992</PY>
<VO>1</VO>
<NO>2</NO>
<PP>91-96</PP>
</SEQ>

<SEQ>
<UI>1719   Ragan,M.A.    Phylogenetic Inference.. Mol.Phylogenet. 92 
1(1):53-58
</UI>
<AU>Ragan MA
</AU>
<TI>Phylogenetic Inference Based on Matrix Representation of Trees
</TI>
<SU>Phylogenetic;
    Evolutionary tree;
    Consensus tree;
    CA;
    Matrix;
    Representation
</SU>
<AB>"Rooted phylogenetic trees can be represented as matrices in which the
rows correspond to termini, and columns correspond to internal nodes ....
Parsimony analysis of such a matrix will fully recover the topology of the
original tree. The maximum size of the represented matrix depends only on the
number of termini in the tree .... Representations of multiple trees ... can
readily be combined into a single matrix .... Parsimony analysis of the
resulting composite matrix yields a hybrid supertree which typically provides
greater resolution than conventional consensus trees."
</AB>
<JT>Mol Phylogenet Evol</JT>
<PY>1992</PY>
<VO>1</VO>
<NO>1</NO>
<PP>53-58</PP>
</SEQ>

<SEQ>
<UI>1720   Schoniger,M.  A Stochastic Model for.. Mol.Phylogenet. 94 
3(3):240-247
</UI>
<AU>Schoniger M;
    von Haeseler A
</AU>
<TI>A Stochastic Model for the Evolution of Autocorrelated DNA Sequences
</TI>
<SU>Stochastic;
    Model;
    Evolution;
    DNA;
    Substitution;
    DE
</SU>
<AB>"Currently used stochastic models of DNA sequence evolution assume
independent and identically distributed nucleotide sites. They are too simple to
account for dependence structures obviously present in molecular data. Up to now
more realistic stochastic models for nucleotide substitutions have been
considered intractable. In this paper a procedure that accounts for
non-overlapping correlations among pairs of sites of a DNA sequence is developed. We
show that currently used models that ignore correlated sites underestimate
distances inferred from observed sequence dissimilarities."
</AB>
<JT>Mol Phylogenet Evol</JT>
<PY>1994</PY>
<VO>3</VO>
<NO>3</NO>
<PP>240-247</PP>
</SEQ>

<SEQ>
<UI>1721   Steel,M.      A Frequency-Dependent .. Mol.Phylogenet. 95 
4(1):64-71
</UI>
<AU>Steel M;
    Lockhart PJ;
    Penny D
</AU>
<TI>A Frequency-Dependent Significance Test for Parsimony
</TI>
<SU>Evolutionary tree;
    Parsimony;
    Significance;
    Statistical;
    Sequence comparison;
    NZ
</SU>
<AB>"We describe techniques for assessing evolutionary trees constructed by
the parsimony criteria, when sequences exhibit irregular base compositions. In
particular, we extend a recently described frequency-dependent significance test
to handle any number of taxa and describe a modification of the Kishino-Hasegawa
sites test. These modifications are useful for detecting historical signals
beyond those patterns which arise purely from irregular base compositions
between the compared sequences. ... We also describe how the techniques can be
modified to determine how 'tree-like' data are, given independent variation in
the base frequencies."
</AB>
<JT>Mol Phylogenet Evol</JT>
<PY>1995</PY>
<VO>4</VO>
<NO>1</NO>
<PP>64-71</PP>
</SEQ>

<SEQ>
<UI>1722   van Batenburg An APL-programmed Gene.. J.Theor.Biol.   95 
174:269-280
</UI>
<AU>van Batenburg FHD;
    Gultyaev AP;
    Pleij CWA
</AU>
<TI>An APL-programmed Genetic Algorithm for the Prediction of RNA Secondary
Structure
</TI>
<SU>Genetic;
    Algorithm;
    Prediction;
    RNA;
    Secondary;
    Structure;
    NL
</SU>
<AB>"The possibilities of using a genetic algorithm for the prediction of RNA
secondary structure were investigated. The algorithm, using the procedure of
stepwise selection of the most fit structures (similarly to natural evolution),
allows different models of fitness or driving forces determining RNA structure
to be easily introduced. This can be used for simulation of the RNA folding
process and for the investigation of possible folding pathways. Such an
algorithm needs several modifications before it can predict RNA secondary
structures. After modification, a fair number of correct stems are predicted,
even when using computationally quick, but very crude, fitness criteria ...."
</AB>
<JT>J Theor Biol</JT>
<PY>1995</PY>
<VO>174</VO>
<PP>269-280</PP>
</SEQ>

<SEQ>
<UI>1723   Vogt,G.       An Assessment of Amino.. J.Mol.Biol.     95 
249:816-831
</UI>
<AU>Vogt G;
    Etzold T;
    Argos P
</AU>
<TI>An Assessment of Amino Acid Exchange Matrices in Aligning Protein
Sequences: The Twilight Zone Revisited
</TI>
<SU>Amino acid;
    Protein;
    Sequence alignment;
    Residue;
    Gap;
    Substitution;
    Score;
    Matrix;
    DE
</SU>
<AB>"The sensitivity of most protein sequence alignment methods depends
strongly on the quality of the comparison matrices used. These matrices, which
assign weights or similarity scores to every possible amino acid substitution
pair, are utilized to differentiate amongst the various possible alignments of
two or more sequences. There are many ways to generate these exchange weights
and new matrices are constantly published. There has been no overall assessment
of these various matrices when applied in different alignment techniques and
over many protein folds and families, both close and distant and with the use of
several gap penalty values. In this work, a set of amino acid sequences matched
by superposition of known protein tertiary topologies is used to test the
alignment accuracy of the different method/matrix/penalty combinations."
</AB>
<JT>J Mol Biol</JT>
<PY>1995</PY>
<VO>249</VO>
<PP>816-831</PP>
</SEQ>

<SEQ>
<UI>1724   Wheeler,W.C.  Elision: A Method for .. Mol.Phylogenet. 95 4(1):1-9
</UI>
<AU>Wheeler WC;
    Gatesy J;
    DeSalle R
</AU>
<TI>Elision: A Method for Accommodating Multiple Molecular Sequence Alignments
with Alignment-Ambiguous Sites
</TI>
<SU>Sequence alignment;
    Multiple alignment;
    Character weight;
    Consensus alignment;
    Phylogenetic;
    USA
</SU>
<AB>"Multiple alignments are frequently nonunique. Two sources of these
multiple alignments are analysis based on different sets of alignment parameter
values ... and nonunique equally costly alignments based on a single set of
alignment parameters. By 'eliding' these individual alignments into a single
grand alignment, phylogeny that is weighted toward those positions that align
more consistently can be reconstructed. Positions that show greater variation
among alignments will be relatively downweighted. The technique results in a
weighting procedure that is a posteriori and based on the evidence established
from the original sequence alignments."
</AB>
<JT>Mol Phylogenet Evol</JT>
<PY>1995</PY>
<VO>4</VO>
<NO>1</NO>
<PP>1-9</PP>
</SEQ>

<SEQ>
<UI>1725   Yang,Z.       Estimating the Pattern.. J.Mol.Evol.     94 
39:105-111
</UI>
<AU>Yang Z
</AU>
<TI>Estimating the Pattern of Nucleotide Substitution
</TI>
<SU>Nucleotide;
    Substitution;
    Model;
    Markov;
    Likelihood;
    DNA;
    Phylogenetic;
    UK
</SU>
<AB>"In this paper a model-based maximum likelihood approach is proposed for
estimating substitution patterns in real sequences. Nucleotide substitution is
assumed to follow a homogeneous Markov process, and the general reversible
process model (REV) and the unrestricted model without the reversibility
assumption are used. These models are also applied to examine the adequacy of
the [HKY85] model of Hasegawa, Kishino and Yano (1985). ... It is concluded that
the use of the REV model in phylogenetic analysis can be recommended, especially
for large data sets or for sequences with extreme substitution patterns, while
HKY85 may be expected to provide a good approximation."
</AB>
<JT>J Mol Evol</JT>
<PY>1994</PY>
<VO>39</VO>
<PP>105-111</PP>
</SEQ>

<SEQ>
<UI>1726   Hart,W.E.     Fast Protein Folding i.. ACM Sympos.Theo 95 
27:157-168
</UI>
<AU>Hart WE;
    Istrail S
</AU>
<TI>Fast Protein Folding in the Hydrophobic-hydrophilic Model Within Three-eighths
of Optimal (Extended Abstract)
</TI>
<SU>Protein;
    Folding;
    Complexity;
    USA;
    Model;
    Optimal
</SU>
<AB>"We present performance-guaranteed approximation algorithms for the
protein folding problem in the hydrophobic-hydrophilic model, Dill (1985). ...
The protein is modeled as a chain of amino acids of length n which are of two
types: H (hydrophobic, i.e., nonpolar) and P (hydrophilic, i.e., polar). ... Our
algorithms have linear (3n) time and achieve a three-dimensional protein
conformation that has a guaranteed free energy within 3/8 of optimal. ...
Equally important, the folding pathway and final conformations of our algorithms
are biologically plausible."
</AB>
<JT>ACM Sympos Theory Comput</JT>
<PY>1995</PY>
<VO>27</VO>
<PP>157-168</PP>
</SEQ>

<SEQ>
<UI>1727   Kosaraju,S.R. Large-Scale Assembly o.. ACM Sympos.Theo 95 
27:169-177
</UI>
<AU>Kosaraju SR;
    Delcher AL
</AU>
<TI>Large-Scale Assembly of DNA Strings and Space-Efficient Construction of
Suffix Trees
</TI>
<SU>Supersequence;
    Sequence assembly;
    Shortest common;
    Suffix;
    Complexity;
    Ancestor;
    USA;
    DNA
</SU>
<AB>"We consider the problem of assembling a given set of DNA strings into a
small set of strings. A simple version of this problem is known as the
superstring problem. ... We first give a linear-time algorithm for the greedy
heuristic to construct a superstring. We then generalize the problem to several
DNA string assembly problems and develop greedy implementations for them. We
also describe efficient algorithms to compute the suffix tree for strings over
unbounded alphabets and to compute nearest common ancestors in trees."
</AB>
<JT>ACM Sympos Theory Comput</JT>
<PY>1995</PY>
<VO>27</VO>
<PP>169-177</PP>
</SEQ>

<SEQ>
<UI>1728   Hannenhalli,S Transforming Cabbage i.. ACM Sympos.Theo 95 
27:178-189
</UI>
<AU>Hannenhalli S;
    Pevzner P
</AU>
<TI>Transforming Cabbage into Turnip (Polynomial Algorithm for Sorting Signed
Permutations by Reversals)
</TI>
<SU>Complexity;
    Reversal;
    Sort;
    Signed;
    Permutation;
    Genome;
    USA;
    Algorithm
</SU>
<AB>"Analysis of genomes evolving by inversions leads to a combinatorial
problem of sorting by reversals studied in detail recently. ... We study sorting
of signed permutations by reversals, a problem which adequately models
rearrangements in small genomes like chloroplast or mitochondrial DNA. The
previously suggested performance guarantee algorithms for sorting signed
permutations by reversals approximate the reversal distance between permutations
with an astonishing accuracy for both simulated and biological data. We prove a
duality theorem explaining this intriguing performance and show that there
exists a 'hidden' parameter which allows one to efficiently compute the reversal
distance between signed permutations."
</AB>
<JT>ACM Sympos Theory Comput</JT>
<PY>1995</PY>
<VO>27</VO>
<PP>178-189</PP>
</SEQ>

<SEQ>
<UI>1729   Ferragina,P.  A Fully-Dynamic Data S.. ACM Sympos.Theo 95 
27:693-702
</UI>
<AU>Ferragina P;
    Grossi R
</AU>
<TI>A Fully-Dynamic Data Structure for External Substring Search
</TI>
<SU>Data structure;
    String search;
    Dynamic;
    Pattern search;
    Suffix;
    Italy;
    Structure
</SU>
<AB>"We address the issue of efficiently searching on external dynamic data
structures for strings, introducing the External Dynamic Substring Search [EDSS]
problem. ... We introduce the SB-Tree data structure for [a set of external text
strings], which is the first fully-dynamic data structure allowing the EDSS
problem to be solved with provably good worst-case and amortized I/O bounds. ...
In this paper, we address the issue of efficiently finding all the occurrences
of a pattern string as a substring of many text strings."
</AB>
<JT>ACM Sympos Theory Comput</JT>
<PY>1995</PY>
<VO>27</VO>
<PP>693-702</PP>
</SEQ>

<SEQ>
<UI>1730   Farach,M.     String Matching in Lem.. ACM Sympos.Theo 95 
27:703-712
</UI>
<AU>Farach M;
    Thorup M
</AU>
<TI>String Matching in Lempel-Ziv Compressed Strings
</TI>
<SU>String match;
    Compression;
    Lempel-Ziv;
    USA
</SU>
<AB>"String matching and Compression are two widely studied areas of computer
science. ... Data structures from string matching can be used to derive fast
implementations of many important compression schemes, most notably the
Lempel-Ziv (LZ1) algorithm. ... The Compressed Matching Problem is that of performing
string matching in a compressed text, without uncompressing it. ... In this
paper, we give the first non-trivial compressed matching algorithm for the
classic compression scheme, the LZ1 algorithm."
</AB>
<JT>ACM Sympos Theory Comput</JT>
<PY>1995</PY>
<VO>27</VO>
<PP>703-712</PP>
</SEQ>

<SEQ>
<UI>1731   Czumaj,A.     Work-Time-Optimal Para.. ACM Sympos.Theo 95 
27:713-722
</UI>
<AU>Czumaj A;
    Galil Z;
    Gasieniec L;
    Park K;
    Plandowski W
</AU>
<TI>Work-Time-Optimal Parallel Algorithms for String Problems
</TI>
<SU>Parallel;
    Algorithm;
    String match;
    Pattern match;
    Regularities;
    Palindrome;
    Square;
    PO
</SU>
<AB>"A parallel algorithm is work-optimal if it uses the smallest possible
work; a work-optimal algorithm is work-time-optimal if it also uses the smallest
possible time. We design work-time-optimal algorithms for a number of string
processing problems on the EREW-PRAM and the hypercube. They include string
matching and two dimensional pattern matching."
</AB>
<JT>ACM Sympos Theory Comput</JT>
<PY>1995</PY>
<VO>27</VO>
<PP>713-722</PP>
</SEQ>

<SEQ>
<UI>1732   Vivarelli,F.  LGANN: A Parallel Syst.. Comput.Appl.Bio 95 
11(3):253-260
</UI>
<AU>Vivarelli F;
    Giusti G;
    Villani M;
    Campanini R;
    Fariselli P;
    Compiani M;
    Casadio R
</AU>
<TI>LGANN: A Parallel System Combining a Local Genetic Algorithm and Neural
Networks for the Prediction of Secondary Structure of Proteins
</TI>
<SU>Parallel;
    Genetic;
    Algorithm;
    Neural;
    Network;
    Prediction;
    Secondary;
    Structure;
    Protein;
    Italy
</SU>
<AB>"In this work we describe a parallel system consisting of feed-forward
neural networks supervised by a local genetic algorithm. The system is
implemented in a transputer architecture and is used to predict the secondary
structures of globular proteins. This method allows a wide search in the
parameter space of the neural networks and the determination of their optimal
topology for the predictive task. Different neural network topologies are
selected by the genetic algorithm on the basis of minimal values of mean square
errors on the testing set."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1995</PY>
<VO>11</VO>
<NO>3</NO>
<PP>253-260</PP>
</SEQ>

<SEQ>
<UI>1733   Cantalloube,H Automat and BLAST: Com.. Comput.Appl.Bio 95 
11(3):261-272
</UI>
<AU>Cantalloube H;
    Labesse G;
    Chomilier J;
    Nahum C;
    Cho YY;
    Chams V;
    Achour A;
    Lachgar A;
    Mbika JP;
    Issing W;
    Mornon JP;
    Bizzini B;
    Zagury D;
    Zagury JF
</AU>
<TI>Automat and BLAST: Comparison of Two Protein Sequence Similarity Search
Programs
</TI>
<SU>Protein;
    Sequence search;
    Similarity;
    BLAST;
    FR;
    Program
</SU>
<AB>"Since the early 1980s, protein/DNA sequence similarity search has become
of major importance to biologists, and the need for fast and efficient tools
grows with the size of databanks. Two programs use the strategy of finite state
deterministic automatons to accomplish these searches. One of these two is
BLAST, which is now widely used, and the other Automat, which has just been
published. The differences and similarities in their basic principles, their use
and their performances are analysed in this paper in order to allow optimal use
of these important softwares."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1995</PY>
<VO>11</VO>
<NO>3</NO>
<PP>261-272</PP>
</SEQ>

<SEQ>
<UI>1734   Fondrat,C.    A Rapid Access Motif D.. Comput.Appl.Bio 95 
11(3):273-279
</UI>
<AU>Fondrat C;
    Dessen P
</AU>
<TI>A Rapid Access Motif Database (RAMdb) with a Search Algorithm for the
Retrieval Patterns in Nucleic Acids or Protein Databanks
</TI>
<SU>Databank;
    Motif;
    Database search;
    Retrieval;
    Pattern search;
    Nucleic acid;
    Protein;
    FR;
    Algorithm
</SU>
<AB>"We present here a codification structure, entirely interfaced with the
main packages for biomolecule database management, associated with a new search
algorithm to retrieve quickly a sequence in a database. This system is derived
from a method previously proposed for homology search in databanks with a
preprocessed codification of an entire database in which all the overlapping
subsequences of a specific length in a sequence were converted into a code and
stored in a hash-coding file. This new algorithm is designed for an improved use
of the codification."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1995</PY>
<VO>11</VO>
<NO>3</NO>
<PP>273-279</PP>
</SEQ>

<SEQ>
<UI>1735   Bansal,M.     NUPARM and NUCGEN: Sof.. Comput.Appl.Bio 95 
11(3):281-287
</UI>
<AU>Bansal M;
    Bhattacharyya D;
    Ravi B
</AU>
<TI>NUPARM and NUCGEN: Software for Analysis and Generation of Sequence
Dependent Nucleic Acid Structures
</TI>
<SU>DNA;
    Structure;
    Nucleic acid;
    Geometry;
    RNA;
    India
</SU>
<AB>"Software packages NUPARM and NUCGEN are described, which can be used to
understand sequence directed structural variations in nucleic acids, by analysis
and generation of non-uniform structures. A set of local inter basepair
parameters ... have been defined, which use geometry and coordinates of two
successive basepairs only and can be used to generate polymeric structures with
varying geometries for each of the 16 possible dinucleotide steps. ... NUPARM
can be used to analyse both DNA and RNA structures, with single as well as
double stranded helices."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1995</PY>
<VO>11</VO>
<NO>3</NO>
<PP>281-287</PP>
</SEQ>

<SEQ>
<UI>1736   Trelles-Salaz An Image-processing Ap.. Comput.Appl.Bio 95 
11(3):301-308
</UI>
<AU>Trelles-Salazar O;
    Zapata EL;
    Dopazo J;
    Coulson AFW;
    Carazo JM
</AU>
<TI>An Image-processing Approach to Dotplots: An X-Window-based Program for
Interactive Analysis of Dotplots Derived from Sequence and Structural Data
</TI>
<SU>Dot;
    Sequence comparison;
    Sequence analysis;
    Structure;
    SP;
    Program
</SU>
<AB>"We present an approach to the study of the relationships between
biological sequences and structures applying image analysis methods to dotplots.
We introduce a set of analytical tools based on different types of digital
image-processing filters that are new within the context of dotplots. We have
reformulated some of the usual approaches in dotplot analysis as mathematical
operations on images within the framework of mathematical morphology. An
X-Window-based implementation of this new approach has been developed and is
available by anonymous FTP."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1995</PY>
<VO>11</VO>
<NO>3</NO>
<PP>301-308</PP>
</SEQ>

<SEQ>
<UI>1737   Reczko,M.     A Parallel Neural Netw.. Comput.Appl.Bio 95 
11(3):309-315
</UI>
<AU>Reczko M;
    Hatzigeorgiou A;
    Mache N;
    Zell A;
    Suhai S
</AU>
<TI>A Parallel Neural Network Simulator on the Connection Machine CM-5
</TI>
<SU>Parallel;
    Neural;
    Network;
    Simulation;
    Pattern discovery;
    Prediction;
    DE
</SU>
<AB>"We here present a parallel implementation of artificial neural networks
on the connection machine CM-5 and compare it with other parallel
implementations on SIMD and MIMD architectures. This parallel implementation was
developed with the goal of efficiently training large neural networks with huge
training pattern sets for applications in molecular biology, in particular the
prediction of coding regions in DNA sequences."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1995</PY>
<VO>11</VO>
<NO>3</NO>
<PP>309-315</PP>
</SEQ>

<SEQ>
<UI>1738   Singh,G.B.    DNAView: A Quality Ass.. Comput.Appl.Bio 95 
11(3):317-319
</UI>
<AU>Singh GB;
    Krawetz SA
</AU>
<TI>DNAView: A Quality Assessment Tool for the Visualization of Large
Sequenced Regions
</TI>
<SU>Sequence analysis;
    Display;
    Region;
    Graphic;
    Nucleic acid;
    DNA;
    USA
</SU>
<AB>"This communication describes DNAView, a graphical tool for the
visualization and printing of large nucleic acid sequences. DNAView uses color
coding to compactly display genomic segments of up to 100kb on a single printed
page. The specific color schemes integrated into DNAView can highlight 'local
aggregate' properties of large segments of DNA. We have also incorporated a
confidence expression for the assigned sequence. This is represented by base
color intensity that is proportional to the number of times that base was
sequenced."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1995</PY>
<VO>11</VO>
<NO>3</NO>
<PP>317-319</PP>
</SEQ>

<SEQ>
<UI>1739   Mironov,A.A.  DNASUN: A Package of C.. Comput.Appl.Bio 95 
11(3):331-335
</UI>
<AU>Mironov AA;
    Alexandrov NN;
    Bogodarova NY;
    Grigorjev A;
    Lebedev VF;
    Lunovskaya LV;
    Truchan ME;
    Pevzner PA
</AU>
<TI>DNASUN: A Package of Computer Programs for the Biotechnology Laboratory
</TI>
<SU>DNA;
    Gene;
    Program;
    Sequence analysis;
    Sequence alignment;
    Physical mapping;
    Sequencing;
    Protein;
    Nucleotide;
    RU
</SU>
<AB>"The paper describes a new software package DNASUN developed for
supporting gene engineering laboratories. The package provides a user-friendly
interface for experimental researches and supports the traditional
nucleotide/protein sequence analysis as well as physical mapping, sequencing,
plasmid manipulations, optimal oligonucleotide probe selection and other common
molecular biology procedures."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1995</PY>
<VO>11</VO>
<NO>3</NO>
<PP>331-335</PP>
</SEQ>

<SEQ>
<UI>1740   Ferran,E.A.   A Hybrid Method to Clu.. Comput.Appl.Bio 93 
9(6):671-680
</UI>
<AU>Ferran EA;
    Pflugfelder B
</AU>
<TI>A Hybrid Method to Cluster Protein Sequences based on Statistics and
Artificial Neural Networks
</TI>
<SU>Clustering;
    Protein;
    Sequence analysis;
    Statistical;
    Neural;
    FR;
    Network
</SU>
<AB>"We have recently proposed a method, based on artificial neural networks
(ANNs) to cluster protein sequences into families according to their degree of
sequence similarity. The network was trained with an unsupervised learning
algorithm, using, as inputs, matrix patterns derived from the bipeptide
composition of the protein sequences. We describe here some further improvements
to that approach. ... Finally, we propose a new hybrid method of the statistical
and ANN approaches, in which the results of the statistical method are used to
choose the number of neurons and inputs of the network."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1993</PY>
<VO>9</VO>
<NO>6</NO>
<PP>671-680</PP>
</SEQ>

<SEQ>
<UI>1741   Kumar,S.      MEGA: Molecular Evolut.. Comput.Appl.Bio 94 
10(2):189-191
</UI>
<AU>Kumar S;
    Tamura K;
    Nei M
</AU>
<TI>MEGA: Molecular Evolutionary Genetics Analysis Software for Microcomputers
</TI>
<SU>Evolution;
    Genetic;
    Evolutionary distance;
    Phylogenetic;
    Statistical;
    UPGMA;
    Parsimony;
    Neighbor joining;
    USA
</SU>
<AB>"A computer program package called MEGA has been developed for estimating
evolutionary distances, reconstructing phylogenetic trees and computing basic
statistical quantities from molecular data. ... In this program, various methods
for estimating evolutionary distances from nucleotide and amino acid sequence
data, three different methods of phylogenetic inference (UPGMA, neighbor-joining
and maximum parsimony) and two statistical tests of topological differences are
included."
</AB>
<JT>Comput Appl Biosci</JT>
<PY>1994</PY>
<VO>10</VO>
<NO>2</NO>
<PP>189-191</PP>
</SEQ>

<SEQ>
<UI>1742   DeSalle,R.    Implications of Ancien.. Experientia     94 
50(6):543-550
</UI>
<AU>DeSalle R
</AU>
<TI>Implications of Ancient DNA for Phylogenetic Studies
</TI>
<SU>DNA;
    Ancient;
    Phylogenetic;
    Review;
    Cladistic;
    USA
</SU>
<AB>"The utility of DNA sequence characters from fossil specimens is examined
from a phylogenetic perspective. Four ways that fossil characters can alter
phylogenetic hypotheses are discussed. ... Fossil DNA sequences as characters
will be affected by the problem of missing data and missing taxa. In general,
cladogram accuracy will be more greatly affected by missing taxa and cladogram
resolution will be affected more acutely by missing data. Due to these points,
an examination of the importance of the phylogenetic question being addressed,
the utility of the fossil DNA sequences and the rarity of the fossil should be
considered before damage of a fossil is undertaken."
</AB>
<JT>Experientia</JT>
<PY>1994</PY>
<VO>50</VO>
<NO>6</NO>
<PP>543-550</PP>
</SEQ>

<SEQ>
<UI>1743   Archakov,A.I. Structural Classificat.. Biochem.Mol.Bio 93 
31(6):1071-108
</UI>
<AU>Archakov AI;
    Degtyarenko KN
</AU>
<TI>Structural Classification of the P450 Superfamily based on Consensus
Sequence Comparison
</TI>
<SU>Protein;
    Structure;
    Classification;
    Consensus sequence;
    Sequence comparison;
    Superfamily
</SU>
<AB>CSNA Service 23(1994).
</AB>
<JT>Biochem Mol Biol Intl</JT>
<PY>1993</PY>
<VO>31</VO>
<NO>6</NO>
<PP>1071-1080</PP>
</SEQ>

<SEQ>
<UI>1744   Pietrokovski, Comparing Nucleotide a.. J.Biotechnol.   94 
35(2/3):257-27
</UI>
<AU>Pietrokovski S
</AU>
<TI>Comparing Nucleotide and Protein Sequences by Linguistic Methods
</TI>
<SU>Nucleotide;
    Protein;
    Sequence comparison;
    Linguistic
</SU>
<AB>CSNA Service 23(1994).
</AB>
<JT>J Biotechnol</JT>
<PY>1994</PY>
<VO>35</VO>
<NO>2/3</NO>
<PP>257-272</PP>
</SEQ>

<SEQ>
<UI>1745   Marshall,C.R. Dollo's Law and the De.. Proc.Nat.Acad.S 94 
91(25):12283-1
</UI>
<AU>Marshall CR;
    Raff EC;
    Raff RA
</AU>
<TI>Dollo's Law and the Death and Resurrection of Genes
</TI>
<SU>Dollo;
    Gene;
    Genome;
    Evolution;
    Genetic;
    Protein;
    USA
</SU>
<AB>"Dollo's law, the concept that evolution is not substantively reversible,
implies that the degradation of genetic information is sufficiently fast that
genes or developmental pathways released from selective pressure will rapidly
become nonfunctional. Using empirical data to assess the rate of loss of coding
information in genes for proteins with varying degrees of tolerance to
mutational change, we show that, in fact, there is a significant probability
over evolutionary time scales of 0.5-6 million years for successful reactivation
of silenced genes or 'lost' developmental programs. Conversely, the reactivation
of long (&gt;10 million years)-unexpressed genes and dormant developmental pathways
is not possible unless function is maintained by other selective constraints
...."
</AB>
<JT>Proc Nat Acad Sci USA</JT>
<PY>1994</PY>
<VO>91</VO>
<NO>25</NO>
<PP>12283-12287</PP>
</SEQ>

<SEQ>
<UI>1746   Schuster,P.   From Sequences to Shap.. Proc.R.Soc.Lond 94 
255(1344):279-
</UI>
<AU>Schuster P;
    Fontana W;
    Stadler PF;
    Hofacker IL
</AU>
<TI>From Sequences to Shapes and Back: A Case-Study in RNA Secondary
Structures
</TI>
<SU>RNA;
    Secondary;
    Structure;
    Folding;
    Sequence analysis;
    USA
</SU>
<AB>"RNA folding is viewed here as a map assigning secondary structures to
sequences. ... By using an algorithm for inverse folding, we show that sequences
sharing the same structure are distributed randomly over sequence space. All
common structures can be accessed from an arbitrary sequence by a number of
mutations much smaller than the chain length. ... Implications for evolutionary
adaptation and for applied molecular evolution are evident: finding a particular
structure by mutation and selection is much simpler than expected and, even if
catalytic activity should turn out to be sparse in the space of RNA structures,
it can hardly be missed by evolutionary processes."
</AB>
<JT>Proc R Soc Lond Ser B</JT>
<PY>1994</PY>
<VO>255</VO>
<NO>1344</NO>
<PP>279-284</PP>
</SEQ>

<SEQ>
<UI>1747   Olmstead,R.G. Chloroplast DNA System.. Am.J.Bot.       94 
81(9):1205-122
</UI>
<AU>Olmstead RG;
    Palmer JD
</AU>
<TI>Chloroplast DNA Systematics: A Review of Methods and Data Analysis
</TI>
<SU>DNA;
    Chloroplast;
    Systematics;
    Review;
    Restriction;
    Mapping;
    Sequence comparison;
    Phylogenetic;
    USA
</SU>
<AB>"The field of plant molecular systematics is expanding rapidly, and with
it new and refined methods are coming into use. This paper reviews recent
advances in experimental methods and data analysis, as applied to the
chloroplast genome. ... The relative advantages and disadvantages of comparative
restriction site mapping and DNA sequencing are reviewed. For both methods, the
analysis of resulting data requires sufficient taxon and character sampling to
achieve the best possible estimate of phylogenetic relationships. Parsimony
analysis is particularly sensitive to the issue of taxon sampling due to the
problem of long branches attracting on a tree."
</AB>
<JT>Am J Bot</JT>
<PY>1994</PY>
<VO>81</VO>
<NO>9</NO>
<PP>1205-1224</PP>
</SEQ>

<SEQ>
<UI>1748   Brower,A.V.Z. Practical and Theoreti.. Ann.Entomol.Soc 94 
87(6):702-716
</UI>
<AU>Brower AVZ;
    DeSalle R
</AU>
<TI>Practical and Theoretical Considerations for Choice of a DNA-Sequence
Region in Insect Molecular Systematics, with a Short Review of Published Studies
using Nuclear Gene Regions (Review)
</TI>
<SU>DNA;
    Region;
    Gene;
    Review;
    Insect;
    Systematics
</SU>
<AB>CSNA Service 23(1994).
</AB>
<JT>Ann Entomol Soc Am</JT>
<PY>1994</PY>
<VO>87</VO>
<NO>6</NO>
<PP>702-716</PP>
</SEQ>

<SEQ>
<UI>1749   Dong,S.       Gene Structure Predict.. Genomics        94 
23(3):540-551
</UI>
<AU>Dong S;
    Searls DB
</AU>
<TI>Gene Structure Prediction by Linguistic Methods
</TI>
<SU>Gene;
    Structure;
    Prediction;
    Linguistic
</SU>
<AB>CSNA Service 23(1994).
</AB>
<JT>Genomics</JT>
<PY>1994</PY>
<VO>23</VO>
<NO>3</NO>
<PP>540-551</PP>
</SEQ>

<SEQ>
<UI>1750   Wheeler,W.C.  Malign - A Multiple Se.. J.Hered.        94 
85(5):417-418
</UI>
<AU>Wheeler WC;
    Gladstein DS
</AU>
<TI>Malign - A Multiple Sequence Alignment Program (Technical Note)
</TI>
<SU>Multiple alignment;
    Sequence alignment;
    Program
</SU>
<AB>CSNA Service 23(1994).
</AB>
<JT>J Hered</JT>
<PY>1994</PY>
<VO>85</VO>
<NO>5</NO>
<PP>417-418</PP>
</SEQ>

<SEQ>
<UI>1751   Adell,J.C.    Monte Carlo Simulation.. J.Mol.Evol.     94 
38(3):305-309
</UI>
<AU>Adell JC;
    Dopazo J
</AU>
<TI>Monte Carlo Simulation in Phylogenies: An Application to Test the
Constancy of Evolutionary Rates
</TI>
<SU>Monte Carlo;
    Simulation;
    Phylogenetic;
    Evolutionary rate;
    Bootstrap;
    Clock;
    Least squares;
    SP;
    Phylogeny;
    Rate
</SU>
<AB>"Monte Carlo simulation has commonly been used in phylogenetic studies to
test different tree-reconstruction methods, and consequently, its application
for testing evolutionary models can be considered as a natural extension of this
usage. Repetitive simulation of a given evolutionary process, under the
restrictions imposed by the model to be tested, along a determinate tree
topology allow the estimate of probability distributions for the desired
parameters. Next, the phylogenetic tree can be reconstructed again without the
constraints of the model, and the parameter of interest, derived from this tree,
can be compared to the corresponding probability distribution derived from the
restricted, simulated trees."
</AB>
<JT>J Mol Evol</JT>
<PY>1994</PY>
<VO>38</VO>
<NO>3</NO>
<PP>305-309</PP>
</SEQ>

<SEQ>
<UI>1752   Krogh,A.      Hidden Markov Models i.. J.Mol.Biol.     94 
235(5):1501-15
</UI>
<AU>Krogh A;
    Brown M;
    Mian IS;
    Sjolander K;
    Haussler D
</AU>
<TI>Hidden Markov Models in Computational Biology - Applications to Protein
Modeling
</TI>
<SU>Protein;
    Model;
    Markov;
    Sequence alignment;
    Multiple alignment;
    Database search;
    Statistical;
    USA
</SU>
<AB>"Hidden Markov Models (HMMs) are applied to the problems of statistical
modeling, database searching and multiple sequence alignment of protein families
and protein domains. These methods are demonstrated on the globin family, the
protein kinase catalytic domain, and the EF-hand calcium binding motif. ... The
HMM produces multiple alignments of good quality that agree closely with the
alignments produced by programs that incorporate three-dimensional structural
information."
</AB>
<JT>J Mol Biol</JT>
<PY>1994</PY>
<VO>235</VO>
<NO>5</NO>
<PP>1501-1531</PP>
</SEQ>

<SEQ>
<UI>1753   Hess,S.T.     Wide Variations in Nei.. J.Mol.Biol.     94 
236(4):1022-10
</UI>
<AU>Hess ST;
    Blake JD;
    Blake RD
</AU>
<TI>Wide Variations in Neighbor-dependent Substitution Rates
</TI>
<SU>Substitution;
    Rate;
    Bias;
    Indel;
    USA
</SU>
<AB>"The pattern of 20,200 point substitutions in the 16 unique neighbor-pair
environments has been determined from aligned gene/pseudogene sequences in the
current database of human DNA sequences. Substitution rates, representing
averages over those for different regions of the genome, are distributed over a
60-fold range with strong biases in particular neighbor-pair environments. ...
Characteristic biases of the content and arrangement of oligonucleotide strings
or tuples in all sequence elements, but particularly in non-coding regions,
appear to be due to the pattern of different neighbor-dependent substitution
rates."
</AB>
<JT>J Mol Biol</JT>
<PY>1994</PY>
<VO>236</VO>
<NO>4</NO>
<PP>1022-1033</PP>
</SEQ>

<SEQ>
<UI>1754   Smith,A.B.    Paleontological Data a.. Paleobiology    94 
20(3):259-273
</UI>
<AU>Smith AB;
    Littlewood DTJ
</AU>
<TI>Paleontological Data and Molecular Phylogenetic Analysis
</TI>
<SU>Phylogenetic;
    Sequence analysis;
    Rate;
    Evolution;
    UK
</SU>
<AB>"Molecular data are becoming an indispensable tool for the reconstruction
of phylogenies. Fossil molecular data remain scarce, but have the potential to
resolve patterns of deep branching and provide empirical tests of tree
reconstruction techniques. A total evidence approach, combining and comparing
complementary morphological, molecular and stratigraphical data from both recent
and fossil taxa, is advocated as the most promising way forward because there
are several well-established problems that can afflict the analysis of molecular
sequence data sometimes resulting in spurious tree topologies."
</AB>
<JT>Paleobiology</JT>
<PY>1994</PY>
<VO>20</VO>
<NO>3</NO>
<PP>259-273</PP>
</SEQ>

<SEQ>
<UI>1755   Conklin,D.    Knowledge Discovery in.. IEEE Trans.Know 93 
5(6):985-987
</UI>
<AU>Conklin D;
    Fortier S;
    Glasgow J
</AU>
<TI>Knowledge Discovery in Molecular Databases (Letter)
</TI>
<SU>Knowledge;
    Sequence database;
    Database search
</SU>
<AB>CSNA Service 23(1994).
</AB>
<JT>IEEE Trans Knowledge Data Eng</JT>
<PY>1993</PY>
<VO>5</VO>
<NO>6</NO>
<PP>985-987</PP>
</SEQ>

<SEQ>
<UI>1756   Du,M.W.       An Approach to Designi.. IEEE Trans.Know 94 
6(4):620-633
</UI>
<AU>Du MW;
    Chang SC
</AU>
<TI>An Approach to Designing Very Fast Approximate String-Matching Algorithms
</TI>
<SU>String match;
    Approximate match;
    Algorithm
</SU>
<AB>CSNA Service 23(1994).
</AB>
<JT>IEEE Trans Knowledge Data Eng</JT>
<PY>1994</PY>
<VO>6</VO>
<NO>4</NO>
<PP>620-633</PP>
</SEQ>

<SEQ>
<UI>1757   Bertossi,A.A. Parallel String-Matchi.. J.Parallel Dist 94 
22(2):229-234
</UI>
<AU>Bertossi AA;
    Logi F
</AU>
<TI>Parallel String-Matching with Variable-Length Don't Cares
</TI>
<SU>Parallel;
    String match;
    Don't care
</SU>
<AB>CSNA Service 23(1994).
</AB>
<JT>J Parallel Distrib Comput</JT>
<PY>1994</PY>
<VO>22</VO>
<NO>2</NO>
<PP>229-234</PP>
</SEQ>

<SEQ>
<UI>1758   Pande,V.S.    Nonrandomness in Prote.. Proc.Nat.Acad.S 94 
91(26):12972-1
</UI>
<AU>Pande VS;
    Grosberg AY;
    Tanaka T
</AU>
<TI>Nonrandomness in Protein Sequences: Evidence for a Physically Driven Stage
of Evolution?
</TI>
<SU>Evolution;
    Protein;
    Sequence analysis;
    DNA;
    Statistical;
    Genetic
</SU>
<AB>"The sequences, or primary structures, of existing biopolymers - in particular,
proteins - are believed to be a product of evolution. Are the sequences random?
If not, what is the character of this nonrandomness? To explore the statistics
of protein sequences, we use the idea of mapping the sequence onto the
trajectory of a random walk, originally proposed by Peng et al. in their
analysis of DNA sequences. Using three different mappings, corresponding to
three basic physical interactions between amino acids, we found pronounced
deviations from pure randomness, and these deviations seem directed toward
minimization of the energy of the three-dimensional structure. We consider this
result as evidence for a physically driven stage of evolution."
</AB>
<JT>Proc Nat Acad Sci USA</JT>
<PY>1994</PY>
<VO>91</VO>
<NO>26</NO>
<PP>12972-12975</PP>
</SEQ>

<SEQ>
<UI>1759   Attwood,T.K.  PRINTS - A Database of.. Nucleic Acids R 94 
22(17):3590-35
</UI>
<AU>Attwood TK;
    Beck ME;
    Bleasby AJ;
    Parry-Smith DJ
</AU>
<TI>PRINTS - A Database of Protein Motif Fingerprints
</TI>
<SU>Database search;
    Protein;
    Fingerprint;
    UK;
    Motif
</SU>
<AB>"PRINTS is a compendium of protein motif 'fingerprints'. A fingerprint is
defined as a group of motifs excised from conserved regions of a sequence
alignment, whose diagnostic power or potency is refined by iterative database
scanning (in this case the OWL composite sequence database). ... The use of
groups of independent, linearly- or spatially-distinct motifs allows protein
folds and functionalities to be characterised more flexibly and powerfully than
conventional single-component patterns or regular expressions. ... The
information contained within PRINTS is distinct from, but complementary to the
consensus expressions stored in the widely-used PROSITE dictionary of patterns."
</AB>
<JT>Nucleic Acids Res</JT>
<PY>1994</PY>
<VO>22</VO>
<NO>17</NO>
<PP>3590-3596</PP>
</SEQ>

<SEQ>
<UI>1760   Rost,B.       Structure Prediction o.. Curr.Opin.Biote 94 
5(4):372-380
</UI>
<AU>Rost B;
    Sander C
</AU>
<TI>Structure Prediction of Proteins - Where are We Now?
</TI>
<SU>Protein;
    Structure;
    Nucleotide;
    Sequence analysis;
    Secondary;
    DE;
    Prediction
</SU>
<AB>"Although the 'structure from sequence' prediction problem remains
fundamentally unsolved, new and promising methods in one, two and three
dimensions have reopened the field. Significantly improved one-dimensional
prediction of secondary structure from multiple sequence alignments is now in
routine use. In the two-dimensional approach, inter-residue contacts can be
detected by analysis of correlated mutations, albeit with low accuracy. Finally,
three-dimensional methods, in which pseudopotentials or information values are
derived from the databases, are proving their value for distinguishing between
correct and incorrect models."
</AB>
<JT>Curr Opin Biotechnol</JT>
<PY>1994</PY>
<VO>5</VO>
<NO>4</NO>
<PP>372-380</PP>
</SEQ>
</BIB>