SIAM J. Comput. 32, pp. 1654-1673 (20 pages)
A Subquadratic Sequence Alignment Algorithm for Unrestricted Scoring Matrices
Given two strings of size $n$ over a constant alphabet, the classical algorithm for computing the similarity between two sequences [D. Sankoff and J. B. Kruskal, eds., Time Warps, String Edits, and Macromolecules, Addison-Wesley, Reading, MA, 1983; T. F. Smith and M. S. Waterman, J. Molec. Biol., 147 (1981), pp. 195-197] uses a dynamic programming matrix and compares the two strings in $O(n^2)$ time. We address the challenge of computing the similarity of two strings in subquadratic time for metrics that use a scoring matrix of unrestricted weights. Our algorithm applies to both local and global similarity computations. The speed-up is achieved by dividing the dynamic programming matrix into variable-sized blocks, as induced by Lempel-Ziv parsing of both strings, and utilizing the inherent periodic nature of both strings. This leads to an $O(n^2 / \log n)$ algorithm for an input of constant alphabet size. For most texts, the time complexity is actually $O(h n^2 / \log n)$, where $h \le 1$ is the entropy of the text. We also present an algorithm for comparing two run-length encoded strings of length $m$ and $n$, compressed into $m'$ and $n'$ runs, respectively, in $O(m'n + n'm)$ time. This result extends to all distance or similarity scoring schemes that use an additive gap penalty.
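For orientation, the quadratic baseline that the abstract refers to, and the kind of parsing that induces the blocks, can be sketched as follows. This is a minimal illustration and not the subquadratic algorithm of the paper. The function names `global_similarity` and `lz78_phrases`, the `score` callback, and the `gap` parameter are hypothetical, and the choice of the LZ78 variant is an assumption, since the abstract says only "Lempel-Ziv parsing".

```python
def global_similarity(a, b, score, gap):
    """Classical dynamic programming for global similarity.

    Runs in O(len(a) * len(b)) time. `score(x, y)` may return arbitrary
    weights (an unrestricted scoring matrix); `gap` is an additive
    per-symbol gap penalty (normally negative).
    """
    # prev[j] holds the best score of aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            curr[j] = max(
                prev[j - 1] + score(a[i - 1], b[j - 1]),  # align a[i-1] with b[j-1]
                prev[j] + gap,                            # a[i-1] against a gap
                curr[j - 1] + gap,                        # b[j-1] against a gap
            )
        prev = curr
    return prev[len(b)]


def lz78_phrases(s):
    """Split s into LZ78-style phrases (assumed variant).

    Each phrase is the shortest prefix of the remaining text that has not
    appeared as an earlier phrase; the final phrase may repeat an earlier one.
    """
    seen = set()
    phrases = []
    current = ""
    for ch in s:
        current += ch
        if current not in seen:
            seen.add(current)
            phrases.append(current)
            current = ""
    if current:  # leftover text that matches an earlier phrase
        phrases.append(current)
    return phrases
```

With a unit match/mismatch score, `global_similarity("ACGT", "AGT", lambda x, y: 1 if x == y else -1, -1)` fills a 4-by-3 table cell by cell. The phrase boundaries returned by `lz78_phrases` for each string are what would delimit the variable-sized blocks of that table in a block-based scheme.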
© 2003 Society for Industrial and Applied Mathematics