SIAM Digital Library

SIAM J. Comput. 32, pp. 1654-1673 (20 pages)

A Subquadratic Sequence Alignment Algorithm for Unrestricted Scoring Matrices

Maxime Crochemore, Gad M. Landau, and Michal Ziv-Ukelson

Given two strings of length $n$ over a constant-size alphabet, the classical algorithm for computing the similarity between two sequences [D. Sankoff and J. B. Kruskal, eds., Time Warps, String Edits, and Macromolecules, Addison-Wesley, Reading, MA, 1983; T. F. Smith and M. S. Waterman, J. Molec. Biol., 147 (1981), pp. 195-197] fills a dynamic programming matrix and compares the two strings in $O(n^2)$ time. We address the challenge of computing the similarity of two strings in subquadratic time for metrics that use a scoring matrix of unrestricted weights. Our algorithm applies to both local and global similarity computations. The speed-up is achieved by dividing the dynamic programming matrix into variable-sized blocks, as induced by the Lempel-Ziv parsing of both strings, and by exploiting the inherent periodic nature of both strings. This yields an $O(n^2 / \log n)$ algorithm for inputs over a constant-size alphabet. For most texts, the time complexity is actually $O(h n^2 / \log n)$, where $h \le 1$ is the entropy of the text. We also present an algorithm that compares two run-length encoded strings of lengths $m$ and $n$, compressed into $m'$ and $n'$ runs respectively, in $O(m'n + n'm)$ time. This result extends to all distance or similarity scoring schemes that use an additive gap penalty.
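For reference, the quadratic baseline cited above can be sketched as the textbook dynamic program below. This is a minimal sketch, not the authors' code: the names `similarity`, `score` and `gap`, the single additive gap score, and the example weights are assumptions made for illustration. The scoring table may hold arbitrary real weights (the "unrestricted" setting), and the `local` flag switches to the Smith-Waterman variant, which clamps cells at zero and reports the best cell anywhere in the matrix.

```python
from typing import Dict, Tuple

# Scoring matrix: any real weight per character pair is allowed (no metric restrictions).
Score = Dict[Tuple[str, str], float]

def similarity(a: str, b: str, score: Score, gap: float, local: bool = False) -> float:
    """Classical O(len(a) * len(b)) dynamic program, keeping one row at a time.

    gap is the additive score charged per inserted or deleted character (usually negative).
    With local=True, cells are clamped at 0 and the maximum cell is returned (Smith-Waterman).
    """
    n = len(b)
    prev = [0.0] * (n + 1)
    if not local:
        # Global alignment: the first row aligns a prefix of b against nothing.
        for j in range(1, n + 1):
            prev[j] = prev[j - 1] + gap
    best = 0.0
    for i in range(1, len(a) + 1):
        cur = [0.0] * (n + 1)
        cur[0] = 0.0 if local else prev[0] + gap
        for j in range(1, n + 1):
            v = max(prev[j - 1] + score[(a[i - 1], b[j - 1])],  # substitute or match
                    prev[j] + gap,                               # delete a[i-1]
                    cur[j - 1] + gap)                            # insert b[j-1]
            if local:
                v = max(v, 0.0)
                best = max(best, v)
            cur[j] = v
        prev = cur
    return best if local else prev[n]

# Hypothetical example weights: +2 for a match, -1 for a mismatch.
ALPHABET = "acgt"
EXAMPLE_SCORE = {(x, y): (2.0 if x == y else -1.0) for x in ALPHABET for y in ALPHABET}

print(similarity("tacg", "gacc", EXAMPLE_SCORE, gap=-2.0))              # 2.0 (global)
print(similarity("tacg", "gacc", EXAMPLE_SCORE, gap=-2.0, local=True))  # 4.0 (local)
```

Every one of the $n^2$ cells is filled, which is the quadratic cost that the block decomposition is designed to avoid.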
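The block decomposition mentioned in the abstract starts from a Lempel-Ziv parse of each string. The sketch below illustrates this under stated assumptions. It uses the standard LZ78 phrase rule, in which every new phrase is an earlier phrase extended by one character, stored in a dictionary-based trie. The helper names `lz78_phrases` and `block_boundaries` are hypothetical. It also assumes, as the abstract describes, that each block of the dynamic programming matrix pairs one phrase of each string. It only computes where the block rows and columns fall; it does not implement the paper's per-block computation.

```python
from typing import Dict, List, Tuple

def lz78_phrases(s: str) -> List[str]:
    """Split s into LZ78 phrases; each new phrase is a previous phrase
    (or the empty string) extended by exactly one character."""
    trie: Dict[Tuple[int, str], int] = {}  # (parent phrase id, char) -> phrase id; id 0 = empty phrase
    phrases: List[str] = []
    node, start = 0, 0
    for i, ch in enumerate(s):
        nxt = trie.get((node, ch))
        if nxt is None:
            # s[start:i+1] is a known phrase plus ch: record it as a new phrase
            phrases.append(s[start:i + 1])
            trie[(node, ch)] = len(phrases)
            node, start = 0, i + 1
        else:
            node = nxt  # the current piece still matches a known phrase; keep extending
    if start < len(s):
        # The trailing piece repeats an earlier phrase; keep it so the parse covers all of s.
        phrases.append(s[start:])
    return phrases

def block_boundaries(phrases: List[str]) -> List[int]:
    """End offsets of the phrases, i.e. the matrix rows (or columns) where blocks end."""
    bounds, pos = [], 0
    for p in phrases:
        pos += len(p)
        bounds.append(pos)
    return bounds

a, b = "abaababa", "aabbab"
pa, pb = lz78_phrases(a), lz78_phrases(b)
print(pa)                             # ['a', 'b', 'aa', 'ba', 'ba']
print(block_boundaries(pa))           # [1, 2, 4, 6, 8]
print(len(pa) * len(pb), "blocks")    # 20 blocks (5 phrases of a times 4 phrases of b)
```

Because each phrase is an earlier phrase plus one character, the rows of a block repeat the rows of a block computed earlier. This is the kind of repetition a block-based method can reuse instead of recomputing cell by cell.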

© 2003 Society for Industrial and Applied Mathematics


PUBLICATION DATA

ISSN

0097-5397 (print)  
1095-7111 (online)
