Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms

Massively Parallel Approximation Algorithms for Edit Distance and Longest Common Subsequence

Abstract

Computing string similarity measures is among the most fundamental problems in computer science. Notable examples are edit distance (ED) and longest common subsequence (LCS). These problems find applications in various contexts such as computational biology, text processing, compiler optimization, data analysis, and image analysis. In this work, we revisit edit distance and longest common subsequence in the parallel setting. We present massively parallel algorithms for both problems that are optimal in the following senses:
The approximation factor of our algorithms is 1 + ε.
The round complexity of our algorithms is constant.
The total running time of our algorithms over all machines is Õ(n²). This matches the running time of the best-known solutions for approximating edit distance and longest common subsequence within a 1 + ε factor in the sequential setting.
Our result for edit distance substantially improves the massively parallel algorithm of [15] in terms of approximation factor, round complexity, number of machines, and total running time. Our unified approach to tackle both problems is to divide one of the strings into smaller blocks and try to locally predict which intervals of the other string correspond to each block in an optimal solution.
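As a rough illustration of the block-based idea, the following Python sketch splits one string into blocks, computes the cost of matching each block against intervals of the other string, and stitches the per-block costs together with a small dynamic program. It is not the algorithm of the paper: the function names and the `block_size` parameter are hypothetical, and the sketch tries every interval of the second string, so it is exact and slow, whereas the abstract describes predicting a restricted set of candidate intervals to obtain a 1 + ε approximation. The per-block step is the part that could be assigned to separate machines.

```python
def edit_distance(a: str, b: str) -> int:
    """Textbook quadratic-time dynamic program for edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def blocked_edit_distance(s: str, t: str, block_size: int) -> int:
    """Edit distance of s and t via per-block costs (hypothetical helper)."""
    # Split s into consecutive blocks; keep one empty block if s is empty.
    blocks = [s[i:i + block_size] for i in range(0, len(s), block_size)] or [""]
    n = len(t)
    inf = float("inf")
    # best[b] = cheapest cost of transforming the blocks seen so far into t[:b].
    best = [0] + [inf] * n
    for block in blocks:
        # Local step: depends only on this block, so blocks are independent.
        # Assumption: all intervals t[a:b] are tried (no candidate pruning).
        local = {(a, b): edit_distance(block, t[a:b])
                 for a in range(n + 1) for b in range(a, n + 1)}
        # Stitching step: choose where the previous block's interval ended.
        best = [min(best[a] + local[a, b] for a in range(b + 1))
                for b in range(n + 1)]
    return best[n]


print(blocked_edit_distance("kitten", "sitting", 2))  # 3
```

The stitching step is valid because an optimal alignment maps consecutive blocks of the first string to consecutive, non-overlapping intervals of the second, so the total cost is a sum of per-block costs.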
Our main technical contribution is a novel parallel algorithm for computing a set of compositions, and recursively decomposing each function into a set of smaller iterative compositions (in terms of memory needed to solve the problem). These two methods together give us a strong tool for approximating combinatorial problems. For instance, LCS can be formulated as a recursive composition of functions and therefore this tool enables us to approximate LCS within a factor 1 + ε. Indeed, we recursively decompose the problem until we are able to compute the solution on a single machine. Since our methods are quite general, we expect this technique to find its applications in other combinatorial problems as well.
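One way to see LCS as a composition of functions is sketched below in Python, under assumptions that are ours and not taken from the paper. Each block of the first string is represented by a table whose entry (a, b) is the LCS of that block with the interval t[a:b] of the second string, and two tables are composed with a (max, +) product. Because this product is associative, the tables can be combined pairwise in a tree. The names `block_table`, `compose`, and `lcs_by_composition` are hypothetical, and the sketch is exact and unoptimized; it does not reproduce the memory-bounded recursive decomposition or the 1 + ε approximation.

```python
NEG_INF = float("-inf")


def lcs(a: str, b: str) -> int:
    """Textbook quadratic-time dynamic program for LCS length."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def block_table(block: str, t: str) -> list:
    """Entry [a][b] is LCS(block, t[a:b]); invalid intervals get -infinity."""
    n = len(t)
    return [[lcs(block, t[a:b]) if a <= b else NEG_INF for b in range(n + 1)]
            for a in range(n + 1)]


def compose(f: list, g: list) -> list:
    """(max, +) product: best split point m between two adjacent blocks."""
    size = len(f)
    return [[max(f[a][m] + g[m][b] for m in range(size)) for b in range(size)]
            for a in range(size)]


def lcs_by_composition(s: str, t: str, block_size: int) -> int:
    """LCS length of s and t obtained by composing per-block tables."""
    blocks = [s[i:i + block_size] for i in range(0, len(s), block_size)] or [""]
    tables = [block_table(block, t) for block in blocks]
    # Tree reduction: combine neighbouring tables level by level.
    while len(tables) > 1:
        tables = [compose(tables[i], tables[i + 1]) if i + 1 < len(tables)
                  else tables[i]
                  for i in range(0, len(tables), 2)]
    return tables[0][0][len(t)]


print(lcs_by_composition("AGGTAB", "GXTXAYB", 2))  # 4
```

Each level of the tree halves the number of tables, so the number of combining levels is logarithmic in the number of blocks in this sketch.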

Information & Authors

Information

Published In

Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms
Pages: 1654 - 1672
Editor: Timothy M. Chan, University of Illinois at Urbana-Champaign, USA
ISBN (Online): 978-1-61197-548-2

History

Published online: 2 January 2019

Authors

Affiliations

MohammadTaghi Hajiaghayi

Notes

*
A portion of this work was completed while some of the authors were visiting the Simons Institute for the Theory of Computing.
Supported in part by NSF CAREER award CCF-1053605, NSF AF: Medium grant CCF-1161365, NSF BIGDATA grant IIS-1546108, NSF SPX grant CCF-1822738, and two small grants: a UMD AI in Business and Society Seed Grant and a UMD Year of Data Science Program Grant.
Details of the algorithm for edit distance and the 1 + ε approximation algorithm for LCS are included in the full version of the paper.
