Abstract

Measuring sequence similarity and compressing texts are among the most fundamental tasks in string algorithms. In this work, we develop near-optimal *quantum* algorithms for the central problems in these two areas: computing the edit distance of two strings [Levenshtein, 1965] and building the Lempel-Ziv factorization of a string [Ziv & Lempel, 1977], respectively.
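As a concrete reference point for the first of these problems, the following is a minimal sketch of the textbook classical dynamic program for edit distance (insertions, deletions, and substitutions at unit cost). It is not the quantum algorithm of this work; the function name `edit_distance` is chosen here for illustration only.

```python
def edit_distance(a: str, b: str) -> int:
    """Classical dynamic program; time O(len(a) * len(b)), space O(len(b))."""
    # prev[j] holds the edit distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
```

The table has one cell per pair of prefixes, which is the source of the quadratic running time for two strings of equal length.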

Classically, the edit distance of two length-*n* strings can be computed in *𝒪*(*n*^{2}) time, and there is little hope for a significantly faster algorithm: an *𝒪*(*n*^{2-ɛ})-time procedure would falsify the Strong Exponential Time Hypothesis (SETH). Quantum computers might circumvent this lower bound, but even 3-approximation of edit distance is not known to admit an *𝒪*(*n*^{2-ɛ})-time quantum algorithm. In the *bounded* setting, where the complexity is parameterized by the value *k* of the edit distance, there is an *𝒪*(*n* + *k*^{2})-time classical algorithm [Myers, 1986; Landau & Vishkin, 1988], which is optimal (up to sub-polynomial factors and conditioned on SETH) as a function of *n* and *k*. Our first main contribution is a quantum *Õ*(√(*nk*) + *k*^{2})-time algorithm that uses *Õ*(√(*nk*)) queries, where the *Õ*(·) notation hides polylogarithmic factors. This query complexity is unconditionally optimal, and any significant improvement in the time complexity would break the quadratic barrier for the unbounded setting. Interestingly, our divide-and-conquer quantum algorithm reduces the bounded edit distance problem to the special case where the two input strings have small Lempel-Ziv factorizations. Then, it combines our quantum LZ compression algorithm with a classical subroutine computing edit distance between compressed strings.

The LZ factorization problem can be classically solved in *𝒪*(*n*) time, which is unconditionally optimal in the quantum setting (even for computing just the size *z* of the factorization). We can, however, hope for a quantum speedup if we parameterize the complexity in terms of *z*. Already a generic oracle identification algorithm [Kothari, 2014] yields the optimal query complexity of *Õ*(√(*nz*)) at the price of exponential running time. Our second main contribution is a quantum algorithm that also achieves the optimal time complexity of *Õ*(√(*nz*)). The key insight is the introduction of a novel LZ-like factorization of size *𝒪*(*z* log^{2} *n*), which allows us to efficiently compute each new factor through a combination of classical and quantum algorithmic techniques. From this, we obtain the desired LZ factorization. Using existing results [Kempa & Kociumaka, 2020], we can then obtain the string's run-length encoded Burrows-Wheeler Transform (BWT), another classical compressor [Burrows & Wheeler, 1994], as well as a structure for longest common extension (LCE) queries, in *Õ*(*z*) extra time [I, 2017; Nishimoto et al., 2016].
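To make the parameter *z* concrete, the sketch below computes a Lempel-Ziv factorization classically by brute force. It assumes the common convention in which each phrase is either the first occurrence of a character or the longest substring that also occurs starting at an earlier position (overlap with the phrase itself allowed); the exact variant used in the full paper may differ in details. The function name `lz_factorize` is hypothetical, and this naive version takes quadratic time or worse, unlike the linear-time classical algorithms referred to above.

```python
def lz_factorize(s: str) -> list[str]:
    """Naive LZ77-style factorization; returns the list of phrases."""
    phrases = []
    n = len(s)
    i = 0
    while i < n:
        best_len = 0
        # Try every earlier starting position j and extend the match greedily.
        for j in range(i):
            length = 0
            # j + length < i + length < n, so both indices stay in range.
            while i + length < n and s[j + length] == s[i + length]:
                length += 1
            best_len = max(best_len, length)
        if best_len == 0:
            # First occurrence of this character: a phrase of length one.
            phrases.append(s[i])
            i += 1
        else:
            phrases.append(s[i:i + best_len])
            i += best_len
    return phrases


if __name__ == "__main__":
    phrases = lz_factorize("abcabcabc")
    print(phrases, "z =", len(phrases))
```

Highly repetitive inputs produce few phrases, so *z* can be far smaller than *n*; this is the regime in which a complexity parameterized by *z* improves on bounds stated in terms of *n* alone.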

Lastly, we obtain efficient indexes of size *𝒪*(*z*) for counting and reporting the occurrences of a given pattern and for supporting more general suffix array and inverse suffix array queries, based on the recent *r-index* [Gagie, Navarro, and Prezza, 2020]. These indexes can be constructed in *Õ*(√(*nz*)) quantum time, which allows us to solve many fundamental problems, like longest common substring, maximal unique matches, and Lyndon factorization, in time *Õ*(√(*nz*)).
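For readers unfamiliar with the run-length encoded BWT that underlies the r-index, the following is a small classical sketch. It builds the BWT by sorting all rotations, which is far slower than the constructions discussed in this work and is meant only to show what is being encoded. The helper names `bwt` and `run_length_encode` are hypothetical, as is the choice of `$` as an end marker assumed to be absent from the input and smaller than every other character.

```python
from itertools import groupby


def bwt(s: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler Transform via sorted rotations (illustration only)."""
    if sentinel in s:
        raise ValueError("sentinel must not occur in the input")
    t = s + sentinel
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    # The BWT is the last column of the sorted rotation matrix.
    return "".join(rotation[-1] for rotation in rotations)


def run_length_encode(text: str) -> list[tuple[str, int]]:
    """Collapse maximal runs of equal characters into (char, length) pairs."""
    return [(ch, len(list(run))) for ch, run in groupby(text)]


if __name__ == "__main__":
    transformed = bwt("banana")
    runs = run_length_encode(transformed)
    print(transformed, runs, "number of runs =", len(runs))
```

The number of runs in the transformed string is the quantity usually denoted *r*, and it is the space measure of the r-index.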

^{*} The full version of the paper can be accessed at https://arxiv.org/abs/2311.01793