This post is based on the lecture “Selected Topics in String Algorithms”1 delivered by Ce Jin.
Preliminaries
An alphabet is a finite and nonempty set, usually denoted by $\Sigma$. The elements of $\Sigma$ are called characters. A string of length $n$ is a sequence of characters $(s_1, s_2, \ldots, s_{n})$ with $s_i\in \Sigma$, and is denoted by $s[1\ldots n]$. We write $|s|$ for the length of a string $s$. A substring $s[i\ldots j] (1\leq i\leq j\leq n)$ of $s$ is the sequence $(s_i, s_{i+1}, \ldots, s_{j})$. We use $s[i]=s_i$ to denote the $i$-th character of $s$. For simplicity, we also write $s[i\ldots j]=s[i]s[i+1]\ldots s[j]$ as shorthand for the sequence $(s_i,s_{i+1},\ldots, s_{j})$2. A prefix of $s$ of length $x$, denoted $pre(s, x)$, is the substring $s[1\ldots x]$; similarly, a suffix of length $x$, denoted $suf(s,x)$, is the substring $s[n-x+1\ldots n]$.
String Matching Problem
A string matching problem is stated as follows: given 2 strings $s$ and $t$ of length $n$ and $m (n\leq m)$, respectively. Determine that if there is a substring $t[i\ldots i+n-1], (1\leq i\leq m-n+1)$ of $t$ such that $s=t[i\ldots i+n-1]$.
I’ll stop here today! I’m too lazy!!!
Brute Force Method
Borders and Periodicity
Prefix Function
Efficient Algorithm for Prefix Function
Periodicity Lemma
Applications
-
中文:「字符串算法选讲」。You might find this recording video on bilibili useful. ↩
-
$s[i\ldots i]$ can also be written as $s[i]$. ↩