Suffix Array

Definition

Let $s$ be a string of length $n$. The $i$-th suffix of $s$ is the substring $s[i \ldots n - 1]$.

A suffix array will contain integers that represent the starting indexes of the all the suffixes of a given string, after the aforementioned suffixes are sorted.

As an example look at the string $s = abaab$. All suffixes are as follows

$$\begin{array}{ll} 0. & abaab \\ 1. & baab \\ 2. & aab \\ 3. & ab \\ 4. & b \end{array}$$

After sorting these strings:

$$\begin{array}{ll} 2. & aab \\ 3. & ab \\ 0. & abaab \\ 4. & b \\ 1. & baab \end{array}$$

Therefore the suffix array for $s$ will be $(2,~ 3,~ 0,~ 4,~ 1)$.

As a data structure it is widely used in areas such as data compression, bioinformatics and, in general, in any area that deals with strings and string matching problems.

Construction

$O(n^2 \log n)$ approach

This is the most naive approach. Get all the suffixes and sort them using quicksort or mergesort and simultaneously retain their original indices. Sorting uses $O(n \log n)$ comparisons, and since comparing two strings will additionally take $O(n)$ time, we get the final complexity of $O(n^2 \log n)$.