You are given a string <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>s</mi></math>$s$ of length <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>n</mi></math>$n$.
The prefix function for this string is defined as an array <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>π</mi></math>$\pi$ of length <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>n</mi></math>$n$, where <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>π</mi><mo stretchy="false">[</mo><mi>i</mi><mo stretchy="false">]</mo></math>$\pi[i]$ is the length of the longest proper prefix of the substring <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>s</mi><mo stretchy="false">[</mo><mn>0</mn><mo>…</mo><mi>i</mi><mo stretchy="false">]</mo></math>$s[0 \dots i]$ which is also a suffix of this substring.
A proper prefix of a string is a prefix that is not equal to the string itself.
By definition, <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>π</mi><mo stretchy="false">[</mo><mn>0</mn><mo stretchy="false">]</mo><mo>=</mo><mn>0</mn></math>$\pi[0] = 0$.
Mathematically the definition of the prefix function can be written as follows:
For example, prefix function of string "abcabcd" is <math xmlns="http://www.w3.org/1998/Math/MathML"><mo stretchy="false">[</mo><mn>0</mn><mo>,</mo><mn>0</mn><mo>,</mo><mn>0</mn><mo>,</mo><mn>1</mn><mo>,</mo><mn>2</mn><mo>,</mo><mn>3</mn><mo>,</mo><mn>0</mn><mo stretchy="false">]</mo></math>$[0, 0, 0, 1, 2, 3, 0]$, and prefix function of string "aabaaab" is <math xmlns="http://www.w3.org/1998/Math/MathML"><mo stretchy="false">[</mo><mn>0</mn><mo>,</mo><mn>1</mn><mo>,</mo><mn>0</mn><mo>,</mo><mn>1</mn><mo>,</mo><mn>2</mn><mo>,</mo><mn>2</mn><mo>,</mo><mn>3</mn><mo stretchy="false">]</mo></math>$[0, 1, 0, 1, 2, 2, 3]$.
It is easy to see that its complexity is <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>O</mi><mo stretchy="false">(</mo><msup><mi>n</mi><mn>3</mn></msup><mo stretchy="false">)</mo></math>$O(n^3)$, which has room for improvement.
This algorithm was proposed by Knuth and Pratt and independently from them by Morris in 1977.
It was used as the main function of a substring search algorithm.
The first important observation is, that the values of the prefix function can only increase by at most one.
Indeed, otherwise, if <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>π</mi><mo stretchy="false">[</mo><mi>i</mi><mo>+</mo><mn>1</mn><mo stretchy="false">]</mo><mo>></mo><mi>π</mi><mo stretchy="false">[</mo><mi>i</mi><mo stretchy="false">]</mo><mo>+</mo><mn>1</mn></math>$\pi[i + 1] \gt \pi[i] + 1$, then we can take this suffix ending in position <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>i</mi><mo>+</mo><mn>1</mn></math>$i + 1$ with the length <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>π</mi><mo stretchy="false">[</mo><mi>i</mi><mo>+</mo><mn>1</mn><mo stretchy="false">]</mo></math>$\pi[i + 1]$ and remove the last character from it.
We end up with a suffix ending in position <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>i</mi></math>$i$ with the length <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>π</mi><mo stretchy="false">[</mo><mi>i</mi><mo>+</mo><mn>1</mn><mo stretchy="false">]</mo><mo>−</mo><mn>1</mn></math>$\pi[i + 1] - 1$, which is better than <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>π</mi><mo stretchy="false">[</mo><mi>i</mi><mo stretchy="false">]</mo></math>$\pi[i]$, i.e. we get a contradiction.
The following illustration shows this contradiction.
The longest proper suffix at position <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>i</mi></math>$i$ that also is a prefix is of length <math xmlns="http://www.w3.org/1998/Math/MathML"><mn>2</mn></math>$2$, and at position <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>i</mi><mo>+</mo><mn>1</mn></math>$i+1$ it is of length <math xmlns="http://www.w3.org/1998/Math/MathML"><mn>4</mn></math>$4$.
Therefore the string <math xmlns="http://www.w3.org/1998/Math/MathML"><msub><mi>s</mi><mn>0</mn></msub><mtext> </mtext><msub><mi>s</mi><mn>1</mn></msub><mtext> </mtext><msub><mi>s</mi><mn>2</mn></msub><mtext> </mtext><msub><mi>s</mi><mn>3</mn></msub></math>$s_0 ~ s_1 ~ s_2 ~ s_3$ is equal to the string <math xmlns="http://www.w3.org/1998/Math/MathML"><msub><mi>s</mi><mrow data-mjx-texclass="ORD"><mi>i</mi><mo>−</mo><mn>2</mn></mrow></msub><mtext> </mtext><msub><mi>s</mi><mrow data-mjx-texclass="ORD"><mi>i</mi><mo>−</mo><mn>1</mn></mrow></msub><mtext> </mtext><msub><mi>s</mi><mi>i</mi></msub><mtext> </mtext><msub><mi>s</mi><mrow data-mjx-texclass="ORD"><mi>i</mi><mo>+</mo><mn>1</mn></mrow></msub></math>$s_{i-2} ~ s_{i-1} ~ s_i ~ s_{i+1}$, which means that also the strings <math xmlns="http://www.w3.org/1998/Math/MathML"><msub><mi>s</mi><mn>0</mn></msub><mtext> </mtext><msub><mi>s</mi><mn>1</mn></msub><mtext> </mtext><msub><mi>s</mi><mn>2</mn></msub></math>$s_0 ~ s_1 ~ s_2$ and <math xmlns="http://www.w3.org/1998/Math/MathML"><msub><mi>s</mi><mrow data-mjx-texclass="ORD"><mi>i</mi><mo>−</mo><mn>2</mn></mrow></msub><mtext> </mtext><msub><mi>s</mi><mrow data-mjx-texclass="ORD"><mi>i</mi><mo>−</mo><mn>1</mn></mrow></msub><mtext> </mtext><msub><mi>s</mi><mi>i</mi></msub></math>$s_{i-2} ~ s_{i-1} ~ s_i$ are equal, therefore <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>π</mi><mo stretchy="false">[</mo><mi>i</mi><mo stretchy="false">]</mo></math>$\pi[i]$ has to be <math xmlns="http://www.w3.org/1998/Math/MathML"><mn>3</mn></math>$3$.
Thus when moving to the next position, the value of the prefix function can either increase by one, stay the same, or decrease by some amount.
This fact already allows us to reduce the complexity of the algorithm to <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>O</mi><mo stretchy="false">(</mo><msup><mi>n</mi><mn>2</mn></msup><mo stretchy="false">)</mo></math>$O(n^2)$, because in one step the prefix function can grow at most by one.
In total the function can grow at most <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>n</mi></math>$n$ steps, and therefore also only can decrease a total of <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>n</mi></math>$n$ steps.
This means we only have to perform <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>O</mi><mo stretchy="false">(</mo><mi>n</mi><mo stretchy="false">)</mo></math>$O(n)$ string comparisons, and reach the complexity <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>O</mi><mo stretchy="false">(</mo><msup><mi>n</mi><mn>2</mn></msup><mo stretchy="false">)</mo></math>$O(n^2)$.
Let's go further, we want to get rid of the string comparisons.
To accomplish this, we have to use all the information computed in the previous steps.
So let us compute the value of the prefix function <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>π</mi></math>$\pi$ for <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>i</mi><mo>+</mo><mn>1</mn></math>$i + 1$.
If <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>s</mi><mo stretchy="false">[</mo><mi>i</mi><mo>+</mo><mn>1</mn><mo stretchy="false">]</mo><mo>=</mo><mi>s</mi><mo stretchy="false">[</mo><mi>π</mi><mo stretchy="false">[</mo><mi>i</mi><mo stretchy="false">]</mo><mo stretchy="false">]</mo></math>$s[i+1] = s[\pi[i]]$, then we can say with certainty that <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>π</mi><mo stretchy="false">[</mo><mi>i</mi><mo>+</mo><mn>1</mn><mo stretchy="false">]</mo><mo>=</mo><mi>π</mi><mo stretchy="false">[</mo><mi>i</mi><mo stretchy="false">]</mo><mo>+</mo><mn>1</mn></math>$\pi[i+1] = \pi[i] + 1$, since we already know that the suffix at position <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>i</mi></math>$i$ of length <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>π</mi><mo stretchy="false">[</mo><mi>i</mi><mo stretchy="false">]</mo></math>$\pi[i]$ is equal to the prefix of length <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>π</mi><mo stretchy="false">[</mo><mi>i</mi><mo stretchy="false">]</mo></math>$\pi[i]$.
This is illustrated again with an example.
If this is not the case, <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>s</mi><mo stretchy="false">[</mo><mi>i</mi><mo>+</mo><mn>1</mn><mo stretchy="false">]</mo><mo>≠</mo><mi>s</mi><mo stretchy="false">[</mo><mi>π</mi><mo stretchy="false">[</mo><mi>i</mi><mo stretchy="false">]</mo><mo stretchy="false">]</mo></math>$s[i+1] \neq s[\pi[i]]$, then we need to try a shorter string.
In order to speed things up, we would like to immediately move to the longest length <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>j</mi><mo><</mo><mi>π</mi><mo stretchy="false">[</mo><mi>i</mi><mo stretchy="false">]</mo></math>$j \lt \pi[i]$, such that the prefix property in the position <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>i</mi></math>$i$ holds, i.e. <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>s</mi><mo stretchy="false">[</mo><mn>0</mn><mo>…</mo><mi>j</mi><mo>−</mo><mn>1</mn><mo stretchy="false">]</mo><mo>=</mo><mi>s</mi><mo stretchy="false">[</mo><mi>i</mi><mo>−</mo><mi>j</mi><mo>+</mo><mn>1</mn><mo>…</mo><mi>i</mi><mo stretchy="false">]</mo></math>$s[0 \dots j-1] = s[i-j+1 \dots i]$:
Indeed, if we find such a length <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>j</mi></math>$j$, then we again only need to compare the characters <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>s</mi><mo stretchy="false">[</mo><mi>i</mi><mo>+</mo><mn>1</mn><mo stretchy="false">]</mo></math>$s[i+1]$ and <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>s</mi><mo stretchy="false">[</mo><mi>j</mi><mo stretchy="false">]</mo></math>$s[j]$.
If they are equal, then we can assign <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>π</mi><mo stretchy="false">[</mo><mi>i</mi><mo>+</mo><mn>1</mn><mo stretchy="false">]</mo><mo>=</mo><mi>j</mi><mo>+</mo><mn>1</mn></math>$\pi[i+1] = j + 1$.
Otherwise we will need to find the largest value smaller than <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>j</mi></math>$j$, for which the prefix property holds, and so on.
It can happen that this goes until <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>j</mi><mo>=</mo><mn>0</mn></math>$j = 0$.
If then <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>s</mi><mo stretchy="false">[</mo><mi>i</mi><mo>+</mo><mn>1</mn><mo stretchy="false">]</mo><mo>=</mo><mi>s</mi><mo stretchy="false">[</mo><mn>0</mn><mo stretchy="false">]</mo></math>$s[i+1] = s[0]$, we assign <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>π</mi><mo stretchy="false">[</mo><mi>i</mi><mo>+</mo><mn>1</mn><mo stretchy="false">]</mo><mo>=</mo><mn>1</mn></math>$\pi[i+1] = 1$, and <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>π</mi><mo stretchy="false">[</mo><mi>i</mi><mo>+</mo><mn>1</mn><mo stretchy="false">]</mo><mo>=</mo><mn>0</mn></math>$\pi[i+1] = 0$ otherwise.
So we already have a general scheme of the algorithm.
The only question left is how do we effectively find the lengths for <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>j</mi></math>$j$.
Let's recap:
for the current length <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>j</mi></math>$j$ at the position <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>i</mi></math>$i$ for which the prefix property holds, i.e. <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>s</mi><mo stretchy="false">[</mo><mn>0</mn><mo>…</mo><mi>j</mi><mo>−</mo><mn>1</mn><mo stretchy="false">]</mo><mo>=</mo><mi>s</mi><mo stretchy="false">[</mo><mi>i</mi><mo>−</mo><mi>j</mi><mo>+</mo><mn>1</mn><mo>…</mo><mi>i</mi><mo stretchy="false">]</mo></math>$s[0 \dots j-1] = s[i-j+1 \dots i]$, we want to find the greatest <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>k</mi><mo><</mo><mi>j</mi></math>$k \lt j$, for which the prefix property holds.
The illustration shows, that this has to be the value of <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>π</mi><mo stretchy="false">[</mo><mi>j</mi><mo>−</mo><mn>1</mn><mo stretchy="false">]</mo></math>$\pi[j-1]$, which we already calculated earlier.
So we finally can build an algorithm that doesn't perform any string comparisons and only performs <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>O</mi><mo stretchy="false">(</mo><mi>n</mi><mo stretchy="false">)</mo></math>$O(n)$ actions.
Here is the final procedure:
We compute the prefix values <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>π</mi><mo stretchy="false">[</mo><mi>i</mi><mo stretchy="false">]</mo></math>$\pi[i]$ in a loop by iterating from <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>i</mi><mo>=</mo><mn>1</mn></math>$i = 1$ to <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>i</mi><mo>=</mo><mi>n</mi><mo>−</mo><mn>1</mn></math>$i = n-1$ (<math xmlns="http://www.w3.org/1998/Math/MathML"><mi>π</mi><mo stretchy="false">[</mo><mn>0</mn><mo stretchy="false">]</mo></math>$\pi[0]$ just gets assigned with <math xmlns="http://www.w3.org/1998/Math/MathML"><mn>0</mn></math>$0$).
To calculate the current value <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>π</mi><mo stretchy="false">[</mo><mi>i</mi><mo stretchy="false">]</mo></math>$\pi[i]$ we set the variable <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>j</mi></math>$j$ denoting the length of the best suffix for <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>i</mi><mo>−</mo><mn>1</mn></math>$i-1$. Initially <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>j</mi><mo>=</mo><mi>π</mi><mo stretchy="false">[</mo><mi>i</mi><mo>−</mo><mn>1</mn><mo stretchy="false">]</mo></math>$j = \pi[i-1]$.
Test if the suffix of length <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>j</mi><mo>+</mo><mn>1</mn></math>$j+1$ is also a prefix by comparing <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>s</mi><mo stretchy="false">[</mo><mi>j</mi><mo stretchy="false">]</mo></math>$s[j]$ and <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>s</mi><mo stretchy="false">[</mo><mi>i</mi><mo stretchy="false">]</mo></math>$s[i]$.
If they are equal then we assign <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>π</mi><mo stretchy="false">[</mo><mi>i</mi><mo stretchy="false">]</mo><mo>=</mo><mi>j</mi><mo>+</mo><mn>1</mn></math>$\pi[i] = j + 1$, otherwise we reduce <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>j</mi></math>$j$ to <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>π</mi><mo stretchy="false">[</mo><mi>j</mi><mo>−</mo><mn>1</mn><mo stretchy="false">]</mo></math>$\pi[j-1]$ and repeat this step.
If we have reached the length <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>j</mi><mo>=</mo><mn>0</mn></math>$j = 0$ and still don't have a match, then we assign <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>π</mi><mo stretchy="false">[</mo><mi>i</mi><mo stretchy="false">]</mo><mo>=</mo><mn>0</mn></math>$\pi[i] = 0$ and go to the next index <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>i</mi><mo>+</mo><mn>1</mn></math>$i + 1$.
This is an online algorithm, i.e. it processes the data as it arrives - for example, you can read the string characters one by one and process them immediately, finding the value of prefix function for each next character.
The algorithm still requires storing the string itself and the previously calculated values of prefix function, but if we know beforehand the maximum value <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>M</mi></math>$M$ the prefix function can take on the string, we can store only <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>M</mi><mo>+</mo><mn>1</mn></math>$M+1$ first characters of the string and the same number of values of the prefix function.
Search for a substring in a string. The Knuth-Morris-Pratt algorithm¶
The task is the classical application of the prefix function.
Given a text <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>t</mi></math>$t$ and a string <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>s</mi></math>$s$, we want to find and display the positions of all occurrences of the string <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>s</mi></math>$s$ in the text <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>t</mi></math>$t$.
For convenience we denote with <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>n</mi></math>$n$ the length of the string s and with <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>m</mi></math>$m$ the length of the text <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>t</mi></math>$t$.
We generate the string <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>s</mi><mo>+</mo><mi mathvariant="normal">#</mi><mo>+</mo><mi>t</mi></math>$s + \# + t$, where <math xmlns="http://www.w3.org/1998/Math/MathML"><mi mathvariant="normal">#</mi></math>$\#$ is a separator that appears neither in <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>s</mi></math>$s$ nor in <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>t</mi></math>$t$.
Let us calculate the prefix function for this string.
Now think about the meaning of the values of the prefix function, except for the first <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>n</mi><mo>+</mo><mn>1</mn></math>$n + 1$ entries (which belong to the string <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>s</mi></math>$s$ and the separator).
By definition the value <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>π</mi><mo stretchy="false">[</mo><mi>i</mi><mo stretchy="false">]</mo></math>$\pi[i]$ shows the longest length of a substring ending in position <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>i</mi></math>$i$ that coincides with the prefix.
But in our case this is nothing more than the largest block that coincides with <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>s</mi></math>$s$ and ends at position <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>i</mi></math>$i$.
This length cannot be bigger than <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>n</mi></math>$n$ due to the separator.
But if equality <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>π</mi><mo stretchy="false">[</mo><mi>i</mi><mo stretchy="false">]</mo><mo>=</mo><mi>n</mi></math>$\pi[i] = n$ is achieved, then it means that the string <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>s</mi></math>$s$ appears completely in at this position, i.e. it ends at position <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>i</mi></math>$i$.
Just do not forget that the positions are indexed in the string <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>s</mi><mo>+</mo><mi mathvariant="normal">#</mi><mo>+</mo><mi>t</mi></math>$s + \# + t$.
Thus if at some position <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>i</mi></math>$i$ we have <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>π</mi><mo stretchy="false">[</mo><mi>i</mi><mo stretchy="false">]</mo><mo>=</mo><mi>n</mi></math>$\pi[i] = n$, then at the position <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>i</mi><mo>−</mo><mo stretchy="false">(</mo><mi>n</mi><mo>+</mo><mn>1</mn><mo stretchy="false">)</mo><mo>−</mo><mi>n</mi><mo>+</mo><mn>1</mn><mo>=</mo><mi>i</mi><mo>−</mo><mn>2</mn><mi>n</mi></math>$i - (n + 1) - n + 1 = i - 2n$ in the string <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>t</mi></math>$t$ the string <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>s</mi></math>$s$ appears.
As already mentioned in the description of the prefix function computation, if we know that the prefix values never exceed a certain value, then we do not need to store the entire string and the entire function, but only its beginning.
In our case this means that we only need to store the string <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>s</mi><mo>+</mo><mi mathvariant="normal">#</mi></math>$s + \#$ and the values of the prefix function for it.
We can read one character at a time of the string <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>t</mi></math>$t$ and calculate the current value of the prefix function.
Thus the Knuth-Morris-Pratt algorithm solves the problem in <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>O</mi><mo stretchy="false">(</mo><mi>n</mi><mo>+</mo><mi>m</mi><mo stretchy="false">)</mo></math>$O(n + m)$ time and <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>O</mi><mo stretchy="false">(</mo><mi>n</mi><mo stretchy="false">)</mo></math>$O(n)$ memory.
Counting the number of occurrences of each prefix¶
Here we discuss two problems at once.
Given a string <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>s</mi></math>$s$ of length <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>n</mi></math>$n$.
In the first variation of the problem we want to count the number of appearances of each prefix <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>s</mi><mo stretchy="false">[</mo><mn>0</mn><mo>…</mo><mi>i</mi><mo stretchy="false">]</mo></math>$s[0 \dots i]$ in the same string.
In the second variation of the problem another string <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>t</mi></math>$t$ is given and we want to count the number of appearances of each prefix <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>s</mi><mo stretchy="false">[</mo><mn>0</mn><mo>…</mo><mi>i</mi><mo stretchy="false">]</mo></math>$s[0 \dots i]$ in <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>t</mi></math>$t$.
First we solve the first problem.
Consider the value of the prefix function <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>π</mi><mo stretchy="false">[</mo><mi>i</mi><mo stretchy="false">]</mo></math>$\pi[i]$ at a position <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>i</mi></math>$i$.
By definition it means that the prefix of length <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>π</mi><mo stretchy="false">[</mo><mi>i</mi><mo stretchy="false">]</mo></math>$\pi[i]$ of the string <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>s</mi></math>$s$ occurs and ends at position <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>i</mi></math>$i$, and there is no longer prefix that follows this definition.
At the same time shorter prefixes can end at this position.
It is not difficult to see, that we have the same question that we already answered when we computed the prefix function itself:
Given a prefix of length <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>j</mi></math>$j$ that is a suffix ending at position <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>i</mi></math>$i$, what is the next smaller prefix <math xmlns="http://www.w3.org/1998/Math/MathML"><mo><</mo><mi>j</mi></math>$\lt j$ that is also a suffix ending at position <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>i</mi></math>$i$.
Thus at the position <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>i</mi></math>$i$ ends the prefix of length <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>π</mi><mo stretchy="false">[</mo><mi>i</mi><mo stretchy="false">]</mo></math>$\pi[i]$, the prefix of length <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>π</mi><mo stretchy="false">[</mo><mi>π</mi><mo stretchy="false">[</mo><mi>i</mi><mo stretchy="false">]</mo><mo>−</mo><mn>1</mn><mo stretchy="false">]</mo></math>$\pi[\pi[i] - 1]$, the prefix <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>π</mi><mo stretchy="false">[</mo><mi>π</mi><mo stretchy="false">[</mo><mi>π</mi><mo stretchy="false">[</mo><mi>i</mi><mo stretchy="false">]</mo><mo>−</mo><mn>1</mn><mo stretchy="false">]</mo><mo>−</mo><mn>1</mn><mo stretchy="false">]</mo></math>$\pi[\pi[\pi[i] - 1] - 1]$, and so on, until the index becomes zero.
Thus we can compute the answer in the following way.
Here for each value of the prefix function we first count how many times it occurs in the array <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>π</mi></math>$\pi$, and then compute the final answers:
if we know that the length prefix <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>i</mi></math>$i$ appears exactly <math xmlns="http://www.w3.org/1998/Math/MathML"><mtext>ans</mtext><mo stretchy="false">[</mo><mi>i</mi><mo stretchy="false">]</mo></math>$\text{ans}[i]$ times, then this number must be added to the number of occurrences of its longest suffix that is also a prefix.
At the end we need to add <math xmlns="http://www.w3.org/1998/Math/MathML"><mn>1</mn></math>$1$ to each result, since we also need to count the original prefixes also.
Now let us consider the second problem.
We apply the trick from Knuth-Morris-Pratt:
we create the string <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>s</mi><mo>+</mo><mi mathvariant="normal">#</mi><mo>+</mo><mi>t</mi></math>$s + \# + t$ and compute its prefix function.
The only differences to the first task is, that we are only interested in the prefix values that relate to the string <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>t</mi></math>$t$, i.e. <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>π</mi><mo stretchy="false">[</mo><mi>i</mi><mo stretchy="false">]</mo></math>$\pi[i]$ for <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>i</mi><mo>≥</mo><mi>n</mi><mo>+</mo><mn>1</mn></math>$i \ge n + 1$.
With those values we can perform the exact same computations as in the first task.
Given a string <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>s</mi></math>$s$ of length <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>n</mi></math>$n$.
We want to compute the number of different substrings appearing in it.
We will solve this problem iteratively.
Namely we will learn, knowing the current number of different substrings, how to recompute this count by adding a character to the end.
So let <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>k</mi></math>$k$ be the current number of different substrings in <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>s</mi></math>$s$, and we add the character <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>c</mi></math>$c$ to the end of <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>s</mi></math>$s$.
Obviously some new substrings ending in <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>c</mi></math>$c$ will appear.
We want to count these new substrings that didn't appear before.
We take the string <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>t</mi><mo>=</mo><mi>s</mi><mo>+</mo><mi>c</mi></math>$t = s + c$ and reverse it.
Now the task is transformed into computing how many prefixes there are that don't appear anywhere else.
If we compute the maximal value of the prefix function <math xmlns="http://www.w3.org/1998/Math/MathML"><msub><mi>π</mi><mrow data-mjx-texclass="ORD"><mtext>max</mtext></mrow></msub></math>$\pi_{\text{max}}$ of the reversed string <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>t</mi></math>$t$, then the longest prefix that appears in <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>s</mi></math>$s$ is <math xmlns="http://www.w3.org/1998/Math/MathML"><msub><mi>π</mi><mrow data-mjx-texclass="ORD"><mtext>max</mtext></mrow></msub></math>$\pi_{\text{max}}$ long.
Clearly also all prefixes of smaller length appear in it.
Therefore the number of new substrings appearing when we add a new character <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>c</mi></math>$c$ is <math xmlns="http://www.w3.org/1998/Math/MathML"><mo stretchy="false">|</mo><mi>s</mi><mrow data-mjx-texclass="ORD"><mo stretchy="false">|</mo></mrow><mo>+</mo><mn>1</mn><mo>−</mo><msub><mi>π</mi><mrow data-mjx-texclass="ORD"><mtext>max</mtext></mrow></msub></math>$|s| + 1 - \pi_{\text{max}}$.
So for each character appended we can compute the number of new substrings in <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>O</mi><mo stretchy="false">(</mo><mi>n</mi><mo stretchy="false">)</mo></math>$O(n)$ times, which gives a time complexity of <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>O</mi><mo stretchy="false">(</mo><msup><mi>n</mi><mn>2</mn></msup><mo stretchy="false">)</mo></math>$O(n^2)$ in total.
It is worth noting, that we can also compute the number of different substrings by appending the characters at the beginning, or by deleting characters from the beginning or the end.
Given a string <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>s</mi></math>$s$ of length <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>n</mi></math>$n$.
We want to find the shortest "compressed" representation of the string, i.e. we want to find a string <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>t</mi></math>$t$ of smallest length such that <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>s</mi></math>$s$ can be represented as a concatenation of one or more copies of <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>t</mi></math>$t$.
It is clear, that we only need to find the length of <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>t</mi></math>$t$. Knowing the length, the answer to the problem will be the prefix of <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>s</mi></math>$s$ with this length.
Let us compute the prefix function for <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>s</mi></math>$s$.
Using the last value of it we define the value <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>k</mi><mo>=</mo><mi>n</mi><mo>−</mo><mi>π</mi><mo stretchy="false">[</mo><mi>n</mi><mo>−</mo><mn>1</mn><mo stretchy="false">]</mo></math>$k = n - \pi[n - 1]$.
We will show, that if <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>k</mi></math>$k$ divides <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>n</mi></math>$n$, then <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>k</mi></math>$k$ will be the answer, otherwise there is no effective compression and the answer is <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>n</mi></math>$n$.
Let <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>n</mi></math>$n$ be divisible by <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>k</mi></math>$k$.
Then the string can be partitioned into blocks of the length <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>k</mi></math>$k$.
By definition of the prefix function, the prefix of length <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>n</mi><mo>−</mo><mi>k</mi></math>$n - k$ will be equal with its suffix.
But this means that the last block is equal to the block before.
And the block before has to be equal to the block before it.
And so on.
As a result, it turns out that all blocks are equal, therefore we can compress the string <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>s</mi></math>$s$ to length <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>k</mi></math>$k$.
Of course we still need to show that this is actually the optimum.
Indeed, if there was a smaller compression than <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>k</mi></math>$k$, than the prefix function at the end would be greater than <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>n</mi><mo>−</mo><mi>k</mi></math>$n - k$.
Therefore <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>k</mi></math>$k$ is really the answer.
Now let us assume that <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>n</mi></math>$n$ is not divisible by <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>k</mi></math>$k$.
We show that this implies that the length of the answer is <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>n</mi></math>$n$.
We prove it by contradiction.
Assuming there exists an answer, and the compression has length <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>p</mi></math>$p$ (<math xmlns="http://www.w3.org/1998/Math/MathML"><mi>p</mi></math>$p$ divides <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>n</mi></math>$n$).
Then the last value of the prefix function has to be greater than <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>n</mi><mo>−</mo><mi>p</mi></math>$n - p$, i.e. the suffix will partially cover the first block.
Now consider the second block of the string.
Since the prefix is equal with the suffix, and both the prefix and the suffix cover this block and their displacement relative to each other <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>k</mi></math>$k$ does not divide the block length <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>p</mi></math>$p$ (otherwise <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>k</mi></math>$k$ divides <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>n</mi></math>$n$), then all the characters of the block have to be identical.
But then the string consists of only one character repeated over and over, hence we can compress it to a string of size <math xmlns="http://www.w3.org/1998/Math/MathML"><mn>1</mn></math>$1$, which gives <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>k</mi><mo>=</mo><mn>1</mn></math>$k = 1$, and <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>k</mi></math>$k$ divides <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>n</mi></math>$n$.
Contradiction.
Building an automaton according to the prefix function¶
Let's return to the concatenation to the two strings through a separator, i.e. for the strings <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>s</mi></math>$s$ and <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>t</mi></math>$t$ we compute the prefix function for the string <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>s</mi><mo>+</mo><mi mathvariant="normal">#</mi><mo>+</mo><mi>t</mi></math>$s + \# + t$.
Obviously, since <math xmlns="http://www.w3.org/1998/Math/MathML"><mi mathvariant="normal">#</mi></math>$\#$ is a separator, the value of the prefix function will never exceed <math xmlns="http://www.w3.org/1998/Math/MathML"><mo stretchy="false">|</mo><mi>s</mi><mo stretchy="false">|</mo></math>$|s|$.
It follows, that it is sufficient to only store the string <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>s</mi><mo>+</mo><mi mathvariant="normal">#</mi></math>$s + \#$ and the values of the prefix function for it, and we can compute the prefix function for all subsequent character on the fly:
<math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><munder><mrow data-mjx-texclass="OP"><munder><mrow><msub><mi>s</mi><mn>0</mn></msub><mtext> </mtext><msub><mi>s</mi><mn>1</mn></msub><mtext> </mtext><mo>…</mo><mtext> </mtext><msub><mi>s</mi><mrow data-mjx-texclass="ORD"><mi>n</mi><mo>−</mo><mn>1</mn></mrow></msub><mtext> </mtext><mi mathvariant="normal">#</mi></mrow><mo>⏟</mo></munder></mrow><mrow data-mjx-texclass="ORD"><mtext>need to store</mtext></mrow></munder><mtext> </mtext><munder><mrow data-mjx-texclass="OP"><munder><mrow><msub><mi>t</mi><mn>0</mn></msub><mtext> </mtext><msub><mi>t</mi><mn>1</mn></msub><mtext> </mtext><mo>…</mo><mtext> </mtext><msub><mi>t</mi><mrow data-mjx-texclass="ORD"><mi>m</mi><mo>−</mo><mn>1</mn></mrow></msub></mrow><mo>⏟</mo></munder></mrow><mrow data-mjx-texclass="ORD"><mtext>do not need to store</mtext></mrow></munder></math>$$\underbrace{s_0 ~ s_1 ~ \dots ~ s_{n-1} ~ \#}_{\text{need to store}} ~ \underbrace{t_0 ~ t_1 ~ \dots ~ t_{m-1}}_{\text{do not need to store}}$$
Indeed, in such a situation, knowing the next character <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>c</mi><mo>∈</mo><mi>t</mi></math>$c \in t$ and the value of the prefix function of the previous position is enough information to compute the next value of the prefix function, without using any previous characters of the string <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>t</mi></math>$t$ and the value of the prefix function in them.
In other words, we can construct an automaton (a finite state machine): the state in it is the current value of the prefix function, and the transition from one state to another will be performed via the next character.
Thus, even without having the string <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>t</mi></math>$t$, we can construct such a transition table <math xmlns="http://www.w3.org/1998/Math/MathML"><mo stretchy="false">(</mo><msub><mtext>old</mtext><mi>π</mi></msub><mo>,</mo><mi>c</mi><mo stretchy="false">)</mo><mo stretchy="false">→</mo><msub><mtext>new</mtext><mi>π</mi></msub></math>$(\text{old}_\pi, c) \rightarrow \text{new}_\pi$ using the same algorithm as for calculating the transition table:
However in this form the algorithm runs in <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>O</mi><mo stretchy="false">(</mo><msup><mi>n</mi><mn>2</mn></msup><mn>26</mn><mo stretchy="false">)</mo></math>$O(n^2 26)$ time for the lowercase letters of the alphabet.
Note that we can apply dynamic programming and use the already calculated parts of the table.
Whenever we go from the value <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>j</mi></math>$j$ to the value <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>π</mi><mo stretchy="false">[</mo><mi>j</mi><mo>−</mo><mn>1</mn><mo stretchy="false">]</mo></math>$\pi[j-1]$, we actually mean that the transition <math xmlns="http://www.w3.org/1998/Math/MathML"><mo stretchy="false">(</mo><mi>j</mi><mo>,</mo><mi>c</mi><mo stretchy="false">)</mo></math>$(j, c)$ leads to the same state as the transition as <math xmlns="http://www.w3.org/1998/Math/MathML"><mo stretchy="false">(</mo><mi>π</mi><mo stretchy="false">[</mo><mi>j</mi><mo>−</mo><mn>1</mn><mo stretchy="false">]</mo><mo>,</mo><mi>c</mi><mo stretchy="false">)</mo></math>$(\pi[j-1], c)$, and this answer is already accurately computed.
As a result we construct the automaton in <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>O</mi><mo stretchy="false">(</mo><mn>26</mn><mi>n</mi><mo stretchy="false">)</mo></math>$O(26 n)$ time.
When is such a automaton useful?
To begin with, remember that we use the prefix function for the string <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>s</mi><mo>+</mo><mi mathvariant="normal">#</mi><mo>+</mo><mi>t</mi></math>$s + \# + t$ and its values mostly for a single purpose: find all occurrences of the string <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>s</mi></math>$s$ in the string <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>t</mi></math>$t$.
Therefore the most obvious benefit of this automaton is the acceleration of calculating the prefix function for the string <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>s</mi><mo>+</mo><mi mathvariant="normal">#</mi><mo>+</mo><mi>t</mi></math>$s + \# + t$.
By building the automaton for <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>s</mi><mo>+</mo><mi mathvariant="normal">#</mi></math>$s + \#$, we no longer need to store the string <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>s</mi></math>$s$ or the values of the prefix function in it.
All transitions are already computed in the table.
But there is a second, less obvious, application.
We can use the automaton when the string <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>t</mi></math>$t$ is a gigantic string constructed using some rules.
This can for instance be the Gray strings, or a string formed by a recursive combination of several short strings from the input.
For completeness we will solve such a problem:
given a number <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>k</mi><mo>≤</mo><msup><mn>10</mn><mn>5</mn></msup></math>$k \le 10^5$ and a string <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>s</mi></math>$s$ of length <math xmlns="http://www.w3.org/1998/Math/MathML"><mo>≤</mo><msup><mn>10</mn><mn>5</mn></msup></math>$\le 10^5$.
We have to compute the number of occurrences of <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>s</mi></math>$s$ in the <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>k</mi></math>$k$-th Gray string.
Recall that Gray's strings are define in the following way:
In such cases even constructing the string <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>t</mi></math>$t$ will be impossible, because of its astronomical length.
The <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>k</mi></math>$k$-th Gray string is <math xmlns="http://www.w3.org/1998/Math/MathML"><msup><mn>2</mn><mi>k</mi></msup><mo>−</mo><mn>1</mn></math>$2^k-1$ characters long.
However we can calculate the value of the prefix function at the end of the string effectively, by only knowing the value of the prefix function at the start.
In addition to the automaton itself, we also compute values <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>G</mi><mo stretchy="false">[</mo><mi>i</mi><mo stretchy="false">]</mo><mo stretchy="false">[</mo><mi>j</mi><mo stretchy="false">]</mo></math>$G[i][j]$ - the value of the automaton after processing the string <math xmlns="http://www.w3.org/1998/Math/MathML"><msub><mi>g</mi><mi>i</mi></msub></math>$g_i$ starting with the state <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>j</mi></math>$j$.
And additionally we compute values <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>K</mi><mo stretchy="false">[</mo><mi>i</mi><mo stretchy="false">]</mo><mo stretchy="false">[</mo><mi>j</mi><mo stretchy="false">]</mo></math>$K[i][j]$ - the number of occurrences of <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>s</mi></math>$s$ in <math xmlns="http://www.w3.org/1998/Math/MathML"><msub><mi>g</mi><mi>i</mi></msub></math>$g_i$, before during the processing of <math xmlns="http://www.w3.org/1998/Math/MathML"><msub><mi>g</mi><mi>i</mi></msub></math>$g_i$ starting with the state <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>j</mi></math>$j$.
Actually <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>K</mi><mo stretchy="false">[</mo><mi>i</mi><mo stretchy="false">]</mo><mo stretchy="false">[</mo><mi>j</mi><mo stretchy="false">]</mo></math>$K[i][j]$ is the number of times that the prefix function took the value <math xmlns="http://www.w3.org/1998/Math/MathML"><mo stretchy="false">|</mo><mi>s</mi><mo stretchy="false">|</mo></math>$|s|$ while performing the operations.
The answer to the problem will then be <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>K</mi><mo stretchy="false">[</mo><mi>k</mi><mo stretchy="false">]</mo><mo stretchy="false">[</mo><mn>0</mn><mo stretchy="false">]</mo></math>$K[k][0]$.
How can we compute these values?
First the basic values are <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>G</mi><mo stretchy="false">[</mo><mn>0</mn><mo stretchy="false">]</mo><mo stretchy="false">[</mo><mi>j</mi><mo stretchy="false">]</mo><mo>=</mo><mi>j</mi></math>$G[0][j] = j$ and <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>K</mi><mo stretchy="false">[</mo><mn>0</mn><mo stretchy="false">]</mo><mo stretchy="false">[</mo><mi>j</mi><mo stretchy="false">]</mo><mo>=</mo><mn>0</mn></math>$K[0][j] = 0$.
And all subsequent values can be calculated from the previous values and using the automaton.
To calculate the value for some <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>i</mi></math>$i$ we remember that the string <math xmlns="http://www.w3.org/1998/Math/MathML"><msub><mi>g</mi><mi>i</mi></msub></math>$g_i$ consists of <math xmlns="http://www.w3.org/1998/Math/MathML"><msub><mi>g</mi><mrow data-mjx-texclass="ORD"><mi>i</mi><mo>−</mo><mn>1</mn></mrow></msub></math>$g_{i-1}$, the <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>i</mi></math>$i$ character of the alphabet, and <math xmlns="http://www.w3.org/1998/Math/MathML"><msub><mi>g</mi><mrow data-mjx-texclass="ORD"><mi>i</mi><mo>−</mo><mn>1</mn></mrow></msub></math>$g_{i-1}$.
Thus the automaton will go into the state:
The values for <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>K</mi><mo stretchy="false">[</mo><mi>i</mi><mo stretchy="false">]</mo><mo stretchy="false">[</mo><mi>j</mi><mo stretchy="false">]</mo></math>$K[i][j]$ can also be easily counted.
So we can solve the problem for Gray strings, and similarly also a huge number of other similar problems.
For example the exact same method also solves the following problem:
we are given a string <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>s</mi></math>$s$ and some patterns <math xmlns="http://www.w3.org/1998/Math/MathML"><msub><mi>t</mi><mi>i</mi></msub></math>$t_i$, each of which is specified as follows:
it is a string of ordinary characters, and there might be some recursive insertions of the previous strings of the form <math xmlns="http://www.w3.org/1998/Math/MathML"><msubsup><mi>t</mi><mi>k</mi><mrow data-mjx-texclass="ORD"><mtext>cnt</mtext></mrow></msubsup></math>$t_k^{\text{cnt}}$, which means that at this place we have to insert the string <math xmlns="http://www.w3.org/1998/Math/MathML"><msub><mi>t</mi><mi>k</mi></msub></math>$t_k$<math xmlns="http://www.w3.org/1998/Math/MathML"><mtext>cnt</mtext></math>$\text{cnt}$ times.
An example of such patterns:
The recursive substitutions blow the string up, so that their lengths can reach the order of <math xmlns="http://www.w3.org/1998/Math/MathML"><msup><mn>100</mn><mrow data-mjx-texclass="ORD"><mn>100</mn></mrow></msup></math>$100^{100}$.
We have to find the number of times the string <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>s</mi></math>$s$ appears in each of the strings.
The problem can be solved in the same way by constructing the automaton of the prefix function, and then we calculate the transitions in for each pattern by using the previous results.