Lukas' Notes

data-structures computation

Definition

Suffix Tree

A suffix tree for a string is a compressed trie of all suffixes of T\$$, where $$ is a fresh letter marking the end of the string.

Each leaf stores the starting position of one suffix. The children at each node are ordered by the lexicographic order of their edge labels.

Query

Substring search

Given a suffix tree for and a pattern , walk down from the root matching edge by edge.

Search(root, P):
    u = root
    i = 0                            // position in P
    while i < |P|:
        find child v of u whose edge label starts with P[i]
        if no such child:
            return null              // P does not occur in T
        L = label(u, v)
        k = min(|P| - i, |L|)
        if P[i..i+k-1] != L[0..k-1]:
            return null              // mismatch
        if i + k = |P|:
            return (v, k)            // P ends inside edge (u, v)
        u = v
        i = i + k
    return (u, 0)                     // P ends at node u

Runtime: when the alphabet has constant size. Finding the correct child takes per step with a direct lookup table at each node. With a general ordered alphabet, the children are ordered by lexicographic order, so a binary search over the children takes per step, yielding .

Finding all occurrences

Once the locus of is found, every leaf in the subtree rooted at corresponds to an occurrence of in . Traverse the subtree and collect the leaf positions.

Runtime: where is the number of occurrences. The term comes from the search; the term from traversing the subtree.

Examples

Suffixes of

For , append the fresh letter \$$ and build the compressed trie of the suffixes of \text{banana}$$:

The leaf labelled represents the suffix T[3..]\ = \text{ana}$$.

Substring lookup

A string occurs in iff labels a path starting at the root of the suffix tree.

For , the string occurs because the path

exists. The leaves below this path give the starting positions and .