Definition
Suffix Tree
A suffix tree for a string is a compressed trie of all suffixes of T\$$, where $$ is a fresh letter marking the end of the string.
Each leaf stores the starting position of one suffix. The children at each node are ordered by the lexicographic order of their edge labels.
Query
Substring search
Given a suffix tree for and a pattern , walk down from the root matching edge by edge.
Search(root, P): u = root i = 0 // position in P while i < |P|: find child v of u whose edge label starts with P[i] if no such child: return null // P does not occur in T L = label(u, v) k = min(|P| - i, |L|) if P[i..i+k-1] != L[0..k-1]: return null // mismatch if i + k = |P|: return (v, k) // P ends inside edge (u, v) u = v i = i + k return (u, 0) // P ends at node uRuntime: when the alphabet has constant size. Finding the correct child takes per step with a direct lookup table at each node. With a general ordered alphabet, the children are ordered by lexicographic order, so a binary search over the children takes per step, yielding .
Finding all occurrences
Once the locus of is found, every leaf in the subtree rooted at corresponds to an occurrence of in . Traverse the subtree and collect the leaf positions.
Runtime: where is the number of occurrences. The term comes from the search; the term from traversing the subtree.
Examples
Suffixes of
For , append the fresh letter \$$ and build the compressed trie of the suffixes of \text{banana}$$:
The leaf labelled represents the suffix T[3..]\ = \text{ana}$$.
Substring lookup
A string occurs in iff labels a path starting at the root of the suffix tree.
For , the string occurs because the path
exists. The leaves below this path give the starting positions and .