Definition
Chomsky–Schützenberger Theorem
Let Let over be a Dyck language.
A language over is context-free iff there exists some and a homomorphism such that:
where denotes a regular language over .
Choosing n
When to choose
Choose according to how many independent matching patterns you want to encode.
- : use . One kind of matching is enough for one mirrored block.
- : use . Two independent mirrored blocks suggest two bracket types.
- : use . Three independent mirrored blocks suggest three bracket types.
- : use . Four independent mirrored blocks suggest four bracket types.
Some less obvious examples are:
- : a natural choice is . The language has two independent matching conditions, even though they are not written as explicit reversals.
- : a natural choice is . The middle marker separates the two halves, so one bracket type already suffices.
- : a natural choice is . There are two nested dependencies, but they are not presented as the same kind of mirror image.
- : a natural choice is . Two bracket types are needed to distinguish the two kinds of delimiters.
The choice is not unique. It is often just the most natural encoding for the proof.
Examples
Example 1
Example 2
Consider the language with and
where
Here, need not be bijective: two different bracket symbols may map to the same terminal symbol, as and both map to .
The regular language is
Then, for example,
Let . We have to find a Dyck language , a regular language , and a string homomorphism , such that:
Choose with , and with:
Now, we have to ensure that the bracket structure for and exists. We choose :
Let . We have to find a Dyck language , a regular language , and a string homomorphism , such that:
We first have to choose for . Each has two structural dependencies and . Thus, we choose .
Next, we choose a regular language that gives shape constraints. We choose:
Malformed Note that itself allows for malformed bracket expressions, such as:
That’s why we intersect with , the set of all well-formed bracket expressions.
Next, we have to select a string homomorphism that maps each word to a word . Given our , a natural choice is:
Possible mapping to
Forbidden due to 's shape constraints
We found a valid combination. Therefore, is context-free.
Let . We have to find a Dyck language , a regular language , and a string homomorphism , such that:
We try and thus . We have to find a that forms the shape of the dependencies.
For , we choose:
Therefore:
Problems:
- We cannot express the entire alphabet via .
- We strictly enforced equality in length, i.e. , but specifies .
Given that types of brackets aren't enough, we increase to and choose :
We then choose :
To evaluate , we must ensure that the sequence of closing brackets is the exact structural mirror of the opening brackets. Let denote the string reversal of , and let denote a character mapping where and .
Problem: We still strictly enforce rather than .
To solve the inequality , we must distinguish between two structural roles for the characters in our prefix :
- Counted characters: These generate a corresponding in the suffix.
- Extra characters: These do not generate a (they map to under ).
Since our prefix alphabet is , every character generated in requires two independent choices: its letter ( or ) and its structural role (counted or extra). This results in distinct states:
- Counted
- Counted
- Extra
- Extra
A string homomorphism is strictly deterministic; one bracket type can only map to one specific output. If we choose , the Dyck language only provides distinct pairs of brackets. By the pigeonhole principle, we cannot map independent states to bracket pairs.
If we try , we are forced to sacrifice a combination (e.g., we could generate extra ‘s, but not extra ‘s). Therefore, fails, and we must increase the dimension to .
We choose as:
Note: We use the Kleene star to allow for and .
We have to consider the following dependencies:
- or extra (no generated at the end)
- or counted (one generated at the end)
We then construct mapping all opening brackets to our prefix alphabet, and erasing the closing brackets of the “extra” characters:
Therefore, let be an arbitrary sequence of opening brackets, be its reversal, and be the mapping to closing counterparts:
For every word evaluated by , the prefix generates characters of and . The suffix generates exactly characters of (where is the number of counted brackets) and empty strings .
Since is always structurally guaranteed, this perfectly yields
proving is a context-free language.