languages

Definition

Chomsky–Schützenberger Theorem

Let Let over be a Dyck language.

A language over is context-free iff there exists some and a homomorphism such that:

where denotes a regular language over .

Choosing n

When to choose

Choose according to how many independent matching patterns you want to encode.

  • : use . One kind of matching is enough for one mirrored block.
  • : use . Two independent mirrored blocks suggest two bracket types.
  • : use . Three independent mirrored blocks suggest three bracket types.
  • : use . Four independent mirrored blocks suggest four bracket types.

Some less obvious examples are:

  • : a natural choice is . The language has two independent matching conditions, even though they are not written as explicit reversals.
  • : a natural choice is . The middle marker separates the two halves, so one bracket type already suffices.
  • : a natural choice is . There are two nested dependencies, but they are not presented as the same kind of mirror image.
  • : a natural choice is . Two bracket types are needed to distinguish the two kinds of delimiters.

The choice is not unique. It is often just the most natural encoding for the proof.

Examples

Example 1

where with and .

Therefore, is context-free.

Example 2

Consider the language with and

where

Here, need not be bijective: two different bracket symbols may map to the same terminal symbol, as and both map to .

The regular language is

Then, for example,

Let . We have to find a Dyck language , a regular language , and a string homomorphism , such that:

Choose with , and with:

Now, we have to ensure that the bracket structure for and exists. We choose :

Let . We have to find a Dyck language , a regular language , and a string homomorphism , such that:

We first have to choose for . Each has two structural dependencies and . Thus, we choose .

Next, we choose a regular language that gives shape constraints. We choose:

Malformed Note that itself allows for malformed bracket expressions, such as:

That’s why we intersect with , the set of all well-formed bracket expressions.

Next, we have to select a string homomorphism that maps each word to a word . Given our , a natural choice is:

Possible mapping to

Forbidden due to 's shape constraints

We found a valid combination. Therefore, is context-free.

Let . We have to find a Dyck language , a regular language , and a string homomorphism , such that:

We choose as:

Note: We use the Kleene star to allow for and .

We have to consider the following dependencies:

  • or extra (no generated at the end)
  • or counted (one generated at the end)

We then construct mapping all opening brackets to our prefix alphabet, and erasing the closing brackets of the “extra” characters:

Therefore, let be an arbitrary sequence of opening brackets, be its reversal, and be the mapping to closing counterparts:

For every word evaluated by , the prefix generates characters of and . The suffix generates exactly characters of (where is the number of counted brackets) and empty strings .

Since is always structurally guaranteed, this perfectly yields

proving is a context-free language.