languages

Definition

Pumping Lemma (Regular Languages)

Infinite regular language have a special property: Every word from a certain length onward can be pumped, i.e. every such word contains a substring that can be repeated arbitrarily often, with the resulting words still belong to the same language.

Let be an infinite regular language. Then there exists a bound (depending only on ) such that for every word with there exist words , such that:

and

Short (quantifier) form:

Important

Equivalently:

Pigeonhole Principle

The proof uses the pigeonhole principle. Let be a deterministic finite automaton with states, and let with .

After reading the first symbols of , the automaton has visited states, including the initial state. Since there are only states, at least one state must occur twice.

Intuitively, is the length beyond which the automaton can no longer keep seeing only new states. From that point onward, some state must repeat, so a loop appears in the computation.

Intuition

The first input symbols produce visited states. Since there are only states, one state must repeat.

In the picture above, the repeated state lies inside the first input symbols. The segment between the two occurrences is , and together with it stays to the left of the boundary, so .

If the input word is written as , then the automaton can read zero or more times without changing the state reached after .

Therefore, every word of length at least can be decomposed as with and .

Observations

What the pumping lemma gives

  • First choose a pumping length for the language.

  • For a deterministic finite automaton with states, a natural choice is .

  • The bound does not have to be minimal.

  • Then choose any word with .

  • For that word, find one decomposition with and .

  • Then every pumped word must still lie in .

Common mistake does not describe the whole language. It only describes the pumped versions of one chosen word.

So one example family is enough to show how pumping works, but it is not a full description of .

Every sufficiently long word in a regular language contains a repeatable middle segment , because a finite automaton must revisit a state (see pigeonhole principle).

Examples

Positive Example

This automaton has two states, so would already work. However, the pumping length does not have to be minimal. We may also choose the larger bound .

Why and The pumping lemma only says that there exists some bound . It does not require the smallest possible one. So although already works for this automaton, is also a valid pumping length.

After fixing , we choose a word with length at least . The word is too short, and is also too short. A simple accepted word with length is .

Its run is

We choose this chain because its word length reaches the chosen pumping length and it shows the repeated state explicitly.

Take the word

We can therefore split the word as

with

Here, is the loop part: the automaton stays in while reading it. Since also has an -loop, we may repeat any number of times. Thus:

Coverage only describes the pumped versions of the chosen word . It does not describe the whole language , because the word is also in but cannot be written in the form .

The family

Negative Example

We want to show that if the pumping lemma does not apply to a language , then is non-regular language.

Given a language :

Assume that is regular, i.e. the pumping lemma holds.

Let with . Partition into

Because and the first symbols of are all ‘s (because ), both and consist only of ‘s. So we can write:

and then, by removing the prefix and , we construct a suffix :

so that:

Consider . The pumping lemma claims that must still be in :

Here, the number of ‘s is and the number of ‘s is . Since , we have , so:

because words in must have the same number of ‘s and ‘s.

This is a contradiction, which means that the assumption of being regular must be wrong.

Negative Example

Assume that the language is regular:

The trick is to put the symbol with as exponent upfront. So we could choose:

as word.

Therefore, the word only consists of s, given the constraint .

For , we get:

Looking at the construction of , we can infer that as we caused a non-symmetric pattern, given . Note that works for any non-negative integer .

This contradicts the assumption above that is regular. Thus, is not regular.

Let . Assume that is regular, i.e. the pumping lemma is applicable.

Then, there exists some such that for all with and all that we can denote as:

Choose with .

Why this word works symbols are all the same, so that any substring with must lie completely inside that repeated block.

This is useful because pumping changes only that part of the word. The remaining suffix stays fixed, so any property that requires the left and right parts to match will be destroyed.

In other words, the word is chosen so that the pumping lemma is forced to act exactly where the language’s defining symmetry is most fragile.

Now, we partition into three components:

Given that , it follows that . We rewrite as:

Choose . We get:

Looking at the definition of and , it follows that . This is contradicts the initial claim that is regular, given that every word has to have a partition such that for ALL the partition is equal to the original word.

Therefore, is not regular.

Furthermore, given that is equivalent to the language of palindromes:

Let . Assume that is regular, i.e. the pumping lemma is applicable.

Then there exists some such that all, for any , there exists a partition with and .

Choose . Then, there exists some partition such that:

Given that , it follows that , given that . Rewrite as:

Choose . Then:

Given that , it follows that . That contradicts the initial statement that is regular, i.e. is non-regular.