Tenth Lecture -- How to prove languages not to be regular

Monday, October 16

Difficulties

How do we prove that a language is not regular? In general, proving impossibility results, such as lower bounds, is very difficult. If I want to prove I can swim across a pool, there is an easy proof: I swim across it. How do I prove I cannot? I jump and drown?

The intuition is that some languages will not be regular because the acceptor, a finite automaton, is finite. It can remember only finitely many things.

For 1-letter languages, things are easy. We can easily formalize the intuition "a k-state automaton cannot remember k+1 things".

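Look at the graph of a deterministic automaton over a 1-letter alphabet. Each node has outdegree 1. So the part of the graph reachable from the initial state is a directed path that enters a directed cycle (like a number 6). The path may be missing (the initial state is then already on the cycle), but the cycle is always there, since the graph is finite and every node has a successor.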
It is now clear that a 3-state automaton cannot accept the language of strings of length a multiple of 4.
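
The claim can also be checked by brute force. The sketch below is not part of the lecture: it assumes a unary automaton is given as a tuple delta (delta[q] is the successor of state q), a start state, and a set of accepting states, and all function names are made up for illustration. It tries every automaton with a given number of states and compares its verdict with "the length is a multiple of m" on all lengths up to a bound. One disagreement is enough to rule an automaton out, so a finite bound suffices.

```python
from itertools import product


def accepts_unary(delta, start, accepting, length):
    """Run a unary DFA on a^length and report whether it ends in an accepting state."""
    state = start
    for _ in range(length):
        state = delta[state]
    return state in accepting


def agrees_with_multiples_of(m, delta, start, accepting, max_len):
    """True if the DFA accepts a^n exactly when m divides n, for every n <= max_len."""
    return all(
        accepts_unary(delta, start, accepting, n) == (n % m == 0)
        for n in range(max_len + 1)
    )


def count_agreeing_dfas(num_states, m, max_len):
    """Count the unary DFAs with num_states states that pass the check above."""
    states = range(num_states)
    count = 0
    # A transition function is a choice of one successor per state.
    for delta in product(states, repeat=num_states):
        # A bitmask enumerates every possible set of accepting states.
        for mask in range(2 ** num_states):
            accepting = {q for q in states if (mask >> q) & 1}
            for start in states:
                if agrees_with_multiples_of(m, delta, start, accepting, max_len):
                    count += 1
    return count


three_state_survivors = count_agreeing_dfas(3, 4, 12)
four_state_survivors = count_agreeing_dfas(4, 4, 12)
```

With 3 states no automaton survives the check, while with 4 states a single cycle of length 4 does.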

For larger alphabets, things seem more complicated, but there is a general technique, the pumping lemma, that formalizes our intuition in a way that lets us easily produce proofs of non-regularity. Before looking at it, let us prove a result directly. The intuition we gain will be useful when we cover the Nerode-Myhill characterization of regular languages.

A Proof

We will solve a homework problem, showing that
L={a^n b^n | n=0, 1, 2, ...} is not regular.

First note that no argument of the form "the natural way to do X is a given algorithm; the algorithm does not work; so it is impossible to do X" can work. I call these attempts proofs of ignorance: all they show is that I cannot do it.

So any formal proof must be by contradiction: we assume that X can be done, then using this, prove something like 1=0. Since this is a contradiction, our assumption must have been wrong.

Assume L is regular. Then there is a dfa, say M, such that L=L(M). Let k be the number of states of M.
We will show that there are two different lengths i1 and i2 such that
deltaHat(q0, a^i1)=deltaHat(q0, a^i2)=p
where deltaHat is the transition function of M extended to strings, q0 is the initial state of M, and p is some state of M.
This yields a contradiction: we must have
deltaHat(q0, a^i1 b^i1) in F (since a^i1 b^i1 is in L),
deltaHat(q0, a^i2 b^i2) in F (since a^i2 b^i2 is in L),
and deltaHat(q0, a^i1 b^i2) not in F (since i1 and i2 are different, so a^i1 b^i2 is not in L).
But deltaHat(q0, a^i1 b^i1)=deltaHat(deltaHat(q0, a^i1), b^i1)=deltaHat(p, b^i1), so deltaHat(p, b^i1) is in F. Similarly, we must have deltaHat(p, b^i2) in F
(since deltaHat(q0, a^i2 b^i2)=deltaHat(deltaHat(q0, a^i2), b^i2)=deltaHat(p, b^i2) and a^i2 b^i2 is in L).
But then deltaHat(q0, a^i1 b^i2) is in F (since deltaHat(q0, a^i1 b^i2)=deltaHat(deltaHat(q0, a^i1), b^i2)=deltaHat(p, b^i2)).
This is the required contradiction.
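
To see the argument at work on a concrete machine, here is a small sketch. The automaton below is a made-up example, not one from the lecture: it counts a's only up to 1, so a and aa lead to the same state. The names delta_hat, in_L, DELTA, START and FINAL are likewise chosen for illustration. Once a^i1 and a^i2 reach the same state p, the automaton must give the same verdict on a^i1 b^i2 and a^i2 b^i2, although only the second word is in L.

```python
def delta_hat(delta, state, word):
    """Extend the one-letter transition table delta to a whole word."""
    for letter in word:
        state = delta[(state, letter)]
    return state


def in_L(word):
    """Membership in L = {a^n b^n}, decided directly rather than by an automaton."""
    n = len(word) // 2
    return word == "a" * n + "b" * n


# Hypothetical DFA: s0 = no a read yet, s1 = one or more a's read,
# ok = exactly one b read after the a's, dead = everything else.
DELTA = {
    ("s0", "a"): "s1", ("s0", "b"): "dead",
    ("s1", "a"): "s1", ("s1", "b"): "ok",
    ("ok", "a"): "dead", ("ok", "b"): "dead",
    ("dead", "a"): "dead", ("dead", "b"): "dead",
}
START, FINAL = "s0", {"s0", "ok"}

i1, i2 = 1, 2
p1 = delta_hat(DELTA, START, "a" * i1)  # state reached on a^i1
p2 = delta_hat(DELTA, START, "a" * i2)  # state reached on a^i2; equal to p1

mixed = "a" * i1 + "b" * i2    # a^i1 b^i2, not in L
matched = "a" * i2 + "b" * i2  # a^i2 b^i2, in L
end_mixed = delta_hat(DELTA, START, mixed)
end_matched = delta_hat(DELTA, START, matched)

# Same end state, hence the same verdict, on two words that L separates.
same_verdict = (end_mixed in FINAL) == (end_matched in FINAL)
```

Since the two words end in the same state while only one of them belongs to L, this particular automaton is wrong on at least one of them. The proof shows that the same happens to every dfa.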

To finish the proof, we must show that there are two different lengths i1 and i2 such that
deltaHat(q0, a^i1)=deltaHat(q0, a^i2)=p
Intuitively "M can remember the length only in its finite state. There are only finitely many of these, but you need infinitely many different states, one for each length."

The formalization is not hard. It uses the Pigeonhole Principle: if you have n+1 pigeons that must go into n pigeonholes, there will be a hole with at least 2 pigeons.
More formally: any mapping of an n+1 element set into {1, 2, ..., n} will have some i, 1<=i<=n, such that the preimage of i has cardinality at least 2.

Now consider the sequence
deltaHat(q0, epsilon), deltaHat(q0, a), deltaHat(q0, a^2), ..., deltaHat(q0, a^k)
(remember that k is the number of states of M.) This is the mapping that sends each exponent i in {0, 1, ..., k} to the state deltaHat(q0, a^i). The domain has k+1 elements, and the range has cardinality at most k. So there is some state, call it p, with two different preimages, say i1 and i2; that is, deltaHat(q0, a^i1)=deltaHat(q0, a^i2)=p with i1 different from i2.
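
The pigeonhole step is constructive. As a sketch (the function name is hypothetical, and it assumes the states of M are numbered 0, ..., k-1 and the a-transitions are given as a list delta_a, with delta_a[q] the state reached from q on the letter a), one can walk through deltaHat(q0, a^0), ..., deltaHat(q0, a^k) and stop at the first state that repeats:

```python
def find_collision(delta_a, start, num_states):
    """Return (i1, i2, p) with i1 < i2 <= num_states such that
    both a^i1 and a^i2 lead from start to the state p."""
    first_seen = {}  # state -> smallest exponent i that reaches it
    state = start
    # num_states + 1 exponents, but only num_states possible states.
    for i in range(num_states + 1):
        if state in first_seen:
            return first_seen[state], i, state
        first_seen[state] = i
        state = delta_a[state]
    raise AssertionError("unreachable when delta_a maps into num_states states")


# Example: a path 0 -> 1 followed by the cycle 1 -> 2 -> 3 -> 1.
i1, i2, p = find_collision([1, 2, 3, 1], start=0, num_states=4)
```

In the example the walk visits 0, 1, 2, 3 and then returns to state 1, so the lengths found are i1=1 and i2=4 with p=1.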

This concludes the proof.

The pumping lemma

Covered in text.