Regular Expression is Accepted by Finite State Machine

Theorem
Let $R$ be a regular expression.

Then there exists a finite state machine $F$ whose accepted language $L\left({F}\right)$ is exactly $L\left({R}\right)$, the language defined by $R$.

Proof
The proof proceeds by structural induction on $R$.

Case 1. Assume $R$ is the empty-set regular expression, $\varnothing$.

Then $L\left({R}\right) = \varnothing$.

Consider the finite state machine $F_\varnothing$ defined as:


 * $ \displaystyle F_\varnothing = \left({ S_\varnothing, A_\varnothing, I_\varnothing, \Sigma, T_\varnothing }\right) $

where:


 * $ S_\varnothing = \left\{ { \mathsf{Rej} }\right\} $
 * $ A_\varnothing = \varnothing $
 * $ I_\varnothing = \mathsf{Rej} $
 * $ T_\varnothing \left({ s, \sigma }\right) = \mathsf{Rej} $ for all $ s \in S_\varnothing, \sigma \in \Sigma $

This machine is always in a rejecting state and never leaves it, so no word is in $L\left({ F_\varnothing }\right)$.

Therefore, $L\left({ F_\varnothing }\right) = \varnothing = L\left({R}\right)$.
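As an illustration only, the machine $F_\varnothing$ can be simulated with a small sketch. The alphabet `SIGMA`, the helper `accepts` and the string state name `"Rej"` are assumptions made for this example and are not part of the proof.

```python
# Assumed two-letter alphabet; the construction works for any finite alphabet.
SIGMA = {"a", "b"}


def accepts(initial, accepting, transition, word):
    """Run a deterministic machine on `word`; True if it ends in an accepting state."""
    state = initial
    for symbol in word:
        state = transition(state, symbol)
    return state in accepting


def t_empty(state, symbol):
    # T(s, sigma) = Rej for every state and every symbol.
    return "Rej"


def f_empty_accepts(word):
    # Initial state Rej, empty set of accepting states.
    return accepts("Rej", set(), t_empty, word)


print([f_empty_accepts(w) for w in ["", "a", "ab"]])
```

Because the set of accepting states is empty, the membership test at the end of `accepts` can never succeed, which mirrors the argument above.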

Case 2. Assume $R$ is the empty-word regular expression, $\epsilon$.

Then $L\left({R}\right) = \left\{ { \left[\right] }\right\}$.

Consider the finite state machine $F_\epsilon$ defined as:


 * $ \displaystyle F_\epsilon = \left({ S_\epsilon, A_\epsilon, I_\epsilon, \Sigma, T_\epsilon }\right) $

where:


 * $ S_\epsilon = \left\{ { \mathsf{Acc}, \mathsf{Rej} }\right\} $
 * $ A_\epsilon = \left\{ { \mathsf{Acc} }\right\} $
 * $ I_\epsilon = \mathsf{Acc} $
 * $ T_\epsilon \left({ s, \sigma }\right) = \mathsf{Rej} $ for all $ s \in S_\epsilon, \sigma \in \Sigma $

This machine starts out in an accepting state, so $\left[\right]$ (the empty word) is in $L\left({ F_\epsilon }\right)$.

Furthermore, any symbol moves the machine to the rejecting state $\mathsf{Rej}$, which it never leaves, so no other word is in $L\left({ F_\epsilon }\right)$.

Therefore, $L\left({ F_\epsilon }\right) = \left\{ { \left[\right] }\right\} = L\left({R}\right)$.
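A minimal sketch of $F_\epsilon$, assuming words are Python strings and states are the strings `"Acc"` and `"Rej"` (both choices are made for this example only):

```python
def accepts_epsilon(word):
    """Simulate F_epsilon: start in Acc, and any symbol leads to Rej."""
    state = "Acc"  # initial state, which is also the only accepting state
    for _symbol in word:
        state = "Rej"  # T(s, sigma) = Rej for all s and sigma
    return state == "Acc"


print(accepts_epsilon(""), accepts_epsilon("a"))
```

Only the empty word leaves the loop body unexecuted, so it is the only word for which the final state is still `"Acc"`.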

Case 3. Assume $R$ is a literal $\sigma$.

Then $L\left({R}\right) = \left\{ { \left[{\sigma}\right] }\right\}$.

Consider the finite state machine $F_\sigma$ defined as:


 * $ \displaystyle F_\sigma = \left({ S_\sigma, A_\sigma, I_\sigma, \Sigma, T_\sigma }\right) $

where:


 * $ S_\sigma = \left\{ { \mathsf{Start}, \mathsf{Acc}, \mathsf{Rej} }\right\} $
 * $ A_\sigma = \left\{ { \mathsf{Acc} }\right\} $
 * $ I_\sigma = \mathsf{Start} $
 * $ T_\sigma \left({ \mathsf{Start}, \sigma }\right) = \mathsf{Acc} $
 * $ T_\sigma \left({ s', \sigma' }\right) = \mathsf{Rej} $ for all other $ s' \in S_\sigma, \sigma' \in \Sigma $

This machine starts out in the non-accepting state $\mathsf{Start}$, so $\left[\right]$ (the empty word) is not in $L\left({ F_\sigma }\right)$.

After receiving the symbol $\sigma$ at the start, this machine moves to an accepting state, so $\left[{\sigma}\right]$ is in $L\left({ F_\sigma }\right)$.

Any other initial symbol, and any symbol after the first, moves the machine to the state $\mathsf{Rej}$, which it never leaves, so no other word is in $L\left({ F_\sigma }\right)$.

Therefore, $L\left({ F_\sigma }\right) = \left\{ { \left[{\sigma}\right] }\right\} = L\left({R}\right)$.
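The literal machine can be sketched in the same way. The factory name `make_literal_machine` and the string state names are hypothetical and introduced only for this example.

```python
def make_literal_machine(sigma):
    """Return a function that simulates F_sigma for the single symbol `sigma`."""

    def transition(state, symbol):
        # The only transition into Acc is from Start on sigma.
        if state == "Start" and symbol == sigma:
            return "Acc"
        # Every other (state, symbol) pair leads to Rej.
        return "Rej"

    def accepts(word):
        state = "Start"
        for symbol in word:
            state = transition(state, symbol)
        return state == "Acc"

    return accepts


accepts_a = make_literal_machine("a")
print([accepts_a(w) for w in ["", "a", "b", "aa"]])
```

The four sample words cover the cases in the argument above: the empty word, the word $\left[{\sigma}\right]$, a wrong first symbol, and a symbol following the first.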

Case 4. Assume $R$ is a concatenation, $R_1 R_2$.

By the induction hypothesis, there exist finite state machines:


 * $ \displaystyle F_1 = \left({ S_1, A_1, I_1, \Sigma, T_1 }\right) $ s.t. $ \displaystyle L\left({F_1}\right) = L\left({R_1}\right) $
 * $ \displaystyle F_2 = \left({ S_2, A_2, I_2, \Sigma, T_2 }\right) $ s.t. $ \displaystyle L\left({F_2}\right) = L\left({R_2}\right) $

Define a new finite state machine $F_c$ as:


 * $ \displaystyle F_c = \left({ S_c, A_c, I_c, \Sigma, T_c }\right) $

where:


 * $ S_c = S_1 \times \mathcal{P} \left({ S_2 }\right) $ where $\times$ denotes the Cartesian Product and $\mathcal{P}$ the Power Set
 * $ A_c = \left\{ { \left({ s_1, s_2 }\right) \in S_c : s_2 \cap A_2 \neq \varnothing }\right\} $
 * $ I_c = \begin{cases} \left({ I_1, \varnothing }\right) & \mbox{if } I_1 \notin A_1 \\ \left({ I_1, \left\{ {I_2} \right\} }\right) & \mbox{if } I_1 \in A_1 \end{cases} $
 * $ \displaystyle T_c \left({ \left({ s_1, s_2 }\right), \sigma }\right) = \begin{cases} \left({ T_1 \left({ s_1, \sigma }\right), \bigcup_{s \in s_2} \left\{ { T_2 \left({ s, \sigma }\right) }\right\} }\right) & \mbox{if } T_1 \left({ s_1, \sigma }\right) \notin A_1 \\ \left({ T_1 \left({ s_1, \sigma }\right), \left({ \bigcup_{s \in s_2} \left\{ { T_2 \left({ s, \sigma }\right) }\right\} }\right) \cup \left\{ {I_2} \right\} }\right) & \mbox{if } T_1 \left({ s_1, \sigma }\right) \in A_1 \end{cases} $
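
The definition of $F_c$ can be exercised with a short sketch. Here a machine is represented as a hypothetical triple `(accepting, initial, transition)`, and the names `literal` and `concat_machine` are assumptions of this example. Instead of building all of $S_1 \times \mathcal{P} \left({ S_2 }\right)$ up front, the sketch only computes the states actually reached while reading a word.

```python
def literal(sigma):
    """Machine for a single symbol, as a triple (accepting, initial, transition)."""

    def transition(state, symbol):
        return "Acc" if state == "Start" and symbol == sigma else "Rej"

    return ({"Acc"}, "Start", transition)


def concat_machine(f1, f2):
    """Return an acceptance function for the concatenation machine F_c."""
    a1, i1, t1 = f1
    a2, i2, t2 = f2

    # I_c: start a copy of F_2 immediately if I_1 is already accepting.
    initial = (i1, frozenset({i2}) if i1 in a1 else frozenset())

    def transition(state, symbol):
        s1, s2 = state
        next1 = t1(s1, symbol)
        # Advance every running copy of F_2 by one symbol.
        next2 = {t2(s, symbol) for s in s2}
        # If F_1 has just reached an accepting state, start a new copy of F_2.
        if next1 in a1:
            next2.add(i2)
        return (next1, frozenset(next2))

    def accepts(word):
        state = initial
        for symbol in word:
            state = transition(state, symbol)
        # A_c: some running copy of F_2 is in an accepting state.
        return bool(state[1] & a2)

    return accepts


accepts_ab = concat_machine(literal("a"), literal("b"))
print([accepts_ab(w) for w in ["ab", "a", "b", "abb", ""]])
```

The second component of each state is a `frozenset`, so states stay hashable and could be stored in a table if the full transition function were tabulated.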