Introduction to Regular Expressions

Introduction to Regular Expressions


An alphabet is a finite set of symbols denoted by the symbol . Examples are the Roman alphabet = {a,b,c,...,z} and the Binary alphabet = {0,1}. Any finite set of symbols can be used.


A String over an alphabet is a finite sequence of symbols from that alphabet. We represent strings by writing the symbols next to each other. For example words is a string over the alphabet {a,b,c,...,z} and 110110 is a string over the alphabet {0,1}. The string with one symbol is shown as the symbol itself. A string with no symbols is called the empty string denoted by e. We generally use u, v, w, x, y, z and the greek letters to name strings; for example we might use x to name the string ghi. To avoid confusion, we shall not use e or any symbol in the alphabet as the name of a string. The set of all strings, including the empty string, over an alphabet is denoted by *.


Any set of strings over an alphabet -that is any subset of *- is called a language. Thus *, and are languages. We can specify a finite language by listing all its strings. For example { abbba, bbab, bb} is a language over {a, b}. However, some languages are infinite so we can not list all the elements. Some example languages are { 0, 01, 011, 0111, ...}, { w {0,1}*:w has an equal number of zeros and ones} and { w {a, b}*:w has an odd number of a's and an odd number of b's}. We can specify infinite languages using the notation

L={w * :w has property P}.

Definition of Regular Expressions

The Regular Expressions over an alphabet are the strings over the alphabet {),(, , , *} such that the following holds.

  1. and each member of is a regular expression
  2. if and are regular expressions then so is ( ) (Union)
  3. if and are regular expressions then so is () (Concatenation)
  4. if is a regular expression then so is * ( Kleene Star).

NOTE: Since you are unable to type on the keyboard, you can type '\0' instead. We do not allow the use of the empty string e in our regular expressions, but as e is simply shorthand for * you can type '\0*' instead.


The order in which the Union, Concatenation and Kleene Star operations are evaluated has order similar to addition, multiplication and exponentiation. Kleene Star is evaluated first. Concatenation is evaluated second. Union is evaluated last. This order can be changed by putting brackets around part of the expression that you want to evaluate together. For example, aa bb means the String aa or the string bb. While a (a b)b means the string aab or the string abb.

Practice Questions     Automata  

Last modified: Thu Jan 6 13:30:12 PST 2000 by J. N. Tremblay