Introduction to Regular Expressions
# Introduction to Regular Expressions

## Alphabets

An alphabet is a finite set of symbols denoted by the symbol .
Examples are the Roman alphabet = {*a,b,c,...,z*} and
the Binary alphabet = {*0,1*}. Any finite set of symbols
can be used.
## Strings

A String over an alphabet is a finite sequence of symbols from that alphabet. We represent
strings by writing the symbols next to each other. For example *words* is a string over
the alphabet {*a,b,c,...,z*} and *110110* is a string over the alphabet {*0,1*}.
The string with one symbol is shown as the symbol itself. A string with no symbols is called
the empty string denoted by *e*. We generally use *u, v, w,
x, y, z* and the greek letters to name strings; for example we might use *x* to name
the string *ghi*. To avoid confusion, we shall not use *e* or any symbol in the
alphabet as the name of a string. The set of all strings, including the empty string, over an
alphabet is denoted by *.
## Languages

Any set of strings over an alphabet -that is any subset of
*- is called a language. Thus *,
and are languages. We can specify
a finite language by listing all its strings. For example { *abbba, bbab, bb*} is a
language over {*a, b*}. However, some languages are infinite so we can not list all the
elements. Some example languages are { 0, 01, 011, 0111, ...}, { *w*
{0,1}*:*w* has an equal number of zeros and ones} and
{ *w* {*a, b*}*:*w* has an odd number of *a*'s
and an odd number of *b*'s}. We can specify infinite languages using the notation

L={*w* * :*w*
has property *P*}.

## Definition of Regular Expressions

The Regular Expressions over an alphabet are the strings over
the alphabet {),(,
, , *} such that the following holds.

- and each member of
is a regular expression
- if and
are regular expressions then so is (
)
(Union)
- if and are regular
expressions then so is ()
(Concatenation)
- if is a regular expression then so is
* (
Kleene Star).

NOTE: Since you are unable to type on the keyboard, you can type '\0' instead.
We do not allow the use of the empty string *e* in our regular expressions, but as *e* is
simply shorthand for * you can type '\0*' instead.

## Precedence

The order in which the Union, Concatenation and Kleene Star operations are evaluated has
order similar to addition, multiplication and exponentiation. Kleene Star is evaluated first.
Concatenation is evaluated second. Union is evaluated last. This order can be changed by
putting brackets around part of the expression that you want to evaluate together. For
example, aa bb means the String aa or the string bb. While a
(a b)b means the string aab or the string abb.

Practice Questions Automata

Last modified: Thu Jan 6 13:30:12 PST 2000
by J. N. Tremblay