In computing, a regular expression is a string that is used to describe or match a set of strings, according to certain syntax rules.
Regular expressions are used by many text editors, utilities, and programming languages to search and manipulate text based on patterns. For example, Perl and Tcl each have a powerful regular expression engine built directly into their syntax. Several utilities provided by Unix distributions, including the editor ed and the filter grep, were the first to popularize the concept of regular expressions. "Regular expression" is often shortened to regex or regexp (singular), or regexes, regexps, or regexen (plural). Some authors distinguish between regular expression and regexp, restricting the former to true regular expressions, which describe regular languages, while using the latter for any regular expression-like pattern, including those that describe languages that are not regular. As only some authors observe this distinction, it is not safe to rely upon it.
See more at Wikipedia.org...
1. <text, operating system> (regexp, RE) One of the wild card patterns used by Perl and other languages, following Unix utilities such as grep, sed, and awk, and editors such as vi and Emacs. Regular expressions use conventions similar to but more elaborate than those described under glob. A regular expression is a sequence of characters with the following meanings:
An ordinary character (not one of the special characters discussed below) matches that character.
A backslash (\) followed by any special character matches the special character itself. The special characters are:
"." matches any character except NEWLINE.
"RE*" (where the "*" is called the "Kleene star") matches zero or more occurrences of RE. If there is any choice, the longest leftmost matching string is chosen, in most regexp flavours.
"^" at the beginning of an RE matches the start of a line and "$" at the end of an RE matches the end of a line.
[string] matches any one character in that string. If the first character of the string is a "^" it matches any character except the remaining characters in the string (and also usually excluding NEWLINE). "-" may be used to indicate a range of consecutive ASCII characters.
\( RE \) matches whatever RE matches and \n, where n is a digit, matches whatever was matched by the RE between the nth \( and its corresponding \) earlier in the same RE. Many flavours use ( RE ) instead of \( RE \).
The concatenation of REs is a RE that matches the concatenation of the strings matched by each RE. RE1 | RE2 matches whatever RE1 or RE2 matches.
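\< matches the beginning of a word and \> matches the end of a word. In many flavours of regexp, \> and \< are replaced by \b, the special character for "word boundary".
RE\{m\} matches m occurrences of RE. RE\{m,\} matches m or more occurrences of RE. RE\{m,n\} matches between m and n occurrences.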
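The meanings listed above can be tried out in any regexp-aware language. The following is a minimal sketch using Python's standard re module, chosen here only for illustration. Python belongs to the flavours that write ( RE ) rather than \( RE \), {m,n} rather than \{m,n\}, and \b for a word boundary. The helper names whole and found are hypothetical and exist only for this example.

```python
import re

def whole(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

def found(pattern: str, text: str, flags: int = 0) -> list:
    """Hypothetical helper: every non-overlapping match, left to right."""
    return [m.group(0) for m in re.finditer(pattern, text, flags)]

# "." matches any character except newline; a backslash makes it literal.
print(whole(r"a.c", "abc"), whole(r"a.c", "a\nc"))
print(whole(r"a\.c", "a.c"), whole(r"a\.c", "abc"))

# The Kleene star: zero or more occurrences of the preceding RE.
print(whole(r"ab*", "a"), whole(r"ab*", "abbb"))

# "^" and "$" anchor to line boundaries (per line with re.MULTILINE).
print(found(r"^\w+$", "one\ntwo words\nthree", re.MULTILINE))

# Bracket expressions: a set, a range, and a negated set.
print(found(r"[aeiou]", "regexp"))
print(found(r"[0-9]", "a1b22"))
print(found(r"[^a-z]", "ab1c!"))

# Grouping with a back-reference: \1 repeats whatever group 1 matched.
print(found(r"(ab)\1", "xababx"))

# Alternation and word boundaries.
print(whole(r"cat|dog", "dog"))
print(found(r"\bcat\b", "cat concat cats cat."))

# Interval expressions: exactly m, m or more, between m and n.
print(whole(r"a{3}", "aaa"), whole(r"a{2,}", "aaaaa"), whole(r"a{1,2}", "aaa"))
```

Other tools may spell the same constructs differently, so the patterns above should be read as one flavour's notation rather than a universal syntax.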
The exact details of how regexp will work in a given application vary greatly from flavour to flavour. A comprehensive survey of regexp flavours is found in Friedl 1997 (see below).
[Jeffrey E.F. Friedl, "Mastering Regular Expressions", O'Reilly, 1997].
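As one small illustration of how flavours differ, the sketch below uses Python's re module, which tries alternatives in the order written instead of always picking the longest match. A POSIX-style leftmost-longest matcher would be expected to report "ab" for the first pattern. The helper first_match is a hypothetical name used only in this example.

```python
import re

def first_match(pattern: str, text: str):
    """Hypothetical helper: (start index, matched text) of the first match, or None."""
    m = re.search(pattern, text)
    return (m.start(), m.group(0)) if m else None

# Alternation is ordered in this flavour: the first alternative that works wins.
print(first_match(r"a|ab", "ab"))   # (0, 'a')
print(first_match(r"ab|a", "ab"))   # (0, 'ab')

# "Leftmost" takes priority over "longest": the empty match at index 0
# is preferred to the longer run of "a" that starts at index 1.
print(first_match(r"a*", "baaa"))   # (0, '')
print(first_match(r"a+", "baaa"))   # (1, 'aaa')
```

Patterns that rely on which alternative or which length is chosen are therefore worth testing in the specific tool being used.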
2. Any description of a pattern composed from combinations of symbols and the three operators:
Concatenation - pattern A concatenated with B matches a match for A followed by a match for B.
Or - pattern A-or-B matches either a match for A or a match for B.
Closure - zero or more matches for a pattern.
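The three operators are enough to build a working matcher. The following is a minimal sketch in Python, not an efficient or standard implementation. The tuple encoding ("cat", A, B), ("or", A, B), ("star", A) and the function names ends and matches are assumptions made up for this example. It uses simple backtracking rather than a finite state machine.

```python
from typing import Iterator

def ends(p, text: str, i: int) -> Iterator[int]:
    """Yield every index j such that pattern p matches text[i:j]."""
    if isinstance(p, str):                  # a symbol matches itself
        if text.startswith(p, i):
            yield i + len(p)
    elif p[0] == "cat":                     # concatenation: A, then B
        for j in ends(p[1], text, i):
            yield from ends(p[2], text, j)
    elif p[0] == "or":                      # or: either A or B
        yield from ends(p[1], text, i)
        yield from ends(p[2], text, i)
    elif p[0] == "star":                    # closure: zero or more of A
        yield i                             # zero matches
        for j in ends(p[1], text, i):
            if j > i:                       # ignore empty matches so recursion terminates
                yield from ends(p, text, j)
    else:
        raise ValueError(f"unknown operator: {p[0]!r}")

def matches(p, text: str) -> bool:
    """True if pattern p matches the whole of text."""
    return any(j == len(text) for j in ends(p, text, 0))

# (a or b)* followed by c
pattern = ("cat", ("star", ("or", "a", "b")), "c")
print(matches(pattern, "abbac"))
print(matches(pattern, "c"))
print(matches(pattern, "abca"))
```

Backtracking like this can take exponential time on some patterns. It is used here only because it maps directly onto the three definitions.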
The earliest form of regular expressions (and the term itself) were invented by mathematician Stephen Cole Kleene in the mid-1950s, as a notation to easily manipulate "regular sets", formal descriptions of the behaviour of finite state machines, in regular algebra.
[S.C. Kleene, "Representation of events in nerve nets and finite automata", 1956, Automata Studies. Princeton].
[J.H. Conway, "Regular algebra and finite machines", 1971, Eds Chapman & Hall].
[Sedgewick, "Algorithms in C", page 294].
(2004-02-01)