Regular Expressions Quick Reference

Regular expressions

Several areas of DeltaWalker (the file and folder comparison filters and the Find/Replace dialog) use regular expressions to search and match text. With regular expressions you can express a wide variety of patterns and specify precisely which text should match. Their wide acceptance and the extensive publicly available documentation make them a preferred choice.

This section gives a brief introduction to regular expression syntax. For additional pointers, please see the references listed in the See Also section below.

Literals

All characters except those listed below match themselves. The listed characters have a special meaning and match themselves only when escaped with a backslash (\) placed immediately before them:

\.[]^$?*+{}|()
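The effect of escaping can be seen in a minimal sketch. It assumes the Java java.util.regex engine, which the constructs in this reference resemble; the section itself does not name the engine, and the class name LiteralEscapeDemo is purely illustrative. Note that the doubled backslash is required only inside Java string literals; in a dialog field you would type a single backslash.

```java
import java.util.regex.Pattern;

// Illustrative helper, not part of DeltaWalker.
final class LiteralEscapeDemo {
    // Returns true if the entire input matches the regular expression.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // An unescaped dot is a metacharacter and matches any single character.
        System.out.println(matches("1.5", "1x5"));   // true
        // An escaped dot matches only a literal dot.
        System.out.println(matches("1\\.5", "1x5")); // false
        System.out.println(matches("1\\.5", "1.5")); // true
        // Pattern.quote escapes a whole string so every character is literal.
        System.out.println(matches(Pattern.quote("a+b(c)"), "a+b(c)")); // true
    }
}
```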

Literal escapes

The following table lists and explains special uses of the backslash (\) character in combination with other literals for the purpose of matching certain characters:

Construct    Matches
\t           The tab character
\n           The newline (i.e. line-feed) character
\r           The carriage-return character
\f           The form-feed character
\a           The bell (i.e. alert) character
\e           The escape character
\0n          The character with octal value 0n (0 <= n <= 7)
\0nn         The character with octal value 0nn (0 <= n <= 7)
\0mnn        The character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7)
\xhh         The character with hexadecimal value 0xhh
\uhhhh       The character with hexadecimal value 0xhhhh
\cx          The control character corresponding to x
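A few of the escape constructs listed above can be tried with the following sketch, again assuming the Java java.util.regex engine (an assumption, not something this section states). The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Illustrative helper, not part of DeltaWalker.
final class EscapeConstructDemo {
    // Returns true if the regular expression matches anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // Backslash-t in the regex matches a real tab character in the text.
        System.out.println(found("\\t", "a\tb"));   // true
        // Hexadecimal 41 is the letter A.
        System.out.println(found("\\x41", "A"));    // true
        // Four hex digits select the same character by its 16-bit value.
        System.out.println(found("\\u0041", "A"));  // true
        // Octal 101 is decimal 65, again the letter A.
        System.out.println(found("\\0101", "A"));   // true
        // Control-I is the tab character.
        System.out.println(found("\\cI", "a\tb"));  // true
    }
}
```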

Character classes

The dot (.) character matches any character (depending on the matching mode, line terminators may be excluded). It is the simplest and most widely used of the so-called character classes: sub-expressions with a compact syntax that match sets of characters:

Construct        Matches
[abc]            a, b, or c (simple class)
[^abc]           Any character except a, b, or c (negation)
[a-zA-Z]         a through z or A through Z, inclusive (range)
[a-d[m-p]]       a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]]     d, e, or f (intersection)
[a-z&&[^bc]]     a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]]    a through z, and not m through p: [a-lq-z] (subtraction)
\d               A digit: [0-9]
\D               A non-digit: [^0-9]
\s               A whitespace character: [ \t\n\x0B\f\r]
\S               A non-whitespace character: [^\s]
\w               A word character: [a-zA-Z_0-9]
\W               A non-word character: [^\w]
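The union, intersection, and subtraction forms are the least familiar entries above, so a short sketch may help. It assumes the Java java.util.regex engine, where this nested-class syntax is available; CharClassDemo is a hypothetical name used only for illustration.

```java
import java.util.regex.Pattern;

// Illustrative helper, not part of DeltaWalker.
final class CharClassDemo {
    // Returns true if the entire input matches the regular expression.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("[a-d[m-p]]", "n"));    // true: n is in m-p (union)
        System.out.println(matches("[a-z&&[def]]", "e"));  // true: e is in both sets (intersection)
        System.out.println(matches("[a-z&&[^bc]]", "b"));  // false: b is subtracted
        System.out.println(matches("[a-z&&[^m-p]]", "q")); // true: q is outside m-p
        System.out.println(matches("\\d+", "2024"));       // true: digits only
        System.out.println(matches("\\w+", "file_01"));    // true: letters, digits, underscore
        System.out.println(matches("\\w+", "file-01"));    // false: hyphen is not a word character
    }
}
```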

Boundary matchers

The ^ character has already appeared in the syntax for negated character classes. Outside a character class it has a second, equally common meaning: it marks the beginning of a line. It does not match an actual character; it matches the position where a line starts. Other expressions that match boundaries are:

Construct    Matches
$            The end of a line
\b           A word boundary
\B           A non-word boundary
\A           The beginning of the input
\G           The end of the previous match
\Z           The end of the input but for the final terminator, if any
\z           The end of the input
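The following sketch counts matches of a few boundary expressions. It assumes the Java java.util.regex engine, in which ^ and $ match at every line boundary only when the MULTILINE flag is set and otherwise match only at the start and end of the whole input. Which mode DeltaWalker uses is not stated here, so the flag is passed explicitly; BoundaryDemo and count are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative helper, not part of DeltaWalker.
final class BoundaryDemo {
    // Counts the non-overlapping matches of the regex in the input.
    static int count(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String text = "cat concat\ncat"; // two lines
        // With MULTILINE, ^ matches at the start of each line.
        System.out.println(count("^cat", Pattern.MULTILINE, text)); // 2
        // Without it, ^ matches only at the start of the input.
        System.out.println(count("^cat", 0, text));                 // 1
        // Word boundaries exclude the "cat" inside "concat".
        System.out.println(count("\\bcat\\b", 0, text));            // 2
        // A non-word boundary before "cat" finds only the one inside "concat".
        System.out.println(count("\\Bcat", 0, text));               // 1
        // Only the last "cat" sits at the very end of the input.
        System.out.println(count("cat\\z", 0, text));               // 1
    }
}
```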

Quantifiers

Quantifiers specify how many times an expression may repeat its match. A quantifier is written as a suffix to the expression it applies to. The following table lists commonly used forms of quantified expressions:

Construct    Matches
X?           X, once or not at all
X*           X, zero or more times
X+           X, one or more times
X{n}         X, exactly n times
X{n,}        X, at least n times
X{n,m}       X, at least n but not more than m times
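Each quantifier form above can be checked with a whole-string match, as in this sketch. It assumes the Java java.util.regex engine; QuantifierDemo is a hypothetical name.

```java
import java.util.regex.Pattern;

// Illustrative helper, not part of DeltaWalker.
final class QuantifierDemo {
    // Returns true if the entire input matches the regular expression.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("colou?r", "color"));  // true: u occurs zero times
        System.out.println(matches("colou?r", "colour")); // true: u occurs once
        System.out.println(matches("ab*", "a"));          // true: b zero times
        System.out.println(matches("ab+", "a"));          // false: b needed at least once
        System.out.println(matches("\\d{4}", "2024"));    // true: exactly four digits
        System.out.println(matches("a{2,}", "aaaaa"));    // true: at least two
        System.out.println(matches("a{2,3}", "aaaa"));    // false: more than three
    }
}
```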

Logical alternation

When the text at a given position may match any one of several expressions, the | character separates the alternatives. For example, matching either X or Y is written as:

X|Y
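As a minimal sketch of alternation, assuming the Java java.util.regex engine (AlternationDemo and firstMatch are hypothetical names): in that engine the alternatives are tried from left to right, so the first one that succeeds wins even when a later one would match more text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative helper, not part of DeltaWalker.
final class AlternationDemo {
    // Returns the first match of the regex in the input, or null if none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Either alternative may match.
        System.out.println(firstMatch("cat|dog", "hot dog"));    // dog
        System.out.println(firstMatch("cat|dog", "black cat"));  // cat
        // Neither alternative is present.
        System.out.println(firstMatch("cat|dog", "goldfish"));   // null
        // Alternatives are tried in order: "a" succeeds before "ab" is tried.
        System.out.println(firstMatch("a|ab", "ab"));            // a
    }
}
```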

Groups

Parentheses group elements of a regular expression into distinct sub-expressions so that quantifiers and logical alternation can be applied to each group as a whole.
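The difference grouping makes can be sketched as follows, assuming the Java java.util.regex engine; GroupDemo is a hypothetical name. Without parentheses a quantifier applies only to the single element before it, and alternation spans the whole expression.

```java
import java.util.regex.Pattern;

// Illustrative helper, not part of DeltaWalker.
final class GroupDemo {
    // Returns true if the entire input matches the regular expression.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // The quantifier applies to the whole group "ab".
        System.out.println(matches("(ab)+", "ababab")); // true
        // Without the group, + applies only to "b".
        System.out.println(matches("ab+", "ababab"));   // false
        System.out.println(matches("ab+", "abbb"));     // true
        // The group limits alternation to the single letter in the middle.
        System.out.println(matches("gr(a|e)y", "grey")); // true
        // Without the group, the alternatives are "gra" and "ey".
        System.out.println(matches("gra|ey", "grey"));   // false
    }
}
```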

See Also