Regular Expressions Quick Reference

Several areas of DeltaWalker (the file and folder comparison filters and the Find/Replace dialog) use regular expressions to search and match text. With regular expressions you can express a diverse set of patterns and specify precisely which text is to be matched. They are widely supported and extensively documented in the public domain, which makes them a preferred choice.

This section gives a brief introduction to the regular expression syntax. For additional pointers, please see the references listed in the See Also section below.

Literals

Every character matches itself, except for the special characters listed below. A special character matches itself only when it is escaped with a backslash (\) placed immediately before it:

\ . [ ] ^ $ ? * + { } | ( )
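
The effect of escaping can be sketched in Java. This assumes that DeltaWalker patterns behave like those of Java's java.util.regex package, whose syntax this section resembles; the class and method names are illustrative only.

```java
import java.util.regex.Pattern;

class LiteralEscapeDemo {
    // Returns true if the whole input matches the pattern.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Unescaped, the dot is special and matches any character.
        System.out.println(matchesWhole("a.b", "axb"));           // true
        // Escaped, the dot matches only a literal dot.
        // (Inside a Java string literal the backslash is doubled.)
        System.out.println(matchesWhole("a\\.b", "axb"));         // false
        System.out.println(matchesWhole("a\\.b", "a.b"));         // true
        // Other special characters are escaped the same way.
        System.out.println(matchesWhole("\\(1\\+1\\)", "(1+1)")); // true
    }
}
```

When typing a pattern directly into a dialog field, a single backslash is enough; the doubling above is required only by Java string literals.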

Literal escapes

The following table lists and explains special uses of the backslash (\) character in combination with other literals for the purpose of matching certain characters:

Construct   Matches
\t          The tab character
\n          The newline (i.e. line-feed) character
\r          The carriage-return character
\f          The form-feed character
\a          The bell (i.e. alert) character
\e          The escape character
\0n         The character with octal value 0n (0 <= n <= 7)
\0nn        The character with octal value 0nn (0 <= n <= 7)
\0mnn       The character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7)
\xhh        The character with hexadecimal value 0xhh
\uhhhh      The character with hexadecimal value 0xhhhh
\cx         The control character corresponding to x
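
A few of these escapes can be tried out with the short Java sketch below. It assumes the escapes behave as they do in Java's java.util.regex package; the class and method names are made up for the example.

```java
import java.util.regex.Pattern;

class EscapeSequenceDemo {
    // Returns true if the whole input matches the pattern.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // The regex escape for tab matches an actual tab character.
        System.out.println(matchesWhole("a\\tb", "a\tb"));  // true
        // Hexadecimal 41 is the letter A.
        System.out.println(matchesWhole("\\x41", "A"));     // true
        // The four-digit hexadecimal form of the same letter.
        System.out.println(matchesWhole("\\u0041", "A"));   // true
        // Octal 101 is also the letter A.
        System.out.println(matchesWhole("\\0101", "A"));    // true
        // Control-I is the tab character.
        System.out.println(matchesWhole("\\cI", "\t"));     // true
    }
}
```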

Character classes

The dot (.) character matches any single character. It is the simplest and most widely used of the so-called character classes: sub-expressions with a simplified syntax that match sets of characters:

Construct        Matches
[abc]            a, b, or c (simple class)
[^abc]           Any character except a, b, or c (negation)
[a-zA-Z]         a through z or A through Z, inclusive (range)
[a-d[m-p]]       a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]]     d, e, or f (intersection)
[a-z&&[^bc]]     a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]]    a through z, and not m through p: [a-lq-z] (subtraction)
\d               A digit: [0-9]
\D               A non-digit: [^0-9]
\s               A whitespace character: [ \t\n\x0B\f\r]
\S               A non-whitespace character: [^\s]
\w               A word character: [a-zA-Z_0-9]
\W               A non-word character: [^\w]
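
The following Java sketch shows some of these classes at work. It assumes the classes behave as in Java's java.util.regex package (the union, intersection, and subtraction forms are characteristic of that syntax); the helper findAll is a hypothetical name introduced for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharacterClassDemo {
    // Collects every non-overlapping match of the pattern in the input.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Subtraction: a through z, except b and c.
        System.out.println(findAll("[a-z&&[^bc]]", "abcd")); // [a, d]
        // Union: a through d, or m through p.
        System.out.println(findAll("[a-d[m-p]]", "axn"));    // [a, n]
        // Predefined class: runs of digits.
        System.out.println(findAll("\\d+", "a1b22c333"));    // [1, 22, 333]
        // Predefined class: non-word characters.
        System.out.println(findAll("\\W", "a-b c"));         // [-,  ]
    }
}
```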

Boundary matchers

One special meaning of the ^ character has already been shown: it negates a character class. Its second meaning, which is also in wide use, is to denote the beginning of a line. In this role it does not match an actual character but the position where a line starts. Other expressions that signal boundaries are:

Construct   Matches
$           The end of a line
\b          A word boundary
\B          A non-word boundary
\A          The beginning of the input
\G          The end of the previous match
\Z          The end of the input but for the final terminator, if any
\z          The end of the input
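
Boundary matchers can be explored with the Java sketch below. It assumes java.util.regex semantics, where ^ and $ match at line boundaries only when the MULTILINE flag is set; whether DeltaWalker enables that mode is not stated here, so treat the flag as an assumption. The class and helper names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryDemo {
    // Collects every match of the pattern, compiled with the given flags.
    static List<String> findAll(String regex, int flags, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "one two\nthree four";
        // First word of each line (MULTILINE makes ^ match at line starts).
        System.out.println(findAll("^\\w+", Pattern.MULTILINE, text)); // [one, three]
        // Last word of each line.
        System.out.println(findAll("\\w+$", Pattern.MULTILINE, text)); // [two, four]
        // Whole words only: "concat" and "cats" are skipped.
        System.out.println(findAll("\\bcat\\b", 0, "cat concat cats cat")); // [cat, cat]
        // \Z tolerates a final line terminator, \z does not.
        System.out.println(findAll("end\\Z", 0, "the end\n")); // [end]
        System.out.println(findAll("end\\z", 0, "the end\n")); // []
    }
}
```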

Quantifiers

Quantifiers specify how many times an expression may repeat its match. A quantifier is written as a suffix to the expression it applies to. The following table lists the most commonly used forms of quantified expressions:

Construct   Matches
X?          X, once or not at all
X*          X, zero or more times
X+          X, one or more times
X{n}        X, exactly n times
X{n,}       X, at least n times
X{n,m}      X, at least n but not more than m times
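
Each quantifier form can be checked with a whole-input match, as in this Java sketch. It assumes java.util.regex behavior; the class and method names are illustrative.

```java
import java.util.regex.Pattern;

class QuantifierDemo {
    // Returns true if the whole input matches the pattern.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // X? : the u is optional.
        System.out.println(matchesWhole("colou?r", "color"));   // true
        System.out.println(matchesWhole("colou?r", "colour"));  // true
        // X* : zero or more b characters.
        System.out.println(matchesWhole("ab*", "a"));           // true
        // X+ : at least one b is required.
        System.out.println(matchesWhole("ab+", "a"));           // false
        // X{n} : exactly three digits.
        System.out.println(matchesWhole("\\d{3}", "123"));      // true
        System.out.println(matchesWhole("\\d{3}", "12"));       // false
        // X{n,m} : between two and four digits.
        System.out.println(matchesWhole("\\d{2,4}", "12345"));  // false
    }
}
```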

Logical alternation

When any one of several expressions may match at a given position, the | character is used to separate the alternatives. For example, matching either X or Y is expressed with the following form:

X|Y
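
As a small illustration, the Java sketch below replaces every occurrence of either alternative. It assumes java.util.regex behavior, and the class and method names are invented for the example.

```java
import java.util.regex.Pattern;

class AlternationDemo {
    // Replaces every match of the pattern with the given replacement text.
    static String replaceEvery(String regex, String input, String replacement) {
        return Pattern.compile(regex).matcher(input).replaceAll(replacement);
    }

    public static void main(String[] args) {
        // Either "cat" or "dog" is replaced; "cow" matches neither alternative.
        System.out.println(replaceEvery("cat|dog", "cat and dog and cow", "pet"));
        // prints: pet and pet and cow
    }
}
```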

Groups

Parentheses group the elements of the regular expression into distinct sub-expressions so that quantifiers and logical alternation can be applied to them.
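
The difference that grouping makes can be seen in the following Java sketch, which again assumes java.util.regex behavior and uses illustrative names.

```java
import java.util.regex.Pattern;

class GroupDemo {
    // Returns true if the whole input matches the pattern.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // With a group, the quantifier repeats the whole sequence "ab".
        System.out.println(matchesWhole("(ab)+", "ababab"));   // true
        // Without a group, + applies only to the b.
        System.out.println(matchesWhole("ab+", "ababab"));     // false
        // With a group, the alternation is limited to a single letter.
        System.out.println(matchesWhole("gr(a|e)y", "grey"));  // true
        // Without a group, the alternatives are "gra" and "ey".
        System.out.println(matchesWhole("gra|ey", "grey"));    // false
    }
}
```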

See Also