Regular expressions
A regular expression is a string with special meanings assigned to certain characters. For example, "." (period) means "match any character", and "*" (asterisk) means "match zero or more occurrences of the preceding character or group".
Regular expression characters can be combined to form more complex expressions. The expression "The .* dog" will match "The lazy dog" and "The brown dog" since ".*" will match zero or more ("*") of any character (".").
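The sketch below checks the "The .* dog" example. It assumes the expressions are evaluated by Java's java.util.regex.Pattern, whose syntax the tables in this section follow. The class DotStarDemo and the method describesDog are hypothetical names used only for illustration. Pattern.matches requires the whole input to match, which may differ from how a given tool applies the expression.

```java
import java.util.regex.Pattern;

public class DotStarDemo {
    // Pattern.matches succeeds only if the WHOLE input matches the expression.
    static boolean describesDog(String input) {
        return Pattern.matches("The .* dog", input);
    }

    public static void main(String[] args) {
        System.out.println(describesDog("The lazy dog"));  // true
        System.out.println(describesDog("The brown dog")); // true
        // ".*" may match zero characters, but both literal spaces are still required,
        // so "The  dog" (two spaces) matches while "The dog" (one space) does not.
        System.out.println(describesDog("The  dog"));      // true
        System.out.println(describesDog("The dog"));       // false
    }
}
```

The last two calls show that only ".*" is flexible. Every other character in the expression must appear literally.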
The tables below describe many of the regular expression constructs and what each one matches in the input string.
Groups can be extracted by using group notation in the regular expression. A group is enclosed in parentheses, "(" and ")". For example, an expression that extracts the kind of dog is "The (.*) dog". The result will have a GROUPS array containing a single element that holds the kind of dog described in the string.
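The GROUPS array described above belongs to the tool itself and is not reproduced here. As a minimal sketch, assuming plain Java's java.util.regex, the same capture is read with Matcher.group(1), where group 0 is the whole match and group 1 is the first parenthesized group. GroupDemo and kindOfDog are hypothetical names.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // Group 1 is the text matched by "(.*)"; group 0 would be the whole match.
    private static final Pattern DOG = Pattern.compile("The (.*) dog");

    // Returns the captured kind of dog, or an empty Optional if the input does not match.
    static Optional<String> kindOfDog(String input) {
        Matcher m = DOG.matcher(input);
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(kindOfDog("The lazy dog").orElse("no match"));      // lazy
        System.out.println(kindOfDog("The big brown dog").orElse("no match")); // big brown
        System.out.println(kindOfDog("A black cat").orElse("no match"));       // no match
    }
}
```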
Characters | Matches |
---|---|
x | The character x |
\\ | The backslash character |
\t | The tab character ('\u0009') |
\n | The newline (line feed) character ('\u000A') |
\r | The carriage-return character ('\u000D') |
\f | The form-feed character ('\u000C') |
\e | The escape character ('\u001B') |
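When these escapes are written inside a Java string literal, each regex backslash must itself be escaped. This assumes the expression is embedded in Java source rather than typed into a tool's input field. The regex \t is therefore written "\\t", and the regex \\ (one literal backslash) is written "\\\\". EscapeDemo and its field names are hypothetical.

```java
import java.util.regex.Pattern;

public class EscapeDemo {
    // Regex "[a-z]+\t[a-z]+": two lowercase words separated by exactly one tab.
    // In a Java string literal every regex backslash is doubled.
    static final Pattern TAB_SEPARATED = Pattern.compile("[a-z]+\\t[a-z]+");

    // Regex "C:\\temp": the regex escape \\ matches ONE literal backslash,
    // so the Java string literal needs four backslashes.
    static final Pattern TEMP_DIR = Pattern.compile("C:\\\\temp");

    public static void main(String[] args) {
        System.out.println(TAB_SEPARATED.matcher("key\tvalue").matches()); // true
        System.out.println(TAB_SEPARATED.matcher("key value").matches());  // false
        System.out.println(TEMP_DIR.matcher("C:\\temp").matches());        // true
    }
}
```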
Character classes | Matches |
---|---|
[abc] | a, b, or c (simple class) |
[^abc] | Any character except a, b, or c (negation) |
[a-zA-Z] | a through z or A through Z, inclusive (range) |
[a-d[m-p]] | a through d, or m through p: [a-dm-p] (union) |
[a-z&&[def]] | d, e, or f (intersection) |
[a-z&&[^bc]] | a through z, except for b and c: [ad-z] (subtraction) |
[a-z&&[^m-p]] | a through z, and not m through p: [a-lq-z] (subtraction) |
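To see which characters a union, intersection, or subtraction class actually accepts, the sketch below tests each lowercase letter against the class on its own. It assumes Java's java.util.regex, which supports the nested [..] and && forms shown above. CharClassDemo and accepted are hypothetical names.

```java
import java.util.regex.Pattern;

public class CharClassDemo {
    // Returns every character in 'candidates' that the single-character class accepts.
    static String accepted(String charClass, String candidates) {
        Pattern p = Pattern.compile(charClass);
        StringBuilder out = new StringBuilder();
        for (char c : candidates.toCharArray()) {
            if (p.matcher(String.valueOf(c)).matches()) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String letters = "abcdefghijklmnopqrstuvwxyz";
        System.out.println(accepted("[a-d[m-p]]", letters));    // abcdmnop
        System.out.println(accepted("[a-z&&[def]]", letters));  // def
        System.out.println(accepted("[a-z&&[^bc]]", letters));  // adefghijklmnopqrstuvwxyz
        System.out.println(accepted("[a-z&&[^m-p]]", letters)); // abcdefghijklqrstuvwxyz
    }
}
```

Each output agrees with the equivalent flat class given in the table, for example [a-lq-z] for the last line.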
Predefined character classes | Matches |
---|---|
. | Any character (may or may not match line terminators) |
\d | A digit: [0-9] |
\D | A non-digit: [^0-9] |
\s | A whitespace character: [ \t\n\x0B\f\r] |
\S | A non-whitespace character: [^\s] |
\w | A word character: [a-zA-Z_0-9] |
\W | A non-word character: [^\w] |
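The sketch below collects every run matched by \d+, \w+, and \S+ in one sample sentence, which makes the differences between the classes visible. It assumes Java's java.util.regex. PredefinedClassDemo and findAll are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PredefinedClassDemo {
    // Collects every non-overlapping match of 'regex' in 'input', left to right.
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "Order #42 shipped on 2024-05-01 to user_7";
        System.out.println(findAll("\\d+", text)); // [42, 2024, 05, 01, 7]
        System.out.println(findAll("\\w+", text)); // [Order, 42, shipped, on, 2024, 05, 01, to, user_7]
        System.out.println(findAll("\\S+", text)); // [Order, #42, shipped, on, 2024-05-01, to, user_7]
    }
}
```

Note that \w includes the underscore, so "user_7" stays whole, while "#" and "-" are not word characters and split the \w+ matches.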
Boundary matchers | Matches |
---|---|
^ | The beginning of a line |
$ | The end of a line |
\b | A word boundary |
\B | A non-word boundary |
\A | The beginning of the input |
\G | The end of the previous match |
\Z | The end of the input but for the final terminator, if any |
\z | The end of the input |
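The sketch below exercises \b, ^, \Z, and \z, assuming Java's java.util.regex. In that engine, ^ and $ match at every line only when the MULTILINE flag is set; by default they match only at the start and end of the whole input. BoundaryDemo and its method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundaryDemo {
    // \b on both sides: the word must stand alone, not be part of a longer word.
    static boolean containsWord(String word, String input) {
        return Pattern.compile("\\b" + Pattern.quote(word) + "\\b").matcher(input).find();
    }

    // MULTILINE lets ^ match at the start of every line, not only at the start of input.
    static int countCommentLines(String text) {
        Matcher m = Pattern.compile("^#", Pattern.MULTILINE).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Looks for "end" immediately followed by the given end-of-input anchor.
    static boolean endsWithEnd(String anchor, String input) {
        return Pattern.compile("end" + anchor).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(containsWord("cat", "the cat sat"));  // true
        System.out.println(containsWord("cat", "concatenate"));  // false
        System.out.println(countCommentLines("#a\nb\n#c"));      // 2
        // \Z allows one final line terminator after the match; \z does not.
        System.out.println(endsWithEnd("\\Z", "the end\n"));     // true
        System.out.println(endsWithEnd("\\z", "the end\n"));     // false
    }
}
```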
Greedy quantifiers | Matches |
---|---|
X? | X, once or not at all |
X* | X, zero or more times |
X+ | X, one or more times |
X{n} | X, exactly n times |
X{n,} | X, at least n times |
X{n,m} | X, at least n but not more than m times |
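A greedy quantifier takes as much input as it can and gives characters back only when the rest of the expression would otherwise fail. The sketch below shows this with Java's java.util.regex (an assumption about the engine). GreedyDemo and firstMatch are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyDemo {
    // Returns the first match of 'regex' in 'input', or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // ".+" first consumes everything, then backs off only as far as the last ">".
        System.out.println(firstMatch("<.+>", "<b>bold</b>")); // <b>bold</b>
        // {2,3} takes the upper bound when enough digits are available.
        System.out.println(firstMatch("\\d{2,3}", "12345"));   // 123
        // "?" takes the optional "b" when present, and matches without it otherwise.
        System.out.println(firstMatch("ab?", "abc"));          // ab
        System.out.println(firstMatch("ab?", "ac"));           // a
    }
}
```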
Reluctant quantifiers | Matches |
---|---|
X?? | X, once or not at all |
X*? | X, zero or more times |
X+? | X, one or more times |
X{n}? | X, exactly n times |
X{n,}? | X, at least n times |
X{n,m}? | X, at least n but not more than m times |
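A reluctant quantifier matches the same repetitions as its greedy counterpart but tries the fewest first, taking more only if the rest of the expression fails. Running the same inputs through reluctant forms, under the assumption of Java's java.util.regex, gives shorter matches. ReluctantDemo and firstMatch are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReluctantDemo {
    // Returns the first match of 'regex' in 'input', or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // ".+?" consumes as little as possible, so the match stops at the first ">".
        System.out.println(firstMatch("<.+?>", "<b>bold</b>")); // <b>
        // {2,3}? settles for the lower bound.
        System.out.println(firstMatch("\\d{2,3}?", "12345"));   // 12
        // "??" prefers to skip the optional "b" even when it is present.
        System.out.println(firstMatch("ab??", "abc"));          // a
    }
}
```

This is why "<.+?>" is the usual choice for picking out a single tag, while "<.+>" spans from the first "<" to the last ">".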
Possessive quantifiers | Matches |
---|---|
X?+ | X, once or not at all |
X*+ | X, zero or more times |
X++ | X, one or more times |
X{n}+ | X, exactly n times |
X{n,}+ | X, at least n times |
X{n,m}+ | X, at least n but not more than m times |
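A possessive quantifier takes as much input as it can, like a greedy one, but never gives any of it back. An expression that needs backtracking therefore fails where the greedy form succeeds. The sketch below assumes Java's java.util.regex. PossessiveDemo and found are hypothetical names.

```java
import java.util.regex.Pattern;

public class PossessiveDemo {
    // True if 'regex' matches anywhere in 'input'.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // Greedy: a* gives back one "a" so the final "a" can match.
        System.out.println(found("a*a", "aaa"));               // true
        // Possessive: a*+ keeps every "a" it took, so the final "a" never matches.
        System.out.println(found("a*+a", "aaa"));              // false
        // Same effect on a longer pattern: .*+ swallows the trailing "foo".
        System.out.println(found(".*foo", "xfooxxxxxxfoo"));   // true
        System.out.println(found(".*+foo", "xfooxxxxxxfoo"));  // false
        // Possessive still works when nothing needs to be given back.
        System.out.println(found("\\d++\\s", "123 abc"));      // true
    }
}
```

Because a possessive quantifier never backtracks, it can make failing matches faster when the skipped alternatives could not have succeeded anyway, as in the last example.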