Regular expressions
A regular expression, often shortened to RegEx or RegExp, is a text-based format for specifying patterns to match in text.
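The syntax on this page follows JavaScript's regular expressions, since it is based on MDN's documentation. In JavaScript, a pattern can be written as a literal between slashes or built from a string with the standard RegExp constructor. The sketch below assumes a JavaScript environment such as Node.js or a browser console; the variable names and sample strings are made up for illustration.

```javascript
// A regex literal: the pattern sits between slashes, and flags follow the closing slash.
const literal = /gr[ea]y/g;

// The same pattern built from a string with the RegExp constructor.
const fromString = new RegExp("gr[ea]y", "g");

// Inside a string, a backslash must itself be escaped, so "\\d+" becomes the pattern \d+.
const digits = new RegExp("\\d+", "g");

// test() reports whether the pattern matches anywhere in the input.
const hasGrey = /gr[ea]y/.test("The colour grey.");

console.log(literal.source === fromString.source); // true
console.log("Order 66 in 2005".match(digits));     // [ '66', '2005' ]
console.log(hasGrey);                              // true
```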
Syntax
x, y, and z, when used under Symbol(s), are placeholders for text. Capital Xs, Ys, and Zs are used as number placeholders.
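In the table below, an example written as "input" /pattern/flags -> [ ... ] can be read as the result of calling JavaScript's String.prototype.match with that pattern. With the g flag, match returns an array of every matched substring, or null when nothing matches; without the g flag, it returns details of the first match only. A minimal sketch that reproduces a few rows this way, where the helper name matches is hypothetical:

```javascript
// Hypothetical helper: returns every substring of `input` matched by `pattern`.
// With the g flag, String.prototype.match returns all matches (or null if none),
// so null is replaced with an empty array to always return an array.
function matches(input, pattern) {
  return input.match(pattern) ?? [];
}

// A few rows from the syntax table, reproduced with match().
console.log(matches("78 Foo Bars", /\d/g));  // [ '7', '8' ]
console.log(matches("_Foo- Bars+", /\W/g));  // [ '-', ' ', '+' ]
console.log(matches("Foo Bar", /[^a-f]/gi)); // [ 'o', 'o', ' ', 'r' ]
console.log(matches("Foo Bar", /Baz/g));     // []
```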
Symbol(s) | Name | Description | Example |
---|---|---|---|
Groups and backreferences | | | |
(x) | Capture group | Matches x and remembers the match, so it can be referenced later or read from the match result. | "Foo Bar" /(Foo)|(Bar)/g -> [ "Foo", "Bar" ] |
(?:x) | Non-capture group | Groups x without remembering the match; otherwise acts as if the parentheses were not there. | "Foo Bar" /(?:Foo)|(?:Bar)/g -> [ "Foo", "Bar" ] |
(?<y>x) | Named capture group | Equivalent to (x), except the match can also be referenced by the name y. | "Foo Bar" /(?<F>Foo)|(?<B>Bar)/g -> [ "Foo", "Bar" ] |
\k<y> | Named backreference | Matches the same text that the earlier named capture group y matched. The \k is typed literally. | "Foo Foo" /(?<Foo>Foo)\s\k<Foo>/g -> [ "Foo Foo" ] |
Character classes | | | |
[x-z] | Character class | Matches any single character in the range from x to z. | "Foo Bar" /[a-f]/gi -> [ "F", "B", "a" ] |
[xyz] | Character class | Matches any one of the characters x, y, or z. | "Foo Bar" /[FB]/g -> [ "F", "B" ] |
[^x-z] | Negated character class | Matches any single character that is not in the range from x to z. | "Foo Bar" /[^a-f]/gi -> [ "o", "o", " ", "r" ] |
[^xyz] | Negated character class | Matches any single character other than x, y, or z. | "Foo Bar" /[^FB]/g -> [ "o", "o", " ", "a", "r" ] |
. | Wildcard | Matches any single character except line terminators. Line terminators include \n, \r, \u2028, and \u2029. | "Foo Bar" /./g -> [ "F", "o", "o", " ", "B", "a", "r" ] |
x|y | Disjunction | Matches either x or y. | "Foo Bar" /Foo|Bar/g -> [ "Foo", "Bar" ] |
\ | Escape character | Placed before a character that is reserved in regex, such as *, |, or ., to match that character literally. The backslash is itself reserved, so to match a literal backslash, use \\. | "Foo.bar apple 78.9 banana" /[A-Za-z0-9]*\.[A-Za-z0-9]*/g -> [ "Foo.bar", "78.9" ] |
\d | Digit character class | Matches any digit. Equivalent to [0-9]. | "78 Foo Bars" /\d/g -> [ "7", "8" ] |
\D | Non-digit character class | Matches any character that is not a digit. Equivalent to [^0-9]. | "78 Foo Bars" /\D/g -> [ " ", "F", "o", "o", " ", "B", "a", "r", "s" ] |
\w | Word character class | Matches any word character. Equivalent to [A-Za-z0-9_]. | "_Foo- Bars+" /\w/g -> [ "_", "F", "o", "o", "B", "a", "r", "s" ] |
\W | Non-word character class | Matches any character that is not a word character. Equivalent to [^A-Za-z0-9_]. | "_Foo- Bars+" /\W/g -> [ "-", " ", "+" ] |
\s | White space character class | Matches any whitespace character. Equivalent to [\f\n\r\t\v\u0020\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff]. | "_Foo- Bars+" /\s/g -> [ " " ] |
\S | Non-white space character class | Matches any character that is not whitespace. Equivalent to [^\f\n\r\t\v\u0020\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff]. | "_Foo- Bars+" /\S/g -> [ "_", "F", "o", "o", "-", "B", "a", "r", "s", "+" ] |
\t | Horizontal tab | Matches horizontal tab characters. | "a\tb" /\t/g -> [ "\t" ] |
\n | New line | Matches line feed (new line) characters. | "a\nb" /\n/g -> [ "\n" ] |
\r | Carriage return | Matches carriage return characters. | |
\v | Vertical tab | Matches vertical tab characters. | |
\f | Form feed | Matches form feed characters. | |
[\b] | Backspace | Matches the backspace character (U+0008). Inside square brackets, \b means backspace rather than a word boundary. | "a\bb" /[\b]/g -> [ "\b" ] |
\0 | NUL | Matches the NUL character (U+0000). | |
\u{YYYY} or \u{YYYYY} | Unicode value escape | Matches the character with the given Unicode code point. Only works when the u (or v) flag is set. Here Y represents a hexadecimal digit. | "Foo" /\u{46}/gu -> [ "F" ] |
\uYYYY | UTF-16 escape | Matches the character with the given UTF-16 code unit, written as four hexadecimal digits (represented with Ys here). | "Foo" /\u0046/g -> [ "F" ] |
\p{x} or \P{x} | Unicode character class | \p{x} matches a character that has the Unicode property x; \P{x} matches a character that does not. Only works when the u (or v) flag is set. | "Foo Bar" /\p{Lu}/gu -> [ "F", "B" ] |
\cx | Caret notation escape | Matches the control character written in caret notation, where x is a single letter from A to Z. For example, \cJ matches a line feed. | "a\r\nb" /\cM\cJ/g -> [ "\r\n" ] |
Assertions | | | |
^ | Input boundary beginning | Matches the beginning of the input. If the m flag is on, it also matches the start of each line. | "Foo Bar" /(^Foo)|(Bar$)/g -> [ "Foo", "Bar" ] |
$ | Input boundary end | Matches the end of the input. If the m flag is on, it also matches the end of each line. | "Foo Bar" /Bar$/g -> [ "Bar" ] |
\b | Word boundary | Matches the position at the start or end of a word, where a word character is next to a non-word character or the edge of the input. | "Foo Bar" /\bFoo\b/g -> [ "Foo" ] |
\B | Non-word boundary | Matches any position that is not a word boundary, such as between two letters of a word. | "Foo Bar" /B\Bar/g -> [ "Bar" ] |
x(?=y) | Positive lookahead | Matches x only if it is followed by y. y is not included in the match. | "Foo Bar" /Foo(?= Bar)/g -> [ "Foo" ] |
x(?!y) | Negative lookahead | Matches x only if it is not followed by y. | "Foo Bar" /Foo(?! Car)/g -> [ "Foo" ] |
(?<=x)y | Positive lookbehind | Matches y only if it is preceded by x. x is not included in the match. | "Foo Bar" /(?<=Foo )Bar/g -> [ "Bar" ] |
(?<!x)y | Negative lookbehind | Matches y only if it is not preceded by x. | "Foo Bar" /(?<!Moo )Bar/g -> [ "Bar" ] |
Quantifiers | | | |
x* | Wild-amount | Matches x any number of times, including 0. | "Foo Foo Foo Bar" /(?:Foo )*Bar/g -> [ "Foo Foo Foo Bar" ] |
x+ | Wild-1-or-more | Matches x one or more times. | "Foo Bar Bar" /Fo+ (?:Bar ?)+/g -> [ "Foo Bar Bar" ] |
x? | Can occur | Matches x zero or one time, making it optional. | "Foo " /Foo (Bar)?/g -> [ "Foo " ] |
x{Y} | Occurs set times | Matches x exactly Y times. | "Foo Bar Bar" /Fo{2} (?:Bar\s?){2}/g -> [ "Foo Bar Bar" ] |
x{Y,Z} | Occurs between set times | Matches x at least Y and at most Z times. | "Foooo Bar Bar Bar Bar Bar" /Fo{2,5} (?:Bar\s?){1,10}/g -> [ "Foooo Bar Bar Bar Bar Bar" ] |
x*?, x+?, x??, x{Y}?, or x{Y,Z}? | Lazy match | Makes the quantifier lazy: x is matched as few times as possible while still letting the rest of the pattern match. Without the trailing ?, quantifiers are greedy and match as many times as possible. | "Foooo Bar Bar Bar Bar Bar" /Fo{2,5} (?:Bar\s??){1,10}?/g -> [ "Foooo Bar" ] |
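The Example column lists only the matched text, so it hides what groups capture and how greedy and lazy quantifiers differ. The sketch below uses JavaScript's standard match() and matchAll() string methods to show captured groups, named groups, a named backreference, and greedy versus lazy matching. The sample strings and variable names are made up for illustration.

```javascript
// matchAll() (which requires the g flag) exposes each match's groups:
// index 0 is the whole match, index 1 onward are the capture groups.
const pairs = [..."Foo=1 Bar=2".matchAll(/(\w+)=(\d)/g)].map((m) => [m[1], m[2]]);
console.log(pairs); // [ [ 'Foo', '1' ], [ 'Bar', '2' ] ]

// Named capture groups are available by name on the groups property.
const date = "2024-05-17".match(/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/);
console.log(date.groups.month); // 05

// A named backreference (\k<word>) requires the same text to appear again,
// which is useful for finding doubled words.
const doubled = "this is is a test test".match(/\b(?<word>\w+) \k<word>\b/g);
console.log(doubled); // [ 'is is', 'test test' ]

// Greedy vs lazy: .* takes as much as possible, .*? as little as possible.
const html = "<b>Foo</b> and <b>Bar</b>";
console.log(html.match(/<b>.*<\/b>/g));  // [ '<b>Foo</b> and <b>Bar</b>' ]
console.log(html.match(/<b>.*?<\/b>/g)); // [ '<b>Foo</b>', '<b>Bar</b>' ]
```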
Flags
Whilst there are flags other than the following, they are either non-standard or have no bearing on PenguinMod.
Flag | Name | Description |
---|---|---|
g | Global | Searches the whole string for every match, rather than stopping after the first one. |
i | Case insensitive | Ignores the case of letters, making /[A-Za-z]/g and /[a-z]/gi equivalent. |
m | Multiline | Makes ^ and $ match the start and end of each line rather than only the start and end of the input. |
s | Single line/dot all | Makes . also match line terminators: \n, \r, \u2028, and \u2029. |
u | Unicode | Treats the pattern as a sequence of Unicode code points rather than UTF-16 code units. |
v | Unicode sets | An upgraded version of u with more features, such as set operations inside character classes. The u and v flags cannot be used together. |
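A minimal sketch of the g, i, m, s, and u flags in JavaScript. The sample string is made up for illustration, and the v flag is left out because it needs a newer engine (it was added in ES2024).

```javascript
const text = "Foo bar\nFOO baz";

// g: find every match instead of stopping at the first one.
console.log(text.match(/o/g).length); // 2 (only the lowercase o's in "Foo")

// i: ignore case, so "FOO" also matches.
console.log(text.match(/foo/gi)); // [ 'Foo', 'FOO' ]

// m: ^ matches at the start of each line, not just the start of the input.
console.log(text.match(/^\w+/g));  // [ 'Foo' ]
console.log(text.match(/^\w+/gm)); // [ 'Foo', 'FOO' ]

// s: . also matches line terminators such as \n.
console.log(/bar.FOO/.test(text));  // false
console.log(/bar.FOO/s.test(text)); // true

// u: the pattern works on code points, so one emoji counts as one character.
console.log("😀".match(/./g).length);  // 2 (two UTF-16 code units)
console.log("😀".match(/./gu).length); // 1
```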
Examples
Each of the following examples gives a RegExp and an example string, followed by the parts of the example string that the RegExp matches.
Fox and dog
The RegExp (?<=[^a-z])[^aeiou][aeiou][^aeiou](?=[^a-z]|$)
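will match any three-letter word whose middle letter is a vowel and whose first and last letters are consonants. Strictly, [^aeiou] matches any character that is not a lowercase vowel, and a word at the very start of the string is not matched, because the lookbehind needs a character before it.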
The quick brown fox jumps over the lazy dog
In this case, the matches are fox and dog.
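The result can be checked in JavaScript with the g flag added, so that every match is returned. The second, made-up sentence illustrates the start-of-string caveat: "dog" is skipped because nothing precedes it.

```javascript
// The fox-and-dog pattern, with the g flag so match() returns every match.
const pattern = /(?<=[^a-z])[^aeiou][aeiou][^aeiou](?=[^a-z]|$)/g;

const sentence = "The quick brown fox jumps over the lazy dog";
console.log(sentence.match(pattern)); // [ 'fox', 'dog' ]

// A word at the very start has no preceding character, so the lookbehind fails.
console.log("dog ate a cat".match(pattern)); // [ 'cat' ]
```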
Word after "l"
The RegExp (?<=(?:^| )[lL])[A-Za-z]*
matches every character in a word that starts with the letter L, other than the L at the start.
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
In this case, the matches are "orem," "abore," "aboris," and "aborum."
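The same result can be reproduced in JavaScript, with the g flag added so that every match is returned. Note that the lookbehind contains an alternation, (?:^| ), which JavaScript allows because its lookbehinds may have variable length.

```javascript
// Matches the rest of each word that starts with l or L.
const afterL = /(?<=(?:^| )[lL])[A-Za-z]*/g;

// The example text from above, split across lines for readability.
const lorem =
  "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor " +
  "incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud " +
  "exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure " +
  "dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. " +
  "Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt " +
  "mollit anim id est laborum.";

console.log(lorem.match(afterL)); // [ 'orem', 'abore', 'aboris', 'aborum' ]
```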
Non-specific dialects
The RegExp gr[ea]y
matches both the British-English "grey" and the American-English "gray", which allows one pattern to find either spelling:
The colour grey.
The color gray.
Here, both grey and gray are detected.
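In JavaScript, the same pattern can also be used with replace() to normalise both spellings to one. A minimal sketch with the two example sentences; the choice of "gray" as the target spelling is arbitrary. The pattern is case-sensitive, so a capitalised "Grey" needs the i flag.

```javascript
const greyOrGray = /gr[ea]y/g;
const lines = ["The colour grey.", "The color gray."];

// Both spellings are found.
console.log(lines.map((line) => line.match(greyOrGray))); // [ [ 'grey' ], [ 'gray' ] ]

// The same pattern can rewrite either spelling to a single one.
console.log(lines.map((line) => line.replace(greyOrGray, "gray")));
// [ 'The colour gray.', 'The color gray.' ]

// Without the i flag, "Grey" would not match.
console.log("Grey skies".match(/gr[ea]y/gi)); // [ 'Grey' ]
```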
See also
Further reading
- Regular expressions on Wikipedia
- Regular-Expressions.info, a website with many guides and examples for RegExp.
- RegEgg, a tutorial for RegEx.
External links
- regex101, a useful little app for testing regular expressions, with some fun challenges to test your knowledge of regex.
References
- This table is mainly based on MDN's documentation of the syntax of regular expressions.
- This table is mainly based on MDN's documentation of regular expression flags.