Regular expressions
A regular expression, often shortened to regex, specifies a match pattern using only text.
Syntax
x, y, and z, when used under Symbol(s), are placeholders for text. Capital Xs, Ys, and Zs are used for number placeholders.
Symbol(s) | Name | Description | Example |
---|---|---|---|
Groups and backreferences | |||
(x) |
Capture group | Groups x and remembers the matched text so it can be extracted or referenced later. | "Foo Bar" /(Foo)|(Bar)/g -> [ "Foo", "Bar" ]
|
(?:x) |
Non-capture group | Groups x without remembering the matched text. | "Foo Bar" /(?:Foo)|(?:Bar)/g -> [ "Foo", "Bar" ]
|
(?<y>x) |
Named capture group | Equivalent to (x) , except the captured text is stored under the name y . |
"Foo Bar" /(?<F>Foo)|(?<B>Bar)/g -> [ "Foo", "Bar" ]
|
\k<y> |
Named backreference | Matches the same text that the earlier named capture group y matched. Note that \k is typed literally. |
"Foo Foo" /(?<Foo>Foo)\s\k<Foo>/g -> [ "Foo Foo" ]
|
Character classes | |||
[x-z] |
Character class | Matches any single character in the range from x to z . |
"Foo Bar" /[a-f]/gi -> [ "F", "B", "a" ]
|
[xyz] |
Matches any one of x , y , or z |
"Foo Bar" /[FB]/g -> [ "F", "B" ]
| |
[^x-z] |
Negated character class | Matches any single character outside the range from x to z . |
"Foo Bar" /[^a-f]/gi -> [ "o", "o", " ", "r" ]
|
[^xyz] |
Matches any single character that isn't x , y , or z |
"Foo Bar" /[^FB]/g -> [ "o", "o", " ", "a", "r" ]
| |
. |
Wildcard | Matches every character besides line terminators. Line terminators include \n , \r , \u2028 , and \u2029 |
"Foo Bar" /./g -> [ "F", "o", "o", " ", "B", "a", "r" ]
|
x|y |
Disjunction | Match something or something else. | "Foo Bar" /Foo|Bar/g -> [ "Foo", "Bar" ]
|
\ |
Escape character | Makes a character that is reserved in regex, such as * , | , or . , match literally. The backslash is itself a reserved character, so to match it, you need to use \\ . |
"Foo.bar apple 78.9 banana" /[A-Za-z0-9]*\.[A-Za-z0-9]*/g -> [ "Foo.bar", "78.9" ]
|
\d |
Digit character class | Equivalent to [0-9] |
"78 Foo Bars" /\d/g -> [ "7", "8" ]
|
\D |
Non-digit character class | Equivalent to [^0-9] |
"78 Foo Bars" /\D/g -> [ " ", "F", "o", "o", " ", "B", "a", "r", "s" ]
|
\w |
Word character class | Equivalent to [A-Za-z0-9_] |
"_Foo- Bars+" /\w/g -> [ "_", "F", "o", "o", "B", "a", "r", "s" ]
|
\W |
Non-word character class | Equivalent to [^A-Za-z0-9_] |
"_Foo- Bars+" /\W/g -> [ "-", " ", "+" ]
|
\s |
White space character class | Matches all whitespace characters. Equivalent to [\f\n\r\t\v\u0020\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff] |
"_Foo- Bars+" /\s/g -> [ " " ]
|
\S |
Non-white space character class | Matches everything but whitespace characters. Equivalent to [^\f\n\r\t\v\u0020\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff] |
"_Foo- Bars+" /\S/g -> [ "_", "F", "o", "o", "-", "B", "a", "r", "s", "+" ]
|
\t |
Horizontal tab | Matches horizontal tab characters. | "a b" /\t/g -> [ " " ] (the gap in both strings is a tab character)
|
\n |
New line | Matches linefeed/new line characters | "a b" /(?:\r?\n)|(?:\v)|(?:\f)/g -> [ "" ]
|
\r |
Carriage return | Matches carriage return characters | |
\v |
Vertical tab | Matches vertical tab characters | |
\f |
Form feed | Matches form feed characters | |
[\b] |
Backspace | Matches backspace | No example can be provided |
\0 |
NUL | Matches the NUL character | |
\u{YYYY} |
Unicode value escape | Matches the character with the given Unicode code point. Only available when the u (or v ) flag is applied. Here Y represents a hexadecimal digit.
| |
\uYYYY |
Matches the UTF-16 code unit with the given four-digit hexadecimal value, represented with Y s here.
| ||
\p{x} or \P{x} |
Unicode character class | \p{x} matches a character that has the Unicode property x ; \P{x} matches a character that does not. Requires the u or v flag.
| |
\cx |
Caret notation escape | Matches a control character written in caret notation, where x is a single letter from A to Z. For example, \cM matches a carriage return and \cJ matches a line feed. |
"a b" /\cM\cJ/g -> [ "" ]
|
Assertions | |||
^ |
Input boundary beginning | Matches the beginning of the input. If the m flag is on, it matches the start of each line. |
"Foo Bar" /(^Foo)|(Bar$)/g -> [ "Foo", "Bar" ]
|
$ |
Input boundary end | Matches the end of the input. If the m flag is on, it matches the end of each line.
| |
\b |
Word boundary | Matches either end of a word. | "Foo Bar" /(\bFoo\b)/ -> [ "Foo" ]
|
\B |
Non-word boundary | Matches the middle of a word. | "Foo Bar" /(B\Bar)/ -> [ "Bar" ]
|
x(?=y) |
Positive lookahead | Matches if y is after x , but doesn't include y in the output. |
"Foo Bar" /Foo(?= Bar)/ -> [ "Foo" ]
|
x(?!y) |
Negative lookahead | Matches if y is not after x , but doesn't include y in the output. |
"Foo Bar" /Foo(?! Car)/ -> [ "Foo" ]
|
(?<=x)y |
Positive lookbehind | Matches y if x is before it, but doesn't include x in the output. |
"Foo Bar" /(?<=Foo )Bar/ -> [ "Bar" ]
|
(?<!x)y |
Negative lookbehind | Matches y if x is not before it, and doesn't include x in the output. |
"Foo Bar" /(?<!Moo )Bar/ -> [ "Bar" ]
|
Quantifiers | |||
x* |
Wild-amount | Matches x any number of times, including 0. |
"Foo Foo Foo Bar" /(?:Foo )*Bar/g -> [ "Foo Foo Foo Bar" ]
|
x+ |
Wild-1-or-more | Matches x if it occurs 1 or more times in a row. |
"Foo Bar Bar" /(?:Foo)+(?: Bar)+/g -> [ "Foo Bar Bar" ]
|
x? |
Can occur | Matches x if it occurs, otherwise, ignore it. |
"Foo " /Foo (Bar)?/ -> [ "Foo " ]
|
x{Y} |
Occurs set times | Matches if x occurs exactly Y times. |
"Foo Bar Bar" /Fo{2} (?:Bar\s?){2}/ -> [ "Foo Bar Bar" ]
|
x{Y,Z} |
Occurs between set times | Matches if x occurs at least Y and at most Z times. |
"Foooo Bar Bar Bar Bar Bar" /Fo{2,5} (?:Bar\s?){1,10}/ -> [ "Foooo Bar Bar Bar Bar Bar" ]
|
x*? , x+? , x?? , x{Y}? , or x{Y,Z}? |
Lazy match | Matches x the least number of times possible while still satisfying the base quantifier. |
"Foooo Bar Bar Bar Bar Bar" /Fo{2,5} (?:Bar\s??){1,10}?/ -> [ "Foooo Bar" ]
|
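A few of the rows above can be tried outside PenguinMod in any JavaScript engine. The following is a minimal sketch, assuming PenguinMod's regex blocks behave like JavaScript's built-in RegExp (which the MDN source under External links suggests, but which is not confirmed here). The sample strings and variable names are made up for illustration.

```javascript
// Named capture group plus named backreference: the same word twice in a row.
const repeated = "Foo Foo Bar".match(/(?<word>\w+)\s\k<word>/);
const repeatedWord = repeated.groups.word; // "Foo"

// Lookarounds check the neighbouring text without adding it to the match.
const afterFoo = "Foo Bar".match(/(?<=Foo )Bar/g);    // ["Bar"]
const beforeBar = "Foo Bar".match(/Foo(?= Bar)/g);    // ["Foo"]
const notAfterMoo = "Moo Bar".match(/(?<!Moo )Bar/g); // null, because "Moo " precedes "Bar"

// Greedy versus lazy quantifiers on the same input.
const greedy = "<a><b>".match(/<.+>/)[0];  // "<a><b>"
const lazy = "<a><b>".match(/<.+?>/)[0];   // "<a>"

// Shorthand character classes.
const digits = "78 Foo Bars".match(/\d/g);  // ["7", "8"]
const nonWord = "_Foo- Bars+".match(/\W/g); // ["-", " ", "+"]
```

Note that in plain JavaScript a failed match returns null rather than an empty list, so results should be checked before indexing into them.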
Flags
Whilst there are flags other than the following, they are either non-standard or do not have a bearing on PenguinMod.
Flag | Name | Description |
---|---|---|
g |
global |
Search all of a string, rather than stopping once you find an occurrence. |
i |
Case insensitive |
The search will ignore the case of characters, making /[A-Za-z]/g and /[a-z]/gi equivalent.
|
m |
multiline |
Makes ^ and $ match the start and end of lines rather than the start and end of strings.
|
s |
single line/dot all |
Makes . able to match all line terminators: \n , \r , \u2028 , and \u2029 .
|
u |
unicode |
Treats the pattern as a sequence of Unicode code points. |
v |
unicode upgrade | Similar to u , but updated with more features.
|
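The effect of each flag is easiest to see by running the same pattern with and without it. This is a hedged sketch in plain JavaScript, again assuming PenguinMod passes flags through to a JavaScript RegExp unchanged. The test string and variable names are illustrative only, and the v flag is left out because support for it varies between engines.

```javascript
const text = "foo\nFoo";

// g: without it, match() stops at the first occurrence.
const first = text.match(/o/).length;        // 1
const all = text.match(/o/g).length;         // 4

// i: case is ignored.
const anyCase = text.match(/foo/gi);         // ["foo", "Foo"]

// m: ^ matches at the start of every line instead of only the input start.
const inputStart = text.match(/^\w/g);       // ["f"]
const lineStarts = text.match(/^\w/gm);      // ["f", "F"]

// s: . also matches line terminators such as \n.
const dotWithoutS = /foo.Foo/.test(text);    // false
const dotWithS = /foo.Foo/s.test(text);      // true

// u: enables \u{...} code point escapes and \p{...} property classes.
const penguin = String.fromCodePoint(0x1F427);
const codePoint = /\u{1F427}/u.test(penguin); // true
const letters = "a1β".match(/\p{L}/gu);       // ["a", "β"]
```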
See also
(match [foo bar] with regex [foo] [g]:: operators)
<test regex [foo bar] [g] with text [foo]:: sensing>
External links
- Regular expressions on Wikipedia
- TurboWarp extension gallery featuring TrueFantom's RegExp extension. It can be loaded into PenguinMod using https://extensions.turbowarp.org/true-fantom/regexp.js as the URL in the Load Custom Extensions popup. It adds more regex functionality into PenguinMod.
- regex101, a fairly useful little app with some fun challenges to test your knowledge of regex.
- MDN's Regular expressions documentation for JavaScript. There wasn't a good place to cite this, but I sourced a lot of stuff from here, including pretty much all of the names for each syntax element.