Regular expressions: Difference between revisions
initial first paragraph; make see also scratch blocks use [] instead of () to specify text rather than numbers or a variable. |
Finished Character classes, started assertions |
Revision as of 16:56, 2 July 2024
A regular expression, often shortened to regex, is a piece of text that specifies a match pattern.
Syntax
x, y, and z, when used under symbols, are placeholders for text. Capital Xs, Ys, and Zs are used for number placeholders.
Symbol(s) | Name | Description | Example |
---|---|---|---|
Groups and backreferences | | | |
(x) | Capture group | Groups a subpattern and captures the text it matches so that it can be retrieved separately in the output. | "Foo Bar" /(Foo)|(Bar)/g -> [ "Foo", "Bar" ] |
(?:x) | Non-capture group | Groups a subpattern without capturing the text it matches. | "Foo Bar" /(?:Foo)|(?:Bar)/g -> [ "Foo", "Bar" ] |
(?<y>x) | Named capture group | Equivalent to (x), except that the captured text is stored under the name y. | "Foo Bar" /(?<F>Foo)|(?<B>Bar)/g -> [ "Foo", "Bar" ] |
\k<y> | Named backreference | Matches the text captured by the named capture group y earlier in the pattern. Note that \k is literal. | "Foo Foo" /(?<Foo>Foo)\s\k<Foo>/g -> [ "Foo Foo" ] |
Character classes | | | |
[x-z] | Character class | Matches any single character in the range from x to z. | "Foo Bar" /[a-f]/gi -> [ "F", "B", "a" ] |
[xyz] | Character class | Matches any single character that is x, y, or z. | "Foo Bar" /[FB]/g -> [ "F", "B" ] |
[^x-z] | Negated character class | Matches any single character that is not in the range from x to z. | "Foo Bar" /[^a-f]/gi -> [ "o", "o", " ", "r" ] |
[^xyz] | Negated character class | Matches any single character that is not x, y, or z. | "Foo Bar" /[^FB]/g -> [ "o", "o", " ", "a", "r" ] |
. | Wildcard | Matches every character besides line terminators. Line terminators include \n, \r, \u2028, and \u2029. | "Foo Bar" /./g -> [ "F", "o", "o", " ", "B", "a", "r" ] |
x|y | Disjunction | Matches either x or y. | "Foo Bar" /Foo|Bar/g -> [ "Foo", "Bar" ] |
\ | Escape character | Makes the following character literal when it is reserved for regex, such as *, |, or a period. The backslash is itself a reserved character, so to match it, you need to use \\. | "Foo.bar apple 78.9 banana" /[A-Za-z0-9]*\.[A-Za-z0-9]*/g -> [ "Foo.bar", "78.9" ] |
\d | Digit character class escape | Equivalent to [0-9]. | "78 Foo Bars" /\d/g -> [ "7", "8" ] |
\D | Non-digit character class escape | Equivalent to [^0-9]. | "78 Foo Bars" /\D/g -> [ " ", "F", "o", "o", " ", "B", "a", "r", "s" ] |
\w | Word character class escape | Equivalent to [A-Za-z0-9_]. | "_Foo- Bars+" /\w/g -> [ "_", "F", "o", "o", "B", "a", "r", "s" ] |
\W | Non-word character class escape | Equivalent to [^A-Za-z0-9_]. | "_Foo- Bars+" /\W/g -> [ "-", " ", "+" ] |
\s | White space character class escape | Matches all whitespace characters. Equivalent to [\f\n\r\t\v\u0020\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff]. | "_Foo- Bars+" /\s/g -> [ " " ] |
\S | Non-white space character class escape | Matches everything but whitespace characters. Equivalent to [^\f\n\r\t\v\u0020\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff]. | "_Foo- Bars+" /\S/g -> [ "_", "F", "o", "o", "-", "B", "a", "r", "s", "+" ] |
\t | Horizontal tab escape | Matches horizontal tab characters. | "a\tb" /\t/g -> [ "\t" ] |
\n | Linefeed escape | Matches linefeed/new line characters. | "a\nb" /(?:\r?\n)|(?:\v)|(?:\f)/g -> [ "\n" ] |
\r | Carriage return escape | Matches carriage return characters. | "a\nb" /(?:\r?\n)|(?:\v)|(?:\f)/g -> [ "\n" ] |
\v | Vertical tab escape | Matches vertical tab characters. | "a\nb" /(?:\r?\n)|(?:\v)|(?:\f)/g -> [ "\n" ] |
\f | Form feed escape | Matches form feed characters. | "a\nb" /(?:\r?\n)|(?:\v)|(?:\f)/g -> [ "\n" ] |
[\b] | Backspace escape | Matches a backspace character. The brackets are required because \b on its own is the word boundary assertion. | No example can be provided |
\0 | NUL escape | Matches the NUL character. | No example can be provided |
\u{YYYY} | Unicode value escape | Matches the character with the given Unicode code point. Only available when the u flag is applied. Here each Y represents a hexadecimal digit. | No example can be provided |
\uYYYY | Unicode value escape | Matches the UTF-16 code unit with the given hexadecimal value, represented with Ys here. | No example can be provided |
\p{x} or \P{x} | Unicode character class escape | Matches a character based on the Unicode property (x). \P{x} matches characters that do not have the property. Requires the u flag. | No example can be provided |
\cx | Caret notation escape | Matches the control character that \cx denotes in caret notation. Note that x is a single letter here, rather than a sequence of characters. For example, \cM is a carriage return and \cJ is a linefeed. | "a\r\nb" /\cM\cJ/g -> [ "\r\n" ] |
Assertions | | | |
^ | Input boundary beginning assertion | Matches the beginning of the input. If the m flag is on, it matches the start of each line. | "Foo Bar" /^Foo Bar$/g -> [ "Foo Bar" ] |
$ | Input boundary end assertion | Matches the end of the input. If the m flag is on, it matches the end of each line. | "Foo Bar" /^Foo Bar$/g -> [ "Foo Bar" ] |
\b | Word boundary assertion | Matches the position at either end of a word, that is, between a word character and a non-word character (or the start or end of the input). | "Foo Bar" /\bFoo\b/g -> [ "Foo" ] |
\B | Non-word boundary assertion | Matches any position that is not a word boundary, such as the middle of a word. | "Foo Bar" /B\Bar/g -> [ "Bar" ] (the pattern reads as B, then \B, then ar) |
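A few rows of the table can be checked in plain JavaScript, whose regex syntax the table follows. This is only a rough sketch: the helper name `matches` and the sample strings that do not appear in the table are made up for illustration, and it relies on nothing beyond the standard `String.prototype.match`.

```javascript
// Hypothetical helper (not part of any library): returns every match of a
// global regex as a plain array. match() returns null when nothing matches,
// so fall back to an empty array.
function matches(text, regex) {
  return text.match(regex) ?? [];
}

// Character class escapes
const digits = matches("78 Foo Bars", /\d/g);   // digits only
const nonWord = matches("_Foo- Bars+", /\W/g);  // everything \w would not match

// Named capture group followed by a named backreference to it
const repeated = "Foo Foo".match(/(?<word>Foo)\s\k<word>/);

// Word boundary: "Foo" matches as a whole word, but not inside "Foobar"
const wholeWord = matches("Foo Foobar", /\bFoo\b/g);

console.log(digits);               // [ '7', '8' ]
console.log(nonWord);              // [ '-', ' ', '+' ]
console.log(repeated.groups.word); // Foo
console.log(wholeWord);            // [ 'Foo' ]
```

Without the `g` flag, `match` also exposes named groups through `groups`, which is why the backreference example leaves the flag off.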
Flags
Flag | Name | Description |
---|---|---|
g | global | Search all of a string, rather than stopping once you find an occurrence. |
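The effect of the g flag can be seen by running the same pattern with and without it. The sketch below assumes plain JavaScript and a made-up sample string; it only uses the standard `match` and `replace` string methods.

```javascript
const text = "foo bar foo";

// Without g: match() stops at the first occurrence and reports where it was.
const first = text.match(/foo/);

// With g: match() returns every occurrence as an array of strings.
const all = text.match(/foo/g);

// replace() follows the same rule: one replacement without g, all with g.
const replacedOnce = text.replace(/foo/, "x");
const replacedAll = text.replace(/foo/g, "x");

console.log(first[0], first.index); // foo 0
console.log(all);                   // [ 'foo', 'foo' ]
console.log(replacedOnce);          // x bar foo
console.log(replacedAll);           // x bar x
```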
See also
(Match [foo bar] with regex [foo] [g] :: operators)
<Test regex [foo bar] [g] with text [foo] :: sensing>
External links
- Regular expressions on Wikipedia
- TurboWarp extension gallery featuring TrueFantom's RegExp extension. It can be loaded into PenguinMod using https://extensions.turbowarp.org/true-fantom/regexp.js as the URL in the Load Custom Extensions popup. It adds more regex functionality into PenguinMod.
- regex101, a fairly useful little app with some fun challenges to test your knowledge of regex.
- MDN's Regular expressions documentation for JavaScript. There wasn't a good place to cite this, but I sourced a lot of stuff from here, including pretty much all of the names for each syntax element.