Regular expressions

|-
| <code>\</code> || Escape character || Escapes a character that is reserved in regex, such as <code><nowiki>*</nowiki></code>, <code><nowiki>|</nowiki></code>, or <code>.</code>, so that it is matched literally. Note that the backslash is itself a reserved character, so to match it, you need to use <code>\\</code>. || "Foo.bar apple 78.9 banana" <code><nowiki>/[A-Za-z0-9]*\.[A-Za-z0-9]*/g</nowiki></code> -> <code>[ "Foo.bar", "78.9" ]</code>
|-
| <code>\d</code> || Digit character class escape || Equivalent to <code>[0-9]</code> || "78 Foo Bars" <code><nowiki>/\d/g</nowiki></code> -> <code>[ "7", "8" ]</code>
|-
| <code>\D</code> || Non-digit character class escape || Equivalent to <code>[^0-9]</code> || "78 Foo Bars" <code><nowiki>/\D/g</nowiki></code> -> <code>[ " ", "F", "o", "o", " ", "B", "a", "r", "s" ]</code>
|-
| <code>\w</code> || Word character class escape || Equivalent to <code>[A-Za-z0-9_]</code> || "_Foo- Bars+" <code><nowiki>/\w/g</nowiki></code> -> <code>[ "_", "F", "o", "o", "B", "a", "r", "s" ]</code>
|-
| <code>\W</code> || Non-word character class escape || Equivalent to <code>[^A-Za-z0-9_]</code> || "_Foo- Bars+" <code><nowiki>/\W/g</nowiki></code> -> <code>[ "-", " ", "+" ]</code>
|-
| <code>\s</code> || White space character class escape || Matches all whitespace characters. Equivalent to <code>[\f\n\r\t\v\u0020\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff]</code> || "_Foo- Bars+" <code><nowiki>/\s/g</nowiki></code> -> <code>[ " " ]</code>
|-
| <code>\S</code> || Non-white space character class escape || Matches everything but whitespace characters. Equivalent to <code>[^\f\n\r\t\v\u0020\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff]</code> || "_Foo- Bars+" <code><nowiki>/\S/g</nowiki></code> -> <code>[ "_", "F", "o", "o", "-", "B", "a", "r", "s", "+" ]</code>
|-
| <code>\t</code> || Horizontal tab escape || Matches horizontal tab characters. || "a&nbsp;&nbsp;&nbsp;&nbsp;b" <code><nowiki>/\t/g</nowiki></code> -> <code>[ "&nbsp;&nbsp;&nbsp;&nbsp;" ]</code>
|-
| <code>\n</code> || Linefeed escape || Matches linefeed (new line) characters || rowspan="4" | "a<br>b" <code><nowiki>/(?:\r?\n)|(?:\v)|(?:\f)/g</nowiki></code> -> <code><nowiki>[ "\n" ]</nowiki></code>
|-
| <code>\r</code> || Carriage return escape || Matches carriage return characters
|-
| <code>\v</code> || Vertical tab escape || Matches vertical tab characters
|-
| <code>\f</code> || Form feed escape || Matches form feed characters
|-
| <code>[\b]</code> || Backspace escape || Matches backspace || rowspan="5" | No example can be provided
|-
| <code>\0</code> || NUL escape || Matches the NUL character
|-
| <code>\u{YYYY}</code> or <code>\u{YYYYY}</code> || rowspan="2" | Unicode value escape || Only when the <code>u</code> flag is set: matches the character with the given Unicode code point. Each <code>Y</code> represents a hexadecimal digit.
|-
| <code>\uYYYY</code> || Matches the UTF-16 code unit with the given hexadecimal value. Each <code>Y</code> represents a hexadecimal digit.
|-
| <code>\p{x}</code> or <code>\P{x}</code> || Unicode character class escape || <code>\p{x}</code> matches a character that has the Unicode property <code>x</code>; <code>\P{x}</code> matches a character that does not have it.
|-
| <code>\cx</code> || Caret notation escape || Matches the control character written as <code>^x</code> in [[w:Caret notation|caret notation]]. Note that <code>x</code> is a single letter from A to Z here; for example, <code>\cM</code> matches a carriage return and <code>\cJ</code> matches a linefeed. || "a<br>b" (with a carriage return and linefeed as the line break) <code><nowiki>/\cM\cJ/g</nowiki></code> -> <code><nowiki>[ "\r\n" ]</nowiki></code>
|-
! colspan=4 | Assertions
|-
| <code>^</code> || Input boundary beginning assertion || Matches the beginning of the input. If the <code>m</code> flag is on, it matches the start of each line. || rowspan="2" | "Foo Bar" <code>/^Foo Bar$/g</code> -> <code>[ "Foo Bar" ]</code>
|-
| <code>$</code> || Input boundary end assertion || Matches the end of the input. If the <code>m</code> flag is on, it matches the end of each line.
|-
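| <code>\b</code> || Word boundary assertion || Matches the position at either end of a word, that is, between a word character and a non-word character or the edge of the input. || "Foo Bar" <code><nowiki>/\bFoo\b/g</nowiki></code> -> <code>[ "Foo" ]</code>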
|-
| <code>\B</code> || Non-word boundary assertion || Matches any position that is not a word boundary, such as the middle of a word. || "Foo Bar" <code><nowiki>/B\Bar/g</nowiki></code> -> <code>[ "Bar" ]</code>
|}
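
The examples in the table can be checked in any JavaScript console. The following is only a minimal sketch, assuming a JavaScript engine with standard <code>String.prototype.match</code>; the helper name <code>allMatches</code> and the multiline and carriage-return inputs are made up for illustration and are not part of the table.

```javascript
// Hypothetical helper: return every match of a global regex as an array
// (an empty array when nothing matches).
function allMatches(text, regex) {
  return text.match(regex) ?? [];
}

// Character class escapes from the table.
const classes = {
  escapedDot: allMatches("Foo.bar apple 78.9 banana", /[A-Za-z0-9]*\.[A-Za-z0-9]*/g),
  digits: allMatches("78 Foo Bars", /\d/g),
  nonDigits: allMatches("78 Foo Bars", /\D/g),
  word: allMatches("_Foo- Bars+", /\w/g),
  nonWord: allMatches("_Foo- Bars+", /\W/g),
  space: allMatches("_Foo- Bars+", /\s/g),
  // Assumed input: a string whose line break is carriage return + linefeed.
  caret: allMatches("a\r\nb", /\cM\cJ/g),
};

// Assertions from the table.
const assertions = {
  wholeInput: allMatches("Foo Bar", /^Foo Bar$/g),
  // With the m flag, ^ and $ also match at the start and end of each line.
  multiline: allMatches("Foo\nBar", /^\w+$/gm),
  wordBoundary: allMatches("Foo Bar", /\bFoo\b/g),
  // "B", then a non-word-boundary assertion, then "ar".
  nonWordBoundary: allMatches("Foo Bar", /B\Bar/g),
};

console.log(classes);
console.log(assertions);
```

Note that <code>/B\Bar/</code> reads as the letter <code>B</code>, the assertion <code>\B</code>, and then <code>ar</code>, which is why it matches "Bar".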