Regular expressions

Revision as of 17:53, 2 July 2024 by Steve0Greatness

[[File:RegexpHighlight.webp|thumb|500px|alt=Lorem ipsum first paragraph where every consonant-vowel pair is highlighted|The alternating <mark style="background:rgb(255,255,0);color:rgb(0,0,0)">yellow</mark> and <mark style="background:rgb(255,221,136);color:rgb(0,0,0);">orange</mark> highlights show the matches for the regex pattern <code>/[a-z](?<![aeiou])[aeiou]/gi</code> (any consonant-vowel pair).]]
A '''regular expression''', often shortened to '''regex''', is a string of text that specifies a [[w:Pattern matching|match pattern]].

== Syntax ==
|-
| <code>\</code> || Escape character || Makes a character that is reserved for regex, such as <code><nowiki>|</nowiki></code> or <code>.</code>, match literally. Note that <code>\</code> is itself a reserved character, so to match it you need to use <code>\\</code>. || "Foo.bar apple 78.9 banana" <code><nowiki>/[A-Za-z0-9]+\.[A-Za-z0-9]+/g</nowiki></code> -> <code>[ "Foo.bar", "78.9" ]</code>
|-
| <code>\d</code> || Digit character class || Equivalent to <code>[0-9]</code>. || "78 Foo Bars" <code><nowiki>/\d/g</nowiki></code> -> <code>[ "7", "8" ]</code>
|-
| <code>\D</code> || Non-digit character class || Equivalent to <code>[^0-9]</code>. || "78 Foo Bars" <code><nowiki>/\D/g</nowiki></code> -> <code>[ " ", "F", "o", "o", " ", "B", "a", "r", "s" ]</code>
|-
| <code>\w</code> || Word character class || Equivalent to <code>[A-Za-z0-9_]</code>. || "_Foo- Bars+" <code><nowiki>/\w/g</nowiki></code> -> <code>[ "_", "F", "o", "o", "B", "a", "r", "s" ]</code>
|-
| <code>\W</code> || Non-word character class || Equivalent to <code>[^A-Za-z0-9_]</code>. || "_Foo- Bars+" <code><nowiki>/\W/g</nowiki></code> -> <code>[ "-", " ", "+" ]</code>
|-
| <code>\s</code> || White space character class || Matches any whitespace character. Equivalent to <code>[\f\n\r\t\v\u0020\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff]</code>. || "_Foo- Bars+" <code><nowiki>/\s/g</nowiki></code> -> <code>[ " " ]</code>
|-
| <code>\S</code> || Non-white space character class || Matches any character that is not whitespace. Equivalent to <code>[^\f\n\r\t\v\u0020\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff]</code>. || "_Foo- Bars+" <code><nowiki>/\S/g</nowiki></code> -> <code>[ "_", "F", "o", "o", "-", "B", "a", "r", "s", "+" ]</code>
|-
| <code>\t</code> || Horizontal tab || Matches horizontal tab characters. || "a\tb" <code><nowiki>/\t/g</nowiki></code> -> <code>[ "\t" ]</code>
|-
| <code>\n</code> || New line || Matches line feed (new line) characters. || rowspan="4" | "a\nb" <code><nowiki>/(?:\r?\n)|(?:\v)|(?:\f)/g</nowiki></code> -> <code>[ "\n" ]</code>
|-
| <code>\r</code> || Carriage return || Matches carriage return characters.
|-
| <code>\v</code> || Vertical tab || Matches vertical tab characters.
|-
| <code>\f</code> || Form feed || Matches form feed characters.
|-
| <code>[\b]</code> || Backspace || Matches the backspace character. || rowspan="5" | No example can be provided.
|-
| <code>\0</code> || NUL || Matches the NUL character.
|-
| <code>\u{YYYY}</code> || rowspan="2" | Unicode value escape || Matches the character with the given Unicode code point, where each <code>Y</code> is a hexadecimal digit. Only recognized when the <code>u</code> flag is set.
|-
| <code>\uYYYY</code> || Matches the UTF-16 code unit with the given four-digit hexadecimal value, where each <code>Y</code> is a hexadecimal digit.
|-
| <code>\p{x}</code> or <code>\P{x}</code> || Unicode character class || <code>\p{x}</code> matches a character that has the Unicode property <code>x</code>; <code>\P{x}</code> matches a character that does not. Only recognized when the <code>u</code> flag is set.
|-
| <code>\cx</code> || Caret notation escape || Matches the control character written as <code>^x</code> in [[w:Caret notation|caret notation]], where <code>x</code> is a single letter from A to Z. || "a\r\nb" <code><nowiki>/\cM\cJ/g</nowiki></code> -> <code><nowiki>[ "\r\n" ]</nowiki></code>
|-
! colspan=4 | Assertions
|-
| <code>^</code> || Input boundary beginning || Matches the beginning of the input. If the <code>m</code> flag is set, it also matches the start of each line. || rowspan="2" | "Foo Bar" <code><nowiki>/(^Foo)|(Bar$)/g</nowiki></code> -> <code>[ "Foo", "Bar" ]</code>
|-
| <code>$</code> || Input boundary end || Matches the end of the input. If the <code>m</code> flag is set, it also matches the end of each line.
|-
| <code>\b</code> || Word boundary || Matches the position at the start or end of a word. || "Foo Bar" <code>/(\bFoo\b)/</code> -> <code>[ "Foo" ]</code>
|-
| <code>\B</code> || Non-word boundary || Matches a position that is not a word boundary, such as the middle of a word. || "Foo Bar" <code>/(B\Bar)/</code> -> <code>[ "Bar" ]</code>
|-
| <code>x(?=y)</code> || Positive lookahead || Matches <code>x</code> only if it is followed by <code>y</code>, without including <code>y</code> in the match. || "Foo Bar" <code>/Foo(?= Bar)/</code> -> <code>[ "Foo" ]</code>
|-
| <code>x(?!y)</code> || Negative lookahead || Matches <code>x</code> only if it is not followed by <code>y</code>. || "Foo Bar" <code>/Foo(?! Car)/</code> -> <code>[ "Foo" ]</code>
|-
| <code>(?<=x)y</code> || Positive lookbehind || Matches <code>y</code> only if it is preceded by <code>x</code>, without including <code>x</code> in the match. || "Foo Bar" <code>/(?<=Foo )Bar/</code> -> <code>[ "Bar" ]</code>
|-
| <code>(?<!x)y</code> || Negative lookbehind || Matches <code>y</code> only if it is not preceded by <code>x</code>. || "Foo Bar" <code>/(?<!Moo )Bar/</code> -> <code>[ "Bar" ]</code>
|-
! colspan=4 | Quantifiers
|-
| <code>x*</code> || Wild-amount || Matches <code>x</code> any number of times, including 0. || "Foo Foo Foo Bar" <code>/(?:Foo )*Bar/g</code> -> <code>[ "Foo Foo Foo Bar" ]</code>
|-
| <code>x+</code> || Wild-1-or-more || Matches <code>x</code> one or more times. || "Foo Bar Bar" <code>/(Foo)+ (Bar\s?)+/</code> -> <code>[ "Foo Bar Bar" ]</code>
|-
| <code>x?</code> || Can occur || Matches <code>x</code> zero or one time, so <code>x</code> is optional. || "Foo " <code>/Foo (Bar)?/</code> -> <code>[ "Foo " ]</code>
|-
| <code>x{Y}</code> || Occurs set times || Matches <code>x</code> exactly <code>Y</code> times. || "Foo Bar Bar" <code>/Fo{2} (?:Bar\s?){2}/</code> -> <code>[ "Foo Bar Bar" ]</code>
|-
| <code>x{Y,Z}</code> || Occurs between set times || Matches <code>x</code> at least <code>Y</code> and at most <code>Z</code> times. || "Foooo Bar Bar Bar Bar Bar" <code>/Fo{2,5} (?:Bar\s?){1,10}/</code> -> <code>[ "Foooo Bar Bar Bar Bar Bar" ]</code>
|-
| <code>x*?</code>, <code>x+?</code>, <code>x??</code>, <code>x{Y}?</code>, or <code>x{Y,Z}?</code> || Lazy match || Matches <code>x</code> as few times as possible while still satisfying the base quantifier. || "Foooo Bar Bar Bar Bar Bar" <code>/Fo{2,5} (?:Bar\s??){1,10}?/</code> -> <code>[ "Foooo Bar" ]</code>
|}

== See also ==
* [[Match () with regex () ()|<sb>(Match ([foo bar)] with regex ([foo)] ([g)] :: operators)</sb>]]
* [[Test regex () () with text ()|<sb><Test regex ([foo bar)] ([g)] with text ([foo)] :: sensing></sb>]]

== External links ==