User:Steve0Greatness/drafts/Regular expression: Difference between revisions

Latest revision as of 17:53, 2 July 2024

Figure: the first paragraph of Lorem ipsum with every consonant-vowel pair highlighted. The alternating yellow and orange highlights show the matches for the following regexp pattern: `/[a-z](?<![aeiou])[aeiou]/gi` (any consonant-vowel pair)

A regular expression, often shortened to regex, is a string of text that specifies a match pattern.
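As a rough illustration, the consonant-vowel pattern from the caption above can be tried in JavaScript, the flavor covered by the MDN documentation this page draws on. The sample text and the `sample` and `pairs` names are assumptions made for this sketch.

```javascript
// The sample text is an assumption; the pattern is the one from the caption.
const sample = "Lorem ipsum";

// [a-z] matches a letter, (?<![aeiou]) rejects that letter if it is a vowel,
// and [aeiou] requires the following letter to be a vowel.
const pairs = sample.match(/[a-z](?<![aeiou])[aeiou]/gi);

console.log(pairs); // [ "Lo", "re", "su" ]
```

The `g` flag collects every pair, and the `i` flag lets the capital "L" count as a letter.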

Syntax

In the Symbol(s) column, lowercase `x`, `y`, and `z` are placeholders for text. Capital `X`, `Y`, and `Z` are placeholders for numbers.

Syntax Reference
Symbol(s)	Name	Description	Example
Groups and backreferences
`(x)`	Capture group	Groups part of the pattern and captures what it matched, so it can be read separately in the output.	"Foo Bar" `/(Foo)|(Bar)/g` -> `[ "Foo", "Bar" ]`
`(?:x)`	Non-capture group	Groups part of the pattern without capturing what it matched.	"Foo Bar" `/(?:Foo)|(?:Bar)/g` -> `[ "Foo", "Bar" ]`
`(?<y>x)`	Named capture group	Equivalent to `(x)`, except that the captured content is stored under the name `y`.	"Foo Bar" `/(?<F>Foo)|(?<B>Bar)/g` -> `[ "Foo", "Bar" ]`
`\k<y>`	Named backreference	References a previous named capture group, note that `\k` is literal	"Foo Foo" `/(?<Foo>Foo)\s\k<Foo>/g` -> `[ "Foo Foo" ]`
Character classes
`[x-z]`	Character class	Matches any single letter or number in the range from `x` to `z`.	"Foo Bar" `/[a-f]/gi` -> `[ "F", "B", "a" ]`
`[xyz]`	Character class	Matches any single character that is `x`, `y`, or `z`.	"Foo Bar" `/[FB]/g` -> `[ "F", "B" ]`
`[^x-z]`	Negated character class	Matches any single character outside the range from `x` to `z`.	"Foo Bar" `/[^a-f]/gi` -> `[ "o", "o", " ", "r" ]`
`[^xyz]`	Negated character class	Matches any single character that isn't `x`, `y`, or `z`.	"Foo Bar" `/[^FB]/g` -> `[ "o", "o", " ", "a", "r" ]`
`.`	Wildcard	Matches every character besides line terminators. Line terminators include `\n`, `\r`, `\u2028`, and `\u2029`	"Foo Bar" `/./g` -> `[ "F", "o", "o", " ", "B", "a", "r" ]`
`x|y`	Disjunction	Matches either `x` or `y`.	"Foo Bar" `/Foo|Bar/g` -> `[ "Foo", "Bar" ]`
`\`	Escape character	Makes a character that is reserved for regex, such as `*`, `|`, or `.`, match literally. Note that the backslash is itself a reserved character, so to match it you need to use `\\`.	"Foo.bar apple 78.9 banana" `/[A-Za-z0-9]*\.[A-Za-z0-9]*/g` -> `[ "Foo.bar", "78.9" ]`
`\d`	Digit character class	Equivalent to `[0-9]`	"78 Foo Bars" `/\d/g` -> `[ "7", "8" ]`
`\D`	Non-digit character class	Equivalent to `[^0-9]`	"78 Foo Bars" `/\D/g` -> `[ " ", "F", "o", "o", " ", "B", "a", "r", "s" ]`
`\w`	Word character class	Equivalent to `[A-Za-z0-9_]`	"_Foo- Bars+" `/\w/g` -> `[ "_", "F", "o", "o", "B", "a", "r", "s" ]`
`\W`	Non-word character class	Equivalent to `[^A-Za-z0-9_]`	"_Foo- Bars+" `/\W/g` -> `[ "-", " ", "+" ]`
`\s`	White space character class	Matches all whitespace characters. Equivalent to `[\f\n\r\t\v\u0020\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff]`	"_Foo- Bars+" `/\s/g` -> `[ " " ]`
`\S`	Non-white space character class	Matches everything but whitespace characters. Equivalent to `[^\f\n\r\t\v\u0020\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff]`	"_Foo- Bars+" `/\S/g` -> `[ "_", "F", "o", "o", "-", "B", "a", "r", "s", "+" ]`
`\t`	Horizontal tab	Matches horizontal tab characters.	"a\tb" `/\t/g` -> `[ "\t" ]`
`\n`	New line	Matches linefeed/new line characters	"a\nb" `/(?:\r?\n)|(?:\v)|(?:\f)/g` -> `[ "\n" ]`
`\r`	Carriage return	Matches carriage return characters
`\v`	Vertical tab	Matches vertical tab characters
`\f`	Form feed	Matches form feed characters
`[\b]`	Backspace	Matches backspace	No example can be provided
`\0`	NUL	Matches the NUL character
`\u{YYYY}`	Unicode value escape	Matches the character with the given Unicode code point. Only available when the `u` flag is applied. Here each `Y` represents a hexadecimal digit.
`\uYYYY`	Unicode value escape	Matches the UTF-16 code unit with the given hexadecimal value. Each `Y` represents a hexadecimal digit.
`\p{x}` or `\P{x}`	Unicode character class	`\p{x}` matches a character that has the Unicode property `x`; `\P{x}` matches a character that does not have it.
`\cx`	Caret notation escape	Matches the control character written as `^x` in caret notation, where `x` is a single letter. For example, `\cM` matches carriage return and `\cJ` matches linefeed.	"a\r\nb" `/\cM\cJ/g` -> `[ "\r\n" ]`
Assertions
`^`	Input boundary beginning	Matches the beginning of the input. If the `m` flag is on, it matches the start of each line.	"Foo Bar" `/(^Foo)|(Bar$)/g` -> `[ "Foo", "Bar" ]`
`$`	Input boundary end	Matches the end of the input. If the `m` flag is on, it matches the end of each line.	"Foo Bar" `/(^Foo)|(Bar$)/g` -> `[ "Foo", "Bar" ]`
`\b`	Word boundary	Matches either end of a word.	"Foo Bar" `/(\bFoo\b)/` -> `[ "Foo" ]`
`\B`	Non-word boundary	Matches any position that is not a word boundary, such as the middle of a word.	"Foo Bar" `/(B\Bar)/` -> `[ "Bar" ]`
`x(?=y)`	Positive lookahead	Matches `x` only if it is followed by `y`, but doesn't include `y` in the output.	"Foo Bar" `/Foo(?= Bar)/` -> `[ "Foo" ]`
`x(?!y)`	Negative lookahead	Matches `x` only if it is not followed by `y`.	"Foo Bar" `/Foo(?! Car)/` -> `[ "Foo" ]`
`(?<=x)y`	Positive lookbehind	Matches `y` only if it is preceded by `x`, but doesn't include `x` in the output.	"Foo Bar" `/(?<=Foo )Bar/` -> `[ "Bar" ]`
`(?<!x)y`	Negative lookbehind	Matches `y` only if it is not preceded by `x`.	"Foo Bar" `/(?<!Moo )Bar/` -> `[ "Bar" ]`
Quantifiers
`x*`	Wild-amount	Matches `x` any number of times, including 0.	"Foo Foo Foo Bar" `/(?:Foo )*Bar/g` -> `[ "Foo Foo Foo Bar" ]`
`x+`	Wild-1-or-more	Matches `x` if it occurs 1 or more times.	"Foo BarBar" `/(?:Foo)+ (?:Bar)+/` -> `[ "Foo BarBar" ]`
`x?`	Can occur	Matches `x` zero or one time: if it occurs it is matched, otherwise it is ignored.	"Foo " `/Foo (Bar)?/` -> `[ "Foo " ]`
`x{Y}`	Occurs set times	Matches if `x` occurs exactly `Y` times.	"Foo Bar Bar" `/Fo{2} (?:Bar\s?){2}/` -> `[ "Foo Bar Bar" ]`
`x{Y,Z}`	Occurs between set times	Matches if `x` occurs between `Y` and `Z` times, inclusive.	"Foooo Bar Bar Bar Bar Bar" `/Fo{2,5} (?:Bar\s?){1,10}/` -> `[ "Foooo Bar Bar Bar Bar Bar" ]`
`x*?`, `x+?`, `x??`, `x{Y}?`, or `x{Y,Z}?`	Lazy match	Matches `x` the least number of times possible, in accordance with the base rule.	"Foooo Bar Bar Bar Bar Bar" `/Fo{2,5} (?:Bar\s??){1,10}?/` -> `[ "Foooo Bar" ]`
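
A few rows of the table can be checked with a short JavaScript sketch like the one below. It assumes a runtime that supports named groups and lookbehind (any recent browser or Node.js release should); the variable names are made up for this example and are not part of any API.

```javascript
// Named capture group plus named backreference: "Foo", whitespace, then "Foo" again.
const named = "Foo Foo".match(/(?<word>Foo)\s\k<word>/);
console.log(named[0], named.groups.word);

// Non-capture groups with the g flag: every match is listed separately.
const either = "Foo Bar".match(/(?:Foo)|(?:Bar)/g);
console.log(either);

// Lookbehind and lookahead: the asserted text is checked but not returned.
const afterFoo = "Foo Bar".match(/(?<=Foo )Bar/);
const beforeBar = "Foo Bar".match(/Foo(?= Bar)/);
console.log(afterFoo[0], beforeBar[0]);

// Greedy versus lazy quantifiers on the same input.
const input = "Foooo Bar Bar Bar Bar Bar";
const greedy = input.match(/Fo{2,5} (?:Bar\s?){1,10}/);
const lazy = input.match(/Fo{2,5} (?:Bar\s??){1,10}?/);
console.log(greedy[0]); // the whole input
console.log(lazy[0]);   // stops after the first "Bar"
```

Without the `g` flag, `match` returns the full match at index 0 followed by any capture groups, which is why the sketch reads `[0]` for the single-match cases.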

Flags

Flag	Name	Description
`g`	`g`lobal	Search all of a string, rather than stopping once you find an occurrence.
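
The effect of the `g` flag can be seen by running the same pattern with and without it. This is a minimal sketch in JavaScript; the `text`, `first`, and `all` names and the sample string are assumptions for illustration.

```javascript
const text = "Foo Bar Foo";

// Without g: only the first occurrence is returned, along with its position.
const first = text.match(/Foo/);
console.log(first[0], first.index);

// With g: every occurrence in the string is returned.
const all = text.match(/Foo/g);
console.log(all);
```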

External links

Regular expressions on Wikipedia
TurboWarp extension gallery featuring TrueFantom's RegExp extension. It can be loaded into PenguinMod using https://extensions.turbowarp.org/true-fantom/regexp.js as the URL in the Load Custom Extensions popup. It adds more regex functionality into PenguinMod.
regex101, fairly useful little app with some fun challenges to test your knowledge of regex.
MDN's Regular expressions documentation for JavaScript. There wasn't a good place to cite this, but I sourced a lot of stuff from here, including pretty much all of the names for each syntax element.

@@ Line 39: / Line 39: @@
 | <code>\</code> || Escape character || If a character is reserved for regex, such as <code><nowiki>*</nowiki></code>, <code><nowiki>|</nowiki></code>, or <code>.</code>. Note that this is itself a reserve character, so to match for it, you need to use <code>\\</code>. || "Foo.bar apple 78.9 banana" <code><nowiki>/[A-Za-z0-9]*\.[A-Za-z0-9]*/g</nowiki></code> -> <code>[ "Foo.bar", "78.9" ]</code>
 |-
-| <code>\d</code> || Digit character class escape || Equivalent to <code>[0-9]</code> || "78 Foo Bars" <code><nowiki>/\d/g</nowiki></code> -> <code>[ "7", "8" ]</code>
+| <code>\d</code> || Digit character class || Equivalent to <code>[0-9]</code> || "78 Foo Bars" <code><nowiki>/\d/g</nowiki></code> -> <code>[ "7", "8" ]</code>
 |-
-| <code>\D</code> || Non-digit character class escape || Equivalent to <code>[^0-9]</code> || "78 Foo Bars" <code><nowiki>/\d/g</nowiki></code> -> <code>[ "F", "o"," "o", " ", "B", "a", "r", "s" ]</code>
+| <code>\D</code> || Non-digit character class || Equivalent to <code>[^0-9]</code> || "78 Foo Bars" <code><nowiki>/\d/g</nowiki></code> -> <code>[ "F", "o"," "o", " ", "B", "a", "r", "s" ]</code>
 |-
-| <code>\w</code> || Word character class escape || Equivalent to <code>[A-Za-z0-9_]</code> || "_Foo- Bars+" <code><nowiki>/\d/g</nowiki></code> -> <code>[ "_", "F"," "o", "o", "B", "a", "r", "s" ]</code>
+| <code>\w</code> || Word character class || Equivalent to <code>[A-Za-z0-9_]</code> || "_Foo- Bars+" <code><nowiki>/\d/g</nowiki></code> -> <code>[ "_", "F"," "o", "o", "B", "a", "r", "s" ]</code>
 |-
-| <code>\W</code> || Non-word character class escape || Equivalent to <code>[^A-Za-z0-9_]</code> || "_Foo- Bars+" <code><nowiki>/\d/g</nowiki></code> -> <code>[ "-", " ", "+" ]</code>
+| <code>\W</code> || Non-word character class || Equivalent to <code>[^A-Za-z0-9_]</code> || "_Foo- Bars+" <code><nowiki>/\d/g</nowiki></code> -> <code>[ "-", " ", "+" ]</code>
 |-
-| <code>\s</code> || White space character class escape || Matches all whitespace characters. Equivalent to <code>[\f\n\r\t\v\u0020\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff]</code> || "_Foo- Bars+" <code><nowiki>/\d/g</nowiki></code> -> <code>[ "-", " ", "+" ]</code>
+| <code>\s</code> || White space character class || Matches all whitespace characters. Equivalent to <code>[\f\n\r\t\v\u0020\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff]</code> || "_Foo- Bars+" <code><nowiki>/\d/g</nowiki></code> -> <code>[ "-", " ", "+" ]</code>
 |-
-| <code>\S</code> || Non-white space character class escape || Matches everything but whitespace characters. Equivalent to <code>[^\f\n\r\t\v\u0020\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff]</code> || "_Foo- Bars+" <code><nowiki>/\d/g</nowiki></code> -> <code>[ "-", " ", "+" ]</code>
+| <code>\S</code> || Non-white space character class || Matches everything but whitespace characters. Equivalent to <code>[^\f\n\r\t\v\u0020\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff]</code> || "_Foo- Bars+" <code><nowiki>/\d/g</nowiki></code> -> <code>[ "-", " ", "+" ]</code>
 |-
-| <code>\t</code> || Horizontal tab escape || Matches horizontal tab characters. || "a&nbsp;&nbsp;&nbsp;&nbsp;b" <code><nowiki>/\t/g</nowiki></code> -> <code>[ "&nbsp;&nbsp;&nbsp;&nbsp;" ]</code>
+| <code>\t</code> || Horizontal tab || Matches horizontal tab characters. || "a&nbsp;&nbsp;&nbsp;&nbsp;b" <code><nowiki>/\t/g</nowiki></code> -> <code>[ "&nbsp;&nbsp;&nbsp;&nbsp;" ]</code>
 |-
-| <code>\n</code> || Linefeed escape || Matches linefeed/new line characters || rowspan="4" | "a<br>b" <code><nowiki>/(?:\r?\n)|(?:\v)|(?:\f)/g</nowiki></code> -> <code>[ "" ]</code>
+| <code>\n</code> || New line || Matches linefeed/new line characters || rowspan="4" | "a<br>b" <code><nowiki>/(?:\r?\n)|(?:\v)|(?:\f)/g</nowiki></code> -> <code>[ "" ]</code>
 |-
-| <code>\r</code> || Carriage return escape || Matches carriage return characters
+| <code>\r</code> || Carriage return || Matches carriage return characters
 |-
-| <code>\v</code> || Vertical tab escape || Matches vertical tab characters
+| <code>\v</code> || Vertical tab || Matches vertical tab characters
 |-
-| <code>\f</code> || Form feed escape || Matches form feed characters
+| <code>\f</code> || Form feed || Matches form feed characters
 |-
-| <code>[\b]</code> || Backspace escape || Matches backspace || rowspan="5" | No example can be provided
+| <code>[\b]</code> || Backspace || Matches backspace || rowspan="5" | No example can be provided
 |-
-| <code>\0</code> || NUL escape || Matches the NUL character
+| <code>\0</code> || NUL || Matches the NUL character
 |-
 | <code>\u{YYYY}</code> or <code>\u{YYYY}</code> || rowspan="2" | Unicode value escape || When the <code>u</code> flag is applied. Here <code>Y</code> represents a hexadecimal number.
@@ Line 69: / Line 69: @@
 | <code>\uYYYY</code> || Matches provided UTF-16 hexadecimal value. Represented with <code>Y</code>s here.
 |-
-| <code>\p{x}</code> or <code>\P{x}</code> || Unicode character class escape || Matches a character based on the Unicode property (<code>x</code>).
+| <code>\p{x}</code> or <code>\P{x}</code> || Unicode character class || Matches a character based on the Unicode property (<code>x</code>).
 |-
 | <code>\cx</code> || Caret notation escape || Matches the sequence following <code>\c</code> with [[w:Caret notation|caret notation]]. Note that <code>x</code> represents a sequence of characters here, rather than a single one. || "a<br>b" <code><nowiki>/\cM\cJ//g</nowiki></code> -> <code><nowiki>[ "" ]</nowiki></code>
@@ Line 75: / Line 75: @@
 ! colspan=4 | Assertions
 |-
-| <code>^</code> || Input boundary beginning assertion || Matches the beginning of the input. If the <code>m</code> flag is on, it matches the start of each line. || rowspan="2" | "Foo Bar" <code>/^Foo Bar$/g</code> -> <code>[ "Foo Bar" ]</code>
+| <code>^</code> || Input boundary beginning || Matches the beginning of the input. If the <code>m</code> flag is on, it matches the start of each line. || rowspan="2" | "Foo Bar" <code>/(^Foo)|(Bar$)/g</code> -> <code>[ "Foo", "Bar" ]</code>
 |-
-| <code>$</code> || Input boundary end assertion || Matches the end of the input. If the <code>m</code> flag is on, it matches the end of each line.
+| <code>$</code> || Input boundary end || Matches the end of the input. If the <code>m</code> flag is on, it matches the end of each line.
 |-
-| <code>\b</code> || Word boundary assertion || Matches either end of a word. || "Foo Bar" <code>/(\bFoo\b)/</code> -> <code>[ "Foo" ]</code>
+| <code>\b</code> || Word boundary || Matches either end of a word. || "Foo Bar" <code>/(\bFoo\b)/</code> -> <code>[ "Foo" ]</code>
 |-
-| <code>\B</code> || Non-word boundary assertion || Matches the middle of a word. || "Foo Bar" <code>/(B\Bar)/</code> -> <code>[ "Bar" ]</code>
+| <code>\B</code> || Non-word boundary || Matches the middle of a word. || "Foo Bar" <code>/(B\Bar)/</code> -> <code>[ "Bar" ]</code>
+|-
+| <code>x(?=y)</code> || Positive lookahead || Matches if <code>y</code> is after <code>x</code>, but doesn't include <code>y</code> in the output. || "Foo Bar" <code>/Foo(?= Bar)/</code> -> <code>[ "Foo" ]</code>
+|-
+| <code>x(?!y)</code> || Negative lookahead || Matches if <code>y</code> is not after <code>x</code>, but doesn't include <code>y</code> in the output. || "Foo Bar" <code>/Foo(?! Car)/</code> -> <code>[ "Foo" ]</code>
+|-
+| <code>(?<=x)y</code> || Positive lookbehind || Matches if <code>y</code> is before <code>x</code>, but doesn't include <code>y</code> in the output. || "Foo Bar" <code>/(?<=Foo )Bar/</code> -> <code>[ "Bar" ]</code>
+|-
+| <code>(?<!x)y</code> || Negative lookbehind || Matches if <code>y</code> is before <code>x</code>, but doesn't include <code>y</code> in the output. || "Foo Bar" <code>/(?<!Moo )Bar/</code> -> <code>[ "Bar" ]</code>
+|-
+! Quantifiers
+|-
+| <code>x*</code> || Wild-amount || Matches <code>x</code> any number of times, including 0. || "Foo Foo Foo Bar" <code>/(?:Foo )*Bar/g</code> -> <code>[ "Foo Foo Foo Bar" ]</code>
+|-
+| <code>x+</code> || Wild-1-or-more || Matches <code>x</code> if it occurs 1 or more times. || "Foo Bar Bar" <code>/(Foo)+ (Bar)+/</code> -> <code>[ "Foo Bar Bar" ]</code>
+|-
+| <code>x?</code> || Can occur || Matches <code>x</code> if it occurs, otherwise, ignore it. || "Foo " <code>/Foo (Bar)?/</code> -> <code>[ "Foo " ]</code>
+|-
+| <code>x{Y}</code> || Occurs set times || Matches if <code>x</code> occurs <code>Y</code> times. || "Foo Bar Bar" <code>/Fo{2} (?:Bar\s?){2}/</code> <code>[ "Foo Bar Bar" ]</code>
+|-
+| <code>x{Y,Z}</code> || Occurs between set times || Matches if <code>x</code> occurs <code>Y</code> and <code>Z</code> times. || "Foooo Bar Bar Bar Bar Bar" <code>/Fo{2,5} (?:Bar\s?){1,10}/</code> <code>[ "Foooo Bar Bar Bar Bar Bar" ]</code>
+|-
+| <code>x*?</code>, <code>x+?</code>, <code>x??</code>, <code>x{Y}?</code>, or <code>x{Y,Z}?</code> || Lazy match || Matches <code>x</code> the least number of times possible, in accordance to the base rule. || "Foooo Bar Bar Bar Bar Bar" <code>/Fo{2,5} (?:Bar\s??){1,10}?/</code> <code>[ "Foooo Bar" ]</code>
 |}
