Regular expressions

From PenguinMod Wiki (Official)
Figure: the first paragraph of Lorem ipsum with every consonant-vowel pair highlighted. The alternating yellow and orange highlights show the matches for the regexp pattern /[a-z](?<![aeiou])[aeiou]/gi (any consonant-vowel pair).

A regular expression, often shortened to regex, is a piece of text that specifies a pattern to match against other text.
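As a rough illustration, the consonant-vowel pattern /[a-z](?<![aeiou])[aeiou]/gi can be tried in plain JavaScript. This is only a sketch: the sample text and the variable names are made up for the example, and it runs in a JavaScript console rather than in PenguinMod blocks.

```javascript
// A letter that is not a vowel, followed by a vowel (case insensitive, all matches).
const consonantVowel = /[a-z](?<![aeiou])[aeiou]/gi;

// Hypothetical sample text: the opening words of Lorem ipsum.
const sampleText = "Lorem ipsum dolor sit amet";

// With the g flag, String.prototype.match returns every match as an array.
const pairs = sampleText.match(consonantVowel);

console.log(pairs); // [ "Lo", "re", "su", "do", "lo", "si", "me" ]
```

The lookbehind (?<![aeiou]) checks the letter that [a-z] has just matched, which is how the pattern rejects pairs that start with a vowel.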

Syntax

In the Symbol(s) column, lowercase x, y, and z are placeholders for text, and capital X, Y, and Z are placeholders for numbers.


Syntax Reference
Symbol(s) Name Description Example
Groups and backreferences
(x) Capture group Groups x and remembers the matched text, so it can be read separately from the output or referenced later. "Foo Bar" /(Foo)|(Bar)/g -> [ "Foo", "Bar" ]
(?:x) Non-capture group Groups x without remembering the matched text. "Foo Bar" /(?:Foo)|(?:Bar)/g -> [ "Foo", "Bar" ]
(?<y>x) Named capture group Equivalent to (x), except that the captured text can also be referenced by the name y. "Foo Bar" /(?<F>Foo)|(?<B>Bar)/g -> [ "Foo", "Bar" ]
\k<y> Named backreference Matches the same text that the named capture group y matched earlier in the pattern. Note that the \k is literal. "Foo Foo" /(?<Foo>Foo)\s\k<Foo>/g -> [ "Foo Foo" ]
Character classes
[x-z] Character class Matches any single letter or number in the range from x to z. "Foo Bar" /[a-f]/gi -> [ "F", "B", "a" ]
[xyz] Character class Matches any single character that is x, y, or z. "Foo Bar" /[FB]/g -> [ "F", "B" ]
[^x-z] Negated character class Matches any single character that is not in the range from x to z. "Foo Bar" /[^a-f]/gi -> [ "o", "o", " ", "r" ]
[^xyz] Negated character class Matches any single character that isn't x, y, or z. "Foo Bar" /[^FB]/g -> [ "o", "o", " ", "a", "r" ]
. Wildcard Matches every character besides line terminators. Line terminators include \n, \r, \u2028, and \u2029 "Foo Bar" /./g -> [ "F", "o", "o", " ", "B", "a", "r" ]
x|y Disjunction Match something or something else. "Foo Bar" /Foo|Bar/g -> [ "Foo", "Bar" ]
\ Escape character Placed before a character that is reserved for regex, such as *, |, or ., to match that character literally. Note that \ is itself a reserved character, so to match it, you need to use \\. "Foo.bar apple 78.9 banana" /[A-Za-z0-9]*\.[A-Za-z0-9]*/g -> [ "Foo.bar", "78.9" ]
\d Digit character class Equivalent to [0-9] "78 Foo Bars" /\d/g -> [ "7", "8" ]
\D Non-digit character class Equivalent to [^0-9] "78 Foo Bars" /\D/g -> [ " ", "F", "o", "o", " ", "B", "a", "r", "s" ]
\w Word character class Equivalent to [A-Za-z0-9_] "_Foo- Bars+" /\w/g -> [ "_", "F", "o", "o", "B", "a", "r", "s" ]
\W Non-word character class Equivalent to [^A-Za-z0-9_] "_Foo- Bars+" /\W/g -> [ "-", " ", "+" ]
\s White space character class Matches all whitespace characters. Equivalent to [\f\n\r\t\v\u0020\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff] "_Foo- Bars+" /\s/g -> [ " " ]
\S Non-white space character class Matches everything but whitespace characters. Equivalent to [^\f\n\r\t\v\u0020\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff] "_Foo- Bars+" /\S/g -> [ "_", "F", "o", "o", "-", "B", "a", "r", "s", "+" ]
\t Horizontal tab Matches horizontal tab characters. "a\tb" /\t/g -> [ "\t" ]
\n New line Matches linefeed/new line characters. "a\nb" /(?:\r?\n)|(?:\v)|(?:\f)/g -> [ "\n" ]
\r Carriage return Matches carriage return characters
\v Vertical tab Matches vertical tab characters
\f Form feed Matches form feed characters
[\b] Backspace Matches the backspace character. No example can be provided.
\0 NUL Matches the NUL character
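\u{YYYY} Unicode value escape Matches the character with the given Unicode code point. Only available when the u flag is applied. Here each Y represents a hexadecimal digit.
\uYYYY Unicode value escape Matches the UTF-16 code unit with the given hexadecimal value. Here each Y represents a hexadecimal digit.
\p{x} or \P{x} Unicode character class Matches a character based on the Unicode property x. \P{x} matches characters that do not have the property. Requires the u or v flag.
\cx Caret notation escape Matches the control character written as ^x in caret notation. Note that x is a single letter from A to Z here; for example, \cM is a carriage return and \cJ is a line feed. "a\r\nb" /\cM\cJ/g -> [ "\r\n" ]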
Assertions
^ Input boundary beginning Matches the beginning of the input. If the m flag is on, it matches the start of each line. "Foo Bar" /(^Foo)|(Bar$)/g -> [ "Foo", "Bar" ]
$ Input boundary end Matches the end of the input. If the m flag is on, it matches the end of each line.
\b Word boundary Matches either end of a word. "Foo Bar" /(\bFoo\b)/ -> [ "Foo" ]
\B Non-word boundary Matches any position that is not a word boundary, such as the middle of a word. "Foo Bar" /(B\Bar)/ -> [ "Bar" ]
x(?=y) Positive lookahead Matches x only if y comes directly after it, but doesn't include y in the output. "Foo Bar" /Foo(?= Bar)/ -> [ "Foo" ]
x(?!y) Negative lookahead Matches x only if y does not come directly after it. "Foo Bar" /Foo(?! Car)/ -> [ "Foo" ]
(?<=x)y Positive lookbehind Matches y only if x comes directly before it, but doesn't include x in the output. "Foo Bar" /(?<=Foo )Bar/ -> [ "Bar" ]
(?<!x)y Negative lookbehind Matches y only if x does not come directly before it. "Foo Bar" /(?<!Moo )Bar/ -> [ "Bar" ]
Quantifiers
x* Wild-amount Matches x any number of times, including 0. "Foo Foo Foo Bar" /(?:Foo )*Bar/g -> [ "Foo Foo Foo Bar" ]
x+ Wild-1-or-more Matches x if it occurs 1 or more times. "Foo Bar Bar" /Foo(?: Bar)+/ -> [ "Foo Bar Bar" ]
x? Can occur Matches x zero or one time: x is included if it occurs and skipped otherwise. "Foo " /Foo (Bar)?/ -> [ "Foo " ]
x{Y} Occurs set times Matches if x occurs exactly Y times. "Foo Bar Bar" /Fo{2} (?:Bar\s?){2}/ -> [ "Foo Bar Bar" ]
x{Y,Z} Occurs between set times Matches if x occurs between Y and Z times, inclusive. "Foooo Bar Bar Bar Bar Bar" /Fo{2,5} (?:Bar\s?){1,10}/ -> [ "Foooo Bar Bar Bar Bar Bar" ]
x*?, x+?, x??, x{Y}?, or x{Y,Z}? Lazy match Matches x the least number of times possible, in accordance with the base rule. "Foooo Bar Bar Bar Bar Bar" /Fo{2,5} (?:Bar\s??){1,10}?/ -> [ "Foooo Bar" ]
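A few rows from the table above can be checked in plain JavaScript. This is a minimal sketch, not PenguinMod-specific code: the variable names are made up for the example, and it assumes a standard JavaScript engine with lookbehind support.

```javascript
// Capture groups: without the g flag, match() returns the full match
// followed by the text of each capture group.
const captured = "Foo Bar".match(/(Foo) (Bar)/);
console.log(captured[0], "|", captured[1], "|", captured[2]); // Foo Bar | Foo | Bar

// Named capture group plus a named backreference to the same text.
const repeated = "Foo Foo".match(/(?<word>Foo)\s\k<word>/);
console.log(repeated.groups.word); // Foo

// Lookarounds test the surrounding text without adding it to the match.
const beforeBar = "Foo Bar".match(/Foo(?= Bar)/g);     // [ "Foo" ]
const afterFoo = "Foo Bar".match(/(?<=Foo )Bar/g);     // [ "Bar" ]
const notAfterFoo = "Foo Bar".match(/(?<!Foo )Bar/g);  // null, since Bar follows "Foo "

// Greedy quantifiers take as much as they can; lazy ones take as little.
const greedy = "Foooo".match(/Fo{2,5}/g);  // [ "Foooo" ]
const lazy = "Foooo".match(/Fo{2,5}?/g);   // [ "Foo" ]

console.log(beforeBar, afterFoo, notAfterFoo, greedy, lazy);
```

Note that match() returns null rather than an empty array when nothing matches, as in the negative lookbehind case.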

Flags

Whilst there are flags other than the following, they are either non-standard or do not have a bearing on PenguinMod.

Flag Name Description
g global Search all of a string, rather than stopping once you find an occurrence.
i Case insensitive The search will ignore the case of characters, making /[A-Za-z]/g and /[a-z]/gi equivalent.
m multiline Makes ^ and $ match the start and end of lines rather than the start and end of strings.
s single line/dot all Makes . able to match all line terminators: \n, \r, \u2028, and \u2029.
u unicode Treats the pattern as a sequence of Unicode code points.
v unicode upgrade An upgrade to u that adds more features, such as set operations inside character classes. It cannot be combined with u.
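The effect of most of these flags can be seen by running the same pattern with and without them. The snippet below is a small sketch in plain JavaScript; the sample strings and variable names are made up for the example, and the v flag is left out because older engines do not support it.

```javascript
const sample = "foo Foo";

// Without g, the search stops at the first occurrence.
const firstOnly = sample.match(/foo/);   // firstOnly[0] is "foo"
// g finds every occurrence; i ignores case.
const everyCase = sample.match(/foo/gi); // [ "foo", "Foo" ]

const lines = "foo\nFoo";

// Without m, ^ only matches at the start of the whole string.
const startOfInput = lines.match(/^foo/gi);  // [ "foo" ]
// With m, ^ also matches at the start of each line.
const startOfLines = lines.match(/^foo/gim); // [ "foo", "Foo" ]

// Without s, the wildcard . refuses to match the \n between the lines.
const dotDefault = /foo.Foo/.test(lines); // false
const dotAll = /foo.Foo/s.test(lines);    // true

// U+1F427 (penguin emoji) is one code point but two UTF-16 code units.
const penguin = "\u{1F427}";
const perCodeUnit = /^.$/.test(penguin);   // false: . matches only one code unit
const perCodePoint = /^.$/u.test(penguin); // true: u makes . match the whole code point

console.log(everyCase, startOfInput, startOfLines, dotDefault, dotAll, perCodeUnit, perCodePoint);
```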

See also

External links

  • Regular expressions on Wikipedia
  • TurboWarp extension gallery featuring TrueFantom's RegExp extension. It can be loaded into PenguinMod using https://extensions.turbowarp.org/true-fantom/regexp.js as the URL in the Load Custom Extensions popup. It adds more regex functionality into PenguinMod.
  • regex101, a fairly useful little app with some fun challenges to test your knowledge of regex.
  • MDN's Regular expressions documentation for JavaScript. There wasn't a good place to cite this, but I sourced a lot of stuff from here, including pretty much all of the names for each syntax element.