src/regex


A library for parsing, compiling, and executing regular expressions. Match time is linear in the length of the text and of the regular expression, so it can handle input from untrusted users. The syntax is similar to PCRE but lacks a few features that cannot be implemented while keeping the space/time complexity guarantees, e.g. backreferences.

Syntax

Matching one character

.          any character except new line (includes new line with s flag)
\d         digit (\p{Nd})
\D         not digit
\pN        One-letter name Unicode character class
\p{Greek}  Unicode character class (general category or script)
\PN        Negated one-letter name Unicode character class
\P{Greek}  negated Unicode character class (general category or script)

Character classes

[xyz]         A character class matching either x, y or z (union).
[^xyz]        A character class matching any character except x, y and z.
[a-z]         A character class matching any character in range a-z.
[[:alpha:]]   ASCII character class ([A-Za-z])
[[:^alpha:]]  Negated ASCII character class ([^A-Za-z])
[\[\]]        Escaping in character classes (matching [ or ])

Composites

xy   concatenation (x followed by y)
x|y  alternation (x or y, prefer x)

Repetitions

x*       zero or more of x (greedy)
x+       one or more of x (greedy)
x?       zero or one of x (greedy)
x*?      zero or more of x (ungreedy/lazy)
x+?      one or more of x (ungreedy/lazy)
x??      zero or one of x (ungreedy/lazy)
x{n,m}   at least n x and at most m x (greedy)
x{n,}    at least n x (greedy)
x{n}     exactly n x
x{n,m}?  at least n x and at most m x (ungreedy/lazy)
x{n,}?   at least n x (ungreedy/lazy)
x{n}?    exactly n x

Empty matches

^   the beginning of text (or start-of-line with multi-line mode)
$   the end of text (or end-of-line with multi-line mode)
\A  only the beginning of text (even with multi-line mode enabled)
\z  only the end of text (even with multi-line mode enabled)
\b  a Unicode word boundary (\w on one side and \W, \A, or \z on other)
\B  not a Unicode word boundary

Grouping and flags

(exp)          numbered capture group (indexed by opening parenthesis)
(?P<name>exp)  named (also numbered) capture group (allowed chars: [_0-9a-zA-Z])
(?:exp)        non-capturing group
(?flags)       set flags within current group
(?flags:exp)   set flags for exp (non-capturing)

Flags are each a single character. For example, (?x) sets the flag x and (?-x) clears the flag x. Multiple flags can be set or cleared at the same time: (?xy) sets both the x and y flags, (?x-y) sets the x flag and clears the y flag, and (?-xy) clears both the x and y flags.

i  case-insensitive: letters match both upper and lower case
m  multi-line mode: ^ and $ match begin/end of line
s  allow . to match \L (new line)
U  swap the meaning of x* and x*? (un-greedy mode)
u  Unicode support (enabled by default)
x  ignore whitespace and allow line comments (starting with #)

All flags are disabled by default unless stated otherwise

The re2 function also accepts a set of flags:

regexCaseless        same as (?i)
regexMultiline       same as (?m)
regexDotAll          same as (?s)
regexUngreedy        same as (?U)
regexAscii           same as (?-u)
regexExtended        same as (?x)
regexArbitraryBytes  treat both the regex and the input text
                     as arbitrary byte sequences
Note: Read the Match arbitrary bytes section to learn more about the arbitrary bytes mode and ascii mode
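
As a rough illustration of the two ways of setting flags described above, the sketch below uses inline flags and the flags parameter side by side. It assumes the library is imported as `regex`; the sample strings are made up for the example.

```nim
import regex

# Inline flag: (?i) makes the rest of the pattern case-insensitive.
doAssert match("ABC", re2"(?i)abc")
# Clearing the flag mid-pattern: only "ab" is case-insensitive here.
doAssert match("ABc", re2"(?i)ab(?-i)c")
doAssert not match("ABC", re2"(?i)ab(?-i)c")
# The same effect as (?i), expressed through the flags parameter.
doAssert match("ABC", re2(r"abc", {regexCaseless}))
# Several flags can be combined in one set.
doAssert match("A\nB", re2(r"a.b", {regexCaseless, regexDotAll}))
```

Inline flags keep the behaviour visible in the pattern itself, while the flag set is convenient when the same pattern string is reused with different options.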

Escape sequences

\*         literal *, works for any punctuation character: \.+*?()|[]{}^$
\a         bell (\x07)
\f         form feed (\x0C)
\t         horizontal tab
\n         new line (\L)
\r         carriage return
\v         vertical tab (\x0B)
\123       octal character code (up to three digits)
\x7F       hex character code (exactly two digits)
\x{10FFFF} any hex character code corresponding to a Unicode code point
\u007F     hex character code (exactly four digits)
\U0010FFFF hex character code (exactly eight digits)

Perl character classes (Unicode friendly)

These classes are based on the definitions provided in UTS#18

\d  digit (\p{Nd})
\D  not digit
\s  whitespace (\p{White_Space})
\S  not whitespace
\w  word character (\p{Alphabetic} + \p{M} + \d + \p{Pc} + \p{Join_Control})
\W  not word character

ASCII character classes

[[:alnum:]]   alphanumeric ([0-9A-Za-z])
[[:alpha:]]   alphabetic ([A-Za-z])
[[:ascii:]]   ASCII ([\x00-\x7F])
[[:blank:]]   blank ([\t ])
[[:cntrl:]]   control ([\x00-\x1F\x7F])
[[:digit:]]   digits ([0-9])
[[:graph:]]   graphical ([!-~])
[[:lower:]]   lower case ([a-z])
[[:print:]]   printable ([ -~])
[[:punct:]]   punctuation ([!-/:-@\[-`{-~])
[[:space:]]   whitespace ([\t\n\v\f\r ])
[[:upper:]]   upper case ([A-Z])
[[:word:]]    word characters ([0-9A-Za-z_])
[[:xdigit:]]  hex digit ([0-9A-Fa-f])
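
The difference between the Unicode-friendly Perl classes and the ASCII classes can be sketched as follows. This is a minimal example, assuming the library is imported as `regex`; the sample characters are arbitrary.

```nim
import regex

# ASCII classes only cover the ASCII range.
doAssert match("abc123", re2"[[:alnum:]]+")
doAssert not match("abc 123", re2"[[:alnum:]]+")
doAssert not match("Ϊ", re2"[[:alpha:]]")
# \w is Unicode-aware by default, so it matches a Greek letter.
doAssert match("Ϊ", re2"\w")
# In ASCII mode, \w is restricted to ASCII word characters.
doAssert not match("Ϊ", re2(r"\w", {regexAscii}))
```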

Lookaround Assertions

(?=regex)   A positive lookahead assertion
(?!regex)   A negative lookahead assertion
(?<=regex)  A positive lookbehind assertion
(?<!regex)  A negative lookbehind assertion

Any regex expression is a valid lookaround; groups are captured as well. Beware, lookarounds containing repetitions (*, +, and {n,}) may run in polynomial time.
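
A short sketch of how lookaround assertions behave, assuming the library is imported as `regex`. The assertions check the surrounding text without including it in the match boundaries.

```nim
import regex

var m: RegexMatch2
# Positive lookahead: match "foo" only when "bar" follows.
# "bar" is checked but is not part of the match.
doAssert find("foobar", re2"foo(?=bar)", m)
doAssert m.boundaries == 0 .. 2
# Negative lookahead: match "foo" only when "bar" does not follow.
doAssert re2"foo(?!bar)" notin "foobar"
doAssert re2"foo(?!bar)" in "foobaz"
# Positive lookbehind: match "bar" only when preceded by "foo".
doAssert find("foobar", re2"(?<=foo)bar", m)
doAssert m.boundaries == 3 .. 5
```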

Examples

Match

The match function matches a text from start to end, similar to ^regex$. This means the whole text needs to match the regex for this function to return true.

let text = "nim c --styleCheck:hint --colors:off regex.nim"
var m: RegexMatch2
if match(text, re2"nim c (?:--(\w+:\w+) *)+ (\w+).nim", m):
  doAssert text[m.group(0)] == "colors:off"
  doAssert text[m.group(1)] == "regex"
else:
  doAssert false, "no match"

Captures

Like most other regex engines, this library only captures the last repetition in a repeated group (*, +, {n}). Note how in the previous example both styleCheck:hint and colors:off are matched in the same group but only the last captured match (colors:off) is returned.

To check whether a capture group matched, compare it against reNonCapture. For example: doAssert m.group(0) != reNonCapture. This is useful to disambiguate empty captures from non-matched captures, since both return an empty string when slicing the text.
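
The distinction between a non-matched capture and an empty capture might look like this in practice. This is a sketch, assuming the library is imported as `regex`; the patterns are chosen only to produce each case.

```nim
import regex

var m: RegexMatch2

# Non-matched capture: the second alternative never participates.
let text = "a"
doAssert match(text, re2"(a)|(b)", m)
doAssert m.group(0) == 0 .. 0
doAssert m.group(1) == reNonCapture

# Empty capture: the group participates but captures nothing.
let text2 = "b"
doAssert match(text2, re2"(a?)b", m)
doAssert m.group(0) != reNonCapture
doAssert text2[m.group(0)] == ""
```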

The space complexity for captures is O(regex_len * groups_count), and so it can be used to match untrusted text.

Find

The find function finds the first piece of text that matches a given regex.

let text = """
The Continental's email list:
john_wick@continental.com
winston@continental.com
ms_perkins@continental.com
"""
var match = ""
var capture = ""
var m: RegexMatch2
if find(text, re2"(\w+)@\w+\.\w+", m):
  match = text[m.boundaries]
  capture = text[m.group(0)]
doAssert match == "john_wick@continental.com"
doAssert capture == "john_wick"

Find All

The findAll function finds all pieces of text that match a given regex, returning their boundaries and captures/submatches.

let text = """
The Continental's email list:
john_wick@continental.com
winston@continental.com
ms_perkins@continental.com
"""
var matches = newSeq[string]()
var captures = newSeq[string]()
for m in findAll(text, re2"(\w+)@\w+\.\w+"):
  matches.add text[m.boundaries]
  captures.add text[m.group(0)]
doAssert matches == @[
  "john_wick@continental.com",
  "winston@continental.com",
  "ms_perkins@continental.com"
]
doAssert captures == @["john_wick", "winston", "ms_perkins"]

Verbose Mode

Verbose mode (?x) makes regexes more readable by allowing comments and multiple lines within the regular expression itself. The caveat is that spaces and pound signs must be escaped to be matched.

const exp = re2"""(?x)
\#   # the hashtag
\w+  # hashtag words
"""
let text = "#NimLang"
doAssert match(text, exp)

Match Macro

The match macro is sometimes more convenient, and faster than the function version. It will run a full match on the whole string, similar to ^regex$.

A matches: seq[string] variable is injected into the scope, and it contains the submatches for every capture group.

var matched = false
let text = "[my link](https://example.com)"
match text, rex"\[([^\]]+)\]\((https?://[^)]+)\)":
  doAssert matches == @["my link", "https://example.com"]
  matched = true
doAssert matched

Invalid UTF-8 input text

UTF-8 validation of the input text is only done in debug mode, for performance reasons. The behaviour on invalid UTF-8 input (e.g. malformed, corrupted, or truncated text) when compiling in release/danger mode is currently undefined, and it will likely result in an internal AssertionDefect or some other error.

To avoid this, validate the input text before passing it to the match function.

import unicode
# good input text
doAssert validateUtf8("abc") == -1
# bad input text
doAssert validateUtf8("\xf8\xa1\xa1\xa1\xa1") != -1

Note: at the time of writing, Nim's validateUtf8 is not strict enough, so you are better off using nim-unicodeplus's verifyUtf8 function.

Match arbitrary bytes

Setting the regexArbitraryBytes flag will treat both the regex and the input text as byte sequences. This flag makes ascii mode the default.

Note: Do not confuse this with ascii mode. Setting the regex to ascii mode (?-u) alone is not enough to match arbitrary bytes, since both the input and the regex will still be treated as UTF-8.
const flags = {regexArbitraryBytes}
doAssert match("\xff", re2(r"\xff", flags))
doAssert match("\xf8\xa1\xa1\xa1\xa1", re2(r".+", flags))

Beware of (un)expected behaviour when mixing in UTF-8 characters.

const flags = {regexArbitraryBytes}
doAssert match("Ⓐ", re2(r"Ⓐ", flags))
doAssert match("ⒶⒶ", re2(r"(Ⓐ)+", flags))
doAssert not match("ⒶⒶ", re2(r"Ⓐ+", flags))  # ???

The last line in the above example won't match because the regex is parsed as a byte sequence. The character is composed of multiple bytes (\xe2\x92\xb6), and only the last byte is affected by the + operator.

Compile the regex at compile time

Passing a regex literal or assigning it to a const will compile the regex at compile time. Errors in the expression will be caught at compile time this way.

Do not confuse the regex compilation with the matching operation. The following examples do the matching at runtime. But matching at compile-time is supported as well.

let text = "abc"
block:
  const rexp = re2".+"
  doAssert match(text, rexp)
block:
  doAssert match(text, re2".+")
block:
  func myFn(s: string, exp: static string) =
    const rexp = re2(exp)
    doAssert match(s, rexp)
  myFn(text, r".+")

Using a const can avoid confusion when passing flags:

let text = "abc"
block:
  const rexp = re2(r".+", {regexDotAll})
  doAssert match(text, rexp)
block:
  doAssert match(text, re2(r".+", {regexDotAll}))
block:
  # this will compile the expression at runtime
  # because flags is a runtime value (let), avoid it!
  let flags = {regexDotAll}
  doAssert match(text, re2(r".+", flags))

Compile the regex at runtime

Note: Consider compiling the regex at compile-time whenever possible.

Most of the time compiling the regex at runtime can be avoided, and it should be avoided. Nim has really good compile-time capabilities like reading files, constructing strings, and so on. However, it cannot be helped in cases where the regex is passed to the program at runtime (from terminal input, network, or text files).

To compile the regex at runtime define the regex literal as a var/let, or pass the string expression as a var.

let text = "abc"
block:
  var rexp = re2".+"
  doAssert match(text, rexp)
block:
  let rexp = re2".+"
  doAssert match(text, rexp)
block:
  var exp = r".+"
  doAssert match(text, re2(exp))
block:
  func myFn(s: string, exp: string) =
    doAssert match(s, re2(exp))
  myFn(text, r".+")
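
Since the runtime re2 overload is documented as raising RegexError, a pattern received at runtime can be checked before use. The following is a minimal sketch, assuming the library is imported as `regex`; the tryCompile helper and the sample patterns are hypothetical and not part of the library.

```nim
import regex

# Hypothetical helper: report whether a runtime pattern compiles.
proc tryCompile(pattern: string): bool =
  try:
    discard re2(pattern)  # runtime overload, may raise RegexError
    result = true
  except RegexError:
    result = false

var good = r"\d+"   # stands in for user-provided input
var bad = "(abc"    # unbalanced parenthesis
doAssert tryCompile(good)
doAssert not tryCompile(bad)
```

With a compile-time regex the same mistake would be reported by the compiler instead, which is one reason to prefer compile-time patterns where possible.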

Consts

reNonCapture = (a: -1, b: -2)

Procs

func contains(s: string; pattern: Regex): bool {.inline, ...raises: [],
    deprecated: "use contains(string, Regex2) instead", tags: [RootEffect].}
Deprecated: use contains(string, Regex2) instead
func contains(s: string; pattern: Regex2): bool {.inline, ...raises: [],
    tags: [RootEffect].}

Example:

doAssert re2"bc" in "abcd"
doAssert re2"(23)+" in "23232"
doAssert re2"^(23)+$" notin "23232"
func endsWith(s: string; pattern: Regex): bool {.inline, ...raises: [],
    deprecated: "use endsWith(string, Regex2) instead", tags: [RootEffect].}
Deprecated: use endsWith(string, Regex2) instead
func endsWith(s: string; pattern: Regex2): bool {.inline, ...raises: [],
    tags: [RootEffect].}
return whether the string ends with the pattern or not

Example:

doAssert "abc".endsWith(re2"\w")
doAssert not "abc".endsWith(re2"\d")
func escapeRe(s: string): string {....raises: [], tags: [].}
Escape special regex characters in s so that it can be matched verbatim
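
As an illustration, escapeRe could be used to match a runtime string literally. This is a sketch, assuming the library is imported as `regex`; the input string is made up for the example.

```nim
import regex

var raw = "1+1=2"  # stands in for runtime input containing a special character
let pat = re2(escapeRe(raw))
# The escaped pattern matches the input verbatim.
doAssert match("1+1=2", pat)
doAssert not match("11=2", pat)
# Without escaping, + is a repetition operator, not a literal plus sign.
doAssert match("11=2", re2"1+1=2")
doAssert not match("1+1=2", re2"1+1=2")
```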
func find(s: string; pattern: Regex2; m: var RegexMatch2; start = 0): bool {.
    inline, ...raises: [], tags: [RootEffect].}
search through the string looking for the first location where there is a match

Example:

var m: RegexMatch2
doAssert "abcd".find(re2"bc", m) and
  m.boundaries == 1 .. 2
doAssert not "abcd".find(re2"de", m)
doAssert "2222".find(re2"(22)*", m) and
  m.group(0) == 2 .. 3
func find(s: string; pattern: Regex; m: var RegexMatch; start = 0): bool {.
    inline, ...raises: [],
    deprecated: "use find(string, Regex2, var RegexMatch2) instead",
    tags: [RootEffect].}
Deprecated: use find(string, Regex2, var RegexMatch2) instead
func findAll(s: string; pattern: Regex2; start = 0): seq[RegexMatch2] {.inline,
    ...raises: [], tags: [RootEffect].}
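A possible usage of the seq-returning findAll, sketched under the assumption that the library is imported as `regex`:

```nim
import regex

let text = "abcabc"
# Collect every match into a seq instead of iterating.
let ms = findAll(text, re2"bc")
doAssert ms.len == 2
doAssert ms[0].boundaries == 1 .. 2
doAssert ms[1].boundaries == 4 .. 5
doAssert text[ms[1].boundaries] == "bc"
```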
func findAll(s: string; pattern: Regex; start = 0): seq[RegexMatch] {.inline,
    ...raises: [], deprecated: "use findAll(string, Regex2) instead",
    tags: [RootEffect].}
Deprecated: use findAll(string, Regex2) instead
func findAllBounds(s: string; pattern: Regex2; start = 0): seq[Slice[int]] {.
    inline, ...raises: [], tags: [RootEffect].}
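When only the match positions are needed, the seq-returning findAllBounds can be used along these lines (a sketch, assuming the library is imported as `regex`):

```nim
import regex

# Only the boundaries of each match are returned, without captures.
doAssert findAllBounds("abcabc", re2"bc") == @[1 .. 2, 4 .. 5]
doAssert findAllBounds("abcabc", re2"x").len == 0
```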
func findAllBounds(s: string; pattern: Regex; start = 0): seq[Slice[int]] {.
    inline, ...raises: [], deprecated: "use findAllBounds(string, Regex2) instead",
    tags: [RootEffect].}
Deprecated: use findAllBounds(string, Regex2) instead
func findAndCaptureAll(s: string; pattern: Regex): seq[string] {.inline,
    ...raises: [], deprecated: "use findAll(string, Regex2) instead",
    tags: [RootEffect].}
Deprecated: use findAll(string, Regex2) instead
func group(m: RegexMatch2; i: int): Slice[int] {.inline, ...raises: [], tags: [].}
Return the slice for a given group. Slices where start > end are empty matches (e.g. re2"(\d?)"), and they are included, the same as in PCRE.

Example:

let text = "abc"
var m: RegexMatch2
doAssert text.match(re2"(\w)+", m)
doAssert text[m.group(0)] == "c"
func group(m: RegexMatch2; s: string): Slice[int] {.inline, ...raises: [KeyError],
    tags: [].}
Return the slice for a given named group

Example:

let text = "abc"
var m: RegexMatch2
doAssert text.match(re2"(?P<foo>\w)+", m)
doAssert text[m.group("foo")] == "c"
func group(m: RegexMatch; groupName: string; text: string): seq[string] {.
    inline, ...raises: [KeyError], deprecated, tags: [].}
Deprecated
func group(m: RegexMatch; i: int): seq[Slice[int]] {.inline, ...raises: [],
    deprecated: "use group(RegexMatch2, int)", tags: [].}
Deprecated: use group(RegexMatch2, int)
func group(m: RegexMatch; i: int; text: string): seq[string] {.inline,
    ...raises: [], deprecated, tags: [].}
Deprecated
func group(m: RegexMatch; s: string): seq[Slice[int]] {.inline,
    ...raises: [KeyError], deprecated: "use group(RegexMatch2, string)", tags: [].}
Deprecated: use group(RegexMatch2, string)
func groupFirstCapture(m: RegexMatch; groupName: string; text: string): string {.
    inline, ...raises: [KeyError], deprecated, tags: [].}
Deprecated
func groupFirstCapture(m: RegexMatch; i: int; text: string): string {.inline,
    ...raises: [], deprecated, tags: [].}
Deprecated
func groupLastCapture(m: RegexMatch; groupName: string; text: string): string {.
    inline, ...raises: [KeyError],
    deprecated: "use group(RegexMatch2, string) instead", tags: [].}
Deprecated: use group(RegexMatch2, string) instead
func groupLastCapture(m: RegexMatch; i: int; text: string): string {.inline,
    ...raises: [], deprecated: "use group(RegexMatch2, int) instead", tags: [].}
Deprecated: use group(RegexMatch2, int) instead
func groupNames(m: RegexMatch): seq[string] {.inline, ...raises: [],
    deprecated: "use groupNames(RegexMatch2)", tags: [].}
Deprecated: use groupNames(RegexMatch2)
func groupNames(m: RegexMatch2): seq[string] {.inline, ...raises: [], tags: [].}
return the names of capturing groups.

Example:

let text = "hello world"
var m: RegexMatch2
doAssert text.match(re2"(?P<greet>hello) (?P<who>world)", m)
doAssert m.groupNames == @["greet", "who"]
func groupsCount(m: RegexMatch): int {.inline, ...raises: [], deprecated: "use groupsCount(RegexMatch2)",
                                       tags: [].}
Deprecated: use groupsCount(RegexMatch2)
func groupsCount(m: RegexMatch2): int {.inline, ...raises: [], tags: [].}
return the number of capturing groups

Example:

var m: RegexMatch2
doAssert "ab".match(re2"(a)(b)", m)
doAssert m.groupsCount == 2
func isInitialized(re: Regex): bool {.inline, ...raises: [], deprecated: "use isInitialized(Regex2) instead",
                                      tags: [].}
Deprecated: use isInitialized(Regex2) instead
func isInitialized(re: Regex2): bool {.inline, ...raises: [], tags: [].}
Check whether the regex has been initialized

Example:

var re: Regex2
doAssert not re.isInitialized
re = re2"foo"
doAssert re.isInitialized
func match(s: string; pattern: Regex): bool {.inline, ...raises: [],
    deprecated: "use match(string, Regex2) instead", tags: [RootEffect].}
Deprecated: use match(string, Regex2) instead
func match(s: string; pattern: Regex2): bool {.inline, ...raises: [],
    tags: [RootEffect].}
func match(s: string; pattern: Regex2; m: var RegexMatch2; start = 0): bool {.
    inline, ...raises: [], tags: [RootEffect].}
Return true if the whole string matches the regular expression, storing the match in m. This is similar to find(text, re2"^regex$", m) but has better performance

Example:

var m: RegexMatch2
doAssert "abcd".match(re2"abcd", m)
doAssert not "abcd".match(re2"abc", m)
func match(s: string; pattern: Regex; m: var RegexMatch; start = 0): bool {.
    inline, ...raises: [],
    deprecated: "use match(string, Regex2, var RegexMatch2) instead",
    tags: [RootEffect].}
Deprecated: use match(string, Regex2, var RegexMatch2) instead
func re(s: static string): static[Regex] {.inline,
    ...deprecated: "use re2(static string) instead".}
Deprecated: use re2(static string) instead
func re(s: string): Regex {....raises: [RegexError],
                            deprecated: "use re2(string) instead", tags: [].}
Deprecated: use re2(string) instead
func re2(s: static string; flags: static RegexFlags = {}): static[Regex2] {.
    inline.}
Parse and compile a regular expression at compile-time
func re2(s: string; flags: RegexFlags = {}): Regex2 {....raises: [RegexError],
    tags: [].}
Parse and compile a regular expression at run-time

Example:

let abcx = re2"abc\w"
let abcx2 = re2(r"abc\w")
let pat = r"abc\w"
let abcx3 = re2(pat)
func replace(s: string; pattern: Regex2;
             by: proc (m: RegexMatch2; s: string): string; limit = 0): string {.
    inline, ...raises: [], effectsOf: by, ...tags: [RootEffect].}

Replace matched substrings.

If limit is given, at most limit replacements are done. limit of 0 means there is no limit

Example:

proc removeStars(m: RegexMatch2, s: string): string =
  result = s[m.group(0)]
  if result == "*":
    result = ""
let text = "**this is a test**"
doAssert text.replace(re2"(\*)", removeStars) == "this is a test"
func replace(s: string; pattern: Regex2; by: string; limit = 0): string {.
    inline, ...raises: [ValueError], tags: [RootEffect].}

Replace matched substrings.

Matched groups can be accessed with $N notation, where N is the group's index, starting at 1 (1-indexed). $$ means literal $.

If limit is given, at most limit replacements are done. limit of 0 means there is no limit

Example:

doAssert "aaa".replace(re2"a", "b", 1) == "baa"
doAssert "abc".replace(re2"(a(b)c)", "m($1) m($2)") ==
  "m(abc) m(b)"
doAssert "Nim is awesome!".replace(re2"(\w\B)", "$1_") ==
  "N_i_m i_s a_w_e_s_o_m_e!"
func replace(s: string; pattern: Regex;
             by: proc (m: RegexMatch; s: string): string; limit = 0): string {.
    inline, ...raises: [], effectsOf: by, ...deprecated: "use replace(string, Regex2, proc(RegexMatch2, string): string) instead",
    tags: [RootEffect].}
Deprecated: use replace(string, Regex2, proc(RegexMatch2, string): string) instead
func replace(s: string; pattern: Regex; by: string; limit = 0): string {.inline,
    ...raises: [ValueError],
    deprecated: "use replace(string, Regex2, string) instead",
    tags: [RootEffect].}
Deprecated: use replace(string, Regex2, string) instead
func rex(s: string): RegexLit {....raises: [], tags: [].}
Raw regex literal string
func split(s: string; sep: Regex): seq[string] {.inline, ...raises: [],
    deprecated: "use split(string, Regex2) instead", tags: [RootEffect].}
Deprecated: use split(string, Regex2) instead
func split(s: string; sep: Regex2): seq[string] {.inline, ...raises: [],
    tags: [RootEffect].}
Return the substrings that are not matched by sep

Example:

doAssert split("11a22Ϊ33Ⓐ44弢55", re2"\d+") ==
  @["", "a", "Ϊ", "Ⓐ", "弢", ""]
func splitIncl(s: string; sep: Regex): seq[string] {.inline, ...raises: [],
    deprecated: "use splitIncl(string, Regex2) instead", tags: [RootEffect].}
Deprecated: use splitIncl(string, Regex2) instead
func splitIncl(s: string; sep: Regex2): seq[string] {.inline, ...raises: [],
    tags: [RootEffect].}
Return the substrings that are not matched by sep, including captured groups

Example:

let
  parts = splitIncl("a,b", re2"(,)")
  expected = @["a", ",", "b"]
doAssert parts == expected
func startsWith(s: string; pattern: Regex2; start = 0): bool {.inline,
    ...raises: [], tags: [RootEffect].}
return whether the string starts with the pattern or not

Example:

doAssert "abc".startsWith(re2"\w")
doAssert not "abc".startsWith(re2"\d")
func startsWith(s: string; pattern: Regex; start = 0): bool {.inline,
    ...raises: [], deprecated: "use startsWith(string, Regex2) instead",
    tags: [RootEffect].}
Deprecated: use startsWith(string, Regex2) instead
func toPattern(s: string): Regex {....raises: [RegexError],
                                   deprecated: "Use `re2(string)` instead",
                                   tags: [].}
Deprecated: Use `re2(string)` instead

Iterators

iterator findAll(s: string; pattern: Regex2; start = 0): RegexMatch2 {.inline,
    ...raises: [], tags: [RootEffect].}
search through the string and return each match. Empty matches (start > end) are included

Example:

let text = "abcabc"
var bounds = newSeq[Slice[int]]()
var found = newSeq[string]()
for m in findAll(text, re2"bc"):
  bounds.add m.boundaries
  found.add text[m.boundaries]
doAssert bounds == @[1 .. 2, 4 .. 5]
doAssert found == @["bc", "bc"]
iterator findAll(s: string; pattern: Regex; start = 0): RegexMatch {.inline,
    ...raises: [], deprecated: "use findAll(string, Regex2) instead",
    tags: [RootEffect].}
Deprecated: use findAll(string, Regex2) instead
iterator findAllBounds(s: string; pattern: Regex2; start = 0): Slice[int] {.
    inline, ...raises: [], tags: [RootEffect].}
search through the string and return each match. Empty matches (start > end) are included

Example:

let text = "abcabc"
var bounds = newSeq[Slice[int]]()
for bd in findAllBounds(text, re2"bc"):
  bounds.add bd
doAssert bounds == @[1 .. 2, 4 .. 5]
iterator findAllBounds(s: string; pattern: Regex; start = 0): Slice[int] {.
    inline, ...raises: [], deprecated: "use findAllBounds(string, Regex2) instead",
    tags: [RootEffect].}
Deprecated: use findAllBounds(string, Regex2) instead
iterator group(m: RegexMatch; i: int): Slice[int] {.inline, ...raises: [],
    deprecated, tags: [].}
Deprecated
iterator group(m: RegexMatch; s: string): Slice[int] {.inline,
    ...raises: [KeyError], deprecated, tags: [].}
Deprecated
iterator split(s: string; sep: Regex): string {.inline, ...raises: [],
    deprecated: "use split(string, Regex2) instead", tags: [RootEffect].}
Deprecated: use split(string, Regex2) instead
iterator split(s: string; sep: Regex2): string {.inline, ...raises: [],
    tags: [RootEffect].}
Return the substrings that are not matched by sep

Example:

var found = newSeq[string]()
for s in split("11a22Ϊ33Ⓐ44弢55", re2"\d+"):
  found.add s
doAssert found == @["", "a", "Ϊ", "Ⓐ", "弢", ""]

Macros

macro match(text: string; regex: RegexLit; body: untyped): untyped

Execute the body if the whole string matches the regular expression. This is similar to the match function, but faster. Note that it requires a raw regex literal string as the second parameter; the regex must be known at compile time, and cannot be a var/let/const

A matches: seq[string] variable is injected into the scope, and it contains the submatches for every capture group. If a group is repeated (e.g. (\w)+), it will contain the last capture for that group.

Note: Only available in Nim 1.1+

Example:

match "abc", rex"(a(b)c)":
  doAssert matches == @["abc", "b"]