
What is NRE?

A regular expression library for Nim using PCRE to do the hard work.

For documentation on how to write patterns, there exists the official PCRE pattern documentation. You can also search the internet for a wide variety of third-party documentation and tools.

Warning: If you love sequtils.toSeq we have bad news for you. This library doesn’t work with it due to documented compiler limitations. As a workaround, use this:

Example:

  import std/nre
  # either `import std/nre except toSeq` or fully qualify `sequtils.toSeq`:
  import std/sequtils
  iterator iota(n: int): int =
    for i in 0..<n: yield i
  assert sequtils.toSeq(iota(3)) == @[0, 1, 2]

Note: There are also alternative nimble packages such as tinyre and regex.

Licensing

PCRE has some additional terms that you must agree to in order to use this module.

Example:

  import std/nre
  import std/sugar
  let vowels = re"[aeoui]"
  let bounds = collect:
    for match in "moiga".findIter(vowels): match.matchBounds
  assert bounds == @[1 .. 1, 2 .. 2, 4 .. 4]
  from std/sequtils import toSeq
  let s = sequtils.toSeq("moiga".findIter(vowels))
  # fully qualified to avoid confusion with nre.toSeq
  assert s.len == 3
  let firstVowel = "foo".find(vowels)
  let hasVowel = firstVowel.isSome()
  assert hasVowel
  let matchBounds = firstVowel.get().captureBounds[-1]
  assert matchBounds.a == 1
  # as with module `re`, unless specified otherwise, `start` parameter in each
  # proc indicates where the scan starts, but outputs are relative to the start
  # of the input string, not to `start`:
  assert find("uxabc", re"(?<=x|y)ab", start = 1).get.captures[-1] == "ab"
  assert find("uxabc", re"ab", start = 3).isNone

Imports

pcre, util, tables, strutils, options, unicode

Types

  CaptureBounds = distinct RegexMatch

  Captures = distinct RegexMatch

  InvalidUnicodeError = ref object of RegexError
    pos*: int ## the location of the invalid unicode in bytes

Thrown when matching fails due to invalid Unicode in strings.

  Regex = ref object
    pattern*: string
    ## not nil
    ## nil

Represents the pattern that things are matched against, constructed with re(string). Examples: re"foo", re(r"(*ANYCRLF)(?x)foo # comment").

  • pattern: string - the string that was used to create the pattern. For details on how to write a pattern, please see [the official PCRE pattern documentation](https://www.pcre.org/original/doc/html/pcrepattern.html).
  • captureCount: int - the number of captures that the pattern has.
  • captureNameId: Table[string, int] - a table from the capture names to their numeric id.

Options

The following options may appear anywhere in the pattern, and they affect the rest of it.

  • (?i) - case insensitive
  • (?m) - multi-line: ^ and $ match the beginning and end of lines, not of the subject string
  • (?s) - . also matches newline (dotall)
  • (?U) - expressions are not greedy by default. ? can be added to a quantifier to make it greedy
  • (?x) - whitespace and comments (#) are ignored (extended)
  • (?X) - character escapes without special meaning (\w vs. \a) are errors (extra)

One or a combination of these options may appear only at the beginning of the pattern:

  • (*UTF8) - treat both the pattern and subject as UTF-8
  • (*UCP) - Unicode character properties; \w matches я
  • (*U) - a combination of the two options above
  • (*FIRSTLINE) - fails if there is not a match on the first line
  • (*NO_AUTO_CAPTURE) - turn off auto-capture for groups; (?<name>…) can be used to capture
  • (*CR) - newlines are separated by \r
  • (*LF) - newlines are separated by \n (UNIX default)
  • (*CRLF) - newlines are separated by \r\n (Windows default)
  • (*ANYCRLF) - newlines are separated by any of the above
  • (*ANY) - newlines are separated by any of the above and Unicode newlines:

    single characters VT (vertical tab, U+000B), FF (form feed, U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and PS (paragraph separator, U+2029). For the 8-bit library, the last two are recognized only in UTF-8 mode. — man pcre

  • (*JAVASCRIPT_COMPAT) - JavaScript compatibility

  • (*NO_STUDY) - turn off studying; study is enabled by default

For more details on the leading option groups, see the Option Setting and the Newline Convention sections of the PCRE syntax manual.

Some of these options are not part of PCRE and are converted by nre into PCRE flags. These include NEVER_UTF, ANCHORED, DOLLAR_ENDONLY, FIRSTLINE, NO_AUTO_CAPTURE, JAVASCRIPT_COMPAT, U, NO_STUDY. In other PCRE wrappers, you will need to pass these as separate flags to PCRE.
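As a rough illustration of the inline options listed above, the following sketch checks a few of them against small inputs. The sample strings are made up for this example; only re and contains from this module are used.

```nim
import std/nre

# (?i): case-insensitive matching.
let caseInsensitive = "HELLO".contains(re"(?i)hello")
assert caseInsensitive

# (?s): `.` also matches a newline; without it, `.` stops at "\n".
let dotAll = "a\nb".contains(re"(?s)a.b")
let dotDefault = "a\nb".contains(re"a.b")
assert dotAll
assert not dotDefault

# (?m): ^ and $ match at line boundaries, not only at the ends of the subject.
let multiLine = "a\nb".contains(re"(?m)^b$")
assert multiLine

# (?x): whitespace and `#` comments inside the pattern are ignored.
let extended = "foo".contains(re"(?x) f o o  # trailing comment")
assert extended
```

Because these options affect only the rest of the pattern, they are placed at the very start of each pattern here.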

  RegexError = ref object of CatchableError

  RegexInternalError = ref object of RegexError

Internal error in the module; this probably means that there is a bug.

  RegexMatch = object
    pattern*: Regex ## The regex doing the matching.
                    ## Not nil.
    str*: string ## The string that was matched against.
    ## First item is the bounds of the match
    ## Other items are the captures
    ## `a` is inclusive start, `b` is exclusive end

Usually seen as Option[RegexMatch], it represents the result of an execution. On failure, it is none; on success, it is some.

  • pattern: Regex - the pattern that is being matched
  • str: string - the string that was matched against
  • captures[]: string - the string value of whatever was captured at that id. If the value is invalid, then behavior is undefined. If the id is -1, then the whole match is returned. If the given capture was not matched, nil is returned. See examples for match.
  • captureBounds[]: HSlice[int, int] - gets the bounds of the given capture according to the same rules as the above. If the capture is not filled, then None is returned. The bounds are both inclusive. See examples for match.
  • match: string - the full text of the match.
  • matchBounds: HSlice[int, int] - the bounds of the match, as in captureBounds[]
  • (captureBounds|captures).toTable - returns a table with each named capture as a key.
  • (captureBounds|captures).toSeq - returns all the captures by their number.
  • $: string - same as match
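The accessors described above can be exercised together on one successful match. This is a minimal sketch: the input string and the capture names k and v are invented for the example, and std/tables is imported only so the returned table can be indexed.

```nim
import std/[nre, options, tables]

# Two named captures separated by "=".
let m = "key=value".find(re"(?<k>\w+)=(?<v>\w+)").get

assert m.match == "key=value"          # full text of the match
assert $m == "key=value"               # `$` is the same as match
assert m.matchBounds == 0 .. 8         # inclusive bounds of the whole match

assert m.captures[0] == "key"          # captures are numbered from 0
assert m.captures["v"] == "value"      # or looked up by name
assert m.captures[-1] == "key=value"   # -1 is the whole match
assert m.captureBounds[1] == 4 .. 8    # inclusive bounds of the second capture

assert m.captures.toTable["k"] == "key"
assert m.captures.toSeq == @[some("key"), some("value")]
```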

  StudyError = ref object of RegexError

Thrown when studying the regular expression fails for whatever reason. The message contains the error code.

  SyntaxError = ref object of RegexError
    pos*: int ## the location of the syntax error in bytes
    pattern*: string ## the pattern that caused the problem

Thrown when there is a syntax error in the regular expression string passed in.

Procs

  proc `$`(pattern: RegexMatch): string {.raises: [], tags: [], forbids: [].}

  proc `==`(a, b: Regex): bool {.raises: [], tags: [], forbids: [].}

  proc `==`(a, b: RegexMatch): bool {.raises: [], tags: [], forbids: [].}

  func `[]`(pattern: CaptureBounds; i: int): HSlice[int, int] {.raises: [],
      tags: [], forbids: [].}

  func `[]`(pattern: CaptureBounds; name: string): HSlice[int, int] {.
      raises: [KeyError], tags: [], forbids: [].}

  func `[]`(pattern: Captures; i: int): string {.raises: [], tags: [], forbids: [].}

  func `[]`(pattern: Captures; name: string): string {.raises: [KeyError],
      tags: [], forbids: [].}

  func captureBounds(pattern: RegexMatch): CaptureBounds {.raises: [], tags: [],
      forbids: [].}

  proc captureCount(pattern: Regex): int {.raises: [ValueError], tags: [],
      forbids: [].}

  proc captureNameId(pattern: Regex): Table[string, int] {.raises: [], tags: [],
      forbids: [].}

  func captures(pattern: RegexMatch): Captures {.raises: [], tags: [], forbids: [].}

  func contains(pattern: CaptureBounds; i: int): bool {.raises: [], tags: [],
      forbids: [].}

  func contains(pattern: CaptureBounds; name: string): bool {.raises: [KeyError],
      tags: [], forbids: [].}

  func contains(pattern: Captures; i: int): bool {.raises: [], tags: [],
      forbids: [].}

  func contains(pattern: Captures; name: string): bool {.raises: [KeyError],
      tags: [], forbids: [].}

  proc contains(str: string; pattern: Regex; start = 0; endpos = int.high): bool {.
      raises: [ValueError, RegexInternalError, InvalidUnicodeError], tags: [],
      forbids: [].}

Determines whether the string contains the given pattern between the start and end positions. This function is equivalent to isSome(str.find(pattern, start, endpos)).

Example:

  assert "abc".contains(re"bc")
  assert not "abc".contains(re"cd")
  assert not "abc".contains(re"a", start = 1)

  proc escapeRe(str: string): string {.gcsafe, raises: [], tags: [], forbids: [].}

Escapes the string so it doesn’t match any special characters. Incompatible with the Extra flag (X).

Escaped char: \ + * ? [ ^ ] $ ( ) { } = ! < > | : -

Example:

  assert escapeRe("fly+wind") == "fly\\+wind"
  assert escapeRe("!") == "\\!"
  assert escapeRe("nim*") == "nim\\*"
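One plausible use of escapeRe is building a pattern from text that should be matched literally. The sketch below assumes a hypothetical userInput value and compares the escaped pattern with the unescaped one.

```nim
import std/nre

# Hypothetical literal text that happens to contain a regex metacharacter.
let userInput = "1+1"

# Escaped: the `+` is matched literally.
let literal = re(escapeRe(userInput))
assert "1+1=2".contains(literal)
assert not "11=2".contains(literal)

# Unescaped: `1+` means "one or more 1s", so "11" matches instead.
let unescaped = re(userInput)
assert "11=2".contains(unescaped)
assert not "1+1=2".contains(unescaped)
```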

  proc find(str: string; pattern: Regex; start = 0; endpos = int.high): Option[
      RegexMatch] {.raises: [ValueError, RegexInternalError, InvalidUnicodeError],
      tags: [], forbids: [].}

Finds the given pattern in the string between the start and end positions.

  • start - The point at which to start matching. |abc is 0; a|bc is 1
  • endpos - The maximum index for a match; int.high means the end of the string, otherwise it's an inclusive upper bound.
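A small sketch of how start and endpos narrow the search window, based on the parameter descriptions above. The sample strings are chosen only for illustration.

```nim
import std/nre

# `start = 2` skips the first "b"; the reported bounds are still relative
# to the beginning of the input string.
let fromTwo = "abcabc".find(re"b", start = 2)
assert fromTwo.isSome
assert fromTwo.get.matchBounds == 4 .. 4

# `endpos` is an inclusive upper bound: index 2 ("c") is outside 0 .. 1.
let tooShort = "abc".find(re"c", endpos = 1)
assert tooShort.isNone

# With `endpos = 2` the "c" at index 2 is inside the window.
let inRange = "abc".find(re"c", endpos = 2)
assert inRange.isSome
```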

  proc findAll(str: string; pattern: Regex; start = 0; endpos = int.high): seq[
      string] {.raises: [ValueError, RegexInternalError, InvalidUnicodeError],
      tags: [], forbids: [].}

  func match(pattern: RegexMatch): string {.raises: [], tags: [], forbids: [].}

  proc match(str: string; pattern: Regex; start = 0; endpos = int.high): Option[
      RegexMatch] {.raises: [ValueError, RegexInternalError, InvalidUnicodeError],
      tags: [], forbids: [].}

Like find(…), but anchored to the start of the string.

Example:

  assert "foo".match(re"f").isSome
  assert "foo".match(re"o").isNone
  assert "abc".match(re"(\w)").get.captures[0] == "a"
  assert "abc".match(re"(?<letter>\w)").get.captures["letter"] == "a"
  assert "abc".match(re"(\w)\w").get.captures[-1] == "ab"
  assert "abc".match(re"(\w)").get.captureBounds[0] == 0 .. 0
  assert 0 in "abc".match(re"(\w)").get.captureBounds
  assert "abc".match(re"").get.captureBounds[-1] == 0 .. -1
  assert "abc".match(re"abc").get.captureBounds[-1] == 0 .. 2

  func matchBounds(pattern: RegexMatch): HSlice[int, int] {.raises: [], tags: [],
      forbids: [].}

  proc re(pattern: string): Regex {.raises: [KeyError, SyntaxError, StudyError,
      ValueError], tags: [], forbids: [].}

  proc replace(str: string; pattern: Regex; sub: string): string {.
      raises: [ValueError, RegexInternalError, InvalidUnicodeError, KeyError],
      tags: [], forbids: [].}

  proc replace(str: string; pattern: Regex;
               subproc: proc (match: RegexMatch): string): string {.
      raises: [ValueError, RegexInternalError, InvalidUnicodeError, Exception],
      tags: [RootEffect], forbids: [].}

Replaces each match of Regex in the string with subproc, which should never be or return nil.

If subproc is a proc (RegexMatch): string, then it is executed with each match and the return value is the replacement value.

If subproc is a proc (string): string, then it is executed with the full text of the match and the return value is the replacement value.

If subproc is a string, the syntax is as follows:

  • $$ - literal $
  • $123 - capture number 123
  • $foo - named capture foo
  • ${foo} - same as above
  • $1$# - first and second captures
  • $# - first capture
  • $0 - full match

If a given capture is missing, IndexDefect is thrown for unnamed captures and KeyError for named captures.
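The substitution forms described above might be used as in the following sketch. The inputs, the capture name w, and the shout helper are all made up for the example; toUpperAscii comes from std/strutils.

```nim
import std/nre
import std/strutils

# String substitution: $1, $2 and $3 are the numbered captures.
let swapped = "2024-01-15".replace(re"(\d+)-(\d+)-(\d+)", "$3/$2/$1")
assert swapped == "15/01/2024"

# ${name} refers to a named capture.
let bracketed = "a b".replace(re"(?<w>\w)", "[${w}]")
assert bracketed == "[a] [b]"

# $$ is a literal dollar sign and $0 is the full match.
let priced = "cost: 5".replace(re"\d", "$$$0")
assert priced == "cost: $5"

# Proc substitution: the proc receives each RegexMatch and returns the
# replacement text (hypothetical helper, named here for illustration).
let shout = proc (m: RegexMatch): string = m.match.toUpperAscii
let upper = "abc".replace(re"[ac]", shout)
assert upper == "AbC"
```

The proc form is the more flexible choice when the replacement depends on logic that the $-syntax cannot express.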

  proc replace(str: string; pattern: Regex; subproc: proc (match: string): string): string {.
      raises: [ValueError, RegexInternalError, InvalidUnicodeError, Exception],
      tags: [RootEffect], forbids: [].}

  proc split(str: string; pattern: Regex; maxSplit = -1; start = 0): seq[string] {.
      raises: [ValueError, RegexInternalError, InvalidUnicodeError], tags: [],
      forbids: [].}

Splits the string with the given regex. This works according to the rules that Perl and Javascript use.

start behaves the same as in find(…).

Example:

  # - If the match is zero-width, then the string is still split:
  assert "123".split(re"") == @["1", "2", "3"]
  # - If the pattern has a capture in it, it is added after the string
  #   split:
  assert "12".split(re"(\d)") == @["", "1", "", "2", ""]
  # - If `maxsplit != -1`, then the string will only be split
  #   `maxsplit - 1` times. This means that there will be `maxsplit`
  #   strings in the output seq.
  assert "1.2.3".split(re"\.", maxsplit = 2) == @["1", "2.3"]

  proc toSeq(pattern: CaptureBounds; default = none(HSlice[int, int])): seq[
      Option[HSlice[int, int]]] {.raises: [ValueError], tags: [], forbids: [].}

  proc toSeq(pattern: Captures; default: Option[string] = none(string)): seq[
      Option[string]] {.raises: [ValueError], tags: [], forbids: [].}

  func toTable(pattern: CaptureBounds): Table[string, HSlice[int, int]] {.
      raises: [KeyError], tags: [], forbids: [].}

  func toTable(pattern: Captures): Table[string, string] {.raises: [KeyError],
      tags: [], forbids: [].}

Iterators

  iterator findIter(str: string; pattern: Regex; start = 0; endpos = int.high): RegexMatch {.
      raises: [ValueError, RegexInternalError, InvalidUnicodeError], tags: [],
      forbids: [].}

Works the same as find(…), but finds every non-overlapping match:

Example:

  import std/sugar
  assert collect(for a in "2222".findIter(re"22"): a.match) == @["22", "22"]
  # not @["22", "22", "22"]

Arguments are the same as find(…)

Variants:

  • proc findAll(…) returns a seq[string]

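To contrast the two variants, this sketch runs findAll and findIter over the same made-up input: findAll yields only the matched text, while findIter yields RegexMatch values that also carry positions.

```nim
import std/nre

let text = "a1b22c333"

# findAll collects the matched substrings.
let numbers = text.findAll(re"\d+")
assert numbers == @["1", "22", "333"]

# findIter gives access to each RegexMatch, here used to record where
# every match starts in the input.
var starts: seq[int] = @[]
for m in text.findIter(re"\d+"):
  starts.add(m.matchBounds.a)
assert starts == @[1, 3, 6]
```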

  iterator items(pattern: CaptureBounds; default = none(HSlice[int, int])): Option[
      HSlice[int, int]] {.raises: [ValueError], tags: [], forbids: [].}

  iterator items(pattern: Captures; default: Option[string] = none(string)): Option[
      string] {.raises: [ValueError], tags: [], forbids: [].}

Exports

$, none, flatten, none, ==, get, map, get, map, UnpackDefect, Option, get, unsafeGet, option, UnpackError, flatMap, isSome, some, filter, isNone