normalize

This module implements all the Unicode Normalization Form algorithms: NFD, NFC, NFKD and NFKC.

The normalization is buffered. Buffering makes the algorithm take O(n) time and O(1) space, which makes it suitable for untrusted text and streaming.

The result is not guaranteed to equal the result of unbuffered normalization. However, the two usually differ only for malformed text. The buffer may be flushed before it is completely full.

NFD applies a canonical decomposition. NFC applies a canonical decomposition, then the canonical composition. NFKD applies a compatibility decomposition. NFKC applies a compatibility decomposition, then the canonical composition.
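
The difference between the four forms is easiest to see on a small input. The following is a minimal sketch, assuming the package is installed and importable as `normalize`; the sample strings are illustrative and the expected values follow from the Unicode character data, not from this module's own documentation.

```nim
import normalize

# U+00E9 (precomposed "é") versus "e" followed by U+0301 (combining acute accent).
let precomposed = "\u00E9"
let decomposed = "e\u0301"

# Canonical forms: NFD splits the precomposed character, NFC recombines it.
doAssert toNFD(precomposed) == decomposed
doAssert toNFC(decomposed) == precomposed

# U+FB01 (the "fi" ligature) has only a compatibility decomposition,
# so the canonical forms keep it and the compatibility forms expand it.
let ligature = "\uFB01"
doAssert toNFC(ligature) == ligature
doAssert toNFKD(ligature) == "fi"
doAssert toNFKC(ligature) == "fi"
```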

Procs

proc toNFD(s: string): string {.raises: [], tags: [].}
Return the input normalized to NFD. The result may take up to 3 times the size of the input.
proc toNFD(s: seq[Rune]): seq[Rune] {.deprecated: "Use toNFD(string)", raises: [],
                                  tags: [].}
Deprecated: Use toNFD(string)
Return the input normalized to NFD. The result may take up to 4 times the size of the input.
proc toNFC(s: string): string {.raises: [], tags: [].}
Return the input normalized to NFC. The result may take up to 3 times the size of the input.
proc toNFC(s: seq[Rune]): seq[Rune] {.deprecated: "Use toNFC(string)", raises: [],
                                  tags: [].}
Deprecated: Use toNFC(string)
Return the input normalized to NFC. The result may take up to 3 times the size of the input.
proc toNFKD(s: string): string {.raises: [], tags: [].}
Return the input normalized to NFKD. The result may take up to 11 times the size of the input.
proc toNFKD(s: seq[Rune]): seq[Rune] {.deprecated: "Use toNFKD(string)", raises: [],
                                   tags: [].}
Deprecated: Use toNFKD(string)
Return the input normalized to NFKD. The result may take up to 18 times the size of the input.
proc toNFKC(s: string): string {.raises: [], tags: [].}
Return the input normalized to NFKC. The result may take up to 11 times the size of the input.
proc toNFKC(s: seq[Rune]): seq[Rune] {.deprecated: "Use toNFKC(string)", raises: [],
                                   tags: [].}
Deprecated: Use toNFKC(string)
Return the input normalized to NFKC. The result may take up to 18 times the size of the input.
proc isNFC(s: string): bool {.inline, raises: [], tags: [].}
Return whether the Unicode characters are normalized to NFC. For some inputs the result is always false, even if the input is normalized.
proc isNFC(s: seq[Rune]): bool {.inline, deprecated: "Use isNFC(string)", raises: [],
                             tags: [].}
Deprecated: Use isNFC(string)
Return whether the Unicode characters are normalized to NFC. For some inputs the result is always false, even if the input is normalized.
proc isNFD(s: string): bool {.inline, raises: [], tags: [].}
Return whether the Unicode characters are normalized to NFD. For some inputs the result is always false, even if the input is normalized.
proc isNFD(s: seq[Rune]): bool {.inline, deprecated: "Use isNFD(string)", raises: [],
                             tags: [].}
Deprecated: Use isNFD(string)
Return whether the Unicode characters are normalized to NFD. For some inputs the result is always false, even if the input is normalized.
proc isNFKC(s: string): bool {.inline, raises: [], tags: [].}
Return whether the Unicode characters are normalized to NFKC. For some inputs the result is always false, even if the input is normalized.
proc isNFKC(s: seq[Rune]): bool {.inline, deprecated: "Use isNFKC(string)", raises: [],
                              tags: [].}
Deprecated: Use isNFKC(string)
Return whether the Unicode characters are normalized to NFKC. For some inputs the result is always false, even if the input is normalized.
proc isNFKD(s: string): bool {.inline, raises: [], tags: [].}
Return whether the Unicode characters are normalized to NFKD. For some inputs the result is always false, even if the input is normalized.
proc isNFKD(s: seq[Rune]): bool {.inline, deprecated: "Use isNFKD(string)", raises: [],
                              tags: [].}
Deprecated: Use isNFKD(string)
Return whether the Unicode characters are normalized to NFKD. For some inputs the result is always false, even if the input is normalized.
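
Because the `is*` checks may return false for text that is already normalized, a false result is best read as "not confirmed" rather than "not normalized". One way to use them is as a fast path in front of the full normalization, sketched below. `ensureNFC` is a hypothetical helper written for this illustration, not part of the module, and the import name `normalize` is assumed.

```nim
import normalize

proc ensureNFC(s: string): string =
  ## Hypothetical helper (not part of the module): return `s` in NFC,
  ## skipping the normalization when the check confirms it is not needed.
  if isNFC(s):
    s          # confirmed NFC, reuse the input as is
  else:
    toNFC(s)   # not NFC, or the check could not confirm it

doAssert ensureNFC("e\u0301") == "\u00E9"
doAssert ensureNFC("abc") == "abc"
```

The same pattern applies to the other three forms by swapping in the matching `is*` and `to*` procs.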
proc cmpNfd(a, b: openArray[char]): bool {.raises: [], tags: [].}
Return whether two strings are canonically equivalent. This is more efficient than normalizing and then comparing, as it does not create temporary strings (i.e. it won't allocate).
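
A short sketch of comparing two spellings of the same text with `cmpNfd`, assuming the package is importable as `normalize`. The sample strings are illustrative: the two inputs differ byte for byte but are canonically equivalent.

```nim
import normalize

let a = "\u00E9"     # precomposed "é"
let b = "e\u0301"    # "e" followed by a combining acute accent

doAssert a != b               # a plain string comparison sees different bytes
doAssert cmpNfd(a, b)         # canonically equivalent
doAssert not cmpNfd(a, "e")   # the accent is missing, so not equivalent
```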

Iterators

iterator toNFD(s: string): Rune {.inline, raises: [], tags: [].}
Iterate over each Unicode character of the input normalized to NFD.
iterator toNFD(s: seq[Rune]): Rune {.inline, deprecated: "Use toNFD(string)",
                                 raises: [], tags: [].}
Deprecated: Use toNFD(string)
Iterate over each Unicode character of the input normalized to NFD.
iterator toNFC(s: string): Rune {.inline, raises: [], tags: [].}
Iterate over each Unicode character of the input normalized to NFC.
iterator toNFC(s: seq[Rune]): Rune {.inline, deprecated: "Use toNFC(string)",
                                 raises: [], tags: [].}
Deprecated: Use toNFC(string)
Iterate over each Unicode character of the input normalized to NFC.
iterator toNFKD(s: string): Rune {.inline, raises: [], tags: [].}
Iterate over each Unicode character of the input normalized to NFKD.
iterator toNFKD(s: seq[Rune]): Rune {.inline, deprecated: "Use toNFKD(string)",
                                  raises: [], tags: [].}
Deprecated: Use toNFKD(string)
Iterate over each Unicode character of the input normalized to NFKD.
iterator toNFKC(s: string): Rune {.inline, raises: [], tags: [].}
Iterate over each Unicode character of the input normalized to NFKC.
iterator toNFKC(s: seq[Rune]): Rune {.inline, deprecated: "Use toNFKC(string)",
                                  raises: [], tags: [].}
Deprecated: Use toNFKC(string)
Iterate over each Unicode character of the input normalized to NFKC.
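
The iterators share their names with the procs; inside a `for` loop Nim selects the iterator, so the normalized characters can be consumed one at a time without first building the normalized string. A minimal sketch, assuming the package is importable as `normalize` and using an illustrative input:

```nim
import unicode     # provides the Rune type
import normalize

# Collect the code points produced by NFD normalization of "é" (U+00E9).
var runes: seq[Rune]
for r in toNFD("\u00E9"):
  runes.add r

# The precomposed character comes out as "e" (U+0065) plus U+0301.
doAssert runes == @[Rune(0x65), Rune(0x301)]
```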