
Routines for converting between different character encodings. On UNIX, this module uses the iconv library; on Windows, it uses the Windows API.

The following example, which only runs on Windows, shows how to convert a string between character encodings.

Example:

  import std/encodings
  when defined(windows):
    let
      orig = "öäüß"
      # convert `orig` from "UTF-8" to "CP1252"
      cp1252 = convert(orig, "CP1252", "UTF-8")
      # convert `cp1252` from "CP1252" to "ibm850"
      ibm850 = convert(cp1252, "ibm850", "CP1252")
      current = getCurrentEncoding()
    assert orig == "\195\182\195\164\195\188\195\159"
    assert ibm850 == "\148\132\129\225"
    assert convert(ibm850, current, "ibm850") == orig

The example below creates a reusable EncodingConverter with open, passing destEncoding and srcEncoding, and then calls convert on that same converter several times.

Example:

  import std/encodings
  when defined(windows):
    var fromGB2312 = open("utf-8", "gb2312")
    let first = "\203\173\197\194\163\191\210\187" &
      "\203\242\209\204\211\234\200\206\198\189\201\250"
    assert fromGB2312.convert(first) == "谁怕？一蓑烟雨任平生"
    let second = "\211\208\176\215\205\183\200\231" &
      "\208\194\163\172\199\227\184\199\200\231\185\202"
    assert fromGB2312.convert(second) == "有白头如新，倾盖如故"
    fromGB2312.close()

Imports

os, parseutils, strutils

Types

  EncodingConverter = object

Converts strings from one character encoding to another. Create one with open and release it with close.

  EncodingError = object of ValueError

Exception that is raised for encoding errors.

Procs

  proc close(c: EncodingConverter) {.raises: [], tags: [], forbids: [].}

Frees the resources the converter c holds.
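
A minimal sketch of pairing open with close: defer runs close even if convert raises, and the one converter is reused for every string. The helper latin1ToUtf8 is hypothetical (not part of std/encodings), and the sketch assumes the platform accepts the encoding name "ISO-8859-1".

```nim
import std/encodings

# Hypothetical helper (not part of std/encodings): converts several
# ISO-8859-1 strings to UTF-8 with a single reused converter.
proc latin1ToUtf8(parts: openArray[string]): seq[string] =
  let conv = open("UTF-8", "ISO-8859-1")  # destEncoding, srcEncoding
  defer: conv.close()                     # runs even if convert raises
  for p in parts:
    result.add conv.convert(p)

when isMainModule:
  # In ISO-8859-1, "\xE9" is "é" and "\xEF" is "ï".
  for s in latin1ToUtf8(["caf\xE9", "na\xEFve"]):
    echo s
```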

  proc codePageToName(c: CodePage): string {.raises: [], tags: [], forbids: [].}

  proc convert(c: EncodingConverter; s: string): string {.
      raises: [EncodingError, OSError], tags: [], forbids: [].}

  proc convert(s: string; destEncoding = "UTF-8"; srcEncoding = "CP1252"): string {.
      raises: [ValueError, EncodingError, OSError], tags: [], forbids: [].}

Converts s to destEncoding. It assumes that s is encoded in srcEncoding. Each call opens a converter, uses it once, and closes it again; this is more convenient than managing a converter yourself, but likely less efficient than reusing one converter for many strings.

Warning: UTF-16BE and UTF-32 conversions are not supported on Windows.

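A short sketch of the one-shot form. With the defaults destEncoding = "UTF-8" and srcEncoding = "CP1252", convert(s) decodes Windows-1252 bytes; passing both names explicitly converts the other way. It assumes the platform knows the name "CP1252" (glibc's iconv does).

```nim
import std/encodings

# CP1252 (Windows-1252) bytes: 0x80 is the euro sign, 0xE9 is "é".
let legacy = "\x80 5, caf\xE9"

# One-shot call using the defaults destEncoding = "UTF-8" and
# srcEncoding = "CP1252"; a converter is opened and closed internally.
let utf8 = convert(legacy)
echo utf8

# Explicit arguments go the other way: UTF-8 back to CP1252.
let roundTrip = convert(utf8, "CP1252", "UTF-8")
echo roundTrip == legacy
```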

  proc getCurrentEncoding(uiApp = false): string {.raises: [], tags: [],
      forbids: [].}

Retrieves the current encoding. On Unix, “UTF-8” is always returned. The uiApp parameter is Windows-specific: if true, the UI’s code page is returned; if false, the console’s code page is returned.
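
One possible use, sketched under assumptions: re-encode UTF-8 text for the console before writing it. The helper toConsoleEncoding is hypothetical; on Unix it is a no-op because getCurrentEncoding() returns "UTF-8", and how characters missing from a Windows console code page are handled is not specified here.

```nim
import std/[encodings, strutils]

# Hypothetical helper: re-encode UTF-8 text for the console's code page.
# getCurrentEncoding() (uiApp = false) reports the console code page on
# Windows and always "UTF-8" on Unix, where this returns s unchanged.
proc toConsoleEncoding(s: string): string =
  let enc = getCurrentEncoding()
  if cmpIgnoreCase(enc, "UTF-8") == 0:
    result = s
  else:
    result = convert(s, enc, "UTF-8")

stdout.writeLine toConsoleEncoding("Grüße")
```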

  proc nameToCodePage(name: string): CodePage {.raises: [ValueError], tags: [],
      forbids: [].}

  proc open(destEncoding = "UTF-8"; srcEncoding = "CP1252"): EncodingConverter {.
      raises: [ValueError, EncodingError], tags: [], forbids: [].}

Opens a converter that can convert from srcEncoding to destEncoding. Raises EncodingError if it cannot fulfill the request.
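
A sketch of handling the exceptions open may raise: try a preferred source encoding and fall back to a second one. The helper openWithFallback is hypothetical. Because EncodingError is declared as an object of ValueError, a single `except ValueError` covers both exceptions listed in open's raises pragma.

```nim
import std/encodings

# Hypothetical helper: try a preferred source encoding and fall back to a
# second one if open cannot provide it. EncodingError is a subclass of
# ValueError, so `except ValueError` covers both exceptions open may raise.
proc openWithFallback(dest, preferredSrc, fallbackSrc: string): EncodingConverter =
  try:
    result = open(dest, preferredSrc)
  except ValueError:
    result = open(dest, fallbackSrc)

let conv = openWithFallback("UTF-8", "ISO-8859-15", "ISO-8859-1")
echo conv.convert("caf\xE9")  # 0xE9 is "é" in both ISO-8859-15 and -1
conv.close()
```

The caller still owns the returned converter and must close it.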