
Source highlighter for programming and markup languages. Currently only a few languages are supported; others may be added. The interface supports one language nested in another.

You can use this to build your own syntax highlighting, as in the following example:

  let code = """for x in $int.high: echo x.ord mod 2 == 0"""
  var toknizr: GeneralTokenizer
  initGeneralTokenizer(toknizr, code)
  while true:
    getNextToken(toknizr, langNim)
    case toknizr.kind
    of gtEof: break  # End Of File (or string)
    of gtWhitespace:
      echo gtWhitespace  # Maybe you want "visible" whitespace?
      echo substr(code, toknizr.start, toknizr.length + toknizr.start - 1)
    of gtOperator:
      echo gtOperator  # Maybe you want operators to use a specific color?
      echo substr(code, toknizr.start, toknizr.length + toknizr.start - 1)
    # of gtSomeSymbol: syntaxHighlight("Comic Sans", "bold", "99px", "pink")
    else:
      echo toknizr.kind  # All the kinds of tokens can be processed here.
      echo substr(code, toknizr.start, toknizr.length + toknizr.start - 1)
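If you only need the complete token stream rather than incremental control, the tokenize proc listed under Procs below performs the whole loop in a single call. A minimal sketch, assuming the module is imported as packages/docutils/highlite:

```nim
import packages/docutils/highlite

let code = """for x in $int.high: echo x.ord mod 2 == 0"""
# tokenize returns the whole stream as (substring, TokenClass) pairs,
# so no manual GeneralTokenizer state is needed for simple uses.
for (txt, kind) in tokenize(code, langNim):
  echo kind, ": ", txt
```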

The proc getSourceLanguage maps a string to the language enum; matching ignores case and style differences, as the example shows:

  for l in ["C", "c++", "jAvA", "Nim", "c#"]: echo getSourceLanguage(l)

There is also a Cmd pseudo-language: a simple generic shell/command-line tokenizer (Unix shell/PowerShell/Windows Command Prompt). It performs no escaping and recognizes no programming-language constructs besides variable definitions at the beginning of a line. It supports these operators:

  & && | || ( ) '' "" ; # for comments

Instead of escaping, always use quotes; for example, nimgrep --ext:'nim|nims' file.name shows how to input |. Any argument that contains ., /, or \ is treated as a file or directory.
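The tokenize proc listed under Procs below also handles the Cmd pseudo-language. A hedged sketch (the exact token class assigned to each piece is an assumption, not documented above):

```nim
import packages/docutils/highlite

# Quoting instead of escaping, as recommended above; arguments containing
# '.', '/' or '\' are treated as files or directories by the tokenizer.
let line = "nimgrep --ext:'nim|nims' file.name | wc -l"
for (txt, kind) in tokenize(line, langCmd):
  echo kind, ": ", txt
```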

In addition to Cmd there is also the Console language for displaying interactive sessions. Lines containing a command should start with $; other lines are treated as program output.
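A Console session can be fed to the same tokenize proc. A minimal sketch, assuming commands yield prompt/command classes and remaining lines yield gtProgramOutput (the exact classes are an assumption here):

```nim
import packages/docutils/highlite

# A two-line session: a command line starting with '$'
# followed by a line of program output.
let session = """$ nim c -r hello.nim
Hello, world!"""
for (txt, kind) in tokenize(session, langConsole):
  echo kind, ": ", txt
```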

Imports

strutils, algorithm

Types

  GeneralTokenizer = object of RootObj
    kind*: TokenClass
    start*, length*: int


  SourceLanguage = enum
    langNone, langNim, langCpp, langCsharp, langC, langJava, langYaml, langPython,
    langCmd, langConsole


  TokenClass = enum
    gtEof, gtNone, gtWhitespace, gtDecNumber, gtBinNumber, gtHexNumber,
    gtOctNumber, gtFloatNumber, gtIdentifier, gtKeyword, gtStringLit,
    gtLongStringLit, gtCharLit, gtEscapeSequence, gtOperator, gtPunctuation,
    gtComment, gtLongComment, gtRegularExpression, gtTagStart, gtTagEnd, gtKey,
    gtValue, gtRawData, gtAssembler, gtPreprocessor, gtDirective, gtCommand,
    gtRule, gtHyperlink, gtLabel, gtReference, gtPrompt, gtProgramOutput,
    gtProgram, gtOption, gtOther


Consts

  sourceLanguageToAlpha: array[SourceLanguage, string] = ["none", "Nim", "cpp",
    "csharp", "C", "Java", "Yaml", "Python", "Cmd", "Console"]

List of languages spelled with alphabetic characters only.

  sourceLanguageToStr: array[SourceLanguage, string] = ["none", "Nim", "C++",
    "C#", "C", "Java", "Yaml", "Python", "Cmd", "Console"]


  tokenClassToStr: array[TokenClass, string] = ["Eof", "None", "Whitespace",
    "DecNumber", "BinNumber", "HexNumber", "OctNumber", "FloatNumber",
    "Identifier", "Keyword", "StringLit", "LongStringLit", "CharLit",
    "EscapeSequence", "Operator", "Punctuation", "Comment", "LongComment",
    "RegularExpression", "TagStart", "TagEnd", "Key", "Value", "RawData",
    "Assembler", "Preprocessor", "Directive", "Command", "Rule", "Hyperlink",
    "Label", "Reference", "Prompt", "ProgramOutput", "program", "option",
    "Other"]
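The tokenClassToStr table makes it straightforward to turn a token stream into marked-up text, e.g. by using each entry as a CSS class name. A hedged sketch; the escapeHtml helper and the toHtml proc are illustrative additions, not part of this module:

```nim
import packages/docutils/highlite
import std/strutils

proc escapeHtml(s: string): string =
  # Minimal HTML escaping for the example; not part of highlite.
  s.multiReplace(("&", "&amp;"), ("<", "&lt;"), (">", "&gt;"))

proc toHtml(code: string; lang: SourceLanguage): string =
  # Wrap each token in a span whose class comes from tokenClassToStr.
  for (txt, kind) in tokenize(code, lang):
    result.add "<span class=\"" & tokenClassToStr[kind] & "\">" &
               escapeHtml(txt) & "</span>"

echo toHtml("echo 1 + 2", langNim)
```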


Procs

  proc deinitGeneralTokenizer(g: var GeneralTokenizer) {.raises: [], tags: [], forbids: [].}


  proc getNextToken(g: var GeneralTokenizer; lang: SourceLanguage) {.raises: [], tags: [], forbids: [].}


  proc getSourceLanguage(name: string): SourceLanguage {.raises: [], tags: [], forbids: [].}


  proc initGeneralTokenizer(g: var GeneralTokenizer; buf: cstring) {.raises: [], tags: [], forbids: [].}


  proc initGeneralTokenizer(g: var GeneralTokenizer; buf: string) {.raises: [], tags: [], forbids: [].}


  proc tokenize(text: string; lang: SourceLanguage): seq[(string, TokenClass)] {.raises: [], tags: [], forbids: [].}
