String

A String represents an immutable sequence of UTF-8 characters.

A String is typically created with a string literal enclosing UTF-8 characters in double quotes ("):

  "hello world"

Escaping

Inside a string, a backslash introduces a special character, written either as a named escape sequence or as a numerical representation of a Unicode code point.

Available escape sequences:

  "\"" # double quote
  "\\" # backslash
  "\#" # hash character (to escape interpolation)
  "\a" # alert
  "\b" # backspace
  "\e" # escape
  "\f" # form feed
  "\n" # newline
  "\r" # carriage return
  "\t" # tab
  "\v" # vertical tab
  "\377" # octal ASCII character
  "\xFF" # hexadecimal ASCII character
  "\uFFFF" # hexadecimal unicode character
  "\u{0}".."\u{10FFFF}" # hexadecimal unicode character

Any other character following a backslash is interpreted as the character itself.

A backslash followed by at most three digits ranging from 0 to 7 denotes a code point written in octal:

  "\101" # => "A"
  "\123" # => "S"
  "\12" # => "\n"
  "\1" # string with one character with code point 1

A backslash followed by a u denotes a Unicode code point. It can be followed either by exactly four hexadecimal characters (\u0000 to \uFFFF) or by one to six hexadecimal characters wrapped in curly braces (\u{0} to \u{10FFFF}).

  "\u0041" # => "A"
  "\u{41}" # => "A"
  "\u{1F52E}" # => "🔮"

A single pair of curly braces can contain multiple Unicode code points, separated by whitespace.

  "\u{48 45 4C 4C 4F}" # => "HELLO"
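As a quick check of the escape forms described above, the following sketch compares a few spellings of the same string and inspects the crystal ball character. The variable name `crystal_ball` is only illustrative.

```crystal
# Each comparison spells the same string in two different ways.
puts "\101" == "A"                   # octal escape; prints true
puts "\u0041" == "\u{41}"            # both \u forms; prints true
puts "\u{48 45 4C 4C 4F}" == "HELLO" # several code points in one pair of braces; prints true

crystal_ball = "\u{1F52E}"
puts crystal_ball.size     # number of characters; prints 1
puts crystal_ball.bytesize # number of bytes in the UTF-8 encoding; prints 4
p crystal_ball.codepoints  # prints [128302], which is 0x1F52E
```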

Interpolation

A string literal with interpolation allows embedding expressions in the string. They are expanded at runtime.

  a = 1
  b = 2
  "sum: #{a} + #{b} = #{a + b}" # => "sum: 1 + 2 = 3"

String interpolation is also possible with String#%.
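A minimal sketch of the String#% form, assuming the same values for `a` and `b` as in the example above and printf-style `%d` placeholders filled from an array:

```crystal
a = 1
b = 2

# The format string comes first; the values to insert follow the % operator.
formatted = "sum: %d + %d = %d" % [a, b, a + b]
puts formatted # prints "sum: 1 + 2 = 3"
```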

Any expression may be placed inside the interpolated section, but it’s best to keep the expression small for readability.

Interpolation can be disabled by escaping the hash character (#) with a backslash or by using a non-interpolating string literal like %q().

  "\#{a + b}" # => "#{a + b}"
  %q(#{a + b}) # => "#{a + b}"

Interpolation is implemented using a String::Builder and invoking Object#to_s(IO) on each expression enclosed by #{...}. The expression "sum: #{a} + #{b} = #{a + b}" is equivalent to:

  String.build do |io|
    io << "sum: "
    io << a
    io << " + "
    io << b
    io << " = "
    io << a + b
  end
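Because interpolation goes through to_s(IO), a type can control how it appears inside a string by defining that method. The sketch below uses a hypothetical `Point` struct, which is not part of the standard library, and compares the interpolated result with a hand-written String.build call.

```crystal
# `Point` is a hypothetical type used only for this illustration.
struct Point
  def initialize(@x : Int32, @y : Int32)
  end

  # Interpolation appends the value to the builder by calling this method.
  def to_s(io : IO) : Nil
    io << "(" << @x << ", " << @y << ")"
  end
end

point = Point.new(1, 2)

interpolated = "point: #{point}"
built = String.build do |io|
  io << "point: "
  io << point
end

puts interpolated          # prints "point: (1, 2)"
puts interpolated == built # prints true
```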

Percent string literals

Besides double-quoted strings, Crystal also supports string literals indicated by a percent sign (%) and a pair of delimiters. Valid delimiters are parentheses (), square brackets [], curly braces {}, angle brackets <> and pipes ||. Except for pipes, all delimiters can be nested: a start delimiter inside the string escapes the next end delimiter.

These are handy for writing strings that contain double quotes, which would otherwise have to be escaped in double-quoted strings.

  %(hello ("world")) # => "hello (\"world\")"
  %[hello ["world"]] # => "hello [\"world\"]"
  %{hello {"world"}} # => "hello {\"world\"}"
  %<hello <"world">> # => "hello <\"world\">"
  %|hello "world"| # => "hello \"world\""

A literal denoted by %q applies neither interpolation nor escapes, while %Q has the same meaning as %.

  name = "world"
  %q(hello \n #{name}) # => "hello \\n \#{name}"
  %Q(hello \n #{name}) # => "hello \n world"
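One place where percent literals may help is text that is full of double quotes, such as a small JSON fragment. The sketch below is illustrative only; the variable names `json` and `raw` are assumptions, and the comparisons show the equivalent double-quoted spellings.

```crystal
name = "world"

# % behaves like a double-quoted string, so interpolation still applies.
json = %({"greeting": "hello #{name}"})
puts json                                       # prints {"greeting": "hello world"}
puts json == "{\"greeting\": \"hello world\"}"  # prints true

# %q leaves both the escape sequence and the interpolation untouched.
raw = %q(hello \n #{name})
puts raw                                        # prints hello \n #{name}
puts raw == "hello \\n \#{name}"                # prints true
```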

Percent string array literal

Besides the single string literal, there is also a percent literal to create an Array of strings. It is indicated by %w and a pair of delimiters. Valid delimiters are the same as for percent string literals.

  %w(foo bar baz) # => ["foo", "bar", "baz"]
  %w(foo\nbar baz) # => ["foo\\nbar", "baz"]
  %w(foo(bar) baz) # => ["foo(bar)", "baz"]

Note that a literal denoted by %w applies neither interpolation nor escapes, except for spaces. Since strings are separated by a single space character ( ), a space must be escaped to be used as part of a string.

  %w(foo\ bar baz) # => ["foo bar", "baz"]
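The following sketch checks the behavior described above: an escaped space stays inside one element, and nested delimiters are kept as part of the string. The variable names are illustrative.

```crystal
words = %w(foo\ bar baz)
puts words.size # prints 2
puts words[0]   # prints "foo bar"

# Square brackets are used as delimiters here, so the parentheses are plain text.
nested = %w[a(b) c]
puts nested == ["a(b)", "c"] # prints true
```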

Multiline strings

Any string literal can span multiple lines:

  "hello
        world" # => "hello\n      world"

Note that in the above example trailing and leading spaces, as well as newlines, end up in the resulting string. To avoid this, a string can be split across multiple lines by joining multiple literals with a backslash:

  "hello " \
  "world, " \
  "no newlines" # same as "hello world, no newlines"

Alternatively, a backslash followed by a newline can be inserted inside the string literal:

  "hello \
       world, \
       no newlines" # same as "hello world, no newlines"

In this case, leading whitespace is not included in the resulting string.
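The two ways of splitting a string across lines can be compared directly. This is a small sketch with illustrative variable names; the continuation lines are indented on purpose to show that the indentation does not end up in the result.

```crystal
# Adjacent literals joined with a backslash outside the strings.
joined = "hello " \
         "world, " \
         "no newlines"

# A backslash followed by a newline inside a single literal.
continued = "hello \
             world, \
             no newlines"

puts joined              # prints "hello world, no newlines"
puts joined == continued # prints true
```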

Heredoc

A here document, or heredoc, is useful for writing strings that span multiple lines. A heredoc is denoted by <<- followed by a heredoc identifier, which is an alphanumeric sequence starting with a letter (and may include underscores). The heredoc starts on the following line and ends with the next line that contains only the heredoc identifier, optionally preceded by whitespace.

  <<-XML
  <parent>
    <child />
  </parent>
  XML

Leading whitespace is removed from the heredoc contents according to the amount of whitespace before the closing heredoc identifier on the last line.

  <<-STRING # => "Hello\n  world"
    Hello
      world
    STRING

  <<-STRING # => "  Hello\n    world"
      Hello
        world
    STRING
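The indentation rule makes it possible to keep a heredoc aligned with the surrounding code. The sketch below uses a hypothetical `usage` method and an invented help text; only the whitespace to the left of the closing identifier is stripped from each line.

```crystal
# `usage` and its text are hypothetical and serve only as an illustration.
def usage : String
  <<-USAGE
    Usage: tool [options]
      -h  show help
    USAGE
end

puts usage
p usage.lines # prints ["Usage: tool [options]", "  -h  show help"]
```

The closing identifier is indented by four spaces, so four spaces are removed from every line and the option line keeps its two extra spaces.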

Anything that follows the opening heredoc identifier on the same line continues the expression that came before the heredoc. It is as if the end of the opening heredoc identifier were the end of the string. The string contents, however, come on the subsequent lines, up to the closing heredoc identifier, which must be on its own line.

  <<-STRING.upcase # => "HELLO"
  hello
  STRING

  def upcase(string)
    string.upcase
  end

  upcase(<<-STRING) # => "HELLO WORLD"
    Hello World
    STRING

If multiple heredocs start in the same line, their bodies are read sequentially:

  print(<<-FIRST, <<-SECOND) # prints "HelloWorld"
    Hello
    FIRST
    World
    SECOND

A heredoc generally allows interpolation and escapes.

To denote a heredoc without interpolation or escapes, the opening heredoc identifier is enclosed in single quotes:

  <<-'HERE' # => "hello \\n \#{world}"
    hello \n #{world}
    HERE
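To see the difference side by side, the sketch below writes the same body once as a regular heredoc and once with a quoted identifier. The variable names `cooked` and `raw` are illustrative.

```crystal
name = "world"

# Regular heredoc: the escape and the interpolation are both processed.
cooked = <<-TEXT
  hello\t#{name}
  TEXT

# Quoted identifier: the body is taken literally.
raw = <<-'TEXT'
  hello\t#{name}
  TEXT

puts cooked == "hello\tworld"    # prints true
puts raw == "hello\\t\#{name}"   # prints true
```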