String

A String represents an immutable sequence of UTF-8 characters.

A String is typically created with a string literal enclosing UTF-8 characters in double quotes ("):

  "hello world"
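As a minimal sketch of what immutability means in practice, the snippet below assumes two illustrative variable names, `greeting` and `shouted`. A method such as `upcase` returns a new String and leaves the receiver unchanged.

```crystal
greeting = "hello world"

# upcase does not modify greeting; it returns a new String.
shouted = greeting.upcase

puts greeting # prints: hello world
puts shouted  # prints: HELLO WORLD
```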

Escaping

A backslash denotes a special character inside a string, which can be either a named escape sequence or a numerical representation of a Unicode code point.

Available escape sequences:

  "\"" # double quote
  "\\" # backslash
  "\a" # alert
  "\b" # backspace
  "\e" # escape
  "\f" # form feed
  "\n" # newline
  "\r" # carriage return
  "\t" # tab
  "\v" # vertical tab
  "\NNN" # octal ASCII character
  "\xNN" # hexadecimal ASCII character
  "\uNNNN" # hexadecimal Unicode character
  "\u{NNNN...}" # hexadecimal Unicode character

Any other character following a backslash is interpreted as the character itself.

A backslash followed by at most three digits ranging from 0 to 7 denotes a code point written in octal:

  "\101" # => "A"
  "\123" # => "S"
  "\12" # => "\n"
  "\1" # string with one character with code point 1

A backslash followed by a u denotes a Unicode code point. It can be followed either by exactly four hexadecimal digits representing the code point (\u0000 to \uFFFF) or by one to six hexadecimal digits wrapped in curly braces (\u{0} to \u{10FFFF}).

  "\u0041" # => "A"
  "\u{41}" # => "A"
  "\u{1F52E}" # => "🔮"

A single pair of curly braces can contain multiple Unicode code points, each separated by whitespace.

  "\u{48 45 4C 4C 4F}" # => "HELLO"
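The escape forms described above can be compared directly. The following sketch uses illustrative variable names (`octal`, `hex`, `unicode`, `braces`, `crystal_ball`) and shows that several notations denote the same character, and that a single code point may occupy several bytes in UTF-8.

```crystal
# Four ways to write the one-character string "A".
octal   = "\101"
hex     = "\x41"
unicode = "\u0041"
braces  = "\u{41}"

puts octal == hex && hex == unicode && unicode == braces # prints: true

# U+1F52E is one character but takes four bytes in UTF-8.
crystal_ball = "\u{1F52E}"
puts crystal_ball.size     # prints: 1
puts crystal_ball.bytesize # prints: 4

# Several code points inside one pair of braces.
puts "\u{48 45 4C 4C 4F}" # prints: HELLO
```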

Interpolation

A string literal with interpolation allows embedding expressions into the string, which are expanded at runtime.

  a = 1
  b = 2
  "sum: #{a} + #{b} = #{a + b}" # => "sum: 1 + 2 = 3"

String interpolation is also possible with String#%.
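As a brief sketch of the String#% alternative, the example below builds the same result with a format string. The variable name `formatted` and the choice of the `%d` directive are illustrative assumptions, not taken from the text above.

```crystal
a = 1
b = 2

# String#% applies sprintf-style formatting; the arguments are passed as an Array.
formatted = "sum: %d + %d = %d" % [a, b, a + b]

puts formatted # prints: sum: 1 + 2 = 3
```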

Any expression may be placed inside the interpolated section, but it’s best to keep the expression small for readability.

Interpolation can be disabled by escaping the # character with a backslash or by using a non-interpolating string literal like %q().

  "\#{a + b}" # => "#{a + b}"
  %q(#{a + b}) # => "#{a + b}"

Interpolation is implemented using a String::Builder and invoking Object#to_s(IO) on each expression enclosed by #{...}. The expression "sum: #{a} + #{b} = #{a + b}" is equivalent to:

  String.build do |io|
    io << "sum: "
    io << a
    io << " + "
    io << b
    io << " = "
    io << a + b
  end
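One consequence of this design is that a type controls how it appears in interpolated strings by defining `to_s(io)`. The sketch below assumes a hypothetical `Point` struct, which is not part of the standard library, and also checks that the interpolated and the explicitly built string are equal.

```crystal
# Point is a hypothetical type used only for illustration.
struct Point
  def initialize(@x : Int32, @y : Int32)
  end

  # Interpolation calls this overload and writes directly to the builder.
  def to_s(io : IO) : Nil
    io << "(" << @x << ", " << @y << ")"
  end
end

a = 1
b = 2

interpolated = "sum: #{a} + #{b} = #{a + b}"
built = String.build do |io|
  io << "sum: " << a << " + " << b << " = " << (a + b)
end

puts interpolated == built        # prints: true
puts "origin: #{Point.new(0, 0)}" # prints: origin: (0, 0)
```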

Percent string literals

Besides double-quoted strings, Crystal also supports string literals indicated by a percent sign (%) and a pair of delimiters. Valid delimiters are parentheses (), square brackets [], curly braces {}, angle brackets <> and pipes ||. Except for the pipes, all delimiters can be nested, meaning a start delimiter inside the string escapes the next end delimiter.

These are handy for writing strings that include double quotes, which would have to be escaped in double-quoted strings.

  %(hello ("world")) # => "hello (\"world\")"
  %[hello ["world"]] # => "hello [\"world\"]"
  %{hello {"world"}} # => "hello {\"world\"}"
  %<hello <"world">> # => "hello <\"world\">"
  %|hello "world"| # => "hello \"world\""

A literal denoted by %q applies neither interpolation nor escapes, while %Q has the same meaning as %.

  name = "world"
  %q(hello \n #{name}) # => "hello \\n \#{name}"
  %Q(hello \n #{name}) # => "hello \n world"
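A typical use is text that is itself full of double quotes, such as a small JSON fragment. The sketch below uses illustrative variable names (`json`, `raw`) and contrasts the interpolating % form with the non-interpolating %q form.

```crystal
name = "world"

# % interpolates and needs no escaping for the inner double quotes.
json = %({"greeting": "hello #{name}"})

# %q keeps the text exactly as written, including the #{...} sequence.
raw = %q({"greeting": "hello #{name}"})

puts json # prints: {"greeting": "hello world"}
puts raw  # prints: {"greeting": "hello #{name}"}
```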

Percent string array literal

Besides the single string literal, there is also a percent literal to create an Array of strings. It is indicated by %w and a pair of delimiters. Valid delimiters are the same as for percent string literals.

  %w(foo bar baz) # => ["foo", "bar", "baz"]
  %w(foo\nbar baz) # => ["foo\\nbar", "baz"]
  %w(foo(bar) baz) # => ["foo(bar)", "baz"]

Note that a literal denoted by %w applies neither interpolation nor escapes, except for spaces. Since strings are separated by a space character ( ), a space must be escaped with a backslash to be used as part of a string.

  %w(foo\ bar baz) # => ["foo bar", "baz"]
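The rules above can be checked with a short sketch. The variable names `words` and `unescaped` are illustrative; the example shows that an escaped space stays inside one element while other backslash sequences are kept verbatim.

```crystal
# The escaped space keeps "foo bar" together as a single element.
words = %w(foo\ bar baz qux)
puts words.size # prints: 3
puts words      # prints the three elements, one per line

# Other escapes are not processed: \n stays a backslash followed by n.
unescaped = %w(foo\nbar)
puts unescaped[0].size # prints: 8
```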

Multiline strings

Any string literal can span multiple lines:

  "hello
        world" # => "hello\n      world"

Note that in the above example, trailing and leading spaces, as well as newlines,
end up in the resulting string. To avoid this, a string can be split into multiple lines
by joining multiple literals with a backslash:

  "hello " \
  "world, " \
  "no newlines" # same as "hello world, no newlines"

Alternatively, a backslash followed by a newline can be inserted inside the string literal:

  "hello \
       world, \
       no newlines" # same as "hello world, no newlines"

In this case, leading whitespace is not included in the resulting string.
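The two line-splitting techniques can be compared side by side. In the sketch below, the variable names `joined` and `continued` are illustrative; both forms are expected to produce the same string without any embedded newline.

```crystal
# Adjacent literals joined with a trailing backslash.
joined = "hello " \
         "world, " \
         "no newlines"

# Backslash-newline inside a single literal; leading whitespace
# on the continuation lines is dropped.
continued = "hello \
             world, \
             no newlines"

puts joined              # prints: hello world, no newlines
puts joined == continued # prints: true
```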

Heredoc

A here document or heredoc can be useful for writing strings spanning multiple lines.
A heredoc is denoted by <<- followed by a heredoc identifier, which is an alphanumeric sequence starting with a letter (and may include underscores). The heredoc starts on the following line and ends with the next line that starts with the heredoc identifier (ignoring leading whitespace) and is followed by either a newline or a non-alphanumeric character.

  <<-XML
  <parent>
    <child />
  </parent>
  XML

Leading whitespace is removed from the heredoc contents according to the amount of whitespace in the last line, before the closing heredoc identifier.

  <<-STRING # => "Hello\n  world"
    Hello
      world
    STRING

  <<-STRING # => "  Hello\n    world"
      Hello
        world
    STRING
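The effect of the closing identifier's indentation can be seen by inspecting the resulting strings. This is a small sketch with illustrative names (`aligned`, `offset`): the first heredoc closes at the same indentation as its content, the second closes two spaces further left.

```crystal
# Closing identifier indented by 4 spaces: 4 spaces are removed from each line.
aligned = <<-TEXT
    first
      second
    TEXT

# Closing identifier indented by 2 spaces: only 2 spaces are removed.
offset = <<-TEXT
    first
  TEXT

puts aligned.inspect # prints: "first\n  second"
puts offset.inspect  # prints: "  first"
```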

It is possible to directly call methods on heredoc string literals, or use them inside parentheses:

  <<-SOME.upcase # => "HELLO"
  hello
  SOME

  def upcase(string)
    string.upcase
  end

  upcase(<<-SOME) # => "HELLO"
    hello
    SOME

A heredoc generally allows interpolation and escapes.

To denote a heredoc without interpolation or escapes, the opening heredoc identifier is enclosed in single quotes:

  <<-'HERE' # => "hello \\n \#{world}"
    hello \n #{world}
    HERE
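To contrast the two heredoc forms, the sketch below writes the same body twice, once with a plain identifier and once with a quoted one. The variable names `cooked` and `raw` are illustrative.

```crystal
name = "world"

# Plain identifier: interpolation and escapes are processed.
cooked = <<-HERE
  hello #{name}\t!
  HERE

# Quoted identifier: the body is taken literally.
raw = <<-'HERE'
  hello #{name}\t!
  HERE

puts cooked.inspect # prints: "hello world\t!"
puts raw            # prints: hello #{name}\t!
```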