Skyhash Protocol 1.0

About this document

Copyright (c) 2021 Sayan Nandan <nandansayan@outlook.com>
In effect since: v0.6.0
Date: 11th May, 2021

Introduction

Skyhash or the Skytable Serialization Protocol (SSP) is a serialization protocol built on top of TCP that is used by Skytable for client/server communication. All clients willing to communicate with Skytable need to implement this protocol.

Concepts

Skyhash uses a query/response model, much like HTTP’s request/response model: clients send queries and the server sends responses. All the bytes sent by a client to the server are called a query packet, and all the bytes the server sends back in response are called the response packet.

There are different kinds of queries:

  • Simple queries: These queries have just one action in the query packet and hence just one response in the response packet.
  • Pipeline queries: These queries carry multiple actions in the query packet and hence their response packet also contains multiple responses. You can read more about querying here.

Irrespective of the query type, all these packets are made of a metaframe and a dataframe.

The Metaframe

The metaframe is the first part of the packet separated from the rest of the packet by a line feed (\n) character. It looks like this:

    *<c>\n

where <c> tells us the number of actions this packet corresponds to. For simple queries, which run one action, this is 1; for pipeline queries it can be any integer greater than 1.
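As a rough illustration of the metaframe layout, the sketch below splits a packet at the first line feed and reads the action count. The function name parse_metaframe is hypothetical and not part of any Skytable client; the sketch assumes the whole packet is already in memory.

```python
from typing import Tuple


def parse_metaframe(packet: bytes) -> Tuple[int, bytes]:
    """Split a packet into (action_count, dataframe_bytes).

    Hypothetical helper: assumes the metaframe is '*<c>' followed by '\n'.
    """
    # The metaframe ends at the first line feed.
    head, sep, rest = packet.partition(b"\n")
    if not sep or not head.startswith(b"*"):
        raise ValueError("malformed metaframe")
    count = head[1:]
    if not count.isdigit():
        raise ValueError("action count must be a decimal number")
    return int(count), rest


count, dataframe = parse_metaframe(b"*1\n!1\n0\n")
print(count, dataframe)
```

Everything after the first line feed is handed back untouched, so the dataframe can be decoded separately.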

The Dataframe

The dataframe is made up of elements. Each element corresponds to a single action and hence to a single query. Simple queries run one action and hence have one element, while pipeline queries run several actions and hence have several elements.

Every element is of a certain data type, and this type determines how the element is serialized with Skyhash. Responses can carry some extra data types, which are described under Response Specific Data Types.

Common Data Types

Usually serialized data types look like:

    <tsymbol><len>\n
    -----DATA-------

where the <tsymbol> corresponds to the Type Symbol and the <len> corresponds to the length of this element. Below is a list of data types and their <tsymbol>s.

Strings (+/?)

String elements are serialized like:

    +<c>\n
    <mystring>\n

Where <c> is the number of bytes in the string ‘<mystring>’. So a string ‘Sayan’ will be serialized into:

    +5\n
    Sayan\n

There is also a binary string (binstr) type with a tsymbol ?. For this kind of string, no unicode validation is carried out.
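A minimal sketch of string serialization follows. The names encode_str and encode_binstr are hypothetical. The sketch assumes that strings are UTF-8 encoded and that binary strings use the same layout with the ? symbol, which the text above implies but does not spell out.

```python
def encode_str(value: str) -> bytes:
    """Serialize a string element: '+<c>\\n<mystring>\\n'."""
    body = value.encode("utf-8")  # assumption: strings travel as UTF-8
    # <c> counts bytes, not characters, so multi-byte characters count more than once.
    return b"+%d\n%s\n" % (len(body), body)


def encode_binstr(value: bytes) -> bytes:
    """Serialize a binary string; assumed to mirror the string layout with '?'."""
    return b"?%d\n%s\n" % (len(value), value)


print(encode_str("Sayan"))
```

Because the length is a byte count, a string such as ‘é’ is sent with a length of 2 when encoded as UTF-8.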

Unsigned integers (:)

64-bit unsigned integers are serialized into:

    :<c>\n
    <myint>\n

Where <c> is the number of digits in the integer and <myint> is the integer itself.
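The integer layout can be sketched the same way. The name encode_u64 is hypothetical, and the range check simply reflects the 64-bit unsigned limit stated above.

```python
U64_MAX = 2**64 - 1


def encode_u64(value: int) -> bytes:
    """Serialize an unsigned 64-bit integer: ':<c>\\n<myint>\\n'."""
    if not 0 <= value <= U64_MAX:
        raise ValueError("value does not fit in an unsigned 64-bit integer")
    digits = str(value).encode("ascii")
    # <c> is the number of decimal digits, not the value itself.
    return b":%d\n%s\n" % (len(digits), digits)


print(encode_u64(1234))
```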

Arrays (&)

Arrays are recursive data types: an array can contain another array, which in turn can contain another array, and so on. An array is essentially a collection of elements of any data type, including arrays themselves. Arrays can also be multi-type.

Skyhash serializes arrays into:

    &<c>\n
    <elements>

Where <c> is the number of elements in this array and <elements> are the elements present in the array. Take a look at the following examples:

  1. An array containing two strings:

    &2\n
    +5\n
    Hello\n
    +5\n
    World\n

This can be represented as:

    Array([String("Hello"), String("World")]);

  2. An array containing a string and two integers:

    &3\n
    +5\n
    Hello\n
    :1\n
    0\n
    :1\n
    1\n

Which can be represented as:

    Array([String("Hello"), UnsignedInt64(0), UnsignedInt64(1)]);

  3. An array containing two arrays. Pipe symbols (|) and underscores (_) are added only to mark the logical parts of the array:

                 ___________________________
    &2\n        |_____________|             |
    &2\n        |             |             |
    +5\n        |             |             |
    Hello\n     |   Array 1   |             |
    +5\n        |             |             |
    World\n     |_____________|             |
    &3\n        |             |   Nested    |
    +5\n        |             |   Array     |
    Hello\n     |             |             |
    +5\n        |   Array 2   |             |
    World\n     |             |             |
    +5\n        |             |             |
    Again\n     |_____________|_____________|

This can be represented as:

    Array([
        Array([String("Hello"), String("World")]),
        Array([String("Hello"), String("World"), String("Again")]),
    ]);

This can be nested even more!
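Since arrays are recursive, a recursive encoder is a natural fit. The sketch below maps Python str, int and list values onto the +, : and & layouts described above; the function name encode and this mapping are illustrative assumptions, not part of the protocol.

```python
def encode(value) -> bytes:
    """Recursively serialize a str, int or list into Skyhash elements."""
    if isinstance(value, str):
        body = value.encode("utf-8")
        return b"+%d\n%s\n" % (len(body), body)
    if isinstance(value, int):
        digits = str(value).encode("ascii")
        return b":%d\n%s\n" % (len(digits), digits)
    if isinstance(value, list):
        # '&<c>\n' followed by each element, which may itself be an array.
        return b"&%d\n" % len(value) + b"".join(encode(item) for item in value)
    raise TypeError("cannot serialize %r" % type(value).__name__)


nested = [["Hello", "World"], ["Hello", "World", "Again"]]
print(encode(nested))
```

The array header only carries the element count; each element brings its own type symbol and length, which is what allows multi-type and nested arrays.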

Important notes

These data types and <tsymbol>s are non-exhaustive. Whenever your client meets a type symbol it does not recognize while deserializing a packet, it should always raise some kind of UnimplementedError to indicate that it cannot yet deserialize this specific type.
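One way to follow this advice is to check the type symbol before reading anything else. The decoder below is a sketch that handles only the types covered in this document; the UnimplementedError class and decode_element function are hypothetical names, and the sketch assumes the element is complete and well-formed.

```python
class UnimplementedError(Exception):
    """Raised when the client meets a type symbol it cannot deserialize yet."""


KNOWN_TSYMBOLS = (b"+", b"?", b":", b"&")


def decode_element(buf: bytes, pos: int = 0):
    """Decode one element starting at buf[pos]; return (value, next_pos)."""
    tsymbol = buf[pos:pos + 1]
    if tsymbol not in KNOWN_TSYMBOLS:
        # Unknown types may not even share the '<tsymbol><len>' header, so stop here.
        raise UnimplementedError("unsupported type symbol: %r" % tsymbol)
    eol = buf.index(b"\n", pos)
    size = int(buf[pos + 1:eol])
    pos = eol + 1
    if tsymbol == b"&":
        # For arrays, size is the element count; decode each element in turn.
        items = []
        for _ in range(size):
            item, pos = decode_element(buf, pos)
            items.append(item)
        return items, pos
    payload = buf[pos:pos + size]
    pos += size + 1  # skip the payload and its trailing line feed
    if tsymbol == b"+":
        return payload.decode("utf-8"), pos
    if tsymbol == b":":
        return int(payload), pos
    return payload, pos  # '?': binary string, returned without validation


value, end = decode_element(b"&2\n+5\nHello\n+5\nWorld\n")
print(value, end)
```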

Useful read

We strongly recommend that you read the full list of types and how they are serialized in this document.

Response Specific Data Types

Responses will return some additional data types. This is a non-exhaustive list of such types.

Response Codes (!)

Response codes are often returned by the server when no ‘producible’ data can be returned; for example, something like FLUSHDB can only possibly return ‘Okay’ or an error. This distinction is made to reduce errors while matching responses. Skyhash will serialize a response code like:

    !<c>\n
    <code>\n

Where <c> is the number of characters in the code and <code> is the code itself. So Code 0 that corresponds to OKAY will be serialized into:

    !1\n
    0\n

You can find a full list of response codes in this table.
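The response code layout can be sketched as a pair of helpers. Only code 0 (Okay) is named in this document, so the sketch keeps the code as a plain string rather than guessing at the rest of the table; encode_response_code and decode_response_code are hypothetical names.

```python
OKAY = "0"  # the only response code named in this document


def encode_response_code(code: str) -> bytes:
    """Serialize a response code: '!<c>\\n<code>\\n'."""
    body = code.encode("ascii")
    return b"!%d\n%s\n" % (len(body), body)


def decode_response_code(element: bytes) -> str:
    """Read a response code element and return the code as a string."""
    header, sep, rest = element.partition(b"\n")
    if not sep or not header.startswith(b"!"):
        raise ValueError("not a response code element")
    length = int(header[1:])
    return rest[:length].decode("ascii")


print(decode_response_code(b"!1\n0\n") == OKAY)
```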

A full example (a simple query)

Let’s take a look at what happens when we send SET x ex. First, the client needs to serialize the query into a Skyhash packet. Since this is a simple query, the dataframe holds just one element. Most of Skytable’s common actions use arrays, and SET uses an AnyArray (type symbol ~), whose elements carry a length line but no type symbol of their own. So in SET x ex:

  • This is a simple query
  • We need to send an AnyArray
  • It has three elements: ['SET', 'x', 'ex']

The client therefore sends (#s are used to denote comments):

    *1\n     # '*1' because this is a simple query
    ~3\n     # 3 elements
    3\n      # 'SET' has 3 chars
    SET\n    # 'SET' itself
    1\n      # 'x' has 1 char
    x\n      # 'x' itself
    2\n      # 'ex' has 2 chars
    ex\n     # 'ex' itself

Way to go! We just did it!
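The same packet can be assembled programmatically. This sketch infers the AnyArray layout from the SET x ex example (a ~ header with the element count, then a length line and the raw bytes for each element); encode_anyarray and simple_query are hypothetical helper names.

```python
from typing import List


def encode_anyarray(items: List[str]) -> bytes:
    """Serialize an AnyArray as shown in the example: '~<c>\\n' then '<len>\\n<item>\\n'."""
    parts = [b"~%d\n" % len(items)]
    for item in items:
        body = item.encode("utf-8")
        parts.append(b"%d\n%s\n" % (len(body), body))  # no type symbol per element
    return b"".join(parts)


def simple_query(items: List[str]) -> bytes:
    """Build a simple query packet: metaframe '*1\\n' plus one AnyArray."""
    return b"*1\n" + encode_anyarray(items)


print(simple_query(["SET", "x", "ex"]))
```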

Now the server returns a response packet with one element: a response code. This is what it returns:

    *1\n
    !1\n
    0\n

Here:

  • *1 because this response corresponds to a simple query
  • !1 because the returned data type is a response code with tsymbol ! and a length of 1 char
  • 0 because this is the response code that corresponds to Okay

A full example (a pipelined query)

Let’s take a look at what happens when we send two queries, HEYA once and HEYA twice, to the server as a pipelined query.

  • This is a pipelined query
  • We need to send two AnyArrays, one for each query

This is what the client has to send (#s are used to denote comments):

    *2\n       # *2 because this is a pipelined query with two queries
               # we begin our first query from here
    ~2\n       # our first query has two elements: "HEYA" and "once"
    4\n        # "HEYA" has 4 characters
    HEYA\n     # the element itself
    4\n        # "once" has 4 characters
    once\n     # the element itself
               # we're done. the second query begins here
    ~2\n       # our second query has two elements: "HEYA" and "twice"
    4\n        # "HEYA" has 4 characters
    HEYA\n     # the element itself
    5\n        # "twice" has 5 characters
    twice\n    # the element itself
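A pipelined packet is the same construction repeated: the metaframe carries the number of queries and the AnyArrays follow one after another. As before, this is only a sketch; pipeline_query and encode_anyarray are hypothetical names, and the AnyArray layout is inferred from the listing above.

```python
from typing import List


def encode_anyarray(items: List[str]) -> bytes:
    """Serialize one query as an AnyArray: '~<c>\\n' then '<len>\\n<item>\\n' per element."""
    parts = [b"~%d\n" % len(items)]
    for item in items:
        body = item.encode("utf-8")
        parts.append(b"%d\n%s\n" % (len(body), body))
    return b"".join(parts)


def pipeline_query(queries: List[List[str]]) -> bytes:
    """Build a pipelined query packet: '*<number of queries>\\n' plus one AnyArray each."""
    return b"*%d\n" % len(queries) + b"".join(encode_anyarray(q) for q in queries)


packet = pipeline_query([["HEYA", "once"], ["HEYA", "twice"]])
print(packet)
```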

The server then responds with (#s are used to denote comments):

    *2\n       # this response has two responses, for two queries
               # the first response
    +4\n       # the first element "once" has 4 chars
    once\n     # the element itself
               # the second response
    +5\n       # the second element "twice" has 5 chars
    twice\n    # the element itself
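Reading such a response means reading the metaframe and then decoding that many elements in order. The sketch below handles only the two element types that appear in the examples of this document (strings and response codes) and raises for anything else; read_responses is a hypothetical name, and the packet is assumed to be complete.

```python
from typing import List, Tuple


def read_responses(packet: bytes) -> List[Tuple[str, str]]:
    """Return one (tsymbol, payload) pair per response in the packet."""
    head, sep, body = packet.partition(b"\n")
    if not sep or not head.startswith(b"*"):
        raise ValueError("malformed metaframe")
    responses = []
    pos = 0
    for _ in range(int(head[1:])):
        tsymbol = body[pos:pos + 1]
        if tsymbol not in (b"+", b"!"):
            # Other types exist; this sketch does not attempt to decode them.
            raise NotImplementedError("unsupported type symbol: %r" % tsymbol)
        eol = body.index(b"\n", pos)
        size = int(body[pos + 1:eol])
        start = eol + 1
        responses.append((tsymbol.decode("ascii"), body[start:start + size].decode("utf-8")))
        pos = start + size + 1  # skip the payload and its trailing line feed
    return responses


print(read_responses(b"*2\n+4\nonce\n+5\ntwice\n"))
```

The responses come back in the same order as the queries were sent, so the first pair belongs to HEYA once and the second to HEYA twice.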

And there — you’ve learned Skyhash!