Hash Functions

Hash functions can be used for the deterministic pseudo-random shuffling of elements.

halfMD5

Interprets all the input parameters as strings and calculates the MD5 hash value for each of them. Then combines hashes, takes the first 8 bytes of the hash of the resulting string, and interprets them as UInt64 in big-endian byte order.

  1. halfMD5(par1, ...)

The function is relatively slow (5 million short strings per second per processor core).
Consider using the sipHash64 function instead.

Arguments

The function takes a variable number of input parameters. Arguments can be any of the supported data types.

Returned Value

A UInt64 data type hash value.

Example

  1. SELECT halfMD5(array('e','x','a'), 'mple', 10, toDateTime('2019-06-15 23:00:00')) AS halfMD5hash, toTypeName(halfMD5hash) AS type
  1. ┌────────halfMD5hash─┬─type───┐
  2. 186182704141653334 UInt64
  3. └────────────────────┴────────┘

MD5

Calculates the MD5 from a string and returns the resulting set of bytes as FixedString(16).
If you don’t need MD5 in particular, but you need a decent cryptographic 128-bit hash, use the ‘sipHash128’ function instead.
If you want to get the same result as output by the md5sum utility, use lower(hex(MD5(s))).

sipHash64

Produces a 64-bit SipHash hash value.

  1. sipHash64(par1,...)

This is a cryptographic hash function. It works at least three times faster than the MD5 function.

Function interprets all the input parameters as strings and calculates the hash value for each of them. Then combines hashes by the following algorithm:

  1. After hashing all the input parameters, the function gets the array of hashes.
  2. Function takes the first and the second elements and calculates a hash for the array of them.
  3. Then the function takes the hash value, calculated at the previous step, and the third element of the initial hash array, and calculates a hash for the array of them.
  4. The previous step is repeated for all the remaining elements of the initial hash array.

Arguments

The function takes a variable number of input parameters. Arguments can be any of the supported data types.

Returned Value

A UInt64 data type hash value.

Example

  1. SELECT sipHash64(array('e','x','a'), 'mple', 10, toDateTime('2019-06-15 23:00:00')) AS SipHash, toTypeName(SipHash) AS type
  1. ┌──────────────SipHash─┬─type───┐
  2. 13726873534472839665 UInt64
  3. └──────────────────────┴────────┘

sipHash128

Calculates SipHash from a string.
Accepts a String-type argument. Returns FixedString(16).
Differs from sipHash64 in that the final xor-folding state is only done up to 128 bits.

cityHash64

Produces a 64-bit CityHash hash value.

  1. cityHash64(par1,...)

This is a fast non-cryptographic hash function. It uses the CityHash algorithm for string parameters and implementation-specific fast non-cryptographic hash function for parameters with other data types. The function uses the CityHash combinator to get the final results.

Arguments

The function takes a variable number of input parameters. Arguments can be any of the supported data types.

Returned Value

A UInt64 data type hash value.

Examples

Call example:

  1. SELECT cityHash64(array('e','x','a'), 'mple', 10, toDateTime('2019-06-15 23:00:00')) AS CityHash, toTypeName(CityHash) AS type
  1. ┌─────────────CityHash─┬─type───┐
  2. 12072650598913549138 UInt64
  3. └──────────────────────┴────────┘

The following example shows how to compute the checksum of the entire table with accuracy up to the row order:

  1. SELECT groupBitXor(cityHash64(*)) FROM table

intHash32

Calculates a 32-bit hash code from any type of integer.
This is a relatively fast non-cryptographic hash function of average quality for numbers.

intHash64

Calculates a 64-bit hash code from any type of integer.
It works faster than intHash32. Average quality.

SHA1

SHA224

SHA256

Calculates SHA-1, SHA-224, or SHA-256 from a string and returns the resulting set of bytes as FixedString(20), FixedString(28), or FixedString(32).
The function works fairly slowly (SHA-1 processes about 5 million short strings per second per processor core, while SHA-224 and SHA-256 process about 2.2 million).
We recommend using this function only in cases when you need a specific hash function and you can’t select it.
Even in these cases, we recommend applying the function offline and pre-calculating values when inserting them into the table, instead of applying it in SELECTS.

URLHash(url[, N])

A fast, decent-quality non-cryptographic hash function for a string obtained from a URL using some type of normalization.
URLHash(s) – Calculates a hash from a string without one of the trailing symbols /,? or # at the end, if present.
URLHash(s, N) – Calculates a hash from a string up to the N level in the URL hierarchy, without one of the trailing symbols /,? or # at the end, if present.
Levels are the same as in URLHierarchy. This function is specific to Yandex.Metrica.

farmFingerprint64

farmHash64

Produces a 64-bit FarmHash or Fingerprint value. farmFingerprint64 is preferred for a stable and portable value.

  1. farmFingerprint64(par1, ...)
  2. farmHash64(par1, ...)

These functions use the Fingerprint64 and Hash64 methods respectively from all available methods.

Arguments

The function takes a variable number of input parameters. Arguments can be any of the supported data types.

Returned Value

A UInt64 data type hash value.

Example

  1. SELECT farmHash64(array('e','x','a'), 'mple', 10, toDateTime('2019-06-15 23:00:00')) AS FarmHash, toTypeName(FarmHash) AS type
  1. ┌─────────────FarmHash─┬─type───┐
  2. 17790458267262532859 UInt64
  3. └──────────────────────┴────────┘

javaHash

Calculates JavaHash from a string. This hash function is neither fast nor having a good quality. The only reason to use it is when this algorithm is already used in another system and you have to calculate exactly the same result.

Syntax

  1. SELECT javaHash('');

Returned value

A Int32 data type hash value.

Example

Query:

  1. SELECT javaHash('Hello, world!');

Result:

  1. ┌─javaHash('Hello, world!')─┐
  2. -1880044555
  3. └───────────────────────────┘

javaHashUTF16LE

Calculates JavaHash from a string, assuming it contains bytes representing a string in UTF-16LE encoding.

Syntax

  1. javaHashUTF16LE(stringUtf16le)

Arguments

  • stringUtf16le — a string in UTF-16LE encoding.

Returned value

A Int32 data type hash value.

Example

Correct query with UTF-16LE encoded string.

Query:

  1. SELECT javaHashUTF16LE(convertCharset('test', 'utf-8', 'utf-16le'))

Result:

  1. ┌─javaHashUTF16LE(convertCharset('test', 'utf-8', 'utf-16le'))─┐
  2. 3556498
  3. └──────────────────────────────────────────────────────────────┘

hiveHash

Calculates HiveHash from a string.

  1. SELECT hiveHash('');

This is just JavaHash with zeroed out sign bit. This function is used in Apache Hive for versions before 3.0. This hash function is neither fast nor having a good quality. The only reason to use it is when this algorithm is already used in another system and you have to calculate exactly the same result.

Returned value

A Int32 data type hash value.

Type: hiveHash.

Example

Query:

  1. SELECT hiveHash('Hello, world!');

Result:

  1. ┌─hiveHash('Hello, world!')─┐
  2. 267439093
  3. └───────────────────────────┘

metroHash64

Produces a 64-bit MetroHash hash value.

  1. metroHash64(par1, ...)

Arguments

The function takes a variable number of input parameters. Arguments can be any of the supported data types.

Returned Value

A UInt64 data type hash value.

Example

  1. SELECT metroHash64(array('e','x','a'), 'mple', 10, toDateTime('2019-06-15 23:00:00')) AS MetroHash, toTypeName(MetroHash) AS type
  1. ┌────────────MetroHash─┬─type───┐
  2. 14235658766382344533 UInt64
  3. └──────────────────────┴────────┘

jumpConsistentHash

Calculates JumpConsistentHash form a UInt64.
Accepts two arguments: a UInt64-type key and the number of buckets. Returns Int32.
For more information, see the link: JumpConsistentHash

murmurHash2_32, murmurHash2_64

Produces a MurmurHash2 hash value.

  1. murmurHash2_32(par1, ...)
  2. murmurHash2_64(par1, ...)

Arguments

Both functions take a variable number of input parameters. Arguments can be any of the supported data types.

Returned Value

  • The murmurHash2_32 function returns hash value having the UInt32 data type.
  • The murmurHash2_64 function returns hash value having the UInt64 data type.

Example

  1. SELECT murmurHash2_64(array('e','x','a'), 'mple', 10, toDateTime('2019-06-15 23:00:00')) AS MurmurHash2, toTypeName(MurmurHash2) AS type
  1. ┌──────────MurmurHash2─┬─type───┐
  2. 11832096901709403633 UInt64
  3. └──────────────────────┴────────┘

gccMurmurHash

Calculates a 64-bit MurmurHash2 hash value using the same hash seed as gcc. It is portable between CLang and GCC builds.

Syntax

  1. gccMurmurHash(par1, ...);

Arguments

Returned value

  • Calculated hash value.

Type: UInt64.

Example

Query:

  1. SELECT
  2. gccMurmurHash(1, 2, 3) AS res1,
  3. gccMurmurHash(('a', [1, 2, 3], 4, (4, ['foo', 'bar'], 1, (1, 2)))) AS res2

Result:

  1. ┌─────────────────res1─┬────────────────res2─┐
  2. 12384823029245979431 1188926775431157506
  3. └──────────────────────┴─────────────────────┘

murmurHash3_32, murmurHash3_64

Produces a MurmurHash3 hash value.

  1. murmurHash3_32(par1, ...)
  2. murmurHash3_64(par1, ...)

Arguments

Both functions take a variable number of input parameters. Arguments can be any of the supported data types.

Returned Value

  • The murmurHash3_32 function returns a UInt32 data type hash value.
  • The murmurHash3_64 function returns a UInt64 data type hash value.

Example

  1. SELECT murmurHash3_32(array('e','x','a'), 'mple', 10, toDateTime('2019-06-15 23:00:00')) AS MurmurHash3, toTypeName(MurmurHash3) AS type
  1. ┌─MurmurHash3─┬─type───┐
  2. 2152717 UInt32
  3. └─────────────┴────────┘

murmurHash3_128

Produces a 128-bit MurmurHash3 hash value.

  1. murmurHash3_128( expr )

Arguments

Returned Value

A FixedString(16) data type hash value.

Example

  1. SELECT murmurHash3_128('example_string') AS MurmurHash3, toTypeName(MurmurHash3) AS type
  1. ┌─MurmurHash3──────┬─type────────────┐
  2. 6�1
  3. 4"S5KT�~~q │ FixedString(16) │
  4. └──────────────────┴─────────────────┘

xxHash32, xxHash64

Calculates xxHash from a string. It is proposed in two flavors, 32 and 64 bits.

  1. SELECT xxHash32('');
  2. OR
  3. SELECT xxHash64('');

Returned value

A Uint32 or Uint64 data type hash value.

Type: xxHash.

Example

Query:

  1. SELECT xxHash32('Hello, world!');

Result:

  1. ┌─xxHash32('Hello, world!')─┐
  2. 834093149
  3. └───────────────────────────┘

See Also

Original article