$strLenBytes (aggregation)

Definition

$strLenBytes

New in version 3.4.

Returns the number of UTF-8 encoded bytes in the specified string.

$strLenBytes has the following operator expression syntax:

{ $strLenBytes: <string expression> }

The argument can be any valid expression as long as it resolves to a string. For more information on expressions, see Expressions.

If the argument resolves to a value of null or refers to a missing field, $strLenBytes returns an error.
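Because null and missing values cause an error, a pipeline that runs over uneven data may want to substitute a default first. The following is a minimal sketch, assuming that treating such values as an empty string (length 0) is acceptable for your use case. The helper name safeStrLenBytes is hypothetical and not part of MongoDB; $ifNull is the standard aggregation operator.

```javascript
// Hypothetical helper (not a MongoDB API): builds an expression that
// falls back to "" when the field is null or missing, so that
// $strLenBytes receives a string instead of raising an error.
function safeStrLenBytes(fieldPath) {
  return { $strLenBytes: { $ifNull: [fieldPath, ""] } };
}

const pipeline = [
  { $project: { name: 1, length: safeStrLenBytes("$name") } }
];

// In the shell, assuming a collection named food:
// db.food.aggregate(pipeline)
```

This guard only covers null and missing values. A field that holds a non-string value, such as a number, still causes $strLenBytes to return an error.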

Behavior

The $strLenBytes operator counts the number of UTF-8 encoded bytes in a string where each character may use between one and four bytes.

For example, US-ASCII characters are encoded using one byte. Characters with diacritic markings and additional Latin alphabetical characters (i.e. Latin characters outside of the English alphabet) are encoded using two bytes. Chinese, Japanese and Korean characters typically require three bytes, and other planes of Unicode (emoji, mathematical symbols, etc.) require four bytes.

The $strLenBytes operator differs from the $strLenCP operator, which counts the code points in the specified string regardless of how many bytes each character uses.

Example	Results	Notes
{ $strLenBytes: "abcde" }	`5`	Each character is encoded using one byte.
{ $strLenBytes: "Hello World!" }	`12`	Each character is encoded using one byte.
{ $strLenBytes: "cafeteria" }	`9`	Each character is encoded using one byte.
{ $strLenBytes: "cafétéria" }	`11`	`é` is encoded using two bytes.
{ $strLenBytes: "" }	`0`	Empty strings return 0.
{ $strLenBytes: "$€λG" }	`7`	`€` is encoded using three bytes. `λ` is encoded using two bytes.
{ $strLenBytes: "寿司" }	`6`	Each character is encoded using three bytes.
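
The byte counts in the table above can be checked outside the database. The following is a rough sketch in plain JavaScript, assuming a runtime that provides the standard TextEncoder (recent Node.js versions and browsers do). The function names utf8ByteLength and codePointLength are illustrative only. They approximate what $strLenBytes and $strLenCP report and are not how the server computes them.

```javascript
// Number of bytes in the UTF-8 encoding of a string,
// analogous to the value $strLenBytes reports.
function utf8ByteLength(str) {
  return new TextEncoder().encode(str).length;
}

// Number of Unicode code points in a string,
// analogous to the value $strLenCP reports.
function codePointLength(str) {
  // Array.from iterates by code point, not by UTF-16 code unit.
  return Array.from(str).length;
}

for (const s of ["abcde", "cafétéria", "$€λG", "寿司"]) {
  console.log(s, "bytes:", utf8ByteLength(s), "code points:", codePointLength(s));
}
```

The two counts agree for pure US-ASCII strings and diverge as soon as a string contains a character that needs more than one byte. The result also depends on how the string is stored: an accented letter written as a base letter plus a combining mark occupies more bytes and more code points than its precomposed form.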

Example

Single-Byte and Multibyte Character Set

A collection named food contains the following documents:

{ "_id" : 1, "name" : "apple" }
{ "_id" : 2, "name" : "banana" }
{ "_id" : 3, "name" : "éclair" }
{ "_id" : 4, "name" : "hamburger" }
{ "_id" : 5, "name" : "jalapeño" }
{ "_id" : 6, "name" : "pizza" }
{ "_id" : 7, "name" : "tacos" }
{ "_id" : 8, "name" : "寿司" }

The following operation uses the $strLenBytes operator to calculate the length of each name value:

db.food.aggregate(
  [
    {
      $project: {
        "name": 1,
        "length": { $strLenBytes: "$name" }
      }
    }
  ]
)

The operation returns the following results:

{ "_id" : 1, "name" : "apple", "length" : 5 }
{ "_id" : 2, "name" : "banana", "length" : 6 }
{ "_id" : 3, "name" : "éclair", "length" : 7 }
{ "_id" : 4, "name" : "hamburger", "length" : 9 }
{ "_id" : 5, "name" : "jalapeño", "length" : 9 }
{ "_id" : 6, "name" : "pizza", "length" : 5 }
{ "_id" : 7, "name" : "tacos", "length" : 5 }
{ "_id" : 8, "name" : "寿司", "length" : 6 }

The documents with _id: 3 and _id: 5 each contain a diacritic character (é and ñ respectively) that requires two bytes to encode. The document with _id: 8 contains two Japanese characters that are encoded using three bytes each. This makes the length greater than the number of characters in name for the documents with _id: 3, _id: 5 and _id: 8.
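
To surface such documents directly, the byte length can be compared with the code point count in the same stage. The following is a sketch only: the output field names bytes, codePoints and hasMultibyte are hypothetical, and it assumes every name value is a string.

```javascript
const pipeline = [
  {
    $project: {
      name: 1,
      bytes: { $strLenBytes: "$name" },
      codePoints: { $strLenCP: "$name" },
      // true when at least one character needs more than one byte
      hasMultibyte: {
        $gt: [ { $strLenBytes: "$name" }, { $strLenCP: "$name" } ]
      }
    }
  }
];

// In the shell, against the food collection shown above:
// db.food.aggregate(pipeline)
```

For the sample documents, hasMultibyte would be true only where the two counts differ, that is for the documents with _id: 3, _id: 5 and _id: 8.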