$substrCP (aggregation)
Definition
$substrCP
- Returns the substring of a string. The substring starts with thecharacter at the specified UTF-8 code point (CP) index (zero-based)in the string for the number of code points specified.
$substrCP
has the following operatorexpression syntax:
- { $substrCP: [ <string expression>, <code point index>, <code point count> ] }
FieldTypeDescriptionstring expression
stringThe string from which the substring will be extracted. string expression
can be any valid expression aslong as it resolves to a string. For more information onexpressions, see Expressions.
If the argument resolves to a value of null
or refers to a fieldthat is missing, $substrCP
returns an empty string.
If the argument does not resolve to a string or null
norrefers to a missing field, $substrCP
returns an error.code point index
numberIndicates the starting point of the substring. code point index
can be any valid expression as long asit resolves to a non-negative integer.code point count
numberCan be any valid expressionas long as it resolves to a non-negative integer or number that can berepresented as an integer (such as 2.0).
ExampleResults{ $substrCP: [ "abcde", 1, 2 ] }
"bc"
{ $substrCP: [ "Hello World!", 6, 5 ] }
"World"
{ $substrCP: [ "cafétéria", 0, 5 ] }
"cafét"
{ $substrCP: [ "cafétéria", 5, 4 ] }
"tér"
{ $substrCP: [ "cafétéria", 7, 3 ] }
"ia"
{ $substrCP: [ "cafétéria", 3, 1 ] }
"é"
Behavior
The $substrCP
operator uses the code points to extractthe substring. This behavior differs from the$substrBytes
operator which extracts the substringby the number of bytes, where each character uses between one and fourbytes.
Example
Single-Byte Character Set
Consider an inventory
collection with the following documents:
- { "_id" : 1, "item" : "ABC1", quarter: "13Q1", "description" : "product 1" }
- { "_id" : 2, "item" : "ABC2", quarter: "13Q4", "description" : "product 2" }
- { "_id" : 3, "item" : "XYZ1", quarter: "14Q2", "description" : null }
The following operation uses the $substrCP
operator toseparate the quarter
value into a yearSubstring
and aquarterSubstring
. The quarterSubstring
field represents therest of the string from the specified byte index
following theyearSubstring
. It is calculated by subtracting the byte index
from the length of the string using $strLenCP
.
- db.inventory.aggregate(
- [
- {
- $project: {
- item: 1,
- yearSubstring: { $substrCP: [ "$quarter", 0, 2 ] },
- quarterSubtring: {
- $substrCP: [
- "$quarter", 2, { $subtract: [ { $strLenCP: "$quarter" }, 2 ] }
- ]
- }
- }
- }
- ]
- )
The operation returns the following results:
- { "_id" : 1, "item" : "ABC1", "yearSubstring" : "13", "quarterSubtring" : "Q1" }
- { "_id" : 2, "item" : "ABC2", "yearSubstring" : "13", "quarterSubtring" : "Q4" }
- { "_id" : 3, "item" : "XYZ1", "yearSubstring" : "14", "quarterSubtring" : "Q2" }
Single-Byte and Multibyte Character Set
A collection named food
contains the following documents:
- { "_id" : 1, "name" : "apple" }
- { "_id" : 2, "name" : "banana" }
- { "_id" : 3, "name" : "éclair" }
- { "_id" : 4, "name" : "hamburger" }
- { "_id" : 5, "name" : "jalapeño" }
- { "_id" : 6, "name" : "pizza" }
- { "_id" : 7, "name" : "tacos" }
- { "_id" : 8, "name" : "寿司sushi" }
The following example uses the $substrCP
operator to create a threebyte menuCode
from the name
value:
- db.food.aggregate(
- [
- {
- $project: {
- "name": 1,
- "menuCode": { $substrCP: [ "$name", 0, 3 ] }
- }
- }
- ]
- )
The operation returns the following results:
- { "_id" : 1, "name" : "apple", "menuCode" : "app" }
- { "_id" : 2, "name" : "banana", "menuCode" : "ban" }
- { "_id" : 3, "name" : "éclair", "menuCode" : "écl" }
- { "_id" : 4, "name" : "hamburger", "menuCode" : "ham" }
- { "_id" : 5, "name" : "jalapeño", "menuCode" : "jal" }
- { "_id" : 6, "name" : "pizza", "menuCode" : "piz" }
- { "_id" : 7, "name" : "tacos", "menuCode" : "tac" }
- { "_id" : 8, "name" : "寿司sushi", "menuCode" : "寿司s" }
See also