$regexFind (aggregation)
Definition
New in version 4.2.
Provides regular expression (regex) pattern matching capability inaggregation expressions. If a match is found, returns a documentthat contains information on the first match. If a match is notfound, returns null.
MongoDB uses Perl compatible regular expressions (i.e. “PCRE” )version 8.41 with UTF-8 support.
Prior to MongoDB 4.2, aggregation pipeline can only use the queryoperator $regex
in the $match
stage. For moreinformation on using regex in a query, see $regex
.
Syntax
The $regexFind
operator has the following syntax:
- { $regexFind: { input: <expression> , regex: <expression>, options: <expression> } }
Operator Fields
Field | Description | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
input | The string on which you wish to apply the regex pattern.Can be a string or any valid expression that resolves to a string. | ||||||||||
regex | The regex pattern to apply. Can be any valid expression that resolves to either a string or regexpattern /<pattern>/ . When using the regex /<patthern>/ , youcan also specify the regex options i and m (but not thes or x options):- "pattern" - /<pattern>/ - /<pattern>/<options> Alternatively, you can also specify the regex options with theoptions field. To specify the s or x options, youmust use the options field.You cannot specify options in both the regex and the options field. | ||||||||||
options | Optional. The following <options> are available for usewith regular expression.NoteYou cannot specify options in both the regex and theoptions field.
|
Returns
If the operator does not find a match, the result of the operator is anull
.
If the operator finds a match, the result of the operator is a documentthat contains:
- the first matching string in the input,
- the code point index(not byte index) of the matching string in the input, and
- An array of the strings that corresponds to the groups captured bythe matching string. Capturing groups are specified with parenthesis
()
in the regex pattern.
- { "match" : <string>, "idx" : <num>, "captures" : <array of strings> }
See also
Behavior
$regexFind and Collation
$regexFind
ignores the collation specified for thecollection, db.collection.aggregate()
, and the index, if used.
For example, the create a sample collection with collation strength1
(i.e. compare base character only and ignore other differencessuch as case and diacritics):
- db.createCollection( "myColl", { collation: { locale: "fr", strength: 1 } } )
Insert the following documents:
- db.myColl.insertMany([
- { _id: 1, category: "café" },
- { _id: 2, category: "cafe" },
- { _id: 3, category: "cafE" }
- ])
Using the collection’s collation, the following operation performs acase-insensitive and diacritic-insensitive match:
- db.myColl.aggregate( [ { $match: { category: "cafe" } } ] )
The operation returns the following 3 documents:
- { "_id" : 1, "category" : "café" }
- { "_id" : 2, "category" : "cafe" }
- { "_id" : 3, "category" : "cafE" }
However, the aggregation expression $regexFind
ignorescollation; that is, the following regular expression pattern matching examplesare case-sensitive and diacritic sensitive:
- db.myColl.aggregate( [ { $addFields: { resultObject: { $regexFind: { input: "$category", regex: /cafe/ } } } } ] )
- db.myColl.aggregate(
- [ { $addFields: { resultObject: { $regexFind: { input: "$category", regex: /cafe/ } } } } ],
- { collation: { locale: "fr", strength: 1 } } // Ignored in the $regexFind
- )
Both operations return the following:
- { "_id" : 1, "category" : "café", "resultObject" : null }
- { "_id" : 2, "category" : "cafe", "resultObject" : { "match" : "cafe", "idx" : 0, "captures" : [ ] } }
- { "_id" : 3, "category" : "cafE", "resultObject" : null }
To perform a case-insensitive regex pattern matching, use thei Option instead. Seei Option for an example.
Examples
$regexFind and Its Options
To illustrate the behavior of the $regexFind
operator asdiscussed in this example, create a sample collection products
withthe following documents:
- db.products.insertMany([
- { _id: 1, description: "Single LINE description." },
- { _id: 2, description: "First lines\nsecond line" },
- { _id: 3, description: "Many spaces before line" },
- { _id: 4, description: "Multiple\nline descriptions" },
- { _id: 5, description: "anchors, links and hyperlinks" },
- { _id: 6, description: "métier work vocation" }
- ])
By default, $regexFind
performs a case-sensitive match.For example, the following aggregation performs a case-sensitive$regexFind
on the description
field. The regexpattern /line/
does not specify any grouping:
- db.products.aggregate([
- { $addFields: { returnObject: { $regexFind: { input: "$description", regex: /line/ } } } }
- ])
The operation returns the following:
- { "_id" : 1, "description" : "Single LINE description.", "returnObject" : null }
- { "_id" : 2, "description" : "First lines\nsecond line", "returnObject" : { "match" : "line", "idx" : 6, "captures" : [ ] } }
- { "_id" : 3, "description" : "Many spaces before line", "returnObject" : { "match" : "line", "idx" : 23, "captures" : [ ] } }
- { "_id" : 4, "description" : "Multiple\nline descriptions", "returnObject" : { "match" : "line", "idx" : 9, "captures" : [ ] } }
- { "_id" : 5, "description" : "anchors, links and hyperlinks", "returnObject" : null }
- { "_id" : 6, "description" : "métier work vocation", "returnObject" : null }
The following regex pattern /lin(e|k)/
specifies a grouping(e|k)
in the pattern:
- db.products.aggregate([
- { $addFields: { returnObject: { $regexFind: { input: "$description", regex: /lin(e|k)/ } } } }
- ])
The operation returns the following:
- { "_id" : 1, "description" : "Single LINE description.", "returnObject" : null }
- { "_id" : 2, "description" : "First lines\nsecond line", "returnObject" : { "match" : "line", "idx" : 6, "captures" : [ "e" ] } }
- { "_id" : 3, "description" : "Many spaces before line", "returnObject" : { "match" : "line", "idx" : 23, "captures" : [ "e" ] } }
- { "_id" : 4, "description" : "Multiple\nline descriptions", "returnObject" : { "match" : "line", "idx" : 9, "captures" : [ "e" ] } }
- { "_id" : 5, "description" : "anchors, links and hyperlinks", "returnObject" : { "match" : "link", "idx" : 9, "captures" : [ "k" ] } }
- { "_id" : 6, "description" : "métier work vocation", "returnObject" : null }
In the return option, the idx
field is the code point index and not the byteindex. To illustrate, consider the following example that uses theregex pattern /tier/
:
- db.products.aggregate([
- { $addFields: { returnObject: { $regexFind: { input: "$description", regex: /tier/ } } } }
- ])
The operation returns the following where only the last recordmatches the pattern and the returned idx
is 2
(instead of 3if using a byte index)
- { "_id" : 1, "description" : "Single LINE description.", "returnObject" : null }
- { "_id" : 2, "description" : "First lines\nsecond line", "returnObject" : null }
- { "_id" : 3, "description" : "Many spaces before line", "returnObject" : null }
- { "_id" : 4, "description" : "Multiple\nline descriptions", "returnObject" : null }
- { "_id" : 5, "description" : "anchors, links and hyperlinks", "returnObject" : null }
- { "_id" : 6, "description" : "métier work vocation",
- "returnObject" : { "match" : "tier", "idx" : 2, "captures" : [ ] } }
i Option
Note
You cannot specify options in both the regex
and theoptions
field.
To perform case-insensitive pattern matching, include the i option as part of the regex field orin the options field:
- // Specify i as part of the regex field
- { $regexFind: { input: "$description", regex: /line/i } }
- // Specify i in the options field
- { $regexFind: { input: "$description", regex: /line/, options: "i" } }
- { $regexFind: { input: "$description", regex: "line", options: "i" } }
For example, the following aggregation performs a case-insensitive$regexFind
on the description
field. The regexpattern /line/
does not specify any grouping:
- db.products.aggregate([
- { $addFields: { returnObject: { $regexFind: { input: "$description", regex: /line/i } } } }
- ])
The operation returns the following documents:
- { "_id" : 1, "description" : "Single LINE description.", "returnObject" : { "match" : "LINE", "idx" : 7, "captures" : [ ] } }
- { "_id" : 2, "description" : "First lines\nsecond line", "returnObject" : { "match" : "line", "idx" : 6, "captures" : [ ] } }
- { "_id" : 3, "description" : "Many spaces before line", "returnObject" : { "match" : "line", "idx" : 23, "captures" : [ ] } }
- { "_id" : 4, "description" : "Multiple\nline descriptions", "returnObject" : { "match" : "line", "idx" : 9, "captures" : [ ] } }
- { "_id" : 5, "description" : "anchors, links and hyperlinks", "returnObject" : null }
- { "_id" : 6, "description" : "métier work vocation", "returnObject" : null }
m Option
Note
You cannot specify options in both the regex
and theoptions
field.
To match the specified anchors (e.g. ^
, $
) for each line of amultiline string, include the m option aspart of the regex field or in theoptions field:
- // Specify m as part of the regex field
- { $regexFind: { input: "$description", regex: /line/m } }
- // Specify m in the options field
- { $regexFind: { input: "$description", regex: /line/, options: "m" } }
- { $regexFind: { input: "$description", regex: "line", options: "m" } }
The following example includes both the i
and the m
options tomatch lines starting with either the letter s
or S
formultiline strings:
- db.products.aggregate([
- { $addFields: { returnObject: { $regexFind: { input: "$description", regex: /^s/im } } } }
- ])
The operation returns the following:
- { "_id" : 1, "description" : "Single LINE description.", "returnObject" : { "match" : "S", "idx" : 0, "captures" : [ ] } }
- { "_id" : 2, "description" : "First lines\nsecond line", "returnObject" : { "match" : "s", "idx" : 12, "captures" : [ ] } }
- { "_id" : 3, "description" : "Many spaces before line", "returnObject" : null }
- { "_id" : 4, "description" : "Multiple\nline descriptions", "returnObject" : null }
- { "_id" : 5, "description" : "anchors, links and hyperlinks", "returnObject" : null }
- { "_id" : 6, "description" : "métier work vocation", "returnObject" : null }
x Option
Note
You cannot specify options in both the regex
and theoptions
field.
To ignore all unescaped white space characters and comments (denoted bythe un-escaped hash #
character and the next new-line character) inthe pattern, include the s option in theoptions field:
- // Specify x in the options field
- { $regexFind: { input: "$description", regex: /line/, options: "x" } }
- { $regexFind: { input: "$description", regex: "line", options: "x" } }
The following example includes the x
option to skip unescaped whitespaces and comments:
- db.products.aggregate([
- { $addFields: { returnObject: { $regexFind: { input: "$description", regex: /lin(e|k) # matches line or link/, options:"x" } } } }
- ])
The operation returns the following:
- { "_id" : 1, "description" : "Single LINE description.", "returnObject" : null }
- { "_id" : 2, "description" : "First lines\nsecond line", "returnObject" : { "match" : "line", "idx" : 6, "captures" : [ "e" ] } }
- { "_id" : 3, "description" : "Many spaces before line", "returnObject" : { "match" : "line", "idx" : 23, "captures" : [ "e" ] } }
- { "_id" : 4, "description" : "Multiple\nline descriptions", "returnObject" : { "match" : "line", "idx" : 9, "captures" : [ "e" ] } }
- { "_id" : 5, "description" : "anchors, links and hyperlinks", "returnObject" : { "match" : "link", "idx" : 9, "captures" : [ "k" ] } }
- { "_id" : 6, "description" : "métier work vocation", "returnObject" : null }
s Option
Note
You cannot specify options in both the regex
and theoptions
field.
To allow the dot character (i.e. .
) in the pattern to match allcharacters including the new line character, include the s option in the optionsfield:
- // Specify s in the options field
- { $regexFind: { input: "$description", regex: /m.*line/, options: "s" } }
- { $regexFind: { input: "$description", regex: "m.*line", options: "s" } }
The following example includes the s
option to allow the dotcharacter (i.e. .) to match all characters including new line as wellas the i
option to perform a case-insensitive match:
- db.products.aggregate([
- { $addFields: { returnObject: { $regexFind: { input: "$description", regex:/m.*line/, options: "si" } } } }
- ])
The operation returns the following:
- { "_id" : 1, "description" : "Single LINE description.", "returnObject" : null }
- { "_id" : 2, "description" : "First lines\nsecond line", "returnObject" : null }
- { "_id" : 3, "description" : "Many spaces before line", "returnObject" : { "match" : "Many spaces before line", "idx" : 0, "captures" : [ ] } }
- { "_id" : 4, "description" : "Multiple\nline descriptions", "returnObject" : { "match" : "Multiple\nline", "idx" : 0, "captures" : [ ] } }
- { "_id" : 5, "description" : "anchors, links and hyperlinks", "returnObject" : null }
- { "_id" : 6, "description" : "métier work vocation", "returnObject" : null }
Use $regexFind to Parse Email from String
Create a sample collection feedback
with the following documents:
- db.feedback.insertMany([
- { "_id" : 1, comment: "Hi, I'm just reading about MongoDB -- aunt.arc.tica@example.com" },
- { "_id" : 2, comment: "I wanted to concatenate a string" },
- { "_id" : 3, comment: "How do I convert a date to string? cam@mongodb.com" },
- { "_id" : 4, comment: "It's just me. I'm testing. fred@MongoDB.com" }
- ])
The following aggregation uses the $regexFind
to extractthe email from the comment
field (case insensitive).
- db.feedback.aggregate( [
- { $addFields: {
- "email": { $regexFind: { input: "$comment", regex: /[a-z0-9_.+-]+@[a-z0-9_.+-]+\.[a-z0-9_.+-]+/i } }
- } },
- { $set: { email: "$email.match"} }
- ] )
- First Stage
- The stage uses the
$addFields
stage to add a new fieldemail
to the document. The new field contains the result ofperforming the$regexFind
on thecomment
field:
- { "_id" : 1, "comment" : "Hi, I'm just reading about MongoDB -- aunt.arc.tica@example.com", "email" : { "match" : "aunt.arc.tica@example.com", "idx" : 38, "captures" : [ ] } }
- { "_id" : 2, "comment" : "I wanted to concatenate a string", "email" : null }
- { "_id" : 3, "comment" : "I can't find how to convert a date to string. cam@mongodb.com", "email" : { "match" : "cam@mongodb.com", "idx" : 46, "captures" : [ ] } }
- { "_id" : 4, "comment" : "It's just me. I'm testing. fred@MongoDB.com", "email" : { "match" : "fred@MongoDB.com", "idx" : 28, "captures" : [ ] } }
- Second Stage
- The stage use the
$set
stage to reset theemail
tothe current"$email.match"
value. If the current value ofemail
is null, the new value ofemail
is set to null.
- { "_id" : 1, "comment" : "Hi, I'm just reading about MongoDB -- aunt.arc.tica@example.com", "email" : "aunt.arc.tica@example.com" }
- { "_id" : 2, "comment" : "I wanted to concatenate a string" }
- { "_id" : 3, "comment" : "I can't find how to convert a date to string. cam@mongodb.com", "email" : "cam@mongodb.com" }
- { "_id" : 4, "comment" : "It's just me. I'm testing. fred@MongoDB.com", "email" : "fred@MongoDB.com" }
Apply $regexFind to String Elements of an Array
Create a sample collection contacts
with the following documents:
- db.contacts.insertMany([
- { "_id" : 1, name: "Aunt Arc Tikka", details: [ "+672-19-9999", "aunt.arc.tica@example.com" ] },
- { "_id" : 2, name: "Belle Gium", details: [ "+32-2-111-11-11", "belle.gium@example.com" ] },
- { "_id" : 3, name: "Cam Bo Dia", details: [ "+855-012-000-0000", "cam.bo.dia@example.com" ] },
- { "_id" : 4, name: "Fred", details: [ "+1-111-222-3333" ] }
- ])
The following aggregation uses the $regexFind
to convertthe details
array into an embedded document with an email
andphone
fields:
- db.contacts.aggregate( [
- { $unwind: "$details" },
- { $addFields: {
- "regexemail": { $regexFind: { input: "$details", regex: /^[a-z0-9_.+-]+@[a-z0-9_.+-]+\.[a-z0-9_.+-]+$/, options: "i" } },
- "regexphone": { $regexFind: { input: "$details", regex: /^[+]{0,1}[0-9]*\-?[0-9_\-]+$/ } }
- } },
- { $project: { _id: 1, name: 1, details: { email: "$regexemail.match", phone: "$regexphone.match" } } },
- { $group: { _id: "$_id", name: { $first: "$name" }, details: { $mergeObjects: "$details"} } },
- { $sort: { _id: 1 } }
- ])
- First Stage
- The stage
$unwinds
the array into separate documents:
- { "_id" : 1, "name" : "Aunt Arc Tikka", "details" : "+672-19-9999" }
- { "_id" : 1, "name" : "Aunt Arc Tikka", "details" : "aunt.arc.tica@example.com" }
- { "_id" : 2, "name" : "Belle Gium", "details" : "+32-2-111-11-11" }
- { "_id" : 2, "name" : "Belle Gium", "details" : "belle.gium@example.com" }
- { "_id" : 3, "name" : "Cam Bo Dia", "details" : "+855-012-000-0000" }
- { "_id" : 3, "name" : "Cam Bo Dia", "details" : "cam.bo.dia@example.com" }
- { "_id" : 4, "name" : "Fred", "details" : "+1-111-222-3333" }
- Second Stage
- The stage uses the
$addFields
stage to add new fields tothe document that contains the result of the$regexFind
for phone number and email:
- { "_id" : 1, "name" : "Aunt Arc Tikka", "details" : "+672-19-9999", "regexemail" : null, "regexphone" : { "match" : "+672-19-9999", "idx" : 0, "captures" : [ ] } }
- { "_id" : 1, "name" : "Aunt Arc Tikka", "details" : "aunt.arc.tica@example.com", "regexemail" : { "match" : "aunt.arc.tica@example.com", "idx" : 0, "captures" : [ ] }, "regexphone" : null }
- { "_id" : 2, "name" : "Belle Gium", "details" : "+32-2-111-11-11", "regexemail" : null, "regexphone" : { "match" : "+32-2-111-11-11", "idx" : 0, "captures" : [ ] } }
- { "_id" : 2, "name" : "Belle Gium", "details" : "belle.gium@example.com", "regexemail" : { "match" : "belle.gium@example.com", "idx" : 0, "captures" : [ ] }, "regexphone" : null }
- { "_id" : 3, "name" : "Cam Bo Dia", "details" : "+855-012-000-0000", "regexemail" : null, "regexphone" : { "match" : "+855-012-000-0000", "idx" : 0, "captures" : [ ] } }
- { "_id" : 3, "name" : "Cam Bo Dia", "details" : "cam.bo.dia@example.com", "regexemail" : { "match" : "cam.bo.dia@example.com", "idx" : 0, "captures" : [ ] }, "regexphone" : null }
- { "_id" : 4, "name" : "Fred", "details" : "+1-111-222-3333", "regexemail" : null, "regexphone" : { "match" : "+1-111-222-3333", "idx" : 0, "captures" : [ ] } }
- Third Stage
- The stage use the
$project
stage to output documents with the_id
field, thename
field andthedetails
field. Thedetails
field is set to adocument withemail
andphone
fields, whose values are determined from theregexemail
andregexphone
fields, respectively.
- { "_id" : 1, "name" : "Aunt Arc Tikka", "details" : { "phone" : "+672-19-9999" } }
- { "_id" : 1, "name" : "Aunt Arc Tikka", "details" : { "email" : "aunt.arc.tica@example.com" } }
- { "_id" : 2, "name" : "Belle Gium", "details" : { "phone" : "+32-2-111-11-11" } }
- { "_id" : 2, "name" : "Belle Gium", "details" : { "email" : "belle.gium@example.com" } }
- { "_id" : 3, "name" : "Cam Bo Dia", "details" : { "phone" : "+855-012-000-0000" } }
- { "_id" : 3, "name" : "Cam Bo Dia", "details" : { "email" : "cam.bo.dia@example.com" } }
- { "_id" : 4, "name" : "Fred", "details" : { "phone" : "+1-111-222-3333" } }
- Fourth Stage
- The stage uses the
$group
stage to groups the input documents bytheir_id
value. The stage uses the$mergeObjects
expression to merge thedetails
documents.
- { "_id" : 3, "name" : "Cam Bo Dia", "details" : { "phone" : "+855-012-000-0000", "email" : "cam.bo.dia@example.com" } }
- { "_id" : 4, "name" : "Fred", "details" : { "phone" : "+1-111-222-3333" } }
- { "_id" : 1, "name" : "Aunt Arc Tikka", "details" : { "phone" : "+672-19-9999", "email" : "aunt.arc.tica@example.com" } }
- { "_id" : 2, "name" : "Belle Gium", "details" : { "phone" : "+32-2-111-11-11", "email" : "belle.gium@example.com" } }
- Fifth Stage
- The stage uses the
$sort
stage to sort the documents by the_id
field.
- { "_id" : 1, "name" : "Aunt Arc Tikka", "details" : { "phone" : "+672-19-9999", "email" : "aunt.arc.tica@example.com" } }
- { "_id" : 2, "name" : "Belle Gium", "details" : { "phone" : "+32-2-111-11-11", "email" : "belle.gium@example.com" } }
- { "_id" : 3, "name" : "Cam Bo Dia", "details" : { "phone" : "+855-012-000-0000", "email" : "cam.bo.dia@example.com" } }
- { "_id" : 4, "name" : "Fred", "details" : { "phone" : "+1-111-222-3333" } }
Use Captured Groupings to Parse User Name
Create a sample collection employees
with the following documents:
- db.employees.insertMany([
- { "_id" : 1, name: "Aunt Arc Tikka", "email" : "aunt.tica@example.com" },
- { "_id" : 2, name: "Belle Gium", "email" : "belle.gium@example.com" },
- { "_id" : 3, name: "Cam Bo Dia", "email" : "cam.dia@example.com" },
- { "_id" : 4, name: "Fred" }
- ])
The employee email has the format<firstname>.<lastname>@example.com
. Using the captured
fieldreturned in the $regexFind
results, you can parse outuser names for employees.
- db.employees.aggregate( [
- { $addFields: {
- "username": { $regexFind: { input: "$email", regex: /^([a-z0-9_.+-]+)@[a-z0-9_.+-]+\.[a-z0-9_.+-]+$/, options: "i" } },
- } },
- { $set: { username: { $arrayElemAt: [ "$username.captures", 0 ] } } }
- ] )
- First Stage
- The stage uses the
$addFields
stage to add a new fieldusername
to the document. The new field contains the result ofperforming the$regexFind
on theemail
field:
- { "_id" : 1, "name" : "Aunt Arc Tikka", "email" : "aunt.tica@example.com", "username" : { "match" : "aunt.tica@example.com", "idx" : 0, "captures" : [ "aunt.tica" ] } }
- { "_id" : 2, "name" : "Belle Gium", "email" : "belle.gium@example.com", "username" : { "match" : "belle.gium@example.com", "idx" : 0, "captures" : [ "belle.gium" ] } }
- { "_id" : 3, "name" : "Cam Bo Dia", "email" : "cam.dia@example.com", "username" : { "match" : "cam.dia@example.com", "idx" : 0, "captures" : [ "cam.dia" ] } }
- { "_id" : 4, "name" : "Fred", "username" : null }
- Second Stage
- The stage use the
$set
stage to reset theusername
tothe zero-th element of the"$username.captures"
array. If the current value ofusername
is null, the new value ofusername
is set to null.
- { "_id" : 1, "name" : "Aunt Arc Tikka", "email" : "aunt.tica@example.com", "username" : "aunt.tica" }
- { "_id" : 2, "name" : "Belle Gium", "email" : "belle.gium@example.com", "username" : "belle.gium" }
- { "_id" : 3, "name" : "Cam Bo Dia", "email" : "cam.dia@example.com", "username" : "cam.dia" }
- { "_id" : 4, "name" : "Fred", "username" : null }