$regexFindAll (aggregation)
Definition
New in version 4.2.
Provides regular expression (regex) pattern matching capability in aggregation expressions. The operator returns an array of documents that contains information on each match. If a match is not found, returns an empty array.
MongoDB uses Perl compatible regular expressions (i.e. "PCRE") version 8.41 with UTF-8 support.
Prior to MongoDB 4.2, aggregation pipelines could only use the query operator $regex in the $match stage. For more information on using regex in a query, see $regex.
Syntax
The $regexFindAll operator has the following syntax:
- { $regexFindAll: { input: <expression> , regex: <expression>, options: <expression> } }
Field | Description
---|---
input | The string on which you wish to apply the regex pattern. Can be a string or any valid expression that resolves to a string.
regex | The regex pattern to apply. Can be any valid expression that resolves to either a string or regex pattern /<pattern>/. When using the regex /<pattern>/, you can also specify the regex options i and m (but not the s or x options). Accepted forms: "pattern", /<pattern>/, and /<pattern>/<options>. Alternatively, you can specify the regex options with the options field. To specify the s or x options, you must use the options field. You cannot specify options in both the regex and the options field.
options | Optional. The following <options> are available for use with regular expressions: i, m, x, and s (see the i Option, m Option, x Option, and s Option examples). Note: You cannot specify options in both the regex and the options field.
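The rule that options may appear in only one place can be checked before a pipeline is sent to the server. The following is a minimal sketch in plain JavaScript; buildRegexFindAll is a hypothetical helper written for illustration, not part of MongoDB or its drivers, and it only inspects the flags of a JavaScript RegExp literal.

```javascript
// Hypothetical helper (not a MongoDB API): builds a $regexFindAll
// expression and rejects options given in both places.
function buildRegexFindAll(input, regex, options) {
  // A JavaScript RegExp literal carries its inline options in .flags;
  // a plain string pattern has none.
  const inlineOptions = regex instanceof RegExp ? regex.flags : "";
  if (inlineOptions !== "" && options) {
    throw new Error("Specify options in the regex or in the options field, not both.");
  }
  const expression = { input, regex };
  if (options) {
    expression.options = options;
  }
  return { $regexFindAll: expression };
}

// Options in the options field only: accepted.
const stage = buildRegexFindAll("$description", /line/, "i");
console.log(stage.$regexFindAll.options); // prints i
```

Calling the helper with /line/i and a separate options string throws, which mirrors the restriction stated in the table.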
Returns
The operator returns an array:
If the operator does not find a match, the operator returns an empty array.
If the operator finds a match, the operator returns an array of documents that contains the following information for each match:
- the matching string in the input,
- the code point index (not byte index) of the matching string in the input, and
- an array of the strings that correspond to the groups captured by the matching string. Capturing groups are specified with parentheses () in the regex pattern.
- [ { "match" : <string>, "idx" : <num>, "captures" : <array of strings> }, ... ]
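To make the result shape concrete, here is a rough approximation in plain JavaScript. It is a sketch under stated assumptions: regexFindAllSketch is a hypothetical helper, it runs on JavaScript's RegExp engine rather than PCRE, and it does not contact a server. It only mirrors the match, idx (as a code point index), and captures fields described above.

```javascript
// Hypothetical helper (not a MongoDB API): approximates the
// $regexFindAll result shape with JavaScript's RegExp engine.
function regexFindAllSketch(input, regex) {
  // matchAll requires the global flag; add it if it is missing.
  const flags = regex.flags.includes("g") ? regex.flags : regex.flags + "g";
  const globalRegex = new RegExp(regex.source, flags);
  return Array.from(input.matchAll(globalRegex), (m) => ({
    match: m[0],
    // m.index counts UTF-16 code units; count code points instead.
    idx: Array.from(input.slice(0, m.index)).length,
    // Elements after the full match are the captured groups.
    captures: m.slice(1),
  }));
}

const results = regexFindAllSketch("anchors, links and hyperlinks", /lin(e|k)/);
console.log(JSON.stringify(results));
// [{"match":"link","idx":9,"captures":["k"]},{"match":"link","idx":24,"captures":["k"]}]
```

When nothing matches, the helper returns an empty array, as the operator does.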
Behavior
$regexFindAll and Collation
$regexFindAll ignores the collation specified for the collection, db.collection.aggregate(), and the index, if used.
For example, create a sample collection with collation strength 1 (i.e. compare base character only and ignore other differences such as case and diacritics):
- db.createCollection( "myColl", { collation: { locale: "fr", strength: 1 } } )
Insert the following documents:
- db.myColl.insertMany([
- { _id: 1, category: "café" },
- { _id: 2, category: "cafe" },
- { _id: 3, category: "cafE" }
- ])
Using the collection's collation, the following operation performs a case-insensitive and diacritic-insensitive match:
- db.myColl.aggregate( [ { $match: { category: "cafe" } } ] )
The operation returns the following 3 documents:
- { "_id" : 1, "category" : "café" }
- { "_id" : 2, "category" : "cafe" }
- { "_id" : 3, "category" : "cafE" }
However, the aggregation expression $regexFindAll ignores collation; that is, the following regular expression pattern matching examples are case-sensitive and diacritic-sensitive:
- db.myColl.aggregate( [ { $addFields: { results: { $regexFindAll: { input: "$category", regex: /cafe/ } } } } ] )
- db.myColl.aggregate(
- [ { $addFields: { results: { $regexFindAll: { input: "$category", regex: /cafe/ } } } } ],
- { collation: { locale: "fr", strength: 1 } } // Ignored in the $regexFindAll
- )
Both operations return the following:
- { "_id" : 1, "category" : "café", "results" : [ ] }
- { "_id" : 2, "category" : "cafe", "results" : [ { "match" : "cafe", "idx" : 0, "captures" : [ ] } ] }
- { "_id" : 3, "category" : "cafE", "results" : [ ] }
To perform case-insensitive regex pattern matching, use the i Option instead. See i Option for an example.
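As a rough illustration of what the i option does and does not relax, the sketch below applies the same pattern to the three category values using JavaScript's own RegExp engine. This is only a stand-in (JavaScript regular expressions are not PCRE, and no server is involved), but it shows that ignoring case is not the same as ignoring diacritics, so the accented value still does not match.

```javascript
// The three category values from myColl; "\u00e9" is the single code point "é".
const categories = ["caf\u00e9", "cafe", "cafE"];

// Case-sensitive pattern: only the lowercase, unaccented value matches.
const caseSensitive = categories.filter((c) => /cafe/.test(c));

// With the i flag, case is ignored but the diacritic still matters.
const caseInsensitive = categories.filter((c) => /cafe/i.test(c));

console.log(caseSensitive);   // [ 'cafe' ]
console.log(caseInsensitive); // [ 'cafe', 'cafE' ]
```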
Examples
$regexFindAll and Its Options
To illustrate the behavior of the $regexFindAll operator as discussed in this example, create a sample collection products with the following documents:
- db.products.insertMany([
- { _id: 1, description: "Single LINE description." },
- { _id: 2, description: "First lines\nsecond line" },
- { _id: 3, description: "Many spaces before     line" },
- { _id: 4, description: "Multiple\nline descriptions" },
- { _id: 5, description: "anchors, links and hyperlinks" },
- { _id: 6, description: "métier work vocation" }
- ])
By default, $regexFindAll performs a case-sensitive match. For example, the following aggregation performs a case-sensitive $regexFindAll on the description field. The regex pattern /line/ does not specify any grouping:
- db.products.aggregate([
- { $addFields: { returnObject: { $regexFindAll: { input: "$description", regex: /line/ } } } }
- ])
The operation returns the following:
- {
- "_id" : 1,
- "description" : "Single LINE description.",
- "returnObject" : [ ]
- }
- {
- "_id" : 2,
- "description" : "First lines\nsecond line",
- "returnObject" : [ { "match" : "line", "idx" : 6, "captures" : [ ]}, { "match" : "line", "idx" : 19, "captures" : [ ] } ]
- }
- {
- "_id" : 3,
- "description" : "Many spaces before     line",
- "returnObject" : [ { "match" : "line", "idx" : 23, "captures" : [ ] } ]
- }
- {
- "_id" : 4,
- "description" : "Multiple\nline descriptions",
- "returnObject" : [ { "match" : "line", "idx" : 9, "captures" : [ ] } ]
- }
- {
- "_id" : 5,
- "description" : "anchors, links and hyperlinks",
- "returnObject" : [ ]
- }
- {
- "_id" : 6,
- "description" : "métier work vocation",
- "returnObject" : [ ]
- }
The following regex pattern /lin(e|k)/ specifies a grouping (e|k) in the pattern:
- db.products.aggregate([
- { $addFields: { returnObject: { $regexFindAll: { input: "$description", regex: /lin(e|k)/ } } } }
- ])
The operation returns the following:
- {
- "_id" : 1,
- "description" : "Single LINE description.",
- "returnObject": [ ]
- }
- {
- "_id" : 2,
- "description" : "First lines\nsecond line",
- "returnObject" : [ { "match" : "line", "idx" : 6, "captures" : [ "e" ] }, { "match" : "line", "idx" : 19, "captures" : [ "e" ] } ]
- }
- {
- "_id" : 3,
- "description" : "Many spaces before     line",
- "returnObject" : [ { "match" : "line", "idx" : 23, "captures" : [ "e" ] } ]
- }
- {
- "_id" : 4,
- "description" : "Multiple\nline descriptions",
- "returnObject" : [ { "match" : "line", "idx" : 9, "captures" : [ "e" ] } ]
- }
- {
- "_id" : 5,
- "description" : "anchors, links and hyperlinks",
- "returnObject" : [ { "match" : "link", "idx" : 9, "captures" : [ "k" ] }, { "match" : "link", "idx" : 24, "captures" : [ "k" ] } ]
- }
- {
- "_id" : 6,
- "description" : "métier work vocation",
- "returnObject" : [ ]
- }
In the returned documents, the idx field is the code point index and not the byte index. To illustrate, consider the following example that uses the regex pattern /tier/:
- db.products.aggregate([
- { $addFields: { returnObject: { $regexFindAll: { input: "$description", regex: /tier/ } } } }
- ])
The operation returns the following, where only the last document matches the pattern and the returned idx is 2 (instead of 3 if using a byte index):
- { "_id" : 1, "description" : "Single LINE description.", "returnObject" : [ ] }
- { "_id" : 2, "description" : "First lines\nsecond line", "returnObject" : [ ] }
- { "_id" : 3, "description" : "Many spaces before     line", "returnObject" : [ ] }
- { "_id" : 4, "description" : "Multiple\nline descriptions", "returnObject" : [ ] }
- { "_id" : 5, "description" : "anchors, links and hyperlinks", "returnObject" : [ ] }
- { "_id" : 6, "description" : "métier work vocation",
- "returnObject" : [ { "match" : "tier", "idx" : 2, "captures" : [ ] } ] }
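The difference between the two indexes comes from the "é", which is one code point but two bytes in UTF-8. A small plain-JavaScript sketch (no server involved; the variable names are illustrative) computes both offsets for the same document:

```javascript
// "\u00e9" is "é" as a single code point (two bytes in UTF-8).
const description = "m\u00e9tier work vocation";

// Offset of the match in UTF-16 code units, then the text before it.
const unitIndex = description.indexOf("tier");
const prefix = description.slice(0, unitIndex); // "mé"

// Array.from iterates by code point; TextEncoder encodes to UTF-8 bytes.
const codePointIndex = Array.from(prefix).length;
const byteIndex = new TextEncoder().encode(prefix).length;

console.log(codePointIndex, byteIndex); // 2 3
```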
i Option
Note
You cannot specify options in both the regex and the options field.
To perform case-insensitive pattern matching, include the i option as part of the regex field or in the options field:
- // Specify i as part of the regex field
- { $regexFindAll: { input: "$description", regex: /line/i } }
- // Specify i in the options field
- { $regexFindAll: { input: "$description", regex: /line/, options: "i" } }
- { $regexFindAll: { input: "$description", regex: "line", options: "i" } }
For example, the following aggregation performs a case-insensitive $regexFindAll on the description field. The regex pattern /line/ does not specify any grouping:
- db.products.aggregate([
- { $addFields: { returnObject: { $regexFindAll: { input: "$description", regex: /line/i } } } }
- ])
The operation returns the following documents:
- {
- "_id" : 1,
- "description" : "Single LINE description.",
- "returnObject" : [ { "match" : "LINE", "idx" : 7, "captures" : [ ] } ]
- }
- {
- "_id" : 2,
- "description" : "First lines\nsecond line",
- "returnObject" : [ { "match" : "line", "idx" : 6, "captures" : [ ] }, { "match" : "line", "idx" : 19, "captures" : [ ] } ]
- }
- {
- "_id" : 3,
- "description" : "Many spaces before     line",
- "returnObject" : [ { "match" : "line", "idx" : 23, "captures" : [ ] } ]
- }
- {
- "_id" : 4,
- "description" : "Multiple\nline descriptions",
- "returnObject" : [ { "match" : "line", "idx" : 9, "captures" : [ ] } ]
- }
- {
- "_id" : 5,
- "description" : "anchors, links and hyperlinks",
- "returnObject" : [ ]
- }
- { "_id" : 6, "description" : "métier work vocation", "returnObject" : [ ] }
m Option
Note
You cannot specify options in both the regex and the options field.
To match the specified anchors (e.g. ^, $) for each line of a multiline string, include the m option as part of the regex field or in the options field:
- // Specify m as part of the regex field
- { $regexFindAll: { input: "$description", regex: /line/m } }
- // Specify m in the options field
- { $regexFindAll: { input: "$description", regex: /line/, options: "m" } }
- { $regexFindAll: { input: "$description", regex: "line", options: "m" } }
The following example includes both the i and the m options to match lines starting with either the letter s or S for multiline strings:
- db.products.aggregate([
- { $addFields: { returnObject: { $regexFindAll: { input: "$description", regex: /^s/im } } } }
- ])
The operation returns the following:
- {
- "_id" : 1,
- "description" : "Single LINE description.",
- "returnObject" : [ { "match" : "S", "idx" : 0, "captures" : [ ] } ]
- }
- {
- "_id" : 2,
- "description" : "First lines\nsecond line",
- "returnObject" : [ { "match" : "s", "idx" : 12, "captures" : [ ] } ]
- }
- {
- "_id" : 3,
- "description" : "Many spaces before     line",
- "returnObject" : [ ]
- }
- {
- "_id" : 4,
- "description" : "Multiple\nline descriptions",
- "returnObject" : [ ]
- }
- {
- "_id" : 5,
- "description" : "anchors, links and hyperlinks",
- "returnObject" : [ ]
- }
- { "_id" : 6, "description" : "métier work vocation", "returnObject" : [ ] }
x Option
Note
You cannot specify options in both the regex and the options field.
To ignore all unescaped white space characters and comments (denoted by the unescaped hash # character and running to the next new-line character) in the pattern, include the x option in the options field:
- // Specify x in the options field
- { $regexFindAll: { input: "$description", regex: /line/, options: "x" } }
- { $regexFindAll: { input: "$description", regex: "line", options: "x" } }
The following example includes the x option to skip unescaped white space and comments:
- db.products.aggregate([
- { $addFields: { returnObject: { $regexFindAll: { input: "$description", regex: /lin(e|k) # matches line or link/, options: "x" } } } }
- ])
The operation returns the following:
- {
- "_id" : 1,
- "description" : "Single LINE description.",
- "returnObject" : [ ]
- }
- {
- "_id" : 2,
- "description" : "First lines\nsecond line",
- "returnObject" : [ { "match" : "line", "idx" : 6, "captures" : [ "e" ] }, { "match" : "line", "idx" : 19, "captures" : [ "e" ] } ]
- }
- {
- "_id" : 3,
- "description" : "Many spaces before     line",
- "returnObject" : [ { "match" : "line", "idx" : 23, "captures" : [ "e" ] } ]
- }
- {
- "_id" : 4,
- "description" : "Multiple\nline descriptions",
- "returnObject" : [ { "match" : "line", "idx" : 9, "captures" : [ "e" ] } ]
- }
- {
- "_id" : 5,
- "description" : "anchors, links and hyperlinks",
- "returnObject" : [ { "match" : "link", "idx" : 9, "captures" : [ "k" ] }, { "match" : "link", "idx" : 24, "captures" : [ "k" ] } ]
- }
- { "_id" : 6, "description" : "métier work vocation", "returnObject" : [ ] }
s Option
Note
You cannot specify options in both the regex and the options field.
To allow the dot character (i.e. .) in the pattern to match all characters including the new line character, include the s option in the options field:
- // Specify s in the options field
- { $regexFindAll: { input: "$description", regex: /m.*line/, options: "s" } }
- { $regexFindAll: { input: "$description", regex: "m.*line", options: "s" } }
The following example includes the s option to allow the dot character (i.e. .) to match all characters including new line, as well as the i option to perform a case-insensitive match:
- db.products.aggregate([
- { $addFields: { returnObject: { $regexFindAll: { input: "$description", regex: /m.*line/, options: "si" } } } }
- ])
The operation returns the following:
- {
- "_id" : 1,
- "description" : "Single LINE description.",
- "returnObject" : [ ]
- }
- {
- "_id" : 2,
- "description" : "First lines\nsecond line",
- "returnObject" : [ ]
- }
- {
- "_id" : 3,
- "description" : "Many spaces before     line",
- "returnObject" : [ { "match" : "Many spaces before     line", "idx" : 0, "captures" : [ ] } ]
- }
- {
- "_id" : 4,
- "description" : "Multiple\nline descriptions",
- "returnObject" : [ { "match" : "Multiple\nline", "idx" : 0, "captures" : [ ] } ]
- }
- {
- "_id" : 5,
- "description" : "anchors, links and hyperlinks",
- "returnObject" : [ ]
- }
- { "_id" : 6, "description" : "métier work vocation", "returnObject" : [ ] }
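Document 4 matches only because of the s option: its "M" and "line" sit on different lines, so the dot has to cross the newline. The sketch below reproduces that effect with JavaScript's RegExp engine, whose s (dotAll) and i flags behave similarly for this pattern. It is an approximation for illustration, not PCRE and not a server call.

```javascript
const description = "Multiple\nline descriptions";

// Without s, the dot stops at the newline, so "M" and "line" never connect.
const withoutS = description.match(/m.*line/i);

// With s (dotAll), the dot also matches "\n".
const withS = description.match(/m.*line/si);

console.log(withoutS);                 // null
console.log(JSON.stringify(withS[0])); // "Multiple\nline"
```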
Use $regexFindAll to Parse Email from String
Create a sample collection feedback with the following documents:
- db.feedback.insertMany([
- { "_id" : 1, comment: "Hi, I'm just reading about MongoDB -- aunt.arc.tica@example.com" },
- { "_id" : 2, comment: "I wanted to concatenate a string" },
- { "_id" : 3, comment: "How do I convert a date to string? Contact me at either cam@mongodb.com or c.dia@mongodb.com" },
- { "_id" : 4, comment: "It's just me. I'm testing. fred@MongoDB.com" }
- ])
The following aggregation uses the $regexFindAll operator to extract all emails from the comment field (case insensitive).
- db.feedback.aggregate( [
- { $addFields: {
- "email": { $regexFindAll: { input: "$comment", regex: /[a-z0-9_.+-]+@[a-z0-9_.+-]+\.[a-z0-9_.+-]+/i } }
- } },
- { $set: { email: "$email.match"} }
- ] )
- First Stage
- This stage uses $addFields to add a new field email to the document. The new field is an array that contains the result of performing $regexFindAll on the comment field:
- { "_id" : 1, "comment" : "Hi, I'm just reading about MongoDB -- aunt.arc.tica@example.com", "email" : [ { "match" : "aunt.arc.tica@example.com", "idx" : 38, "captures" : [ ] } ] }
- { "_id" : 2, "comment" : "I wanted to concatenate a string", "email" : [ ] }
- { "_id" : 3, "comment" : "How do I convert a date to string? Contact me at either cam@mongodb.com or c.dia@mongodb.com", "email" : [ { "match" : "cam@mongodb.com", "idx" : 56, "captures" : [ ] }, { "match" : "c.dia@mongodb.com", "idx" : 75, "captures" : [ ] } ] }
- { "_id" : 4, "comment" : "It's just me. I'm testing. fred@MongoDB.com", "email" : [ { "match" : "fred@MongoDB.com", "idx" : 28, "captures" : [ ] } ] }
- Second Stage
- This stage uses $set to reset the email array elements to the "$email.match" value(s). If the current value of email is null, the new value of email is set to null.
- { "_id" : 1, "comment" : "Hi, I'm just reading about MongoDB -- aunt.arc.tica@example.com", "email" : [ "aunt.arc.tica@example.com" ] }
- { "_id" : 2, "comment" : "I wanted to concatenate a string", "email" : [ ] }
- { "_id" : 3, "comment" : "How do I convert a date to string? Contact me at either cam@mongodb.com or c.dia@mongodb.com", "email" : [ "cam@mongodb.com", "c.dia@mongodb.com" ] }
- { "_id" : 4, "comment" : "It's just me. I'm testing. fred@MongoDB.com", "email" : [ "fred@MongoDB.com" ] }
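The field path "$email.match" applied to an array of documents yields an array of each element's match value. In plain JavaScript terms this corresponds roughly to a map over the first-stage result, sketched here with the values shown for document 3 (illustrative only; no server is involved):

```javascript
// First-stage value of the email field for document 3.
const email = [
  { match: "cam@mongodb.com", idx: 56, captures: [] },
  { match: "c.dia@mongodb.com", idx: 75, captures: [] },
];

// Rough equivalent of { $set: { email: "$email.match" } }.
const matches = email.map((m) => m.match);

console.log(matches); // [ 'cam@mongodb.com', 'c.dia@mongodb.com' ]
```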
Use Captured Groupings to Parse User Name
Create a sample collection feedback with the following documents:
- db.feedback.insertMany([
- { "_id" : 1, comment: "Hi, I'm just reading about MongoDB -- aunt.arc.tica@example.com" },
- { "_id" : 2, comment: "I wanted to concatenate a string" },
- { "_id" : 3, comment: "How do I convert a date to string? Contact me at either cam@mongodb.com or c.dia@mongodb.com" },
- { "_id" : 4, comment: "It's just me. I'm testing. fred@MongoDB.com" }
- ])
To reply to the feedback, assume you want to parse the local-part of the email address to use as the name in the greetings. Using the captures field returned in the $regexFindAll results, you can parse out the local part of each email address:
- db.feedback.aggregate( [
- { $addFields: {
- "names": { $regexFindAll: { input: "$comment", regex: /([a-z0-9_.+-]+)@[a-z0-9_.+-]+\.[a-z0-9_.+-]+/i } },
- } },
- { $set: { names: { $reduce: { input: "$names.captures", initialValue: [ ], in: { $concatArrays: [ "$$value", "$$this" ] } } } } }
- ] )
- First Stage
- This stage uses $addFields to add a new field names to the document. The new field contains the result of performing $regexFindAll on the comment field:
- {
- "_id" : 1,
- "comment" : "Hi, I'm just reading about MongoDB -- aunt.arc.tica@example.com",
- "names" : [ { "match" : "aunt.arc.tica@example.com", "idx" : 38, "captures" : [ "aunt.arc.tica" ] } ]
- }
- { "_id" : 2, "comment" : "I wanted to concatenate a string", "names" : [ ] }
- {
- "_id" : 3,
- "comment" : "How do I convert a date to string? Contact me at either cam@mongodb.com or c.dia@mongodb.com",
- "names" : [
- { "match" : "cam@mongodb.com", "idx" : 56, "captures" : [ "cam" ] },
- { "match" : "c.dia@mongodb.com", "idx" : 75, "captures" : [ "c.dia" ] }
- ]
- }
- {
- "_id" : 4,
- "comment" : "It's just me. I'm testing. fred@MongoDB.com",
- "names" : [ { "match" : "fred@MongoDB.com", "idx" : 28, "captures" : [ "fred" ] } ]
- }
- Second Stage
- This stage uses $set with the $reduce operator to reset names to an array that contains the "$names.captures" elements.
- {
- "_id" : 1,
- "comment" : "Hi, I'm just reading about MongoDB -- aunt.arc.tica@example.com",
- "names" : [ "aunt.arc.tica" ]
- }
- { "_id" : 2, "comment" : "I wanted to concatenate a string", "names" : [ ] }
- {
- "_id" : 3,
- "comment" : "How do I convert a date to string? Contact me at either cam@mongodb.com or c.dia@mongodb.com",
- "names" : [ "cam", "c.dia" ]
- }
- {
- "_id" : 4,
- "comment" : "It's just me. I'm testing. fred@MongoDB.com",
- "names" : [ "fred" ]
- }
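Because "$names.captures" is an array of arrays (one captures array per match), the $reduce with $concatArrays flattens it by one level. A plain-JavaScript sketch of the same folding step, using the first-stage value for document 3 (illustrative only; no server is involved):

```javascript
// First-stage value of the names field for document 3.
const names = [
  { match: "cam@mongodb.com", idx: 56, captures: ["cam"] },
  { match: "c.dia@mongodb.com", idx: 75, captures: ["c.dia"] },
];

// "$names.captures" resolves to one captures array per match.
const capturesPerMatch = names.map((n) => n.captures); // [["cam"], ["c.dia"]]

// $reduce with initialValue [ ] and $concatArrays: [ "$$value", "$$this" ].
const flattened = capturesPerMatch.reduce(
  (value, current) => value.concat(current),
  []
);

console.log(flattened); // [ 'cam', 'c.dia' ]
```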