Definition
$regexFindAllProvides regular expression (regex) pattern matching capability in aggregation expressions. The operator returns an array of documents that contains information on each match. If a match is not found, returns an empty array.
Syntax
The $regexFindAll operator has the following syntax:
{ $regexFindAll: { input: <expression> , regex: <expression>, options: <expression> } }| Field | Description | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
The string on which you wish to apply the regex pattern. Can be a string or any valid expression that resolves to a string. | |||||||||||
The regex pattern to apply. Can be any valid expression that resolves to either a string or regex pattern
Alternatively, you can also specify the regex options with the options field. To specify the You cannot specify options in both the | |||||||||||
Optional. The following You cannot specify options in both the
|
Returns
The operator returns an array:
- If the operator does not find a match, the operator returns an empty array.
If the operator finds a match, the operator returns an array of documents that contains the following information for each match:
- the matching string in the input,
- the code point index (not byte index) of the matching string in the input, and
- An array of the strings that corresponds to the groups captured by the matching string. Capturing groups are specified with unescaped parenthesis
()in the regex pattern.
[ { "match" : <string>, "idx" : <num>, "captures" : <array of strings> }, ... ]Behavior
PCRE Library
Starting in version 6.1, MongoDB uses the PCRE2 (Perl Compatible Regular Expressions) library to implement regular expression pattern matching. To learn more about PCRE2, see the PCRE Documentation.
$regexFindAll and Collation
String matching for
$regexFindAllis always case-sensitive and diacritic-sensitive.$regexFindAllignores the collation specified for the collection,db.collection.aggregate(), and the index, if used.For example, create a collection with collation strength
1, meaning the collation only compares base characters and ignores differences such as case and diacritics:db.createCollection( "restaurants", { collation: { locale: "fr", strength: 1 } } )Insert the following documents:
db.restaurants.insertMany( [
{ _id: 1, category: "café", status: "Open" },
{ _id: 2, category: "cafe", status: "open" },
{ _id: 3, category: "cafE", status: "open" }
] )The following uses the collection's collation to perform a case-insensitive and diacritic-insensitive match:
db.restaurants.aggregate( [ { $match: { category: "cafe" } } ] )[
{ _id: 1, category: 'café', status: 'Open' },
{ _id: 2, category: 'cafe', status: 'open' },
{ _id: 3, category: 'cafE', status: 'open' }
]However,
$regexFindAllignores collation. The following regular expression pattern matching examples are case-sensitive and diacritic sensitive:db.restaurants.aggregate( [
{
$addFields: {
resultObject: { $regexFindAll: { input: "$category", regex: /cafe/ } }
}
}
] )
db.restaurants.aggregate( [
{
$addFields: {
resultObject: { $regexFindAll: { input: "$category", regex: /cafe/ } }
}
}
],
{ collation: { locale: "fr", strength: 1 } } // Ignored in the $regexFindAll
)Both operations return the following:
{ "_id" : 1, "category" : "café", "resultObject" : null }
{ "_id" : 2, "category" : "cafe", "resultObject" : { "match" : "cafe", "idx" : 0, "captures" : [ ] } }
{ "_id" : 3, "category" : "cafE", "resultObject" : null }Because the query ignores the collation, it requires an exact match on the
categorystring (including case and accent marks), meaning only document_id: 2is matched.To perform a case-insensitive regex pattern matching, use the
iOption instead. SeeiOption for an example.capturesOutput BehaviorIf your regex pattern contains capture groups and the pattern finds a match in the input, the
capturesarray in the results corresponds to the groups captured by the matching string. Capture groups are specified with unescaped parentheses()in the regex pattern. The length of thecapturesarray equals the number of capture groups in the pattern and the order of the array matches the order in which the capture groups appear.Create a sample collection named
contactswith the following documents:db.contacts.insertMany([
{ "_id": 1, "fname": "Carol", "lname": "Smith", "phone": "718-555-0113" },
{ "_id": 2, "fname": "Daryl", "lname": "Doe", "phone": "212-555-8832" },
{ "_id": 3, "fname": "Polly", "lname": "Andrews", "phone": "208-555-1932" },
{ "_id": 4, "fname": "Colleen", "lname": "Duncan", "phone": "775-555-0187" },
{ "_id": 5, "fname": "Luna", "lname": "Clarke", "phone": "917-555-4414" }
])The following pipeline applies the regex pattern
/(C(ar)*)ol/to thefnamefield:db.contacts.aggregate([
{
$project: {
returnObject: {
$regexFindAll: { input: "$fname", regex: /(C(ar)*)ol/ }
}
}
}
])The regex pattern finds a match with
fnamevaluesCarolandColleen:{ "_id" : 1, "returnObject" : [ { "match" : "Carol", "idx" : 0, "captures" : [ "Car", "ar" ] } ] }
{ "_id" : 2, "returnObject" : [ ] }
{ "_id" : 3, "returnObject" : [ ] }{ "_id" : 4, "returnObject" : [ { "match" : "Col", "idx" : 0, "captures" : [ "C", null ] } ] }
{ "_id" : 5, "returnObject" : [ ] }The pattern contains the capture group
(C(ar)*)which contains the nested group(ar). The elements in thecapturesarray correspond to the two capture groups. If a matching document is not captured by a group (e.g.Colleenand the group(ar)),$regexFindAllreplaces the group with a null placeholder.As shown in the previous example, the
capturesarray contains an element for each capture group (usingnullfor non-captures). Consider the following example which searches for phone numbers with New York City area codes by applying a logicalorof capture groups to thephonefield. Each group represents a New York City area code:db.contacts.aggregate([
{
$project: {
nycContacts: {
$regexFindAll: { input: "$phone", regex: /^(718).*|^(212).*|^(917).*/ }
}
}
}
])For documents which are matched by the regex pattern, the
capturesarray includes the matching capture group and replaces any non-capturing groups withnull:{ "_id" : 1, "nycContacts" : [ { "match" : "718-555-0113", "idx" : 0, "captures" : [ "718", null, null ] } ] }
{ "_id" : 2, "nycContacts" : [ { "match" : "212-555-8832", "idx" : 0, "captures" : [ null, "212", null ] } ] }
{ "_id" : 3, "nycContacts" : [ ] }
{ "_id" : 4, "nycContacts" : [ ] }
{ "_id" : 5, "nycContacts" : [ { "match" : "917-555-4414", "idx" : 0, "captures" : [ null, null, "917" ] } ] }Examples
$regexFindAlland Its OptionsTo illustrate the behavior of the
$regexFindAlloperator as discussed in this example, create a sample collectionproductswith the following documents:db.products.insertMany([
{ _id: 1, description: "Single LINE description." },
{ _id: 2, description: "First lines\nsecond line" },
{ _id: 3, description: "Many spaces before line" },
{ _id: 4, description: "Multiple\nline descriptions" },
{ _id: 5, description: "anchors, links and hyperlinks" },
{ _id: 6, description: "métier work vocation" }
])By default,
$regexFindAllperforms a case-sensitive match. For example, the following aggregation performs a case-sensitive$regexFindAllon thedescriptionfield. The regex pattern/line/does not specify any grouping:db.products.aggregate([
{ $addFields: { returnObject: { $regexFindAll: { input: "$description", regex: /line/ } } } }
])The operation returns the following:
{
"_id" : 1,
"description" : "Single LINE description.",
"returnObject" : [ ]
}
{
"_id" : 2,
"description" : "First lines\nsecond line",
"returnObject" : [ { "match" : "line", "idx" : 6, "captures" : [ ]}, { "match" : "line", "idx" : 19, "captures" : [ ] } ]
}
{
"_id" : 3,
"description" : "Many spaces before line",
"returnObject" : [ { "match" : "line", "idx" : 23, "captures" : [ ] } ]
}
{
"_id" : 4,
"description" : "Multiple\nline descriptions",
"returnObject" : [ { "match" : "line", "idx" : 9, "captures" : [ ] }
] }
{
"_id" : 5,
"description" : "anchors, links and hyperlinks",
"returnObject" : [ ]
}
{
"_id" : 6,
"description" : "métier work vocation",
"returnObject" : [ ]
}The following regex pattern
/lin(e|k)/specifies a grouping(e|k)in the pattern:db.products.aggregate([
{ $addFields: { returnObject: { $regexFindAll: { input: "$description", regex: /lin(e|k)/ } } } }
])The operation returns the following:
{
"_id" : 1,
"description" : "Single LINE description.",
"returnObject": [ ]
}
{
"_id" : 2,
"description" : "First lines\nsecond line",
"returnObject" : [ { "match" : "line", "idx" : 6, "captures" : [ "e" ] }, { "match" : "line", "idx" : 19, "captures" : [ "e" ] } ]
}
{
"_id" : 3,
"description" : "Many spaces before line",
"returnObject" : [ { "match" : "line", "idx" : 23, "captures" : [ "e" ] } ]
}
{
"_id" : 4,
"description" : "Multiple\nline descriptions",
"returnObject" : [ { "match" : "line", "idx" : 9, "captures" : [ "e" ] } ]
}
{
"_id" : 5,
"description" : "anchors, links and hyperlinks",
"returnObject" : [ { "match" : "link", "idx" : 9, "captures" : [ "k" ] }, { "match" : "link", "idx" : 24, "captures" : [ "k" ] } ]
}
{
"_id" : 6,
"description" : "métier work vocation",
"returnObject" : [ ]
}In the return option, the
idxfield is the code point index and not the byte index. To illustrate, consider the following example that uses the regex pattern/tier/:db.products.aggregate([
{ $addFields: { returnObject: { $regexFindAll: { input: "$description", regex: /tier/ } } } }
])The operation returns the following where only the last record matches the pattern and the returned
idxis2(instead of 3 if using a byte index){ "_id" : 1, "description" : "Single LINE description.", "returnObject" : [ ] }
{ "_id" : 2, "description" : "First lines\nsecond line", "returnObject" : [ ] }
{ "_id" : 3, "description" : "Many spaces before line", "returnObject" : [ ] }
{ "_id" : 4, "description" : "Multiple\nline descriptions", "returnObject" : [ ] }
{ "_id" : 5, "description" : "anchors, links and hyperlinks", "returnObject" : [ ] }
{ "_id" : 6, "description" : "métier work vocation",
"returnObject" : [ { "match" : "tier", "idx" : 2, "captures" : [ ] } ] }iOptionNote
You cannot specify options in both the
regexand theoptionsfield.To perform case-insensitive pattern matching, include the i option as part of the regex field or in the options field:
// Specify i as part of the regex field
{ $regexFindAll: { input: "$description", regex: /line/i } }
// Specify i in the options field
{ $regexFindAll: { input: "$description", regex: /line/, options: "i" } }
{ $regexFindAll: { input: "$description", regex: "line", options: "i" } }For example, the following aggregation performs a case-insensitive
$regexFindAllon thedescriptionfield. The regex pattern/line/does not specify any grouping:db.products.aggregate([
{ $addFields: { returnObject: { $regexFindAll: { input: "$description", regex: /line/i } } } }
])The operation returns the following documents:
{
"_id" : 1,
"description" : "Single LINE description.",
"returnObject" : [ { "match" : "LINE", "idx" : 7, "captures" : [ ] } ]
}
{
"_id" : 2,
"description" : "First lines\nsecond line",
"returnObject" : [ { "match" : "line", "idx" : 6, "captures" : [ ] }, { "match" : "line", "idx" : 19, "captures" : [ ] } ]
}
{
"_id" : 3,
"description" : "Many spaces before line",
"returnObject" : [ { "match" : "line", "idx" : 23, "captures" : [ ] } ]
}
{
"_id" : 4,
"description" : "Multiple\nline descriptions",
"returnObject" : [ { "match" : "line", "idx" : 9, "captures" : [ ] } ]
}
{
"_id" : 5,
"description" : "anchors, links and hyperlinks",
"returnObject" : [ ]
}
{ "_id" : 6, "description" : "métier work vocation", "returnObject" : [ ] }mOptionNote
You cannot specify options in both the
regexand theoptionsfield.To match the specified anchors (e.g.
^,$) for each line of a multiline string, include the m option as part of the regex field or in the options field:// Specify m as part of the regex field
{ $regexFindAll: { input: "$description", regex: /line/m } }
// Specify m in the options field
{ $regexFindAll: { input: "$description", regex: /line/, options: "m" } }
{ $regexFindAll: { input: "$description", regex: "line", options: "m" } }The following example includes both the
iand themoptions to match lines starting with either the lettersorSfor multiline strings:db.products.aggregate([
{ $addFields: { returnObject: { $regexFindAll: { input: "$description", regex: /^s/im } } } }
])The operation returns the following:
{
"_id" : 1,
"description" : "Single LINE description.",
"returnObject" : [ { "match" : "S", "idx" : 0, "captures" : [ ] } ]
}
{
"_id" : 2,
"description" : "First lines\nsecond line",
"returnObject" : [ { "match" : "s", "idx" : 12, "captures" : [ ] } ]
}
{
"_id" : 3,
"description" : "Many spaces before line",
"returnObject" : [ ]
}
{
"_id" : 4,
"description" : "Multiple\nline descriptions",
"returnObject" : [ ]
}
{
"_id" : 5,
"description" : "anchors, links and hyperlinks",
"returnObject" : [ ]
}
{ "_id" : 6, "description" : "métier work vocation", "returnObject" : [ ] }xOptionNote
You cannot specify options in both the
regexand theoptionsfield.To ignore all unescaped white space characters and comments (denoted by the un-escaped hash
#character and the next new-line character) in the pattern, include the s option in the options field:// Specify x in the options field
{ $regexFindAll: { input: "$description", regex: /line/, options: "x" } }
{ $regexFindAll: { input: "$description", regex: "line", options: "x" } }The following example includes the
xoption to skip unescaped white spaces and comments:db.products.aggregate([
{ $addFields: { returnObject: { $regexFindAll: { input: "$description", regex: /lin(e|k) # matches line or link/, options:"x" } } } }
])The operation returns the following:
{
"_id" : 1,
"description" : "Single LINE description.",
"returnObject" : [ ]
}
{
"_id" : 2,
"description" : "First lines\nsecond line",
"returnObject" : [ { "match" : "line", "idx" : 6, "captures" : [ "e" ] }, { "match" : "line", "idx" : 19, "captures" : [ "e" ] } ]
}
{
"_id" : 3,
"description" : "Many spaces before line",
"returnObject" : [ { "match" : "line", "idx" : 23, "captures" : [ "e" ] } ]
}
{
"_id" : 4,
"description" : "Multiple\nline descriptions",
"returnObject" : [ { "match" : "line", "idx" : 9, "captures" : [ "e" ] } ]
}
{
"_id" : 5,
"description" : "anchors, links and hyperlinks",
"returnObject" : [ { "match" : "link", "idx" : 9, "captures" : [ "k" ] }, { "match" : "link", "idx" : 24, "captures" : [ "k" ] } ]
}
{ "_id" : 6, "description" : "métier work vocation", "returnObject" : [ ] }sOptionNote
You cannot specify options in both the
regexand theoptionsfield.To allow the dot character (i.e.
.) in the pattern to match all characters including the new line character, include the s option in the options field:// Specify s in the options field
{ $regexFindAll: { input: "$description", regex: /m.*line/, options: "s" } }
{ $regexFindAll: { input: "$description", regex: "m.*line", options: "s" } }The following example includes the
soption to allow the dot character (i.e. .) to match all characters including new line as well as theioption to perform a case-insensitive match:db.products.aggregate([
{ $addFields: { returnObject: { $regexFindAll: { input: "$description", regex:/m.*line/, options: "si" } } } }
])The operation returns the following:
{
"_id" : 1,
"description" : "Single LINE description.",
"returnObject" : [ ]
}
{
"_id" : 2,
"description" : "First lines\nsecond line",
"returnObject" : [ ]
}
{
"_id" : 3,
"description" : "Many spaces before line",
"returnObject" : [ { "match" : "Many spaces before line", "idx" : 0, "captures" : [ ] } ]
}
{
"_id" : 4,
"description" : "Multiple\nline descriptions",
"returnObject" : [ { "match" : "Multiple\nline", "idx" : 0, "captures" : [ ] } ]
}
{
"_id" : 5,
"description" : "anchors, links and hyperlinks",
"returnObject" : [ ]
}
{ "_id" : 6, "description" : "métier work vocation", "returnObject" : [ ] }Use
$regexFindAllto Parse Email from StringCreate a sample collection
feedbackwith the following documents:db.feedback.insertMany([
{ "_id" : 1, comment: "Hi, I'm just reading about MongoDB -- aunt.arc.tica@example.com" },
{ "_id" : 2, comment: "I wanted to concatenate a string" },
{ "_id" : 3, comment: "How do I convert a date to string? Contact me at either cam@mongodb.com or c.dia@mongodb.com" },
{ "_id" : 4, comment: "It's just me. I'm testing. fred@MongoDB.com" }
])The following aggregation uses the
$regexFindAllto extract all emails from thecommentfield (case insensitive).db.feedback.aggregate( [
{ $addFields: {
"email": { $regexFindAll: { input: "$comment", regex: /[a-z0-9_.+-]+@[a-z0-9_.+-]+\.[a-z0-9_.+-]+/i } }
} },
{ $set: { email: "$email.match"} }
] )- First Stage
The stage uses the
$addFieldsstage to add a new fieldemailto the document. The new field is an array that contains the result of performing the$regexFindAllon thecommentfield:{ "_id" : 1, "comment" : "Hi, I'm just reading about MongoDB -- aunt.arc.tica@example.com", "email" : [ { "match" : "aunt.arc.tica@example.com", "idx" : 38, "captures" : [ ] } ] }
{ "_id" : 2, "comment" : "I wanted to concatenate a string", "email" : [ ] }
{ "_id" : 3, "comment" : "How do I convert a date to string? Contact me at either cam@mongodb.com or c.dia@mongodb.com", "email" : [ { "match" : "cam@mongodb.com", "idx" : 56, "captures" : [ ] }, { "match" : "c.dia@mongodb.com", "idx" : 75, "captures" : [ ] } ] }
{ "_id" : 4, "comment" : "It's just me. I'm testing. fred@MongoDB.com", "email" : [ { "match" : "fred@MongoDB.com", "idx" : 28, "captures" : [ ] } ] }- Second Stage
The stage use the
$setstage to reset theemailarray elements to the"email.match"value(s). If the current value ofemailis null, the new value ofemailis set to null.{ "_id" : 1, "comment" : "Hi, I'm just reading about MongoDB -- aunt.arc.tica@example.com", "email" : [ "aunt.arc.tica@example.com" ] }
{ "_id" : 2, "comment" : "I wanted to concatenate a string", "email" : [ ] }
{ "_id" : 3, "comment" : "How do I convert a date to string? Contact me at either cam@mongodb.com or c.dia@mongodb.com", "email" : [ "cam@mongodb.com", "c.dia@mongodb.com" ] }
{ "_id" : 4, "comment" : "It's just me. I'm testing. fred@MongoDB.com", "email" : [ "fred@MongoDB.com" ] }Use Captured Groupings to Parse User Name
Create a sample collection
feedbackwith the following documents:db.feedback.insertMany([
{ "_id" : 1, comment: "Hi, I'm just reading about MongoDB -- aunt.arc.tica@example.com" },
{ "_id" : 2, comment: "I wanted to concatenate a string" },
{ "_id" : 3, comment: "How do I convert a date to string? Contact me at either cam@mongodb.com or c.dia@mongodb.com" },
{ "_id" : 4, comment: "It's just me. I'm testing. fred@MongoDB.com" }
])To reply to the feedback, assume you want to parse the local-part of the email address to use as the name in the greetings. Using the
capturedfield returned in the$regexFindAllresults, you can parse out the local part of each email address:db.feedback.aggregate( [
{ $addFields: {
"names": { $regexFindAll: { input: "$comment", regex: /([a-z0-9_.+-]+)@[a-z0-9_.+-]+\.[a-z0-9_.+-]+/i } },
} },
{ $set: { names: { $reduce: { input: "$names.captures", initialValue: [ ], in: { $concatArrays: [ "$$value", "$$this" ] } } } } }
] )- First Stage
The stage uses the
$addFieldsstage to add a new fieldnamesto the document. The new field contains the result of performing the$regexFindAllon thecommentfield:{
"_id" : 1,
"comment" : "Hi, I'm just reading about MongoDB -- aunt.arc.tica@example.com",
"names" : [ { "match" : "aunt.arc.tica@example.com", "idx" : 38, "captures" : [ "aunt.arc.tica" ] } ]
}
{ "_id" : 2, "comment" : "I wanted to concatenate a string", "names" : [ ] }
{
"_id" : 3,
"comment" : "How do I convert a date to string? Contact me at either cam@mongodb.com or c.dia@mongodb.com",
"names" : [
{ "match" : "cam@mongodb.com", "idx" : 56, "captures" : [ "cam" ] },
{ "match" : "c.dia@mongodb.com", "idx" : 75, "captures" : [ "c.dia" ] }
]
}
{
"_id" : 4,
"comment" : "It's just me. I'm testing. fred@MongoDB.com",
"names" : [ { "match" : "fred@MongoDB.com", "idx" : 28, "captures" : [ "fred" ] } ]
}- Second Stage
The stage use the
$setstage with the$reduceoperator to resetnamesto an array that contains the"$names.captures"elements.{
"_id" : 1,
"comment" : "Hi, I'm just reading about MongoDB -- aunt.arc.tica@example.com",
"names" : [ "aunt.arc.tica" ]
}
{ "_id" : 2, "comment" : "I wanted to concatenate a string", "names" : [ ] }
{
"_id" : 3,
"comment" : "How do I convert a date to string? Contact me at either cam@mongodb.com or c.dia@mongodb.com",
"names" : [ "cam", "c.dia" ]
}
{
"_id" : 4,
"comment" : "It's just me. I'm testing. fred@MongoDB.com",
"names" : [ "fred" ]
}Tip
For more information on the behavior of the
capturesarray and additional examples, seecapturesOutput Behavior.