$regex

Note

This page describes regular expression search capabilities for self-managed (non-Atlas) deployments. For data hosted on MongoDB, MongoDB also offers an improved full-text search solution, MongoDB Search, which has its own $regex operator. To learn more, see $regex in the MongoDB Search documentation.

Definition

$regex
Provides regular expression capabilities for pattern matching strings in queries.

Compatibility

You can use $regex for deployments hosted in the following environments:

  • MongoDB Atlas: The fully managed service for MongoDB deployments in the cloud
  • MongoDB Enterprise: The subscription-based, self-managed version of MongoDB
  • MongoDB Community: The source-available, free-to-use, and self-managed version of MongoDB

Syntax

To use $regex, use one of the following syntaxes:

{ <field>: { $regex: /pattern/, $options: '<options>' } }
{ "<field>": { "$regex": "pattern", "$options": "<options>" } }
{ <field>: { $regex: /pattern/<options> } }

Note

To use $regex with mongodump, you must enclose the query document in single quotes ('{ ... }') to ensure that it does not interact with your shell environment.

The query document must be in Extended JSON v2 format (either relaxed or canonical/strict mode), which includes enclosing the field names and operators in quotes. For example:

mongodump -d=sample_mflix -c=movies  -q='{"year": {"$regex": "20"}}'

In MongoDB, you can also use regular expression objects (i.e. /pattern/) to specify regular expressions:

{ <field>: /pattern/<options> }

For restrictions on particular syntax use, see $regex vs. /pattern/ Syntax.

$options

The following <options> are available for use with regular expressions.

Option    Description

i

Case insensitivity to match upper and lower cases. For an example, see Perform Case-Insensitive Regular Expression Match.

m

For patterns that include anchors (i.e. ^ for the start, $ for the end), match at the beginning or end of each line for strings with multiline values. Without this option, these anchors match at the beginning or end of the string. For an example, see Multiline Match for Lines Starting with Specified Pattern.

If the pattern contains no anchors or if the string value has no newline characters (e.g. \n), the m option has no effect.

x

"Extended" capability to ignore all white space characters in the $regex pattern unless escaped or included in a character class.

Additionally, it ignores characters in-between and including an un-escaped hash/pound (#) character and the next new line, so that you may include comments in complicated patterns. This only applies to data characters; white space characters may never appear within special character sequences in a pattern.

The x option does not affect the handling of the VT character (i.e. code 11).

s

Allows the dot character (i.e. .) to match all characters including newline characters. For an example, see Use the . Dot Character to Match New Line.

u

Supports Unicode. This flag is accepted, but is redundant. UTF is set by default in the $regex operator, making the u option unnecessary.

Note

The $regex operator does not support the global search modifier g.

Behavior

$regex vs. /pattern/ Syntax

$in Expressions

To include a regular expression in an $in query predicate operator, you can only use JavaScript regular expression objects (/pattern/).

For example:

{ name: { $in: [ /^acme/i, /^ack/ ] } }

You cannot use $regex operator expressions inside an $in operator.

Implicit AND Conditions for the Field

To include a regular expression in a comma-separated list of query conditions for the field, use the $regex operator. For example:

{ name: { $regex: /acme.*corp/i, $nin: [ 'acmeblahcorp' ] } }
{ name: { $regex: /acme.*corp/, $options: 'i', $nin: [ 'acmeblahcorp' ] } }
{ name: { $regex: 'acme.*corp', $options: 'i', $nin: [ 'acmeblahcorp' ] } }

x and s Options

To use either the x or the s option, you must use the $regex operator expression with the $options operator. For example, to specify the i and the s options, you must use $options for both:

{ name: { $regex: /acme.*corp/, $options: "si" } }
{ name: { $regex: 'acme.*corp', $options: "si" } }

PCRE Versus JavaScript

To use PCRE-supported features in a regular expression that aren't supported in JavaScript, you must use the $regex operator and specify the regular expression as a string.

To match case-insensitive strings:

  • "(?i)" begins a case-insensitive match.
  • "(?-i)" ends a case-insensitive match.

For example, the regular expression "(?i)a(?-i)cme" matches strings that:

  • Begin with "a" or "A". This is a case-insensitive match.
  • End with "cme". This is a case-sensitive match.

These strings match the example regular expression:

  • "acme"
  • "Acme"

The following example uses the $regex operator to find name field strings that match the regular expression "(?i)a(?-i)cme":

{ name: { $regex: "(?i)a(?-i)cme" } }

Starting in version 6.1, MongoDB uses the PCRE2 (Perl Compatible Regular Expressions) library to implement regular expression pattern matching. To learn more about PCRE2, see the PCRE Documentation.

$regex and $not

The $not operator can perform a logical NOT operation on both:

  • Regular expression objects (i.e. /pattern/)

    For example:

    db.inventory.find( { item: { $not: /^p.*/ } } )
  • $regex operator expressions

    For example:

    db.inventory.find( { item: { $not: { $regex: "^p.*" } } } )
    db.inventory.find( { item: { $not: { $regex: /^p.*/ } } } )

Index Use

Index use and performance for $regex queries vary depending on whether the query is case-sensitive or case-insensitive.

Case-Sensitive Queries

For case-sensitive regular expression queries, if an index exists for the field, then MongoDB matches the regular expression against the values in the index, which can be faster than a collection scan.

Further optimization can occur if the regular expression is a "prefix expression", which means that all potential matches start with the same string. This allows MongoDB to construct a "range" from that prefix and only match against those values from the index that fall within that range.

A regular expression is a "prefix expression" if it starts with a caret (^) or a left anchor (\A), followed by a string of simple symbols. For example, the regex /^abc.*/ will be optimized by matching only against the values from the index that start with abc.

Additionally, while /^a/, /^a.*/, and /^a.*$/ match equivalent strings, they have different performance characteristics. All of these expressions use an index if an appropriate index exists; however, /^a.*/ and /^a.*$/ are slower. /^a/ can stop scanning after matching the prefix.
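
The range idea behind prefix expressions can be illustrated outside the database. The following is a minimal sketch in plain JavaScript, not MongoDB's actual implementation: prefixRange is a hypothetical helper, a sorted array stands in for an index on sku, and strings are assumed to be compared by UTF-16 code unit.

```javascript
// Hypothetical helper: turn a literal prefix into a half-open range
// [lower, upper) that covers every string starting with that prefix.
function prefixRange(prefix) {
  if (prefix.length === 0) {
    throw new Error("prefix must be non-empty");
  }
  const last = prefix.charCodeAt(prefix.length - 1);
  if (last === 0xffff) {
    throw new Error("cannot increment the last code unit");
  }
  // Bump the last code unit, for example "abc" -> "abd".
  const upper = prefix.slice(0, -1) + String.fromCharCode(last + 1);
  return { lower: prefix, upper };
}

// Assumption: a sorted array plays the role of an index on the sku field.
const indexedSkus = ["Abc789", "abc123", "abc789", "xyz456", "xyz789"].sort();

// Only values inside the range need to be tested against the regex.
const { lower, upper } = prefixRange("abc");
const candidates = indexedSkus.filter((sku) => sku >= lower && sku < upper);
const matches = candidates.filter((sku) => /^abc.*/.test(sku));

console.log(upper);      // abd
console.log(candidates); // [ 'abc123', 'abc789' ]
console.log(matches);    // [ 'abc123', 'abc789' ]
```

In this sketch the regex is evaluated against two candidate values instead of all five, which is the kind of saving a range built from the prefix is meant to provide.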

Case-Insensitive Queries

Case-insensitive indexes do not improve performance for $regex queries, as the $regex operator is not collation-aware and therefore cannot take advantage of such indexes.

Examples

The examples in this section use the following products collection:

db.products.insertMany( [
{ _id: 100, sku: "abc123", description: "Single line description." },
{ _id: 101, sku: "abc789", description: "First line\nSecond line" },
{ _id: 102, sku: "xyz456", description: "Many spaces before line" },
{ _id: 103, sku: "xyz789", description: "Multiple\nline description" },
{ _id: 104, sku: "Abc789", description: "SKU starts with A" }
] )

Perform a LIKE Match

The following example matches all documents where the sku field is like "%789":

db.products.find( { sku: { $regex: /789$/ } } )

The example is analogous to the following SQL LIKE statement:

SELECT * FROM products
WHERE sku like "%789";

Example output:

[
{ _id: 101, sku: 'abc789', description: 'First line\nSecond line' },
{ _id: 103, sku: 'xyz789', description: 'Multiple\nline description' },
{ _id: 104, sku: 'Abc789', description: 'SKU starts with A' }
]
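
The correspondence between LIKE wildcards and regular expressions can be sketched in plain JavaScript. likeToRegexSource is a hypothetical helper, not part of MongoDB or any driver; it assumes only the % and _ wildcards and ignores SQL ESCAPE clauses. The sku values are copied from the products collection.

```javascript
// Hypothetical helper: convert a SQL LIKE pattern into a regex source string.
// % matches any run of characters, _ matches exactly one character, and every
// other character is escaped so that it matches literally.
function likeToRegexSource(like) {
  let source = "^";
  for (const ch of like) {
    if (ch === "%") {
      source += ".*";
    } else if (ch === "_") {
      source += ".";
    } else {
      source += ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    }
  }
  return source + "$";
}

const skus = ["abc123", "abc789", "xyz456", "xyz789", "Abc789"];
const source = likeToRegexSource("%789");

// Assumption: the s flag is used so that . also matches newline characters.
const likeRegex = new RegExp(source, "s");
const matched = skus.filter((sku) => likeRegex.test(sku));

console.log(source);  // ^.*789$
console.log(matched); // [ 'abc789', 'xyz789', 'Abc789' ]
```

Because an unanchored regular expression already matches anywhere in the string, the shorter /789$/ selects the same values as the generated ^.*789$ source.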

Perform Case-Insensitive Regular Expression Match

The following example uses the i option to perform a case-insensitive match for documents with a sku value that starts with ABC.

db.products.find( { sku: { $regex: /^ABC/i } } )

Example output:

[
{ _id: 100, sku: 'abc123', description: 'Single line description.' },
{ _id: 101, sku: 'abc789', description: 'First line\nSecond line' },
{ _id: 104, sku: 'Abc789', description: 'SKU starts with A' }
]

Multiline Match for Lines Starting with Specified Pattern

The following example uses the m option to match lines starting with the letter S for multiline strings:

db.products.find( { description: { $regex: /^S/, $options: 'm' } } )

Example output:

[
{ _id: 100, sku: 'abc123', description: 'Single line description.' },
{ _id: 101, sku: 'abc789', description: 'First line\nSecond line' },
{ _id: 104, sku: 'Abc789', description: 'SKU starts with A' }
]

Without the m option, the example output is:

[
{ _id: 100, sku: 'abc123', description: 'Single line description.' },
{ _id: 104, sku: 'Abc789', description: 'SKU starts with A' }
]
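
The effect of the m option on the ^ anchor can also be checked without a database. The following sketch uses JavaScript's built-in RegExp, whose m flag behaves analogously for this pattern; it is an assumption that this stands in for the server's PCRE2 matching. The array is an in-memory copy of the description values, and idsMatching is a hypothetical helper.

```javascript
// In-memory copy of the description values from the products collection.
const products = [
  { _id: 100, description: "Single line description." },
  { _id: 101, description: "First line\nSecond line" },
  { _id: 102, description: "Many spaces before line" },
  { _id: 103, description: "Multiple\nline description" },
  { _id: 104, description: "SKU starts with A" },
];

// Hypothetical helper: return the _id of every document whose description
// matches the given regular expression.
const idsMatching = (regex) =>
  products.filter((doc) => regex.test(doc.description)).map((doc) => doc._id);

// With m, ^ matches at the start of every line.
const withM = idsMatching(/^S/m);
// Without m, ^ matches only at the start of the whole string.
const withoutM = idsMatching(/^S/);

console.log(withM);    // [ 100, 101, 104 ]
console.log(withoutM); // [ 100, 104 ]
```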

If the $regex pattern does not contain an anchor, the pattern matches against the string as a whole, as in the following example:

db.products.find( { description: { $regex: /S/ } } )

Example output:

[
{ _id: 100, sku: 'abc123', description: 'Single line description.' },
{ _id: 101, sku: 'abc789', description: 'First line\nSecond line' },
{ _id: 104, sku: 'Abc789', description: 'SKU starts with A' }
]

Use the . Dot Character to Match New Line

The following example uses the s option to allow the dot character (i.e. .) to match all characters including new line as well as the i option to perform a case-insensitive match:

db.products.find( { description: { $regex: /m.*line/, $options: 'si' } } )

Example output:

[
{ _id: 102, sku: 'xyz456', description: 'Many spaces before line' },
{ _id: 103, sku: 'xyz789', description: 'Multiple\nline description' }
]

Without the s option, the example output is:

[
{ _id: 102, sku: 'xyz456', description: 'Many spaces before line' }
]
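
The difference the s option makes can be reproduced with JavaScript's built-in RegExp, which has an equivalent s (dotAll) flag. This is a sketch under the assumption that JavaScript matching stands in for the server's behavior for this simple pattern; the array is an in-memory copy of the description values.

```javascript
// In-memory copy of the description values from the products collection.
const docs = [
  { _id: 100, description: "Single line description." },
  { _id: 101, description: "First line\nSecond line" },
  { _id: 102, description: "Many spaces before line" },
  { _id: 103, description: "Multiple\nline description" },
  { _id: 104, description: "SKU starts with A" },
];

// With s, the dot also matches \n, so "Multiple\nline" can match m.*line.
const withS = docs
  .filter((doc) => /m.*line/si.test(doc.description))
  .map((doc) => doc._id);

// Without s, the dot stops at \n, so the match cannot span two lines.
const withoutS = docs
  .filter((doc) => /m.*line/i.test(doc.description))
  .map((doc) => doc._id);

console.log(withS);    // [ 102, 103 ]
console.log(withoutS); // [ 102 ]
```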

Ignore White Spaces in Pattern

The following example uses the x option to ignore white spaces and comments in the matching pattern. Each comment starts with # and ends with \n:

var pattern = "abc #category code\n123 #item number"
db.products.find( { sku: { $regex: pattern, $options: "x" } } )

Example output:

[
{ _id: 100, sku: 'abc123', description: 'Single line description.' }
]
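
To see what the pattern reduces to once comments and white space are ignored, the following sketch emulates the x option in plain JavaScript, which has no x flag of its own. stripExtended is a hypothetical helper and a deliberate simplification: unlike the real option, it does not honor escaped characters or character classes.

```javascript
// Hypothetical helper: a simplified emulation of the x option.
// It drops "#..." comments up to the end of each line, then all white space.
// It does NOT handle escaped "#", escaped spaces, or character classes,
// so it is only suitable for simple patterns like the one below.
function stripExtended(pattern) {
  return pattern.replace(/#.*$/gm, "").replace(/\s+/g, "");
}

const pattern = "abc #category code\n123 #item number";
const compact = stripExtended(pattern);

// sku values copied from the products collection.
const skus = ["abc123", "abc789", "xyz456", "xyz789", "Abc789"];
const matched = skus.filter((sku) => new RegExp(compact).test(sku));

console.log(compact); // abc123
console.log(matched); // [ 'abc123' ]
```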

Use a Regular Expression to Match Case in Strings

The following example uses the regular expression "(?i)a(?-i)bc" to match sku field strings that contain:

  • "abc"
  • "Abc"

db.products.find( { sku: { $regex: "(?i)a(?-i)bc" } } )

Example output:

[
{ _id: 100, sku: 'abc123', description: 'Single line description.' },
{ _id: 101, sku: 'abc789', description: 'First line\nSecond line' },
{ _id: 104, sku: 'Abc789', description: 'SKU starts with A' }
]

Extend Regex Options to Match Characters Outside of ASCII

New in version 6.1.

By default, certain regex escape sequences (such as \b and \w) only recognize ASCII characters. This can cause unexpected results when performing regex matches against strings that contain non-ASCII UTF-8 characters.

Starting in MongoDB 6.1, you can specify the *UCP regex option to match UTF-8 characters.

Important

Performance of UCP Option

The *UCP option results in slower queries than those without the option specified because *UCP requires a multistage table lookup to perform the match.

For example, consider the following documents in a songs collection:

db.songs.insertMany( [
{ _id: 0, "artist" : "Blue Öyster Cult", "title": "The Reaper" },
{ _id: 1, "artist": "Blue Öyster Cult", "title": "Godzilla" },
{ _id: 2, "artist" : "Blue Oyster Cult", "title": "Take Me Away" }
] )

The following regex query uses the \b escape sequence in a regex match. \b matches a word boundary.

db.songs.find( { artist: { $regex: /\byster/ } } )

Example output:

[
{ _id: 0, artist: 'Blue Öyster Cult', title: 'The Reaper' },
{ _id: 1, artist: 'Blue Öyster Cult', title: 'Godzilla' }
]

The previous results are unexpected because none of the words in the returned artist fields begin with the matched string (yster). The Ö character in documents _id: 0 and _id: 1 is not treated as a word character when performing the match because it is not an ASCII character, so \b finds a word boundary between Ö and y.

The expected result is that the query does not return any documents.

To allow the query to recognize UTF-8 characters, specify the *UCP option before the pattern:

db.songs.find( { artist: { $regex: "(*UCP)\\byster" } } )

The previous query does not return any documents, which is the expected result.
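
The same ASCII-only word boundary behavior exists in JavaScript's own RegExp, which makes the problem easy to reproduce locally. The sketch below is an analogy only: it does not use *UCP (a PCRE2 feature), and the lookbehind with Unicode property escapes is an assumed stand-in for a Unicode-aware boundary at the start of a word.

```javascript
// "\u00D6" is the precomposed character Ö.
const artists = ["Blue \u00D6yster Cult", "Blue Oyster Cult"];

// \b in JavaScript treats only ASCII letters, digits, and _ as word
// characters, so it sees a boundary between Ö and y.
const asciiBoundary = /\byster/;

// Unicode-aware stand-in for \b at the start of a word: require that the
// previous character is not a letter, mark, digit, or underscore.
const unicodeBoundary = /(?<![\p{L}\p{M}\p{N}_])yster/u;

const asciiMatches = artists.filter((artist) => asciiBoundary.test(artist));
const unicodeMatches = artists.filter((artist) => unicodeBoundary.test(artist));

console.log(asciiMatches);   // [ 'Blue Öyster Cult' ]
console.log(unicodeMatches); // []
```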

Tip

Escape Characters for Regex Patterns

When specifying *UCP or any other regular expression option, ensure that you use the correct escape characters for your shell or driver.