Expressions Explained
Summarising Aggregation Expressions
Expressions give aggregation pipelines their data manipulation power. However, developers tend to start using them by copying examples from the MongoDB Manual and then refactoring these without thinking enough about what they are. Proficiency in aggregation pipelines demands a deeper understanding of expressions.
Aggregation expressions come in one of three primary flavours:
- Operators. Accessed as an object with a $ prefix followed by the operator function name. The "dollar-operator-name" is used as the main key for the object. Examples: {$arrayElemAt: ...}, {$cond: ...}, {$dateToString: ...}
- Field Paths. Accessed as a string with a $ prefix followed by the field's path in each record being processed. Examples: "$account.sortcode", "$addresses.address.city"
- Variables. Accessed as a string with a $$ prefix followed by the fixed name and falling into three sub-categories:
  - Context System Variables. With values coming from the system environment rather than each input record an aggregation stage is processing. Examples: "$$NOW", "$$CLUSTER_TIME"
  - Marker Flag System Variables. To indicate desired behaviour to pass back to the aggregation runtime. Examples: "$$ROOT", "$$REMOVE", "$$PRUNE"
  - Bind User Variables. For storing values you declare with a $let operator (or with the let option of a $lookup stage, or the as option of a $map or $filter operator). Examples: "$$product_name_var", "$$orderIdVal"
You can combine these three categories of aggregation expressions when operating on input records, enabling you to perform complex comparisons and transformations of data. To highlight this, the code snippet below is an excerpt from this book's Mask Sensitive Fields example, which combines all three categories of expression.
"customer_info": {"$cond": {
"if": {"$eq": ["$customer_info.category", "SENSITIVE"]},
"then": "$$REMOVE",
"else": "$customer_info",
}}
The pipeline retains an embedded sub-document (customer_info) in each resulting record unless a field in the original sub-document has a specific value (category=SENSITIVE). {$cond: ...} is one of the operator expressions used in the excerpt (a "conditional" operator expression which takes three arguments: if, then & else). {$eq: ...} is another operator expression (a "comparison" operator expression). "$$REMOVE" is a "marker flag" variable expression instructing the pipeline to exclude the field. Both "$customer_info.category" and "$customer_info" elements are field path expressions referencing each incoming record's fields.
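The syntax rules for the three categories can be sketched in plain JavaScript. The classifyExpression() helper below is hypothetical (it is not part of MongoDB or its drivers) and deliberately simplified; it only looks at the leading $ or $$ of a value. The sketch also assumes the excerpt sits inside a $set stage, which the excerpt itself does not show.

```javascript
// Hypothetical helper: classify a value using the prefix rules described above.
function classifyExpression(value) {
  if (typeof value === "string") {
    if (value.startsWith("$$")) return "variable";    // e.g. "$$REMOVE"
    if (value.startsWith("$")) return "field path";   // e.g. "$customer_info"
    return "literal";
  }
  if (value !== null && typeof value === "object" && !Array.isArray(value)) {
    var keys = Object.keys(value);
    // An operator is an object whose main key is the "dollar-operator-name".
    if (keys.length === 1 && keys[0].startsWith("$")) return "operator";
  }
  return "literal";
}

// Assumption: the excerpt is wrapped in a $set stage.
var maskStage = {"$set": {
  "customer_info": {"$cond": {
    "if": {"$eq": ["$customer_info.category", "SENSITIVE"]},
    "then": "$$REMOVE",
    "else": "$customer_info",
  }},
}};

var condExpr = maskStage["$set"]["customer_info"];
console.log(classifyExpression(condExpr));                   // operator
console.log(classifyExpression(condExpr["$cond"]["if"]));    // operator
console.log(classifyExpression(condExpr["$cond"]["then"]));  // variable
console.log(classifyExpression(condExpr["$cond"]["else"]));  // field path
console.log(classifyExpression("SENSITIVE"));                // literal
```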
What Do Expressions Produce?
As described above, an expression can be an Operator (e.g. {$concat: ...}), a Variable (e.g. "$$ROOT") or a Field Path (e.g. "$address"). In all these cases, an expression is just something that dynamically populates and returns a new JSON/BSON data type element, which can be one of:
- a Number (including integer, long, float, double, decimal128)
- a String (UTF-8)
- a Boolean
- a DateTime (UTC)
- an Array
- an Object
However, a specific expression can restrict you to returning just one or a few of these types. For example, the {$concat: ...} Operator, which combines multiple strings, can only produce a String data type (or null). The Variable "$$ROOT" can only return an Object which refers to the root document currently being processed in the pipeline stage.
A Field Path (e.g. "$address") is different and can return an element of any data type, depending on what the field refers to in the current input document. For example, suppose "$address" references a sub-document. In this case, it will return an Object. However, if it references a list of elements, it will return an Array. As a human, you can guess that the Field Path "$address" won't return a DateTime, but the aggregation runtime does not know this ahead of time. There could be even more dynamics at play. Due to MongoDB's flexible data model, "$address" could yield a different type for each record processed in a pipeline stage. The first record's address may be an Object sub-document with street name and city fields. The second record's address might represent the full address as a single String.
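One way to observe this variation is to ask the runtime which type a Field Path resolves to for each record, using the $type operator expression. The sketch below is illustrative only: the sample records and the persons_sketch collection name are assumptions, not data from this chapter. For these three records, $type would be expected to report "object", "string" and "array" respectively.

```javascript
// Hypothetical input records: the same field holds a different type in each.
var persons = [
  {"_id": 1, "address": {"street": "1 High Street", "city": "London"}},
  {"_id": 2, "address": "2 Low Road, Leeds"},
  {"_id": 3, "address": ["3 Mill Lane", "York"]},
];

// Add a field reporting the BSON type that "$address" resolves to per record.
var pipeline = [
  {"$set": {"address_type": {"$type": "$address"}}},
];

// Only runs inside the MongoDB Shell, where a global `db` exists.
// persons_sketch is a scratch collection name assumed for this example.
if (typeof db !== "undefined") {
  db.persons_sketch.drop();
  db.persons_sketch.insertMany(persons);
  printjson(db.persons_sketch.aggregate(pipeline).toArray());
}
```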
In summary, Field Paths and Bind User Variables are expressions that can return any JSON/BSON data type at runtime depending on their context. For the other kinds of expressions (Operators, Context System Variables and Marker Flag System Variables), the data type each can return is fixed to one or a set number of documented types. To establish the exact data type produced by a specific operator, you need to consult the Aggregation Pipeline Quick Reference documentation.
For the Operator category of expressions, an expression can also take other expressions as parameters, making them composable. Suppose you need to determine the day of the week for a given date, for example:
{"$dayOfWeek": ISODate("2021-04-24T00:00:00Z")}
Here the $dayOfWeek Operator expression can only return an element of type Number and takes a single parameter, an element of type DateTime. However, rather than using a hardcoded date-time for the parameter, you could have provided an expression. This could be a Field Path expression, for example:
{"$dayOfWeek": "$person_details.date_of_birth"}
Alternatively, you could have defined the parameter using a Context System Variable expression, for example:
{"$dayOfWeek": "$$NOW"}
Or you could even have defined the parameter using yet another Operator expression, for example:
{"$dayOfWeek": {"$dateFromParts": {"year" : 2021, "month" : 4, "day": 24}}}
Furthermore, you could have defined year, month and day parameters for $dateFromParts to be dynamically generated using expressions rather than literal values. The ability to chain expressions together in this way gives your pipelines a lot of power and flexibility when you need it.
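As a sketch of that last point, the pipeline below feeds Field Path expressions into $dateFromParts, whose result in turn feeds $dayOfWeek. The field names event_year, event_month and event_day are hypothetical. The final lines are a plain JavaScript cross-check for the literal date used earlier, relying on $dayOfWeek numbering days from 1 (Sunday) to 7 (Saturday).

```javascript
// Operator -> Operator -> Field Paths: each parameter is itself an expression.
var pipeline = [
  {"$set": {
    "day_of_week": {"$dayOfWeek": {"$dateFromParts": {
      "year": "$event_year",    // hypothetical field names
      "month": "$event_month",
      "day": "$event_day",
    }}},
  }},
];

// Plain JavaScript cross-check for 2021-04-24 (this does not run the pipeline).
// getUTCDay() counts from 0 (Sunday), whereas $dayOfWeek counts from 1 (Sunday).
var expectedDayOfWeek = new Date(Date.UTC(2021, 3, 24)).getUTCDay() + 1;
console.log(expectedDayOfWeek);  // 7 (Saturday)
```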
Can All Stages Use Expressions?
The following question is something you may not have asked yourself before, but asking this question and considering why the answer is what it is can help reveal more about what aggregation expressions are and why you use them.
Question: Can aggregation expressions be used within any type of pipeline stage?
Answer: No
There are many types of stages in the Aggregation Framework that don't allow expressions to be embedded. Examples of some of the most commonly used of these stages are:
$match
$limit
$skip
$sort
$count
$lookup
$out
Some of these stages may be a surprise to you if you've never really thought about it before. You might well consider $match to be the most surprising item in this list. The content of a $match stage is just a set of query conditions with the same syntax as MQL rather than an aggregation expression. There is a good reason for this. The aggregation engine reuses the MQL query engine to perform a "regular" query against the collection, enabling the query engine to use all its usual optimisations. The query conditions are taken as-is from the $match stage at the top of the pipeline. Therefore, the $match filter must use the same syntax as MQL.
In most of the stages that are unable to leverage expressions, it doesn't usually make sense for their behaviour to be dynamic, based on the pipeline data entering the stage. For a client application that paginates results, you might define a value of 20 for the $limit stage. However, maybe you want to dynamically bind a value to the $limit stage, sourced by a $lookup stage earlier in the pipeline. The lookup operation might pull in the user's preferred "page list size" value from a "user preferences" collection. Nonetheless, the Aggregation Framework does not support this today for the listed stage types, to avoid the overhead of the extra checks it would need to perform for what are essentially rare cases.
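Because $limit and $skip accept only literal values, one possible workaround is for the client application to obtain the page size first (for instance, with a separate query) and then assemble the pipeline in code. This is a sketch of that idea, not an approach the text above prescribes; the buildPagePipeline() function and its parameter names are hypothetical.

```javascript
// Hypothetical helper: bind pagination values on the client side, because
// $skip and $limit cannot take their values from an expression.
function buildPagePipeline(pageNumber, pageSize) {
  return [
    {"$sort": {"_id": 1}},                    // stable order for paging
    {"$skip": (pageNumber - 1) * pageSize},   // literal computed by the client
    {"$limit": pageSize},                     // literal supplied by the client
  ];
}

// Example: third page, assuming the user's preferred page size is 20.
var pipeline = buildPagePipeline(3, 20);
console.log(JSON.stringify(pipeline));
// [{"$sort":{"_id":1}},{"$skip":40},{"$limit":20}]
```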
In most cases, only one of the listed stages needs to be more expressive: the $match stage, but this stage is already flexible by being based on MQL query conditions. However, sometimes, even MQL isn't expressive enough to sufficiently define a rule to identify records to retain in an aggregation. The remainder of this chapter explores these challenges and how they are solved.
What Is Using $expr Inside $match All About?
The previously stated generalisation about $match not supporting expressions is actually inaccurate. Version 3.6 of MongoDB introduced the $expr operator, which you can embed within a $match stage (or in MQL) to leverage aggregation expressions when filtering records. Essentially, this enables MongoDB's query runtime (which executes an aggregation's $match) to reuse expressions provided by MongoDB's aggregation runtime.
Inside a $expr operator, you can include any composite expression fashioned from $ operator functions, $ field paths and $$ variables. A few situations demand the use of $expr inside a $match stage. Examples include:
- A requirement to compare two fields from the same record to determine whether to keep the record based on the comparison's outcome
- A requirement to perform a calculation based on values from multiple existing fields in each record and then compare the result of the calculation to a constant
These are impossible in an aggregation (or MQL find()) if you use regular $match query conditions.
Take the example of a collection holding information on different instances of rectangles (capturing their width and height), similar to the following:
[
{ _id: 1, width: 2, height: 8 },
{ _id: 2, width: 3, height: 4 },
{ _id: 3, width: 20, height: 1 }
]
What if you wanted to run an aggregation pipeline to only return rectangles with an area greater than 12? This comparison isn't possible in a conventional aggregation when using a single $match query condition. However, with $expr, you can analyse a combination of fields in-situ using expressions. You can implement the requirement with the following pipeline:
var pipeline = [
{"$match": {
"$expr": {"$gt": [{"$multiply": ["$width", "$height"]}, 12]},
}},
];
The result of executing an aggregation with this pipeline is:
[
{ _id: 1, width: 2, height: 8 },
{ _id: 3, width: 20, height: 1 }
]
As you can see, the second of the three shapes is not output because its area is only 12 (3 x 4), which is not greater than 12.
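The other situation listed earlier, comparing two fields from the same record, can be sketched against the same rectangles. The pipeline below is an illustrative assumption rather than an example from the book: it keeps only rectangles that are wider than they are tall. The final lines are a local plain JavaScript cross-check of which records satisfy that rule; they do not execute the pipeline.

```javascript
var rectangles = [
  {_id: 1, width: 2, height: 8},
  {_id: 2, width: 3, height: 4},
  {_id: 3, width: 20, height: 1},
];

// Compare two fields of the same record inside $match by using $expr.
var pipeline = [
  {"$match": {
    "$expr": {"$gt": ["$width", "$height"]},
  }},
];

// Local cross-check in plain JavaScript (not a MongoDB call).
var expected = rectangles.filter(function (r) { return r.width > r.height; });
console.log(expected.map(function (r) { return r._id; }));  // only _id 3 qualifies
```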
Restrictions When Using Expressions with $match
You should be aware that there are restrictions on when the runtime can benefit from an index when using a $expr operator inside a $match stage. This partly depends on the version of MongoDB you are running. With $expr, a $eq comparison can use an index, subject to some constraints, including an inability to use a multi-key index. For MongoDB versions before 5.0, if you use a "range" comparison operator ($gt, $gte, $lt and $lte), an index cannot be employed to match the field, but this works fine in version 5.0 and greater.
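A practical way to check whether a given filter benefits from an index on your MongoDB version is to compare query plans. The sketch below assumes the rectangles collection shown earlier and a hypothetical single-field index on width. The shape of the explain output differs between versions, so treat it as something to inspect (for example, looking for an index scan versus a collection scan) rather than a fixed result.

```javascript
// Two filters intended to select the same records.
var mqlPipeline = [
  {"$match": {"width": {"$gte": 3}}},
];

var exprPipeline = [
  {"$match": {"$expr": {"$gte": ["$width", 3]}}},
];

// Only runs inside the MongoDB Shell, where a global `db` exists.
if (typeof db !== "undefined") {
  db.rectangles.createIndex({"width": 1});  // hypothetical index for this sketch
  printjson(db.rectangles.explain("queryPlanner").aggregate(mqlPipeline));
  printjson(db.rectangles.explain("queryPlanner").aggregate(exprPipeline));
}
```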
There are also subtle differences when ordering values for a specific field across multiple documents when some values have different types. MongoDB's query runtime (which executes regular MQL and $match filters) and MongoDB's aggregation runtime (which implements $expr) can apply different ordering rules when filtering, referred to as "type bracketing". Consequently, a range query may not yield the same result with $expr as it does with MQL if some values have different types.
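The following sketch sets up a case where the two forms could diverge. The records and the val field are hypothetical. With the regular MQL condition, the comparison would be expected to consider only numeric values, keeping just the record with _id 2. With $expr, the comparison follows the cross-type BSON ordering, in which strings sort after numbers, so the record with _id 3 would be expected to be kept as well. Verify this against your own MongoDB version before relying on it.

```javascript
// Hypothetical records where the same field holds mixed types.
var mixedDocs = [
  {"_id": 1, "val": 3},
  {"_id": 2, "val": 8},
  {"_id": 3, "val": "eight"},
];

// Regular MQL range condition (query runtime rules).
var mqlPipeline = [
  {"$match": {"val": {"$gt": 5}}},
];

// Equivalent-looking condition using $expr (aggregation runtime rules).
var exprPipeline = [
  {"$match": {"$expr": {"$gt": ["$val", 5]}}},
];
```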
Due to the potential challenges outlined, only use a $expr operator in a $match stage if there is no other way of assembling the filter criteria using regular MQL syntax.