Definition定义
aggregatePerforms aggregation operation using the aggregation pipeline. The pipeline allows users to process data from a collection or other source with a sequence of stage-based manipulations.使用聚合管道执行聚合操作。该管道允许用户通过一系列基于阶段的操作来处理来自集合或其他来源的数据。Tip
In在mongosh, this command can also be run through thedb.aggregate()anddb.collection.aggregate()helper methods or with thewatch()helper method.mongosh中,此命令也可以通过db.aggregate()和db.collection.aggregate()辅助方法或watch()辅助方法运行。Helper methods are convenient for助手方法对mongoshusers, but they may not return the same level of information as database commands.mongosh用户来说很方便,但它们可能不会返回与数据库命令相同级别的信息。In cases where the convenience is not needed or the additional return fields are required, use the database command.如果不需要便利性或需要额外的返回字段,请使用数据库命令。
Compatibility兼容性
This command is available in deployments hosted in the following environments:此命令在以下环境中托管的部署中可用:
- MongoDB Atlas
: The fully managed service for MongoDB deployments in the cloud:云中MongoDB部署的完全托管服务
Important
This command has limited support in M0 and Flex clusters. For more information, see Unsupported Commands.此命令在M0和Flex集群中的支持有限。有关详细信息,请参阅不支持的命令。
- MongoDB
Enterprise企业版: The subscription-based, self-managed version of MongoDB:MongoDB的基于订阅的自我管理版本 - MongoDB
Community社区版: The source-available, free-to-use, and self-managed version of MongoDB:MongoDB的源代码可用、免费使用和自我管理版本
Syntax语法
Changed in version 5.0.在5.0版本中更改。
The command has the following syntax:该命令具有以下语法:
db.runCommand(
{
aggregate: "<collection>" || 1,
pipeline: [ <stage>, <...> ],
explain: <boolean>,
allowDiskUse: <boolean>,
cursor: <document>,
maxTimeMS: <int>,
bypassDocumentValidation: <boolean>,
readConcern: <document>,
collation: <document>,
hint: <string or document>,
comment: <any>,
writeConcern: <document>,
let: <document> // Added in MongoDB 5.0
}
)
Command Fields命令字段
The aggregate command takes the following fields as arguments:aggregate命令接受以下字段作为参数:
aggregate | string | 1 for collection agnostic commands.1表示与集合无关的命令。 |
pipeline | array | |
explain | boolean |
|
| boolean | allowDiskUseByDefault for a specific query. You can use this option to either:allowDiskUseByDefault。您可以使用此选项:
allowDiskUseByDefault is set to true and the server requires more than 100 megabytes of memory for a pipeline execution stage, MongoDB automatically writes temporary files to disk unless the query specifies { allowDiskUse: false }.allowDiskUseByDefault设置为true,并且服务器在管道执行阶段需要超过100兆字节的内存,MongoDB会自动将临时文件写入磁盘,除非查询指定{ allowDiskUse: false }。allowDiskUseByDefault.allowDiskUseByDefault。usedDisk indicator if any aggregation stage wrote data to temporary files due to memory restrictions.usedDisk指示符。 |
cursor | document | aggregate command with the cursor option unless the command includes the explain option.explain选项,否则必须将aggregate命令与cursor选项一起使用。
|
maxTimeMS | non-negative integer | maxTimeMS, operations will not time out. A value of 0 explicitly specifies the default unbounded behavior.maxTimeMS的值,操作将不会超时。值0明确指定默认的无界行为。db.killOp(). MongoDB only terminates an operation at one of its designated interrupt points.db.killOp()相同的机制终止超过分配时间限制的操作。MongoDB仅在其指定的中断点之一终止操作。 |
bypassDocumentValidation | boolean | $out or $merge aggregation stages.$out或$merge聚合阶段时才适用。aggregate to bypass schema validation during the operation. This lets you insert documents that do not meet the validation requirements.aggregate在操作过程中绕过架构验证。这允许您插入不符合验证要求的文档。 |
readConcern | document | readConcern option has the following syntax: readConcern: { level: <value> }readConcern选项具有以下语法:readConcern: { level: <value> }
$out stage cannot be used in conjunction with read concern "linearizable". $out阶段不能与读取关注"linearizable"结合使用。"linearizable" read concern for db.collection.aggregate(), you cannot include the $out stage in the pipeline.db.collection.aggregate()指定"linearizable"读取关注,则不能在管道中包含$out阶段。$merge stage cannot be used in conjunction with read concern "linearizable". $merge阶段不能与读取关注"linearizable"结合使用。"linearizable" read concern for db.collection.aggregate(), you cannot include the $merge stage in the pipeline.db.collection.aggregate()指定"linearizable"读取关注,则不能在管道中包含$merge阶段。 |
collation | document |
locale field is mandatory; all other collation fields are optional. For descriptions of the fields, see Collation Document.locale字段是必填的;所有其他排序字段都是可选的。有关字段的描述,请参阅排序规则文档。db.createCollection()), the operation uses the collation specified for the collection.db.createCollection()),则操作将使用为集合指定的排序规则。 |
hint | string or document | hint does not apply to $lookup and $graphLookup stages.hint不适用于$lookup和$graphLookup阶段。 |
comment | any |
aggregate command is inherited by any subsequent getMore commands run on the aggregate cursor.aggregate命令上设置的任何注释都会被在aggregate游标上运行的任何后续getMore命令继承。 |
writeConcern | document | $out or $merge stage.$out或$merge阶段一起使用的写入关注的文档。$out or $merge stage.$out或$merge阶段使用默认写入关注。 |
let | document | $$) together with your variable name in the form $$<variable_name>. $$)和变量名,格式为$$<variable_name>。$$targetTotal.$$targetTotal。$match stage, you must access the variable within the $expr operator.$match阶段使用变量筛选结果,您必须在$expr运算符中访问该变量。
let and variables, see Use Variables in let.let和变量的完整示例,请参阅在let中使用变量。 |
You must use the 除非命令中包含aggregate command with the cursor option unless the command includes the explain option.explain选项,否则必须将aggregate命令与cursor选项一起使用。
To indicate a cursor with the default batch size, specify要使用默认批处理大小指示游标,请指定cursor: {}.cursor: {}。To indicate a cursor with a non-default batch size, use要指示具有非默认批大小的游标,请使用cursor: { batchSize: <num> }.cursor: { batchSize: <num> }。
For more information about the aggregation pipeline, see:有关聚合管道的更多信息,请参阅:
Sessions会话
For cursors created inside a session, you cannot call 对于在会话内创建的游标,不能在会话外调用getMore outside the session.getMore。
Similarly, for cursors created outside of a session, you cannot call 同样,对于在会话外创建的游标,您不能在会话内调用getMore inside a session.getMore。
Session Idle Timeout会话空闲超时
MongoDB drivers and MongoDB驱动程序和mongosh associate all operations with a server session, with the exception of unacknowledged write operations. mongosh将所有操作与服务器会话相关联,但未确认的写入操作除外。For operations not explicitly associated with a session (i.e. using 对于未明确与会话关联的操作(即使用Mongo.startSession()), MongoDB drivers and mongosh create an implicit session and associate it with the operation.Mongo.startSession()),MongoDB驱动程序和mongosh会创建一个隐式会话并将其与操作关联。
If a session is idle for longer than 30 minutes, the MongoDB server marks that session as expired and may close it at any time. When the MongoDB server closes the session, it also kills any in-progress operations and open cursors associated with the session. 如果会话空闲时间超过30分钟,MongoDB服务器会将该会话标记为已过期,并可能随时将其关闭。当MongoDB服务器关闭会话时,它还会终止任何正在进行的操作并打开与会话关联的游标。This includes cursors configured with 这包括配置了noCursorTimeout() or a maxTimeMS() greater than 30 minutes.noCursorTimeout()或大于30分钟的maxTimeMS()的游标。
For operations that return a cursor, if the cursor may be idle for longer than 30 minutes, issue the operation within an explicit session using 对于返回游标的操作,如果游标可能空闲超过30分钟,请使用Mongo.startSession() and periodically refresh the session using the refreshSessions command. Mongo.startSession()在显式会话中发出该操作,并使用refreshSessions命令定期刷新会话。See Session Idle Timeout for more information.
Transactions事务
aggregate can be used inside distributed transactions.aggregate可以在分布式事务中使用。
However, the following stages are not allowed within transactions:但是,事务中不允许出现以下阶段:
$collStats$currentOp$indexStats$listLocalSessions$listSessions$merge$out$planCacheStats$unionWith
You also cannot specify the 您也不能指定explain option.explain选项。
For cursors created outside of a transaction, you cannot call对于在事务外部创建的游标,您不能在事务内部调用getMoreinside the transaction.getMore。For cursors created in a transaction, you cannot call对于在事务中创建的游标,您不能在事务外部调用getMoreoutside the transaction.getMore。
Important
In most cases, a distributed transaction incurs a greater performance cost over single document writes, and the availability of distributed transactions should not be a replacement for effective schema design. 在大多数情况下,分布式事务比单文档写入产生更大的性能成本,分布式事务的可用性不应取代有效的模式设计。For many scenarios, the denormalized data model (embedded documents and arrays) will continue to be optimal for your data and use cases. 对于许多场景,非规范化数据模型(嵌入式文档和数组)将继续是您的数据和用例的最佳选择。That is, for many scenarios, modeling your data appropriately will minimize the need for distributed transactions.也就是说,对于许多场景,适当地对数据进行建模将最大限度地减少对分布式事务的需求。
For additional transactions usage considerations (such as runtime limit and oplog size limit), see also Production Considerations.有关其他事务使用注意事项(如运行时限制和oplog大小限制),另请参阅生产注意事项。
Client Disconnection客户端断开连接
For 对于不包括aggregate operation that do not include the $out or $merge stages:$out或$merge阶段的聚合操作:
If the client that issued 如果发出aggregate disconnects before the operation completes, MongoDB marks aggregate for termination using killOp.aggregate的客户端在操作完成之前断开连接,MongoDB将使用killOp标记aggregate终止。
Query Settings查询设置
8.0版本中的新功能。New in version 8.0.版本8.0中的新功能。
You can use query settings to set index hints, set operation rejection filters, and other fields. 您可以使用查询设置来设置索引提示、设置操作拒绝筛选器和其他字段。The settings apply to the query shape on the entire cluster. 这些设置适用于整个集群上的查询形状。The cluster retains the settings after shutdown.集群在关闭后保留设置。
The query optimizer uses the query settings as an additional input during query planning, which affects the plan selected to run the query. You can also use query settings to block a query shape.查询优化器在查询规划期间使用查询设置作为额外输入,这会影响为运行查询而选择的计划。您还可以使用查询设置来阻止查询形状。
To add query settings and explore examples, see 要添加查询设置并探索示例,请参阅setQuerySettings.setQuerySettings。
You can add query settings for 您可以为find, distinct, and aggregate commands.find、distinct和aggregate命令添加查询设置。
Query settings have more functionality and are preferred over deprecated index filters.查询设置具有更多功能,并且优于已弃用的索引筛选器。
To remove query settings, use 要删除查询设置,请使用removeQuerySettings. removeQuerySettings。To obtain the query settings, use a 要获取查询设置,请在聚合管道中使用$querySettings stage in an aggregation pipeline.$querySettings阶段。
Stable 稳定API
When using Stable API V1:当使用稳定API V1:
You cannot use the following stages in an您不能在aggregatecommand:aggregate命令中使用以下阶段:Don't include the不要在explainfield in anaggregatecommand. If you do, the server returns an APIStrictError error.aggregate命令中包含explain字段。如果这样做,服务器将返回APIStrictError错误。When using the使用$collStatsstage, you can only use thecountfield. No other$collStatsfields are available.$collStats阶段时,您只能使用count字段。没有其他$collStats字段可用。
Example示例
You must use the 除非命令中包含aggregate command with the cursor option unless the command includes the explain option.explain选项,否则必须将aggregate命令与cursor选项一起使用。
To indicate a cursor with the default batch size, specify要使用默认批处理大小指示游标,请指定cursor: {}.cursor: {}。To indicate a cursor with a non-default batch size, use要指示具有非默认批大小的游标,请使用cursor: { batchSize: <num> }.cursor: { batchSize: <num> }。
Rather than run the 大多数用户应该使用aggregate command directly, most users should use the db.collection.aggregate() helper provided in mongosh or the equivalent helper in their driver. mongosh中提供的db.collection.aggregate()辅助程序或其驱动程序中的等效辅助程序,而不是直接运行aggregate命令。In 2.6 and later, the 在2.6及更高版本中,db.collection.aggregate() helper always returns a cursor.db.collection.aggregate()辅助函数总是返回一个游标。
Except for the first two examples which demonstrate the command syntax, the examples in this page use the 除了演示命令语法的前两个示例外,本页中的示例都使用db.collection.aggregate() helper.db.collection.aggregate()助手。
Aggregate Data with Multi-Stage Pipeline使用多级管道聚合数据
A collection 集合articles contains documents such as the following:articles包含以下文档:
{
_id: ObjectId("52769ea0f3dc6ead47c9a1b2"),
author: "abc123",
title: "zzz",
tags: [ "programming", "database", "mongodb" ]
}
The following example performs an 以下示例对aggregate operation on the articles collection to calculate the count of each distinct element in the tags array that appears in the collection.articles集合执aggregate操作,以计算集合中出现的tags数组中每个不同元素的计数。
db.runCommand( {
aggregate: "articles",
pipeline: [
{ $project: { tags: 1 } },
{ $unwind: "$tags" },
{ $group: { _id: "$tags", count: { $sum : 1 } } }
],
cursor: { }
} )
In 在mongosh, this operation can use the db.collection.aggregate() helper as in the following:mongosh中,此操作可以使用db.collection.aggregate()辅助函数,如下所示:
db.articles.aggregate( [
{ $project: { tags: 1 } },
{ $unwind: "$tags" },
{ $group: { _id: "$tags", count: { $sum : 1 } } }
] )Use $currentOp on an Admin Database在admin数据库上使用$currentOp
The following example runs a pipeline with two stages on the admin database. 以下示例在管理数据库上运行一个包含两个阶段的管道。The first stage runs the 第一阶段运行$currentOp operation and the second stage filters the results of that operation.$currentOp操作,第二阶段筛选该操作的结果。
db.adminCommand( {
aggregate : 1,
pipeline : [ {
$currentOp : { allUsers : true, idleConnections : true } }, {
$match : { shard : "shard01" }
}
],
cursor : { }
} )
Note
The aggregate command does not specify a collection and instead takes the form {aggregate: 1}. aggregate命令没有指定集合,而是采用{aggregate: 1}的形式。This is because the initial 这是因为初始的$currentOp stage does not draw input from a collection. $currentOp阶段不从集合中获取输入。It produces its own data that the rest of the pipeline uses.它生成自己的数据,供管道的其他部分使用。
The new 添加了新的db.aggregate() helper has been added to assist in running collectionless aggregations such as this. db.aggregate()辅助程序,以帮助运行此类无集体聚合。The above aggregation could also be run like this example.上述聚合也可以像这个例子一样运行。
Return Information on the Aggregation Operation返回聚合操作的信息
The following aggregation operation sets the optional field 以下聚合操作将可选字段explain to true to return information about the aggregation operation.explain设置为true,以返回有关聚合操作的信息。
db.orders.aggregate([
{ $match: { status: "A" } },
{ $group: { _id: "$cust_id", total: { $sum: "$amount" } } },
{ $sort: { total: -1 } }
],
{ explain: true }
)
Note
The explain output is subject to change between releases.解释输出可能会在不同版本之间发生变化。
Tip
db.collection.aggregate() method方法
Interaction with allowDiskUseByDefault与allowDiskUseByDefault的交互
allowDiskUseByDefaultStarting in MongoDB 6.0, pipeline stages that require more than 100 megabytes of memory to execute write temporary files to disk by default. These temporary files last for the duration of the pipeline execution and can influence storage space on your instance. 从MongoDB 6.0开始,默认情况下,需要超过100兆字节内存来执行将临时文件写入磁盘的流水线阶段。这些临时文件在管道执行期间持续存在,并可能影响实例上的存储空间。In earlier versions of MongoDB, you must pass 在早期版本的MongoDB中,您必须将{ allowDiskUse: true } to individual find and aggregate commands to enable this behavior.{ allowDiskUse: true }传递给单个find和aggregate命令以启用此行为。
Individual 单个find and aggregate commands can override the allowDiskUseByDefault parameter by either:find和aggregate命令可以通过以下任一方式覆盖allowDiskUseByDefault参数:
Using当{ allowDiskUse: true }to allow writing temporary files out to disk whenallowDiskUseByDefaultis set tofalseallowDiskUseByDefault设置为false时,使用{ allowDiskUse: true }允许将临时文件写入磁盘Using当{ allowDiskUse: false }to prohibit writing temporary files out to disk whenallowDiskUseByDefaultis set totrueallowDiskUseByDefault设置为true时,使用{ allowDiskUse: false }禁止将临时文件写入磁盘
The profiler log messages and diagnostic log messages includes a 如果由于内存限制,任何聚合阶段将数据写入临时文件,分析器日志消息和诊断日志消息都会包含一个usedDisk indicator if any aggregation stage wrote data to temporary files due to memory restrictions.usedDisk指示符。
Aggregate Data Specifying Batch Size指定批大小的聚合数据
To specify an initial batch size, specify the 要指定初始批大小,请在batchSize in the cursor field, as in the following example:cursor字段中指定batchSize,如下例所示:
db.orders.aggregate( [
{ $match: { status: "A" } },
{ $group: { _id: "$cust_id", total: { $sum: "$amount" } } },
{ $sort: { total: -1 } },
{ $limit: 2 }
],
{ cursor: { batchSize: 0 } }
)
The { cursor: { batchSize: 0 } } document, which specifies the size of the initial batch size, indicates an empty first batch. This batch size is useful for quickly returning a cursor or failure message without doing significant server-side work.{ cursor: { batchSize: 0 } }文档指定了初始批大小的大小,表示第一批为空。此批大小对于快速返回游标或失败消息非常有用,而无需进行大量的服务器端工作。
To specify batch size for subsequent 要为后续的getMore operations (after the initial batch), use the batchSize field when running the getMore command.getMore操作(在初始批处理之后)指定批处理大小,请在运行getMore命令时使用batchSize字段。
Specify a Collation指定排序规则
Collation allows users to specify language-specific rules for string comparison, such as rules for lettercase and accent marks.排序规则允许用户为字符串比较指定特定于语言的规则,例如字母大小写和重音标记的规则。
A collection myColl has the following documents:myColl集合有以下文件:
{ _id: 1, category: "café", status: "A" }
{ _id: 2, category: "cafe", status: "a" }
{ _id: 3, category: "cafE", status: "a" }
The following aggregation operation includes the Collation option:以下聚合操作包括Collation选项:
db.myColl.aggregate(
[ { $match: { status: "A" } }, { $group: { _id: "$category", count: { $sum: 1 } } } ],
{ collation: { locale: "fr", strength: 1 } }
);
For descriptions on the collation fields, see Collation Document.有关排序规则字段的说明,请参阅排序规则文档。
Hint an Index提示索引
Create a collection 使用以下文档创建集合foodColl with the following documents:foodColl:
db.foodColl.insertMany( [
{ _id: 1, category: "cake", type: "chocolate", qty: 10 },
{ _id: 2, category: "cake", type: "ice cream", qty: 25 },
{ _id: 3, category: "pie", type: "boston cream", qty: 20 },
{ _id: 4, category: "pie", type: "blueberry", qty: 15 }
] )
Create the following indexes:创建以下索引:
db.foodColl.createIndex( { qty: 1, type: 1 } );
db.foodColl.createIndex( { qty: 1, category: 1 } );
The following aggregation operation includes the 以下聚合操作包括强制使用指定索引的hint option to force the usage of the specified index:hint选项:
db.foodColl.aggregate(
[ { $sort: { qty: 1 }}, { $match: { category: "cake", qty: 10 } }, { $sort: { type: -1 } } ],
{ hint: { qty: 1, category: 1 } }
)Override Default Read Concern覆盖默认读取关注
To override the default read concern level, use the 要覆盖默认的读取关注级别,请使用readConcern option. readConcern选项。The getMore command uses the readConcern level specified in the originating aggregate command.getMore命令使用原始聚合命令中指定的readConcern级别。
You cannot use the 您不能将$out or the $merge stage in conjunction with read concern "linearizable". $out或$merge阶段与读取关注"linearizable"结合使用。That is, if you specify 也就是说,如果为"linearizable" read concern for db.collection.aggregate(), you cannot include either stages in the pipeline.db.collection.aggregate()指定"linearizable"读取关注,则不能在管道中包含任何阶段。
The following operation on a replica set specifies a read concern of 对副本集的以下操作指定了"majority" to read the most recent copy of the data confirmed as having been written to a majority of the nodes."majority"读取关注,以读取确认已写入大多数节点的数据的最新副本。
Important
You can specify read concern level您可以为包含"majority"for an aggregation that includes an$outstage.$out阶段的聚合指定读取关注级别"majority"。Regardless of the read concern level, the most recent data on a node may not reflect the most recent version of the data in the system.无论读取关注级别如何,节点上的最新数据可能不会反映系统中数据的最新版本。
db.restaurants.aggregate(
[ { $match: { rating: { $lt: 5 } } } ],
{ readConcern: { level: "majority" } }
)
To ensure that a single thread can read its own writes, use 为了确保单个线程可以读取自己的写入,请对副本集的主线程使用"majority" read concern and "majority" write concern against the primary of the replica set."majority"读取关注和"majority"写入关注。
Use Variables in let在let中使用变量
let5.0版本中的新功能。New in version 5.0.版本5.0中的新功能。
To define variables that you can access elsewhere in the command, use the 要定义可以在命令的其他地方访问的变量,请使用let option.let选项。
Note
Create a collection 创建一个包含蛋糕口味销售的cakeSales containing sales for cake flavors:cakeSales集合:
db.cakeSales.insertMany( [
{ _id: 1, flavor: "chocolate", salesTotal: 1580 },
{ _id: 2, flavor: "strawberry", salesTotal: 4350 },
{ _id: 3, flavor: "cherry", salesTotal: 2150 }
] )
The following example:以下示例:
retrieves the cake that has a检索salesTotalgreater than 3000, which is the cake with an_idof 2salesTotal大于3000的蛋糕,即_id为2的蛋糕defines a在targetTotalvariable inlet, which is referenced in$gtas$$targetTotallet中定义一个targetTotal变量,在$gt中引用为$$targetTotal
db.runCommand( {
aggregate: db.cakeSales.getName(),
pipeline: [
{ $match: {
$expr: { $gt: [ "$salesTotal", "$$targetTotal" ] }
} },
],
cursor: {},
let: { targetTotal: 3000 }
} )