aggregate
On this page本页内容
Definition定义
aggregate-
Performs aggregation operation using the aggregation pipeline.使用聚合管道执行聚合操作。The pipeline allows users to process data from a collection or other source with a sequence of stage-based manipulations.管道允许用户通过一系列基于阶段的操作来处理来自集合或其他源的数据。TipIn在mongosh, this command can also be run through thedb.aggregate()anddb.collection.aggregate()helper methods or with thewatch()helper method.mongosh中,此命令也可以通过db.aggregate()和db.collection.aggregate()助手方法运行,也可以与watch()助手方法一起运行。Helper methods are convenient for助手方法对mongoshusers, but they may not return the same level of information as database commands. In cases where the convenience is not needed or the additional return fields are required, use the database command.mongosh用户来说很方便,但它们可能不会返回与数据库命令相同级别的信息。如果不需要方便,或者需要额外的返回字段,请使用数据库命令。
Syntax语法
Changed in version 5.0.5.0版更改。
The command has the following syntax:该命令具有以下语法:
db.runCommand(
{
aggregate: "<collection>" || 1,
pipeline: [ <stage>, <...> ],
explain: <boolean>,
allowDiskUse: <boolean>,
cursor: <document>,
maxTimeMS: <int>,
bypassDocumentValidation: <boolean>,
readConcern: <document>,
collation: <document>,
hint: <string or document>,
comment: <any>,
writeConcern: <document>,
let: <document> // Added in MongoDB 5.0
}
)
Command Fields命令字段
The aggregate command takes the following fields as arguments:aggregate命令将以下字段作为参数:
aggregate | string | 1 for collection agnostic commands.1用于与集合无关的命令。 |
pipeline | array | |
explain | boolean | |
allowDiskUse | boolean | allowDiskUseByDefault for a specific query. You can use this option to either: allowDiskUseByDefault。您可以使用此选项执行以下操作之一:
allowDiskUseByDefault is set to true and the server requires more than 100 megabytes of memory for a pipeline execution stage, MongoDB automatically writes temporary files to disk unless the query specifies { allowDiskUse: false }.allowDiskUseByDefault设置为true,并且服务器在管道执行阶段需要超过100兆字节的内存,MongoDB会自动将临时文件写入磁盘,除非查询指定{ allowDiskUse: false }。allowDiskUseByDefault.allowDiskUseByDefault。usedDisk indicator if any aggregation stage wrote data to temporary files due to memory restrictions. usedDisk指示符。 |
cursor | document | aggregate command without the cursor option unless the command includes the explain option. Unless you include the explain option, you must specify the cursor option. cursor选项的aggregate命令的使用,除非该命令包含explain选项。除非包含explain选项,否则必须指定cursor选项。
|
maxTimeMS | non-negative integer | maxTimeMS, operations will not time out. A value of 0 explicitly specifies the default unbounded behavior.maxTimeMS的值,操作将不会超时。值0显式指定默认的无边界行为。db.killOp(). db.killOp()相同的机制终止超过指定时间限制的操作。 |
bypassDocumentValidation | boolean | $out or $merge aggregation stages.$out或$merge聚合阶段时适用。aggregate to bypass document validation during the operation. This lets you insert documents that do not meet the validation requirements. aggregate在操作期间绕过文档验证。这样可以插入不符合验证要求的文档。 |
readConcern | document | readConcern选项具有以下语法:readConcern: { level: <value> }
$out stage cannot be used in conjunction with read concern "linearizable". $out阶段不能与读取关注"linearizable"一起使用。"linearizable" read concern for db.collection.aggregate(), you cannot include the $out stage in the pipeline.db.collection.aggregate()指定"linearizable"读取关注,则不能在管道中包含$out阶段。$merge stage cannot be used in conjunction with read concern "linearizable". $merge阶段不能与读取关注"linearizable"一起使用。"linearizable" read concern for db.collection.aggregate(), you cannot include the $merge stage in the pipeline. db.collection.aggregate()指定"linearizable"读取关注,则不能在管道中包含$merge阶段。 |
collation | document | collationcollation选项具有以下语法:collation: {
locale field is mandatory; all other collation fields are optional. locale字段是必需的;所有其他排序规则字段都是可选的。db.createCollection()), the operation uses the collation specified for the collection.db.createCollection()),则操作将使用为集合指定的排序规则。 |
hint | string or document | Note |
comment | any |
Note |
writeConcern | document | $out or $merge stage.$out或$merge阶段一起使用的写入关注的文档。$out or $merge stage. $out或$merge阶段使用默认的写入关注。 |
let | document | { <variable_name_1>: <expression_1>,
$$) together with your variable name in the form $$<variable_name>. $$)和形式为$$<variable_name>的变量名。$$targetTotal. $$targetTotal。Note let and variables, see Use Variables in let. let和变量的完整示例,请参阅在let中使用变量。 |
MongoDB 3.6 removes the use of MongoDB 3.6删除了不带aggregate command without the cursor option unless the command includes the explain option. cursor选项的aggregate命令的使用,除非该命令包含explain选项。Unless you include the 除非包含explain option, you must specify the cursor option.explain选项,否则必须指定cursor选项。
To indicate a cursor with the default batch size, specify若要指示具有默认批处理大小的游标,请指定cursor: {}.cursor: {}。To indicate a cursor with a non-default batch size, use要指示具有非默认批大小的游标,请使用cursor: { batchSize: <num> }.cursor: { batchSize: <num> }。
For more information about the aggregation pipeline Aggregation Pipeline, Aggregation Reference, and Aggregation Pipeline Limits.有关聚合管道聚合管道、聚合参考和聚合管道限制的详细信息。
Sessions会话
New in version 4.0. 4.0版新增。
For cursors created inside a session, you cannot call 对于在会话内创建的游标,不能在会话外调用getMore outside the session.getMore。
Similarly, for cursors created outside of a session, you cannot call 类似地,对于在会话之外创建的游标,不能在会话内部调用getMore inside a session.getMore。
Session Idle Timeout会话空闲超时
MongoDB drivers and MongoDB驱动程序和mongosh associate all operations with a server session, with the exception of unacknowledged write operations. mongosh将所有操作与服务器会话相关联,但未确认的写入操作除外。For operations not explicitly associated with a session (i.e. using 对于未显式关联会话的操作(即使用Mongo.startSession()), MongoDB drivers and mongosh create an implicit session and associate it with the operation.Mongo.startSession()),MongoDB驱动程序和mongosh会创建一个隐式会话并将其与操作关联。
If a session is idle for longer than 30 minutes, the MongoDB server marks that session as expired and may close it at any time. 如果会话空闲时间超过30分钟,MongoDB服务器会将该会话标记为已过期,并可能随时关闭该会话。When the MongoDB server closes the session, it also kills any in-progress operations and open cursors associated with the session. 当MongoDB服务器关闭会话时,它还会杀死任何正在进行的操作和打开与会话相关的游标。This includes cursors configured with 这包括配置为noCursorTimeout() or a maxTimeMS() greater than 30 minutes.noCursorTimeout()或maxTimeMS()大于30分钟的游标。
For operations that return a cursor, if the cursor may be idle for longer than 30 minutes, issue the operation within an explicit session using 对于返回游标的操作,如果游标空闲时间可能超过30分钟,请使用Mongo.startSession() and periodically refresh the session using the refreshSessions command. See Session Idle Timeout for more information.Mongo.startSession()在显式会话中发出操作,并使用refreshSessions命令定期刷新会话。有关详细信息,请参阅会话空闲超时。
Transactions事务
aggregate can be used inside multi-document transactions.可以在多文档事务中使用。
However, the following stages are not allowed within transactions:但是,事务中不允许出现以下阶段:
You also cannot specify the 您也不能指定explain option.explain选项。
For cursors created outside of a transaction, you cannot call对于在事务外部创建的游标,不能在事务内部调用getMoreinside the transaction.getMore。For cursors created in a transaction, you cannot call对于在事务中创建的游标,不能在事务外调用getMoreoutside the transaction.getMore。
In most cases, multi-document transaction incurs a greater performance cost over single document writes, and the availability of multi-document transactions should not be a replacement for effective schema design. 在大多数情况下,与单文档写入相比,多文档事务会产生更高的性能成本,并且多文档事务的可用性不应取代有效的模式设计。For many scenarios, the denormalized data model (embedded documents and arrays) will continue to be optimal for your data and use cases. 对于许多场景,非规范化数据模型(嵌入文档和数组)将继续是您的数据和用例的最佳选择。That is, for many scenarios, modeling your data appropriately will minimize the need for multi-document transactions.也就是说,对于许多场景,对数据进行适当建模将最大限度地减少对多文档事务的需求。
For additional transactions usage considerations (such as runtime limit and oplog size limit), see also Production Considerations.有关其他事务使用注意事项(如运行时限制和操作日志大小限制),请参阅生产注意事项。
Client Disconnection客户端断开连接
For 对于不包括aggregate operation that do not include the $out or $merge stages:$out或$merge阶段的aggregate操作:
Starting in MongoDB 4.2, if the client that issued 从MongoDB 4.2开始,如果在操作完成之前发出aggregate disconnects before the operation completes, MongoDB marks aggregate for termination using killOp.aggregate的客户端断开连接,MongoDB会使用killOp标记aggregate以终止。
Stable API
When using Stable API V1:当使用稳定API V1:
You cannot use the following stages in an不能在aggregatecommand:aggregate命令中使用以下阶段:Don't include the不要在explainfield in anaggregatecommand. If you do, the server returns an APIStrictError error.aggregate命令中包含explain字段。如果执行此操作,服务器将返回APIStrictError错误。When using the使用$collStatsstage, you can only use thecountfield.$collStats阶段时,只能使用count字段。No other没有其他$collStatsfields are available.$collStats字段可用。
Example实例
MongoDB 3.6 removes the use of MongoDB 3.6删除了不带aggregate command without the cursor option unless the command includes the explain option. cursor选项的aggregate命令的使用,除非该命令包含explain选项。Unless you include the 除非包含explain option, you must specify the cursor option.explain选项,否则必须指定cursor选项。
To indicate a cursor with the default batch size, specify若要指示具有默认批处理大小的游标,请指定cursor: {}.cursor: {}。To indicate a cursor with a non-default batch size, use要指示具有非默认批大小的游标,请使用cursor: { batchSize: <num> }.cursor: { batchSize: <num> }。
Rather than run the 大多数用户不应该直接运行aggregate command directly, most users should use the db.collection.aggregate() helper provided in mongosh or the equivalent helper in their driver. aggregate命令,而应该使用mongosh中提供的db.collection.aggregate()助手或驱动程序中的等效助手。In 2.6 and later, the 在2.6及更高版本中,db.collection.aggregate() helper always returns a cursor.db.collection.aggregate()助手始终返回一个游标。
Except for the first two examples which demonstrate the command syntax, the examples in this page use the 除了前两个示例演示了命令语法外,本页中的示例都使用了db.collection.aggregate() helper.db.collection.aggregate()助手。
Aggregate Data with Multi-Stage Pipeline使用多级管道聚合数据
A collection articles contains documents such as the following:articles文章包含以下文档:
{
_id: ObjectId("52769ea0f3dc6ead47c9a1b2"),
author: "abc123",
title: "zzz",
tags: [ "programming", "database", "mongodb" ]
}
The following example performs an 以下示例对aggregate operation on the articles collection to calculate the count of each distinct element in the tags array that appears in the collection.articles集合执行aggregate操作,以计算集合中出现的tags数组中每个不同元素的计数。
db.runCommand( {
aggregate: "articles",
pipeline: [
{ $project: { tags: 1 } },
{ $unwind: "$tags" },
{ $group: { _id: "$tags", count: { $sum : 1 } } }
],
cursor: { }
} )
In 在mongosh, this operation can use the db.collection.aggregate() helper as in the following:mongosh中,此操作可以使用db.collection.aggregate()助手,如下所示:
db.articles.aggregate( [
{ $project: { tags: 1 } },
{ $unwind: "$tags" },
{ $group: { _id: "$tags", count: { $sum : 1 } } }
] )
Use $currentOp on an Admin Database在admin数据库上使用$currentOp
The following example runs a pipeline with two stages on the admin database. 以下示例在管理数据库上运行一个包含两个阶段的管道。The first stage runs the 第一阶段运行$currentOp operation and the second stage filters the results of that operation.$currentOp操作,第二阶段筛选该操作的结果。
db.adminCommand( {
aggregate : 1,
pipeline : [ {
$currentOp : { allUsers : true, idleConnections : true } }, {
$match : { shard : "shard01" }
}
],
cursor : { }
} )
The aggregate command does not specify a collection and instead takes the form {aggregate: 1}. aggregate命令未指定集合,而是采用{aggregate: 1}的形式。This is because the initial 这是因为最初的$currentOp stage does not draw input from a collection. It produces its own data that the rest of the pipeline uses.$currentOp阶段不会从集合中提取输入。它生成自己的数据,供管道的其他部分使用。
The new 添加了新的db.aggregate() helper has been added to assist in running collectionless aggregations such as this. The above aggregation could also be run like this example.db.aggregate()帮助程序来帮助运行这样的无集合聚合。上面的聚合也可以像这个例子一样运行。
Return Information on the Aggregation Operation返回有关聚合操作的信息
The following aggregation operation sets the optional field 以下聚合操作将可选字段explain to true to return information about the aggregation operation.explain设置为true,以返回有关聚合操作的信息。
db.orders.aggregate([
{ $match: { status: "A" } },
{ $group: { _id: "$cust_id", total: { $sum: "$amount" } } },
{ $sort: { total: -1 } }
],
{ explain: true }
)
The explain output is subject to change between releases.解释输出可能会在不同版本之间发生变化。
See also: 另请参阅:
db.collection.aggregate() method方法
Interaction with allowDiskUseByDefault与allowDiskUseByDefault的交互
allowDiskUseByDefaultStarting in MongoDB 6.0, pipeline stages that require more than 100 megabytes of memory to execute write temporary files to disk by default. 从MongoDB 6.0开始,需要超过100MB内存才能执行的管道阶段默认情况下会将临时文件写入磁盘。In earlier verisons of MongoDB, you must pass 在MongoDB的早期版本中,必须将{ allowDiskUse: true } to individual find and aggregate commands to enable this behavior.{ allowDiskUse: true }传递给单个find和aggregate命令才能启用此行为。
Individual 单独的find and aggregate commands may override the allowDiskUseByDefault parameter by either:find和aggregate命令可以通过以下方式覆盖allowDiskUseByDefault参数:
Using当{ allowDiskUse: true }to allow writing temporary files out to disk whenallowDiskUseByDefaultis set tofalseallowDiskUseByDefault设置为false时,使用{ allowDiskUse: true }允许将临时文件写入磁盘Using当{ allowDiskUse: false }to prohibit writing temporary files out to disk whenallowDiskUseByDefaultis set totrueallowDiskUseByDefault设置为true时,使用{ allowDiskUse: false }禁止将临时文件写入磁盘
Starting in MongoDB 4.2, the profiler log messages and diagnostic log messages includes a 从MongoDB 4.2开始,如果任何聚合阶段由于内存限制将数据写入临时文件,探查器日志消息和诊断日志消息都会包含一个usedDisk indicator if any aggregation stage wrote data to temporary files due to memory restrictions.usedDisk指示符。
See also: 另请参阅:
Aggregate Data Specifying Batch Size指定批量大小的聚合数据
To specify an initial batch size, specify the 要指定初始批大小,请在batchSize in the cursor field, as in the following example:cursor字段中指定batchSize,如下例所示:
db.orders.aggregate( [
{ $match: { status: "A" } },
{ $group: { _id: "$cust_id", total: { $sum: "$amount" } } },
{ $sort: { total: -1 } },
{ $limit: 2 }
],
{ cursor: { batchSize: 0 } }
)
The 指定初始批大小的{ cursor: { batchSize: 0 } } document, which specifies the size of the initial batch size, indicates an empty first batch. { cursor: { batchSize: 0 } }文档表示第一批为空。This batch size is useful for quickly returning a cursor or failure message without doing significant server-side work.这个批量大小对于快速返回游标或失败消息非常有用,而不需要做大量的服务器端工作。
To specify batch size for subsequent 要为后续的getMore operations (after the initial batch), use the batchSize field when running the getMore command.getMore操作指定批处理大小(在初始批处理之后),请在运行getMore命令时使用batchSize字段。
Specify a Collation指定排序规则
collation allows users to specify language-specific rules for string comparison, such as rules for lettercase and accent marks.允许用户为字符串比较指定特定于语言的规则,例如字母大小写和重音标记的规则。
A collection 集合myColl has the following documents:myColl包含以下文档:
{ _id: 1, category: "café", status: "A" }
{ _id: 2, category: "cafe", status: "a" }
{ _id: 3, category: "cafE", status: "a" }
The following aggregation operation includes the Collation option:以下聚合操作包括collation选项:
db.myColl.aggregate(
[ { $match: { status: "A" } }, { $group: { _id: "$category", count: { $sum: 1 } } } ],
{ collation: { locale: "fr", strength: 1 } }
);
For descriptions on the collation fields, see Collation Document.有关排序规则字段的说明,请参阅排序规则文档。
Hint an Index提示索引
Create a collection 使用以下文档创建一个集合foodColl with the following documents:foodColl:
db.foodColl.insertMany( [
{ _id: 1, category: "cake", type: "chocolate", qty: 10 },
{ _id: 2, category: "cake", type: "ice cream", qty: 25 },
{ _id: 3, category: "pie", type: "boston cream", qty: 20 },
{ _id: 4, category: "pie", type: "blueberry", qty: 15 }
] )
Create the following indexes:创建以下索引:
db.foodColl.createIndex( { qty: 1, type: 1 } );
db.foodColl.createIndex( { qty: 1, category: 1 } );
The following aggregation operation includes the 以下聚合操作包括用于强制使用指定索引的hint option to force the usage of the specified index:hint选项:
db.foodColl.aggregate(
[ { $sort: { qty: 1 }}, { $match: { category: "cake", qty: 10 } }, { $sort: { type: -1 } } ],
{ hint: { qty: 1, category: 1 } }
)
Override Default Read Concern覆盖默认读取关注
To override the default read concern level, use the 要覆盖默认的读取关注级别,请使用readConcern option. readConcern选项。The getMore command uses the readConcern level specified in the originating aggregate command.getMore命令使用原始聚合命令中指定的readConcern级别。
You cannot use the 不能将$out or the $merge stage in conjunction with read concern "linearizable". $out或$merge阶段与读取关注"linearizable"一起使用。That is, if you specify 也就是说,如果为"linearizable" read concern for db.collection.aggregate(), you cannot include either stages in the pipeline.db.collection.aggregate()指定了"linearizable"读取关注,则不能在管道中包含这两个阶段。
The following operation on a replica set specifies a read concern of 以下对副本集的操作指定了"majority" to read the most recent copy of the data confirmed as having been written to a majority of the nodes."majority"的读取关注,以读取已确认写入大多数节点的数据的最新副本。
Starting in MongoDB 4.2, you can specify read concern level从MongoDB 4.2开始,您可以为包含"majority"for an aggregation that includes an$outstage.$out阶段的聚合指定读取关注级别"majority"。Regardless of the read concern level, the most recent data on a node may not reflect the most recent version of the data in the system.无论读取关注级别如何,节点上的最新数据都可能无法反映系统中数据的最新版本。
db.restaurants.aggregate(
[ { $match: { rating: { $lt: 5 } } } ],
{ readConcern: { level: "majority" } }
)
To ensure that a single thread can read its own writes, use 要确保单个线程可以读取自己的写入,请对副本集的主线程使用"majority" read concern and "majority" write concern against the primary of the replica set."majority"读取关注和"majority"写入关注。
Use Variables in let在let中使用变量
letNew in version 5.0. 5.0版新增。
To define variables that you can access elsewhere in the command, use the let option.要定义可以在命令的其他地方访问的变量,请使用let选项。
Create a collection 创建一个包含蛋糕口味销售的集合cakeSales containing sales for cake flavors:cakeSales:
db.cakeSales.insertMany( [
{ _id: 1, flavor: "chocolate", salesTotal: 1580 },
{ _id: 2, flavor: "strawberry", salesTotal: 4350 },
{ _id: 3, flavor: "cherry", salesTotal: 2150 }
] )
The following example:以下示例:
retrieves the cake that has a检索salesTotalgreater than 3000, which is the cake with an_idof 2salesTotal大于3000的蛋糕,即_id为2的蛋糕defines a在targetTotalvariable inlet, which is referenced in$gtas$$targetTotallet中定义targetTotal变量,该变量在$gt中被引用为$$targetTotal
db.runCommand( {
aggregate: db.cakeSales.getName(),
pipeline: [
{ $match: {
$expr: { $gt: [ "$salesTotal", "$$targetTotal" ] }
} },
],
cursor: {},
let: { targetTotal: 3000 }
} )