aggregate
On this page本页内容
Definition定义
aggregate
-
Performs aggregation operation using the aggregation pipeline.使用聚合管道执行聚合操作。The pipeline allows users to process data from a collection or other source with a sequence of stage-based manipulations.管道允许用户通过一系列基于阶段的操作来处理来自集合或其他源的数据。TipIn在mongosh
, this command can also be run through thedb.aggregate()
anddb.collection.aggregate()
helper methods or with thewatch()
helper method.mongosh
中,此命令也可以通过db.aggregate()
和db.collection.aggregate()
助手方法运行,也可以与watch()
助手方法一起运行。Helper methods are convenient for助手方法对mongosh
users, but they may not return the same level of information as database commands. In cases where the convenience is not needed or the additional return fields are required, use the database command.mongosh
用户来说很方便,但它们可能不会返回与数据库命令相同级别的信息。如果不需要方便,或者需要额外的返回字段,请使用数据库命令。
Syntax语法
Changed in version 5.0.5.0版更改。
The command has the following syntax:该命令具有以下语法:
db.runCommand(
{
aggregate: "<collection>" || 1,
pipeline: [ <stage>, <...> ],
explain: <boolean>,
allowDiskUse: <boolean>,
cursor: <document>,
maxTimeMS: <int>,
bypassDocumentValidation: <boolean>,
readConcern: <document>,
collation: <document>,
hint: <string or document>,
comment: <any>,
writeConcern: <document>,
let: <document> // Added in MongoDB 5.0
}
)
Command Fields命令字段
The aggregate
command takes the following fields as arguments:aggregate
命令将以下字段作为参数:
aggregate | string | 1 for collection agnostic commands.1 用于与集合无关的命令。 |
pipeline | array | |
explain | boolean | |
allowDiskUse | boolean | allowDiskUseByDefault for a specific query. You can use this option to either: allowDiskUseByDefault 。您可以使用此选项执行以下操作之一:
allowDiskUseByDefault is set to true and the server requires more than 100 megabytes of memory for a pipeline execution stage, MongoDB automatically writes temporary files to disk unless the query specifies { allowDiskUse: false } .allowDiskUseByDefault 设置为true ,并且服务器在管道执行阶段需要超过100兆字节的内存,MongoDB会自动将临时文件写入磁盘,除非查询指定{ allowDiskUse: false } 。allowDiskUseByDefault .allowDiskUseByDefault 。usedDisk indicator if any aggregation stage wrote data to temporary files due to memory restrictions. usedDisk 指示符。 |
cursor | document | aggregate command without the cursor option unless the command includes the explain option. Unless you include the explain option, you must specify the cursor option. cursor 选项的aggregate 命令的使用,除非该命令包含explain 选项。除非包含explain 选项,否则必须指定cursor 选项。
|
maxTimeMS | non-negative integer | maxTimeMS , operations will not time out. A value of 0 explicitly specifies the default unbounded behavior.maxTimeMS 的值,操作将不会超时。值0 显式指定默认的无边界行为。db.killOp() . db.killOp() 相同的机制终止超过指定时间限制的操作。 |
bypassDocumentValidation | boolean | $out or $merge aggregation stages.$out 或$merge 聚合阶段时适用。aggregate to bypass document validation during the operation. This lets you insert documents that do not meet the validation requirements. aggregate 在操作期间绕过文档验证。这样可以插入不符合验证要求的文档。 |
readConcern | document | readConcern 选项具有以下语法:readConcern: { level: <value> }
$out stage cannot be used in conjunction with read concern "linearizable" . $out 阶段不能与读取关注"linearizable" 一起使用。"linearizable" read concern for db.collection.aggregate() , you cannot include the $out stage in the pipeline.db.collection.aggregate() 指定"linearizable" 读取关注,则不能在管道中包含$out 阶段。$merge stage cannot be used in conjunction with read concern "linearizable" . $merge 阶段不能与读取关注"linearizable" 一起使用。"linearizable" read concern for db.collection.aggregate() , you cannot include the $merge stage in the pipeline. db.collection.aggregate() 指定"linearizable" 读取关注,则不能在管道中包含$merge 阶段。 |
collation | document | collation collation 选项具有以下语法:collation: { locale field is mandatory; all other collation fields are optional. locale 字段是必需的;所有其他排序规则字段都是可选的。db.createCollection() ), the operation uses the collation specified for the collection.db.createCollection() ),则操作将使用为集合指定的排序规则。 |
hint | string or document | Note |
comment | any |
Note |
writeConcern | document | $out or $merge stage.$out 或$merge 阶段一起使用的写入关注的文档。$out or $merge stage. $out 或$merge 阶段使用默认的写入关注。 |
let | document | { <variable_name_1>: <expression_1>, $$ ) together with your variable name in the form $$<variable_name> . $$ )和形式为$$<variable_name> 的变量名。$$targetTotal . $$targetTotal 。Note let and variables, see Use Variables in let . let 和变量的完整示例,请参阅在let 中使用变量。 |
MongoDB 3.6 removes the use of MongoDB 3.6删除了不带aggregate
command without the cursor
option unless the command includes the explain
option. cursor
选项的aggregate
命令的使用,除非该命令包含explain
选项。Unless you include the 除非包含explain
option, you must specify the cursor option.explain
选项,否则必须指定cursor
选项。
To indicate a cursor with the default batch size, specify若要指示具有默认批处理大小的游标,请指定cursor: {}
.cursor: {}
。To indicate a cursor with a non-default batch size, use要指示具有非默认批大小的游标,请使用cursor: { batchSize: <num> }
.cursor: { batchSize: <num> }
。
For more information about the aggregation pipeline Aggregation Pipeline, Aggregation Reference, and Aggregation Pipeline Limits.有关聚合管道聚合管道、聚合参考和聚合管道限制的详细信息。
Sessions会话
New in version 4.0. 4.0版新增。
For cursors created inside a session, you cannot call 对于在会话内创建的游标,不能在会话外调用getMore
outside the session.getMore
。
Similarly, for cursors created outside of a session, you cannot call 类似地,对于在会话之外创建的游标,不能在会话内部调用getMore
inside a session.getMore
。
Session Idle Timeout会话空闲超时
MongoDB drivers and MongoDB驱动程序和mongosh
associate all operations with a server session, with the exception of unacknowledged write operations. mongosh
将所有操作与服务器会话相关联,但未确认的写入操作除外。For operations not explicitly associated with a session (i.e. using 对于未显式关联会话的操作(即使用Mongo.startSession()
), MongoDB drivers and mongosh
create an implicit session and associate it with the operation.Mongo.startSession()
),MongoDB驱动程序和mongosh
会创建一个隐式会话并将其与操作关联。
If a session is idle for longer than 30 minutes, the MongoDB server marks that session as expired and may close it at any time. 如果会话空闲时间超过30分钟,MongoDB服务器会将该会话标记为已过期,并可能随时关闭该会话。When the MongoDB server closes the session, it also kills any in-progress operations and open cursors associated with the session. 当MongoDB服务器关闭会话时,它还会杀死任何正在进行的操作和打开与会话相关的游标。This includes cursors configured with 这包括配置为noCursorTimeout()
or a maxTimeMS()
greater than 30 minutes.noCursorTimeout()
或maxTimeMS()
大于30分钟的游标。
For operations that return a cursor, if the cursor may be idle for longer than 30 minutes, issue the operation within an explicit session using 对于返回游标的操作,如果游标空闲时间可能超过30分钟,请使用Mongo.startSession()
and periodically refresh the session using the refreshSessions
command. See Session Idle Timeout for more information.Mongo.startSession()
在显式会话中发出操作,并使用refreshSessions
命令定期刷新会话。有关详细信息,请参阅会话空闲超时。
Transactions事务
aggregate
can be used inside multi-document transactions.可以在多文档事务中使用。
However, the following stages are not allowed within transactions:但是,事务中不允许出现以下阶段:
You also cannot specify the 您也不能指定explain
option.explain
选项。
For cursors created outside of a transaction, you cannot call对于在事务外部创建的游标,不能在事务内部调用getMore
inside the transaction.getMore
。For cursors created in a transaction, you cannot call对于在事务中创建的游标,不能在事务外调用getMore
outside the transaction.getMore
。
In most cases, multi-document transaction incurs a greater performance cost over single document writes, and the availability of multi-document transactions should not be a replacement for effective schema design. 在大多数情况下,与单文档写入相比,多文档事务会产生更高的性能成本,并且多文档事务的可用性不应取代有效的模式设计。For many scenarios, the denormalized data model (embedded documents and arrays) will continue to be optimal for your data and use cases. 对于许多场景,非规范化数据模型(嵌入文档和数组)将继续是您的数据和用例的最佳选择。That is, for many scenarios, modeling your data appropriately will minimize the need for multi-document transactions.也就是说,对于许多场景,对数据进行适当建模将最大限度地减少对多文档事务的需求。
For additional transactions usage considerations (such as runtime limit and oplog size limit), see also Production Considerations.有关其他事务使用注意事项(如运行时限制和操作日志大小限制),请参阅生产注意事项。
Client Disconnection客户端断开连接
For 对于不包括aggregate
operation that do not include the $out
or $merge
stages:$out
或$merge
阶段的aggregate
操作:
Starting in MongoDB 4.2, if the client that issued 从MongoDB 4.2开始,如果在操作完成之前发出aggregate
disconnects before the operation completes, MongoDB marks aggregate
for termination using killOp
.aggregate
的客户端断开连接,MongoDB会使用killOp
标记aggregate
以终止。
Stable API
When using Stable API V1:当使用稳定API V1:
You cannot use the following stages in an不能在aggregate
command:aggregate
命令中使用以下阶段:Don't include the不要在explain
field in anaggregate
command. If you do, the server returns an APIStrictError error.aggregate
命令中包含explain
字段。如果执行此操作,服务器将返回APIStrictError错误。When using the使用$collStats
stage, you can only use thecount
field.$collStats
阶段时,只能使用count字段。No other没有其他$collStats
fields are available.$collStats
字段可用。
Example实例
MongoDB 3.6 removes the use of MongoDB 3.6删除了不带aggregate
command without the cursor
option unless the command includes the explain
option. cursor
选项的aggregate
命令的使用,除非该命令包含explain
选项。Unless you include the 除非包含explain
option, you must specify the cursor option.explain
选项,否则必须指定cursor
选项。
To indicate a cursor with the default batch size, specify若要指示具有默认批处理大小的游标,请指定cursor: {}
.cursor: {}
。To indicate a cursor with a non-default batch size, use要指示具有非默认批大小的游标,请使用cursor: { batchSize: <num> }
.cursor: { batchSize: <num> }
。
Rather than run the 大多数用户不应该直接运行aggregate
command directly, most users should use the db.collection.aggregate()
helper provided in mongosh
or the equivalent helper in their driver. aggregate
命令,而应该使用mongosh
中提供的db.collection.aggregate()
助手或驱动程序中的等效助手。In 2.6 and later, the 在2.6及更高版本中,db.collection.aggregate()
helper always returns a cursor.db.collection.aggregate()
助手始终返回一个游标。
Except for the first two examples which demonstrate the command syntax, the examples in this page use the 除了前两个示例演示了命令语法外,本页中的示例都使用了db.collection.aggregate()
helper.db.collection.aggregate()
助手。
Aggregate Data with Multi-Stage Pipeline使用多级管道聚合数据
A collection articles
contains documents such as the following:articles
文章包含以下文档:
{
_id: ObjectId("52769ea0f3dc6ead47c9a1b2"),
author: "abc123",
title: "zzz",
tags: [ "programming", "database", "mongodb" ]
}
The following example performs an 以下示例对aggregate
operation on the articles
collection to calculate the count of each distinct element in the tags
array that appears in the collection.articles
集合执行aggregate
操作,以计算集合中出现的tags
数组中每个不同元素的计数。
db.runCommand( {
aggregate: "articles",
pipeline: [
{ $project: { tags: 1 } },
{ $unwind: "$tags" },
{ $group: { _id: "$tags", count: { $sum : 1 } } }
],
cursor: { }
} )
In 在mongosh
, this operation can use the db.collection.aggregate()
helper as in the following:mongosh
中,此操作可以使用db.collection.aggregate()
助手,如下所示:
db.articles.aggregate( [
{ $project: { tags: 1 } },
{ $unwind: "$tags" },
{ $group: { _id: "$tags", count: { $sum : 1 } } }
] )
Use $currentOp on an Admin Database在admin
数据库上使用$currentOp
The following example runs a pipeline with two stages on the admin database. 以下示例在管理数据库上运行一个包含两个阶段的管道。The first stage runs the 第一阶段运行$currentOp
operation and the second stage filters the results of that operation.$currentOp
操作,第二阶段筛选该操作的结果。
db.adminCommand( {
aggregate : 1,
pipeline : [ {
$currentOp : { allUsers : true, idleConnections : true } }, {
$match : { shard : "shard01" }
}
],
cursor : { }
} )
The aggregate
command does not specify a collection and instead takes the form {aggregate: 1}
. aggregate
命令未指定集合,而是采用{aggregate: 1}
的形式。This is because the initial 这是因为最初的$currentOp
stage does not draw input from a collection. It produces its own data that the rest of the pipeline uses.$currentOp
阶段不会从集合中提取输入。它生成自己的数据,供管道的其他部分使用。
The new 添加了新的db.aggregate()
helper has been added to assist in running collectionless aggregations such as this. The above aggregation could also be run like this example.db.aggregate()
帮助程序来帮助运行这样的无集合聚合。上面的聚合也可以像这个例子一样运行。
Return Information on the Aggregation Operation返回有关聚合操作的信息
The following aggregation operation sets the optional field 以下聚合操作将可选字段explain
to true
to return information about the aggregation operation.explain
设置为true
,以返回有关聚合操作的信息。
db.orders.aggregate([
{ $match: { status: "A" } },
{ $group: { _id: "$cust_id", total: { $sum: "$amount" } } },
{ $sort: { total: -1 } }
],
{ explain: true }
)
The explain output is subject to change between releases.解释输出可能会在不同版本之间发生变化。
See also: 另请参阅:
db.collection.aggregate()
method方法
Interaction with allowDiskUseByDefault
与allowDiskUseByDefault
的交互
allowDiskUseByDefault
Starting in MongoDB 6.0, pipeline stages that require more than 100 megabytes of memory to execute write temporary files to disk by default. 从MongoDB 6.0开始,需要超过100MB内存才能执行的管道阶段默认情况下会将临时文件写入磁盘。In earlier verisons of MongoDB, you must pass 在MongoDB的早期版本中,必须将{ allowDiskUse: true }
to individual find
and aggregate
commands to enable this behavior.{ allowDiskUse: true }
传递给单个find
和aggregate
命令才能启用此行为。
Individual 单独的find
and aggregate
commands may override the allowDiskUseByDefault
parameter by either:find
和aggregate
命令可以通过以下方式覆盖allowDiskUseByDefault
参数:
Using当{ allowDiskUse: true }
to allow writing temporary files out to disk whenallowDiskUseByDefault
is set tofalse
allowDiskUseByDefault
设置为false
时,使用{ allowDiskUse: true }
允许将临时文件写入磁盘Using当{ allowDiskUse: false }
to prohibit writing temporary files out to disk whenallowDiskUseByDefault
is set totrue
allowDiskUseByDefault
设置为true
时,使用{ allowDiskUse: false }
禁止将临时文件写入磁盘
Starting in MongoDB 4.2, the profiler log messages and diagnostic log messages includes a 从MongoDB 4.2开始,如果任何聚合阶段由于内存限制将数据写入临时文件,探查器日志消息和诊断日志消息都会包含一个usedDisk
indicator if any aggregation stage wrote data to temporary files due to memory restrictions.usedDisk
指示符。
See also: 另请参阅:
Aggregate Data Specifying Batch Size指定批量大小的聚合数据
To specify an initial batch size, specify the 要指定初始批大小,请在batchSize
in the cursor
field, as in the following example:cursor
字段中指定batchSize
,如下例所示:
db.orders.aggregate( [
{ $match: { status: "A" } },
{ $group: { _id: "$cust_id", total: { $sum: "$amount" } } },
{ $sort: { total: -1 } },
{ $limit: 2 }
],
{ cursor: { batchSize: 0 } }
)
The 指定初始批大小的{ cursor: { batchSize: 0 } }
document, which specifies the size of the initial batch size, indicates an empty first batch. { cursor: { batchSize: 0 } }
文档表示第一批为空。This batch size is useful for quickly returning a cursor or failure message without doing significant server-side work.这个批量大小对于快速返回游标或失败消息非常有用,而不需要做大量的服务器端工作。
To specify batch size for subsequent 要为后续的getMore
operations (after the initial batch), use the batchSize
field when running the getMore
command.getMore
操作指定批处理大小(在初始批处理之后),请在运行getMore
命令时使用batchSize
字段。
Specify a Collation指定排序规则
collation
allows users to specify language-specific rules for string comparison, such as rules for lettercase and accent marks.允许用户为字符串比较指定特定于语言的规则,例如字母大小写和重音标记的规则。
A collection 集合myColl
has the following documents:myColl
包含以下文档:
{ _id: 1, category: "café", status: "A" }
{ _id: 2, category: "cafe", status: "a" }
{ _id: 3, category: "cafE", status: "a" }
The following aggregation operation includes the Collation option:以下聚合操作包括collation
选项:
db.myColl.aggregate(
[ { $match: { status: "A" } }, { $group: { _id: "$category", count: { $sum: 1 } } } ],
{ collation: { locale: "fr", strength: 1 } }
);
For descriptions on the collation fields, see Collation Document.有关排序规则字段的说明,请参阅排序规则文档。
Hint an Index提示索引
Create a collection 使用以下文档创建一个集合foodColl
with the following documents:foodColl
:
db.foodColl.insertMany( [
{ _id: 1, category: "cake", type: "chocolate", qty: 10 },
{ _id: 2, category: "cake", type: "ice cream", qty: 25 },
{ _id: 3, category: "pie", type: "boston cream", qty: 20 },
{ _id: 4, category: "pie", type: "blueberry", qty: 15 }
] )
Create the following indexes:创建以下索引:
db.foodColl.createIndex( { qty: 1, type: 1 } );
db.foodColl.createIndex( { qty: 1, category: 1 } );
The following aggregation operation includes the 以下聚合操作包括用于强制使用指定索引的hint
option to force the usage of the specified index:hint
选项:
db.foodColl.aggregate(
[ { $sort: { qty: 1 }}, { $match: { category: "cake", qty: 10 } }, { $sort: { type: -1 } } ],
{ hint: { qty: 1, category: 1 } }
)
Override Default Read Concern覆盖默认读取关注
To override the default read concern level, use the 要覆盖默认的读取关注级别,请使用readConcern
option. readConcern
选项。The getMore
command uses the readConcern
level specified in the originating aggregate
command.getMore
命令使用原始聚合命令中指定的readConcern
级别。
You cannot use the 不能将$out
or the $merge
stage in conjunction with read concern "linearizable"
. $out
或$merge
阶段与读取关注"linearizable"
一起使用。That is, if you specify 也就是说,如果为"linearizable"
read concern for db.collection.aggregate()
, you cannot include either stages in the pipeline.db.collection.aggregate()
指定了"linearizable"
读取关注,则不能在管道中包含这两个阶段。
The following operation on a replica set specifies a read concern of 以下对副本集的操作指定了"majority"
to read the most recent copy of the data confirmed as having been written to a majority of the nodes."majority"
的读取关注,以读取已确认写入大多数节点的数据的最新副本。
Starting in MongoDB 4.2, you can specify read concern level从MongoDB 4.2开始,您可以为包含"majority"
for an aggregation that includes an$out
stage.$out
阶段的聚合指定读取关注级别"majority"
。Regardless of the read concern level, the most recent data on a node may not reflect the most recent version of the data in the system.无论读取关注级别如何,节点上的最新数据都可能无法反映系统中数据的最新版本。
db.restaurants.aggregate(
[ { $match: { rating: { $lt: 5 } } } ],
{ readConcern: { level: "majority" } }
)
To ensure that a single thread can read its own writes, use 要确保单个线程可以读取自己的写入,请对副本集的主线程使用"majority"
read concern and "majority"
write concern against the primary of the replica set."majority"
读取关注和"majority"
写入关注。
Use Variables in let
在let
中使用变量
let
New in version 5.0. 5.0版新增。
To define variables that you can access elsewhere in the command, use the let option.要定义可以在命令的其他地方访问的变量,请使用let
选项。
Create a collection 创建一个包含蛋糕口味销售的集合cakeSales
containing sales for cake flavors:cakeSales
:
db.cakeSales.insertMany( [
{ _id: 1, flavor: "chocolate", salesTotal: 1580 },
{ _id: 2, flavor: "strawberry", salesTotal: 4350 },
{ _id: 3, flavor: "cherry", salesTotal: 2150 }
] )
The following example:以下示例:
retrieves the cake that has a检索salesTotal
greater than 3000, which is the cake with an_id
of 2salesTotal
大于3000的蛋糕,即_id
为2的蛋糕defines a在targetTotal
variable inlet
, which is referenced in$gt
as$$targetTotal
let
中定义targetTotal
变量,该变量在$gt
中被引用为$$targetTotal
db.runCommand( {
aggregate: db.cakeSales.getName(),
pipeline: [
{ $match: {
$expr: { $gt: [ "$salesTotal", "$$targetTotal" ] }
} },
],
cursor: {},
let: { targetTotal: 3000 }
} )