On this page本页内容
$group
Groups input documents by the specified 按指定的_id expression and for each distinct grouping, outputs a document. _id表达式对输入文档进行分组,并为每个不同的分组输出一个文档。The 每个输出文档的_id field of each output document contains the unique group by value. _id字段包含唯一的group by值。The output documents can also contain computed fields that hold the values of some accumulator expression.输出文档还可以包含保存某些累加器表达式值的计算字段。
$group does not order its output documents.不对其输出文档进行排序。
The $group stage has the following prototype form:$group阶段具有以下原型形式:
{
$group:
{
_id: <expression>, // Group By Expression
<field1>: { <accumulator1> : <expression1> },
...
}
}
_id | _id value of null, or any other constant value, the $group stage calculates accumulated values for all the input documents as a whole. _id值指定为null或任何其他常量值,$group阶段将计算所有输入文档作为一个整体的累积值。 |
field |
The _id and the accumulator operators can accept any valid expression. _id和累加器运算符可以接受任何有效表达式。For more information on expressions, see Expressions.有关表达式的详细信息,请参阅表达式。
The <accumulator> operator must be one of the following accumulator operators:<accumulator>运算符必须是以下累加器运算符之一:
Changed in version 5.0.在版本5.0中更改。
$accumulator | |
$addToSet |
|
$avg |
|
$bottom |
|
$bottomN |
|
$count |
|
$first |
|
$firstN |
|
$last |
|
$lastN |
|
$max |
|
$maxN |
|
$mergeObjects | |
$min |
|
$push |
|
$stdDevPop |
|
$stdDevSamp |
|
$sum |
|
$top |
|
$topN |
|
$groupThe $group stage has a limit of 100 megabytes of RAM. $group阶段的RAM限制为100兆字节。By default, if the stage exceeds this limit, 默认情况下,如果阶段超过此限制,$group returns an error. $group将返回错误。To allow more space for stage processing, use the allowDiskUse option to enable aggregation pipeline stages to write data to temporary files.要为阶段处理提供更多空间,请使用allowDiskUse选项启用聚合管道阶段将数据写入临时文件。
$groupThis section describes optimizations to improve the performance of 本节介绍了提高$group. $group性能的优化。There are optimizations that you can make manually and optimizations MongoDB makes internally.您可以手动进行优化,也可以在MongoDB内部进行优化。
If a pipeline 如果管道按同一字段排序和分组,sorts and groups by the same field and the $group stage only uses the $first accumulator operator, consider adding an index on the grouped field which matches the sort order. $group阶段仅使用$first累加器运算符,请考虑在分组字段上添加与排序顺序匹配的索引。In some cases, the 在某些情况下,$group stage can use the index to quickly find the first document of each group.$group阶段可以使用索引快速查找每个组的第一个文档。
If a collection named 如果名为foo contains an index { x: 1, y: 1 }, the following pipeline can use that index to find the first document of each group:foo的集合包含索引{ x: 1, y: 1 },则以下管道可以使用该索引查找每个组的第一个文档:
db.foo.aggregate([
{
$sort:{ x : 1, y : 1 }
},
{
$group: {
_id: { x : "$x" },
y: { $first : "$y" }
}
}
])
Starting in MongoDB 5.2, MongoDB uses the slot-based execution query engine to execute 从MongoDB 5.2开始,MongoDB使用基于时隙的执行查询引擎来执行$group stages when $group is either:$group阶段,当$group为:
$match followed by $group, the $match and $group stages are executed by the slot-based engine.$match开头,后面跟着$group,则$match和$group阶段由基于插槽的引擎执行。In most cases, the slot-based engine provides improved performance and lower CPU and memory costs compared to the classic query engine.在大多数情况下,与传统查询引擎相比,基于插槽的引擎提供了更好的性能和更低的CPU和内存成本。
See 请参见$group Optimization.$group优化。
In 在mongosh, create a sample collection named sales with the following documents:mongosh中,使用以下文档创建名为sales的样本集合:
db.sales.insertMany([
{ "_id" : 1, "item" : "abc", "price" : NumberDecimal("10"), "quantity" : NumberInt("2"), "date" : ISODate("2014-03-01T08:00:00Z") },
{ "_id" : 2, "item" : "jkl", "price" : NumberDecimal("20"), "quantity" : NumberInt("1"), "date" : ISODate("2014-03-01T09:00:00Z") },
{ "_id" : 3, "item" : "xyz", "price" : NumberDecimal("5"), "quantity" : NumberInt( "10"), "date" : ISODate("2014-03-15T09:00:00Z") },
{ "_id" : 4, "item" : "xyz", "price" : NumberDecimal("5"), "quantity" : NumberInt("20") , "date" : ISODate("2014-04-04T11:21:39.736Z") },
{ "_id" : 5, "item" : "abc", "price" : NumberDecimal("10"), "quantity" : NumberInt("10") , "date" : ISODate("2014-04-04T21:23:13.331Z") },
{ "_id" : 6, "item" : "def", "price" : NumberDecimal("7.5"), "quantity": NumberInt("5" ) , "date" : ISODate("2015-06-04T05:08:13Z") },
{ "_id" : 7, "item" : "def", "price" : NumberDecimal("7.5"), "quantity": NumberInt("10") , "date" : ISODate("2015-09-10T08:43:00Z") },
{ "_id" : 8, "item" : "abc", "price" : NumberDecimal("10"), "quantity" : NumberInt("5" ) , "date" : ISODate("2016-02-06T20:20:13Z") },
])
The following aggregation operation uses the 以下聚合操作使用$group stage to count the number of documents in the sales collection:$group阶段计算sales集合中的文档数:
db.sales.aggregate( [
{
$group: {
_id: null,
count: { $count: { } }
}
}
] )
The operation returns the following result:该操作返回以下结果:
{ "_id" : null, "count" : 8 }
This aggregation operation is equivalent to the following SQL statement:此聚合操作等效于以下SQL语句:
SELECT COUNT(*) AS count FROM sales
The following aggregation operation uses the 以下聚合操作使用$group stage to retrieve the distinct item values from the sales collection:$group阶段从sales集合中检索不同的项目值:
db.sales.aggregate( [ { $group : { _id : "$item" } } ] )
The operation returns the following result:该操作返回以下结果:
{ "_id" : "abc" }
{ "_id" : "jkl" }
{ "_id" : "def" }
{ "_id" : "xyz" }
The following aggregation operation groups documents by the 以下聚合操作按item field, calculating the total sale amount per item and returning only the items with total sale amount greater than or equal to 100:item字段对文档进行分组,计算每个项目的总销售额,并仅返回总销售额大于或等于100的项目:
db.sales.aggregate(
[
// First Stage
{
$group :
{
_id : "$item",
totalSaleAmount: { $sum: { $multiply: [ "$price", "$quantity" ] } }
}
},
// Second Stage
{
$match: { "totalSaleAmount": { $gte: 100 } }
}
]
)
$group stage groups the documents by item to retrieve the distinct item values. $group阶段按项目对文档进行分组,以检索不同的项目值。totalSaleAmount for each item.totalSaleAmount(总销售额)。$match stage filters the resulting documents to only return items with a totalSaleAmount greater than or equal to 100.$match阶段筛选结果文档,仅返回totalSaleAmount大于或等于100的项目。The operation returns the following result:该操作返回以下结果:
{ "_id" : "abc", "totalSaleAmount" : NumberDecimal("170") }
{ "_id" : "xyz", "totalSaleAmount" : NumberDecimal("150") }
{ "_id" : "def", "totalSaleAmount" : NumberDecimal("112.5") }
This aggregation operation is equivalent to the following SQL statement:此聚合操作等效于以下SQL语句:
SELECT item, Sum(( price * quantity )) AS totalSaleAmount FROM sales GROUP BY item HAVING totalSaleAmount >= 100
In 在mongosh, create a sample collection named sales with the following documents:mongosh中,使用以下文档创建名为sales的样本集合:
db.sales.insertMany([
{ "_id" : 1, "item" : "abc", "price" : NumberDecimal("10"), "quantity" : NumberInt("2"), "date" : ISODate("2014-03-01T08:00:00Z") },
{ "_id" : 2, "item" : "jkl", "price" : NumberDecimal("20"), "quantity" : NumberInt("1"), "date" : ISODate("2014-03-01T09:00:00Z") },
{ "_id" : 3, "item" : "xyz", "price" : NumberDecimal("5"), "quantity" : NumberInt( "10"), "date" : ISODate("2014-03-15T09:00:00Z") },
{ "_id" : 4, "item" : "xyz", "price" : NumberDecimal("5"), "quantity" : NumberInt("20") , "date" : ISODate("2014-04-04T11:21:39.736Z") },
{ "_id" : 5, "item" : "abc", "price" : NumberDecimal("10"), "quantity" : NumberInt("10") , "date" : ISODate("2014-04-04T21:23:13.331Z") },
{ "_id" : 6, "item" : "def", "price" : NumberDecimal("7.5"), "quantity": NumberInt("5" ) , "date" : ISODate("2015-06-04T05:08:13Z") },
{ "_id" : 7, "item" : "def", "price" : NumberDecimal("7.5"), "quantity": NumberInt("10") , "date" : ISODate("2015-09-10T08:43:00Z") },
{ "_id" : 8, "item" : "abc", "price" : NumberDecimal("10"), "quantity" : NumberInt("5" ) , "date" : ISODate("2016-02-06T20:20:13Z") },
])
The following pipeline calculates the total sales amount, average sales quantity, and sale count for each day in the year 2014:以下管道计算2014年每天的总销售额、平均销售额和销售额:
db.sales.aggregate([ // First Stage { $match : { "date": { $gte: new ISODate("2014-01-01"), $lt: new ISODate("2015-01-01") } } }, // Second Stage { $group : { _id : { $dateToString: { format: "%Y-%m-%d", date: "$date" } }, totalSaleAmount: { $sum: { $multiply: [ "$price", "$quantity" ] } }, averageQuantity: { $avg: "$quantity" }, count: { $sum: 1 } } }, // Third Stage { $sort : { totalSaleAmount: -1 } } ])
$match stage filters the documents to only pass documents from the year 2014 to the next stage.$match阶段筛选文档,仅将2014年的文档传递到下一阶段。$group stage groups the documents by date and calculates the total sale amount, average quantity, and total count of the documents in each group.$group阶段按日期对文档进行分组,并计算每组文档的总销售额、平均数量和总计数。$sort stage sorts the results by the total sale amount for each group in descending order.$sort阶段按每组的总销售额降序对结果进行排序。The operation returns the following results:该操作返回以下结果:
{ "_id" : "2014-04-04", "totalSaleAmount" : NumberDecimal("200"), "averageQuantity" : 15, "count" : 2 }
{ "_id" : "2014-03-15", "totalSaleAmount" : NumberDecimal("50"), "averageQuantity" : 10, "count" : 1 }
{ "_id" : "2014-03-01", "totalSaleAmount" : NumberDecimal("40"), "averageQuantity" : 1.5, "count" : 2 }
This aggregation operation is equivalent to the following SQL statement:此聚合操作等效于以下SQL语句:
SELECT date, Sum(( price * quantity )) AS totalSaleAmount, Avg(quantity) AS averageQuantity, Count(*) AS Count FROM sales GROUP BY Date(date) ORDER BY totalSaleAmount DESC
nullnull分组The following aggregation operation specifies a group 以下聚合操作指定group_id of null, calculating the total sale amount, average quantity, and count of all documents in the collection._id为null,用于计算集合中所有文档的总销售额、平均数量和计数。
db.sales.aggregate([
{
$group : {
_id : null,
totalSaleAmount: { $sum: { $multiply: [ "$price", "$quantity" ] } },
averageQuantity: { $avg: "$quantity" },
count: { $sum: 1 }
}
}
])
The operation returns the following result:该操作返回以下结果:
{
"_id" : null,
"totalSaleAmount" : NumberDecimal("452.5"),
"averageQuantity" : 7.875,
"count" : 8
}
This aggregation operation is equivalent to the following SQL statement:此聚合操作等效于以下SQL语句:
SELECT Sum(price * quantity) AS totalSaleAmount, Avg(quantity) AS averageQuantity, Count(*) AS Count FROM sales
$countdb.collection.countDocuments() which wraps the $group aggregation stage with a $sum expression.db.collection.countDocuments(),它用$sum表达式包装$group聚合阶段。In 在mongosh, create a sample collection named books with the following documents:mongosh中,使用以下文档创建名为books的示例集合:
db.books.insertMany([
{ "_id" : 8751, "title" : "The Banquet", "author" : "Dante", "copies" : 2 },
{ "_id" : 8752, "title" : "Divine Comedy", "author" : "Dante", "copies" : 1 },
{ "_id" : 8645, "title" : "Eclogues", "author" : "Dante", "copies" : 2 },
{ "_id" : 7000, "title" : "The Odyssey", "author" : "Homer", "copies" : 10 },
{ "_id" : 7020, "title" : "Iliad", "author" : "Homer", "copies" : 10 }
])
title by authorauthor分组titleThe following aggregation operation pivots the data in the 以下聚合操作将books collection to have titles grouped by authors.books集合中的数据透视为按作者分组的标题。
db.books.aggregate([
{ $group : { _id : "$author", books: { $push: "$title" } } }
])
The operation returns the following documents:操作将返回以下文档:
{ "_id" : "Homer", "books" : [ "The Odyssey", "Iliad" ] }
{ "_id" : "Dante", "books" : [ "The Banquet", "Divine Comedy", "Eclogues" ] }
authorauthor分组文档The following aggregation operation groups documents by 以下聚合操作按author:author对文档进行分组:
db.books.aggregate([ // First Stage { $group : { _id : "$author", books: { $push: "$$ROOT" } } }, // Second Stage { $addFields: { totalCopies : { $sum: "$books.copies" } } } ])
$group uses the 使用$$ROOT system variable to group the entire documents by authors. $$ROOT系统变量按作者对整个文档进行分组。This stage passes the following documents to the next stage:本阶段将以下文件传递给下一阶段:
{ "_id" : "Homer",
"books" :
[
{ "_id" : 7000, "title" : "The Odyssey", "author" : "Homer", "copies" : 10 },
{ "_id" : 7020, "title" : "Iliad", "author" : "Homer", "copies" : 10 }
]
},
{ "_id" : "Dante",
"books" :
[
{ "_id" : 8751, "title" : "The Banquet", "author" : "Dante", "copies" : 2 },
{ "_id" : 8752, "title" : "Divine Comedy", "author" : "Dante", "copies" : 1 },
{ "_id" : 8645, "title" : "Eclogues", "author" : "Dante", "copies" : 2 }
]
}
$addFields adds a field to the output containing the total copies of books for each author.在输出中添加一个字段,该字段包含每个作者的图书总数。
The resulting documents must not exceed the BSON Document Size limit of 16 megabytes.生成的文档不得超过16兆字节的BSON文档大小限制。
The operation returns the following documents:操作将返回以下文档:
{
"_id" : "Homer",
"books" :
[
{ "_id" : 7000, "title" : "The Odyssey", "author" : "Homer", "copies" : 10 },
{ "_id" : 7020, "title" : "Iliad", "author" : "Homer", "copies" : 10 }
],
"totalCopies" : 20
}
{
"_id" : "Dante",
"books" :
[
{ "_id" : 8751, "title" : "The Banquet", "author" : "Dante", "copies" : 2 },
{ "_id" : 8752, "title" : "Divine Comedy", "author" : "Dante", "copies" : 1 },
{ "_id" : 8645, "title" : "Eclogues", "author" : "Dante", "copies" : 2 }
],
"totalCopies" : 5
}
The Aggregation with the Zip Code Data Set tutorial provides an extensive example of the “使用邮政编码数据集进行聚合”教程提供了一个常见用例中的$group operator in a common use case.$group运算符的广泛示例。