$group (aggregation)
On this page本页内容
Definition定义
$group
-
The$group
stage separates documents into groups according to a "group key".$group
阶段根据“组键”将文档分为多个组。The output is one document for each unique group key.对于每个唯一的组键,输出是一个文档。A group key is often a field, or group of fields.组键通常是一个字段或一组字段。The group key can also be the result of an expression.组键也可以是表达式的结果。Use the使用_id
field in the$group
pipeline stage to set the group key.$group
管道阶段中的_id
字段来设置组键。See below for usage examples.请参阅下面的用法示例。In the在$group
stage output, the_id
field is set to the group key for that document.$group
阶段输出中,_id
字段被设置为该文档的组键。The output documents can also contain additional fields that are set using accumulator expressions.输出文档还可以包含使用累加器表达式设置的附加字段。Note$group
does not order its output documents.不排序其输出文档。The$group
stage has the following prototype form:$group
阶段具有以下原型形式:{
$group:
{
_id: <expression>, // Group key
<field1>: { <accumulator1> : <expression1> },
...
}
}Field字段Description描述_id
Required.必要的。The_id
expression specifies the group key._id
表达式指定组键。If you specify an如果指定_id
value of null, or any other constant value, the$group
stage returns a single document that aggregates values across all of the input documents._id
值为null
或任何其他常数值,$group
阶段将返回一个文档,该文档聚合所有输入文档的值。See the Group by Null example.请参见“按Null
分组”示例。field
Optional.可选的。Computed using the accumulator operators.使用累加器运算符进行计算。The_id
and the accumulator operators can accept any validexpression
._id
和累加器运算符可以接受任何有效的表达式。For more information on expressions, see Expressions.有关表达式的详细信息,请参阅表达式。
Considerations注意事项
Accumulator Operator集合器运算符
The <accumulator>
operator must be one of the following accumulator operators:<accumulator>
运算符必须是以下累加器运算符之一:
Changed in version 5.05.0版更改.
$accumulator | |
$addToSet | $setWindowFields stage. $setWindowFields 阶段中可用。 |
$avg | $setWindowFields stage. $setWindowFields 阶段中可用。 |
$bottom | $group and $setWindowFields stages. $group 和$setWindowFields 阶段中可用。 |
$bottomN | n fields within a group, according to the specified sort order. n 个字段的聚合。$group and $setWindowFields stages. $group 和$setWindowFields 阶段中可用。 |
$count | $count pipeline stage. $count 管道阶段不同。$group and $setWindowFields stages. $group 和$setWindowFields 阶段中可用。 |
$first | $setWindowFields stage. $setWindowFields 阶段中可用。 |
$firstN | n elements within a group. n 个元素的聚合。$firstN array operator. $firstN 数组运算符不同。$group , expression and $setWindowFields stages. $group 、表达式和$setWindowFields 阶段中可用。 |
$last | $setWindowFields stage. $setWindowFields 阶段中可用。 |
$lastN | n elements within a group. n 个元素的聚合。$lastN array operator. $lastN 数组运算符不同。$group , expression and $setWindowFields stages. $group 、表达式和$setWindowFields 阶段中可用。 |
$max | $setWindowFields stage. $setWindowFields 阶段中可用。 |
$maxN | n maximum valued elements in a group. n 个最大值元素的聚合。$maxN array operator. $maxN 数组运算符不同。$group , $setWindowFields and as an expression. $group 、$setWindowFields 中可用,用作表达式。 |
$median | |
$mergeObjects | |
$min | $setWindowFields stage. $setWindowFields 阶段中可用。 |
$percentile | |
$push | $setWindowFields stage. $setWindowFields 阶段中可用。 |
$stdDevPop | $setWindowFields stage. $setWindowFields 阶段中可用。 |
$stdDevSamp | $setWindowFields stage. $setWindowFields 阶段中可用。 |
$sum | $setWindowFields stage. $setWindowFields 阶段中可用。 |
$top | $group and $setWindowFields stages. $group 和$setWindowFields 阶段中可用。 |
$topN | n fields within a group, according to the specified sort order. n 个字段的聚合。$group and $setWindowFields stages. $group 和$setWindowFields 阶段中可用。 |
$group
and Memory Restrictions和内存限制
The $group
stage has a limit of 100 megabytes of RAM. $group
阶段的RAM限制为100兆字节。By default, if the stage exceeds this limit, 默认情况下,如果阶段超过此限制,$group
returns an error. $group
将返回一个错误。To allow more space for stage processing, use the allowDiskUse option to enable aggregation pipeline stages to write data to temporary files.若要为阶段处理留出更多空间,请使用allowDiskUse
选项启用聚合管道阶段以将数据写入临时文件。
See also: 另请参阅:
$group
Performance Optimizations性能优化
This section describes optimizations to improve the performance of 本节介绍用于提高$group
. $group
性能的优化。There are optimizations that you can make manually and optimizations MongoDB makes internally.您可以手动进行优化,也可以在MongoDB内部进行优化。
Optimization to Return the First or Last Document of Each Group优化返回每组的第一个或最后一个文档
If a pipeline 如果管道按同一字段进行排序和分组,而sorts
and groups
by the same field and the $group
stage only uses the $first
or $last
accumulator operator, consider adding an index on the grouped field which matches the sort order. $group
阶段仅使用$first
或$last
累加器运算符,请考虑在分组字段上添加与排序顺序匹配的索引。In some cases, the 在某些情况下,$group
stage can use the index to quickly find the first or last document of each group.$group
阶段可以使用索引快速查找每个组的第一个或最后一个文档。
If a collection named 如果名为foo
contains an index { x: 1, y: 1 }
, the following pipeline can use that index to find the first document of each group:foo
的集合包含索引{ x: 1, y: 1 }
,则以下管道可以使用该索引来查找每个组的第一个文档:
db.foo.aggregate([
{
$sort:{ x : 1, y : 1 }
},
{
$group: {
_id: { x : "$x" },
y: { $first : "$y" }
}
}
])
Slot-Based Query Execution Engine基于插槽的查询执行引擎
Starting in version 5.2, MongoDB uses the slot-based execution query engine to execute 从5.2版本开始,MongoDB使用基于槽的执行查询引擎来执行$group
stages if either:$group
阶段,如果:
$group
is the first stage in the pipeline.是管道中的第一阶段。All preceding stages in the pipeline can also be executed by the slot-based execution engine.管道中的所有先前阶段也可以由基于槽的执行引擎执行。
For more information, see 有关更多信息,请参阅$group
Optimization.$group
优化。
Examples实例
Count the Number of Documents in a Collection统计集合中的文档数
In 在mongosh
, create a sample collection named sales
with the following documents:mongosh
中,使用以下文档创建一个名为sales
的示例集合:
db.sales.insertMany([
{ "_id" : 1, "item" : "abc", "price" : NumberDecimal("10"), "quantity" : NumberInt("2"), "date" : ISODate("2014-03-01T08:00:00Z") },
{ "_id" : 2, "item" : "jkl", "price" : NumberDecimal("20"), "quantity" : NumberInt("1"), "date" : ISODate("2014-03-01T09:00:00Z") },
{ "_id" : 3, "item" : "xyz", "price" : NumberDecimal("5"), "quantity" : NumberInt( "10"), "date" : ISODate("2014-03-15T09:00:00Z") },
{ "_id" : 4, "item" : "xyz", "price" : NumberDecimal("5"), "quantity" : NumberInt("20") , "date" : ISODate("2014-04-04T11:21:39.736Z") },
{ "_id" : 5, "item" : "abc", "price" : NumberDecimal("10"), "quantity" : NumberInt("10") , "date" : ISODate("2014-04-04T21:23:13.331Z") },
{ "_id" : 6, "item" : "def", "price" : NumberDecimal("7.5"), "quantity": NumberInt("5" ) , "date" : ISODate("2015-06-04T05:08:13Z") },
{ "_id" : 7, "item" : "def", "price" : NumberDecimal("7.5"), "quantity": NumberInt("10") , "date" : ISODate("2015-09-10T08:43:00Z") },
{ "_id" : 8, "item" : "abc", "price" : NumberDecimal("10"), "quantity" : NumberInt("5" ) , "date" : ISODate("2016-02-06T20:20:13Z") },
])
The following aggregation operation uses the 以下聚合操作使用$group
stage to count the number of documents in the sales
collection:$group
阶段来统计sales
集合中的文档数:
db.sales.aggregate( [
{
$group: {
_id: null,
count: { $count: { } }
}
}
] )
The operation returns the following result:该操作返回以下结果:
{ "_id" : null, "count" : 8 }
This aggregation operation is equivalent to the following SQL statement:此聚合操作等效于以下SQL语句:
SELECT COUNT(*) AS count FROM sales
See also: 另请参阅:
Retrieve Distinct Values检索不同的值
The following aggregation operation uses the 以下聚合操作使用$group
stage to retrieve the distinct item values from the sales
collection:$group
阶段从销售集合中检索不同的项目值:
db.sales.aggregate( [ { $group : { _id : "$item" } } ] )
The operation returns the following result:该操作返回以下结果:
{ "_id" : "abc" }
{ "_id" : "jkl" }
{ "_id" : "def" }
{ "_id" : "xyz" }
Group by Item Having按项目分组
The following aggregation operation groups documents by the 以下聚合操作按item
field, calculating the total sale amount per item and returning only the items with total sale amount greater than or equal to 100:item
字段对单据进行分组,计算每个物料的总销售额,只返回总销售额大于或等于100的物料:
db.sales.aggregate(
[
// First Stage
{
$group :
{
_id : "$item",
totalSaleAmount: { $sum: { $multiply: [ "$price", "$quantity" ] } }
}
},
// Second Stage
{
$match: { "totalSaleAmount": { $gte: 100 } }
}
]
)
First Stage:第一阶段The$group
stage groups the documents byitem
to retrieve the distinct item values. This stage returns thetotalSaleAmount
for each item.$group
阶段按项目对文档进行分组,以检索不同的项目值。此阶段返回每个项目的总销售金额。Second Stage:第二阶段The$match
stage filters the resulting documents to only return items with atotalSaleAmount
greater than or equal to 100.$match
阶段筛选生成的文档,以仅返回totalSaleAmount
大于或等于100的项目。
The operation returns the following result:该操作返回以下结果:
{ "_id" : "abc", "totalSaleAmount" : NumberDecimal("170") }
{ "_id" : "xyz", "totalSaleAmount" : NumberDecimal("150") }
{ "_id" : "def", "totalSaleAmount" : NumberDecimal("112.5") }
This aggregation operation is equivalent to the following SQL statement:此聚合操作等效于以下SQL语句:
SELECT item,
Sum(( price * quantity )) AS totalSaleAmount
FROM sales
GROUP BY item
HAVING totalSaleAmount >= 100
See also: 另请参阅:
Calculate Count, Sum, and Average计算计数、求和和和平均值
In 在mongosh
, create a sample collection named sales
with the following documents:mongosh
中,使用以下文档创建一个名为sales
的示例集合:
db.sales.insertMany([
{ "_id" : 1, "item" : "abc", "price" : NumberDecimal("10"), "quantity" : NumberInt("2"), "date" : ISODate("2014-03-01T08:00:00Z") },
{ "_id" : 2, "item" : "jkl", "price" : NumberDecimal("20"), "quantity" : NumberInt("1"), "date" : ISODate("2014-03-01T09:00:00Z") },
{ "_id" : 3, "item" : "xyz", "price" : NumberDecimal("5"), "quantity" : NumberInt( "10"), "date" : ISODate("2014-03-15T09:00:00Z") },
{ "_id" : 4, "item" : "xyz", "price" : NumberDecimal("5"), "quantity" : NumberInt("20") , "date" : ISODate("2014-04-04T11:21:39.736Z") },
{ "_id" : 5, "item" : "abc", "price" : NumberDecimal("10"), "quantity" : NumberInt("10") , "date" : ISODate("2014-04-04T21:23:13.331Z") },
{ "_id" : 6, "item" : "def", "price" : NumberDecimal("7.5"), "quantity": NumberInt("5" ) , "date" : ISODate("2015-06-04T05:08:13Z") },
{ "_id" : 7, "item" : "def", "price" : NumberDecimal("7.5"), "quantity": NumberInt("10") , "date" : ISODate("2015-09-10T08:43:00Z") },
{ "_id" : 8, "item" : "abc", "price" : NumberDecimal("10"), "quantity" : NumberInt("5" ) , "date" : ISODate("2016-02-06T20:20:13Z") },
])
Group by Day of the Year按年度日期分组
The following pipeline calculates the total sales amount, average sales quantity, and sale count for each day in the year 2014:以下管道计算2014年每天的总销售额、平均销售额和销售额:
db.sales.aggregate([
// First Stage
{
$match : { "date": { $gte: new ISODate("2014-01-01"), $lt: new ISODate("2015-01-01") } }
},
// Second Stage
{
$group : {
_id : { $dateToString: { format: "%Y-%m-%d", date: "$date" } },
totalSaleAmount: { $sum: { $multiply: [ "$price", "$quantity" ] } },
averageQuantity: { $avg: "$quantity" },
count: { $sum: 1 }
}
},
// Third Stage
{
$sort : { totalSaleAmount: -1 }
}
])
First Stage:第一阶段The$match
stage filters the documents to only pass documents from the year 2014 to the next stage.$match
阶段筛选文档,只将2014年的文档传递到下一阶段。Second Stage:第二阶段The$group
stage groups the documents by date and calculates the total sale amount, average quantity, and total count of the documents in each group.$group
阶段按日期对文档进行分组,并计算每组中文档的总销售金额、平均数量和总数。Third Stage:第三阶段The$sort
stage sorts the results by the total sale amount for each group in descending order.$sort
阶段按每组的总销售额降序对结果进行排序。
The operation returns the following results:该操作返回以下结果:
{ "_id" : "2014-04-04", "totalSaleAmount" : NumberDecimal("200"), "averageQuantity" : 15, "count" : 2 }
{ "_id" : "2014-03-15", "totalSaleAmount" : NumberDecimal("50"), "averageQuantity" : 10, "count" : 1 }
{ "_id" : "2014-03-01", "totalSaleAmount" : NumberDecimal("40"), "averageQuantity" : 1.5, "count" : 2 }
This aggregation operation is equivalent to the following SQL statement:此聚合操作等效于以下SQL语句:
SELECT date,
Sum(( price * quantity )) AS totalSaleAmount,
Avg(quantity) AS averageQuantity,
Count(*) AS Count
FROM sales
WHERE date >= '01/01/2014' AND date < '01/01/2015'
GROUP BY date
ORDER BY totalSaleAmount DESC
Group by null
按null
分组
null
The following aggregation operation specifies a group 以下聚合操作指定分组_id
of null
, calculating the total sale amount, average quantity, and count of all documents in the collection._id
为null
,计算集合中所有文档的总销售金额、平均数量和计数。
db.sales.aggregate([
{
$group : {
_id : null,
totalSaleAmount: { $sum: { $multiply: [ "$price", "$quantity" ] } },
averageQuantity: { $avg: "$quantity" },
count: { $sum: 1 }
}
}
])
The operation returns the following result:该操作返回以下结果:
{
"_id" : null,
"totalSaleAmount" : NumberDecimal("452.5"),
"averageQuantity" : 7.875,
"count" : 8
}
This aggregation operation is equivalent to the following SQL statement:此聚合操作等效于以下SQL语句:
SELECT Sum(price * quantity) AS totalSaleAmount,
Avg(quantity) AS averageQuantity,
Count(*) AS Count
FROM sales
Pivot Data数据透视表
In 在mongosh
, create a sample collection named books
with the following documents:mongosh
中,使用以下文档创建一个名为books
的示例集合:
db.books.insertMany([
{ "_id" : 8751, "title" : "The Banquet", "author" : "Dante", "copies" : 2 },
{ "_id" : 8752, "title" : "Divine Comedy", "author" : "Dante", "copies" : 1 },
{ "_id" : 8645, "title" : "Eclogues", "author" : "Dante", "copies" : 2 },
{ "_id" : 7000, "title" : "The Odyssey", "author" : "Homer", "copies" : 10 },
{ "_id" : 7020, "title" : "Iliad", "author" : "Homer", "copies" : 10 }
])
Group title
by author
按author
分组title
title
by author
The following aggregation operation pivots the data in the 以下聚合操作将books
collection to have titles grouped by authors.books
集合中的数据调整为按作者分组的标题。
db.books.aggregate([
{ $group : { _id : "$author", books: { $push: "$title" } } }
])
The operation returns the following documents:该操作返回以下文档:
{ "_id" : "Homer", "books" : [ "The Odyssey", "Iliad" ] }
{ "_id" : "Dante", "books" : [ "The Banquet", "Divine Comedy", "Eclogues" ] }
Group Documents by author
按author
分组文档
author
The following aggregation operation groups documents by 以下聚合操作按author
:author
对文档进行分组:
db.books.aggregate([
// First Stage
{
$group : { _id : "$author", books: { $push: "$$ROOT" } }
},
// Second Stage
{
$addFields:
{
totalCopies : { $sum: "$books.copies" }
}
}
])
First Stage:第一阶段-
$group
uses the使用$$ROOT
system variable to group the entire documents by authors. This stage passes the following documents to the next stage:$$ROOT
系统变量按作者对整个文档进行分组。此阶段将以下文件传递到下一阶段:{ "_id" : "Homer",
"books" :
[
{ "_id" : 7000, "title" : "The Odyssey", "author" : "Homer", "copies" : 10 },
{ "_id" : 7020, "title" : "Iliad", "author" : "Homer", "copies" : 10 }
]
},
{ "_id" : "Dante",
"books" :
[
{ "_id" : 8751, "title" : "The Banquet", "author" : "Dante", "copies" : 2 },
{ "_id" : 8752, "title" : "Divine Comedy", "author" : "Dante", "copies" : 1 },
{ "_id" : 8645, "title" : "Eclogues", "author" : "Dante", "copies" : 2 }
]
} Second Stage:第二阶段-
$addFields
adds a field to the output containing the total copies of books for each author.在输出中添加一个字段,其中包含每个作者的书籍总副本。NoteThe resulting documents must not exceed the BSON Document Size limit of 16 megabytes.生成的文档不得超过BSON文档大小16兆字节的限制。The operation returns the following documents:该操作返回以下文档:{
"_id" : "Homer",
"books" :
[
{ "_id" : 7000, "title" : "The Odyssey", "author" : "Homer", "copies" : 10 },
{ "_id" : 7020, "title" : "Iliad", "author" : "Homer", "copies" : 10 }
],
"totalCopies" : 20
}
{
"_id" : "Dante",
"books" :
[
{ "_id" : 8751, "title" : "The Banquet", "author" : "Dante", "copies" : 2 },
{ "_id" : 8752, "title" : "Divine Comedy", "author" : "Dante", "copies" : 1 },
{ "_id" : 8645, "title" : "Eclogues", "author" : "Dante", "copies" : 2 }
],
"totalCopies" : 5
}
See also: 另请参阅:
Additional Resources额外的资源
The Aggregation with the Zip Code Data Set tutorial provides an extensive example of the 使用邮政编码数据集进行聚合教程在常见用例中提供了$group
operator in a common use case.$group
运算符的大量示例。