Docs HomeMongoDB Manual

$group (aggregation)

Definition定义

$group

The $group stage separates documents into groups according to a "group key". $group阶段根据“组键”将文档分为多个组。The output is one document for each unique group key.对于每个唯一的组键,输出是一个文档。

A group key is often a field, or group of fields. 组键通常是一个字段或一组字段。The group key can also be the result of an expression. 组键也可以是表达式的结果。Use the _id field in the $group pipeline stage to set the group key. 使用$group管道阶段中的_id字段来设置组键。See below for usage examples.请参阅下面的用法示例

In the $group stage output, the _id field is set to the group key for that document.$group阶段输出中,_id字段被设置为该文档的组键。

The output documents can also contain additional fields that are set using accumulator expressions.输出文档还可以包含使用累加器表达式设置的附加字段。

Note

$group does not order its output documents.不排序其输出文档。

The $group stage has the following prototype form:$group阶段具有以下原型形式:

{
$group:
{
_id: <expression>, // Group key
<field1>: { <accumulator1> : <expression1> },
...
}
}
Field字段Description描述
_idRequired.必要的。The _id expression specifies the group key. _id表达式指定组键。If you specify an _id value of null, or any other constant value, the $group stage returns a single document that aggregates values across all of the input documents. 如果指定_id值为null或任何其他常数值,$group阶段将返回一个文档,该文档聚合所有输入文档的值。See the Group by Null example.请参见“按Null分组”示例。
fieldOptional.可选的。Computed using the accumulator operators.使用累加器运算符进行计算。

The _id and the accumulator operators can accept any valid expression. _id累加器运算符可以接受任何有效的表达式。For more information on expressions, see Expressions.有关表达式的详细信息,请参阅表达式

Considerations注意事项

Accumulator Operator集合器运算符

The <accumulator> operator must be one of the following accumulator operators:<accumulator>运算符必须是以下累加器运算符之一:

Changed in version 5.05.0版更改.

Name名称Description描述
$accumulatorReturns the result of a user-defined accumulator function.返回用户定义的累加器函数的结果。
$addToSetReturns an array of unique expression values for each group. 为每个组返回一个唯一的表达式值数组。Order of the array elements is undefined. 数组元素的顺序未定义。
Changed in version 5.0:5.0版更改:Available in the $setWindowFields stage. $setWindowFields阶段中可用。
$avgReturns an average of numerical values. 返回数值的平均值。Ignores non-numeric values. 忽略非数值。
Changed in version 5.0:5.0版更改:Available in the $setWindowFields stage. $setWindowFields阶段中可用。
$bottomReturns the bottom element within a group according to the specified sort order. 根据指定的排序顺序返回组中的底部元素。
New in version 5.2. 5.2版新增。
Available in the $group and $setWindowFields stages. $group$setWindowFields阶段中可用。
$bottomNReturns an aggregation of the bottom n fields within a group, according to the specified sort order. 根据指定的排序顺序,返回组中最后n个字段的聚合。
New in version 5.2. 5.2版新增。
Available in the $group and $setWindowFields stages. $group$setWindowFields阶段中可用。
$countReturns the number of documents in a group.返回一个组中的文档数。
Distinct from the $count pipeline stage. $count管道阶段不同。
New in version 5.0:5.0版新增:Available in the $group and $setWindowFields stages. $group$setWindowFields阶段中可用。
$firstReturns the result of an expression for the first document in a group. 返回组中第一个文档的表达式结果。
Changed in version 5.0:5.0版更改:Available in the $setWindowFields stage. $setWindowFields阶段中可用。
$firstNReturns an aggregation of the first n elements within a group. 返回一个组中前n个元素的聚合。Only meaningful when documents are in a defined order. 只有当文档按定义的顺序排列时才有意义。Distinct from the $firstN array operator. $firstN数组运算符不同。
New in version 5.2:5.2版新增:Available in the $group, expression and $setWindowFields stages. $group表达式$setWindowFields阶段中可用。
$lastReturns the result of an expression for the last document in a group. 返回组中最后一个文档的表达式结果。
Changed in version 5.0:5.0版更改:Available in the $setWindowFields stage. $setWindowFields阶段中可用。
$lastNReturns an aggregation of the last n elements within a group. 返回一个组中最后n个元素的聚合。Only meaningful when documents are in a defined order. 只有当文档按定义的顺序排列时才有意义。Distinct from the $lastN array operator. $lastN数组运算符不同。
New in version 5.2:5.2版新增:Available in the $group, expression and $setWindowFields stages. $group表达式$setWindowFields阶段中可用。
$maxReturns the highest expression value for each group. 返回每组的最高表达式值。
Changed in version 5.0:5.0版更改:Available in the $setWindowFields stage. $setWindowFields阶段中可用。
$maxNReturns an aggregation of the n maximum valued elements in a group. 返回一个组中n个最大值元素的聚合。Distinct from the $maxN array operator. $maxN数组运算符不同。
New in version 5.2. 5.2版新增。
Available in $group, $setWindowFields and as an expression. $group$setWindowFields中可用,用作表达式
$medianReturns an approximation of the median, the 50th percentile, as a scalar value. 以标量值的形式返回中值(第50个百分位数)的近似值。
New in version 7.0. 7.0版新增。
This operator is available as an accumulator in these stages: 此操作器在以下阶段可用作蓄能器: It is also available as an aggregation expression. 它也可用作聚合表达式
$mergeObjectsReturns a document created by combining the input documents for each group.返回通过组合每组的输入文档创建的文档。
$minReturns the lowest expression value for each group. 返回每组的最低表达式值。
Changed in version 5.0:5.0版更改:Available in the $setWindowFields stage. $setWindowFields阶段中可用。
$percentileReturns an array of scalar values that correspond to specified percentile values.返回与指定百分位数相对应的标量值数组。
New in version 7.0. 7.0版新增。
This operator is available as an accumulator in these stages: 此操作器在以下阶段可用作蓄能器: It is also available as an aggregation expression. 它也可用作聚合表达式
$pushReturns an array of expression values for documents in each group. 为每组中的文档返回一个表达式值数组。
Changed in version 5.0:5.0版更改:Available in the $setWindowFields stage. $setWindowFields阶段中可用。
$stdDevPopReturns the population standard deviation of the input values. 返回输入值的总体标准偏差。
Changed in version 5.0:5.0版更改:Available in the $setWindowFields stage. $setWindowFields阶段中可用。
$stdDevSampReturns the sample standard deviation of the input values. 返回输入值的标准偏差样本。
Changed in version 5.0:5.0版更改:Available in the $setWindowFields stage. $setWindowFields阶段中可用。
$sumReturns a sum of numerical values. 返回数值的总和。Ignores non-numeric values. 忽略非数值。
Changed in version 5.0:5.0版更改:Available in the $setWindowFields stage. $setWindowFields阶段中可用。
$topReturns the top element within a group according to the specified sort order. 根据指定的排序顺序返回组中的顶部元素。
New in version 5.2. 5.2版新增。
Available in the $group and $setWindowFields stages. $group$setWindowFields阶段中可用。
$topNReturns an aggregation of the top n fields within a group, according to the specified sort order. 根据指定的排序顺序,返回组中前n个字段的聚合。
New in version 5.2. 5.2版新增。
Available in the $group and $setWindowFields stages. $group$setWindowFields阶段中可用。

$group and Memory Restrictions和内存限制

The $group stage has a limit of 100 megabytes of RAM. $group阶段的RAM限制为100兆字节。By default, if the stage exceeds this limit, $group returns an error. 默认情况下,如果阶段超过此限制,$group将返回一个错误。To allow more space for stage processing, use the allowDiskUse option to enable aggregation pipeline stages to write data to temporary files.若要为阶段处理留出更多空间,请使用allowDiskUse选项启用聚合管道阶段以将数据写入临时文件。

Tip

$group Performance Optimizations性能优化

This section describes optimizations to improve the performance of $group. 本节介绍用于提高$group性能的优化。There are optimizations that you can make manually and optimizations MongoDB makes internally.您可以手动进行优化,也可以在MongoDB内部进行优化。

Optimization to Return the First or Last Document of Each Group优化返回每组的第一个或最后一个文档

If a pipeline sorts and groups by the same field and the $group stage only uses the $first or $last accumulator operator, consider adding an index on the grouped field which matches the sort order. 如果管道按同一字段进行排序分组,而$group阶段仅使用$first$last累加器运算符,请考虑在分组字段上添加与排序顺序匹配的索引。In some cases, the $group stage can use the index to quickly find the first or last document of each group.在某些情况下,$group阶段可以使用索引快速查找每个组的第一个或最后一个文档。

Example

If a collection named foo contains an index { x: 1, y: 1 }, the following pipeline can use that index to find the first document of each group:如果名为foo的集合包含索引{ x: 1, y: 1 },则以下管道可以使用该索引来查找每个组的第一个文档:

db.foo.aggregate([
{
$sort:{ x : 1, y : 1 }
},
{
$group: {
_id: { x : "$x" },
y: { $first : "$y" }
}
}
])

Slot-Based Query Execution Engine基于插槽的查询执行引擎

Starting in version 5.2, MongoDB uses the slot-based execution query engine to execute $group stages if either:从5.2版本开始,MongoDB使用基于槽的执行查询引擎来执行$group阶段,如果:

  • $group is the first stage in the pipeline.是管道中的第一阶段。
  • All preceding stages in the pipeline can also be executed by the slot-based execution engine.管道中的所有先前阶段也可以由基于槽的执行引擎执行。

For more information, see $group Optimization.有关更多信息,请参阅$group优化

Examples实例

Count the Number of Documents in a Collection统计集合中的文档数

In mongosh, create a sample collection named sales with the following documents:mongosh中,使用以下文档创建一个名为sales的示例集合:

db.sales.insertMany([
{ "_id" : 1, "item" : "abc", "price" : NumberDecimal("10"), "quantity" : NumberInt("2"), "date" : ISODate("2014-03-01T08:00:00Z") },
{ "_id" : 2, "item" : "jkl", "price" : NumberDecimal("20"), "quantity" : NumberInt("1"), "date" : ISODate("2014-03-01T09:00:00Z") },
{ "_id" : 3, "item" : "xyz", "price" : NumberDecimal("5"), "quantity" : NumberInt( "10"), "date" : ISODate("2014-03-15T09:00:00Z") },
{ "_id" : 4, "item" : "xyz", "price" : NumberDecimal("5"), "quantity" : NumberInt("20") , "date" : ISODate("2014-04-04T11:21:39.736Z") },
{ "_id" : 5, "item" : "abc", "price" : NumberDecimal("10"), "quantity" : NumberInt("10") , "date" : ISODate("2014-04-04T21:23:13.331Z") },
{ "_id" : 6, "item" : "def", "price" : NumberDecimal("7.5"), "quantity": NumberInt("5" ) , "date" : ISODate("2015-06-04T05:08:13Z") },
{ "_id" : 7, "item" : "def", "price" : NumberDecimal("7.5"), "quantity": NumberInt("10") , "date" : ISODate("2015-09-10T08:43:00Z") },
{ "_id" : 8, "item" : "abc", "price" : NumberDecimal("10"), "quantity" : NumberInt("5" ) , "date" : ISODate("2016-02-06T20:20:13Z") },
])

The following aggregation operation uses the $group stage to count the number of documents in the sales collection:以下聚合操作使用$group阶段来统计sales集合中的文档数:

db.sales.aggregate( [
{
$group: {
_id: null,
count: { $count: { } }
}
}
] )

The operation returns the following result:该操作返回以下结果:

{ "_id" : null, "count" : 8 }

This aggregation operation is equivalent to the following SQL statement:此聚合操作等效于以下SQL语句:

SELECT COUNT(*) AS count FROM sales
Tip

See also: 另请参阅:

Retrieve Distinct Values检索不同的值

The following aggregation operation uses the $group stage to retrieve the distinct item values from the sales collection:以下聚合操作使用$group阶段从销售集合中检索不同的项目值:

db.sales.aggregate( [ { $group : { _id : "$item" } } ] )

The operation returns the following result:该操作返回以下结果:

{ "_id" : "abc" }
{ "_id" : "jkl" }
{ "_id" : "def" }
{ "_id" : "xyz" }

Group by Item Having按项目分组

The following aggregation operation groups documents by the item field, calculating the total sale amount per item and returning only the items with total sale amount greater than or equal to 100:以下聚合操作按item字段对单据进行分组,计算每个物料的总销售额,只返回总销售额大于或等于100的物料:

db.sales.aggregate(
[
// First Stage
{
$group :
{
_id : "$item",
totalSaleAmount: { $sum: { $multiply: [ "$price", "$quantity" ] } }
}
},
// Second Stage
{
$match: { "totalSaleAmount": { $gte: 100 } }
}
]
)
First Stage:第一阶段
The $group stage groups the documents by item to retrieve the distinct item values. This stage returns the totalSaleAmount for each item.$group阶段按项目对文档进行分组,以检索不同的项目值。此阶段返回每个项目的总销售金额。
Second Stage:第二阶段
The $match stage filters the resulting documents to only return items with a totalSaleAmount greater than or equal to 100.$match阶段筛选生成的文档,以仅返回totalSaleAmount大于或等于100的项目。

The operation returns the following result:该操作返回以下结果:

{ "_id" : "abc", "totalSaleAmount" : NumberDecimal("170") }
{ "_id" : "xyz", "totalSaleAmount" : NumberDecimal("150") }
{ "_id" : "def", "totalSaleAmount" : NumberDecimal("112.5") }

This aggregation operation is equivalent to the following SQL statement:此聚合操作等效于以下SQL语句:

SELECT item,
Sum(( price * quantity )) AS totalSaleAmount
FROM sales
GROUP BY item
HAVING totalSaleAmount >= 100
Tip

See also: 另请参阅:

$match

Calculate Count, Sum, and Average计算计数、求和和和平均值

In mongosh, create a sample collection named sales with the following documents:mongosh中,使用以下文档创建一个名为sales的示例集合:

db.sales.insertMany([
{ "_id" : 1, "item" : "abc", "price" : NumberDecimal("10"), "quantity" : NumberInt("2"), "date" : ISODate("2014-03-01T08:00:00Z") },
{ "_id" : 2, "item" : "jkl", "price" : NumberDecimal("20"), "quantity" : NumberInt("1"), "date" : ISODate("2014-03-01T09:00:00Z") },
{ "_id" : 3, "item" : "xyz", "price" : NumberDecimal("5"), "quantity" : NumberInt( "10"), "date" : ISODate("2014-03-15T09:00:00Z") },
{ "_id" : 4, "item" : "xyz", "price" : NumberDecimal("5"), "quantity" : NumberInt("20") , "date" : ISODate("2014-04-04T11:21:39.736Z") },
{ "_id" : 5, "item" : "abc", "price" : NumberDecimal("10"), "quantity" : NumberInt("10") , "date" : ISODate("2014-04-04T21:23:13.331Z") },
{ "_id" : 6, "item" : "def", "price" : NumberDecimal("7.5"), "quantity": NumberInt("5" ) , "date" : ISODate("2015-06-04T05:08:13Z") },
{ "_id" : 7, "item" : "def", "price" : NumberDecimal("7.5"), "quantity": NumberInt("10") , "date" : ISODate("2015-09-10T08:43:00Z") },
{ "_id" : 8, "item" : "abc", "price" : NumberDecimal("10"), "quantity" : NumberInt("5" ) , "date" : ISODate("2016-02-06T20:20:13Z") },
])

Group by Day of the Year按年度日期分组

The following pipeline calculates the total sales amount, average sales quantity, and sale count for each day in the year 2014:以下管道计算2014年每天的总销售额、平均销售额和销售额:

db.sales.aggregate([
// First Stage
{
$match : { "date": { $gte: new ISODate("2014-01-01"), $lt: new ISODate("2015-01-01") } }
},
// Second Stage
{
$group : {
_id : { $dateToString: { format: "%Y-%m-%d", date: "$date" } },
totalSaleAmount: { $sum: { $multiply: [ "$price", "$quantity" ] } },
averageQuantity: { $avg: "$quantity" },
count: { $sum: 1 }
}
},
// Third Stage
{
$sort : { totalSaleAmount: -1 }
}
])
First Stage:第一阶段
The $match stage filters the documents to only pass documents from the year 2014 to the next stage.$match阶段筛选文档,只将2014年的文档传递到下一阶段。
Second Stage:第二阶段
The $group stage groups the documents by date and calculates the total sale amount, average quantity, and total count of the documents in each group.$group阶段按日期对文档进行分组,并计算每组中文档的总销售金额、平均数量和总数。
Third Stage:第三阶段
The $sort stage sorts the results by the total sale amount for each group in descending order.$sort阶段按每组的总销售额降序对结果进行排序。

The operation returns the following results:该操作返回以下结果:

{ "_id" : "2014-04-04", "totalSaleAmount" : NumberDecimal("200"), "averageQuantity" : 15, "count" : 2 }
{ "_id" : "2014-03-15", "totalSaleAmount" : NumberDecimal("50"), "averageQuantity" : 10, "count" : 1 }
{ "_id" : "2014-03-01", "totalSaleAmount" : NumberDecimal("40"), "averageQuantity" : 1.5, "count" : 2 }

This aggregation operation is equivalent to the following SQL statement:此聚合操作等效于以下SQL语句:

SELECT date,
Sum(( price * quantity )) AS totalSaleAmount,
Avg(quantity) AS averageQuantity,
Count(*) AS Count
FROM sales
WHERE date >= '01/01/2014' AND date < '01/01/2015'
GROUP BY date
ORDER BY totalSaleAmount DESC
Tip

See also: 另请参阅:

Group by nullnull分组

The following aggregation operation specifies a group _id of null, calculating the total sale amount, average quantity, and count of all documents in the collection.以下聚合操作指定分组_idnull,计算集合中所有文档的总销售金额、平均数量和计数。

db.sales.aggregate([
{
$group : {
_id : null,
totalSaleAmount: { $sum: { $multiply: [ "$price", "$quantity" ] } },
averageQuantity: { $avg: "$quantity" },
count: { $sum: 1 }
}
}
])

The operation returns the following result:该操作返回以下结果:

{
"_id" : null,
"totalSaleAmount" : NumberDecimal("452.5"),
"averageQuantity" : 7.875,
"count" : 8
}

This aggregation operation is equivalent to the following SQL statement:此聚合操作等效于以下SQL语句:

SELECT Sum(price * quantity) AS totalSaleAmount,
Avg(quantity) AS averageQuantity,
Count(*) AS Count
FROM sales
Tip

See also: 另请参阅:

Pivot Data数据透视表

In mongosh, create a sample collection named books with the following documents:mongosh中,使用以下文档创建一个名为books的示例集合:

db.books.insertMany([
{ "_id" : 8751, "title" : "The Banquet", "author" : "Dante", "copies" : 2 },
{ "_id" : 8752, "title" : "Divine Comedy", "author" : "Dante", "copies" : 1 },
{ "_id" : 8645, "title" : "Eclogues", "author" : "Dante", "copies" : 2 },
{ "_id" : 7000, "title" : "The Odyssey", "author" : "Homer", "copies" : 10 },
{ "_id" : 7020, "title" : "Iliad", "author" : "Homer", "copies" : 10 }
])

Group title by authorauthor分组title

The following aggregation operation pivots the data in the books collection to have titles grouped by authors.以下聚合操作将books集合中的数据调整为按作者分组的标题。

db.books.aggregate([
{ $group : { _id : "$author", books: { $push: "$title" } } }
])

The operation returns the following documents:该操作返回以下文档:

{ "_id" : "Homer", "books" : [ "The Odyssey", "Iliad" ] }
{ "_id" : "Dante", "books" : [ "The Banquet", "Divine Comedy", "Eclogues" ] }

Group Documents by authorauthor分组文档

The following aggregation operation groups documents by author:以下聚合操作按author对文档进行分组:

db.books.aggregate([
// First Stage
{
$group : { _id : "$author", books: { $push: "$$ROOT" } }
},
// Second Stage
{
$addFields:
{
totalCopies : { $sum: "$books.copies" }
}
}
])
First Stage:第一阶段

$group uses the $$ROOT system variable to group the entire documents by authors. This stage passes the following documents to the next stage:使用$$ROOT系统变量按作者对整个文档进行分组。此阶段将以下文件传递到下一阶段:

{ "_id" : "Homer",
"books" :
[
{ "_id" : 7000, "title" : "The Odyssey", "author" : "Homer", "copies" : 10 },
{ "_id" : 7020, "title" : "Iliad", "author" : "Homer", "copies" : 10 }
]
},
{ "_id" : "Dante",
"books" :
[
{ "_id" : 8751, "title" : "The Banquet", "author" : "Dante", "copies" : 2 },
{ "_id" : 8752, "title" : "Divine Comedy", "author" : "Dante", "copies" : 1 },
{ "_id" : 8645, "title" : "Eclogues", "author" : "Dante", "copies" : 2 }
]
}
Second Stage:第二阶段

$addFields adds a field to the output containing the total copies of books for each author.在输出中添加一个字段,其中包含每个作者的书籍总副本。

Note

The resulting documents must not exceed the BSON Document Size limit of 16 megabytes.生成的文档不得超过BSON文档大小16兆字节的限制。

The operation returns the following documents:该操作返回以下文档:

{
"_id" : "Homer",
"books" :
[
{ "_id" : 7000, "title" : "The Odyssey", "author" : "Homer", "copies" : 10 },
{ "_id" : 7020, "title" : "Iliad", "author" : "Homer", "copies" : 10 }
],
"totalCopies" : 20
}

{
"_id" : "Dante",
"books" :
[
{ "_id" : 8751, "title" : "The Banquet", "author" : "Dante", "copies" : 2 },
{ "_id" : 8752, "title" : "Divine Comedy", "author" : "Dante", "copies" : 1 },
{ "_id" : 8645, "title" : "Eclogues", "author" : "Dante", "copies" : 2 }
],
"totalCopies" : 5
}
Tip

See also: 另请参阅:

$addFields

Additional Resources额外的资源

The Aggregation with the Zip Code Data Set tutorial provides an extensive example of the $group operator in a common use case.使用邮政编码数据集进行聚合教程在常见用例中提供了$group运算符的大量示例。