$group (aggregation)

On this page本页内容

Definition定义

$group

Groups input documents by the specified _id expression and for each distinct grouping, outputs a document. 按指定的_id表达式对输入文档进行分组,并为每个不同的分组输出一个文档。The _id field of each output document contains the unique group by value. 每个输出文档的_id字段包含唯一的group by值。The output documents can also contain computed fields that hold the values of some accumulator expression.输出文档还可以包含保存某些累加器表达式值的计算字段。

Note注意

$group does not order its output documents.不对其输出文档进行排序。

The $group stage has the following prototype form:$group阶段具有以下原型形式:

{
  $group:
    {
      _id: <expression>, // Group By Expression
      <field1>: { <accumulator1> : <expression1> },
      ...
    }
 }
Field字段Description描述
_idRequired.必需。 If you specify an _id value of null, or any other constant value, the $group stage calculates accumulated values for all the input documents as a whole. 如果将_id值指定为null或任何其他常量值,$group阶段将计算所有输入文档作为一个整体的累积值。See example of Group by Null.请参见按空分组的示例
fieldOptional.可选。 Computed using the accumulator operators.使用累加器运算符计算。

The _id and the accumulator operators can accept any valid expression. _id累加器运算符可以接受任何有效表达式。For more information on expressions, see Expressions.有关表达式的详细信息,请参阅表达式

Considerations注意事项

Accumulator Operator累加器运算符

The <accumulator> operator must be one of the following accumulator operators:<accumulator>运算符必须是以下累加器运算符之一:

Changed in version 5.0.在版本5.0中更改

Name名称Description描述
$accumulatorReturns the result of a user-defined accumulator function.返回用户定义的累加器函数的结果。
$addToSet

Returns an array of unique expression values for each group. Order of the array elements is undefined.返回每个组的唯一表达式值数组。数组元素的顺序未定义。

Changed in version 5.0.在版本5.0中更改

Available in $setWindowFields stage.$setWindowFields阶段可用。

$avg

Returns an average of numerical values. Ignores non-numeric values.返回数值的平均值。忽略非数值。

Changed in version 5.0.在版本5.0中更改

Available in $setWindowFields stage.$setWindowFields阶段可用。

$bottom

Returns the bottom element within a group according to the specified sort order.根据指定的排序顺序返回组中的底部元素。

New in version 5.2.在版本5.2中新增

Available in $group and $setWindowFields stages.$group$setWindowFields阶段中可用。

$bottomN

Returns an aggregation of the bottom n fields within a group, according to the specified sort order.根据指定的排序顺序,返回组中底部n个字段的聚合。

New in version 5.2.在版本5.2中新增

Available in $group and $setWindowFields stages.$group$setWindowFields阶段中可用。

$count

Returns the number of documents in a group.返回组中的文档数。

Distinct from the $count pipeline stage.$count管道阶段不同。

New in version 5.0.在版本5.0中新增 Available in $group and $setWindowFields stages.$group$setWindowFields阶段中可用。

$first

Returns a value from the first document for each group. Order is only defined if the documents are sorted.从每个组的第一个文档返回一个值。仅当文档已排序时,才定义顺序。

Distinct from the $first array operator.$first数组运算符不同。

Changed in version 5.0.在版本5.0中更改

Available in $setWindowFields stage.$setWindowFields阶段可用。

$firstN

Returns an aggregation of the first n elements within a group. 返回组中前n个元素的集合。Only meaningful when documents are in a defined order. 仅当文档处于定义的顺序时才有意义。Distinct from the $firstN array operator.$firstN数组运算符不同。

New in version 5.2.在版本5.2中新增 Available in $group, expression and $setWindowFields stages.$group表达式$setWindowFields阶段中可用。

$last

Returns a value from the last document for each group. Order is only defined if the documents are sorted.从每个组的最后一个文档返回一个值。仅当文档已排序时,才定义顺序。

Distinct from the $last array operator.$last数组运算符不同。

Changed in version 5.0.在版本5.0中更改

Available in $setWindowFields stage.$setWindowFields阶段可用。

$lastN

Returns an aggregation of the last n elements within a group. Only meaningful when documents are in a defined order. 返回组中最后n个元素的集合。仅当文档处于定义的顺序时才有意义。Distinct from the $lastN array operator.$lastN数组运算符不同。

New in version 5.2.在版本5.2中新增 Available in $group, expression and $setWindowFields stages.$group表达式$setWindowFields阶段中可用。

$max

Returns the highest expression value for each group.返回每个组的最高表达式值。

Changed in version 5.0.在版本5.0中更改

Available in $setWindowFields stage.$setWindowFields阶段可用。

$maxN

Returns an aggregation of the n maximum valued elements in a group. 返回组中n个最大值元素的集合。Distinct from the $maxN array operator.$maxN数组运算符不同。

New in version 5.2.在版本5.2中新增

Available in $group, $setWindowFields and as an expression.$group$setWindowFields表达式中可用。

$mergeObjectsReturns a document created by combining the input documents for each group.返回通过组合每个组的输入文档创建的文档。
$min

Returns the lowest expression value for each group.返回每个组的最低表达式值。

Changed in version 5.0.在版本5.0中更改

Available in $setWindowFields stage.$setWindowFields阶段可用。

$push

Returns an array of expression values for documents in each group.返回每个组中文档的表达式值数组。

Changed in version 5.0.在版本5.0中更改

Available in $setWindowFields stage.$setWindowFields阶段可用。

$stdDevPop

Returns the population standard deviation of the input values.返回输入值的总体标准偏差。

Changed in version 5.0.在版本5.0中更改

Available in $setWindowFields stage.$setWindowFields阶段可用。

$stdDevSamp

Returns the sample standard deviation of the input values.返回输入值的样本标准偏差。

Changed in version 5.0.在版本5.0中更改

Available in $setWindowFields stage.$setWindowFields阶段可用。

$sum

Returns a sum of numerical values. Ignores non-numeric values.返回数值的总和。忽略非数值。

Changed in version 5.0.在版本5.0中更改

Available in $setWindowFields stage.$setWindowFields阶段可用。

$top

Returns the top element within a group according to the specified sort order.根据指定的排序顺序返回组中的顶部元素。

New in version 5.2.在版本5.2中新增

Available in $group and $setWindowFields stages.$group$setWindowFields阶段中可用。

$topN

Returns an aggregation of the top n fields within a group, according to the specified sort order.根据指定的排序顺序返回组中前n个字段的聚合。

New in version 5.2.在版本5.2中新增

Available in $group and $setWindowFields stages.$group$setWindowFields阶段中可用。

$group and Memory Restrictions和内存限制

The $group stage has a limit of 100 megabytes of RAM. $group阶段的RAM限制为100兆字节。By default, if the stage exceeds this limit, $group returns an error. 默认情况下,如果阶段超过此限制,$group将返回错误。To allow more space for stage processing, use the allowDiskUse option to enable aggregation pipeline stages to write data to temporary files.要为阶段处理提供更多空间,请使用allowDiskUse选项启用聚合管道阶段将数据写入临时文件。

Tip提示

$group Performance Optimizations性能优化

This section describes optimizations to improve the performance of $group. 本节介绍了提高$group性能的优化。There are optimizations that you can make manually and optimizations MongoDB makes internally.您可以手动进行优化,也可以在MongoDB内部进行优化。

Optimization to Return the First Document of Each Group优化以返回每组的第一个文档

If a pipeline sorts and groups by the same field and the $group stage only uses the $first accumulator operator, consider adding an index on the grouped field which matches the sort order. 如果管道按同一字段排序分组$group阶段仅使用$first累加器运算符,请考虑在分组字段上添加与排序顺序匹配的索引In some cases, the $group stage can use the index to quickly find the first document of each group.在某些情况下,$group阶段可以使用索引快速查找每个组的第一个文档。

Example示例

If a collection named foo contains an index { x: 1, y: 1 }, the following pipeline can use that index to find the first document of each group:如果名为foo的集合包含索引{ x: 1, y: 1 },则以下管道可以使用该索引查找每个组的第一个文档:

db.foo.aggregate([
  {
    $sort:{ x : 1, y : 1 }
  },
  {
    $group: {
      _id: { x : "$x" },
      y: { $first : "$y" }
    }
  }
])

Slot-Based Query Execution Engine基于时隙的查询执行引擎

Starting in MongoDB 5.2, MongoDB uses the slot-based execution query engine to execute $group stages when $group is either:从MongoDB 5.2开始,MongoDB使用基于时隙的执行查询引擎来执行$group阶段,当$group为:

  • The first stage in the pipeline.管道中的第一阶段。
  • Part of a series of stages executed by the slot-based engine that occurs at the beginning of the pipeline. 由基于时隙的引擎执行的一系列阶段的一部分,发生在管道的开始处。For example, if a pipeline begins with $match followed by $group, the $match and $group stages are executed by the slot-based engine.例如,如果管道以$match开头,后面跟着$group,则$match$group阶段由基于插槽的引擎执行。

In most cases, the slot-based engine provides improved performance and lower CPU and memory costs compared to the classic query engine.在大多数情况下,与传统查询引擎相比,基于插槽的引擎提供了更好的性能和更低的CPU和内存成本。

See $group Optimization.请参见$group优化

Examples示例

Count the Number of Documents in a Collection统计集合中的文档数

In mongosh, create a sample collection named sales with the following documents:mongosh中,使用以下文档创建名为sales的样本集合:

db.sales.insertMany([
  { "_id" : 1, "item" : "abc", "price" : NumberDecimal("10"), "quantity" : NumberInt("2"), "date" : ISODate("2014-03-01T08:00:00Z") },
  { "_id" : 2, "item" : "jkl", "price" : NumberDecimal("20"), "quantity" : NumberInt("1"), "date" : ISODate("2014-03-01T09:00:00Z") },
  { "_id" : 3, "item" : "xyz", "price" : NumberDecimal("5"), "quantity" : NumberInt( "10"), "date" : ISODate("2014-03-15T09:00:00Z") },
  { "_id" : 4, "item" : "xyz", "price" : NumberDecimal("5"), "quantity" :  NumberInt("20") , "date" : ISODate("2014-04-04T11:21:39.736Z") },
  { "_id" : 5, "item" : "abc", "price" : NumberDecimal("10"), "quantity" : NumberInt("10") , "date" : ISODate("2014-04-04T21:23:13.331Z") },
  { "_id" : 6, "item" : "def", "price" : NumberDecimal("7.5"), "quantity": NumberInt("5" ) , "date" : ISODate("2015-06-04T05:08:13Z") },
  { "_id" : 7, "item" : "def", "price" : NumberDecimal("7.5"), "quantity": NumberInt("10") , "date" : ISODate("2015-09-10T08:43:00Z") },
  { "_id" : 8, "item" : "abc", "price" : NumberDecimal("10"), "quantity" : NumberInt("5" ) , "date" : ISODate("2016-02-06T20:20:13Z") },
])

The following aggregation operation uses the $group stage to count the number of documents in the sales collection:以下聚合操作使用$group阶段计算sales集合中的文档数:

db.sales.aggregate( [
  {
    $group: {
       _id: null,
       count: { $count: { } }
    }
  }
] )

The operation returns the following result:该操作返回以下结果:

{ "_id" : null, "count" : 8 }

This aggregation operation is equivalent to the following SQL statement:此聚合操作等效于以下SQL语句:

SELECT COUNT(*) AS count FROM sales

Retrieve Distinct Values检索不同的值

The following aggregation operation uses the $group stage to retrieve the distinct item values from the sales collection:以下聚合操作使用$group阶段从sales集合中检索不同的项目值:

db.sales.aggregate( [ { $group : { _id : "$item" } } ] )

The operation returns the following result:该操作返回以下结果:

{ "_id" : "abc" }
{ "_id" : "jkl" }
{ "_id" : "def" }
{ "_id" : "xyz" }

Group by Item Having按项目分组

The following aggregation operation groups documents by the item field, calculating the total sale amount per item and returning only the items with total sale amount greater than or equal to 100:以下聚合操作按item字段对文档进行分组,计算每个项目的总销售额,并仅返回总销售额大于或等于100的项目:

db.sales.aggregate(
  [
    // First Stage
    {
      $group :
        {
          _id : "$item",
          totalSaleAmount: { $sum: { $multiply: [ "$price", "$quantity" ] } }
        }
     },
     // Second Stage
     {
       $match: { "totalSaleAmount": { $gte: 100 } }
     }
   ]
 )
First Stage:第一阶段:
The $group stage groups the documents by item to retrieve the distinct item values. $group阶段按项目对文档进行分组,以检索不同的项目值。This stage returns the totalSaleAmount for each item.此阶段返回每个项目的totalSaleAmount(总销售额)。
Second Stage:第二阶段:
The $match stage filters the resulting documents to only return items with a totalSaleAmount greater than or equal to 100.$match阶段筛选结果文档,仅返回totalSaleAmount大于或等于100的项目。

The operation returns the following result:该操作返回以下结果:

{ "_id" : "abc", "totalSaleAmount" : NumberDecimal("170") }
{ "_id" : "xyz", "totalSaleAmount" : NumberDecimal("150") }
{ "_id" : "def", "totalSaleAmount" : NumberDecimal("112.5") }

This aggregation operation is equivalent to the following SQL statement:此聚合操作等效于以下SQL语句:

SELECT item,
   Sum(( price * quantity )) AS totalSaleAmount
FROM   sales
GROUP  BY item
HAVING totalSaleAmount >= 100
Tip提示
See also: 参阅:

Calculate Count, Sum, and Average计算计数、总和和平均值

In mongosh, create a sample collection named sales with the following documents:mongosh中,使用以下文档创建名为sales的样本集合:

db.sales.insertMany([
  { "_id" : 1, "item" : "abc", "price" : NumberDecimal("10"), "quantity" : NumberInt("2"), "date" : ISODate("2014-03-01T08:00:00Z") },
  { "_id" : 2, "item" : "jkl", "price" : NumberDecimal("20"), "quantity" : NumberInt("1"), "date" : ISODate("2014-03-01T09:00:00Z") },
  { "_id" : 3, "item" : "xyz", "price" : NumberDecimal("5"), "quantity" : NumberInt( "10"), "date" : ISODate("2014-03-15T09:00:00Z") },
  { "_id" : 4, "item" : "xyz", "price" : NumberDecimal("5"), "quantity" :  NumberInt("20") , "date" : ISODate("2014-04-04T11:21:39.736Z") },
  { "_id" : 5, "item" : "abc", "price" : NumberDecimal("10"), "quantity" : NumberInt("10") , "date" : ISODate("2014-04-04T21:23:13.331Z") },
  { "_id" : 6, "item" : "def", "price" : NumberDecimal("7.5"), "quantity": NumberInt("5" ) , "date" : ISODate("2015-06-04T05:08:13Z") },
  { "_id" : 7, "item" : "def", "price" : NumberDecimal("7.5"), "quantity": NumberInt("10") , "date" : ISODate("2015-09-10T08:43:00Z") },
  { "_id" : 8, "item" : "abc", "price" : NumberDecimal("10"), "quantity" : NumberInt("5" ) , "date" : ISODate("2016-02-06T20:20:13Z") },
])

Group by Day of the Year按日分组

The following pipeline calculates the total sales amount, average sales quantity, and sale count for each day in the year 2014:以下管道计算2014年每天的总销售额、平均销售额和销售额:

db.sales.aggregate([
  // First Stage
  {
    $match : { "date": { $gte: new ISODate("2014-01-01"), $lt: new ISODate("2015-01-01") } }
  },
  // Second Stage
  {
    $group : {
       _id : { $dateToString: { format: "%Y-%m-%d", date: "$date" } },
       totalSaleAmount: { $sum: { $multiply: [ "$price", "$quantity" ] } },
       averageQuantity: { $avg: "$quantity" },
       count: { $sum: 1 }
    }
  },
  // Third Stage
  {
    $sort : { totalSaleAmount: -1 }
  }
 ])
First Stage:第一阶段:
The $match stage filters the documents to only pass documents from the year 2014 to the next stage.$match阶段筛选文档,仅将2014年的文档传递到下一阶段。
Second Stage:第二阶段:
The $group stage groups the documents by date and calculates the total sale amount, average quantity, and total count of the documents in each group.$group阶段按日期对文档进行分组,并计算每组文档的总销售额、平均数量和总计数。
Third Stage:第三阶段:
The $sort stage sorts the results by the total sale amount for each group in descending order.$sort阶段按每组的总销售额降序对结果进行排序。

The operation returns the following results:该操作返回以下结果:

{ "_id" : "2014-04-04", "totalSaleAmount" : NumberDecimal("200"), "averageQuantity" : 15, "count" : 2 }
{ "_id" : "2014-03-15", "totalSaleAmount" : NumberDecimal("50"), "averageQuantity" : 10, "count" : 1 }
{ "_id" : "2014-03-01", "totalSaleAmount" : NumberDecimal("40"), "averageQuantity" : 1.5, "count" : 2 }

This aggregation operation is equivalent to the following SQL statement:此聚合操作等效于以下SQL语句:

SELECT date,
       Sum(( price * quantity )) AS totalSaleAmount,
       Avg(quantity)
             AS averageQuantity,
       Count(*)
                  AS Count
FROM   sales
GROUP  BY Date(date)
ORDER  BY totalSaleAmount DESC
Tip提示
See also: 参阅:

Group by nullnull分组

The following aggregation operation specifies a group _id of null, calculating the total sale amount, average quantity, and count of all documents in the collection.以下聚合操作指定group_idnull,用于计算集合中所有文档的总销售额、平均数量和计数。

db.sales.aggregate([
  {
    $group : {
       _id : null,
       totalSaleAmount: { $sum: { $multiply: [ "$price", "$quantity" ] } },
       averageQuantity: { $avg: "$quantity" },
       count: { $sum: 1 }
    }
  }
 ])

The operation returns the following result:该操作返回以下结果:

{
  "_id" : null,
  "totalSaleAmount" : NumberDecimal("452.5"),
  "averageQuantity" : 7.875,
  "count" : 8
}

This aggregation operation is equivalent to the following SQL statement:此聚合操作等效于以下SQL语句:

SELECT Sum(price * quantity) AS totalSaleAmount,
       Avg(quantity)
         AS averageQuantity,
       Count(*)
              AS Count
FROM   sales
Tip提示
See also: 参阅:

Pivot Data数据透视

In mongosh, create a sample collection named books with the following documents:mongosh中,使用以下文档创建名为books的示例集合:

db.books.insertMany([
  { "_id" : 8751, "title" : "The Banquet", "author" : "Dante", "copies" : 2 },
  { "_id" : 8752, "title" : "Divine Comedy", "author" : "Dante", "copies" : 1 },
  { "_id" : 8645, "title" : "Eclogues", "author" : "Dante", "copies" : 2 },
  { "_id" : 7000, "title" : "The Odyssey", "author" : "Homer", "copies" : 10 },
  { "_id" : 7020, "title" : "Iliad", "author" : "Homer", "copies" : 10 }
])

Group title by authorauthor分组title

The following aggregation operation pivots the data in the books collection to have titles grouped by authors.以下聚合操作将books集合中的数据透视为按作者分组的标题。

db.books.aggregate([
   { $group : { _id : "$author", books: { $push: "$title" } } }
 ])

The operation returns the following documents:操作将返回以下文档:

{ "_id" : "Homer", "books" : [ "The Odyssey", "Iliad" ] }
{ "_id" : "Dante", "books" : [ "The Banquet", "Divine Comedy", "Eclogues" ] }

Group Documents by authorauthor分组文档

The following aggregation operation groups documents by author:以下聚合操作按author对文档进行分组:

db.books.aggregate([
   // First Stage
   {
     $group : { _id : "$author", books: { $push: "$$ROOT" } }
   },
   // Second Stage
   {
     $addFields:
       {
         totalCopies : { $sum: "$books.copies" }
       }
   }
 ])
First Stage:第一阶段:

$group uses the $$ROOT system variable to group the entire documents by authors. 使用$$ROOT系统变量按作者对整个文档进行分组。This stage passes the following documents to the next stage:本阶段将以下文件传递给下一阶段:

{ "_id" : "Homer",
  "books" :
    [
       { "_id" : 7000, "title" : "The Odyssey", "author" : "Homer", "copies" : 10 },
       { "_id" : 7020, "title" : "Iliad", "author" : "Homer", "copies" : 10 }
    ]
 },
 { "_id" : "Dante",
   "books" :
     [
       { "_id" : 8751, "title" : "The Banquet", "author" : "Dante", "copies" : 2 },
       { "_id" : 8752, "title" : "Divine Comedy", "author" : "Dante", "copies" : 1 },
       { "_id" : 8645, "title" : "Eclogues", "author" : "Dante", "copies" : 2 }
     ]
 }
Second Stage:第二阶段:

$addFields adds a field to the output containing the total copies of books for each author.在输出中添加一个字段,该字段包含每个作者的图书总数。

Note注意

The resulting documents must not exceed the BSON Document Size limit of 16 megabytes.生成的文档不得超过16兆字节的BSON文档大小限制。

The operation returns the following documents:操作将返回以下文档:

{
  "_id" : "Homer",
  "books" :
     [
       { "_id" : 7000, "title" : "The Odyssey", "author" : "Homer", "copies" : 10 },
       { "_id" : 7020, "title" : "Iliad", "author" : "Homer", "copies" : 10 }
     ],
   "totalCopies" : 20
}
{
  "_id" : "Dante",
  "books" :
     [
       { "_id" : 8751, "title" : "The Banquet", "author" : "Dante", "copies" : 2 },
       { "_id" : 8752, "title" : "Divine Comedy", "author" : "Dante", "copies" : 1 },
       { "_id" : 8645, "title" : "Eclogues", "author" : "Dante", "copies" : 2 }
     ],
   "totalCopies" : 5
}
Tip提示
See also: 参阅:

Additional Resources额外资源

The Aggregation with the Zip Code Data Set tutorial provides an extensive example of the $group operator in a common use case.“使用邮政编码数据集进行聚合”教程提供了一个常见用例中的$group运算符的广泛示例。

←  $graphLookup (aggregation)$indexStats (aggregation) →