Definition
$group
The $group stage combines multiple documents with the same field, fields, or expression into a single document according to a group key. The result is one document per unique group key.
A group key is often a field, or group of fields. The group key can also be the result of an expression. Use the _id field in the $group pipeline stage to set the group key. See below for usage examples.
In the $group stage output, the _id field is set to the group key for that document.
The output documents can also contain additional fields that are set using accumulator expressions.
Note
$group does not order its output documents.
Compatibility
You can use $group for deployments hosted in the following environments:
- MongoDB Atlas: The fully managed service for MongoDB deployments in the cloud
- MongoDB Enterprise: The subscription-based, self-managed version of MongoDB
- MongoDB Community: The source-available, free-to-use, and self-managed version of MongoDB
Syntax
The $group stage has the following prototype form:
{
  $group:
    {
      _id: <expression>, // Group key
      <field1>: { <accumulator1> : <expression1> },
      ...
    }
}
Field | Description
_id | The _id expression specifies the group key. If you specify an _id value of null, or any other constant value, the $group stage returns a single document that aggregates values across all of the input documents.
field | Optional. Computed using the accumulator operators.
The _id and the accumulator operators can accept any valid expression. For more information on expressions, see Expressions.
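To make the prototype form concrete, the following plain-JavaScript sketch imitates the grouping semantics on an in-memory array. It is an illustration only, not how MongoDB implements $group; the helper name groupSum and the sample documents are hypothetical. It shows one output document per unique key, and a single output document when the key is the constant null.

```javascript
// Hypothetical helper (not a MongoDB API): imitates
// { $group: { _id: <keyFn>, total: { $sum: <valueFn> } } } on an in-memory array.
function groupSum(docs, keyFn, valueFn) {
  const groups = new Map();
  for (const doc of docs) {
    const key = keyFn(doc);
    groups.set(key, (groups.get(key) ?? 0) + valueFn(doc));
  }
  // One output document per unique group key.
  return [...groups].map(([key, total]) => ({ _id: key, total }));
}

// Made-up sample documents for illustration.
const sampleDocs = [
  { item: "abc", quantity: 2 },
  { item: "xyz", quantity: 10 },
  { item: "abc", quantity: 5 },
];

// Group key taken from a field: one document per item.
const byItem = groupSum(sampleDocs, (d) => d.item, (d) => d.quantity);
// Constant group key (null): a single document covering all input.
const overall = groupSum(sampleDocs, () => null, (d) => d.quantity);

console.log(byItem);
console.log(overall);
```

The sketch returns groups in insertion order, but $group itself does not order its output documents.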
Considerations
Performance
$group is a blocking stage, which causes the pipeline to wait for all input data to be retrieved for the blocking stage before processing the data. A blocking stage may reduce performance because it reduces parallel processing for a pipeline with multiple stages. A blocking stage may also use substantial amounts of memory for large data sets.
Accumulator Operator
The <accumulator> operator must be one of the following accumulator operators:
Name | Description
$accumulator |
$addToSet |
$avg |
$bottom |
$bottomN | Returns an aggregation of the bottom n fields within a group, according to the specified sort order.
$concatArrays |
$count |
$first |
$firstN |
$last |
$lastN |
$max |
$maxN | Returns an aggregation of the n maximum valued elements in a group. Distinct from the $maxN array operator.
$median |
$mergeObjects |
$min |
$minN | Returns an aggregation of the n minimum valued elements in a group. Distinct from the $minN array operator.
$percentile |
$push |
$setUnion |
$stdDevPop |
$stdDevSamp |
$sum |
$top |
$topN | Returns an aggregation of the top n fields within a group, according to the specified sort order.
$group and Memory Restrictions
If the $group stage exceeds 100 megabytes of RAM, MongoDB writes data to temporary files. However, if the allowDiskUse option is set to false, $group returns an error. For more information, refer to Aggregation Pipeline Limits.
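As a minimal sketch of where the allowDiskUse option goes, the snippet below builds a pipeline and an options object. The sales collection and the totalQuantity field are assumptions for illustration, and the mongosh call is left as a comment so the snippet runs without a server.

```javascript
// Pipeline with a single $group stage (field names are illustrative).
const pipeline = [
  { $group: { _id: "$item", totalQuantity: { $sum: "$quantity" } } },
];

// allowDiskUse is the option named above. Setting it to false makes a
// $group stage that exceeds the memory limit return an error instead of
// writing to temporary files.
const options = { allowDiskUse: true };

// In mongosh, assuming a collection named sales:
// db.sales.aggregate(pipeline, options)
console.log(JSON.stringify({ pipeline, options }));
```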
$group Performance Optimizations
This section describes optimizations to improve the performance of $group. There are optimizations that you can make manually and optimizations MongoDB makes internally.
Optimization to Return the First or Last Document of Each Group
If a pipeline sorts and groups by the same field and the $group stage only uses the $first or $last accumulator operator, consider adding an index on the grouped field which matches the sort order. In some cases, the $group stage can use the index to quickly find the first or last document of each group.
Example
If a collection named foo contains an index { x: 1, y: 1 }, the following pipeline can use that index to find the first document of each group:
db.foo.aggregate([
  {
    $sort:{ x : 1, y : 1 }
  },
  {
    $group: {
      _id: { x : "$x" },
      y: { $first : "$y" }
    }
  }
])
Slot-Based Query Execution Engine
Starting in version 5.2, MongoDB uses the slot-based query execution engine to execute $group stages if either:
- $group is the first stage in the pipeline.
- All preceding stages in the pipeline can also be executed by the slot-based execution engine.
For more information, see $group Optimization.
Examples
MongoDB Shell
Count the Number of Documents in a Collection
In mongosh, create a sample collection named sales with the following documents:
db.sales.insertMany([
{ "_id" : 1, "item" : "abc", "price" : Decimal128("10"), "quantity" : Int32("2"), "date" : ISODate("2014-03-01T08:00:00Z") },
{ "_id" : 2, "item" : "jkl", "price" : Decimal128("20"), "quantity" : Int32("1"), "date" : ISODate("2014-03-01T09:00:00Z") },
{ "_id" : 3, "item" : "xyz", "price" : Decimal128("5"), "quantity" : Int32( "10"), "date" : ISODate("2014-03-15T09:00:00Z") },
{ "_id" : 4, "item" : "xyz", "price" : Decimal128("5"), "quantity" : Int32("20") , "date" : ISODate("2014-04-04T11:21:39.736Z") },
{ "_id" : 5, "item" : "abc", "price" : Decimal128("10"), "quantity" : Int32("10") , "date" : ISODate("2014-04-04T21:23:13.331Z") },
{ "_id" : 6, "item" : "def", "price" : Decimal128("7.5"), "quantity": Int32("5" ) , "date" : ISODate("2015-06-04T05:08:13Z") },
{ "_id" : 7, "item" : "def", "price" : Decimal128("7.5"), "quantity": Int32("10") , "date" : ISODate("2015-09-10T08:43:00Z") },
{ "_id" : 8, "item" : "abc", "price" : Decimal128("10"), "quantity" : Int32("5" ) , "date" : ISODate("2016-02-06T20:20:13Z") },
])
The following aggregation operation uses the $group stage to count the number of documents in the sales collection:
db.sales.aggregate( [
  {
    $group: {
      _id: null,
      count: { $count: { } }
    }
  }
] )
The operation returns the following result:
{ "_id" : null, "count" : 8 }
This aggregation operation is equivalent to the following SQL statement:
SELECT COUNT(*) AS count FROM sales
Retrieve Distinct Values
The following aggregation operation uses the $group stage to retrieve the distinct item values from the sales collection:
db.sales.aggregate( [ { $group : { _id : "$item" } } ] )
The operation returns the following result:该操作返回以下结果:
{ "_id" : "abc" }
{ "_id" : "jkl" }
{ "_id" : "def" }
{ "_id" : "xyz" }
Note
$group operations of the following form can result in a DISTINCT_SCAN:
{ $group : { _id : "$<field>" } }
For more information on behavior for retrieving distinct values, see the distinct command behavior.
To see whether your operation results in a DISTINCT_SCAN, check your operation's explain results.
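One way to check explain results programmatically is to search the returned document for a stage name. The sketch below is only an illustration: planUsesStage is a hypothetical helper, and sampleExplain is a hand-written stand-in, since the real explain layout differs between server versions and deployments.

```javascript
// Hypothetical helper: recursively searches any nested object or array
// for an entry whose "stage" property equals stageName.
function planUsesStage(node, stageName) {
  if (Array.isArray(node)) {
    return node.some((child) => planUsesStage(child, stageName));
  }
  if (node !== null && typeof node === "object") {
    if (node.stage === stageName) return true;
    return Object.values(node).some((child) => planUsesStage(child, stageName));
  }
  return false;
}

// Hand-written stand-in for explain output, for illustration only.
const sampleExplain = {
  queryPlanner: {
    winningPlan: {
      stage: "PROJECTION_COVERED",
      inputStage: { stage: "DISTINCT_SCAN", indexName: "item_1" },
    },
  },
};

const usesDistinctScan = planUsesStage(sampleExplain, "DISTINCT_SCAN");
const usesCollScan = planUsesStage(sampleExplain, "COLLSCAN");
console.log(usesDistinctScan, usesCollScan);

// In mongosh you might pass real explain output instead, for example:
// planUsesStage(db.sales.explain().aggregate([ { $group: { _id: "$item" } } ]), "DISTINCT_SCAN")
```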
Group by Item Having
The following aggregation operation groups documents by the item field, calculating the total sale amount per item and returning only the items with total sale amount greater than or equal to 100:
db.sales.aggregate(
  [
    // First Stage
    {
      $group :
        {
          _id : "$item",
          totalSaleAmount: { $sum: { $multiply: [ "$price", "$quantity" ] } }
        }
    },
    // Second Stage
    {
      $match: { "totalSaleAmount": { $gte: 100 } }
    }
  ]
)
First Stage: The $group stage groups the documents by item to retrieve the distinct item values. This stage returns the totalSaleAmount for each item.
Second Stage: The $match stage filters the resulting documents to only return items with a totalSaleAmount greater than or equal to 100.
The operation returns the following result:
{ "_id" : "abc", "totalSaleAmount" : Decimal128("170") }
{ "_id" : "xyz", "totalSaleAmount" : Decimal128("150") }
{ "_id" : "def", "totalSaleAmount" : Decimal128("112.5") }
This aggregation operation is equivalent to the following SQL statement:
SELECT item,
   Sum(( price * quantity )) AS totalSaleAmount
FROM sales
GROUP BY item
HAVING totalSaleAmount >= 100
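The totals in this example can be checked by hand. The following plain-JavaScript sketch repeats the arithmetic on the sample data outside MongoDB, assuming ordinary numbers in place of Decimal128 and Int32; the variable names are illustrative.

```javascript
// Sample sales data with plain numbers standing in for Decimal128/Int32.
const sales = [
  { item: "abc", price: 10, quantity: 2 },
  { item: "jkl", price: 20, quantity: 1 },
  { item: "xyz", price: 5, quantity: 10 },
  { item: "xyz", price: 5, quantity: 20 },
  { item: "abc", price: 10, quantity: 10 },
  { item: "def", price: 7.5, quantity: 5 },
  { item: "def", price: 7.5, quantity: 10 },
  { item: "abc", price: 10, quantity: 5 },
];

// First stage: sum price * quantity per item (like $group with $sum).
const totalsByItem = new Map();
for (const { item, price, quantity } of sales) {
  totalsByItem.set(item, (totalsByItem.get(item) ?? 0) + price * quantity);
}

// Second stage: keep only totals of at least 100 (like $match with $gte).
const bigSellers = [...totalsByItem]
  .map(([item, totalSaleAmount]) => ({ _id: item, totalSaleAmount }))
  .filter((doc) => doc.totalSaleAmount >= 100);

console.log(bigSellers);
```

The jkl group totals 20, so the filter drops it, which matches the three documents returned above.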
Calculate Count, Sum, and Average
In mongosh, create a sample collection named sales with the following documents:
db.sales.insertMany([
{ "_id" : 1, "item" : "abc", "price" : Decimal128("10"), "quantity" : Int32("2"), "date" : ISODate("2014-03-01T08:00:00Z") },
{ "_id" : 2, "item" : "jkl", "price" : Decimal128("20"), "quantity" : Int32("1"), "date" : ISODate("2014-03-01T09:00:00Z") },
{ "_id" : 3, "item" : "xyz", "price" : Decimal128("5"), "quantity" : Int32( "10"), "date" : ISODate("2014-03-15T09:00:00Z") },
{ "_id" : 4, "item" : "xyz", "price" : Decimal128("5"), "quantity" : Int32("20") , "date" : ISODate("2014-04-04T11:21:39.736Z") },
{ "_id" : 5, "item" : "abc", "price" : Decimal128("10"), "quantity" : Int32("10") , "date" : ISODate("2014-04-04T21:23:13.331Z") },
{ "_id" : 6, "item" : "def", "price" : Decimal128("7.5"), "quantity": Int32("5" ) , "date" : ISODate("2015-06-04T05:08:13Z") },
{ "_id" : 7, "item" : "def", "price" : Decimal128("7.5"), "quantity": Int32("10") , "date" : ISODate("2015-09-10T08:43:00Z") },
{ "_id" : 8, "item" : "abc", "price" : Decimal128("10"), "quantity" : Int32("5" ) , "date" : ISODate("2016-02-06T20:20:13Z") },
])
Group by Day of the Year
The following pipeline calculates the total sales amount, average sales quantity, and sale count for each day in the year 2014:
db.sales.aggregate([
  // First Stage
  {
    $match : { "date": { $gte: new ISODate("2014-01-01"), $lt: new ISODate("2015-01-01") } }
  },
  // Second Stage
  {
    $group : {
      _id : { $dateToString: { format: "%Y-%m-%d", date: "$date" } },
      totalSaleAmount: { $sum: { $multiply: [ "$price", "$quantity" ] } },
      averageQuantity: { $avg: "$quantity" },
      count: { $sum: 1 }
    }
  },
  // Third Stage
  {
    $sort : { totalSaleAmount: -1 }
  }
])
First Stage: The $match stage filters the documents to only pass documents from the year 2014 to the next stage.
Second Stage: The $group stage groups the documents by date and calculates the total sale amount, average quantity, and total count of the documents in each group.
Third Stage: The $sort stage sorts the results by the total sale amount for each group in descending order.
The operation returns the following results:
{
"_id" : "2014-04-04",
"totalSaleAmount" : Decimal128("200"),
"averageQuantity" : 15, "count" : 2
}
{
"_id" : "2014-03-15",
"totalSaleAmount" : Decimal128("50"),
"averageQuantity" : 10, "count" : 1
}
{
"_id" : "2014-03-01",
"totalSaleAmount" : Decimal128("40"),
"averageQuantity" : 1.5, "count" : 2
}
This aggregation operation is equivalent to the following SQL statement:
SELECT date,
       Sum(( price * quantity )) AS totalSaleAmount,
       Avg(quantity)             AS averageQuantity,
       Count(*)                  AS Count
FROM sales
WHERE date >= '01/01/2014' AND date < '01/01/2015'
GROUP BY date
ORDER BY totalSaleAmount DESC
Tip
See also: $match, $sort, and db.collection.countDocuments(), which wraps the $group aggregation stage with a $sum expression.
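The three stages of the day-of-year pipeline can be traced in plain JavaScript. This is a rough sketch under several assumptions: it uses a subset of the sample data, plain numbers in place of Decimal128 and Int32, and the UTC date portion of toISOString() as a stand-in for the "%Y-%m-%d" group key.

```javascript
// Subset of the sample data; the 2015 document should be filtered out.
const saleDocs = [
  { price: 10, quantity: 2, date: new Date("2014-03-01T08:00:00Z") },
  { price: 20, quantity: 1, date: new Date("2014-03-01T09:00:00Z") },
  { price: 5, quantity: 10, date: new Date("2014-03-15T09:00:00Z") },
  { price: 5, quantity: 20, date: new Date("2014-04-04T11:21:39.736Z") },
  { price: 10, quantity: 10, date: new Date("2014-04-04T21:23:13.331Z") },
  { price: 7.5, quantity: 5, date: new Date("2015-06-04T05:08:13Z") },
];

const rangeStart = new Date("2014-01-01T00:00:00Z");
const rangeEnd = new Date("2015-01-01T00:00:00Z");

const byDay = new Map();
for (const { price, quantity, date } of saleDocs) {
  // First stage: keep only documents dated in 2014.
  if (date < rangeStart || date >= rangeEnd) continue;
  // Second stage: derive the day key and update the running accumulators.
  const day = date.toISOString().slice(0, 10);
  const group = byDay.get(day) ?? { totalSaleAmount: 0, quantitySum: 0, count: 0 };
  group.totalSaleAmount += price * quantity;
  group.quantitySum += quantity;
  group.count += 1;
  byDay.set(day, group);
}

// Third stage: sort by total sale amount, descending.
const dailyReport = [...byDay]
  .map(([day, g]) => ({
    _id: day,
    totalSaleAmount: g.totalSaleAmount,
    averageQuantity: g.quantitySum / g.count,
    count: g.count,
  }))
  .sort((a, b) => b.totalSaleAmount - a.totalSaleAmount);

console.log(dailyReport);
```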
Group by null
The following aggregation operation specifies a group _id of null, calculating the total sale amount, average quantity, and count of all documents in the collection.
db.sales.aggregate([
  {
    $group : {
      _id : null,
      totalSaleAmount: { $sum: { $multiply: [ "$price", "$quantity" ] } },
      averageQuantity: { $avg: "$quantity" },
      count: { $sum: 1 }
    }
  }
])
The operation returns the following result:
{
"_id" : null,
"totalSaleAmount" : Decimal128("452.5"),
"averageQuantity" : 7.875,
"count" : 8
}
This aggregation operation is equivalent to the following SQL statement:
SELECT Sum(price * quantity) AS totalSaleAmount,
       Avg(quantity)         AS averageQuantity,
       Count(*)              AS Count
FROM sales
Tip
See also: $count, and db.collection.countDocuments(), which wraps the $group aggregation stage with a $sum expression.
Pivot Data
In mongosh, create a sample collection named books with the following documents:
db.books.insertMany([
{ "_id" : 8751, "title" : "The Banquet", "author" : "Dante", "copies" : 2 },
{ "_id" : 8752, "title" : "Divine Comedy", "author" : "Dante", "copies" : 1 },
{ "_id" : 8645, "title" : "Eclogues", "author" : "Dante", "copies" : 2 },
{ "_id" : 7000, "title" : "The Odyssey", "author" : "Homer", "copies" : 10 },
{ "_id" : 7020, "title" : "Iliad", "author" : "Homer", "copies" : 10 }
])
Group title by author
The following aggregation operation pivots the data in the books collection to have titles grouped by authors.
db.books.aggregate([
  { $group : { _id : "$author", books: { $push: "$title" } } }
])
The operation returns the following documents:
{ "_id" : "Homer", "books" : [ "The Odyssey", "Iliad" ] }
{ "_id" : "Dante", "books" : [ "The Banquet", "Divine Comedy", "Eclogues" ] }
Group Documents by author
The following aggregation operation groups documents by author:
db.books.aggregate([
  // First Stage
  {
    $group : { _id : "$author", books: { $push: "$$ROOT" } }
  },
  // Second Stage
  {
    $addFields:
      {
        totalCopies : { $sum: "$books.copies" }
      }
  }
])
First Stage: $group uses the $$ROOT system variable to group the entire documents by authors. This stage passes the following documents to the next stage:
{ "_id" : "Homer",
"books" :
[
{ "_id" : 7000, "title" : "The Odyssey", "author" : "Homer", "copies" : 10 },
{ "_id" : 7020, "title" : "Iliad", "author" : "Homer", "copies" : 10 }
]
},
{ "_id" : "Dante",
"books" :
[
{ "_id" : 8751, "title" : "The Banquet", "author" : "Dante", "copies" : 2 },
{ "_id" : 8752, "title" : "Divine Comedy", "author" : "Dante", "copies" : 1 },
{ "_id" : 8645, "title" : "Eclogues", "author" : "Dante", "copies" : 2 }
]
}
Second Stage: $addFields adds a field to the output containing the total copies of books for each author.
Note
The resulting documents must not exceed the BSON Document Size limit of 16 mebibytes.
The operation returns the following documents:
{
"_id" : "Homer",
"books" :
[
{ "_id" : 7000, "title" : "The Odyssey", "author" : "Homer", "copies" : 10 },
{ "_id" : 7020, "title" : "Iliad", "author" : "Homer", "copies" : 10 }
],
"totalCopies" : 20
}
{
"_id" : "Dante",
"books" :
[
{ "_id" : 8751, "title" : "The Banquet", "author" : "Dante", "copies" : 2 },
{ "_id" : 8752, "title" : "Divine Comedy", "author" : "Dante", "copies" : 1 },
{ "_id" : 8645, "title" : "Eclogues", "author" : "Dante", "copies" : 2 }
],
"totalCopies" : 5
}
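The two stages of the author example can also be imitated in plain JavaScript. The sketch below is illustrative only: it collects whole documents per author, as $push with $$ROOT does, then sums the copies field of each collected array, as the $addFields stage does.

```javascript
const books = [
  { _id: 8751, title: "The Banquet", author: "Dante", copies: 2 },
  { _id: 8752, title: "Divine Comedy", author: "Dante", copies: 1 },
  { _id: 8645, title: "Eclogues", author: "Dante", copies: 2 },
  { _id: 7000, title: "The Odyssey", author: "Homer", copies: 10 },
  { _id: 7020, title: "Iliad", author: "Homer", copies: 10 },
];

// First stage: push each whole document into its author's array.
const booksByAuthor = new Map();
for (const book of books) {
  if (!booksByAuthor.has(book.author)) booksByAuthor.set(book.author, []);
  booksByAuthor.get(book.author).push(book);
}

// Second stage: add totalCopies by summing copies across each array.
const authors = [...booksByAuthor].map(([author, list]) => ({
  _id: author,
  books: list,
  totalCopies: list.reduce((sum, b) => sum + b.copies, 0),
}));

console.log(JSON.stringify(authors, null, 2));
```

The sketch lists authors in insertion order, which can differ from the order MongoDB returns because $group does not order its output.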
C#
The C# examples on this page use the sample_mflix database from the Atlas sample datasets. To learn how to create a free MongoDB Atlas cluster and load the sample datasets, see Get Started in the MongoDB .NET/C# Driver documentation.
The following Movie class models the documents in the sample_mflix.movies collection:
public class Movie
{
    public ObjectId Id { get; set; }
    public int Runtime { get; set; }
    public string Title { get; set; }
    public string Rated { get; set; }
    public List<string> Genres { get; set; }
    public string Plot { get; set; }
    public ImdbData Imdb { get; set; }
    public int Year { get; set; }
    public int Index { get; set; }
    public string[] Comments { get; set; }
    [BsonElement("lastupdated")]
    public DateTime LastUpdated { get; set; }
}
Note
ConventionPack for Pascal Case
The C# classes on this page use Pascal case for their property names, but the field names in the MongoDB collection use camel case. To account for this difference, you can use the following code to register a ConventionPack when your application starts:
var camelCaseConvention = new ConventionPack { new CamelCaseElementNameConvention() };
ConventionRegistry.Register("CamelCase", camelCaseConvention, type => true);
To use the MongoDB .NET/C# driver to add a $group stage to an aggregation pipeline, call the Group() method on a PipelineDefinition object.
The following example creates a pipeline stage that groups documents by the value of their Rated field. Each group's rating is shown in a field named Rating in each output document. Each output document also contains a field named TotalRuntime, whose value is the total runtime of all movies in the group.
var pipeline = new EmptyPipelineDefinition<Movie>()
    .Group(
        id: m => m.Rated,
        group: g => new
        {
            Rating = g.Key,
            TotalRuntime = g.Sum(m => m.Runtime)
        }
    );
Node.js
The Node.js examples on this page use the sample_mflix database from the Atlas sample datasets. To learn how to create a free MongoDB Atlas cluster and load the sample datasets, see Get Started in the MongoDB Node.js driver documentation.
To use the MongoDB Node.js driver to add a $group stage to an aggregation pipeline, use the $group operator in a pipeline object.
The following example creates a pipeline stage that groups documents by the value of their rated field. Each output document contains a rating field that stores each group's rating. Each output document also contains a field named totalRuntime that stores the total runtime of all movies in the group. The example then runs the aggregation pipeline:
const pipeline = [
  {
    $group: {
      _id: "$rated",
      rating: { $first: "$rated" },
      totalRuntime: { $sum: "$runtime" }
    }
  }
];
const cursor = collection.aggregate(pipeline);
return cursor;
Learn More
The Group and Total Data tutorial provides an extensive example of the $group operator in a common use case.
To learn more about related pipeline stages, see the $addFields guide.