Definition
$bucket
Categorizes incoming documents into groups, called buckets, based on a specified expression and bucket boundaries, and outputs one document per bucket. Each output document contains an _id field whose value specifies the inclusive lower bound of the bucket. The output option specifies the fields included in each output document.
$bucket only produces output documents for buckets that contain at least one input document.
Considerations
$bucket and Memory Restrictions
The $bucket stage has a limit of 100 megabytes of RAM. By default, if the stage exceeds this limit, $bucket returns an error. To allow more space for stage processing, use the allowDiskUse option to enable aggregation pipeline stages to write data to temporary files.
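As a minimal sketch of the allowDiskUse option described above, the following wraps an aggregate() call so that the option is always passed. The helper name aggregateWithDiskUse is hypothetical, and the pipeline borrows the year_born grouping from the artists example later on this page. The collection argument is assumed to be any object exposing aggregate(pipeline, options), such as a Node.js driver collection.

```javascript
// Hypothetical helper (not part of any MongoDB API): runs a pipeline with
// allowDiskUse enabled so stages may spill to temporary files.
function aggregateWithDiskUse(collection, pipeline) {
  // `collection` is assumed to expose aggregate(pipeline, options).
  return collection.aggregate(pipeline, { allowDiskUse: true });
}

// Example $bucket pipeline; field name and boundaries are taken from the
// artists example on this page.
const diskUsePipeline = [
  {
    $bucket: {
      groupBy: "$year_born",
      boundaries: [1840, 1850, 1860, 1870, 1880],
      default: "Other"
    }
  }
];
```

In mongosh, the equivalent call would be db.artists.aggregate(diskUsePipeline, { allowDiskUse: true }).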
Syntax
{
  $bucket: {
    groupBy: <expression>,
    boundaries: [ <lowerbound1>, <lowerbound2>, ... ],
    default: <literal>,
    output: {
      <output1>: { <$accumulator expression> },
      ...
      <outputN>: { <$accumulator expression> }
    }
  }
}
The $bucket document contains the following fields:
Field | Type | Description |
groupBy | expression | An expression to group documents by. To specify a field path, prefix the field name with a dollar sign $ and enclose it in quotes. Unless $bucket includes a default specification, each input document must resolve the groupBy field path or expression to a value that falls within one of the ranges specified by the boundaries. |
boundaries | array | An array of values based on the groupBy expression that specify the boundaries for each bucket. Each adjacent pair of values acts as the inclusive lower boundary and the exclusive upper boundary for the bucket. You must specify at least two boundaries. The values must be in ascending order and all of the same type, except that mixed numeric types are allowed, such as [ 10, Long(20), Int32(30) ]. For example, an array of [ 0, 5, 10 ] creates two buckets: [0, 5) with inclusive lower bound 0 and exclusive upper bound 5, and [5, 10) with inclusive lower bound 5 and exclusive upper bound 10. |
default | literal | Optional. A literal that specifies the _id of an additional bucket that contains all documents whose groupBy expression result does not fall into a bucket specified by boundaries. If unspecified, each input document must resolve the groupBy expression to a value within one of the bucket ranges specified by boundaries or the operation throws an error. The default value must be less than the lowest boundaries value, or greater than or equal to the highest boundaries value. The default value can be of a different type than the entries in boundaries. |
output | document | Optional. A document that specifies the fields to include in the output documents in addition to the _id field. Each field is computed with an accumulator expression. If you do not specify an output document, the operation returns a count field containing the number of documents in each bucket. If you specify an output document, only the fields specified in the document are returned; that is, the count field is not returned unless it is explicitly included in the output document. |
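The way adjacent boundaries values pair up into buckets can be illustrated in plain JavaScript. This is only a sketch of the rule in the table, not the server's implementation: the function name bucketRanges is hypothetical, and it assumes plain numeric boundaries in strictly ascending order.

```javascript
// Hypothetical helper: lists the [lower, upper) ranges implied by a
// boundaries array. Assumes plain JavaScript numbers.
function bucketRanges(boundaries) {
  if (!Array.isArray(boundaries) || boundaries.length < 2) {
    throw new Error("boundaries must contain at least two values");
  }
  const ranges = [];
  for (let i = 0; i < boundaries.length - 1; i++) {
    const lower = boundaries[i];
    const upper = boundaries[i + 1];
    if (!(lower < upper)) {
      throw new Error("boundaries must be in ascending order");
    }
    // The bucket's _id is its inclusive lower bound; upper is exclusive.
    ranges.push({ _id: lower, lower, upper });
  }
  return ranges;
}

console.log(bucketRanges([0, 5, 10]));
// Two ranges: { _id: 0, lower: 0, upper: 5 } and { _id: 5, lower: 5, upper: 10 }
```

An array of N boundaries therefore yields N - 1 buckets, which is why at least two values are required.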
Behavior
$bucket requires at least one of the following conditions to be met or the operation throws an error:
- Each input document resolves the groupBy expression to a value within one of the bucket ranges specified by boundaries, or
- A default value is specified to bucket documents whose groupBy values are outside of the boundaries or of a different BSON type than the values in boundaries.
If the groupBy expression resolves to an array or a document, $bucket arranges the input documents into buckets using the comparison logic from $sort.
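The two conditions above can be sketched as a small plain-JavaScript function. It is an illustration only: assignBucket and the NO_DEFAULT sentinel are hypothetical names, the sketch handles numeric values only, and it does not reproduce the BSON comparison logic the server uses for arrays or documents.

```javascript
// Hypothetical sentinel meaning "no default was specified".
const NO_DEFAULT = Symbol("no default");

// Returns the _id of the bucket a value would land in, or throws when the
// value fits no bucket and no default is given.
function assignBucket(value, boundaries, defaultId = NO_DEFAULT) {
  if (typeof value === "number") {
    for (let i = 0; i < boundaries.length - 1; i++) {
      // Inclusive lower bound, exclusive upper bound.
      if (value >= boundaries[i] && value < boundaries[i + 1]) {
        return boundaries[i];
      }
    }
  }
  // Out of range, missing, or of a different type than the boundaries.
  if (defaultId === NO_DEFAULT) {
    throw new Error(`value ${String(value)} falls in no bucket and no default is set`);
  }
  return defaultId;
}

const exampleBoundaries = [1840, 1850, 1860, 1870, 1880];
console.log(assignBucket(1868, exampleBoundaries));            // 1860
console.log(assignBucket(1880, exampleBoundaries, "Other"));   // "Other" (upper bound is exclusive)
console.log(assignBucket("1868", exampleBoundaries, "Other")); // "Other" (different type)
```

Calling assignBucket(1900, exampleBoundaries) without a default throws, mirroring the error condition described above.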
Examples
MongoDB Shell
Bucket by Year and Filter by Bucket Results
In mongosh, create a sample collection named artists with the following documents:
db.artists.insertMany([
{ "_id" : 1, "last_name" : "Bernard", "first_name" : "Emil", "year_born" : 1868, "year_died" : 1941, "nationality" : "France" },
{ "_id" : 2, "last_name" : "Rippl-Ronai", "first_name" : "Joszef", "year_born" : 1861, "year_died" : 1927, "nationality" : "Hungary" },
{ "_id" : 3, "last_name" : "Ostroumova", "first_name" : "Anna", "year_born" : 1871, "year_died" : 1955, "nationality" : "Russia" },
{ "_id" : 4, "last_name" : "Van Gogh", "first_name" : "Vincent", "year_born" : 1853, "year_died" : 1890, "nationality" : "Holland" },
{ "_id" : 5, "last_name" : "Maurer", "first_name" : "Alfred", "year_born" : 1868, "year_died" : 1932, "nationality" : "USA" },
{ "_id" : 6, "last_name" : "Munch", "first_name" : "Edvard", "year_born" : 1863, "year_died" : 1944, "nationality" : "Norway" },
{ "_id" : 7, "last_name" : "Redon", "first_name" : "Odilon", "year_born" : 1840, "year_died" : 1916, "nationality" : "France" },
{ "_id" : 8, "last_name" : "Diriks", "first_name" : "Edvard", "year_born" : 1855, "year_died" : 1930, "nationality" : "Norway" }
])
The following operation groups the documents into buckets according to the year_born field and filters based on the count of documents in the buckets:
db.artists.aggregate( [
  // First Stage
  {
    $bucket: {
      groupBy: "$year_born",                        // Field to group by
      boundaries: [ 1840, 1850, 1860, 1870, 1880 ], // Boundaries for the buckets
      default: "Other",                             // Bucket ID for documents which do not fall into a bucket
      output: {                                     // Output for each bucket
        "count": { $sum: 1 },
        "artists" :
          {
            $push: {
              "name": { $concat: [ "$first_name", " ", "$last_name"] },
              "year_born": "$year_born"
            }
          }
      }
    }
  },
  // Second Stage
  {
    $match: { count: {$gt: 3} }
  }
] )
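As a rough cross-check of the pipeline above, the following plain-JavaScript sketch traces the same two steps by hand: grouping by year_born with count and artists accumulators, then keeping buckets with more than 3 documents. It runs without a database; the variable names are illustrative, and only the fields the pipeline reads are copied from the sample documents.

```javascript
// Subset of the artists sample data (only the fields the pipeline uses).
const artistDocs = [
  { first_name: "Emil", last_name: "Bernard", year_born: 1868 },
  { first_name: "Joszef", last_name: "Rippl-Ronai", year_born: 1861 },
  { first_name: "Anna", last_name: "Ostroumova", year_born: 1871 },
  { first_name: "Vincent", last_name: "Van Gogh", year_born: 1853 },
  { first_name: "Alfred", last_name: "Maurer", year_born: 1868 },
  { first_name: "Edvard", last_name: "Munch", year_born: 1863 },
  { first_name: "Odilon", last_name: "Redon", year_born: 1840 },
  { first_name: "Edvard", last_name: "Diriks", year_born: 1855 }
];
const yearBoundaries = [1840, 1850, 1860, 1870, 1880];

// First step: emulate $bucket with count ($sum: 1) and artists ($push).
const bucketsById = new Map();
for (const doc of artistDocs) {
  let id = "Other"; // default bucket
  for (let i = 0; i < yearBoundaries.length - 1; i++) {
    if (doc.year_born >= yearBoundaries[i] && doc.year_born < yearBoundaries[i + 1]) {
      id = yearBoundaries[i];
      break;
    }
  }
  if (!bucketsById.has(id)) {
    bucketsById.set(id, { _id: id, count: 0, artists: [] });
  }
  const bucket = bucketsById.get(id);
  bucket.count += 1;
  bucket.artists.push({ name: `${doc.first_name} ${doc.last_name}`, year_born: doc.year_born });
}

// Second step: emulate $match: { count: { $gt: 3 } }.
const matchedBuckets = [...bucketsById.values()].filter((b) => b.count > 3);
console.log(JSON.stringify(matchedBuckets, null, 2));
```

Only the 1860 bucket, with four artists, survives the filter. No document lands in "Other", so that bucket is never created, consistent with $bucket only producing output for non-empty buckets.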
First Stage
The $bucket stage groups the documents into buckets by the year_born field. The buckets have the following boundaries:
- [1840, 1850) with inclusive lower bound 1840 and exclusive upper bound 1850.
- [1850, 1860) with inclusive lower bound 1850 and exclusive upper bound 1860.
- [1860, 1870) with inclusive lower bound 1860 and exclusive upper bound 1870.
- [1870, 1880) with inclusive lower bound 1870 and exclusive upper bound 1880.
If a document did not contain the year_born field, or its year_born field was outside the ranges above, it would be placed in the default bucket with the _id value "Other".
The stage includes the output document to determine the fields to return:
Field | Description |
_id | Inclusive lower bound of the bucket. |
count | Count of documents in the bucket. |
artists | Array of documents containing information on each artist in the bucket. Each document contains the artist's name (the concatenation of first_name and last_name) and year_born. |
This stage passes the following documents to the next stage:
{ "_id" : 1840, "count" : 1, "artists" : [ { "name" : "Odilon Redon", "year_born" : 1840 } ] }
{ "_id" : 1850, "count" : 2, "artists" : [ { "name" : "Vincent Van Gogh", "year_born" : 1853 },
{ "name" : "Edvard Diriks", "year_born" : 1855 } ] }
{ "_id" : 1860, "count" : 4, "artists" : [ { "name" : "Emil Bernard", "year_born" : 1868 },
{ "name" : "Joszef Rippl-Ronai", "year_born" : 1861 },
{ "name" : "Alfred Maurer", "year_born" : 1868 },
{ "name" : "Edvard Munch", "year_born" : 1863 } ] }
{ "_id" : 1870, "count" : 1, "artists" : [ { "name" : "Anna Ostroumova", "year_born" : 1871 } ] }
Second Stage
The $match stage filters the output from the previous stage to return only buckets that contain more than 3 documents.
The operation returns the following document:
{ "_id" : 1860, "count" : 4, "artists" :
[
{ "name" : "Emil Bernard", "year_born" : 1868 },
{ "name" : "Joszef Rippl-Ronai", "year_born" : 1861 },
{ "name" : "Alfred Maurer", "year_born" : 1868 },
{ "name" : "Edvard Munch", "year_born" : 1863 }
]
}
Use $bucket with $facet to Bucket by Multiple Fields
You can use the $facet stage to perform multiple $bucket aggregations in a single stage.
In mongosh, create a sample collection named artwork with the following documents:
db.artwork.insertMany([
{ "_id" : 1, "title" : "The Pillars of Society", "artist" : "Grosz", "year" : 1926,
"price" : Decimal128("199.99") },
{ "_id" : 2, "title" : "Melancholy III", "artist" : "Munch", "year" : 1902,
"price" : Decimal128("280.00") },
{ "_id" : 3, "title" : "Dancer", "artist" : "Miro", "year" : 1925,
"price" : Decimal128("76.04") },
{ "_id" : 4, "title" : "The Great Wave off Kanagawa", "artist" : "Hokusai",
"price" : Decimal128("167.30") },
{ "_id" : 5, "title" : "The Persistence of Memory", "artist" : "Dali", "year" : 1931,
"price" : Decimal128("483.00") },
{ "_id" : 6, "title" : "Composition VII", "artist" : "Kandinsky", "year" : 1913,
"price" : Decimal128("385.00") },
{ "_id" : 7, "title" : "The Scream", "artist" : "Munch", "year" : 1893
/* No price*/ },
{ "_id" : 8, "title" : "Blue Flower", "artist" : "O'Keefe", "year" : 1918,
"price" : Decimal128("118.42") }
])
The following operation uses two $bucket stages within a $facet stage to create two groupings, one by price and the other by year:
db.artwork.aggregate( [
  {
    $facet: {                               // Top-level $facet stage
      "price": [                            // Output field 1
        {
          $bucket: {
            groupBy: "$price",              // Field to group by
            boundaries: [ 0, 200, 400 ],    // Boundaries for the buckets
            default: "Other",               // Bucket ID for documents which do not fall into a bucket
            output: {                       // Output for each bucket
              "count": { $sum: 1 },
              "artwork" : { $push: { "title": "$title", "price": "$price" } },
              "averagePrice": { $avg: "$price" }
            }
          }
        }
      ],
      "year": [                             // Output field 2
        {
          $bucket: {
            groupBy: "$year",                       // Field to group by
            boundaries: [ 1890, 1910, 1920, 1940 ], // Boundaries for the buckets
            default: "Unknown",                     // Bucket ID for documents which do not fall into a bucket
            output: {                               // Output for each bucket
              "count": { $sum: 1 },
              "artwork": { $push: { "title": "$title", "year": "$year" } }
            }
          }
        }
      ]
    }
  }
] )
First Facet
The first facet groups the input documents by price. The buckets have the following boundaries:
- [0, 200) with inclusive lower bound 0 and exclusive upper bound 200.
- [200, 400) with inclusive lower bound 200 and exclusive upper bound 400.
- "Other", the default bucket containing documents without prices or prices outside the ranges above.
The $bucket stage includes the output document to determine the fields to return:
Field | Description |
_id | Inclusive lower bound of the bucket. |
count | Count of documents in the bucket. |
artwork | Array of documents containing information on each artwork in the bucket. |
averagePrice | Employs the $avg operator to display the average price of all artwork in the bucket. |
Second Facet
The second facet groups the input documents by year. The buckets have the following boundaries:
- [1890, 1910) with inclusive lower bound 1890 and exclusive upper bound 1910.
- [1910, 1920) with inclusive lower bound 1910 and exclusive upper bound 1920.
- [1920, 1940) with inclusive lower bound 1920 and exclusive upper bound 1940.
- "Unknown", the default bucket containing documents without years or years outside the ranges above.
The $bucket stage includes the output document to determine the fields to return:
Field | Description |
count | Count of documents in the bucket. |
artwork | Array of documents containing information on each artwork in the bucket. |
Output
The operation returns the following document:
{
"price" : [ //Output of first facet
{
"_id" : 0,
"count" : 4,
"artwork" : [
{ "title" : "The Pillars of Society", "price" : Decimal128("199.99") },
{ "title" : "Dancer", "price" : Decimal128("76.04") },
{ "title" : "The Great Wave off Kanagawa", "price" : Decimal128("167.30") },
{ "title" : "Blue Flower", "price" : Decimal128("118.42") }
],
"averagePrice" : Decimal128("140.4375")
},
{
"_id" : 200,
"count" : 2,
"artwork" : [
{ "title" : "Melancholy III", "price" : Decimal128("280.00") },
{ "title" : "Composition VII", "price" : Decimal128("385.00") }
],
"averagePrice" : Decimal128("332.50")
},
{
//Includes documents without prices and prices of 400 or more
"_id" : "Other",
"count" : 2,
"artwork" : [
{ "title" : "The Persistence of Memory", "price" : Decimal128("483.00") },
{ "title" : "The Scream" }
],
"averagePrice" : Decimal128("483.00")
}
],
"year" : [ //Output of second facet
{
"_id" : 1890,
"count" : 2,
"artwork" : [
{ "title" : "Melancholy III", "year" : 1902 },
{ "title" : "The Scream", "year" : 1893 }
]
},
{
"_id" : 1910,
"count" : 2,
"artwork" : [
{ "title" : "Composition VII", "year" : 1913 },
{ "title" : "Blue Flower", "year" : 1918 }
]
},
{
"_id" : 1920,
"count" : 3,
"artwork" : [
{ "title" : "The Pillars of Society", "year" : 1926 },
{ "title" : "Dancer", "year" : 1925 },
{ "title" : "The Persistence of Memory", "year" : 1931 }
]
},
{
//Includes documents without a year
"_id" : "Unknown",
"count" : 1,
"artwork" : [
{ "title" : "The Great Wave off Kanagawa" }
]
}
]
}
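The "Other" price bucket above reports a count of 2 but an averagePrice of 483.00, because $avg ignores documents where the field is missing or non-numeric while $sum: 1 counts every document. The following plain-JavaScript sketch reproduces that arithmetic. The helper name avgIgnoringMissing is hypothetical, and prices are written as integer cents (an assumption made here to keep the arithmetic exact; the collection stores Decimal128 values).

```javascript
// Documents that fall into the "Other" price bucket, with prices in cents.
const otherPriceBucket = [
  { title: "The Persistence of Memory", priceCents: 48300 },
  { title: "The Scream" } // no price field
];

// Hypothetical helper mirroring $avg: missing and non-numeric values are skipped.
function avgIgnoringMissing(docs, field) {
  const values = docs.map((d) => d[field]).filter((v) => typeof v === "number");
  if (values.length === 0) return null; // nothing numeric to average
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

const otherCount = otherPriceBucket.length;                                   // like $sum: 1
const otherAverageCents = avgIgnoringMissing(otherPriceBucket, "priceCents"); // like $avg

console.log(otherCount, otherAverageCents); // 2 48300
```

The same helper applied to the four prices in the [0, 200) bucket (19999, 7604, 16730, and 11842 cents) gives 14043.75 cents, matching the 140.4375 shown in the output.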
C#
The C# examples on this page use the sample_mflix database from the Atlas sample datasets. To learn how to create a free MongoDB Atlas cluster and load the sample datasets, see Get Started in the MongoDB .NET/C# Driver documentation.
The following Movie class models the documents in the sample_mflix.movies collection:
public class Movie
{
public ObjectId Id { get; set; }
public int Runtime { get; set; }
public string Title { get; set; }
public string Rated { get; set; }
public List<string> Genres { get; set; }
public string Plot { get; set; }
public ImdbData Imdb { get; set; }
public int Year { get; set; }
public int Index { get; set; }
public string[] Comments { get; set; }
[BsonElement("lastupdated")]
public DateTime LastUpdated { get; set; }
}
Note
ConventionPack for Pascal Case
The C# classes on this page use Pascal case for their property names, but the field names in the MongoDB collection use camel case. To account for this difference, you can use the following code to register a ConventionPack when your application starts:
var camelCaseConvention = new ConventionPack { new CamelCaseElementNameConvention() };
ConventionRegistry.Register("CamelCase", camelCaseConvention, type => true);
To use the MongoDB .NET/C# driver to add a $bucket stage to an aggregation pipeline, call the Bucket() method on a PipelineDefinition object.
The following example creates a pipeline stage that groups incoming documents by the value of their Runtime field, inclusive of the lower boundary and exclusive of the upper boundary:
var pipeline = new EmptyPipelineDefinition<Movie>()
.Bucket(
groupBy: m => m.Runtime,
boundaries: new List<int>() { 0, 71, 91, 121, 151, 201, 999 });
To customize the $bucket operation, pass an AggregateBucketOptions object to the Bucket() method.
The following example performs the same $bucket operation as the previous example, but groups all documents whose Runtime value falls outside the boundaries, such as a value of 999 or greater, into the default bucket, named "Other":
var bucketOptions = new AggregateBucketOptions<BsonValue>()
{
DefaultBucket = (BsonValue)"Other"
};
var pipeline = new EmptyPipelineDefinition<Movie>()
.Bucket(
groupBy: m => m.Runtime,
boundaries: new List<BsonValue>() { 0, 71, 91, 121, 151, 201, 999 },
options: bucketOptions);
Node.js
The Node.js examples on this page use the sample_mflix database from the Atlas sample datasets. To learn how to create a free MongoDB Atlas cluster and load the sample datasets, see Get Started in the MongoDB Node.js driver documentation.
To use the MongoDB Node.js driver to add a $bucket stage to an aggregation pipeline, use the $bucket operator in a pipeline object.
The following example creates a pipeline stage that groups incoming documents by the value of their runtime field, inclusive of the lower boundary and exclusive of the upper boundary. The aggregation stage groups all documents whose runtime value falls outside the boundaries, such as a value of 999 or greater, into the default bucket, named "other". The example then runs the aggregation pipeline:
const pipeline = [
{
$bucket: {
groupBy: "$runtime",
boundaries: [0, 17, 91, 121, 151, 201, 999],
default: "other"
}
}
];
const cursor = collection.aggregate(pipeline);
return cursor;
Learn More
To learn more about related pipeline stages, see the $bucketAuto guide.