$bucket
Categorizes incoming documents into groups, called buckets, based on a specified expression and bucket boundaries, and outputs one document per bucket. Each output document contains an _id field whose value specifies the inclusive lower bound of the bucket. The output option specifies the fields included in each output document.
$bucket only produces output documents for buckets that contain at least one input document.
The $bucket stage has a limit of 100 megabytes of RAM. By default, if the stage exceeds this limit, $bucket returns an error. To allow more space for stage processing, use the allowDiskUse option to enable aggregation pipeline stages to write data to temporary files.
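As a minimal sketch of where allowDiskUse goes: it is an option of the aggregate() call, not a field of the $bucket stage. The collection name artists and the field year_born below are placeholders for illustration, and the mongosh call is shown only as a comment because it needs a running deployment.

```javascript
// Pipeline with a single $bucket stage (placeholder field and boundaries).
const pipeline = [
  {
    $bucket: {
      groupBy: "$year_born",
      boundaries: [1840, 1850, 1860, 1870, 1880],
      default: "Other"
    }
  }
];

// allowDiskUse belongs in the options document passed alongside the pipeline.
const options = { allowDiskUse: true };

// In mongosh, against a collection assumed to be named "artists":
//   db.artists.aggregate(pipeline, options)
console.log(JSON.stringify({ pipeline, options }, null, 2));
```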
{
$bucket: {
groupBy: <expression>,
boundaries: [ <lowerbound1>, <lowerbound2>, ... ],
default: <literal>,
output: {
<output1>: { <$accumulator expression> },
...
<outputN>: { <$accumulator expression> }
}
}
}
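The $bucket document contains the following fields:
Field | Type | Description
groupBy | expression | The expression to group documents by.
boundaries | array | The values that delimit the buckets. Each adjacent pair of values is the inclusive lower bound and the exclusive upper bound of one bucket.
default | literal | The _id of the bucket that receives documents which do not fall into a bucket specified by boundaries.
output | document | The fields to include in each output document, specified with accumulator expressions.
The output document has the following form:
<outputfield1>: { <accumulator>: <expression1> },
...
<outputfieldN>: { <accumulator>: <expressionN> }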
$bucket requires at least one of the following conditions to be met or the operation throws an error:
Each input document resolves the groupBy expression to a value within one of the bucket ranges specified by boundaries, or
A default value is specified to bucket documents whose groupBy values are outside of the boundaries or of a different BSON type than the values in boundaries.
If the groupBy expression resolves to an array or a document, $bucket arranges the input documents into buckets using the comparison logic from $sort.
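The two conditions above can be sketched in plain JavaScript. This is an illustration only, assuming numeric groupBy values: the helper findBucketId is hypothetical, and the real stage compares arbitrary BSON values rather than JavaScript numbers.

```javascript
// Hypothetical helper: pick the bucket _id for one groupBy value.
// Handles numbers only; any other type is treated as "outside the boundaries".
function findBucketId(value, boundaries, defaultId) {
  for (let i = 0; i < boundaries.length - 1; i++) {
    // Each bucket covers [boundaries[i], boundaries[i + 1]).
    if (typeof value === "number" && value >= boundaries[i] && value < boundaries[i + 1]) {
      return boundaries[i];
    }
  }
  // No bucket matched: fall back to the default, or fail if none was given.
  if (defaultId === undefined) {
    throw new Error("groupBy value " + String(value) + " is outside boundaries and no default is set");
  }
  return defaultId;
}

const boundaries = [0, 200, 400];
const inRange = findBucketId(150, boundaries);              // lower bound 0
const onEdge = findBucketId(200, boundaries);               // lower bound 200 (inclusive)
const fallback = findBucketId(483, boundaries, "Other");    // default bucket
const wrongType = findBucketId("n/a", boundaries, "Other"); // default bucket
console.log(inRange, onEdge, fallback, wrongType);
```

Calling findBucketId(483, boundaries) without a default throws, which mirrors the error case described above.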
In mongosh, create a sample collection named artists with the following documents:
db.artists.insertMany([
{ "_id" : 1, "last_name" : "Bernard", "first_name" : "Emil", "year_born" : 1868, "year_died" : 1941, "nationality" : "France" },
{ "_id" : 2, "last_name" : "Rippl-Ronai", "first_name" : "Joszef", "year_born" : 1861, "year_died" : 1927, "nationality" : "Hungary" },
{ "_id" : 3, "last_name" : "Ostroumova", "first_name" : "Anna", "year_born" : 1871, "year_died" : 1955, "nationality" : "Russia" },
{ "_id" : 4, "last_name" : "Van Gogh", "first_name" : "Vincent", "year_born" : 1853, "year_died" : 1890, "nationality" : "Holland" },
{ "_id" : 5, "last_name" : "Maurer", "first_name" : "Alfred", "year_born" : 1868, "year_died" : 1932, "nationality" : "USA" },
{ "_id" : 6, "last_name" : "Munch", "first_name" : "Edvard", "year_born" : 1863, "year_died" : 1944, "nationality" : "Norway" },
{ "_id" : 7, "last_name" : "Redon", "first_name" : "Odilon", "year_born" : 1840, "year_died" : 1916, "nationality" : "France" },
{ "_id" : 8, "last_name" : "Diriks", "first_name" : "Edvard", "year_born" : 1855, "year_died" : 1930, "nationality" : "Norway" }
])
The following operation groups the documents into buckets according to the year_born field and filters based on the count of documents in the buckets:
db.artists.aggregate( [
  // First Stage
  {
    $bucket: {
      groupBy: "$year_born",                        // Field to group by
      boundaries: [ 1840, 1850, 1860, 1870, 1880 ], // Boundaries for the buckets
      default: "Other",                             // Bucket id for documents which do not fall into a bucket
      output: {                                     // Output for each bucket
        "count": { $sum: 1 },
        "artists" : {
          $push: {
            "name": { $concat: [ "$first_name", " ", "$last_name"] },
            "year_born": "$year_born"
          }
        }
      }
    }
  },
  // Second Stage
  {
    $match: { count: {$gt: 3} }
  }
] )
The $bucket stage groups the documents into buckets by the year_born field. The buckets have the following boundaries:
[1840, 1850) with inclusive lower bound 1840 and exclusive upper bound 1850.
[1850, 1860) with inclusive lower bound 1850 and exclusive upper bound 1860.
[1860, 1870) with inclusive lower bound 1860 and exclusive upper bound 1870.
[1870, 1880) with inclusive lower bound 1870 and exclusive upper bound 1880.
If a document did not contain the year_born field or its year_born field was outside the ranges above, it would be placed in the default bucket with the _id value "Other".
The stage includes the output document to determine the fields to return:
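Field | Description
_id | Inclusive lower bound of the bucket.
count | Number of documents in the bucket.
artists | Array of documents, one per artist in the bucket. Each document contains the artist's name (the first_name and last_name joined by a space) and year_born.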
This stage passes the following documents to the next stage:
{ "_id" : 1840, "count" : 1, "artists" : [ { "name" : "Odilon Redon", "year_born" : 1840 } ] }
{ "_id" : 1850, "count" : 2, "artists" : [ { "name" : "Vincent Van Gogh", "year_born" : 1853 },
{ "name" : "Edvard Diriks", "year_born" : 1855 } ] }
{ "_id" : 1860, "count" : 4, "artists" : [ { "name" : "Emil Bernard", "year_born" : 1868 },
{ "name" : "Joszef Rippl-Ronai", "year_born" : 1861 },
{ "name" : "Alfred Maurer", "year_born" : 1868 },
{ "name" : "Edvard Munch", "year_born" : 1863 } ] }
{ "_id" : 1870, "count" : 1, "artists" : [ { "name" : "Anna Ostroumova", "year_born" : 1871 } ] }
The $match stage filters the output from the previous stage to only return buckets which contain more than 3 documents.
The operation returns the following document:
{ "_id" : 1860, "count" : 4, "artists" :
[
{ "name" : "Emil Bernard", "year_born" : 1868 },
{ "name" : "Joszef Rippl-Ronai", "year_born" : 1861 },
{ "name" : "Alfred Maurer", "year_born" : 1868 },
{ "name" : "Edvard Munch", "year_born" : 1863 }
]
}
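To make the two stages concrete, here is a plain JavaScript sketch that mimics them on the same sample data, without a server. It is an illustration of the grouping and filtering logic under the assumption of numeric year_born values, not how the server implements $bucket; only the fields used by the pipeline are kept.

```javascript
const artists = [
  { first_name: "Emil", last_name: "Bernard", year_born: 1868 },
  { first_name: "Joszef", last_name: "Rippl-Ronai", year_born: 1861 },
  { first_name: "Anna", last_name: "Ostroumova", year_born: 1871 },
  { first_name: "Vincent", last_name: "Van Gogh", year_born: 1853 },
  { first_name: "Alfred", last_name: "Maurer", year_born: 1868 },
  { first_name: "Edvard", last_name: "Munch", year_born: 1863 },
  { first_name: "Odilon", last_name: "Redon", year_born: 1840 },
  { first_name: "Edvard", last_name: "Diriks", year_born: 1855 }
];
const boundaries = [1840, 1850, 1860, 1870, 1880];
const upper = boundaries[boundaries.length - 1];

// Stage 1: group into buckets, accumulating count ($sum: 1) and artists ($push).
const buckets = new Map();
for (const doc of artists) {
  // Largest lower bound that is <= year_born (undefined if there is none).
  const lower = boundaries.slice(0, -1).filter((b) => b <= doc.year_born).pop();
  const id = lower !== undefined && doc.year_born < upper ? lower : "Other";
  if (!buckets.has(id)) buckets.set(id, { _id: id, count: 0, artists: [] });
  const bucket = buckets.get(id);
  bucket.count += 1;
  bucket.artists.push({ name: doc.first_name + " " + doc.last_name, year_born: doc.year_born });
}

// Stage 2: keep buckets with more than 3 documents ($match: { count: { $gt: 3 } }).
const result = [...buckets.values()].filter((b) => b.count > 3);
console.log(JSON.stringify(result, null, 2));
```

Only the bucket with _id 1860 survives the filter, matching the document returned by the operation above.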
Use $bucket with $facet to Bucket by Multiple Fields
You can use the $facet stage to perform multiple $bucket aggregations in a single stage.
In mongosh, create a sample collection named artwork with the following documents:
db.artwork.insertMany([
{ "_id" : 1, "title" : "The Pillars of Society", "artist" : "Grosz", "year" : 1926,
"price" : NumberDecimal("199.99") },
{ "_id" : 2, "title" : "Melancholy III", "artist" : "Munch", "year" : 1902,
"price" : NumberDecimal("280.00") },
{ "_id" : 3, "title" : "Dancer", "artist" : "Miro", "year" : 1925,
"price" : NumberDecimal("76.04") },
{ "_id" : 4, "title" : "The Great Wave off Kanagawa", "artist" : "Hokusai",
"price" : NumberDecimal("167.30") },
{ "_id" : 5, "title" : "The Persistence of Memory", "artist" : "Dali", "year" : 1931,
"price" : NumberDecimal("483.00") },
{ "_id" : 6, "title" : "Composition VII", "artist" : "Kandinsky", "year" : 1913,
"price" : NumberDecimal("385.00") },
{ "_id" : 7, "title" : "The Scream", "artist" : "Munch", "year" : 1893
/* No price*/ },
{ "_id" : 8, "title" : "Blue Flower", "artist" : "O'Keefe", "year" : 1918,
"price" : NumberDecimal("118.42") }
])
The following operation uses two $bucket stages within a $facet stage to create two groupings, one by price and the other by year:
db.artwork.aggregate( [
{
$facet: {
// Top-level $facet stage
"price": [
// Output field 1
{
$bucket: {
groupBy: "$price",
// Field to group by
boundaries: [ 0, 200, 400 ], // Boundaries for the buckets
default: "Other",
// Bucket id for documents which do not fall into a bucket
output: {
// Output for each bucket
"count": { $sum: 1 },
"artwork" : { $push: { "title": "$title", "price": "$price" } },
"averagePrice": { $avg: "$price" }
}
}
}
],
"year": [
// Output field 2
{
$bucket: {
groupBy: "$year",
// Field to group by
boundaries: [ 1890, 1910, 1920, 1940 ], // Boundaries for the buckets
default: "Unknown",
// Bucket id for documents which do not fall into a bucket
output: {
// Output for each bucket
"count": { $sum: 1 },
"artwork": { $push: { "title": "$title", "year": "$year" } }
}
}
}
]
}
}
] )
The first facet groups the input documents by price. The buckets have the following boundaries:
[0, 200) with inclusive lower bound 0 and exclusive upper bound 200.
[200, 400) with inclusive lower bound 200 and exclusive upper bound 400.
"Other", the default bucket containing documents without prices or prices outside the ranges above.
The $bucket stage includes the output document to determine the fields to return:
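Field | Description
_id | Inclusive lower bound of the bucket.
count | Number of documents in the bucket.
artwork | Array of documents containing the title and price of each artwork in the bucket.
averagePrice | Uses the $avg operator to display the average price of all artwork in the bucket.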
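One detail worth illustrating is that $avg skips documents where the field is missing or non-numeric, which is why a bucket that mixes priced and unpriced artwork still reports an average. The sketch below mimics that behavior in plain JavaScript; the helper avgIgnoringMissing is hypothetical, and it uses ordinary numbers where the sample collection uses NumberDecimal.

```javascript
// Hypothetical helper mirroring how $avg skips missing and non-numeric values.
function avgIgnoringMissing(docs, field) {
  const nums = docs.map((d) => d[field]).filter((v) => typeof v === "number");
  // With nothing numeric to average, $avg yields null.
  if (nums.length === 0) return null;
  return nums.reduce((sum, v) => sum + v, 0) / nums.length;
}

// The two documents that end up in the default price bucket of this example.
const otherBucket = [
  { title: "The Persistence of Memory", price: 483.0 },
  { title: "The Scream" } // no price field
];

const averagePrice = avgIgnoringMissing(otherBucket, "price");
console.log(averagePrice);
```

Here the average is taken over the single priced document, so the unpriced one affects count but not averagePrice.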
The second facet groups the input documents by year. The buckets have the following boundaries:
[1890, 1910) with inclusive lower bound 1890 and exclusive upper bound 1910.
[1910, 1920) with inclusive lower bound 1910 and exclusive upper bound 1920.
[1920, 1940) with inclusive lower bound 1920 and exclusive upper bound 1940.
"Unknown", the default bucket containing documents without years or years outside the ranges above.
The $bucket stage includes the output document to determine the fields to return:
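Field | Description
count | Number of documents in the bucket.
artwork | Array of documents containing the title and year of each artwork in the bucket.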
The operation returns the following document:
{
"price" : [ // Output of first facet
{
"_id" : 0,
"count" : 4,
"artwork" : [
{ "title" : "The Pillars of Society", "price" : NumberDecimal("199.99") },
{ "title" : "Dancer", "price" : NumberDecimal("76.04") },
{ "title" : "The Great Wave off Kanagawa", "price" : NumberDecimal("167.30") },
{ "title" : "Blue Flower", "price" : NumberDecimal("118.42") }
],
"averagePrice" : NumberDecimal("140.4375")
},
{
"_id" : 200,
"count" : 2,
"artwork" : [
{ "title" : "Melancholy III", "price" : NumberDecimal("280.00") },
{ "title" : "Composition VII", "price" : NumberDecimal("385.00") }
],
"averagePrice" : NumberDecimal("332.50")
},
{
// Includes documents without prices and prices greater than 400
"_id" : "Other",
"count" : 2,
"artwork" : [
{ "title" : "The Persistence of Memory", "price" : NumberDecimal("483.00") },
{ "title" : "The Scream" }
],
"averagePrice" : NumberDecimal("483.00")
}
],
"year" : [ // Output of second facet
{
"_id" : 1890,
"count" : 2,
"artwork" : [
{ "title" : "Melancholy III", "year" : 1902 },
{ "title" : "The Scream", "year" : 1893 }
]
},
{
"_id" : 1910,
"count" : 2,
"artwork" : [
{ "title" : "Composition VII", "year" : 1913 },
{ "title" : "Blue Flower", "year" : 1918 }
]
},
{
"_id" : 1920,
"count" : 3,
"artwork" : [
{ "title" : "The Pillars of Society", "year" : 1926 },
{ "title" : "Dancer", "year" : 1925 },
{ "title" : "The Persistence of Memory", "year" : 1931 }
]
},
{
// Includes documents without a year
"_id" : "Unknown",
"count" : 1,
"artwork" : [
{ "title" : "The Great Wave off Kanagawa" }
]
}
]
}
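The shape of this result follows from how $facet works: each sub-pipeline receives the same input documents, and its output array is stored under its own field. The sketch below mimics that in plain JavaScript with counts only. The helper bucketCounts is a hypothetical stand-in for a numeric $bucket, prices are plain numbers rather than NumberDecimal, and buckets appear in order of first occurrence rather than sorted by _id.

```javascript
// Hypothetical stand-in for a numeric $bucket that only counts documents.
function bucketCounts(docs, field, boundaries, defaultId) {
  const counts = new Map();
  for (const doc of docs) {
    const v = doc[field];
    let id = defaultId; // used when the value is missing or out of range
    for (let i = 0; i < boundaries.length - 1; i++) {
      if (typeof v === "number" && v >= boundaries[i] && v < boundaries[i + 1]) {
        id = boundaries[i];
      }
    }
    counts.set(id, (counts.get(id) || 0) + 1);
  }
  return [...counts].map(([_id, count]) => ({ _id, count }));
}

const artwork = [
  { title: "The Pillars of Society", year: 1926, price: 199.99 },
  { title: "Melancholy III", year: 1902, price: 280.0 },
  { title: "Dancer", year: 1925, price: 76.04 },
  { title: "The Great Wave off Kanagawa", price: 167.3 },
  { title: "The Persistence of Memory", year: 1931, price: 483.0 },
  { title: "Composition VII", year: 1913, price: 385.0 },
  { title: "The Scream", year: 1893 },
  { title: "Blue Flower", year: 1918, price: 118.42 }
];

// Like $facet: both groupings read the same input and fill separate fields.
const facets = {
  price: bucketCounts(artwork, "price", [0, 200, 400], "Other"),
  year: bucketCounts(artwork, "year", [1890, 1910, 1920, 1940], "Unknown")
};
console.log(JSON.stringify(facets, null, 2));
```

The counts agree with the count fields in the document above: a document missing price still contributes to the year facet, and the reverse.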