$bucket (aggregation)
Definition
$bucket
Categorizes incoming documents into groups, called buckets, based on a specified expression and bucket boundaries, and outputs one document per bucket. Each output document contains an _id field whose value specifies the inclusive lower bound of the bucket. The output option specifies the fields included in each output document.
$bucket only produces output documents for buckets that contain at least one input document.
Considerations
$bucket and Memory Restrictions
The $bucket stage has a limit of 100 megabytes of RAM. By default, if the stage exceeds this limit, $bucket returns an error. To allow more space for stage processing, use the allowDiskUse option to enable aggregation pipeline stages to write data to temporary files.
Syntax
{
  $bucket: {
    groupBy: <expression>,
    boundaries: [ <lowerbound1>, <lowerbound2>, ... ],
    default: <literal>,
    output: {
      <output1>: { <$accumulator expression> },
      ...
      <outputN>: { <$accumulator expression> }
    }
  }
}
The $bucket document contains the following fields:
Field | Type | Description
groupBy | expression | An expression to group documents by. To specify a field path, prefix the field name with a dollar sign $ and enclose it in quotes. Unless $bucket includes a default specification, each input document must resolve the groupBy field path or expression to a value that falls within one of the ranges specified by the boundaries.
boundaries | array | An array of values based on the groupBy expression that specify the boundaries for each bucket. Each adjacent pair of values acts as the inclusive lower bound and the exclusive upper bound of a bucket. You must specify at least two boundaries. The values must be in ascending order and all of the same type, except that numeric types may be mixed, such as [ 10, NumberLong(20), NumberInt(30) ]. Example: an array of [ 0, 5, 10 ] creates two buckets: [0, 5) with inclusive lower bound 0 and exclusive upper bound 5, and [5, 10) with inclusive lower bound 5 and exclusive upper bound 10.
default | literal | Optional. A literal that specifies the _id of an additional bucket that contains all documents whose groupBy expression result does not fall into a bucket specified by boundaries. If unspecified, each input document must resolve the groupBy expression to a value within one of the bucket ranges specified by boundaries or the operation throws an error. The default value must be less than the lowest boundaries value, or greater than or equal to the highest boundaries value. The default value can be of a different type than the entries in boundaries.
output | document | Optional. A document that specifies the fields to include in the output documents in addition to the _id field. To specify a field to include, use accumulator expressions of the form <outputfield1>: { <accumulator>: <expression1> }, ... If you do not specify an output document, the operation returns a count field containing the number of documents in each bucket. If you specify an output document, only the fields specified in the document are returned; that is, the count field is not returned unless it is explicitly included in the output document.
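The rules in the field table can be checked on the client before a pipeline is sent. The following is a minimal sketch in plain JavaScript, not part of MongoDB: checkBucketSpec is a hypothetical helper, it compares types with JavaScript's typeof only (so it knows nothing about BSON types such as NumberLong), and it assumes that equal adjacent boundaries count as "not ascending".

```javascript
// Hypothetical helper (not a MongoDB API): returns a list of problems found
// in a $bucket specification, based on the rules in the field table.
function checkBucketSpec(spec) {
  const problems = [];
  const b = spec.boundaries;

  // At least two boundaries are required to form one bucket.
  if (!Array.isArray(b) || b.length < 2) {
    problems.push("boundaries must contain at least two values");
    return problems;
  }

  // All boundaries must share a type (only JavaScript typeof is compared).
  const type = typeof b[0];
  if (!b.every((v) => typeof v === type)) {
    problems.push("boundaries must all be of the same type");
  }

  // Boundaries must be in ascending order (assumption: strictly ascending).
  for (let i = 1; i < b.length; i++) {
    if (!(b[i - 1] < b[i])) {
      problems.push(`boundaries[${i}] is not greater than boundaries[${i - 1}]`);
    }
  }

  // A default of the same type must lie outside [lowest, highest).
  if ("default" in spec && typeof spec.default === type) {
    const d = spec.default;
    if (d >= b[0] && d < b[b.length - 1]) {
      problems.push("default must be < lowest or >= highest boundary");
    }
  }
  return problems;
}

console.log(checkBucketSpec({ boundaries: [0, 200, 400], default: 100 }));
```

A default of a different type, such as the string "Other" used with numeric boundaries, passes the last check because the table allows the default to differ in type from the boundaries.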
Behavior
$bucket requires at least one of the following conditions to be met or the operation throws an error:
- Each input document resolves the groupBy expression to a value within one of the bucket ranges specified by boundaries, or
- A default value is specified to bucket documents whose groupBy values are outside of the boundaries or of a different BSON type than the values in boundaries.
If the groupBy expression resolves to an array or a document, $bucket arranges the input documents into buckets using the comparison logic from $sort.
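The two conditions above can be illustrated with a small plain-JavaScript sketch. It is an approximation, not the server's implementation: bucketIdFor is a hypothetical name, and it handles only numeric values, so the $sort comparison logic for arrays and documents is not modeled.

```javascript
// Hypothetical helper: returns the _id of the bucket that `value` falls into,
// using half-open ranges [lower, upper). Only numbers are matched against
// the boundaries; anything else goes to the default bucket.
function bucketIdFor(value, boundaries, defaultId) {
  if (typeof value === "number") {
    for (let i = 0; i < boundaries.length - 1; i++) {
      if (value >= boundaries[i] && value < boundaries[i + 1]) {
        return boundaries[i]; // the inclusive lower bound is the bucket _id
      }
    }
  }
  // No range matched: a default is required, otherwise the operation fails.
  if (defaultId === undefined) {
    throw new Error("value is outside boundaries and no default was given");
  }
  return defaultId;
}

console.log(bucketIdFor(5, [0, 5, 10]));           // lower bound is inclusive
console.log(bucketIdFor(10, [0, 5, 10], "Other")); // upper bound is exclusive
```

Calling bucketIdFor(10, [0, 5, 10]) without a default throws, which mirrors the error described above for documents that match no bucket.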
Examples
Bucket by Year and Filter by Bucket Results
In mongosh, create a sample collection named artists with the following documents:
db.artists.insertMany([
{ "_id" : 1, "last_name" : "Bernard", "first_name" : "Emil", "year_born" : 1868, "year_died" : 1941, "nationality" : "France" },
{ "_id" : 2, "last_name" : "Rippl-Ronai", "first_name" : "Joszef", "year_born" : 1861, "year_died" : 1927, "nationality" : "Hungary" },
{ "_id" : 3, "last_name" : "Ostroumova", "first_name" : "Anna", "year_born" : 1871, "year_died" : 1955, "nationality" : "Russia" },
{ "_id" : 4, "last_name" : "Van Gogh", "first_name" : "Vincent", "year_born" : 1853, "year_died" : 1890, "nationality" : "Holland" },
{ "_id" : 5, "last_name" : "Maurer", "first_name" : "Alfred", "year_born" : 1868, "year_died" : 1932, "nationality" : "USA" },
{ "_id" : 6, "last_name" : "Munch", "first_name" : "Edvard", "year_born" : 1863, "year_died" : 1944, "nationality" : "Norway" },
{ "_id" : 7, "last_name" : "Redon", "first_name" : "Odilon", "year_born" : 1840, "year_died" : 1916, "nationality" : "France" },
{ "_id" : 8, "last_name" : "Diriks", "first_name" : "Edvard", "year_born" : 1855, "year_died" : 1930, "nationality" : "Norway" }
])
The following operation groups the documents into buckets according to the year_born field and filters based on the count of documents in the buckets:
db.artists.aggregate( [
  // First Stage
  {
    $bucket: {
      groupBy: "$year_born",                        // Field to group by
      boundaries: [ 1840, 1850, 1860, 1870, 1880 ], // Boundaries for the buckets
      default: "Other",                             // Bucket ID for documents which do not fall into a bucket
      output: {                                     // Output for each bucket
        "count": { $sum: 1 },
        "artists" :
          {
            $push: {
              "name": { $concat: [ "$first_name", " ", "$last_name"] },
              "year_born": "$year_born"
            }
          }
      }
    }
  },
  // Second Stage
  {
    $match: { count: {$gt: 3} }
  }
] )
First Stage
The $bucket stage groups the documents into buckets by the year_born field. The buckets have the following boundaries:
- [1840, 1850) with inclusive lower bound 1840 and exclusive upper bound 1850.
- [1850, 1860) with inclusive lower bound 1850 and exclusive upper bound 1860.
- [1860, 1870) with inclusive lower bound 1860 and exclusive upper bound 1870.
- [1870, 1880) with inclusive lower bound 1870 and exclusive upper bound 1880.
If a document did not contain the year_born field or its year_born field was outside the ranges above, it would be placed in the default bucket with the _id value "Other".
The stage includes the output document to determine the fields to return:
Field | Description
_id | Inclusive lower bound of the bucket.
count | Count of documents in the bucket.
artists | Array of documents containing information on each artist in the bucket. Each document contains the artist's name (the $concat of first_name and last_name) and year_born.
This stage passes the following documents to the next stage:
{ "_id" : 1840, "count" : 1, "artists" : [ { "name" : "Odilon Redon", "year_born" : 1840 } ] }
{ "_id" : 1850, "count" : 2, "artists" : [ { "name" : "Vincent Van Gogh", "year_born" : 1853 },
{ "name" : "Edvard Diriks", "year_born" : 1855 } ] }
{ "_id" : 1860, "count" : 4, "artists" : [ { "name" : "Emil Bernard", "year_born" : 1868 },
{ "name" : "Joszef Rippl-Ronai", "year_born" : 1861 },
{ "name" : "Alfred Maurer", "year_born" : 1868 },
{ "name" : "Edvard Munch", "year_born" : 1863 } ] }
{ "_id" : 1870, "count" : 1, "artists" : [ { "name" : "Anna Ostroumova", "year_born" : 1871 } ] }
Second Stage
The $match stage filters the output from the previous stage to only return buckets which contain more than 3 documents.
The operation returns the following document:
{ "_id" : 1860, "count" : 4, "artists" :
[
{ "name" : "Emil Bernard", "year_born" : 1868 },
{ "name" : "Joszef Rippl-Ronai", "year_born" : 1861 },
{ "name" : "Alfred Maurer", "year_born" : 1868 },
{ "name" : "Edvard Munch", "year_born" : 1863 }
]
}
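To make the two stages concrete without a running server, the same grouping and filtering can be approximated in plain JavaScript. This is only a sketch: bucketByYear is a hypothetical function, the sample data is reduced to the fields the pipeline reads, and buckets come out in first-seen order rather than sorted by _id.

```javascript
// Sample data reduced to the fields used by the pipeline above.
const artists = [
  { first_name: "Emil", last_name: "Bernard", year_born: 1868 },
  { first_name: "Joszef", last_name: "Rippl-Ronai", year_born: 1861 },
  { first_name: "Anna", last_name: "Ostroumova", year_born: 1871 },
  { first_name: "Vincent", last_name: "Van Gogh", year_born: 1853 },
  { first_name: "Alfred", last_name: "Maurer", year_born: 1868 },
  { first_name: "Edvard", last_name: "Munch", year_born: 1863 },
  { first_name: "Odilon", last_name: "Redon", year_born: 1840 },
  { first_name: "Edvard", last_name: "Diriks", year_born: 1855 },
];

const boundaries = [1840, 1850, 1860, 1870, 1880];

// First stage (hypothetical stand-in for $bucket): group by year_born into
// half-open ranges; documents that match no range go to "Other".
function bucketByYear(docs) {
  const buckets = new Map();
  for (const doc of docs) {
    let id = "Other";
    for (let i = 0; i < boundaries.length - 1; i++) {
      if (doc.year_born >= boundaries[i] && doc.year_born < boundaries[i + 1]) {
        id = boundaries[i];
      }
    }
    if (!buckets.has(id)) buckets.set(id, { _id: id, count: 0, artists: [] });
    const bucket = buckets.get(id);
    bucket.count += 1; // plays the role of { $sum: 1 }
    bucket.artists.push({
      name: `${doc.first_name} ${doc.last_name}`, // plays the role of $concat
      year_born: doc.year_born,
    });
  }
  return [...buckets.values()];
}

// Second stage (stand-in for $match): keep buckets with more than 3 documents.
const result = bucketByYear(artists).filter((b) => b.count > 3);
console.log(JSON.stringify(result, null, 2));
```

Only the bucket with _id 1860 survives the filter, matching the document returned by the aggregation.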
Use $bucket with $facet to Bucket by Multiple Fields
You can use the $facet stage to perform multiple $bucket aggregations in a single stage.
In mongosh, create a sample collection named artwork with the following documents:
db.artwork.insertMany([
{ "_id" : 1, "title" : "The Pillars of Society", "artist" : "Grosz", "year" : 1926,
"price" : NumberDecimal("199.99") },
{ "_id" : 2, "title" : "Melancholy III", "artist" : "Munch", "year" : 1902,
"price" : NumberDecimal("280.00") },
{ "_id" : 3, "title" : "Dancer", "artist" : "Miro", "year" : 1925,
"price" : NumberDecimal("76.04") },
{ "_id" : 4, "title" : "The Great Wave off Kanagawa", "artist" : "Hokusai",
"price" : NumberDecimal("167.30") },
{ "_id" : 5, "title" : "The Persistence of Memory", "artist" : "Dali", "year" : 1931,
"price" : NumberDecimal("483.00") },
{ "_id" : 6, "title" : "Composition VII", "artist" : "Kandinsky", "year" : 1913,
"price" : NumberDecimal("385.00") },
{ "_id" : 7, "title" : "The Scream", "artist" : "Munch", "year" : 1893
/* No price*/ },
{ "_id" : 8, "title" : "Blue Flower", "artist" : "O'Keefe", "year" : 1918,
"price" : NumberDecimal("118.42") }
])
The following operation uses two $bucket stages within a $facet stage to create two groupings, one by price and the other by year:
db.artwork.aggregate( [
  {
    $facet: {                                 // Top-level $facet stage
      "price": [                              // Output field 1
        {
          $bucket: {
            groupBy: "$price",                // Field to group by
            boundaries: [ 0, 200, 400 ],      // Boundaries for the buckets
            default: "Other",                 // Bucket ID for documents which do not fall into a bucket
            output: {                         // Output for each bucket
              "count": { $sum: 1 },
              "artwork" : { $push: { "title": "$title", "price": "$price" } },
              "averagePrice": { $avg: "$price" }
            }
          }
        }
      ],
      "year": [                               // Output field 2
        {
          $bucket: {
            groupBy: "$year",                         // Field to group by
            boundaries: [ 1890, 1910, 1920, 1940 ],   // Boundaries for the buckets
            default: "Unknown",                       // Bucket ID for documents which do not fall into a bucket
            output: {                                 // Output for each bucket
              "count": { $sum: 1 },
              "artwork": { $push: { "title": "$title", "year": "$year" } }
            }
          }
        }
      ]
    }
  }
] )
First Facet
The first facet groups the input documents by price. The buckets have the following boundaries:
- [0, 200) with inclusive lower bound 0 and exclusive upper bound 200.
- [200, 400) with inclusive lower bound 200 and exclusive upper bound 400.
- "Other", the default bucket containing documents without prices or prices outside the ranges above.
The $bucket stage includes the output document to determine the fields to return:
Field | Description
_id | Inclusive lower bound of the bucket.
count | Count of documents in the bucket.
artwork | Array of documents containing information on each artwork in the bucket.
averagePrice | Uses the $avg operator to compute the average price of all artwork in the bucket.
Second Facet
The second facet groups the input documents by year. The buckets have the following boundaries:
- [1890, 1910) with inclusive lower bound 1890 and exclusive upper bound 1910.
- [1910, 1920) with inclusive lower bound 1910 and exclusive upper bound 1920.
- [1920, 1940) with inclusive lower bound 1920 and exclusive upper bound 1940.
- "Unknown", the default bucket containing documents without years or years outside the ranges above.
The $bucket stage includes the output document to determine the fields to return:
Field | Description
count | Count of documents in the bucket.
artwork | Array of documents containing information on each artwork in the bucket.
Output
The operation returns the following document:
{
"price" : [ // Output of first facet
{
"_id" : 0,
"count" : 4,
"artwork" : [
{ "title" : "The Pillars of Society", "price" : NumberDecimal("199.99") },
{ "title" : "Dancer", "price" : NumberDecimal("76.04") },
{ "title" : "The Great Wave off Kanagawa", "price" : NumberDecimal("167.30") },
{ "title" : "Blue Flower", "price" : NumberDecimal("118.42") }
],
"averagePrice" : NumberDecimal("140.4375")
},
{
"_id" : 200,
"count" : 2,
"artwork" : [
{ "title" : "Melancholy III", "price" : NumberDecimal("280.00") },
{ "title" : "Composition VII", "price" : NumberDecimal("385.00") }
],
"averagePrice" : NumberDecimal("332.50")
},
{
// Includes documents without prices and prices of 400 or more
"_id" : "Other",
"count" : 2,
"artwork" : [
{ "title" : "The Persistence of Memory", "price" : NumberDecimal("483.00") },
{ "title" : "The Scream" }
],
"averagePrice" : NumberDecimal("483.00")
}
],
"year" : [ // Output of second facet
{
"_id" : 1890,
"count" : 2,
"artwork" : [
{ "title" : "Melancholy III", "year" : 1902 },
{ "title" : "The Scream", "year" : 1893 }
]
},
{
"_id" : 1910,
"count" : 2,
"artwork" : [
{ "title" : "Composition VII", "year" : 1913 },
{ "title" : "Blue Flower", "year" : 1918 }
]
},
{
"_id" : 1920,
"count" : 3,
"artwork" : [
{ "title" : "The Pillars of Society", "year" : 1926 },
{ "title" : "Dancer", "year" : 1925 },
{ "title" : "The Persistence of Memory", "year" : 1931 }
]
},
{
// Includes documents without a year
"_id" : "Unknown",
"count" : 1,
"artwork" : [
{ "title" : "The Great Wave off Kanagawa" }
]
}
]
}
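The counts and averages in this output can be cross-checked with a plain-JavaScript approximation of the two facets. This is a sketch under stated assumptions: bucket is a hypothetical helper, prices are ordinary JavaScript numbers instead of NumberDecimal (so averages are subject to floating-point rounding), and buckets appear in first-seen order rather than sorted by _id.

```javascript
// Sample data reduced to the fields the facets read; prices are plain numbers.
const artwork = [
  { title: "The Pillars of Society", year: 1926, price: 199.99 },
  { title: "Melancholy III", year: 1902, price: 280.0 },
  { title: "Dancer", year: 1925, price: 76.04 },
  { title: "The Great Wave off Kanagawa", price: 167.3 },
  { title: "The Persistence of Memory", year: 1931, price: 483.0 },
  { title: "Composition VII", year: 1913, price: 385.0 },
  { title: "The Scream", year: 1893 },
  { title: "Blue Flower", year: 1918, price: 118.42 },
];

// Hypothetical helper: groups docs by doc[field] into half-open ranges;
// documents that match no range (including missing fields) go to defaultId.
function bucket(docs, field, boundaries, defaultId) {
  const groups = new Map();
  for (const doc of docs) {
    const value = doc[field];
    let id = defaultId;
    for (let i = 0; i < boundaries.length - 1; i++) {
      if (value >= boundaries[i] && value < boundaries[i + 1]) id = boundaries[i];
    }
    if (!groups.has(id)) groups.set(id, []);
    groups.get(id).push(doc);
  }
  return groups;
}

// Each facet sees the full, unmodified input, as with $facet.
const facets = {
  price: [...bucket(artwork, "price", [0, 200, 400], "Other")].map(([id, docs]) => {
    // Like $avg, documents without a price are left out of the average.
    const prices = docs.map((d) => d.price).filter((p) => p !== undefined);
    return {
      _id: id,
      count: docs.length,
      averagePrice: prices.reduce((a, b) => a + b, 0) / prices.length,
    };
  }),
  year: [...bucket(artwork, "year", [1890, 1910, 1920, 1940], "Unknown")].map(
    ([id, docs]) => ({ _id: id, count: docs.length })
  ),
};

console.log(JSON.stringify(facets, null, 2));
```

Note how "The Scream" is counted in the "Other" price bucket but does not affect that bucket's average, which is why the output above reports an averagePrice of 483.00 for a bucket with two documents.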