Definition
$bucketAuto
Categorizes incoming documents into a specific number of groups, called buckets, based on a specified expression. Bucket boundaries are automatically determined in an attempt to evenly distribute the documents into the specified number of buckets.
Each bucket is represented as a document in the output. The document for each bucket contains:
- An _id object that specifies the bounds of the bucket.
  - The _id.min field specifies the inclusive lower bound for the bucket.
  - The _id.max field specifies the upper bound for the bucket. This bound is exclusive for all buckets except the final bucket in the series, where it is inclusive.
- A count field that contains the number of documents in the bucket. The count field is included by default when the output document is not specified.
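To illustrate the inclusive and exclusive bounds described above, here is a minimal sketch of a hypothetical helper (findBucket is not a MongoDB API) that locates the bucket containing a value. It assumes numeric bounds and bucket documents in ascending order; the sample buckets are the ungranulated results for _id values 0 to 99 from the comparison table later on this page.

```javascript
// Hypothetical helper (not part of MongoDB or its drivers): given $bucketAuto
// output documents in ascending order, return the index of the bucket that
// contains value. _id.min is inclusive; _id.max is exclusive, except in the
// final bucket, where it is inclusive.
function findBucket(bucketDocs, value) {
  for (let i = 0; i < bucketDocs.length; i++) {
    const { min, max } = bucketDocs[i]._id;
    const isLast = i === bucketDocs.length - 1;
    if (value >= min && (value < max || (isLast && value === max))) {
      return i;
    }
  }
  return -1; // value lies outside every bucket
}

// Bucket documents in the shape $bucketAuto returns (numeric bounds assumed).
const buckets = [
  { _id: { min: 0, max: 20 }, count: 20 },
  { _id: { min: 20, max: 40 }, count: 20 },
  { _id: { min: 40, max: 60 }, count: 20 },
  { _id: { min: 60, max: 80 }, count: 20 },
  { _id: { min: 80, max: 99 }, count: 20 },
];

console.log(findBucket(buckets, 20)); // 1: 20 is excluded from the first bucket
console.log(findBucket(buckets, 99)); // 4: the final bucket's max is inclusive
```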
The $bucketAuto stage has the following form:
{
  $bucketAuto: {
      groupBy: <expression>,
      buckets: <number>,
      output: {
         <output1>: { <$accumulator expression> },
         ...
      },
      granularity: <string>
  }
}
| Field | Type | Description |
| --- | --- | --- |
| groupBy | expression | An expression to group documents by. To specify a field path, prefix the field name with a dollar sign $ and enclose it in quotes. |
| buckets | integer | A positive 32-bit integer that specifies the number of buckets into which input documents are grouped. |
| output | document | Optional. A document that specifies the fields to include in the output documents in addition to the _id field. Each field must be specified with an accumulator expression (see the syntax after this table). When output is specified, the default count field is not included. |
| granularity | string | Optional. A string that specifies the preferred number series to use to ensure that the calculated boundary edges end on preferred round numbers or their powers of 10. Available only if all groupBy values are numeric and none of them are NaN. Supported values: "R5", "R10", "R20", "R40", "R80", "1-2-5", "E6", "E12", "E24", "E48", "E96", "E192", "POWERSOF2". |
To specify a field to include in output, use an accumulator expression:
<outputfield1>: { <accumulator>: <expression1> },
...
The default count field is not included in the output documents when output is specified. To include it, explicitly specify the count expression as part of the output document:
output: {
   <outputfield1>: { <accumulator>: <expression1> },
   ...
   count: { $sum: 1 }
}
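As a sketch of the constraints in the field descriptions above, the following hypothetical helper (makeBucketAutoStage is an illustrative name, not a driver API) builds a $bucketAuto stage document and rejects a non-positive or non-32-bit buckets value and unsupported granularity strings before the stage is sent anywhere. It cannot check server-side rules such as "all groupBy values must be numeric"; the server still enforces those.

```javascript
// Hypothetical helper (not part of any MongoDB driver): builds a $bucketAuto
// stage document and checks the client-side constraints listed above.
const GRANULARITIES = new Set([
  "R5", "R10", "R20", "R40", "R80", "1-2-5",
  "E6", "E12", "E24", "E48", "E96", "E192", "POWERSOF2",
]);

function makeBucketAutoStage({ groupBy, buckets, output, granularity }) {
  // buckets must be a positive 32-bit integer.
  if (!Number.isInteger(buckets) || buckets < 1 || buckets > 2147483647) {
    throw new RangeError("buckets must be a positive 32-bit integer");
  }
  const spec = { groupBy, buckets };
  if (output !== undefined) {
    // With output present, count appears only if it is listed explicitly.
    spec.output = output;
  }
  if (granularity !== undefined) {
    if (!GRANULARITIES.has(granularity)) {
      throw new RangeError(`unsupported granularity: ${granularity}`);
    }
    spec.granularity = granularity;
  }
  return { $bucketAuto: spec };
}

const stage = makeBucketAutoStage({
  groupBy: "$price",
  buckets: 4,
  output: { count: { $sum: 1 }, titles: { $push: "$title" } },
});
console.log(JSON.stringify(stage));
```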
Considerations
$bucketAuto and Memory Restrictions
The $bucketAuto stage has a limit of 100 megabytes of RAM. By default, if the stage exceeds this limit, MongoDB automatically writes temporary files to disk. For details, see allowDiskUseByDefault.
Behavior
There may be fewer than the specified number of buckets if:
- The number of input documents is less than the specified number of buckets.
- The number of unique values of the groupBy expression is less than the specified number of buckets.
- The granularity has fewer intervals than the number of buckets.
- The granularity is not fine enough to evenly distribute documents into the specified number of buckets.
If the groupBy expression refers to an array or document, the values are arranged using the same ordering as in $sort before determining the bucket boundaries.
The even distribution of documents across buckets depends on the cardinality, or the number of unique values, of the groupBy field. If the cardinality is not high enough, the $bucketAuto stage may not evenly distribute the results across buckets.
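To make the cardinality point concrete, here is a simplified, illustrative model of boundary assignment for plain numbers: sort the values, give each bucket roughly n / buckets values, never split equal values across two buckets, and use the next bucket's first value as the exclusive upper bound. This is an assumption-based sketch, not MongoDB's implementation; it ignores granularity, BSON type ordering, and memory limits. With only two distinct values, it yields two buckets even though three are requested.

```javascript
// Simplified model (an assumption, not MongoDB's source code) of how
// $bucketAuto might place numeric values into buckets without granularity.
function bucketAutoModel(values, buckets) {
  const sorted = [...values].sort((a, b) => a - b);
  // Target size per bucket: roughly n / buckets, at least 1.
  const approxSize = Math.max(1, Math.round(sorted.length / buckets));
  const result = [];
  let i = 0;
  while (i < sorted.length && result.length < buckets) {
    const isLast = result.length === buckets - 1;
    const start = i;
    // The final bucket takes everything that is left.
    i = isLast ? sorted.length : Math.min(sorted.length, i + approxSize);
    // Keep equal values together: a value never straddles two buckets.
    while (i < sorted.length && sorted[i] === sorted[i - 1]) i++;
    // Upper bound: the next bucket's first value (exclusive), or the largest
    // value overall for the final bucket (inclusive).
    const max = i < sorted.length ? sorted[i] : sorted[sorted.length - 1];
    result.push({ min: sorted[start], max, count: i - start });
  }
  return result;
}

// Low cardinality: six values, only two of them distinct.
console.log(bucketAutoModel([1, 1, 1, 1, 1, 2], 3));
// two buckets: { min: 1, max: 2, count: 5 } and { min: 2, max: 2, count: 1 }
```

As a sanity check, feeding the model the artwork areas (height × width) from the examples on this page with four buckets produces the same bounds and counts as the documented area facet.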
Granularity
The $bucketAuto stage accepts an optional granularity parameter, which ensures that the boundaries of all buckets adhere to a specified preferred number series. Using a preferred number series provides more control over where the bucket boundaries are set within the range of values in the groupBy expression. Preferred number series can also help set bucket boundaries logarithmically and evenly when the range of the groupBy expression scales exponentially.
Renard Series
The Renard number series are sets of numbers derived by taking the 5th, 10th, 20th, 40th, or 80th root of 10, then including the powers of that root that fall between 1.0 and 10.0 (10.3 in the case of R80).
Set granularity to R5, R10, R20, R40, or R80 to restrict bucket boundaries to values in the series. The values of the series are multiplied by a power of 10 when the groupBy values are outside the 1.0 to 10.0 (10.3 for R80) range.
Example
The R5 series is based on the fifth root of 10, which is approximately 1.58, and includes various powers of this root (rounded) until 10 is reached. The R5 series is derived as follows:
- 10^(0/5) = 1
- 10^(1/5) ≈ 1.585, rounded to 1.6
- 10^(2/5) ≈ 2.512, rounded to 2.5
- 10^(3/5) ≈ 3.981, rounded to 4.0
- 10^(4/5) ≈ 6.310, rounded to 6.3
- 10^(5/5) = 10
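The derivation above can be reproduced in a few lines. This sketch computes 10^(i/5) for i = 0 through 5 and rounds each result to two significant digits; the function name r5Series is illustrative only.

```javascript
// Sketch: derive the R5 series as the powers 10^(i/5), i = 0..5,
// each rounded to two significant digits.
function r5Series() {
  const values = [];
  for (let i = 0; i <= 5; i++) {
    // toPrecision(2) rounds to two significant digits; Number() converts the
    // resulting string back to a number (for example, "4.0" becomes 4).
    values.push(Number(Math.pow(10, i / 5).toPrecision(2)));
  }
  return values;
}

console.log(r5Series()); // [ 1, 1.6, 2.5, 4, 6.3, 10 ]
```

Plain two-digit rounding happens to match R5, but the standardized R10 through R80 tables contain values such as 1.25 and 3.15 that are not simple roundings, so this shortcut should not be assumed to reproduce the other Renard series.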
The same approach is applied to the other Renard series to offer finer granularity, that is, more intervals between 1.0 and 10.0 (10.3 for R80).
E Series
The E number series are similar to the Renard series in that they subdivide the interval from 1.0 to 10.0 by the 6th, 12th, 24th, 48th, 96th, or 192nd root of ten with a particular relative error.
Set granularity to E6, E12, E24, E48, E96, or E192 to restrict bucket boundaries to values in the series. The values of the series are multiplied by a power of 10 when the groupBy values are outside the 1.0 to 10.0 range. To learn more about the E-series and their respective relative errors, see preferred number series.
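To show what "multiplied by a power of 10" means in practice, here is a minimal sketch that rounds a positive number up to the smallest E24 value times 10^k that is strictly greater than it. The E24 values are the standard IEC 60063 list; the function name and the strictly-greater rule are assumptions for illustration, not MongoDB's internal rounding code. Under this rule, 19, 39, 62, 87, and 99 round up to 20, 43, 68, 91, and 100, the boundaries in the E24 row of the comparison table below.

```javascript
// E24 mantissas (IEC 60063), covering the decade 1.0 to 10.0.
const E24 = [
  1.0, 1.1, 1.2, 1.3, 1.5, 1.6, 1.8, 2.0, 2.2, 2.4, 2.7, 3.0,
  3.3, 3.6, 3.9, 4.3, 4.7, 5.1, 5.6, 6.2, 6.8, 7.5, 8.2, 9.1,
];

// Sketch: smallest value s * 10^k (s in series) strictly greater than x.
function roundUpToSeries(x, series) {
  if (!(x > 0)) throw new RangeError("this sketch handles positive values only");
  // Start in the decade that contains x, then move up one decade at a time.
  for (let exponent = Math.floor(Math.log10(x)); ; exponent++) {
    for (const s of series) {
      // Parsing a string such as "3.6e-1" avoids binary noise from 3.6 * 0.1.
      const candidate = Number(`${s}e${exponent}`);
      if (candidate > x) return candidate;
    }
  }
}

console.log(roundUpToSeries(39, E24)); // 43
console.log(roundUpToSeries(0.35, E24)); // 0.36
```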
1-2-5 Series
The 1-2-5 series behaves like a three-value Renard series, if such a series existed.
Set granularity to 1-2-5 to restrict bucket boundaries to various powers of the third root of 10, rounded to one significant digit.
Example
The following values are part of the 1-2-5 series: 0.1, 0.2, 0.5, 1, 2, 5, 10, 20, 50, 100, 200, 500, 1000, and so on.
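The list above follows from the definition: rounding 10^(0/3), 10^(1/3), and 10^(2/3) to one significant digit gives 1, 2, and 5, which are then scaled by powers of 10. This sketch (oneTwoFiveSeries is an illustrative name) derives the mantissas and builds the series between two exponents.

```javascript
// Mantissas: 10^(i/3) for i = 0, 1, 2, rounded to one significant digit.
// 10^(1/3) ≈ 2.154 rounds to 2; 10^(2/3) ≈ 4.642 rounds to 5.
const mantissas = [0, 1, 2].map((i) => Number(Math.pow(10, i / 3).toPrecision(1)));

// Scale each mantissa by 10^e for e in [minExponent, maxExponent].
function oneTwoFiveSeries(minExponent, maxExponent) {
  const values = [];
  for (let e = minExponent; e <= maxExponent; e++) {
    for (const m of mantissas) {
      // Parsing a string such as "5e-1" avoids floating-point drift.
      values.push(Number(`${m}e${e}`));
    }
  }
  return values;
}

console.log(mantissas); // [ 1, 2, 5 ]
console.log(oneTwoFiveSeries(-1, 2).join(", "));
// 0.1, 0.2, 0.5, 1, 2, 5, 10, 20, 50, 100, 200, 500
```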
Powers of Two Series
Set granularity to POWERSOF2 to restrict bucket boundaries to numbers that are a power of two.
Example
The following numbers adhere to the powers of two series:
- 2^0 = 1
- 2^1 = 2
- 2^2 = 4
- 2^3 = 8
- 2^4 = 16
- 2^5 = 32
- and so on...
A common example is computer hardware: components such as memory often come in sizes that follow the POWERSOF2 set of preferred numbers:
1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, and so on.
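As a small sketch of this series, the code below lists the first powers of two and rounds a positive value up to the smallest power of two strictly greater than it. Both function names are illustrative, and the strictly-greater rule is an assumption chosen for this example.

```javascript
// The first `count` members of the POWERSOF2 series: 2^0, 2^1, ...
function powersOfTwo(count) {
  return Array.from({ length: count }, (_, k) => 2 ** k);
}

// Smallest power of two strictly greater than a positive value x.
function nextPowerOfTwoAbove(x) {
  if (!(x > 0)) throw new RangeError("this sketch handles positive values only");
  let p = 1;
  while (p > x) p /= 2; // step down for values below 1
  while (p <= x) p *= 2; // step up until p is strictly greater than x
  return p;
}

console.log(powersOfTwo(12).join(", ")); // 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048
console.log(nextPowerOfTwoAbove(19)); // 32
```

Powers of two are exactly representable as JavaScript numbers, so the repeated doubling and halving introduces no rounding error here.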
Comparing Different Granularities
The following operation demonstrates how specifying different values for granularity affects how $bucketAuto determines bucket boundaries. Consider a collection named things whose documents have _id values numbered from 0 to 99:
{ _id: 0 }
{ _id: 1 }
...
{ _id: 99 }
Different values for granularity are substituted into the following operation:
db.things.aggregate( [
{
$bucketAuto: {
groupBy: "$_id",
buckets: 5,
granularity: <granularity>
}
}
] )
The results in the following table demonstrate how different values for granularity yield different bucket boundaries:
| granularity | Results |
| --- | --- |
| No granularity | { "_id" : { "min" : 0, "max" : 20 }, "count" : 20 } { "_id" : { "min" : 20, "max" : 40 }, "count" : 20 } { "_id" : { "min" : 40, "max" : 60 }, "count" : 20 } { "_id" : { "min" : 60, "max" : 80 }, "count" : 20 } { "_id" : { "min" : 80, "max" : 99 }, "count" : 20 } |
| R20 | { "_id" : { "min" : 0, "max" : 20 }, "count" : 20 } { "_id" : { "min" : 20, "max" : 40 }, "count" : 20 } { "_id" : { "min" : 40, "max" : 63 }, "count" : 23 } { "_id" : { "min" : 63, "max" : 90 }, "count" : 27 } { "_id" : { "min" : 90, "max" : 100 }, "count" : 10 } |
| E24 | { "_id" : { "min" : 0, "max" : 20 }, "count" : 20 } { "_id" : { "min" : 20, "max" : 43 }, "count" : 23 } { "_id" : { "min" : 43, "max" : 68 }, "count" : 25 } { "_id" : { "min" : 68, "max" : 91 }, "count" : 23 } { "_id" : { "min" : 91, "max" : 100 }, "count" : 9 } |
| 1-2-5 | { "_id" : { "min" : 0, "max" : 20 }, "count" : 20 } { "_id" : { "min" : 20, "max" : 50 }, "count" : 30 } { "_id" : { "min" : 50, "max" : 100 }, "count" : 50 } |
| POWERSOF2 | { "_id" : { "min" : 0, "max" : 32 }, "count" : 32 } { "_id" : { "min" : 32, "max" : 64 }, "count" : 32 } { "_id" : { "min" : 64, "max" : 128 }, "count" : 36 } |
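One way to read the table is through a simplified model, which is an assumption for illustration and not MongoDB's source code: each bucket first takes its share of the sorted values (100 / 5 = 20), its upper bound is then rounded up to the next preferred number strictly greater than its largest value, and any remaining values below that bound are absorbed into the bucket. The sketch below applies this to POWERSOF2 and 1-2-5 and reproduces those two rows. It uses each bucket's first value as its min and does not round the first bucket's min down, which only matches here because the smallest value is 0.

```javascript
// Illustrative model (an assumption, not MongoDB's implementation) of how a
// granularity rounds each bucket's upper bound up to a preferred number.
function bucketAutoWithGranularity(values, buckets, roundUp) {
  const sorted = [...values].sort((a, b) => a - b);
  const approxSize = Math.max(1, Math.round(sorted.length / buckets));
  const result = [];
  let i = 0;
  while (i < sorted.length && result.length < buckets) {
    const start = i;
    const isLast = result.length === buckets - 1;
    // Take roughly approxSize values; the final bucket takes the rest.
    i = isLast ? sorted.length : Math.min(sorted.length, i + approxSize);
    // Round the bucket's largest value up to the next preferred number...
    const max = roundUp(sorted[i - 1]);
    // ...and absorb remaining values that fall below that bound.
    while (i < sorted.length && sorted[i] < max) i++;
    result.push({ _id: { min: sorted[start], max }, count: i - start });
  }
  return result;
}

// Smallest power of two strictly greater than x (x >= 0).
function powersOf2(x) {
  let p = 1;
  while (p <= x) p *= 2;
  return p;
}

// Smallest 1, 2, or 5 times a power of ten strictly greater than x (x >= 0).
// Values below 1 simply round up to 1 in this sketch.
function oneTwoFive(x) {
  for (let e = 0; ; e++) {
    for (const m of [1, 2, 5]) {
      if (m * 10 ** e > x) return m * 10 ** e;
    }
  }
}

const ids = Array.from({ length: 100 }, (_, k) => k);
console.log(JSON.stringify(bucketAutoWithGranularity(ids, 5, powersOf2)));
console.log(JSON.stringify(bucketAutoWithGranularity(ids, 5, oneTwoFive)));
```

Both series run out of intervals before five buckets are filled, which is why these two rows contain only three buckets.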
Examples
MongoDB Shell
Consider a collection artwork with the following documents:
{ "_id" : 1, "title" : "The Pillars of Society", "artist" : "Grosz", "year" : 1926,
"price" : Decimal128("199.99"),
"dimensions" : { "height" : 39, "width" : 21, "units" : "in" } }
{ "_id" : 2, "title" : "Melancholy III", "artist" : "Munch", "year" : 1902,
"price" : Decimal128("280.00"),
"dimensions" : { "height" : 49, "width" : 32, "units" : "in" } }
{ "_id" : 3, "title" : "Dancer", "artist" : "Miro", "year" : 1925,
"price" : Decimal128("76.04"),
"dimensions" : { "height" : 25, "width" : 20, "units" : "in" } }
{ "_id" : 4, "title" : "The Great Wave off Kanagawa", "artist" : "Hokusai",
"price" : Decimal128("167.30"),
"dimensions" : { "height" : 24, "width" : 36, "units" : "in" } }
{ "_id" : 5, "title" : "The Persistence of Memory", "artist" : "Dali", "year" : 1931,
"price" : Decimal128("483.00"),
"dimensions" : { "height" : 20, "width" : 24, "units" : "in" } }
{ "_id" : 6, "title" : "Composition VII", "artist" : "Kandinsky", "year" : 1913,
"price" : Decimal128("385.00"),
"dimensions" : { "height" : 30, "width" : 46, "units" : "in" } }
{ "_id" : 7, "title" : "The Scream", "artist" : "Munch",
"price" : Decimal128("159.00"),
"dimensions" : { "height" : 24, "width" : 18, "units" : "in" } }
{ "_id" : 8, "title" : "Blue Flower", "artist" : "O'Keefe", "year" : 1918,
"price" : Decimal128("118.42"),
"dimensions" : { "height" : 24, "width" : 20, "units" : "in" } }
Single Facet Aggregation
In the following operation, input documents are grouped into four buckets according to the values in the price field:
db.artwork.aggregate( [
{
$bucketAuto: {
groupBy: "$price",
buckets: 4
}
}
] )
The operation returns the following documents:
{
"_id" : {
"min" : Decimal128("76.04"),
"max" : Decimal128("159.00")
},
"count" : 2
}
{
"_id" : {
"min" : Decimal128("159.00"),
"max" : Decimal128("199.99")
},
"count" : 2
}
{
"_id" : {
"min" : Decimal128("199.99"),
"max" : Decimal128("385.00")
},
"count" : 2
}
{
"_id" : {
"min" : Decimal128("385.00"),
"max" : Decimal128("483.00")
},
"count" : 2
}
Multi-Faceted Aggregation
The $bucketAuto stage can be used within the $facet stage to process multiple aggregation pipelines on the same set of input documents from artwork.
The following aggregation pipeline groups the documents from the artwork collection into buckets based on price, year, and the calculated area:
db.artwork.aggregate( [
{
$facet: {
"price": [
{
$bucketAuto: {
groupBy: "$price",
buckets: 4
}
}
],
"year": [
{
$bucketAuto: {
groupBy: "$year",
buckets: 3,
output: {
"count": { $sum: 1 },
"years": { $push: "$year" }
}
}
}
],
"area": [
{
$bucketAuto: {
groupBy: {
$multiply: [ "$dimensions.height", "$dimensions.width" ]
},
buckets: 4,
output: {
"count": { $sum: 1 },
"titles": { $push: "$title" }
}
}
}
]
}
}
] )
The operation returns the following document:
{
"area" : [
{
"_id" : { "min" : 432, "max" : 500 },
"count" : 3,
"titles" : [
"The Scream",
"The Persistence of Memory",
"Blue Flower"
]
},
{
"_id" : { "min" : 500, "max" : 864 },
"count" : 2,
"titles" : [
"Dancer",
"The Pillars of Society"
]
},
{
"_id" : { "min" : 864, "max" : 1568 },
"count" : 2,
"titles" : [
"The Great Wave off Kanagawa",
"Composition VII"
]
},
{
"_id" : { "min" : 1568, "max" : 1568 },
"count" : 1,
"titles" : [
"Melancholy III"
]
}
],
"price" : [
{
"_id" : { "min" : Decimal128("76.04"), "max" : Decimal128("159.00") },
"count" : 2
},
{
"_id" : { "min" : Decimal128("159.00"), "max" : Decimal128("199.99") },
"count" : 2
},
{
"_id" : { "min" : Decimal128("199.99"), "max" : Decimal128("385.00") },
"count" : 2
},
{
"_id" : { "min" : Decimal128("385.00"), "max" : Decimal128("483.00") },
"count" : 2
}
],
"year" : [
{ "_id" : { "min" : null, "max" : 1913 }, "count" : 3, "years" : [ 1902 ] },
{ "_id" : { "min" : 1913, "max" : 1926 }, "count" : 3, "years" : [ 1913, 1918, 1925 ] },
{ "_id" : { "min" : 1926, "max" : 1931 }, "count" : 2, "years" : [ 1926, 1931 ] }
]
}
C#
The C# examples on this page use the sample_mflix database from the Atlas sample datasets. To learn how to create a free MongoDB Atlas cluster and load the sample datasets, see Get Started in the MongoDB .NET/C# Driver documentation.
The following Movie class models the documents in the sample_mflix.movies collection:
using System;
using System.Collections.Generic;
using MongoDB.Bson;
using MongoDB.Bson.Serialization.Attributes;

public class Movie
{
    public ObjectId Id { get; set; }
    public int Runtime { get; set; }
    public string Title { get; set; }
    public string Rated { get; set; }
    public List<string> Genres { get; set; }
    public string Plot { get; set; }
    public ImdbData Imdb { get; set; }
    public int Year { get; set; }
    public int Index { get; set; }
    public string[] Comments { get; set; }
    [BsonElement("lastupdated")]
    public DateTime LastUpdated { get; set; }
}
Note
ConventionPack for Pascal Case
The C# classes on this page use Pascal case for their property names, but the field names in the MongoDB collection use camel case. To account for this difference, you can use the following code to register a ConventionPack when your application starts:
var camelCaseConvention = new ConventionPack { new CamelCaseElementNameConvention() };
ConventionRegistry.Register("CamelCase", camelCaseConvention, type => true);
To use the MongoDB .NET/C# driver to add a $bucketAuto stage to an aggregation pipeline, call the BucketAuto() method on a PipelineDefinition object.
The following example creates a pipeline stage that evenly distributes documents into five buckets by the value of their Runtime field:
var pipeline = new EmptyPipelineDefinition<Movie>()
.BucketAuto(
groupBy: m => m.Runtime,
buckets: 5);
You can use an AggregateBucketAutoOptions object to specify a scheme based on a preferred number series for setting boundary values. The following example performs the same $bucketAuto operation as the previous example, but also sets the bucket boundaries at powers of 2:
var bucketAutoOptions = new AggregateBucketAutoOptions()
{
Granularity = new AggregateBucketAutoGranularity("POWERSOF2")
};
var pipeline = new EmptyPipelineDefinition<Movie>()
.BucketAuto(
groupBy: m => m.Runtime,
buckets: 5,
options: bucketAutoOptions);
Node.js
The Node.js examples on this page use the sample_mflix database from the Atlas sample datasets. To learn how to create a free MongoDB Atlas cluster and load the sample datasets, see Get Started in the MongoDB Node.js driver documentation.
To use the MongoDB Node.js driver to add a $bucketAuto stage to an aggregation pipeline, use the $bucketAuto operator in a pipeline object.
The following example creates a pipeline stage that evenly distributes documents into five buckets by the value of their runtime field. The example then runs the aggregation pipeline:
const pipeline = [
{
$bucketAuto: {
groupBy: "$runtime",
buckets: 5
}
}
];
const cursor = collection.aggregate(pipeline);
return cursor;
The following example performs the same $bucketAuto operation as the previous example, but sets the bucket boundaries as powers of 2 by using the granularity parameter:
const pipeline = [
{
$bucketAuto: {
groupBy: "$runtime",
buckets: 5,
granularity: "POWERSOF2"
}
}
];
const cursor = collection.aggregate(pipeline);
return cursor;
Learn More
To learn more about related pipeline stages, see the $bucket guide.