$bucketAuto (aggregation)
Definition
$bucketAuto
Categorizes incoming documents into a specific number of groups, called buckets, based on a specified expression. Bucket boundaries are automatically determined in an attempt to evenly distribute the documents into the specified number of buckets.
Each bucket is represented as a document in the output. The document for each bucket contains:
- An _id object that specifies the bounds of the bucket.
  - The _id.min field specifies the inclusive lower bound for the bucket.
  - The _id.max field specifies the upper bound for the bucket. This bound is exclusive for all buckets except the final bucket in the series, where it is inclusive.
- A count field that contains the number of documents in the bucket. The count field is included by default when the output document is not specified.
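The boundary rules above can be expressed as a small lookup. This JavaScript sketch uses a hypothetical helper, findBucket, which is not part of MongoDB or any driver. It takes documents shaped like $bucketAuto output and reports which bucket a value belongs to. The bounds are the price buckets from the example later on this page, written as plain numbers instead of NumberDecimal values.

```javascript
// Hypothetical helper: given $bucketAuto output documents (sorted by _id.min),
// return the index of the bucket a value falls into, or -1 if none.
// _id.min is inclusive; _id.max is exclusive except for the last bucket.
function findBucket(buckets, value) {
  for (let i = 0; i < buckets.length; i++) {
    const { min, max } = buckets[i]._id;
    const isLast = i === buckets.length - 1;
    if (value >= min && (value < max || (isLast && value <= max))) {
      return i;
    }
  }
  return -1;
}

// Bucket bounds from the price example, as plain numbers.
const priceBuckets = [
  { _id: { min: 76.04, max: 159.0 }, count: 2 },
  { _id: { min: 159.0, max: 199.99 }, count: 2 },
  { _id: { min: 199.99, max: 385.0 }, count: 2 },
  { _id: { min: 385.0, max: 483.0 }, count: 2 },
];

console.log(findBucket(priceBuckets, 159.0)); // 1: a shared boundary belongs to the upper bucket
console.log(findBucket(priceBuckets, 483.0)); // 3: the last bucket's max is inclusive
```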
The $bucketAuto stage has the following form:
{
  $bucketAuto: {
    groupBy: <expression>,
    buckets: <number>,
    output: {
      <output1>: { <$accumulator expression> },
      ...
    },
    granularity: <string>
  }
}
The $bucketAuto stage accepts the following fields:
- groupBy (expression): An expression to group documents by. To specify a field path, prefix the field name with a dollar sign $ and enclose it in quotes.
- buckets (integer): A positive 32-bit integer that specifies the number of buckets into which input documents are grouped.
- output (document): Optional. A document that specifies the fields to include in the output documents in addition to the _id field. To specify the fields to include, you must use accumulator expressions:
  <outputfield1>: { <accumulator>: <expression1> },
  ...
  The default count field is not included in the output document when output is specified. To include it, explicitly specify the count expression as part of the output document:
  output: {
    <outputfield1>: { <accumulator>: <expression1> },
    ...
    count: { $sum: 1 }
  }
- granularity (string): Optional. A string that specifies the preferred number series to use to ensure that the calculated boundary edges end on preferred round numbers or their powers of 10. Available only if all groupBy values are numeric and none of them are NaN.
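The field constraints above can also be checked on the client before a pipeline is sent. The following JavaScript sketch assembles a $bucketAuto stage as a plain object and rejects invalid values. buildBucketAutoStage and SUPPORTED_GRANULARITIES are hypothetical names, not driver APIs. The granularity strings are copied from the supported values listed below. The server performs its own validation, and the sketch cannot check that every groupBy value is numeric.

```javascript
// Hypothetical helper, not part of any MongoDB driver: builds a $bucketAuto
// stage object and applies the documented field rules on the client side.
const SUPPORTED_GRANULARITIES = new Set([
  "R5", "R10", "R20", "R40", "R80", "1-2-5",
  "E6", "E12", "E24", "E48", "E96", "E192", "POWERSOF2",
]);

function buildBucketAutoStage({ groupBy, buckets, output, granularity }) {
  if (groupBy === undefined) {
    throw new TypeError("groupBy is required");
  }
  // buckets must be a positive 32-bit integer (1 .. 2^31 - 1).
  if (!Number.isInteger(buckets) || buckets < 1 || buckets > 2 ** 31 - 1) {
    throw new RangeError("buckets must be a positive 32-bit integer");
  }
  const stage = { groupBy, buckets };
  if (output !== undefined) {
    // With output present, count appears only if it is listed explicitly.
    stage.output = output;
  }
  if (granularity !== undefined) {
    if (!SUPPORTED_GRANULARITIES.has(granularity)) {
      throw new RangeError(`unsupported granularity: ${granularity}`);
    }
    stage.granularity = granularity;
  }
  return { $bucketAuto: stage };
}

const stage = buildBucketAutoStage({
  groupBy: "$price",
  buckets: 4,
  output: { count: { $sum: 1 }, titles: { $push: "$title" } },
  granularity: "R20",
});
console.log(JSON.stringify(stage));
```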
The supported values of granularity are:
- "R5"
- "R10"
- "R20"
- "R40"
- "R80"
- "1-2-5"
- "E6"
- "E12"
- "E24"
- "E48"
- "E96"
- "E192"
- "POWERSOF2"
Considerations
$bucketAuto and Memory Restrictions
The $bucketAuto stage has a limit of 100 megabytes of RAM. By default, if the stage exceeds this limit, $bucketAuto returns an error. To allow more space for stage processing, use the allowDiskUse option to enable aggregation pipeline stages to write data to temporary files.
Behavior
There may be fewer buckets than the specified number if:
- The number of input documents is less than the specified number of buckets.
- The number of unique values of the groupBy expression is less than the specified number of buckets.
- The granularity has fewer intervals than the number of buckets.
- The granularity is not fine enough to evenly distribute documents into the specified number of buckets.
If the groupBy expression refers to an array or document, the values are arranged using the same ordering as in $sort before the bucket boundaries are determined.
The even distribution of documents across buckets depends on the cardinality, or the number of unique values, of the groupBy field. If the cardinality is not high enough, the $bucketAuto stage may not evenly distribute the results across buckets.
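The effect of cardinality is easier to see in a simplified model. The following JavaScript sketch is an assumption, not MongoDB's implementation. It sorts numeric values and targets about n / buckets values per bucket, rounded with Math.round. It never splits a run of equal values across two buckets, and it lets the last bucket absorb the remainder. bucketAutoSketch is a hypothetical name, and the sketch ignores granularity and non-numeric values. With the areas from the artwork example later on this page, it reproduces the bucket counts 3, 2, 2, 1 shown there.

```javascript
// Simplified model of $bucketAuto without granularity (an assumption, not
// MongoDB's source). Numeric values only.
function bucketAutoSketch(values, buckets) {
  if (!Number.isInteger(buckets) || buckets < 1) {
    throw new RangeError("buckets must be a positive integer");
  }
  const sorted = [...values].sort((a, b) => a - b);
  const target = Math.max(1, Math.round(sorted.length / buckets));
  const groups = [];
  let i = 0;
  while (i < sorted.length) {
    if (groups.length === buckets - 1) {
      // The final bucket takes all remaining values.
      groups.push(sorted.slice(i));
      break;
    }
    let end = Math.min(i + target, sorted.length);
    // Keep equal values together: one value never spans two buckets.
    while (end < sorted.length && sorted[end] === sorted[end - 1]) end++;
    groups.push(sorted.slice(i, end));
    i = end;
  }
  return groups.map((group, k) => ({
    _id: {
      min: group[0],
      // max is the next bucket's min (exclusive); the last bucket's max is inclusive.
      max: k + 1 < groups.length ? groups[k + 1][0] : group[group.length - 1],
    },
    count: group.length,
  }));
}

// Areas (height * width) of the eight artwork documents.
const areas = [819, 1568, 500, 864, 480, 1380, 432, 480];
console.log(JSON.stringify(bucketAutoSketch(areas, 4)));
```

With low cardinality the model returns fewer buckets than requested. For example, bucketAutoSketch([1, 1, 1, 1, 2], 3) yields only two buckets, because the four equal values must stay together.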
Granularity
$bucketAuto accepts an optional granularity parameter, which ensures that the boundaries of all buckets adhere to a specified preferred number series.
Using a preferred number series provides more control over where the bucket boundaries are set within the range of values in the groupBy expression. A series can also help set bucket boundaries evenly on a logarithmic scale when the range of the groupBy expression scales exponentially.
Renard Series
The Renard number series are sets of numbers derived by taking the 5th, 10th, 20th, 40th, or 80th root of 10, then including the powers of that root that fall between 1.0 and 10.0 (10.3 in the case of R80).
Set granularity to R5, R10, R20, R40, or R80 to restrict bucket boundaries to values in the series. The values of the series are multiplied by a power of 10 when the groupBy values are outside of the 1.0 to 10.0 (10.3 for R80) range.
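To illustrate scaling by powers of 10, the following JavaScript sketch finds the smallest value of the form s × 10^k that is at least a given positive number, where s is taken from the R5 series. roundUpToSeries is a hypothetical helper that only demonstrates the scaling rule. It is not the procedure MongoDB uses to choose bucket boundaries. The R5 values are the rounded ones derived below (1.0, 1.6, 2.5, 4.0, 6.3, 10).

```javascript
// R5 Renard series values between 1.0 and 10.0.
const R5 = [1.0, 1.6, 2.5, 4.0, 6.3, 10];

// Hypothetical helper: smallest series value, scaled by a power of 10,
// that is >= value. Works for any positive value, including value < 1.
function roundUpToSeries(value, series) {
  if (!(value > 0)) throw new RangeError("value must be positive");
  // Start in the decade that contains value, then move up if needed.
  let exponent = Math.floor(Math.log10(value));
  for (;;) {
    for (const s of series) {
      // toPrecision(12) strips binary floating-point noise (e.g. 0.16000000000000003).
      const candidate = Number((s * Math.pow(10, exponent)).toPrecision(12));
      if (candidate >= value) return candidate;
    }
    exponent++;
  }
}

console.log(roundUpToSeries(3, R5));   // 4
console.log(roundUpToSeries(70, R5));  // 100
console.log(roundUpToSeries(450, R5)); // 630
console.log(roundUpToSeries(0.2, R5)); // 0.25
```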
The R5 series is based on the fifth root of 10, which is approximately 1.58, and includes the powers of this root (rounded) up to 10. The R5 series is derived as follows:
- 10^(0/5) = 1
- 10^(1/5) ≈ 1.585, rounded to 1.6
- 10^(2/5) ≈ 2.512, rounded to 2.5
- 10^(3/5) ≈ 3.981, rounded to 4.0
- 10^(4/5) ≈ 6.310, rounded to 6.3
- 10^(5/5) = 10
The same approach is applied to the other Renard series to offer finer granularity, that is, more intervals between 1.0 and 10.0 (10.3 for R80).
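The R5 derivation can be reproduced numerically. This JavaScript sketch raises the fifth root of 10 to the powers 0 through 5 and rounds each result to two significant digits. Only R5 is shown. The standardized tables for the finer Renard series include deliberate roundings (R10, for instance, contains 1.25 and 3.15) that simple rounding does not reproduce.

```javascript
// Derive the R5 Renard series: the k-th power of the fifth root of 10
// for k = 0..5, rounded to two significant digits.
function renardR5() {
  const values = [];
  for (let k = 0; k <= 5; k++) {
    const exact = Math.pow(10, k / 5);
    values.push(Number(exact.toPrecision(2)));
  }
  return values;
}

console.log(renardR5()); // [ 1, 1.6, 2.5, 4, 6.3, 10 ]
```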
E Series
The E number series are similar to the Renard series in that they subdivide the interval from 1.0 to 10.0 by the 6th, 12th, 24th, 48th, 96th, or 192nd root of 10, with a particular relative error.
Set granularity to E6, E12, E24, E48, E96, or E192 to restrict bucket boundaries to values in the series. The values of the series are multiplied by a power of 10 when the groupBy values are outside of the 1.0 to 10.0 range. To learn more about the E-series and their respective relative errors, see preferred number series.
1-2-5 Series
The 1-2-5 series behaves like a three-value Renard series, if such a series existed.
Set granularity to 1-2-5 to restrict bucket boundaries to various powers of the third root of 10, rounded to one significant digit.
The following values are part of the 1-2-5 series: 0.1, 0.2, 0.5, 1, 2, 5, 10, 20, 50, 100, 200, 500, 1000, and so on.
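The pattern in the 1-2-5 list can be generated by scaling the mantissas 1, 2, and 5 by successive powers of 10. oneTwoFiveSeries is a hypothetical helper written for illustration, and it assumes a positive lower bound.

```javascript
// Hypothetical helper: list the 1-2-5 series values in [low, high] by scaling
// the mantissas 1, 2, 5 by successive powers of 10.
function oneTwoFiveSeries(low, high) {
  if (!(low > 0)) throw new RangeError("low must be positive");
  const values = [];
  for (let exp = Math.floor(Math.log10(low)); ; exp++) {
    for (const mantissa of [1, 2, 5]) {
      // toPrecision(12) strips binary floating-point noise from the product.
      const v = Number((mantissa * Math.pow(10, exp)).toPrecision(12));
      if (v > high) return values;
      if (v >= low) values.push(v);
    }
  }
}

console.log(oneTwoFiveSeries(0.1, 1000));
```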
Powers of Two Series
Set granularity to POWERSOF2 to restrict bucket boundaries to numbers that are a power of two.
The following numbers adhere to the powers of two series:
- 2^0 = 1
- 2^1 = 2
- 2^2 = 4
- 2^3 = 8
- 2^4 = 16
- 2^5 = 32
and so on.
A common example of this series is computer hardware. Components such as memory often come in sizes from the POWERSOF2 set of preferred numbers:
1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, and so on.
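The following JavaScript sketch rounds a positive number up to the nearest power of two, which is the kind of value the POWERSOF2 series allows as a boundary. ceilPowerOfTwo is a hypothetical helper, not MongoDB's boundary algorithm. In the comparison example below, the largest _id, 99, rounds up to 128, which matches the final POWERSOF2 boundary shown there.

```javascript
// Hypothetical helper: smallest power of two that is >= value (value > 0).
// Illustrates the POWERSOF2 number series only.
function ceilPowerOfTwo(value) {
  if (!(value > 0)) throw new RangeError("value must be positive");
  let p = 1;
  // Double until p reaches value (for value > 1).
  while (p < value) p *= 2;
  // Halve while the smaller power still covers value (for value < 1).
  while (p / 2 >= value) p /= 2;
  return p;
}

console.log([1, 3, 33, 64, 100].map((v) => ceilPowerOfTwo(v))); // [ 1, 4, 64, 64, 128 ]
```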
Comparing Different Granularities
The following operation demonstrates how specifying different values for granularity affects how $bucketAuto determines bucket boundaries. Consider a collection things whose documents have _id values numbered from 0 to 99:
{ _id: 0 }
{ _id: 1 }
...
{ _id: 99 }
Different values for granularity are substituted into the following operation:
db.things.aggregate( [
{
$bucketAuto: {
groupBy: "$_id",
buckets: 5,
granularity: <granularity>
}
}
] )
The following results demonstrate how different values for granularity yield different bucket boundaries:
No granularity:
{ "_id" : { "min" : 0, "max" : 20 }, "count" : 20 }
{ "_id" : { "min" : 20, "max" : 40 }, "count" : 20 }
{ "_id" : { "min" : 40, "max" : 60 }, "count" : 20 }
{ "_id" : { "min" : 60, "max" : 80 }, "count" : 20 }
{ "_id" : { "min" : 80, "max" : 99 }, "count" : 20 }
R20:
{ "_id" : { "min" : 0, "max" : 20 }, "count" : 20 }
{ "_id" : { "min" : 20, "max" : 40 }, "count" : 20 }
{ "_id" : { "min" : 40, "max" : 63 }, "count" : 23 }
{ "_id" : { "min" : 63, "max" : 90 }, "count" : 27 }
{ "_id" : { "min" : 90, "max" : 100 }, "count" : 10 }
E24:
{ "_id" : { "min" : 0, "max" : 20 }, "count" : 20 }
{ "_id" : { "min" : 20, "max" : 43 }, "count" : 23 }
{ "_id" : { "min" : 43, "max" : 68 }, "count" : 25 }
{ "_id" : { "min" : 68, "max" : 91 }, "count" : 23 }
{ "_id" : { "min" : 91, "max" : 100 }, "count" : 9 }
1-2-5:
{ "_id" : { "min" : 0, "max" : 20 }, "count" : 20 }
{ "_id" : { "min" : 20, "max" : 50 }, "count" : 30 }
{ "_id" : { "min" : 50, "max" : 100 }, "count" : 50 }
POWERSOF2:
{ "_id" : { "min" : 0, "max" : 32 }, "count" : 32 }
{ "_id" : { "min" : 32, "max" : 64 }, "count" : 32 }
{ "_id" : { "min" : 64, "max" : 128 }, "count" : 36 }
The 1-2-5 and POWERSOF2 granularities return only three buckets instead of five because their series are too coarse to split the range 0 to 99 into five intervals.
Example
Consider a collection artwork with the following documents:
{ "_id" : 1, "title" : "The Pillars of Society", "artist" : "Grosz", "year" : 1926,
"price" : NumberDecimal("199.99"),
"dimensions" : { "height" : 39, "width" : 21, "units" : "in" } }
{ "_id" : 2, "title" : "Melancholy III", "artist" : "Munch", "year" : 1902,
"price" : NumberDecimal("280.00"),
"dimensions" : { "height" : 49, "width" : 32, "units" : "in" } }
{ "_id" : 3, "title" : "Dancer", "artist" : "Miro", "year" : 1925,
"price" : NumberDecimal("76.04"),
"dimensions" : { "height" : 25, "width" : 20, "units" : "in" } }
{ "_id" : 4, "title" : "The Great Wave off Kanagawa", "artist" : "Hokusai",
"price" : NumberDecimal("167.30"),
"dimensions" : { "height" : 24, "width" : 36, "units" : "in" } }
{ "_id" : 5, "title" : "The Persistence of Memory", "artist" : "Dali", "year" : 1931,
"price" : NumberDecimal("483.00"),
"dimensions" : { "height" : 20, "width" : 24, "units" : "in" } }
{ "_id" : 6, "title" : "Composition VII", "artist" : "Kandinsky", "year" : 1913,
"price" : NumberDecimal("385.00"),
"dimensions" : { "height" : 30, "width" : 46, "units" : "in" } }
{ "_id" : 7, "title" : "The Scream", "artist" : "Munch",
"price" : NumberDecimal("159.00"),
"dimensions" : { "height" : 24, "width" : 18, "units" : "in" } }
{ "_id" : 8, "title" : "Blue Flower", "artist" : "O'Keefe", "year" : 1918,
"price" : NumberDecimal("118.42"),
"dimensions" : { "height" : 24, "width" : 20, "units" : "in" } }
Single Facet Aggregation
In the following operation, input documents are grouped into four buckets according to the values in the price field:
db.artwork.aggregate( [
{
$bucketAuto: {
groupBy: "$price",
buckets: 4
}
}
] )
The operation returns the following documents:
{
"_id" : {
"min" : NumberDecimal("76.04"),
"max" : NumberDecimal("159.00")
},
"count" : 2
}
{
"_id" : {
"min" : NumberDecimal("159.00"),
"max" : NumberDecimal("199.99")
},
"count" : 2
}
{
"_id" : {
"min" : NumberDecimal("199.99"),
"max" : NumberDecimal("385.00")
},
"count" : 2
}
{
"_id" : {
"min" : NumberDecimal("385.00"),
"max" : NumberDecimal("483.00")
},
"count" : 2
}
Multi-Faceted Aggregation
The $bucketAuto stage can be used within the $facet stage to process multiple aggregation pipelines on the same set of input documents from artwork.
The following aggregation pipeline groups the documents from the artwork collection into buckets based on price, year, and the calculated area:
db.artwork.aggregate( [
{
$facet: {
"price": [
{
$bucketAuto: {
groupBy: "$price",
buckets: 4
}
}
],
"year": [
{
$bucketAuto: {
groupBy: "$year",
buckets: 3,
output: {
"count": { $sum: 1 },
"years": { $push: "$year" }
}
}
}
],
"area": [
{
$bucketAuto: {
groupBy: {
$multiply: [ "$dimensions.height", "$dimensions.width" ]
},
buckets: 4,
output: {
"count": { $sum: 1 },
"titles": { $push: "$title" }
}
}
}
]
}
}
] )
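The following JavaScript sketch shows the values that the area facet groups on. For each artwork document it reproduces the groupBy expression { $multiply: [ "$dimensions.height", "$dimensions.width" ] }, then sorts the results. Two works share an area of 480. A value cannot fall into two buckets, so both land in the first bucket, which is why the first area bucket in the output below counts three documents rather than two. The dimensions are copied from the example collection, and the variable names are illustrative.

```javascript
// Height and width of each artwork document, copied from the example collection.
const artwork = [
  { title: "The Pillars of Society", height: 39, width: 21 },
  { title: "Melancholy III", height: 49, width: 32 },
  { title: "Dancer", height: 25, width: 20 },
  { title: "The Great Wave off Kanagawa", height: 24, width: 36 },
  { title: "The Persistence of Memory", height: 20, width: 24 },
  { title: "Composition VII", height: 30, width: 46 },
  { title: "The Scream", height: 24, width: 18 },
  { title: "Blue Flower", height: 24, width: 20 },
];

// Mirror { $multiply: [ "$dimensions.height", "$dimensions.width" ] }, then sort
// ascending. Array.prototype.sort is stable, so ties keep collection order.
const sortedAreas = artwork
  .map((doc) => ({ title: doc.title, area: doc.height * doc.width }))
  .sort((a, b) => a.area - b.area);

for (const { area, title } of sortedAreas) {
  console.log(area, title);
}
```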
The operation returns the following document:
{
"area" : [
{
"_id" : { "min" : 432, "max" : 500 },
"count" : 3,
"titles" : [
"The Scream",
"The Persistence of Memory",
"Blue Flower"
]
},
{
"_id" : { "min" : 500, "max" : 864 },
"count" : 2,
"titles" : [
"Dancer",
"The Pillars of Society"
]
},
{
"_id" : { "min" : 864, "max" : 1568 },
"count" : 2,
"titles" : [
"The Great Wave off Kanagawa",
"Composition VII"
]
},
{
"_id" : { "min" : 1568, "max" : 1568 },
"count" : 1,
"titles" : [
"Melancholy III"
]
}
],
"price" : [
{
"_id" : { "min" : NumberDecimal("76.04"), "max" : NumberDecimal("159.00") },
"count" : 2
},
{
"_id" : { "min" : NumberDecimal("159.00"), "max" : NumberDecimal("199.99") },
"count" : 2
},
{
"_id" : { "min" : NumberDecimal("199.99"), "max" : NumberDecimal("385.00") },
"count" : 2
},
{
"_id" : { "min" : NumberDecimal("385.00"), "max" : NumberDecimal("483.00") },
"count" : 2
}
],
"year" : [
{ "_id" : { "min" : null, "max" : 1913 }, "count" : 3, "years" : [ 1902 ] },
{ "_id" : { "min" : 1913, "max" : 1926 }, "count" : 3, "years" : [ 1913, 1918, 1925 ] },
{ "_id" : { "min" : 1926, "max" : 1931 }, "count" : 2, "years" : [ 1926, 1931 ] }
]
}
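In the year facet, the first bucket reports "count" : 3 but "years" : [ 1902 ]. Two documents (_id 4 and _id 7) have no year field. They sort before every number, which is why the first bucket's min is null. They are counted by { $sum: 1 }, but $push adds no entry for a missing field. The following JavaScript sketch models that behavior. Representing a missing year as null is an assumption of the sketch; in MongoDB, an explicit null value would be pushed.

```javascript
// year values of the eight artwork documents in _id order; the documents
// with _id 4 and _id 7 have no year field, modeled here as null.
const years = [1926, 1902, 1925, null, 1931, 1913, null, 1918];

// Sort with null (missing) first, matching "min" : null in the output.
const sortedYears = [...years].sort((a, b) => {
  if (a === null) return b === null ? 0 : -1;
  if (b === null) return 1;
  return a - b;
});

// First bucket of three documents, as in the year facet output.
const firstBucket = sortedYears.slice(0, 3);
const count = firstBucket.length;                         // like { $sum: 1 }
const pushed = firstBucket.filter((y) => y !== null);     // like { $push: "$year" }, skipping missing
console.log(count, pushed); // 3 [ 1902 ]
```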