
$bucketAuto (aggregation stage)

Definition

$bucketAuto

Categorizes incoming documents into a specific number of groups, called buckets, based on a specified expression. Bucket boundaries are automatically determined in an attempt to evenly distribute the documents into the specified number of buckets.

Each bucket is represented as a document in the output. The document for each bucket contains:

  • An _id object that specifies the bounds of the bucket.

    • The _id.min field specifies the inclusive lower bound for the bucket.
    • The _id.max field specifies the upper bound for the bucket. This bound is exclusive for all buckets except the final bucket in the series, where it is inclusive.
  • A count field that contains the number of documents in the bucket. The count field is included by default when the output document is not specified.

The $bucketAuto stage has the following form:

{
  $bucketAuto: {
    groupBy: <expression>,
    buckets: <number>,
    output: {
      <output1>: { <$accumulator expression> },
      ...
    },
    granularity: <string>
  }
}
groupBy (expression)

  An expression to group documents by. To specify a field path, prefix the field name with a dollar sign $ and enclose it in quotes.

buckets (integer)

  A positive 32-bit integer that specifies the number of buckets into which input documents are grouped.

output (document)

  Optional. A document that specifies the fields to include in the output documents in addition to the _id field. To specify the fields to include, you must use accumulator expressions:

  <outputfield1>: { <accumulator>: <expression1> },
  ...

  The default count field is not included in the output document when output is specified. To include it, explicitly specify the count expression as part of the output document:

  output: {
    <outputfield1>: { <accumulator>: <expression1> },
    ...
    count: { $sum: 1 }
  }

granularity (string)

  Optional. A string that specifies the preferred number series to use to ensure that the calculated boundary edges end on preferred round numbers or their powers of 10.

  Available only if all groupBy values are numeric and none of them are NaN.

  The supported values of granularity are:

  • "R5"
  • "R10"
  • "R20"
  • "R40"
  • "R80"
  • "1-2-5"
  • "E6"
  • "E12"
  • "E24"
  • "E48"
  • "E96"
  • "E192"
  • "POWERSOF2"

Considerations

$bucketAuto and Memory Restrictions

The $bucketAuto stage has a limit of 100 megabytes of RAM. By default, if the stage exceeds this limit, MongoDB automatically writes temporary files to disk. For details, see allowDiskUseByDefault.

Behavior

There may be fewer than the specified number of buckets if:

  • The number of input documents is less than the specified number of buckets.
  • The number of unique values of the groupBy expression is less than the specified number of buckets.
  • The granularity has fewer intervals than the number of buckets.
  • The granularity is not fine enough to evenly distribute documents into the specified number of buckets.

If the groupBy expression refers to an array or document, the values are arranged using the same ordering as in $sort before determining the bucket boundaries.

The even distribution of documents across buckets depends on the cardinality, or the number of unique values, of the groupBy field. If the cardinality is not high enough, the $bucketAuto stage may not evenly distribute the results across buckets.
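To build intuition for the even-distribution behavior, the following sketch sorts the groupBy values and cuts them into roughly equal-sized runs. This is a simplified illustration, not MongoDB's actual algorithm, and it assumes no granularity is set:

```javascript
// Simplified quantile-style bucketing sketch (NOT MongoDB's implementation):
// sort the values, then slice them into roughly equal-sized runs.
function autoBuckets(values, numBuckets) {
  const sorted = [...values].sort((a, b) => a - b);
  const perBucket = Math.ceil(sorted.length / numBuckets);
  const buckets = [];
  for (let i = 0; i < sorted.length; i += perBucket) {
    const slice = sorted.slice(i, i + perBucket);
    buckets.push({
      // min is inclusive; max is exclusive except for the final bucket,
      // whose max is the (inclusive) largest value seen.
      min: slice[0],
      max: i + perBucket < sorted.length ? sorted[i + perBucket] : slice[slice.length - 1],
      count: slice.length,
    });
  }
  return buckets;
}

const ids = Array.from({ length: 100 }, (_, i) => i); // _id values 0..99
console.log(autoBuckets(ids, 5));
// First bucket: { min: 0, max: 20, count: 20 }; last: { min: 80, max: 99, count: 20 }
```

With 100 distinct values and 5 buckets, each run holds exactly 20 documents; with lower cardinality the runs, and therefore the buckets, become uneven.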

Granularity

The $bucketAuto stage accepts an optional granularity parameter, which ensures that the boundaries of all buckets adhere to a specified preferred number series. Using a preferred number series provides more control over where the bucket boundaries are set among the range of values in the groupBy expression. A preferred number series can also help set bucket boundaries evenly on a logarithmic scale when the range of the groupBy expression grows exponentially.

Renard Series

The Renard number series are sets of numbers derived by taking either the 5th, 10th, 20th, 40th, or 80th root of 10, then including the various powers of that root that equate to values between 1.0 and 10.0 (10.3 in the case of R80).

Set granularity to R5, R10, R20, R40, or R80 to restrict bucket boundaries to values in the series. The values of the series are multiplied by a power of 10 when the groupBy values are outside of the 1.0 to 10.0 (10.3 for R80) range.

Example

The R5 series is based on the fifth root of 10, which is approximately 1.58, and includes various powers of this root (rounded) until 10 is reached. The R5 series is derived as follows:

  • 10^(0/5) = 1
  • 10^(1/5) = 1.584 ~ 1.6
  • 10^(2/5) = 2.511 ~ 2.5
  • 10^(3/5) = 3.981 ~ 4.0
  • 10^(4/5) = 6.309 ~ 6.3
  • 10^(5/5) = 10

The same approach is applied to the other Renard series to offer finer granularity, i.e., more intervals between 1.0 and 10.0 (10.3 for R80).
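The R5 derivation above can be reproduced in a few lines of JavaScript. This is only a sketch: MongoDB computes these series internally, and rounding to two significant digits is an assumption that happens to match the published R5 values:

```javascript
// Derive the R5 preferred numbers: the six powers of the fifth root of 10
// between 1.0 and 10.0, rounded to two significant digits.
const r5 = Array.from({ length: 6 }, (_, k) =>
  Number(Math.pow(10, k / 5).toPrecision(2))
);
console.log(r5); // [ 1, 1.6, 2.5, 4, 6.3, 10 ]
```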

E Series

The E number series are similar to the Renard series in that they subdivide the interval from 1.0 to 10.0 by the 6th, 12th, 24th, 48th, 96th, or 192nd root of ten, each with a particular relative error.

Set granularity to E6, E12, E24, E48, E96, or E192 to restrict bucket boundaries to values in the series. The values of the series are multiplied by a power of 10 when the groupBy values are outside of the 1.0 to 10.0 range. To learn more about the E series and their respective relative errors, see preferred number series.

1-2-5 Series

The 1-2-5 series behaves like a three-value Renard series, if such a series existed.

Set granularity to 1-2-5 to restrict bucket boundaries to various powers of the third root of 10, rounded to one significant digit.

Example

The following values are part of the 1-2-5 series: 0.1, 0.2, 0.5, 1, 2, 5, 10, 20, 50, 100, 200, 500, 1000, and so on.
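The base values of the 1-2-5 series can likewise be derived from the third root of 10. This is a sketch; the one-significant-digit rounding follows the description above:

```javascript
// Derive the 1-2-5 base values: powers of the cube root of 10,
// rounded to one significant digit (2.154... -> 2, 4.641... -> 5).
const base = Array.from({ length: 3 }, (_, k) =>
  Number(Math.pow(10, k / 3).toPrecision(1))
);
console.log(base); // [ 1, 2, 5 ]

// Scaling by powers of 10 yields the full series: 0.1, 0.2, 0.5, 1, 2, 5, ...
const series = [-1, 0, 1].flatMap((p) =>
  base.map((b) => Number((b * 10 ** p).toPrecision(1)))
);
console.log(series); // [ 0.1, 0.2, 0.5, 1, 2, 5, 10, 20, 50 ]
```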

Powers of Two Series

Set granularity to POWERSOF2 to restrict bucket boundaries to numbers that are a power of two.

Example

The following numbers adhere to the powers of two series:

  • 2^0 = 1
  • 2^1 = 2
  • 2^2 = 4
  • 2^3 = 8
  • 2^4 = 16
  • 2^5 = 32
  • and so on...

A common example is how various computer components, like memory, often adhere to the POWERSOF2 set of preferred numbers:

1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, and so on...
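One way to picture POWERSOF2 boundary snapping is rounding a tentative boundary up to the next power of two. This is a sketch of the idea, not the server's actual code:

```javascript
// Snap a positive boundary value up to the next power of two.
// Illustrative only; not MongoDB's actual implementation.
function nextPowerOfTwo(n) {
  return 2 ** Math.ceil(Math.log2(n));
}

console.log(nextPowerOfTwo(20)); // 32
console.log(nextPowerOfTwo(64)); // 64 (already a power of two)
console.log(nextPowerOfTwo(99)); // 128
```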

Comparing Different Granularities

The following operation demonstrates how specifying different values for granularity affects how $bucketAuto determines bucket boundaries. A collection named things contains documents with _id values numbered from 0 to 99:

{ _id: 0 }
{ _id: 1 }
...
{ _id: 99 }

Different values for granularity are substituted into the following operation:

db.things.aggregate( [
  {
    $bucketAuto: {
      groupBy: "$_id",
      buckets: 5,
      granularity: <granularity>
    }
  }
] )

The results in the following table demonstrate how different values for granularity yield different bucket boundaries:

No granularity:

  { "_id" : { "min" : 0, "max" : 20 }, "count" : 20 }
  { "_id" : { "min" : 20, "max" : 40 }, "count" : 20 }
  { "_id" : { "min" : 40, "max" : 60 }, "count" : 20 }
  { "_id" : { "min" : 60, "max" : 80 }, "count" : 20 }
  { "_id" : { "min" : 80, "max" : 99 }, "count" : 20 }

R20:

  { "_id" : { "min" : 0, "max" : 20 }, "count" : 20 }
  { "_id" : { "min" : 20, "max" : 40 }, "count" : 20 }
  { "_id" : { "min" : 40, "max" : 63 }, "count" : 23 }
  { "_id" : { "min" : 63, "max" : 90 }, "count" : 27 }
  { "_id" : { "min" : 90, "max" : 100 }, "count" : 10 }

E24:

  { "_id" : { "min" : 0, "max" : 20 }, "count" : 20 }
  { "_id" : { "min" : 20, "max" : 43 }, "count" : 23 }
  { "_id" : { "min" : 43, "max" : 68 }, "count" : 25 }
  { "_id" : { "min" : 68, "max" : 91 }, "count" : 23 }
  { "_id" : { "min" : 91, "max" : 100 }, "count" : 9 }

1-2-5:

  { "_id" : { "min" : 0, "max" : 20 }, "count" : 20 }
  { "_id" : { "min" : 20, "max" : 50 }, "count" : 30 }
  { "_id" : { "min" : 50, "max" : 100 }, "count" : 50 }

  Note: The specified number of buckets exceeds the number of intervals in the series.

POWERSOF2:

  { "_id" : { "min" : 0, "max" : 32 }, "count" : 32 }
  { "_id" : { "min" : 32, "max" : 64 }, "count" : 32 }
  { "_id" : { "min" : 64, "max" : 128 }, "count" : 36 }

  Note: The specified number of buckets exceeds the number of intervals in the series.

Examples

MongoDB Shell

Consider a collection artwork with the following documents:

{ "_id" : 1, "title" : "The Pillars of Society", "artist" : "Grosz", "year" : 1926,
"price" : Decimal128("199.99"),
"dimensions" : { "height" : 39, "width" : 21, "units" : "in" } }
{ "_id" : 2, "title" : "Melancholy III", "artist" : "Munch", "year" : 1902,
"price" : Decimal128("280.00"),
"dimensions" : { "height" : 49, "width" : 32, "units" : "in" } }
{ "_id" : 3, "title" : "Dancer", "artist" : "Miro", "year" : 1925,
"price" : Decimal128("76.04"),
"dimensions" : { "height" : 25, "width" : 20, "units" : "in" } }
{ "_id" : 4, "title" : "The Great Wave off Kanagawa", "artist" : "Hokusai",
"price" : Decimal128("167.30"),
"dimensions" : { "height" : 24, "width" : 36, "units" : "in" } }
{ "_id" : 5, "title" : "The Persistence of Memory", "artist" : "Dali", "year" : 1931,
"price" : Decimal128("483.00"),
"dimensions" : { "height" : 20, "width" : 24, "units" : "in" } }
{ "_id" : 6, "title" : "Composition VII", "artist" : "Kandinsky", "year" : 1913,
"price" : Decimal128("385.00"),
"dimensions" : { "height" : 30, "width" : 46, "units" : "in" } }
{ "_id" : 7, "title" : "The Scream", "artist" : "Munch",
"price" : Decimal128("159.00"),
"dimensions" : { "height" : 24, "width" : 18, "units" : "in" } }
{ "_id" : 8, "title" : "Blue Flower", "artist" : "O'Keefe", "year" : 1918,
"price" : Decimal128("118.42"),
"dimensions" : { "height" : 24, "width" : 20, "units" : "in" } }

Single Facet Aggregation

In the following operation, input documents are grouped into four buckets according to the values in the price field:

db.artwork.aggregate( [
  {
    $bucketAuto: {
      groupBy: "$price",
      buckets: 4
    }
  }
] )

The operation returns the following documents:

{
  "_id" : {
    "min" : Decimal128("76.04"),
    "max" : Decimal128("159.00")
  },
  "count" : 2
}
{
  "_id" : {
    "min" : Decimal128("159.00"),
    "max" : Decimal128("199.99")
  },
  "count" : 2
}
{
  "_id" : {
    "min" : Decimal128("199.99"),
    "max" : Decimal128("385.00")
  },
  "count" : 2
}
{
  "_id" : {
    "min" : Decimal128("385.00"),
    "max" : Decimal128("483.00")
  },
  "count" : 2
}

Multi-Faceted Aggregation

The $bucketAuto stage can be used within the $facet stage to process multiple aggregation pipelines on the same set of input documents from artwork.

The following aggregation pipeline groups the documents from the artwork collection into buckets based on price, year, and the calculated area:

db.artwork.aggregate( [
  {
    $facet: {
      "price": [
        {
          $bucketAuto: {
            groupBy: "$price",
            buckets: 4
          }
        }
      ],
      "year": [
        {
          $bucketAuto: {
            groupBy: "$year",
            buckets: 3,
            output: {
              "count": { $sum: 1 },
              "years": { $push: "$year" }
            }
          }
        }
      ],
      "area": [
        {
          $bucketAuto: {
            groupBy: {
              $multiply: [ "$dimensions.height", "$dimensions.width" ]
            },
            buckets: 4,
            output: {
              "count": { $sum: 1 },
              "titles": { $push: "$title" }
            }
          }
        }
      ]
    }
  }
] )

The operation returns the following document:

{
  "area" : [
    {
      "_id" : { "min" : 432, "max" : 500 },
      "count" : 3,
      "titles" : [
        "The Scream",
        "The Persistence of Memory",
        "Blue Flower"
      ]
    },
    {
      "_id" : { "min" : 500, "max" : 864 },
      "count" : 2,
      "titles" : [
        "Dancer",
        "The Pillars of Society"
      ]
    },
    {
      "_id" : { "min" : 864, "max" : 1568 },
      "count" : 2,
      "titles" : [
        "The Great Wave off Kanagawa",
        "Composition VII"
      ]
    },
    {
      "_id" : { "min" : 1568, "max" : 1568 },
      "count" : 1,
      "titles" : [
        "Melancholy III"
      ]
    }
  ],
  "price" : [
    {
      "_id" : { "min" : Decimal128("76.04"), "max" : Decimal128("159.00") },
      "count" : 2
    },
    {
      "_id" : { "min" : Decimal128("159.00"), "max" : Decimal128("199.99") },
      "count" : 2
    },
    {
      "_id" : { "min" : Decimal128("199.99"), "max" : Decimal128("385.00") },
      "count" : 2
    },
    {
      "_id" : { "min" : Decimal128("385.00"), "max" : Decimal128("483.00") },
      "count" : 2
    }
  ],
  "year" : [
    { "_id" : { "min" : null, "max" : 1913 }, "count" : 3, "years" : [ 1902 ] },
    { "_id" : { "min" : 1913, "max" : 1926 }, "count" : 3, "years" : [ 1913, 1918, 1925 ] },
    { "_id" : { "min" : 1926, "max" : 1931 }, "count" : 2, "years" : [ 1926, 1931 ] }
  ]
}
C#

The C# examples on this page use the sample_mflix database from the Atlas sample datasets. To learn how to create a free MongoDB Atlas cluster and load the sample datasets, see Get Started in the MongoDB .NET/C# Driver documentation.

The following Movie class models the documents in the sample_mflix.movies collection:

public class Movie
{
    public ObjectId Id { get; set; }

    public int Runtime { get; set; }

    public string Title { get; set; }

    public string Rated { get; set; }

    public List<string> Genres { get; set; }

    public string Plot { get; set; }

    public ImdbData Imdb { get; set; }

    public int Year { get; set; }

    public int Index { get; set; }

    public string[] Comments { get; set; }

    [BsonElement("lastupdated")]
    public DateTime LastUpdated { get; set; }
}

Note

ConventionPack for Pascal Case

The C# classes on this page use Pascal case for their property names, but the field names in the MongoDB collection use camel case. To account for this difference, you can use the following code to register a ConventionPack when your application starts:

var camelCaseConvention = new ConventionPack { new CamelCaseElementNameConvention() };
ConventionRegistry.Register("CamelCase", camelCaseConvention, type => true);

To use the MongoDB .NET/C# driver to add a $bucketAuto stage to an aggregation pipeline, call the BucketAuto() method on a PipelineDefinition object.

The following example creates a pipeline stage that evenly distributes documents into five buckets by the value of their Runtime field:

var pipeline = new EmptyPipelineDefinition<Movie>()
    .BucketAuto(
        groupBy: m => m.Runtime,
        buckets: 5);

You can use an AggregateBucketAutoOptions object to specify a preferred number-based scheme to set boundary values. The following example performs the same $bucketAuto operation as the previous example, but also sets the bucket boundaries at powers of 2:

var bucketAutoOptions = new AggregateBucketAutoOptions()
{
    Granularity = new AggregateBucketAutoGranularity("POWERSOF2")
};

var pipeline = new EmptyPipelineDefinition<Movie>()
    .BucketAuto(
        groupBy: m => m.Runtime,
        buckets: 5,
        options: bucketAutoOptions);
Node.js

The Node.js examples on this page use the sample_mflix database from the Atlas sample datasets. To learn how to create a free MongoDB Atlas cluster and load the sample datasets, see Get Started in the MongoDB Node.js driver documentation.

To use the MongoDB Node.js driver to add a $bucketAuto stage to an aggregation pipeline, use the $bucketAuto operator in a pipeline object.

The following example creates a pipeline stage that evenly distributes documents into five buckets by the value of their runtime field. The example then runs the aggregation pipeline:

const pipeline = [
  {
    $bucketAuto: {
      groupBy: "$runtime",
      buckets: 5
    }
  }
];

const cursor = collection.aggregate(pipeline);
return cursor;

The following example performs the same $bucketAuto operation as the previous example, but sets the bucket boundaries as powers of 2 by using the granularity parameter:

const pipeline = [
  {
    $bucketAuto: {
      groupBy: "$runtime",
      buckets: 5,
      granularity: "POWERSOF2"
    }
  }
];

const cursor = collection.aggregate(pipeline);
return cursor;

Learn More

To learn more about related pipeline stages, see the $bucket guide.