$bucketAuto (aggregation)


Definition

$bucketAuto

Categorizes incoming documents into a specific number of groups, called buckets, based on a specified expression. Bucket boundaries are determined automatically in an attempt to distribute the documents evenly across the specified number of buckets.

Each bucket is represented as a document in the output. The document for each bucket contains:

An _id object that specifies the bounds of the bucket.
- The _id.min field specifies the inclusive lower bound for the bucket.
- The _id.max field specifies the upper bound for the bucket. This bound is exclusive for all buckets except the final bucket in the series, where it is inclusive.
A count field that contains the number of documents in the bucket. The count field is included by default when the output document is not specified.
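
The bound semantics described above can be sketched in plain JavaScript. This is only an illustration, not part of the server: inBucket is a hypothetical helper, and the two bucket documents are made-up values in the shape that $bucketAuto emits.

```javascript
// Hypothetical helper: returns true when `value` belongs to `bucket`.
// `isLast` marks the final bucket in the series, whose upper bound is inclusive.
function inBucket(value, bucket, isLast) {
  const { min, max } = bucket._id;
  if (value < min) return false;            // lower bound is always inclusive
  return isLast ? value <= max : value < max;
}

// Illustrative bucket documents (not real query output).
const buckets = [
  { _id: { min: 0, max: 20 }, count: 20 },
  { _id: { min: 20, max: 40 }, count: 20 },
];

console.log(inBucket(20, buckets[0], false)); // false: 20 is the exclusive upper bound
console.log(inBucket(20, buckets[1], false)); // true: 20 is the inclusive lower bound
console.log(inBucket(40, buckets[1], true));  // true: the final bucket includes its max
```

A value that equals a shared boundary therefore always lands in the later of the two buckets.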

The $bucketAuto stage has the following form:

{
  $bucketAuto: {
      groupBy: <expression>,
      buckets: <number>,
      output: {
         <output1>: { <$accumulator expression> },
         ...
      },
      granularity: <string>
  }
}

Field	Type	Description

groupBy	expression	An expression to group documents by. To specify a field path, prefix the field name with a dollar sign $ and enclose it in quotes.

buckets	integer	A positive 32-bit integer that specifies the number of buckets into which input documents are grouped.

output	document	Optional. A document that specifies the fields to include in the output documents in addition to the _id field. To specify a field to include, you must use accumulator expressions:

<outputfield1>: { <accumulator>: <expression1> },
...

The default count field is not included in the output document when output is specified. To include it, explicitly specify the count expression as part of the output document:

output: {
  <outputfield1>: { <accumulator>: <expression1> },
  ...
  count: { $sum: 1 }
}

granularity	string	Optional. A string that specifies the preferred number series to use to ensure that the calculated boundary edges end on preferred round numbers or their powers of 10.

Available only if all groupBy values are numeric and none of them are NaN.

The supported values of granularity are:

`"R5"` `"R10"` `"R20"` `"R40"` `"R80"` `"1-2-5"` `"E6"` `"E12"` `"E24"` `"E48"` `"E96"` `"E192"` `"POWERSOF2"`

Considerations

`$bucketAuto` and Memory Restrictions

The $bucketAuto stage has a limit of 100 megabytes of RAM. By default, if the stage exceeds this limit, $bucketAuto returns an error. To allow more space for stage processing, use the allowDiskUse option to enable aggregation pipeline stages to write data to temporary files.
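
As a minimal sketch of passing that option in mongosh: the pipeline below is the price example from this page, and allowDiskUse is supplied in the options document that aggregate() accepts as its second argument. The wrapper function bucketPrices is hypothetical and exists only so that the snippet does not need a live connection to be defined.

```javascript
// Pipeline taken from the price example on this page.
const pipeline = [
  { $bucketAuto: { groupBy: "$price", buckets: 4 } }
];

// allowDiskUse lets pipeline stages write to temporary files instead of
// returning an error when they exceed the memory limit.
const options = { allowDiskUse: true };

// Hypothetical wrapper: `db` is assumed to be the mongosh database handle.
function bucketPrices(db) {
  return db.artwork.aggregate(pipeline, options);
}
```

In an interactive mongosh session the equivalent call is db.artwork.aggregate(pipeline, options).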

Tip

See also:

Aggregation Pipeline Limits

Behavior

There may be fewer than the specified number of buckets if:

- The number of input documents is less than the specified number of buckets.
- The number of unique values of the groupBy expression is less than the specified number of buckets.
- The granularity has fewer intervals than the number of buckets.
- The granularity is not fine enough to evenly distribute documents into the specified number of buckets.

If the groupBy expression refers to an array or document, the values are arranged using the same ordering as in $sort before the bucket boundaries are determined.

How evenly documents are distributed across buckets depends on the cardinality, or the number of unique values, of the groupBy field. If the cardinality is not high enough, the $bucketAuto stage may not evenly distribute the results across buckets.

Granularity

The $bucketAuto stage accepts an optional granularity parameter, which ensures that the boundaries of all buckets adhere to a specified preferred number series. Using a preferred number series provides more control over where the bucket boundaries are set within the range of values of the groupBy expression. A preferred number series can also help set bucket boundaries evenly on a logarithmic scale when the range of the groupBy values grows exponentially.

Renard Series

The Renard number series are sets of numbers derived by taking the 5th, 10th, 20th, 40th, or 80th root of 10, then including the powers of that root that fall between 1.0 and 10.0 (10.3 in the case of R80).

Set granularity to R5, R10, R20, R40, or R80 to restrict bucket boundaries to values in the series. The values of the series are multiplied by a power of 10 when the groupBy values are outside of the 1.0 to 10.0 (10.3 for R80) range.

Example

The R5 series is based on the fifth root of 10, which is approximately 1.58, and includes successive powers of this root (rounded) until 10 is reached. The R5 series is derived as follows:

10^(0/5) = 1
10^(1/5) = 1.584 ≈ 1.6
10^(2/5) = 2.511 ≈ 2.5
10^(3/5) = 3.981 ≈ 4.0
10^(4/5) = 6.309 ≈ 6.3
10^(5/5) = 10

The same approach is applied to the other Renard series to offer finer granularity, i.e., more intervals between 1.0 and 10.0 (10.3 for R80).
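
The R5 derivation can be reproduced with a few lines of JavaScript. This is a sketch of the arithmetic only; renardTerms is a hypothetical helper name, and for the finer series the standardized Renard values can differ slightly from what plain rounding produces.

```javascript
// Compute the unrounded terms 10^(k/n) for k = 0..n.
function renardTerms(n) {
  const terms = [];
  for (let k = 0; k <= n; k++) {
    terms.push(Math.pow(10, k / n));
  }
  return terms;
}

// Round to one decimal place to mirror the R5 derivation.
const r5 = renardTerms(5).map((t) => Math.round(t * 10) / 10);
console.log(r5.join(", ")); // 1, 1.6, 2.5, 4, 6.3, 10
```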

E Series

The E number series are similar to the Renard series in that they subdivide the interval from 1.0 to 10.0 by the 6th, 12th, 24th, 48th, 96th, or 192nd root of ten, with a particular relative error.

Set granularity to E6, E12, E24, E48, E96, or E192 to restrict bucket boundaries to values in the series. The values of the series are multiplied by a power of 10 when the groupBy values are outside of the 1.0 to 10.0 range. To learn more about the E series and their respective relative errors, see preferred number series.

1-2-5 Series

The 1-2-5 series behaves like a three-value Renard series, if such a series existed.

Set granularity to 1-2-5 to restrict bucket boundaries to various powers of the cube root of 10, rounded to one significant digit.

Example

The following values are part of the 1-2-5 series: 0.1, 0.2, 0.5, 1, 2, 5, 10, 20, 50, 100, 200, 500, 1000, and so on.
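
The listed values can be generated by repeating the multipliers 1, 2, and 5 across successive powers of 10. The following sketch shows that pattern; oneTwoFive and its exponent parameters are hypothetical names chosen for illustration.

```javascript
// Generate 1-2-5 series values from 10^minExp up to and including 10^maxExp.
function oneTwoFive(minExp, maxExp) {
  const values = [];
  for (let e = minExp; e < maxExp; e++) {
    for (const m of [1, 2, 5]) {
      // Parsing "2e-1" style literals avoids floating-point drift
      // that repeated multiplication by 0.1 could introduce.
      values.push(Number(`${m}e${e}`));
    }
  }
  values.push(Number(`1e${maxExp}`));
  return values;
}

console.log(oneTwoFive(-1, 3).join(", "));
// 0.1, 0.2, 0.5, 1, 2, 5, 10, 20, 50, 100, 200, 500, 1000
```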

Powers of Two Series

Set granularity to POWERSOF2 to restrict bucket boundaries to numbers that are a power of two.

Example

The following numbers adhere to the powers of two series:

2^0 = 1
2^1 = 2
2^2 = 4
2^3 = 8
2^4 = 16
2^5 = 32
and so on...

A common example is computer hardware: components such as memory are often sized according to the POWERSOF2 set of preferred numbers:

1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, and so on.

Comparing Different Granularities

The following operation demonstrates how different values for granularity affect the bucket boundaries that $bucketAuto determines. A collection named things contains documents with _id values numbered from 0 to 99:

{ _id: 0 }
{ _id: 1 }
...
{ _id: 99 }

Different values for granularity are substituted into the following operation:

db.things.aggregate( [
  {
    $bucketAuto: {
      groupBy: "$_id",
      buckets: 5,
      granularity: <granularity>
    }
  }
] )

The results in the following table demonstrate how different values for granularity yield different bucket boundaries:

Granularity	Results	Notes
No granularity	{ "_id" : { "min" : 0, "max" : 20 }, "count" : 20 } { "_id" : { "min" : 20, "max" : 40 }, "count" : 20 } { "_id" : { "min" : 40, "max" : 60 }, "count" : 20 } { "_id" : { "min" : 60, "max" : 80 }, "count" : 20 } { "_id" : { "min" : 80, "max" : 99 }, "count" : 20 }
R20	{ "_id" : { "min" : 0, "max" : 20 }, "count" : 20 } { "_id" : { "min" : 20, "max" : 40 }, "count" : 20 } { "_id" : { "min" : 40, "max" : 63 }, "count" : 23 } { "_id" : { "min" : 63, "max" : 90 }, "count" : 27 } { "_id" : { "min" : 90, "max" : 100 }, "count" : 10 }
E24	{ "_id" : { "min" : 0, "max" : 20 }, "count" : 20 } { "_id" : { "min" : 20, "max" : 43 }, "count" : 23 } { "_id" : { "min" : 43, "max" : 68 }, "count" : 25 } { "_id" : { "min" : 68, "max" : 91 }, "count" : 23 } { "_id" : { "min" : 91, "max" : 100 }, "count" : 9 }
1-2-5	{ "_id" : { "min" : 0, "max" : 20 }, "count" : 20 } { "_id" : { "min" : 20, "max" : 50 }, "count" : 30 } { "_id" : { "min" : 50, "max" : 100 }, "count" : 50 }	The specified number of buckets exceeds the number of intervals in the series.
POWERSOF2	{ "_id" : { "min" : 0, "max" : 32 }, "count" : 32 } { "_id" : { "min" : 32, "max" : 64 }, "count" : 32 } { "_id" : { "min" : 64, "max" : 128 }, "count" : 36 }	The specified number of buckets exceeds the number of intervals in the series.
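
The counts in the table can be checked by tallying values against each row's boundaries. The sketch below does not reimplement how $bucketAuto chooses boundaries; it only counts. It assumes the input values are the integers 0 through 99, which is what the counts in the table imply, and countPerBucket is a hypothetical helper.

```javascript
// Assumption: the groupBy values are the integers 0 through 99.
const values = Array.from({ length: 100 }, (_, i) => i);

// Count values per bucket for consecutive boundaries. Lower bounds are
// inclusive; upper bounds are exclusive except for the final bucket.
function countPerBucket(values, boundaries) {
  const counts = [];
  for (let i = 0; i < boundaries.length - 1; i++) {
    const min = boundaries[i];
    const max = boundaries[i + 1];
    const isLast = i === boundaries.length - 2;
    counts.push(
      values.filter((v) => v >= min && (isLast ? v <= max : v < max)).length
    );
  }
  return counts;
}

// Boundaries copied from the table rows.
console.log(countPerBucket(values, [0, 20, 40, 63, 90, 100]).join(", ")); // 20, 20, 23, 27, 10 (R20)
console.log(countPerBucket(values, [0, 20, 50, 100]).join(", "));         // 20, 30, 50 (1-2-5)
console.log(countPerBucket(values, [0, 32, 64, 128]).join(", "));         // 32, 32, 36 (POWERSOF2)
```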

Example

Consider a collection artwork with the following documents:

{ "_id" : 1, "title" : "The Pillars of Society", "artist" : "Grosz", "year" : 1926,
    "price" : NumberDecimal("199.99"),
    "dimensions" : { "height" : 39, "width" : 21, "units" : "in" } }
{ "_id" : 2, "title" : "Melancholy III", "artist" : "Munch", "year" : 1902,
    "price" : NumberDecimal("280.00"),
    "dimensions" : { "height" : 49, "width" : 32, "units" : "in" } }
{ "_id" : 3, "title" : "Dancer", "artist" : "Miro", "year" : 1925,
    "price" : NumberDecimal("76.04"),
    "dimensions" : { "height" : 25, "width" : 20, "units" : "in" } }
{ "_id" : 4, "title" : "The Great Wave off Kanagawa", "artist" : "Hokusai",
    "price" : NumberDecimal("167.30"),
    "dimensions" : { "height" : 24, "width" : 36, "units" : "in" } }
{ "_id" : 5, "title" : "The Persistence of Memory", "artist" : "Dali", "year" : 1931,
    "price" : NumberDecimal("483.00"),
    "dimensions" : { "height" : 20, "width" : 24, "units" : "in" } }
{ "_id" : 6, "title" : "Composition VII", "artist" : "Kandinsky", "year" : 1913,
    "price" : NumberDecimal("385.00"),
    "dimensions" : { "height" : 30, "width" : 46, "units" : "in" } }
{ "_id" : 7, "title" : "The Scream", "artist" : "Munch",
    "price" : NumberDecimal("159.00"),
    "dimensions" : { "height" : 24, "width" : 18, "units" : "in" } }
{ "_id" : 8, "title" : "Blue Flower", "artist" : "O'Keefe", "year" : 1918,
    "price" : NumberDecimal("118.42"),
    "dimensions" : { "height" : 24, "width" : 20, "units" : "in" } }

Single Facet Aggregation

In the following operation, input documents are grouped into four buckets according to the values in the price field:

db.artwork.aggregate( [
   {
     $bucketAuto: {
         groupBy: "$price",
         buckets: 4
     }
   }
] )

The operation returns the following documents:

{
  "_id" : {
    "min" : NumberDecimal("76.04"),
    "max" : NumberDecimal("159.00")
  },
  "count" : 2
}
{
  "_id" : {
    "min" : NumberDecimal("159.00"),
    "max" : NumberDecimal("199.99")
  },
  "count" : 2
}
{
  "_id" : {
    "min" : NumberDecimal("199.99"),
    "max" : NumberDecimal("385.00")
  },
  "count" : 2
}
{
  "_id" : {
    "min" : NumberDecimal("385.00"),
    "max" : NumberDecimal("483.00")
  },
  "count" : 2
}

Multi-Faceted Aggregation

The $bucketAuto stage can be used within the $facet stage to process multiple aggregation pipelines on the same set of input documents from artwork.

The following aggregation pipeline groups the documents from the artwork collection into buckets based on price, year, and the calculated area:

db.artwork.aggregate( [
  {
    $facet: {
      "price": [
        {
          $bucketAuto: {
            groupBy: "$price",
            buckets: 4
          }
        }
      ],
      "year": [
        {
          $bucketAuto: {
            groupBy: "$year",
            buckets: 3,
            output: {
              "count": { $sum: 1 },
              "years": { $push: "$year" }
            }
          }
        }
      ],
      "area": [
        {
          $bucketAuto: {
            groupBy: {
              $multiply: [ "$dimensions.height", "$dimensions.width" ]
            },
            buckets: 4,
            output: {
              "count": { $sum: 1 },
              "titles": { $push: "$title" }
            }
          }
        }
      ]
    }
  }
] )

The operation returns the following document:

{
  "area" : [
    {
      "_id" : { "min" : 432, "max" : 500 },
      "count" : 3,
      "titles" : [
        "The Scream",
        "The Persistence of Memory",
        "Blue Flower"
      ]
    },
    {
      "_id" : { "min" : 500, "max" : 864 },
      "count" : 2,
      "titles" : [
        "Dancer",
        "The Pillars of Society"
      ]
    },
    {
      "_id" : { "min" : 864, "max" : 1568 },
      "count" : 2,
      "titles" : [
        "The Great Wave off Kanagawa",
        "Composition VII"
      ]
    },
    {
      "_id" : { "min" : 1568, "max" : 1568 },
      "count" : 1,
      "titles" : [
        "Melancholy III"
      ]
    }
  ],
  "price" : [
    {
      "_id" : { "min" : NumberDecimal("76.04"), "max" : NumberDecimal("159.00") },
      "count" : 2
    },
    {
      "_id" : { "min" : NumberDecimal("159.00"), "max" : NumberDecimal("199.99") },
      "count" : 2
    },
    {
      "_id" : { "min" : NumberDecimal("199.99"), "max" : NumberDecimal("385.00") },
      "count" : 2
    },
    {
      "_id" : { "min" : NumberDecimal("385.00"), "max" : NumberDecimal("483.00") },
      "count" : 2
    }
  ],
  "year" : [
    { "_id" : { "min" : null, "max" : 1913 }, "count" : 3, "years" : [ 1902 ] },
    { "_id" : { "min" : 1913, "max" : 1926 }, "count" : 3, "years" : [ 1913, 1918, 1925 ] },
    { "_id" : { "min" : 1926, "max" : 1931 }, "count" : 2, "years" : [ 1926, 1931 ] }
  ]
}
