$bucket (aggregation)


Definition

$bucket

Categorizes incoming documents into groups, called buckets, based on a specified expression and bucket boundaries, and outputs one document per bucket. Each output document contains an `_id` field whose value specifies the inclusive lower bound of the bucket. The `output` option specifies the fields included in each output document.

`$bucket` only produces output documents for buckets that contain at least one input document.

Considerations

`$bucket` and Memory Restrictions

The `$bucket` stage has a limit of 100 megabytes of RAM. By default, if the stage exceeds this limit, `$bucket` returns an error. To allow more space for stage processing, use the `allowDiskUse` option to enable aggregation pipeline stages to write data to temporary files.
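As a rough illustration of where the option goes, the sketch below builds a pipeline and an options document in plain JavaScript. The `artists` collection and the `year_born` field are borrowed from the example later on this page. In mongosh the options document is passed as the second argument to `aggregate()`; the call itself is left as a comment because it needs a running deployment.

```javascript
// Pipeline with a single $bucket stage (collection and field names are illustrative).
const pipeline = [
  {
    $bucket: {
      groupBy: "$year_born",
      boundaries: [1840, 1850, 1860, 1870, 1880],
      default: "Other"
    }
  }
];

// Options document: allows pipeline stages to write data to temporary files
// rather than returning an error when the memory limit is exceeded.
const options = { allowDiskUse: true };

// In mongosh, against a running deployment:
// db.artists.aggregate(pipeline, options);
```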

Tip

See also:

Aggregation Pipeline Limits

Syntax

{
  $bucket: {
    groupBy: <expression>,
    boundaries: [ <lowerbound1>, <lowerbound2>, ... ],
    default: <literal>,
    output: {
      <output1>: { <$accumulator expression> },
      ...
      <outputN>: { <$accumulator expression> }
    }
  }
}

The `$bucket` document contains the following fields:

Field	Type	Description
`groupBy`	expression	An expression to group documents by. To specify a field path, prefix the field name with a dollar sign `$` and enclose it in quotes. Unless `$bucket` includes a `default` specification, each input document must resolve the `groupBy` field path or expression to a value that falls within one of the ranges specified by `boundaries`.
`boundaries`	array	An array of values based on the `groupBy` expression that specify the boundaries for each bucket. Each adjacent pair of values acts as the inclusive lower boundary and the exclusive upper boundary for the bucket. You must specify at least two boundaries. The specified values must be in ascending order and all of the same type, except that mixed numeric types are allowed, such as `[ 10, NumberLong(20), NumberInt(30) ]`. Example: an array of `[ 0, 5, 10 ]` creates two buckets: `[0, 5)` with inclusive lower bound `0` and exclusive upper bound `5`, and `[5, 10)` with inclusive lower bound `5` and exclusive upper bound `10`.
`default`	literal	Optional. A literal that specifies the `_id` of an additional bucket that contains all documents whose `groupBy` expression result does not fall into a bucket specified by `boundaries`. If unspecified, each input document must resolve the `groupBy` expression to a value within one of the bucket ranges specified by `boundaries` or the operation throws an error. The `default` value must be less than the lowest `boundaries` value, or greater than or equal to the highest `boundaries` value. The `default` value can be of a different type than the entries in `boundaries`.
`output`	document	Optional. A document that specifies the fields to include in the output documents in addition to the `_id` field. To specify the fields to include, you must use accumulator expressions: `<outputfield1>: { <accumulator>: <expression1> }, ... <outputfieldN>: { <accumulator>: <expressionN> }`. If you do not specify an `output` document, the operation returns a `count` field containing the number of documents in each bucket. If you specify an `output` document, only the fields specified in the document are returned; i.e. the `count` field is not returned unless it is explicitly included in the `output` document.
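The half-open ranges described for `boundaries` can be sketched in plain JavaScript. This is only an illustration, not how the server implements the stage: `findBucketId` is a hypothetical helper, it handles plain numbers only, and treating repeated boundary values as an error is an assumption of the sketch.

```javascript
// Returns the inclusive lower bound of the bucket that holds `value`,
// or undefined when the value falls outside every bucket.
// Assumption: numeric boundaries only; the real stage compares BSON values.
function findBucketId(value, boundaries) {
  if (boundaries.length < 2) {
    throw new Error("boundaries must contain at least two values");
  }
  for (let i = 1; i < boundaries.length; i++) {
    if (boundaries[i] <= boundaries[i - 1]) {
      throw new Error("boundaries must be in ascending order");
    }
  }
  if (typeof value !== "number") return undefined;
  for (let i = 0; i < boundaries.length - 1; i++) {
    // Lower bound is inclusive, upper bound is exclusive.
    if (value >= boundaries[i] && value < boundaries[i + 1]) {
      return boundaries[i];
    }
  }
  return undefined;
}

console.log(findBucketId(0, [0, 5, 10]));  // 0
console.log(findBucketId(5, [0, 5, 10]));  // 5
console.log(findBucketId(10, [0, 5, 10])); // undefined
```

With `[ 0, 5, 10 ]`, the value `5` lands in the second bucket because the first bucket's upper bound is exclusive, and `10` lands in no bucket for the same reason.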

Behavior

`$bucket` requires at least one of the following conditions to be met or the operation throws an error:

Each input document resolves the `groupBy` expression to a value within one of the bucket ranges specified by `boundaries`, or
A `default` value is specified to bucket documents whose `groupBy` values are outside of the `boundaries` or of a different BSON type than the values in `boundaries`.

If the `groupBy` expression resolves to an array or a document, `$bucket` arranges the input documents into buckets using the comparison logic from `$sort`.
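The two conditions above can be mimicked with a small in-memory model. The sketch below is a simplification under stated assumptions: `bucketCounts` is a hypothetical helper, only numeric values are matched against the boundaries, and anything else (a string, a missing field) is treated as falling outside every bucket. It does not reproduce the `$sort` comparison logic used for arrays and documents.

```javascript
// Counts documents per bucket. `options.default`, when present, names the
// bucket for values that match no range; without it such values raise an error.
function bucketCounts(docs, field, boundaries, options = {}) {
  const hasDefault = Object.prototype.hasOwnProperty.call(options, "default");
  const counts = new Map();
  for (const doc of docs) {
    const value = doc[field];
    let id;
    let found = false;
    if (typeof value === "number") {
      for (let i = 0; i < boundaries.length - 1; i++) {
        if (value >= boundaries[i] && value < boundaries[i + 1]) {
          id = boundaries[i];
          found = true;
          break;
        }
      }
    }
    if (!found) {
      if (!hasDefault) {
        throw new Error("value does not fall into any bucket and no default is set");
      }
      id = options.default;
    }
    counts.set(id, (counts.get(id) || 0) + 1);
  }
  return counts;
}

const docs = [{ n: 1 }, { n: 7 }, { n: 12 }, { n: "seven" }];
const withDefault = bucketCounts(docs, "n", [0, 5, 10], { default: "Other" });
console.log(withDefault.get(0), withDefault.get(5), withDefault.get("Other")); // 1 1 2
```

Calling `bucketCounts(docs, "n", [0, 5, 10])` without a default throws, because `12` and `"seven"` match no range. Buckets that receive no documents never appear in the map, which mirrors the stage producing output only for non-empty buckets.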

Examples

Bucket by Year and Filter by Bucket Results

In mongosh, create a sample collection named `artists` with the following documents:

db.artists.insertMany([
  { "_id" : 1, "last_name" : "Bernard", "first_name" : "Emil", "year_born" : 1868, "year_died" : 1941, "nationality" : "France" },
  { "_id" : 2, "last_name" : "Rippl-Ronai", "first_name" : "Joszef", "year_born" : 1861, "year_died" : 1927, "nationality" : "Hungary" },
  { "_id" : 3, "last_name" : "Ostroumova", "first_name" : "Anna", "year_born" : 1871, "year_died" : 1955, "nationality" : "Russia" },
  { "_id" : 4, "last_name" : "Van Gogh", "first_name" : "Vincent", "year_born" : 1853, "year_died" : 1890, "nationality" : "Holland" },
  { "_id" : 5, "last_name" : "Maurer", "first_name" : "Alfred", "year_born" : 1868, "year_died" : 1932, "nationality" : "USA" },
  { "_id" : 6, "last_name" : "Munch", "first_name" : "Edvard", "year_born" : 1863, "year_died" : 1944, "nationality" : "Norway" },
  { "_id" : 7, "last_name" : "Redon", "first_name" : "Odilon", "year_born" : 1840, "year_died" : 1916, "nationality" : "France" },
  { "_id" : 8, "last_name" : "Diriks", "first_name" : "Edvard", "year_born" : 1855, "year_died" : 1930, "nationality" : "Norway" }
])

The following operation groups the documents into buckets according to the `year_born` field and filters based on the count of documents in the buckets:

db.artists.aggregate( [
  // First Stage
  {
    $bucket: {
      groupBy: "$year_born",                        // Field to group by
      boundaries: [ 1840, 1850, 1860, 1870, 1880 ], // Boundaries for the buckets
      default: "Other",                             // Bucket id for documents which do not fall into a bucket
      output: {                                     // Output for each bucket
        "count": { $sum: 1 },
        "artists" :
          {
            $push: {
              "name": { $concat: [ "$first_name", " ", "$last_name"] },
              "year_born": "$year_born"
            }
          }
      }
    }
  },
  // Second Stage
  {
    $match: { count: {$gt: 3} }
  }
] )

First Stage

The `$bucket` stage groups the documents into buckets by the `year_born` field. The buckets have the following boundaries:

[1840, 1850) with inclusive lower bound 1840 and exclusive upper bound 1850.
[1850, 1860) with inclusive lower bound 1850 and exclusive upper bound 1860.
[1860, 1870) with inclusive lower bound 1860 and exclusive upper bound 1870.
[1870, 1880) with inclusive lower bound 1870 and exclusive upper bound 1880.

If a document did not contain the `year_born` field or its `year_born` field was outside the ranges above, it would be placed in the default bucket with the `_id` value "Other".

The stage includes the `output` document to determine the fields to return:

Field	Description
`_id`	Inclusive lower bound of the bucket.
`count`	Count of documents in the bucket.
`artists`	Array of documents containing information on each artist in the bucket. Each document contains the artist's `name`, which is a concatenation (i.e. `$concat`) of the artist's `first_name` and `last_name`, and the artist's `year_born`.

This stage passes the following documents to the next stage:

{ "_id" : 1840, "count" : 1, "artists" : [ { "name" : "Odilon Redon", "year_born" : 1840 } ] }
{ "_id" : 1850, "count" : 2, "artists" : [ { "name" : "Vincent Van Gogh", "year_born" : 1853 },
                                           { "name" : "Edvard Diriks", "year_born" : 1855 } ] }
{ "_id" : 1860, "count" : 4, "artists" : [ { "name" : "Emil Bernard", "year_born" : 1868 },
                                           { "name" : "Joszef Rippl-Ronai", "year_born" : 1861 },
                                           { "name" : "Alfred Maurer", "year_born" : 1868 },
                                           { "name" : "Edvard Munch", "year_born" : 1863 } ] }
{ "_id" : 1870, "count" : 1, "artists" : [ { "name" : "Anna Ostroumova", "year_born" : 1871 } ] }

Second Stage

The `$match` stage filters the output from the previous stage to only return buckets which contain more than 3 documents.

The operation returns the following document:

{ "_id" : 1860, "count" : 4, "artists" :
  [
    { "name" : "Emil Bernard", "year_born" : 1868 },
    { "name" : "Joszef Rippl-Ronai", "year_born" : 1861 },
    { "name" : "Alfred Maurer", "year_born" : 1868 },
    { "name" : "Edvard Munch", "year_born" : 1863 }
  ]
}

Use $bucket with $facet to Bucket by Multiple Fields

You can use the `$facet` stage to perform multiple `$bucket` aggregations in a single stage.

In mongosh, create a sample collection named `artwork` with the following documents:

db.artwork.insertMany([
  { "_id" : 1, "title" : "The Pillars of Society", "artist" : "Grosz", "year" : 1926,
      "price" : NumberDecimal("199.99") },
  { "_id" : 2, "title" : "Melancholy III", "artist" : "Munch", "year" : 1902,
      "price" : NumberDecimal("280.00") },
  { "_id" : 3, "title" : "Dancer", "artist" : "Miro", "year" : 1925,
      "price" : NumberDecimal("76.04") },
  { "_id" : 4, "title" : "The Great Wave off Kanagawa", "artist" : "Hokusai",
      "price" : NumberDecimal("167.30") },
  { "_id" : 5, "title" : "The Persistence of Memory", "artist" : "Dali", "year" : 1931,
      "price" : NumberDecimal("483.00") },
  { "_id" : 6, "title" : "Composition VII", "artist" : "Kandinsky", "year" : 1913,
      "price" : NumberDecimal("385.00") },
  { "_id" : 7, "title" : "The Scream", "artist" : "Munch", "year" : 1893
      /* No price*/ },
  { "_id" : 8, "title" : "Blue Flower", "artist" : "O'Keefe", "year" : 1918,
      "price" : NumberDecimal("118.42") }
])

The following operation uses two `$bucket` stages within a `$facet` stage to create two groupings, one by `price` and the other by `year`:

db.artwork.aggregate( [
  {
    $facet: {                                     // Top-level $facet stage
      "price": [                                  // Output field 1
        {
          $bucket: {
            groupBy: "$price",                    // Field to group by
            boundaries: [ 0, 200, 400 ],          // Boundaries for the buckets
            default: "Other",                     // Bucket id for documents which do not fall into a bucket
            output: {                             // Output for each bucket
              "count": { $sum: 1 },
              "artwork" : { $push: { "title": "$title", "price": "$price" } },
              "averagePrice": { $avg: "$price" }
            }
          }
        }
      ],
      "year": [                                   // Output field 2
        {
          $bucket: {
            groupBy: "$year",                     // Field to group by
            boundaries: [ 1890, 1910, 1920, 1940 ], // Boundaries for the buckets
            default: "Unknown",                   // Bucket id for documents which do not fall into a bucket
            output: {                             // Output for each bucket
              "count": { $sum: 1 },
              "artwork": { $push: { "title": "$title", "year": "$year" } }
            }
          }
        }
      ]
    }
  }
] )

First Facet

The first facet groups the input documents by `price`. The buckets have the following boundaries:

[0, 200) with inclusive lower bound 0 and exclusive upper bound 200.
[200, 400) with inclusive lower bound 200 and exclusive upper bound 400.
"Other", the default bucket containing documents without prices or with prices outside the ranges above.

The `$bucket` stage includes the `output` document to determine the fields to return:

Field	Description
`_id`	Inclusive lower bound of the bucket.
`count`	Count of documents in the bucket.
`artwork`	Array of documents containing information on each artwork in the bucket.
`averagePrice`	Employs the `$avg` operator to display the average price of all artwork in the bucket.

Second Facet

The second facet groups the input documents by `year`. The buckets have the following boundaries:

[1890, 1910) with inclusive lower bound 1890 and exclusive upper bound 1910.
[1910, 1920) with inclusive lower bound 1910 and exclusive upper bound 1920.
[1920, 1940) with inclusive lower bound 1920 and exclusive upper bound 1940.
"Unknown", the default bucket containing documents without years or with years outside the ranges above.

The `$bucket` stage includes the `output` document to determine the fields to return:

Field	Description
`count`	Count of documents in the bucket.
`artwork`	Array of documents containing information on each artwork in the bucket.

Output
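The per-bucket counts that the two facets should report for the sample data can be cross-checked outside the database. The sketch below is plain JavaScript, not the aggregation itself: `countByBucket` is a hypothetical helper, the documents are reduced to the two grouped fields, and prices are written as plain numbers instead of `NumberDecimal` values.

```javascript
// Sample data reduced to the fields the facets group by.
const artwork = [
  { year: 1926, price: 199.99 },
  { year: 1902, price: 280.0 },
  { year: 1925, price: 76.04 },
  { price: 167.3 },               // no year
  { year: 1931, price: 483.0 },
  { year: 1913, price: 385.0 },
  { year: 1893 },                 // no price
  { year: 1918, price: 118.42 }
];

// Counts values per bucket id; values outside every range (including
// undefined for a missing field) are counted under defaultId.
function countByBucket(values, boundaries, defaultId) {
  const counts = {};
  for (const v of values) {
    const i = boundaries.findIndex(
      (lower, k) => k < boundaries.length - 1 && v >= lower && v < boundaries[k + 1]
    );
    const id = i === -1 ? defaultId : boundaries[i];
    counts[id] = (counts[id] || 0) + 1;
  }
  return counts;
}

const priceCounts = countByBucket(artwork.map(d => d.price), [0, 200, 400], "Other");
const yearCounts = countByBucket(artwork.map(d => d.year), [1890, 1910, 1920, 1940], "Unknown");
console.log(priceCounts); // bucket 0: 4, bucket 200: 2, "Other": 2
console.log(yearCounts);  // bucket 1890: 2, bucket 1910: 2, bucket 1920: 3, "Unknown": 1
```

Under these assumptions, the "Other" price bucket holds both the 483.00 artwork and the document without a price, and the "Unknown" year bucket holds the one document without a year.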