$bucket (aggregation)

On this page

Definition
Considerations
Syntax
Behavior
Examples

Definition

$bucket

Categorizes incoming documents into groups, called buckets, based on a specified expression and bucket boundaries, and outputs one document for each bucket. Each output document contains an _id field whose value specifies the inclusive lower bound of the bucket (or, for the default bucket, the default value). The output option specifies the fields included in each output document.

$bucket only produces output documents for buckets that contain at least one input document.
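The placement rule can be sketched in plain JavaScript, outside MongoDB, to make the half-open ranges concrete. This is a minimal illustration that assumes numeric boundaries. The helper names `bucketLowerBound` and `countPerBucket` are hypothetical and are not part of any MongoDB API.

```javascript
// Hypothetical helper: returns the inclusive lower bound of the bucket that
// `value` falls into, or null if it is outside every [lower, upper) range.
// Assumes `boundaries` is an ascending array of numbers.
function bucketLowerBound(value, boundaries) {
  for (let i = 0; i < boundaries.length - 1; i++) {
    // Each adjacent pair is a half-open range: lower inclusive, upper exclusive.
    if (value >= boundaries[i] && value < boundaries[i + 1]) {
      return boundaries[i];
    }
  }
  return null;
}

// Counts values per bucket. Like $bucket, an entry is only created once a
// value lands in that bucket, so empty buckets never appear in the result.
function countPerBucket(values, boundaries) {
  const counts = new Map();
  for (const v of values) {
    const id = bucketLowerBound(v, boundaries);
    if (id === null) continue; // simplification: real $bucket errors or uses `default`
    counts.set(id, (counts.get(id) || 0) + 1);
  }
  return counts;
}

console.log(bucketLowerBound(0, [0, 5, 10]));  // 0
console.log(bucketLowerBound(10, [0, 5, 10])); // null (10 is the exclusive upper bound)
console.log([...countPerBucket([1, 2, 7], [0, 5, 10, 15])]); // [ [ 0, 2 ], [ 5, 1 ] ]
```

Note that the bucket `[10, 15)` is absent from the counts because no value falls into it.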

Considerations

`$bucket` and Memory Restrictions

The $bucket stage has a limit of 100 megabytes of RAM. By default, if the stage exceeds this limit, $bucket returns an error. To allow more space for stage processing, use the allowDiskUse option to enable aggregation pipeline stages to write data to temporary files.
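As a minimal sketch, the option is passed as the second argument to db.collection.aggregate(). The pipeline below reuses the artists collection from the examples on this page. The variable names `pipeline` and `options` are illustrative only.

```javascript
// A $bucket pipeline over the artists collection used in the examples below.
const pipeline = [
  {
    $bucket: {
      groupBy: "$year_born",
      boundaries: [ 1840, 1850, 1860, 1870, 1880 ],
      default: "Other"
    }
  }
];

// allowDiskUse lets stages that exceed the 100 MB memory limit
// write temporary files instead of returning an error.
const options = { allowDiskUse: true };

// Then, in mongosh:
// db.artists.aggregate( pipeline, options )
```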

Syntax

{
  $bucket: {
      groupBy: <expression>,
      boundaries: [ <lowerbound1>, <lowerbound2>, ... ],
      default: <literal>,
      output: {
         <output1>: { <$accumulator expression> },
         ...
         <outputN>: { <$accumulator expression> }
      }
   }
}

The $bucket document contains the following fields:

Field	Type	Description
`groupBy`	expression	An expression to group documents by. To specify a field path, prefix the field name with a dollar sign `$` and enclose it in quotes. Unless `$bucket` includes a `default` specification, each input document must resolve the `groupBy` field path or expression to a value that falls within one of the ranges specified by `boundaries`.
`boundaries`	array	An array of values based on the `groupBy` expression that specify the boundaries for each bucket. Each adjacent pair of values acts as the inclusive lower boundary and the exclusive upper boundary for the bucket. You must specify at least two boundaries. The specified values must be in ascending order and all of the same type. The exception is if the values are of mixed numeric types, such as `[ 10, NumberLong(20), NumberInt(30) ]`. Example: an array of `[ 0, 5, 10 ]` creates two buckets: `[0, 5)` with inclusive lower bound `0` and exclusive upper bound `5`, and `[5, 10)` with inclusive lower bound `5` and exclusive upper bound `10`.
`default`	literal	Optional. A literal that specifies the `_id` of an additional bucket that contains all documents whose `groupBy` expression result does not fall into a bucket specified by `boundaries`. If unspecified, each input document must resolve the `groupBy` expression to a value within one of the bucket ranges specified by `boundaries`, or the operation throws an error. The `default` value must be less than the lowest `boundaries` value, or greater than or equal to the highest `boundaries` value. The `default` value can be of a different type than the entries in `boundaries`.
`output`	document	Optional. A document that specifies the fields to include in the output documents in addition to the `_id` field. To specify the fields to include, you must use accumulator expressions: `<outputfield1>: { <accumulator>: <expression1> }, ... <outputfieldN>: { <accumulator>: <expressionN> }`. If you do not specify an `output` document, the operation returns a `count` field containing the number of documents in each bucket. If you specify an `output` document, only the fields specified in the document are returned; that is, the `count` field is not returned unless it is explicitly included in the `output` document.
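The `boundaries`, `default` and `output` rules in the table can be restated as a small validation sketch. This is a hypothetical plain-JavaScript illustration that assumes numeric boundaries, so the mixed-numeric-type exception collapses to ordinary JS numbers. MongoDB performs its own validation on the server, and its error messages differ from the strings used here.

```javascript
// Hypothetical validator for the boundaries/default rules of $bucket.
// Assumes numeric boundaries; returns an error string, or null if valid.
function validateBucketSpec({ boundaries, default: defaultId }) {
  if (!Array.isArray(boundaries) || boundaries.length < 2) {
    return "boundaries must contain at least two values";
  }
  for (let i = 1; i < boundaries.length; i++) {
    // This sketch requires strictly ascending values.
    if (!(boundaries[i - 1] < boundaries[i])) {
      return "boundaries must be in ascending order";
    }
  }
  // A numeric default must not fall inside [lowest, highest).
  // A default of another type (for example the string "Other") is allowed.
  if (typeof defaultId === "number") {
    const lowest = boundaries[0];
    const highest = boundaries[boundaries.length - 1];
    if (defaultId >= lowest && defaultId < highest) {
      return "default must be < lowest boundary or >= highest boundary";
    }
  }
  return null;
}

// Which fields each bucket document contains: count appears only when
// no output document is given.
function outputFieldNames({ output }) {
  return output === undefined ? ["_id", "count"] : ["_id", ...Object.keys(output)];
}

console.log(validateBucketSpec({ boundaries: [0, 5, 10], default: 5 }));       // error: 5 is inside [0, 10)
console.log(validateBucketSpec({ boundaries: [0, 5, 10], default: "Other" })); // null
console.log(outputFieldNames({ boundaries: [0, 5, 10] }));                     // [ '_id', 'count' ]
```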

Behavior

$bucket requires at least one of the following conditions to be met, or the operation throws an error:

Each input document resolves the groupBy expression to a value within one of the bucket ranges specified by boundaries, or
A default value is specified to bucket documents whose groupBy values are outside of the boundaries or of a different BSON type than the values in boundaries.
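The two conditions above can be illustrated with a sketch of how a single groupBy value is routed. This is a simplified plain-JavaScript model that assumes numeric boundaries and treats any non-number (a missing field, a string, and so on) as outside every range. It does not reproduce full BSON comparison order. The function name `assignBucket` is hypothetical.

```javascript
// Hypothetical sketch of how $bucket routes one groupBy value.
// Assumes numeric boundaries; non-numbers never match a numeric range.
function assignBucket(value, boundaries, defaultId) {
  if (typeof value === "number") {
    for (let i = 0; i < boundaries.length - 1; i++) {
      if (value >= boundaries[i] && value < boundaries[i + 1]) return boundaries[i];
    }
  }
  if (defaultId !== undefined) return defaultId;
  // Without a default, an out-of-range value makes the whole operation fail.
  throw new Error(`value ${String(value)} is outside the boundaries and no default is set`);
}

console.log(assignBucket(7, [0, 5, 10]));                  // 5
console.log(assignBucket(12, [0, 5, 10], "Other"));        // Other
console.log(assignBucket(undefined, [0, 5, 10], "Other")); // Other (missing field)
```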

If the groupBy expression resolves to an array or a document, $bucket arranges the input documents into buckets using the comparison logic from $sort.

Examples

Bucket by Year and Filter by Bucket Results

In mongosh, create a sample collection named artists with the following documents:

db.artists.insertMany([
  { "_id" : 1, "last_name" : "Bernard", "first_name" : "Emil", "year_born" : 1868, "year_died" : 1941, "nationality" : "France" },
  { "_id" : 2, "last_name" : "Rippl-Ronai", "first_name" : "Joszef", "year_born" : 1861, "year_died" : 1927, "nationality" : "Hungary" },
  { "_id" : 3, "last_name" : "Ostroumova", "first_name" : "Anna", "year_born" : 1871, "year_died" : 1955, "nationality" : "Russia" },
  { "_id" : 4, "last_name" : "Van Gogh", "first_name" : "Vincent", "year_born" : 1853, "year_died" : 1890, "nationality" : "Holland" },
  { "_id" : 5, "last_name" : "Maurer", "first_name" : "Alfred", "year_born" : 1868, "year_died" : 1932, "nationality" : "USA" },
  { "_id" : 6, "last_name" : "Munch", "first_name" : "Edvard", "year_born" : 1863, "year_died" : 1944, "nationality" : "Norway" },
  { "_id" : 7, "last_name" : "Redon", "first_name" : "Odilon", "year_born" : 1840, "year_died" : 1916, "nationality" : "France" },
  { "_id" : 8, "last_name" : "Diriks", "first_name" : "Edvard", "year_born" : 1855, "year_died" : 1930, "nationality" : "Norway" }
])

The following operation groups the documents into buckets according to the year_born field and filters based on the count of documents in the buckets:

db.artists.aggregate( [
  // First Stage
  {
    $bucket: {
      groupBy: "$year_born",                        // Field to group by
      boundaries: [ 1840, 1850, 1860, 1870, 1880 ], // Boundaries for the buckets
      default: "Other",                             // Bucket ID for documents which do not fall into a bucket
      output: {                                     // Output for each bucket
        "count": { $sum: 1 },
        "artists" :
          {
            $push: {
              "name": { $concat: [ "$first_name", " ", "$last_name"] },
              "year_born": "$year_born"
            }
          }
      }
    }
  },
  // Second Stage
  {
    $match: { count: {$gt: 3} }
  }
] )

First Stage

The $bucket stage groups the documents into buckets by the year_born field. The buckets have the following boundaries:

[1840, 1850) with inclusive lower bound 1840 and exclusive upper bound 1850.
[1850, 1860) with inclusive lower bound 1850 and exclusive upper bound 1860.
[1860, 1870) with inclusive lower bound 1860 and exclusive upper bound 1870.
[1870, 1880) with inclusive lower bound 1870 and exclusive upper bound 1880.
If a document did not contain the year_born field, or its year_born field was outside the ranges above, it would be placed in the default bucket with the _id value "Other".

The stage includes the output document to determine the fields to return:

Field	Description
`_id`	Inclusive lower bound of the bucket.
`count`	Count of documents in the bucket.
`artists`	Array of documents containing information on each artist in the bucket. Each document contains the artist's `name`, which is a concatenation (i.e. `$concat`) of the artist's `first_name` and `last_name`, and the artist's `year_born`.

This stage passes the following documents to the next stage:

{ "_id" : 1840, "count" : 1, "artists" : [ { "name" : "Odilon Redon", "year_born" : 1840 } ] }

{ "_id" : 1850, "count" : 2, "artists" : [ { "name" : "Vincent Van Gogh", "year_born" : 1853 },
                                           { "name" : "Edvard Diriks", "year_born" : 1855 } ] }

{ "_id" : 1860, "count" : 4, "artists" : [ { "name" : "Emil Bernard", "year_born" : 1868 },
                                           { "name" : "Joszef Rippl-Ronai", "year_born" : 1861 },
                                           { "name" : "Alfred Maurer", "year_born" : 1868 },
                                           { "name" : "Edvard Munch", "year_born" : 1863 } ] }

{ "_id" : 1870, "count" : 1, "artists" : [ { "name" : "Anna Ostroumova", "year_born" : 1871 } ] }

Second Stage

The $match stage filters the output from the previous stage to only return buckets which contain more than 3 documents.

The operation returns the following document:

{ "_id" : 1860, "count" : 4, "artists" :
  [
    { "name" : "Emil Bernard", "year_born" : 1868 },
    { "name" : "Joszef Rippl-Ronai", "year_born" : 1861 },
    { "name" : "Alfred Maurer", "year_born" : 1868 },
    { "name" : "Edvard Munch", "year_born" : 1863 }
  ]
}
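To cross-check the two stages without a server, the pipeline can be approximated in plain JavaScript over the same eight documents. This is an illustrative sketch only. `bucketArtists` is a hypothetical helper, the documents are reduced to the fields the pipeline reads, and the sketch assumes a default bucket ID is supplied.

```javascript
// The sample artists, reduced to the fields the pipeline reads.
const artists = [
  { first_name: "Emil", last_name: "Bernard", year_born: 1868 },
  { first_name: "Joszef", last_name: "Rippl-Ronai", year_born: 1861 },
  { first_name: "Anna", last_name: "Ostroumova", year_born: 1871 },
  { first_name: "Vincent", last_name: "Van Gogh", year_born: 1853 },
  { first_name: "Alfred", last_name: "Maurer", year_born: 1868 },
  { first_name: "Edvard", last_name: "Munch", year_born: 1863 },
  { first_name: "Odilon", last_name: "Redon", year_born: 1840 },
  { first_name: "Edvard", last_name: "Diriks", year_born: 1855 }
];

// Hypothetical helper mirroring the first stage: $bucket on year_born with
// { $sum: 1 } for count and a $push of { name, year_born } for artists.
function bucketArtists(docs, boundaries, defaultId) {
  const buckets = new Map();
  for (const doc of docs) {
    let id = defaultId;
    for (let i = 0; i < boundaries.length - 1; i++) {
      if (doc.year_born >= boundaries[i] && doc.year_born < boundaries[i + 1]) {
        id = boundaries[i];
        break;
      }
    }
    if (!buckets.has(id)) buckets.set(id, { _id: id, count: 0, artists: [] });
    const bucket = buckets.get(id);
    bucket.count += 1;
    bucket.artists.push({ name: `${doc.first_name} ${doc.last_name}`, year_born: doc.year_born });
  }
  // Numeric buckets in ascending _id order, then the default bucket (if any) last.
  const all = [...buckets.values()];
  return [
    ...all.filter(b => b._id !== defaultId).sort((a, b) => a._id - b._id),
    ...all.filter(b => b._id === defaultId)
  ];
}

// First stage, then the equivalent of { $match: { count: { $gt: 3 } } }.
const stage1 = bucketArtists(artists, [1840, 1850, 1860, 1870, 1880], "Other");
const result = stage1.filter(bucket => bucket.count > 3);

console.log(stage1.map(b => b.count)); // [ 1, 2, 4, 1 ]
console.log(result.map(b => b._id));   // [ 1860 ]
```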

Use $bucket with $facet to Bucket by Multiple Fields

You can use the $facet stage to perform multiple $bucket aggregations in a single stage.

In mongosh, create a sample collection named artwork with the following documents:

db.artwork.insertMany([
  { "_id" : 1, "title" : "The Pillars of Society", "artist" : "Grosz", "year" : 1926,
      "price" : NumberDecimal("199.99") },
  { "_id" : 2, "title" : "Melancholy III", "artist" : "Munch", "year" : 1902,
      "price" : NumberDecimal("280.00") },
  { "_id" : 3, "title" : "Dancer", "artist" : "Miro", "year" : 1925,
      "price" : NumberDecimal("76.04") },
  { "_id" : 4, "title" : "The Great Wave off Kanagawa", "artist" : "Hokusai",
      "price" : NumberDecimal("167.30") },
  { "_id" : 5, "title" : "The Persistence of Memory", "artist" : "Dali", "year" : 1931,
      "price" : NumberDecimal("483.00") },
  { "_id" : 6, "title" : "Composition VII", "artist" : "Kandinsky", "year" : 1913,
      "price" : NumberDecimal("385.00") },
  { "_id" : 7, "title" : "The Scream", "artist" : "Munch", "year" : 1893
      /* No price*/ },
  { "_id" : 8, "title" : "Blue Flower", "artist" : "O'Keefe", "year" : 1918,
      "price" : NumberDecimal("118.42") }
])

The following operation uses two $bucket stages within a $facet stage to create two groupings, one by price and the other by year:

db.artwork.aggregate( [
  {
    $facet: {                               // Top-level $facet stage
      "price": [                            // Output field 1
        {
          $bucket: {
              groupBy: "$price",            // Field to group by
              boundaries: [ 0, 200, 400 ],  // Boundaries for the buckets
              default: "Other",             // Bucket ID for documents which do not fall into a bucket
              output: {                     // Output for each bucket
                "count": { $sum: 1 },
                "artwork" : { $push: { "title": "$title", "price": "$price" } },
                "averagePrice": { $avg: "$price" }
              }
          }
        }
      ],
      "year": [                                      // Output field 2
        {
          $bucket: {
            groupBy: "$year",                        // Field to group by
            boundaries: [ 1890, 1910, 1920, 1940 ],  // Boundaries for the buckets
            default: "Unknown",                      // Bucket ID for documents which do not fall into a bucket
            output: {                                // Output for each bucket
              "count": { $sum: 1 },
              "artwork": { $push: { "title": "$title", "year": "$year" } }
            }
          }
        }
      ]
    }
  }
] )

First Facet

The first facet groups the input documents by price. The buckets have the following boundaries:

[0, 200) with inclusive lower bound 0 and exclusive upper bound 200.
[200, 400) with inclusive lower bound 200 and exclusive upper bound 400.
"Other", the default bucket containing documents without prices or with prices outside the ranges above.

The $bucket stage includes the output document to determine the fields to return:

Field	Description
`_id`	Inclusive lower bound of the bucket.
`count`	Count of documents in the bucket.
`artwork`	Array of documents containing information on each artwork in the bucket.
`averagePrice`	Employs the `$avg` operator to display the average price of all artwork in the bucket.
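One detail worth noticing in the output is that the "Other" bucket has a `count` of 2 but an average over a single price, because `$avg` ignores missing values. The sketch below reproduces that arithmetic in plain JavaScript. It assumes prices stored as integer cents to keep the arithmetic exact (the server uses NumberDecimal), so bucket `_id` values are in cents too (20000 corresponds to 200). The helper `bucketWithAverage` is hypothetical.

```javascript
// Prices from the artwork collection in integer cents, in insertion order.
// null stands for "The Scream", which has no price.
const pricesInCents = [19999, 28000, 7604, 16730, 48300, 38500, null, 11842];

// Hypothetical helper: bucket the values, then report count and average per bucket.
function bucketWithAverage(values, boundaries, defaultId) {
  const groups = new Map();
  for (const v of values) {
    let id = defaultId;
    if (v !== null) {
      for (let i = 0; i < boundaries.length - 1; i++) {
        if (v >= boundaries[i] && v < boundaries[i + 1]) {
          id = boundaries[i];
          break;
        }
      }
    }
    if (!groups.has(id)) groups.set(id, []);
    groups.get(id).push(v);
  }
  // Buckets come out in first-seen order, which matches the documented order for this data.
  return [...groups].map(([id, vals]) => {
    const priced = vals.filter(v => v !== null); // like $avg, skip missing values
    const average = priced.length > 0
      ? priced.reduce((sum, v) => sum + v, 0) / priced.length
      : null;
    return { _id: id, count: vals.length, averageCents: average };
  });
}

const priceFacet = bucketWithAverage(pricesInCents, [0, 20000, 40000], "Other");
// _id 0: count 4, averageCents 14043.75 ($140.4375)
// _id 20000: count 2, averageCents 33250 ($332.50)
// _id "Other": count 2, averageCents 48300 (only one of its two documents has a price)
```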

Second Facet

The second facet groups the input documents by year. The buckets have the following boundaries:

[1890, 1910) with inclusive lower bound 1890 and exclusive upper bound 1910.
[1910, 1920) with inclusive lower bound 1910 and exclusive upper bound 1920.
[1920, 1940) with inclusive lower bound 1920 and exclusive upper bound 1940.
"Unknown", the default bucket containing documents without years or with years outside the ranges above.

The $bucket stage includes the output document to determine the fields to return:

Field	Description
`_id`	Inclusive lower bound of the bucket.
`count`	Count of documents in the bucket.
`artwork`	Array of documents containing information on each artwork in the bucket.

Output

The operation returns the following document:

{
  "price" : [ // Output of first facet
    {
      "_id" : 0,
      "count" : 4,
      "artwork" : [
        { "title" : "The Pillars of Society", "price" : NumberDecimal("199.99") },
        { "title" : "Dancer", "price" : NumberDecimal("76.04") },
        { "title" : "The Great Wave off Kanagawa", "price" : NumberDecimal("167.30") },
        { "title" : "Blue Flower", "price" : NumberDecimal("118.42") }
      ],
      "averagePrice" : NumberDecimal("140.4375")
    },
    {
      "_id" : 200,
      "count" : 2,
      "artwork" : [
        { "title" : "Melancholy III", "price" : NumberDecimal("280.00") },
        { "title" : "Composition VII", "price" : NumberDecimal("385.00") }
      ],
      "averagePrice" : NumberDecimal("332.50")
    },
    {
      // Includes documents without prices and prices of 400 or more
      "_id" : "Other",
      "count" : 2,
      "artwork" : [
        { "title" : "The Persistence of Memory", "price" : NumberDecimal("483.00") },
        { "title" : "The Scream" }
      ],
      "averagePrice" : NumberDecimal("483.00")
    }
  ],
  "year" : [ // Output of second facet
    {
      "_id" : 1890,
      "count" : 2,
      "artwork" : [
        { "title" : "Melancholy III", "year" : 1902 },
        { "title" : "The Scream", "year" : 1893 }
      ]
    },
    {
      "_id" : 1910,
      "count" : 2,
      "artwork" : [
        { "title" : "Composition VII", "year" : 1913 },
        { "title" : "Blue Flower", "year" : 1918 }
      ]
    },
    {
      "_id" : 1920,
      "count" : 3,
      "artwork" : [
        { "title" : "The Pillars of Society", "year" : 1926 },
        { "title" : "Dancer", "year" : 1925 },
        { "title" : "The Persistence of Memory", "year" : 1931 }
      ]
    },
    {
      // Includes documents without a year
      "_id" : "Unknown",
      "count" : 1,
      "artwork" : [
        { "title" : "The Great Wave off Kanagawa" }
      ]
    }
  ]
}
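The bucket counts in this output can be cross-checked with a rough plain-JavaScript model of $facet, in which every named sub-pipeline receives the same full set of input documents. This is a sketch under stated assumptions: prices are plain JS numbers rather than NumberDecimal, only counts are modelled, and the helpers `countBuckets` and `facet` are hypothetical.

```javascript
// The artwork documents, reduced to title, year and price (JS numbers here).
const artwork = [
  { title: "The Pillars of Society", year: 1926, price: 199.99 },
  { title: "Melancholy III", year: 1902, price: 280.0 },
  { title: "Dancer", year: 1925, price: 76.04 },
  { title: "The Great Wave off Kanagawa", price: 167.3 },   // no year
  { title: "The Persistence of Memory", year: 1931, price: 483.0 },
  { title: "Composition VII", year: 1913, price: 385.0 },
  { title: "The Scream", year: 1893 },                      // no price
  { title: "Blue Flower", year: 1918, price: 118.42 }
];

// Hypothetical helper: [bucketId, count] pairs for one field, with missing or
// out-of-range values sent to defaultId; numeric ids sorted, default last.
function countBuckets(docs, field, boundaries, defaultId) {
  const counts = new Map();
  for (const doc of docs) {
    const v = doc[field];
    let id = defaultId;
    if (typeof v === "number") {
      for (let i = 0; i < boundaries.length - 1; i++) {
        if (v >= boundaries[i] && v < boundaries[i + 1]) { id = boundaries[i]; break; }
      }
    }
    counts.set(id, (counts.get(id) || 0) + 1);
  }
  const entries = [...counts];
  return [
    ...entries.filter(([id]) => id !== defaultId).sort((a, b) => a[0] - b[0]),
    ...entries.filter(([id]) => id === defaultId)
  ];
}

// Like $facet: each named sub-pipeline runs over the same input documents.
function facet(docs, subPipelines) {
  const result = {};
  for (const [name, run] of Object.entries(subPipelines)) result[name] = run(docs);
  return result;
}

const counts = facet(artwork, {
  price: docs => countBuckets(docs, "price", [0, 200, 400], "Other"),
  year: docs => countBuckets(docs, "year", [1890, 1910, 1920, 1940], "Unknown")
});

console.log(JSON.stringify(counts));
// {"price":[[0,4],[200,2],["Other",2]],"year":[[1890,2],[1910,2],[1920,3],["Unknown",1]]}
```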
