Aggregation with the Zip Code Data Set使用邮政编码数据集进行聚合

~~On this page~~本页内容

~~Data Model~~数据模型
aggregate() ~~Method~~方法
~~Return States with Populations above 10 Million~~返回人口超过1000万的州
~~Return Average City Population by State~~返回各州的平均城市人口
~~Return Largest and Smallest Cities by State~~按州返回最大和最小的城市

~~The examples in this document use the zipcodes collection.~~ 本文档中的示例使用zipcodes集合。~~This collection is available at: media.mongodb.org/zips.json.~~ 此集合可从以下网址获得：media.mongodb.org/zips.json。~~Use mongoimport to load this data set into your mongod instance.~~使用mongoimport将此数据集加载到mongod实例中。

Data Model数据模型

~~Each document in the zipcodes collection has the following form:~~zipcodes集合中的每个文档都有以下格式：

{
  "_id": "10280",
  "city": "NEW YORK",
  "state": "NY",
  "pop": 5574,
  "loc": [
    -74.016323,
    40.710537
  ]
}

~~The _id field holds the zip code as a string.~~_id字段将邮政编码作为字符串保存。
~~The city field holds the city name.~~ city字段包含城市名称。~~A city can have more than one zip code associated with it as different sections of the city can each have a different zip code.~~一个城市可以有多个与之关联的邮政编码，因为城市的不同区域可以有不同的邮政编码。
~~The state field holds the two letter state abbreviation.~~state字段包含两个字母的state缩写。
~~The pop field holds the population.~~pop字段包含人口。
~~The loc field holds the location as a longitude latitude pair.~~loc字段以经纬度对的形式保存位置。

`aggregate()` Method方法

~~All of the following examples use the aggregate() helper in mongosh.~~以下所有示例都使用mongosh中的aggregate()助手。

~~The aggregate() method uses the aggregation pipeline to process documents into aggregated results.~~ aggregate()方法使用聚合管道将文档处理为聚合结果。~~An aggregation pipeline consists of stages with each stage processing the documents as they pass along the pipeline.~~ 聚合管道由多个阶段组成，每个阶段在文档通过管道时处理文档。~~Documents pass through the stages in sequence.~~文件按顺序经过各个阶段。

~~The aggregate() method in mongosh provides a wrapper around the aggregate database command.~~ mongosh中的aggregate()方法提供了一个围绕aggregate数据库命令的包装器。~~See the documentation for your driver for a more idiomatic interface for data aggregation operations.~~有关数据聚合操作的更惯用界面，请参阅驱动程序的文档。

Return States with Populations above 10 Million返回人口超过1000万的州

~~The following aggregation operation returns all states with total population greater than 10 million:~~以下聚合操作返回总人口超过1000万的所有州：

db.zipcodes.aggregate( [
   { $group: { _id: "$state", totalPop: { $sum: "$pop" } } },
   { $match: { totalPop: { $gte: 10*1000*1000 } } }
] )

~~In this example, the aggregation pipeline consists of the $group stage followed by the $match stage:~~在本例中，聚合管道由$group阶段和$match阶段组成：

~~The $group stage groups the documents of the zipcode collection by the state field, calculates the totalPop field for each state, and outputs a document for each unique state.~~$group阶段按state字段对zipcode集合的文档进行分组，计算每个州的totalPop字段，并为每个唯一州输出一个文档。

~~The new per-state documents have two fields: the _id field and the totalPop field.~~ 新的每州文档有两个字段：_id字段和totalPop字段。~~The _id field contains the value of the state; i.e. the group by field.~~ _id字段包含state的值；即按字段分组。~~The totalPop field is a calculated field that contains the total population of each state.~~ totalPop字段是一个计算字段，包含每个州的总人口。~~To calculate the value, $group uses the $sum operator to add the population field (pop) for each state.~~为了计算值，$group使用$sum运算符为每个州添加人口字段（pop）。

~~After the $group stage, the documents in the pipeline resemble the following:~~在$group阶段之后，管道中的文档类似于以下内容：
```
{
  "_id" : "AK",
  "totalPop" : 550043
}
```
~~The $match stage filters these grouped documents to output only those documents whose totalPop value is greater than or equal to 10 million.~~ $match阶段筛选这些分组文档，以仅输出totalPop值大于或等于1000万的文档。~~The $match stage does not alter the matching documents but outputs the matching documents unmodified.~~$match阶段不会更改匹配文档，但会输出未经修改的匹配文档。

~~The equivalent SQL for this aggregation operation is:~~此聚合操作的等效SQL为：

SELECT state, SUM(pop) AS totalPop
FROM zipcodes
GROUP BY state
HAVING totalPop >= (10*1000*1000)

~~Tip~~提示

~~See also:~~ 参阅：

Return Average City Population by State返回各州的平均城市人口

~~The following aggregation operation returns the average populations for cities in each state:~~以下聚合操作返回每个州城市的平均人口：

db.zipcodes.aggregate( [
   { $group: { _id: { state: "$state", city: "$city" }, pop: { $sum: "$pop" } } },
   { $group: { _id: "$_id.state", avgCityPop: { $avg: "$pop" } } }
] )

~~In this example, the aggregation pipeline consists of the $group stage followed by another $group stage:~~在本例中，聚合管道由$group阶段和另一个$group阶段组成：

The first $group stage groups the documents by the combination of city and state, uses the $sum expression to calculate the population for each combination, and outputs a document for each city and state combination. 第一个$group阶段根据城市和州的组合对文档进行分组，使用$sum表达式计算每个组合的人口，并输出每个city和state组合的文档。[1]

~~After this stage in the pipeline, the documents resemble the following:~~经过这一阶段后，文件如下所示：
```
{
  "_id" : {
    "state" : "CO",
    "city" : "EDGEWATER"
  },
  "pop" : 13154
}
```
A second $group stage groups the documents in the pipeline by the _id.state field (i.e. the state field inside the _id document), uses the $avg expression to calculate the average city population (avgCityPop) for each state, and outputs a document for each state.第二个$group阶段通过_id.state字段（即_id文档中的state字段）对管道中的文档进行分组，使用$avg表达式计算每个州的平均城市人口（avgCityPop），并输出每个州的文档。

~~The documents that result from this aggregation operation resembles the following:~~此聚合操作产生的文档类似于以下内容：

{
  "_id" : "MN",
  "avgCityPop" : 5335
}

~~Tip~~提示

~~See also:~~ 参阅：

Return Largest and Smallest Cities by State按州返回最大和最小的城市

~~The following aggregation operation returns the smallest and largest cities by population for each state:~~以下聚合操作返回每个州按人口划分的最小和最大城市：

db.zipcodes.aggregate( [
   { $group:
      {
        _id: { state: "$state", city: "$city" },
        pop: { $sum: "$pop" }
      }
   },
   { $sort: { pop: 1 } },
   { $group:
      {
        _id : "$_id.state",
        biggestCity:  { $last: "$_id.city" },
        biggestPop:   { $last: "$pop" },
        smallestCity: { $first: "$_id.city" },
        smallestPop:  { $first: "$pop" }
      }
   },
  // the following $project is optional, and
  // modifies the output format.
  { $project:
    { _id: 0,
      state: "$_id",
      biggestCity:  { name: "$biggestCity",  pop: "$biggestPop" },
      smallestCity: { name: "$smallestCity", pop: "$smallestPop" }
    }
  }
] )

~~In this example, the aggregation pipeline consists of a $group stage, a $sort stage, another $group stage, and a $project stage:~~在本例中，聚合管道由$group阶段、$sort阶段、另一个$group阶段和$project阶段组成：

The first $group stage groups the documents by the combination of the city and state, calculates the sum of the pop values for each combination, and outputs a document for each city and state combination.第一个$group阶段根据city和state的组合对文档进行分组，计算每个组合的pop值之sum，并输出每个city和state组合的文档。

~~At this stage in the pipeline, the documents resemble the following:~~在这一阶段，文件类似于以下内容：
```
{
  "_id" : {
    "state" : "CO",
    "city" : "EDGEWATER"
  },
  "pop" : 13154
}
```
~~The $sort stage orders the documents in the pipeline by the pop field value, from smallest to largest; i.e. by increasing order.~~ $sort阶段按pop字段值从最小到最大排序管道中的文档；即通过增加顺序。~~This operation does not alter the documents.~~此操作不会更改文档。
~~The next $group stage groups the now-sorted documents by the _id.state field (i.e. the state field inside the _id document) and outputs a document for each state.~~下一个$group阶段根据_id.state字段（即_id文档中的state字段）对当前排序的文档进行分组，并为每个州输出一个文档。

~~The stage also calculates the following four fields for each state.~~ 该阶段还为每个州计算以下四个字段。~~Using the $last expression, the $group operator creates the biggestCity and biggestPop fields that store the city with the largest population and that population.~~ 使用$last表达式，$group运算符创建biggestCity和biggestPop字段，这些字段存储人口最多的城市和人口。~~Using the $first expression, the $group operator creates the smallestCity and smallestPop fields that store the city with the smallest population and that population.~~使用$first表达式、$group运算符创建存储人口最少的城市和人口的smallestCity和smallestPop字段。

~~The documents, at this stage in the pipeline, resemble the following:~~现阶段正在编制的文件类似于以下内容：
```
{
  "_id" : "WA",
  "biggestCity" : "SEATTLE",
  "biggestPop" : 520096,
  "smallestCity" : "BENGE",
  "smallestPop" : 2
}
```
~~The final $project stage renames the _id field to state and moves the biggestCity, biggestPop, smallestCity, and smallestPop into biggestCity and smallestCity embedded documents.~~最后一个$project阶段将_id字段重命名为state，并将biggestCity、biggestPop、smallestCity和smallestPop移动到biggestCity和smallestCity嵌入文档中。

~~The output documents of this aggregation operation resemble the following:~~此聚合操作的输出文档类似于以下内容：

{
  "state" : "RI",
  "biggestCity" : {
    "name" : "CRANSTON",
    "pop" : 176404
  },
  "smallestCity" : {
    "name" : "CLAYVILLE",
    "pop" : 45
  }
}

[1]	~~A city can have more than one zip code associated with it as different sections of the city can each have a different zip code.~~一个城市可以有多个与之关联的邮政编码，因为城市的不同区域可以有不同的邮政编码。