Aggregation with the Zip Code Data Set使用邮政编码数据集进行聚合

On this page本页内容

The examples in this document use the zipcodes collection. 本文档中的示例使用zipcodes集合。This collection is available at: media.mongodb.org/zips.json. 此集合可从以下网址获得:media.mongodb.org/zips.jsonUse mongoimport to load this data set into your mongod instance.使用mongoimport将此数据集加载到mongod实例中。

Data Model数据模型

Each document in the zipcodes collection has the following form:zipcodes集合中的每个文档都有以下格式:

{
  "_id": "10280",
  "city": "NEW YORK",
  "state": "NY",
  "pop": 5574,
  "loc": [
    -74.016323,
    40.710537
  ]
}
  • The _id field holds the zip code as a string._id字段将邮政编码作为字符串保存。
  • The city field holds the city name. city字段包含城市名称。A city can have more than one zip code associated with it as different sections of the city can each have a different zip code.一个城市可以有多个与之关联的邮政编码,因为城市的不同区域可以有不同的邮政编码。
  • The state field holds the two letter state abbreviation.state字段包含两个字母的state缩写。
  • The pop field holds the population.pop字段包含人口。
  • The loc field holds the location as a longitude latitude pair.loc字段以经纬度对的形式保存位置。

aggregate() Method方法

All of the following examples use the aggregate() helper in mongosh.以下所有示例都使用mongosh中的aggregate()助手。

The aggregate() method uses the aggregation pipeline to process documents into aggregated results. aggregate()方法使用聚合管道将文档处理为聚合结果。An aggregation pipeline consists of stages with each stage processing the documents as they pass along the pipeline. 聚合管道由多个阶段组成,每个阶段在文档通过管道时处理文档。Documents pass through the stages in sequence.文件按顺序经过各个阶段。

The aggregate() method in mongosh provides a wrapper around the aggregate database command. mongosh中的aggregate()方法提供了一个围绕aggregate数据库命令的包装器。See the documentation for your driver for a more idiomatic interface for data aggregation operations.有关数据聚合操作的更惯用界面,请参阅驱动程序的文档。

Return States with Populations above 10 Million返回人口超过1000万的州

The following aggregation operation returns all states with total population greater than 10 million:以下聚合操作返回总人口超过1000万的所有州:

db.zipcodes.aggregate( [
   { $group: { _id: "$state", totalPop: { $sum: "$pop" } } },
   { $match: { totalPop: { $gte: 10*1000*1000 } } }
] )

In this example, the aggregation pipeline consists of the $group stage followed by the $match stage:在本例中,聚合管道$group阶段和$match阶段组成:

  • The $group stage groups the documents of the zipcode collection by the state field, calculates the totalPop field for each state, and outputs a document for each unique state.$group阶段按state字段对zipcode集合的文档进行分组,计算每个州的totalPop字段,并为每个唯一州输出一个文档。

    The new per-state documents have two fields: the _id field and the totalPop field. 新的每州文档有两个字段:_id字段和totalPop字段。The _id field contains the value of the state; i.e. the group by field. _id字段包含state的值;即按字段分组。The totalPop field is a calculated field that contains the total population of each state. totalPop字段是一个计算字段,包含每个州的总人口。To calculate the value, $group uses the $sum operator to add the population field (pop) for each state.为了计算值,$group使用$sum运算符为每个州添加人口字段(pop)。

    After the $group stage, the documents in the pipeline resemble the following:$group阶段之后,管道中的文档类似于以下内容:

    {
      "_id" : "AK",
      "totalPop" : 550043
    }
  • The $match stage filters these grouped documents to output only those documents whose totalPop value is greater than or equal to 10 million. $match阶段筛选这些分组文档,以仅输出totalPop值大于或等于1000万的文档。The $match stage does not alter the matching documents but outputs the matching documents unmodified.$match阶段不会更改匹配文档,但会输出未经修改的匹配文档。

The equivalent SQL for this aggregation operation is:此聚合操作的等效SQL为:

SELECT state, SUM(pop) AS totalPop
FROM zipcodes
GROUP BY state
HAVING totalPop >= (10*1000*1000)
Tip提示
See also: 参阅:

Return Average City Population by State返回各州的平均城市人口

The following aggregation operation returns the average populations for cities in each state:以下聚合操作返回每个州城市的平均人口:

db.zipcodes.aggregate( [
   { $group: { _id: { state: "$state", city: "$city" }, pop: { $sum: "$pop" } } },
   { $group: { _id: "$_id.state", avgCityPop: { $avg: "$pop" } } }
] )

In this example, the aggregation pipeline consists of the $group stage followed by another $group stage:在本例中,聚合管道$group阶段和另一个$group阶段组成:

  • The first $group stage groups the documents by the combination of city and state, uses the $sum expression to calculate the population for each combination, and outputs a document for each city and state combination. 第一个$group阶段根据城市和州的组合对文档进行分组,使用$sum表达式计算每个组合的人口,并输出每个citystate组合的文档。[1]

    After this stage in the pipeline, the documents resemble the following:经过这一阶段后,文件如下所示:

    {
      "_id" : {
        "state" : "CO",
        "city" : "EDGEWATER"
      },
      "pop" : 13154
    }
  • A second $group stage groups the documents in the pipeline by the _id.state field (i.e. the state field inside the _id document), uses the $avg expression to calculate the average city population (avgCityPop) for each state, and outputs a document for each state.第二个$group阶段通过_id.state字段(即_id文档中的state字段)对管道中的文档进行分组,使用$avg表达式计算每个州的平均城市人口(avgCityPop),并输出每个州的文档。

The documents that result from this aggregation operation resembles the following:此聚合操作产生的文档类似于以下内容:

{
  "_id" : "MN",
  "avgCityPop" : 5335
}
Tip提示
See also: 参阅:

Return Largest and Smallest Cities by State按州返回最大和最小的城市

The following aggregation operation returns the smallest and largest cities by population for each state:以下聚合操作返回每个州按人口划分的最小和最大城市:

db.zipcodes.aggregate( [
   { $group:
      {
        _id: { state: "$state", city: "$city" },
        pop: { $sum: "$pop" }
      }
   },
   { $sort: { pop: 1 } },
   { $group:
      {
        _id : "$_id.state",
        biggestCity:  { $last: "$_id.city" },
        biggestPop:   { $last: "$pop" },
        smallestCity: { $first: "$_id.city" },
        smallestPop:  { $first: "$pop" }
      }
   },
  // the following $project is optional, and
  // modifies the output format.
  { $project:
    { _id: 0,
      state: "$_id",
      biggestCity:  { name: "$biggestCity",  pop: "$biggestPop" },
      smallestCity: { name: "$smallestCity", pop: "$smallestPop" }
    }
  }
] )

In this example, the aggregation pipeline consists of a $group stage, a $sort stage, another $group stage, and a $project stage:在本例中,聚合管道$group阶段、$sort阶段、另一个$group阶段和$project阶段组成:

  • The first $group stage groups the documents by the combination of the city and state, calculates the sum of the pop values for each combination, and outputs a document for each city and state combination.第一个$group阶段根据citystate的组合对文档进行分组,计算每个组合的pop值之sum,并输出每个citystate组合的文档。

    At this stage in the pipeline, the documents resemble the following:在这一阶段,文件类似于以下内容:

    {
      "_id" : {
        "state" : "CO",
        "city" : "EDGEWATER"
      },
      "pop" : 13154
    }
  • The $sort stage orders the documents in the pipeline by the pop field value, from smallest to largest; i.e. by increasing order. $sort阶段按pop字段值从最小到最大排序管道中的文档;即通过增加顺序。This operation does not alter the documents.此操作不会更改文档。
  • The next $group stage groups the now-sorted documents by the _id.state field (i.e. the state field inside the _id document) and outputs a document for each state.下一个$group阶段根据_id.state字段(即_id文档中的state字段)对当前排序的文档进行分组,并为每个州输出一个文档。

    The stage also calculates the following four fields for each state. 该阶段还为每个州计算以下四个字段。Using the $last expression, the $group operator creates the biggestCity and biggestPop fields that store the city with the largest population and that population. 使用$last表达式,$group运算符创建biggestCitybiggestPop字段,这些字段存储人口最多的城市和人口。Using the $first expression, the $group operator creates the smallestCity and smallestPop fields that store the city with the smallest population and that population.使用$first表达式、$group运算符创建存储人口最少的城市和人口的smallestCitysmallestPop字段。

    The documents, at this stage in the pipeline, resemble the following:现阶段正在编制的文件类似于以下内容:

    {
      "_id" : "WA",
      "biggestCity" : "SEATTLE",
      "biggestPop" : 520096,
      "smallestCity" : "BENGE",
      "smallestPop" : 2
    }
  • The final $project stage renames the _id field to state and moves the biggestCity, biggestPop, smallestCity, and smallestPop into biggestCity and smallestCity embedded documents.最后一个$project阶段将_id字段重命名为state,并将biggestCitybiggestPopsmallestCitysmallestPop移动到biggestCitysmallestCity嵌入文档中。

The output documents of this aggregation operation resemble the following:此聚合操作的输出文档类似于以下内容:

{
  "state" : "RI",
  "biggestCity" : {
    "name" : "CRANSTON",
    "pop" : 176404
  },
  "smallestCity" : {
    "name" : "CLAYVILLE",
    "pop" : 45
  }
}
[1] A city can have more than one zip code associated with it as different sections of the city can each have a different zip code.一个城市可以有多个与之关联的邮政编码,因为城市的不同区域可以有不同的邮政编码。
←  Aggregation Pipeline and Sharded CollectionsAggregation with User Preference Data →