Aggregation with the Zip Code Data Set
The examples in this document use the zipcodes collection. This collection is available at: media.mongodb.org/zips.json.
Use mongoimport to load this data set into your mongod instance.
Data Model
Each document in the zipcodes collection has the following form:
{
"_id": "10280",
"city": "NEW YORK",
"state": "NY",
"pop": 5574,
"loc": [
-74.016323,
40.710537
]
}
- The _id field holds the zip code as a string.
- The city field holds the city name. A city can have more than one zip code associated with it, because different sections of the city can each have a different zip code.
- The state field holds the two-letter state abbreviation.
- The pop field holds the population.
- The loc field holds the location as a longitude-latitude pair.
aggregate() Method
All of the following examples use the aggregate() helper in mongosh.
The aggregate() method uses the aggregation pipeline to process documents into aggregated results. An aggregation pipeline consists of stages, and each stage processes the documents as they pass along the pipeline. Documents pass through the stages in sequence.
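To make the idea of documents passing through stages in sequence concrete, here is a minimal plain JavaScript sketch. It does not use MongoDB. The helper `runPipeline`, the stage functions, and the sample documents are hypothetical and only illustrate that each stage receives the output of the previous one.

```javascript
// Hypothetical helper: applies each stage function to the output of the
// previous stage, in order. This only mimics the idea of a pipeline.
function runPipeline(docs, stages) {
  return stages.reduce((current, stage) => stage(current), docs);
}

// Two illustrative stages (assumptions, not MongoDB operators):
// keep documents with pop >= 100, then keep only the city name.
const keepLarge = (docs) => docs.filter((doc) => doc.pop >= 100);
const cityNameOnly = (docs) => docs.map((doc) => ({ city: doc.city }));

// Made-up input documents.
const pipelineResult = runPipeline(
  [
    { city: "ALPHA", pop: 400 },
    { city: "DELTA", pop: 10 },
  ],
  [keepLarge, cityNameOnly]
);

console.log(pipelineResult);
```

The order of the stages matters: swapping the two stages here would drop the pop field before the filter could read it.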
The aggregate() method in mongosh provides a wrapper around the aggregate database command. See the documentation for your driver for a more idiomatic interface for data aggregation operations.
Return States with Populations above 10 Million
The following aggregation operation returns all states with a total population of 10 million or more:
db.zipcodes.aggregate( [
{ $group: { _id: "$state", totalPop: { $sum: "$pop" } } },
{ $match: { totalPop: { $gte: 10*1000*1000 } } }
] )
In this example, the aggregation pipeline consists of the $group stage followed by the $match stage:
- The $group stage groups the documents of the zipcodes collection by the state field, calculates the totalPop field for each state, and outputs a document for each unique state.
  The new per-state documents have two fields: the _id field and the totalPop field. The _id field contains the value of state, i.e. the group-by field. The totalPop field is a calculated field that contains the total population of each state. To calculate the value, $group uses the $sum operator to add the population field (pop) for each state.
  After the $group stage, the documents in the pipeline resemble the following:
  {
    "_id" : "AK",
    "totalPop" : 550043
  }
- The $match stage filters these grouped documents to output only those documents whose totalPop value is greater than or equal to 10 million. The $match stage does not alter the matching documents; it outputs them unmodified.
The equivalent SQL for this aggregation operation is:
SELECT state, SUM(pop) AS totalPop
FROM zipcodes
GROUP BY state
HAVING totalPop >= (10*1000*1000)
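The group-then-filter logic can also be traced in plain JavaScript without a database. The following is a minimal sketch, not how MongoDB implements these stages. The function names and the sample documents (including the state codes "AA" and "BB") are made up for illustration and do not come from zips.json.

```javascript
// Hypothetical sample documents shaped like the zipcodes collection.
// The values are invented for illustration only.
const sampleZips = [
  { _id: "00001", city: "ALPHA", state: "AA", pop: 6000000 },
  { _id: "00002", city: "BETA", state: "AA", pop: 5000000 },
  { _id: "00003", city: "GAMMA", state: "BB", pop: 4000000 },
  { _id: "00004", city: "DELTA", state: "BB", pop: 3000000 },
];

// Mimics { $group: { _id: "$state", totalPop: { $sum: "$pop" } } }:
// one output document per distinct state, with the summed population.
function groupTotalPopByState(docs) {
  const totals = new Map();
  for (const doc of docs) {
    totals.set(doc.state, (totals.get(doc.state) || 0) + doc.pop);
  }
  return Array.from(totals, ([state, totalPop]) => ({ _id: state, totalPop }));
}

// Mimics { $match: { totalPop: { $gte: threshold } } }:
// passes matching documents through unmodified.
function matchTotalPopAtLeast(docs, threshold) {
  return docs.filter((doc) => doc.totalPop >= threshold);
}

const largeStates = matchTotalPopAtLeast(
  groupTotalPopByState(sampleZips),
  10 * 1000 * 1000
);

console.log(largeStates);
```

In this made-up data set, "AA" totals 11 million and passes the filter, while "BB" totals 7 million and is dropped.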
Return Average City Population by State
The following aggregation operation returns the average populations for cities in each state:
db.zipcodes.aggregate( [
{ $group: { _id: { state: "$state", city: "$city" }, pop: { $sum: "$pop" } } },
{ $group: { _id: "$_id.state", avgCityPop: { $avg: "$pop" } } }
] )
In this example, the aggregation pipeline consists of the $group stage followed by another $group stage:
- The first $group stage groups the documents by the combination of city and state, uses the $sum expression to calculate the population for each combination, and outputs a document for each city and state combination. This step is needed because a city can have more than one zip code associated with it.
  After this stage in the pipeline, the documents resemble the following:
  {
    "_id" : {
      "state" : "CO",
      "city" : "EDGEWATER"
    },
    "pop" : 13154
  }
- A second $group stage groups the documents in the pipeline by the _id.state field (i.e. the state field inside the _id document), uses the $avg expression to calculate the average city population (avgCityPop) for each state, and outputs a document for each state.
The documents that result from this aggregation operation resemble the following:
{
"_id" : "MN",
"avgCityPop" : 5335
}
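The reason for two grouping steps is easier to see with a small in-memory example. The sketch below is plain JavaScript under stated assumptions: the function names and sample documents are hypothetical, and the code only imitates the two $group stages rather than calling MongoDB.

```javascript
// Hypothetical documents; the city "ALPHA" spans two zip codes.
const zipDocs = [
  { _id: "00001", city: "ALPHA", state: "AA", pop: 100 },
  { _id: "00002", city: "ALPHA", state: "AA", pop: 300 },
  { _id: "00003", city: "BETA", state: "AA", pop: 200 },
  { _id: "00004", city: "GAMMA", state: "BB", pop: 50 },
];

// Imitates the first $group: sum pop for each (state, city) pair.
function sumPopByCity(docs) {
  const byCity = new Map();
  for (const { state, city, pop } of docs) {
    const key = JSON.stringify([state, city]);
    const entry = byCity.get(key) || { _id: { state, city }, pop: 0 };
    entry.pop += pop;
    byCity.set(key, entry);
  }
  return [...byCity.values()];
}

// Imitates the second $group: average the per-city totals for each state.
function avgCityPopByState(cityDocs) {
  const byState = new Map();
  for (const { _id, pop } of cityDocs) {
    const acc = byState.get(_id.state) || { sum: 0, count: 0 };
    acc.sum += pop;
    acc.count += 1;
    byState.set(_id.state, acc);
  }
  return Array.from(byState, ([state, { sum, count }]) => ({
    _id: state,
    avgCityPop: sum / count,
  }));
}

const cityAverages = avgCityPopByState(sumPopByCity(zipDocs));
console.log(cityAverages);
```

With this made-up data, "AA" has two cities with totals 400 and 200, so its average city population is 300. Averaging the three "AA" zip code documents directly would give 200, which is the average per zip code rather than per city.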
Return Largest and Smallest Cities by State
The following aggregation operation returns the smallest and largest cities by population for each state:
db.zipcodes.aggregate( [
{ $group:
{
_id: { state: "$state", city: "$city" },
pop: { $sum: "$pop" }
}
},
{ $sort: { pop: 1 } },
{ $group:
{
_id : "$_id.state",
biggestCity: { $last: "$_id.city" },
biggestPop: { $last: "$pop" },
smallestCity: { $first: "$_id.city" },
smallestPop: { $first: "$pop" }
}
},
// The following $project stage is optional; it only modifies the output format.
{ $project:
{ _id: 0,
state: "$_id",
biggestCity: { name: "$biggestCity", pop: "$biggestPop" },
smallestCity: { name: "$smallestCity", pop: "$smallestPop" }
}
}
] )
In this example, the aggregation pipeline consists of a $group stage, a $sort stage, another $group stage, and a $project stage:
- The first $group stage groups the documents by the combination of city and state, calculates the sum of the pop values for each combination, and outputs a document for each city and state combination.
  At this stage in the pipeline, the documents resemble the following:
  {
    "_id" : {
      "state" : "CO",
      "city" : "EDGEWATER"
    },
    "pop" : 13154
  }
- The $sort stage orders the documents in the pipeline by the pop field value, from smallest to largest, i.e. in increasing order. This operation does not alter the documents.
- The next $group stage groups the now-sorted documents by the _id.state field (i.e. the state field inside the _id document) and outputs a document for each state.
  The stage also calculates the following four fields for each state. Using the $last expression, the $group operator creates the biggestCity and biggestPop fields, which store the city with the largest population and that population. Using the $first expression, the $group operator creates the smallestCity and smallestPop fields, which store the city with the smallest population and that population.
  The documents, at this stage in the pipeline, resemble the following:
  {
    "_id" : "WA",
    "biggestCity" : "SEATTLE",
    "biggestPop" : 520096,
    "smallestCity" : "BENGE",
    "smallestPop" : 2
  }
- The final $project stage renames the _id field to state and moves the biggestCity, biggestPop, smallestCity, and smallestPop values into the biggestCity and smallestCity embedded documents.
The output documents of this aggregation operation resemble the following:
{
"state" : "RI",
"biggestCity" : {
"name" : "CRANSTON",
"pop" : 176404
},
"smallestCity" : {
"name" : "CLAYVILLE",
"pop" : 45
}
}
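The interaction between the ascending sort and the $first and $last expressions can be traced with a short plain JavaScript sketch. It starts from hypothetical per-city totals (shaped like the output of the first $group stage) and imitates the $sort, the second $group, and the $project. The function name and the data are assumptions for illustration, and the code does not call MongoDB.

```javascript
// Hypothetical per-city totals, shaped like the first $group stage's output.
const cityTotals = [
  { _id: { state: "AA", city: "BETA" }, pop: 200 },
  { _id: { state: "BB", city: "GAMMA" }, pop: 50 },
  { _id: { state: "AA", city: "ALPHA" }, pop: 400 },
  { _id: { state: "AA", city: "DELTA" }, pop: 10 },
];

function smallestAndLargestByState(docs) {
  // Imitates { $sort: { pop: 1 } } on a copy, leaving the input untouched.
  const sorted = [...docs].sort((a, b) => a.pop - b.pop);

  const byState = new Map();
  for (const { _id, pop } of sorted) {
    const current = { name: _id.city, pop };
    const entry = byState.get(_id.state);
    if (!entry) {
      // First document seen for this state: it is the smallest so far
      // ($first) and also the largest so far ($last).
      byState.set(_id.state, { smallestCity: current, biggestCity: current });
    } else {
      // Later documents have equal or larger pop, so the most recent one
      // plays the role of $last.
      entry.biggestCity = current;
    }
  }

  // Imitates the final $project: expose the group key as "state".
  return Array.from(byState, ([state, { biggestCity, smallestCity }]) => ({
    state,
    biggestCity,
    smallestCity,
  }));
}

const extremes = smallestAndLargestByState(cityTotals);
console.log(JSON.stringify(extremes, null, 2));
```

Because the documents are sorted in increasing order of pop before grouping, the first document seen for a state is its smallest city and the last is its largest. In this made-up data, "BB" has a single city, so it appears as both.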