Docs HomeMongoDB Manual

Aggregation with User Preference Data与用户首选项数据的聚合

Data Model数据模型

Consider a hypothetical sports club with a database that contains a users collection that tracks the user's join dates, sport preferences, and stores these data in documents that resemble the following:考虑一个假设的体育俱乐部,该俱乐部的数据库包含一个users集合,用于跟踪用户的加入日期、体育偏好,并将这些数据存储在类似于以下的文档中:

{
_id : "jane",
joined : ISODate("2011-03-02"),
likes : ["golf", "racquetball"]
}
{
_id : "joe",
joined : ISODate("2012-07-02"),
likes : ["tennis", "golf", "swimming"]
}

Normalize and Sort Documents规范化和排序文档

The following operation returns user names in upper case and in alphabetical order. 以下操作以大写字母顺序返回用户名。The aggregation includes user names for all documents in the users collection. 聚合包括users集合中所有文档的用户名。You might do this to normalize user names for processing.您可以这样做来规范化用户名以进行处理。

db.users.aggregate(
[
{ $project : { name:{$toUpper:"$_id"} , _id:0 } },
{ $sort : { name : 1 } }
]
)

All documents from the users collection pass through the pipeline, which consists of the following operations:users集合中的所有文档都通过管道,管道由以下操作组成:

  • The $project operator:$project运算符:

    • creates a new field called name.创建一个名为name的新字段。
    • converts the value of the _id to upper case, with the $toUpper operator. 使用$toUpper运算符将_id的值转换为大写。Then the $project creates a new field, named name to hold this value.然后$project创建一个新字段,命名为name以保存该值。
    • suppresses the id field. $project will pass the _id field by default, unless explicitly suppressed.取消显示id字段。默认情况下,除非明确禁止,否则$project将传递_id字段。
  • The $sort operator orders the results by the name field.$sort运算符按名称字段对结果进行排序。

The results of the aggregation would resemble the following:汇总结果如下所示:

{
"name" : "JANE"
},
{
"name" : "JILL"
},
{
"name" : "JOE"
}

Return Usernames Ordered by Join Month返回按加入月份排序的用户名

The following aggregation operation returns user names sorted by the month they joined. 以下聚合操作返回按加入月份排序的用户名。This kind of aggregation could help generate membership renewal notices.这种聚合可以帮助生成会员续订通知。

db.users.aggregate(
[
{ $project :
{
month_joined : { $month : "$joined" },
name : "$_id",
_id : 0
}
},
{ $sort : { month_joined : 1 } }
]
)

The pipeline passes all documents in the users collection through the following operations:管道通过以下操作传递用户集合中的所有文档:

  • The $project operator:$project运算符:

    • Creates two new fields: month_joined and name.创建两个新字段:month_joinedname
    • Suppresses the id from the results. The aggregate() method includes the _id, unless explicitly suppressed.从结果中取消显示idaggregate()方法包括_id,除非显式抑制。
  • The $month operator converts the values of the joined field to integer representations of the month. $month运算符将joined字段的值转换为月份的整数表示形式。Then the $project operator assigns those values to the month_joined field.然后$project运算符将这些值分配给month_joined字段。
  • The $sort operator sorts the results by the month_joined field.$sort运算符按month_joined字段对结果进行排序。

The operation returns results that resemble the following:该操作返回的结果如下所示:

{
"month_joined" : 1,
"name" : "ruth"
},
{
"month_joined" : 1,
"name" : "harold"
},
{
"month_joined" : 1,
"name" : "kate"
}
{
"month_joined" : 2,
"name" : "jill"
}

Return Total Number of Joins per Month返回每月的加入总数

The following operation shows how many people joined each month of the year. 以下操作显示一年中每个月有多少人加入。You might use this aggregated data for recruiting and marketing strategies.您可以将这些汇总数据用于招聘和营销策略。

db.users.aggregate(
[
{ $project : { month_joined : { $month : "$joined" } } } ,
{ $group : { _id : {month_joined:"$month_joined"} , number : { $sum : 1 } } },
{ $sort : { "_id.month_joined" : 1 } }
]
)

The pipeline passes all documents in the users collection through the following operations:管道通过以下操作传递users集合中的所有文档:

  • The $project operator creates a new field called month_joined.$project运算符创建一个名为month_joind的新字段。
  • The $month operator converts the values of the joined field to integer representations of the month. $month运算符将joined字段的值转换为月份的整数表示形式。Then the $project operator assigns the values to the month_joined field.然后$project运算符将值分配给month_joined字段。
  • The $group operator collects all documents with a given month_joined value and counts how many documents there are for that value. $group运算符集合具有给定month_joined值的所有文档,并计算该值的文档数。Specifically, for each unique value, $group creates a new "per-month" document with two fields:具体来说,对于每个唯一的值,$group会创建一个新的“每月”文档,其中包含两个字段:

    • _id, which contains a nested document with the month_joined field and its value._id,其中包含一个带有month_joined字段及其值的嵌套文档。
    • number, which is a generated field. number,这是一个生成的字段。The $sum operator increments this field by 1 for every document containing the given month_joined value.对于每个包含给定month_joined值的文档,$sum运算符将此字段递增1。
  • The $sort operator sorts the documents created by $group according to the contents of the month_joined field.$sort运算符根据month_joined字段的内容对$group创建的文档进行排序。

The result of this aggregation operation would resemble the following:此聚合操作的结果如下所示:

{
"_id" : {
"month_joined" : 1
},
"number" : 3
},
{
"_id" : {
"month_joined" : 2
},
"number" : 9
},
{
"_id" : {
"month_joined" : 3
},
"number" : 5
}

Return the Five Most Common "Likes"返回五个最常见的“赞”

The following aggregation collects top five most "liked" activities in the data set. 以下聚合集合了数据集中最受“喜欢”的前五个活动。This type of analysis could help inform planning and future development.这种类型的分析有助于为规划和未来发展提供信息。

db.users.aggregate(
[
{ $unwind : "$likes" },
{ $group : { _id : "$likes" , number : { $sum : 1 } } },
{ $sort : { number : -1 } },
{ $limit : 5 }
]
)

The pipeline begins with all documents in the users collection, and passes these documents through the following operations:管道从users集合中的所有文档开始,并通过以下操作传递这些文档:

  • The $unwind operator separates each value in the likes array, and creates a new version of the source document for every element in the array.$unwind运算符分离likes数组中的每个值,并为数组中的每一个元素创建源文档的新版本。

    Example

    Given the following document from the users collection:给定users集合中的以下文档:

    {
    _id : "jane",
    joined : ISODate("2011-03-02"),
    likes : ["golf", "racquetball"]
    }

    The $unwind operator would create the following documents:$unwind运算符将创建以下文档:

    {
    _id : "jane",
    joined : ISODate("2011-03-02"),
    likes : "golf"
    }
    {
    _id : "jane",
    joined : ISODate("2011-03-02"),
    likes : "racquetball"
    }
  • The $group operator collects all documents with the same value for the likes field and counts each grouping. With this information, $group creates a new document with two fields:$group运算符为likes字段集合具有相同值的所有文档,并对每个分组进行计数。有了这些信息,$group将创建一个包含两个字段的新文档:

    • _id, which contains the likes value.,它包含likes值。
    • number, which is a generated field. The $sum operator increments this field by 1 for every document containing the given likes value.number,这是一个生成的字段。对于每个包含给定likes值的文档,$sum运算符将该字段增加1
  • The $sort operator sorts these documents by the number field in reverse order.$sort运算符按数字字段以相反的顺序对这些文档进行排序。
  • The $limit operator only includes the first 5 result documents.$limit运算符只包括前5个结果文档。

The results of aggregation would resemble the following:汇总结果如下所示:

{
"_id" : "golf",
"number" : 33
},
{
"_id" : "racquetball",
"number" : 31
},
{
"_id" : "swimming",
"number" : 24
},
{
"_id" : "handball",
"number" : 19
},
{
"_id" : "tennis",
"number" : 18
}