Faceted Classification
Minimum MongoDB Version: 4.2
Scenario
You want to provide a faceted search capability on your retail website so that customers can refine their product search by selecting specific characteristics against the product results listed in the web page. It is beneficial to classify the products by different dimensions, where each dimension, or facet, corresponds to a particular field in a product record (e.g. product rating, product price). Each facet should be broken down into sub-ranges so that a customer can select a specific sub-range (e.g. 4 to 5 stars) for a particular facet (e.g. rating). The aggregation pipeline will analyse the products collection by each facet's field (rating and price) to determine each facet's spread of values.
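Once a customer picks a sub-range from a facet, the application has to turn that choice into a query filter. The sketch below assumes a hypothetical helper, rangeToMatchStage(), which is not a MongoDB API: it converts a bucket's {min, max} range (the shape $bucketAuto emits as each bucket's _id) into a $match stage. The lower bound is inclusive and the upper bound exclusive, except for the final bucket, whose upper bound is inclusive, mirroring how $bucketAuto assigns documents to buckets.

```javascript
// Hypothetical helper (not a MongoDB API): turn a facet sub-range chosen by
// the customer into a $match stage for a follow-up product query.
function rangeToMatchStage(field, range, isLastBucket) {
  // $bucketAuto lower bounds are inclusive; upper bounds are exclusive,
  // except for the final bucket, whose upper bound is inclusive.
  const upperOp = isLastBucket ? "$lte" : "$lt";
  return { $match: { [field]: { $gte: range.min, [upperOp]: range.max } } };
}

// Example: the customer selects the top "4 to 5 stars" rating sub-range.
const stage = rangeToMatchStage("rating", { min: 4, max: 5 }, true);
console.log(JSON.stringify(stage)); // {"$match":{"rating":{"$gte":4,"$lte":5}}}
```

In mongosh, the min and max values would typically be the NumberDecimal values returned by the facet, passed through unchanged.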
Sample Data Population
Drop any old version of the database (if it exists) and then populate a new products collection with 16 documents. The database commands have been split into two parts so that your clipboard can hold all the text; ensure you copy and execute each of the two parts:
-Part 1-
db = db.getSiblingDB("book-faceted-classfctn");
db.dropDatabase();
// Insert first 8 records into the collection
db.products.insertMany([
{
"name": "Asus Laptop",
"category": "ELECTRONICS",
"description": "Good value laptop for students",
"price": NumberDecimal("431.43"),
"rating": NumberDecimal("4.2"),
},
{
"name": "The Day Of The Triffids",
"category": "BOOKS",
"description": "Classic post-apocalyptic novel",
"price": NumberDecimal("5.01"),
"rating": NumberDecimal("4.8"),
},
{
"name": "Morphy Richardds Food Mixer",
"category": "KITCHENWARE",
"description": "Luxury mixer turning good cakes into great",
"price": NumberDecimal("63.13"),
"rating": NumberDecimal("3.8"),
},
{
"name": "Karcher Hose Set",
"category": "GARDEN",
"description": "Hose + nozzles + winder for tidy storage",
"price": NumberDecimal("22.13"),
"rating": NumberDecimal("4.3"),
},
{
"name": "Oak Coffee Table",
"category": "HOME",
"description": "size is 2m x 0.5m x 0.4m",
"price": NumberDecimal("22.13"),
"rating": NumberDecimal("3.8"),
},
{
"name": "Lenovo Laptop",
"category": "ELECTRONICS",
"description": "High spec good for gaming",
"price": NumberDecimal("1299.99"),
"rating": NumberDecimal("4.1"),
},
{
"name": "One Day in the Life of Ivan Denisovich",
"category": "BOOKS",
"description": "Brutal life in a labour camp",
"price": NumberDecimal("4.29"),
"rating": NumberDecimal("4.9"),
},
{
"name": "Russell Hobbs Chrome Kettle",
"category": "KITCHENWARE",
"description": "Nice looking budget kettle",
"price": NumberDecimal("15.76"),
"rating": NumberDecimal("3.9"),
},
]);
-Part 2-
// Insert second 8 records into the collection
db.products.insertMany([
{
"name": "Tiffany Gold Chain",
"category": "JEWELERY",
"description": "Looks great for any age and gender",
"price": NumberDecimal("582.22"),
"rating": NumberDecimal("4.0"),
},
{
"name": "Raleigh Racer 21st Century Classic",
"category": "BICYCLES",
"description": "Modern update to a classic 70s bike design",
"price": NumberDecimal("523.00"),
"rating": NumberDecimal("4.5"),
},
{
"name": "Diesel Flare Jeans",
"category": "CLOTHES",
"description": "Top end casual look",
"price": NumberDecimal("129.89"),
"rating": NumberDecimal("4.3"),
},
{
"name": "Jazz Silk Scarf",
"category": "CLOTHES",
"description": "Style for the winter months",
"price": NumberDecimal("28.39"),
"rating": NumberDecimal("3.7"),
},
{
"name": "Dell XPS 13 Laptop",
"category": "ELECTRONICS",
"description": "Developer edition",
"price": NumberDecimal("1399.89"),
"rating": NumberDecimal("4.4"),
},
{
"name": "NY Baseball Cap",
"category": "CLOTHES",
"description": "Blue & white",
"price": NumberDecimal("18.99"),
"rating": NumberDecimal("4.0"),
},
{
"name": "Tots Flower Pots",
"category": "GARDEN",
"description": "Set of three",
"price": NumberDecimal("9.78"),
"rating": NumberDecimal("4.1"),
},
{
"name": "Picky Pencil Sharpener",
"category": "Stationery",
"description": "Ultra budget",
"price": NumberDecimal("0.67"),
"rating": NumberDecimal("1.2"),
},
]);
Aggregation Pipeline
Define a pipeline ready to perform the aggregation:
var pipeline = [
// Group products by 2 facets: 1) by price ranges, 2) by rating ranges
{"$facet": {
// Group by price ranges
"by_price": [
// Group into 3 ranges: inexpensive small price range to expensive large price range
{"$bucketAuto": {
"groupBy": "$price",
"buckets": 3,
"granularity": "1-2-5",
"output": {
"count": {"$sum": 1},
"products": {"$push": "$name"},
},
}},
// Tag range info as "price_range"
{"$set": {
"price_range": "$_id",
}},
// Omit unwanted fields
{"$unset": [
"_id",
]},
],
// Group by rating ranges
"by_rating": [
// Group products into 5 rating ranges, from low to high, each holding roughly the same number of products
{"$bucketAuto": {
"groupBy": "$rating",
"buckets": 5,
"output": {
"count": {"$sum": 1},
"products": {"$push": "$name"},
},
}},
// Tag range info as "rating_range"
{"$set": {
"rating_range": "$_id",
}},
// Omit unwanted fields
{"$unset": [
"_id",
]},
],
}},
];
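The "granularity": "1-2-5" setting in the by_price facet makes $bucketAuto snap bucket boundaries to the 1-2-5 preferred-number series (..., 0.5, 1, 2, 5, 10, 20, 50, 100, 200, ...), which is why the price ranges in the expected results are 0.5 to 20, 20 to 200 and 200 to 2000. The plain-JavaScript sketch below uses hypothetical helpers, roundUp125() and roundDown125(), that are not part of MongoDB. It only illustrates rounding to the series; it does not attempt to reproduce MongoDB's internal rules for deciding which value gets rounded or how values lying exactly on a series number are treated.

```javascript
// Hypothetical helpers illustrating rounding to the 1-2-5 series.
// Assumes value > 0. This is not MongoDB's implementation.
const SERIES_MANTISSAS = [1, 2, 5];

// Smallest series number that is >= value.
function roundUp125(value) {
  // Start one decade below the value's magnitude to be safe with log10 precision.
  let exponent = Math.floor(Math.log10(value)) - 1;
  for (;;) {
    for (const m of SERIES_MANTISSAS) {
      const candidate = m * 10 ** exponent;
      if (candidate >= value) return candidate;
    }
    exponent++;
  }
}

// Largest series number that is <= value.
function roundDown125(value) {
  // Start one decade above the value's magnitude and walk downwards.
  let exponent = Math.floor(Math.log10(value)) + 1;
  for (;;) {
    for (const m of [...SERIES_MANTISSAS].reverse()) {
      const candidate = m * 10 ** exponent;
      if (candidate <= value) return candidate;
    }
    exponent--;
  }
}

// Rounding the cheapest sample price down, and the highest price in each
// expected price bucket up, lands on the boundaries in the expected results.
console.log(roundDown125(0.67));  // 0.5
console.log(roundUp125(18.99));   // 20
console.log(roundUp125(129.89));  // 200
console.log(roundUp125(1399.89)); // 2000
```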
Execution
Execute the aggregation using the defined pipeline and also view its explain plan:
db.products.aggregate(pipeline);
db.products.explain("executionStats").aggregate(pipeline);
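For comparison, a client application that did not use $facet would have to run each facet's sub-pipeline as a separate aggregation. The sketch below uses a hypothetical helper, splitFacetStage(), and a trimmed copy of this example's $facet stage (the output, $set and $unset parts are omitted) to pull out each named sub-pipeline so it could be passed to db.products.aggregate() on its own.

```javascript
// Hypothetical helper: turn each named section of a $facet stage into a
// standalone pipeline that could be run as its own aggregation.
function splitFacetStage(facetStage) {
  return Object.entries(facetStage.$facet).map(([facetName, subPipeline]) => ({
    facetName,
    pipeline: subPipeline,
  }));
}

// Trimmed copy of this example's $facet stage (output/$set/$unset omitted).
const facetStage = {
  $facet: {
    by_price: [{ $bucketAuto: { groupBy: "$price", buckets: 3, granularity: "1-2-5" } }],
    by_rating: [{ $bucketAuto: { groupBy: "$rating", buckets: 5 } }],
  },
};

for (const { facetName, pipeline: subPipeline } of splitFacetStage(facetStage)) {
  // In mongosh, each of these would be a separate call, e.g.
  // db.products.aggregate(subPipeline)
  console.log(facetName, JSON.stringify(subPipeline));
}
```

The trade-off is one round trip and one pass over the collection per facet, whereas $facet feeds the same set of input documents to every sub-pipeline within a single aggregation.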
Expected Results
A single document should be returned, which contains 2 facets (keyed off by_price and by_rating respectively), where each facet shows its sub-ranges of values and the products belonging to each sub-range, as shown below:
[
{
by_price: [
{
count: 6,
products: [
'Picky Pencil Sharpener', 'One Day in the Life of Ivan Denisovich',
'The Day Of The Triffids', 'Tots Flower Pots', 'Russell Hobbs Chrome Kettle',
'NY Baseball Cap'
],
price_range: {
min: NumberDecimal('0.500000000000000'), max: NumberDecimal('20.0000000000000')
}
},
{
count: 5,
products: [
'Karcher Hose Set', 'Oak Coffee Table', 'Jazz Silk Scarf',
'Morphy Richardds Food Mixer', 'Diesel Flare Jeans'
],
price_range: {
min: NumberDecimal('20.0000000000000'), max: NumberDecimal('200.0000000000000')
}
},
{
count: 5,
products: [
'Asus Laptop', 'Raleigh Racer 21st Century Classic', 'Tiffany Gold Chain',
'Lenovo Laptop', 'Dell XPS 13 Laptop'
],
price_range: {
min: NumberDecimal('200.0000000000000'), max: NumberDecimal('2000.0000000000000')
}
}
],
by_rating: [
{
count: 4,
products: [
'Picky Pencil Sharpener', 'Jazz Silk Scarf', 'Morphy Richardds Food Mixer',
'Oak Coffee Table'
],
rating_range: { min: NumberDecimal('1.2'), max: NumberDecimal('3.9') }
},
{
count: 3,
products: [
'Russell Hobbs Chrome Kettle', 'Tiffany Gold Chain', 'NY Baseball Cap'
],
rating_range: { min: NumberDecimal('3.9'), max: NumberDecimal('4.1') }
},
{
count: 3,
products: [ 'Lenovo Laptop', 'Tots Flower Pots', 'Asus Laptop' ],
rating_range: { min: NumberDecimal('4.1'), max: NumberDecimal('4.3') }
},
{
count: 3,
products: [
'Karcher Hose Set', 'Diesel Flare Jeans', 'Dell XPS 13 Laptop'
],
rating_range: { min: NumberDecimal('4.3'), max: NumberDecimal('4.5') }
},
{
count: 3,
products: [
'Raleigh Racer 21st Century Classic', 'The Day Of The Triffids',
'One Day in the Life of Ivan Denisovich'
],
rating_range: { min: NumberDecimal('4.5'), max: NumberDecimal('4.9') }
}
]
}
]
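With no granularity set, $bucketAuto aims to put roughly the same number of documents in each bucket while keeping documents with equal groupBy values together. The plain-JavaScript sketch below approximates that behaviour for the by_rating facet. The helper bucketAutoEvenCount() is hypothetical and assumes a per-bucket target of the document count divided by the bucket count, rounded to the nearest integer, with the final bucket taking whatever is left. It is not MongoDB's implementation, but on the 16 sample ratings it yields the same 4, 3, 3, 3, 3 split and the same ranges as the expected results.

```javascript
// Ratings of the 16 sample products, in insertion order.
const ratings = [4.2, 4.8, 3.8, 4.3, 3.8, 4.1, 4.9, 3.9,
                 4.0, 4.5, 4.3, 3.7, 4.4, 4.0, 4.1, 1.2];

// Hypothetical approximation of $bucketAuto without a granularity setting.
function bucketAutoEvenCount(values, buckets) {
  const sorted = [...values].sort((a, b) => a - b);
  // Assumed target size per bucket: count / buckets, rounded, at least 1.
  const target = Math.max(1, Math.round(sorted.length / buckets));
  const result = [];
  let i = 0;
  while (i < sorted.length) {
    const start = i;
    if (result.length === buckets - 1) {
      // The final bucket takes every remaining value.
      i = sorted.length;
    } else {
      i = Math.min(i + target, sorted.length);
      // Keep equal values together rather than splitting them across buckets.
      while (i < sorted.length && sorted[i] === sorted[i - 1]) i++;
    }
    result.push({
      min: sorted[start],
      // A bucket's max is the next bucket's min; the last bucket's max is its largest value.
      max: i < sorted.length ? sorted[i] : sorted[i - 1],
      count: i - start,
    });
  }
  return result;
}

console.log(bucketAutoEvenCount(ratings, 5).map((b) => b.count)); // [ 4, 3, 3, 3, 3 ]
```

Setting a granularity changes this: boundaries are then rounded to the chosen series, so bucket counts can come out uneven, as the 6, 5, 5 split of the by_price facet shows.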
Observations
- Multiple Pipelines. The $facet stage doesn't have to be employed for you to use the $bucketAuto stage. In most faceted search scenarios, you will want to understand a collection by multiple dimensions at once (price and rating in this case). The $facet stage is convenient because it allows you to define various $bucketAuto dimensions in one go in a single pipeline. Otherwise, a client application must invoke an aggregation multiple times, each using a new $bucketAuto stage to process a different field. In fact, each section of a $facet stage is just a regular aggregation [sub-]pipeline, able to contain any type of stage (with a few specific documented exceptions), and may not even contain $bucketAuto or $bucket stages at all.
- Single Document Result. If a $facet-based aggregation were allowed to return multiple documents, this would cause a problem: the results would contain a mix of records originating from different facets, with no way of ascertaining which facet each result record belongs to. Consequently, when using $facet, a single document is always returned, containing a top-level field for each facet. Having only a single result record is not usually a problem. A typical requirement for faceted search is to return a small amount of grouped summary data about a collection rather than large amounts of raw data from the collection. Therefore, the 16MB document size limit should not be an issue.
- Spread Of Ranges. In this example, the two bucketing facets use different numbering schemes for spreading out the sub-ranges of values: the price facet rounds its boundaries to the "1-2-5" granularity series, whereas the rating facet sets no granularity, so $bucketAuto spreads the products roughly evenly across the rating buckets by count. You choose a numbering scheme based on what you know about the nature of the facet. For instance, most of the rating values in the sample collection are bunched between the high 3s and the low 4s. If the sub-ranges were instead defined as an even spread of rating values (e.g. 1 to 2, 2 to 3, 3 to 4 and 4 to 5 stars), most products would appear in the same sub-range bucket (11 of the 16 sample products would fall in the 4 to 5 range) and some sub-ranges would contain no products (e.g. ratings 2 to 3 in this example). This wouldn't provide website customers with much selectivity on product ratings.
- Faster Facet Computation. The aggregation in this example has no choice but to perform a "full-collection-scan" to construct the faceted results. For large collections, the time the user has to wait on the website to see these results may be prohibitively long. However, there is an alternative mechanism you can employ to generate faceted results faster, using Atlas Search, as highlighted in a later example in this book. Therefore, if you can adopt Atlas Search, use its faceted search capability rather than MongoDB's general-purpose faceted search capability.
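To make the Spread Of Ranges point concrete, the sketch below counts the sample ratings into even-width, one-star ranges (1 to 2, 2 to 3, 3 to 4, 4 to 5). The helper countEvenWidth() is hypothetical plain JavaScript, not a MongoDB stage; in MongoDB, fixed boundaries like these would be expressed with $bucket rather than $bucketAuto.

```javascript
// Ratings of the 16 sample products, in insertion order.
const ratings = [4.2, 4.8, 3.8, 4.3, 3.8, 4.1, 4.9, 3.9,
                 4.0, 4.5, 4.3, 3.7, 4.4, 4.0, 4.1, 1.2];

// Hypothetical helper: count values into half-open ranges
// [b0, b1), [b1, b2), ... (a rating of exactly 5 would not be counted here).
function countEvenWidth(values, boundaries) {
  const counts = new Array(boundaries.length - 1).fill(0);
  for (const v of values) {
    for (let i = 0; i < boundaries.length - 1; i++) {
      if (v >= boundaries[i] && v < boundaries[i + 1]) {
        counts[i]++;
        break;
      }
    }
  }
  return counts;
}

console.log(countEvenWidth(ratings, [1, 2, 3, 4, 5])); // [ 1, 0, 4, 11 ]
```

The 2 to 3 range is empty and the 4 to 5 range holds 11 of the 16 products, which is why this example lets $bucketAuto balance the rating buckets by count instead.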