Compound Text Search Criteria
Minimum MongoDB Version: 4.2
Scenario
You want to search a collection of e-commerce products to find specific movie DVDs. Based on each DVD's full-text plot description, you want movies with a post-apocalyptic theme, especially those related to a nuclear disaster where some people survive. However, you aren't interested in seeing movies involving zombies.
To execute this example, you need to be using an Atlas Cluster rather than a self-managed MongoDB deployment. The simplest way to achieve this is to provision a Free Tier Atlas Cluster.
Sample Data Population
Remove any existing documents from the products collection (if it exists) and then populate the collection with some DVD and Book records:
db = db.getSiblingDB("book-compound-text-search");
// Delete any existing documents (deleteMany() replaces the deprecated remove() helper)
db.products.deleteMany({});

// Insert 7 records into the products collection
db.products.insertMany([
  {
    "name": "The Road",
    "category": "DVD",
    "description": "In a dangerous post-apocalyptic world, a dying father protects his surviving son as they try to reach the coast",
  },
  {
    "name": "The Day Of The Triffids",
    "category": "BOOK",
    "description": "Post-apocalyptic disaster where most people are blinded by a meteor shower and then die at the hands of a new type of plant",
  },
  {
    "name": "The Road",
    "category": "BOOK",
    "description": "In a dangerous post-apocalyptic world, a dying father protects his surviving son as they try to reach the coast",
  },
  {
    "name": "The Day the Earth Caught Fire",
    "category": "DVD",
    "description": "A series of nuclear explosions cause fires and earthquakes to ravage cities, with some of those that survive trying to rescue the post-apocalyptic world",
  },
  {
    "name": "28 Days Later",
    "category": "DVD",
    "description": "A caged chimp infected with a virus is freed from a lab, and the infection spreads to people who become zombie-like with just a few surviving in a post-apocalyptic country",
  },
  {
    "name": "Don't Look Up",
    "category": "DVD",
    "description": "Pre-apocalyptic situation where some astronomers warn humankind of an approaching comet that will destroy planet Earth",
  },
  {
    "name": "Thirteen Days",
    "category": "DVD",
    "description": "Based on the true story of the Cuban nuclear missile threat, crisis is averted at the last minute and the world survives",
  },
]);
Now, using the simple procedure described in the Create Atlas Search Index appendix, define a Search Index. Select the new database collection book-compound-text-search.products and enter the following JSON search index definition:
{
  "searchAnalyzer": "lucene.english",
  "mappings": {
    "dynamic": true
  }
}
This definition indicates that the index should use the lucene.english analyzer and make all document fields searchable, using their inferred data types.
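As an alternative to the Atlas UI, the same definition could be submitted from the shell. The following is only a sketch: it assumes your mongosh version and Atlas cluster support the createSearchIndex() helper, which the procedure above does not rely on. The variable name searchIndexDefinition is illustrative. If the helper is unavailable, use the appendix procedure instead.

```javascript
// Same definition as the JSON entered in the Atlas UI above
const searchIndexDefinition = {
  "searchAnalyzer": "lucene.english",
  "mappings": {
    "dynamic": true,
  },
};

// `db` only exists inside mongosh, so this guard lets the snippet load elsewhere without error.
// Assumption: the createSearchIndex() shell helper is supported by your shell and cluster.
if (typeof db !== "undefined") {
  db.getSiblingDB("book-compound-text-search")
    .products.createSearchIndex("default", searchIndexDefinition);
}
```

The index name "default" matches the "index" value used by the $search stage in the pipeline.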
Aggregation Pipeline
Define a pipeline ready to perform the aggregation:
var pipeline = [
  // Search for DVDs where the description must contain "apocalyptic" but not "zombie"
  {"$search": {
    "index": "default",
    "compound": {
      "must": [
        {"text": {
          "path": "description",
          "query": "apocalyptic",
        }},
      ],
      "should": [
        {"text": {
          "path": "description",
          "query": "nuclear survives",
        }},
      ],
      "mustNot": [
        {"text": {
          "path": "description",
          "query": "zombie",
        }},
      ],
      "filter": [
        {"text": {
          "path": "category",
          "query": "DVD",
        }},
      ],
    }
  }},

  // Capture the search relevancy score in the output and omit the _id field
  {"$set": {
    "score": {"$meta": "searchScore"},
    "_id": "$$REMOVE",
  }},
];
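The way the four compound clauses combine can be approximated in plain JavaScript. The sketch below is a toy model only, not how Atlas Search or Lucene works internally: it uses case-insensitive substring checks instead of analyzed tokens, the hand-picked prefix "surviv" stands in for English stemming, and a simple count of optional hits stands in for the real relevancy score. The names products, mentions, and toyCompoundSearch are hypothetical.

```javascript
// The seven sample products (fields needed by the toy model only)
const products = [
  {name: "The Road", category: "DVD", description: "In a dangerous post-apocalyptic world, a dying father protects his surviving son as they try to reach the coast"},
  {name: "The Day Of The Triffids", category: "BOOK", description: "Post-apocalyptic disaster where most people are blinded by a meteor shower and then die at the hands of a new type of plant"},
  {name: "The Road", category: "BOOK", description: "In a dangerous post-apocalyptic world, a dying father protects his surviving son as they try to reach the coast"},
  {name: "The Day the Earth Caught Fire", category: "DVD", description: "A series of nuclear explosions cause fires and earthquakes to ravage cities, with some of those that survive trying to rescue the post-apocalyptic world"},
  {name: "28 Days Later", category: "DVD", description: "A caged chimp infected with a virus is freed from a lab, and the infection spreads to people who become zombie-like with just a few surviving in a post-apocalyptic country"},
  {name: "Don't Look Up", category: "DVD", description: "Pre-apocalyptic situation where some astronomers warn humankind of an approaching comet that will destroy planet Earth"},
  {name: "Thirteen Days", category: "DVD", description: "Based on the true story of the Cuban nuclear missile threat, crisis is averted at the last minute and the world survives"},
];

// Hypothetical helper: substring test standing in for analyzed term matching
const mentions = (doc, stem) => doc.description.toLowerCase().includes(stem);

function toyCompoundSearch(docs) {
  return docs
    .filter((doc) => doc.category === "DVD")        // "filter": must match, no effect on score
    .filter((doc) => mentions(doc, "apocalyptic"))  // "must": mandatory term
    .filter((doc) => !mentions(doc, "zombie"))      // "mustNot": excluded term
    .map((doc) => ({
      name: doc.name,
      // "should": optional terms only boost the ranking (crude count, not a real score)
      optionalHits: ["nuclear", "surviv"].filter((stem) => mentions(doc, stem)).length,
    }))
    .sort((a, b) => b.optionalHits - a.optionalHits);
}

const toyResults = toyCompoundSearch(products);
console.log(toyResults);
```

In this toy model, the filter, must, and mustNot clauses decide which documents are kept, while the should clause only influences the order of the survivors.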
Execution
Execute the aggregation using the defined pipeline and also view its explain plan:
db.products.aggregate(pipeline);
db.products.explain("executionStats").aggregate(pipeline);
Expected Results
Three documents should be returned, showing products which are post-apocalyptic themed DVDs, as shown below:
[
  {
    name: 'The Day the Earth Caught Fire',
    category: 'DVD',
    description: 'A series of nuclear explosions cause fires and earthquakes to ravage cities, with some of those that survive trying to rescue the post-apocalyptic world',
    score: 0.8468831181526184
  },
  {
    name: 'The Road',
    category: 'DVD',
    description: 'In a dangerous post-apocalyptic world, a dying father protects his surviving son as they try to reach the coast',
    score: 0.3709350824356079
  },
  {
    name: "Don't Look Up",
    category: 'DVD',
    description: 'Pre-apocalyptic situation where some astronomers warn humankind of an approaching comet that will destroy planet Earth',
    score: 0.09836573898792267
  }
]
If you don't see any results, double-check that the system has finished generating your new index.
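One possible way to check the index from the shell is sketched below. It assumes your mongosh version and cluster support the getSearchIndexes() helper and that each returned entry carries name and queryable fields. The function isSearchIndexQueryable is a hypothetical helper written for this illustration. The Atlas UI shows the same status if the helper is unavailable.

```javascript
// Hypothetical helper: true if the named index is listed and reported as queryable
function isSearchIndexQueryable(indexes, indexName) {
  return indexes.some((index) => index.name === indexName && index.queryable === true);
}

// `db` only exists inside mongosh, so this guard lets the snippet load elsewhere without error.
// Assumption: the getSearchIndexes() shell helper is supported by your shell and cluster.
if (typeof db !== "undefined") {
  const indexes = db.getSiblingDB("book-compound-text-search").products.getSearchIndexes();
  print(isSearchIndexQueryable(indexes, "default")
    ? "Search index is queryable"
    : "Search index is not ready yet");
}
```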
Observations
- Search Stage. The $search stage is only available in aggregation pipelines run against an Atlas-based MongoDB database which leverages Atlas Search. A $search stage must be the first stage of an aggregation pipeline, and under the covers, it instructs the system to execute a text search operation against an internally synchronised Lucene full-text index. Inside the $search stage, you can only use one of a small set of text-search specific operators. In this example, the pipeline uses a compound operator to define a combination of multiple text operators.
- Results & Relevancy Explanation. The executed pipeline ignores four of the seven input documents and sorts the remaining three documents by highest relevancy first. It achieves this by applying the following actions:
  - It excludes two book-related records because the filter option executes a text match on just DVD in the category field.
  - It ignores the "28 Days Later" DVD record because the mustNot option's text match finds "zombie" in the description field.
  - It excludes the movie "Thirteen Days" because even though its description contains two of the optional terms ("nuclear" and "survives"), it doesn't include the mandatory term "apocalyptic".
  - It deduces the score of the remaining records based on the ratio of the number of matching terms ("apocalyptic", "nuclear", and "survives") in each document's description field versus how infrequently those terms appear in other documents in the same collection.
- English Language Analyzer. Atlas Search provides multiple Analyzer options for breaking down text into searchable tokens, both when generating text indexes and when executing text queries. The default analyzer, Standard, is not used here because the pipeline needs to match variations of the same English words. For example, "survives" and "surviving" need to refer to the same term, and hence the text index uses the lucene.english analyzer.
- Meta Operator. The $meta operator provides supplementary metadata about the results of a text search performed earlier in a pipeline. When leveraging an Atlas Search based text search, the pipeline can look up a searchScore field in the metadata to access the relevancy score attributed to each text search result. This example uses searchScore to help you understand why the results are in a particular order, with some records having higher relevancy than others. In this example, it serves no other purpose, and you can omit it. However, in a different situation, you might want to use the search score to filter out low relevancy results in a later $match stage of a pipeline, for example.
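The idea of filtering out low relevancy results in a later $match stage could look something like the sketch below. The helper withMinimumScore and the threshold of 0.3 are illustrative assumptions, not part of the example. It expects the pipeline it receives to have already captured the score in a score field, as this example's $set stage does. Relevancy scores depend on the indexed data, so any threshold needs tuning for a real collection.

```javascript
// Hypothetical helper: returns a copy of a search pipeline with an extra stage that
// keeps only results whose previously captured "score" field meets the threshold
function withMinimumScore(searchPipeline, minScore) {
  return [
    ...searchPipeline,
    {"$match": {"score": {"$gte": minScore}}},
  ];
}

// Only runs inside mongosh, after `pipeline` has been defined as shown earlier
if (typeof db !== "undefined" && typeof pipeline !== "undefined") {
  printjson(db.products.aggregate(withMinimumScore(pipeline, 0.3)).toArray());
}
```

With the scores listed under Expected Results, a threshold of 0.3 would keep "The Day the Earth Caught Fire" and "The Road" and drop "Don't Look Up".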