Facets And Counts Text Search
Minimum MongoDB Version: 4.4 (due to use of the facet option in the $searchMeta stage)
Scenario
You help run a bank's call centre and want to analyse the summary descriptions of customer telephone enquiries recorded by call centre staff. You want to look for customer calls that mention fraud and understand in which periods of a specific day these fraud-related calls occur. This insight will help the bank plan its future staffing rotas for the fraud department.
To execute this example, you need to be using an Atlas Cluster rather than a self-managed MongoDB deployment. The simplest way to achieve this is to provision a Free Tier Atlas Cluster.
Sample Data Population
Remove any existing records from the enquiries collection (if it exists) and then populate the collection with new records:
db = db.getSiblingDB("book-facets-text-search");
// Clear out any existing records (the collection and its search index are kept)
db.enquiries.deleteMany({});
// Insert records into the enquiries collection
db.enquiries.insertMany([
  {
    "accountId": "9913183",
    "datetime": ISODate("2022-01-30T08:35:52Z"),
    "summary": "They just made a balance enquiry only - no other issues",
  },
  {
    "accountId": "9913183",
    "datetime": ISODate("2022-01-30T09:32:07Z"),
    "summary": "Reported suspected fraud - froze cards, initiated chargeback on the transaction",
  },
  {
    "accountId": "6830859",
    "datetime": ISODate("2022-01-30T10:25:37Z"),
    "summary": "Customer said they didn't make one of the transactions which could be fraud - passed on to the investigations team",
  },
  {
    "accountId": "9899216",
    "datetime": ISODate("2022-01-30T11:13:32Z"),
    "summary": "Struggling financially this month hence requiring extended overdraft - increased limit to 500 for 2 months",
  },
  {
    "accountId": "1766583",
    "datetime": ISODate("2022-01-30T10:56:53Z"),
    "summary": "Fraud reported - fraudulent direct debit established 3 months ago - removed instruction and reported to crime team",
  },
  {
    "accountId": "9310399",
    "datetime": ISODate("2022-01-30T14:04:48Z"),
    "summary": "Customer rang on mobile whilst fraud call in progress on home phone to check if it was valid - advised to hang up",
  },
  {
    "accountId": "4542001",
    "datetime": ISODate("2022-01-30T16:55:46Z"),
    "summary": "Enquiring for loan - approved standard loan for 6000 over 4 years",
  },
  {
    "accountId": "7387756",
    "datetime": ISODate("2022-01-30T17:49:32Z"),
    "summary": "Froze customer account when they called in as multiple fraud transactions appearing even whilst call was active",
  },
  {
    "accountId": "3987992",
    "datetime": ISODate("2022-01-30T22:49:44Z"),
    "summary": "Customer called claiming fraud for a transaction which confirmed looks suspicious and so issued chargeback",
  },
  {
    "accountId": "7362872",
    "datetime": ISODate("2022-01-31T07:07:14Z"),
    "summary": "Worst case of fraud I've ever seen - customer lost millions - escalated to our high value team",
  },
]);
Now, using the simple procedure described in the Create Atlas Search Index appendix, define a Search Index. Select the new database collection book-facets-text-search.enquiries and enter the following JSON search index definition:
{
  "analyzer": "lucene.english",
  "searchAnalyzer": "lucene.english",
  "mappings": {
    "dynamic": true,
    "fields": {
      "datetime": [
        {"type": "date"},
        {"type": "dateFacet"}
      ]
    }
  }
}
This definition indicates that the index should use the lucene.english analyzer. It includes an explicit mapping for the datetime field, asking for the field to be indexed in two ways (as date and as dateFacet) so that the same pipeline can both filter on a date range and facet on the field. Because the mapping is dynamic, all other document fields will be searchable with inferred data types.
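As an alternative to pasting the JSON into the Atlas console, the same definition could be submitted from the shell. The following is only a sketch: it assumes a recent mongosh and a cluster version and tier where the db.collection.createSearchIndex() helper is available, which may not be true of every deployment. The variable name indexDefinition is introduced here purely for illustration.

```javascript
// Same index definition as shown above, held in a variable (name is illustrative)
var indexDefinition = {
  "analyzer": "lucene.english",
  "searchAnalyzer": "lucene.english",
  "mappings": {
    "dynamic": true,
    "fields": {
      // Index datetime twice: once for range filtering, once for faceting
      "datetime": [
        {"type": "date"},
        {"type": "dateFacet"},
      ],
    },
  },
};

// Only attempt the call inside mongosh, where the global `db` exists.
// ASSUMPTION: the deployment supports createSearchIndex(); if it does not,
// use the Atlas console procedure from the appendix instead.
if (typeof db !== "undefined") {
  db.getSiblingDB("book-facets-text-search")
    .enquiries.createSearchIndex("default", indexDefinition);
}
```

Whichever route you take, the index is built asynchronously, so allow it to finish before running the aggregation.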
Aggregation Pipeline
Define a pipeline ready to perform the aggregation:
var pipeline = [
  // For 1 day match 'fraud' enquiries, grouped into periods of the day, counting them
  {"$searchMeta": {
    "index": "default",
    "facet": {
      "operator": {
        "compound": {
          "must": [
            {"text": {
              "path": "summary",
              "query": "fraud",
            }},
          ],
          "filter": [
            {"range": {
              "path": "datetime",
              "gte": ISODate("2022-01-30"),
              "lt": ISODate("2022-01-31"),
            }},
          ],
        },
      },
      "facets": {
        "fraudEnquiryPeriods": {
          "type": "date",
          "path": "datetime",
          "boundaries": [
            ISODate("2022-01-30T00:00:00.000Z"),
            ISODate("2022-01-30T06:00:00.000Z"),
            ISODate("2022-01-30T12:00:00.000Z"),
            ISODate("2022-01-30T18:00:00.000Z"),
            ISODate("2022-01-31T00:00:00.000Z"),
          ],
        }
      }
    }
  }},
];
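The boundaries array in the pipeline is written out by hand for one specific day. If you wanted to analyse a different day or a different period length, the boundaries could be generated instead. The sketch below uses a hypothetical helper, buildDayBoundaries, which is not part of MongoDB or Atlas Search; it assumes the period length divides evenly into 24 hours and that days are measured in UTC.

```javascript
// Hypothetical helper (not a MongoDB API): returns the facet boundaries that
// split one UTC day, starting at dayStart, into periods of bucketHours hours.
// ASSUMPTION: bucketHours divides evenly into 24.
function buildDayBoundaries(dayStart, bucketHours) {
  const msPerHour = 60 * 60 * 1000;
  const result = [];
  for (let hour = 0; hour <= 24; hour += bucketHours) {
    result.push(new Date(dayStart.getTime() + hour * msPerHour));
  }
  return result;
}

// Four 6-hour periods for 30-January-2022, i.e. five boundary values
var boundaries = buildDayBoundaries(new Date("2022-01-30T00:00:00.000Z"), 6);
```

The resulting array could then be assigned to the "boundaries" field of the fraudEnquiryPeriods facet, and its first and last elements reused as the gte and lt values of the range filter.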
Execution
Execute the aggregation using the defined pipeline:
db.enquiries.aggregate(pipeline);
Note, it is not currently possible to view the explain plan for a $searchMeta based aggregation.
Expected Results
The results should show the pipeline matched 6 documents for a specific day on the text fraud, spread out over the four 6-hour periods, as shown below:
[
  {
    count: { lowerBound: Long("6") },
    facet: {
      fraudEnquiryPeriods: {
        buckets: [
          {
            _id: ISODate("2022-01-30T00:00:00.000Z"),
            count: Long("0")
          },
          {
            _id: ISODate("2022-01-30T06:00:00.000Z"),
            count: Long("3")
          },
          {
            _id: ISODate("2022-01-30T12:00:00.000Z"),
            count: Long("2")
          },
          {
            _id: ISODate("2022-01-30T18:00:00.000Z"),
            count: Long("1")
          }
        ]
      }
    }
  }
]
If you don't see any facet results and the value of count is zero, double-check that the system has finished generating your new index.
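To see why the buckets come out as 0, 3, 2 and 1, the bucketing can be reproduced locally without Atlas. This is a plain JavaScript sketch, not how Atlas Search computes facets internally. It takes the datetime values of the six sample enquiries on 30-January-2022 whose summary mentions fraud, and it treats each bucket's lower boundary as inclusive and its upper boundary as exclusive. The function name countPerBucket is made up for this illustration.

```javascript
// datetime values of the six sample enquiries on 30-Jan-2022 that mention fraud
const fraudCallTimes = [
  "2022-01-30T09:32:07Z",
  "2022-01-30T10:25:37Z",
  "2022-01-30T10:56:53Z",
  "2022-01-30T14:04:48Z",
  "2022-01-30T17:49:32Z",
  "2022-01-30T22:49:44Z",
].map((text) => new Date(text));

// The same five boundaries the pipeline declares
const bucketBoundaries = [
  "2022-01-30T00:00:00.000Z",
  "2022-01-30T06:00:00.000Z",
  "2022-01-30T12:00:00.000Z",
  "2022-01-30T18:00:00.000Z",
  "2022-01-31T00:00:00.000Z",
].map((text) => new Date(text));

// Illustrative function: counts how many values fall into each bucket,
// where a bucket spans [bounds[i], bounds[i + 1])
function countPerBucket(values, bounds) {
  const counts = new Array(bounds.length - 1).fill(0);
  for (const value of values) {
    for (let i = 0; i < bounds.length - 1; i++) {
      if (value >= bounds[i] && value < bounds[i + 1]) {
        counts[i]++;
        break;
      }
    }
  }
  return counts;
}

const bucketCounts = countPerBucket(fraudCallTimes, bucketBoundaries);
console.log(bucketCounts);  // [ 0, 3, 2, 1 ]
```

The tenth sample record also mentions fraud, but it is dated 31-January-2022, so the range filter excludes it before any faceting takes place.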
Observations
- Search Metadata Stage. The $searchMeta stage is only available in aggregation pipelines run against an Atlas-based MongoDB database which leverages Atlas Search. A $searchMeta stage must be the first stage of an aggregation pipeline, and under the covers, it performs a text search operation against an internally synchronised Lucene full-text index. However, it is different from the $search stage used in the earlier search example chapter. Instead, you use $searchMeta to ask the system to return metadata about the text search you executed, such as the match count, rather than returning the search result records. The $searchMeta stage takes a facet option, which itself takes two options, operator and facets, which you use to define the text search criteria and to categorise the results in groups, respectively.
- Date Range Filter. The pipeline uses a text operator for matching descriptions containing the term fraud. Additionally, the search criteria include a range operator. The range operator allows you to match records between two numbers or two dates. The example pipeline applies a date range, only including documents where the datetime field's value falls on 30-January-2022.
- Facet Boundaries. The pipeline uses a facet collector to group metadata results by date range boundaries. Each pair of adjacent boundaries in the example defines a 6-hour period of the same specific day for a document's datetime field. A single pipeline can declare multiple facets; hence you give each facet a different name. The pipeline only defines one facet in this example, labelling it fraudEnquiryPeriods. When the pipeline executes, it returns the total count of matched documents and the count of matches in each facet grouping. There were no fraud-related enquiries between midnight and 6am, indicating that perhaps the fraud department only requires "skeleton-staffing" for such periods. In contrast, the period between 6am and midday shows the highest number of fraud-related enquiries, suggesting the bank should dedicate additional staff to that period.
- Faster Facet Counts. A faceted index is a special type of Lucene index optimised to compute counts of dataset categories. An application can leverage the index to offload much of the work required to analyse facets ahead of time, thus avoiding some of the latency costs when invoking a faceted search at runtime. Therefore, use the Atlas faceted search capability if you are in a position to adopt Atlas Search, rather than using MongoDB's general-purpose faceted search capability described in an earlier example in this book.
- Combining A Search Operation With Metadata. In this example, a pipeline uses $searchMeta to obtain metadata from a search (counts and facets). What if you also want the actual search results from running $search, similar to the previous example? You could invoke two operations from your client application, one to retrieve the search results and one to retrieve the metadata results. However, Atlas Search provides a way of obtaining both aspects within a single aggregation. Instead of using a $searchMeta stage, you use a $search stage. The pipeline automatically stores its metadata in the $$SEARCH_META variable, ready for you to access it via subsequent stages in the same pipeline. For example:
  {"$set": {"mymetadata": "$$SEARCH_META"}}
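Putting the last observation together, a combined pipeline might look something like the sketch below. It is an illustration rather than a tested recipe: it assumes the facet collector is accepted inside a $search stage in the same shape that this chapter uses for $searchMeta, and the variable name combinedPipeline is invented here. Plain Date objects are used in place of ISODate so the definition also loads outside the shell.

```javascript
var combinedPipeline = [
  // Run the same faceted text search, but via $search so that the matching
  // documents themselves are returned
  {"$search": {
    "index": "default",
    "facet": {
      "operator": {
        "compound": {
          "must": [
            {"text": {"path": "summary", "query": "fraud"}},
          ],
          "filter": [
            {"range": {
              "path": "datetime",
              "gte": new Date("2022-01-30T00:00:00.000Z"),
              "lt": new Date("2022-01-31T00:00:00.000Z"),
            }},
          ],
        },
      },
      "facets": {
        "fraudEnquiryPeriods": {
          "type": "date",
          "path": "datetime",
          "boundaries": [
            new Date("2022-01-30T00:00:00.000Z"),
            new Date("2022-01-30T06:00:00.000Z"),
            new Date("2022-01-30T12:00:00.000Z"),
            new Date("2022-01-30T18:00:00.000Z"),
            new Date("2022-01-31T00:00:00.000Z"),
          ],
        },
      },
    },
  }},

  // Copy the search metadata (counts and facets) into each returned document
  {"$set": {"mymetadata": "$$SEARCH_META"}},
];

// Only execute inside mongosh, where the global `db` exists
if (typeof db !== "undefined") {
  printjson(db.getSiblingDB("book-facets-text-search")
    .enquiries.aggregate(combinedPipeline).toArray());
}
```

Note that this approach repeats the same metadata in every result document, so for large result sets you may prefer to project it only once on the client side.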