Facets And Counts Text Search

Minimum MongoDB Version: 4.4 (due to use of the facet option in the $searchMeta stage)

Scenario

You help run a bank's call centre and want to analyse the summary descriptions of customer telephone enquiries recorded by call centre staff. You want to find the customer calls that mention fraud and understand in which periods of a specific day these fraud-related calls occur. This insight will help the bank plan its future staffing rotas for the fraud department.

To execute this example, you need to be using an Atlas Cluster rather than a self-managed MongoDB deployment. The simplest way to achieve this is to provision a Free Tier Atlas Cluster.

Sample Data Population

Remove any existing documents from the enquiries collection (if it exists) and then populate the collection with new records:

db = db.getSiblingDB("book-facets-text-search");
db.enquiries.deleteMany({});

// Insert records into the enquiries collection
db.enquiries.insertMany([
  {
    "acountId": "9913183",
    "datetime": ISODate("2022-01-30T08:35:52Z"),
    "summary": "They just made a balance enquiry only - no other issues",
  },
  {
    "acountId": "9913183",
    "datetime": ISODate("2022-01-30T09:32:07Z"),
    "summary": "Reported suspected fraud - froze cards, initiated chargeback on the transaction",
  },
  {
    "acountId": "6830859",
    "datetime": ISODate("2022-01-30T10:25:37Z"),
    "summary": "Customer said they didn't make one of the transactions which could be fraud - passed on to the investigations team",
  },
  {
    "acountId": "9899216",
    "datetime": ISODate("2022-01-30T11:13:32Z"),
    "summary": "Struggling financially this month hence requiring extended overdraft - increased limit to 500 for 2 months",
  },
  {
    "acountId": "1766583",
    "datetime": ISODate("2022-01-30T10:56:53Z"),
    "summary": "Fraud reported - fraudulent direct debit established 3 months ago - removed instruction and reported to crime team",
  },
  {
    "acountId": "9310399",
    "datetime": ISODate("2022-01-30T14:04:48Z"),
    "summary": "Customer rang on mobile whilst fraud call in progress on home phone to check if it was valid - advised to hang up",
  },
  {
    "acountId": "4542001",
    "datetime": ISODate("2022-01-30T16:55:46Z"),
    "summary": "Enquiring for loan - approved standard loan for 6000 over 4 years",
  },
  {
    "acountId": "7387756",
    "datetime": ISODate("2022-01-30T17:49:32Z"),
    "summary": "Froze customer account when they called in as multiple fraud transactions appearing even whilst call was active",
  },
  {
    "acountId": "3987992",
    "datetime": ISODate("2022-01-30T22:49:44Z"),
    "summary": "Customer called claiming fraud for a transaction which confirmed looks suspicious and so issued chargeback",
  },
  {
    "acountId": "7362872",
    "datetime": ISODate("2022-01-31T07:07:14Z"),
    "summary": "Worst case of fraud I've ever seen - customer lost millions - escalated to our high value team",
  },
]);

 

Now, using the simple procedure described in the Create Atlas Search Index appendix, define a Search Index. Select the new database collection book-facets-text-search.enquiries and enter the following JSON search index definition:

{
  "analyzer": "lucene.english",
  "searchAnalyzer": "lucene.english",
  "mappings": {
    "dynamic": true,
    "fields": {
      "datetime": [
        {"type": "date"},
        {"type": "dateFacet"}
      ]
    }
  }
}

This definition indicates that the index should use the lucene.english analyzer. It includes an explicit mapping for the datetime field, asking for the field to be indexed in two ways (as date and as dateFacet) so that the same pipeline can apply both a date range filter and faceting to it. Because dynamic is set to true, all other document fields will be searchable with inferred data types.

Aggregation Pipeline

Define a pipeline ready to perform the aggregation:

var pipeline = [
  // For 1 day match 'fraud' enquiries, grouped into periods of the day, counting them
  {"$searchMeta": {
    "index": "default",    
    "facet": {
      "operator": {
        "compound": {
          "must": [
            {"text": {
              "path": "summary",
              "query": "fraud",
            }},
          ],
          "filter": [
            {"range": {
              "path": "datetime",
              "gte": ISODate("2022-01-30"),
              "lt": ISODate("2022-01-31"),
            }},
          ],
        },
      },
      "facets": {        
        "fraudEnquiryPeriods": {
          "type": "date",
          "path": "datetime",
          "boundaries": [
            ISODate("2022-01-30T00:00:00.000Z"),
            ISODate("2022-01-30T06:00:00.000Z"),
            ISODate("2022-01-30T12:00:00.000Z"),
            ISODate("2022-01-30T18:00:00.000Z"),
            ISODate("2022-01-31T00:00:00.000Z"),
          ],
        }            
      }        
    }           
  }},
];
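The boundaries array in the pipeline is written out by hand. As a minimal sketch, assuming you would rather generate evenly spaced boundaries for a given day, the hypothetical helper below (buildDayBoundaries is not part of the original example) produces the same five values using plain JavaScript Date objects, which can be used in mongosh wherever ISODate() values are used:

```javascript
// Hypothetical helper (an assumption for illustration, not part of the original
// example): builds evenly spaced facet boundaries covering one UTC day.
function buildDayBoundaries(dayStartIso, hoursPerBucket) {
  if (!(hoursPerBucket > 0) || 24 % hoursPerBucket !== 0) {
    throw new Error("hoursPerBucket must be positive and divide 24 evenly");
  }

  const start = new Date(dayStartIso);
  const msPerBucket = hoursPerBucket * 60 * 60 * 1000;
  const bucketCount = 24 / hoursPerBucket;
  const boundaries = [];

  // N buckets need N + 1 boundaries: the start of each bucket plus the end of the day
  for (let i = 0; i <= bucketCount; i++) {
    boundaries.push(new Date(start.getTime() + i * msPerBucket));
  }

  return boundaries;
}

// Four 6-hour periods for 30 January 2022, matching the pipeline's boundaries
const dayBoundaries = buildDayBoundaries("2022-01-30T00:00:00.000Z", 6);
console.log(dayBoundaries.map((d) => d.toISOString()));
```

The resulting array could then be supplied as the value of the boundaries field in the fraudEnquiryPeriods facet definition.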

Execution

Execute the aggregation using the defined pipeline:

db.enquiries.aggregate(pipeline);

Note that it is not currently possible to view the explain plan for a $searchMeta based aggregation.

Expected Results

The results should show that the pipeline matched 6 documents containing the text fraud for the specified day (30 January 2022), with the counts spread across the four 6-hour periods, as shown below:

[
  {
    count: { lowerBound: Long("6") },
    facet: {
      fraudEnquiryPeriods: {
        buckets: [
          {
            _id: ISODate("2022-01-30T00:00:00.000Z"),
            count: Long("0")
          },
          {
            _id: ISODate("2022-01-30T06:00:00.000Z"),
            count: Long("3")
          },
          {
            _id: ISODate("2022-01-30T12:00:00.000Z"),
            count: Long("2")
          },
          {
            _id: ISODate("2022-01-30T18:00:00.000Z"),
            count: Long("1")
          }
        ]
      }
    }
  }
]

If you don't see any facet results and the value of count is zero, double-check that the system has finished generating your new index.
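To sanity-check the bucket counts without a search index, the following minimal sketch reproduces the bucketing in plain JavaScript. It assumes each bucket includes its lower boundary and excludes its upper boundary, and it takes as input the timestamps of the six sample enquiries on 30 January 2022 whose summary mentions fraud. The countPerBucket helper is hypothetical and only illustrates the arithmetic; it says nothing about how Atlas Search computes facets internally.

```javascript
// Timestamps of the six sample enquiries on 2022-01-30 that mention fraud
const fraudCallTimes = [
  "2022-01-30T09:32:07Z",
  "2022-01-30T10:25:37Z",
  "2022-01-30T10:56:53Z",
  "2022-01-30T14:04:48Z",
  "2022-01-30T17:49:32Z",
  "2022-01-30T22:49:44Z",
].map((s) => new Date(s));

// The same boundaries as used in the pipeline's facet definition
const facetBoundaries = [
  "2022-01-30T00:00:00.000Z",
  "2022-01-30T06:00:00.000Z",
  "2022-01-30T12:00:00.000Z",
  "2022-01-30T18:00:00.000Z",
  "2022-01-31T00:00:00.000Z",
].map((s) => new Date(s));

// Hypothetical helper: counts timestamps per [lower, upper) bucket,
// identifying each bucket by its lower boundary (as the facet results do)
function countPerBucket(times, bounds) {
  const buckets = bounds.slice(0, -1).map((lower) => ({ _id: lower, count: 0 }));

  for (const t of times) {
    for (let i = 0; i < buckets.length; i++) {
      if (t >= bounds[i] && t < bounds[i + 1]) {
        buckets[i].count++;
        break;
      }
    }
  }

  return buckets;
}

const bucketCounts = countPerBucket(fraudCallTimes, facetBoundaries).map((b) => b.count);
console.log(bucketCounts); // [ 0, 3, 2, 1 ]
```

The four counts line up with the count values in the expected results, and their sum equals the lowerBound of 6.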

Observations