Create Selective Indexes to Answer Queries Efficiently

Selectivity is a property of a query: the ratio of documents that match the query to the total number of documents in the collection. The selectivity of an index describes how many documents each distinct index key matches. A query or index has high selectivity when proportionally few documents match the query or a given index key.
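As a rough illustration of the ratio described above, the following plain JavaScript sketch computes the fraction of documents that match a predicate over an in-memory array. The helper name matchFraction and the sample data are assumptions made for this example; nothing here calls MongoDB.

```javascript
// Fraction of documents in `docs` that satisfy `predicate`.
// A smaller fraction means higher selectivity.
function matchFraction(docs, predicate) {
  if (docs.length === 0) return 0;
  return docs.filter(predicate).length / docs.length;
}

// Hypothetical sample data: 1 of 4 documents has status "new".
const sample = [
  { status: "processed" },
  { status: "processed" },
  { status: "processed" },
  { status: "new" },
];

const fractionNew = matchFraction(sample, (d) => d.status === "new");
const fractionProcessed = matchFraction(sample, (d) => d.status === "processed");

console.log(fractionNew, fractionProcessed); // 0.25 0.75
```

Here the query on status "new" is the more selective of the two because it matches the smaller share of the collection.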

Because indexes can have different selectivities depending on the index keys used, make sure that the most selective indexes are available for the predicates contained in a query. For the most efficient query execution, create indexes that match those predicates as uniquely as possible.

Examples

Selectivity with Many Common Values

Consider a collection of documents that have the following form:

{
status: "processed",
product_type: "electronics"
}

In this example, the status of 99% of documents in the collection is processed. If you add an index on status and query for documents with the status of processed, both the index and the query have low selectivity. However, if you query for documents that do not have the status of processed, the index and the query have high selectivity because the query returns only 1% of the documents in the collection.
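The 99% case can be sketched in plain JavaScript by counting how many documents each distinct status value matches. The helper docsPerKey, the collection size of 100, and the "pending" value used for the remaining 1% are assumptions for illustration; the source only says that 1% of documents are not processed.

```javascript
// Count how many documents each distinct value of `field` matches.
function docsPerKey(docs, field) {
  const counts = new Map();
  for (const doc of docs) {
    counts.set(doc[field], (counts.get(doc[field]) || 0) + 1);
  }
  return counts;
}

// 100 hypothetical documents: 99 "processed", 1 "pending" (assumed value).
const orders = Array.from({ length: 100 }, (_, i) => ({
  status: i < 99 ? "processed" : "pending",
  product_type: "electronics",
}));

const perKey = docsPerKey(orders, "status");
const notProcessed = orders.filter((d) => d.status !== "processed").length;

console.log(perKey.get("processed")); // 99
console.log(perKey.get("pending")); // 1
console.log(notProcessed / orders.length); // 0.01
```

The "processed" key matches almost the whole collection, so an index lookup on it saves little work, while the lookup for everything else touches a single document.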

Selectivity When Values are Distributed

Consider a collection of documents where the status field has three values distributed across the collection:

[
{ _id: ObjectId(), status: "processed", product_type: "electronics" },
{ _id: ObjectId(), status: "processed", product_type: "grocery" },
{ _id: ObjectId(), status: "processed", product_type: "household" },
{ _id: ObjectId(), status: "pending", product_type: "electronics" },
{ _id: ObjectId(), status: "pending", product_type: "grocery" },
{ _id: ObjectId(), status: "pending", product_type: "household" },
{ _id: ObjectId(), status: "new", product_type: "electronics" },
{ _id: ObjectId(), status: "new", product_type: "grocery" },
{ _id: ObjectId(), status: "new", product_type: "household" }
]

If you add an index on status and query for { "status": "pending", "product_type": "electronics" }, MongoDB must read three index keys, retrieve three documents matching that status, and filter those documents further on product_type to return the one matching document. Similarly, a query for { "status": { $in: ["processed", "pending"] }, "product_type": "electronics" } must read six documents to return the two matching documents.
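The counts above can be checked with a small in-memory simulation. This is only a sketch of the reasoning, not how MongoDB executes queries: the function queryViaStatusIndex is a hypothetical stand-in that treats an index on status as "fetch every document with a wanted status, then filter on product_type".

```javascript
// Rebuild the nine-document collection from the example (without _id).
const statuses = ["processed", "pending", "new"];
const types = ["electronics", "grocery", "household"];
const docs = statuses.flatMap((status) =>
  types.map((product_type) => ({ status, product_type }))
);

// Simulate a single-field index on status: only documents whose status is in
// `wanted` are fetched, then product_type is checked on each fetched document.
function queryViaStatusIndex(docs, wanted, productType) {
  const fetched = docs.filter((d) => wanted.includes(d.status));
  const returned = fetched.filter((d) => d.product_type === productType);
  return { examined: fetched.length, returned: returned.length };
}

const equality = queryViaStatusIndex(docs, ["pending"], "electronics");
const inQuery = queryViaStatusIndex(docs, ["processed", "pending"], "electronics");

console.log(equality); // { examined: 3, returned: 1 }
console.log(inQuery); // { examined: 6, returned: 2 }
```

In both cases, two thirds of the fetched documents are discarded by the product_type filter, which is the cost of the index's low selectivity.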

Consider the same index on a collection where status has nine values distributed across the collection:

[
{ _id: ObjectId(), status: 1, product_type: "electronics" },
{ _id: ObjectId(), status: 2, product_type: "grocery" },
{ _id: ObjectId(), status: 3, product_type: "household" },
{ _id: ObjectId(), status: 4, product_type: "electronics" },
{ _id: ObjectId(), status: 5, product_type: "grocery" },
{ _id: ObjectId(), status: 6, product_type: "household" },
{ _id: ObjectId(), status: 7, product_type: "electronics" },
{ _id: ObjectId(), status: 8, product_type: "grocery" },
{ _id: ObjectId(), status: 9, product_type: "household" }
]

If you query for { "status": 2, "product_type": "grocery" }, MongoDB reads only one document matching the index key, which indicates that the index is highly selective. With this index, the query is answered more efficiently because MongoDB only needs to further filter one document matching the index value. In this case, the filter also matches, and the query returns one document.

Although this example's equality query on status is more selective, a query such as { "status": { $gt: 5 }, "product_type": "grocery" } still needs to read four documents if you use the same index on status. However, if you create a compound index on product_type and status, MongoDB can answer the query { "status": { $gt: 5 }, "product_type": "grocery" } more efficiently via the compound index, because the index narrows the scan to the one matching document.
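The difference between the two indexes can be sketched with another in-memory simulation. The functions viaStatusIndex and viaCompoundIndex are hypothetical models of which documents each index lets the query reach; they are not MongoDB APIs, and the collection name products in the comment is an assumption.

```javascript
// Rebuild the nine-status collection from the example (without _id).
const productTypes = ["electronics", "grocery", "household"];
const items = Array.from({ length: 9 }, (_, i) => ({
  status: i + 1,
  product_type: productTypes[i % 3],
}));

// Index on status only: the scan is bounded by status > minStatus,
// and product_type is filtered after the documents are fetched.
function viaStatusIndex(docs, minStatus, productType) {
  const fetched = docs.filter((d) => d.status > minStatus);
  const returned = fetched.filter((d) => d.product_type === productType);
  return { examined: fetched.length, returned: returned.length };
}

// Compound index on (product_type, status): both predicates bound the scan.
// In mongosh this index could be created with, for a hypothetical collection:
//   db.products.createIndex({ product_type: 1, status: 1 })
function viaCompoundIndex(docs, minStatus, productType) {
  const fetched = docs.filter(
    (d) => d.product_type === productType && d.status > minStatus
  );
  return { examined: fetched.length, returned: fetched.length };
}

const single = viaStatusIndex(items, 5, "grocery");
const compound = viaCompoundIndex(items, 5, "grocery");

console.log(single); // { examined: 4, returned: 1 }
console.log(compound); // { examined: 1, returned: 1 }
```

Both plans return the same document (status 8, grocery); the compound index reaches it without fetching the three documents that the status-only plan has to discard.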

To improve query performance, you can create a compound index that narrows the documents that queries read. For example, if you want to improve performance for queries on status and product_type, you could create a compound index on those two fields.

If MongoDB must read a relatively large number of documents to return results, some queries may perform faster without indexes. To determine performance, see Measure Index Use.
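One way to judge how many documents a query reads relative to what it returns is to compare the totalDocsExamined and nReturned fields in the executionStats section of explain output. The sketch below is a hypothetical helper, not part of MongoDB; the sample object is hand-written for illustration and is not real server output, and the collection name products in the comment is an assumption.

```javascript
// Ratio of documents examined to documents returned, taken from an
// explain("executionStats") result. Values near 1 suggest a selective plan.
// In mongosh, such a result could come from, for a hypothetical collection:
//   db.products.find({ status: { $gt: 5 }, product_type: "grocery" })
//     .explain("executionStats")
function examinedPerReturned(explainOutput) {
  const { totalDocsExamined, nReturned } = explainOutput.executionStats;
  if (nReturned === 0) return totalDocsExamined === 0 ? 0 : Infinity;
  return totalDocsExamined / nReturned;
}

// Hand-written stand-in for explain output (illustrative values only).
const sampleExplain = {
  executionStats: { nReturned: 1, totalKeysExamined: 4, totalDocsExamined: 4 },
};

const ratio = examinedPerReturned(sampleExplain);
console.log(ratio); // 4
```

A ratio well above 1 means most fetched documents were discarded, which may point to a more selective index or, for queries that match most of the collection, to a plan that does not use the index at all.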