Selectivity is a query property that describes the ratio of documents matching the query to the total number of documents in a collection. The selectivity of an index describes how many documents a unique index key matches. A query or index has high selectivity when proportionally few documents match the query or a given index key.
Because indexes can have different selectivities depending on the index keys used, ensure that the most selective indexes are available for the predicates contained in a query. For the most efficient query execution, create indexes that most uniquely match those predicates.
Examples
Selectivity with Many Common Values
Consider a collection of documents that have the following form:
{
status: "processed",
product_type: "electronics"
}
In this example, the status of 99% of documents in the collection is processed. If you add an index on status and query for documents with the status of processed, both the index and the query have low selectivity. However, if you query for documents that do not have the status of processed, the index and the query have high selectivity because the query returns only 1% of the documents in the collection.
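As a rough illustration of the 99% case, the following plain JavaScript sketch builds a hypothetical in-memory array of 1,000 documents and computes the fraction that each predicate matches. The `matchRatio` helper, the sample size, and the "pending" value used for the remaining 1% are assumptions made for illustration; none of this is a MongoDB API.

```javascript
// Hypothetical sample: 1,000 documents, 99% of which have status "processed".
const sampleDocs = Array.from({ length: 1000 }, (_, i) => ({
  status: i < 990 ? "processed" : "pending",
  product_type: "electronics",
}));

// Fraction of documents that satisfy a predicate (lower means more selective).
function matchRatio(documents, predicate) {
  return documents.filter(predicate).length / documents.length;
}

const processedRatio = matchRatio(sampleDocs, (d) => d.status === "processed");
const notProcessedRatio = matchRatio(sampleDocs, (d) => d.status !== "processed");

console.log(processedRatio);    // 0.99
console.log(notProcessedRatio); // 0.01
```

A ratio near 1 means the predicate barely narrows the collection, which is the low-selectivity case described above.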
Selectivity When Values are Distributed
Consider a collection of documents where the status field has three values distributed across the collection:
[
{ _id: ObjectId(), status: "processed", product_type: "electronics" },
{ _id: ObjectId(), status: "processed", product_type: "grocery" },
{ _id: ObjectId(), status: "processed", product_type: "household" },
{ _id: ObjectId(), status: "pending", product_type: "electronics" },
{ _id: ObjectId(), status: "pending", product_type: "grocery" },
{ _id: ObjectId(), status: "pending", product_type: "household" },
{ _id: ObjectId(), status: "new", product_type: "electronics" },
{ _id: ObjectId(), status: "new", product_type: "grocery" },
{ _id: ObjectId(), status: "new", product_type: "household" }
]
If you add an index on status and query for { "status": "pending", "product_type": "electronics" }, MongoDB must read three index keys, retrieve three documents matching that status, and filter those documents further on product_type to return the one matching document. Similarly, a query for { "status": { $in: ["processed", "pending"] }, "product_type": "electronics" } must read six documents to return the two matching documents.
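The document counts above can be reproduced with a small in-memory model. This is only a sketch: the `Map` named `statusIndex` is a hypothetical stand-in for an index on status, and `findViaStatusIndex` is an illustrative helper that counts fetched documents; it does not reflect how MongoDB stores or traverses indexes.

```javascript
// The nine sample documents from the example above (without _id).
const statuses = ["processed", "pending", "new"];
const productTypes = ["electronics", "grocery", "household"];
const collection = statuses.flatMap((status) =>
  productTypes.map((product_type) => ({ status, product_type }))
);

// Hypothetical stand-in for an index on status: maps each key to its documents.
const statusIndex = new Map();
for (const doc of collection) {
  if (!statusIndex.has(doc.status)) statusIndex.set(doc.status, []);
  statusIndex.get(doc.status).push(doc);
}

// Fetch every document for the requested status keys, then filter on product_type.
function findViaStatusIndex(statusValues, productType) {
  const examined = statusValues.flatMap((s) => statusIndex.get(s) ?? []);
  const returned = examined.filter((d) => d.product_type === productType);
  return { examined: examined.length, returned: returned.length };
}

console.log(findViaStatusIndex(["pending"], "electronics"));
// { examined: 3, returned: 1 }
console.log(findViaStatusIndex(["processed", "pending"], "electronics"));
// { examined: 6, returned: 2 }
```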
Consider the same index on a collection where status has nine values distributed across the collection:
[
{ _id: ObjectId(), status: 1, product_type: "electronics" },
{ _id: ObjectId(), status: 2, product_type: "grocery" },
{ _id: ObjectId(), status: 3, product_type: "household" },
{ _id: ObjectId(), status: 4, product_type: "electronics" },
{ _id: ObjectId(), status: 5, product_type: "grocery" },
{ _id: ObjectId(), status: 6, product_type: "household" },
{ _id: ObjectId(), status: 7, product_type: "electronics" },
{ _id: ObjectId(), status: 8, product_type: "grocery" },
{ _id: ObjectId(), status: 9, product_type: "household" }
]
If you query for { "status": 2, "product_type": "grocery" }, MongoDB reads only one document matching the index key, indicating the index is highly selective. With this index, the query is answered more efficiently, since MongoDB needs to further filter only one document matching the index value. In this case, the filter also matches, and the query returns one document.
Although this example's query on status equality is more selective, a query such as { "status": { $gt: 5 }, "product_type": "grocery" } still needs to read four documents if you use the same index on status. However, if you create a compound index on product_type and status, MongoDB can answer the same query more efficiently via the compound index, as the query returns only one matching document.
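To make the difference concrete, the sketch below counts how many documents each access path would fetch for the range query on the nine-value collection. Both functions are hypothetical models written for this illustration: they assume an index on status lets MongoDB select documents by status alone, and that a compound index on product_type and status lets it resolve both predicates from the index keys before fetching.

```javascript
// The nine sample documents from the example above: status 1..9,
// product types cycling through electronics, grocery, household.
const types = ["electronics", "grocery", "household"];
const nineDocs = Array.from({ length: 9 }, (_, i) => ({
  status: i + 1,
  product_type: types[i % 3],
}));

// Model of an index on status only: documents are selected by status,
// and product_type is checked after each document is fetched.
function scanStatusIndex(minStatus, productType) {
  const examined = nineDocs.filter((d) => d.status > minStatus);
  const returned = examined.filter((d) => d.product_type === productType);
  return { examined: examined.length, returned: returned.length };
}

// Model of a compound index on product_type and status: both predicates
// are resolved from the index keys, so only matching documents are fetched.
function scanCompoundIndex(minStatus, productType) {
  const examined = nineDocs.filter(
    (d) => d.product_type === productType && d.status > minStatus
  );
  return { examined: examined.length, returned: examined.length };
}

console.log(scanStatusIndex(5, "grocery"));   // { examined: 4, returned: 1 }
console.log(scanCompoundIndex(5, "grocery")); // { examined: 1, returned: 1 }
```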
To improve query performance, you can create a compound index that narrows the documents that queries read. For example, if you want to improve performance for queries on status and product_type, you could create a compound index on those two fields.
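A minimal sketch of creating such an index follows. It relies on the `createIndex()` collection method; the wrapper function, the collection name `orders`, and the choice of key order (product_type first, then status) and ascending direction are assumptions for illustration and should be adapted to your own queries.

```javascript
// Key specification for the compound index: product_type, then status.
// The field order and ascending (1) direction are illustrative assumptions.
const compoundIndexKeys = { product_type: 1, status: 1 };

// `collection` is any handle that exposes createIndex(), for example
// db.orders in mongosh. The name "orders" is hypothetical.
function createCompoundIndex(collection) {
  return collection.createIndex(compoundIndexKeys);
}

// In mongosh, this could be invoked as:
//   createCompoundIndex(db.orders)
```

Passing the collection as a parameter keeps the sketch independent of any particular database connection.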
If MongoDB reads a relatively large number of documents to return results, some queries may perform faster without indexes. To determine performance, see Measure Index Use.
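One way to see how many documents a query reads is to inspect the output of `explain("executionStats")`. The sketch below is a hypothetical helper that summarizes the `nReturned`, `totalKeysExamined`, and `totalDocsExamined` fields of the `executionStats` section; the sample object is hand-written to show the shape and is not real server output.

```javascript
// Summarize the executionStats section of an explain() result.
// The helper and its output shape are illustrative, not a MongoDB API.
function summarizeExecution(explainOutput) {
  const { nReturned, totalKeysExamined, totalDocsExamined } =
    explainOutput.executionStats;
  return {
    nReturned,
    totalKeysExamined,
    totalDocsExamined,
    // Documents examined per document returned; values near 1 suggest
    // the plan reads little beyond what the query actually returns.
    docsExaminedPerReturned:
      nReturned === 0 ? null : totalDocsExamined / nReturned,
  };
}

// Hand-written sample shaped like explain output, for illustration only.
const sampleExplain = {
  executionStats: { nReturned: 1, totalKeysExamined: 4, totalDocsExamined: 4 },
};

console.log(summarizeExecution(sampleExplain).docsExaminedPerReturned); // 4
```

In mongosh, the helper could be fed with something like `db.orders.find({ status: { $gt: 5 }, product_type: "grocery" }).explain("executionStats")`, where `orders` is a hypothetical collection name.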