New in version 5.3.
Clustered collections store documents in index order rather than the natural order typical of traditional collections. The documents are kept in one WiredTiger file ordered according to the index specification, so the default _id index does not require a separate index file.
Storing the collection's documents in index order can provide benefits for storage and performance compared to traditional collections and their related regular indexes.
Clustered collections are created with a clustered index. The clustered index specifies the order in which documents are stored.
To create a clustered collection, see Examples.
Important
Backward-Incompatible Feature
You must drop clustered collections before you can downgrade to a version of MongoDB earlier than 5.3.
Benefits
Clustered collections have the following benefits compared to non-clustered collections:
- Faster queries on clustered collections without needing a secondary index, such as queries with range scans and equality comparisons on the clustered index key.
- Clustered collections have a lower storage size, which improves performance for queries and bulk inserts.
- Clustered collections can eliminate the need for a secondary TTL (Time To Live) index.
  - A clustered index is also a TTL index if you specify the expireAfterSeconds field.
  - To be used as a TTL index, the _id field must be a supported date type. See TTL Indexes.
  - If you use a clustered index as a TTL index, it improves document delete performance and reduces the clustered collection storage size.
- Clustered collections have additional performance improvements for inserts, updates, deletes, and queries.
  - All collections have an _id index.
  - A non-clustered collection stores the _id index separately from the documents. This requires two writes for inserts, updates, and deletes, and two reads for queries.
  - A clustered collection stores the index and the documents together in _id value order. This requires one write for inserts, updates, and deletes, and one read for queries.
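The TTL behavior described above can be sketched as follows. This is a minimal illustration, not a definitive setup: the collection name sessions, the index name, and the one-hour lifetime are assumptions chosen for the example. The db calls run only inside mongosh, where db is defined.

```javascript
// Build createCollection options for a clustered collection whose
// clustered index also acts as a TTL index.
// The index name and lifetime passed in below are illustrative assumptions.
function clusteredTtlOptions(indexName, expireAfterSeconds) {
  return {
    clusteredIndex: { key: { _id: 1 }, unique: true, name: indexName },
    // Documents expire this many seconds after the date stored in _id.
    expireAfterSeconds: expireAfterSeconds
  };
}

const sessionOptions = clusteredTtlOptions("sessions clustered key", 3600);

// `db` exists only inside mongosh; the guard lets the snippet load elsewhere.
if (typeof db !== "undefined") {
  db.createCollection("sessions", sessionOptions);
  // _id must hold a date value for TTL expiry to apply.
  db.sessions.insertOne({ _id: new Date(), user: "example" });
}
```

Because the expiry is driven by the clustered index itself, no secondary TTL index on a separate date field is needed in this sketch.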
Behavior
Clustered collections store documents ordered by the clustered index key value. The clustered index key must be { _id: 1 }.
You can only have one clustered index in a collection because the documents can be stored in only one order. Only collections with a clustered index store the data in sorted order.
You can have a clustered index and add secondary indexes to a clustered collection. Clustered indexes differ from secondary indexes:
- A clustered index can only be created when you create the collection.
- The clustered index keys are stored with the collection. The collection size returned by the collStats command includes the clustered index size.
Starting in MongoDB 6.0.7, if a usable clustered index exists, the MongoDB query planner evaluates the clustered index against secondary indexes in the query planning process. When a query uses a clustered index, MongoDB performs a bounded collection scan.
Prior to MongoDB 6.0.7, if a secondary index existed on a clustered collection and the secondary index was usable by your query, the query planner selected the secondary index instead of the clustered index by default. In MongoDB 6.1 and prior, to use the clustered index, you must provide a hint because the query optimizer does not automatically select the clustered index.
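As a rough sketch of providing a hint, the snippet below asks the planner to use the clustered index for a range query and prints the winning plan. It assumes the orders collection from the Date Clustered Index Key Example, and it assumes that hinting by the key pattern { _id: 1 } selects the clustered index on your server version; check the explain output to confirm.

```javascript
// Range filter on the clustered index key, plus the hint document.
const rangeFilter = { _id: { $gt: new Date("2022-03-18T12:47:00Z") } };
const clusteredHint = { _id: 1 }; // assumption: key-pattern hint picks the clustered index

// `db` and `printjson` exist only inside mongosh.
if (typeof db !== "undefined") {
  const plan = db.orders
    .find(rangeFilter)
    .hint(clusteredHint)
    .explain("queryPlanner");
  // Inspect the winning plan to see whether a bounded scan was chosen.
  printjson(plan.queryPlanner.winningPlan);
}
```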
Index Size
In clustered collections with only a default index on the _id field (no secondary indexes), the index size appears as zero because the collection does not require a separate index file.
Limitations
Clustered collection limitations:
- The clustered index key must be { _id: 1 }.
- You cannot transform a non-clustered collection to a clustered collection, or the reverse. Instead, you can:
  - Read documents from one collection and write them to another collection using an aggregation pipeline with an $out stage or a $merge stage.
  - Export collection data with mongodump and import the data into another collection with mongorestore.
- You cannot hide a clustered index. See Hidden indexes.
- If there are secondary indexes for the clustered collection, the collection has a larger storage size. This is because secondary indexes on a clustered collection with large clustered index keys may have a larger storage size than secondary indexes on a non-clustered collection.
- Clustered collections may not be capped collections.
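The aggregation-based workaround can be sketched like this. The collection names stocks_plain and stocks_clustered are hypothetical, and the sketch assumes the target clustered collection is created first so that $merge writes into it rather than creating a regular collection.

```javascript
// Pipeline that copies every document into an existing target collection.
function copyPipeline(targetCollection) {
  return [
    {
      $merge: {
        into: targetCollection,
        whenMatched: "replace",   // overwrite documents with the same _id
        whenNotMatched: "insert"  // add documents that are not yet present
      }
    }
  ];
}

const pipeline = copyPipeline("stocks_clustered");

// `db` exists only inside mongosh; names below are hypothetical.
if (typeof db !== "undefined") {
  db.createCollection("stocks_clustered", {
    clusteredIndex: { key: { _id: 1 }, unique: true }
  });
  // toArray() forces the pipeline to run; a $merge pipeline returns no documents.
  db.stocks_plain.aggregate(pipeline).toArray();
}
```

After verifying the copy, the original collection can be dropped and the new one renamed, if the application expects the old name.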
Set Your Own Clustered Index Key Values
By default, the clustered index key values are the unique document object identifiers.
You can set your own clustered index key values. Your key values must follow the standard constraints of the _id field.
Additionally, use the following practices to optimize performance:
- Use sequentially increasing key values to improve insert performance.
- Set your index keys to be as small in size as possible. A clustered index supports keys up to 8 MB in size, but a much smaller clustered index key is best. Large keys increase the storage size of the clustered collection and its secondary indexes, which decreases clustered collection performance.
Warning
Randomly generated key values may decrease a clustered collection's performance.
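One way to obtain small, sequentially increasing keys is to derive them from the clock and never let them move backwards. The helper below is a hypothetical illustration (makeSequentialDateKeys is not a MongoDB API); it yields strictly increasing Date values even when several documents are created within the same millisecond.

```javascript
// Returns a function that yields strictly increasing Date keys.
function makeSequentialDateKeys(startMillis) {
  let last = startMillis - 1;
  return function nextKey(nowMillis) {
    // Never repeat or go backwards: bump by 1 ms if the clock has not advanced.
    last = Math.max(nowMillis, last + 1);
    return new Date(last);
  };
}

// Three documents created at the same instant still get distinct, ordered keys.
const baseMillis = Date.UTC(2022, 2, 19, 9, 0, 0);
const nextKey = makeSequentialDateKeys(baseMillis);
const docs = [50, 5, 1].map((quantity) => ({
  _id: nextKey(baseMillis),
  quantity: quantity
}));
```

The resulting docs array could then be passed to insertMany on a clustered collection. A single in-process counter like this does not coordinate across multiple application instances; that would need a different scheme.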
Examples
This section shows clustered collection examples.
Create Example
The following create example adds a clustered collection named products:
db.runCommand( {
create: "products",
clusteredIndex: { "key": { _id: 1 }, "unique": true, "name": "products clustered key" }
} )
In the example, clusteredIndex specifies:
- "key": { _id: 1 }, which sets the clustered index key to the _id field.
- "unique": true, which indicates the clustered index key value must be unique.
- "name": "products clustered key", which sets the clustered index name.
db.createCollection Example
The following db.createCollection() example adds a clustered collection named stocks:
db.createCollection(
"stocks",
{ clusteredIndex: { "key": { _id: 1 }, "unique": true, "name": "stocks clustered key" } }
)
In the example, clusteredIndex specifies:
- "key": { _id: 1 }, which sets the clustered index key to the _id field.
- "unique": true, which indicates the clustered index key value must be unique.
- "name": "stocks clustered key", which sets the clustered index name.
Date Clustered Index Key Example
The following db.createCollection() example adds a clustered collection named orders:
db.createCollection(
"orders",
{ clusteredIndex: { "key": { _id: 1 }, "unique": true, "name": "orders clustered key" } }
)
In the example, clusteredIndex specifies:
- "key": { _id: 1 }, which sets the clustered index key to the _id field.
- "unique": true, which indicates the clustered index key value must be unique.
- "name": "orders clustered key", which sets the clustered index name.
The following example adds documents to the orders collection:
db.orders.insertMany( [
{ _id: ISODate( "2022-03-18T12:45:20Z" ), "quantity": 50, "totalOrderPrice": 500 },
{ _id: ISODate( "2022-03-18T12:47:00Z" ), "quantity": 5, "totalOrderPrice": 50 },
{ _id: ISODate( "2022-03-18T12:50:00Z" ), "quantity": 1, "totalOrderPrice": 10 }
] )
The _id clusteredIndex key stores the order date.
If you use the _id field in a range query, performance is improved. For example, the following query uses _id and $gt to return the orders where the order date is later than the supplied date:
db.orders.find( { _id: { $gt: ISODate( "2022-03-18T12:47:00.000Z" ) } } )
Example output:
[
{
_id: ISODate( "2022-03-18T12:50:00.000Z" ),
quantity: 1,
totalOrderPrice: 10
}
]
Determine if a Collection is Clustered
To determine if a collection is clustered, use the listCollections command:
db.runCommand( { listCollections: 1 } )
For clustered collections, you will see the clusteredIndex details in the output. For example, the following output shows the details for the orders clustered collection:
...
name: 'orders',
type: 'collection',
options: {
clusteredIndex: {
v: 2,
key: { _id: 1 },
name: 'orders clustered key',
unique: true
}
},
...
v is the index version.
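The check can also be scripted. The helper below is a hypothetical sketch that picks out clustered collections from a listCollections reply by looking for options.clusteredIndex. The sample reply is abbreviated and hand-written for illustration, and the sketch only inspects the first batch of results, which may not cover every collection in a large database.

```javascript
// Return the names of collections whose options include a clusteredIndex.
function clusteredCollectionNames(listCollectionsReply) {
  return listCollectionsReply.cursor.firstBatch
    .filter((info) => info.options && info.options.clusteredIndex)
    .map((info) => info.name);
}

// Abbreviated, hand-written reply used only to exercise the helper.
const sampleReply = {
  cursor: {
    firstBatch: [
      {
        name: "orders",
        type: "collection",
        options: {
          clusteredIndex: { v: 2, key: { _id: 1 }, name: "orders clustered key", unique: true }
        }
      },
      { name: "plain", type: "collection", options: {} }
    ]
  },
  ok: 1
};

const clusteredNames = clusteredCollectionNames(sampleReply);

// Inside mongosh, run the same helper against the live command reply.
if (typeof db !== "undefined") {
  printjson(clusteredCollectionNames(db.runCommand({ listCollections: 1 })));
}
```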