Clustered Collections

New in version 5.3.

Overview

Starting in MongoDB 5.3, you can create a collection with a clustered index. Collections created with a clustered index are called clustered collections.

Benefits

Because clustered collections store documents ordered by the clustered index key value, they have the following benefits compared to non-clustered collections:

  • Queries on the clustered index key, such as range scans and equality comparisons, are faster and do not need a secondary index.
  • Clustered collections have a lower storage size, which improves performance for queries and bulk inserts.
  • Clustered collections can eliminate the need for a secondary TTL (Time To Live) index.

    • A clustered index is also a TTL index if you specify the expireAfterSeconds field.
    • To be used as a TTL index, the _id field must be a supported date type. See TTL Indexes.
    • If you use a clustered index as a TTL index, it improves document delete performance and reduces the clustered collection storage size.
  • Clustered collections have additional performance improvements for inserts, updates, deletes, and queries.

    • All collections have an _id index.
    • A non-clustered collection stores the _id index separately from the documents. This requires two writes for inserts, updates, and deletes, and two reads for queries.
    • A clustered collection stores the index and the documents together in _id value order. This requires one write for inserts, updates, and deletes, and one read for queries.
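
As a rough illustration of the TTL point above, the following sketch builds the options for a clustered collection whose documents expire one hour after the date stored in _id. The collection name events, the helper clusteredTtlOptions, and the one-hour value are assumptions made for this example; it also assumes expireAfterSeconds is passed next to clusteredIndex when the collection is created.

```javascript
// Hypothetical helper: builds createCollection options for a clustered
// collection that also acts as a TTL index on _id.
function clusteredTtlOptions(indexName, expireAfterSeconds) {
  return {
    clusteredIndex: { key: { _id: 1 }, unique: true, name: indexName },
    // Documents expire this many seconds after the date stored in _id.
    expireAfterSeconds: expireAfterSeconds
  };
}

// Assumed values: an "events" collection with a one-hour lifetime.
const eventsOptions = clusteredTtlOptions("events clustered key", 3600);

// In mongosh, `db` is defined; the guard lets the sketch run elsewhere too.
if (typeof db !== "undefined") {
  db.createCollection("events", eventsOptions);
  // _id must hold a date for the TTL behavior to apply.
  db.events.insertOne({ _id: new Date(), message: "example event" });
}
```

Because the clustered index doubles as the TTL index, this sketch does not create a separate secondary TTL index.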

Behavior

Clustered collections store documents ordered by the clustered index key value.

You can only have one clustered index in a collection because the documents can be stored in only one order. Only collections with a clustered index store the data in sorted order.

You can add secondary indexes to a clustered collection in addition to its clustered index. Clustered indexes differ from secondary indexes:

  • A clustered index can only be created when you create the collection.
  • The clustered index keys are stored with the collection. The collection size returned by the collStats command includes the clustered index size.

Important

Backward-Incompatible Feature

You must drop clustered collections before you can downgrade to a version of MongoDB earlier than 5.3.

Limitations

Clustered collection limitations:

  • You cannot transform a non-clustered collection to a clustered collection, or the reverse. Instead, you can:

    • Read documents from one collection and write them to another collection using an aggregation pipeline with an $out stage or a $merge stage.
    • Export collection data with mongodump and import the data into another collection with mongorestore.
  • By default, if a secondary index exists on a clustered collection and the secondary index is usable by your query, the secondary index is selected instead of the clustered index.

    • In that case, the query optimizer does not automatically select the clustered index. To use the clustered index, you must provide a hint.
    • When a query uses a clustered index, it performs a bounded collection scan.
  • The clustered index key must be on the _id field.
  • You cannot hide a clustered index. See Hidden indexes.
  • If a clustered collection has secondary indexes, the collection can have a larger storage size. This is because secondary indexes on a clustered collection with large clustered index keys may have a larger storage size than secondary indexes on a non-clustered collection.
  • Clustered collections cannot be capped collections.
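
Two of the limitations above have workarounds that can be sketched in a few lines: copying documents into a new clustered collection with $merge, and hinting the clustered index. The collection names legacyOrders and ordersClustered, the index name, and the numeric _id filter are assumptions for illustration. The sketch also assumes the clustered index can be referenced by its key pattern in hint().

```javascript
// Assumed names: "legacyOrders" is an existing non-clustered collection,
// "ordersClustered" is the new clustered collection that receives the copy.
const targetOptions = {
  clusteredIndex: { key: { _id: 1 }, unique: true, name: "ordersClustered clustered key" }
};

// Pipeline that writes every source document into the target collection.
const copyPipeline = [ { $merge: { into: "ordersClustered" } } ];

// Hint document that refers to the clustered index by its key pattern.
const clusteredHint = { _id: 1 };

// In mongosh, `db` is defined; the guard lets the sketch run elsewhere too.
if (typeof db !== "undefined") {
  db.createCollection("ordersClustered", targetOptions);
  db.legacyOrders.aggregate(copyPipeline);
  // Ask the optimizer for the clustered index rather than a secondary index.
  printjson(
    db.ordersClustered.find({ _id: { $gt: 100 } }).hint(clusteredHint).toArray()
  );
}
```

Creating the target collection first matters in this sketch, because a clustered index can only be defined when the collection is created.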

Set Your Own Clustered Index Key Values

By default, the clustered index key values are the unique document object identifiers.

You can set your own clustered index key values. Your key:

  • Must contain unique values.
  • Must be immutable.
  • Should contain sequentially increasing values. This is not a requirement but improves insert performance.
  • Should be as small in size as possible.

    • A clustered index supports keys up to 8 MB in size, but a much smaller clustered index key is best.
    • A large clustered index key causes the clustered collection to increase in size and secondary indexes are also larger. This reduces the performance and storage benefits of the clustered collection.
    • Secondary indexes on clustered collections with large clustered index keys may use more space compared to secondary indexes on non-clustered collections.

Examples

This section shows clustered collection examples.

Create Example

The following create example adds a clustered collection named products:

db.runCommand( {
   create: "products",
   clusteredIndex: { "key": { _id: 1 }, "unique": true, "name": "products clustered key" }
} )

In the example, clusteredIndex specifies:

  • "key": { _id: 1 }, which sets the clustered index key to the _id field.
  • "unique": true, which indicates the clustered index key value must be unique.
  • "name": "products clustered key", which sets the clustered index name.

db.createCollection Example

The following db.createCollection() example adds a clustered collection named stocks:

db.createCollection(
   "stocks",
   { clusteredIndex: { "key": { _id: 1 }, "unique": true, "name": "stocks clustered key" } }
)

In the example, clusteredIndex specifies:

  • "key": { _id: 1 }, which sets the clustered index key to the _id field.
  • "unique": true, which indicates the clustered index key value must be unique.
  • "name": "stocks clustered key", which sets the clustered index name.

Date Clustered Index Key Example

The following db.createCollection() example adds a clustered collection named orders:

db.createCollection(
   "orders",
   { clusteredIndex: { "key": { _id: 1 }, "unique": true, "name": "orders clustered key" } }
)

In the example, clusteredIndex specifies:

  • "key": { _id: 1 }, which sets the clustered index key to the _id field.
  • "unique": true, which indicates the clustered index key value must be unique.
  • "name": "orders clustered key", which sets the clustered index name.

The following example adds documents to the orders collection:

db.orders.insertMany( [
   { _id: ISODate( "2022-03-18T12:45:20Z" ), "quantity": 50, "totalOrderPrice": 500 },
   { _id: ISODate( "2022-03-18T12:47:00Z" ), "quantity": 5, "totalOrderPrice": 50 },
   { _id: ISODate( "2022-03-18T12:50:00Z" ), "quantity": 1, "totalOrderPrice": 10 }
] )

The _id clusteredIndex key stores the order date.

If you use the _id field in a range query, performance is improved. For example, the following query uses _id and $gt to return the orders where the order date is later than the supplied date:

db.orders.find( { _id: { $gt: ISODate( "2022-03-18T12:47:00.000Z" ) } } )

Example output:

[
   {
      _id: ISODate( "2022-03-18T12:50:00.000Z" ),
      quantity: 1,
      totalOrderPrice: 10
   }
]
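
A range query can also bound the clustered key on both sides. The following sketch builds such a filter for the orders collection; the helper name orderDateRange and the chosen start and end times are assumptions for illustration, and new Date() is used in place of ISODate() so the helper also runs outside mongosh.

```javascript
// Hypothetical helper: builds a filter matching order dates in [start, end).
function orderDateRange(start, end) {
  return { _id: { $gte: start, $lt: end } };
}

// Assumed window: 12:45:00 (inclusive) to 12:50:00 (exclusive) on 2022-03-18.
const rangeFilter = orderDateRange(
  new Date("2022-03-18T12:45:00Z"),
  new Date("2022-03-18T12:50:00Z")
);

// In mongosh, `db` is defined; the guard lets the sketch run elsewhere too.
if (typeof db !== "undefined") {
  printjson(db.orders.find(rangeFilter).toArray());
}
```

With the three sample documents inserted earlier, this filter matches the orders dated 12:45:20 and 12:47:00 and excludes the order dated 12:50:00.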

Determine if a Collection is Clustered

To determine if a collection is clustered, use the listCollections command:

db.runCommand( { listCollections: 1 } )

For clustered collections, the output includes the clusteredIndex details. For example, the following output shows the details for the orders clustered collection:

...
name: 'orders',
type: 'collection',
options: {
   clusteredIndex: {
      v: 2,
      key: { _id: 1 },
      name: 'orders clustered key',
      unique: true
   }
},
...

v is the index version.
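
To narrow the listing to clustered collections only, one option is to filter on the clusteredIndex option, as in the sketch below. The helper name isClustered is hypothetical; the filter relies on listCollections accepting a query expression on the fields it returns.

```javascript
// Command that lists only collections whose options include clusteredIndex.
const listClustered = {
  listCollections: 1,
  filter: { "options.clusteredIndex": { $exists: true } }
};

// Hypothetical helper: true if a listCollections entry describes a
// clustered collection.
function isClustered(info) {
  return Boolean(info.options && info.options.clusteredIndex);
}

// In mongosh, `db` is defined; the guard lets the sketch run elsewhere too.
if (typeof db !== "undefined") {
  const result = db.runCommand(listClustered);
  // Print the name of each clustered collection in the first batch.
  result.cursor.firstBatch.forEach(info => print(info.name));
}
```

The isClustered helper can be applied to individual entries when you run listCollections without a filter.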