When modeling application data for MongoDB, consider the operational factors that affect MongoDB's performance. For instance, different data models can allow for more efficient queries, increase the throughput of insert and update operations, or distribute activity across a sharded cluster more effectively.
When developing a data model, analyze all of your application's read and write operations in conjunction with the following considerations.
Atomicity
In MongoDB, a write operation is atomic on the level of a single document, even if the operation modifies multiple embedded documents within that document. When a single write operation modifies multiple documents (e.g. db.collection.updateMany()), the modification of each document is atomic, but the operation as a whole is not atomic.
Embedded Data Model
The embedded data model combines all related data in a single document instead of normalizing it across multiple documents and collections. Because the related data lives in one document, a single write operation can update all of it atomically.
See Model Data for Atomic Operations for an example data model that provides atomic updates for a single document.
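Here is a minimal sketch of such an atomic single-document update. It builds the filter and update documents for checking out one copy of a book whose checkout records are embedded in the book document. The books collection and the available and checkout fields are hypothetical. The code only constructs the documents. The mongosh call that would apply them appears in a comment.

```javascript
// Build the filter and update documents for an atomic "check out one copy"
// operation. Collection and field names (books, available, checkout) are hypothetical.
function buildCheckout(bookId, patron, when) {
  // Match the book only while at least one copy is available, so the
  // availability check and the write happen in one single-document operation.
  const filter = { _id: bookId, available: { $gt: 0 } };

  const update = {
    $inc: { available: -1 },                         // take one copy
    $push: { checkout: { by: patron, date: when } }, // record who took it
  };
  return { filter, update };
}

// Usage in mongosh:
//   const { filter, update } = buildCheckout(123456789, "abc", new Date());
//   db.books.updateOne(filter, update);
// Both modifiers target one document, so they are applied together or not at all;
// if no copy is available, the filter matches nothing and the document is unchanged.
```

Because the stock counter and the checkout history share one document, the application needs no multi-document transaction for this operation.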
Multi-Document Transaction
For data models that store references between related pieces of data, the application must issue separate read and write operations to retrieve and modify these related pieces of data.
For situations that require atomicity of reads and writes to multiple documents (in a single collection or in multiple collections), MongoDB supports distributed transactions, including transactions on replica sets and sharded clusters.
For more information, see Transactions.
Important
In most cases, a distributed transaction incurs a greater performance cost than single-document writes, and the availability of distributed transactions should not be a replacement for effective schema design. For many scenarios, the denormalized data model (embedded documents and arrays) will continue to be optimal for your data and use cases. That is, for many scenarios, modeling your data appropriately will minimize the need for distributed transactions.
For additional transaction usage considerations (such as the runtime limit and oplog size limit), see also Production Considerations.
Sharding
MongoDB uses sharding to provide horizontal scaling. Sharded clusters support deployments with large data sets and high-throughput operations. Sharding allows users to partition a collection within a database to distribute the collection's documents across a number of mongod instances or shards.
To distribute data and application traffic in a sharded collection, MongoDB uses the shard key. Selecting the proper shard key has significant implications for performance, and can enable or prevent query isolation and increased write capacity. While you can change your shard key later, it is important to carefully consider your shard key choice.
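"Query isolation" is the difference between targeted and broadcast queries. A query whose filter includes the shard key, or a prefix of a compound shard key, can be routed to only the shards that may hold matching documents. Other queries must be broadcast to every shard. The function below is a deliberately simplified model of that rule. It assumes a ranged (not hashed) shard key, inspects only top-level filter fields, and ignores operators such as $or. The shard key and filters are hypothetical.

```javascript
// Simplified model of whether a router can target a query, assuming a ranged
// shard key given as an index-style spec such as { customerId: 1, orderDate: 1 }.
function routingFor(filter, shardKey) {
  const [prefixField] = Object.keys(shardKey); // leading shard key field
  // If the filter constrains the leading field, the router can narrow the set
  // of shards; otherwise it must query every shard (scatter-gather).
  return Object.prototype.hasOwnProperty.call(filter, prefixField)
    ? "targeted"
    : "broadcast";
}

const shardKey = { customerId: 1, orderDate: 1 };
console.log(routingFor({ customerId: 42 }, shardKey));                   // targeted
console.log(routingFor({ orderDate: { $gte: new Date(0) } }, shardKey)); // broadcast
```

A shard key that matches the application's most common query predicates keeps those queries isolated to a few shards.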
See Sharding and Shard Keys for more information.
Indexes
Use indexes to improve performance for common queries. Build indexes on fields that appear often in queries and for all operations that return sorted results. MongoDB automatically creates a unique index on the _id field.
As you create indexes, consider the following behaviors of indexes:
Each index requires at least 8 kB of data space.
Adding an index has some negative performance impact for write operations. For collections with a high write-to-read ratio, indexes are expensive because each insert must also update every index.
Collections with a high read-to-write ratio often benefit from additional indexes. Indexes do not affect un-indexed read operations.
When active, each index consumes disk space and memory. This usage can be significant and should be tracked for capacity planning, especially where working set size is a concern.
See Indexing Strategies for more information on indexes as well as Interpret Explain Plan Results. Additionally, the MongoDB database profiler may help identify inefficient queries.
Large Number of Collections
In certain situations, you might choose to store related information in several collections rather than in a single collection.
Consider a sample collection logs that stores log documents for various environments and applications. The logs collection contains documents of the following form:
{ log: "dev", ts: ..., info: ... }
{ log: "debug", ts: ..., info: ...}
If the total number of documents is low, you may group documents into collections by type. For logs, consider maintaining distinct log collections, such as logs_dev and logs_debug. The logs_dev collection would contain only the documents related to the dev environment.
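The sketch below groups documents by type into per-type batches, using the logs_<type> naming from the text. The sample documents and their info values are invented for illustration. The mongosh write appears only as a comment, so the code runs without a server.

```javascript
// Group log documents by their `log` field into batches keyed by the
// target collection name (logs_dev, logs_debug, ...).
function partitionByLogType(docs) {
  const byCollection = new Map();
  for (const doc of docs) {
    const name = `logs_${doc.log}`;
    if (!byCollection.has(name)) byCollection.set(name, []);
    byCollection.get(name).push(doc);
  }
  return byCollection;
}

const batches = partitionByLogType([
  { log: "dev", ts: new Date(), info: "started" },
  { log: "debug", ts: new Date(), info: "cache miss" },
  { log: "dev", ts: new Date(), info: "stopped" },
]);

// In mongosh, each batch could then be written to its own collection:
//   for (const [name, docs] of batches) db.getCollection(name).insertMany(docs);
console.log([...batches.keys()]); // [ 'logs_dev', 'logs_debug' ]
```

A query that only needs dev logs then reads logs_dev directly instead of filtering a shared collection by type.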
Generally, having a large number of collections has no significant performance penalty and results in very good performance. Distinct collections are very important for high-throughput batch processing.
When using models that have a large number of collections, consider the following behaviors:
Each collection has a certain minimum overhead of a few kilobytes.
Each index, including the index on _id, requires at least 8 kB of data space.
For each database, a single namespace file (i.e. <database>.ns) stores all metadata for that database, and each index and collection has its own entry in the namespace file. This namespace file belongs to the legacy MMAPv1 storage engine, which was removed in MongoDB 4.2. See the namespace length limits for specific limitations.
Collection Contains Large Number of Small Documents
You should consider embedding for performance reasons if you have a collection with a large number of small documents. If you can group these small documents by some logical relationship and you frequently retrieve the documents by this grouping, you might consider "rolling up" the small documents into larger documents that contain an array of embedded documents.
"Rolling up" these small documents into logical groupings means that queries to retrieve a group of documents involve sequential reads and fewer random disk accesses. Additionally, "rolling up" documents and moving common fields to the larger document benefit the index on these fields. There would be fewer copies of the common fields and fewer associated key entries in the corresponding index. See Indexes for more information on indexes.
However, if you often only need to retrieve a subset of the documents within the group, then "rolling up" the documents may not provide better performance. Furthermore, if small, separate documents represent the natural model for the data, you should maintain that model.
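The following minimal sketch shows the roll-up transformation. Small documents that share a grouping field become one document per group, and the common field is stored once on the parent. The sensorId, ts and value fields and the sample readings are hypothetical. With a single-field index on sensorId, each document contributes one index key because the field holds a scalar. Rolling up six readings into two documents therefore cuts the copies of sensorId, and the index keys, from six to two.

```javascript
// "Roll up" small documents that share `groupField` into one larger document
// per group, embedding the remaining fields of each small document in an array.
function rollUp(docs, groupField) {
  const groups = new Map();
  for (const doc of docs) {
    const key = doc[groupField];
    if (!groups.has(key)) {
      // The common field is stored once, on the larger document.
      groups.set(key, { [groupField]: key, items: [] });
    }
    const { [groupField]: _common, ...rest } = doc; // drop the repeated field
    groups.get(key).items.push(rest);
  }
  return [...groups.values()];
}

// Hypothetical readings: six small documents for two sensors.
const readings = [
  { sensorId: "s1", ts: 1, value: 20.5 },
  { sensorId: "s1", ts: 2, value: 20.7 },
  { sensorId: "s1", ts: 3, value: 20.6 },
  { sensorId: "s2", ts: 1, value: 18.1 },
  { sensorId: "s2", ts: 2, value: 18.0 },
  { sensorId: "s2", ts: 3, value: 18.2 },
];

const rolledUp = rollUp(readings, "sensorId");
console.log(readings.length, "->", rolledUp.length); // 6 -> 2
```

This only pays off when the application usually reads a whole group at once, as the preceding paragraph cautions.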
Storage Optimization for Small Documents
Each MongoDB document contains a certain amount of overhead. This overhead is normally insignificant but becomes significant if all documents are just a few bytes, as might be the case if the documents in your collection only have one or two fields.
Consider the following suggestions and strategies for optimizing storage utilization for these collections:
Use the _id field explicitly.
MongoDB clients automatically add an _id field to each document and generate a unique 12-byte ObjectId for the _id field. Furthermore, MongoDB always indexes the _id field. For smaller documents this may account for a significant amount of space.
To optimize storage use, users can specify a value for the _id field explicitly when inserting documents into the collection. This strategy allows applications to store a value in the _id field that would otherwise have occupied space in another portion of the document.
You can store any value in the _id field other than an array, regex, or undefined, but because this value serves as a primary key for documents in the collection, it must uniquely identify them. If the field's value is not unique, then it cannot serve as a primary key, as there would be collisions in the collection.
Use shorter field names.
Note
While shortening field names can reduce BSON size in MongoDB, it's often more effective to modify the overall document model to reduce BSON size. Shortening field names might reduce expressiveness, and does not affect the size of indexes, as indexes have a predefined structure that does not incorporate field names.
MongoDB stores all field names in every document. For most documents, this represents a small fraction of the space used by a document; however, for small documents the field names may represent a proportionally large amount of space. Consider a collection of small documents that resemble the following:
{ last_name : "Smith", best_score: 3.9 }
If you shorten the field named last_name to lname and the field named best_score to score, as follows, you could save 9 bytes per document.
{ lname : "Smith", score : 3.9 }
Embed documents.
In some cases you may want to embed documents in other documents and save on the per-document overhead. See Collection Contains Large Number of Small Documents.
Optimize the document model.
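You can check the 9-byte figure by computing BSON sizes by hand. The sketch below follows the BSON layout for the few value types these examples need. Each document has an int32 length prefix and a trailing null byte. Each field adds one type byte and its name as a null-terminated string. A string value takes an int32 length, its UTF-8 bytes and a null byte, and a double takes 8 bytes. It is not a full BSON encoder. ObjectIdPlaceholder is a hypothetical stand-in for a real 12-byte ObjectId so the code runs without a driver. The sku document is an invented example of the "store a natural key in _id" strategy.

```javascript
// Minimal BSON size estimator (sketch): supports strings, JS numbers
// (encoded as BSON doubles), and a placeholder for a 12-byte ObjectId.
class ObjectIdPlaceholder {} // hypothetical stand-in for a real ObjectId

const utf8Length = (s) => Buffer.byteLength(s, "utf8");

function bsonSize(doc) {
  let size = 4 + 1; // int32 document length + trailing 0x00 byte
  for (const [name, value] of Object.entries(doc)) {
    size += 1 + utf8Length(name) + 1; // element type byte + field name (cstring)
    if (typeof value === "string") {
      size += 4 + utf8Length(value) + 1; // int32 length + UTF-8 bytes + 0x00
    } else if (typeof value === "number") {
      size += 8; // 64-bit double
    } else if (value instanceof ObjectIdPlaceholder) {
      size += 12; // ObjectId
    } else {
      throw new TypeError(`unsupported value type for field "${name}"`);
    }
  }
  return size;
}

// Shorter field names: 46 vs 37 bytes, i.e. 9 bytes saved per document.
console.log(bsonSize({ last_name: "Smith", best_score: 3.9 })); // 46
console.log(bsonSize({ lname: "Smith", score: 3.9 }));          // 37

// A natural key stored in _id instead of an ObjectId plus a separate field.
console.log(bsonSize({ _id: new ObjectIdPlaceholder(), sku: "AB-1234" })); // 39
console.log(bsonSize({ _id: "AB-1234" }));                                  // 22
```

Both strategies save bytes only in the documents themselves. As the note above says, shorter field names do not shrink indexes.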
Data Lifecycle Management
Data modeling decisions should take data lifecycle management into consideration.
The Time to Live (TTL) feature of collections expires documents after a period of time. Consider using the TTL feature if your application requires some data to persist in the database for only a limited period of time.
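The sketch below models the TTL expiration rule in plain JavaScript. It is not MongoDB's implementation. A document becomes eligible for deletion once expireAfterSeconds have elapsed since the date in the indexed field. If that field holds an array, the earliest date counts. Documents whose field holds no date never expire. The sessions collection and createdAt field in the mongosh comment are hypothetical. In a real deployment a background task removes expired documents periodically (every 60 seconds), so deletion is not instantaneous.

```javascript
// Simplified model of the TTL expiration rule (not MongoDB's implementation).
// The real index would be created in mongosh with something like:
//   db.sessions.createIndex({ createdAt: 1 }, { expireAfterSeconds: 3600 })

// Returns the time (ms since epoch) that expiration is measured from,
// or null if the document can never expire.
function expiryBase(value) {
  if (value instanceof Date) return value.getTime();
  if (Array.isArray(value)) {
    // For an array, the earliest date in the array is used.
    const times = value.filter((v) => v instanceof Date).map((v) => v.getTime());
    return times.length > 0 ? Math.min(...times) : null;
  }
  return null; // non-date values never expire
}

function isExpired(doc, field, expireAfterSeconds, now) {
  const base = expiryBase(doc[field]);
  return base !== null && now.getTime() > base + expireAfterSeconds * 1000;
}
```

Keeping the expiration rule in an index option means the application never has to schedule its own cleanup deletes.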
Additionally, if your application only uses recently inserted documents, consider capped collections. Capped collections provide first-in-first-out (FIFO) management of inserted documents and efficiently support operations that insert and read documents based on insertion order.
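The in-memory model below illustrates the FIFO behavior of a capped collection. It assumes a cap expressed only as a maximum document count. A real capped collection always has a byte size limit and may also have a document count limit, and it removes the oldest documents when a limit is reached. The recent_events collection name and option values in the mongosh comment are hypothetical.

```javascript
// In-memory model of a capped collection's FIFO behavior, bounded by a
// maximum document count. The real collection would be created in mongosh with e.g.
//   db.createCollection("recent_events", { capped: true, size: 1048576, max: 1000 })
class CappedBuffer {
  constructor(max) {
    this.max = max;
    this.docs = [];
  }

  insert(doc) {
    this.docs.push(doc);
    // Once the cap is exceeded, the oldest documents are removed first.
    while (this.docs.length > this.max) this.docs.shift();
  }

  // Documents are returned in insertion order.
  find() {
    return [...this.docs];
  }
}

const events = new CappedBuffer(3);
for (let i = 1; i <= 5; i++) events.insert({ seq: i });
console.log(events.find().map((d) => d.seq)); // [ 3, 4, 5 ]
```

Because the oldest data ages out automatically, the collection stays bounded without any explicit deletes.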