MongoDB Manual

FAQ: MongoDB Storage

This document addresses common questions regarding MongoDB's storage system.

Storage Engine Fundamentals

What is a storage engine?

A storage engine is the part of a database that is responsible for managing how data is stored, both in memory and on disk. Many databases support multiple storage engines, where different engines perform better for specific workloads. For example, one storage engine might offer better performance for read-heavy workloads, and another might support a higher throughput for write operations.

Tip

See also:

Storage Engines

Can you mix storage engines in a replica set?

Yes. You can have replica set members that use different storage engines (WiredTiger and in-memory).

Note

Starting in version 4.2, MongoDB removes the deprecated MMAPv1 storage engine.

Storage Recommendations

How many collections and indexes can be in a cluster?

Cluster performance might degrade once the combined number of collections and indexes exceeds 100,000. In addition, many large collections have a greater impact on performance than the same number of smaller collections.
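
To see how close a deployment is to that figure, you can count collections and indexes together. The following is a minimal mongosh sketch. It assumes you pass in the shell's db object, and it counts only namespaces of type "collection", so views and other namespace types are skipped. Internal databases such as admin, config, and local are included in the count. The function only calls methods on the object it receives, so it can also be exercised against a stub.

```javascript
// Count collections and indexes across every database the caller can list,
// for comparison with the ~100,000 combined guideline.
function countCollectionsAndIndexes(db) {
  let collections = 0;
  let indexes = 0;
  const { databases } = db.adminCommand({ listDatabases: 1 });
  for (const { name } of databases) {
    const database = db.getSiblingDB(name);
    // Only real collections; views have no indexes of their own.
    for (const info of database.getCollectionInfos({ type: "collection" })) {
      collections += 1;
      indexes += database.getCollection(info.name).getIndexes().length;
    }
  }
  return { collections, indexes, total: collections + indexes };
}

// In mongosh: countCollectionsAndIndexes(db)
```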

WiredTiger Storage Engine

Can I upgrade an existing deployment to WiredTiger?

Yes. See:

How much compression does WiredTiger provide?

The ratio of compressed to uncompressed data depends on your data and the compression library used. By default, collection data in WiredTiger uses Snappy block compression; zlib and zstd compression are also available. Index data uses prefix compression by default.
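
Because the ratio depends on your data, measuring it is usually more useful than estimating it. The following sketch shows one rough approach. It assumes the document returned by db.collection.stats() called without a scale argument. In that document, size is the uncompressed data size and storageSize is the on-disk allocation. Free blocks that WiredTiger can reuse are subtracted from storageSize when they are reported, since they would otherwise understate the ratio.

```javascript
// Rough block-compression ratio (uncompressed / on-disk) for one collection.
// Expects the output of db.collection.stats() with no scale argument.
function estimateCompressionRatio(stats) {
  const blockManager = stats.wiredTiger?.["block-manager"] ?? {};
  // Free space inside the file that WiredTiger can reuse; not compressed data.
  const reusable = blockManager["file bytes available for reuse"] ?? 0;
  const usedOnDisk = stats.storageSize - reusable;
  if (usedOnDisk <= 0) return null; // empty collection or missing figures
  return stats.size / usedOnDisk;
}

// In mongosh: estimateCompressionRatio(db.orders.stats())
```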

To what size should I set the WiredTiger internal cache?

With WiredTiger, MongoDB uses both the WiredTiger internal cache and the filesystem cache.

Starting in MongoDB 3.4, the default WiredTiger internal cache size is the larger of either:

  • 50% of (RAM - 1 GB), or
  • 256 MB.

For example, on a system with a total of 4 GB of RAM, the WiredTiger cache uses 1.5 GB of RAM (0.5 * (4 GB - 1 GB) = 1.5 GB). Conversely, on a system with a total of 1.25 GB of RAM, MongoDB allocates 256 MB to the WiredTiger cache, because 256 MB is more than half of the total RAM minus one gigabyte (0.5 * (1.25 GB - 1 GB) = 128 MB < 256 MB).
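
The default-size rule can be written as a one-line function. The following sketch restates the formula above in megabytes, taking 1 GB as 1024 MB, and reproduces both worked examples. It illustrates the arithmetic only and is not MongoDB's own code.

```javascript
// Default WiredTiger internal cache size (MongoDB 3.4+), in MB:
// the larger of 50% of (RAM - 1 GB) or 256 MB.
function defaultWiredTigerCacheMB(ramMB) {
  return Math.max(0.5 * (ramMB - 1024), 256);
}

// The two examples from the text:
console.log(defaultWiredTigerCacheMB(4096)); // 1536 (1.5 GB)
console.log(defaultWiredTigerCacheMB(1280)); // 256 (128 MB would be below the floor)
```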

Note

In some instances, such as when running in a container, the database can have memory constraints that are lower than the total system memory. In such instances, this memory limit, rather than the total system memory, is used as the maximum RAM available.

To see the memory limit, see hostInfo.system.memLimitMB.
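
Combining the memory limit with the default-size rule gives the cache size to expect inside a container. The following sketch assumes the shape of db.hostInfo() output, in which system.memSizeMB is total system memory and system.memLimitMB is the limit visible to mongod. Taking the smaller of the two, and falling back to memSizeMB when no limit is reported, are assumptions of this sketch.

```javascript
// RAM figure the default cache formula applies to, from db.hostInfo() output.
function effectiveRamMB(hostInfo) {
  const { memSizeMB, memLimitMB } = hostInfo.system;
  // Use the (possibly lower) memory limit when one is reported.
  return memLimitMB ? Math.min(memSizeMB, memLimitMB) : memSizeMB;
}

// Default cache size rule: larger of 50% of (RAM - 1 GB) or 256 MB, in MB.
function defaultWiredTigerCacheMB(ramMB) {
  return Math.max(0.5 * (ramMB - 1024), 256);
}

// In mongosh: defaultWiredTigerCacheMB(effectiveRamMB(db.hostInfo()))
```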

By default, WiredTiger uses Snappy block compression for all collections and prefix compression for all indexes. Compression defaults are configurable at a global level and can also be set on a per-collection and per-index basis during collection and index creation.
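
As an example of a per-collection setting, db.createCollection() accepts a storageEngine.wiredTiger.configString option, which can carry a block_compressor value. The global default is configured separately through storage.wiredTiger.collectionConfig.blockCompressor. The following is a minimal sketch. The helper name, the allow-list check, and the collection name in the usage comment are illustrative only.

```javascript
// Block compressors accepted by this sketch's validation step.
const COMPRESSORS = ["none", "snappy", "zlib", "zstd"];

// Create a collection whose data uses a non-default block compressor.
function createCompressedCollection(db, name, compressor) {
  if (!COMPRESSORS.includes(compressor)) {
    throw new Error(`unsupported block compressor: ${compressor}`);
  }
  return db.createCollection(name, {
    storageEngine: { wiredTiger: { configString: `block_compressor=${compressor}` } },
  });
}

// In mongosh, for example: createCompressedCollection(db, "orders_archive", "zstd")
```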

Different representations are used for data in the WiredTiger internal cache versus the on-disk format:

  • Data in the filesystem cache is the same as the on-disk format, including benefits of any compression for data files. The filesystem cache is used by the operating system to reduce disk I/O.
  • Indexes loaded in the WiredTiger internal cache have a different data representation from the on-disk format, but can still take advantage of index prefix compression to reduce RAM usage. Index prefix compression deduplicates common prefixes from indexed fields.
  • Collection data in the WiredTiger internal cache is uncompressed and uses a different representation from the on-disk format. Block compression can provide significant on-disk storage savings, but data must be uncompressed to be manipulated by the server.

Via the filesystem cache, MongoDB automatically uses all free memory that is not used by the WiredTiger cache or by other processes.

To adjust the size of the WiredTiger internal cache, see storage.wiredTiger.engineConfig.cacheSizeGB and --wiredTigerCacheSizeGB. Avoid increasing the WiredTiger internal cache size above its default value.

Note

The storage.wiredTiger.engineConfig.cacheSizeGB setting limits the size of the WiredTiger internal cache. The operating system uses the available free memory for the filesystem cache, which allows the compressed MongoDB data files to stay in memory. In addition, the operating system uses any free RAM to buffer file system blocks and file system cache.

To accommodate the additional consumers of RAM, you may have to decrease the WiredTiger internal cache size.

The default WiredTiger internal cache size assumes that there is a single mongod instance per machine. If a single machine contains multiple MongoDB instances, decrease the setting to accommodate the other mongod instances.

If you run mongod in a container (e.g. lxc, cgroups, Docker) that does not have access to all of the RAM available in a system, you must set storage.wiredTiger.engineConfig.cacheSizeGB to a value less than the amount of RAM available in the container. The exact amount depends on the other processes running in the container. See memLimitMB.

To view statistics on the cache and eviction rate, see the wiredTiger.cache field returned by the serverStatus command.
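
The wiredTiger.cache document is large. The following sketch condenses a few of its fields into percentages of the configured maximum. It assumes the field names "maximum bytes configured", "bytes currently in the cache", and "tracked dirty bytes in the cache". Eviction counters vary between versions, so inspect the full document for those.

```javascript
// Summarize WiredTiger cache usage from db.serverStatus() output.
function cacheUsage(serverStatus) {
  const cache = serverStatus.wiredTiger.cache;
  const max = cache["maximum bytes configured"];
  const used = cache["bytes currently in the cache"];
  const dirty = cache["tracked dirty bytes in the cache"];
  return {
    maxMB: max / (1024 * 1024),
    usedPercent: (100 * used) / max,   // share of the cache holding data
    dirtyPercent: (100 * dirty) / max, // modified data not yet written out
  };
}

// In mongosh: cacheUsage(db.serverStatus())
```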

How much memory does MongoDB allocate per connection?

Each connection uses up to 1 megabyte of RAM.

To optimize memory use for connections, ensure that you:

  • Monitor the number of open connections to your deployment. Too many open connections result in excessive use of RAM and reduce available memory for the working set.
  • Close connection pools when they are no longer needed. A connection pool is a cache of open, ready-to-use database connections maintained by the driver. Closing unneeded pools makes additional memory resources available.
  • Manage the size of your connection pool. The maxPoolSize connection string option specifies the maximum number of open connections in the pool. By default, you can have up to 100 open connections in the pool. Lowering the maxPoolSize reduces the maximum amount of RAM used for connections.

    Tip

    To configure your connection pool, see Connection Pool Configuration Settings.
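
The 1 MB-per-connection figure allows a quick back-of-the-envelope estimate. The following sketch computes upper bounds only. It assumes the connections.current field of db.serverStatus(). For the fleet estimate, it also assumes that each application process keeps one pool to a given mongod capped at maxPoolSize, and it ignores any monitoring connections a driver opens.

```javascript
// Upper-bound figure from the text: up to 1 MB of RAM per connection.
const MB_PER_CONNECTION = 1;

// Current connection memory, from the connections section of db.serverStatus().
function currentConnectionMemoryMB(serverStatus) {
  return serverStatus.connections.current * MB_PER_CONNECTION;
}

// Worst case for one mongod when every application process fills its pool.
function worstCasePoolMemoryMB(appProcesses, maxPoolSize = 100) {
  return appProcesses * maxPoolSize * MB_PER_CONNECTION;
}

console.log(worstCasePoolMemoryMB(10));     // 1000: 10 processes x 100 connections
console.log(worstCasePoolMemoryMB(10, 20)); // 200 after lowering maxPoolSize to 20
```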

How frequently does WiredTiger write to disk?

Checkpoints
Starting in version 3.6, MongoDB configures WiredTiger to create checkpoints (i.e. write the snapshot data to disk) at intervals of 60 seconds. In earlier versions, MongoDB sets checkpoints to occur in WiredTiger on user data at an interval of 60 seconds or when 2 GB of journal data has been written, whichever occurs first.
Journal Data
WiredTiger syncs the buffered journal records to disk upon any of the following conditions:
  • For replica set members (primary and secondary members):

    • If there are operations waiting for oplog entries. Operations that can wait for oplog entries include:

    • Additionally for secondary members, after every batch application of the oplog entries.
  • If a write operation includes or implies a write concern of j: true.

    Note

    Write concern "majority" implies j: true if writeConcernMajorityJournalDefault is true.

  • Every 100 milliseconds (see storage.journal.commitIntervalMs).
  • When WiredTiger creates a new journal file. Because MongoDB uses a journal file size limit of 100 MB, WiredTiger creates a new journal file approximately every 100 MB of data.

How do I reclaim disk space in WiredTiger?

The WiredTiger storage engine maintains lists of empty records in data files as it deletes documents. WiredTiger can reuse this space, but does not return it to the operating system except under very specific circumstances.

The amount of empty space available for reuse by WiredTiger is reflected in the output of db.collection.stats() under the heading wiredTiger.block-manager.file bytes available for reuse.

To allow the WiredTiger storage engine to release this empty space to the operating system, you can defragment your data file with the compact command. For more information on its behavior and other considerations, see compact.
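
Before running compact, it can help to see which collections have meaningful reusable space. The following sketch reads the block-manager field described above for every collection in one database. It returns collections at or above a threshold, with the largest first. The 100 MB default threshold and the helper name are illustrative assumptions. Review compact's documented behavior before acting on the list.

```javascript
// Collections in one database whose reusable space meets a threshold,
// largest first, as candidates for the compact command.
function compactCandidates(database, thresholdBytes = 100 * 1024 * 1024) {
  const candidates = [];
  for (const info of database.getCollectionInfos({ type: "collection" })) {
    const stats = database.getCollection(info.name).stats();
    const reusable =
      stats.wiredTiger?.["block-manager"]?.["file bytes available for reuse"] ?? 0;
    if (reusable >= thresholdBytes) candidates.push({ name: info.name, reusable });
  }
  return candidates.sort((a, b) => b.reusable - a.reusable);
}

// In mongosh, review compactCandidates(db), then compact one collection, e.g.:
// db.runCommand({ compact: "orders" })
```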

Data Storage Diagnostics

How can I check the size of a collection?

To view the statistics for a collection, including the data size, use the db.collection.stats() method from within mongosh. The following example issues db.collection.stats() for the orders collection:

db.orders.stats();

MongoDB also provides the following methods to return specific sizes for the collection:
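
One way to view these sizes together is to collect them into a single object. The following sketch assumes the mongosh collection methods dataSize(), storageSize(), totalIndexSize(), and totalSize(), each of which returns a size in bytes. The helper name is illustrative.

```javascript
// Gather the per-collection size helpers into one object (values in bytes).
function collectionSizes(coll) {
  return {
    dataSize: coll.dataSize(),             // uncompressed document data
    storageSize: coll.storageSize(),       // storage allocated for documents
    totalIndexSize: coll.totalIndexSize(), // all indexes on the collection
    totalSize: coll.totalSize(),           // collection plus its indexes
  };
}

// In mongosh: collectionSizes(db.orders)
```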

The following script prints the statistics for each database:

// Print dbStats output for every database on the deployment
db.adminCommand("listDatabases").databases.forEach(function (d) {
   const mdb = db.getSiblingDB(d.name);
   printjson(mdb.stats());
})

The following script prints the statistics for each collection in each database:

// Print collStats output for every collection in every database
db.adminCommand("listDatabases").databases.forEach(function (d) {
   const mdb = db.getSiblingDB(d.name);
   mdb.getCollectionNames().forEach(function (c) {
      const s = mdb.getCollection(c).stats();
      printjson(s);
   })
})

How can I check the size of the individual indexes for a collection?

To view the size of the data allocated for each index, use the db.collection.stats() method and check the indexSizes field in the returned document.

If an index uses prefix compression (which is the default for WiredTiger), the returned size for that index reflects the compressed size.
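
To find the largest indexes quickly, you can sort the indexSizes field. The following sketch takes the document returned by db.collection.stats() and returns index names ordered from largest to smallest. Sizes are in bytes unless a scale argument was passed to stats(). The helper name is illustrative.

```javascript
// Rank a collection's indexes by size using stats().indexSizes.
function indexesBySize(stats) {
  return Object.entries(stats.indexSizes)
    .map(([name, bytes]) => ({ name, bytes }))
    .sort((a, b) => b.bytes - a.bytes); // largest first
}

// In mongosh: indexesBySize(db.orders.stats())
```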

How can I get information on the storage use of a database?

The db.stats() method in mongosh returns the current state of the "active" database. For the description of the returned fields, see dbStats Output.
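
db.stats() also accepts a scale factor, which makes the output easier to read. The following sketch requests sizes in megabytes and keeps a few storage-related fields from the dbStats output: db, dataSize, storageSize, and indexSize. The helper name and the choice of fields are illustrative.

```javascript
// Report one database's storage figures in megabytes.
function databaseStorageMB(database) {
  const s = database.stats(1024 * 1024); // scale factor: sizes in MB
  return {
    db: s.db,
    dataSize: s.dataSize,       // uncompressed data
    storageSize: s.storageSize, // storage allocated for collections
    indexSize: s.indexSize,     // storage allocated for indexes
  };
}

// In mongosh, for every database:
// db.adminCommand({ listDatabases: 1 }).databases
//   .map((d) => databaseStorageMB(db.getSiblingDB(d.name)))
```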