FAQ: MongoDB Storage

This document addresses common questions regarding MongoDB's storage system.

Storage Engine Fundamentals

What is a storage engine?

A storage engine is the part of a database that is responsible for managing how data is stored, both in memory and on disk. Many databases support multiple storage engines, where different engines perform better for specific workloads. For example, one storage engine might offer better performance for read-heavy workloads, and another might support a higher throughput for write operations.

Can you mix storage engines in a replica set?

Yes. You can have replica set members that use different storage engines (WiredTiger and in-memory).

Note

Starting in version 4.2, MongoDB removes the deprecated MMAPv1 storage engine.

WiredTiger Storage Engine

Can I upgrade an existing deployment to WiredTiger?

Yes.

How much compression does WiredTiger provide?

The ratio of compressed data to uncompressed data depends on your data and the compression library used. By default, collection data in WiredTiger uses Snappy block compression; zlib and zstd compression are also available. Index data uses prefix compression by default.

To what size should I set the WiredTiger internal cache?

With WiredTiger, MongoDB utilizes both the WiredTiger internal cache and the filesystem cache.

Starting in MongoDB 3.4, the default WiredTiger internal cache size is the larger of either:

  • 50% of (RAM - 1 GB), or
  • 256 MB.

For example, on a system with a total of 4 GB of RAM, the WiredTiger cache will use 1.5 GB of RAM (0.5 * (4 GB - 1 GB) = 1.5 GB). Conversely, a system with a total of 1.25 GB of RAM will allocate 256 MB to the WiredTiger cache, because 256 MB is larger than half of the total RAM minus one gigabyte (0.5 * (1.25 GB - 1 GB) = 128 MB < 256 MB).

Note

In some instances, such as when running in a container, the database can have memory constraints that are lower than the total system memory. In such instances, this memory limit, rather than the total system memory, is used as the maximum RAM available.

To see the memory limit, see hostInfo.system.memLimitMB.
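The sizing rule and the memory-limit note above can be sketched as a small calculation. This is only an illustration of the arithmetic: the function name defaultWiredTigerCacheGB and its parameters are hypothetical, not a MongoDB API, and the sketch treats 256 MB as 0.25 GB.

```javascript
// Hypothetical helper, not a MongoDB API: estimates the default
// WiredTiger internal cache size in GB from the rule described above.
function defaultWiredTigerCacheGB(totalRamGB, memLimitGB) {
  // When a lower memory limit applies (for example, in a container),
  // that limit is used instead of the total system memory.
  const availableGB =
    memLimitGB !== undefined && memLimitGB < totalRamGB ? memLimitGB : totalRamGB;
  const halfOfRamMinusOne = 0.5 * (availableGB - 1);
  const floorGB = 0.25; // 256 MB, assuming 1 GB = 1024 MB
  return Math.max(halfOfRamMinusOne, floorGB);
}

console.log(defaultWiredTigerCacheGB(4));     // 1.5
console.log(defaultWiredTigerCacheGB(1.25));  // 0.25 (256 MB)
console.log(defaultWiredTigerCacheGB(16, 4)); // 1.5, the 4 GB limit applies
```

The first two calls reproduce the 4 GB and 1.25 GB examples; the third shows a 16 GB host where a 4 GB limit determines the result.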

By default, WiredTiger uses Snappy block compression for all collections and prefix compression for all indexes. Compression defaults are configurable at a global level and can also be set on a per-collection and per-index basis during collection and index creation.

Different representations are used for data in the WiredTiger internal cache versus the on-disk format:

  • Data in the filesystem cache is the same as the on-disk format, including benefits of any compression for data files. The filesystem cache is used by the operating system to reduce disk I/O.
  • Indexes loaded in the WiredTiger internal cache have a different data representation to the on-disk format, but can still take advantage of index prefix compression to reduce RAM usage. Index prefix compression deduplicates common prefixes from indexed fields.
  • Collection data in the WiredTiger internal cache is uncompressed and uses a different representation from the on-disk format. Block compression can provide significant on-disk storage savings, but data must be uncompressed to be manipulated by the server.

Via the filesystem cache, MongoDB automatically uses all free memory that is not used by the WiredTiger cache or by other processes.

To adjust the size of the WiredTiger internal cache, see storage.wiredTiger.engineConfig.cacheSizeGB and --wiredTigerCacheSizeGB. Avoid increasing the WiredTiger internal cache size above its default value.

Note

The storage.wiredTiger.engineConfig.cacheSizeGB setting limits the size of the WiredTiger internal cache. The operating system will use the available free memory for filesystem cache, which allows the compressed MongoDB data files to stay in memory. In addition, the operating system will use any free RAM to buffer file system blocks and file system cache.

To accommodate the additional consumers of RAM, you may have to decrease WiredTiger internal cache size.

The default WiredTiger internal cache size value assumes that there is a single mongod instance per machine. If a single machine contains multiple MongoDB instances, then you should decrease the setting to accommodate the other mongod instances.

If you run mongod in a container (e.g. lxc, cgroups, Docker, etc.) that does not have access to all of the RAM available in a system, you must set storage.wiredTiger.engineConfig.cacheSizeGB to a value less than the amount of RAM available in the container. The exact amount depends on the other processes running in the container. See memLimitMB.

To view statistics on the cache and eviction rate, see the wiredTiger.cache field returned from the serverStatus command.
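As a rough sketch of how the wiredTiger.cache statistics might be read, the helper below computes how full the cache is from a serverStatus-style document. The function name cacheUsage is hypothetical, and the three statistic names it reads are assumed to match the serverStatus output of your server version; check them against your own output before relying on them.

```javascript
// Hypothetical helper, not part of mongosh: summarizes cache usage
// from a document shaped like the output of the serverStatus command.
function cacheUsage(serverStatusDoc) {
  const cache = serverStatusDoc.wiredTiger.cache;
  // Statistic names are assumed; verify them against your server's output.
  const usedBytes = cache["bytes currently in the cache"];
  const maxBytes = cache["maximum bytes configured"];
  const dirtyBytes = cache["tracked dirty bytes in the cache"];
  return {
    usedBytes: usedBytes,
    maxBytes: maxBytes,
    fillRatio: usedBytes / maxBytes,   // fraction of the configured cache in use
    dirtyRatio: dirtyBytes / maxBytes, // fraction of the cache holding dirty data
  };
}

// Sample document with made-up numbers, used only for illustration.
const sampleServerStatus = {
  wiredTiger: {
    cache: {
      "bytes currently in the cache": 500,
      "maximum bytes configured": 1000,
      "tracked dirty bytes in the cache": 250,
    },
  },
};
console.log(cacheUsage(sampleServerStatus));
```

In mongosh, the same helper could be called as cacheUsage(db.serverStatus()) against a running deployment.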

How frequently does WiredTiger write to disk?

Checkpoints
Starting in version 3.6, MongoDB configures WiredTiger to create checkpoints (i.e. write the snapshot data to disk) at intervals of 60 seconds. In earlier versions, MongoDB sets checkpoints to occur in WiredTiger on user data at an interval of 60 seconds or when 2 GB of journal data has been written, whichever occurs first.
Journal Data
WiredTiger syncs the buffered journal records to disk upon any of the following conditions:
  • For replica set members (primary and secondary members):

    • If there are operations waiting for oplog entries. Operations that can wait for oplog entries include:

    • Additionally for secondary members, after every batch application of the oplog entries.
  • If a write operation includes or implies a write concern of j: true.

    Note

    Write concern "majority" implies j: true if the writeConcernMajorityJournalDefault is true.

  • At every 100 milliseconds (see storage.journal.commitIntervalMs).
  • When WiredTiger creates a new journal file. Because MongoDB uses a journal file size limit of 100 MB, WiredTiger creates a new journal file approximately every 100 MB of data.

How do I reclaim disk space in WiredTiger?

The WiredTiger storage engine maintains lists of empty records in data files as it deletes documents. This space can be reused by WiredTiger, but will not be returned to the operating system except under very specific circumstances.

The amount of empty space available for reuse by WiredTiger is reflected in the output of db.collection.stats() under the heading wiredTiger.block-manager.file bytes available for reuse.
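Because the heading contains a hyphen and spaces, the value has to be read with bracket notation. The following sketch shows one way to do that; the helper name reusableSpace is hypothetical, and dividing by storageSize is an illustrative choice rather than an official metric.

```javascript
// Hypothetical helper, not part of mongosh: reports reusable space
// from a document shaped like the output of db.collection.stats().
function reusableSpace(collStats) {
  const reusable =
    collStats.wiredTiger["block-manager"]["file bytes available for reuse"];
  return {
    reusableBytes: reusable,
    // Share of the collection's allocated storage that is currently empty.
    reusableFraction:
      collStats.storageSize > 0 ? reusable / collStats.storageSize : 0,
  };
}

// Sample document with made-up numbers, used only for illustration.
const sampleCollStats = {
  storageSize: 4096000,
  wiredTiger: {
    "block-manager": { "file bytes available for reuse": 1024000 },
  },
};
console.log(reusableSpace(sampleCollStats));
```

In mongosh, the same helper could be called as reusableSpace(db.orders.stats()), where orders stands in for your own collection.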

To allow the WiredTiger storage engine to release this empty space to the operating system, you can defragment your data file. This can be achieved using the compact command. For more information on its behavior and other considerations, see compact.

Data Storage Diagnostics

How can I check the size of a collection?

To view the statistics for a collection, including the data size, use the db.collection.stats() method from within mongosh. The following example issues db.collection.stats() for the orders collection:

db.orders.stats();

MongoDB also provides the following methods to return specific sizes for the collection:

The following script prints the statistics for each database:

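// Print storage statistics for every database in the deployment.
db.adminCommand("listDatabases").databases.forEach(function (d) {
   const mdb = db.getSiblingDB(d.name);  // handle to the database named d.name
   printjson(mdb.stats());
});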

The following script prints the statistics for each collection in each database:

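// Print statistics for every collection in every database.
db.adminCommand("listDatabases").databases.forEach(function (d) {
   const mdb = db.getSiblingDB(d.name);
   mdb.getCollectionNames().forEach(function (c) {
      // getCollection() also works for names that clash with database methods.
      const s = mdb.getCollection(c).stats();
      printjson(s);
   });
});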

How can I check the size of the individual indexes for a collection?

To view the size of the data allocated for each index, use the db.collection.stats() method and check the indexSizes field in the returned document.

If an index uses prefix compression (which is the default for WiredTiger), the returned size for that index reflects the compressed size.
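The indexSizes field maps each index name to its size in bytes, so it can be turned into a list ordered from largest to smallest. The sketch below is one way to do this; the helper name indexesBySize and the sample index names are hypothetical.

```javascript
// Hypothetical helper, not part of mongosh: lists indexes from largest
// to smallest using the indexSizes field of a stats document.
function indexesBySize(collStats) {
  return Object.entries(collStats.indexSizes)
    .map(function ([name, bytes]) {
      return { name: name, bytes: bytes };
    })
    .sort(function (a, b) {
      return b.bytes - a.bytes; // descending by size
    });
}

// Sample document with made-up index names and sizes.
const sampleIndexStats = {
  indexSizes: { _id_: 36864, customerId_1: 81920, status_1: 20480 },
};
console.log(indexesBySize(sampleIndexStats));
```

In mongosh, the same helper could be called as indexesBySize(db.orders.stats()), where orders stands in for your own collection.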

How can I get information on the storage use of a database?

The db.stats() method in mongosh returns the current state of the "active" database. For the description of the returned fields, see dbStats Output.
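As an illustration of working with the returned document, the sketch below combines three of its fields (dataSize, storageSize, and indexSize) into a short summary. The helper name summarizeDbStats and the derived values are hypothetical conveniences, not fields that MongoDB returns.

```javascript
// Hypothetical helper, not part of mongosh: summarizes a document
// shaped like the output of db.stats().
function summarizeDbStats(dbStats) {
  return {
    db: dbStats.db,
    // Space allocated for documents plus space used by indexes.
    onDiskBytes: dbStats.storageSize + dbStats.indexSize,
    // Uncompressed data size relative to allocated document storage;
    // values above 1 suggest that block compression is saving space.
    dataToStorageRatio:
      dbStats.storageSize > 0 ? dbStats.dataSize / dbStats.storageSize : null,
  };
}

// Sample document with made-up numbers, used only for illustration.
const sampleDbStats = {
  db: "test",
  dataSize: 8000,
  storageSize: 4000,
  indexSize: 1000,
};
console.log(summarizeDbStats(sampleDbStats));
```

In mongosh, the same helper could be called as summarizeDbStats(db.stats()) for the active database.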
