
Journaling

To provide durability in the event of a failure, MongoDB uses write-ahead logging to on-disk journal files.

Journaling and the WiredTiger Storage Engine

Important

The log mentioned in this section refers to the WiredTiger write-ahead log (i.e. the journal) and not the MongoDB log file.

WiredTiger uses checkpoints to provide a consistent view of data on disk and allow MongoDB to recover from the last checkpoint. However, if MongoDB exits unexpectedly in between checkpoints, journaling is required to recover the writes that occurred after the last checkpoint.

Note

Starting in MongoDB 6.1, journaling is always enabled. As a result, MongoDB removes the storage.journal.enabled option and the corresponding --journal and --nojournal command-line options.

With journaling, the recovery process:

  1. Looks in the data files to find the identifier of the last checkpoint.
  2. Searches in the journal files for the record that matches the identifier of the last checkpoint.
  3. Applies the operations in the journal files since the last checkpoint.
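
The three recovery steps can be illustrated with a toy model. This is only a sketch: the dictionary layout, the `recover` function, and the key/value "operations" are assumptions made for illustration and do not reflect WiredTiger's actual on-disk format.

```python
def recover(data_files, journal):
    """Replay journal records written after the last checkpoint (toy model).

    data_files: {"last_checkpoint_id": int, "state": dict}   (hypothetical layout)
    journal:    list of {"id": int, "ops": [(key, value), ...]} in write order
    """
    # Step 1: read the identifier of the last checkpoint from the data files.
    checkpoint_id = data_files["last_checkpoint_id"]

    # Step 2: find the journal record that matches that identifier.
    start = next(i for i, rec in enumerate(journal) if rec["id"] == checkpoint_id)

    # Step 3: apply every operation recorded after that point.
    state = dict(data_files["state"])
    for rec in journal[start + 1:]:
        for key, value in rec["ops"]:
            state[key] = value
    return state


data_files = {"last_checkpoint_id": 2, "state": {"a": 1}}
journal = [
    {"id": 1, "ops": [("a", 0)]},
    {"id": 2, "ops": [("a", 1)]},
    {"id": 3, "ops": [("b", 5)]},
    {"id": 4, "ops": [("a", 9)]},
]
recovered = recover(data_files, journal)
print(recovered)
```

Records 1 and 2 are already reflected in the checkpointed state, so only records 3 and 4 are replayed.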

Journaling Process

With journaling, WiredTiger creates one journal record for each client-initiated write operation. The journal record includes any internal write operations caused by the initial write. For example, an update to a document in a collection may result in modifications to the indexes; WiredTiger creates a single journal record that includes both the update operation and its associated index modifications.

MongoDB configures WiredTiger to use in-memory buffering for storing the journal records. Threads coordinate to allocate and copy into their portion of the buffer. All journal records up to 128 kB are buffered.

WiredTiger syncs the buffered journal records to disk upon any of the following conditions:

  • For replica set members (primary and secondary members):

    • If a write operation includes or implies a write concern of j: true.
    • Additionally for secondary members, after every batch application of the oplog entries.

    Note

    Write concern "majority" implies j: true if writeConcernMajorityJournalDefault is true.

  • Every 100 milliseconds (see storage.journal.commitIntervalMs).
  • When WiredTiger creates a new journal file. Because MongoDB uses a journal file size limit of 100 MB, WiredTiger creates a new journal file approximately every 100 MB of data.
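
The sync conditions listed above can be summarized as a small decision function. This is a simplified sketch, not WiredTiger's implementation: `SyncState`, `sync_reasons`, and the parameter names are hypothetical, and the constants are the defaults quoted in the text.

```python
from dataclasses import dataclass

COMMIT_INTERVAL_MS = 100                   # default sync interval from the text
JOURNAL_FILE_LIMIT = 100 * 1024 * 1024     # approximate journal file size limit


@dataclass
class SyncState:
    ms_since_last_sync: int      # time elapsed since the last sync to disk
    bytes_in_current_file: int   # size of the journal file being written


def sync_reasons(state, journaled_write=False, secondary_applied_batch=False):
    """Return every listed condition that would trigger a sync right now."""
    reasons = []
    if journaled_write:
        reasons.append("write concern j: true")
    if secondary_applied_batch:
        reasons.append("secondary applied oplog batch")
    if state.ms_since_last_sync >= COMMIT_INTERVAL_MS:
        reasons.append("commit interval elapsed")
    if state.bytes_in_current_file >= JOURNAL_FILE_LIMIT:
        reasons.append("new journal file")
    return reasons


print(sync_reasons(SyncState(ms_since_last_sync=10, bytes_in_current_file=0)))
print(sync_reasons(SyncState(100, 0), journaled_write=True))
```

An empty list means the records stay in the in-memory buffer until one of the conditions becomes true.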

Important

In between write operations, while the journal records remain in the WiredTiger buffers, updates can be lost following a hard shutdown of mongod.

Tip

The serverStatus command returns information on the WiredTiger journal statistics in the wiredTiger.log field.
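
As a minimal sketch of reading that field, the helper below pulls the wiredTiger.log sub-document out of a serverStatus result. The `journal_stats` name is hypothetical, and the sample document uses made-up values so the snippet runs without a server; the commented lines show how the input could be obtained with the PyMongo driver, assuming it is installed and a deployment is reachable.

```python
def journal_stats(server_status):
    """Return the wiredTiger.log sub-document, or {} if it is absent."""
    return server_status.get("wiredTiger", {}).get("log", {})


# With a live deployment, the input could come from the driver instead:
#   from pymongo import MongoClient
#   status = MongoClient().admin.command("serverStatus")
sample_status = {"wiredTiger": {"log": {"log sync operations": 42}}}  # made-up sample

print(journal_stats(sample_status))
```

Returning an empty dictionary when the section is missing keeps the helper usable against members that do not report WiredTiger statistics.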

Journal Files

For the journal files, MongoDB creates a subdirectory named journal under the dbPath directory. WiredTiger journal files have names in the format WiredTigerLog.<sequence>, where <sequence> is a zero-padded number starting from 0000000001.
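
The naming scheme can be expressed in a few lines. The helper names below are hypothetical; the ten-digit width is taken from the example sequence 0000000001.

```python
import re

JOURNAL_NAME = re.compile(r"^WiredTigerLog\.(\d{10})$")


def journal_file_name(sequence):
    """Build a journal file name from its sequence number."""
    return f"WiredTigerLog.{sequence:010d}"


def parse_sequence(file_name):
    """Return the sequence number, or None if the name is not a journal file."""
    match = JOURNAL_NAME.match(file_name)
    return int(match.group(1)) if match else None


print(journal_file_name(1))
print(parse_sequence("WiredTigerLog.0000000042"))
```

`parse_sequence` returns None for any other file that happens to live in the journal directory, so it can be used to filter a directory listing.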

Journal Records

Journal files contain one record for each client-initiated write operation:

  • The journal record includes any internal write operations caused by the initial write. For example, an update to a document in a collection may result in modifications to the indexes; WiredTiger creates a single journal record that includes both the update operation and its associated index modifications.
  • Each record has a unique identifier.
  • The minimum journal record size for WiredTiger is 128 bytes.

Compression

By default, MongoDB configures WiredTiger to use snappy compression for its journaling data. To specify a different compression algorithm or no compression, use the storage.wiredTiger.engineConfig.journalCompressor setting. For details, see Change WiredTiger Journal Compressor.

Note

If a log record is less than or equal to 128 bytes, which is the minimum log record size for WiredTiger, WiredTiger does not compress that record.
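
The size rule can be stated as a one-line predicate. This is an illustrative sketch only: `is_compressed` is a hypothetical helper, and "none" is assumed to be the journalCompressor value that disables compression.

```python
MIN_RECORD_BYTES = 128  # minimum log record size quoted in the text


def is_compressed(record_bytes, compressor="snappy"):
    """Return True if a record of this size would be compressed."""
    return compressor != "none" and record_bytes > MIN_RECORD_BYTES


print(is_compressed(128))   # at the minimum size: not compressed
print(is_compressed(129))   # above the minimum size: compressed
```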

Journal File Size Limit

WiredTiger journal files have a maximum size limit of approximately 100 MB. Once the file exceeds that limit, WiredTiger creates a new journal file.

WiredTiger automatically removes old journal files and maintains only the files needed to recover from the last checkpoint. To determine how much disk space to set aside for journal files, consider the following:

  • The default maximum size for a checkpoint is 2 GB.
  • Additional space may be required for MongoDB to write new journal files while recovering from a checkpoint.
  • MongoDB compresses journal files.
  • The time it takes to restore a checkpoint is specific to your use case.
  • If you override the maximum checkpoint size or disable compression, your calculations may be significantly different.

For these reasons, it is difficult to calculate exactly how much additional space you need. Overestimating disk space is always the safer approach.

Important

If you do not set aside enough disk space for your journal files, the MongoDB server will crash.

Pre-Allocation

WiredTiger pre-allocates journal files.

Journaling and the In-Memory Storage Engine

In MongoDB Enterprise, the In-Memory Storage Engine is generally available (GA). Because its data is kept in memory, there is no separate journal. Write operations with a write concern of j: true are immediately acknowledged.

If any voting member of a replica set uses the in-memory storage engine, you must set writeConcernMajorityJournalDefault to false.

Note

Starting in version 4.2 (and 4.0.13 and 3.6.14), if a replica set member uses the in-memory storage engine (voting or non-voting) but the replica set has writeConcernMajorityJournalDefault set to true, the replica set member logs a startup warning.

With writeConcernMajorityJournalDefault set to false, MongoDB does not wait for w: "majority" writes to reach the on-disk journal before acknowledging them. As such, "majority" write operations could roll back in the event of a transient loss (e.g. crash and restart) of a majority of nodes in a given replica set.
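
The interaction between the write concern and writeConcernMajorityJournalDefault can be sketched as a small predicate. The `implies_journal` helper and its parameter names are hypothetical, and the function models only the rule described on this page: an explicit j value wins; otherwise "majority" follows the replica set default.

```python
def implies_journal(write_concern, majority_journal_default=True):
    """Return True if the write waits for the on-disk journal before acknowledgment."""
    # An explicit j setting in the write concern takes precedence.
    if "j" in write_concern:
        return bool(write_concern["j"])
    # Otherwise only w: "majority" can imply j: true, and only when
    # writeConcernMajorityJournalDefault is true.
    return write_concern.get("w") == "majority" and majority_journal_default


print(implies_journal({"w": "majority"}))
print(implies_journal({"w": "majority"}, majority_journal_default=False))
```

With the default set to false, the second call shows that a "majority" write is acknowledged without waiting for the journal, which is the case where a rollback becomes possible.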