Monitoring is a critical component of all database administration. A firm grasp of MongoDB's reporting will allow you to assess the state of your database and maintain your deployment without crisis. Additionally, a sense of MongoDB's normal operational parameters will allow you to diagnose problems before they escalate to failures.
This document presents an overview of the available monitoring utilities and the reporting statistics available in MongoDB. It also introduces diagnostic strategies and suggestions for monitoring replica sets and sharded clusters.
Monitoring Strategies
MongoDB provides various methods for collecting data about the state of a running MongoDB instance:
- MongoDB distributes a set of utilities that provide real-time reporting of database activities.
- MongoDB provides various database commands that return statistics regarding the current database state with greater fidelity.
- MongoDB Atlas is a cloud-hosted database-as-a-service for running, monitoring, and maintaining MongoDB deployments.
- MongoDB Cloud Manager is a hosted service that monitors running MongoDB deployments to collect data and provide visualization and alerts based on that data.
- MongoDB Ops Manager is an on-premises solution available in MongoDB Enterprise Advanced that monitors running MongoDB deployments to collect data and provide visualization and alerts based on that data.
Each strategy can help answer different questions and is useful in different contexts. These methods are complementary.
MongoDB Reporting Tools
This section provides an overview of the reporting methods distributed with MongoDB. It also offers examples of the kinds of questions that each method is best suited to help you address.
Utilities
The MongoDB distribution includes a number of utilities that quickly return statistics about instances' performance and activity. Typically, these are most useful for diagnosing issues and assessing normal operation.
mongostat
mongostat captures and returns the counts of database operations by type (for example, insert, query, update, and delete). These counts report on the load distribution on the server.
Use mongostat to understand the distribution of operation types and to inform capacity planning. See the mongostat reference page for details.
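To make the idea of per-type counts concrete, the sketch below turns two snapshots of the opcounters section of serverStatus output into per-second rates, which is roughly the kind of figure mongostat displays. This is not mongostat's implementation; the function name is hypothetical, and it assumes the counter values are plain JavaScript numbers.

```javascript
// Sketch: per-second operation rates from two serverStatus() snapshots.
// Assumes each snapshot has an `opcounters` object with plain numeric
// fields (insert, query, update, delete, getmore, command) and that
// `elapsedSeconds` is the time between the two samples.
function opRatesPerSecond(before, after, elapsedSeconds) {
  if (!(elapsedSeconds > 0)) {
    throw new RangeError("elapsedSeconds must be positive");
  }
  const rates = {};
  for (const op of ["insert", "query", "update", "delete", "getmore", "command"]) {
    const delta = after.opcounters[op] - before.opcounters[op];
    // Counters restart from zero when the server restarts; report null
    // instead of a misleading negative rate in that case.
    rates[op] = delta >= 0 ? delta / elapsedSeconds : null;
  }
  return rates;
}

// Example with hypothetical sample values taken 10 seconds apart:
const t0 = { opcounters: { insert: 100, query: 500, update: 40, delete: 0, getmore: 10, command: 900 } };
const t1 = { opcounters: { insert: 160, query: 520, update: 40, delete: 5, getmore: 10, command: 1000 } };
console.log(opRatesPerSecond(t0, t1, 10));
// { insert: 6, query: 2, update: 0, delete: 0.5, getmore: 0, command: 10 }
```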
mongotop
mongotop tracks and reports the current read and write activity of a MongoDB instance, and reports these statistics on a per-collection basis.
Use mongotop to check whether your database activity and usage match your expectations. See the mongotop reference page for details.
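As an illustration of per-collection reporting, the sketch below computes read and write time per namespace between two samples of the `top` administrative command, loosely in the spirit of mongotop's view. The function name is hypothetical, and the output shape it reads (a `totals` object keyed by `"<db>.<collection>"`, with `readLock`/`writeLock` times in microseconds) is an assumption to verify against your server version.

```javascript
// Sketch: per-namespace read and write time (ms) between two `top` samples.
// Assumes the shape of db.adminCommand({ top: 1 }) output: a `totals` object
// keyed by "<db>.<collection>", each entry having readLock and writeLock
// objects whose `time` field is in microseconds. Non-namespace keys such as
// the string "note" are skipped.
function collectionActivityMs(before, after) {
  const rows = [];
  for (const [ns, cur] of Object.entries(after.totals)) {
    const prev = before.totals[ns];
    if (!prev || typeof cur !== "object" || !cur.readLock) continue;
    const readMs = (cur.readLock.time - prev.readLock.time) / 1000;
    const writeMs = (cur.writeLock.time - prev.writeLock.time) / 1000;
    rows.push({ ns, readMs, writeMs, totalMs: readMs + writeMs });
  }
  // Busiest collections first.
  return rows.sort((a, b) => b.totalMs - a.totalMs);
}
```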
Commands命令
MongoDB includes a number of commands that report on the state of the database.
These data may provide a finer level of granularity than the utilities discussed above. Consider using their output in scripts and programs to develop custom alerts, or to modify the behavior of your application in response to the activity of your instance. The db.currentOp() method is another useful tool for identifying the database instance's in-progress operations.
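As one example of building a custom alert from command output, the sketch below picks long-running operations out of a db.currentOp() result. The function name and threshold are hypothetical, and it assumes the result's `inprog` entries carry `opid`, `ns`, `op`, and a plain-number `secs_running` field (which idle entries may omit).

```javascript
// Sketch: list operations that have been running for at least thresholdSecs.
// Assumes `currentOpResult.inprog` is an array of operation documents with
// `opid`, `ns`, `op`, and (when active) a numeric `secs_running` field.
function longRunningOps(currentOpResult, thresholdSecs) {
  return currentOpResult.inprog
    .filter((op) => typeof op.secs_running === "number" && op.secs_running >= thresholdSecs)
    .map((op) => ({ opid: op.opid, ns: op.ns, op: op.op, secs: op.secs_running }))
    .sort((a, b) => b.secs - a.secs); // longest-running first
}
```

In mongosh, a similar filter can also be applied on the server side, for example db.currentOp({ active: true, secs_running: { $gt: 3 } }).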
serverStatus
The serverStatus command, or db.serverStatus() from the shell, returns a general overview of the status of the database, detailing disk usage, memory use, connections, journaling, and index access. The command returns quickly and does not impact MongoDB performance.
serverStatus outputs an account of the state of a MongoDB instance. This command is rarely run directly. In most cases, the data is more meaningful when aggregated, as one would see with monitoring tools including MongoDB Cloud Manager and Ops Manager. Nevertheless, all administrators should be familiar with the data provided by serverStatus.
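As a small example of working with serverStatus data, the sketch below turns the `connections` section into a utilization figure and an alert flag. The function name and the 80% threshold are hypothetical choices; it assumes `connections.current` and `connections.available` are plain numbers whose sum is the number of incoming connections the server will accept.

```javascript
// Sketch: a connection-utilization alert built on serverStatus output.
// Assumes status.connections.current and status.connections.available are
// plain numbers. The default threshold of 0.8 is an arbitrary example.
function connectionAlert(status, threshold = 0.8) {
  const { current, available } = status.connections;
  const capacity = current + available;
  const utilization = capacity > 0 ? current / capacity : 0;
  return {
    utilization,
    alert: utilization >= threshold,
    message: `${current} of ${capacity} connections in use (${(utilization * 100).toFixed(1)}%)`,
  };
}
```

A script could, for instance, call connectionAlert(db.serverStatus()) on a schedule and notify an operator when `alert` is true.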
dbStats
The dbStats command, or db.stats() from the shell, returns a document that addresses storage use and data volumes. The dbStats output reflects the amount of storage used, the quantity of data contained in the database, and object, collection, and index counters.
Use this data to monitor the state and storage capacity of a specific database. This output also allows you to compare use between databases and to determine the average document size in a database.
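To illustrate comparing databases, the sketch below summarizes a list of dbStats results: average document size, each database's share of total data, and on-disk footprint. The function name is hypothetical, and it assumes plain-number `objects`, `dataSize`, `storageSize`, and `indexSize` fields in bytes (the default scale).

```javascript
// Sketch: compare databases using fields from db.stats() output.
// Assumes each element of statsList is a dbStats result with a `db` name
// and plain-number objects, dataSize, storageSize, and indexSize fields.
function summarizeDatabases(statsList) {
  const totalData = statsList.reduce((sum, s) => sum + s.dataSize, 0);
  return statsList.map((s) => ({
    db: s.db,
    // Average document size (dbStats also reports this as avgObjSize).
    avgDocBytes: s.objects > 0 ? s.dataSize / s.objects : 0,
    shareOfData: totalData > 0 ? s.dataSize / totalData : 0,
    onDiskBytes: s.storageSize + s.indexSize,
  }));
}
```

In mongosh, the input list could be built by calling db.getSiblingDB(name).stats() for each database name of interest.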
collStats
The collStats command, or db.collection.stats() from the shell, provides statistics that resemble dbStats on the collection level, including a count of the objects in the collection, the size of the collection, the amount of disk space used by the collection, and information about its indexes.
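The index information in collStats output can be summarized in a few lines. The sketch below reports the ratio of total index size to data size and lists indexes from largest to smallest; the function name is hypothetical, the ratio is just one possible heuristic, and it assumes plain-number `count`, `size`, and `totalIndexSize` fields plus an `indexSizes` object mapping index names to bytes.

```javascript
// Sketch: summarize index overhead from db.collection.stats() output.
// Assumes plain-number count, size, and totalIndexSize fields (bytes) and an
// indexSizes object mapping each index name to its size in bytes.
function indexOverview(collStats) {
  const indexes = Object.entries(collStats.indexSizes)
    .map(([name, bytes]) => ({ name, bytes }))
    .sort((a, b) => b.bytes - a.bytes); // largest index first
  return {
    ns: collStats.ns,
    documents: collStats.count,
    indexToDataRatio: collStats.size > 0 ? collStats.totalIndexSize / collStats.size : null,
    largestIndex: indexes.length > 0 ? indexes[0].name : null,
    indexes,
  };
}
```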
replSetGetStatus
The replSetGetStatus command (rs.status() from the shell) returns an overview of your replica set's status. The replSetGetStatus document details the state and configuration of the replica set and statistics about its members.
Use this data to ensure that replication is properly configured, and to check the connections between the current host and the other members of the replica set.
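A basic health check over this output might look like the sketch below, which flags unreachable members, members in unexpected states, and a missing or duplicated primary. The function name is hypothetical; it assumes each entry in `members` has `name`, `health` (1 when reachable from the member running the command, 0 otherwise), and `stateStr`.

```javascript
// Sketch: basic replica set health checks on rs.status() output.
// Assumes status.members entries carry name, health (1 or 0), and stateStr.
function replicaSetProblems(status) {
  const problems = [];
  const okStates = new Set(["PRIMARY", "SECONDARY", "ARBITER"]);
  for (const m of status.members) {
    if (m.health !== 1) {
      problems.push(`${m.name} is unreachable from this member`);
    } else if (!okStates.has(m.stateStr)) {
      problems.push(`${m.name} is in state ${m.stateStr}`);
    }
  }
  const primaries = status.members.filter((m) => m.stateStr === "PRIMARY").length;
  if (primaries !== 1) {
    problems.push(`expected 1 primary, found ${primaries}`);
  }
  return problems;
}
```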
Hosted (SaaS) Monitoring Tools
These are monitoring tools provided as a hosted service, usually through a paid subscription.
- MongoDB
- VividCortex
- New Relic
- Datadog
- Pandora FMS
Process Logging
During normal operation, mongod and mongos instances report a live account of all server activity and operations to either standard output or a log file. The following runtime settings control these options.
- quiet. Limits the amount of information written to the log or output.
- verbosity. Increases the amount of information written to the log or output. You can also modify the logging verbosity during runtime with the logLevel parameter or the db.setLogLevel() method in the shell.
- path. Enables logging to a file, rather than the standard output. You must specify the full path to the log file when adjusting this setting.
- logAppend. Adds information to a log file instead of overwriting the file.
Note
You can also specify these settings as command-line arguments to mongod or mongos.
For example:
mongod -v --logpath /var/log/mongodb/server1.log --logappend
Starts a mongod instance in verbose mode, appending data to the log file at /var/log/mongodb/server1.log.
The following database commands also affect logging:
- getLog. Displays recent messages from the mongod process log.
- logRotate. Rotates the log files for mongod processes only. See Rotate Log Files.
Log Redaction
Available in MongoDB Enterprise only
A mongod or mongos running with redactClientLogData redacts any message accompanying a given log event before logging, leaving only metadata, source files, or line numbers related to the event. redactClientLogData prevents potentially sensitive information from entering the system log at the cost of diagnostic detail.
For example, the following operation inserts a document into a mongod running without log redaction. The mongod has the log verbosity level set to 1:
db.clients.insertOne( { "name" : "Joe", "PII" : "Sensitive Information" } )
This operation produces the following log event:
{
"t": { "$date": "2024-07-19T15:36:55.024-07:00" },
"s": "I",
"c": "COMMAND",
...
"attr": {
"type": "command",
...
"appName": "mongosh 2.2.10",
"command": {
"insert": "clients",
"documents": [
{
"name": "Joe",
"PII": "Sensitive Information",
"_id": { "$oid": "669aea8792c7fd822d3e1d8c" }
}
],
"ordered": true,
...
}
...
}
}
When mongod runs with redactClientLogData and performs the same insert operation, it produces the following log event:
{
"t": { "$date": "2024-07-19T15:36:55.024-07:00" },
"s": "I",
"c": "COMMAND",
...
"attr": {
"type": "command",
...
"appName": "mongosh 2.2.10",
"command": {
"insert": "###",
"documents": [
{
"name": "###",
"PII": "###",
"_id": "###"
}
],
"ordered": "###",
...
}
...
}
}
Use redactClientLogData in conjunction with Encryption at Rest and TLS/SSL (Transport Encryption) to assist compliance with regulatory requirements.
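To confirm that redaction is taking effect, one option is to search recent log events for a value you know is sensitive. The sketch below does this under the assumption that each log event is a single line of JSON (the examples above are pretty-printed and abbreviated); the function name is hypothetical, and only the `attr` field is searched.

```javascript
// Sketch: audit JSON log lines for a known sensitive value.
// Assumes one JSON log event per line; returns the indexes of lines whose
// `attr` field contains the value anywhere in a string.
function findLeaks(logLines, sensitiveValue) {
  const leaks = [];
  const contains = (node) => {
    if (typeof node === "string") return node.includes(sensitiveValue);
    if (node !== null && typeof node === "object") {
      return Object.values(node).some(contains); // recurse into objects and arrays
    }
    return false;
  };
  logLines.forEach((line, i) => {
    let event;
    try {
      event = JSON.parse(line);
    } catch {
      return; // skip lines that are not valid JSON
    }
    if (contains(event.attr)) leaks.push(i);
  });
  return leaks;
}
```

For instance, the `log` array returned by db.adminCommand({ getLog: "global" }) could be passed as `logLines`.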
Diagnosing Performance Issues
As you develop and operate applications with MongoDB, you may want to analyze the performance of the database as well as that of the application. MongoDB Performance discusses some of the operational factors that can influence performance.
Replication and Monitoring
Beyond the basic monitoring requirements for any MongoDB instance, for replica sets, administrators must monitor replication lag. "Replication lag" refers to the amount of time that it takes to copy (that is, replicate) a write operation on the primary to a secondary. A small delay may be acceptable, but significant problems emerge as replication lag grows, including:
- Growing cache pressure on the primary.
- Operations that occurred during the period of lag are not replicated to one or more secondaries. If you're using replication to ensure data persistence, exceptionally long delays may impact the integrity of your data set.
- If the replication lag exceeds the length of the operation log (oplog), then MongoDB will have to perform an initial sync on the secondary, copying all data from the primary and rebuilding all indexes. [1] This is uncommon under normal circumstances, but if you configure the oplog to be smaller than the default, the issue can arise.
Note
The --oplogSize argument to the mongod command and, preferably, the oplogSizeMB setting in the MongoDB configuration file only set the oplog size on the first run. If you do not specify a size before the first run with the --replSet option, mongod creates an oplog of the default size. By default, the oplog is 5 percent of total available disk space on 64-bit systems. For more information about changing the oplog size afterward, see Change the Oplog Size of Self-Managed Replica Set Members.
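The relationship between lag and oplog length can be checked with simple arithmetic: the oplog window is the time span between the oldest and newest entries it currently holds (mongosh's rs.printReplicationInfo() reports this span), and a secondary whose lag approaches that window is at risk of needing an initial sync. The sketch below is an approximation under stated assumptions; the function name is hypothetical, and it assumes the first and last oplog entry times are available as Date objects.

```javascript
// Sketch: compare a secondary's replication lag with the oplog window.
// Assumes firstEntry and lastEntry are Date objects for the oldest and
// newest oplog entries on the sync source, and lagSeconds is the lag.
function oplogHeadroom(firstEntry, lastEntry, lagSeconds) {
  const windowSeconds = (lastEntry.getTime() - firstEntry.getTime()) / 1000;
  return {
    windowSeconds,
    headroomSeconds: windowSeconds - lagSeconds,
    // Approximate: once lag reaches the window, the entries the secondary
    // still needs may have been removed, forcing an initial sync.
    needsResync: lagSeconds >= windowSeconds,
  };
}
```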
Flow Control
Administrators can limit the rate at which the primary applies its writes with the goal of keeping the majority committed lag under a configurable maximum value, flowControlTargetLagSeconds.
By default, flow control is enabled.
See also: Check the Replication Lag.
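To watch the quantity flow control aims to bound, you can estimate the majority committed lag from rs.status() on the primary by comparing the wall-clock times of the last applied and last majority-committed operations. This sketch is an approximation, not flow control's internal computation; the function name is hypothetical, it assumes `optimes.lastAppliedWallTime` and `optimes.lastCommittedWallTime` are Date objects, and the 10-second default mirrors the default flowControlTargetLagSeconds.

```javascript
// Sketch: estimate majority committed lag from rs.status() on the primary
// and compare it with the flow control target.
// Assumes status.optimes.lastAppliedWallTime and
// status.optimes.lastCommittedWallTime are Date objects.
function majorityCommitLag(status, targetLagSeconds = 10) {
  const applied = status.optimes.lastAppliedWallTime.getTime();
  const committed = status.optimes.lastCommittedWallTime.getTime();
  const lagSeconds = Math.max(0, (applied - committed) / 1000);
  return { lagSeconds, overTarget: lagSeconds > targetLagSeconds };
}
```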
Replica Set Status
Replication issues are most often the result of network connectivity issues between members, or the result of a primary that does not have the resources to support application and replication traffic. To check the status of a replica set, use the replSetGetStatus command or the following helper in the shell:
rs.status()
The replSetGetStatus reference provides a more in-depth overview of this output. In general, watch the value of optimeDate, and pay particular attention to the time difference between the primary and the secondary members.
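The time difference described above can be computed directly from rs.status() output, as in the sketch below (mongosh's rs.printSecondaryReplicationInfo() reports similar information). The function name is hypothetical; it assumes each member has `name`, `stateStr`, and an `optimeDate` Date holding the wall-clock time of the member's last applied operation.

```javascript
// Sketch: per-secondary replication lag from rs.status() output.
// Assumes each entry of status.members has name, stateStr, and optimeDate.
function replicationLagSeconds(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  if (!primary) return null; // no primary visible from this member
  return status.members
    .filter((m) => m.stateStr === "SECONDARY")
    .map((m) => ({
      name: m.name,
      lagSeconds: (primary.optimeDate.getTime() - m.optimeDate.getTime()) / 1000,
    }));
}
```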
[1] Starting in MongoDB 4.0, the oplog can grow past its configured size limit to avoid deleting the majority commit point.
Slow Application of Oplog Entries
Secondary members of a replica set log oplog entries that take longer than the slow operation threshold to apply. These slow oplog messages:
- Are logged for the secondaries in the diagnostic log.
- Are logged under the REPL component with the text applied op: <oplog entry> took <num>ms.
- Do not depend on the log levels (either at the system or component level).
- Do not depend on the profiling level.
- Are affected by slowOpSampleRate.
The profiler does not capture slow oplog entries.
Sharding and Monitoring
In most cases, the components of sharded clusters benefit from the same monitoring and analysis as all other MongoDB instances. In addition, clusters require further monitoring to ensure that data is effectively distributed among nodes and that sharding operations are functioning appropriately.
Config Servers
The config database maintains a map identifying which documents are on which shards. The cluster updates this map as chunks move between shards. When a configuration server becomes inaccessible, certain sharding operations become unavailable, such as moving chunks and starting mongos instances. However, clusters remain accessible from already-running mongos instances.
Because inaccessible configuration servers can seriously impact the availability of a sharded cluster, you should monitor your configuration servers to ensure that the cluster remains well balanced and that mongos instances can restart.
MongoDB Cloud Manager and Ops Manager monitor config servers and can create notifications if a config server becomes inaccessible. See the MongoDB Cloud Manager documentation and Ops Manager documentation for more information.
Balancing and Chunk Distribution
The most effective sharded cluster deployments evenly balance chunks among the shards. To facilitate this, MongoDB has a background balancer process that distributes data to ensure that chunks are always optimally distributed among the shards.
Run the sh.status() or db.printShardingStatus() method against the mongos from within mongosh. This returns an overview of the entire cluster, including the database names and a list of the chunks.
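For a quick numeric view of how evenly chunks are spread, the sketch below summarizes per-shard chunk counts. The function name is hypothetical, and depending on the MongoDB version the balancer may weigh data size per shard rather than raw chunk counts, so treat the result as a rough indicator only. It assumes input shaped like the output of grouping config.chunks documents by their `shard` field.

```javascript
// Sketch: measure how evenly chunks are spread across shards.
// Assumes counts is an array of { _id: <shard name>, chunks: <number> }.
function chunkSpread(counts) {
  if (counts.length === 0) return null;
  const values = counts.map((c) => c.chunks);
  const max = Math.max(...values);
  const min = Math.min(...values);
  const mean = values.reduce((a, b) => a + b, 0) / values.length;
  const busiest = counts.find((c) => c.chunks === max)._id;
  return { min, max, mean, spread: max - min, busiest };
}
```

From mongosh connected to a mongos, such counts could be gathered with db.getSiblingDB("config").chunks.aggregate([{ $group: { _id: "$shard", chunks: { $sum: 1 } } }]).toArray().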
Stale Locks
To check the lock status of the database, connect to a mongos instance using mongosh. Issue the following command sequence to switch to the config database and display all outstanding locks on the shard database:
use config
db.locks.find()
The balancing process takes a special "balancer" lock that prevents other balancing activity from occurring. In the config database, use the following command to view the "balancer" lock.
db.locks.find( { _id : "balancer" } )
The primary of the CSRS config server holds the "balancer" lock, using a process ID named "ConfigServer". This lock is never released. To determine if the balancer is running, see Check if Balancer is Running.
Storage Node Watchdog
Note
The Storage Node Watchdog is available in both the Community and MongoDB Enterprise editions.
The Storage Node Watchdog monitors the following MongoDB directories to detect filesystem unresponsiveness:
- The --dbpath directory
- The journal directory inside the --dbpath directory
- The directory of the --logpath file
- The directory of the --auditPath file
Note
Starting in MongoDB 6.1, journaling is always enabled. As a result, MongoDB removes the storage.journal.enabled option and the corresponding --journal and --nojournal command-line options.
By default, the Storage Node Watchdog is disabled. You can only enable the Storage Node Watchdog on a mongod at startup time by setting the watchdogPeriodSeconds parameter to an integer greater than or equal to 60. However, once enabled, you can pause the Storage Node Watchdog and restart it during runtime. See the watchdogPeriodSeconds parameter for details.
If any of the filesystems containing the monitored directories become unresponsive, the Storage Node Watchdog terminates the mongod and exits with a status code of 61. If the mongod is the primary of a replica set, the termination initiates a failover, allowing another member to become primary.
Once a mongod has terminated, it may not be possible to cleanly restart it on the same machine.
Note
Symlinks
If any of its monitored directories is a symlink to other volumes, the Storage Node Watchdog does not monitor the symlink target.
For example, if the mongod uses storage.directoryPerDB: true (or --directoryperdb) and symlinks a database directory to another volume, the Storage Node Watchdog does not follow the symlink to monitor the target.
The maximum time the Storage Node Watchdog can take to detect an unresponsive filesystem and terminate is nearly twice the value of watchdogPeriodSeconds.
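The two numeric rules in this section can be encoded as a small planning check: a startup value must be an integer of at least 60, and detection can take nearly twice the period. The function name is hypothetical, and because the text says "nearly" twice, returning exactly double is a conservative upper estimate rather than a precise figure.

```javascript
// Sketch: validate a watchdogPeriodSeconds value and estimate the
// worst-case time to detect an unresponsive filesystem and terminate.
function watchdogWorstCaseSeconds(periodSeconds) {
  if (!Number.isInteger(periodSeconds) || periodSeconds < 60) {
    throw new RangeError("watchdogPeriodSeconds must be an integer >= 60 to enable the watchdog");
  }
  // Detection takes nearly twice the period, so 2x is an upper estimate.
  return 2 * periodSeconds;
}
```

For example, a mongod started with --setParameter watchdogPeriodSeconds=60 would, by this estimate, terminate within about 120 seconds of its filesystem becoming unresponsive.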