Monitoring is a critical component of all database administration. A firm grasp of MongoDB's reporting will allow you to assess the state of your database and maintain your deployment without crisis. Additionally, a sense of MongoDB's normal operational parameters will allow you to diagnose problems before they escalate to failures.
This document presents an overview of the available monitoring utilities and the reporting statistics available in MongoDB. It also introduces diagnostic strategies and suggestions for monitoring replica sets and sharded clusters.
MongoDB provides various methods for collecting data about the state of a running MongoDB instance. Each strategy can help answer different questions and is useful in different contexts. These methods are complementary.
This section provides an overview of the reporting methods distributed with MongoDB. It also offers examples of the kinds of questions that each method is best suited to help you address.
New in version 4.0.
MongoDB offers free Cloud monitoring for standalones or replica sets.
By default, you can enable/disable free monitoring during runtime using db.enableFreeMonitoring() and db.disableFreeMonitoring().
Free monitoring provides up to 24 hours of data. For more details, see Free Monitoring.
The MongoDB distribution includes a number of utilities that quickly return statistics about instances' performance and activity. Typically, these are most useful for diagnosing issues and assessing normal operation.
mongostat
mongostat captures and returns the counts of database operations by type (e.g. insert, query, update, delete). These counts report on the load distribution on the server.
Use mongostat to understand the distribution of operation types and to inform capacity planning. See the mongostat reference page for details.
mongotop
mongotop tracks and reports the current read and write activity of a MongoDB instance, and reports these statistics on a per collection basis.
Use mongotop to check if your database activity and use match your expectations. See the mongotop reference page for details.
Changed in version 3.6.
MongoDB includes a number of commands that report on the state of the database.
These data may provide a finer level of granularity than the utilities discussed above. Consider using their output in scripts and programs to develop custom alerts, or to modify the behavior of your application in response to the activity of your instance. The db.currentOp() method is another useful tool for identifying the database instance's in-progress operations.
serverStatus
The serverStatus command, or db.serverStatus() from the shell, returns a general overview of the status of the database, detailing disk usage, memory use, connection, journaling, and index access. The command returns quickly and does not impact MongoDB performance.
serverStatus outputs an account of the state of a MongoDB instance. This command is rarely run directly. In most cases, the data is more meaningful when aggregated, as one would see with monitoring tools including MongoDB Cloud Manager and Ops Manager. Nevertheless, all administrators should be familiar with the data provided by serverStatus.
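As an illustration of using serverStatus output in a script, the following sketch computes what fraction of the connection pool is in use. It assumes the document has the shape returned by db.serverStatus(), with a connections sub-document holding current and available counts. The function name and the sample values are hypothetical.

```javascript
// Sketch: summarize connection use from a serverStatus-style document.
// Assumes `status.connections` has numeric `current` and `available` fields,
// as in the output of db.serverStatus().
function connectionUtilization(status) {
  const { current, available } = status.connections;
  const total = current + available;
  // Guard against a zero total so the result is always a finite number.
  return total === 0 ? 0 : current / total;
}

// Hypothetical sample values; in mongosh one might pass db.serverStatus() instead.
const sampleServerStatus = { connections: { current: 50, available: 150 } };
console.log(connectionUtilization(sampleServerStatus)); // prints 0.25
```

A script could compare this ratio against a threshold of your choosing to raise a custom alert.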
dbStats
The dbStats command, or db.stats() from the shell, returns a document that addresses storage use and data volumes. The dbStats output reflects the amount of storage used, the quantity of data contained in the database, and object, collection, and index counters.
Use this data to monitor the state and storage capacity of a specific database. This output also allows you to compare use between databases and to determine the average document size in a database.
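The average document size mentioned above can be derived from two dbStats fields. The sketch below assumes a document shaped like db.stats() output with objects (document count) and dataSize (bytes of uncompressed data). The function name and sample values are hypothetical, and dbStats also reports a precomputed avgObjSize field you may prefer.

```javascript
// Sketch: derive the average document size from a dbStats-style document.
// Assumes `stats.objects` and `stats.dataSize` are numbers, as in db.stats() output.
function averageDocumentSize(stats) {
  // An empty database has no meaningful average; return 0 instead of NaN.
  return stats.objects === 0 ? 0 : stats.dataSize / stats.objects;
}

// Hypothetical sample values; in mongosh one might pass db.stats() instead.
const sampleDbStats = { db: "test", objects: 2000, dataSize: 1024000 };
console.log(averageDocumentSize(sampleDbStats)); // prints 512
```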
collStats
The collStats command, or db.collection.stats() from the shell, provides statistics that resemble dbStats on the collection level, including a count of the objects in the collection, the size of the collection, the amount of disk space used by the collection, and information about its indexes.
replSetGetStatus
The replSetGetStatus command (rs.status() from the shell) returns an overview of your replica set's status. The replSetGetStatus document details the state and configuration of the replica set and statistics about its members.
Use this data to ensure that replication is properly configured, and to check the connections between the current host and the other members of the replica set.
These are monitoring tools provided as a hosted service, usually through a paid subscription.
VividCortex
Scout
Dashboard for MongoDB
New Relic
Datadog
Pandora FMS
During normal operation, mongod and mongos instances report a live account of all server activity and operations to either standard output or a log file. The following runtime settings control these options.
quiet
verbosity
You can also modify the logging verbosity during runtime with the logLevel parameter or the db.setLogLevel() method in the shell.
path
logAppend
You can specify these configuration options as command line arguments to mongod or mongos.
For example:
mongod -v --logpath /var/log/mongodb/server1.log --logappend
Starts a mongod instance in verbose mode, appending data to the log file at /var/log/mongodb/server1.log.
The following database commands also affect logging:
getLog
Returns the most recent messages from the mongod process log.
logRotate
Rotates the log files for mongod processes only.
Available in MongoDB Enterprise only.
A mongod running with security.redactClientLogData redacts messages associated with any given log event before logging, leaving only metadata, source files, or line numbers related to the event. security.redactClientLogData prevents potentially sensitive information from entering the system log at the cost of diagnostic detail.
For example, the following operation inserts a document into a mongod running without log redaction. The mongod has systemLog.component.command.verbosity set to 1:
db.clients.insertOne( { "name" : "Joe", "PII" : "Sensitive Information" } )
This operation produces the following log event:
2017-06-09T13:35:23.446-04:00 I COMMAND [conn1] command internal.clients appName: "MongoDB Shell" command: insert { insert: "clients", documents: [ { _id: ObjectId('593adc5b99001b7d119d0c97'), name: "Joe", PII: "Sensitive Information" } ], ordered: true } ...
A mongod running with security.redactClientLogData performing the same insert operation produces the following log event:
2017-06-09T13:45:18.599-04:00 I COMMAND [conn1] command internal.clients appName: "MongoDB Shell" command: insert { insert: "###", documents: [ { _id: "###", name: "###", PII: "###" } ], ordered: "###" }
Use redactClientLogData in conjunction with Encryption at Rest and TLS/SSL (Transport Encryption) to assist compliance with regulatory requirements.
As you develop and operate applications with MongoDB, you may want to analyze the performance of the application and its database. MongoDB Performance discusses some of the operational factors that can influence performance.
Beyond the basic monitoring requirements for any MongoDB instance, for replica sets, administrators must monitor replication lag. "Replication lag" refers to the amount of time that it takes to copy (i.e. replicate) a write operation on the primary to a secondary. Some small delay period may be acceptable, but significant problems emerge as replication lag grows.
If the replication lag exceeds the length of the operation log (oplog) then MongoDB will have to perform an initial sync on the secondary, copying all data from the primary and rebuilding all indexes. [1] This is uncommon under normal circumstances, but if you configure the oplog to be smaller than the default, the issue can arise.
The size of the oplog is only configurable during the first run using the --oplogSize argument to the mongod command, or preferably, the oplogSizeMB setting in the MongoDB configuration file. If you do not specify this on the command line before running with the --replSet option, mongod will create a default sized oplog.
By default, the oplog is 5 percent of total available disk space on 64-bit systems. For more information about changing the oplog size, see Change the Size of the Oplog.
Starting in MongoDB 4.2, administrators can limit the rate at which the primary applies its writes with the goal of keeping the majority committed lag under a configurable maximum value flowControlTargetLagSeconds.
By default, flow control is enabled.
For flow control to engage, the replica set/sharded cluster must have featureCompatibilityVersion (FCV) of 4.2 and read concern majority enabled. That is, enabled flow control has no effect if FCV is not 4.2 or if read concern majority is disabled.
See also: Check the Replication Lag.
Replication issues are most often the result of network connectivity issues between members, or the result of a primary that does not have the resources to support application and replication traffic. To check the status of a replica, use the replSetGetStatus command or the following helper in the shell:
rs.status()
The replSetGetStatus reference provides a more in-depth overview of this output. In general, watch the value of optimeDate, and pay particular attention to the time difference between the primary and the secondary members.
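One way to watch the optimeDate difference is to compute it per secondary. The sketch below assumes a document shaped like rs.status() output, where each entry in members has name, stateStr, and optimeDate (a Date) fields. The function name, host names, and timestamps are hypothetical.

```javascript
// Sketch: estimate replication lag per secondary from an rs.status()-style document.
// Assumes each member has `name`, `stateStr`, and `optimeDate` (a Date) fields.
function replicationLagSeconds(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  if (!primary) return null; // without a visible primary there is no reference point
  const lag = {};
  for (const member of status.members) {
    if (member.stateStr === "SECONDARY") {
      // Subtracting two Dates yields milliseconds; convert to seconds.
      lag[member.name] = (primary.optimeDate - member.optimeDate) / 1000;
    }
  }
  return lag;
}

// Hypothetical sample; in mongosh one might pass rs.status() instead.
const sampleReplStatus = {
  members: [
    { name: "host1.example.net:27017", stateStr: "PRIMARY", optimeDate: new Date("2024-01-01T00:00:10Z") },
    { name: "host2.example.net:27017", stateStr: "SECONDARY", optimeDate: new Date("2024-01-01T00:00:08Z") },
    { name: "host3.example.net:27017", stateStr: "SECONDARY", optimeDate: new Date("2024-01-01T00:00:10Z") },
  ],
};
console.log(replicationLagSeconds(sampleReplStatus));
```

For the sample above, host2 is 2 seconds behind the primary and host3 is caught up.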
[1] majority commit point.
Starting in version 4.0, MongoDB offers free monitoring for standalones and replica sets. For more information, see Free Monitoring.
Starting in version 4.2 (also available starting in 4.0.6), secondary members of a replica set now log oplog entries that take longer than the slow operation threshold to apply. These slow oplog messages:
Are written to the diagnostic log.
Are logged under the REPL component with the text applied op: <oplog entry> took <num>ms.
May be affected by slowOpSampleRate, depending on your MongoDB version.
The profiler does not capture slow oplog entries.
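Because slow oplog messages follow the pattern applied op: <oplog entry> took <num>ms, a log-processing script can extract the duration with a regular expression. The sketch below is one possible approach. The function name and the sample log line are hypothetical, and real log lines carry additional fields that vary by version.

```javascript
// Sketch: pull the duration out of a slow oplog message.
// Matches the documented text "applied op: <oplog entry> took <num>ms".
function parseSlowOplogDuration(line) {
  // The greedy ".*" backtracks to the last " took <num>ms" in the line,
  // so an oplog entry that itself contains the word "took" is still handled.
  const match = /applied op: .* took (\d+)ms/.exec(line);
  return match ? Number(match[1]) : null;
}

// Hypothetical sample line for illustration only.
const sampleLogLine = 'I REPL [repl-writer-worker-1] applied op: { op: "i", ns: "test.items" } took 153ms';
console.log(parseSlowOplogDuration(sampleLogLine)); // prints 153
```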
In most cases, the components of sharded clusters benefit from the same monitoring and analysis as all other MongoDB instances. In addition, clusters require further monitoring to ensure that data is effectively distributed among nodes and that sharding operations are functioning appropriately.
The config database maintains a map identifying which documents are on which shards. The cluster updates this map as chunks move between shards. When a configuration server becomes inaccessible, certain sharding operations become unavailable, such as moving chunks and starting mongos instances. However, clusters remain accessible from already-running mongos instances.
Because inaccessible configuration servers can seriously impact the availability of a sharded cluster, you should monitor your configuration servers to ensure that the cluster remains well balanced and that mongos instances can restart.
MongoDB Cloud Manager and Ops Manager monitor config servers and can create notifications if a config server becomes inaccessible. See the MongoDB Cloud Manager documentation and Ops Manager documentation for more information.
The most effective sharded cluster deployments evenly balance chunks among the shards. To facilitate this, MongoDB has a background balancer process that distributes data to ensure that chunks are always optimally distributed among the shards.
Issue the db.printShardingStatus() or sh.status() command to the mongos from within mongosh. This returns an overview of the entire cluster including the database name, and a list of the chunks.
To check the lock status of the database, connect to a mongos instance using mongosh. Issue the following command sequence to switch to the config database and display all outstanding locks on the shard database:
use config
db.locks.find()
The balancing process takes a special "balancer" lock that prevents other balancing activity from occurring. In the config database, use the following command to view the "balancer" lock.
db.locks.find( { _id : "balancer" } )
Changed in version 3.4.
The Storage Node Watchdog monitors the following MongoDB directories to detect filesystem unresponsiveness:
The --dbpath directory
The journal directory inside the --dbpath directory if journaling is enabled
The directory of the --logpath file
The directory of the --auditPath file
By default, the Storage Node Watchdog is disabled. You can only enable the Storage Node Watchdog on a mongod at startup time by setting the watchdogPeriodSeconds parameter to an integer greater than or equal to 60. However, once enabled, you can pause the Storage Node Watchdog and restart it during runtime. See the watchdogPeriodSeconds parameter for details.
If any of the filesystems containing the monitored directories becomes unresponsive, the Storage Node Watchdog terminates the mongod and exits with a status code of 61. If the mongod is the primary of a replica set, the termination initiates a failover, allowing another member to become primary.
Once a mongod has terminated, it may not be possible to cleanly restart it on the same machine.
If any of its monitored directories is a symlink to another volume, the Storage Node Watchdog does not monitor the symlink target.
For example, if the mongod uses storage.directoryPerDB: true (or --directoryperdb) and symlinks a database directory to another volume, the Storage Node Watchdog does not follow the symlink to monitor the target.
The maximum time the Storage Node Watchdog can take to detect an unresponsive filesystem and terminate is nearly twice the value of watchdogPeriodSeconds.
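When planning failover expectations, it can help to turn the "nearly twice" rule into a number. The sketch below treats twice the period as an upper bound and applies the "integer greater than or equal to 60" constraint described above. The function name is hypothetical, and the result is an estimate rather than a guarantee from the server.

```javascript
// Sketch: upper bound on Storage Node Watchdog detection time, in seconds.
// Uses the stated rule that detection takes at most nearly twice watchdogPeriodSeconds.
function watchdogDetectionBoundSeconds(watchdogPeriodSeconds) {
  // The watchdog can only be enabled with an integer period of at least 60.
  if (!Number.isInteger(watchdogPeriodSeconds) || watchdogPeriodSeconds < 60) {
    throw new RangeError("watchdogPeriodSeconds must be an integer >= 60 to enable the watchdog");
  }
  return 2 * watchdogPeriodSeconds;
}

console.log(watchdogDetectionBoundSeconds(60)); // prints 120
```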