FAQ: MongoDB Diagnostics
This document provides answers to common diagnostic questions and issues.
If you don't find the answer you're looking for, check the complete list of FAQs or post your question to the MongoDB Community.
Where can I find information about a mongod process that stopped running unexpectedly?

If mongod shuts down unexpectedly on a UNIX or UNIX-based platform, and if mongod fails to log a shutdown or error message, then check your system logs for messages pertaining to MongoDB. For example, for logs located in /var/log/messages, use the following commands:
sudo grep mongod /var/log/messages
sudo grep score /var/log/messages
Does TCP keepalive time affect MongoDB Deployments?

If you experience network timeouts or socket errors in communication between clients and servers, or between members of a sharded cluster or replica set, check the TCP keepalive value for the affected systems.
Many operating systems set this value to 7200 seconds (two hours) by default. For MongoDB, you will generally experience better results with a shorter keepalive value, on the order of 120 seconds (two minutes).
If your MongoDB deployment experiences keepalive-related issues, you must alter the keepalive value on all affected systems. This includes all machines running mongod or mongos processes and all machines hosting client processes that connect to MongoDB.
To view the keepalive setting on Linux, use one of the following commands:
sysctl net.ipv4.tcp_keepalive_time
Or:
cat /proc/sys/net/ipv4/tcp_keepalive_time
The value is measured in seconds.
Although the setting name includes ipv4, the tcp_keepalive_time value applies to both IPv4 and IPv6.
To change the tcp_keepalive_time value, you can use one of the following commands, supplying a <value> in seconds:
sudo sysctl -w net.ipv4.tcp_keepalive_time=<value>
Or:
echo <value> | sudo tee /proc/sys/net/ipv4/tcp_keepalive_time
These operations do not persist across system reboots. To persist the setting, add the following line to /etc/sysctl.conf, supplying a <value> in seconds, and reboot the machine:
net.ipv4.tcp_keepalive_time = <value>
Keepalive values greater than 300 seconds (5 minutes) will be overridden on mongod and mongos sockets and set to 300 seconds.
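In other words, the keepalive that mongod and mongos actually use on their sockets is the smaller of the system setting and 300 seconds. A small illustrative sketch of this capping rule (the helper name is my own):

```python
MONGODB_KEEPALIVE_CAP_SECONDS = 300  # mongod/mongos override anything larger

def effective_keepalive_seconds(system_keepalive: int) -> int:
    """Keepalive value mongod/mongos will actually use on their sockets."""
    return min(system_keepalive, MONGODB_KEEPALIVE_CAP_SECONDS)
```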
To view the keepalive setting on Windows, issue the following command:
reg query HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters /v KeepAliveTime
The registry value is not present by default. The system default, used if the value is absent, is 7200000 milliseconds (0x6ddd00 in hexadecimal).
To change the KeepAliveTime value, use the following command in an Administrator Command Prompt, where <value> is expressed in hexadecimal (e.g. 120000 is 0x1d4c0):
reg add HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\ /t REG_DWORD /v KeepAliveTime /d <value>
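Because KeepAliveTime is measured in milliseconds and the registry value is expressed in hexadecimal, converting a keepalive given in seconds takes two steps. A quick sketch of the arithmetic (the function name is illustrative):

```python
def keepalive_registry_value(seconds: int) -> str:
    """Convert a keepalive interval in seconds to the hexadecimal
    milliseconds string used for the Windows KeepAliveTime registry value."""
    return hex(seconds * 1000)

# 120 seconds -> 120000 ms -> 0x1d4c0
```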
Windows users should consider the Windows Server Technet Article on KeepAliveTime for more information on setting keepalive for MongoDB deployments on Windows systems. Keepalive values greater than or equal to 600000 milliseconds (10 minutes) will be ignored by mongod and mongos.
To view the keepalive setting on macOS, issue the following command:
sysctl net.inet.tcp.keepidle
The value is measured in milliseconds.
To change the net.inet.tcp.keepidle value, you can use the following command, supplying a <value> in milliseconds:
sudo sysctl net.inet.tcp.keepidle=<value>
This operation does not persist across system reboots, and must be set each time your system reboots. See your operating system's documentation for instructions on setting this value persistently. Keepalive values greater than or equal to 600000 milliseconds (10 minutes) will be ignored by mongod and mongos.
In macOS 10.15 Catalina, Apple no longer allows for configuration of the net.inet.tcp.keepidle option.
You will need to restart mongod and mongos processes for new system-wide keepalive settings to take effect.
Why does MongoDB log so many "Connection Accepted" events?

If you see a very large number of connection and re-connection messages in your MongoDB log, then clients are frequently connecting and disconnecting to the MongoDB server. This is normal behavior for applications that do not use request pooling, such as CGI. Consider using FastCGI, an Apache Module, or some other kind of persistent application server to decrease the connection overhead.
If these connections do not impact your performance, you can use the run-time quiet option or the command-line option --quiet to suppress these messages from the log.
What tools are available for monitoring MongoDB?

Starting in version 4.0, MongoDB offers free Cloud monitoring for standalones and replica sets. Free monitoring provides information about your deployment.
For more information, see Free Monitoring.
The MongoDB Cloud Manager and Ops Manager (an on-premise solution available in MongoDB Enterprise Advanced) include monitoring functionality, which collects data from running MongoDB deployments and provides visualization and alerts based on that data.
For more information, see also the MongoDB Cloud Manager documentation and Ops Manager documentation.
A full list of third-party tools is available as part of the Monitoring for MongoDB documentation.
Must my working set size fit RAM?

No. If the cache does not have enough space to load additional data, WiredTiger evicts pages from the cache to free up space.
The storage.wiredTiger.engineConfig.cacheSizeGB setting limits the size of the WiredTiger internal cache. The operating system will use the available free memory for filesystem cache, which allows the compressed MongoDB data files to stay in memory. In addition, the operating system will use any free RAM to buffer file system blocks and file system cache.
To accommodate the additional consumers of RAM, you may have to decrease the WiredTiger internal cache size.
The default WiredTiger internal cache size value assumes that there is a single mongod instance per machine. If a single machine contains multiple MongoDB instances, then you should decrease the setting to accommodate the other mongod instances.
If you run mongod in a container (e.g. lxc, cgroups, Docker, etc.) that does not have access to all of the RAM available in a system, you must set storage.wiredTiger.engineConfig.cacheSizeGB to a value less than the amount of RAM available in the container. The exact amount depends on the other processes running in the container. See memLimitMB.
To see statistics on the cache and eviction, use the serverStatus command. The wiredTiger.cache field holds the information on the cache and eviction.
...
"wiredTiger" : {
   ...
   "cache" : {
      "tracked dirty bytes in the cache" : <num>,
      "bytes currently in the cache" : <num>,
      "maximum bytes configured" : <num>,
      "bytes read into cache" : <num>,
      "bytes written from cache" : <num>,
      "pages evicted by application threads" : <num>,
      "checkpoint blocked page eviction" : <num>,
      "unmodified pages evicted" : <num>,
      "page split during eviction deepened the tree" : <num>,
      "modified pages evicted" : <num>,
      "pages selected for eviction unable to be evicted" : <num>,
      "pages evicted because they exceeded the in-memory maximum" : <num>,
      "pages evicted because they had chains of deleted items" : <num>,
      "failed eviction of pages that exceeded the in-memory maximum" : <num>,
      "hazard pointer blocked page eviction" : <num>,
      "internal pages evicted" : <num>,
      "maximum page size at eviction" : <num>,
      "eviction server candidate queue empty when topping up" : <num>,
      "eviction server candidate queue not empty when topping up" : <num>,
      "eviction server evicting pages" : <num>,
      "eviction server populating queue, but not evicting pages" : <num>,
      "eviction server unable to reach eviction goal" : <num>,
      "pages split during eviction" : <num>,
      "pages walked for eviction" : <num>,
      "eviction worker thread evicting pages" : <num>,
      "in-memory page splits" : <num>,
      "percentage overhead" : <num>,
      "tracked dirty pages in the cache" : <num>,
      "pages currently held in the cache" : <num>,
      "pages read into cache" : <num>,
      "pages written from cache" : <num>
   },
...
For an explanation of some key cache and eviction statistics, such as wiredTiger.cache.bytes currently in the cache and wiredTiger.cache.tracked dirty bytes in the cache, see wiredTiger.cache.
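As an illustration, the cache fill and dirty ratios can be derived from these fields once you have the serverStatus result as a dictionary (e.g. from a driver's serverStatus command). A hedged sketch; the function and key names of the summary dict are my own:

```python
def cache_usage(server_status: dict) -> dict:
    """Summarize WiredTiger cache pressure from a serverStatus document."""
    cache = server_status["wiredTiger"]["cache"]
    configured = cache["maximum bytes configured"]
    used = cache["bytes currently in the cache"]
    dirty = cache["tracked dirty bytes in the cache"]
    return {
        "fill_ratio": used / configured,    # fraction of the cache in use
        "dirty_ratio": dirty / configured,  # fraction holding dirty bytes
    }
```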
To adjust the size of the WiredTiger internal cache, see storage.wiredTiger.engineConfig.cacheSizeGB and --wiredTigerCacheSizeGB. Avoid increasing the WiredTiger internal cache size above its default value.
With WiredTiger, MongoDB utilizes both the WiredTiger internal cache and the filesystem cache.
Starting in MongoDB 3.4, the default WiredTiger internal cache size is the larger of either 50% of (RAM - 1 GB), or 256 MB.
For example, on a system with a total of 4 GB of RAM the WiredTiger cache will use 1.5 GB of RAM (0.5 * (4 GB - 1 GB) = 1.5 GB). Conversely, a system with a total of 1.25 GB of RAM will allocate 256 MB to the WiredTiger cache because that is more than half of the total RAM minus one gigabyte (0.5 * (1.25 GB - 1 GB) = 128 MB < 256 MB).
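The default sizing rule above can be expressed directly. A sketch of the calculation (the function is illustrative, not a MongoDB API):

```python
def default_wiredtiger_cache_bytes(total_ram_bytes: int) -> int:
    """Default WiredTiger internal cache size (MongoDB 3.4+):
    the larger of 50% of (RAM - 1 GB), or 256 MB."""
    GB = 1024 ** 3
    MB = 1024 ** 2
    return max(int(0.5 * (total_ram_bytes - 1 * GB)), 256 * MB)
```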
In some instances, such as when running in a container, the database can have memory constraints that are lower than the total system memory. In such instances, this memory limit, rather than the total system memory, is used as the maximum RAM available.
To see the memory limit, see hostInfo.system.memLimitMB.
By default, WiredTiger uses Snappy block compression for all collections and prefix compression for all indexes. Compression defaults are configurable at a global level and can also be set on a per-collection and per-index basis during collection and index creation.
Different representations are used for data in the WiredTiger internal cache versus the on-disk format.
Via the filesystem cache, MongoDB automatically uses all free memory that is not used by the WiredTiger cache or by other processes.
To adjust the size of the WiredTiger internal cache, see storage.wiredTiger.engineConfig.cacheSizeGB and --wiredTigerCacheSizeGB. Avoid increasing the WiredTiger internal cache size above its default value.
The storage.wiredTiger.engineConfig.cacheSizeGB setting limits the size of the WiredTiger internal cache. The operating system will use the available free memory for filesystem cache, which allows the compressed MongoDB data files to stay in memory. In addition, the operating system will use any free RAM to buffer file system blocks and file system cache.
To accommodate the additional consumers of RAM, you may have to decrease the WiredTiger internal cache size.
The default WiredTiger internal cache size value assumes that there is a single mongod instance per machine. If a single machine contains multiple MongoDB instances, then you should decrease the setting to accommodate the other mongod instances.
If you run mongod in a container (e.g. lxc, cgroups, Docker, etc.) that does not have access to all of the RAM available in a system, you must set storage.wiredTiger.engineConfig.cacheSizeGB to a value less than the amount of RAM available in the container. The exact amount depends on the other processes running in the container. See memLimitMB.
To view statistics on the cache and eviction rate, see the wiredTiger.cache field returned from the serverStatus command.
Sharded Cluster Diagnostics

The two most important factors in maintaining a successful sharded cluster are choosing an appropriate shard key and ensuring sufficient capacity to support current and future operations.
While you can change your shard key later, it is important to carefully consider your shard key choice to avoid scalability and performance issues. Continue reading for specific issues you may encounter in a production environment.
Your cluster must have sufficient data for sharding to make sense. Sharding works by migrating chunks between the shards until each shard has roughly the same number of chunks.
The default chunk size is 128 megabytes. MongoDB will not begin migrations until the imbalance of chunks in the cluster exceeds the migration threshold. This behavior helps prevent unnecessary chunk migrations, which can degrade the performance of your cluster as a whole.
If you have just deployed a sharded cluster, make sure that you have enough data to make sharding effective. If you do not have sufficient data to create more than eight 128 megabyte chunks, then all data will remain on one shard. Either lower the chunk size setting, or add more data to the cluster.
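To see why, you can estimate the chunk count from the data size: at the default 128 MB chunk size, roughly 1 GB of data yields only eight chunks, too few for the balancer to distribute meaningfully. An illustrative calculation (the function name is my own):

```python
import math

def approx_chunk_count(data_size_mb: float, chunk_size_mb: int = 128) -> int:
    """Rough number of chunks a collection's data would occupy,
    at the given chunk size (MongoDB's default is 128 MB)."""
    return math.ceil(data_size_mb / chunk_size_mb)
```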
As a related problem, the system will split chunks only on inserts or updates, which means that if you configure sharding and do not continue to issue insert and update operations, the database will not create any chunks. You can either wait until your application inserts data or split chunks manually.
Finally, if your shard key has a low cardinality, MongoDB may not be able to create sufficient splits among the data.
In some situations, a single shard or a subset of the cluster will receive a disproportionate portion of the traffic and workload. In almost all cases this is the result of a shard key that does not effectively allow write scaling.
It's also possible that you have "hot chunks." In this case, you may be able to solve the problem by splitting and then migrating parts of these chunks.
You may have to consider resharding your collection with a different shard key to correct this pattern.
If you have just deployed your sharded cluster, you may want to consider the troubleshooting suggestions for a new cluster where data remains on a single shard.
If the cluster was initially balanced, but later developed an uneven distribution of data, consider the following possible causes:
Your data set is growing faster than the balancer can distribute data around the cluster. This is uncommon and typically is the result of:
If migrations impact your cluster or application's performance, consider the following options, depending on the nature of the impact:
If the balancer is always migrating chunks to the detriment of overall cluster performance:
It's also possible that your shard key causes your application to direct all writes to a single shard. This kind of activity pattern can require the balancer to migrate most data soon after writing it. You may have to consider resharding your collection with a different shard key that provides better write scaling.