
FAQ: MongoDB Diagnostics

This document provides answers to common diagnostic questions and issues.

If you don't find the answer you're looking for, check the complete list of FAQs or post your question to the MongoDB Community.

Where can I find information about a mongod process that stopped running unexpectedly?

If mongod shuts down unexpectedly on a UNIX or UNIX-based platform, and if mongod fails to log a shutdown or error message, then check your system logs for messages pertaining to MongoDB. For example, for logs located in /var/log/messages, use the following commands:

sudo grep mongod /var/log/messages
sudo grep score /var/log/messages
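
If your platform routes system logs to the systemd journal instead of /var/log/messages, a similar check (a sketch, assuming journalctl and dmesg are available) is:

sudo journalctl | grep -i mongod
sudo dmesg | grep -i "out of memory"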

Does TCP keepalive time affect MongoDB Deployments?

If you experience network timeouts or socket errors in communication between clients and servers, or between members of a sharded cluster or replica set, check the TCP keepalive value for the affected systems.

Many operating systems set this value to 7200 seconds (two hours) by default. For MongoDB, you will generally experience better results with a shorter keepalive value, on the order of 120 seconds (two minutes).

If your MongoDB deployment experiences keepalive-related issues, you must alter the keepalive value on all affected systems. This includes all machines running mongod or mongos processes and all machines hosting client processes that connect to MongoDB.

Adjusting the TCP keepalive value:

  • To view the keepalive setting on Linux, use one of the following commands:

    sysctl net.ipv4.tcp_keepalive_time

    Or:

    cat /proc/sys/net/ipv4/tcp_keepalive_time

    The value is measured in seconds.

    Note

    Although the setting name includes ipv4, the tcp_keepalive_time value applies to both IPv4 and IPv6.

  • To change the tcp_keepalive_time value, you can use one of the following commands, supplying a <value> in seconds:

    sudo sysctl -w net.ipv4.tcp_keepalive_time=<value>

    Or:

    echo <value> | sudo tee /proc/sys/net/ipv4/tcp_keepalive_time

    These operations do not persist across system reboots. To persist the setting, add the following line to /etc/sysctl.conf, supplying a <value> in seconds, and reboot the machine:

    net.ipv4.tcp_keepalive_time = <value>
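
    Alternatively, on most Linux distributions you can load the values from /etc/sysctl.conf immediately, without waiting for a reboot:

    sudo sysctl -p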

    Keepalive values greater than 300 seconds (5 minutes) will be overridden on mongod and mongos sockets and set to 300 seconds.

  • To view the keepalive setting on Windows, issue the following command:

    reg query HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters /v KeepAliveTime

    The registry value is not present by default. The system default, used if the value is absent, is 7200000 milliseconds, or 0x6ddd00 in hexadecimal.

  • To change the KeepAliveTime value, use the following command in an Administrator Command Prompt, where <value> is expressed in hexadecimal (e.g. 120000 is 0x1d4c0):

    reg add HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\ /t REG_DWORD /v KeepAliveTime /d <value>
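
    For example, to set a two-minute keepalive (120000 milliseconds, which is 0x1d4c0 in hexadecimal), the command would be the following; treat this concrete value as an illustration rather than a recommendation:

    reg add HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\ /t REG_DWORD /v KeepAliveTime /d 0x1d4c0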

    Windows users should consult the Windows Server Technet article on KeepAliveTime for more information on setting keepalive for MongoDB deployments on Windows systems. Keepalive values greater than or equal to 600000 milliseconds (10 minutes) will be ignored by mongod and mongos.

  • To view the keepalive setting on macOS, issue the following command:

    sysctl net.inet.tcp.keepidle

    The value is measured in milliseconds.

  • To change the net.inet.tcp.keepidle value, you can use the following command, supplying a <value> in milliseconds:

    sudo sysctl net.inet.tcp.keepidle=<value>

    This operation does not persist across system reboots, and must be set each time your system reboots. See your operating system's documentation for instructions on setting this value persistently. Keepalive values greater than or equal to 600000 milliseconds (10 minutes) will be ignored by mongod and mongos.

    Note

    In macOS 10.15 Catalina, Apple no longer allows configuration of the net.inet.tcp.keepidle option.

You will need to restart mongod and mongos processes for new system-wide keepalive settings to take effect.

Do TCP Retransmission Timeouts affect MongoDB Deployments?

If you experience long stalls (stalls greater than two minutes) followed by network timeouts or socket errors between clients and servers, or between members of a sharded cluster or replica set, check the tcp_retries2 value for the affected systems.

Most Linux operating systems set this value to 15 by default, while Windows sets it to 5. For MongoDB, you will experience better results with a lower tcp_retries2 value, on the order of 5 (12 seconds) or lower.
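
The 12-second figure can be sketched from the retransmission backoff, assuming Linux's minimum retransmission timeout of roughly 200 milliseconds and exponential doubling between attempts: with tcp_retries2 = 5, a dead connection is abandoned after about 0.2 + 0.4 + 0.8 + 1.6 + 3.2 + 6.4 ≈ 12.6 seconds.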

If your MongoDB deployment experiences TCP retransmission timeout-related issues, change the tcp_retries2 value (TcpMaxDataRetransmissions on Windows) for all affected systems. This includes all machines running mongod or mongos processes and all machines hosting client processes that connect to MongoDB.

Adjust the TCP Retransmission Timeout

On most Linux operating systems, control the TCP retransmission by adjusting the net.ipv4.tcp_retries2 sysctl setting.

Note

Although the setting name includes ipv4, the tcp_retries2 setting applies to both IPv4 and IPv6.

  • To view the current setting, use the sysctl command:

    sysctl net.ipv4.tcp_retries2
    net.ipv4.tcp_retries2 = 15
  • To change the tcp_retries2 setting at runtime, use the sysctl command:

    sysctl -w net.ipv4.tcp_retries2=8
  • To make the change permanent, edit the configuration file:

    1. Open /etc/sysctl.conf in your preferred text editor:

      vi /etc/sysctl.conf
    2. Configure the net.ipv4.tcp_retries2 setting:

      net.ipv4.tcp_retries2 = 8
    3. Restart the system.

    Your system now uses the new tcp_retries2 setting.

On Windows, control TCP retransmission by adjusting the TcpMaxDataRetransmissions parameter.

  • To view the TcpMaxDataRetransmissions setting on Windows, issue the following command:

    reg query HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters /v TcpMaxDataRetransmissions

    By default, the parameter is not set. The system default, used if the value is absent, is 5 retries.

  • To change the TcpMaxDataRetransmissions value, use the following command in an Administrator Command Prompt, where <value> is an integer:

    reg add HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\ /t REG_DWORD /v TcpMaxDataRetransmissions /d <value>

Why does MongoDB log so many "Connection Accepted" events?

If you see a very large number of connection and re-connection messages in your MongoDB log, then clients are frequently connecting to and disconnecting from the MongoDB server. This is normal behavior for applications that do not use request pooling, such as CGI. Consider using FastCGI, an Apache module, or some other kind of persistent application server to decrease the connection overhead.

If these connections do not impact your performance, you can use the run-time quiet option or the command-line option --quiet to suppress these messages from the log.
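
As a sketch, either of the following enables quiet logging; the configuration file path is an assumption for illustration:

mongod --quiet --config /etc/mongod.conf
mongosh --eval 'db.adminCommand( { setParameter: 1, quiet: true } )'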

What tools are available for monitoring MongoDB?

Starting in version 4.0, MongoDB offers free Cloud monitoring for standalones and replica sets. Free monitoring provides information about your deployment, including:

  • Operation Execution Times
  • Memory Usage
  • CPU Usage
  • Operation Counts

For more information, see Free Monitoring.

MongoDB Cloud Manager and Ops Manager, an on-premise solution available in MongoDB Enterprise Advanced, include monitoring functionality that collects data from running MongoDB deployments and provides visualization and alerts based on that data.

For more information, see the MongoDB Cloud Manager documentation and Ops Manager documentation.

A full list of third-party tools is available as part of the Monitoring for MongoDB documentation.

Memory Diagnostics for the WiredTiger Storage Engine

Must my working set size fit RAM?

No.

If the cache does not have enough space to load additional data, WiredTiger evicts pages from the cache to free up space.

Note

The storage.wiredTiger.engineConfig.cacheSizeGB limits the size of the WiredTiger internal cache. The operating system will use the available free memory for filesystem cache, which allows the compressed MongoDB data files to stay in memory. In addition, the operating system will use any free RAM to buffer file system blocks and file system cache.

To accommodate the additional consumers of RAM, you may have to decrease WiredTiger internal cache size.

The default WiredTiger internal cache size value assumes that there is a single mongod instance per machine. If a single machine contains multiple MongoDB instances, then you should decrease the setting to accommodate the other mongod instances.

If you run mongod in a container (e.g. lxc, cgroups, Docker, etc.) that does not have access to all of the RAM available in a system, you must set storage.wiredTiger.engineConfig.cacheSizeGB to a value less than the amount of RAM available in the container. The exact amount depends on the other processes running in the container. See memLimitMB.
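
As an illustrative sketch only (the image tag, memory limit, and cache size are assumptions, not recommendations), a container capped at 4 GB of RAM might explicitly limit the WiredTiger cache well below that cap:

docker run -d --memory=4g mongo:6.0 --wiredTigerCacheSizeGB 1.5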

To see statistics on the cache and eviction, use the serverStatus command. The wiredTiger.cache field holds the information on the cache and eviction.

...
"wiredTiger" : {
...
"cache" : {
"tracked dirty bytes in the cache" : <num>,
"bytes currently in the cache" : <num>,
"maximum bytes configured" : <num>,
"bytes read into cache" :<num>,
"bytes written from cache" : <num>,
"pages evicted by application threads" : <num>,
"checkpoint blocked page eviction" : <num>,
"unmodified pages evicted" : <num>,
"page split during eviction deepened the tree" : <num>,
"modified pages evicted" : <num>,
"pages selected for eviction unable to be evicted" : <num>,
"pages evicted because they exceeded the in-memory maximum" : <num>,,
"pages evicted because they had chains of deleted items" : <num>,
"failed eviction of pages that exceeded the in-memory maximum" : <num>,
"hazard pointer blocked page eviction" : <num>,
"internal pages evicted" : <num>,
"maximum page size at eviction" : <num>,
"eviction server candidate queue empty when topping up" : <num>,
"eviction server candidate queue not empty when topping up" : <num>,
"eviction server evicting pages" : <num>,
"eviction server populating queue, but not evicting pages" : <num>,
"eviction server unable to reach eviction goal" : <num>,
"pages split during eviction" : <num>,
"pages walked for eviction" : <num>,
"eviction worker thread evicting pages" : <num>,
"in-memory page splits" : <num>,
"percentage overhead" : <num>,
"tracked dirty pages in the cache" : <num>,
"pages currently held in the cache" : <num>,
"pages read into cache" : <num>,
"pages written from cache" : <num>,
},
...

For an explanation of some key cache and eviction statistics, such as wiredTiger.cache.bytes currently in the cache and wiredTiger.cache.tracked dirty bytes in the cache, see wiredTiger.cache.
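
For example, one way to read two of these fields from the shell (a sketch, assuming a deployment reachable on the default port):

mongosh --quiet --eval 'const c = db.serverStatus().wiredTiger.cache; print(c["bytes currently in the cache"], c["maximum bytes configured"])'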

To adjust the size of the WiredTiger internal cache, see storage.wiredTiger.engineConfig.cacheSizeGB and --wiredTigerCacheSizeGB. Avoid increasing the WiredTiger internal cache size above its default value.

How do I calculate how much RAM I need for my application?

With WiredTiger, MongoDB utilizes both the WiredTiger internal cache and the filesystem cache.

Starting in MongoDB 3.4, the default WiredTiger internal cache size is the larger of either:

  • 50% of (RAM - 1 GB), or
  • 256 MB.

For example, on a system with a total of 4 GB of RAM, the WiredTiger cache will use 1.5 GB of RAM (0.5 * (4 GB - 1 GB) = 1.5 GB). Conversely, a system with a total of 1.25 GB of RAM will allocate 256 MB to the WiredTiger cache because that is more than half of the total RAM minus one gigabyte (0.5 * (1.25 GB - 1 GB) = 128 MB < 256 MB).

Note

In some instances, such as when running in a container, the database can have memory constraints that are lower than the total system memory. In such instances, this memory limit, rather than the total system memory, is used as the maximum RAM available.

To see the memory limit, see hostInfo.system.memLimitMB.
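
A quick way to check this limit from the shell (a sketch, assuming a locally reachable deployment):

mongosh --quiet --eval 'db.hostInfo().system.memLimitMB'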

By default, WiredTiger uses Snappy block compression for all collections and prefix compression for all indexes. Compression defaults are configurable at a global level and can also be set on a per-collection and per-index basis during collection and index creation.

Different representations are used for data in the WiredTiger internal cache versus the on-disk format:

  • Data in the filesystem cache is the same as the on-disk format, including benefits of any compression for data files. The filesystem cache is used by the operating system to reduce disk I/O.
  • Indexes loaded in the WiredTiger internal cache have a different data representation to the on-disk format, but can still take advantage of index prefix compression to reduce RAM usage. Index prefix compression deduplicates common prefixes from indexed fields.
  • Collection data in the WiredTiger internal cache is uncompressed and uses a different representation from the on-disk format. Block compression can provide significant on-disk storage savings, but data must be uncompressed to be manipulated by the server.

Via the filesystem cache, MongoDB automatically uses all free memory that is not used by the WiredTiger cache or by other processes.

To adjust the size of the WiredTiger internal cache, see storage.wiredTiger.engineConfig.cacheSizeGB and --wiredTigerCacheSizeGB. Avoid increasing the WiredTiger internal cache size above its default value.

Note

The storage.wiredTiger.engineConfig.cacheSizeGB limits the size of the WiredTiger internal cache. The operating system will use the available free memory for filesystem cache, which allows the compressed MongoDB data files to stay in memory. In addition, the operating system will use any free RAM to buffer file system blocks and file system cache.

To accommodate the additional consumers of RAM, you may have to decrease WiredTiger internal cache size.

The default WiredTiger internal cache size value assumes that there is a single mongod instance per machine. If a single machine contains multiple MongoDB instances, then you should decrease the setting to accommodate the other mongod instances.

If you run mongod in a container (e.g. lxc, cgroups, Docker, etc.) that does not have access to all of the RAM available in a system, you must set storage.wiredTiger.engineConfig.cacheSizeGB to a value less than the amount of RAM available in the container. The exact amount depends on the other processes running in the container. See memLimitMB.

To view statistics on the cache and eviction rate, see the wiredTiger.cache field returned from the serverStatus command.

Sharded Cluster Diagnostics

The two most important factors in maintaining a successful sharded cluster are:

  • choosing an appropriate shard key, and
  • sufficient capacity to support current and future operations.

While you can change your shard key later, it is important to carefully consider your shard key choice to avoid scalability and performance issues. Continue reading for specific issues you may encounter in a production environment.

In a new sharded cluster, why does all data remain on one shard?

Your cluster must have sufficient data for sharding to make sense. Sharding works by migrating chunks between the shards until each shard has roughly the same number of chunks.

The default chunk size is 128 megabytes. MongoDB will not begin migrations until the imbalance of chunks in the cluster exceeds the migration threshold. This behavior helps prevent unnecessary chunk migrations, which can degrade the performance of your cluster as a whole.

If you have just deployed a sharded cluster, make sure that you have enough data to make sharding effective. If you do not have sufficient data to create more than eight 128 megabyte chunks, then all data will remain on one shard. Either lower the chunk size setting, or add more data to the cluster.

As a related problem, the system will split chunks only on inserts or updates, which means that if you configure sharding and do not continue to issue insert and update operations, the database will not create any chunks. You can either wait until your application inserts data or split chunks manually, as in the sketch below.
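
From mongosh connected to a mongos, a manual split at a given shard key value might look like the following; the namespace and key value here are hypothetical:

mongosh --eval 'sh.splitAt("records.people", { zipcode: "63109" })'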

Finally, if your shard key has a low cardinality, MongoDB may not be able to create sufficient splits among the data.

Why would one shard receive a disproportionate amount of traffic in a sharded cluster?

In some situations, a single shard or a subset of the cluster will receive a disproportionate portion of the traffic and workload. In almost all cases this is the result of a shard key that does not effectively allow write scaling.

It's also possible that you have "hot chunks." In this case, you may be able to solve the problem by splitting and then migrating parts of these chunks.

You may have to consider resharding your collection with a different shard key to correct this pattern.

What can prevent a sharded cluster from balancing?

If you have just deployed your sharded cluster, you may want to consider the troubleshooting suggestions for a new cluster where data remains on a single shard.

If the cluster was initially balanced, but later developed an uneven distribution of data, consider the following possible causes:

  • You have deleted or removed a significant amount of data from the cluster. If you have added additional data, it may have a different distribution with regards to its shard key.
  • Your shard key has low cardinality and MongoDB cannot split the chunks any further.
  • Your data set is growing faster than the balancer can distribute data around the cluster. This is uncommon and typically is the result of:

    • a balancing window that is too short, given the rate of data growth (see the sketch after this list for how to adjust the window).
    • an uneven distribution of write operations that requires more data migration. You may have to choose a different shard key to resolve this issue.
    • poor network connectivity between shards, which may lead to chunk migrations that take too long to complete. Investigate your network configuration and interconnections between shards.
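
As a sketch of adjusting the balancing window, you can set an active window in the config database from an interactive mongosh session connected to a mongos; the start and stop times here are hypothetical:

use config
db.settings.updateOne(
   { _id: "balancer" },
   { $set: { activeWindow: { start: "23:00", stop: "06:00" } } },
   { upsert: true }
)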

Why do chunk migrations affect sharded cluster performance?

If migrations impact your cluster or application's performance, consider the following options, depending on the nature of the impact:

  1. If migrations only interrupt your clusters sporadically, you can limit the balancing window to prevent balancing activity during peak hours. Ensure that there is enough time remaining to keep the data from becoming out of balance again.
  2. If the balancer is always migrating chunks to the detriment of overall cluster performance, consider decreasing the chunk size to limit the size of each migration, or adding one or two shards to the cluster if it may be over capacity.

It's also possible that your shard key causes your application to direct all writes to a single shard. This kind of activity pattern can require the balancer to migrate most data soon after writing it. You may have to consider resharding your collection with a different shard key that provides better write scaling.