MQTT Client Management Solution

In IoT projects, MQTT is the application-layer protocol that carries device data, and the MQTT client is the component that actually transmits it, so managing clients well is particularly important.

Limitations

The number of MQTT clients that a single application can support is limited. In Java, for example, creating one MQTT client consumes 5 threads by default, even before the client performs any operation.

Ignoring operating-system limits, the maximum number of threads can be estimated with the following formula: Number of threads = (available machine memory - heap memory allocated to the JVM) / -Xss value. For example, with an 8 GB container, a 4096 MB heap, and the default -Xss (1 MB on 64-bit Linux), the maximum is (8192 MB - 4096 MB) / 1 MB = 4096 threads. At 5 threads per client, that is about 819 MQTT clients at most.
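The estimate above can be written out as a small calculation. This is only a sketch: the class and method names are hypothetical, the 1 MiB stack size assumes the usual 64-bit Linux default for -Xss, and the figure of 5 threads per client is taken from the text rather than measured.

```java
/** Rough upper bound on threads and MQTT clients, ignoring OS limits. */
final class ThreadCapacityEstimator {
    static final long MIB = 1024L * 1024L;

    /** (available memory - JVM heap) / thread stack size (-Xss), all in bytes. */
    static long maxThreads(long availableBytes, long heapBytes, long stackBytes) {
        if (stackBytes <= 0 || heapBytes > availableBytes) {
            throw new IllegalArgumentException("invalid memory sizes");
        }
        return (availableBytes - heapBytes) / stackBytes;
    }

    /** Number of clients that fit if each one needs threadsPerClient threads. */
    static long maxClients(long maxThreads, int threadsPerClient) {
        if (threadsPerClient <= 0) {
            throw new IllegalArgumentException("threadsPerClient must be positive");
        }
        return maxThreads / threadsPerClient;
    }

    public static void main(String[] args) {
        // 8 GiB container, 4096 MiB heap, assumed 1 MiB stack per thread.
        long threads = maxThreads(8192 * MIB, 4096 * MIB, 1 * MIB);
        System.out.println(threads + " threads, " + maxClients(threads, 5) + " clients");
    }
}
```

In practice the real ceiling is lower, because metaspace, direct buffers, and the application's own threads also consume memory outside the heap.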

Each MQTT client is long-lived. In IoT projects, an MQTT client often runs continuously for several years, so future expansion must be considered during the solution design phase.

Solution Design

Given these limitations, the two most important design requirements are:

1. Limit the maximum number of MQTT clients that the current application can create through configuration files.
2. Support multiple application instances, with load balancing and failover between them.

The first point is relatively easy to achieve: read the limit from a configuration file and refuse to create clients beyond it. The biggest difficulty lies in the second point: how to achieve load balancing and failover across multiple application instances.
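A minimal sketch of the configuration-based limit follows. The class name and the property key `mqtt.client.max` (with a default of 100) are assumptions made for illustration, not part of any existing library or of the original design.

```java
import java.util.Properties;
import java.util.concurrent.atomic.AtomicInteger;

/** Caps the number of MQTT clients one application instance may create. */
final class ClientSlotLimiter {
    private final int maxClients;
    private final AtomicInteger inUse = new AtomicInteger();

    ClientSlotLimiter(int maxClients) {
        if (maxClients < 0) {
            throw new IllegalArgumentException("maxClients must be >= 0");
        }
        this.maxClients = maxClients;
    }

    /** Reads the limit from the hypothetical "mqtt.client.max" property. */
    static ClientSlotLimiter fromProperties(Properties props) {
        return new ClientSlotLimiter(
                Integer.parseInt(props.getProperty("mqtt.client.max", "100")));
    }

    /** Reserves one slot; returns false when the limit is already reached. */
    boolean tryAcquire() {
        while (true) {
            int current = inUse.get();
            if (current >= maxClients) {
                return false;
            }
            if (inUse.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    /** Frees a slot after a client is closed; never drops below zero. */
    void release() {
        inUse.updateAndGet(n -> n > 0 ? n - 1 : 0);
    }

    /** Slots still available on this instance. */
    int remaining() {
        return maxClients - inUse.get();
    }
}
```

The caller would invoke `tryAcquire()` before constructing a client and `release()` after closing it, so the counter stays in step with the clients that actually exist.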

Load Balancing

This load balancing solution rests on one prerequisite: every MQTT client is assumed to behave the same way, so no client carries a disproportionately heavy workload.

Under this prerequisite, the load balancing strategy can be extremely simple: among all application instances, select the one with the most remaining capacity to create the MQTT client.

Remaining capacity = maximum number of MQTT clients - number of MQTT clients already created
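As an illustration, the simple strategy can be sketched as picking the instance with the most free slots, where free slots are the maximum number of clients minus the number already created. The `Instance` type and its fields are hypothetical; a real system would fill them from whatever registry tracks the application instances.

```java
import java.util.List;
import java.util.Optional;

/** Picks the application instance with the most free MQTT client slots. */
final class CapacitySelector {

    /** Hypothetical view of one application instance. */
    static final class Instance {
        final String id;
        final int maxClients;
        final int createdClients;

        Instance(String id, int maxClients, int createdClients) {
            this.id = id;
            this.maxClients = maxClients;
            this.createdClients = createdClients;
        }

        int remaining() {
            return maxClients - createdClients;
        }
    }

    /** Returns empty when every instance is already full. */
    static Optional<Instance> select(List<Instance> instances) {
        Instance best = null;
        for (Instance candidate : instances) {
            if (candidate.remaining() <= 0) {
                continue; // full instances cannot host a new client
            }
            if (best == null || candidate.remaining() > best.remaining()) {
                best = candidate;
            }
        }
        return Optional.ofNullable(best);
    }
}
```

An empty result is a natural signal that a new application instance is needed.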

Note, however, that MQTT clients doing the same work do not necessarily do it at the same time, because physical devices report data at different moments. A more refined load balancing algorithm can therefore combine several weighted metrics, as follows.

$$\text{Remaining capacity ratio} = \frac{\text{Maximum number of MQTT clients} - \text{Current number of MQTT clients}}{\text{Maximum number of MQTT clients}}$$

$$\text{CPU load ratio} = \frac{\text{Current CPU usage}}{100}$$

$$\text{Memory load ratio} = \frac{\text{Current memory usage}}{\text{Maximum memory capacity}}$$

$$\text{Load score} = w_1 \times (1 - \text{Remaining capacity ratio}) + w_2 \times \text{CPU load ratio} + w_3 \times \text{Memory load ratio}$$

The capacity term enters as 1 minus the remaining capacity ratio so that a fuller instance scores higher, in the same direction as the CPU and memory terms.

Here $w_1$, $w_2$, and $w_3$ are the weights of the remaining capacity ratio, CPU load ratio, and memory load ratio respectively. The application instance with the lowest load score is selected to create the new MQTT client.
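The weighted score can be sketched as below. Two things are assumptions of this sketch rather than statements from the design: the `Stats` type and its fields are hypothetical, and the capacity term is taken as 1 minus the remaining capacity ratio so that a lower score always means a less loaded instance. The weights are passed in by the caller and would need tuning per deployment.

```java
import java.util.List;

/** Selects the instance with the lowest weighted load score. */
final class LoadScoreSelector {

    /** Hypothetical snapshot of one application instance. */
    static final class Stats {
        final String id;
        final int maxClients;
        final int currentClients;
        final double cpuUsagePercent; // 0..100
        final long usedMemory;
        final long maxMemory;

        Stats(String id, int maxClients, int currentClients,
              double cpuUsagePercent, long usedMemory, long maxMemory) {
            this.id = id;
            this.maxClients = maxClients;
            this.currentClients = currentClients;
            this.cpuUsagePercent = cpuUsagePercent;
            this.usedMemory = usedMemory;
            this.maxMemory = maxMemory;
        }
    }

    static double loadScore(Stats s, double w1, double w2, double w3) {
        double remainingCapacityRatio =
                (double) (s.maxClients - s.currentClients) / s.maxClients;
        double cpuLoadRatio = s.cpuUsagePercent / 100.0;
        double memoryLoadRatio = (double) s.usedMemory / s.maxMemory;
        // Assumption: occupancy = 1 - remaining capacity, so lower is better.
        return w1 * (1.0 - remainingCapacityRatio)
                + w2 * cpuLoadRatio
                + w3 * memoryLoadRatio;
    }

    /** Returns null when every instance is already full. */
    static Stats selectLowest(List<Stats> all, double w1, double w2, double w3) {
        Stats best = null;
        double bestScore = Double.POSITIVE_INFINITY;
        for (Stats s : all) {
            if (s.currentClients >= s.maxClients) {
                continue; // no free slot on this instance
            }
            double score = loadScore(s, w1, w2, w3);
            if (score < bestScore) {
                best = s;
                bestScore = score;
            }
        }
        return best;
    }
}
```

For example, with weights 0.5, 0.3, and 0.2, an instance at 50% client occupancy, 20% CPU, and 50% memory scores about 0.41, while one at 30% occupancy but 90% CPU and 90% memory scores about 0.60, so the first is chosen even though it hosts more clients.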

Horizontal Scaling

Because the load balancing solution assumes that all MQTT clients behave the same way, scaling only requires increasing the number of application instances.

Horizontal scaling timing: trigger scaling when a single application instance has used 80% of its maximum number of MQTT clients.

Note: the application instance can use a message notification mechanism to tell operations staff or an automated program to create new application instances.
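One way to sketch the trigger is a small check that runs whenever the client count changes and fires a notification once each time the threshold is crossed. The class name and the `Consumer<String>` notifier are assumptions; the notifier stands in for whatever channel (message queue, webhook, pager) the deployment actually uses.

```java
import java.util.function.Consumer;

/** Fires a scale-out notification when client usage crosses a threshold. */
final class ScaleOutTrigger {
    private final int thresholdPercent;
    private final Consumer<String> notifier; // hypothetical notification channel
    private boolean alreadyNotified;

    ScaleOutTrigger(int thresholdPercent, Consumer<String> notifier) {
        this.thresholdPercent = thresholdPercent;
        this.notifier = notifier;
    }

    /** Integer arithmetic avoids rounding problems right at the threshold. */
    static boolean reached(int current, int max, int thresholdPercent) {
        return max > 0 && (long) current * 100 >= (long) max * thresholdPercent;
    }

    /** Call after each client is created or closed; notifies once per crossing. */
    synchronized void onClientCountChanged(String instanceId, int current, int max) {
        boolean over = reached(current, max, thresholdPercent);
        if (over && !alreadyNotified) {
            notifier.accept("Instance " + instanceId + " uses " + current
                    + "/" + max + " MQTT clients; scale out");
        }
        alreadyNotified = over;
    }
}
```

Notifying only on the crossing, rather than on every change above 80%, keeps the receiver from being flooded while an instance stays near its limit.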

[Figure: MQTT client management solution]

Notes:

The facade in the figure can be any component that can communicate with the MQTT client management instances, such as Nginx or another application.
The selection mechanism is part of the MQTT client management instance, not a separate program.

Failover

Failover keeps MQTT clients running normally when something goes wrong. There are two main fault scenarios:

An MQTT client is removed through EMQX. This is treated as a misoperation, but only for MQTT clients created within the cluster. It can be detected through the client's offline notification mechanism: in a Java program, for example, the MqttCallback#connectionLost method is called when the MQTT client is disconnected, so it can serve as the basis for judging whether a misoperation occurred.
An application instance dies unexpectedly, destroying all MQTT clients it hosts. Either of the following schemes can address this:
1. Monitor the application instance with an external program and trigger the failover mechanism when the instance dies. Note: this requires maintaining an additional monitoring program and keeping that program itself running.
2. Use a tool such as Redis for expiration monitoring: when the heartbeat data of an application instance expires, trigger the failover mechanism. Note: each application instance needs an additional scheduled task that periodically writes heartbeat data to Redis, which increases thread consumption.
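The heartbeat-expiration idea in the second scheme can be illustrated without Redis. The sketch below is an in-memory stand-in, not a Redis client: the class name is hypothetical, timestamps are passed in explicitly so the logic is easy to test, and a map plays the role of keys with a time-to-live. In a Redis-backed version, each instance would instead refresh its own key with an expiry (for example `SET key value EX seconds`), and a missing key would mean the instance is presumed dead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** In-memory stand-in for heartbeat keys that expire after a TTL. */
final class HeartbeatMonitor {
    private final long ttlMillis;
    private final Map<String, Long> lastBeat = new ConcurrentHashMap<>();

    HeartbeatMonitor(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Called periodically by each application instance's scheduled task. */
    void beat(String instanceId, long nowMillis) {
        lastBeat.put(instanceId, nowMillis);
    }

    /** Removes and returns instances whose last heartbeat is older than the TTL. */
    List<String> collectExpired(long nowMillis) {
        List<String> expired = new ArrayList<>();
        for (Map.Entry<String, Long> entry : lastBeat.entrySet()) {
            boolean stale = nowMillis - entry.getValue() > ttlMillis;
            // remove(key, value) only succeeds if no newer beat arrived meanwhile.
            if (stale && lastBeat.remove(entry.getKey(), entry.getValue())) {
                expired.add(entry.getKey());
            }
        }
        return expired;
    }
}
```

Each instance ID returned by `collectExpired` would then be handed to the failover mechanism, which recreates that instance's MQTT clients on the surviving instances.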

[Figure: MQTT client keep-alive]