Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory leak in publish operation #1345

Closed
7eRoM opened this issue Jul 15, 2019 · 9 comments
Closed

Memory leak in publish operation #1345

7eRoM opened this issue Jul 15, 2019 · 9 comments

Comments

@7eRoM
Copy link

7eRoM commented Jul 15, 2019

Consider these two scenarios:
Scenario 1:

  1. The broker is active and is listening
  2. A publisher connects to it
  3. The broker become inactive and goes down for some reason
  4. The publisher tries to publish a message. Get 'MOSQ_ERR_NO_CONN' error message.

In this scenario there is no memory leak. But pay attention to this scenario:
Scenario 2:

  1. The broker is down and is not listening
  2. The publisher tries to connect to the broker
  3. Whether the connection is established or not, the publisher tries to publish a message . It must get 'MOSQ_ERR_NO_CONN' error message but It gets 'MOSQ_ERR_ERRNO' error message on account of a poor error handling.

This bad error handling leads to a memory leak. According to the QoS of the message, size of memory leak will be differ. If QoS is 0, the memory will be consumed as much as the payloadlen. If the QoS is 1, the size will be multiplied by two.
For example:
QoS=0, payloadlen=250MB --> Consumed memory in memory leak: 250MB
QoS=1, payloadlen=250MB --> Consumed memory in memory leak: 500MB

According to the code available on https://github.com/iosphere/mosquitto, I did these back trace:
Scenario 1:
https://github.com/iosphere/mosquitto/blob/master/lib/send_mosq.c#L103
if(mosq->sock == INVALID_SOCKET) return MOSQ_ERR_NO_CONN;
_mosquitto_send_publish (send_mosq.c#L103)
mosquitto_publish (mosquitto.c.)
publish (mosquittopp.cpp)

Scenario 2:
https://github.com/iosphere/mosquitto/blob/master/lib/net_mosq.c#L799
return MOSQ_ERR_ERRNO;
_mosquitto_packet_write (net_mosq.c#L799)
_mosquitto_packet_queue (net_mosq.c)
_mosquitto_send_real_publish (send_mosq.c)
_mosquitto_send_publish (send_mosq.c)
mosquitto_publish (mosquitto.c.)
publish (mosquittopp.cpp)

Note
Maybe you say that is the publisher fault. He must check the connection status and then publish his message.
Yes, but the library is handling this exception and returning 'MOSQ_ERR_ERRNO' error message. So, it must complete the task and clean up the memory. otherwise, it seems a segmentation fault is better than a bad exception handling.

@karlp
Copy link
Contributor

karlp commented Jul 15, 2019

and what about on current code? you filed the ticket here, why didn't you look at the code here instead of some years old independent fork?

@7eRoM
Copy link
Author

7eRoM commented Jul 15, 2019

and what about on current code? you filed the ticket here, why didn't you look at the code here instead of some years old independent fork?

First of all, I just wrote the trace back for hint.
I`m using the latest compatible version of this library in Ubuntu 16.04 and this bug still exist.

I`m not sure but I think there is a back trace for scenario 2 like this:

https://github.com/eclipse/mosquitto/blob/master/lib/packet_mosq.c#L219
return MOSQ_ERR_ERRNO;
packet__write (packet_mosq.c#L219)
packet__queue (packet_mosq.c)
send__real_publish (send_publish.c)
send_publish (send_publish.c)

@ralight
Copy link
Contributor

ralight commented Jul 24, 2019

Thanks for the report. Do you have some example code that shows this behaviour? I've just tried to implement what I think you mean, but haven't been able to reproduce it. I've tried on version 1.6.3 and 1.4.8.

@7eRoM
Copy link
Author

7eRoM commented Jul 29, 2019

Thanks for the report. Do you have some example code that shows this behaviour? I've just tried to implement what I think you mean, but haven't been able to reproduce it. I've tried on version 1.6.3 and 1.4.8.

First, sorry for the late reply.

I have written a poc and compile that. The problem still exists.
But here is a manually rewritten of that. I did not refactor the code. Sorry for misspelling and non-formatting.

mqtt.h

#ifndef MQTT_H
#define MQTT_H

#include <string>
#include <cstdio>
#include <cstring>
#include <iostream>
#include <mosquittopp.h>

using namespace std;

#define DEAFULT_KEEP_ALIVE 60

class mqtt : public mosqpp::mosquittopp
{
    public:
        mqtt(const char* id);

        int init_mqtt(string host, int port);
        void on_connect(int rc);
        void on_disconnet(int rc);
};

#endif //MQTT_H

mqtt.cpp

#include "mqtt.h"

using namespace std;

mqtt::mqtt(const char *id) : mosquittopp(id)
{

}

int mqtt::init_mqtt(string host, int port)
{
    string log;
    int ret;

    mosqpp::lib_init();

    ret = connect_async(host.c_str(), port, 60);
    if(ret != MOSQ_ERR_SUCCESS)
    {
        if(ret == MOSQ_ERR_ERRNO)
        {
            log = "(" + host + ":" + to_string(port) + "): connect_async failed! (" + to_string(errno) + ": " + strerror(errno) + ")";
            cout << log << endl;
            return __LINE__;
        }
        else
        {
            log = "(" + host + ":" + to_string(port) + "): connect_async failed! (" + to_string(ret) + ": " + string(mosquitto_strerror(ret)) + ")";
            cout << log << endl;
            return __LINE__;
        }
    }

    ret = loop_start();
    if(ret != MOSQ_ERR_SUCCESS)
    {
        if(ret == MOSQ_ERR_ERRNO)
        {
            log = "(" + host + ":" + to_string(port) + "): loop_start failed! (" + to_string(errno) + ": " + strerror(errno) + ")";
            cout << log << endl;
            return __LINE__;
        }
        else
        {
            log = "(" + host + ":" + to_string(port) + "): loop_start failed! (" + to_string(ret) + ": " + string(mosquitto_strerror(ret)) + ")";
            cout << log << endl;
            return __LINE__;
        }
    }
}

void mqtt::on_connect(int rc)
{
    if(rc == MOSQ_ERR_SUCCESS)
        cout << "Connected successfully to the broker" << endl;
    else
        cout << "Error in connection to the broker" << endl;
}

void on_disconnet(int rc)
{
    if(rc == MOSQ_ERR_SUCCESS)
        cout << "Disconnected successfully from the broker" << endl;
    else
        cout << "Error in disconnection from the broker" << endl;
}

main.cpp

#include <cstring>
#include <cstdio>
#include <string>
#include <iostream>
#include <cstdlib>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <memory>
#include <fstream>
#include <sys/stat.h>

#include "mqtt.h"

using namespace std;

int main(int argc, char* argv[])
{
    int ret = 0;
    string log;

    string id = "my_id";
    int port = 6101;
    string host = "localhost";
    int mid;
    string file_path = "250MB.bin";
    string topic = "my_topic";
    int qos = 0;
    bool retain = false;

    mqtt my_mqtt(id.c_str());
    my_mqtt.init_mqtt(host, port);

    struct stat path_stat;
    stat(file_path.c_str(), &path_stat);
    long payloadlen = path_stat.st_size;
    
    unique_ptr<char []> buffer(new char[payloadlen]);

    FILE *fp = fopen(file_path.c_str(), "rb");
    if(fp == NULL)
    {
        log = "fopen failed in file '" + file_path + "': " + strerror(errno);
        cout << log << endl;
    }

    long nread = 0;
    long total_nread = 0;
    while(total_nread < payloadlen)
    {
        nread = fread(buffer.get() + total_nread, sizeof(char), payloadlen - total_nread, fp);
        total_nread += nread;
    }
    fclose(fp);

    cout << "[After 3 publications] First please check the RAM usage (free -m) and then press any key to continue..." << endl;
    getchar();

    // publish 3 times
    int published_ret = my_mqtt.publish(&mid, topic.c_str(), payloadlen, buffer.get(), qos, retain);
    if(published_ret == MOSQ_ERR_SUCCESS)
    {
        log = "File '" + file_path + "'published successfully";
        cout << log << endl;
    }
    else if(published_ret == MOSQ_ERR_SUCCESS)
    {
        log = "File '" + file_path + "' is not published (" + to_string(published_ret) + ": " + strerror(errno) + ")";
        cout << log << endl;
    }
    else
    {
        log = "File '" + file_path + "' is not published (" + to_string(published_ret) + ": " + string(mosquitto_strerror(published_ret)) + ")";
        cout << log << endl;
    }


    published_ret = my_mqtt.publish(&mid, topic.c_str(), payloadlen, buffer.get(), qos, retain);
    if(published_ret == MOSQ_ERR_SUCCESS)
    {
        log = "File '" + file_path + "'published successfully";
        cout << log << endl;
    }
    else if(published_ret == MOSQ_ERR_SUCCESS)
    {
        log = "File '" + file_path + "' is not published (" + to_string(published_ret) + ": " + strerror(errno) + ")";
        cout << log << endl;
    }
    else
    {
        log = "File '" + file_path + "' is not published (" + to_string(published_ret) + ": " + string(mosquitto_strerror(published_ret)) + ")";
        cout << log << endl;
    }


    published_ret = my_mqtt.publish(&mid, topic.c_str(), payloadlen, buffer.get(), qos, retain);
    if(published_ret == MOSQ_ERR_SUCCESS)
    {
        log = "File '" + file_path + "'published successfully";
        cout << log << endl;
    }
    else if(published_ret == MOSQ_ERR_SUCCESS)
    {
        log = "File '" + file_path + "' is not published (" + to_string(published_ret) + ": " + strerror(errno) + ")";
        cout << log << endl;
    }
    else
    {
        log = "File '" + file_path + "' is not published (" + to_string(published_ret) + ": " + string(mosquitto_strerror(published_ret)) + ")";
        cout << log << endl;
    }

    cout << "[After 3 publications] And now, please check again the RAM usage (free -m) and then press any key to exit..." << endl;

    getchar();
    return 0;
}   

The poc tries to connect to (localhost:6101) which no broker is listening to.
And then it publishes a 250MB file for 3 times. (We repeat the publication for 3 times to show more clearly the memmory leak).

Before executing the program the RAM usage is as show below:

free -m
		total		used		free		shared		buff/cache		available
Mem:		 7896		 249		7255		    36		       391		     7363
Swap:		 7617		   3		7624		

Before publishing the RAM usage is as show below:

free -m
		total		used		free		shared		buff/cache		available
Mem:		 7896		 500		7255		    36		       391		     7363
Swap:		 7617		   3		7624	

And after publishing the RAM usage is as show below:

free -m
		total		used		free		shared		buff/cache		available
Mem:		 7896		1252		6253                36		       391		     7363
Swap:		 7617		   3		7624	

Used RAM: 1252 - 249 =~ 1000MB --> 250MB (The file in RAM) + 250MB (MeMLeak from publication number 1) + 250MB (MeMLeak from publication number 2) + 250MB (MeMLeak from publication number 3)

@karlp
Copy link
Contributor

karlp commented Jul 29, 2019

run it under valgrind if you have such a nice test case, it's much more useful at finding what's going on than free, which looks at the entire system

@ralight
Copy link
Contributor

ralight commented Jul 29, 2019

I haven't had chance to look at this yet because of the sheer number of compiler errors there are. It's all very well apologising for the typos and mistakes in your test case, but it suggests that you haven't actually tested the test case and are hoping it shows what you want, and besides that I find it very impolite that you expect anybody who would like to help you to fix all of your errors first.

My guess is that what you are seeing is similar to what this shows: https://github.com/ralight/heap-allocation/blob/master/mem.c

I will fix the errors later and see what I can find.

@7eRoM
Copy link
Author

7eRoM commented Jul 30, 2019

I haven't had chance to look at this yet because of the sheer number of compiler errors there are. It's all very well apologising for the typos and mistakes in your test case, but it suggests that you haven't actually tested the test case and are hoping it shows what you want, and besides that I find it very impolite that you expect anybody who would like to help you to fix all of your errors first.

My guess is that what you are seeing is similar to what this shows: https://github.com/ralight/heap-allocation/blob/master/mem.c

I will fix the errors later and see what I can find.

As I said, I compiled my test case and sent you the result as shown above. I`ve got shocked with your non-professional behaviour and unfriendly words.

Agian, as I said, becuase of some problems, I have not direct access to my developing system and I have to rewrite the code manually in my internet system and then send to you.

Anyway, I`ve editted codes of the test case and think there will be no execuses!

@ralight
Copy link
Contributor

ralight commented Jul 30, 2019

I guess we're even then, we're both shocked by each others lack of consideration. It is quite wearing that so many people contacting me expect me to do all of the work for them, and I apologise that you have been the recipient of my frustration.

This isn't a memory leak, it is just queueing the packets when it shouldn't be. If you connect again or free the client, the memory will be freed. I agree that it shouldn't be held in limbo like this though. If you use connect() instead of connect_async(), the memory is not held at all. I have a patch that will make connect_async() work just the same as connect() in this regard.

ralight added a commit that referenced this issue Jul 30, 2019
`mosquitto_connect_async()` is now consistent with `mosquitto_connect()`
when connecting to a non-existent server.

Closes #1345. Thanks to Mohammad Reza.
@7eRoM
Copy link
Author

7eRoM commented Jul 31, 2019

People contant you because you did a great work: Mosquitto lib in C/C++!
Keep it up!

Anyway, thank you for consideration and the trying to make this lib better and better.
All the best.

@ralight ralight added this to the 1.6.4 milestone Aug 1, 2019
@ralight ralight closed this as completed Aug 1, 2019
vankxr pushed a commit to vankxr/mosquitto that referenced this issue Aug 9, 2019
`mosquitto_connect_async()` is now consistent with `mosquitto_connect()`
when connecting to a non-existent server.

Closes eclipse#1345. Thanks to Mohammad Reza.
ralight added a commit that referenced this issue Sep 18, 2019
`mosquitto_connect_async()` is now consistent with `mosquitto_connect()`
when connecting to a non-existent server.

Closes #1345. Thanks to Mohammad Reza.
@lock lock bot locked as resolved and limited conversation to collaborators Oct 30, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants