Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Device unable to connect after restart command #1579

Closed
bjoernsauer opened this issue Nov 10, 2022 · 11 comments
Closed

Device unable to connect after restart command #1579

bjoernsauer opened this issue Nov 10, 2022 · 11 comments
Assignees
Labels
bug Something isn't working theme:mqtt Theme: mqtt and mosquitto related topics

Comments

@bjoernsauer
Copy link

Describe the bug

Performing a device restart from cumulocity restarts the device. However the device is not able to connect to the tenant after the restart.
Log output of the tedge-mapper-c8y:

2022-11-10T14:23:14.3352263Z  INFO flockfile::unix: Lockfile created "/run/lock/tedge-mapper-c8y.lock"
2022-11-10T14:23:14.40770605Z  INFO init: c8y_api::http_proxy: Initialisation
2022-11-10T14:23:24.40987676Z ERROR init: c8y_api::http_proxy: An error occurred while retrieving internal Id, operation will retry in 60 seconds and mapper will reinitialise.
 Error: RequestTimeout
2022-11-10T14:24:34.41163103Z ERROR init: c8y_api::http_proxy: An error occurred while retrieving internal Id, operation will retry in 60 seconds and mapper will reinitialise.
 Error: RequestTimeout

The error repeats until a manual reboot. After the manual reboot the device can connect to cumulocity again without an error.

Log after manual reboot:

2022-11-10T14:29:30.59818096Z  INFO flockfile::unix: Lockfile created "/run/lock/tedge-mapper-c8y.lock"
2022-11-10T14:29:30.66734297Z  INFO init: c8y_api::http_proxy: Initialisation
2022-11-10T14:29:31.14584877Z  INFO init: c8y_api::http_proxy: Initialisation done.
2022-11-10T14:29:31.14796399Z  INFO tedge_mapper::core::mapper: tedge-mapper-c8y starting
2022-11-10T14:29:31.15365901Z  INFO tedge-mapper-c8y: tedge_mapper::core::mapper: Running

To Reproduce

Issue a restart command.

Expected behavior

The device should connect after a restart command.

Environment:

  • OS: Custom yocto build
  • Phoenix Contact AXC F 2152, Custom embedded hardware ARMv7
  • System-Architecture: Linux axcf2152 5.4.47-rt28-pxc # 1 SMP PREEMPT_RT Wed Dec 15 20:26:30 UTC 2021 armv7l armv7l armv7l GNU/Linux
  • thin-edge.io version 0.8.0
@bjoernsauer bjoernsauer added the bug Something isn't working label Nov 10, 2022
@reubenmiller
Copy link
Contributor

Thanks @bjoernsauer for the details. We will see if we can reproduce it, otherwise we will reach out for some more details.

@reubenmiller
Copy link
Contributor

@bjoernsauer After the reboot, could you check if the mosquitto.service started properly?

If you are using systemctl, you can check this via:

sudo systemctl status mosquitto.service

And it should return something like this

$ sudo systemctl status mosquitto.service
● mosquitto.service - Mosquitto MQTT Broker
     Loaded: loaded (/lib/systemd/system/mosquitto.service; enabled; vendor preset: enabled)
     Active: active (running) since Sun 2022-11-13 23:17:28 CET; 2 days ago
       Docs: man:mosquitto.conf(5)
             man:mosquitto(8)
    Process: 2003 ExecReload=/bin/kill -HUP $MAINPID (code=exited, status=0/SUCCESS)
   Main PID: 546 (mosquitto)
      Tasks: 1 (limit: 4915)
        CPU: 1min 37.976s
     CGroup: /system.slice/mosquitto.service
             └─546 /usr/sbin/mosquitto -c /etc/mosquitto/mosquitto.conf

@cmosd cmosd mentioned this issue Nov 16, 2022
16 tasks
@bjoernsauer
Copy link
Author

Hi @reubenmiller, I'm using a sysvinit system via a self defined system.toml file.
I've checked mosquitto and yes it's running.

@reubenmiller
Copy link
Contributor

reubenmiller commented Nov 21, 2022

@bjoernsauer Could you please post your system.toml file so that we can try to reproduce it?

The restart command should invoke init 6. Could you also try to send this manually (using sudo or root)?

@bjoernsauer
Copy link
Author

Hi @reubenmiller The restart is not the problem manually calling init 6performs well. The device does the reboot. After the reboot I see the same issue in the tedge_mapper c8y output log:

2022-11-21T08:41:21.30833385Z  INFO flockfile::unix: Lockfile created "/run/lock/tedge-mapper-c8y.lock"
2022-11-21T08:41:21.41281743Z  INFO init: c8y_api::http_proxy: Initialisation
2022-11-21T08:41:31.41429271Z ERROR init: c8y_api::http_proxy: An error occurred while retrieving internal Id, operation will retry in 60 seconds and mapper will reinitialise.
 Error: RequestTimeout

system.toml

[init]
name = "sysv"
is_available = ["/etc/tedge/system_helper.sh", "available"]
restart = ["/etc/tedge/system_helper.sh", "initd", "{}", "restart"]
stop =  ["/etc/tedge/system_helper.sh", "initd", "{}", "stop"]
enable =  ["/etc/tedge/system_helper.sh", "updaterc", "{}", "enable"]
disable =  ["/etc/tedge/system_helper.sh", "updaterc", "{}", "disable"]
is_active = ["/etc/tedge/system_helper.sh", "initd", "{}", "status"]

script file system_helper.sh

#!/bin/sh

mode=$1
service=$2
arg=$3

systemctl_available () {
    command -v systemctl >/dev/null 2>&1;
}

do_check_sysv_is_available () {
    if systemctl_available; then
        echo "systemd init detected."
        exit 1
    fi
    if [ -L /sbin/init ]; then
        if [ $(readlink /sbin/init) = "/sbin/init.sysvinit" ]; then
            echo "SysV init detected."
            exit 0;
        elif [ $(readlink /sbin/init) = "/lib/systemd/systemd" ]; then
            echo "systemd init detected."
            exit 1
        else
            echo "No SysV init detected."
            exit 2;
        fi
    # if init is not a symlink we have sysv
    elif [ -f /sbin/init ]; then
        echo "SysV init detected."
        exit 0;
    else
        echo "No SysV init detected."
        exit 3;
    fi
}

do_initd_script_call () {
    /etc/init.d/${service} $arg
    exit $?
}

do_updaterc_call () {
    if [ $arg = "enable" ]; then
        update-rc.d $service defaults 98 20
        update-rc.d $service enable
    else
        update-rc.d $service disable
    fi
    exit $?
}

if [ $mode = "available" ]; then
    do_check_sysv_is_available
elif [ $mode = "initd" ]; then
    do_initd_script_call
elif [ $mode = "updaterc" ]; then
    do_updaterc_call
else
    exit 1
fi

@reubenmiller
Copy link
Contributor

The logs are very useful there, could you modify how the tedge_mapper is started to include the --debug argument?

For example in systemd this would be editing the service definition (though your usage might differ slightly if you are using sysvinit)

File: /usr/lib/systemd/system/tedge-mapper-c8y.service

[Unit]
Description=tedge-mapper-c8y converts Thin Edge JSON measurements to Cumulocity JSON format.
After=syslog.target network.target mosquitto.service

[Service]
User=tedge
ExecStart=/usr/bin/tedge_mapper c8y --debug
Restart=on-failure
RestartPreventExitStatus=255

[Install]
WantedBy=multi-user.target

@bjoernsauer
Copy link
Author

Hi @reubenmiller, here is the log after the device reboot.
The error message repeats until I disconnect the device and connect ist again.

2022-11-24T10:58:41.1667011Z  INFO flockfile::unix: Lockfile created "/run/lock/tedge-mapper-c8y.lock"
2022-11-24T10:58:41.25894187Z TRACE mio::poll: registering event source with poller: token=Token(1), interests=READABLE | WRITABLE
2022-11-24T10:58:41.26022658Z DEBUG rumqttc::state: Subscribe. Topics = [Filter = c8y/s/dat, Qos = AtLeastOnce], Pkid = 1
2022-11-24T10:58:41.26167645Z  INFO init: c8y_api::http_proxy: Initialisation
2022-11-24T10:58:41.26224065Z DEBUG rumqttc::state: Publish. Topic = c8y/s/uat, Pkid = 2, Payload Size = 0
2022-11-24T10:58:51.26317512Z ERROR init: c8y_api::http_proxy: An error occurred while retrieving internal Id, operation will retry in 60 seconds and mapper will reinitialise.
 Error: RequestTimeout
2022-11-24T10:59:41.26061356Z DEBUG rumqttc::state: Pingreq,
            last incoming packet before 59996 millisecs,
            last outgoing request before 59998 millisecs
2022-11-24T10:59:51.26517084Z DEBUG rumqttc::state: Publish. Topic = c8y/s/uat, Pkid = 3, Payload Size = 0
2022-11-24T11:00:01.26571797Z ERROR init: c8y_api::http_proxy: An error occurred while retrieving internal Id, operation will retry in 60 seconds and mapper will reinitialise.
 Error: RequestTimeout
^C
admin@axcf2152:~$ cat tedge-mapper-c8y.log
2022-11-24T10:58:41.1667011Z  INFO flockfile::unix: Lockfile created "/run/lock/tedge-mapper-c8y.lock"
2022-11-24T10:58:41.25894187Z TRACE mio::poll: registering event source with poller: token=Token(1), interests=READABLE | WRITABLE
2022-11-24T10:58:41.26022658Z DEBUG rumqttc::state: Subscribe. Topics = [Filter = c8y/s/dat, Qos = AtLeastOnce], Pkid = 1
2022-11-24T10:58:41.26167645Z  INFO init: c8y_api::http_proxy: Initialisation
2022-11-24T10:58:41.26224065Z DEBUG rumqttc::state: Publish. Topic = c8y/s/uat, Pkid = 2, Payload Size = 0
2022-11-24T10:58:51.26317512Z ERROR init: c8y_api::http_proxy: An error occurred while retrieving internal Id, operation will retry in 60 seconds and mapper will reinitialise.
 Error: RequestTimeout
2022-11-24T10:59:41.26061356Z DEBUG rumqttc::state: Pingreq,
            last incoming packet before 59996 millisecs,
            last outgoing request before 59998 millisecs
2022-11-24T10:59:51.26517084Z DEBUG rumqttc::state: Publish. Topic = c8y/s/uat, Pkid = 3, Payload Size = 0
2022-11-24T11:00:01.26571797Z ERROR init: c8y_api::http_proxy: An error occurred while retrieving internal Id, operation will retry in 60 seconds and mapper will reinitialise.
 Error: RequestTimeout
admin@axcf2152:~$ cat tedge-mapper-c8y.log
2022-11-24T10:58:41.1667011Z  INFO flockfile::unix: Lockfile created "/run/lock/tedge-mapper-c8y.lock"
2022-11-24T10:58:41.25894187Z TRACE mio::poll: registering event source with poller: token=Token(1), interests=READABLE | WRITABLE
2022-11-24T10:58:41.26022658Z DEBUG rumqttc::state: Subscribe. Topics = [Filter = c8y/s/dat, Qos = AtLeastOnce], Pkid = 1
2022-11-24T10:58:41.26167645Z  INFO init: c8y_api::http_proxy: Initialisation
2022-11-24T10:58:41.26224065Z DEBUG rumqttc::state: Publish. Topic = c8y/s/uat, Pkid = 2, Payload Size = 0
2022-11-24T10:58:51.26317512Z ERROR init: c8y_api::http_proxy: An error occurred while retrieving internal Id, operation will retry in 60 seconds and mapper will reinitialise.
 Error: RequestTimeout
2022-11-24T10:59:41.26061356Z DEBUG rumqttc::state: Pingreq,
            last incoming packet before 59996 millisecs,
            last outgoing request before 59998 millisecs
2022-11-24T10:59:51.26517084Z DEBUG rumqttc::state: Publish. Topic = c8y/s/uat, Pkid = 3, Payload Size = 0
2022-11-24T11:00:01.26571797Z ERROR init: c8y_api::http_proxy: An error occurred while retrieving internal Id, operation will retry in 60 seconds and mapper will reinitialise.
 Error: RequestTimeout
2022-11-24T11:00:41.26194745Z DEBUG rumqttc::state: Pingreq,
            last incoming packet before 49995 millisecs,
            last outgoing request before 49996 millisecs
2022-11-24T11:01:01.26767379Z DEBUG rumqttc::state: Publish. Topic = c8y/s/uat, Pkid = 4, Payload Size = 0
2022-11-24T11:01:11.26934005Z ERROR init: c8y_api::http_proxy: An error occurred while retrieving internal Id, operation will retry in 60 seconds and mapper will reinitialise.
 Error: RequestTimeout
2022-11-24T11:01:41.263582Z DEBUG rumqttc::state: Pingreq,
            last incoming packet before 39994 millisecs,
            last outgoing request before 39995 millisecs
2022-11-24T11:02:11.27117636Z DEBUG rumqttc::state: Publish. Topic = c8y/s/uat, Pkid = 5, Payload Size = 0
2022-11-24T11:02:21.27477699Z ERROR init: c8y_api::http_proxy: An error occurred while retrieving internal Id, operation will retry in 60 seconds and mapper will reinitialise.
 Error: RequestTimeout
2022-11-24T11:02:41.26495802Z DEBUG rumqttc::state: Pingreq,
            last incoming packet before 29992 millisecs,
            last outgoing request before 29993 millisecs
2022-11-24T11:03:21.27674494Z DEBUG rumqttc::state: Publish. Topic = c8y/s/uat, Pkid = 6, Payload Size = 0
2022-11-24T11:03:31.27837968Z ERROR init: c8y_api::http_proxy: An error occurred while retrieving internal Id, operation will retry in 60 seconds and mapper will reinitialise.
 Error: RequestTimeout
2022-11-24T11:03:41.26622072Z DEBUG rumqttc::state: Pingreq,
            last incoming packet before 19987 millisecs,
            last outgoing request before 19989 millisecs

@reubenmiller
Copy link
Contributor

reubenmiller commented Dec 1, 2022

Looks like we have found the root cause! Seems to be a problem with the mosquitto regarding a bridge connection in version 2.0.15, which results in the bridge not connecting every 2nd time after a reboot.

Looking at the mosquitto logs /var/log/mosquitto/mosquitto.log, it shows an Expiring client error on startup.

1669879960: mosquitto version 2.0.15 running
1669879960: Expiring client Cumulocity due to timeout.

Then when rebooting, it works again. Below shows the happy case. But a subsequent reboot would result in the Expiring client error again.

1669881815: Opening ipv4 listen socket on port 1883.
1669881815: Connecting bridge (step 1) edge_to_c8y (mytenant.eu-latest.cumulocity.com:8883)
1669881815: mosquitto version 2.0.15 running
1669881815: Connecting bridge (step 2) edge_to_c8y (certification-phoenixcontact.eu-latest.cumulocity.com:8883)

Here is the ticket from the mosquitto repo eclipse/mosquitto#2634

Recommended fix

  • install another version of mosquitto, for example 2.0.11

@reubenmiller
Copy link
Contributor

@bjoernsauer Did downgrading to mosquitto 2.0.11 solve your issue? If so can you please close this ticket?

@reubenmiller reubenmiller added the theme:mqtt Theme: mqtt and mosquitto related topics label Dec 1, 2022
@bjoernsauer
Copy link
Author

Downgraded mosquitto from 2.0.15 to 2.0.11. So far the issue did not occure again.
Thank you for your excellent support @reubenmiller

@Zodor
Copy link

Zodor commented Dec 21, 2022

We have the same problem at our installations also. We downgraded to 2.0.14 at that solved it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working theme:mqtt Theme: mqtt and mosquitto related topics
Projects
None yet
Development

No branches or pull requests

4 participants