Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

High frequent "Bad no subscription" errors in client #5687

Open
2 of 7 tasks
tboeckel opened this issue Mar 16, 2023 · 11 comments
Open
2 of 7 tasks

High frequent "Bad no subscription" errors in client #5687

tboeckel opened this issue Mar 16, 2023 · 11 comments

Comments

@tboeckel
Copy link

tboeckel commented Mar 16, 2023

Description

From time to time we are facing the problem that an open62541 based client connects to an open62541 based server and then obviously something goes wrong with the subscription of variables. We set up a subscription and the put an arbitrary number of monitored items into that subscription. Usually this all works perfectly, but sometimes something goes wrong (without any readable notice) and then we get tons of these "BadNoSubscription" log messages right after a complain about a dropped StatusChangeNotification:

[11:03:51.731] WARN [warning|client ] Dropped a StatusChangeNotification since no callback is registered
[11:03:51.731] WARN [warning|client ] Received Publish Response with code BadNoSubscription
[11:03:51.731] WARN [warning|client ] Received Publish Response with code BadNoSubscription
[11:03:51.731] WARN [warning|client ] Received Publish Response with code BadNoSubscription
[11:03:51.731] WARN [warning|client ] Received Publish Response with code BadNoSubscription
[11:03:51.731] WARN [warning|client ] Received Publish Response with code BadNoSubscription
[11:03:51.731] WARN [warning|client ] Received Publish Response with code BadNoSubscription
[11:03:51.732] WARN [warning|client ] Received Publish Response with code BadNoSubscription
[11:03:51.732] WARN [warning|client ] Received Publish Response with code BadNoSubscription
.....

Upon the next start of our application everything is working again, at least in 98 of 100 times. No changes on either side inbetween. But in this situation neither side is really usable.

What is causing this behaviour and what can we do about it?

Used CMake options:
UA_ENABLE_AMALGAMATION:BOOL=ON
UA_ENABLE_ENCRYPTION:STRING=OPENSSL
UA_ENABLE_ENCRYPTION_OPENSSL:BOOL=ON
UA_MULTITHREADING:STRING=100

Checklist

Please provide the following information:

  • open62541 Version (release number or git tag): 1.3.5
  • Other OPC UA SDKs used (client or server):
  • Operating system: Ubuntu 20.04, Ubuntu 22.04, custom Yocto Linux
  • Logs (with UA_LOGLEVEL set as low as necessary) attached
  • Wireshark network dump attached
  • Self-contained code example attached
  • Critical issue
@tboeckel
Copy link
Author

tboeckel commented Apr 25, 2023

This issue is becoming a real show stopper. Sometimes the applications start properly on each try, sometimes I get this issue each and every time the applications start. I really get the impression the issue is dependend on aliens, moon phases or what ever.
It would be really nice if someone could please take a look at this.

@tboeckel
Copy link
Author

tboeckel commented Jun 5, 2023

Although the issue seemed to have vanished for some time it occures more often again now. Even for client applications which do NOT subscribe ANY variables from the remote servers. How can this happen?

We really need to get this issue resolved. Please help me!!

@tboeckel
Copy link
Author

tboeckel commented Jun 5, 2023

When this effect happens, the right ahead of it all there is status change notification with a status code of UA_STATUSCODE_GOODSUBSCRIPTIONTRANSFERRED. Maybe this points into a direction.

@tboeckel
Copy link
Author

tboeckel commented Jun 5, 2023

There is no further additional info or inner status code or inner notification.

@tboeckel
Copy link
Author

tboeckel commented Jun 5, 2023

I must correct myself. The effect still only happens for client application which actually do subscribe values from the remote server. I was mislead by a misinterpreted debug output. Sorry for that.
But the basic question remains: what is causing this error? Why is the subscription transferred to another session? At least the is the location where the status change notification is thrown.

@tboeckel
Copy link
Author

tboeckel commented Sep 1, 2023

In the meantime I added a StatusChange notification callback. The only thing it is able to output is a statuscode of UA_STATUSCODE_GOODSUBSCRIPTIONTRANSFERRED. Why does this happen? And what can I do about that or how should this situation be handled?

@VictorManKBO
Copy link

I have this problem too. When clients connect at different times, everything works. But, if you turn off and turn on the server, then the clients immediately restore communication and somewhere such an error (internal) error occurs. With all this, retval = GOOD.
When only one client works - no such problem is noticed. My version v1.3.4

4/9 13:0:45.874  OpcUaClient creted!
4/9 13:0:45.927  CheckConnection() : BadDisconnect
4/9 13:0:45.927  [warn/userland]	AcceptAll Certificate Verification. Any remote certificate will be accepted.
4/9 13:0:45.932  [info/channel]	Connection 816 | SecureChannel 3 | SecureChannel opened with SecurityPolicy http:https://opcfoundation.org/UA/SecurityPolicy#None and a revised lifetime of 600.00s
4/9 13:0:45.932  [info/client]	Client Status: ChannelState: Open, SessionState: Closed, ConnectStatus: Good
4/9 13:0:45.933  [info/client]	Selected endpoint 0 in URL opc.tcp:https://192.168.220.43:4840 with SecurityMode None and SecurityPolicy http:https://opcfoundation.org/UA/SecurityPolicy#None
4/9 13:0:45.934  [info/client]	Selected UserTokenPolicy open62541-anonymous-policy with UserTokenType Anonymous and SecurityPolicy http:https://opcfoundation.org/UA/SecurityPolicy#None
4/9 13:0:45.935  [info/client]	Client Status: ChannelState: Open, SessionState: Created, ConnectStatus: Good
4/9 13:0:45.936  [info/client]	Client Status: ChannelState: Open, SessionState: Activated, ConnectStatus: Good
4/9 13:0:45.937  Reconnect() : connected to 192.168.220.43:4840
4/9 13:0:47.953  subscription #3 created  with status Good
4/9 13:0:56.775  [warn/channel]	Connection 816 | SecureChannel 3 | Receiving the response failed with StatusCode BadConnectionClosed
4/9 13:0:56.776  [warn/client]	Received Publish Response with code BadSecureChannelClosed
4/9 13:0:56.777  [warn/client]	Received Publish Response with code BadSecureChannelClosed
4/9 13:0:56.778  [warn/client]	Received Publish Response with code BadSecureChannelClosed
4/9 13:0:56.784  [warn/channel]	Connection 0 | SecureChannel 0 | Could not receive with StatusCode BadConnectionClosed
4/9 13:0:56.784  [info/client]	Client Status: ChannelState: Closed, SessionState: Created, ConnectStatus: Good
4/9 13:0:56.785  CheckConnection() : Good
4/9 13:0:56.836  CheckConnection() : Good
4/9 13:0:56.836  [info/client]	SecureChannel must be connected before sending requests
4/9 13:0:56.837  [info/client]	Client Status: ChannelState: Closed, SessionState: Closed, ConnectStatus: Good
4/9 13:0:56.837  [warn/userland]	AcceptAll Certificate Verification. Any remote certificate will be accepted.
4/9 13:1:1.838  [warn/network]	Connection to opc.tcp:https://192.168.220.43:4840 timed out
4/9 13:1:1.839  [info/client]	Client Status: ChannelState: Fresh, SessionState: Closed, ConnectStatus: BadTimeout
4/9 13:1:1.839  Reconnect() : connection error with 192.168.220.43:4840(BadTimeout)
4/9 13:1:1.840  CheckConnection() : BadTimeout
...........................	BadTimeout
4/9 13:1:27.13  CheckConnection() : BadTimeout
4/9 13:1:27.13  [info/client]	Client Status: ChannelState: Closed, SessionState: Closed, ConnectStatus: BadTimeout
4/9 13:1:27.14  [warn/userland]	AcceptAll Certificate Verification. Any remote certificate will be accepted.
4/9 13:1:27.19  [info/channel]	Connection 816 | SecureChannel 1 | SecureChannel opened with SecurityPolicy http:https://opcfoundation.org/UA/SecurityPolicy#None and a revised lifetime of 600.00s
4/9 13:1:27.20  [info/client]	Client Status: ChannelState: Open, SessionState: Closed, ConnectStatus: Good
4/9 13:1:27.21  [info/client]	Selected endpoint 0 in URL opc.tcp:https://192.168.220.43:4840 with SecurityMode None and SecurityPolicy http:https://opcfoundation.org/UA/SecurityPolicy#None
4/9 13:1:27.22  [info/client]	Selected UserTokenPolicy open62541-anonymous-policy with UserTokenType Anonymous and SecurityPolicy http:https://opcfoundation.org/UA/SecurityPolicy#None
4/9 13:1:27.24  [info/client]	Client Status: ChannelState: Open, SessionState: Created, ConnectStatus: Good
4/9 13:1:27.25  [info/client]	Client Status: ChannelState: Open, SessionState: Activated, ConnectStatus: Good
4/9 13:1:27.26  Reconnect() : connected to 192.168.220.43:4840
4/9 13:1:29.29  subscription #1 creted with status Good
4/9 13:1:29.785  [warn/client]	Dropped a StatusChangeNotification since no callback is registered
4/9 13:1:29.786  [info/client]	Received a ServiceFault response
4/9 13:1:29.787  [info/client]	The ServiceResult has the StatusCode BadNoSubscription
4/9 13:1:29.787  [warn/client]	Received Publish Response with code BadNoSubscription

@VictorManKBO
Copy link

I was looking for a reason: we took our two clients and ua_expert for the test. In the server logs, in case of failures on the client, we saw how the subscription of one client was TRANSFERRED to the other client!
[info/session] SecureChannel 3 | Session "" | Subscription 2 | Transferred to this Session
log file from server:

5/9 12:55:21.324  [info/network]	TCP network layer listening on opc.tcp:https://SCADA22:4840/
5/9 12:55:21.483  [info/network]	Connection 1060 | New connection over TCP from 192.168.220.219
5/9 12:55:21.485  [info/channel]	Connection 1060 | SecureChannel 1 | SecureChannel opened with SecurityPolicy http:https://opcfoundation.org/UA/SecurityPolicy#None and a revised lifetime of 600.00s
5/9 12:55:21.499  [info/session]	SecureChannel 1 | Session "" | Session created
5/9 12:55:21.503  [info/session]	SecureChannel 1 | Session "" | ActivateSession: Session activated
5/9 12:55:21.510  [info/session]	SecureChannel 1 | Session "" | Subscription 1 | Subscription created (Publishing interval 100.00ms, max 1000 notifications per publish)
5/9 12:55:21.513  [info/session]	SecureChannel 1 | Session "" | Subscription 1 | MonitoredItem 1 | Created the MonitoredItem (Sampling Interval: 0.00ms, Queue Size: 100)
5/9 12:55:21.520  [info/session]	SecureChannel 1 | Session "" | Subscription 1 | MonitoredItem 2 | Created the MonitoredItem (Sampling Interval: 0.00ms, Queue Size: 100)
5/9 12:55:21.526  [info/session]	SecureChannel 1 | Session "" | Subscription 1 | MonitoredItem 3 | Created the MonitoredItem (Sampling Interval: 0.00ms, Queue Size: 100)
5/9 12:55:22.304  [info/network]	Connection 1064 | New connection over TCP from 192.168.220.235
5/9 12:55:22.305  [info/channel]	Connection 1064 | SecureChannel 2 | SecureChannel opened with SecurityPolicy http:https://opcfoundation.org/UA/SecurityPolicy#None and a revised lifetime of 600.00s
5/9 12:55:22.310  [info/session]	SecureChannel 2 | Session "" | Session created
5/9 12:55:22.312  [info/session]	SecureChannel 2 | Session "" | ActivateSession: Session activated
5/9 12:55:24.315  [info/session]	SecureChannel 2 | Session "" | Subscription 2 | Subscription created (Publishing interval 100.00ms, max 1000 notifications per publish)
5/9 12:55:24.317  [info/session]	SecureChannel 2 | Session "" | Subscription 2 | MonitoredItem 1 | Created the MonitoredItem (Sampling Interval: 0.00ms, Queue Size: 100)
5/9 12:55:24.032  [info/session]	SecureChannel 2 | Session "" | Subscription 2 | MonitoredItem 2 | Created the MonitoredItem (Sampling Interval: 0.00ms, Queue Size: 100)
5/9 12:55:25.026  [info/network]	Connection 1068 | New connection over TCP from 192.168.220.219
5/9 12:55:25.027  [info/channel]	Connection 1068 | SecureChannel 3 | SecureChannel opened with SecurityPolicy http:https://opcfoundation.org/UA/SecurityPolicy#None and a revised lifetime of 600.00s
5/9 12:55:25.027  [warn/channel]	Connection 1068 | SecureChannel 3 | ActivateSession: Session not found
5/9 12:55:25.028  [info/session]	SecureChannel 3 | Session "" | Session created
5/9 12:55:25.028  [info/session]	SecureChannel 3 | Session "" | ActivateSession: Session activated
5/9 12:55:25.033  [info/session]	SecureChannel 3 | Session "" | Subscription 2 | Transferred to this Session
5/9 12:55:25.033  [info/session]	SecureChannel 2 | Session "" | Subscription 2 | Subscription deleted

I started looking for what its "Transferred to this Session" and found a description of the structure UA_Subscription(src/server/ua_subscription.h):

/* Subscriptions are managed in a server-wide linked list. If they are attached
 * to a Session, then they are additionaly in the per-Session linked-list. A
 * subscription is always generated for a Session. But the CloseSession Service
 * may keep Subscriptions intact beyond the Session lifetime. They can then be
 * re-bound to a new Session with the TransferSubscription Service. */

From the description it is clear that the subscription is attached to the SessionName, and in all my clients this variable is empty! I began to look for how to change it and it turned out that in my version (v1.3.4) there is NO api to change it.

Editing the library code(3 files):

  1. include\open62541\client.h -> struct UA_ClientConfig -> add member:
    UA_String sessionName; /* Session name */
  2. src\client\ua_client.c -> static void UA_ClientConfig_clear(UA_ClientConfig *config){} -> add clear in function:
    UA_String_clear(&config->sessionName);
  3. src\client\ua_client_connect.c -> static UA_StatusCode createSessionAsync(UA_Client *client){} -> set in UA_CreateSessionRequest:
    UA_String_copy(&client->config.sessionName, &request.sessionName);

After recompilation(lib), you need to add in your application:

  UA_ClientConfig* config = UA_Client_getConfig(m_client);
  config->sessionName = UA_String_fromChars("UniqName");

This fixed my client's error:

[warn/client]	Dropped a StatusChangeNotification since no callback is registered
[info/client]	Received a ServiceFault response
[info/client]	The ServiceResult has the StatusCode BadNoSubscription
[warn/client]	Received Publish Response with code BadNoSubscription

@tboeckel
Copy link
Author

tboeckel commented Sep 8, 2023

Unfortunately this patch does not solve the issue for me. I still get these high frequent "BadNoSubscription" messages.

@tboeckel
Copy link
Author

In the meantime I found a workaround for this issue. It is not perfect and I really like to have a better approach, but for the time being I can live with this "fix".
If the status change notification tells about a successfully transferred subscription I simply abort the connection to the server and restart the connection from scratch. Then, on the second try, everything works out perfectly and the subscription is no longer transferred and hence no high frequent BadNoSubscription messages occur.

But why is the transfer of a subscription from one session to another session (which has NOT been initiated by my application) is required or necessary at all? My client sets up a single connection to a server. This connection should result in one single session. From my point of view there is no space for a transfer to another session, at least not to a session that my application never caused to be set up and hence cannot control.

@oliver
Copy link

oliver commented Feb 8, 2024

I have this problem as well, and it appears to be caused by having UaExpert running in parallel. Apparently UaExpert will after reconnection try to transfer its subscription from the previous session; but it will use the SubscriptionId that it used in the previous session, even though that SubscriptionId might now be used by some other client!

This is quite similar to the description by @VictorManKBO in #5687 (comment) .

I am able to reproduce this problem reliably, like this:

  • start my server
  • start UaExpert
    • it will now get SubscriptionId 1
  • stop my server
  • start my server and immediately start my client
    • my client will now get SubscriptionId 1 (because it is now the first client to connect)
  • after ca. one second, UaExpert will automatically reconnect as well
  • now UaExpert will send a TransferSubscriptionsRequest message, for SubscriptionId 1 !
  • and my server will happily transfer the subscription (that now belongs to my client) to UaExpert
    • my client will receive a StatusChangeNotification message (which causes the "Dropped a StatusChangeNotification since no callback is registered" log output); and I guess the "Received Publish Response with code BadNoSubscription" logs are followup-errors.

In this case, server and client are using open62541 1.3.9 and open62541pp 0.11.0, under Debian 12 (Bookworm). UaExpert is in version 1.7.1 540.

I am not experienced with OPC-UA, so cannot really say who is at fault here:

Anyway, I guess the solution is not use UaExpert, or maybe set up a separate user account for UaExpert, or do not use the automatic reconnection in UaExpert?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants