Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reconnect with ongoing Subscriptions #3599

Open
3 of 7 tasks
kleskjr opened this issue May 14, 2020 · 19 comments
Open
3 of 7 tasks

Reconnect with ongoing Subscriptions #3599

kleskjr opened this issue May 14, 2020 · 19 comments
Labels
Component: Server Issues related to the server code Priority: Medium Issue with medium priority Status: Pending Needs final decision if issue is valid or should be abandoned. Or if the issue needs to be discussed Type: Bug Bug in the code which needs to be fixed

Comments

@kleskjr
Copy link

kleskjr commented May 14, 2020

Description

The client keeps crashing during continous work. Usually, the problem arises during the renew of SecureChannel (~450 secs after initial connect. Maybe this duration comes from the 75% of 10 mins channel life-time). I experince this issue on various clients, client that writes periodically, or one that is subscribed to value changes.

Log of a client that is subscribed to a value change of 1 variable

connected.....
[2020-05-14 16:09:19.486 (UTC+0200)] info/userland      Value has changed!
[2020-05-14 16:09:19.486 (UTC+0200)] info/userland      The value is: 216.000000

[2020-05-14 16:09:19.487 (UTC+0200)] debug/channel      Connection 3 | SecureChannel 216 | Sending request with RequestId 236 of type PublishRequest
[2020-05-14 16:09:19.487 (UTC+0200)] warn/client        Client already connected
connected.....
[2020-05-14 16:09:20.488 (UTC+0200)] warn/client        Client already connected
connected.....
[2020-05-14 16:09:21.487 (UTC+0200)] info/userland      Value has changed!
[2020-05-14 16:09:21.487 (UTC+0200)] info/userland      The value is: 217.000000

[2020-05-14 16:09:21.487 (UTC+0200)] debug/channel      Connection 3 | SecureChannel 216 | Requesting to renew the SecureChannel
[2020-05-14 16:09:21.487 (UTC+0200)] debug/channel      Requesting to open a SecureChannel
[2020-05-14 16:09:21.487 (UTC+0200)] debug/channel      OPN message sent
[2020-05-14 16:09:21.487 (UTC+0200)] debug/channel      Connection 3 | SecureChannel 216 | Sending request with RequestId 238 of type PublishRequest
[2020-05-14 16:09:21.487 (UTC+0200)] warn/client        Client already connected
connected.....
[2020-05-14 16:09:21.488 (UTC+0200)] trace/channel      Connection 3 | SecureChannel 216 | Verifying chunk signature
[2020-05-14 16:09:21.488 (UTC+0200)] debug/channel      Connection 3 | SecureChannel 216 | Generating SecureChannel keys
[2020-05-14 16:09:21.488 (UTC+0200)] trace/channel      Connection 3 | SecureChannel 216 | Generating new local keys
[2020-05-14 16:09:21.488 (UTC+0200)] trace/channel      Connection 3 | SecureChannel 216 | Generating new remote keys
[2020-05-14 16:09:21.488 (UTC+0200)] info/channel       Connection 3 | SecureChannel 216 | SecureChannel renewed
[2020-05-14 16:09:21.488 (UTC+0200)] warn/client        Client already connected
connected.....
[2020-05-14 16:09:22.489 (UTC+0200)] warn/client        Client already connected
connected.....
[2020-05-14 16:09:23.487 (UTC+0200)] warn/channel       Connection 3 | SecureChannel 216 | Received an unknown SecurityToken
[2020-05-14 16:09:23.487 (UTC+0200)] warn/channel       Connection 3 | SecureChannel 216 | Receiving the response failed with StatusCode BadSecureChannelTokenUnknown
[2020-05-14 16:09:23.488 (UTC+0200)] warn/channel       Connection 0 | SecureChannel 0 | Could not receive with StatusCode BadConnectionClosed
[2020-05-14 16:09:23.488 (UTC+0200)] info/client        Client Status: ChannelState: Closed, SessionState: Created, ConnectStatus: Good
[2020-05-14 16:09:23.488 (UTC+0200)] trace/client       Client connect iterate
[2020-05-14 16:09:23.488 (UTC+0200)] trace/client       Client connect iterate
[2020-05-14 16:09:23.488 (UTC+0200)] debug/channel      Connection 3 | SecureChannel 0 | Generating new local nonce
[2020-05-14 16:09:23.489 (UTC+0200)] debug/network      Sent HEL message
[2020-05-14 16:09:23.489 (UTC+0200)] debug/client       Client Status: ChannelState: HELSent, SessionState: Created, ConnectStatus: Good
[2020-05-14 16:09:23.489 (UTC+0200)] trace/client       Client connect iterate
[2020-05-14 16:09:23.489 (UTC+0200)] debug/network      Received ACK message
[2020-05-14 16:09:23.489 (UTC+0200)] debug/client       Client Status: ChannelState: AckReceived, SessionState: Created, ConnectStatus: Good
[2020-05-14 16:09:23.489 (UTC+0200)] trace/client       Client connect iterate
[2020-05-14 16:09:23.489 (UTC+0200)] debug/channel      Connection 3 | SecureChannel 0 | Requesting to open a SecureChannel
[2020-05-14 16:09:23.489 (UTC+0200)] debug/channel      Requesting to open a SecureChannel
[2020-05-14 16:09:23.489 (UTC+0200)] debug/channel      OPN message sent
[2020-05-14 16:09:23.489 (UTC+0200)] debug/client       Client Status: ChannelState: OPNSent, SessionState: Created, ConnectStatus: Good
[2020-05-14 16:09:23.489 (UTC+0200)] trace/client       Client connect iterate
[2020-05-14 16:09:23.489 (UTC+0200)] trace/channel      Connection 3 | SecureChannel 0 | Verifying chunk signature
[2020-05-14 16:09:23.489 (UTC+0200)] debug/channel      Connection 3 | SecureChannel 219 | Generating SecureChannel keys
[2020-05-14 16:09:23.490 (UTC+0200)] trace/channel      Connection 3 | SecureChannel 219 | Generating new local keys
[2020-05-14 16:09:23.490 (UTC+0200)] trace/channel      Connection 3 | SecureChannel 219 | Generating new remote keys
[2020-05-14 16:09:23.490 (UTC+0200)] info/channel       Connection 3 | SecureChannel 219 | Opened SecureChannel with SecurityPolicy http:https://opcfoundation.org/UA/SecurityPolicy#None
[2020-05-14 16:09:23.490 (UTC+0200)] info/client        Client Status: ChannelState: Open, SessionState: Created, ConnectStatus: Good
[2020-05-14 16:09:23.490 (UTC+0200)] trace/client       Client connect iterate
[2020-05-14 16:09:23.490 (UTC+0200)] debug/channel      Connection 3 | SecureChannel 219 | Sending request with RequestId 241 of type ActivateSessionRequest
[2020-05-14 16:09:23.490 (UTC+0200)] debug/client       Client Status: ChannelState: Open, SessionState: ActivateRequested, ConnectStatus: Good
[2020-05-14 16:09:23.490 (UTC+0200)] trace/client       Client connect iterate
[2020-05-14 16:09:23.490 (UTC+0200)] info/client        The ServiceResult has the StatusCode BadInternalError
[2020-05-14 16:09:23.490 (UTC+0200)] error/client       ActivateSession failed with error code BadInternalError
[2020-05-14 16:09:23.490 (UTC+0200)] info/client        Client Status: ChannelState: Open, SessionState: ActivateRequested, ConnectStatus: BadInternalError
connected.....
[2020-05-14 16:09:23.490 (UTC+0200)] error/userland     Not connected. Retrying to connect in 1 second
[2020-05-14 16:09:24.490 (UTC+0200)] warn/client        Client already connected
[2020-05-14 16:09:24.490 (UTC+0200)] trace/client       Client connect iterate
connected.....
[2020-05-14 16:09:24.490 (UTC+0200)] error/userland     Not connected. Retrying to connect in 1 second
[2020-05-14 16:09:25.491 (UTC+0200)] warn/client        Client already connected
[2020-05-14 16:09:25.491 (UTC+0200)] trace/client       Client connect iterate

Log of client that reads a value in a subscription callback:

[2020-05-14 15:29:49.472 (UTC+0200)] info/userland
 Value has changed!
[2020-05-14 15:29:49.472 (UTC+0200)] info/userland      reading value...(iter=130.000000)
[2020-05-14 15:29:49.472 (UTC+0200)] debug/channel      Connection 3 | SecureChannel 211 | Sending request with RequestId 313 of type ReadRequest
[2020-05-14 15:29:49.473 (UTC+0200)] debug/client       Decode a message of type ReadResponse
[2020-05-14 15:29:49.473 (UTC+0200)] info/userland      read value is 130.000000 (iter=130.000000
[2020-05-14 15:29:49.473 (UTC+0200)] info/userland      quiting callback (iter=130.000000)
[2020-05-14 15:29:49.473 (UTC+0200)] debug/channel      Connection 3 | SecureChannel 211 | Sending request with RequestId 314 of type PublishRequest
[2020-05-14 15:29:49.473 (UTC+0200)] warn/client        Client already connected
connected.....
[2020-05-14 15:29:50.475 (UTC+0200)] warn/client        Client already connected
connected.....
[2020-05-14 15:29:51.472 (UTC+0200)] info/userland
 Value has changed!
[2020-05-14 15:29:51.472 (UTC+0200)] info/userland      reading value...(iter=131.000000)
[2020-05-14 15:29:51.473 (UTC+0200)] debug/channel      Connection 3 | SecureChannel 211 | Sending request with RequestId 315 of type ReadRequest
[2020-05-14 15:29:51.473 (UTC+0200)] debug/client       Decode a message of type ReadResponse
[2020-05-14 15:29:51.473 (UTC+0200)] info/userland      read value is 131.000000 (iter=131.000000
[2020-05-14 15:29:51.473 (UTC+0200)] info/userland      quiting callback (iter=131.000000)
[2020-05-14 15:29:51.473 (UTC+0200)] debug/channel      Connection 3 | SecureChannel 211 | Sending request with RequestId 316 of type PublishRequest
[2020-05-14 15:29:51.473 (UTC+0200)] warn/client        Client already connected
connected.....
[2020-05-14 15:29:52.474 (UTC+0200)] warn/client        Client already connected
connected.....
[2020-05-14 15:29:53.472 (UTC+0200)] info/userland
 Value has changed!
[2020-05-14 15:29:53.472 (UTC+0200)] info/userland      reading value...(iter=132.000000)
[2020-05-14 15:29:53.472 (UTC+0200)] debug/channel      Connection 3 | SecureChannel 211 | Requesting to renew the SecureChannel
[2020-05-14 15:29:53.472 (UTC+0200)] debug/channel      Requesting to open a SecureChannel
[2020-05-14 15:29:53.472 (UTC+0200)] debug/channel      OPN message sent
[2020-05-14 15:29:53.473 (UTC+0200)] debug/channel      Connection 3 | SecureChannel 211 | Sending request with RequestId 318 of type ReadRequest
[2020-05-14 15:29:53.473 (UTC+0200)] trace/channel      Connection 3 | SecureChannel 211 | Verifying chunk signature
[2020-05-14 15:29:53.473 (UTC+0200)] debug/channel      Connection 3 | SecureChannel 211 | Generating SecureChannel keys
[2020-05-14 15:29:53.473 (UTC+0200)] trace/channel      Connection 3 | SecureChannel 211 | Generating new local keys
[2020-05-14 15:29:53.473 (UTC+0200)] trace/channel      Connection 3 | SecureChannel 211 | Generating new remote keys
[2020-05-14 15:29:53.473 (UTC+0200)] info/channel       Connection 3 | SecureChannel 211 | SecureChannel renewed
[2020-05-14 15:29:53.473 (UTC+0200)] warn/channel       Connection 3 | SecureChannel 211 | Received an unknown SecurityToken
[2020-05-14 15:29:53.473 (UTC+0200)] warn/channel       Connection 3 | SecureChannel 211 | Receiving the response failed with StatusCode BadSecureChannelTokenUnknown
[2020-05-14 15:29:53.473 (UTC+0200)] info/client        Client Status: ChannelState: Closed, SessionState: Created, ConnectStatus: Good
zsh: segmentation fault (core dumped)  ./subclient

Log of the writing client:


[2020-05-14 14:34:05.694 (UTC+0200)] warn/client        Client already connected
niter = 226.0
2020-05-14T12:34:05.78226.0[2020-05-14 14:34:05.779 (UTC+0200)] debug/channel   Connection 19 | SecureChannel 181 | Requesting to renew the SecureChannel
[2020-05-14 14:34:05.779 (UTC+0200)] debug/channel      Requesting to open a SecureChannel
[2020-05-14 14:34:05.779 (UTC+0200)] debug/channel      OPN message sent
[2020-05-14 14:34:05.779 (UTC+0200)] debug/channel      Connection 19 | SecureChannel 181 | Sending request with RequestId 6306 of type WriteRequest
[2020-05-14 14:34:05.779 (UTC+0200)] trace/channel      Connection 19 | SecureChannel 181 | Verifying chunk signature
[2020-05-14 14:34:05.779 (UTC+0200)] debug/channel      Connection 19 | SecureChannel 181 | Generating SecureChannel keys
[2020-05-14 14:34:05.779 (UTC+0200)] trace/channel      Connection 19 | SecureChannel 181 | Generating new local keys
[2020-05-14 14:34:05.780 (UTC+0200)] trace/channel      Connection 19 | SecureChannel 181 | Generating new remote keys
[2020-05-14 14:34:05.780 (UTC+0200)] info/channel       Connection 19 | SecureChannel 181 | SecureChannel renewed
[2020-05-14 14:34:05.780 (UTC+0200)] warn/channel       Connection 19 | SecureChannel 181 | Received an unknown SecurityToken
[2020-05-14 14:34:05.780 (UTC+0200)] warn/channel       Connection 19 | SecureChannel 181 | Receiving the response failed with StatusCode BadSecureChannelTokenUnknown
[2020-05-14 14:34:05.780 (UTC+0200)] info/client        Client Status: ChannelState: Closed, SessionState: Created, ConnectStatus: Good
[2020-05-14 14:34:05.780 (UTC+0200)] info/client        SecureChannel must be connected before sending requests
2020-05-14T12:34:05.882
[2020-05-14 14:34:06.195 (UTC+0200)] trace/client       Client connect iterate
[2020-05-14 14:34:06.196 (UTC+0200)] trace/client       Client connect iterate
[2020-05-14 14:34:06.196 (UTC+0200)] debug/channel      Connection 19 | SecureChannel 0 | Generating new local nonce
[2020-05-14 14:34:06.196 (UTC+0200)] debug/network      Sent HEL message
[2020-05-14 14:34:06.196 (UTC+0200)] debug/client       Client Status: ChannelState: HELSent, SessionState: Created, ConnectStatus: Good
[2020-05-14 14:34:06.196 (UTC+0200)] trace/client       Client connect iterate
[2020-05-14 14:34:06.196 (UTC+0200)] debug/network      Received ACK message
[2020-05-14 14:34:06.196 (UTC+0200)] debug/client       Client Status: ChannelState: AckReceived, SessionState: Created, ConnectStatus: Good
[2020-05-14 14:34:06.196 (UTC+0200)] trace/client       Client connect iterate
[2020-05-14 14:34:06.196 (UTC+0200)] debug/channel      Connection 19 | SecureChannel 0 | Requesting to open a SecureChannel
[2020-05-14 14:34:06.196 (UTC+0200)] debug/channel      Requesting to open a SecureChannel
[2020-05-14 14:34:06.197 (UTC+0200)] debug/channel      OPN message sent
[2020-05-14 14:34:06.197 (UTC+0200)] debug/client       Client Status: ChannelState: OPNSent, SessionState: Created, ConnectStatus: Good
[2020-05-14 14:34:06.197 (UTC+0200)] trace/client       Client connect iterate
[2020-05-14 14:34:06.197 (UTC+0200)] trace/channel      Connection 19 | SecureChannel 0 | Verifying chunk signature
[2020-05-14 14:34:06.197 (UTC+0200)] debug/channel      Connection 19 | SecureChannel 184 | Generating SecureChannel keys
[2020-05-14 14:34:06.197 (UTC+0200)] trace/channel      Connection 19 | SecureChannel 184 | Generating new local keys
[2020-05-14 14:34:06.197 (UTC+0200)] trace/channel      Connection 19 | SecureChannel 184 | Generating new remote keys
[2020-05-14 14:34:06.197 (UTC+0200)] info/channel       Connection 19 | SecureChannel 184 | Opened SecureChannel with SecurityPolicy http:https://opcfoundation.org/UA/SecurityPolicy#None
[2020-05-14 14:34:06.197 (UTC+0200)] info/client        Client Status: ChannelState: Open, SessionState: Created, ConnectStatus: Good
[2020-05-14 14:34:06.197 (UTC+0200)] trace/client       Client connect iterate
[2020-05-14 14:34:06.197 (UTC+0200)] debug/channel      Connection 19 | SecureChannel 184 | Sending request with RequestId 6309 of type ActivateSessionRequest
[2020-05-14 14:34:06.198 (UTC+0200)] debug/client       Client Status: ChannelState: Open, SessionState: ActivateRequested, ConnectStatus: Good
[2020-05-14 14:34:06.198 (UTC+0200)] trace/client       Client connect iterate
[2020-05-14 14:34:06.198 (UTC+0200)] info/client        The ServiceResult has the StatusCode BadInternalError
[2020-05-14 14:34:06.198 (UTC+0200)] error/client       ActivateSession failed with error code BadInternalError
[2020-05-14 14:34:06.198 (UTC+0200)] info/client        Client Status: ChannelState: Open, SessionState: ActivateRequested, ConnectStatus: BadInternalError
[2020-05-14 14:34:06.199 (UTC+0200)] info/client        Client Status: ChannelState: Closed, SessionState: Closed, ConnectStatus: BadInternalError

Background Information / Reproduction Steps

Used CMake options:

cmake -DUA_ENABLE_AMALGAMATION=ON -DUA_LOGLEVEL=1 -DUA_ENABLE_PUBSUB=ON -DUA_ENABLE_SUBSCRIPTIONS=ON -DBUILD_SHARED_LIBS=ON -DCMAKE_BUILD_TYPE=RelWithDebInfo ..

Checklist

Please provide the following information:

  • open62541 Version (release number or git tag): 1a667a0
  • Other OPC UA SDKs used (client or server):
  • Operating system: Linux
  • Logs (with UA_LOGLEVEL set as low as necessary) attached
  • Wireshark network dump attached
  • Self-contained code example attached
  • Critical issue
@jpfr
Copy link
Member

jpfr commented May 14, 2020

Can you run in a debugger or with valgrind and send a stack trace?

It seems you call UA_Client_connect a lot. Why?

@kleskjr
Copy link
Author

kleskjr commented May 15, 2020

Thanks for the reply, jpfr!

I have a loop in which the connect is run every second. Removing it doesn't change the issue. Here is the log. Interestingly after the BadInternalError, UA_Client_run_iterate(client, 1000) starts itterating like crazy without respecting the timeout (I had to put a sleep of 100 do limit the Client connect iterate).

The output of valgrind --leak-check=yes ./subclient:

 Value has changed!                                                                                                                          
[2020-05-15 11:05:01.582 (UTC+0200)] info/userland      reading value...(iter=22.000000)                                                     
[2020-05-15 11:05:01.583 (UTC+0200)] debug/channel      Connection 3 | SecureChannel 305 | Sending request with RequestId 444 of type ReadReq
uest                                                                          
[2020-05-15 11:05:01.584 (UTC+0200)] debug/client       Decode a message of type ReadResponse
[2020-05-15 11:05:01.585 (UTC+0200)] info/userland      read value is 22.000000 (iter=22.000000                                             
[2020-05-15 11:05:03.085 (UTC+0200)] info/userland      quiting callback (iter=22.000000)
[2020-05-15 11:05:03.086 (UTC+0200)] debug/channel      Connection 3 | SecureChannel 305 | Sending request with RequestId 445 of type Publish
Request                                                                                   
[2020-05-15 11:05:03.581 (UTC+0200)] info/userland                            
 Value has changed!                                                                                                                     
[2020-05-15 11:05:03.582 (UTC+0200)] info/userland      reading value...(iter=23.000000)
[2020-05-15 11:05:03.582 (UTC+0200)] debug/channel      Connection 3 | SecureChannel 305 | Sending request with RequestId 446 of type ReadReq
uest                                                                                                                    
[2020-05-15 11:05:03.583 (UTC+0200)] debug/client       Decode a message of type ReadResponse
[2020-05-15 11:05:03.584 (UTC+0200)] info/userland      read value is 23.000000 (iter=23.000000
[2020-05-15 11:05:05.085 (UTC+0200)] info/userland      quiting callback (iter=23.000000)
[2020-05-15 11:05:05.087 (UTC+0200)] debug/channel      Connection 3 | SecureChannel 305 | Requesting to renew the SecureChannel
[2020-05-15 11:05:05.088 (UTC+0200)] debug/channel      Requesting to open a SecureChannel
[2020-05-15 11:05:05.089 (UTC+0200)] debug/channel      OPN message sent
[2020-05-15 11:05:05.090 (UTC+0200)] debug/channel      Connection 3 | SecureChannel 305 | Sending request with RequestId 448 of type Publish
Request
[2020-05-15 11:05:05.192 (UTC+0200)] trace/channel      Connection 3 | SecureChannel 305 | Verifying chunk signature
[2020-05-15 11:05:05.193 (UTC+0200)] debug/channel      Connection 3 | SecureChannel 305 | Generating SecureChannel keys
[2020-05-15 11:05:05.193 (UTC+0200)] trace/channel      Connection 3 | SecureChannel 305 | Generating new local keys
[2020-05-15 11:05:05.194 (UTC+0200)] trace/channel      Connection 3 | SecureChannel 305 | Generating new remote keys
[2020-05-15 11:05:05.195 (UTC+0200)] info/channel       Connection 3 | SecureChannel 305 | SecureChannel renewed
[2020-05-15 11:05:05.583 (UTC+0200)] warn/channel       Connection 3 | SecureChannel 305 | Received an unknown SecurityToken
[2020-05-15 11:05:05.586 (UTC+0200)] warn/channel       Connection 3 | SecureChannel 305 | Receiving the response failed with StatusCode BadS
ecureChannelTokenUnknown
[2020-05-15 11:05:05.599 (UTC+0200)] warn/channel       Connection 0 | SecureChannel 0 | Could not receive with StatusCode BadConnectionClose
d
[2020-05-15 11:05:05.600 (UTC+0200)] info/client        Client Status: ChannelState: Closed, SessionState: Created, ConnectStatus: Good
[2020-05-15 11:05:05.701 (UTC+0200)] trace/client       Client connect iterate
[2020-05-15 11:05:05.813 (UTC+0200)] trace/client       Client connect iterate
[2020-05-15 11:05:05.914 (UTC+0200)] trace/client       Client connect iterate
[2020-05-15 11:05:05.914 (UTC+0200)] debug/channel      Connection 3 | SecureChannel 0 | Generating new local nonce
[2020-05-15 11:05:05.915 (UTC+0200)] debug/network      Sent HEL message                
[2020-05-15 11:05:05.915 (UTC+0200)] debug/client       Client Status: ChannelState: HELSent, SessionState: Created, ConnectStatus: Good     
[2020-05-15 11:05:06.015 (UTC+0200)] trace/client       Client connect iterate
[2020-05-15 11:05:06.016 (UTC+0200)] debug/network      Received ACK message                 
[2020-05-15 11:05:06.017 (UTC+0200)] debug/client       Client Status: ChannelState: AckReceived, SessionState: Created, ConnectStatus: Good
[2020-05-15 11:05:06.117 (UTC+0200)] trace/client       Client connect iterate           
[2020-05-15 11:05:06.117 (UTC+0200)] debug/channel      Connection 3 | SecureChannel 0 | Requesting to open a SecureChannel                  
[2020-05-15 11:05:06.117 (UTC+0200)] debug/channel      Requesting to open a SecureChannel
[2020-05-15 11:05:06.117 (UTC+0200)] debug/channel      OPN message sent      
[2020-05-15 11:05:06.117 (UTC+0200)] debug/client       Client Status: ChannelState: OPNSent, SessionState: Created, ConnectStatus: Good
[2020-05-15 11:05:06.218 (UTC+0200)] trace/client       Client connect iterate          
[2020-05-15 11:05:06.218 (UTC+0200)] trace/channel      Connection 3 | SecureChannel 0 | Verifying chunk signature                           
[2020-05-15 11:05:06.218 (UTC+0200)] debug/channel      Connection 3 | SecureChannel 311 | Generating SecureChannel keys
[2020-05-15 11:05:06.218 (UTC+0200)] trace/channel      Connection 3 | SecureChannel 311 | Generating new local keys
[2020-05-15 11:05:06.218 (UTC+0200)] trace/channel      Connection 3 | SecureChannel 311 | Generating new remote keys
[2020-05-15 11:05:06.219 (UTC+0200)] info/channel       Connection 3 | SecureChannel 311 | Opened SecureChannel with SecurityPolicy http:https://op
cfoundation.org/UA/SecurityPolicy#None
[2020-05-15 11:05:06.219 (UTC+0200)] info/client        Client Status: ChannelState: Open, SessionState: Created, ConnectStatus: Good
[2020-05-15 11:05:06.319 (UTC+0200)] trace/client       Client connect iterate
[2020-05-15 11:05:06.319 (UTC+0200)] debug/channel      Connection 3 | SecureChannel 311 | Sending request with RequestId 451 of type Activat
eSessionRequest
[2020-05-15 11:05:06.320 (UTC+0200)] debug/client       Client Status: ChannelState: Open, SessionState: ActivateRequested, ConnectStatus: Go
od
[2020-05-15 11:05:06.420 (UTC+0200)] trace/client       Client connect iterate
[2020-05-15 11:05:06.420 (UTC+0200)] info/client        The ServiceResult has the StatusCode BadInternalError
[2020-05-15 11:05:06.421 (UTC+0200)] error/client       ActivateSession failed with error code BadInternalError
[2020-05-15 11:05:06.421 (UTC+0200)] error/client       ActivateSession failed with error code BadInternalError
[2020-05-15 11:05:06.422 (UTC+0200)] info/client        Client Status: ChannelState: Open, SessionState: ActivateRequested, ConnectStatus: Ba
dInternalError
[2020-05-15 11:05:06.522 (UTC+0200)] trace/client       Client connect iterate
[2020-05-15 11:05:06.622 (UTC+0200)] trace/client       Client connect iterate
[2020-05-15 11:05:06.723 (UTC+0200)] trace/client       Client connect iterate
[2020-05-15 11:05:06.823 (UTC+0200)] trace/client       Client connect iterate
[2020-05-15 11:05:06.924 (UTC+0200)] trace/client       Client connect iterate
[2020-05-15 11:05:07.024 (UTC+0200)] trace/client       Client connect iterate
[2020-05-15 11:05:07.125 (UTC+0200)] trace/client       Client connect iterate
[2020-05-15 11:05:07.226 (UTC+0200)] trace/client       Client connect iterate
[2020-05-15 11:05:07.326 (UTC+0200)] trace/client       Client connect iterate
[2020-05-15 11:05:07.426 (UTC+0200)] trace/client       Client connect iterate
[2020-05-15 11:05:07.527 (UTC+0200)] trace/client       Client connect iterate

...

[2020-05-15 11:05:09.833 (UTC+0200)] trace/client       Client connect iterate
[2020-05-15 11:05:09.933 (UTC+0200)] trace/client       Client connect iterate
^C[2020-05-15 11:05:10.031 (UTC+0200)] info/userland    Received Ctrl-C
==24076==
==24076== HEAP SUMMARY:
==24076==     in use at exit: 4,578 bytes in 49 blocks
==24076==   total heap usage: 5,875 allocs, 5,826 frees, 58,805,141 bytes allocated
==24076==
==24076== 4,578 (1,168 direct, 3,410 indirect) bytes in 1 blocks are definitely lost in loss record 17 of 17
==24076==    at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==24076==    by 0x154F7B: UA_Client_newWithConfig (open62541.c:43392)
==24076==    by 0x18EC52: UA_Client_new (open62541.c:58318)
==24076==    by 0x194AF9: main (subscribe_dev.c:203)
==24076==
==24076== LEAK SUMMARY:
==24076==    definitely lost: 1,168 bytes in 1 blocks
==24076==    indirectly lost: 3,410 bytes in 48 blocks
==24076==      possibly lost: 0 bytes in 0 blocks
==24076==    still reachable: 0 bytes in 0 blocks
==24076==         suppressed: 0 bytes in 0 blocks
==24076==
==24076== For counts of detected and suppressed errors, rerun with: -v
==24076== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

@jpfr
Copy link
Member

jpfr commented May 15, 2020

The connection fails with this:

Client Status: ChannelState: Open, SessionState: ActivateRequested, ConnectStatus: BadInternalError

After this, Client_run_iterate will immediately bail every time.
You have to call Client_connect to reset the ConnectStatus.

Can you reproduce the crash with Valgrind as well?
A stack trace would be most interesting.

@kleskjr
Copy link
Author

kleskjr commented May 15, 2020

After this, Client_run_iterate will immediately bail every time.
You have to call Client_connect to reset the ConnectStatus.

yes, therefore I was calling Client_connect, although it didn't help. So far, I need to delete the client, and create a new one each time I get bad ConnectStatus (every 450 secs). Maybe there is some fundamental flaw in my code? Here is a minimal example that reproduces the BadConnectionClose error, without a crash:

#include "open62541.h"
// gcc -std=c99 -g -O0 -I ../res ../res/open62541.c subscribe_min.c -DUA_ARCHITECTURE_POSIX -o subclient

UA_Boolean running = true;
UA_Boolean reset_client = true;

static void stopHandler(int sign) {
    UA_LOG_INFO(UA_Log_Stdout, UA_LOGCATEGORY_USERLAND, "Received Ctrl-C");
    running = 0;
}

static void
cb_valuechanged(UA_Client *client, UA_UInt32 subId, void *subContext,
        UA_UInt32 monId, void *monContext, UA_DataValue *value) {

    UA_LOG_INFO(UA_Log_Stdout, UA_LOGCATEGORY_USERLAND, "Value has changed!");

    if(UA_Variant_hasScalarType(&value->value, &UA_TYPES[UA_TYPES_FLOAT])) {
        UA_LOG_INFO(UA_Log_Stdout, UA_LOGCATEGORY_USERLAND, "The value is: %f\n", *(UA_Float *) value->value.data);
    }
}

static void
cb_valuechanged2(UA_Client *client, UA_UInt32 subId, void *subContext,
        UA_UInt32 monId, void *monContext, UA_DataValue *value) {
    UA_LOG_INFO(UA_Log_Stdout, UA_LOGCATEGORY_USERLAND, "\n Value has changed!");
    if(UA_Variant_hasScalarType(&value->value, &UA_TYPES[UA_TYPES_FLOAT])) {
        //printf("the value is: %f\n", *(UA_Float *) value->value.data);
    }

    UA_LOG_INFO(UA_Log_Stdout, UA_LOGCATEGORY_USERLAND, "reading value...(iter=%f)", *(UA_Float *) value->value.data);
    UA_Variant *val = UA_Variant_new();
    UA_StatusCode retval = UA_Client_readValueAttribute(client, UA_NODEID_NUMERIC(3, 10202), val);
    UA_LOG_INFO(UA_Log_Stdout, UA_LOGCATEGORY_USERLAND, "read value is %f (iter=%f", *((float*)val->data), *(UA_Float *) value->value.data);
    if (*(float*)val->data != *(float*)value->value.data) {
        printf("counters don't match!!!!!!!!!!!!!!!!\n");
        return;
    }

    UA_Variant_delete(val);
    UA_sleep_ms(1500);
    UA_LOG_INFO(UA_Log_Stdout, UA_LOGCATEGORY_USERLAND, "quiting callback (iter=%f)", *(UA_Float *) value->value.data);
}

void subscribe(UA_Client *client){
    UA_CreateSubscriptionRequest request = UA_CreateSubscriptionRequest_default();
    UA_CreateSubscriptionResponse response = UA_Client_Subscriptions_create(client, request,
                                                                            NULL, NULL, NULL);

    if(response.responseHeader.serviceResult == UA_STATUSCODE_GOOD)
        UA_LOG_INFO(UA_Log_Stdout, UA_LOGCATEGORY_USERLAND, "Create subscription succeeded, id %u", response.subscriptionId);

    UA_LOG_INFO(UA_Log_Stdout, UA_LOGCATEGORY_USERLAND, "subscribing....");
    UA_MonitoredItemCreateRequest monRequest =
        UA_MonitoredItemCreateRequest_default(UA_NODEID_NUMERIC(3, 10203));

    monRequest.requestedParameters.samplingInterval = 10;
    monRequest.requestedParameters.discardOldest = true;
    UA_MonitoredItemCreateResult monResponse =
        UA_Client_MonitoredItems_createDataChange(client, response.subscriptionId,
                                                  UA_TIMESTAMPSTORETURN_BOTH,
                                                  monRequest, NULL, cb_valuechanged, NULL);
    if(monResponse.statusCode == UA_STATUSCODE_GOOD)
        UA_LOG_INFO(UA_Log_Stdout, UA_LOGCATEGORY_USERLAND, "Monitoring id %u", monResponse.monitoredItemId);
}


int main(void) {
    signal(SIGINT, stopHandler); /* catches ctrl-c */

    UA_Client *client = UA_Client_new();
    UA_ClientConfig *config = UA_Client_getConfig(client);
    UA_ClientConfig_setDefault(config);
        
    UA_StatusCode retval = UA_Client_connect(client, "opc.tcp:https://localhost:4840");
    subscribe(client);

    while(running) {
        UA_Client_run_iterate(client, 1000);
        UA_sleep_ms(100);
    }
}

To reprodice the crash, one needs to change the callback function from cb_valuechanged to
cb_valuechanged2. Keep in mind that it does not always happen! Here is a stack trace from crash:

 Value has changed!                                                                                                                                 
[2020-05-15 13:36:08.137 (UTC+0200)] info/userland      reading value...(iter=219.000000)                                                           
[2020-05-15 13:36:08.138 (UTC+0200)] debug/channel      Connection 3 | SecureChannel 362 | Sending request with RequestId 457 of type ReadRequest   
[2020-05-15 13:36:08.139 (UTC+0200)] debug/client       Decode a message of type ReadResponse                                                       
[2020-05-15 13:36:08.140 (UTC+0200)] info/userland      read value is 219.000000 (iter=219.000000                                                
[2020-05-15 13:36:09.640 (UTC+0200)] info/userland      quiting callback (iter=219.000000)                                                       
[2020-05-15 13:36:09.640 (UTC+0200)] debug/channel      Connection 3 | SecureChannel 362 | Sending request with RequestId 458 of type PublishRequest
[2020-05-15 13:36:10.137 (UTC+0200)] info/userland  
 Value has changed!
[2020-05-15 13:36:10.138 (UTC+0200)] info/userland      reading value...(iter=220.000000)
[2020-05-15 13:36:10.140 (UTC+0200)] debug/channel      Connection 3 | SecureChannel 362 | Requesting to renew the SecureChannel
[2020-05-15 13:36:10.140 (UTC+0200)] debug/channel      Requesting to open a SecureChannel
[2020-05-15 13:36:10.142 (UTC+0200)] debug/channel      OPN message sent
[2020-05-15 13:36:10.142 (UTC+0200)] debug/channel      Connection 3 | SecureChannel 362 | Sending request with RequestId 460 of type ReadRequest
[2020-05-15 13:36:10.145 (UTC+0200)] trace/channel      Connection 3 | SecureChannel 362 | Verifying chunk signature
[2020-05-15 13:36:10.145 (UTC+0200)] debug/channel      Connection 3 | SecureChannel 362 | Generating SecureChannel keys
[2020-05-15 13:36:10.145 (UTC+0200)] trace/channel      Connection 3 | SecureChannel 362 | Generating new local keys
[2020-05-15 13:36:10.146 (UTC+0200)] trace/channel      Connection 3 | SecureChannel 362 | Generating new remote keys
[2020-05-15 13:36:10.147 (UTC+0200)] info/channel       Connection 3 | SecureChannel 362 | SecureChannel renewed
[2020-05-15 13:36:10.149 (UTC+0200)] warn/channel       Connection 3 | SecureChannel 362 | Received an unknown SecurityToken
[2020-05-15 13:36:10.152 (UTC+0200)] warn/channel       Connection 3 | SecureChannel 362 | Receiving the response failed with StatusCode BadSecureChannelTokenUnknown
[2020-05-15 13:36:10.162 (UTC+0200)] info/client        Client Status: ChannelState: Closed, SessionState: Created, ConnectStatus: Good
==12963== Invalid read of size 4
==12963==    at 0x194691: cb_valuechanged2 (subscribe_dev.c:114)
==12963==    by 0x15F967: processDataChangeNotification (open62541.c:47411)
==12963==    by 0x15FB1C: processNotificationMessage (open62541.c:47462)
==12963==    by 0x15FF8D: UA_Client_Subscriptions_processPublishResponse (open62541.c:47584)
==12963==    by 0x1600EA: processPublishResponseAsync (open62541.c:47612)
==12963==    by 0x155933: processAsyncResponse (open62541.c:43636)
==12963==    by 0x155AE0: processServiceResponse (open62541.c:43685)
==12963==    by 0x12183F: assembleProcessMessage (open62541.c:22850)
==12963==    by 0x121C5A: processSymmetricChunks (open62541.c:22962)
==12963==    by 0x121D91: processCompleteChunks (open62541.c:22985)
==12963==    by 0x12223C: UA_SecureChannel_processBuffer (open62541.c:23142)
==12963==    by 0x122310: UA_SecureChannel_receive (open62541.c:23168)
==12963==    by 0x155E16: receiveResponse (open62541.c:43762)
==12963==    by 0x1568D7: UA_Client_run_iterate (open62541.c:44029)
==12963==    by 0x194B4E: main (subscribe_dev.c:242)
==12963==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==12963==
==12963==
==12963== Process terminating with default action of signal 11 (SIGSEGV)
==12963==  Access not within mapped region at address 0x0
==12963==    at 0x194691: cb_valuechanged2 (subscribe_dev.c:114)
==12963==    by 0x15F967: processDataChangeNotification (open62541.c:47411)
==12963==    by 0x15FB1C: processNotificationMessage (open62541.c:47462)
==12963==    by 0x15FF8D: UA_Client_Subscriptions_processPublishResponse (open62541.c:47584)
==12963==    by 0x1600EA: processPublishResponseAsync (open62541.c:47612)
==12963==    by 0x155933: processAsyncResponse (open62541.c:43636)
==12963==    by 0x155AE0: processServiceResponse (open62541.c:43685)
==12963==    by 0x12183F: assembleProcessMessage (open62541.c:22850)
==12963==    by 0x121C5A: processSymmetricChunks (open62541.c:22962)
==12963==    by 0x121D91: processCompleteChunks (open62541.c:22985)
==12963==    by 0x12223C: UA_SecureChannel_processBuffer (open62541.c:23142)
==12963==    by 0x122310: UA_SecureChannel_receive (open62541.c:23168)
==12963==    by 0x155E16: receiveResponse (open62541.c:43762)
==12963==    by 0x1568D7: UA_Client_run_iterate (open62541.c:44029)
==12963==    by 0x194B4E: main (subscribe_dev.c:242)
==12963==  If you believe this happened as a result of a stack
==12963==  overflow in your program's main thread (unlikely but
==12963==  possible), you can try to increase the size of the
==12963==  main thread stack using the --main-stacksize= flag.
==12963==  The main thread stack size used in this run was 8388608.
==12963==
==12963== HEAP SUMMARY:
==12963==     in use at exit: 70,289 bytes in 58 blocks
==12963==   total heap usage: 5,966 allocs, 5,908 frees, 59,975,087 bytes allocated
==12963==
==12963== 52 (48 direct, 4 indirect) bytes in 1 blocks are definitely lost in loss record 13 of 25
==12963==    at 0x4C31B25: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==12963==    by 0x118FBE: UA_new (open62541.c:9045)
==12963==    by 0x1940EB: UA_Variant_new (open62541.h:17768)
==12963==    by 0x194637: cb_valuechanged2 (subscribe_dev.c:112)
==12963==    by 0x15F967: processDataChangeNotification (open62541.c:47411)
==12963==    by 0x15FB1C: processNotificationMessage (open62541.c:47462)
==12963==    by 0x15FF8D: UA_Client_Subscriptions_processPublishResponse (open62541.c:47584)
==12963==    by 0x1600EA: processPublishResponseAsync (open62541.c:47612)
==12963==    by 0x155933: processAsyncResponse (open62541.c:43636)
==12963==    by 0x155AE0: processServiceResponse (open62541.c:43685)
==12963==    by 0x12183F: assembleProcessMessage (open62541.c:22850)
==12963==    by 0x121C5A: processSymmetricChunks (open62541.c:22962)
==12963==    by 0x121D91: processCompleteChunks (open62541.c:22985)
==12963==    by 0x12223C: UA_SecureChannel_processBuffer (open62541.c:23142)
==12963==    by 0x122310: UA_SecureChannel_receive (open62541.c:23168)
==12963==    by 0x155E16: receiveResponse (open62541.c:43762)
==12963==    by 0x1568D7: UA_Client_run_iterate (open62541.c:44029)
==12963==    by 0x194B4E: main (subscribe_dev.c:242)
==12963==
==12963== LEAK SUMMARY:
==12963==    definitely lost: 48 bytes in 1 blocks
==12963==    indirectly lost: 4 bytes in 1 blocks
==12963==      possibly lost: 0 bytes in 0 blocks
==12963==    still reachable: 70,237 bytes in 56 blocks
==12963==         suppressed: 0 bytes in 0 blocks
==12963== Reachable blocks (those to which a pointer was found) are not shown.
==12963== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==12963==
==12963== For counts of detected and suppressed errors, rerun with: -v
==12963== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
zsh: segmentation fault (core dumped)  valgrind --num-callers=50 --leak-check=yes ./subclient

@ichrispa ichrispa added this to the 1.1 milestone May 15, 2020
@jpfr
Copy link
Member

jpfr commented May 15, 2020

Looking at cb_valuechanged2 it looks as if this is the problem.

    UA_Variant *val = UA_Variant_new();
    UA_Client_readValueAttribute(client, UA_NODEID_NUMERIC(3, 10202), val);
    UA_LOG_INFO(UA_Log_Stdout, UA_LOGCATEGORY_USERLAND, "read value is %f (iter=%f",
                *((float*)val->data), *(UA_Float *) value->value.data);

When the client is disconnected, UA_Client_readValueAttribute fails and val will be empty.
Check before dereferencing.

@kleskjr
Copy link
Author

kleskjr commented May 15, 2020

Yes, that's true, the reference to val is causing the crash. Thanks!
However, the client can't recover from BadInternalError during SecureChannel renewal, and UA_Client_connect doesn't reconnect. Do you get a similar behaviour?

@jpfr jpfr changed the title Crashing at Requesting to open a SecureChannel Reconnect with ongoing Subscriptions May 18, 2020
@kleskjr
Copy link
Author

kleskjr commented May 19, 2020

Hi jpfr,

I'm getting this nasty BadInternalError for any client actually, not just during subscrption.
Attached, you can see a minimal example of a writing client that runs into the same problem. If noone else is experiencing this issue, I guess that there should be something wrong in my implementation.

for building the library, I use cmake -DUA_ENABLE_AMALGAMATION=ON -DBUILD_SHARED_LIBS=ON ../

then to compile the client below: gcc -std=c99 -I ../res ../res/open62541.c write_min.c -o writeclient

#include "open62541.h"
UA_Boolean running = true;

static void stopHandler(int sign) {
    UA_LOG_INFO(UA_Log_Stdout, UA_LOGCATEGORY_USERLAND, "Received Ctrl-C");
    running = 0;
}

int main(void) {
    signal(SIGINT, stopHandler); /* catches ctrl-c */

    UA_Client *client = UA_Client_new();
    UA_ClientConfig *config = UA_Client_getConfig(client);
    UA_ClientConfig_setDefault(config);
        
    UA_StatusCode retval = UA_Client_connect(client, "opc.tcp:https://localhost:4840");
    UA_UInt16 ns = 3;
    UA_UInt32 idname = 10203;
    
    float value = 0.0f;

    UA_Variant *val = UA_Variant_new();

    while(running) {
        UA_StatusCode retval = UA_Client_connect(client, "opc.tcp:https://localhost:4840");
        printf("connected.....\n");
        if(retval != UA_STATUSCODE_GOOD) {
            UA_LOG_ERROR(UA_Log_Stdout, UA_LOGCATEGORY_USERLAND,
                         "Not connected. Retrying to connect in 1 second");
            UA_sleep_ms(1000);
            continue;
        }
        value += 1.0f;
        UA_Variant_setScalarCopy(val, &value, &UA_TYPES[UA_TYPES_FLOAT]);
        UA_Client_writeValueAttribute(client, UA_NODEID_NUMERIC(ns, idname), val);
        UA_LOG_INFO(UA_Log_Stdout, UA_LOGCATEGORY_USERLAND, "value written");
        UA_sleep_ms(1000);
    }
    UA_Variant_delete(val);
}

Edit: as a workaround, placing UA_Client_disconnect(client) directly after the check if(retval != UA_STATUSCODE_GOOD) seems to save the day.

@srdgame
Copy link

srdgame commented Jun 2, 2020

I guess you meet the same issue as mine.

It is about the secure channel session renewal, which caused by:

---------------------------- src/client/ua_client.c ----------------------------
index 58f3f14f..03b09aba 100644
@@ -444,6 +444,10 @@ __UA_Client_Service(UA_Client *client, const void *request,
     if(client->channel.state != UA_SECURECHANNELSTATE_OPEN) {
         UA_LOG_INFO(&client->config.logger, UA_LOGCATEGORY_CLIENT,
                     "SecureChannel must be connected before sending requests");
+		// DIRK:
+		// Set proper serviceResult
+        UA_ResponseHeader *respHeader = (UA_ResponseHeader*)response;
+        respHeader->serviceResult = UA_STATUSCODE_BADCONNECTIONCLOSED;
 		return;

And there is another issue about session renewal, which is about the tokenId. I am not sure about the logic of nextSecurityToken, so I have an dirty fix on following

------------------------ src/ua_securechannel_crypto.c ------------------------
index ed8f0eeb..219a2443 100644
@@ -592,6 +592,14 @@ UA_StatusCode
 checkSymHeader(UA_SecureChannel *channel, UA_UInt32 tokenId) {
     /* If the message uses a different token, check if it is the next token. */
     if(tokenId != channel->securityToken.tokenId) {
+		/// Dirk fix
+		if (channel->nextSecurityToken.tokenId == 0) {

@srdgame
Copy link

srdgame commented Jun 2, 2020

@jpfr
If the session renewal will not update the session state via the State Callback, does it means that requests can be sent out during the session renewal?

@jpfr
Copy link
Member

jpfr commented Jun 7, 2020

Yes, you can send requests during the renewal.

@jpfr
Copy link
Member

jpfr commented Jun 7, 2020

@kleskjr I tried out the code but could not reproduce the issue.
What is your environment?
Here is my log (the SecureChannel lifetime was reduced to not wait for minutes on end):

jpfr@virtualbox:~/software/open62541/build$ ./a.out
[2020-06-07 12:59:24.207 (UTC+0200)] warn/userland      AcceptAll Certificate Verification. Any remote certificate will be accepted.
[2020-06-07 12:59:24.208 (UTC+0200)] info/channel       Connection 3 | SecureChannel 2 | Opened SecureChannel with SecurityPolicy http:https://opcfoundation.org/UA/SecurityPolicy#None
[2020-06-07 12:59:24.208 (UTC+0200)] info/client        Client Status: ChannelState: Open, SessionState: Closed, ConnectStatus: Good
[2020-06-07 12:59:24.208 (UTC+0200)] info/client        Selected Endpoint opc.tcp:https://localhost:4840 with SecurityMode None and SecurityPolicy http:https://opcfoundation.org/UA/SecurityPolicy#None
[2020-06-07 12:59:24.208 (UTC+0200)] info/client        Selected UserTokenPolicy open62541-anonymous-policy with UserTokenType Anonymous and SecurityPolicy http:https://opcfoundation.org/UA/SecurityPolicy#None
[2020-06-07 12:59:24.208 (UTC+0200)] info/client        Client Status: ChannelState: Open, SessionState: Created, ConnectStatus: Good
[2020-06-07 12:59:24.208 (UTC+0200)] info/client        Client Status: ChannelState: Open, SessionState: Activated, ConnectStatus: Good
[2020-06-07 12:59:24.208 (UTC+0200)] warn/client        Client already connected
connected.....
[2020-06-07 12:59:24.208 (UTC+0200)] info/userland      value written
[2020-06-07 12:59:25.209 (UTC+0200)] warn/client        Client already connected
connected.....
[2020-06-07 12:59:25.209 (UTC+0200)] info/userland      value written
[2020-06-07 12:59:26.209 (UTC+0200)] warn/client        Client already connected
connected.....
[2020-06-07 12:59:26.209 (UTC+0200)] info/userland      value written
[2020-06-07 12:59:27.210 (UTC+0200)] warn/client        Client already connected
connected.....
[2020-06-07 12:59:27.210 (UTC+0200)] info/userland      value written
[2020-06-07 12:59:28.211 (UTC+0200)] warn/client        Client already connected
connected.....
[2020-06-07 12:59:28.211 (UTC+0200)] info/channel       Connection 3 | SecureChannel 2 | SecureChannel renewed
[2020-06-07 12:59:28.211 (UTC+0200)] info/userland      value written
[2020-06-07 12:59:29.212 (UTC+0200)] warn/client        Client already connected
connected.....
[2020-06-07 12:59:29.212 (UTC+0200)] info/userland      value written
[2020-06-07 12:59:30.212 (UTC+0200)] warn/client        Client already connected

jpfr added a commit to jpfr/open62541 that referenced this issue Jun 11, 2020
jpfr added a commit to jpfr/open62541 that referenced this issue Jun 11, 2020
jpfr added a commit to jpfr/open62541 that referenced this issue Jun 12, 2020
jpfr added a commit that referenced this issue Jun 12, 2020
@jpfr
Copy link
Member

jpfr commented Jun 12, 2020

@srdgame The first part of your proposed fix was merged in #3696.
Thanks!

Investigating the second part...

@kleskjr
Copy link
Author

kleskjr commented Jun 12, 2020

@jpfr
using standart Ubuntu18.04/macOS and amalgamation.

build with:
cmake -DUA_ENABLE_AMALGAMATION=ON -DBUILD_SHARED_LIBS=ON ../

compiled with:
gcc -std=c99 open62541.c write_min.c -o write_min

versions:
cmake version 3.10.2
gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0

@srdgame
Copy link

srdgame commented Jul 28, 2020

@jpfr If you set the serviceResult to be BADCONNECTIONCLOSED, will it generate an session closed state?

In my case that the opcua service provided by an embedded device, which has pretty low performance on creating subscription. If we have more than 400 nodes to be subscribed, it takes more than 5 minutes which is the session timeout.

If session renewal generates the session state 'CLOSED', it will break initialization process of my application

Yes, the POOR device takes about one second to handle one 'SUB' request.

@Kaushil17
Copy link

Kaushil17 commented Jul 28, 2020

@jpfr facing same issue while connecting with server without any security policy I get an error like this.


[2020-07-28 21:31:26.696 (UTC+0550)] info/channel	Connection 3 | SecureChannel 2 | Opened SecureChannel with SecurityPolicy http:https://opcfoundation.org/UA/SecurityPolicy#None

[2020-07-28 21:31:26.696 (UTC+0550)] info/client	Client Status: ChannelState: Open, SessionState: Closed, ConnectStatus: Good

[2020-07-28 21:31:26.711 (UTC+0550)] info/client	Rejecting UserTokenPolicy 0 in endpoint 6: security policy 'http:https://opcfoundation.org/UA/SecurityPolicy#Basic128Rsa15' not available

[2020-07-28 21:31:26.711 (UTC+0550)] info/client	Selected Endpoint opc.tcp:https://kaushil:53530/OPCUA/SimulationServer with SecurityMode None and SecurityPolicy http:https://opcfoundation.org/UA/SecurityPolicy#None

[2020-07-28 21:31:26.711 (UTC+0550)] info/client	Selected UserTokenPolicy anonymous with UserTokenType Anonymous and SecurityPolicy http:https://opcfoundation.org/UA/SecurityPolicy#None

[2020-07-28 21:31:26.726 (UTC+0550)] info/client	Received a ServiceFault response

[2020-07-28 21:31:26.726 (UTC+0550)] info/client	The ServiceResult has the StatusCode BadInternalError

[2020-07-28 21:31:26.726 (UTC+0550)] info/client	Client Status: ChannelState: Open, SessionState: Closed, ConnectStatus: BadInternalError

[2020-07-28 21:31:26.726 (UTC+0550)] info/client	Client Status: ChannelState: Closed, SessionState: Closed, ConnectStatus: BadInternalError```

@NoelGraf NoelGraf added Component: Server Issues related to the server code Priority: Medium Issue with medium priority Status: Pending Needs final decision if issue is valid or should be abandoned. Or if the issue needs to be discussed Type: Bug Bug in the code which needs to be fixed labels May 6, 2021
@jpfr jpfr removed this from the 1.1 milestone Nov 23, 2021
@aiweed
Copy link

aiweed commented Oct 11, 2023

static UA_StatusCode
connectSync(UA_Client *client) {
    UA_DateTime now = UA_DateTime_nowMonotonic();
    UA_DateTime maxDate = now + ((UA_DateTime)client->config.timeout * UA_DATETIME_MSEC);

    UA_StatusCode retval = initConnect(client);
    if(retval != UA_STATUSCODE_GOOD)
        return retval;

    while(retval == UA_STATUSCODE_GOOD) {
        if(!client->endpointsHandshake && !client->findServersHandshake &&
           client->discoveryUrl.length > 0 &&
           (client->sessionState == UA_SESSIONSTATE_ACTIVATED ||
            (client->noSession && client->channel.state == UA_SECURECHANNELSTATE_OPEN)))
            break;
        now = UA_DateTime_nowMonotonic();
        if(maxDate < now)
            return UA_STATUSCODE_BADTIMEOUT;
        retval = UA_Client_run_iterate(client,
                                       (UA_UInt32)((maxDate - now) / UA_DATETIME_MSEC));
    }

    return retval;
}

这段代码如果连接是正确的,返回超时错误导致的,我理解的 UA_Client_run_iterate 这个函数返回结果如果是 UA_STATUSCODE_GOOD 然后直接return retval;

static UA_StatusCode
connectSync(UA_Client *client) {
    UA_DateTime now = UA_DateTime_nowMonotonic();
    UA_DateTime maxDate = now + ((UA_DateTime)client->config.timeout * UA_DATETIME_MSEC);

    UA_StatusCode retval = initConnect(client);
    if(retval != UA_STATUSCODE_GOOD)
        return retval;

    retval = -1;
    while(retval != UA_STATUSCODE_GOOD) {
        if(!client->endpointsHandshake && !client->findServersHandshake &&
           client->discoveryUrl.length > 0 &&
           (client->sessionState == UA_SESSIONSTATE_ACTIVATED ||
            (client->noSession && client->channel.state == UA_SECURECHANNELSTATE_OPEN)))
            break;
        now = UA_DateTime_nowMonotonic();
        if(maxDate < now)
            return retval;//返回真正的错误代码
        retval = UA_Client_run_iterate(client,
                                       (UA_UInt32)((maxDate - now) / UA_DATETIME_MSEC));
    }

    return retval;
}

版本 1.3.6
编译参数 cmake -DUA_BUILD_EXAMPLES=ON -DUA_ENABLE_PUBSUB=ON -DUA_ENABLE_SUBSCRIPTIONS=ON -DUA_ENABLE_SUBSCRIPTIONS_EVENTS=ON ..

@jpfr
Copy link
Member

jpfr commented Oct 11, 2023

@aiweed added this comment:

返回真正的错误代码 --> Return the actual error code

I don't fully agree with the change.
We want to return a timeout whenever time has run out.
That is independent from the sucess of UA_Client_run_iterate.

Since this issue was opened we have reworked the client logic and also tested against the Prosys simulation server.
Is there a publicly available example where the issue can still be reproduced?

@aiweed
Copy link

aiweed commented Oct 11, 2023

you are right.
My original intention is to solve the problem of automatic reconnection after the connection between the client and the server is disconnected. The example used in the example is client_subscription_loop.c and server.cpp, and then an error occurs.

[2023-10-11 18:07:45.398 (UTC+0800)] warn/userland	AcceptAll Certificate Verification. Any remote certificate will be accepted.
[2023-10-11 18:07:45.401 (UTC+0800)] info/userland	Waiting for ack
[2023-10-11 18:07:45.401 (UTC+0800)] info/userland	Session disconnected
[2023-10-11 18:07:45.401 (UTC+0800)] info/userland	Session disconnected
[2023-10-11 18:07:45.401 (UTC+0800)] info/userland	Waiting for OPN Response
[2023-10-11 18:07:45.401 (UTC+0800)] info/userland	Session disconnected
[2023-10-11 18:07:45.401 (UTC+0800)] info/channel	Connection 3 | SecureChannel 3 | SecureChannel opened with SecurityPolicy http:https://opcfoundation.org/UA/SecurityPolicy#None and a revised lifetime of 600.00s
[2023-10-11 18:07:45.401 (UTC+0800)] info/client	Client Status: ChannelState: Open, SessionState: Closed, ConnectStatus: Good
[2023-10-11 18:07:45.401 (UTC+0800)] info/userland	A SecureChannel to the server is open
[2023-10-11 18:07:45.401 (UTC+0800)] info/userland	Session disconnected
[2023-10-11 18:07:45.401 (UTC+0800)] info/client	Use the EndpointURL opc.tcp:https://jason-virtual-machine:4840/ returned from FindServers
[2023-10-11 18:07:45.401 (UTC+0800)] info/client	Client Status: ChannelState: Closed, SessionState: Closed, ConnectStatus: Good
[2023-10-11 18:07:45.401 (UTC+0800)] info/userland	The client is disconnected
[2023-10-11 18:07:45.401 (UTC+0800)] info/userland	Session disconnected
[2023-10-11 18:07:45.401 (UTC+0800)] info/userland	The client is disconnected
[2023-10-11 18:07:45.401 (UTC+0800)] info/userland	Session disconnected
[2023-10-11 18:07:45.402 (UTC+0800)] info/userland	Waiting for ack
[2023-10-11 18:07:45.402 (UTC+0800)] info/userland	Session disconnected
[2023-10-11 18:07:45.402 (UTC+0800)] info/userland	Session disconnected
[2023-10-11 18:07:45.402 (UTC+0800)] info/userland	Waiting for OPN Response
[2023-10-11 18:07:45.402 (UTC+0800)] info/userland	Session disconnected
[2023-10-11 18:07:45.402 (UTC+0800)] info/channel	Connection 3 | SecureChannel 4 | SecureChannel opened with SecurityPolicy http:https://opcfoundation.org/UA/SecurityPolicy#None and a revised lifetime of 600.00s
[2023-10-11 18:07:45.403 (UTC+0800)] info/client	Client Status: ChannelState: Open, SessionState: Closed, ConnectStatus: Good
[2023-10-11 18:07:45.403 (UTC+0800)] info/userland	A SecureChannel to the server is open
[2023-10-11 18:07:45.403 (UTC+0800)] info/userland	Session disconnected
[2023-10-11 18:07:45.403 (UTC+0800)] info/client	Selected endpoint 0 in URL opc.tcp:https://jason-virtual-machine:4840/ with SecurityMode None and SecurityPolicy http:https://opcfoundation.org/UA/SecurityPolicy#None
[2023-10-11 18:07:45.403 (UTC+0800)] info/client	Selected UserTokenPolicy open62541-anonymous-policy with UserTokenType Anonymous and SecurityPolicy http:https://opcfoundation.org/UA/SecurityPolicy#None
[2023-10-11 18:07:45.403 (UTC+0800)] info/userland	A SecureChannel to the server is open
[2023-10-11 18:07:45.403 (UTC+0800)] info/client	Client Status: ChannelState: Open, SessionState: Created, ConnectStatus: Good
[2023-10-11 18:07:45.403 (UTC+0800)] info/userland	A SecureChannel to the server is open
[2023-10-11 18:07:45.403 (UTC+0800)] info/userland	A SecureChannel to the server is open
[2023-10-11 18:07:45.403 (UTC+0800)] info/client	Client Status: ChannelState: Open, SessionState: Activated, ConnectStatus: Good
[2023-10-11 18:07:45.404 (UTC+0800)] info/userland	A SecureChannel to the server is open
[2023-10-11 18:07:45.404 (UTC+0800)] info/userland	A session with the server is activated
[2023-10-11 18:07:45.404 (UTC+0800)] info/userland	Create subscription succeeded, id 2
[2023-10-11 18:07:45.404 (UTC+0800)] info/userland	Monitoring UA_NS0ID_SERVER_SERVERSTATUS_CURRENTTIME', id 1
[2023-10-11 18:07:45.404 (UTC+0800)] warn/client	Client already connected
[2023-10-11 18:07:50.404 (UTC+0800)] error/userland	BadTimeout
[2023-10-11 18:07:51.405 (UTC+0800)] warn/client	Client already connected
[2023-10-11 18:07:56.405 (UTC+0800)] error/userland	BadTimeout
[2023-10-11 18:07:57.405 (UTC+0800)] warn/client	Client already connected
[2023-10-11 18:08:02.405 (UTC+0800)] error/userland	BadTimeout
[2023-10-11 18:08:03.406 (UTC+0800)] warn/client	Client already connected

If I modify it like this, I find that the server restarts abnormally and the client cannot automatically connect. If the network is disconnected, it seems to be able to connect automatically, but if the server restarts, it cannot reconnect normally.

int
main(void) {

    signal(SIGINT, stopHandler); /* catches ctrl-c */

    UA_Client *client = UA_Client_new();
    UA_ClientConfig *cc = UA_Client_getConfig(client);
    UA_ClientConfig_setDefault(cc);

    /* Set stateCallback */
    cc->stateCallback = stateCallback;
    cc->subscriptionInactivityCallback = subscriptionInactivityCallback;

    UA_StatusCode retval = -1;

    /* if already connected, this will return GOOD and do nothing */
    /* if the connection is closed/errored, the connection will be reset and then
     * reconnected */
    /* Alternatively you can also use UA_Client_getState to get the current state */
    if(retval != UA_STATUSCODE_GOOD) {
        // UA_Client_connect(client, "opc.tcp:https://192.168.0.213:4840");
        UA_Client_connect(client, "opc.tcp:https://localhost:4840");
        UA_sleep_ms(1000);
    }

    while(running) {

        retval = UA_Client_run_iterate(client, 500);
    };

    /* Clean up */
    UA_Client_delete(client); /* Disconnects the client internally */
    return EXIT_SUCCESS;
}

Therefore, I have judged the timing of reconnection and can reconnect normally.

    UA_StatusCode retval = -1;
    while(running) {
        /* if already connected, this will return GOOD and do nothing */
        /* if the connection is closed/errored, the connection will be reset and then
         * reconnected */
        /* Alternatively you can also use UA_Client_getState to get the current state */
        if(retval != UA_STATUSCODE_GOOD) {
            UA_Client_connect(client, "opc.tcp:https://localhost:4840");
            UA_sleep_ms(1000);
        }

        retval = UA_Client_run_iterate(client, 500);
    };

@jpfr
Copy link
Member

jpfr commented Oct 11, 2023

We have different levels of "abort".
If a connection is fully open and breaks down, then we try to reconnect.
If we are not connected and the connection fails, then we cannot recover.

You can solve this by having an "outer loop" that does a reset+reconnect if a connection has failed in a way that cannot be recovered.

We could think of having automated retry with a delay.
But that logic is typically handled by the application where the client is embedded.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: Server Issues related to the server code Priority: Medium Issue with medium priority Status: Pending Needs final decision if issue is valid or should be abandoned. Or if the issue needs to be discussed Type: Bug Bug in the code which needs to be fixed
Projects
None yet
Development

No branches or pull requests

7 participants