Incompatibility with calling spin_until_future_complete while in callback #1313

Closed
jmblixt3 opened this issue Jul 11, 2024 · 11 comments · Fixed by #1316

@jmblixt3
Contributor

jmblixt3 commented Jul 11, 2024

Bug report

Required Info:

  • Operating System:
    • Ubuntu 24.04 (devcontainer)
  • Installation type:
    • From source
  • Version or commit hash:
    • rolling, 43198cb
    • (First noticed issue on humble, MRE written for rolling)
  • DDS implementation:
    • Fast DDS
  • Client library (if applicable):
    • rclcpp, rclpy

Steps to reproduce issue

Using the demos repo with the action_tutorials_py and action_tutorials_cpp code:
Apply the following patch (it's in a .txt file because GitHub issues don't like patch attachments for some reason)
patch

Then build and run the following nodes in separate terminals

ros2 run action_tutorials_py fibonacci_action_server

or

ros2 run action_tutorials_cpp fibonacci_action_server

and

ros2 run action_tutorials_py fibonacci_action_hybrid

Finally, run

ros2 action send_goal /some_action action_tutorials_interfaces/action/Fibonacci

Then run it again

ros2 action send_goal /some_action action_tutorials_interfaces/action/Fibonacci

Expected behavior

On the first execution of ros2 action send_goal we get the expected behavior of

Waiting for an action server to become available...
Sending goal:
     order: 0

Goal accepted with ID: 16988b561e414d82bc9fb1c793788156

Result:
    sequence:
- 0
- 1

Goal finished with status: SUCCEEDED

And any subsequent calls of send_goal to /some_action are expected to work exactly the same, with only the goal ID varying.

Actual behavior

Any subsequent call actually results in

Waiting for an action server to become available...
Sending goal:
     order: 0

^C

So after the first call, the goal request is never processed by the hybrid action node.

Additional information

I initially found this when working on #1123 and thought there was a race condition within rclpy, but the goal request is never even received by the hybrid action node. The default goal request handling simply returns "accepted", which leaves no room for a race condition in that instance.
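For anyone who can't open the attached patch, the problematic pattern boils down to something like the sketch below. The class name and topic names (FibonacciActionHybrid, /some_action, the inner fibonacci server) are my shorthand based on the reproduction steps above; the exact code is in the patch.

import rclpy
from rclpy.action import ActionClient, ActionServer
from rclpy.node import Node

from action_tutorials_interfaces.action import Fibonacci


class FibonacciActionHybrid(Node):
    """Action server that forwards each goal to another Fibonacci action server."""

    def __init__(self):
        super().__init__('fibonacci_action_hybrid')
        self._action_client = ActionClient(self, Fibonacci, 'fibonacci')
        self._action_server = ActionServer(
            self, Fibonacci, 'some_action', self.execute_callback)

    def execute_callback(self, goal_handle):
        # Forward the goal to the inner action server ...
        request_future = self._action_client.send_goal_async(
            Fibonacci.Goal(order=goal_handle.request.order))
        # ... and block inside this callback until the nested calls finish.
        # These are the calls that break every goal after the first one.
        rclpy.spin_until_future_complete(self, request_future)
        client_goal_handle = request_future.result()
        result_future = client_goal_handle.get_result_async()
        rclpy.spin_until_future_complete(self, result_future)
        goal_handle.succeed()
        return result_future.result().result


def main():
    rclpy.init()
    node = FibonacciActionHybrid()
    rclpy.spin(node)


if __name__ == '__main__':
    main()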

@jmblixt3 jmblixt3 changed the title Incompadability with nested action client/servers with mixed rclpy and rclcpp Incompatibility with nested action client/servers with mixed rclpy and rclcpp Jul 11, 2024
@Barry-Xu-2018
Contributor

I suspect it's because the same executor (default executor) is being executed simultaneously on different threads.

I tried to use 2 executors (executor1 and default executor) and this issue doesn't occur.
fibonacci_action_hybrid.py.txt
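In other words, the attached file presumably gives the nested waits their own executor instead of falling back to the global default one. A guess at the shape of that two-executor workaround (not the attached code; executor1 is the name used above, and the surrounding node is as in the earlier sketch):

from rclpy.executors import SingleThreadedExecutor

def execute_callback(self, goal_handle):
    # Give the nested waits a dedicated executor (executor1) instead of
    # re-entering the global default executor that is already spinning
    # this node.
    executor1 = SingleThreadedExecutor()
    request_future = self._action_client.send_goal_async(
        Fibonacci.Goal(order=goal_handle.request.order))
    rclpy.spin_until_future_complete(self, request_future, executor=executor1)
    result_future = request_future.result().get_result_async()
    rclpy.spin_until_future_complete(self, result_future, executor=executor1)
    goal_handle.succeed()
    return result_future.result().result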

@jmblixt3 jmblixt3 changed the title Incompatibility with nested action client/servers with mixed rclpy and rclcpp Incompatibility with nested action client/servers Jul 12, 2024
@jmblixt3
Contributor Author

Looking more closely, it looks like this isn't caused exclusively by calling a C++ action server, so I've adjusted the issue description and steps to reproduce.

@jmblixt3
Contributor Author

It looks like doing something like this can mitigate the issue, but it feels really awkward. Here's a patch implementing it:
example.txt.
Maybe this is just a fundamental limitation of nodes, but using different callback groups should be sufficient, in my eyes anyway.
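For reference, "different callback groups" here would mean something like the sketch below (my illustration, not the attached example.txt; it also assumes a MultiThreadedExecutor, since a SingleThreadedExecutor only runs one callback at a time regardless of grouping):

from rclpy.callback_groups import MutuallyExclusiveCallbackGroup
from rclpy.executors import MultiThreadedExecutor

# In the hybrid node's __init__: keep the server's execute callback and the
# client's response callbacks in separate groups so the executor is allowed
# to run them concurrently.
self._server_group = MutuallyExclusiveCallbackGroup()
self._client_group = MutuallyExclusiveCallbackGroup()
self._action_server = ActionServer(
    self, Fibonacci, 'some_action', self.execute_callback,
    callback_group=self._server_group)
self._action_client = ActionClient(
    self, Fibonacci, 'fibonacci', callback_group=self._client_group)

# In main(): spin with a multi-threaded executor so the groups can overlap.
executor = MultiThreadedExecutor()
rclpy.spin(node, executor=executor)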

@Barry-Xu-2018
Contributor

Barry-Xu-2018 commented Jul 15, 2024

I suspect it's because the same executor (default executor) is being executed simultaneously on different threads.

This guess is incorrect.
The issue is likely related to using the default SingleThreadedExecutor.
I can't explain the root cause yet (it will take some time to pinpoint), but I just tried switching to a MultiThreadedExecutor and found that the problem doesn't occur.
fibonacci_action_hybrid.py.txt
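Presumably the relevant part of the attached file is just the executor swap in main(), something like this (a sketch; FibonacciActionHybrid is the assumed class name from the earlier sketch):

from rclpy.executors import MultiThreadedExecutor

def main(args=None):
    rclpy.init(args=args)
    node = FibonacciActionHybrid()
    # With the default SingleThreadedExecutor the second goal hangs; spinning
    # with a MultiThreadedExecutor avoids the problem in this reproduction.
    executor = MultiThreadedExecutor()
    rclpy.spin(node, executor=executor)
    node.destroy_node()
    rclpy.shutdown()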

@Barry-Xu-2018
Contributor

@jmblixt3

The root cause is related to rclpy.spin_until_future_complete().

rclpy.spin(node) adds the node to the default (global) executor and keeps spinning it.
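(For comparison, rclpy.spin() follows roughly the same pattern, paraphrased here rather than quoted: the node stays in the executor for as long as the loop runs, which is why the nested call below can pull it back out from underneath it.)

executor = get_global_executor() if executor is None else executor
try:
    executor.add_node(node)
    while executor.context.ok():
        executor.spin_once()
finally:
    executor.remove_node(node)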

The implementation of rclpy.spin_until_future_complete() is:

executor = get_global_executor() if executor is None else executor
try:
    executor.add_node(node)
    executor.spin_until_future_complete(future, timeout_sec)
finally:
    executor.remove_node(node)

The last step removes the node. After that, no node is left in the executor, so the outer rclpy.spin() no longer has anything to process.

Change the code as below:

...
# rclpy.spin_until_future_complete(self, request_future)
rclpy.get_global_executor().spin_until_future_complete(request_future)
...
# rclpy.spin_until_future_complete(self, result_future)
rclpy.get_global_executor().spin_until_future_complete(result_future)

@jmblixt3
Contributor Author

Would it be reasonable to check, before adding the node, whether it is already in the executor, and if it is already present, not remove it after completion?

@Barry-Xu-2018
Contributor

Would it be reasonable to check, before adding the node, whether it is already in the executor, and if it is already present, not remove it after completion?

There won't be any duplicate nodes added. Refer to line 271:

def add_node(self, node: 'Node') -> bool:
    """
    Add a node whose callbacks should be managed by this executor.
    :param node: The node to add to the executor.
    :return: ``True`` if the node was added, ``False`` otherwise.
    """
    with self._nodes_lock:
        if node not in self._nodes:
            self._nodes.add(node)
            node.executor = self
            # Rebuild the wait set so it includes this new node
            self._guard.trigger()
            return True
        return False

@jmblixt3
Contributor Author

I understand it's not adding a duplicate. I was wondering if it would be appropriate to have the node not be removed from the executor at the end if it was already spinning.
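Concretely, something like this inside rclpy.spin_until_future_complete() (a rough sketch of the intent, not the actual diff in #1316), using the True/False return value of add_node() quoted above:

executor = get_global_executor() if executor is None else executor
# Only take ownership of the node if it was not already in this executor
# (e.g. because an outer rclpy.spin() call added it).
node_added = executor.add_node(node)
try:
    executor.spin_until_future_complete(future, timeout_sec)
finally:
    if node_added:
        executor.remove_node(node)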

@Barry-Xu-2018
Contributor

I was wondering if it would be appropriate to have the node not be removed from the executor at the end if it was already spinning.

Spinning is the executor's responsibility. While spinning, the Nodes added to the executor can be changed.

The executor will recollect the entities from added nodes at an appropriate point.
Recollecting entities will occur after handling all the ready tasks returned by wait_set.wait().

if nodes is None:
    nodes_to_use = self.get_nodes()

@jmblixt3 jmblixt3 changed the title Incompatibility with nested action client/servers Incompatibility with calling spin_until_future_complete while in callback Jul 17, 2024
jmblixt3 added a commit to jmblixt3/rclpy that referenced this issue Jul 17, 2024
Closes rclpy:ros2#1313
Currently, if spin_until_future_complete is called inside a node's callback, it removes the node from the executor.
This results in any subsequent waitables never being checked by the node, since the node is no longer in the executor.
This aims to fix that by only removing the node from the executor if it wasn't already present.

Signed-off-by: Jonathan Blixt <[email protected]>
@Barry-Xu-2018
Contributor

I saw your PR #1316 and I understand your idea. I misunderstood earlier.

jmblixt3 added a commit to jmblixt3/rclpy that referenced this issue Aug 13, 2024
@sloretz
Contributor

sloretz commented Aug 23, 2024

The PR is a good change, but the spin functions, including spin_until_future_complete, aren't intended to be used within a callback. Use coroutines instead: define the callback with async def and use the await keyword to get the result from the future.

async def execute_callback(self, goal_handle: ServerGoalHandle):
    self.get_logger().info("Requesting goal...")
    request_future = self._action_client.send_goal_async(
        Fibonacci.Goal(order=goal_handle.request.order))
    spin_handle: ClientGoalHandle = await request_future
    self.get_logger().info("Received request result...")
    result_future: Future = spin_handle.get_result_async()
    result = await result_future
    self.get_logger().info("Received final result...")
    if not result_future.cancelled():
        goal_handle.succeed()
        return result
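(Roughly speaking, this works because the executor that is already spinning the node suspends the coroutine at each await and resumes it when the awaited future completes, so nothing has to spin recursively inside the callback and no node is ever added to or removed from the executor mid-flight.)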

sloretz pushed a commit that referenced this issue Aug 24, 2024
Closes rclpy:#1313
Currently, if spin_until_future_complete is called inside a node's callback, it removes the node from the executor.
This results in any subsequent waitables never being checked by the node, since the node is no longer in the executor.
This aims to fix that by only removing the node from the executor if it wasn't already present.

Signed-off-by: Jonathan Blixt <[email protected]>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
mergify bot pushed a commit that referenced this issue Aug 24, 2024 (cherry picked from commit 47346ef)
mergify bot pushed a commit that referenced this issue Aug 24, 2024 (cherry picked from commit 47346ef)
mergify bot pushed a commit that referenced this issue Aug 24, 2024 (cherry picked from commit 47346ef)