Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sporadic failure on system_tests/test_communication #275

Open
YuanYuYuan opened this issue Sep 2, 2024 · 3 comments
Open

Sporadic failure on system_tests/test_communication #275

YuanYuYuan opened this issue Sep 2, 2024 · 3 comments

Comments

@YuanYuYuan
Copy link
Contributor

YuanYuYuan commented Sep 2, 2024

rmw_zenoh_cpp sometimes fails to pass system_tests/test_communication with the following error.

Traceback (most recent call last):
  File "/ws/src/system_tests/test_communication/test/action_server_py.py", line 199, in <module>
    rclpy.spin(node)
  File "/ws/install/rclpy/lib/python3.12/site-packages/rclpy/__init__.py", line 310, in spin
    executor.spin_once()
  File "/ws/install/rclpy/lib/python3.12/site-packages/rclpy/executors.py", line 853, in spin_once
    self._spin_once_impl(timeout_sec)
  File "/ws/install/rclpy/lib/python3.12/site-packages/rclpy/executors.py", line 837, in _spin_once_impl
    handler, entity, node = self.wait_for_ready_callbacks(
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/ws/install/rclpy/lib/python3.12/site-packages/rclpy/executors.py", line 807, in wait_for_ready_callbacks
    return next(self._cb_iter)
           ^^^^^^^^^^^^^^^^^^^
  File "/ws/install/rclpy/lib/python3.12/site-packages/rclpy/executors.py", line 711, in _wait_for_ready_callbacks
    wait_set.wait(timeout_nsec)
KeyboardInterrupt

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/ws/src/system_tests/test_communication/test/action_server_py.py", line 184, in <module>
    with rclpy.init(args=[]):
  File "/ws/install/rclpy/lib/python3.12/site-packages/rclpy/__init__.py", line 112, in __exit__
    try_shutdown(context=shutdown_context, uninstall_handlers=self.installed_signal_handlers)
  File "/ws/install/rclpy/lib/python3.12/site-packages/rclpy/__init__.py", line 200, in try_shutdown
    _try_shutdown(context=context)
  File "/ws/install/rclpy/lib/python3.12/site-packages/rclpy/utilities.py", line 104, in try_shutdown
    g_default_context.try_shutdown()
  File "/ws/install/rclpy/lib/python3.12/site-packages/rclpy/context.py", line 174, in try_shutdown
    self._cleanup()
  File "/ws/install/rclpy/lib/python3.12/site-packages/rclpy/context.py", line 156, in _cleanup
    self.__context.shutdown()
rclpy._rclpy_pybind11.RCLError: failed to shutdown: rcl_shutdown already called on the given context, at /ws/src/rcl/rcl/src/rcl/init.c:290

To reproduce

colcon build --packages-select test_communication --cmake-args -DSKIP_MULTI_RMW_TESTS=On
/usr/bin/python3 /ws/src/system_tests/test_communication/test/action_server_py.py NestedMessage /test_time_12_40_58
(Press Ctrl-C)

Note: The root cause is zenoh possibly takes a bit longer to close the session in rmw_shutdown. One can easily reproduce this by adding a sleep in rmw_shutdown.

@YuanYuYuan YuanYuYuan changed the title poradic Sporadic failure of system_tests/test_communication Sep 2, 2024
@YuanYuYuan YuanYuYuan changed the title Sporadic failure of system_tests/test_communication Sporadic failure on system_tests/test_communication Sep 2, 2024
@YuanYuYuan
Copy link
Contributor Author

The analysis has been posted in rclpy ros2/rclpy#1352

@Yadunund
Copy link
Member

Yadunund commented Sep 3, 2024

@YuanYuYuan thanks for the detailed analysis! Curious if this issue only arises with rclpy or also rclcpp? Also, would this be related to #170?

I still think shutting down the Zenoh session within rmw_shutdown is the right thing to do. If downstream packages such as rclpy and rclcpp are not equipped to handle cases where this function takes a while to execute, then we should fix expectations in those downstream implementations to handle these cases.

@YuanYuYuan
Copy link
Contributor Author

Hi @Yadunund, I believe this issue is mainly caused by rclpy and the fix is coming ros2/rclpy#1353.

Also, would this be related to #170?

Actually, it would be a possible consequence once we fix rclpy. I think there has already been many debates in Rust community about this problem. See here for more details. In short, Rust suggests not doing this way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants