Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check on closed event loop #7356

Closed
wants to merge 1 commit into from
Closed

Conversation

drew2a
Copy link
Collaborator

@drew2a drew2a commented Apr 5, 2023

@drew2a drew2a marked this pull request as ready for review April 5, 2023 13:42
@drew2a drew2a requested a review from a team as a code owner April 5, 2023 13:42
@drew2a drew2a requested review from kozlovsky and removed request for a team April 5, 2023 13:42
Copy link
Collaborator

@kozlovsky kozlovsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, I don't like the proposed change of the AsyncGroup.add_task call for several reasons:

  1. The new version of the code silently suppresses the RuntimeError exception, assuming the reason for it is that the event loop is closed. But is it the only reason we can get RuntimeError here? I don't know. As a result of the change, we can miss another reason for RuntimeError by suppressing it instead of handling it properly. In the current version of the PR, we don't even add the actual error message to the text of the warning. But if we add it, it is probably not enough, as the error is actually not reported, and the chance we notice it in logs on a developer machine (or notice it in Sentry report related to some different error) is not very big.

  2. The previous design assumes that as a result of the AsyncGroup.add_task call, the task is always created. With a new design, it is possible that the task was not created as a result of the call. To check it, the return value should be examined. With this approach, later, we can discover hard-to-debug problems when the code assumes that some task was created, but actually, it wasn't.

  3. Currently, the error happens when a task is created during the shutdown phase. The PR supposes to handle errors in the following way:

    task = group.add_task(...)
    if task:  # we are probably in a shutdown phase, do something with it

    while it is actually a logical error that we attempted to create a task during the shutdown. It is more correct to handle it in the following way:

    # We should not even try to call `add_task` if we are in a shutdown state
    if not self._shutdown:
        group.add_task(...)

Specifically, I want to:

  1. avoid a possible future situation when Tribler is in a shutdown state, a component mistakenly starts some important shutdown-related code as a separate task, and then this task is silently ignored;
  2. avoid a necessity to examine the return value each time the add_task method is called (as it looks improper to just ignore a possible situation that the task was not actually created)

As an alternative approach, to fix the specific error we actually have in #7353, we can change the code of the EventsEndpoint in the following way: we can add a new synchronous method, EventsEndpoint.send_event, and use it in the EventsEndpoint.on_notification handler:

Instead:

    def on_notification(self, topic, *args, **kwargs):
        if topic in topics_to_send_to_gui:
            data = {"topic": topic.__name__, "args": args, "kwargs": kwargs}
            self.async_group.add_task(self.write_data(data))

we can do the following:

    def on_notification(self, topic, *args, **kwargs):
        if topic in topics_to_send_to_gui:
            self.send_event({"topic": topic.__name__, "args": args, "kwargs": kwargs})

with the following implementation of the send_event method:

    def send_event(self, message: Dict[str, Any]) -> bool:
        """
        Write data over the event socket if it's open (in a separate async task).
        """
        if self._shutdown or not self.has_connection_to_gui():
            self._logger.warning(f"Skip message: {message}")

        self.async_group.add_task(self.write_data(message))

The differences with the approach in the PR are the following:

  1. We do not mask RuntimeError here; instead, we prevent the specific situation where it can arise;
  2. The situation is handled explicitly; there is not some hidden logic that suppresses task creation;
  3. We have a clear log message in case the event sending to GUI was skipped

We have a trade-off here, and I'd like to hear a third-person opinion on this (@xoriole ?). I see the trade-off as the choice between:

  • the approach of this PR: implicit suppression of all errors to not annoy users, with a danger of not running some code during the shutdown and a burden to explicitly check the result of add_task in all places when the actual creation of the task is important;
  • the alternative approach: explicitly check the shutdown state before calling add_task; not possible to accidentally skip task creation; a small chance of having another component that still does not have if self._shutdown check and can annoy users with an error during the shutdown.

@kozlovsky kozlovsky requested a review from xoriole April 11, 2023 10:28
Comment on lines -31 to +32
task = self.async_group.add_task(self.on_tribler_shutdown())
task.add_done_callback(lambda result: TinyTriblerService._graceful_shutdown(self))
if task := self.async_group.add_task(self.on_tribler_shutdown()):
task.add_done_callback(lambda result: TinyTriblerService._graceful_shutdown(self))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this piece of code confirms what I'm saying. Here we have a code for graceful shutdown. That is, the code should be executed during the shutdown. With the design approach taken in this PR, this code can be totally and silently skipped. What kind of graceful shutdown is this if it can be completely ignored without any warning?

@drew2a drew2a closed this Apr 13, 2023
@drew2a drew2a deleted the fix/7353 branch April 14, 2023 13:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants