Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Checkable#ProcessCheckResult(): only clean up ack comments older than check result #9718

Merged

Conversation

Al2Klimov
Copy link
Member

@Al2Klimov Al2Klimov commented Mar 3, 2023

Normally if for some reason an ack comment still exists on a checkable not
acked anymore, still clean it up. But while replaying log config objects
incl. ack comments come before check results and acks. I.e. 1) ack comment,
2) DOWN check result and 3) ack. Not 1) DOWN check result, 2) ack and 3) ack
comment. So the checkable is temporarily not acked, but already has the ack
comment. In this case the DOWN check result which is older than the ack
comment shall not clean up the latter.

fixes #9652
closes #9709

TODO

Copy link
Contributor

@julianbrost julianbrost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't done any testing myself so far but the change looks fine logic-wise.

lib/icinga/checkable-comment.cpp Outdated Show resolved Hide resolved
… check result

Normally if for some reason an ack comment still exists on a checkable not
acked anymore, still clean it up. But while replaying log config objects
incl. ack comments come before check results and acks. I.e. 1) ack comment,
2) DOWN check result and 3) ack. Not 1) DOWN check result, 2) ack and 3) ack
comment. So the checkable is temporarily not acked, but already has the ack
comment. In this case the DOWN check result which is older than the ack
comment shall not clean up the latter.
@Wintermute2k6
Copy link

From the latest Testbuild it seems to have improved something:
On a host there was a service with incoherent history entries / screenshots where provided internally
One Host all Ack vanished
Another Host where Ack vanished (shown in Logs , but in History missing, during controll look, was the Host automatically removed from the Director at 13 o'clock)
3 other hosts were synced succesfully

@Al2Klimov
Copy link
Member Author

Nice!

@julianbrost 2.14? For the customer's sake?

@julianbrost julianbrost added this to the 2.14.0 milestone Apr 18, 2023
@julianbrost
Copy link
Contributor

We should do this as this fixes a problem we could reproduce independently. Remaining question: does it fix #9652? "seems to have improved something" doesn't sound very conclusive to me.

@Al2Klimov
Copy link
Member Author

IMAO it fixes that:

Bildschirm_foto 2023-04-18 um 12 23 43

Yes, the customer has the IDO (which we won’t fix unless objectively necessarily) which gets written the wrong timestamps:

Bildschirm_foto 2023-04-18 um 12 23 50

But it's "just" a display problem. I mean, you could argue that the UNACK actually happened at that late time on master2.

Even if I'm wrong: a 90% fix is better than no fix at all.

Also:

They just shouldn’t shutdown the master for so long.

(c) Eric

@julianbrost
Copy link
Contributor

Are all the remaining problems mentioned in #9718 (comment) IDO problems?

@Al2Klimov
Copy link
Member Author

I think so.

@julianbrost julianbrost merged commit eca8890 into master May 5, 2023
@icinga-probot icinga-probot bot deleted the acknowledgement-sync-between-masters-are-not-working-9652 branch May 5, 2023 13:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Acknowledgement-Sync between masters are not working
3 participants