NO-JIRA: riskanalysis: improve request retries #28961

Merged

Conversation

Member

@sosiouxme sosiouxme commented Jul 29, 2024

I observed that when a risk analysis (RA) HTTP request fails for any reason, the retries also fail immediately. Stephen speculated that reusing the request object might be the problem. What I could find indicated that this would only be a problem in specialized circumstances, but it seemed worth trying.

@openshift-ci openshift-ci bot added the "approved" label (indicates a PR has been approved by an approver from all required OWNERS files) Jul 29, 2024
@sosiouxme
Member Author

The first change did not solve the problem :(

@openshift-trt-bot

Job Failure Risk Analysis for sha: 5bb32d2

| Job Name | Failure Risk |
| --- | --- |
| pull-ci-openshift-origin-master-e2e-aws-ovn-ipsec-serial | High |

[sig-cluster-lifecycle] pathological event should not see excessive Back-off restarting failed containers for ns/openshift-multus
This test has passed 100.00% of 55 runs on jobs ['periodic-ci-openshift-release-master-nightly-4.17-e2e-aws-ovn-serial' 'periodic-ci-openshift-release-master-ci-4.17-e2e-aws-ovn-serial'] in the last 14 days.
---
[sig-arch] events should not repeat pathologically for ns/openshift-ovn-kubernetes
This test has passed 100.00% of 55 runs on jobs ['periodic-ci-openshift-release-master-nightly-4.17-e2e-aws-ovn-serial' 'periodic-ci-openshift-release-master-ci-4.17-e2e-aws-ovn-serial'] in the last 14 days.

@sosiouxme
Member Author

The latest change seems to have fixed it. I get the feeling that sharing the context across attempts was the problem. Results that show retries working as they should:

  1. timed out, 2nd try succeeded
  2. timed out, 2nd try succeeded
  3. timed out all 3 tries
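
For readers following along, here is a minimal sketch of the retry pattern discussed above, not the code merged in this PR. It assumes the fix amounts to giving every attempt its own context (with its own timeout) and its own request object. If a single context with a deadline were shared across attempts, every retry after the deadline passed would fail immediately with a context error, which would match the symptom described. The names `fetchOnce`, `fetchWithRetries`, and `demo`, the attempt count, and the timeouts are all hypothetical.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"time"
)

// fetchOnce (hypothetical helper) performs a single attempt. It builds a
// fresh context and a fresh request, so a timeout on one attempt cannot
// poison the next one.
func fetchOnce(client *http.Client, url string, perTry time.Duration) (string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), perTry)
	defer cancel() // runs after the body has been read

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return "", err
	}
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(body), nil
}

// fetchWithRetries (hypothetical helper) retries up to `attempts` times and
// reports which attempt succeeded.
func fetchWithRetries(client *http.Client, url string, attempts int, perTry time.Duration) (string, int, error) {
	var lastErr error
	for try := 1; try <= attempts; try++ {
		body, err := fetchOnce(client, url, perTry)
		if err == nil {
			return body, try, nil
		}
		lastErr = err
	}
	return "", attempts, fmt.Errorf("all %d attempts failed: %w", attempts, lastErr)
}

// demo runs the retry loop against a local test server whose first
// response stalls longer than the per-attempt timeout.
func demo() (string, int, error) {
	var calls int32
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if atomic.AddInt32(&calls, 1) == 1 {
			// Stall the first request until the client gives up.
			select {
			case <-r.Context().Done():
			case <-time.After(2 * time.Second):
			}
			return
		}
		fmt.Fprint(w, "ok")
	}))
	defer srv.Close()

	return fetchWithRetries(srv.Client(), srv.URL, 3, 250*time.Millisecond)
}

func main() {
	body, tries, err := demo()
	fmt.Println(body, tries, err)
}
```

In this sketch the first attempt times out and the second succeeds, mirroring results 1 and 2 above. Creating the context inside `fetchOnce` rather than in the caller is the design choice that matters here: the deadline is scoped to one attempt instead of the whole retry loop.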

@sosiouxme
Member Author

/retest-required

@sosiouxme sosiouxme changed the title from "riskanalysis: improve request retries" to "NO-JIRA: riskanalysis: improve request retries" Jul 31, 2024
@openshift-ci-robot openshift-ci-robot added the "jira/valid-reference" label (indicates that this PR references a valid Jira ticket of any type) Jul 31, 2024
@openshift-ci-robot

@sosiouxme: This pull request explicitly references no jira issue.

In response to this:

I observed that when a RA http request fails for any reason, retries also fail immediately. Stephen speculated that reusing the request object might be a problem. What I could find indicated that would only be a problem in specialized circumstances but it seemed worth trying.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@sosiouxme
Member Author

/retest-required

@neisw
Contributor

neisw commented Jul 31, 2024

/lgtm

@openshift-ci openshift-ci bot added the "lgtm" label (indicates that a PR is ready to be merged) Jul 31, 2024
Contributor

openshift-ci bot commented Jul 31, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: neisw, sosiouxme

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Contributor

openshift-ci bot commented Aug 1, 2024

@sosiouxme: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-aws-ovn-ipsec-serial | 5bb32d2 | link | false | /test e2e-aws-ovn-ipsec-serial |
| ci/prow/e2e-aws-ovn-single-node | 5bb32d2 | link | false | /test e2e-aws-ovn-single-node |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot bot merged commit 04973c8 into openshift:master Aug 1, 2024
22 of 24 checks passed
@openshift-bot
Contributor

[ART PR BUILD NOTIFIER]

Distgit: openshift-enterprise-tests
This PR has been included in build openshift-enterprise-tests-container-v4.18.0-202408010449.p0.g04973c8.assembly.stream.el9.
All builds following this will include this PR.

@sosiouxme sosiouxme deleted the 20240729-fix-ra-retries branch August 1, 2024 11:44