Requests which are canceled due to service profile governance are not retried #2358

pwlodek · 2019-11-20T15:15:04Z

Issue Type:

Bug report
Feature request

What happened:
Requests which are timed out by the service profile timeout parameter are not subject to retry.

What you expected to happen:
I would like timeouts which are governed by service profile to be treated as failures, and I would like them to be retried.

How to reproduce it (as minimally and precisely as possible):
Create a simple Linkerd service profile and set the timeout on any route to a small value, say 500ms. If a call to this endpoint from a meshed pod takes longer than 500ms, the pod sees a 504 Gateway Timeout. No retries happen.

Anything else we need to know?:
Imagine you have a service endpoint /database/products. A GET /database/products returns a list of products from a database. Sometimes this call takes 50ms to complete, sometimes it takes 3 sec. I would like to be able to tell Linkerd: hey, if this request takes longer than X, cancel it AND retry it (according to a retry budget). This would be analogous to the endpoint returning 500, in which case Linkerd (if configured) retries according to the budget.

Why is this important? Say I configured the above route to time out after 500ms, and a meshed pod does GET /database/products. Say the first request takes more than 500ms, so it is canceled. If Linkerd then retried it, and the retry took, let's say, 100ms, the first request would time out after 500ms and the second would complete in 100ms. From the calling pod's point of view the request would take ~600ms to complete, instead of the 3 sec it would take if Linkerd didn't interfere. That retry behavior is what should happen.
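
To make the arithmetic above concrete, here is a minimal Python sketch of what the caller would see with and without a retry on timeout. It is only a model of the requested behavior, not Linkerd's implementation: `call_with_retry` and `max_retries` are hypothetical names, and `max_retries` is a crude stand-in for a real retry budget.

```python
from typing import Iterable, Optional, Tuple

TIMEOUT_MS = 500  # route timeout from the example above


def call_with_retry(
    latencies_ms: Iterable[int],
    timeout_ms: int = TIMEOUT_MS,
    max_retries: int = 1,  # hypothetical stand-in for a retry budget
) -> Tuple[Optional[int], int]:
    """Return (status, elapsed_ms) as observed by the calling pod.

    Each entry of latencies_ms is how long the backend would need
    to answer one attempt if nothing cancelled it.
    """
    elapsed = 0
    for attempt, latency in enumerate(latencies_ms):
        if attempt > max_retries:
            break  # no retries left
        if latency <= timeout_ms:
            return 200, elapsed + latency  # attempt finished in time
        elapsed += timeout_ms  # attempt cancelled at the deadline
    return 504, elapsed


# Requested behavior: the 3 sec attempt is cut at 500ms, the retry takes 100ms.
print(call_with_retry([3000, 100]))                 # (200, 600)
# Behavior described in this issue: the timeout is not retried.
print(call_with_retry([3000, 100], max_retries=0))  # (504, 500)
```

With one retry allowed the caller gets a 200 after about 600ms; with no retry it gets a 504 after 500ms, which is the situation reported here.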

At this point, the fact that Linkerd cancels the request after X and returns 504 gateway timeout is pretty much useless.

Environment:

linkerd version: 2.6
Platform, version, and config files (Kubernetes, DC/OS, etc): Kubernetes 1.14
Cloud provider or hardware configuration: Docker Desktop with Kubernetes

zaharidichev · 2019-11-20T15:17:55Z

@pwlodek I think you created this in the wrong repo. Do you mind moving it to the linkerd2 one?

pwlodek · 2019-11-20T15:25:59Z

Correct, please close this one as the issue pertains to Linkerd2. It is tracked here linkerd/linkerd2#3743

pwlodek · 2019-11-20T15:26:26Z

Closing this one

pwlodek closed this as completed Nov 20, 2019
