Avoid getting stuck in a loop #2244

sigmavirus24 · 2014-09-23T02:04:04Z

This prevents a case where we make a request to URL A, which 301s to B which
would then 301 back to A. Alternatively, for less simple schemes, this will
also prevent us from getting stuck in a loop, e.g., it will prevent the
following from causing an endless loop:

    A -> B -> C -> D -> E -> F --
    ^                             \
    |                             /
    ---<------------<----------<-

Fixes #2231.

I tested this by cloning httpbin and hard coding a permanent redirect loop from /relative-redirect/1 to /relative-redirect/2 so that we could trigger this. This pull request fixes it.


Lukasa · 2014-09-23T06:27:44Z

This no longer gets us stuck in a loop by causing us to actually hit the URLs that we're being forcibly redirected to, starting with the end of the redirect chain.

Is this what we want? Or do we want to throw an exception, e.g. RedirectLoopError?

sigmavirus24 · 2014-09-23T11:52:54Z

I don't think we should be adding a new exception for this. This will preserve the previous behavior of having a Max Redirects exception thrown. It's no worse than the behavior we had before the redirect cache was introduced.


sigmavirus24 · 2014-09-23T14:43:52Z

But yes, I suspect we could subclass RedirectLoopError from TooManyRedirects and raise that so as to save the calls to the network that would otherwise cause us to exceed the max number of redirects. I'm not opposed to that. I'd like to hear other opinions though from @fcosantos and @RuudBurger (and anyone else that has one).

@kennethreitz thoughts on saving people from hitting the network more than necessary when we can detect an endless redirect loop?
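A minimal sketch of the subclassing idea, for illustration only. `TooManyRedirects` here is a local stand-in for `requests.exceptions.TooManyRedirects` so the snippet runs on its own, and `RedirectLoopError` is the hypothetical new exception discussed in this thread, not something that exists in requests.

```python
class TooManyRedirects(Exception):
    """Local stand-in for requests.exceptions.TooManyRedirects."""


class RedirectLoopError(TooManyRedirects):
    """Hypothetical: raised when a loop is detected before hitting the network."""


def classify(exc_type):
    """Raise exc_type and report which handler catches it."""
    try:
        raise exc_type("redirect problem")
    except RedirectLoopError:
        # Callers that care about the distinction can catch the subclass first.
        return "loop"
    except TooManyRedirects:
        return "too many"


def legacy_handler(exc_type):
    """Existing code that only knows about TooManyRedirects keeps working."""
    try:
        raise exc_type("redirect problem")
    except TooManyRedirects:
        return True
```

Because the new exception would inherit from `TooManyRedirects`, existing `except TooManyRedirects` blocks would still catch it, while new code could tell the two cases apart.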

sigmavirus24 · 2014-09-23T14:45:02Z

I should also have mentioned that this saves us from loops like this too:

    A -> B -> C -> D -> E -> F --
        ^                         \
        |                         /
        ------------<----------<-

Or any other case that essentially results in an endless loop
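
As a rough illustration of why a set of visited URLs covers both diagrams, here is a self-contained sketch. It assumes the redirect cache is a plain dict mapping a URL to its permanent redirect target; `follow_cached_redirects` and `CachedRedirectLoop` are hypothetical names used only for this example and are not the code in this pull request.

```python
class CachedRedirectLoop(Exception):
    """Hypothetical exception used only in this sketch."""


def follow_cached_redirects(url, redirect_cache):
    """Follow cached redirects starting at url.

    Returns the first URL with no cache entry, or raises
    CachedRedirectLoop carrying the URL where the cycle was re-entered.
    """
    seen = set()
    while url in redirect_cache:
        if url in seen:
            # We have been here before, so the chain can never terminate.
            raise CachedRedirectLoop(url)
        seen.add(url)
        url = redirect_cache[url]
    return url


# Loop that re-enters at B rather than at the starting URL A.
cache = {"A": "B", "B": "C", "C": "D", "D": "E", "E": "F", "F": "B"}
try:
    follow_cached_redirects("A", cache)
except CachedRedirectLoop as exc:
    loop_entry = exc.args[0]
```

The check does not care where the cycle starts: any URL seen twice while walking the cache means the walk would never end.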

RuudBurger · 2014-09-23T14:48:42Z

The RedirectLoopError seems like a good idea.

In my case, the URL works (and redirects properly) in the browser, so maybe there is another issue with how the cache works.
I can provide you with a test URL if you have an email address I can send it to. I'd rather not post it here, as it contains an API key.

sigmavirus24 · 2014-09-23T14:52:33Z

@RuudBurger both @Lukasa and I have our emails available on our GitHub profiles. If you'd rather use PGP, you can find my PGP key (and associated email address) by searching https://pgp.mit.edu/ for my real name.

Lukasa · 2014-09-23T15:48:55Z

I see absolutely no reason why permanent redirects in the redirect cache should not count against the max redirects limit. They are still redirects, we're just not hitting the wire to do them. That behaviour will also fix our infinite redirects problem.

fcosantos · 2014-09-24T08:39:10Z

In my opinion an exception needs to be raised, and RedirectLoopError is always going to be a TooManyRedirects at the end. If you want to subclass, that is fine but maybe unnecessary.

I like @sigmavirus24's set() solution; maybe you want to pack it a bit more:

    checked_urls = set()
    while request.url in self.redirect_cache:
        checked_urls.add(request.url)
        request.url = self.redirect_cache.get(request.url)
        if request.url in checked_urls:
            raise TooManyRedirects  # or whichever exception is chosen

sigmavirus24 · 2014-09-24T13:08:48Z

> RedirectLoopError is always going to be a TooManyRedirects at the end

@fcosantos I don't quite understand what you mean. Could you clarify this for me?

Lukasa · 2014-09-24T15:09:55Z

I think the key point is that there's no need for a RedirectLoopError, we can just have TooManyRedirects. I still think permanent redirects should count against the redirect limit though, with a loop being a special case where we immediately know that we'll hit TooManyRedirects.

sigmavirus24 · 2014-09-24T15:25:20Z

@Lukasa I can think of a couple ways to make it influence the max_redirects count.

1. Move the logic into Session#resolve_redirects. This unfortunately would mean actually using the network for the first (possibly permanent) redirect.
2. Keep count along with the set of how many times we go through that loop and add a new optional parameter to Session#resolve_redirects along the lines of redirects_already_followed=0. We can then initialize i in Session#resolve_redirects with that and that will affect the max number of redirects possible (including using the cache).

Option 2 seems most practical, but I just loathe adding more arguments (that could confuse a user) to a public API like this.
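
To make the second approach concrete, here is a simplified model of it, not the actual requests code. `resolve_from_cache`, the standalone `resolve_redirects`, and the `fetch_redirect` callable are all assumptions for illustration, and `TooManyRedirects` is a local stand-in for the requests exception; only the `redirects_already_followed` parameter name comes from the proposal above.

```python
class TooManyRedirects(Exception):
    """Local stand-in for requests.exceptions.TooManyRedirects."""


def resolve_from_cache(url, redirect_cache, max_redirects):
    """Follow cached redirects; return (final_url, hops_used)."""
    hops = 0
    while url in redirect_cache:
        if hops >= max_redirects:
            # A cached loop ends here without any network traffic.
            raise TooManyRedirects("Exceeded %d redirects." % max_redirects)
        url = redirect_cache[url]
        hops += 1
    return url, hops


def resolve_redirects(url, fetch_redirect, max_redirects,
                      redirects_already_followed=0):
    """Follow live redirects, counting the cached hops against the limit.

    fetch_redirect(url) is a hypothetical callable that returns the
    redirect target for url, or None when the response is final.
    """
    i = redirects_already_followed
    while True:
        target = fetch_redirect(url)
        if target is None:
            return url
        if i >= max_redirects:
            raise TooManyRedirects("Exceeded %d redirects." % max_redirects)
        url = target
        i += 1


cache = {"A": "B", "B": "C"}   # permanent redirects already cached
live = {"C": "D"}              # redirects that still need the network
start, used = resolve_from_cache("A", cache, max_redirects=3)
final = resolve_redirects(start, live.get, 3, redirects_already_followed=used)
```

With the counter seeded this way, two cached hops plus one live hop fit within a limit of three, while a further live redirect would raise.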

Also, I think there's value in using a subclass of TooManyRedirects. I can see an instance where this might cause confusion because of a case like @RuudBurger has. In a browser, it might very well work just fine, but because of oddities in the usage of requests the loop is caused by the redirect cache. I think providing users a way to disambiguate where the error is actually coming from is very useful (and a better experience).

Lukasa · 2014-09-24T16:07:15Z

I wonder if my concern about the redirect cache leading to reproducible behaviour is just my specific problem. I just realised that the other thing this redirect cache changes is the behaviour of Response.history, which is now not guaranteed to be the same for each request (the redirect cache doesn't populate it).

sigmavirus24 · 2014-09-24T16:15:38Z

@Lukasa good point. I'm not sure we can actually reconstruct the history accurately unless we also cache responses in the redirect cache. In other words, we'd have a cache something like:

    {'http://example.com/': ('http://www.example.com', <Response [301]>)}

And a big problem with that would be timestamps in headers and such (e.g., cookies). All of which makes me wonder exactly how good an idea it is to keep the redirect cache around.
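
A small sketch of what rebuilding the history from such a cache might look like, under the assumption that each cache entry stores the target together with the response that produced it. `CachedResponse` and `resolve_with_history` are hypothetical names; a real `Response` carries headers and cookies that would be stale when replayed, which is exactly the concern raised above.

```python
from collections import namedtuple

# Hypothetical, heavily reduced stand-in for a cached Response object.
CachedResponse = namedtuple("CachedResponse", ["url", "status_code"])


def resolve_with_history(url, redirect_cache):
    """Follow cached redirects and collect the cached responses in order.

    redirect_cache maps url -> (target_url, CachedResponse).
    Stops at the first URL without an entry, or when a URL repeats.
    """
    history = []
    seen = set()
    while url in redirect_cache and url not in seen:
        seen.add(url)
        target, response = redirect_cache[url]
        history.append(response)  # replayed from the cache, not re-fetched
        url = target
    return url, history


cache = {
    "http://example.com/": (
        "http://www.example.com/",
        CachedResponse("http://example.com/", 301),
    ),
}
final_url, history = resolve_with_history("http://example.com/", cache)
```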

Lukasa · 2014-09-24T16:22:14Z

I don't know that we should necessarily throw out the redirect cache, but we should at the very least document the hell out of how it is going to behave.


kennethreitz · 2014-09-25T02:57:41Z

+1

sigmavirus24 · 2014-09-25T16:36:10Z

Uh... this wasn't exactly ready to merge. (Not that it breaks anything, but we were discussing alternative solutions.)


sigmavirus24 mentioned this pull request Sep 23, 2014

Endless loop in sessions.py #2231

Closed

kennethreitz added a commit that referenced this pull request Sep 25, 2014

Merge pull request #2244 from sigmavirus24/bug/2231

de76fbe

Avoid getting stuck in a loop

kennethreitz merged commit de76fbe into psf:master Sep 25, 2014

cazzerson referenced this pull request in cazzerson/social-feed-manager Oct 3, 2014

Restrict requests version to avoid redirect loop bug kennethreitz/req…

732011b

…uests#2244

sigmavirus24 mentioned this pull request Oct 17, 2014

Revisit concerns in #2244 #2287

Closed

sigmavirus24 deleted the bug/2231 branch June 6, 2016 20:33

dependencies bot mentioned this pull request Jun 13, 2018

requests versions available: 2.19.0 Harmon758/Harmonbot#188

Closed

pyup-bot mentioned this pull request Jun 30, 2020

Pin requests to latest version 2.24.0 camptocamp/c2cgeoportal#6649

Closed

github-actions bot locked as resolved and limited conversation to collaborators Sep 8, 2021
