Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

If you have successfully authenticated and then start getting a 401 you should re-auth. #539

Closed
colinnewell opened this issue May 17, 2021 · 18 comments

Comments

@colinnewell
Copy link

Bug Report Checklist (remove this template if submitting a feature request)

  1. Which version of Kivik are you using? Please state a version number, or master.

master

  1. Which version of Go are you using? (Output of go version) -- Kivik 2.x supports Go 1.9 and later. Kivik 3.x requires Go 1.11 or later, with module support. The Kivik 4/master aims to support Go 1.11 and later, with module support.

1.16

  1. What did you do? (Include your code if possible)

There is a situation where the cookie auth for couchdb (note, I'm not sure whether this bug report belongs in this repo or the couchdb one) gets out of sync, and couchdb returns a 401 to our previously working cookie. In that situation, bar waiting for the cookie to expire, I'm struggling to spot how to rectify the situation.

To replicate this in a reliable but somewhat awkward way I set couchdb to have a long expiry time, got the kivik client to connect and do stuff so it stored the cookie. Then I tweaked the couchdb config to have a shorter expiry and re-started. The cookie now became invalid, and so couchdb returned a 401.

  1. What did you expect to see?

Kivik receive the 401, and try to log in again, and assuming that worked, retry the operation. If that received a 401, then fair enough, chuck that error back.

  1. What did you see instead?

401, and I don't know what I can do to resolve that. There doesn't appear to be an obvious way to tell the library to reauthenticate, or that the cookie is no longer valid, which would probably resolve the situation.

I should note that under normal circumstances the library performs great. When it's in sync, or constantly communicating things work out fine. It either gets sent an updated cookie by couchdb as it rotates the sessions, or it realises that the cookie has expired and does the right thing. The problem occurs when the 2 sides get out of sync somehow.

It could be that I'm missing something in the library that allows me to deal with this. It also might be that this bug report ought to be in the couchdb repo. I'm not too certain where this ought to be resolved.

@colinnewell colinnewell changed the title If you have successfully authenticated and then start getting a 401 you should tre If you have successfully authenticated and then start getting a 401 you should re-auth. May 17, 2021
@colinnewell
Copy link
Author

I might be wrong about 401, it might be 403 and I've lost the plot. Need to retest when I get a moment.

@flimzy
Copy link
Member

flimzy commented May 17, 2021

Thanks for the report. Please keep me posted if you find more information. When I have time, I'll try to investigate and reproduce.

@colinnewell
Copy link
Author

I was correct initially when I said 401. I just checked the network traffic I had recorded. That does make sense, 401 for credentials that don't work. I happened to see a 403 in a test fail, but that must have been authenticated okay, but some permissions weren't right.

        "response": {
          "status": 401,
          "statusText": "401 Unauthorized",
          "httpVersion": "HTTP/1.1",
          "headers": [
            {
              "name": "Server",
              "value": "CouchDB/3.1.1 (Erlang OTP/20)"
            },
            {
              "name": "X-Couch-Request-Id",
              "value": "f7906b61f2"
            },
            {
              "name": "X-Couchdb-Body-Time",
              "value": "0"
            },
            {
              "name": "Cache-Control",
              "value": "must-revalidate"
            },
            {
              "name": "Content-Length",
              "value": "78"
            },
            {
              "name": "Content-Type",
              "value": "application/json"
            },
            {
              "name": "Date",
              "value": "Fri, 14 May 2021 11:58:32 GMT"
            }
          ],
          "cookies": [],
          "content": {
            "mimeType": "application/json",
            "size": 78,
            "text": "{\"error\":\"unauthorized\",\"reason\":\"You are not authorized to access this db.\"}\n"
          },
          "redirectURL": "",
          "headersSize": 0,
          "bodySize": 0,
          "_transferSize": 0
        },

@colinnewell
Copy link
Author

To work around this for now I've basically stuck a facade on the DB to allow me to retry in this situation.

// Put ...
func (c *CouchDB) Put(
    ctx context.Context,
    docID string,
    doc interface{},
    options ...kivik.Options,
) (string, error) {
    var rev string
    err := c.retryOn401(func() error {
        var err error
        rev, err = c.db.Put(ctx, docID, doc, options...)
        return err
    })
    return rev, err
}

...

func (c *CouchDB) retryOn401(f func() error) error {
    if err := f(); err != nil {
        if httpErr, ok := err.(*chttp.HTTPError); ok {
            if httpErr.StatusCode() == 401 {
                // ensure we re-auth when retrying
                if err := c.replaceDB(); err != nil {
                    return fmt.Errorf(
                        "401 and then failed to set up new db object: %w",
                        err,
                    )
                }
                return f()
            }
        }
        return err
    }
    return nil

Internally I end up replacing the whole client/db object under the hood to reauth currently.

Essentially I think you need at the least some sort of 'invalidate token' or 'reauth' method on your auth providers, and ideally an automatic retry mechanism when you do hit a 401. Perhaps the auth mechanisms could have a 'retry warranted' method too? For basic auth a retry isn't really going to make sense. For cookies it does. It depends on if you have a token that can be refreshed.

@colinnewell
Copy link
Author

Here is a sample config I created using mountebank to reproduce and test my changes in my own code. It provides minimal mocks that replicate enough of couchdb's responses.

mountebank.json

The first call to PUT works, the second fails, then it rotates through the responses alternately succeeding and failing.

If figure it would be worth sharing as it might help illustrate the problem, although obviously it's not completely realistic.

@colinnewell
Copy link
Author

Looking at the code in more detail, I'm wondering if the http.RoundTripper interface provides the key. Right now you're using that to set the cookies on request. In theory if we check the response for a 401, we could use that as another signal to invalidate the internal cookie.

I think that works with the spirit of the interface, and also allows the potential change to be very small and not require anything at all structural to be changed.

@colinnewell
Copy link
Author

Actually, I guess that still leaves you needing a retry, but that's less problematic than the situation where you can't move on because the baked in cookie is invalid, but you can't trigger a new log in.

@colinnewell
Copy link
Author

I have mocked out something that potentially allows the cookies to be invalidated using the RoundTripper interface. In some ways it's tempting to do everything there, including a retry, but the interface documentation seems to suggest that is definitely not what one should do, and it's quite plausibly quite problematic.

Invalidating the cookie doesn't seem like too bad an idea at that point, or at least I can't spot a particular reason not to.

@colinnewell
Copy link
Author

I do see prior art on someone using the interface to do a retry. My one concern is if the reading of the body of the request could be problematic for the retry, but that could be cloned.

https://www.jacoelho.com/blog/a-story-about-go-http-client/

@flimzy
Copy link
Member

flimzy commented May 21, 2021

I'm only now looking closely at this. A couple questions and comments...

To replicate this in a reliable but somewhat awkward way I set couchdb to have a long expiry time, got the kivik client to connect and do stuff so it stored the cookie. Then I tweaked the couchdb config to have a shorter expiry and re-started. The cookie now became invalid, and so couchdb returned a 401.

So it seems clear that you're able to trigger this bug by shortening the expiry time after the cookie has been created with a longer one--that much makes sense. But do you have an understanding of how/why this was happening without that change in configuration? Is there a CouchDB bug that needs to be investigated, too?

In any case, this should be fixed, or at least a work-around provided in this library, so my thoughts on that:

  1. Attempting to re-authenticate for a 401 might make sense, but care must be taken to avoid any sort of infinite loop. We don't want an actual bad credential situation to spiral out of control, infinitely retrying to auth, and failing.

  2. You're quite right that many requests cannot be safely retried in case of a 401, but some could. You can see how http.Transport handles retries here: https://golang.org/pkg/net/http/#Transport This driver could do something similar, for eligible requests.

2b. We could get even more out of this if we use the GetBody method of the request, but this won't be trivial, and is starting to stray from the core topic... :)

  1. Simply returning a 401 to the user and providing a re-auth mechanims might be the safest approach. Unless we can find some way to guarantee loops in Support Travis #1 above, this seems the best approach, even though it makes for a poor user experience.

3b. [random idea] I wonder if we could just clear the cookie in case of a 401. That won't cause a loop, and we can assume that cookie is dead at that point. Then any subsequent requests should attempt to re-authenticate... (I think). I guess that's what your PR does.

@colinnewell
Copy link
Author

It sounds like we are basically thinking the same things. My PR is aiming to do 3b. That allows the user to retry as they want.

I'm not certain as to real life circumstances in which this happens, but I would guess time drift between servers might be one candidate. Exactly how disruptive that would be would depend on the scale of that drift, but fundamentally it's possible, and I'd rather minimise the disruption.

The PR isn't perfect, you've made one comment already, and I can take a look at getting it into a state that you're happy to merge it. I have tested it enough to be happy that the current code works and reduces this problem. It just needs refining.

@flimzy
Copy link
Member

flimzy commented May 22, 2021

I was just looking at the current auth code, and was reminded, thanks to a comment, that if CouchDB is configured without 'allow_persistent_cookies', then the cookie does not include an expiry date, and kivik can do nothing about it. I don't think that's what's happening to you, but it would result in a very similar (or identical) behavior: a 401 with no recourse.

@flimzy
Copy link
Member

flimzy commented May 22, 2021

If the problem is the result of clock drift, then maybe one partial solution would be to refresh the cookie before it expires. I'm actually a bit surprised I didn't do this already. See here: https://github.com/go-kivik/couchdb/blob/master/chttp/cookieauth.go#L60-L62

A lot of services will refresh a token halfway to the expiry. Maybe a similar approach can be taken here.

@colinnewell
Copy link
Author

colinnewell commented May 22, 2021

One thing to note is that couchdb automatically refreshes cookies if you are in regular contact with it. I'm not sure at exactly it's rules for doing so are, but you can see as you send a request to do something, it will set a different cookie value in the response to extend the session. I set my session length to 10s in the couchdb server and hammered it with traffic and saw no glitches because of that.

This is probably one of the reasons that this has never been a large problem for you thus far. While the problem outlined here is legit, it is an edge case.

I should also note that when I started traffic again at some point > 10s I saw your library perform beautifully by starting the traffic with an attempt to log in as the expiry log internally worked as designed. Your library is generally working as required, except for holding onto the cookie when there is evidence that it's not working.

@colinnewell
Copy link
Author

colinnewell commented May 22, 2021

Thinking about it, it might be worth noting that someone who has cookie auth and is talking to a cluster without a shared secret set could also hit this sort of behaviour. Ironically the change I'm proposing might make this even more annoying to spot as it could be slightly less pervasive. If you had a cluster of 3 boxes, you'd expect 66% of calls to during the time frame 401. With the automatic removal of the cookie, you'd potentially only see around 33%?

This point makes me wonder about checking if the documentation for this library is clear enough regarding cookie auth and server setup? Cookie auth is generally the neatest option, but you do need to ensure your server cluster is set up correctly.

@flimzy
Copy link
Member

flimzy commented May 22, 2021

it might be worth noting that someone who has cookie auth and is talking to a cluster without a shared secret set could also hit this sort of behaviour.

This sounds like a pretty clearly mis-configured cluster, so I wouldn't try very hard to accomodate this scenario.

@colinnewell
Copy link
Author

I don't mean do anything special in the code. More a note 'Cookie auth is used by default, so ensure your cluster is configured correctly' type of message. It may well be that you have that in there already. I'm not looking at the docs at this point.

@flimzy
Copy link
Member

flimzy commented Jun 3, 2021

I've merged go-kivik/couchdb#280 and backported to v3 with go-kivik/couchdb#281. v3.2.9 includes the fix.

@flimzy flimzy closed this as completed Jun 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants