Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Discussion] HTTP library #1346

Closed
theacodes opened this issue Jan 5, 2016 · 62 comments
Closed

[Discussion] HTTP library #1346

theacodes opened this issue Jan 5, 2016 · 62 comments
Assignees

Comments

@theacodes
Copy link
Contributor

Presently, we use httplib2 by default but allow users to specify their own http library (#908) so long as it conforms to httplib2.Http's signature.

httplib2 was chosen because it's the underlying http client used by google/oauth2client. However, httplib2 has a variety of issues include not being thread-safe (#1274), not doing any connection pooling, etc.

We should consider what it would take to move to another http library. The major considerations are:

  1. We must support using oauth2client's credentials with the library. The essential functionality is adding the auth header and performing refresh & retry. This can be done either here or within oauth2client.
  2. The library must work on Google App Engine.
@dhermes
Copy link
Contributor

dhermes commented Jan 5, 2016

As a start, it'd be nice to

>>> http = httplib2.Http()
>>> auth_http = credentials.authorize(http)
>>> auth_http
<oauth2client.transport.AuthenticatedHttp at 0x00deadbeaf00>

and auth_http.request would perform the same functionality that the monkey-patched stuff would

@theacodes
Copy link
Contributor Author

What I would prefer even more:

>>> auth_http = AuthenticatedHttp(credentials)
>>> auth_http
<oauth2client.transport.AuthenticatedHttp at 0x00deadbeaf00>

Which would allow something like:

>>> auth_session = AuthenticatedSession(credentials)
>>> auth_session
<oauth2client.transport.requests.AuthenticatedSession at 0x00deadbeaf00>

To be idiomatic.

@theacodes
Copy link
Contributor Author

Does it make sense to have transports be a separate library all together?

@theacodes
Copy link
Contributor Author

Does it make sense to have transports be a separate library all together?

nvm, the refresh logic is very oauth-centric so it doesn't quite make sense. Though I am curious what special considerations we'll need to make to use those transports in this library.

@theacodes
Copy link
Contributor Author

Wow, there's a ton of code in gcloud.streaming that seems to be working around httplib2.

What's the utility in continuing to support httplib2 directly? Could we move to urllib3 and be better served?

@dhermes
Copy link
Contributor

dhermes commented Jan 5, 2016

gcloud.streaming was for #935. We would have used apitools but it's future was unclear and it was (is?) completely undocumented. Hopefully we can strip down gcloud.streaming to just the core of what we need.


Your suggestion

>>> auth_http = AuthenticatedHttp(credentials)

is not agnostic of the transport. Instead you'd need

>>> auth_http = AuthenticatedHttp(credentials, http)

and you'd need a well-defined interface for http (e.g. defines http.request with signature same as httplib2.Http.request)

@theacodes
Copy link
Contributor Author

Sorry, I may not have been clear: AuthenticatedHttp is the transport - it's httplib2, whereas AuthenticatedSession would be a requests-based transport. Perhaps this is better:

authed_transport = transports.httplib2.Transport(credentials)
authed_transport = transports.urllib3.Transport(credentials)
authed_transport = transports.requests.Transport(credentials)

@dhermes
Copy link
Contributor

dhermes commented Jan 5, 2016

Got it. I think it'd be easier to have a simple default that can use a httplib2.Http object with no modification and then maybe have libraries like oauth2client-urllib3 and oauth2client-requests that define the necessary interface.

@theacodes
Copy link
Contributor Author

I'm not a fan of httplib2's interface, and I think we'd be doing ourselves a disservice if we tried to make that our definition for the future.

Personally, I'd be in favor of dropping httplib2 altogether in favor of urllib3. Requests support would then be pretty straightforward. But that's pretty extreme and the waters for urllib3 and app engine are rocky.

@dhermes
Copy link
Contributor

dhermes commented Jan 5, 2016

Now we're talking! 😀

Can you describe your desired interface?

@theacodes
Copy link
Contributor Author

Not sure yet. :)

Maybe something along the lines of how urllib3 does things already: the urlopen method:

    def urlopen(self, method, url, body=None, headers=None, retries=None,
                redirect=True, assert_same_host=True, timeout=_Default,
                pool_timeout=None, release_conn=None, **response_kw):

Our transport for urllib3 could just be a wrapper around urlopen that does the authentication and refresh logic. Any other transport would need to implement the same urlopen interface.

@theacodes
Copy link
Contributor Author

Maybe it's better to put it this way: I think it's fine to let urllib3's poolmanager define our interface. We don't necessarily need to create any abstract classes or anything like that.

@dhermes
Copy link
Contributor

dhermes commented Jan 5, 2016

Very similar to httplib2.Http.request:

def request(self, uri, method="GET", body=None, headers=None,
            redirections=DEFAULT_MAX_REDIRECTS, connection_type=None):

also google.appengine.api.urlfetch.fetch

def fetch(self, url, payload=None, method=1, headers={}, allow_truncated=False,
          follow_redirects=True, deadline=None, validate_certificate=None):

and urllib2.urlopen in 2.7.6

def urlopen(url, data=None, timeout=socket._GLOBAL_DEFAULT_TIMEOUT):

and urllib2.urlopen in 2.7.10 (latest as of Jan. 5, 2016)

def urlopen(url, data=None, timeout=socket._GLOBAL_DEFAULT_TIMEOUT,
            cafile=None, capath=None, cadefault=False, context=None):

@dhermes
Copy link
Contributor

dhermes commented Jan 5, 2016

We don't necessarily need to create any abstract classes or anything like that.

I agree. I think it'd be nice just to allow someone to create any old transport they like and have it work with oauth2client provided it knew how to quack.

@theacodes
Copy link
Contributor Author

agree. I think it'd be nice just to allow someone to create any old transport they like and have it work with oauth2client provided it knew how to quack.

That's sort of the beauty of urllib3's design - the 'Manager' is the top-level component and there's currently multiple managers (e.g, ProxyManager, AppEngineManager, etc.).

@dhermes
Copy link
Contributor

dhermes commented Jan 5, 2016

Seems like we agree on the interface part, but maybe not on the dependency bit. I'd like it to work without any assumption that the user has urllib3 or knows what it is. (That might be a bit extreme if we want to provide a nice default.)

@theacodes
Copy link
Contributor Author

Considering requests (and by extension urllib3) are currently the 3rd most popular python package (above the package manager, even!) it doesn't seem like much of a stretch for us to use urrlib3 by default and say that anyone who wishes to use another transport should implement urllib3's PoolManager interface. We can even provide the existing AppEngineManager as an example.

@dhermes
Copy link
Contributor

dhermes commented Jan 5, 2016

Yeah that may be the right route. (We should loop in httplib2 at some point.)

@theacodes
Copy link
Contributor Author

So does our plan from here look like:

  1. 100% coverage on oauth2client.
  2. Create a AuthenticatedManager class that handles auth and retry logic and forwards to a urllib3.PoolManager compliant class.
  3. Replace httplib2 usage in in oauth2client and just expect and object with urllib3.PoolManager's signature and exceptions.
  4. Implement aforementioned interface to httplib2 for historical reasons (read: App Engine).
  5. Come up for a plan of attack for this library.

@dhermes
Copy link
Contributor

dhermes commented Jan 5, 2016

You don't think App Engine will be ready to go with urllib3?

@theacodes
Copy link
Contributor Author

It should be, but I'm unsure. It's not as thoroughly tested as raw httplib2. Though the surface area is quite small. What do you think?

@dhermes
Copy link
Contributor

dhermes commented Jan 5, 2016

Raw httplib2 has lots of problems too, but at least it has a lot of cycles.

@theacodes
Copy link
Contributor Author

What are you feelings on replacing httplib2 wholesale with urllib3 in this library once oauth2client is ready?

@dhermes
Copy link
Contributor

dhermes commented Jan 5, 2016

I'm fine with it. I have no allegiance to httplib2 and it is essentially unmaintained, which isn't very desirable.

@theacodes
Copy link
Contributor Author

Cool. Okay. Seems like we have at least a half-baked plan here. Do we want to file bugs on oauth2client to bring the discussion over there?

FWIW @shazow is currently asking for opinions while designing urllib3 2.0 here.

@shazow
Copy link

shazow commented Jan 5, 2016

Fun fact: I wrote urllib3 partly because of very many httplib2-related issues in 2008. :P (Not much had changed.)

@dhermes
Copy link
Contributor

dhermes commented Jan 5, 2016

Ha! Nice. I wish I would've known in 2011 what I know now when I started working with the Google client libraries.

@jonparrott I've already filed googleapis/oauth2client#128 for the http interface discussion

@dhermes
Copy link
Contributor

dhermes commented Jan 21, 2016

Prototype coming anytime soon?

@theacodes
Copy link
Contributor Author

Going through OSS process now. It will live at
GoogleCloudPlatform/httplib2shim.

On Wed, Jan 20, 2016, 5:57 PM Danny Hermes [email protected] wrote:

Prototype coming anytime soon?


Reply to this email directly or view it on GitHub
#1346 (comment)
.

@dhermes
Copy link
Contributor

dhermes commented Jan 21, 2016

👍

@theacodes
Copy link
Contributor Author

httplib2shim has been published.

@dhermes what do you think we should do from here? My initial thoughts:

  1. Merge Make connection return a thread-local instance of http. #1274 for the near-term.
  2. Get system-tests to run with both httplib2 and httplib2shim.
  3. Figure out how to use urllib3 for all of our httplib2 usage, but only use httplib2shim when interacting with oauth2client.
  4. Figure out what to do about oauth2client, if anything.

@dhermes
Copy link
Contributor

dhermes commented Feb 9, 2016

Thanks for the heads-up!

I think we should attack oauth2client first. I don't really like the idea of merging #1274 as-is, but maybe with some design discussions / me calming down, it'll be fine.

@theacodes
Copy link
Contributor Author

What are you ideas for oauth2client? I feel like our hands are kind of tied as we don't control all of the downstream clients.

@ds-hwang
Copy link

ds-hwang commented Apr 6, 2016

I have the same issue.

@theacodes
Copy link
Contributor Author

@ds-hwang Feel free to give httplib2shim a go and report any issues.

@ds-hwang
Copy link

ds-hwang commented Apr 7, 2016

I think httplib2shim is fine.
The root issue is that oauth2client doesn't set proxy up to httplib2shim properly
I found the temporary solution; https://code.google.com/p/googleappengine/issues/detail?id=9533#c4

@theacodes
Copy link
Contributor Author

Update:

This library now uses google-auth instead of oauth2client. google-auth supports httplib2, requests, and urllib3.

We still need a plan of attack for removing httplib2 as the transport here, but we no longer have a hard dependency on it.

@mdmiller53
Copy link

hi all, i appreciate that this issue is being more broadly discussed and will be fixed with #1346. but we've been seeing this issue for a couple years now and it really hasn't been documented at a very visible level, which would have helped when we first ran into this.
i'm a member of the ISB-CGC NCI cloud pilot team and we're tasked with uploading the files from the Genomic Data Center (GDC) into Google cloud storage. we've been collaborating with the Google Genomics team. because the data files are organized by program, cancer type, and data type, the upload can be implemented as embarrassingly parallel but we noticed the errors when trying to use gcloud python client as depicted in #1214. we've been using boto, that doesn't have that problem, but boto3 no longer supports GCS. as one workaround, i tried gcloud_requests which was mentioned in #1214 and that seemed to work fine, but saw that the issue was closed in favor of this issue (makes perfect sense), so i tried httplib2shim just now on an upload of ~1500 files and did see one problem. the code is set up to retry a couple of times on exception. for five files, on the first try, there was the following:

problem uploading gdc/test_local_gdc_upload_run/TCGA-ACC/Isoform Expression Quantification/48081b15-4369-431f-a344-663135e9fa91_isoforms.quantification.txt due to 'http:metadata.google.internal'

but on the subsequent retries, they failed (our application logic) because the file had evidently been uploaded and this error occurred after the successful upload on the first try.
look forward to the final solution being well publicized.

@lukesneeringer lukesneeringer added the priority: p2 Moderately-important priority. Fix may not be included in next release. label Apr 19, 2017
@lukesneeringer lukesneeringer removed the priority: p2 Moderately-important priority. Fix may not be included in next release. label Apr 19, 2017
@asilversempirical
Copy link

What's the current status of thread-safety in the python APIs? #1214 was closed and redirected to this issue, but it'd be nice to have a clearer explanation of what's going on in this issue with respect to thread safety. FWIW https://developers.google.com/api-client-library/python/guide/thread_safety still mentions only httplib2 and a somewhat cumbersome workaround to ensure thread safety.

@lukesneeringer
Copy link
Contributor

Hi @asilversempirical:
@dhermes is in the process of moving us over to urllib3, which is thread-safe.
I actually thought there was an issue for this, but I can not find it, so I am reopening this one. :-)

@dhermes
Copy link
Contributor

dhermes commented Apr 19, 2017

@lukesneeringer I'd guess that issue was #1998

@lukesneeringer
Copy link
Contributor

Okay. Let's keep that one and close this one.

@asilversempirical
Copy link

Thanks for the pointer!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants