Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

generate_signed_url is horrendously slow #183

Closed
lukesneeringer opened this issue Aug 4, 2017 · 11 comments
Closed

generate_signed_url is horrendously slow #183

lukesneeringer opened this issue Aug 4, 2017 · 11 comments
Assignees

Comments

@lukesneeringer
Copy link

From @dw on July 28, 2017 10:55

Calls to google.cloud.storage.Blob.generate_signed_url() are too slow.

OS type and version

OS X latest / Linux Ubuntu 17.04 / Debian 8

Python version and virtual environment information python --version

2.7.10

google-cloud-python version pip show google-cloud, pip show google-<service> or `pip

audiocapture] pip freeze | grep google
gapic-google-cloud-pubsub-v1==0.15.4
gapic-google-cloud-speech-v1==0.15.3
google-api-python-client==1.6.2
google-auth==1.0.1
google-auth-httplib2==0.0.2
google-cloud-core==0.25.0
google-cloud-language==0.25.0
google-cloud-pubsub==0.26.0
google-cloud-speech==0.25.1
google-cloud-storage==1.1.1
google-cloud-translate==0.24.0
google-gax==0.15.13
google-resumable-media==0.0.2
googleapis-common-protos==1.5.2
grpc-google-iam-v1==0.11.1
proto-google-cloud-pubsub-v1==0.15.4
proto-google-cloud-speech-v1==0.15.3

Stacktrace if available

         316278 function calls (314359 primitive calls) in 10.201 seconds

   Ordered by: cumulative time
   List reduced from 874 to 100 due to restriction <100>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   10.208   10.208 base.py:61(view)
        1    0.000    0.000   10.208   10.208 mixins.py:53(dispatch)
        1    0.000    0.000   10.208   10.208 base.py:80(dispatch)
        1    0.000    0.000   10.208   10.208 base.py:154(get)
        1    0.000    0.000   10.207   10.207 views.py:166(get_context_data)
    301/1    0.020    0.000    9.927    9.927 serializers.py:479(to_representation)
        1    0.000    0.000    9.926    9.926 serializers.py:648(to_representation)
     1201    0.004    0.000    9.792    0.008 fields.py:1752(to_representation)
      200    0.002    0.000    9.447    0.047 utils.py:15(url_for_blob)
      200    0.003    0.000    9.445    0.047 blob.py:195(generate_signed_url)
      200    0.003    0.000    9.434    0.047 credentials.py:107(generate_signed_url)
      200    0.004    0.000    9.392    0.047 credentials.py:43(_get_signed_query_params)
      200    0.001    0.000    9.384    0.047 service_account.py:314(sign_bytes)
      200    0.001    0.000    9.384    0.047 crypt.py:228(sign)
      200    0.003    0.000    9.382    0.047 pkcs1.py:248(sign)
      200    0.003    0.000    9.346    0.047 key.py:402(blinded_encrypt)
      400    8.601    0.022    8.601    0.022 {pow}
      200    0.002    0.000    8.546    0.043 core.py:33(encrypt_int)
      100    0.000    0.000    4.748    0.047 serializers.py:132(get_raw_video_url)
      100    0.001    0.000    4.747    0.047 utils.py:20(url_for_segment)
      100    0.000    0.000    4.720    0.047 serializers.py:127(get_web_video_url)
      100    0.001    0.000    4.713    0.047 utils.py:26(url_for_encoding)
      200    0.007    0.000    0.711    0.004 key.py:121(unblind)
      200    0.001    0.000    0.704    0.004 common.py:133(inverse)
      200    0.703    0.004    0.703    0.004 common.py:108(extended_gcd)

Steps to reproduce

  1. Create a google.cloud.storage.Client() with an RSA key configured
  2. Create a google.cloud.storage.Bucket() from that Client
  3. Create a google.cloud.storage.Blob() from that Bucket
  4. Call blob.generate_signed_url(86400) in a short (100 iteration) loop
  5. Observe runtime greatly exceeds expectations for any comparative operation implemented in a sensible manner.
  6. Notice rsa pure-Python RSA implementation is in use, even though OpenSSL and suchlike are available

Code example

client = make_client()
blob = client.bucket('foo').blob('bar')
for x in xrange(100):
    blob.generate_signed_url(86400)

Copied from original issue: googleapis/google-cloud-python#3696

@lukesneeringer
Copy link
Author

From @dw on July 28, 2017 11:34

The implementation from https://github.com/GoogleCloudPlatform/storage-signedurls-python looks much more competitive, it manages closer to 18ms per URL, vs. 47ish for this package. I'm still deeply surprised that RSA signing would be so slow, even with the OpenSSL implementation used by storage-signedurls-python

@lukesneeringer lukesneeringer added api: storage Issues related to the Cloud Storage API. auth labels Aug 4, 2017
@lukesneeringer
Copy link
Author

From @arthurdarcet on July 28, 2017 12:4

with the cryptography package:

private_key = '-----BEGIN PRIVATE KEY-----\n…'
backend = cryptography.hazmat.backends.default_backend()
key = cryptography.hazmat.primitives.serialization.load_pem_private_key(
	private_key.encode('utf-8'),
	password=None,
	backend=backend,
)

to_sign = '\n'.join((
	method.upper(),
	content_md5,
	content_type,
	str(expires),
	'/' + path,
))

sign = key.sign(
	to_sign.encode('utf-8'),
	cryptography.hazmat.primitives.asymmetric.padding.PKCS1v15(),
	cryptography.hazmat.primitives.hashes.SHA256(),
)

clocks at 1.2ms per signature on my machine

@lukesneeringer
Copy link
Author

From @dw on July 28, 2017 14:57

Thanks for this! I've borrowed it for our application as a temporary measure. I wonder why PyCrypto is so much slower

@lukesneeringer
Copy link
Author

From @dhermes on July 28, 2017 15:15

Notice rsa pure-Python RSA implementation is in use, even though OpenSSL and suchlike are available

I was going to point this out, but you've already noticed (which is great). We used to support pyOpenSSL and PyCrypto (as well as rsa) in oauth2client, but I'm not sure what the current "state of the art" is in google-auth (will check now).

RE: PyCrypto, that library is no longer maintained, so we won't be adding support for credential signing from it.

@lukesneeringer
Copy link
Author

From @arthurdarcet on July 28, 2017 15:23

PyCrypto was more or less replaced by pyca/cryptography: https://cryptography.io/en/latest/
It would be very nice to be able to use this in oauth2client, but I guess the issue is on the wrong repo

@lukesneeringer
Copy link
Author

From @dhermes on July 28, 2017 15:28

@arthurdarcet I am currently putting together a "proof-of-concept" signer using pyOpenSSL, so hopefully this will go somewhere. (I've not used pyca/crytography enough to be familiar.)

I'd like to emphatically point out that oauth2client is "dead", and development has moved to google-auth. (So that's where we'd want support for these signers.)

@lukesneeringer
Copy link
Author

From @dw on July 28, 2017 15:40

@dhermes Note that pyOpenSSL has been on life support under PyCA for the last few years, and before that it had no active maintainer. The PyCA guys recommend the cryptography package for all new code

Whups, confused it with PyCrypto :)

@lukesneeringer
Copy link
Author

From @dhermes on July 28, 2017 16:12

@dw Thanks for the info. Here is the pyOpenSSL example I was working on (though maybe it's not so relevant as you point out):
https://gist.github.com/dhermes/036d4cb068e692a6d6e4a1f6e10e8b29

There isn't a great way in google-auth of getting this in use, here is a super hacky approach.

In [1]: import google.auth

In [2]: from openssl_signer import OpenSSLSigner

In [3]: 

In [3]: credentials, _ = google.auth.default()

In [4]: credentials
Out[4]: <google.oauth2.service_account.Credentials at 0x7fe1c8296780>

In [5]: old_signer = credentials._signer

In [6]: old_signer
Out[6]: <google.auth.crypt.RSASigner at 0x7fe1c82492e8>

In [7]: key = old_signer._key.save_pkcs1()

In [8]: key[:30]
Out[8]: b'-----BEGIN RSA PRIVATE KEY----'

In [9]: new_signer = OpenSSLSigner.from_string(
   ...:     key, key_id=old_signer.key_id)
   ...: 

In [10]: message = b'sanity check'

In [11]: old_signer.sign(message) == new_signer.sign(message)
Out[11]: True

In [12]: 

In [12]: credentials._signer = new_signer

actually comparing these two we see about a 46x speedup:

In [13]: %timeit old_signer.sign(message)
33.1 ms ± 261 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [14]: %timeit old_signer.sign(message)
33.1 ms ± 139 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [15]: %timeit new_signer.sign(message)
712 µs ± 3.53 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [16]: %timeit new_signer.sign(message)
719 µs ± 1.51 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

@lukesneeringer
Copy link
Author

From @dhermes on July 28, 2017 16:12

@jonparrott We definitely need to move this conversation over to google-auth.

@lukesneeringer
Copy link
Author

From @jonparrott on July 28, 2017 16:14

Yep, we want to add ECC signers to Google auth so using cryptography for RSA is an option. We'd just need to make it optional.

@lukesneeringer
Copy link
Author

From @dhermes on July 28, 2017 16:53

I add a cryptography signer to my gist (based on the above from @arthurdarcet):

In [17]: from cryptography_signer import CryptographySigner

In [18]: cryptography_signer = CryptographySigner.from_string(
    ...:     key, key_id=old_signer.key_id)
    ...: 

In [19]: old_signer.sign(message) == cryptography_signer.sign(message)
Out[19]: True

In [20]: %timeit cryptography_signer.sign(message)
664 µs ± 5.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [21]: %timeit cryptography_signer.sign(message)
669 µs ± 4.98 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Post-Script: Though it (cryptography) seems to be marginally faster than pyCrypto pyOpenSSL, it isn't actually. I was playing YouTube videos before so my laptop was working harder. In the more restful state, the previous times became:

In [13]: %timeit old_signer.sign(message)
30.3 ms ± 427 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [14]: %timeit old_signer.sign(message)
30.3 ms ± 323 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [15]: %timeit new_signer.sign(message)
680 µs ± 6.21 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [16]: %timeit new_signer.sign(message)
653 µs ± 14.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

@theacodes theacodes added enhancement and removed api: storage Issues related to the Cloud Storage API. auth labels Aug 4, 2017
dhermes added a commit to dhermes/google-auth-library-python that referenced this issue Aug 5, 2017
Fixes googleapis#183.

NOTE: This change is incomplete. It needs unit tests and a Verifier
      implementation.
dhermes added a commit to dhermes/google-auth-library-python that referenced this issue Aug 11, 2017
Fixes googleapis#183.

NOTE: This change is incomplete. It needs unit tests and a Verifier
      implementation.
theacodes pushed a commit to dhermes/google-auth-library-python that referenced this issue Feb 8, 2018
Fixes googleapis#183.

NOTE: This change is incomplete. It needs unit tests and a Verifier
      implementation.
theacodes pushed a commit that referenced this issue Feb 8, 2018
@yoshi-automation yoshi-automation added triage me I really want to be triaged. 🚨 This issue needs some love. labels Apr 6, 2020
@tseaver tseaver removed triage me I really want to be triaged. 🚨 This issue needs some love. labels Nov 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants