GCS zarr datasets can only be opened with token='anon' from binder #14

Open
rabernat opened this issue Sep 14, 2018 · 4 comments

@rabernat
Member

I am trying out these examples with the pangeo binder.

In pangeo.pydata.org, the following code works:

ds = xr.open_zarr(gcsfs.GCSMap('pangeo-data/SOSE'))

But in hub.binder.pangeo.io, it fails with

_call exception: HTTPConnectionPool(host='metadata.google.internal', port=80): Max retries exceeded with url: /computeMetadata/v1/instance/service-accounts/default/?recursive=true (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8d48ca4ac8>: Failed to establish a new connection: [Errno 110] Connection timed out',))
Traceback (most recent call last):
  File "/srv/conda/lib/python3.6/site-packages/urllib3/connection.py", line 141, in _new_conn
    (self.host, self.port), self.timeout, **extra_kw)
  File "/srv/conda/lib/python3.6/site-packages/urllib3/util/connection.py", line 83, in create_connection
    raise err
  File "/srv/conda/lib/python3.6/site-packages/urllib3/util/connection.py", line 73, in create_connection
    sock.connect(sa)
TimeoutError: [Errno 110] Connection timed out

I find I need to do

ds = xr.open_zarr(gcsfs.GCSMap('pangeo-data/SOSE', gcs=gcsfs.GCSFileSystem(token='anon')))

which is significantly uglier and more complicated.

Can we somehow make anonymous tokens the default for gcsfs?
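Until the default changes, the longer invocation can be hidden behind a small helper. This is only a sketch: `open_public_zarr` is a hypothetical name, not part of gcsfs or xarray, and it assumes the bucket is publicly readable. The imports sit inside the function so the helper can be defined even where gcsfs is not installed.

```python
def open_public_zarr(path):
    """Open a publicly readable zarr store on GCS without credentials.

    Hypothetical convenience wrapper around the workaround above.
    """
    # Imported lazily; both packages are required when the function is called.
    import gcsfs
    import xarray as xr

    # token='anon' skips credential discovery, so the metadata server
    # is never contacted.
    fs = gcsfs.GCSFileSystem(token='anon')
    return xr.open_zarr(gcsfs.GCSMap(path, gcs=fs))
```

With this in place the call in a notebook would be `ds = open_public_zarr('pangeo-data/SOSE')`.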

cc @martindurant, @jhamman

@martindurant
Collaborator

Is this perhaps a case for hiding the exact data loading invocation in an Intake catalogue? :)

@martindurant
Collaborator

In more detail, gcsfs is supposed to try the auth mechanisms in the following order (if none is supplied): ['google_default', 'cache', 'cloud', 'anon']. That means that if no credentials are found, you fall back to anon. I don't know why this isn't working; it would be worthwhile to find out what kind of auth it thinks was successfully established.
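To make the intended fallback concrete, here is a rough sketch of "try each method in order, keep the first that works". It is not the gcsfs implementation; `connect_with_fallback` and `fake_connect` are hypothetical names used only for illustration, and the stand-in connector simply pretends that every method except anon has no credentials.

```python
AUTH_ORDER = ['google_default', 'cache', 'cloud', 'anon']


def connect_with_fallback(methods, try_connect):
    """Return (method, result) for the first method whose connect succeeds."""
    errors = {}
    for method in methods:
        try:
            return method, try_connect(method)
        except Exception as exc:
            # Remember why this method failed, then try the next one.
            errors[method] = exc
    raise RuntimeError(f"no auth method succeeded: {errors}")


def fake_connect(method):
    """Stand-in connector (assumption): only 'anon' succeeds."""
    if method != 'anon':
        raise ConnectionError(f"{method}: no credentials")
    return None


method, _ = connect_with_fallback(AUTH_ORDER, fake_connect)
print(method)  # prints "anon"
```

Note that a method which hangs until a connection timeout, as in the traceback above, only falls through to the next method if that timeout is caught like any other failure.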

@rabernat
Member Author

I have a feeling this is related to the automatic blocking of access to the metadata server, which occurs in zero-to-jupyterhub-k8s. We override this for pangeo.pydata.org, but probably not for binder.
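One way to test this theory from a notebook is to probe the metadata server with a short timeout and pick the token accordingly. This is a sketch under assumptions: `metadata_server_reachable` and `choose_token` are hypothetical helpers, the host and port are taken from the error message above, and a reachable port says nothing about whether the credentials behind it can read the bucket.

```python
import socket


def metadata_server_reachable(host="metadata.google.internal", port=80, timeout=1.0):
    """Return True if a TCP connection to the metadata server can be opened.

    The timeout bounds the connection attempt only, not the DNS lookup.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and failed name resolution.
        return False


def choose_token(probe=metadata_server_reachable):
    """Return None (let gcsfs try its default order) or 'anon' if blocked."""
    return None if probe() else "anon"


# Intended use (requires gcsfs):
#   fs = gcsfs.GCSFileSystem(token=choose_token())
```

The probe is passed in as an argument so the decision logic can be exercised without touching the network.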

@jhamman
Member

jhamman commented Sep 14, 2018

@rabernat - I just redeployed binder with the cloudMetadata option set. Would you mind trying again?
