Cannot use s3 related functions #333

Closed
trojblue opened this issue Nov 11, 2023 · 3 comments

trojblue commented Nov 11, 2023

Sysinfo:

(base) ubuntu@ip-10-53-8-252:~$ pip show megfile
Name: megfile
Version: 2.2.9.post3
Summary: Megvii file operation library
Home-page: https://github.com/megvii-research/megfile
Author: megvii
Author-email: [email protected]
License: 
Location: /home/ubuntu/miniconda3/lib/python3.10/site-packages
Requires: boto3, botocore, paramiko, pyyaml, requests, tqdm
Required-by: 

(base) ubuntu@ip-10-53-8-252:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.6 LTS
Release:        20.04
Codename:       focal

Issue:

I tried using it from the CLI, and here is what happened:

(base) ubuntu@ip-10-53-8-252:~$ aws s3 ls s3://dataset-ingested/user-preference/
                           PRE pref_100k_min513x768/
                           PRE pref_100k_min513x768_YIELD/
                           PRE sd-human-ft/
                           PRE sd-user-pref-50k-ft-gpt/
                           PRE sd-user-pref-75k-ft/
                           PRE sd-user-pref-v2-large-full/
2023-07-04 00:41:59          0 
2023-07-11 06:16:49       1444 README.md
(base) ubuntu@ip-10-53-8-252:~$ 
(base) ubuntu@ip-10-53-8-252:~$ megfile ls s3://dataset-ingested/user-preference/

[S3UnknownError] Unknown error encountered: 's3://dataset-ingested/user-preference/', error: botocore.exceptions.ClientError('An error occurred (PermanentRedirect) when calling the ListObjectsV2 operation: The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint.'), endpoint: 'https://s3.amazonaws.com'
(base) ubuntu@ip-10-53-8-252:~$ 

The AWS credentials were configured with aws configure and work with the AWS CLI.

I also tried using it in Python:

from megfile import smart_walk

s3_directory = 's3://dataset-ingested/user-preference/'

# Walking through the directory
for root, dirs, files in smart_walk(s3_directory):
    print(f"Current directory: {root}")
    print(f"Subdirectories: {dirs}")
    print(f"Files: {files}")
    print("-" * 20)

Error message:

---------------------------------------------------------------------------
ClientError                               Traceback (most recent call last)
File ~/miniconda3/lib/python3.10/site-packages/megfile/s3_path.py:1534, in S3Path.is_dir(self, followlinks)
   1533 try:
-> 1534     resp = self._client.list_objects_v2(
   1535         Bucket=bucket, Prefix=prefix, Delimiter='/', MaxKeys=1)
   1536 except Exception as error:

File ~/miniconda3/lib/python3.10/site-packages/botocore/client.py:535, in ClientCreator._create_api_method.<locals>._api_call(self, *args, **kwargs)
    534 # The "self" in this scope is referring to the BaseClient.
--> 535 return self._make_api_call(operation_name, kwargs)

File ~/miniconda3/lib/python3.10/site-packages/botocore/client.py:980, in BaseClient._make_api_call(self, operation_name, api_params)
    979     error_class = self.exceptions.from_code(error_code)
--> 980     raise error_class(parsed_response, operation_name)
    981 else:

ClientError: An error occurred (PermanentRedirect) when calling the ListObjectsV2 operation: The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint.

The above exception was the direct cause of the following exception:

S3UnknownError                            Traceback (most recent call last)
/home/ubuntu/dev/data-processings/ingested_gdl_twitter_processings.ipynb Cell 18 line 6
      3 s3_directory = 's3://dataset-ingested/user-preference/'
      5 # Walking through the directory
----> 6 for root, dirs, files in smart_walk(s3_directory):
      7     print(f"Current directory: {root}")
      8     print(f"Subdirectories: {dirs}")

File ~/miniconda3/lib/python3.10/site-packages/megfile/s3_path.py:2040, in S3Path.walk(self, followlinks)
   2037 if not bucket:
   2038     raise UnsupportedError('Walk whole s3', self.path_with_protocol)
-> 2040 if not self.is_dir():
   2041     return
   2043 stack = [key]

File ~/miniconda3/lib/python3.10/site-packages/megfile/s3_path.py:1540, in S3Path.is_dir(self, followlinks)
   1537     error = translate_s3_error(error, self.path_with_protocol)
   1538     if isinstance(error,
   1539                   (S3UnknownError, S3ConfigError, S3PermissionError)):
-> 1540         raise error
   1541     return False
   1543 if not key:  # bucket is accessible

S3UnknownError: Unknown error encountered: 's3://dataset-ingested/user-preference/', error: botocore.exceptions.ClientError('An error occurred (PermanentRedirect) when calling the ListObjectsV2 operation: The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint.'), endpoint: 'https://s3.amazonaws.com'

The bucket I'm trying to access is in the same region as my configuration (from aws configure). Any clue what happened? Thanks.
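
One way to double-check the region claim above is to ask S3 where the bucket lives and compare the answer with the region boto3 resolves locally. The following is only a sketch, not part of megfile: the helper names normalize_location_constraint and bucket_region are made up for illustration, and it assumes boto3 is installed and the credentials allow s3:GetBucketLocation.

```python
from typing import Optional


def normalize_location_constraint(value: Optional[str]) -> str:
    """Map the LocationConstraint value reported by S3 to a region name."""
    # S3 reports us-east-1 buckets with an empty/None constraint,
    # and some older eu-west-1 buckets with the legacy value "EU".
    if not value:
        return "us-east-1"
    if value == "EU":
        return "eu-west-1"
    return value


def bucket_region(bucket: str) -> str:
    """Ask S3 which region a bucket lives in (needs network and credentials)."""
    import boto3  # imported lazily so the helper above works without boto3

    client = boto3.client("s3")
    response = client.get_bucket_location(Bucket=bucket)
    return normalize_location_constraint(response.get("LocationConstraint"))
```

Comparing bucket_region("dataset-ingested") with boto3.Session().region_name should show whether the two regions really match in the environment where megfile runs.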

LoveEatCandy (Collaborator) commented

I guess it's a bug: megfile may not be reading all settings from the configuration file. Did you set region_name with aws configure?
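
To answer that question without guessing, the region stored by aws configure can be read straight from the AWS CLI config file. This is a minimal sketch using only the standard library; the function name region_from_aws_config is hypothetical, and it assumes the usual layout where the default profile sits under [default] and named profiles under [profile NAME].

```python
import configparser
import os
from pathlib import Path
from typing import Optional


def region_from_aws_config(profile: str = "default",
                           config_path: Optional[str] = None) -> Optional[str]:
    """Return the region stored for a profile in the AWS CLI config file."""
    # AWS_CONFIG_FILE overrides the default location ~/.aws/config.
    raw_path = config_path or os.environ.get("AWS_CONFIG_FILE", "~/.aws/config")
    parser = configparser.ConfigParser()
    parser.read(Path(raw_path).expanduser())  # a missing file yields no sections
    # The default profile is [default]; named profiles are [profile NAME].
    section = "default" if profile == "default" else f"profile {profile}"
    if not parser.has_section(section):
        return None
    return parser.get(section, "region", fallback=None)
```

Note that environment variables such as AWS_DEFAULT_REGION can take precedence over the file in boto3, so a value found here is not necessarily the one in effect.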

LoveEatCandy commented Nov 13, 2023

I tested it, and the region setting in the configuration file works.
This error message means the region you are using differs from the bucket's region, so please check your region configuration.
If the region is correct, please send me the debug logs, enabled like this:

import logging
logging.basicConfig(level=logging.DEBUG)

Thanks.
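
If the root-level DEBUG setting above is too noisy, a narrower variant is to raise the level only on the loggers involved in S3 calls. This is a sketch under assumptions: botocore and boto3 log under their package names, while the logger name megfile is assumed from the usual Python convention rather than confirmed by the maintainers, and enable_s3_debug_logging is a made-up helper name.

```python
import logging


def enable_s3_debug_logging(logger_names=("megfile", "botocore", "boto3")):
    """Attach one DEBUG stream handler to the given package loggers."""
    handler = logging.StreamHandler()  # writes to stderr by default
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s"))
    for name in logger_names:
        logger = logging.getLogger(name)
        logger.setLevel(logging.DEBUG)  # child loggers inherit this level
        logger.addHandler(handler)
```

Calling enable_s3_debug_logging() before the failing smart_walk call should capture the request details, including the endpoint and region botocore resolves, without turning on DEBUG output for unrelated libraries.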

LoveEatCandy (Collaborator) commented

Please reopen the issue if the problem persists.
