Existing file marked as non-existing #265
Comments
@hayesgb Digging a bit more, switching to
@hayesgb (Continuing from #261) Just tried from head:
Also if it helps, my paths look like:
Would you mind posting the result of:
fs.details("somecontainer/mydata/mydata2/abc")
On Aug 15, 2021, at 6:07 PM, lmeyerov wrote:
somecontainer/mydata/mydata2/abc
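FYI, having more luck with variants of:

    from azure.storage.blob.aio import BlobServiceClient

    # conn_str and az_storage_container_name are defined elsewhere in the caller's code.
    async def aexists_dir(path):
        blob_service_client = BlobServiceClient.from_connection_string(conn_str)
        async with blob_service_client:
            container_client = blob_service_client.get_container_client(az_storage_container_name)
            # Only the first blob under the prefix is needed: anything other than `path` itself means a directory.
            async for myblob in container_client.list_blobs(name_starts_with=path):
                return myblob['name'] != path
            return False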
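The decision rule in aexists_dir can be checked offline. What follows is a minimal sketch that assumes a plain list of blob names stands in for `list_blobs(name_starts_with=...)`. `naive_exists_dir` and `exists_dir` are hypothetical helpers, not adlfs or Azure SDK functions. The sketch shows one caveat of listing under the bare path: a sibling blob that merely shares the prefix is mistaken for folder contents.

```python
from typing import Iterable


def naive_exists_dir(blob_names: Iterable[str], path: str) -> bool:
    """Same decision rule as aexists_dir: take the first blob whose name
    starts with the bare `path` and call it a directory unless that blob
    is exactly `path`."""
    # Azure returns listed blobs in lexicographic order; sorted() mimics that.
    for name in sorted(blob_names):
        if name.startswith(path):
            return name != path
    return False


def exists_dir(blob_names: Iterable[str], path: str) -> bool:
    """Variant that matches under `path + "/"`, so only blobs inside the
    folder count and sibling names such as "abc-old.csv" are ignored."""
    prefix = path.rstrip("/") + "/"
    return any(name.startswith(prefix) for name in blob_names)


# Hypothetical listing: no "abc" folder exists, only a sibling file.
names = ["mydata/mydata2/abc-old.csv", "mydata/mydata2/file.csv"]
print(naive_exists_dir(names, "mydata/mydata2/abc"))  # True (false positive)
print(exists_dir(names, "mydata/mydata2/abc"))        # False
```

With the real client, the same fix amounts to passing the slash-terminated prefix as `name_starts_with`.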
Thanks. I may end up updating to this. I asked about details earlier, but could you post the result of
{
"metadata": None,
"creation_time": datetime.datetime(2020, 9, 29, 0, 16, 6, tzinfo=datetime.timezone.utc),
"deleted": None,
"deleted_time": None,
"last_modified": datetime.datetime(2021, 8, 13, 15, 35, 35, tzinfo=datetime.timezone.utc),
"content_settings": {
"content_type": "application/x-gzip",
"content_encoding": None,
"content_language": None,
"content_md5": bytearray(b"*****"),
"content_disposition": None,
"cache_control": None
},
"remaining_retention_days": None,
"archive_status": None,
"last_accessed_on": None,
"etag": "*****",
"tags": None,
"tag_count": None,
"name": "mycontainer/myfolder/myfile",
"size": 4332,
"type": "file"
}
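For reference, fsspec's generic `AbstractFileSystem.isfile` looks up `info(path)`, checks whether its `"type"` field is `"file"`, and returns False if the lookup raises. adlfs may override this, so treat the following as a sketch of the idea rather than adlfs's code. `isfile_from_info`, `fake_info`, and the in-memory store are all hypothetical. The store reuses the `name`, `size`, and `type` fields from the details output above. This sketch narrows the exception handling to `FileNotFoundError`.

```python
from typing import Any, Callable, Dict


def isfile_from_info(info: Callable[[str], Dict[str, Any]], path: str) -> bool:
    """fsspec-style isfile: True only when the entry's "type" is "file";
    a missing entry (FileNotFoundError) is reported as "not a file"."""
    try:
        return info(path)["type"] == "file"
    except FileNotFoundError:
        return False


# Hypothetical in-memory metadata store mirroring fields from fs.details().
_store: Dict[str, Dict[str, Any]] = {
    "mycontainer/myfolder/myfile": {"name": "mycontainer/myfolder/myfile", "size": 4332, "type": "file"},
    "mycontainer/myfolder": {"name": "mycontainer/myfolder", "size": 0, "type": "directory"},
}


def fake_info(path: str) -> Dict[str, Any]:
    """Hypothetical info(): look up the entry, ignoring a trailing slash."""
    try:
        return _store[path.rstrip("/")]
    except KeyError:
        raise FileNotFoundError(path) from None


print(isfile_from_info(fake_info, "mycontainer/myfolder/myfile"))  # True
print(isfile_from_info(fake_info, "mycontainer/myfolder"))         # False
print(isfile_from_info(fake_info, "mycontainer/missing"))          # False
```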
Thanks for the help here, @lmeyerov. Release 2021.08.2 should fix the errors with isfile.
Can you share an example of the slow isdir call? It does call cc.list_blobs. Is there a very large number of blobs in the location you're scanning?
Yes, it's a potentially big folder (named parquet dumps); in this case I wouldn't be surprised if it held 1K-10K files. I think the async list_files paginates, though I'm unsure how to ensure the pages stay reasonably small. That's part of the reason we're trying to use only asyncio with adlfs: to ensure even occasional blips won't starve out other tasks.
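On the pagination question: an async listing keeps other tasks responsive only if it awaits between pages, so the page size bounds how much work happens between yields to the event loop. The sketch below simulates this with `fake_list_pages`, a hypothetical async generator standing in for a paged listing. With the real SDK, the analogous knobs would presumably be `list_blobs(..., results_per_page=N)` and iterating `.by_page()`. Both are assumptions worth confirming against the installed azure-storage-blob version. The hypothetical `heartbeat` task stands in for other work sharing the loop.

```python
import asyncio
from typing import AsyncIterator, List, Tuple


async def fake_list_pages(names: List[str], page_size: int) -> AsyncIterator[List[str]]:
    """Hypothetical paged listing: one simulated round-trip per page."""
    for start in range(0, len(names), page_size):
        await asyncio.sleep(0)  # yield to the event loop, as network I/O would
        yield names[start:start + page_size]


async def count_blobs(names: List[str], page_size: int, log: List[str]) -> int:
    total = 0
    async for page in fake_list_pages(names, page_size):
        total += len(page)  # synchronous work per step is bounded by page_size
        log.append(f"page:{total}")
    return total


async def heartbeat(log: List[str], beats: int) -> None:
    """Another task that should keep making progress during the listing."""
    for i in range(beats):
        log.append(f"beat:{i}")
        await asyncio.sleep(0)


async def main() -> Tuple[int, List[str]]:
    # 10K names, matching the upper estimate for the parquet dump folder.
    names = [f"dumps/part-{i:05d}.parquet" for i in range(10_000)]
    log: List[str] = []
    total, _ = await asyncio.gather(count_blobs(names, 1_000, log), heartbeat(log, 3))
    return total, log


total, log = asyncio.run(main())
print(total)  # 10000
```

The `log` list records that heartbeat entries are interleaved with page entries instead of all arriving after the listing finishes.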
@lmeyerov -- I just refactored _isdir on the accel_isdir branch. It passes all the tests, and completely eliminates the list_blobs call. Would appreciate your feedback if you have a chance to check it out.
Sure -- will check on Thursday/Friday (am traveling). At the same time, if there's anything around async multi-connection downloads of individual and folder blobs, happy to check there. Currently investigating how to do it via Azure's SDK, but we'd rather have it unified under fsspec!
Cool. Just curious -- on the multi-connection downloads -- are you looking to use Dask or is the use case async multithreading?
Re: async multithreading, the Azure SDK has parallel connection support with a configurable number of streams, which seems like a fine first step.
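Conceptually, parallel connections with a configurable number of streams means splitting the blob into byte ranges and fetching several ranges at once under a concurrency cap. In azure-storage-blob, this is exposed as the `max_concurrency` keyword of `download_blob`. The sketch below shows the pattern in plain asyncio, with `fetch_range` as a hypothetical stand-in for a ranged request over an in-memory blob. It illustrates the technique, not how the SDK or adlfs implements it.

```python
import asyncio
from typing import List, Tuple


def byte_ranges(size: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Split [0, size) into (offset, length) pairs of at most chunk_size bytes."""
    return [(off, min(chunk_size, size - off)) for off in range(0, size, chunk_size)]


async def fetch_range(blob: bytes, offset: int, length: int) -> bytes:
    """Hypothetical stand-in for one ranged GET on its own connection."""
    await asyncio.sleep(0)  # simulated network wait
    return blob[offset:offset + length]


async def parallel_download(blob: bytes, chunk_size: int, max_concurrency: int) -> Tuple[bytes, int]:
    """Fetch all ranges with at most max_concurrency requests in flight.

    Returns the reassembled bytes and the peak number of concurrent fetches.
    """
    sem = asyncio.Semaphore(max_concurrency)
    in_flight = 0
    peak = 0

    async def fetch_one(offset: int, length: int) -> bytes:
        nonlocal in_flight, peak
        async with sem:
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await fetch_range(blob, offset, length)
            finally:
                in_flight -= 1

    # gather() returns results in argument order, so joining restores the blob.
    parts = await asyncio.gather(*(fetch_one(o, n) for o, n in byte_ranges(len(blob), chunk_size)))
    return b"".join(parts), peak


data = bytes(range(256)) * 40  # 10,240 bytes of sample data
out, peak = asyncio.run(parallel_download(data, chunk_size=1_000, max_concurrency=4))
print(out == data, peak)
```

The semaphore keeps the number of simultaneous requests fixed regardless of blob size. The chunk size then trades per-request overhead against how evenly the work spreads across the streams.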
What happened:
fs.isfile(existing_file_path) incorrectly returns False and gives a warning.
EDIT: Output is
What you expected to happen:
Return True without a warning.
Minimal Complete Verifiable Example:
Anything else we need to know?:
Environment:
fsspec '2021.07.0' (conda)
adlfs '2021.08.1' (pip, no conda yet)
docker / ubuntu 18.04 / python 3.7