
Append write operation ends in AttributeError #249

Open · anders-kiaer opened this issue Jun 15, 2021 · 2 comments

Comments

anders-kiaer (Contributor) commented Jun 15, 2021

import pandas as pd
import numpy as np
import fastparquet
from adlfs import AzureBlobFileSystem

CONTAINER_NAME = ...
BLOB_NAME = f"{CONTAINER_NAME}/some.parquet"
ACCOUNT_NAME = ...
ACCOUNT_KEY = ...

fs = AzureBlobFileSystem(account_name=ACCOUNT_NAME, account_key=ACCOUNT_KEY, container_name=CONTAINER_NAME)
df = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list("ABCD"))

try:
    # Try appending to an already existing file first...
    fastparquet.write(BLOB_NAME, df, open_with=fs.open, append=True)
except FileNotFoundError:
    # ...the file does not already exist, so create it from scratch.
    fastparquet.write(BLOB_NAME, df, open_with=fs.open)

What happened:

Running the script for the first time (i.e. when the file does not already exist), it completes without problems. On the next run, when the file is to be appended, it fails:

Traceback (most recent call last):
  File "test_append.py", line 15, in <module>
    fastparquet.write(BLOB_NAME, df, open_with=fs.open, append=True)
  File "[...]/python3.8/site-packages/fastparquet/writer.py", line 873, in write
    write_simple(filename, data, fmd, row_group_offsets,
  File "[...]/python3.8/site-packages/fastparquet/writer.py", line 705, in write_simple
    with open_with(fn, mode) as f:
  File "[...]/python3.8/site-packages/fsspec/spec.py", line 962, in open
    f = self._open(
  File "[...]/python3.8/site-packages/adlfs/spec.py", line 1607, in _open
    return AzureBlobFile(
  File "[...]/python3.8/site-packages/adlfs/spec.py", line 1720, in __init__
    raise NotImplementedError("File mode not supported")
NotImplementedError: File mode not supported
Exception ignored in: <function AzureBlobFile.__del__ at 0x7f4156c04310>
Traceback (most recent call last):
  File "[...]/python3.8/site-packages/adlfs/spec.py", line 1909, in __del__
    self.close()
  File "[...]/python3.8/site-packages/adlfs/spec.py", line 1745, in close
    super().close()
  File "[...]/python3.8/site-packages/fsspec/spec.py", line 1554, in close
    if not self.forced:
AttributeError: 'AzureBlobFile' object has no attribute 'forced'

What you expected to happen:

One of two options:

  1. A successful write operation. (Is supporting append with fastparquet in adlfs fundamentally challenging, e.g. because the options for appending to/modifying blobs are limited, or is it more a prioritization/not-yet-implemented question?)
  2. Alternatively, a better (final) error message. There is a clue higher up in the traceback stating NotImplementedError, but for some reason that exception is "ignored" and the script instead ends with an AttributeError. 🤔

As far as I can see, in append mode the file mode tried behind the scenes is rb+ when the AttributeError occurs.
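If I read the traceback correctly, the masking looks like the classic pitfall of __del__ running on a half-initialized object. A minimal sketch of the mechanism (hypothetical class, not the actual adlfs code):

# Hypothetical minimal class, not adlfs code: __init__ raises before the
# attribute is ever set, but __del__ still runs on the half-built object
# and trips over the missing attribute, so Python reports
# "Exception ignored in: __del__" with an AttributeError.
class File:
    def __init__(self, mode):
        if mode not in ("rb", "wb"):
            raise NotImplementedError("File mode not supported")
        self.forced = False  # never reached for unsupported modes

    def __del__(self):
        if not self.forced:  # AttributeError on a half-initialized instance
            pass

File("rb+")  # NotImplementedError, then the "ignored" AttributeError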

Environment:

  • Python version: 3.8
  • Operating System: Ubuntu
  • Install method (conda, pip, source): pip
hayesgb (Collaborator) commented Jul 17, 2021

The package's default behavior follows Azure's default, which creates a BlockBlob, as described here. Block blobs do not accept an append operation. There is limited ability to create an append blob, but it hasn't been tried with this use case.
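For context, a rough sketch of what an append blob looks like through the azure-storage-blob v12 SDK directly (credentials are placeholders; again, untested with this use case, and note that per the traceback fastparquet's append wants rb+ to rewrite the existing footer, which an append blob would not allow anyway):

# Rough sketch: creating/appending to an append blob with the
# azure-storage-blob v12 SDK directly (credentials are placeholders).
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient(
    account_url=f"https://{ACCOUNT_NAME}.blob.core.windows.net",
    credential=ACCOUNT_KEY,
)
blob = service.get_blob_client(container=CONTAINER_NAME, blob="events.log")
blob.create_append_blob()             # blob type AppendBlob, initially empty
blob.append_block(b"first chunk\n")   # blocks can only be added at the end
blob.append_block(b"second chunk\n")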

From a roadmap perspective, and thinking about how the package gets used by Dask, I'd really like to understand your use case. Currently, the approach I generally see is to incrementally add new files rather than appending to an existing one, as in the sketch below. That said, the ability to append to an existing file, or to a collection of files when updating incrementally, is definitely needed.
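Sketch of the incremental-files pattern, reusing fs, df and CONTAINER_NAME from your example (the dataset/ prefix and part naming are just illustrative):

# Each batch becomes its own parquet file under a common prefix.
import uuid

part = f"{CONTAINER_NAME}/dataset/part-{uuid.uuid4().hex}.parquet"
fastparquet.write(part, df, open_with=fs.open)

# All parts under the prefix can later be read back as one dataset.
paths = fs.glob(f"{CONTAINER_NAME}/dataset/part-*.parquet")
combined = fastparquet.ParquetFile(paths, open_with=fs.open).to_pandas()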

@martindurant -- can you comment on how s3fs and gcsfs handle this?

martindurant (Member) commented

gcsfs does not support append.

In s3fs, append means: "make a new file whose first block(s) are the contents of the file that previously had the same name". This is possible if the original file is >5MB; if not, append works by downloading the contents of the previous file and starting a new file from scratch.
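In user-facing terms, that looks like an ordinary "ab" open (sketch; bucket/key are placeholders):

# Sketch of append through s3fs: "ab" starts a fresh multipart upload,
# seeded with the existing object as its first part(s) when it is >5MB,
# otherwise the old contents are downloaded and re-uploaded.
import s3fs

s3 = s3fs.S3FileSystem()
with s3.open("my-bucket/some.log", "ab") as f:  # placeholders
    f.write(b"appended line\n")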
