cleanup in TempfileManager can fail on cluster filesystem #100

Closed
rdenham opened this issue Jun 27, 2024 · 1 comment · Fixed by #106

Comments

@rdenham

rdenham commented Jun 27, 2024

A follow up to #93.

In rios > 2, there is a cleanup step that removes the temporary files and the temporary directory they are stored in. On our Panasas filesystem, this can fail because a .panfs* file is still present in the temporary directory when the directory removal is attempted.
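To make the failure mode concrete, here is a minimal standard-library sketch. It is not RIOS's actual TempfileManager code; the function name `remove_tempdir` and its `retries`/`delay` parameters are hypothetical. It removes the tracked temporary files, then retries `os.rmdir` while the directory is reported as non-empty. If a leftover file never goes away (simulated here by a hypothetical `.panfs_simulated` file), it returns False instead of raising.

```python
import errno
import os
import tempfile
import time


def remove_tempdir(tempdir, filenames, retries=5, delay=0.5):
    """Hypothetical cleanup helper (not the RIOS implementation).

    Deletes the files we created, then tries to remove the directory,
    retrying while it is still non-empty. Returns True if the directory
    was removed, False if it had to be left in place.
    """
    for fn in filenames:
        try:
            os.remove(fn)
        except FileNotFoundError:
            pass  # already gone, nothing to do
    # retries is assumed to be >= 1
    for _ in range(retries):
        try:
            os.rmdir(tempdir)
            return True
        except OSError as e:
            # ENOTEMPTY (EEXIST on some systems): something we did not
            # create, e.g. a filesystem-internal file, is still present
            if e.errno not in (errno.ENOTEMPTY, errno.EEXIST):
                raise
            time.sleep(delay)
    return False


if __name__ == "__main__":
    d = tempfile.mkdtemp()
    tracked = [os.path.join(d, "block1.tmp")]
    open(tracked[0], "w").close()
    # Simulate a stray file left behind by the filesystem
    stray = os.path.join(d, ".panfs_simulated")
    open(stray, "w").close()
    print(remove_tempdir(d, tracked, retries=2, delay=0.01))  # False
    os.remove(stray)
    print(remove_tempdir(d, tracked, retries=2, delay=0.01))  # True
```

The helper deliberately does not delete files it did not create, so a filesystem-internal file is left alone and the caller decides whether a leftover directory is an error.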

Some testing shows that, in most situations, creating and removing temporary directories from Python on this filesystem works without problems. The error only occurs with the default concurrency settings, i.e. using:

from rios import applier

controls = applier.ApplierControls()

# Default concurrency settings, under which the cleanup failure occurs
conc = applier.ConcurrencyStyle(
    numReadWorkers=0,
    numComputeWorkers=0,
    computeWorkerKind='CW_NONE',
    computeWorkersRead=False,
    singleBlockComputeWorkers=False,
    haveSharedTemp=True,
    readBufferInsertTimeout=10,
    readBufferPopTimeout=10,
    computeBufferInsertTimeout=10,
    computeBufferPopTimeout=20,
    computeBarrierTimeout=600,
)

controls.setConcurrencyStyle(conc)

Changing numReadWorkers to 1 (or any positive integer) seems to prevent this from occurring. There is also no problem if controls.setTempdir is used to point to a non-cluster filesystem.

I'm not familiar enough with the workings of the concurrency model to debug this further, but I'm happy to help where I can.

@neilflood
Member

As mentioned by email, I think this problem is already solved by the changes in #98 and #99. Let me know if either you or @badmatitude are able to confirm this with real tests (I am working only from theoretical knowledge and speculation about Panasas).
