Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(dsp): prevent cache instantiation when cache is disabled #839

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

stevenhoelscher
Copy link

The environment variable DSP_CACHEBOOL does not completely disable the filesystem cache.

As a result, joblib.Memory is still instantiated, which will try to create a directory on the filesystem.

In an environment where a user does not have write access to the DSP_CACHEDIR, this will throw a surprising exception despite DSP_CACHEBOOL being set to false.

File "/var/lang/lib/python3.11/site-packages/dsp/modules/cache_utils.py", line 27, in <module>
    CacheMemory = Memory(location=cachedir, verbose=0)
  File "/var/lang/lib/python3.11/site-packages/joblib/memory.py", line 1020, in __init__
    self.store_backend = _store_backend_factory(
  File "/var/lang/lib/python3.11/site-packages/joblib/memory.py", line 132, in _store_backend_factory
    obj.configure(location, verbose=verbose,
  File "/var/lang/lib/python3.11/site-packages/joblib/_store_backends.py", line 452, in configure
    mkdirp(self.location)
  File "/var/lang/lib/python3.11/site-packages/joblib/disk.py", line 61, in mkdirp
    os.makedirs(d)

CacheMemory.cache = noop_decorator

if cache_turn_on:
CacheMemory = Memory(location=cachedir, verbose=0)

cachedir2 = os.environ.get('DSP_NOTEBOOK_CACHEDIR')
NotebookCacheMemory = dotdict()
NotebookCacheMemory.cache = noop_decorator

if cachedir2:
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this could be updated to

if cache_turn_on and cachedir2:

Copy link
Author

@stevenhoelscher stevenhoelscher Apr 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not obvious what DSP_NOTEBOOK_CACHEDIR denotes. Because it seems to assign the NotebookCacheMemory a fs cache but then walks that back if DSP_CACHEBOOL is false.

Also note that if DSP_CACHEBOOL is true, but DSP_NOTEBOOK_CACHEDIR is not set, then the cache won't work because of joblib.Memory defaults:

If None is given, no caching is done and the Memory object is completely transparent.

https://joblib.readthedocs.io/en/latest/generated/joblib.Memory.html

@arnavsinghvi11
Copy link
Collaborator

Thanks @stevenhoelscher for the PR! This looks good and maintains the cache correctly. Is the PR ready to merge or pending the change you've mentioned here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants