Discuss greedy/prefetch caching behaviour #222

Open
khaeru opened this issue Nov 25, 2019 · 0 comments
khaeru commented Nov 25, 2019

#213 added CachingBackend, a Backend subclass, and adjusted JDBCBackend to use it.

  1. Caching is always on in JDBCBackend.
  2. When an item is loaded with filters, e.g. scen.par('demand', filters=dict(year=[1234])), then:
    a. Only the filtered rows are retrieved from Java/ixmp_source.
    b. The pd.DataFrame containing these rows is cached.
    c. The cache is used for future scen.par() calls with identical filters.
  3. When an item is loaded with different filters, a different pd.DataFrame is cached.
    a. For instance, scen.par('demand', filters=dict(year=[1234, 4321])) results in a distinct value in the cache.
  4. When an item is loaded without filters, the entire item is cached.
  5. If an entire item is cached, subsequent filtered requests (like 2. or 3.) are met by:
    a. Taking the cached pd.DataFrame for the entire item.
    b. Filtering it in Python, and
    c. Returning it.
    …i.e. without database access.
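The behaviour in points 1–5 can be sketched roughly as follows. This is a minimal illustration with hypothetical names (FilterKeyedCache, loader), not the actual CachingBackend implementation:

```python
import pandas as pd

class FilterKeyedCache:
    """Sketch: cache pd.DataFrames keyed by (item name, filters)."""

    def __init__(self, loader):
        self._loader = loader  # callable fetching rows from the database
        self._cache = {}

    @staticmethod
    def _key(name, filters):
        # Normalize filters into a hashable key; None means "entire item"
        if not filters:
            return (name, None)
        return (name, tuple(sorted((k, tuple(v)) for k, v in filters.items())))

    def get(self, name, filters=None):
        key = self._key(name, filters)
        if key in self._cache:
            # Identical filters as a previous call: cache hit (point 2.c)
            return self._cache[key]
        full_key = (name, None)
        if filters and full_key in self._cache:
            # Entire item already cached: filter in Python, no database
            # access (points 5.a-c)
            df = self._cache[full_key]
            mask = pd.Series(True, index=df.index)
            for col, values in filters.items():
                mask &= df[col].isin(values)
            return df[mask]
        # Otherwise retrieve (only the filtered rows, or the whole item)
        # and cache the result (points 2.a-b, 3, 4)
        df = self._cache[key] = self._loader(name, filters)
        return df
```

Distinct filter values produce distinct cache entries, while an unfiltered load caches the whole item and satisfies later filtered requests without touching the database.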

A different caching behaviour could be called greedy or prefetch caching. In this case, to meet 2. or 3., the entire contents of the item (all rows) are first cached (as in 4.), and then filtered on the Python side (as in 5.a–c).
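A filtered request under greedy/prefetch caching would be handled roughly like this (a hypothetical sketch; the function and argument names are illustrative only):

```python
import pandas as pd

def get_with_prefetch(cache, loader, name, filters=None):
    """Greedy/prefetch caching: always fetch and cache the entire item
    first, then filter on the Python side."""
    if name not in cache:
        cache[name] = loader(name)  # entire item, as in step 4
    df = cache[name]
    if filters:
        # Filter in Python, as in steps 5.a-c; no database access
        for col, values in filters.items():
            df = df[df[col].isin(values)]
    return df
```

Under this scheme, the first call pays the full cost of loading all rows, but every subsequent call for the same item, with any filters, avoids the database entirely.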

This issue is to discuss:

  • What should the default behaviour be: prefetch on, or off?
  • How should the behaviour be controlled?
    • E.g. a prefetch=True keyword argument to Platform, passed on to JDBCBackend;
    • or a method or attribute on JDBCBackend that changes the behaviour.
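The two control options above might look like the following. This is purely illustrative; a prefetch keyword or attribute does not currently exist on JDBCBackend:

```python
class JDBCBackendSketch:
    """Sketch of how JDBCBackend could expose a prefetch toggle
    (hypothetical API, for discussion only)."""

    def __init__(self, prefetch=False):
        # Option 1: set once via a keyword argument, e.g. forwarded
        # from Platform(...)
        self.prefetch = prefetch

# Option 1: enabled at construction time
backend = JDBCBackendSketch(prefetch=True)

# Option 2: toggled later via the attribute
backend.prefetch = False
```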

Consider different use cases:

  • Alice is making a small set of changes to a very large MESSAGE parameter like land_out. If prefetch is on, her code loads the entire item, making it slower than it could be.
  • Bob is working with an extensive script that performs adjustments to the entire parameter; his code works faster if the whole item is prefetched and then repeatedly filtered in Python.