
Less cluster memory #338

Closed
wants to merge 11 commits

Conversation

@fjetter (Member) commented Sep 16, 2022

This runs the array workloads that accept cluster memory as an input parameter, parametrized for a low-memory and a high-memory situation.

The tests themselves sometimes still multiply this value, but I didn't want to modify test logic. Further down the road it will surely be an interesting exercise to increase this value to >1.0 as well, but for now we're mostly concerned with getting the low-memory cases right.

For future benchmark development I believe the pattern of adjusting the data size as a function of cluster memory is good practice that lets us scale these tests very easily.
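
A rough sketch of that pattern (a small_client fixture providing a distributed.Client is assumed; get_cluster_memory and memory_multiplier are illustrative names, not necessarily the exact code in this PR):

import distributed
import pytest


def get_cluster_memory(client: distributed.Client) -> int:
    """Total memory available on the cluster, in bytes."""
    return int(
        sum(w["memory_limit"] for w in client.scheduler_info()["workers"].values())
    )


@pytest.fixture(
    params=[
        pytest.param(0.3, id="30% cluster memory"),
        pytest.param(1.0, id="100% cluster memory"),
    ]
)
def memory_multiplier(request):
    return request.param


@pytest.fixture
def cluster_memory(small_client, memory_multiplier) -> int:
    # Tests size their data off this value, so each workload automatically gets
    # a low-memory and a high-memory variant.
    return int(get_cluster_memory(small_client) * memory_multiplier)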

@fjetter (Member, Author) commented Sep 16, 2022

Failure on the Windows stability test: https://github.com/coiled/platform/issues/91

@fjetter (Member, Author) commented Sep 16, 2022

The failure on Ubuntu 3.9 upstream is #336.

I have no idea how this change could introduce regressions. The one test I touched that is flagged as a regression is test_dataframe_align, but the regression is detected for coiled-runtime 0.0.4.

I am surprised that this even kicks in. Given the parametrization, I'm wondering how the script correlates the new changes with the old ones. Any idea @ian-r-rose?

@fjetter (Member, Author) commented Sep 16, 2022

The visualization on this is a bit unfortunate. It could be fixed with a DB migration, but I'm not sure that's worth it.

def upgrade() -> None:
    op.execute(
        """update test_run
        set name = name || '[100% cluster memory]'
@fjetter (Member, Author):

@ian-r-rose I'm not entirely sure what name and originalname are. Is it correct for me to just modify name?

Contributor:

originalname is the name of the actual function, whereas name is the full name with any parametrizations tacked on (this is pytest terminology). So the structure of this migration looks correct to me.
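
For illustration, a parametrized run of one of these benchmarks would look roughly like this (values are hypothetical):

# Hypothetical pytest attributes for a parametrized array benchmark; the
# migration appends the default "[100% cluster memory]" id to historical names
# so old runs keep matching the new, parametrized name.
originalname = "test_anom_mean"
name = "test_anom_mean[100% cluster memory]"
assert name == originalname + "[100% cluster memory]"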


conftest.py Outdated
Comment on lines 438 to 442
def _cluster_memory(client: distributed.Client) -> int:
    "Total memory available on the cluster, in bytes"
    return int(
        sum(w["memory_limit"] for w in client.scheduler_info()["workers"].values())
    )
Contributor:

Seems unfortunate to delete the utility function such that it's inaccessible to tests that might want to do something different with the cluster memory measurement.

@fjetter (Member, Author):

Right, I'll just rename it to get_cluster_memory.

@fjetter (Member, Author) commented Sep 16, 2022

"proof" that the migration script works

@ian-r-rose (Contributor):

I am surprised that this even kicks in. Given the parametrization, I'm wondering how the script correlates the new changes with the old ones. Any idea @ian-r-rose?

It looks like you've worked this out, but for posterity, the correlation with older runs is based on matching the name (not originalname, which does not include parameterizations) and the runtime version.

@ian-r-rose (Contributor):

"proof" that the migration script works

Nice

conftest.py Outdated

@pytest.fixture(
    params=[
        pytest.param(0.3, id="30% cluster memory"),
@fjetter (Member, Author):

30% is just a guess. I'm looking for something that "comfortably fits into memory". I figured that if a test actually generates that much data, we could hold two full copies before starting to spill.

This will vary for every test, though, since most tests still multiply the cluster memory by a certain factor.
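
For example, a test built on this fixture might still apply its own factor on top of the parametrized value (a sketch; the small_client fixture, the 0.5 multiplier, and the row-based shape computation are illustrative and sidestep the repo's scaled_array_shape helper):

import dask.array as da
from dask.utils import parse_bytes


def test_example_workload(small_client, cluster_memory):
    # The test applies its own scaling, so "30% cluster memory" is not the
    # literal size of the generated data.
    target_nbytes = int(cluster_memory * 0.5)
    chunk_cols = parse_bytes("10MiB") // 8  # ~10 MiB of float64 per row
    rows = max(1, target_nbytes // parse_bytes("10MiB"))
    data = da.random.random((rows, chunk_cols), chunks=(1, chunk_cols))
    data.sum().compute()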

@fjetter (Member, Author) commented Sep 16, 2022

@ian-r-rose (or anybody else): if you are happy with this, please go ahead and merge, since I'm likely already out before CI finishes, but I think we should get this in before the weekend.

@jrbourbeau (Member) left a comment:

I've only taken a quick look at this, but I like the idea of quickly being able to say "run this test with small, large, etc. datasets compared to the cluster memory" 👍

@gjoseph92 (Contributor) left a comment:

I like the idea of quickly being able to say "run this test with small, large, etc. datasets compared to the cluster memory"

I like this idea as well, but are we sure it's valuable to do automatically in every case? I appreciate setting something up like this so we can use it conveniently in the future, but does it add a valuable enough signal right now to be worth increasing test runtime this much? I could see merging this, but having only a single parametrization of memory_factor on main.
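
One hedged way to get that behaviour would be to make the extra parameter opt-in, e.g. via an environment variable (the variable name here is made up for illustration):

import os

import pytest

# Hypothetical opt-in switch; by default only the 100% case runs.
RUN_ALL_MEMORY_FACTORS = os.environ.get("RUN_ALL_MEMORY_FACTORS") == "true"


@pytest.fixture(
    params=[
        pytest.param(1.0, id="100% cluster memory"),
        pytest.param(
            0.3,
            id="30% cluster memory",
            marks=pytest.mark.skipif(
                not RUN_ALL_MEMORY_FACTORS,
                reason="extra memory parametrizations are opt-in",
            ),
        ),
    ]
)
def memory_factor(request):
    return request.param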

# From https://github.com/dask/distributed/issues/2602#issuecomment-535009454
if cluster_memory == 1.0:
Contributor:

Why would it ever be 1.0? The actual cluster memory would have to be 1 byte for that to happen. I assume you want to detect the not-30% case, but because you only have access to the (faked) size in bytes, not the factor, that's tricky to do. (See above for discussion.)

@fjetter (Member, Author):

I think this originates from an earlier iteration. I will need to fix that.

data = da.random.random(
    scaled_array_shape(target_nbytes, ("x", "10MiB")),
    chunks=(1, parse_bytes("10MiB") // 8),
)
print_size_info(memory, target_nbytes, data)
print_size_info(cluster_memory, target_nbytes, data)
Contributor:

nit: I don't love implementing this by lying to the test about the amount of cluster memory. It makes this logging of size info wrong, for one. It also means that tests that do want to adapt to cluster memory, but don't make sense to run on a variety of parameterizations (maybe they're meant to test spilling, or they know they won't work on larger datasets [see comment below]), are difficult to write.

Instead, why not have memory_factor be the fixture, and multiply within the test when appropriate? A little more code, but also more explicit and readable.
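
A minimal sketch of that alternative, assuming a get_cluster_memory helper and a small_client fixture (the 0.5 per-test scaling is illustrative):

import pytest


@pytest.fixture(
    params=[
        pytest.param(0.3, id="30% cluster memory"),
        pytest.param(1.0, id="100% cluster memory"),
    ]
)
def memory_factor(request):
    return request.param


def test_anom_mean_like(small_client, memory_factor):
    # The test sees the real cluster memory and decides how to apply the factor,
    # so size-info logging and spill-sensitive tests stay accurate.
    cluster_memory = get_cluster_memory(small_client)
    target_nbytes = int(cluster_memory * 0.5 * memory_factor)
    ...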

Contributor:

+1

@hayesgb hayesgb mentioned this pull request Oct 4, 2022
@hayesgb (Contributor) left a comment:

My only question here is whether it makes sense to use the same data sizes for all of the array tests.

Was there a specific reason the designated multiples were chosen for each test? I think @gjoseph92 may have better context. See #410 as an alternative.

@gjoseph92 (Contributor):

Was there a specific reason the designated multiples were chosen for each test

Absolutely. These multiples are based on actual workloads reported by actual users, relative to the size of their actual cluster.

As an example, test_anom_mean came from dask/distributed#2602 (comment), which was sometimes unrunnable using a 75 GiB dataset on a 200 GiB cluster (see dask/distributed#2602 (comment)). Increasing cluster size significantly would make it runnable. So this test was tuned to trigger the real-world case of this workload with data ~half the size of the cluster, which presumably used to trigger spilling and then failure. Making the data significantly smaller means we'd expect it to finish even if scheduling was horrendous. Making the data significantly bigger means I'm not sure we could reasonably expect it to be runnable.

So adding a multiple to every test's data size might kinda make sense, but we shouldn't just change how all the data sizes are already set up relative to cluster memory; just apply the multiple at the end.
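
In other words, a sketch of "apply the multiple at the end" (names are illustrative):

def target_bytes(cluster_memory: int, workload_fraction: float, memory_factor: float) -> int:
    # workload_fraction is the test's existing, user-derived data-to-cluster ratio
    # (e.g. roughly 0.5 for the anom_mean workload); memory_factor is the new knob,
    # applied at the very end instead of faking the cluster memory itself.
    return int(cluster_memory * workload_fraction * memory_factor)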

@fjetter fjetter closed this Jan 12, 2023