adjust some gc heuristics #32556

Merged
merged 1 commit into from
Jul 16, 2019

JeffBezanson
Sponsor Member

An attempt to do something about #32472 and #28986 (also helps #29740 a bit).

In #28986, we basically see a live size alternating between x and 2x, and each x->2x transition was triggering an unnecessary full collection. What I'd like to do instead is full collect if (1) the heap has grown a lot since the last full collection, and (2) minor collections are not succeeding at reducing it. I tried to implement that by monitoring the live_bytes variable and looking for growth that lasts a couple minor collections.
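The two conditions above can be sketched roughly as follows. This is an illustration only, not the code from this PR: apart from `live_bytes`, every name here is hypothetical, and the 1.5x growth threshold and the streak of two minor collections are assumed stand-ins for "grown a lot" and "a couple".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bookkeeping; only live_bytes is a name taken from the PR text. */
typedef struct {
    int64_t live_bytes;           /* live heap size after the most recent collection */
    int64_t last_full_live_bytes; /* live size recorded at the last full collection */
    int     grown_minor_count;    /* consecutive minor collections that ended "grown" */
} gc_heuristic_t;

enum {
    GROWTH_NUM   = 3,  /* assumed threshold for "grown a lot": 3/2 = 1.5x */
    GROWTH_DEN   = 2,
    MINOR_STREAK = 2   /* assumed meaning of "a couple" of minor collections */
};

/* Call after each minor collection with the resulting live size.
   Returns true when the next collection should be a full one. */
bool want_full_collection(gc_heuristic_t *h, int64_t live_after_minor)
{
    assert(h != NULL);
    h->live_bytes = live_after_minor;
    /* Condition (1): the heap is well above its size at the last full collection. */
    bool grown = live_after_minor * GROWTH_DEN > h->last_full_live_bytes * GROWTH_NUM;
    /* Condition (2): minor collections keep failing to bring it back down. */
    if (grown)
        h->grown_minor_count++;
    else
        h->grown_minor_count = 0;
    return h->grown_minor_count >= MINOR_STREAK;
}

/* Call after a full collection to reset the baseline. */
void note_full_collection(gc_heuristic_t *h, int64_t live_after_full)
{
    assert(h != NULL);
    h->live_bytes = live_after_full;
    h->last_full_live_bytes = live_after_full;
    h->grown_minor_count = 0;
}
```

Under these assumptions, a live size that alternates between x and 2x resets the streak on every drop and never requests a full collection, while growth that persists across two minor collections does.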

In #32472 we have very long pause times (for both major and minor, some nearly 500ms on my system), probably due to marking large numbers of Tasks. We have been keeping the collection interval a fairly small constant as long as minor collections are succeeding at freeing all new objects. That makes sense if minor collections are basically free, but if they get slow we need to back off the interval. I've been trying to do something timing-based, but it gets fairly unpredictable and complex. A simple approach I found was to allow the minor collect interval to be some fraction (half) of live_bytes. In many workloads, the collect interval is actually much larger than the number of live bytes, so allowing it to scale this way seems reasonably conservative to me.
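A minimal sketch of the interval scaling described above, assuming the scaled interval acts as a floor alongside `default_collect_interval` (the text does not say exactly how the two are combined). The function name and the constant's value are placeholders for illustration, not Julia's actual definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder value for illustration only; not Julia's actual default. */
static const int64_t default_collect_interval = 32 * 1024 * 1024;

/* Hypothetical helper: bytes that may be allocated before the next
   minor collection, given the current number of live bytes. */
int64_t next_collect_interval(int64_t live_bytes)
{
    assert(live_bytes >= 0);
    /* Let the interval grow to half of the live heap ... */
    int64_t scaled = live_bytes / 2;
    /* ... but never drop below the default (an assumption of this sketch). */
    return scaled > default_collect_interval ? scaled : default_collect_interval;
}
```

With these placeholder numbers, a 1 GiB live heap would yield a 512 MiB interval, while a small heap keeps the default. The effect is that slow minor collections over a large heap run less often.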

Finally, for some reason we were resetting the interval to default_collect_interval / 2. Does anybody know why? I traced that to 7c8acce (5 years ago). I mean, it's called default_collect_interval, not twice_default_collect_interval.

- trigger a full collection if live size grows a lot and stays there
- use a larger minor collect interval based on live_bytes

helps #32472 and #28986
@JeffBezanson added the "performance (Must go faster)" and "GC (Garbage collector)" labels on Jul 11, 2019
@JeffBezanson
Sponsor Member Author

@nanosoldier runbenchmarks(ALL, vs=":master")

@StefanKarpinski
Sponsor Member

I imagine that @yuyichao might be a good person to review if he's willing.

@JeffBezanson
Sponsor Member Author

Indeed. The reviewers box isn't letting me select him, which is maybe some sort of github problem?

@StefanKarpinski
Sponsor Member

Yeah, I'm not sure, his user name also never autocompletes, so maybe some privacy setting?

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@vtjnash
Sponsor Member

vtjnash commented Jul 12, 2019

Nice, it looks like there’s a chance you might have made some of the “problems” faster without affecting much else!

@JeffBezanson
Sponsor Member Author

I'll merge this for now; can certainly revert or revise if any objections or problems arise.

@JeffBezanson merged commit f578ada into master on Jul 16, 2019
@JeffBezanson deleted the jb/gcinterval branch on July 16, 2019 03:26