
Bump up MAX_HP_FOR_GC from 1GB to 3GB #2848

Merged: 4 commits into dfinity:master on Nov 1, 2021

Conversation

@ulan (Contributor) commented Oct 21, 2021

I think the plan was to get the initial version in with 1GB, then bump it up to 3GB, and then implement a more advanced heuristic.

The motivation: there is a spike of messages that dirty between 256MB and 512MB of memory. One theory is that these messages are from Motoko canisters that crossed the 1GB threshold.

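A minimal sketch of the kind of scheduling rule under discussion. The constant names, the growth rule used below the ceiling, and the example numbers are assumptions for illustration, not the actual motoko-rts code; only the behaviour at the ceiling (GC after every update message once the heap passes MAX_HP_FOR_GC) is taken from this thread.

```rust
// Illustrative sketch only -- not the actual motoko-rts scheduling code.
const KB: usize = 1024;
const MB: usize = 1024 * KB;
const GB: usize = 1024 * MB;

/// Ceiling above which a GC is scheduled after every update message;
/// this PR bumps it from 1 GiB to 3 GiB.
const MAX_HP_FOR_GC: usize = 3 * GB;

/// Hypothetical growth-based rule used below the ceiling (assumption).
const GROWTH_FACTOR_PERCENT: usize = 150;

/// Decide whether to run the GC at the end of an update message.
fn should_gc(heap_size: usize, heap_size_at_last_gc: usize) -> bool {
    if heap_size >= MAX_HP_FOR_GC {
        // Past the ceiling: always collect, so the next message still
        // has some allocation area left in the 4 GiB address space.
        return true;
    }
    // Below the ceiling: collect only if the heap has grown "enough"
    // since the last collection (the exact rule is an assumption here).
    heap_size * 100 >= heap_size_at_last_gc * GROWTH_FACTOR_PERCENT
}

fn main() {
    // With the old 1 GiB ceiling a 1.2 GiB heap would collect after
    // every message; with a 3 GiB ceiling it collects only on growth.
    assert!(should_gc(3 * GB + MB, 3 * GB));
    assert!(!should_gc(GB + 200 * MB, GB + 100 * MB));
    println!("ok");
}
```
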
@ulan ulan requested review from osa1 and crusso October 21, 2021 20:03
@crusso (Contributor) commented Oct 21, 2021

I'll let @osa1 approve this one - unless you are looking for a rubber stamp.

@osa1 (Contributor) commented Oct 22, 2021

Canisters that allocate at a slow rate but allocate large amounts every once in a while rely on this parameter to have enough allocation area for the calls with large allocations. A larger MAX_HP_FOR_GC will make it more likely for those canisters to get stuck. I have no objections if we're OK with this.

Perhaps in the long run it would make sense to allow configuring these parameters. We could even do it at runtime, maybe with a compiler-generated upgrade method (which would also help with getting a canister unstuck), or provide a prim and let the user define the endpoint if they feel the need.

Since a larger MAX_HP_FOR_GC is more risky (may cause some canisters to get stuck), I wonder if we should think about possible recovery strategies before merging?
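A hypothetical sketch of the "configurable at runtime" idea from the comment above. Nothing like this exists in the RTS today; the names and the atomic-variable approach are made up for illustration of how a compiler-generated method or prim could update the threshold that the post-message GC check reads.

```rust
// Hypothetical sketch only -- not existing motoko-rts code.
use std::sync::atomic::{AtomicUsize, Ordering};

const GIB: usize = 1 << 30;

// Default matches the value this PR proposes.
static MAX_HP_FOR_GC: AtomicUsize = AtomicUsize::new(3 * GIB);

/// Would be called from a (hypothetical) compiler-generated endpoint,
/// e.g. to lower the threshold and un-stick a canister before upgrading.
fn set_max_hp_for_gc(bytes: usize) {
    MAX_HP_FOR_GC.store(bytes, Ordering::Relaxed);
}

/// The post-message check reads the current (possibly updated) value.
fn should_force_gc(heap_size: usize) -> bool {
    heap_size >= MAX_HP_FOR_GC.load(Ordering::Relaxed)
}

fn main() {
    assert!(!should_force_gc(2 * GIB));
    set_max_hp_for_gc(GIB); // operator dials the ceiling back down
    assert!(should_force_gc(2 * GIB));
    println!("ok");
}
```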

The motivation: there is a spike of messages that dirty between 256MB and 512MB of memory. One theory is that these messages are from Motoko canisters that crossed the 1GB threshold.

Is the spike for just one message? If a canister passes 1 GiB threshold and dirties 256-512 MiB of memory, that means half of the space is reclaimed, and for the next few messages it shouldn't do GC (assuming it allocates at the same rate as before).

@osa1 (Contributor) commented Oct 22, 2021

(Restarted failing CI job)

@ulan (Contributor, Author) commented Oct 22, 2021

Canisters that allocate at a slow rate but allocate large amounts every once in a while rely on this parameter to have enough allocation area for the calls with large allocations. A larger MAX_HP_FOR_GC will make it more likely for those canisters to get stuck. I have no objections if we're OK with this.

I didn't get why the canister would get stuck depending on the value of this parameter. Don't we force a GC when allocation fails anyway, independent of the GC schedule?

@ulan (Contributor, Author) commented Oct 22, 2021

Is the spike for just one message? If a canister passes 1 GiB threshold and dirties 256-512 MiB of memory, that means half of the space is reclaimed, and for the next few messages it shouldn't do GC (assuming it allocates at the same rate as before).

I am not sure. If all objects survive, would we dirty 0 pages or all pages? I wonder if most of the objects at the beginning are surviving and only the objects at the end are moved.

@osa1 (Contributor) commented Oct 22, 2021

I didn't get why the canister would get stuck depending on the value of this parameter. Don't we force a GC when allocation fails anyway, independent of the GC schedule?

We only do GC after an update message.

@osa1 (Contributor) commented Oct 22, 2021

If all objects survive, would we dirty 0 pages or all pages?

  • With copying GC: all pages
  • With mark-compact: only a few pages for the mark stack and bitmap
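Back-of-the-envelope arithmetic for the all-objects-survive case, assuming 64 KiB Wasm pages and 4-byte (wasm32) heap words, with the one-bit-per-word mark bitmap mentioned later in the thread. The point is the relative difference: the mark-compact write traffic is tiny compared to copying. Illustrative only, not measured data.

```rust
// Illustrative arithmetic, not RTS code or benchmark results.
const WASM_PAGE: u64 = 64 * 1024;
const GIB: u64 = 1 << 30;
const MIB: u64 = 1 << 20;

fn pages(bytes: u64) -> u64 {
    (bytes + WASM_PAGE - 1) / WASM_PAGE
}

fn main() {
    let heap = GIB; // example heap in which every object survives

    // Copying GC: the whole heap is written into to-space and then
    // copied back into from-space, so roughly 2x the heap is dirtied.
    let copying = pages(2 * heap);

    // Mark-compact with nothing to move: mostly the mark bitmap
    // (one bit per 4-byte word) is written, plus a small mark stack
    // (ignored here).
    let bitmap = heap / 4 / 8;
    let compacting = pages(bitmap);

    println!("copying:      ~{} pages dirtied ({} MiB)", copying, 2 * heap / MIB);
    println!("mark-compact: ~{} pages dirtied ({} MiB)", compacting, bitmap / MIB);
}
```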

@ulan (Contributor, Author) commented Oct 22, 2021

I didn't get why would the canister get stuck depending the value of this parameter? Don't we force GC when allocation fails anyway independent of the GC schedule?

We only do GC after an upgrade message.

I see, thanks! Is it due to missing implementation or was it intentionally removed? In general it seems very important to do GC if allocation fails, because the GC schedule cannot be perfect.
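A minimal sketch of the allocation-time fallback being asked about here; this is not how the RTS works today (GC only runs between update messages). As a later comment notes, adding this amounts to a large "stack support" project, presumably because collecting in the middle of a message needs the GC roots held on the Wasm stack.

```rust
// Hypothetical sketch of a GC-on-allocation-failure fallback.
const GIB: u64 = 1 << 30;
const HEAP_LIMIT: u64 = 4 * GIB - 1; // roughly the wasm32 address space

struct Heap {
    used: u64,
}

impl Heap {
    /// Try to allocate; on failure, run a GC once and retry.
    fn alloc(&mut self, bytes: u64) -> Result<(), &'static str> {
        if self.used + bytes > HEAP_LIMIT {
            self.collect(); // would need the GC roots held on the Wasm stack
            if self.used + bytes > HEAP_LIMIT {
                return Err("out of memory");
            }
        }
        self.used += bytes;
        Ok(())
    }

    fn collect(&mut self) {
        // Placeholder: pretend half of the heap turns out to be garbage.
        self.used /= 2;
    }
}

fn main() {
    let mut heap = Heap { used: 3 * GIB };
    assert!(heap.alloc(2 * GIB).is_ok()); // succeeds only via the fallback GC
    println!("ok");
}
```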

@osa1 (Contributor) commented Oct 22, 2021

Is it due to missing implementation or was it intentionally removed?

It was never implemented. See also #2033.

@ulan (Contributor, Author) commented Oct 22, 2021

Is it due to missing implementation or was it intentionally removed?

It was never implemented. See also #2033.

Thanks. That's what I feared. Adding stack support is a large project.

Perhaps in the long run it would make sense to allow configuring these parameters. We could even do it at runtime, maybe with a compiler-generated upgrade method (which would also help with getting a canister unstuck), or provide a prim and let the user define the endpoint if they feel the need.

That sounds reasonable as a mitigation. Another idea: can we detect that we are running in the context of the pre/post upgrade hooks in Motoko? If so, we could define one hard limit for ordinary messages (e.g. 3GB) after which we trap with OOM, but increase the limit to 4GB when running in the upgrade hooks, so that the developer can get the canister unstuck.
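A hypothetical sketch of that idea; the names and the exact limits are assumptions, not existing RTS code: a lower hard limit for ordinary messages and a higher one inside pre/post-upgrade hooks, so a stuck canister can still be upgraded.

```rust
// Hypothetical sketch -- names and limits are illustrative assumptions.
const GIB: u64 = 1 << 30;
const MESSAGE_HEAP_LIMIT: u64 = 3 * GIB;     // ordinary update/query messages
const UPGRADE_HEAP_LIMIT: u64 = 4 * GIB - 1; // pre/post-upgrade hooks

/// Would an allocation that grows the heap to `new_heap_size` be allowed?
fn allocation_allowed(new_heap_size: u64, in_upgrade_hook: bool) -> bool {
    let limit = if in_upgrade_hook {
        UPGRADE_HEAP_LIMIT
    } else {
        MESSAGE_HEAP_LIMIT
    };
    new_heap_size <= limit // otherwise the canister would trap with OOM
}

fn main() {
    let heap = 3 * GIB + GIB / 2; // 3.5 GiB
    assert!(!allocation_allowed(heap, false)); // ordinary message: trap
    assert!(allocation_allowed(heap, true));   // upgrade hook: still allowed
    println!("ok");
}
```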

@nomeata (Collaborator) commented Oct 22, 2021

can we detect that we are running in the context of the pre/post upgrade hooks in Motoko?

Yes, see Lifecycle module in compile.ml.

@ggreif ggreif added the automerge-squash When ready, merge (using squash) label Oct 22, 2021
@crusso (Contributor) commented Oct 25, 2021

I didn't get why the canister would get stuck depending on the value of this parameter. Don't we force a GC when allocation fails anyway, independent of the GC schedule?

We only do GC after an upgrade message.

You mean update (not upgrade) message, right? Just in case someone got confused...

@osa1 (Contributor) commented Oct 25, 2021

You mean update (not upgrade) message, right? Just in case someone got confused...

Yes, sorry. Edited my original message.

@osa1 (Contributor) commented Oct 29, 2021

Created #2864 for the CI issue.

@mergify mergify bot merged commit c9d4d08 into dfinity:master Nov 1, 2021
@mergify mergify bot removed the automerge-squash When ready, merge (using squash) label Nov 1, 2021
@osa1 (Contributor) commented Nov 1, 2021

I wasn't aware that this was labeled as "automerge-squash". I think we should revert this, since it will make it more likely for canisters to get stuck, and we have no way of recovering such canisters.

Any thoughts on this? @crusso @ggreif @ulan

@osa1 (Contributor) commented Nov 1, 2021

@ulan and I discussed this. Here's my summary of the problem and what we want to do:

  • MAX_HP_FOR_GC = 1 GiB is too low. It is common for canisters to have more than 1 GiB of live data, and in those cases doing a GC after every update message causes performance problems (mainly on the replica, but it may use too many cycles too).

  • This PR increases MAX_HP_FOR_GC to 3 GiB, which makes it more likely for canisters to get stuck. With this change, when a canister reaches a 3 GiB heap it will have 1 GiB of allocation area for the next query message, and less than that for the next update message, because it will want to do a GC after the update message.

  • We are OK with giving canisters 1 GiB of allocation space when their heap size reaches 3 GiB. (I'm not sure what the reasoning here is; perhaps @ulan can say more.)

  • However, the current default GC (copying) copies all live data to a new space and then back. So with this change, a canister with a 3 GiB heap can have at most 1 GiB of live data (less, depending on how much it allocates in the last message; rough arithmetic is sketched after this comment). This case is probably not too uncommon, so we want to revert this PR for now.

One possible way forward here is making the compacting GC the default. With compacting GC, we need to allocate a bitmap (one bit per heap word), and a growable mark stack. This makes it less likely for canisters with 3 GiB heap size to get stuck.

Compacting GC uses more cycles (~30% more the last time I benchmarked it), but it halves the number of dirtied pages, and a large number of dirtied pages is what causes problems on the replica side. So we're OK with trading cycles for fewer dirtied pages.

Before making compacting GC the default we want to test it more. One idea is to backport random heap generation in #2706.

For now we revert this; we should re-land it together with the PR that makes compacting GC the default.
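Rough space accounting behind the "at most 1 GiB of live data" bullet above, assuming the 4 GiB wasm32 address space and ignoring everything besides the heap and the to-space copy. Illustrative arithmetic only, not RTS code.

```rust
// Illustrative arithmetic for the copying-GC headroom concern above.
const GIB: u64 = 1 << 30;
const ADDRESS_SPACE: u64 = 4 * GIB; // wasm32 limit

/// Copying GC needs room for a to-space holding the live data on top of
/// the existing heap: heap + live <= address space (roughly, ignoring
/// code, static data and the Wasm stack).
fn max_live_for_copying_gc(heap_size: u64) -> u64 {
    ADDRESS_SPACE.saturating_sub(heap_size)
}

fn main() {
    // Old ceiling: GC runs before the heap passes 1 GiB,
    // leaving ~3 GiB of headroom for the copy.
    println!("heap 1 GiB -> at most ~{} GiB live", max_live_for_copying_gc(GIB) / GIB);
    // New ceiling: GC may be deferred until 3 GiB, so a collection at
    // that point only succeeds if at most ~1 GiB is live.
    println!("heap 3 GiB -> at most ~{} GiB live", max_live_for_copying_gc(3 * GIB) / GIB);
}
```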

osa1 added a commit that referenced this pull request Nov 1, 2021
mergify bot pushed a commit that referenced this pull request Nov 1, 2021
@osa1 (Contributor) commented Nov 2, 2021

In addition to my comment above: we also discussed the actual costs of the GC algorithms for the platform. Ulan says one dirtied page costs the replica the equivalent of roughly 10,000 to 100,000 cycles. If we take that into account in GC benchmarks, I think compacting GC may be significantly faster than copying GC.

I don't have the raw data for the latest versions of the compacting and copying GCs, so I will have to repeat the benchmarks for this.
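A toy cost model combining the observations in this thread (~30% more executed cycles for compacting, ~half the dirtied pages, and 10k-100k cycles of replica cost per dirtied page). All concrete numbers below are placeholders, not benchmark data; the point is that the comparison can flip once dirtied pages are priced in.

```rust
// Toy cost model with placeholder numbers -- not benchmark data.
fn total_cost(exec_cycles: u64, dirtied_pages: u64, cycles_per_dirty_page: u64) -> u64 {
    exec_cycles + dirtied_pages * cycles_per_dirty_page
}

fn main() {
    // Hypothetical copying-GC run.
    let copy_exec: u64 = 1_000_000_000;
    let copy_pages: u64 = 32_768; // e.g. 2 GiB of 64 KiB pages

    // Compacting: ~30% more executed cycles, ~half the dirtied pages.
    let compact_exec = copy_exec * 130 / 100;
    let compact_pages = copy_pages / 2;

    for per_page in [10_000u64, 100_000] {
        println!(
            "per-page cost {:>6}: copying {:>13}, compacting {:>13}",
            per_page,
            total_cost(copy_exec, copy_pages, per_page),
            total_cost(compact_exec, compact_pages, per_page),
        );
    }
}
```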

@rossberg (Contributor) commented Nov 2, 2021

I agree that we should make compacting GC the default as soon as we think it's safe to do so.

In the meantime, would it make sense to bump max HP to 2GiB (or something slightly less)? Would that avoid the risk of getting stuck?

@nomeata (Collaborator) commented Nov 2, 2021

Compacting might be better than copying, but even that is not really a viable solution; didn't we conclude that only a non-moving incremental GC actually works on the IC? Has work in that direction started? Or at least, do we know which design to work towards?

@rossberg (Contributor) commented Nov 2, 2021

Yes, we definitely want an incremental generational collector, and Ömer is busy working towards that. I don't know that we concluded non-moving is required (or desired), but Ömer is currently implementing a page-based heap as a prerequisite for the next steps. With that, the GC would mostly operate on one page at a time in the end.

@nomeata (Collaborator) commented Nov 2, 2021

Ok, fair enough, incremental might be good enough even if moving.
