CDC resolver may cause TiKV OOM #15412
Labels
affects-4.0
This bug affects 4.0.x versions.
affects-5.0
This bug affects 5.0.x versions.
affects-5.1
This bug affects 5.1.x versions.
affects-5.2
This bug affects 5.2.x versions.
affects-5.3
This bug affects 5.3.x versions.
affects-5.4
affects-6.0
affects-6.1
affects-6.2
affects-6.3
affects-6.4
affects-6.5
affects-6.6
affects-7.0
affects-7.1
affects-7.2
affects-7.3
severity/major
type/bug
The issue is confirmed as a bug.
Comments
This was referenced Aug 29, 2023
overvenus
added
affects-4.0
This bug affects 4.0.x versions.
affects-5.0
This bug affects 5.0.x versions.
affects-5.1
This bug affects 5.1.x versions.
affects-5.2
This bug affects 5.2.x versions.
affects-5.3
This bug affects 5.3.x versions.
affects-5.4
affects-6.0
affects-6.1
affects-6.2
affects-6.3
affects-6.4
affects-6.5
affects-6.6
and removed
may-affects-5.2
may-affects-5.3
may-affects-5.4
may-affects-6.1
may-affects-6.5
may-affects-7.1
labels
Aug 31, 2023
ti-chi-bot
pushed a commit
to ti-chi-bot/tikv
that referenced
this issue
Sep 5, 2023
close tikv#15412 Signed-off-by: ti-chi-bot <[email protected]>
ti-chi-bot bot
added a commit
that referenced
this issue
Sep 7, 2023
ref #15412 MemoryQuota alloc API returns result, make it more ergonomic. Signed-off-by: Neil Shen <[email protected]> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
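To illustrate why returning a result from `alloc` is more ergonomic, here is a minimal sketch. It is not TiKV's actual `MemoryQuota`; the struct layout, the `MemoryQuotaExceeded` error type, and the `track_two` helper are assumptions made for illustration only.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical error type: the allocation would exceed the quota.
#[derive(Debug, PartialEq, Eq)]
pub struct MemoryQuotaExceeded;

/// Hypothetical, simplified quota. Not the real TiKV type.
pub struct MemoryQuota {
    capacity: usize,
    in_use: AtomicUsize,
}

impl MemoryQuota {
    pub fn new(capacity: usize) -> Self {
        MemoryQuota { capacity, in_use: AtomicUsize::new(0) }
    }

    pub fn in_use(&self) -> usize {
        self.in_use.load(Ordering::Relaxed)
    }

    /// Reserve `bytes`. On failure, usage is left unchanged.
    pub fn alloc(&self, bytes: usize) -> Result<(), MemoryQuotaExceeded> {
        let mut current = self.in_use.load(Ordering::Relaxed);
        loop {
            let new = match current.checked_add(bytes) {
                Some(n) if n <= self.capacity => n,
                _ => return Err(MemoryQuotaExceeded),
            };
            match self.in_use.compare_exchange_weak(
                current,
                new,
                Ordering::Relaxed,
                Ordering::Relaxed,
            ) {
                Ok(_) => return Ok(()),
                // Another thread changed the counter; retry with the fresh value.
                Err(actual) => current = actual,
            }
        }
    }

    /// Release `bytes`, saturating at zero.
    pub fn free(&self, bytes: usize) {
        let mut current = self.in_use.load(Ordering::Relaxed);
        loop {
            let new = current.saturating_sub(bytes);
            match self.in_use.compare_exchange_weak(
                current,
                new,
                Ordering::Relaxed,
                Ordering::Relaxed,
            ) {
                Ok(_) => return,
                Err(actual) => current = actual,
            }
        }
    }
}

/// With a `Result`, callers can propagate failures with `?`
/// instead of checking a boolean after every call.
pub fn track_two(quota: &MemoryQuota, a: usize, b: usize) -> Result<(), MemoryQuotaExceeded> {
    quota.alloc(a)?;
    // Roll back the first reservation if the second one fails.
    quota.alloc(b).map_err(|e| {
        quota.free(a);
        e
    })?;
    Ok(())
}
```

Compared with a `bool` return, the error value composes with `?` and `map_err`, which keeps the out-of-quota path explicit at each call site.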
ti-chi-bot
pushed a commit
to ti-chi-bot/tikv
that referenced
this issue
Sep 7, 2023
ref tikv#15412 Signed-off-by: ti-chi-bot <[email protected]>
overvenus
added a commit
to ti-chi-bot/tikv
that referenced
this issue
Nov 27, 2023
close tikv#15412 Similar to resolved-ts endpoint, cdc endpoint maintains resolvers for subscribed regions. These resolvers also need memory quota, otherwise they may cause OOM. This commit lets cdc endpoint deregister regions if they exceed memory quota. Signed-off-by: Neil Shen <[email protected]>
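The behaviour described in this commit message can be sketched roughly as follows. This is a simplified model, not the actual cdc endpoint code: `Endpoint`, `Delegate`, `TrackOutcome`, and the byte accounting by key length are all hypothetical names and assumptions.

```rust
use std::collections::HashMap;

/// Hypothetical per-region state: the lock keys its resolver tracks.
struct Delegate {
    locks: Vec<Vec<u8>>,
    bytes: usize,
}

/// Hypothetical endpoint with one memory budget shared by all regions.
pub struct Endpoint {
    capacity: usize,
    in_use: usize,
    delegates: HashMap<u64, Delegate>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum TrackOutcome {
    Tracked,
    /// The region was dropped because tracking would exceed the quota.
    Deregistered,
    NotSubscribed,
}

impl Endpoint {
    pub fn new(capacity: usize) -> Self {
        Endpoint { capacity, in_use: 0, delegates: HashMap::new() }
    }

    pub fn subscribe(&mut self, region_id: u64) {
        self.delegates
            .entry(region_id)
            .or_insert_with(|| Delegate { locks: Vec::new(), bytes: 0 });
    }

    pub fn is_subscribed(&self, region_id: u64) -> bool {
        self.delegates.contains_key(&region_id)
    }

    pub fn in_use(&self) -> usize {
        self.in_use
    }

    pub fn lock_count(&self, region_id: u64) -> usize {
        self.delegates.get(&region_id).map_or(0, |d| d.locks.len())
    }

    /// Track a lock for a region, deregistering the region instead of
    /// growing past the memory quota.
    pub fn track_lock(&mut self, region_id: u64, key: &[u8]) -> TrackOutcome {
        if !self.delegates.contains_key(&region_id) {
            return TrackOutcome::NotSubscribed;
        }
        if self.in_use + key.len() > self.capacity {
            // Over quota: drop the region and release everything it held.
            if let Some(removed) = self.delegates.remove(&region_id) {
                self.in_use -= removed.bytes;
            }
            return TrackOutcome::Deregistered;
        }
        let delegate = self
            .delegates
            .get_mut(&region_id)
            .expect("presence checked above");
        delegate.locks.push(key.to_vec());
        delegate.bytes += key.len();
        self.in_use += key.len();
        TrackOutcome::Tracked
    }
}
```

In this model a deregistered region releases all of its tracked memory at once, so other regions can keep making progress. The subscriber would have to subscribe to the region again, which is the trade-off the resolved-ts fix also accepts (more lock scanning in exchange for no OOM).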
overvenus
added a commit
to ti-chi-bot/tikv
that referenced
this issue
Nov 27, 2023
ref tikv#15412 MemoryQuota alloc API returns result, make it more ergonomic. Signed-off-by: Neil Shen <[email protected]> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
overvenus
added a commit
to ti-chi-bot/tikv
that referenced
this issue
Dec 8, 2023
close tikv#15412 Similar to resolved-ts endpoint, cdc endpoint maintains resolvers for subscribed regions. These resolvers also need memory quota, otherwise they may cause OOM. This commit lets cdc endpoint deregister regions if they exceed memory quota. Signed-off-by: Neil Shen <[email protected]>
overvenus
added a commit
to ti-chi-bot/tikv
that referenced
this issue
Dec 8, 2023
ref tikv#15412 MemoryQuota alloc API returns result, make it more ergonomic. Signed-off-by: Neil Shen <[email protected]> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ti-chi-bot bot
pushed a commit
that referenced
this issue
Dec 10, 2023
#15523 #15554 (#15465)

close #14864, ref #14864, ref #15412, close #15412, close #15553

This commit rolls up following patches:

*: add memory quota to resolved_ts::Resolver (#15400)
ref #14864
This is the first PR to fix OOM caused by Resolver tracking large txns. Resolver checks memory quota before tracking a lock, and returns false if it exceeds memory quota.
Signed-off-by: Neil Shen <[email protected]>
Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
Signed-off-by: Neil Shen <[email protected]>
---
resolved_ts: re-register region if memory quota exceeded (#15411)
close #14864
Fix resolved ts OOM caused by Resolver tracking large txns. `ObserveRegion` is deregistered if it exceeds memory quota. It may cause higher CPU usage because of scanning locks, but it's better than OOM.
Signed-off-by: Neil Shen <[email protected]>
Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
---
resolved_ts: track pending lock memory usage (#15452)
ref #14864
* Fix resolved ts OOM caused by adding large txns locks to `ResolverStatus`.
* Add initial scan backoff duration metrics.
Signed-off-by: Neil Shen <[email protected]>
Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
Co-authored-by: Connor <[email protected]>
---
cdc: deregister delegate if memory quota exceeded (#15486)
close #15412
Similar to resolved-ts endpoint, cdc endpoint maintains resolvers for subscribed regions. These resolvers also need memory quota, otherwise they may cause OOM. This commit lets cdc endpoint deregister regions if they exceed memory quota.
Signed-off-by: Neil Shen <[email protected]>
---
*: let alloc API return result (#15529)
ref #15412
MemoryQuota alloc API returns result, make it more ergonomic.
Signed-off-by: Neil Shen <[email protected]>
Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
---
resolved_ts: limit scanner memory usage (#15523)
ref #14864
* Break resolved ts scan entry into multiple tasks.
* Limit concurrent resolved ts scan tasks.
* Remove resolved ts dead code.
Signed-off-by: Neil Shen <[email protected]>
Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
---
resolved_ts: remove hash set to save memory (#15554)
close #15553
The Resolver uses a hash set to keep track of locks associated with the same timestamp. When the length of the hash set reaches zero, it indicates that the transaction has been fully committed. To save memory, we can replace the hash set with an integer.
Signed-off-by: Neil Shen <[email protected]>
Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
Signed-off-by: Neil Shen <[email protected]>
Signed-off-by: Neil Shen <[email protected]>
Co-authored-by: Neil Shen <[email protected]>
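The last patch in this roll-up (replace the per-timestamp hash set with an integer) can be sketched as below. This is only an illustration of the idea, not the real `resolved_ts::Resolver`: the field names and the assumption that a separate key-to-`start_ts` map already exists are hypothetical.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical, simplified resolver.
pub struct Resolver {
    /// key -> start_ts of the lock currently held on that key (assumed to exist).
    locks_by_key: HashMap<Vec<u8>, u64>,
    /// start_ts -> number of live locks. An integer replaces the hash set of
    /// keys, because only "has the count reached zero?" matters here.
    lock_count_by_ts: BTreeMap<u64, usize>,
}

impl Resolver {
    pub fn new() -> Self {
        Resolver {
            locks_by_key: HashMap::new(),
            lock_count_by_ts: BTreeMap::new(),
        }
    }

    pub fn track_lock(&mut self, start_ts: u64, key: Vec<u8>) {
        // If the key was already locked, release the old lock's count first.
        if let Some(old_ts) = self.locks_by_key.insert(key, start_ts) {
            self.decrement(old_ts);
        }
        *self.lock_count_by_ts.entry(start_ts).or_insert(0) += 1;
    }

    pub fn untrack_lock(&mut self, key: &[u8]) {
        if let Some(ts) = self.locks_by_key.remove(key) {
            self.decrement(ts);
        }
    }

    /// Smallest start_ts that still has live locks, if any.
    pub fn min_start_ts(&self) -> Option<u64> {
        self.lock_count_by_ts.keys().next().copied()
    }

    fn decrement(&mut self, ts: u64) {
        if let Some(count) = self.lock_count_by_ts.get_mut(&ts) {
            *count -= 1;
            if *count == 0 {
                // Count reached zero: no locks remain for this timestamp.
                self.lock_count_by_ts.remove(&ts);
            }
        }
    }
}
```

The per-timestamp memory cost in this sketch is one `usize` rather than a second copy of every key, which is the saving the patch description refers to.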
ti-chi-bot bot
pushed a commit
that referenced
this issue
Dec 27, 2023
#15523 and #15554 (#15464)

close #14864, ref #14864, close #15412, ref #15412, close #15553

This commit rolls up following patches:

*: add memory quota to resolved_ts::Resolver (#15400)
ref #14864
This is the first PR to fix OOM caused by Resolver tracking large txns. Resolver checks memory quota before tracking a lock, and returns false if it exceeds memory quota.
Signed-off-by: Neil Shen <[email protected]>
Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
Signed-off-by: Neil Shen <[email protected]>
---
resolved_ts: re-register region if memory quota exceeded (#15411)
close #14864
Fix resolved ts OOM caused by Resolver tracking large txns. `ObserveRegion` is deregistered if it exceeds memory quota. It may cause higher CPU usage because of scanning locks, but it's better than OOM.
Signed-off-by: Neil Shen <[email protected]>
Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
---
resolved_ts: track pending lock memory usage (#15452)
ref #14864
* Fix resolved ts OOM caused by adding large txns locks to `ResolverStatus`.
* Add initial scan backoff duration metrics.
Signed-off-by: Neil Shen <[email protected]>
Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
Co-authored-by: Connor <[email protected]>
---
cdc: deregister delegate if memory quota exceeded (#15486)
close #15412
Similar to resolved-ts endpoint, cdc endpoint maintains resolvers for subscribed regions. These resolvers also need memory quota, otherwise they may cause OOM. This commit lets cdc endpoint deregister regions if they exceed memory quota.
Signed-off-by: Neil Shen <[email protected]>
---
*: let alloc API return result (#15529)
ref #15412
MemoryQuota alloc API returns result, make it more ergonomic.
Signed-off-by: Neil Shen <[email protected]>
Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
---
resolved_ts: limit scanner memory usage (#15523)
ref #14864
* Break resolved ts scan entry into multiple tasks.
* Limit concurrent resolved ts scan tasks.
* Remove resolved ts dead code.
Signed-off-by: Neil Shen <[email protected]>
Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
---
resolved_ts: remove hash set to save memory (#15554)
close #15553
The Resolver uses a hash set to keep track of locks associated with the same timestamp. When the length of the hash set reaches zero, it indicates that the transaction has been fully committed. To save memory, we can replace the hash set with an integer.
Signed-off-by: Neil Shen <[email protected]>
Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
Signed-off-by: Neil Shen <[email protected]>
Signed-off-by: Neil Shen <[email protected]>
Co-authored-by: Neil Shen <[email protected]>
Bug Report
Similar to #14864, CDC has its own resolver, and it may cause OOM too.
What version of TiKV are you using?
> 4.0.0
Steps to reproduce
Run a large transaction whose size is much greater than the memory available to TiKV.
What did you expect?
No OOM.
What happened?
OOM.