services/horizon: Reap history object tables when ingestion is idle #4518

bartekn · 2022-08-07T17:22:28Z

PR Checklist

PR Structure

This PR has reasonably narrow scope (if not, break it down into smaller PRs).
This PR avoids mixing refactoring changes with feature changes (split into two PRs
otherwise).
This PR's title starts with name of package that is most changed in the PR, ex.
services/friendbot, or all or doc if the changes are broad or impact many
packages.

Thoroughness

This PR adds tests for the most critical parts of the new functionality or fixes.
I've updated any docs (developer docs, .md files, etc.) affected by this change. Take a look in the docs folder for a given service,
like this one.

Release planning

I've updated the relevant CHANGELOG (here for Horizon) if
needed with deprecations, added features, breaking changes, and DB schema changes.
I've decided if this PR requires a new major/minor version according to
semver, or if it's mainly a patch change. The PR is targeted at the next
release branch if it's not a patch change.

What

This commit adds code that clears orphaned rows in the historical lookup tables. Orphaned rows can appear when the reaper removes old data. The new code is separate from the existing reaper code (see "Alternative solutions" below) and runs after each ledger if there are no more ledgers to ingest in the backend. This has two advantages: it does not slow down catchup, and because it runs only when ingestion is idle it should not affect ingestion at all. To ensure performance is not affected, the ReapLookupTables method is called with a 5-second context timeout: if it does not finish its work in that time, it is simply cancelled.

The solution here requires the new indexes added in c2d52f0 (without them, finding the rows to delete is slow). For each lookup table, we count the occurrences of a given lookup ID in all the tables that reference the lookup table. If no occurrences are found, the row is removed from the lookup table.

Rows are removed in batches of 10000 rows (this can be changed in the future). The cursor is updated when a table is processed, so after the next ledger ingestion the next chunk of rows is checked. When the cursor reaches the end of the table it is reset to 0. This ensures that all orphaned rows are removed eventually (some rows can be skipped in a given pass because ingestion adds new rows to lookup tables and the reaper removes others, so the offset does not always land where it should to cover the entire table).
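The cursor movement can be sketched as follows. `nextOffset` is a hypothetical helper, not a function from this PR, and it assumes the end of the table is detected when a batch yields fewer rows than the batch size; the actual detection may differ.

```go
package main

import "fmt"

// nextOffset returns the offset to use in the next cycle. It advances by one
// batch, or wraps to 0 when the last batch came back short, which this sketch
// treats as "end of table reached" (an assumption).
func nextOffset(offset, batchSize, rowsChecked int64) int64 {
	if rowsChecked < batchSize {
		return 0
	}
	return offset + batchSize
}

func main() {
	// Simulate a 25-row lookup table scanned in batches of 10.
	const tableRows, batchSize = int64(25), int64(10)
	offset := int64(0)
	for cycle := 0; cycle < 4; cycle++ {
		checked := tableRows - offset
		if checked > batchSize {
			checked = batchSize
		}
		if checked < 0 {
			checked = 0
		}
		fmt.Printf("cycle %d: offset=%d checked=%d\n", cycle, offset, checked)
		offset = nextOffset(offset, batchSize, checked)
	}
}
```

Because the offset wraps, a row skipped in one pass gets another chance on a later pass.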

Close [partially] #4396.

Alternative solutions

While working on this I tried to implement @fons's idea from #4396: before clearing historical data, remove the lookup rows that are not present in other ranges. There is a general problem with this solution: the lookup tables are actively used by ingestion, so deleting rows while ingestion reads them can create inconsistent data. We could modify the reaper to acquire the ingestion lock, but if there are many ledgers to remove this can affect ingestion.

We could also write a query that finds and removes all the orphaned rows at once, but it is too slow to be executed between the ingestion of two consecutive ledgers.

Why

While Horizon removes history data when the --history-retention-count flag is set, it doesn't clear the historical lookup tables. Lookup tables are [id, key name] pairs that let historical tables point to keys by ID, thus saving disk space. This data can occupy a vast amount of disk space and is never used once the old historical data is deleted.

Known limitations

[TODO or N/A]

services/horizon/internal/db2/history/main.go

Shaptic · 2022-08-08T16:50:46Z

+// In short it checks 100 rows, omitting 1000 rows, of history_claimable_balances
+// and counts occurrences of each row in corresponding history tables.
+// If there are no history rows for a given id, the row in
+// history_claimable_balances is removed.


This is an interesting way to determine "age." It creates a dependence between the history tables, right? Is there a reason we don't rely on ledger close time, instead? I guess probably because history_claimable_balances doesn't have that row 😞

It's not really about age. The rows are sorted by id, which is just a sequential integer assigned to a specific ledger object (like a claimable balance). The limit ... offset listing here just ensures we iterate over the entire table across multiple cycles.

services/horizon/internal/ingest/main.go

bartekn · 2022-08-09T09:15:04Z

I'm going to merge this without 👍 to start testing a new release. Please add comments and I'll open another PR with review fixes/requests.

Adds the last remaining table: `history_assets` to lookup table reaper. In #4518 the code responsible for reaping lookup tables was added but was missing one table: `history_assets` due to lack of proper indexes. This commit should remove all remaining data in lookup tables.

bartekn added 7 commits August 7, 2022 19:21

services/horizon: Reap history object tables when ingestion is idle

6a7cec6

Register a new metric

3e86e93

Start a DB tx in maybeReapLookupTables

2539cc4

Better logging

c58063d

Remove accounts and assets because no indexes

0d98106

Fix tests

733748a

Fix, add tests

da58f01

bartekn marked this pull request as ready for review August 8, 2022 15:45

bartekn requested a review from a team August 8, 2022 15:45

Shaptic reviewed Aug 8, 2022

Shaptic requested a review from a team August 8, 2022 16:54

bartekn added 2 commits August 9, 2022 08:56

Merge branch 'master' into reap-history-objects

fe94271

revert maybeVerifyState fix

f2a1438

bartekn merged commit ee063a7 into stellar:master Aug 9, 2022

bartekn deleted the reap-history-objects branch August 9, 2022 09:18

bartekn mentioned this pull request Aug 9, 2022

horizon: history_claimable_balances is not cleared out by the reaper. #4396

Closed

bartekn mentioned this pull request Sep 1, 2022

services/horizon: Add history_assets to lookup tables reap #4565

Merged
