services/horizon: Reap history object tables when ingestion is idle #4518
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
PR Checklist
PR Structure
otherwise).
services/friendbot
, orall
ordoc
if the changes are broad or impact manypackages.
Thoroughness
.md
files, etc... affected by this change). Take a look in the
docs
folder for a given service,like this one.
Release planning
needed with deprecations, added features, breaking changes, and DB schema changes.
semver, or if it's mainly a patch change. The PR is targeted at the next
release branch if it's not a patch change.
What
This commit adds code responsible for clearing orphaned rows in lookup historical tables. Orphaned rows can appear when old data is removed by reaper. The new code is separate from the existing reaper code (see "Alternative solutions" below) and activates after each ledger if there are no more ledgers to ingest in the backend. This has two advantages: it does not slow down catchup and it works only when ingestion is idle which shouldn't affect ingestion at all. To ensure performance is not affected, the
ReapLookupTables
method is called with 5 seconds context timeout which means that if it does not finish the work in specified time it will simply be cancelled.The solution here requires new indexes added in c2d52f0 (without it finding the rows to delete is slow). For each lookup table, we check the number of occurences of a given lookup ID in all the tables in which lookup table is used. If no occurences are found, the row is removed from lookup table.
Rows are removed in batches of 10000 rows (can be modified in the future). The cursor is updated when tables is processed so after next ledger ingesion the next chunk of rows is checked. When cursor reaches the end of table it is reset back to 0. This ensures that all the orphaned rows are removed eventually (some rows can be skipped because new rows are added to lookup tables by ingestion and some are removed by reaper so
offset
does not always skip to the place is should to cover entire table).Close [partially] #4396.
Alternative solutions
While working on this I tried to implement @fons'es idea from #4396 which was removing rows before clearing historical data which are not present in other ranges. There is a general problem with this solution. The lookup tables are actively used by ingestion which means that if rows are deleted while ingestion reads a given row it can create inconsistent data. We could modify reaper to aquire ingestion lock but if there are many ledgers to remove it can affect ingestion.
We could also write a query that finds and removes all the orphaned rows but it's too slow to be executed between ingestion of two consecutive ledgers.
Why
While Horizon removes history data when
--history-retention-count
flag is set it doesn't clear lookup historical tables. Lookup tables are[id, key name]
pairs that allow setting pointers to keys in historical tables, thus saving disk space. This data can occupy a vast space on disk and is never used when old historical data is deleted.Known limitations
[TODO or N/A]