
feat(core): initialize SQLite off-main-thread #18401

Merged (7 commits) on Mar 27, 2023

Conversation

@mmastrac (Contributor) commented Mar 23, 2023

This gets SQLite off the flamegraph and reduces initialization time by somewhere between 0.2ms and 0.5ms. In addition, I took the opportunity to move all the cache management code to a single place and reduce duplication. While the PR has a net gain of lines, much of that is just being a bit more deliberate with how we're recovering from errors.
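As a rough illustration of what "off-main-thread" means here (a sketch only; the names and the actual `CacheDB` plumbing in this PR differ), the connection open and schema setup are kicked off on a background thread, and the main thread only joins that thread on first use:

```rust
use std::path::PathBuf;
use std::thread;

use rusqlite::Connection;

/// Sketch: SQLite setup runs on a background thread; the main thread
/// only blocks if it needs the cache before setup has finished.
struct DeferredConnection {
  handle: Option<thread::JoinHandle<rusqlite::Result<Connection>>>,
  conn: Option<rusqlite::Result<Connection>>,
}

impl DeferredConnection {
  fn new(path: PathBuf) -> Self {
    let handle = thread::spawn(move || {
      let conn = Connection::open(path)?;
      // Schema setup and pragmas also happen off the main thread.
      conn.execute_batch(
        "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, value BLOB);",
      )?;
      Ok(conn)
    });
    Self { handle: Some(handle), conn: None }
  }

  /// First use joins the background thread (usually already finished).
  fn get(&mut self) -> &rusqlite::Result<Connection> {
    if self.conn.is_none() {
      let handle = self.handle.take().expect("already joined");
      self.conn = Some(handle.join().expect("init thread panicked"));
    }
    self.conn.as_ref().unwrap()
  }
}
```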

The existing caches had various policies for dealing with cache corruption, so I've unified them and tried to isolate the decisions we make for recovery in a single place (see `open_connection` in `CacheDB`). The policy I chose was:

  1. Retry twice to open on-disk caches
  2. If that fails, try to delete the file and recreate it on-disk
  3. If we fail to delete the file or re-create a new cache, use a fallback strategy that can be chosen per-cache: InMemory (temporary cache for the process run), BlackHole (ignore writes, return empty reads), or Error (fail on every operation).

The caches all use the same general code now, and share the cache failure recovery policy.
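In rough sketch form, the policy looks something like this (the `CacheFailure` and `open_with_recovery` names are illustrative, not the actual identifiers in `cli/cache/cache_db.rs`):

```rust
use std::path::Path;

use rusqlite::Connection;

/// Per-cache fallback used when the on-disk database can't be opened.
enum CacheFailure {
  /// Temporary in-memory cache for this process run.
  InMemory,
  /// Ignore writes, return empty reads.
  Blackhole,
  /// Fail on every operation.
  Error,
}

enum CacheConn {
  Open(Connection),
  Blackhole,
  Error,
}

fn open_with_recovery(path: &Path, on_failure: CacheFailure) -> CacheConn {
  // 1. Retry twice to open the on-disk cache.
  for _ in 0..2 {
    if let Ok(conn) = Connection::open(path) {
      return CacheConn::Open(conn);
    }
  }
  // 2. If that fails, try to delete the file and recreate it on disk.
  if std::fs::remove_file(path).is_ok() {
    if let Ok(conn) = Connection::open(path) {
      return CacheConn::Open(conn);
    }
  }
  // 3. Otherwise, fall back to the per-cache strategy.
  match on_failure {
    CacheFailure::InMemory => match Connection::open_in_memory() {
      Ok(conn) => CacheConn::Open(conn),
      Err(_) => CacheConn::Error,
    },
    CacheFailure::Blackhole => CacheConn::Blackhole,
    CacheFailure::Error => CacheConn::Error,
  }
}
```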

In addition, it cleans up a TODO in the NodeAnalysisCache.

@mmastrac force-pushed the sqlite_off_thread branch 8 times, most recently from 6dc1a14 to c0619c4 on March 24, 2023 16:23
@mmastrac marked this pull request as ready for review on March 24, 2023 17:08
@mmastrac changed the title from "[WIP] Move SQLite cache initialization off main thread" to "feat(core): initialize SQLite off-main-thread" on Mar 24, 2023
@dsherret (Member) left a comment

Looks really good! I just have a few small comments.

Review threads on cli/cache/cache_db.rs (5, resolved) and cli/cache/caches.rs (1, resolved)
Another reviewer (Member) left a comment

Very nice 👍

Review threads on cli/cache/cache_db.rs (2, resolved) and cli/proc_state.rs (1, resolved)
@bartlomieju (Member) commented

@mmastrac could you post the latest benchmark results against v1.32.1?

@mmastrac (Contributor, Author) commented

I ran some benchmarks with different journal modes, as I believe we're going to have to move from `journal_mode=OFF` back to either `journal_mode=TRUNCATE` or `journal_mode=WAL`. Based on these numbers, I think we'll want to use `TRUNCATE`.

There's another optimization that wins us a few milliseconds: just dropping the SQLite connection without running cleanup. I want to hold that one back until I can confirm that we're not going to risk corruption with it, but there's no reason it should be any different from crashing, which is already safe with SQLite.

Summary:

Scenario                   Time       Commit
Base                       12.5ms     9eb
SQLite off-thread (SOT)    12.4ms     514
Base + WAL                 12.7ms     6c0
Base + TRUNCATE            12.6ms     ec4
SOT + WAL                  12.5ms     ef9
SOT + TRUNCATE             12.5ms     d8c

Raw results:

Benchmark 1: ./deno-9ebce6e725dd0b33aea20025995fb1e790b92df5 run empty.js
  Time (mean ± σ):      12.5 ms ±   0.4 ms    [User: 8.8 ms, System: 2.5 ms]
  Range (min … max):    11.9 ms …  14.3 ms    100 runs

Benchmark 2: ./deno-514313cc5d925015f3fd9e8916c04e9e5524782f run empty.js
  Time (mean ± σ):      12.4 ms ±   0.4 ms    [User: 9.0 ms, System: 3.0 ms]
  Range (min … max):    11.9 ms …  15.0 ms    100 runs

Benchmark 3: ./deno-6c0b5239a run empty.js
  Time (mean ± σ):      12.7 ms ±   0.3 ms    [User: 8.8 ms, System: 2.6 ms]
  Range (min … max):    12.0 ms …  14.6 ms    100 runs

Benchmark 4: ./deno-ec48252fc run empty.js
  Time (mean ± σ):      12.6 ms ±   0.7 ms    [User: 8.8 ms, System: 2.6 ms]
  Range (min … max):    11.8 ms …  14.6 ms    100 runs

Benchmark 5: ./deno-d8c2f5bfe run empty.js
  Time (mean ± σ):      12.5 ms ±   0.5 ms    [User: 9.0 ms, System: 3.1 ms]
  Range (min … max):    11.7 ms …  15.1 ms    100 runs

Benchmark 6: ./deno-ef9a5c1c9 run empty.js
  Time (mean ± σ):      12.5 ms ±   0.3 ms    [User: 8.9 ms, System: 3.4 ms]
  Range (min … max):    12.0 ms …  13.9 ms    100 runs
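For context, the journal mode is a per-connection SQLite pragma; in rusqlite it can be applied roughly like this (the `apply_journal_mode` helper is illustrative, not code from this PR):

```rust
use rusqlite::Connection;

// `PRAGMA journal_mode=...` returns a single row containing the mode that
// actually took effect, so it has to be read as a query rather than a plain
// execute. Pragma values can't be bound as parameters, hence the format!.
fn apply_journal_mode(conn: &Connection, mode: &str) -> rusqlite::Result<String> {
  conn.query_row(&format!("PRAGMA journal_mode={mode}"), [], |row| row.get(0))
}
```

With `OFF` there is no rollback journal at all, which is the corruption risk alluded to above; `TRUNCATE` and `WAL` both preserve crash safety, with `TRUNCATE` zeroing the rollback journal instead of deleting it after each transaction.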

@dsherret disabled auto-merge on March 27, 2023 17:48
@dsherret (Member) left a comment

Looks great!

Review thread on cli/cache/cache_db.rs (resolved)
@bartlomieju (Member) commented

A couple of observations from the benchmark page:

  1. Thread count jumped from 8 to 10, which is kind of unexpected; I thought we were only going to use 1 more thread:
    [screenshot: thread count graph]
  2. Memory usage increased by about 4 MB:
    [screenshot: memory usage graph]

I'm not suggesting we revert this change; I just wanted to put it on people's radar.

mmastrac added a commit that referenced this pull request Mar 28, 2023
…t in CacheDB (#18469)

Fast-follow on #18401 -- the reason that some tests were panicking in
the `CacheDB` `impl Drop` was that the cache itself was being dropped
during panic and the runtime may or may not still exist at that point.
We can reduce the actual tokio runtime testing to where it's needed.

In addition, we return the journal mode to `TRUNCATE` to avoid the risk
of data corruption.
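One way to make a `Drop` impl tolerant of that situation is to skip cleanup when the thread is already unwinding from a panic; the following is a rough sketch of that idea under that assumption, not the actual fix in #18469:

```rust
use rusqlite::Connection;

struct CacheConn {
  conn: Option<Connection>,
}

impl Drop for CacheConn {
  fn drop(&mut self) {
    // If we're already unwinding from a panic, the surrounding runtime may
    // no longer exist, so skip cleanup entirely. SQLite is designed to
    // survive an abrupt process exit, so skipping the close is safe.
    if std::thread::panicking() {
      return;
    }
    if let Some(conn) = self.conn.take() {
      // Close explicitly and ignore close-time errors rather than letting
      // them panic inside `drop`.
      let _ = conn.close();
    }
  }
}
```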