It implements a backend for [sql.js](https://github.com/sql-js/sql.js/) (sqlite3 compiled for the web) so that it can persist data to IndexedDB.

It basically stores a whole database into another database. Which is absurd.

[See the demo](https://priceless-keller-d097e5.netlify.app/). You can also view an entire app using this [here](https://app-next.actualbudget.com/).

## Why do that?

IndexedDB is not a great database. It's slow, hard to work with, and has very few advantages for small local apps. Most cases are served better with SQL.
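To make that concrete, here's a hedged comparison, assuming an already-open `IDBDatabase` named `idb` with a `kv` object store (both hypothetical): even a simple aggregate in raw IndexedDB means a cursor loop and callbacks, while in SQL it's one statement.

```js
// Raw IndexedDB: sum every value in a store with a cursor loop
function sumWithIDB(idb) {
  return new Promise((resolve, reject) => {
    let store = idb.transaction('kv', 'readonly').objectStore('kv');
    let sum = 0;
    store.openCursor().onsuccess = e => {
      let cursor = e.target.result;
      if (cursor) {
        sum += cursor.value;
        cursor.continue();
      } else {
        resolve(sum);
      }
    };
    store.transaction.onerror = reject;
  });
}

// The SQL equivalent is a single statement:
// db.exec('SELECT SUM(value) FROM kv');
```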

You should also read [this blog post](https://jlongster.com/future-sql-web) which explains the project in great detail.

## How do I use it?

You can check out the [example project](https://github.com/jlongster/absurd-example-project) to get started. Or follow the steps below:

First you install the packages:

```
yarn add @jlongster/sql.js absurd-sql
```

Right now you need to use my fork of `sql.js`, but I'm going to open a PR and hopefully get it merged. The changes are minimal.

absurd-sql **must** run in a worker. This is fine because you really shouldn't be blocking the main thread anyway. So on the main thread, do this:

```js
import { initBackend } from 'absurd-sql/dist/indexeddb-main-thread';

function init() {
  let worker = new Worker(new URL('./index.worker.js', import.meta.url));
  // This is only required because Safari doesn't support nested
  // workers. This installs a handler that will proxy creating web
  // workers through the main thread
  initBackend(worker);
}

init();
```

Then in `index.worker.js` do this:

```js
import initSqlJs from '@jlongster/sql.js';
import { SQLiteFS } from 'absurd-sql';
import IndexedDBBackend from 'absurd-sql/dist/indexeddb-backend';

async function run() {
  let SQL = await initSqlJs({ locateFile: file => file });
  let sqlFS = new SQLiteFS(SQL.FS, new IndexedDBBackend());
  SQL.register_for_idb(sqlFS);

  SQL.FS.mkdir('/sql');
  SQL.FS.mount(sqlFS, {}, '/sql');

  let db = new SQL.Database('/sql/db.sqlite', { filename: true });
  // You might want to try `PRAGMA page_size=8192;` too!
  db.exec(`
    PRAGMA journal_mode=MEMORY;
  `);

  // Your code
}
```
If you look in your IndexedDB database, you should see something like this:

<img width="831" alt="Screen Shot 2021-07-21 at 12 12 26 PM" src="https://user-images.githubusercontent.com/17031/126525517-6b5429db-e4d8-43f0-af48-352a55456995.png">

## Requirements

Because this uses `SharedArrayBuffer` and the `Atomics` API, there are some requirements for your code to run.

* It must be run in a worker thread (you shouldn't block the main thread with queries anyway)
* Your server must respond with the following headers:

```
Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp
```
Those headers are required because browsers only enable `SharedArrayBuffer` if you tell them to isolate the process. There are potential security problems if `SharedArrayBuffer` were available everywhere.
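
Any server or host that lets you set response headers works. As one example, here's a minimal sketch of a Node.js static file server that sends them (the `dist` directory and port are assumptions):

```js
const http = require('http');
const fs = require('fs');
const path = require('path');

http.createServer((req, res) => {
  let file = path.join(__dirname, 'dist', req.url === '/' ? 'index.html' : req.url);
  fs.readFile(file, (err, data) => {
    if (err) {
      res.writeHead(404);
      res.end('not found');
      return;
    }
    res.writeHead(200, {
      // These two headers opt the page into cross-origin isolation,
      // which is what enables `SharedArrayBuffer`
      'Cross-Origin-Opener-Policy': 'same-origin',
      'Cross-Origin-Embedder-Policy': 'require-corp'
    });
    res.end(data);
  });
}).listen(8080);
```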
## Fallback mode

We do support browsers without `SharedArrayBuffer` (currently, that's only Safari). Read more about it here: https://jlongster.com/future-sql-web#fallback-mode-without-sharedarraybuffer

There are some limitations in this mode: only one tab can write to the database at a time. The database will never be corrupted, though; if multiple tabs try to write, it will simply throw an error (in the future, it should call a handler that you provide so you can notify the user).
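
A hedged sketch of handling that today (the exact error thrown isn't specified here, so this catches anything from the write and lets you notify the user yourself):

```js
// Hypothetical helper: in fallback mode, a write from a second tab
// throws instead of corrupting the database
function tryWrite(db, sql) {
  try {
    db.exec(sql);
    return true;
  } catch (e) {
    console.warn('Write failed; another tab may be writing the database', e);
    return false;
  }
}
```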
## Performance

It consistently beats IndexedDB performance by up to 10x:

Read performance: doing something like `SELECT SUM(value) FROM kv`:

<img width="610" alt="perf-sum-chrome" src="https://user-images.githubusercontent.com/17031/129102253-8adf163a-76b6-4af8-a1cf-8e2e39012ab0.png">

Write performance: doing a bulk insert:

<img width="609" alt="perf-writes-chrome" src="https://user-images.githubusercontent.com/17031/129102454-b4c362b3-1b0a-4625-ac96-72fc276497f3.png">

These were all measured on a 2015 MacBook Pro. Benchmark code is in `src/examples/bench`.
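
If you want a rough feel for the two measurements without running the repo's benchmarks, here's an illustrative sketch (not the code in `src/examples/bench`; the `kv` table is hypothetical):

```js
// Bulk insert: many rows inside a single transaction
function benchWrites(db, n) {
  db.exec('CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value REAL)');
  let start = performance.now();
  db.exec('BEGIN TRANSACTION');
  let stmt = db.prepare('INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)');
  for (let i = 0; i < n; i++) {
    stmt.run(['key-' + i, Math.random()]);
  }
  stmt.free();
  db.exec('COMMIT');
  console.log('insert took', performance.now() - start, 'ms');
}

// Read: scan the whole table with an aggregate
function benchReads(db) {
  let start = performance.now();
  let result = db.exec('SELECT SUM(value) FROM kv');
  console.log('sum:', result[0].values[0][0], 'in', performance.now() - start, 'ms');
}
```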
## How does it work?

Read [this blog post](https://jlongster.com/future-sql-web) for more details.
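
The core trick, in a very simplified sketch (not the library's actual code): sqlite's reads and writes are synchronous C calls, so the database worker sends a request to another worker, then blocks itself with `Atomics.wait` on a `SharedArrayBuffer` until the async IndexedDB work is done. The buffer layout and `readFromIndexedDB` below are hypothetical.

```js
// `int32` is an Int32Array view over a SharedArrayBuffer shared by
// both workers: index 0 is a "done" flag, index 1 holds the result.

// In the database worker (synchronous side):
function readBlock(int32, offset) {
  Atomics.store(int32, 0, 0);            // mark the request as pending
  postMessage({ type: 'read', offset }); // ask the I/O worker for a block
  Atomics.wait(int32, 0, 0);             // block until index 0 is no longer 0
  return Atomics.load(int32, 1);         // result written by the I/O worker
}

// In the I/O worker (asynchronous side):
async function handleRead(int32, offset) {
  let value = await readFromIndexedDB(offset); // hypothetical async read
  Atomics.store(int32, 1, value);
  Atomics.store(int32, 0, 1);            // flip the "done" flag...
  Atomics.notify(int32, 0);              // ...and wake the blocked worker
}
```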
## Where you can help
There are several things that could be done:
* Add a bunch more tests
* Implement a `webkitFileSystem` backend
  * I already started it [here](https://gist.github.com/jlongster/ec00ddbb47b4b29897ab5939b8e32fbe), but initial results showed that it was way slower?
* Bug fixes