The following command causes REPLICA to become a copy of ORIGIN:
$ sqlite3_rsync ORIGIN REPLICA ?OPTIONS?
Use the --help
or -?
flag to see the
complete list of options. Option flags may appear before, after, or
between the ORIGIN and REPLICA arguments.
Add the -v
option to see more output, in a format
similar to "rsync".
If REPLICA does not already exist, it is created.
ssh is used for communication, so "USER@HOST" may be an SSH alias.
It is not required that one of ORIGIN or REPLICA be remote. The sqlite3_rsync utility works fine if both ORIGIN and REPLIA are local.
Both databases may be "live" while this utility is running. Other programs can have active connections to the databases on either end while this utility is running. Other programs can write to ORIGIN and can read from REPLICA while this utility runs.
REPLICA becomes a copy of a snapshot of ORIGIN as it existed when the sqlite3_rsync command started. If other processes change the content of ORIGIN while this command is running, those changes will be applied to ORIGIN, but they are not transferred to REPLICA Thus, REPLICA ends up as a fully-consistent snapshot of ORIGIN at an instant in time.
The synchronization uses a bandwidth-efficient protocol, similar to rsync (from which its name is derived).
The database files must both be in WAL mode, and must have the same page-size.
While sqlite3_rsync is running, REPLICA is read-only. Queries can be run against REPLICA while this utility is running, just not write transactions.
Only a single database is synchronized for each invocation of this utility. It is not (yet) possible to synchronize many different databases using wildcards, as it is with standard "rsync".
At least one of ORIGIN or REPLICA must be on the local machine. They cannot both be databases on other machines.
On the remote system, this utility must be installed in one of
the directories in the default $PATH for SSH. The
/usr/local/bin
directory is often a good choice.
Alternately, the --exe
NAME
flag may be used to specify a remote location for the
binary, e.g. --exe /opt/bin/sqlite3_rsync
.
The replica will be a very close copy of the origin, but not an exact copy. All of the table (and index) content will be byte-for-byte identical in the replica. However, there can be some minor changes in the database header. In particular, the replica will have the following differences from the origin:
The change counter in bytes 24 through 27 of the database header might be incremented in the replica.
The version-valid-for number in bytes in 96 through 99 of the database header will be the SQLite version number of the sqlite3_rsync program that made the copy, not the version number of the last writer to the origin database.
On Windows, a single-letter HOST without a USER@ prefix will be interpreted as a Windows drive-letter, not as a hostname.
Install sqlite3_rsync simply by putting the executable somewhere on your $PATH. If you are synchronizing with a remote system, the sqlite3_rsync executable must be installed on both the local and the remote system. When installing the sqlite3_rsync executable on the remote system, ensure that it is found on the $PATH used by SSH. Putting the sqlite3_rsync executable in the /usr/local/bin directory is often a good choice.
Unfortunately, on MacOS, the default PATH for ssh is "/usr/bin:/bin:/usr/sbin:/sbin" and MacOS does not allow you to add new programs to any of those directories. So if you are trying to sync with a remote Mac, you'll have to add the --exe command line argument to sqlite3_rsync to specify the location where you have installed sqlite3_rsync on the remote side. For example:
sqlite3_rsync sample.db mac:sample.db --exe /Users/xyz/bin/sqlite3_rsync
The writer of this document has never had any success in getting SSHD to run on Windows. Perhaps he will figure that out and be able to provide instructions for syncing a database to or from a remote Windows machine in a future release.
The protocol is for the replica to send a cryptographic hash of each of its pages over to the origin side, then the origin sends back the complete content of any page for which the hash does not match.
Suppose the replica contains R pages. If the replica and the origin are already identical, then about R*20 bytes of hash are sent from the replica to the origin, and apart from some trivial overhead and housekeeping traffic, nothing else moves over the wire. So for databases with a 4096-byte page size, the minimum bandwidth required to run this utility is equivalent to about 0.5% of the database. The worst case synchronization occurs if the replica and origin are completely different and have no pages in common. In that case, about total network traffic is about 100.5% of the database size.
The calculations in the previous paragraph do not consider the compression that SSH implements. Most SQLite databases are compressible and so the bandwidth cost of a complete synchronization is probably less than 100.5% of the database size. However, the cryptographic page hashes are not compressible, so the best case will never be better than about 0.5% of the database size, for a 4096-byte page size. Minimum bandwidth required goes down for larger page sizes. The best case is about 0.03% of the database size for databases with a 65,536-byte page size.
Ordinary rsync does not understand SQLite transactions. Rsync will make a copy of ORIGIN into REPLICA, however the copy might not be consistent. Parts of the copy might be from one transaction, while other parts might from a different transaction. The database copy might be corrupt.
If no other processes are connected to the database for the entire time that rsync is running, and if the database does not have a hot journal, then rsync will make a consistent copy of the database. But if you cannot guarantee that both of those conditions are met, then rsync might generate a corrupt copy. The sqlite3_rsync utility, on the other hand, always generates a consistent copy.