
MohitPanchariya/rdiff

RDIFF

This project is an implementation of the rdiff tool from librsync, which is used to compute the difference between a local file and a version of that file held on a remote machine (https://github.com/librsync/librsync/blob/master/doc/rdiff.md).

Note:

This tool isn't concerned with how the signature and delta files are sent or received over the network. It only creates and consumes these files and performs the patch.

Usage:

Assume two machines, machine A and machine B, hold different versions of the same file. Machine A wants to synchronise its file with the version present on machine B.
The following example shows the usage of rdiff on machine A:

import rdiff

# create an instance of the checksum class
checksum = rdiff.signature.Checksum()

# create an instance of the signature class, passing in the checksum object
# Optionally, blockSize can be specified; it defaults to 1024 bytes
signature = rdiff.signature.Signature(checksum=checksum)

# create the signature file
# basisFilePath -> Path to the file for which signature must be generated
# sigFilePath -> Where to store the signature file
signature.createSignature(basisFilePath="path_to_file", sigFilePath="path_to_signature_file")

# This machine sends over the signature file to the remote machine
# The remote machine responds back with a delta file
# rdiff isn't concerned with how this communication over the network takes place

patcher = rdiff.patch.Patch()
# Perform a patch operation
# How the delta file is obtained from the remote machine is not a concern of rdiff
# deltaFilePath -> Path to the delta file obtained from the remote machine
# basisFilePath -> Path to the file which is to be updated
# outFilePath -> Path to store the updated file
# Note: The original file isn't modified, instead a new file is created.
patcher.patchFile(
    deltaFilePath="path_to_delta_file", basisFilePath="path_to_file", outFilePath="path_to_updated_file"
)

The following example shows the usage of rdiff on machine B:

import rdiff

# create an instance of the checksum class
checksum = rdiff.signature.Checksum()

# create an instance of the delta class
delta = rdiff.delta.Delta()

# The createDeltaFile method creates a delta file for a given file
# against a signature file obtained from a remote machine.
# inFilePath -> Path to the updated file, i.e. the version the remote machine wants to synchronise with
# deltaFilePath -> Path to store the delta file
# sigFielPath -> Path to the signature file obtained from the remote machine.
# rdiff is not concerned with how this signature file is obtained.
# blockSize -> This must be identical to the blockSize used by the remote machine to generate the
# signature file.
delta.createDeltaFile(
    inFilePath="path_to_updated_file", deltaFilePath="path_to_delta_file",
    sigFielPath="path_to_sig_file", blockSize=1024, checksum=checksum
)

# This machine now sends the delta file to the remote machine.
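
Because the blockSize passed to createDeltaFile must match the one used to build the signature, the two machines need some way to agree on it. The following is a minimal sketch of one possible convention, assuming the value is not already communicated some other way: machine A writes a small JSON sidecar file next to the signature file and machine B reads it back. The helper names (`manifest_path`, `write_manifest`, `read_block_size`) and the `.meta.json` suffix are hypothetical and are not part of rdiff.

```python
import json
from pathlib import Path


def manifest_path(sig_file_path: str) -> Path:
    """Return the sidecar path for a signature file (hypothetical naming convention)."""
    return Path(sig_file_path + ".meta.json")


def write_manifest(sig_file_path: str, block_size: int) -> Path:
    """Machine A: record the block size that was used to build the signature file."""
    if block_size <= 0:
        raise ValueError("block_size must be a positive integer")
    path = manifest_path(sig_file_path)
    path.write_text(json.dumps({"blockSize": block_size}))
    return path


def read_block_size(sig_file_path: str) -> int:
    """Machine B: read the block size recorded next to the received signature file."""
    data = json.loads(manifest_path(sig_file_path).read_text())
    return int(data["blockSize"])
```

Machine B could then pass `read_block_size("path_to_sig_file")` as the blockSize argument instead of hard-coding 1024, provided the sidecar file is transferred together with the signature file.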

The following is a combined example, on a single machine. The code can be extended to two remote machines.

"""
Assume there are two machines, machine A and machine B.
Machine A and machine B both have different versions of test.txt
Machine A wants to sync its file to have the same content as the file on machine B.

Machine A creates a signature file and sends it over to machine B.
Machine B uses this signature file and generates a delta file against the signature file
and sends the delta file back to machine A.

Machine A now uses this delta file to patch its file. Thereby, synchronising its file to have
the same content as the file on machine B.

The following is an example on a single machine. The same example can be extended to two
different machines connected over a network.
"""
import rdiff

checksum = rdiff.signature.Checksum()
signature = rdiff.signature.Signature(checksum=checksum, blockSize=1024)

# Machine A making the signature file
signature.createSignature(basisFilePath="path_to_file", sigFilePath="path_to_sig_file")

delta = rdiff.delta.Delta()
# Machine B creates the delta file using the signature file generated by Machine A
delta.createDeltaFile(
    inFilePath="path_to_updated_file", deltaFilePath="path_to_delta_file",
    sigFielPath="path_to_sig_file", blockSize=1024, checksum=checksum
)

patcher = rdiff.patch.Patch()
# Machine A patches its file using the delta file generated by Machine B.
# The result is written to a new file. On a single machine this path must differ
# from path_to_updated_file, which holds Machine B's version of the file.
patcher.patchFile(
    deltaFilePath="path_to_delta_file", basisFilePath="path_to_file",
    outFilePath="path_to_patched_file"
)
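
After the patch step, it can be useful to confirm that the patched output really matches machine B's version. The sketch below uses only the Python standard library and does not call rdiff. The function names are illustrative, and the comparison is only meaningful if the patch output was written to a different path than the file it is compared against.

```python
import filecmp
import hashlib


def files_identical(path_a: str, path_b: str) -> bool:
    """Compare two local files byte for byte (shallow=False forces a content read)."""
    return filecmp.cmp(path_a, path_b, shallow=False)


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

On a single machine, `files_identical` can be called directly on the patch output and the updated file. Across two machines, each side could compute `sha256_of` on its own copy and exchange only the digests.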

File Formats

Information on the format/structure of the different files involved can be found at: https://github.com/MohitPanchariya/rdiff/blob/master/file_formats.md

Further Reading

More about the rsync algorithm can be found at: https://rsync.samba.org/tech_report/tech_report.html
A detailed explanation of the rsync algorithm is present in the PhD thesis of Andrew Tridgell, specifically chapter 3 of the thesis: https://www.samba.org/~tridge/phd_thesis.pdf
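
For readers who want a feel for the algorithm before following the links, here is a minimal sketch of the weak rolling checksum described in the rsync technical report. It is illustrative only: it is not taken from this package, the function names are made up for the example, and the Checksum class in rdiff may be implemented differently. The key property is that the checksum of a window shifted by one byte can be derived from the previous checksum in constant time.

```python
from typing import Tuple

M = 1 << 16  # modulus for both checksum components


def weak_checksum(block: bytes) -> Tuple[int, int]:
    """Compute the (a, b) components of the weak checksum directly from a block."""
    n = len(block)
    a = sum(block) % M
    # The byte at offset i is weighted by (n - i), so earlier bytes count more.
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> Tuple[int, int]:
    """Slide a window of length n forward by one byte.

    out_byte leaves the window at the front, in_byte enters at the back.
    """
    a_new = (a - out_byte + in_byte) % M
    b_new = (b - n * out_byte + a_new) % M
    return a_new, b_new


def combine(a: int, b: int) -> int:
    """Pack both components into a single 32-bit value."""
    return a + (b << 16)
```

A sender can use `roll` to test every byte offset of its file against the receiver's block checksums without recomputing each window from scratch. In the algorithm as described in the report, a stronger hash is then used to confirm candidate matches.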