This was written in an era before USB sticks, when floppies were still in use and CDs had just started to replace them... Still, it could be adapted and made useful for today's technology: I still get a large percentage of read failures from my burned CDs/DVDs, and would very much like to find some time to apply another layer of Reed-Solomon over the one already used by normal recording technology.
In the Internet age, file transfers are mostly done through e-mail attachments or FTP/WWW servers, which means that both the source and the destination machine must have access to the Web.
What do you do when this isn't the case?
You could burn CD-Rs for your files - even though, most of the time, your data amounts to less than 5 MB (assuming, of course, that you have the necessary equipment inside your computer). If you don't (like me), you could kindly ask your company's sysadmin to do it for you. But what if this happens on a daily basis? You can't constantly bother your sysadmin (unless you want to suddenly find that your computer's web access has become rather... slow :‑). So you do the brave thing: you just compress your files onto a couple of diskettes and take them with you. Right?
Sadly, this world is governed by Murphy's law. Nine times out of ten, one of your diskettes will have a defective sector or two, which ruins the whole set. Even if you use archivers that employ some kind of error correction (like rar, from Eugene Roshal), you basically cross your fingers and hope for the best.
Until now.
The basic idea is that with error-correcting codes, like for example Reed-Solomon, one can use parity bytes to protect a block of data. For my needs, I chose a block of 232 bytes, added 24 bytes of parity (approximately 10% of excess information), and was able to correct up to 12 erroneous bytes inside the 232-byte block. You'd argue, of course, that a diskette is a block device, which works or fails on 512-byte sector boundaries. True. So all we have to do is simply interleave the stream inside the diskette's sectors, in the following way:
    1st byte of stream     ->  first byte of first sector
    2nd byte of stream     ->  first byte of second sector
    ...
    2880th byte of stream  ->  first byte of last (2880th) sector
    2881st byte of stream  ->  second byte of first sector
    2882nd byte of stream  ->  second byte of second sector
    ...
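Before looking at what this buys us, here is a minimal sketch of the parity part in Python. It uses the third-party reedsolo package and a smaller data block - it is not the code of the sample implementation mentioned below, just an easy way to play with the "24 parity bytes, 12 correctable errors" relationship:

    # Minimal sketch of Reed-Solomon parity, using the third-party 'reedsolo'
    # package (pip install reedsolo). Not the code of the sample implementation -
    # just a convenient way to experiment.
    from reedsolo import RSCodec

    rs = RSCodec(24)              # 24 parity bytes => corrects up to 12 bad bytes

    data = bytes(range(200))      # an example data block
    codeword = rs.encode(data)    # the block with 24 parity bytes appended

    # Corrupt 12 bytes - the worst damage this code can still repair.
    noisy = bytearray(codeword)
    for i in range(0, 120, 10):
        noisy[i] ^= 0xFF

    # Recent reedsolo versions return (message, message+parity, error positions);
    # older ones return just the message - handle both.
    result = rs.decode(bytes(noisy))
    repaired = result[0] if isinstance(result, tuple) else result
    assert repaired == data       # all 12 errors corrected

Flip a 13th byte and the decode will (almost always) fail with a ReedSolomonError instead - that is the threshold everything below is built around.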
With this interleaving, if a sector is defective, each of the 512 blocks that pass through that sector loses only one byte - and the algorithm can handle 12 such errors per block! Taking into account the fact that sector errors are local events (in terms of diskette space), an even better way to interleave is this:
    1st byte of stream              ->  first byte of 1st sector
    2nd byte of stream              ->  first byte of (N+1)th sector
    3rd byte of stream              ->  first byte of (2N+1)th sector
    ...
    (2880/N + 1)th byte of stream   ->  first byte of 2nd sector
    (2880/N + 2)th byte of stream   ->  first byte of (N+2)th sector
    ...
...where N is a parameter (in my test implementation, I used N=8).
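To make the mapping completely explicit, here is a small Python sketch of this interleaving, using the numbers above (2880 sectors of 512 bytes, N=8). It illustrates the scheme only - it is not the code of the sample implementation:

    # Interleave a stream across the sectors of a 1.44 MB diskette image, so
    # that consecutive stream bytes land N sectors apart. Illustration only.
    SECTOR_SIZE = 512
    SECTORS = 2880                # a 1.44 MB diskette: 2880 sectors of 512 bytes
    N = 8                         # the interleaving stride; N must divide SECTORS
    assert SECTORS % N == 0

    # The order in which sectors are visited: 0, N, 2N, ..., then 1, N+1, ...
    # (the '// SECTORS' term is the offset added at each wrap-around).
    # With N=1 this degenerates to the simple scheme of the first table.
    ORDER = [(k * N) % SECTORS + (k * N) // SECTORS for k in range(SECTORS)]

    def interleave(stream: bytes) -> bytes:
        """Spread a full diskette's worth of stream bytes over the image."""
        assert len(stream) == SECTORS * SECTOR_SIZE
        image = bytearray(len(stream))
        for p, b in enumerate(stream):
            sector = ORDER[p % SECTORS]   # which sector this stream byte lands in
            offset = p // SECTORS         # which byte of that sector
            image[sector * SECTOR_SIZE + offset] = b
        return bytes(image)

    def deinterleave(image: bytes) -> bytes:
        """Inverse mapping: rebuild the stream from the diskette image."""
        assert len(image) == SECTORS * SECTOR_SIZE
        stream = bytearray(len(image))
        for p in range(len(image)):
            stream[p] = image[ORDER[p % SECTORS] * SECTOR_SIZE + p // SECTORS]
        return bytes(stream)

With N=8, two physically adjacent sectors are visited 2880/8 = 360 stream positions apart, so a scratch that wipes out a run of neighbouring sectors spreads its damage over many different blocks, one byte per block, instead of piling up inside a single one.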
So, all you have to do when transferring your files is to instruct your compression utility (rar or arj or whatever) to create 1,336,300-byte volumes. You then transfer these volumes to your diskettes in the way I described, and you can stop worrying about bad sectors - if one shows up, you'll still get your data back, with very high probability. (To lose data, your diskette must have at least 13 bad sectors falling on 8-sector intervals - for example, sectors 0, 8, 16, ..., 96 must all be bad. Quite unlikely, unless you play baseball with your diskettes.)
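As a quick sanity check of that volume size: out of every 256 bytes recorded on the diskette, 232 are payload, so a 1.44 MB diskette carries roughly:

    # Back-of-the-envelope check of the usable payload on a 1.44 MB diskette.
    raw_bytes = 2880 * 512              # 1,474,560 bytes on the diskette
    payload = raw_bytes * 232 // 256    # 232 payload bytes out of every 256 recorded
    print(payload)                      # 1,336,320 - right around the volume size above

The exact 1,336,300 figure is simply what the sample implementation uses; presumably it keeps the remaining few bytes for its own bookkeeping.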
Seriously, this idea can obviously be implemented for other media - it could even become a true filesystem, in the ext2 and NTFS sense. Go ahead and implement it to your heart's desire.
A sample implementation for Windows (and for Linux, through dd) is available (.7z stands for 7-Zip):