New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Does mkfs.fat.c calculate number of data clusters properly? #125
Comments
Nice spot! I have gone through the code and agree with you, the space for the two reserved FAT entries should be subtracted from the remaining space. Relevant summary of FATI am going to summarise what I have learnt over the past few days studying FAT in case it is helpful for anyone else. TermsSector - should probably be set to hard disk sector but a few other values also work e.g. 512, 4096 Structure of FAT file systemAll these regions occupy an integer number of sectors. After the specified number of sectors for one region, the next region immediately begins, so these regions are back-to-back. Reserved sectorsCan be set to an arbitrary number of sectors.
FATThe origin of the FAT name. Each entry is 12 bits for FAT12, 16 bits (2 bytes) for FAT16, and 32 bits (4 bytes) for FAT32. The first two entries are special, each subsequent entry corresponds to a cluster in the data area. Root directoriesEach entry is 32 bytes. E.g. 512 entries is 32 (512 byte) sectors. The length of this region is 0 for FAT32 and the root directory is stored in data clusters. DataComposed of clusters storing file contents. More information about FAT can be found online. mkfs.fat discussionNow we can talk about mkfs.fat. Size determination algorithmDefinitions/initialisation
Relevant portion of algorithm
Plus/minusAs mentioned above, the plus should be minus. Recall that FAT consists of two special entries + subsequent entries describing clusters. The total available space for cluster data + FAT data actually describing clusters is However, this does not cause mkfs.fat to produce invalid filesystems. It is actually safe for the FAT to be too big. The only thing that could happen is that some sectors at the end of the FAT is unused, and the filesystem supports a few clusters less than if it was minus. Even though overestimating the FAT size is ok, I still went ahead and analysed when the algorithm actually makes different decisions. I have actually calculated that with the default settings for FAT12, the wrong algorithm does not actually make any different decisions. Useless analysis of FAT length/cluster calculation for FAT12As described in the algorithm above, the plus sign slightly overestimates the space available for the FAT and data region. This could result in the number of sectors given to a FAT being more than necessary. The number of sectors given to a FAT is calculated in two steps: calculate clusters without worrying about sector or other alignment first, and then calculate how many actual sectors needs to be given to the FAT. The two lines are reproduced below.
With reference to the full algorithm, note that a wrong value for As mentioned above, the number of FAT sectors Thus, FAT length will only change near integer values of sectors, e.g. if number of clusters is 339, 680, etc. It is sufficient to analyse the behaviour of the code near these values. I rewrote the algorithm in JavaScript to compare the Let for (var i = 1; i < 10; i++) {
// (In sectors) Space for number of clusters that i FAT sectors accommodates + space for i FAT sectors * number of FATs + (number of FATs - 1)
// If this was one sector more, the wrong code could calculate a FAT one sector bigger while still fitting the same amount of clusters
var fatdata = Math.floor(512 * i / 12 * 8 - 2) * 4 + i * 2 + 1;
var clusters = Math.floor(2 * (fatdata * 512 + 2 * 3) / (2 * 4 * 512 + 2 * 3)); // Simulate floor division as in C code
var clustplus = 2 * (fatdata * 512 + 2 * 3) / (2 * 4 * 512 + 2 * 3); // Floating point division
var clustminus = 2 * (fatdata * 512 - 2 * 3) / (2 * 4 * 512 + 2 * 3);
var fatlength = Math.ceil((((clusters + 2) * 3 + 1) >> 1) / 512);
console.log(fatdata, clustplus, clustminus, fatlength);
} The wrong FAT length in sectors would be greater than actual. To cause this,
As shown in the results above, there is no change in the FAT length calculated. Also, the decimal digits seem to repeat every three FAT lengths, so I tried to simplify the expression for Let 2 * ((Math.floor(512 * i / 12 * 8 - 2) * 4 + i * 2 + 1) * 512 + (pm) * 2 * 3) / (4 * 512 + 3);
((Math.floor(512 * i / 12 * 8 - 2) * 4 + i * 2 + 1) * 512 + pm * 2 * 3) / (4 * 512 + 3);
((Math.floor(512 * i / 3 * 2 - 2) * 4 + i * 2 + 1) * 512 + pm * 6) / (4 * 512 + 3);
((Math.floor(1024 * i / 3 - 2) * 4 + i * 2 + 1) * 512 + pm * 6) / (4 * 512 + 3);
((Math.floor(i / 3 + 341 * i - 2) * 4 + i * 2 + 1) * 512 + pm * 6) / (4 * 512 + 3);
(((Math.floor(i / 3) + 341 * i - 2) * 4 + i * 2 + 1) * 512 + pm * 6) / (4 * 512 + 3);
(((Math.floor(i / 3) + 341 * i - 2 + 0.5 * i + 0.25) * 4) * 512 + pm * 6) / (4 * 512 + 3);
((Math.floor(i / 3) + 341.5 * i - 1.75) * 4 * 512 + pm * 6) / (4 * 512 + 3);
((Math.floor(i / 3) + 341.5 * i - 1.75) * 4 * 512 + pm * 6) / (4 * 512 + 3);
((Math.floor(i / 3) + 341.5 * i - 1.75) * 4 * 512 / (4 * 512 + 3) * (4 * 512 + 3) + pm * 6) / (4 * 512 + 3);
((Math.floor(i / 3) + 341.5 * i - 1.75 ) * 2048 / 2051 * (4 * 512 + 3) + pm * 6) / (4 * 512 + 3);
(( Math.floor(i / 3) * 2048 + 341.5 * 2048 * i - 1.75 * 2048 ) / 2051 * (4 * 512 + 3) + pm * 6) / (4 * 512 + 3);
((-Math.floor(i / 3) * 3 + Math.floor(i / 3) * 2051 + 341 * 2051 * i + i - 1.75 * 2051 + 1.75 * 3) / 2051 * (4 * 512 + 3) + pm * 6) / (4 * 512 + 3);
(-Math.floor(i / 3) * 3 + i + 5.25 + (Math.floor(i / 3) * 2051 + 341 * 2051 * i - 1.75 * 2051 ) / 2051 * (4 * 512 + 3) + pm * 6) / (4 * 512 + 3);
(-Math.floor(i / 3) * 3 + i + 5.25 + (Math.floor(i / 3) + 341 * i - 1.75 ) * (4 * 512 + 3) + pm * 6) / (4 * 512 + 3);
(-Math.floor(i / 3) * 3 + i + 5.25 + pm * 6) / (4 * 512 + 3) + Math.floor(i / 3) + 341 * i - 1.75;
(i % 3 - 0.75 + 6 + pm * 6) / (4 * 512 + 3) + Math.floor(i / 3) + 341 * i - 1.75;
(i % 3 - 0.75 + (plus is true) * 2 * 6) / (4 * 512 + 3) + Math.floor(i / 3) + 341 * i - 1.75;
(i % 3 - 0.75 + (plus is true) * 12) / (4 * 512 + 3) + Math.floor(i / 3) + 341 * i - 2 + 0.25;
(2 - 0.75 + (plus is true) * 12) / (4 * 512 + 3) + (0) + 0.25;
(1.25 + (1) * 12) / 2051 + 0.25; // = 0.25646026328620186
(1.25 + (0) * 12) / 2051 + 0.25; // = 0.2506094588005851 which is nowhere near enough to make Since FAT length/cluster calculation for FAT16/FAT32However, the same cannot be said for FAT16 and FAT32. I did not analyse the math in as much detail as I did for FAT12, but I modified the mkfs.fat source code to brute force every single block value for plus and minus to find cases where they are different. It seems like in all the cases below, the wrong plus version formats the disk with 2 less clusters. The code can be found at etlow@8adc555. The output is below. Basically, 8224 to 8239 blocks with the default settings causes the number of clusters available to be 4093 instead of 4095 if minus was used.
The relevant modification I made is this for loop. /* Test code */
struct params par;
backup_params(&par);
for (unsigned long long test_blocks = 64; test_blocks < 12000000; test_blocks += 16) {
devinfo.size = test_blocks * BLOCK_SIZE;
blocks = (devinfo.size - part_sector * sector_size) / BLOCK_SIZE;
orphaned_sectors = ((devinfo.size - part_sector * sector_size) % BLOCK_SIZE) / sector_size;
plus = TRUE;
establish_params(&devinfo);
setup_tables(); /* Sets align_structures so params must be restored */
restore_params(&par);
unsigned plus_fat = fat_entries;
plus = FALSE;
establish_params(&devinfo);
setup_tables();
if (plus_fat != fat_entries) {
printf("%llu blocks: FAT%d, clusters: + %u, - %u\n",
blocks, size_fat, plus_fat - 2, fat_entries - 2);
}
restore_params(&par);
}
printf("last blocks value tested: %llu\n", blocks);
exit(0); The values above can be checked with the current mkfs.fat as follows.
Changing the plus to minus and compiling, we get 4095 clusters as expected.
|
Other proposed changesWhile looking at the code, I also noticed a few other things. Not sure if I am hijacking this thread to talk about it. Alignment to cluster boundaryIn the current mkfs.fat code, all regions are aligned to cluster boundaries. However, I do not see why this has to be done. The FAT region is made up of FAT entries which are 12, 16, or 32 bits long, and the root directory region consists of directory entries which are 32 bytes long. These two regions do not contain any structures which are inherently one cluster long or even one sector long, so there is no benefit to align them. The reserved sector region should already be aligned as it is right at the start of the partition, so the only region that needs to be aligned is the data region which consists of clusters. If I understand FAT correctly, one of the purpose of the reserved sectors region is to be used for alignment. Thus, only the number of reserved sectors has to be adjusted after knowing the length of the FAT region and root directory region. Changing the number of reserved sectors will shift the FAT region and root directory region so that the data region can start at a cluster boundary. Alignment to sectors per trackAccording to the comments in the code, this is needed by DOS and mtools. However, I have removed the alignment and it still works with the current version of mtools. It also works for me on Windows 10. I don't have a DOS system convenient for testing, so I could not check that. Does anyone know of a way I can test that this does not work? Tiny filesystem alignment disablingThe current code disables alignment if the number of sectors is below 8192. After the other alignment changes, alignment only saves at most one cluster, and alignment can still be disabled manually with the I have forked the code and implemented these changes, and I can store files in it with Windows 10 and mtools without any issue. I am unsure about creating an unsolicited pull request as it is a lot of changes. The changes can be found at etlow/dosfstools@master...align. |
dosfstools/src/mkfs.fat.c
Lines 799 to 800 in 91978a2
dosfstools/src/mkfs.fat.c
Lines 819 to 820 in 91978a2
dosfstools/src/mkfs.fat.c
Lines 847 to 848 in 91978a2
src/mkfs.fat.c lines 799/819/847
The space in the FAT's that are reserved (the first 2 entries in each) is added to the total amount of space allocated for FAT/data. In reality, shouldn't that space be subtracted, since it is already inherently part of the space allocated for the FAT's? If we are trying to find the total amount of space that can be allocated toward data clusters (the actual cluster and the entry in the FAT), then that would be all of the data for FATs and data, minus the reserved space that is not used.
This is supported by the fact in the atari_format code (line 986), the reserved entry space is subtracted instead of added.
dosfstools/src/mkfs.fat.c
Lines 980 to 989 in 91978a2
It could very well be that I am completely misunderstanding what is happening, I am just trying to figure out how FAT works myself.
The text was updated successfully, but these errors were encountered: