First XML DB build does not record hashing failures #9

Open
RupeeClock opened this issue Jun 4, 2024 · 2 comments

@RupeeClock

I am looking for an alternative to FCIV that is more robust and can record errors to an output file (akin to the fciv.err file produced by FCIV), so that I can identify corrupted files which are not yet part of a hashed integrity database.
FCIV was useful in this respect: when it encountered a file that produced a Cyclic Redundancy Check error, it added a record of this to its fciv.err output. However, it was limited in that it would fail and halt if a file name had unexpected characters or a directory path was too long.

When PsFCIV is run for the first time on a directory containing known bad files, the generated XML file gains a <FILE_ENTRY> element for each file, detailing the <name>, <Size>, and <Timestamp>, plus hashes such as <MD5> if hashing succeeds.
For corrupted files, it instead prints the following errors in the PowerShell window, neither of which identifies the file that caused it:

Exception calling "HashFile" with "2" argument(s): "Data error (cyclic redundancy check).
"
At C:\Program Files\WindowsPowerShell\Modules\PsFCIV\1.1\PsFCIV.psm1:62 char:17
+ ...             $hashBytes = [PsFCIV.Support.CryptUtils]::HashFile($file, ...
+                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (:) [], MethodInvocationException
    + FullyQualifiedErrorId : IOException

Exception calling "FormatBytes" with "2" argument(s): "Value cannot be null.
Parameter name: inArray"
At C:\Program Files\WindowsPowerShell\Modules\PsFCIV\1.1\PsFCIV.psm1:67 char:21
+ ...             $object.$hash = [PsFCIV.Support.CryptUtils]::FormatBytes( ...
+                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (:) [], MethodInvocationException
    + FullyQualifiedErrorId : ArgumentNullException

The result is that a <FILE_ENTRY> is added to the XML DB as a new entry without any hash elements.
Because the failure is recorded only as a missing element, the output is difficult to parse when trying to identify bad files.
The tool in its current state is suitable for identifying changes in data compared to a previously built database, but not as well suited to discovering corrupted files.

Would it be possible to modify the functionality so that when PsFCIV encounters a hashing failure, it records an explicit failure in the XML DB instead of omitting the hash element? Alternatively, could it write its own fciv.err-style file?
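
To make the request concrete, here is a minimal Python sketch of the behaviour being asked for. It is not PsFCIV's code: the names `hash_tree` and `err_path` are hypothetical, and the log layout is only loosely modelled on fciv.err. The idea is that each read failure is caught per file, paired with the path that caused it, and appended to an error file rather than lost.

```python
import hashlib
import os


def hash_tree(root, err_path, algorithm="md5", chunk_size=1 << 20):
    """Hash every file under root and record failures with their paths.

    Returns (hashes, failures): a dict of path -> hex digest, and a list of
    (path, exception) pairs for files that could not be read.
    """
    hashes = {}
    failures = []

    # Directory listing errors are collected too instead of being ignored.
    def on_walk_error(exc):
        failures.append((exc.filename, exc))

    for dirpath, _dirnames, filenames in os.walk(root, onerror=on_walk_error):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                digest = hashlib.new(algorithm)
                with open(path, "rb") as handle:
                    # Read in chunks so large files do not fill memory.
                    for chunk in iter(lambda: handle.read(chunk_size), b""):
                        digest.update(chunk)
                hashes[path] = digest.hexdigest()
            except OSError as exc:
                # A CRC error, sharing violation, or vanished file lands here.
                failures.append((path, exc))

    # Append, so repeated runs accumulate in one log (as fciv.err does).
    with open(err_path, "a", encoding="utf-8") as log:
        for path, exc in failures:
            # On Windows the Win32 code is in exc.winerror; elsewhere use errno.
            code = getattr(exc, "winerror", None) or exc.errno
            log.write(f"{path}\n")
            log.write(f"\tError msg  : {exc.strerror or exc}\n")
            log.write(f"\tError code : {code}\n\n")

    return hashes, failures
```

Keeping the failures in a separate list also means the caller could instead write an explicit failure marker into each database entry, which is the other option raised above.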

@Crypt32
Collaborator

Crypt32 commented Jun 5, 2024

Can you clarify what you mean by "known bad files" and "corrupted files"? Are the files physically corrupted? If so, what is the desired behavior when PsFCIV finds such a file?

@RupeeClock
Author

RupeeClock commented Jun 5, 2024

By known bad files, I'm referring to files that have been physically corrupted.
Outside of file hashing use cases (previously using 7zip or FCIV), they are known to be bad if they do not function correctly (e.g. video files with unplayable sections) or cannot be copied to another location (Windows Explorer returning "Can't read from the source file or disk").

For my use case of identifying corrupted data to be restored from backup or otherwise salvaged, FCIV was useful because its fciv.err output records every error it encounters, along with an error code. It works by creating or appending to an fciv.err file, with each run's section starting with the command line that was used.

Here's a sanitised example of what that output may look like:

********************************************************************************
Command Line: fciv -add D: -r -md5 

HashAndStore --> d:\DumpStack.log.tmp : 
	Error msg  : Access is denied.
	Error code : 5

HashAndStore --> d:\fciv.err : 
	Error msg  : The process cannot access the file because it is being used by another process.
	Error code : 20

HashAndStore --> d:\Sample-Game\data.bin : 
	Error msg  : Data error (cyclic redundancy check).
	Error code : 17

HashAndStore --> d:\Music\Musician - Track Name?.mp3 : 
	Error msg  : The filename, directory name, or volume label syntax is incorrect.
	Error code : 7b

HashAndStore --> d:\Music\File Name with non-unicode characters.mp3 : 
	Error msg  : The system cannot find the file specified.
	Error code : 2
	
d:\Music\Folder Name with Accented Characters\*
	Error msg  : The system cannot find the path specified.
	Error code : 3
	
HashAndStore --> d:\Music\Very Long Filename that exceeds Windows built-in MAX_PATH limit of 256 or 260 characters - Older Windows applications particularly CMD applications such as FCIV cannot handle long path or file names when the full length exceeds this character limit.mp3 : 
	Error msg  : The system cannot find the path specified.
	Error code : 3

For my use case, every result with Error code 17, returning "Data error (cyclic redundancy check)", was useful.
The other error codes, which result from the dated application not handling longer paths or filenames with special characters, were not useful. This is where PsFCIV has come in useful, as it can process such files.
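
As an illustration of that filtering, here is a minimal Python sketch that parses text in the layout of the sample above and keeps only the CRC entries. The function names are hypothetical, and reading the codes as hexadecimal is an inference from the sample: `7b` is not a decimal number, and 0x17 is 23, the Win32 value of ERROR_CRC.

```python
ERROR_CRC = 0x17  # Win32 ERROR_CRC (23 decimal), shown as "17" in fciv.err

SAMPLE = r"""
Command Line: fciv -add D: -r -md5

HashAndStore --> d:\DumpStack.log.tmp :
    Error msg  : Access is denied.
    Error code : 5

HashAndStore --> d:\Sample-Game\data.bin :
    Error msg  : Data error (cyclic redundancy check).
    Error code : 17

d:\Music\Folder Name with Accented Characters\*
    Error msg  : The system cannot find the path specified.
    Error code : 3
"""


def parse_fciv_err(text):
    """Return a list of {'path', 'message', 'code'} dicts from fciv.err text."""
    entries = []
    path = message = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Error msg"):
            message = line.split(":", 1)[1].strip()
        elif line.startswith("Error code"):
            # Assumption: codes are hexadecimal, as the sample suggests.
            code = int(line.split(":", 1)[1].strip(), 16)
            entries.append({"path": path, "message": message, "code": code})
            path = message = None
        elif line and not line.startswith(("*", "Command Line:")):
            # Path line, with or without the "HashAndStore -->" prefix.
            path = line.removeprefix("HashAndStore -->").strip()
            path = path.removesuffix(":").strip()
    return entries


def crc_failures(text):
    """Paths whose entry reports a cyclic redundancy check error."""
    return [e["path"] for e in parse_fciv_err(text) if e["code"] == ERROR_CRC]


for bad_path in crc_failures(SAMPLE):
    print(bad_path)
```

This needs Python 3.9 or later for `str.removeprefix` and `str.removesuffix`.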

I think the request in its simplest form is that PsFCIV, during or upon completing processing, should write an error file (similar to fciv.err) listing the file name and error message for any data it cannot successfully process. This wouldn't be limited to data errors: it could also cover files that were moved after the initial file/directory enumeration, or that are currently in use by another process.

I'd like to add that since I first opened this issue, I've learned how to process PsFCIV's XML output using XSL transformations into a file listing only the <FILE_ENTRY> elements that lacked <MD5> elements, which achieves my goal of identifying all corrupted files needing treatment. I am very appreciative of PsFCIV and am thankful for it as I've been looking for a solution to this problem for a while.
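
For anyone who would rather not use XSL, the same filter can be sketched in Python with the standard library. This is an illustration, not the transformation used above. The element names <FILE_ENTRY>, <name>, and <MD5> come from this thread, while the <FCIV> root element in the sample and the helper names are assumptions. Tags are compared by local name so the sketch works whether or not the database declares an XML namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical database fragment; only the element names quoted in this
# thread are taken from PsFCIV, the rest is illustrative.
SAMPLE_XML = """
<FCIV>
  <FILE_ENTRY>
    <name>good.bin</name>
    <Size>3</Size>
    <MD5>kAFQmDzST7DWlj99KOF/cg==</MD5>
  </FILE_ENTRY>
  <FILE_ENTRY>
    <name>bad.bin</name>
    <Size>1024</Size>
  </FILE_ENTRY>
</FCIV>
"""


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def entries_missing_hash(root, hash_tag="MD5"):
    """Names of FILE_ENTRY elements with no (or an empty) hash element."""
    missing = []
    for element in root.iter():
        if local_name(element.tag) != "FILE_ENTRY":
            continue
        # Map each child's local tag name to its stripped text.
        children = {
            local_name(child.tag): (child.text or "").strip()
            for child in element
        }
        if not children.get(hash_tag):
            missing.append(children.get("name", ""))
    return missing


for name in entries_missing_hash(ET.fromstring(SAMPLE_XML)):
    print(name)
```

For a database on disk, `ET.parse(path).getroot()` yields the root element to pass in.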
