Option to store uid, gid and perms in cache file #265

Closed
jonjacksonma opened this issue Feb 28, 2024 · 10 comments
Comments

@jonjacksonma

Hi.
It would be great if qdirstat-cache-writer had an option to store uid, gid and permissions in the cache file. The use case is a large shared filesystem where you typically want to know who owns the largest files/folders after identifying them, and where generating the cache file ahead of time via cron is necessary to load the tree in a reasonable time.
thanks,
Jon

@shundhammer
Owner

That would mean making the file format incompatible with older versions, which I am very reluctant to do: it is also in use for some backup software because it's so simple.

In environments where large shared filesystems are still a thing (I haven't seen many of those over the last 20 or so years), isn't directory ownership typically implicit with the path? Do you really have directory trees where various users create files and directories all over the place? Don't departments, teams and users get some subtree assigned where they have permissions to create their own files and directories?

@shundhammer
Owner

The file format specification:

https://github.com/shundhammer/qdirstat/blob/master/doc/cache-file-format.txt

Adding another three fields is pretty trivial, of course. It would be a numeric (!) UID, GID, and octal permissions. I recall something about a UID <-> user name mapping service for NFS for environments where they may be different on different machines; that would be out of the question.
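For illustration, here is a minimal Python sketch of how those three values can be read for a single file. It is not the code of qdirstat-cache-writer (which is Perl); the function name `ownership_fields` is hypothetical, and the sketch says nothing about where the fields sit in a cache file line.

```python
import os
import stat


def ownership_fields(path):
    """Return (numeric uid, numeric gid, octal permission string) for path.

    Hypothetical helper for illustration only. os.lstat() is used so that
    a symlink is described itself rather than its target.
    """
    st = os.lstat(path)
    perms = stat.S_IMODE(st.st_mode)  # drop the file-type bits, keep permission bits
    return st.st_uid, st.st_gid, format(perms, "04o")
```

The IDs stay numeric on purpose: no user or group name lookup is involved, which matches the "numeric (!)" requirement above.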

The file format already has a version number in its header, which helps to identify which parser to use. But QDirStat would need to retain backwards compatibility with the old file format; that makes the code a bit uglier.
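As a rough sketch of what "identify which parser to use" can look like, the Python below reads the version from the header line and dispatches on it. The header shape `[qdirstat 1.0 cache file]` is an assumption here (check the format specification linked above), and `parse_v1_line` / `parse_v2_line` are hypothetical placeholders, not QDirStat functions.

```python
import re

# ASSUMPTION: the header line looks like "[qdirstat 2.0 cache file]".
HEADER_RE = re.compile(r"^\[qdirstat (\d+)\.(\d+) cache file\]")


def detect_version(first_line):
    """Return (major, minor) parsed from the cache file header line."""
    match = HEADER_RE.match(first_line.strip())
    if match is None:
        raise ValueError("not a recognized cache file header: %r" % first_line)
    return int(match.group(1)), int(match.group(2))


def parse_v1_line(line):
    # Hypothetical placeholder for the old format's line parser.
    return {"format": 1, "fields": line.split()}


def parse_v2_line(line):
    # Hypothetical placeholder for the new format's line parser.
    return {"format": 2, "fields": line.split()}


def select_parser(first_line):
    """Pick a line parser based on the major version in the header."""
    major, _minor = detect_version(first_line)
    parsers = {1: parse_v1_line, 2: parse_v2_line}
    if major not in parsers:
        raise ValueError("unsupported cache file version: %d" % major)
    return parsers[major]
```

Keeping both parsers behind one dispatch point is one way to retain backwards compatibility without scattering version checks through the reader.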

@shundhammer
Owner

shundhammer commented Feb 29, 2024

Writing UID, GID and permissions works now in the Perl qdirstat-cache-writer, and the QDirStat binary can read them. Please check out and build the huha-cache-uid branch and do some initial testing.

Writing the new format with the QDirStat built-in cache writer was originally planned for the next day; it is now done.

Docs for the new file format here. The format has also become a bit prettier and easier to read for humans.

qdirstat-cache-writer now also has two new command line options to enforce the old format V1.0 with -1, and the new format V2.0 with -2 (for completeness; it's the new default).

@shundhammer
Owner

This is now merged to master.

@jonjacksonma
Author

jonjacksonma commented Mar 1, 2024

Thanks for adding these options, it will be extremely helpful. The new options -1 and -2 for qdirstat-cache-writer work perfectly. There seems to be an issue with -l though. When the cache file is read into the QDirStat GUI, there are spurious files with names that are 3 or 4 digit numbers and a size of 1.8GB.
[Screenshot: qdirstat-cache-write-l-option]

Tested with QDirStat 1.9.01-git

@shundhammer
Owner

Oops... that was a missing whitespace delimiter in that long format between the type and the name/full path; so both were conflated into what appeared to be one single field like F/foo/bar/myfile, and the subsequent fields on that line all moved up one position.
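The effect is easy to reproduce with a toy parser. The sketch below assumes a deliberately simplified line layout of `type path size mtime` with a hex mtime; that layout and the function name `parse_fields` are assumptions for illustration, not the real cache file format.

```python
def parse_fields(line):
    """Split a whitespace-delimited line and read fields by position.

    ASSUMPTION: simplified layout "type path size mtime"; the real
    format has more fields.
    """
    fields = line.split()
    return {
        "type": fields[0],
        "path": fields[1],
        "size": int(fields[2], 0),  # base 0 accepts both "4096" and "0x..."
    }


good = parse_fields("F /foo/bar/myfile 4096 0x65e1d6ff")
bad = parse_fields("F/foo/bar/myfile 4096 0x65e1d6ff")  # delimiter missing

print(good)  # type "F", path "/foo/bar/myfile", size 4096
print(bad)   # type "F/foo/bar/myfile", path "4096", size 1709299455
```

With the delimiter missing, the real size ends up in the path position (a "file" named `4096`) and the next field is read as the size, which is the kind of spurious entry described above.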

This is now fixed.

@shundhammer
Owner

BTW you can easily look into a cache file, even if it's gzipped: Just use zless, zcat or zgrep. You don't even need to gunzip it.
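If you would rather inspect a cache file from a script than with zless or zcat, the same idea can be sketched in Python. The helper names `open_cache` and `head` are hypothetical; the only assumption about the file is that it is either plain text or gzip-compressed text.

```python
import gzip
from itertools import islice


def open_cache(path):
    """Open a cache file as text, transparently handling gzip."""
    with open(path, "rb") as raw:
        magic = raw.read(2)
    if magic == b"\x1f\x8b":  # gzip magic number
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, "rt", encoding="utf-8", errors="replace")


def head(path, n=10):
    """Return the first n lines of the cache file without trailing newlines."""
    with open_cache(path) as f:
        return [line.rstrip("\n") for line in islice(f, n)]
```

Checking the magic bytes instead of the file extension means the same call works whether or not the file name ends in `.gz`.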

@shundhammer
Owner

Fun fact: That whole thing moved the fields of the affected lines one position up, so some other field was interpreted as the size; and time_t of today is around 0x65e1d6ff (seconds since 1970-01-01 00:00) which translates to just about 1.6 GB. :-)
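The arithmetic can be checked in a few lines of Python. This only uses the hex value quoted above; dividing by 2**30 (i.e. reading "GB" as GiB) is an assumption about how the size was displayed.

```python
from datetime import datetime, timezone

ts = 0x65E1D6FF  # the time_t value quoted above

# Interpreted as a timestamp: seconds since 1970-01-01 00:00 UTC.
when = datetime.fromtimestamp(ts, tz=timezone.utc)

# Interpreted as a file size in bytes, shown in units of 2**30 bytes.
as_gib = ts / 2**30

print(ts)                 # 1709299455
print(when.isoformat())   # 2024-03-01T13:24:15+00:00
print(f"{as_gib:.2f}")    # 1.59
```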

@jonjacksonma
Author

I was wondering how the fields ended up mapped... Thanks for the fix.

This doesn't affect functionality at all, but while testing I noticed that the cache file itself gets listed in the cache with a size of 257 bytes, i.e. its size while it's still being written. Excluding the cache file from the listing could be a cosmetic improvement.
[Screenshot: qdirstat-cache-file]

@shundhammer
Owner

Well, it's there in that directory, so of course it will be listed. And yes, of course this is just a snapshot in time, and a moment later the size may be different; like with all files on a modern OS.
