rpm2archive -f pax cannot handle utf8 filenames #2972

Open
mlschroe opened this issue Mar 14, 2024 · 13 comments

@mlschroe
Contributor

It fails because it wants to convert the filenames to utf8:

$ echo $LC_CTYPE
de_DE@euro
$ rpm2archive /usr/src/packages/RPMS/x86_64/empty-3.0.0-1.x86_64.rpm > /dev/null
Error writing archive: Can't translate pathname './fooöo' to UTF-8 (84)

@mlschroe
Contributor Author

In this case archive_write_header() returns ARCHIVE_WARN, which rpm2archive treats as an error. OTOH I don't think libarchive should mess with the file names, so maybe it makes sense to set the hdrcharset to BINARY for pax. But that adds a "hdrcharset=BINARY" attribute that GNU tar complains about.

@mlschroe
Contributor Author

Oh, and the error handling in rpm2archive is completely broken...

@mlschroe
Contributor Author

Btw, it cannot handle UTF8 filenames either, as it checks the current locale, which is not initialized and is thus 7-bit ASCII...
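
To illustrate the failure mode described above, here is a minimal Python sketch that models a "locale charset to UTF-8" conversion. It is only an analogy: `translate_to_utf8` is a hypothetical helper, not libarchive's code, and the uninitialized C locale is modeled as plain ASCII.

```python
def translate_to_utf8(raw_name: bytes, locale_charset: str) -> bytes:
    """Interpret raw_name in locale_charset and re-encode it as UTF-8.

    Hypothetical model of a locale-dependent pathname conversion;
    raises UnicodeDecodeError when the bytes do not fit the charset.
    """
    return raw_name.decode(locale_charset).encode("utf-8")


# The file name from the report, stored as UTF-8 bytes.
raw = "./fooöo".encode("utf-8")

# With a UTF-8 charset the bytes pass through unchanged.
assert translate_to_utf8(raw, "utf-8") == raw

# With a 7-bit charset the non-ASCII bytes cannot be interpreted at all.
try:
    translate_to_utf8(raw, "ascii")
except UnicodeDecodeError as exc:
    print(f"Can't translate pathname: {exc.reason}")
```

The point of the sketch is that the input bytes are already valid UTF-8; the conversion fails only because the assumed source charset is too narrow.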

@mlschroe
Contributor Author

I find it very surprising that bsdtar's output depends on the current locale, but that seems to be the case:

$ echo hello > micro_µ
$ bsdtar -cf - . | bsdtar -tf -
./
./micro_µ
$ LC_CTYPE=de_DE@euro bsdtar -cf - . | bsdtar -tf -
./
./micro_µ
$ LC_CTYPE=de_DE@euro bsdtar --options hdrcharset=BINARY -cf - . | bsdtar -tf -
./
./micro_µ

@mlschroe
Contributor Author

It's not much work to not use libarchive for writing. The only two formats that can be used for archive writing are cpio and pax (all the others have too many limitations). Writing cpio is easy and writing a pax tar file is also not hard (reading a tar file is where it gets really messy because of all the different implementations).
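
As a rough illustration of why the writing side of pax is manageable, the sketch below builds a single pax extended header record of the form `<length> <keyword>=<value>\n`, where the length counts the whole record including its own digits. `pax_record` is a hypothetical helper written for this example; it is not code from rpm or libarchive and covers only the record encoding, not the surrounding tar blocks.

```python
def pax_record(keyword: str, value: str) -> bytes:
    """Encode one pax extended header record with UTF-8 text."""
    body = f" {keyword}={value}\n".encode("utf-8")
    # The length prefix includes its own decimal digits, so iterate
    # until the total stops changing.
    total = len(body)
    while True:
        candidate = len(body) + len(str(total))
        if candidate == total:
            break
        total = candidate
    return str(total).encode("ascii") + body


print(pax_record("path", "micro_µ"))
print(pax_record("hdrcharset", "BINARY"))
```

The second call shows what the "hdrcharset=BINARY" attribute mentioned earlier in the thread looks like as a record.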

Is that something you would be interested in?

@pmatilai
Member

rpm2archive could use some love for sure, but I'd rather not teach it about format internals. That's just the kind of thing I'd rather outsource to somebody else, like libarchive. If it doesn't do what we want it to do, then let's at least look at fixing it instead of doing it in rpm; it'd benefit way more people too.

As for the encoding, I think here's a case for RPMTAG_ENCODING: if that's present and says utf-8 (upstream rpm will never put anything else there) we can safely assume utf-8. Anything else is a legacy case and if the easiest solution is to just say "BINARY" encoding then that's fine with me. Or does "GNU tar complains" mean it actually fails entirely rather than just warn?
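
The decision rule proposed here can be sketched as a small function. This is only an illustration of the idea at this point in the thread: `choose_hdrcharset` and its string-valued argument are assumptions made for the example, and the return values are just labels for the two cases.

```python
from typing import Optional


def choose_hdrcharset(encoding_tag: Optional[str]) -> str:
    """Pick a pax header charset from a package's encoding tag value.

    encoding_tag is the hypothetical string value of RPMTAG_ENCODING,
    or None when the tag is absent (the legacy case).
    """
    if encoding_tag is not None and encoding_tag.strip().lower() == "utf-8":
        return "UTF-8"   # names can safely be assumed to be UTF-8
    return "BINARY"      # legacy package: pass the bytes through untouched


print(choose_hdrcharset("utf-8"))
print(choose_hdrcharset(None))
```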

@mlschroe
Contributor Author

It just warns about the unknown attribute.

@mlschroe
Contributor Author

And this is about file names, I think "upstream rpm" treats those pretty much as binary as they are created by the build process and not part of the spec file.

@pmatilai
Member

Right. So the warning would only be seen by folks who run rpm2archive to convert legacy rpms to tar - ie a rare corner case really. A harmless warning from gnu tar in that case is quite acceptable to me at least.

@mlschroe
Contributor Author

Why "legacy"? Does the current code reject non-utf8 file names?

@pmatilai
Member

pmatilai commented Mar 18, 2024

It does, by default. For many years now.

Looking closer: we turned it into an error five years ago; before that it was a warning for a similar period of time. It's still macro-overridable, of course, for v4 packages.

@mlschroe
Contributor Author

That makes things a bit easier, so we just need to teach libarchive that it should accept utf8. I'll adapt the title of this issue ;-)

@mlschroe changed the title from "rpm2archive -f pax cannot handle non-utf8 filenames" to "rpm2archive -f pax cannot handle utf8 filenames" Mar 18, 2024
@mlschroe
Contributor Author

I'll open a pull request for this.

mlschroe added a commit to mlschroe/rpm that referenced this issue Mar 25, 2024
Our headers are always using utf8 and the pax standard also requires
utf8 strings. So do this nasty little locale switching to make
libarchive not depend on the active locale.

Fixes issue rpm-software-management#2972
pmatilai pushed a commit that referenced this issue Mar 28, 2024
Our headers are always using utf8 and the pax standard also requires
utf8 strings. So do this nasty little locale switching to make
libarchive not depend on the active locale.

Fixes issue #2972
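
The locale switching described in the commit message can be pictured with a Python analogue. This is not the code from the commit: `utf8_ctype` is a hypothetical context manager, and the candidate locale names are assumptions that may not exist on every system.

```python
import contextlib
import locale


@contextlib.contextmanager
def utf8_ctype(candidates=("C.UTF-8", "en_US.UTF-8")):
    """Temporarily switch LC_CTYPE to a UTF-8 locale, then restore it.

    Yields the name of the locale that was activated, or None if none
    of the candidates is installed (the locale is then left unchanged).
    """
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    active = None
    for name in candidates:
        try:
            active = locale.setlocale(locale.LC_CTYPE, name)
            break
        except locale.Error:
            continue
    try:
        yield active
    finally:
        # Always put the caller's locale back, even on errors.
        locale.setlocale(locale.LC_CTYPE, saved)


with utf8_ctype() as active:
    # Locale-sensitive archive writing would happen here.
    print("LC_CTYPE during write:", active)
```

Note that `locale.setlocale` changes process-wide state, so a switch like this is only safe when no other thread depends on the locale at the same time.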