Add Void Linux repository support #400
This data is not suitable for repology. There are incompatible versions with -N suffix (revision?), and there are subpackages (-32bit, -devel-32bit, -doc, etc.) which cannot be compared to anything. Having a lot of similar packages for all architectures is also a problem. See also #158. |
The revisions can easily be scratched (I believe Should I then move the architectures to a flavour, would it still be possible to include voidlinux? |
This allows parsing multiple independent (but overlapping) subrepositories such as "i386" and "amd64", and appending a flavor based on the repository name, so if a package has different versions for different architectures, older versions would be considered outdated and not legacy. Related to repology#400
Yes. See
You can't, and I don't understand how it's going to help you. However, as I see there's
No, the more correct would be to add flavor based on repository name. I've just added code to allow that.
As long as the data is of sufficient quality, but that can only be seen after proper parsing is implemented. |
repology/parsers/voidlinux.py
Outdated
@@ -0,0 +1,46 @@ | |||
# Copyright (C) 2016-2017 Dmitry Marakasov <[email protected]> |
Please set proper copyright with your name,
repology/parsers/voidlinux.py
Outdated
from repology.package import Package | ||
|
||
|
||
def parse_maintainer(maintainerstr): |
There's already a `from repology.util import GetMaintainers` tool.
repology/parsers/voidlinux.py
Outdated
plist_index = plistlib.load(open(index_path, 'rb'), | ||
fmt=plistlib.FMT_XML) | ||
|
||
return [Package(name=pkgname, |
Please fill fields one by one. The plans are to hide Package fields behind setters to catch incorrect data on assignment.
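A minimal sketch of what filling fields one by one might look like. The `Package` class below is a hypothetical stand-in, not repology's real class, and the sample index with its `pkgver` and `short_desc` keys is an assumed layout used only for illustration.

```python
import plistlib

# Assumed sample of an XML plist index: package name -> properties.
INDEX_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
  <key>foo</key>
  <dict>
    <key>pkgver</key><string>foo-1.0_2</string>
    <key>short_desc</key><string>Example package</string>
  </dict>
</dict>
</plist>
"""


class Package:
    """Hypothetical stand-in for repology.package.Package."""

    def __init__(self):
        self.name = None
        self.comment = None


def parse_index(data):
    """Build one Package per index entry, assigning fields one at a time."""
    index = plistlib.loads(data, fmt=plistlib.FMT_XML)
    packages = []
    for pkgname, props in index.items():
        pkg = Package()
        pkg.name = pkgname
        # .get() keeps the parser tolerant of entries without a description.
        pkg.comment = props.get('short_desc')
        # pkgver is left unparsed here; version handling is a separate concern.
        packages.append(pkg)
    return packages
```

Assigning fields individually, rather than passing everything to the constructor, leaves room for per-field validation if the fields later become setters.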
repology/parsers/voidlinux.py
Outdated
fmt=plistlib.FMT_XML) | ||
|
||
return [Package(name=pkgname, | ||
version=props['pkgver'].split('-')[-1].replace('_', '-'), |
Not sure if that's safe: `-` may be encountered in a version. I'd ensure that it starts with the package name and strip `pkgname-` from the `pkgver`. Don't forget to strip the revision.
`-` may never be in a version.
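The approach suggested in this thread (check the package name prefix, strip it, then drop the revision) could be sketched roughly as below. It assumes `pkgver` has the form `<pkgname>-<version>_<revision>`; the function name `split_pkgver` is hypothetical and not part of repology.

```python
def split_pkgver(pkgname, pkgver):
    """Return (version, revision) from a pkgver string.

    Assumes the format '<pkgname>-<version>_<revision>'; revision is None
    when no '_' separator is present.
    """
    prefix = pkgname + '-'
    if not pkgver.startswith(prefix):
        raise ValueError(
            'pkgver {!r} does not start with {!r}'.format(pkgver, prefix))

    # Strip the known name prefix instead of splitting on '-', so the
    # result does not depend on how many dashes the name contains.
    remainder = pkgver[len(prefix):]

    # The revision is whatever follows the last underscore.
    version, sep, revision = remainder.rpartition('_')
    if not sep:
        return remainder, None
    return version, revision
```

Stripping the prefix works even when the package name itself contains dashes, and it does not rely on whether `-` can appear in a version.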
repos.d/voidlinux.yaml
Outdated
########################################################################### | ||
# Voidlinux | ||
########################################################################### | ||
- name: voidlinux |
Is the `linux` suffix necessary everywhere? I'd prefer to drop it for simplicity.
repos.d/voidlinux.yaml
Outdated
packagelinks: | ||
- desc: Git | ||
url: 'https://github.com/voidlinux/void-packages/tree/master/srcpkgs/{name}' | ||
tags: [ all, production ] |
Please add a newline
repos.d/voidlinux.yaml
Outdated
- desc: Void Linux package repository | ||
url: https://github.com/voidlinux/void-packages | ||
packagelinks: | ||
- desc: Git |
You may also add another link to the template
repos.d/voidlinux.yaml
Outdated
fetcher: WgetTar | ||
parser: VoidLinux | ||
url: https://repo.voidlinux.eu/current/{source}-repodata | ||
subrepo: '{source}' |
+ flavor: '{source}'
as mentioned in the comment
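As a rough illustration of the per-source expansion with a flavor added, here is a sketch that uses plain `str.format` on the URL template from the diff. This is an assumption about the mechanism, not repology's actual implementation; the function name `expand_sources` and the source names in the usage note are examples only.

```python
URL_TEMPLATE = 'https://repo.voidlinux.eu/current/{source}-repodata'


def expand_sources(source_names):
    """Expand the '{source}' placeholder once per source name.

    Each resulting entry carries the same name as both subrepo and flavor,
    mirroring the suggested `flavor: '{source}'` line.
    """
    return [
        {
            'url': URL_TEMPLATE.format(source=name),
            'subrepo': name,
            'flavor': name,
        }
        for name in source_names
    ]
```

For example, `expand_sources(['x86_64', 'i686'])` yields two entries whose URLs, subrepos and flavors differ only by the source name.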
I still don't like having this many repos. This produces excessive and duplicate package information which both slows down processing and makes website views cumbersome. Since you are, I assume, a Void Linux user, can't you set up native tools to produce a machine-readable index of source packages out of |
I did something parsing |
I meant preparing information on Void source packages outside repology. Probably somewhere on Void Linux infrastructure. |
The void-packages repository is simple: you list main packages by listing regular directories (not symlinks) and adding the python3 symlinks. Packages do not exist on user systems in any way not merely from the repodata (including no subpkg vs. pkg distinction). |
ef25b5c to 11aea6e
Latest version combines the void-package repository and the repodata. This way we can be sure which are actual packages and which are subpackages, while at the same time avoiding heuristics for parsing the template files. For this to work, I had to write my own fetcher, and I supplied some extra variables to the |
Looks overcomplicated now. Why can't |
I've tried it. The data looks mostly good, however for some reason |
I've also noticed some subpackages with a broken comment (it seems to have been replaced by the suffix that was intended to be appended):
Could it be that some packages need metadata reparsing? |
See my branch: https://github.com/repology/repology/tree/voidlinux |
@ebfe from #voidlinux has a proof of concept json dump (https://gist.github.com/ebfe/e6ff186811cd207b884a9d7cff9d4e8a), I'd prefer to use it if it can be set up for automated regular updates anytime soon. |
Another option: output from
Package is a subpackage if it lists itself in |
Repology cannot run any scripts as it's intended to be portable. We need a ready to use dump. |
We are rebuilding packages to generate missing We moved website to https://voidlinux.org , github to https://github.com/void-linux/void-packages and repository to https://alpha.de.repo.voidlinux.org/current/ . |
@AMDmi3 Indexes are now regenerated for |
Let's recap. The code is a year old, rotten, and needs to be rewritten from scratch. I no longer remember the details so I have no idea what |
Just a useful metadata field missing then (details).
Bothers me too, but the parser is based on repodata files, not a directory listing.
We fixed the repo, as you were about to accept it.
If it is a problem, I am sure there is a way to handle that: keeping entries non-duplicated in the database, having a set of names of already-registered packages in the parser, or something else.
You are right, there is none. |
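The "set of names of already-registered packages" idea could be sketched as follows. The function name `dedupe_by_name` and the dict-shaped entries are hypothetical, chosen only to keep the example self-contained.

```python
def dedupe_by_name(entries):
    """Keep only the first entry seen for each package name."""
    seen = set()
    unique = []
    for entry in entries:
        name = entry['name']
        if name in seen:
            # A package with this name was already registered; skip it.
            continue
        seen.add(name)
        unique.append(entry)
    return unique
```

Note that which duplicate survives depends purely on iteration order, so if different sources carry different versions, the result varies with the order in which the sources are parsed.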
I cannot maintain it unless I see the repository contents at least. Transparency is an absolute requirement.
As time passes and more repositories are supported, quality requirements rise, as each problematic repository creates more work for me and more noise for the users.
Not an option, as this introduces inconsistencies. However, we could stick with e.g. |
The code I originally PR'd is indeed disapproved, and was no longer even discussed here. |
Fine by me. Is the directory listing the only major issue then (though I do not understand why it is an issue)? |
For now, yes. I still need to check the data. |
Seems like there's some serious confusion here about how xbps works and how our mirror infrastructure works. First, there are multiple repodata files because there are multiple repositories. If you don't parse them then you won't have the complete view of the universe. If you're okay having massive holes in the dataset, then just parse a handful of files. Second, the current/ directory is for machines, not humans. The page is there to prevent you from loading a page with 10k+ links on it which your browser will try to prefetch for validity. If you believe you actually want a file list then get it from rsync, which is designed to do that, or get yourself netblock banned from someone else's mirror. There's no transparency in that list anyway; it's pretty trivial to filter for specific UAs and give back a different list there. |
If you need to check the data, you can compose a URL to any package in question yourself. This is straightforward, and the package is provided as a tarball (by a different extension, but a tarball). |
There is no confusion, I understand that fully. But since there is no source package (e.g. template) metadata, and I'm not going to just dump a whole bunch of duplicates (archs + musl + multilib) for each package into repology, that's what we'll have to live with.
The completeness of the dataset should bother you, as a primary service consumer, in the first place. And I see no other way to improve that than void publishing source metadata. Which was already demonstrated by different people, but never finished.
I need to see what I'm working with to keep it working and improve it. Mirrors do not help as they all serve the stub instead of a listing. It would be OK if it worked with curl. |
I am not a consumer, I'm the person on the other end who's dealing with people hammering mirrors rather than parsing the machine readable data.
Then summon the listing via rsync as suggested. |
How would you prefer to get all the data from archs, musl, nonfree? Since any format will have duplicates that don't share every quality (same pkgname version, different arch) |
Example commandline please? |
I'd prefer not to deal with binary package data at all. |
I suggest piping that to less. |
Well, if you don't deal with the binary package data then you'll get bogus data. Not all packages sitting in the webroot are in the current generation of the tree. |
Thanks, it needs a trailing slash, but it's acceptable for my purpose.
It's not bogus, it's just the discrepancy between source and binary packages. Ideally, we need knowledge of both, so we can both tell maintainers which packages need work (tied to commits), and tell users which packages are already available for them (tied to package builds). Software authors are probably interested in both. However, for now we focus on maintainers, who want the outdated status of their packages to be cleared as soon as possible after they commit the update. The lag of actual package builds is expected and considered tolerable; no one has complained about that so far. |
As long as you understand that the tree probably doesn't resolve at all times, I guess that's fine. I'm not sure how your system works then without referential integrity checks. You should probably pick another mirror though to pull that rsync listing from, unless of course your machines are in Germany. |
@AMDmi3 there are often multiple versions of a package in the physical directory, though only one in the repo index. The only thing that matters to users or maintainers is what's in the index. Everything else is irrelevant. |
To be clear. The repo index has nothing to do with the theoretical packages we have. It has EVERYTHING to do with the packages that people can actually install at that very moment. |
That's definitely not true for maintainers as they commit to the repository and the state of repository is what is relevant to them. |
@AMDmi3 As someone with close knowledge about Void operations, you should know that the only thing that matters is what's in the repo index, and that repo index is authoritative. |
Nope. When a template is updated and the binary index is not updated for longer than the usual processing time, it is a clear signal that the template is broken in some way, causing the new version of the program to be unavailable to users. Therefore it is not important what the template contains. The binary index is. |
Well, I don't see a point in dissuading you. The fact is that without source metadata, void support will always be lacking. Meanwhile, I've updated the code and ran a local update, and it still looks good. Will commit after adding some rules. |
It's deployed, enjoy. |
@Chocimier JFYI, there are some minor parsing problems which could be worth fixing, including misformatted URLs and a couple of missing source-revisions. |
Missing source-revisions come from packages that are troublesome to rebuild, as I mentioned before. Homepages are fixed now, thanks for the info.
I could use some feedback on `repos.d/voidlinux.yaml`, mainly the sources. I used a lot of subrepos because each source might have a different version, but perhaps I should use flavours for this? I'm not sure what's the accepted practice.

cc @Vaelatern