Add Void Linux repository support #400

Closed
wants to merge 7 commits into from

Conversation

ninewise

I could use some feedback on repos.d/voidlinux.yaml, mainly the sources. I used a lot of subrepos because each source might have a different version, but perhaps I should use flavours for this? I'm not sure what's the accepted practice.

cc @Vaelatern

@AMDmi3
Member

AMDmi3 commented Nov 29, 2017

This data is not suitable for repology. There are incompatible versions with -N suffix (revision?), and there are subpackages (-32bit, -devel-32bit, -doc, etc.) which cannot be compared to anything. Having a lot of similar packages for all architectures is also a problem. See also #158.

@ninewise
Author

The revisions can easily be scratched (I believe Package.origversion could be used to refer to them?) and it shouldn't be hard to remove subpackages (provided I can get access to both a clone of the repository and the repodata file in the same parser?).

Should I then move the architectures to a flavour, would it still be possible to include voidlinux?

This allows parsing multiple independent (but overlapping)
subrepositories such as "i386" and "amd64", and append a flavor
based on repository name, so if a package has different versions
for different architectures, older versions would be considered
outdated and not legacy.

Related to repology#400
@AMDmi3
Member

AMDmi3 commented Nov 29, 2017

The revisions can easily be scratched (I believe Package.origversion could be used to refer to them?)

Yes. See repology/parsers/freebsd.py for the SanitizeVersion pattern used for this purpose. Please stick to it, as it would make refactoring easier in the future.
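For illustration, a SanitizeVersion-style helper for Void could look like the sketch below. It is not the code from repology/parsers/freebsd.py; the function name, the return shape (cleaned version plus the original string, or None when nothing changed) and the assumption that a Void revision is a trailing _N (as in 10.81.0_1) are all assumptions made for this example.

```python
import re

# Assumption: a Void revision is a trailing "_<digits>" on the version string.
_REVISION_RE = re.compile(r'_[0-9]+$')


def sanitize_version(version):
    """Strip the revision; return (version, origversion).

    origversion is None when nothing was stripped, so callers can tell
    whether the original string needs to be preserved separately.
    """
    origversion = version
    version = _REVISION_RE.sub('', version)
    if version != origversion:
        return version, origversion
    return version, None
```

The second element of the result is what would be stored in Package.origversion, so the revision is hidden from comparison but not lost.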

and it shouldn't be hard to remove subpackages (provided I can get access to both a clone of the repository and the repodata file in the same parser?).

You can't, and I don't understand how it's going to help you. However, I see there's a source-revisions field which may be useful: you could split it by : and set package.effname to the first part right away. That way, Void packages will be assigned to the correct metapackage from the start, and information on the package name will still be preserved.
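A minimal sketch of that suggestion, assuming source-revisions holds a string of the form name:revision (the helper name and the sample values are hypothetical; the exact contents of the field are not shown in this thread):

```python
def source_name_from_revisions(source_revisions):
    """Return the part of source-revisions before the first ':'.

    Returns None when the field is missing or empty, since the field
    may be absent for some packages.
    """
    if not source_revisions:
        return None
    name, _, _ = source_revisions.partition(':')
    return name or None
```

A parser could then assign the result to package.effname while leaving package.name set to the binary package name.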

Should I then move the architectures to a flavour

No, the more correct approach would be to add a flavor based on the repository name. I've just added code to allow that.

would it still be possible to include voidlinux?

As long as the data is of sufficient quality, but that can only be seen after proper parsing is implemented.

@@ -0,0 +1,46 @@
# Copyright (C) 2016-2017 Dmitry Marakasov <[email protected]>
Member

Please set a proper copyright with your name.

from repology.package import Package


def parse_maintainer(maintainerstr):
Member

There's already a tool for this: from repology.util import GetMaintainers.

plist_index = plistlib.load(open(index_path, 'rb'),
fmt=plistlib.FMT_XML)

return [Package(name=pkgname,
Member

Please fill fields one by one. The plan is to hide Package fields behind setters to catch incorrect data on assignment.
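As a rough sketch of what filling fields one by one might look like: the Package class below is a stand-in written for this example, not repology.package.Package, and the short_desc key is assumed to be present in the index alongside pkgver.

```python
import plistlib


class Package:
    """Stand-in for repology.package.Package (hypothetical, illustration only)."""

    def __init__(self):
        self.name = None
        self.version = None
        self.comment = None


def load_index(index_path):
    # Use a context manager so the file is closed after parsing.
    with open(index_path, 'rb') as fileobj:
        return plistlib.load(fileobj, fmt=plistlib.FMT_XML)


def packages_from_index(index):
    """Build one Package per index entry, assigning each field separately."""
    packages = []
    for pkgname, props in index.items():
        pkg = Package()
        pkg.name = pkgname
        pkg.version = props['pkgver']          # raw value, not yet sanitized
        pkg.comment = props.get('short_desc')  # assumed key; may be missing
        packages.append(pkg)
    return packages
```

Assigning fields individually means that, once setters exist, a bad value fails at the exact assignment that produced it rather than inside one large constructor call.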

fmt=plistlib.FMT_XML)

return [Package(name=pkgname,
version=props['pkgver'].split('-')[-1].replace('_', '-'),
Member

Not sure if that's safe: '-' may be encountered in a version. I'd ensure that pkgver starts with the package name and strip pkgname- from it. Don't forget to strip the revision.

'-' may never be in a version.
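The prefix-based approach from the review comment can be sketched as follows. The helper name is hypothetical, and the revision is assumed to be a trailing _N on pkgver (for example netpbm-10.81.0_1); the point is that stripping the known package name works regardless of whether a hyphen can appear in a version.

```python
def split_pkgver(pkgname, pkgver):
    """Split pkgver into (version, revision) using the known package name.

    Raises ValueError when pkgver does not start with "<pkgname>-".
    revision is None when no trailing "_<digits>" is present.
    """
    prefix = pkgname + '-'
    if not pkgver.startswith(prefix):
        raise ValueError('pkgver {!r} does not start with {!r}'.format(pkgver, prefix))
    full = pkgver[len(prefix):]
    head, sep, revision = full.rpartition('_')
    if sep and revision.isdigit():
        return head, revision
    return full, None
```

Unlike split('-')[-1], this also handles package names that themselves contain hyphens, such as libnetpbm-devel.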

###########################################################################
# Voidlinux
###########################################################################
- name: voidlinux
Member

Is `linux' suffix necessary everywhere? I'd prefer to drop it for simplicity.

packagelinks:
- desc: Git
url: 'https://github.com/voidlinux/void-packages/tree/master/srcpkgs/{name}'
tags: [ all, production ]
Member

@AMDmi3 AMDmi3 Nov 29, 2017

Please add newline

- desc: Void Linux package repository
url: https://github.com/voidlinux/void-packages
packagelinks:
- desc: Git
Member

You may also add another link to the template.

fetcher: WgetTar
parser: VoidLinux
url: https://repo.voidlinux.eu/current/{source}-repodata
subrepo: '{source}'
Member

+ flavor: '{source}' as mentioned in the comment

@AMDmi3
Member

AMDmi3 commented Nov 29, 2017

I still don't like having this many repos. This produces excessive and duplicate package information which both slows down processing and makes website views cumbersome. Since you are, I assume, a Void Linux user, can't you set up native tools to produce a machine-readable index of source packages out of the void-packages repository? That would make parsing easier, as the unmangled package name and version would be available, and it would avoid duplicate packages.

@ninewise
Author

I did something parsing void-packages in my first commit, db19fa8, but I assumed it'd be better to avoid calling the shell in a subprocess. I could revert to that method again if that's preferred.

@AMDmi3
Member

AMDmi3 commented Nov 29, 2017

I meant preparing information on Void source packages outside Repology, probably somewhere on Void Linux infrastructure.

@Vaelatern

The void-packages repository is simple: you list main packages by listing regular directories (not symlinks) and adding the python3 symlinks.

On user systems, packages exist only as the repodata describes them (there is no subpkg vs. pkg distinction).
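The directory-listing rule described above can be sketched like this, assuming a local checkout of void-packages with a srcpkgs directory (the function name is hypothetical). It covers only the "regular directories, not symlinks" part; the python3 symlinks mentioned above are not handled because the thread does not spell out how to select them.

```python
import os


def list_main_packages(srcpkgs_dir):
    """Return names of real directories in srcpkgs_dir, skipping symlinks.

    Symlinked entries are treated as subpackages pointing at their main
    package and are therefore left out.
    """
    names = []
    with os.scandir(srcpkgs_dir) as entries:
        for entry in entries:
            # follow_symlinks=False makes a symlink to a directory return False.
            if entry.is_dir(follow_symlinks=False):
                names.append(entry.name)
    return sorted(names)
```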

ninewise force-pushed the voidlinux branch 3 times, most recently from ef25b5c to 11aea6e, November 30, 2017 13:24
@ninewise
Author

The latest version combines the void-packages repository and the repodata. This way we can be sure which are actual packages and which are subpackages, while at the same time avoiding heuristics for parsing the template files.

For this to work, I had to write my own fetcher, and I supplied some extra variables to the sources to avoid hardcoding URLs and flavors. This version creates far fewer packages than the previous one, as it uses flavors for the architectures.

@AMDmi3
Member

AMDmi3 commented Nov 30, 2017

Looks overcomplicated now. Why can't source-revisions be used?

@AMDmi3
Member

AMDmi3 commented Nov 30, 2017

I've tried it. The data looks mostly good, however for some reason source-revisions is missing for a fraction of packages. Is there a reason for it?

@AMDmi3
Member

AMDmi3 commented Nov 30, 2017

I've also noticed some subpackages with a broken comment (it seems to have been replaced by the suffix that was meant to be appended):

Nov 30 17:13:58 voidlinux:     sanity warning: libgadu-devel: comment is not stripped: " - development files"
Nov 30 17:13:59 voidlinux:     sanity warning: libsoxr-devel: comment is not stripped: " - development files"
Nov 30 17:13:59 voidlinux:     sanity warning: libsoxr-doc: comment is not stripped: " - documentation and examples"
Nov 30 17:14:00 voidlinux:     sanity warning: luaposix51: comment is not stripped: " - Lua 5.1"
Nov 30 17:14:00 voidlinux:     sanity warning: luaposix52: comment is not stripped: " - Lua 5.2"

Could it be that some packages need metadata reparsing?

@AMDmi3
Member

AMDmi3 commented Nov 30, 2017

See my branch: https://github.com/repology/repology/tree/voidlinux
The only problem left with it is packages without source-revisions. If this can be fixed on the Void side, I'm merging it. Otherwise, we may investigate parsing raw template files with the bashlex module I've just stumbled upon.

@AMDmi3
Member

AMDmi3 commented Nov 30, 2017

@ebfe from #voidlinux has a proof of concept json dump (https://gist.github.com/ebfe/e6ff186811cd207b884a9d7cff9d4e8a), I'd prefer to use it if it can be set up for automated regular updates anytime soon.

@Chocimier

Chocimier commented Jan 6, 2018

Another option: the output of xbps-src, which is included in the git repository, is quite readable:

~/github/void-packages ./xbps-src show netpbm
pkgname:        netpbm
version:        10.81.0
revision:       1
distfiles:      https://github.com/chneukirchen/netpbm-mirror/archive/a5fb397b3ab3ac6b2c17c72598036bedbdca1816.tar.gz
distfiles:      https://github.com/chneukirchen/netpbm-mirror/archive/ad3acab4d1ef5acd4a232c2bec3590620cd0edea.tar.gz
checksum:       ba1a03998d920f33a42450f583cd9024c90bb986c8b6a4ab7244d660deef593c
checksum:       55678f15a2ade5ed4c75f988a05296c895042e58a8c24829929562b3344a5349
maintainer:     Leah Neukirchen <[email protected]>
Upstream URL:   http://netpbm.sourceforge.net/
License(s):     BSD,GPL-2,custom
short_desc:     Toolkit for manipulation of graphic images
subpackages:    libnetpbm-devel
subpackages:    libnetpbm

A package is a subpackage if it lists itself in the subpackages field.
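To show how such output could be consumed once it has been captured as text, here is a small parsing sketch. The function names are hypothetical, and it only assumes the "key: value" layout visible in the sample above, with repeated keys collected into lists.

```python
def parse_show_output(text):
    """Parse "key: value" lines into a dict mapping each key to a list of values."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so URLs in the value stay intact.
        key, sep, value = line.partition(':')
        if not sep:
            continue
        fields.setdefault(key.strip(), []).append(value.strip())
    return fields


def is_subpackage(name, fields):
    """Apply the rule above: a package is a subpackage if it lists itself."""
    return name in fields.get('subpackages', [])
```

With the netpbm sample, fields['subpackages'] would contain libnetpbm-devel and libnetpbm, so netpbm itself is classified as a main package.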

@AMDmi3
Member

AMDmi3 commented Jan 7, 2018

Repology cannot run any scripts, as it's intended to be portable. We need a ready-to-use dump.

@Chocimier

We are rebuilding packages to generate the missing source-revisions; more than half are done. Note that for some packages the field contains a trailing /template.

We moved the website to https://voidlinux.org, GitHub to https://github.com/void-linux/void-packages and the repository to https://alpha.de.repo.voidlinux.org/current/.

@Chocimier

@AMDmi3 Indexes are now regenerated for source-revisions. There are a few leftovers that can't be easily fixed, but I hope the overall state is acceptable.

@AMDmi3
Member

AMDmi3 commented Dec 6, 2018

Let's recap this. The code is a year old, has rotted, and needs to be rewritten from scratch. I no longer remember details so I have no idea what source-revisions thing is about. I can't even examine current repository contents as it acts like it knows better than me whether I can handle a file list. The latest version of code wanted to parse 21 repodatas which would presumably lead to a lot of duplicated packages, if that's still the case that is not acceptable. I also assume there's still no dedicated json or xbps-src output (which looks usable too) dump.

@Chocimier

I no longer remember details so I have no idea what source-revisions thing is about.

It was just a useful metadata field that was missing back then (details).

I can't even examine current repository contents

Bothers me too, but parser is based on repodata files, not directory listing.

The latest version of code wanted to parse 21 repodatas which would presumably lead to a lot of duplicated packages, if that's still the case that is not acceptable.

We fixed repo as, you were about to accept it.

a lot of duplicated packages

If it is a problem, I am sure there is a way to handle that: keeping entry non-duplicated by database, having a set of names of already-registed packages in parser or something other.

I also assume there's still no dedicated json

You are right, there is none.

@AMDmi3
Member

AMDmi3 commented Dec 6, 2018

Bothers me too, but parser is based on repodata files, not directory listing.

I cannot maintain it unless I see the repository contents at least. Transparency is an absolute requirement.

We fixed repo as, you were about to accept it.

As time passes and more repositories are supported, quality requirements rise, as each problematic repository creates more work for me and more noise for the users.

If it is a problem, I am sure there is a way to handle that: keeping entry non-duplicated by database, having a set of names of already-registed packages in parser or something other.

Not an option, as this introduces inconsistencies.

However, we could stick with e.g. x86_64 + nonfree/x86_64.
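As a sketch of what restricting the sources to those two repositories would mean for fetching, the snippet below builds repodata URLs from the {source}-repodata pattern used in the proposed voidlinux.yaml. Combining that pattern with the newer alpha.de.repo.voidlinux.org base URL mentioned earlier in the thread is an assumption, as are the constant and function names.

```python
# Assumed: the "{source}-repodata" pattern from the PR applied to the new mirror URL.
REPODATA_URL_TEMPLATE = 'https://alpha.de.repo.voidlinux.org/current/{source}-repodata'

# The reduced set of repositories suggested above.
SOURCES = ('x86_64', 'nonfree/x86_64')


def repodata_urls(sources=SOURCES, template=REPODATA_URL_TEMPLATE):
    """Map each source (also usable as subrepo name) to its repodata URL."""
    return {source: template.format(source=source) for source in sources}
```

Keeping a single architecture avoids the per-architecture duplicates discussed above, at the cost of missing packages that exist only in other repositories.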

@ninewise
Author

ninewise commented Dec 6, 2018

The code I originally PR'd was indeed rejected, and it is no longer even being discussed here.

@Chocimier

However, we could stick with e.g. x86_64 + nonfree/x86_64.

Fine for me. Is listing directory only major issue (while I do not understand why is it an issue) then?

@AMDmi3
Member

AMDmi3 commented Dec 6, 2018

Fine for me. Is listing directory only major issue (while I do not understand why is it an issue) then?

For now, yes. I still need to check the data.

@the-maldridge

Seems like there's some serious confusion here about how xbps works and how our mirror infrastructure works.

First, there are multiple repodata files because there are multiple repositories. If you don't parse them then you won't have the complete view of the universe. If you're okay having massive holes in the dataset, then just parse a handful of files.

Second, the current/ directory is for machines, not humans. The page is there to prevent you from loading a page with 10k+ links on it which your browser will try to prefetch for validity. If you believe you actually want a file list then get it from rsync which is designed to do that, or get yourself netblock banned from someone else's mirror. There's no transparency in that list anyway, its pretty trivial to filter for specific UAs and give back a different list there.

@Vaelatern

If you need to check the data, you can compose a URL to any package in question yourself. This is straightforward, and the package is provided as a tarball (by a different extension, but a tarball).
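A hedged sketch of composing such a URL: the thread does not spell out the file naming scheme, so the <pkgver>.<arch>.xbps convention used below is an assumption, as are the function name and the default base URL (taken from the mirror address mentioned earlier).

```python
def package_url(pkgver, arch, base='https://alpha.de.repo.voidlinux.org/current'):
    """Compose a download URL for one binary package.

    Assumes files are named "<pkgver>.<arch>.xbps" directly under the
    repository directory; pkgver includes the name, version and revision.
    """
    return '{}/{}.{}.xbps'.format(base.rstrip('/'), pkgver, arch)
```

For a repository such as nonfree, the base argument would need to point at the corresponding subdirectory.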

@AMDmi3
Member

AMDmi3 commented Dec 6, 2018

First, there are multiple repodata files because there are multiple repositories. If you don't parse them then you won't have the complete view of the universe.

There is no confusion, I understand that fully. But, since there is no source packages (e.g. templates) metadata, and I'm not going to just dump a whole bunch of duplicates (archs + musl + multilib) for each package into repology, that's what we'll have to live with.

If you're okay having massive holes in the dataset, then just parse a handful of files.

The completeness of the dataset should bother you, as a primary service consumer, in the first place. And I see no other way to improve that than Void publishing source metadata, which was already demonstrated by different people but never finished.

Second, the current/ directory is for machines, not humans. The page is there to prevent you from loading a page with 10k+ links on it which your browser will try to prefetch for validity. If you believe you actually want a file list then get it from rsync which is designed to do that, or get yourself netblock banned from someone else's mirror. There's no transparency in that list anyway, its pretty trivial to filter for specific UAs and give back a different list there.

I need to see what's I'm working with to keep it working and improve. Mirrors do not help as they all serve the stub instead of a listing. It would be OK if it worked with curl.

@the-maldridge

The completeness of the dataset should bother you, as a primary service consumer, in the first place.

I am not a consumer, I'm the person on the other end who's dealing with people hammering mirrors rather than parsing the machine readable data.

I need to see what's I'm working with to keep it working and improve. Mirrors do not help as they all serve the stub instead of a listing. It would be OK if it worked with curl.

Then summon the listing via rsync as suggested.

@Vaelatern

Vaelatern commented Dec 6, 2018

But, since there is no source packages (e.g. templates) metadata, and I'm not going to just dump a whole bunch of duplicates (archs + musl + multilib) for each package into repology, that's what we'll have to live with.

How would you prefer to get all the data from archs, musl, nonfree? Since any format will have duplicates that don't share every quality (same pkgname version, different arch)

@AMDmi3
Member

AMDmi3 commented Dec 6, 2018

Then summon the listing via rsync as suggested.

Example commandline please?

@AMDmi3
Member

AMDmi3 commented Dec 6, 2018

How would you prefer to get all the data from archs, musl, nonfree? Since any format will have duplicates that don't share every quality (same pkgname version, different arch)

I'd prefer not to deal with binary package data at all.

@the-maldridge

rsync rsync://alpha.de.repo.voidlinux.org/voidmirror/current

I suggest piping that to less.

@the-maldridge

Well, if you don't deal with the binary package data then you'll get bogus data. Not all packages sitting in the webroot are in the current generation of the tree.

@AMDmi3
Member

AMDmi3 commented Dec 6, 2018

rsync rsync://alpha.de.repo.voidlinux.org/voidmirror/current/

Thanks, it needs a trailing slash, but it's acceptable for my purpose.

Well, if you don't deal with the binary package data then you'll get bogus data. Not all packages sitting in the webroot are in the current generation of the tree.

It's not bogus, it's just the discrepancy between source and binary packages. Ideally, we need knowledge of both, so we can both tell maintainers which packages need work (tied to commits), and tell users which packages are already available for them (tied to package builds). Software authors are probably interested in both. However, for now we focus on maintainers, who want the outdated status of their packages to be cleared as soon as possible after they commit the update. The lag of actual package builds is expected and considered tolerable; no one has complained about that so far.

@the-maldridge

As long as you understand that the tree probably doesn't resolve at all times, I guess that's fine. I'm not sure how your system works then without referential integrity checks.

You should probably pick another mirror though to pull that rsync listing from, unless of course your machines are in Germany.

@Vaelatern

@AMDmi3 there are often multiple versions of a package in the physical directory, though only one in the repo index. The only thing that matters to users or maintainers is what's in the index. Everything else is irrelevant.

@Vaelatern

To be clear. The repo index has nothing to do with the theoretical packages we have. It has EVERYTHING to do with the packages that people can actually install at that very moment.

@AMDmi3
Member

AMDmi3 commented Dec 6, 2018

The only thing that matters to users or maintainers is what's in the index.

That's definitely not true for maintainers as they commit to the repository and the state of repository is what is relevant to them.

@Vaelatern

@AMDmi3 As someone with close knowledge about Void operations, you should know that the only thing that matters is what's in the repo index, and that repo index is authoritative.

@Chocimier

That's definitely not true for maintainers as they commit to the repository and the state of repository is what is relevant to them.

Nope. When a template is updated and the binary index is not updated for longer than the usual processing time, it is a clear signal that the template is broken in some way that keeps the new version of the program from being available to users. Therefore what the template contains is not important. The binary index is.

@AMDmi3
Member

AMDmi3 commented Dec 6, 2018

Well, I don't see a point in dissuading you. The fact is that without source metadata, Void support will always be lacking. Meanwhile, I've updated the code and run a local update, and it still looks good. Will commit after adding some rules.

@AMDmi3
Member

AMDmi3 commented Dec 7, 2018

It's deployed, enjoy.

@AMDmi3
Member

AMDmi3 commented Dec 25, 2018

@Chocimier JFYI, there are some minor parsing problems which could be worth fixing, including misformatted URLs and a couple of missing source-revisions.

https://repology.org/log/550490

@Chocimier

Chocimier commented Dec 28, 2018 via email

7 participants