Unmatched size of mirrored data while finishing bandersnatch mirror
#1105
Comments
something else
Today I found something more interesting:
Yes, always the same list.
And why does the first line contain only a number (a serial?), unlike the other lines? Does this list match the previous
last
Looking forward to the reply. I am now backing up the whole disk image before I go any further. |
Hi there, the size on PyPI is a sum taken from the database metadata. I wouldn't be surprised if the deletions are not updating it correctly or something. Could be worth a check. Usually when this happens it's one package causing the issue. This file can be removed and bandersnatch will try to sync again from the serial in the serial file alongside the todo. So you should be safe to delete it and let it resume. |
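For anyone who wants to look inside the todo file before deleting it, here is a minimal sketch. It assumes the layout described in this thread: a first line holding only the target serial, then one `package serial` pair per line. `parse_todo` is a hypothetical helper, not part of bandersnatch, and the serial on the first line of the sample is made up.

```python
from typing import Dict, Tuple


def parse_todo(text: str) -> Tuple[int, Dict[str, int]]:
    """Parse todo-file content (assumed layout: serial line, then 'name serial' lines)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty todo content")
    target_serial = int(lines[0])  # first line: a bare serial number
    pending: Dict[str, int] = {}
    for ln in lines[1:]:
        name, serial = ln.rsplit(maxsplit=1)  # remaining lines: "<package> <serial>"
        pending[name] = int(serial)
    return target_serial, pending


# Sample content: package names are taken from the log in this thread,
# the first-line serial is invented for illustration.
sample = "14011930\nzwero-brain-games1 14011926\nzx-core-backend 3916140\n"
target, pending = parse_todo(sample)
print(target, len(pending))
```

Checking how many packages are still pending can help judge whether a single package is blocking the sync.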
OK, maybe another reason is that some data deleted upstream cannot be synced by bandersnatch, but still exists on the upstream server?
|
This might relate to the fact that bandersnatch does not automatically remove files that are gone upstream, so the mirror only does garbage collection when a full |
Good call. This is 100% the sad state of bandersnatch. We don't have a good mechanism to know what files to delete as we keep the service stateless apart from the blob store (i.e. filesystem, s3 etc.). Only options I see are:
|
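To make the garbage-collection problem concrete, here is a rough sketch of what finding orphaned files would involve, assuming you already have the set of relative file paths the current index refers to. Building that set is exactly the hard part described above. `find_orphans` and its parameters are hypothetical and are not how bandersnatch works.

```python
from pathlib import Path
from typing import Iterable, List


def find_orphans(root: Path, referenced: Iterable[str]) -> List[str]:
    """Return files under root whose relative POSIX path is not in `referenced`."""
    wanted = set(referenced)
    on_disk = (
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file()
    )
    # Anything present on disk but not referenced is a deletion candidate.
    return sorted(path for path in on_disk if path not in wanted)
```

The function only lists candidates. Deleting them is left to the operator, since a stale or incomplete `referenced` set would flag valid files.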
I have the same issue. I don't know why there are so many missing package files in the image. How can I make a complete mirror of PyPI?
2024-07-30 20:31:59,734 INFO: Fetching metadata for package: zwero-brain-games1 (serial 14011926) (package.py:58)
2024-07-30 20:31:59,796 INFO: zutnlp no longer exists on PyPI (package.py:66)
2024-07-30 20:31:59,796 INFO: Fetching metadata for package: zx-core-backend (serial 3916140) (package.py:58)
2024-07-30 20:31:59,901 INFO: zwdata no longer exists on PyPI (package.py:66) |
If this is from a failed sync, go to the resume file and remove the packages from there. I don't have a better solution or time to try to fix this, sorry. |
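A minimal sketch of that cleanup, assuming the resume file is the todo file with a serial on its first line and `package serial` pairs after it, and that the log lines look like the ones quoted above. `gone_packages` and `drop_from_todo` are hypothetical helpers, and the serials in the sample todo content are invented. Work on a backup copy of the file.

```python
import re
from typing import Set

# Matches log lines such as "... INFO: zutnlp no longer exists on PyPI (package.py:66)"
GONE = re.compile(r"INFO: (\S+) no longer exists on PyPI")


def gone_packages(log_text: str) -> Set[str]:
    """Collect package names the log reports as removed from PyPI."""
    return set(GONE.findall(log_text))


def drop_from_todo(todo_text: str, names: Set[str]) -> str:
    """Return todo content without the entries for the given package names."""
    lines = todo_text.splitlines()
    if not lines:
        return ""
    kept = lines[:1]  # keep the first line (assumed to be the bare serial)
    for ln in lines[1:]:
        parts = ln.split()
        if parts and parts[0] in names:
            continue  # skip entries for packages that no longer exist
        kept.append(ln)
    return "\n".join(kept) + "\n"


log = (
    "2024-07-30 20:31:59,796 INFO: zutnlp no longer exists on PyPI (package.py:66)\n"
    "2024-07-30 20:31:59,901 INFO: zwdata no longer exists on PyPI (package.py:66)\n"
)
todo = "14011930\nzutnlp 111\nzx-core-backend 3916140\nzwdata 222\n"
cleaned = drop_from_todo(todo, gone_packages(log))
print(cleaned, end="")
```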
desc
I have been using bandersnatch to sync from pypi.org for almost 10 days. Today it finally reached "generating global index page..." and finished all its work, but I found that the size is only 8822G, which does not match the expected size reported at https://pypi.org/stats.
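When comparing these figures, it may help to rule out a unit mismatch first. The sketch below assumes the 8822G value is in GNU `df`'s `G` unit (2^30 bytes); whether the PyPI stats page or a mirror status page reports decimal or binary units is an assumption to check, not something established in this thread.

```python
GIB = 2**30   # bytes per GiB (the "G" unit of GNU df -B G)
TB = 10**12   # bytes per decimal terabyte
TIB = 2**40   # bytes per binary tebibyte


def gib_to_bytes(gib: int) -> int:
    """Convert a size in GiB to bytes."""
    return gib * GIB


size_bytes = gib_to_bytes(8822)
print(f"{size_bytes / TB:.2f} TB (decimal)")
print(f"{size_bytes / TIB:.2f} TiB (binary)")
```

The same byte count reads as roughly 9.47 in decimal terabytes and 8.62 in tebibytes, so two pages can show different numbers for identical data depending on the unit they use.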
details
command:
bandersnatch -c bs.conf mirror
bs.conf:
As shown in the config file, I use an alternative download mirror and also block several packages. But even if I take the blocked packages into account, the numbers still do not match:
df -h -B G
questions
[blocklist]
part is. But why is the size shown in the tuna server status 9.75T, not the 9.48T (as calculated above)?
btw
In recent days, when it reaches "generating global index page...", bandersnatch always begins with a Response timeout error:
pic1:
![image](https://user-images.githubusercontent.com/57323846/161436375-40f51f2a-8d83-4de9-ade9-4e54cdcc67eb.png)
pic2:
![image](https://user-images.githubusercontent.com/57323846/161436311-37ff1f1f-ecdf-46ab-b771-fd066c98a189.png)
The command I use is
bandersnatch -c bs.conf mirror
as usual, even for the incremental update.
Q: Should I run
bandersnatch verify
instead?