Source of blockurls.txt #23

Closed
strilok opened this issue Jun 20, 2024 · 1 comment

strilok commented Jun 20, 2024

Hi Team,
Could you please tell us the source of blockurls.txt, which bwupdate.sh uses at line 339:

# add blockurls

sed '/^$/d; /#/d' lst/blockurls.txt | sort -u >>hit.txt

Regards

Owner

maravento commented Jun 20, 2024

Definitions:

  • blockurls.txt is a blacklist maintained by the blackweb project. It contains domains that we have selected as malicious, and it also serves to debug overlapping subdomains. Example: if both "foo.blablabla.com" and "blablabla.com" appear in blackweb.txt, Squid reports an overlapping-subdomain error (e.g.: WARNING: '.foo.blablabla.com' is a subdomain of '.blablabla.com'). Listing "blablabla.com" in blockurls.txt avoids this.

  • hit.txt is the purged list of valid domains, produced just before the conversion to blackweb.txt. The lists that feed blackweb.txt are full of non-existent domains. Our script joins all the lists, checks each domain, and sends those that do not exist to fault.txt and those that are valid to hit.txt.
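
The check described for hit.txt and fault.txt can be sketched as follows. This is a minimal illustration, not the code from bwupdate.sh: the function names split_domains and resolve_domain, the input file name, and the use of getent as the resolver are assumptions, and entries are assumed to carry an optional leading dot.

```shell
#!/bin/sh
# Sketch only: split a domain list into hit.txt (resolves) and
# fault.txt (does not resolve). All names here are hypothetical.

# Default resolver (an assumption): succeeds when getent finds an address.
resolve_domain() {
    getent hosts "$1" >/dev/null 2>&1
}

# split_domains LIST [RESOLVER]
# Reads one entry per line from LIST and appends it to hit.txt or fault.txt
# in the current directory, depending on the resolver's exit status.
split_domains() {
    list=$1
    resolver=${2:-resolve_domain}
    : > hit.txt
    : > fault.txt
    while IFS= read -r entry; do
        [ -n "$entry" ] || continue      # skip blank lines
        name=${entry#.}                  # strip one leading dot, if present
        if "$resolver" "$name"; then
            printf '%s\n' "$entry" >> hit.txt
        else
            printf '%s\n' "$entry" >> fault.txt
        fi
    done < "$list"
}
```

For example, `split_domains all.txt` (all.txt being a hypothetical joined list) would write hit.txt and fault.txt in the current directory. A different resolver function can be passed as the second argument, which also makes the sketch testable without network access.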

Starting at the line you mention, the following happens (in order):

First, "blockurls.txt" is appended to "hit.txt", which fixes some overlapping subdomains and adds the domains we have selected as malicious:

sed '/^$/d; /#/d' lst/blockurls.txt | sort -u >>hit.txt

Then hit.txt is cleaned of the subdomains that overlap with the entries added from "blockurls.txt", which yields a temporary final list before the last processing step:

grep -vi -f <(sed 's:^\(.*\)$:.\\\1\$:' lst/blockurls.txt) hit.txt | sed -r '/[^a-z0-9.-]/d' | sort -u >blackweb_tmp
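
The two commands above can be tried on a small sample. The following is a minimal sketch, not part of bwupdate.sh: the sample entries, the temporary directory, and patterns.txt are made up for illustration, entries are assumed to carry a leading dot (as in the Squid warning), and a pattern file stands in for the process substitution so that the sketch also runs in plain sh.

```shell
#!/bin/sh
set -eu

# Work in a throwaway directory so no real lists are touched.
work=$(mktemp -d)
cd "$work"
mkdir lst

# Hypothetical sample data (assumption: entries start with a dot).
printf '%s\n' '.blablabla.com' '' > lst/blockurls.txt
printf '%s\n' '.foo.blablabla.com' '.example.org' > hit.txt

# Step 1: append blockurls.txt, dropping blank lines and lines containing '#'.
sed '/^$/d; /#/d' lst/blockurls.txt | sort -u >> hit.txt

# Step 2: turn every blockurls line into a pattern such as '.\.blablabla.com$',
# i.e. "any character, then the entry, at the end of the line".
sed 's:^\(.*\)$:.\\\1\$:' lst/blockurls.txt > patterns.txt

# Drop lines matching a pattern (subdomains of a listed entry), then drop
# lines with characters outside a-z, 0-9, '.' and '-', and deduplicate.
grep -vi -f patterns.txt hit.txt | sed -r '/[^a-z0-9.-]/d' | sort -u > blackweb_tmp

result=$(cat blackweb_tmp)
printf '%s\n' "$result"
```

With this sample, blackweb_tmp keeps .blablabla.com and .example.org, while .foo.blablabla.com is removed: the pattern requires at least one character before the listed entry, so the parent domain itself survives and only its subdomains are filtered out.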

We hope this explanation is sufficient. We are closing the issue since it has been answered, but if you have another question, feel free to ask or reopen it.
