
About Super.Hosts (included in Blackweb) #6

Closed

maravento opened this issue Aug 5, 2017 · 79 comments

@maravento

Great work. We have included your list (Ultimate Super.Hosts Blacklist) in the Data sheet (sources) of our Blackweb project, which is based on Squid-Cache for Linux.
Special thanks!

@funilrys
Member

funilrys commented Aug 5, 2017

Nice 👍 😸

@mitchellkrogza is on holiday but I think he would say: AWESOME 👍 💯 🥇 ⭐ 🏆

I'm working on a better way to clean this repository with funceble, so maybe when he comes back from holiday my next release or pre-release will handle big-file automation with Travis 😸

@maravento
Author

Great. We would like to express our congratulations.

@mitchellkrogza
Member

mitchellkrogza commented Aug 5, 2017 via email

@maravento
Author

maravento commented Aug 5, 2017

We have also included the following sources:
Hacked-Malware-Web-Sites
Badd-Boyz-Hosts
NginxBadBotBlocker
Thanks

@maravento maravento reopened this Aug 5, 2017
@mitchellkrogza
Member

Hi @maravento, I am back from my trip away. 👍 Thanks so much for including this and my other sources in your BlackWeb Project. I have pushed out some new updates today, removed a few false positives, and the list size of Ultimate Hosts has grown again with fresh data from remote sources.

@maravento
Author

Great news.

@mitchellkrogza
Member

@maravento been doing some big cleaning up and de-duping. Busy with another big clean of dead and expired domains; that's going to take some time to complete, but you will see the list is reduced somewhat and has not one dupe. Found an error in my scripting where some input files were in DOS format, causing dupes to be created, but it's all fixed now.

@maravento
Author

Thanks for the info. I will update Blackweb, but there is a problem: I would need the list of excluded domains so I can also exclude them from Blackweb.

@maravento maravento reopened this Aug 17, 2017
@mitchellkrogza
Member

mitchellkrogza commented Aug 17, 2017

@maravento Can you possibly send me the list you have so I can run it against the current list and provide you with the removed entries? Does Blackweb not pull it from the repo and update it? I've provided raw links in the README to all the raw files.

@maravento
Author

maravento commented Aug 17, 2017

Blackweb does not remove domains; it only adds bad domains (except those in the whitelist).
https://github.com/maravento/blackweb/tree/master/bl
cat blackweb.tar.gz* | tar xzf -

PS: By the way, 7,827,420 blacklisted domains (Blackweb downloads many sources).
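
For readers of this thread, a minimal sketch of unpacking that split archive after downloading the blackweb.tar.gz* parts from the bl/ directory linked above (blackweb.txt is an illustrative name for the extracted file, not confirmed by the repo):

# join the split parts in order, then extract the combined gzip tarball
cat blackweb.tar.gz* | tar xzf -
# illustrative: count the extracted entries
wc -l blackweb.txt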

@mitchellkrogza
Member

@maravento it's taken several weeks of work, but we now have a central repo for controlling removals of dead domains, dead blogspot domains, and more coming.

This central repo also controls whitelisting and removal of false positives.

So all projects now draw from this central information and have list-cleaning functions which do the stripping and removals on every build.

The central control system can be found at https://github.com/mitchellkrogza/CENTRAL-REPO.Dead.Inactive.Whitelisted.Domains.For.Hosts.Projects and we'd love any contributions or additions from you; simply send PRs wherever applicable.

Right now on dead domains we have a 99% accuracy rate. Each list will be re-tested from time to time to check for domains or web sites that have become re-active, and they will be added to the re-active-domains list.

This now gives much better control across all repos and reduces the chance of any whitelisted or false-positive domains ever being re-added to a list by mistake.

@mitchellkrogza
Member

mitchellkrogza commented Sep 23, 2017

@maravento
Author

Great news. On Monday we will update the Blackweb project with the new repositories.
An important question:
This list:
Ultimate Super Hosts Blacklist
Is it the same as this, or a new repository?
Ultimate Hosts
Thanks

@mitchellkrogza
Member

Hi @maravento yes, that's the same repo. If you look at the README at https://github.com/mitchellkrogza/Ultimate.Hosts.Blacklist you can see the different raw files. The super hosts file is comprised of domain names plus IP addresses.

@mitchellkrogza
Member

@maravento

This raw file is a plain-text list of domain names only:
https://hosts.ubuntu101.co.za/domains.list

And this raw file is a plain-text list of bad IPs only:
https://hosts.ubuntu101.co.za/domains.list

Those two lists have no comments or anything else, and don't include the 0.0.0.0 domain.com or 0.0.0.0 IP.IP.IP.IP format

So for your uses, pulling those might be easiest

@maravento
Author

maravento commented Sep 23, 2017

This raw file is a plain-text list of domain names only:
https://hosts.ubuntu101.co.za/domains.list
And this raw file is a plain-text list of bad IPs only:
https://hosts.ubuntu101.co.za/domains.list

Please check the clarification: both links are the same

@mitchellkrogza
Member

Sorry, working off my mobile.

IP List only
https://hosts.ubuntu101.co.za/ips.list

Domains list only
https://hosts.ubuntu101.co.za/domains.list
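
For anyone following along, a minimal sketch of pulling both raw lists (assuming curl is available; wget works equally well):

curl -sSL https://hosts.ubuntu101.co.za/ips.list -o ips.list
curl -sSL https://hosts.ubuntu101.co.za/domains.list -o domains.list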

@mitchellkrogza
Member

Absolutely no permission required and please let me know how it works out

@maravento
Author

maravento commented Sep 25, 2017

Great. About ips.list:
https://hosts.ubuntu101.co.za/ips.list
It contains the following errors (entries which are not IPv4 IPs/CIDR):
1035.globatel.ru
1570.info
1940s.org.uk
1945.net
2002:5bc8:c58::5bc8:c58
2002:b6ff:2c08::b6ff:2c08
2111dh.com
19333.com
21234.com
62528.com
14713804a.l2m.net
87654321.info
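
A loose shape check will surface such entries; a minimal sketch (the regex only checks the dotted-quad form and does not validate octet ranges):

# print every line that is not shaped like an IPv4 address or IPv4 CIDR range
grep -Ev '^([0-9]{1,3}\.){3}[0-9]{1,3}(/[0-9]{1,2})?$' ips.list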

@maravento
Author

About whitelisted IPs/CIDR: https://github.com/mitchellkrogza/CENTRAL-REPO.Dead.Inactive.Whitelisted.Domains.For.Hosts.Projects/blob/master/whitelisted-ip-ranges-ALL-combined.txt

In our project: whiteip

IPs/CIDR added from Central-Repo:
199.30.16.0/24
199.30.27.0/24
203.208.60.0/24
64.68.80.0/21
157.56.0.0/14
207.46.0.0/16
The rest of the IPs/CIDR were already included in whiteip.txt.

Problems detected in Central-Repo:
172.217.0.0/19 (should be 172.217.0.0/16)
NetRange: 172.217.0.0 - 172.217.255.255
CIDR: 172.217.0.0/16
NetName: GOOGLE
NetHandle: NET-172-217-0-0-1
Parent: NET172 (NET-172-0-0-0-0)
NetType: Direct Allocation
OriginAS: AS15169
Organization: Google Inc. (GOGL)
RegDate: 2012-04-16
Updated: 2012-04-16
Ref: https://whois.arin.net/rest/net/NET-172-217-0-0-1

66.249.64.0/18 and 66.249.80.0/20 (should be 66.249.64.0/19)
NetRange: 66.249.64.0 - 66.249.95.255
CIDR: 66.249.64.0/19
NetName: GOOGLE
NetHandle: NET-66-249-64-0-1
Parent: NET66 (NET-66-0-0-0-0)
NetType: Direct Allocation
OriginAS:
Organization: Google Inc. (GOGL)
RegDate: 2004-03-05
Updated: 2012-02-24
Ref: https://whois.arin.net/rest/net/NET-66-249-64-0-1
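
Both allocations can be re-checked against ARIN directly; a minimal sketch using the stock whois client:

whois 172.217.0.0 | grep -E 'NetRange|CIDR'   # expect CIDR: 172.217.0.0/16 (GOOGLE)
whois 66.249.64.0 | grep -E 'NetRange|CIDR'   # expect CIDR: 66.249.64.0/19 (GOOGLE)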

@mitchellkrogza
Member

mitchellkrogza commented Sep 28, 2017

@maravento please check now whether the list looks better / correct: https://hosts.ubuntu101.co.za/ips.list

I used the following

sort -t . -k 1,1n -k 2,2n -k 3,3n -k 4,4n -k 5,5n -k 6,6n -k 7,7n -k 8,8n -k 9,9n $_input2 | uniq > $_input4 && mv $_input4 $_input2
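
That octet-by-octet numeric sort works; on GNU coreutils, version sort gives the same dotted-quad ordering more compactly. A minimal alternative sketch (file names illustrative):

# -V orders dotted quads numerically; -u drops exact duplicate lines
sort -V -u ips.list > ips.sorted && mv ips.sorted ips.list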

@maravento
Author

maravento commented Sep 28, 2017

A suggestion: separate the IPv4 and IPv6 lists, because debugging is different for each.
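
A minimal split sketch, keying on the colon that only IPv6 entries contain (output names illustrative):

grep ':' ips.list > ips6.list      # anything with a colon is IPv6
grep -v ':' ips.list > ips4.list   # the rest is IPv4 (addresses and CIDR ranges)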

@mitchellkrogza
Member

👍 @maravento shall implement tomorrow

@mitchellkrogza
Member

mitchellkrogza commented Sep 29, 2017

@maravento have we got a better layout now on CENTRAL REPO? Have a look and give me some feedback please: https://github.com/mitchellkrogza/CENTRAL-REPO.Dead.Inactive.Whitelisted.Domains.For.Hosts.Projects

@funilrys any feedback / comments from you on this new layout?

@mitchellkrogza
Member

Once you guys are both happy, I can re-write the cleaning functions on all my other repos, including this one of course.

@maravento
Author

I think you have done a great job of debugging and reorganizing. I congratulate you.

@mitchellkrogza
Member

Thanks so much @maravento, much appreciated; I truly appreciate all your input. Have a great weekend.

@maravento
Author

maravento commented Oct 2, 2017

Hi.
About https://github.com/mitchellkrogza/CENTRAL-REPO.Dead.Inactive.Whitelisted.Domains.For.Hosts.Projects/blob/master/DOMAINS-dead.txt:
I was doing a random routine review and I have seen some valid and active URLs (porn) such as:
http://yoursexymind.com/
http://09zyy.com/
https://www.zurich-girls.ch/
It can happen that a malicious domain (porn, weapons, etc.) is inactive for a while and then reactivated.
So, in these cases, I think it's best to block them regardless of whether the domain is active or not.
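
For reference, a minimal liveness-probe sketch of the kind such a re-test implies (this is only an illustration, not the project's actual test; a 000 code means no HTTP response within the timeout):

while read -r domain; do
  code=$(curl -s -o /dev/null -m 10 -w '%{http_code}' "http://$domain")
  echo "$code $domain"
done < DOMAINS-dead.txt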

@mitchellkrogza
Member

Thanks @maravento, all those lists are currently undergoing re-testing. We will soon have them fixed and will keep certain domains on the list regardless of whether their state changes. It's a big work in progress, this one.

@mitchellkrogza
Member

I think this is where we will use https://github.com/mitchellkrogza/CENTRAL-REPO.Dead.Inactive.Whitelisted.Domains.For.Hosts.Projects/blob/master/DOMAINS-re-active.txt to keep a list of permanently blacklisted domains, and also add to it from our re-tests when previously inactive domains are found active again.

@maravento
Author

That's too much debugging work.

@mitchellkrogza
Member

mitchellkrogza commented Oct 3, 2017

Gotta re-think this whole thing a bit. No need to tell me how much work this is turning out to be 😂

@maravento
Author

maravento commented Oct 3, 2017

I give you the same suggestion I gave you days ago:
You already have a blacklist (https://github.com/mitchellkrogza/Ultimate.Hosts.Blacklist).
Then debug that list not against dead domains (because they can revive), but against invalid domains (which do not exist or never existed).
Take a look at our list of invalid domains (URLs/TLDs), which is hopefully more complete:
https://github.com/maravento/blackweb/blob/master/invalid.txt
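
A minimal sketch of that kind of scrub, assuming each line of invalid.txt is an exact domain (the real file mixes URLs and TLD patterns, so a production pass needs more care; blacklist.txt is an illustrative name):

# keep only blacklist lines that do not exactly match a known-invalid entry
grep -Fxvf invalid.txt blacklist.txt > blacklist.clean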

@mitchellkrogza
Member

@maravento thanks, going to take a fresh start with the way dead domains are controlled on the central repo. As always, appreciate your input and help. 👍

@mitchellkrogza
Member

@maravento I took a fresh start on CENTRAL REPO now, focusing mostly on dead blogspot domains, which we know 100% are dead, and the list of invalid domains. It's now a much smaller list of dead domains, and I will just focus on invalid domains. https://github.com/mitchellkrogza/CENTRAL-REPO.Dead.Inactive.Whitelisted.Domains.For.Hosts.Projects

@maravento
Author

I have tried the new version of DOMAINS-dead.txt and it works perfectly (I have verified the domains and they are really dead or invalid). I have included it in the next update of Blackweb.
PS: I've been debugging Blackweb for 2 days with DOMAINS-dead.txt ... and counting ... Our work never ends.
Thanks for your great work!

@mitchellkrogza
Member

Thanks so much @maravento and thanks again for your valuable insight and help. It's now a much smaller list and re-testing it will only ever take a few hours. I have some tests running on the complete Ultimate hosts list which will give us some additional invalid domains which will be tested once more before being added to domains-invalid. 👍

@maravento
Author

Hi @mitchellkrogza. There are many duplicate IPs/CIDRs in ips.list. This happens because many lines have a space at the end, which generates duplicates. Example:
222.187.221.28
222.236.44.131
223.165.25.36
... and a very long etc.
So, to fix it, debug the list with:
sort -t . -k 1,1n -k 2,2n -k 3,3n -k 4,4n -k 5,5n -k 6,6n -k 7,7n -k 8,8n -k 9,9n ips.list | sed 's/ *$//' | uniq > newips.list
After debugging, you will also notice that there are other conflicts. Example:
216.152.252.253 is a subnetwork of 216.152.240.0/20
... a long etc.
But that is not serious. I solve it with this script (it takes days to debug).
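
maravento's script isn't shown here, but as one illustration of detecting such overlaps, the grepcidr utility can flag single addresses already covered by a CIDR block (assuming grepcidr is installed; this is not the script referred to above):

# list single IPs in ips.list that fall inside 216.152.240.0/20
grep -v '/' ips.list | grepcidr 216.152.240.0/20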

@maravento maravento reopened this Oct 7, 2017
@mitchellkrogza
Member

Thanks @maravento, I found the problem; it was only one of the input sources, yoyo.org, which had spaces. All spaces removed and a new build is in progress.

@maravento
Author

maravento commented Oct 7, 2017

I found in ips.list:
2002:3f8d:ec63:0:0:0:3f8d:ec63
2002:5bc8:c58::5bc8:c58
2002:b6ff:2c08::b6ff:2c08
2002:b6ff:2d90::b6ff:2d90
Is the list IPv4 & IPv6, or IPv4 only?

Also I found 0.0.0.0/8, 10.0.0.0/8, 100.64.0.0/10, 192.0.0.0/24, 192.0.2.0/24, 192.168.0.0/16, etc. (reserved IP addresses) (?)
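
Since those reserved ranges appear as literal lines, a simple exclusion file removes them; a minimal sketch (reserved.list is a hypothetical file holding the ranges above, one per line):

# drop lines that exactly match a known reserved range
grep -Fxvf reserved.list ips.list > ips.clean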

@mitchellkrogza
Member

Thanks @maravento, currently ips.list combines IPv4 and IPv6 and also needs better sorting, which I will do tomorrow. I will also address those reserved ranges. Can you spot any more dupes?

@mitchellkrogza
Member

I think all those reserved IP ranges originate from the yoyo.org input source. Will have it sorted tomorrow.

@maravento
Author

I think that's all

@maravento
Author

I recommend using this pipe to discard anything unusual (IPv6 lines, stray leading/trailing whitespace):
sed 's/^[[:space:]]*//;s/[[:space:]]*$//' | sed '/:/d'
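
Used end to end, that might look like the following (output name illustrative):

sed 's/^[[:space:]]*//;s/[[:space:]]*$//' ips.list | sed '/:/d' | sort -u > ips4.clean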
