List Culling and Sorting (WIP) #20
👍 You gave me the idea of adding URL support to funceble, but it may be for the future... :) By the way, I'm almost ready for the next release 😉
This is how Pi-hole does it: https://github.com/pi-hole/pi-hole/wiki/Customising-sources-for-ad-lists
Thanks @xxcriticxx and @funilrys. I have custom scripts to do all of this, but it does take time and cross-checking. Almost done, and you can see the list has shrunk somewhat. By tomorrow morning it will be in a really nice clean state going forward. @xxcriticxx I also found some other false positives during this process today.
@mitchellkrogza will the list be smaller size-wise (MB)?
Yes indeed, it will be smaller and quicker to update once all the dupes are truly gone.
Yes, right now it's around 60 MB; it should be around 5 MB only. Why do you have
ALL: is only used in a hosts.deny file. It works differently from a normal hosts file: it stops incoming connections to your PC from the specified IPs and domains, while the normal hosts file stops only outgoing connectivity.
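To make the difference concrete, here is a minimal Python sketch of the two line formats. It assumes the hosts file maps blocked names to 0.0.0.0 (a common blocklist convention, not confirmed for this project) and uses the TCP Wrappers rule syntax "daemon_list : client_list" for hosts.deny, where ALL matches every wrapped service. Keep in mind that hosts.deny only affects services built with TCP Wrappers support. The helper names are hypothetical.

```python
# Hypothetical sink address; many blocklists use 0.0.0.0, but this is an
# assumption, not necessarily what this project writes.
SINK = "0.0.0.0"


def hosts_entry(domain: str) -> str:
    """hosts file line: resolve the name to the sink, blocking outgoing lookups."""
    return f"{SINK} {domain}"


def deny_entry(client: str) -> str:
    """hosts.deny (TCP Wrappers) rule: 'daemon_list : client_list'.
    ALL as the daemon list refuses every wrapped service to this client."""
    return f"ALL: {client}"


def render(entries, fmt) -> str:
    """Render one rule per line, each terminated by a newline."""
    return "".join(fmt(e) + "\n" for e in entries)


if __name__ == "__main__":
    print(render(["ads.example.com"], hosts_entry), end="")
    print(render(["192.0.2.10", "spam.example.net"], deny_entry), end="")
```

The same blocked name therefore looks quite different in each file: a DNS-style mapping in hosts, and an access-control rule in hosts.deny.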
For your Pi-hole you should be pulling the latest raw hosts file and not hosts.deny or superhosts.deny.
So I think I am pulling the wrong list; I only need IPs or domains.
Please give me a link to the correct list.
I will play with it later today.
Lists all sorted and clean now, been quite a job. |
looks really good @mitchellkrogza 👍 good work 💯 |
Thanks @funilrys 👍 it's now the way it should be. The hosts file contains only domains and not IP addresses, as that's the way it should actually be given how DNS works. IPs are now all listed in the hosts.deny file, so the combination of using hosts + hosts.deny on any *nix system should keep out 99% of utter garbage 👍 I also stripped out all domains starting with www. as it just led to so much duplication and is actually unnecessary, as DNS will reject the whole domain no matter what is in front of the root domain name.
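The www. cull can be sketched in a few lines of Python: normalise case and a trailing dot, drop one leading "www." label, then de-duplicate. This is an illustration with hypothetical function names, not the project's actual script. Worth keeping in mind: a plain hosts file matches hostnames exactly, with no wildcard, so whether the www. entries are truly redundant in practice depends on the resolver consuming the list.

```python
def strip_www(domain: str) -> str:
    """Normalise case/trailing dot and drop one leading 'www.' label."""
    d = domain.strip().lower().rstrip(".")
    return d[4:] if d.startswith("www.") else d


def cull_www(domains):
    """Return sorted, unique domains with 'www.' prefixes folded away."""
    return sorted({s for s in map(strip_www, domains) if s})


if __name__ == "__main__":
    print(cull_www(["www.example.com", "example.com", "WWW.Foo.org"]))
```

Here the three input lines collapse to ["example.com", "foo.org"], which is the kind of duplication the www. entries were causing.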
Now to run the new funceble when it's out against these lists 😬 but I'll do that from my Ubuntu box and not inside Travis |
You should wait a bit 😉 😅 |
@xxcriticxx the raw hosts file at https://hosts.ubuntu101.co.za/hosts is now down to 54 MB. It's nice and clean now and will get even cleaner once I run it against funceble and strip out all dead and inactive domains, but at least now it's a proper hosts file with domain names only and not IP addresses, as it should be. So, as explained above to @funilrys, if you use hosts + hosts.deny on any *nix system you will be well protected.
@funilrys no worries, I'm in no rush, I would rather wait until you have perfected it. Going to be massively interesting to see the results when I do get to run it against this list. Once funceble can help clean out all dead and expired domains this should be the squeakiest clean list out there. |
I personally can't wait to see if you find some issues 😉 😹 I hope that https://raw.githubusercontent.com/mitchellkrogza/Ultimate.Hosts.Blacklist/9816d45d8f1bc10a8a56271c799580c5910898d0/.input_sources/_urlblacklist.com/spyware/ips.txt will help me improve the handling of IPs. Just discovered that funilrys/funceble#83 is not as fixed as I thought 😭
@xxcriticxx Please pull it again now and repost your report. I just updated it again, as there was a problem with some duplications, which I have now fixed. V1.2017.08.109
@funilrys if you look in all the directories in .input_sources you will see the work I did yesterday splitting domains and IPs into separate files. Lots of IPs to test now: 465,685. Dos2Unix was the savior for fixing the dupes issue I was running into; now the lists have not one dupe as far as I can see.
Pi-hole will take care of duplication.
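Presumably Dos2Unix helped because files with Windows CRLF endings leave a stray carriage return on each line, so line-based tools see "example.com\r" and "example.com" as different entries. A minimal Python sketch of the whole clean-up (strip CRs, comments and blanks, then split IPs from domain names and de-duplicate each) might look like this; the function names are hypothetical and this is not the repo's actual script.

```python
import ipaddress


def normalize(lines):
    """Clean raw lines: drop CR characters (what dos2unix fixes), inline '#'
    comments, surrounding whitespace and blank lines; lowercase the rest."""
    out = []
    for raw in lines:
        line = raw.replace("\r", "").split("#", 1)[0].strip().lower()
        if line:
            out.append(line)
    return out


def ip_key(entry: str):
    """Sort key: IPv4 before IPv6, then by numeric address value."""
    addr = ipaddress.ip_address(entry)
    return (addr.version, addr)


def split_ips_domains(lines):
    """Return (ips, domains), each de-duplicated and sorted."""
    ips, domains = set(), set()
    for entry in normalize(lines):
        try:
            ipaddress.ip_address(entry)
        except ValueError:
            domains.add(entry)  # anything that is not a bare IP address
        else:
            ips.add(entry)
    return sorted(ips, key=ip_key), sorted(domains)


if __name__ == "__main__":
    raw = ["example.com\r", "example.com", "192.0.2.1\r", "# comment", ""]
    print(split_ips_domains(raw))
```

Using sets makes the de-duplication independent of input order, and sorting IPs numerically keeps 9.0.0.1 ahead of 10.0.0.2, which plain string sorting would not.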
Perfect thanks @xxcriticxx let me know if you find any more false positives. |
@xxcriticxx raw hosts file now down to 52 MB.
@mitchellkrogza I saw it 😉 funilrys/funceble#83 is now officially fixed 👍 |
And for that issue, it's a code-side issue :D whois return
That's what I need... I have to track bad sites manually using a screen recorder on OS X, as some of them do 3-10 redirects in the blink of an eye. Who knows, maybe that guy who registered 0000opengate.biz thought he might live to be 5200 years old 🤣
Does that mean Mitch uncovered another 🐛? 😁 whois returns: Domain Expiration Date: Sun Jul 15 23:59:59 GMT 2018
Definitely 😹 👍 |
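A dead-domain checker ultimately has to turn a WHOIS line like the one quoted above into a date it can compare against today. Below is a minimal Python sketch assuming exactly that date format; real WHOIS output varies by registry, so this is not funceble's actual parser, and the function names are hypothetical.

```python
from datetime import datetime, timezone

# Matches the one line quoted above; other registries use other layouts,
# so this format string is an assumption that covers only that style.
WHOIS_FMT = "%a %b %d %H:%M:%S GMT %Y"


def parse_expiry(line: str) -> datetime:
    """Parse 'Domain Expiration Date: Sun Jul 15 23:59:59 GMT 2018' into an
    aware UTC datetime (GMT is treated as UTC)."""
    _, _, value = line.partition(":")  # split on the first colon only
    return datetime.strptime(value.strip(), WHOIS_FMT).replace(tzinfo=timezone.utc)


def is_expired(line: str, now=None) -> bool:
    """True if the expiration date is already in the past."""
    now = now or datetime.now(timezone.utc)
    return parse_expiry(line) < now


if __name__ == "__main__":
    print(parse_expiry("Domain Expiration Date: Sun Jul 15 23:59:59 GMT 2018"))
```

Partitioning on the first colon keeps the time's own colons intact in the value being parsed.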
By the way, I found a really cool one-liner yesterday to clean lists. Both lists have to be sorted and dupe-free... works very well, but... knowing you, you will find an even cleverer way just to outwit me 🤣
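The one-liner itself did not survive in this thread. The "both lists sorted and dupe-free" requirement is what tools such as comm need (for example, comm -23 a b prints only the lines unique to a), so the following is a guess at the idea rather than the actual command: a single-pass Python sketch that keeps the entries of one sorted list that are missing from another. Both inputs are assumed to be sorted with Python's default string ordering; the function name is hypothetical.

```python
def sorted_difference(a, b):
    """Items of sorted, duplicate-free list `a` that are absent from sorted
    list `b`, found in one merge pass (similar to `comm -23 a b`)."""
    out, j = [], 0
    for item in a:
        # Advance through b past everything smaller than the current item.
        while j < len(b) and b[j] < item:
            j += 1
        if j == len(b) or b[j] != item:
            out.append(item)
    return out


if __name__ == "__main__":
    print(sorted_difference(["a.com", "b.com", "d.com"], ["b.com", "c.com"]))
```

Because both lists are walked once, the cost is linear in their combined length, which is why the sorting requirement pays off on multi-million-line lists.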
|
@xxcriticxx I can see that once I remove all these dead and inactive domains, I will have my Ultimate Hosts down to a much more respectable size for all users, and it will only be filled with stuff that actually exists.
@funilrys I tell you one thing, and this was why I started Ultimate Hosts... 90% of the lists out there are filled with utterly useless garbage that no longer exists, and nobody bothers to clean their lists.
@mitchellkrogza how many hosts can dnsmasq handle? |
@xxcriticxx really not sure, don't use it. Try asking on their forums. |
@funilrys think I'm going to pull the new dev funceble into Badd-Boyz-Hosts and let it loose tonight. See if I can beat the Travis 50 minute timeout. |
@mitchellkrogza that's what Pi-hole uses for the hosts; I know it has to have a limit.
@xxcriticxx does your Pi-hole ever crash with my list size?
@xxcriticxx trying to find out for you. |
@mitchellkrogza it hasn't crashed yet, but I am running it on a regular computer, not a Pi 3.
Try it and see what happens, only way to find out. Remember once my funceble output of this list finishes, probably by tomorrow morning sometime there will be a LOT of dead stuff culled out of this hosts list. |
@mitchellkrogza check this website out and see if you can add any lists: https://filterlists.com/
Nice find @xxcriticxx, I'll add some of these in the morning. Thanks for your support on this project.
@xxcriticxx added 4 new data sources today so far, so thanks for that link 👍. Some of them look well maintained, some not so much, and some are useless for a hosts file. But we are slowly growing this into a really top-notch list, and once @funilrys finishes his work on the dev branch of funceble we will clean this list of all dead and useless stuff and then have a killer, accurate hosts list, bar none.
@xxcriticxx added 6 new data sources today, more tomorrow then over the weekend once funceble finishes checking there will be a big clean up of dead domains. |
@mitchellkrogza is my list good to pull? |
@xxcriticxx which one? Please point me to it.
Added another new data source today, pre-edited version of someonewhocares.org |
@mitchellkrogza did you update the raw list that I use? Can I pull the new list?
@xxcriticxx raw lists always up to date immediately after a build completes. |
@xxcriticxx just wait 5 minutes, busy with new build and fresh files in 5 minutes |
@xxcriticxx all raw files at latest version. |
OK, I will pull later when I am home.
Cool, let me know. Also, see if you know anyone who can test the Windows version of the hosts file. It seems OK on XP, but it does require that the DNS Client service is disabled.
Work in progress: major sorting of the input source lists, splitting IPs and domain names into separate lists and removing thousands of duplicates.
Should be done by the end of the day and will make managing this repo much easier from here on out.
A lot of the input sources have VERY bad duplication and structure, using full URLs including URL parameters and file names, which are actually totally useless inside a hosts file, as are IP addresses, which now only get generated into the hosts.deny and superhosts.deny files.
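Reducing those full URLs to the bare hostnames a hosts file can actually use could be sketched with Python's standard urllib.parse, assuming one URL or hostname per input line. The function names are hypothetical; this illustrates the approach, not the repo's own tooling.

```python
from urllib.parse import urlsplit


def to_hostname(entry: str):
    """Return the lowercase hostname from a URL or bare host, else None."""
    entry = entry.strip()
    if not entry:
        return None
    # Without a scheme, urlsplit would treat 'example.com/ads.js' as a path;
    # a leading '//' makes the first segment the network location instead.
    if "://" not in entry:
        entry = "//" + entry
    host = urlsplit(entry).hostname  # drops user info, port, path and query
    return host.rstrip(".") if host else None


def hosts_from_urls(lines):
    """Sorted, unique hostnames extracted from a list of URLs."""
    return sorted({h for h in map(to_hostname, lines) if h})


if __name__ == "__main__":
    print(hosts_from_urls([
        "http://ads.example.com/banner.gif?id=42",
        "ads.example.com/other/file.js",
        "HTTPS://Tracker.Example.NET:443/",
    ]))
```

Since urlsplit's hostname attribute is already lowercased and stripped of the port, many distinct URLs from the same ad server collapse into a single hosts entry.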