Request additional sources #529

Closed
DRSDavidSoft opened this issue Oct 10, 2019 · 10 comments

@DRSDavidSoft

DRSDavidSoft commented Oct 10, 2019

Hi there, would it be possible to add the following sources as repositories to https://github.com/Ultimate-Hosts-Blacklist?

dbl.oisd.nl (related issue)

  • https://dbl.oisd.nl (main)
  • https://dblmobile.oisd.nl (mobile edition)

MoaAB (i.e. Mother of all adblocking)

  • https://adblock.mahakala.is/ (main)

These lists contain some useful domains to block, but they're bloated with inactive domains, so a CI + PyFunceble sorting and filtering them out will be useful.

Thanks!
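The request comes down to separating live domains from dead ones. PyFunceble's own checks are more thorough than this, so the following is only a minimal sketch of the idea, assuming "inactive" simply means "does not resolve in DNS". `resolves` and `split_active` are hypothetical helpers, not part of PyFunceble.

```python
import socket
from typing import Callable, Iterable


def resolves(domain: str) -> bool:
    """True if the system resolver returns at least one address for `domain`."""
    try:
        return len(socket.getaddrinfo(domain, None)) > 0
    except (OSError, UnicodeError):  # socket.gaierror is a subclass of OSError
        return False


def split_active(domains: Iterable[str],
                 is_active: Callable[[str], bool] = resolves) -> tuple[list[str], list[str]]:
    """Normalize, de-duplicate and sort `domains`, then partition them into (active, inactive)."""
    active: list[str] = []
    inactive: list[str] = []
    for domain in sorted({d.strip().lower() for d in domains if d.strip()}):
        (active if is_active(domain) else inactive).append(domain)
    return active, inactive
```

Passing a different `is_active` callable makes it easy to swap in a stricter checker, or a stub when testing without network access.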

@mitchellkrogza
Member

We've just dropped mahakala, after it had been here for a long time, because it has too many false positives. The other team members can look into including oisd.nl.

@spirillen
Contributor

Hi @DRSDavidSoft

At first sight I don't see any difference from other lists.

So where do you think these lists add to the existing ones?

funilrys added a commit to Ultimate-Hosts-Blacklist/dev-center that referenced this issue Oct 10, 2019
@dnmTX
Copy link

dnmTX commented Oct 10, 2019

@DRSDavidSoft where is this list coming from? Does it have a repo on GitHub so we can see how well it is curated?
Just for the record, since the adblock.mahakala.is experience I'm very much against adding lists of unknown origin.
Also, we'll wait for @funilrys to test it against all the lists present here to check the number of unique domains, and we'll go from there.

@DRSDavidSoft
Author

The MoaAB (adblock.mahakala.is) homepage is actually on the XDA forums here. I don't believe they use any sort of version control system (let alone git), but aside from the ridiculous number of false positives, it's a mostly reputable project (much like the Energized and 1Hosts projects, which are also XDA-oriented).

I agree that the MoaAB list has a lot of false positives, but I would still appreciate a CI running PyFunceble on it to detect ACTIVE domains, so that I can cross-reference them with the other sources in my list.

@dnmTX

dnmTX commented Oct 11, 2019

@DRSDavidSoft I was asking about dbl.oisd.nl.
adblock.mahakala.is was removed and isn't coming back anytime soon.

P.S. OK, I found it on Reddit, and here is a quote from the one who created it:

Download all hosts-files, zone-files, dnsmasq-lists, and so on (non-pi-hole compatible lists are also included), I could find on the internet (Current count; ~3000 lists, some duplicates from different locations) - There's not a list on filterlists.com that I've not checked, and I'm still finding/adding new ones! - Here are some of the lists I'm including

And how is that different from Ultimate-Hosts-Blacklist? I guess there are more lists included there, with very little supervision and who knows how many false positives. It's a huge list, and in my opinion it's not needed here.
I VOTE NO 👎

@DRSDavidSoft
Author

DRSDavidSoft commented Oct 11, 2019

@dnmTX, oh, I see.

It's worth mentioning that the author has a zero false positive policy and takes this matter very seriously.

This blocklist contains well over 1 million entries, but still you should not have to whitelist things, it's thát good.
It's my goal for this list to be as perfect as possible, so you shouldn't find any :D

What they explain is that they avoid including any sources that may produce a false positive:

I maintain a separate list (which I call the "skipifcontains"-list. In here I listed domains that should never be blocked. Eg; google.com, facebook.com, amazon.com, microsoft.com, netflix.com, etc)
If a downloaded list contains any one of those listed, then I run the list though a regex loop to only capture the domains that make sense to be blocked, but the rest of the list contents will be skipped.
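The "skipifcontains" rule quoted above can be sketched as a two-mode filter: a list that touches no protected domain is accepted whole, while a flagged list only contributes entries that pass a stricter pattern. The author's actual regex isn't given, so `STRICT_PATTERN` below is a hypothetical stand-in, as are the helper names; only the protected domains come from the quote.

```python
import re
from typing import Iterable

# Hypothetical "never block" list, seeded with the examples from the quote above.
SKIP_IF_CONTAINS = {"google.com", "facebook.com", "amazon.com", "microsoft.com", "netflix.com"}

# Hypothetical stand-in for the author's unpublished regex: only obvious ad/tracking hostnames.
STRICT_PATTERN = re.compile(r"^(ads?|adserver|track(er|ing)?|analytics|metrics)\d*\.")


def is_protected(domain: str, protected: set[str]) -> bool:
    """True if `domain` is a protected domain or a subdomain of one."""
    labels = domain.lower().rstrip(".").split(".")
    # Check every suffix: a.b.google.com, b.google.com, google.com, com
    return any(".".join(labels[i:]) in protected for i in range(len(labels)))


def screen_list(domains: Iterable[str],
                protected: set[str] = SKIP_IF_CONTAINS,
                strict: re.Pattern = STRICT_PATTERN) -> list[str]:
    """Accept a clean list as-is; for a list that touches a protected domain, keep only strict matches."""
    entries = [d.strip().lower().rstrip(".") for d in domains if d.strip()]
    if not any(is_protected(d, protected) for d in entries):
        return entries  # nothing suspicious: keep every entry
    # Flagged list: keep entries that match the strict pattern and are not protected themselves.
    return [d for d in entries if strict.match(d) and not is_protected(d, protected)]
```

The design choice is that one suspicious entry lowers trust in the whole source, not just in that entry.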

When I asked him on Reddit for his sources, he linked to only some of them, which are listed at https://credits.oisd.nl; that's why I'm eager to see what other sources he uses.

@mitchellkrogza
Member

mitchellkrogza commented Oct 11, 2019

That's one massive list of credits 🙀 I can almost guarantee you there is no way a list like that can be properly maintained and not contain false positives. Several of my own projects are included there, but even a quick look through reveals a number of other lists that have been dropped over the years for not being maintained and containing too many FPs. There's also a lot of duplication, because some of the projects included there are already made up of unified lists.

At one point Ultimate Hosts had well over 1.6 million entries (more, if I'm not mistaken), and the hosts file was unusable on most systems, especially Windows systems (it still is). It has shrunk somewhat thanks to better maintenance, community involvement, and the dropping of unreliable lists.

It's easy to make the biggest hosts file in the world (I've done it already), but we know that a file like that just isn't usable for anyone without crashing their networking or their computer.

@funilrys and I knew this some time ago and started working on a DNS-based solution, which works flawlessly. Due to time constraints and some needed changes in how we build the DNS zones, that project is on hold for now. Ultimately, DNS-based blocking is the only reliable way forward for the masses: a blocking service that places zero overhead on anyone's system and takes only milliseconds of network time, as DNS does.

@DRSDavidSoft
Author

DRSDavidSoft commented Oct 11, 2019

@mitchellkrogza This is true, and I agree with your points! While that list is extensive (and may contain many redundant entries that serve no purpose for many users), it usually contains only a bare minimum of FPs, and generally doesn't block anything other than what it advertises blocking.

I say this because I maintain a list of whitelist sources (in addition to my list of blacklist sources :3), and whenever I add a new blacklist to my list, I always cross-reference it against them for potential FPs, to see if it will break anything useful in the future. So far, MoaAB seems to contain the most FPs, while the oisd list contains the fewest among all the aggregated lists that I use.
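The comment doesn't say how the cross-referencing is done. A minimal sketch, assuming each source is already a plain list of domains: an entry counts as a potential FP if it equals a whitelisted domain or is a parent of one (in a wildcard-style DNS setup, blocking `example.com` also takes out `www.example.com`). The helper names and the source names `list_a`, `list_b`, `list_c` are hypothetical.

```python
from typing import Dict, Iterable, Set


def potential_fps(blocklist: Iterable[str], allowlist: Set[str]) -> Set[str]:
    """Blocklist entries that are allowlisted, or that would block an allowlisted subdomain."""
    hits = set()
    for entry in blocklist:
        entry = entry.strip().lower()
        if any(a == entry or a.endswith("." + entry) for a in allowlist):
            hits.add(entry)
    return hits


def rank_sources(sources: Dict[str, Iterable[str]], allowlist: Set[str]) -> list[tuple[str, int]]:
    """Sort source names by how many potential FPs they contain, worst first."""
    counts = {name: len(potential_fps(entries, allowlist)) for name, entries in sources.items()}
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

For example, with `allow = {"www.example.com", "cdn.shop.test"}`, a source containing `shop.test` and `www.example.com` ranks above one containing only `example.com`. The nested loop is quadratic, which is fine for illustration but would want a suffix index for million-entry lists.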

With that being said, I respect the decision to omit including oisd and/or MoaAB in your repo; it was just a request to see if they'd be accepted here 😄 as I already use the other, more useful lists from your repos.

As for duplicate issues, BTW, I believe it's the user's responsibility to check for and remove any duplicates when combining lists from different sources. I know I do, and there are many tools available to do just that.
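Combining sources means normalizing the formats mentioned earlier in the thread (hosts files, dnsmasq lists, plain domain lists) before deduplicating. A minimal sketch; which sink IPs count and which dnsmasq directives are recognized are assumptions for illustration, and real lists have more variants.

```python
import re
from typing import Iterable

# dnsmasq style: address=/example.com/0.0.0.0  or  server=/example.com/
DNSMASQ = re.compile(r"^(?:address|server)=/([^/]+)/")
SINK_IPS = {"0.0.0.0", "127.0.0.1", "::", "::1"}  # assumed set of "blackhole" addresses


def parse_line(line: str) -> list[str]:
    """Extract domains from a hosts, dnsmasq, or plain-domain line (comments stripped)."""
    line = line.split("#", 1)[0].strip().lower()
    if not line:
        return []
    m = DNSMASQ.match(line)
    if m:
        return [m.group(1).rstrip(".")]
    fields = line.split()
    if fields[0] in SINK_IPS:
        return [f.rstrip(".") for f in fields[1:]]  # a hosts line may list several names
    if len(fields) == 1 and "." in fields[0]:
        return [fields[0].rstrip(".")]
    return []


def merge(*lists: Iterable[str]) -> list[str]:
    """Merge several lists into one sorted, duplicate-free domain list."""
    seen: set[str] = set()
    for lines in lists:
        for line in lines:
            seen.update(d for d in parse_line(line) if d not in {"localhost", "localhost.localdomain"})
    return sorted(seen)
```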

As a side note... I personally dropped the /etc/hosts method of blocking years ago due to some issues I had at the time. I believe setting up a local DNS server is the way to go, for several benefits:
  1. Zero-overhead for the client, as you already said
  2. More control over the response (I prefer to get NXDOMAIN responses)
  3. More control over wildcarding and CNAMEs (i.e. blocking *.bad-domain.com as well as any CNAMEs)
  4. Portable (e.g. using a VPN)
  5. Expandable (i.e. using it for ALL my devices both at home and office!)
  6. Easier to maintain (there's only a small .conf file that you maintain with the list of sources, that you update on a weekly basis)
  7. Easier to debug (there are logging tools that show the user requests as well as tools to find out exactly which source contains FPs)
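Items 2 and 3 can be modeled with a toy decision function: a blocklist entry covers the domain and all of its subdomains, and a query is also refused if any name in its CNAME chain is blocked. This is a hypothetical sketch, not how Pi-hole, Unbound, or DNSCrypt-proxy implement it; `cnames` stands in for the CNAME records a real resolver would follow.

```python
from typing import Dict, Optional, Set


def is_blocked(name: str, blocked: Set[str]) -> bool:
    """True if `name` or any parent is blocked, so bad-domain.com also covers *.bad-domain.com."""
    labels = name.lower().rstrip(".").split(".")
    return any(".".join(labels[i:]) in blocked for i in range(len(labels)))


def answer(name: str, blocked: Set[str], cnames: Dict[str, str]) -> str:
    """'NXDOMAIN' if the name or any CNAME target in its chain is blocked, else 'RESOLVE'."""
    seen: Set[str] = set()
    current: Optional[str] = name.lower().rstrip(".")
    while current and current not in seen:  # `seen` guards against CNAME loops
        if is_blocked(current, blocked):
            return "NXDOMAIN"
        seen.add(current)
        current = cnames.get(current)
    return "RESOLVE"
```

Walking label suffixes keeps each lookup proportional to the number of labels in the name, regardless of how large the blocklist set is.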

These systems are already in place in well-established and popular projects such as Pi-hole, which have huge communities.

Personally, instead of the aforementioned Pi-hole, I'm using DNSCrypt-proxy as a resolver + Unbound as a forwarder/cache for my DNS requests, and I immensely enjoy the benefits of this setup.
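The comment doesn't say which component applies the blocklist. If it were Unbound, one way is a `local-zone` entry with type `always_nxdomain` per blocked domain; since a local zone covers everything beneath it, subdomains of an already-listed domain can be dropped to keep the file small. A sketch under those assumptions; the helper name `unbound_blocklist` is hypothetical.

```python
from typing import Iterable


def unbound_blocklist(domains: Iterable[str]) -> str:
    """Render domains as Unbound local-zone entries that answer NXDOMAIN.

    A local-zone covers the domain and all of its subdomains, so entries whose
    parent is already listed are skipped.
    """
    names = {d.strip().lower().rstrip(".") for d in domains if d.strip()}
    kept = []
    for name in sorted(names):
        labels = name.split(".")
        parents = {".".join(labels[i:]) for i in range(1, len(labels))}
        if not parents & names:
            kept.append(name)
    lines = ["server:"] + [f'    local-zone: "{n}" always_nxdomain' for n in kept]
    return "\n".join(lines) + "\n"
```

The generated file could then be pulled into `unbound.conf` with an `include:` line and regenerated whenever the source lists are updated.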

The whole setup runs on my server (a cheap VPS I bought) instead of a Raspberry Pi at home, so I can use it on my router at home, with a VPN from anywhere on my phone, or by feeding it as a resolver to my workplace's Windows Server Active Directory setup.

I should note that I have other reasons to set up a DNS server for myself using the DNSCrypt-proxy project besides blocking domains: my ISP (and/or my government) blocks many DNS requests without my consent (think youtube.com and twitter.com, for example) while letting adware crap like doubleclick.net through.

I'm against this type of useless blocking and I personally believe the opposite thing should be blocked, so I use DNSCrypt-proxy to bypass their useless DNS blocks, and deploy my own blacklist to block domains that actually should be blocked.

Just sharing my 2 cents. 😉

@dnmTX

dnmTX commented Oct 11, 2019

RIGHT ON @mitchellkrogza 🚀

@funilrys
Member

funilrys commented Oct 18, 2019

Well, if "a CI + PyFunceble sorting and filtering them out will be useful" is the only requirement, it's a good thing I haven't deleted it from @dead-hosts yet.

I pinged @mitchellkrogza over there (@dead-hosts ==> dev-center) in case he needs help setting up and configuring PyFunceble on his own machine.

In the meantime, we (@mitchellkrogza and I) just tested 100,000 domains in about 2 hours with the changes I made to the dev version over the last few hours. So let's see what happens with that huge list inside @dead-hosts.
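For a sense of scale, the reported throughput can be extrapolated to the "well over 1 million entries" quoted for the oisd list. This assumes the rate stays linear, which it may not (network latency, concurrency, and caching all matter); the 1,000,000 figure is just the lower bound of that quote.

```python
# Throughput reported above: about 100,000 domains in about 2 hours.
tested, hours = 100_000, 2
rate_per_hour = tested / hours            # 50,000 domains/hour
rate_per_second = rate_per_hour / 3600    # roughly 13.9 domains/second

# Assumed input: the lower bound of "well over 1 million entries".
list_size = 1_000_000
hours_needed = list_size / rate_per_hour  # 20 hours at the same rate
```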

I think everybody will agree with my decision to close this.
Let me know if some clarification or others are needed.

Cheers,
Nissar
