
SquidGuard lists format compatibility tool exists? #161

Closed
elico opened this issue Mar 13, 2018 · 14 comments


elico commented Mar 13, 2018

Hey,

I have seen that a couple of projects use this repository, and I started wondering about making it compatible with other formats such as SquidGuard and/or Squid-Cache domain ACLs/lists.
I have been maintaining the Squid-Cache project RPM and DEB packages for quite a while and have written many BSD-licensed tools for Squid users/admins.
Comparing the SquidGuard and Squid-Cache ACL formats to the hosts-file format, the main difference is that a hosts file targets a specific hostname, while SquidGuard and Squid-Cache offer the ability to blacklist a whole domain and its subdomains.
SquidGuard has one flaw compared to Squid-Cache domain ACLs: it cannot distinguish between an ACL that blocks only a single domain and one that blocks a domain and all of its subdomains.
Should the domain list domains-dotted-format.list be compatible with the Squid-Cache domain ACL format? What I mean is: should all the domains and subdomains in domains-dotted-format.list be blocked?
If not, I can write a script that compares and filters the different host/domain files to produce a Squid-Cache-compatible domain ACL format.
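To illustrate the difference between the formats (example entries only, not taken from the actual lists):

```
# hosts-file format: each entry targets one exact hostname
0.0.0.0 ads.example.com

# Squid-Cache dstdomain ACL: a bare entry matches only that domain,
# while a dot-prefixed entry matches the domain and all its subdomains
ads.example.com
.example.com
```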

@funilrys
Member

Hey @elico,

I never used SquidGuard but I like to learn and implement new things 😸
As far as I know, @mitchellkrogza does not use SquidGuard either, but he may have worked with it in the past.

Again, if you want an answer from @mitchellkrogza, you will have to wait until next week (normally). Since he gets a bunch of notifications while AFK for a week, it may take some time, but I'll prioritize this in our backend system.

To answer your questions:
Keep in mind that every domain or IP present in this repository should be blocked.
But also keep in mind that our list contains both bare domains and domains with subdomains.
An example is #158, which is present on the list as a single domain but also with subdomains.

If you have further questions, please let me know; I'll do my best to answer.


elico commented Mar 13, 2018

@funilrys First things first; time will tell what comes next.
I have seen the example at #158, and I do not know how the lists are generated or what they are based on, but I did get a chance to look at the update.py file.
Depending on the source from which a domain was blocked, the definition of the domain can change.

About SquidGuard
You do not need to learn SquidGuard or use it; let me explain.
Squid-Cache sends a newline-terminated line (i.e. ending with "\n") and expects a response of "OK\n" or "ERR\n".
The specific technical details are irrelevant since these are very old pieces of C code, but the concept SquidGuard implements is a simple key-value DB.
Each list has a "domain" DB file, which is a BDB key->value store(1); the domain is the key, and the value is irrelevant to SquidGuard.
When SquidGuard analyzes a URL it receives, it splits the URL into several parts and then tests them in a specific order against the domain DB files, as instructed by the configuration file.
Usually admins like categories, as in the famous "shallalist" (h'mm), and, like in many ACL and firewall systems, the first rule that matches wins.
Now, each domains DB contains only catch-all rules, i.e. for the URL:
http://1.t.y.example.org/test/path
it will test the domains in this order:

  • example.org
  • 1.t.y.example.org
  • t.y.example.org
  • y.example.org

So in the #158 example, patrz.pl and its subdomains will always be caught, as long as patrz.pl exists in a domains DB file that is meant to be blocked.
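In Python terms, that lookup could be sketched roughly like this (illustrative only; the real SquidGuard is C code with a BDB lookup, and its exact test order differs from the most-specific-first order used here):

```python
def match_candidates(host):
    """Return every domain suffix of `host` that could be tested
    against a domains DB, most specific first (illustrative order)."""
    labels = host.split(".")
    # "1.t.y.example.org" -> ["1.t.y.example.org", "t.y.example.org",
    #                         "y.example.org", "example.org"]
    return [".".join(labels[i:]) for i in range(len(labels) - 1)]

def is_blocked(host, blocked_domains):
    """True if the host itself or any parent domain is blacklisted."""
    return any(c in blocked_domains for c in match_candidates(host))

blocked = {"example.org"}
print(is_blocked("1.t.y.example.org", blocked))  # True: example.org matches
print(is_blocked("example.com", blocked))        # False: no suffix listed
```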

However, Squid-Cache domain ACLs use another concept, based on the domain tree structure.
A domain ACL entry can take one of two forms, but they cannot collide:
patrz.pl
will match only patrz.pl, while a dot-prefixed entry such as .patrz.pl will match patrz.pl and any of its subdomains.
With this in mind, since .patrz.pl already covers every subdomain, you cannot create an entry in the list that coexists with a higher-level domain covering it.
Since the hosts list is fairly "small" in terms of memory consumption, I could write an ICAP service that uses any of the lists this repository offers, but the current lists do not define anywhere whether a domain and its subdomains are "blacklisted" or only that specific domain.

If you asked me for a resolution to such a conflict, one that is not ideal but practical: when a higher-level domain and one of its subdomains exist on the same list, only an exact match for those domains should count as a match (compared to hosts lists, which only ever match the exact domain).

Depending on the source of, or reason for, the blacklisting, a domain could be prefixed with a dot "." to indicate a full block, or left without the dot prefix to indicate an exact-match block only.
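That conversion could be sketched like this (a hypothetical converter, not an existing tool; it assumes the dstdomain dot-prefix semantics described above):

```python
def to_squid_acl(domains):
    """Convert a flat host/domain list into Squid dstdomain-style entries.

    A domain that has listed subdomains is emitted with a leading dot
    (block it and every subdomain); a subdomain whose parent is already
    listed is dropped, since the dotted parent entry covers it.
    """
    listed = set(domains)

    def parents(d):
        parts = d.split(".")
        # proper parent domains, e.g. "www.patrz.pl" -> ["patrz.pl"]
        return (".".join(parts[i:]) for i in range(1, len(parts) - 1))

    acl = []
    for d in sorted(listed):
        if any(p in listed for p in parents(d)):
            continue  # covered by the parent's dotted entry
        has_child = any(x.endswith("." + d) for x in listed)
        acl.append("." + d if has_child else d)
    return acl

print(to_squid_acl(["patrz.pl", "www.patrz.pl", "example.org"]))
# ['example.org', '.patrz.pl']
```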


elico commented Apr 12, 2018

@funilrys Related to this topic, I was wondering about a public service.
I have a service already written that I can modify to use your lists (black/white) and offer anyone access to it.
Like the DNS services OpenDNS/Symantec/others offer, I could provide the following:

  • DNS rating API (e.g. 127.0.0.127 answers a blacklist match and 127.0.0.1 a whitelist match)
  • HTTP/HTTPS (1+2) host/URL query
  • ICAP/ICAPS HTTP/HTTPS query

What do you think about the idea?

@funilrys
Member

I think @mitchellkrogza can answer this better than me, as we are launching a similar service ASAP, @elico :)

@mitchellkrogza
Member

mitchellkrogza commented Apr 13, 2018

@elico this is a great idea, and we could certainly add the generation of a SquidGuard-formatted list.

We are in the process of building the DNS service side for dealing with all the domains listed here, and it's already working great. If you query the DNS for any domain listed on Ultimate, it will respond to your PC with 0.0.0.0; when you query a non-listed domain, the server does the forward query and sends the answer back intact. So google.com will respond with 2607:f8b0:4004:801::200e and 216.58.217.142, but 000sex.com will respond with 0.0.0.0.

Our logic behind this was a little different from many DNS-based services, which work on a whitelist/blacklist answer: we only answer for blacklisted domains, and anything not blacklisted is automatically whitelisted and passed on to DNS recursion.

Here's a sample run of some nslookups:

$nslookup 000xxx.com

Non-authoritative answer:
Name:    000xxx.com
Address:  0.0.0.0


$nslookup www.000xxx.com

Non-authoritative answer:
Name:    000xxx.com
Address:  0.0.0.0
Aliases:  www.000xxx.com


$nslookup google.com

Non-authoritative answer:
Name:    google.com
Addresses:  2607:f8b0:4004:80e::200e
          172.217.7.206

@mitchellkrogza
Member

@elico today we did our final coding on the generation of the zone file, thanks to @funilrys' excellent Python programming skills.

We now generate the full zone file containing all 1,798,038 domains (as of today), all of it in under 7 minutes, and the DNS server is live with the latest update.

Well Done @funilrys on this commendable piece of code 👍 🎉
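For illustration, a minimal sketch of the idea (not our actual generator, which also has to emit the SOA/NS boilerplate a real BIND zone needs) could look like:

```python
def zone_records(domains, answer="0.0.0.0"):
    """Emit one A record per blocked domain plus a wildcard for its
    subdomains, so both 000xxx.com and www.000xxx.com resolve to the
    sinkhole address."""
    lines = []
    for d in domains:
        lines.append(f"{d}. IN A {answer}")
        lines.append(f"*.{d}. IN A {answer}")
    return "\n".join(lines)

print(zone_records(["000xxx.com"]))
# 000xxx.com. IN A 0.0.0.0
# *.000xxx.com. IN A 0.0.0.0
```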


elico commented Apr 13, 2018

@mitchellkrogza what language was it written in?
I have written my services in Go, and since I saw your project I have been thinking about a nice radix tree to implement the DNS service.
It actually doesn't require any zone file, just a plain-text file of clean domains and IP addresses.

@mitchellkrogza
Member

@elico we're using plain BIND9, the global DNS of choice; our script to generate the zone file daily is written in Python by @funilrys.


elico commented May 9, 2018

@mitchellkrogza Can I get any references to the Python script which generates the zone file daily?

@funilrys
Member

@elico It's just a basic file-generation script ... If you know BIND9, you can generate whatever you want ...

@stale

stale bot commented Oct 20, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Oct 20, 2019
@spirillen
Contributor

> @elico today we did our final coding on the generation of the zone file thanks to @funilrys excellent Python programming skills.
>
> We now generate the full zone file containing the full 1,798,038 domains (as of today) and all of this in < 7 minutes and the DNS server is live with the latest update.
>
> Well Done @funilrys on this commendable piece of code 👍 🎉

7 minutes?? Damn, does it have to go to the toilet first?? I use seconds with PowerDNS... and the reload is almost instant... BIND9 needs some code optimization...

That was with an RPZ file of 1.8 million records. After replacing it with wildcards like

example.com CNAME .
*.example.com CNAME .

it's now down to ~600,000-700,000 records, and you seriously need to use the time command to measure the reload/regeneration time.
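A sketch of that wildcard collapse in Python (hypothetical; not the script I actually use with PowerDNS):

```python
def rpz_wildcards(domains):
    """Collapse a flat domain list into RPZ NXDOMAIN record pairs,
    dropping subdomains that a parent's *.parent wildcard already
    covers, so 1.8M flat records shrink to far fewer pairs."""
    listed = set(domains)

    def covered_by_parent(d):
        parts = d.split(".")
        return any(".".join(parts[i:]) in listed
                   for i in range(1, len(parts) - 1))

    records = []
    for d in sorted(listed):
        if covered_by_parent(d):
            continue  # the parent's wildcard entry already matches it
        records.append(f"{d} CNAME .")
        records.append(f"*.{d} CNAME .")
    return records

print(rpz_wildcards(["example.com", "www.example.com", "cdn.example.com"]))
# ['example.com CNAME .', '*.example.com CNAME .']
```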

@stale stale bot removed the wontfix label Oct 20, 2019
@funilrys
Copy link
Member

funilrys commented Oct 20, 2019

@spirillen, as Mitch @mitchellkrogza mentioned elsewhere, it's outdated. We will come back to it soon.

@stale

stale bot commented Dec 19, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Dec 19, 2019
@stale stale bot closed this as completed Dec 26, 2019