Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for Adblock Plus domain lists #1532

Merged
merged 10 commits into from
Mar 11, 2023
Merged

Conversation

DL6ER
Copy link
Member

@DL6ER DL6ER commented Feb 15, 2023

What does this implement/fix?

This PR implements support for AdBlock Plus (ABP)-style domain lists in Pi-hole. To be precise, it adds support for the following domain matching syntax defined here at the time of opening this PR:

image

We do not implement any other features such as exception rules as they are typically beyond what a DNS server can do (path information, for instance, is simply not available).

It should be noted that this new feature is not for free but the rather complex syntax means it comes at some computational costs (= delays in DNS replies if you are on low-end hardware). To mitigate this drawback, ABP-style matching is only enabled when FTL actually detects such domains in the gravity table. This shouldn't be the case for the vast majority of users using "normal" HOSTS-style or simple one-domain-per-line adlists as sources for Pi-hole.

Related issue or feature (if applicable): N/A

Pull request in docs with documentation (if applicable): N/A


By submitting this pull request, I confirm the following:

  1. I have read and understood the contributors guide, as well as this entire template. I understand which branch to base my commits and Pull Requests against.
  2. I have commented my proposed changes within the code.
  3. I am willing to help maintain this change if there are issues with it later.
  4. It is compatible with the EUPL 1.2 license
  5. I have squashed any insignificant commits. (git rebase)

Checklist:

  • The code change is tested and works locally.
  • I based my code and PRs against the repositories developmental branch.
  • I signed off all commits. Pi-hole enforces the DCO for all contributions
  • I signed all my commits. Pi-hole requires signatures to verify authorship
  • I have read the above and my PR is ready for review.

…ds to be switched on by setting GRAVITY_ABP_STYLE=true in pihole-FTL.conf to avoid running this computationally expensive task on the vast majority of user databases only fed from properly formatted HOSTS lists. Gravity can enable the setting when it detects ABP format automatically.

Signed-off-by: DL6ER <[email protected]>
… domains are present in the database. This ensures that this addition comes at no extra costs to any installs using pure HOSTS-style adlists.

Signed-off-by: DL6ER <[email protected]>
@DL6ER DL6ER requested review from PromoFaux, yubiuser and a team February 15, 2023 20:21
src/database/gravity-db.c Outdated Show resolved Hide resolved
Spespellingllling

Signed-off-by: Dan Schaper <[email protected]>
@dschaper
Copy link
Member

Will need a matching PR in core to allow gravity.sh to pass the new format.

@PromoFaux
Copy link
Member

🎉

root@docker-pihole-pi4:/# pihole-FTL sqlite3 /etc/pihole/gravity.db "Select * from gravity limit 1"
||0.code.cotsta.ru^|2
root@docker-pihole-pi4:/# nslookup 0.code.cotsta.ru 127.0.0.1
Server:         127.0.0.1
Address:        127.0.0.1#53

Name:   0.code.cotsta.ru
Address: 0.0.0.0
Name:   0.code.cotsta.ru
Address: ::

root@docker-pihole-pi4:/# nslookup subdomain.0.code.cotsta.ru 127.0.0.1
Server:         127.0.0.1
Address:        127.0.0.1#53

Name:   subdomain.0.code.cotsta.ru
Address: 0.0.0.0
Name:   subdomain.0.code.cotsta.ru
Address: ::

@pralor-bot
Copy link

This pull request has been mentioned on Pi-hole Userspace. There might be relevant details there:

https://discourse.pi-hole.net/t/regex-capture-non-capture-groups-best-practice/61297/10

@jpgpi250
Copy link

It should be noted that this new feature is not for free but the rather complex syntax means it comes at some computational costs (= delays in DNS replies if you are on low-end hardware). To mitigate this drawback, ABP-style matching is only enabled when FTL actually detects such domains in the gravity table.

could you please ensure there is a warning when "ABP-style matching" is enabled, either in the FTL log or in the web interface ("pi-hole diagnosis"), this to ensure the system warns there might be a "computational cost".

Thanks.

@yubiuser
Copy link
Member

I added an info to the gravity script

  [i] Target: https://small.oisd.nl
  [✓] Status: Retrieval successful
  [i] List contained AdBlock Plus style domains
  [i] Imported 72149 domains
  [i] List has been updated

@jpgpi250
Copy link

I added an info to the gravity script

gravityupdate runs on sunday early morning, not too many people look at the resulting log. An entry in the FTL log would be highly appreciated, monitoring this log allows me to generate the appropriate alert.

@yubiuser
Copy link
Member

Just for my understanding: what do you intend to do, if you would get such a warning about an adlist? What are the consequences of the warning?

@jpgpi250
Copy link

Given the predicted performance cost, I would remove the list that activates "ABP-style matching", I'm happy with the amount of blocking I achieve now (35.4%) and confident that the additional domains are already (or will be) part of the regular lists I currently use.
Hopefully, this the list, containing these entries is easy to find...

@DL6ER
Copy link
Member Author

DL6ER commented Feb 25, 2023

Given the predicted performance cost, I would remove the list that activates "ABP-style matching

Having a certain high-performance computing background, my choice of words may have not been adequate. Typically, gravity lookups take on the order of few microseconds so with ABP style blocking enabled we're still talking sub-milliseconds here. I don't think it's worth making a big deal out of it but it also won't harm to add a log message like

gravity contains ABP-style domains

@jpgpi250
Copy link

gravity contains ABP-style domains

  • list url and / or ID please

@jfb-pihole
Copy link
Sponsor Member

  • list url and / or ID please

Is this really necessary? There may be multiple lists of this type, depending on what the user added.

@DL6ER
Copy link
Member Author

DL6ER commented Feb 25, 2023

No, I will neither add the list URL nor the adlist ID. The reason is that this would be a very expensive database query. Image expensive now on the order of several seconds to a few minutes of FTL downtime. The reason is that we'd have to do a full database scan to get this information. Currently, FTL scans only if there is any domain in ABP style and stops right when it found one. However, there may be multiple adlists in the database so we'd have to scan the entire table until we have collected all adlist IDs.

As always, you can easily get what you want by a third-party script running an appropriate SQL query, such as

SELECT DISTINCT adlist_id FROM gravity WHERE domain LIKE '||%^';

@rdwebdesign
Copy link
Member

@DL6ER DL6ER marked this pull request as draft February 25, 2023 17:35
@DL6ER
Copy link
Member Author

DL6ER commented Feb 25, 2023

I pushed a new commit and set this PR back to draft because I'm interested in your findings about timing. Please, check out again this branch and have a look into your FTL.log file for something like

TIMED gravity_check_ABP_format(): .... ms

What does it say in your case? Do you have ABP domains in your gravity or not? I'm also interested in results once with and once without these

@yubiuser
Copy link
Member

I tried this on a x86 (Celeron J4105 ) once with a single ABP list (https://small.oisd.nl)

TIMED gravity_check_ABP_format(): 325.687554 ms

and once without

TIMED gravity_check_ABP_format(): 323.809821 ms

…king is to be used or not. Also log when FTL enabled ABP-style blocking

Signed-off-by: DL6ER <[email protected]>
@PromoFaux
Copy link
Member

Yes

pi-hole/pi-hole@c5faf3d

src/database/gravity-db.c Outdated Show resolved Hide resolved
src/database/gravity-db.c Outdated Show resolved Hide resolved
src/database/gravity-db.c Outdated Show resolved Hide resolved
@jpgpi250
Copy link

another one that might be useful when the need arises to compare domain entries from the gravity table (gravity.db) with domain entries from the queries view (pihole-FTL.db)

SELECT trim(domain, '|^')

example (not working on standard pihole deployments)

# domains that used to be allowed, now blocked
exec 3>&1 1>"${importfile}"
	pihole-FTL sqlite3 -separator ',' "${FTLdb}" << EOSQL
	ATTACH '${olddb}' AS olddb;
	SELECT timestamp, domain, status from queries WHERE 
		timestamp < '${timestamp}'
		AND status IN (2, 3, 12, 13, 14)
		AND domain IN (SELECT trim(domain, '|^') FROM olddb.new) GROUP BY domain;
EOSQL
exec 1>&3 3>&-
sudo sqlite3 "${olddb}" "DELETE FROM blocked;"
sudo pihole-FTL sqlite3 -separator ',' "${olddb}" ".mode csv" ".import '${importfile}' blocked"
sudo rm "${importfile}"

sqlite3 select result with trim (sample) - may match an entry in the queries view:

zero.dns0.eu
zetland.rm-ni.uk
zougloub.eu
zrh1-ns01.monzoon.net
zxcvb.pp.ua

sqlite3 select result without trim (sample) - will never match anything in the queries view:

||zero.dns0.eu^
||zetland.rm-ni.uk^
||zougloub.eu^
||zrh1-ns01.monzoon.net^
||zxcvb.pp.ua^

@dschaper
Copy link
Member

pihole -q looks OK

pihole -q has not been updated to know how to parse ABP style. It likely will not have that capability when the feature is initially released. It will match exact but not subdomains.

 pihole -q 1qaz.de
 Match found in http:https://localhost.localdomain/DOHservers/DOHadb.txt:
   ||1qaz.de^

That's fine and will show the exact match.

pihole -q sub.1quaz.de will not show that it is part of ||1qaz.de^ until we find a way for shell to be able to walk a domain while being efficient enough.

@dschaper
Copy link
Member

sqlite3 select result without trim (sample) - will never match anything in the queries view:

There needs to be an understanding that ABP style entries are 'special' and not a direct match for comparison.

@jpgpi250
Copy link

jpgpi250 commented Feb 28, 2023

@dschaper @rdwebdesign

That's fine and will show the exact match.

pihole -q sub.1quaz.de will not show that it is part of ||1qaz.de^ until we find a way for shell to be able to walk a domain while being efficient enough.

copied over query.sh from branch abp_query
FYI 2.341.067 domains in gravity.

result (success):

pi@raspberrypi:~/tmp $ time pihole -q zzxedr.xyz
 Match found in https://v.firebog.net/hosts/AdguardDNS.txt:
   zzxedr.xyz
 Match found in https://small.oisd.nl:
   ||zzxedr.xyz^

real    0m5.325s
user    0m4.923s
sys     0m0.394s
pi@raspberrypi:~/tmp $ time pihole -q sub.zzxedr.xyz
 Match found in https://small.oisd.nl:
   ||zzxedr.xyz^

real    0m7.638s
user    0m7.324s
sys     0m0.337s

will keep testing...

@jpgpi250
Copy link

jpgpi250 commented Mar 3, 2023

  1. the FTL log shows a lot of entries like (partial only, there are more entries):
[2023-03-03 09:35:29.251 9201/F21551] Using ABP-style entries in gravity database
[2023-03-03 09:35:29.257 9202/F21551] Using ABP-style entries in gravity database
[2023-03-03 09:35:29.261 9203/F21551] Using ABP-style entries in gravity database
[2023-03-03 09:35:29.771 9205/F21551] Using ABP-style entries in gravity database
[2023-03-03 09:35:29.774 9204/F21551] Using ABP-style entries in gravity database
[2023-03-03 09:35:29.775 9206/F21551] Using ABP-style entries in gravity database
[2023-03-03 09:38:25.528 9431/F21551] Using ABP-style entries in gravity database
[2023-03-03 09:38:25.531 9432/F21551] Using ABP-style entries in gravity database
[2023-03-03 09:38:25.535 9433/F21551] Using ABP-style entries in gravity database
[2023-03-03 09:39:35.562 9480/F21551] Using ABP-style entries in gravity database
[2023-03-03 09:39:35.568 9481/F21551] Using ABP-style entries in gravity database
[2023-03-03 09:39:35.571 9482/F21551] Using ABP-style entries in gravity database

is this intended? I know I asked for a warning, but this looks like overkill. The warning, when pihole-FTL starts is sufficient.

  1. what does '9482/F21551' and similar entries (see below, 21551/T21567) indicate in this message?
[2023-03-03 09:40:00.086 21551/T21567] Notice: Database size is 3.92 MB, deleted 168 rows

@dschaper
Copy link
Member

dschaper commented Mar 4, 2023

2. '9482/F21551' and similar entries (see below, 21551/T21567

I'm pretty sure those are parent process PIDs and Fork | Thread PIDs.

@dschaper
Copy link
Member

dschaper commented Mar 4, 2023

Are there any outstanding issues that need to be addressed before we can merge this in?

Are the timing calls and the header includes meant to be removed?

@DL6ER
Copy link
Member Author

DL6ER commented Mar 4, 2023

the FTL log shows a lot of entries

They should only be shown when gravity is reloaded. However, you post made me aware of that gravity is reloaded when forks are created as SQLite3 doesn't have proper multi-processing support when it comes to forks (only threads are fine). I'll see how this can be improved, setting back to draft until then.

@dschaper We can remove them, I'll do this as well.

@DL6ER DL6ER marked this pull request as draft March 4, 2023 19:41
…ve the hint about ABP domains, this can easily be checked in gravity

Signed-off-by: DL6ER <[email protected]>
@DL6ER
Copy link
Member Author

DL6ER commented Mar 5, 2023

I removed the leftover traces of the benchmarking we did in between and decided to also remove the ABP-style blocking hint. Ensuring it is logged for every gravity rebuild but not when, e.g., FTL forks a dedicated TCP handler seems unnecessarily complicated given that users (a) should know when they add lists with such domains, (b) pihole -g would tell them pretty obvious, (c) it can be checked at any time using SELECT value FROM info WHERE property = 'abp_domains';, and, finally, (d) it doesn't actually matter.

This PR should now finally be ready for review + merge.

@DL6ER DL6ER marked this pull request as ready for review March 5, 2023 13:00
@DL6ER DL6ER requested a review from a team March 5, 2023 13:00
@DL6ER DL6ER dismissed yubiuser’s stale review March 5, 2023 13:00

Review comments have been addressed

@jpgpi250
Copy link

jpgpi250 commented Mar 5, 2023

latest version installed
FTL version is new/adb_style_blocking vDev-da118e8 (Latest: v5.21)
will report back...

@jpgpi250
Copy link

jpgpi250 commented Mar 8, 2023

  • FTL behaves as expected (no problems)
  • I've merged the development version of query.sh (fix for #5199) with the latest version of query.sh from branch abp_query (fixes match for abp_style sub domains e.g. pihole -q sub.zzxedr.xyz with gravity entry ||zzxedr.xyz^). Everything appears to be working as expected.
  • there is a difference in gravity.sh beween branches abp and abp_query (first line is from abp branch):
  valid_domain_pattern="([a-z0-9]([a-z0-9_-]{0,61}[a-z0-9]){0,1}\.)+[a-z0-9][a-z0-9-]{0,61}[a-z0-9]\.?"
  valid_domain_pattern="([a-z0-9]([a-z0-9_-]{0,61}[a-z0-9]){0,1}\.)+[a-z0-9][a-z0-9-]{0,61}[a-z0-9]"

see issue 4701, PR 4704, related?

edit
I investigated the gravity.sh change further, It looks like the first entry is the one to go:

PR ensures trailing periods are removed before entering the domain into gravity. The result is that more domains are valid, domains with a trailing period are considered valid, trailing period is removed by gravity.sh.

my test file (used to verify pihole-FTL returns correct results), with an entry with trailing dot (in real life, there are blocklists with domains with a trailing dot):

# nsd (Name Server Daemon for ftl.test)
gravity-blocked.test.ftl
gravity-46-blocked.test.ftl
# trailing dot entry (https://github.com/pi-hole/pi-hole/issues/4701)
gravity-dot-blocked.test.ftl.
# gravity abp-style (Adblock Plus)
||gravity-abp.test.ftl^

with the abp_query version:

  [i] Target: file:https:///home/pi/pihole/testftl.txt
  [i] Status: Pending...
  [] Status: Retrieval successful
  [i] List contained AdBlock Plus style domains
  [i] Imported 3 domains, ignoring 1 non-domain entries
      Sample of non-domain entries:
        - gravity-dot-blocked.test.ftl.
  [i] List has been updated

pihole -q gravity-dot-blocked.test.ftl
  [i] No results found for gravity-dot-blocked.test.ftl within the adlists

with the abp version:

  [i] Target: file:https:///home/pi/pihole/testftl.txt
  [i] Status: Pending...
  [] Status: Retrieval successful
  [i] List contained AdBlock Plus style domains
  [i] Imported 4 domains
  [i] List stayed unchanged

pihole -q gravity-dot-blocked.test.ftl
 Match found in file:https:///home/pi/pihole/testftl.txt:
   gravity-dot-blocked.test.ftl

rdwebdesign has been informed of this (PM)
/edit

@rdwebdesign
Copy link
Member

rdwebdesign commented Mar 8, 2023

  • there is a difference in gravity.sh beween branches abp and abp_query (first line is from abp branch):
  valid_domain_pattern="([a-z0-9]([a-z0-9_-]{0,61}[a-z0-9]){0,1}\.)+[a-z0-9][a-z0-9-]{0,61}[a-z0-9]\.?"
  valid_domain_pattern="([a-z0-9]([a-z0-9_-]{0,61}[a-z0-9]){0,1}\.)+[a-z0-9][a-z0-9-]{0,61}[a-z0-9]"

This difference is expected since my code on abp_query is based on a previous version of abp branch and it was initially intended as a test. I created a test branch in Feb 27. This trailing dot was introduced on Mar 4 and I didn't had the time to update my branch.

I appreciate your enthusiasm, but as I said before abp_query branch is still a work in progress (and there is no PR yet).
That branch will be updated after abp PR is merged.
There is no reason to rush things.

@pralor-bot
Copy link

This pull request has been mentioned on Pi-hole Userspace. There might be relevant details there:

https://discourse.pi-hole.net/t/pi-hole-ftl-v5-22-web-v5-19-and-core-v5-16-1-released/61999/1

@pralor-bot
Copy link

This pull request has been mentioned on Pi-hole Userspace. There might be relevant details there:

https://discourse.pi-hole.net/t/urgent-feature-request-to-whitelist/63603/14

@pralor-bot
Copy link

This pull request has been mentioned on Pi-hole Userspace. There might be relevant details there:

https://discourse.pi-hole.net/t/symbols-in-adlist-pipes-and-circumflex-caret/68498/2

@pralor-bot
Copy link

This pull request has been mentioned on Pi-hole Userspace. There might be relevant details there:

https://discourse.pi-hole.net/t/list-download-failed-no-cached-list-available/69968/4

@pralor-bot
Copy link

This pull request has been mentioned on Pi-hole Userspace. There might be relevant details there:

https://discourse.pi-hole.net/t/cant-add-whitelist-txt-for-rakuten/69984/2

@pralor-bot
Copy link

This pull request has been mentioned on Pi-hole Userspace. There might be relevant details there:

https://discourse.pi-hole.net/t/does-pihole-support-fetching-regex-lists-yet/70087/4

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants