-
-
Notifications
You must be signed in to change notification settings - Fork 188
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for Adblock Plus domain lists #1532
Conversation
…ds to be switched on by setting GRAVITY_ABP_STYLE=true in pihole-FTL.conf to avoid running this computationally expensive task on the vast majority of user databases only fed from properly formatted HOSTS lists. Gravity can enable the setting when it detects ABP format automatically. Signed-off-by: DL6ER <[email protected]>
… domains are present in the database. This ensures that this addition comes at no extra costs to any installs using pure HOSTS-style adlists. Signed-off-by: DL6ER <[email protected]>
…list not available" Signed-off-by: DL6ER <[email protected]>
Signed-off-by: Christian König <[email protected]>
Spespellingllling Signed-off-by: Dan Schaper <[email protected]>
Will need a matching PR in core to allow gravity.sh to pass the new format. |
🎉
|
This pull request has been mentioned on Pi-hole Userspace. There might be relevant details there: https://discourse.pi-hole.net/t/regex-capture-non-capture-groups-best-practice/61297/10 |
could you please ensure there is a warning when "ABP-style matching" is enabled, either in the FTL log or in the web interface ("pi-hole diagnosis"), this to ensure the system warns there might be a "computational cost". Thanks. |
I added an info to the gravity script
|
gravityupdate runs on sunday early morning, not too many people look at the resulting log. An entry in the FTL log would be highly appreciated, monitoring this log allows me to generate the appropriate alert. |
Just for my understanding: what do you intend to do, if you would get such a warning about an adlist? What are the consequences of the warning? |
Given the predicted performance cost, I would remove the list that activates "ABP-style matching", I'm happy with the amount of blocking I achieve now (35.4%) and confident that the additional domains are already (or will be) part of the regular lists I currently use. |
Having a certain high-performance computing background, my choice of words may have not been adequate. Typically, gravity lookups take on the order of few microseconds so with ABP style blocking enabled we're still talking sub-milliseconds here. I don't think it's worth making a big deal out of it but it also won't harm to add a log message like
|
|
Is this really necessary? There may be multiple lists of this type, depending on what the user added. |
No, I will neither add the list URL nor the adlist ID. The reason is that this would be a very expensive database query. Image expensive now on the order of several seconds to a few minutes of FTL downtime. The reason is that we'd have to do a full database scan to get this information. Currently, FTL scans only if there is any domain in ABP style and stops right when it found one. However, there may be multiple adlists in the database so we'd have to scan the entire table until we have collected all adlist IDs. As always, you can easily get what you want by a third-party script running an appropriate SQL query, such as SELECT DISTINCT adlist_id FROM gravity WHERE domain LIKE '||%^'; |
The information is already on |
Signed-off-by: DL6ER <[email protected]>
I pushed a new commit and set this PR back to draft because I'm interested in your findings about timing. Please, check out again this branch and have a look into your
What does it say in your case? Do you have ABP domains in your gravity or not? I'm also interested in results once with and once without these |
I tried this on a x86 (Celeron J4105 ) once with a single ABP list (https://small.oisd.nl)
and once without
|
…king is to be used or not. Also log when FTL enabled ABP-style blocking Signed-off-by: DL6ER <[email protected]>
Signed-off-by: DL6ER <[email protected]>
another one that might be useful when the need arises to compare domain entries from the gravity table (gravity.db) with domain entries from the queries view (pihole-FTL.db)
example (not working on standard pihole deployments)
sqlite3 select result with trim (sample) - may match an entry in the queries view:
sqlite3 select result without trim (sample) - will never match anything in the queries view:
|
That's fine and will show the exact match.
|
There needs to be an understanding that ABP style entries are 'special' and not a direct match for comparison. |
copied over query.sh from branch abp_query result (success):
will keep testing... |
is this intended? I know I asked for a warning, but this looks like overkill. The warning, when pihole-FTL starts is sufficient.
|
I'm pretty sure those are |
Are there any outstanding issues that need to be addressed before we can merge this in? Are the timing calls and the header includes meant to be removed? |
They should only be shown when gravity is reloaded. However, you post made me aware of that gravity is reloaded when forks are created as SQLite3 doesn't have proper multi-processing support when it comes to forks (only threads are fine). I'll see how this can be improved, setting back to draft until then. @dschaper We can remove them, I'll do this as well. |
…ve the hint about ABP domains, this can easily be checked in gravity Signed-off-by: DL6ER <[email protected]>
I removed the leftover traces of the benchmarking we did in between and decided to also remove the ABP-style blocking hint. Ensuring it is logged for every gravity rebuild but not when, e.g., FTL forks a dedicated TCP handler seems unnecessarily complicated given that users (a) should know when they add lists with such domains, (b) This PR should now finally be ready for review + merge. |
latest version installed |
see issue 4701, PR 4704, related? edit PR ensures trailing periods are removed before entering the domain into gravity. The result is that more domains are valid, domains with a trailing period are considered valid, trailing period is removed by gravity.sh. my test file (used to verify pihole-FTL returns correct results), with an entry with trailing dot (in real life, there are blocklists with domains with a trailing dot):
with the abp_query version:
with the abp version:
rdwebdesign has been informed of this (PM) |
This difference is expected since my code on I appreciate your enthusiasm, but as I said before |
This pull request has been mentioned on Pi-hole Userspace. There might be relevant details there: https://discourse.pi-hole.net/t/pi-hole-ftl-v5-22-web-v5-19-and-core-v5-16-1-released/61999/1 |
This pull request has been mentioned on Pi-hole Userspace. There might be relevant details there: https://discourse.pi-hole.net/t/urgent-feature-request-to-whitelist/63603/14 |
This pull request has been mentioned on Pi-hole Userspace. There might be relevant details there: https://discourse.pi-hole.net/t/symbols-in-adlist-pipes-and-circumflex-caret/68498/2 |
This pull request has been mentioned on Pi-hole Userspace. There might be relevant details there: https://discourse.pi-hole.net/t/list-download-failed-no-cached-list-available/69968/4 |
This pull request has been mentioned on Pi-hole Userspace. There might be relevant details there: https://discourse.pi-hole.net/t/cant-add-whitelist-txt-for-rakuten/69984/2 |
This pull request has been mentioned on Pi-hole Userspace. There might be relevant details there: https://discourse.pi-hole.net/t/does-pihole-support-fetching-regex-lists-yet/70087/4 |
What does this implement/fix?
This PR implements support for AdBlock Plus (ABP)-style domain lists in Pi-hole. To be precise, it adds support for the following domain matching syntax defined here at the time of opening this PR:
We do not implement any other features such as exception rules as they are typically beyond what a DNS server can do (path information, for instance, is simply not available).
It should be noted that this new feature is not for free but the rather complex syntax means it comes at some computational costs (= delays in DNS replies if you are on low-end hardware). To mitigate this drawback, ABP-style matching is only enabled when FTL actually detects such domains in the
gravity
table. This shouldn't be the case for the vast majority of users using "normal" HOSTS-style or simple one-domain-per-line adlists as sources for Pi-hole.Related issue or feature (if applicable): N/A
Pull request in docs with documentation (if applicable): N/A
By submitting this pull request, I confirm the following:
git rebase
)Checklist:
developmental
branch.