Cosmetic bucket performance improvements + stabilizations (#163)
  * [BREAKING] `getCosmeticsFilter` API changed to allow finer-grained subsetting
    of the cosmetic filters returned: hostname-specific, DOM-specific, generic, etc.
  * [BREAKING] cosmetic unhide filters without hostname constraints are allowed.
  * [BREAKING] `NetworkFilter.isCptAllowed` now accepts the request type as a string.
  * [BREAKING] drop support for legacy Firefox Bootstrap request types.
  * Fix matching of hostname anchors with wildcards.
  * Add support for `$frame` option in network filters.
  * Add support for `$document` and `$doc` options in network filters.
  * Add soft dependency on tldts to simplify API
    - left as require/import in normal bundles
    - bundled in minified bundles
  * Add tests for Request abstraction
  * Add static method helpers to create Request instances
    - `Request.fromRawDetails(...)`
    - `Request.fromWebRequestDetails(...)`
    - `Request.fromPuppeteerDetails(...)`
    - `Request.fromElectronDetails(...)`
  * Add tests for injection using `jsdom`
  * Cosmetic filtering performance improvements
    - Make use of DOM information to return a subset of filters: ids, classes, hrefs
    - Make use of a MutationObserver in the content script to report new DOM info
  * Create integration benchmark to measure full extension
  * Add Request parsing micro-benchmark
  * Update bench/comparison to use adblock-rs instead of ad-block
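The DOM-aware cosmetic subsetting listed above can be illustrated with a small sketch. It assumes generic cosmetic filters are plain CSS selectors, buckets each one by the id or class its selector starts with, and, given the ids and classes a content script reports, returns only the buckets that could possibly match. `keyOf`, `buildIndex` and `selectorsFor` are hypothetical names, hrefs are left out for brevity, and this is not the library's actual implementation.

```javascript
// Hypothetical sketch of DOM-aware cosmetic filter subsetting; not the
// library's implementation. Selector lists such as ".a, .b" are assumed
// not to occur, since they could match without the leading class present.

// Derive a bucket key from the id or class a selector starts with, or null.
function keyOf(selector) {
  const id = /^#([\w-]+)/.exec(selector);
  if (id !== null) return `id:${id[1]}`;
  const cls = /^\.([\w-]+)/.exec(selector);
  if (cls !== null) return `class:${cls[1]}`;
  return null;
}

// Group selectors by key; selectors without a usable key go to `unindexed`.
function buildIndex(selectors) {
  const index = new Map();
  const unindexed = [];
  for (const selector of selectors) {
    const key = keyOf(selector);
    if (key === null) {
      unindexed.push(selector);
    } else {
      if (!index.has(key)) index.set(key, []);
      index.get(key).push(selector);
    }
  }
  return { index, unindexed };
}

// Return the selectors worth injecting for the ids/classes seen in the DOM.
// Unindexed selectors cannot be ruled out, so they are always returned.
function selectorsFor({ index, unindexed }, { ids = [], classes = [] }) {
  const result = [...unindexed];
  for (const id of new Set(ids)) result.push(...(index.get(`id:${id}`) || []));
  for (const cls of new Set(classes)) result.push(...(index.get(`class:${cls}`) || []));
  return result;
}
```

For a page reporting `{ ids: ['ad-banner'], classes: ['content'] }`, an index built from `['#ad-banner', '.sponsored > img', 'div[data-ad]', '.promo']` yields only `div[data-ad]` and `#ad-banner`; the two class-keyed selectors are skipped because no element carries those classes. New ids and classes picked up by a MutationObserver would simply be passed to another `selectorsFor` call.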
remusao committed May 29, 2019
1 parent 36fcb2c commit f11adee
Showing 44 changed files with 8,717 additions and 6,371 deletions.
28 changes: 28 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,34 @@

*not released yet*

* [BREAKING] `getCosmeticsFilter` API changed to allow finer-grained subsetting
  of the cosmetic filters returned: hostname-specific, DOM-specific, generic, etc.
  This makes it possible to inject 70x fewer custom styles in frames for the
  same blocking, which greatly reduces memory usage as well as the time spent
  repainting.
* [BREAKING] cosmetic unhide filters without hostname constraints are allowed.
* [BREAKING] `NetworkFilter.isCptAllowed` now accepts the request type as a string.
* [BREAKING] drop support for legacy Firefox Bootstrap request types.
* Fix matching of hostname anchors with wildcards.
* Add support for `$frame` option in network filters.
* Add support for `$document` and `$doc` options in network filters.
* Add soft dependency on tldts to simplify API
  - left as require/import in normal bundles
  - bundled in minified bundles
* Add tests for Request abstraction
* Add static method helpers to create Request instances
  - `Request.fromRawDetails(...)`
  - `Request.fromWebRequestDetails(...)`
  - `Request.fromPuppeteerDetails(...)`
  - `Request.fromElectronDetails(...)`
* Add tests for injection using `jsdom`
* Cosmetic filtering performance improvements
  - Make use of DOM information to return a subset of filters: ids, classes, hrefs
  - Make use of a MutationObserver in the content script to report new DOM info
* Create integration benchmark to measure full extension
* Add Request parsing micro-benchmark
* Update bench/comparison to use adblock-rs instead of ad-block
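Accepting the request type as a string, as the new `NetworkFilter.isCptAllowed` signature does, can be implemented as a lookup from the type name to a bit in the filter's option mask. The sketch below is illustrative only: the bit values, the fallback of unknown types to `other`, and the name `isTypeAllowed` are assumptions, not the library's internals.

```javascript
// Hypothetical bit assignments for a few webRequest types; the real library
// uses its own internal values.
const TYPE_MASKS = {
  main_frame: 1 << 0,
  sub_frame: 1 << 1,
  script: 1 << 2,
  image: 1 << 3,
  stylesheet: 1 << 4,
  xmlhttprequest: 1 << 5,
  other: 1 << 6,
};

// Check whether a filter whose option mask is `filterMask` applies to a
// request of type `type` (a webRequest type string such as 'script').
// Unknown type strings are treated as 'other' in this sketch.
function isTypeAllowed(filterMask, type) {
  const bit = Object.prototype.hasOwnProperty.call(TYPE_MASKS, type)
    ? TYPE_MASKS[type]
    : TYPE_MASKS.other;
  return (filterMask & bit) !== 0;
}
```

A filter restricted to `$script,image` would carry `TYPE_MASKS.script | TYPE_MASKS.image`, so it applies to `'script'` requests but not to `'sub_frame'` ones. The `hasOwnProperty` check keeps strings such as `'toString'` from resolving to inherited properties.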

## 0.9.1

*03-05-2019*
31 changes: 18 additions & 13 deletions README.md
@@ -77,11 +77,11 @@ const { Request } = require('@cliqz/adblocker');

const request = new Request({
type: 'script',

url: 'https://sub.domain.com/ads.js?param=42',
hostname: 'sub.domain.com',
domain: 'domain.com',

sourceUrl: 'https://frame-domain.com',
hostname: 'frame-domain.com',
domain: 'frame-domain.com',
@@ -90,16 +90,22 @@ const request = new Request({
console.log(filter.match(request)); // true
```

Because creating `Request` instances using the constructor is a bit cumbersome, the library provides a helper function to make this smoother: `makeRequest`. It accepts only a subset of the information and assigns default values to whatever is missing. Since we also need to extract the hostnames and domains of URLs and do not want to impose a specific library for this, we need to provide an implementation of `parse` which takes a URL as argument and returns its hostname and domain. In this example we use the [tldts](https://www.npmjs.com/package/tldts) library.
Because creating `Request` instances using the constructor is a bit cumbersome, the library provides a few helper functions to make this smoother:

* `Request.fromRawDetails(...)`
* `Request.fromWebRequestDetails(...)`
* `Request.fromPuppeteerDetails(...)`
* `Request.fromElectronDetails(...)`

If you are creating a `Request` outside of a WebExtension, Electron, or Puppeteer context, use `Request.fromRawDetails`. It accepts only a subset of the information and assigns default values to whatever is missing.

```javascript
const { parse } = require('tldts');
const { makeRequest } = require('@cliqz/adblocker');
const { Request } = require('@cliqz/adblocker');

const request = makeRequest({
const request = Request.fromRawDetails({
type: 'script',
url: 'https://domain.com/ads.js',
}, parse);
});

filter.match(request); // true
```
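To make the defaulting behaviour concrete, here is a hypothetical sketch of the work a `fromRawDetails`-style helper has to do: derive hostnames from the URLs and fill in defaults for missing fields. The name `fromRawDetailsSketch`, the `'main_frame'` default type and the naive "last two labels" domain extraction are all assumptions. This commit adds a soft dependency on tldts, which consults the Public Suffix List, so a real implementation handles domains such as `example.co.uk` that this sketch gets wrong.

```javascript
// Hypothetical sketch, NOT the library's implementation.
function fromRawDetailsSketch({ url = '', sourceUrl = '', type = 'main_frame' } = {}) {
  // Extract the hostname, or '' if the URL is missing or invalid.
  const hostnameOf = (u) => {
    try {
      return new URL(u).hostname;
    } catch (e) {
      return '';
    }
  };
  // Naive registrable-domain guess (last two labels); tldts would use the
  // Public Suffix List instead.
  const domainOf = (hostname) => hostname.split('.').slice(-2).join('.');

  const hostname = hostnameOf(url);
  const sourceHostname = hostnameOf(sourceUrl);
  return {
    type,
    url,
    hostname,
    domain: domainOf(hostname),
    sourceUrl,
    sourceHostname,
    sourceDomain: domainOf(sourceHostname),
  };
}
```

Calling it with `{ type: 'script', url: 'https://sub.domain.com/ads.js' }` gives `hostname: 'sub.domain.com'` and `domain: 'domain.com'`, while the source fields default to empty strings.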
@@ -130,8 +136,7 @@ filter.match('sub.domain.com', 'domain.com'); // true
Manipulating filters at a low level is useful for building tooling or for debugging, but it is not appropriate for efficiently blocking requests (it would require iterating over all the filters to know if a request needs to be blocked). Instead, we can make use of the `FiltersEngine` class, which can be seen as a "container" for both network and cosmetic filters. The filters are organized in a very compact way, which also enables fast matching.

```javascript
const { FiltersEngine, NetworkFilter, CosmeticFilter, makeRequest } = require('@cliqz/adblocker');
const { parse } = require('tldts');
const { FiltersEngine, NetworkFilter, CosmeticFilter, Request } = require('@cliqz/adblocker');

// Parse multiple filters at once
let engine = FiltersEngine.parse(`
@@ -158,16 +163,16 @@ const {
redirect, // data url to redirect to if any
exception, // instance of NetworkFilter exception if any
filter, // instance of NetworkFilter which matched
} = engine.match(makeRequest({
} = engine.match(Request.fromRawDetails({
type: 'script',
url: 'https://sub.domain.com/ads.js',
}, parse));
}));

// Matching CSP (content security policy) filters.
const directives = engine.getCSPDirectives(makeRequest({
const directives = engine.getCSPDirectives(Request.fromRawDetails({
type: 'main_frame',
url: 'https://sub.domain.com/',
}, parse));
}));

// Matching cosmetic filters
const {
10 changes: 5 additions & 5 deletions index.ts → adblocker.ts
@@ -6,13 +6,9 @@
* file, You can obtain one at https://mozilla.org/MPL/2.0/.
*/

// Cosmetic injection
export { default as injectCosmetics } from './src/cosmetics-injection';
export { IMessageFromBackground } from './src/content/communication';

export { default as FiltersEngine, ENGINE_VERSION } from './src/engine/engine';
export { default as ReverseIndex } from './src/engine/reverse-index';
export { default as Request, makeRequest } from './src/request';
export { default as Request } from './src/request';
export { default as CosmeticFilter } from './src/filters/cosmetic';
export { default as NetworkFilter } from './src/filters/network';

@@ -25,3 +21,7 @@ export { tokenize, fastHash, updateResponseHeadersWithCSP } from './src/utils';
export { default as StaticDataView } from './src/data-view';

export { default as Config } from './src/config';

export { default as WebExtensionEngine } from './src/webextension/background';

export * from './cosmetics';
14 changes: 7 additions & 7 deletions bench/comparison/Makefile
@@ -10,21 +10,21 @@ requests.json:
# VERSION: 69118b828db0f6a53bc2306deacffc5361aeef0c
./blockers/adblockpluscore:
git clone --branch=next https://github.com/adblockplus/adblockpluscore.git ./blockers/adblockpluscore
cd ./blockers/adblockpluscore && git reset --hard 69118b828db0f6a53bc2306deacffc5361aeef0c
cd ./blockers/adblockpluscore && git reset --hard c84ece65137ef991559c6b78d13eae3296236b4e

# VERSION: 0.2.0
../../node_modules/abp-filter-parser:
npm install --save https://github.com/duckduckgo/abp-filter-parser.git#0.2.0
npm install --save https://github.com/duckduckgo/abp-filter-parser.git
cd ../../node_modules/abp-filter-parser/ && npm install && cd -

# VERSION: 4.1.7
../../node_modules/ad-block:
npm install --save ad-block@4.1.7
# VERSION: latest
../../node_modules/adblock-rs:
npm install --save adblock-rs

../../dist:
cd ../../ && npm ci && npm pack

brave: ../../node_modules/ad-block
brave: ../../node_modules/adblock-rs
NODE_ENV=production node run.js brave requests.json

cliqz:
@@ -57,7 +57,7 @@ adblockfast:
deps: requests.json \
../../dist \
../../node_modules/abp-filter-parser \
../../node_modules/ad-block \
../../node_modules/adblock-rs \
../../node_modules/jsdom \
../../node_modules/puppeteer-pool \
../../node_modules/sandboxed-module \
26 changes: 5 additions & 21 deletions bench/comparison/blockers/brave.js
@@ -6,27 +6,11 @@
* file, You can obtain one at https://mozilla.org/MPL/2.0/.
*/

const { AdBlockClient, FilterOptions } = require('ad-block');
const { getHostname } = require('tldts');

// This maps webRequest types to Brave types
const BRAVE_OPTIONS = {
sub_frame: FilterOptions.subdocument,
stylesheet: FilterOptions.stylesheet,
image: FilterOptions.image,
media: FilterOptions.media,
font: FilterOptions.font,
script: FilterOptions.script,
xmlhttprequest: FilterOptions.xmlHttpRequest,
websocket: FilterOptions.websocket,
other: FilterOptions.other,
};
const { Engine } = require('adblock-rs');

module.exports = class Brave {
static parse(rawLists) {
const client = new AdBlockClient();
client.parse(rawLists);
return new Brave(client);
return new Brave(new Engine(rawLists.split(/[\n\r]+/g)));
}

constructor(client) {
@@ -42,10 +26,10 @@ module.exports = class Brave {
}

match({ type, url, frameUrl }) {
return this.client.matches(
return this.client.check(
url,
BRAVE_OPTIONS[type] || FilterOptions.noFilterOption,
getHostname(frameUrl),
frameUrl,
type,
);
}
};
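The new `Brave.parse` hands adblock-rs an array of filter lines by splitting the raw lists on runs of `\r` and `\n`. The sketch below isolates that split (both helper names are hypothetical) to show how it behaves: CRLF line endings and blank lines between filters are absorbed, but a trailing newline still leaves one empty string at the end of the array. Whether adblock-rs ignores such an entry is not shown in this diff, so the second helper drops empty entries explicitly.

```javascript
// Hypothetical helper mirroring the split used in brave.js: any run of
// \r and \n characters acts as a single separator.
function splitFilterLines(rawLists) {
  return rawLists.split(/[\n\r]+/g);
}

// Variant that also drops empty entries (e.g. from a trailing newline).
function splitNonEmptyFilterLines(rawLists) {
  return splitFilterLines(rawLists).filter((line) => line.length > 0);
}
```

For `'||ads.com^\r\n##.ad\n\n@@||ok.com^\n'`, the first helper returns the three filters followed by `''`, and the second returns only the three filters.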
8 changes: 3 additions & 5 deletions bench/comparison/blockers/cliqz.js
@@ -8,9 +8,7 @@

const path = require('path');

const tldts = require('tldts');

const { FiltersEngine, makeRequest } = require(path.resolve(__dirname, '../../../'));
const { FiltersEngine, Request } = require(path.resolve(__dirname, '../../../'));


module.exports = class Cliqz {
@@ -31,10 +29,10 @@ module.exports = class Cliqz {
}

match({ url, frameUrl, type }) {
return this.engine.match(makeRequest({
return this.engine.match(Request.fromRawDetails({
url,
sourceUrl: frameUrl,
type,
}, tldts.parse)).match;
})).match;
}
};
2 changes: 1 addition & 1 deletion bench/comparison/run.js
@@ -129,7 +129,7 @@ async function main() {
diff = process.hrtime(start);
serializationTimings.push((diff[0] * 1000000000 + diff[1]) / 1000000);
}
cacheSize = serialized.length;
cacheSize = serialized.length || serialized.byteLength;

// Deserialize
for (let i = 0; i < 100; i += 1) {
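The `cacheSize` change in `run.js` accounts for serializers returning different kinds of values, presumably because some engines return an `ArrayBuffer`: strings and typed arrays expose `length`, whereas a bare `ArrayBuffer` only has `byteLength`. A slightly more explicit variant is sketched below (the helper name is hypothetical). Preferring `byteLength` when it exists also covers an empty string, for which `serialized.length || serialized.byteLength` evaluates to `undefined`. For strings, `length` counts UTF-16 code units rather than bytes.

```javascript
// Hypothetical helper: size of a serialized engine, whatever its type.
// ArrayBuffer, typed arrays and Node.js Buffers expose `byteLength` (bytes);
// strings only expose `length` (UTF-16 code units, not bytes).
function serializedSize(serialized) {
  if (serialized.byteLength !== undefined) {
    return serialized.byteLength;
  }
  return serialized.length;
}
```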
21 changes: 16 additions & 5 deletions bench/micro.js
@@ -8,10 +8,9 @@

/* eslint-disable no-bitwise */

const { FiltersEngine, fastHash, tokenize, parseFilters } = require('../');
const { FiltersEngine, fastHash, tokenize, parseFilters, Request } = require('../');
const { createEngine, domains500 } = require('./utils');


function benchEngineCreation({ lists, resources }) {
return createEngine(lists, resources, {
loadCosmeticFilters: true,
@@ -93,15 +92,27 @@ function benchGetCosmeticsFilters({ engine }) {
}
}

function benchRequestParsing({ requests }) {
for (let i = 0; i < requests.length; i += 1) {
const { url, frameUrl, cpt } = requests[i];
Request.fromRawDetails({
url,
sourceUrl: frameUrl,
type: cpt,
});
}
}

module.exports = {
benchCosmeticsFiltersParsing,
benchEngineCreation,
benchEngineDeserialization,
benchEngineSerialization,
benchGetCosmeticTokens,
benchGetCosmeticsFilters,
benchGetNetworkTokens,
benchNetworkFiltersParsing,
benchRequestParsing,
benchStringHashing,
benchStringTokenize,
benchGetNetworkTokens,
benchGetCosmeticTokens,
benchGetCosmeticsFilters,
};
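The new `benchRequestParsing` has the same shape as the other micro-benchmarks: it takes an object of inputs and does its work in a loop. How these functions are driven is not part of this diff, so the runner below is a hypothetical sketch. It times repeated calls with `process.hrtime`, using the same nanoseconds-to-milliseconds conversion as `bench/comparison/run.js`, and reports the median.

```javascript
// Hypothetical runner for micro-benchmark functions such as
// benchRequestParsing({ requests }); not part of the repository.
function runBench(fn, args, iterations = 11) {
  const timings = [];
  for (let i = 0; i < iterations; i += 1) {
    const start = process.hrtime();
    fn(args);
    const diff = process.hrtime(start);
    // [seconds, nanoseconds] -> milliseconds
    timings.push((diff[0] * 1e9 + diff[1]) / 1e6);
  }
  timings.sort((a, b) => a - b);
  // Median (the upper one when `iterations` is even).
  return timings[Math.floor(timings.length / 2)];
}
```

For example, `runBench(benchRequestParsing, { requests })` would return the median time in milliseconds of parsing every request once. The median is less sensitive than the mean to occasional garbage-collection pauses.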