Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Track specific validation errors #260

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

Conversation

karwa
Copy link
Contributor

@karwa karwa commented Sep 26, 2023

From the table, I've only done the "URL parsing" section, not the IDNA or host parsing sections.

Having this in the reference implementation has already helped me confirm a couple of issues with the validation errors reported by the standard (bugs and PRs to WHATWG/url incoming). It can also be useful for other implementations to check the errors they report. Hopefully we can add expected validation errors to the WPT test suite in due course, as well.

That said, these results are not exposed to the live viewer yet. I've hacked something together locally, but I'm not sure of the cleanest way to thread that through to the viewer.

@karwa
Copy link
Contributor Author

karwa commented Sep 26, 2023

Related to #156 and #61

As for performance: we could use a bit-mask rather than an array of strings (since the standard only has a handful of unique validation errors). The cost is that we would lose some information, namely:

  • The order in which errors were encountered
  • The number of occurrences of each error

It would also be more limiting, as we would be unable to store associated information with each error (e.g. cursor position). But, in return, we would be able to store all errors that were encountered in a fixed amount of memory.

Another alternative would be for the parser to accept a validation error callback closure. Callers could supply a closure which collects all errors in an array, or which ignores them entirely, and the overhead in the parser should not be worse than any other virtual function call; the parser would just call the user-supplied function saying "hey, this error happened". Importantly, the caller would provide the memory to store errors, if they indeed want to store them.

@karwa
Copy link
Contributor Author

karwa commented Sep 26, 2023

I've implemented the closure approach on a branch (https://github.com/karwa/whatwg-url/commits/validation-error-callback) and hooked it up roughly to the live viewer.

The constructor for the URL type is generated from IDL or something and I'm not sure how to go around it, so you'll need to forward that argument manually by editing the generated constructor in lib/URL.js (line 115) and adding:

{
  let curArg = arguments[2];
  if (curArg !== undefined && typeof curArg === "function") {
    args.push(curArg);
  }
}

But yeah, it illustrates the approach. Since it's a callback function we can also add supplementary information about the error such as the pointer position, without having to worry about the memory overhead of storing all this stuff.

@domenic
Copy link
Member

domenic commented Sep 30, 2023

This is exciting work!

I think we only want to expose this capability through the low-level URL Standard API, not the new URL() API. That should bypass any worries about conflicting with the generated code. It does mean the live viewer would need to switch which API it uses, but that should be fine.

Other things we'd need to do before merging this:

  • Documentation in the README, including on which validation errors are missing.
  • The results of the benchmark script before and after.

It may also be worth benchmarking the results of a no-op default onValidationError, versus a null/undefined default onValidationError that is always called with this.onValidationError.?(...).

@domenic
Copy link
Member

domenic commented Sep 30, 2023

I also like the onValidationError approach because we could avoid the performance cost of #59 by doing something like

if (this.onValidationError && !isURLCodePoint(c) && c !== p("%")) {
   this.onValidationError("invalid-URL-unit");
}

instead of just

if (!isURLCodePoint(c) && c !== p("%")) {
   this.onValidationError.?("invalid-URL-unit");
}

There are a few other instances of that pattern as well that might be worth optimizing out, e.g.

    if (c === p("%") &&
        (!infra.isASCIIHex(this.input[this.pointer + 1]) ||
         !infra.isASCIIHex(this.input[this.pointer + 2]))) {
      this.onValidationError("invalid-URL-unit");
    }

could remove collapse the three checks down into one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants