-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use [[OriginalFlags]] from within RegExpBuiltinExec? #489
Comments
there's actually a reason for this. [[RegExpMatcher]] represents the "compiled" matcher that is created by a built-in RegExp constructor. The semantics of such a matcher are defined by 21.2.2. Both the creation of a [[RegExpMatcher]] and its evaluation semantics depend upon the original state of some (but not all) of the flags that were presented to the constructor. The state of the original flags are immutably captured in [[OriginalFlags]]. So, the value of [[OriginalFlags]] is always consistent with the flag observations made when generating the associated [[RegExpMatcher]]. That is also why flag values used within 21.2.2 are derived from [[OriginalFlags]]. Let's call the flags that 21.2.2 actually depends upon the matcher flags. The matcher flags are "u", "m", and "i". The RegExpBuiltinExec abstract operation is essentially a loop around multiple invocations of a [[RegExpMatcher]]. There are three ways flags are used within RegExpBuiltinExec
Using a property access to get "global" and "sticky" enables a RegExp subclass to change the policy for managing those flags while still using the builtin matcher. For example, a subclass might allow either or both of them to be modified between This design provides a little bit of flexibility for subclass authors without significantly constraining implementations. You can still merge the RegExpBuiltinExec loop with a [[RegExpMatcher]] is you want. You just need to guard it with "global" and "sticky" checks. |
It's exactly this guarding that is difficult for us. Because we compile |
The WebKit JSC implementation sounds similar to V8's. JSC also compiles in the Since getting the A related issue is that requiring property access for global and sticky instead of [[OriginalFlags]] means that repetitive RegExp operations like RegExpPrototype[@@match] invoked with a RegExp that has a true |
Having just read it now, I have to agree with littledan that this API is pretty darn confusing. (1) sticky+global vs unicode+multiline+ignoreCase. It is simply inconsistent that the left-hand-side of this list of flags is modifiable while the right-hand-side is not. The fact that this inconsistency happens to be made possible by an artifact of how [[RegExpMatcher]] is specified does not mean that it is good. I believe that inconsistency is bad in API design because it makes an API harder to follow. (2) Base class vs subclass. The base class built-in regular expression object provides read-only access to its flags (through accessors that provide getters but not setters). The idea that a subclass should take a base class's read-only flags and make them writeable violates the Liskov Substitution Principle -- and is not something I've ever wanted to do in my own code. Even if all built-in code has been specified to check for these violations of consistency and object oriented design principle, and tolerate them, we're leaving behind a huge trap for any library authors who want to write helper functions that accept regular expression objects as inputs. All such library code will need to change to accommodate this new, odd feature of a regular expression changing its flags (but only some of them) mid-operation. |
This closes tc39#489. In multiple ECMAScript implementations, the 'sticky' flag is compiled into the matcher. Given that RegExpBuiltinExec already has a RegExp instance, and it makes other references to [[OriginalFlags]] in that algorithm, it seems reasonable to refer to [[OriginalFlags]] for these flags as well.
I think it's probably ok to make this change, assuming there aren't any objections from other implementors. The split treatment of the flags is mostly a remnant of some historic structuring of the the spec. I agree that consistency would be better and the subclassing argument should apply to either all or none of the them. I think there were some intermediate drafts of the ES6 spec where the separate treatment of the flags were significant, but things evolved so that was no longer the case
Sure there are, the semantics is to follow the algorithm steps exactly as specified. That means that each invocation of @geoffreygaren |
This closes tc39#489. RegExpBuiltinExec reads other flags from [[OriginalFlags]], based on the fact that it is operating on a RegExp instance. In ES2015, however, it accessed 'global' and 'sticky' via property accesses. This patch changes RegExpBuiltinExec to access those flags via [[OriginalFlags]] as well.
This would also simplify the implementation in SpiderMonkey quite a bit and enable some more optimizations. I also agree that the flags should be treated in a consistent manner regardless of any implementation considerations. So, a wholehearted +1. |
Seems fine, I am also happy to delete some code. |
During the Nov 2015 TC-39 meeting, @littledan brought up ”Simplification of ES2015 RegExp Semantics which was rejected. IIRC, @allenwb shared some slides with a broader philosophy of the RegExp portion of ES2015 . I cannot find minutes covering that discussion or any slides for @allenwb. I think it would help focus the discussion on this item next week if the committee had those minutes and slides available before the meeting next week. |
Looking at RegExpBuiltinExec's algorithm in conjunction with the algorithm described for 21.2.5.6 RegExp.prototype[@@match] and it appears that the combination of the two will never terminate in at least one case. Consider a RegExp derived object that does not override exec, with a pattern of /a/, and the argument string of "aa". It does have a "global" getter that initially returns true, but all subsequent calls to that getter return false. At step 4 of 21.2.5.6, global will be set to I think this also supports the proposal that RegExpBuiltinExec should use [[OriginalFlags]] to access "global" and "sticky" instead of calling a getter. |
This closes tc39#489. RegExpBuiltinExec reads other flags from [[OriginalFlags]], based on the fact that it is operating on a RegExp instance. In ES2015, however, it accessed 'global' and 'sticky' via property accesses. This patch changes RegExpBuiltinExec to access those flags via [[OriginalFlags]] as well.
At the March TC39 meeting, the change to reference [[OriginalFlags]] here achieved consensus. |
) This closes #489. RegExpBuiltinExec reads other flags from [[OriginalFlags]], based on the fact that it is operating on a RegExp instance. In ES2015, however, it accessed 'global' and 'sticky' via property accesses. This patch changes RegExpBuiltinExec to access those flags via [[OriginalFlags]] as well.
RegExpBuiltinExec can only be called on objects with a [[RegExpMatcher]] internal slot. It currently uses a combination of flags from [[OriginalFlags]] (to determine fullUnicode, and also from within the matcher) as well as other flags from the
"sticky"
and"global"
properties. It seems to me like this algorithm would be more consistent if [[OriginalFlags]] were also used to determine whether the"y"
and"g"
flags are set, rather than actually performing a Get. As a side benefit, in V8's implementation,"y"
is compiled into the equivalent of the matcher, so the change to use [[OriginalFlags]] would mean that we would not have to compile the RegExp when the flags change.What do you think? @allenwb @ljharb @bterlson
The text was updated successfully, but these errors were encountered: