Update code to work with new embed syntax #124

Keats · 2017-11-14T22:02:35Z

Not optimal at all since I don't know how much we care about perf in the parsing and I chose the way that (I think) would need the least code changes.

I'm skipping the Latex syntax as a recent commit broke the use with Syntect (Doesn't work with sublimehq/Packages@666b869#diff-40f615a2e7ecb9138b1bc5826bed9074R1522), cc @keith-hall if you know what's wrong, the YAML syntax on that commit looks a bit odd to me.

Also added pretty_assertions for dev deps, I can remove it if you want but it's really useful to have differences in assert_eq! be diffed and highlighted.

Fix #120

keith-hall · 2017-11-15T08:15:21Z

I'm quite surprised the LaTeX PR with that ugly commit got merged, there were some other PRs on the Packages repo with similar hard to read style that were reviewed and the authors of those were asked to change it to a more common, readable format. I don't know YAML syntax very well, but I believe the author of the commit in question was being lazy and didn't want to create a named context, so used that hacky workaround instead - in my experience, ST otherwise doesn't allow pushing an array containing an anonymous context and a named context at once, so this hack tricks ST into accepting it.

I'm guessing that LaTeX.sublime-syntax fails to parse in syntect with ParseSyntaxError::TypeMismatch, in which case we may unfortunately have to modify syntect's yaml_load to handle such odd YAML.

About your actual solution, it's pretty similar to what I was thinking about, nice one! (my Rust-foo wasn't good enough for me to get a PR in before you :) )
However, I can foresee one small problem at the moment, and that is when the escape pattern consumes characters and there are multiple contexts to pop out of - at the moment, I think it will end up only popping out of one.

The solution, I believe, will be to push a context similar to this, whereby the with_prototype will just use a lookahead on the escape pattern to pop out of all the contexts:

- match: {{escape}}
  captures:
    {{escape_captures}}
  pop: true
- match: ''
  push:
    - meta_content_scope: {{embed_scope}}
    - include: {{embed}}
  with_prototype:
    - match: (?={{escape}})
      pop: true

My test syntax for this was going to be something like:

%YAML 1.2
---
name: Embed_Escape Used by tests in src/parsing/parser.rs
scope: source.embed-test
contexts:
  main:
    - match: '"'
      scope: meta.attribute-with-value.style.html string.quoted.double punctuation.definition.string.begin.html
      embed: embedded_context
      embed_scope: meta.attribute-with-value.style.html source.css
      escape: '"'
      escape_captures:
        0: meta.attribute-with-value.style.html string.quoted.double punctuation.definition.string.end.html
    - match: '(>)\s*'
      captures:
        1: meta.tag.style.begin.html punctuation.definition.tag.end.html
      embed: embedded_context
      embed_scope: source.css.embedded.html
      escape: (?i)(?=</style)
    - match: foobar
      scope: top-level.test

  embedded_context:
    - match: a
      push: # prove that multiple context levels can be "escape"d
        - match: b
          push:
            - match: c
              push:
                - match: 'test'
                  scope: test.embedded

with the following test case in parsing/parser.rs:

    #[test]
    fn can_parse_issue120() {
        let ps = SyntaxSet::load_from_folder("testdata/Packages").unwrap();
        let syntax = ps.find_syntax_by_name("Embed_Escape Used by tests in src/parsing/parser.rs").unwrap();

        let line1 = "\"abctest\" foobar";
        let expect1 = [
            "<meta.attribute-with-value.style.html>, <string.quoted.double>, <punctuation.definition.string.begin.html>",
            "<meta.attribute-with-value.style.html>, <source.css>",
            "<meta.attribute-with-value.style.html>, <string.quoted.double>, <punctuation.definition.string.end.html>",
            "<source.css.embedded.html>, <test.embedded>",
            "<top-level.test>",
        ];
        expect_scope_stacks_with_syntax(&line1, &expect1, syntax.to_owned());

        let line2 = ">abctest</style>foobar";
        let expect2 = [
            "<meta.tag.style.begin.html>, <punctuation.definition.tag.end.html>",
            "<source.css.embedded.html>, <test.embedded>",
            "<top-level.test>",
        ];
        expect_scope_stacks_with_syntax(&line2, &expect2, syntax.to_owned());
    }

One other thought I had, is that, in ST, embed without escape is an error, so maybe we want the same semantics in syntect too? (ParseSyntaxError::MissingMandatoryKey("escape"))

keith-hall · 2017-11-15T12:31:03Z

Actually, thinking about this more, it should probably take this sort of pattern instead, to take into account the backrefs properly:

%YAML 1.2
---
# See http:https://www.sublimetext.com/docs/3/syntax.html
scope: source.example-embed
contexts:
  main:
    # embed style (from)
    - match: (`+)(javascript|js)\s*$
      captures:
        1: markup.raw.code-fence.javascript.markdown punctuation.definition.raw.code-fence.begin.markdown
        2: markup.raw.code-fence.javascript.markdown constant.other.language-name.markdown
      embed: scope:source.js
      embed_scope: markup.raw.code-fence.javascript.markdown source.js
      escape: '^[ ]{0,3}(\1)\s*$'
      escape_captures:
        0: markup.raw.code-fence.javascript.markdown
        1: punctuation.definition.raw.code-fence.end.markdown

    # with_prototype style (to)
    - match: (`+)(javascript|js)\s*$
      captures:
        1: markup.raw.code-fence.javascript.markdown punctuation.definition.raw.code-fence.begin.markdown
        2: markup.raw.code-fence.javascript.markdown constant.other.language-name.markdown
      push: [pop_js, embed_js]
      with_prototype:
        - match: (?=^[ ]{0,3}(\1)\s*)
          pop: true

  pop_js:
    - match: '^[ ]{0,3}(\1)\s*'
      captures:
        0: markup.raw.code-fence.javascript.markdown
        1: punctuation.definition.raw.code-fence.end.markdown
      pop: true

  embed_js:
    - meta_content_scope: markup.raw.code-fence.javascript.markdown source.js
    - include: scope:source.js

Although obviously we probably have the advantage of not needing to name contexts from syntect when creating contexts inline.

Keats · 2017-11-15T12:41:08Z

Does the testcase need to change as well?

keith-hall · 2017-11-15T12:44:07Z

Good question, I believe we could get away with just changing two lines in my earlier sublime-syntax example - from - match: '"' to - match: (") and from escape: '"' to escape: '\1'

trishume · 2017-11-19T20:33:35Z

Thanks for doing this! I'm pretty busy until December so I'm not going to casually review this before its done, but when you think it's ready let me know and I'll review it.

I'll probably end up bundling #125, this PR, and maybe one or two other minor breaking changes into a 2.0.0 release early December.

Keats · 2017-11-20T15:56:16Z

@keith-hall is helping with the sublime syntax related issues he mentioned

…_embed

allow immediate child patterns of with_prototype to use backrefs

Keats · 2017-11-30T14:39:33Z

Thanks to @keith-hall work, it's almost there. Some tests in syntest failing still but it's mostly ok otherwise

trishume · 2017-11-30T21:42:14Z

src/parsing/parser.rs

@@ -451,8 +451,26 @@ impl ParseState {
 MatchOperation::None => return false,
 };
 for (i, r) in ctx_refs.iter().enumerate() {
- let proto = if i == ctx_refs.len() - 1 {
- pat.with_prototype.clone()
+ let proto = if i == ctx_refs.len() - 1 { // https://forum.sublimetext.com/t/dev-build-3111/19240/17


I don't have time to fully review this yet (I will soon), but I just want to mention that this big deeply nested block looks scary. The link to the forum post is definitely helpful, but if there's some way you could make this less scary that would be great, or at least comment it more heavily. Maybe explain how it is implementing the behaviour described in that post in terms of syntect's data structures.

...as the contexts named aren't shown, and instead the entire match patterns etc. are output, the debug info from the parse state is too large to be useful, so I removed this aspect of it

Keats · 2017-12-08T11:14:03Z

@trishume I believe this is ready for review, unless @keith-hall wants to add his fix for #127 in that PR

trishume

Preliminary review: looks good except for the with_prototype capture handling.

But don't fix that yet since I'm going to take a hard look at how with_prototypes work and try and figure out a better way. If I seem to have forgotten about my commitment to doing this though feel free to ping me.

trishume · 2017-12-08T22:57:05Z

Cargo.toml

@@ -33,6 +33,7 @@ serde_json = "1.0"
 rayon = "0.8.0"
 regex = "0.2.1"
 getopts = "0.2"
+pretty_assertions = "0.4.0"


I approve 👍

trishume · 2017-12-08T22:57:56Z

src/parsing/yaml_load.rs

@@ -621,6 +665,118 @@ mod tests {
 }
 }

+ #[test]
+ fn can_parse_embed_as_with_prototypes() {


I really like this test. Very clearly shows what the rewrite does.

trishume · 2017-12-08T23:01:37Z

src/parsing/yaml_load.rs

+ 1: meta.tag.style.begin.html punctuation.definition.tag.end.html
+ push:
+ - [{ match: '(?i)(?=</style)', pop: true }]
+ - [{ meta_content_scope: 'source.css.embedded.html'}, { match: '', pop: true }]


My knowledge of Sublime syntaxes is quite rusty (heh), but what's the purpose of this context? Why is the meta_content_scope not just in the scope above?

this may just be a leftover from how I originally thought it would work, and I didn't spot it when I was cleaning up...

trishume · 2017-12-08T23:02:51Z

src/parsing/yaml_load.rs

+ - [{ meta_content_scope: 'source.css.embedded.html'}, { match: '', pop: true }]
+ - scope:source.css
+ with_prototype:
+ - match: (?=(?i)(?=</style))


The double-lookahead does exactly the same as the single lookahead right? It's just that the escape regex might not be a lookahead and in order to emulate it correctly we have to ensure this one is a lookahead?

correct 👍

trishume · 2017-12-08T23:17:44Z

src/parsing/parser.rs

+ // even have had any capture groups which would previously have caused
+ // a panic)
+ if match_pat.has_captures {
+ match_pat.regex = Some(match_pat.compile_with_refs(regions, line));


I think this doesn't work properly if there's recursion that allows the same with_context to be pushed twice on the stack, when the top one is popped it will still have the old captures compiled into it. This may never happen in practice with a with_prototype that has captures but it's possible in principle.

I'm also looking at this code again and thinking that I might have really screwed up the implementation of with_prototype and that I can redo it completely to be cleaner.

Don't try and fix this yet, I want to take a crack at simplifying a bunch of the related code, which may just outdate whatever fix you try.

trishume · 2017-12-08T23:18:11Z

testdata/minimized_tests/syntax_test_aspmini.asp

@@ -2,7 +2,7 @@
 <%
 Class TestClass2 Public Sub TestSub () Response.Write("wow") End Sub End Class
 '^^^^^ meta.class.asp meta.class.identifier.asp storage.type.asp
-' ^ meta.class.asp meta.class.body.asp meta.class.asp meta.class.identifier.asp
+' ^ meta.class.asp meta.class.identifier.asp


I thought I copied this from the Sublime syntax test, did the Sublime one change?

yes, sublimehq/Packages@73e105f

trishume · 2017-12-09T00:41:45Z

I just posted https://github.com/Keats/syntect/pull/2 which fixes the issue with the captures logic while also folding it into the general case.

Once you merge that into your branch and @keith-hall finishes investigating whether pushing two inline contexts is really necessary to emulate embed then I'll look at merging.

Clean up and fix with_prototype captures logic

trishume · 2017-12-09T17:45:11Z

I benchmarked it and performance doesn't seem to have regressed, in fact I don't know if it's favourable testing conditions or improvements in the Nightly compiler but now highlighting jQuery takes 550ms instead of 600 and something.

@keith-hall do you know if the regression in syntax tests is just because the updated syntaxes are hitting existing bugs more than they were before, or if we broke something?

Because C# now has tons of failing tests when it only had a few before, and Bash and Make are now slightly broken where they weren't before.

keith-hall · 2017-12-09T18:48:34Z

I'm almost certain its the updated syntaxes that are causing it to hit more bugs than before - its definitely true of C#.

trishume · 2017-12-09T19:25:13Z

Okay I'm going to merge this now then. Thanks so much to both of you for the great work!

I might hold off on releasing 2.0 for a week or so to see if I have time to look around for a few more backwards-incompatible changes I can lump in. If anyone wants a release sooner rather than later I can probably do that.

est31 · 2017-12-09T19:37:06Z

@trishume sounds fine by me!

est31 · 2018-01-02T00:27:39Z

@trishume do you have time to do a 2.0 release now? That would be great.

Update code to work with new embed syntax

d0ab1c7

Fix #120

Error on embed without escape

fa2bd9b

keith-hall added 2 commits November 15, 2017 22:54

attempt to get embed / escape working with backrefs

e1cd259

Support ugly YAML push syntax

31e5796

est31 mentioned this pull request Nov 19, 2017

Update dependencies #125

Merged

keith-hall and others added 7 commits November 29, 2017 14:25

Merge remote-tracking branch 'origin/ugly_yaml_push_syntax' into yaml…

b6ec574

…_embed

Fix failing test can_parse_embed_as_with_prototypes

b6f193a

Latex parsing working again

6b4df77

fix embed include

0de55fa

Update dumps

313db0f

allow immediate child patterns of with_prototype to use backrefs

19d71e3

Merge pull request #1 from forkeith/with_prototype_backrefs

e9c1c55

allow immediate child patterns of with_prototype to use backrefs

Keats changed the title ~~[wip] Update code to work with new embed syntax~~ Update code to work with new embed syntax Nov 30, 2017

trishume reviewed Nov 30, 2017

View reviewed changes

keith-hall added 3 commits December 1, 2017 10:43

comment with_prototype backref code

ee8ec02

embed context directly rather than as an include to match ST behavior

4987ff0

add debug parameter to syntest example

74d94e7

keith-hall mentioned this pull request Dec 1, 2017

syntect handles push & immediately pop non-consuming patterns differently to ST #127

Closed

small tweak to syntest debug output

5468d9e

...as the contexts named aren't shown, and instead the entire match patterns etc. are output, the debug info from the parse state is too large to be useful, so I removed this aspect of it

trishume mentioned this pull request Dec 1, 2017

Fix the last few syntax tests #59

Open

4 tasks

trishume reviewed Dec 8, 2017

View reviewed changes

Clean up and fix with_prototype captures logic

6da1f86

Keats and others added 2 commits December 9, 2017 10:44

Merge pull request #2 from trishume/fix-capture-logic

a5865dd

Clean up and fix with_prototype captures logic

removed unnecessary context when translating embed/escape

b8ed5e2

trishume merged commit af810cd into trishume:master Dec 9, 2017

trishume mentioned this pull request Jan 2, 2018

Bump to v2.0.0 with cleanup #130

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update code to work with new embed syntax #124

Update code to work with new embed syntax #124

Keats commented Nov 14, 2017 •

edited

Loading

keith-hall commented Nov 15, 2017

keith-hall commented Nov 15, 2017

Keats commented Nov 15, 2017

keith-hall commented Nov 15, 2017

trishume commented Nov 19, 2017

Keats commented Nov 20, 2017

Keats commented Nov 30, 2017

trishume Nov 30, 2017

Keats commented Dec 8, 2017

trishume left a comment

trishume Dec 8, 2017

trishume Dec 8, 2017

trishume Dec 8, 2017

keith-hall Dec 9, 2017

trishume Dec 8, 2017

keith-hall Dec 9, 2017

trishume Dec 8, 2017

trishume Dec 8, 2017

keith-hall Dec 9, 2017

trishume commented Dec 9, 2017

trishume commented Dec 9, 2017

keith-hall commented Dec 9, 2017

trishume commented Dec 9, 2017

est31 commented Dec 9, 2017

est31 commented Jan 2, 2018

Update code to work with new embed syntax #124

Update code to work with new embed syntax #124

Conversation

Keats commented Nov 14, 2017 • edited Loading

keith-hall commented Nov 15, 2017

keith-hall commented Nov 15, 2017

Keats commented Nov 15, 2017

keith-hall commented Nov 15, 2017

trishume commented Nov 19, 2017

Keats commented Nov 20, 2017

Keats commented Nov 30, 2017

Choose a reason for hiding this comment

Keats commented Dec 8, 2017

trishume left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

trishume commented Dec 9, 2017

trishume commented Dec 9, 2017

keith-hall commented Dec 9, 2017

trishume commented Dec 9, 2017

est31 commented Dec 9, 2017

est31 commented Jan 2, 2018

Keats commented Nov 14, 2017 •

edited

Loading