May resume SLY work in September 2021 #76

Open
dabeaz opened this issue Aug 26, 2021 · 5 comments

Comments

@dabeaz (Owner) commented Aug 26, 2021

Just a quick note that I may resume some work on SLY in the coming month. I'm open to anything that improves it. Some things I'm thinking about:

  • Better error messages
  • Improvements to the EBNF parsing features
  • Syntax diagrams
@jpsnyder commented

One QoL feature I would like to see:

I find it tedious to have to manually define the tokens set at the beginning of the Lexer class, and it can be confusing since the ignore_* tokens don't have to be added to the set to be available.
Perhaps the Lexer class could just assume that any field or wrapped function whose name is all uppercase (and doesn't start with a _) counts as a token, and fill in the tokens field automatically if the user didn't explicitly define the set.

Also, as an alternative use case, it would be nice if the tokens field could be a list instead. That way we could define the proper order of applying the lexing rules with the list, and wouldn't have to worry so much about the order the tokens are declared in the class.
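A standalone sketch of how both ideas might look; AutoTokenMeta and MyLexer are hypothetical names for illustration, not SLY's actual metaclass machinery:

class AutoTokenMeta(type):
    def __new__(mcs, name, bases, namespace):
        cls = super().__new__(mcs, name, bases, namespace)
        if 'tokens' not in namespace:
            # The class namespace preserves declaration order, so this
            # list could also drive the priority of the lexing rules.
            cls.tokens = [attr for attr in namespace
                          if attr.isupper() and not attr.startswith('_')]
        return cls

class MyLexer(metaclass=AutoTokenMeta):
    NAME   = r'[a-zA-Z_][a-zA-Z0-9_]*'
    NUMBER = r'\d+'
    ignore = ' \t'

print(MyLexer.tokens)    # ['NAME', 'NUMBER']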

Thanks for your hard work on this!

@hadware commented Sep 27, 2021

> Also, as an alternative use case, it would be nice if the tokens field could be a list instead. That way we could define the proper order of applying the lexing rules with the list, and wouldn't have to worry so much about the order the tokens are declared in the class.

I definitely agree with that part!

@jpsnyder commented

Another feature I would like to see, based on an issue I ran into, is a way to generate more than one token during error handling in the lexer. When the language you are trying to lex is sufficiently complex, you sometimes need to add hacks in the error() handling function.

It would be nice if we could yield multiple tokens from error() when we need to resort to more manual regex pattern matching to recover the next few tokens before getting back on track, for example:

from sly.lex import Token

def error(self, t):
    # Proposed: on error, yield the raw whitespace-separated arguments
    # as tokens until we see a newline. (Yielding multiple tokens from
    # error() is the feature being requested here.)
    text, _, _ = t.value.partition("\n")
    for arg in text.split(" "):
        new_token = Token()          # SLY's token class, not PLY's YaccSymbol
        new_token.type = "ARGUMENT"
        new_token.value = arg
        new_token.lineno = t.lineno
        new_token.index = t.index
        yield new_token
    self.index += len(text)          # advance past the consumed text

@BlakeCThompson commented

@dabeaz Something I think would be nice is the ability to change the start attribute after the parser has been initialized, and have the grammar updated appropriately.

For example, if I have a test class for my grammar and want to test small subsets of it, I have to manually change the start attribute in my parser every time I want to test a smaller subset. If I could easily change the start symbol at the time parse() is called, it would make automating tests a lot easier.
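A minimal sketch of what that could look like from the test side; the start= keyword argument and the ExprParser/ExprLexer names are hypothetical, not current SLY API:

# Hypothetical API: parse() does not currently accept a start symbol.
parser = ExprParser()

def test_term_only():
    tokens = ExprLexer().tokenize("2 * 3")
    assert parser.parse(tokens, start='term') == 6

def test_full_expression():
    tokens = ExprLexer().tokenize("1 + 2 * 3")
    assert parser.parse(tokens, start='expr') == 7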

@BlakeCThompson commented

There are probably several ways to do this ^^ but I've made a PR with one solution
#101
