Numbat lsp #538

irevoire · 2024-08-15T09:39:29Z

Contributing to the numbat standard was such a pain in the ass that I started working on an LSP.
Currently, it only outputs some errors, but it’s still very buggy; most of the code I wrote lives in numbat-lsp/src/main.rs. Almost everything else comes from this repo that I cloned.

Screen.Recording.2024-08-15.at.11.31.47.mov

Adding support for completion and go-to-definition doesn't seem impossible.
Maybe we could even push further and get the types to work.

I was wondering if this was the kind of stuff you would be willing to accept in the future?

Currently, I’m hitting this bug a lot: ebkalderon/tower-lsp#413
I’m not sure, but maybe tower-lsp is not really prod ready, and we'll need to move out of it? 😩

sharkdp · 2024-08-15T19:42:29Z

Awesome stuff 😄. LSP is definitely something I'd like to look into. I only had a very brief look into how this works and it seems like you run numbat::Context::interpret* and then collect errors from there. Is this a sane approach to building a language server? What if there are side effects? I would imagine a language server only runs the frontend of the compiler (tokenizer-parser-typechecker), but not the backend (code generation and execution), where no errors can occur.

RossSmyth · 2024-08-16T18:49:23Z

Just popping in because this is a topic I'm familiar with.

> Is this a sane approach to building a language server?

It's not too bad. Rust-Analyzer gets most of its diagnostics from running rustc upon saving the file. There are some diagnostics baked in though, and it does have a front-end and part of the middle end in it for semantic analysis. The sema implemented allows for "go-to definition", semantic token highlighting, completion, and "lightbulb" refactors among other things.

So in reality having a front-end is pretty much required.

The front-end should have a lossless view of the syntax. In practice, this means having a concrete syntax tree (CST) rather than an abstract syntax tree, so that refactors don't clobber trivia. This is also great for formatters: rustfmt hits this limitation and has been considering switching to a CST.
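To make the "lossless" idea concrete, here is a minimal sketch of a tree that keeps whitespace, comments, and parentheses as ordinary tokens. It is not rust-analyzer's or Numbat's actual data structure; `Kind`, `Cst`, `to_source`, `rename`, and `sample` are all names invented for this illustration.

```rust
/// Node and token kinds. Whitespace and comments ("trivia") are real tokens.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Root,
    ParenExpr,
    Whitespace,
    Comment,
    Ident,
    Number,
    Punct, // operators and parentheses
}

/// A hypothetical concrete syntax tree: leaves own their exact source text.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq, Eq)]
enum Cst {
    Token { kind: Kind, text: String },
    Node { kind: Kind, children: Vec<Cst> },
}

impl Cst {
    fn token(kind: Kind, text: &str) -> Cst {
        Cst::Token { kind, text: text.to_string() }
    }

    /// Concatenates every leaf, trivia included, which reproduces the input.
    fn to_source(&self) -> String {
        match self {
            Cst::Token { text, .. } => text.clone(),
            Cst::Node { children, .. } => children.iter().map(Cst::to_source).collect(),
        }
    }

    /// A toy refactor: rename an identifier without touching any other token.
    fn rename(&mut self, from: &str, to: &str) {
        match self {
            Cst::Token { kind: Kind::Ident, text } if text.as_str() == from => {
                *text = to.to_string();
            }
            Cst::Token { .. } => {}
            Cst::Node { children, .. } => {
                for child in children.iter_mut() {
                    child.rename(from, to);
                }
            }
        }
    }
}

/// Hand-built tree for the input `(a + 2)  # keep me`.
fn sample() -> Cst {
    let paren = Cst::Node {
        kind: Kind::ParenExpr,
        children: vec![
            Cst::token(Kind::Punct, "("),
            Cst::token(Kind::Ident, "a"),
            Cst::token(Kind::Whitespace, " "),
            Cst::token(Kind::Punct, "+"),
            Cst::token(Kind::Whitespace, " "),
            Cst::token(Kind::Number, "2"),
            Cst::token(Kind::Punct, ")"),
        ],
    };
    Cst::Node {
        kind: Kind::Root,
        children: vec![
            paren,
            Cst::token(Kind::Whitespace, "  "),
            Cst::token(Kind::Comment, "# keep me"),
        ],
    }
}
```

Calling `sample().to_source()` gives back the original text byte for byte, and after `rename("a", "b")` the comment, the double space, and the parentheses are still there. An AST that drops those tokens during parsing cannot offer that guarantee.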

Other useful posts:
https://rust-analyzer.github.io/blog/2020/09/28/how-to-make-a-light-bulb.html
https://rust-analyzer.github.io/blog/2019/11/13/find-usages.html

sharkdp · 2024-08-29T17:10:00Z

> It's not too bad. Rust-Analyzer gets most of its diagnostics from running rustc upon saving the file.

Ok, but the rustc backend doesn't execute the code. numbat::Context::interpret runs the whole compiler and the execution on the bytecode VM. This is what I meant by: "is this a sane approach …".

> So in reality having a front-end is pretty much required.

I have no issue with that. I would also imagine the Numbat LSP to reuse the tokenizer, parser, and semantic analysis stages (prefix handling, name resolution, type checker) of the compiler.
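As a rough illustration of that split, the sketch below runs only frontend stages over a toy two-statement language and returns diagnostics; nothing is ever executed, so a `print` statement has no side effect while checking. None of this is the Numbat API: `Diagnostic`, `Stmt`, `parse`, `resolve`, and `check` are hypothetical stand-ins for the real tokenizer, parser, and semantic analysis.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq)]
struct Diagnostic {
    line: usize, // 1-based line number
    message: String,
}

/// Toy statements: `let NAME = VALUE` and `print NAME`.
#[derive(Debug)]
enum Stmt {
    Let { name: String },
    Print { name: String, line: usize },
}

/// Toy tokenizer + parser: one statement per line, errors are collected.
fn parse(source: &str) -> (Vec<Stmt>, Vec<Diagnostic>) {
    let mut stmts = Vec::new();
    let mut diags = Vec::new();
    for (idx, text) in source.lines().enumerate() {
        let line = idx + 1;
        let words: Vec<&str> = text.split_whitespace().collect();
        match words.as_slice() {
            [] => {}
            ["let", name, "=", _value] => stmts.push(Stmt::Let { name: name.to_string() }),
            ["print", name] => stmts.push(Stmt::Print { name: name.to_string(), line }),
            _ => diags.push(Diagnostic { line, message: "syntax error".to_string() }),
        }
    }
    (stmts, diags)
}

/// Toy name resolution: every printed name must have been defined earlier.
fn resolve(stmts: &[Stmt]) -> Vec<Diagnostic> {
    let mut defined: HashSet<&str> = HashSet::new();
    let mut diags = Vec::new();
    for stmt in stmts {
        match stmt {
            Stmt::Let { name } => {
                defined.insert(name.as_str());
            }
            Stmt::Print { name, line } if !defined.contains(name.as_str()) => {
                diags.push(Diagnostic {
                    line: *line,
                    message: format!("unknown identifier `{}`", name),
                });
            }
            Stmt::Print { .. } => {}
        }
    }
    diags
}

/// What a language server could call on each edit: frontend only, no execution.
fn check(source: &str) -> Vec<Diagnostic> {
    let (stmts, mut diags) = parse(source);
    diags.extend(resolve(&stmts));
    diags.sort_by_key(|d| d.line);
    diags
}
```

The design point is that `check` has no path to an interpreter at all, so checking a file on every keystroke cannot trigger whatever the program would do when run.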

> The front-end should have a lossless view of the syntax. In practice, this means having a concrete syntax tree (CST) rather than an abstract syntax tree, so that refactors don't clobber trivia. This is also great for formatters: rustfmt hits this limitation and has been considering switching to a CST.

That's something we definitely don't have at the moment. We lose all whitespace information (and things like parens in expressions, see also #102) during parsing.

RossSmyth · 2024-08-30T14:00:36Z

I would definitely look into generating CST structures with ungrammar as I do find that construction pretty valuable for your parser, so you don't throw information away that would be useful for LSP things and error reporting.

Another thing would be making the lexer infallible in a similar vein to how your parser is already. The most common way of doing that is making an error token that can be emitted. So then the type signature of the lexer would become fn scan(&mut self) -> Vec<Token>. Adding a debug config that checks that all tokens are contiguous is also a good idea for testing. Here's an example of how I've done it in the past.
https://github.com/RossSmyth/meowfile/blob/2a89f23f1a34a27b3275de9d004113deed3f6932/crates/lex/src/lib.rs#L16-L19
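For readers who don't want to follow the link, here is a small self-contained sketch of the same idea. It is not the code from the linked repository and not Numbat's tokenizer; the token kinds, the `Lexer` struct, and the `is_contiguous` helper are assumptions made for illustration. Only the `scan` signature is taken from the comment above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TokenKind {
    Number,
    Ident,
    Whitespace,
    Error, // anything the lexer does not recognize
}

/// A token is a kind plus a byte range into the source.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Token {
    kind: TokenKind,
    start: usize,
    end: usize,
}

struct Lexer<'a> {
    source: &'a str,
    pos: usize,
}

impl<'a> Lexer<'a> {
    fn new(source: &'a str) -> Self {
        Lexer { source, pos: 0 }
    }

    /// Infallible: unrecognized input becomes an `Error` token, never an `Err`.
    fn scan(&mut self) -> Vec<Token> {
        let mut tokens = Vec::new();
        while let Some(first) = self.source[self.pos..].chars().next() {
            let start = self.pos;
            let kind = if first.is_ascii_digit() {
                self.eat_while(|c| c.is_ascii_digit());
                TokenKind::Number
            } else if first.is_alphabetic() || first == '_' {
                self.eat_while(|c| c.is_alphanumeric() || c == '_');
                TokenKind::Ident
            } else if first.is_whitespace() {
                self.eat_while(char::is_whitespace);
                TokenKind::Whitespace
            } else {
                // Consume exactly one character so the lexer always makes progress.
                self.pos += first.len_utf8();
                TokenKind::Error
            };
            tokens.push(Token { kind, start, end: self.pos });
        }
        // Debug-only check that no byte of the input was skipped or duplicated.
        debug_assert!(is_contiguous(&tokens, self.source.len()));
        tokens
    }

    /// Advances `pos` past every leading character that satisfies `pred`.
    fn eat_while(&mut self, pred: impl Fn(char) -> bool) {
        let rest = &self.source[self.pos..];
        for c in rest.chars() {
            if !pred(c) {
                break;
            }
            self.pos += c.len_utf8();
        }
    }
}

/// True if the tokens are non-empty ranges that tile `0..source_len` with no gaps.
fn is_contiguous(tokens: &[Token], source_len: usize) -> bool {
    let mut expected = 0;
    for token in tokens {
        if token.start != expected || token.end <= token.start {
            return false;
        }
        expected = token.end;
    }
    expected == source_len
}
```

With this shape, `Lexer::new("x1 $ 42").scan()` yields an identifier, whitespace, an error token for `$`, whitespace, and a number, and the parser (or the LSP) can decide how to report the error token instead of the whole pipeline stopping at the first bad character.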

irevoire added 4 commits August 21, 2024 17:46

init commit (c8825ca)

handle most numbat errors (958317a)

Make the function output diagnostic instead of publishing reports (f7658bb)

fmt (7c80f9c)

irevoire force-pushed the numbat-lsp branch from 8144ce8 to 7c80f9c Compare August 21, 2024 15:46
