Auto-detect language when code is pasted #190

Open
dennwc opened this issue Jan 22, 2019 · 10 comments

@dennwc
Member

dennwc commented Jan 22, 2019

I'm wondering if it's possible to refresh the detected language when the code is pasted and the language is set to "Auto"?

@smacker
Collaborator

smacker commented Jan 22, 2019

Currently we "detect" the language using bblfsh, so we need to parse the code to get the language.

I previously proposed using JS-based detection for gitbase-web, but the idea was discarded.
In theory, we can "parse" in the background on a paste event without updating the UAST on the right side.
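
A minimal sketch of that idea, assuming a hypothetical `EditorState` shape and a `detect` callback that stands in for whatever request parses the code and reports its language. None of these names come from the actual web client; the point is only that the paste path refreshes the language while leaving the rendered UAST alone.

```typescript
// Hypothetical editor state; these field names are assumptions for illustration.
interface EditorState {
  languageSetting: string; // "auto" or an explicit language name
  detectedLanguage: string | null;
  uast: unknown; // whatever is currently rendered in the right pane
}

// Stand-in for a request that parses the code and reports its language.
type DetectFn = (code: string) => Promise<string | null>;

// Meant to be called from a paste handler. Refreshes only the detected
// language and leaves the rendered UAST untouched.
async function refreshLanguageOnPaste(
  state: EditorState,
  pastedCode: string,
  detect: DetectFn,
): Promise<EditorState> {
  if (state.languageSetting !== "auto") {
    return state; // the user picked a language explicitly; respect it
  }
  try {
    const lang = await detect(pastedCode);
    return lang === null ? state : { ...state, detectedLanguage: lang };
  } catch {
    return state; // detection failed; keep the previous guess
  }
}
```

Returning a new state object instead of mutating keeps the sketch independent of any particular UI framework.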

@dennwc
Member Author

dennwc commented Jan 22, 2019

We can also expose a language detection API on the bblfshd side to make this happen.

@dpordomingo
Member

How would it work, @dennwc?

Should it try to guess the language every time an onpaste event is detected?

Could it lead to weird language detections?

Should it only work when the whole input area is completely replaced with text from the clipboard? (I'm not sure if it can be done without workarounds)

What if the whole input area is replaced by a key press or by typing new code? Should it guess the language again? For example:

  1. the language is set to auto, and the code is guessed as Java
  2. the user selects the whole input text
  3. the user types #include <stdio.h>

→ should it now be guessed as C, or only if the text was pasted from the clipboard?
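
One possible way to tell whether a paste replaces the whole input, sketched under the assumption that the editor exposes selection offsets the way a plain textarea does (`selectionStart` and `selectionEnd`, read in the paste handler before the new text is inserted). The function name `pasteReplacesWholeInput` is made up for this example, and a richer editor component may need a different check.

```typescript
// Returns true when pasting would replace everything currently in the input:
// either the input is empty, or the selection spans the whole text.
function pasteReplacesWholeInput(
  current: string,
  selectionStart: number,
  selectionEnd: number,
): boolean {
  if (current.length === 0) {
    return true; // nothing to preserve, so the paste becomes the whole content
  }
  return selectionStart === 0 && selectionEnd === current.length;
}
```

With a check like this, language detection could be limited to pastes that replace the entire content, while typing over a selection would not trigger it.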

@dennwc
Member Author

dennwc commented Jan 24, 2019

I think detecting on input will be a bit too extreme. onpaste looks practical enough, I guess.

@creachadair

How well will the detection we currently support work with just plain text and no filename?

@smacker
Collaborator

smacker commented Jan 24, 2019

It barely works at all. Bblfsh uses enry to detect the language, and enry detects it based on the filename.

@dennwc
Member Author

dennwc commented Jan 24, 2019

@smacker Enry also uses other heuristics. A filename is only one of them.

@creachadair

> @smacker Enry also uses other heuristics. A filename is only one of them.

It does, but the filename seems to be a very important one. So my question was a real one: I am not sure how well Enry will do when no filename can be inferred.

@smacker
Collaborator

smacker commented Jan 24, 2019

@dennwc That's true. But the last time I checked, it could almost never guess the language correctly without a filename. You can see this in gitbase-web, where we actually use enry, and you still have to choose the language manually in 99.99% of cases.

@smacker
Collaborator

smacker commented Apr 11, 2019

I tried linguist, and by default it couldn't recognize the language either.
But calling it with a list of candidates ("Go", "Python", "JavaScript", "Ruby", "Java", i.e. the languages supported by bblfshd) worked for all my examples.

The same trick didn't work with enry, most probably because the content classifier in enry is very outdated.

We should return to this issue when enry is updated.
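
To illustrate why the candidate list helps, here is a toy sketch of candidate-restricted detection. It is not how linguist or enry classify content (both use a trained classifier); the `HINTS` table and `detectAmong` function are invented for this example and only show how narrowing the set of languages removes most of the ambiguity.

```typescript
// Hypothetical keyword hints per language; a real classifier uses trained
// token frequencies rather than a handful of regular expressions.
const HINTS: Record<string, RegExp[]> = {
  Go: [/\bpackage\s+\w+/, /\bfunc\s+\w*\s*\(/, /:=/],
  Python: [/\bdef\s+\w+\s*\(.*\)\s*:/, /\bimport\s+\w+/, /\bself\b/],
  JavaScript: [/\bfunction\b/, /\bconst\s+\w+\s*=/, /=>/],
  Ruby: [/\bdef\s+\w+/, /\bend\b/, /\bputs\b/],
  Java: [/\bpublic\s+(static\s+)?\w+/, /\bclass\s+\w+/, /System\.out\./],
  C: [/#include\s*<\w+\.h>/, /\bint\s+main\s*\(/, /\bprintf\s*\(/],
};

// Scores only the given candidates and returns the best match,
// or null when no candidate matches any hint.
function detectAmong(code: string, candidates: string[]): string | null {
  let best: string | null = null;
  let bestScore = 0;
  for (const lang of candidates) {
    const hints = HINTS[lang] || [];
    const score = hints.filter((re) => re.test(code)).length;
    if (score > bestScore) {
      best = lang;
      bestScore = score;
    }
  }
  return best;
}

// Example: restrict detection to a fixed list of supported languages.
const SUPPORTED = ["Go", "Python", "JavaScript", "Ruby", "Java"];
const guess = detectAmong("package main\n\nfunc main() {\n\tx := 1\n}", SUPPORTED);
```

Ties go to the earlier candidate in the list, so the order of `SUPPORTED` acts as a crude prior in this sketch.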
