Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

python3Packages.tree-sitter-grammars: init at 0.22.5 #320783

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

sarcasticadmin
Copy link
Member

Description of changes

As request in ngi-nix/ngipkgs#139

This generates a python binding package for each tree-sitter-grammar that exists in tree-sitter.builtGrammars.

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 24.11 Release Notes (or backporting 23.11 and 24.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Add a 👍 reaction to pull requests you find important.

};
};
in
# TODO pkgset or flattened?
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We didn't see any pkgsets inside python3Packages so we will go ahead and flatten this unless we're told otherwise. But we didn't any flattening in top-level either so we guess this will be the first.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We changed our minds. We saw at least two package sets nested in python3Packages. We saw bootstrap and qt6. And this one is literally mapped from an existing package set (tree-sitter.builtGrammars) so it seems appropriate.

@sarcasticadmin sarcasticadmin force-pushed the treesitter-python-generated-grammars branch from 6937418 to 253c53f Compare June 18, 2024 13:59
@mightyiam mightyiam force-pushed the treesitter-python-generated-grammars branch from 253c53f to 1ff067a Compare June 18, 2024 14:06
@GetPsyched GetPsyched force-pushed the treesitter-python-generated-grammars branch from 1ff067a to fe9e96c Compare June 18, 2024 14:12
@mightyiam mightyiam force-pushed the treesitter-python-generated-grammars branch 2 times, most recently from 4eeaa9e to 8c63440 Compare June 19, 2024 13:52
@stepbrobd stepbrobd force-pushed the treesitter-python-generated-grammars branch from 8c63440 to 0550b06 Compare June 20, 2024 07:22
@ofborg ofborg bot requested review from mightyiam and stepbrobd June 20, 2024 07:59
@A-jay98 A-jay98 force-pushed the treesitter-python-generated-grammars branch from 0550b06 to 5947cdb Compare June 20, 2024 08:07
@adfaure adfaure force-pushed the treesitter-python-generated-grammars branch from 5947cdb to ea460ca Compare June 20, 2024 08:28
Comment on lines 15489 to 15495
# ImportError: /nix/store/7w7piy6hpdj1swdg5r0bz47gk2g3q855-tree-sitter-perl-grammar-0.22.5/parser: undefined symbol: isnumber
"tree-sitter-perl"
# ImportError: /nix/store/6rkhjf2mhwnfqwq2962fvv9bnd4xmpk7-python3.11-python-tree-sitter-ql-dbscheme-0.22.5/lib/python3.11/site-packages/tree_sitter_ql_dbscheme/_binding.abi3.so: undefined symbol: tree_sitter_ql_dbscheme
"tree-sitter-ql-dbscheme"
# ImportError: /nix/store/ybmnykhrgj6sghw4r7ypf897c4wrncr5-python3.11-python-tree-sitter-org-nvim-0.22.5/lib/python3.11/site-packages/tree_sitter_org_nvim/_binding.abi3.so: undefined symbol: tree_sitter_org_nvim
"tree-sitter-org-nvim"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each of these might be resolved by simply bumping its source.

Some could possibly be resolved with some tinkering, which we decided should not delay the submitting of this PR for review.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved. All grammars pass!

@ofborg ofborg bot requested a review from mightyiam June 20, 2024 09:12
@mightyiam
Copy link
Member

For the record, this is already ready for review, even though it's marked as draft.

@sarcasticadmin sarcasticadmin marked this pull request as ready for review June 20, 2024 12:07
sarcasticadmin and others added 3 commits June 20, 2024 13:47
Co-authored-by: yakampe <[email protected]>
Co-authored-by: GetPsyched <[email protected]>
Co-authored-by: Shahar "Dawn" Or <[email protected]>
Co-authored-by: Robert James Hernandez <[email protected]>
@sarcasticadmin sarcasticadmin force-pushed the treesitter-python-generated-grammars branch from ea460ca to d3890e6 Compare June 20, 2024 13:51
@mightyiam mightyiam force-pushed the treesitter-python-generated-grammars branch 2 times, most recently from be0a31b to 93a9562 Compare June 20, 2024 14:23
Comment on lines 15492 to 15493
# ImportError: /nix/store/7w7piy6hpdj1swdg5r0bz47gk2g3q855-tree-sitter-perl-grammar-0.22.5/parser: undefined symbol: isnumber
"tree-sitter-perl"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be blocked by ganezdragon/tree-sitter-perl#45.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is resolved.

@mightyiam
Copy link
Member

The final commit message and PR title should be updated.

@mightyiam mightyiam force-pushed the treesitter-python-generated-grammars branch from 615c75b to e5dc900 Compare June 24, 2024 10:12
@mightyiam
Copy link
Member

I updated the final commit message. The PR title should be amended. @sarcasticadmin can do that.

@ofborg ofborg bot requested a review from mightyiam June 24, 2024 10:46
@sarcasticadmin sarcasticadmin changed the title python3Packages.tree-sitter-*: init at 0.22.5 python3Packages.tree-sitter-grammars: init at 0.22.5 Jun 24, 2024
Copy link
Member

@Janik-Haag Janik-Haag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you consider taking a look at https://github.com/tree-sitter/py-tree-sitter before doing all the work here?

Taking a look at ngi-nix/ngipkgs#139 I'm honestly a bit confused by the shell.nix part of the request.
Because you can just do something like:

let
  pkgs = import <nixpkgs> { };
  tree-sitter-with-grammers = pkgs.tree-sitter.override {
    extraGrammars = with pkgs.tree-sitter-grammars; [
      # put your grammars here
      tree-sitter-go
      tree-sitter-yang
    ];
  };
in pkgs.mkShell {
  buildInputs = [
    tree-sitter-with-grammers
  ];
}

And get a shell with tree-sitter and the grammars you want, but I might be missing something.

Regarding armijnhemel/proximity_matcher_webservice and the extra context in https://discourse.nixos.org/t/need-help-enabling-grammars-in-treesitter-python/39500 from just googeling a bit, it seems like there is official tree-sitter python bindings https://github.com/tree-sitter/py-tree-sitter/ and we already have them packaged in nixpkgs.
It appears like our package needs a little additional work so you can configure grammars, but that should be an easy <15-min job since you just need to add the grammar python wheel to the pythonPath. (at least that's how I read the language section in the python binding readme


Additonaly please provide reasoning and considerations in your commit message, esp. if you want to add a bunch of hard to maintain code like this. And I'm pretty certain that we should not go with this solution, but instead use the offical python bindings, that are already packaged and a lot easier to maintain.

"date": "2021-12-16T17:14:17+00:00",
"date": "2021-12-16T17:14:17Z",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder what happened here, in tree-sitter-scss.json and in tree-sitter-smithy.json. Seems like there shouldn't be any change there. Also, the shortening of +00:00 to Z seems odd, might that be something related to the locals of the person that ran the update script?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ISO 8601 gives you the option to use either Z or +00:00, so both are the same timestamp but it's still odd that it changed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I didn't know about that :D
In that case, we should take a look in the script and try to check what changed the behavior to avoid cluttering the git log/blame with unnecessary commits.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we allow this to happen here, and see what happens with the next update?

Comment on lines 21 to 24
assert (lib.assertMsg (!(grammarDrv.meta ? license)) ''
As of this writing, ${grammarDrv.pname} surprisingly doesn't have a license.
This trap is set here to guarantee that if it ever does have a license, this package will inherit the license.
'');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What, why? This code, will just be licensed under MIT since that's what nixpkgs is licensed under. Yes, the original grammar might have another licenses, but that really doesn't matter here, and nix allows you to keep track of your (license) dependency-graph. For example, Neovim is licensed under Apache License 2.0 while Tree-Sitter, one of its build inputs is licensed under MIT, but this doesn't mean that Neovim inherits the license. Please remove this nonsensical assert statement.

Suggested change
assert (lib.assertMsg (!(grammarDrv.meta ? license)) ''
As of this writing, ${grammarDrv.pname} surprisingly doesn't have a license.
This trap is set here to guarantee that if it ever does have a license, this package will inherit the license.
'');

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove this nonsensical assert statement.

Please no name-calling my work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This statement, on a technical level, doesn't make sense, so I think it's nonsensical.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've set all the licenses to MIT

Comment on lines +14 to +19
snakeCaseName = lib.replaceStrings [ "-" ] [ "_" ] name;
drvPrefix = "python-${name}";
langIdentOverrides = {
tree_sitter_org_nvim = "tree_sitter_org";
};
langIdent = langIdentOverrides.${snakeCaseName} or snakeCaseName;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are you doing here??

So if the name is tree_sitter_org_nvim, then you want to replace it with tree_sitter_org and otherwise just use snakeCaseName? Why do it in such a complicated manner, instead of just using a simple if statement?
Also, please put a comment stating why you are replacing the name here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I believe this is cleaner and easier to maintain than a if statement

What if new a tree-sitter grammar gets added with a weird name like tree_sitter_org_nvim 😂? We can simply put another "override" in there

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, please put a comment stating why you are replacing the name here.

Could you please tell me why you are renaming here?
Looking at the commit, it seems to have been named this name rather deliberately: 1705882

And before introducing hacky workarounds here, we should change the name in general or just leave it as is.

What if new a tree-sitter grammar gets added with a weird name like tree_sitter_org_nvim 😂?

Then more packages have to worry about it since it would also result in conflicts in our general tree-sitter infrastructure, so doing this workaround here appears to be a bad idea because it will cause more overhead in the long term.

Comment on lines +117 to +115
preCheck = ''
rm -r ${snakeCaseName}
'';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please leave a comment on why you are removing the directory here.

Copy link

@adfaure adfaure Jun 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is "common" for python packages, it seems that pytest gets confused if a directory with the same name as the module is present during tests, (in this case ${snakeCaseName} also corresponds to the module name provided by the generated grammar package).

It looks like it is fairly common for python packages (here an example: https://github.com/NixOS/nixpkgs/blob/nixos-unstable/pkgs/development/python-modules/tree-sitter/default.nix#L45).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and if you take a look at the file you just linked, you will notice that it has a comment linking to the issue in question on why you have to remove this directory, so please also add a comment here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The correct comment would be to mention #255262 .

Comment on lines +105 to +108
def test_language():
lang = Language(language())
assert lang is not None
parser = Parser()
parser.language = lang
tree = parser.parse(bytes("", "utf-8"))
assert tree is not None
''
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced by the soundness of this test, did you at least try to parse some actual code with some of the language-grammars?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC in one of our mob sessions, someone pointed out tree-sitter parsing cannot fail even if provided code doesn't follow the grammar, and no exception will be thrown no matter what (I'm not 100% sure tho)

I agree that asserting not None might not be enough, but I think the best we can do here is to change the assert not None to the resulting parse tree is an instance of some type that py-tree-sitter provides

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not answering my question. I asked if you tried to parse some actual code (manually) and check if the grammars work, I'm aware that improving this test isn't too easy, you could have a bunch of multiline strings in an attrset with actual language code and insert that instead of an empty string, at least for some languages.
Like for example:

{
  tree_sitter_python = {
    testCase = ''
      foo = 20
    '';
    expectedResult = $whateverTreeSitterShouldReturn
    ;
  };
}

is how I would probably go about this problem, and if a language isn't in the attr set, then you can still go with an empty string as default value.

["-std=c11"] if system() != 'Windows' else []
),
define_macros=[
("Py_LIMITED_API", "0x03080000"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does the magic number/address 0x03080000 here mean, could you please add a comment?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That should be documented, and based on that you should also set disabled = pythonOlder "3.8"

Comment on lines +29 to +28
src = symlinkJoin {
name = "${drvPrefix}-source";
paths = [
(writeTextDir "${snakeCaseName}/__init__.py"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be entirely frank, this seems like madness. It's hard to maintain code, just inline in some file, and I'm fairly certain you could have accomplished the same result using https://github.com/grantjenks/py-tree-sitter-languages (which is already packaged) and by putting less than 15-mins of work into https://github.com/NixOS/nixpkgs/blob/nixos-unstable/pkgs/development/python-modules/tree-sitter/default.nix so it has a package option for additional grammars.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, I'm sorry I copy-pasted the wrong link, I meant to refer to https://github.com/tree-sitter/py-tree-sitter (the same one I referred to in #320783 (review)). The official one isn't deprecated and still maintained, and is the one we have in nixpkgs.

@stepbrobd
Copy link
Member

Did you consider taking a look at https://github.com/tree-sitter/py-tree-sitter before doing all the work here?

Yep, that's the first thing we've done and we are using it in the checkPhase

Taking a look at ngi-nix/ngipkgs#139 I'm honestly a bit confused by the shell.nix part of the request. Because you can just do something like:
...
And get a shell with tree-sitter and the grammars you want, but I might be missing something.

This is right for exposing tree-sitter and tree-sitter-<lang> (grammars) to dev shells.
What ngi-nix/ngipkgs#139 wants, is exposing Python bindings for multiple grammars to dev shells (original request)

Regarding armijnhemel/proximity_matcher_webservice and the extra context in https://discourse.nixos.org/t/need-help-enabling-grammars-in-treesitter-python/39500 from just googeling a bit, it seems like there is official tree-sitter python bindings https://github.com/tree-sitter/py-tree-sitter/ and we already have them packaged in nixpkgs.

py-tree-sitter is indeed in nixpkgs, but you'll need to load a grammar to it before you can parse anything (usage)
What this PR adds, is to give py-tree-sitter in nixpkgs the ability to load any/all grammars exposed through tree-sitter.builtGrammars as python bindings
Please note that Python bindings exposed through official tree-sitter org (it's mostly distributed through one of the maintainers' personal Pypi account) on Pypi is incomplete, 24 as of me writing this. With our way, we can expose 100+ bindings readily available in nixpkgs

It appears like our package needs a little additional work so you can configure grammars, but that should be an easy <15-min job since you just need to add the grammar python wheel to the pythonPath. (at least that's how I read the language section in the python binding readme

That's what we did in this PR: giving users the ability to get tree-sitter grammar python bindings, which will further allow py-tree-sitter to load the bindings and be able to parse code

Copy link
Member

@Janik-Haag Janik-Haag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What ngi-nix/ngipkgs#139 wants, is exposing Python bindings for multiple grammars to dev shells (original request)

As I said, I wasn't sure if it meant just tree-sitter, or if it meant python bindings.

What this PR adds, is to give py-tree-sitter in nixpkgs the ability to load any/all grammars exposed through tree-sitter.builtGrammars as python bindings

No, it does not, I think? You aren't touching the py-tree-sitter package, nor are you using py-tree-sitter in your code.

py-tree-sitter is indeed in nixpkgs, but you'll need to load a grammar to it before you can parse anything (usage)

Yes, that's the point I made, and you aren't adding support for it here.

All you would have needed to do is a small patch like:

From 12e830ed9a7bace1ef0df64ac088b0a4b309d8cd Mon Sep 17 00:00:00 2001
From: "Janik H." <[email protected]>
Date: Wed, 26 Jun 2024 13:28:58 +0200
Subject: [PATCH] python3Packages.tree-sitter: add support for additional
 grammars

---
 pkgs/development/python-modules/tree-sitter/default.nix | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/pkgs/development/python-modules/tree-sitter/default.nix b/pkgs/development/python-modules/tree-sitter/default.nix
index fdaa03554433..0127b1518fa1 100644
--- a/pkgs/development/python-modules/tree-sitter/default.nix
+++ b/pkgs/development/python-modules/tree-sitter/default.nix
@@ -5,11 +5,12 @@
   pytestCheckHook,
   pythonOlder,
   setuptools,
-  tree-sitter-python,
-  tree-sitter-rust,
   tree-sitter-html,
   tree-sitter-javascript,
   tree-sitter-json,
+  tree-sitter-python,
+  tree-sitter-rust,
+  extraGrammars ? [ ]
 }:
 
 buildPythonPackage rec {
@@ -29,6 +30,8 @@ buildPythonPackage rec {
 
   build-system = [ setuptools ];
 
+  dependencies = extraGrammars;
+
   nativeCheckInputs = [
     pytestCheckHook
     tree-sitter-python
-- 
2.45.1

(I'm not sure if the above example patch is the best way to do this, you might also want to craft your own python environment in passthru to avoid unnecessary recompiles (similar to what we are doing in octodns)

Example usage would be:

let
  pkgs = import ./. { };
in pkgs.writers.writePython3Bin "tree-sitter-test.py" {
  libraries = [
    (pkgs.python3Packages.tree-sitter.override { extraGrammars = [
        pkgs.python3Packages.tree-sitter-javascript
      ];
    })
  ];
} ''
import tree_sitter_javascript as tsjavascript
from tree_sitter import Language, Parser

JS_LANGUAGE = Language(tsjavascript.language())
parser = Parser(JS_LANGUAGE)
tree = parser.parse(
  bytes("console.log('Hello World!');", "utf8")
)

cursor = tree.walk()
cursor.goto_first_child()
print(cursor.node.type)
''

Please note that Python bindings exposed through official tree-sitter org (it's mostly distributed through one of the maintainers' personal Pypi account) on Pypi is incomplete, 24 as of me writing this. With our way, we can expose 100+ bindings readily available in nixpkgs

How can you be certain that your bindings work, you barely do any testing, there is a bunch of work going into this upstream to do this properly. And you are basically duplicating out of tree efforts like https://github.com/grantjenks/py-tree-sitter-languages, but with worse support and worse testing.
You can also just use pkgs.python3Packages.tree-sitter-languages instead of pkgs.python3Packages.tree-sitter in the example above (and replace the language and parser definition) and get access to all the languages.
We also don't need to depend on the wheels distributed through pypi, for example the tree-sitter-javascript package listed above, was built from source.

"date": "2021-12-16T17:14:17+00:00",
"date": "2021-12-16T17:14:17Z",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I didn't know about that :D
In that case, we should take a look in the script and try to check what changed the behavior to avoid cluttering the git log/blame with unnecessary commits.

Comment on lines +29 to +28
src = symlinkJoin {
name = "${drvPrefix}-source";
paths = [
(writeTextDir "${snakeCaseName}/__init__.py"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, I'm sorry I copy-pasted the wrong link, I meant to refer to https://github.com/tree-sitter/py-tree-sitter (the same one I referred to in #320783 (review)). The official one isn't deprecated and still maintained, and is the one we have in nixpkgs.

Comment on lines 21 to 24
assert (lib.assertMsg (!(grammarDrv.meta ? license)) ''
As of this writing, ${grammarDrv.pname} surprisingly doesn't have a license.
This trap is set here to guarantee that if it ever does have a license, this package will inherit the license.
'');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This statement, on a technical level, doesn't make sense, so I think it's nonsensical.

Comment on lines +14 to +19
snakeCaseName = lib.replaceStrings [ "-" ] [ "_" ] name;
drvPrefix = "python-${name}";
langIdentOverrides = {
tree_sitter_org_nvim = "tree_sitter_org";
};
langIdent = langIdentOverrides.${snakeCaseName} or snakeCaseName;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, please put a comment stating why you are replacing the name here.

Could you please tell me why you are renaming here?
Looking at the commit, it seems to have been named this name rather deliberately: 1705882

And before introducing hacky workarounds here, we should change the name in general or just leave it as is.

What if new a tree-sitter grammar gets added with a weird name like tree_sitter_org_nvim 😂?

Then more packages have to worry about it since it would also result in conflicts in our general tree-sitter infrastructure, so doing this workaround here appears to be a bad idea because it will cause more overhead in the long term.

Comment on lines +105 to +108
def test_language():
lang = Language(language())
assert lang is not None
parser = Parser()
parser.language = lang
tree = parser.parse(bytes("", "utf-8"))
assert tree is not None
''
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not answering my question. I asked if you tried to parse some actual code (manually) and check if the grammars work, I'm aware that improving this test isn't too easy, you could have a bunch of multiline strings in an attrset with actual language code and insert that instead of an empty string, at least for some languages.
Like for example:

{
  tree_sitter_python = {
    testCase = ''
      foo = 20
    '';
    expectedResult = $whateverTreeSitterShouldReturn
    ;
  };
}

is how I would probably go about this problem, and if a language isn't in the attr set, then you can still go with an empty string as default value.

["-std=c11"] if system() != 'Windows' else []
),
define_macros=[
("Py_LIMITED_API", "0x03080000"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That should be documented, and based on that you should also set disabled = pythonOlder "3.8"

@Janik-Haag
Copy link
Member

cc @doronbehar since you did a lot of work on tree-sitter python bindings in #316901 recently.

@doronbehar
Copy link
Contributor

I'm sorry but this PR is very confusing to me, and I tend to agree with most of @Janik-Haag's comments, and for sure using writeTextDir like that is way too messy.

I'm mostly confused because I didn't find any details in neither the discourse thread, nor at ngi-nix/ngipkgs#139 and neither here about the target development setup. For example I wonder:

  • What text editor do you use?
  • Many text editors support installing such Grammers via the editor's package manager, why is that not an option? In the case of Neovim for example, we have an interface for that for sure (example usage). It is purely documented but still it is there.

I'm not familiar with the ecosystem of all of these python packages, including many of those I added in #316901 , so I won't be able to help a lot.

from setuptools import Extension, setup


setup(
Copy link
Member

@jtojnar jtojnar Jul 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inherit version;
pname = drvPrefix;

src = symlinkJoin {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would probably be cleaner keeping the files in-tree, pointing src there and using substituteInPlace --subst-var-by in postPatch to pass the variables.

@jtojnar
Copy link
Member

jtojnar commented Jul 2, 2024

Thanks. This is pretty much what I imagined in https://discourse.nixos.org/t/need-help-enabling-grammars-in-treesitter-python/39500/7?u=jtojnar.


@Janik-Haag

No, it does not, I think? You aren't touching the py-tree-sitter package, nor are you using py-tree-sitter in your code.

The grammars have nothing to do with py-tree-sitter package – the package just contains bindings to tree-sitter library itself, I am pretty sure.

And as you show in your example – you just pull in tree_sitter_javascript from PYTHONPATH. So I do not think the tree_sitter bindings module does need to be aware of the grammars at build time, passing the result of language() function to Language should be enough.

Yes, that's the point I made, and you aren't adding support for it here.

Again, no support is necessary in py-tree-sitter, we just need the grammar bindings, which is what this PR provides as mentioned above. It is creating them from scratch because most grammars are not actually available on PyPI. Also those that are available there are using vendored grammars and vendoring is generally undesirable.

All you would have needed to do is a small patch like:

I do not think this will do anything that could not be achieved by just adding the extraGrammars on Python path in the consumer. The grammars mentioned in the py-tree-sitter expression are only used for tests.

How can you be certain that your bindings work, you barely do any testing, there is a bunch of work going into this upstream to do this properly. And you are basically duplicating out of tree efforts like https://github.com/grantjenks/py-tree-sitter-languages, but with worse support and worse testing.

Yes, testing is an issue

You can also just use pkgs.python3Packages.tree-sitter-languages instead of pkgs.python3Packages.tree-sitter in the example above (and replace the language and parser definition) and get access to all the languages.

but tree-sitter-languages is a dead end as I mentioned in https://discourse.nixos.org/t/need-help-enabling-grammars-in-treesitter-python/39500/7?u=jtojnar

The only other alternative to have all those grammars available from Python with the current API (to my knowledge, as of 2024-04-12) would be going around and opening PRs to add a trivial binding like this https://github.com/tree-sitter/tree-sitter-python/tree/71778c2a472ed00a64abf4219544edbf8e4b86d7/bindings/python/tree_sitter_python for every upstream grammar, having them upload it to PyPI, and then packaging that. That does not scale.

We also don't need to depend on the wheels distributed through pypi, for example the tree-sitter-javascript package listed above, was built from source.

That would be fine but it still does not solve the issue for grammars that do not have Python bindings.

I would say the requirement to have m×n bindings for each combination of language and grammar itself is something that should be addressed but I do not have a solution for that 🤷‍♀️

@adfaure
Copy link

adfaure commented Jul 2, 2024

The only other alternative to have all those grammars available from Python with the current API (to my knowledge, as of 2024-04-12) would be going around and opening PRs to add a trivial binding like this https://github.com/tree-sitter/tree-sitter-python/tree/71778c2a472ed00a64abf4219544edbf8e4b86d7/bindings/python/tree_sitter_python for every upstream grammar, having them upload it to PyPI, and then packaging that. That does not scale.

Hi, I just want to mention that every recent enough grammar should have the binding, since the official template contains it: https://github.com/tree-sitter/tree-sitter-embedded-template.

So maybe, all (maintained) upstream grammars should at some point contain the binding.

@jtojnar
Copy link
Member

jtojnar commented Jul 2, 2024

@adfaure That is actually a grammar for ERB and EJS template languages. But looks like a template does exist so you might be right that it will be everywhere eventually.

So maybe we could indeed use the upstream sources. For example, here is how I would convert python3.pkgs.tree-sitter-javascript (after applying #324161):

--- a/pkgs/development/python-modules/tree-sitter-javascript/default.nix
+++ b/pkgs/development/python-modules/tree-sitter-javascript/default.nix
@@ -1,6 +1,6 @@
 { lib
 , buildPythonPackage
-, fetchFromGitHub
+, tree-sitter-grammars
 , setuptools
 , wheel
 , tree-sitter
@@ -8,16 +8,9 @@
 
 buildPythonPackage rec {
   pname = "tree-sitter-javascript";
-  version = "0.21.3";
+  inherit (tree-sitter-grammars.tree-sitter-javascript) version src;
   pyproject = true;
 
-  src = fetchFromGitHub {
-    owner = "tree-sitter";
-    repo = "tree-sitter-javascript";
-    rev = "v${version}";
-    hash = "sha256-jsdY9Pd9WqZuBYtk088mx1bRQadC6D2/tGGVY+ZZ0J4=";
-  };
-
   build-system = [
     setuptools
     wheel

@adfaure
Copy link

adfaure commented Jul 2, 2024

@adfaure That is actually a grammar for ERB and EJS template languages. But looks like a template does exist so you might be right that it will be everywhere eventually.

Oh, I see. Thank you for pointing that out and finding the actual template.

Co-authored-by: Robert James Hernandez <[email protected]>
Co-authored-by: Yifei Sun <[email protected]>
Co-authored-by: Ali Jamadi <[email protected]>
Co-authored-by: yakampe <[email protected]>
Co-authored-by: GetPsyched <[email protected]>
Co-authored-by: Adrien Faure <[email protected]>
Co-authored-by: Shahar "Dawn" Or <[email protected]>
@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2024-summer-of-nix-program-updates/46053/3

@wegank wegank requested review from wegank and removed request for adfaure July 25, 2024 17:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants