-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
don't unnecessary fetch all tokens when grabbing an interval of text … #2532
Conversation
…(match C# logic) When an exception is thrown, it calls getText to get the text of the tokens invloved but the current C++ and Java implemention first fetches the rest of the tokens instead of just using the tokens it already has. C#'s version is correct (fill() is called only when asking for the entire stream's text).
@parrt This patch can be merged. |
Hmm...i'm a bit nervous about changing getTest() (at least for Java). not sure what ramifications are for fetching text normally. maybe slower? maybe different? |
This is only about filling the token buffer and it mirrors what's already used in C#. The tests succeed so I think it should be save. |
Yes but the call to the overridden method handles the fetching of the tokens for the given interval so this call was duplicating logic. That overriden method always called fill() prior to this change which fetches all tokens instead of fetching only up to the stop index. We caught this as a performance problem when the exception happened at the beginning of a file and it went to get the text of the token at fault it ended up tokenizing the entire file at that time which is completely unnecessary. I guess the C# implementors caught this issue and resolved it independently of the Java and C++ runtimes which both had this problem. |
Hmm...yeah, I'm just too scared to change it. can you just override it? |
@WalterCouto can you back out the Java change? thanks! |
I could back out of the java change attached to this change, but I don't agree that it should be reversed. The bug is a large performance hit on both Java and C++. The java bug is what started the performance hit for us; we also use the C++ runtime and it had the same problem. We started using the C# runtime which doesn't have the bug. getText() with no params should "The text of all tokens in the stream." so it should be calling fill() which will fill the tokens to the end of the stream, not doing so is a bug and that is what Java and C++ had. That bug was not noticed before because of the second bug which is the getText() that takes a range was calling fill() but according to that method, it will retrieve "The text of all tokens lying between the specified start and stop tokens" so it should not have fetched all the tokens to the end of the stream but only up to the stop index using sync() So the "fear" should be leaving that bug as is, with our parser tests across multiple grammars with large files shows huge performance hit to stop after the first error as it had to tokenize the large file completely to get the token text the error occurred on before return back to us it failed early on. If you were waiting for the full parse, you wouldn't have noticed that early on you took the hit of tokenizing the whole file, but if you are using callbacks you will notice a wait as memory is allocated and lexing occurs for the entire file. If you still feel like to leave this change out for this PR, I can but the original bug back in and raise an java only PR. |
Well, if @sharwell has the time to approve, I'm ok with it but I'm too distracted to review such an important issue. |
Ok, looks like Sam is busy. @WalterCouto thanks for your thoughtful response. Can you make separate PR for me to evaluate in future? thanks! |
@parrt only the C++ change is part of the PR request now |
…(match C# logic) When an exception is thrown, it calls getText to get the text of the tokens invloved but the current Java implemention first fetches the rest of the tokens instead of just using the tokens it already has. C#'s version is correct (fill() is called only when asking for the entire stream's text). C++ runtime had the same bug but was fixed by antlr#2532
…(match C# logic)
When an exception is thrown, it calls getText to get the text of the tokens invloved but the current C++ and Java implemention first fetches the rest of the tokens instead of just using the tokens it already has. C#'s version is correct (fill() is called only when asking for the entire stream's text).