-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make proto-codegen also create JSON marshalling code generation #14146
Comments
There's one requirement: codegen needs to be protojson compliant. So I don't think just using EasyJSON is going to settle it, we would most likely need to fork and adjust it to handle certain proto types |
We also really want the encoding to be deterministic |
Perhaps this will work then? It claims protojson compatible: https://github.com/mitchellh/protoc-gen-go-json Running into some serious bottlenecks on JSON deserialization for some modules genesis files. (Both raw deserialization time, but more concerningly all the heap allocations are causing problems) |
As far as I can tell that library doesn't do anything for performance. It's just a thin wrapper for encoding/json
I'm hopeful that streaming genesis will improve the memory issues although performance would likely be similar. What if we allowed streaming binary protobuf genesis as an option? |
Your right that library doesn't help :( I think keeping the genesis JSON native, and just figuring out how to codegen the proto JSON serialization is worth doing. I think it should be pretty low overhead to make a second tool to do JSONcodegen with:
As long as we can add it as another second step, rather than editing the existing code. (queue the meme "# compiler passes = # of teams working on project") |
I guess JSON is good for:
Protobuf is fine for the case thats more performance sensitive, which is taking a chain export, editing some value, and starting a testnet. Its already really painful to do this with the JSON tho, I think having to make new protobuf tools will make this unusable short term without serious investment in improving that. I think just getting the JSON codegen'd will be a bigger win |
I think JSON codegen will be a huge effort and this does seem like an optimization for an edge use case (when there's a big chain and export is needed for testing). I think it would easier to add an opt-in for binary genesis and specifically in the context of direction we're going with cross-language modules (see #15410), from the module's perspective I expect genesis will just be a stream of protobuf objects instead of raw JSON. If we can get this design right for ORM, collections, and Rust modules, it will allow binary or JSON genesis. It should be pretty straightforward to go back and forth between the binary and JSON too for editing purposes. |
From @aaronc's comment it sounds like this issue is not worth the effort to implement. If so, shall we close it? |
idk, I think it is worth doing. I disagree on this one I feel like it should be clean to:
And then we only apply this annotation to things that can decode cleanly, and/or are perf bottlenecks. The perf bottlenecks are really |
Which code generator would do this? Honestly seems like a pretty big effort |
Why? For big genesis, we need tools anyway to modify genesis. Binary encoding is order of magnitude faster for recovery / share. And as @aaronc mentioned, streaming is definitely the way to go to recover of a node of a huge chain (I guess it will be few times faster to do it with an export, rather than loading it all at once ... probably sometimes loading it all at once won't be even possible for huge chains). So optimizing JSON for genesis purpose is a short term solution. |
Should be doable with easyjson, and a light modification for timestamps in that repo |
we could use https://github.com/TheThingsIndustries/protoc-gen-go-json as it is meant to work with gogoproto |
Summary
We currently code generate all of our proto files, to have proto marshal methods. We should also add overridden logic for JSON marshalling. JSON marshalling is done in many situations in the codebase right now:
The first two are performance sensitive. Right now, the JSON marshalling has high RAM usages, because it is based on generic reflection based API's. This takes notable CPU time and heap allocation (the latter causing large GC loads at the relevant scale)
This can all be eliminated if we added code generation for the JSON. We can do so by adding easyjson's codegen https://github.com/mailru/easyjson (or something similar, I have not done a survey of different JSON code generation techniques. But this should be easy add, that would have many wins around the stack.
Proposal
Add JSON marshalling codegen, to our existing proto codegen process. Ensure it gets used / speeds things up with our JSON library usage.
The text was updated successfully, but these errors were encountered: