
Make proto-codegen also create JSON marshalling code generation #14146

Open
ValarDragon opened this issue Dec 4, 2022 · 13 comments
Labels
help wanted · T: Performance (Performance improvements)

Comments

@ValarDragon
Contributor

Summary

We currently run code generation over all of our proto files to produce proto marshal methods. We should also generate overriding logic for JSON marshalling. JSON marshalling is used in many situations in the codebase right now:

  • Genesis import
    • And therefore many integrated tests
  • State export
  • query / cli / debug logic

The first two are performance sensitive. Right now, JSON marshalling has high RAM usage because it is built on generic, reflection-based APIs. This costs notable CPU time and heap allocation (the latter causing large GC load at the relevant scale).

This can all be eliminated if we added code generation for the JSON. We can do so by adding easyjson's codegen https://github.com/mailru/easyjson (or something similar; I have not done a survey of different JSON code generation techniques). This should be an easy addition that would yield wins across the stack.
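To make the performance claim concrete, here is a minimal sketch of the kind of marshal method a codegen tool like easyjson emits: the `Coin` type and its field layout are hypothetical, but the technique (appending fields directly to a byte slice, skipping the reflection walk that `encoding/json` performs) is the core of the proposed win. The int64 is rendered as a quoted string, matching protojson's treatment of 64-bit integers.

```go
package main

import (
	"fmt"
	"strconv"
)

// Coin mirrors a typical generated proto struct (hypothetical example).
type Coin struct {
	Denom  string
	Amount int64
}

// MarshalJSON is hand-written here to illustrate the kind of method a
// codegen tool would emit: it appends fields directly to a byte slice,
// avoiding the reflection walk done by encoding/json. Note the int64 is
// quoted, as protojson requires for 64-bit integers.
func (c *Coin) MarshalJSON() ([]byte, error) {
	buf := make([]byte, 0, 64)
	buf = append(buf, `{"denom":`...)
	buf = strconv.AppendQuote(buf, c.Denom)
	buf = append(buf, `,"amount":"`...)
	buf = strconv.AppendInt(buf, c.Amount, 10)
	buf = append(buf, `"}`...)
	return buf, nil
}

func main() {
	b, _ := (&Coin{Denom: "uatom", Amount: 1000}).MarshalJSON()
	fmt.Println(string(b)) // {"denom":"uatom","amount":"1000"}
}
```

Because the method knows the struct layout at compile time, there is no per-field reflection and far fewer intermediate allocations, which is where the GC savings come from.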

Proposal

Add JSON marshalling codegen to our existing proto codegen process. Ensure it gets used, and that it actually speeds up our JSON library usage.

@testinginprod
Contributor

There's one requirement: the codegen needs to be protojson compliant. So I don't think just using easyjson is going to settle it; we would most likely need to fork it and adjust it to handle certain proto types (Any, Timestamp, Duration) in a particular way.

@aaronc
Member

aaronc commented Dec 6, 2022

We also really want the encoding to be deterministic.

@tac0turtle tac0turtle added the T: Performance Performance improvements label Dec 6, 2022
@ValarDragon
Contributor Author

Perhaps this will work then? It claims protojson compatible: https://github.com/mitchellh/protoc-gen-go-json

Running into some serious bottlenecks on JSON deserialization for some modules genesis files. (Both raw deserialization time, but more concerningly all the heap allocations are causing problems)

@aaronc
Member

aaronc commented Apr 10, 2023

Perhaps this will work then? It claims protojson compatible: https://github.com/mitchellh/protoc-gen-go-json

As far as I can tell, that library doesn't do anything for performance. It's just a thin wrapper around encoding/json.

Running into some serious bottlenecks on JSON deserialization for some modules genesis files. (Both raw deserialization time, but more concerningly all the heap allocations are causing problems)

I'm hopeful that streaming genesis will improve the memory issues, although raw performance would likely be similar. What if we allowed streaming binary protobuf genesis as an option?

@ValarDragon
Contributor Author

You're right, that library doesn't help :(

I think keeping the genesis JSON-native, and just figuring out how to codegen the proto JSON serialization, is worth doing.

I think it should be pretty low overhead to make a second tool to do JSON codegen with:

  • support only for structs that cannot contain an Any
  • Timestamp and Duration handling patched

As long as we can add it as a second, separate step rather than editing the existing codegen. (Cue the meme: "# of compiler passes = # of teams working on the project".)

@ValarDragon
Contributor Author

I guess JSON is good for:

  • Blockchain genesis
  • Recovery from very bad chain bug / halt

Protobuf is fine for the case that's more performance sensitive, which is taking a chain export, editing some value, and starting a testnet. It's already really painful to do this with the JSON, though; I think having to build new protobuf tools would make this unusable short term without serious investment in improving that.

I think just getting the JSON codegen'd will be a bigger win.

@aaronc
Member

aaronc commented Apr 11, 2023

I think JSON codegen will be a huge effort, and this does seem like an optimization for an edge use case (when there's a big chain and an export is needed for testing). I think it would be easier to add an opt-in for binary genesis, specifically in the context of the direction we're going with cross-language modules (see #15410): from the module's perspective, I expect genesis will just be a stream of protobuf objects instead of raw JSON. If we can get this design right for ORM, collections, and Rust modules, it will allow binary or JSON genesis. It should be pretty straightforward to go back and forth between binary and JSON for editing purposes, too.

@elias-orijtech
Contributor

From @aaronc's comment it sounds like this issue is not worth the effort to implement. If so, shall we close it?

@ValarDragon
Contributor Author

ValarDragon commented May 18, 2023

idk, I think it is worth doing. I disagree on this one.

I feel like it should be clean to:

  • Make a new annotation on a proto struct to indicate "codegen JSON"
  • Have the protobuf translation copy this comment into the Go code
  • Do a second pass to scan for structs carrying this "JSON codegen this" marker
  • Run JSON codegen with light adaptations of a standard tool

And then we only apply this annotation to things that can decode cleanly and/or are perf bottlenecks. The perf bottlenecks are really x/distribution and IBC data, in what Osmosis does at least.

@aaronc
Member

aaronc commented May 18, 2023

Which code generator would do this? Honestly, it seems like a pretty big effort.

@robert-zaremba
Collaborator

I guess JSON is good for:

Blockchain genesis
Recovery from very bad chain bug / halt

Why? For a big genesis, we need tools anyway to modify it. Binary encoding is an order of magnitude faster for recovery/sharing. And as @aaronc mentioned, streaming is definitely the way to go to recover a node of a huge chain (I guess it will be a few times faster to do it with an export, rather than loading it all at once; sometimes loading it all at once probably won't even be possible for huge chains). So optimizing JSON for genesis purposes is a short-term solution.

@ValarDragon
Contributor Author

Should be doable with easyjson, plus a light modification for timestamps in that repo.

@tac0turtle
Member

We could use https://github.com/TheThingsIndustries/protoc-gen-go-json, as it is meant to work with gogoproto.
