-
Notifications
You must be signed in to change notification settings - Fork 901
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support google.api.HttpBody in gRPC/JSON transcoding #5400
Support google.api.HttpBody in gRPC/JSON transcoding #5400
Conversation
content, responseFuture, | ||
generateResponseBodyConverter(spec), mediaType); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here it is trickier, we should pass the media type dynamically before responding. It is going to involve some minor changes to the deframe logic. (Such as determining the content type dynamically after response is generated and falling back if response is not generated).
message ArbitraryHttpWrappedRequest { | ||
string request_id = 1; | ||
google.api.HttpBody body = 2; | ||
} | ||
|
||
message ArbitraryHttpWrappedResponse { | ||
string response_id = 1; | ||
google.api.HttpBody body = 2; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not 100% sure how transcoder should behave when http body is a sub field. There is also extensions
field which I did not understand either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fine until we hear more feedback on this feature from early adoptors. It won't harm at all anyway if a user is not using this feature, so I'd be happy to merge it even if sub-field and extensions cases are unsupported in this PR. 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, let me come back to this PR. I think I have left some gaps therefore this PR was forgotten in the Draft stage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know if there is anyone waiting for this feature to be released? That would be great to get some feedback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
None that I'm aware of, but I know the colleagues of @hyangtack use HTTP/JSON transcoding, so they might be interested. If we release this and mention it in our release note, someone might start tinkering with it though! 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have updated the code, PTAL at the test cases and LMK what other tests we can run. For now those two seem like fine to me.
grpc/src/main/java/com/linecorp/armeria/server/grpc/HttpJsonTranscodingService.java
Outdated
Show resolved
Hide resolved
|
||
// Media type can only be nullif the request doesn't have a content type. | ||
if (mediaType == null) { | ||
throw new IllegalArgumentException("Unsupported content-type: " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe should we fall back to octet-stream?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First time hearing about it, mozilla docs also point to the same direction, https://arc.net/l/quote/dteigvyk
Generic binary data (or binary data whose true type is unknown) is application/octet-stream.
I think it makes sense 👍
|
||
try { | ||
body.put("content_type", request.contentUtf8()); | ||
body.put("data", request.content().array()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity, how is a byte array serialized into JSON? How do we deal with a non-UTF8 data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me write some tests, in protobuf spec it should be base64 array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking back to this, I have realized this is still WIP. I'll come back.
I'm still trying to comprehend how transcoding works internally 🙂.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Please take your time. I appreciate that you continue working on this 💟
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have tested it out and turns out it uses base64. So in that case whether it is UTF8 or not should not matter, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So rest of the code should be self-explanatory. We will see.
message ArbitraryHttpWrappedRequest { | ||
string request_id = 1; | ||
google.api.HttpBody body = 2; | ||
} | ||
|
||
message ArbitraryHttpWrappedResponse { | ||
string response_id = 1; | ||
google.api.HttpBody body = 2; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fine until we hear more feedback on this feature from early adoptors. It won't harm at all anyway if a user is not using this feature, so I'd be happy to merge it even if sub-field and extensions cases are unsupported in this PR. 😄
259a7b4
to
f213f19
Compare
// Default behavior is to return JSON_UTF_8, the default content type for JSON transcoding. | ||
return MediaType.JSON_UTF_8; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@trustin I think this makes more sense for transcoding perspective. By default responses are json? WDY think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about the others, but my preference is to just set the content type as-is transparently.
So..
- if
non-null
, then setcontent-type
transparently - if
null
, we could just not set thecontent-type
. Alternatively, I'm fine with setting a default oftext/plain
which is the default in armeria.
From my understanding, content-type
isn't really mandatory depending on whether the body HttpBody
exists. Also, upstream seems to be doing something similar ref: https://github.com/envoyproxy/envoy/blob/94037231eea7150a20b4a07b8c981ae89d1a0a60/source/extensions/filters/http/grpc_json_transcoder/json_transcoder_filter.cc#L441
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does setting transparently
mean? Are you implying we shouldn't add any header at all if it is unset? Kinda makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does setting transparently mean?
For a little more background, MediaType
is not really a comprehensive list of all content-type
s out there.
We pretty much copy over guava
's often-used list of content-type
s manually whenever the following test fails. (we also add custom media types when necessary)
public void wellDefinedUpstreamMediaTypes() { |
Given that users using HttpBody
will be sending over arbitrary data, I wasn't sure if it's worth validating the content-type
.
e.g.
final JsonNode contentType = jsonNode.get("contentType");
if (contentType != null && contentType.isTextual()) {
builder.set(HttpHeaderNames.CONTENT_TYPE, contentType.textValue());
} else {
builder.remove(HttpHeaderNames.CONTENT_TYPE);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, it doesn't have to be MediaType
. I.e. application/scim+json
wasn't an option when I checked it last time.
if (charset != null) { | ||
httpBodyBytes = Base64.getDecoder().decode(httpBody); | ||
} else { | ||
httpBodyBytes = Base64.getDecoder().decode(httpBody); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Response is Base64 because protobuf representation of body is byte[]
and jackson serializes it as a base64 string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume performance is not our first concern here so it seems fine 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly looks good 👍 Left some minor comments
} | ||
|
||
body.put("content_type", contentType.toString()); | ||
body.put("data", content.array()); // Base64 encoded binary data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Double checked that we want to use base64 👍
ref: https://protobuf.dev/programming-guides/proto3/#json
@@ -599,6 +675,25 @@ private HttpResponse serve0(ServiceRequestContext ctx, HttpRequest req, | |||
return HttpResponse.of(responseFuture); | |||
} | |||
|
|||
private HttpData convertToArbitraryHttpData(AggregatedHttpRequest request) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
optional; naming and static
private HttpData convertToArbitraryHttpData(AggregatedHttpRequest request) throws IOException { | |
private static HttpData convertToHttpBody(AggregatedHttpRequest request) throws IOException { |
final byte[] httpBodyBytes; | ||
if (charset != null) { | ||
httpBodyBytes = Base64.getDecoder().decode(httpBody); | ||
} else { | ||
httpBodyBytes = Base64.getDecoder().decode(httpBody); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question) Is there a point in checking the charset? I think it's fine to delegate to the user responsibility for setting the arbitrary data as well as setting the correct charset. (also, the if statement isn't doing anything)
final byte[] httpBodyBytes; | |
if (charset != null) { | |
httpBodyBytes = Base64.getDecoder().decode(httpBody); | |
} else { | |
httpBodyBytes = Base64.getDecoder().decode(httpBody); | |
} | |
final byte[] httpBodyBytes = Base64.getDecoder().decode(httpBody); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I must have deleted something. let me check this.
// Default behavior is to return JSON_UTF_8, the default content type for JSON transcoding. | ||
return MediaType.JSON_UTF_8; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about the others, but my preference is to just set the content type as-is transparently.
So..
- if
non-null
, then setcontent-type
transparently - if
null
, we could just not set thecontent-type
. Alternatively, I'm fine with setting a default oftext/plain
which is the default in armeria.
From my understanding, content-type
isn't really mandatory depending on whether the body HttpBody
exists. Also, upstream seems to be doing something similar ref: https://github.com/envoyproxy/envoy/blob/94037231eea7150a20b4a07b8c981ae89d1a0a60/source/extensions/filters/http/grpc_json_transcoder/json_transcoder_filter.cc#L441
try (HttpData data = httpData) { | ||
final byte[] array = data.array(); | ||
try { | ||
final JsonNode jsonNode = mapper.readValue(array, JsonNode.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question) Is it possible to refactor so that only one deserialization is done? We seem to be doing this at generateResponseBodyConverter
as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a race condition, response body converter first deserializes the JSON, finds the body
parameter and converts the body to the inner body parameter. So we lose content-type data from the original body.
A way to solve this would be returning body + media_type bundled together by updating the generateResponseBodyConverter
method to be generateResponseBodyAndMediaTypeConverter
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not entirely sure if it is worth the effort the abstract this out. I feel like readability of the code dropped quite a bit when I have done this. I can try introducing a method that takes HttpResponse
and emits another HttpResponse
instead of the two converters. This method can transform headers & body on the go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have used AggregatedHttpResponse
because it seems like the response was already aggregated right before we passed it to the converter. LMK if anything looks odd.
Oops, approved by mistake 😅 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look pretty much good to me 👍 Left some final comments 🙇
grpc/src/main/java/com/linecorp/armeria/server/grpc/HttpJsonTranscodingService.java
Outdated
Show resolved
Hide resolved
grpc/src/main/java/com/linecorp/armeria/server/grpc/AbstractUnframedGrpcService.java
Outdated
Show resolved
Hide resolved
// Default behavior is to return JSON_UTF_8, the default content type for JSON transcoding. | ||
return MediaType.JSON_UTF_8; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does setting transparently mean?
For a little more background, MediaType
is not really a comprehensive list of all content-type
s out there.
We pretty much copy over guava
's often-used list of content-type
s manually whenever the following test fails. (we also add custom media types when necessary)
public void wellDefinedUpstreamMediaTypes() { |
Given that users using HttpBody
will be sending over arbitrary data, I wasn't sure if it's worth validating the content-type
.
e.g.
final JsonNode contentType = jsonNode.get("contentType");
if (contentType != null && contentType.isTextual()) {
builder.set(HttpHeaderNames.CONTENT_TYPE, contentType.textValue());
} else {
builder.remove(HttpHeaderNames.CONTENT_TYPE);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! Thanks! 👍 👍 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work, @Dogacel! 🙇
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good 👍 Left some final comments 🙇
if (responseBody == null) { | ||
return null; | ||
} else { | ||
return httpData -> { | ||
try (HttpData data = httpData) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question) Can we revive the old behavior to close the httpData
in case the data is pooled?
if (jsonNode == null) { | ||
return httpResponse; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also close the httpResponse.content()
once we're sure we will be using google.api.httpBody
? (Either via try
blocks, or by using PooledObjects.close(httpResponse.content())
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I got it, I am not entirely sure how that try(...)
works internally or what pooled objects are TBH. Please LMK if it looks right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not entirely sure how that try(...) works internally or what pooled objects are TBH.
Once a gRPC message is written, it is converted to bytes via MessageMarshaller
armeria/grpc/src/main/java/com/linecorp/armeria/internal/server/grpc/AbstractServerCall.java
Line 547 in ebb0888
return responseFramer.writePayload(marshaller.serializeResponse(message)); |
Here the channel's ByteBufAllocator
is used to allocate bytes so that no additional copies are necessary when writing over the wire.
armeria/grpc/src/main/java/com/linecorp/armeria/internal/common/grpc/GrpcMessageMarshaller.java
Line 145 in ebb0888
public ByteBuf serializeResponse(O message) throws IOException { |
The response is aggregated with the pooled
option, meaning don't copy the bytes
armeria/grpc/src/main/java/com/linecorp/armeria/server/grpc/AbstractUnframedGrpcService.java
Line 172 in ebb0888
grpcResponse.aggregate(AggregationOptions.usePooledObjects(ctx.alloc(), ctx.eventLoop())) |
Hence, if we want to use the httpResponse
as-is in the early return, we shouldn't close the httpResponse.content()
.
Can you remove the try block here and modify like the following?
e.g.
return httpResponse -> {
final HttpData data = httpResponse.content();
final JsonNode jsonNode = extractHttpBody(data);
// Failed to parse the JSON body, return the original response.
if (jsonNode == null) {
return httpResponse;
}
PooledObjects.close(data);
// The data field is base64 encoded.
// https://protobuf.dev/programming-guides/proto3/#json
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I guess I kinda understand. What happens if we don't close the pool? I suspect the memory has to be claimed by the garbage collector which is inefficient? Or maybe there is some reference to this byte buffer left somewhere internally thus it might lead to a memory leak? (Seems unlikely)
Asking just for learning purposes 👼
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is more netty territory, but my understanding is that since direct memory is not cheap to allocate/deallocate netty maintains a pool of java ByteBuffer
s (may be direct or heap based)
If we don't call release
on the ByteBuf
, the ByteBuf
object itself will be garbage collected but the memory pool will continue to have a strong reference to the underlyingByteBuffer
- hence a memory leak. The others may shed more light on this subject though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only two nits 👍
if (jsonNode == null) { | ||
return httpResponse; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not entirely sure how that try(...) works internally or what pooled objects are TBH.
Once a gRPC message is written, it is converted to bytes via MessageMarshaller
armeria/grpc/src/main/java/com/linecorp/armeria/internal/server/grpc/AbstractServerCall.java
Line 547 in ebb0888
return responseFramer.writePayload(marshaller.serializeResponse(message)); |
Here the channel's ByteBufAllocator
is used to allocate bytes so that no additional copies are necessary when writing over the wire.
armeria/grpc/src/main/java/com/linecorp/armeria/internal/common/grpc/GrpcMessageMarshaller.java
Line 145 in ebb0888
public ByteBuf serializeResponse(O message) throws IOException { |
The response is aggregated with the pooled
option, meaning don't copy the bytes
armeria/grpc/src/main/java/com/linecorp/armeria/server/grpc/AbstractUnframedGrpcService.java
Line 172 in ebb0888
grpcResponse.aggregate(AggregationOptions.usePooledObjects(ctx.alloc(), ctx.eventLoop())) |
Hence, if we want to use the httpResponse
as-is in the early return, we shouldn't close the httpResponse.content()
.
Can you remove the try block here and modify like the following?
e.g.
return httpResponse -> {
final HttpData data = httpResponse.content();
final JsonNode jsonNode = extractHttpBody(data);
// Failed to parse the JSON body, return the original response.
if (jsonNode == null) {
return httpResponse;
}
PooledObjects.close(data);
// The data field is base64 encoded.
// https://protobuf.dev/programming-guides/proto3/#json
grpc/src/main/java/com/linecorp/armeria/server/grpc/HttpJsonTranscodingService.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work, @Dogacel! 👍👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 👍 👍
Motivation:
#5311
Trying to make this work.
Modifications:
google.api.HttpBody
.Result:
Need to do some testing, I couldn't fully understand how to do this. The way I am doing it is a bit tricky bit I guess it should work