[ISSUE #4701]Fix use tcp protocol client send message, it throw a DecoderException #4702

mxsm · 2024-01-01T03:20:55Z

Fixes #4701

Motivation

Explain the content here.
Explain why you want to make the changes and what problem you're trying to solve.

use tcp protocol client send message, it throw a DecoderException

Modifications

Describe the modifications you've done.

Refactored the Codec's code and fixed the current issue

Documentation

Does this pull request introduce a new feature? (yes / no)
If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)
If a feature is not applicable for documentation, explain why?
If a feature is not documented yet in this PR, please create a followup issue for adding the documentation

… a DecoderException

codecov · 2024-01-01T03:42:09Z

Codecov Report

Attention: 9 lines in your changes are missing coverage. Please review.

Comparison is base (bd2aeb9) 17.39% compared to head (32656bc) 17.46%.
Report is 12 commits behind head on master.

Files	Patch %	Lines
...che/eventmesh/common/protocol/tcp/codec/Codec.java	65.21%	6 Missing and 2 partials ⚠️
.../apache/eventmesh/client/tcp/common/TcpClient.java	0.00%	1 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##             master    #4702      +/-   ##
============================================
+ Coverage     17.39%   17.46%   +0.07%     
- Complexity     1759     1770      +11     
============================================
  Files           797      797              
  Lines         29850    29850              
  Branches       2579     2565      -14     
============================================
+ Hits           5192     5213      +21     
+ Misses        24177    24156      -21     
  Partials        481      481

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

pandaapo · 2024-01-01T03:58:25Z

May I ask if this bug is caused by this PR #4678? And what is the specific cause of this bug?

mxsm · 2024-01-01T03:59:44Z

May I ask if this bug is caused by this PR #4678? And what is the specific cause of this bug?

Yes

pandaapo · 2024-01-01T04:26:18Z

May I ask if this bug is caused by this PR #4678? And what is the specific cause of this bug?

Yes

And what is the specific reason of this bug?

mxsm · 2024-01-01T05:08:08Z

May I ask if this bug is caused by this PR #4678? And what is the specific cause of this bug?

Yes

And what is the specific reason of this bug?

When the TCP-sent data packet is not sufficient to construct a Package, it will result in an error. If using a custom approach, you need to handle the issues of packet fragmentation and incomplete packets. The previously merged TCP enhancement pull request did not address this, and there would be no issue if packet fragmentation and incomplete packets did not occur.

pandaapo · 2024-01-01T05:48:45Z

When the TCP-sent data packet is not sufficient to construct a Package, it will result in an error. If using a custom approach, you need to handle the issues of packet fragmentation and incomplete packets. The previously merged TCP enhancement pull request did not address this, and there would be no issue if packet fragmentation and incomplete packets did not occur.

Thanks for your explanation. So your PR essentially moves the logic of Codec.EnCoder and Codec.Decoder into Codec. This raises a new question for me: in the data packet processing for message stickiness and message truncation in the encoding and decoding logic of Codec, what is the difference between calling the internal class object and handling it directly?

谢谢您的说明。所以您的PR相当于将Codec.EnCoder和Codec.Decoder的逻辑移到了Codec中。这让我产生了新的疑问，在Codec编解码逻辑中，调用内部类对象处理数据粘包、半包问题的方式和直接处理的方式，有什么不同？

mxsm · 2024-01-01T12:15:01Z

When the TCP-sent data packet is not sufficient to construct a Package, it will result in an error. If using a custom approach, you need to handle the issues of packet fragmentation and incomplete packets. The previously merged TCP enhancement pull request did not address this, and there would be no issue if packet fragmentation and incomplete packets did not occur.

Thanks for your explanation. So your PR essentially moves the logic of Codec.EnCoder and Codec.Decoder into Codec. This raises a new question for me: in the data packet processing for message stickiness and message truncation in the encoding and decoding logic of Codec, what is the difference between calling the internal class object and handling it directly?

谢谢您的说明。所以您的PR相当于将Codec.EnCoder和Codec.Decoder的逻辑移到了Codec中。这让我产生了新的疑问，在Codec编解码逻辑中，调用内部类对象处理数据粘包、半包问题的方式和直接处理的方式，有什么不同？

The functional code of encoding has not changed; only the positions were replaced. The decoding code has been repaired to address the issues of packet fragmentation and incomplete packets. It's not just a simple code move. You can compare it with the previous code.

pandaapo · 2024-01-01T12:33:01Z

The functional code of encoding has not changed; only the positions were replaced. The decoding code has been repaired to address the issues of packet fragmentation and incomplete packets. It's not just a simple code move. You can compare it with the previous code.

So this bug is not actually introduced by PR #4678, but rather a bug that existed before, right?

pandaapo · 2024-01-01T12:44:33Z

eventmesh-common/src/main/java/org/apache/eventmesh/common/protocol/tcp/codec/Codec.java

+ if (in == null) {
+ return;
+ }
+ if (in.readableBytes() < CONSTANT_MAGIC_FLAG.length + VERSION.length + 4 + 4) {


Could these 2 magic numbers be changed to constants to indicate that they are the length of the package length value and the length of the header length value?

Could these magic numbers be changed to constants to indicate that they are the length of the package length value and the length of the header length value?

These magic numbers are actually part of the protocol. I recommend keeping them. You can think of them as a special marker for EventMesh's custom protocol, similar to the magic number in Java class files.

You can think of them as a special marker for EventMesh's custom protocol

Actually, I think CONSTANT_MAGIC_FLAG(= serializeBytes("EventMesh")) is also a special marker for EventMesh's custom protocol.

Could these 2 magic numbers be changed to constants to indicate that they are the length of the package length value and the length of the header length value?

Using constants can improve readability and maintainability. But in this case, the numbers are integral to the protocol's structure, not arbitrary choices. It is better to stick with protocol-defined markers, otherwise it could lead to ambiguity or unexpected behavior.

pandaapo · 2024-01-01T12:54:14Z

eventmesh-common/src/main/java/org/apache/eventmesh/common/protocol/tcp/codec/Codec.java

+ validateFlagAndVersion(flagBytes, versionBytes, ctx);
+ final int packageLength = in.readInt();
+ final int headerLength = in.readInt();
+ if (in.readableBytes() < packageLength - 13) {


Could you change 13 to a constant to indicate that it is the length of CONSTANT_MAGIC_FLAG plus the length of VERSION?

Could you change 13 to a constant to indicate that it is the length of CONSTANT_MAGIC_FLAG plus the length of VERSION?

Of course

mxsm · 2024-01-01T13:28:12Z

The functional code of encoding has not changed; only the positions were replaced. The decoding code has been repaired to address the issues of packet fragmentation and incomplete packets. It's not just a simple code move. You can compare it with the previous code.

So this bug is not actually introduced by PR #4678, but rather a bug that existed before, right?

Yes, Reverting to the previous version is without issues.

pandaapo · 2024-01-01T13:52:45Z

The functional code of encoding has not changed; only the positions were replaced. The decoding code has been repaired to address the issues of packet fragmentation and incomplete packets. It's not just a simple code move. You can compare it with the previous code.

So this bug is not actually introduced by PR #4678, but rather a bug that existed before, right?

Yes, Reverting to the previous version is without issues.

I'm confused. #4678 did not change the decoding logic, only changed the direct decoding logic to call the decoding logic with an internal class object. Why does #4678 cause this bug?

mxsm · 2024-01-01T14:20:09Z

The ReplayingDecoder you marked in the screenshot contains processing logic. After processing with ReplayingDecoder, it then calls the decode method as shown in the previous image. You might want to focus on the parent class of ReplayingDecoder, which is ByteToMessageDecoder, and take a look at the channelRead method. If you call it directly, you skip the steps of handling partial packets and packet concatenation. However, if your data length is just right, there's no issue. Problems may arise when the data length is not sufficient to construct a custom Package, or it can be parsed into multiple packages, or it is greater than one but cannot be built into an integer.
In summary, direct calling skips handling packet fragmentation and incomplete packets.

pandaapo · 2024-01-02T05:51:03Z

Thank you for your explanation. Since the version before PR #4678 not only ensure accuracy, but also make decoding logic clearer and simpler, would it be better to revert to the previous version without adding extra decoding complexity?

mxsm · 2024-01-02T06:27:24Z

Thank you for your explanation. Since the version before PR #4678 not only ensure accuracy, but also make decoding logic clearer and simpler, would it be better to revert to the previous version without adding extra decoding complexity?

It is also possible to roll back to the previous version. This is based on the current transformation. We can see the opinions of others in the community. The previous implementation was to implement decoding and encoding separately using two decoders and two encoders. Now, decoding and encoding are implemented through Codec, which is essentially no different from the previous implementation. It is just the coding method and implementation. The previous implementation was processed through Netty's own internal generic implementation. The current PR is handled by the developer's customization.

pandaapo · 2024-01-02T09:09:04Z

Thank you for your explanation. Since the version before PR #4678 not only ensure accuracy, but also make decoding logic clearer and simpler, would it be better to revert to the previous version without adding extra decoding complexity?

It is also possible to roll back to the previous version. This is based on the current transformation. We can see the opinions of others in the community. The previous implementation was to implement decoding and encoding separately using two decoders and two encoders. Now, decoding and encoding are implemented through Codec, which is essentially no different from the previous implementation. It is just the coding method and implementation. The previous implementation was processed through Netty's own internal generic implementation. The current PR is handled by the developer's customization.

Compared to the previous version before PR #4678, the logic in this PR still adds some complexity, such as embedding the following decoding logic on the basis of previous code, while the previous version was not affected by these details of EventMesh custom protocol.

...
if (in.readableBytes() < prefixLength + 4 + 4) {
  // Not enough data to read the package length and header length
  return;
}
...
if (in.readableBytes() < packageLength - prefixLength) {
  // Not enough data yet, reset the reader index and wait for more data
  in.resetReaderIndex();
  return;
}
...

I completely agree with what you said: we can see the opinions (modify like this or revert) of others in the community.

karsonto · 2024-01-05T03:05:26Z

Can we use Netty's built-in LengthFieldBasedFrameDecoder to solve the problem of sticky or half packet?

pandaapo · 2024-01-06T00:42:27Z

I think LengthFieldBasedFrameDecoder can not solve this problem. length = CONSTANT_MAGIC_FLAG.length + VERSION.length + headerLength + bodyLength; The actual size of the entire package is four bytes larger than the length. It does not conform to the decoding scenario of the LengthFieldBasedFrameDecoder.

@mxsm

It seems not to be the case. For example, one of the constructors is

LengthFieldBasedFrameDecoder(
  int maxFrameLength,
  int lengthFieldOffset,
  int lengthFieldLength,
  int lengthAdjustment,
  int initialBytesToStrip)

.
If you define it like this:

new LengthFieldBasedFrameDecoder(
  /* maximum packet length */
  max frame length, 
  /* the offset of "package length" field, i,e CONSTANT_MAGIC_FLAG.length + VERSION.length */
  13,
  /** 
   * the length of "package length" field, which should be 4
   */
  4,
  /** 
   * adjustment to the length field value, i.e. adding "'package length' field's length + 'Header length' field's length - 
   * CONSTANT_MAGIC_FLAG.length - VERSION.length - 'package length' field's length" to "package length" field value.
   * It is equivalent to padding the length field value to the actual length 
   * of the entire data packet and then subtracting the first 17 bytes from it.
   * Ultimately, this gives the length of the 'Header length' field, the 'Header' field, and the 'body' field
   */
  -9,
  /**
   * strip off the first 21 bytes of the decoded frame, i.e. "CONSTANT_MAGIC_FLAG.length + VERSION.length +
   *  'package length' field's length + 'Header length' field's length"
   */
  21)

, the resulting decoded ByteBuf should be the concatenation of Header and body.

好像并非如此。比如构造函数之一是

LengthFieldBasedFrameDecoder(
  int maxFrameLength,
  int lengthFieldOffset,
  int lengthFieldLength,
  int lengthAdjustment,
  int initialBytesToStrip)

，如果您这样定义

new LengthFieldBasedFrameDecoder(
  /* 数据包长度上限 */
  max frame length,
  /* 长度字段即“package length”的偏移量：CONSTANT_MAGIC_FLAG.length + VERSION.length */
  13,
  /**
   * 长度字段即“package length”的长度："package length" field's length
   */
  4, 
  /**
   * 对长度字段值的补长，即给长度字段值加上“'package length' field's length + 'Header length' field's length - 
   * CONSTANT_MAGIC_FLAG.length - VERSION.length - 'package length' field's length”，
   * 相当于将长度字段值补成了整个数据包的真正长度，然后减去前17个字节，
   * 最终得到的是“Header length”字段、“Header”字段和“body”字段的长度
   */
  -9, 
  /**
   * 剥离掉解码帧的最前面21个字节，即“CONSTANT_MAGIC_FLAG.length + VERSION.length + 
   * 'package length' field's length + 'Header length' field's length”
   */
  21)

，最终的解码出来的ByteBuf应该就是Header + body。

mxsm · 2024-01-06T12:36:30Z

Can we use Netty's built-in LengthFieldBasedFrameDecoder to solve the problem of sticky or half packet?

@karsonto I has resovled. please review the code

pandaapo · 2024-01-07T09:33:13Z

eventmesh-common/src/main/java/org/apache/eventmesh/common/protocol/tcp/codec/Codec.java

@@ -161,7 +160,9 @@ private Header parseHeader(ByteBuf in, int headerLength) throws JsonProcessingEx
 }
 final byte[] headerData = new byte[headerLength];
 in.readBytes(headerData);
- LogUtils.debug(log, "Decode headerJson={}", deserializeBytes(headerData));
+ if (log.isDebugEnabled()) {


Regarding these modifications to the log printing, you don't have to address them. PR #4698 is currently handling these issues uniformly.

关于日志打印的这些修改，您可以不用处理，PR #4698正在统一处理这些问题。

pandaapo · 2024-01-07T09:33:34Z

eventmesh-common/src/main/java/org/apache/eventmesh/common/protocol/tcp/codec/Codec.java

+ public static class Decoder extends LengthFieldBasedFrameDecoder {
+
+ public Decoder() {
+ super(FRAME_MAX_LENGTH, 13, 4, -9, 0);


My suggestion still pertains to the hardcoding of these numbers: 13, 4, -9.
To enhance code maintainability, I believe either defining two more constants like PACKAGE_LENGTH_FIELD_LENGTH (= 4) and HEADER_LENGTH_FIELD_LENGTH (= 4), and then entirely replacing these numbers with constants or variables, or using comments to explain these numbers would be beneficial. I prefer the first approach.

我的建议依然是关于数字的这种硬编码：13、4、-9。
为了让代码易于维护，我觉得要么再定义两个常量比如PACKAGE_LENGTH_FIELD_LENGTH (= 4)、HEADER_LENGTH_FIELD_LENGTH (= 4)，然后完全用常量或变量来代替这些数字，要么用注释来说明这些数字。我倾向于第一种方式。

karsonto · 2024-01-08T06:42:40Z

Can we use Netty's built-in LengthFieldBasedFrameDecoder to solve the problem of sticky or half packet?

@karsonto I has resovled. please review the code

LGTM

harshithasudhakar

LGTM

pandaapo · 2024-01-08T09:26:00Z

eventmesh-common/src/main/java/org/apache/eventmesh/common/protocol/tcp/codec/Codec.java

@@ -171,7 +171,7 @@ private Object parseBody(ByteBuf in, Header header, int bodyLength) throws JsonP
 }
 final byte[] bodyData = new byte[bodyLength];
 in.readBytes(bodyData);
- LogUtils.debug(log, "Decode bodyJson={}", deserializeBytes(bodyData));
+ LogUtils.debug(log, "Decode headerJson={}", deserializeBytes(bodyData));


It is bodyJson here instead of headerJson.

pandaapo · 2024-01-08T09:26:58Z

eventmesh-common/src/main/java/org/apache/eventmesh/common/protocol/tcp/codec/Codec.java

@@ -223,7 +223,9 @@ private static Object deserializeBody(String bodyJsonString, Header header) thro
 case REDIRECT_TO_CLIENT:
 return JsonUtils.parseObject(bodyJsonString, RedirectInfo.class);
 default:
- LogUtils.warn(log, "Invalidate TCP command: {}", command);
+ if (log.isWarnEnabled()) {
+ log.warn("Invalidate TCP command: {}", command);


The same situation as mentioned earlier: #4702 (comment)

和前面同样的情况：#4702 (comment)

pandaapo · 2024-01-08T09:27:39Z

eventmesh-common/src/main/java/org/apache/eventmesh/common/protocol/tcp/codec/Codec.java

+ public static class Decoder extends LengthFieldBasedFrameDecoder {
+
+ public Decoder() {
+ super(FRAME_MAX_LENGTH, PREFIX_LENGTH, Integer.BYTES, -9, 0);


I believe Integer.BYTES and -9 are still not clear enough for developers. Integer.BYTES: The lengths of version field, package length field, and header length field are all Integer.BYTES. -9: Hard-coded.

If you disagree with replacing them with expressions composed of more descriptive constant or variable names, as mentioned earlier, you can provide detailed comments to explain the usage of Integer.BYTES and -9 here.

我认为Integer.BYTES和-9对开发者来说依旧不够清晰。Integer.BYTES：Version字段, package length字段, header length字段的长度都是Integer.BYTES。-9：硬编码。
如果您不赞同将它们换成有助理解的常量名或变量名组成的表达式，正如我之前所说，您可以用详细的注释来说明这里的Integer.BYTES和-9。

@mxsm I'm curious to understand the thought processes behind choosing a hard-coded value over a dynamic value. I'm wondering about potential future flexibility. Have we considered scenarios where the header structure might evolve, and how a dynamic approach might handle those cases?

@mxsm I'm curious to understand the thought processes behind choosing a hard-coded value over a dynamic value. I'm wondering about potential future flexibility. Have we considered scenarios where the header structure might evolve, and how a dynamic approach might handle those cases?

@harshithasudhakar For a custom protocol, once it is determined, it will not undergo significant changes for a considerable period of time. This is because any changes to the protocol structure may introduce compatibility issues, potentially requiring existing users to perform a comprehensive upgrade to resolve them. In such cases, a major version upgrade, such as upgrading to version 2.x, may be necessary to undertake protocol modifications. Once the protocol is modified, it will be necessary to reprocess the encoding and decoding of the protocol. Therefore, I believe there is limited need for dynamic handling of these issues. Dynamically setting values increases the complexity of the code.

Got it, thanks for clarifying!

[ISSUE apache#4701]fix use tcp protocol client send message, it throw…

95ac2e0

… a DecoderException

pandaapo reviewed Jan 1, 2024

View reviewed changes

optmize code

7b36433

refactor with LengthFieldBasedFrameDecoder

bab4dd2

mxsm requested review from pandaapo, harshithasudhakar and xwm1992 January 6, 2024 12:36

fix code style

4e1dbae

pandaapo reviewed Jan 7, 2024

View reviewed changes

mxsm added 2 commits January 8, 2024 00:06

optimize code

413f44b

fix log print

fdca40a

mxsm requested a review from pandaapo January 7, 2024 16:09

karsonto approved these changes Jan 8, 2024

View reviewed changes

harshithasudhakar previously approved these changes Jan 8, 2024

View reviewed changes

pandaapo reviewed Jan 8, 2024

View reviewed changes

optimize code and add some comments

32656bc

mxsm dismissed harshithasudhakar’s stale review via 32656bc January 8, 2024 15:24

mxsm requested review from pandaapo and harshithasudhakar January 8, 2024 15:24

pandaapo approved these changes Jan 9, 2024

View reviewed changes

harshithasudhakar approved these changes Jan 9, 2024

View reviewed changes

mxsm merged commit 1f0bf0c into apache:master Jan 10, 2024
13 checks passed

mxsm deleted the bug-mxsm-4701 branch January 16, 2024 15:12

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[ISSUE #4701]Fix use tcp protocol client send message, it throw a DecoderException #4702

[ISSUE #4701]Fix use tcp protocol client send message, it throw a DecoderException #4702

mxsm commented Jan 1, 2024 •

edited

Loading

codecov bot commented Jan 1, 2024 •

edited

Loading

pandaapo commented Jan 1, 2024

mxsm commented Jan 1, 2024

pandaapo commented Jan 1, 2024 •

edited

Loading

mxsm commented Jan 1, 2024

pandaapo commented Jan 1, 2024 •

edited

Loading

mxsm commented Jan 1, 2024

pandaapo commented Jan 1, 2024

pandaapo Jan 1, 2024 •

edited

Loading

mxsm Jan 1, 2024

pandaapo Jan 1, 2024

harshithasudhakar Jan 5, 2024

pandaapo Jan 1, 2024

mxsm Jan 1, 2024

mxsm commented Jan 1, 2024

pandaapo commented Jan 1, 2024

mxsm commented Jan 1, 2024 •

edited

Loading

pandaapo commented Jan 2, 2024

mxsm commented Jan 2, 2024

pandaapo commented Jan 2, 2024

karsonto commented Jan 5, 2024

pandaapo commented Jan 6, 2024 •

edited

Loading

mxsm commented Jan 6, 2024

pandaapo Jan 7, 2024

pandaapo Jan 7, 2024

karsonto commented Jan 8, 2024

harshithasudhakar left a comment

pandaapo Jan 8, 2024

pandaapo Jan 8, 2024

pandaapo Jan 8, 2024

harshithasudhakar Jan 8, 2024

mxsm Jan 9, 2024

harshithasudhakar Jan 9, 2024

[ISSUE #4701]Fix use tcp protocol client send message, it throw a DecoderException #4702

[ISSUE #4701]Fix use tcp protocol client send message, it throw a DecoderException #4702

Conversation

mxsm commented Jan 1, 2024 • edited Loading

Motivation

Modifications

Documentation

codecov bot commented Jan 1, 2024 • edited Loading

Codecov Report

pandaapo commented Jan 1, 2024

mxsm commented Jan 1, 2024

pandaapo commented Jan 1, 2024 • edited Loading

mxsm commented Jan 1, 2024

pandaapo commented Jan 1, 2024 • edited Loading

mxsm commented Jan 1, 2024

pandaapo commented Jan 1, 2024

pandaapo Jan 1, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mxsm commented Jan 1, 2024

pandaapo commented Jan 1, 2024

mxsm commented Jan 1, 2024 • edited Loading

pandaapo commented Jan 2, 2024

mxsm commented Jan 2, 2024

pandaapo commented Jan 2, 2024

karsonto commented Jan 5, 2024

pandaapo commented Jan 6, 2024 • edited Loading

mxsm commented Jan 6, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

karsonto commented Jan 8, 2024

harshithasudhakar left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mxsm commented Jan 1, 2024 •

edited

Loading

codecov bot commented Jan 1, 2024 •

edited

Loading

pandaapo commented Jan 1, 2024 •

edited

Loading

pandaapo commented Jan 1, 2024 •

edited

Loading

pandaapo Jan 1, 2024 •

edited

Loading

mxsm commented Jan 1, 2024 •

edited

Loading

pandaapo commented Jan 6, 2024 •

edited

Loading