-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: file picker widgets removing files causing abnormal content, resulting in text garbled characters of type restapiplugin MULTIPART-FORM-DATA #31422
Conversation
Welcome to the Appsmith community! Thank you for your first pull request and making this project better. 🤗 Please make sure that you raise a review request so your code change does not go unnoticed. |
WalkthroughWalkthroughThe updates include enhancing the FilePickerV2 widget in the client side to manage file selections more robustly, specifically allowing for the removal of a file without affecting the integrity of other files' data. On the server side, an improvement in handling multipart form data character encoding ensures text parameters, including those in languages like Chinese and Japanese, are correctly processed without resulting in garbled text. These changes collectively aim to improve the user experience and data handling reliability. Changes
Related issues
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configration File (
|
29af53e
to
22517f7
Compare
22517f7
to
eb102fd
Compare
@gzlinzihong thanks for the PR! We'll get someone to review it |
/ok-to-test tags="@tag.Filepicker, @tag.Widget, @tag.Datasource" |
Tests running at: https://github.com/appsmithorg/appsmith/actions/runs/8138779483. |
/build-deploy-preview skip-test=true |
Deploying Your Preview: https://github.com/appsmithorg/appsmith/actions/runs/8138786096. |
Failed server tests
|
Workflow run: https://github.com/appsmithorg/appsmith/actions/runs/8138779483. |
@Parthvi12 Can you please review the cypress tests? |
@@ -190,7 +190,7 @@ public String parseFormData(List<Property> bodyFormData, Boolean encodeParamsTog | |||
byte[] valueBytesArray = new byte[0]; | |||
if (StringUtils.hasLength(String.valueOf(property.getValue()))) { | |||
valueBytesArray = | |||
String.valueOf(property.getValue()).getBytes(StandardCharsets.ISO_8859_1); | |||
String.valueOf(property.getValue()).getBytes(StandardCharsets.UTF_8); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nidhi-nair @sumitsum Do you guys see any problems in extending the character set for multipart form data text type? Looks ok to me. Please let me know if anything else needs to be changed in order to accommodate for UTF-8
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @sneha122 the relvant change seems to have been added as part of this PR. From the PR it seems like adding ISO_8859_1 instead of UTF_8 could be an explicit decision. @nidhi-nair can you please check out the PR once and confirm if it is safe to replace with UTF_8 encoding ?
Also, in case we go ahead with the replacement, there are two other places in the same file where ISO_8859_1 encoding is used - we should check if they need replacement as well.
From my perspective, I think it should be safe to replace ISO_8859_1 with UTF_8. However, I would suggest waiting for @nidhi-nair 's confirmation on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @nidhi-nair Could you please help confirm this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey folks, apologies for the delay in response. That's a great catch, Sneha! Yes, the change is intentional. The two types of encoding differ from each other in that UTF-8 is a variable length encoding and ISO_8859_1 (Latin) is a fixed length encoding (of 8 bits). As a result UTF-8 has a different representation for special characters outside the first 256 code points. Latin encoding treats each byte as a separate character instead.
The reason our request parsing uses Latin encoding over UTF-8 is because the multipart form request that we use for action execution sends each character across in a single byte encoded format. I have lost a little bit of context on why that is the case, but it is likely because there is no special handling to let the client side library know otherwise. If you notice in L#470 at the request parsing stage, we have interpretted the incoming byte stream as Latin encoded, and created a String using that explicit type of encoding. As a result, when we are trying to convert the interpretted String into a bytw stream again inside DataUtils
, we are using the same format, so we end up at the same kind of byte representation.
If we are making this change, we need to make sure that we use the same format of encoding across the board. Let me know if I can provide more details here. Perhaps what we need to do is to introduce the ability to even clarify the encoding for end users' endpoints.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additionally, could I also request that we add Cypress tests to test for files as well as text bindings with non-ASCII character sets, so we know our solution here will be permanent, and no future change disregards the conclusion we get to at the end of this thread? Please make sure to cover Chinese, Japanese, and Arabic characters at a minimum.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey folks, apologies for the delay in response. That's a great catch, Sneha! Yes, the change is intentional. The two types of encoding differ from each other in that UTF-8 is a variable length encoding and ISO_8859_1 (Latin) is a fixed length encoding (of 8 bits). As a result UTF-8 has a different representation for special characters outside the first 256 code points. Latin encoding treats each byte as a separate character instead.
The reason our request parsing uses Latin encoding over UTF-8 is because the multipart form request that we use for action execution sends each character across in a single byte encoded format. I have lost a little bit of context on why that is the case, but it is likely because there is no special handling to let the client side library know otherwise. If you notice in L#470 at the request parsing stage, we have interpretted the incoming byte stream as Latin encoded, and created a String using that explicit type of encoding. As a result, when we are trying to convert the interpretted String into a bytw stream again inside
DataUtils
, we are using the same format, so we end up at the same kind of byte representation.If we are making this change, we need to make sure that we use the same format of encoding across the board. Let me know if I can provide more details here. Perhaps what we need to do is to introduce the ability to even clarify the encoding for end users' endpoints.
Hi @nidhi-nair It is a text type rather than a blob type. You can check line L#453 where the bytes array has been converted to utf-8. However, when restapi is used later, the text type is converted to ISO_8859_1 again, resulting in garbled characters.
For specific reproduction steps, please check the chat between me and @sneha122 below this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition, I checked ActionApi.tsx. The body is of FormData type. When appending a string value to FormData, the binary converted by the browser should be utf-8 encoded by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops, missed that piece of context in the conversation entirely, sorry about that! Sneha or @gzlinzihong can you please help to check if the line you pointed to can be moved to Latin encoding instead? This is to check if we can use a uniform type of encoding throughout. I see that there is actually some interspersing happening between these two encodings already at L#553, and it makes me wonder what kind of behaviour we would see if a user was expecting to embed the value from a file into their binding. Please feel free to invalidate the use case itself if that is what your findings show. This is a core part of our execution flow, which is where my paranoia is coming from. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @nidhi-nair Sorry, I don't understand how this is related to L#553 because the blob case is not actually executed.
I don't think changing the UTF-8 encoding to ISO_8859_1 in L#453 is a better way. Because this will introduce more problems, the test parameter in the example below is directly a constant, defined in actionConfig.
If the UTF-8 encoding is modified to ISO_8859_1 in L#453,although the form of the binding variable can be solved smoothly, the garbled problem will not be solved if the variable is a constant string, and because the encoding is modified before the plugin is executed, the impact The scope will design the execution of all plugins
Hi @gzlinzihong Thanks a lot for your contribution! There are a couple of linting failures on server side. Could you please execute following command in server directory? On client side as well there is a linting failure in file These should fix all of your linting errors and we can merge the PR as quickly as we can |
@sneha122 thank you for your feedback. I have made the necessary amendments to the code on both the server side and the client side as per your suggestion. I've run the All the linting errors should now be corrected. Feel free to let me know if there is anything else that needs my attention. Thank you again for your help and patience. |
Hi @sneha122 I'm sorry for the confusion, I've realized that I made a mistake in the naming of my branch. If necessary, we can close this pull request and I will rename the branch and create a new pull request. Let me know if you would like me to proceed in this manner |
Thanks a lot for these changes, the respective checks are running on the PR, once those are done we can check if we are good to go!! |
Hi @gzlinzihong, No worries we can keep this PR request and merge this once all checks pass. |
/build-deploy-preview skip-tests=true |
/ok-to-test tags="@tag.Filepicker, @tag.Widget, @tag.Datasource" |
Deploying Your Preview: https://github.com/appsmithorg/appsmith/actions/runs/8151480457. |
Tests running at: https://github.com/appsmithorg/appsmith/actions/runs/8151484166. |
Deploy-Preview-URL: https://ce-31422.dp.appsmith.com |
Workflow run: https://github.com/appsmithorg/appsmith/actions/runs/8151484166.
To know the list of identified flaky tests - Refer here |
Workflow run: https://github.com/appsmithorg/appsmith/actions/runs/8151484166. |
Hi @gzlinzihong, Could you please help me understand second part of the issue: |
app/client/cypress/e2e/Regression/ClientSide/Widgets/Filepicker/FilePickerV2_spec.js
Outdated
Show resolved
Hide resolved
app/client/cypress/e2e/Regression/ClientSide/Widgets/Filepicker/FilePickerV2_spec.js
Outdated
Show resolved
Hide resolved
Hi @sneha122 You can reproduce it by following these steps
|
@gzlinzihong Thanks a lot for the detailed steps, Tested the PR looks fixes look good to me! |
Hi @sneha122 I would like to ask if anyone else will review it in the future? |
Hi @gzlinzihong Let me get confirmation from Nidhi on the change of ISO to UTF-8, post that we are good to go ahead! |
Hi @gzlinzihong, As I can see in your PR, you have resolved following two issues:
First one is completely resolved with relevant cypress tests and I have tested the changes as well. As for second one, we are still discussing the change from ISO to UTF, While we are still discussing and coming to conclusion on that, it will be great to get at least first fix merged, and we cannot do that with this PR as it contains changes for second one as well. Is it possible for you to create a separate PRs for both issues? That way we can merge changes for first one very quickly and second one we can discuss and take it forward separately. Please let me know what you think about this |
OK, I'll do it |
Hi @sneha122 |
Thanks a lot @gzlinzihong This makes it super easy for us to merge the fix which is good to go, now we can focus on utf-8 separately, I will check the file picker PR and run necessary checks on it. |
Description
PR fixes following issue(s)
Fixes # (issue number)
Type of change
Testing
How Has This Been Tested?
Test Plan
Issues raised during DP testing
Checklist:
Dev activity
Summary by CodeRabbit
New Features
Bug Fixes
Tests