Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: file picker widgets removing files causing abnormal content, resulting in text garbled characters of type restapiplugin MULTIPART-FORM-DATA #31422

Closed

Conversation

gzlinzihong
Copy link
Contributor

@gzlinzihong gzlinzihong commented Mar 1, 2024

Description

fix filepicker widgets selects multiple files and then removes one of them, causing the final content of all file.data to be abnormal

fix The MULTIPART_FORM_DATA type in the body of the restapi plugin, when passing text type parameters, including Chinese, Japanese, etc., is garbled.

PR fixes following issue(s)

Fixes # (issue number)

31411

Type of change

  • Bug fix

Testing

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Also list any relevant details for your test configuration.

  • JUnit
  • Cypress

Test Plan

Add Testsmith test cases links that relate to this PR

Issues raised during DP testing

Link issues raised during DP testing for better visiblity and tracking (copy link from comments dropped on this PR)

Checklist:

Dev activity

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings
  • New and existing unit tests pass locally with my changes

Summary by CodeRabbit

  • New Features

    • Enhanced the FilePicker widget to support better file management.
  • Bug Fixes

    • Improved handling of character encoding in form data to support UTF-8.
  • Tests

    • Added tests for new FilePicker functionality and data parsing improvements.

Copy link

welcome bot commented Mar 1, 2024

Welcome to the Appsmith community! Thank you for your first pull request and making this project better. 🤗 Please make sure that you raise a review request so your code change does not go unnoticed.

Copy link
Contributor

coderabbitai bot commented Mar 1, 2024

Walkthrough

Walkthrough

The updates include enhancing the FilePickerV2 widget in the client side to manage file selections more robustly, specifically allowing for the removal of a file without affecting the integrity of other files' data. On the server side, an improvement in handling multipart form data character encoding ensures text parameters, including those in languages like Chinese and Japanese, are correctly processed without resulting in garbled text. These changes collectively aim to improve the user experience and data handling reliability.

Changes

File Path Change Summary
.../cypress/e2e/Regression/ClientSide/Widgets/Filepicker/FilePickerV2_spec.js Added a test case for removing a file in FilePickerV2 widget.
.../src/widgets/FilePickerWidgetV2/widget/index.tsx Introduced fileMap object for storing selected files.
.../helpers/restApiUtils/helpers/DataUtils.java Changed encoding to UTF_8 in parseFormData method.
.../helpers/DataUtilsTest.java Added a test for parsing multipart text data with different character sets.

Related issues

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share

Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit-tests for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit tests for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai generate interesting stats about this repository and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit tests.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger a review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • The JSON schema for the configuration file is available here.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/coderabbit-overrides.v2.json

CodeRabbit Discord Community

Join our Discord Community to get help, request features, and share feedback.

@Nikhil-Nandagopal Nikhil-Nandagopal added the Task A simple Todo label Mar 1, 2024
@gzlinzihong gzlinzihong force-pushed the feature/fix-file-upload branch 5 times, most recently from 29af53e to 22517f7 Compare March 1, 2024 16:11
@Nikhil-Nandagopal
Copy link
Contributor

@gzlinzihong thanks for the PR! We'll get someone to review it

@rohan-arthur
Copy link
Contributor

/ok-to-test tags="@tag.Filepicker, @tag.Widget, @tag.Datasource"

Copy link

github-actions bot commented Mar 4, 2024

Tests running at: https://github.com/appsmithorg/appsmith/actions/runs/8138779483.
Workflow: Appsmith External Integration Test Workflow.
Tags: @tag.Filepicker, @tag.Widget, @tag.Datasource.

@rohan-arthur
Copy link
Contributor

/build-deploy-preview skip-test=true

Copy link

github-actions bot commented Mar 4, 2024

Deploying Your Preview: https://github.com/appsmithorg/appsmith/actions/runs/8138786096.
Workflow: On demand build Docker image and deploy preview.
skip-tests: . env: .
PR: 31422.
recreate: .

Copy link

github-actions bot commented Mar 4, 2024

Failed server tests

  • com.external.plugins.RestApiPluginTest#testGetApiWithBody
  • com.external.plugins.RestApiPluginTest#testGetApiWithoutBody
  • com.external.plugins.RestApiPluginTest#testHttpGetRequestRawBody

Copy link

github-actions bot commented Mar 4, 2024

Workflow run: https://github.com/appsmithorg/appsmith/actions/runs/8138779483.
Commit: ``.
Cypress dashboard url: Click here!
All cypress tests have passed 🎉🎉🎉

@sneha122 sneha122 changed the title Fix file picker widgets removing files causing abnormal content, resulting in text garbled characters of type restapiplugin MULTIPART-FORM-DATA fix: file picker widgets removing files causing abnormal content, resulting in text garbled characters of type restapiplugin MULTIPART-FORM-DATA Mar 4, 2024
@sneha122
Copy link
Contributor

sneha122 commented Mar 4, 2024

@Parthvi12 Can you please review the cypress tests?

@@ -190,7 +190,7 @@ public String parseFormData(List<Property> bodyFormData, Boolean encodeParamsTog
byte[] valueBytesArray = new byte[0];
if (StringUtils.hasLength(String.valueOf(property.getValue()))) {
valueBytesArray =
String.valueOf(property.getValue()).getBytes(StandardCharsets.ISO_8859_1);
String.valueOf(property.getValue()).getBytes(StandardCharsets.UTF_8);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nidhi-nair @sumitsum Do you guys see any problems in extending the character set for multipart form data text type? Looks ok to me. Please let me know if anything else needs to be changed in order to accommodate for UTF-8

Copy link
Contributor

@sumitsum sumitsum Mar 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @sneha122 the relvant change seems to have been added as part of this PR. From the PR it seems like adding ISO_8859_1 instead of UTF_8 could be an explicit decision. @nidhi-nair can you please check out the PR once and confirm if it is safe to replace with UTF_8 encoding ?
Also, in case we go ahead with the replacement, there are two other places in the same file where ISO_8859_1 encoding is used - we should check if they need replacement as well.
From my perspective, I think it should be safe to replace ISO_8859_1 with UTF_8. However, I would suggest waiting for @nidhi-nair 's confirmation on this.

Copy link
Contributor Author

@gzlinzihong gzlinzihong Mar 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @nidhi-nair Could you please help confirm this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey folks, apologies for the delay in response. That's a great catch, Sneha! Yes, the change is intentional. The two types of encoding differ from each other in that UTF-8 is a variable length encoding and ISO_8859_1 (Latin) is a fixed length encoding (of 8 bits). As a result UTF-8 has a different representation for special characters outside the first 256 code points. Latin encoding treats each byte as a separate character instead.

The reason our request parsing uses Latin encoding over UTF-8 is because the multipart form request that we use for action execution sends each character across in a single byte encoded format. I have lost a little bit of context on why that is the case, but it is likely because there is no special handling to let the client side library know otherwise. If you notice in L#470 at the request parsing stage, we have interpretted the incoming byte stream as Latin encoded, and created a String using that explicit type of encoding. As a result, when we are trying to convert the interpretted String into a bytw stream again inside DataUtils, we are using the same format, so we end up at the same kind of byte representation.

If we are making this change, we need to make sure that we use the same format of encoding across the board. Let me know if I can provide more details here. Perhaps what we need to do is to introduce the ability to even clarify the encoding for end users' endpoints.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, could I also request that we add Cypress tests to test for files as well as text bindings with non-ASCII character sets, so we know our solution here will be permanent, and no future change disregards the conclusion we get to at the end of this thread? Please make sure to cover Chinese, Japanese, and Arabic characters at a minimum.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey folks, apologies for the delay in response. That's a great catch, Sneha! Yes, the change is intentional. The two types of encoding differ from each other in that UTF-8 is a variable length encoding and ISO_8859_1 (Latin) is a fixed length encoding (of 8 bits). As a result UTF-8 has a different representation for special characters outside the first 256 code points. Latin encoding treats each byte as a separate character instead.

The reason our request parsing uses Latin encoding over UTF-8 is because the multipart form request that we use for action execution sends each character across in a single byte encoded format. I have lost a little bit of context on why that is the case, but it is likely because there is no special handling to let the client side library know otherwise. If you notice in L#470 at the request parsing stage, we have interpretted the incoming byte stream as Latin encoded, and created a String using that explicit type of encoding. As a result, when we are trying to convert the interpretted String into a bytw stream again inside DataUtils, we are using the same format, so we end up at the same kind of byte representation.

If we are making this change, we need to make sure that we use the same format of encoding across the board. Let me know if I can provide more details here. Perhaps what we need to do is to introduce the ability to even clarify the encoding for end users' endpoints.

Hi @nidhi-nair It is a text type rather than a blob type. You can check line L#453 where the bytes array has been converted to utf-8. However, when restapi is used later, the text type is converted to ISO_8859_1 again, resulting in garbled characters.

For specific reproduction steps, please check the chat between me and @sneha122 below this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition, I checked ActionApi.tsx. The body is of FormData type. When appending a string value to FormData, the binary converted by the browser should be utf-8 encoded by default.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, missed that piece of context in the conversation entirely, sorry about that! Sneha or @gzlinzihong can you please help to check if the line you pointed to can be moved to Latin encoding instead? This is to check if we can use a uniform type of encoding throughout. I see that there is actually some interspersing happening between these two encodings already at L#553, and it makes me wonder what kind of behaviour we would see if a user was expecting to embed the value from a file into their binding. Please feel free to invalidate the use case itself if that is what your findings show. This is a core part of our execution flow, which is where my paranoia is coming from. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @nidhi-nair Sorry, I don't understand how this is related to L#553 because the blob case is not actually executed.
I don't think changing the UTF-8 encoding to ISO_8859_1 in L#453 is a better way. Because this will introduce more problems, the test parameter in the example below is directly a constant, defined in actionConfig.
image
image

If the UTF-8 encoding is modified to ISO_8859_1 in L#453,although the form of the binding variable can be solved smoothly, the garbled problem will not be solved if the variable is a constant string, and because the encoding is modified before the plugin is executed, the impact The scope will design the execution of all plugins

@sneha122
Copy link
Contributor

sneha122 commented Mar 4, 2024

Hi @gzlinzihong Thanks a lot for your contribution!

There are a couple of linting failures on server side. Could you please execute following command in server directory?
mvn spotless:apply

On client side as well there is a linting failure in file cypress/e2e/Regression/ClientSide/Widgets/Filepicker/FilePickerV2_spec.js, can you please format this file using prettier formatter?

These should fix all of your linting errors and we can merge the PR as quickly as we can

@gzlinzihong
Copy link
Contributor Author

@sneha122 thank you for your feedback.

I have made the necessary amendments to the code on both the server side and the client side as per your suggestion. I've run the mvn spotless:apply command on the server side, and I've also formatted the cypress/e2e/Regression/ClientSide/Widgets/Filepicker/FilePickerV2_spec.js file with Prettier on the client side.

All the linting errors should now be corrected. Feel free to let me know if there is anything else that needs my attention.

Thank you again for your help and patience.

@gzlinzihong
Copy link
Contributor Author

Hi @sneha122 I'm sorry for the confusion, I've realized that I made a mistake in the naming of my branch. If necessary, we can close this pull request and I will rename the branch and create a new pull request. Let me know if you would like me to proceed in this manner

@sneha122
Copy link
Contributor

sneha122 commented Mar 5, 2024

@sneha122 thank you for your feedback.

I have made the necessary amendments to the code on both the server side and the client side as per your suggestion. I've run the mvn spotless:apply command on the server side, and I've also formatted the cypress/e2e/Regression/ClientSide/Widgets/Filepicker/FilePickerV2_spec.js file with Prettier on the client side.

All the linting errors should now be corrected. Feel free to let me know if there is anything else that needs my attention.

Thank you again for your help and patience.

Thanks a lot for these changes, the respective checks are running on the PR, once those are done we can check if we are good to go!!

@sneha122
Copy link
Contributor

sneha122 commented Mar 5, 2024

Hi @sneha122 I'm sorry for the confusion, I've realized that I made a mistake in the naming of my branch. If necessary, we can close this pull request and I will rename the branch and create a new pull request. Let me know if you would like me to proceed in this manner

Hi @gzlinzihong, No worries we can keep this PR request and merge this once all checks pass.

@sneha122
Copy link
Contributor

sneha122 commented Mar 5, 2024

/build-deploy-preview skip-tests=true

@sneha122
Copy link
Contributor

sneha122 commented Mar 5, 2024

/ok-to-test tags="@tag.Filepicker, @tag.Widget, @tag.Datasource"

Copy link

github-actions bot commented Mar 5, 2024

Deploying Your Preview: https://github.com/appsmithorg/appsmith/actions/runs/8151480457.
Workflow: On demand build Docker image and deploy preview.
skip-tests: true.
env: ``.
PR: 31422.
recreate: .

Copy link

github-actions bot commented Mar 5, 2024

Tests running at: https://github.com/appsmithorg/appsmith/actions/runs/8151484166.
Workflow: Appsmith External Integration Test Workflow.
Tags: @tag.Filepicker, @tag.Widget, @tag.Datasource.

Copy link

github-actions bot commented Mar 5, 2024

Deploy-Preview-URL: https://ce-31422.dp.appsmith.com

Copy link

github-actions bot commented Mar 5, 2024

Workflow run: https://github.com/appsmithorg/appsmith/actions/runs/8151484166.
Commit: ``.
Cypress dashboard: Click here!
The following are new failures, please fix them before merging the PR:

  1. cypress/e2e/Regression/ClientSide/Widgets/Tab/Tab_spec.js

To know the list of identified flaky tests - Refer here

Copy link

github-actions bot commented Mar 5, 2024

Workflow run: https://github.com/appsmithorg/appsmith/actions/runs/8151484166.
Commit: ``.
Cypress dashboard url: Click here!
All cypress tests have passed 🎉🎉🎉

@sneha122 sneha122 requested a review from sumitsum March 5, 2024 09:03
@sneha122
Copy link
Contributor

sneha122 commented Mar 5, 2024

Hi @gzlinzihong,

Could you please help me understand second part of the issue: The MULTIPART_FORM_DATA type in the body of the restapi plugin, when passing text type parameters, including Chinese, Japanese, etc., is garbled? Can you please give me exact steps to reproduce the issue so that it will be easier for me to test the deploy preview?

@gzlinzihong
Copy link
Contributor Author

gzlinzihong commented Mar 5, 2024

Hi @gzlinzihong,

Could you please help me understand second part of the issue: The MULTIPART_FORM_DATA type in the body of the restapi plugin, when passing text type parameters, including Chinese, Japanese, etc., is garbled? Can you please give me exact steps to reproduce the issue so that it will be easier for me to test the deploy preview?

Hi @sneha122 You can reproduce it by following these steps

  1. Create a restapi query, as shown in the figure below. One of the parameters uses text type and the value is Chinese or Japanese, etc.
    image
    image
  2. The test procedure for accepting requests is as follows
    image
  3. Print viewing request
    image
  4. Most applications should use UTF-8 encoding by default. If utf-8 encoding is used, the print result is as follows
    image

@sneha122
Copy link
Contributor

sneha122 commented Mar 5, 2024

@gzlinzihong Thanks a lot for the detailed steps, Tested the PR looks fixes look good to me!

@gzlinzihong
Copy link
Contributor Author

Hi @sneha122 I would like to ask if anyone else will review it in the future?

@sneha122
Copy link
Contributor

sneha122 commented Mar 7, 2024

Hi @gzlinzihong Let me get confirmation from Nidhi on the change of ISO to UTF-8, post that we are good to go ahead!

@sneha122 sneha122 added the Integrations Pod General Issues related to the Integrations Pod that don't fit into other tags. label Mar 8, 2024
@sneha122
Copy link
Contributor

Hi @gzlinzihong, As I can see in your PR, you have resolved following two issues:

  1. When using filepicker widgets to select multiple files and then remove one of them, all file.data contents are abnormal in the end.
  2. The MULTIPART_FORM_DATA type in the body of the restapi plugin, when passing text type parameters, including Chinese, Japanese, etc., is garbled.

First one is completely resolved with relevant cypress tests and I have tested the changes as well. As for second one, we are still discussing the change from ISO to UTF, While we are still discussing and coming to conclusion on that, it will be great to get at least first fix merged, and we cannot do that with this PR as it contains changes for second one as well. Is it possible for you to create a separate PRs for both issues? That way we can merge changes for first one very quickly and second one we can discuss and take it forward separately. Please let me know what you think about this

@gzlinzihong
Copy link
Contributor Author

Hi @gzlinzihong, As I can see in your PR, you have resolved following two issues:

  1. When using filepicker widgets to select multiple files and then remove one of them, all file.data contents are abnormal in the end.
  2. The MULTIPART_FORM_DATA type in the body of the restapi plugin, when passing text type parameters, including Chinese, Japanese, etc., is garbled.

First one is completely resolved with relevant cypress tests and I have tested the changes as well. As for second one, we are still discussing the change from ISO to UTF, While we are still discussing and coming to conclusion on that, it will be great to get at least first fix merged, and we cannot do that with this PR as it contains changes for second one as well. Is it possible for you to create a separate PRs for both issues? That way we can merge changes for first one very quickly and second one we can discuss and take it forward separately. Please let me know what you think about this

OK, I'll do it

@gzlinzihong
Copy link
Contributor Author

Hi @sneha122
I have split it into 2 pr.
thank you for your patience!
filePickerPR
garbled characters

@sneha122
Copy link
Contributor

Thanks a lot @gzlinzihong This makes it super easy for us to merge the fix which is good to go, now we can focus on utf-8 separately, I will check the file picker PR and run necessary checks on it.

@sneha122
Copy link
Contributor

Closing this PR as we have created two separate ones, out of which one is merged, other one is pending and can be tracked here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Integrations Pod General Issues related to the Integrations Pod that don't fit into other tags. Task A simple Todo
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants