[Data] Fix doc for read_json #41240

Merged
merged 4 commits into from
Nov 22, 2023

Conversation

@c21 (Contributor) commented Nov 17, 2023

Why are these changes needed?

To keep the doc test from being flaky: num_blocks can differ across test infrastructure.

Related issue number

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Cheng Su <[email protected]>
@@ -1044,7 +1044,11 @@ def read_json(
 ... "s3://anonymous@ray-example-data/log.json",
 ... read_options=pajson.ReadOptions(block_size=block_size)
 ... )
-Dataset(num_blocks=8, num_rows=1, schema={timestamp: timestamp[s], size: int64})
+Dataset(
+num_blocks=...,
Collaborator

This does not work: the output is sometimes on one line and sometimes on multiple lines.

Rather than printing the entire Dataset, maybe just print num_rows and the schema, and assert on num_blocks?

Or is there some way to force num_blocks to be a constant that does not depend on the environment?
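
The one-line versus multi-line problem described in this comment can be reproduced with the standard library alone. The following is a minimal sketch, not the PR's code: the two repr strings are assumptions made up for illustration (the exact multi-line layout Ray produces is not shown in this thread), and `doctest.OutputChecker` is used directly to compare them against an expected output with `num_blocks=...`.

```python
import doctest

# Assumed reprs for illustration only; the real multi-line layout may differ.
SCHEMA = "schema={timestamp: timestamp[s], size: int64}"
one_line = f"Dataset(num_blocks=8, num_rows=1, {SCHEMA})\n"
multi_line = f"Dataset(\n   num_blocks=8,\n   num_rows=1,\n   {SCHEMA}\n)\n"

# Expected output as it might be written in the docstring, with num_blocks elided.
want = f"Dataset(\n   num_blocks=...,\n   num_rows=1,\n   {SCHEMA}\n)\n"

checker = doctest.OutputChecker()
flags = doctest.ELLIPSIS | doctest.NORMALIZE_WHITESPACE

# The ellipsis absorbs the varying block count when the layout matches.
matches_multi = checker.check_output(want, multi_line, flags)
# The one-line repr has no whitespace after "Dataset(", so it cannot match.
matches_one = checker.check_output(want, one_line, flags)

print(matches_multi, matches_one)
```

Under these assumptions the multi-line repr matches and the one-line repr does not, even with whitespace normalization enabled, because normalization collapses existing whitespace but never inserts it where none exists.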

Contributor Author

Yeah, noticed that; will skip the doc test in this case. In the future, we will always print Dataset across multiple lines.
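
One standard way to skip a single doc test example is the `# doctest: +SKIP` directive. Whether this PR used that directive is not shown in the thread, so the sketch below is only an illustration of the mechanism. The docstring text and the name `read_json_example` are hypothetical; the skipped example deliberately has a mismatching expected output to show that it is never executed.

```python
import doctest

# Hypothetical docstring: the first example is skipped, the second one runs.
docstring = '''
>>> print("Dataset(num_blocks=8, num_rows=1)")  # doctest: +SKIP
Dataset(num_blocks=3, num_rows=1)
>>> print("num_rows=1")
num_rows=1
'''

# Parse the text into a DocTest and run it without touching any module.
parser = doctest.DocTestParser()
test = parser.get_doctest(docstring, {}, "read_json_example", "<sketch>", 0)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)

# No failure is reported even though the skipped example's output would not match.
print(results.failed)
```

Skipping keeps the example visible in the rendered docs while removing it from the test run, at the cost of no longer verifying that the example still works.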

Collaborator

@aslonnie aslonnie left a comment

can we get this reviewed by owners and merged?

Contributor Author

c21 commented Nov 21, 2023

Yeah let's get approval and merge tomorrow.

python/ray/data/read_api.py (outdated, resolved)
python/ray/data/read_api.py (resolved)
Co-authored-by: Balaji Veeramani <[email protected]>
Signed-off-by: Cheng Su <[email protected]>
@c21 c21 merged commit 8dea520 into ray-project:master Nov 22, 2023
11 of 16 checks passed
@c21 c21 deleted the fix-json-doc branch November 22, 2023 22:32
ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this pull request Nov 29, 2023
To keep the doc test from being flaky: num_blocks can differ across test infrastructure.

Signed-off-by: Cheng Su <[email protected]>
Co-authored-by: Balaji Veeramani <[email protected]>
4 participants