[Dataset] Improve str/repr of Dataset to include execution plan #31604
Signed-off-by: Cheng Su [email protected]
Why are these changes needed?
This is a follow-up to #31286. We want to improve `Dataset.__repr__()` to provide more useful information to users, given that lazy execution is the default behavior.

The change is to include the execution plan (stages as a tree) in `Dataset.__repr__()`. Currently, only the stage name is printed for each stage. We will add more information per stage/operator in the future, which is orthogonal to this PR; this PR only prints the information we already have.

Example:
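As a rough illustration of the idea only (this is not the output or the implementation from the PR), the sketch below uses hypothetical `ToyPlan` and `ToyDataset` classes to show how a `__repr__` can delegate to a plan-level `get_plan_as_string()` that prints stages as an indented tree. The class names, the `with_stage` helper, and the `+-` indentation format are all assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import List


def _prefix(depth: int) -> str:
    """Return the tree prefix for an entry at the given depth (assumed format)."""
    return "" if depth == 0 else " " * (3 * (depth - 1)) + "+- "


@dataclass
class ToyPlan:
    """Hypothetical stand-in for an execution plan: an ordered list of stage names."""

    stage_names: List[str] = field(default_factory=list)

    def get_plan_as_string(self, dataset_summary: str) -> str:
        # Each stage is nested one level under the previous one;
        # the dataset summary is the innermost (last) entry.
        entries = self.stage_names + [dataset_summary]
        return "\n".join(_prefix(d) + e for d, e in enumerate(entries))


class ToyDataset:
    """Hypothetical lazy dataset that only records stage names."""

    def __init__(self, plan: ToyPlan, num_rows: int) -> None:
        self._plan = plan
        self._num_rows = num_rows

    def with_stage(self, name: str) -> "ToyDataset":
        # Nothing executes here; a new plan with one more stage is recorded.
        new_plan = ToyPlan(self._plan.stage_names + [name])
        return ToyDataset(new_plan, self._num_rows)

    def __repr__(self) -> str:
        # Delegate directly to the plan, mirroring the structure described above.
        return self._plan.get_plan_as_string(
            f"ToyDataset(num_rows={self._num_rows})"
        )


if __name__ == "__main__":
    ds = ToyDataset(ToyPlan(), num_rows=10).with_stage("read").with_stage("map")
    print(ds)
```

Keeping the formatting logic on the plan object means the dataset's `__repr__` stays a one-line delegation, and any extra per-stage detail added later only needs to change the plan's string method.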
The code change includes:
- `ExecutionPlan.get_plan_as_string()`, to get the string representation of the plan.
- `ExecutionPlan`: `_get_unified_blocks_schema()` and `_get_num_rows_from_blocks_metadata()`.
- `Dataset.__repr__`, to call `ExecutionPlan.get_plan_as_string()` directly.

Related issue number
Closes #31417
Checks
- I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.