ANALYZE TABLE #105

Merged: 16 commits, Jan 6, 2021


nils-braun (Collaborator)

Fixes #96

This PR introduces a new SQL statement:

ANALYZE TABLE <table> COMPUTE STATISTICS [FOR ALL COLUMNS | FOR COLUMNS a, b, ...]
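As a rough illustration of the grammar above, the following Python sketch splits the statement into a table name and an optional column list using a regular expression. `parse_analyze` is a hypothetical helper, not part of dask-sql, and dask-sql's actual parsing is not reproduced here. The sketch assumes plain (unquoted) identifiers, and it treats both `FOR ALL COLUMNS` and a missing `FOR` clause as "analyze every column".

```python
import re
from typing import List, Optional, Tuple

# Hypothetical regex for the statement shape:
#   ANALYZE TABLE <table> COMPUTE STATISTICS [FOR ALL COLUMNS | FOR COLUMNS a, b, ...]
# Only unquoted identifiers are supported; this is an illustration, not dask-sql's parser.
_ANALYZE_RE = re.compile(
    r"^\s*ANALYZE\s+TABLE\s+(?P<table>[A-Za-z_][\w.]*)\s+COMPUTE\s+STATISTICS"
    r"(?:\s+FOR\s+(?:(?P<all>ALL\s+COLUMNS)"
    r"|COLUMNS\s+(?P<cols>[A-Za-z_]\w*(?:\s*,\s*[A-Za-z_]\w*)*)))?"
    r"\s*;?\s*$",
    re.IGNORECASE,
)


def parse_analyze(sql: str) -> Tuple[str, Optional[List[str]]]:
    """Return (table_name, columns); columns is None when all columns are requested."""
    match = _ANALYZE_RE.match(sql)
    if match is None:
        raise ValueError(f"not an ANALYZE TABLE statement: {sql!r}")
    cols = match.group("cols")
    if cols is None:
        # "FOR ALL COLUMNS" or no FOR clause at all: analyze every column.
        return match.group("table"), None
    # "FOR COLUMNS a, b, ...": split the comma-separated identifier list.
    return match.group("table"), [c.strip() for c in cols.split(",")]
```

For example, `parse_analyze("ANALYZE TABLE df COMPUTE STATISTICS FOR COLUMNS a, b")` returns `("df", ["a", "b"])`.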

Thanks to @quasiben for bringing it up. Unlike its counterpart in, e.g., Spark, the query does not perform any optimizations; it just outputs a table, computed on the fly, with the following properties for each analyzed column:

"count",
"mean",
"std",
"min",
"25%",
"50%",
"75%",
"max",
"data_type",
"col_name",
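To make the meaning of each output row concrete, here is a minimal pure-Python sketch that computes the same set of properties for a table given as a dict of column lists. It is an illustration under stated assumptions, not the code in `dask_sql/physical/rel/custom/analyze.py`: `analyze_table` and `STAT_ROWS` are hypothetical names, columns are assumed numeric with at least two non-missing values, `std` is the sample standard deviation, quantiles use linear interpolation between closest ranks (the pandas/NumPy default), and `data_type` is a Python type name standing in for the SQL type.

```python
import math
import statistics
from typing import Dict, List, Optional, Sequence

# Hypothetical row labels, in the order listed above.
STAT_ROWS = ["count", "mean", "std", "min", "25%", "50%", "75%", "max", "data_type", "col_name"]


def _quantile(sorted_values: Sequence[float], q: float) -> float:
    # Linear interpolation between the two closest ranks.
    pos = (len(sorted_values) - 1) * q
    lower = math.floor(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * frac


def analyze_table(
    table: Dict[str, List[Optional[float]]], columns: Optional[List[str]] = None
) -> Dict[str, Dict[str, object]]:
    """Return {column: {row_label: value}} for the requested (or all) columns.

    Assumes numeric columns with at least two non-missing (non-None) values.
    """
    result: Dict[str, Dict[str, object]] = {}
    for name in columns if columns is not None else list(table):
        values = sorted(v for v in table[name] if v is not None)  # skip missing values
        result[name] = {
            "count": len(values),
            "mean": statistics.mean(values),
            "std": statistics.stdev(values),  # sample standard deviation (ddof=1)
            "min": values[0],
            "25%": _quantile(values, 0.25),
            "50%": _quantile(values, 0.50),
            "75%": _quantile(values, 0.75),
            "max": values[-1],
            "data_type": type(values[0]).__name__,  # stand-in for the SQL type name
            "col_name": name,
        }
    return result
```

For example, `analyze_table({"a": [1, 2, 3, 4]})["a"]["25%"]` evaluates to `1.75`. Passing a column list mirrors `FOR COLUMNS a, b, ...`; passing `None` mirrors `FOR ALL COLUMNS`.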

If you have any more ideas on what to implement, I am happy to add them.

The documentation part of this PR is waiting on #104.

nils-braun marked this pull request as draft January 3, 2021 11:11
codecov-io commented Jan 3, 2021

Codecov Report

Merging #105 (6f1e844) into main (a45deb1) will not change coverage.
The diff coverage is 100.00%.


@@            Coverage Diff            @@
##              main      #105   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           39        40    +1     
  Lines         1611      1643   +32     
  Branches       224       229    +5     
=========================================
+ Hits          1611      1643   +32     
Impacted Files                             Coverage Δ
dask_sql/context.py                        100.00% <100.00%> (ø)
dask_sql/datacontainer.py                  100.00% <100.00%> (ø)
dask_sql/physical/rel/custom/__init__.py   100.00% <100.00%> (ø)
dask_sql/physical/rel/custom/analyze.py    100.00% <100.00%> (ø)
dask_sql/physical/rel/custom/columns.py    100.00% <100.00%> (ø)
dask_sql/utils.py                          100.00% <100.00%> (ø)


Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a45deb1...6f1e844.

nils-braun marked this pull request as ready for review January 4, 2021 12:35
quasiben (Contributor) commented Jan 6, 2021

Thanks @nils-braun !

nils-braun merged commit 7134fae into main Jan 6, 2021
nils-braun deleted the feature/analyze-table branch January 6, 2021 15:51
Development

Successfully merging this pull request may close these issues.

Support Analyze Query
3 participants