Idea: Bivariate analyses #171

Open
sebastian opened this issue Jun 19, 2020 · 3 comments
Labels
low priority: Lower priority and to be downprioritized in favor of other work

Comments

@sebastian
Contributor

sebastian commented Jun 19, 2020

At TeamBank, the data scientist said it would be radical if, when viewing the analysis/distribution of ages, he could select a second dimension (say gender) and see a breakdown per age.

If you do multi-column analyses, then this is likely data you already have available. We should just think about a way of meaningfully exposing it!

@sebastian changed the title from "Bivariate analyses" to "Idea: Bivariate analyses" on Jun 19, 2020
@dandanlen
Contributor

Now that we have multi-column correlations / joint probabilities, this is indeed closer to being a possibility. Enabling this functionality on-demand can be done in one of two ways, one more client-heavy, the other more explorer-heavy:

  1. Packaging up and returning the joint probability matrices for all available column combinations, then using these client-side to build the 2-dimensional breakdown.
  • 😞 Requires some client-side data analysis: the API consumer will need to perform some non-trivial work to generate visualisation data from raw probabilities.
  • 😄 Easy to implement in the explorer.
  • 😞 Requires a lot of extra data to be returned through the API (for n columns there are on the order of n^2 pairs).
  2. Storing the matrices in the explorer and exposing a new API endpoint to request a multi-column data summary.
  • 😄 Reuses existing explorer logic / datatypes for data analysis.
  • 😄 Minimises work on the client-side.
  • 😞 Requires new infrastructure in the explorer to allow persisting exploration results for subsequent requests.
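
To make the client-side work in option 1 more concrete, here is a minimal Python sketch of turning one joint probability matrix into a per-age breakdown. It assumes the matrix arrives as a plain nested list with one row per value of the first column; the function names, bucket labels, and numbers are all hypothetical and not part of the explorer API.

```python
from typing import Dict, List


def breakdown_by_second_column(
    joint: List[List[float]],
    first_labels: List[str],
    second_labels: List[str],
) -> Dict[str, Dict[str, float]]:
    """For each value of the first column, return the conditional
    distribution of the second column: P(second | first)."""
    result: Dict[str, Dict[str, float]] = {}
    for label, row in zip(first_labels, joint):
        marginal = sum(row)  # P(first = label)
        if marginal == 0:
            # No mass for this value, so there is nothing to break down.
            result[label] = {s: 0.0 for s in second_labels}
            continue
        result[label] = {s: p / marginal for s, p in zip(second_labels, row)}
    return result


def pair_count(n_columns: int) -> int:
    """Number of unordered column pairs, which grows on the order of n^2."""
    return n_columns * (n_columns - 1) // 2


# Hypothetical data: rows are age buckets, columns are genders.
age_buckets = ["18-29", "30-49", "50+"]
genders = ["female", "male"]
joint = [
    [0.10, 0.15],
    [0.20, 0.25],
    [0.15, 0.15],
]

breakdown = breakdown_by_second_column(joint, age_buckets, genders)
```

Each entry of `breakdown` is the gender split within one age bucket. `pair_count` illustrates the payload concern: 10 columns already give 45 matrices to return.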

@sebastian
Contributor Author

sebastian commented Nov 2, 2020

I vote for providing more metrics through the API and letting the client make decisions based on the data.

@sebastian added the "low priority" label on Nov 3, 2020
@sebastian
Contributor Author

sebastian commented Nov 3, 2020

Let's ignore it for now. We can revisit given time.
What needs to be decided in that case is what shape the data should take to be somewhat consumable.
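
As one possible starting point for that discussion, a payload for a single column pair could look roughly like the Python sketch below. The class name, field names, and example values are assumptions made for illustration only; nothing here reflects an existing explorer datatype.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class JointDistribution:
    """Hypothetical payload for one pair of columns."""
    columns: List[str]                 # the two column names
    row_labels: List[str]              # values/buckets of the first column
    column_labels: List[str]           # values/buckets of the second column
    probabilities: List[List[float]]   # probabilities[i][j] = P(row i, column j)

    def is_well_formed(self) -> bool:
        # The matrix dimensions must match the two label lists.
        return (
            len(self.columns) == 2
            and len(self.probabilities) == len(self.row_labels)
            and all(
                len(row) == len(self.column_labels)
                for row in self.probabilities
            )
        )


example = JointDistribution(
    columns=["age", "gender"],
    row_labels=["18-29", "30-49", "50+"],
    column_labels=["female", "male"],
    probabilities=[[0.10, 0.15], [0.20, 0.25], [0.15, 0.15]],
)

# Serialise to JSON, as an API response body might carry it.
payload = json.dumps(asdict(example))
```

Carrying the labels alongside the matrix keeps the payload self-describing, so a client would not need a second request to interpret row and column indices.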
