Documentation for data format #91

Merged
merged 6 commits into from
Jun 6, 2021

SichongP
Contributor

Draft documentation for the data format. Some HTML tables render fine with Sphinx but look odd on GitHub. This is my first time working with rst files, so let me know if there are any errors. Thanks :)

@SichongP SichongP mentioned this pull request Mar 17, 2020
@jnothman
Owner

This is looking like a good start, though I am not particularly happy with all the HTML tables in there. I think I'd be happier using nbsphinx so that you can just implement the doc in a Jupyter notebook. WDYT?

@jnothman
Owner

Please also add the data format guide to the toctree in index.rst.

@jnothman
Owner

The current version here is focused on "here's a representation and here's what it means and how to use it". Do you think it would be more informative if it was structured around different use cases (as listed in #57) and for each use case would give some examples of how you may already have the data represented, describing how to transform it for use in upsetplot?

@SichongP
Contributor Author

> This is looking like a good start, though I am not particularly happy with all the HTML tables in there. I think I'd be happier using nbsphinx so that you can just implement the doc in a Jupyter notebook. WDYT?

I actually wrote it in notebook so yes nbsphinx sounds like a great idea!

@SichongP
Contributor Author

> The current version here is focused on "here's a representation and here's what it means and how to use it". Do you think it would be more informative if it was structured around different use cases (as listed in #57) and for each use case would give some examples of how you may already have the data represented, describing how to transform it for use in upsetplot?

I thought it would help to start by explaining the required input data structures in more detail, so that people can reformat their data (which may come in a wide variety of forms) into this format.

I do think a few examples of different use cases would be very helpful. Here is my understanding of your points:

  • Representing counts only

Is the movies dataset good enough for this use case? We can demonstrate how to use subset_size='count' with a DataFrame, as well as how to group data, filter by counts, and plot with sum_over.

  • Representing data elements and their associated sets

Are you talking about something like this?

[
    ['cat0', 'cat1'],
    ['cat0', 'cat2'],
    ['cat1']
]

I find it difficult to see a use case for this kind of data format. Do you have an example I can start with?

  • Representing additional attributes for each data element

Would this be the Boston data example in the docs? I think that is a great example by itself and could just be reused here.
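
As a rough illustration of the three cases above, here is a minimal sketch in plain Python, deliberately without any pandas or upsetplot calls. The `elements` records, the `value` attribute, and the `indicator_row` helper are all made up for this example. It only shows the shape of the data: one boolean per category for each element, then counts and sums per combination. In the library itself, upsetplot ships a `from_memberships` helper for the list-of-memberships layout, which is probably the more practical route.

```python
from collections import Counter, defaultdict

# Hypothetical elements: each lists the categories (sets) it belongs to,
# plus a made-up numeric attribute called "value".
elements = [
    {"sets": ["cat0", "cat1"], "value": 2.0},
    {"sets": ["cat0", "cat2"], "value": 3.5},
    {"sets": ["cat1"], "value": 1.0},
    {"sets": ["cat0", "cat1"], "value": 4.0},
]

# All categories, in a fixed order, play the role of boolean columns.
categories = sorted({c for e in elements for c in e["sets"]})


def indicator_row(sets):
    """Return one boolean per category: is the element in that set?"""
    return tuple(c in sets for c in categories)


# Data elements and their associated sets, as boolean rows.
rows = [indicator_row(e["sets"]) for e in elements]

# Counts only: how many elements share each combination of sets.
counts = Counter(rows)

# Additional attribute per element: sum "value" within each combination.
sums = defaultdict(float)
for row, e in zip(rows, elements):
    sums[row] += e["value"]

# Filter by count: keep combinations that occur at least twice.
frequent = {row: n for row, n in counts.items() if n >= 2}
```

Counting rows corresponds to the counts-only case, while summing `value` is the plain-Python analogue of aggregating an attribute per subset.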

@jnothman
Owner

jnothman commented Mar 17, 2020 via email

@SichongP
Contributor Author

> I actually wrote it in notebook so yes nbsphinx sounds like a great idea!
> Are you able to make this change?

Okay, hopefully it works with my ipynb file now.

@jnothman
Owner

jnothman commented Jun 6, 2021

Thank you @SichongP! Sorry for the delay!

@jnothman jnothman merged commit b5fb27f into jnothman:master Jun 6, 2021