Implement unified way of checking dimensions, exporting and unifying data for plots #26

Open
janssenhenning opened this issue Mar 12, 2021 · 2 comments
Labels
discussion enhancement New feature or request visualization Related to the visualization routines

Comments

@janssenhenning
Contributor

Currently, data for plots is passed to the plot_methods as arrays or lists. The dimension checking is quite fragile and also varies from method to method (on both the develop and plot_methods_refactor branches).

We should have:

  • A way to convert given input data from a variety of formats to a consistent known format. Some ideas for this:
    • numpy arrays for single plot calls, lists of numpy arrays for multiple plot calls
    • pandas dataframes (Probably more natural for bokeh_plots)
  • Support for exporting data into a variety of formats
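
A minimal sketch of the first idea, assuming numpy is available. The helper name `normalize_plot_data` and its behaviour are hypothetical and not part of masci-tools; it only illustrates converting lists or arrays into a list of 1D numpy arrays with one shared dimension check.

```python
import numpy as np


def normalize_plot_data(x, y):
    """Return (xs, ys) as equally long lists of 1D numpy arrays.

    Hypothetical helper for illustration, not an existing masci-tools function.
    """

    def as_list_of_arrays(data):
        # A list/tuple whose entries are themselves sequences is treated as
        # several data sets; anything else is treated as one data set.
        if isinstance(data, (list, tuple)) and len(data) > 0 and all(
            isinstance(entry, (list, tuple, np.ndarray)) for entry in data
        ):
            return [np.asarray(entry, dtype=float) for entry in data]
        arr = np.asarray(data, dtype=float)
        if arr.ndim == 2:
            # Each row of a 2D array is one data set.
            return [row for row in arr]
        return [arr]

    xs, ys = as_list_of_arrays(x), as_list_of_arrays(y)

    # A single x array may be shared by several y data sets.
    if len(xs) == 1 and len(ys) > 1:
        xs = xs * len(ys)
    if len(xs) != len(ys):
        raise ValueError(f"Got {len(xs)} x data sets but {len(ys)} y data sets")

    for index, (x_arr, y_arr) in enumerate(zip(xs, ys)):
        if x_arr.ndim != 1 or y_arr.ndim != 1:
            raise ValueError(f"Data set {index} is not one-dimensional")
        if x_arr.shape != y_arr.shape:
            raise ValueError(
                f"Data set {index}: x has shape {x_arr.shape}, y has shape {y_arr.shape}"
            )
    return xs, ys
```

Every plot method could then call one such function first, so that a mismatch raises the same error everywhere instead of being checked differently in each method.
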
@janssenhenning janssenhenning added the enhancement New feature or request label Mar 12, 2021
@janssenhenning
Contributor Author

janssenhenning commented Mar 15, 2021

Actually, after working with the bokeh plots a bit more, I would be in favor of implementing similar behaviour for the matplotlib plots: you define a DataFrame (e.g. with pandas) and pass the keys you want to plot. With pandas it would already be possible to plot by calling the plotting methods directly on the dataframe, but I think just passing the data and then indexing the right keys would be enough, since I think the pandas plotting interface is slightly different, which might be confusing.

Of course, we could construct a dataframe if none is given, and in this way support all kinds of ways of passing the data.

This would probably also massively simplify exporting the plot data to files.
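
A rough sketch of this idea, assuming pandas is installed. The helper `to_plot_frame` and the column names are made up for illustration; only the `pandas.DataFrame` constructor and `DataFrame.to_csv` are real API.

```python
import io

import pandas as pd


def to_plot_frame(data, x, y):
    """Return a DataFrame holding only the columns named by the keys x and y.

    Hypothetical helper: `data` may be a DataFrame or anything the DataFrame
    constructor accepts, e.g. a dict of equally long sequences.
    """
    frame = data if isinstance(data, pd.DataFrame) else pd.DataFrame(data)
    keys = [x] + ([y] if isinstance(y, str) else list(y))
    missing = [key for key in keys if key not in frame.columns]
    if missing:
        raise KeyError(f"Keys not found in data: {missing}")
    return frame[keys]


# Made-up example data given as a plain dict instead of a DataFrame.
frame = to_plot_frame(
    {"energy": [-1.0, 0.0, 1.0], "dos_up": [0.2, 0.5, 0.1], "dos_down": [0.3, 0.4, 0.2]},
    x="energy",
    y=["dos_up", "dos_down"],
)

# A plot routine would index the same keys, e.g. frame["energy"] and frame["dos_up"].
# Exporting is a single call; a file path works in place of the buffer.
buffer = io.StringIO()
frame.to_csv(buffer, index=False)
```

Because the DataFrame constructor rejects columns of unequal length, part of the dimension checking would be delegated to pandas in this design.
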

@Irratzo
Member

Irratzo commented Mar 17, 2021

Hi Henning, I'm not familiar with the masci-tools.vis modules, so this is just food for thought about the thematic overlap between this issue and that issue (integration of the branch studentproject18w into the main code, as far as is sensible). (Disclaimer: I wrote that code.)

The goal for that project/branch was to provide an interactive bandstructure+DOS plotter from fleur HDF5 output files with two user frontends (Tkinter desktop program, a Jupyter dashboard), using the same base code.

The outcome was

  1. a preprocessor interface to transform fleur HDF output into Python classes for different use cases,
  2. a plotting class hierarchy to unify code for a) plotting methods for different tools (matplotlib, bokeh, ...) and b) different use cases (bandstructure, DOS, ...).

(The frontends worked, the jupyter dashboard can still be tried out via the binder badge in the README.)

Now a little more detail on how it works.

The preprocessor takes a JSON recipe, e.g. FleurBands, which specifies the datasets to extract from the HDF file, the transformations to apply to each, and the desired output type. The output type specifies functions for postprocessing data manipulation, e.g. for plotting. The reader then reads the datasets from the HDF file, transforms them (dependencies between datasets for transformations are resolved automatically), creates an instance of the specified output type, and adds the transformed datasets as attributes of that instance. The attributes remain h5py datasets (i.e. file-storage access), but can optionally be 'moved to memory' (converted into numpy arrays).
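
The flow described above can be sketched roughly as follows. Everything here is an illustrative assumption: the recipe layout, the dataset paths, the transform name and the `BandData` class are invented, and a plain dict stands in for the HDF file so that the example runs without h5py. The actual recipes and reader on the branch may look quite different.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Stand-in for an HDF file: dataset path -> stored values (all made up).
FAKE_FILE = {"/bands/eigenvalues": [1.0, 2.0, 3.0], "/general/fermi": [2.0]}

# Hypothetical recipe, written as a dict instead of JSON for brevity.
RECIPE = {
    "output_type": "BandData",
    "datasets": {
        "shifted": {"transform": "shift_by_fermi", "needs": ["eigenvalues", "fermi_energy"]},
        "eigenvalues": {"path": "/bands/eigenvalues"},
        "fermi_energy": {"path": "/general/fermi"},
    },
}

# Named transformations; arguments arrive in the order listed under "needs".
TRANSFORMS = {
    "shift_by_fermi": lambda eigenvalues, fermi: [e - fermi[0] for e in eigenvalues],
}


class BandData:
    """Output type: receives the processed datasets as attributes."""


OUTPUT_TYPES = {"BandData": BandData}


def run_recipe(recipe, file):
    specs = recipe["datasets"]
    # Order the datasets so that every dependency is processed before its user.
    graph = {name: spec.get("needs", []) for name, spec in specs.items()}
    result = OUTPUT_TYPES[recipe["output_type"]]()
    for name in TopologicalSorter(graph).static_order():
        spec = specs[name]
        if "path" in spec:
            value = file[spec["path"]]  # read directly from the file
        else:
            args = [getattr(result, dep) for dep in spec["needs"]]
            value = TRANSFORMS[spec["transform"]](*args)
        setattr(result, name, value)
    return result


bands = run_recipe(RECIPE, FAKE_FILE)
```

Note that "shifted" is listed first in the recipe but is still computed last, which is the automatic dependency resolution mentioned above in miniature.
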

The Plotters (plotting classes) derive from an abstract class with an abstract data attribute, of which the preprocessor's output types are subclasses. For example, the AbstractBandPlot's data attribute is of type FleurBandData. The Plotters' plotting methods then take no data arguments, only data selection arguments that operate on the underlying data attribute. This at least partially addresses your 'providing data' concern above.
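
As a dependency-free illustration of that idea, here is a small sketch. The class names, attributes and the `bands` selection argument are placeholders and do not reproduce the branch's actual AbstractBandPlot or FleurBandData.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class BandData:
    """Stand-in for a preprocessor output type (placeholder fields)."""

    kpoints: list
    eigenvalues: list  # eigenvalues[band][kpoint]


class AbstractPlotter(ABC):
    """Holds the data once; plot methods only receive selection arguments."""

    def __init__(self, data):
        self.data = data

    @abstractmethod
    def plot(self, **selection):
        """Draw the selected part of self.data."""


class BandPlotter(AbstractPlotter):
    def select(self, bands=None):
        # Pick the requested band indices, or all bands if none are given.
        indices = range(len(self.data.eigenvalues)) if bands is None else bands
        return [(self.data.kpoints, self.data.eigenvalues[i]) for i in indices]

    def plot(self, bands=None):
        # A real subclass would draw these curves with matplotlib or bokeh;
        # the sketch just returns them.
        return self.select(bands=bands)


plotter = BandPlotter(
    BandData(kpoints=[0.0, 0.5, 1.0], eigenvalues=[[-1.0, -0.5, -1.0], [1.0, 0.5, 1.0]])
)
curves = plotter.plot(bands=[1])
```

Since the data is attached to the plotter once, the dimension checks only have to happen at that point, not in every plotting call.
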

(Side note: the branch did not have pandas DataFrames in mind.)

(Side note: the problem with this approach is, of course, that it relies on the whole pipeline, i.e. data comes only from HDF. But I think this can be relaxed.)

(Side note about the hierarchical Plotter classes concept: this can lead to a combinatorial explosion of classes, because you need to define a class for every combination of use case and plotting library. I don't know which Pythonic design pattern would solve this problem more efficiently.)
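
One possible direction, offered only as a sketch and not as something the branch implements, is to replace inheritance with composition: each use case turns data into backend-neutral curves, and each backend renders such curves. This needs one function per use case plus one per library rather than one class per combination. All names below are hypothetical, and a text "backend" stands in for matplotlib or bokeh to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Curve:
    """Backend-neutral description of one line in a plot."""

    x: Sequence[float]
    y: Sequence[float]
    label: str


# Use cases: data -> list of curves. They know nothing about plotting libraries.
def dos_curves(data):
    return [Curve(data["energy"], data[key], key) for key in data if key != "energy"]


def band_curves(data):
    return [Curve(data["kpath"], band, f"band {i}") for i, band in enumerate(data["bands"])]


# Backends: list of curves -> figure. They know nothing about use cases.
def render_summary(curves):
    # Placeholder backend; a real one would call matplotlib or bokeh.
    return [f"{curve.label}: {len(curve.x)} points" for curve in curves]


USE_CASES = {"dos": dos_curves, "bands": band_curves}
BACKENDS = {"summary": render_summary}


def plot(use_case, backend, data):
    return BACKENDS[backend](USE_CASES[use_case](data))


figure = plot("dos", "summary", {"energy": [0.0, 1.0], "up": [0.4, 0.6]})
```

Adding a new plotting library would then mean registering one more entry in `BACKENDS`, without touching the use cases.
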

@janssenhenning janssenhenning added the visualization Related to the visualization routines label Apr 17, 2021