Improve plot data handling #54

Merged
merged 77 commits into from
Aug 20, 2021
e7906b1
Prototype version of PlotData class for more consistent and less erro…
janssenhenning Jun 6, 2021
2d10655
Some fixes and add a first usage example in single_scatterplot
janssenhenning Jun 6, 2021
2246f5e
Add data argument to multiple_scatterplot
janssenhenning Jun 7, 2021
37b09f8
Add plot_data to multi_scatter_plot
janssenhenning Jun 7, 2021
fe7f285
Add plot_data to colormesh_plot
janssenhenning Jun 7, 2021
f551c5f
Add min max function to PlotData to easily determine bounds and finis…
janssenhenning Jun 7, 2021
8fd48ec
Use min/max function in colormesh_plot
janssenhenning Jun 7, 2021
921cc6b
Add plot_data to waterfall_plot and surface_plot
janssenhenning Jun 7, 2021
ee1810c
Use plot_data in multiplot_moved
janssenhenning Jun 7, 2021
e573e38
Add apply method to mutate data for plotting using lambda functions (…
janssenhenning Jun 10, 2021
e2b979d
Use plot_data in dos plots in plot_methods
janssenhenning Jun 10, 2021
6b063a9
Use plot_data in plot_bands in plot_methods
janssenhenning Jun 10, 2021
18f3853
Move plot_fleur_bands and plot_fleur_dos to use the data argument
janssenhenning Jun 10, 2021
a32492b
Fix numpy warning in tests
janssenhenning Jun 10, 2021
247b19c
Fix for ``legend_show_data_labels`` option with new list_of_dicts switch
janssenhenning Jul 12, 2021
ab110c2
Add documentation to `vis.data` module
janssenhenning Jul 16, 2021
e0e0cfd
Add some first unit tests of the data module
janssenhenning Jul 16, 2021
71d3a7f
Add method to shift data entries in PlotData
janssenhenning Jul 17, 2021
95f54cd
Add PlotData to histogram and barchart plot function
janssenhenning Jul 17, 2021
52eaba1
pre-commit fixes and add missing files
janssenhenning Jul 17, 2021
aeef301
Started introduction of PlotData class to bokeh plots
janssenhenning Jul 17, 2021
43a0cb2
Improve normalization of the bokeh json for better regression tests
janssenhenning Jul 18, 2021
e817d9e
More improvement to bokeh_plots. Properly deprecated old signature fo…
janssenhenning Jul 18, 2021
cbed2be
Add tests for old signature in bokeh_plots
janssenhenning Jul 18, 2021
ed76bc9
Bugfixes in normalize_list_or_array
janssenhenning Jul 18, 2021
41e99b6
Fix normalize_list_or_array test
janssenhenning Jul 18, 2021
3affa81
Allow function defaults to be officially defined for multiple datasets
janssenhenning Jul 18, 2021
8a8f978
Improve bokeh regression again, with decoding numpy arrays and introd…
janssenhenning Jul 18, 2021
0698468
Convert bokeh_line to use PlotData
janssenhenning Jul 18, 2021
4a1383b
pre-commit fixes
janssenhenning Jul 18, 2021
8f9396f
Some cleanup
janssenhenning Jul 18, 2021
0a4c24b
Change default dpi to 100 to avoid problems with using plt.show
janssenhenning Jul 18, 2021
f256957
Add basic tests for fleur DOS/bandstructure plots with bokeh
janssenhenning Jul 18, 2021
d288c73
Convert bokeh_dos/spinpol_dos functions
janssenhenning Jul 18, 2021
d39899a
Convert plot_fleur_dos to use new signature of bokeh_dos
janssenhenning Jul 18, 2021
6aa80fb
Move bokeh_bands and bokeh_dos to PlotData
janssenhenning Jul 18, 2021
8057f7a
Move plot_fleur_bands to adjust to new signature in bokeh_routines
janssenhenning Jul 18, 2021
1d106b2
Activate plot_fleur_bands_characterize plot test for bokeh
janssenhenning Jul 18, 2021
8e81f6e
Forgot to adjust test
janssenhenning Jul 18, 2021
1d105e9
Reencode numpy arrays in bokeh tests to keep files smaller
janssenhenning Jul 19, 2021
5b7b492
Add data_keys used in copy_data automatically if needed
janssenhenning Jul 19, 2021
44bbfa9
Turn on more interactive tools in bokeh plots by default.
janssenhenning Jul 20, 2021
dbd8ff2
Add switch to disable usage of formatted strings in tooltips. Needed …
janssenhenning Jul 20, 2021
e4f5b1c
Add option to matplotlib plotter to remove duplicate legend labels
janssenhenning Jul 20, 2021
7b19d17
Add band_index to dataframe for future improvements of bandstructure …
janssenhenning Jul 20, 2021
bdae74d
Also encode integer data in bokeh tests to make files smaller for bet…
janssenhenning Jul 20, 2021
add2ff1
Add function to expand the number of plot parameters to a multiple of…
janssenhenning Jul 20, 2021
64bab30
Add functions to either sort the data or group the data by data_keys
janssenhenning Jul 20, 2021
4738b02
Add option to plot bandstructure with lines and updated docstrings
janssenhenning Jul 20, 2021
ae49bb2
Add functions for loading/saving defaults for matplotlib/bokeh modules
janssenhenning Jul 21, 2021
a6f2e35
Add sections to the Developers Guide/User Guide about the PlotData class
janssenhenning Jul 21, 2021
2c22791
pre-commit fix
janssenhenning Jul 21, 2021
7233843
Fixes after rebase
janssenhenning Jul 21, 2021
4d8542e
Bugfix for setting function defaults on added parameters
janssenhenning Jul 21, 2021
e1ba7e1
Add argument to copy the data argument in the beginning of the plot f…
janssenhenning Jul 21, 2021
1a7bd06
Add tests for separate_bands with providing parameters for individual…
janssenhenning Jul 21, 2021
ad28ab5
Update bokeh tests
janssenhenning Jul 21, 2021
188bc5d
Add a Wrapper for ColumnDataSources to be able to use it without spec…
janssenhenning Jul 22, 2021
c40ff09
Move max and min method to use np.nanmax/min to handle nan values
janssenhenning Jul 22, 2021
226ffd0
First test for PlotData
janssenhenning Jul 23, 2021
8ad5074
Make sure that PlotData does not need bokeh installed to function
janssenhenning Jul 23, 2021
34b5458
Some fixes for using ColumnDataSources in PlotData
janssenhenning Jul 23, 2021
1430fb6
Add tests for min function
janssenhenning Jul 23, 2021
73f4269
Allow the mask argument in min/max to be a function to be parametrize…
janssenhenning Jul 23, 2021
c20f656
Suppress warnings for numpy.bool in masks. The datatype is deprecated…
janssenhenning Aug 3, 2021
24b2ac6
Rework parametrization of PlotData tests to allow for list data sources
janssenhenning Aug 3, 2021
8bde4a8
Add tests for max function of PlotData
janssenhenning Aug 3, 2021
9c62c97
Add __contains__ method to ColumnDataSourceWrapper to be able to use …
janssenhenning Aug 3, 2021
9309e0a
Add test for get_keys and get_values
janssenhenning Aug 3, 2021
de984b4
Fix copy_data for force=True for pandas DataFrames
janssenhenning Aug 3, 2021
4329535
Fix copy_data argument to PlotData for ColumnDataSource.
janssenhenning Aug 3, 2021
a65c859
Add tests for shift_data
janssenhenning Aug 3, 2021
4ff4e3a
Add tests for distinct_datasets and apply function of PlotData
janssenhenning Aug 3, 2021
4cfe504
Add tests for get_function_result and copy_data
janssenhenning Aug 3, 2021
e483c1e
Add test of default iteration behaviour
janssenhenning Aug 3, 2021
84354df
reraise exception in dict_of_list_to_list_of_dicts
janssenhenning Aug 3, 2021
d722bc9
Implement saving of bokeh plots
janssenhenning Aug 13, 2021
1 change: 1 addition & 0 deletions docs/source/devel_guide/index.rst
@@ -8,3 +8,4 @@ This is the developers guide for masci-tools

fleur_parser
plotting
plot_data
94 changes: 94 additions & 0 deletions docs/source/devel_guide/plot_data.rst
@@ -0,0 +1,94 @@
.. _devguideplotdata:

Using the :py:class:`~masci_tools.vis.data.PlotData` class
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

.. _matplotlib: https://matplotlib.org/stable/index.html
.. _bokeh: https://docs.bokeh.org/en/latest/index.html

Description
------------

The :py:class:`~masci_tools.vis.data.PlotData` class simplifies supplying data to plotting functions in multiple ways, while keeping the plotting functions themselves simple and easy to understand.

The basic idea of :py:class:`~masci_tools.vis.data.PlotData` is to mimic the behaviour of the ``data`` argument in `matplotlib`_ or the ``source`` argument in `bokeh`_. Suppose we have our data for a plot in a dictionary ``d``, which has the keys ``x``, ``y1`` and ``y2``. If we now want to plot both ``y`` keys against ``x`` we can do this in the following way.

.. code-block::

    from masci_tools.vis.data import PlotData

    plot_data = PlotData(d, x='x', y=['y1', 'y2'])

    for entry, source in plot_data.items():
        # entry has the keys needed to get the data from the source
        # and source is the mapping to use

        print(entry.x, entry.y)  # Yields x, y1 in the first loop and x, y2 in the second

        # Now we can plot the data
        # for example plt.plot(entry.x, entry.y, data=source)

The keys are automatically expanded to be of the same length, if this is possible. There are three iteration modes, with the same names as for dicts:

- ``keys``: Yields ``namedtuple`` with the keys for each plot
- ``values``: Yields ``namedtuple`` with the values corresponding to the keys for each plot
- ``items``: Yields the ``keys`` and their corresponding mapping for each plot

All of these functions have an argument ``first``, which will only return the first element if it is given as ``True``.
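
The iteration modes can be illustrated with a small stand-in class. The following is only a sketch under stated assumptions: ``ToyPlotData`` is a hypothetical name, it only accepts ``dict`` sources, and it is not the implementation in ``masci_tools.vis.data``. It shows how keys could be expanded to the same length and how ``keys``, ``values``, ``items`` and the ``first`` argument relate to each other.

```python
from collections import namedtuple
from itertools import cycle, islice


class ToyPlotData:
    """Hypothetical stand-in for illustration; NOT the masci_tools PlotData class."""

    def __init__(self, data, **data_keys):
        self.data = data
        # Treat a single key like a one-element list of keys
        key_lists = {name: keys if isinstance(keys, list) else [keys]
                     for name, keys in data_keys.items()}
        self._length = max(len(keys) for keys in key_lists.values())
        self._keys = {}
        for name, keys in key_lists.items():
            if self._length % len(keys) != 0:
                raise ValueError(f'Cannot expand {name!r} to length {self._length}')
            # Repeat shorter key lists until all of them have the same length
            self._keys[name] = list(islice(cycle(keys), self._length))
        # The field names are taken from the keyword arguments
        self._entry = namedtuple('Entry', list(data_keys))

    def keys(self, first=False):
        entries = [self._entry(*(self._keys[name][i] for name in self._keys))
                   for i in range(self._length)]
        return entries[0] if first else entries

    def values(self, first=False):
        entries = [self._entry(*(self.data[key] for key in entry))
                   for entry in self.keys()]
        return entries[0] if first else entries

    def items(self, first=False):
        entries = [(entry, self.data) for entry in self.keys()]
        return entries[0] if first else entries


d = {'x': [0, 1, 2], 'y1': [0, 1, 4], 'y2': [0, 2, 4]}
toy = ToyPlotData(d, x='x', y=['y1', 'y2'])

for entry, source in toy.items():
    # entry holds the keys, source is the mapping to look them up in
    print(entry.x, entry.y)
```

In this sketch the single key ``'x'`` is repeated so that it pairs with both ``'y1'`` and ``'y2'``, which mirrors the automatic expansion described above.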

.. note::
    The names ``x`` and ``y`` in the example above are completely arbitrary. The names for the columns and the fields on the ``namedtuple`` are determined by the keyword arguments given to :py:class:`~masci_tools.vis.data.PlotData` at initialization.

.. note::
    At the moment the types of mappings accepted in the :py:class:`~masci_tools.vis.data.PlotData` class are limited to ``dict``, ``pd.DataFrame`` and ``ColumnDataSource`` (`bokeh`_) objects.

Initializing :py:class:`~masci_tools.vis.data.PlotData` without a mapping
----------------------------------------------------------------------------------

Users might want to provide data directly as arrays. To allow this, plotting functions can use the function :py:func:`~masci_tools.vis.data.process_data_arguments()`. It can take a ``data`` argument with a mapping, together with the same keyword arguments as :py:class:`~masci_tools.vis.data.PlotData`:

.. code-block::

    from masci_tools.vis.data import process_data_arguments

    plot_data = process_data_arguments(data=d, x='x', y=['y1', 'y2'])

Or you can provide the arrays directly without a ``data`` argument

.. code-block::

    from masci_tools.vis.data import process_data_arguments

    # x, y1, y2 are the actual arrays
    plot_data = process_data_arguments(x=x, y=[y1, y2])

If no ``data`` argument is given, the keyword arguments are assumed to contain the data and they are processed according to three rules:

1. If the data is a multidimensional array (list of lists, etc.) and this is not forbidden by the given argument, the first dimension of the array is iterated over and interpreted as separate entries (if the data was previously split up into multiple sets, a length check is performed)
2. If the data is a one-dimensional array and of a different length than the number of defined data sets, it is added to all previously existing entries
3. If the data is a one-dimensional array and of the same length as the number of defined data sets, each entry is added to the corresponding data set

.. note::
    List or array in this context refers to ``list``, ``np.array`` and ``pd.Series``.
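
The three rules above can be sketched in plain Python. This is a simplified illustration, not the code of ``process_data_arguments()``: the function name ``split_data_arguments`` and the keyword ``scale`` are hypothetical, only nested ``list``/``tuple`` data counts as multidimensional, and the option to forbid splitting for individual arguments is left out.

```python
def split_data_arguments(**kwargs):
    """Toy version of the three rules; hypothetical, not the masci_tools code."""
    datasets = [{}]  # start with a single, empty data set
    for name, value in kwargs.items():
        nested = len(value) > 0 and all(isinstance(v, (list, tuple)) for v in value)
        if nested:
            # Rule 1: the first dimension is interpreted as separate entries
            if len(datasets) == 1:
                datasets = [dict(datasets[0]) for _ in value]
            elif len(datasets) != len(value):
                raise ValueError(f'{name!r} has {len(value)} entries, '
                                 f'expected {len(datasets)}')
            for entry, single in zip(datasets, value):
                entry[name] = single
        elif len(value) != len(datasets):
            # Rule 2: one-dimensional data is shared by all existing entries
            for entry in datasets:
                entry[name] = value
        else:
            # Rule 3: one element per data set
            for entry, single in zip(datasets, value):
                entry[name] = single
    return datasets


x = [0, 1, 2]
y1 = [0, 1, 4]
y2 = [0, 2, 4]

# x is shared (rule 2), y is split (rule 1), scale has one element per set (rule 3)
result = split_data_arguments(x=x, y=[y1, y2], scale=[1.0, 2.0])
print(result)
```

Note that in this sketch the outcome of rule 2 versus rule 3 depends only on the length of the array compared with the number of data sets defined so far.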

Available routines on :py:class:`~masci_tools.vis.data.PlotData`
----------------------------------------------------------------------

There are a couple of routines for mutating/copying or getting information about the data in a :py:class:`~masci_tools.vis.data.PlotData` instance. These are not meant for heavy data processing; they cover the typical simple work done when preparing plot data, e.g. scaling, shifting or getting limits.

.. note::
    The term data key in the following section refers to the keys of the keyword arguments given to :py:class:`~masci_tools.vis.data.PlotData` at initialization or the fields on the namedtuples returned by iterating over an instance.

- :py:meth:`~masci_tools.vis.data.PlotData.get_keys()`: Get all the keys for a given data key in a list
- :py:meth:`~masci_tools.vis.data.PlotData.get_values()`: Get all the values for a given data key in a list
- :py:meth:`~masci_tools.vis.data.PlotData.min()`: Get the minimum value for a given data key. A mask can be passed to further select the data. If ``separate=True`` is passed a list of minimum values for each plot is returned
- :py:meth:`~masci_tools.vis.data.PlotData.max()`: Get the maximum value for a given data key. A mask can be passed to further select the data. If ``separate=True`` is passed a list of maximum values for each plot is returned
- :py:meth:`~masci_tools.vis.data.PlotData.apply()`: Apply a lambda function to transform the data of a given data key (in place, the stored data is modified!)
- :py:meth:`~masci_tools.vis.data.PlotData.get_function_result()`: Apply a function to a given data key and return the results (Does not change the data)
- :py:meth:`~masci_tools.vis.data.PlotData.sort_data()`: Sort the data by the given data keys
- :py:meth:`~masci_tools.vis.data.PlotData.group_data()`: Group the data by the given data keys
- :py:meth:`~masci_tools.vis.data.PlotData.shift_data()`: Shift the data of a given data key either globally or with different shifts for each plot
- :py:meth:`~masci_tools.vis.data.PlotData.copy_data()`: Copy the data of one data key to a new data key
- :py:meth:`~masci_tools.vis.data.PlotData.distinct_datasets()`: Return how many different datasets exist for a given data key

.. warning::
    The methods :py:meth:`~masci_tools.vis.data.PlotData.sort_data()` and :py:meth:`~masci_tools.vis.data.PlotData.group_data()` will always convert the data sources to ``pd.DataFrame`` objects if they are not already.
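
To make the role of the mask and of ``separate=True`` in the ``min()``/``max()`` routines more concrete, here is a standard-library sketch. It is an assumption-laden illustration: ``toy_min`` is a hypothetical helper, the mask format (one list of booleans per plot) is chosen for this example only, and the real methods work on the data sources of a :py:class:`~masci_tools.vis.data.PlotData` instance. NaN values are skipped, in the spirit of ``np.nanmin``.

```python
import math


def toy_min(values_per_plot, mask=None, separate=False):
    """Hypothetical helper: minimum over several plots, ignoring NaN values.

    values_per_plot: one list of floats per plot
    mask: optional, one list of booleans per plot selecting the values to use
    separate: if True return one minimum per plot, else the overall minimum
    """
    minima = []
    for i, values in enumerate(values_per_plot):
        selected = [value for j, value in enumerate(values)
                    if (mask is None or mask[i][j]) and not math.isnan(value)]
        minima.append(min(selected))
    return minima if separate else min(minima)


data = [[3.0, float('nan'), 1.0], [5.0, 2.0]]

overall = toy_min(data)
per_plot = toy_min(data, separate=True)
masked = toy_min(data, mask=[[True, True, False], [True, False]], separate=True)
```

The overall minimum is what a plotting function would typically use for axis limits, while the per-plot values are useful when each plot is shifted or scaled individually.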
3 changes: 3 additions & 0 deletions docs/source/module_guide/code.rst
@@ -28,6 +28,9 @@ General Plotting
.. automodule:: masci_tools.vis.parameters
   :members:

.. automodule:: masci_tools.vis.data
   :members:

Matplotlib
^^^^^^^^^^^

37 changes: 36 additions & 1 deletion docs/source/user_guide/plotting.rst
@@ -48,7 +48,42 @@ For both of these there are a lot of plotting routines available (both general o
- :py:func:`~masci_tools.vis.bokeh_plots.bokeh_spinpol_bands()`: Plot a general bandstructure (spinpolarized)
- :py:func:`~masci_tools.vis.bokeh_plots.periodic_table_plot()`: Make an interactive plot of data for the periodic table

If you have ideas for new useful and beautiful plotting routines you are welcome to contribute. Refer to the section :ref:`devguideplotting` for a guide on how to get started.
If you have ideas for new useful and beautiful plotting routines you are welcome to contribute. Refer to the sections :ref:`devguideplotting` and :ref:`devguideplotdata` for a guide on how to get started.

Providing Data
--------------

Data can be provided to plotting functions in two main ways:

1. The first arguments and the data arguments are given as the keys to look up in a mapping. The corresponding mapping is provided via the ``data`` keyword argument.
2. The first arguments and the data arguments are given directly as the data that should be plotted against each other.

The following two code blocks are equivalent in terms of the provided data.

.. code-block::

    from masci_tools.vis.plot_methods import multiple_scatterplots
    import numpy as np

    x = np.linspace(-10, 10, 100)
    y1 = x**2
    y2 = 20 * np.sin(x)

    # The data is split up according to fixed rules that the plot function defines.
    # The default behaviour is that a list of lists is interpreted as multiple separate plots
    ax = multiple_scatterplots(x, [y1, y2])

.. code-block::

    from masci_tools.vis.plot_methods import multiple_scatterplots
    import numpy as np

    x = np.linspace(-10, 10, 100)
    y1 = x**2
    y2 = 20 * np.sin(x)
    data = {'x': x, 'y1': y1, 'y2': y2}

    ax = multiple_scatterplots('x', ['y1', 'y2'], data=data)
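
Why the two calls are equivalent can be shown with a small sketch that does not need any plotting library. The helper ``resolve`` is hypothetical and only mimics the idea: if a ``data`` mapping is given, the arguments are treated as keys into it, otherwise they are used as the data itself. Plain lists stand in for the numpy arrays.

```python
def resolve(arg, data=None):
    """Hypothetical helper: return the values a plot function would draw."""
    if data is None:
        # Second way: the argument already is the data
        return arg
    if isinstance(arg, list):
        # First way: a list of keys gives one data set per key
        return [data[key] for key in arg]
    return data[arg]


x = [-1.0, 0.0, 1.0]
y1 = [value**2 for value in x]
y2 = [20 * value for value in x]
data = {'x': x, 'y1': y1, 'y2': y2}

# Both ways end up with the same values
direct = (resolve(x), resolve([y1, y2]))
via_keys = (resolve('x', data=data), resolve(['y1', 'y2'], data=data))
```

Comparing ``direct`` and ``via_keys`` shows that both calling conventions lead to the same data being plotted.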

Customizing Plots
------------------
2 changes: 1 addition & 1 deletion masci_tools/io/common_functions.py
@@ -379,7 +379,7 @@ def convert_to_pystd(value):
        value = int(value)
    elif isinstance(value, np.floating):
        value = float(value)
    elif isinstance(value, np.str):
    elif isinstance(value, np.str_):
        value = str(value)
    elif isinstance(value, dict):
        for key, val in value.items():