Skip to content

MSeal/dx

Repository files navigation

dx

This package provides convenient formatting and IPython display formatter registration for tabular data and DEX media types.

CI codecov code coverage PyPI - License PyPI - Python Version PyPI Code style: black


A Pythonic Data Explorer, open sourced with ❤️ by Noteable, a collaborative notebook platform that enables teams to use and visualize data, together.

Requirements

Python 3.8+

Installation

Poetry

poetry add dx

Then import the package:

import dx

Pip

pip install dx

Then import the package:

import dx

Usage

The dx library currently enables DEX media type visualization of pandas DataFrame and Series objects, as well as numpy ndarray objects. This can be handled in two ways:

  • explicit dx.display() calls
  • setting the display_mode to update the IPython display formatter for a session

With dx.display()

dx.display() will display a single dataset using the DEX media type. It currently supports:

  • pandas DataFrame objects

    import pandas as pd
    import random
    
    df = pd.DataFrame({
        'random_ints': [random.randint(0, 100) for _ in range(500)],
        'random_floats': [random.random() for _ in range(500)],
    })
    dx.display(df)

  • tabular data as dict or list types

    dx.display([
      [1, 5, 10, 20, 500],
      [1, 2, 3, 4, 5],
      [0, 0, 0, 0, 1]
    ])

  • .csv or .json filepaths

    df = dx.random_dataframe()
    df.to_csv("dx_docs_sample.csv", index=False)
    
    dx.display("dx_docs_sample.csv")

With dx.set_display_mode()

Using either "simple" or "enhanced" display modes will allow dx will update the current IPython display formatters to allow DEX media type visualization of pandas DataFrame objects for an entire notebook / kernel session instead of the default DataFrame display output.

Details

This will adjust pandas options to:

  • increasing the number of rows displayed to 50000 from pandas default of 60
  • increasing the number of columns displayed to 50 from pandas default of 20
  • enabling html.table_schema (False by default in pandas)

This will also handle some basic column cleaning and generate a schema for the DataFrame using pandas.io.json.build_table_schema. Depending on the display mode, the data will be transformed into either a list of dictionaries or list of lists of columnar values.

  • "simple" - list of dictionaries
  • "enhanced" - list of lists

NOTE: Unlike dx.display(), this only affects pandas DataFrames (or any types set in settings.RENDERABLE_TYPES); it does not affect the display of .csv/.json file data, or dict/list outputs

  • dx.set_display_mode("simple")

    import dx
    import numpy as np
    import pandas as pd
    
    # enable DEX display outputs from now on
    dx.set_display_mode("simple")
    
    df = pd.read_csv("dx_docs_sample.csv")
    df
    df2 = pd.DataFrame(
        [
            [1, 5, 10, 20, 500],
            [1, 2, 3, np.nan, 5],
            [0, 0, 0, np.nan, 1]
        ],
        columns=['a', 'b', 'c', 'd', 'e']
    )
    df2

If, at any point, you want to go back to the default display formatting (vanilla pandas output), use the "plain" display mode. This will revert the IPython display format update to its original state and put the pandas options back to their default values.

  • dx.set_display_mode("plain")
    # revert to original pandas display outputs from now on
    dx.set_display_mode("plain")
    
    df = pd.read_csv("dx_docs_sample.csv")
    df
    df2 = pd.DataFrame(
        [
            [1, 5, 10, 20, 500],
            [1, 2, 3, np.nan, 5],
            [0, 0, 0, np.nan, 1]
        ],
        columns=['a', 'b', 'c', 'd', 'e']
    )
    df2

Custom Settings

Default settings for dx can be found by calling dx.settings:

Each can be set using dx.set_option(): Setting DISPLAY_MAX_ROWS to 3 for the current session

...or with the dx.settings_context() context manager: Setting DISPLAY_MAX_ROWS to 3 within the current context, leaving options for the rest of the session alone

Generating Sample Data

Documentation coming soon!

Usage Outside of Noteable

If using this package in a notebook environment outside of Noteable, the frontend should support the following media types:

  • application/vnd.dataresource+json for "simple" display mode
  • application/vnd.dex.v1+json for "enhanced" display mode

Contributing

See CONTRIBUTING.md.

Code of Conduct

We follow the noteable.io code of conduct.

LICENSE

See LICENSE.md.


Open sourced with ❤️ by Noteable for the community.

Boost Data Collaboration with Notebooks