plydata is a library that provides a grammar for data manipulation. The grammar consists of verbs that can be applied to pandas dataframes or database tables. It is based on the R package dplyr. plydata uses the >> operator as a pipe symbol.
At present the only supported data store is the pandas dataframe. We expect to support sqlite and maybe postgresql and mysql.
plydata only supports Python 3.
Official version
$ pip install plydata
Development version
$ pip install git+https://github.com/has2k1/plydata.git@master
import pandas as pd
import numpy as np
from plydata import define, query, if_else
df = pd.DataFrame({
    'x': [0, 1, 2, 3],
    'y': ['zero', 'one', 'two', 'three']})
df >> define(z='x')
"""
   x      y  z
0  0   zero  0
1  1    one  1
2  2    two  2
3  3  three  3
"""
df >> define(z=if_else('x > 1', 1, 0))
"""
   x      y  z
0  0   zero  0
1  1    one  0
2  2    two  1
3  3  three  1
"""
# You can pass the dataframe as the first argument
query(df, 'x > 1')  # same as `df >> query('x > 1')`
"""
   x      y
2  2    two
3  3  three
"""
plydata piping works with plotnine: a dataframe can be piped through plydata verbs and then directly into a ggplot object.
from plotnine import ggplot, aes, geom_line
df = pd.DataFrame({'x': np.linspace(0, 2*np.pi, 500)})
(df
 >> define(y='np.sin(x)')
 >> define(sign=if_else('y >= 0', '"positive"', '"negative"'))
 >> (ggplot(aes('x', 'y'))
     + geom_line(aes(color='sign'), size=1.5))
)
dplython and pandas-ply are two other packages with a similar objective to plydata. The main difference is that plydata does not use a placeholder variable (X) as a stand-in for the dataframe; columns are referred to by name, as strings. For example:
diamonds >> select(X.carat, X.cut, X.price) # dplython
diamonds >> select('carat', 'cut', 'price') # plydata
select(diamonds, 'carat', 'cut', 'price') # plydata
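The two plydata call forms above, piped and direct, can coexist in one function. One way such an interface can be built is with Python's reflected operator method `__rrshift__`, which Python tries when the left operand does not handle `>>`. The following is a minimal pure-Python sketch of that idea and should not be read as plydata's actual implementation; `Pipeable` and `keep_if` are hypothetical names, and a plain list of dicts stands in for a dataframe.

```python
class Pipeable:
    """Hypothetical helper: holds a function and its arguments until
    data arrives on the left-hand side of >>."""

    def __init__(self, func, *args):
        self.func = func
        self.args = args

    def __rrshift__(self, data):
        # Python calls this for `data >> self` when type(data) does not
        # implement >> itself (plain lists do not).
        return self.func(data, *self.args)


def keep_if(*args):
    """Hypothetical verb supporting both call forms:
    keep_if(rows, pred)  and  rows >> keep_if(pred)."""
    if len(args) == 2:
        # Direct form: the data is the first argument.
        rows, pred = args
        return [row for row in rows if pred(row)]
    # Piped form: only the predicate was given, so defer the call.
    (pred,) = args
    return Pipeable(keep_if, pred)


rows = [{'x': 0}, {'x': 1}, {'x': 2}, {'x': 3}]

piped = rows >> keep_if(lambda row: row['x'] > 1)
direct = keep_if(rows, lambda row: row['x'] > 1)
```

Both `piped` and `direct` hold the same filtered rows, which mirrors how `diamonds >> select(...)` and `select(diamonds, ...)` are interchangeable in the example above.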
For more, see the documentation.