PDX: Helper functions to run SQL on Pandas DataFrames

pip install git+https://github.com/ajfriend/pdx

Small ergonomic improvements to make it easy to run DuckDB queries on Pandas DataFrames.

pdx monkey-patches pandas.DataFrame to provide a df.sql(...) method.
Since pdx uses DuckDB, you can leverage its convenient SQL dialect:
- https://duckdb.org/2022/05/04/friendlier-sql.html
- https://duckdbsnippets.com/

Query a Pandas DataFrame with df.sql(...). Omit the FROM clause because it is added implicitly:

import pdx
iris = pdx.data.get_iris()  # returns pandas.DataFrame

iris.sql("""
select
    species,
    count(*)
        as num,
group by
    1
""")

You can use short SQL (sub-)expressions because FROM and SELECT * are implied whenever they're omitted:

iris.sql('where petal_length > 4.5')

iris.sql('limit 10')

iris.sql('order by petal_length')

iris.sql('')  # returns the dataframe unmodified, i.e., 'select * from iris'
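
One plausible way to see why these fragments work is that each can be completed into a full query by prepending a FROM clause, since DuckDB accepts FROM-first queries and implies SELECT * when no SELECT is given. The sketch below only builds the query strings; `expand_fragment` is a hypothetical helper for illustration and not part of pdx.

```python
def expand_fragment(fragment: str, table: str = "iris") -> str:
    """Turn a short SQL fragment into a full FROM-first DuckDB query string."""
    # Prepend the FROM clause; strip() handles the empty-fragment case.
    return f"from {table} {fragment}".strip()


for fragment in ["where petal_length > 4.5", "limit 10", "order by petal_length", ""]:
    print(expand_fragment(fragment))
# from iris where petal_length > 4.5
# from iris limit 10
# from iris order by petal_length
# from iris
```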

For more, check out the examples in the notebooks folder.

Other affordances

git clone https://github.com/duckdb/duckdb.git
cd duckdb
../env/bin/pip install -e tools/pythonpkg --verbose
