Python benchmarks to process a CSV file

To run the benchmarks, install Pixi, clone this repository, and from inside the repository directory run:

gzip -dk data.csv.gz
pixi install
pixi run bench

Results

The results of processing the CSV file, excluding the time to initialize the Python interpreter and load libraries, are as follows:

Description	File / Function	Time (seconds)
Pure Python looping with csv module using int types	pure_python_int	3.4547557830810547
Pure Python looping with csv module using float types	pure_python_float	3.8738009929656982
pandas with C engine	pandas_c	1.50089430809021
pandas with Python engine	pandas_python	8.328583478927612
pandas with PyArrow engine and NumPy dtypes	pandas_pyarrow	0.31276631355285645
pandas with PyArrow engine and PyArrow dtypes	pandas_pyarrow_arrow	0.29172492027282715
Polars in lazy mode	polars_lazy	0.10555672645568848
Polars in streaming mode	polars_streaming	0.11504125595092773
Polars with SQL API	polars_sql	0.09796714782714844
DuckDB with SQL API	duckdb_sql	0.8167853355407715
DataFusion with SQL API	datafusion_sql	0.20633697509765625
NumPy with loadtxt function	numpy_loadtxt	1.8354885578155518
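
The times above exclude interpreter start-up and library loading. A minimal sketch of how that kind of measurement can be taken is shown here, assuming the timer is started only after all imports have run. The names `sum_first_column` and `time_call`, the in-memory sample data, and the choice to sum a column are hypothetical and only illustrate the idea; they are not taken from the benchmark scripts in this repository.

```python
import csv
import io
import time


def sum_first_column(lines, cast=int):
    """Hypothetical workload: parse rows with the csv module and sum column 0."""
    total = cast(0)
    for row in csv.reader(lines):
        total += cast(row[0])
    return total


def time_call(func, *args, **kwargs):
    """Return (result, elapsed_seconds) for a single call to func."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


# Imports and interpreter start-up have already happened by this point,
# so only the call itself falls inside the timed region.
sample = "1,a\n2,b\n3,c\n"  # assumption: tiny stand-in for data.csv
int_total, int_seconds = time_call(sum_first_column, io.StringIO(sample), cast=int)
float_total, float_seconds = time_call(sum_first_column, io.StringIO(sample), cast=float)

print(f"int:   {int_total} in {int_seconds:.6f} s")
print(f"float: {float_total} in {float_seconds:.6f} s")
```

The same `time_call` wrapper could be placed around a pandas, Polars, DuckDB, DataFusion or NumPy function once the library has been imported, which keeps the import cost out of the reported number.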

The exact version of each library can be seen in the pixi.toml file. Note that DuckDB releases seem to reach conda-forge later than other package managers, so the benchmarks use DuckDB 0.9 even though 0.10 seems to be available elsewhere.

