DonaldWhyte/high-performance-data-processing-in-python

High Performance Data Processing in Python

Talk demonstrating how to massively optimise data processing and numerical computation in Python. We perform outlier detection on a large time-series weather dataset (ISD), cutting the time to detect outliers in 600 GB of data from 28 days to 38 minutes.

Topics covered:

  • motivations for fast numerical processing in Python
  • why Python is a slow programming language
  • fast numerical processing in numpy
  • vectorisation
  • using numba to optimise non-vectorised code
  • parallelising computation using joblib
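
To illustrate the vectorisation topic, here is a minimal sketch comparing a pure-Python loop with an equivalent whole-array NumPy expression. It is not taken from the talk: the z-score rule, the `threshold` parameter, the function names `find_outliers_loop` and `find_outliers_vectorised`, and the synthetic readings are all assumptions made for illustration, and the talk's own outlier detection method may differ.

```python
import numpy as np


def find_outliers_loop(values, threshold=3.0):
    """Pure-Python baseline (hypothetical helper).

    Flags values lying more than `threshold` standard deviations from the mean.
    """
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [abs(v - mean) > threshold * std for v in values]


def find_outliers_vectorised(values, threshold=3.0):
    """Same rule expressed as whole-array NumPy operations (hypothetical helper)."""
    values = np.asarray(values, dtype=np.float64)
    mean = values.mean()
    std = values.std()  # population standard deviation (ddof=0), matching the loop
    return np.abs(values - mean) > threshold * std


# Synthetic temperature-like readings with two injected spikes (not ISD data).
rng = np.random.default_rng(seed=0)
readings = rng.normal(loc=15.0, scale=2.0, size=10_000)
readings[[100, 5000]] = [80.0, -60.0]

loop_mask = find_outliers_loop(readings.tolist())
vector_mask = find_outliers_vectorised(readings)
```

Both functions apply the same rule, but the vectorised version pushes the per-element work into NumPy's compiled loops instead of the Python interpreter, which is the kind of change the vectorisation topic refers to. Timing the two calls (for example with `timeit`) on a larger array should make the difference visible.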

Running the Presentation

You can run the presentation on a local web server. Clone this repository, then run the following commands from the repository root:

npm install
grunt serve

The presentation can now be accessed at localhost:8080. Note that the web application is configured to bind to 0.0.0.0, so once the Grunt server is running it is also reachable from external hosts (using the current host's public IP address).
