benjaminbluhm/spark_parallel_forecasting

This repository contains the source code and dataset needed to reproduce the parallel computing exercise described in the paper:

Time Series Econometrics at Scale - A Practical Guide to Parallel Computing in (Py)Spark

Abstract

This paper provides a practical programming guide to setting up a minimum working example of a distributed system for parallel time series analysis. The system is built in Apache Spark on top of Amazon's Hadoop-based service Elastic MapReduce (EMR). A simple forecasting exercise with 1,000 time series illustrates the proposed parallelization scheme, which reduces total runtime by about 95% relative to a single-core, single-machine setting. The ease of implementing this scheme makes this guide a useful reference for econometricians with a limited background in parallel programming. To facilitate reproducibility of the practical steps in this guide, the PySpark/Python code is available for download on GitHub.

Link to the paper: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3226976
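The core idea of the exercise is that each of the 1,000 series is an independent forecasting task, so the per-series model fits can be mapped across workers with no coordination between them. As a minimal single-machine sketch of that task mapping (not the paper's actual PySpark/EMR code), the same pattern can be written with Python's standard library; `naive_forecast` and `run_parallel` are hypothetical names, and the historical-mean forecast is a stand-in for a real time series model:

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def naive_forecast(series):
    """Stand-in model: forecast the next value as the historical mean.

    The paper fits a proper time series model per series; any function
    mapping one series to one forecast slots in here unchanged.
    """
    return mean(series)


def run_parallel(all_series, workers=4):
    """Map one independent forecasting task per series across a worker pool.

    This mirrors the structure of the Spark scheme, where each series is
    shipped to an executor. Threads on one machine only illustrate the
    mapping; the runtime reduction reported in the paper comes from
    distributing the fits across many cluster cores.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(naive_forecast, all_series))


# A 95% runtime reduction corresponds to roughly a 20x speedup:
# parallel_time = serial_time * (1 - 0.95), so serial_time / parallel_time = 20.
```

For example, `run_parallel([[1.0, 2.0, 3.0]])` returns `[2.0]`. The design point the paper exploits is that the tasks share no state, so the scheme scales by simply adding workers (or, in Spark, executors).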
