Install ClickHouse

Add ClickHouse repo

sudo bash -c "echo 'deb https://repo.yandex.ru/clickhouse/deb/stable/ main/' > /etc/apt/sources.list.d/clickhouse.list"

Add key and update repolist

sudo apt-key adv --keyserver keyserver.ubuntu.com --recv E0C56BD4    # optional
sudo apt-get update

Install binaries

sudo apt-get install -y clickhouse-client clickhouse-server

More details on getting started with ClickHouse are available in the official ClickHouse documentation.

Ensure ClickHouse is running

sudo service clickhouse-server restart
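To confirm that the server came up after the restart, one option is to check that its native TCP port accepts connections. The sketch below assumes a default configuration (native protocol on port 9000 on localhost); the function name `clickhouse_port_open` is made up for this illustration and is not part of this repository.

```python
import socket


def clickhouse_port_open(host="localhost", port=9000, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds.

    Assumption: ClickHouse listens on its default native port 9000.
    Adjust the port if your server configuration differs.
    """
    try:
        # create_connection raises OSError on refusal or timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("ClickHouse reachable:", clickhouse_port_open())
```

An open port only shows that something is listening; it does not prove the server is healthy or that queries will succeed.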

Install Python Dependencies

Install the clickhouse-driver python package

pip install clickhouse-driver

If you need to install pip on EC2, instructions can be found here: https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/eb-cli3-install-linux.html

Load Data

The data loading script takes two arguments: the number of workers to use and the CSV file to load. Column names cannot contain spaces, so rename the "Adj Close" column to "Adj_Close" in your input CSV before loading. Output metrics are printed to the console.

python3 load_data_multithread.py [number of workers] [data csv]
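The "Adj Close" to "Adj_Close" rename can be done by hand, or with a few lines of Python. The following is a minimal sketch, not part of this repository: the helper name `rename_adj_close` is hypothetical, and it assumes the CSV has a single header row as its first line.

```python
import csv


def rename_adj_close(src_path, dst_path):
    """Copy a CSV file, renaming the 'Adj Close' header to 'Adj_Close'.

    Assumption: the first row of the file is the header row.
    All data rows are copied through unchanged.
    """
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader, None)
        if header is None:
            return  # empty input file: nothing to write
        writer.writerow(
            ["Adj_Close" if name.strip() == "Adj Close" else name for name in header]
        )
        writer.writerows(reader)
```

For example, `rename_adj_close("prices.csv", "prices_clean.csv")` (both file names are placeholders) writes a cleaned copy that can then be passed to the loading script.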

Run Workloads

All other workloads are contained in a single script. You must specify how many workers to use. Output metrics are printed to the console.

python3 workloads_2_3_4.py [number of workers]
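For readers unfamiliar with the worker-count argument, the general pattern of fanning tasks out to a fixed number of workers and timing the run can be sketched as follows. This is an illustration only, not the code in workloads_2_3_4.py: the function `run_with_workers` and the use of a thread pool are assumptions made for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_with_workers(task, items, num_workers):
    """Apply task to every item using num_workers threads.

    Returns (results, elapsed_seconds). Results keep the order of items.
    Hypothetical helper: the real scripts may organize work differently.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        results = list(pool.map(task, items))
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    # Stand-in task; a real workload would issue ClickHouse queries instead.
    results, elapsed = run_with_workers(lambda x: x * x, range(8), num_workers=4)
    print(f"{len(results)} tasks finished in {elapsed:.4f} s")
```

Threads suit this kind of workload when each task spends most of its time waiting on the database rather than computing in Python.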