Cassandra Performance Profiling

This program uses YCSB to run latency and throughput benchmarks of Cassandra in Docker containers.

The results and analysis I got from running some SCAN and READ workloads are in RESULTS.md.

Navigation

Installation and Setup
How to run the workflow

Installation and Setup

Clone the repo by running git clone https://github.com/youngbryanyu/cassandra-profiling.git.

Docker

Install Docker. We use Docker to run all programs inside isolated containers. Make sure the Docker daemon is running before you start the workflow.
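
One way to confirm that Docker is set up is sketched below. It is not part of this repository: have_cmd is a hypothetical helper name, and the check relies only on the standard docker --version and docker info commands, the latter of which fails when the daemon is not reachable.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from this repo): is a command available on PATH?
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd docker; then
  docker --version
  # `docker info` only succeeds when the client can reach the daemon.
  if docker info >/dev/null 2>&1; then
    echo "docker daemon: running"
  else
    echo "docker daemon: not reachable (start Docker and retry)"
  fi
else
  echo "docker: not found on PATH"
fi
```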

YCSB

Install YCSB in the root directory of this project by running:

curl -O --location https://github.com/brianfrankcooper/YCSB/releases/download/0.17.0/ycsb-0.17.0.tar.gz
tar xfvz ycsb-0.17.0.tar.gz
mv ycsb-0.17.0 ycsb
cd ycsb
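
As a rough sanity check after extraction, the sketch below looks for the launcher and the workload files. It assumes the release archive unpacks to a bin/ycsb launcher and a workloads/ directory; ycsb_ready is a hypothetical helper name, and the check is meant to be run from the project root rather than from inside ycsb.

```shell
#!/usr/bin/env bash
# Assumption: the extracted release contains bin/ycsb and a workloads/ directory.
ycsb_ready() {
  dir="${1:-ycsb}"
  [ -f "$dir/bin/ycsb" ] && [ -d "$dir/workloads" ]
}

if ycsb_ready ycsb; then
  echo "YCSB found in ./ycsb"
else
  echo "YCSB not found in ./ycsb (run this from the project root)"
fi
```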

How to run the workflow

We will run our workflow using the run_workflow.sh script. This script does the following:

Starts a Cassandra node in a Docker container and sets up Cassandra with the specified configuration
Runs the specified benchmark from a Docker container
Processes the benchmark data and creates visualizations
Cleans up the containers and images used, if the -c flag is given

Note: each new benchmark for a configuration overwrites the previous benchmark data for that configuration. On each run, the workflow script also attempts to make a group plot comparing the data across all CSV files in the output directory, so that different configurations can be compared visually. Make sure those data files come from runs with the same YCSB run configuration but different Cassandra configurations.

The usage of the script is:

./run_workflow.sh [-c] <config> <duration_seconds> <operation> <workload> <num_records> <threads>

where:

-c is an optional flag that cleans up and removes the images and containers created by the workflow run
config is one of the supported values: default, lcs, rowcache, lcs-rowcache, filecache
duration_seconds is the duration in seconds to run the benchmark
operation is the YCSB operation being benchmarked (e.g. SCAN, READ, INSERT)
workload is the YCSB workload to use (e.g. workloada, workloadb, workloadc)
num_records is the number of records to load for the benchmark
threads is the number of threads YCSB uses to feed operations to Cassandra

All positional arguments are required, in the order above. Only the -c flag is optional.
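
To make the argument order concrete, here is a minimal pre-flight sketch that checks a candidate argument list and prints the command it would correspond to. It is not how run_workflow.sh validates its input: the helper names (valid_config, is_positive_int, check_args) are hypothetical, the sample values are illustrative, and treating duration_seconds, num_records, and threads as positive integers is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check; not part of run_workflow.sh.

# Accept only the config values listed in this README.
valid_config() {
  case "$1" in
    default|lcs|rowcache|lcs-rowcache|filecache) return 0 ;;
    *) return 1 ;;
  esac
}

# Assumption: numeric arguments must be positive whole numbers.
is_positive_int() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "$1" -gt 0 ]
}

# Expects: <config> <duration_seconds> <operation> <workload> <num_records> <threads>
check_args() {
  [ "$#" -eq 6 ] || { echo "expected 6 arguments, got $#"; return 1; }
  valid_config "$1" || { echo "unsupported config: $1"; return 1; }
  is_positive_int "$2" || { echo "duration_seconds must be a positive integer"; return 1; }
  is_positive_int "$5" || { echo "num_records must be a positive integer"; return 1; }
  is_positive_int "$6" || { echo "threads must be a positive integer"; return 1; }
  echo "ok: ./run_workflow.sh $*"
}

# Illustrative values: default config, 60 s, READ on workloada, 10000 records, 4 threads.
check_args default 60 READ workloada 10000 4
```

Adding -c before the config value in the printed command would also remove the images and containers once the run finishes.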

The raw data from the benchmarks is written to the directory /output/ycsb/. The visualizations plotted from the data are located in the directory /plots/.
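
Since the group plot compares every CSV file in the output directory, one way to keep the runs comparable is to hold the YCSB settings fixed and vary only the Cassandra configuration. The sketch below does that as a dry run by default, printing the commands instead of executing them. The sweep_configs function, the DRY_RUN variable, and the sample values are all illustrative assumptions, not part of the repository.

```shell
#!/usr/bin/env bash
# Sketch: same YCSB settings for every run, different Cassandra configs.
# The values below are illustrative assumptions.
DURATION=60
OPERATION=READ
WORKLOAD=workloada
NUM_RECORDS=10000
THREADS=4

sweep_configs() {
  for config in default lcs rowcache lcs-rowcache filecache; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      # Dry run (the default here): only print the command.
      echo "./run_workflow.sh $config $DURATION $OPERATION $WORKLOAD $NUM_RECORDS $THREADS"
    else
      # Real run: stop at the first failing benchmark.
      ./run_workflow.sh "$config" "$DURATION" "$OPERATION" "$WORKLOAD" "$NUM_RECORDS" "$THREADS" || return 1
    fi
  done
}

sweep_configs
```

Setting DRY_RUN=0 from the project root would execute the five runs one after another.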
