forked from timescale/tsbs

Time Series Benchmark Suite, a tool for comparing and evaluating databases for time series data

danl8/tsbs2

Fork change list:

IMPORTANT: All changes were made in the tsbs_load program. Because the tsbs_load_<dbname> programs sometimes use a different code base, not all changes made in tsbs_load are available in them. The tsbs_load_<dbname> programs have not been tested at all.

Online data generation (on-the-fly simulation)

Online data generation (benchmarking without writing to a temporary file) now works for IoT scenarios with the following databases:

  • VictoriaMetrics
  • QuestDB
  • ClickHouse

Multithreading for online data generation

The online (on-the-fly) benchmark can now be run in multithreaded mode.

IMPORTANT: In this mode each run may vary slightly. Because the data is generated in parallel, there is no guarantee that the random generator is called in the same order as in a previous run, so the generated data will not be identical between runs, even with the same seed.

To control multithreading, a new yaml parameter was added:

data-source:
    simulator:
            sim-workers-count: 4
  • If sim-workers-count = 1, data generation runs in single-thread mode as before.
  • If sim-workers-count > 1, data generation is divided into sim-workers-count sub-batches, and each sub-batch is processed in a separate thread.
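
The sub-batch split described above can be sketched roughly as follows. This is not the fork's code: splitBatch, generate, and the float64 "points" are hypothetical names chosen for illustration. The sketch also gives every worker its own seeded generator, which is an assumption made here to keep the example simple; it is not what the note above describes, where the order of calls to the random generator is not guaranteed.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
)

// subBatch is a half-open index range [Start, End) of points to generate.
type subBatch struct{ Start, End int }

// splitBatch divides batchSize points into at most `workers` contiguous
// sub-batches whose sizes differ by no more than one.
func splitBatch(batchSize, workers int) []subBatch {
	if workers < 1 {
		workers = 1
	}
	var out []subBatch
	base, extra := batchSize/workers, batchSize%workers
	start := 0
	for w := 0; w < workers; w++ {
		size := base
		if w < extra {
			size++
		}
		if size == 0 {
			continue
		}
		out = append(out, subBatch{start, start + size})
		start += size
	}
	return out
}

// generate fills one batch in parallel. Each goroutine writes only to its
// own index range, so no locking is needed on the result slice.
func generate(batchSize, workers int, seed int64) []float64 {
	points := make([]float64, batchSize)
	var wg sync.WaitGroup
	for i, sb := range splitBatch(batchSize, workers) {
		wg.Add(1)
		go func(id int, sb subBatch) {
			defer wg.Done()
			// Assumption of this sketch: one private generator per worker.
			rng := rand.New(rand.NewSource(seed + int64(id)))
			for j := sb.Start; j < sb.End; j++ {
				points[j] = rng.Float64()
			}
		}(i, sb)
	}
	wg.Wait()
	return points
}

func main() {
	fmt.Println(splitBatch(10, 4)) // [{0 3} {3 6} {6 8} {8 10}]
	fmt.Println(len(generate(10, 4, 123)))
}
```

With a single shared generator instead of one per worker, the values each sub-batch receives would depend on goroutine scheduling, which matches the run-to-run variation mentioned above.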

Tested only with:

  • VictoriaMetrics
  • QuestDB
  • ClickHouse

IoT2 request generation

IoT2 test case description

A new test case, IoT2, was added. It uses the same data set as IoT but contains queries relevant to my use cases. This test case is implemented for the following databases:

  • VictoriaMetrics
  • QuestDB
  • ClickHouse

Because MetricsQL (PromQL) differs considerably from SQL in structure and underlying concepts, I am not sure that the VictoriaMetrics test case calculates exactly the same results as the QuestDB and ClickHouse ones. It should, however, be very similar in terms of calculation complexity.

IoT2 configuration

To generate queries for IoT2, run tsbs_generate_queries with --use-case="iot2". Additionally, you should specify:

  • --trucks-count: the total number of distinct trucks in your dataset. IoT2 queries data for a random truck, so this count is mandatory to set the correct bound for the random generator.
  • --days-count: IoT2 queries data starting from a random date and covering the number of days given in this parameter.
  • --scale: you can omit this parameter; its value is ignored.

All other parameters have the same meaning as in other use cases. Example of correct IoT2 queries generation for ClickHouse:

tsbs_generate_queries --use-case="iot2" --seed=123 \
    --timestamp-start="2010-07-25T00:00:00Z" --timestamp-end="2010-08-01T00:00:01Z" \
    --queries=1000 --query-type="all-in-order" \
    --trucks-count=100 --days-count=1 \
    --format="clickhouse" \
    --file "queries_iot2_all_1000_clickhouse.txt"

Implemented query types

For the --query-type parameter you can specify the following values:

Value of --query-type Description
all-in-order Generates all of the query types listed below, cycling through them in order
daily-average-load Queries the average load of a random truck over N consecutive days (--days-count)
daily-fuel-consumption-row Queries all rows with registered fuel consumption for a random truck over N consecutive days (--days-count)
daily-low-fuel-count Counts the rows where the fuel level is <= 10% for a random truck over N consecutive days (--days-count)
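
As a small illustration of all-in-order, the sketch below cycles through the three concrete query types. The cycling order is assumed to follow the table order; queryTypeAt is a hypothetical helper, not a function from the code base.

```go
package main

import "fmt"

// iot2QueryTypes lists the concrete IoT2 query types in table order.
var iot2QueryTypes = []string{
	"daily-average-load",
	"daily-fuel-consumption-row",
	"daily-low-fuel-count",
}

// queryTypeAt returns the type of the i-th generated query (0-based)
// when the types are cycled in order.
func queryTypeAt(i int) string {
	return iot2QueryTypes[i%len(iot2QueryTypes)]
}

func main() {
	for i := 0; i < 5; i++ {
		fmt.Println(i, queryTypeAt(i))
	}
}
```

Under this assumption, --queries=1000 would yield 334 queries of the first type and 333 of each of the other two.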

Optimisation for ClickHouse DB Structure

These changes should reduce the storage space used per data point and may improve query performance. The optimisations include:

  • DoubleDelta codec for date columns
  • Gorilla compression for time series value columns
  • LowCardinality for string labels
  • Removal of the time String and additional_tags String columns

To use the new optimized structure, please specify the following yaml parameter (or the corresponding program argument):

loader:
  db-specific:
    use-optimized-structure: true

IMPORTANT: The optimized structure was tested only with the iot ingestion scenario and the iot2 query scenario. Additional coding may be needed to use the optimized structure with other scenarios.

Other changes

  • ClickHouse now uses the new syntax for MergeTree table initialization.
  • Planned: add an ignore-fake-tags parameter, false by default, to be honored in the IoT data generation scenarios:
    • If false, the benchmark behaves as before: every numeric label (a "fake tag") is added as a new time series, while string tags are still added as labels. This is not in line with the ClickHouse benchmark logic, because ClickHouse writes such tags to a small separate table and does not create time series for them. As a result, for the same scale and period the ClickHouse benchmark writes fewer metrics than the VictoriaMetrics, QuestDB, InfluxDB... benchmarks.
    • If true, the VictoriaMetrics, QuestDB, InfluxDB... benchmarks ignore all numeric labels (fake tags), so for the same period and scale these databases write the same number of metrics as ClickHouse.
    • WARNING: ignore-fake-tags = true is currently hardcoded for VictoriaMetrics, QuestDB, and InfluxDB and should be moved to configuration.
    • Note that ignore-fake-tags = false was never implemented for ClickHouse.
    • For databases other than VictoriaMetrics, QuestDB, InfluxDB, and ClickHouse this flag is not implemented, and how they process non-text labels is unknown.
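
The effect of the flag on the number of written metrics can be pictured with a small sketch. The tag type, the seriesCount helper, and the example numbers are hypothetical; the code only mirrors the rule stated above, that each numeric tag becomes an extra time series unless fake tags are ignored.

```go
package main

import "fmt"

// tag is one label attached to a device; Numeric marks a "fake tag".
type tag struct {
	Key     string
	Numeric bool
}

// seriesCount returns how many series one device contributes per reading:
// its measured fields, plus one per numeric tag when fake tags are kept.
func seriesCount(fields int, tags []tag, ignoreFakeTags bool) int {
	n := fields
	if ignoreFakeTags {
		return n
	}
	for _, t := range tags {
		if t.Numeric {
			n++
		}
	}
	return n
}

func main() {
	// Illustrative tag set: two string labels and two numeric ones.
	tags := []tag{
		{"name", false},
		{"fleet", false},
		{"load_capacity", true},
		{"fuel_capacity", true},
	}
	fmt.Println(seriesCount(8, tags, false)) // 10
	fmt.Println(seriesCount(8, tags, true))  // 8
}
```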

Original description:

Time Series Benchmark Suite (TSBS)

This repo contains code for benchmarking several time series databases, including TimescaleDB, MongoDB, InfluxDB, CrateDB and Cassandra. This code is based on a fork of work initially made public by InfluxDB at https://github.com/influxdata/influxdb-comparisons.

Current databases supported:

Overview

The Time Series Benchmark Suite (TSBS) is a collection of Go programs that are used to generate datasets and then benchmark read and write performance of various databases. The intent is to make the TSBS extensible so that a variety of use cases (e.g., devops, IoT, finance, etc.), query types, and databases can be included and benchmarked. To this end we hope to help prospective database administrators find the best database for their needs and their workloads. Further, if you are the developer of a time series database and want to include your database in the TSBS, feel free to open a pull request to add it!

Current use cases

Currently, TSBS supports two use cases.

Dev ops

A 'dev ops' use case, which comes in two forms. The full form is used to generate, insert, and measure data from 9 'systems' that could be monitored in a real world dev ops scenario (e.g., CPU, memory, disk, etc). Together, these 9 systems generate 100 metrics per reading interval. The alternate form focuses solely on CPU metrics for a simpler, more streamlined use case. This use case generates 10 CPU metrics per reading.

In addition to metric readings, 'tags' (including the location of the host, its operating system, etc) are generated for each host with readings in the dataset. Each unique set of tags identifies one host in the dataset and the number of different hosts generated is defined by the scale flag (see below).

Internet of Things (IoT)

The second use case is meant to simulate the data load in an IoT environment. This use case simulates data streaming from a set of trucks belonging to a fictional trucking company. This use case simulates diagnostic data and metrics from each truck, and introduces environmental factors such as out-of-order data and batch ingestion (for trucks that are offline for a period of time). It also tracks truck metadata and uses this to tie metrics and diagnostics together as part of the query set.

The queries that are generated as part of this use case will cover both real time truck status and analytics that will look at the time series data in an effort to be more predictive about truck behavior. The scale factor with this use case will be based on the number of trucks tracked.


Not all databases implement all use cases. The table below shows which use cases are implemented for each database:

Database Dev ops IoT
Akumuli X¹
Cassandra X
ClickHouse X
CrateDB X
InfluxDB X X
MongoDB X
QuestDB X X
SiriDB X
TimescaleDB X X
Timestream X
VictoriaMetrics X²

¹ Does not support the groupby-orderby-limit query
² Does not support the groupby-orderby-limit, lastpoint, high-cpu-1, high-cpu-all queries

What the TSBS tests

TSBS is used to benchmark bulk load performance and query execution performance. (It currently does not measure concurrent insert and query performance, which is a future priority.) To accomplish this in a fair way, the data to be inserted and the queries to run are pre-generated and native Go clients are used wherever possible to connect to each database (e.g., mgo for MongoDB, aws sdk for Timestream).

Although the data is randomly generated, TSBS data and queries are entirely deterministic. By supplying the same PRNG (pseudo-random number generator) seed to the generation programs, each database is loaded with identical data and queried using identical queries.
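
The determinism claim rests on a standard property of seeded pseudo-random generators, which the following sketch demonstrates with Go's math/rand. The readings function is a hypothetical stand-in for a data generator, not TSBS code.

```go
package main

import (
	"fmt"
	"math/rand"
)

// readings produces n pseudo-random values from a generator seeded with seed.
func readings(seed int64, n int) []float64 {
	rng := rand.New(rand.NewSource(seed))
	out := make([]float64, n)
	for i := range out {
		out[i] = rng.Float64() * 100
	}
	return out
}

// equal reports whether two slices hold the same values in the same order.
func equal(a, b []float64) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	a, b := readings(123, 5), readings(123, 5)
	fmt.Println(equal(a, b)) // true: same seed, same sequence
	fmt.Println(equal(a, readings(124, 5)))
}
```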

Installation

TSBS is a collection of Go programs (with some auxiliary bash and Python scripts). The easiest way to get and install the Go programs is to use go get and then make all to install all binaries:

# Fetch TSBS and its dependencies
$ go get github.com/timescale/tsbs
$ cd $GOPATH/src/github.com/timescale/tsbs
$ make

How to use TSBS

Using TSBS for benchmarking involves 3 phases: data and query generation, data loading/insertion, and query execution.

Data and query generation

So that benchmarking results are not affected by generating data or queries on-the-fly, with TSBS you generate the data and queries you want to benchmark first, and then you can (re-)use it as input to the benchmarking phases.

Data generation

Variables needed:

  1. a use case. E.g., iot (choose from cpu-only, devops, or iot)
  2. a PRNG seed for deterministic generation. E.g., 123
  3. the number of devices / trucks to generate for. E.g., 4000
  4. a start time for the data's timestamps. E.g., 2016-01-01T00:00:00Z
  5. an end time. E.g., 2016-01-04T00:00:00Z
  6. how much time should be between each reading per device, in seconds. E.g., 10s
  7. and which database(s) you want to generate for. E.g., timescaledb (choose from cassandra, clickhouse, cratedb, influx, mongo, questdb, siridb, timescaledb or victoriametrics)

Given the above steps you can now generate a dataset (or multiple datasets, if you chose to generate for multiple databases) that can be used to benchmark data loading of the database(s) chosen using the tsbs_generate_data tool:

$ tsbs_generate_data --use-case="iot" --seed=123 --scale=4000 \
    --timestamp-start="2016-01-01T00:00:00Z" \
    --timestamp-end="2016-01-04T00:00:00Z" \
    --log-interval="10s" --format="timescaledb" \
    | gzip > /tmp/timescaledb-data.gz

# Each additional database would be a separate call.

Note: We pipe the output to gzip to reduce on-disk space. This also requires you to pipe through gunzip when you run your tests.

The example above will generate a pseudo-CSV file that can be used to bulk load data into TimescaleDB. Each database has its own format for storing the data, chosen to make it easiest for its corresponding loader to write data. The above configuration will generate just over 100M rows (1B metrics), which is usually a good starting point. Increasing the time period by a day will add an additional ~33M rows so that, e.g., 30 days would yield a billion rows (10B metrics).

IoT use case

The main difference between the iot use case and other use cases is that it generates data which can contain out-of-order, missing, or empty entries to better represent real-life scenarios associated to the use case. Using a specified seed means that we can do this in a deterministic and reproducible way for multiple runs of data generation.

Query generation

Variables needed:

  1. the same use case, seed, # of devices, and start time as used in data generation
  2. an end time that is one second after the end time from data generation. E.g., for 2016-01-04T00:00:00Z use 2016-01-04T00:00:01Z
  3. the number of queries to generate. E.g., 1000
  4. and the type of query you'd like to generate. E.g., single-groupby-1-1-1 or last-loc
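
If you script the generation, the query end time from step 2 can be derived from the data end time rather than typed by hand. A minimal sketch, where queryEnd is a hypothetical helper:

```go
package main

import (
	"fmt"
	"time"
)

// queryEnd returns the RFC 3339 timestamp one second after dataEnd.
func queryEnd(dataEnd string) (string, error) {
	t, err := time.Parse(time.RFC3339, dataEnd)
	if err != nil {
		return "", err
	}
	return t.Add(time.Second).Format(time.RFC3339), nil
}

func main() {
	end, err := queryEnd("2016-01-04T00:00:00Z")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(end) // 2016-01-04T00:00:01Z
}
```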

For the last step there are numerous queries to choose from, which are listed in Appendix I. Additionally, the file scripts/generate_queries.sh contains a list of all of them as the default value for the environment variable QUERY_TYPES. If you are generating more than one type of query, we recommend you use the helper script.

For generating just one set of queries for a given type:

$ tsbs_generate_queries --use-case="iot" --seed=123 --scale=4000 \
    --timestamp-start="2016-01-01T00:00:00Z" \
    --timestamp-end="2016-01-04T00:00:01Z" \
    --queries=1000 --query-type="breakdown-frequency" --format="timescaledb" \
    | gzip > /tmp/timescaledb-queries-breakdown-frequency.gz

Note: We pipe the output to gzip to reduce on-disk space. This also requires you to pipe through gunzip when you run your tests.

For generating sets of queries for multiple types:

$ FORMATS="timescaledb" SCALE=4000 SEED=123 \
    TS_START="2016-01-01T00:00:00Z" \
    TS_END="2016-01-04T00:00:01Z" \
    QUERIES=1000 QUERY_TYPES="last-loc low-fuel avg-load" \
    BULK_DATA_DIR="/tmp/bulk_queries" scripts/generate_queries.sh

A full list of query types can be found in Appendix I at the end of this README.

Benchmarking insert/write performance

TSBS has two ways to benchmark insert/write performance:

  • On the fly simulation and load with tsbs_load
  • Pre-generate data to a file and load it either with tsbs_load or the db specific executables tsbs_load_*

Using the unified tsbs_load executable

The tsbs_load executable can load data in any of the supported databases. It can use a pregenerated data file as input, or simulate the data on the fly.

You first start by generating a config yaml file populated with the default values for each property with:

$ tsbs_load config --target=<db-name> --data-source=[FILE|SIMULATOR]

for example, to generate an example for TimescaleDB, loading the data from file

$ tsbs_load config --target=timescaledb --data-source=FILE
Wrote example config to: ./config.yaml

You can then run tsbs_load with the generated config file with:

$ tsbs_load load timescaledb --config=./config.yaml

For more details on how to use tsbs_load check out the supplemental docs

Using the database specific tsbs_load_* executables

TSBS measures insert/write performance by taking the data generated in the previous step and using it as input to a database-specific command line program. To the extent that insert programs can be shared, we have made an effort to do that (e.g., the TimescaleDB loader can be used with a regular PostgreSQL database if desired). Each loader does share some common flags -- e.g., batch size (number of readings inserted together), workers (number of concurrently inserting clients), connection details (host & ports), etc -- but they also have database-specific tuning flags. To find the flags for a particular database, use the -help flag (e.g., tsbs_load_timescaledb -help).

Here's an example of loading data to a remote timescaledb instance with SSL required, with a gzipped data set as created in the instructions above:

cat /tmp/timescaledb-data.gz | gunzip | tsbs_load_timescaledb \
--postgres="sslmode=require" --host="my.tsdb.host" --port=5432 --pass="password" \
--user="benchmarkuser" --admin-db-name=defaultdb --workers=8  \
--in-table-partition-tag=true --chunk-time=8h --write-profile= \
--field-index-count=1 --do-create-db=true --force-text-format=false \
--do-abort-on-exist=false

For simpler testing, especially locally, we also supply scripts/load/load_<database>.sh for convenience with many of the flags set to a reasonable default for some of the databases. So for loading into TimescaleDB, ensure that TimescaleDB is running and then use:

# Will insert using 2 clients, batch sizes of 10k, from a file
# named `timescaledb-data.gz` in directory `/tmp`
$ NUM_WORKERS=2 BATCH_SIZE=10000 BULK_DATA_DIR=/tmp \
    scripts/load/load_timescaledb.sh

This will create a new database called benchmark where the data is stored. It will overwrite the database if it exists; if you don't want that to happen, supply a different DATABASE_NAME to the above command.

Example for writing to remote host using load_timescaledb.sh:

# Will insert using 2 clients, batch sizes of 10k, from a file
# named `timescaledb-data.gz` in directory `/tmp`
$ NUM_WORKERS=2 BATCH_SIZE=10000 BULK_DATA_DIR=/tmp \
    DATABASE_HOST=remotehostname DATABASE_USER=user \
    scripts/load/load_timescaledb.sh

By default, statistics about the load performance are printed every 10s, and when the full dataset is loaded the output looks like this:

time,per. metric/s,metric total,overall metric/s,per. row/s,row total,overall row/s
# ...
1518741528,914996.143291,9.652000E+08,1096817.886674,91499.614329,9.652000E+07,109681.788667
1518741548,1345006.018902,9.921000E+08,1102333.152918,134500.601890,9.921000E+07,110233.315292
1518741568,1149999.844750,1.015100E+09,1103369.385320,114999.984475,1.015100E+08,110336.938532

Summary:
loaded 1036800000 metrics in 936.525765sec with 8 workers (mean rate 1107070.449780/sec)
loaded 103680000 rows in 936.525765sec with 8 workers (mean rate 110707.044978/sec)

All but the last two lines contain the data in CSV format, with column names in the header. Those column names correspond to:

  • timestamp,
  • metrics per second in the period,
  • total metrics inserted,
  • overall metrics per second,
  • rows per second in the period,
  • total number of rows,
  • overall rows per second.

For databases, like Cassandra, that do not use rows when inserting, the last three values are always empty (indicated with a -).

The last two lines are a summary of how many metrics (and rows where applicable) were inserted, the wall time it took, and the average rate of insertion.
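
If you want to post-process these statistics, the CSV lines can be parsed with a few lines of Go. The sketch below is one possible approach: loadStats and parseStatsLine are hypothetical names, and the handling of "-" follows the description above for databases that do not report rows.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// loadStats holds one periodic statistics line, in column order.
type loadStats struct {
	Timestamp                                 int64
	PeriodMetricRate, MetricTotal, MetricRate float64
	PeriodRowRate, RowTotal, RowRate          float64
	HasRows                                   bool // false when row columns are "-"
}

// parseStatsLine parses one 7-column CSV statistics line.
func parseStatsLine(line string) (loadStats, error) {
	var s loadStats
	f := strings.Split(strings.TrimSpace(line), ",")
	if len(f) != 7 {
		return s, fmt.Errorf("expected 7 fields, got %d", len(f))
	}
	var err error
	if s.Timestamp, err = strconv.ParseInt(f[0], 10, 64); err != nil {
		return s, err
	}
	for i, p := range []*float64{&s.PeriodMetricRate, &s.MetricTotal, &s.MetricRate} {
		if *p, err = strconv.ParseFloat(f[1+i], 64); err != nil {
			return s, err
		}
	}
	if f[4] == "-" {
		return s, nil // database does not insert by rows
	}
	s.HasRows = true
	for i, p := range []*float64{&s.PeriodRowRate, &s.RowTotal, &s.RowRate} {
		if *p, err = strconv.ParseFloat(f[4+i], 64); err != nil {
			return s, err
		}
	}
	return s, nil
}

func main() {
	line := "1518741528,914996.143291,9.652000E+08,1096817.886674,91499.614329,9.652000E+07,109681.788667"
	s, err := parseStatsLine(line)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%d %.0f\n", s.Timestamp, s.RowTotal) // 1518741528 96520000
}
```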

Benchmarking query execution performance

To measure query execution performance in TSBS, you first need to load the data using the previous section and generate the queries as described earlier. Once the data is loaded and the queries are generated, just use the corresponding tsbs_run_queries_ binary for the database being tested:

$ cat /tmp/queries/timescaledb-cpu-max-all-eight-hosts-queries.gz | \
    gunzip | tsbs_run_queries_timescaledb --workers=8 \
        --postgres="host=localhost user=postgres sslmode=disable"

You can change the value of the --workers flag to control the level of parallel queries run at the same time. The resulting output will look similar to this:

run complete after 1000 queries with 8 workers:
TimescaleDB max cpu all fields, rand    8 hosts, rand 12hr by 1h:
min:    51.97ms, med:   757.55, mean:  2527.98ms, max: 28188.20ms, stddev:  2843.35ms, sum: 5056.0sec, count: 2000
all queries                                                     :
min:    51.97ms, med:   757.55, mean:  2527.98ms, max: 28188.20ms, stddev:  2843.35ms, sum: 5056.0sec, count: 2000
wall clock time: 633.936415sec

The output gives you the description of the query and multiple groupings of measurements (which may vary depending on the database).


For easier testing of multiple queries, we provide scripts/generate_run_script.py which creates a bash script with commands to run multiple query types in a row. The queries it generates should be put in a file with one query per line and the path given to the script. For example, if you had a file named queries.txt that looked like this:

last-loc
avg-load
high-load
long-driving-session

You could generate a run script named query_test.sh:

# Generate run script for TimescaleDB, using queries in `queries.txt`
# with the generated query files in /tmp/queries for 8 workers
$ python generate_run_script.py -d timescaledb -o /tmp/queries \
    -w 8 -f queries.txt > query_test.sh

And the resulting script file would look like:

#!/bin/bash
# Queries
cat /tmp/queries/timescaledb-last-loc-queries.gz | gunzip | query_benchmarker_timescaledb --workers=8 --limit=1000 --hosts="localhost" --postgres="user=postgres sslmode=disable"  | tee query_timescaledb_timescaledb-last-loc-queries.out

cat /tmp/queries/timescaledb-avg-load-queries.gz | gunzip | query_benchmarker_timescaledb --workers=8 --limit=1000 --hosts="localhost" --postgres="user=postgres sslmode=disable"  | tee query_timescaledb_timescaledb-avg-load-queries.out

cat /tmp/queries/timescaledb-high-load-queries.gz | gunzip | query_benchmarker_timescaledb --workers=8 --limit=1000 --hosts="localhost" --postgres="user=postgres sslmode=disable"  | tee query_timescaledb_timescaledb-high-load-queries.out

cat /tmp/queries/timescaledb-long-driving-session-queries.gz | gunzip | query_benchmarker_timescaledb --workers=8 --limit=1000 --hosts="localhost" --postgres="user=postgres sslmode=disable"  | tee query_timescaledb_timescaledb-long-driving-session-queries.out

Query validation (optional)

Additionally, each tsbs_run_queries_ binary allows you to print the actual query results so that you can check that the results are the same across databases. Using the flag -print-responses will return the results.

Appendix I: Query types

Devops / cpu-only

Query type Description
single-groupby-1-1-1 Simple aggregate (MAX) on one metric for 1 host, every 5 mins for 1 hour
single-groupby-1-1-12 Simple aggregate (MAX) on one metric for 1 host, every 5 mins for 12 hours
single-groupby-1-8-1 Simple aggregate (MAX) on one metric for 8 hosts, every 5 mins for 1 hour
single-groupby-5-1-1 Simple aggregate (MAX) on 5 metrics for 1 host, every 5 mins for 1 hour
single-groupby-5-1-12 Simple aggregate (MAX) on 5 metrics for 1 host, every 5 mins for 12 hours
single-groupby-5-8-1 Simple aggregate (MAX) on 5 metrics for 8 hosts, every 5 mins for 1 hour
cpu-max-all-1 Aggregate across all CPU metrics per hour over 1 hour for a single host
cpu-max-all-8 Aggregate across all CPU metrics per hour over 1 hour for eight hosts
double-groupby-1 Aggregate across both time and host, giving the average of 1 CPU metric per host per hour for 24 hours
double-groupby-5 Aggregate across both time and host, giving the average of 5 CPU metrics per host per hour for 24 hours
double-groupby-all Aggregate across both time and host, giving the average of all (10) CPU metrics per host per hour for 24 hours
high-cpu-all All the readings where one metric is above a threshold across all hosts
high-cpu-1 All the readings where one metric is above a threshold for a particular host
lastpoint The last reading for each host
groupby-orderby-limit The last 5 aggregate readings (across time) before a randomly chosen endpoint

IoT

Query type Description
last-loc Fetch real-time (i.e. last) location of each truck
low-fuel Fetch all trucks with low fuel (less than 10%)
high-load Fetch trucks with high current load (over 90% load capacity)
stationary-trucks Fetch all trucks that are stationary (low avg velocity in last 10 mins)
long-driving-sessions Get trucks which haven't rested for at least 20 mins in the last 4 hours
long-daily-sessions Get trucks which drove more than 10 hours in the last 24 hours
avg-vs-projected-fuel-consumption Calculate average vs. projected fuel consumption per fleet
avg-daily-driving-duration Calculate average daily driving duration per driver
avg-daily-driving-session Calculate average daily driving session per driver
avg-load Calculate average load per truck model per fleet
daily-activity Get the number of hours truck has been active (vs. out-of-commission) per day per fleet
breakdown-frequency Calculate breakdown frequency by truck model

Contributing

We welcome contributions from the community to make TSBS better!

You can help either by opening an issue with any suggestions or bug reports, or by forking this repository, making your own contribution, and submitting a pull request.

Before we accept any contributions, Timescale contributors need to sign the Contributor License Agreement (CLA). By signing a CLA, we can ensure that the community is free and confident in its ability to use your contributions.
