
Performance Issue with now() - Interval in Aggregated Query #2720

Open
fredrikIOT opened this issue Nov 9, 2023 · 11 comments
Labels
C-bug Category Bugs

Comments


fredrikIOT commented Nov 9, 2023

What type of bug is this?

Performance issue

What subsystems are affected?

Standalone mode, Frontend

What happened?

Environment:

  • Database version: 0.4.2

Issue Summary:
When executing an aggregated query over the previous 5 hours, using now() - Interval '5 hours' in the WHERE clause significantly slows down the query compared to using a hardcoded timestamp.

Expected Behavior:
Both queries should have similar execution times since they are theoretically querying the same time window.

Actual Behavior:
The query using now() - Interval '5 hours' is approximately 4 times slower than the query with a hardcoded timestamp.

First Query (slow query):

SELECT 
  DATE_BIN(INTERVAL '1 minute', ts, '1970-01-01T00:00:00Z'::TIMESTAMP) AS ts,
  mytag,
  avg(value)
FROM "my_table"  
WHERE 
  ts > now() - INTERVAL '5 hours'
  AND uid_tag = 'test'
  AND sensor = 'gas'
GROUP BY 1, "mytag"

Second Query (fast query):

SELECT 
  DATE_BIN(INTERVAL '1 minute', ts, '1970-01-01T00:00:00Z'::TIMESTAMP) AS ts,
  mytag,
  avg(value)
FROM "my_table"  
WHERE 
  ts > '2023-11-08 00:00:16.408'::timestamp
  AND uid_tag = 'test'
  AND sensor = 'gas'
GROUP BY 1, "mytag"

Hypothesis:
It seems that the dynamic calculation of now() - Interval '5 hours' might be affecting the query optimizer's ability to efficiently execute the query. However, the exact cause is unclear and needs investigation.
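
To narrow this down, it would help to compare the physical plans of the two variants and check whether the now()-based time filter is pushed down into the scan or applied afterwards. A minimal sketch, reusing the anonymized names from the queries above:

-- Sketch only: run this for both predicates and compare where the time filter
-- ends up in the plan (pushed into the scan vs. a separate filter step above it).
EXPLAIN ANALYZE SELECT 
  DATE_BIN(INTERVAL '1 minute', ts, '1970-01-01T00:00:00Z'::TIMESTAMP) AS ts,
  mytag,
  avg(value)
FROM "my_table"  
WHERE 
  ts > now() - INTERVAL '5 hours'
  AND uid_tag = 'test'
  AND sensor = 'gas'
GROUP BY 1, "mytag"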

What operating system did you use?

Docker (installed on Ubuntu 22.04.3 LTS)

Relevant log output and stack trace

No response

How can we reproduce the bug?

Testing Setup:

  • The inserted data represents 20000 sensors reporting from 50 different locations, where each location has its own table and each sensor has its own tag.
  • Data was inserted at a rate of 20000 data points per second (400 per table, spread across 50 tables).
  • A simulation was run to inject data continuously for 3 days.
  • The performance discrepancy was observed when querying data from the last 5 hours.

Steps to Reproduce:

  1. Ensure the test database is populated with data following the described rate and duration (a schema and insert sketch follows this list).
  2. Execute the first query using now() - INTERVAL '5 hours' to filter records from the last 5 hours.
  3. Execute the second query using a hardcoded timestamp that marks a point in time equivalent to 5 hours before the current time.
  4. Compare the execution times of both queries.
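
A minimal sketch of the table shape and a sample row for step 1; the columns match the SHOW CREATE TABLE output pasted later in this thread, and the literal values are illustrative only:

-- One of the 50 per-location tables (schema as shared later in this thread).
CREATE TABLE IF NOT EXISTS "location1" (
  "location_uid" STRING NULL,
  "room_uid" STRING NULL,
  "log_uid" STRING NULL,
  "sensor_name" STRING NULL,
  "value" DOUBLE NULL,
  "ts" TIMESTAMP(3) NOT NULL,
  TIME INDEX ("ts"),
  PRIMARY KEY ("location_uid", "room_uid", "log_uid", "sensor_name")
) ENGINE=mito WITH( regions = 1 );

-- The load generator issues rows of this shape at roughly 400 points per second per table:
INSERT INTO "location1" (location_uid, room_uid, log_uid, sensor_name, value, ts)
VALUES ('loc-1', 'room-1', 'Log1', 'co2', 412.0, '2023-11-09 00:00:00');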

Performance Metrics:

  • The first query, which uses the dynamic time computation with now(), took approximately 4.6 seconds to complete.
  • The second query, with the hardcoded timestamp, took about 1.45 seconds, despite actually querying a slightly larger time window and returning more data points.

Queries:

SELECT 
  DATE_BIN(INTERVAL '1 minutes', ts, '1970-01-01T00:00:00Z'::TIMESTAMP) AS ts,
  mytag,
  avg(value)
FROM "my_table"  
WHERE 
  ts > now() - Interval '5 hours'
  AND uid_tag = 'test'
  AND sensor = 'gas'
GROUP BY 1, "mytag"

The query above took about 4.6 seconds, while the query below took about 1.45 seconds:

SELECT 
  DATE_BIN(INTERVAL '1 minutes', ts, '1970-01-01T00:00:00Z'::TIMESTAMP) AS ts,
  mytag,
  avg(value)
FROM "my_table"  
WHERE 
  ts > '2023-11-08 00:00:16.408'::timestamp
  AND uid_tag = 'test'
  AND sensor = 'gas'
GROUP BY 1, "mytag"

Also, the second query was actually querying a larger time window than 5 hours and returned more values, yet it was still much faster.

fredrikIOT added the C-bug Category Bugs label Nov 9, 2023
waynexia (Member) commented Nov 9, 2023

Hi @fredrikIOT, thanks for reporting this issue.

> The data was spread equally on 50 different tables.

Do you mean you have 50 tables like my_table in the above SQL? How do you create those tables? I'd like to see the result of SHOW CREATE TABLE my_table if you are willing to paste one.

Also, can you paste the result of EXPLAIN ANALYZE <QUERY>? It provides a per-step time-consumption analysis.

fredrikIOT (Author) commented Nov 9, 2023

Hi @waynexia,

Thank you for your follow-up. Below is the information you requested:

SHOW CREATE TABLE Output for location1:

CREATE TABLE IF NOT EXISTS "location1" (
  "location_uid" STRING NULL,
  "room_uid" STRING NULL,
  "log_uid" STRING NULL,
  "sensor_name" STRING NULL,
  "value" DOUBLE NULL,
  "ts" TIMESTAMP(3) NOT NULL,
  TIME INDEX ("ts"),
  PRIMARY KEY ("location_uid", "room_uid", "log_uid", "sensor_name")
) ENGINE=mito WITH( regions = 1 )

Fast query:

EXPLAIN ANALYZE Output:

EXPLAIN ANALYZE SELECT 
  DATE_BIN(INTERVAL '1 minutes', ts, '1970-01-01T00:00:00Z'::TIMESTAMP) AS ts,
  sensor_name,
  avg(value)
FROM "location1" 
WHERE 
  ts > '2023-11-09 00:00:16.408'::timestamp
  --ts > now() - Interval '5 hours'
  and log_uid = 'Log1' and sensor_name = 'co2'
GROUP BY 1, "sensor_name"

Result:

ProjectionExec: expr=[date_bin(IntervalMonthDayNano("60000000000"),location1.ts,Utf8("1970-01-01T00:00:00Z"))@0 as ts, sensor_name@1 as sensor_name, AVG(location1.value)@2 as AVG(location1.value)], metrics=[output_rows=562, elapsed_compute=5.973µs]
  AggregateExec: mode=FinalPartitioned, gby=[date_bin(IntervalMonthDayNano("60000000000"),location1.ts,Utf8("1970-01-01T00:00:00Z"))@0 as date_bin(IntervalMonthDayNano("60000000000"),location1.ts,Utf8("1970-01-01T00:00:00Z")), sensor_name@1 as sensor_name], aggr=[AVG(location1.value)], metrics=[output_rows=562, elapsed_compute=303.571µs]
    CoalesceBatchesExec: target_batch_size=8192, metrics=[output_rows=598, elapsed_compute=202.049µs]
      RepartitionExec: partitioning=Hash([date_bin(IntervalMonthDayNano("60000000000"),location1.ts,Utf8("1970-01-01T00:00:00Z"))@0, sensor_name@1], 8), input_partitions=8, metrics=[repart_time=1.81877ms, fetch_time=8.697737284s, send_time=252.256µs]
        CoalesceBatchesExec: target_batch_size=8192, metrics=[output_rows=598, elapsed_compute=22.224µs]
          AggregateExec: mode=Partial, gby=[date_bin(60000000000, CAST(ts@2 AS Timestamp(Nanosecond, None)), 0) as date_bin(IntervalMonthDayNano("60000000000"),location1.ts,Utf8("1970-01-01T00:00:00Z")), sensor_name@0 as sensor_name], aggr=[AVG(location1.value)], metrics=[output_rows=598, elapsed_compute=4.949231ms]
            RepartitionExec: partitioning=RoundRobinBatch(8), input_partitions=1, metrics=[repart_time=1ns, fetch_time=1.087010166s, send_time=30.638µs]
              ProjectionExec: expr=[sensor_name@3 as sensor_name, value@4 as value, ts@5 as ts], metrics=[output_rows=33699, elapsed_compute=7.605µs]
                MergeScanExec: peers=[4758823763968(1108, 0), ], metrics=[output_rows=33699, first_consume_time=1.085616849s, finish_time=1.087037767s, ready_time=52.198092ms]

Slow query:

EXPLAIN ANALYZE Output:

EXPLAIN ANALYZE SELECT 
  DATE_BIN(INTERVAL '1 minutes', ts, '1970-01-01T00:00:00Z'::TIMESTAMP) AS ts,
  sensor_name,
  avg(value)
FROM "location1" 
WHERE 
  --ts > '2023-11-09 00:00:16.408'::timestamp
  ts > now() - Interval '5 hours'
  and log_uid = 'Log1' and sensor_name = 'co2'
GROUP BY 1, "sensor_name"

Result:

ProjectionExec: expr=[date_bin(IntervalMonthDayNano("60000000000"),location1.ts,Utf8("1970-01-01T00:00:00Z"))@0 as ts, sensor_name@1 as sensor_name, AVG(location1.value)@2 as AVG(location1.value)], metrics=[output_rows=301, elapsed_compute=7.723µs]
  AggregateExec: mode=FinalPartitioned, gby=[date_bin(IntervalMonthDayNano("60000000000"),location1.ts,Utf8("1970-01-01T00:00:00Z"))@0 as date_bin(IntervalMonthDayNano("60000000000"),location1.ts,Utf8("1970-01-01T00:00:00Z")), sensor_name@1 as sensor_name], aggr=[AVG(location1.value)], metrics=[output_rows=301, elapsed_compute=273.111µs]
    CoalesceBatchesExec: target_batch_size=8192, metrics=[output_rows=320, elapsed_compute=187.995µs]
      RepartitionExec: partitioning=Hash([date_bin(IntervalMonthDayNano("60000000000"),location1.ts,Utf8("1970-01-01T00:00:00Z"))@0, sensor_name@1], 8), input_partitions=8, metrics=[repart_time=410.474µs, fetch_time=63.599572025s, send_time=208.927µs]
        CoalesceBatchesExec: target_batch_size=8192, metrics=[output_rows=320, elapsed_compute=3.20464ms]
          AggregateExec: mode=Partial, gby=[date_bin(60000000000, CAST(ts@2 AS Timestamp(Nanosecond, None)), 0) as date_bin(IntervalMonthDayNano("60000000000"),location1.ts,Utf8("1970-01-01T00:00:00Z")), sensor_name@0 as sensor_name], aggr=[AVG(location1.value)], metrics=[output_rows=320, elapsed_compute=2.006828ms]
            ProjectionExec: expr=[sensor_name@1 as sensor_name, value@2 as value, ts@3 as ts], metrics=[output_rows=17993, elapsed_compute=4.891µs]
              CoalesceBatchesExec: target_batch_size=8192, metrics=[output_rows=17993, elapsed_compute=3.224305ms]
                CoalesceBatchesExec: target_batch_size=8192, metrics=[output_rows=17993, elapsed_compute=4.123735ms]
                  FilterExec: ts@3 > 1699504672673 AND log_uid@0 = Log1 AND sensor_name@1 = co2, metrics=[output_rows=17993, elapsed_compute=1.095815758s]
                    RepartitionExec: partitioning=RoundRobinBatch(8), input_partitions=1, metrics=[repart_time=1ns, fetch_time=7.093430586s, send_time=585.486264ms]
                      ProjectionExec: expr=[log_uid@2 as log_uid, sensor_name@3 as sensor_name, value@4 as value, ts@5 as ts], metrics=[output_rows=82775998, elapsed_compute=37.4777ms]
                        MergeScanExec: peers=[4758823763968(1108, 0), ], metrics=[output_rows=82775998, first_consume_time=374.417044ms, finish_time=7.949118407s, ready_time=374.333017ms]

Each of the 50 tables is set up like "location1", differing only in its contents. The rationale behind this structure was to partition the data by location, as I was unable to find a way to partition on a primary key constraint other than by using greater-than-or-equal conditions. (If there is a more straightforward approach to this partitioning, I would be open to suggestions.)
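
For reference, this is the kind of range/boundary-based partition clause I was referring to; the table name and split points here are purely hypothetical, and the exact syntax should be checked against the documentation for the release in use:

-- Sketch only: a single table range-partitioned on a primary-key column,
-- expressed through boundary conditions (hypothetical split points).
CREATE TABLE IF NOT EXISTS "sensor_data" (
  "location_uid" STRING NULL,
  "room_uid" STRING NULL,
  "log_uid" STRING NULL,
  "sensor_name" STRING NULL,
  "value" DOUBLE NULL,
  "ts" TIMESTAMP(3) NOT NULL,
  TIME INDEX ("ts"),
  PRIMARY KEY ("location_uid", "room_uid", "log_uid", "sensor_name")
)
PARTITION BY RANGE COLUMNS ("location_uid") (
  PARTITION p0 VALUES LESS THAN ('location25'),
  PARTITION p1 VALUES LESS THAN (MAXVALUE)
)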

Additional Note:
I would like to mention that there has been a noticeable change in the performance of the slow query since yesterday. While it took approximately 4.6 seconds to execute yesterday, today it has slowed down further to about 8.4 seconds. In contrast, the performance of the fast query has remained relatively stable, taking about 1.2 seconds.

This is the log from today:
[screenshot of query timings omitted]
The fast query returns more rows because it queries a longer period (about 11 hours).

waynexia (Member) commented Nov 9, 2023

Thanks for that information. I've found the reason and am writing a fix patch.

waynexia (Member) commented Nov 9, 2023

Could you check if #2723 works for you @fredrikIOT? I can make a nightly release if you need one.

fredrikIOT (Author) replied:

> Could you check if #2723 works for you @fredrikIOT? I can make a nightly release if you need one.

Could you please create a nightly release? I'd like to test the patch. Thank you for your prompt response and support.

waynexia (Member) commented Nov 9, 2023

workflow: https://github.com/GreptimeTeam/greptimedb/actions/runs/6812444219

fredrikIOT (Author) replied:

> workflow: https://github.com/GreptimeTeam/greptimedb/actions/runs/6812444219

Now I get similar performance on both queries:
[screenshot of query timings omitted]

Thanks!

fredrikIOT (Author) commented May 16, 2024

Now this issue is back. I have tested with:
docker image: greptime/greptimedb:v0.8.0-nightly-20240513
commit (found in dashboard): 6ab3aeb14221cb7164845061a9ed56a7fcf82b33

When using ts > '2024-05-16 10:00:16.408'::timestamp the query takes 137 ms, and when using ts > now() - Interval '5 hours' it takes 3.03 s (the same number of rows is returned).

fredrikIOT reopened this May 16, 2024
fredrikIOT (Author) commented:

I am not sure when this started, by the way. I don't think this issue was present in the v0.7.0 release.

waynexia (Member) commented:

Thanks for your report! (again)

I'm looking into this. Looks like our tests are not enough...

killme2008 (Contributor) commented:

@waynexia How is this issue progressing?
