Add time series support to compute engine #105397

martijnvg · 2024-02-12T13:47:54Z

This is the meta issue that tracks the work needed in the compute engine to power time series support. For now, at least, this doesn't include the language changes to ES|QL. The time series compute engine components should only be active when specific query pragmas are enabled, until they are more stable and the ES|QL language is ready to adopt them.

General overview

(An overview of how time series aggregation can work in the compute engine, assuming no time series crosses a backing index boundary.)

The idea is that a new source operator will emit all matching documents in time series order (_tsid ascending, @timestamp descending). Documents are sorted in that order at the segment level, but not at the shard level. A page will additionally include tsid and timestamp blocks. Documents of the same time series should be contained in the block. A new time series grouping operator will make use of the sorted nature of the pages that the source operator emits and group by tsid, or by tsid and timestamp interval. The output of this operator can be used by other operators such as the HashAggregationOperator.
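The ordering and grouping described above can be illustrated with a small Python sketch. This is not the compute engine's Java code: the segments, the `(tsid, timestamp, value)` row layout, and the function names are all hypothetical. It only shows that merging per-segment sorted runs yields shard-level time series order, and that a grouping step can then rely on rows of one group being adjacent.

```python
import heapq
from itertools import groupby

# Hypothetical segments. Each one is already sorted by
# (_tsid ascending, @timestamp descending), as described in the issue.
segment_a = [("host-a", 300, 0.5), ("host-a", 100, 0.3), ("host-c", 200, 0.9)]
segment_b = [("host-a", 200, 0.4), ("host-b", 250, 0.7), ("host-b", 50, 0.6)]


def time_series_order(row):
    """Sort key: tsid ascending, timestamp descending."""
    tsid, timestamp, _value = row
    return (tsid, -timestamp)


def merge_segments(*segments):
    """Merge sorted segments into one stream in time series order."""
    return list(heapq.merge(*segments, key=time_series_order))


def group_by_tsid_and_interval(rows, interval):
    """Group adjacent rows by (tsid, time bucket).

    Because the input is in time series order, every group is a
    contiguous run, so no hash table over all groups is needed.
    """
    def group_key(row):
        tsid, timestamp, _value = row
        return (tsid, timestamp // interval * interval)

    return [
        (tsid, bucket, [value for _, _, value in run])
        for (tsid, bucket), run in groupby(rows, key=group_key)
    ]


merged = merge_segments(segment_a, segment_b)
groups = group_by_tsid_and_interval(merged, interval=200)
```

Here `groups` holds one entry per tsid and 200-unit bucket, which is the kind of output a downstream aggregation step could consume.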

Sometimes not all samples of a time series are in the same shard. This can happen when a query targets multiple backing indices of a TSDB data stream. In this case we need to postpone grouping of the affected time series in the new time series grouping operator. The operator needs to group these time series on the coordinating node (when the aggregation mode is final in AggregateExec). Initially we will build a time series grouping operator that assumes time series are always scattered across multiple backing indices and thus performs the grouping when the aggregation mode is final. In follow-ups, we can then improve the operator to detect when time series don't cross backing index boundaries. In that case the grouping can be performed locally, when the aggregation mode is partial.
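A minimal Python sketch of why the grouping is deferred to the final stage, under the assumption that each backing index emits rows in time series order. The data and function names are hypothetical and do not reflect AggregateExec or any real operator API.

```python
import heapq
from itertools import groupby

# Hypothetical output of two backing indices, each in time series order.
# "host-a" has samples in both, so it crosses a backing index boundary.
backing_index_1 = [("host-a", 100, 1.0), ("host-b", 120, 5.0)]
backing_index_2 = [("host-a", 300, 3.0), ("host-a", 200, 2.0)]


def time_series_order(row):
    tsid, timestamp, _value = row
    return (tsid, -timestamp)


def group_by_tsid(rows):
    """Group adjacent rows of the same tsid into (tsid, samples) pairs."""
    return [
        (tsid, [(ts, value) for _, ts, value in run])
        for tsid, run in groupby(rows, key=lambda row: row[0])
    ]


# Grouping locally per backing index splits "host-a" into two groups.
local_groups = group_by_tsid(backing_index_1) + group_by_tsid(backing_index_2)

# Grouping once on the coordinating side, after merging the sorted
# streams, yields a single complete group per time series.
merged = heapq.merge(backing_index_1, backing_index_2, key=time_series_order)
final_groups = group_by_tsid(merged)
```

With local grouping, `host-a` appears twice with incomplete samples; after the merge it appears once with all three samples in descending timestamp order.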

Initially we will only allow filtering on dimension fields, more specifically for the filters that get pushed down to the time series source operator. If filters on labels or metrics get pushed down to the source operator, we run the risk of breaking the ordered samples of a time series apart.
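The difference between the two kinds of filter can be sketched in Python. This assumes, as is the case for TSDB, that all documents of one time series share the same dimension values; the field names `host` and `cpu` and the sample data are made up for illustration.

```python
from collections import Counter

# Hypothetical documents: "host" is a dimension, "cpu" is a metric.
docs = [
    {"tsid": "host-a", "host": "host-a", "ts": 300, "cpu": 0.9},
    {"tsid": "host-a", "host": "host-a", "ts": 200, "cpu": 0.2},
    {"tsid": "host-a", "host": "host-a", "ts": 100, "cpu": 0.8},
    {"tsid": "host-b", "host": "host-b", "ts": 300, "cpu": 0.1},
    {"tsid": "host-b", "host": "host-b", "ts": 200, "cpu": 0.7},
]


def samples_per_series(rows):
    """Count how many samples each time series has."""
    return dict(Counter(row["tsid"] for row in rows))


total = samples_per_series(docs)

# A dimension filter keeps or drops a time series as a whole.
by_dimension = samples_per_series([d for d in docs if d["host"] == "host-a"])

# A metric filter drops individual samples, leaving gaps inside series.
by_metric = samples_per_series([d for d in docs if d["cpu"] > 0.5])

# Series that survive the metric filter only partially.
series_with_gaps = sorted(
    tsid for tsid, count in by_metric.items() if count < total[tsid]
)
```

The dimension filter returns every sample of `host-a`, while the metric filter leaves both series with missing samples, which is what a later per-series computation could not safely rely on.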

Tasks

Optional:

Ordinal-based BytesRef block for TSID #106387
...

elasticsearchmachine · 2024-02-12T13:48:20Z

Pinging @elastic/es-analytical-engine (Team:Analytics)

elasticsearchmachine · 2024-02-12T13:48:20Z

Pinging @elastic/es-storage-engine (Team:StorageEngine)

This change adds an experimental time series source operator that gets enabled when the `time_series` query pragma is set. When enabled, the documents the source operator emits are in time series order, meaning sorted by tsid ascending and timestamp descending. Other yet-to-be-introduced operators can make use of the sorted order for optimizations or computations that would otherwise not be feasible.

Example usage:

```
POST /_query?format=txt
{
  "query": "FROM cpu_tsbs | LIMIT 3",
  "pragma": {
    "time_series": true
  }
}
```

Note that this change on its own doesn't add any real functionality other than the sort order in which data gets emitted. This change is part of a series of many changes that would eventually add time series query support to ES|QL. There are many things to be done, like adding a time series grouping operator that makes use of the sorted nature of the pages that this source operator emits, adding parallelization support, adding time series function support like `rate`, and much more.

Relates #105397

martijnvg added the :StorageEngine/TSDB and :Analytics/Compute Engine labels Feb 12, 2024

martijnvg self-assigned this Feb 12, 2024

martijnvg mentioned this issue Feb 12, 2024

Add time series source operator. #105398

Merged

elasticsearchmachine added the Team:Analytics and Team:StorageEngine labels Feb 12, 2024

dnhatn mentioned this issue Mar 15, 2024

Ordinal-based BytesRef block for TSID #106387

Closed

This was referenced Mar 18, 2024

Implement time series grouping. #106411

Closed

Update Aggregator and GroupingAggregator interfaces to accept sorted pages/blocks. #106414

Closed

Add es|ql rate aggregate function #106415

Closed

siposea assigned dnhatn and unassigned martijnvg Jun 24, 2024
