Add time series support to compute engine #105397
Labels
:Analytics/Compute Engine
Analytics in ES|QL
:StorageEngine/TSDB
You know, for Metrics
Team:Analytics
Meta label for analytical engine team (ESQL/Aggs/Geo)
Team:StorageEngine
Comments
Pinging @elastic/es-analytical-engine (Team:Analytics)
Pinging @elastic/es-storage-engine (Team:StorageEngine)
martijnvg added a commit that referenced this issue Mar 7, 2024
This change adds an experimental time series source operator that is enabled when the `time_series` query pragma is set. When enabled, the source operator emits documents in time series order, meaning sorted by tsid ascending and timestamp descending. Other, yet to be introduced, operators can make use of the sorted order to perform optimizations or computations that would otherwise not be feasible.

Example usage:

```
POST /_query?format=txt
{
  "query": "FROM cpu_tsbs | LIMIT 3",
  "pragma": {
    "time_series": true
  }
}
```

Note that this change on its own doesn't add any real functionality other than the sort order in which data gets emitted. This change is part of a series of many changes that would eventually add time series query support to ES|QL. There are many things still to be done, like adding a time series grouping operator that makes use of the sorted nature of the pages that this source operator emits, adding parallelization support, adding time series function support like `rate`, and much more.

Relates #105397
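To make the ordering concrete, here is a minimal Python sketch of "time series order" (tsid ascending, timestamp descending). It is not the operator's implementation: the `(tsid, timestamp, value)` tuple shape and the names `ts_order_key` and `merge_segments` are hypothetical, and it assumes each input stream is already sorted in that order, as the issue describes for individual segments.

```python
import heapq

# Hypothetical sample shape: (tsid, timestamp, value). Not an Elasticsearch type.
def ts_order_key(sample):
    tsid, timestamp, _ = sample
    # tsid ascending, timestamp descending (negated so a larger timestamp sorts first)
    return (tsid, -timestamp)

def merge_segments(segments):
    """Lazily merge streams that are each already in time series order."""
    return heapq.merge(*segments, key=ts_order_key)

# Two inputs, each sorted on its own but not relative to each other.
segment_a = [("host-a", 30, 0.5), ("host-a", 10, 0.3), ("host-b", 20, 0.9)]
segment_b = [("host-a", 20, 0.4), ("host-b", 30, 0.7)]

merged = list(merge_segments([segment_a, segment_b]))
for sample in merged:
    print(sample)
```

A lazy k-way merge is used here only because it keeps memory proportional to the number of inputs; whether the real source operator merges segments this way is not stated in the issue.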
This was referenced Mar 18, 2024
This is the meta issue that tracks the compute engine work needed to power time series support. For now, at least, it doesn't include the language changes to ES|QL. The compute engine components should only be activated by enabling specific query pragmas, until the time series compute engine components are more stable and the ES|QL language is ready to adopt them.
General overview
(An overview of how time series aggregation can work in the compute engine, assuming no time series crosses a backing index boundary.)
The idea is that a new source operator will emit all matching documents in time series order (`_tsid` ascending, `@timestamp` descending). Documents are sorted in that order at the segment level, but not at the shard level. A page will additionally include tsid and timestamp blocks. Documents of the same time series should be contained in the block. A new time series grouping operator will make use of the sorted nature of the pages that the source operator emits and group by tsid, or by tsid and timestamp interval. The output of this operator can be used by other operators such as the `HashAggregationOperator`.

Sometimes not all samples of a time series are in the same shard. This can happen when a query targets multiple backing indices of a TSDB data stream. In this case we need to postpone grouping for the affected time series in the new time series grouping operator. The new time series grouping operator needs to group these time series on the coordinating node (when the aggregation mode is final in `AggregateExec`). Initially we will build a time series grouping operator that assumes that time series are always scattered across multiple backing indices and thus performs the grouping when the aggregation mode is final. In follow-ups, we can then improve the new time series grouping operator to detect when time series don't cross backing index boundaries. In that case the grouping can be performed locally, when the aggregation mode is partial.

Initially we will only allow filtering on dimension fields. More specifically, this applies to the filters that get pushed down to the time series source operator. If filters on labels or metrics get pushed down to the source operator, we run the risk of breaking the ordered samples of a time series apart.
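The grouping idea above can be sketched in Python as follows. This is an illustration under stated assumptions, not the planned operator: the `(tsid, timestamp, value)` tuples and the names `group_sorted` and `combine_final` are hypothetical. Because the input is in time series order, every group is a consecutive run and can be emitted in a single pass without a hash table. The second function mimics the "final" step, where partial groups of a time series that spans several shards are combined on the coordinating node.

```python
from itertools import groupby

def group_sorted(samples, interval):
    """Yield (tsid, bucket_start, values) from samples in time series order.

    Relies on the input being sorted by tsid ascending and timestamp
    descending, so each (tsid, bucket) group is one consecutive run.
    """
    def group_key(sample):
        tsid, timestamp, _ = sample
        return (tsid, timestamp - timestamp % interval)

    for (tsid, bucket), run in groupby(samples, key=group_key):
        yield tsid, bucket, [value for _, _, value in run]

def combine_final(partials):
    """Combine per-shard groups that share the same (tsid, bucket)."""
    combined = {}
    for tsid, bucket, values in partials:
        combined.setdefault((tsid, bucket), []).extend(values)
    return combined

# Hypothetical samples from two shards; host-a is split across both.
shard_1 = [("host-a", 35, 4.0), ("host-a", 25, 3.0), ("host-b", 15, 2.0)]
shard_2 = [("host-a", 21, 2.5), ("host-a", 5, 1.0)]

partial_1 = list(group_sorted(shard_1, interval=20))
partial_2 = list(group_sorted(shard_2, interval=20))
final = combine_final(partial_1 + partial_2)
```

If a time series is known not to cross a backing index boundary, the per-shard output of `group_sorted` is already complete for that series, which is the case the follow-up optimization targets.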
Tasks
`@timestamp` filter in `WHERE` clause doesn't cross the boundary of a backing index. Or timestamp interval group is contained within a backing index.
`Aggregator` and `GroupingAggregator` interfaces to accept sorted pages/blocks. #106414
`BUCKET` syntax is used.
`TSTATS` syntax.
`aggregate_metric_double` field type in ES|QL in order to support downsampling. Add support for aggregate_metric_double field in es|ql #110649
`_doc_count` field in ES|QL in order to support downsampling.
`index.time_series.start_time` and `index.time_series.end_time` index settings, so that backing indices that will never match the ES|QL query are excluded from query execution. (This is based on the `WHERE` filter on the `@timestamp` field.)
Optional:
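As an illustration of the backing index exclusion task in the list above (the one based on `index.time_series.start_time` and `index.time_series.end_time`), here is a minimal Python sketch of range-overlap pruning. The index names, the integer timestamps, the function names, and the half-open `[start, end)` bounds for both the index and the query range are all assumptions made for the example, not details taken from the issue.

```python
def overlaps(index_start, index_end, query_from, query_to):
    # Assumption: both ranges are half-open, [start, end).
    return index_start < query_to and query_from < index_end

def prune_backing_indices(indices, query_from, query_to):
    """Keep only indices whose time range can match the @timestamp filter.

    `indices` maps a (hypothetical) index name to its (start_time, end_time).
    """
    return [
        name
        for name, (start, end) in indices.items()
        if overlaps(start, end, query_from, query_to)
    ]

# Hypothetical backing indices with integer time bounds.
backing_indices = {
    "metrics-000001": (0, 100),
    "metrics-000002": (100, 200),
    "metrics-000003": (200, 300),
}

# A filter equivalent to: @timestamp >= 150 AND @timestamp < 220
selected = prune_backing_indices(backing_indices, query_from=150, query_to=220)
```

With these inputs only the second and third indices overlap the filter range, so the first one would never need to be queried.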