Add dedicated field types for durations and byte sizes #31244

Open
jpountz opened this issue Jun 11, 2018 · 18 comments · May be fixed by #104037
Labels
>enhancement high hanging fruit :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch

Comments

@jpountz
Contributor

jpountz commented Jun 11, 2018

I'm opening this feature request as a follow-up to a conversation with @ruflin. Today users typically use numeric types (e.g. long, float, scaled_float) with a convention regarding units (sometimes made explicit in the name of the field, e.g. transferred_bytes or duration_ms) in order to store durations or byte sizes, but we could make the experience better by having native support for these fields in Elasticsearch:

  • Automatic conversions: because Elasticsearch would internally store e.g. nanos for durations and bytes for byte sizes, it would handle conversions automatically and transparently to applications, which would just need to make sure that their values have explicit units so that they are not rejected.
  • Better query parsing support: query parsers could understand things like +bytes_transferred:[1MB TO 1GB] +duration:[1s TO 1d].
  • Better storage efficiency: for such data types, the order of magnitude is often much more useful than the exact value, and it might be OK to only guarantee e.g. 0.1% accuracy, which would in turn allow storing all reasonable values (up to thousands of terabytes or thousands of years) using only 16 bits per value.
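To illustrate the storage-efficiency point, here is a minimal sketch of logarithmic quantization into 16 bits. It is not taken from any Elasticsearch implementation: the step ratio of 1.002 and the `encode`/`decode` names are assumptions chosen so that the relative rounding error stays at or below roughly 0.1%.

```python
import math

# Assumed parameter (not from the issue): consecutive codes differ by a
# factor of 1.002, so rounding to the nearest code is off by at most ~0.1%.
RATIO = 1.002
MAX_CODE = 2**16 - 1  # largest code that fits in 16 bits


def encode(value: float) -> int:
    """Map a non-negative value (e.g. nanoseconds or bytes) to a 16-bit code."""
    if value < 1:
        return 0  # code 0 stands for "less than one unit" in this sketch
    code = 1 + round(math.log(value) / math.log(RATIO))
    if code > MAX_CODE:
        raise ValueError("value out of range for 16 bits")
    return code


def decode(code: int) -> float:
    """Return the representative value for a code produced by encode()."""
    if code == 0:
        return 0.0
    return RATIO ** (code - 1)
```

Under these assumptions, a value such as 10**15 bytes or a thousand years expressed in nanoseconds still fits in the 16-bit code space, at the cost of only recovering the value approximately.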

One risk is that we end up with lots of feature requests to support distances, weights, etc. Where do we draw the line? It's been suggested that we only have one field type that we configure with what it is going to store, but that might not be practical given that some units have their own specificities, e.g. k means 1024 for byte sizes and 1000 for weights, and some durations are not fixed (months, years, etc.). At first sight it looks cleaner to have one type per unit, which doesn't mean they can't share code internally.
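The idea of one type per unit with shared internals could be sketched as follows. The unit tables, their spellings, and the `parse_with_units` helper are hypothetical; they only illustrate how a common parser can be parameterized by per-type multipliers (1024 for byte sizes, 1000 for weights, fixed-length units only for durations).

```python
import re
from decimal import Decimal

# Hypothetical unit tables; spellings and multipliers are illustrative only.
BYTE_UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}
WEIGHT_UNITS = {"g": 1, "kg": 1000}
# Only fixed-length units: months and years are deliberately left out.
DURATION_UNITS_NANOS = {
    "ns": 1,
    "us": 10**3,
    "ms": 10**6,
    "s": 10**9,
    "m": 60 * 10**9,
    "h": 3600 * 10**9,
    "d": 86400 * 10**9,
}

_VALUE = re.compile(r"(\d+(?:\.\d+)?)\s*([a-zA-Z]+)")


def parse_with_units(text: str, units: dict) -> int:
    """Parse '<number><unit>' into an integer count of the base unit."""
    match = _VALUE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"expected a number followed by a unit: {text!r}")
    number, unit = match.groups()
    if unit.lower() not in units:
        raise ValueError(f"unknown unit {unit!r}")
    # Decimal avoids float rounding for inputs such as "1.5kb".
    return int(Decimal(number) * units[unit.lower()])
```

With these tables, the bounds of a range such as `[1MB TO 1GB]` would parse to 1048576 and 1073741824 bytes, while `2kg` would parse to 2000 grams through the same code path.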

@jpountz jpountz added >feature help wanted adoptme discuss :Search Foundations/Mapping Index mappings, including merging and defining field types labels Jun 11, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-search-aggs

@ddorian

ddorian commented Jun 11, 2018

Hmm, no other database/search engine has this type of field, correct?

@jpountz
Contributor Author

jpountz commented Jun 11, 2018

Good question. I don't know of any, but since I have limited knowledge of what field types other datastores provide, I could easily be missing one, even a major one.

@jpountz
Contributor Author

jpountz commented Jun 15, 2018

Discussed in FixitFriday: we want to do it. We will start with durations and byte sizes, which are commonly stored in Elasticsearch. Requests for distances and temperatures might come next; we will handle them as they come, depending on how much usage we expect from them.

@jpountz jpountz removed the discuss label Jun 15, 2018
@timroes
Contributor

timroes commented Jun 19, 2018

@elastic/kibana-visualizations This will mean changes in supported types (e.g. do they actually return numeric values we can use in charts? Will they include scaled extensions?), and might also require some changes in how values need to be handled or how we can show them.

@elastic/kibana-discovery This might mean changes to the filtering UI. Also this might mean changes to KQL to query for those fields.

@elastic/kibana-management This means new field types (if that affects index patterns somehow), and might also mean some changes to field formatters for those types.

@jpountz Please mention the above teams in case you are creating a PR or further tickets related to this feature.

@timroes
Contributor

timroes commented Jun 19, 2018

/cc @epixa
/cc @alexfrancoeur (I think you know about that topic already, but it looks like you are not [yet] following that issue)

@Bargs
Contributor

Bargs commented Jun 19, 2018

These seem similar to the range types, which afaik we don't do anything special for in Kibana. Is there something different about these that would imply we need to support them at launch, or is it a similar level of priority as other field types that we don't currently support?

@timroes
Contributor

timroes commented Jun 20, 2018

I think we should at least be involved from the very beginning to highlight potential issues. For example, I talked to Adrien yesterday, and right now the API is planned to return strings for those units, which would make it impossible to use any of those values inside charts as metrics (like drawing the traffic usage over time, or the duration an API took per endpoint). Since, imho, people want to visualize those metric values quickly in Kibana, we should at least stay involved, rather than start thinking about it after ES has built the feature and possibly can't easily change any API around it anymore. At what point we actually want to put this on our roadmap is, I think, a different discussion we need to have :-)

@alexfrancoeur

@Bargs as more people begin to use auto complete, is there anything we'll have to do to support this in KQL/Kuery?

@Bargs
Contributor

Bargs commented Jun 28, 2018

KQL queries get turned into regular query DSL queries like range, match, exists and query_string, so assuming these field types don't need any special treatment in order to be used in those queries we should be fine.

@ruflin
Member

ruflin commented Jan 7, 2019

I want to bump this thread as I still see quite a few use cases especially for the duration type.

@dagguh

dagguh commented May 30, 2019

Note that there's an ISO standard (ISO 8601) for durations and time intervals, including syntax and semantics. We should respect that standard for maximum reuse and the principle of least astonishment.
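As a rough illustration of the ISO 8601 duration syntax, the sketch below parses the fixed-length subset (days, hours, minutes, seconds) into a `timedelta`. The `parse_iso_duration` helper is hypothetical and deliberately rejects years, months and weeks, so it covers only part of the standard.

```python
import re
from datetime import timedelta

# Subset of the ISO 8601 duration syntax: PnDTnHnMnS. Years, months and
# weeks are intentionally unsupported in this sketch.
_ISO_DURATION = re.compile(
    r"P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?"
    r"(?:(?P<minutes>\d+)M)?"
    r"(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?"
)


def parse_iso_duration(text: str) -> timedelta:
    """Parse strings such as 'PT1H30M' or 'P1DT12H' into a timedelta."""
    match = _ISO_DURATION.fullmatch(text)
    if match is None or text.endswith("T"):
        raise ValueError(f"unsupported or malformed duration: {text!r}")
    parts = {
        name: float(value)
        for name, value in match.groupdict().items()
        if value is not None
    }
    if not parts:
        raise ValueError(f"duration has no components: {text!r}")
    return timedelta(**parts)
```

For example, `parse_iso_duration("PT1H30M")` yields a `timedelta` of one and a half hours, while `"P1M"` (one month) raises `ValueError` because a month has no fixed length.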

@ruflin
Member

ruflin commented Jun 5, 2019

@jasontedor I wanted to bump this issue as we have started discussing bytes and duration fields in Elasticsearch again, in the context of ECS and adding metrics: elastic/ecs#480

@rjernst rjernst added the Team:Search Meta label for search team label May 4, 2020
@felixbarny
Member

I was about to file a feature request for a duration type and found this issue. I think dedicated types for duration and byte sizes would be really cool for more natural queries, especially for observability use cases. Now that ES|QL makes writing queries nicer and more concise, I think that these types would add another layer of sugar that makes interacting with your data more intuitive, expressive, and sweet.

When implementing the field types, I think we should design them with backwards compatibility with numeric types in mind so that we can use them as a drop-in replacement that provides strictly additive functionality. One aspect of that is that by default, we should return a numerical value instead of a string representation of the duration or byte size. We can use the formatter functionality that exists for the date field type to optionally return the values in a string representation.
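The "numeric by default, string on request" behavior described above could look roughly like this. The `render_bytes` function and its `human_readable` flag are hypothetical names for illustration and do not correspond to an existing Elasticsearch formatter.

```python
def render_bytes(value: int, human_readable: bool = False):
    """Return the raw number by default, or a formatted string on request."""
    if not human_readable:
        return value  # drop-in compatible with a plain numeric field
    size = float(value)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if size < 1024:
            return f"{size:.1f}{unit}"
        size /= 1024
    return f"{size:.1f}PiB"
```

Existing clients that expect a number keep working unchanged, and only callers that opt in receive a string such as `1.5KiB`.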

I realize that it's difficult to prioritize this as it's not really an essential thing and it potentially requires changes in a lot of different areas. But maybe we can restrict the amount of effort and coordination by adhering to the principle of strict backwards compatibility with numeric field types. This may also be a good issue to pick up for a spacetime project.

@felixbarny
Member

After I wrote my previous message, I saw an internal discussion where the decision was to not add dedicated field types but add support for arbitrary metadata on field types: #49419.

I don't disagree with that decision and I don't think this is an either/or kind of thing. In fact, IMHO, this issue is as relevant as ever, as one of the things that's not supported using #49419 is doing queries like http.request.body.bytes > 1MiB. In that discussion, it has been mentioned that we could support something like that in KQL. I don't think that's an adequate alternative to directly supporting this in Elasticsearch, as all other query languages, most notably ES|QL, wouldn't benefit from that. Another concern that was mentioned is the backwards compatibility with numeric field types. But as mentioned in my previous message, I think we can make it fully compatible. This will be a prerequisite anyway for us to be able to adopt these field types for existing use cases.
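To make the query example concrete, here is a minimal sketch of how a comparison with a unit suffix could be rewritten into a plain numeric range query in the query DSL. The `to_range_query` helper, its regular expression, and the unit table are assumptions for illustration; no existing parser is implied.

```python
import re

# Hypothetical table of binary unit suffixes and their sizes in bytes.
BINARY_UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3}
# Comparison operators mapped to the keys used by the range query.
OPERATORS = {">": "gt", ">=": "gte", "<": "lt", "<=": "lte"}

_COMPARISON = re.compile(r"\s*([\w.]+)\s*(>=|<=|>|<)\s*(\d+)\s*([A-Za-z]+)\s*")


def to_range_query(expression: str) -> dict:
    """Rewrite '<field> <op> <number><unit>' into a numeric range query."""
    match = _COMPARISON.fullmatch(expression)
    if match is None:
        raise ValueError(f"cannot parse comparison: {expression!r}")
    field, operator, number, unit = match.groups()
    if unit not in BINARY_UNITS:
        raise ValueError(f"unknown unit {unit!r}")
    return {"range": {field: {OPERATORS[operator]: int(number) * BINARY_UNITS[unit]}}}
```

Under these assumptions, `http.request.body.bytes > 1MiB` becomes a range query with `gt: 1048576` on a field that is still stored as a plain number.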

@felixbarny
Member

I played around with this a bit and created a PR: #104037.

Instead of creating dedicated field types, I leveraged the unit metadata field which seemed more appropriate. That way, the choice of the unit is orthogonal to the numeric field type used and it also integrates well with OpenTelemetry metric units.
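For readers unfamiliar with field metadata, the sketch below builds an index mapping body in which the unit is attached through the `meta` mapping parameter rather than encoded in the field type. The field names and the specific unit values are assumptions for illustration, and the snippet does not describe what PR #104037 actually implements.

```python
import json

# Hypothetical mapping body: both fields stay plain numeric types, and the
# unit is carried as metadata instead of being part of the field name.
mapping = {
    "mappings": {
        "properties": {
            "transferred": {"type": "long", "meta": {"unit": "byte"}},
            "duration": {"type": "long", "meta": {"unit": "ms"}},
        }
    }
}

print(json.dumps(mapping, indent=2))
```

Because the unit lives in metadata, switching a field from `long` to `scaled_float` would not change how its unit is declared.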

@elasticsearchmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@javanna javanna added Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch and removed Team:Search Meta label for search team labels Jul 16, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-foundations (Team:Search Foundations)
