Add a live streaming API? #55358

Open
jpountz opened this issue Apr 16, 2020 · 11 comments
Labels
>enhancement :Search Foundations/Search Catch all for Search Foundations Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch

Comments

@jpountz
Contributor

jpountz commented Apr 16, 2020

Elasticsearch is often used to index logs, and live-tailing the logs that match a given filter is a common use case, but I think we could greatly improve the user experience here. The current approach is to periodically run a query that sorts hits by descending @timestamp and to use a couple of tricks to make these requests run efficiently.
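
For illustration, one polling round of that approach might build a request body like the one below. This is only a sketch: the helper name `build_tail_query`, the `service.name` field, and the range filter on the last seen timestamp are assumptions for the example, not the tricks the Logs UI actually uses.

```python
from typing import Any, Dict, Optional


def build_tail_query(log_filter: Dict[str, Any],
                     newer_than: Optional[str] = None,
                     size: int = 100) -> Dict[str, Any]:
    """Build a _search request body for one polling round (hypothetical helper)."""
    filters = [log_filter]
    if newer_than is not None:
        # Assumption: skip events at or before the newest one already shown.
        filters.append({"range": {"@timestamp": {"gt": newer_than}}})
    return {
        "size": size,
        "query": {"bool": {"filter": filters}},
        # Newest events first, as described above.
        "sort": [{"@timestamp": {"order": "desc"}}],
    }


# Hypothetical filter: tail the logs of one service.
body = build_tail_query({"term": {"service.name": "checkout"}},
                        newer_than="2020-04-16T12:00:00Z")
```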

But this approach generally delivers messages out of order: a request will often return, for the first time, an event that is older than the most recent event returned by the previous request. This is mostly due to how we partition data into shards:

  • even if you send events in order in a bulk request, it will be split into shard-level bulk requests and events will generally get indexed in a different order from the order they appeared in the original bulk request
  • shard refreshes are independent across shards, so index ordering doesn't even imply visibility ordering.
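
The effect of independent refreshes can be reproduced with a toy model. The sketch below does not talk to Elasticsearch; the `Event` class, the tick-based clock, and the two-shard layout are all made up for the example.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Event:
    timestamp: int   # event time, in arbitrary ticks
    shard: int       # shard the event was routed to
    visible_at: int  # tick at which that shard's refresh exposes the event


def poll(events: List[Event], now: int, seen: Set[int]) -> List[int]:
    """Return timestamps of newly visible events, newest first."""
    fresh = [e.timestamp for e in events
             if e.visible_at <= now and e.timestamp not in seen]
    seen.update(fresh)
    return sorted(fresh, reverse=True)


# Events 1..4 are sent in order, but shard 1 refreshes later than shard 0.
events = [
    Event(timestamp=1, shard=0, visible_at=10),
    Event(timestamp=2, shard=1, visible_at=20),
    Event(timestamp=3, shard=0, visible_at=10),
    Event(timestamp=4, shard=1, visible_at=20),
]

seen: Set[int] = set()
first_poll = poll(events, now=10, seen=seen)   # only shard 0 is visible
second_poll = poll(events, now=20, seen=seen)  # shard 1 catches up
```

Here the second poll returns event 2 for the first time, even though the first poll already showed the newer event 3.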

Would it be possible to build an API that, assuming that events get pushed to Elasticsearch in order, would be able to live-stream events in order as well?

@jpountz jpountz added discuss :Search/Search Search-related issues that do not fall into other categories labels Apr 16, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-search (:Search/Search)

@jasontedor
Member

@jpountz When I was thinking about the Changes API, one use case I had in mind for our own products was exactly the Logs application and tailing logs there. I'm curious whether you've thought about this in that context as well?

@jpountz
Contributor Author

jpountz commented Apr 21, 2020

@jasontedor I have thought about it indeed. I don't think that it will be solved entirely by the Changes API because I feel like global ordering by @timestamp is important for the user experience, and I'm not seeing global ordering as a feature of the Changes API. But building on top of the Changes API might be convenient. Please let me know if you had different expectations.

We don't need the entire feature set of the Changes API, e.g. I don't think we would need to be informed about deletions so another option might be to use _search and search_after on the _seq_no and/or @timestamp fields at the shard level (both have different pros/cons).
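
As a rough sketch of the _search option, a client could keep one cursor per shard and page forward on _seq_no. The helper name `shard_page_request` and the returned dict layout are hypothetical, and the sketch assumes that sorting on _seq_no and targeting a single shard through the `preference` parameter behave as expected for this use.

```python
from typing import Any, Dict, Optional


def shard_page_request(shard: int, last_seq_no: Optional[int],
                       size: int = 100) -> Dict[str, Any]:
    """Describe one shard-level page request (hypothetical helper)."""
    body: Dict[str, Any] = {
        "size": size,
        # Assumption: _seq_no is sortable, so it can drive search_after.
        "sort": [{"_seq_no": {"order": "asc"}}],
        # Ask for _seq_no to be returned with each hit.
        "seq_no_primary_term": True,
    }
    if last_seq_no is not None:
        # Resume right after the last sequence number seen on this shard.
        body["search_after"] = [last_seq_no]
    return {"params": {"preference": f"_shards:{shard}"}, "body": body}


first = shard_page_request(shard=0, last_seq_no=None)
resumed = shard_page_request(shard=0, last_seq_no=41)
```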

Either way, we'd need something on top to provide global ordering by @timestamp as much as possible. For example, I believe we'll want to hold back events that are too recent, because older events might not be visible yet: they could still be indexing or waiting for a refresh. The held-back events would only be returned on a following page.
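
One way to picture the "ignore events that are too recent" idea is a buffer with a safety delay. This is a minimal sketch under assumed names (`WatermarkBuffer`, `delay`) and an integer clock, not a proposed implementation; an event that becomes visible later than the delay allows would still come out late.

```python
import heapq
from typing import Iterable, List, Tuple


class WatermarkBuffer:
    """Hold back recent events; release only those at or before now - delay."""

    def __init__(self, delay: int) -> None:
        self.delay = delay
        self._heap: List[Tuple[int, str]] = []  # (timestamp, message)

    def add(self, events: Iterable[Tuple[int, str]]) -> None:
        for event in events:
            heapq.heappush(self._heap, event)

    def release(self, now: int) -> List[Tuple[int, str]]:
        watermark = now - self.delay
        out = []
        # Pop in timestamp order, stopping at the first too-recent event.
        while self._heap and self._heap[0][0] <= watermark:
            out.append(heapq.heappop(self._heap))
        return out


buf = WatermarkBuffer(delay=5)
buf.add([(3, "c"), (1, "a")])   # one shard becomes visible first
page1 = buf.release(now=6)      # watermark is 1, so event 3 is held back
buf.add([(2, "b"), (4, "d")])   # the other shard catches up
page2 = buf.release(now=9)      # watermark is 4
```

In this run event 3 is withheld from the first page, so the late-visible event 2 can still be delivered ahead of it on the second page.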

@jpountz
Contributor Author

jpountz commented Apr 21, 2020

We discussed it today as a group. This generally felt useful, and while both _search and the Changes API could be building blocks for this functionality, the Changes API is a more natural fit:

  • It's the point of the Changes API to return streams of changes on an index, so it would be a pity to build duplicate functionality on top of _search.
  • The Changes API will allow clients to listen to changes, while _search requires polling. So building on the Changes API will also help expose this API as a stream that clients can register to.

This raises interesting questions that we'll need to think about:

  • How to deal with rollovers?
  • How to deal with the case when a new data-stream that matches the index pattern gets created while the client is listening to changes?
  • Displaying hundreds of log lines per second wouldn't be useful. How should it degrade if the user configures a filter that isn't selective enough, or if there is a sudden burst?

@jpountz
Contributor Author

jpountz commented Apr 21, 2020

Depends on #1242

@weltenwort
Member

Thanks for considering this 🎉

While it makes total sense not to duplicate the effort for both APIs, I would consider one property pretty important: it should be possible to achieve a consistent ordering in both the Changes API and _search. Is that realistic?

The reason is that the latter is probably still going to be used when fetching log entries for past time intervals.

@jpountz
Contributor Author

jpountz commented Apr 22, 2020

@weltenwort The idea would be that whatever we end up exposing would take care of fetching log entries for past intervals too. The problem with _search is that it can't guarantee ordering across pages (it only guarantees it within a single page), so either a later page would include events that are older than some events from previous pages, or it would mistakenly ignore some logs if search_after is used.
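
The search_after failure mode can be shown with a toy model. The function `search_after_page` below is a stand-in written for this example, not an Elasticsearch call, and the integer timestamps are made up.

```python
from typing import List, Optional


def search_after_page(visible: List[int], after: Optional[int]) -> List[int]:
    """Ascending page of visible timestamps strictly greater than the cursor."""
    return sorted(t for t in visible if after is None or t > after)


# First request: the event with timestamp 2 is not visible yet.
page1 = search_after_page([1, 3], after=None)
cursor = page1[-1]

# Second request: timestamp 2 is now visible, but the cursor is already past it.
page2 = search_after_page([1, 2, 3, 4], after=cursor)

# Events that no page ever returned.
missed = sorted({1, 2, 3, 4} - set(page1) - set(page2))
```

In this run event 2 is never returned, which is the "mistakenly ignore some logs" case; dropping the cursor and re-querying would return it, but after event 3 had already been shown.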

@weltenwort
Member

That sounds like it would solve the search_after tiebreaker problem for us 😍 Let me know if you want to validate any API design choice in regard to the Logs UI use case early in the process.

@jpountz
Contributor Author

jpountz commented Apr 23, 2020

We'll certainly reach out when we start tackling this issue!

@rjernst rjernst added the Team:Search Meta label for search team label May 4, 2020
@jpountz jpountz removed the discuss label May 5, 2020
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@javanna javanna added :Search Foundations/Search Catch all for Search Foundations and removed :Search/Search Search-related issues that do not fall into other categories labels Jul 17, 2024
@elasticsearchmachine elasticsearchmachine added Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch and removed Team:Search Meta label for search team labels Jul 17, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-foundations (Team:Search Foundations)


7 participants