Should we warn users when they look for data that is older than the retention period? #57928

Closed
jpountz opened this issue Jun 10, 2020 · 10 comments
Labels
>enhancement :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team team-discuss

@jpountz
Contributor

jpountz commented Jun 10, 2020

It's quite common that different people are responsible for configuring data ingestion and for actually analyzing the data. How can analysts tell whether they cannot find events older than X months because X months is the retention period of their data, or simply because no events older than X months match the current filter?

Some cases make this even more of a trap. For instance, think of a user searching for sequences of events of category X followed by Y. If X and Y have different retention periods, it's easy to be misled into thinking that old data exists when actually there is only old data for one of the categories.
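To make the sequence case concrete, here is a minimal Python sketch. It assumes each event category has a single fixed retention period in days; the category names, retention values, and the `reliable_sequence_window` helper are all hypothetical. A sequence that needs events from several categories can only be matched reliably back to the shortest retention among them:

```python
from datetime import datetime, timedelta, timezone


def reliable_sequence_window(retention_days: dict, categories: list, now: datetime) -> datetime:
    """Oldest timestamp at which every category in the sequence is still retained.

    Hypothetical helper: assumes retention is a fixed maximum age per category.
    """
    shortest = min(retention_days[c] for c in categories)
    return now - timedelta(days=shortest)


# Hypothetical retention periods for two event categories.
retention = {"process": 90, "network": 30}
now = datetime(2020, 6, 10, tzinfo=timezone.utc)

cutoff = reliable_sequence_window(retention, ["process", "network"], now)
print(cutoff.date())  # 2020-05-11
```

A sequence search reaching further back than `cutoff` can silently miss matches: old `process` events are still there, but the `network` events they would pair with have already been deleted.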

Some questions to get the discussion started:

  • Should we warn users on all date(_nanos) fields or only the timestamp field of data streams?
  • What should happen when searching across data streams that have different retention periods?
  • Should we ignore indices that are filtered out by the can_match phase? (with the caveat that the can_match phase may filter indices based on their @timestamp values)
@jpountz jpountz added >enhancement discuss :Search/Search Search-related issues that do not fall into other categories :Data Management/Data streams Data streams and their lifecycles labels Jun 10, 2020
@elasticmachine elasticmachine added the Team:Data Management Meta label for data/management team label Jun 10, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-search (:Search/Search)

@elasticmachine elasticmachine added the Team:Search Meta label for search team label Jun 10, 2020
@dakrone
Member

dakrone commented Jun 10, 2020

@jpountz how are you defining "retention period" here? ILM policy delete phase?

@jpountz
Contributor Author

jpountz commented Jun 10, 2020

@dakrone Yes indeed.

@dakrone
Member

dakrone commented Jun 10, 2020

Hmm.. what about a policy like this:

{
  "policy": {
    "phases" : {
      "hot" : {
        "min_age" : "0ms",
        "actions" : {
          "rollover" : {
            "max_docs" : 10000000,
            "max_size": "50gb"
          }
        }
      },
      "delete" : {
        "min_age" : "1d",
        "actions" : {
          "delete" : { }
        }
      }
    }
  }
}

We would have to be careful not to warn the user that they shouldn't look for data past one day: deletion is based on the rollover time, so the index could be a month old even though its delete retention is one day.
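A minimal sketch of the arithmetic behind this, assuming (as with ILM once an index has rolled over) that the delete phase's `min_age` is measured from the rollover time, and that documents are indexed with roughly current timestamps. The dates and helper names are hypothetical:

```python
from datetime import datetime, timedelta


def deletion_time(rollover_time: datetime, delete_min_age: timedelta) -> datetime:
    # After rollover, the delete phase's min_age counts from rollover, not from index creation.
    return rollover_time + delete_min_age


def max_document_age(index_created: datetime, rollover_time: datetime,
                     delete_min_age: timedelta) -> timedelta:
    # The oldest document was written right after the index was created,
    # and it survives until the whole index is deleted.
    return deletion_time(rollover_time, delete_min_age) - index_created


created = datetime(2020, 5, 10)
# Hypothetical low-traffic index: max_docs / max_size only reached after 30 days.
rolled_over = datetime(2020, 6, 9)

age = max_document_age(created, rolled_over, timedelta(days=1))
print(age.days)  # 31
```

With the `"1d"` delete phase above, the oldest document is deleted only about 31 days after it was written, which is why a warning keyed purely on the delete `min_age` would be misleading here.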

@jpountz
Contributor Author

jpountz commented Jun 11, 2020

@dakrone I think you're raising a good question, but it's not obvious to me that we should not warn, since the fact that the data still exists is somewhat accidental. I'm thinking of someone who experiments with a query with the goal of eventually turning it into an alerting rule. If there is data only because we're "lucky", wouldn't it be better to warn users so that they don't accidentally create rules that might not see all the data they expect to see?

@jpountz
Contributor Author

jpountz commented Jul 28, 2020

We had some discussions about this yesterday, and the following questions were raised:

  • What if a user wants to query the entire range of data, should we warn them in such a case?
  • What if a data stream stores outdated documents (documents that are older than the index creation date)?

@tomcallahan brought up the idea that maybe this shouldn't be about warning users; instead, Elasticsearch could return information about the retention period for a given index pattern. This would allow Kibana to tailor its UI to that retention period, e.g. by signaling that filtering data from the "Last 90 days" isn't meaningful if the data has a retention period of 30 days.
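As a hypothetical illustration of the client-side half of this idea: assuming the retention period for an index pattern could somehow be obtained from Elasticsearch (how it is retrieved is out of scope, and `retention_warning` is a made-up name), a UI could compare it with the selected time range like this:

```python
from datetime import timedelta
from typing import Optional


def retention_warning(time_range: timedelta, retention: Optional[timedelta]) -> Optional[str]:
    """Return a warning if the selected time range reaches past the retention period.

    Hypothetical helper: `retention` is None when retention is unknown or unlimited.
    """
    if retention is None:
        return None  # nothing meaningful to say
    if time_range > retention:
        return (f"Selected range covers {time_range.days} days, but data is only "
                f"retained for {retention.days} days.")
    return None


print(retention_warning(timedelta(days=90), timedelta(days=30)))
```

Keeping the decision in the client matches the suggestion above: Elasticsearch would only expose the retention information, and each UI decides how (or whether) to surface it.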

To move this forward we agreed to gather more feedback from Solutions to see whether this is something they already considered.

@roncohen

++ for exposing this and letting Kibana decide how to show it

@dakrone
Member

dakrone commented Aug 27, 2020

@jpountz since this has two area labels, which team should take ownership of this, the search team or the core/features team?

@jpountz
Contributor Author

jpountz commented Aug 27, 2020

Thanks for the ping. Since the idea of adding this information to _field_caps seems to be getting traction, I'm assigning the search team.

@jpountz jpountz removed :Data Management/Data streams Data streams and their lifecycles Team:Data Management Meta label for data/management team labels Aug 27, 2020
@elastic elastic deleted a comment from elasticmachine Sep 24, 2020
@javanna
Member

javanna commented Jun 24, 2024

This has been open for quite a while, and hasn't had a lot of interest. For now I'm going to close this as something we aren't planning on implementing. We can re-open it later if needed.

@javanna javanna closed this as not planned Jun 24, 2024