
The current Jackson configuration allows serializing cluster states that cannot be read back #96976

Open
original-brownbear opened this issue Jun 21, 2023 · 7 comments
Labels
>bug :Core/Infra/Core Core issues without another label Team:Core/Infra Meta label for core/infra team

Comments

@original-brownbear
Member

The current Jackson configuration uses a default maximum string length of ~5M characters. Deserializing any string longer than that throws an exception.

This was uncovered in #96918, where a 5M+ string resulted in a completely broken cluster after restart. Since such a string cannot be deserialized by the current code, we can neither read the state from disk on node restart nor restore a snapshot containing this string.

I think the easiest fix here would be to increase the limit to Integer.MAX_VALUE for the SMILE factory that we use to read the cluster state.
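
For illustration, a minimal sketch of that suggestion using Jackson's StreamReadConstraints builder (available since Jackson 2.15); the class and field names are placeholders, not the actual Elasticsearch change:

```java
import com.fasterxml.jackson.core.StreamReadConstraints;
import com.fasterxml.jackson.dataformat.smile.SmileFactory;

class ClusterStateSmile {
    // Relax the default ~5M-character string limit only for the factory that
    // reads persisted cluster state, so an over-long string can still be parsed.
    static final SmileFactory SMILE_FACTORY = SmileFactory.builder()
        .streamReadConstraints(
            StreamReadConstraints.builder()
                .maxStringLength(Integer.MAX_VALUE)
                .build())
        .build();
}
```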

@original-brownbear original-brownbear added >bug :Distributed/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. labels Jun 21, 2023
@elasticsearchmachine elasticsearchmachine added the Team:Distributed Meta label for distributed team label Jun 21, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@DaveCTurner DaveCTurner added :Core/Infra/Core Core issues without another label and removed :Distributed/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. Team:Distributed Meta label for distributed team labels May 22, 2024
@elasticsearchmachine elasticsearchmachine added the Team:Core/Infra Meta label for core/infra team label May 22, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@DaveCTurner
Contributor

I'm asking @elastic/es-core-infra for help here because this seems like a fundamental problem in the XContent subsystem. In terms of cluster state persistence it's ok if we fail to even write a state because of an over-long string (although clearly better if we succeed and then read it back). It's only a problem if we write something we can't read.

@rjernst
Member

rjernst commented Jun 21, 2024

I agree this is a problem with xcontent. However, I do not think we should raise this limit to max int. That would be a 2GB string. Why did we have a 5MB string in the first place? I'm more inclined to block writing the xcontent with a string length over the limit in the first place.
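
As a rough sketch of that write-side check (purely illustrative — the wrapper class and constant below are hypothetical, not existing Elasticsearch code):

```java
import java.io.IOException;

import com.fasterxml.jackson.core.JsonGenerator;

final class LengthCheckingGenerator {
    // Mirror the read-side default so that anything we manage to write can
    // always be read back; the exact value is an assumption for illustration.
    private static final int MAX_STRING_LENGTH = 5_000_000;

    private final JsonGenerator delegate;

    LengthCheckingGenerator(JsonGenerator delegate) {
        this.delegate = delegate;
    }

    void writeString(String value) throws IOException {
        if (value.length() > MAX_STRING_LENGTH) {
            // Fail the write up front: the offending cluster-state update is
            // rejected, but the state already on disk stays readable.
            throw new IllegalArgumentException(
                "string of length " + value.length()
                    + " exceeds the maximum allowed length of " + MAX_STRING_LENGTH);
        }
        delegate.writeString(value);
    }
}
```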

@original-brownbear
Member Author

I agree, the problem here was initially on the write side, and we should definitely not allow a string that long.
On the read side, though, maybe there shouldn't be a limit for the cluster state specifically. Assuming the state is not corrupted, a limit means that if the state contains a long string for whatever reason, the cluster breaks on a full cluster restart, since you are unable to read the cluster state back from disk.

I wonder if maybe we should have the ability to customize an individual XContentParser instance here? I can see how you'd want that kind of limit on a string when ingesting documents or deserializing REST requests in general; it's less clear to me that you want it when working with cluster state or other internal on-disk metadata, where any failure to read can mean big trouble.
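
To make the idea concrete, a purely hypothetical sketch of a per-parser override — `withMaxStringLength` and the helper method below do not exist in the XContent API today and just stand in for whatever knob it might grow:

```java
import java.io.IOException;
import java.io.InputStream;

import org.elasticsearch.xcontent.XContentParser;
import org.elasticsearch.xcontent.XContentParserConfiguration;
import org.elasticsearch.xcontent.smile.SmileXContent;

class InternalStateReader {
    void readPersistedState(InputStream stateInput) throws IOException {
        // Internal metadata readers would opt out of the string-length limit,
        // while REST request and document parsing keeps the strict default.
        XContentParserConfiguration relaxed =
            XContentParserConfiguration.EMPTY.withMaxStringLength(Integer.MAX_VALUE); // hypothetical method
        try (XContentParser parser = SmileXContent.smileXContent.createParser(relaxed, stateInput)) {
            parseClusterState(parser); // hypothetical helper for the actual state parsing
        }
    }

    private void parseClusterState(XContentParser parser) throws IOException {
        // placeholder for the real cluster-state parsing logic
    }
}
```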

@DaveCTurner
Contributor

> I'm more inclined to block writing the xcontent with a string length over the limit in the first place.

At least if we do this then the state on disk does not become unreadable, which would be much better than how things are today. But just to be clear, if we fail at write time then the master will not be able to commit the state update, which causes it to stand down and start a new election cycle. There's a good chance that the troublesome request came from a TransportMasterNodeAction, which will retry on the newly-elected master, triggering another master failover, and so on. A full cluster restart may be needed to clear things out, and hopefully whatever condition led to the bad state update does not persist across the restart.

@DaveCTurner
Contributor

> retry on the newly-elected master

I expect we can find a way to make this situation un-retryable, once we put in place some mechanism to make the whole write fail.
