Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Version check for serialized data headers #35376

Merged
merged 2 commits into from
Apr 8, 2020

Conversation

c42f
Copy link
Member

@c42f c42f commented Apr 6, 2020

Check that the serialization data format is compatible before attempting to read a serialized stream. New versions of Serialization are assumed to be able to read old serialized data, but attempting to read newer data with an older version of Serialization will fail with an error.

This code was written as part of an alternative solution to patching up CodeInfo serialization in #35371, but seems like it could be useful: it should prevent confusion if a newer julia versions somehow ends up sending data to older workers.

Check that the serialization data format is compatible before attempting
to read a serialized stream. New versions of Serialization are assumed
to be able to read old serialized data, but attempting to read newer
data with an older version of Serialization will fail with an error.
@c42f c42f requested a review from JeffBezanson April 6, 2020 12:41
error("Unknown endianness flag in header")
# Check protocol compatibility.
endian_bom == ENDIAN_BOM || error("Serialized byte order mismatch ($(repr(endian_bom)))")
wordsize == sizeof(Int) || error("Serialized word size mismatch ($wordsize)")
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure these should be errors; in many cases data will be readable moving between a 32-bit and 64-bit machine.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the endianness being mismatched will break nearly everything unless we handle it explicitly? And we don't try to handle it right now AFAICT.

But quite true regarding sizeof(Int) as it's explicitly encoded as Int32 and Int64 and should deserialize ok at the other end. Should it be a warning perhaps?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a couple of quick tests. It turns out that that removing the wordsize check does allow very simple things like homogenous arrays of integers to be deserialized between 32 and 64 bit Julia versions. So that can be helpful.

Once things get slightly complicated things start to break. For example serialized arrays of type Vector{Tuple{Int32,Int64}} are not portable due to platform ABI differences.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @c42f, @JeffBezanson, this 32/64 cross-architecture support is very useful to me. I have a cluster of 32-bit ARM sensor/control boards talking to a 64-bit intel master. I can't use addprocs() because of the architecture mismatch, but I've rolled my own simple RPC based on serialize(io, eval(deserialize(io))). It works nicely for my use case with simple data types (and seems vastly preferable to resorting to MsgPack.jl or ProtoBuf.jl)

It seems like it would be useful to have a serialize_compat function that guarantees to produce cross-architecture / cross-julia-version output. It could initially just throw an ArgumentError for input that isn't compatible. Support for more complex types could be added as needed.

I can see that it makes sense to keep serialize efficiency-focussed and keep the constraint of "must have same system image".

Do you think it would be generally useful to expose a subset of serialize with a compatibility guarantee?

@c42f
Copy link
Member Author

c42f commented Apr 7, 2020

Test failure is UnicodeData.txt

@c42f
Copy link
Member Author

c42f commented Apr 8, 2020

Ok, I've done the wordsize change as discussed, and tests have passed so I'll merge this small change. Hopefully this will reduce confusion for people maintaining Julia on ad-hoc clusters.

@c42f c42f merged commit 7daa424 into master Apr 8, 2020
@c42f c42f deleted the cjf/serialized-data-header-check branch April 8, 2020 03:21
ztultrebor pushed a commit to ztultrebor/julia that referenced this pull request Apr 17, 2020
Check that the serialization data format is compatible before attempting
to read a serialized stream. New versions of Serialization are assumed
to be able to read old serialized data, but attempting to read newer
data with an older version of Serialization will fail with an error.
staticfloat pushed a commit that referenced this pull request Apr 21, 2020
Check that the serialization data format is compatible before attempting
to read a serialized stream. New versions of Serialization are assumed
to be able to read old serialized data, but attempting to read newer
data with an older version of Serialization will fail with an error.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants