Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hash(::Type) inconsistent with some signature tuples (sometimes) #49620

Closed
mortenpi opened this issue May 4, 2023 · 6 comments · Fixed by #49636
Closed

hash(::Type) inconsistent with some signature tuples (sometimes) #49620

mortenpi opened this issue May 4, 2023 · 6 comments · Fixed by #49636

Comments

@mortenpi
Copy link
Contributor

mortenpi commented May 4, 2023

Bit of an heisenbug, but in some cases, the hash() function for signature tuples yields inconsistent hashes. @ararslan was able to see this in a clean Julia session:

julia> t1 = Tuple{AbstractArray,AbstractArray{<:Integer},UnitRange{<:Integer}};

julia> t2 = Tuple{AbstractArray,AbstractArray{<:Integer},UnitRange{<:Integer}};

julia> hash(t1)
0x6460a2e89083289a

julia> hash(t2)
0x7c27d9754fb7224a

I couldn't actually precisely replicate this, but the issue definitely exists, since it causes Documenter to relatively consistently fail when checking missing docstrings in JuliaStats/StatsBase.jl#863. However, even that is not 100% consistent -- it did seem to depend on whether I included Revise or not at some point.

In a nutshell, Documenter needs to do a signature in existing_signatures check (where existing_signatures :: Set{Type}). But the in doesn't work properly. The bad hashing leads to:

julia> t1 = Tuple{AbstractArray,AbstractArray{<:Integer},UnitRange{<:Integer}}
Tuple{AbstractArray, AbstractArray{<:Integer}, UnitRange{<:Integer}}

julia> t2 = Tuple{AbstractArray,AbstractArray{<:Integer},UnitRange{<:Integer}}
Tuple{AbstractArray, AbstractArray{<:Integer}, UnitRange{<:Integer}}

julia> t1 == t2
true

julia> s = Set{Type}([t1, t2])
Set{Type} with 2 elements:
  Tuple{AbstractArray, AbstractArray{<:Integer}, UnitRange{<:Integer}}
  Tuple{AbstractArray, AbstractArray{<:Integer}, UnitRange{<:Integer}}

I checked the StatsBase docs build failure on 1.8.5 and 1.9.0-rc3.

X-ref: https://julialang.slack.com/archives/C66L48G1X/p1683135789756359

@ararslan
Copy link
Member

ararslan commented May 4, 2023

On current master, it seems that the objectid of the first definition differs from all subsequent definitions, which are equal to one another.

julia> t1 = Tuple{AbstractArray,AbstractArray{<:Integer},UnitRange{<:Integer}};

julia> t2 = Tuple{AbstractArray,AbstractArray{<:Integer},UnitRange{<:Integer}};

julia> t3 = Tuple{AbstractArray,AbstractArray{<:Integer},UnitRange{<:Integer}};

julia> t4 = Tuple{AbstractArray,AbstractArray{<:Integer},UnitRange{<:Integer}};

julia> objectid.([t1, t2, t3, t4])
4-element Vector{UInt64}:
 0xed23fe44110a2d2c
 0x6663357f3c298ad5
 0x6663357f3c298ad5
 0x6663357f3c298ad5

@vtjnash
Copy link
Sponsor Member

vtjnash commented May 4, 2023

We eventually need to swap objectid for the actual hash function that agrees with ==

It is implemented in jltypes.c, because the runtime needs it, but it never got exported to Julia so that Dict would work correctly (only IdDict)

@ararslan
Copy link
Member

ararslan commented May 4, 2023

@vtjnash, I'm not sure what you mean; IdDict uses objectid and thus has the same issue as shown above.

julia> IdDict{Type,Int}(t1 => 1, t2 => 2, t3 => 3, t4 => 4)
IdDict{Type, Int64} with 2 entries:
  Tuple{AbstractArray, AbstractArray{<:Integer}, UnitRange{<:Integer}} => 1
  Tuple{AbstractArray, AbstractArray{<:Integer}, UnitRange{<:Integer}} => 4

There is a type_hash function in src/jltypes.c and indeed, applying this change:

diff --git a/src/jltypes.c b/src/jltypes.c
index 902e1e557f..b337fba043 100644
--- a/src/jltypes.c
+++ b/src/jltypes.c
@@ -1575,6 +1575,12 @@ static unsigned type_hash(jl_value_t *kj, int *failed) JL_NOTSAFEPOINT
     }
 }

+JL_DLLEXPORT uintptr_t jl_type_hash(jl_value_t *v) JL_NOTSAFEPOINT
+{
+    int *failed = 0;
+    return type_hash(v, failed);
+}
+
 static unsigned typekey_hash(jl_typename_t *tn, jl_value_t **key, size_t n, int nofail) JL_NOTSAFEPOINT
 {
     if (tn == jl_type_typename && key[0] == jl_bottom_type)
diff --git a/src/julia.h b/src/julia.h
index 0a542dd8b6..89fea54fc4 100644
--- a/src/julia.h
+++ b/src/julia.h
@@ -1403,6 +1403,7 @@ JL_DLLEXPORT int jl_egal__bits(const jl_value_t *a JL_MAYBE_UNROOTED, const jl_v
 JL_DLLEXPORT int jl_egal__special(const jl_value_t *a JL_MAYBE_UNROOTED, const jl_value_t *b JL_MAYBE_UNROOTED, jl_datatype_t *dt) JL_NOTSAFEPOINT;
 JL_DLLEXPORT int jl_egal__unboxed(const jl_value_t *a JL_MAYBE_UNROOTED, const jl_value_t *b JL_MAYBE_UNROOTED, jl_datatype_t *dt) JL_NOTSAFEPOINT;
 JL_DLLEXPORT uintptr_t jl_object_id(jl_value_t *v) JL_NOTSAFEPOINT;
+JL_DLLEXPORT uintptr_t jl_type_hash(jl_value_t *v) JL_NOTSAFEPOINT;

 STATIC_INLINE int jl_egal__unboxed_(const jl_value_t *a JL_MAYBE_UNROOTED, const jl_value_t *b JL_MAYBE_UNROOTED, jl_datatype_t *dt) JL_NOTSAFEPOINT
 {

gives us

julia> typehash(T::Type) = ccall(:jl_type_hash, UInt, (Any,), T)
typehash (generic function with 1 method)

julia> t1 = Tuple{AbstractArray,AbstractArray{<:Integer},UnitRange{<:Integer}};

julia> t2 = Tuple{AbstractArray,AbstractArray{<:Integer},UnitRange{<:Integer}};

julia> typehash(t1)
0x000000004a251309

julia> typehash(t2)
0x000000004a251309

@vtjnash
Copy link
Sponsor Member

vtjnash commented May 4, 2023

That is not a problem for objectid / IdDict, since we also observe that those keys are !==

@vtjnash
Copy link
Sponsor Member

vtjnash commented May 4, 2023

That implementation of typehash for hash(::Type, ::UInt) SGTM

@ararslan
Copy link
Member

ararslan commented May 4, 2023

That is not a problem for objectid / IdDict, since we also observe that those keys are !==

The fact that they're !== is the heart of this issue though, isn't it? 😄 AFAICT, there's no reason why t1 and t2 should have differing object IDs. (That said, defining hash(::Type, ::UInt) as you suggest does fix the Set case noted in the OP).

ararslan added a commit that referenced this issue May 4, 2023
Currently, `hash(::Type, ::UInt)` uses `objectid`, which can have some
odd behavior for types: in particular, subsequent identical type-valued
variable definitions can have `objectid`s which differ from the first
such definition. This has some bizarre downstream effects when e.g.
using types as the values of a `Set` or the keys of a `Dict`. See issue
49620 for examples.

There is an internal `type_hash` C function used for caching types but
isn't exposed to Julia, as Jameson pointed out in the linked issue. This
commit exposes it as `jl_type_hash` which is then used via `ccall` to
define a method `hash(::Type, ::UInt)`. This method then fixes #49620.
Note, however, that this does not affect the differing `objectid`s for
otherwise identical types.
ararslan added a commit that referenced this issue May 5, 2023
Currently, `hash(::Type, ::UInt)` uses `objectid`, which can have some
odd behavior for types: in particular, subsequent identical type-valued
variable definitions can have `objectid`s which differ from the first
such definition. This has some bizarre downstream effects when e.g.
using types as the values of a `Set` or the keys of a `Dict`. See issue
49620 for examples.

There is an internal `type_hash` C function used for caching types but
isn't exposed to Julia, as Jameson pointed out in the linked issue. This
commit exposes it as `jl_type_hash` which is then used via `ccall` to
define a method `hash(::Type, ::UInt)`. This method then fixes #49620.
Note, however, that this does not affect the differing `objectid`s for
otherwise identical types.
ararslan added a commit that referenced this issue May 5, 2023
Currently, `hash(::Type, ::UInt)` uses `objectid`, which can have some
odd behavior for types: in particular, subsequent identical type-valued
variable definitions can have `objectid`s which differ from the first
such definition. This has some bizarre downstream effects when e.g.
using types as the values of a `Set` or the keys of a `Dict`. See issue
49620 for examples.

There is an internal `type_hash` C function used for caching types but
isn't exposed to Julia, as Jameson pointed out in the linked issue. This
commit exposes it as `jl_type_hash` which is then used via `ccall` to
define a method `hash(::Type, ::UInt)`. This method then fixes #49620.
Note, however, that this does not affect the differing `objectid`s for
otherwise identical types.
vtjnash pushed a commit that referenced this issue May 6, 2023
Currently, `hash(::Type, ::UInt)` uses `objectid`, which can have some
odd behavior for types: in particular, subsequent identical type-valued
variable definitions can have `objectid`s which differ from the first
such definition. This has some bizarre downstream effects when e.g.
using types as the values of a `Set` or the keys of a `Dict`. See issue
49620 for examples.

There is an internal `type_hash` C function used for caching types but
isn't exposed to Julia, as Jameson pointed out in the linked issue. This
commit exposes it as `jl_type_hash` which is then used via `ccall` to
define a method `hash(::Type, ::UInt)`. This method then fixes #49620.
Note, however, that this does not affect the differing `objectid`s for
otherwise identical types.
KristofferC pushed a commit that referenced this issue May 8, 2023
Currently, `hash(::Type, ::UInt)` uses `objectid`, which can have some
odd behavior for types: in particular, subsequent identical type-valued
variable definitions can have `objectid`s which differ from the first
such definition. This has some bizarre downstream effects when e.g.
using types as the values of a `Set` or the keys of a `Dict`. See issue
49620 for examples.

There is an internal `type_hash` C function used for caching types but
isn't exposed to Julia, as Jameson pointed out in the linked issue. This
commit exposes it as `jl_type_hash` which is then used via `ccall` to
define a method `hash(::Type, ::UInt)`. This method then fixes #49620.
Note, however, that this does not affect the differing `objectid`s for
otherwise identical types.

(cherry picked from commit 633d1ae)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants