RFC: Simpler array hashing #26022

Merged: 8 commits into master from mb/simplerarrayhashing, Aug 2, 2018
Conversation

@mbauman (Author) commented Feb 13, 2018

This is a straw-man implementation of a simpler array hashing scheme. It's very basic and has room for further optimizations, but its intention is to provide a starting point as an alternative to #25822. In short: it hashes the size of the array, then the first three distinct elements with their linear indices, and then the last three distinct elements with their linear distance from the end of the array.

The exact scheme here is open to bikeshedding -- we could use more or fewer distinct elements. I use "distinct elements" as a way of ensuring that relatively empty sparse arrays don't all hash similarly, and I hash elements at both the beginning and end of the array because they're the simplest to get to and the final elements are the "most expensive" place to discover a difference if we have to fall back to an equality check. The most complicated part is keeping track of what you hashed in order to prevent running through the entire array twice (once forwards and once backwards).
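
For concreteness, here is a minimal self-contained sketch of the scheme described above (an illustration only, not the code in this PR; the name sketch_array_hash is made up, and it skips the bookkeeping that keeps the forward and backward passes from overlapping on small arrays):

    function sketch_array_hash(A::AbstractArray, h::UInt)
        h = hash(size(A), h)            # hash the size of the array
        isempty(A) && return h
        n = length(A)
        seen = Any[]                    # at most three values, so a plain Vector is fine
        # forward pass: first three distinct elements and their linear indices
        for i in 1:n
            v = A[i]
            if !any(x -> isequal(x, v), seen)
                push!(seen, v)
                h = hash(i => v, h)
                length(seen) == 3 && break
            end
        end
        # backward pass: last three distinct elements and their distance from the end
        empty!(seen)
        for i in n:-1:1
            v = A[i]
            if !any(x -> isequal(x, v), seen)
                push!(seen, v)
                h = hash((n - i) => v, h)
                length(seen) == 3 && break
            end
        end
        return h
    end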

@mbauman added the domain:arrays label on Feb 13, 2018
@nalimilan (Member) commented:

Thanks for this. I certainly appreciate the simplicity of this approach compared with the current one. I think we should keep in mind a third possible approach, though, which is to just drop O(1) hashing for ranges (it's not clear that's very useful in practice) and use a simple solution similar to what was implemented before #16401. As a hybrid solution, we could also continue to hash all elements for multidimensional arrays (as done both before and after #16401), and only change how vectors are hashed, since that's what is needed to hash ranges in O(1) time.

Assuming we adopt the general approach from this PR (i.e. do not hash all elements nor all differences between subsequent elements, but only a part of the information), several other quantities could be included in the hash, with the objective of making collisions less likely when modifying a single entry in the middle of the array:

  • hash the sum of all elements
  • hash the number of nonzero elements
  • hash the indices of series of nonzero elements (i.e. ranges of indices rather than individual indices)

These quantities have the advantage that they can be computed in O(1) time for (integer, at least) ranges and relatively efficiently for sparse matrices. But they require going over the full array, making the computation of hashes O(N) for general arrays while your PR is currently O(1).
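
To make the O(1)-versus-O(N) contrast concrete, here is a small illustration of my own (the helper name range_sum is made up; Base's sum already specializes on ranges in the same spirit): the sum of an integer range follows from the arithmetic-series formula without touching any element, while a generic array needs a full pass before the sum can be folded into the hash.

    # Sum of an arithmetic progression: n*(first + last)/2, computed in O(1).
    range_sum(r::AbstractRange{<:Integer}) = length(r) * (first(r) + last(r)) ÷ 2

    range_sum(1:1_000_000_000) == 500_000_000_500_000_000   # true, computed instantly
    # A generic AbstractArray has no such shortcut: sum(A) must visit every element.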

Overall I think we should identify a few typical use cases to decide which approach is the best one. Without that it's difficult to know what's the best tradeoff between computation cost and probability of collision. In general I'm a bit concerned about the fact that changing the value of entries in the middle of the array wouldn't change the hash: ideally collisions shouldn't happen in such trivial cases IMHO. But maybe that's fine in practice and the reduction in hashing cost is worth it. I'll just leave a pointer to JuliaLang/Distributed.jl#48, which AFAICT isn't a very correct use of array hashing, but which is going to become even more problematic if we make collisions much more likely (it could use a custom algorithm if needed).

Doing some research, I've found only two examples of array hashing in main languages:

# Efficient O(1) method equivalent to the O(N) AbstractArray fallback,
# which works only for ranges with regular step (RangeStepRegular)
function hash_range(r::AbstractRange, h::UInt)
    h += hashaa_seed
@mbauman (Author) commented on this diff:

Note to self: I forgot to include the hashaa_seed in the computation

# But we cannot define those here, due to bootstrapping. They're defined later in iterators.jl.
function hashndistinct(A, I, n, hx)
    _hashcheckbounds(A, I)
    seen = Set()
A member commented:

Would probably be better to use a Vector for this since the number of items is small, and it will avoid hashing them multiple times.

@mbauman (Author) replied:

Yeah, I was thinking of using a specialized 3-vector to avoid allocations entirely, but figured we could decide on the direction and exact behavior first before optimizing it further.

@JeffreySarnoff (Contributor) commented Feb 14, 2018

This is a good optimization. Here are some hash admixes worth considering.

In some manner, one from one or more of these pairs may help:

(a) the shape (dims) # reducingly hash each dim size
(α) the length (prod(dims))

(b) the initial 1st, 2nd, 7th and final 1st (end), 2nd, 7th (end-6) (or missings)
(β) the initial and final two and initial and final fn(length)th element
fn(length) = mod(maxmin(length, 61)...,) + 1

(edited to comport with reality, thank you for the note, Milan)

@nalimilan (Member) commented:

@JeffreySarnoff (0), (a) and (α) do not matter for isequal, so they definitely cannot enter the hash.

@mbauman (Author) commented Feb 14, 2018

So the options and their tradeoffs here are:

  • If we don't hash every single element, it's easy to mutate an array and have its hash remain the same. That's just fine, though, since hash collisions are allowed and will fall back to equality checks. We just don't want it to happen too often, since those equality checks could be expensive to repeatedly perform.
  • If we hash every single element, it's very expensive for sparse matrices to compute every single element. Making hashing ranges O(N) might make an otherwise cromulent structure like 1:typemax(Int) stall till the heat death of the universe if you happen to put it in a set.
  • If we do something clever, we have to be doubly clever in order to deal with heterogeneous arrays of things that might not support the cleverness or might overflow/underflow/change precision (e.g., subtraction for diff, addition for sum, promotion and precision difficulties in both cases). The only sort of cleverness I'd advocate for is the kind that only relies upon hashing and equality — like run-length encoding or distinct elements.

I'm becoming more convinced that an O(1) hashing method is the way to go here. I'd happily extend the number of elements to help assuage you.
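
For reference, a minimal sketch of the run-length-encoding flavor of "cleverness" mentioned in the last bullet above, relying only on isequal and hash (my own illustration; the name sketch_rle_hash is made up): hash each run of equal values once, together with its length. A real sparse speedup would also need a method that walks only the stored entries; this generic version still visits every element.

    function sketch_rle_hash(A::AbstractVector, h::UInt)
        h = hash(size(A), h)
        isempty(A) && return h
        runval = first(A)      # value of the current run
        runlen = 1             # length of the current run
        for i in firstindex(A)+1:lastindex(A)
            v = A[i]
            if isequal(v, runval)
                runlen += 1
            else
                h = hash(runlen => runval, h)   # emit the finished run
                runval, runlen = v, 1
            end
        end
        return hash(runlen => runval, h)        # emit the final run
    end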

@nalimilan (Member) commented:

I generally agree, the current approach is too brittle and complex. I'm still a bit sad to move to an approach which makes it so easy to get collisions. Maybe I just need some time to accept this. ;-)

Another less fragile possibility which occurred to me would be to use another property of ranges, which is less strict than computing differences between subsequent elements but also less problematic to check: the fact that ranges are monotonically increasing or decreasing. We could continue to hash all elements for vectors which are not sorted and multidimensional arrays (keeping RLE for efficiency with sparse matrices of course), but use the O(1) approach for sorted vectors (for element types which support isless). In practice, it should be quite rare that a user would hash several ordered vectors of the same length and which start and end with the same elements, so this should dramatically reduce the risk of collisions.

@JeffreySarnoff (Contributor) commented:

I agree that losing discrimination is a negative (there are algorithms that thrive on some probable assurance of different things hashing differently, so things need only be visited once to be useful with other things of a group).

Increasing the spans at the front and back that are hashed, and speckling in some of the ith quartile positions (or a few thereabouts), should do a decent job of discrimination, especially where the length >> some mod of a length-relevant prime is used to pick the span between successive hash-sampled values (or value pairs).

Or for speed, keep a sequence of indices to pluck and let them wrap on the length.

@JeffBezanson (Member) commented:

One option might be to use exponentially-spaced indices, and hash log(n) points.
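
A rough illustration of that idea (assumed name, vector-only, 1-based indexing): hash the entries at indices 1, 2, 4, 8, ..., which touches about log2(n) elements.

    function logpoints_hash(A::AbstractVector, h::UInt)
        i = 1
        while i <= length(A)
            h = hash(A[i], h)   # indices 1, 2, 4, 8, ...
            i *= 2
        end
        return h
    end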

@JeffreySarnoff (Contributor) commented:

... and interweave those indices, taking them from [1] increasing and from [end] decreasing, to disambiguate small vectors and similar arrays more successfully?

@JeffreySarnoff (Contributor) commented:

struct EntityHash{T<:Union{UInt64, Missing}}
    eachentity::UInt64  # every hashable thing has this valued
    collective::T       # an aggregative thing may have this valued
                        #    lazily done only if needed or where
                        #    the e.g. write stream does the calc
end

Things that are items or elements have sizeof(EntityHash) == sizeof(UInt64)
Things that are collectives and are more fully hashed have twice that size

@martinholters (Member) commented:

We could continue to hash all elements for vectors which are not sorted and multidimensional arrays (keeping RLE for efficiency with sparse matrices of course), but use the O(1) approach for sorted vectors (for element types which support isless).

Fails for the totally_not_five from #26034. (Not that that example seems particularly relevant, but the fact that it is so easy to define an object which breaks the approach is concerning.)

@mbauman (Author) commented Feb 16, 2018

Taking log(n) hashes is quite compelling — particularly if we increase distance between hashes as we iterate from the back of the array, since that weights the elements involved in the hash towards the "most expensive" to differentiate with isequal. The major downside I see there is that many sparse matrices will have fewer than log(n) stored values.

What if we used a combination of these approaches?

  • Given array A, consider the list of hashes hxs that contains the hashes of those elements that are distinct from their neighbor, working backwards from the end of the array. That is:
    hxs = [hash(A[end]); [hash(A[i]) for i in lastindex(A)-1:-1:firstindex(A) if hash(A[i]) != hash(A[i+1]) && A[i] != A[i+1]]]
  • hash(A) is sum(hxs[[2^i for i=0:floor(Int, log2(end))]])

Of course, you would never compute it this way, but it can be done iteratively.

This has the slightly strange effect that we compute the hash of every element for Array but don't use them all in the computation, but I think it's way better than taking the difference of every element. The advantages are: fast iteration through sparse arrays (often just alternating between a hash of zero and a hash of a stored value), and we only need to hash O(log(N)) elements in computed arrays where we know successive elements are not equal.
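
An iterative rendition of that definition, roughly (my own sketch of the scheme just described, with a made-up name; a real version would also mix in the array size): walk backwards, count the entries that differ from their right-hand neighbor, and add to the sum only those hashes whose count is a power of two.

    function sketch_hxs_hash(A::AbstractVector, h::UInt)
        isempty(A) && return h
        prev = A[end]
        s = hash(prev)          # hxs[1] = hash(A[end]); position 1 = 2^0 is always included
        count = 1               # how many entries hxs would contain so far
        next_pick = 2           # the next power-of-two position to include in the sum
        for i in lastindex(A)-1:-1:firstindex(A)
            v = A[i]
            if !isequal(v, prev)
                count += 1
                if count == next_pick
                    s += hash(v)
                    next_pick *= 2
                end
                prev = v
            end
        end
        return hash(s, h)
    end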

@mbauman (Author) commented Feb 16, 2018

The proposal in my last post still leaves a bad taste in my mouth. If we're going to do log(N), let's make it work for all array types. Here's another idea:

  • Iterate the array backwards
  • Add the hashes of the first 3 (or so) distinct values you see, stopping at index j
  • Launch into a blind log(N) hashing regime starting at index j, adding sum(hash.(A[[j - 2^i for i in 0:floor(Int, log2(end-j))]]))

@StefanKarpinski (Member) commented:

I'm a bit concerned that hashing log(N) entries in a sufficiently sparse array will tend to catch none of the non-zeros.

@mbauman (Author) commented Feb 16, 2018

Right — that's my second proposal. In short: Change this PR (which does O(1) hashing of some small number of distinct elements from the front and back) to ignore the forward iteration and augment the backwards iteration with log(N) hashes after it's hashed the small number of distinct elements.

@nalimilan (Member) commented:

That's an interesting solution. There are a few things I don't understand:

  • Why do you need to iterate backwards?
  • Why do you compare the hashes of subsequent elements before comparing them directly?
  • Why sum the hashes rather than combine them with hash?

@mbauman (Author) commented Feb 16, 2018

Great questions:

  • I iterate backwards because of the asymmetry between == and hash. In the case of a hash collision, we fall back to ==. By default and most commonly, == iteratively compares its elements walking forwards through the array. This means that — if the arrays are indeed different — it'll probably find that faster. I would bet this is why C# hashes the last 8 elements.
  • I compare hashes because I had been thinking we were already computing them — either as part of the total array hash or in a Set to hold the distinct elements. Might as well use them!
  • I sum h += hash(A[i]) instead of chaining with hash(A[i], h) in order to allow for the reuse of the hashes.

But you're right — it's probably not worth hashing elements that don't get used in the total hash, and we probably don't want to use a Set for such a limited number of elements and comparisons.
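
A tiny throwaway example of the difference between the two accumulation styles:

    h0 = zero(UInt)
    a, b = 1.5, 2.5
    h_sum   = h0 + hash(a) + hash(b)   # commutative; hash(a) and hash(b) can be reused as-is
    h_chain = hash(b, hash(a, h0))     # chained; each step depends on everything before it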

@nalimilan (Member) commented:

I think we should proceed with any reasonable solution so that the API is stabilized and BigFloat can be moved to stdlib (#25716). We can always switch to a different approach in the future since it's not breaking.

@JeffreySarnoff (Contributor) commented:

I'm a bit concerned that hashing log(N) entries in a sufficiently sparse array will tend to catch none of the non-zeros.

Unclear that log(N) is the "hatrack" .. the same would be true of sqrt(N) samples.

Milan's push to get this one done so other contingencies can move along makes sense. It all sounds non-blocking and pathway providing.

Maybe it is worthwhile to keep a short vector of linearized indices to nonzero values to provide informed indirection. 32 or 64 Int64s could do that for a sparse array of 10 trillion elements. If the data structure knows the index of its first and its final nonzero entries, two of those indices could be rewritten when that internal state is rewritten. Two can get two more (the next innermost) and the structure may find that useful too, giving the hash at worst 4 nonzero values and their linearized indices -- 8 nonzero Ints to hash, or better.

@nalimilan (Member) commented:

@mbauman Do you plan to finish this, or do you need help?

@JeffBezanson added this to the 1.0.x milestone on May 30, 2018
@mbauman (Author) commented Jun 29, 2018

Yes, let's resurrect this. I have a WIP branch locally where I started implementing #26022 (comment) a while ago, but I stalled on it (and then forgot about it) because I didn't like how hard it was to either implement or describe. I'll take a second look this afternoon, and see if I can find some simplifications.

@StefanKarpinski (Member) commented:

Here's another thought: hash arrays in such a manner that if an array approximates a range, it hashes like that range, and just take the hit of those two objects being in the same bucket. After all, what are the chances that you're hashing both arrays and ranges together and have an array that is almost equal to a range but not quite exactly equal to it? That's the only bad case with such an approach.

@oscardssmith (Member) commented:

Won't that fail because any bucket will have an edge with the same precision problems that you were trying to avoid?

@mbauman (Author) commented Jul 10, 2018

Ok, I've fixed the type instability (#15276), and I'm pretty happy with this now. I've done some cursory testing of the performance impacts here by filling and emptying dictionaries of arrays — overall things look pretty good but note that this can cause dictionaries to change algorithmic complexity. I contrived the last two cases such that they're hash collisions but unequal; this is a worst case and leads to O(length(A)*length(dict)^2) performance. Of course, it's also possible to construct a set of arrays that will all have colliding hashes with the current system, too — it's just a bit harder to do. I've flagged the asymptotically bad cases with an asterisk in the table below.

keys                                                                        fill_master  empty_master  fill_pr   empty_pr
[(1:10) .- i for i in 1:100000]                                             0.01         0.008         0.04      0.03
[(1:2^60) .- i for i in 1:100000]                                           0.01         0.005         0.1       0.07
[(1:10)./i for i in 1:100000]                                               0.08         0.04          0.05      0.03
[(1:2^60)./i for i in 1:100000]                                             ~Inf*        ~Inf*         0.4       0.2
[rand(1000, 100) for _ in 1:100]                                            0.1          0.07          0.00007   0.0001
[rand(1000, 100) for _ in 1:100000]                                         100          70            0.3       0.2
[Array{Any}(rand(1000,100)) for _ in 1:100]                                 0.9          0.4           0.0004    0.0006
[Array{Any}(rand(1000,100)) for _ in 1:1000]                                7            4             0.006     0.003
A = rand(1000,100); [setindex!(copy(A), i, length(A)-20) for i in 1:100]    0.1          0.07          1.1*      0.0002
A = rand(1000,100); [setindex!(copy(A), i, length(A)-20) for i in 1:1000]   2            0.7           100*      0.007

@vtjnash (Member) commented Jul 10, 2018

O(length(A)*length(dict)^2) performance

I think this is usually dealt with by using a more resilient dictionary algorithm (this is the problem case for the linear probing scheme we currently use) or a more tailored data-structure for the problem (such as a Trie). I'm surprised it's quite that bad though – typically the goal is to have the worst case for hash collision be to turn the lookup into a linear-scan (without the square power). Where does the ^2 power derive from?

i = last(I)
v1 = A[i]
h = hash(i=>v1, h)
i = let v1=v1; findprev(x->!isequal(x, v1), A, i); end
A member commented:

Possibly should use Ref{eltype(A)}(v1) here, or just open-code the loop (to make heterogeneous arrays fast). The same would apply to h = hash(i=>v1, h) though (vs. writing it as h = hash(i, hash(v1, h))), so I'm not sure of the importance of it.

@mbauman (Author) replied:

I added a simple type-unstable case to the matrix above — this is already smoking the status quo (largely by simply touching fewer unstable elements). I get your point about constructing the Pairs, but I don't fully understand the Ref in the closure. In any case, this should be a relatively small constant cost.

A member commented:

Maybe add a comment explaining why let is used.

@mbauman (Author) commented Jul 10, 2018

Where does the ^2 power derive from?

This is the complexity for inserting the whole set of hash-equal items into the dictionary. For each item, it must compare == against all existing items in the dictionary.
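
Spelling that count out with a rough back-of-the-envelope model (my own, not a measurement; the helper name is made up):

    # m mutually colliding but unequal arrays of length L: inserting the k-th one
    # compares it (via ==, an O(L) scan here) against the k-1 already stored.
    collision_fill_cost(m, L) = L * m * (m - 1) ÷ 2
    collision_fill_cost(1000, 100_000)   # ≈ 5e10, in line with the ~100 fill_pr in the table's last row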

@nalimilan (Member) commented:

Thanks! The code and the benchmarks look good to me. Of course it's somewhat slower for ranges, and much slower when there are collisions, as expected. But the main question which is hard to address is whether collisions are frequent in practice. If that's really the case (which isn't very likely), an ideal approach would maybe be to have Dict use two hashes: this one as a first default pass, and a systematic hashing of all elements as soon as a collision happens (ignored if ranges are involved). If collisions are rare, this PR should make things quite a bit faster since far fewer elements are hashed.

BTW, if this solution is retained, we'll need to do something about JuliaLang/Distributed.jl#48. Maybe just use the old algorithm there, to limit the risk that the array wouldn't be updated because of collisions (but of course it's not completely correct either).

fibskip = prevfibskip = oneunit(j)
while j > fibskip
    j -= fibskip
    h = hash(A[J[j]], h)
A member commented:

@inbounds would be OK here right?

Also, have you considered hashing the first entry which differs from the previous/next one starting from J[j]? That would improve detection of differences between sparse matrices. I'm concerned that hashing the last distinct values isn't enough.

@mbauman (Author) replied:

Sure, this could be @inbounds but I wouldn't expect it to make a difference in performance — profiling hashing a Float64 array shows checkbounds accounts for 8 samples out of ~5000. I'm inclined to leave it.

Also, have you considered hashing the first entry which differs from the previous/next one

Yes, something along those lines is compelling, but I had avoided it due to the challenge of efficiently skipping linear offsets across cartesian indices. Now that I have coded that computation here, though, perhaps it wouldn't be so bad. I'll see if it can be done reasonably.

…ments

    Goal: Hash approximately log(N) entries with a higher density of hashed elements
    weighted towards the end and special consideration for repeated values. Colliding
    hashes will often subsequently be compared by equality -- and equality between arrays
    works elementwise forwards and is short-circuiting. This means that a collision
    between arrays that differ by elements at the beginning is cheaper than one where the
    difference is towards the end. Furthermore, blindly choosing log(N) entries from a
    sparse array will likely only choose the same element repeatedly (zero in this case).

    To achieve this, we work backwards, starting by hashing the last element of the
    array. After hashing each element, we skip the next `fibskip` elements, where
    `fibskip` is pulled from the Fibonacci sequence -- Fibonacci was chosen as a simple
    ~O(log(N)) algorithm that ensures we don't hit a common divisor of a dimension and
    only end up hashing one slice of the array (as might happen with powers of two).
    Finally, we find the next distinct value from the one we just hashed.
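
A greatly simplified, vector-only sketch of that strategy (illustration only, not the actual implementation in this PR; it omits the keys/linear-index handling for multidimensional arrays and the mod-4096 gating discussed below, and the function name is made up):

    function sketch_fib_hash(A::AbstractVector, h::UInt)
        h = hash(size(A), h)
        isempty(A) && return h
        fibskip, prevfibskip = 1, 0
        j = lastindex(A)
        while true
            v = A[j]
            h = hash(j => v, h)            # hash index => value pairs for extra structure
            j -= fibskip                   # skip backwards by the current Fibonacci number
            fibskip, prevfibskip = fibskip + prevfibskip, fibskip
            j < firstindex(A) && break
            # move to the previous entry whose value differs from the one just hashed
            j = findprev(x -> !isequal(x, v), A, j)
            j === nothing && break
        end
        return h
    end
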
@mbauman (Author) commented Jul 11, 2018

Thanks @nalimilan — I like the design you proposed in #26022 (comment) much better.

fibskip, prevfibskip = fibskip + prevfibskip, fibskip

# Find a key index with a value distinct from `elt` -- might be `keyidx` itself
keyidx = findprev(!isequal(elt), A, keyidx)
A member commented:

So IIUC this finds the first entry after keyidx which differs from the last hashed element. That's a bit different from what I had in mind: I was thinking about finding the first element which differs from the one at keyidx. I'm not sure it's really better, but the idea was that since in a sparse array it's likely that keyidx hits a structural zero, looking for the previous distinct element makes it likely you'll hash a non-zero entry. With your approach, if you hash a zero the first time, you will look for the previous non-zero entry the next time, but if you hit a zero entry the time after that, you'll happily hash it; so you'll end up hashing a zero half of the time, right?

@mbauman (Author) replied:

Yes, that's exactly right. I think it makes the behavior a little more robust — otherwise the hashes of sparse arrays with a nonzero last element will more likely hash the same. I also think it's most likely that diagonals of sparse matrices are filled.

That said, this is now clearly not hashing enough elements. The test failures are from four-element arrays colliding. Gotta slow down the exponential a little bit.

The member replied:

Yes, that's exactly right. I think it makes the behavior a little more robust — otherwise the hashes of sparse arrays with a nonzero last element will more likely hash the same. I also think it's most likely that diagonals of sparse matrices are filled.

Sorry, I'm not sure I follow. Could you elaborate?

That said, this is now clearly not hashing enough elements. The test failures are from four-element arrays colliding. Gotta slow down the exponential a little bit.

Agreed.

The member added:

BTW, why use findprev, which requires all this keys vs. linear indices dance, instead of a plain loop? I imagine one reason could be that findprev could have a specialized method for sparse arrays which would skip empty ranges, but currently that's not the case. Is that what you have in mind?

@mbauman (Author) replied:

Ah, I misunderstood your comment at first — I thought you were suggesting only unique'ing against the very first element. Now I understand that you mean to access the value after each skip, and then hash the next element that's different from it.

There are two reasons to do the keys vs. linear indices dance: one is findprev, but the other is that I want to hash index-value pairs to add a bit more information about the structure of the array. And of course, we cannot introduce a hashing difference between IndexLinear and IndexCartesian arrays. The fact that we then also allow arrays to opt into optimizations via findprev is a nice bonus, especially since it's going to be a pain to completely re-write your own hash optimization that has exactly the same behavior.

@mbauman (Author) commented Jul 13, 2018

Ok, I've slowed down that exponential — by quite a bit. I now only grab the next number in the Fibonacci sequence after every 4096 hashes. I chose this number by considering:

  • The large anticipated penalty for a hash collision. I wanted to maximize the size of the array that gets 100% of its elements hashed, especially in light of JuliaLang/Distributed.jl#48 ("remotecall_fetch vulnerable to hash collisions?").
    • Every element in arrays smaller than length 4096 (64x64 or 8x8x8x8) is included in the hash computation.
    • For an array of size 8192 with no sequentially repeated elements, 75% of its elements are hashed: every other element is hashed in the first half of the array, and every element is hashed in the second half.
  • The time it takes to hash ranges with typemax(Int) elements: they used to hash instantly, so I wanted to keep this at least somewhat tractable. With this choice on my beefy machine, it takes ~0.5 s to hash (1:typemax(Int64))/pi, and ~10 ms to hash 1:typemax(Int64).
    • For an array of a billion elements with no repeats, 99,915 elements are hashed (10^5/10^9 — 0.01%)
    • For an array of typemax(Int) elements with no repeats, 276,303 elements are hashed (10^5.5/10^19).

This is the balance point that seems reasonable to me — a power of two just because it's slightly cheaper to perform the mod.

@nalimilan (Member) commented Jul 14, 2018

Makes sense. I wonder whether we couldn't find a function which increases slowly for common sizes, but much faster when sizes get really large. Or maybe we could just apply an arbitrary threshold beyond which we stop hashing entries? A Vector{UInt8} with typemax(Int) elements takes 8e6 TB, which hardware won't be able to handle for decades or centuries. It wouldn't be terrible if we stopped at typemax(Int)/1e6 elements.

EDIT: the goal being to ensure even the largest possible range doesn't take one second to hash.

# entirely-distinct 8000-element array will have ~75% of its elements hashed,
# with every other element hashed in the first half of the array. At the same
# time, hashing a `typemax(Int64)`-length Float64 range takes about a second.
if mod(n, 4096) == 0
A member commented:

Since performance is important here, why not replace the mod with explicit bit twiddling? In this case I would suggest replacing this line with:

if (Int==Int64 && n<<52==0) || n<<20==0

That's because 4096 = 2^12 and 52 = 64 - 12 and 20 = 32 - 12. Also the Int==Int64 part will get inlined so this will be effectively replaced by either n<<52==0 or n<<20==0.

Another member replied:

If you use rem(n, 4096) == 0 the compiler takes care of the rest.

The first member replied:

I always forget about the rem and %
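
For what it's worth, all the spellings agree for the nonnegative counters used here; a quick throwaway check:

    for n in (1, 4095, 4096, 8192, 12345)
        a = mod(n, 4096) == 0
        b = rem(n, 4096) == 0
        c = (n & 4095) == 0                              # 4095 = 2^12 - 1 masks the low 12 bits
        d = Int == Int64 ? n << 52 == 0 : n << 20 == 0   # the shift spelling suggested above
        @assert a == b == c == d
    end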

@JeffBezanson (Member) commented:

Bump. Could we add the test cases from #27865, #26011, and #16401 (comment) and merge this?

@mbauman (Author) commented Jul 26, 2018

Ok, test cases pushed. There are, of course, many other schemes that we could dream up here… but triage was in support of merging this as it stands.

To help avoid JuliaLang/Distributed.jl#48, I'm putting together another PR that will basically make Distributed use its own hashing system — the constraints are a little different there since we don't need to worry about hash equality across different types. I don't think the two need to be merged at the same time, but I should have that ready to go by tomorrow.

@JeffBezanson merged commit b0bf91e into master on Aug 2, 2018
@JeffBezanson deleted the mb/simplerarrayhashing branch on August 2, 2018
KristofferC pushed a commit that referenced this pull request Feb 11, 2019

Fixes #27865 and fixes #26011.

Fixes #26034
@nalimilan mentioned this pull request on Oct 9, 2023