Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support ordinal types in diff(AbstractRange) #41390

Merged
merged 3 commits into from
Jul 19, 2021

Conversation

Moelf
Copy link
Sponsor Contributor

@Moelf Moelf commented Jun 27, 2021

fix #41336

@Moelf
Copy link
Sponsor Contributor Author

Moelf commented Jun 27, 2021

expecting -(T, T) -> T is pretty reasonable at the first glance I think, so I'm happy that type conversion is not really used when I saw all the tests just passed.

What performance implication this will bring is unknown, probably need a comprehensive benchmark in Julia and maybe even other packages where T[] conversion inside diff actually take place.

In the dream world we could use Base.return_types to prioritize the -(T, T) result and default to converting back to T[].

If people don't like this, we can move this into Dates with diff(AbstractRange{T}} where T<:Dates.AbstractDateTime

@DilumAluthge DilumAluthge added the needs nanosoldier run This PR should have benchmarks run on it label Jun 27, 2021
@mcabbott
Copy link
Contributor

mcabbott commented Jun 28, 2021

Must diff return a Vector? If not, this could be

julia> _diff(r::AbstractRange) = StepRangeLen(step(r), zero(step(r)), length(r)-1);

julia> rr = DateTime(2021,1,1):Hour(1):DateTime(2021,1,2); # from 41336

julia> _diff(rr) == diff(collect(rr))
true

And if it must be a Vector, can't it be just fill(step(r), length(r)-1)?

@Moelf
Copy link
Sponsor Contributor Author

Moelf commented Jun 28, 2021

Well, the issue is right now DateTime - DateTime give you millisecond. Basically the question is do we want diff() to return the same type as eltype - eltype.

Personally I think it should, because diff conceptually is doing subtraction on neighboring element and it would be weird that manual diff return different types than our extra smart diff

@mcabbott
Copy link
Contributor

mcabbott commented Jun 28, 2021

Why the difference is in milliseconds & the step is in years, I don't know, is there a clear preference? Both are time intervals, and I read the bug as being that it returned dates (i.e. times since 0AD) which is the element type but not the type of the difference.

And, it seems a little odd to write a specialisation for AbstractRange which doesn't use its structure at all, when what's being computed is so close to the defining property of an AbstractRange. The method as written would work for AbstractVector too, right?

If for some reason this must return something === (rr[2] - rr[1]) and not just ==, that could surely be arranged, without repeating the calculation N times. Although it is hard to imagine the cost ever mattering.

@Moelf
Copy link
Sponsor Contributor Author

Moelf commented Jun 28, 2021

Why the difference is in milliseconds

Because -(DateTime, DateTime) is doing .instant, which are in millisecond

AbstractRange which doesn't use its structure at all, when what's being computed is so close to the defining property of an AbstractRange

Because being smart may defy people's intention of adding special -(T, T) method: they expect diff() to respect the special arithmetic

@mcabbott
Copy link
Contributor

defy people's intention of adding special -(T, T) method

If you add a method for - such that (start + step) - start != step, then you have bigger problems.

What I meant when asking "why" was whether there was a reason that (start + step) - start === step should fail. And I guess the logic is that step has whatever units you defined the range with, but + throws those away, so - can't see and has to pick some units, and it picks milliseconds.

@Moelf
Copy link
Sponsor Contributor Author

Moelf commented Jun 28, 2021

because - is not the inverse operation of + in DateTime realm:

a = DateTime(2021,1,1):Hour(1):DateTime(2021,1,2)

julia> a.start + a.step
2021-01-01T01:00:00

julia> a.start + a.step - a.start
3600000 milliseconds

(addition of two DateTime would error, subtraction would give you Period, for Periods, they obey the "normal" +/- expectation)

well, I more or less agree with you I just don't know if people rely on diff() giving exact same type as subtraction even if == of course would hold.

@mcabbott
Copy link
Contributor

mcabbott commented Jun 28, 2021

Yes, the difference is of dates is an element of the tangent space, is how I would phrase it. As is the step. It just happens that, along the path to computing the difference, the precise units of the step have been lost, so we get standard ones.

Maybe more interesting is to think about monthly steps, where things are't ==.

julia> rm = DateTime(2015,1,1):Month(1):DateTime(2021,1,2);

julia> _diff(rm)  # "252-element StepRangeLen{Month, Month, Month}"
Month(1):Month(0):Month(1)

julia> ans == diff(collect(rm))
false

julia> diff(collect(rm))
252-element Vector{Millisecond}:
 2678400000 milliseconds
 2505600000 milliseconds
 2678400000 milliseconds

this show looks weird

Oh that's a bug. It's supposed to have special printing for zero step, but should check iszero:
https://github.com/JuliaLang/julia/pull/40320/files#diff-8e63081442c6b1785e18b93e97805e8191e62860d828af6d50b530c90b72a847R930

@Moelf
Copy link
Sponsor Contributor Author

Moelf commented Jun 28, 2021

where things are't ==

I guess we will still go with what is in the PR now then

@JeffBezanson
Copy link
Sponsor Member

@nanosoldier runbenchmarks(ALL, vs=":master")

@JeffBezanson JeffBezanson changed the title let compiler figure out types in diff(AbstractRange) support ordinal types in diff(AbstractRange) Jul 6, 2021
@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @christopher-dG

@mbauman
Copy link
Sponsor Member

mbauman commented Jul 7, 2021

We don't have a single call to diff in BaseBenchmarks — and I'd be pretty surprised if diff(::AbstractRange) ended up inside something else that gets benchmarked since, well, you should probably just look directly at step instead if you're truly chasing performance. This LGTM.

@vtjnash vtjnash removed the needs nanosoldier run This PR should have benchmarks run on it label Jul 19, 2021
@vtjnash vtjnash merged commit c876fba into JuliaLang:master Jul 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

diff on ranges of DateTimes returns DateTimes, not Periods
8 participants