Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Days.daysBetween int overflow #471

Merged
merged 7 commits into from
Apr 16, 2020
Merged

Fix Days.daysBetween int overflow #471

merged 7 commits into from
Apr 16, 2020

Conversation

TuanNguyen27
Copy link
Collaborator

@TuanNguyen27 TuanNguyen27 commented Apr 7, 2020

Days.daysBetween(datetime_1, datetime_2) could throw java.lang.ArithmeticException: Value cannot fit in an int: if the duration between two dates in milliseconds cannot fit a 32-bit int representation. See related Stack Overflow issue here

Describe the proposed solution
Switch to using joda's Duration.

Describe alternatives you've considered
N/A

Additional context
N/A

@@ -122,7 +122,7 @@ class DateMapVectorizerTest extends FlatSpec with TestSparkContext with Attribut
}

it should "vectorize dates correctly on test date" in {
checkAt(new JDateTime(2017, 9, 28, 15, 45, 39, DateTimeUtils.DefaultTimeZone))
checkAt(new JDateTime(1901, 1, 1, 0, 0, 0, DateTimeUtils.DefaultTimeZone))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better add a new test case

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TuanNguyen27 please add test cases for ArithmeticExceptions you are trying to solve

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gerashegalov the new test case is checkAt(new JDateTime(1901, 1, 1, 0, 0, 0, DateTimeUtils.DefaultTimeZone)), which exceed the limit if we use Days.daysBetween. Does that work ?

@@ -397,7 +397,7 @@ final class DateMapVectorizerModel[T <: OPMap[Long]] private[op]
val timeZone: DateTimeZone = DateTimeUtils.DefaultTimeZone

def convertFn: DateMap#Value => RealMap#Value = (dt: DateMap#Value) =>
dt.mapValues(v => Days.daysBetween(new DateTime(v, timeZone), referenceDate).getDays.toDouble)
dt.mapValues(v => new Duration(new DateTime(v, timeZone), referenceDate).getStandardDays.toDouble)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, there are more places like this in the code. try grepping.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is at least one more in DateTimeVectorizerTest

@gerashegalov
Copy link
Contributor

please add a real PR description

Copy link
Contributor

@winterslu winterslu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, fix the build

@TuanNguyen27 TuanNguyen27 changed the title Fix Days.daysBetween overflow Fix Days.daysBetween int overflow Apr 14, 2020
@codecov
Copy link

codecov bot commented Apr 16, 2020

Codecov Report

❗ No coverage uploaded for pull request base (master@275824f). Click here to learn what that means.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master     #471   +/-   ##
=========================================
  Coverage          ?   86.99%           
=========================================
  Files             ?      345           
  Lines             ?    11622           
  Branches          ?      609           
=========================================
  Hits              ?    10110           
  Misses            ?     1512           
  Partials          ?        0           
Impacted Files Coverage Δ
...ce/op/stages/impl/feature/DateListVectorizer.scala 98.01% <100.00%> (ø)
...force/op/stages/impl/feature/OPMapVectorizer.scala 97.81% <100.00%> (ø)
...a/com/salesforce/op/utils/date/DateTimeUtils.scala 100.00% <100.00%> (ø)
...stages/impl/feature/GeolocationMapVectorizer.scala 100.00% <0.00%> (ø)
.../scala/com/salesforce/op/testkit/DataSources.scala 92.00% <0.00%> (ø)
...ml/jackson/module/scala/OpDefaultScalaModule.scala 100.00% <0.00%> (ø)
...p/stages/impl/feature/SmartTextMapVectorizer.scala 100.00% <0.00%> (ø)
...main/scala/com/salesforce/op/dsl/RichFeature.scala 100.00% <0.00%> (ø)
...com/salesforce/op/testkit/StandardRandomData.scala 100.00% <0.00%> (ø)
...esforce/op/utils/spark/OpVectorColumnHistory.scala 0.00% <0.00%> (ø)
... and 338 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 275824f...3d18626. Read the comment docs.

Copy link
Contributor

@gerashegalov gerashegalov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM,

@TuanNguyen27 I refactored the code to avoid code duplication