changed to look specifically for part files rather than listing the files and searching through them #505

leahmcguire · 2020-09-02T18:45:14Z

Related issues
Refer to issue(s) addressed in this pull request from Issues page.

Describe the proposed solution
A clear and concise description of what the changes are.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context about the changes here.

tovbinm · 2020-09-02T18:55:16Z

Is it faster? What if another compression technique is used, e.g. 7-Zip? Then we would have a .7z extension.

codecov · 2020-09-02T19:05:55Z

Codecov Report

Merging #505 into master will decrease coverage by 0.70%.
The diff coverage is 58.33%.

@@            Coverage Diff             @@
##           master     #505      +/-   ##
==========================================
- Coverage   86.74%   86.03%   -0.71%     
==========================================
  Files         347      347              
  Lines       11859    11860       +1     
  Branches      388      607     +219     
==========================================
- Hits        10287    10204      -83     
- Misses       1572     1656      +84

Impacted Files	Coverage Δ
...cala/com/salesforce/op/OpWorkflowModelReader.scala	`91.20% <58.33%> (-5.46%)`	⬇️
...e/op/stages/impl/selector/RandomParamBuilder.scala	`0.00% <0.00%> (-94.45%)`	⬇️
...main/scala/com/salesforce/op/dsl/RichFeature.scala	`50.00% <0.00%> (-50.00%)`	⬇️
...mpl/regression/OpGeneralizedLinearRegression.scala	`50.00% <0.00%> (-26.93%)`	⬇️
...op/stages/impl/regression/OpXGBoostRegressor.scala	`13.46% <0.00%> (-13.47%)`	⬇️
...tages/impl/preparators/SanityCheckerMetadata.scala	`84.45% <0.00%> (-5.41%)`	⬇️
...s/impl/preparators/DerivedFeatureFilterUtils.scala	`88.67% <0.00%> (-4.41%)`	⬇️
...com/salesforce/op/utils/stages/FitStagesUtil.scala	`89.47% <0.00%> (-3.95%)`	⬇️
...rce/op/stages/impl/preparators/SanityChecker.scala	`88.93% <0.00%> (-1.64%)`	⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4d46181...3d036d2. Read the comment docs.

gerashegalov · 2020-09-02T19:47:55Z

core/src/main/scala/com/salesforce/op/OpWorkflowModelReader.scala

- }
- val finalPath = partPath.getOrElse(path)
+
+ val partFile = new Path(pathString, "part-00000")


Better to configure the part file name on the write call path so we don't have to guess.

It will always be this name, but the compression codec will determine the extension.

gerashegalov · 2020-09-02T19:49:15Z

core/src/main/scala/com/salesforce/op/OpWorkflowModelReader.scala

+
+ val partFile = new Path(pathString, "part-00000")
+ val partZipped = new Path(pathString, "part-00000.gz")
+ val finalPath = if (fs.exists(partZipped)) partZipped else if (fs.exists(partFile)) partFile else path


`exists` adds redundant filesystem calls that will be even more expensive with S3.

We can just call `open` directly and catch `FileNotFoundException` to try another filename.

@gerashegalov Fixed: it now just tries loading.
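
For readers following the thread, here is a minimal sketch of the "open and fall back" idea discussed above. It is not the code merged in this PR: it swaps in `java.nio.file` on the local filesystem for the Hadoop `FileSystem` API so that it runs without extra dependencies, and the names `PartFileOpen` and `openFirst` are hypothetical.

```scala
import java.io.{FileNotFoundException, InputStream}
import java.nio.file.{Files, NoSuchFileException, Path}
import scala.annotation.tailrec
import scala.util.{Failure, Try}

object PartFileOpen {
  /** Tries to open each candidate in order and returns the first stream that opens.
    * No separate existence check is made: a "not found" error from the open call
    * itself is the signal to move on to the next candidate. Any other failure
    * (for example a permissions error) is returned as-is rather than masked.
    */
  @tailrec
  def openFirst(candidates: List[Path]): Try[InputStream] = candidates match {
    case Nil =>
      Failure(new FileNotFoundException("no candidate part file could be opened"))
    case head :: tail =>
      Try(Files.newInputStream(head)) match {
        case Failure(_: NoSuchFileException | _: FileNotFoundException) => openFirst(tail)
        case other => other
      }
  }
}
```

With a list such as `part-00000.gz` followed by `part-00000`, the common case costs a single open call, and a miss costs one failed open per missing candidate instead of an `exists` plus an `open`.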

leahmcguire · 2020-09-02T19:50:41Z

@gerashegalov suggested this for faster reading

gerashegalov · 2020-09-02T19:53:59Z

> Is it faster? What if another compression technique is used, e.g. 7-Zip? Then we would have a .7z extension.

@tovbinm It should be faster on object stores, since they have fast path lookups but slow directory listing.

leahmcguire · 2020-09-04T03:33:17Z

@tovbinm this covers no compression and our default settings as well as passing in the exact full path. Do you think we need to cover more possibilities?

gerashegalov

LGTM, minor comment

gerashegalov · 2020-09-10T04:38:58Z

core/src/main/scala/com/salesforce/op/OpWorkflowModelReader.scala

+ }
+ }
+ } match {
+ case Failure(e) => throw new RuntimeException(s"Failed to load workflow because of $e")


Let's keep `e` as the cause exception as well, i.e. pass it as the second parameter to the `RuntimeException` constructor.
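
A small sketch of what that suggestion could look like, assuming the surrounding code matches on a `Try` as in the diff above. The wrapper `LoadErrors.getOrRethrow` is a hypothetical helper for illustration, not a function from the PR; the two-argument `RuntimeException(String, Throwable)` constructor is standard Java.

```scala
import scala.util.{Failure, Success, Try}

object LoadErrors {
  /** Unwraps a Try. On failure, rethrows with the original exception attached
    * as the cause, so the full stack trace of the underlying error is preserved.
    */
  def getOrRethrow[A](result: Try[A]): A = result match {
    case Success(value) => value
    case Failure(e) =>
      throw new RuntimeException(s"Failed to load workflow because of $e", e)
  }
}
```

Passing `e` as the cause means callers and logs still see the original stack trace through `getCause`, rather than only its string form embedded in the message.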

gerashegalov · 2020-09-10T04:53:00Z

> this covers no compression and our default settings as well as passing in the exact full path. Do you think we need to cover more possibilities?

I don't think we need to make it that complex. At most, I would make the compression codec configurable instead of guessing at all.
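
One way the "configurable codec" alternative might look, as a rough sketch only. The `Codec` type and `partFile` function are hypothetical and not part of TransmogrifAI; the sketch assumes a single `part-00000` output file, as in the diff, and uses the conventional `.gz` and `.bz2` extensions.

```scala
object PartFileName {
  // Hypothetical setting: the caller states which codec was used on write,
  // so the reader derives the file name instead of guessing.
  sealed abstract class Codec(val extension: String)
  case object NoCompression extends Codec("")
  case object Gzip extends Codec(".gz")
  case object Bzip2 extends Codec(".bz2")

  /** Builds the expected part file path for a model directory and codec. */
  def partFile(dir: String, codec: Codec, part: Int = 0): String =
    f"$dir/part-$part%05d${codec.extension}"
}
```

The reader then opens exactly one known path, which avoids both directory listing and fallback attempts, at the cost of requiring the write and read sides to agree on the codec setting.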

changed to look specifically for part files

7904d72

leahmcguire requested review from gerashegalov, Jauntbox, tovbinm and wsuchy as code owners September 2, 2020 18:45

gerashegalov reviewed Sep 2, 2020

leahmcguire added 2 commits September 3, 2020 20:30

trying to read and catching failures rather than checking for file existence

ee66afe

Merge branch 'master' into lm/loadSpecificFiles

373516d

leahmcguire added 2 commits September 3, 2020 20:47

style

7e39caa

Merge branch 'lm/loadSpecificFiles' of github.com:salesforce/TransmogrifAI into lm/loadSpecificFiles

3f11310

gerashegalov approved these changes Sep 10, 2020

added exception to error

9bdb361

leahmcguire requested a review from nicodv as a code owner September 14, 2020 19:13

leahmcguire added 2 commits September 14, 2020 13:12

style

7c30e38

Merge branch 'master' of github.com:salesforce/TransmogrifAI into lm/loadSpecificFiles

3d036d2

leahmcguire merged commit be60275 into master Sep 14, 2020

nicodv added a commit that referenced this pull request Sep 16, 2020

Revert "changed to look specifically for part files rather than listing the files and searching through them (#505)"

This reverts commit be60275

392d3e6

tovbinm deleted the lm/loadSpecificFiles branch January 18, 2021 05:49
