Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added tests for CSVInOut util #42

Merged
merged 5 commits into from
Aug 9, 2018
Merged

Added tests for CSVInOut util #42

merged 5 commits into from
Aug 9, 2018

Conversation

ajayborra
Copy link
Contributor

Describe the proposed solution

  • Added tests for CSVInOut class to improve coverage.

@ajayborra ajayborra changed the title Added path validation & Tests for CSVInOut util Added tests for CSVInOut util Aug 8, 2018
Copy link
Collaborator

@tovbinm tovbinm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so far so good, please see comments.

private val csvReader = new CSVInOut(CSVOptions())
private val f1 = resourceFile(name = "test_sample.csv")

"CSVInOut" should "throw AnalysisException for bad file paths" in {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. use Spec for naming the tests
  2. dont mention the error in the test name, cause it might change:
    I.e.
Spec[CSVInOut] should "throw an error for bad file paths" in { ... }

var error = intercept[AnalysisException] {
csvReader.readDataFrame("/bad/file/path/read/dataframe")
}
error.getMessage().contains("Path does not exist: file:/bad/file/path/read/dataframe") shouldBe true
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. avoid using var-s, unless absolutely mandatory.
  2. please use a more specific should matcher, cause this would make a test failure message nicer:
val error = intercept[AnalysisException](csvReader.readDataFrame("/bad/file/path/read/dataframe"))
error.getMessage should endWith "Path does not exist: file:/bad/file/path/read/dataframe"

error.getMessage().contains("Path does not exist: file:/bad/file/path/read/rdd") shouldBe true
}

it should "read a CSV file to DataFrame/RDD" in {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these two should be two separate test cases:

it should "read a CSV file to DataFrame" in {
   val res = csvReader.readDataFrame(f1.getAbsolutePath)
   res shouldBe a[DataFrame]
   res.count shouldBe 2
}
it should "read a CSV file to RDD" in {
  val res = csvReader.readRDD(f1.getAbsolutePath)
  res   shouldBe a[RDD[Seq[String]]]
  res.count shouldBe 2
}

@@ -0,0 +1,2 @@
a,b,c,d
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also add a bit larger csv file as well. and a one with headers. and perhaps some special characters.

@codecov
Copy link

codecov bot commented Aug 8, 2018

Codecov Report

Merging #42 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff            @@
##           master      #42    +/-   ##
========================================
  Coverage   83.98%   83.98%            
========================================
  Files         296      296            
  Lines        9679     9679            
  Branches      351      544   +193     
========================================
  Hits         8129     8129            
  Misses       1550     1550

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8aa559c...b210251. Read the comment docs.

@ajayborra
Copy link
Contributor Author

Made the requested changes, Code looks much cleaner now 👍

@@ -0,0 +1,101 @@
Id,Description,Name,Item0,Item1,Item2,Item3,Vendor,Category,Probability
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's he source of this dataset? maybe pick a simple csv file from here - https://github.com/awesomedata/awesome-public-datasets

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was using the data from this source https://www.sample-videos.com/download-sample-csv.php.
Will pick a simpler one from the above source.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great. Thank you. We need to make sure its a public dataset with no copyright restrictions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Planning to add a scaled-down version of this dataset https://www.kaggle.com/grubenm/austin-weather, Its license is GPL v2. Sounds good?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me check. I will let you know.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's just use an existing csv file we already have. You can load it like this in your test:

def testDataPath: String = {
    Some(new File("test-data")) filter (_.isDirectory) getOrElse new File("../test-data") getPath
}
val csvFile: String = s"$testDataPath/PassengerDataAllWithHeader.csv"

Copy link
Contributor Author

@ajayborra ajayborra Aug 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Observation:

Some(new File("test-data")) filter (_.isDirectory) getOrElse new File("../test-data") getPath

would fail if we create a test-data dir under utils in future. Symlinking ../test-data into utils directory can solve the issue. But, Symlinking comes with its own cross platform issues like windows file systems doesn't support linking etc. We can ignore this details for now i guess.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Not an issue right now though.


it should "read a CSV file to RDD" in {
val res = csvReader.readRDD(f1.getAbsolutePath)
res shouldBe a[RDD[Seq[String]]]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

redundant space

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know if there is a good plugin for auto-linting/formatting for IntelliJ?

Copy link
Collaborator

@tovbinm tovbinm Aug 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intellij support code formating out of the box. You can import the Intellij settings xml from gradle/intellij-codestyle-config.xml file

Copy link
Collaborator

@tovbinm tovbinm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm

@tovbinm tovbinm merged commit c2086e5 into salesforce:master Aug 9, 2018
@ajayborra ajayborra deleted the coverage/improve_coverage_for_csv_util branch August 9, 2018 14:26
ericwayman pushed a commit that referenced this pull request Feb 8, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants