-
Notifications
You must be signed in to change notification settings - Fork 393
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added tests for CSVInOut util #42
Added tests for CSVInOut util #42
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so far so good, please see comments.
private val csvReader = new CSVInOut(CSVOptions()) | ||
private val f1 = resourceFile(name = "test_sample.csv") | ||
|
||
"CSVInOut" should "throw AnalysisException for bad file paths" in { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- use
Spec
for naming the tests - dont mention the error in the test name, cause it might change:
I.e.
Spec[CSVInOut] should "throw an error for bad file paths" in { ... }
var error = intercept[AnalysisException] { | ||
csvReader.readDataFrame("/bad/file/path/read/dataframe") | ||
} | ||
error.getMessage().contains("Path does not exist: file:/bad/file/path/read/dataframe") shouldBe true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- avoid using
var
-s, unless absolutely mandatory. - please use a more specific should matcher, cause this would make a test failure message nicer:
val error = intercept[AnalysisException](csvReader.readDataFrame("/bad/file/path/read/dataframe"))
error.getMessage should endWith "Path does not exist: file:/bad/file/path/read/dataframe"
error.getMessage().contains("Path does not exist: file:/bad/file/path/read/rdd") shouldBe true | ||
} | ||
|
||
it should "read a CSV file to DataFrame/RDD" in { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these two should be two separate test cases:
it should "read a CSV file to DataFrame" in {
val res = csvReader.readDataFrame(f1.getAbsolutePath)
res shouldBe a[DataFrame]
res.count shouldBe 2
}
it should "read a CSV file to RDD" in {
val res = csvReader.readRDD(f1.getAbsolutePath)
res shouldBe a[RDD[Seq[String]]]
res.count shouldBe 2
}
@@ -0,0 +1,2 @@ | |||
a,b,c,d |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also add a bit larger csv file as well. and a one with headers. and perhaps some special characters.
Codecov Report
@@ Coverage Diff @@
## master #42 +/- ##
========================================
Coverage 83.98% 83.98%
========================================
Files 296 296
Lines 9679 9679
Branches 351 544 +193
========================================
Hits 8129 8129
Misses 1550 1550 Continue to review full report at Codecov.
|
Made the requested changes, Code looks much cleaner now 👍 |
@@ -0,0 +1,101 @@ | |||
Id,Description,Name,Item0,Item1,Item2,Item3,Vendor,Category,Probability |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's he source of this dataset? maybe pick a simple csv file from here - https://github.com/awesomedata/awesome-public-datasets
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was using the data from this source https://www.sample-videos.com/download-sample-csv.php.
Will pick a simpler one from the above source.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great. Thank you. We need to make sure its a public dataset with no copyright restrictions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Planning to add a scaled-down version of this dataset https://www.kaggle.com/grubenm/austin-weather, Its license is GPL v2. Sounds good?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me check. I will let you know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's just use an existing csv file we already have. You can load it like this in your test:
def testDataPath: String = {
Some(new File("test-data")) filter (_.isDirectory) getOrElse new File("../test-data") getPath
}
val csvFile: String = s"$testDataPath/PassengerDataAllWithHeader.csv"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Observation:
Some(new File("test-data")) filter (_.isDirectory) getOrElse new File("../test-data") getPath
would fail if we create a test-data
dir under utils in future. Symlinking ../test-data
into utils
directory can solve the issue. But, Symlinking comes with its own cross platform issues like windows file systems doesn't support linking etc. We can ignore this details for now i guess.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Not an issue right now though.
|
||
it should "read a CSV file to RDD" in { | ||
val res = csvReader.readRDD(f1.getAbsolutePath) | ||
res shouldBe a[RDD[Seq[String]]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
redundant space
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know if there is a good plugin for auto-linting/formatting for IntelliJ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Intellij support code formating out of the box. You can import the Intellij settings xml from gradle/intellij-codestyle-config.xml
file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lgtm
Describe the proposed solution