[SPARK-48354][SQL] JDBC Connectors predicate pushdown testing #46642
base: master
Conversation
LGTM
import org.apache.spark.sql.test.SharedSparkSession
import org.apache.spark.tags.DockerTest

@DockerTest
remove this
Why did you put this in the docker-integration-tests folder? This seems generic to JDBC.
Should we add a way to filter out tests?
Well, it seemed like a fitting place, but it can be used for more than docker integration tests; what would be a fitting place for this trait?
Also, we could add some way to filter out tests here, but we already use `override def excluded` from SparkFunSuite, since suites implementing this trait extend it. Do we need another method for that in this trait?
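For context, the `excluded` mechanism being referred to works roughly like this. This is a minimal, self-contained sketch of the pattern, not Spark's actual `SparkFunSuite` implementation; the names `FunSuiteSketch` and `run` are illustrative:

```scala
// Minimal sketch of an "excluded tests" hook, in the spirit of
// SparkFunSuite's `override def excluded` (names here are illustrative).
abstract class FunSuiteSketch {
  private var registered = Vector.empty[(String, () => Unit)]

  // Subclasses override this to skip tests by name.
  protected def excluded: Seq[String] = Seq.empty

  protected def test(name: String)(body: => Unit): Unit =
    registered :+= ((name, () => body))

  // Runs every non-excluded test and returns the names that ran.
  def run(): Seq[String] = registered.collect {
    case (name, body) if !excluded.contains(name) => body(); name
  }
}

object Demo extends FunSuiteSketch {
  test("kept") {}
  test("skipped") {}
  override def excluded: Seq[String] = Seq("skipped")
}
```

With this pattern, a pushdown suite mixing in the shared trait can exclude individual inherited tests without the trait itself needing a second filtering mechanism.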
@@ -141,6 +160,9 @@ private case class MsSqlServerDialect() extends JdbcDialect {
    case ShortType if !SQLConf.get.legacyMsSqlServerNumericMappingEnabled =>
      Some(JdbcType("SMALLINT", java.sql.Types.SMALLINT))
    case ByteType => Some(JdbcType("SMALLINT", java.sql.Types.TINYINT))
    case LongType => Some(JdbcType("BIGINT", java.sql.Types.BIGINT))
    case DoubleType => Some(JdbcType("FLOAT", java.sql.Types.FLOAT))
    case _ if !SQLConf.get.legacyMsSqlServerNumericMappingEnabled => JdbcUtils.getCommonJDBCType(dt)
Why this?
If the question is about `!SQLConf.get.legacyMsSqlServerNumericMappingEnabled`: it is added here because this config exists (false by default), and when it is set, some type mappings shouldn't be supported (not sure which, or why this config exists). Without this check, some tests with this config would fail, because we would, for example, convert ShortType to SMALLINT even though we shouldn't.
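The effect of the legacy flag can be seen in a small, self-contained sketch of the dialect's mapping logic. This mirrors the shape of the diff above but is not Spark's actual API; `SparkType` and `mapType` are illustrative stand-ins:

```scala
// Sketch of the config-gated type mapping discussed above. When the legacy
// numeric-mapping flag is set, ShortType keeps its old behavior (no SMALLINT
// override); otherwise the new mapping applies. Names are illustrative.
object TypeMappingSketch {
  sealed trait SparkType
  case object ShortType extends SparkType
  case object LongType extends SparkType
  case object StringType extends SparkType

  def mapType(dt: SparkType, legacyNumericMapping: Boolean): Option[String] = dt match {
    case ShortType if !legacyNumericMapping => Some("SMALLINT")
    case LongType => Some("BIGINT")
    // In the real dialect this arm falls back to the common JDBC mapping.
    case _ => None
  }
}
```

So a test run with the legacy config enabled exercises the `None`/fallback path for ShortType instead of the SMALLINT mapping.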
@@ -155,7 +162,8 @@ private case class PostgresDialect() extends JdbcDialect with SQLConfHelper {
      getJDBCType(et).map(_.databaseTypeDefinition)
        .orElse(JdbcUtils.getCommonJDBCType(et).map(_.databaseTypeDefinition))
        .map(typeName => JdbcType(s"$typeName[]", java.sql.Types.ARRAY))
    case _ => None
    case LongType => Some(JdbcType("BIGINT", Types.BIGINT))
    case _ => JdbcUtils.getCommonJDBCType(dt)
The None code path returns this as well?
When you return None, I mean: should `JdbcUtils.getCommonJDBCType` be called?
Not sure if this is what you mean, but in visitCast we have `getJDBCType(dataType).map(_.databaseTypeDefinition).getOrElse(dataType.typeName)`, so if we return None, `JdbcUtils.getCommonJDBCType` won't be called there.
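The distinction between the two fallback shapes being discussed can be isolated in a small sketch. The function names here (`dialectType`, `commonType`, `visitCastStyle`, `withFallback`) are illustrative, not Spark's actual API:

```scala
// Sketch of the Option fallback chains under discussion. With the
// getOrElse style used in visitCast, a None from the dialect falls
// through to the plain type name and the common mapping is never
// consulted; with an explicit orElse, it is.
object FallbackSketch {
  // Dialect-specific mapping: only knows "long".
  def dialectType(t: String): Option[String] =
    if (t == "long") Some("BIGINT") else None

  // Stand-in for a common/default JDBC mapping.
  def commonType(t: String): Option[String] = Some(t.toUpperCase)

  // visitCast-style: None ends the chain at the raw type name.
  def visitCastStyle(t: String): String =
    dialectType(t).getOrElse(t) // commonType never called

  // Explicit fallback: None is redirected to the common mapping.
  def withFallback(t: String): Option[String] =
    dialectType(t).orElse(commonType(t))
}
```

This is why returning `JdbcUtils.getCommonJDBCType(dt)` from the dialect's catch-all arm changes behavior: it moves the fallback into the dialect itself instead of relying on callers to supply one.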
class MySqlPushdownIntegrationSuite
  extends DockerJDBCIntegrationSuite
  with V2JDBCPushdownTest {
Why don't you just extend the existing MySqlIntegrationSuite?
We could, but that would run all tests from `V2JDBCTest` as well. Not sure we want that?
import org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog
import org.apache.spark.sql.jdbc.{DatabaseOnDocker, DockerJDBCIntegrationSuite, MsSQLServerDatabaseOnDocker}

class MsSqlServerPushdownIntegrationSuite
same here
Same answer as above
  assert(isAggregateRemoved(df))
  commonAssertionOnDataFrame(df)
}
Add DISTINCT test
test("DISTINCT aggregate push down") {
  val df = sql(
    s"SELECT COUNT(DISTINCT(num_col)) " +
    s"FROM `$catalog`.`$schema`.`$tablePrefix`")
  checkAnswer(df, Row(5))
  assert(isAggregateRemoved(df))
  commonAssertionOnDataFrame(df)
}
  assert(isAggregateRemoved(df))
  commonAssertionOnDataFrame(df)
}
Add SUM test
test("SUM aggregate push down") {
  val df = sql(
    s"SELECT SUM(num_col) " +
    s"FROM `$catalog`.`$schema`.`$tablePrefix`")
  checkAnswer(df, Row(4150))
  assert(isAggregateRemoved(df))
  commonAssertionOnDataFrame(df)
}
executeUpdate(
  s"""INSERT INTO "$schema"."${tablePrefix}_string_test" VALUES (0, ' forth ', 1000)""")

executeUpdate(s"""INSERT INTO "$schema"."$tablePrefix" VALUES (1, 'ab', 1000)""")
It might be good to insert one row with a null int column, to test the aggregate functions.
In Spark, nulls should be ignored; I am not sure how other databases handle nulls in aggregates.
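For reference, the expected Spark-side semantics (which match standard SQL) can be sketched with NULL modeled as None; `NullAggSketch` and its fields are illustrative names:

```scala
// Sketch of SQL aggregate null semantics: SUM and COUNT(col) skip NULLs,
// while COUNT(*) counts every row. NULL is modeled here as None.
object NullAggSketch {
  val numCol: Seq[Option[Int]] = Seq(Some(100), None, Some(50))

  val sum: Int = numCol.flatten.sum             // SUM(num_col): NULLs ignored
  val countCol: Int = numCol.count(_.isDefined) // COUNT(num_col): skips NULLs
  val countStar: Int = numCol.size              // COUNT(*): counts all rows
}
```

If a database under test deviated from this (e.g. counted NULLs in `COUNT(col)`), a pushed-down aggregate would return different results than Spark's own evaluation, which is exactly what such a test row would catch.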
What changes were proposed in this pull request?
In this PR, I add a new trait with tests for integration testing of JDBC connectors. I also propose changes to MsSqlServerDialect to support more filter pushdowns.
Why are the changes needed?
These changes are needed for better testing of JDBC connectors and for general improvement of pushdown capabilities.
Does this PR introduce any user-facing change?
No
How was this patch tested?
With added tests.
Was this patch authored or co-authored using generative AI tooling?
No