Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-28150][sql-gateway][hive] Introduce the hiveserver2 endpoint and factory #20101

Closed
wants to merge 7 commits into from

Conversation

fsk119
Copy link
Member

@fsk119 fsk119 commented Jun 29, 2022

What is the purpose of the change

Introduce the HiveServer2 Endpoint and its Factory. With the HiveServer2 Endpoint, users is avaliable to connect to the SqlGateway with Hive JDBC.

Brief change log

  • Introduce the endpoint and its factory. With the factory, the SqlGateway is able to load the HiveServer2 Endpoint
  • Expose the required options to configure HiveServer2 Endpoint. The options are discussed in the FLIP-223 except the option thrift.max.message.size, thrift.login.timeout, thrift.exponential.backoff.slot.length. The newly introduced options are align with the hive, please refer to the HiveConf. The PR also introduce the option module.name that aligns with the option catalog.name as the default hive module name.
  • Supports OpenSession and CloseSession in HiveServer2Endpoint.

Verifying this change

This change added tests and can be verified as follows:

  • Added ITCase to openSession and closeSession with HiveServerEndoint2
  • Added test that the endpoint factory can create the expected endpoint.
  • Added the client can get expected error message with HiveServer2 endpoint

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
  • The serializers: (yes / no / don't know)
  • The runtime per-record code paths (performance sensitive): (yes / no / don't know)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
  • The S3 file system connector: (yes / no / don't know)

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@flinkbot
Copy link
Collaborator

flinkbot commented Jun 29, 2022

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

@fsk119 fsk119 force-pushed the hivesever2-endpoint branch 6 times, most recently from 40d67f3 to 4b5c49f Compare July 19, 2022 07:11
Copy link
Contributor

@link3280 link3280 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fsk119 The codes generally look good to me. I left some minor comments, PTAL. Thanks!

// Server Options
// --------------------------------------------------------------------------------------------

public static final ConfigOption<Integer> THRIFT_PORT =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since these options are scoped to Hive endpoint, should we add a prefix like hive-endpoint.?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean the config option name should contain a prefix? Actually, we already finished this in the SqlGatewayEndpointFactoryUtils#createSqlGatewayEndpoint. The util will automatically search the options that begin with the sql-gateway.endpoint.<identifier>, e.g. sql-gateway.endpoint.hiveserver2 here. You can take a look at the test in the SqlGatewayEndpointFactoryUtilsTest#testCreateEndpoints.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! LGTM now.

public void close() {
operationManager.close();
for (String name : sessionState.catalogManager.listCatalogs()) {
sessionState.catalogManager.getCatalog(name).ifPresent(Catalog::close);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There might be a CatalogException when closing a catalog. I think it'd be better to wrap it with a try-catch.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. It might get a resource leak if one of the catalogs fails to close.

Copy link
Contributor

@luoyuxia luoyuxia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for contribution. I left some comments.

? Collections.emptyMap()
: tOpenSessionReq.getConfiguration();
Map<String, String> sessionConfig = new HashMap<>();
sessionConfig.put(TABLE_SQL_DIALECT.key(), SqlDialect.HIVE.name());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hive dialect is the default dialect?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. We want the HiveSever2 endpoint is compatible with HiveServer2. With the HiveServer2 endpoint, users can use hive dialect directly.

TProtocolVersion clientProtocol = tOpenSessionReq.getClient_protocol();
HiveServer2EndpointVersion sessionVersion =
HiveServer2EndpointVersion.valueOf(
TProtocolVersion.findByValue(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TProtocolVersion.findByValue may return null, then HiveServer2EndpointVersion.valueOf(null) will throw the exception will be the message Unknown TProtocolVersion: null, which may not user friendly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it's impossible because we use min(server_version, ...) here. The negotiation results should be less than the Server Version.


HIVE_CLI_SERVICE_PROTOCOL_V9(TProtocolVersion.HIVE_CLI_SERVICE_PROTOCOL_V9),

HIVE_CLI_SERVICE_PROTOCOL_V10(TProtocolVersion.HIVE_CLI_SERVICE_PROTOCOL_V10);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The version for Hive3 will be up to TProtocolVersion.HIVE_CLI_SERVICE_PROTOCOL_V11. But since miss it seems won't cause any problem, it's just a reminder.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing out! After offline discussion, I think the only way to support HiveSever2 in Hive 3.x is to expose getHiveServer2Endpoint and getHiveServer2EndpointVersion in HiveShim. This is because HiveServer2Endpoint implements TCLIService.Iface and the interface is different in the different hive version.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering the HiveServer2EndpointVersion and HiveServer2Endpoint is not annotated with @PublicEvolving, I think we can expose this in the future.

}

@Test
public void testOpenSessionWithConfig() throws Exception {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it for open session with config? Seems we haven't set configuration in this test.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I add more configs here.

Copy link
Contributor

@luoyuxia luoyuxia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@fsk119
Copy link
Member Author

fsk119 commented Jul 23, 2022

@flinkbot run azure

@fsk119 fsk119 closed this in d067629 Jul 24, 2022
huangxiaofeng10047 pushed a commit to huangxiaofeng10047/flink that referenced this pull request Nov 3, 2022
@fsk119 fsk119 deleted the hivesever2-endpoint branch August 9, 2023 08:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants