
[FLINK-14526][hive] Support Hive version 1.1.0 and 1.1.1 #9995

Closed
wants to merge 4 commits

Conversation

lirui-apache (Contributor)

What is the purpose of the change

To support Hive 1.1.0 and 1.1.1

Brief change log

  • Added two profiles and two shim implementations for 1.1.0 and 1.1.1 respectively.
  • The 1.1.x HMS always tries to compute some table stats when altering a non-partitioned table, which can override what we put in the table parameters. Therefore altering non-partitioned table stats is disabled for 1.1.x.
  • DATE column stats are not supported in 1.1.x, so shim methods are added to handle these column stats.
  • Made HiveShim serializable because it's needed in HiveGenericUDF and HiveGenericUDTF. This change should have been made before this PR; it only went unnoticed because no test exposed the issue.
  • Java constant object inspectors are not available in 1.1.x, so a shim method was added to get constant object inspectors. Because of this, we have to handle Writables for Hive UDFs if they're initialized with constant arguments, so a conversion from Hive Java primitives to Hive Writable primitives was added (see the sketch after this list).
  • The 1.1.x HCatalog artifacts are built against Hadoop 1 and are incompatible with Hadoop 2. As a result, we cannot use the Hive runner to generate table data during tests, so a util method was added to serve this purpose.
  • For features unsupported by 1.1.x, the related test cases are disabled for these old Hive versions.
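
As a rough illustration of the Java-primitive-to-Writable conversion mentioned in the change log above, here is a minimal Java sketch; the class and method names are illustrative assumptions, not the PR's actual code.

import org.apache.hadoop.io.BooleanWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;

// Illustrative helper: map a Hive Java primitive to its Writable counterpart so that
// a constant UDF argument can be passed to a WritableConstant* object inspector.
class JavaToWritableSketch {

    static Object toWritable(Object javaValue) {
        if (javaValue == null) {
            return null;
        }
        if (javaValue instanceof Boolean) {
            return new BooleanWritable((Boolean) javaValue);
        }
        if (javaValue instanceof Integer) {
            return new IntWritable((Integer) javaValue);
        }
        if (javaValue instanceof Long) {
            return new LongWritable((Long) javaValue);
        }
        if (javaValue instanceof String) {
            return new Text((String) javaValue);
        }
        // remaining primitive types (byte, short, float, double, date, etc.) would be handled similarly
        throw new UnsupportedOperationException(
                "Unsupported constant type: " + javaValue.getClass().getName());
    }
}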

Verifying this change

Covered by existing test cases.
Manually verified by running mvn verify -Phive-1.1.0 and mvn verify -Phive-1.1.1.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): yes
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? NA

@lirui-apache (Contributor Author)

cc @xuefuz @bowenli86 @zjuwangg

@flinkbot (Collaborator)

flinkbot commented Oct 25, 2019

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review.

Automated Checks

Last check on commit 2f28e38 (Wed Dec 04 14:51:54 UTC 2019)

Warnings:

  • 1 pom.xml file was touched: check for build and licensing issues.
  • No documentation files were touched! Remember to keep the Flink docs up to date!

Mention the bot in a comment to re-run the automated checks.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.


The bot tracks the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

Bot commands

The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

@flinkbot (Collaborator)

flinkbot commented Oct 25, 2019

CI report:

Bot commands
The @flinkbot bot supports the following commands:
  • @flinkbot run travis re-run the last Travis build

@bowenli86 (Member) left a comment


LGTM.

Btw, can you confirm all the test profiles have passed with this change?

String className;
switch (primitiveTypeInfo.getPrimitiveCategory()) {
case BOOLEAN:
className = "org.apache.hadoop.hive.serde2.objectinspector.primitive.JavaConstantBooleanObjectInspector";
Member


Curious: why does 1.1.0 check WritableConstant and 1.1.1 check JavaConstant?

Contributor Author


Because Java constant object inspectors are not available until Hive 1.2.0.
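
For context, here is a minimal sketch of how a shim method could instantiate a version-specific constant object inspector reflectively from a class name; the helper name and error handling are illustrative assumptions, not the PR's exact code.

import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;

import java.lang.reflect.Constructor;

// Illustrative sketch: choose a version-specific constant object inspector class
// (WritableConstant* on 1.1.x, JavaConstant* on 1.2.0+) and instantiate it
// reflectively with the constant value.
class ConstantInspectorSketch {

    static ObjectInspector create(String className, Object constantValue) {
        try {
            Class<?> clazz = Class.forName(className);
            // Constant object inspectors take the constant value in their constructor;
            // some of those constructors are not public, hence setAccessible(true).
            Constructor<?> ctor = clazz.getDeclaredConstructor(constantValue.getClass());
            ctor.setAccessible(true);
            return (ObjectInspector) ctor.newInstance(constantValue);
        } catch (Exception e) {
            throw new RuntimeException("Failed to create constant object inspector " + className, e);
        }
    }
}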

@xuefuz (Contributor) left a comment


The current inheritance pattern is that a later-version Hive shim inherits from the previous version's shim unless there is a compelling reason not to; this reduces code duplication.

However, HiveShimV120 doesn't inherit from HiveShimV111.

It would be good if @bowenli86 can also review.
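
For illustration, the inheritance pattern described above would look roughly as follows; class names follow the shims mentioned in this thread (plus an assumed HiveShimV100 base for the 1.0.x line), the bodies are placeholders, and each class would live in its own file, so this is a sketch of the chain rather than actual code.

// Illustrative sketch of the shim inheritance chain.
class HiveShimV110 extends HiveShimV100 {
    // overrides only what changed in Hive 1.1.0
}

class HiveShimV111 extends HiveShimV110 {
    // 1.1.1 behaves like 1.1.0, so little or nothing to override
}

class HiveShimV120 extends HiveShimV111 {
    // inherits the 1.1.x behavior and overrides what 1.2.0 changes
}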

flink-connectors/flink-connector-hive/pom.xml (review comment outdated and resolved)
import java.util.List;
import java.util.Map;

/**
* A shim layer to support different versions of Hive.
*/
- public interface HiveShim {
+ public interface HiveShim extends Serializable {
Contributor


It's probably better to leave this change (about serialization) to a dedicated PR with relevant tests. It makes review easier and limits the scope of this change.

Contributor Author


Tests will fail if HiveShim is not serializable. Do you mean we should first open and merge a dedicated PR for this, and come back to this one later?

Contributor


Does the test fail before this PR, since HiveShim wasn't serializable before? If we need the fix to pass the tests, we can put it in the same PR to reduce the overhead.

Contributor Author


Tests passed before this PR only because we didn't have enough test coverage to expose the issue. Before this PR, HiveShim was only used to get object inspectors for DATE and TIMESTAMP columns, while with this PR, HiveShim is needed for all columns. So we need the fix to make the tests pass.
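
To illustrate why serializability matters here: the shim ends up as a field of objects that Flink ships to the cluster via Java serialization. The wrapper below is hypothetical (not Flink's actual class layout) and only demonstrates the constraint.

import java.io.Serializable;

// Hypothetical wrapper: if the HiveShim field is not Serializable, serializing this
// object fails with java.io.NotSerializableException at job submission time.
class HiveUdfWrapperSketch implements Serializable {

    private final HiveShim hiveShim;     // must itself be Serializable
    private final String hiveVersion;

    HiveUdfWrapperSketch(HiveShim hiveShim, String hiveVersion) {
        this.hiveShim = hiveShim;
        this.hiveVersion = hiveVersion;
    }
}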

@lirui-apache (Contributor Author)

PR updated to address comments. Also verified tests pass for all our profiles.
@xuefuz @bowenli86 please take a look. Thanks.

@xuefuz (Contributor) left a comment


+1. Looks good to me.

@lirui-apache (Contributor Author)

Rebased. Also noted that the INTERVAL PrimitiveCategory is not available in 1.1.x, so added our own enum for primitive categories (see the sketch below).
@xuefuz @bowenli86 Please let me know if you have further comments. Thanks.
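
A minimal sketch of what such a version-independent enum could look like (the name and the exact set of values are illustrative, not the PR's actual enum):

// Illustrative, version-independent set of primitive categories, including the interval
// types that the PrimitiveCategory enum in Hive 1.1.x does not define.
enum HivePrimitiveCategorySketch {
    VOID, BOOLEAN, BYTE, SHORT, INT, LONG, FLOAT, DOUBLE,
    STRING, VARCHAR, CHAR, DATE, TIMESTAMP, BINARY, DECIMAL,
    INTERVAL_YEAR_MONTH, INTERVAL_DAY_TIME, UNKNOWN
}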

@bowenli86 (Member)

bowenli86 commented Oct 29, 2019

LGTM, thanks for your contribution! Merging

@asfgit asfgit closed this in 6cc09ac Oct 29, 2019
@lirui-apache lirui-apache deleted the FLINK-14526 branch October 30, 2019 02:10