[FLINK-7973] Fix shading and relocating Hadoop for the S3 filesystems
- do not shade everything, especially not JDK classes!
-> instead, define include patterns explicitly
- do not shade core Flink classes (only those imported from flink-hadoop-fs)
- work around Hadoop loading (unshaded/non-relocated) classes, based on the names
  listed in core-default.xml, by replacing the Configuration class (we may need to
  extend this for mapred-default.xml and hdfs-default.xml):
-> provide a core-default-shaded.xml file with shaded class names, and copy and
  adapt the Configuration class of the respective Hadoop version to load this
  file instead of core-default.xml.
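
The reason this workaround is needed, in a nutshell: Hadoop resolves many of its implementation classes reflectively from class names stored in `core-default.xml`, so relocating the classes without rewriting the XML leaves dangling names. A hedged illustration (not code from this commit), assuming an unmodified Hadoop `Configuration` on the classpath:

```java
import org.apache.hadoop.conf.Configuration;

// Hedged illustration, not part of this commit: Hadoop looks up implementation
// classes by the names stored in core-default.xml. If the Hadoop classes are
// relocated but the XML is left untouched, a lookup like this fails with a
// ClassNotFoundException because the original class name no longer exists.
public class ReflectiveDefaultsExample {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // hadoop.security.group.mapping, for example, defaults to a Hadoop class name
        String className = conf.get("hadoop.security.group.mapping");
        System.out.println("resolving " + className);
        Class<?> clazz = conf.getClassByName(className); // reflective lookup by name
        System.out.println("resolved " + clazz);
    }
}
```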

Add checkstyle suppression pattern for the Hadoop Configuration classes

Also fix the (integration) tests, which did not work because they tried to load the
relocated classes that are not available on the test classpath

Remove minimizeJar from the shading of flink-s3-fs-presto because it was
causing "java.lang.ClassNotFoundException:
org.apache.flink.fs.s3presto.shaded.org.apache.commons.logging.impl.LogFactoryImpl":
these classes are only loaded reflectively, never referenced statically, and were
therefore stripped when minimizing.
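
As an aside, a hedged illustration (not from this commit) of why a bytecode-level minimizer misses these classes: commons-logging instantiates its `LogFactoryImpl` reflectively behind a plain `LogFactory.getLog(...)` call:

```java
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

// Hedged illustration, not part of this commit: the call below makes commons-logging
// discover and instantiate org.apache.commons.logging.impl.LogFactoryImpl
// reflectively. A minimizer that only follows static references therefore sees
// no use of LogFactoryImpl and drops it from the shaded jar.
public class LoggingLookupExample {

    public static void main(String[] args) {
        Log log = LogFactory.getLog(LoggingLookupExample.class);
        log.info("log implementation in use: " + log.getClass().getName());
    }
}
```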

Fix s3-fs-presto not shading org.HdrHistogram

Fix log4j being relocated in the S3 fs implementations

Add shading checks to travis
Nico Kruber authored and aljoscha committed Nov 13, 2017
1 parent 32e5194 commit 0e5fb0b
Showing 12 changed files with 14,778 additions and 24 deletions.
27 changes: 27 additions & 0 deletions flink-filesystems/flink-s3-fs-hadoop/README.md
@@ -0,0 +1,27 @@
This project is a wrapper around Hadoop's s3a file system. By pulling in a smaller dependency tree and
shading all dependencies away, it keeps Flink Hadoop-free from a dependency perspective.

We also relocate the shaded Hadoop classes to allow running in a different
Hadoop setup. For this to work, however, we needed to adapt Hadoop's `Configuration`
class to load a (shaded) `core-default-shaded.xml` configuration containing the
relocated names of the classes that are loaded via reflection
(in the future, we may need to extend this to `mapred-default.xml` and `hdfs-default.xml` and their respective configuration classes).
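
A minimal sketch of the kind of change this means for the copied `Configuration` class (excerpt only, not the verbatim code; the full class is copied from the Hadoop version in use and everything outside the static initializer stays unmodified):

```java
// Hedged sketch, excerpt only: in the copied org.apache.hadoop.conf.Configuration
// class, the static initializer that registers the default resources is adapted
// so that the shaded defaults file is picked up; everything else stays as copied.
public class Configuration /* ... declaration and body copied from Hadoop ... */ {

    static {
        // was: addDefaultResource("core-default.xml");
        addDefaultResource("core-default-shaded.xml");
        // user/site overrides are still loaded as before
        addDefaultResource("core-site.xml");
    }
}
```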

# Changing the Hadoop Version

If you want to change the Hadoop version this project depends on, the following
steps are required to keep the shading correct:

1. copy `org/apache/hadoop/conf/Configuration.java` from the sources of the respective Hadoop version to this project
- adapt the `Configuration` class by replacing `core-default.xml` with `core-default-shaded.xml`.
2. copy `core-default.xml` from the respective Hadoop jar file to this project as
- `src/main/resources/core-default-shaded.xml` (replacing every occurrence of `org.apache.hadoop` with `org.apache.flink.fs.s3hadoop.shaded.org.apache.hadoop`)
- `src/test/resources/core-site.xml` (as is)
3. verify the shaded jar (a sketch of how to automate these checks follows this list):
- does not contain any unshaded classes except for `org.apache.flink.fs.s3hadoop.S3FileSystemFactory`
- all other classes should be under `org.apache.flink.fs.s3hadoop.shaded`
- there should be a `META-INF/services/org.apache.flink.core.fs.FileSystemFactory` file pointing to the `org.apache.flink.fs.s3hadoop.S3FileSystemFactory` class
- all other service files under `META-INF/services` should have their names and contents relocated into the `org.apache.flink.fs.s3hadoop.shaded` namespace
- contains a `core-default-shaded.xml` file
- does not contain a `core-default.xml` or `core-site.xml` file
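
The class-level checks from step 3 can be scripted. Below is a hedged sketch (not part of this project) that scans the shaded jar for class entries that are neither the unshaded factory nor relocated; the allowed names come from the list above:

```java
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hedged sketch: prints every class in the shaded jar that is neither the
// unshaded S3FileSystemFactory nor relocated under the shaded package.
public class ShadedJarCheck {

    public static void main(String[] args) throws Exception {
        try (JarFile jar = new JarFile(args[0])) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (!name.endsWith(".class")) {
                    continue;
                }
                String className = name.replace('/', '.')
                        .substring(0, name.length() - ".class".length());
                boolean allowed = className.startsWith("org.apache.flink.fs.s3hadoop.shaded.")
                        || className.startsWith("org.apache.flink.fs.s3hadoop.S3FileSystemFactory");
                if (!allowed) {
                    System.out.println("unexpected unshaded class: " + className);
                }
            }
        }
    }
}
```
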
84 changes: 74 additions & 10 deletions flink-filesystems/flink-s3-fs-hadoop/pom.xml
@@ -33,6 +33,7 @@ under the License.
<packaging>jar</packaging>

<properties>
<!-- Do not change this without updating the copied Configuration class! -->
<s3hadoop.hadoop.version>2.8.1</s3hadoop.hadoop.version>
<s3hadoop.aws.version>1.11.95</s3hadoop.aws.version>
</properties>
@@ -234,28 +235,87 @@ under the License.
</artifactSet>
<relocations>
<relocation>
<pattern>org</pattern>
<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.org</shadedPattern>
<pattern>com.amazonaws</pattern>
<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.com.amazonaws</shadedPattern>
</relocation>
<relocation>
<pattern>com.fasterxml</pattern>
<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.com.fasterxml</shadedPattern>
</relocation>
<relocation>
<pattern>com.google</pattern>
<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.com.google</shadedPattern>
<excludes>
<!-- provided -->
<exclude>com.google.code.findbugs.**</exclude>
</excludes>
</relocation>
<relocation>
<pattern>com.nimbusds</pattern>
<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.com.nimbusds</shadedPattern>
</relocation>
<relocation>
<pattern>com.squareup</pattern>
<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.com.squareup</shadedPattern>
</relocation>
<relocation>
<pattern>net.jcip</pattern>
<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.net.jcip</shadedPattern>
</relocation>
<relocation>
<pattern>net.minidev</pattern>
<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.net.minidev</shadedPattern>
</relocation>

<!-- relocate everything from the flink-hadoop-fs project -->
<relocation>
<pattern>org.apache.flink.runtime.fs.hdfs</pattern>
<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.org.apache.flink.runtime.fs.hdfs</shadedPattern>
</relocation>
<relocation>
<pattern>org.apache.flink.runtime.util</pattern>
<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.org.apache.flink.runtime.util</shadedPattern>
<includes>
<include>org.apache.flink.runtime.util.**Hadoop*</include>
</includes>
</relocation>

<relocation>
<pattern>org.apache</pattern>
<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.org.apache</shadedPattern>
<excludes>
<exclude>org.apache.flink.core.fs.FileSystemFactory</exclude>
<exclude>org.apache.flink.fs.s3hadoop.**</exclude>
<!-- keep all other classes of flink as they are (exceptions above) -->
<exclude>org.apache.flink.**</exclude>
<exclude>org.apache.log4j.**</exclude> <!-- provided -->
</excludes>
</relocation>
<relocation>
<pattern>com</pattern>
<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.com</shadedPattern>
<pattern>org.codehaus</pattern>
<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.org.codehaus</shadedPattern>
</relocation>
<relocation>
<pattern>org.joda</pattern>
<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.org.joda</shadedPattern>
</relocation>
<relocation>
<pattern>org.mortbay</pattern>
<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.org.mortbay</shadedPattern>
</relocation>
<relocation>
<pattern>org.tukaani</pattern>
<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.org.tukaani</shadedPattern>
</relocation>
<relocation>
<pattern>net</pattern>
<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.net</shadedPattern>
<pattern>org.znerd</pattern>
<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.org.znerd</shadedPattern>
</relocation>
<relocation>
<pattern>okio</pattern>
<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.okio</shadedPattern>
</relocation>
<relocation>
<pattern>software</pattern>
<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.software</shadedPattern>
<pattern>software.amazon</pattern>
<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.software.amazon</shadedPattern>
</relocation>
</relocations>
<filters>
@@ -277,6 +337,10 @@
<exclude>META-INF/maven/org.apache.commons/**</exclude>
<exclude>META-INF/maven/org.apache.flink/flink-hadoop-fs/**</exclude>
<exclude>META-INF/maven/org.apache.flink/force-shading/**</exclude>
<!-- we use our own "shaded" core-default.xml: core-default-shaded.xml -->
<exclude>core-default.xml</exclude>
<!-- we only add a core-site.xml with unshaded classnames for the unit tests -->
<exclude>core-site.xml</exclude>
</excludes>
</filter>
</filters>
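
One note on the relocation excludes above: `org.apache.flink.core.fs.FileSystemFactory` and the `org.apache.flink.fs.s3hadoop` classes are deliberately left unrelocated so that Flink's core can still discover the factory under its original interface name. A hedged sketch of that discovery, assuming (as an assumption about this Flink version) that factories are found via `java.util.ServiceLoader`:

```java
import java.util.ServiceLoader;

import org.apache.flink.core.fs.FileSystemFactory;

// Hedged sketch, not part of this commit: lists the file system factories that
// are discoverable on the classpath. This only works if the service file
// META-INF/services/org.apache.flink.core.fs.FileSystemFactory and the factory
// class keep their original, unrelocated names.
public class FactoryDiscoveryCheck {

    public static void main(String[] args) {
        for (FileSystemFactory factory : ServiceLoader.load(FileSystemFactory.class)) {
            System.out.println(factory.getScheme() + " -> " + factory.getClass().getName());
        }
    }
}
```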
