
[GLUTEN-5103][VL] Support JVM libhdfs in velox #5384

Open · wants to merge 4 commits into main

Conversation

@JkSelf (Contributor) commented Apr 12, 2024

What changes were proposed in this pull request?

Add libhdfs API support to the Velox backend.
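
For context, here is a minimal sketch (not code from this PR) of the JNI-based libhdfs C API that such a backend calls into; the namenode host, port, and file path are placeholder arguments, and the header location depends on the Hadoop install:

// Illustrative use of the standard hdfs.h entry points (hdfsConnect,
// hdfsGetPathInfo, hdfsOpenFile, hdfsPread); a sketch only, not the
// actual Velox reader implementation.
#include <hdfs/hdfs.h>   // often just <hdfs.h>, depending on the install
#include <fcntl.h>
#include <stdexcept>
#include <vector>

std::vector<char> readHdfsFile(const char* nameNode, int port, const char* path) {
  hdfsFS fs = hdfsConnect(nameNode, port);          // routed through the JVM FileSystem
  if (fs == nullptr) {
    throw std::runtime_error("hdfsConnect failed");
  }
  hdfsFileInfo* info = hdfsGetPathInfo(fs, path);   // used to learn the file size
  if (info == nullptr) {
    hdfsDisconnect(fs);
    throw std::runtime_error("hdfsGetPathInfo failed");
  }
  std::vector<char> buf(static_cast<size_t>(info->mSize));
  hdfsFreeFileInfo(info, 1);

  hdfsFile file = hdfsOpenFile(fs, path, O_RDONLY, 0, 0, 0);
  if (file == nullptr) {
    hdfsDisconnect(fs);
    throw std::runtime_error("hdfsOpenFile failed");
  }
  // A single positional read; note hdfsPread may return fewer bytes than requested.
  tSize nRead = hdfsPread(fs, file, 0, buf.data(), static_cast<tSize>(buf.size()));
  hdfsCloseFile(fs, file);
  hdfsDisconnect(fs);
  if (nRead < 0) {
    throw std::runtime_error("hdfsPread failed");
  }
  buf.resize(static_cast<size_t>(nRead));
  return buf;
}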

How was this patch tested?

Tested locally by running TPC-DS queries that load data from HDFS.

@JkSelf (Contributor, Author) commented Apr 12, 2024

@zhouyuan

@zhouyuan zhouyuan changed the title [VL] Support libhdfs in velox [VL] Support JVM libhdfs in velox Apr 12, 2024
@zhouyuan zhouyuan changed the title [VL] Support JVM libhdfs in velox [GLUTEN-5103][VL] Support JVM libhdfs in velox Apr 12, 2024

#5103

@Yohahaha (Contributor)

Will we keep an option to let users disable this feature?

@FelixYBW (Contributor)

Are there related changes in oap/velox?

@zhouyuan (Contributor) commented Apr 13, 2024 via email

@wangyum (Member) commented Apr 13, 2024

Thank you @JkSelf. It throws an exception when reading data:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f177f03c3c4, pid=445510, tid=446238
#
# JRE version: OpenJDK Runtime Environment Zulu17.40+20-SA (17.0.6+10) (build 17.0.6+10-LTS)
# Java VM: OpenJDK 64-Bit Server VM Zulu17.40+20-SA (17.0.6+10-LTS, mixed mode, sharing, tiered, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# C  [libvelox.so+0x450c3c4]  facebook::velox::HdfsReadFile::size() const+0x4
#
# Core dump will be written. Default location: /hadoop/7/yarn/local/usercache/user_spark/appcache/application_1708496355206_29565/container_e2313_1708496355206_29565_01_000108/core.445510
#
# An error report file with more information is saved as:
# /hadoop/7/yarn/local/usercache/user_spark/appcache/application_1708496355206_29565/container_e2313_1708496355206_29565_01_000108/hs_err_pid445510.log
#
# If you would like to submit a bug report, please visit:
#   https://www.azul.com/support/
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
$ cat /hadoop/5/yarn/local/usercache/user_spark/appcache/application_1708496355206_29565/container_e2313_1708496355206_29565_01_000005/hs_err_pid87813.log
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f3c1677c3c4, pid=87813, tid=95868
#
# JRE version: OpenJDK Runtime Environment Zulu17.40+20-SA (17.0.6+10) (build 17.0.6+10-LTS)
# Java VM: OpenJDK 64-Bit Server VM Zulu17.40+20-SA (17.0.6+10-LTS, mixed mode, sharing, tiered, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# C  [libvelox.so+0x450c3c4]  facebook::velox::HdfsReadFile::size() const+0x4
#
# Core dump will be written. Default location: /hadoop/5/yarn/local/usercache/user_spark/appcache/application_1708496355206_29565/container_e2313_1708496355206_29565_01_000005/core.87813
#
# If you would like to submit a bug report, please visit:
#   https://www.azul.com/support/
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

---------------  S U M M A R Y ------------

Command Line: ...
Host: Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz, 80 cores, 470G, Ubuntu 22.04.1 LTS
Time: Sat Apr 13 03:13:49 2024 -07 elapsed time: 158.672676 seconds (0d 0h 2m 38s)

---------------  T H R E A D  ---------------

Current thread (0x000055b3f1c11780):  JavaThread "Executor task launch worker for task 0.2 in stage 3.0 (TID 5484)" daemon [_thread_in_native, id=95868, stack(0x00007f3c29800000,0x00007f3c2a000000)]

Stack: [0x00007f3c29800000,0x00007f3c2a000000],  sp=0x00007f3c29ffc948,  free space=8178k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libvelox.so+0x450c3c4]  facebook::velox::HdfsReadFile::size() const+0x4
C  [libvelox.so+0x1d8488c]  facebook::velox::connector::hive::SplitReader::createReader()+0xac

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j  org.apache.gluten.vectorized.ColumnarBatchOutIterator.nativeHasNext(J)Z+0
j  org.apache.gluten.vectorized.ColumnarBatchOutIterator.hasNextInternal()Z+5
j  org.apache.gluten.vectorized.GeneralOutIterator.hasNext()Z+1
J 4196 c2 scala.collection.convert.Wrappers$JIteratorWrapper.hasNext()Z (10 bytes) @ 0x00007f47f7ea89c0 [0x00007f47f7ea8980+0x0000000000000040]
j  org.apache.gluten.utils.InvocationFlowProtection.hasNext()Z+59
j  org.apache.gluten.utils.IteratorCompleter.hasNext()Z+4
j  org.apache.gluten.utils.PayloadCloser.hasNext()Z+8
j  org.apache.gluten.utils.PipelineTimeAccumulator.hasNext()Z+4
J 8418 c1 org.apache.spark.InterruptibleIterator.hasNext()Z (17 bytes) @ 0x00007f47e84f12e4 [0x00007f47e84f0f20+0x00000000000003c4]
j  scala.collection.Iterator$$anon$12.hasNext()Z+11
J 7847 c1 scala.collection.Iterator$$anon$10.hasNext()Z (10 bytes) @ 0x00007f47e843c144 [0x00007f47e843c040+0x0000000000000104]
J 10039 c2 scala.collection.AbstractIterator.foreach(Lscala/Function1;)V (6 bytes) @ 0x00007f47f81dcfe4 [0x00007f47f81dcfa0+0x0000000000000044]
J 8993 c2 scala.collection.generic.Growable.$plus$plus$eq(Lscala/collection/TraversableOnce;)Lscala/collection/generic/Growable; (34 bytes) @ 0x00007f47f818f3a8 [0x00007f47f818f260+0x0000000000000148]
J 9168 c1 scala.collection.mutable.ArrayBuffer.$plus$plus$eq(Lscala/collection/TraversableOnce;)Lscala/collection/mutable/ArrayBuffer; (65 bytes) @ 0x00007f47e86ba9f4 [0x00007f47e86ba3c0+0x0000000000000634]
J 9167 c1 scala.collection.mutable.ArrayBuffer.$plus$plus$eq(Lscala/collection/TraversableOnce;)Lscala/collection/generic/Growable; (6 bytes) @ 0x00007f47e86b8c44 [0x00007f47e86b8bc0+0x0000000000000084]
...
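
The problematic frame is facebook::velox::HdfsReadFile::size(), crashing only a few instructions in, which is consistent with dereferencing a pointer (such as cached file info) that was never populated. As a hedged illustration only, not the actual Velox code, the kind of guard that turns this into a catchable error instead of a SIGSEGV looks like the following:

// Hypothetical wrapper, not velox::HdfsReadFile itself: validate the
// hdfsGetPathInfo result up front so size() never dereferences a null pointer.
#include <hdfs/hdfs.h>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>

class GuardedHdfsReadFile {
 public:
  GuardedHdfsReadFile(hdfsFS fs, std::string path)
      : fs_(fs), path_(std::move(path)) {
    fileInfo_ = hdfsGetPathInfo(fs_, path_.c_str());
    if (fileInfo_ == nullptr) {
      // Without this check, a later fileInfo_->mSize access crashes in native code.
      throw std::runtime_error("hdfsGetPathInfo failed for " + path_);
    }
  }

  ~GuardedHdfsReadFile() {
    if (fileInfo_ != nullptr) {
      hdfsFreeFileInfo(fileInfo_, 1);
    }
  }

  uint64_t size() const {
    return static_cast<uint64_t>(fileInfo_->mSize);
  }

 private:
  hdfsFS fs_;
  std::string path_;
  hdfsFileInfo* fileInfo_{nullptr};
};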

github-actions bot commented Jun 4, 2024

Run Gluten Clickhouse CI

This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.

@github-actions github-actions bot added the stale stale label Jul 30, 2024
@ulysses-you ulysses-you removed the stale stale label Jul 30, 2024
@ArnavBalyan (Contributor)

Hi team, this would be very helpful. Just wanted to know if there are any plans to support this. Thanks!
