[VL] system hang during spill of hashagg #6180

Open
FelixYBW opened this issue Jun 21, 2024 · 0 comments
Labels
bug Something isn't working triage

Comments

@FelixYBW
Contributor

Backend

VL (Velox)

Bug description

Error message (native Velox warnings, followed by the JVM stack trace):

W20240621 15:14:01.929342 114227 Operator.cpp:641] Can't reclaim from memory pool op.5.0.0.Aggregation which is under non-reclaimable section, memory usage: 231.99MB, reservation: 232.00MB
W20240621 15:14:01.930936 101314 Operator.cpp:641] Can't reclaim from memory pool op.5.0.0.Aggregation which is under non-reclaimable section, memory usage: 128.00MB, reservation: 128.00MB
W20240621 15:14:01.931005 101314 Operator.cpp:641] Can't reclaim from memory pool op.5.0.0.Aggregation which is under non-reclaimable section, memory usage: 128.00MB, reservation: 128.00MB
W20240621 15:14:01.934880 114227 HashAggregation.cpp:408] Can't reclaim from aggregation operator which has spilled and is under output processing, pool op.5.0.0.Aggregation, memory usage: 236.76MB, reservation: 240.00MB
24/06/21 15:14:01 ERROR [Executor task launch worker for task 2259.0 in stage 2.0 (TID 14859)] nmm.ManagedReservationListener: Error reserving memory from target
java.lang.NullPointerException
	at java.util.Objects.requireNonNull(Objects.java:203)
	at java.util.Optional.<init>(Optional.java:96)
	at java.util.Optional.of(Optional.java:108)
	at org.apache.gluten.memory.nmm.NativeMemoryManagers$1.spill(NativeMemoryManagers.java:79)
	at org.apache.gluten.memory.memtarget.Spillers$WithMinSpillSize.spill(Spillers.java:57)
	at org.apache.gluten.memory.memtarget.TreeMemoryTargets.spillTree(TreeMemoryTargets.java:90)
	at org.apache.gluten.memory.memtarget.TreeMemoryTargets.spillTree(TreeMemoryTargets.java:61)
	at org.apache.gluten.memory.memtarget.TreeMemoryTargets.spillTree(TreeMemoryTargets.java:80)
	at org.apache.gluten.memory.memtarget.TreeMemoryTargets.spillTree(TreeMemoryTargets.java:61)
	at org.apache.gluten.memory.memtarget.TreeMemoryTargets.spillTree(TreeMemoryTargets.java:80)
	at org.apache.gluten.memory.memtarget.TreeMemoryTargets.spillTree(TreeMemoryTargets.java:61)
	at org.apache.gluten.memory.memtarget.spark.TreeMemoryConsumer.spill(TreeMemoryConsumer.java:120)
	at org.apache.spark.memory.TaskMemoryManager.acquireExecutionMemory(TaskMemoryManager.java:213)
	at org.apache.spark.memory.MemoryConsumer.acquireMemory(MemoryConsumer.java:136)
	at org.apache.gluten.memory.memtarget.spark.TreeMemoryConsumer.borrow(TreeMemoryConsumer.java:70)
	at org.apache.gluten.memory.memtarget.TreeMemoryTargets$Node.borrow0(TreeMemoryTargets.java:137)
	at org.apache.gluten.memory.memtarget.TreeMemoryTargets$Node.borrow(TreeMemoryTargets.java:129)
	at org.apache.gluten.memory.memtarget.TreeMemoryTargets$Node.borrow0(TreeMemoryTargets.java:137)
	at org.apache.gluten.memory.memtarget.TreeMemoryTargets$Node.borrow(TreeMemoryTargets.java:129)
	at org.apache.gluten.memory.memtarget.OverAcquire.borrow(OverAcquire.java:56)
	at org.apache.gluten.memory.memtarget.ThrowOnOomMemoryTarget.borrow(ThrowOnOomMemoryTarget.java:35)
	at org.apache.gluten.memory.nmm.ManagedReservationListener.reserve(ManagedReservationListener.java:43)
	at org.apache.gluten.memory.nmm.NativeMemoryManager.create(Native Method)
	at org.apache.gluten.memory.nmm.NativeMemoryManager.create(NativeMemoryManager.java:49)
	at org.apache.gluten.memory.nmm.NativeMemoryManagers.createNativeMemoryManager(NativeMemoryManagers.java:155)
	at org.apache.gluten.memory.nmm.NativeMemoryManagers.create(NativeMemoryManagers.java:56)
	at org.apache.spark.shuffle.ColumnarShuffleWriter.internalWrite(ColumnarShuffleWriter.scala:159)
	at org.apache.spark.shuffle.ColumnarShuffleWriter.write(ColumnarShuffleWriter.scala:242)
	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
	at org.apache.spark.scheduler.Task.run(Task.scala:131)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1471)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
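
The top frames of the trace (`Objects.requireNonNull` called from `Optional.of`) show that a null value was passed to `Optional.of` inside the spill callback. The following is a minimal sketch of that JDK behavior only, not Gluten's actual code: the class `OptionalSpillSketch`, the field `managerUnderConstruction`, and both `spill*` methods are hypothetical names, and the premise that the wrapped reference is still unset while `NativeMemoryManager.create` is running is an assumption drawn from the frame order, not something the log confirms.

```java
import java.util.Optional;

// Hypothetical stand-in for illustration; not a Gluten class.
class OptionalSpillSketch {
    // Assumption: the reference the spill callback wraps is still null
    // while the native memory manager is being created.
    static Object managerUnderConstruction = null;

    // Same pattern as the failing frames: Optional.of rejects null
    // via Objects.requireNonNull and throws NullPointerException.
    static long spillStrict(long requestedBytes) {
        Optional.of(managerUnderConstruction);
        return requestedBytes;
    }

    // Null-tolerant variant: Optional.ofNullable yields an empty Optional,
    // so the callback can report that it reclaimed 0 bytes instead of throwing.
    static long spillLenient(long requestedBytes) {
        return Optional.ofNullable(managerUnderConstruction)
            .map(m -> requestedBytes)
            .orElse(0L);
    }

    public static void main(String[] args) {
        try {
            spillStrict(1024L);
        } catch (NullPointerException e) {
            System.out.println("strict spill threw NullPointerException");
        }
        System.out.println("lenient spill reclaimed " + spillLenient(1024L) + " bytes");
    }
}
```

Whether returning 0 from the callback is the right behavior for Gluten here is a separate question; the sketch only shows why `Optional.of` fails on a null input and how `Optional.ofNullable` differs.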

Spark version

None

Spark configurations

No response

System information

No response

Relevant logs

No response
