[VL][SPARK-4.0][1.6.0] JVM crash (SIGSEGV) when enabling GlutenPlugin on Spark 4.0.1 with gluten-velox-bundle-spark4.0_2.13-linux_amd64-1.6.0.jar #12103

@Cai-Yao

Description

Backend

VL (Velox)

Bug description

When running Spark SQL with Gluten enabled on Spark 4.0.1, the JVM crashes with a fatal error (SIGSEGV) shortly after loading libgluten.so and libvelox.so.

Expected behavior

The query should run successfully (or fall back gracefully) without crashing the JVM.

Actual behavior

The driver JVM crashes with SIGSEGV in libjvm.so on the JNI method lookup path (jni_GetMethodID), and hs_err_pid*.log contains:

  • NoClassDefFoundError: Lorg/apache/gluten/memory/listener/ReservationListener;
  • native libraries loaded from Gluten bundle (libgluten.so, libvelox.so)
  • process exits due to fatal JVM error
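
One way to narrow down the NoClassDefFoundError is to check whether the bundle jar actually packages the class named in the hs_err log. The following is a minimal sketch of such a check, not part of Gluten; the helper names `descriptor_to_entry` and `jar_has_class` are hypothetical. It converts the JNI-style descriptor from the log into a jar entry name and looks it up in the archive.

```python
import zipfile


def descriptor_to_entry(descriptor: str) -> str:
    """Convert a JNI descriptor such as 'Lorg/foo/Bar;' to 'org/foo/Bar.class'."""
    name = descriptor.strip()
    # Strip the 'L' prefix and ';' suffix only when both are present.
    if name.startswith("L") and name.endswith(";"):
        name = name[1:-1]
    # Also accept dotted names such as 'org.foo.Bar'.
    return name.replace(".", "/") + ".class"


def jar_has_class(jar_path: str, descriptor: str) -> bool:
    """Return True if the jar at jar_path contains the class file for descriptor."""
    with zipfile.ZipFile(jar_path) as jar:
        return descriptor_to_entry(descriptor) in jar.namelist()
```

For example, `jar_has_class(bundle_path, "Lorg/apache/gluten/memory/listener/ReservationListener;")` with `bundle_path` pointing at the bundle jar. This only tells whether the class is packaged; it says nothing about whether the class loader used at the time of the JNI lookup can see it.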

This issue report was drafted with assistance from AI.

Gluten version

main branch (the crash was reproduced with the gluten-velox-bundle-spark4.0_2.13-linux_amd64-1.6.0.jar bundle)

Spark version

spark-4.0.x

Spark configurations

Sanitized key configs used in reproduction:

--master local[80]
--class org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver
--jars /<REDACTED_PATH>/gluten-velox-bundle-spark4.0_2.13-linux_amd64-1.6.0.jar

--conf spark.plugins=org.apache.gluten.GlutenPlugin
--conf spark.memory.offHeap.enabled=true
--conf spark.memory.offHeap.size=20g
--conf spark.driver.memory=24g
--conf spark.sql.shuffle.partitions=80
--conf spark.sql.crossJoin.enabled=true
--conf spark.sql.legacy.timeParserPolicy=LEGACY
--conf spark.sql.ansi.enabled=false
--conf spark.gluten.sql.columnar.forceShuffledHashJoin=true
--conf spark.sql.warehouse.dir=/<REDACTED_PATH>/warehouse

spark.driver.extraJavaOptions:
-Dderby.system.home=/<REDACTED_PATH>
-XX:+UseG1GC
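
To make the reproduction easier to script, the options above can be assembled into an argument list. This is only a sketch: the helper name `build_submit_args` is hypothetical, the redacted paths are kept as placeholders, and the driver Java options are passed through the `spark.driver.extraJavaOptions` conf key as listed above. Any primary resource or trailing arguments are not shown in this report and are left out.

```python
import shlex

# Placeholder path, redacted exactly as in the report.
JAR = "/<REDACTED_PATH>/gluten-velox-bundle-spark4.0_2.13-linux_amd64-1.6.0.jar"

# Key/value pairs copied from the sanitized configuration above.
CONF = {
    "spark.plugins": "org.apache.gluten.GlutenPlugin",
    "spark.memory.offHeap.enabled": "true",
    "spark.memory.offHeap.size": "20g",
    "spark.driver.memory": "24g",
    "spark.sql.shuffle.partitions": "80",
    "spark.sql.crossJoin.enabled": "true",
    "spark.sql.legacy.timeParserPolicy": "LEGACY",
    "spark.sql.ansi.enabled": "false",
    "spark.gluten.sql.columnar.forceShuffledHashJoin": "true",
    "spark.sql.warehouse.dir": "/<REDACTED_PATH>/warehouse",
    "spark.driver.extraJavaOptions": "-Dderby.system.home=/<REDACTED_PATH> -XX:+UseG1GC",
}


def build_submit_args(conf, jar=JAR):
    """Return the spark-submit option list (options only, no primary resource)."""
    args = [
        "--master", "local[80]",
        "--class", "org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver",
        "--jars", jar,
    ]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args
```

Calling `shlex.join(["spark-submit"] + build_submit_args(CONF))` yields a shell-quoted command line, which keeps the space-separated Java options together as a single `--conf` value.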

System information

JVM args (sanitized, key ones):
JDK: Temurin 17.0.19+10
--add-modules=jdk.incubator.vector
multiple --add-opens=...=ALL-UNNAMED
-Djdk.reflect.useDirectMethodHandle=false
-Dio.netty.tryReflectionSetAccessible=true

Sanitized environment details (from hs_err):
OS: Alibaba Cloud Linux 3
Kernel: 5.10.134-18.al8.x86_64
Arch: x86_64
CPU: Intel(R) Xeon(R) Platinum 8369B, 80 cores
Memory: ~114 GB
glibc: 2.32
Java: OpenJDK Temurin 17.0.19+10 (linux-amd64)
(If needed, I can run dev/info.sh and provide a redacted output.)

Relevant logs

# A fatal error has been detected by the Java Runtime Environment:
#  SIGSEGV (0xb)
# JRE version: OpenJDK Runtime Environment Temurin-17.0.19+10
# Java VM: OpenJDK 64-Bit Server VM ... linux-amd64
# Problematic frame:
# V  [libjvm.so+0x28cda0] ... oop_access_barrier(void*)+0x0
Internal exceptions (20 events):
Event: 14.409 Thread ... Exception <a 'java/lang/NoClassDefFoundError' ...:
Lorg/apache/gluten/memory/listener/ReservationListener;>
thrown [src/hotspot/share/classfile/systemDictionary.cpp, line 245]
Event: 3.960 Loaded shared library .../linux/amd64/libgluten.so
Event: 4.996 Loaded shared library .../linux/amd64/libvelox.so
java_command: org.apache.spark.deploy.SparkSubmit ... 
--jars /<REDACTED_PATH>/gluten-velox-bundle-spark4.0_2.13-linux_amd64-1.6.0.jar ...
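
When comparing several hs_err_pid*.log files, it can help to pull out the same few facts quoted above: the signal, the problematic frame, the missing class, and the native libraries loaded. The following is a rough sketch under the assumption that the log lines look like the excerpt in this report; the function name `summarize_hs_err` and the regular expressions are illustrative, not a complete hs_err parser.

```python
import re

# Patterns are based only on the line shapes shown in the excerpt above.
SIGNAL_RE = re.compile(r"^#\s+(SIG[A-Z]+)\s+\((0x[0-9a-f]+)\)", re.MULTILINE)
FRAME_RE = re.compile(r"^# Problematic frame:\s*\n#\s*(.+)$", re.MULTILINE)
# The descriptor may sit on the same line as the exception or on the next one.
MISSING_CLASS_RE = re.compile(r"NoClassDefFoundError'[^\n]*?\n?\s*(L[\w/$]+;)")
LIB_RE = re.compile(r"Loaded shared library\s+(\S+)")


def summarize_hs_err(text):
    """Extract signal, problematic frame, missing classes and loaded libraries."""
    signal = SIGNAL_RE.search(text)
    frame = FRAME_RE.search(text)
    return {
        "signal": signal.group(1) if signal else None,
        "frame": frame.group(1).strip() if frame else None,
        "missing_classes": sorted(set(MISSING_CLASS_RE.findall(text))),
        # Keep only the file name of each loaded library.
        "libraries": [path.rsplit("/", 1)[-1] for path in LIB_RE.findall(text)],
    }
```

Typical use would be `summarize_hs_err(open(path).read())` for each crash log, then comparing the resulting dictionaries across runs.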

Reproduction notes

  • The same machine runs the older Spark 3.3 + Gluten bundle without this crash.
  • The crash is observed with Spark 4.0.1 + Gluten 1.6.0 (Spark 4.0 / Scala 2.13 bundle).

Metadata

Labels: bug, triage
