XLA's bundled hwloc exports global symbols and is built without PCI discovery, which breaks NCCL OFI network plugins on AWS EFA instances
Problem
XLA/TensorFlow bundles hwloc into libtensorflow_framework.so but the build configuration has two issues that break third-party libraries (e.g. aws-ofi-nccl, NIXL) that also depend on hwloc:
- hwloc symbols are exported with default visibility, so the dynamic linker resolves other libraries' hwloc_topology_init/hwloc_topology_load calls to TensorFlow's bundled hwloc instead of the system hwloc.
- The bundled hwloc is built without the PCI discovery backend (topology-pci.c is not in hwloc.BUILD srcs, and hwloc_pci_component is absent from static-components.h), so PCI device enumeration returns 0 devices.
The combination means: any library loaded in the same process that calls hwloc to discover PCI topology (GPUs, NICs, InfiniBand devices) silently gets zero results, because TensorFlow's incomplete hwloc implementation hijacks the calls.
Impact
On AWS GPU instances (P5/P6 with EFA networking), the aws-ofi-nccl NCCL network plugin uses hwloc to discover EFA NIC topology for optimal GPU-NIC pairing. When TensorFlow 2.20+ is loaded:
- hwloc_get_next_pcidev() returns NULL (0 PCI devices discovered)
- aws-ofi-nccl computes max_group_size = 0 → "Unexpected topo group size of 0" error
- With aws-ofi-nccl 1.17.0: segmentation fault
- With aws-ofi-nccl 1.18.1a1: falls back from RDMA to SENDRECV protocol (significant performance degradation)
- TensorFlow 2.18.1 (which uses system hwloc) works correctly
This also affects NIXL's libfabric backend, which uses the same hwloc PCI discovery path.
Reproduction
# Compile a minimal hwloc PCI enumeration program
cat > check_topo.c << 'EOF'
#include <stdio.h>
#include <hwloc.h>
int main(void) {
    hwloc_topology_t topo;
    hwloc_topology_init(&topo);
    /* Keep all I/O objects so PCI devices appear in the topology. */
    hwloc_topology_set_io_types_filter(topo, HWLOC_TYPE_FILTER_KEEP_ALL);
    hwloc_topology_load(topo);
    int pci_count = 0;
    hwloc_obj_t obj = NULL;
    while ((obj = hwloc_get_next_pcidev(topo, obj)) != NULL) pci_count++;
    printf("PCI devices: %d\n", pci_count);
    hwloc_topology_destroy(topo);
    return 0;
}
EOF
gcc -o check_topo check_topo.c -lhwloc
# System hwloc — correct
./check_topo
# Output: PCI devices: 89
# With TF 2.20 preloaded — broken (TF's hwloc hijacks the calls)
LD_PRELOAD=$(python3 -c "import tensorflow; print(tensorflow.__file__.replace('__init__.py','') + 'libtensorflow_framework.so.2')") ./check_topo
# Output: PCI devices: 0
# With TF 2.18 preloaded: correct (same command, run in an environment where TF 2.18 is installed; TF 2.18 uses system hwloc, no bundled symbols)
LD_PRELOAD=$(python3 -c "import tensorflow; print(tensorflow.__file__.replace('__init__.py','') + 'libtensorflow_framework.so.2')") ./check_topo
# Output: PCI devices: 89
Symbol collision confirmed via LD_DEBUG:
LD_DEBUG=bindings python test.py 2>&1 | grep "hwloc_topology" | grep "libnccl-net-ofi"
# binding file libnccl-net-ofi.so to libtensorflow_framework.so.2: normal symbol `hwloc_topology_init'
# binding file libnccl-net-ofi.so to libtensorflow_framework.so.2: normal symbol `hwloc_topology_load'
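When the LD_DEBUG output is long, it can help to condense it to one line per hwloc symbol showing which library requested it and which library provided it. The following is a minimal sketch, assuming the `binding file ... to ...: normal symbol` line layout shown above; `summarize_hwloc_bindings` is a hypothetical helper name, not part of glibc, TensorFlow, or hwloc.

```shell
#!/bin/sh
# Read LD_DEBUG=bindings output on stdin and print one line per hwloc_* binding:
#   <symbol> <consumer library> -> <provider library>
# Assumes the "binding file A to B: normal symbol `name'" layout.
summarize_hwloc_bindings() {
    awk '
    /binding file/ && /symbol `hwloc_/ {
        consumer = ""; provider = ""
        for (i = 1; i < NF; i++) {
            # The word after "file" is the library that made the call.
            if ($i == "file" && consumer == "") consumer = $(i + 1)
            # The word after the first "to" is the library that satisfied it.
            if ($i == "to" && provider == "") provider = $(i + 1)
        }
        sub(/:$/, "", provider)            # drop a trailing colon, if any
        sym = $0
        sub(/.*symbol `/, "", sym)         # keep text after the opening quote
        sub(/[^A-Za-z0-9_].*$/, "", sym)   # cut at the first non-identifier char
        print sym, consumer, "->", provider
    }' | sort -u
}

# Usage (test.py is the script from the report):
#   LD_DEBUG=bindings python test.py 2>&1 | summarize_hwloc_bindings
```

Any line whose provider is libtensorflow_framework.so.2 while the consumer is a different library points to the collision described in this report.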
Root Cause Analysis
In third_party/hwloc/hwloc.BUILD:
Issue 1: Missing -fvisibility=hidden
The copts do not include -fvisibility=hidden, so all 283 hwloc symbols are exported from libtensorflow_framework.so with default (global) visibility:
# Current
copts = COMMON_INCLUDE_COPTS + DISABLE_WARNINGS_COPTS + VAR_SETTINGS_COPTS,
# Should be
copts = COMMON_INCLUDE_COPTS + DISABLE_WARNINGS_COPTS + VAR_SETTINGS_COPTS + ["-fvisibility=hidden"],
Issue 2: Missing PCI discovery backend
static-components.h does not register hwloc_pci_component, and hwloc.BUILD does not compile topology-pci.c. This means the bundled hwloc cannot discover PCI devices at all. While TensorFlow itself may not need PCI topology, the globally-visible symbols mean other libraries in the same process get this broken implementation.
Proposed Fix
Add -fvisibility=hidden to the hwloc build copts so that the bundled hwloc symbols remain internal to libtensorflow_framework.so and cannot hijack other libraries' hwloc calls:
copts = COMMON_INCLUDE_COPTS + DISABLE_WARNINGS_COPTS + VAR_SETTINGS_COPTS + ["-fvisibility=hidden"],
This is the minimal, non-breaking fix. It keeps TensorFlow's internal hwloc usage working while preventing symbol leakage to other libraries.
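One way to check whether the change has the intended effect is to list the hwloc_* entries in the dynamic symbol table of the rebuilt library. The following is a minimal sketch, assuming GNU binutils `nm` is installed; `filter_hwloc_exports` and `list_hwloc_exports` are hypothetical helper names, and the library path in the usage comment is a placeholder.

```shell
#!/bin/sh
# Print the hwloc_* names found in `nm -D --defined-only` style input on stdin.
# Each input line is expected to look like: "<address> <type> <symbol>".
filter_hwloc_exports() {
    awk '$NF ~ /^hwloc_/ { print $NF }'
}

# List the hwloc_* symbols that a shared library defines in its dynamic
# symbol table, i.e. the ones other libraries in the process can bind to.
# $1 is the path to a .so file.
list_hwloc_exports() {
    nm -D --defined-only "$1" | filter_hwloc_exports
}

# Usage (placeholder path):
#   list_hwloc_exports /path/to/libtensorflow_framework.so.2 | wc -l
```

If the visibility flag works as proposed, the count for a rebuilt libtensorflow_framework.so.2 should be zero, while the system libhwloc should still list its public API.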
Environment
- Instance: AWS P5 (8x H100, 32x EFA NICs)
- TensorFlow: 2.20.0 (broken), 2.18.1 (works)
- System hwloc: 2.7.0
- TF bundled hwloc: 2.0.3 (per hwloc.BUILD version defines)
- Affected libraries: aws-ofi-nccl (NCCL network plugin), NIXL libfabric backend