
Step 3 fails when building msccl-tests-nccl (error: identifier "ncclAllToAll" is undefined) #41

@aslom

I got the two previous steps running OK, but I'm stuck on step 3. Do I need a specific version of CUDA? I am using openmpi-4.0.7. Any idea what else I could try?

https://github.com/Azure/msccl?tab=readme-ov-file#3-below-is-the-steps-to-install-msccl-tests-nccl-for-performance-evaluation

$ make MPI=1 MPI_HOME=$HOME/mpi CUDA_HOME=/usr/local/cuda-12.4 NCCL_HOME=$HOME/msccl/executor/msccl-executor-nccl/build/ -j
make -C src build BUILDDIR=/gpfs/users/aslom/github/azure-msccl/msccl/tests/msccl-tests-nccl/build
make[1]: Entering directory '/gpfs/users/aslom/github/azure-msccl/msccl/tests/msccl-tests-nccl/src'
Compiling /gpfs/users/aslom/github/azure-msccl/msccl/tests/msccl-tests-nccl/build/verifiable/verifiable.o
Compiling  all_reduce.cu                       > /gpfs/users/aslom/github/azure-msccl/msccl/tests/msccl-tests-nccl/build/all_reduce.o
Compiling  common.cu                           > /gpfs/users/aslom/github/azure-msccl/msccl/tests/msccl-tests-nccl/build/common.o
Compiling  all_gather.cu                       > /gpfs/users/aslom/github/azure-msccl/msccl/tests/msccl-tests-nccl/build/all_gather.o
Compiling  broadcast.cu                        > /gpfs/users/aslom/github/azure-msccl/msccl/tests/msccl-tests-nccl/build/broadcast.o
Compiling  reduce_scatter.cu                   > /gpfs/users/aslom/github/azure-msccl/msccl/tests/msccl-tests-nccl/build/reduce_scatter.o
Compiling  reduce.cu                           > /gpfs/users/aslom/github/azure-msccl/msccl/tests/msccl-tests-nccl/build/reduce.o
Compiling  alltoall.cu                         > /gpfs/users/aslom/github/azure-msccl/msccl/tests/msccl-tests-nccl/build/alltoall.o
Compiling  scatter.cu                          > /gpfs/users/aslom/github/azure-msccl/msccl/tests/msccl-tests-nccl/build/scatter.o
Compiling  gather.cu                           > /gpfs/users/aslom/github/azure-msccl/msccl/tests/msccl-tests-nccl/build/gather.o
Compiling  sendrecv.cu                         > /gpfs/users/aslom/github/azure-msccl/msccl/tests/msccl-tests-nccl/build/sendrecv.o
Compiling  hypercube.cu                        > /gpfs/users/aslom/github/azure-msccl/msccl/tests/msccl-tests-nccl/build/hypercube.o
alltoall.cu(60): error: identifier "ncclAllToAll" is undefined
        do { ncclResult_t res = ncclAllToAll(sendbuff, recvbuff, count, type, comm, stream); if (res != ncclSuccess) { char hostname[1024]; getHostName(hostname, 1024); printf("%s: Test NCCL failure %s:%d " "'%s / %s'\n", hostname,"alltoall.cu",60, ncclGetErrorString(res), ncclGetLastError(
                                ^

1 error detected in the compilation of "alltoall.cu".
make[1]: *** [Makefile:94: /gpfs/users/aslom/github/azure-msccl/msccl/tests/msccl-tests-nccl/build/alltoall.o] Error 2
make[1]: *** Waiting for unfinished jobs....
../verifiable/verifiable.cu(969): error: identifier "ncclFp8E4M3" is undefined
    case ncclFp8E4M3: prepareInput2<<<block_n, 512, 0, stream>>>((__nv_fp8_e4m3*)elts, elt_n, op, rank_n, rank_me, seed, elt_ix0); break;
         ^

../verifiable/verifiable.cu(970): error: identifier "ncclFp8E5M2" is undefined
    case ncclFp8E5M2: prepareInput2<<<block_n, 512, 0, stream>>>((__nv_fp8_e5m2*)elts, elt_n, op, rank_n, rank_me, seed, elt_ix0); break;
         ^

../verifiable/verifiable.cu(1049): error: identifier "ncclFp8E4M3" is undefined
    case ncclFp8E4M3: prepareExpected2<<<block_n, 512, 0, stream>>>((__nv_fp8_e4m3*)elts, elt_n, op, rank_n, seed, elt_ix0); break;
         ^

../verifiable/verifiable.cu(1050): error: identifier "ncclFp8E5M2" is undefined
    case ncclFp8E5M2: prepareExpected2<<<block_n, 512, 0, stream>>>((__nv_fp8_e5m2*)elts, elt_n, op, rank_n, seed, elt_ix0); break;
         ^

../verifiable/verifiable.cu(1123): error: identifier "ncclFp8E4M3" is undefined
    case ncclFp8E4M3:
         ^

../verifiable/verifiable.cu(1124): error: identifier "ncclFp8E5M2" is undefined
    case ncclFp8E5M2:
         ^

../verifiable/verifiable.cu(1252): error: identifier "ncclFp8E4M3" is undefined
      floating |= elt_ty == ncclFp8E4M3;
                            ^

../verifiable/verifiable.cu(1253): error: identifier "ncclFp8E5M2" is undefined
      floating |= elt_ty == ncclFp8E5M2;
                            ^

8 errors detected in the compilation of "../verifiable/verifiable.cu".
make[1]: *** [../verifiable/verifiable.mk:11: /gpfs/users/aslom/github/azure-msccl/msccl/tests/msccl-tests-nccl/build/verifiable/verifiable.o] Error 2
make[1]: Leaving directory '/gpfs/users/aslom/github/azure-msccl/msccl/tests/msccl-tests-nccl/src'
make: *** [Makefile:20: src.build] Error 2
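One way to narrow this down is to check whether the `nccl.h` the build is pointed at actually declares the identifiers the compiler reports as undefined. The sketch below is a hypothetical helper (`check_nccl_header` is not part of msccl or NCCL). It assumes the msccl-executor-nccl build places its header at `$NCCL_HOME/include/nccl.h`; adjust the path if your build lays headers out differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which of the identifiers that the failing
# build needs (ncclAllToAll, ncclFp8E4M3, ncclFp8E5M2) appear in a given header.
check_nccl_header() {
  local header="$1"
  if [ ! -f "$header" ]; then
    echo "missing: $header"
    return 1
  fi
  local sym
  for sym in ncclAllToAll ncclFp8E4M3 ncclFp8E5M2; do
    # -w matches whole identifiers only, -q suppresses grep's own output
    if grep -qw -- "$sym" "$header"; then
      echo "found: $sym"
    else
      echo "not found: $sym"
    fi
  done
}

# Example call (assumed header location under the NCCL_HOME used above):
#   check_nccl_header "$HOME/msccl/executor/msccl-executor-nccl/build/include/nccl.h"
```

If all three identifiers are found there, a plausible (unconfirmed) explanation is that the compiler picks up a different `nccl.h`, for example a stock NCCL header, earlier on its include path. If they are missing, the executor build itself may not have produced the expected header.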
