Error with Score-P and TensorFlow #112

@anarazh

Dear team,

I'm getting the following error when I run Score-P with a module for tracing Python scripts:


2020-10-20 09:24:14.149317: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 1.00M (1048576 bytes) from device: CUDA_ERROR_INVALID_CONTEXT: invalid device context
2020-10-20 09:24:14.149357: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 921.8K (943872 bytes) from device: CUDA_ERROR_INVALID_CONTEXT: invalid device context
2020-10-20 09:24:14.149366: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 829.8K (849664 bytes) from device: CUDA_ERROR_INVALID_CONTEXT: invalid device context
2020-10-20 09:24:14.149373: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 747.0K (764928 bytes) from device: CUDA_ERROR_INVALID_CONTEXT: invalid device context
2020-10-20 09:24:14.149380: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 672.5K (688640 bytes) from device: CUDA_ERROR_INVALID_CONTEXT: invalid device context


The error file grows very quickly, and I end up killing the job.
I use a custom Score-P build. The details of the environment setup are in the attached job script, and the error output is attached too.
Without Score-P, the application runs as expected, even without setting LD_PRELOAD for MPI.

When I run Score-P with the LD_PRELOAD set, I get the following error instead:


[Score-P] src/adapters/mpi/SCOREP_Mpi_Env.c:230: Warning: MPI environment initialization request and provided level exceed MPI_THREAD_FUNNELED!
2020-10-19 10:56:13.384533: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2494285000 Hz
[rc0003:mpi_rank_0][error_sighandler] Caught error: Segmentation fault (signal 11)
srun: error: rc0003: task 0: Segmentation fault


Would appreciate any feedback on this issue.
Thanks in advance!

Anara
err_example.txt
job-example.txt
