Open
Description
Checklist
- I've prepended issue tag with type of change: [bug]
- (If applicable) I've attached the script to reproduce the bug
- (If applicable) I've documented below the DLC image/dockerfile this relates to
- (If applicable) I've documented below the tests I've run on the DLC image
- I'm using an existing DLC image listed here: https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/deep-learning-containers-images.html
- I've built my own container based off DLC (and I've attached the code used to build my own image)
Concise Description:
The smdistributed module is not available in the DLC image. Importing it fails with:
ModuleNotFoundError: No module named 'smdistributed'
DLC image/dockerfile:
763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-training:2.1.0-transformers4.36.0-gpu-py310-cu121-ubuntu20.04
Current behavior:
Expected behavior:
Additional context:
Installing it manually gives the following error:
ErrorMessage "ImportError: libsmddpcpp.so: cannot open shared object file: No such file or directory"
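The two errors above point at different failure modes: the first says the package cannot be found, the second says it was found but a shared library it needs could not be loaded. A minimal sketch for telling them apart inside the container might look like the following. The helper name `diagnose_import` is hypothetical and not part of any AWS tooling. A `ModuleNotFoundError` can also come from a missing dependency of the package, so the result is a hint, not a diagnosis.

```python
import importlib


def diagnose_import(name: str) -> str:
    """Try to import `name` and classify the outcome (hypothetical helper)."""
    try:
        importlib.import_module(name)
    except ModuleNotFoundError as exc:
        # The module (or one of its dependencies) is not on the Python path.
        return f"not installed: {exc}"
    except ImportError as exc:
        # The module was located but failed to load, e.g. a missing .so file.
        return f"installed but failed to load: {exc}"
    return "ok"


if __name__ == "__main__":
    # Module name taken from the error message in this report.
    print("smdistributed:", diagnose_import("smdistributed"))
```

Running this in the image listed above, before and after the manual install, should show which of the two cases applies.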
niklas-palm