The original CodeGeeX failed with an out-of-memory error when run with this script on a 3900X (24 cores) + 32 GB RAM + RTX 3090:
# With quantization (with more than 15GB RAM)
bash ./scripts/test_inference_quantized.sh <GPU_ID> ./tests/test_prompt.txt
so I switched to codegeex-fastertransformer, but it still seems to OOM:
Traceback (most recent call last):
File "api.py", line 105, in <module>
if not codegeex.load(ckpt_path=args.ckpt_path):
File "/workspace/codegeex-fastertransformer/examples/pytorch/codegeex/utils/codegeex.py", line 413, in load
self.cuda()
File "/workspace/codegeex-fastertransformer/examples/pytorch/codegeex/utils/codegeex.py", line 430, in cuda
self.weights._map(lambda w: w.contiguous().cuda(self.device))
File "/workspace/codegeex-fastertransformer/examples/pytorch/codegeex/utils/codegeex.py", line 177, in _map
w[i] = func(w[i])
File "/workspace/codegeex-fastertransformer/examples/pytorch/codegeex/utils/codegeex.py", line 430, in <lambda>
self.weights._map(lambda w: w.contiguous().cuda(self.device))
RuntimeError: CUDA out of memory. Tried to allocate 200.00 MiB (GPU 0; 24.00 GiB total capacity; 23.11 GiB already allocated; 0 bytes free; 23.11 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
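For what it's worth, the allocator hint about max_split_size_mb probably won't help here: the card is simply too small for the unquantized weights. A rough back-of-envelope sketch (assuming the ~13B parameter count reported for CodeGeeX; the exact figure may differ) shows the fp16 weights alone slightly exceed a 3090's 24 GiB, while int8 quantization would fit:

```python
# Rough estimate of CodeGeeX weight memory vs. RTX 3090 capacity.
# Assumption: ~13B parameters (reported model size; not taken from this log).
params = 13e9
gib = 2**30

fp16_gib = params * 2 / gib  # 2 bytes per parameter in half precision
int8_gib = params * 1 / gib  # 1 byte per parameter when quantized to int8

print(f"fp16 weights: ~{fp16_gib:.1f} GiB")  # just over the 24.00 GiB capacity
print(f"int8 weights: ~{int8_gib:.1f} GiB")  # fits with room for activations
```

This matches the log: 23.11 GiB already allocated on a 24.00 GiB device before loading finished, so the quantized path (or splitting the model across GPUs) looks necessary on this hardware.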