Installing the stable torch 2.5.0 release from pip (CUDA 12.4) reproducibly produces a broken install when following the 'Getting Started' guide:
docker run -it --rm --gpus=all almalinux/9-base
[root@a8af28733c07 /]# python3 -V
Python 3.9.18
[root@a8af28733c07 /]# python3 -m pip install torch torchvision torchaudio
[root@a8af28733c07 /]# python3
>>> import torch
Traceback (most recent call last):
  File "/usr/local/lib64/python3.9/site-packages/torch/__init__.py", line 300, in _load_global_deps
    ctypes.CDLL(global_deps_lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/usr/lib64/python3.9/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libcudart.so.12: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib64/python3.9/site-packages/torch/__init__.py", line 367, in <module>
    _load_global_deps()
  File "/usr/local/lib64/python3.9/site-packages/torch/__init__.py", line 325, in _load_global_deps
    _preload_cuda_deps(lib_folder, lib_name)
  File "/usr/local/lib64/python3.9/site-packages/torch/__init__.py", line 284, in _preload_cuda_deps
    raise ValueError(f"{lib_name} not found in the system path {sys.path}")
ValueError: libcufile.so.*[0-9] not found in the system path ['', '/usr/lib64/python39.zip', '/usr/lib64/python3.9', '/usr/lib64/python3.9/lib-dynload', '/usr/local/lib64/python3.9/site-packages', '/usr/local/lib/python3.9/site-packages', '/usr/lib64/python3.9/site-packages', '/usr/lib/python3.9/site-packages']
This works fine for previous versions (e.g. 2.4.1, 2.4.0):
python3 -m pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu124
I notice that installing previous versions this way results in torch version 2.4.1+cu124, whereas the current stable install instructions result in 2.5.0 without the +cu124 suffix. Is this simply a documentation issue?
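Since `import torch` fails in this environment, the installed version and its local build tag (the `+cu124` part) can be read from package metadata instead. The following is a minimal sketch using only the standard library; the helper names `split_local_tag` and `installed_version` are hypothetical and not part of torch or pip:

```python
from importlib import metadata


def split_local_tag(version):
    """Split a version string such as '2.4.1+cu124' into ('2.4.1', 'cu124').

    Returns None for the tag when the version has no '+...' suffix.
    """
    public, _, local = version.partition("+")
    return public, (local or None)


def installed_version(dist_name="torch"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("torch")
    if version is None:
        print("torch is not installed")
    else:
        public, local = split_local_tag(version)
        print(f"torch {public}, local build tag: {local}")
```

This reads the metadata that pip recorded at install time, so it works even when the package itself cannot be imported.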
Installed wheel: torch-2.5.0-cp39-cp39-manylinux1_x86_64.whl from PyPI
The diagnostic script relies on the broken distribution of torch, so it fails as well:
[root@a8af28733c07 /]# wget https://raw.githubusercontent.com/pytorch/pytorch/main/torch/utils/collect_env.py
[root@a8af28733c07 /]# python3 collect_env.py
Traceback (most recent call last):
  File "/usr/local/lib64/python3.9/site-packages/torch/__init__.py", line 300, in _load_global_deps
    ctypes.CDLL(global_deps_lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/usr/lib64/python3.9/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libcudart.so.12: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "//collect_env.py", line 17, in <module>
    import torch
  File "/usr/local/lib64/python3.9/site-packages/torch/__init__.py", line 367, in <module>
    _load_global_deps()
  File "/usr/local/lib64/python3.9/site-packages/torch/__init__.py", line 325, in _load_global_deps
    _preload_cuda_deps(lib_folder, lib_name)
  File "/usr/local/lib64/python3.9/site-packages/torch/__init__.py", line 284, in _preload_cuda_deps
    raise ValueError(f"{lib_name} not found in the system path {sys.path}")
ValueError: libcufile.so.*[0-9] not found in the system path ['/', '/usr/lib64/python39.zip', '/usr/lib64/python3.9', '/usr/lib64/python3.9/lib-dynload', '/usr/local/lib64/python3.9/site-packages', '/usr/local/lib/python3.9/site-packages', '/usr/lib64/python3.9/site-packages', '/usr/lib/python3.9/site-packages']
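The ValueError above reports that no file matching `libcufile.so.*[0-9]` was found under any `sys.path` entry. A rough way to check this without importing torch is sketched below. It assumes that pip-installed NVIDIA wheels place their shared libraries under `<site-packages>/nvidia/<component>/lib/`; that layout and the helper name `find_nvidia_libs` are assumptions for illustration, and the sketch approximates rather than reproduces torch's own lookup:

```python
import glob
import os
import sys


def find_nvidia_libs(pattern, search_paths=None):
    """Return sorted paths matching <path>/nvidia/*/lib/<pattern>.

    search_paths defaults to sys.path, the list printed in the error message.
    """
    paths = sys.path if search_paths is None else search_paths
    matches = []
    for base in paths:
        # Assumed layout: <site-packages>/nvidia/<component>/lib/<library file>
        matches.extend(glob.glob(os.path.join(base, "nvidia", "*", "lib", pattern)))
    return sorted(matches)


if __name__ == "__main__":
    # Patterns for the two libraries named in the tracebacks above.
    for pattern in ("libcudart.so.*[0-9]", "libcufile.so.*[0-9]"):
        found = find_nvidia_libs(pattern)
        print(pattern, "->", found or "not found")
```

If both patterns come back as "not found", that would be consistent with the CUDA runtime wheels never having been installed alongside torch, though it does not by itself prove the cause.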
cc @ezyang @gchanan @zou3519 @kadeng @msaroufim @seemethere @malfet @osalpekar @atalman @ptrblck