A RetroSearch Logo

Home - News ( United States | United Kingdom | Italy | Germany ) - Football scores

Search Query:

Showing content from https://github.com/pytorch/pytorch/issues/144152 below:

`torch.device(0)` makes CUDA init fail in subprocess since `2.5.0` · Issue #144152 · pytorch/pytorch · GitHub

🐛 Describe the bug
from multiprocessing import Process

import torch

torch.device(0) # Note that torch.device('cuda') or torch.device('cuda:0') do not trigger the issue

def cuda_init():
    torch.Tensor([0]).cuda()

p = Process(target=cuda_init)
p.start()
p.join()
assert p.exitcode == 0

This code snippet succeeds on PyTorch 2.4.1 and fails on 2.5.0:

RuntimeError: CUDA error: initialization error
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Indeed, since 2.5.0, torch.device(0) calls at::getAccelerator, which ends up calling cudaGetDeviceCount and thus initializing CUDA and preventing forks

It seems to be directly linked with:

Versions

Collecting environment information...
PyTorch version: 2.5.1+cu124
Is debug build: False
CUDA used to build PyTorch: 12.4
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.5 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
Clang version: Could not collect
CMake version: version 3.26.3
Libc version: glibc-2.31

Python version: 3.10.9 (main, Feb 3 2023, 11:29:04) [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.15.0-1028-aws-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 11.0.221
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA A10G
Nvidia driver version: 535.183.01
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 48 bits physical, 48 bits virtual
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: AuthenticAMD
CPU family: 23
Model: 49
Model name: AMD EPYC 7R32
Stepping: 0
CPU MHz: 2820.130
BogoMIPS: 5599.99
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 128 KiB
L1i cache: 128 KiB
L2 cache: 2 MiB
L3 cache: 16 MiB
NUMA node0 CPU(s): 0-7
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Mitigation; untrained return thunk; SMT enabled with STIBP protection
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines, IBPB conditional, STIBP always-on, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch topoext ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr rdpru wbnoinvd arat npt nrip_save rdpid

Versions of relevant libraries:
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.26.4
[pip3] nvidia-cublas-cu11==11.10.3.66
[pip3] nvidia-cublas-cu12==12.4.5.8
[pip3] nvidia-cuda-cupti-cu11==11.7.101
[pip3] nvidia-cuda-cupti-cu12==12.4.127
[pip3] nvidia-cuda-nvrtc-cu11==11.7.99
[pip3] nvidia-cuda-nvrtc-cu12==12.4.127
[pip3] nvidia-cuda-runtime-cu11==11.7.99
[pip3] nvidia-cuda-runtime-cu12==12.4.127
[pip3] nvidia-cudnn-cu11==8.5.0.96
[pip3] nvidia-cudnn-cu12==9.1.0.70
[pip3] nvidia-cufft-cu11==10.9.0.58
[pip3] nvidia-cufft-cu12==11.2.1.3
[pip3] nvidia-curand-cu11==10.2.10.91
[pip3] nvidia-curand-cu12==10.3.5.147
[pip3] nvidia-cusolver-cu11==11.4.0.1
[pip3] nvidia-cusolver-cu12==11.6.1.9
[pip3] nvidia-cusparse-cu11==11.7.4.91
[pip3] nvidia-cusparse-cu12==12.3.1.170
[pip3] nvidia-nccl-cu11==2.14.3
[pip3] nvidia-nccl-cu12==2.21.5
[pip3] nvidia-nvjitlink-cu12==12.4.127
[pip3] nvidia-nvtx-cu11==11.7.91
[pip3] nvidia-nvtx-cu12==12.4.127
[pip3] torch==2.5.1
[pip3] torchaudio==2.5.1
[pip3] torchsde==0.2.6
[pip3] torchvision==0.20.1
[pip3] triton==3.1.0
[conda] Could not collect

cc @ezyang @gchanan @zou3519 @kadeng @msaroufim @ptrblck @eqy @albanD @guangyey @EikanWang


RetroSearch is an open source project built by @garambo | Open a GitHub Issue

Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo

HTML: 3.2 | Encoding: UTF-8 | Version: 0.7.4