In PyTorch 1.8.0, the JIT recompiles some functions on every call when the input tensor's contents change, even though its shape stays the same.
To Reproduce
If I run the following code:
import time

import torch

optimize = True
device = 'cuda:0'
num_runs = 5


def func(mask: torch.Tensor):
    H, W = mask.size()
    tensor = torch.zeros([H, W], device=mask.device)
    masked_view = tensor[mask]
    output = torch.stack([masked_view, masked_view + W + 1], dim=1)
    return output


jit_func = torch.jit.script(func)


def get_random_mask():
    mask = torch.randint(2, size=[1000, 1000], dtype=torch.bool, device=device)
    return mask


with torch.jit.optimized_execution(optimize):
    times = []
    for i in range(num_runs):
        mask = get_random_mask()
        torch.cuda.synchronize(device)
        start = time.perf_counter()
        _ = jit_func(mask)
        torch.cuda.synchronize(device)
        elapsed_time = time.perf_counter() - start
        times.append(elapsed_time)

print(f'PyTorch version: {torch.__version__}')
print(f'Optimized execution: {optimize}')
print("Times:")
print("\n".join([f"{x:.4f} sec." for x in times]))
I got the following results:
PyTorch version: 1.8.0
Optimized execution: False
Times:
0.0007 sec.
0.0002 sec.
0.0002 sec.
0.0002 sec.
0.0002 sec.

PyTorch version: 1.8.0
Optimized execution: True
Times:
0.0402 sec.
0.1237 sec.
0.1194 sec.
0.1202 sec.
0.1204 sec.

PyTorch version: 1.7.1+cu110
Optimized execution: True
Times:
0.0024 sec.
0.1230 sec.
0.0003 sec.
0.0002 sec.
0.0002 sec.

PyTorch version: 1.7.1+cu110
Optimized execution: False
Times:
0.0007 sec.
0.0003 sec.
0.0002 sec.
0.0002 sec.
0.0002 sec.
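With optimized execution enabled, PyTorch 1.8.0 takes about 0.12 sec. on every call after the first, whereas 1.7.1 pays that cost only once (on the second call) and then drops to about 0.0002 sec. Evidently, PyTorch 1.8.0 recompiles this function for every new random mask, even though its shape is unchanged.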
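To put a number on the regression, here is a small sketch that compares the steady-state timings from the two optimized runs above. The helper name `steady_state` and the choice of dropping the first two runs as warm-up are my own assumptions, not anything from PyTorch; the timing values are copied from the results in this report.

```python
from statistics import median


def steady_state(times, warmup=2):
    """Median of the timings left after dropping `warmup` initial runs.

    `warmup=2` is an assumption based on the 1.7.1 results, where the
    one-time cost shows up on the second call.
    """
    if len(times) <= warmup:
        raise ValueError("need more runs than warm-up iterations")
    return median(times[warmup:])


# Timings in seconds, copied from the runs with optimized execution enabled.
times_1_8_0 = [0.0402, 0.1237, 0.1194, 0.1202, 0.1204]
times_1_7_1 = [0.0024, 0.1230, 0.0003, 0.0002, 0.0002]

slow = steady_state(times_1_8_0)
fast = steady_state(times_1_7_1)

print(f"1.8.0 steady state: {slow:.4f} sec.")
print(f"1.7.1 steady state: {fast:.4f} sec.")
print(f"Slowdown: {slow / fast:.0f}x")
```

On these five-run samples the gap is roughly two to three orders of magnitude, which looks much more like a per-call compilation cost than measurement noise.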
Expected behavior
JIT should not recompile this function for each new mask.
Environment
PyTorch version: 1.8.0
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: 6.0.0-1ubuntu2 (tags/RELEASE_600/final)
CMake version: version 3.10.2
Python version: 3.8 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: 10.1.243
GPU models and configuration: GPU 0: GeForce RTX 2080 SUPER
Nvidia driver version: 460.32.03
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.20.1
[pip3] pytorch-lightning==1.1.4
[pip3] torch==1.8.0
[pip3] torchvision==0.9.0
[conda] blas 1.0 mkl
[conda] cudatoolkit 11.1.1 h6406543_8 conda-forge
[conda] mkl 2020.2 256
[conda] mkl-service 2.3.0 py38he904b0f_0
[conda] mkl_fft 1.3.0 py38h54f3939_0
[conda] mkl_random 1.1.1 py38h0573a6f_0
[conda] numpy 1.20.1 pypi_0 pypi
[conda] pytorch 1.8.0 py3.8_cuda11.1_cudnn8.0.5_0 pytorch
[conda] pytorch-lightning 1.1.4 pypi_0 pypi
[conda] torchvision 0.9.0 py38_cu111 pytorch
cc @gmagogsfm