[JIT] torch.jit.optimized_execution(True) greatly slows down some operations in PyTorch 1.8.0 (pytorch/pytorch#53824)

🐛 Bug

In PyTorch 1.8.0, the JIT recompiles some functions on every call whenever the input tensor's contents change, even when its shape stays the same.

To Reproduce

If I run the following code:

optimize = True
device = 'cuda:0'
num_runs = 5


import time

import torch


def func(mask: torch.Tensor):
    H, W = mask.size()

    tensor = torch.zeros([H, W], device=mask.device)
    # Boolean-mask indexing: the output length equals the number of True
    # entries, so it depends on the mask's contents, not just its shape.
    masked_view = tensor[mask]
    output = torch.stack([masked_view, masked_view + W + 1], dim=1)

    return output


jit_func = torch.jit.script(func)


def get_random_mask():
    # Same shape and dtype on every call; only the contents differ.
    mask = torch.randint(2, size=[1000, 1000], dtype=torch.bool, device=device)

    return mask


with torch.jit.optimized_execution(optimize):
    times = []
    for i in range(num_runs):
        mask = get_random_mask()

        # Synchronize before and after the call so the timer covers the full
        # GPU execution, not just the kernel launch.
        torch.cuda.synchronize(device)
        start = time.perf_counter()

        _ = jit_func(mask)

        torch.cuda.synchronize(device)
        elapsed_time = time.perf_counter() - start

        times.append(elapsed_time)

print(f'PyTorch version: {torch.__version__}')
print(f'Optimized execution: {optimize}')
print('Times:')
print('\n'.join([f'{x:.4f} sec.' for x in times]))

I got the following results:

PyTorch version: 1.8.0
Optimized execution: False
Times:
0.0007 sec.
0.0002 sec.
0.0002 sec.
0.0002 sec.
0.0002 sec.

PyTorch version: 1.8.0
Optimized execution: True
Times:
0.0402 sec.
0.1237 sec.
0.1194 sec.
0.1202 sec.
0.1204 sec.

PyTorch version: 1.7.1+cu110
Optimized execution: True
Times:
0.0024 sec.
0.1230 sec.
0.0003 sec.
0.0002 sec.
0.0002 sec.

PyTorch version: 1.7.1+cu110
Optimized execution: False
Times:
0.0007 sec.
0.0003 sec.
0.0002 sec.
0.0002 sec.
0.0002 sec.

Evidently, with optimized execution enabled, PyTorch 1.8.0 re-optimizes this function for every new random mask, even though the mask's shape and dtype never change: every call costs ~0.12 sec., whereas PyTorch 1.7.1 pays that cost only during warm-up.
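
A plausible trigger (my assumption; this issue only reports the timings) is that boolean-mask indexing produces an output whose size depends on the mask's contents, so the profiling executor sees a different value shape on every call. A minimal illustration:

import torch

a = torch.zeros(4, 4)

m1 = torch.ones(4, 4, dtype=torch.bool)   # all 16 entries selected
m2 = torch.zeros(4, 4, dtype=torch.bool)  # only 1 entry selected
m2[0, 0] = True

print(a[m1].shape)  # torch.Size([16])
print(a[m2].shape)  # torch.Size([1]) -- same mask shape, different output shape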

Expected behavior

The JIT should not recompile this function for each new mask while the input's shape and dtype are unchanged.
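
As a stopgap, the timings above suggest wrapping the call in torch.jit.optimized_execution(False), which avoids the per-call re-optimization cost. A minimal sketch (the global toggles in the comments are undocumented internals and are my assumption; they may change between releases):

# Per-call workaround, consistent with the measurements above.
with torch.jit.optimized_execution(False):
    output = jit_func(mask)

# Global alternative (undocumented internal API -- an assumption, may change):
# torch._C._jit_set_profiling_executor(False)
# torch._C._jit_set_profiling_mode(False)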

Environment

PyTorch version: 1.8.0
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: 6.0.0-1ubuntu2 (tags/RELEASE_600/final)
CMake version: version 3.10.2

Python version: 3.8 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: 10.1.243
GPU models and configuration: GPU 0: GeForce RTX 2080 SUPER
Nvidia driver version: 460.32.03
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.20.1
[pip3] pytorch-lightning==1.1.4
[pip3] torch==1.8.0
[pip3] torchvision==0.9.0
[conda] blas 1.0 mkl
[conda] cudatoolkit 11.1.1 h6406543_8 conda-forge
[conda] mkl 2020.2 256
[conda] mkl-service 2.3.0 py38he904b0f_0
[conda] mkl_fft 1.3.0 py38h54f3939_0
[conda] mkl_random 1.1.1 py38h0573a6f_0
[conda] numpy 1.20.1 pypi_0 pypi
[conda] pytorch 1.8.0 py3.8_cuda11.1_cudnn8.0.5_0 pytorch
[conda] pytorch-lightning 1.1.4 pypi_0 pypi
[conda] torchvision 0.9.0 py38_cu111 pytorch

cc @gmagogsfm

