A traced model seems to produce overflowed values on the GPU when int8 tensor inputs are added/subtracted and the results are cast to int32.
import torch
import torch.nn as nn

class AddSubNet(nn.Module):
    def __init__(self, *args):
        # Output dtypes for the two results (e.g. (torch.int32, torch.int32)).
        self.torch_output0_dtype = args[0][0]
        self.torch_output1_dtype = args[0][1]
        super(AddSubNet, self).__init__()

    def forward(self, input0, input1):
        return (input0 + input1).to(self.torch_output0_dtype), \
               (input0 - input1).to(self.torch_output1_dtype)

# Trace and run on CPU.
device = 'cpu'
model = AddSubNet((torch.int32, torch.int32)).to(device)
x1 = torch.randint(-127, 128, (16,)).to(torch.int8).to(device)
x2 = torch.randint(-127, 128, (16,)).to(torch.int8).to(device)

model_cpu = torch.jit.trace(model, (x1, x2))
print('input')
print(x1)
print(x2)

out1_cpu, out2_cpu = model_cpu(x1, x2)
print('cpu output')
print(out1_cpu)
print(out2_cpu)

# Move the same model and inputs to CUDA and trace again.
device = 'cuda'
model.to(device)
x1 = x1.to(device)
x2 = x2.to(device)

model_gpu = torch.jit.trace(model, (x1, x2))
#print(model_gpu.graph)
out1, out2 = model_gpu(x1, x2)
print('cuda output')
print(out1)
print(out2)
Output:
input
tensor([ -14, 127, -24, 9, -115, -24, -102, -5, 5, 93, 45, -69, -74, 46, 109, -90], dtype=torch.int8)
tensor([ 32, -46, 13, 78, 109, -84, -104, 76, 29, -97, -90, 73, 17, -105, 34, 117], dtype=torch.int8)
cpu output
tensor([ 18, 81, -11, 87, -6, -108, 50, 71, 34, -4, -45, 4, -57, -59, -113, 27], dtype=torch.int32)
tensor([ -46, -83, -37, -69, 32, 60, 2, -81, -24, -66, -121, 114, -91, -105, 75, 49], dtype=torch.int32)
cuda output
tensor([ 18, 81, -11, 87, -6, -108, -206, 71, 34, -4, -45, 4, -57, -59, 143, 27], device='cuda:0', dtype=torch.int32)
tensor([ -46, 173, -37, -69, -224, 60, 2, -81, -24, 190, 135, -142, -91, 151, 75, -207], device='cuda:0', dtype=torch.int32)
The mismatched entries look like overflowing values, e.g. 50 on CPU vs. -206 on CUDA; every differing pair is exactly 256 apart, which is consistent with int8 wraparound.
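For reference, a minimal sketch of the suspected wraparound (CPU eager mode only, reusing the pair x1[6] = -102, x2[6] = -104 from the run above): int8 + int8 stays int8 in PyTorch, so the sum wraps before the cast to int32, and each mismatching CUDA entry is the corresponding un-wrapped result.

import torch

# int8 + int8 promotes to int8, so the sum wraps modulo 256 before the
# cast to int32: -102 + (-104) = -206, which wraps to 50 in int8.
a = torch.tensor([-102], dtype=torch.int8)
b = torch.tensor([-104], dtype=torch.int8)
print((a + b).to(torch.int32))  # tensor([50], dtype=torch.int32)

# The mismatching CUDA entries are the un-wrapped sums, i.e. off by 256:
print(-206 + 256)  # 50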
PyTorch version: 1.11.0.dev20211019+cu111
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.21.3
Libc version: glibc-2.31
Reproduced on different GPUs.
Additional information: enabling nvfuser via
torch._C._jit_set_nvfuser_enabled(True)
torch._C._jit_set_texpr_fuser_enabled(False)
torch._C._jit_set_profiling_executor(True)
torch._C._jit_set_profiling_mode(True)
torch._C._jit_override_can_fuse_on_cpu(False)
torch._C._jit_override_can_fuse_on_gpu(False)
torch._C._jit_set_bailout_depth(20)
yields matching values.
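For completeness, a quick sanity check (a sketch, not part of the original report) that reuses the variable names from the repro script above; it assumes the fuser settings are applied before the CUDA trace is created:

# Sketch only: assumes the settings above were applied before tracing on CUDA.
model_gpu = torch.jit.trace(model, (x1, x2))   # x1, x2 are already on 'cuda'
out1_gpu, out2_gpu = model_gpu(x1, x2)
print(torch.equal(out1_cpu, out1_gpu.cpu()))   # True when CPU and CUDA agree
print(torch.equal(out2_cpu, out2_gpu.cpu()))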
CC @malfet as we've discussed this issue. (Initially I thought it would be ARM-specific, but that turned out to be wrong.)