We are excited to announce the release of PyTorch 1.10. This release is composed of over 3,400 commits since 1.9, made by 426 contributors. We want to sincerely thank our community for continuously improving PyTorch.
PyTorch 1.10 updates focus on improving training and performance of PyTorch as well as developer usability. Highlights include:

* torch.special and nn.Module parametrization have moved from beta to stable.

You can check out the blog post describing the new features here.
Backwards Incompatible changes

Python API

torch.any/torch.all behavior changed slightly to be more consistent for zero-dimension, uint8 tensors (#64642)
These two functions match the behavior of NumPy, returning an output dtype of bool for all supported dtypes, except for uint8 (in which case they return a 1 or a 0, but with uint8 dtype). In some cases with 0-dim tensor inputs, the returned uint8 value could mistakenly take on a value > 1. This has now been fixed.
1.9.1:
```python
>>> torch.all(torch.tensor(42, dtype=torch.uint8))
tensor(1, dtype=torch.uint8)
>>> torch.all(torch.tensor(42, dtype=torch.uint8), dim=0)
tensor(42, dtype=torch.uint8)  # wrong, old behavior
```

1.10.0:
```python
>>> torch.all(torch.tensor(42, dtype=torch.uint8))
tensor(1, dtype=torch.uint8)
>>> torch.all(torch.tensor(42, dtype=torch.uint8), dim=0)
tensor(1, dtype=torch.uint8)  # new, corrected and consistent behavior
```

Removed deprecated torch.{is,set}_deterministic (#62158)
This is the end of the deprecation cycle for both of these functions. You should use torch.use_deterministic_algorithms and torch.are_deterministic_algorithms_enabled instead.
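As a quick sketch of the replacement API (nothing here beyond the two functions named above):

```python
import torch

# Replaces the removed torch.set_deterministic / torch.is_deterministic pair.
torch.use_deterministic_algorithms(True)
print(torch.are_deterministic_algorithms_enabled())  # True
```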
tensor.conj() now returns a view tensor that aliases the same memory and has the conjugate bit set (#54987, #60522, #66082, #63602)

This means that .conj() is now an O(1) operation and returns a tensor that views the same memory as tensor and has the conjugate bit set. This notion of a conjugate bit enables fusion of operations with conjugation, which gives a significant performance benefit for operations like matrix multiplication. All out-of-place operations behave the same as before, but an in-place operation on a conjugated tensor will now additionally modify the input tensor.
1.9.1:
```python
>>> import torch
>>> x = torch.tensor([1+2j])
>>> y = x.conj()
>>> y.add_(2)
>>> print(x)
tensor([1.+2.j])
```

1.10.0:
```python
>>> import torch
>>> x = torch.tensor([1+2j])
>>> y = x.conj()
>>> y.add_(2)
>>> print(x)
tensor([3.+2.j])
```
Note: You can verify whether the conj bit is set by calling tensor.is_conj(). The conjugation can be resolved, i.e., you can obtain a new tensor that doesn't share storage with the input tensor, at any time by calling conjugated_tensor.clone() or conjugated_tensor.resolve_conj().

Note that these conjugated tensors behave differently from the corresponding NumPy arrays obtained from np.conj() when an in-place operation is performed on them (similar to the example shown above).
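For illustration, a minimal sketch of breaking the aliasing with resolve_conj() (the variable names are just for this example):

```python
import torch

x = torch.tensor([1 + 2j])
y = x.conj()
print(y.is_conj())       # True: y is a lazily conjugated view of x

z = y.resolve_conj()     # materializes the conjugation into fresh memory
z.add_(2)                # does not touch x
print(x)                 # tensor([1.+2.j])
```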
tensor.conj().neg() returns a view tensor that aliases the same memory as both tensor and tensor.conj() and has the negative bit set (#56058)

conjugated_tensor.neg() continues to be an O(1) operation, but the returned tensor shares memory with both tensor and conjugated_tensor.
1.9.1:
```python
>>> x = torch.tensor([1+2j])
>>> y = x.conj()
>>> z = y.imag
>>> z.add_(2)
>>> print(x)
tensor([1.+2.j])
```

1.10.0:
```python
>>> x = torch.tensor([1+2j])
>>> y = x.conj()
>>> z = y.imag
>>> print(z.is_neg())
True
>>> z.add_(2)
>>> print(x)
tensor([1.-0.j])
```
tensor.numpy() now throws a RuntimeError when called on a tensor with the conjugate or negative bit set (#61925)

Because the notion of a conjugate bit or negative bit doesn't exist outside of PyTorch, calling operations that return a Python object viewing the same memory as the input, like .numpy(), no longer works for tensors with the conjugate or negative bit set.
1.9.1:
```python
>>> x = torch.tensor([1+2j])
>>> y = x.conj().imag
>>> print(y.numpy())
[2.]
```

1.10.0:
```python
>>> x = torch.tensor([1+2j])
>>> y = x.conj().imag
>>> print(y.numpy())
RuntimeError: Can't call numpy() on Tensor that has negative bit set. Use tensor.resolve_neg().numpy() instead.
```

Autograd

Raise TypeError instead of RuntimeError when assigning to a Tensor's grad field with the wrong type (#64876)
Setting the .grad field with a non-None, non-Tensor object used to raise a RuntimeError; it now properly raises a TypeError. If your code was catching this error, you should simply update it to catch a TypeError instead of a RuntimeError.
1.9.1:
```python
try:
    # Assigning an int to a Tensor's grad field
    a.grad = 0
except RuntimeError as e:
    pass
```

1.10.0:
```python
try:
    a.grad = 0
except TypeError as e:
    pass
```

Raise an error when the inputs to autograd.grad are empty (#52016)
Calling autograd.grad with an empty list of inputs used to do the same thing as backward(). To reduce confusion, it now raises the expected error. If you were relying on this behavior, you can simply update your code as follows:
1.9.1:
```python
grad = autograd.grad(out, tuple())
assert grad == tuple()
```

1.10.0:
```python
out.backward()
```

Optional arguments to autograd.gradcheck and autograd.gradgradcheck are now kwarg-only (#65290)
These two functions now have a significant number of optional arguments controlling what they do (i.e., eps, atol, rtol, raise_exception, etc.). To improve readability, we made these arguments kwarg-only. If you are passing these arguments to autograd.gradcheck or autograd.gradgradcheck as positional arguments, you can update your code as follows:
1.9.1:
```python
torch.autograd.gradcheck(fn, x, 1e-6)
```

1.10.0:
```python
torch.autograd.gradcheck(fn, x, eps=1e-6)
```

In-place detach (detach_) now errors for views that return multiple outputs (#58285)
This change finishes the deprecation cycle for the inplace-over-view logic. In particular, a few behaviors that previously only warned have been updated:
* `detach_` will now raise an error when invoked on any view created by `split`, `split_with_sizes`, or `chunk`. You should use the non-inplace `detach` instead.
* The error message for when an in-place operation (that is not detach) is performed on a view created by `split`, `split_with_sizes`, or `chunk` has been changed from "This view is an output of a function..." to "This view is the output of a function...".
1.9.1:
```python
b = a.split(1)[0]
b.detach_()
```

1.10.0:
```python
b = a.split(1)[0]
c = b.detach()
```

Fix saved variable unpacking version counter (#60195)
In-place operations on unpacked SavedVariables used to be ignored. They are now properly detected, which can lead to errors saying that a variable needed for backward was modified in-place.

This is a valid error, and the user should fix it by cloning the unpacked saved variable before using it.

No internal formula will trigger this, but it might be triggered by a user's custom autograd.Function if the backward modifies a saved Tensor in-place and you perform multiple backwards. This used to silently return the wrong result and will now raise the expected error.
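For illustration, a minimal sketch of the recommended fix, cloning the unpacked saved tensor before mutating it (the Exp function below is a made-up example, not a PyTorch API):

```python
import torch

class Exp(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        y = x.exp()
        ctx.save_for_backward(y)
        return y

    @staticmethod
    def backward(ctx, grad_out):
        y, = ctx.saved_tensors
        # Mutating the unpacked saved tensor in place (e.g. y.mul_(grad_out))
        # bumps its version counter, so a later backward that unpacks it again
        # now raises the usual "modified by an inplace operation" error.
        # Cloning first keeps multiple backwards safe.
        y = y.clone()
        y.mul_(grad_out)
        return y

inp = torch.randn(3, requires_grad=True)
out = Exp.apply(inp).sum()
out.backward(retain_graph=True)  # first backward
out.backward()                   # second backward re-unpacks the saved tensor
```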
__torch_function__ handling checks (#63967)

This fixes the has_torch_function*() checks throughout torch.nn.functional to correctly pass in optional tensor arguments; prior to this fix, handle_torch_function() was not called for these optional tensor arguments. Previously, passing a tensor-like object into a function that accepts an optional tensor might not trigger that object's __torch_function__. Now, the object's __torch_function__ will be triggered as expected.
1.9.1:
```python
import torch
import torch.nn.functional as F

class TestTensor(object):
    def __init__(self, weight):
        self.weight = weight

    def __torch_function__(self, func, _, args=(), kwargs=None):
        print(func)
        print(func == F.group_norm)

# Call F.group_norm with a custom Tensor as the non-optional arg 'features'
features = TestTensor(torch.randn(3,3))
F.group_norm(features, 3)
# ...prints "group_norm" and True

# Call F.group_norm with a custom Tensor as the optional arg 'weight'
features = torch.randn(3,3)
weight = TestTensor(torch.randn(3))
F.group_norm(features, 3, weight=weight)
# ...prints "group_norm" and False because weight's __torch_function__ is
# called with func as torch.group_norm instead of F.group_norm
```

1.10.0:
```python
import torch
import torch.nn.functional as F

class TestTensor(object):
    def __init__(self, weight):
        self.weight = weight

    def __torch_function__(self, func, _, args=(), kwargs=None):
        print(func)
        print(func == F.group_norm)

# Call F.group_norm with a custom Tensor as the non-optional arg 'features'
features = TestTensor(torch.randn(3,3))
F.group_norm(features, 3)
# ...prints "group_norm" and True

# Call F.group_norm with a custom Tensor as the optional arg 'weight'
features = torch.randn(3,3)
weight = TestTensor(torch.randn(3))
F.group_norm(features, 3, weight=weight)
# ...prints "group_norm" and True
```

CUDA

Removed post-backward syncs on the default stream (#60421)
Previously, calls to backward() or grad() synced only the calling thread's default stream with the autograd leaf streams at the end of backward. This made the following weird pattern safe:
```python
with torch.cuda.stream(s):
    # imagine forward used many streams, so backward leaf nodes may run on many streams
    loss.backward()
# no sync
use grads
```
but a more benign-looking pattern was unsafe:
```python
with torch.cuda.stream(s):
    # imagine forward used a lot of streams, so backward leaf nodes may run on many streams
    loss.backward()
    # backward() syncs the default stream with all the leaf streams, but does not sync s with anything,
    # so counterintuitively (even though we're in the same stream context as backward()!)
    # it is NOT SAFE to use grads here, and there's no easy way to make it safe,
    # unless you manually sync on all the streams you used in forward,
    # or move "use grads" back to default stream outside the context.
    use grads
```
Note: this change makes it so that backward() has the same user-facing stream semantics as any CUDA op. In other words, the weird pattern above is now unsafe, and the benign-looking pattern is now safe. Implementation-wise, this means backward() now syncs its calling thread's current stream, rather than the default stream, with the leaf streams. This PR deletes the syncs on the default stream.
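A minimal sketch of the now-safe pattern in 1.10 (the tiny model and data below are placeholders, and a CUDA device is assumed):

```python
import torch

model = torch.nn.Linear(10, 1).cuda()
loss = model(torch.randn(8, 10, device="cuda")).sum()

s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())  # let s see the forward's work
with torch.cuda.stream(s):
    loss.backward()
    # backward() now syncs the current stream (s) with the autograd leaf
    # streams, like any other CUDA op, so consuming the grads here is safe.
    grad_norm = model.weight.grad.norm()
print(grad_norm.item())
```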
torch.package

PackageExporter no longer accepts the verbose argument:

1.9.1:
```python
with PackageExporter(buffer, verbose=False) as e:
    e.intern("**")
    e.save_pickle("res", "mod1.pkl", mod1)
    e.save_pickle("res", "mod2.pkl", mod2)
```

1.10.0:
```python
with PackageExporter(buffer) as e:
    e.intern("**")
    e.save_pickle("res", "mod1.pkl", mod1)
    e.save_pickle("res", "mod2.pkl", mod2)
```

Quantization

Added extra observer/fake_quant (the same observer/fake_quant instance as the input) for some operators in prepare_fx, e.g. maxpool, add_scalar and mul_scalar (#61687, #61859)
Previously, the way we insert observers/fake_quants was specific to the fbgemm/qnnpack backends. As we work on making FX Graph Mode Quantization extensible to custom backends, we are changing some behaviors for the fbgemm/qnnpack path as well. The above changes add an extra observer/fake_quant to the output of some operators to make sure we model the quantized operator more accurately in quantization aware training. The affected operators include maxpool, add_scalar, and mul_scalar, among others.
We will show an example with torch.nn.MaxPool2d:
```python
import torch
from torch.quantization.quantize_fx import prepare_fx

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.maxpool2d = torch.nn.MaxPool2d(kernel_size=3)

    def forward(self, x):
        x = self.maxpool2d(x)
        return x

m = M().eval()
m = prepare_fx(m, {"": torch.quantization.default_qconfig})
print(m.code)
```

1.9.1:
```python
def forward(self, x):
    x_activation_post_process_0 = self.x_activation_post_process_0(x); x = None
    maxpool2d = self.maxpool2d(x_activation_post_process_0); x_activation_post_process_0 = None
    return maxpool2d
```

1.10.0:
```python
def forward(self, x):
    x_activation_post_process_0 = self.x_activation_post_process_0(x); x = None
    maxpool2d = self.maxpool2d(x_activation_post_process_0); x_activation_post_process_0 = None
    maxpool2d_activation_post_process_0 = self.maxpool2d_activation_post_process_0(maxpool2d); maxpool2d = None
    return maxpool2d_activation_post_process_0
```
Note that self.maxpool2d_activation_post_process_0 and self.x_activation_post_process_0 will refer to the same observer/fake_quant instance; this simulates the numerics of the quantized maxpool implementation, where the output reuses the quantization parameters of the input. A simple illustration with a graph:

Before:
observer_0 - maxpool - ...

After:
observer_0 - maxpool - observer_0 (same observer instance as input observer) - ...
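As a hedged check (assuming the prepared module m and the attribute names shown in the printed code above), the two names should resolve to the very same observer object:

```python
# Shared instance between the input and output observation points.
assert m.x_activation_post_process_0 is m.maxpool2d_activation_post_process_0
```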
ONNX

Removed the aten arg from torch.onnx.export() (#62759)

The new OperatorExportTypes.ONNX removes the need for an explicit aten argument. If PyTorch was built with -DPYTORCH_ONNX_CAFFE2_BUNDLE, then a None value means OperatorExportTypes.ONNX_ATEN_FALLBACK.
1.9.1:
```python
torch.onnx.export(..., aten=True)
```

1.10.0:
```python
torch.onnx.export(..., operator_export_type=torch.onnx.OperatorExportTypes.ONNX_ATEN)
```

Deprecations

Python API

Deprecated __torch_function__ as a plain method (#64843)
The __torch_function__ function used to create Tensor-like objects did not have any constraint on whether it should be a method, class method, or static method.

To make it compatible with newer features on Tensor-like objects, we are deprecating setting it as a plain method. You can define it as a class method to get the current class, and scan the argument list if you need an object that is an instance of this class.
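A minimal sketch of the recommended class-method form (ScalarBox is a made-up tensor-like wrapper used only to illustrate the protocol):

```python
import torch

class ScalarBox:
    def __init__(self, value):
        self.value = value

    # Preferred: a classmethod, so the current class is available as cls.
    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        if kwargs is None:
            kwargs = {}
        # Unwrap any ScalarBox arguments before forwarding to the original op.
        unwrapped = [a.value if isinstance(a, cls) else a for a in args]
        return func(*unwrapped, **kwargs)

print(torch.add(ScalarBox(torch.tensor(1.0)), 2))  # tensor(3.)
```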
Mobile

Removed API torch.utils.bundled_inputs.run_on_bundled_input (#58344)

This API caused many issues and is not really necessary. The functionality (run the model with a bundled input) can be achieved by using get_all_bundled_inputs. For example:

1.9.1:
```python
model.run_on_bundled_input(0)
```

1.10.0:
```python
model(*model.get_all_bundled_inputs()[0])
```

Distributed

torch.distributed.rpc: Removed ProcessGroup RPC backend (#62411, #62985)
The ProcessGroup RPC backend has been deprecated, and 1.9 was the last release that carried it. The default RPC backend is TensorPipe, which is the recommended backend for RPC. Users who use torch.distributed.rpc.BackendType.PROCESS_GROUP will be given an error message telling them to switch to torch.distributed.rpc.BackendType.TENSORPIPE.
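A minimal sketch of selecting the TensorPipe backend explicitly (it is also the default); the worker name, rank, world size, and rendezvous settings below are placeholders for this example:

```python
import os
import torch.distributed.rpc as rpc

# Placeholder rendezvous settings for a single-process example.
os.environ.setdefault("MASTER_ADDR", "localhost")
os.environ.setdefault("MASTER_PORT", "29500")

rpc.init_rpc(
    "worker0",
    backend=rpc.BackendType.TENSORPIPE,  # replaces BackendType.PROCESS_GROUP
    rank=0,
    world_size=1,
)
rpc.shutdown()
```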
ONNX

The enable_onnx_checker argument has been removed from torch.onnx.export(); the ONNX checker now always runs by default, and users can catch exceptions to ignore raised failures. The strip_doc_string argument has been rolled into the verbose arg in torch.onnx.export(). The _retain_param_name argument has been removed from torch.onnx.export() and the behavior now defaults to True; there is no way to get the old behavior of _retain_param_name=False, so users should stop setting this arg.
1.9.1:
```python
torch.onnx.export(..., enable_onnx_checker=False, strip_doc_string=False)
```

1.10.0:
```python
try:
    torch.onnx.export(..., verbose=True)
except torch.onnx.utils.ONNXCheckerError:
    pass
```
Infra (Releng)

Disable ParallelTBB (#65092)

The ParallelTBB config/codepath is no longer actively tested by PyTorch CI and, as a result, is subject to code/functionality degradation.
Other changes

* torch.isin() (#53125), torch.bitwise_{left/right}_shift, __rlshift__, __rrshift__ (#59544), torch.Tensor.{__rand__, __ror__, __rxor__} (#59240), torch.aminmax (#62401), torch.new_ones (#58405)
* torch.cov (#58311), torch.frombuffer (#59077), torch.corrcoef (#60420), torch.nanmean (#62671), torch.cumulative_trapezoid (#61615)
* torch.optim:
* torch.cpu.amp.autocast: enable new API for CPU autocast (#57386, #63534)
* BFloat16 support for torch.{cross, tril, triu, tril_indices, triu_indices, cumsum, cummax, cummin, median, kthvalue, nansum, nextafter, range, sinh, cosh, frexp, nan_to_num, sigmoid, sigmoid_backward, tanh_backward, addcmul, addcdiv, bucketize, bernoulli, dropout, fold, unfold, MaxPool2D, AdaptiveAvgPool2D, topk} on CPU (#62454, #63307, #55210, #60074, #61083, #61829, #55221, #61826, #55588, #56372, #62880, #55202, #59547)
* BFloat16 support for torch.{ceil, floor, frac, round, trunc, sort, topk, aminmax, cumsum, logcumsumexp, cumprod, cummin, cummax} on CUDA (#57910, #58196, #59977, #62767, #57904)
* torch.cuda.is_bf16_supported (#63798)
* torch.segment_reduce (#59951, #60018, #61141, #61266, #59521, #60379, #60379)
* torch.isclose (#61271)
* torch.trapezoid (#61475)
* torch.gradient support for second order central differences (edge_order=2) (#58165)
* torch.sigmoid: CUDA support and complex autograd support (#48647)
* torch.bilinear and torch.nn.MaxUnpool2d (#56322, #49984)
* autograd.Function to implement your own forward-mode-AD-supported operator
* autograd.Function (#64061, #63434)
* torch.{acos, add, addbmm, addcdiv, addcmul, addmm, addmv, addr, angle, acosh, asinh, atanh, asin, atan, conj, baddbmm, bmm, cat, ceil, clamp, clamp_min, clamp_max, complex, copy_sign, cos, cosh, cross, cumprod, cumsum, cummax, cummin, deg2rad, div, dot, vdot, exp, exp2, expm1, expand, floor, frac, frexp, gather, hardswish, hstack, hypot, index_add_, index_copy_, index_put_, index_select, kthvalue, lerp, lgamma, digamma, polygamma, log, log10, log1p, log2, logaddexp, logaddexp2, xlogy, masked_fill_, masked_fill_, masked_scatter_, masked_select, max, maximum, fmax, mean, min, mininum, fmin, mm, mode, mul, lu, lu_solve, vstack} (#57768, #57863, #59711, #64742)
* torch.{mvlgamma, nan_to_num, permute, pow, reciprocal, remainder, repeat, round, rsqrt, sigmoid, logit, sign, sgn, sin, sinc, sinh, sqrt, squeeze, sub, sum, t, flip, roll, rot90, take, tan, tanh, trace, transpose, tril, triu, trunc, unfold, unsqueeze, view, zero_, hardshrink} (#59993)
* torch.special.{xlog1py, entr} (#59711, #59993)
* torch.linalg.{cholesky, cholesky_ex, eigh, inv, inv_ex, solve} (#62160, #64646, #62163, #62159)
* torch.functional.leak_relu (#59993)
* autograd.Function to use with the saved tensor hooks (#60551)
* is_inference() method (#58729)
* torch.lu_solve: Implement support for backward AD (#61681)
* nn.{ReflectionPad3d, LazyInstanceNorm*d} (#59791, #60837, #61308, #60982)
* nn.CrossEntropyLoss: Added support for class probability targets (#61044)
* nn.CrossEntropyLoss: Added support for label smoothing (#63122)
* nn.Module: Added support for arbitrary objects in state_dicts via get_extra_state() / set_extra_state() (#62976)
* nn.utils.skip_init(): Added function to skip module parameter / buffer initialization (#57555)
* (#364, #368, #383, #422)
* at::meta:: namespace (#58570)
* cpu_kernel, cpu_kernel_vec and cpu_kernel_multiple_outputs (#58949)
* at::native::resize_bytes_cpu to resize Storage in ATen (#60324)
* transpose to PackedTensorAccessor (#61114)
* torch::linalg::qr as the C++ API (#60529)
* amin and amax to aten symbols (#61550)
* c10::optional to compare with different but comparable types (#62890)
* c10::util::check_env to check environment variable (#59052)
* torch.distributed.rpc.is_available() (#58887)
* torch.jit.script (#62420)
* torch::deploy C++ API (#62669)
* torch::deploy (#63817)
* aten::{avgpool2d,softmax,to,div,flatten,detach,slice,log_softmax,conv2d_transpose} to NNAPI converter (#58538, #58539, #58540, #58541, #60885, #58543, #59364, #61378, #59529)
* aten::{conv2d,linear,cat,flatten} converter accept flexible batch (#61021, #61022, 76c0f223d3, #61024)
* aten::{hardswish,tanh,clamp} for iOS Metal (#64588, #61383)
* DistributedDataParallel
* torch.distributed
* __fx_create_arg__ dunder method for controlling how custom classes are handled as node args (#61780)
* autowrap_functions kwarg to Tracer (#62106)
* conv2d, BatchNorm2D, ReLU, maxpool2D, AdaptiveAvgPooling2D, flatten (#61093, #61012, #61150, #61188, #61239, #61265)
* get_attr operations in typechecker (#62682)
* remove_duplicate_output_args (#65134)
* torch.{linspace, new_ones, nn.LSTMCell, bernoulli, dot, nn.utils.spectral_norm, bernoulli, distributions.normal.Normal, roll} (#58854, #59255, #62757, #62765, #59536, #61560, #58697)
* torch.fft. operators on ARM-based platforms using pocket FFT (#60976, #62222, #63714)
* torch.einsum: added support for the "sublist" format (#56625)
* torch.linalg.det: added support for complex autograd (#58195)
* Tensor.to_sparse (#58413)
* max_pool2d, tanh, hardshrink, log_softmax, leaky_relu, softmax (#58806, #60695, #62870, #63193, #62239)
* torch.floor_divide deprecation warning (#64034)
* torch.nansum accuracy (#61082)
* torch.i0: now promote integer inputs to float (#52735)
* torch.kthvalue: added change to adjust output dim size for numpy compatibility (#59214)
* torch.scatter operation (#57015)
* torch.testing.assert_close (#58926)
* torch.isclose upcast to most precise dtype within their category before the comparison (#60536)
* alpha to acc_type for torch.add and torch.sub (#60227)
* torch.cat shape check and removed unnecessary offending index information (#64556)
* torch.gather (#65006)
* float64 in tensorboard instead of float32 (#59435)
* use_strict_trace to tensorboard add_graph method (#63120)
* torch.hub (#62139)
* output_size to tensor.repeat_interleave (#58881)
* torch.isclose (#63571)
* torch.{testing.assert_close,is_close} consistent with numpy (#63841)
* backward() is called with create_graph=True (#59412)
* Tensor::grad() on a non-leaf Tensor in the C++ API (#59362)
* grad_output creation for .backward() and autograd.grad() (#59532)
* NotImplementedError for forward and backward-mode AD formulas that are not implemented (#59482, #59483)
* torch.relu for common use cases (#63089)
* autograd.backward() function inputs argument (#60521)
* requires_grad=True is passed to a non-differentiable function (#60610)
* binary_cross_entropy differentiable w.r.t. target (#59447)
* nn.{AdaptiveAvgPool*d, AdaptiveMaxPool*d, AvgPool*d, CosineEmbeddingLoss, Dropout, FractionalMaxPool2d, Linear, LPPool1d, MaxPool*d, MaxUnpool*d, NLLLoss, PairwiseDistance, ReflectionPad*d, ReplicationPad*d, TripletMarginLoss, ZeroPad*d}, most other loss modules, and all activation modules (#61264, #61847, #61860, #64590, #61911, #62490, #60992, #62190, #62206, #61984, #61310, #62651, #64882, #62183, #61060, #61262, #62729, #61300, #61461, #62726)
* nn.{AdaptiveAvgPool*d, AdaptiveMaxPool*d, Bilinear, FractionalMaxPool*d, LocalResponseNorm, MaxPool*d, MaxUnpool*d, TransformerDecoder, TransformerDecoderLayer, TransformerEncoder, TransformerEncoderLayer} (#62025, #62088, #47106, #62083, #62801, #64082, #62800)
* nn.AvgPool2d: Added channels_last support on CPU (#58725)
* nn.BatchNorm: Use resize_output and empty instead of empty_like to improve flexibility in output memory format choice (#63084)
* nn.Bilinear: Added support for non-contiguous tensor inputs (#38409)
* nn.GELU: Added support for fp32/bfloat16 in CPU path using mkldnn implementation (#58525)
* nn.GroupNorm: Improved numerical stability by using the Welford algorithm and cascade summation (#54921)
* nn.LayerNorm: Improved numerical stability by using the Welford algorithm and pairwise sums (#59987)
* nn.NLLLoss: Added support for target of dtype byte (#60308, #60650)
* nn.SmoothL1Loss: Added support for integral target within the backward pass (#61112)
* nn.Transformer: Added configurable pre/post LayerNorm placement (#60593, #61692)
* nn.{RNN, LSTM, GRU} (#60269)
* nn.{LeakyReLU, RReLU} (#61514)
* channels_last memory format in nn.{AdaptiveMaxPool2d, GroupNorm} (#48920, #49821)
* nn.{MultiheadAttention, Transformer, TransformerDecoderLayer, TransformerEncoderLayer} (#61355, #62342)
* profiler.profile argument with_flops when set to True to report total FLOPs rather than FLOP/s, and support more operators (#62779, #61895)
* (#361, #404, #416, #421)
* (#351)
* Subset to dataset (#59513)
* ConcatDataset must be Sized (#64114)
* IterableDataset to accept keyword-only arguments and abc class (#58450)
* DataLoader to accept non-integer Sampler as input (#63500)
* torch.scatter_add for 1D tensors (#58761)
* --torch_jit_enable_rethrow_caught_exception=true (#63348)
* torch.nn.ModuleList to support arbitrary step size (#58361)
* Tuple[()] annotation (#58340)
* torch.nn.Parameter type for Profile-Directed-Typing (#59249)
* torch.einsum (#59265)
* torch.jit.isinstance with multiple types (#60465)
* checkScriptRaisesRegex (#63901)
* optimize_for_mobile to preserve nodes' debug information (#63106)
* torch::deploy (#58117)
* torch.utils.model_dump APIs:
* quantized::linear (#58282) and quantized::embedding_bag_byte_prepack (#64081)
* qconfig_dict argument handling (#59605, #58566)
* torch.index_select on quantized tensors (#61406)
* DistributedDataParallel
* NCCL_ASYNC_ERROR_HANDLING environment variable to control NCCL error handling (#59109)
* mul and copy_ instead of mul's out= variant when gradient tensor requires grad in DDP (#63831)
* Tensor.set_ instead of directly assigning data in model averaging (#63895)
* torch.distributed
* torch.distributed launcher (#59152)
* torch.distributed.optim.ZeroRedundancyOptimizer (#61370)
* torch.distributed.nn.RemoteModule
* torch.distributed.elastic
* torch.distributed.rpc
* threading.Locks (#57943), torch.cuda.Event (#61354)
* torch.distributed.Store
* torch.distributed.pipeline
* WithDevice wrapper to specify device execution for a module (#65190)
* torch.nn.Module constructor (#61334)
* torch.deploy for GraphModules with non-torch dependencies (#61680)
* torch.memory_format as a BaseArgumentType (#62593)
* __matmul__ to the magic methods for FX tracing (#64512)
* torch.{any, all, fmax, fmin, remainder, glu, argmax, argmin, avg_pool3d_backward, isposinf, isneginf, fmod, fmin, signbit, slow_conv_transpose2d, nll_loss_backward, cumprod, aminmax, addcmul, addcdiv, gather, hardshrink_backward, softshrink_backward, hardshrink, gelu, gelu_backward, avg_pool2d, avg_pool2d_backward, avg_pool3d, reflection_pad1d_backward, all, any, silu_backward, sgn, softplus, leaky_relu_backward, hardsigmoid_backward, elu_backward, eq, xlogy, ne, lt, gt, le, ge, sigmoid_backward, tanh_backward, logit_backward, bitwise_or, bitwise_xor, bitwise_and, nll_loss_forward, log_softmax, log_softmax_backward_data, prod, norm, sum.dim_IntList, clamp} (#64642, #58458, #58732, #61800, #60363, #60364, #59084, #60633, #60809, #60810, #57936, #55503, #62144, #61899, #62401, #62318, #62319, #63312, #58662, #58663, #58664, #58665, #58987, #59082, #59083, #59103, #60360, #60361, #58661, #58197, #58482, #58483, #58484, #58660, #60177, #60814, #60942, #60815, #60816, #60817, #60811, #60812, #60813, #61443, #57374, #62372, #62024, #62711, #61642, #61361)
* torch.utils.collect_env (#59632)
* CMAKE_PREFIX_PATH choice set by caller (#61904)
* torch.__version__ comparisons (#61556, #64565, #63848)
* bazel builds (#63604)
* torch.linalg.cholesky (#62434)
* THAllocator to MapAllocator in ATen (#60325)
* TensorOptions.device_index from int16_t to c10::DeviceIndex (#60412)
* torch.polygamma incorrect behavior at infinites when n>=1 (#61641)
* torch.{sort,topk} on CUDA (#63029), torch.tensor_split indices (#63390)
* torch.Tensor when given a scalar Tensor (#58885)
* Tensor.{grad,_base} by default for Tensor-like objects (#60464)
* torch.angle on aarch64 (#59832)
* torch.normal: fixed RuntimeError when standard deviation named arg is torch.empty (#66524)
* torch.Tensor.copy_ when using large inputs and broadcasting (#64425)
* torch.trapezoid (#64054)
* torch.median crash on empty tensor (#61698)
* torch.get_num_threads (#64486)
* torch.flatten (#61953)
* torch.hub.{list,help} functions for Windows (#63773)
* torch.{istft,rfft} errors for special inputs (#63469, #63327)
* x[index] = value no longer results in a RuntimeError if x and value are on different devices
* torch.Tensor.cauchy_ on CUDA for inf values (#60186)
* torch.{signbit,isin} no longer raise an error when passed a tensor that requires grad (#62529)
* torch.a{max,min} (#59669)
* binary_cross_entropy loss function when reduction=sum (#59479)
* nn.AdaptiveAvgPool2d: Correctly dispatch to CUDA implementation (#61851)
* nn.AdaptiveAvgPool3d: Fixed gradient computation (#60630)
* nn.BatchNorm: Fixed mixed precision usage when affine=False (#61962)
* nn.BatchNorm2d: Fixed issue when input is non-contiguous (#63392)
* batch_norm() to preserve output memory layout based on input (#62773)
* nn.MaxPool2d: Use channels_last memory format for output and indices when input is channels_last (#61245)
* nn.Module: Fixed full backward hook when grad is disabled (#65335)
* nn.Module: Fixed get_buffer() to check buffers by name instead of value (#61429)
* nn.Module: Fixed pre-forward hooks for Lazy modules (#60517)
* nn.Softmax: Improve numerical stability by subtracting max value in vectorized CPU implementation (#63132)
* F.cosine_similarity: Fixed type promotion behavior and added input validation checks (#62054, #66191, #62912, #58559)
* F.embedding: Added check to validate that weights are 2D (#59314)
* F.interpolate: Fixed output for edge case of single pixel without align_corners (#61166)
* F.nll_loss: Fixed regression for gradient computation (#64203)
* F.pad: Fixed type of default pad value to be floating point (#62095)
* torch._ops.ops.{atan, quantized} modules (#62447)
* torch.nn.utils.parametrizations.spectral_norm so that it can be used twice in the same forward pass (#62293)
* IterableFecher to stop fetching data after StopIterator (#59313)
* ExceptionWrapper to re-raise Exception with multiple args (#58131)
* torch.{i1,i1e} ROCm failure: mark array as const so that it is available for host and device (#59187)
* torch.manual_seed{_all} memory leak (#62534)
* torch.index_add deterministic implementation (#59254)
* map function for vec256 to accept const pointer to function (#59957)
* supports_as_strided method to Device and fixed indices of to_sparse() contiguous on all devices (#59370)
* __assert_fail when NDEBUG is defined (#58906)
* __constants__ attribute in model to a set to be consistent (#60003)
* torch.jit.trace (#60200)
* torch.autograd.Function has multiple tensor outputs (#57966)
* Tensor.to schema to reflect that the output may alias input (#60001)
* importlib.resources.path for python <3.8.8 (#58718)
* os and os.path (#60276)
* ScriptModule and then saving a Tensor owned by it (#61806)
* torch.package (#59735)
* torch.clamp shader function for x86_64 (#63062)
* cannot resize variables that require grad (#57068)
* DistributedDataParallel
* torch.distributed.Store
* torch.distributed.rpc
* torch.distributed.elastic
* torch.distributed.autograd
* torch.distributed
* copy.deepcopy to propagate output type (#61747)
* get_attr node (#62234)
* tracer_cls on fx.Graph when deep copying (#63353)
* keepdims (#60245)
* instance_norm2d export to handle track_running_stats=True (#58690)
* false (#60199)
* at::cpu::{op} and at::cuda::{op} get external linkage, so they can be used outside of libtorch (#58569)
* pybind11 from third_party folder by default (#58951)
* torch.utils.collect_env (#63321)
* SciPy dependency optional in PyTorch unary operators tests (#59304)
* setup.py re-run incremental build logic on Windows (#59689)
* torch.utils.cpp_extension behavior when older setuptools are used (#61484)
* torch.linalg.inv_ex could sometimes be on the wrong device (#59223)
* torch.linalg.norm could return tensors with the wrong shape in some edge cases (#60273)
* torch.linalg.svd could return tensors with the wrong shape in some edge cases (#62022)
* torch.matmul would throw an error when attempting to multiply certain empty tensors (#63359)
* torch.special.{'i0', 'i0e', 'i1', 'i1e'}: converted floating-point constants to input type in Bessel functions (#59416)
* torch.unique_consecutive() (#64835)
* torch.cuda.empty_cache() before capture to fix flaky tests (#59233)
* torch.flip: improved performance via TensorIterator (#59509)
* torch.gelu via TensorIterator (#58950)
* torch.sum: added change to accumulate 16-bit float sums in 32-bit accumulators for improved precision and performance (#60387)
* torch.{dot, vdot, mm, addmm, bmm, baddbmm} (#62915, #59380)
* torch.cum{sum,prod} backward formulas (#60642)
* reshape call if the tensor already has the right shape (#61466)
* nn.utils.clip_grad_norm_: Removed device syncs (#61042)
* nn.BatchNorm2d: Optimized performance for channels_last on CPU (#59286)
* nn.Softmax: Vectorized softmax calculation for the non-last-dimension case (#59195, #60371)
* nn.Transformer: Faster generate_square_subsequent_mask (#60631)
* unique call in embedding use cub instead of thrust (#63042)
* fastAtomicAdd in EmbeddingBag (mode "max") backward (#63298)
* F.avg_pool3d CUDA backward: use fast atomic adds (#63387)
* torch.distributed: replaced all_gather with more efficient collective api _all_gather_base (#57769)
* torch.distributed.optim.ZeroRedundancyOptimizer: Sorted params by size (decreasing) (#59586)
* std::regex for device string parsing (#63204)
* to_sparse_csr by writing custom CPU/GPU kernels (#61340, #61838)
* c10::multiply_integers for COO Tensors (#60872)
* std::vector (#60873)
* default_collate (#61424)
* gelu, bmm, mm, einsum, log1p (#59334, #59595, #63654, #64647, #64032, #64205)

You can also find the dev-specific and documentation-related changes in the forum post here.