We are excited to announce the release of PyTorch® 2.0 (release note) which we highlighted during the PyTorch Conference on 12/2/22! PyTorch 2.0 offers the same eager-mode development and user experience, while fundamentally changing and supercharging how PyTorch operates at the compiler level under the hood, with faster performance and support for Dynamic Shapes and Distributed.
This next-generation release includes a Stable version of Accelerated Transformers (formerly called Better Transformers); Beta includes torch.compile as the main API for PyTorch 2.0, the scaled_dot_product_attention function as part of torch.nn.functional, the MPS backend, and functorch APIs in the torch.func module; plus other Beta/Prototype improvements across various inference, performance, and training optimization features on GPUs and CPUs. For a comprehensive introduction and technical overview of torch.compile, please visit the 2.0 Get Started page.
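As a quick, illustrative sketch of the new API (see the Get Started page for details and real-world guidance), compiling a plain function looks like this:

```python
import torch

def fn(x):
    return torch.sin(x) + torch.cos(x)

# torch.compile returns an optimized callable; the original eager function is unchanged.
compiled_fn = torch.compile(fn)
print(compiled_fn(torch.randn(8)))
```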
Along with 2.0, we are also releasing a series of beta updates to the PyTorch domain libraries, including those that are in-tree, and separate libraries including TorchAudio, TorchVision, and TorchText. An update for TorchX is also being released as it moves to community supported mode. More details can be found in this library blog.
This release is composed of over 4,541 commits and 428 contributors since 1.13.1. We want to sincerely thank our dedicated community for your contributions. As always, we encourage you to try these out and report any issues as we improve 2.0 and the overall 2-series this year.
Summary:
*To see a full list of public 2.0, 1.13 and 1.12 feature submissions click here
Backwards Incompatible Changes

Drop support for Python versions <= 3.7 (#93155)
Previously the minimum supported version of Python for PyTorch was 3.7. This PR updates the minimum version to require 3.8 in order to install PyTorch. See Hardware / Software Support for more information.
Drop support for CUDA 10 (#89582)
This PR updates the minimum CUDA version to 11.0. See the getting-started page for installation or building from source for more information.
Gradients are now set to None instead of zeros by default in torch.optim.*.zero_grad() and torch.nn.Module.zero_grad() (#92731)
This changes the default behavior of zero_grad() to reset the grads by setting them to None instead of to zero tensors. In other words, the set_to_none kwarg is now True by default instead of False. Setting grads to None reduces peak memory usage and increases performance. This will break code that directly accesses data or does computation on the grads after calling zero_grad(), as they will now be None. To revert to the old behavior, pass in zero_grad(set_to_none=False).
1.13:
>>> import torch
>>> from torch import nn
>>> module = nn.Linear(2, 2)
>>> i = torch.randn(2, 2, requires_grad=True)
>>> module(i).sum().backward()
>>> module.zero_grad()
>>> module.weight.grad == None
False
>>> module.weight.grad.data
tensor([[0., 0.],
        [0., 0.]])
>>> module.weight.grad + 1.0
tensor([[1., 1.],
        [1., 1.]])
2.0:
>>> import torch
>>> from torch import nn
>>> module = nn.Linear(5, 5)
>>> i = torch.randn(2, 5, requires_grad=True)
>>> module(i).sum().backward()
>>> module.zero_grad()
>>> module.weight.grad == None
True
>>> module.weight.grad.data
AttributeError: 'NoneType' object has no attribute 'data'
>>> module.weight.grad + 1.0
TypeError: unsupported operand type(s) for +: 'NoneType' and 'float'
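For reference, a minimal sketch of the opt-out described above:

```python
import torch
from torch import nn

module = nn.Linear(2, 2)
module(torch.randn(2, 2)).sum().backward()

# Revert to the pre-2.0 behavior: grads are reset to zero tensors instead of None.
module.zero_grad(set_to_none=False)
print(module.weight.grad)  # a tensor of zeros rather than None
```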
Update torch.Tensor and nn.Parameter to serialize all their attributes (#88913)
Any attribute stored on a torch.Tensor or torch.nn.Parameter will now be serialized. This aligns the serialization behavior of torch.nn.Parameter, torch.Tensor, and other tensor subclasses.
1.13:
# torch.Tensor behavior
>>> a = torch.Tensor()
>>> a.foo = 'hey'
>>> buffer = io.BytesIO()
>>> torch.save(a, buffer)
>>> buffer.seek(0)
>>> b = torch.load(buffer)
>>> print(a.foo)
hey
>>> print(b.foo)
AttributeError: 'Tensor' object has no attribute 'foo'

# torch.nn.Parameter behavior
>>> a = nn.Parameter()
>>> a.foo = 'hey'
>>> buffer = io.BytesIO()
>>> torch.save(a, buffer)
>>> buffer.seek(0)
>>> b = torch.load(buffer)
>>> print(a.foo)
hey
>>> print(b.foo)
AttributeError: 'Parameter' object has no attribute 'foo'

# torch.Tensor subclass behavior
>>> class MyTensor(torch.Tensor):
...     pass
>>> a = MyTensor()
>>> a.foo = 'hey'
>>> print(a.foo)
hey
>>> buffer = io.BytesIO()
>>> torch.save(a, buffer)
>>> buffer.seek(0)
>>> b = torch.load(buffer)
>>> print(b.foo)
hey
2.0:
# torch.Tensor behavior
>>> a = torch.Tensor()
>>> a.foo = 'hey'
>>> buffer = io.BytesIO()
>>> torch.save(a, buffer)
>>> buffer.seek(0)
>>> b = torch.load(buffer)
>>> print(a.foo)
hey
>>> print(b.foo)
hey

# torch.nn.Parameter behavior
>>> a = nn.Parameter()
>>> a.foo = 'hey'
>>> buffer = io.BytesIO()
>>> torch.save(a, buffer)
>>> buffer.seek(0)
>>> b = torch.load(buffer)
>>> print(a.foo)
hey
>>> print(b.foo)
hey

# torch.Tensor subclass behavior
>>> class MyTensor(torch.Tensor):
...     pass
>>> a = MyTensor()
>>> a.foo = 'hey'
>>> print(a.foo)
hey
>>> buffer = io.BytesIO()
>>> torch.save(a, buffer)
>>> buffer.seek(0)
>>> b = torch.load(buffer)
>>> print(b.foo)
hey
If you have an attribute that you don't want to be serialized, you should not store it as an attribute on the Tensor or Parameter; instead, it is recommended to use a torch.utils.weak.WeakTensorKeyDictionary:
>>> from torch.utils import weak
>>> foo_dict = weak.WeakTensorKeyDictionary()
>>> foo_dict[a] = 'hey'
>>> print(foo_dict[a])
hey
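A slightly fuller sketch of that pattern (variable names here are illustrative): metadata kept in the side table stays off the tensor and is therefore not serialized with it.

```python
import io
import torch
from torch.utils.weak import WeakTensorKeyDictionary

side_table = WeakTensorKeyDictionary()
t = torch.ones(3)
side_table[t] = 'hey'          # metadata lives next to the tensor, not on it

buffer = io.BytesIO()
torch.save(t, buffer)          # only the tensor itself is serialized
buffer.seek(0)
reloaded = torch.load(buffer)

print(side_table[t])           # 'hey'
print(reloaded in side_table)  # False: the reloaded tensor is a different object
```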
Algorithms {Adadelta, Adagrad, Adam, Adamax, AdamW, ASGD, NAdam, RAdam, RMSProp, RProp, SGD} default to the faster foreach implementation when on CUDA + differentiable=False
When applicable, this changes the default behavior of step() and anything that calls into adadelta(...), adagrad(...), adam(...), adamax(...), adamw(...), asgd(...), nadam(...), radam(...), rmsprop(...), rprop(...), and sgd(...) directly to use the foreach implementation instead of the for-loop, for better performance. However, this change can potentially be backward incompatible since there may be small numerical differences between the results computed with the foreach implementation and the previous default. The foreach implementation will be the default only if the following conditions are met:
- the user has not specified kwargs that select an implementation (foreach, fused, or differentiable),
- the parameters are all on CUDA, and
- torch.jit.is_scripting is False.

When these conditions are satisfied, the implementation used will match the implementation used when one passes foreach=True.
The user defined flag for foreach will NOT be overwritten in order to preserve user selections. For more details, check the documentation. There should be no significant differences between the results returned by these optimizers. To revert to the old behavior, say, for adam, pass in adam(..., foreach=False, ...) or initialize Adam with Adam(..., foreach=False, ...).
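As an illustration, a minimal sketch of the opt-out (assuming a CUDA device is available, since the new default only applies there):

```python
import torch

model = torch.nn.Linear(4, 4).cuda()
# Keep the previous single-tensor (for-loop) implementation instead of the new foreach default.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, foreach=False)

loss = model(torch.randn(8, 4, device="cuda")).sum()
loss.backward()
optimizer.step()
```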
Pull Requests: #92306, #92716, #92723, #92724, #92726, #92727, #92728, #92715, #91896, #92730, #90865, #93184, #92181, #92923, #95415, #95818, #95811
torch.nn.utils.stateless.functional_call now respects tied weights (#90477)
Assume a module has two tied weights, x and x_tied. Previously, invoking functional_call(module, parameters_and_buffers, args, kwargs=None, *, strict=False) with a parameter dictionary of only one of the tied weights would result in the other one(s) not being updated. We’ve changed the behavior so that providing one of the tied weights in the parameter dictionary will update all other tied weights. If you would like the behavior in previous versions of PyTorch, please set tie_weights=False.
Please also see the related deprecation section "torch.nn.stateless.functional_call in favor of torch.func.functional_call".
1.13:
>>> class Foo(nn.Module):
...     def __init__(self):
...         super().__init__()
...         self.x = nn.Parameter(torch.zeros([]))
...         self.x_tied = self.x
...
...     def forward(self, inp):
...         return self.x + self.x_tied
>>> foo = Foo()
>>> params = {'x': torch.ones([])}
>>> result = functional_call(foo, params, torch.randn([]))
>>> print(result)
1.0
2.0:
>>> class Foo(nn.Module):
...     def __init__(self):
...         super().__init__()
...         self.x = nn.Parameter(torch.zeros([]))
...         self.x_tied = self.x
...
...     def forward(self, inp):
...         return self.x + self.x_tied
>>> foo = Foo()
>>> params = {'x': torch.ones([])}
>>> result = functional_call(foo,
...                          params,
...                          torch.randn([]),
...                          tie_weights=False)
>>> print(result)
1.0
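For contrast, a minimal sketch of the new default (tie_weights=True), where updating one tied weight updates them all:

```python
import torch
from torch import nn
from torch.func import functional_call

class Foo(nn.Module):
    def __init__(self):
        super().__init__()
        self.x = nn.Parameter(torch.zeros([]))
        self.x_tied = self.x

    def forward(self, inp):
        return self.x + self.x_tied

foo = Foo()
params = {'x': torch.ones([])}
# Default in 2.0: both self.x and the tied self.x_tied see the new value.
print(functional_call(foo, params, torch.randn([])))  # tensor(2.)
```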
Require return_complex to be passed explicitly to torch.stft for real input (#86724)
torch.stft takes an optional return_complex parameter that indicates whether the output should be a floating point tensor or a complex tensor. return_complex previously defaulted to False for real input tensors. This PR removes the default and makes return_complex a required argument for real inputs. However, complex inputs will continue to default to return_complex=True.
1.13:
>>> a = torch.rand(1024)
>>> _ = torch.stft(a, n_fft=128)
2.0:
>>> t = torch.rand(1024)
>>> _ = torch.stft(t, n_fft=128, return_complex=False)
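If complex output is acceptable, a simple sketch of the recommended path is to request it explicitly and convert back only when needed:

```python
import torch

t = torch.rand(1024)
# Ask for a complex result explicitly...
spec = torch.stft(t, n_fft=128, return_complex=True)
# ...and view it as a real (..., 2) tensor only if downstream code requires that layout.
spec_as_real = torch.view_as_real(spec)
```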
Require inputs to torch.istft to be complex valued
torch.istft no longer supports input in the form of real tensors with shape (..., 2) to mimic complex tensors. Instead, convert inputs to a complex tensor first before calling torch.istft.
1.13:
>>> t = torch.rand(65, 33, 2)
>>> _ = torch.istft(t, n_fft=128, length=1024)
2.0:
>>> t = torch.rand(65, 33, 2)
>>> _ = torch.istft(t, n_fft=128, length=1024)
RuntimeError: istft requires a complex-valued input tensor matching the output from stft with return_complex=True.
>>> t_complex = torch.view_as_complex(t)
>>> _ = torch.istft(t_complex, n_fft=128, length=1024)

Change default behavior of sparse tensor construction to not do component verification (#92094)
We now disable the costly component verification of torch.sparse_coo/csr/csc/bsr/bsc/compressed_tensor by default. The user can use the new check_invariants flag or torch.sparse.check_sparse_tensor_invariants to locally enable component verification. This allows users to constrain these costly checks to specific regions of their code and enables better overall performance. Previously users had no access to public constructors that disable these checks.
1.13:
>>> i = [[0, 1, 1], [2, 0, 5]]
>>> v = [3, 4, 5]
>>> s = torch.sparse_coo_tensor(i, v, (2, 3))
RuntimeError: size is inconsistent with indices: for dim 1, size is 3 but found index 5
2.0:
>>> i = [[0, 1, 1], [2, 0, 5]]
>>> v = [3, 4, 5]
>>> s = torch.sparse_coo_tensor(i,
...                             v,
...                             (2, 3),
...                             check_invariants=True)
RuntimeError: size is inconsistent with indices: for dim 1, size is 3 but found index 5
>>> with torch.sparse.check_sparse_tensor_invariants():
...     s = torch.sparse_coo_tensor(i, v, (2, 3))
...
RuntimeError: size is inconsistent with indices: for dim 1, size is 3 but found index 5
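A small sketch of the new default and the global toggle (this assumes the enable()/disable() helpers exposed on the same context-manager class):

```python
import torch

i = [[0, 1, 1], [2, 0, 5]]   # index 5 is out of bounds for a size-3 dimension
v = [3.0, 4.0, 5.0]

# New default: no verification, so this constructs without raising.
s = torch.sparse_coo_tensor(i, v, (2, 3))

# Opt back into verification globally, e.g. while debugging.
torch.sparse.check_sparse_tensor_invariants.enable()
# torch.sparse_coo_tensor(i, v, (2, 3))  # would now raise the size/indices error
torch.sparse.check_sparse_tensor_invariants.disable()
```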
Remove deprecated functionality from torch.testing
Historically, torch.testing exposed a lot of private and undocumented functionality publicly. The 2.0 release completes the deprecation cycle for the following items and removes them:
- rand and randn (#87970)
- get_all_device_types (#87971)
- make_non_contiguous (#87973)

Hooks registered to a Tensor now fire whenever gradients are computed w.r.t. that tensor, including via torch.autograd.grad() (#85849)
This is a bug fix. Per the docs, hooks registered to a Tensor should fire any time gradients are computed w.r.t. that tensor. This change corrects the behavior to be consistent with the documentation. See the documentation for more details about backward hook execution.
2.0:

    def hook(grad):
        print("hook fired")  # a simple hook so the example runs

    a = torch.tensor(1., requires_grad=True)
    b = a.clone()
    b.register_hook(hook)  # the hook registered here didn't fire before!
    torch.autograd.grad(b.clone(), inputs=(b,))
grad_fn post-hooks can always observe the modifications to the gradient made by any grad_fn pre-hooks or hooks registered to the Tensor, even if it is a leaf tensor (#85849)
This corrects the behavior of hooks to be consistent with the documentation in the case where the tensor is a leaf tensor, i.e. the node is a grad accumulator node. See the documentation for more details about backward hook execution.
2.0:

    def hook(grad):
        # updates grad
        return grad * 3

    def hook2(grad_input, grad_output):
        # Before this change, grad_output would NOT see the x3
        print(grad_output)

    a = torch.tensor(1., requires_grad=True)
    b = a.clone()
    acc_grad = b.grad_fn.next_functions[0][0]
    acc_grad.register_hook(hook2)
    b.register_hook(hook)
    torch.autograd.backward(b.clone(), inputs=(a,))  # hooks fire
Remove FSDP params_with_grad (#87480)
In FSDP, we used to have an API params_with_grad for users to get parameters which have gradients from the FSDP module. We decided not to expose this helper because it is not a common paradigm.
1.13:

    m = FullyShardedDataParallel(module)
    m.params_with_grad()
2.0:

    m = FullyShardedDataParallel(module)
    m.params_with_grad()  # Runtime error thrown
    # For a work-around, users can still do
    # [p for p in self.parameters() if p.grad is not None]

Users doing wildcard import of torch.distributed.fsdp.fully_sharded_data_parallel will no longer get non-public symbols (#87917)
Users could previously import both public and non-public symbols:
1.13:

    from torch.distributed.fsdp.fully_sharded_data_parallel import *
    ShardingStrategy.FULL_SHARD  # Non-public API
    FullyShardedDataParallel(module)  # public API
2.0:

    from torch.distributed.fsdp.fully_sharded_data_parallel import *
    ShardingStrategy.FULL_SHARD  # Non-public API, this will fail now
    FullyShardedDataParallel(module)  # public API
    ...
    # Users can instead do:
    from torch.distributed.fsdp.fully_sharded_data_parallel import (
        FullyShardedDataParallel,
        ShardingStrategy,
    )
    FullyShardedDataParallel(module, sharding_strategy=ShardingStrategy.FULL_SHARD)
Signature of FSDP auto_wrap_policy related APIs were changed in (#88450).
1.13:

    lambda_auto_wrap_policy(m, unwrapped_params=...)
    transformer_auto_wrap_policy(m, unwrapped_params=...)
    size_based_auto_wrap_policy(m, unwrapped_params=...)
2.0:

    lambda_auto_wrap_policy(m, nonwrapped_numel=...)
    transformer_auto_wrap_policy(m, nonwrapped_numel=...)
    size_based_auto_wrap_policy(m, nonwrapped_numel=...)
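Typical usage goes through functools.partial, so only the policy's own keyword changes; a minimal sketch (the FSDP wrapping call itself is left commented out because it requires an initialized process group):

```python
import functools
from torch.distributed.fsdp.wrap import size_based_auto_wrap_policy

# The policy callables now receive `nonwrapped_numel` instead of `unwrapped_params`;
# FSDP fills that argument in, so partial application of user options is unchanged.
policy = functools.partial(size_based_auto_wrap_policy, min_num_params=int(1e6))
# FullyShardedDataParallel(module, auto_wrap_policy=policy)
```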
Updated alltoall signature to be consistent with other c10d APIs (#90569)
The keyword argument names have been changed.
1.13:

    alltoall(output=..., input=...)
2.0:

    alltoall(output_tensors=..., input_tensors=...)

Remove unused functions in torch.ao.quantization.fx.utils (#90025)
This commit removes the following unused functions from both the torch.quantization and the
torch.ao.quantization namespaces:
graph_pretty_str
get_per_tensor_qparams
quantize_node
get_qconv_op
create_qparam_nodes
node_return_type_is_int
is_get_tensor_info_node
torch.ao.quantization.backend_config.BackendConfig now accepts inputs in the right order (#90698)
The existing BackendConfig fusion pattern uses a "reversed nested tuple" format that is unintuitive. This pattern format also complicates the signatures of the user specified "fuser methods", which needed to accept arguments in reverse nested order to match the patterns:
1.13:

    import torch.nn as nn
    import torch.ao.nn.intrinsic as nni
    from torch.ao.quantization.backend_config import BackendPatternConfig

    def fuse_conv_bn_relu(is_qat, relu, bn_conv):
        (bn, conv) = bn_conv
        return nni.ConvBnReLU2d(conv, bn, relu)

    config = (
        BackendPatternConfig((nn.ReLU, (nn.BatchNorm2d, nn.Conv2d)))
        .set_dtype_configs(...)
        .set_fuser_method(fuse_conv_bn_relu)
        .set_fused_module(nni.ConvBnReLU2d)
    )

    backend_config.configs  # returns Dict[Pattern, BackendPatternConfig]
2.0:

    def fuse_conv_bn_relu(is_qat, conv, bn, relu):
        return nni.ConvBnReLU2d(conv, bn, relu)

    config = (
        BackendPatternConfig((nn.Conv2d, nn.BatchNorm2d, nn.ReLU))
        .set_dtype_configs(...)
        .set_fuser_method(fuse_conv_bn_relu)
        .set_fused_module(nni.ConvBnReLU2d)
    )

    # Or, for backward compatibility:
    def fuse_conv_bn_relu(is_qat, relu, bn_conv):
        (bn, conv) = bn_conv
        return nni.ConvBnReLU2d(conv, bn, relu)

    config = (
        BackendPatternConfig()
        ._set_pattern_complex_format((nn.ReLU, (nn.BatchNorm2d, nn.Conv2d)))
        .set_dtype_configs(...)
        .set_fuser_method(fuse_conv_bn_relu)
        .set_fused_module(nni.ConvBnReLU2d)
    )

    backend_config.configs  # returns List[BackendPatternConfig]

Make the AO codebase compliant with the public vs private API guidelines of PyTorch (Public-API-definition-and-documentation)
If users were using any of the AO private APIs, these now have to be accessed with a preceding _ to conform with the guidelines. For example:
1.13: get_observer_dict()
2.0: _get_observer_dict()
Pull Requests: (#86029, #87515, #87516, #87517, #87518, #87519, #88392, #88394, #88396, #88397, #87521, #88395, #87883, #88399, #88398, #86022, #86023, #86024, #86025, #86026, #86027, #86028, #86030, #86031, #86032, #86033, #86034, #86037, #90315, #88391, #90554, #87520)
Remove overwrite_output_observer and represent the observer constraints for fixed qparams ops through the existing DTypeWithConstraints mechanism (#88620)
This commit removes the overwrite_output_observer and overwrite_output_fake_quantize overwrite observer settings from the BackendConfig. Instead, we represent the observer constraints for fixed qparams ops through the existing DTypeWithConstraints mechanism. Note, however, that to be consistent with other DTypeWithConstraints checks, we no longer throw an error if an incorrect observer is specified, but simply ignore the offending QConfig and log a warning instead. This is the BC-breaking part of the change.
1.13:

    from torch.ao.quantization.qconfig import default_qconfig
    from torch.ao.quantization.quantize_fx import prepare_fx

    model = ModelWithFixedQParamsOps()
    qconfig_mapping = QConfigMapping().set_global(default_qconfig)
    example_inputs = ...
    prepare_fx(model, qconfig_mapping, example_inputs)
Before this commit, running the above leads to an exception because the wrong observers are used for fixed qparams ops. After this commit, the above will only encounter a warning, and the fixed qparams ops will not be quantized. In both cases, switching to get_default_qconfig_mapping will cause the fixed qparams ops to be quantized.
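A minimal sketch of that recommended switch (ModelWithFixedQParamsOps is a stand-in defined here, since the snippet above elides it):

```python
import torch
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.quantize_fx import prepare_fx

class ModelWithFixedQParamsOps(torch.nn.Module):
    def forward(self, x):
        # sigmoid is a fixed-qparams op
        return torch.sigmoid(x)

model = ModelWithFixedQParamsOps().eval()
example_inputs = (torch.randn(1, 4),)
# The default mapping attaches the observers these ops require,
# so they are quantized instead of being skipped with a warning.
qconfig_mapping = get_default_qconfig_mapping()
prepared = prepare_fx(model, qconfig_mapping, example_inputs)
```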
Migrated torch.ao.quantization.quantization_patterns and torch.ao.quantization.fusion_patterns (#89872)
The following classes under the torch.ao.quantization.fx.quantization_patterns namespace are migrated to the torch.ao.quantization.fx.quantize_handler namespace:
QuantizeHandler
BinaryOpQuantizeHandler
CatQuantizeHandler
ConvReluQuantizeHandler
LinearReLUQuantizeHandler
BatchNormQuantizeHandler
EmbeddingQuantizeHandler
RNNDynamicQuantizeHandler
DefaultNodeQuantizeHandler
FixedQParamsOpQuantizeHandler
CopyNodeQuantizeHandler
GeneralTensorShapeOpQuantizeHandler
CustomModuleQuantizeHandler
StandaloneModuleQuantizeHandler
The following classes under the torch.ao.quantization.fx.fusion_patterns namespace are migrated to the torch.ao.quantization.fx.fuse_handler namespace:
DefaultFuseHandler
FuseHandler
Removed public APIs under the torch.ao.quantization.fx.backend_config_utils namespace (#89810)
The following APIs that were mistakenly public under the torch.ao.quantization.fx.backend_config_utils namespace are removed in this commit:
get_quantize_handler_cls
get_fusion_pattern_to_fuse_handler_cls
get_native_quant_patterns
get_pattern_to_quantize_handlers
1.13:

    from torch.ao.quantization.fx.backend_config_utils import (
        get_quantize_handler_cls,
        get_fusion_pattern_to_fuse_handler_cls,
        get_native_quant_patterns,
        get_pattern_to_quantize_handlers,
    )

    all_quant_patterns = get_native_quant_patterns()
2.0:

    from torch.ao.quantization.fx.quantization_patterns import (
        _get_quantize_handler_cls,
        _get_pattern_to_quantize_handlers,
    )
    from torch.ao.quantization.fx.fusion_patterns import (
        _get_fusion_pattern_to_fuse_handler_cls,
    )
    from torch.ao.quantization.backend_config import (
        get_native_backend_config,
    )

    all_quant_patterns = _get_pattern_to_quantize_handlers(
        get_native_backend_config()
    )

Update torch.{slice|select|diagonal|as_strided}_scatter ops to preserve input stride/storage_offset (#91029)
These operators are primarily used by the functionalization pass used in AOTAutograd. Previously, they would always return contiguous tensors. Now, they return a tensor with the same striding as their first argument.
1.13:
>>> x = torch.ones(2, 2, 2)
>>> base = x[:, :, 1]
>>> base.stride()
(4, 2)
>>> torch.diagonal_scatter(base, torch.ones(2)).stride()  # returns a contiguous tensor
(2, 1)

2.0:
>>> x = torch.ones(2, 2, 2)
>>> base = x[:, :, 1]
>>> base.stride()
(4, 2)
>>> torch.diagonal_scatter(base, torch.ones(2)).stride()  # returns a tensor with the same strides as base
(4, 2)

Remove ONNX deprecated monkey patches to torch.Graph (#94747)
The deprecated monkey patches to torch.Graph, torch.Block and torch.Node are removed
Monkey patches to the classes torch.Graph, torch.Block and torch.Node from torch.onnx have been removed. This means the methods torch.Graph.op(), torch.Graph.at(), torch.Block.op(), torch.Graph.constant(), and torch.Node.__getitem__ are no longer available.
Users creating custom symbolic functions for the torch.onnx exporter can continue to assume the g.op() interface for creating an operator in the graph, which is now exposed via the GraphContext class. Users should not rely on any methods of the GraphContext class other than those defined natively by torch.Graph and .op().
No code changes to existing symbolic functions are expected with this change.
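For orientation, a sketch of what such a symbolic function looks like; the op choice and opset version here are illustrative only:

```python
import torch

def relu_symbolic(g, self):
    # `g` is now a GraphContext, but the familiar g.op(...) interface is unchanged.
    return g.op("Relu", self)

torch.onnx.register_custom_op_symbolic("aten::relu", relu_symbolic, opset_version=14)
```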
Add full checker mode in torch.onnx.export (#83186)
This removes the boolean full_check parameter of the TORCH API check_onnx_proto and always performs the full check, emitting warning messages if it fails.
Also, the API previously did not check types in the graph even with full_check=True. With this change, a warning message is shown if the graph contains a type error.
torch::deploy has been migrated over to MultiPy. Ongoing development will continue in that repository.
lazy::View (#87822)
The view and aliasing infrastructure in lazy tensor core has been deprecated in favor of functionalization.
Renamed c10::fromIntArrayRef to c10::fromIntArrayRefSlow and changed call sites (#86235)
The function has been renamed to more accurately reflect its performance characteristics.
Deprecations

torch.func (a.k.a. functorch)

We’ve deprecated the functorch module in favor of the new torch.func module
We’re excited to announce that, as the final step of upstreaming and integrating functorch into PyTorch, the functorch APIs are now available in the torch.func module. Our function transform APIs are identical to before, but we have changed how the interaction with NN modules works.
We’ve deprecated the functorch.* function transforms (e.g. vmap, grad, jvp) in favor of their identical torch.func.* counterparts (#92279).
PyTorch has consolidated on torch.func.functional_call as the NN module functional API. Please migrate from functorch.{make_functional, make_functional_with_buffers} to it. For more details, see this guide.
Please migrate from functorch.combine_state_for_ensemble to torch.func.stack_module_state. For more details, see this guide.
We are no longer supporting functorch.compile (also known as AOTAutograd) as a frontend for compilation in PyTorch; we have integrated AOTAutograd into PyTorch’s compilation story. If you are a user, please use torch.compile() instead.
Typed storages have been removed from the C++ side and torch.UntypedStorage is used in their place. The use of torch.TypedStorage and all of its subclasses is now deprecated.
1.13:

    tensor.storage()
    torch.TypedStorage(...)

2.0:

    tensor.untyped_storage()
    torch.UntypedStorage(...)
If you need to access individual elements in a storage as a particular dtype, you can simply create a tensor to view it:
    torch.tensor(storage, dtype=...)
Deprecate tensor.mT, tensor.T, tensor.mH, tensor.H on 0D tensors (#92143)
1.13:
>>> a = torch.tensor(10)
>>> a.T
>>> a.H
2.0:
>>> a = torch.tensor(10)
>>> a.T
UserWarning: Tensor.T is deprecated on 0-D tensors. This function is the identity in these cases.
>>> a.H
UserWarning: Tensor.H is deprecated on 0-D tensors. Consider using x.conj().

Autograd API

Deprecate decorating classes with torch.no_grad (#89522)
Decorating classes with torch.no_grad is now deprecated. You should be decorating its functions or methods instead. To preserve the current behavior of class decoration, you can directly decorate the __init__ method and nothing else.
1.13:

    @torch.no_grad()
    class Blah():
        pass
2.0:

    class Blah():
        @torch.no_grad()
        def __init__(self):
            pass

Linalg

Remove the use of overload at::frobenius_norm(const Tensor&) (#81762)
Continuing the deprecation process from release 1.12, the tensor overload for this function has been removed. This function was not used in the bindings of PyTorch and should not impact users of torch.norm.
nn.functional.{tanh, sigmoid} functions are no longer deprecated (#86905)
Both of these ops are heavily used and so will not be removed. Their deprecation warnings have been removed.
Deprecated torch.nn.utils.stateless.functional_call in favor of torch.func.functional_call (#92280)
We’ve moved torch.nn.utils.stateless.functional_call under the torch.func module to reflect how it is useful for working with nn.Modules in a functional style. As of PyTorch 2.0, torch.func.functional_call is a drop-in replacement for torch.nn.utils.stateless.functional_call, and we will remove torch.nn.utils.stateless.functional_call in a future version of PyTorch. However, please note that we did change the default behavior of torch.nn.utils.stateless.functional_call in PyTorch 2.0 (see “torch.nn.utils.stateless.functional_call now respects tied weights” under the BC-breaking notes).
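Migration is typically just a change of import; a minimal sketch:

```python
import torch
from torch import nn
from torch.func import functional_call  # new home; drop-in for the deprecated utility

m = nn.Linear(3, 3)
params_and_buffers = dict(m.named_parameters())
out = functional_call(m, params_and_buffers, (torch.randn(2, 3),))
```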
Removed the Python 2 and 3 compatibility libraries six and future, and torch._six

2.0:

    # from torch._six import string_classes
    str
    # from torch._six import int_classes
    int
    # from torch._six import inf, nan
    from torch import inf, nan
    # torch._six.string_classes
    str

ONNX

Deprecated Caffe2 ONNX exporter support (#95071)
Users must use PyTorch 1.x versions to use the Caffe2 ONNX exporter. This capability will be completely removed from the PyTorch 2.x series.
New Features

torch.nn API

torch.nn.functional.scaled_dot_product_attention()
to allow writing fast Transformer-like functions and use it to speed up nn.Transformer()
( #91362, #91066, #90413, #87312, #94008, #89470, #90776, #92189)Module.register_{buffer,module,parameter}
functions (#86148, #87369)Module.full_backward_pre_hook
(#86700)Module.state_dict_pre_hook
(#90435)Module.call_super_init: bool
flag that can be used to ensure Module
initialization is properly calling parent’s __init__
(#91819)functorch
support for torch.autograd.Function: one is now able to apply function transformations (e.g. vmap, grad, jvp) over torch.autograd.Function. (#92023, #91452, #91222, #90037, #90077, #90966, #89860, #91211, #92030)Logcumsumexp
for complex dtypes for CUDA (build-time optimized) (#94310)set_to_none
flag for C++ optim endpoint (#92989)tensor.to()
for NestedTensor backend (#87146)gelu
and relu
operators (#94776)torch.neg
operator (#88131)sharded_state_dict
is added as well. (#87987, #88698, #89256, #89398, #89399, #89501, #89503, #89537, #89542, #89873, #89964, #90212, #91036, #91092, #91209, #91269, #92553, #92705, #92829, #92869, #92933, #94379, #94501)init_process_group
API which changes backend to an optional argument. For users, this feature will allow for code that runs on both GPU and CPU machines without having to change the backend specification. The dispatchability feature will also allow users to perform both GPU and CPU collectives using the same ProcessGroup, as PyTorch will automatically find an appropriate backend for the tensor type (as of PyTorch 2.0, the default is NCCL for CUDA and Gloo for CPU). Existing backend specifications by users will be honored and will not require change (#83679, #83735, #83810, #83859, #83876, #83916, #84423, #86166, #86368, #86407, #86408, #86409, #88351, #88846, #88889, #88903, #89317, #89318, #89505, #89813, #88330, #91257, #91172)torch.nn.functional.group_norm
(#91190), torch.var_mean
(#91190), torch.nansym
(#93845), torch.frac
(#86625), torch.signbit
(#87214), torch.exp1m
(#87147), torch.cumsum
(#88319), torch.trace
(#87910), torch.nn.Hardswish
(#87952),torch.inverse
(#90428), torch.floor_divide
(#91126), unfold
(#91266), bincount
(#91267), nonzero
(#91616), norm_dtype
andcdist
(#91643), unique
andunique_consecutive
(#88532), nan_to_num
(#91110), torch.linalg.cross
(#91642), randperm
(#91708), triangular_solve
(#94345), grid_sampler2d
(#94273), remainder
(#92139), addr
(#94538), fmod
(#94722), repeat_interleave
(#88649),sort
andargSort
(#94697),range
(#91075)torch.mps.{get_rng_state, set_rng_state, synchronize, manual_seed, seed}
(#94417)mps
device for torch.Generator
(#91348)torch.int64
support for unary ops (#86615)Executor
and Compiler
classes which compiles the XNNPACK graph and preps for execution (#88779, #88778, #88780, #89090)torch.sparse.check_sparse_tensor_invariants
context manager that allows users to opt into more checks at runtime for better debugging. (#92094)check_invariants
flag to torch.sparse_coo/csr/csc/bsr/bsc/compressed_tensor
to allow users to verify components at construction time. (#92094)reduce
flag for CPU to torch.sparse.mm with support for sum, mean, amax, amin
(#83727){Adadelta, Adagrad, Adamax, AdamW, ASGD, NAdam, RAdam, RProp}
differentiable (#86096, #86258, #86183)requires_backend_transfers
flag of a model is set to false
, then input tensors do not to be transferred to the GPU (via tensor.gpu()
) and output tensors do not to be transferred back to the CPU (via tensor.cpu()
) since these transfers are inserted into the modelMobileOptimizer.VULKAN_AUTOMATIC_GPU_TRANSFER
under torch.utils.mobile_optimizer
to the optimization_blocklist
argument of optimize_for_mobile
(#92081)hipGraph
support for pytorch mainline (#88202)any_chain()
in operator support (#90949)example_kwarg_inputs
argument (#81623, #94032)torch.squeeze
to allow squeezing multiple dimensions at once (#89017)where
to have cpu scalar args (#87022)torch.tensor.asarray
(#90914)Tensor.set_
when dtypes mismatch(#88804)torch.max
(#85926)torch.ormqr
(#86800)setup_context
(#89859, #92312)
forward
should no longer take ctx as an input.torch.autograd.set_multithreading_enabled
for disabling multithreading in the autograd engine (#86245)remove_duplicate
flag to Module.named_buffers()
method (#84984) and Module.named_parameters()
(#88090)Module
forward-pre and forward hooks (#89389)Transformer()
fast path (#90783) and kernel selection (#90783)torch.bf16
for Embedding
(#94163)freeze
argument to Embedding()
(#86769)torch.channels_last_3d
support for SyncBatchNorm()
(#88401)torch.bfloat16
support on CPU for functional.{mish,hardtanh,silu}
(#82460)LayerNorm()
(#81851, #88064), BatchNorm{1,2,3}d()
(#84410), GroupNorm()
(#89485, #81852, #88663, #92671, #92668)ModuleList()
(#90452)torch.uint8
support for functional.interpolate()
on CPU (#90771)functional.max_pool1d
error checking consistent between CPU and CUDA (#90211)SyncBatchNorm()
fallback to BatchNorm()
when it is used in a non-distributed setting (#89706)GroupNorm()
on XPU (#87680)is_causal
kwarg to TransformerEncoder()
layer (#90508)prepend
argument to Module
hooks to register a hook that will be called before the existing ones (#87370)BACKWARD_PRE
for the backward_prefetch of FSDP (#88428)NO_SHARD
in clip_grad_norm_
(#89137)BACKWARD_PRE
and BACKWARD_POST
in the post-backward assert (#89791)ModuleWrapPolicy.__repr__
(#89058)clip_grad_norm_
for low prec grads (#90028)ModuleWrapPolicy
for simplicity in FSDP autowrap (#88450)use_orig_params=True
, no_sync
and mixed precision to work together (#91193)summon_full_params(with_grads=True)
(#85738, #87314)keep_low_precision_grads
support when CPU offloading (#86495)state_dict
offload_to_cpu
settings (#86211)set_state_dict_type
API to setup state_dict_type
without using context manager (#86243)use_orig_param
for FSDP’s optim_state_dict
(#89898, #89899, #89900)optim_state_dict
and optim_state_dict_to_load
for FSDP (#90798, #91343, #92744, #92118, #92991, #92992, #93285, #93318, #94109, #94129)torchrun
and TorchElastic
to take optional local_addr
param to allow skip local IP lookup if specified (#88922)get_worker_info
(#87017)torch.nn.functional.conv_transpose3d
(#87967), torch.log1p
(#89214,#90422), torch.lerp
(#75584), torch.logcumsumexp
for CPU (#93153)prims.clone
(#86705), ndtr, ndtri, log_ndtr, erfcx
(#86077), NLL loss
(#81128), conv backward
(#87047), xlogy and xlog1py
(#77712), alpha_dropout
(#87989)_adaptive_avg_pool2d_backward
(#86359), (#87074), avg_pool2d and avg_pool2d_backward
(#87043), scalar_tensor and argmax
(#88590), topk
(#88694), max_pool2d_with_indices_backward
(#88743), grid_sampler_2d_backward
(#88745), linalg_cholesky
and linalg_cholesky_ex
(#89430), aten._cdist_forward
(#90042), aten.pixel_shuffle
(#91605)linear
(#86137, #86302), mm
, log1p
(#86301, #88155), to_sparse_*
(#90281)sparse_dim
, dense_dim
(#86203, #86203), torch.sum
(#86300, #92979), torch.sparse.sampled_addmm(#86401),
frac,
deg2rad,
rad2deg,
relu(#88153, #88156, #88442, #86749),
conj()(#91695),
to_sparse(#90718),
sparse_mask` (#92248, #94829)sparse_mask
(#91964)indices, values, (c)row_indices, (c)col_indices
(#93149) and addmm
(#94843)col2im
opset 18 (#84594), mse_loss
(#90717), aten::contains
(#91660), src/index dynamic axes support for aten::scatter_add
(#90090), aten::zero
(#91731), Raise Unsupported for GridSample
with volumetric 5D input (#92212)torch.onnx.export
API (#83186)JitScalarType
API (#87245)share_from_this
to torch::jit::Graph
(#87343)ONNX_ATEN_FALLBACK
mode (#85736)INT64_MAX
magic numbers (#88341)torch.fx
compatible with Python-3.11 (#92895)getitem
node before split_module
(#88510)torch.nn.Linear
(#89774), torch.nn.GELU
(#86218)torch.bitwise_not
(#87286), torch.nn.LayerNorm
(#94212), many backward functions (#94343), torch.nn.functional.hardswish
(#94342), torch.topk
(#91884), torch.arange
(#94485), torch.linal.inv
(#94551),nn.Conv2d
when inputs are on different devices (#86303)torch.nn.{Fold, UnFold}
(#94491)k
greater than 16 for torch.topk
(#94639)@pytorch//
in bazel build files which improves embedding usecases (#89660)USE_CUDA
for bazel build (#92640)Literal
, Protocol
, and Final
from standard library typing
as of Python 3.8+ (#94490)amin
/amax
(#93091)torch.tensor.scatter
(#88244)torch.tensor.index_select
over scalar tensor (#94347)torch.tensor.where
(#92849)torch.histc
consistent between CPU and CUDA (#87832)linalg.solve
(#91456), linalg.lstsq
(#91460)Module
s to work with stateless.functional_call()
(#91111), better error messages (#87442),EmbeddingBag
(#85433)Upsample
and EmbeddingBag
module printing (#93850)Conv3D
CPU implementation (#94325)Upsample
(#94290)functiona.pixel_{shuffle,unshuffle}
to consistently return views or not (#86608)Conv3d()
(#87527), Upsample()
(#87901)Conv{1,2,3}d()
(#86521), functional.adaptive_{avg, max}_pool()
(#88906)Upsample()
(#89252), MaxUnpool3d()
(#94372)functional.grid_sample()
loss of precision for torch.float16
inputs (#90427)functional.interpolate()
bicubic interpolation to properly preserve memory format (#90470)make_functional.py
(#91579)CUDA_VISIBLE_DEVICES
into account for nvml calls (#94568)autocast_gpu_dtype
in custom_fwd
and custom_bwd
for BFloat16 autocast (#88029)CUDA_VISIBLE_DEVICES
into account for nvml calls (#94568)send
, recv
return type (#92152)backend_type
for backend/PG plugin (#93129)isinstance
with torch.distributed.ReduceOp
(#87303, #88275)__eq__
for ReduceOp
(#90088)use_orig_params=True
for reentrant activation checkpointing by disabling the post-backward hooks (#87413)_lazy_init
in case module changing after FSDP constructor (#87837)NO_SHARD
by handling sharded and non-sharded parameters differently in FSDP.clip_grad_norm_
(#88955)ActivationWrapper
directly to the inner wrapped module to fix state_dict
issues (#87950)use_orig_params=True
in FSDP (#91767, #92662)keep_low_precision_grads=True
for use_orig_params=True
(#90027)use_orig_params=True
+ no_sync
(#90546)no_sync
, use_orig_params=True
, mixed precision, sharded (#92874)_mp_shard
in record_stream
(#91096)clip_grad_norm_
issues (#94835), (#86337)load_sharded_state_dict
FQN mismatches for shared parameters (#86524)None
edge case (#87308)state_dict
transformations of modules with persistent buffers failure with mixed precision enabled (#93396)nn.Parameter
usage for 2D and use_orig_params=True
(#89782, #89845, #90562)_foreach_norm
on some tensor sizes (#91844)_foreach_norm
from autograd_not_implemented_fallback check (#93995)conj
and neg_view
(#88182)group["capturable"]
, not defaults["capturable"]
in Adam(W) (#94149)FusedAdam(W)
should take OptState
into account before unscaling grads (#94060)torch.save
(#88867)cat
: fix striding (#89332)prelu
: Fix prelu ref when a.ndim < 2 (#89809)huber_loss_backward
fix (#86955)uniform
fix (#90094)unfold_copy
fix (#86371)aten.group_norm
type promotion fix (#86607)torch.as_strided_scatter_backward
memory initialization (#88342)aten.copy
preserve strides (#89464)torch.mm
: (#90763), (#90917), (#91094)mul
when given CUDA CSR Tensor and scalar (#91239)torch.triangular_solve
for CSR on CPU when unitriangular=True
. (#93352)symint::sizes()
instead of sizes()
on convolution error messages. (#89549)torch.linspace
result on CPU consistent with numpy (#89048)exponential_
few fixes (1) lambda > 0 (2) mkl kernel to continuous (3) better error log on dtype (#92891)cauchy_
few fixes (1) check gamma > 0 (2) better dtype error log (#93314)make_fx
invocations isolated (opaque to higher make_fx
invocations) by default (#93290)triu
/tril
operator export with diagonal input (#86843)aten::index_put(self, mask, v)
export when rank(mask) < rank(self)
(#92862)scatter_add
with different static shape of src and index (#89787)_pad_circular
export (#86984)ceil_mode
and count_include_pad
to align torch ceil_mode
results in corner case (#87892)unconvertible_ops
as per #89261 (#89299)Gather
replacement in RNN peephole
(#93120)cat
operator for tensors with unknown rank (#94870)onnx::Max
into standard Op for scalar type alignment (#88750)setType
from user into InferredType
and Reliable
in ConstantValueMap
(#88622)BUILD_CAFFE2=0
builds (#88504)torch.autograd.Function.symbolic
method support (#94746)FindCommonAncestor
in function_extraction
(#86650)ScriptedModule
(#86745)torch.median
(#90326, #88807), torch.{std,var}
correction
argument (#91203), torch.index_select
(#94117, #91064), torch.cumsum
(#94119), torch.where
(#86240), torch.nn.Embedding
(#82809), torch.nn.Softplus
(#88555), torch.nn.functional.pad
(#89864), torch.max
(#91520), padding functions (#91522), torch.nn.functional.upsample
(#91669), pooling functions (#91519, #94348), torch.nn.{NLLLoss,SmoothL1Loss}
(#94226), torch.nn.SoftPlus
(#94256), torch.masked_fill
(#94263), torch.fill_
(#94479), torch.median
(#94489), torch.nonzero
(#94442), torch.nn.BatchNorm
(#94351), torch.{min,max}
(#94386), torch.nn.GELU
(#94529), torch.nn.LSTM
(#94889), #95137),torch.nn.Conv2d
(#95078),torch.nn.functional.bilinear
(#94892),torch.copy\_
(#95272),torch.max_pool2d
(#94963),torch.div
(#95769)torch.bool
for Unary ops (#91120), scatter ops (#94464),torch.float16
for torch.nan_to_num
(#94220), torch.nn.HuberLoss
(#94567)torch.int64
inputs for torch.dot
(#94270), torch.floor_divide
(#94488), torch.square
(#94766),torch.int64
to torch.int32
for reduction ops and raise warning. (#94484)torch.nn.Conv3d
(#94492),torch.float
inputs by casting them to torch.float
(#88542)torch.cat
(#91786, #94662), torch.Conv2d
(#91822, #94384), torch.nn.{ELU,ReLU,Hardswish}
(#94664), torch.nn.BatchNorm
(#94760), torch.nn.MaxPool2d
(#94877)extern "C"
block (#87853)benchmark_limit
ignoring failed kernels in FIND (#91032)functional.interpolate()
speed for torch.channels_last
(#86361, #86361, #90302)functional.multi_head_attention_forward()
(#93234, #89847)TransformerEncoderLayer()
and MultiheadAttention()
(#87377, #88488, #88831, #88854, #88970, #91171)SyncBatchNorm()
performance by using the right gathering ops (#89521)ConvTransposed2D()
CPU performance for torch.{float32, bfloat16}
(#92530)functional.local_response_norm()
performance for 3d inputs (#91052)bitwise operators
(#91971), nansum
& nanmean
(#91372), all
& any
(#91966), torch.linalg.vander
(#91749), slogdet
(#86815), torch.index_fill
(#91364), narrow_copy
(#88130), view_copy
(#88150), greater_equal.Scaler
(#91324)atomicAdd
for bfloat16
in Ampere and above (#84981)torch.bmm
(#86856), (#85894)select_nested
(#89150)pad
in no-padding case(#88769)lerp
performance (#84844)mul
when given COO (#86269)to(dtype)
support for all sparse compressed formats (#89055)to_sparse
(#91389)sparse_mask
(#91964)to_dense
backward by removing redundant call to coalesce
(#92001){Adadelta, Adagrad, Adam, Adamax, AdamW, ASGD, NAdam, RAdam, RMSProp, RProp, SGD}
(#92048, #92362, #92363, #92349, #92364, #92365, #92369, #92372, #92338)biasadd
OMP perf issue for the packed MKL SGEMM (#92300)_vec_log_softmax_lastdim
(#85398)torch.add{cmul,cdiv,mm}
(#94214, #94534)torch.multinomial
(#86342), faster op launch time (#86437), torch.linear
(#91114), view handling (#91743, #94218), convolutions
(#94661), scatter/gather
(#94663)MaxPool2d
(#86559), utils.clip_grad_norm_()
(#91312), Module()
(#87142), {Unfold,Fold}()
(#88819), torch.nn.functional.gelu
(#89061), functional.conv2d
padding
(#85004), functional.leaky_relu()
(#94090), MaxUnpool{1,2,3}D
(#94629)torch.distributed.run
init connect timeout by comparing host
with the current IP list (#90221)to_sparse
(#89912)torch.sparse
overview documentation(#93258)RetroSearch is an open source project built by @garambo | Open a GitHub Issue