
PyTorch 2.4 Release Notes

Highlights

We are excited to announce the release of PyTorch® 2.4!
PyTorch 2.4 adds support for the latest version of Python (3.12) for torch.compile.
AOTInductor freezing gives developers running AOTInductor more performance-based optimizations by allowing the
serialization of MKLDNN weights. In addition, a new default TCPStore server backend utilizing libuv has been introduced,
which should significantly reduce initialization times for users running large-scale jobs.
Finally, a new Python Custom Operator API makes it easier than before to integrate custom kernels
into PyTorch, especially for torch.compile.
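
As a quick illustration, here is a minimal sketch of the new torch.library.custom_op API; the operator name mylib::numpy_sin and the NumPy-backed kernel are illustrative, not taken from these release notes:

import numpy as np
import torch
from torch.library import custom_op

# Register a black-box custom operator backed by a NumPy kernel.
@custom_op("mylib::numpy_sin", mutates_args=())
def numpy_sin(x: torch.Tensor) -> torch.Tensor:
    return torch.from_numpy(np.sin(x.cpu().numpy())).to(x.device)

# A "fake" (meta) implementation lets torch.compile trace through the op.
@numpy_sin.register_fake
def _(x):
    return torch.empty_like(x)

@torch.compile
def f(x):
    return numpy_sin(x)

print(f(torch.randn(4)))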

This release is composed of 3661 commits and 475 contributors since PyTorch 2.3. We want to sincerely thank our
dedicated community for your contributions. As always, we encourage you to try these out and report any issues as we
improve 2.4. More information about how to get started with the PyTorch 2-series can be found at our
Getting Started page.

| Beta | Prototype | Performance Improvements |
| --- | --- | --- |
| Python 3.12 support for torch.compile | FSDP2: DTensor-based per-parameter-sharding FSDP | torch.compile optimizations for AWS Graviton (aarch64-linux) processors |
| AOTInductor Freezing for CPU | torch.distributed.pipelining, simplified pipeline parallelism | BF16 symbolic shape optimization in TorchInductor |
| New Higher-level Python Custom Operator API | Intel GPU is available through source build | Performance optimizations for GenAI projects utilizing CPU devices |
| Switching TCPStore’s default server backend to libuv | | |

*To see a full list of public feature submissions click here.

Tracked Regressions

Subproc exception with torch.compile and onnxruntime-training

There is a reported issue (#131070) when using torch.compile if the onnxruntime-training library is
installed. The issue will be fixed (#131194) in v2.4.1. It can be worked around locally by setting the environment variable
TORCHINDUCTOR_WORKER_START=fork before executing the script.
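
For example (assuming the training entry point is script.py):

TORCHINDUCTOR_WORKER_START=fork python script.py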

cu118 wheels will not work with pre-CUDA 12 drivers

It was also reported (#130684) that the new version of Triton uses CUDA features that are not compatible with pre-CUDA 12 drivers.
In this case, the workaround is to set
TRITON_PTXAS_PATH manually as follows (adapt the path to your local installation):

TRITON_PTXAS_PATH=/usr/local/lib/python3.10/site-packages/torch/bin/ptxas  python script.py
Backwards Incompatible Changes

Python frontend

Default ThreadPool size to number of physical cores (#125963)

Changed the default number of threads used for intra-op parallelism from the number of logical cores to the number of
physical cores. This should reduce core oversubscription when running CPU workloads and improve performance.
The previous behavior can be recovered by using torch.set_num_threads to set the number of threads to the desired value.
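
A minimal sketch of restoring the previous default, assuming the logical-core count reported by os.cpu_count() is the desired value:

import os
import torch

# os.cpu_count() typically reports logical cores; use it to restore the
# pre-2.4 intra-op thread count.
torch.set_num_threads(os.cpu_count())
print(torch.get_num_threads())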

Fix torch.quasirandom.SobolEngine.draw default dtype handling (#126781)

The default dtype value has been changed from torch.float32 to the current default dtype as given by
torch.get_default_dtype() to be consistent with other APIs.
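
For example, passing dtype explicitly keeps the previous float32 behavior regardless of the global default (a minimal sketch):

import torch

engine = torch.quasirandom.SobolEngine(dimension=2)
# draw() now defaults to torch.get_default_dtype(); pass dtype explicitly
# to pin the old float32 behavior.
samples = engine.draw(8, dtype=torch.float32)
print(samples.dtype)  # torch.float32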

Forbid subclassing torch._C._TensorBase directly (#125558)

This is an internal class that users could previously subclass to create objects that behave almost like a Tensor in Python, and it was
advertised as such in some tutorials. This is no longer allowed, to improve consistency; all users should
subclass torch.Tensor directly.
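
A minimal sketch of the supported pattern (the class name MyTensor is illustrative):

import torch

# Subclass torch.Tensor directly instead of torch._C._TensorBase.
class MyTensor(torch.Tensor):
    pass

t = torch.ones(2).as_subclass(MyTensor)
print(type(t))  # <class '__main__.MyTensor'>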

Composability

Non-compositional usages of as_strided + mutation under torch.compile will raise an error (#122502)

The torch.compile flow involves functionalizing any mutations inside the region being compiled. torch.as_strided is
an existing view op that can be used non-compositionally: that is, when you call x.as_strided(...), as_strided will only
consider the underlying storage size of x, and ignore its current size/stride/storage_offset when creating the new view.
This makes it difficult to safely functionalize mutations on views of as_strided that are created non-compositionally,
so we ban them rather than risking silent correctness issues under torch.compile.

An example of a non-compositional usage of as_strided followed by mutation that we will error on is shown below. You can avoid
this issue by rewriting your usage of as_strided so that it is compositional (for example: either use a different set
of view ops instead of as_strided, or call as_strided directly on the base tensor instead of on an existing view of it). A sketch of such a rewrite follows the example.

@torch.compile
def foo(a):
    e = a.diagonal()
    # as_strided is being called on an existing view (e),
    # making it non-compositional. mutations to f under torch.compile
    # are not allowed, as we cannot easily functionalize them safely
    f = e.as_strided((2,), (1,), 0)
    f.add_(1.0)
    return a
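
A sketch of a compositional rewrite of the example above, assuming a is a contiguous base tensor so the equivalent view can be created on it directly:

import torch

# Calling as_strided on the base tensor instead of on the view `e`
# yields the same view of the underlying storage and can be safely
# functionalized under torch.compile.
@torch.compile
def foo_fixed(a):
    f = a.as_strided((2,), (1,), 0)
    f.add_(1.0)
    return a
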
We now verify schemas of custom ops at registration time (#124520)

Previously, you could register a custom op through the operator registration APIs, but give it a schema that contained
types unknown to the PyTorch Dispatcher. This behavior came from TorchScript, where “unknown” types were implicitly
treated by the TorchScript interpreter as type variables. However, calling such a custom op through regular PyTorch
would result in an error later. As of 2.4, we will raise an error at registration time, when you first register the
custom operator. You can get the old behavior by constructing the schema with allow_typevars=true.

TORCH_LIBRARY(my_ns, m) {
  // this now raises an error at registration time: bar/baz are unknown types
  m.def("my_ns::foo(bar t) -> baz");
  // you can get back the old behavior with the below flag
  m.def(torch::schema("my_ns::foo(bar t) -> baz", /*allow_typevars*/ true));
}
Autograd frontend

Delete torch.autograd.function.traceable APIs (#122817)

The torch.autograd.function.traceable(...) API, which sets the is_traceable class attribute
on a torch.autograd.Function class, was deprecated in 2.3 and has now been deleted.
This API did not do anything and was only meant for internal purposes.
The following raised a warning in 2.3 and now errors because the API has been deleted:

@torch.autograd.function.traceable
class Func(torch.autograd.Function):
    ...
Release engineering

Optim

Distributed

DeviceMesh

Update get_group and add get_all_groups (#128097)

In 2.3 and before, users could do:

from torch.distributed.device_mesh import init_device_mesh

mesh_2d = init_device_mesh(
    "cuda", (2, 2), mesh_dim_names=("dp", "tp")
)
mesh_2d.get_group()  # This returns all sub-process groups within the mesh
assert mesh_2d.get_group()[0] == mesh_2d.get_group(0)
assert mesh_2d.get_group()[1] == mesh_2d.get_group(1)

But from 2.4 forward, if users call get_group without passing in a dim, they will get a RuntimeError.
Instead, they should use get_all_groups:

from torch.distributed.device_mesh import init_device_mesh

mesh_2d = init_device_mesh(
    "cuda", (2, 2), mesh_dim_names=("dp", "tp")
)
mesh_2d.get_group()  # This will throw a RuntimeError
assert mesh_2d.get_all_groups()[0] == mesh_2d.get_group(0)
assert mesh_2d.get_all_groups()[1] == mesh_2d.get_group(1)
Pipelining

Retire torch.distributed.pipeline (#127354)

In 2.3 and before, users could do:

import torch.distributed.pipeline  # warns that this module will be removed and that users should migrate to torch.distributed.pipelining

But from 2.4 forward, the import above raises a ModuleNotFoundError.
Instead, users should use torch.distributed.pipelining:

import torch.distributed.pipeline # -> ModuleNotFoundError
import torch.distributed.pipelining
jit

Fx

Complete revamp of float/promotion sympy handling (#126905)

ONNX

The remaining changelog sections (Deprecations, New Features, Improvements, Bug fixes, Performance, Documentation, Developers, and Security) break down changes per area, including Python frontend, Composability, Autograd frontend, Release Engineering, Optim, nn, linalg, cuda, Quantization, Distributed (c10d, DeviceMesh, DDP, DCP, DTensor, FSDP, TorchElastic, Tensor Parallel), Profiler, Dynamo, Export, Fx, Inductor, jit, ONNX, MPS, and XPU; their detailed entries are not reproduced here.
