PyTorch 2.6.0 Release

Highlights

We are excited to announce the release of PyTorch® 2.6 (release notes)! This release features multiple improvements for PT2: torch.compile can now be used with Python 3.13; new performance-related knob torch.compiler.set_stance; several AOTInductor enhancements. Besides the PT2 improvements, another highlight is FP16 support on X86 CPUs.

NOTE: Starting with this release we are no longer publishing on Conda; please see [Announcement] Deprecating PyTorch’s official Anaconda channel for the details.

For this release the experimental Linux binaries shipped with CUDA 12.6.3 (as well as the Linux Aarch64, Linux ROCm 6.2.4, and Linux XPU binaries) are built with CXX11_ABI=1 and use the Manylinux 2.28 build platform. If you build custom C++ or CUDA extensions for PyTorch, please update those builds to use CXX11_ABI=1 as well and report any issues you are seeing. For the next PyTorch 2.7 release we plan to switch all Linux builds to Manylinux 2.28 and CXX11_ABI=1; please see [RFC] PyTorch next wheel build platform: manylinux-2.28 for the details and discussion.

Also in this release, as an important security improvement, we have changed the default value of the weights_only parameter of torch.load. This is a backward compatibility-breaking change; please see this forum post for more details.

This release is composed of 3892 commits from 520 contributors since PyTorch 2.5. We want to sincerely thank our dedicated community for your contributions. As always, we encourage you to try these out and report any issues as we improve PyTorch. More information about how to get started with the PyTorch 2-series can be found at our Getting Started page.

| Beta | Prototype |
| --- | --- |
| torch.compiler.set_stance | Improved PyTorch user experience on Intel GPUs |
| torch.library.triton_op | FlexAttention support on X86 CPU for LLMs |
| torch.compile support for Python 3.13 | Dim.AUTO |
| New packaging APIs for AOTInductor | CUTLASS and CK GEMM/CONV Backends for AOTInductor |
| AOTInductor: minifier | |
| AOTInductor: ABI-compatible mode code generation | |
| FP16 support for X86 CPUs | |

*To see a full list of public feature submissions click here.

BETA FEATURES

[Beta] torch.compiler.set_stance

This feature enables the user to specify different behaviors (“stances”) that torch.compile can take between different invocations of compiled functions. One of the stances, for example, is “eager_on_recompile”, which instructs PyTorch to run code eagerly when a recompile is necessary, reusing cached compiled code when possible.

For more information please refer to the set_stance documentation and the Dynamic Compilation Control with torch.compiler.set_stance tutorial.
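
As a minimal sketch (the function f below is illustrative), set_stance can be used as a context manager to avoid recompilation cost for calls that would otherwise trigger a recompile:

import torch

@torch.compile
def f(x):
    return x * 2

f(torch.randn(4))  # compiles on the first call

# Reuse cached compiled code where possible, but run eagerly rather than recompile
with torch.compiler.set_stance("eager_on_recompile"):
    f(torch.randn(4))     # hits the existing compiled code
    f(torch.randn(4, 4))  # would normally recompile; runs eagerly instead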

[Beta] torch.library.triton_op

torch.library.triton_op offers a standard way of creating custom operators that are backed by user-defined triton kernels.

When users turn user-defined triton kernels into custom operators, torch.library.triton_op allows torch.compile to peek into the implementation, enabling torch.compile to optimize the triton kernel inside it.

For more information please refer to the triton_op documentation and the Using User-Defined Triton Kernels with torch.compile tutorial.
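
A minimal sketch following the pattern in the triton_op documentation (the mylib::add operator name and add_kernel below are illustrative); wrap_triton lets torch.compile trace into the kernel call:

import torch
import triton
from triton import language as tl
from torch.library import triton_op, wrap_triton

@triton.jit
def add_kernel(in_ptr0, in_ptr1, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(in_ptr0 + offsets, mask=mask)
    y = tl.load(in_ptr1 + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

@triton_op("mylib::add", mutates_args={})
def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
    # Calling through wrap_triton (rather than add_kernel directly) lets
    # torch.compile see and optimize the kernel inside the custom op.
    wrap_triton(add_kernel)[grid](x, y, out, n, BLOCK_SIZE=16)
    return out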

[Beta] torch.compile support for Python 3.13

torch.compile previously only supported Python up to version 3.12. Users can now optimize models with torch.compile in Python 3.13.

[Beta] New packaging APIs for AOTInductor

A new package format, “PT2 archive”, has been introduced. This essentially contains a zipfile of all the files that need to be used by AOTInductor, and allows users to send everything needed to other environments. There is also functionality to package multiple models into one artifact, and to store additional metadata inside of the package.

For more details please see the updated torch.export AOTInductor Tutorial for Python runtime.
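
A minimal sketch based on that tutorial (the module M and the m.pt2 path are illustrative):

import torch
from torch._inductor import aoti_compile_and_package, aoti_load_package

class M(torch.nn.Module):
    def forward(self, x):
        return x + 1

ep = torch.export.export(M(), (torch.randn(8),))

# Compile with AOTInductor and bundle everything needed into one PT2 archive
pt2_path = aoti_compile_and_package(ep, package_path="m.pt2")

# Later, possibly in a different environment, load and run the packaged model
compiled = aoti_load_package(pt2_path)
print(compiled(torch.randn(8)))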

[Beta] AOTInductor: minifier

If a user encounters an error while using AOTInductor APIs, AOTInductor Minifier allows creation of a minimal nn.Module that reproduces the error.

For more information please see the AOTInductor Minifier documentation.
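
A hedged sketch, assuming the dump_aoti_minifier flag described in the minifier documentation (the module M is illustrative and would only produce a repro if compilation actually failed):

import torch
import torch._inductor.config as inductor_config
from torch._inductor import aoti_compile_and_package

# Ask AOTInductor to dump a minimal nn.Module repro if compilation errors out
inductor_config.aot_inductor.dump_aoti_minifier = True

class M(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x)

ep = torch.export.export(M(), (torch.randn(4),))
aoti_compile_and_package(ep)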

[Beta] AOTInductor: ABI-compatible mode code generation

AOTInductor-generated model code depends on PyTorch C++ libraries. As PyTorch evolves quickly, it is important that models previously compiled with AOTInductor continue to run on newer PyTorch versions, i.e. that AOTInductor is backward compatible.

In order to guarantee application binary interface (ABI) backward compatibility, we have carefully defined a set of stable C interfaces in libtorch and ensured that AOTInductor generates code that refers only to this specific set of APIs and nothing else in libtorch. We will keep this set of C APIs stable across PyTorch versions, thus providing backward compatibility guarantees for AOTInductor-compiled models.

[Beta] FP16 support for X86 CPUs (both eager and Inductor modes)

The Float16 datatype is commonly used for reduced memory usage and faster computation in AI inference and training. CPUs such as the recently launched Intel® Xeon® 6 with P-Cores support Float16 natively via the AMX accelerator. Float16 support on X86 CPUs was introduced in PyTorch 2.5 as a prototype feature, and it has now been further improved for both eager mode and torch.compile + Inductor mode, making it a Beta-level feature with both functionality and performance verified across a broad set of workloads.
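
A minimal sketch of exercising FP16 on CPU, assuming your build supports float16 autocast on CPU (the small model below is illustrative):

import torch

model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU())
x = torch.randn(8, 64)

# Run float16 inference on CPU; torch.compile routes this through Inductor
with torch.no_grad(), torch.autocast(device_type="cpu", dtype=torch.float16):
    y = torch.compile(model)(x)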

PROTOTYPE FEATURES

[Prototype] Improved PyTorch user experience on Intel GPUs

The PyTorch user experience on Intel GPUs is further improved with simplified installation steps, a Windows release binary distribution, and expanded coverage of supported GPU models, including the latest Intel® Arc™ B-Series discrete graphics. Application developers and researchers seeking to fine-tune, run inference, and develop with PyTorch models on Intel® Core™ Ultra AI PCs and Intel® Arc™ discrete graphics can now install PyTorch directly with binary releases for Windows, Linux, and Windows Subsystem for Linux 2.

For more information regarding Intel GPU support, please refer to Getting Started Guide.
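
Once installed, Intel GPUs are exposed through the xpu device type; a minimal sketch:

import torch

# Check for an available Intel GPU and run a small computation on it
if torch.xpu.is_available():
    x = torch.randn(1024, 1024, device="xpu")
    y = (x @ x).sum()
    print(y.item())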

[Prototype] FlexAttention support on X86 CPU for LLMs

FlexAttention was initially introduced in PyTorch 2.5 to provide optimized implementations for attention variants with a flexible API. In PyTorch 2.6, X86 CPU support for FlexAttention was added through the TorchInductor CPP backend. This new feature leverages and extends the current CPP template abilities to support a broad range of attention variants (e.g. PagedAttention, which is critical for LLM inference) on top of the existing FlexAttention API, and brings optimized performance on x86 CPUs. With this feature, it is easy to use the FlexAttention API to compose attention solutions on CPU platforms and achieve good performance.
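
A minimal sketch using the FlexAttention API on CPU (the shapes and causal mask are illustrative); torch.compile lowers the call through the TorchInductor CPP backend on x86:

import torch
from torch.nn.attention.flex_attention import flex_attention, create_block_mask

B, H, S, D = 1, 8, 1024, 64
q, k, v = (torch.randn(B, H, S, D) for _ in range(3))

def causal(b, h, q_idx, kv_idx):
    return q_idx >= kv_idx

# Build a block mask on CPU and run FlexAttention through torch.compile
block_mask = create_block_mask(causal, B, H, S, S, device="cpu")
out = torch.compile(flex_attention)(q, k, v, block_mask=block_mask)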

[Prototype] Dim.AUTO

Dim.AUTO allows usage of automatic dynamic shapes with torch.export. Users can export with Dim.AUTO and “discover” the dynamic behavior of their models, with min/max ranges, relations between dimensions, and static/dynamic behavior being automatically inferred.

This is a more user-friendly experience compared to the existing named-Dims approach for specifying dynamic shapes, which requires the user to fully understand the dynamic behavior of their models at export time. Dim.AUTO allows users to write generic code that isn’t model-dependent, increasing ease-of-use for exporting with dynamic shapes.

Please see torch.export tutorial for more information.
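
A minimal sketch (the module M is illustrative): marking dimensions as Dim.AUTO lets export infer which ones are dynamic and how they relate (here, x and y must share a shape):

import torch
from torch.export import Dim, export

class M(torch.nn.Module):
    def forward(self, x, y):
        return x + y

dynamic_shapes = {
    "x": {0: Dim.AUTO, 1: Dim.AUTO},
    "y": {0: Dim.AUTO, 1: Dim.AUTO},
}
ep = export(M(), (torch.randn(4, 8), torch.randn(4, 8)), dynamic_shapes=dynamic_shapes)
print(ep)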

[Prototype] CUTLASS and CK GEMM/CONV Backends for AOTInductor

The CUTLASS and CK backends add kernel choices for GEMM autotuning in Inductor. This is now also available in AOTInductor, which can run in C++ runtime environments. Major improvements to the two backends include faster compile times, achieved by eliminating redundant kernel binary compilations, and support for dynamic shapes.
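
A hedged sketch, assuming the Inductor max_autotune_gemm_backends config option: adding "CUTLASS" (or "CK" on ROCm) to the list opts those kernels into GEMM autotuning under mode="max-autotune":

import torch
import torch._inductor.config as inductor_config

# Include CUTLASS among the GEMM autotuning backends (use "CK" on ROCm)
inductor_config.max_autotune_gemm_backends = "ATEN,TRITON,CUTLASS"

def mm(a, b):
    return a @ b

compiled_mm = torch.compile(mm, mode="max-autotune")
out = compiled_mm(torch.randn(512, 512, device="cuda"),
                  torch.randn(512, 512, device="cuda"))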

Tracked Regressions

torch.device(0) makes CUDA init fail in subprocess

There is a known regression (#144152) where torch.device(0) has made CUDA initialization fail in a subprocess since PyTorch 2.5.0.
There was an attempt to fix the regression, but it caused some complications and was reverted.

An easy workaround is to use torch.device('cuda') or torch.device('cuda:0') instead.

Regression in the compilation of the torch.all operation with out= variant

A regression (#145220) was reported in PyTorch 2.6.0 with compilation of the out= variant of the torch.all operator. This should be a rare use case; a workaround is to rewrite the model code to avoid the out= variant.

Backwards Incompatible changes

Flip default torch.load to weights_only (#137602, #138225, #138866, #139221, #140304, #138936, #139541, #140738, #142153, #139433)

We are closing the loop on the deprecation that started in 2.4 and flipped torch.load to use weights_only=True by default.

When this flag is set, instead of using the usual pickle module, torch.load uses a custom unpickler constrained to call only functions and classes needed for loading state dictionaries and basic types.

While this change is disruptive for users serializing more than basic types, we expect the increased security by default to be a worthwhile tradeoff. Do note that, even though this default is safer, we still recommend loading only trusted checkpoints and relying on more constrained (and even safer) formats like safetensors for untrusted checkpoints.

For full details, please refer to this dev-discuss post.
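
A brief sketch of what changes in practice (the ckpt.pt path and MyConfig class are illustrative):

import torch

# In 2.6 this is equivalent to torch.load("ckpt.pt", weights_only=True):
# only tensors, state dicts, and basic Python types can be unpickled.
state_dict = torch.load("ckpt.pt")

# Loading arbitrary pickled objects now requires opting out explicitly;
# only do this for checkpoints you fully trust.
obj = torch.load("ckpt.pt", weights_only=False)

# A trusted checkpoint that needs extra classes can allowlist them instead,
# e.g. torch.serialization.add_safe_globals([MyConfig])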

Anaconda deprecation in CD. Remove anaconda dependency in Magma builds (#141024) (#141281) (#140157) (#139888) (#140141) (#139924) (#140158) (#142019) (#142276) (#142277) (#142282)

PyTorch will stop publishing Anaconda packages that depend on Anaconda’s default packages. We are directing users to use our official wheel packages from download.pytorch.org or PyPI, or to switch to conda-forge (pytorch) packages if they would like to continue using conda. For more details refer to this announcement.

Added Manylinux 2.28 prototype support and CXX11_ABI=1 for following binaries: Linux CUDA 12.6, Linux aarch64 CPU, Linux aarch64 GPU CUDA 12.6, ROCm 6.2.4, Linux XPU (#139894) (#139631) (#139636) (#140743) (#137696) (#141565) (#140681) (#141609) (#141704) (#141423) (#141609)

The PyTorch binaries shipped with CUDA 12.6.3 are built with CXX11_ABI=1 and use the Manylinux 2.28 build platform. If you are building custom C++ or CUDA extensions for PyTorch, please update those builds to use CXX11_ABI=1 as well and report any issues you are seeing. For the next PyTorch 2.7 release we plan to switch all Linux builds to Manylinux 2.28 and CXX11_ABI=1; please see [RFC] PyTorch next wheel build platform: manylinux-2.28 for the details and discussion.

ONNX

torch.onnx.export(..., dynamo=True) now creates ONNX models using IR version 10 (#141207)

ONNX ir_version=10 adds support for the UINT4 and INT4 data types and includes metadata in GraphProto and NodeProto. Make sure model consumers are able to accept IR version 10 ONNX models. You may read more about IR version 10 at https://github.com/onnx/onnx/releases/tag/v1.16.0.

Several obsolete APIs are removed (#133825, #136279, #137789, #137790)

Some logging APIs, torch.onnx.ExportTypes, and torch.onnx.export_to_pretty_string have been removed. Users should remove any usage of these APIs.

torch.onnx.ONNXProgram has been reimplemented and improved (#136281)

All ONNX "dynamo" APIs now return the new ONNXProgram class. Notable methods include save() and optimize(). The program can also be called directly on PyTorch tensors, leveraging ONNX Runtime to verify the ONNX graph. Some legacy methods are no longer available.
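
A brief sketch (the linear model is illustrative; calling the program requires ONNX Runtime to be installed):

import torch

model = torch.nn.Linear(4, 4)
x = torch.randn(2, 4)

onnx_program = torch.onnx.export(model, (x,), dynamo=True)
onnx_program.optimize()
onnx_program.save("linear.onnx")

# Calling the program runs the exported graph with ONNX Runtime,
# so its outputs can be compared against PyTorch's.
ort_out = onnx_program(x)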

Deprecations

Releng

Removed CUDA 12.1 support in CI/CD (#141271) (#142177)

The full release compatibility matrix can be found in release.md

Distributed

Deprecated c10d::onCompletionHook (#142390)

Inductor

Deprecate TORCHINDUCTOR_STACK_ALLOCATION (#139147)

Instead of setting TORCHINDUCTOR_STACK_ALLOCATION, update your torch.compile call: torch.compile(options={"aot_inductor.allow_stack_allocation": True})(foo).

New features

Python Frontend, Miscellaneous, Optim, Distributed, Dynamo, Releng, ROCM, XPU, Profiler, Export, Inductor

ONNX

torch.cond is the recommended way to introduce control flows that can be converted to an ONNX model.
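
A minimal sketch of expressing a data-dependent branch with torch.cond and exporting it (the CondModel below is illustrative):

import torch

class CondModel(torch.nn.Module):
    def forward(self, x):
        def double(x):
            return x * 2

        def negate(x):
            return -x

        # The branch taken depends on runtime data, but both paths survive export
        return torch.cond(x.sum() > 0, double, negate, (x,))

onnx_program = torch.onnx.export(CondModel(), (torch.randn(4),), dynamo=True)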

The new custom_translation_table option is useful when you need to override an ONNX implementation of an operator or provide one that is not currently implemented. Refer to the tutorials for a more complete description of the operator registration mechanism.

# Define the translation using ONNX Script
from onnxscript import opset18 as op

def sym_not_onnx(input):
    return op.Not(input)

torch.onnx.export(
    ...,
    dynamo=True,
    custom_translation_table={  # Then provide it here
        torch.sym_not: sym_not_onnx,
    },
)

Users can run optimize() to flatten nested structures in the ONNX graph, perform constant folding and remove redundancies in the ONNX model. Calling optimize() after exporting to ONNX is recommended.

onnx_program = torch.onnx.export(..., dynamo=True) 
onnx_program.optimize()  # Optimize the graph before saving is recommended 
onnx_program.save(...) 
Improvements

Python Frontend, NN Frontend, Optim, Composability

Decompositions, FakeTensor and meta tensors

Operator decompositions, FakeTensors and meta tensors are used to trace out a graph in torch.compile and torch.export. They received several improvements.

Dynamic shapes

We made many improvements and bugfixes to dynamic shapes in torch.compile.

Custom operators

We improved the existing torch.library APIs and added new ones.

Distributed, Profiler, Nested Tensor, Functorch, Quantization, Releng, Cuda, Mps, ROCM, XPU, Miscellaneous, Dynamo, Export, Fx, Inductor, ONNX

Bug fixes

Python Frontend, NN Frontend, Autograd Frontend, Composability, Distributed, Dynamo, Nested Tensor Frontend, Cuda, Mps, ROCM, XPU, Profiler, Quantization, Sparse Frontend, Miscellaneous, Export, Fx, Inductor, Jit, ONNX

Performance

Dynamo, Mps, ROCM, Sparse Frontend, Miscellaneous, Inductor

Documentation

Distributed, Inductor, Mps, NN Frontend, Optim, Python Frontend, Miscellaneous

Developers

Composability, Distributed, Export, Inductor, Optim, Quantization, Releng, XPU
