
PyTorch 2.7.0 Release Notes

Highlights

Beta:
- Torch.Compile support for Torch Function Modes
- Mega Cache

Prototype:
- NVIDIA Blackwell Architecture Support
- PyTorch Native Context Parallel
- Enhancing Intel GPU Acceleration
- FlexAttention LLM first token processing on X86 CPUs
- FlexAttention LLM throughput mode optimization on X86 CPUs
- Foreach Map
- Flex Attention for Inference
- Prologue Fusion Support in Inductor

For more details about these highlighted features, you can look at the release blog post.
Below are the full release notes for this release.

Tracked Regressions

NCCL init hits CUDA failure 'invalid argument' on 12.2 driver

Some users with the 12.2 CUDA driver (535 series) report seeing "CUDA driver error: invalid argument" during NCCL or Symmetric Memory initialization. This issue is currently under investigation; see #150852. If you build PyTorch from source, a known workaround is to rebuild PyTorch with the CUDA 12.2 toolkit. Otherwise, you can try upgrading the CUDA driver on your system.

Backwards Incompatible Changes

- Dropped support for Triton < 2.2.0.
- Removed support for CUDA 12.4 and Anaconda in CI/CD.

C++ Extensions: py_limited_api=True is now built with -DPy_LIMITED_API (#145764)

We formally began respecting the py_limited_api=True kwarg in 2.6 and stopped linking libtorch_python.so when the flag was specified, as libtorch_python.so does not guarantee using only APIs from the stable Python limited API. In 2.7, we go further by specifying the -DPy_LIMITED_API flag, which enforces that the extension is buildable with the limited API. As a result of this enforcement, custom extensions that set py_limited_api=True but do not abide by the limited API may fail to build. For an example, see #152243.

This is strictly better behavior, as it was sketchy to claim CPython agnosticism without enforcing it with the flag. If you run into this issue, please ensure that the extension you are building does not use any APIs outside of the Python limited API, e.g., pybind.
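
For reference, a minimal sketch of a limited-API extension build; the package name, source file, and cp39 tag are illustrative assumptions, not part of the release notes:

# setup.py: a minimal sketch of a limited-API C++ extension build.
# "my_ext" and "my_ext.cpp" are hypothetical placeholders.
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CppExtension

setup(
    name="my_ext",
    ext_modules=[
        CppExtension(
            "my_ext._C",
            ["my_ext.cpp"],
            py_limited_api=True,  # in 2.7 this also passes -DPy_LIMITED_API
        )
    ],
    cmdclass={"build_ext": BuildExtension},
    # tag the wheel as limited-API so a single wheel targets CPython >= 3.9
    options={"bdist_wheel": {"py_limited_api": "cp39"}},
)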

Change torch.Tensor.new_tensor() to be on the given Tensor's device by default (#144958)

This function previously always created the new Tensor on the "cpu" device; it now uses the same device as the Tensor it is called on. This behavior is now consistent with the other .new_* methods.
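
A minimal sketch of the behavior change, assuming a CUDA device is available:

Version 2.6.0

t = torch.ones(2, device="cuda")
t.new_tensor([1, 2, 3]).device  # device(type='cpu')

Version 2.7.0

t = torch.ones(2, device="cuda")
t.new_tensor([1, 2, 3]).device  # device(type='cuda', index=0)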

Use Manylinux 2.28 and CXX11_ABI=1 for future Linux wheel builds.

With the migration to manylinux_2_28 (AlmaLinux 8 based), we can no longer support OS distros whose glibc is older than 2.28. These include the popular Amazon Linux 2 and CentOS 7. (#143423, #146200, #148028, #148135, #148195, #148129)
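
As a quick compatibility check, a sketch using only the Python standard library; the 2.28 threshold comes from the manylinux_2_28 baseline:

# check whether the local glibc meets the manylinux_2_28 baseline
import platform

libc, version = platform.libc_ver()
if libc == "glibc":
    major, minor = (int(p) for p in version.split(".")[:2])
    ok = (major, minor) >= (2, 28)
    print(f"glibc {version}: {'compatible' if ok else 'too old for manylinux_2_28 wheels'}")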

torch.onnx.dynamo_export now uses the ExportedProgram logic path (#137296)

Users of the torch.onnx.dynamo_export API may see some ExportOptions become unsupported due to an internal switch to torch.onnx.export(..., dynamo=True): diagnostic_options, fake_context and onnx_registry are removed/ignored by ExportOptions; only dynamic_shapes is retained.

Users should move to the dynamo=True option of torch.onnx.export, as torch.onnx.dynamo_export is now deprecated. Use the dynamic_shapes argument of torch.onnx.export to specify dynamic shapes for the model.

Version 2.6.0

torch.onnx.dynamo_export(model, *args, **kwargs)

Version 2.7.0

torch.onnx.export(model, args, kwargs=kwargs, dynamo=True)
Finish deprecation of LRScheduler.print_lr() along with the verbose kwarg to the LRScheduler constructor (#147301)

Both APIs have been deprecated since 2.2. Please use LRScheduler.get_last_lr() to access the learning rate instead. print_lr and verbose were confusing, not properly documented, and little used, as described in #99270, so we deprecated them in 2.2. Now we complete the deprecation by removing them entirely. To access and print the learning rate of an LRScheduler:

Version 2.6.0

optim = ...
lrsched = torch.optim.lr_scheduler.ReduceLROnPlateau(optim, verbose=True)
# lrsched will internally call print_lr() and print the learning rate

Version 2.7.0

optim = ...
lrsched = torch.optim.lr_scheduler.ReduceLROnPlateau(optim)
print(lrsched.get_last_lr())
libtorch_python.so symbols are now invisible by default on all platforms except Apple (#142214)

Previously, the symbols in libtorch_python.so were exposed with default visibility. We have transitioned to being more intentional about what we expose as public symbols for our Python API in C++. After #142214, public symbols are marked explicitly while everything else is hidden. Some extensions using private symbols will see linker failures with this change.

Please use torch.export.export instead of capture_pre_autograd_graph to export the model for pytorch 2 export quantization (#139505)

capture_pre_autograd_graph was a temporary API in torch.export. Now that a better long-term API, export, is available, we can deprecate it.

Version 2.6.0

from torch._export import capture_pre_autograd_graph
from torch.ao.quantization.quantize_pt2e import prepare_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)
quantizer = XNNPACKQuantizer().set_global(
    get_symmetric_quantization_config()
)
m = capture_pre_autograd_graph(m, example_inputs)  # example inputs are passed as a tuple
m = prepare_pt2e(m, quantizer)

Version 2.7.0

from torch.export import export
from torch.ao.quantization.quantize_pt2e import prepare_pt2e
# please get xnnpack quantizer from executorch (https://github.com/pytorch/executorch/)
from executorch.backends.xnnpack.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)
quantizer = XNNPACKQuantizer().set_global(
    get_symmetric_quantization_config()
)
m = export(m, example_inputs).module()  # export takes the inputs as a tuple and returns an ExportedProgram; prepare_pt2e expects the underlying GraphModule
m = prepare_pt2e(m, quantizer)
New interface for torch.fx.passes.graph_transform_observer.GraphTransformObserver to enable node-level provenance tracking (#144277)

We now track a mapping between the nodes in the pre-grad and post-grad graph. See the issue for an example frontend to visualize the transformations. To update your GraphTransformObserver subclasses, instead of overriding on_node_creation and on_node_erase, use the new functions get_node_creation_hook, get_node_erase_hook, get_node_replace_hook and get_deepcopy_hook. These are registered on the GraphModule member of the GraphTransformObserver upon entry and exit of a with block.

Version 2.6.0

class MyPrintObserver(GraphTransformObserver):
    def on_node_creation(self, node: torch.fx.Node):
        print(node)

Version 2.7.0

class MyPrintObserver(GraphTransformObserver):
    def get_node_creation_hook(self):
        def hook(node: torch.fx.Node):
            print(node)
        return hook
torch.ao.quantization.pt2e.graph_utils.get_control_flow_submodules is no longer public (#141612)

We are planning to make all functions under torch.ao.quantization.pt2e.graph_utils private. This update marks get_control_flow_submodules as a private API. If you have to or want to continue using get_control_flow_submodules, please call the private _get_control_flow_submodules instead.

Example:
Version 2.6.0

>>> from torch.ao.quantization.pt2e.graph_utils import get_control_flow_submodules

Version 2.7.0

>>> from torch.ao.quantization.pt2e.graph_utils import get_control_flow_submodules
ImportError: cannot import name 'get_control_flow_submodules' from 'torch.ao.quantization.pt2e.graph_utils'
>>> from torch.ao.quantization.pt2e.graph_utils import _get_control_flow_submodules  # Note: Use _get_control_flow_submodules for private access
Deprecations

torch.onnx.dynamo_export is deprecated (#146425, #146639, #146923)

Users should use the dynamo=True option on torch.onnx.export.

Version 2.6.0

torch.onnx.dynamo_export(model, *args, **kwargs)

Version 2.7.0

torch.onnx.export(model, args, kwargs=kwargs, dynamo=True)
XNNPACKQuantizer is deprecated in PyTorch and moved to ExecuTorch; please use it from executorch.backends.xnnpack.quantizer.xnnpack_quantizer instead of torch.ao.quantization.quantizer.xnnpack_quantizer (#144940)

XNNPACKQuantizer is a quantizer for XNNPACK that was added to pytorch/pytorch for initial development. However, as it is not related to our core quantization workflow, we have moved it to ExecuTorch. Please use it from executorch.backends.xnnpack.quantizer.xnnpack_quantizer instead of torch.ao.quantization.quantizer.xnnpack_quantizer.

Version 2.6.0

from torch._export import capture_pre_autograd_graph
from torch.ao.quantization.quantize_pt2e import prepare_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)
quantizer = XNNPACKQuantizer().set_global(
    get_symmetric_quantization_config()
)
m = capture_pre_autograd_graph(m, example_inputs)  # example inputs are passed as a tuple
m = prepare_pt2e(m, quantizer)

Version 2.7.0

# we also updated the export call
from torch.export import export
from torch.ao.quantization.quantize_pt2e import prepare_pt2e
# please get xnnpack quantizer from executorch (https://github.com/pytorch/executorch/)
from executorch.backends.xnnpack.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)
quantizer = XNNPACKQuantizer().set_global(
    get_symmetric_quantization_config()
)
m = export(m, example_inputs).module()  # export takes the inputs as a tuple and returns an ExportedProgram; prepare_pt2e expects the underlying GraphModule
m = prepare_pt2e(m, quantizer)
New features

New features landed across: Release Engineering, Python Frontend, C++ Extensions, Distributed (Context Parallel, c10d, Distributed Checkpoint (DCP)), CUDA, MPS, ROCm, XPU, torch.compile (Dynamo, Inductor), Profiler, and Quantization. The Quantization entry includes the following torchao example:
import torch

from torchao.dtypes import PlainLayout
from torchao.experimental.packed_linear_int8_dynamic_activation_intx_weight_layout import (
    PackedLinearInt8DynamicActivationIntxWeightLayout,
)
from torchao.experimental.quant_api import (
    int8_dynamic_activation_intx_weight,
)
from torchao.quantization.granularity import (
    PerGroup,
    PerRow,
)
from torchao.quantization.quant_api import quantize_
from torchao.quantization.quant_primitives import MappingType
my_model = Model()
quantize_(
    my_model,
    int8_dynamic_activation_intx_weight(
        weight_dtype=torch.int4,
        granularity=PerGroup(32), # PerRow() is also supported
        has_weight_zeros=True, # Should be True
        weight_mapping_type=MappingType.SYMMETRIC_NO_CLIPPING_ERR,  # MappingType.SYMMETRIC can also be used but increases error
        layout=PackedLinearInt8DynamicActivationIntxWeightLayout(target="aten"),
    ),
)
ONNX

torch.onnx.verification.verify_onnx_program (#148396, #148706, #148730, #148707)

A new verification API, torch.onnx.verification.verify_onnx_program, can now be used to verify the numerical accuracy of the exported ONNX model. Users can use the compare_intermediates option to identify any operator that causes numerical discrepancies in intermediate tensors. It is possible to use a tool like model-explorer to visualize the verification results.
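
A minimal sketch of using the new API; the toy model and random inputs are illustrative assumptions:

import torch

class ToyModel(torch.nn.Module):  # hypothetical model for illustration
    def forward(self, x):
        return torch.relu(x) + 1.0

# export through the dynamo=True path, then compare ONNX numerics against eager
onnx_program = torch.onnx.export(ToyModel(), (torch.randn(2, 3),), dynamo=True)
results = torch.onnx.verification.verify_onnx_program(
    onnx_program,
    compare_intermediates=True,  # also locate discrepancies in intermediate tensors
)
for info in results:  # one VerificationInfo per compared tensor
    print(info)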

Improvements

Improvements landed across: Release Engineering, Python Frontend, Autograd, Dataloader, Linear Algebra, Nested Tensor (NJT), torch.nn, torch.optim, Build Frontend, C++ Frontend, Distributed (c10d, DistributedDataParallel (DDP), FullyShardedDataParallel2 (FSDP2), DTensor, TensorParallel, Torch Elastic, Pipelining), CPU (General, x86), CUDA, MPS, ROCm, XPU, Profiler, torch.compile (Dynamo, AOTDispatcher, Dynamic Shapes, Decompositions/FakeTensor/meta tensors, Inductor), torch.fx, torch.export (serialization, dynamic shapes, draft export, miscellaneous), Quantization, ONNX, JIT, Lazy Tensor, and torch.package. Several operator decomps received improvements/bugfixes.

Bug fixes

Bug fixes landed across: Python Frontend, Autograd, Linear Algebra, Nested Tensor (NJT), torch.nn, Build Frontend, C++ Frontend, Distributed (Distributed Checkpoint (DCP), c10d, DistributedStateDict (DSD), FullyShardedDataParallel2 (FSDP2), DTensor, Pipelining), CPU (General, x86), CUDA, MPS, ROCm, XPU, Profiler, torch.compile (Dynamo, Inductor), torch.fx, torch.export (serialization, draft export, miscellaneous), and ONNX.

Performance

Performance improvements landed across: Release Engineering, Sparse Frontend, Distributed (Distributed Checkpoint (DCP), c10d), CPU (General, x86), CUDA, MPS, ROCm, XPU, torch.compile (Dynamo, Inductor), torch.fx, and Quantization.

Documentation

Documentation updates landed across: Python Frontend, Autograd, Nested Tensor (NJT), torch.nn, torch.optim, Build Frontend, Distributed (FullyShardedDataParallel2 (FSDP2), c10d, DTensor, DistributedStateDict (DSD), Torch Elastic, Pipelining), CUDA, XPU, torch.compile (Dynamo, Inductor), torch.fx, torch.export, Quantization, and ONNX.

Developers

Developer-facing changes landed across: Python Frontend, Distributed (FullyShardedDataParallel2 (FSDP2), c10d, TensorParallel, Pipelining), MPS, XPU, Benchmark, torch.compile (Dynamo, Inductor), and torch.fx.
