PyTorch/XLA 2.0 Release
Cloud TPUs now support the PyTorch 2.0 release via the PyTorch/XLA integration. On top of the underlying improvements and bug fixes in PyTorch's 2.0 release, this release introduces several new features and PyTorch/XLA-specific bug fixes.
Beta Features
PJRT runtime
- Check out our newest document; PJRT is the default runtime in 2.0.
- New Implementation of xm.rendezvous with XLA collective communication which scales better (#4181)
- New PJRT TPU backend through the C-API (#4077)
- Use PJRT by default if no runtime is configured (#4599)
- Experimental support for torch.distributed and DDP on TPU v2 and v3 (#4520)
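A minimal sketch of selecting the PJRT runtime and synchronizing workers with xm.rendezvous. The environment variable value and process layout are illustrative assumptions for a TPU VM; see the PJRT document above for the authoritative setup.

```python
# Sketch only: run one process per TPU core under the PJRT runtime.
# Assumes a TPU VM; PJRT_DEVICE value and spawn arguments are illustrative.
import os
import torch
import torch_xla.core.xla_model as xm
import torch_xla.distributed.xla_multiprocessing as xmp

os.environ.setdefault("PJRT_DEVICE", "TPU")  # select the PJRT TPU backend

def _mp_fn(index):
    device = xm.xla_device()
    t = torch.ones(2, 2, device=device) * index
    # xm.rendezvous is now implemented with XLA collective communication
    # under PJRT, which scales better than the previous implementation.
    xm.rendezvous("init_done")
    print(index, t.sum().item())

if __name__ == "__main__":
    xmp.spawn(_mp_fn)
```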
FSDP
- Add auto_wrap_policy into XLA FSDP for automatic wrapping (#4318)
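A hedged sketch of using auto_wrap_policy with XLA FSDP. The wrap-policy import path and the min_num_params keyword are assumptions patterned on the upstream FSDP API; check #4318 and the docs for your installed version.

```python
# Sketch only: automatically wrap large submodules in their own FSDP units.
import functools
import torch.nn as nn
import torch_xla.core.xla_model as xm
from torch_xla.distributed.fsdp import XlaFullyShardedDataParallel as FSDP
from torch_xla.distributed.fsdp.wrap import size_based_auto_wrap_policy  # assumed path

device = xm.xla_device()
model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10)).to(device)

# Wrap every submodule above ~1M parameters (threshold is illustrative).
auto_wrap_policy = functools.partial(size_based_auto_wrap_policy, min_num_params=int(1e6))
model = FSDP(model, auto_wrap_policy=auto_wrap_policy)
```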
Stable Features
Lazy Tensor Core Migration
- Migration is complete; check out this dev discussion for more detail.
- Naively inherits LazyTensor (#4271)
- Adopt even more LazyTensor interfaces (#4317)
- Introduce XLAGraphExecutor (#4270)
- Inherits LazyGraphExecutor (#4296)
- Adopt more LazyGraphExecutor virtual interfaces (#4314)
- Rollback to use xla::Shape instead of torch::lazy::Shape (#4111)
- Use TORCH_LAZY_COUNTER/METRIC (#4208)
Improvements & Additions
- Add an option to increase the worker thread efficiency for data loading (#4727)
- Improve numerical stability of torch.sigmoid (#4311)
- Add an API to clear counters and metrics (#4109)
- Add met.short_metrics_report to display a more concise metrics report (#4148); see the sketch after this list
- Document environment variables (#4273)
- Op Lowering
- _linalg_svd (#4537)
- Upsample_bilinear2d with scale (#4464)
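A short sketch of the metrics additions above: printing the concise report and resetting counters/metrics between runs. Function names mirror torch_xla.debug.metrics; treat them as assumptions and verify against your installed version.

```python
# Sketch only: inspect and reset PyTorch/XLA counters and metrics.
import torch
import torch_xla.core.xla_model as xm
import torch_xla.debug.metrics as met

device = xm.xla_device()
x = torch.randn(4, 4, device=device)
y = torch.sigmoid(x)
xm.mark_step()  # flush the lazy graph so counters and metrics get populated

print(met.short_metrics_report())  # concise variant of met.metrics_report()

# Reset so the next experiment's numbers are not mixed with this one's.
met.clear_counters()
met.clear_metrics()
```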
Experimental Features
TorchDynamo (torch.compile) support
- Check out our newest doc.
- Dynamo bridge python binding (#4119)
- Dynamo bridge backend implementation (#4523)
- Training optimization: make execution async (#4425)
- Training optimization: reduce graph execution per step (#4523)
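A hedged sketch of compiling a model for XLA through torch.compile. The backend string is an assumption and has changed across releases (e.g. "torchxla_trace_once" in the 2.0 era, "openxla" in later releases); check the doc linked above for the name matching your version.

```python
# Sketch only: route torch.compile through the Dynamo bridge to XLA.
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()
model = torch.nn.Linear(128, 128).to(device)

# Backend name is an assumption; it differs between PyTorch/XLA releases.
compiled = torch.compile(model, backend="openxla")
out = compiled(torch.randn(8, 128, device=device))
print(out.shape)
```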
PyTorch/XLA GSPMD on single host
- Preserve parameter sharding with sharded data placeholder (#4721)
- Transfer shards from server to host (#4508)
- Store the sharding annotation within XLATensor (#4390)
- Use d2d replication for more efficient input sharding (#4336)
- Mesh to support custom device order (#4162)
- Introduce virtual SPMD device to avoid unpartitioned data transfer (#4091)
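A hedged sketch of single-host GSPMD sharding annotation. Module and function names follow the experimental xla_sharding API and are assumptions; this feature was still experimental in 2.0, and enabling the virtual SPMD device may require additional runtime configuration not shown here.

```python
# Sketch only: shard a tensor's first dimension across a 2D device mesh.
import numpy as np
import torch
import torch_xla.core.xla_model as xm
import torch_xla.experimental.xla_sharding as xs  # assumed experimental module
from torch_xla.experimental.xla_sharding import Mesh

num_devices = len(xm.get_xla_supported_devices())
mesh_shape = (num_devices, 1)
# Mesh accepts a custom device order (see #4162); the default order is used here.
mesh = Mesh(np.arange(num_devices), mesh_shape)

t = torch.randn(16, 16, device=xm.xla_device())
# Partition spec: shard dim 0 along mesh axis 0, replicate dim 1.
xs.mark_sharding(t, mesh, (0, None))
```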
Ongoing development
Ongoing Dynamic Shape implementation
- Implement missing XLASymNodeImpl::Sub (#4551)
- Make empty_symint support dynamism (#4550)
- Add dynamic shape support to SigmoidBackward (#4322)
- Add a forward pass NN model with dynamism test (#4256)
Ongoing SPMD multi-host execution (#4573)
Bug fixes & improvements
- Support int as index type (#4602)
- Only alias inputs and outputs when force_ltc_sync == True (#4575)
- Fix race condition between execution and buffer tear down on GPU when using bfc_allocator (#4542)
- Release the GIL during TransferFromServer (#4504)
- Fix type annotations in FSDP (#4371)