Showing content from https://pytorch.org/tutorials/intermediate/compiled_autograd_tutorial.html below:

Capturing a larger backward graph for torch.compile — PyTorch Tutorials 2.8.0+cu128 documentation

Compiled Autograd: Capturing a larger backward graph for torch.compile

Created On: Oct 09, 2024 | Last Updated: Oct 23, 2024 | Last Verified: Oct 09, 2024

Author: Simon Fan

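What you will learn

  1. How compiled autograd interacts with torch.compile
  2. How to use the compiled autograd API
  3. How to inspect logs using TORCH_LOGS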

Overview

Compiled Autograd is a torch.compile extension introduced in PyTorch 2.4 that allows the capture of a larger backward graph.

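While torch.compile does capture the backward graph, it does so only partially. The AOTAutograd component captures the backward graph ahead of time, with certain limitations:

  1. Graph breaks in the forward pass lead to graph breaks in the backward pass
  2. Backward hooks are not captured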

Compiled Autograd addresses these limitations by integrating directly with the autograd engine, allowing it to capture the full backward graph at runtime. Models with either of these characteristics (graph breaks in the forward pass, or backward hooks) should try Compiled Autograd and may observe better performance.

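However, Compiled Autograd introduces its own limitations:

  1. Added runtime overhead at the start of the backward pass for cache lookup
  2. More prone to recompiles and graph breaks in Dynamo due to the larger capture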

Note

Compiled Autograd is under active development and is not yet compatible with all existing PyTorch features. For the latest status on a particular feature, refer to the Compiled Autograd Landing Page.

Setup

In this tutorial, we will base our examples on this simple neural network model. It takes a 10-dimensional input vector, processes it through a single linear layer, and outputs another 10-dimensional vector.

import torch

class Model(torch.nn.Module):
   def __init__(self):
      super().__init__()
      self.linear = torch.nn.Linear(10, 10)

   def forward(self, x):
      return self.linear(x)

Basic usage

Before calling the torch.compile API, make sure to set torch._dynamo.config.compiled_autograd to True:

model = Model()
x = torch.randn(10)

torch._dynamo.config.compiled_autograd = True
@torch.compile
def train(model, x):
   loss = model(x).sum()
   loss.backward()

train(model, x)

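In the code above, we create an instance of the Model class and generate a random 10-dimensional tensor x using torch.randn(10). We define the training step function train and decorate it with @torch.compile to optimize its execution. When train(model, x) is called, torch.compile traces and compiles the forward computation, and the call to loss.backward() is routed to the Compiled Autograd engine because torch._dynamo.config.compiled_autograd is set to True. Compiled Autograd then records the backward operations, including any hooks it encounters, into a graph and executes that graph with torch.compile.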

Inspecting the compiled autograd logs

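Run the script with the TORCH_LOGS environment variable set. To print only the compiled autograd graph, use TORCH_LOGS="compiled_autograd" python example.py. To print the graph with more tensor metadata and recompile reasons, at the cost of performance, use TORCH_LOGS="compiled_autograd_verbose" python example.py.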

Rerun the snippet above; the compiled autograd graph should now be logged to stderr. Certain graph nodes will have names prefixed by aot0_. These correspond to nodes previously compiled ahead of time in AOTAutograd backward graph 0. For example, aot0_view_2 corresponds to view_2 of the AOT backward graph with id=0.
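When the script is launched from another Python process rather than from a shell, the environment variable can be set on the child process instead. The following is a minimal sketch using only the standard library. The helper names torch_logs_env and run_with_compiled_autograd_logs, and the script name example.py in the usage comment, are hypothetical and introduced here for illustration. The two TORCH_LOGS values are the compiled_autograd and compiled_autograd_verbose log names.

```python
import os
import subprocess
import sys

def torch_logs_env(verbose=False):
    """Return a copy of the current environment with TORCH_LOGS set."""
    env = dict(os.environ)
    # The verbose variant adds tensor metadata and recompile reasons,
    # at some cost in performance.
    env["TORCH_LOGS"] = "compiled_autograd_verbose" if verbose else "compiled_autograd"
    return env

def run_with_compiled_autograd_logs(script_path, verbose=False):
    """Run script_path in a fresh interpreter and return what it wrote to stderr."""
    result = subprocess.run(
        [sys.executable, script_path],
        env=torch_logs_env(verbose),
        capture_output=True,
        text=True,
    )
    return result.stderr

# Hypothetical usage, assuming the snippet above was saved as example.py:
# print(run_with_compiled_autograd_logs("example.py"))
```

Running the script in a fresh interpreter means the variable is already set when torch is imported there.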

In the image below, the red box encapsulates the AOT backward graph that is captured by torch.compile without Compiled Autograd.

Note

This is the graph on which we will call torch.compile, NOT the optimized graph. Compiled Autograd essentially generates some unoptimized Python code to represent the entire C++ autograd execution.

Compiling the forward and backward pass using different flags

You can use different compiler configs for the two compilations. For example, the backward pass may be compiled with fullgraph=True even if there are graph breaks in the forward pass.

def train(model, x):
    model = torch.compile(model)
    loss = model(x).sum()
    torch._dynamo.config.compiled_autograd = True
    torch.compile(lambda: loss.backward(), fullgraph=True)()

Or you can use the context manager, which will apply to all autograd calls within its scope.

def train(model, x):
   model = torch.compile(model)
   loss = model(x).sum()
   with torch._dynamo.compiled_autograd.enable(torch.compile(fullgraph=True)):
      loss.backward()

Compiled Autograd addresses certain limitations of AOTAutograd

  1. Graph breaks in the forward pass no longer necessarily lead to graph breaks in the backward pass:

@torch.compile(backend="aot_eager")
def fn(x):
   # 1st graph
   temp = x + 10
   torch._dynamo.graph_break()
   # 2nd graph
   temp = temp + 10
   torch._dynamo.graph_break()
   # 3rd graph
   return temp.sum()

x = torch.randn(10, 10, requires_grad=True)
torch._dynamo.utils.counters.clear()
loss = fn(x)

# 1. base torch.compile
loss.backward(retain_graph=True)
assert(torch._dynamo.utils.counters["stats"]["unique_graphs"] == 3)
torch._dynamo.utils.counters.clear()

# 2. torch.compile with compiled autograd
with torch._dynamo.compiled_autograd.enable(torch.compile(backend="aot_eager")):
   loss.backward()

# single graph for the backward
assert(torch._dynamo.utils.counters["stats"]["unique_graphs"] == 1)

In the first case (base torch.compile), 3 backward graphs were produced due to the 2 graph breaks in the compiled function fn. In the second case (torch.compile with compiled autograd), a single full backward graph was traced despite the graph breaks.

Note

It is still possible for Dynamo to graph break when tracing backward hooks captured by Compiled Autograd.

  2. Backward hooks can now be captured:

@torch.compile(backend="aot_eager")
def fn(x):
   return x.sum()

x = torch.randn(10, 10, requires_grad=True)
x.register_hook(lambda grad: grad+10)
loss = fn(x)

with torch._dynamo.compiled_autograd.enable(torch.compile(backend="aot_eager")):
   loss.backward()

There should be a call_hook node in the graph, which Dynamo will later inline.
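Independently of how the hook appears in the captured graph, a plain eager run gives a reference value for what the hook should do to the gradient. The sketch below does not use torch.compile at all. It reuses the same hook and the same sum-based loss as the example above, and the names expected and hook_applied are introduced here for the check.

```python
import torch

x = torch.randn(10, 10, requires_grad=True)
x.register_hook(lambda grad: grad + 10)

loss = x.sum()
loss.backward()

# The gradient of sum() with respect to x is all ones. The hook adds 10
# before the gradient is accumulated into x.grad, so every entry is 11.
expected = torch.full((10, 10), 11.0)
hook_applied = torch.equal(x.grad, expected)
print(hook_applied)
```

A run with Compiled Autograd enabled would be expected to produce the same x.grad, which makes this a convenient value to compare against.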

Common recompilation reasons for Compiled Autograd
  1. Due to changes in the autograd structure of the loss value:

torch._dynamo.config.compiled_autograd = True
x = torch.randn(10, requires_grad=True)
for op in [torch.add, torch.sub, torch.mul, torch.div]:
   loss = op(x, x).sum()
   torch.compile(lambda: loss.backward(), backend="eager")()

In the example above, we call a different operator on each iteration, leading to loss tracking a different autograd history each time. You should see some recompile messages: Cache miss due to new autograd node.
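To see why the autograd history differs between iterations, one option is to inspect the autograd nodes directly in eager mode. The sketch below walks one step back from loss.grad_fn (the node for sum()) to the node created by the operator. The variable names op_node and node_names are introduced here for illustration. This only shows the differing node types and does not by itself reproduce the recompile messages.

```python
import torch

x = torch.randn(10, requires_grad=True)
node_names = []
for op in [torch.add, torch.sub, torch.mul, torch.div]:
    loss = op(x, x).sum()
    # loss.grad_fn is the node for sum(); its first input edge
    # points at the node created by op.
    op_node = loss.grad_fn.next_functions[0][0]
    node_names.append(op_node.name())

print(node_names)
# ['AddBackward0', 'SubBackward0', 'MulBackward0', 'DivBackward0']
```

Each iteration produces a different node type feeding into the sum, which is consistent with the "new autograd node" cache misses described above.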

  2. Due to tensors changing shapes:

torch._dynamo.config.compiled_autograd = True
for i in [10, 100, 10]:
   x = torch.randn(i, i, requires_grad=True)
   loss = x.sum()
   torch.compile(lambda: loss.backward(), backend="eager")()

In the example above, x changes shape, and compiled autograd will mark x as a dynamic shape tensor after the first change. You should see recompile messages: Cache miss due to changed shapes.

Conclusion

In this tutorial, we went over the high-level ecosystem of torch.compile with compiled autograd, the basics of compiled autograd and a few common recompilation reasons. Stay tuned for deep dives on dev-discuss.
