Nesterov Accelerated Gradient Explained | Papers With Code

Nesterov Accelerated Gradient (NAG) is a momentum-based SGD optimizer that "looks ahead" to where the parameters will be and computes the gradient there, rather than at the current position:

$$ v_{t} = \gamma v_{t-1} - \eta \nabla_{\theta} J\left(\theta_{t-1} + \gamma v_{t-1}\right) $$

$$ \theta_{t} = \theta_{t-1} + v_{t} $$

$$ \gamma, \eta \in \mathbb{R}^{+} $$

As with SGD with momentum, $\gamma$ is usually set to $0.9$; both $\gamma$ and $\eta$ are typically less than $1$.
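The update rule above can be sketched directly in code. This is a minimal illustration, not a production optimizer; the quadratic objective and the hyperparameter defaults are assumptions chosen for the example:

```python
def nag_step(theta, v, grad_fn, gamma=0.9, eta=0.01):
    """One Nesterov Accelerated Gradient update, following the equations above."""
    # Evaluate the gradient at the look-ahead point theta + gamma * v
    g = grad_fn(theta + gamma * v)
    v = gamma * v - eta * g        # v_t = gamma * v_{t-1} - eta * gradient
    return theta + v, v            # theta_t = theta_{t-1} + v_t

# Illustrative use: minimize J(theta) = theta^2, whose gradient is 2 * theta
theta, v = 5.0, 0.0
for _ in range(200):
    theta, v = nag_step(theta, v, lambda t: 2.0 * t)
# theta is now close to the minimizer at 0
```

Because the gradient is taken at the look-ahead point, the velocity is corrected before the parameters overshoot, which is the source of NAG's faster convergence on smooth problems.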

The intuition is that the standard momentum method first computes the gradient at the current location and then takes a big jump in the direction of the updated accumulated gradient. In contrast, Nesterov momentum first makes a big jump in the direction of the previous accumulated gradient, then measures the gradient where it ends up and makes a correction. The idea is that it is better to correct a mistake after you have made it.
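This "jump then correct" description is algebraically the same update as the equations above, since $\theta_{t-1} + v_{t} = (\theta_{t-1} + \gamma v_{t-1}) - \eta g$. A sketch of that two-phase form (hyperparameters illustrative):

```python
def nag_jump_correct(theta, v, grad_fn, gamma=0.9, eta=0.01):
    # 1) Big jump in the direction of the previous accumulated gradient
    lookahead = theta + gamma * v
    # 2) Measure the gradient where we end up and make a correction
    g = grad_fn(lookahead)
    v = gamma * v - eta * g
    theta = lookahead - eta * g    # identical to theta + v
    return theta, v

# Illustrative use on J(theta) = theta^2, gradient 2 * theta
theta, v = 5.0, 0.0
for _ in range(100):
    theta, v = nag_jump_correct(theta, v, lambda t: 2.0 * t)
```

Expanding `lookahead - eta * g = theta + gamma * v - eta * g = theta + v` confirms that the jump-then-correct view and the one-line update produce identical iterates.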

Image Source: Geoff Hinton lecture notes

