A straight-line function where the activation is proportional to the input (the weighted sum from the neuron).
Function:
\[R(z,m) = z \cdot m\]

Derivative:
\[R'(z,m) = m\]
def linear(z, m):
    return m * z

def linear_prime(z, m):
    return m
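A well-known consequence of a purely linear activation is that composing several of them is still a single linear map, so stacking such layers adds no expressive power. Below is a minimal sketch of my own using the functions above; the input and slope values are arbitrary demo choices.

# Composing two linear activations collapses to one:
# linear(linear(z, m1), m2) == linear(z, m1 * m2) for every z.
z = 3.0
m1, m2 = 0.5, 4.0  # arbitrary slopes, chosen only for the demo

stacked = linear(linear(z, m1), m2)   # two "layers"
collapsed = linear(z, m1 * m2)        # one equivalent layer
assert stacked == collapsed           # both equal 6.0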
Pros
Cons
Exponential Linear Unit, widely known as ELU, is a function that tends to converge the cost to zero faster and produce more accurate results. Unlike other activation functions, ELU has an extra alpha constant, which should be a positive number.
ELU is very similar to ReLU except for negative inputs: both are the identity function for non-negative inputs. For negative inputs, ELU smoothly approaches -α, whereas ReLU is cut off sharply at zero.
Function:
\[R(z) = \begin{Bmatrix} z & z > 0 \\ \alpha (e^z - 1) & z \le 0 \end{Bmatrix}\]

Derivative:
\[R'(z) = \begin{Bmatrix} 1 & z > 0 \\ \alpha e^z & z < 0 \end{Bmatrix}\]
import numpy as np

def elu(z, alpha):
    return z if z >= 0 else alpha * (np.exp(z) - 1)

def elu_prime(z, alpha):
    return 1 if z > 0 else alpha * np.exp(z)
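To make the comparison with ReLU concrete, the short sketch below (my own illustration; the alpha value and sample inputs are arbitrary) evaluates both on a few negative inputs, showing ELU smoothly approaching -α while ReLU outputs exactly zero.

import numpy as np

alpha = 1.0
for z in [-0.5, -2.0, -5.0]:
    relu_out = max(0, z)               # ReLU: exactly 0 for z <= 0
    elu_out = alpha * (np.exp(z) - 1)  # ELU: smoothly approaches -alpha
    print(z, relu_out, round(elu_out, 4))
# -0.5 0 -0.3935
# -2.0 0 -0.8647
# -5.0 0 -0.9933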
Pros
Cons
A recent invention which stands for Rectified Linear Units. The formula is deceptively simple: \(max(0,z)\). Despite its name and appearance, it's not linear and provides the same benefits as Sigmoid (i.e. the ability to learn nonlinear functions), but with better performance.
Function:
\[R(z) = \begin{Bmatrix} z & z > 0 \\ 0 & z \le 0 \end{Bmatrix}\]

Derivative:
\[R'(z) = \begin{Bmatrix} 1 & z > 0 \\ 0 & z < 0 \end{Bmatrix}\]
def relu(z):
    return max(0, z)

def relu_prime(z):
    return 1 if z > 0 else 0
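The scalar version above relies on Python's built-in max, which does not broadcast over NumPy arrays. A vectorized variant can be written with np.maximum and np.where; the helper names relu_vec and relu_prime_vec below are my own, not part of the original snippet.

import numpy as np

def relu_vec(z):
    # Element-wise max(0, z) for NumPy arrays.
    return np.maximum(0, z)

def relu_prime_vec(z):
    # 1 where z > 0, else 0, element-wise.
    return np.where(z > 0, 1.0, 0.0)

relu_vec(np.array([-2.0, 0.0, 3.0]))        # array([0., 0., 3.])
relu_prime_vec(np.array([-2.0, 0.0, 3.0]))  # array([0., 0., 1.])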
Pros
Cons
Further reading
LeakyRelu is a variant of ReLU. Instead of being 0 when \(z < 0\), a leaky ReLU allows a small, non-zero, constant gradient \(\alpha\) (Normally, \(\alpha = 0.01\)). However, the consistency of the benefit across tasks is presently unclear. [1]
Function:
\[R(z) = \begin{Bmatrix} z & z > 0 \\ \alpha z & z \le 0 \end{Bmatrix}\]

Derivative:
\[R'(z) = \begin{Bmatrix} 1 & z > 0 \\ \alpha & z < 0 \end{Bmatrix}\]
def leakyrelu(z, alpha):
    return max(alpha * z, z)

def leakyrelu_prime(z, alpha):
    return 1 if z > 0 else alpha
Pros
Cons
Further reading
Sigmoid takes a real value as input and outputs another value between 0 and 1. It's easy to work with and has all the nice properties of activation functions: it's non-linear, continuously differentiable, monotonic, and has a fixed output range.
Function:
\[S(z) = \frac{1}{1 + e^{-z}}\]

Derivative:
\[S'(z) = S(z) \cdot (1 - S(z))\]
def sigmoid(z):
    return 1.0 / (1 + np.exp(-z))

def sigmoid_prime(z):
    return sigmoid(z) * (1 - sigmoid(z))
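As a quick sanity check (my own addition, not part of the original page), the closed-form derivative above can be compared against a central finite-difference estimate; the test point and step size are arbitrary.

import numpy as np

z, h = 0.5, 1e-6
analytic = sigmoid_prime(z)                          # S(z) * (1 - S(z))
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)
print(abs(analytic - numeric) < 1e-6)                # True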
Pros
Cons
Further reading
Tanh squashes a real-valued number to the range [-1, 1]. It's non-linear. But unlike Sigmoid, its output is zero-centered. Therefore, in practice the tanh non-linearity is always preferred to the sigmoid nonlinearity. [1]
Function:
\[tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}\]

Derivative:
\[tanh'(z) = 1 - tanh(z)^{2}\]
def tanh(z):
    return (np.exp(z) - np.exp(-z)) / (np.exp(z) + np.exp(-z))

def tanh_prime(z):
    return 1 - np.power(tanh(z), 2)
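To illustrate the zero-centered property mentioned above, the sketch below (my own; the sample inputs are arbitrary but symmetric around zero) compares the mean output of tanh and sigmoid.

import numpy as np

z = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])  # symmetric around zero

print(np.mean(tanh(z)))     # 0.0 -> tanh output is centered around zero
print(np.mean(sigmoid(z)))  # 0.5 -> sigmoid output is centered around 0.5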
Pros
Cons
The Softmax function calculates the probability distribution of an event over 'n' different events. In other words, it computes the probability of each target class over all possible target classes. The calculated probabilities are then used to determine the target class for the given inputs.
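The page gives no code for Softmax, so here is a minimal sketch of a common implementation (my own, using the usual max-subtraction trick for numerical stability; the sample logits are arbitrary).

import numpy as np

def softmax(z):
    # Subtract the max for numerical stability; this does not change the result
    # because softmax is invariant to adding a constant to all inputs.
    exps = np.exp(z - np.max(z))
    return exps / np.sum(exps)

softmax(np.array([1.0, 2.0, 3.0]))  # array([0.09003057, 0.24472847, 0.66524096])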
References