Last Updated : 23 Jul, 2025
Long Short-Term Memory (LSTM) networks are a type of recurrent neural network (RNN) widely used for sequence prediction tasks. In PyTorch, the nn.LSTM module is a powerful tool for implementing these networks. However, the difference between an LSTM's "hidden" state and its "output" often confuses newcomers. This article clarifies these concepts, with detailed explanations and examples of how LSTMs work in PyTorch.
1. Hidden State (h_n)

The hidden state in an LSTM represents the short-term memory of the network. It contains information about the sequence processed so far and is updated at each time step. The hidden state is crucial for carrying information across time steps and layers.

Shape: The hidden state h_n has the shape (num_layers * num_directions, batch, hidden_size). This shape indicates that a final hidden state is kept for each layer and each direction of the LSTM.

2. Output (output)

The output of an LSTM is the sequence of hidden states from the last layer, one for each time step. Unlike h_n, which holds only the final hidden state of each sequence, the output includes the last layer's hidden state for every time step in the sequence.
Shape: The output has the shape (seq_len, batch, num_directions * hidden_size), where seq_len is the length of the input sequence.
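These shapes assume the default settings. As a point of reference, setting bidirectional=True doubles num_directions, and batch_first=True moves the batch dimension first in the input and output tensors. Below is a minimal sketch of how those flags change the shapes; this variant is included only for illustration and is not used in the main example that follows:

Python
import torch
import torch.nn as nn

# Bidirectional, batch-first LSTM: num_directions becomes 2
bi_lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=2,
                  batch_first=True, bidirectional=True)

x = torch.randn(3, 5, 10)  # (batch, seq_len, input_size) because batch_first=True
output, (hn, cn) = bi_lstm(x)

print(output.shape)  # torch.Size([3, 5, 40]) -> (batch, seq_len, 2 * hidden_size)
print(hn.shape)      # torch.Size([4, 3, 20]) -> (2 * num_layers, batch, hidden_size)

Note that batch_first only affects the input and output tensors; hn and cn always keep the layout (num_layers * num_directions, batch, hidden_size).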
Below is an example of how to implement an LSTM in PyTorch and access the hidden state and output:
Python
import torch
import torch.nn as nn
# Define LSTM parameters
input_size = 10
hidden_size = 20
num_layers = 2
batch_size = 3
seq_len = 5
# Initialize LSTM
lstm = nn.LSTM(input_size, hidden_size, num_layers)
# Create random input tensor
input_tensor = torch.randn(seq_len, batch_size, input_size)
# Initialize hidden and cell states (optional: they default to zeros if omitted)
h0 = torch.zeros(num_layers, batch_size, hidden_size)
c0 = torch.zeros(num_layers, batch_size, hidden_size)
# Forward pass through LSTM
output, (hn, cn) = lstm(input_tensor, (h0, c0))
print("Output shape:", output.shape) # (seq_len, batch, num_directions * hidden_size)
print("Hidden state shape:", hn.shape) # (num_layers * num_directions, batch, hidden_size)
Output:
Output shape: torch.Size([5, 3, 20])
Hidden state shape: torch.Size([2, 3, 20])

Explanation of the Code

The output tensor has shape (5, 3, 20), i.e. (seq_len, batch, hidden_size): one hidden state from the last layer for each of the 5 time steps. The hidden state hn has shape (2, 3, 20), i.e. (num_layers, batch, hidden_size): the final hidden state for each of the 2 layers. num_directions is 1 here because the LSTM is unidirectional.
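To make the relationship concrete: for a unidirectional LSTM, the last time step of output is exactly the final hidden state of the top layer. A quick check, reusing the tensors from the example above:

Python
# output[-1] is the last time step of the top layer;
# hn[-1] is the final hidden state of the top layer.
print(torch.allclose(output[-1], hn[-1]))  # True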
Understanding the difference between the hidden state and output in PyTorch's LSTM is crucial for effectively using this powerful neural network architecture. The hidden state provides a summary of the sequence, while the output contains detailed information for each time step.
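One practical consequence: for many-to-one tasks such as sequence classification, you typically feed the sequence summary hn[-1] (equivalently output[-1] for a unidirectional LSTM) into a linear head. A minimal sketch, where the 4-class head is an assumed setup for illustration:

Python
import torch
import torch.nn as nn

# Hypothetical many-to-one setup: classify a whole sequence from its final hidden state
lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=2)
head = nn.Linear(20, 4)  # 4 is an assumed number of classes

x = torch.randn(5, 3, 10)      # (seq_len, batch, input_size)
output, (hn, cn) = lstm(x)     # initial states default to zeros
logits = head(hn[-1])          # use the top layer's final hidden state as the summary
print(logits.shape)            # torch.Size([3, 4]) -> (batch, num_classes)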