Time Series cross-validator.
Provides train/test indices to split time-ordered data, where other cross-validation methods are inappropriate because they would lead to training on future data and evaluating on past data. To ensure comparable metrics across folds, samples must be equally spaced. Once this condition is met, each test set covers the same time duration, while the train set accumulates data from previous splits.
This cross-validation object is a variation of KFold. In the k-th split, it returns the first k folds as the train set and the (k+1)-th fold as the test set.
Note that, unlike standard cross-validation methods, successive training sets are supersets of those that come before them.
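The nesting property described above can be checked directly. The following is a minimal sketch on illustrative data (the 12-sample array and the variable names are assumptions made for this example, not part of the original documentation):

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# 12 time-ordered samples; the values are placeholders for illustration.
X = np.arange(12).reshape(-1, 1)
tscv = TimeSeriesSplit(n_splits=3)

splits = [(set(train), set(test)) for train, test in tscv.split(X)]
train_sets = [train for train, _ in splits]

# Each training set should contain the training set of the previous split.
nested = all(a <= b for a, b in zip(train_sets, train_sets[1:]))

# Within a split, every training index should precede every test index.
ordered = all(max(train) < min(test) for train, test in splits)

print(nested, ordered)
```

Both checks are expected to hold for any valid configuration, since the splitter never places a test sample before a training sample.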
Read more in the User Guide.
For a visualisation of cross-validation behaviour and a comparison between common scikit-learn split methods, refer to Visualizing cross-validation behavior in scikit-learn.
Added in version 0.18.
Number of splits. Must be at least 2.
Changed in version 0.22: n_splits default value changed from 3 to 5.
Maximum size for a single training set.
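As a rough illustration of how capping the training set turns the expanding window into a rolling one, the sketch below sets max_train_size=4 on 12 illustrative samples (the data and the chosen sizes are assumptions for this example):

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# 12 time-ordered samples; the values are placeholders for illustration.
X = np.arange(12).reshape(-1, 1)

# Keep at most the 4 most recent samples in each training set.
tscv = TimeSeriesSplit(n_splits=3, test_size=2, max_train_size=4)

splits = [(train.tolist(), test.tolist()) for train, test in tscv.split(X)]
for train, test in splits:
    print(train, test)
# [2, 3, 4, 5] [6, 7]
# [4, 5, 6, 7] [8, 9]
# [6, 7, 8, 9] [10, 11]
```

Without max_train_size, the same configuration would start every training set at index 0.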
Used to limit the size of the test set. Defaults to n_samples // (n_splits + 1), which is the maximum allowed value with gap=0.
Added in version 0.24.
Number of samples to exclude from the end of each train set before the test set.
Added in version 0.24.
Notes
In the i-th split (counting from 1), the training set has size i * (n_samples // (n_splits + 1)) + n_samples % (n_splits + 1) and the test set has size n_samples // (n_splits + 1) by default, where n_samples is the number of samples. Note that this formula is only valid when test_size, max_train_size and gap are left at their default values.
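The size formula in these notes can be compared against the indices the splitter actually produces. This is a small sketch with n_samples=10 and n_splits=3, both chosen arbitrarily for illustration; the variable names are assumptions of this example:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

n_samples, n_splits = 10, 3
X = np.zeros((n_samples, 1))  # only the number of rows matters here

fold = n_samples // (n_splits + 1)      # default test set size
remainder = n_samples % (n_splits + 1)  # leftover samples go to the first train set

# Sizes observed from the splitter with default test_size, max_train_size, gap.
observed = [
    (len(train), len(test))
    for train, test in TimeSeriesSplit(n_splits=n_splits).split(X)
]

# Sizes predicted by the formula, with i counted from 1.
expected = [(i * fold + remainder, fold) for i in range(1, n_splits + 1)]

print(observed)  # [(4, 2), (6, 2), (8, 2)]
print(observed == expected)
```

Note that the integer division applies to n_samples alone and is then multiplied by i, which is why the sketch computes fold once before scaling it.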
Examples
>>> import numpy as np
>>> from sklearn.model_selection import TimeSeriesSplit
>>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4], [1, 2], [3, 4]])
>>> y = np.array([1, 2, 3, 4, 5, 6])
>>> tscv = TimeSeriesSplit()
>>> print(tscv)
TimeSeriesSplit(gap=0, max_train_size=None, n_splits=5, test_size=None)
>>> for i, (train_index, test_index) in enumerate(tscv.split(X)):
...     print(f"Fold {i}:")
...     print(f"  Train: index={train_index}")
...     print(f"  Test:  index={test_index}")
Fold 0:
  Train: index=[0]
  Test:  index=[1]
Fold 1:
  Train: index=[0 1]
  Test:  index=[2]
Fold 2:
  Train: index=[0 1 2]
  Test:  index=[3]
Fold 3:
  Train: index=[0 1 2 3]
  Test:  index=[4]
Fold 4:
  Train: index=[0 1 2 3 4]
  Test:  index=[5]
>>> # Fix test_size to 2 with 12 samples
>>> X = np.random.randn(12, 2)
>>> y = np.random.randint(0, 2, 12)
>>> tscv = TimeSeriesSplit(n_splits=3, test_size=2)
>>> for i, (train_index, test_index) in enumerate(tscv.split(X)):
...     print(f"Fold {i}:")
...     print(f"  Train: index={train_index}")
...     print(f"  Test:  index={test_index}")
Fold 0:
  Train: index=[0 1 2 3 4 5]
  Test:  index=[6 7]
Fold 1:
  Train: index=[0 1 2 3 4 5 6 7]
  Test:  index=[8 9]
Fold 2:
  Train: index=[0 1 2 3 4 5 6 7 8 9]
  Test:  index=[10 11]
>>> # Add in a 2 period gap
>>> tscv = TimeSeriesSplit(n_splits=3, test_size=2, gap=2)
>>> for i, (train_index, test_index) in enumerate(tscv.split(X)):
...     print(f"Fold {i}:")
...     print(f"  Train: index={train_index}")
...     print(f"  Test:  index={test_index}")
Fold 0:
  Train: index=[0 1 2 3]
  Test:  index=[6 7]
Fold 1:
  Train: index=[0 1 2 3 4 5]
  Test:  index=[8 9]
Fold 2:
  Train: index=[0 1 2 3 4 5 6 7]
  Test:  index=[10 11]
For a more extended example see Time-related feature engineering.
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
A MetadataRequest encapsulating routing information.
Returns the number of splitting iterations in the cross-validator.
Always ignored, exists for compatibility.
Always ignored, exists for compatibility.
Always ignored, exists for compatibility.
Returns the number of splitting iterations in the cross-validator.
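A short sketch of this method in use follows. It assumes the method can be called without arguments, consistent with the arguments being documented above as always ignored; the 20-sample array is illustrative only:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

tscv = TimeSeriesSplit(n_splits=4)

# The arguments are ignored, so the count is available before any data is seen.
n_iterations = tscv.get_n_splits()

# Cross-check against the number of splits actually generated on sample data.
X = np.zeros((20, 1))
n_generated = sum(1 for _ in tscv.split(X))

print(n_iterations, n_generated)  # 4 4
```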
Generate indices to split data into training and test set.
Training data, where n_samples is the number of samples and n_features is the number of features.
Always ignored, exists for compatibility.
Always ignored, exists for compatibility.
The training set indices for that split.
The testing set indices for that split.
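One possible way to consume the yielded index arrays is a manual evaluation loop, sketched below. The synthetic trend data, the choice of LinearRegression, and mean absolute error as the metric are all assumptions made for illustration, not recommendations from the original documentation:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import TimeSeriesSplit

# Synthetic, equally spaced series: a linear trend plus noise (illustrative).
rng = np.random.default_rng(0)
t = np.arange(60)
X = t.reshape(-1, 1)
y = 0.5 * t + rng.normal(scale=1.0, size=t.size)

errors = []
for train_index, test_index in TimeSeriesSplit(n_splits=5).split(X):
    # Fit only on the past, evaluate only on the block that follows it.
    model = LinearRegression().fit(X[train_index], y[train_index])
    predictions = model.predict(X[test_index])
    errors.append(mean_absolute_error(y[test_index], predictions))

print([round(e, 3) for e in errors])
```

Each entry in errors corresponds to one split, so later entries come from models trained on more history.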