This example illustrates the use of nearest neighbor methods for database search and classification tasks.
The three nearest neighbors of each time series in a test set are first computed. Then, the predictive performance of a three-nearest-neighbors classifier [1] is evaluated with three different metrics: Dynamic Time Warping [2], Euclidean distance, and SAX-MINDIST [3].
[1] Wikipedia entry for the k-nearest neighbors algorithm
[2] H. Sakoe and S. Chiba, “Dynamic programming algorithm optimization for spoken word recognition”. IEEE Transactions on Acoustics, Speech, and Signal Processing, 26(1), 43-49 (1978).
[3] J. Lin, E. Keogh, L. Wei and S. Lonardi, “Experiencing SAX: a novel symbolic representation of time series”. Data Mining and Knowledge Discovery, 15(2), 107-144 (2007).
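Before diving into the full example, the snippet below is a minimal sketch of how a single DTW distance [2] can be computed with tslearn's metrics module; the two random walk series here are illustrative stand-ins, not the blob data used below.

import numpy
from tslearn.metrics import dtw, dtw_path

numpy.random.seed(0)
# Two illustrative univariate random walks (stand-ins for the example data)
s1 = numpy.cumsum(numpy.random.randn(100))
s2 = numpy.cumsum(numpy.random.randn(100))

# DTW score between the two series
print("DTW distance:", dtw(s1, s2))

# The optimal alignment path can be retrieved alongside the score
path, dist = dtw_path(s1, s2)
print("Alignment path length:", len(path))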
1. Nearest neighbor search
Computed nearest neighbor indices (wrt DTW)
 [[10 12  2]
 [ 0 13  5]
 [ 0  1 13]
 [ 0 11  5]
 [16 18 12]
 [ 3 17  9]
 [12  2 16]
 [ 7  3 17]
 [12  2 10]
 [12  2 18]
 [12  8  2]
 [ 3 17  7]
 [18 19  2]
 [ 0 17 13]
 [ 9  3  7]
 [12  2  8]
 [ 3  7  9]
 [ 0  1 13]
 [18 10  2]
 [10 12  2]]
First nearest neighbor class: [0 0 0 0 1 1 0 1 0 0 0 1 0 0 0 0 1 0 0 0]

2. Nearest neighbor classification using DTW
Correct classification rate: 1.0

3. Nearest neighbor classification using L2
Correct classification rate: 1.0

4. Nearest neighbor classification using SAX+MINDIST
Correct classification rate: 0.5
# Author: Romain Tavenard
# License: BSD 3 clause

import numpy
from sklearn.metrics import accuracy_score

from tslearn.generators import random_walk_blobs
from tslearn.preprocessing import TimeSeriesScalerMinMax
from tslearn.neighbors import KNeighborsTimeSeriesClassifier, \
    KNeighborsTimeSeries

numpy.random.seed(0)
n_ts_per_blob, sz, d, n_blobs = 20, 100, 1, 2

# Prepare data: two blobs of univariate random walk time series
X, y = random_walk_blobs(n_ts_per_blob=n_ts_per_blob,
                         sz=sz,
                         d=d,
                         n_blobs=n_blobs)

# Rescale time series to the [0, 1] value range
scaler = TimeSeriesScalerMinMax(value_range=(0., 1.))
X_scaled = scaler.fit_transform(X)

# Shuffle, then split into train and test halves
indices_shuffle = numpy.random.permutation(n_ts_per_blob * n_blobs)
X_shuffle = X_scaled[indices_shuffle]
y_shuffle = y[indices_shuffle]
X_train = X_shuffle[:n_ts_per_blob * n_blobs // 2]
X_test = X_shuffle[n_ts_per_blob * n_blobs // 2:]
y_train = y_shuffle[:n_ts_per_blob * n_blobs // 2]
y_test = y_shuffle[n_ts_per_blob * n_blobs // 2:]

# Nearest neighbor search
knn = KNeighborsTimeSeries(n_neighbors=3, metric="dtw")
knn.fit(X_train, y_train)
dists, ind = knn.kneighbors(X_test)
print("1. Nearest neighbor search")
print("Computed nearest neighbor indices (wrt DTW)\n", ind)
# Neighbor indices refer to the training set, hence y_train here
print("First nearest neighbor class:", y_train[ind[:, 0]])

# Nearest neighbor classification
knn_clf = KNeighborsTimeSeriesClassifier(n_neighbors=3, metric="dtw")
knn_clf.fit(X_train, y_train)
predicted_labels = knn_clf.predict(X_test)
print("\n2. Nearest neighbor classification using DTW")
print("Correct classification rate:", accuracy_score(y_test, predicted_labels))

# Nearest neighbor classification with a different metric (Euclidean distance)
knn_clf = KNeighborsTimeSeriesClassifier(n_neighbors=3, metric="euclidean")
knn_clf.fit(X_train, y_train)
predicted_labels = knn_clf.predict(X_test)
print("\n3. Nearest neighbor classification using L2")
print("Correct classification rate:", accuracy_score(y_test, predicted_labels))

# Nearest neighbor classification based on SAX representation
metric_params = {'n_segments': 10, 'alphabet_size_avg': 5}
knn_clf = KNeighborsTimeSeriesClassifier(n_neighbors=3, metric="sax",
                                         metric_params=metric_params)
knn_clf.fit(X_train, y_train)
predicted_labels = knn_clf.predict(X_test)
print("\n4. Nearest neighbor classification using SAX+MINDIST")
print("Correct classification rate:", accuracy_score(y_test, predicted_labels))
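To make the SAX+MINDIST step more concrete, here is a minimal sketch (again with illustrative toy series, not the example's data) of the underlying transform, using tslearn's SymbolicAggregateApproximation with the same parameters as metric_params above. The series are z-normalized first, since MINDIST's alphabet breakpoints assume normally distributed values [3].

import numpy
from tslearn.preprocessing import TimeSeriesScalerMeanVariance
from tslearn.piecewise import SymbolicAggregateApproximation

numpy.random.seed(0)
# Two illustrative random walks, z-normalized because MINDIST's breakpoints
# assume standard-normally distributed values [3]
dataset = TimeSeriesScalerMeanVariance().fit_transform(
    numpy.cumsum(numpy.random.randn(2, 100), axis=1))
s1, s2 = dataset[0], dataset[1]

# Same SAX parameters as metric_params in the example above
sax = SymbolicAggregateApproximation(n_segments=10, alphabet_size_avg=5)
sax.fit(dataset)

# MINDIST between the SAX representations of the two series
print("SAX-MINDIST:", sax.distance(s1, s2))

Since SAX compresses each series of length 100 down to 10 symbols drawn from a 5-letter alphabet, some discriminative detail is lost, which is consistent with the lower classification rate reported in step 4 above.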
Total running time of the script: (0 minutes 1.574 seconds)
Gallery generated by Sphinx-Gallery