You now have embeddings for any pair of examples. A supervised similarity measure takes these embeddings and returns a number measuring their similarity. Remember that embeddings are vectors of numbers. To find the similarity between two vectors \(a = [a_1,a_2,...,a_n]\) and \(b = [b_1,b_2,...,b_n]\), choose one of these three similarity measures:
| Measure | Meaning | Formula | As similarity increases, this measure... |
| --- | --- | --- | --- |
| Euclidean distance | Distance between ends of vectors | \(\sqrt{(a_1-b_1)^2+(a_2-b_2)^2+...+(a_n-b_n)^2}\) | Decreases |
| Cosine | Cosine of angle \(\theta\) between vectors | \(\frac{a^T b}{\lVert a\rVert \cdot \lVert b\rVert}\) | Increases |
| Dot product | Cosine multiplied by lengths of both vectors | \(a_1b_1+a_2b_2+...+a_nb_n = \lVert a\rVert \lVert b\rVert \cos(\theta)\) | Increases. Also increases with length of vectors. |

Choosing a similarity measure

In contrast to the cosine, the dot product is proportional to the vector length. This is important because examples that appear very frequently in the training set (for example, popular YouTube videos) tend to have embedding vectors with large lengths. If you want to capture popularity, then choose dot product. However, the risk is that popular examples may skew the similarity metric. To balance this skew, you can raise the length to an exponent \(\alpha < 1\) to calculate the dot product as \(\lVert a\rVert^{\alpha}\lVert b\rVert^{\alpha}\cos(\theta)\).
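As a minimal sketch of the three measures and the length-damped dot product, the following uses plain Python lists as embeddings. The function names (`euclidean_distance`, `cosine`, `dot`, `damped_dot`) and the toy vectors `query`, `niche`, and `popular` are illustrative assumptions, not part of any library or of the original course material.

```python
import math

def dot(a, b):
    """Dot product: a_1*b_1 + ... + a_n*b_n."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Vector length."""
    return math.sqrt(dot(a, a))

def euclidean_distance(a, b):
    """Distance between the ends of the two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine(a, b):
    """Cosine of the angle between a and b (undefined for zero vectors)."""
    return dot(a, b) / (norm(a) * norm(b))

def damped_dot(a, b, alpha):
    """Dot product with each length raised to the exponent alpha.

    alpha = 1 gives the plain dot product; alpha = 0 gives the cosine.
    """
    return (norm(a) ** alpha) * (norm(b) ** alpha) * cosine(a, b)

# Hypothetical embeddings: `popular` has a large length, `niche` points
# in exactly the same direction as the query.
query = [1.0, 0.0]
niche = [1.0, 0.0]
popular = [3.0, 4.0]

for name, vec in (("niche", niche), ("popular", popular)):
    print(name,
          "euclidean:", round(euclidean_distance(query, vec), 3),
          "cosine:", round(cosine(query, vec), 3),
          "dot:", round(dot(query, vec), 3),
          "damped (alpha=0.5):", round(damped_dot(query, popular if name == "popular" else niche, 0.5), 3))
```

In this toy setup the cosine prefers `niche` (same direction as the query), while the dot product prefers `popular` because of its length. Choosing \(\alpha\) between 0 and 1 moves the score between those two behaviors.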
To better understand how vector length changes the similarity measure, normalize the vector lengths to 1 and notice that the three measures become proportional to each other.
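After normalizing a and b such that \(||a||=1\) and \(||b||=1\), these three measures are related as:

Euclidean distance: \(||a-b|| = \sqrt{||a||^2 + ||b||^2 - 2a^{T}b} = \sqrt{2 - 2\cos(\theta_{ab})}\)
Dot product: \(a^{T}b = ||a||\,||b||\cos(\theta_{ab}) = \cos(\theta_{ab})\)
Cosine: \(\cos(\theta_{ab})\)

Thus, all three similarity measures are equivalent: the dot product and the cosine both equal \(\cos(\theta_{ab})\), and the Euclidean distance is a monotonically decreasing function of it, so each measure ranks pairs of examples in the same order.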
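The effect of normalization can be checked numerically. For unit-length vectors, the squared Euclidean distance equals \(2 - 2\cos(\theta_{ab})\) and the dot product equals the cosine, so ranking candidates by any of the three measures should give the same order. The sketch below assumes a hypothetical `query` and a small list of made-up `candidates`; the helper names are illustrative only.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Scale v to unit length (v must not be the zero vector)."""
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Made-up embeddings, normalized to length 1.
query = normalize([1.0, 1.0])
candidates = [normalize(v) for v in ([2.0, 1.0], [0.0, 3.0], [-1.0, 0.5], [5.0, 5.0])]

# For unit vectors the dot product is the cosine of the angle.
cosines = [dot(query, c) for c in candidates]
distances = [euclidean_distance(query, c) for c in candidates]

# Rank candidate indices: highest cosine first, smallest distance first.
by_cosine = sorted(range(len(candidates)), key=lambda i: -cosines[i])
by_distance = sorted(range(len(candidates)), key=lambda i: distances[i])

print("ranking by cosine:  ", by_cosine)
print("ranking by distance:", by_distance)
```

Both rankings come out the same here because the distance depends on the candidate only through the cosine once the lengths are fixed at 1.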
Review of similarity measures

A similarity measure quantifies the similarity between a pair of examples, relative to other pairs of examples. The two types, manual and supervised, are compared below:
| Type | How to create | Best for | Implications |
| --- | --- | --- | --- |
| Manual | Manually combine feature data. | Small datasets with features that are straightforward to combine. | Gives insight into results of similarity calculations. If feature data changes, you must manually update the similarity measure. |
| Supervised | Measure distance between embeddings generated by a supervised DNN. | Large datasets with hard-to-combine features. | Gives no insight into results. However, a DNN can automatically adapt to changing feature data. |

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License.
Last updated 2025-02-25 UTC.