Uses a script to provide a custom score for returned documents.
The script_score query is useful if, for example, a scoring function is expensive and you only need to calculate the score of a filtered set of documents.
The following script_score query assigns each returned document a score equal to the my-int field value divided by 10.
GET /_search
{ "query": { "script_score": { "query": { "match": { "message": "elasticsearch" } }, "script": { "source": "doc['my-int'].value / 10 " } } } }
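query
(Required, query object) Query used to return documents.
script
(Required, script object) Script used to compute the score of documents returned by the query.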
Important
Final relevance scores from the script_score query cannot be negative. To support certain search optimizations, Lucene requires scores to be positive or 0.
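min_score
(Optional, float) Documents with a score lower than this floating point number are excluded from the search results.
boost
(Optional, float) Documents' scores produced by script are multiplied by boost to produce final documents' scores. Defaults to 1.0.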
Within a script, you can access the _score variable, which represents the current relevance score of a document.
Within a script, you can access the _termStats variable, which provides statistical information about the terms used in the child query of the script_score query.
You can use any of the available Painless functions in your script. You can also use the following predefined functions to customize scoring:
We suggest using these predefined functions instead of writing your own. These functions take advantage of efficiencies from Elasticsearch's internal mechanisms.
saturation(value,k) = value/(k + value)
"script" : {
"source" : "saturation(doc['my-int'].value, 1)"
}
sigmoid(value, k, a) = value^a/ (k^a + value^a)
"script" : {
"source" : "sigmoid(doc['my-int'].value, 2, 1)"
}
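As a rough illustration of the two formulas above, the following Python sketch evaluates them outside Elasticsearch. It is only a model of the arithmetic shown in the definitions, not the Painless implementation; the Python function names are chosen here for illustration.

```python
def saturation(value: float, k: float) -> float:
    """saturation(value, k) = value / (k + value)"""
    return value / (k + value)


def sigmoid(value: float, k: float, a: float) -> float:
    """sigmoid(value, k, a) = value^a / (k^a + value^a)"""
    return value ** a / (k ** a + value ** a)


# When value equals the pivot k, both functions return 0.5.
print(saturation(1, 1))
print(sigmoid(2, 2, 1))
```

For non-negative inputs both functions stay between 0 and 1, which fits the requirement that final scores are not negative.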
The random_score function generates scores that are uniformly distributed from 0 up to but not including 1.
The randomScore function has the following syntax: randomScore(<seed>, <fieldName>). It has a required parameter, seed, as an integer value, and an optional parameter, fieldName, as a string value.
"script" : {
"source" : "randomScore(100, '_seq_no')"
}
If the fieldName parameter is omitted, the internal Lucene document ids will be used as a source of randomness. This is very efficient, but unfortunately not reproducible since documents might be renumbered by merges.
"script" : {
"source" : "randomScore(100)"
}
Note that documents that are within the same shard and have the same value for the field will get the same score, so it is usually desirable to use a field that has unique values for all documents across a shard. A good default choice might be to use the _seq_no field, whose only drawback is that scores will change if the document is updated, since update operations also update the value of the _seq_no field.
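The reproducibility property described above can be sketched in Python. This is not the hashing that Elasticsearch uses internally; random_score_sketch is a hypothetical stand-in that only demonstrates why the same seed and the same field value always yield the same score in the range from 0 up to but not including 1.

```python
import hashlib


def random_score_sketch(seed: int, field_value: int) -> float:
    """Hypothetical stand-in for randomScore(seed, fieldName).

    Derives a deterministic pseudo-random number in [0, 1) from the seed
    and the document's field value (for example its _seq_no).
    """
    digest = hashlib.sha256(f"{seed}:{field_value}".encode("utf-8")).digest()
    # Keep 53 bits so the result is exactly representable and strictly below 1.
    top_53_bits = int.from_bytes(digest[:8], "big") >> 11
    return top_53_bits / float(2 ** 53)


# Same seed and same field value give the same score on every call.
print(random_score_sketch(100, 7) == random_score_sketch(100, 7))
```

Two documents with the same field value would collide on the same score here too, which is why a field with unique values per document is the safer choice.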
You can read more about decay functions in the function_score query documentation.
double decayNumericLinear(double origin, double scale, double offset, double decay, double docValue)
double decayNumericExp(double origin, double scale, double offset, double decay, double docValue)
double decayNumericGauss(double origin, double scale, double offset, double decay, double docValue)
"script" : {
"source" : "decayNumericLinear(params.origin, params.scale, params.offset, params.decay, doc['dval'].value)",
"params": {
"origin": 20,
"scale": 10,
"decay" : 0.5,
"offset" : 0
}
}
Using params allows the script to be compiled only once, even if the params change.
double decayGeoLinear(String originStr, String scaleStr, String offsetStr, double decay, GeoPoint docValue)
double decayGeoExp(String originStr, String scaleStr, String offsetStr, double decay, GeoPoint docValue)
double decayGeoGauss(String originStr, String scaleStr, String offsetStr, double decay, GeoPoint docValue)
"script" : {
"source" : "decayGeoExp(params.origin, params.scale, params.offset, params.decay, doc['location'].value)",
"params": {
"origin": "40, -70.12",
"scale": "200km",
"offset": "0km",
"decay" : 0.2
}
}
double decayDateLinear(String originStr, String scaleStr, String offsetStr, double decay, JodaCompatibleZonedDateTime docValueDate)
double decayDateExp(String originStr, String scaleStr, String offsetStr, double decay, JodaCompatibleZonedDateTime docValueDate)
double decayDateGauss(String originStr, String scaleStr, String offsetStr, double decay, JodaCompatibleZonedDateTime docValueDate)
"script" : {
"source" : "decayDateGauss(params.origin, params.scale, params.offset, params.decay, doc['date'].value)",
"params": {
"origin": "2008-01-01T01:00:00Z",
"scale": "1h",
"offset" : "0",
"decay" : 0.5
}
}
Note
Decay functions on dates are limited to dates in the default format and default time zone. Also, calculations with now are not supported.
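To make the numeric decay functions more concrete, here is a minimal Python sketch. It assumes the script functions follow the decay formulas documented for the function_score query (linear, exponential, and Gaussian decay around an origin); the Python names are illustrative and this is not the Elasticsearch source.

```python
import math


def _distance(origin, offset, doc_value):
    # Distance from the origin, with no decay applied inside the offset.
    return max(0.0, abs(doc_value - origin) - offset)


def decay_numeric_linear(origin, scale, offset, decay, doc_value):
    s = scale / (1.0 - decay)
    return max(0.0, (s - _distance(origin, offset, doc_value)) / s)


def decay_numeric_exp(origin, scale, offset, decay, doc_value):
    lam = math.log(decay) / scale
    return math.exp(lam * _distance(origin, offset, doc_value))


def decay_numeric_gauss(origin, scale, offset, decay, doc_value):
    sigma_sq = -(scale ** 2) / (2.0 * math.log(decay))
    return math.exp(-(_distance(origin, offset, doc_value) ** 2) / (2.0 * sigma_sq))


# Same parameters as the numeric example: origin 20, scale 10, offset 0, decay 0.5.
for dval in (20, 25, 30, 40):
    print(dval, decay_numeric_linear(20, 10, 0, 0.5, dval))
```

Under these formulas, a document whose value lies exactly scale away from the origin (beyond the offset) receives a score equal to decay, and a document at the origin receives 1.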
Functions for vector fields are accessible through the script_score query.
Script score queries will not be executed if search.allow_expensive_queries is set to false.
The script_score query calculates the score for every matching document, or hit. There are faster alternative query types that can efficiently skip non-competitive hits:
rank_feature query.
distance_feature query.
We recommend using the script_score query instead of the function_score query for the simplicity of the script_score query.
You can implement the following functions of the function_score query using the script_score query:
What you used in script_score of the Function Score query, you can copy into the Script Score query. No changes here.
The weight function can be implemented in the Script Score query through the following script:
"script" : {
"source" : "params.weight * _score",
"params": {
"weight": 2
}
}
Use the randomScore function as described in random score function.
The field_value_factor function can be easily implemented through a script:
"script" : {
"source" : "Math.log10(doc['field'].value * params.factor)",
"params" : {
"factor" : 5
}
}
To check whether a document is missing a value, you can use doc['field'].size() == 0. For example, this script will use a value of 1 if a document doesn't have a field named field:
"script" : {
"source" : "Math.log10((doc['field'].size() == 0 ? 1 : doc['field'].value) * params.factor)",
"params" : {
"factor" : 5
}
}
This table lists how field_value_factor modifiers can be implemented through a script:
Modifier      Implementation in script_score
none          -
log           Math.log10(doc['f'].value)
log1p         Math.log10(doc['f'].value + 1)
log2p         Math.log10(doc['f'].value + 2)
ln            Math.log(doc['f'].value)
ln1p          Math.log(doc['f'].value + 1)
ln2p          Math.log(doc['f'].value + 2)
square        Math.pow(doc['f'].value, 2)
sqrt          Math.sqrt(doc['f'].value)
reciprocal    1.0 / doc['f'].value
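The modifier table and the missing-value check can be modelled together in a few lines of Python. This is a sketch of the arithmetic only: field_value_factor, MODIFIERS, and the missing argument are names chosen for this illustration, and the modifier is assumed to be applied to the field value multiplied by the factor, as in the scripts above.

```python
import math

# Python equivalents of the script expressions in the table above.
MODIFIERS = {
    "none": lambda v: v,
    "log": lambda v: math.log10(v),
    "log1p": lambda v: math.log10(v + 1),
    "log2p": lambda v: math.log10(v + 2),
    "ln": lambda v: math.log(v),
    "ln1p": lambda v: math.log(v + 1),
    "ln2p": lambda v: math.log(v + 2),
    "square": lambda v: math.pow(v, 2),
    "sqrt": lambda v: math.sqrt(v),
    "reciprocal": lambda v: 1.0 / v,
}


def field_value_factor(value, factor, modifier="none", missing=1.0):
    """Apply a modifier to value * factor, using `missing` when value is None."""
    base = missing if value is None else value
    return MODIFIERS[modifier](base * factor)


print(field_value_factor(20, 5, "log"))    # log10(20 * 5)
print(field_value_factor(None, 5, "log"))  # falls back to log10(1 * 5)
```

Note that the logarithmic modifiers return a negative number for inputs between 0 and 1, which would conflict with the rule that final scores cannot be negative, so log1p or ln1p may be safer for small values.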
The script_score query has equivalent decay functions that can be used in scripts.
Note
During the calculation of vector functions, all matched documents are linearly scanned. Thus, expect the query time to grow linearly with the number of matched documents. For this reason, we recommend limiting the number of matched documents with a query parameter.
This is the list of available vector functions and vector access methods:
cosineSimilarity - calculates cosine similarity
dotProduct - calculates dot product
l1norm - calculates L1 distance
hamming - calculates Hamming distance
l2norm - calculates L2 distance
doc[<field>].vectorValue - returns a vector's value as an array of floats
doc[<field>].magnitude - returns a vector's magnitude
Note
The cosineSimilarity function is not supported for bit vectors.
Note
The recommended way to access dense vectors is through the cosineSimilarity, dotProduct, l1norm or l2norm functions. Please note, however, that you should call these functions only once per script. For example, don't use these functions in a loop to calculate the similarity between a document vector and multiple other vectors. If you need that functionality, reimplement these functions yourself by accessing vector values directly.
Let's create an index with a dense_vector mapping and index a couple of documents into it.
PUT my-index-000001
{ "mappings": { "properties": { "my_dense_vector": { "type": "dense_vector", "index": false, "dims": 3 }, "my_byte_dense_vector": { "type": "dense_vector", "index": false, "dims": 3, "element_type": "byte" }, "status" : { "type" : "keyword" } } } }
PUT my-index-000001/_doc/1
{ "my_dense_vector": [0.5, 10, 6], "my_byte_dense_vector": [0, 10, 6], "status" : "published" }
PUT my-index-000001/_doc/2
{ "my_dense_vector": [-0.5, 10, 10], "my_byte_dense_vector": [0, 10, 10], "status" : "published" }
POST my-index-000001/_refresh
The cosineSimilarity function calculates the measure of cosine similarity between a given query vector and document vectors.
GET my-index-000001/_search
{ "query": { "script_score": { "query" : { "bool" : { "filter" : { "term" : { "status" : "published" } } } }, "script": { "source": "cosineSimilarity(params.query_vector, 'my_dense_vector') + 1.0", "params": { "query_vector": [4, 3.4, -0.2] } } } } }
Note
If a document's dense vector field has a number of dimensions different from the query's vector, an error will be thrown.
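The cosine similarity script above adds 1.0 because cosine similarity ranges from -1 to 1 and final scores cannot be negative. The Python sketch below reproduces that arithmetic for the two sample documents; it models the formula only and is not how Elasticsearch evaluates the script. The function names are chosen for this illustration.

```python
import math


def cosine_similarity(query_vector, doc_vector):
    """Cosine of the angle between two vectors with the same number of dimensions."""
    if len(query_vector) != len(doc_vector):
        raise ValueError("query and document vectors must have the same number of dimensions")
    dot = sum(q * d for q, d in zip(query_vector, doc_vector))
    q_mag = math.sqrt(sum(q * q for q in query_vector))
    d_mag = math.sqrt(sum(d * d for d in doc_vector))
    return dot / (q_mag * d_mag)


def script_score(query_vector, doc_vector):
    # Mirrors "cosineSimilarity(params.query_vector, 'my_dense_vector') + 1.0".
    return cosine_similarity(query_vector, doc_vector) + 1.0


query_vector = [4, 3.4, -0.2]
docs = {"1": [0.5, 10, 6], "2": [-0.5, 10, 10]}
for doc_id, vector in docs.items():
    print(doc_id, script_score(query_vector, vector))
```

With the shift by 1.0, the score lies between 0 (opposite direction) and 2 (same direction).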
The dotProduct function calculates the measure of dot product between a given query vector and document vectors.
GET my-index-000001/_search
{ "query": { "script_score": { "query" : { "bool" : { "filter" : { "term" : { "status" : "published" } } } }, "script": { "source": """ double value = dotProduct(params.query_vector, 'my_dense_vector'); return sigmoid(1, Math.E, -value); """, "params": { "query_vector": [4, 3.4, -0.2] } } } } }
The l1norm function calculates L1 distance (Manhattan distance) between a given query vector and document vectors.
GET my-index-000001/_search
{ "query": { "script_score": { "query" : { "bool" : { "filter" : { "term" : { "status" : "published" } } } }, "script": { "source": "1 / (1 + l1norm(params.queryVector, 'my_dense_vector'))", "params": { "queryVector": [4, 3.4, -0.2] } } } } }
Unlike cosineSimilarity, which represents similarity, l1norm and l2norm represent distances or differences. This means that the more similar the vectors are, the lower the scores produced by the l1norm and l2norm functions will be. Thus, as we need more similar vectors to score higher, we reversed the output from l1norm and l2norm. Also, to avoid division by 0 when a document vector matches the query exactly, we added 1 in the denominator.
The hamming function calculates Hamming distance between a given query vector and document vectors. It is only available for byte and bit vectors.
GET my-index-000001/_search
{ "query": { "script_score": { "query" : { "bool" : { "filter" : { "term" : { "status" : "published" } } } }, "script": { "source": "(24 - hamming(params.queryVector, 'my_byte_dense_vector')) / 24", "params": { "queryVector": [4, 3, 0] } } } } }
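The constant 24 in the script above corresponds to 3 dimensions of 8 bits each, so the expression turns a bit distance into a similarity between 0 and 1. The following Python sketch shows that arithmetic for byte vectors; it is an illustration under the assumption that Hamming distance counts differing bits, and it uses floating point division, which may differ from how the Painless expression is typed.

```python
def hamming(query_bytes, doc_bytes):
    """Count the bits that differ between two vectors of signed bytes."""
    if len(query_bytes) != len(doc_bytes):
        raise ValueError("vectors must have the same number of dimensions")
    # & 0xFF keeps the low 8 bits so negative byte values are handled correctly.
    return sum(bin((q ^ d) & 0xFF).count("1") for q, d in zip(query_bytes, doc_bytes))


def hamming_score(query_bytes, doc_bytes):
    total_bits = 8 * len(query_bytes)  # 24 for the 3-dimensional example
    return (total_bits - hamming(query_bytes, doc_bytes)) / total_bits


query_vector = [4, 3, 0]
for doc_vector in ([0, 10, 6], [0, 10, 10]):
    print(doc_vector, hamming(query_vector, doc_vector), hamming_score(query_vector, doc_vector))
```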
The l2norm function calculates L2 distance (Euclidean distance) between a given query vector and document vectors.
GET my-index-000001/_search
{ "query": { "script_score": { "query" : { "bool" : { "filter" : { "term" : { "status" : "published" } } } }, "script": { "source": "1 / (1 + l2norm(params.queryVector, 'my_dense_vector'))", "params": { "queryVector": [4, 3.4, -0.2] } } } } }
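The l1norm and l2norm scripts both turn a distance into a score with 1 / (1 + distance). A small Python sketch of that conversion, using the sample vectors from this page, may help to check expected rankings offline. The function names are illustrative and the code models only the arithmetic.

```python
import math


def l1norm(query_vector, doc_vector):
    """Manhattan distance between two vectors."""
    return sum(abs(q - d) for q, d in zip(query_vector, doc_vector))


def l2norm(query_vector, doc_vector):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((q - d) ** 2 for q, d in zip(query_vector, doc_vector)))


def distance_to_score(distance):
    # Smaller distance gives a higher score; the added 1 avoids division by 0.
    return 1.0 / (1.0 + distance)


query_vector = [4, 3.4, -0.2]
for doc_vector in ([0.5, 10, 6], [-0.5, 10, 10]):
    print(
        doc_vector,
        distance_to_score(l1norm(query_vector, doc_vector)),
        distance_to_score(l2norm(query_vector, doc_vector)),
    )
```

An exact match has distance 0 and therefore the maximum score of 1.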
If a document doesn't have a value for a vector field on which a vector function is executed, an error will be thrown.
You can check if a document has a value for the field my_vector with doc['my_vector'].size() == 0. Your overall script can look like this:
"source": "doc['my_vector'].size() == 0 ? 0 : cosineSimilarity(params.queryVector, 'my_vector')"
You can access vector values directly through the following functions:
doc[<field>].vectorValue - returns a vector's value as an array of floats
Note
For bit vectors, it does return a float[], where each element represents 8 bits.
doc[<field>].magnitude - returns a vector's magnitude as a float (for vectors created prior to version 7.5 the magnitude is not stored, so this function calculates it anew every time it is called).
Note
For bit vectors, this is just the square root of the sum of 1 bits.
For example, the script below implements a cosine similarity using these two functions:
GET my-index-000001/_search
{ "query": { "script_score": { "query" : { "bool" : { "filter" : { "term" : { "status" : "published" } } } }, "script": { "source": """ float[] v = doc['my_dense_vector'].vectorValue; float vm = doc['my_dense_vector'].magnitude; float dotProduct = 0; for (int i = 0; i < v.length; i++) { dotProduct += v[i] * params.queryVector[i]; } return dotProduct / (vm * (float) params.queryVectorMag); """, "params": { "queryVector": [4, 3.4, -0.2], "queryVectorMag": 5.25357 } } } } }
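The queryVectorMag parameter in the request above has to be computed by the client before the search is sent. A minimal Python sketch of that computation, assuming the magnitude is the usual Euclidean length of the query vector:

```python
import math


def magnitude(vector):
    """Euclidean length of a vector: the square root of the sum of squares."""
    return math.sqrt(sum(x * x for x in vector))


query_vector = [4, 3.4, -0.2]
# Value to pass as params.queryVectorMag alongside params.queryVector.
print(round(magnitude(query_vector), 5))
```

Passing the precomputed magnitude as a parameter avoids recalculating it inside the script for every document.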
When using bit vectors, not all the vector functions are available. The supported functions are:
hamming - calculates Hamming distance, the sum of the bitwise XOR of the two vectors
l1norm - calculates L1 distance, this is simply the hamming distance
l2norm - calculates L2 distance, this is the square root of the hamming distance
dotProduct - calculates dot product. When comparing two bit vectors, this is the sum of the bitwise AND of the two vectors. If providing a float[] or byte[] that has dims number of elements as a query vector, the dotProduct is the sum of the floating point values using the stored bit vector as a mask.
Note
When comparing floats and bytes with bit vectors, the bit vector is treated as a mask in big-endian order. For example, if the bit vector is 10100001 (e.g. the single byte value 161) and it is compared with the array of values [1, 2, 3, 4, 5, 6, 7, 8], the dotProduct will be 1 + 3 + 8 = 12.
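Both flavours of dotProduct on bit vectors can be sketched in Python. The code below is an illustration of the behaviour described above (bitwise AND for two bit vectors, big-endian masking for float or byte query values), not the Elasticsearch implementation, and the function names are made up for this example.

```python
def bits_big_endian(byte_values):
    """Expand signed bytes into a flat list of bits, most significant bit first."""
    bits = []
    for b in byte_values:
        b &= 0xFF  # treat the signed byte as its unsigned 8-bit pattern
        bits.extend((b >> shift) & 1 for shift in range(7, -1, -1))
    return bits


def dot_product_bits(query_bytes, stored_bytes):
    """Both vectors are bit vectors: count the bits set in the bitwise AND."""
    return sum(bin((q & s) & 0xFF).count("1") for q, s in zip(query_bytes, stored_bytes))


def dot_product_mask(query_values, stored_bytes):
    """Query has one value per bit: sum the values where the stored bit is 1."""
    mask = bits_big_endian(stored_bytes)
    if len(query_values) != len(mask):
        raise ValueError("query must have one value per stored bit")
    return sum(value for value, bit in zip(query_values, mask) if bit)


# 161 is 10100001 in binary, so positions 1, 3 and 8 are selected: 1 + 3 + 8.
print(dot_product_mask([1, 2, 3, 4, 5, 6, 7, 8], [161]))
print(dot_product_bits([8, 5, -15, 1, -7], [8, 5, -15, 1, -7]))
```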
Here is an example of using dot-product with bit vectors.
PUT my-index-bit-vectors
{ "mappings": { "properties": { "my_dense_vector": { "type": "dense_vector", "index": false, "element_type": "bit", "dims": 40 } } } }
PUT my-index-bit-vectors/_doc/1
{ "my_dense_vector": [8, 5, -15, 1, -7] }
PUT my-index-bit-vectors/_doc/2
{ "my_dense_vector": [-1, 115, -3, 4, -128] }
PUT my-index-bit-vectors/_doc/3
{ "my_dense_vector": [2, 18, -5, 0, -124] }
POST my-index-bit-vectors/_refresh
Each indexed vector is given as 5 bytes, that is 5 * 8 = 40 bits, which equals the configured dimensions of the bit vector.
GET my-index-bit-vectors/_search
{ "query": { "script_score": { "query" : { "match_all": {} }, "script": { "source": "dotProduct(params.query_vector, 'my_dense_vector')", "params": { "query_vector": [8, 5, -15, 1, -7] } } } } }
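The query vector above is given as 5 bytes (40 bits), so the dot product is computed with a bitwise & operation with the stored vectors.
GET my-index-bit-vectors/_search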
{ "query": { "script_score": { "query" : { "match_all": {} }, "script": { "source": "dotProduct(params.query_vector, 'my_dense_vector')", "params": { "query_vector": [0.23, 1.45, 3.67, 4.89, -0.56, 2.34, 3.21, 1.78, -2.45, 0.98, -0.12, 3.45, 4.56, 2.78, 1.23, 0.67, 3.89, 4.12, -2.34, 1.56, 0.78, 3.21, 4.12, 2.45, -1.67, 0.34, -3.45, 4.56, -2.78, 1.23, -0.67, 3.89, -4.34, 2.12, -1.56, 0.78, -3.21, 4.45, 2.12, 1.67] } } } } }
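The query vector above has 40 individual values, so the dot product sums the floating point values using the stored bit vector as a mask.
Currently, the cosineSimilarity function is not supported for bit vectors.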
Using an explain request provides an explanation of how the parts of a score were computed. The script_score query can add its own explanation by setting the explanation parameter:
GET /my-index-000001/_explain/0
{ "query": { "script_score": { "query": { "match": { "message": "elasticsearch" } }, "script": { "source": """ long count = doc['count'].value; double normalizedCount = count / 10; if (explanation != null) { explanation.set('normalized count = count / 10 = ' + count + ' / 10 = ' + normalizedCount); } return normalizedCount; """ } } } }
Note that the explanation will be null when used in a normal _search request, so having a conditional guard is best practice.