Relevance, Scoring & Sorting

What You'll Master in This Chapter
Learn what separates amateur from professional search experiences:
- BM25 Scoring Mastery: Deep understanding of how Elasticsearch ranks results and why
- Advanced Sorting Strategies: Script-based, geo-distance, and nested field sorting techniques
- ML-Powered Ranking: Elasticsearch 8.x machine learning inference and Learning to Rank (LTR)
- Semantic Search: Dense vector implementation for context-aware search results
- Professional Debugging: Tools and techniques that reveal exactly why results rank as they do
- A/B Testing Framework: Custom similarity functions and relevance optimization used by industry leaders
Relevance is arguably the most important part of search, but there are dozens of strategies for asymptotically approaching it and just as many factors that affect it.
Everyone wants better search, but "success" means different things to different people:
- as few search-as-you-type keystrokes as possible
- more clicks on the first search result
- increased usage of the search box in general, etc.
The topic is too broad to cover here, so rather than going into the different techniques, I'll refer you to this insightful article (section "Relevancy").
You'll have noticed by now that the Search API response typically includes a _score attribute in each retrieved hit. By default (e.g. with a match_all query), each hit has a score of 1.0. That score is then affected by which queries matched a given doc and how good each match was.
How good a match is brings in the concept of similarity scoring. Since v5.0, scoring in Elasticsearch has been governed by an algorithm called Okapi BM25, which is explained here in great detail.
2025 Update: BM25 remains the default similarity in Elasticsearch 8.x, with the same default parameters (k1 = 1.2, b = 0.75), and the core algorithm is unchanged.
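To get a feel for what BM25 actually computes, here's a minimal Python sketch of the per-term score. It's an illustration under stated assumptions, not Elasticsearch's implementation: it uses the default parameters k1 = 1.2 and b = 0.75, the idf form log(1 + (N - n + 0.5) / (n + 0.5)), and the classic (k1 + 1) factor in the numerator (a constant multiplier that doesn't change the ranking). The function names are hypothetical, and real scores can differ slightly because Lucene stores field lengths in a compressed, lossy form.

```python
import math

def bm25_idf(doc_count: int, docs_with_term: int) -> float:
    """Inverse document frequency: log(1 + (N - n + 0.5) / (n + 0.5))."""
    n = docs_with_term
    return math.log(1 + (doc_count - n + 0.5) / (n + 0.5))

def bm25_term_score(freq: float, doc_len: float, avg_doc_len: float,
                    doc_count: int, docs_with_term: int,
                    k1: float = 1.2, b: float = 0.75) -> float:
    """Classic BM25 score of one query term in one document field (a sketch)."""
    # Length normalization: fields longer than average are penalized; b sets how much.
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    # Term-frequency saturation: each extra occurrence helps less; k1 shapes the curve.
    tf = freq * (k1 + 1) / (freq + norm)
    return bm25_idf(doc_count, docs_with_term) * tf

# One occurrence of the term, average-length field, single-document index:
print(round(bm25_term_score(freq=1, doc_len=3, avg_doc_len=3,
                            doc_count=1, docs_with_term=1), 7))  # prints 0.2876821
```

Raising freq from 1 to 10 in the same call grows the tf part from 1.0 to only about 1.96, which is the term-frequency saturation BM25 is known for.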
Now, when you're completely lost as to why ES assigned a given score to a given doc, or you wonder why the hits are ordered the way they are, you can count on the explain option (also available as the standalone Explain API) to provide a great deal of feedback:
POST index_name/_search?explain=true
{
  "query": {
    "simple_query_string": {
      "query": "abc"
    }
  }
}
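The explanation that comes back is a nested tree: each node has a value, a description and, optionally, a details list of child nodes. Reading it raw gets tedious, so here's a small Python sketch that flattens such a tree into indented lines. The helper name explanation_lines and the sample tree are hypothetical; the sample only mimics the shape of a real _explanation object, and its numbers are made up to be internally consistent.

```python
def explanation_lines(node: dict, depth: int = 0) -> list:
    """Flatten an _explanation tree (value / description / details) into indented lines."""
    lines = [f"{'  ' * depth}{node['value']:.4f}  {node['description']}"]
    # Leaf nodes may omit "details" or carry an empty list; both are handled.
    for child in node.get("details", []):
        lines.extend(explanation_lines(child, depth + 1))
    return lines

# Hypothetical tree in the same shape as a hit's "_explanation" object.
sample = {
    "value": 0.2876821,
    "description": "score for term 'abc' (hypothetical)",
    "details": [
        {"value": 2.2, "description": "boost"},
        {"value": 0.2876821, "description": "idf"},
        {"value": 0.4545454, "description": "tf"},
    ],
}

print("\n".join(explanation_lines(sample)))
```

For a real response, you'd call it on hit["_explanation"] for each hit in response["hits"]["hits"].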
By default, hits are sorted by score in descending order.
If you ever need to randomize the search results (i.e. do the "opposite" of scoring), you can use a function_score query with a random_score function. Conversely, if you need to assign a constant score, use the constant_score query.
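Both queries are easiest to grasp as request bodies. Below they're built as plain Python dicts, ready to be serialized to JSON or handed to a client. The helper names and the sample category field are hypothetical, while function_score, random_score, boost_mode and constant_score are the actual query DSL keys.

```python
import json

def random_order_query(inner_query: dict, seed: int) -> dict:
    """Replace relevance with a reproducible pseudo-random score per document."""
    return {
        "query": {
            "function_score": {
                "query": inner_query,
                # A fixed seed plus a per-document field keeps the order repeatable
                # across requests; note that _seq_no changes when a doc is updated.
                "random_score": {"seed": seed, "field": "_seq_no"},
                # Discard the inner query's score and keep only the random one.
                "boost_mode": "replace",
            }
        }
    }

def constant_score_query(filter_clause: dict, boost: float = 1.0) -> dict:
    """Every doc matching the filter gets the same score, equal to boost."""
    return {"query": {"constant_score": {"filter": filter_clause, "boost": boost}}}

print(json.dumps(random_order_query({"match_all": {}}, seed=42), indent=2))
print(json.dumps(constant_score_query({"term": {"category": "shoes"}}, boost=1.5), indent=2))
```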
More often than not, relevance is established not by the similarity scores but by an ordering based on actual fields in the index. Your site's visitors will want to sort by ↓ price, ↓ sale %, ↑ alphabetically, etc.
Consequently, a typical sort request targeting multiple fields looks like this:
POST index_name/_search
{
  "sort": [
    { "price": { "order": "desc" } },
    { "sale": { "order": "desc" } },
    { "name": { "order": "asc" } }
  ]
}
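The sort clauses are applied left to right, so name only matters for docs that tie on both price and sale. Here's a small in-memory Python emulation of that tie-breaking logic. It assumes every doc has all three fields (Elasticsearch would by default put docs missing a sort field last, which this sketch ignores) and that name is mapped as keyword, since sorting on an analyzed text field fails by default. The helper and the sample products are hypothetical.

```python
def es_like_sort(docs: list, sort_spec: list) -> list:
    """Apply [{field: {"order": ...}}, ...] so that later clauses only break ties."""
    result = list(docs)  # don't mutate the caller's list
    # Python's sort is stable (also with reverse=True), so sorting by the least
    # significant clause first and the most significant one last yields the
    # same order as a multi-key comparison.
    for clause in reversed(sort_spec):
        field, opts = next(iter(clause.items()))
        result.sort(key=lambda d: d[field], reverse=opts.get("order") == "desc")
    return result

products = [
    {"name": "Lamp", "price": 40, "sale": 10},
    {"name": "Desk", "price": 120, "sale": 0},
    {"name": "Chair", "price": 40, "sale": 25},
    {"name": "Bench", "price": 40, "sale": 25},
]
sort_spec = [{"price": {"order": "desc"}}, {"sale": {"order": "desc"}}, {"name": {"order": "asc"}}]

print([d["name"] for d in es_like_sort(products, sort_spec)])  # ['Desk', 'Bench', 'Chair', 'Lamp']
```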
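If the field is an array, you can specify the mode option, which governs which array value is picked for sorting: min and max work for any field, while sum, avg and median only apply to numeric arrays.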
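To see what mode changes, here's an in-memory Python emulation of how a multi-valued field is reduced to a single sort value. The helper names and sample docs are hypothetical; the defaults mirror what the Elasticsearch docs describe (min for ascending sorts, max for descending ones). median uses Python's statistics.median, and I haven't verified that it matches Elasticsearch for even-length arrays, so the example sticks to odd-length ones.

```python
import statistics

def pick_sort_value(values: list, mode=None, order: str = "asc"):
    """Reduce a multi-valued field to one sort value, the way the mode option does."""
    if mode is None:
        # Default: the lowest value for ascending sorts, the highest for descending ones.
        mode = "min" if order == "asc" else "max"
    reducers = {
        "min": min,
        "max": max,
        "sum": sum,
        "avg": lambda v: sum(v) / len(v),
        "median": statistics.median,
    }
    return reducers[mode](values)

def sort_by_array_field(docs: list, field: str, order: str = "asc", mode=None) -> list:
    return sorted(docs, key=lambda d: pick_sort_value(d[field], mode, order),
                  reverse=(order == "desc"))

docs = [
    {"name": "A", "prices": [10, 50, 60]},  # min 10, avg 40
    {"name": "B", "prices": [20, 30, 40]},  # min 20, avg 30
]
print([d["name"] for d in sort_by_array_field(docs, "prices")])              # ['A', 'B']
print([d["name"] for d in sort_by_array_field(docs, "prices", mode="avg")])  # ['B', 'A']
```

The same ascending sort flips its order once mode switches from the default min to avg.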
If you sort on anything other than _score, the hits' _score values will be null. If you'd like to retain the computed scores, turn on the track_scores setting.
As discussed on StackOverflow, here's how to give precedence to entries that contain a substring:
POST employees/_search
{
  "sort": [
    {
      "_script": {
        "script": "return doc['employee.address.keyword'].value.indexOf('US') > -1 ? 1 : 0",
        "type": "number",
        "order": "desc"
      }
    },
    {
      "employee.ename.keyword": {
        "order": "asc"
      }
    }
  ]
}
Script sorting has some pretty advanced applications, such as date conditionals, parameterized current timestamps, and giving more weight to terms that appear earlier. Notice that, unlike other sorting mechanisms, script sorts always need to specify the type of the value they return (e.g. number or string), since it cannot be inferred from an inline script.
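To make the two-level ordering of the script sort above explicit, here's a Python emulation over in-memory docs: the script yields 1 or 0 per doc, that value is sorted descending, and ename then breaks ties alphabetically. The sample employees are hypothetical, and the nested dicts stand in for the employee.address.keyword and employee.ename.keyword doc values. Note that indexOf is a plain, case-sensitive substring check, so an address like "Sydney, AUSTRALIA" gets promoted too.

```python
def us_priority(doc: dict) -> int:
    # Mirrors the Painless expression: address.indexOf('US') > -1 ? 1 : 0
    return 1 if "US" in doc["employee"]["address"] else 0

employees = [
    {"employee": {"ename": "Zoe", "address": "Austin, US"}},
    {"employee": {"ename": "Adam", "address": "Berlin, DE"}},
    {"employee": {"ename": "Mia", "address": "Sydney, AUSTRALIA"}},
    {"employee": {"ename": "Bob", "address": "Boston, US"}},
]

# Negating the script value sorts it descending while ename stays ascending.
ranked = sorted(employees, key=lambda d: (-us_priority(d), d["employee"]["ename"]))
print([d["employee"]["ename"] for d in ranked])  # ['Bob', 'Mia', 'Zoe', 'Adam']
```

In Elasticsearch itself, a doc with no value in that field makes doc[...].value throw, so a doc['employee.address.keyword'].size() == 0 check is worth adding to the real script.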
If you intend to sort places by their distance from a static geo point, you could say:
POST places/_search
{
  "sort": [
    {
      "_geo_distance": {
        "location": [ -73.9982, 40.7388 ],
        "order": "asc"
      }
    }
  ]
}
This is further discussed here and is equivalent to a script sort of the form
POST places/_search
{
  "sort": [
    {
      "_script": {
        "script": "doc['location'].arcDistance(40.7388, -73.9982)",
        "type": "number",
        "order": "asc"
      }
    }
  ]
}
but notice the reversed coordinate order: the former specifies them as [lon, lat] (GeoJSON order), the latter as lat, lon.
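The distance behind both forms is a great-circle distance in meters. Here's a minimal Python sketch using the haversine formula, which also shows where the coordinate order bites: the [lon, lat] array has to be flipped before it's fed to a lat, lon function. The mean Earth radius of 6,371,008.7714 m is my assumption of the value Elasticsearch uses, and the helper names and sample places are hypothetical.

```python
import math

EARTH_MEAN_RADIUS_M = 6371008.7714  # assumption: the mean radius Elasticsearch uses

def arc_distance_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine great-circle distance in meters; arguments in lat, lon order."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp against rounding pushing sqrt(h) slightly above 1 for near-antipodal points.
    return 2 * EARTH_MEAN_RADIUS_M * math.asin(min(1.0, math.sqrt(h)))

def lon_lat_array_to_lat_lon(point: list) -> tuple:
    """Turn a GeoJSON-style [lon, lat] array into a (lat, lon) tuple."""
    lon, lat = point
    return lat, lon

origin = lon_lat_array_to_lat_lon([-73.9982, 40.7388])  # -> (40.7388, -73.9982)
places = {"A": (40.7484, -73.9857), "B": (40.6892, -74.0445), "C": (40.7410, -73.9897)}

# Ascending by distance, like the _geo_distance sort with "order": "asc".
ranked = sorted(places, key=lambda name: arc_distance_m(*origin, *places[name]))
print(ranked)  # ['C', 'A', 'B']
```

Feeding the raw array [-73.9982, 40.7388] straight into arc_distance_m would place the origin near Antarctica, more than 10,000 km away from every place.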
This is analogous to the structure of queries on nested fields: