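Elasticsearch: How to make a document always rank first in search results
May 25, 2021 | by mebius
In many situations we need a particular document to rank first. On an eCommerce site, for example, we may want to boost one or more products so that they always appear ahead of every other document. In a discussion forum, we may want to pin certain posts to the top. On a search site, a customer who pays more may want their ad to appear in the first position of the first results page. This is known as paid (bid) ranking.
In my earlier article "Elasticsearch: Using a pinned query to boost document ranking (new in 7.5)", I introduced the pinned query, which promotes a given set of document IDs so that they rank at the top. This is useful in many scenarios, but it has two prerequisites: we must know the documents' IDs in advance, and the feature is only available in version 7.5 and later.
In this article I introduce a more general approach: sorting with a script. The downside of a script is that it must be evaluated for every matching document, so on large data sets it adds computational overhead.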
Preparing the data
For this exercise we use the following data:
POST _bulk
{ "index" : { "_index" : "twitter", "_id": 1} }
{"user":"张三","message":"今儿天气不错啊,出去转转去","uid":"1","city":"北京","province":"北京","country":"中国","address":"中国北京市海淀区","location":{"lat":"39.970718","lon":"116.325747"}, "DOB":"1980-12-01"}
{ "index" : { "_index" : "twitter", "_id": 2 }}
{"user":"老刘","message":"出发,下一站云南!","uid":"2", "city":"北京","province":"北京","country":"中国","address":"中国北京市东城区台基厂三条3号","location":{"lat":"39.904313","lon":"116.412754"}, "DOB":"1981-12-01"}
{ "index" : { "_index" : "twitter", "_id": 3} }
{"user":"李四","message":"happy birthday!","uid":"3","city":"北京","province":"北京","country":"中国","address":"中国北京市东城区","location":{"lat":"39.893801","lon":"116.408986"}, "DOB":"1982-12-01"}
{ "index" : { "_index" : "twitter", "_id": 4} }
{"user":"老贾","message":"123,gogogo","uid":"4","city":"北京","province":"北京","country":"中国","address":"中国北京市朝阳区建国门","location":{"lat":"39.718256","lon":"116.367910"}, "DOB":"1983-12-01"}
{ "index" : { "_index" : "twitter", "_id": 5} }
{"user":"老王","message":"Happy BirthDay My Friend!","uid":"5","city":"北京","province":"北京","country":"中国","address":"中国北京市朝阳区国贸","location":{"lat":"39.918256","lon":"116.467910"}, "DOB":"1984-12-01"}
{ "index" : { "_index" : "twitter", "_id": 6} }
{"user":"老吴","message":"好友来了都今天我生日,好友来了,什么 birthday happy 就成!","uid":"6","city":"上海","province":"上海","country":"中国","address":"中国上海市闵行区","location":{"lat":"31.175927","lon":"121.383328"}, "DOB":"1985-12-01"}
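The _bulk body above is newline-delimited JSON: each action line (which index and _id to write) is followed by the document's source line, and the whole body must end with a newline. As a minimal sketch, assuming you want to generate such a body programmatically, the Python below builds one; the helper name build_bulk_body is hypothetical and the documents are trimmed to a few fields.

```python
import json


def build_bulk_body(index, docs):
    """Hypothetical helper: build an NDJSON body for the _bulk API."""
    lines = []
    for doc_id, source in docs:
        # Action/metadata line: target index and document _id
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        # Source line: the document itself; ensure_ascii=False keeps Chinese text readable
        lines.append(json.dumps(source, ensure_ascii=False))
    # The _bulk API requires the body to end with a newline
    return "\n".join(lines) + "\n"


# Trimmed versions of two of the documents above
docs = [
    (1, {"user": "张三", "uid": "1", "city": "北京"}),
    (6, {"user": "老吴", "uid": "6", "city": "上海"}),
]
body = build_bulk_body("twitter", docs)
print(body)
```

The resulting string can be sent as the body of POST _bulk with the Content-Type application/x-ndjson.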
Searching the data
First, let's find all users located in Beijing (北京):
GET twitter/_search
{
  "query": {
    "match": {
      "city": "北京"
    }
  }
}
Running the search above returns the following result:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 5,
"relation" : "eq"
},
"max_score" : 0.48232412,
"hits" : [
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.48232412,
"_source" : {
"user" : "张三",
"message" : "今儿天气不错啊,出去转转去",
"uid" : "1",
"city" : "北京",
"province" : "北京",
"country" : "中国",
"address" : "中国北京市海淀区",
"location" : {
"lat" : "39.970718",
"lon" : "116.325747"
},
"DOB" : "1980-12-01"
}
},
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.48232412,
"_source" : {
"user" : "老刘",
"message" : "出发,下一站云南!",
"uid" : "2",
"city" : "北京",
"province" : "北京",
"country" : "中国",
"address" : "中国北京市东城区台基厂三条3号",
"location" : {
"lat" : "39.904313",
"lon" : "116.412754"
},
"DOB" : "1981-12-01"
}
},
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "3",
"_score" : 0.48232412,
"_source" : {
"user" : "李四",
"message" : "happy birthday!",
"uid" : "3",
"city" : "北京",
"province" : "北京",
"country" : "中国",
"address" : "中国北京市东城区",
"location" : {
"lat" : "39.893801",
"lon" : "116.408986"
},
"DOB" : "1982-12-01"
}
},
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "4",
"_score" : 0.48232412,
"_source" : {
"user" : "老贾",
"message" : "123,gogogo",
"uid" : "4",
"city" : "北京",
"province" : "北京",
"country" : "中国",
"address" : "中国北京市朝阳区建国门",
"location" : {
"lat" : "39.718256",
"lon" : "116.367910"
},
"DOB" : "1983-12-01"
}
},
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "5",
"_score" : 0.48232412,
"_source" : {
"user" : "老王",
"message" : "Happy BirthDay My Friend!",
"uid" : "5",
"city" : "北京",
"province" : "北京",
"country" : "中国",
"address" : "中国北京市朝阳区国贸",
"location" : {
"lat" : "39.918256",
"lon" : "116.467910"
},
"DOB" : "1984-12-01"
}
}
]
}
}
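The result shows that the document with uid 1 comes first, even though its score is the same as that of every other document.
Next, suppose we want to boost the documents with uid 2 and 3 so that they appear at the top of the results. How can we do that? We can use the following request: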
GET twitter/_search
{
  "query": {
    "match": {
      "city": "北京"
    }
  },
  "sort": [
    {
      "_script": {
        "type": "number",
        "script": {
          "source": "Boolean.compare(params.ids.contains(doc['uid.keyword'].value), false);",
          "lang": "painless",
          "params": {
            "ids": [
              "2",
              "3"
            ]
          }
        },
        "order": "desc"
      }
    },
    {
      "_score": {
        "order": "desc"
      }
    }
  ]
}
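Here a script computes a number for each document, and the hits are sorted by that number. The script reads doc['uid.keyword'], the keyword sub-field that dynamic mapping creates for the string field uid, and uses Boolean.compare to turn "is this uid in ids?" into 1 (yes) or 0 (no). Sorting in desc order therefore puts the listed documents first, and the second sort key, _score, orders documents that share the same value. The ids parameter is an array: fill it with the uid values of the documents you want to promote.
The search returns: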
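To make the two-key ordering concrete, here is a small Python simulation of the sort array, not Elasticsearch code: pinned_flag mirrors what Boolean.compare(params.ids.contains(...), false) returns, and sort_hits (a hypothetical name) orders hits by that flag first and by _score second, both descending. The hits are the five Beijing documents with the score from the first search.

```python
def pinned_flag(uid, ids):
    # Same value as Boolean.compare(params.ids.contains(uid), false):
    # 1 when the uid is listed in ids, 0 otherwise
    return 1 if uid in ids else 0


def sort_hits(hits, ids):
    """Order hits like the sort array: pinned flag desc, then _score desc."""
    # sorted() stays stable even with reverse=True, so hits with identical keys
    # keep their input order; Elasticsearch applies its own internal tie-break
    return sorted(
        hits,
        key=lambda h: (pinned_flag(h["uid"], ids), h["_score"]),
        reverse=True,
    )


# The five Beijing hits from the first search, all with the same score
hits = [{"uid": str(i), "_score": 0.48232412} for i in range(1, 6)]
order = [h["uid"] for h in sort_hits(hits, ["2", "3"])]
print(order)  # ['2', '3', '1', '4', '5']
```

Because the flag is compared before _score, a listed document with a low score still sorts ahead of an unlisted document with a high score.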
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 5,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.48232412,
"_source" : {
"user" : "老刘",
"message" : "出发,下一站云南!",
"uid" : "2",
"city" : "北京",
"province" : "北京",
"country" : "中国",
"address" : "中国北京市东城区台基厂三条3号",
"location" : {
"lat" : "39.904313",
"lon" : "116.412754"
},
"DOB" : "1981-12-01"
},
"sort" : [
1.0,
0.48232412
]
},
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "3",
"_score" : 0.48232412,
"_source" : {
"user" : "李四",
"message" : "happy birthday!",
"uid" : "3",
"city" : "北京",
"province" : "北京",
"country" : "中国",
"address" : "中国北京市东城区",
"location" : {
"lat" : "39.893801",
"lon" : "116.408986"
},
"DOB" : "1982-12-01"
},
"sort" : [
1.0,
0.48232412
]
},
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.48232412,
"_source" : {
"user" : "张三",
"message" : "今儿天气不错啊,出去转转去",
"uid" : "1",
"city" : "北京",
"province" : "北京",
"country" : "中国",
"address" : "中国北京市海淀区",
"location" : {
"lat" : "39.970718",
"lon" : "116.325747"
},
"DOB" : "1980-12-01"
},
"sort" : [
0.0,
0.48232412
]
},
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "4",
"_score" : 0.48232412,
"_source" : {
"user" : "老贾",
"message" : "123,gogogo",
"uid" : "4",
"city" : "北京",
"province" : "北京",
"country" : "中国",
"address" : "中国北京市朝阳区建国门",
"location" : {
"lat" : "39.718256",
"lon" : "116.367910"
},
"DOB" : "1983-12-01"
},
"sort" : [
0.0,
0.48232412
]
},
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "5",
"_score" : 0.48232412,
"_source" : {
"user" : "老王",
"message" : "Happy BirthDay My Friend!",
"uid" : "5",
"city" : "北京",
"province" : "北京",
"country" : "中国",
"address" : "中国北京市朝阳区国贸",
"location" : {
"lat" : "39.918256",
"lon" : "116.467910"
},
"DOB" : "1984-12-01"
},
"sort" : [
0.0,
0.48232412
]
}
]
}
}
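As the result shows, the documents with uid 2 and 3 now appear at the very top of the search results.
Source: the Internet. Elasticsearch: How to make a document always rank first in search results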
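One limit is worth keeping in mind: the script sort only reorders documents that the query has already matched. A uid listed in ids whose document does not match the query, such as uid 6 (city 上海), is not returned at all. The Python sketch below simulates this in two steps; plain string equality stands in for the match query on city, which is a simplification, and the function name search is hypothetical.

```python
# Simplified copy of the six indexed documents (only the fields used here)
DOCS = [
    {"uid": "1", "city": "北京"},
    {"uid": "2", "city": "北京"},
    {"uid": "3", "city": "北京"},
    {"uid": "4", "city": "北京"},
    {"uid": "5", "city": "北京"},
    {"uid": "6", "city": "上海"},
]


def search(docs, city, ids):
    # Step 1: the query decides WHICH documents are hits
    # (equality stands in for the match query on city)
    hits = [d for d in docs if d["city"] == city]
    # Step 2: the script sort only decides the ORDER of those hits
    return sorted(hits, key=lambda d: 1 if d["uid"] in ids else 0, reverse=True)


result = [d["uid"] for d in search(DOCS, "北京", ["6", "3"])]
print(result)  # ['3', '1', '2', '4', '5']
```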