Elasticsearch: Aggregating an Elasticsearch Index with Java
September 2, 2021 | by mebius
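Aggregations are a powerful tool in Elasticsearch: they let you compute the minimum, maximum, average, and other statistics of a field. I have written a number of earlier articles about Elasticsearch aggregations, for example "Elasticsearch: Introduction to aggregation". For more on aggregations, see the "Aggregations" section of the article "Elastic: A Beginner's Guide".
There are different types of aggregations, each with its own purpose. In today's example, I briefly cover simple aggregations like the ones we know from SQL: sum, average, minimum, maximum, and count, plus cardinality (the number of distinct values).
I will not go into the meaning of each aggregation here. Instead, the focus is on how to use the Java client API to run these aggregations and read their results. For an introduction to the Java client API, see the official Elastic website.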
Installation
If you have not yet installed Elasticsearch and Kibana, follow these articles:
- How to install Elasticsearch on Linux, macOS, and Windows
- Kibana: How to install Kibana from the Elastic Stack on Linux, macOS, and Windows
Creating the Java client
First, use your favorite Java development tool, such as Eclipse or IntelliJ, to create a simple Maven project:
pom.xml
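<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>org.liuxg.demo</groupId>
    <artifactId>elasticjava</artifactId>
    <version>1.0-SNAPSHOT</version>
    <properties>
        <maven.compiler.source>15</maven.compiler.source>
        <maven.compiler.target>15</maven.compiler.target>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.elasticsearch.client</groupId>
            <artifactId>elasticsearch-rest-high-level-client</artifactId>
            <version>7.11.2</version>
        </dependency>
        <dependency>
            <groupId>org.elasticsearch.client</groupId>
            <artifactId>elasticsearch-rest-client</artifactId>
            <version>7.11.2</version>
        </dependency>
        <dependency>
            <groupId>org.elasticsearch</groupId>
            <artifactId>elasticsearch</artifactId>
            <version>7.11.2</version>
        </dependency>
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-databind</artifactId>
            <version>2.11.1</version>
        </dependency>
    </dependencies>
</project>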
You need to add the corresponding dependencies.
Next, enter the following command directly in Kibana to create a simple index named classindex:
POST classindex/_bulk
{ "index" : {"_id": 1} }
{ "classname" : "Galaxy", "cource" : "Physics","instructor":"Sheldon Kooper","language":"English","seats available":18,"fees" : 6000 }
{ "index" : {"_id": 2} }
{ "classname" : "Galaxy", "cource" : "Chemistry","instructor":"Tom Nelson","language":"English","seats available":20,"fees" : 4000}
{ "index" : {"_id": 3} }
{ "classname" : "Galaxy","cource" : "Maths","instructor":"Smith Ray","language":"English","seats available":25,"fees" : 3000 }
{ "index" : {"_id": 4} }
{ "classname" : "Galaxy", "cource" : "Biology","instructor":"Tom Nelson","language":"English","seats available":12,"fees" : 2000 }
{ "index" : {"_id": 5} }
{ "classname" : "Galaxy", "cource" : "Social Science","instructor":"Ric Johanson","language":"English","seats available":10,"fees" : 3000 }
If you would rather not create the index by hand, see my earlier article "Elasticsearch: Java usage examples" for how to create it from a client application.
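As a rough illustration of the client-side route, the sketch below only assembles the newline-delimited body that the _bulk endpoint expects, using plain Java and no Elasticsearch classes. Sending the body is left out. The class name `BulkBodySketch` and the method `bulkBody` are hypothetical, and the two documents are abbreviated versions of the sample data.

```java
import java.util.List;

// Hypothetical helper (not part of any client library): builds a _bulk body.
final class BulkBodySketch {

    // Emits one action line and one source line per document. Every line,
    // including the last one, must end with a newline for the _bulk API.
    static String bulkBody(List<String> sources) {
        StringBuilder body = new StringBuilder();
        int id = 1;
        for (String source : sources) {
            body.append("{ \"index\" : {\"_id\": ").append(id++).append("} }\n");
            body.append(source).append('\n');
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // Abbreviated documents; the full sample data has more fields.
        List<String> sources = List.of(
            "{ \"classname\" : \"Galaxy\", \"cource\" : \"Physics\", \"fees\" : 6000 }",
            "{ \"classname\" : \"Galaxy\", \"cource\" : \"Chemistry\", \"fees\" : 4000 }");
        System.out.print(bulkBody(sources));
    }
}
```

The resulting string has the same shape as the lines typed into Kibana above, so it could be used as the body of a POST to classindex/_bulk by whatever HTTP or Elasticsearch client you prefer.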
Next, we create a class named Aggregation with the following content:
Aggregation.java
import java.io.IOException;
import java.util.Arrays;
import java.util.Map;

import org.apache.http.HttpHost;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.metrics.Avg;
import org.elasticsearch.search.aggregations.metrics.Cardinality;
import org.elasticsearch.search.aggregations.metrics.Max;
import org.elasticsearch.search.aggregations.metrics.Min;
import org.elasticsearch.search.aggregations.metrics.Sum;
import org.elasticsearch.search.aggregations.metrics.ValueCount;
import org.elasticsearch.search.builder.SearchSourceBuilder;

public class Aggregation {

    public static void main(String[] args) {
        // Connect to Elasticsearch; try-with-resources closes the client on exit.
        try (RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")))) {

            // Target the classindex index.
            SearchRequest searchRequest = new SearchRequest();
            searchRequest.indices("classindex");

            // Match all documents and request six metric aggregations on "fees".
            SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
            searchSourceBuilder.query(QueryBuilders.matchAllQuery());
            searchSourceBuilder.aggregation(AggregationBuilders.sum("sum").field("fees"));
            searchSourceBuilder.aggregation(AggregationBuilders.avg("avg").field("fees"));
            searchSourceBuilder.aggregation(AggregationBuilders.min("min").field("fees"));
            searchSourceBuilder.aggregation(AggregationBuilders.max("max").field("fees"));
            searchSourceBuilder.aggregation(AggregationBuilders.cardinality("cardinality").field("fees"));
            searchSourceBuilder.aggregation(AggregationBuilders.count("count").field("fees"));
            searchRequest.source(searchSourceBuilder);

            SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);

            // Print the source of every returned document.
            if (searchResponse.getHits().getTotalHits().value > 0) {
                for (SearchHit hit : searchResponse.getHits().getHits()) {
                    Map<String, Object> map = hit.getSourceAsMap();
                    System.out.println("Index data:" + Arrays.toString(map.entrySet().toArray()));
                }
            }

            // Look up each aggregation result by the name it was registered under.
            Sum sum = searchResponse.getAggregations().get("sum");
            double result = sum.getValue();
            System.out.println("aggs Sum: " + result);

            Avg aggAvg = searchResponse.getAggregations().get("avg");
            double valueAvg = aggAvg.getValue();
            System.out.println("aggs Avg::" + valueAvg);

            Min aggMin = searchResponse.getAggregations().get("min");
            double minOutput = aggMin.getValue();
            System.out.println("aggs Min::" + minOutput);

            Max aggMax = searchResponse.getAggregations().get("max");
            double maxOutput = aggMax.getValue();
            System.out.println("aggs Max::" + maxOutput);

            Cardinality aggCardinality = searchResponse.getAggregations().get("cardinality");
            long valueCardinality = aggCardinality.getValue();
            System.out.println("aggs Cadinality::" + valueCardinality);

            ValueCount aggCount = searchResponse.getAggregations().get("count");
            long valueCount = aggCount.getValue();
            System.out.println("aggs Count::" + valueCount);
        } catch (IOException ex) {
            ex.printStackTrace();
        }
    }
}
Note the following lines in the code above:
RestHighLevelClient client = new RestHighLevelClient(
RestClient.builder(new HttpHost("localhost", 9200, "http")));
Change the host and port to match your own Elasticsearch instance. This code is responsible for connecting to Elasticsearch.
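One way to avoid editing the source each time the address changes is to read it from configuration. The sketch below is a minimal illustration in plain Java: it parses an optional "host:port" string and falls back to localhost and 9200. The class `EsAddressSketch` and the environment variable name `ES_ADDRESS` are assumptions made for this example, not something the client library defines.

```java
// Hypothetical helper: resolves the Elasticsearch host and port from an
// optional "host:port" string, defaulting to localhost:9200.
final class EsAddressSketch {
    static final String DEFAULT_HOST = "localhost";
    static final int DEFAULT_PORT = 9200;

    final String host;
    final int port;

    EsAddressSketch(String host, int port) {
        this.host = host;
        this.port = port;
    }

    // Accepts null, "host", or "host:port"; a non-numeric port throws
    // NumberFormatException.
    static EsAddressSketch parse(String value) {
        if (value == null || value.isBlank()) {
            return new EsAddressSketch(DEFAULT_HOST, DEFAULT_PORT);
        }
        String trimmed = value.trim();
        int colon = trimmed.lastIndexOf(':');
        if (colon < 0) {
            return new EsAddressSketch(trimmed, DEFAULT_PORT);
        }
        String host = trimmed.substring(0, colon);
        int port = Integer.parseInt(trimmed.substring(colon + 1));
        return new EsAddressSketch(host.isEmpty() ? DEFAULT_HOST : host, port);
    }

    public static void main(String[] args) {
        // ES_ADDRESS is an assumed variable name, e.g. "localhost:9200".
        EsAddressSketch address = parse(System.getenv("ES_ADDRESS"));
        System.out.println(address.host + ":" + address.port);
    }
}
```

The parsed values would then take the place of the literals, as in new HttpHost(address.host, address.port, "http").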
Above, we call searchRequest.indices("classindex"); to set the index. We then run the following aggregations on the fees field:
- sum
- avg
- min
- max
- cardinality
- count
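To relate the six names above to the search DSL, the following sketch prints a request body that should be roughly equivalent to what the Java builders produce. It uses only plain Java string handling; the class `AggsBodySketch` and its methods are hypothetical. The one mapping worth noting is that AggregationBuilders.count corresponds to the value_count aggregation type.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: renders the aggregations from this article as a
// search request body.
final class AggsBodySketch {

    // Aggregation name used in the article -> aggregation type in the DSL.
    static Map<String, String> aggregationTypes() {
        Map<String, String> types = new LinkedHashMap<>();
        types.put("sum", "sum");
        types.put("avg", "avg");
        types.put("min", "min");
        types.put("max", "max");
        types.put("cardinality", "cardinality");
        types.put("count", "value_count"); // count(...) builds a value_count aggregation
        return types;
    }

    // Builds a match_all query with one aggregation per entry on the given field.
    static String searchBody(String field) {
        String aggs = aggregationTypes().entrySet().stream()
            .map(e -> "    \"" + e.getKey() + "\": { \"" + e.getValue()
                    + "\": { \"field\": \"" + field + "\" } }")
            .collect(Collectors.joining(",\n"));
        return "{\n  \"query\": { \"match_all\": {} },\n  \"aggs\": {\n" + aggs + "\n  }\n}";
    }

    public static void main(String[] args) {
        System.out.println(searchBody("fees"));
    }
}
```

The printed JSON can be pasted into Kibana after GET classindex/_search to compare the raw response with the values the Java program reads.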
Compile and run the code above, and you should see the following output:
Index data:[fees=6000, classname=Galaxy, instructor=Sheldon Kooper, seats available=18, cource=Physics, language=English]
Index data:[fees=4000, classname=Galaxy, instructor=Tom Nelson, seats available=20, cource=Chemistry, language=English]
Index data:[fees=3000, classname=Galaxy, instructor=Smith Ray, seats available=25, cource=Maths, language=English]
Index data:[fees=2000, classname=Galaxy, instructor=Tom Nelson, seats available=12, cource=Biology, language=English]
Index data:[fees=3000, classname=Galaxy, instructor=Ric Johanson, seats available=10, cource=Social Science, language=English]
aggs Sum: 18000.0
aggs Avg::3600.0
aggs Min::2000.0
aggs Max::6000.0
aggs Cadinality::4
aggs Count::5
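As a sanity check, the same figures can be reproduced locally from the five fees values in the sample data. The sketch below uses only the Java standard library; the class name `FeesCheckSketch` is hypothetical. Keep in mind that the cardinality aggregation is approximate in general, so an exact distinct count like the one here is only expected to agree on small data sets such as this one.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;

// Hypothetical check: recomputes the aggregation results in plain Java.
final class FeesCheckSketch {
    // The "fees" values of the five sample documents.
    static final double[] FEES = {6000, 4000, 3000, 2000, 3000};

    // Sum, average, min, max, and count in a single pass.
    static DoubleSummaryStatistics stats(double[] values) {
        return Arrays.stream(values).summaryStatistics();
    }

    // Exact number of distinct values (the local analogue of cardinality).
    static long distinctCount(double[] values) {
        return Arrays.stream(values).distinct().count();
    }

    public static void main(String[] args) {
        DoubleSummaryStatistics s = stats(FEES);
        System.out.println("sum=" + s.getSum());                  // sum=18000.0
        System.out.println("avg=" + s.getAverage());              // avg=3600.0
        System.out.println("min=" + s.getMin());                  // min=2000.0
        System.out.println("max=" + s.getMax());                  // max=6000.0
        System.out.println("cardinality=" + distinctCount(FEES)); // cardinality=4
        System.out.println("count=" + s.getCount());              // count=5
    }
}
```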