Elasticsearch: Aggregating an Elasticsearch index with Java

September 2, 2021   |   by mebius

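Aggregations are a powerful tool in Elasticsearch: they let you compute the minimum, maximum, average, and other statistics of a field. I have written many earlier articles about Elasticsearch aggregations, such as "Elasticsearch:aggregation 介绍". For more on aggregations, see the "Aggregations" section of the article "Elastic:菜鸟上手指南".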

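There are different types of aggregations, each with its own purpose. In today's example, I will briefly cover the simple aggregations that resemble the ones we know from SQL: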

[Figure: SQL-style simple aggregations]

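I will not go into the meaning of each aggregation here. Instead, we focus on how to use the Java client API to request the corresponding aggregations and read their results. For an introduction to the Java client API, see the official Elastic website.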

Installation

If you have not yet installed Elasticsearch and Kibana, please install them first.

Creating the Java client

First, we use a Java development tool of our choice, such as Eclipse or IntelliJ IDEA, to create a simple Maven project:

pom.xml



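<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.liuxg.demo</groupId>
    <artifactId>elasticjava</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <maven.compiler.source>15</maven.compiler.source>
        <maven.compiler.target>15</maven.compiler.target>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.elasticsearch.client</groupId>
            <artifactId>elasticsearch-rest-high-level-client</artifactId>
            <version>7.11.2</version>
        </dependency>
        <dependency>
            <groupId>org.elasticsearch.client</groupId>
            <artifactId>elasticsearch-rest-client</artifactId>
            <version>7.11.2</version>
        </dependency>
        <dependency>
            <groupId>org.elasticsearch</groupId>
            <artifactId>elasticsearch</artifactId>
            <version>7.11.2</version>
        </dependency>
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-databind</artifactId>
            <version>2.11.1</version>
        </dependency>
    </dependencies>
</project>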

You need to add the corresponding dependencies to the project.

Next, we can enter the following command directly in Kibana to create a simple index named classindex:

POST classindex/_bulk
{ "index" : {"_id": 1} }
{ "classname" : "Galaxy", "cource" : "Physics","instructor":"Sheldon Kooper","language":"English","seats available":18,"fees" : 6000 }
{ "index" : {"_id": 2} }
{ "classname" : "Galaxy", "cource" : "Chemistry","instructor":"Tom Nelson","language":"English","seats available":20,"fees" : 4000}
{ "index" : {"_id": 3} }
{ "classname" : "Galaxy","cource" : "Maths","instructor":"Smith Ray","language":"English","seats available":25,"fees" : 3000 }
{ "index" : {"_id": 4} }
{ "classname" : "Galaxy", "cource" : "Biology","instructor":"Tom Nelson","language":"English","seats available":12,"fees" : 2000 }
{ "index" : {"_id": 5} }
{ "classname" : "Galaxy", "cource" : "Social Science","instructor":"Ric Johanson","language":"English","seats available":10,"fees" : 3000 }

If you would rather not create the index by hand, see my earlier article "Elasticsearch:Java 运用示例" to create it from a client application instead.
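For readers who take the client route, the body sent to the `_bulk` endpoint is the same newline-delimited JSON shown above: one action line followed by one source line per document, with a trailing newline at the end. The following is a minimal sketch of how such a body could be assembled with only the JDK. The class name `BulkBody` and its `build` helper are hypothetical and not part of any Elasticsearch library, and the documents are shortened versions of the sample data.

```java
import java.util.List;

class BulkBody {
    // Builds an NDJSON bulk body: an "index" action line with a sequential _id,
    // followed by the document source, each terminated by a newline.
    static String build(List<String> sources) {
        StringBuilder sb = new StringBuilder();
        int id = 1;
        for (String source : sources) {
            sb.append("{\"index\":{\"_id\":").append(id++).append("}}\n");
            sb.append(source).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Shortened sample documents (assumption: only a few fields are kept here).
        List<String> docs = List.of(
                "{\"classname\":\"Galaxy\",\"cource\":\"Physics\",\"fees\":6000}",
                "{\"classname\":\"Galaxy\",\"cource\":\"Chemistry\",\"fees\":4000}");
        System.out.print(build(docs));
    }
}
```

The resulting string would be sent with the Content-Type `application/x-ndjson`. Each source document has to stay on a single line, because the bulk format uses the newline as its separator.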

Next, we create a class named Aggregation with the following content:

Aggregation.java

import java.io.IOException;
import java.util.Arrays;
import java.util.Map;

import org.apache.http.HttpHost;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.metrics.Avg;
import org.elasticsearch.search.aggregations.metrics.Cardinality;
import org.elasticsearch.search.aggregations.metrics.Max;
import org.elasticsearch.search.aggregations.metrics.Min;
import org.elasticsearch.search.aggregations.metrics.Sum;
import org.elasticsearch.search.aggregations.metrics.ValueCount;
import org.elasticsearch.search.builder.SearchSourceBuilder;

public class Aggregation {
    @SuppressWarnings("resource")
    public static void main(String[] args) {

        RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")));

        SearchRequest searchRequest = new SearchRequest();
        searchRequest.indices("classindex");
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        searchSourceBuilder.query(QueryBuilders.matchAllQuery());

        searchSourceBuilder.aggregation(AggregationBuilders.sum("sum").field("fees"));
        searchSourceBuilder.aggregation(AggregationBuilders.avg("avg").field("fees"));
        searchSourceBuilder.aggregation(AggregationBuilders.min("min").field("fees"));
        searchSourceBuilder.aggregation(AggregationBuilders.max("max").field("fees"));
        searchSourceBuilder.aggregation(AggregationBuilders.cardinality("cardinality").field("fees"));
        searchSourceBuilder.aggregation(AggregationBuilders.count("count").field("fees"));

        searchRequest.source(searchSourceBuilder);
        Map<String, Object> map = null;

        try {
            SearchResponse searchResponse = null;
            searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
            if (searchResponse.getHits().getTotalHits().value > 0) {
                SearchHit[] searchHit = searchResponse.getHits().getHits();
                for (SearchHit hit : searchHit) {
                    map = hit.getSourceAsMap();
                    System.out.println("Index data:" + Arrays.toString(map.entrySet().toArray()));

                }
            }

            Sum sum = searchResponse.getAggregations().get("sum");
            double result = sum.getValue();
            System.out.println("aggs Sum: " + result);
            Avg aggAvg = searchResponse.getAggregations().get("avg");
            double valueAvg = aggAvg.getValue();
            System.out.println("aggs Avg::" + valueAvg);
            Min aggMin = searchResponse.getAggregations().get("min");
            double minOutput = aggMin.getValue();
            System.out.println("aggs Min::" + minOutput);
            Max aggMax = searchResponse.getAggregations().get("max");
            double maxOutput = aggMax.getValue();
            System.out.println("aggs Max::" + maxOutput);
            Cardinality aggCadinality = searchResponse.getAggregations().get("cardinality");
            long valueCadinality = aggCadinality.getValue();
            System.out.println("aggs Cadinality::" + valueCadinality);
            ValueCount aggCount = searchResponse.getAggregations().get("count");
            long valueCount = aggCount.getValue();
            System.out.println("aggs Count::" + valueCount);
        } catch (IOException ex) {
            ex.printStackTrace();
        }
    }
}

Note the following lines in the code above:

        RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")));

We need to change the host and port to match our own Elasticsearch deployment. This code is responsible for connecting to Elasticsearch.
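To avoid hard-coding the address, one option is to read it from the environment and split it into the three values that HttpHost takes. The sketch below uses only the JDK. The environment variable name `ES_URL`, the class `EsEndpoint`, and the fallback to port 9200 are assumptions made for illustration, not conventions of the Elasticsearch client.

```java
import java.net.URI;

class EsEndpoint {
    final String scheme;
    final String host;
    final int port;

    EsEndpoint(String scheme, String host, int port) {
        this.scheme = scheme;
        this.host = host;
        this.port = port;
    }

    // Parses a URL such as "http://localhost:9200".
    // Assumption: when the URL has no explicit port, fall back to 9200.
    static EsEndpoint parse(String url) {
        URI uri = URI.create(url);
        int port = uri.getPort() == -1 ? 9200 : uri.getPort();
        return new EsEndpoint(uri.getScheme(), uri.getHost(), port);
    }

    public static void main(String[] args) {
        // ES_URL is a hypothetical variable name; the default matches the article's setup.
        String url = System.getenv().getOrDefault("ES_URL", "http://localhost:9200");
        EsEndpoint e = parse(url);
        System.out.println(e.scheme + "://" + e.host + ":" + e.port);
    }
}
```

The parsed values could then be passed along as `new HttpHost(e.host, e.port, e.scheme)` when building the client.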

Above, we use searchRequest.indices("classindex"); to set the index to search. We then run the following aggregations on the fees field:

  • sum
  • avg
  • min
  • max
  • cardinality
  • count
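For comparison with what one would type in Kibana, the six aggregations above should correspond roughly to a search body with one entry per aggregation name. The sketch below assembles that JSON with plain string handling, so it runs without the Elasticsearch libraries. The class `AggsRequestBody` is hypothetical, and the code assumes the field name needs no JSON escaping. Note that the "count" aggregation uses the REST type `value_count`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class AggsRequestBody {
    // Maps each aggregation name used in Aggregation.java to its REST aggregation type.
    static Map<String, String> aggregations() {
        Map<String, String> aggs = new LinkedHashMap<>();
        aggs.put("sum", "sum");
        aggs.put("avg", "avg");
        aggs.put("min", "min");
        aggs.put("max", "max");
        aggs.put("cardinality", "cardinality");
        aggs.put("count", "value_count");
        return aggs;
    }

    // Builds a match_all search body with one metric aggregation per entry.
    // Assumption: the field name contains no characters that need JSON escaping.
    static String build(String field) {
        StringJoiner aggs = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, String> e : aggregations().entrySet()) {
            aggs.add("\"" + e.getKey() + "\":{\"" + e.getValue()
                    + "\":{\"field\":\"" + field + "\"}}");
        }
        return "{\"query\":{\"match_all\":{}},\"aggs\":" + aggs + "}";
    }

    public static void main(String[] args) {
        System.out.println("GET classindex/_search");
        System.out.println(build("fees"));
    }
}
```

Pasting the printed request into the Kibana console is a quick way to cross-check the values returned through the Java client.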

Compile and run the code above, and we see the following output:

Index data:[fees=6000, classname=Galaxy, instructor=Sheldon Kooper, seats available=18, cource=Physics, language=English]
Index data:[fees=4000, classname=Galaxy, instructor=Tom Nelson, seats available=20, cource=Chemistry, language=English]
Index data:[fees=3000, classname=Galaxy, instructor=Smith Ray, seats available=25, cource=Maths, language=English]
Index data:[fees=2000, classname=Galaxy, instructor=Tom Nelson, seats available=12, cource=Biology, language=English]
Index data:[fees=3000, classname=Galaxy, instructor=Ric Johanson, seats available=10, cource=Social Science, language=English]
aggs Sum: 18000.0
aggs Avg::3600.0
aggs Min::2000.0
aggs Max::6000.0
aggs Cadinality::4
aggs Count::5
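The aggregation values above can be checked by hand against the five fees values in the sample data. The following sketch recomputes them locally with Java streams. The class `FeesCheck` is hypothetical and does not talk to Elasticsearch; it only mirrors the arithmetic.

```java
import java.util.List;
import java.util.LongSummaryStatistics;

class FeesCheck {
    // The fees values copied from the five sample documents.
    static final List<Long> FEES = List.of(6000L, 4000L, 3000L, 2000L, 3000L);

    // Sum, average, min, max, and count in a single pass.
    static LongSummaryStatistics stats() {
        return FEES.stream().mapToLong(Long::longValue).summaryStatistics();
    }

    // Number of distinct values, the local counterpart of the cardinality aggregation.
    static long distinctCount() {
        return FEES.stream().distinct().count();
    }

    public static void main(String[] args) {
        LongSummaryStatistics s = stats();
        System.out.println("sum = " + s.getSum());
        System.out.println("avg = " + s.getAverage());
        System.out.println("min = " + s.getMin());
        System.out.println("max = " + s.getMax());
        System.out.println("cardinality = " + distinctCount());
        System.out.println("count = " + s.getCount());
    }
}
```

The sum is 6000 + 4000 + 3000 + 2000 + 3000 = 18000, the average is 18000 / 5 = 3600, and the value 3000 appears twice, which is why the cardinality is 4 while the count is 5. In Elasticsearch the cardinality aggregation is an approximate count, so on much larger data sets it may differ slightly from an exact distinct count like the one computed here.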

You can download the source code from https://github.com/liu-xiao-guo/elasticjavaaggr.
