Elasticsearch in Java projects – RESTful API over HTTP
The previous articles focused on communicating with Elasticsearch through the Java High Level REST Client. This article presents communication through the RESTful API over HTTP, using Postman as the web client. Postman is a very handy tool for testing REST APIs, and it lets you get your hands dirty by trying the provided commands on your own. All the index and document operations that were previously done with the Java API are covered here, so you can easily see how the two approaches correspond to each other.
Prerequisites
- Installed and running Elasticsearch (refer to official user guide)
- Installed and running Postman (refer to official user guide)
RESTful API
One of the ways of talking to Elasticsearch is to use the RESTful API over the HTTP protocol on the default port 9200. Elasticsearch provides a wide set of APIs, each with very comprehensive documentation, that can be used directly to configure and access Elasticsearch features. According to the official documentation:
A number of Elasticsearch GET APIs – most notably the search API – support a request body. While the GET action makes sense in the context of retrieving information, GET requests with a body are not supported by all HTTP libraries. All Elasticsearch GET APIs that require a body can also be submitted as POST requests.
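The fallback described in the quote can be sketched in a few lines of Python using only the standard library. This is an illustration under assumptions, not part of the demo application: the helper name `build_request` and the constant `BASE_URL` are made up here, and the node is assumed to be local and unsecured. The request is only built, never sent.

```python
import json
import urllib.request

# Assumption: a local, unsecured node on the default port, as in this article.
BASE_URL = "http://localhost:9200"


def build_request(method, path, body=None):
    """Build (but do not send) an HTTP request for Elasticsearch."""
    # Not every HTTP library can send a body with GET, so a GET that
    # carries a body is submitted as POST instead.
    if method == "GET" and body is not None:
        method = "POST"
    data = None
    headers = {}
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


search = build_request(
    "GET", "/db-enriched-drivers/_search", {"query": {"match_all": {}}}
)
print(search.get_method(), search.full_url)
```

A request without a body, such as `build_request("GET", "/")`, keeps its GET verb.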
Starting from the beginning, the following command shows the details of the running instance and the cluster information.
GET localhost:9200/?pretty
As a result, a response similar to the one below is received. It contains information such as:
- the machine's hostname is "MARLAP"
- the Elasticsearch cluster name is elasticsearch, which is the default
- the running version of Elasticsearch is 7.10.2
{
"name": "MARLAP",
"cluster_name": "elasticsearch",
"cluster_uuid": "zPXUxzVjRBerDRNTG0xpZA",
"version": {
"number": "7.10.2",
"build_flavor": "default",
"build_type": "zip",
"build_hash": "747e1cc71def077253878a59143c1f785afa92b9",
"build_date": "2021-01-13T00:42:12.435326Z",
"build_snapshot": false,
"lucene_version": "8.7.0",
"minimum_wire_compatibility_version": "6.8.0",
"minimum_index_compatibility_version": "6.0.0-beta1"
},
"tagline": "You Know, for Search"
}
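As a small illustration of reading that response in code, the sketch below parses a trimmed copy of the JSON above and picks out the three values mentioned in the list. The function name `summarize` and the derived `major` field are assumptions made for this example only.

```python
import json

# Trimmed copy of the response shown above.
raw = """{
  "name": "MARLAP",
  "cluster_name": "elasticsearch",
  "version": {"number": "7.10.2", "lucene_version": "8.7.0"}
}"""


def summarize(info):
    """Pick the node name, cluster name and version out of the root response."""
    version = info["version"]["number"]
    return {
        "node": info["name"],
        "cluster": info["cluster_name"],
        "version": version,
        # The major version is handy when behaviour differs between releases.
        "major": int(version.split(".")[0]),
    }


summary = summarize(json.loads(raw))
print(summary)
```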
In this article all the provided requests follow a convention that omits the host and port, so that the main command path stands out, for example:
GET /<path>
From this point on, the APIs that cover the demo application functionality are listed together with the command to use and the expected result. In the end this should enrich the drivers data with some statistics, just as it was done with the Java High Level REST Client.
Index API
Index APIs are used to manage individual indices, index settings, aliases, or mappings.
1. Create index
The first step is to create two indices with a timestamp suffix, db-statistics-20220105-142000 and db-enriched-drivers-20220105-142000, where the demo data is stored, using the following commands:
PUT /db-statistics-20220105-142000
PUT /db-enriched-drivers-20220105-142000
A response similar to the following can be expected.
{
"acknowledged": true,
"shards_acknowledged": true,
"index": "db-enriched-drivers-20220105-142000"
}
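The timestamped names can also be generated rather than typed by hand. The sketch below assumes that the suffix follows the pattern year, month, day, a hyphen, then hours, minutes and seconds, which is inferred from the two names used here; the helper `timestamped_index` is hypothetical.

```python
from datetime import datetime


def timestamped_index(prefix, created_at):
    """Return '<prefix>-YYYYMMDD-HHMMSS' for the given creation time."""
    return f"{prefix}-{created_at:%Y%m%d-%H%M%S}"


# Fixed time chosen to reproduce the names used in this article.
created_at = datetime(2022, 1, 5, 14, 20, 0)
names = [
    timestamped_index(prefix, created_at)
    for prefix in ("db-statistics", "db-enriched-drivers")
]
for name in names:
    print(f"PUT /{name}")
```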
2. Get Index
The main information about the index can be retrieved with the following command:
GET /db-enriched-drivers-20220105-142000
The response contains information about the aliases, mappings and settings.
{
"db-enriched-drivers-20220105-142000" : {
"aliases" : { },
"mappings" : { },
"settings" : {
"index" : {
"routing" : {
"allocation" : {
"include" : {
"_tier_preference" : "data_content"
}
}
},
"number_of_shards" : "1",
"provided_name" : "db-enriched-drivers-20220105-142000",
"creation_date" : "1641388937856",
"number_of_replicas" : "1",
"uuid" : "cdaMHbxVQ4eS_MTTZCqqyQ",
"version" : {
"created" : "7100299"
}
}
}
}
}
3. Delete index
It may happen that an index needs to be deleted, for instance because its name is incorrect or it is simply no longer needed. To do so, the following command can be used:
DELETE /db-enriched-drivers-20220105-142000
Now the command with the GET HTTP verb can be used to check whether the index was really deleted. An error response similar to the following should be received.
{
"error": {
"root_cause": [
{
"type": "index_not_found_exception",
"reason": "no such index [db-enriched-drivers-20220105-142000]",
"resource.type": "index_or_alias",
"resource.id": "db-enriched-drivers-20220105-142000",
"index_uuid": "_na_",
"index": "db-enriched-drivers-20220105-142000"
}
],
"type": "index_not_found_exception",
"reason": "no such index [db-enriched-drivers-20220105-142000]",
"resource.type": "index_or_alias",
"resource.id": "db-enriched-drivers-20220105-142000",
"index_uuid": "_na_",
"index": "db-enriched-drivers-20220105-142000"
},
"status": 404
}
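When this check is automated, the error body can be inspected instead of read by eye. The following sketch assumes the parsed JSON response is available as a dictionary; the function name `index_missing` is made up for illustration. It only treats a 404 with the error type index_not_found_exception as proof that the index is gone.

```python
def index_missing(response):
    """Return True only for the 'no such index' error shown above."""
    if response.get("status") != 404:
        return False
    error = response.get("error")
    # Some 404 responses carry a plain string in "error", so check the shape.
    return isinstance(error, dict) and error.get("type") == "index_not_found_exception"


# Shortened version of the error response shown above.
deleted = {
    "error": {
        "type": "index_not_found_exception",
        "reason": "no such index [db-enriched-drivers-20220105-142000]",
        "index": "db-enriched-drivers-20220105-142000",
    },
    "status": 404,
}
print(index_missing(deleted))
```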
After that, let's re-create the index db-enriched-drivers-20220105-142000, as we need it further on.
4. Check alias
Once the indices are created, aliases can be assigned to them, so there is no need to know the full name with the timestamp when referring to a particular index. Before that, we can check whether any aliases are already assigned to the created indices, using a request with a wildcard:
GET /_alias/db-*
The response shown below contains the relevant information. It can be seen that no aliases are assigned at the moment.
{
"db-enriched-drivers-20220105-142000": {
"aliases": {
}
},
"db-statistics-20220105-142000": {
"aliases": {
}
}
}
Sometimes it is useful to request information about one specific alias. This can be done using the command:
GET /_alias/db-enriched-drivers
Because the alias does not exist yet, the result is the error response shown below.
{
"error": "alias [db-enriched-drivers] missing",
"status": 404
}
5. Create alias
Once we know which aliases exist, we may need to create one for a specific index. The request contains the name of the index and the alias that we want to assign to it. For the previously created indices, the following commands can be used:
PUT /db-statistics-20220105-142000/_alias/db-statistics
PUT /db-enriched-drivers-20220105-142000/_alias/db-enriched-drivers
Afterwards, the request from point 4 can be used to check whether the aliases were created successfully. The response should look like the one shown below.
{
"db-enriched-drivers-20220105-142000": {
"aliases": {
"db-enriched-drivers": {}
}
},
"db-statistics-20220105-142000": {
"aliases": {
"db-statistics": {}
}
}
}
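The alias response is keyed by index name, while in practice the question is usually the reverse: which index does an alias point to? A minimal sketch of that inversion follows, assuming the response has already been parsed into a dictionary; the function name `alias_to_indices` is hypothetical.

```python
def alias_to_indices(response):
    """Invert the GET /_alias response into {alias: [index, ...]}."""
    resolved = {}
    for index, body in response.items():
        for alias in body.get("aliases", {}):
            resolved.setdefault(alias, []).append(index)
    return resolved


# The response shown above.
response = {
    "db-enriched-drivers-20220105-142000": {"aliases": {"db-enriched-drivers": {}}},
    "db-statistics-20220105-142000": {"aliases": {"db-statistics": {}}},
}
resolved = alias_to_indices(response)
print(resolved)
```

Each alias maps to a list because one alias may point to several indices.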
6. Delete alias
When an alias needs to be removed from a particular index, the following command can be used:
DELETE /db-enriched-drivers-20220105-142000/_alias/db-enriched-drivers
The result can be checked using the command introduced in point 4.
7. Update mapping
Sometimes it is handy to provide the mapping in advance so that Elasticsearch knows upfront how to map the defined fields, even though the default mapping created when the first documents are indexed is often good enough. For the index db-enriched-drivers, the fields code, driverId and nationality have their type changed to keyword so that aggregations on those fields are possible.
To update the mapping, the following request is used with the request body shown below.
PUT /db-enriched-drivers/_mapping
{
"properties": {
"code": {
"type": "keyword",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"driverId": {
"type": "keyword",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"nationality": {
"type": "keyword",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
The mapping of any index can be checked using the command:
GET /<index_name>/_mapping
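Since the three field definitions in the mapping body are identical, the body can be generated instead of written out. The sketch below reproduces the structure used in this article; the helper name `keyword_properties` is an assumption, and the request line is only printed, not sent.

```python
import copy
import json


def keyword_properties(field_names, ignore_above=256):
    """Build a mapping body that declares every given field as keyword."""
    template = {
        "type": "keyword",
        "fields": {"keyword": {"type": "keyword", "ignore_above": ignore_above}},
    }
    # Deep copies keep the field definitions independent of each other.
    return {"properties": {name: copy.deepcopy(template) for name in field_names}}


body = keyword_properties(["code", "driverId", "nationality"])
print("PUT /db-enriched-drivers/_mapping")
print(json.dumps(body, indent=2))
```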
Enrich API
Enrich APIs are used to manage enrich policies.
1. Create enrich policy
Now it is time to prepare the data enrichment. The first step is to create an enrich policy using the command with the body shown below.
PUT /_enrich/policy/enrich-statistics-policy
{
"match": {
"indices": [
"db-statistics"
],
"match_field": "driverId",
"enrich_fields": [
"races",
"wins",
"titles"
]
}
}
A policy can be created for the indices named in the request body only once their mapping is known. The mapping can either be provided explicitly before any indexing, or Elasticsearch creates a default one when data is indexed for the first time.
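Both points can be illustrated with a short sketch: building the policy body and checking locally that every field it refers to is already mapped. The helper names `match_policy` and `missing_fields` are hypothetical, and the set of mapped fields is passed in by hand here rather than read from Elasticsearch.

```python
def match_policy(indices, match_field, enrich_fields):
    """Build the body of a 'match' enrich policy."""
    return {
        "match": {
            "indices": list(indices),
            "match_field": match_field,
            "enrich_fields": list(enrich_fields),
        }
    }


def missing_fields(policy, mapped_fields):
    """List the policy fields that are absent from the source mapping."""
    match = policy["match"]
    wanted = [match["match_field"], *match["enrich_fields"]]
    return [name for name in wanted if name not in mapped_fields]


policy = match_policy(["db-statistics"], "driverId", ["races", "wins", "titles"])
# Assumption: these are the field names found in the db-statistics mapping.
print(missing_fields(policy, {"driverId", "races", "wins", "titles"}))
```

An empty list means the policy refers only to fields that are already mapped.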
2. Execute enrich policy
The next step in using the created policy is to execute it. The execution is done with the following request.
PUT /_enrich/policy/enrich-statistics-policy/_execute
After that, the following response that indicates success should be received.
{
"status": {
"phase": "COMPLETE"
}
}
3. Get enrich policy
Information about the enrich policy can be retrieved by policy name with the following command:
GET /_enrich/policy/enrich-statistics-policy
4. Delete enrich policy
Once an enrich policy is created, it cannot be updated or modified. The only way to get a new policy is to delete the old one and create it again. However, before the policy can be deleted, the pipeline that uses it needs to be deleted as well (refer to chapter "Ingest API – 3. Delete pipeline").
The enrich policy can be deleted using the following command:
DELETE /_enrich/policy/enrich-statistics-policy
Ingest API
Ingest APIs can be used to manage tasks and resources related to ingest pipelines and processors.
1. Create pipeline
Once the enrich policy is created, the next step is to create the ingest pipeline that will be used when new documents are indexed. The request contains the pipeline name driver-enrichment-pipeline and the configuration body shown below.
PUT /_ingest/pipeline/driver-enrichment-pipeline
{
"description": "Enrich drivers statistics information",
"processors": [
{
"enrich": {
"policy_name": "enrich-statistics-policy",
"field": "driverId",
"target_field": "statistics"
}
},
{
"remove": {
"field": "statistics.driverId"
}
}
]
}
The pipeline takes effect immediately and is applied as soon as an indexing request refers to it.
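To make the effect of the two processors easier to picture, here is a local simulation in plain Python. It is not how Elasticsearch implements the pipeline, only a sketch of the data flow: the enrich step copies the matching statistics, including the match field, into the target field, and the remove step then drops the duplicated driverId. All function names are made up for this example.

```python
import copy

# Two of the statistics documents used in this article.
STATISTICS = [
    {"driverId": "alonso", "races": 336, "wins": 32, "titles": 2},
    {"driverId": "glock", "races": 94, "wins": 0, "titles": 0},
]


def build_lookup(source_docs, match_field, enrich_fields):
    """Keep the match field and the enrich fields, keyed by the match value."""
    keep = [match_field, *enrich_fields]
    return {
        doc[match_field]: {key: doc[key] for key in keep if key in doc}
        for doc in source_docs
    }


def enrich(doc, lookup, field, target_field):
    """Copy the matching lookup entry into target_field, if there is one."""
    match = lookup.get(doc.get(field))
    if match is not None:
        doc[target_field] = dict(match)
    return doc


def remove(doc, dotted_field):
    """Delete a nested field such as 'statistics.driverId'.

    Raises KeyError when the field is absent.
    """
    *parents, leaf = dotted_field.split(".")
    node = doc
    for key in parents:
        node = node[key]
    del node[leaf]
    return doc


def run_pipeline(doc, lookup):
    """Apply the enrich step and then the remove step to a copy of doc."""
    doc = copy.deepcopy(doc)
    enrich(doc, lookup, "driverId", "statistics")
    remove(doc, "statistics.driverId")
    return doc


lookup = build_lookup(STATISTICS, "driverId", ["races", "wins", "titles"])
result = run_pipeline({"driverId": "alonso", "code": "ALO"}, lookup)
print(result)
```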
2. Get pipeline
The configuration of the pipeline with the specified name can be checked using the following command:
GET /_ingest/pipeline/driver-enrichment-pipeline
3. Delete pipeline
When the pipeline needs to be deleted, it is enough to send the following request with the specified pipeline name:
DELETE /_ingest/pipeline/driver-enrichment-pipeline
Document API
The Document APIs can be used to perform simple CRUD operations. This article focuses only on indexing documents.
1. Index documents
The first part of the data needed to complete the task is the statistics. The bulk request below, together with its body, contains the data in the required format.
POST /db-statistics/_bulk
{ "index": {}}
{"driverId": "alguersuari", "races": 46, "wins": 0, "titles": 0}
{ "index": {}}
{"driverId": "alonso", "races": 336, "wins": 32, "titles": 2}
{ "index": {}}
{"driverId": "rosa", "races": 105, "wins": 0, "titles": 0}
{ "index": {}}
{"driverId": "glock", "races": 94, "wins": 0, "titles": 0}
{ "index": {}}
{"driverId": "heidfeld", "races": 183, "wins": 0, "titles": 0 }
{ "index": {}}
{"driverId": "hulkenberg", "races": 181, "wins": 0, "titles": 0}
{ "index": {}}
{"driverId": "rosberg", "races": 206, "wins": 23, "titles": 1}
{ "index": {}}
{"driverId": "michael_schumacher", "races": 308, "wins": 91, "titles": 7}
{ "index": {}}
{"driverId": "sutil", "races": 128, "wins": 0, "titles": 0}
{ "index": {}}
{"driverId": "vettel", "races": 279, "wins": 53, "titles": 4 }
{ "index": {}}
{"driverId": "sainz", "races": 140, "wins": 0, "titles": 0}
{ "index": {}}
{"driverId": "wehrlein", "races": 39, "wins": 0, "titles": 0}
{ "index": {}}
{"driverId": "mick_schumacher", "races": 22, "wins": 0, "titles": 0}
2. Index documents with pipeline
Next, the drivers data can be ingested using the enrich pipeline. The only difference in the request is the additional pipeline parameter, as shown below. To save space in this article, only 13 documents are inserted.
POST /db-enriched-drivers/_bulk?pipeline=driver-enrichment-pipeline
{ "index": {}}
{"driverId": "alguersuari", "code": "ALG", "givenName": "Jaime", "familyName": "Alguersuari", "dateOfBirth": "1990-03-23", "nationality": "Spanish", "active": false}
{ "index": {}}
{"driverId": "alonso", "permanentNumber": "14", "code": "ALO", "givenName": "Fernando", "familyName": "Alonso", "dateOfBirth": "1981-07-29", "nationality": "Spanish", "active": true}
{ "index": {}}
{"driverId": "rosa", "code": "DLR", "givenName": "Pedro", "familyName": "de la Rosa", "dateOfBirth": "1971-02-24", "nationality": "Spanish", "active": false}
{ "index": {}}
{"driverId": "glock", "code": "GLO", "givenName": "Timo", "familyName": "Glock", "dateOfBirth": "1982-03-18", "nationality": "German", "active": false}
{ "index": {}}
{"driverId": "heidfeld", "code": "HEI", "givenName": "Nick", "familyName": "Heidfeld", "dateOfBirth": "1977-05-10", "nationality": "German", "active": false}
{ "index": {}}
{"driverId": "hulkenberg", "permanentNumber": "27", "code": "HUL", "givenName": "Nico", "familyName": "Hülkenberg", "dateOfBirth": "1987-08-19", "nationality": "German", "active": false}
{ "index": {}}
{"driverId": "rosberg", "permanentNumber": "6", "code": "ROS", "givenName": "Nico", "familyName": "Rosberg", "dateOfBirth": "1985-06-27", "nationality": "German", "active": false}
{ "index": {}}
{"driverId": "michael_schumacher", "code": "MSC", "givenName": "Michael", "familyName": "Schumacher", "dateOfBirth": "1969-01-03", "nationality": "German", "active": false}
{ "index": {}}
{"driverId": "sutil", "permanentNumber": "99", "code": "SUT", "givenName": "Adrian", "familyName": "Sutil", "dateOfBirth": "1983-01-11", "nationality": "German", "active": false}
{ "index": {}}
{"driverId": "vettel", "permanentNumber": "5", "code": "VET", "givenName": "Sebastian", "familyName": "Vettel", "dateOfBirth": "1987-07-03", "nationality": "German", "active": true}
{ "index": {}}
{"driverId": "sainz", "permanentNumber": "55", "code": "SAI", "givenName": "Carlos", "familyName": "Sainz", "dateOfBirth": "1994-09-01", "nationality": "Spanish", "active": true}
{ "index": {}}
{"driverId": "wehrlein", "permanentNumber": "94", "code": "WEH", "givenName": "Pascal", "familyName": "Wehrlein", "dateOfBirth": "1994-10-18", "nationality": "German", "active": false}
{ "index": {}}
{"driverId": "mick_schumacher", "permanentNumber": "47", "code": "MSC"
As a result, the following response was received in Postman:
[Screenshot: Postman response to the bulk request]
Search API
Search APIs are used to search and aggregate data stored in Elasticsearch indices.
1. Search with queries
Elasticsearch provides a rich, flexible query language called the Query DSL, which allows us to build much more complicated, robust queries. The domain-specific language (DSL) is specified using a JSON request body.
The following query returns drivers who:
- are German
- are still active
- have won at least one title
GET /db-enriched-drivers/_search
{
"query": {
"bool": {
"filter": [
{
"term": {
"nationality": "German"
}
},
{
"term": {
"active": "true"
}
},
{
"range": {
"statistics.titles": {
"gte": 1
}
}
}
]
}
}
}
The following result was received:
[Screenshot: search response]
This only shows how the queries are built. They can be more complicated and nested.
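The same query can be assembled from parameters, which helps when the criteria change. The sketch below builds the request body and, to show that all filter clauses must hold at once, applies the same three conditions locally to a few of the documents from this article. Both helper names are hypothetical, and the local check compares the nationality exactly, as a term query on a keyword field does.

```python
def title_winner_query(nationality, active, min_titles):
    """Build a bool query whose filter clauses must all match."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"nationality": nationality}},
                    {"term": {"active": active}},
                    {"range": {"statistics.titles": {"gte": min_titles}}},
                ]
            }
        }
    }


def matches(doc, nationality, active, min_titles):
    """Local stand-in for the three filter clauses, combined with AND."""
    return (
        doc["nationality"] == nationality
        and doc["active"] is active
        and doc["statistics"]["titles"] >= min_titles
    )


drivers = [
    {"driverId": "alonso", "nationality": "Spanish", "active": True, "statistics": {"titles": 2}},
    {"driverId": "rosberg", "nationality": "German", "active": False, "statistics": {"titles": 1}},
    {"driverId": "vettel", "nationality": "German", "active": True, "statistics": {"titles": 4}},
]
body = title_winner_query("German", True, 1)
found = [d["driverId"] for d in drivers if matches(d, "German", True, 1)]
print(found)
```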
2. Search with aggregations
Aggregations are added to the search request in a similar way to queries. The code snippet below shows how they are created. The parent aggregation "byNationality" has four sub-aggregations that reflect what was done in the previous article using the Java API.
{
"aggs": {
"byNationality": {
"terms": {
"field": "nationality"
},
"aggs": {
"totalWins": {
"sum": {
"field": "statistics.wins"
}
},
"totalTitles": {
"sum": {
"field": "statistics.titles"
}
},
"byHits": {
"top_hits": {
"_source": {
"includes": [
"driverId",
"givenName",
"familyName"
]
},
"size": 100
}
},
"avgRaces": {
"avg": {
"field": "statistics.races"
}
}
}
}
}
}
Queries and aggregations can be put together into one search request to build more sophisticated requests that fulfil the project requirements.
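A minimal sketch of such a combined body is shown below. The helper name `search_body` is an assumption, the aggregation is shortened to a single sub-aggregation, and the default `size` of 0 is a choice made for this example: it leaves the individual hits out of the response so that only the aggregation results come back.

```python
import json


def search_body(query, aggs, size=0):
    """Combine a query and aggregations into one search request body."""
    return {"size": size, "query": query, "aggs": aggs}


# Only active drivers are aggregated.
query = {"bool": {"filter": [{"term": {"active": True}}]}}
aggs = {
    "byNationality": {
        "terms": {"field": "nationality"},
        "aggs": {"totalWins": {"sum": {"field": "statistics.wins"}}},
    }
}
body = search_body(query, aggs)
print("GET /db-enriched-drivers/_search")
print(json.dumps(body, indent=2))
```

The query narrows the set of documents first, and the aggregations are then computed only over the documents that match.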
Summary
This article presents the Elasticsearch REST API, which can be used to communicate with Elasticsearch from your favourite web client, and puts it into practice by performing the same activities as in the previous articles that used the Java API.