The introductory Elasticsearch documents and tutorials all use cURL (hereafter referred to by its command-line name, curl) to interact with Elasticsearch and to demonstrate what is possible and what is returned. Below is a short collection of these exercises with some explanations.
The first Elasticsearch example is almost always a simple GET with no parameters. It is a quick way to check that the environment and server are set up and functioning properly, hence the title.
These examples use an AWS instance; readers will need to change the server to either "localhost" for a personal machine or the URL of the Elasticsearch server they are using.
In [36]:
%%bash
curl -XGET "http://search-01.ec2.internal:9200/"
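For readers who would rather run the same check from Python, the following is a minimal sketch using only the standard library. The BASE_URL value and the server_info name are illustrative assumptions, not part of the original exercises; replace the host with the server being used.

```python
import json
from urllib.request import urlopen

# Hypothetical default; swap in the host of the server being used.
BASE_URL = "http://localhost:9200"


def server_info(base_url=BASE_URL, opener=urlopen, timeout=5):
    """GET the root endpoint and return the decoded JSON body as a dict.

    `opener` defaults to urllib's urlopen; it is a parameter only so the
    function can be exercised without a live server.
    """
    url = base_url.rstrip("/") + "/"
    with opener(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling server_info() against a running server should return the same JSON that curl prints, as a Python dict.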
Counting is faster than searching and should be used when the actual results are not needed. From "ElasticSearch Cookbook - Second Edition":
It is often required to return only the count of the matched results and not the results themselves. The advantages of using a count request is the performance it offers and reduced resource usage, as a standard search call also returns hits count.
The simplest count is a count of all the documents in Elasticsearch.
In [42]:
%%bash
curl -XGET 'http://search-01.ec2.internal:9200/_count'
The second type of simple count is to count by index. If the index is gdelt1979 then:
In [48]:
%%bash
curl -XGET 'http://search-01.ec2.internal:9200/gdelt1979/_count'
Or, if the index is the Global Summary of the Day data (gsod):
In [47]:
%%bash
curl -XGET 'http://search-01.ec2.internal:9200/gsod/_count'
For more readable output, add the pretty parameter to the request.
In [49]:
%%bash
curl -XGET 'http://search-01.ec2.internal:9200/gsod/_count?pretty'
Keep in mind that counts can be as complicated as searches. Changing _count to _search, or vice versa, changes how Elasticsearch handles the request.
With that said, it is now time to develop some search examples.
Search is the main use for Elasticsearch, hence the name, and it is where the bulk of the examples will be. This notebook attempts to take the user through examples that each show only one new feature at a time. This should make clear the order of the commands, which is unfortunately important to Elasticsearch.
As with count above, it starts with a simple example.
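The count URLs above differ only in an optional index name, the action (_count or _search), and the optional pretty flag. As a sketch, a small helper can assemble them; endpoint_url and its parameters are hypothetical names introduced here for illustration.

```python
def endpoint_url(base_url, action, index=None, pretty=False):
    """Assemble a URL such as http://host:9200/gsod/_count?pretty.

    base_url: server address, for example "http://localhost:9200"
    action:   "_count" or "_search"
    index:    optional index name; omit it to address all indices
    pretty:   append ?pretty for human-readable output
    """
    parts = [base_url.rstrip("/")]
    if index:
        parts.append(index)
    parts.append(action)
    url = "/".join(parts)
    if pretty:
        url += "?pretty"
    return url


# Switching between counting and searching is only a change of action.
print(endpoint_url("http://localhost:9200", "_count", index="gsod", pretty=True))
print(endpoint_url("http://localhost:9200", "_search", index="gsod", pretty=True))
```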
In [51]:
%%bash
curl -XGET 'http://search-01.ec2.internal:9200/gsod/_search'
By default, Elasticsearch returns 10 documents for every search. As is evident, the pretty option used for count above is needed here.
In [52]:
%%bash
curl -XGET 'http://search-01.ec2.internal:9200/gsod/_search?pretty'
Much better, but it is easy to see that if this notebook continues with the Elasticsearch default number of documents, it will become unwieldy very quickly. So, let's use the size option.
In [54]:
%%bash
curl -XGET 'http://search-01.ec2.internal:9200/gsod/_search?pretty' -d '
{
  "size": "1"
}'
In [64]:
%%bash
curl -XGET 'http://search-01.ec2.internal:9200/gsod/_search?pretty' -d '
{
  "_source": ["Max Temp"],
  "size": "2"
}'
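The two cells above add a request body: first size, to limit the number of hits returned, and then _source, to limit which fields come back for each hit. A minimal sketch of building that body in Python follows. The search_body helper is a hypothetical name, and it passes size as an integer where the cells above pass it as a string.

```python
import json


def search_body(size=10, source_fields=None):
    """Return a request body dict with a hit limit and optional field list."""
    body = {"size": size}
    if source_fields is not None:
        # Only the listed fields are returned for each hit.
        body["_source"] = list(source_fields)
    return body


# Serialize the body so it can be handed to curl with -d.
body = search_body(size=2, source_fields=["Max Temp"])
print(json.dumps(body, indent=2))
```

The printed string can be passed to curl with -d in the same way as the hand-written bodies.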
In [2]:
%%bash
curl -XGET 'http://search-01.ec2.internal:9200/gsod/_search?pretty' -d '
{
  "query": {
    "filtered": {
      "filter": {
        "range": {
          "Date": {
            "gte": "2007-01-01",
            "lte": "2007-01-01"
          }
        }
      }
    }
  },
  "_source": ["Max Temp"],
  "size": "1"
}'
In [ ]:
%%bash
curl -XGET 'http://search-01.ec2.internal:9200/gsod/_search?pretty' -d '
{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "range": {
          "Date": {
            "gte": "2007-01-01",
            "lte": "2007-12-31"
          }
        }
      }
    }
  },
  "size": "1"
}'
In [1]:
%%bash
curl -XGET 'http://search-01.ec2.internal:9200/gsod/_count' -d '
{
  "query": {
    "filtered": {
      "filter": {
        "range": {
          "Date": {
            "gte": "2007-01-01",
            "lte": "2007-01-31"
          }
        }
      }
    }
  }
}'
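The cells above all share one query shape: a filtered query whose range filter keeps documents with a Date between two bounds, inclusive. The count cell sends that body to _count and the search cells send it to _search. The sketch below builds the body once so it can be reused for either endpoint; date_range_body is a hypothetical helper. It assumes the same Elasticsearch version as this notebook, since later releases removed the filtered query in favor of a bool query with a filter clause.

```python
import json


def date_range_body(field, start, end, size=None, source_fields=None):
    """Build a request body that keeps documents with start <= field <= end.

    The match_all query is spelled out explicitly, as in some of the
    cells above. size and source_fields are only meaningful for _search.
    """
    body = {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"range": {field: {"gte": start, "lte": end}}},
            }
        }
    }
    if source_fields is not None:
        body["_source"] = list(source_fields)
    if size is not None:
        body["size"] = size
    return body


# The same date range, once for counting and once for searching.
count_body = date_range_body("Date", "2007-01-01", "2007-01-31")
search_body_jan = date_range_body(
    "Date", "2007-01-01", "2007-01-31", size=1, source_fields=["Max Temp"]
)
print(json.dumps(count_body, indent=2))
print(json.dumps(search_body_jan, indent=2))
```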
In [2]:
%%bash
curl -XGET 'http://search-01.ec2.internal:9200/gsod/_search?pretty' -d '
{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "range": {
          "Date": {
            "gte": "2007-01-01",
            "lte": "2007-01-31"
          }
        }
      }
    }
  },
  "_source": ["Mean Temp", "Min Temp", "Max Temp"],
  "size": "563280"
}' > temps_200701.txt
In [26]:
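import json

# Read the file written by the search cell above, one line at a time.
with open("temps_200701.txt", "r") as f:
    mean_temps = []
    max_temps = []
    min_temps = []
    for line in f:
        # In the pretty-printed output each hit's fields sit on a single
        # line containing "_source".
        if "_source" in line:
            # Drop the 16-character prefix (indentation plus the "_source":
            # key, as laid out in this output) and the trailing newline.
            line = json.loads(line[16:-1])
            # Keep only plausible readings; out-of-range values are skipped.
            min_tmp = float(line['Min Temp'])
            if -300 < min_tmp < 300:
                min_temps.append(min_tmp)
            mean_tmp = float(line['Mean Temp'])
            if -300 < mean_tmp < 300:
                mean_temps.append(mean_tmp)
            max_tmp = float(line['Max Temp'])
            if -300 < max_tmp < 300:
                max_temps.append(max_tmp)

print("From {} observations the temperatures for January 2007 are:"
      .format(len(mean_temps)))
print("Min Temp: {:.1f}".format(min(min_temps)))
print("Mean Temp: {:.1f}".format(sum(mean_temps)/len(mean_temps)))
print("Max Temp: {:.1f}".format(max(max_temps)))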
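Slicing each line at a fixed offset depends on the exact layout of the pretty-printed output. As an alternative sketch, the whole saved response can be parsed as one JSON document and each hit's fields read from its _source entry under hits.hits. The function name summarize_temps and the LOW and HIGH bounds are assumptions made for illustration; the bounds mirror the -300 to 300 check in the cell above.

```python
import json

# Hypothetical bounds, mirroring the -300..300 check in the cell above.
LOW, HIGH = -300.0, 300.0


def summarize_temps(response_text):
    """Return (observations, min, mean, max) from a saved _search response.

    response_text is the complete JSON text returned by the search.
    """
    hits = json.loads(response_text)["hits"]["hits"]
    mins, means, maxes = [], [], []
    for hit in hits:
        source = hit["_source"]
        for key, bucket in (
            ("Min Temp", mins),
            ("Mean Temp", means),
            ("Max Temp", maxes),
        ):
            value = float(source[key])
            # Skip readings outside the plausible range.
            if LOW < value < HIGH:
                bucket.append(value)
    return len(means), min(mins), sum(means) / len(means), max(maxes)
```

With the file written by the earlier search cell, this could be called as summarize_temps(open("temps_200701.txt").read()). It raises an error if no temperatures pass the range check.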