Fifth Elephant 2013, Bangalore.
Also available at
ElasticSearch is a flexible and powerful open source, distributed real-time search and analytics engine.
Start small and scale horizontally out of the box. For more capacity, just add more nodes and let the cluster reorganize itself.
ElasticSearch clusters detect and remove failed nodes, and reorganize themselves.
$ curl -XPUT http://localhost:9200/people
$ curl -XPUT http://localhost:9200/gems
$ curl -XPUT http://localhost:9200/gems/document/pry-0.5.9
$ curl -XGET http://localhost:9200/gems/document/pry-0.5.9
A cluster can host multiple indices which can be queried independently, or as a group.
{
"_id": "pry-0.5.9",
"_index": "gems",
"_source": {
"authors": [
"John Mair (banisterfiend)"
],
"autorequire": null,
"bindir": "bin",
"cert_chain": [],
"date": "Sun Feb 20 11:00:00 UTC 2011",
"default_executable": null,
"description": "attach an irb-like session to any object at runtime",
"email": "jrmair@gmail.com"
}
}
Store complex real world entities in Elasticsearch as structured JSON documents.
Almost any operation can be performed using a simple RESTful interface using JSON over HTTP.
ElasticSearch is built on top of Apache Lucene. Lucene is a high performance, full-featured Information Retrieval library, written in Java.
$ curl -XGET http://localhost:9200/gems/document/pry-0.5.9
{
"_id": "pry-0.5.9",
"_index": "gems",
"_source": {
"authors": [
"John Mair (banisterfiend)"
],
"autorequire": null,
"bindir": "bin",
"cert_chain": [],
"date": "Sun Feb 20 11:00:00 UTC 2011",
"default_executable": null,
"description": "attach an irb-like session to any object at runtime",
"email": "jrmair@gmail.com",
"executables": [
"pry"
],
"extensions": [],
"extra_rdoc_files": [],
"files": [
"lib/pry/commands.rb",
"lib/pry/command_base.rb",
"lib/pry/completion.rb",
"lib/pry/core_extensions.rb",
"lib/pry/hooks.rb",
"lib/pry/print.rb",
"lib/pry/prompts.rb",
"lib/pry/pry_class.rb",
"lib/pry/pry_instance.rb",
"lib/pry/version.rb",
"lib/pry.rb",
"examples/example_basic.rb",
"examples/example_commands.rb",
"examples/example_command_override.rb",
"examples/example_hooks.rb",
"examples/example_image_edit.rb",
"examples/example_input.rb",
"examples/example_input2.rb",
"examples/example_output.rb",
"examples/example_print.rb",
"examples/example_prompt.rb",
"test/test.rb",
"test/test_helper.rb",
"CHANGELOG",
"LICENSE",
"README.markdown",
"Rakefile",
".gemtest",
"bin/pry"
],
"has_rdoc": true,
"homepage": "http://banisterfiend.wordpress.com",
"id": "pry-0.5.9",
"licenses": [],
"name": "pry",
"platform": "ruby",
"post_install_message": null,
"rdoc_options": [],
"require_paths": [
"lib"
],
"requirements": [],
"rubyforge_project": null,
"rubygems_version": "1.5.2",
"signing_key": null,
"specification_version": 3,
"summary": "attach an irb-like session to any object at runtime",
"test_files": [],
"version": {
"prerelease": null,
"version": "0.5.9"
}
},
"_type": "document",
"_version": 1,
"exists": true
}
In ElasticSearch, everything is stored as a Document. Document can be addressed and retrieved by querying their attributes.
Lets us specify document properties, so we can differentiate the objects.
Each Shard is a separate native Lucene Index. Lets us overcome RAM limitations, hard disk capacity.
An exact copy of primary Shard. Helps in setting up HA, increases query throughput.
ElasticSearch stores its data in logical Indices. Think of a table, collection or a database.
An Index has atleast 1 primary Shard, and 0 or more Replicas.
A collection of cooperating ElasticSearch nodes. Gives better availability and performance via Index Sharding and Replicas.
Download ElasticSearch from http://www.elasticsearch.org/download
# service elasticsearch start
# /etc/init.d/elasticsearch start
# ./bin/elasticsearch -f
A site plugin to view contents of ElasticSearch cluster.
# cd /usr/share/elasticsearch
# ./bin/plugin -install mobz/elasticsearch-head
# cd /opt/elasticsearch-0.90.2
# ./bin/plugin -install mobz/elasticsearch-head
Restart ElasticSearch. Plugins are detected and loaded on service startup.
$ curl -XGET 'http://localhost:9200/'
{
"ok" : true,
"status" : 200,
"name" : "Drake, Frank",
"version" : {
"number" : "0.90.2",
"snapshot_build" : false,
"lucene_version" : "4.3.1"
},
"tagline" : "You Know, for Search"
}
$ curl -XPUT 'http://localhost:9200/gems'
{
"ok":true,
"acknowledged":true
}
$ curl -XGET 'localhost:9200/_status'
{"ok":true,"_shards":{"total":20,"successful":10,"failed":0},
"indices":{"gems":{"index":{"primary_size":"495b","primary_size_in_bytes":495,
"size":"495b","size_in_bytes":495},"translog":{"operations":0},
"docs":{"num_docs":0,"max_doc":0,"deleted_docs":0},"merges":
{"current":0,"current_docs":0,"current_size":"0b","current_size_in_bytes":0,
"total":0,"total_time":"0s","total_time_in_millis":0,"total_docs":0,
"total_size":"0b","total_size_in_bytes":0},
...
...
...
$ curl -XGET 'localhost:9200/_status?pretty'
$ curl -XGET 'localhost:9200/_status' | python -mjson.tool
$ curl -XGET 'localhost:9200/_status' | json_reformat
{
"ok": true,
"_shards": {
"total": 20,
"successful": 10,
"failed": 0
},
"indices": {
"gems": {
"index": {
"primary_size": "495b",
"primary_size_in_bytes": 495,
"size": "495b",
"size_in_bytes": 495
},
...
$ curl -XDELETE 'http://localhost:9200/gems'
{
"ok":true,
"acknowledged":true
}
{
"settings" : {
"index" : {
"number_of_shards" : 6,
"number_of_replicas" : 0
}
}
}
$ curl -XPUT 'http://localhost:9200/gems' -d @body.json
{
"ok":true,
"acknowledged":true
}
{
"name": "pry",
"platform": "ruby",
"rubygems_version": "1.5.2",
"description": "attach an irb-like session to any object at runtime",
"email": "anurag@example.com",
"has_rdoc": true,
"homepage": "http://banisterfiend.wordpress.com"
}
$ curl -XPOST 'http://localhost:9200/gems/test/' -d @body.json
{
"ok":true,
"_index":"gems",
"_type":"test",
"_id":"lsJgxiwET6eg",
"_version":1
}
$ curl -XGET 'http://localhost:9200/gems/test/lsJgxiwET6eg' | python -mjson.tool
{
"_id": "lsJgxiwET6eg",
"_index": "gems",
"_source": {
"description": "attach an irb-like session to any object at runtime",
"email": "anurag@example.com",
"has_rdoc": true,
"homepage": "http://banisterfiend.wordpress.com",
"name": "pry",
"platform": "ruby",
"rubygems_version": "1.5.2"
},
"_type": "test",
"_version": 1,
"exists": true
}
{
"name": "grit",
"platform": "jruby",
"rubygems_version": "2.5.0",
"description": "Ruby library for extracting information from a git repository.",
"email": "mojombo@github.com",
"has_rdoc": false,
"homepage": "http://github.com/mojombo/grit"
}
$ curl -XPOST 'http://localhost:9200/gems/test/' -d @body.json
{
"ok":true,
"_index":"gems",
"_type":"test",
"_id":"ijUOHi2cQc2",
"_version":1
}
{
"name": "grit",
"platform": "jruby",
"rubygems_version": "2.5.1",
"description": "Ruby library for extracting information from a git repository.",
"email": "mojombo@github.com",
"has_rdoc": false,
"homepage": "http://github.com/mojombo/grit"
}
$ curl -XPUT 'http://localhost:9200/gems/test/grit-2.5.1' -d @body.json
{
"ok":true,
"_index":"gems",
"_type":"test",
"_id":"grit-2.5.1",
"_version":1
}
IDs are unique across Index. Composed of DocumentType and ID.
$ curl -XPUT 'http://localhost:9200/gems/test/grit-2.5.1' -d @body.json
{
"ok":true,
"_index":"gems",
"_type":"test",
"_id":"grit-2.5.1",
"_version":2
}
{
"query": {
"term": {"name": "pry"}
}
}
$ curl -XPOST http://localhost:9200/gems/_search -d @body.json | python -mjson.tool
{
"_shards": {
"failed": 0,
"successful": 6,
"total": 6
},
"hits": {
"hits": [
{
"_id": "MWkKgzsMRgK",
"_index": "gems",
"_score": 1.4054651,
"_source": {
"description": "attach an irb-like session to any object at runtime",
"email": "anurag@example.com",
"has_rdoc": true,
"homepage": "http://banisterfiend.wordpress.com",
"name": "pry",
"platform": "ruby",
"rubygems_version": "1.5.2"
},
"_type": "test"
}
],
"max_score": 1.4054651,
"total": 1
},
"timed_out": false,
"took": 2
}
{
"term": {"name": "pry"}
}
$ curl -XGET http://localhost:9200/gems/test/_count -d @body.json
{
"_shards": {
"failed": 0,
"successful": 6,
"total": 6
},
"count": 1
}
{
"doc": {
"platform": "macruby"
}
}
$ curl -XPOST http://localhost:9200/gems/test/grit-2.5.1/_update -d @body.json
{
"ok":true,
"_index":"gems",
"_type":"test",
"_id":"grit-2.5.1",
"_version":4
}
The partial document is merged using simple recursive merge.
{
"script" : "ctx._source.platform = vm_name",
"params" : {
"vm_name" : "rubinius"
}
}
$ curl -XPOST http://localhost:9200/gems/test/grit-2.5.1/_update -d @body.json
{
"ok":true,
"_index":"gems",
"_type":"test",
"_id":"grit-2.5.1",
"_version":5
}
$ curl -XDELETE 'http://localhost:9200/gems/test/grit-2.5.1'
{
"ok":true,
"found":true,
"_index":"gems",
"_type":"test",
"_id":"grit-2.5.1",
"_version":6
}
{
"gem" : {
"properties" : {
"name" : {"type" : "string", "index": "not_analyzed"},
"platform" : {"type" : "string", "index": "not_analyzed"},
"rubygems_version" : {"type" : "string", "index": "not_analyzed"},
"description" : {"type" : "string", "store" : "yes"},
"has_rdoc" : {"type" : "boolean"}
}
}
}
$ curl -XPUT 'http://localhost:9200/gems/gem/_mapping' -d @body.json
$ curl -XGET 'http://localhost:9200/gems/_mapping' | python -mjson.tool
{
"name": "grit",
"platform": "ruby",
"rubygems_version": "2.5.1",
"description": "Ruby library for extracting information from a git repository.",
"email": "mojombo@github.com",
"has_rdoc": false,
"homepage": "http://github.com/mojombo/grit"
}
$ curl -XPUT 'http://localhost:9200/gems/gem/grit-2.5.1' -d @body.json
{
"ok":true,
"_index":"gems",
"_type":"gem",
"_id":"grit-2.5.1",
"_version":1
}
{
"query": {
"match" : {
"description" : "git repository"
}
}
}
$ curl -XPOST http://localhost:9200/gems/gem/_search -d @body.json
{
"query": {
"match" : {
"description" : "git repository"
}
},
"highlight" : {
"fields" : {
"description" : {}
}
}
}
$ curl -XPOST http://localhost:9200/gems/gem/_search -d @body.json
"highlight": {
"description": [
"Ruby library for extracting information from a git repository."
]
}
{
"query": { "match_all" : {} },
"facets" : {
"gem_names" : {
"terms" : { "field": "name" }
}
}
}
$ curl -XPOST http://localhost:9200/gems/_search -d @body.json
...
"facets": {
"gem_names": {
"_type": "terms",
"missing": 0,
"other": 0,
"terms": [
{
"count": 2,
"term": "pry"
},
{
"count": 2,
"term": "grit"
},
{
"count": 1,
"term": "abc"
}
],
"total": 5
}
},
"hits": {
"hits": [
...
Download from Aadhaar Public Data Portal at https://data.uidai.gov.in
$ git clone https://github.com/gnurag/aadhaar
# gem install yajl-ruby tire activesupport
$ git clone https://github.com/gnurag/aadhaar
$ cd aadhaar/data
$ unzip UIDAI-ENR-DETAIL-20121001.zip
$ cd ../bin
$ vi aadhaar.rb
AADHAAR_DATA_DIR = "/path/to/aadhaar/data"
ES_URL = "http://localhost:9200"
ES_INDEX = 'aadhaar'
ES_TYPE = "UID"
BATCH_SIZE = 1000
$ ruby aadhaar.rb
$ curl -XPOST http://localhost:9200/aadhaar/UID/_search -d @template.json | python -mjson.tool
Group multiple Indexes, and query them together.
curl -XPOST 'http://localhost:9200/_aliases' -d '
{
"actions" : [
{ "add" : { "index" : "index1", "alias" : "master-alias" } }
{ "add" : { "index" : "index2", "alias" : "master-alias" } }
]
}'
curl -XPOST 'http://localhost:9200/_aliases' -d '
{
"actions" : [
{ "remove" : { "index" : "index2", "alias" : "master-alias" } }
]
}'
Control which Shard the document will be placed and queried from.
$ curl -XPUT http://localhost:9200/gems/gem/roxml?parent=rexml -d '{
"tag" : "something"
}'
A wide range of site plugins, analyzers, river plugins available from the community.