Open Distro for Elasticsearch Kickstart guide

March 13, 2019 7 min read

Recently Updated 14th June 2020 - Now supports up to version 1.8.0 of Open Distro

Open Distro for Elasticsearch is a recent Amazon distribution of Elasticsearch that bundles enhanced features you'd traditionally have to pay for with Elasticsearch.

Some of these open sourced features are available today:

Security & Authentication (Kibana & Elasticsearch)
Alerting (Monitoring & Triggers)
SQL Queries
perftop (Performance monitoring for stack)

There's also a decent explanation of why Amazon is going down this path. A key line to take away:

As was the case with Java and OpenJDK, our intention is not to fork Elasticsearch, and we will be making contributions back to the Apache 2.0-licensed Elasticsearch upstream project as we develop add-on enhancements to the base open source software.

Adrian Cockcroft - Keeping Open Source Open – Open Distro for Elasticsearch

Purpose

The purpose of this guide is to give a streamlined introduction to Open Distro for Elasticsearch, and to extend the examples on the website with a small example that proves all the usual Elasticsearch features are intact.

Text analysis dashboard of podcast details from The Dollop episodes

Nothing I do in this guide is particularly unique in terms of configuring Open Distro for Elasticsearch; however, I believe that applying a practical example is really important when learning a new tool. All the code we'll be using can be found in t04glovern/open-distro-kickstart.

Outcome

By the end of this post you'll have set up a simple multi-node Elasticsearch cluster and a Kibana frontend, which we'll use to view interesting data about a podcast I'm quite fond of called The Dollop.

Open Distro Dockerized

You'll need Docker installed on the system you're working with, so download and install the Community Edition to begin with. Then pull down a copy of the demo repository t04glovern/open-distro-kickstart using the following git command:

git clone https://github.com/t04glovern/open-distro-kickstart

Next we're going to run docker-compose to bring up the three containers defined in the docker-compose.yml file.

version: '3'

services:

  odfe-node1:
    image: amazon/opendistro-for-elasticsearch:1.8.0
    container_name: odfe-node1
    environment:
      - cluster.name=odfe-cluster
      - node.name=odfe-node1
      - discovery.seed_hosts=odfe-node1,odfe-node2
      - cluster.initial_master_nodes=odfe-node1,odfe-node2
      - bootstrap.memory_lock=true # along with the memlock settings below, disables swapping
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m" # minimum and maximum Java heap size, recommend setting both to 50% of system RAM
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536 # maximum number of open files for the Elasticsearch user, set to at least 65536 on modern systems
        hard: 65536
    volumes:
      - odfe-data1:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
      - 9600:9600 # required for Performance Analyzer
    networks:
      - odfe-net

  odfe-node2:
    image: amazon/opendistro-for-elasticsearch:1.8.0
    container_name: odfe-node2
    environment:
      - bootstrap.memory_lock=true
      - cluster.initial_master_nodes=odfe-node1,odfe-node2
      - cluster.name=odfe-cluster
      - discovery.seed_hosts=odfe-node1,odfe-node2
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - node.name=odfe-node2
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    volumes:
      - odfe-data2:/usr/share/elasticsearch/data
    networks:
      - odfe-net

  kibana:
    image: amazon/opendistro-for-elasticsearch-kibana:1.8.0
    container_name: odfe-kibana
    ports:
      - 5601:5601
    expose:
      - "5601"
    environment:
      ELASTICSEARCH_URL: http://odfe-node1:9200
      ELASTICSEARCH_HOSTS: https://odfe-node1:9200
    networks:
      - odfe-net

volumes:
  odfe-data1:
  odfe-data2:

networks:
  odfe-net:

The file above is much like the one from the tutorial on the Open Distro for Elasticsearch website; however, a couple of lines have been changed to improve usability while we're loading data in. In particular:

ELASTICSEARCH_URL: http://odfe-node1:9200 - we've changed the https to http in the URL definition for Kibana.

NOTE: this is NOT good practice, and should never be done in production or work environments.

Deploy the Docker images by running the following command from within the repo folder:

docker-compose up -d

NOTE: If your cluster fails to start, you might need to increase the max virtual memory using the following command.

sudo sysctl -w vm.max_map_count=262144
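
If you'd rather check before changing anything, here is a minimal sketch (Linux only) that reads the current value and compares it against the 262144 used above. The function names map_count_too_low and check_map_count are my own, not part of any tool.

```shell
# Minimum vm.max_map_count used in the sysctl command above.
REQUIRED_MAP_COUNT=262144

# Succeeds (exit 0) when the given value is below the required minimum.
map_count_too_low() {
  [ "$1" -lt "$REQUIRED_MAP_COUNT" ]
}

# Linux only: print the current value and whether it needs raising.
check_map_count() {
  current="$(sysctl -n vm.max_map_count)"
  if map_count_too_low "$current"; then
    echo "vm.max_map_count=$current is too low; run: sudo sysctl -w vm.max_map_count=$REQUIRED_MAP_COUNT"
  else
    echo "vm.max_map_count=$current is fine"
  fi
}
```

Run check_map_count after defining the functions. Note that a value set with sysctl -w does not survive a reboot; to make it permanent, add vm.max_map_count=262144 to /etc/sysctl.conf.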

NOTE: If you are on Windows, you will need to increase the memory allocated to Docker to at least 4 GB. You can do this under the Docker settings.

After about 30 seconds, running docker ps -a should show all three containers up and running.

PORTS                                                      NAMES
9200/tcp, 9300/tcp, 9600/tcp                               odfe-node2
0.0.0.0:5601->5601/tcp                                     odfe-kibana
0.0.0.0:9200->9200/tcp, 0.0.0.0:9600->9600/tcp, 9300/tcp   odfe-node1
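
Containers showing as up doesn't necessarily mean the nodes have formed a cluster, so it can help to ask Elasticsearch directly. The following is a rough sketch that polls the _cluster/health endpoint; it assumes the demo's default admin:admin credentials and self-signed certificate (hence curl -k), and the function names health_status and wait_for_cluster are my own.

```shell
# Extract the "status" value (green/yellow/red) from a _cluster/health JSON body.
health_status() {
  sed -n 's/.*"status" *: *"\([a-z]*\)".*/\1/p'
}

# Poll the cluster health endpoint until it reports green or yellow.
# Assumes the demo defaults: admin:admin and a self-signed certificate (-k).
wait_for_cluster() {
  url="${1:-https://localhost:9200}"
  tries="${2:-30}"
  attempt=0
  while [ "$attempt" -lt "$tries" ]; do
    status="$(curl -sk -u admin:admin "$url/_cluster/health" | health_status)"
    case "$status" in
      green|yellow) echo "cluster is $status"; return 0 ;;
    esac
    attempt=$((attempt + 1))
    sleep 2
  done
  echo "cluster not ready after $tries attempts" >&2
  return 1
}
```

Calling wait_for_cluster with no arguments polls https://localhost:9200 up to 30 times, two seconds apart, and returns non-zero if the cluster never reports green or yellow.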

Open up http://localhost:5601 and log in with the default credentials admin:admin.

perftop - Monitoring

Before diving too far into our Elasticsearch stack, we'll have a go at setting up the monitoring tool perftop with our ES instance. I've written a short one-liner for both macOS and Linux that pulls down a copy of the binary for use in this tutorial.

## MacOS
wget https://d3g5vo6xdbdb9a.cloudfront.net/downloads/perftop/perf-top-1.8.0.0-MACOS.zip && \
    unzip perf-top-1.8.0.0-MACOS.zip && \
    rm perf-top-1.8.0.0-MACOS.zip

## Linux
wget https://d3g5vo6xdbdb9a.cloudfront.net/downloads/perftop/perf-top-1.8.0.0-LINUX.zip && \
    unzip perf-top-1.8.0.0-LINUX.zip && \
    rm perf-top-1.8.0.0-LINUX.zip

Then simply run one of the following commands to view one of the four dashboards.

# MacOS: perf-top-macos | Linux: perf-top-linux
./perf-top-${os} --dashboard ClusterNetworkMemoryAnalysis   --endpoint localhost:9600
./perf-top-${os} --dashboard ClusterOverview                --endpoint localhost:9600
./perf-top-${os} --dashboard ClusterThreadAnalysis          --endpoint localhost:9600
./perf-top-${os} --dashboard NodeAnalysis                   --endpoint localhost:9600

Note: If you get the following error, run the commands with the TERM=xterm prefix

# Error: File '/**/perftop/node_modules/blessed/usr/xterm' was not included into executable at compilation stage. Please recompile adding it as asset or script
TERM=xterm ./perf-top-${os} --dashboard ClusterNetworkMemoryAnalysis   --endpoint localhost:9600
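
To avoid substituting ${os} by hand, a small wrapper can pick the binary from uname. This is just a sketch: perftop_binary and run_perftop are names I've made up, the binary names are the ones noted in the comment above, and it always applies the TERM=xterm prefix from the note.

```shell
# Map `uname -s` output to the perftop binary name from the downloads above.
perftop_binary() {
  case "$1" in
    Darwin) echo "perf-top-macos" ;;
    Linux)  echo "perf-top-linux" ;;
    *)      echo "unsupported OS: $1" >&2; return 1 ;;
  esac
}

# Launch one dashboard (default: ClusterOverview) against the
# Performance Analyzer port exposed by odfe-node1.
run_perftop() {
  bin="$(perftop_binary "$(uname -s)")" || return 1
  TERM=xterm "./$bin" --dashboard "${1:-ClusterOverview}" --endpoint localhost:9600
}
```

For example, run_perftop NodeAnalysis opens the NodeAnalysis dashboard, and run_perftop with no argument falls back to ClusterOverview.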

The Dollop Data

Now that the cluster is up, let's load in a nice dataset to work with. I built one myself last year by following a great tutorial called Discovering and indexing podcast episodes using Amazon Transcribe and Amazon Comprehend. The outcome of that is the data found in es_dollop within the project.

To load in this data, we'll use elasticdump which can be easily installed using npm via the following:

npm install elasticdump -g

Then run the following commands in order to load in the data & mappings.

NODE_TLS_REJECT_UNAUTHORIZED=0 elasticdump \
    --input=es_dollop/dollop_episode_mapping.json \
    --output=https://admin:admin@localhost:9200/episodes \
    --type=mapping

NODE_TLS_REJECT_UNAUTHORIZED=0 elasticdump \
    --input=es_dollop/dollop_episode.json \
    --output=https://admin:admin@localhost:9200/episodes \
    --type=data

NODE_TLS_REJECT_UNAUTHORIZED=0 elasticdump \
    --input=es_dollop/dollop_paragraph_mapping.json \
    --output=https://admin:admin@localhost:9200/paragraphs \
    --type=mapping

NODE_TLS_REJECT_UNAUTHORIZED=0 elasticdump \
    --input=es_dollop/dollop_paragraph.json \
    --output=https://admin:admin@localhost:9200/paragraphs \
    --type=data
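
Once the four commands finish, it's worth confirming the documents actually landed. Below is a minimal sketch using Elasticsearch's _count API; as before it assumes the demo's admin:admin credentials and self-signed certificate, and doc_count and show_counts are hypothetical helper names of mine.

```shell
# Pull the numeric "count" field out of a _count API response body.
doc_count() {
  sed -n 's/.*"count" *: *\([0-9][0-9]*\).*/\1/p'
}

# Print how many documents each demo index holds.
# Assumes the demo defaults: admin:admin and a self-signed certificate (-k).
show_counts() {
  for index in episodes paragraphs; do
    n="$(curl -sk -u admin:admin "https://localhost:9200/$index/_count" | doc_count)"
    echo "$index: ${n:-unavailable}"
  done
}
```

Running show_counts prints one line per index; a count of 0 or "unavailable" suggests the corresponding elasticdump step needs another look.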

Head back to Kibana and add two new index patterns by clicking Management, then Index Patterns.

Create the episodes index pattern.

Set the Time Filter to published_time.

Perform the same actions again for paragraphs, and also make sure to set the format of the Url field to Link:

Navigate to Management -> Index Patterns -> Create Index.
In the index pattern textbox, type paragraphs, then choose Next Step.
Accept the defaults and choose Create Index Pattern.
Scroll down to the Url field and choose the edit icon on the right.
Set the Format to Url and the type to Link, then choose Update Field.