March 13, 2019     9min read

Open Distro for Elasticsearch Kickstart guide


Get the code for this post!

t04glovern/open-distro-kickstart

Open Distro for Elasticsearch is a recently released distribution of Elasticsearch from Amazon. It bundles enhanced features that you'd traditionally have to pay for with Elasticsearch.

Some of these open sourced features are available today:

  • Security & Authentication (Kibana & Elasticsearch)
  • Alerting (Monitoring & Triggers)
  • SQL Queries
  • perftop (Performance monitoring for stack)

There's also a decent explanation as to why Amazon is going down this path. A key line to take away:

As was the case with Java and OpenJDK, our intention is not to fork Elasticsearch, and we will be making contributions back to the Apache 2.0-licensed Elasticsearch upstream project as we develop add-on enhancements to the base open source software.

Adrian Cockcroft - Keeping Open Source Open – Open Distro for Elasticsearch

Purpose

The purpose of this guide is to give a streamlined introduction to Open Distro for Elasticsearch, and to extend the examples on its website with a small example that proves all the usual Elasticsearch features are intact.

Text analysis dashboard of podcast details from The Dollop episodes

Nothing I do in this guide is exceptionally unique in terms of the actual configuration of Open Distro for Elasticsearch. However, I believe that applying a practical example when learning a new tool is really important! All the code we'll be using can be found in t04glovern/open-distro-kickstart.

Outcome

By the end of this post you'll have set up a simple two-node Elasticsearch cluster and a Kibana frontend that will be used to view interesting data about a podcast I'm quite fond of called The Dollop.

Open Distro Dockerized

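You'll need Docker installed on the system you're working with, so download the Community Edition and install that to begin with. You'll also need docker-compose, which ships with Docker Desktop on macOS and Windows but is a separate install on Linux. Then pull down a copy of the demo repository t04glovern/open-distro-kickstart using the following git command: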

git clone https://github.com/t04glovern/open-distro-kickstart
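Before going further, it can save some head scratching to confirm that the tools used in this post are actually on your PATH. The helper below is a minimal sketch; require_tools is a name made up for this example and is not part of the repo.

```shell
# Report any missing command-line tools.
# Returns 0 when every named tool is on PATH, 1 otherwise.
# (require_tools is a hypothetical helper, not something shipped with the repo.)
require_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage (tool list taken from the steps in this post):
#   require_tools git docker docker-compose npm
```

Anything it prints as missing needs installing before the later steps will work.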

Next we're going to run docker-compose to bring up the three containers defined in the docker-compose.yml file.

version: '3'

services:

  odfe-node1:
    image: amazon/opendistro-for-elasticsearch:0.9.0
    container_name: odfe-node1
    environment:
      - cluster.name=odfe-cluster
      - bootstrap.memory_lock=true # along with the memlock settings below, disables swapping
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m" # minimum and maximum Java heap size, recommend setting both to 50% of system RAM
      - opendistro_security.ssl.http.enabled=false
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - odfe-data1:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
      - 9600:9600 # required for Performance Analyzer
    networks:
      - odfe-net

  odfe-node2:
    image: amazon/opendistro-for-elasticsearch:0.9.0
    container_name: odfe-node2
    environment:
      - cluster.name=odfe-cluster
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - discovery.zen.ping.unicast.hosts=odfe-node1
      - opendistro_security.ssl.http.enabled=false
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - odfe-data2:/usr/share/elasticsearch/data
    networks:
      - odfe-net

  kibana:
    image: amazon/opendistro-for-elasticsearch-kibana:0.9.0
    container_name: odfe-kibana
    ports:
      - 5601:5601
    expose:
      - "5601"
    environment:
      ELASTICSEARCH_URL: http://odfe-node1:9200
      ELASTICSEARCH_HOSTS: http://odfe-node1:9200 # http to match the nodes, which have HTTPS disabled above
    networks:
      - odfe-net

volumes:
  odfe-data1:
  odfe-data2:

networks:
  odfe-net:

The file above is much like the one from the tutorial on the Open Distro for Elasticsearch website. However, a couple of lines have been changed to improve usability while we're loading data in, specifically:

  • opendistro_security.ssl.http.enabled=false - included to disable HTTPS on the REST API. The demo setup uses self-signed certificates, so this lets us use tools that can't skip certificate verification (the equivalent of curl's --insecure flag).
  • ELASTICSEARCH_URL: http://odfe-node1:9200 - similarly, we've changed https to http in the Elasticsearch URL that Kibana connects to.

NOTE: disabling HTTPS like this is NOT good practice, and should never be done in production or work environments.
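The comment on ES_JAVA_OPTS in the compose file recommends setting both the minimum and maximum heap to 50% of system RAM. As a rough sketch of that arithmetic, the helper below (heap_opts is a name invented for this example) takes an amount of RAM in megabytes and prints the matching option string. It only halves the number and does nothing clever about very large machines.

```shell
# Print JVM heap options with min and max both set to half of the given RAM.
# $1 = RAM available to the node, in megabytes.
# (heap_opts is a hypothetical helper, not part of Open Distro or the repo.)
heap_opts() {
    ram_mb=$1
    heap_mb=$((ram_mb / 2))
    echo "-Xms${heap_mb}m -Xmx${heap_mb}m"
}

heap_opts 1024   # prints: -Xms512m -Xmx512m
```

With 1024 MB available this gives the same 512m values used in the compose file above.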

Deploy the containers using the following command from within the repo folder:

docker-compose up -d

After about 30 seconds, running docker ps -a should show all three containers up and running (output trimmed here to the PORTS and NAMES columns).

PORTS                                                      NAMES
9200/tcp, 9300/tcp, 9600/tcp                               odfe-node2
0.0.0.0:5601->5601/tcp                                     odfe-kibana
0.0.0.0:9200->9200/tcp, 0.0.0.0:9600->9600/tcp, 9300/tcp   odfe-node1
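Seeing the containers in docker ps doesn't guarantee the two nodes have found each other. One way to check is to poll the standard _cluster/health endpoint until it reports two nodes. The sketch below assumes the demo admin:admin credentials and the port 9200 mapping from the compose file; the function names are made up for this example.

```shell
ES_URL="http://admin:admin@localhost:9200"   # demo credentials and port from this post

# Extract number_of_nodes from a _cluster/health JSON response on stdin.
nodes_from_health() {
    grep -Eo '"number_of_nodes"[[:space:]]*:[[:space:]]*[0-9]+' | grep -Eo '[0-9]+$'
}

# Poll the cluster until it reports the expected node count, or give up.
# $1 = expected node count (default 2), $2 = number of attempts (default 30).
wait_for_nodes() {
    expected=${1:-2}
    tries=${2:-30}
    while [ "$tries" -gt 0 ]; do
        count=$(curl -s "$ES_URL/_cluster/health" | nodes_from_health || true)
        [ "$count" = "$expected" ] && return 0
        tries=$((tries - 1))
        sleep 2
    done
    return 1
}

# Usage:
#   wait_for_nodes 2 && echo "cluster ready"
```

If it times out, docker logs odfe-node2 is a reasonable place to start looking.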

Open up http://localhost:5601 and log in with the default credentials admin:admin.

Open Distro kibana home

perftop - Monitoring

Before diving too far into our Elasticsearch stack, we'll have a go at setting up the monitoring tool perftop with our ES instance. I've written a one-liner for each of macOS and Linux that pulls down a copy of the binary for use in this tutorial.

## MacOS
wget https://d3g5vo6xdbdb9a.cloudfront.net/downloads/perftop/perf-top-0.7.0.0-MACOS.zip && \
    unzip perf-top-0.7.0.0-MACOS.zip && \
    rm perf-top-0.7.0.0-MACOS.zip

## Linux
wget https://d3g5vo6xdbdb9a.cloudfront.net/downloads/perftop/perf-top-0.7.0.0-LINUX.zip && \
    unzip perf-top-0.7.0.0-LINUX.zip && \
    rm perf-top-0.7.0.0-LINUX.zip

Then simply run one of the following commands to view one of the four dashboards.

# Set os to match your platform: macos (perf-top-macos) or linux (perf-top-linux)
os=linux
./perf-top-${os} --dashboard dashboards/ClusterNetworkMemoryAnalysis.json   --endpoint localhost:9600
./perf-top-${os} --dashboard dashboards/ClusterOverview.json                --endpoint localhost:9600
./perf-top-${os} --dashboard dashboards/ClusterThreadAnalysis.json          --endpoint localhost:9600
./perf-top-${os} --dashboard dashboards/NodeAnalysis.json                   --endpoint localhost:9600
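If you'd rather not set the platform suffix by hand, it can be derived from uname. This is a small sketch that assumes the two binary names mentioned above (perf-top-macos and perf-top-linux); perf_top_bin is a name invented for this example.

```shell
# Map a kernel name (as printed by `uname -s`) to the perf-top binary name.
# With no argument it inspects the current machine.
# (perf_top_bin is a hypothetical helper, not part of perftop.)
perf_top_bin() {
    kernel=${1:-$(uname -s)}
    case "$kernel" in
        Darwin) echo "perf-top-macos" ;;
        Linux)  echo "perf-top-linux" ;;
        *)      echo "unsupported platform: $kernel" >&2; return 1 ;;
    esac
}

# Usage:
#   ./"$(perf_top_bin)" --dashboard dashboards/ClusterOverview.json --endpoint localhost:9600
```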

Open Distro perftop

The Dollop Data

Now that the cluster is up, let's load in a nice dataset to work with. I put one together myself last year by following a great tutorial called Discovering and indexing podcast episodes using Amazon Transcribe and Amazon Comprehend. The outcome of that is the data found in es_dollop within the project.

To load in this data we'll use elasticdump, which can be installed using npm:

npm install elasticdump -g

Then run the following commands in order to load in the data & mappings.

elasticdump \
    --input=es_dollop/dollop_episode_mapping.json \
    --output=http://admin:admin@localhost:9200/episodes --insecure \
    --type=mapping

elasticdump \
    --input=es_dollop/dollop_episode.json \
    --output=http://admin:admin@localhost:9200/episodes --insecure \
    --type=data

elasticdump \
    --input=es_dollop/dollop_paragraph_mapping.json \
    --output=http://admin:admin@localhost:9200/paragraphs --insecure \
    --type=mapping

elasticdump \
    --input=es_dollop/dollop_paragraph.json \
    --output=http://admin:admin@localhost:9200/paragraphs --insecure \
    --type=data
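The four commands above follow one pattern: for each index, load the mapping and then the data. As a sketch, that pattern can be wrapped in a small function. The names run and load_index are made up for this example, and the optional --insecure flag is left out because the cluster is being served over plain HTTP here. Setting DRY_RUN=1 prints the commands instead of running them, which is handy for checking the paths first.

```shell
ES_URL="http://admin:admin@localhost:9200"   # demo credentials and port from this post

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Load one index: mapping first, then data.
# $1 = index name, $2 = file prefix under es_dollop/
# (load_index is a hypothetical helper, not part of elasticdump.)
load_index() {
    index=$1
    prefix=$2
    run elasticdump "--input=es_dollop/${prefix}_mapping.json" \
        "--output=$ES_URL/$index" --type=mapping &&
    run elasticdump "--input=es_dollop/${prefix}.json" \
        "--output=$ES_URL/$index" --type=data
}

# Usage:
#   load_index episodes dollop_episode
#   load_index paragraphs dollop_paragraph
```

The mapping has to go in before the data, which is why the second command only runs if the first succeeds.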

Head back to Kibana and add two new index patterns by clicking Management, then Index Patterns.

Open Distro index patterns

Create the episodes index pattern.

Open Distro define episodes pattern

Set the Time Filter to be published_time

Open Distro time filter
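If you'd prefer to script the episodes index pattern rather than click through the UI, Kibana's saved objects API can create it. The sketch below is untested against the 0.9.0 images, so treat it as a starting point. It assumes the demo admin:admin credentials are accepted as basic auth on port 5601, and both function names are invented for this example.

```shell
KIBANA_URL="http://localhost:5601"   # port mapping from the compose file

# Build the saved-object body for an index pattern.
# $1 = pattern title, $2 = time field name
index_pattern_body() {
    printf '{"attributes":{"title":"%s","timeFieldName":"%s"}}' "$1" "$2"
}

# Create the index pattern, using the title as the saved object id.
# (Assumes basic auth with the demo credentials works against Kibana.)
create_index_pattern() {
    curl -s -u admin:admin -X POST \
        "$KIBANA_URL/api/saved_objects/index-pattern/$1" \
        -H 'kbn-xsrf: true' \
        -H 'Content-Type: application/json' \
        -d "$(index_pattern_body "$1" "$2")"
}

# Usage:
#   create_index_pattern episodes published_time
```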

Perform the same actions again for paragraphs, and also set the format of the Url field to Url with the type Link:

  1. Navigate to Management → Index Patterns → Create Index Pattern.
  2. In the index pattern textbox, type paragraphs, then choose Next Step.
  3. Accept the defaults and choose Create Index Pattern.
  4. Scroll down to the Url field and choose the edit icon on the right.
  5. Set the Format to Url, and the type to Link. Then choose Update Field.

Open Distro paragraph format

Kibana Dashboard

Load in a new dashboard by clicking Saved Objects under Management, then click Import.

Open Distro import

Select es_dollop/dashboard.json from the repo folder and click Import.

Open Distro import json

If you get an index pattern conflict error, just make sure you assign episodes and paragraphs to the matching index patterns and hit confirm.

Open Distro index conflicts

Click on Dashboard, then Podcast analytics, to load up the new dashboard.

Open Distro dashboard selection

Finally, widen the time range in the top right to 1-3 years ago (or further back, until you see data).

A short kickstart project for working with Open Distro for Elasticsearch. Performing podcast text analysis on The Dollop using Elasticsearch.
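Rather than guessing at the time range, you can ask Elasticsearch for the earliest and latest published_time in the episodes index using min and max aggregations. This is a minimal sketch: date_range_query is a name made up for this example, and the usage line assumes the demo admin:admin credentials and the port 9200 mapping from earlier.

```shell
# Build a search body that returns no hits, only the earliest and latest
# value of a date field.
# $1 = name of the date field
# (date_range_query is a hypothetical helper, not part of the repo.)
date_range_query() {
    field=$1
    printf '{"size":0,"aggs":{"earliest":{"min":{"field":"%s"}},"latest":{"max":{"field":"%s"}}}}' \
        "$field" "$field"
}

# Usage:
#   curl -s -u admin:admin -H 'Content-Type: application/json' \
#       "http://localhost:9200/episodes/_search?pretty" \
#       -d "$(date_range_query published_time)"
```

The value_as_string entries in the response give the dates to plug into Kibana's time picker.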

Cleaning Up

To clean up when you're done, run the following docker-compose command. It removes all the containers and, because of the -v flag, the named data volumes too.

docker-compose down -v

DevOpStar by Nathan Glover | 2020