November 23, 2024

Elasticsearch data visualization with Kibana – How to create a scatter plot with Vega-Lite

Hello guys. Thank you very much for your interest in the articles I post here! If you like them, you can support me by subscribing or by using the buy me a beer link. Every bit is appreciated!

In this post, I would like to show you how to create a nice scatter chart in Kibana for data stored in Elasticsearch.

What is Kibana?

If you have used Elasticsearch before, you already know that it doesn’t come with any kind of data visualization. You interact with it by sending requests to HTTP endpoints. Data is grouped into indexes, where each field is stored as a certain data type (integer, boolean, date etc.).

Kibana is developed by the same company behind Elasticsearch and lets you visualize the data stored in Elasticsearch indexes. It offers many types of visualizations, such as charts, tables and maps, and its rich user interface helps you get a very good grasp of your data.
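To make the "HTTP endpoints and typed fields" idea concrete, here is a minimal Python sketch that only builds (and never sends) a request creating an index with explicit field types. The address `http://localhost:9200`, the helper name `build_create_index_request` and the choice of `float` for the numeric fields are my assumptions for illustration; the index and field names are the ones used later in this post. A secured cluster would also need credentials, which are left out here.

```python
import json
from urllib.request import Request

# Assumption: a local node reachable at this address; adjust to your setup.
ES_URL = "http://localhost:9200"


def build_create_index_request(index_name: str) -> Request:
    """Build (but do not send) a PUT request that creates an index with typed fields."""
    mapping = {
        "mappings": {
            "properties": {
                "country": {"type": "keyword"},       # exact-match string
                "lifeExpectancy": {"type": "float"},  # numeric field (assumed type)
                "fertility": {"type": "float"},       # numeric field (assumed type)
            }
        }
    }
    return Request(
        url=f"{ES_URL}/{index_name}",
        data=json.dumps(mapping).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    req = build_create_index_request("country_data")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Passing the returned object to `urllib.request.urlopen` would actually send it; the sketch stops short of that so it runs without a cluster.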

What’s so special about a scatter plot?

A scatter plot is not a very complex data visualization, but at the time of this writing Kibana doesn’t have a built-in one. It does, however, let you define custom charts with the Vega-Lite grammar. Vega-Lite provides a way of defining complex, interactive visualizations for data analysis and presentation. More information can be found on their website.

Preparing the environment

Let’s prepare the environment we’ll use to demonstrate the scatter plot. In my case, I’m running the ELK stack locally with Docker from this repository. If you don’t have Elasticsearch or Kibana yet, the easiest way is to clone the repository and start the stack with docker compose. When asked for a username and password, use elastic/changeme.

UPDATE: This was the original repository that I tried, but I encountered some 404 errors in Kibana, which is why I switched to this one.

As for the data sample, I’m going to provide an NDJSON file which contains the life expectancy and fertility rate for a list of countries. The data was taken from here and converted to NDJSON format.

Download the sample data or copy and paste it from here:
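In case you want to convert your own data, NDJSON is simply one JSON object per line. Here is a minimal sketch of such a conversion, assuming the source is a list of records already loaded in Python; the helper name `to_ndjson` is made up for this example and is not the script I used for the sample file.

```python
import json


def to_ndjson(records: list[dict]) -> str:
    """Serialize each record as one JSON object per line, newline-terminated."""
    return "".join(json.dumps(record, ensure_ascii=False) + "\n" for record in records)


if __name__ == "__main__":
    sample = [
        {"country": "China", "lifeExpectancy": 76.7, "fertility": 1.7},
        {"country": "India", "lifeExpectancy": 69.4, "fertility": 2.2},
    ]
    print(to_ndjson(sample), end="")
```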


{  "country": "China",  "lifeExpectancy": 76.7,  "fertility": 1.7}
{  "country": "India",  "lifeExpectancy": 69.4,  "fertility": 2.2}
{  "country": "United States",  "lifeExpectancy": 78.5,  "fertility": 1.7}
{  "country": "Indonesia",  "lifeExpectancy": 71.5,  "fertility": 2.3}
{  "country": "Brazil",  "lifeExpectancy": 75.7,  "fertility": 1.7}
{  "country": "Pakistan",  "lifeExpectancy": 67.1,  "fertility": 3.5}
{  "country": "Nigeria",  "lifeExpectancy": 54.3,  "fertility": 5.4}
{  "country": "Bangladesh",  "lifeExpectancy": 72.3,  "fertility": 2}
{  "country": "Russia",  "lifeExpectancy": 72.7,  "fertility": 1.6}
{  "country": "Mexico",  "lifeExpectancy": 75,  "fertility": 2.1}
{  "country": "Japan",  "lifeExpectancy": 84.2,  "fertility": 1.4}
{  "country": "Philippines",  "lifeExpectancy": 71.1,  "fertility": 2.6}
{  "country": "Ethiopia",  "lifeExpectancy": 66.2,  "fertility": 4.2}
{  "country": "Vietnam",  "lifeExpectancy": 75.3,  "fertility": 2}
{  "country": "Egypt",  "lifeExpectancy": 71.8,  "fertility": 3.3}
{  "country": "Germany",  "lifeExpectancy": 81,  "fertility": 1.6}
{  "country": "Iran",  "lifeExpectancy": 76.5,  "fertility": 2.1}
{  "country": "Turkey",  "lifeExpectancy": 77.4,  "fertility": 2.1}
{  "country": "Congo",  "lifeExpectancy": 60.4,  "fertility": 5.9}
{  "country": "Thailand",  "lifeExpectancy": 76.9,  "fertility": 1.5}
{  "country": "France",  "lifeExpectancy": 82.5,  "fertility": 1.9}
{  "country": "United Kingdom",  "lifeExpectancy": 81.4,  "fertility": 1.7}
{  "country": "Italy",  "lifeExpectancy": 82.9,  "fertility": 1.3}
{  "country": "South Africa",  "lifeExpectancy": 63.9,  "fertility": 2.4}
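
If you prefer the HTTP API over the upload screen, the same lines can be turned into a body for the Elasticsearch bulk endpoint (`POST /country_data/_bulk` with the `application/x-ndjson` content type). This is only a rough sketch and not the route I follow in this post; the helper name `build_bulk_body` is made up for illustration, and sending the request (including authentication) is left out.

```python
import json


def build_bulk_body(ndjson_text: str) -> str:
    """Put an empty "index" action line before every document line."""
    lines = []
    for raw in ndjson_text.splitlines():
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines
        doc = json.loads(raw)  # fail early on malformed lines
        # The target index comes from the URL, so the action itself can stay empty.
        lines.append(json.dumps({"index": {}}))
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = (
        '{"country": "China", "lifeExpectancy": 76.7, "fertility": 1.7}\n'
        '{"country": "India", "lifeExpectancy": 69.4, "fertility": 2.2}\n'
    )
    print(build_bulk_body(sample), end="")
```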


Importing data in Kibana

It’s time to import the data into Kibana. By default, Kibana is empty, so it will ask you to provide some data; use the file from above. Upload it to Kibana and make sure it looks like this:

The country field should be a keyword, and the fertility and lifeExpectancy fields need to be numbers. After checking this, click the import button. When asked for the index name, you can enter “country_data”, and make sure the Create index pattern option is checked. If everything goes well, you should see something like this:

Creating the actual scatter chart

Now that we have the data ready, let’s proceed to create the scatter chart visualization. If you go to Create visualization you’ll see a list of available visualizations to create. Please select the Vega one:

Here things get a little more complex. We have more freedom to define the chart with Vega’s custom JSON-based format. Although I’m no specialist in this matter, I managed to come up with some scatter plot code which you can paste straight into the editor:

{
  $schema: https://vega.github.io/schema/vega-lite/v5.json
  title: Country fertility representation
  mark: {"type": "point", "tooltip": true}
  data: {
    url: {
      %context%: true
      index: country_data
      body: {
        size: 1000
        _source: ["country", "lifeExpectancy", "fertility"]
      }
    }
    format: {property: "hits.hits"}
  }
  encoding: {
    x: {field: "_source.lifeExpectancy", type: "quantitative", title: "Life Expectancy"}
    y: {field: "_source.fertility", type: "quantitative", title: "Fertility"}
    tooltip: [
      {"field": "_source.country", "type": "nominal", "title": "Country"}
      {"field": "_source.lifeExpectancy", "type": "quantitative", "title": "Life Expectancy"}
      {"field": "_source.fertility", "type": "quantitative", "title": "Fertility"}
    ]
  }
}

After pasting the code above, press the Update button and voilà 😀! You now have a working scatter plot with the data from the index. The X axis represents the life expectancy and the Y axis the fertility rate. If you hover over a point, you can see all the details, including the country.

Now you can save the visualization and use it in a dashboard. You may want to modify it to suit your needs 😉

Final thoughts

It’s superb that Kibana offers such a powerful feature for implementing custom charts. The example above only scratches the surface, and the capabilities of Vega-Lite are well worth exploring thoroughly. In our example, we basically tell Vega-Lite to get some data from the index country_data and map the fields defined there to the X and Y axes. We also tell it to display tooltips with the relevant information. Things can get more complex, so I encourage you to check their website.

I guess that’s about it for this article. If you have any questions, please comment or use the contact form to get in touch. As always:
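To see what the spec does with the search result, here is a small Python sketch that mimics it: `format: {property: "hits.hits"}` picks the list of hits out of the response, and each encoding reads a dotted field such as `_source.fertility` from every hit. The `response` dictionary is a trimmed-down, hand-written stand-in for a real Elasticsearch response (values taken from the sample data), and the helper names `get_path` and `to_points` are made up for this illustration; this is not how Vega is implemented internally.

```python
from functools import reduce


def get_path(obj: dict, dotted: str):
    """Resolve a dotted path such as 'hits.hits' or '_source.fertility'."""
    return reduce(lambda acc, key: acc[key], dotted.split("."), obj)


def to_points(response: dict) -> list[tuple[float, float, str]]:
    """Return (x, y, label) tuples the way the spec maps fields to the chart."""
    rows = get_path(response, "hits.hits")  # same role as format.property
    return [
        (
            get_path(row, "_source.lifeExpectancy"),  # x axis
            get_path(row, "_source.fertility"),       # y axis
            get_path(row, "_source.country"),         # tooltip label
        )
        for row in rows
    ]


# Hand-written stand-in for a search response; a real one has more keys.
response = {
    "hits": {
        "hits": [
            {"_index": "country_data", "_source": {"country": "Japan", "lifeExpectancy": 84.2, "fertility": 1.4}},
            {"_index": "country_data", "_source": {"country": "Nigeria", "lifeExpectancy": 54.3, "fertility": 5.4}},
        ]
    }
}

if __name__ == "__main__":
    for point in to_points(response):
        print(point)
```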

Thanks for reading, I hope you found this article useful and interesting. If you have any suggestions don’t hesitate to contact me. If you found my content useful please consider a small donation. Any support is greatly appreciated! Cheers  😉

afivan

