Getting Started with Big Data Using Elasticsearch and Kibana

At some point, every application outgrows its database’s built-in search capabilities. The performance cliff of LIKE '%...%' queries, the need to search through log data, the demand for real-time filtering across millions of records — all of these point to the same underlying truth: relational databases were not designed for full-text search or large-scale data analysis. That is exactly the gap Elasticsearch fills.

Elasticsearch is an open-source search and analytics engine. At its core sits Apache Lucene; Elasticsearch wraps Lucene’s powerful indexing infrastructure in a JSON-based HTTP API and deploys it on a distributed, horizontally scalable architecture. You can store and query diverse data types — unstructured text, log entries, time-series data, and geospatial coordinates — all within the same system.

How Elasticsearch Stores Data

What a relational database calls a “table” and a “row”, Elasticsearch calls an index and a document. An index is a logical container for documents of a similar type; a document is a single data record in JSON format.
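To make the index/document vocabulary concrete, here is a minimal Python sketch that builds (but does not send) the HTTP request for storing one document. The index name products, the document fields, and the localhost:9200 address are assumptions for illustration; PUT /<index>/_doc/<id> is the standard endpoint for indexing a document under a given ID.

```python
import json
import urllib.request

def build_index_request(base_url, index, doc_id, document):
    """Build a PUT /<index>/_doc/<id> request that stores one JSON document."""
    url = f"{base_url}/{index}/_doc/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )

# Hypothetical document: roughly one "row" of a products "table".
product = {"name": "Trail running shoes", "price": 89.90, "in_stock": True}
request = build_index_request("http://localhost:9200", "products", "1", product)

print(request.get_method(), request.full_url)
# To actually send it against a running node: urllib.request.urlopen(request)
```

Nothing here declares a schema up front; the JSON body is the record, and the index is simply the name in the URL path.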

The distributed architecture that enables large-scale operation works like this: each index is divided into pieces called shards. Each shard is a self-contained Lucene index and can be placed on a different node. Replica copies of these shards serve two purposes: they act as backups for high availability, and they spread read load across the cluster.
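How a document ends up on a particular shard can be sketched in a few lines. Roughly speaking, Elasticsearch hashes a routing value (the document ID by default) and takes the result modulo the number of primary shards. The sketch below uses CRC32 as a stand-in hash, which is an assumption for illustration and not the hash function Elasticsearch actually uses; the function name pick_shard and the document IDs are hypothetical too.

```python
import zlib

def pick_shard(routing_value: str, number_of_primary_shards: int) -> int:
    """Map a routing value to a shard number in [0, number_of_primary_shards)."""
    digest = zlib.crc32(routing_value.encode("utf-8"))  # stand-in hash, not the real one
    return digest % number_of_primary_shards

# Distribute ten hypothetical document IDs over three primary shards.
placement = {}
for doc_id in (f"doc-{n}" for n in range(10)):
    placement.setdefault(pick_shard(doc_id, 3), []).append(doc_id)

for shard, docs in sorted(placement.items()):
    print(f"shard {shard}: {docs}")
```

Because the same ID always maps to the same shard number, changing the number of primary shards would change where existing documents are expected to live. That is why the primary shard count is fixed when an index is created.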

The practical outcome of this architecture: when a node fails, replica shards on the remaining nodes are promoted to primaries, so the data is preserved and the system keeps running. A single node is sufficient for development; in production, sizing this topology correctly matters.
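The failover behaviour can be illustrated with a toy model. The sketch below builds the settings body for creating an index (number_of_shards and number_of_replicas are the real index settings), then places shard copies on nodes with a simple round-robin rule and checks which shards survive the loss of one node. The allocate and surviving_shards functions and the node names are hypothetical; Elasticsearch's actual allocation logic is considerably more involved.

```python
import json

def index_settings(primaries: int, replicas: int) -> dict:
    """Request body for creating an index (PUT /<index>) with explicit shard counts."""
    return {"settings": {"number_of_shards": primaries, "number_of_replicas": replicas}}

def allocate(primaries: int, replicas: int, nodes: list) -> dict:
    """Toy round-robin placement: copies of one shard always land on different nodes."""
    if replicas + 1 > len(nodes):
        raise ValueError("not enough nodes to place every copy on a separate node")
    layout = {node: [] for node in nodes}
    for shard in range(primaries):
        for copy in range(replicas + 1):  # copy 0 is the primary
            node = nodes[(shard + copy) % len(nodes)]
            layout[node].append((shard, "primary" if copy == 0 else "replica"))
    return layout

def surviving_shards(layout: dict, failed_node: str) -> set:
    """Shard numbers that still have at least one copy after a node is lost."""
    return {
        shard
        for node, copies in layout.items()
        if node != failed_node
        for shard, _role in copies
    }

layout = allocate(primaries=3, replicas=1, nodes=["node-a", "node-b", "node-c"])
print(json.dumps(index_settings(3, 1)))
print(sorted(surviving_shards(layout, failed_node="node-a")))  # prints [0, 1, 2]
```

In this toy model, asking for one replica on a single node raises an error. A real single-node cluster instead leaves the replica unassigned and reports yellow health, which is harmless in development.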

Where Elasticsearch Truly Shines

Elasticsearch is not the right tool for every job — it is the right tool for specific problems:

Full-text search: Beyond simple string matching, it analyzes text: stemming, synonym expansion, relevance scoring. When a user on an e-commerce site searches for “running shoe”, getting back results for “running shoes” is the analysis layer at work (given a field configured with a stemming analyzer).

Log and monitoring analysis: Elasticsearch is built for ingesting thousands of log lines per second and running real-time queries over them. The ELK stack (Elasticsearch, Logstash, Kibana) is the classic combination for this use case.

Time-series data: When every record carries a timestamp, Elasticsearch is optimized for questions like “which minute in the last hour had the highest traffic” and for generating time-based histograms.

Geospatial queries: Storing coordinate-based data and executing geo-distance queries such as “return records within X km of this location” is a supported out-of-the-box feature.
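Three of the use cases above map directly onto Query DSL request bodies. The sketch below builds them as plain Python dictionaries: a match query for full-text search, a date_histogram aggregation for per-minute traffic, and a geo_distance filter. The field names title, @timestamp, and location are assumptions, and the geo filter presumes the field is mapped as a geo_point.

```python
import json

def match_query(field, text):
    """Full-text query: the search text is analyzed before it is matched."""
    return {"query": {"match": {field: text}}}

def traffic_per_minute(timestamp_field):
    """Count documents per minute over the last hour with a date_histogram aggregation."""
    return {
        "size": 0,  # return only the aggregation, not the matching documents
        "query": {"range": {timestamp_field: {"gte": "now-1h"}}},
        "aggs": {
            "per_minute": {
                "date_histogram": {"field": timestamp_field, "fixed_interval": "1m"}
            }
        },
    }

def within_distance(geo_field, lat, lon, km):
    """Keep only documents whose location lies within `km` kilometres of a point."""
    return {
        "query": {
            "bool": {
                "filter": {
                    "geo_distance": {
                        "distance": f"{km}km",
                        geo_field: {"lat": lat, "lon": lon},
                    }
                }
            }
        }
    }

# Hypothetical inputs; each body would be sent to POST /<index>/_search.
for body in (
    match_query("title", "running shoe"),
    traffic_per_minute("@timestamp"),
    within_distance("location", 41.01, 28.98, 5),
):
    print(json.dumps(body, indent=2))
```

The busiest minute is then the bucket with the largest doc_count in the per_minute aggregation of the response.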

It does not do what relational databases do — transactions, foreign keys, joins. These two are not competitors; they are complementary: application data lives in MySQL/PostgreSQL, while the search and analytics layer lives in Elasticsearch.

Kibana: Making Sense of Your Data

Kibana is the visualization interface for Elasticsearch data. You can reach Elasticsearch directly via curl or a client library, but Kibana organizes that data for human consumption.

A few core areas of use:

Discover: Query your data and inspect raw documents. Log investigation happens here.
Visualize/Dashboard: Build charts from metrics and assemble them together. Line charts, pie charts, heat maps, counters — all via drag and drop.
Dev Tools: Send Query DSL requests directly to Elasticsearch from within Kibana. Indispensable when learning the API.
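The Dev Tools console accepts requests written as an HTTP method and path on one line, followed by an optional JSON body. As a small convenience sketch, the hypothetical helper below renders a Python dictionary in that shape so it can be pasted into the console; the products index name is an assumption.

```python
import json

def to_console(method, path, body=None):
    """Render a request as 'METHOD path' followed by a pretty-printed JSON body."""
    lines = [f"{method.upper()} {path}"]
    if body is not None:
        lines.append(json.dumps(body, indent=2))
    return "\n".join(lines)

print(to_console("get", "/products/_search", {"query": {"match_all": {}}}))
# GET /products/_search
# {
#   "query": {
#     "match_all": {}
#   }
# }
```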

Kibana also offers Machine Learning and Alerting features. Machine Learning requires a paid subscription and parts of Alerting are tier-gated as well, but the core visualization features remain free.

Setting Up with Docker

For a development environment, Docker is the most practical approach. The Elasticsearch and Kibana versions must match:

# Create the shared network first; docker run fails if it does not exist.
docker network create somenetwork
docker run -d --name elasticsearch --net somenetwork -p 9200:9200 -e "discovery.type=single-node" elasticsearch:8.8.1
docker run -d --name kibana --net somenetwork -p 5601:5601 -e "ELASTICSEARCH_HOSTS=http://elasticsearch:9200" kibana:8.8.1

Or with a docker-compose.yml:

version: '3'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.8.1
    container_name: elasticsearch
    environment:
      - discovery.type=single-node
    ports:
      - "9200:9200"
    networks:
      - somenetwork

  kibana:
    image: docker.elastic.co/kibana/kibana:8.8.1
    container_name: kibana
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    ports:
      - "5601:5601"
    networks:
      - somenetwork

networks:
  somenetwork:
    driver: bridge

docker-compose up -d

Once up, Elasticsearch is accessible at http://localhost:9200 and Kibana at http://localhost:5601.
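A quick way to confirm the node is answering is to request the root endpoint, which returns a small JSON document with the cluster name and version. The sketch below separates parsing from fetching so the parsing part can be tried without a running node. The function names are hypothetical, and the plain-HTTP URL assumes security has been disabled for development.

```python
import json
import urllib.request

def parse_root_info(body: bytes) -> dict:
    """Extract the cluster name and version number from the GET / response."""
    info = json.loads(body)
    return {
        "cluster_name": info["cluster_name"],
        "version": info["version"]["number"],
    }

def fetch_root_info(base_url="http://localhost:9200", timeout=5.0) -> dict:
    """Call GET / on a running node; raises URLError if nothing is listening."""
    with urllib.request.urlopen(base_url, timeout=timeout) as response:
        return parse_root_info(response.read())

# Abbreviated, illustrative response body (a real one carries more fields).
sample = b'{"name": "node-1", "cluster_name": "docker-cluster", "version": {"number": "8.8.1"}}'
print(parse_root_info(sample))
```

Comparing the reported version with the Kibana image tag is an easy way to catch a version mismatch early.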

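Starting with Elasticsearch 8.x, security is enabled by default: HTTP Basic Auth and TLS are active out of the box. To disable this in a development environment, set xpack.security.enabled=false in the Elasticsearch container's environment (an extra -e flag on docker run, or an extra entry under environment in docker-compose.yml); otherwise plain-HTTP, unauthenticated requests will fail, for example with 401 errors.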
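If security stays enabled, every request needs credentials. The sketch below shows what HTTP Basic Auth amounts to on the client side: a base64-encoded username:password pair in the Authorization header. The helper names and the placeholder password are assumptions, elastic is the built-in superuser name, and certificate handling for the TLS connection is deliberately left out.

```python
import base64
import urllib.request

def basic_auth_header(username: str, password: str) -> str:
    """Return the value of an HTTP Basic Auth 'Authorization' header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"

def authenticated_request(url: str, username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a request that carries Basic Auth credentials."""
    request = urllib.request.Request(url)
    request.add_header("Authorization", basic_auth_header(username, password))
    return request

# Placeholder credentials; substitute the password generated for your own node.
request = authenticated_request("https://localhost:9200", "elastic", "<your-password>")
print(request.get_header("Authorization").split()[0])  # prints Basic
```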

What Comes Next

This post covers the core concepts and setup for Elasticsearch and Kibana. In real-world usage, the first place you are likely to get stuck is Query DSL — Elasticsearch’s JSON-based query language looks unfamiliar at first, but once it clicks, it is remarkably expressive. Mapping definitions, analyzer selection, and aggregations are topics I will cover in upcoming posts.
