Muhammet Şafak

Getting Started with Big Data Using Elasticsearch and Kibana

An introduction to Elasticsearch and Kibana: the fundamentals of full-text search, data analysis, and visualization at scale.


At some point, every application outgrows its database’s built-in search capabilities. The performance cliff of LIKE '%...%' queries, the need to search through log data, the demand for real-time filtering across millions of records — all of these point to the same underlying truth: relational databases were not designed for full-text search or large-scale data analysis. That is exactly the gap Elasticsearch fills.

Elasticsearch is an open-source search and analytics engine. At its core sits Apache Lucene; Elasticsearch wraps Lucene’s powerful indexing infrastructure in a JSON-based HTTP API and deploys it on a distributed, horizontally scalable architecture. You can store and query diverse data types — unstructured text, log entries, time-series data, and geospatial coordinates — all within the same system.

How Elasticsearch Stores Data

What a relational database calls a “table” and a “row”, Elasticsearch calls an index and a document. An index is a logical container for documents of a similar type; a document is a single data record in JSON format.
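To make the table/row analogy concrete, here is a minimal Python sketch (standard library only) of how a single document could be prepared for storage. The index name products, the document ID, and the field names are hypothetical; PUT /<index>/_doc/<id> is the endpoint for storing a document under an explicit ID. The sketch assumes a local development node with security disabled, and nothing is sent unless you call send() yourself.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local dev node, security disabled


def build_index_request(index: str, doc_id: str, document: dict) -> urllib.request.Request:
    """Build (but do not send) a PUT /<index>/_doc/<id> request."""
    return urllib.request.Request(
        url=f"{ES_URL}/{index}/_doc/{doc_id}",
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (needs a running node)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Hypothetical document: one "row" of a hypothetical "products" index.
product = {"name": "Trail running shoes", "price": 89.90, "in_stock": True}
request = build_index_request("products", "1", product)
print(request.get_method(), request.full_url)
```

Calling send(request) against a running node would store the document; the index is created on first write if it does not already exist.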

The distributed architecture that enables large-scale operation works like this: each index is divided into pieces called shards. Each shard is a self-contained Lucene index and can be placed on a different node. Replica copies of these shards serve two purposes: they act as backups for high availability, and they spread read load across the cluster.

The practical outcome of this architecture: when a node fails, replica shards on the surviving nodes are promoted to primaries, no data is lost, and the cluster keeps running. A single node is sufficient for development, although it cannot hold replicas of its own shards; in production, sizing this topology correctly matters.
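The shard layout is chosen per index. As a rough sketch, the Python snippet below builds the body of a create-index request (PUT /<index>) using the number_of_shards and number_of_replicas settings; the values 3 and 1 are arbitrary examples, not a sizing recommendation.

```python
import json


def index_settings(primary_shards: int, replicas: int) -> dict:
    """Body for PUT /<index> that sets the shard layout at creation time."""
    return {
        "settings": {
            "number_of_shards": primary_shards,
            "number_of_replicas": replicas,
        }
    }


def total_shard_copies(primary_shards: int, replicas: int) -> int:
    """Each primary has `replicas` copies, so this many shards exist in total."""
    return primary_shards * (1 + replicas)


body = index_settings(primary_shards=3, replicas=1)
print(json.dumps(body, indent=2))
print(total_shard_copies(3, 1))  # 3 primaries + 3 replicas
```

The number of primary shards is fixed when the index is created, while the replica count can be changed later. On a single-node development cluster, replicas have nowhere to go (a replica is never placed on the same node as its primary), so such an index reports yellow health; setting number_of_replicas to 0 avoids that.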

Where Elasticsearch Truly Shines

Elasticsearch is not the right tool for every job — it is the right tool for specific problems:

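Full-text search: Beyond simple string matching, it analyzes text: tokenization, stemming, synonym expansion, relevance scoring. When a user on an e-commerce site searches for “running shoe” and gets back results for “running shoes”, that is the analysis layer at work. Stemming of this kind requires a language-aware analyzer such as english; the default standard analyzer does not stem.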

Log and monitoring analysis: Built for ingesting thousands of log lines per second and running real-time queries over them. The ELK stack — Elasticsearch, Logstash, Kibana — is the classic combination for this use case.

Time-series data: When every record carries a timestamp, Elasticsearch is optimized for questions like “which minute in the last hour had the highest traffic” and for generating time-based histograms.

Geospatial queries: Storing coordinate-based data and executing geo-distance queries such as “return records within X km of this location” is a supported out-of-the-box feature.
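Three of the use cases above translate into short Query DSL bodies. The Python sketch below only builds the JSON you would send to GET /<index>/_search; the field names (name, @timestamp, location) and the coordinates are hypothetical, and the geo query assumes the field is mapped as geo_point.

```python
import json


def match_query(field: str, text: str) -> dict:
    """Full-text search: the text is analyzed before it is matched."""
    return {"query": {"match": {field: text}}}


def requests_per_minute(timestamp_field: str) -> dict:
    """Time series: bucket the last hour of documents by minute."""
    return {
        "size": 0,  # return only aggregation buckets, no hits
        "query": {"range": {timestamp_field: {"gte": "now-1h"}}},
        "aggs": {
            "per_minute": {
                "date_histogram": {
                    "field": timestamp_field,
                    "fixed_interval": "1m",
                }
            }
        },
    }


def within_distance(field: str, lat: float, lon: float, km: int) -> dict:
    """Geospatial: documents whose geo_point lies within `km` of (lat, lon)."""
    return {
        "query": {
            "bool": {
                "filter": {
                    "geo_distance": {
                        "distance": f"{km}km",
                        field: {"lat": lat, "lon": lon},
                    }
                }
            }
        }
    }


print(json.dumps(match_query("name", "running shoe")))
print(json.dumps(requests_per_minute("@timestamp")))
print(json.dumps(within_distance("location", 41.01, 28.98, 5)))
```

In the time-series case, the busiest minute is the bucket with the highest doc_count in the response.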

It does not offer what relational databases do: multi-statement transactions, foreign keys, and general-purpose joins. The two are not competitors; they are complementary: application data lives in MySQL/PostgreSQL, while the search and analytics layer lives in Elasticsearch.

Kibana: Making Sense of Your Data

Kibana is the visualization interface for Elasticsearch data. You can reach Elasticsearch directly via curl or a client library, but Kibana organizes that data for human consumption.

A few core areas of use:

  • Discover: Query your data and inspect raw documents. Log investigation happens here.
  • Visualize/Dashboard: Build charts from your data and assemble them into dashboards. Line charts, pie charts, heat maps, counters, all via drag and drop.
  • Dev Tools: Send Query DSL requests directly to Elasticsearch from within Kibana. Indispensable when learning the API.

Kibana also offers Machine Learning and Alerting features. Machine Learning requires a paid subscription tier, as do some alerting connectors, while the core visualization features remain free.

Setting Up with Docker

For a development environment, Docker is the most practical approach. The Elasticsearch and Kibana versions must match:

docker network create somenetwork
docker run -d --name elasticsearch --net somenetwork -p 9200:9200 -e "discovery.type=single-node" -e "xpack.security.enabled=false" elasticsearch:8.8.1
docker run -d --name kibana --net somenetwork -p 5601:5601 -e "ELASTICSEARCH_HOSTS=http://elasticsearch:9200" kibana:8.8.1

Or with a docker-compose.yml:

version: '3'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.8.1
    container_name: elasticsearch
    environment:
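      - discovery.type=single-node
      # Development only: turns off the authentication and TLS that 8.x enables by default
      - xpack.security.enabled=false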
    ports:
      - "9200:9200"
    networks:
      - somenetwork

  kibana:
    image: docker.elastic.co/kibana/kibana:8.8.1
    container_name: kibana
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    ports:
      - "5601:5601"
    networks:
      - somenetwork

networks:
  somenetwork:
    driver: bridge

Start both containers with (or docker compose up -d on Compose v2):

docker-compose up -d

Once up, Elasticsearch is accessible at http://localhost:9200 and Kibana at http://localhost:5601.
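A quick way to confirm the node is actually ready is to ask for its cluster health. The Python sketch below assumes a development node reachable over plain HTTP with security disabled; GET /_cluster/health and its status values (green, yellow, red) are part of the Elasticsearch API, while the sample response is a trimmed, hypothetical illustration rather than real output.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: dev node with security disabled


def fetch_cluster_health(base_url: str = ES_URL) -> dict:
    """Call GET /_cluster/health (needs a running node)."""
    with urllib.request.urlopen(f"{base_url}/_cluster/health", timeout=5) as resp:
        return json.load(resp)


def describe_status(health: dict) -> str:
    """Translate the `status` field into a short human-readable message."""
    meanings = {
        "green": "all primary and replica shards are allocated",
        "yellow": "all primaries are allocated, some replicas are not",
        "red": "at least one primary shard is unallocated",
    }
    status = health.get("status", "unknown")
    return f"{status}: {meanings.get(status, 'unexpected response')}"


# Hypothetical, trimmed response for illustration; a real one has more fields.
sample = {"cluster_name": "docker-cluster", "status": "yellow", "number_of_nodes": 1}
print(describe_status(sample))
```

Against the running container you would call describe_status(fetch_cluster_health()) instead of using the sample.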

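Starting with Elasticsearch 8.x, security is enabled by default: authentication and TLS are active out of the box. To disable this in a development environment, the xpack.security.enabled=false environment variable must be set on the Elasticsearch container; without it, plain-HTTP requests are rejected and unauthenticated requests return 401 errors. Keep security enabled on any node that is reachable from outside your machine.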
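If you would rather keep security on, requests need credentials and must go over HTTPS. The following Python sketch shows one way to do that with the standard library; the password is a placeholder, and http_ca.crt stands for the CA certificate copied out of the Elasticsearch container (an assumption about your local setup). The request is built but not sent.

```python
import base64
import ssl
import urllib.request


def basic_auth_header(username: str, password: str) -> str:
    """Encode credentials as an HTTP Basic Authorization header value."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def secured_request(url: str, username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a GET request carrying Basic credentials."""
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", basic_auth_header(username, password))
    return request


def tls_context(ca_cert_path: str) -> ssl.SSLContext:
    """Trust the cluster's own CA certificate instead of disabling verification."""
    return ssl.create_default_context(cafile=ca_cert_path)


# Placeholder credentials; use the password generated for the `elastic` user.
request = secured_request("https://localhost:9200", "elastic", "example-password")
print(request.get_header("Authorization"))
# To send it: urllib.request.urlopen(request, context=tls_context("http_ca.crt"))
```

Loading the CA certificate keeps certificate verification intact, which is preferable to turning verification off even on a development machine.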

What Comes Next

This post covers the core concepts and setup for Elasticsearch and Kibana. In real-world usage, the first place you are likely to get stuck is Query DSL — Elasticsearch’s JSON-based query language looks unfamiliar at first, but once it clicks, it is remarkably expressive. Mapping definitions, analyzer selection, and aggregations are topics I will cover in upcoming posts.
