Getting Started with Big Data Using Elasticsearch and Kibana
An introduction to Elasticsearch and Kibana: the fundamentals of full-text search, data analysis, and visualization at scale.
At some point, every application outgrows its database’s built-in search capabilities. The performance cliff of LIKE '%...%' queries, the need to search through log data, the demand for real-time filtering across millions of records — all of these point to the same underlying truth: relational databases were not designed for full-text search or large-scale data analysis. That is exactly the gap Elasticsearch fills.
Elasticsearch is an open-source search and analytics engine. At its core sits Apache Lucene; Elasticsearch wraps Lucene’s powerful indexing infrastructure in a JSON-based HTTP API and deploys it on a distributed, horizontally scalable architecture. You can store and query diverse data types — unstructured text, log entries, time-series data, and geospatial coordinates — all within the same system.
How Elasticsearch Stores Data
What a relational database calls a “table” and a “row”, Elasticsearch calls an index and a document. An index is a logical container for documents of a similar type; a document is a single data record in JSON format.
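To make the index and document vocabulary concrete, here is a minimal sketch in Python (standard library only). The index name `products`, the document ID, and the field names are hypothetical; the `PUT /<index>/_doc/<id>` path follows the document indexing API. The snippet only builds the request and does not send anything.

```python
import json

# Hypothetical index name and document; the fields are illustrative only.
INDEX = "products"
doc_id = "1"
document = {
    "name": "Trail running shoes",
    "price": 89.90,
    "in_stock": True,
    "tags": ["running", "outdoor"],
}

def index_request(index: str, doc_id: str, doc: dict) -> tuple[str, str, str]:
    """Return (method, path, body) for indexing a single JSON document."""
    return "PUT", f"/{index}/_doc/{doc_id}", json.dumps(doc)

method, path, body = index_request(INDEX, doc_id, document)
print(method, path)
print(body)
```

The index plays the role of the table, the JSON body is the row, and the ID in the path is roughly the primary key.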
The distributed architecture that enables large-scale operation works like this: each index is divided into pieces called shards. Each shard is essentially an independent Lucene instance and can be distributed across different nodes. Replicas of these shards serve dual purposes — they act as backups for high availability and distribute read load across the cluster.
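The way a document ends up on a particular shard can be sketched as hashing a routing value (the document ID by default) and mapping the result onto the number of primary shards. The code below is a simplified illustration, not the real implementation: `zlib.crc32` is a stand-in hash, the shard count is an assumed value, and Elasticsearch's actual routing uses a different hash function and a somewhat more involved formula.

```python
import zlib
from collections import Counter

NUM_PRIMARY_SHARDS = 3  # assumed value for illustration

def pick_shard(routing_value: str, num_shards: int = NUM_PRIMARY_SHARDS) -> int:
    """Map a routing value (the document ID by default) to a shard number."""
    # crc32 is a stand-in hash, not what Elasticsearch uses internally.
    return zlib.crc32(routing_value.encode("utf-8")) % num_shards

# Route 1000 made-up document IDs and count how many land on each shard.
doc_ids = [f"doc-{i}" for i in range(1000)]
distribution = Counter(pick_shard(d) for d in doc_ids)
print(dict(distribution))
```

Because the same ID always maps to the same shard, changing the shard count would change where existing documents are expected to live. That is one reason the number of primary shards is fixed when an index is created.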
The practical outcome of this architecture: when a node fails, replica shards take over, data is preserved, and the system keeps running. A single node is sufficient for development; sizing this topology correctly matters in production.
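A toy model of that failover behavior, with made-up node names and a hand-written shard layout, might look like the following. Real shard allocation is decided by the cluster itself and involves far more than this; the sketch only shows the idea that a replica is promoted when the node holding the primary disappears.

```python
# Toy model: shard number -> which node holds the primary and the replica.
# Node names and the layout are made up for illustration.
allocation = {
    0: {"primary": "node-a", "replica": "node-b"},
    1: {"primary": "node-b", "replica": "node-c"},
    2: {"primary": "node-c", "replica": "node-a"},
}

def fail_node(allocation: dict, failed: str) -> dict:
    """Return a new allocation after `failed` leaves the cluster."""
    result = {}
    for shard, copies in allocation.items():
        primary, replica = copies["primary"], copies["replica"]
        if primary == failed:
            # Promote the surviving replica; the lost copy stays unassigned
            # until it can be rebuilt on another node.
            primary, replica = replica, None
        elif replica == failed:
            replica = None
        result[shard] = {"primary": primary, "replica": replica}
    return result

after = fail_node(allocation, "node-a")
for shard, copies in after.items():
    print(shard, copies)
```

Every shard still has a primary after the failure, so reads and writes can continue, but two shards are left without a replica until the cluster restores redundancy.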
Where Elasticsearch Truly Shines
Elasticsearch is not the right tool for every job — it is the right tool for specific problems:
Full-text search: Beyond simple string matching, it analyzes text — stemming, synonym expansion, relevance scoring. When a user on an e-commerce site searches for “running shoe”, getting back results for “running shoes” is the analysis layer at work.
Log and monitoring analysis: Built for ingesting thousands of log lines per second and running real-time queries over them. The ELK stack — Elasticsearch, Logstash, Kibana — is the classic combination for this use case.
Time-series data: Every record carries a timestamp, and Elasticsearch is optimized for questions like “which minute in the last hour had the highest traffic” and for generating time-based histograms.
Geospatial queries: Storing coordinate-based data and executing geo-distance queries such as “return records within X km of this location” is supported out of the box.
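Of these use cases, the analysis step behind full-text search is the easiest to illustrate in a few lines. The following is a toy sketch, not how Lucene works internally: the "stemmer" just strips a trailing "s", the sample documents are invented, and the search requires every query term to match, whereas a real match query is more flexible and also scores results by relevance.

```python
import re
from collections import defaultdict

def analyze(text: str) -> list[str]:
    """Lowercase, split on non-letters, and strip a trailing plural 's'."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # Crude stand-in for stemming: "shoes" -> "shoe". Real stemmers do far more.
    return [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens]

def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each analyzed term to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index

def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Return IDs of documents that contain every analyzed query term."""
    term_sets = [index.get(term, set()) for term in analyze(query)]
    return set.intersection(*term_sets) if term_sets else set()

# Invented sample documents.
docs = {
    1: "Lightweight running shoes for road races",
    2: "Leather dress shoes",
    3: "Running socks, three pairs",
}
index = build_inverted_index(docs)
print(search(index, "running shoe"))  # matches document 1 only
```

The query and the documents pass through the same `analyze` function, which is why "shoe" finds "shoes": both are reduced to the same term before they are compared.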
It does not do what relational databases do: transactions, foreign keys, joins. Elasticsearch and a relational database are complementary rather than competing: application data lives in MySQL/PostgreSQL, while the search and analytics layer lives in Elasticsearch.
Kibana: Making Sense of Your Data
Kibana is the visualization interface for Elasticsearch data. You can reach Elasticsearch directly via curl or a client library, but Kibana organizes that data for human consumption.
A few core areas of use:
- Discover: Query your data and inspect raw documents. Log investigation happens here.
- Visualize/Dashboard: Build charts from metrics and assemble them together. Line charts, pie charts, heat maps, counters — all via drag and drop.
- Dev Tools: Send Query DSL requests directly to Elasticsearch from within Kibana. Indispensable when learning the API.
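Requests in Dev Tools are written as an HTTP method and path on one line followed by a JSON body. As a rough sketch, the snippet below builds three Query DSL bodies as Python dictionaries and prints them in that form. The index names (`products`, `logs`, `places`), the field names, and the coordinates are hypothetical; `match`, `range`, `date_histogram`, and `geo_distance` are standard Query DSL building blocks, and the `location` field would need to be mapped as `geo_point`.

```python
import json

def console(method: str, path: str, body: dict) -> str:
    """Render a request the way it would be typed into Dev Tools."""
    return f"{method} {path}\n{json.dumps(body, indent=2)}"

# Full-text search on an analyzed text field.
match_query = {"query": {"match": {"name": "running shoe"}}}

# Document counts per minute over the last hour (time-series aggregation).
traffic_per_minute = {
    "size": 0,  # return only the aggregation, no individual hits
    "query": {"range": {"@timestamp": {"gte": "now-1h"}}},
    "aggs": {
        "per_minute": {
            "date_histogram": {"field": "@timestamp", "fixed_interval": "1m"}
        }
    },
}

# Documents within 5 km of a point; assumes `location` is mapped as geo_point.
nearby = {
    "query": {
        "bool": {
            "filter": {
                "geo_distance": {
                    "distance": "5km",
                    "location": {"lat": 41.01, "lon": 28.97},
                }
            }
        }
    }
}

for path, body in [
    ("/products/_search", match_query),
    ("/logs/_search", traffic_per_minute),
    ("/places/_search", nearby),
]:
    print(console("GET", path, body))
    print()
```

Pasting any of the printed requests into Dev Tools against an index with matching fields is a quick way to see how the response is structured.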
Kibana also ships with Machine Learning and Alerting features. Machine Learning and some alerting capabilities require a paid subscription, while the core visualization features remain free.
Setting Up with Docker
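For a development environment, Docker is the most practical approach. The Elasticsearch and Kibana versions must match. With plain docker commands, create the shared network first, then start both containers (security is disabled here for local development only):
docker network create somenetwork
docker run -d --name elasticsearch --net somenetwork -p 9200:9200 -e "discovery.type=single-node" -e "xpack.security.enabled=false" elasticsearch:8.8.1
docker run -d --name kibana --net somenetwork -p 5601:5601 -e "ELASTICSEARCH_HOSTS=http://elasticsearch:9200" kibana:8.8.1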
Or with a docker-compose.yml:
version: '3'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.8.1
    container_name: elasticsearch
    environment:
      - discovery.type=single-node
      # Development only: lets Kibana and curl use plain HTTP without credentials
      - xpack.security.enabled=false
    ports:
      - "9200:9200"
    networks:
      - somenetwork
  kibana:
    image: docker.elastic.co/kibana/kibana:8.8.1
    container_name: kibana
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    ports:
      - "5601:5601"
    networks:
      - somenetwork
networks:
  somenetwork:
    driver: bridge
docker-compose up -d
Once up, Elasticsearch is accessible at http://localhost:9200 and Kibana at http://localhost:5601.
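A quick way to confirm the cluster is actually responding is the cluster health API. The sketch below assumes a cluster reachable over plain HTTP without credentials at the port mapped above; `fetch_health` and `summarize_health` are hypothetical helper names, and the sample response uses made-up values so the snippet runs even when no cluster is available.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # matches the 9200:9200 port mapping

def fetch_health(base_url: str = BASE_URL, timeout: float = 5.0) -> dict:
    """Call the cluster health API and return the parsed JSON response."""
    url = f"{base_url}/_cluster/health"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)

def summarize_health(health: dict) -> str:
    """Turn a cluster health response into a one-line summary."""
    name = health.get("cluster_name", "?")
    status = health.get("status", "unknown")
    nodes = health.get("number_of_nodes", "?")
    return f"cluster={name} status={status} nodes={nodes}"

# Illustrative response with made-up values, so this runs without a cluster.
sample = {"cluster_name": "docker-cluster", "status": "yellow", "number_of_nodes": 1}
print(summarize_health(sample))
```

Once the containers are running, `print(summarize_health(fetch_health()))` performs the real check. On a single node, a yellow status is common because replica shards have no second node to be assigned to.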
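Starting with Elasticsearch 8.x, security is enabled by default: authentication and TLS are active out of the box. Setting the xpack.security.enabled=false environment variable on the Elasticsearch container disables this for a development environment; without it, requests sent over plain HTTP or without credentials fail, typically with 401 errors. Leave security enabled anywhere beyond local development.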
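If you keep security enabled instead, every request needs credentials and has to go over HTTPS. Here is a minimal sketch of what that looks like with the Python standard library. The password is a placeholder, the helper names are hypothetical, and skipping certificate verification is an assumption that only makes sense against a local development cluster with a self-signed certificate.

```python
import base64
import ssl
import urllib.request

def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Basic Authorization header."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")

def build_request(url: str, username: str, password: str) -> urllib.request.Request:
    """Create a request that carries Basic Auth credentials."""
    headers = {"Authorization": basic_auth_header(username, password)}
    return urllib.request.Request(url, headers=headers)

def dev_only_tls_context() -> ssl.SSLContext:
    """Skip certificate verification. Acceptable only for a local dev cluster."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be disabled before verify_mode is relaxed
    ctx.verify_mode = ssl.CERT_NONE
    return ctx

# Placeholder credentials: substitute the password set for your own cluster.
req = build_request("https://localhost:9200", "elastic", "changeme")
print(req.get_header("Authorization"))

# To actually send it against a running, secured cluster:
#   urllib.request.urlopen(req, context=dev_only_tls_context())
```

For anything beyond a local experiment, verifying the cluster's CA certificate is the safer choice than turning verification off.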
What Comes Next
This post covers the core concepts and setup for Elasticsearch and Kibana. In real-world usage, the first place you are likely to get stuck is Query DSL — Elasticsearch’s JSON-based query language looks unfamiliar at first, but once it clicks, it is remarkably expressive. Mapping definitions, analyzer selection, and aggregations are topics I will cover in upcoming posts.