Why Loki over Elasticsearch?
| Metric | Elasticsearch | Loki |
|---|---|---|
| Storage cost (1TB/day) | ~$3,000/mo | ~$300/mo |
| RAM required | 64GB+ | 4-8GB |
| Index strategy | Full-text (expensive) | Labels only (cheap) |
| Query language | KQL/Lucene | LogQL (PromQL-like) |
| Grafana integration | Plugin | Native (first-class) |
| Operational complexity | High (shards, mappings) | Low |
Loki's key insight: don't index log content, only index metadata labels. Query by labels, grep for content.
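The label-only model can be illustrated with a toy in-memory store. This is a sketch, not how Loki is implemented: `ToyLogStore`, `push`, and `query` are hypothetical names, and real Loki keeps compressed chunks in object storage rather than Python lists. It shows that the index only knows about label sets, while content matching is a brute-force scan over the selected streams.

```python
from collections import defaultdict


class ToyLogStore:
    """Toy model: index label sets only, scan line content at query time."""

    def __init__(self):
        # Stream key (sorted label pairs) -> list of raw log lines.
        self.streams = defaultdict(list)

    def push(self, labels, line):
        key = tuple(sorted(labels.items()))
        self.streams[key].append(line)

    def query(self, selector, contains=""):
        """Return lines from streams matching every selector label,
        keeping only lines that contain the given substring."""
        wanted = set(selector.items())
        hits = []
        for key, lines in self.streams.items():
            if wanted <= set(key):  # label match: touches the index only
                # Content match: a plain scan, the "grep" step.
                hits.extend(line for line in lines if contains in line)
        return hits


store = ToyLogStore()
store.push({"namespace": "production", "app": "payment-service"}, "error: card declined")
store.push({"namespace": "production", "app": "payment-service"}, "payment ok")
store.push({"namespace": "staging", "app": "payment-service"}, "error: timeout")

print(store.query({"namespace": "production", "app": "payment-service"}, contains="error"))
# → ['error: card declined']
```

Narrow label selectors keep the scanned set small, which is why queries typically start from labels such as `namespace` and `app`.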
Architecture
┌─────────────────────────────────────────────────────┐
│                    Loki Cluster                     │
│                                                     │
│  ┌───────────┐  ┌───────────┐  ┌──────────────────┐ │
│  │Distributor│  │ Ingester  │  │  Query Frontend  │ │
│  └─────┬─────┘  └─────┬─────┘  └────────┬─────────┘ │
│        │              │                 │           │
│        └──────────────┼─────────────────┘           │
│                       │                             │
│              ┌────────▼────────┐                    │
│              │ Object Storage  │                    │
│              │ (S3/MinIO/GCS)  │                    │
│              └─────────────────┘                    │
└─────────────────────────────────────────────────────┘
                        ▲
                        │ Push logs
               ┌────────┴────────┐
               │   Promtail /    │  (DaemonSet on every node)
               │  Grafana Agent  │
               └─────────────────┘
Installation
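helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
# Loki (Simple Scalable mode)
# Value key names can differ between chart versions; confirm them with `helm show values grafana/loki`.
# minioadmin/minioadmin are MinIO's default credentials; replace them outside a test cluster.
helm install loki grafana/loki \
  --namespace monitoring \
  --set loki.storage.type=s3 \
  --set loki.storage.s3.endpoint=minio.minio.svc:9000 \
  --set loki.storage.s3.bucketnames=loki-chunks \
  --set loki.storage.s3.access_key_id=minioadmin \
  --set loki.storage.s3.secret_access_key=minioadmin
# Promtail (log collector)
# The value is quoted so the shell does not try to expand the [0] index.
helm install promtail grafana/promtail \
  --namespace monitoring \
  --set "config.clients[0].url=http://loki:3100/loki/api/v1/push"
LogQL Queries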
# All logs from payment service
{namespace="production", app="payment-service"}
# Filter for errors
{namespace="production", app="payment-service"} |= "error"
# Regex extract and filter
{namespace="production"} | regexp `status=(?P<status>\d+)` | status >= 500
# Count errors per minute
count_over_time({namespace="production"} |= "error" [1m])
# Top 10 error messages over the last 5 minutes
# (topk needs a metric, so the log stream is wrapped in count_over_time)
topk(10, sum by (msg) (
  count_over_time({namespace="production"} | json | level="error" [5m])
))
# Latency percentiles from structured logs
quantile_over_time(0.95,
  {app="api-gateway"} | json | unwrap duration [5m]
) by (endpoint)
Structured Logging Best Practice
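One way to produce single-line JSON logs from an application is a small formatter on top of Python's standard `logging` module. This is a minimal sketch under assumptions: `JsonFormatter`, `make_logger`, and the `fields` key passed through `extra` are illustrative names, not part of Loki or Promtail.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each record as a single-line JSON object."""

    def __init__(self, service):
        super().__init__()
        self.service = service

    def format(self, record):
        created = datetime.fromtimestamp(record.created, tz=timezone.utc)
        entry = {
            "timestamp": created.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "level": record.levelname.lower(),
            "msg": record.getMessage(),
            "service": self.service,
        }
        # Fields passed via extra={"fields": {...}} become top-level keys.
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry, separators=(",", ":"))


def make_logger(service, stream=sys.stdout):
    """Return a logger that writes one JSON object per line to `stream`."""
    logger = logging.getLogger(service)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter(service))
    logger.addHandler(handler)
    return logger


log = make_logger("payment")
log.error(
    "payment failed",
    extra={"fields": {"user_id": "u123", "amount": 99.99,
                      "error": "insufficient_funds", "trace_id": "abc123"}},
)
```

High-cardinality values such as `user_id` and `trace_id` stay inside the JSON body, where `| json` can extract them at query time, rather than becoming Loki labels.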
{"timestamp":"2026-06-05T07:00:00Z","level":"error","msg":"payment failed","service":"payment","user_id":"u123","amount":99.99,"error":"insufficient_funds","trace_id":"abc123"}
# Query structured logs efficiently
{app="payment-service"} | json | level="error" | error="insufficient_funds" | amount > 100
Retention and Cost
# Loki config
limits_config:
  retention_period: 30d  # Auto-delete after 30 days
  max_streams_per_user: 10000
  ingestion_rate_mb: 10
  ingestion_burst_size_mb: 20
compactor:
  retention_enabled: true
  delete_request_store: s3
| Retention | Daily Volume | Monthly Storage Cost (S3) |
|---|---|---|
| 7 days | 50GB/day | ~$8 |
| 30 days | 50GB/day | ~$35 |
| 90 days | 50GB/day | ~$100 |
| 365 days | 50GB/day | ~$400 |
Compare: Elasticsearch for the same volume would cost $3,000-10,000/month.
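The S3 figures in the table above are close to what a simple steady-state estimate gives: daily volume times retention days times a per-GB monthly price. The sketch below assumes an S3 Standard list price of about $0.023 per GB-month and no compression; both are assumptions rather than values stated in the table, and request and transfer charges are ignored. `monthly_storage_cost` is a hypothetical helper.

```python
# Assumed S3 Standard list price in USD per GB-month; check current pricing for your region.
PRICE_PER_GB_MONTH = 0.023


def monthly_storage_cost(daily_gb, retention_days,
                         price=PRICE_PER_GB_MONTH, compression_ratio=1.0):
    """Steady-state monthly cost once a full retention window is stored."""
    stored_gb = daily_gb * retention_days / compression_ratio
    return stored_gb * price


for days in (7, 30, 90, 365):
    print(f"{days:>3} days: ~${monthly_storage_cost(50, days):,.2f}/month")
```

Passing a `compression_ratio` greater than 1 models chunk compression, which would lower the estimate proportionally.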
Alerting on Logs
# Loki ruler config
groups:
  - name: payment-alerts
    rules:
      - alert: HighErrorRate
        expr: |
          sum(count_over_time({app="payment-service"} |= "error" [5m])) > 50
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Payment service error rate is high"
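Before deploying a rule like this, it can help to run its expression as an instant query and look at the value it returns. The sketch below builds a request URL for Loki's `/loki/api/v1/query` endpoint and checks a response against the threshold. The base URL `http://loki:3100`, the helper names, and the sample response body are assumptions for illustration; the sample is hand-written, not captured from a real cluster.

```python
import json
from urllib.parse import urlencode

# The rule's expression without the "> 50" comparison, so the raw count is returned.
ALERT_EXPR = 'sum(count_over_time({app="payment-service"} |= "error" [5m]))'
THRESHOLD = 50


def instant_query_url(base_url, expr):
    """Build a URL for Loki's instant-query endpoint."""
    return f"{base_url.rstrip('/')}/loki/api/v1/query?{urlencode({'query': expr})}"


def would_fire(response_body, threshold=THRESHOLD):
    """True if any sample in a vector result exceeds the threshold."""
    data = json.loads(response_body)["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector result, got {data['resultType']}")
    # Each sample looks like {"metric": {...}, "value": [<unix time>, "<value as string>"]}.
    return any(float(sample["value"][1]) > threshold for sample in data["result"])


# Hypothetical response body in the shape of an instant-query vector result.
sample = ('{"status":"success","data":{"resultType":"vector",'
          '"result":[{"metric":{},"value":[1780642800,"73"]}]}}')

print(instant_query_url("http://loki:3100", ALERT_EXPR))
print(would_fire(sample))
```

This checks a single evaluation only; the rule's `for: 5m` clause additionally requires the condition to hold across consecutive evaluations before the alert fires.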