This is a complete guide to running Prometheus in production on Kubernetes.
Install Prometheus on Kubernetes
The fastest path is the kube-prometheus-stack Helm chart:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install monitoring prometheus-community/kube-prometheus-stack \
--namespace monitoring --create-namespace \
--set grafana.enabled=true \
--set alertmanager.enabled=trueThis installs Prometheus, Grafana, AlertManager, and node exporters.
Architecture
โโโโโโโโโโโโโโโ scrape โโโโโโโโโโโโโโโโโโ
โ Your Apps โ โโโโโโโโโโโโโ โ Prometheus โ
โ (metrics) โ โ Server โ
โโโโโโโโโโโโโโโ โโโโโโโโโฌโโโโโโโโโ
โ evaluate
โโโโโโโโโโโโโโโ alerts โโโโโโโโโผโโโโโโโโโ
โ Slack/PD โ โโโโโโโโโโโโโ โ AlertManager โ
โโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโ
โ query
โโโโโโโโโผโโโโโโโโโ
โ Grafana โ
โโโโโโโโโโโโโโโโโโServiceMonitor (Kubernetes-Native Discovery)
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: my-app
labels:
release: monitoring
spec:
selector:
matchLabels:
app: my-app
endpoints:
- port: metrics
interval: 15s
path: /metricsWriting PromQL Queries
# Request rate per second
rate(http_requests_total[5m])
# Error percentage
sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) * 100
# P99 latency
histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
# CPU usage per pod
sum(rate(container_cpu_usage_seconds_total[5m])) by (pod) * 100AlertManager Configuration
# alertmanager.yml
route:
receiver: slack-critical
group_wait: 30s
group_interval: 5m
repeat_interval: 4h
routes:
- match:
severity: critical
receiver: pagerduty
- match:
severity: warning
receiver: slack-warnings
receivers:
- name: slack-critical
slack_configs:
- channel: "#alerts-critical"
send_resolved: true
- name: pagerduty
pagerduty_configs:
- service_key: YOUR_PD_KEYAlert Rules
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
name: app-alerts
spec:
groups:
- name: app.rules
rules:
- alert: HighErrorRate
expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
for: 5m
labels:
severity: critical
annotations:
summary: "High error rate detected"
description: "Error rate is {{ $value | humanizePercentage }}"
- alert: PodCrashLooping
expr: increase(kube_pod_container_status_restarts_total[1h]) > 5
for: 10m
labels:
severity: warningLong-Term Storage
Prometheus local storage is not designed for long retention. For production, add one of:
| Solution | Approach | Best For |
|---|---|---|
| Thanos | Sidecar + object storage | Multi-cluster, global view |
| Grafana Mimir | Horizontally scalable | High-cardinality, multi-tenant |
| VictoriaMetrics | Drop-in replacement | Performance, compression |
Tips and Tricks
- Use
rate()for counters, never raw counter values - Set scrape interval to 15-30s (not 1s โ it wastes resources)
- Use recording rules for expensive queries that dashboards run frequently
- Monitor Prometheus itself:
prometheus_tsdb_head_seriesfor cardinality - Use
relabel_configsto drop high-cardinality labels before ingestion