Prometheus Tutorial: Complete Monitoring Guide (2026)

Complete Prometheus tutorial. Installation, configuration, PromQL queries, alerting rules, Grafana dashboards, and Kubernetes service discovery.

Luca Berton
· 1 min read

This is a complete guide to running Prometheus in production on Kubernetes.

Install Prometheus on Kubernetes

The fastest path is the kube-prometheus-stack Helm chart:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace \
  --set grafana.enabled=true \
  --set alertmanager.enabled=true

This installs the Prometheus Operator, Prometheus, Grafana, Alertmanager, node-exporter, and kube-state-metrics.

Architecture

┌──────────────┐     scrape     ┌─────────────────┐     query     ┌──────────────┐
│ Your Apps    │ ◄───────────── │   Prometheus    │ ◄──────────── │   Grafana    │
│ (metrics)    │                │   Server        │               └──────────────┘
└──────────────┘                └────────┬────────┘
                                         │ alerts
┌──────────────┐     notify     ┌────────▼────────┐
│ Slack/PD     │ ◄───────────── │  Alertmanager   │
└──────────────┘                └─────────────────┘

ServiceMonitor (Kubernetes-Native Discovery)

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  labels:
    release: monitoring
spec:
  selector:
    matchLabels:
      app: my-app
  endpoints:
    - port: metrics
      interval: 15s
      path: /metrics
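
A ServiceMonitor selects Services, not Pods, so the manifest above only produces scrape targets if a matching Service exists. The following is a minimal sketch of such a Service. The port number 8080 and the Pod label are assumptions for illustration; what matters is that the Service carries the `app: my-app` label and exposes a port named `metrics`.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app
  labels:
    app: my-app          # matched by the ServiceMonitor's spec.selector.matchLabels
spec:
  selector:
    app: my-app          # assumed Pod label; adjust to your Deployment
  ports:
    - name: metrics      # referenced by name in the ServiceMonitor's endpoints[].port
      port: 8080         # assumed port where the app serves /metrics
      targetPort: 8080
```

By default a ServiceMonitor only looks for Services in its own namespace; set `spec.namespaceSelector` if the application runs elsewhere. The `release: monitoring` label on the ServiceMonitor matches the Helm release name used in the install command, which is how the chart's default selector finds it.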

Writing PromQL Queries

# Request rate per second
rate(http_requests_total[5m])

# Error percentage
sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) * 100

# P99 latency
histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))

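# CPU usage per pod, as a percentage of one core
# (container!="" skips the pod-level cgroup series so usage is not double-counted)
sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (pod) * 100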
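
Queries like the error percentage above are re-evaluated every time a dashboard refreshes. A recording rule can precompute them on a schedule instead. The sketch below assumes the same `http_requests_total` metric and the `release: monitoring` label used elsewhere in this guide; the resource name, group name, and recorded series names are illustrative and follow the common `level:metric:operations` naming pattern.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: app-recording-rules        # hypothetical name
  labels:
    release: monitoring            # assumed to match the Helm release name
spec:
  groups:
    - name: app.recording
      interval: 30s                # how often the group is evaluated
      rules:
        # Per-job request rate, precomputed
        - record: job:http_requests:rate5m
          expr: sum by (job) (rate(http_requests_total[5m]))
        # Per-job ratio of 5xx responses to all responses (0 to 1)
        - record: job:http_requests_errors:ratio_rate5m
          expr: |
            sum by (job) (rate(http_requests_total{status=~"5.."}[5m]))
              /
            sum by (job) (rate(http_requests_total[5m]))
```

Dashboards and alerts can then query `job:http_requests_errors:ratio_rate5m` directly, which is a plain series lookup rather than an aggregation over every raw series.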

Alertmanager Configuration

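# alertmanager.yml
global:
  slack_api_url: YOUR_SLACK_WEBHOOK_URL   # placeholder; used by the Slack receivers

route:
  receiver: slack-critical    # default receiver for alerts that match no child route
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    - matchers:               # `matchers` replaces the deprecated `match` field
        - severity="critical"
      receiver: pagerduty
    - matchers:
        - severity="warning"
      receiver: slack-warnings

receivers:
  - name: slack-critical
    slack_configs:
      - channel: "#alerts-critical"
        send_resolved: true
  - name: slack-warnings      # every receiver named in a route must be defined here
    slack_configs:
      - channel: "#alerts-warnings"   # example channel name
        send_resolved: true
  - name: pagerduty
    pagerduty_configs:
      - service_key: YOUR_PD_KEY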

Alert Rules

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
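metadata:
  name: app-alerts
  labels:
    release: monitoring   # matches the Helm release name so the chart's default rule selector picks it up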
spec:
  groups:
    - name: app.rules
      rules:
        - alert: HighErrorRate
          expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "High error rate detected"
            description: "Error rate is {{ $value | humanizePercentage }}"

        - alert: PodCrashLooping
          expr: increase(kube_pod_container_status_restarts_total[1h]) > 5
          for: 10m
          labels:
            severity: warning

Long-Term Storage

Prometheus local storage lives on a single node, is not replicated, and keeps 15 days of data by default, so it is not designed for long retention. For production, add one of:

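| Solution        | Approach                 | Best For                       |
|-----------------|--------------------------|--------------------------------|
| Thanos          | Sidecar + object storage | Multi-cluster, global view     |
| Grafana Mimir   | Horizontally scalable    | High-cardinality, multi-tenant |
| VictoriaMetrics | Drop-in replacement      | Performance, compression       |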
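
As a rough illustration of how a remote backend is wired in, the fragment below is a sketch of Helm values for the kube-prometheus-stack chart that keeps local retention short and forwards samples over remote write. The URL is a hypothetical placeholder, and the exact path depends on the backend (for example, Mimir and VictoriaMetrics use different remote-write paths). Thanos in sidecar mode does not use remote write; it uploads TSDB blocks to object storage instead.

```yaml
# values.yaml (fragment), assuming the kube-prometheus-stack chart
prometheus:
  prometheusSpec:
    retention: 15d                 # keep only recent data on the local disk
    remoteWrite:
      # Hypothetical endpoint; replace with your backend's remote-write URL
      - url: https://metrics.example.com/api/v1/push
```

A fragment like this would be applied with `helm upgrade monitoring prometheus-community/kube-prometheus-stack -n monitoring -f values.yaml`.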

Tips and Tricks

  • Use rate() for counters, never raw counter values
  • Set scrape interval to 15-30s (not 1s, which wastes resources)
  • Use recording rules for expensive queries that dashboards run frequently
  • Monitor Prometheus itself: prometheus_tsdb_head_series for cardinality
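  • Use metric_relabel_configs (metricRelabelings in a ServiceMonitor) to drop high-cardinality labels before ingestion; relabel_configs only rewrites target labels before the scrape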
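
To make the label-dropping tip concrete: relabeling of scraped series happens after the scrape and before ingestion, which in a ServiceMonitor is expressed with `metricRelabelings`. The sketch below extends the `my-app` ServiceMonitor from earlier; the `request_id` label and the `my_app_debug_` metric prefix are hypothetical examples of high-cardinality data.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  labels:
    release: monitoring
spec:
  selector:
    matchLabels:
      app: my-app
  endpoints:
    - port: metrics
      interval: 15s
      path: /metrics
      metricRelabelings:
        # Remove a per-request label (hypothetical) from every scraped series
        - action: labeldrop
          regex: request_id
        # Drop entire series whose metric name matches a prefix (hypothetical)
        - sourceLabels: [__name__]
          regex: my_app_debug_.*
          action: drop
```

Note that `labeldrop` is only safe when the remaining labels still identify each series uniquely; otherwise the scrape produces duplicate series. Watching `prometheus_tsdb_head_series` before and after the change is one way to confirm the effect.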
