Prometheus Tutorial: Complete Monitoring Guide (2026)

This is a complete guide to running Prometheus in production on Kubernetes.

Install Prometheus on Kubernetes

The fastest path is the kube-prometheus-stack Helm chart:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace \
  --set grafana.enabled=true \
  --set alertmanager.enabled=true

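This installs the Prometheus Operator, Prometheus, Grafana, Alertmanager, kube-state-metrics, and node-exporter. Grafana and Alertmanager are enabled by default in this chart, so the two --set flags only make that explicit.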
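For anything beyond a quick trial, settings are usually easier to track in a values file than in --set flags. The following is a minimal sketch, not a recommended production profile: the retention period, volume size, and file name are assumptions to adjust for your cluster. Without a storageSpec the chart runs Prometheus on an emptyDir volume, so metrics are lost when the pod is rescheduled.

```yaml
# values.yaml (illustrative; retention and size are assumed values)
prometheus:
  prometheusSpec:
    # How long Prometheus keeps samples on local disk.
    retention: 15d
    # Persist the TSDB on a PersistentVolumeClaim instead of an emptyDir.
    storageSpec:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi
grafana:
  enabled: true
alertmanager:
  enabled: true
```

It can be applied with: helm upgrade --install monitoring prometheus-community/kube-prometheus-stack --namespace monitoring --create-namespace -f values.yaml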

Architecture

┌─────────────┐     scrape     ┌────────────────┐     query      ┌────────────────┐
│ Your Apps   │ ◄───────────── │   Prometheus   │ ◄───────────── │    Grafana     │
│ (metrics)   │                │   Server       │                └────────────────┘
└─────────────┘                └───────┬────────┘
                                       │ evaluate
┌─────────────┐     alerts     ┌───────▼────────┐
│ Slack/PD    │ ◄───────────── │  Alertmanager  │
└─────────────┘                └────────────────┘

ServiceMonitor (Kubernetes-Native Discovery)

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  labels:
    release: monitoring
spec:
  selector:
    matchLabels:
      app: my-app
  endpoints:
    - port: metrics
      interval: 15s
      path: /metrics
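A ServiceMonitor does not point at pods directly. It selects Services by label and scrapes the named port on their endpoints, and by default it only looks at Services in its own namespace. The release: monitoring label is what lets the Prometheus instance installed by the Helm release named monitoring pick it up. A matching Service might look like the sketch below; the port number 8080 is an assumption, so use whatever port your application serves /metrics on.

```yaml
# Hypothetical Service for my-app; port 8080 is an assumed value.
apiVersion: v1
kind: Service
metadata:
  name: my-app
  labels:
    app: my-app          # matched by the ServiceMonitor's spec.selector.matchLabels
spec:
  selector:
    app: my-app          # pods backing this Service
  ports:
    - name: metrics      # referenced by "port: metrics" in the ServiceMonitor
      port: 8080
      targetPort: 8080
```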

Writing PromQL Queries

# Request rate per second
rate(http_requests_total[5m])

# Error percentage
sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) * 100

# P99 latency
histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))

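# CPU usage per pod, as a percentage of one core
# (container!="" skips the pod-level cgroup series, which would otherwise be counted twice)
sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (pod) * 100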

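Alertmanager Configuration

# alertmanager.yml
route:
  receiver: slack-critical        # default receiver when no child route matches
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    - matchers:                   # "matchers" replaces the deprecated "match"
        - severity="critical"
      receiver: pagerduty
    - matchers:
        - severity="warning"
      receiver: slack-warnings

receivers:
  # Slack receivers need api_url here or slack_api_url under global.
  - name: slack-critical
    slack_configs:
      - channel: "#alerts-critical"
        send_resolved: true
  # Every receiver named in a route must be defined; the channel is a placeholder.
  - name: slack-warnings
    slack_configs:
      - channel: "#alerts-warnings"
        send_resolved: true
  - name: pagerduty
    pagerduty_configs:
      - service_key: YOUR_PD_KEY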
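With the kube-prometheus-stack chart, Alertmanager is managed by the operator, so a routing tree like this is normally supplied through the chart's alertmanager.config value rather than by editing a file in the pod. The fragment below is a reduced sketch under that assumption; the channel name and webhook URL are placeholders, and real credentials belong in a Secret rather than in a values file.

```yaml
# values.yaml fragment (sketch; channel and webhook URL are placeholders)
alertmanager:
  config:
    route:
      receiver: slack-warnings
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 4h
      routes:
        - matchers:
            - severity="critical"
          receiver: pagerduty
    receivers:
      - name: slack-warnings
        slack_configs:
          - channel: "#alerts-warnings"
            api_url: https://hooks.slack.com/services/REPLACE/ME
            send_resolved: true
      - name: pagerduty
        pagerduty_configs:
          - service_key: YOUR_PD_KEY
```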

Alert Rules

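apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: app-alerts
  labels:
    release: monitoring   # required for the chart's default rule selector to load this object
spec:
  groups:
    - name: app.rules
      rules:
        # Fires when more than 5% of requests have returned 5xx for 5 minutes.
        - alert: HighErrorRate
          expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "High error rate detected"
            description: "Error rate is {{ $value | humanizePercentage }}"

        # Fires when a container has restarted more than 5 times in the last hour.
        - alert: PodCrashLooping
          expr: increase(kube_pod_container_status_restarts_total[1h]) > 5
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} is restarting repeatedly"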
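Alert expressions and for: durations are easy to get subtly wrong, so it can help to unit test them with promtool before deploying. The sketch below assumes the groups: section of the PrometheusRule has been copied into a plain rule file named app-rules.yml (a hypothetical file name), because promtool reads plain Prometheus rule files, not the Kubernetes object. The input series are synthetic: 6 failed and 54 successful requests per minute, a 10% error ratio.

```yaml
# app-rules-test.yml (sketch; file names and series values are assumptions)
rule_files:
  - app-rules.yml
evaluation_interval: 1m
tests:
  - interval: 1m
    input_series:
      # Counter that grows by 6 every minute for 20 minutes (errors).
      - series: 'http_requests_total{status="500"}'
        values: '0+6x20'
      # Counter that grows by 54 every minute for 20 minutes (successes).
      - series: 'http_requests_total{status="200"}'
        values: '0+54x20'
    alert_rule_test:
      # By 10m the ratio has been above 0.05 for longer than the 5m "for" window.
      - eval_time: 10m
        alertname: HighErrorRate
        exp_alerts:
          - exp_labels:
              severity: critical
            exp_annotations:
              summary: "High error rate detected"
              description: "Error rate is 10%"
```

Run it with: promtool test rules app-rules-test.yml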

Long-Term Storage

Prometheus local storage lives on a single node, is not replicated, and keeps 15 days of data by default, so it is not designed for long retention. For production, add one of:

Solution         Approach                   Best For
Thanos           Sidecar + object storage   Multi-cluster, global view
Grafana Mimir    Horizontally scalable      High-cardinality, multi-tenant
VictoriaMetrics  Drop-in replacement        Performance, compression
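Mimir and VictoriaMetrics are typically fed through Prometheus remote write, while Thanos in sidecar mode uploads TSDB blocks to object storage instead. As a rough sketch of the remote write path with the kube-prometheus-stack chart, assuming a Mimir gateway reachable inside the cluster (the hostname, port, and tenant ID below are hypothetical):

```yaml
# values.yaml fragment (sketch; hostname, port, tenant, and retention are assumed values)
prometheus:
  prometheusSpec:
    # Local retention can be shortened once the remote store holds the history.
    retention: 2d
    remoteWrite:
      # Mimir accepts remote write requests at /api/v1/push.
      - url: http://mimir-gateway.mimir.svc:8080/api/v1/push
        headers:
          # Tenant ID; only needed when Mimir multi-tenancy is enabled.
          X-Scope-OrgID: demo
```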

Tips and Tricks

Use rate() for counters; raw counter values only grow and reset to zero on restart
Set the scrape interval to 15-30s (1s wastes resources)
Use recording rules for expensive queries that dashboards run frequently
Monitor Prometheus itself: track prometheus_tsdb_head_series to watch cardinality
Use metric_relabel_configs (metricRelabelings in a ServiceMonitor) to drop high-cardinality labels before ingestion; relabel_configs only rewrites target labels before the scrape
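Two of these tips translate directly into operator objects. The sketch below shows a recording rule that precomputes a per-job request rate, and a ServiceMonitor that drops a label after the scrape and before ingestion. The rule name and the request_id label are hypothetical. Dropping a label is only safe when the remaining labels still identify each series uniquely; otherwise the scrape produces duplicate series.

```yaml
# Recording rule: precompute the per-job request rate so dashboards read one cheap series.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: app-recording-rules      # hypothetical name
  labels:
    release: monitoring
spec:
  groups:
    - name: app.recording
      rules:
        - record: job:http_requests:rate5m
          expr: sum by (job) (rate(http_requests_total[5m]))
---
# ServiceMonitor variant that drops a label from scraped samples before they are stored.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  labels:
    release: monitoring
spec:
  selector:
    matchLabels:
      app: my-app
  endpoints:
    - port: metrics
      interval: 15s
      metricRelabelings:
        - action: labeldrop
          regex: request_id      # hypothetical high-cardinality label
```

Dashboards can then query job:http_requests:rate5m instead of re-evaluating the rate() expression on every refresh.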
