Prometheus + Thanos: Long-Term Metrics Storage for Kubernetes

Scale Prometheus beyond a single cluster with Thanos: global query view, unlimited retention, downsampling, and an object storage backend.

Luca Berton

The Prometheus Scaling Problem

Prometheus is excellent for single-cluster monitoring, but at enterprise scale it runs into four limits:

  • No long-term storage: default retention is 15 days
  • No global view: each Prometheus sees only its own cluster
  • No downsampling: data is kept at full resolution for as long as it is retained
  • Single point of failure: one instance per cluster

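Thanos addresses all four: object storage for long-term retention, a global query layer across clusters, downsampling in the Compactor, and query-time deduplication across replicated Prometheus instances.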

Thanos Architecture

┌───────────────────────────────────────────────┐
│                 Thanos Query                  │
│            (Global Query Frontend)            │
└────────────────────────┬──────────────────────┘
                         │
         ┌───────────────┼───────────────┐
         │               │               │
┌────────▼───────┐ ┌─────▼──────┐ ┌──────▼──────┐
│ Thanos Sidecar │ │   Thanos   │ │   Thanos    │
│ (Cluster 1)    │ │   Store    │ │   Sidecar   │
│                │ │  Gateway   │ │ (Cluster 2) │
└────────┬───────┘ └─────┬──────┘ └──────┬──────┘
         │               │               │
┌────────▼───────┐ ┌─────▼──────┐ ┌──────▼──────┐
│  Prometheus 1  │ │   Object   │ │Prometheus 2 │
│  (cluster-a)   │ │  Storage   │ │ (cluster-b) │
└────────────────┘ │  (S3/GCS)  │ └─────────────┘
                   └────────────┘

Installation (Helm)

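# Assumes the prometheus-community and bitnami Helm repositories are already added.
# Prometheus with Thanos sidecar: the sidecar is added once the thanos spec is set.
# Value names follow recent kube-prometheus-stack charts; check values.yaml for your chart version.
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --set prometheus.prometheusSpec.thanos.objectStorageConfig.existingSecret.name=thanos-objstore \
  --set prometheus.prometheusSpec.thanos.objectStorageConfig.existingSecret.key=objstore.yml \
  --set prometheus.prometheusSpec.retention=6h  # Short local retention

# Thanos components, reading the same object storage Secret
helm install thanos bitnami/thanos \
  --namespace monitoring \
  --set query.enabled=true \
  --set storegateway.enabled=true \
  --set compactor.enabled=true \
  --set ruler.enabled=true \
  --set existingObjstoreSecret=thanos-objstore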

Object Storage Config

apiVersion: v1
kind: Secret
metadata:
  name: thanos-objstore
  namespace: monitoring
# kubectl does not expand ${...}; substitute the credentials before applying
stringData:
  objstore.yml: |
    type: S3
    config:
      bucket: thanos-metrics
      endpoint: s3.eu-west-1.amazonaws.com
      region: eu-west-1
      access_key: ${AWS_ACCESS_KEY}
      secret_key: ${AWS_SECRET_KEY}
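
The `${AWS_ACCESS_KEY}` and `${AWS_SECRET_KEY}` placeholders have to be filled in before the manifest reaches the cluster. One way to do that is sketched below, assuming the manifest is saved as `thanos-objstore.yaml` (a hypothetical file name) and that `envsubst` from GNU gettext is installed:

```shell
# Placeholder credentials; replace with real values or load them from a secrets manager.
export AWS_ACCESS_KEY='<your-access-key-id>'
export AWS_SECRET_KEY='<your-secret-access-key>'

# Substitute only the two credential placeholders, then apply the rendered manifest.
envsubst '${AWS_ACCESS_KEY} ${AWS_SECRET_KEY}' < thanos-objstore.yaml \
  | kubectl apply --namespace monitoring -f -

# Confirm the Secret exists in the monitoring namespace.
kubectl get secret thanos-objstore --namespace monitoring
```

Restricting `envsubst` to the two named variables leaves any other `$` characters in the file untouched.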

Downsampling

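Thanos Compactor downsamples older blocks in object storage: it writes 5-minute-resolution data for blocks older than 40 hours and 1-hour-resolution data for blocks older than 10 days. Downsampled data is stored alongside the raw blocks, so the savings below apply only once shorter retention is configured for the higher resolutions:

Age          Resolution       Storage Impact
0-2 days     Raw (15s/30s)    100%
2-30 days    5 minutes        ~20%
30+ days     1 hour           ~3%

Result: years of metrics can be stored at minimal cost.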
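
Retention per resolution is controlled by Compactor flags. The sketch below runs the Compactor directly, assuming the `objstore.yml` content from the Object Storage Config section is available as a local file and `/var/thanos/compact` is a writable scratch directory (a hypothetical path). The retention periods are illustrative, not recommendations:

```shell
# Keep raw data for 30 days, 5-minute data for 180 days, and 1-hour data indefinitely (0d = no limit).
# Raw and 5-minute retention should stay above the 40-hour and 10-day downsampling thresholds,
# otherwise blocks can be deleted before they are downsampled.
thanos compact \
  --data-dir=/var/thanos/compact \
  --objstore.config-file=objstore.yml \
  --retention.resolution-raw=30d \
  --retention.resolution-5m=180d \
  --retention.resolution-1h=0d \
  --wait
```

With `--wait` the Compactor keeps running and processes new blocks as they appear. Without it, the process exits after one pass.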

Multi-Cluster Global View

Query across all clusters from a single Grafana instance by pointing its data source at Thanos Query:

# Grafana datasource pointing to Thanos Query
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-datasources
data:
  thanos.yaml: |
    apiVersion: 1
    datasources:
      - name: Thanos
        type: prometheus
        url: http://thanos-query.monitoring:9090
        access: proxy
        isDefault: true
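
A global view depends on each Prometheus carrying a unique external label, so that Thanos Query can tell the clusters apart. The sketch below assumes the `prometheus` release from the installation step, and the label name `cluster` is chosen here for illustration:

```shell
# Run against the first cluster; use cluster-b when repeating this on the second cluster.
helm upgrade prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --reuse-values \
  --set prometheus.prometheusSpec.externalLabels.cluster=cluster-a

# Count healthy scrape targets per cluster through Thanos Query's Prometheus-compatible HTTP API.
# Run this from a pod inside the cluster, since the service URL is cluster-internal.
curl -s -G http://thanos-query.monitoring:9090/api/v1/query \
  --data-urlencode 'query=sum by (cluster) (up)'
```

If both clusters are connected, the response should contain one result per `cluster` label value.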

Cost Optimization

Time series    1 Year Raw (30s)    1 Year with Thanos
100            30 GB               3 GB
10K            3 TB                300 GB
100K           30 TB               3 TB

S3 cost for 300GB: ~$7/month. Storing the same metrics in Prometheus local storage would require 3TB of expensive SSD.
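
The ~$7/month figure can be reproduced with a quick calculation. This sketch assumes a storage price of $0.023 per GB-month (an assumed S3 Standard list price; actual pricing varies by region and storage class) and ignores request and data-transfer charges:

```shell
# Assumed price in USD per GB-month; check current pricing for your region.
price_per_gb_month=0.023
stored_gb=300

# awk handles the floating-point multiplication that plain shell arithmetic cannot.
monthly_cost=$(awk -v gb="$stored_gb" -v price="$price_per_gb_month" \
  'BEGIN { printf "%.2f", gb * price }')

echo "Estimated S3 cost: \$${monthly_cost}/month"
```

Changing `stored_gb` to 3000 gives the equivalent estimate for the 100K-series row of the table.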

When to Use Thanos vs Alternatives

Solution           Best For
Thanos             Existing Prometheus, with a need for long-term storage and multi-cluster queries
Cortex/Mimir       High-cardinality, multi-tenant write path
VictoriaMetrics    High performance, simpler operations
Grafana Cloud      Managed service, for teams that don't want to run infrastructure
