Sustainable Cloud Computing Green Infrastructure
Platform Engineering

Sustainable Computing

AI workloads are driving a sharp rise in energy consumption. Practical strategies for carbon-aware scheduling, right-sizing GPU instances, and carbon measurement.

Luca Berton
· 2 min read

The technology industry consumes approximately 2-3% of global electricity and generates 2-4% of global CO2 emissions, comparable to the aviation industry. As AI workloads explode, this share is growing rapidly. Sustainable computing is no longer optional.

The Scale of the Problem

Data Center Energy Consumption

  • Global data centers: ~200-250 TWh/year (2024), projected 400+ TWh by 2030
  • A single large AI training run: can consume as much electricity as 100 US homes use in a year
  • GPU inference at scale: 10x more energy per query than traditional web serving
  • Water consumption: Hyperscale data centers use millions of gallons of water for cooling

Carbon Cost of AI

Approximate CO2 emissions:
  Training GPT-4-class model:    ~500-1000 tonnes CO2
  Serving 1M ChatGPT queries:    ~10-25 tonnes CO2
  Training BERT (for reference):  ~0.6 tonnes CO2

Green Infrastructure Strategies

Strategy 1: Carbon-Aware Scheduling

Run workloads when and where the electricity grid is cleanest:

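# Illustrative only: Kubernetes does not act on these annotations by itself.
# A custom scheduler or controller would have to read them.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: carbon-flexible
  annotations:                 # annotation values must be strings
    carbon-aware: "true"
    max-delay: "6h"            # Can delay up to 6 hours
    # eu-north-1: Nordic hydro/wind; us-west-2: Pacific Northwest hydro
    preferred-regions: "eu-north-1,us-west-2"
value: 1000                    # required field; this number is a placeholder
description: "Delay-tolerant batch workloads"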

For batch workloads (training, CI/CD, data processing), shifting execution by a few hours can reduce carbon intensity by 30-50%.
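As a minimal sketch of the time-shifting idea, the function below picks the start hour with the lowest average grid carbon intensity inside an allowed delay. The name `best_start_hour` and the forecast values are hypothetical; a real setup would pull the forecast from a grid-data provider.

```python
from typing import Sequence


def best_start_hour(forecast: Sequence[float], duration_h: int, max_delay_h: int) -> int:
    """Return the start offset (hours from now) whose window has the lowest
    mean carbon intensity, without delaying the start past max_delay_h."""
    if duration_h <= 0:
        raise ValueError("duration_h must be positive")
    last_start = min(max_delay_h, len(forecast) - duration_h)
    if last_start < 0:
        raise ValueError("forecast is shorter than the job duration")

    def mean_intensity(start: int) -> float:
        window = forecast[start:start + duration_h]
        return sum(window) / duration_h

    # min() returns the earliest start when several windows tie.
    return min(range(last_start + 1), key=mean_intensity)


# Hypothetical hourly forecast in gCO2/kWh, starting with the current hour.
forecast = [420, 400, 380, 250, 210, 230, 390, 410]
start = best_start_hour(forecast, duration_h=2, max_delay_h=6)
now_mean = sum(forecast[0:2]) / 2
best_mean = sum(forecast[start:start + 2]) / 2
reduction = 1 - best_mean / now_mean
print(f"start in {start}h, intensity reduced by {reduction:.0%}")
```

With this made-up forecast, a two-hour job is delayed by four hours and runs at about 46% lower average intensity. The actual gain depends entirely on the local grid and the forecast quality.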

Strategy 2: Right-Sizing Resources

Most Kubernetes clusters are over-provisioned:

  • Average CPU utilization: 15-25% in most clusters
  • Average GPU utilization: 30-40% for inference workloads
  • Memory waste: 40-60% allocated but unused

# Use VPA to right-size resource requests
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: inference-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: llm-inference
  updatePolicy:
    updateMode: Auto
  resourcePolicy:
    containerPolicies:
      - containerName: inference
        maxAllowed:
          cpu: "8"
          memory: "32Gi"
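To make the right-sizing logic concrete, here is a simplified sketch that derives a request from observed usage: take a high percentile of the samples and add headroom. This is not the VPA recommender's actual algorithm; `recommend_request`, `waste_fraction`, and the sample values are assumptions for illustration.

```python
from typing import Sequence


def recommend_request(usage: Sequence[float], percentile: int = 95,
                      headroom: float = 0.15) -> float:
    """Nearest-rank percentile of observed usage, plus a safety margin."""
    if not usage:
        raise ValueError("need at least one usage sample")
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in (0, 100]")
    ordered = sorted(usage)
    # Integer ceiling of percentile * n / 100 avoids float rounding surprises.
    rank = max(1, (percentile * len(ordered) + 99) // 100)
    return ordered[rank - 1] * (1 + headroom)


def waste_fraction(requested: float, recommended: float) -> float:
    """Share of the current request that the recommendation would free up."""
    return max(0.0, 1 - recommended / requested)


# Hypothetical CPU usage samples in cores for a pod that requests 4 cores.
samples = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 2.0]
recommended = recommend_request(samples, percentile=90, headroom=0.15)
wasted = waste_fraction(4.0, recommended)
print(f"recommend {recommended:.2f} cores, {wasted:.0%} of the request is unused")
```

Using a percentile rather than the maximum keeps a single spike (the 2.0 sample here) from inflating the request, at the cost of occasional throttling.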

Strategy 3: Efficient Model Selection

Not every query needs GPT-4:

Query Complexity        Model               Energy per Query
Simple classification   DistilBERT          0.001 Wh
FAQ/retrieval           Small LLM (7B)      0.01 Wh
Complex reasoning       Large LLM (70B)     0.1 Wh
Advanced analysis       GPT-4 class         1.0 Wh

Smart routing sends simple queries to small models and reserves large models for complex tasks, which can reduce energy consumption by around 80% with minimal quality loss. The exact saving depends on the mix of queries.
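The arithmetic behind the routing claim can be sketched with the per-query figures from the table above. The query mix is hypothetical, and classifying a query into a tier is assumed to happen elsewhere.

```python
# Per-query energy in Wh, taken from the table above.
ENERGY_WH = {"simple": 0.001, "faq": 0.01, "complex": 0.1, "advanced": 1.0}

# Tier-to-model mapping from the same table.
MODEL_FOR_TIER = {
    "simple": "DistilBERT",
    "faq": "Small LLM (7B)",
    "complex": "Large LLM (70B)",
    "advanced": "GPT-4 class",
}


def routed_energy_wh(query_mix: dict[str, int]) -> float:
    """Total energy when each query goes to the model for its tier."""
    return sum(count * ENERGY_WH[tier] for tier, count in query_mix.items())


def baseline_energy_wh(query_mix: dict[str, int]) -> float:
    """Total energy when every query goes to the largest model."""
    return sum(query_mix.values()) * ENERGY_WH["advanced"]


def savings_fraction(query_mix: dict[str, int]) -> float:
    return 1 - routed_energy_wh(query_mix) / baseline_energy_wh(query_mix)


# Hypothetical mix of 1,000 queries.
mix = {"simple": 400, "faq": 300, "complex": 120, "advanced": 180}
savings = savings_fraction(mix)
print(f"routed: {routed_energy_wh(mix):.1f} Wh, "
      f"baseline: {baseline_energy_wh(mix):.1f} Wh, saving {savings:.0%}")
```

For this mix the routed total is about 195 Wh against a 1,000 Wh baseline, a saving of roughly 80%. Almost all of the remaining energy comes from the 18% of queries that still need the largest model.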

Strategy 4: Hardware Efficiency

  • ARM-based servers: 30-40% more energy efficient than x86 for many workloads
  • GPU generation: H100 is 3x more energy efficient than A100 per FLOP
  • Liquid cooling: 30-40% more efficient than air cooling
  • On-premises renewable energy: Solar/wind directly powering data centers

Strategy 5: Code and Infrastructure Optimization

  • Container image optimization: Smaller images = less storage, less network, less energy
  • Caching: Every cache hit avoids a computation
  • CDN: Serve static content from edge, not origin
  • Database optimization: Efficient queries consume less CPU
  • Spot/preemptible instances: Use excess capacity that would otherwise be wasted
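As a small illustration of the caching point, Python's standard `functools.lru_cache` can memoize an expensive function and report how many computations were avoided. The `render_report` function and its workload are hypothetical stand-ins.

```python
from functools import lru_cache


@lru_cache(maxsize=1024)
def render_report(customer_id: int) -> str:
    # Stand-in for an expensive computation or database query.
    checksum = sum(i * i for i in range(10_000)) % 97
    return f"report-{customer_id}-{checksum}"


# Six requests, but only three distinct customers.
for customer_id in [1, 2, 1, 1, 2, 3]:
    render_report(customer_id)

info = render_report.cache_info()
print(f"hits={info.hits} misses={info.misses}")
```

Half of the requests here are served from the cache, so the expensive body runs three times instead of six.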

Measuring Carbon Footprint

Tools for tracking infrastructure carbon:

  • Cloud Carbon Footprint: Open-source tool for estimating cloud emissions
  • Kepler: Kubernetes-based Efficient Power Level Exporter
  • Scaphandre: Energy consumption measurement agent
  • Green Metrics Tool: Web application carbon measurement
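Whatever tool supplies the energy readings, the final conversion to emissions is usually energy multiplied by grid carbon intensity, scaled by the data center's PUE (power usage effectiveness). The sketch below is not the API of any tool listed above; the function names and input numbers are assumptions.

```python
def energy_kwh(avg_power_w: float, hours: float, device_count: int = 1) -> float:
    """Energy drawn by identical devices running at an average power."""
    return avg_power_w * hours * device_count / 1000


def emissions_kg(it_energy_kwh: float, grid_intensity_g_per_kwh: float,
                 pue: float = 1.0) -> float:
    """Estimated CO2 in kg: IT energy x PUE x grid intensity (g/kWh)."""
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return it_energy_kwh * pue * grid_intensity_g_per_kwh / 1000


# Hypothetical job: 8 GPUs averaging 400 W each for 24 hours,
# in a facility with PUE 1.2 on a 300 gCO2/kWh grid.
job_kwh = energy_kwh(avg_power_w=400, hours=24, device_count=8)
job_kg = emissions_kg(job_kwh, grid_intensity_g_per_kwh=300, pue=1.2)
print(f"{job_kwh:.1f} kWh, about {job_kg:.1f} kg CO2")
```

The same job on a 30 gCO2/kWh grid would emit a tenth as much, which is why measurement and carbon-aware placement work well together.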
