The technology industry consumes approximately 2-3% of global electricity and generates 2-4% of global CO2 emissions, comparable to the aviation industry. As AI workloads explode, this share is growing rapidly. Sustainable computing is no longer optional.
The Scale of the Problem
Data Center Energy Consumption
- Global data centers: ~200-250 TWh/year (2024), projected 400+ TWh by 2030
- A single large AI training run: can consume as much electricity as 100 US homes use in a year
- GPU inference at scale: roughly 10x more energy per query than traditional web serving
- Water consumption: Hyperscale data centers use millions of gallons of water for cooling
Carbon Cost of AI
Approximate CO2 emissions:
- Training GPT-4-class model: ~500-1000 tonnes CO2
- Serving 1M ChatGPT queries: ~10-25 tonnes CO2
- Training BERT (for reference): ~0.6 tonnes CO2
Green Infrastructure Strategies
Strategy 1: Carbon-Aware Scheduling
Run workloads when and where the electricity grid is cleanest:
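# Illustrative only: Kubernetes has no built-in carbon-aware scheduling.
# The annotations below are hypothetical hints for a custom scheduler or controller.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: carbon-flexible
  annotations:
    carbon-aware: "true"
    max-delay: "6h" # Can delay up to 6 hours
    # Annotation values must be strings, so regions are comma-separated:
    # eu-north-1 (Nordic hydro/wind), us-west-2 (Pacific Northwest hydro)
    preferred-regions: "eu-north-1,us-west-2"
value: 1000 # Required field; a low value suits deferrable batch work

For batch workloads (training, CI/CD, data processing), shifting execution by a few hours can reduce carbon intensity by 30-50%, depending on how much the local grid mix varies over the day.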
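As a rough illustration of the time-shifting idea, the sketch below picks the start hour with the lowest average carbon intensity inside an allowed delay window. The function name `best_start_hour` and the hourly forecast values are hypothetical; a real setup would pull the forecast from a grid-data provider.

```python
from typing import Sequence


def best_start_hour(forecast: Sequence[float], duration_h: int, max_delay_h: int) -> int:
    """Return the start offset (hours from now) with the lowest mean carbon intensity.

    forecast    -- hourly grid carbon intensity in gCO2/kWh, index 0 = current hour
    duration_h  -- how long the job runs, in whole hours
    max_delay_h -- latest acceptable start offset, in hours
    """
    if duration_h <= 0:
        raise ValueError("duration_h must be positive")
    latest_start = min(max_delay_h, len(forecast) - duration_h)
    if latest_start < 0:
        raise ValueError("forecast is shorter than the job duration")

    def mean_intensity(start: int) -> float:
        window = forecast[start:start + duration_h]
        return sum(window) / duration_h

    # Ties resolve to the earliest start, so the job is not delayed needlessly.
    return min(range(latest_start + 1), key=mean_intensity)


# Hypothetical hourly forecast (gCO2/kWh), made up for illustration
forecast = [450, 430, 400, 310, 250, 240, 260, 380, 440]

start = best_start_hour(forecast, duration_h=2, max_delay_h=6)
now = sum(forecast[0:2]) / 2
later = sum(forecast[start:start + 2]) / 2
saving = 1 - later / now
print(f"start in {start} h: {later:.0f} vs {now:.0f} gCO2/kWh ({saving:.0%} lower)")
```

With this made-up forecast, a 2-hour job started 4 hours later runs at an average of 245 instead of 440 gCO2/kWh, about 44% lower.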
Strategy 2: Right-Sizing Resources
Most Kubernetes clusters are over-provisioned:
- Average CPU utilization: 15-25% in most clusters
- Average GPU utilization: 30-40% for inference workloads
- Memory waste: 40-60% allocated but unused
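To make the over-provisioning figures concrete, here is a minimal sketch that compares observed usage with the requested amount and suggests a smaller request from a usage percentile plus headroom. The helper names, the 95th percentile, the 15% headroom, and the sample data are all assumptions for illustration; this is a simplification, not the algorithm any particular autoscaler uses.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def recommend_request(usage_samples, pct=95, headroom=1.15):
    """Suggest a request: the pct-th percentile of observed usage plus headroom."""
    return percentile(usage_samples, pct) * headroom


# Hypothetical CPU usage samples (in cores) for a pod that requests 4 cores
requested_cores = 4.0
usage = [0.6, 0.7, 0.8, 0.9, 0.7, 1.1, 0.8, 0.6, 1.0, 0.9]

utilization = sum(usage) / len(usage) / requested_cores
suggested = recommend_request(usage)
print(f"average utilization: {utilization:.0%}")
print(f"suggested request: {suggested:.2f} cores (currently {requested_cores:.0f})")
```

In this example the pod averages about 20% of its 4-core request, and the percentile-based suggestion is roughly 1.27 cores.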
# Use VPA to right-size resource requests
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: inference-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: llm-inference
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: inference
        maxAllowed:
          cpu: "8"
          memory: "32Gi"
Strategy 3: Efficient Model Selection
Not every query needs GPT-4:
| Query Complexity | Model | Energy per Query |
|---|---|---|
| Simple classification | DistilBERT | 0.001 Wh |
| FAQ/retrieval | Small LLM (7B) | 0.01 Wh |
| Complex reasoning | Large LLM (70B) | 0.1 Wh |
| Advanced analysis | GPT-4 class | 1.0 Wh |
Smart routing sends simple queries to small models and reserves large models for complex tasks, which can reduce energy consumption by around 80% with minimal quality loss, depending on the query mix.
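The arithmetic behind that saving can be sketched with the per-query figures from the table above. The traffic mix below is hypothetical, and the tier names are made up for illustration; the point is only to show how routing compares with sending every query to the largest model.

```python
# Energy per query in Wh, taken from the table above
ENERGY_WH = {
    "simple_classification": 0.001,  # DistilBERT
    "faq_retrieval": 0.01,           # small LLM (7B)
    "complex_reasoning": 0.1,        # large LLM (70B)
    "advanced_analysis": 1.0,        # GPT-4 class
}


def routed_energy_wh(query_counts):
    """Total energy when each query goes to the model matching its tier."""
    return sum(ENERGY_WH[tier] * count for tier, count in query_counts.items())


def baseline_energy_wh(query_counts):
    """Total energy when every query goes to the GPT-4-class model."""
    return ENERGY_WH["advanced_analysis"] * sum(query_counts.values())


# Hypothetical mix of 1,000 queries (an assumption, not measured data)
mix = {
    "simple_classification": 400,
    "faq_retrieval": 300,
    "complex_reasoning": 150,
    "advanced_analysis": 150,
}

routed = routed_energy_wh(mix)
baseline = baseline_energy_wh(mix)
saving = 1 - routed / baseline
print(f"routed: {routed:.1f} Wh, baseline: {baseline:.1f} Wh, saving: {saving:.0%}")
```

For this mix, routing uses about 168 Wh against 1,000 Wh for the all-large-model baseline, a saving of roughly 83%. The result is dominated by the share of queries that still need the largest model.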
Strategy 4: Hardware Efficiency
- ARM-based servers: 30-40% more energy efficient than x86 for many workloads
- GPU generation: H100 is up to 3x more energy efficient than A100 per FLOP, depending on workload and precision
- Liquid cooling: 30-40% more efficient than air cooling
- On-premises renewable energy: Solar/wind directly powering data centers
Strategy 5: Code and Infrastructure Optimization
- Container image optimization: Smaller images = less storage, less network, less energy
- Caching: Every cache hit avoids a computation
- CDN: Serve static content from edge, not origin
- Database optimization: Efficient queries consume less CPU
- Spot/preemptible instances: Use excess capacity that would otherwise be wasted
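As a small illustration of the caching point, the sketch below uses Python's standard `functools.lru_cache` to count how many computations are actually executed for a stream of repeated requests. The function `expensive_lookup` and its workload are placeholders, not part of any real service.

```python
from functools import lru_cache

calls = 0  # counts how often the underlying computation really runs


@lru_cache(maxsize=1024)
def expensive_lookup(key: str) -> int:
    """Placeholder for a costly computation such as a query or model call."""
    global calls
    calls += 1
    return sum(ord(c) for c in key)


# Six requests, but only three distinct keys
for key in ["a", "b", "a", "a", "b", "c"]:
    expensive_lookup(key)

info = expensive_lookup.cache_info()
print(f"requests: {info.hits + info.misses}, computed: {calls}, cache hits: {info.hits}")
```

Here six requests trigger only three computations; the other three are served from the cache.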
Measuring Carbon Footprint
Tools for tracking infrastructure carbon:
- Cloud Carbon Footprint: Open-source tool for estimating cloud emissions
- Kepler (Kubernetes-based Efficient Power Level Exporter): Exports energy metrics for Kubernetes workloads to Prometheus
- Scaphandre: Energy consumption measurement agent
- Green Metrics Tool: Web application carbon measurement
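Estimates of this kind usually come down to multiplying energy use by data center overhead and grid carbon intensity. The sketch below shows that calculation under stated assumptions: the function name, the PUE of 1.2, the 400 gCO2/kWh grid intensity, and the GPU power draw are illustrative values, not figures from any of the tools listed.

```python
def emissions_kg(it_energy_kwh: float, pue: float, grid_g_per_kwh: float) -> float:
    """Estimate operational emissions in kg CO2.

    it_energy_kwh  -- energy drawn by the IT equipment itself
    pue            -- power usage effectiveness (facility energy / IT energy)
    grid_g_per_kwh -- grid carbon intensity in gCO2 per kWh
    """
    return it_energy_kwh * pue * grid_g_per_kwh / 1000.0


# Hypothetical workload: 8 GPUs drawing 0.7 kW each for 24 hours
it_energy = 8 * 0.7 * 24  # 134.4 kWh

result = emissions_kg(it_energy, pue=1.2, grid_g_per_kwh=400)
print(f"{it_energy:.1f} kWh of IT energy -> {result:.1f} kg CO2")
```

With these assumed inputs the workload comes to about 64.5 kg CO2 per day. Running the same job on a grid at 50 gCO2/kWh would cut that by a factor of eight, which is why region and timing matter as much as hardware.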