
Dynatrace at KubeCon EU 2026: AI

Caught up with Andreas (Andi) Grabner from Dynatrace at KubeCon EU 2026. AI is the top K8s workload but we cannot keep throwing compute at it.

Luca Berton
· 3 min read

AI is officially the number one workload on Kubernetes, but are we actually running it efficiently?


The Right-Sizing Imperative

I had a great catch-up with Andreas (Andi) Grabner on the expo floor at KubeCon to talk about the reality of scaling AI. The big takeaway? We cannot just keep throwing raw compute at these workloads. If we do not figure out how to optimize them, we are going to hit a wall with massive energy demands.

Andi made a fantastic point: the next phase of cloud-native AI is all about right-sizing. It is about optimizing workloads and shifting from massive, resource-heavy models to smaller, highly efficient ones.

This aligns perfectly with what I have been seeing in enterprise AI deployments. Organizations that started with the biggest model they could find are now realizing:

  • Smaller models fine-tuned for specific tasks often outperform general-purpose giants
  • Token economics matter: every unnecessary parameter costs GPU cycles and electricity
  • Model profiles let you pick the right GPU memory footprint for your actual workload, not the theoretical maximum
  • Autoscaling inference prevents over-provisioning during off-peak hours
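To make the memory-footprint point concrete, here is a minimal sketch of a weight-only estimate, assuming the common rule of thumb that weights take roughly parameter count times bytes per parameter. The function names, the 20% headroom default, and the model and GPU sizes in the usage note are illustrative assumptions, not figures from the conversation. The estimate deliberately ignores KV cache, activations, and runtime overhead, so treat it as a lower bound.

```python
# Hypothetical helper names; a rough weight-only estimate, not a capacity planner.
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Approximate memory for model weights only, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / (1024 ** 3)


def fits_on_gpu(num_params: float, dtype: str, gpu_memory_gib: float,
                headroom: float = 0.2) -> bool:
    """True if the weights fit while keeping a fraction `headroom` of GPU
    memory free for KV cache and runtime overhead (assumed value: 20%)."""
    usable_gib = gpu_memory_gib * (1 - headroom)
    return weight_memory_gib(num_params, dtype) <= usable_gib
```

For example, `weight_memory_gib(7e9, "fp16")` comes to about 13 GiB, so `fits_on_gpu(7e9, "fp16", 24)` is true, while a 70-billion-parameter model in fp16 does not fit in 80 GiB with this headroom and a 4-bit version does.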

You Cannot Fix What You Cannot Measure

Here is the challenge: you cannot fix what you cannot measure. That is where observability becomes the absolute linchpin. You need deep, granular visibility to pinpoint exactly where your AI infrastructure is bleeding efficiency.

For AI workloads on Kubernetes, the observability gaps are real:

  1. GPU utilization: are your expensive GPUs actually busy, or are they idle waiting for data?
  2. Inference latency distribution: not just p50, but p95 and p99 under real traffic patterns
  3. Token throughput per watt: the sustainability metric that matters
  4. Queue depth and batch efficiency: are you batching requests optimally?
  5. Memory pressure: KV cache utilization, model weight distribution across multi-node deployments
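Two of the metrics above are easy to sketch in code: tail latency and token throughput per watt. The snippet below is a minimal illustration, assuming you already have raw latency samples and an average power reading from whatever telemetry you use. The function names and sample values are hypothetical. Tokens per second divided by watts is the same quantity as tokens per joule, which is what the second function computes.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least
    pct percent of all samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def tokens_per_joule(tokens_generated, avg_power_watts, duration_seconds):
    """Token throughput per watt, expressed as tokens per joule
    (energy in joules = average watts * seconds)."""
    return tokens_generated / (avg_power_watts * duration_seconds)


# Made-up request latencies in milliseconds, with two slow outliers.
latencies_ms = [120, 130, 125, 140, 135, 128, 132, 138, 900, 1500]
p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)
```

With these made-up samples the median is 132 ms while p95 and p99 both land on the 1500 ms outlier, which is why looking at p50 alone hides the requests users actually complain about.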

Naturally, Andi recommends Dynatrace to solve this, and for good reason, given their deep roots in making complex environments understandable. Their approach of automatic discovery and AI-powered root cause analysis becomes even more valuable when you are dealing with the complexity of distributed inference pipelines.

From Running AI to Running It Sustainably

Always an insightful conversation with Andi. It is refreshing to see the industry focus shifting from just "running AI" to running it sustainably.

The sustainability angle is not just idealism; it is economics. GPU compute is expensive. Energy costs are rising. The organizations that figure out how to get more inference per dollar and per kilowatt will have a structural advantage over those who just keep scaling horizontally.

This connects directly to the inference gold rush I have been writing about. The winners will not be the ones with the most GPUs. They will be the ones who use their GPUs most efficiently.

Learn More

Check out how Dynatrace is tackling observability for the AI era: dynatrace.com

About the Author

I am Luca Berton, AI and Cloud Advisor. I help enterprises right-size their AI infrastructure for performance and sustainability. Book a consultation.
