DevOps

Edge AI Monitoring with Prometheus and Grafana: Fleet Observability

Luca Berton • 1 min read
#edge-ai #monitoring #prometheus #grafana #observability #fleet-management

You Can’t Improve What You Can’t Measure

An edge AI model in production without monitoring is a ticking time bomb. Models drift. Hardware degrades. Cameras get dirty. You need to know when performance drops — before the business does.

The Metrics That Matter

Inference Metrics

from prometheus_client import Histogram, Counter, Gauge, start_http_server

# Core inference metrics
INFERENCE_LATENCY = Histogram(
    'inference_latency_seconds',
    'Inference latency in seconds',
    buckets=[0.005, 0.01, 0.015, 0.02, 0.05, 0.1, 0.5]
)

INFERENCE_COUNT = Counter(
    'inference_total',
    'Total inferences processed',
    ['model_version', 'result']
)

CONFIDENCE_SCORE = Histogram(
    'prediction_confidence',
    'Model confidence scores',
    buckets=[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99]
)

MODEL_VERSION = Gauge(
    'model_version_info',
    'Currently loaded model version',
    ['version']
)

# Usage in inference loop
@INFERENCE_LATENCY.time()
def run_inference(image):
    prediction = model.predict(image)
    INFERENCE_COUNT.labels(
        model_version='3.2',
        result=prediction.label
    ).inc()
    CONFIDENCE_SCORE.observe(prediction.confidence)
    return prediction

# Expose metrics on :9090/metrics
start_http_server(9090)

Hardware Metrics

import re
import subprocess

GPU_TEMP = Gauge('gpu_temperature_celsius', 'GPU temperature')
GPU_UTIL = Gauge('gpu_utilization_percent', 'GPU utilization')
MEMORY_USED = Gauge('gpu_memory_used_bytes', 'GPU memory used')

def collect_jetson_metrics():
    """Collect one Jetson hardware sample via tegrastats.

    Example output line (field names vary slightly by Jetson model):
    RAM 2456/7860MB ... GR3D_FREQ 34% ... GPU@41C
    """
    # tegrastats streams forever; read a single line, then stop it
    proc = subprocess.Popen(
        ['tegrastats', '--interval', '1000'],
        stdout=subprocess.PIPE, text=True
    )
    try:
        line = proc.stdout.readline()
    finally:
        proc.terminate()
        proc.wait()

    mem = re.search(r'RAM (\d+)/\d+MB', line)    # unified memory on Jetson
    util = re.search(r'GR3D_FREQ (\d+)%', line)  # GR3D = GPU engine utilization
    temp = re.search(r'GPU@([\d.]+)C', line, re.IGNORECASE)

    if mem:
        MEMORY_USED.set(int(mem.group(1)) * 1024 * 1024)
    if util:
        GPU_UTIL.set(int(util.group(1)))
    if temp:
        GPU_TEMP.set(float(temp.group(1)))
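A tegrastats sample only helps if something collects it on a schedule. One minimal stdlib sketch (function names are hypothetical, not from any library) runs a collector function on an interval in a daemon thread, so a failed scrape never blocks the inference loop:

```python
import threading
import time

def run_collector(collect_fn, interval_seconds=15.0, iterations=None):
    """Call collect_fn every interval_seconds; iterations=None runs forever."""
    count = 0
    while iterations is None or count < iterations:
        try:
            collect_fn()
        except Exception:
            # A failed hardware scrape should never take down inference
            pass
        count += 1
        if iterations is None or count < iterations:
            time.sleep(interval_seconds)

def start_collector_thread(collect_fn, interval_seconds=15.0):
    """Run the collector in a daemon thread alongside the inference process."""
    t = threading.Thread(
        target=run_collector, args=(collect_fn, interval_seconds), daemon=True
    )
    t.start()
    return t
```

On a device this would be `start_collector_thread(collect_jetson_metrics)`; the gauges it sets are served by the same `start_http_server` endpoint as the inference metrics.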

Data Quality Metrics

import cv2  # OpenCV, used for the Laplacian blur check

IMAGE_BRIGHTNESS = Histogram(
    'input_image_brightness',
    'Average brightness of input images',
    buckets=[20, 40, 60, 80, 100, 120, 140, 160, 180, 200]
)

IMAGE_BLUR = Histogram(
    'input_image_blur_score',
    'Laplacian variance (blur detection)',
    buckets=[10, 50, 100, 200, 500, 1000]
)

def check_image_quality(image):
    """Flag camera problems before they silently degrade predictions."""
    brightness = image.mean()  # assumes a grayscale numpy array
    blur = cv2.Laplacian(image, cv2.CV_64F).var()

    IMAGE_BRIGHTNESS.observe(brightness)
    IMAGE_BLUR.observe(blur)

    # alert() is a placeholder for your notification hook (pager, webhook, ...)
    if brightness < 30:
        alert("Camera may be obstructed or lighting failed")
    if blur < 50:
        alert("Camera may be out of focus")

Prometheus Configuration

# prometheus.yml - Central Prometheus scraping edge devices
global:
  scrape_interval: 30s
  evaluation_interval: 30s

scrape_configs:
  - job_name: 'edge-ai-fleet'
    file_sd_configs:
      - files:
          - /etc/prometheus/edge_targets/*.json
        refresh_interval: 5m

  # Or use DNS service discovery if devices register
  - job_name: 'edge-ai-dns'
    dns_sd_configs:
      - names:
          - '_metrics._tcp.edge.internal'
        type: SRV
        refresh_interval: 60s

# Alert rules
rule_files:
  - /etc/prometheus/rules/edge_ai_alerts.yml
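The `file_sd_configs` job above reads its targets from JSON files, which a provisioning script can drop in as devices come online. A sketch of one such target file (hostnames, port, and labels are illustrative; the port is whatever `start_http_server` was given on the device):

```json
[
  {
    "targets": ["edge-device-042.internal:9090"],
    "labels": {
      "site": "plant-a",
      "device_type": "jetson-orin"
    }
  }
]
```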

Alert Rules

# edge_ai_alerts.yml
groups:
  - name: edge_ai
    rules:
      - alert: HighInferenceLatency
        expr: histogram_quantile(0.95, rate(inference_latency_seconds_bucket[5m])) > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High inference latency on {{ $labels.instance }}"
          description: "P95 latency is {{ $value }}s (threshold: 50ms)"

      - alert: LowConfidenceSpike
        expr: rate(prediction_confidence_bucket{le="0.5"}[15m]) / rate(prediction_confidence_count[15m]) > 0.1
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "Model confidence dropping on {{ $labels.instance }}"
          description: ">10% of predictions below 0.5 confidence — possible model drift"

      - alert: GPUOverheating
        expr: gpu_temperature_celsius > 85
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "GPU overheating on {{ $labels.instance }}"

      - alert: DeviceUnreachable
        expr: up{job="edge-ai-fleet"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Edge device {{ $labels.instance }} unreachable"

Model Accuracy Drift Detection

The most insidious edge AI failure mode: the model slowly gets worse. New product variants, lighting changes, and seasonal shifts all move the input distribution away from what the model was trained on.

# Track accuracy against known-good reference images
VALIDATION_ACCURACY = Gauge(
    'model_validation_accuracy',
    'Accuracy on reference validation set'
)

def periodic_validation():
    """Run every hour against a fixed set of labeled reference images."""
    correct = 0
    total = 0
    for image, expected in REFERENCE_SET:
        prediction = model.predict(image)
        if prediction.label == expected:
            correct += 1
        total += 1

    if total == 0:
        return  # no reference images deployed on this device

    accuracy = correct / total
    VALIDATION_ACCURACY.set(accuracy)

    if accuracy < 0.95:
        alert(f"Model accuracy dropped to {accuracy:.1%} — retrain needed")

Grafana Dashboard

Key panels for the edge AI fleet dashboard:

  1. Fleet Overview β€” map or table showing all devices, status, model version
  2. Inference Performance β€” latency p50/p95/p99 per device, aggregated
  3. Model Confidence Distribution β€” histogram showing confidence spread (drift = left shift)
  4. Hardware Health β€” GPU temp, utilization, memory across fleet
  5. Defect Rate Trend β€” is the defect detection rate changing? Could indicate model issue OR real quality issue
  6. Alert Timeline β€” recent alerts with severity
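A sketch of the PromQL that could sit behind a few of these panels (metric names match the instrumentation defined earlier; thresholds are the same ones used in the alert rules):

```promql
# 1. Fleet overview: which model version each device is running
model_version_info == 1

# 2. Inference performance: p95 latency per device
histogram_quantile(0.95, sum by (le, instance) (rate(inference_latency_seconds_bucket[5m])))

# 3. Confidence distribution: share of predictions below 0.5 confidence
sum(rate(prediction_confidence_bucket{le="0.5"}[15m])) / sum(rate(prediction_confidence_count[15m]))

# 4. Hardware health: hottest GPUs in the fleet
topk(5, gpu_temperature_celsius)
```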

Monitoring edge AI is harder than monitoring cloud services because devices fail in ways cloud servers don't — dirty cameras, power fluctuations, physical damage. Build your dashboards with these failure modes in mind.


Luca Berton

AI & Cloud Advisor with 18+ years experience. Author of 8 technical books, creator of Ansible Pilot, and instructor at CopyPasteLearn Academy. Speaker at KubeCon EU & Red Hat Summit 2026.
