An edge AI model in production without monitoring is a ticking time bomb. Models drift. Hardware degrades. Cameras get dirty. You need to know when performance drops, before the business does.
from prometheus_client import Histogram, Counter, Gauge, start_http_server
# Core inference metrics
INFERENCE_LATENCY = Histogram(
    'inference_latency_seconds',
    'Inference latency in seconds',
    buckets=[0.005, 0.01, 0.015, 0.02, 0.05, 0.1, 0.5]
)
INFERENCE_COUNT = Counter(
    'inference_total',
    'Total inferences processed',
    ['model_version', 'result']
)
CONFIDENCE_SCORE = Histogram(
    'prediction_confidence',
    'Model confidence scores',
    buckets=[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99]
)
MODEL_VERSION = Gauge(
    'model_version_info',
    'Currently loaded model version',
    ['version']
)
# Usage in inference loop
@INFERENCE_LATENCY.time()
def run_inference(image):
    prediction = model.predict(image)
    INFERENCE_COUNT.labels(
        model_version='3.2',
        result=prediction.label
    ).inc()
    CONFIDENCE_SCORE.observe(prediction.confidence)
    return prediction

# Expose metrics on :9090/metrics
start_http_server(9090)

import subprocess
import json
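Before deploying, it helps to sanity-check the instrumentation by exercising the metrics with a stub function and reading the text exposition format directly. A sketch (the `demo_` metric names and stub function are illustrative, not part of the pipeline; a separate registry avoids colliding with the real metrics):

```python
from prometheus_client import CollectorRegistry, Counter, Histogram, generate_latest

# Separate registry so the demo does not clash with the module-level metrics
registry = CollectorRegistry()
LAT = Histogram('demo_inference_latency_seconds', 'Demo latency',
                buckets=[0.005, 0.01, 0.05], registry=registry)
CNT = Counter('demo_inference_total', 'Demo inferences', ['result'], registry=registry)

@LAT.time()
def fake_inference():
    CNT.labels(result='scratch').inc()

for _ in range(3):
    fake_inference()

# The exposition contains cumulative buckets, _count/_sum, and labeled counters
text = generate_latest(registry).decode()
```

Checking for lines like `demo_inference_total{result="scratch"} 3.0` in `text` confirms labels and counts are wired up before Prometheus ever scrapes the device.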
GPU_TEMP = Gauge('gpu_temperature_celsius', 'GPU temperature')
GPU_UTIL = Gauge('gpu_utilization_percent', 'GPU utilization')
MEMORY_USED = Gauge('gpu_memory_used_bytes', 'GPU memory used')
def collect_jetson_metrics():
    """Collect Jetson hardware metrics via tegrastats."""
    # tegrastats streams continuously, so read a single line and stop
    # instead of waiting for check_output() to return (it never would)
    proc = subprocess.Popen(['tegrastats', '--interval', '1000'],
                            stdout=subprocess.PIPE, text=True)
    stats = proc.stdout.readline()
    proc.terminate()
    # Parse the tegrastats line (helpers specific to your device's format)
    gpu_temp = parse_temp(stats)
    gpu_util = parse_util(stats)
    mem_used = parse_memory(stats)
    GPU_TEMP.set(gpu_temp)
    GPU_UTIL.set(gpu_util)
    MEMORY_USED.set(mem_used)

IMAGE_BRIGHTNESS = Histogram(
    'input_image_brightness',
    'Average brightness of input images',
    buckets=[20, 40, 60, 80, 100, 120, 140, 160, 180, 200]
)
IMAGE_BLUR = Histogram(
    'input_image_blur_score',
    'Laplacian variance (blur detection)',
    buckets=[10, 50, 100, 200, 500, 1000]
)
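The parse helpers depend on the exact tegrastats output, which varies by Jetson model and L4T version. A regex sketch against a sample line (the `SAMPLE` string and token names like `GR3D_FREQ` are assumptions to verify against your device):

```python
import re

# Assumed sample line; confirm the token names on your Jetson/L4T version
SAMPLE = "RAM 2041/3964MB ... GR3D_FREQ 12% ... GPU@41.5C"

def parse_temp(stats: str) -> float:
    """GPU temperature in Celsius, e.g. 'GPU@41.5C'."""
    m = re.search(r'GPU@([\d.]+)C', stats)
    return float(m.group(1)) if m else 0.0

def parse_util(stats: str) -> float:
    """GPU (GR3D engine) utilization percent, e.g. 'GR3D_FREQ 12%'."""
    m = re.search(r'GR3D_FREQ (\d+)%', stats)
    return float(m.group(1)) if m else 0.0

def parse_memory(stats: str) -> float:
    """Used RAM converted from MB to bytes, e.g. 'RAM 2041/3964MB'."""
    m = re.search(r'RAM (\d+)/\d+MB', stats)
    return float(m.group(1)) * 1024 * 1024 if m else 0.0
```

Returning 0.0 on a failed match keeps the gauges populated; you may prefer to skip the update so a stale value doesn't mask a parsing regression.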
import cv2

def check_image_quality(image):
    brightness = image.mean()
    blur = cv2.Laplacian(image, cv2.CV_64F).var()
    IMAGE_BRIGHTNESS.observe(brightness)
    IMAGE_BLUR.observe(blur)
    if brightness < 30:
        alert("Camera may be obstructed or lighting failed")
    if blur < 50:
        alert("Camera may be out of focus")

# prometheus.yml - Central Prometheus scraping edge devices
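OpenCV may not be installed on every edge image. The same Laplacian-variance blur score can be computed with a small NumPy "valid" convolution; a sketch for 2-D grayscale frames (the synthetic test frames are illustrative):

```python
import numpy as np

# 3x3 Laplacian kernel, the same operator cv2.Laplacian applies
LAPLACIAN = np.array([[0,  1, 0],
                      [1, -4, 1],
                      [0,  1, 0]], dtype=np.float64)

def blur_score(image: np.ndarray) -> float:
    """Variance of the Laplacian response; low values suggest a blurry frame."""
    h, w = image.shape  # expects a 2-D grayscale array
    out = np.zeros((h - 2, w - 2))
    for dy in range(3):
        for dx in range(3):
            out += LAPLACIAN[dy, dx] * image[dy:dy + h - 2, dx:dx + w - 2]
    return float(out.var())

# A flat (featureless) frame scores zero; a high-contrast frame scores high
flat = np.full((32, 32), 100.0)
sharp = np.zeros((32, 32))
sharp[::2, ::2] = 255.0
```

The thresholds from the check above (blur < 50) would need re-tuning against this implementation, since border handling differs slightly from OpenCV's.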
global:
  scrape_interval: 30s
  evaluation_interval: 30s

scrape_configs:
  - job_name: 'edge-ai-fleet'
    file_sd_configs:
      - files:
          - /etc/prometheus/edge_targets/*.json
        refresh_interval: 5m

  # Or use DNS service discovery if devices register
  - job_name: 'edge-ai-dns'
    dns_sd_configs:
      - names:
          - '_metrics._tcp.edge.internal'
        type: SRV
        refresh_interval: 60s

# Alert rules
rule_files:
  - /etc/prometheus/rules/edge_ai_alerts.yml

# edge_ai_alerts.yml
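The target files under edge_targets/ follow Prometheus's standard file_sd shape: a JSON list of target groups, each with a targets array and optional labels. A small generator sketch (the device names, port, and labels are placeholders; real deployments would pull the inventory from a registration service):

```python
import json

# Illustrative inventory of edge devices exposing metrics on :9090
devices = ["jetson-line1", "jetson-line2"]

groups = [{
    "targets": [f"{d}:9090" for d in devices],
    "labels": {"role": "edge-ai", "site": "factory-a"},
}]

# Write this to e.g. /etc/prometheus/edge_targets/fleet.json on the
# Prometheus host; file_sd picks up changes on its refresh_interval
rendered = json.dumps(groups, indent=2)
```

Because file_sd re-reads the files without a restart, adding a device to the fleet is just appending it to this JSON.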
groups:
  - name: edge_ai
    rules:
      - alert: HighInferenceLatency
        expr: histogram_quantile(0.95, rate(inference_latency_seconds_bucket[5m])) > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High inference latency on {{ $labels.instance }}"
          description: "P95 latency is {{ $value }}s (threshold: 50ms)"

      - alert: LowConfidenceSpike
        expr: rate(prediction_confidence_bucket{le="0.5"}[15m]) / rate(prediction_confidence_count[15m]) > 0.1
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "Model confidence dropping on {{ $labels.instance }}"
          description: ">10% of predictions below 0.5 confidence (possible model drift)"

      - alert: GPUOverheating
        expr: gpu_temperature_celsius > 85
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "GPU overheating on {{ $labels.instance }}"

      - alert: DeviceUnreachable
        expr: up{job="edge-ai-fleet"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Edge device {{ $labels.instance }} unreachable"

The most insidious edge AI failure is the model slowly getting worse. New product variants, lighting changes, and seasonal differences all shift the input distribution away from the training data.
# Track accuracy against known-good reference images
VALIDATION_ACCURACY = Gauge(
    'model_validation_accuracy',
    'Accuracy on reference validation set'
)
def periodic_validation():
    """Run every hour against reference images."""
    correct = 0
    total = 0
    for image, expected in REFERENCE_SET:
        prediction = model.predict(image)
        if prediction.label == expected:
            correct += 1
        total += 1
    accuracy = correct / total
    VALIDATION_ACCURACY.set(accuracy)
    if accuracy < 0.95:
        alert(f"Model accuracy dropped to {accuracy:.1%}, retrain needed")

Key panels for the edge AI fleet dashboard:
- P95 inference latency per device (inference_latency_seconds)
- Prediction confidence distribution over time (prediction_confidence)
- GPU temperature and utilization across the fleet
- Validation accuracy trend (model_validation_accuracy)
- Device up/down status (up{job="edge-ai-fleet"})
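The hourly validation cadence can come from cron, a systemd timer, or a minimal in-process scheduler. A stdlib sketch (the wiring is illustrative; `schedule_periodic` is not part of the pipeline above):

```python
import threading

def schedule_periodic(fn, interval_seconds=3600.0):
    """Run fn immediately, then re-arm a daemon Timer after each run."""
    def tick():
        fn()
        timer = threading.Timer(interval_seconds, tick)
        timer.daemon = True  # don't block process shutdown
        timer.start()
    tick()

# Demo: with a long interval only the immediate first run fires
runs = []
schedule_periodic(lambda: runs.append(1), interval_seconds=3600.0)
```

Re-arming after each run (rather than a fixed-rate loop) means a slow validation pass delays the next one instead of overlapping with it.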
Monitoring edge AI is harder than monitoring cloud services because devices fail in ways cloud servers don't: dirty cameras, power fluctuations, physical damage. Build your dashboards with these failure modes in mind.
AI & Cloud Advisor with 18+ years experience. Author of 8 technical books, creator of Ansible Pilot, and instructor at CopyPasteLearn Academy. Speaker at KubeCon EU & Red Hat Summit 2026.