The Dream: GPU-Accelerated AI on a Pi
Jeff Geerling proved that external GPUs work on Raspberry Pi 5 via the PCIe slot. AMD cards, Intel Arc, even NVIDIA (with effort). But can you actually run LLM inference on a Pi-attached GPU? I tested it with OpenClaw.
Hardware
- Raspberry Pi 5 (8GB)
- PCIe to x16 adapter (via M.2 HAT)
- AMD Radeon RX 6400 (4GB VRAM)
- Powered PCIe riser + ATX PSU
- Total cost: ~$200 (Pi + GPU + adapters)
Getting the GPU Working
Follow Jeff’s guide for AMD GPU setup on Pi OS:
# Enable PCIe Gen 3
echo "dtparam=pciex1_gen=3" | sudo tee -a /boot/firmware/config.txt
sudo reboot
# Verify GPU is detected
lspci | grep -i vga
# 01:00.0 VGA compatible controller: AMD/ATI Navi 24 [Radeon RX 6400]
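Detection alone doesn't prove the Gen 3 setting took effect; the negotiated link speed does. A small helper for that (`link_speed` is my name, not a standard tool; it just parses `lspci -vv` output):

```shell
# link_speed: pull the negotiated PCIe speed out of `lspci -vv` output.
# Gen 3 negotiates 8GT/s; if you see 5GT/s, the dtparam change didn't stick.
link_speed() {
  grep -o 'LnkSta:[^,]*' | grep -o '[0-9.]*GT/s' | head -n1
}

# On the Pi (01:00.0 is the GPU address from the lspci output above):
#   sudo lspci -vv -s 01:00.0 | link_speed    # expect: 8GT/s
```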
# Install ROCm (AMD's GPU compute stack)
# Note: ARM64 ROCm support is experimental
wget https://repo.radeon.com/rocm/apt/latest/pool/main/r/rocm-hip-runtime/...
The Challenge: ROCm on ARM64
Here’s where reality hits. ROCm’s ARM64 support is limited. llama.cpp’s ROCm backend expects x86_64. You need to compile from source with the right flags:
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DGGML_HIPBLAS=ON -DCMAKE_C_COMPILER=hipcc -DCMAKE_CXX_COMPILER=hipcc
cmake --build build --config Release
This may or may not work depending on your ROCm version and GPU generation. The RX 6400 (Navi 24) is gfx1034, which isn't on ROCm's officially supported GPU list.
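One escape hatch worth knowing: ROCm's runtime can be told to treat an unsupported RDNA2 card as the officially supported gfx1030. This is a well-known community workaround, not an AMD-supported path, so treat it as a best-effort hack:

```shell
# Unofficial but widely used workaround: make ROCm's runtime treat the
# card as gfx1030 (RX 6800/6900 class), which shares the RDNA2 ISA and
# *is* on the supported list. Set this before launching llama-server.
export HSA_OVERRIDE_GFX_VERSION=10.3.0
```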
What Actually Worked
After significant effort, I got llama.cpp running with partial GPU offload:
# -ngl sets how many transformer layers are offloaded to the GPU
./build/bin/llama-server \
  -m models/phi-3-mini-4k-instruct-q4_0.gguf \
  -ngl 20 \
  -c 2048 \
  --port 8080
Performance with the 4GB RX 6400:

Phi-3 3.8B (Q4_0):
  CPU only (Pi 5):         4.5 tok/s
  GPU offload (20 layers): 7.2 tok/s (+60%)
  Full GPU:                not possible (4GB VRAM is too small)

Mistral 7B (Q4_0):
  CPU only:                2.1 tok/s
  GPU offload (15 layers): 3.8 tok/s (+80%)
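The "full GPU: not possible" result follows from simple arithmetic. A back-of-envelope sketch with my rough numbers (the ~2.2 GB model size and 32-layer count are estimates for Phi-3 mini at Q4_0, not measurements from this rig):

```shell
# Why 4GB VRAM forces partial offload: divide the model size by its
# layer count, then see what 20 layers cost.
awk 'BEGIN {
  model_gb = 2.2; layers = 32; offloaded = 20
  per_layer_mb = model_gb * 1024 / layers
  printf "~%.0f MB/layer; %d layers ~ %.1f GB\n",
         per_layer_mb, offloaded, per_layer_mb * offloaded / 1024
}'
# ~70 MB/layer; 20 layers ~ 1.4 GB -- leaving headroom in the 4GB of
# VRAM for the KV cache and scratch buffers.
```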
Connecting to OpenClaw
# openclaw.yaml
providers:
  local-gpu:
    type: openai-compatible
    baseUrl: http://localhost:8080/v1

models:
  default: github-copilot/gpt-5-mini
  local: local-gpu/phi-3-mini
Is It Worth It?
Honestly? No. Here’s why:
- $200 for 7 tok/s: GPT-5-mini gives 50+ tok/s for $10/month
- Power draw: the GPU alone pulls 50W, while the Pi needs just 5W
- Complexity: ROCm on ARM64 is fragile, and updates break things
- Quality: Phi-3 at Q4 can't match GPT-5-mini for tool calling
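The power bullet deserves a number. Assuming the rig idles-to-loads at a combined 55W around the clock and electricity costs $0.15/kWh (both figures are my assumptions, not measurements):

```shell
# Monthly electricity cost of the eGPU rig: 50W GPU + 5W Pi, 24/7,
# at an assumed $0.15/kWh.
awk 'BEGIN {
  watts = 55; hours = 24 * 30; rate = 0.15
  kwh = watts * hours / 1000
  printf "%.1f kWh/month = $%.2f/month\n", kwh, kwh * rate
}'
# 39.6 kWh/month = $5.94/month
```

In other words, the electricity alone is in the same ballpark as the hosted subscription, before counting the $200 of hardware.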
But it’s cool as hell. And when AMD releases a 16GB card in the $150 range, and ROCm ARM64 matures, this setup will actually be practical.
The Better Option (Today)
For the same $200:
- Raspberry Pi 5 (8GB): $80
- 2 years of Copilot Pro: $120
- Result: 50+ tok/s, state-of-the-art quality, zero maintenance
The eGPU-on-Pi experiment is a glimpse of the future. But the future isn’t quite here yet.