Getting Started with RHEL AI: Installation and GPU Setup

Red Hat Enterprise Linux AI (RHEL AI) simplifies enterprise AI deployment by bundling the components needed to build, train, and deploy machine learning models at scale. In this guide, we’ll walk through installing RHEL AI and configuring GPU acceleration on NVIDIA and AMD hardware.

Prerequisites

Before you begin, make sure you have the following (Chapter 2 of Practical RHEL AI covers these requirements):

Operating system: Red Hat Enterprise Linux 9 or later, installed on your hardware
GPU hardware: NVIDIA A100 or H100, or AMD MI300X (recommended)
Container runtime: Podman (preferred) or Docker
Disk space: at least 100 GB for models and dependencies
Network connectivity: to download models and dependencies
Privileges: root or sudo access to the system
Background knowledge: Linux administration, Python, and AI/ML concepts
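Part of this checklist can be automated. The following is a minimal sketch, not a tool from RHEL AI or the book: the helper names (has_enough_disk, have_cmd, preflight) are hypothetical, it assumes GNU df and, optionally, lspci from pciutils, and it only prints warnings rather than failing.

```shell
#!/bin/sh
# Hypothetical pre-flight checks for the prerequisites above.
# Assumes GNU coreutils df and (optionally) lspci from pciutils.

# has_enough_disk PATH MIN_GIB: succeed if PATH's filesystem has >= MIN_GIB free.
has_enough_disk() {
    local avail
    # "df -BG --output=avail" prints a header line, then a value such as "  512G"
    avail=$(df -BG --output=avail "$1" 2>/dev/null | tail -n 1 | tr -dc '0-9')
    [ "${avail:-0}" -ge "$2" ]
}

# have_cmd NAME: succeed if NAME is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

preflight() {
    if has_enough_disk "${1:-$HOME}" 100; then
        echo "OK   disk: at least 100 GiB free"
    else
        echo "WARN disk: less than 100 GiB free"
    fi

    if have_cmd podman || have_cmd docker; then
        echo "OK   container runtime found"
    else
        echo "WARN neither podman nor docker is on PATH"
    fi

    # Keep only display/3D/accelerator devices, then look for NVIDIA or AMD/ATI
    if have_cmd lspci &&
        lspci | grep -Ei 'vga|3d|display|processing accelerators' | grep -Eiq 'nvidia|amd/ati'; then
        echo "OK   NVIDIA or AMD GPU visible on the PCI bus"
    else
        echo "WARN no NVIDIA or AMD GPU found by lspci (or lspci is missing)"
    fi
}

preflight "$@"
```

Pass the directory where models will be stored as the first argument (it defaults to your home directory); root or sudo access still needs to be confirmed by hand.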

Step 1: Update Your System

Start by updating your RHEL system to the latest packages and installing a few basic tools:

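sudo dnf update -y
sudo dnf install -y git curl wget
# Reports whether a reboot is needed (reboot after a kernel update, before Step 3)
dnf needs-restarting -r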

Step 2: Install RHEL AI

Red Hat distributes RHEL AI through its enterprise repositories. The book covers several installation methods:

Option A: Standard Installation

sudo subscription-manager repos --enable rhel-9-for-x86_64-appstream-rpms
sudo dnf install -y rhel-ai

Option B: Kickstart for Bare Metal (Chapter 2)

For automated bare-metal installs, use the Kickstart snippets provided in the book.

Option C: Cloud Templates

The book provides cloud templates for AWS, Azure, and GCP deployments.

Whichever method you use, the installation provides the core RHEL AI components, including:

DeepSpeed for distributed training (ZeRO stage 3 and MiCS)
vLLM for optimized inference
InstructLab CLI (ilab) for model fine-tuning
Essential ML libraries (PyTorch, TensorFlow, scikit-learn)
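After installation, a quick sketch like the following can confirm that the listed components are present. It assumes the Python packages are installed into the python3 on your PATH; if your installation keeps them in a virtual environment or container image, run the check there instead. has_py_module and check_components are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical post-install check for the components listed above.
# Assumes the packages live in the python3 found on PATH.

# has_py_module NAME: succeed if python3 can import NAME.
has_py_module() {
    python3 -c "import $1" >/dev/null 2>&1
}

check_components() {
    # Import names differ from package names: scikit-learn imports as sklearn
    for mod in torch deepspeed vllm tensorflow sklearn; do
        if has_py_module "$mod"; then
            echo "OK      python module: $mod"
        else
            echo "MISSING python module: $mod"
        fi
    done

    # The InstructLab CLI is installed as "ilab"
    if command -v ilab >/dev/null 2>&1; then
        echo "OK      ilab CLI"
    else
        echo "MISSING ilab CLI"
    fi
}

check_components
```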

Step 3: Configure GPU Acceleration

For NVIDIA GPUs:

Install the NVIDIA GPU driver and the CUDA toolkit:

# Add NVIDIA repository
sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/cuda-rhel9.repo

# Install NVIDIA drivers and CUDA
sudo dnf install -y cuda-toolkit nvidia-driver

# Verify installation (reboot first if nvidia-smi cannot communicate with the driver)
nvidia-smi
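To compare each GPU's memory against the sizing guide later in this article, nvidia-smi's CSV query mode is easier to script than its default table. In this sketch, gpus_below_mib is a hypothetical helper and the 40000 MiB default threshold is an arbitrary example value.

```shell
#!/bin/sh
# Hypothetical helper: list GPUs whose total memory is below a threshold.

# gpus_below_mib MIN_MIB: reads "name, memory_MiB" lines on stdin and
# prints the name of every GPU with less than MIN_MIB of memory.
gpus_below_mib() {
    awk -F', ' -v min="$1" '($2 + 0) < (min + 0) { print $1 }'
}

# Only query real hardware when nvidia-smi is installed
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits |
        gpus_below_mib "${1:-40000}"
fi
```

With noheader and nounits, each line looks like "NVIDIA A100-SXM4-80GB, 81920", which is why the helper splits fields on ", ".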

For AMD MI300X:

Install ROCm (Radeon Open Compute):

# Add AMD ROCm repository
sudo dnf config-manager --add-repo https://repo.radeon.com/rocm/rhel9/rocm.repo

# Install ROCm runtime
sudo dnf install -y rocm-core rocm-runtime

# Verify installation
rocm-smi
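AMD's ROCm documentation also asks that users running GPU workloads belong to the render and video groups; without that membership, GPU device access from a regular account can fail with permission errors. The sketch below checks membership for the current user; in_group is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: check ROCm's group requirements for the current user.

# in_group GROUP [USER]: succeed if USER (default: current user) is in GROUP.
in_group() {
    id -nG ${2:+"$2"} | tr ' ' '\n' | grep -qx "$1"
}

for g in render video; do
    if ! in_group "$g"; then
        # Group changes take effect at the next login
        echo "not in group '$g'; run: sudo usermod -a -G $g \"\$LOGNAME\" and log in again"
    fi
done
```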

Step 4: Verify GPU Access

Confirm your GPU is properly configured:

# Check available GPUs
rhel-ai gpu list

# Test GPU functionality
rhel-ai gpu test
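As a cross-check that does not depend on the RHEL AI CLI, you can ask PyTorch directly how many GPUs it can use. ROCm builds of PyTorch expose AMD GPUs through the same torch.cuda API, so one check covers both vendors. This sketch assumes PyTorch is importable from python3; torch_gpu_count is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: print how many GPUs PyTorch can use.
# Prints 0 if python3 or torch is unavailable, or if no GPU is usable.
torch_gpu_count() {
    python3 -c 'import torch; print(torch.cuda.device_count() if torch.cuda.is_available() else 0)' 2>/dev/null || echo 0
}

echo "PyTorch sees $(torch_gpu_count) GPU(s)"
```

A count of 0 on a machine where nvidia-smi or rocm-smi lists GPUs usually points to a PyTorch build that does not match the installed driver stack.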

Step 5: Initialize RHEL AI

Create your RHEL AI working directory and initialize the environment:

mkdir -p ~/rhel-ai
cd ~/rhel-ai

# Initialize RHEL AI environment
rhel-ai init

This creates:

.rhel-ai/ configuration directory
Model cache directory
Environment variable setup

Step 6: Download Your First Model

RHEL AI comes pre-configured with access to open-source models. Download a model for testing:

# List available models
rhel-ai model list

# Download Granite model (recommended)
rhel-ai model download granite-8b

# Verify download
rhel-ai model list --local

Hardware Sizing Guide

Choose the right GPU for your workload:

GPU model	Memory per GPU	Best for	Relative cost
NVIDIA A100	40 or 80 GB	Multi-user inference, training	High
NVIDIA H100	80 GB	Large-model training	Very high
AMD MI300X	192 GB	Mixed workloads; large models on one GPU	Very high
NVIDIA A10	24 GB	Single-user development	Medium
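Memory per GPU is usually the deciding column. A rough rule: model weights take parameters × bytes per parameter (2 bytes in BF16/FP16), while full fine-tuning with mixed-precision Adam needs about 16 bytes per parameter for weights, gradients, and optimizer states, the accounting used in the DeepSpeed ZeRO paper. Neither figure includes activations, the KV cache, or framework overhead. The helpers below are hypothetical and only do this arithmetic.

```shell
#!/bin/sh
# Back-of-the-envelope memory for model states only (no activations,
# KV cache, or framework overhead). Hypothetical helpers for illustration.

# model_gb PARAMS_BILLIONS BYTES_PER_PARAM: print decimal GB (1 GB = 1e9 bytes)
model_gb() {
    awk -v p="$1" -v b="$2" 'BEGIN { printf "%g\n", p * b }'
}

# fits PARAMS_BILLIONS BYTES_PER_PARAM GPU_GB: succeed if the states fit
fits() {
    awk -v p="$1" -v b="$2" -v g="$3" 'BEGIN { exit !(p * b <= g) }'
}

# Inference with BF16 weights: 2 bytes per parameter
echo "granite-8b weights (BF16):  $(model_gb 8 2) GB"
# Full fine-tuning, mixed-precision Adam: 2 (weights) + 2 (grads) + 12 (optimizer) bytes
echo "granite-8b training states: $(model_gb 8 16) GB"

if fits 8 16 80; then
    echo "training states fit on one 80 GB GPU"
else
    echo "training states exceed one 80 GB GPU: shard with ZeRO-3 or offload"
fi
```

For an 8B model this gives 16 GB of weights, which fit within a 24 GB A10 for inference, but about 128 GB of training states, more than a single 80 GB A100 or H100 can hold. That gap is what ZeRO stage 3 sharding across GPUs, or offloading, is meant to close.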

Troubleshooting Common Issues

Issue: GPU not detected

# Restart the NVIDIA persistence daemon
sudo systemctl restart nvidia-persistenced

# Reload the NVIDIA unified memory (UVM) kernel module
sudo modprobe -r nvidia_uvm
sudo modprobe nvidia_uvm
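Restarting services does not help if the NVIDIA driver never loaded. Two common causes are the open-source nouveau driver claiming the GPU and Secure Boot rejecting an unsigned NVIDIA kernel module. This triage sketch checks both; module_loaded is a hypothetical helper that reads /proc/modules, the same source lsmod uses.

```shell
#!/bin/sh
# Hypothetical triage for "GPU not detected" on NVIDIA systems.

# module_loaded NAME: succeed if kernel module NAME is currently loaded.
module_loaded() {
    grep -q "^$1 " /proc/modules 2>/dev/null
}

if module_loaded nouveau; then
    echo "nouveau is loaded and conflicts with the NVIDIA driver; blacklist it and reboot"
fi

if ! module_loaded nvidia; then
    echo "nvidia module is not loaded; check 'sudo dmesg | grep -i nvidia' for errors"
    # With Secure Boot enabled, unsigned out-of-tree modules are refused at load time
    if command -v mokutil >/dev/null 2>&1; then
        mokutil --sb-state
    fi
fi
```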

Issue: CUDA version mismatch

# Check installed CUDA version
nvcc --version

# Update the CUDA toolkit (make sure the installed driver supports the new version)
sudo dnf update -y cuda-toolkit
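nvcc reports the toolkit version, while the "CUDA Version" in the nvidia-smi header is the newest CUDA version the installed driver supports. A toolkit newer than that is the usual source of mismatch errors, although CUDA's minor version compatibility lets some binaries built with a newer toolkit run on an older driver of the same major version. The sketch below compares the two; the helper names are hypothetical, and version_le relies on GNU sort -V.

```shell
#!/bin/sh
# Hypothetical check: is the CUDA toolkit newer than the driver supports?

# version_le A B: succeed if version A <= version B (GNU sort -V ordering).
version_le() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Extract "12.4" from "Cuda compilation tools, release 12.4, V12.4.131"
parse_nvcc_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

# Extract "12.4" from the nvidia-smi banner "... CUDA Version: 12.4 ..."
parse_driver_cuda() {
    sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p'
}

if command -v nvcc >/dev/null 2>&1 && command -v nvidia-smi >/dev/null 2>&1; then
    toolkit=$(nvcc --version | parse_nvcc_release)
    driver=$(nvidia-smi | parse_driver_cuda)
    if version_le "$toolkit" "$driver"; then
        echo "OK: toolkit $toolkit <= driver CUDA $driver"
    else
        echo "WARN: toolkit $toolkit is newer than driver CUDA $driver; a driver update may be needed"
    fi
fi
```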

Issue: Insufficient GPU memory

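Offload model states out of GPU memory with DeepSpeed ZeRO-Offload. Offloading is configured in the zero_optimization section of the DeepSpeed JSON config: set "device" to "cpu" to use system memory, or to "nvme" (plus an "nvme_path") to use local NVMe storage:

"zero_optimization": {"stage": 3,
  "offload_optimizer": {"device": "cpu"}, "offload_param": {"device": "cpu"}}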
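DeepSpeed reads offload settings from the zero_optimization section of its JSON config. As a sketch, the helper below writes a minimal config for either offload target; write_ds_config, the /local_nvme default path, and the micro-batch size of 1 are illustrative assumptions, and the zero_optimization section should be merged into your actual training configuration. NVMe offload (ZeRO-Infinity) requires ZeRO stage 3 and an nvme_path on a fast local SSD.

```shell
#!/bin/sh
# Hypothetical helper: write a minimal DeepSpeed config with ZeRO stage 3
# offload to "cpu" (system memory) or "nvme" (local SSD).

# write_ds_config OUT_FILE DEVICE [NVME_PATH]
write_ds_config() {
    local out="$1" device="$2" nvme_path="${3:-/local_nvme}" extra
    case "$device" in
        cpu)  extra="" ;;
        nvme) extra=", \"nvme_path\": \"$nvme_path\"" ;;
        *)    echo "write_ds_config: device must be cpu or nvme" >&2; return 1 ;;
    esac
    # Unquoted EOF so $device and $extra are expanded into the JSON
    cat > "$out" <<EOF
{
  "train_micro_batch_size_per_gpu": 1,
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": {"device": "$device"$extra},
    "offload_param": {"device": "$device"$extra}
  }
}
EOF
}

# Example: write_ds_config ds_config.json nvme /mnt/nvme0
```

CPU offload is the simpler first step; NVMe offload trades more I/O for the ability to hold model states larger than system memory.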

Next Steps

Now that RHEL AI is installed and your GPUs are configured, you’re ready to:

Fine-tune models using InstructLab
Serve models with vLLM
Monitor performance with Prometheus and Grafana
Scale across clusters with Kubernetes

Resources

Ready to deploy enterprise AI? With RHEL AI installed and your GPUs configured, you have a solid foundation for building production-grade AI solutions. In the next article, we’ll explore InstructLab for fine-tuning models tailored to your organization’s needs.

📚 Get the Complete Installation Guide

This article only scratches the surface!

Practical RHEL AI provides everything you need for a successful deployment:

✅ Detailed Kickstart templates for automated bare-metal installs
✅ Cloud deployment templates for AWS, Azure, and GCP
✅ Troubleshooting guides for 50+ common installation issues
✅ Hardware compatibility matrices and validation scripts
✅ Security hardening checklists for enterprise compliance

🚀 Pre-Order Now - Available March 2026

Get Practical RHEL AI from Apress and deploy production-ready AI on Red Hat Enterprise Linux with confidence.
