Getting Started with RHEL AI: Installation and GPU Setup
Red Hat Enterprise Linux AI (RHEL AI) simplifies enterprise AI deployment by bundling the essential components needed to build, train, and deploy machine learning models at scale. In this guide, we’ll walk through the installation process and configure GPU acceleration for optimal performance.
Prerequisites
Before you begin, ensure you have (as covered in Chapter 2 of Practical RHEL AI):
- Red Hat Enterprise Linux 9 or later installed on your hardware
- GPU Hardware: NVIDIA A100, H100, or AMD MI300X recommended
- Container Runtime: Podman (preferred) or Docker
- Sufficient disk space: At least 100GB for models and dependencies
- Network connectivity: For downloading models and dependencies
- Root or sudo access to the system
- Basic Knowledge: Linux administration, Python, and AI/ML concepts
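The checklist above can be turned into a quick preflight script. The following is a minimal sketch, not an official tool: the function names are hypothetical, the 100GB threshold comes from the list above, and `df --output` assumes GNU coreutils (the default on RHEL).

```shell
#!/usr/bin/env bash
# Preflight sketch. Function names are illustrative assumptions, not part of RHEL AI.

# Print the major version from an os-release style file (default: /etc/os-release).
os_major_version() {
    local file="${1:-/etc/os-release}"
    # Source in a subshell so the file's variables do not leak into this shell.
    ( . "$file" && printf '%s\n' "${VERSION_ID%%.*}" )
}

# Print free space in whole GB for the filesystem holding the given path.
available_gb() {
    df -BG --output=avail "${1:-$HOME}" | tail -n 1 | tr -dc '0-9'
}

# Succeed if the available space (GB) meets the requirement (GB).
has_enough_space() {
    local available="$1" required="$2"
    [ "$available" -ge "$required" ]
}

# Print the first container runtime found on PATH, preferring Podman.
pick_runtime() {
    local candidate
    for candidate in podman docker; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}
```

After sourcing the file, one possible check is `[ "$(os_major_version)" -ge 9 ] && has_enough_space "$(available_gb)" 100 && pick_runtime`, which prints the detected runtime only when all three conditions hold.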
Step 1: Update Your System
Start by updating your RHEL system to the latest packages:
sudo dnf update -y
sudo dnf install -y git curl wget
Step 2: Install RHEL AI
Red Hat provides RHEL AI through its enterprise repositories. The book covers multiple installation methods:
Option A: Standard Installation
sudo subscription-manager repos --enable rhel-9-for-x86_64-appstream-rpms
sudo dnf install -y rhel-ai
Option B: Kickstart for Bare Metal (Chapter 2)
For automated bare metal installs, use Kickstart snippets provided in the book.
Option C: Cloud Templates
The book provides cloud templates for AWS, Azure, and GCP deployments.
Whichever method you choose, the installation provides the core RHEL AI components, including:
- DeepSpeed for distributed training (ZeRO-3, MiCS scaling)
- vLLM for optimized inference
- InstructLab CLI for model fine-tuning
- Essential ML libraries (PyTorch, TensorFlow, Scikit-learn)
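To make the ZeRO-3 item concrete: DeepSpeed reads its settings from a JSON file. The sketch below writes a minimal config that enables ZeRO stage 3 with optimizer and parameter offload to CPU memory. The function name, default file name, and batch size are illustrative assumptions, and how RHEL AI passes such a file to DeepSpeed is not covered here.

```shell
#!/usr/bin/env bash
# Write a minimal DeepSpeed config: ZeRO stage 3 with CPU offload.
# write_ds_config and ds_config.json are hypothetical names for illustration.
write_ds_config() {
    local path="${1:-ds_config.json}"
    cat > "$path" <<'EOF'
{
  "train_micro_batch_size_per_gpu": 1,
  "bf16": { "enabled": true },
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": { "device": "cpu", "pin_memory": true },
    "offload_param": { "device": "cpu", "pin_memory": true }
  }
}
EOF
}
```

Calling `write_ds_config ./ds_config.json` produces a file that a DeepSpeed-based training script can typically load through its `--deepspeed_config` argument, provided the script registers DeepSpeed's standard arguments.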
Step 3: Configure GPU Acceleration
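Before picking one of the two paths below, it can help to confirm which vendor's GPU the system actually exposes. This is a rough sketch that classifies `lspci -nn` output by PCI vendor ID (10de for NVIDIA, 1002 for AMD), looking only at display-class and processing-accelerator devices. The function name is hypothetical, and `lspci` is assumed to be available from the pciutils package.

```shell
#!/usr/bin/env bash
# Classify the GPU vendor from `lspci -nn` style text on stdin.
# detect_gpu_vendor is an illustrative helper, not a RHEL AI command.
detect_gpu_vendor() {
    local gpus
    # Keep only display-class (03xx) and processing-accelerator (1200) devices.
    gpus="$(grep -Ei '\[(03[0-9a-f]{2}|1200)\]:' || true)"
    case "$gpus" in
        *'[10de:'*) echo nvidia ;;        # NVIDIA PCI vendor ID
        *'[1002:'*) echo amd ;;           # AMD/ATI PCI vendor ID
        *) echo none; return 1 ;;
    esac
}
```

Run it as `lspci -nn | detect_gpu_vendor`. On a machine with GPUs from both vendors this sketch reports `nvidia` first, so mixed systems would need a more careful check.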
For NVIDIA GPUs:
Install NVIDIA GPU drivers and CUDA toolkit:
# Add NVIDIA repository
sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/cuda-rhel9.repo
# Install NVIDIA drivers and CUDA
sudo dnf install -y cuda-toolkit nvidia-driver
# Verify installation
nvidia-smi
For AMD MI300X:
Install ROCm (Radeon Open Compute):
# Add AMD ROCm repository
sudo dnf config-manager --add-repo https://repo.radeon.com/rocm/rhel9/rocm.repo
# Install ROCm runtime
sudo dnf install -y rocm-core rocm-runtime
# Verify installation
rocm-smi
Step 4: Verify GPU Access
Confirm your GPU is properly configured:
# Check available GPUs
rhel-ai gpu list
# Test GPU functionality
rhel-ai gpu test
Step 5: Initialize RHEL AI
Create your RHEL AI working directory and initialize the environment:
mkdir -p ~/rhel-ai
cd ~/rhel-ai
# Initialize RHEL AI environment
rhel-ai init
This creates:
- .rhel-ai/ configuration directory
- Model cache directory
- Environment variables setup
Step 6: Download Your First Model
RHEL AI comes pre-configured with access to open-source models. Download a model for testing:
# List available models
rhel-ai model list
# Download Granite model (recommended)
rhel-ai model download granite-8b
# Verify download
rhel-ai model list --local
Hardware Sizing Guide
Choose the right GPU for your workload:
| GPU Model | Memory | Best For | Cost |
|---|---|---|---|
| NVIDIA A100 | 40/80GB | Multi-user inference, training | High |
| NVIDIA H100 | 80GB | Large model training | Very High |
| AMD MI300X | 192GB | Mixed workloads | Very High |
| NVIDIA A10 | 24GB | Single-user development | Medium |
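As a rough way to read the Memory column, the weights of a model need about (parameters in billions) x (bytes per parameter) GB, so an 8-billion-parameter model in 16-bit precision needs about 16GB for weights alone. The sketch below encodes that arithmetic. The function names and the 20% headroom default are assumptions for illustration, and real usage (KV cache, activations, and especially training state) can be considerably higher.

```shell
#!/usr/bin/env bash
# Back-of-the-envelope GPU sizing. Names and the headroom default are illustrative.

# Approximate GB for model weights: parameters (billions) x bytes per parameter.
estimate_weights_gb() {
    local params_billions="$1" bytes_per_param="$2"
    echo $(( params_billions * bytes_per_param ))
}

# Succeed if the weights plus a headroom percentage fit in GPU memory (GB).
fits_on_gpu() {
    local weights_gb="$1" gpu_gb="$2" headroom_pct="${3:-20}"
    [ $(( weights_gb * (100 + headroom_pct) )) -le $(( gpu_gb * 100 )) ]
}
```

For example, `fits_on_gpu "$(estimate_weights_gb 8 2)" 24` succeeds for a 24GB card, while `fits_on_gpu "$(estimate_weights_gb 70 2)" 80` fails, which suggests a 70-billion-parameter model in 16-bit precision needs either multiple GPUs or a lower-precision format.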
Troubleshooting Common Issues
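For the CUDA version mismatch case in this section, it can be useful to compare the toolkit version reported by `nvcc --version` with the highest CUDA version the driver supports, which `nvidia-smi` prints in its header. The helpers below are a sketch with hypothetical names. They assume the usual "release X.Y" and "CUDA Version: X.Y" output formats and GNU `sort -V`.

```shell
#!/usr/bin/env bash
# Compare the CUDA toolkit version with the driver's supported CUDA version.
# All function names here are illustrative helpers.

# Extract e.g. "12.4" from `nvcc --version` output on stdin.
nvcc_cuda_version() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Extract e.g. "12.4" from the `nvidia-smi` header on stdin.
driver_cuda_version() {
    sed -n 's/.*CUDA Version: *\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Succeed if the toolkit version is not newer than what the driver supports.
toolkit_supported() {
    local toolkit="$1" driver="$2"
    [ "$(printf '%s\n%s\n' "$toolkit" "$driver" | sort -V | head -n 1)" = "$toolkit" ]
}
```

A typical use would be `toolkit_supported "$(nvcc --version | nvcc_cuda_version)" "$(nvidia-smi | driver_cuda_version)" || echo "toolkit is newer than the driver supports"`.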
Issue: GPU not detected
# Restart GPU service
sudo systemctl restart nvidia-persistenced
# Reload drivers
sudo modprobe -r nvidia_uvm
sudo modprobe nvidia_uvm
Issue: CUDA version mismatch
# Check installed CUDA version
nvcc --version
# Update to latest CUDA
sudo dnf update -y cuda-toolkit
Issue: Insufficient GPU memory
Enable CPU offload to use system memory:
export DEEPSPEED_OFFLOAD=1
export DEEPSPEED_OFFLOAD_DEVICE=cpu
Next Steps
Now that RHEL AI is installed and the GPU is configured, you’re ready to:
- Fine-tune models using InstructLab
- Serve models with vLLM
- Monitor performance with Prometheus and Grafana
- Scale across clusters with Kubernetes
Resources
- RHEL AI Official Documentation
- InstructLab Getting Started
- DeepSpeed Optimization Guide
- NVIDIA CUDA Installation Guide
Ready to deploy enterprise AI? With RHEL AI installed and the GPU configured, you have a solid foundation for building production-grade AI solutions. In the next article, we’ll explore InstructLab for fine-tuning models tailored to your organization’s needs.
Get the Complete Installation Guide
This article only scratches the surface!
Practical RHEL AI provides everything you need for a successful deployment:
- ✅ Detailed Kickstart templates for automated bare-metal installs
- ✅ Cloud deployment templates for AWS, Azure, and GCP
- ✅ Troubleshooting guides for 50+ common installation issues
- ✅ Hardware compatibility matrices and validation scripts
- ✅ Security hardening checklists for enterprise compliance
🚀 Pre-Order Now - Available March 2026
Get Practical RHEL AI from Apress and deploy production-ready AI on Red Hat Enterprise Linux with confidence.