Red Hat Enterprise Linux AI (RHEL AI) simplifies enterprise AI deployment by bundling the essential components needed to build, train, and deploy machine learning models at scale. In this guide, we’ll walk through the installation process and configure GPU acceleration for optimal performance.
Before you begin, make sure your system meets the prerequisites covered in Chapter 2 of Practical RHEL AI:
Start by updating your RHEL system to the latest packages:
```shell
sudo dnf update -y
sudo dnf install -y git curl wget
```

Red Hat provides RHEL AI through its enterprise repositories. The book covers multiple installation methods:
Option A: Standard Installation
```shell
sudo subscription-manager repos --enable rhel-9-for-x86_64-appstream-rpms
sudo dnf install -y rhel-ai
```

Option B: Kickstart for Bare Metal (Chapter 2)

For automated bare-metal installs, use the Kickstart snippets provided in the book.
Option C: Cloud Templates

The book provides cloud templates for AWS, Azure, and GCP deployments.
The standard installation (Option A) installs the core RHEL AI components, including:
For NVIDIA GPUs, install the NVIDIA driver and CUDA toolkit:
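Before adding the repository, two quick checks can head off a failed install: whether the card is visible on the PCI bus at all, and whether UEFI Secure Boot is on, since Secure Boot only lets the kernel load modules signed with a trusted key. A minimal sketch using `lspci` (from pciutils) and `mokutil`; the helper names `has_nvidia_gpu` and `secure_boot_enabled` are illustrative, not part of any NVIDIA or Red Hat tooling:

```shell
# Pre-install checks for NVIDIA hosts (a sketch, not an official procedure).

# Succeed if lspci-style text on stdin lists an NVIDIA device.
has_nvidia_gpu() {
  grep -q 'NVIDIA Corporation'
}

# Succeed if `mokutil --sb-state` output on stdin reports Secure Boot as enabled.
secure_boot_enabled() {
  grep -q 'SecureBoot enabled'
}

if command -v lspci >/dev/null 2>&1; then
  if lspci | has_nvidia_gpu; then
    echo "NVIDIA device found on the PCI bus"
  else
    echo "lspci reports no NVIDIA device; check the hardware or VM passthrough first" >&2
  fi
fi

if command -v mokutil >/dev/null 2>&1 && mokutil --sb-state 2>/dev/null | secure_boot_enabled; then
  echo "Secure Boot is enabled: the NVIDIA kernel module must be signed with a trusted key to load"
fi
```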
```shell
# Add the NVIDIA CUDA repository
sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/cuda-rhel9.repo

# Install the NVIDIA driver and CUDA toolkit
sudo dnf install -y cuda-toolkit nvidia-driver

# Verify the installation (a reboot may be needed first so the new kernel module loads)
nvidia-smi
```

For AMD GPUs, install ROCm (Radeon Open Compute):
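AMD's ROCm installation guide also has users who run GPU workloads join the `render` and `video` groups, which govern access to the GPU device files. A minimal sketch that reports any missing membership and prints the suggested fix rather than running `sudo` itself; the helper name `missing_groups` is illustrative:

```shell
# Print each group ROCm needs for GPU device access that is absent from a
# space-separated list of group names (for example, the output of `id -nG`).
missing_groups() {
  local have=" $1 " g
  for g in render video; do
    case "$have" in
      *" $g "*) ;;      # already a member
      *) echo "$g" ;;   # not a member yet
    esac
  done
}

missing=$(missing_groups "$(id -nG)" | tr '\n' ' ')
if [ -n "$missing" ]; then
  echo "Current user is not in: $missing"
  echo "Suggested fix: sudo usermod -a -G render,video $(id -un)  (then log out and back in)"
fi
```

Group changes only take effect in a new login session, which is why the message suggests logging out and back in.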
```shell
# Add the AMD ROCm repository
sudo dnf config-manager --add-repo https://repo.radeon.com/rocm/rhel9/rocm.repo

# Install the ROCm runtime
sudo dnf install -y rocm-core rocm-runtime

# Verify the installation
rocm-smi
```

Confirm your GPU is properly configured:
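The vendor tools installed in the previous steps also give a lower-level view that does not depend on RHEL AI's own commands. A minimal sketch, assuming `nvidia-smi` (NVIDIA) or `rocm-smi` (AMD) is on the `PATH`; the helper name `count_nvidia_gpus` is illustrative:

```shell
# Count GPUs in `nvidia-smi -L` output read from stdin. Each physical GPU is
# listed on a line starting with "GPU N:"; indented MIG lines are ignored.
count_nvidia_gpus() {
  grep -c '^GPU [0-9]'
}

if command -v nvidia-smi >/dev/null 2>&1; then
  echo "NVIDIA GPUs visible to the driver: $(nvidia-smi -L | count_nvidia_gpus)"
elif command -v rocm-smi >/dev/null 2>&1; then
  rocm-smi --showproductname
else
  echo "Neither nvidia-smi nor rocm-smi found; revisit the driver installation" >&2
fi
```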
```shell
# Check available GPUs
rhel-ai gpu list

# Test GPU functionality
rhel-ai gpu test
```

Create your RHEL AI working directory and initialize the environment:
```shell
mkdir -p ~/rhel-ai
cd ~/rhel-ai

# Initialize the RHEL AI environment
rhel-ai init
```

This creates:
- `.rhel-ai/` configuration directory

RHEL AI comes preconfigured with access to open-source models. Download a model for testing:
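Model weights are large: at 16-bit precision, an 8-billion-parameter model is roughly 16 GB (8 × 10^9 parameters × 2 bytes), so checking free space before downloading avoids a failed transfer. A minimal sketch using GNU `df`; the 20 GB threshold and the assumption that downloads land on the same filesystem as `~/rhel-ai` are illustrative, since where the `rhel-ai` tool stores models is not stated here:

```shell
# Succeed if the filesystem holding DIR has at least NEED_GB gigabytes free.
has_free_space() {
  local dir=$1 need_gb=$2 avail_gb
  # GNU df: print only the "Avail" column in 1G units; keep the digits of the last line.
  avail_gb=$(df --output=avail -BG "$dir" 2>/dev/null | tail -n 1 | tr -dc '0-9')
  [ -n "$avail_gb" ] && [ "$avail_gb" -ge "$need_gb" ]
}

# Assumed threshold: ~16 GB of FP16 weights plus some headroom.
if [ -d "$HOME/rhel-ai" ] && ! has_free_space "$HOME/rhel-ai" 20; then
  echo "Less than 20 GB free under ~/rhel-ai; free up space before downloading" >&2
fi
```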
```shell
# List available models
rhel-ai model list

# Download the Granite model (recommended)
rhel-ai model download granite-8b

# Verify the download
rhel-ai model list --local
```

Choose the right GPU for your workload:
| GPU Model | GPU Memory | Best For | Relative Cost |
|---|---|---|---|
| NVIDIA A100 | 40/80GB | Multi-user inference, training | High |
| NVIDIA H100 | 80GB | Large model training | Very High |
| AMD MI300X | 192GB | Mixed workloads | Very High |
| NVIDIA A10 | 24GB | Single-user development | Medium |
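The memory column is the first filter, because a model's weights must fit in GPU memory before anything else. As a rough rule, weights alone take parameters × bytes per parameter (2 bytes at FP16/BF16), and the DeepSpeed ZeRO analysis puts mixed-precision Adam training at about 16 bytes per parameter before activations are counted. A back-of-the-envelope sketch; the helper name `mem_gb` is illustrative, and real usage adds KV cache, activations, and framework overhead on top:

```shell
# Back-of-the-envelope GPU memory estimate in GB (10^9 bytes).
# Args: parameters in billions, bytes per parameter.
#   2  = FP16/BF16 weights (inference)
#   16 = mixed-precision Adam training state (weights, gradients, optimizer)
mem_gb() {
  awk -v params="$1" -v bytes="$2" 'BEGIN { printf "%g\n", params * bytes }'
}

mem_gb 8 2     # 8B model, FP16 inference weights
mem_gb 70 2    # 70B model, FP16 inference weights
mem_gb 8 16    # 8B model, mixed-precision Adam training
```

By this estimate, FP16 weights for an 8B model (16 GB) fit on an A10, a 70B model (140 GB) exceeds a single 80 GB A100 or H100 but fits within an MI300X's 192 GB, and full training of an 8B model (128 GB) already calls for multiple GPUs or offloading.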
Issue: GPU not detected
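Before restarting services, it helps to find which layer is failing: the `nvidia` kernel module may not be loaded at all, or the open-source `nouveau` driver may be holding the GPU (the two conflict). A diagnostic sketch based on `lsmod`; the helper name `loaded_gpu_modules` is illustrative:

```shell
# Print which GPU-related kernel modules appear in lsmod-style text on stdin
# (the first line of lsmod output is a header and is skipped).
loaded_gpu_modules() {
  awk 'NR > 1 && ($1 == "nvidia" || $1 == "nvidia_uvm" || $1 == "nouveau") { print $1 }'
}

mods=$(lsmod 2>/dev/null | loaded_gpu_modules | tr '\n' ' ')
echo "GPU-related modules loaded: ${mods:-none}"

case " $mods " in
  *" nouveau "*) echo "nouveau is loaded and conflicts with the nvidia driver; it must be disabled" ;;
esac
case " $mods " in
  *" nvidia "*) ;;
  *) echo "nvidia module is not loaded; look for errors with: sudo dmesg | grep -i nvidia" ;;
esac
```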
```shell
# Restart the NVIDIA persistence daemon
sudo systemctl restart nvidia-persistenced

# Reload the NVIDIA Unified Memory kernel module
sudo modprobe -r nvidia_uvm
sudo modprobe nvidia_uvm
```

Issue: CUDA version mismatch
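The two tools report different things: `nvcc --version` shows the CUDA toolkit release, while the `nvidia-smi` header shows the newest CUDA version the installed driver supports. As a rule of thumb, a toolkit newer than the driver's limit is a common cause of mismatch errors, so it is worth comparing both before updating anything (NVIDIA's CUDA compatibility documentation covers exceptions such as minor-version compatibility). A sketch that parses both outputs, assuming their usual formats (`CUDA Version: 12.4` and `release 12.4,`) and GNU `grep -P`; `toolkit_fits_driver` is an illustrative helper name:

```shell
# Succeed if the toolkit version ($2) is not newer than the CUDA version
# the driver supports ($1). `sort -V` orders version strings numerically.
toolkit_fits_driver() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

if command -v nvidia-smi >/dev/null 2>&1 && command -v nvcc >/dev/null 2>&1; then
  driver_cuda=$(nvidia-smi | grep -oP 'CUDA Version: \K[0-9.]+')
  toolkit_cuda=$(nvcc --version | grep -oP 'release \K[0-9.]+')
  echo "Driver supports up to CUDA $driver_cuda; installed toolkit is CUDA $toolkit_cuda"
  if [ -n "$driver_cuda" ] && [ -n "$toolkit_cuda" ] &&
     ! toolkit_fits_driver "$driver_cuda" "$toolkit_cuda"; then
    echo "Toolkit is newer than the driver supports: update the driver or install an older toolkit"
  fi
fi
```

If the toolkit is ahead of the driver, updating the toolkit alone will not resolve the mismatch; the driver package needs updating too.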
```shell
# Check the installed CUDA toolkit version
nvcc --version

# Update to the latest CUDA toolkit
sudo dnf update -y cuda-toolkit
```

Issue: Insufficient GPU memory
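Before changing training settings, confirm that memory is actually the constraint. `nvidia-smi` can report per-GPU usage as CSV; the sketch below flags any GPU above a threshold. The 90% value and the helper name `gpus_over` are illustrative choices:

```shell
# Read "index, used, total" lines (MiB, no header) on stdin and report GPUs
# whose memory usage exceeds PCT percent.
gpus_over() {
  awk -F', *' -v pct="$1" '$3 > 0 && ($2 * 100 / $3) > pct + 0 {
    printf "GPU %s: %d of %d MiB used\n", $1, $2, $3
  }'
}

if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=index,memory.used,memory.total \
    --format=csv,noheader,nounits | gpus_over 90
fi
```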
Enable CPU offload so that system memory absorbs what does not fit on the GPU:
```shell
export DEEPSPEED_OFFLOAD=1
export DEEPSPEED_OFFLOAD_DEVICE=cpu
```

Now that RHEL AI is installed and your GPU is configured, you’re ready to:
Ready to deploy enterprise AI? With RHEL AI installed and your GPU configured, you have a solid foundation for building production-grade AI solutions. In the next article, we’ll explore InstructLab for fine-tuning models tailored to your organization’s needs.
This article only scratches the surface!
Practical RHEL AI provides everything you need for a successful deployment:
Get Practical RHEL AI from Apress and deploy production-ready AI on Red Hat Enterprise Linux with confidence.