Modern AI training, inference, and analytics pipelines don't just need fast GPUs; they need a fast, predictable way to feed those GPUs with data. As datasets grow, the "time to first batch" and the steady-state throughput of reading training shards, feature stores, embeddings, or simulation outputs can become the real limiter. That's the problem Dell Technologies and NVIDIA are addressing with the combination of Dell's scale-out NAS and NVIDIA's direct-to-GPU storage path.
NVIDIA GPUDirect Storage (GDS) creates a direct DMA data path between storage and GPU memory, avoiding "bounce buffers" in CPU system memory. The practical impact is higher effective bandwidth, lower I/O latency, and less CPU utilization, which is especially valuable when your CPUs are already busy with networking, preprocessing, orchestration, or running multiple GPU jobs per node. (NVIDIA Docs)
Dell PowerScale is a scale-out NAS platform designed to grow performance and capacity by adding nodes. In the performance report, Dell's message is straightforward: as GPU-accelerated analytics and model training intensify I/O demands, storage must scale linearly and stay consistent under load. Dell positions PowerScale (with OneFS) as that elastic back end, especially when paired with high-speed networking and RDMA-enabled access paths. (Dell Technologies Info Hub)
GDS is delivered via NVIDIA's Magnum IO stack and is typically used through the cuFile APIs (or through frameworks/libraries that integrate them). The model is explicit: the application opens and registers a file with cuFile, optionally registers GPU buffers, and then issues cuFileRead/cuFileWrite calls that move data directly between storage and GPU memory.
NVIDIA emphasizes that this "explicit, proactive" approach avoids overhead from reactive paging/faulting patterns and can deliver the biggest benefit when your pipeline is GPU-first (the GPU is the first/last to touch the data). (NVIDIA Docs)
A key detail: traditionally, direct transfers have relied on opening files with O_DIRECT (plus its alignment requirements), though NVIDIA notes that newer releases can take the GDS-driven path in more cases, provided buffers and offsets are aligned. (NVIDIA Docs)
Dell's report focuses on a concrete, common architecture for GPU clusters: GPU servers with RDMA-capable NICs mounting PowerScale (OneFS) over NFS with RDMA (NFSoRDMA) across high-speed Ethernet.
Dell states that the testing demonstrates PowerScale OneFS with NFSoRDMA is compatible with and supported by NVIDIA GDS, and that the platform scales to meet growth demands. (Dell Technologies Info Hub)
From NVIDIA's side, the Release Notes explicitly list Dell's platform in the ecosystem of third-party storage solutions where GDS is available, and the support matrix includes a PowerScale entry (e.g., PowerScale 9.2.0.0 paired with early GDS versions). (NVIDIA Docs)
Dell used NVIDIA's gdsio utility (included in the GDS package) to drive the workload and measure performance. (Dell Technologies Info Hub)
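For context, a representative gdsio invocation of this shape might look like the one below. The mount path, GPU index, thread count, and sizes are placeholders rather than Dell's exact parameters, and option semantics can vary by GDS release (check gdsio -h on your system):

# Sequential reads (-I 0) over the GPUDirect Storage path (-x 0) against an
# NFSoRDMA-mounted PowerScale export; all values here are illustrative
gdsio -D /mnt/f600_gdsio1 -d 0 -w 8 -s 10G -i 512K -x 0 -I 0 -T 120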
In the reported gdsio sequential read tests (512 KiB I/O size, multiple threads per GPU, large file sizes), Dell highlights:
To keep results focused on raw I/O and reduce background effects, Dell disabled or tuned several storage features during testing:
On the compute side, Dell documents GDS installation/validation, plus the use of dual 100 Gbps NICs and explicit mapping of mount points to specific PowerScale front-end IPs, aimed at maximizing throughput and avoiding hot spots. (Dell Technologies Info Hub)
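A minimal sketch of that layout, assuming hypothetical front-end IPs (192.0.2.11 and 192.0.2.12), an example export path, and one mount per 100 GbE NIC; the real IPs, exports, and mount points come from your own PowerScale and host configuration:

# Confirm the GDS kernel module is present on the GPU node
lsmod | grep nvidia_fs

# Pin each mount to a specific PowerScale front-end IP so traffic from each
# NIC lands on a predictable interface instead of piling onto one node
sudo mount -t nfs -o proto=rdma,port=20049,vers=3 192.0.2.11:/ifs/data /mnt/ps_nic0
sudo mount -t nfs -o proto=rdma,port=20049,vers=3 192.0.2.12:/ifs/data /mnt/ps_nic1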
A practical insight from the report is that GPUDirect-style benefits depend heavily on where your GPUs and NICs sit in the PCIe/NUMA topology.
Dell stresses limiting the number of "hops" between GPU and NIC, grouping GPUs and NICs by NUMA affinity, and using tools like nvidia-smi topo (and classic Linux tools like lspci) to understand the layout. (Dell Technologies Info Hub)
This is one of the easiest ways to lose performance "mysteriously": the storage path might be fast, but traffic is bouncing across sockets/interconnects before it even reaches the NIC that talks to the storage.
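A quick way to inspect the layout on a GPU node uses standard tools (output formats vary by system; the NUMA node in the last command is illustrative):

# Show GPU-to-NIC connectivity and NUMA affinity as reported by the NVIDIA driver
nvidia-smi topo -m

# Walk the PCIe tree to see which root complex each GPU and NIC sits under
lspci -tv

# Optionally pin a job to the NUMA node local to its GPU and NIC
numactl --cpunodebind=0 --membind=0 <your_benchmark_or_training_command>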
Below is a distilled version of the operational pattern implied by the Dell + NVIDIA docs:
1. Confirm GDS sees your filesystem path as supported. Dell includes gdscheck output showing NFS as supported in their environment (Dell Technologies Info Hub); a minimal check is sketched after this list.
2. Use NFSoRDMA and consistent mount mapping, as in the mount example below.
3. Align the data path with the PCIe/NUMA topology.
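For the first step, the gdscheck tool that ships with GDS reports whether the platform, driver, and mounted filesystems are recognized by the direct path. A minimal invocation, assuming a default CUDA install location (the tool name and location vary by GDS release):

# Print GDS platform/driver status and per-filesystem support, including NFS
/usr/local/cuda/gds/tools/gdscheck -p
# On releases that ship the Python variant:
# python3 /usr/local/cuda/gds/tools/gdscheck.py -p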
Mount example for the NFSoRDMA step (illustrative, based on Dell's approach):
mount -o proto=rdma,port=20049,vers=3,rsize=524288,wsize=524288 \
<powerscale_frontend_ip>:/ifs/benchmark /mnt/f600_gdsio1

Putting it together, PowerScale + GDS is most compelling when the pipeline is GPU-first, datasets are large enough that read throughput becomes the limiter, and CPUs are already busy with preprocessing, networking, and orchestration.
In that scenario, GDS reduces wasted copies and CPU overhead, while PowerScale provides the scale-out throughput and namespace that big GPU farms tend to demand. (NVIDIA Docs)