Why Tune the NIC on Client Machines
Most Linux installations ship with conservative NIC defaults. These defaults prioritize compatibility and low resource usage over throughput. On a client machine that needs to push data to storage, pull from GPU clusters, or sustain high-bandwidth connections, those defaults leave performance on the table.
Common symptoms of an untuned NIC:
- Dropped packets under load (rx_missed_errors incrementing)
- Throughput plateaus well below link speed (for example, 40 Gbps on a 100 Gbps NIC)
- Latency spikes during burst traffic
- CPU saturation on a single core while other cores are idle (IRQ imbalance)
Here is every knob that matters, in the order you should tune them.
Step 1: Identify Your NIC
# List network interfaces
ip link show
# Get NIC details
ethtool -i ens1f0
# driver: mlx5_core
# version: 24.10-1.1.4
# firmware-version: 22.42.1000
# Check current link speed
ethtool ens1f0 | grep Speed
# Speed: 100000Mb/s (100 Gbps)
Step 2: Increase Ring Buffer Size
Ring buffers are the NIC's internal queues. When they overflow, packets are dropped before they even reach the kernel. This is the single most impactful tuning for preventing drops.
# Check current and maximum ring buffer sizes
ethtool -g ens1f0
# Ring parameters for ens1f0:
# Pre-set maximums:
# RX: 8192
# TX: 8192
# Current hardware settings:
# RX: 1024 <-- default, too small
# TX: 1024
Increase to maximum:
# Set ring buffers to maximum
ethtool -G ens1f0 rx 8192 tx 8192
# Verify
ethtool -g ens1f0
Impact: Prevents rx_missed_errors and rx_no_buffer_count under burst traffic. Critical for 25 Gbps+ NICs.
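The comparison between the pre-set maximum and the current ring size can be scripted. The following is a minimal sketch, assuming the ethtool -g layout shown above (a "Pre-set maximums" section followed by a "Current hardware settings" section); ring_headroom is a hypothetical helper name, not part of ethtool.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads `ethtool -g` output on stdin and reports
# whether the current RX ring is smaller than the pre-set maximum.
ring_headroom() {
    awk '
        /^Pre-set maximums/          { section = "max" }
        /^Current hardware settings/ { section = "cur" }
        /^RX:/ {
            if (section == "max") max = $2
            if (section == "cur") cur = $2
        }
        END {
            if (max == "" || cur == "") { print "unknown"; exit 1 }
            if (cur + 0 < max + 0) print "raise rx " cur " -> " max
            else                   print "rx already at max " max
        }
    '
}

# Usage on a real host (the interface name is an example):
#   ethtool -g ens1f0 | ring_headroom
```

Reading from stdin keeps the parsing separate from the ethtool call, so the helper can be checked against saved output from any machine.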
Step 3: Tune Interrupt Coalescing
Interrupt coalescing controls how many packets the NIC batches before raising an interrupt. More coalescing = higher throughput but higher latency. Less coalescing = lower latency but more CPU overhead.
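To get a feel for the trade-off, the interrupt rate can be estimated from the packet rate and the two coalescing thresholds. This is a simplified model, not a description of any specific driver: it assumes the NIC raises an interrupt as soon as either rx-frames packets have arrived or rx-usecs microseconds have passed, and that a value of 0 disables that threshold. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# line_rate_pps GBPS WIRE_BYTES
# Packets per second at line rate; WIRE_BYTES is the full on-wire size of one
# frame (for a 1500-byte MTU that is about 1538 bytes with Ethernet overhead).
line_rate_pps() {
    awk -v gbps="$1" -v bytes="$2" 'BEGIN { printf "%d\n", gbps * 1e9 / (bytes * 8) }'
}

# irq_rate PPS RX_USECS RX_FRAMES
# Estimated interrupts per second under the simplified model described above.
irq_rate() {
    awk -v pps="$1" -v usecs="$2" -v frames="$3" 'BEGIN {
        by_frames = (frames > 0) ? pps / frames : 0       # frame threshold fires
        by_timer  = (usecs  > 0) ? 1000000 / usecs : 0    # timer fires
        rate = (by_frames > by_timer) ? by_frames : by_timer
        if (rate == 0 || rate > pps) rate = pps           # at most one IRQ per packet
        printf "%d\n", rate
    }'
}

# Example: compare the throughput and low-latency settings at 1M packets/s
#   irq_rate 1000000 64 256
#   irq_rate 1000000 0 1
```

Under this model the low-latency setting (0 usecs, 1 frame) costs one interrupt per packet, which is why it trades CPU time for latency.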
# Check current coalescing settings
ethtool -c ens1f0
# Adaptive RX: off
# rx-usecs: 8
# rx-frames: 128
# tx-usecs: 16
# tx-frames: 128
For Throughput (Bulk Transfer, AI Training, Storage)
# Enable adaptive coalescing (NIC auto-tunes)
ethtool -C ens1f0 adaptive-rx on adaptive-tx on
# Or set manually for high throughput
ethtool -C ens1f0 rx-usecs 64 rx-frames 256 tx-usecs 64 tx-frames 256
For Low Latency (Trading, Real-Time, Interactive)
# Minimize coalescing for lowest latency
ethtool -C ens1f0 adaptive-rx off adaptive-tx off
ethtool -C ens1f0 rx-usecs 0 rx-frames 1 tx-usecs 0 tx-frames 1
For Balanced (General Workloads)
# Adaptive mode handles most scenarios well
ethtool -C ens1f0 adaptive-rx on adaptive-tx on
Step 4: Enable Hardware Offloads
Let the NIC handle work that would otherwise consume CPU cycles.
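Not every offload can be switched on: ethtool -k marks features the driver will not let you change with [fixed]. As a rough sketch, the helper below lists only the top-level features that are currently off and still changeable. It assumes the usual "name: state" layout of ethtool -k output; changeable_off_features is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads `ethtool -k` output on stdin and prints the
# top-level features that are "off" without a trailing marker such as [fixed].
changeable_off_features() {
    awk -F': ' '
        # Top-level features start in column 1 with a lowercase letter;
        # the "Features for ..." header and tab-indented sub-features are skipped.
        /^[a-z]/ && $2 == "off" { print $1 }
    '
}

# Usage on a real host (the interface name is an example):
#   ethtool -k ens1f0 | changeable_off_features
```

Anything this prints is a candidate for ethtool -K; features marked [fixed] are left out because attempts to change them would fail.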
# Check current offload settings
ethtool -k ens1f0
Enable the important ones:
# TCP Segmentation Offload (TSO): the NIC handles TCP segmentation
ethtool -K ens1f0 tso on
# Generic Segmentation Offload (GSO)
ethtool -K ens1f0 gso on
# Generic Receive Offload (GRO): batches incoming packets
ethtool -K ens1f0 gro on
# Large Receive Offload (LRO): aggregates TCP segments
# (disable if using routing/forwarding: LRO breaks IP forwarding)
ethtool -K ens1f0 lro on
# Scatter-Gather (reduces memory copies)
ethtool -K ens1f0 sg on
# TX/RX checksum offload
ethtool -K ens1f0 tx-checksum-ip-generic on
ethtool -K ens1f0 rx-checksum on
# Receive hashing (multi-queue distribution)
ethtool -K ens1f0 rxhash on
Verify all offloads:
ethtool -k ens1f0 | grep -E "^(rx-checksumming|tx-checksumming|scatter-gather|tcp-segmentation|generic|large|receive-hashing)"
Step 5: Configure Multi-Queue and IRQ Affinity
Modern NICs have multiple RX/TX queues that can be spread across CPU cores. By default, Linux may assign all interrupts to a single core, creating a bottleneck.
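One way to see whether the NIC's interrupts really land on a single core is to total them per CPU. The sketch below assumes the standard /proc/interrupts layout (a header row of CPU names, then one row per IRQ with a count per CPU); irq_per_cpu is a hypothetical helper, and the pattern you pass depends on your driver.

```bash
#!/usr/bin/env bash
# irq_per_cpu PATTERN [FILE]
# Sums the per-CPU interrupt counts of every row matching PATTERN.
# FILE defaults to /proc/interrupts; a saved copy can be passed instead.
irq_per_cpu() {
    local pattern=$1 file=${2:-/proc/interrupts}
    awk -v pat="$pattern" '
        NR == 1 { ncpu = NF; next }     # header row: CPU0 CPU1 ...
        $0 ~ pat {
            # Field 1 is the IRQ number, fields 2..ncpu+1 are per-CPU counts
            for (i = 1; i <= ncpu; i++) total[i - 1] += $(i + 1)
        }
        END {
            for (c = 0; c < ncpu; c++) printf "CPU%d %d\n", c, total[c]
        }
    ' "$file"
}

# Usage on a real host (the pattern is an example for Mellanox NICs):
#   irq_per_cpu mlx5_comp
```

If one CPU carries nearly all of the count while the others stay near zero, the interrupts are imbalanced.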
Check Current Queue Count
# Number of combined queues
ethtool -l ens1f0
# Pre-set maximums:
# Combined: 16 <-- maximum
# Current hardware settings:
# Combined: 8 <-- active
# Set to maximum
ethtool -L ens1f0 combined 16
Pin IRQs to CPU Cores
# Find NIC interrupts
grep ens1f0 /proc/interrupts
# Or for Mellanox:
grep mlx5 /proc/interrupts
# Automatic IRQ balancing (recommended first)
systemctl start irqbalance
systemctl enable irqbalance
For manual pinning (HPC/AI clusters where you need deterministic placement):
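# Stop irqbalance and keep it from starting at boot; otherwise it overwrites manual affinity
systemctl disable --now irqbalance
# Pin each queue's IRQ to a specific CPU core
# Get IRQ numbers (mlx5_comp matches Mellanox completion queues; adjust for other drivers)
IRQS=$(grep mlx5_comp /proc/interrupts | awk '{print $1}' | tr -d ':')
NCPU=$(nproc)
CORE=0
for IRQ in $IRQS; do
    echo "$CORE" > "/proc/irq/$IRQ/smp_affinity_list"
    echo "IRQ $IRQ -> CPU $CORE"
    # Wrap around when there are more IRQs than cores
    CORE=$(( (CORE + 1) % NCPU ))
done
Enable Receive Packet Steering (RPS) and Transmit Packet Steering (XPS)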
If you have fewer NIC queues than CPU cores:
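Both rps_cpus and xps_cpus take a hexadecimal CPU bitmask, so the value to write depends on which cores you want. The helpers below are a small sketch for deriving those masks. The names are hypothetical, and the sketch assumes at most 32 CPUs, because larger masks are written as comma-separated 32-bit groups.

```bash
#!/usr/bin/env bash
# cpu_mask_all N: hex mask with the lowest N CPU bits set (N <= 32 assumed)
cpu_mask_all() {
    printf '%x\n' $(( (1 << $1) - 1 ))
}

# cpu_mask_one CPU: hex mask selecting a single CPU (CPU < 32 assumed)
cpu_mask_one() {
    printf '%x\n' $(( 1 << $1 ))
}

# Example values:
#   cpu_mask_all 16   prints ffff (all 16 cores)
#   cpu_mask_one 4    prints 10   (CPU 4 only)
```

Writing the decimal value instead of hex is an easy mistake: CPU 4 is bit 16 in decimal, but the kernel would read "16" as hex 0x16 and select CPUs 1, 2 and 4.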
# RPS: distribute received packets across cores
# Set to all cores (hex bitmask; for a 16-core system: ffff)
for Q in /sys/class/net/ens1f0/queues/rx-*/rps_cpus; do
    echo ffff > "$Q"
done
# XPS: map each TX queue to one CPU core (xps_cpus takes a hex bitmask, not decimal)
CORE=0
for Q in /sys/class/net/ens1f0/queues/tx-*/xps_cpus; do
    printf '%x\n' $((1 << CORE)) > "$Q"
    CORE=$((CORE + 1))
done
Step 6: MTU and Jumbo Frames
If your network supports jumbo frames (9000-byte MTU), enabling them reduces per-packet overhead and increases throughput, especially for bulk transfers.
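The overhead reduction can be put into numbers. The sketch below assumes IPv4 and TCP headers without options (20 bytes each) and 38 bytes of Ethernet framing per packet (14-byte header, 4-byte FCS, 8-byte preamble, 12-byte inter-frame gap). Both function names are hypothetical.

```bash
#!/usr/bin/env bash
# ping_payload MTU
# Largest ICMP payload that fits in one unfragmented IPv4 packet:
# MTU minus the 20-byte IPv4 header and the 8-byte ICMP header.
ping_payload() {
    echo $(( $1 - 28 ))
}

# tcp_efficiency MTU
# Percentage of on-wire bytes that carry TCP payload at a given MTU.
tcp_efficiency() {
    awk -v mtu="$1" 'BEGIN {
        payload = mtu - 40     # minus IPv4 (20) and TCP (20) headers
        wire    = mtu + 38     # plus Ethernet header, FCS, preamble, inter-frame gap
        printf "%.1f\n", 100 * payload / wire
    }'
}

# Example:
#   tcp_efficiency 1500   prints 94.9
#   tcp_efficiency 9000   prints 99.1
#   ping_payload 9000     prints 8972
```

The efficiency gain is only a few percent; most of the benefit of jumbo frames comes from handling about six times fewer packets for the same amount of data.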
# Check current MTU
ip link show ens1f0 | grep mtu
# Enable jumbo frames
ip link set ens1f0 mtu 9000
# Verify end-to-end (must be enabled on every hop)
ping -M do -s 8972 <destination_ip>
# 8972 = 9000 MTU - 20 IP header - 8 ICMP header
Important: Jumbo frames must be enabled on the NIC, every switch in the path, and the destination. A single hop with a 1500-byte MTU will fragment or drop jumbo packets.
Step 7: Kernel Sysctl Parameters
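A common way to sanity-check socket buffer ceilings is the bandwidth-delay product: the number of bytes that must be in flight to keep a link full at a given round-trip time. The sketch below is only that arithmetic; bdp_bytes is a hypothetical helper, and the 10 ms round-trip time in the example is an assumption, not a measurement.

```bash
#!/usr/bin/env bash
# bdp_bytes GBPS RTT_MS
# Bandwidth-delay product in bytes: (bits per second / 8) * RTT in seconds.
bdp_bytes() {
    awk -v gbps="$1" -v rtt_ms="$2" 'BEGIN {
        printf "%d\n", gbps * 1e9 / 8 * rtt_ms / 1000
    }'
}

# Example: a 100 Gbps link with an assumed 10 ms round-trip time
#   bdp_bytes 100 10   prints 125000000 (about 119 MiB)
```

That result is close to the 134217728-byte (128 MiB) maximums used in this step; paths with a much shorter round-trip time need far less buffer per connection.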
TCP/IP stack tuning at the kernel level:
cat > /etc/sysctl.d/99-nic-tuning.conf << 'EOF'
# Increase socket buffer sizes
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.core.rmem_default = 16777216
net.core.wmem_default = 16777216
# TCP buffer auto-tuning range (min, default, max)
net.ipv4.tcp_rmem = 4096 87380 134217728
net.ipv4.tcp_wmem = 4096 65536 134217728
# Increase network backlog for burst handling
net.core.netdev_max_backlog = 250000
# Enable TCP window scaling
net.ipv4.tcp_window_scaling = 1
# Enable TCP timestamps
net.ipv4.tcp_timestamps = 1
# Enable selective acknowledgments
net.ipv4.tcp_sack = 1
# Increase max connections backlog
net.core.somaxconn = 65535
# Increase max SYN backlog
net.ipv4.tcp_max_syn_backlog = 65535
# Allow reuse of TIME_WAIT sockets for new outgoing connections
net.ipv4.tcp_tw_reuse = 1
# Increase local port range
net.ipv4.ip_local_port_range = 1024 65535
# Enable busy polling for low-latency workloads
# (set to 0 to disable, 50 for moderate, higher for aggressive)
net.core.busy_read = 50
net.core.busy_poll = 50
# Raise the connection-tracking table limit
# (this key only exists when the nf_conntrack module is loaded; remove the line otherwise)
net.netfilter.nf_conntrack_max = 1048576
EOF
# Apply
sysctl -p /etc/sysctl.d/99-nic-tuning.conf
Step 8: Verify with Performance Tests
Throughput Test (iperf3)
# Server side
iperf3 -s
# Client side: single stream
iperf3 -c <server_ip> -t 30
# Client side: parallel streams (saturate the link)
iperf3 -c <server_ip> -t 30 -P 8
# Client side: with jumbo frames (-M sets the TCP maximum segment size)
iperf3 -c <server_ip> -t 30 -P 8 -M 8948
Check for Drops
# NIC-level drops
ethtool -S ens1f0 | grep -E "drop|error|miss|discard"
# Kernel-level drops
cat /proc/net/softnet_stat
# One row per CPU, values in hex: column 2 = dropped, column 3 = time_squeeze (CPU too busy)
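Because the softnet_stat counters are hexadecimal, they are easier to read once converted. The following is a minimal sketch that decodes the first three columns of each row. It assumes the rows appear in CPU order, and softnet_summary is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# softnet_summary [FILE]
# Prints processed, dropped and time_squeeze per row in decimal.
# FILE defaults to /proc/net/softnet_stat; a saved copy can be passed instead.
softnet_summary() {
    local file=${1:-/proc/net/softnet_stat}
    local cpu=0 processed dropped squeezed rest
    while read -r processed dropped squeezed rest; do
        # 16#... tells bash arithmetic to read the value as base 16
        printf 'cpu%d processed=%d dropped=%d time_squeeze=%d\n' \
            "$cpu" "$((16#$processed))" "$((16#$dropped))" "$((16#$squeezed))"
        cpu=$((cpu + 1))
    done < "$file"
}

# Usage on a real host:
#   softnet_summary
```

A dropped count that keeps rising points at netdev_max_backlog; a rising time_squeeze count means the softirq handler ran out of budget before the queue was empty.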
# Watch in real-time
watch -n 1 "ethtool -S ens1f0 | grep -E 'drop|error|miss'"
Latency Test
# Basic ping latency (short intervals may require root, depending on the iputils version)
sudo ping -c 100 -i 0.01 <server_ip> | tail -1
# Detailed latency histogram (sockperf; requires "sockperf server" running on the remote host)
sockperf ping-pong -i <server_ip> -t 10
Complete Tuning Script
#!/bin/bash
# NIC tuning script for Linux client machines
# Usage: sudo ./tune-nic.sh ens1f0
IFACE=${1:-ens1f0}
echo "Tuning NIC: $IFACE"
# Ring buffers: maximize (the first RX:/TX: lines in `ethtool -g` are the pre-set maximums)
RX_MAX=$(ethtool -g "$IFACE" 2>/dev/null | awk '/^RX:/ {print $2; exit}')
TX_MAX=$(ethtool -g "$IFACE" 2>/dev/null | awk '/^TX:/ {print $2; exit}')
[ -n "$RX_MAX" ] && [ -n "$TX_MAX" ] && ethtool -G "$IFACE" rx "$RX_MAX" tx "$TX_MAX" && echo "Ring buffers: RX=$RX_MAX TX=$TX_MAX"
# Adaptive coalescing
ethtool -C "$IFACE" adaptive-rx on adaptive-tx on 2>/dev/null && echo "Adaptive coalescing: enabled"
# Offloads (errors are suppressed, so check the result with ethtool -k)
for OFFLOAD in tso gso gro sg rxhash; do
    ethtool -K "$IFACE" "$OFFLOAD" on 2>/dev/null
done
echo "Offloads requested: tso gso gro sg rxhash"
# Maximize queues (the first Combined: line in `ethtool -l` is the pre-set maximum)
QMAX=$(ethtool -l "$IFACE" 2>/dev/null | awk '/^Combined:/ {print $2; exit}')
[ -n "$QMAX" ] && ethtool -L "$IFACE" combined "$QMAX" 2>/dev/null && echo "Queues: $QMAX"
# Sysctl
sysctl -w net.core.rmem_max=134217728 > /dev/null
sysctl -w net.core.wmem_max=134217728 > /dev/null
sysctl -w net.core.netdev_max_backlog=250000 > /dev/null
sysctl -w net.ipv4.tcp_rmem="4096 87380 134217728" > /dev/null
sysctl -w net.ipv4.tcp_wmem="4096 65536 134217728" > /dev/null
echo "Sysctl: buffer sizes and backlog tuned"
echo "Done. Verify with: ethtool -S $IFACE | grep -E 'drop|error'"
Save as /usr/local/bin/tune-nic.sh, make executable, and run:
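sudo chmod +x /usr/local/bin/tune-nic.sh
# Use the full path: sudo's secure_path does not include /usr/local/bin on every distribution
sudo /usr/local/bin/tune-nic.sh ens1f0
Quick Reference Table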
| Parameter | Default | Tuned (Throughput) | Tuned (Latency) |
|---|---|---|---|
| RX ring buffer | 1024 | 8192 (max) | 8192 (max) |
| TX ring buffer | 1024 | 8192 (max) | 8192 (max) |
| Coalescing | Fixed | Adaptive on | 0 usecs, 1 frame |
| TSO/GSO/GRO | Usually on | On | On |
| LRO | Off | On | Off |
| Combined queues | 4-8 | Max available | Max available |
| IRQ affinity | Random | Pinned per core | Pinned per core |
| MTU | 1500 | 9000 (if supported) | 9000 (if supported) |
| rmem_max | 212992 | 134217728 | 134217728 |
| netdev_max_backlog | 1000 | 250000 | 250000 |
| busy_poll | 0 | 0 | 50+ |
Ansible Playbook for Fleet Deployment
- name: Tune NIC on client machines
  hosts: clients
  become: true
  vars:
    nic_interface: ens1f0
    mtu: 9000
  tasks:
    - name: Maximize ring buffers
      command: ethtool -G {{ nic_interface }} rx 8192 tx 8192
      ignore_errors: true
    - name: Enable adaptive coalescing
      command: ethtool -C {{ nic_interface }} adaptive-rx on adaptive-tx on
      ignore_errors: true
    - name: Enable offloads
      command: ethtool -K {{ nic_interface }} tso on gso on gro on sg on rxhash on
      ignore_errors: true
    - name: Set MTU
      command: ip link set {{ nic_interface }} mtu {{ mtu }}
    - name: Deploy sysctl tuning
      copy:
        dest: /etc/sysctl.d/99-nic-tuning.conf
        content: |
          net.core.rmem_max = 134217728
          net.core.wmem_max = 134217728
          net.core.rmem_default = 16777216
          net.core.wmem_default = 16777216
          net.ipv4.tcp_rmem = 4096 87380 134217728
          net.ipv4.tcp_wmem = 4096 65536 134217728
          net.core.netdev_max_backlog = 250000
          net.core.somaxconn = 65535
      notify: reload sysctl
  handlers:
    - name: reload sysctl
      command: sysctl -p /etc/sysctl.d/99-nic-tuning.conf