RISC-V performance engineering and profiling session at RISC-V Summit Europe 2026

Performance Profiling and Counters on RISC-V

Make RISC-V code fast: hardware performance counters (HPM), the cycle and instret CSRs, perf on Linux, and a sane workflow for finding real bottlenecks.

Luca Berton
· 3 min read

“Make it faster” is easy to say and hard to do well, mostly because intuition about where a program spends its time is usually wrong. The cure is measurement, and RISC-V gives you a clean, standardized set of tools for it. Here is how performance profiling works on RISC-V, from raw counters to perf.


Measure, Don’t Guess

The first rule of optimization is to profile before you change anything. Modern cores are deep and out-of-order enough that a hot loop may be limited by memory, by branches, or by a dependency chain rather than by the line you suspect. RISC-V’s answer is a standardized Hardware Performance Monitor (HPM) baked into the ISA, so the same measurement approach works across implementations.

The Architectural Counters

Three counters are defined by the Zicntr extension (they were originally part of the base ISA) and are readable as CSRs (control and status registers):

CSR       Counts
cycle     Clock cycles elapsed
time      Wall-clock time (fixed frequency)
instret   Instructions retired (completed)

You can read them directly in assembly:

rdcycle   t0    # cycles
rdinstret t1    # retired instructions

The single most useful derived metric is IPC = instret / cycle. A low IPC usually means the core is stalling, and that tells you to go hunting for the cause.
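
As a minimal sketch of that arithmetic, assuming you have already captured the two counters before and after the region of interest, the Python below computes IPC from the deltas. The helper names `counter_delta` and `ipc` are hypothetical, the counters are assumed to be 64 bits wide (as on RV64), and the numbers are illustrative rather than measured.

```python
COUNTER_MASK = (1 << 64) - 1  # assumption: 64-bit counters, as on RV64


def counter_delta(start, end):
    """Difference between two counter reads, tolerating one wraparound."""
    return (end - start) & COUNTER_MASK


def ipc(cycle_start, cycle_end, instret_start, instret_end):
    """Instructions per cycle over the measured region."""
    cycles = counter_delta(cycle_start, cycle_end)
    if cycles == 0:
        raise ValueError("no cycles elapsed between the two reads")
    return counter_delta(instret_start, instret_end) / cycles


# Illustrative values only, not measurements from real hardware.
print(ipc(1_000, 5_000, 2_000, 4_000))  # 2000 instructions / 4000 cycles = 0.5
```

Taking deltas rather than absolute values matters because the counters run continuously; only the difference across your region says anything about your code.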

Programmable Event Counters

Beyond the fixed three, RISC-V defines mhpmcounter registers (mhpmcounter3 through mhpmcounter31): programmable counters that you point at specific microarchitectural events through the matching mhpmevent selector registers. Typical events are cache misses, branch mispredictions, TLB misses, and more (the exact event set depends on the core). These are the counters that turn “it’s slow” into “it’s slow because the L2 miss rate is high,” which is the difference between guessing and engineering.
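
Raw event counts are easier to compare once they are normalized. The sketch below shows two common normalizations, a miss rate and misses per thousand instructions (MPKI). The function names are hypothetical, and the counts are made-up illustrative numbers, since which events a given core can actually count is implementation-specific.

```python
def miss_rate(misses, accesses):
    """Fraction of accesses that missed (0.0 to 1.0)."""
    if accesses <= 0:
        raise ValueError("accesses must be positive")
    return misses / accesses


def per_kilo_instruction(event_count, instructions):
    """Events per 1000 retired instructions (e.g. MPKI for cache misses)."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return 1000.0 * event_count / instructions


# Illustrative counts only; real values come from the programmable counters.
l2_misses, l2_accesses, instret = 5_000, 20_000, 1_000_000
print(miss_rate(l2_misses, l2_accesses))         # fraction of L2 accesses that missed
print(per_kilo_instruction(l2_misses, instret))  # L2 misses per 1000 instructions
```

Normalizing by retired instructions makes two runs comparable even when they execute different amounts of work.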

Profiling on Linux with perf

On application-class RISC-V running Linux, you do not poke CSRs by hand. You use perf, which reads the HPM counters for you:

perf stat ./myprogram          # summary: cycles, instructions, IPC, misses
perf record ./myprogram        # sampled profile
perf report                    # interactive hotspot view

perf stat gives the high-level numbers; perf record/perf report show you the hot functions. Start at the top of the profile, because that is where optimization pays off.
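
If you want to post-process the numbers, perf stat can emit machine-readable output with its `-x` field-separator option (for example `perf stat -x, -e cycles,instructions ./myprogram`, which writes to stderr). The sketch below assumes the plain layout where each line starts with the counter value, the unit, and the event name. The `SAMPLE` text is hand-written for illustration, not captured from a real run, and `parse_perf_stat_csv` is a hypothetical helper.

```python
import csv
import io

# Hand-written sample in the assumed "value,unit,event,..." layout.
SAMPLE = """\
1200000,,cycles,1000000,100.00,,
900000,,instructions,1000000,100.00,0.75,insn per cycle
"""


def parse_perf_stat_csv(text):
    """Map event name -> count, skipping lines without a numeric count."""
    counts = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 3:
            continue  # blank line or comment
        value, event = row[0], row[2]
        try:
            counts[event] = int(value)
        except ValueError:
            continue  # e.g. "<not supported>" or "<not counted>"
    return counts


counts = parse_perf_stat_csv(SAMPLE)
print(counts["instructions"] / counts["cycles"])  # IPC for the sample data
```

Check the perf-stat manual page on your system before relying on this layout, since options such as interval or per-CPU mode prepend extra fields.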

A Sane Optimization Workflow

  1. Profile to find the real hotspot (perf record/report).
  2. Check IPC on that hotspot: is it compute-bound or stall-bound?
  3. Drill into events (cache misses? branch mispredicts?) with programmable counters.
  4. Change one thing, then re-measure to confirm it actually helped.
  5. For data-parallel hotspots, consider the vector extension (RVV).

Resist the urge to optimize code that the profile says is cold. It is wasted effort and adds risk.
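
The "change one thing, then re-measure" step of the workflow can be sketched as a comparison of repeated runs before and after a change. This is a crude heuristic under stated assumptions, not a statistical test: it compares medians and only calls the change a win if the gain exceeds the run-to-run spread. The function names and the cycle counts are hypothetical.

```python
from statistics import median


def relative_spread(samples):
    """Run-to-run variation as a fraction of the median."""
    return (max(samples) - min(samples)) / median(samples)


def compare(before, after):
    """Return (gain, noise, helped) for two lists of cycle counts."""
    gain = 1.0 - median(after) / median(before)  # fraction of cycles saved
    noise = max(relative_spread(before), relative_spread(after))
    return gain, noise, gain > noise


# Illustrative cycle counts from three runs each, not real measurements.
before = [1000, 1010, 990]
after = [800, 805, 795]
gain, noise, helped = compare(before, after)
print(f"gain={gain:.1%} noise={noise:.1%} helped={helped}")
```

Repeating each measurement a few times is the design choice that matters here: a single before/after pair cannot distinguish a real improvement from noise.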

Microarchitecture Matters

Two chips implementing the same ISA can behave very differently: pipeline depth, cache sizes, and predictors all vary between cores. That is why you profile on the actual target rather than trusting a number from a different machine. Some open RISC-V cores even expose rich counters for research-grade analysis.

The Bottom Line

RISC-V makes performance work refreshingly principled: architectural cycle, time, and instret counters give you IPC for free, programmable mhpmcounter events explain why a hotspot stalls, and perf on Linux ties it all together. Measure first, change one thing at a time, and always re-measure. Done that way, optimization stops being folklore and becomes exactly what it should be: an engineering discipline grounded in real numbers from real hardware.


Part of my RISC-V series. See also the assembly tutorial and the vector extension.

Frequently Asked Questions

What are hardware performance counters on RISC-V?

They are special registers that count microarchitectural events: cycles, retired instructions, cache misses, branch mispredictions, and more. RISC-V standardizes them as the Hardware Performance Monitor (HPM): the cycle, time, and instret counters are architectural, and mhpmcounter registers provide additional programmable event counters that profiling tools read to explain where time goes.

Can I use Linux perf on RISC-V?

Yes. The Linux perf subsystem works on RISC-V and reads the hardware performance counters exposed by the core. You can run perf stat for summary counts and perf record/report for sampled profiles, just as on x86 or Arm, provided the kernel and the specific core expose the relevant events.

What is the difference between cycles and retired instructions?

The cycle counter measures elapsed clock cycles, while instret counts instructions that actually completed (retired). Dividing instret by cycle gives instructions-per-cycle (IPC), a key efficiency metric. A low IPC suggests the core is stalling (on cache misses, branch mispredictions, or dependencies), which points you toward the real bottleneck.
