"Make it faster" is easy to say and hard to do well, mostly because intuition about where a program spends its time is usually wrong. The cure is measurement, and RISC-V gives you a clean, standardized set of tools for it. Here is how performance profiling works on RISC-V, from raw counters to perf.

Measure, Don't Guess
The first rule of optimization is to profile before you change anything. Modern cores are deep and out-of-order enough that a hot loop may be limited by memory, by branches, or by a dependency chain, not by the line you suspect. RISC-V's answer is a standardized Hardware Performance Monitor (HPM) baked into the ISA, so the same measurement approach works across implementations.
The Architectural Counters
Three counters are defined by the Zicntr extension (historically part of the base ISA) and are readable as CSRs (control and status registers):
| CSR | Counts |
|---|---|
| cycle | Clock cycles elapsed |
| time | Wall-clock time (fixed frequency) |
| instret | Instructions retired (completed) |
You can read them directly in assembly:
rdcycle t0 # cycles
rdinstret t1 # retired instructions
The single most useful derived metric is IPC = instret / cycle, computed from the difference between two readings taken around the code of interest. A low IPC means the core is stalling, and that tells you to go hunting for the cause.
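As a rough illustration of the IPC arithmetic, here is a minimal Python sketch that works on two counter readings taken before and after a region of interest. The `CounterSnapshot` container and the `ipc` helper are hypothetical names made up for this example, and the numbers are illustrative rather than measured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CounterSnapshot:
    """One reading of the cycle and instret counters (hypothetical container)."""
    cycle: int
    instret: int


def ipc(start: CounterSnapshot, end: CounterSnapshot, width_bits: int = 64) -> float:
    """Instructions per cycle over the interval between two snapshots.

    Differences are taken modulo 2**width_bits so that a single counter
    wraparound between the readings does not produce a negative delta.
    """
    mask = (1 << width_bits) - 1
    cycles = (end.cycle - start.cycle) & mask
    instructions = (end.instret - start.instret) & mask
    if cycles == 0:
        raise ValueError("no cycles elapsed between snapshots")
    return instructions / cycles


# Illustrative numbers only, not measurements from real hardware.
before = CounterSnapshot(cycle=1_000_000, instret=400_000)
after = CounterSnapshot(cycle=3_000_000, instret=1_400_000)
print(f"IPC = {ipc(before, after):.2f}")
```

Working from deltas rather than raw values matters because the counters keep running from boot; a raw instret / cycle ratio would describe everything the hart has ever executed, not your loop.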
Programmable Event Counters
Beyond the fixed three, RISC-V defines the mhpmcounter registers (mhpmcounter3 through mhpmcounter31), programmable counters that you point at specific microarchitectural events through the matching mhpmevent selectors: cache misses, branch mispredictions, TLB misses, and more (the exact event set depends on the core). These are the counters that turn "it's slow" into "it's slow because the L2 miss rate is high," which is the difference between guessing and engineering.
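Raw event counts are easier to compare once they are normalized. The sketch below shows two common ways to do that, a miss rate and misses per thousand instructions (MPKI). The function names and the sample counts are assumptions for illustration; which events a given core can actually count depends on the implementation.

```python
def miss_rate(misses: int, accesses: int) -> float:
    """Fraction of accesses that missed (0.0 to 1.0)."""
    if accesses <= 0:
        raise ValueError("accesses must be positive")
    return misses / accesses


def mpki(events: int, instructions: int) -> float:
    """Events per thousand retired instructions."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return 1000.0 * events / instructions


# Illustrative counter deltas for one hot region, not real measurements.
instret_delta = 2_000_000
l2_accesses = 50_000
l2_misses = 20_000

print(f"L2 miss rate: {miss_rate(l2_misses, l2_accesses):.0%}")
print(f"L2 MPKI:      {mpki(l2_misses, instret_delta):.1f}")
```

MPKI is often the more useful of the two when comparing code versions, because it stays meaningful even when a change alters how many accesses the code makes.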
Profiling on Linux with perf
On application-class RISC-V running Linux, you do not poke CSRs by hand; you use perf, which reads the HPM counters for you:
perf stat ./myprogram # summary: cycles, instructions, IPC, misses
perf record ./myprogram # sampled profile
perf report # interactive hotspot view
perf stat gives the high-level numbers; perf record/perf report show you the hot functions. Start at the top of the profile, because that is where optimization pays off.
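If you want to post-process the numbers rather than read them off the terminal, perf stat can emit machine-readable output with `-x,` (for example `perf stat -x, -e cycles,instructions ./myprogram`, which writes CSV to stderr). The Python sketch below parses that form and derives IPC. The `parse_perf_stat_csv` helper is a hypothetical name, and the sample text is hand-written to match the documented column order (value, unit, event name, ...), not captured from a real run.

```python
from typing import Dict


def parse_perf_stat_csv(text: str, sep: str = ",") -> Dict[str, int]:
    """Map event name -> count from `perf stat -x<sep>` output.

    Lines whose value is not a plain integer (for example
    "<not supported>" or "<not counted>") are skipped.
    """
    counts: Dict[str, int] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(sep)
        if len(fields) < 3:
            continue
        value, event = fields[0], fields[2]
        if not value.isdigit():
            continue
        # Drop a trailing modifier such as ":u" so "cycles:u" becomes "cycles".
        counts[event.split(":")[0]] = int(value)
    return counts


# Hand-written sample in the documented CSV layout (illustrative values).
sample = """\
2000000,,cycles:u,1500000,100.00,,
1000000,,instructions:u,1500000,100.00,0.50,insn per cycle
<not supported>,,cache-misses:u,0,100.00,,
"""

counts = parse_perf_stat_csv(sample)
print(f"IPC = {counts['instructions'] / counts['cycles']:.2f}")
```

Skipping non-numeric values is deliberate: an event the core does not implement should show up as missing data, not as a zero that looks like a perfect cache.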
A Sane Optimization Workflow
- Profile to find the real hotspot (perf record/report).
- Check IPC on that hotspot: is it compute-bound or stall-bound?
- Drill into events (cache misses? branch mispredicts?) with programmable counters.
- Change one thing, then re-measure to confirm it actually helped.
- For data-parallel hotspots, consider the vector extension (RVV).
Resist the urge to optimize code that the profile says is cold β it is wasted effort and adds risk.
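The "change one thing, then re-measure" step can be made a little more rigorous by comparing several runs instead of one. The following Python sketch uses a deliberately crude rule, which is an assumption of this example rather than an established method: accept a change only if the improvement in the median exceeds the run-to-run spread of the baseline. The function names and cycle counts are illustrative.

```python
import statistics
from typing import Sequence


def speedup(baseline: Sequence[float], candidate: Sequence[float]) -> float:
    """Ratio of median baseline cost to median candidate cost (>1 is faster)."""
    return statistics.median(baseline) / statistics.median(candidate)


def is_improvement(baseline: Sequence[float], candidate: Sequence[float]) -> bool:
    """Crude check: the median gain must exceed the baseline's relative spread."""
    base_median = statistics.median(baseline)
    noise = (max(baseline) - min(baseline)) / base_median
    gain = 1.0 - statistics.median(candidate) / base_median
    return gain > noise


# Illustrative cycle counts from five runs each, not real measurements.
before_runs = [1000, 1020, 990, 1010, 1005]
after_runs = [900, 910, 905, 895, 902]

print(f"speedup: {speedup(before_runs, after_runs):.2f}x")
print(f"real improvement: {is_improvement(before_runs, after_runs)}")
```

Medians are used here because a single run disturbed by an interrupt or a cold cache would skew a mean; a proper statistical test would be the next step if the margins are tight.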
Microarchitecture Matters
Two chips implementing the same ISA can behave very differently: pipeline depth, cache sizes, and predictors all vary between cores. That is why you profile on the actual target rather than trusting a number from a different machine. Some open-source RISC-V cores even expose rich counters for research-grade analysis.
The Bottom Line
RISC-V makes performance work refreshingly principled: architectural cycle, time, and instret counters give you IPC for free, programmable mhpmcounter events explain why a hotspot stalls, and perf on Linux ties it all together. Measure first, change one thing at a time, and always re-measure. Done that way, optimization stops being folklore and becomes exactly what it should be: an engineering discipline grounded in real numbers from real hardware.
Part of my RISC-V series. See also the assembly tutorial and the vector extension.



