Performance Profiling and Counters on RISC-V

“Make it faster” is easy to say and hard to do well — mostly because intuition about where a program spends its time is usually wrong. The cure is measurement, and RISC-V gives you a clean, standardized set of tools for it. Here is how performance profiling works on RISC-V, from raw counters to perf.

[Image: RISC-V performance engineering and profiling session at the Summit]

Measure, Don’t Guess

The first rule of optimization is to profile before you change anything. Modern cores are complex enough (deep pipelines, cache hierarchies, often out-of-order execution) that a hot loop may be limited by memory, by branches, or by a dependency chain rather than by the line you suspect. RISC-V’s answer is a Hardware Performance Monitor (HPM) whose counter registers are standardized in the ISA, so the same measurement approach works across implementations even though the countable events differ from core to core.

The Architectural Counters

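Three counters are standardized by the ISA (as the Zicntr extension in current versions of the specification; older versions listed them in the base ISA) and are readable as CSRs (control and status registers):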

CSR	Counts
`cycle`	Clock cycles elapsed
`time`	Wall-clock time (fixed frequency)
`instret`	Instructions retired (completed)

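You can read them directly in assembly, provided the current privilege level has been granted access through mcounteren/scounteren; otherwise the read traps as an illegal instruction: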

rdcycle   t0    # cycles
rdinstret t1    # retired instructions

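The single most useful derived metric is IPC = instret / cycle. What counts as low depends on how many instructions the core can issue per cycle, but an IPC well below that peak means the core is spending cycles stalled, and that tells you to go hunting for the cause.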
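As a minimal sketch of the IPC calculation, the snippet below takes two snapshots of cycle and instret (one before and one after the region of interest) and divides the deltas. The function names are illustrative, not from any library. The only architectural assumption is that both counters are 64 bits wide, so a single wraparound between the two reads can be undone with a mask.

```python
MASK64 = (1 << 64) - 1  # cycle and instret are 64-bit counters


def counter_delta(start: int, end: int) -> int:
    """Events counted between two reads, tolerating one 64-bit wraparound."""
    return (end - start) & MASK64


def ipc(cycle_start: int, cycle_end: int,
        instret_start: int, instret_end: int) -> float:
    """Instructions per cycle over the measured region."""
    cycles = counter_delta(cycle_start, cycle_end)
    if cycles == 0:
        raise ValueError("no cycles elapsed between the two reads")
    return counter_delta(instret_start, instret_end) / cycles
```

Working with deltas rather than absolute values matters because the counters run from reset, so their raw values say nothing about the code you are measuring.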

Programmable Event Counters

Beyond the fixed three, RISC-V defines the mhpmcounter3 through mhpmcounter31 registers: programmable counters that you point, via the matching mhpmevent selectors, at specific microarchitectural events such as cache misses, branch mispredictions, and TLB misses (how many counters exist and which events they can count depend on the core). These are the counters that turn “it’s slow” into “it’s slow because the L2 miss rate is high,” which is the difference between guessing and engineering.
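Raw event counts are hard to compare between runs of different length, so they are usually normalized. The sketch below shows two common normalizations: misses per thousand retired instructions (MPKI) and a plain miss ratio. The function names are hypothetical, and which event actually supplies "misses" or "accesses" is an assumption that depends on what your core exposes.

```python
def mpki(misses: int, instructions: int) -> float:
    """Misses per thousand retired instructions."""
    if instructions <= 0:
        raise ValueError("instruction count must be positive")
    return 1000.0 * misses / instructions


def miss_ratio(misses: int, accesses: int) -> float:
    """Fraction of accesses that missed (0.0 to 1.0)."""
    if accesses <= 0:
        raise ValueError("access count must be positive")
    return misses / accesses
```

MPKI is handy when the core offers a miss event but no matching access event, because instret is always available as the denominator.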

Profiling on Linux with perf

On application-class RISC-V running Linux, you do not poke CSRs by hand. You use perf, which programs and reads the HPM counters for you through the kernel’s PMU driver:

perf stat ./myprogram          # summary: cycles, instructions, IPC, misses
perf record ./myprogram        # sampled profile
perf report                    # interactive hotspot view

perf stat gives the high-level numbers; perf record/perf report show you the hot functions. Start at the top of the profile — that is where optimization pays off.
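For scripted comparisons it can help to collect the perf stat numbers programmatically. The sketch below assumes perf stat's machine-readable mode (-x with a separator), where, without interval or per-CPU options, each line starts with the counter value, then a unit field, then the event name, and where perf writes its results to stderr. The helper names are hypothetical, and the event names passed in must be ones your kernel and core actually support.

```python
import subprocess


def parse_perf_stat_csv(text: str, sep: str = ",") -> dict:
    """Map event name -> count from `perf stat -x SEP` output.

    Events perf could not count (for example "<not supported>")
    are mapped to None instead of a number.
    """
    counts = {}
    for line in text.splitlines():
        fields = line.split(sep)
        if line.startswith("#") or len(fields) < 3:
            continue  # skip comments and anything that is not a counter line
        value, event = fields[0].strip(), fields[2].strip()
        try:
            counts[event] = int(value)
        except ValueError:
            try:
                counts[event] = float(value)  # e.g. clock events in msec
            except ValueError:
                counts[event] = None
    return counts


def perf_stat(cmd, events=("cycles", "instructions")) -> dict:
    """Run `cmd` under perf stat and return the parsed counts."""
    result = subprocess.run(
        ["perf", "stat", "-x", ",", "-e", ",".join(events), "--", *cmd],
        stdout=subprocess.DEVNULL,  # discard the program's own output
        stderr=subprocess.PIPE,     # perf stat reports on stderr
        text=True,
        check=False,
    )
    return parse_perf_stat_csv(result.stderr)
```

With perf installed, perf_stat(["./myprogram"]) would return a dictionary keyed by event name, from which IPC is just the instructions entry divided by the cycles entry.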

A Sane Optimization Workflow

1. Profile to find the real hotspot (perf record/report).
2. Check IPC on that hotspot: is it compute-bound or stall-bound?
3. Drill into events (cache misses? branch mispredicts?) with programmable counters.
4. Change one thing, then re-measure to confirm it actually helped.
5. For data-parallel hotspots, consider the vector extension (RVV).
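The re-measure step can be sketched as a comparison of repeated measurements taken before and after a change. This example compares medians, which are less sensitive to an occasional noisy run than means. The function names and the 2% default threshold are assumptions made for illustration; pick a threshold that reflects the run-to-run noise on your own system.

```python
from statistics import median


def speedup(before, after) -> float:
    """Median cost before the change divided by median cost after it."""
    return median(before) / median(after)


def is_improvement(before, after, min_gain: float = 0.02) -> bool:
    """True only if the median improved by more than min_gain (2% by default)."""
    return speedup(before, after) > 1.0 + min_gain
```

The inputs can be cycle counts or wall-clock times, as long as both lists use the same unit and were collected on the same machine.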

Resist the urge to optimize code that the profile says is cold — it is wasted effort and adds risk.
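Amdahl's law puts a number on why cold code is not worth the effort: the overall gain is capped by the fraction of run time the optimized code accounts for. The sketch below is a direct transcription of that formula, with a hypothetical function name.

```python
def overall_speedup(fraction: float, local_speedup: float) -> float:
    """Amdahl's law.

    fraction      -- share of total run time spent in the optimized code (0..1)
    local_speedup -- how many times faster that code became
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if local_speedup <= 0.0:
        raise ValueError("local_speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)
```

For example, making a function that accounts for 2% of the run time ten times faster yields an overall speedup of under 2%, while merely doubling the speed of a function that accounts for half the run time yields about 1.33x.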

Microarchitecture Matters

Two chips implementing the same ISA can behave very differently: pipeline depth, cache sizes, and predictors all vary between cores. That is why you profile on the actual target rather than trusting a number from a different machine. Some open-source RISC-V cores even expose rich counters for research-grade analysis.

The Bottom Line

RISC-V makes performance work refreshingly principled: architectural cycle, time, and instret counters give you IPC for free, programmable mhpmcounter events explain why a hotspot stalls, and perf on Linux ties it all together. Measure first, change one thing at a time, and always re-measure. Done that way, optimization stops being folklore and becomes exactly what it should be — an engineering discipline grounded in real numbers from real hardware.

Part of my RISC-V series. See also the assembly tutorial and the vector extension.

Frequently Asked Questions

What are hardware performance counters on RISC-V?

They are special registers that count microarchitectural events — cycles, retired instructions, cache misses, branch mispredictions, and more. RISC-V standardizes them as the Hardware Performance Monitor (HPM): the cycle, time, and instret counters are architectural, and mhpmcounter registers provide additional programmable event counters that profiling tools read to explain where time goes.

Can I use Linux perf on RISC-V?

Yes. The Linux perf subsystem works on RISC-V and reads the hardware performance counters exposed by the core. You can run perf stat for summary counts and perf record/report for sampled profiles, just as on x86 or Arm, provided the kernel and the specific core expose the relevant events.

What is the difference between cycles and retired instructions?

The cycle counter measures elapsed clock cycles, while instret counts instructions that actually completed (retired). Dividing them gives instructions-per-cycle (IPC), a key efficiency metric. A low IPC suggests the core is stalling — on cache misses, branch mispredictions, or dependencies — which points you toward the real bottleneck.
