INP is not the average — it's closer to P98
Most senior FE devs read INP as 'some latency percentile'. Almost nobody can state the trimming rule, the Event Timing duration definition, or what an 'interaction' actually is. Here's the spec-level story with a slider you can watch misbehave.
Slide the interaction count past 50 and watch the INP bar slide off the peak:
N=120 → skip 2 → INP is the 3rd worst.
There's a reading of INP that shows up in almost every engineering deck:
"INP is the 75th-percentile interaction latency on your page."
It isn't. That sentence collapses two separate percentiles — one within a page view, one across page views. It also hides the outlier-handling rule that makes INP behave very differently from a plain P75. On a page with dozens of interactions, INP is closer to the 98th percentile of that page's interaction durations than to anything near the middle.
This post walks the two primary-source definitions. W3C Event Timing defines what a single interaction's duration is. web.dev's INP docs define the trimming rule. The slider at the top lets you feel the metric's shape.
Within a single page view: INP reports the worst interaction, except that one worst interaction is thrown away for every 50. For a page with 100 interactions, INP is the 3rd-worst. For 200, the 5th-worst. That's a ~P98 in all but name. Across page views, RUM tools then take the P75 of those per-page INPs — which is why the "P75" folklore exists, but it's a second layer, not the definition.
What the spec means by "interaction"
INP is layered on top of the W3C Event Timing API, which defines what a single interaction is and how to measure its duration. The spec is deliberate about the fact that a click is several events at the DOM level:
A single user Interaction (sometimes called a Gesture) is typically made up of multiple physical hardware input events.
One interaction is a cluster of events the user perceives as a single gesture. A click is pointerdown → pointerup → click. A keypress is keydown → keyup. A drag is pointerdown → pointermove* → pointerup. Each Performance Entry in Event Timing tags events from the same gesture with the same interactionId:
The interactionId attribute is a number that uniquely identifies the user Interaction which triggered the associated event.
For INP, only three interaction types count: clicking with a mouse, tapping a touchscreen, and pressing a key. Scrolling, hovering, and zooming are explicitly excluded. Scroll has its own metrics. Hover has no commit point. Zoom is typically browser-owned.
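To make the grouping concrete, here is a minimal sketch of collapsing event entries into one latency per interaction. It assumes two things beyond the text above: that an `interactionId` of 0 marks an event that is not part of a tracked interaction, and that an interaction's latency is the longest `duration` among its events. The `EventEntry` shape and the `latencyByInteraction` name are hypothetical; real entries are `PerformanceEventTiming` objects delivered by a `PerformanceObserver`.

```typescript
// Minimal stand-in for the PerformanceEventTiming fields used here (hypothetical shape).
interface EventEntry {
  name: string;          // event type, e.g. "pointerdown"
  interactionId: number; // assumed: 0 means "not part of a tracked interaction"
  duration: number;      // milliseconds, as reported by Event Timing
}

// Collapse event entries into one latency per interaction:
// the longest duration among the events that share an interactionId.
function latencyByInteraction(entries: EventEntry[]): Map<number, number> {
  const latencies = new Map<number, number>();
  for (const entry of entries) {
    if (entry.interactionId === 0) continue; // skip events outside any interaction
    const previous = latencies.get(entry.interactionId) ?? 0;
    latencies.set(entry.interactionId, Math.max(previous, entry.duration));
  }
  return latencies;
}
```

Feeding it the three events of one click plus a stray hover event yields a single entry for the click, carrying the largest of the three durations.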
What the spec measures, exactly
This is the part most engineers get wrong. Even among senior FE devs, "duration" is usually described as "how long the JS handler took". That is not what the spec says.
Returns the difference between the next time the update the rendering steps are completed for the associated event's Document after the associated event has been dispatched, and the startTime, rounded to the nearest 8ms.
Unpack that carefully:
- startTime is the hardware timestamp of the physical input event. Not the moment your listener fires. The time the OS delivered the click to the browser.
- End time is the completion of the next run of WHATWG's "update the rendering" algorithm that includes the mutations your handler caused. Paint, not handler return.
- Rounded to 8ms — a deliberate granularity to prevent fingerprinting through high-resolution timing.
So a duration of 180ms is saying: "180ms elapsed between the OS detecting the click and the first frame that reflects the result of that click painting." That window is input delay + processing + presentation latency combined.
This is why a cheap click handler can still produce an ugly INP: if your handler schedules a state update that causes a long style recalc, a big layout, a compositor stall, or just lands on the wrong side of a pre-existing long task, the measurement lives in the space between input and paint — not in the function-call duration.
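The three phases can be read off a single entry. The sketch below assumes the entry exposes `startTime`, `processingStart`, `processingEnd`, and `duration` as `PerformanceEventTiming` does. The `TimedEntry` shape and the `splitPhases` helper are hypothetical names, and because `duration` is rounded to 8ms the presentation figure is an approximation.

```typescript
// Stand-in for the PerformanceEventTiming fields used here (hypothetical shape).
interface TimedEntry {
  startTime: number;       // input timestamp, ms
  processingStart: number; // first handler begins, ms
  processingEnd: number;   // last handler returns, ms
  duration: number;        // input to next rendering update, ms (rounded to 8ms)
}

interface Phases {
  inputDelay: number;        // waiting before any handler runs
  processing: number;        // time spent inside event handlers
  presentationDelay: number; // handler return to the next rendering update
}

function splitPhases(entry: TimedEntry): Phases {
  const approximatePaintTime = entry.startTime + entry.duration;
  return {
    inputDelay: entry.processingStart - entry.startTime,
    processing: entry.processingEnd - entry.processingStart,
    // Clamp at zero because rounding of duration can push this slightly negative.
    presentationDelay: Math.max(0, approximatePaintTime - entry.processingEnd),
  };
}
```

A handler that runs for 30ms can still sit inside a 184ms duration if it waited 60ms behind another task and the frame took a further 94ms to arrive.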
Drag the three phase sliders to see how input delay, handler processing, and rendering compose into the measured duration:
The trimming rule
Now the piece that's almost never stated in deck form. web.dev's definitional page:
For most sites the interaction with the worst latency is reported as INP. However, for pages with large numbers of interactions, random hiccups can result in an unusually high-latency interaction on an otherwise responsive page. For this reason, we ignore one highest interaction for every 50 interactions. The 75th percentile of all page views is then reported as usual, which further removes outliers.
Two percentiles. Collapsed:
- Within a page view: sort the interactions by duration, drop the top floor(N / 50) outliers, report the next one. For N < 50, INP is the single worst interaction. For N = 100, it's the 3rd-worst. For N = 200, the 5th-worst. Expressed as the share of that page's interactions that are faster than the reported one: for N=100, INP sits at 97/100 = P97; for N=200 it's 195/200 = P97.5; asymptotically approaching but never reaching P98.
- Across page views: RUM then takes the P75 of those per-page INPs to produce the site-level number.
The "P75" in field tools refers to the second layer. It does not mean INP approximates P75 of interaction latencies within a visit. Within a visit, INP is a near-worst-case summary.
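Both layers fit in a few lines. This is a sketch of the rule as quoted above, not the code any browser or RUM vendor ships. The function names are hypothetical, and the nearest-rank method used for P75 is an assumption, since field tools may interpolate differently.

```typescript
// Layer 1: INP for one page view.
// Sort worst-first, skip floor(N / 50) outliers, report the next interaction.
function inpForPageView(latencies: number[]): number | undefined {
  if (latencies.length === 0) return undefined; // no interactions, no INP
  const worstFirst = [...latencies].sort((a, b) => b - a);
  const skipped = Math.floor(worstFirst.length / 50);
  return worstFirst[skipped];
}

// Layer 2: P75 across page views (nearest-rank method, an assumption).
function p75(perPageInps: number[]): number | undefined {
  if (perPageInps.length === 0) return undefined;
  const ascending = [...perPageInps].sort((a, b) => a - b);
  const rank = Math.ceil(0.75 * ascending.length); // 1-based rank
  return ascending[rank - 1];
}
```

With 100 interactions the first function returns the 3rd-worst latency; with 200 it returns the 5th-worst. The second function never sees individual interactions at all, only one number per page view.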
Back to the slider
The visualiser at the top makes the trimming rule physical. At N=1, INP is the single bar, the literal max. At N=49 it's still the max. Reach 50 and INP starts shedding one bar off the top; by N≈100 it's visibly off the peak; by N=200 it has skipped the four worst bars and reports the fifth. Resample a few times: the shape is stable because the distribution is.
Two things become obvious once you play with it:
- Low-interaction pages punish you for any single bad click. A landing page with 20 clicks gets its INP set by the single worst one. Your p50 can be 30ms and your INP can still flash red from one 600ms outlier.
- High-interaction pages hide tail jank well. A text editor with 500 interactions in a session is allowed 10 "free" outliers. Individual ugly frames get absorbed, so the metric stabilises around a real ceiling of sustained responsiveness.
This is by design. The metric wants to catch sustained unresponsiveness, not punish every GC pause.
What this changes in practice
The usual INP advice is correct but blurry. Once you've read the spec, it sharpens:
- "Break up long tasks" → specifically, break them so they don't sit between input delivery and paint. A long task that ends before the click arrives is free. One that starts during the click's processingEnd → paint window costs you the full wall-clock gap.
- "Use requestIdleCallback / yield to the main thread" → what this really buys you is a chance for the browser to complete its "update the rendering" step, which is the definition of INP's end time. Yielding without the browser taking the render opportunity doesn't help.
- "Debounce heavy work" → a good debounce keeps the current interaction's processingEnd close to paint. A bad one just moves the pain from one interaction to another and still counts.
- "Optimise for the slowest user" → spec-correct. INP takes the worst, not the average, so optimising median response is invisible to it. The win is in removing the tail.
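One way to act on the first two points is to slice heavy work and hand control back between slices. The sketch below uses a zero-delay `setTimeout` as the yield, which is an assumption about what suits a given page: it gives the browser a chance to render but does not guarantee that it will. `yieldToMain`, `toChunks`, and `processInChunks` are hypothetical helper names.

```typescript
// Resolve in a later task so pending input and rendering get a chance to run first.
function yieldToMain(): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, 0));
}

// Split work into slices of at most `size` items (at least one item per slice).
function toChunks<T>(items: T[], size: number): T[][] {
  const step = Math.max(1, Math.floor(size));
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += step) {
    chunks.push(items.slice(i, i + step));
  }
  return chunks;
}

// Run `work` over every item, yielding after each slice.
async function processInChunks<T>(
  items: T[],
  size: number,
  work: (item: T) => void,
): Promise<void> {
  for (const chunk of toChunks(items, size)) {
    chunk.forEach(work);
    await yieldToMain();
  }
}
```

In an event handler, the usual shape is to apply the visible state change first and only then start `processInChunks` for everything that doesn't need to be in the next frame. The slice size is a tuning knob: too large and each slice is itself a long task.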
A rule of thumb that actually survives the spec: if your 95th-percentile interaction duration is comfortably below 200ms, your INP will usually be too. If it's above, you need to know how many interactions your users average, because with 30 interactions INP is the single worst interaction on the page (nothing is forgiven), while with 200 interactions INP ≈ P97.5 with the top 4 forgiven.
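To see where INP lands for a given interaction count, the within-page percentile can be computed directly. The sketch counts the share of interactions strictly faster than the reported one, which is one of several percentile conventions and is chosen here as an assumption. `inpPercentileWithinPage` is a hypothetical name.

```typescript
// Share of a page's interactions that are faster than the one INP reports,
// given n interactions and the "skip floor(n / 50)" rule.
function inpPercentileWithinPage(n: number): number {
  if (Math.floor(n) !== n || n < 1) {
    throw new RangeError("n must be a positive integer");
  }
  const skipped = Math.floor(n / 50);
  // skipped interactions sit above the reported one, plus the reported one itself
  return (n - skipped - 1) / n;
}
```

For 100 interactions this gives 0.97, for 200 it gives 0.975, and for 500 it gives 0.978. The value stays below 0.98 for every n.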
Primary sources
- W3C Event Timing — Interaction — the definition of an interaction as a cluster of events with a shared interactionId.
- W3C Event Timing — PerformanceEventTiming.duration — duration measures from hardware input to the next "update the rendering" completion, rounded to 8ms.
- WHATWG HTML — update the rendering — the algorithm whose completion ends an interaction's duration.
- web.dev — INP definition — the trimming rule ("ignore one highest interaction for every 50") and the cross-page-view P75.
- web.dev — What is INP? — the three observed interaction types (click, tap, key) and exclusions.