UX Metrics: Usability, Task Success, Time-to-Completion, NPS

Most UX work isn't measured. "Users like the new flow" gets you nodded out of the room. The PM walks in with a churn chart. You walk in with a Figma file. Guess who wins the roadmap argument.

If you've ever sat in a quarterly business review and watched your scope shrink while the engineering manager defended a refactor with throughput numbers, you already know what this is about. Design needs its own numbers. Not vanity dashboards. Not "engagement up 4%." A short, defensible stack that tells a Head of Product, in under sixty seconds, whether the design is working.

Here's the stack I use, the diagnostic that catches the failure mode nobody talks about, and the QBR slide that gets forwarded without edits.

Why This Matters Right Now

Design Leads are being asked to defend headcount against AI tooling. "We shipped 14 design tokens" is not a defense. "Task success rate moved from 71% to 88% on the signup flow, and tickets-per-thousand-users dropped 22%" is.

I'm not being dramatic. In the last two performance cycles I've watched, every UX team that lost headcount had one thing in common: they couldn't produce a number that mattered to the business. The teams that grew had a shared vocabulary with product and a baseline they updated every quarter. That's the entire difference.

Numbers don't replace craft. They protect it.

The Five Metrics

Pick these five. Don't add a sixth until you've reported these for two quarters straight. More metrics make your slide weaker, not stronger.

1. Task Success Rate

Define the task narrowly. Not "use the dashboard." Say "create a new project, invite one teammate, and assign the first task." Then count how many users finish without help, without backing out, without abandoning.

Target: above 85% for any core flow that drives revenue or retention. Below 70% on a flow you ship to a paying customer is a bug, not a design preference.

How to instrument: session replay (FullStory, LogRocket, Hotjar) for high-traffic flows, moderated tests with n=12 or more for anything new. Twelve is the floor where you start seeing the same problem twice. Below twelve and you're guessing.

The mistake I see most often: teams measure task success on the happy path only. Add the two most common edge cases (empty state and error recovery) and re-measure. The number drops, and that drop is where your real work lives.
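Here is a minimal sketch of that re-measurement, assuming you log each attempt with a scenario label and three flags. The `TaskAttempt` fields and the sample data are hypothetical, not the output of any particular replay tool; map them to whatever you actually record.

```python
from dataclasses import dataclass


@dataclass
class TaskAttempt:
    # Hypothetical fields; map them to what your replay tool or test notes record.
    scenario: str      # e.g. "happy_path", "empty_state", "error_recovery"
    completed: bool    # False covers abandonment
    needed_help: bool
    backed_out: bool


def is_success(attempt: TaskAttempt) -> bool:
    """Success = finished without help and without backing out."""
    return attempt.completed and not attempt.needed_help and not attempt.backed_out


def task_success_rate(attempts: list[TaskAttempt]) -> float:
    """Share of attempts that count as a success, between 0.0 and 1.0."""
    if not attempts:
        raise ValueError("no attempts recorded")
    return sum(is_success(a) for a in attempts) / len(attempts)


# Illustrative data only: 12 happy-path attempts plus 8 edge-case attempts.
happy = [TaskAttempt("happy_path", True, False, False)] * 11 + [
    TaskAttempt("happy_path", False, False, False)
]
edge = [TaskAttempt("empty_state", True, False, False)] * 3 + [
    TaskAttempt("empty_state", True, True, False),
    TaskAttempt("error_recovery", True, False, False),
    TaskAttempt("error_recovery", True, False, True),
    TaskAttempt("error_recovery", False, False, False),
    TaskAttempt("error_recovery", False, False, False),
]

print(f"happy path only: {task_success_rate(happy):.0%}")
print(f"with edge cases: {task_success_rate(happy + edge):.0%}")
```

Report both numbers side by side. The gap between them is the part of the flow nobody designed.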

2. Time-to-Completion

Use the median, not the mean. One enterprise customer with a custom workflow will pull the mean to nonsense. Median tells you what the typical user actually experiences.

Segment by new vs. returning. New users taking more than 90 seconds on a core action is a flag. Returning users taking more than 30 seconds on something they do daily is also a flag. That's muscle memory you're not letting them use.
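A rough sketch of the median-by-segment check, assuming you can export one `(segment, seconds)` pair per completed task. The function names and sample durations are made up for illustration; the 90-second and 30-second flags are the ones above.

```python
from statistics import mean, median

# Flag thresholds from the text: 90 s for new users, 30 s for returning users.
FLAG_SECONDS = {"new": 90, "returning": 30}


def median_by_segment(samples):
    """samples: iterable of (segment, seconds). Returns {segment: median seconds}."""
    grouped = {}
    for segment, seconds in samples:
        grouped.setdefault(segment, []).append(seconds)
    return {segment: median(values) for segment, values in grouped.items()}


def flagged_segments(medians):
    """Segments whose median exceeds the threshold for that segment."""
    return [seg for seg, value in medians.items() if value > FLAG_SECONDS[seg]]


# Illustrative data: one returning user with a custom workflow took 600 s.
samples = [
    ("returning", 28), ("returning", 31), ("returning", 29),
    ("returning", 33), ("returning", 600),
    ("new", 80), ("new", 95), ("new", 70),
]

returning = [s for seg, s in samples if seg == "returning"]
print("returning mean:", mean(returning))      # dragged up by the 600 s outlier
print("returning median:", median(returning))  # what the typical user sees

medians = median_by_segment(samples)
print("flagged:", flagged_segments(medians))
```

In this made-up sample the single 600-second session pushes the mean past 140 seconds while the median stays at 31, which is exactly why the mean doesn't go on the slide.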

Real example: a billing page redesign moved median time-to-completion from 47 seconds to 31 seconds for returning users. That's a 34% reduction. The PM didn't care about the redesign. They cared that support tickets about "where is the invoice download button" dropped to zero in the same quarter. Same outcome, different language. Use both.

What time-to-completion does not tell you: whether the user enjoyed it, whether they came back, or whether the flow is even the right flow. It's a speedometer, not a destination check.

3. Error Rate

Three things to track:

  • Form errors: fields submitted wrong, validation triggered, retries before success
  • Dead-end clicks: taps on things that aren't interactive (a label that looks like a button, an icon with no action)
  • Rage-clicks: three or more clicks on the same element in under two seconds

Per session is the right unit. "Errors per user per week" hides flow-level fires.

Critical rule: baseline before the redesign. If you ship a redesign and report "error rate is 1.2 per session," that number is meaningless without the pre-redesign baseline. I've seen designers ship a 30% improvement and not be able to claim it because they never measured the starting point. Set the baseline in week one. Always.

Rage-clicks are underrated. They're the closest thing UX has to a customer screaming at the screen. Any flow with rage-clicks above 0.5 per session is communicating something specific: the user thinks something should work, and it doesn't.
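If your replay tool doesn't report rage-clicks directly, the definition above is simple enough to sketch yourself. This assumes you can export each session as a list of `(timestamp_seconds, element_id)` click events. The function names are hypothetical, and treating a long burst as one rage-click rather than several is my assumption, not a standard.

```python
from collections import defaultdict


def count_rage_clicks(clicks, min_clicks=3, window_s=2.0):
    """Count rage-click bursts in ONE session.

    clicks: list of (timestamp_seconds, element_id).
    A burst is min_clicks or more clicks on the same element
    in under window_s seconds.
    """
    by_element = defaultdict(list)
    for ts, element in clicks:
        by_element[element].append(ts)

    bursts = 0
    for times in by_element.values():
        times.sort()
        i = 0
        while i + min_clicks <= len(times):
            if times[i + min_clicks - 1] - times[i] < window_s:
                bursts += 1
                # Assumption: keep absorbing clicks into this burst until
                # there is a gap of window_s or more, so one long burst counts once.
                j = i + min_clicks
                while j < len(times) and times[j] - times[j - 1] < window_s:
                    j += 1
                i = j
            else:
                i += 1
    return bursts


def rage_clicks_per_session(sessions):
    """sessions: list of click lists, one per session."""
    if not sessions:
        raise ValueError("no sessions")
    return sum(count_rage_clicks(s) for s in sessions) / len(sessions)


# Illustrative data: one frustrated session, one quiet session.
frustrated = [
    (0.0, "save"), (0.4, "save"), (0.9, "save"), (1.3, "save"),
    (10.0, "save"), (11.5, "save"), (20.0, "menu"),
]
quiet = [(3.0, "menu"), (9.0, "save")]

print(rage_clicks_per_session([frustrated, quiet]))
```

Run the same function over pre-redesign sessions first. That gives you the baseline the rule above demands.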

4. SUS Score

The System Usability Scale. Ten questions, scored 0-100, run quarterly.

Average across all software: 68. Below 50 is a fire. Your users are actively struggling. Above 80 is genuinely good. Above 85 is rare and probably means a small, self-selected sample.

Don't run SUS on the whole product. Run it on the three flows that drive revenue or retention. Pick them with your PM. Otherwise you'll end up averaging a great onboarding with a mediocre admin panel and reporting a meaningless 71.

The ten questions (paraphrased; use the standard wording when you ship it):

  1. I think I would use this system frequently.
  2. I found the system unnecessarily complex.
  3. I thought the system was easy to use.
  4. I would need the support of a technical person to use this system.
  5. The various functions in this system were well integrated.
  6. There was too much inconsistency in this system.
  7. I would imagine most people would learn to use this system very quickly.
  8. I found the system very cumbersome to use.
  9. I felt very confident using the system.
  10. I needed to learn a lot of things before I could get going with this system.

Five-point scale, alternating positive and negative phrasing. The math is well-documented; don't reinvent it.
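For reference, the standard scoring fits in a few lines. Odd-numbered items contribute the answer minus 1, even-numbered items contribute 5 minus the answer, and the sum is multiplied by 2.5. The function names and sample answers below are mine; the arithmetic is the standard SUS scoring.

```python
from statistics import mean


def sus_score(responses):
    """Score one respondent. responses: ten answers (1-5) in questionnaire order."""
    if len(responses) != 10 or any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("SUS needs exactly ten answers, each from 1 to 5")
    total = 0
    for item_number, answer in enumerate(responses, start=1):
        if item_number % 2 == 1:
            total += answer - 1   # positively worded items (1, 3, 5, 7, 9)
        else:
            total += 5 - answer   # negatively worded items (2, 4, 6, 8, 10)
    return total * 2.5            # scales the 0-40 sum to 0-100


def flow_sus(all_responses):
    """Average SUS across respondents for one flow."""
    return mean(sus_score(r) for r in all_responses)


# Illustrative answers from three respondents on one flow.
respondents = [
    [5, 1, 5, 1, 5, 1, 5, 1, 5, 1],
    [4, 2, 4, 2, 4, 2, 4, 2, 4, 2],
    [3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
]
print(flow_sus(respondents))
```

Score each respondent first, then average. Averaging the raw answers per question before scoring gives the same flow mean here, but keeping per-respondent scores lets you see the spread.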

Run it the same week each quarter, on the same flows, with at least n=20 respondents per flow. Trend matters more than the absolute number.

5. Post-Launch NPS Delta

Net Promoter Score is a blunt instrument. On its own, it's a vanity metric. Segmented, it's one of the strongest signals you can show a Head of Product.

The move: in the 30 days after a redesign ships, segment NPS responses by users who actually hit the redesigned flow vs. users who didn't. Compare the two cohorts. If the redesigned-flow cohort scores 8 or more NPS points higher, you have a defensible claim. If they score the same or lower, you have a problem worth investigating.

This is the only NPS number I'd ever put on a UX slide. Aggregate company NPS belongs to the CEO. Cohort-segmented post-launch NPS belongs to you.

Watch for the trap: don't compare against the prior quarter's full-company NPS. You're measuring a flow, not a company. Compare cohort against cohort, same window.
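A minimal sketch of the cohort comparison, assuming you already have 0-10 survey scores split by whether the respondent hit the redesigned flow in the same 30-day window. The function names and sample scores are illustrative. NPS itself is the standard calculation: percent promoters (9-10) minus percent detractors (0-6).

```python
def nps(scores):
    """Net Promoter Score for a list of 0-10 answers, from -100 to +100."""
    if not scores:
        raise ValueError("no responses in this cohort")
    promoters = sum(s >= 9 for s in scores)
    detractors = sum(s <= 6 for s in scores)
    return 100 * (promoters - detractors) / len(scores)


def nps_delta(exposed, unexposed):
    """NPS of users who hit the redesigned flow minus NPS of those who didn't."""
    return nps(exposed) - nps(unexposed)


# Illustrative responses collected in the same 30-day window.
exposed = [10, 9, 9, 8, 7, 6, 10, 9, 3, 8]
unexposed = [9, 8, 7, 6, 5, 10, 8, 7, 4, 9]

print("exposed NPS:", nps(exposed))
print("unexposed NPS:", nps(unexposed))
print("delta:", nps_delta(exposed, unexposed))
```

With cohorts this small the delta is noisy, so treat it as a direction, not a measurement. The point of the sketch is the split: same window, two cohorts, one subtraction.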

The "High Task Success but High Churn" Diagnostic

This is the one nobody warns you about.

The flow works. SUS is at 76. Task success is 91%. Time-to-completion dropped. You ship. You celebrate. Three months later, churn hasn't moved. Sometimes it gets worse.

What happened?

Usually one of three things:

You optimized the wrong job-to-be-done. Users completed the task you measured, but the task wasn't what they actually wanted to accomplish. Classic example: a redesigned import flow with 94% success, but users were importing CSVs because they couldn't get the API to work. The real job was "get my data in fast." You optimized the workaround.

The next step is broken. They finished onboarding. Then they hit the empty state and bounced. Or they hit a pricing wall they didn't expect. Or they got handed off to a sales rep who took four days to respond. The flow you measured succeeded; the journey failed.

You optimized for first-time users while breaking returning users. New-user task success up 22%. Returning-user task success down 8%. Returning users are the ones paying. They left.

How to spot it before the QBR:

  1. Cohort the churned users from the last 30 days.
  2. Pull session replays for that cohort, 7 days before churn.
  3. Look for hesitation moments: long pauses on a screen, going back to a previous step, opening a help article in a new tab.
  4. Identify the screen they hesitated on, even though session metrics show they "succeeded."
  5. That screen is your real problem. Not the one you redesigned.
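Watching the replays is the real work, but a first pass over exported data can tell you which ones to open first. The sketch below assumes you can export screen visits for the 7 days before churn with a dwell time and two flags. The `ScreenVisit` fields and the 20-second pause threshold are my assumptions, not something a specific tool provides.

```python
from collections import Counter
from dataclasses import dataclass

LONG_PAUSE_SECONDS = 20.0  # assumption: tune this to your own flows


@dataclass
class ScreenVisit:
    # Hypothetical export format, already limited to the 7 days before churn.
    user_id: str
    screen: str
    dwell_seconds: float
    went_back: bool = False
    opened_help: bool = False


def is_hesitation(visit: ScreenVisit) -> bool:
    """Long pause, back-navigation, or opening a help article."""
    return (
        visit.dwell_seconds >= LONG_PAUSE_SECONDS
        or visit.went_back
        or visit.opened_help
    )


def hesitation_screens(visits, churned_user_ids):
    """Rank screens by how many distinct churned users hesitated on them."""
    seen = set()
    for v in visits:
        if v.user_id in churned_user_ids and is_hesitation(v):
            seen.add((v.user_id, v.screen))  # count each user once per screen
    return Counter(screen for _, screen in seen).most_common()


# Illustrative data: u1 and u2 churned, u3 did not.
visits = [
    ScreenVisit("u1", "empty_dashboard", 45.0),
    ScreenVisit("u1", "empty_dashboard", 30.0),
    ScreenVisit("u2", "empty_dashboard", 5.0, opened_help=True),
    ScreenVisit("u2", "billing", 4.0, went_back=True),
    ScreenVisit("u3", "empty_dashboard", 60.0),
    ScreenVisit("u1", "settings", 3.0),
]
print(hesitation_screens(visits, {"u1", "u2"}))
```

The ranking only tells you where to look. The replay tells you why.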

Five churned-user replays will teach you more than fifty happy-path tests. The investment is two hours. The output is a list of three to five fixes that move retention, not vanity metrics. Do it every month.

The QBR Slide

One slide. Four numbers. One diagnostic call-out. Don't make it pretty. Make it skimmable.

Layout (left to right, top to bottom):

Metric                      | Current | Target | Δ vs Q-1 | Risk
Task success (signup)       | 88%     | 90%    | +5pp     | On track
Time-to-completion (median) | 31s     | 30s    | -16s     | On track
Error rate (per session)    | 0.7     | < 0.5  | -0.4     | Trending down
SUS (3 core flows avg)      | 74      | 78     | +3       | On track

Below the table, one line:

Diagnostic: Onboarding completion is up 17% but week-2 retention is flat. Cohort replay points to the empty dashboard state. Fix scoped for next sprint.

That's the whole slide. No screenshots of the redesign. No before/after Figma frames. No "users were delighted" pull quotes. The Head of Product will not read those. They will read the table, the delta, and the diagnostic.

If they want the Figma, they'll ask. They almost never do.

Vanity-Metric Traps

Things that look like UX metrics but aren't:

Engagement minutes alone. Longer sessions can mean confusion. A user who spends 12 minutes finding the export button is not engaged. They're stuck. Pair with task success or don't report it.

Design tokens shipped. This is an output, not an outcome. Engineers don't report "lines of code written" for a reason. Don't report this either.

NPS without segmentation. Aggregate NPS belongs to the CEO. If you put unsegmented NPS on a UX slide, you're either taking credit for things you didn't do or blame for things you didn't break.

CSAT after a support ticket. Measures the support agent, not the product. Useful to support leadership. Not useful to you.

Bounce rate as a UX metric. Bounce rate is a marketing metric on landing pages. On product surfaces it's almost always misleading.

Number of usability tests run. Activity, not outcome. The right number is "tests run that changed a design decision" — and that's a hard number to report, which is why nobody does it. Try anyway.

Instrumentation Checklist

Before you commit to reporting any of these, make sure you can actually measure them. Walk this list with your PM and one engineer:

  • Session replay tool deployed and PII-scrubbed (FullStory, LogRocket, Hotjar, Heap)
  • Event tracking on all core-flow start/complete pairs (start onboarding, complete onboarding, etc.)
  • Funnel definitions documented and shared with product analytics
  • SUS survey infrastructure (Typeform, in-app modal, or quarterly email) with at least n=20 per flow
  • NPS survey segmented by feature/flow exposure, not just send date
  • Baseline measurements taken before any redesign ships
  • A shared dashboard the PM can open without asking you for it

If you can't check all seven, fix the gap before you commit to the metric. Reporting a number you can't actually trust is worse than not reporting at all.

Measuring Your Own Success

You'll know this is working when:

  • You can answer "is the design working?" in under 60 seconds with numbers, not adjectives.
  • Your Design Lead forwards your QBR slide to the Head of Product without edits.
  • The PM stops asking for "user feedback" and starts asking for "the latest task success number."
  • An engineer asks for the baseline before they start a refactor — because they've seen the format.
  • You catch a high-success-high-churn case before the QBR, not after.

The goal isn't to become a data analyst. It's to stop being the person in the room without numbers.

Five metrics. One diagnostic. One slide. Run it for two quarters and the conversation about UX in your company will quietly change.
