EM Metrics: Velocity, Quality, Retention, Hiring

The first time I walked into a QBR with a story-points chart, my VP let me get through about ninety seconds before he said, "Camellia, what's your regrettable attrition? And your deploy frequency? Forget the points."

I had neither number. I had a beautifully colored bar chart of velocity by sprint and a slide titled "Q3 Throughput +18%." None of it survived contact with the question. He wasn't being rude. He was being honest. The exec layer was funding outcomes. I had brought activity. We were not speaking the same language, and the budget conversation that followed was about as warm as you'd expect.

That meeting changed how I run dashboards. This guide is the dashboard I wish I'd had: six metrics, real number ranges, named diagnoses, one QBR slide. Nothing more, because more is how Goodhart's law eats you.

Why outcomes beat throughput

Throughput metrics — story points, tickets closed, lines of code, hours logged — measure activity. Activity is what your team did. Outcomes are what the company got. The exec layer funds outcomes: shipping speed, reliability, team durability, hiring throughput. If you can't translate your team's work into those four buckets, you'll get cut from headcount conversations, and you won't see it coming until the offer freeze lands in your inbox.

There's a second reason. Throughput metrics are easy to game without anyone meaning to. Tell a team you measure tickets closed and within a quarter you'll have smaller tickets. Tell them you measure story points and within two quarters you'll have inflation that would make a central banker blush. Outcome metrics are harder to game because they're tied to reality. The deploy either happened or it didn't, the engineer either stayed or they left, the offer was either accepted or declined.

So: six metrics. Four for shipping and reliability, two for team health. That's the set.

Deploy frequency

How often does your team's code reach production? Daily, weekly, monthly, or less. This is the first DORA metric and the cleanest signal of whether your shipping pipeline is actually a pipeline or a series of locked doors with hand-keys.

Target tier per team maturity:

Tier   | Deploy frequency             | What it usually means
Elite  | On-demand (multiple per day) | Trunk-based, feature flags, mature CI
High   | Daily to weekly              | Healthy team, reasonable CI, some manual gates
Medium | Weekly to monthly            | Release trains, sprint-bound, batched risk
Low    | Monthly or less              | Big-bang releases, fear-driven cadence

Most product teams should sit in High. If you're in Medium and the work justifies it (regulated industry, long QA cycles), say so on the slide. If you're in Low and you're a SaaS team, that's a finding, not a number.
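As a rough sketch, the tier table can be turned into a small classifier. The numeric cutoffs are assumptions (the table gives ranges in words, not exact boundaries), and `deploy_tier` is a hypothetical helper name.

```python
def deploy_tier(deploys_per_week: float) -> str:
    """Map average production deploys per week to a tier from the table.

    The cutoffs are assumptions; the table only describes ranges in words.
    """
    if deploys_per_week > 7:       # more than one deploy per day on average
        return "Elite"
    if deploys_per_week >= 1:      # daily to weekly
        return "High"
    if deploys_per_week >= 0.25:   # about weekly to monthly (assumes ~4 weeks per month)
        return "Medium"
    return "Low"                   # less often than that


if __name__ == "__main__":
    for rate in (20, 3, 0.5, 0.1):
        print(rate, "deploys/week ->", deploy_tier(rate))
```

Feed it a trailing average rather than a single week, so one quiet holiday week does not drop the team a tier.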

Lead time for changes

Time from commit to production. This is the second DORA metric and the one that tells you where the bottleneck actually lives. Hours, days, or weeks.

When you measure this and break it down, you usually find one of three culprits: code review (PR sits open for two days waiting on a senior who's in meetings), CI (test suite takes 47 minutes and flakes once in five), or deploy gate (manual approval that lives in someone's calendar). Pick the longest one and fix it. Don't try to fix all three at once.
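One way to find the longest stage is to record how many hours each change spent in each stage and compare medians. This is a minimal sketch: the stage names and the input shape (one dict of stage to hours per change) are assumptions, not a standard format.

```python
from statistics import median


def slowest_stage(changes):
    """Return (stage_name, medians) for the stage with the largest median duration.

    `changes` is a list of dicts mapping a stage name to the hours one change
    spent in that stage. Stage names are illustrative.
    """
    durations = {}
    for change in changes:
        for stage, hours in change.items():
            durations.setdefault(stage, []).append(hours)
    medians = {stage: median(hours) for stage, hours in durations.items()}
    worst = max(medians, key=medians.get)
    return worst, medians


if __name__ == "__main__":
    sample = [
        {"review": 40, "ci": 1.0, "deploy_gate": 5},
        {"review": 50, "ci": 0.8, "deploy_gate": 2},
        {"review": 10, "ci": 1.2, "deploy_gate": 30},
    ]
    print(slowest_stage(sample))
```

The median is used instead of the mean so that a single change stuck at a deploy gate for a week does not hide a review queue that is slow every day.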

Healthy ranges by team type:

  • Product team, web app: under 24 hours, ideally under 8.
  • Platform team, infra: under 48 hours.
  • Mobile team: 3-7 days, gated by app store review.

If your lead time is two weeks and your deploy frequency is daily, something is mathematically off. You're either deploying mostly trivial changes while big ones rot, or you're miscounting. Investigate before you put the slide up.

Change failure rate

Percent of deploys that cause a rollback, hotfix, or production incident. The third DORA metric. Target band is 0-15%, with healthy teams sitting at 5-10%.

Two things to watch. First, change failure rate of zero is a flag, not a flex. It usually means your team is over-testing, deploying too rarely, or under-counting incidents. Second, this metric has a natural floor. Software is hard, humans ship bugs, and a 2% rate is noise, not a signal of decline. Don't chase the last percentage point at the cost of velocity.

When change failure rate climbs past 15%, the diagnosis is almost always one of: insufficient test coverage on critical paths, a release process that batches too much (so failures become entangled), or new hires shipping to production before they understand the blast radius. Name which one. Don't average them away.
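The arithmetic and the bands above fit in a few lines. This is a sketch with hypothetical function names; what counts as a "failed" deploy (rollback, hotfix, or incident) still has to be decided and counted by the team.

```python
def change_failure_rate(total_deploys: int, failed_deploys: int) -> float:
    """Percent of deploys that caused a rollback, hotfix, or production incident."""
    if total_deploys <= 0:
        raise ValueError("need at least one deploy to compute a rate")
    return 100 * failed_deploys / total_deploys


def read_cfr(rate_pct: float) -> str:
    """Interpret a rate using the bands from the text (0-15% target, 5-10% healthy)."""
    if rate_pct == 0:
        return "flag: check for rare deploys or under-counted incidents"
    if rate_pct > 15:
        return "diagnose: name the cause"
    if 5 <= rate_pct <= 10:
        return "healthy"
    return "within target band"


if __name__ == "__main__":
    rate = change_failure_rate(total_deploys=100, failed_deploys=7)
    print(rate, read_cfr(rate))
```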

MTTR

Mean time to restore. When something breaks in production, how long until it's working again? Hours, not days. This is the fourth DORA metric and the one most tied to on-call hygiene.

A good MTTR is under 4 hours for a SaaS product. Under 1 hour for elite teams. The trick is that MTTR is not really about how fast you can write a fix. It's about how fast you can detect, diagnose, and deploy. Most slow MTTR I've debugged came from one of three places: alerting that didn't fire (so the team learned about the outage from a customer), runbooks that didn't exist (so the on-call engineer was figuring it out from scratch), or a deploy pipeline that takes 90 minutes (so even after the fix is written, you wait).
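To see which of the three phases dominates, it can help to log minutes spent in each phase per incident. A minimal sketch, assuming each incident is recorded as a dict with `detect`, `diagnose`, and `deploy` minutes (a hypothetical record shape, not a tool's export format):

```python
from statistics import mean

PHASES = ("detect", "diagnose", "deploy")


def mttr_hours(incidents):
    """Mean time to restore in hours, summing all phases of each incident."""
    return mean(sum(incident[p] for p in PHASES) for incident in incidents) / 60


def slowest_phase(incidents):
    """Phase with the largest average duration across incidents."""
    averages = {p: mean(incident[p] for incident in incidents) for p in PHASES}
    return max(averages, key=averages.get)


if __name__ == "__main__":
    sample = [
        {"detect": 10, "diagnose": 20, "deploy": 90},
        {"detect": 30, "diagnose": 40, "deploy": 90},
    ]
    print(round(mttr_hours(sample), 2), slowest_phase(sample))
```

In this sample the fix is fast to find and slow to ship, which points at the pipeline rather than at alerting or runbooks.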

If MTTR is creeping up, look at on-call load. Burnt-out on-call engineers are slow on-call engineers. This is the metric where deploy frequency, change failure rate, and team health touch each other.

12-month regrettable attrition

The first of the two team-health metrics, and the one your VP cares about most.

Regrettable attrition counts only people you wanted to keep. If a low performer leaves, that's not regrettable. If a steady mid-level engineer who never made waves leaves and you shrug, that's not regrettable. If a senior IC, a tech lead, or a high-trajectory junior leaves, that's regrettable. Be honest with yourself when you label it. Managers tend to retroactively decide everyone who left was "not the right fit," which is how you end up with 0% regrettable attrition and a team that's halfway gone.

Industry baselines for tech, 2026:

  • Healthy: 8-12% annual.
  • Watch zone: 13-15%.
  • Fire: above 15%.
  • Five-alarm fire: above 20%.

Track it on a 12-month rolling basis, not by calendar quarter. Quarterly numbers are too noisy on small teams. If you have eight engineers and one regrettable departure in Q2, that's 12.5% over the trailing twelve months, right at the edge of the healthy band. Annualize that single quarter and the same departure reads as 50%: terrifying on a quarterly chart.
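The gap between the two views is plain arithmetic. A small sketch with hypothetical function names, using average headcount over the period as the denominator (an assumption; some teams use start-of-period headcount):

```python
def rolling_attrition_pct(regrettable_exits_12mo: int, avg_headcount: float) -> float:
    """Regrettable departures over the trailing 12 months as a percent of headcount."""
    return 100 * regrettable_exits_12mo / avg_headcount


def annualized_quarter_pct(regrettable_exits_in_quarter: int, avg_headcount: float) -> float:
    """The same idea computed from one quarter and multiplied out to a year."""
    return 100 * 4 * regrettable_exits_in_quarter / avg_headcount


if __name__ == "__main__":
    # Eight engineers, one regrettable departure, and it happened in Q2.
    print(rolling_attrition_pct(1, 8))    # 12.5
    print(annualized_quarter_pct(1, 8))   # 50.0
```

One departure on an eight-person team produces both numbers, which is why the rolling view is the one to put on the slide.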

Hiring offer-accept rate plus ramp time

The second team-health metric, and the one that tells you whether your team can grow.

Two sub-metrics, one slide row. Offer-accept rate: of the offers you extended in the last quarter, what percent were accepted? Target is 70% or higher. Below 60% means your offers are not competitive (usually comp, sometimes title, occasionally story).

Ramp time: from start date to first non-trivial PR merged to production, how many days? Target under 30. If your average new hire takes 60 days to ship something real, your onboarding is broken or your codebase is a maze. Either way, that's a finding.

Both numbers together tell the hiring story. High accept rate, fast ramp: hiring engine works. High accept rate, slow ramp: you sell well, you onboard badly. Low accept rate, fast ramp: when people do join they thrive, but you can't close. Low accept rate, slow ramp: the engine is broken on both ends.
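The four combinations can be written as a small lookup. This sketch uses the targets stated above (70% accept rate, 30 days to ramp) as defaults; the function name and the exact handling of values that sit on a threshold are assumptions.

```python
def hiring_story(accept_rate_pct: float, ramp_days: float,
                 accept_target: float = 70.0, ramp_target: float = 30.0) -> str:
    """Classify the hiring engine from offer-accept rate and ramp time."""
    high_accept = accept_rate_pct >= accept_target   # target is 70% or higher
    fast_ramp = ramp_days < ramp_target              # target is under 30 days
    if high_accept and fast_ramp:
        return "hiring engine works"
    if high_accept:
        return "you sell well, you onboard badly"
    if fast_ramp:
        return "people thrive once they join, but you can't close"
    return "the engine is broken on both ends"


if __name__ == "__main__":
    print(hiring_story(accept_rate_pct=80, ramp_days=60))
```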

Engineering NPS

Quarterly pulse survey, one question: "Would you recommend this team to another engineer?" Zero to ten scale. Subtract the percentage of detractors (0-6) from the percentage of promoters (9-10). Passives (7-8) count toward the total but toward neither group. The result is between -100 and +100.

Healthy engineering teams score 30-50. Above 50 is great. Below 20 is a warning. Negative is a fire that's already started; you just haven't seen the smoke yet.
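The score is the standard NPS calculation applied to the team's responses. A minimal sketch, with a hypothetical function name and a made-up set of eight responses:

```python
def engineering_nps(scores):
    """Percent of promoters (9-10) minus percent of detractors (0-6).

    Passives (7-8) count in the total but in neither group.
    """
    if not scores:
        raise ValueError("need at least one response")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)


if __name__ == "__main__":
    responses = [10, 9, 9, 10, 8, 7, 6, 5]   # illustrative responses, not real data
    print(engineering_nps(responses))        # 4 promoters, 2 detractors, 8 total -> 25.0
```

On a team of eight, one person moving from a 7 to a 6 shifts the score by 12.5 points, so read the trend across quarters rather than a single reading.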

Run it the same week every quarter. Anonymous. Always include one open-ended follow-up: "What's the one thing that would move your score up by one point?" Read every response. The number is the headline; the comments are the diagnosis.

The "high velocity but high turnover" diagnostic

Here's the archetype that catches most EMs by surprise. Deploy frequency is high. Change failure rate is low. Lead time is excellent. MTTR is solid. By the four DORA metrics, the team is shipping like a champion.

And regrettable attrition is climbing. Engineering NPS is dropping. People you wanted to keep are leaving.

The team is shipping its way out of the company.

I lived this exactly once. We did 60 deploys a month, change failure rate of 4%, MTTR under 90 minutes, and I lost three engineers in a quarter, two of them senior. The metrics dashboard was lying by omission. The truth showed up in exit interviews: "I'm tired." "I haven't grown in 18 months." "My friend at a competitor makes 30% more for the same work."

There are usually three causes, and they often stack:

  1. Burnout. High velocity is sustainable for a quarter, sometimes two. Past that, you're borrowing against the team's future.
  2. Comp gap. The market moved, your bands didn't, and your seniors know it because they get LinkedIn messages every Tuesday.
  3. No growth path. People stayed when they were learning. Once the codebase felt small, they looked for a bigger one.

Name the cause on the slide. Don't average it into a generic "people are leaving, we'll work on engagement." That sentence has never saved a team.
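The archetype can be sketched as a check that compares two quarters: delivery metrics look fine while both team-health metrics move the wrong way. The thresholds reuse numbers from earlier sections (lead time under 24 hours, change failure rate up to 15%, MTTR under 4 hours); the `Quarter` record and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Quarter:
    deploys_per_week: float
    lead_time_hours: float
    change_failure_pct: float
    mttr_hours: float
    regrettable_attrition_pct: float   # 12-month rolling
    engineering_nps: float


def dora_looks_healthy(q: Quarter) -> bool:
    """All four delivery metrics inside the ranges used earlier in this guide."""
    return (q.deploys_per_week >= 1
            and q.lead_time_hours < 24
            and q.change_failure_pct <= 15
            and q.mttr_hours < 4)


def shipping_out_of_company(previous: Quarter, current: Quarter) -> bool:
    """True when delivery is healthy but attrition is rising and NPS is falling."""
    return (dora_looks_healthy(current)
            and current.regrettable_attrition_pct > previous.regrettable_attrition_pct
            and current.engineering_nps < previous.engineering_nps)


if __name__ == "__main__":
    q1 = Quarter(15, 10, 4, 1.5, 9, 42)
    q2 = Quarter(15, 10, 4, 1.5, 16, 28)
    print(shipping_out_of_company(q1, q2))
```

A check like this only raises the flag. Which of the three causes is behind it still comes from exit interviews and the survey comments.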

Vanity metrics to stop reporting

Goodhart's law: when a measure becomes a target, it ceases to be a good measure. Every metric on this list, including the six above, is vulnerable. But these four are vulnerable enough that you should retire them entirely:

  • Story points completed. Inflation guaranteed. Ships smaller stories framed as bigger ones. Tells you nothing about what reached production.
  • Tickets closed. Same problem, different scoreboard. Encourages tickets to multiply.
  • Lines of code. A measure of typing, not engineering. Penalizes deletions, which are usually the best PRs.
  • Hours logged. Measures presence, not output. Punishes parents and rewards performative late-night Slack messages.

Swap them out. Story points and tickets become deploy frequency and lead time. Lines of code becomes change failure rate. Hours logged becomes engineering NPS, because if your team is healthy and shipping, you don't need to count hours.

The QBR slide

One slide, six metric rows, three columns: current, target, trend arrow. Add one sentence of diagnosis per row. That's the format.

Sample layout:

Metric                       | Current   | Target    | Trend | Diagnosis
Deploy frequency             | 12/week   | Daily     | up    | CI improvements landed in March; trunk-based working
Lead time                    | 31 hours  | Under 24h | flat  | Review queue is the bottleneck; piloting auto-assign in Q2
Change failure rate          | 7%        | Under 10% | flat  | Healthy band; new-hire shipping process is holding
MTTR                         | 2.4 hours | Under 4h  | down  | Runbook coverage hit 80% this quarter
Regrettable attrition (12mo) | 14%       | Under 12% | up    | Two senior departures Q1; comp review and growth-path audit underway
Engineering NPS              | 38        | 40+       | flat  | On-call load is the top open-ended complaint

Six rows. Six diagnoses. No story points. No velocity chart hidden in an appendix. If a number is bad, say so plainly and name what you're doing about it. VPs trust EMs who name problems faster than the VP would have spotted them. They distrust EMs who hide.
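If the slide is generated from data, a small check can list the rows that miss target, so none goes up without a plan attached. This is a sketch: the `MetricRow` shape is an assumption, and it covers only the rows of the sample layout that have a plain numeric target.

```python
from dataclasses import dataclass


@dataclass
class MetricRow:
    name: str
    current: float
    target: float
    higher_is_better: bool
    diagnosis: str

    def on_target(self) -> bool:
        """'40+' style targets must be met or beaten; 'under X' targets must be undercut."""
        if self.higher_is_better:
            return self.current >= self.target
        return self.current < self.target


def rows_needing_a_plan(rows):
    """Names of rows that miss target and therefore need a stated action."""
    return [row.name for row in rows if not row.on_target()]


SAMPLE_ROWS = [
    MetricRow("Lead time (hours)", 31, 24, False, "Review queue is the bottleneck"),
    MetricRow("Change failure rate (%)", 7, 10, False, "Healthy band"),
    MetricRow("MTTR (hours)", 2.4, 4, False, "Runbook coverage hit 80%"),
    MetricRow("Regrettable attrition (%)", 14, 12, False, "Two senior departures Q1"),
    MetricRow("Engineering NPS", 38, 40, True, "On-call load is the top complaint"),
]

if __name__ == "__main__":
    print(rows_needing_a_plan(SAMPLE_ROWS))
```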

What to ask your VP for

Close the QBR with a question, not a victory lap. Ask which two of the six metrics matter most for the company this quarter, and what target band would be considered a win.

Trying to optimize all six at once is how you get nothing. A team that's working on shipping speed is not also working on hiring throughput, and a team that's working on hiring is going to take a temporary hit on lead time while seniors interview candidates instead of reviewing PRs. Pick two. Get alignment on the targets. Re-run the question next quarter.

If your VP says "all six," gently push back. Say: "If I had to under-invest in two of them this quarter to fully invest in two others, which two should I deprioritize?" That question almost always gets a real answer, because it forces a tradeoff into the open.

This is the version of the QBR I should have walked into the first time. Six metrics. Real numbers. Named diagnoses. One slide. The story-points chart stays in the drawer where it belongs.

About the author

Camellia

Principal Product Marketing Strategist

Camellia is Principal Product Marketing Strategist at Rework, helping B2B buyers pick the right software with confidence. With 6+ years in product marketing and 150+ SaaS tools evaluated across CRM, project management, and sales engagement, Camellia turns competitive intelligence into clear, honest comparisons. Readers get vendor evaluations they can trust to cut through marketing noise and decide faster.