FP&A Metrics: Forecast Accuracy, Model Velocity, Business Partner NPS
FP&A is the only finance function that doesn't measure itself.
Accounting has close-day. Treasury has DSO and cash forecast variance. Tax has effective rate and on-time filing. Walk into any FP&A team and ask "how accurate was last quarter's revenue forecast, by line item, to one decimal place?" and most of the room can't answer. The senior analyst who can usually pulls it from a deck they made for themselves, not from a system anyone else sees.
That's the problem. If you can't show your Director a trend line on your own work, you'll get measured on optics. Who stayed latest. Who built the prettiest deck. Who jumped on the ad-hoc fastest. None of those are output. They're activity.
This is the five-metric scorecard that fixes it. None of them are exotic, none require a new system, and you can baseline all five in six weeks of self-tracking before anyone else sees a number.
Why running metrics on yourself feels weird (but isn't optional)
The first reaction from most analysts is "we'd be exposing ourselves." Right. That's the point.
Every other function in the company has KPIs that get reviewed up the chain. Sales has quota attainment. Marketing has CAC and pipeline coverage. Engineering has incident rate and lead time. The only people who get to operate without a public scorecard are the ones who report on everyone else's scorecards. Read that sentence twice.
When you don't measure your own output, three things happen:
- Your CFO grades you on vibes. "Good quarter, you worked hard." Vibes don't get you promoted on a predictable cadence.
- Mistakes hide. A model with a stale formula keeps shipping forecasts until something breaks loudly enough to trace back six months.
- The function gets cut first in a downturn. Functions without measurable output are the easiest to trim because nobody can quantify what's lost.
Self-measurement isn't a vanity exercise. It's how you trade hours for credibility. The CFO doesn't care that you stayed until 11pm on Tuesday. The CFO cares whether next quarter's forecast is going to be within 5%.
Metric 1: Forecast accuracy (±5% revenue, ±3% opex, line-item)
The headline metric. Most teams already track this badly.
The two ways teams get this wrong:
Wrong way 1: bottom-line only. "We came in within 2% of total revenue." Sounds great. Inside that 2% is +18% on Enterprise, -15% on SMB, and a lucky FX tailwind that masked both. Errors net out at the bottom line and lie to you. If your model is wrong on segment mix, you'll keep making the same wrong calls on resource allocation, hiring plans, and territory coverage, and the bottom-line accuracy number will keep telling you everything is fine.
Wrong way 2: variance vs. budget instead of vs. forecast. Budget was set in Q4 of last year. Re-forecasting is the whole job. Measuring yourself against a 9-month-old number rewards you for being wrong slowly.
The right way: measure forecast accuracy at the line-item level, against the most recent re-forecast, on the metrics that drive decisions. For a B2B SaaS, that's typically:
- New ARR by segment (Enterprise / Mid-market / SMB)
- Expansion ARR
- Gross churn ARR
- Headcount by function
- Top 5 opex line items (people cost, T&E, marketing program spend, software, professional services)
Targets I've seen hold up across mid-size companies: ±5% on revenue lines, ±3% on opex lines. Opex should be tighter because most of it is contracted or headcount-driven and you control the inputs. If you're missing opex by 8%, your model is missing something structural (usually a new vendor someone forgot to tell you about, or a hiring plan that drifted).
Track it monthly. Plot a trend. The goal isn't to hit ±5% every month from day one — it's for the line to slope toward the target over two or three quarters as you fix the model.
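A minimal sketch of the line-item calculation, reusing the +18% / -15% netting example from above. The `LineItem` structure, the sample numbers, and the choice to express error as a percentage of forecast (rather than of actual) are all assumptions for illustration; use whichever convention your team already reports.

```python
from dataclasses import dataclass


@dataclass
class LineItem:
    name: str
    forecast: float       # most recent re-forecast, not the original budget
    actual: float
    tolerance_pct: float  # 5.0 for revenue lines, 3.0 for opex lines


def pct_error(forecast: float, actual: float) -> float:
    """Signed error as a % of forecast (positive = actual above forecast)."""
    return (actual - forecast) / forecast * 100.0


def accuracy_report(items):
    """Return (name, signed % error, within tolerance?) per line item."""
    report = []
    for item in items:
        err = round(pct_error(item.forecast, item.actual), 1)
        report.append((item.name, err, abs(err) <= item.tolerance_pct))
    return report


# Illustrative numbers only.
items = [
    LineItem("New ARR - Enterprise", forecast=1000.0, actual=1180.0, tolerance_pct=5.0),
    LineItem("New ARR - SMB", forecast=1000.0, actual=850.0, tolerance_pct=5.0),
]

# The bottom line nets the two misses against each other.
total_error = round(
    pct_error(sum(i.forecast for i in items), sum(i.actual for i in items)), 1
)

for name, err, ok in accuracy_report(items):
    print(f"{name}: {err:+.1f}% ({'within' if ok else 'OUTSIDE'} tolerance)")
print(f"Total: {total_error:+.1f}%")
```

Both segments miss badly while the total lands at +1.5%, which is exactly the lie the bottom-line number tells.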
Metric 2: Model velocity / time-to-deliver
How fast can FP&A turn a request into a usable model.
Two flavors:
Ad-hoc requests: "Can you model what happens if we cut SMB headcount by 20%?" Target: v1 in the requester's hands in under 48 hours. V1 doesn't have to be perfect. It has to be directional and labeled clearly as v1. The sin is making them wait a week for something that takes you four hours but spent six days in your queue.
New recurring models: A new monthly opex tracker, a new commission calculator, a new cohort retention view. Target: shipped within 2 weeks of scoping. Anything longer and the requester loses interest and starts building it themselves in a Google Sheet, which is how you end up with shadow models that contradict yours.
Tracking is simple. Every request that comes in gets a row: requester, ask, scoped date, v1 delivered date. Hours-to-v1 (v1 delivered minus scoped) is the metric. Median is more useful than average because one nasty request will skew the average and let you ignore the pattern.
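Here is one way the request log could be turned into the metric. The rows, the `REQUEST_LOG` name, and the timestamps are hypothetical; the only real logic is "delivered minus scoped, then take the median."

```python
from datetime import datetime
from statistics import mean, median

# Hypothetical log rows: (requester, ask, scoped, v1 delivered).
REQUEST_LOG = [
    ("Sales", "SMB headcount cut scenario",
     datetime(2024, 3, 4, 9), datetime(2024, 3, 5, 9)),
    ("Marketing", "Program spend reallocation",
     datetime(2024, 3, 6, 9), datetime(2024, 3, 7, 21)),
    ("Product", "Pricing tier what-if",
     datetime(2024, 3, 11, 9), datetime(2024, 3, 13, 1)),
    ("CS", "Churn cohort rebuild",
     datetime(2024, 3, 14, 9), datetime(2024, 3, 24, 9)),
]


def hours_to_v1(scoped: datetime, delivered: datetime) -> float:
    """Elapsed hours between scoping and v1 delivery."""
    return (delivered - scoped).total_seconds() / 3600.0


hours = [hours_to_v1(scoped, delivered) for _, _, scoped, delivered in REQUEST_LOG]
median_hours = median(hours)
mean_hours = mean(hours)

print(f"Median hours to v1: {median_hours:.0f}")
print(f"Mean hours to v1:   {mean_hours:.0f}")
```

With these sample rows the median is 38 hours (inside the 48-hour target) while the mean is 85, dragged up by the single 10-day request.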
The reason velocity matters more than people think: slow FP&A trains the business not to ask. If a sales leader knows it'll be a week before they get a model, they stop asking and make the call without one. You don't lose your seat at the strategy table by being wrong. You lose it by being slow enough that nobody waits for you.
Metric 3: Business partner NPS
The metric that tells you whether the rest of the company actually wants to work with you.
Send a 2-question survey to your business partners (sales, marketing, ops, product, CS leaders) once a quarter. Two questions only:
- On a scale of 0-10, how likely are you to recommend working with FP&A to a peer leader at another company?
- What's the one thing we should change?
Calculate NPS the standard way: % promoters (scores of 9-10) minus % detractors (scores of 0-6). Most FP&A teams I've seen score in the 20-40 range. Good teams hit 60+. Above 70 is rare and usually means the team is structurally embedded: the analyst sits in the sales leader's staff meetings, not just in finance reviews.
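For anyone who hasn't computed it before, a short sketch of the standard NPS arithmetic: promoters score 9-10, detractors score 0-6, passives (7-8) count only in the denominator. The sample responses are made up.

```python
def nps(scores):
    """Net Promoter Score: % promoters (9-10) minus % detractors (0-6)."""
    if not scores:
        raise ValueError("need at least one response")
    if any(s < 0 or s > 10 for s in scores):
        raise ValueError("scores must be on the 0-10 scale")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return round((promoters - detractors) * 100 / len(scores))


# Ten hypothetical business-partner responses for one quarter.
responses = [10, 9, 9, 8, 8, 7, 6, 4, 10, 9]
print(nps(responses))  # 5 promoters, 2 detractors, 10 responses
```

With a handful of respondents, one person changing their score moves the number a lot, which is another reason to read the trend and the comments rather than a single quarter.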
The script for asking, because the awkwardness is what stops most analysts from running this:
"Hey [Director], we're trying to make FP&A more useful to your team. Can I send you a 2-question survey at the end of the quarter? It's literally two questions, takes 30 seconds. We'll share the results back and tell you what we're changing."
That's it. No build-up, no apology, no "I know you're busy." Send it via whatever they actually read. Slack DM beats email beats survey tool. Aggregate results, share back what you heard, name what you're changing. Then run it again next quarter and watch the trend.
The qualitative answers are where the real value lives. The number tells you the trend. The comments tell you what to fix.
Metric 4: Model error rate
Bugs caught after delivery, per 10 models shipped.
Cell reference errors. Broken formulas. Stale assumptions someone forgot to refresh. A SUMIF that worked in February and silently broke when a new segment got added in March. The kind of thing that makes a CFO email at 9pm asking "is this number right?"
Target: fewer than 0.5 caught-after-delivery errors per 10 models. Translation: at most one model in twenty has an error that gets flagged after you sent it. Anything above 1.0 (one in ten) means your QA is broken or your modeling speed is outrunning your review.
How to track honestly: keep a running list. Every time someone (you, your manager, the requester) finds an error in a model after it shipped, log it. Date, model, what broke, root cause. Most root causes fall into four buckets:
- Cell reference broke when rows were inserted
- Hard-coded assumption that should have been a driver
- Formula didn't extend when columns were added
- Source data changed format and the lookup silently returned blanks
Each of those has a fix at the build stage. Named ranges instead of cell refs. Driver tabs separate from calc tabs. Tables instead of ranges so formulas auto-extend. Data validation on source imports.
The point of tracking error rate isn't to feel bad. It's to make the patterns visible so you can fix the build process, not just the bug.
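A rough sketch of the running list and the rate calculation. The log entries, the bucket labels, and the count of 40 models shipped are hypothetical; the buckets simply mirror the four root causes listed above.

```python
from collections import Counter


def errors_per_10_models(errors_logged: int, models_shipped: int) -> float:
    """Caught-after-delivery errors, normalized per 10 models shipped."""
    if models_shipped <= 0:
        raise ValueError("models_shipped must be positive")
    return errors_logged * 10 / models_shipped


# Hypothetical error log: one entry per error found after a model shipped.
ERROR_LOG = [
    {"date": "2024-02-12", "model": "Commission calculator",
     "root_cause": "hard_coded_assumption"},
    {"date": "2024-03-03", "model": "Opex tracker",
     "root_cause": "cell_reference"},
    {"date": "2024-03-19", "model": "Cohort retention view",
     "root_cause": "hard_coded_assumption"},
]
MODELS_SHIPPED = 40  # assumed count for the same period

rate = errors_per_10_models(len(ERROR_LOG), MODELS_SHIPPED)
by_cause = Counter(entry["root_cause"] for entry in ERROR_LOG)

print(f"Errors per 10 models: {rate:.2f}")
print("Most common root cause:", by_cause.most_common(1)[0])
```

Three errors on 40 models is 0.75 per 10: above the 0.5 target but under the 1.0 line, and the tally points at hard-coded assumptions as the build habit to fix first.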
Metric 5: Ad-hoc backlog age
Median age of open ad-hoc requests, in days.
If your median is over 14 days, you're not doing FP&A. You're doing reporting triage, and the strategic work isn't getting done because you're stuck servicing a queue.
Why median, not average: one ancient request that's been sitting for 90 days will skew the average and hide the fact that the rest of the queue is actually moving. Median tells you what the typical request experiences.
Target: median under 7 days, max under 21. If something has been in the queue more than 21 days, it should either get scoped properly and put on the recurring-model roadmap, or get killed honestly with a "we're not going to do this" note to the requester. The worst outcome is leaving it open forever. The requester loses trust, and the queue keeps growing.
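The backlog check can be sketched in a few lines. The open requests and the `today` date are invented for illustration; the 21-day cutoff comes from the target above.

```python
from datetime import date
from statistics import mean, median

MAX_AGE_DAYS = 21  # from the target above: scope it properly or kill it

# Hypothetical open ad-hoc requests: (ask, date opened).
OPEN_REQUESTS = [
    ("Pricing tier what-if", date(2024, 6, 26)),
    ("T&E by region cut", date(2024, 6, 24)),
    ("Win-rate sensitivity", date(2024, 6, 23)),
    ("Vendor consolidation scenario", date(2024, 6, 19)),
    ("Territory redesign model", date(2024, 3, 30)),
]
today = date(2024, 6, 28)

ages = [(today - opened).days for _, opened in OPEN_REQUESTS]
median_age = median(ages)
mean_age = mean(ages)
# Anything past the cutoff needs a decision, not more time in the queue.
overdue = [ask for ask, opened in OPEN_REQUESTS
           if (today - opened).days > MAX_AGE_DAYS]

print(f"Median age: {median_age} days, mean age: {mean_age} days")
print("Scope or kill:", overdue)
```

Here the median is 5 days while the mean is 22, and the single 90-day request is the one that needs a scope-or-kill decision.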
Backlog age is the metric that tells you when you need to hire. If accuracy is good, velocity on what you ship is good, but median backlog age is climbing past 14 days, you don't have a quality problem. You have a capacity problem. That's a very different conversation with the CFO.
The "low accuracy because the inputs are stale" diagnostic
When forecast accuracy misses by 10%+, the first instinct is to rebuild the model. Don't. Eight times out of ten, the model is fine. The inputs are stale.
Run the input-staleness audit before you touch a single formula:
- Open the model. List every assumption that drives a forecast line. New logos by segment. Average deal size. Sales cycle length. Win rate. Headcount by function. Vendor spend by category.
- Timestamp each one. When was this number last refreshed by the source-of-truth owner? Sales pipeline number from sales ops, headcount from HR, vendor spend from AP.
- Flag anything older than the forecast cycle. If you re-forecast monthly, anything older than the start of the current cycle is suspect. If quarterly, anything older than the start of the current quarter.
- Trace the largest misses to the stalest inputs. Plot misses against input-age. The correlation is almost always there.
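The first three steps of the audit can be sketched as a simple filter. The `ModelInput` structure, the input names, owners, and dates are all hypothetical, and the example assumes a monthly re-forecast cycle that started on April 1 with forecast lock on April 10.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ModelInput:
    name: str
    owner: str            # source-of-truth owner
    last_refreshed: date  # when the owner last updated the number


def stale_inputs(inputs, cycle_start: date, lock_date: date):
    """Inputs refreshed before the cycle started, stalest first.

    Returns (name, owner, age in days at forecast lock).
    """
    flagged = [
        (i.name, i.owner, (lock_date - i.last_refreshed).days)
        for i in inputs
        if i.last_refreshed < cycle_start
    ]
    return sorted(flagged, key=lambda row: row[2], reverse=True)


# Illustrative inputs only.
inputs = [
    ModelInput("Sales pipeline", "Sales ops", date(2024, 3, 20)),
    ModelInput("Headcount by function", "HR", date(2024, 4, 3)),
    ModelInput("Vendor spend by category", "AP", date(2024, 2, 15)),
]

flagged = stale_inputs(inputs, cycle_start=date(2024, 4, 1),
                       lock_date=date(2024, 4, 10))
for name, owner, age in flagged:
    print(f"{name} ({owner}): {age} days old at lock")
```

The output is the list to put next to your largest misses for the final step: if the stalest inputs feed the lines that missed worst, the model probably isn't the problem.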
Real example pattern: forecast missed new ARR by 12%. Model is fine. The pipeline number that fed it was pulled three weeks before close, and in those three weeks two large deals slipped to next quarter that nobody told finance about. The fix isn't model rework. The fix is a standing 30-minute weekly check-in with sales ops where pipeline gets refreshed before forecast lock.
This diagnostic is also how you push input-quality work back to the input owners. Sales ops doesn't refresh pipeline because finance doesn't ask. Once you can show the CFO "our forecast missed by 12% because pipeline data was 21 days old at lock," sales ops gets a real reason to fix the cadence. You stop owning a problem that isn't yours to fix.
The QBR slide that changes the conversation
One slide, four quadrants, every quarter, in front of your Director or CFO. This is what reframes the conversation from "what are you working on" to "is the function getting better."
Layout:
+---------------------------------------+---------------------------------------+
| TOP-LEFT: Forecast accuracy trend     | TOP-RIGHT: Model velocity trend       |
| Line chart, last 8 quarters           | Bar chart, last 8 quarters            |
| - Revenue accuracy %                  | - Median hours to v1 (ad-hoc)         |
| - Opex accuracy %                     | - Median days to ship (recurring)     |
| Target lines at ±5% and ±3%           | Target lines at 48h and 14d           |
+---------------------------------------+---------------------------------------+
| BOTTOM-LEFT: Business partner NPS     | BOTTOM-RIGHT: Top 3 misses, named     |
| Number + trend arrow                  | 1. Q1 new ARR -12%: pipeline staleness|
| Top 2 qualitative themes              | 2. Q1 opex +4%: vendor onboarded late |
| (e.g., "wants faster ad-hoc",         | 3. Headcount +6: hiring plan drift    |
| "wants segment-level detail")         | Each with named root cause + owner    |
+---------------------------------------+---------------------------------------+
Three rules for the slide:
- No talking points underneath. The slide is the talking points. If you need a script, the slide is wrong.
- Misses are named with root cause. "Q1 new ARR missed by 12% because of pipeline staleness, fixed in Q2 with weekly sales ops sync." Not "we missed by 12% due to market conditions." Markets aren't a root cause. Stale pipeline is.
- Show the trend, not the snapshot. A 60 NPS this quarter doesn't mean anything. A 40 → 48 → 55 → 60 trend means the function is improving.
Run this slide every quarter for four quarters. By quarter three, the CFO stops asking what you're working on and starts asking what's next. That's the reframe. That's what you're optimizing for.
Vanity metrics that actually hurt the function
A short list of metrics that look like output but reward the wrong behavior:
- Number of models built. Rewards volume over impact. The team that ships 40 mediocre models gets the same credit as the team that ships 12 that drive decisions.
- Hours logged in models. Rewards slowness. The analyst who solves it in two hours looks worse than the one who took two days.
- Variance commentary word count. Rewards waffle. Long commentary usually means the analyst doesn't know the cause and is hedging. Two sentences with a named root cause beats two paragraphs of "due to a combination of factors."
- Number of dashboards owned. Rewards sprawl. A team with 23 dashboards has 19 nobody reads.
- Speed of close support. Important for accounting, not for FP&A. Close-day is accounting's metric. FP&A's job starts after close.
If you're being measured on any of these, the conversation to have with your manager is "I'd rather be measured on accuracy and velocity. Here's the trend so far."
How to actually start
Don't roll out all five at once. Pick three this quarter. The combination I'd start with:
- Forecast accuracy (line-item, vs. last re-forecast). The headline metric. Non-negotiable.
- Ad-hoc backlog age. Easiest to track, hardest to fake, tells you whether you have a capacity problem.
- Business partner NPS. Slowest to move, but the qualitative answers will reshape what you build.
Track them privately for six weeks before showing anyone. Get a baseline you trust. Find the obvious problems and fix them quietly so the first number your CFO sees isn't your worst one.
Then build the QBR slide. Put it in front of your Director or CFO at the next review. Keep building it every quarter.
By the third quarter, the function won't be measured by hours-worked anymore. It'll be measured by output, by trend, by the number of decisions it's improving. That's the seat at the strategy table, and that's how you earn it.