Health Score Models: Designing Effective Customer Health Scoring

A SaaS company tracked customer health using a simple model: green if they logged in this month, yellow if they didn't, red if they hadn't logged in for two months.

The problem: Their churn rate was 15%, but they only predicted 40% of churned customers. Even worse, 30% of their "green" customers churned anyway.

The VP of Customer Success asked: "Why is our health score so bad at predicting anything?"

They dug into the data and found:

  • Login frequency alone was basically useless for predicting retention
  • They weren't measuring engagement quality, relationship depth, or whether customers actually saw value
  • Every signal got equal weight, even though some mattered way more than others
  • They missed declining patterns because they only looked at the current month
  • A one-size-fits-all approach meant enterprise and SMB customers got scored identically

So they rebuilt their health score from scratch:

  • Multiple dimensions: usage, engagement, sentiment, relationship, value
  • Weighted scoring based on what actually predicted retention (usage 35%, engagement 20%, etc.)
  • Trending and momentum tracking—because direction matters as much as the score itself
  • Different models for different segments (enterprise vs SMB have different "healthy" baselines)
  • Quarterly validation against actual renewal outcomes

Six months later:

  • They predicted 82% of churned customers (up from 40%)
  • False positives dropped 60% (way fewer healthy accounts getting flagged as at-risk)
  • Intervention success rate jumped 45% (because they were acting on real signals, not noise)
  • They identified 25 expansion opportunities they would have missed before

The lesson: Not all health scores are created equal. Building one that actually works takes thoughtful design, continuous validation, and a willingness to keep refining it.

Health Score Fundamentals

Purpose and Use Cases

What Health Scores Actually Do: A customer health score quantifies the likelihood that a customer will achieve their goals, stick around long-term, and grow their relationship with you. That's the theory, anyway. In practice, it's your answer to "Should I be worried about this account?"

Here's where you'll actually use them:

CSM Prioritization:

  • Which accounts need me to drop everything and call them right now?
  • Where should I spend my limited time today?
  • Which accounts are fine with quarterly check-ins?

Risk Management:

  • Which customers might churn if I don't do something?
  • How bad is it—yellow alert or red alert?
  • Do I need to intervene this week or can it wait?

Opportunity Identification:

  • Which accounts are ready for an expansion conversation?
  • Where can I push for deeper adoption without being annoying?
  • Who's happy enough to become a reference customer?

Forecasting:

  • What's our retention rate looking like next quarter?
  • How much revenue might walk out the door?
  • What's realistically in our expansion pipeline?

Executive Reporting:

  • Overall portfolio health (the dashboard executives actually look at)
  • How things are trending month to month
  • Whether our initiatives are working or we're just busy

Types of Health Scores

You've got three basic flavors of health scores, and they build on each other in complexity.

Descriptive Health Scores: These tell you where things stand right now. "This customer is healthy" or "this one's at risk." They look at recent behavior and current metrics. This is what most companies start with, and honestly, where many stay.

Example: Account XYZ has 75% active users, attended their last QBR, and gave you an NPS of 8. Health score: 78 (Healthy). Simple snapshot of where they are today.

Predictive Health Scores: These try to tell you where things are headed. "This customer will probably churn in 90 days based on their current trajectory." They look at patterns and trends over time. You need decent historical data to pull this off.

Example: Account XYZ's usage is declining 30% per month. Right now they're at a "moderate" 65, but if you run the numbers, they'll hit 42 (At Risk) in 90 days. The insight? Intervene now while you still have a relationship, not when they're already one foot out the door.

Prescriptive Health Scores: These tell you what to do about it. "This customer needs re-onboarding, here's the playbook." They compare patterns from similar accounts to recommend specific actions. This is the most sophisticated approach and usually needs machine learning or a really good data science team.

Example: Account XYZ has a health score of 58. Your system spots that accounts with similar patterns improved by 12-15 points after a targeted feature adoption campaign. Recommended action: Launch the same playbook for this account.

Which one should you build? Start with descriptive—it's your foundation. Add predictive once you have enough historical data to spot patterns. Only build prescriptive if you have the data science resources and enough accounts to make the patterns meaningful.

Score Components and Dimensions

Here are the dimensions most companies track, roughly in order of how much they matter:

1. Product Usage and Adoption (30-40% weight)

  • Active users (both the raw number and percentage of licenses they're paying for)
  • Login frequency
  • Feature breadth (how many features they actually use)
  • Feature depth (are they power users or just scratching the surface?)
  • Usage trends (growing, flat, or declining)

Why it matters: Usage predicts retention better than anything else. Customers who use your product stick around. Customers who don't are already halfway out the door.

2. Engagement and Activity (15-25% weight)

  • How often your CSM talks to them
  • Whether they show up to QBRs
  • Training and webinar attendance
  • Community involvement
  • Email engagement (opens, clicks, responses)
  • How quickly they respond when you reach out

Why it matters: Engaged customers have invested time and energy into the relationship. Disengaged customers are one competitive email away from switching.

3. Relationship and Sentiment (15-25% weight)

  • Do they have an executive sponsor?
  • Is there an identified champion, and are they still engaged?
  • NPS and CSAT scores
  • Feedback sentiment (are they happy or frustrated?)
  • Relationship strength (your CSM's gut feeling, quantified)
  • Stakeholder coverage (how many people do you know there?)

Why it matters: Strong relationships survive product bugs and pricing increases. Weak relationships don't survive much of anything.

4. Support and Issue Resolution (10-15% weight)

  • Support ticket volume
  • Issue severity (P1 emergencies vs minor questions)
  • How long issues take to resolve
  • Support satisfaction ratings
  • Escalations

Why it matters: Lots of serious tickets means either the product doesn't fit or you've got quality problems. A clean support history usually means smooth sailing.

5. Business Outcomes and Value (10-20% weight)

  • Goals achieved (the ones they told you about during the sales process)
  • ROI demonstrated (can they point to actual impact?)
  • Use cases expanded (started with sales, now marketing's using it too)
  • Value milestones hit
  • Business impact metrics they actually care about

Why it matters: Customers who see clear value renew. Customers who can't articulate ROI are vulnerable at renewal time.

6. Financial and Commercial (5-10% weight)

  • Payment history (on-time vs consistently late)
  • Contract status
  • Expansion history
  • Budget signals (did they just announce layoffs?)

Why it matters: Late payments often predict churn. Past expansion usually signals satisfaction.

Weighting and Calculation Methods

How to Figure Out the Right Weights:

Don't just guess. Here's how to do it properly:

Step 1: Dig Into Your Historical Data

Run a correlation analysis between each dimension and actual retention. This shows you what really predicts whether customers stick around.

Example Analysis:

  • Usage dimension correlation with retention: 0.72 (strong predictor)
  • Engagement dimension correlation: 0.48 (moderate predictor)
  • Sentiment dimension correlation: 0.35 (weak to moderate)
  • Financial dimension correlation: 0.18 (weak predictor)
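If you can export per-account dimension scores and renewal outcomes into a table, a minimal version of this analysis might look like the sketch below (column names and values are hypothetical):

```python
import pandas as pd

# Hypothetical export: one row per account, dimension scores 0-100,
# plus a binary outcome (1 = renewed, 0 = churned).
accounts = pd.DataFrame({
    "usage":      [82, 45, 91, 30, 77, 55, 88, 20],
    "engagement": [70, 50, 85, 40, 60, 45, 90, 35],
    "sentiment":  [65, 55, 80, 50, 70, 40, 75, 45],
    "financial":  [90, 85, 95, 70, 88, 80, 92, 60],
    "renewed":    [1,  0,  1,  0,  1,  0,  1,  0],
})

# Correlate each dimension with the binary renewal outcome
# (equivalent to a point-biserial correlation).
correlations = accounts.drop(columns="renewed").corrwith(accounts["renewed"])
print(correlations.sort_values(ascending=False))
```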

Step 2: Weight Based on Predictive Power

Give the most weight to dimensions that actually predict retention. Don't treat everything equally just because it feels fair.

Example Weighting:

  • Usage: 35% (strongest predictor gets the most weight)
  • Engagement: 25%
  • Value: 20%
  • Relationship: 15%
  • Financial: 5% (weak predictor gets minimal weight)

Step 3: Test It and Adjust

Run your weighted model against historical outcomes. If it's not accurate, adjust and try again. This isn't a one-and-done exercise.

Calculation Example:

Dimension      Weight   Raw Score (0-100)   Weighted Score
Usage          35%      80                  28.0
Engagement     25%      70                  17.5
Value          20%      75                  15.0
Relationship   15%      60                  9.0
Financial      5%       90                  4.5
Total          100%                         74.0

Final Health Score: 74 (Moderate)
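As a quick sanity check, here is the same calculation in code (a minimal sketch using the example weights and scores above):

```python
# Hypothetical dimension scores (0-100) and weights for one account.
weights = {"usage": 0.35, "engagement": 0.25, "value": 0.20,
           "relationship": 0.15, "financial": 0.05}
scores = {"usage": 80, "engagement": 70, "value": 75,
          "relationship": 60, "financial": 90}

# Weighted average: multiply each dimension score by its weight and sum.
health_score = sum(scores[d] * w for d, w in weights.items())
print(round(health_score, 1))  # 74.0 -> "Moderate"
```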

Setting Score Ranges and Thresholds

Standard Health Score Ranges:

Healthy (75-100):

  • Strong usage and engagement
  • Positive sentiment
  • Retention looks solid
  • Probably ready for expansion conversations
  • What to do: Keep the relationship warm, look for expansion opportunities, ask for referrals

Moderate (50-74):

  • Acceptable but could be better
  • Some gaps in usage or engagement that need attention
  • They'll probably renew, but it's not a sure thing
  • What to do: Run proactive improvement initiatives, fix the specific gaps you're seeing

At Risk (25-49):

  • Low or declining usage
  • Weak engagement or relationship
  • Retention is genuinely at risk here
  • What to do: Drop everything, intervene now, get a save plan together, escalate if needed

Critical (0-24):

  • Barely using the product or completely dormant
  • Zero engagement
  • They're probably going to churn unless you pull off a miracle
  • What to do: Executive escalation, all-hands-on-deck save effort

Different Segments Need Different Thresholds:

Not all customers are created equal. What's "healthy" for an enterprise customer might be concerning for an SMB customer.

Enterprise Customers:

  • Healthy: 70+ (complex products take forever to roll out)
  • At Risk: <50
  • Why: Enterprise customers have long adoption curves. Lower usage early on doesn't mean they're unhappy—it means they're still getting 5 departments to agree on a workflow.

SMB Customers:

  • Healthy: 80+ (simpler products, faster adoption)
  • At Risk: <60
  • Why: SMB customers should be up and running fast. If they're not, something's wrong.

Your thresholds should reflect your actual data and how different segments behave.
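One lightweight way to encode segment-specific thresholds is a simple lookup, as in the sketch below (the threshold values mirror the enterprise/SMB examples above; segment names are placeholders):

```python
# Hypothetical segment-specific thresholds; tune these to your own data.
THRESHOLDS = {
    "enterprise": {"healthy": 70, "at_risk": 50},
    "smb":        {"healthy": 80, "at_risk": 60},
}

def classify(score: float, segment: str) -> str:
    t = THRESHOLDS[segment]
    if score >= t["healthy"]:
        return "Healthy"
    if score < t["at_risk"]:
        return "At Risk"
    return "Moderate"

print(classify(72, "enterprise"))  # Healthy
print(classify(72, "smb"))         # Moderate: same score, different segment baseline
```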

Designing Your Health Score Model

Identifying Outcomes to Predict

Start With the Main Thing: Retention

  • Will this customer actually renew?
  • At what contract value?
  • What's the renewal rate going to be?

Then Add Secondary Outcomes:

Churn Risk:

  • Will they churn in the next 90 days?
  • What kind of churn? (Did they choose to leave, or did they just forget to pay?)

Expansion:

  • Are they going to expand?
  • By how much?
  • When's the right time to have that conversation?

Advocacy:

  • Will they be a reference customer?
  • Might they refer other customers?
  • Will they give you a testimonial for your website?

Keep It Simple at First: Focus on predicting retention vs churn. That's the thing that really matters. You can add expansion and advocacy prediction later once your retention model actually works.

Selecting Health Score Dimensions

How to Pick the Right Dimensions:

Step 1: Brain Dump Every Signal You Can Think Of

  • Product usage metrics
  • How they engage with you
  • Relationship indicators
  • Financial signals
  • Support ticket patterns
  • Sentiment data
  • External signals (are they growing? Did they just get funded? Are they laying people off?)

Step 2: Figure Out What You Can Actually Measure

Be honest about your data reality:

  • Is this data available right now?
  • Can you integrate it without a six-month engineering project?
  • Is the data quality good enough to trust?

Step 3: Test What Actually Predicts Retention

Run correlation analysis with your actual outcomes:

  • High correlation (>0.5): Include this
  • Moderate correlation (0.3-0.5): Consider including it
  • Low correlation (<0.3): Probably skip it unless you have a strategic reason

Step 4: Don't Go Overboard

  • Too few dimensions: You'll miss important signals
  • Too many dimensions: You'll drown in complexity and maintenance
  • Sweet spot: 4-6 dimensions

Start With These Four:

  1. Usage (always include this—it's the strongest predictor by far)
  2. Engagement (how invested they are in the relationship)
  3. Sentiment (NPS, CSAT, how they feel about you)
  4. Relationship (do they have an exec sponsor? An active champion?)

Add others as your data and systems mature: value realization, support quality, financial health.

Determining Data Inputs and Metrics

For Each Dimension, Define Specific Metrics:

Usage Dimension Inputs:

  • % of licenses with active users (last 30 days)
  • Average logins per user per week
  • Number of core features used (breadth)
  • Depth of usage within key features
  • Usage trend (month-over-month % change)

Engagement Dimension Inputs:

  • CSM touchpoints per quarter
  • QBR attendance (Y/N)
  • Training sessions attended
  • Email open and click rates
  • Community posts or participation

Sentiment Dimension Inputs:

  • Most recent NPS score
  • Support CSAT average (last 3 months)
  • Qualitative feedback sentiment
  • CSM relationship rating (1-5 scale)

Relationship Dimension Inputs:

  • Executive sponsor identified (Y/N)
  • Champion active (Y/N)
  • Number of contacts in CRM
  • Number of departments using product
  • Relationship depth score (CSM assessment)

Financial Dimension Inputs:

  • Payment status (current, late, past due)
  • Expansion in last 12 months (Y/N)
  • Contract value (ARR)

Data Source Mapping: Document where each metric comes from:

  • Product analytics platform
  • CRM system
  • Support ticketing system
  • Survey tools
  • Billing system

Establishing Weighting Methodology

Data-Driven Weight Assignment:

Method 1: Correlation Analysis

  • Calculate correlation between each dimension and retention
  • Assign weights proportional to correlation strength

Example:

  • Usage correlation: 0.70 → Weight: 35%
  • Engagement correlation: 0.50 → Weight: 25%
  • Sentiment correlation: 0.40 → Weight: 20%
  • Relationship correlation: 0.30 → Weight: 15%
  • Financial correlation: 0.10 → Weight: 5%
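One simple way to convert those correlations into weights is to scale each one by their total so the weights sum to 100%, as in this sketch (a proportional-scaling shortcut, not a statistically rigorous method; the correlation values are the example figures above):

```python
# Hypothetical correlation results from the analysis above.
correlations = {"usage": 0.70, "engagement": 0.50, "sentiment": 0.40,
                "relationship": 0.30, "financial": 0.10}

# Scale each correlation by the total so the weights sum to 1.0.
total = sum(correlations.values())
weights = {dim: round(corr / total, 2) for dim, corr in correlations.items()}
print(weights)
# {'usage': 0.35, 'engagement': 0.25, 'sentiment': 0.2, 'relationship': 0.15, 'financial': 0.05}
```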

Method 2: Regression Analysis

  • Run logistic regression with churn as outcome
  • Use coefficient values to inform weights
  • More sophisticated than simple correlation

Method 3: Expert Judgment (When Data Limited)

  • Survey CSM team on predictive power of each dimension
  • Weight based on consensus
  • Validate against outcomes as data accumulates

Method 4: Equal Weighting (Starting Point)

  • All dimensions weighted equally
  • Adjust based on performance
  • Quick to implement but less accurate

Best Practice: Start with correlation analysis (if data exists) or expert judgment. Refine weights quarterly based on predictive accuracy.

Data-Driven Model Development

Analyzing Historical Data Patterns

Historical Analysis Steps:

Step 1: Gather Retention Data

  • Last 12-24 months of customer data
  • Renewal outcomes (renewed vs churned)
  • Final health scores before renewal
  • Dimension scores

Step 2: Segment Analysis

  • Retention rate by health score range
  • Retention rate by dimension score
  • Segment-specific patterns (enterprise vs SMB)

Example Analysis:

Health Score Range   Retention Rate   Sample Size
90-100               98%              45
80-89                95%              112
70-79                88%              134
60-69                75%              87
50-59                58%              56
<50                  35%              41

Insight: Clear threshold at 60 where retention drops significantly.
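A retention-by-score-range table like the one above can be produced with a basic group-by, sketched here on hypothetical data:

```python
import pandas as pd

# Hypothetical historical snapshot: last health score before renewal + outcome.
history = pd.DataFrame({
    "health_score": [95, 88, 72, 64, 55, 42, 81, 67, 51, 38],
    "renewed":      [1,  1,  1,  1,  0,  0,  1,  1,  1,  0],
})

# Bucket scores into the ranges used in the table above.
bins = [0, 50, 60, 70, 80, 90, 100]
labels = ["<50", "50-59", "60-69", "70-79", "80-89", "90-100"]
history["range"] = pd.cut(history["health_score"], bins=bins, labels=labels, right=False)

# Retention rate and sample size per score range.
summary = history.groupby("range", observed=True)["renewed"].agg(
    retention_rate="mean", sample_size="count")
print(summary)
```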

Step 3: Identify Patterns

  • Which churned customers had high scores? (false negatives)
  • Which renewed customers had low scores? (false positives)
  • What signals did we miss?

Step 4: Refine Model

  • Adjust weights
  • Add missing dimensions
  • Recalibrate thresholds

Correlation Analysis with Outcomes

Running Correlation Analysis:

For Each Dimension: Calculate the correlation coefficient between the dimension score and retention (values closer to 1 indicate a stronger positive relationship)

Example Results:

  • Usage score correlation with retention: 0.72
  • Engagement score correlation: 0.48
  • Sentiment score correlation: 0.35
  • Relationship score correlation: 0.52
  • Financial score correlation: 0.21

Interpretation:

  • Strong predictors (>0.6): Usage
  • Moderate predictors (0.4-0.6): Engagement, Relationship
  • Weak predictors (<0.4): Sentiment, Financial

Actions:

  • Increase weight for strong predictors (usage)
  • Maintain moderate weights for moderate predictors
  • Reduce weight or remove weak predictors (unless strategic value)

Multi-Variate Analysis: Some dimensions may be predictive in combination but not individually. Test combinations:

  • Low usage + low engagement = very high churn risk
  • Low usage + high engagement = re-onboarding opportunity

Identifying Predictive vs Vanity Metrics

Predictive Metrics: These actually predict what's going to happen. When these numbers move, retention moves.

Examples:

  • Active user percentage (real predictor of retention)
  • Login frequency (people who log in regularly stick around)
  • QBR attendance (engaged customers show up)
  • Feature adoption depth (power users don't churn)

Vanity Metrics: These look good in a dashboard but don't tell you much about retention. They might correlate with health, but they don't cause it.

Examples:

  • Total registered users (meaningless if they're not active)
  • Total data stored (unless storage actually drives value for your product)
  • Product page views (browsing isn't the same as using)
  • Emails sent (sending emails means nothing if nobody opens them)

How to Tell the Difference:

Test 1: Does It Correlate With Retention? Run the numbers. If the metric moves and retention doesn't, it's vanity.

  • Correlates → Potentially predictive
  • Doesn't correlate → Probably vanity

Test 2: Does Improving It Actually Improve Retention? This is the causation test.

  • Yes → Predictive
  • No → Vanity

Test 3: Does It Change Before Churn or After? Timing matters.

  • Changes before churn → Leading indicator (useful!)
  • Changes after churn → Lagging indicator (too late to help)

Build your health score on predictive, leading indicators. Leave the vanity metrics for your marketing slides.

Testing and Validating Models

How to Validate Your Model:

Step 1: Test It Against Historical Data

  • Run your health score model on past customer data
  • Compare what the model predicted to what actually happened
  • Calculate your accuracy metrics

Step 2: Measure How Accurate You Are

True Positive Rate (Did You Catch the Churners?): Of the customers who actually churned, how many did you flag as at-risk?

  • Formula: True Positives / (True Positives + False Negatives)
  • Target: >75%

True Negative Rate (Did You Get the Healthy Ones Right?): Of the customers who renewed, how many did you correctly flag as healthy?

  • Formula: True Negatives / (True Negatives + False Positives)
  • Target: >85%

Overall Accuracy: Of all your predictions, how many were right?

  • Formula: (True Positives + True Negatives) / Total Customers
  • Target: >80%
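Putting those three formulas together, a minimal backtest tally might look like this (the confusion-matrix counts are hypothetical):

```python
# Hypothetical backtest counts from comparing predictions to actual renewals.
true_pos, false_neg = 41, 9    # churned accounts flagged at-risk / missed
true_neg, false_pos = 388, 49  # renewed accounts cleared / wrongly flagged

true_positive_rate = true_pos / (true_pos + false_neg)   # did you catch the churners?
true_negative_rate = true_neg / (true_neg + false_pos)   # did you get the healthy ones right?
accuracy = (true_pos + true_neg) / (true_pos + true_neg + false_pos + false_neg)

print(f"TPR: {true_positive_rate:.0%}, TNR: {true_negative_rate:.0%}, Accuracy: {accuracy:.0%}")
```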

Step 3: Figure Out Why You Were Wrong

False Positives (you said at-risk, but they renewed):

  • Why did your model think they were at-risk?
  • What signal did you miss that showed they were actually fine?
  • How can you adjust the model to reduce these?

False Negatives (you said healthy, but they churned):

  • What signals did you completely miss?
  • What dimension needs to be added or weighted more heavily?
  • These are more dangerous than false positives—you missed a real risk

Step 4: Fix Your Model

  • Adjust weights based on what you learned
  • Add dimensions you were missing
  • Recalibrate your thresholds
  • Test it again on historical data

Step 5: Keep Watching It

  • Track accuracy as the model runs live
  • Compare predictions to actual renewal outcomes every month
  • Keep refining it quarterly

Iterating Based on Results

Continuous Improvement Cycle:

Monthly Review:

  • Which at-risk accounts actually churned?
  • Were there healthy accounts that churned (miss)?
  • False positive rate (at-risk accounts that renewed)
  • CSM feedback on score accuracy

Quarterly Refinement:

  • Full model validation
  • Weight adjustments
  • Threshold recalibration
  • Add/remove dimensions

Annual Overhaul:

  • Major model redesign if needed
  • Incorporate new data sources
  • Adopt new methodologies (ML, etc.)

Example Iteration:

Quarter 1:

  • Model accuracy: 73%
  • False negative rate: 32% (too many healthy customers churned)
  • Analysis: Usage dimension not weighted heavily enough
  • Action: Increase usage weight from 30% to 40%

Quarter 2:

  • Model accuracy: 79%
  • False negative rate: 24%
  • Improvement: Catching more at-risk customers
  • New issue: False positives increased
  • Action: Adjust at-risk threshold from <60 to <55

Quarter 3:

  • Model accuracy: 84%
  • Balanced false positives and negatives
  • CSM feedback: Scores feel accurate
  • Action: Maintain current model, continue monitoring

Score Calculation Methods

Simple Weighted Average

This Is What Most Companies Use: Calculate scores for each dimension, apply your weights, add them up. Done.

Here's How It Works:

Step 1: Score Each Dimension (0-100)

  • Usage: 75 (based on active users, login frequency, which features they use)
  • Engagement: 80 (touchpoints, QBR attendance, training participation)
  • Sentiment: 70 (NPS, CSAT scores)
  • Relationship: 60 (they have a champion but no exec sponsor yet)

Step 2: Apply Your Weights

  • Usage: 75 × 0.40 = 30.0
  • Engagement: 80 × 0.25 = 20.0
  • Sentiment: 70 × 0.20 = 14.0
  • Relationship: 60 × 0.15 = 9.0

Step 3: Add It Up

Total Health Score = 30.0 + 20.0 + 14.0 + 9.0 = 73

Why This Works:

  • Simple enough for anyone to understand
  • Easy to explain to stakeholders
  • You can see exactly how each dimension contributes
  • Flexible—easy to adjust weights when you need to

The Downsides:

  • It's linear, so it doesn't capture complex interactions between dimensions
  • You need data for all dimensions, or the math breaks
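One common workaround for the missing-data problem is to drop the dimensions you can't score and rescale the remaining weights so they still sum to 1. A minimal sketch, assuming that's how you want to handle gaps:

```python
def weighted_health_score(scores: dict, weights: dict) -> float:
    """Weighted average over whichever dimensions have data.

    Dimensions with a None score are dropped and the remaining
    weights are rescaled so they still sum to 1.0.
    """
    available = {d: s for d, s in scores.items() if s is not None}
    weight_total = sum(weights[d] for d in available)
    return sum(s * weights[d] / weight_total for d, s in available.items())

weights = {"usage": 0.40, "engagement": 0.25, "sentiment": 0.20, "relationship": 0.15}
scores = {"usage": 75, "engagement": 80, "sentiment": None, "relationship": 60}
print(round(weighted_health_score(scores, weights), 1))  # 73.8, ignoring sentiment
```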

Red/Yellow/Green Categorical

The Traffic Light Approach: Instead of a numeric score, just assign a color. Simple as that.

How It Works:

  • Define what qualifies for each color
  • Check where the account fits
  • Assign the color

Example Criteria:

Green (Healthy):

  • ≥70% licenses active AND
  • Attended last QBR AND
  • NPS ≥7 AND
  • Executive sponsor is engaged

Yellow (Moderate):

  • 50-69% licenses active OR
  • Missed last QBR OR
  • NPS 5-6 OR
  • No executive sponsor

Red (At Risk):

  • <50% licenses active OR
  • No touchpoints in 60 days OR
  • NPS <5 OR
  • Multiple P1 support tickets open
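Translated into code, the criteria above might be checked roughly like this (a sketch; field names are hypothetical, and green is evaluated before red, with everything else falling to yellow):

```python
def traffic_light(account: dict) -> str:
    # Green: all healthy criteria met.
    if (account["pct_licenses_active"] >= 70 and account["attended_last_qbr"]
            and account["nps"] >= 7 and account["exec_sponsor_engaged"]):
        return "Green"
    # Red: any at-risk criterion met.
    if (account["pct_licenses_active"] < 50 or account["days_since_touchpoint"] > 60
            or account["nps"] < 5 or account["open_p1_tickets"] > 1):
        return "Red"
    # Everything in between.
    return "Yellow"

print(traffic_light({
    "pct_licenses_active": 64, "attended_last_qbr": True, "nps": 8,
    "exec_sponsor_engaged": True, "days_since_touchpoint": 20, "open_p1_tickets": 0,
}))  # Yellow
```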

Why This Works:

  • Super simple
  • Clear action categories (green = maintain, yellow = improve, red = save)
  • Non-technical stakeholders get it immediately

The Downsides:

  • Not very nuanced—you only get 3 states
  • Hard to prioritize when you have 50 yellow accounts
  • You can't see trending (improving or declining)
  • The thresholds are arbitrary (70% usage gets green, 69% gets yellow—really?)

Use this if: You have a small team, simple product, or you're just starting with health monitoring.

Points-Based Scoring

Method: Assign points for specific behaviors or attributes. Sum points to total score.

Example:

Criteria                        Points
≥80% license utilization        20
60-79% license utilization      15
<60% license utilization        5
Attended last QBR               15
Executive sponsor identified    15
Champion active                 10
NPS 9-10                        15
NPS 7-8                         10
NPS 0-6                         0
No support tickets              10
Feature adoption ≥70%           10
Total Possible                  100

Customer A:

  • 75% utilization: 15 points
  • Attended QBR: 15 points
  • Has exec sponsor: 15 points
  • No champion: 0 points
  • NPS 8: 10 points
  • 2 support tickets: 0 points
  • 80% feature adoption: 10 points
  • Total: 65 points (Moderate)
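The same point allocation, sketched as a function (field names are hypothetical; the point values match the table above):

```python
def points_score(account: dict) -> int:
    """Sum points for each criterion the account meets."""
    points = 0
    util = account["license_utilization"]
    points += 20 if util >= 80 else 15 if util >= 60 else 5
    points += 15 if account["attended_last_qbr"] else 0
    points += 15 if account["exec_sponsor"] else 0
    points += 10 if account["champion_active"] else 0
    nps = account["nps"]
    points += 15 if nps >= 9 else 10 if nps >= 7 else 0
    points += 10 if account["open_tickets"] == 0 else 0
    points += 10 if account["feature_adoption"] >= 70 else 0
    return points

customer_a = {"license_utilization": 75, "attended_last_qbr": True, "exec_sponsor": True,
              "champion_active": False, "nps": 8, "open_tickets": 2, "feature_adoption": 80}
print(points_score(customer_a))  # 65 (Moderate)
```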

Pros:

  • Easy to build and adjust
  • Clear point allocation
  • Flexible (add/remove criteria easily)

Cons:

  • Can become complex (too many criteria)
  • Point values somewhat arbitrary
  • May not reflect true predictive weights

Percentile Ranking

Method: Rank accounts relative to each other, assign health score based on percentile.

Example:

  • Top 20% of accounts: 90-100 (Healthy)
  • 20-50%: 70-89 (Good)
  • 50-80%: 50-69 (Moderate)
  • Bottom 20%: 0-49 (At Risk)
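A percentile-based score can be derived directly from a ranking of raw health values, as in this sketch (the band cutoffs mirror the example above; the raw values are hypothetical):

```python
import pandas as pd

# Hypothetical raw health values for a small portfolio.
raw = pd.Series([88, 42, 67, 73, 95, 55, 61, 80, 49, 70], name="raw_health")

# Percentile rank of each account within the portfolio (0.0 - 1.0).
pct = raw.rank(pct=True)

def band(p: float) -> str:
    if p > 0.80:   # top 20%
        return "Healthy (90-100)"
    if p > 0.50:   # next 30%
        return "Good (70-89)"
    if p > 0.20:   # next 30%
        return "Moderate (50-69)"
    return "At Risk (0-49)"   # bottom 20%

print(pd.DataFrame({"raw": raw, "percentile": pct, "band": pct.map(band)}))
```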

Pros:

  • Relative comparison (shows where account stands vs peers)
  • Automatically adjusts as portfolio improves
  • Useful for benchmarking

Cons:

  • Score depends on cohort (same behavior = different score in different cohorts)
  • Bottom 20% always "at risk" even if all accounts healthy
  • Not absolute measure

Best for: Mature portfolios with large customer bases, benchmarking, prioritization.

Machine Learning Models

The Advanced (and Complicated) Approach: Use ML algorithms to predict churn probability based on historical patterns. This is the fancy option.

Common Algorithms:

  • Logistic regression (predicts churn probability from 0 to 1)
  • Random forest (ensemble of decision trees)
  • Gradient boosting (XGBoost, LightGBM)
  • Neural networks (if you have massive datasets)

How It Works:

  • Input: All your customer data (usage, engagement, everything)
  • The model trains itself on historical churn data
  • Output: Churn probability (0-100%)
  • Your health score = 100 - churn probability
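As a rough illustration of the approach (not a production model), a logistic-regression version might look like this; the training data here is synthetic and the feature set is hypothetical:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic training data: per-account features (usage, engagement, sentiment,
# relationship, all 0-100) and churn labels from a toy rule, for illustration only.
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(500, 4))
y = (X[:, 0] + X[:, 1] < 90).astype(int)   # 1 = churned

model = LogisticRegression(max_iter=1000).fit(X, y)

# Health score for a new account = 100 minus the predicted churn probability (as a %).
new_account = np.array([[45, 30, 60, 55]])
churn_prob = model.predict_proba(new_account)[0, 1]
health_score = 100 * (1 - churn_prob)
print(round(health_score))
```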

Why This Can Be Great:

  • Most accurate method (when you have enough data)
  • Captures complex interactions between dimensions
  • Finds patterns humans would never spot
  • Gets better over time as you feed it more data

Why This Can Be a Nightmare:

  • You need serious data science expertise
  • Requires tons of historical data (think 1000+ customers, 2+ years minimum)
  • "Black box" problem—hard to explain why a score is what it is
  • Infrastructure and maintenance costs add up fast

Use this if: You're a large SaaS company with a data team and mature datasets. If you're still figuring out your basic health scoring, skip this for now.

Model Segmentation

Segment-Specific Models

Why Segment: Different customer segments have different behaviors, adoption patterns, and health profiles.

Common Segmentation Approaches:

By Company Size:

  • Enterprise (1000+ employees)
  • Mid-Market (100-999)
  • SMB (<100)

Differences:

  • Enterprise: Slower adoption, complex implementations, longer sales cycles
  • SMB: Fast adoption, simpler usage, higher churn rates

By Product or Plan:

  • Starter/Basic tier
  • Professional tier
  • Enterprise tier

Differences:

  • Enterprise plans: More features, higher engagement expected
  • Starter plans: Limited features, lower engagement still healthy

By Industry:

  • Healthcare
  • Financial services
  • Technology
  • Manufacturing

Differences:

  • Industry-specific usage patterns
  • Regulatory requirements affect engagement
  • Different value drivers

By Use Case:

  • Sales teams
  • Marketing teams
  • Engineering teams

Differences:

  • Different feature usage
  • Different adoption curves
  • Different success metrics

Journey Stage Considerations

Health Score by Customer Lifecycle Stage:

Onboarding (0-90 days):

  • Lower baseline usage expected (still ramping)
  • Focus on activation milestones
  • Engagement more important than usage
  • Threshold: Moderate = 40+, Healthy = 60+

Adoption (90 days - 12 months):

  • Usage ramping up
  • Feature breadth expanding
  • Standard health thresholds apply
  • Threshold: Moderate = 50+, Healthy = 70+

Maturity (12+ months):

  • Expect full usage and engagement
  • Higher thresholds for healthy
  • Look for expansion signals
  • Threshold: Moderate = 60+, Healthy = 75+

Renewal Period (60 days before renewal):

  • Critical period
  • Lower tolerance for at-risk
  • Extra attention to relationship and sentiment
  • Threshold: At-risk if <65, even if normally moderate

Adjust health scoring and thresholds based on customer journey stage.

When to Use Universal vs Segment Models

Universal Model (One Model for All):

Pros:

  • Simpler to build and maintain
  • Consistent across portfolio
  • Easier to compare accounts

Cons:

  • Less accurate (doesn't account for segment differences)
  • May miss segment-specific patterns
  • One-size-fits-all limitations

Use When:

  • Small customer base (<200 customers)
  • Homogeneous customer segments
  • Early in health scoring maturity
  • Limited data or resources

Segment-Specific Models:

Pros:

  • More accurate predictions
  • Accounts for segment behaviors
  • Better threshold calibration
  • Enables segment benchmarking

Cons:

  • More complex to build and maintain
  • Requires sufficient data per segment
  • Harder to compare across segments

Use When:

  • Large customer base (>500 customers)
  • Diverse customer segments
  • Mature health scoring program
  • Sufficient data per segment (>100 customers)

Hybrid Approach:

  • Start with universal model
  • Add segment adjustments (segment-specific thresholds)
  • Gradually move to fully separate models as data permits

Implementation and Operationalization

Technology and Infrastructure

The Build vs Buy Decision:

Buy: Customer Success Platform

  • Tools like Gainsight, Totango, ChurnZero, Catalyst
  • Pros: You're up and running fast, proven functionality, they handle updates
  • Cons: Costs $50k-200k per year, less flexible, you're locked into their system
  • Use this if: You're a mid-to-large CS team with budget and you want speed

Build: Custom System

  • Stack: Your own data warehouse + BI tool + custom scoring engine
  • Pros: Total control, built exactly for your needs, cheaper long-term
  • Cons: Eats up engineering time, you own all the maintenance, slower to launch
  • Use this if: You have a technical team, unique requirements, and engineering resources to spare

Hybrid: Mix and Match

  • Core: Use a CS platform for scoring and alerts
  • Custom: Build your own data warehouse for complex analytics
  • Integrations: Connect everything (product analytics, CRM, support)
  • Use this if: You're like most companies—you want a balance of speed and flexibility

What You Actually Need:

  1. Data integration layer (pulls data from all your systems)
  2. Scoring engine (does the math to calculate health scores)
  3. Visualization layer (dashboards people will actually look at)
  4. Alerting system (notifications and automated workflows)
  5. Historical database (so you can track trends over time)

Data Pipeline and Automation

Automated Data Flow:

Product DB → ETL → Data Warehouse → Scoring Engine → Dashboard
CRM → API → Data Warehouse → Scoring Engine → Dashboard
Support → API → Data Warehouse → Scoring Engine → Dashboard
Survey → Webhook → Data Warehouse → Scoring Engine → Dashboard

Pipeline Steps:

1. Extract:

  • Pull data from source systems (product analytics, CRM, support)
  • Schedule: Daily for most metrics, real-time for critical alerts
  • Handle API rate limits and errors

2. Transform:

  • Normalize data formats
  • Calculate derived metrics (% active users, usage trends)
  • Aggregate to account level
  • Join data from multiple sources

3. Load:

  • Store in data warehouse
  • Calculate health scores
  • Update dashboards
  • Trigger alerts if thresholds crossed

4. Archive:

  • Store historical scores for trending
  • Enable year-over-year comparisons

Automation Best Practices:

  • Monitor pipeline health (alert on failures)
  • Validate data quality (check for anomalies)
  • Document data sources and transformations
  • Version control scoring logic

Score Refresh Frequency

How Often to Recalculate:

Real-Time (Continuous):

  • Use for: Critical alerts (P1 tickets, payment failures)
  • Requires: Streaming data pipeline, higher infrastructure cost
  • Example: Payment past due → instant alert

Daily:

  • Use for: Standard health scores, most accounts
  • Requires: Nightly batch job, moderate infrastructure
  • Example: Usage data updated each morning

Weekly:

  • Use for: Low-touch accounts, less critical metrics
  • Requires: Weekly batch job, simple infrastructure
  • Example: SMB accounts with stable patterns

Considerations:

  • More frequent = more current but higher cost
  • Less frequent = sufficient for most needs, simpler
  • Hybrid: Real-time for critical, daily for standard

Recommended: Daily refresh for health scores, real-time for critical alerts.

Historical Trending and Momentum

Why Trending Matters as Much as the Score Itself:

The direction an account is moving matters just as much as where they are right now. A score of 70 that's climbing looks completely different from a 70 that's dropping fast.

Here's what trending tells you:

  • Catch problems early, before they become critical
  • Know if your interventions are actually working
  • Spot seasonal patterns you need to account for

Time Windows That Matter:

30-Day Change (Short-Term):

  • Shows you quick wins or new problems
  • Alert if it drops more than 10 points
  • Good for catching immediate issues

90-Day Change (Medium-Term):

  • Shows sustained improvement or decline
  • Most actionable timeframe for interventions
  • This is where you should focus

12-Month Change (Long-Term):

  • Reveals customer lifecycle patterns
  • Good for cohort analysis
  • Helps you understand what "normal" looks like

Use Momentum Indicators:

  • Improving: ↑ (score going up)
  • Stable: → (score flat, within ±5 points)
  • Declining: ↓ (score going down)

Here's Why This Matters:

Account A:

  • Current score: 70
  • 30-day change: +8
  • 90-day change: +15
  • Status: Moderate but improving ↑
  • What to do: Whatever you're doing is working—keep it up

Account B:

  • Current score: 72
  • 30-day change: -12
  • 90-day change: -18
  • Status: Moderate but declining ↓
  • What to do: Something's wrong—investigate now and intervene

Same score, completely different situations, totally different actions needed.
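A small helper like the sketch below can attach a momentum indicator to each score (the ±5-point "stable" band matches the definition above; the score histories are hypothetical):

```python
def momentum(current: float, prior_30d: float, prior_90d: float) -> str:
    """Classify momentum from score history (stable = within ±5 points over 30 days)."""
    change_30d = current - prior_30d
    change_90d = current - prior_90d
    if abs(change_30d) <= 5:
        arrow = "→ stable"
    elif change_30d > 0:
        arrow = "↑ improving"
    else:
        arrow = "↓ declining"
    return f"{current:.0f} ({arrow}, 30d {change_30d:+.0f}, 90d {change_90d:+.0f})"

print(momentum(70, prior_30d=62, prior_90d=55))  # Account A: moderate but improving
print(momentum(72, prior_30d=84, prior_90d=90))  # Account B: moderate but declining
```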

Integration with Workflows

Operationalize Health Scores:

CSM Daily Workflow:

  1. Check dashboard for alerts
  2. Review accounts with declining health
  3. Focus on at-risk accounts (score <50)
  4. Update success plans based on scores

Automated Playbooks:

  • Health drops to at-risk → Trigger save playbook
  • Health improves to healthy → Trigger expansion playbook
  • 30 days to renewal + moderate health → Trigger renewal prep playbook

CRM Integration:

  • Sync health scores to CRM (Salesforce, HubSpot)
  • Display on account page
  • Use in reporting and forecasting
  • Trigger sales team alerts (exec escalation)

Communication Integration:

  • Email alerts to CSMs (daily digest of at-risk accounts)
  • Slack notifications (critical alerts)
  • Automated customer outreach (based on health changes)

Meeting Preparation:

  • Pull health score before QBR
  • Prepare talking points (wins and concerns)
  • Set agenda based on health insights

Model Validation and Refinement

Accuracy Measurement and Tracking

Key Accuracy Metrics:

Predictive Accuracy: Of all predictions, how many were correct?

  • Formula: (True Positives + True Negatives) / Total
  • Benchmark: >80% is good, >85% is excellent

Precision (Positive Predictive Value): Of customers flagged at-risk, how many actually churned?

  • Formula: True Positives / (True Positives + False Positives)
  • Benchmark: >60% (some false positives acceptable to catch all risk)

Recall (Sensitivity): Of customers who churned, how many did we flag as at-risk?

  • Formula: True Positives / (True Positives + False Negatives)
  • Benchmark: >75% (critical to catch most churn)

F1 Score: Balance of precision and recall

  • Formula: 2 × (Precision × Recall) / (Precision + Recall)
  • Benchmark: >0.70

Track Monthly: Calculate these metrics each month as renewals occur and compare predictions to actuals.
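If you keep each month's predictions and outcomes, these metrics can be computed with scikit-learn, as in this sketch (the labels are hypothetical, with 1 = churned for actuals and 1 = flagged at-risk for predictions):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Hypothetical month of renewal outcomes: 1 = churned, 0 = renewed.
actual    = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0]
# Model predictions at the time: 1 = flagged at-risk, 0 = flagged healthy.
predicted = [1, 0, 1, 1, 0, 0, 0, 0, 1, 0]

print(f"Accuracy:  {accuracy_score(actual, predicted):.0%}")
print(f"Precision: {precision_score(actual, predicted):.0%}")
print(f"Recall:    {recall_score(actual, predicted):.0%}")
print(f"F1 score:  {f1_score(actual, predicted):.2f}")
```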

False Positive/Negative Analysis

False Positives (Type I Error): Flagged as at-risk but renewed.

Impact:

  • Wasted CSM time
  • Unnecessary interventions
  • Alert fatigue
  • Lower confidence in scores

Example: Account flagged as at-risk (score 45) but renewed at 100%.

Analysis:

  • Why did model think at-risk? (Low usage)
  • Why did they actually renew? (Still saw value, exec champion)
  • Learning: Add executive sponsor dimension, increase relationship weight

False Negatives (Type II Error): Flagged as healthy but churned.

Impact:

  • Missed intervention opportunity
  • Lost revenue
  • More dangerous than false positives
  • Erodes trust in model

Example: Account flagged as healthy (score 78) but churned.

Analysis:

  • What signals did we miss? (New competitor, budget cut)
  • What dimension should catch this? (Competitive intelligence, financial)
  • Learning: Add competitive tracking, increase weight on stakeholder changes

Monthly Review Process:

  1. Identify all false positives and false negatives
  2. Analyze root causes
  3. Identify model improvements
  4. Implement changes
  5. Validate on historical data

Model Drift Detection

What Is Model Drift: Your model's accuracy degrades over time because your customers, product, or market are changing. What predicted retention six months ago might not work today.

Signs Your Model Is Drifting:

  • Accuracy dropping month after month
  • More false positives or false negatives than before
  • CSMs saying "these scores don't feel right anymore"
  • New patterns your model doesn't capture

What Causes Drift:

  • Product changes (you launched new features or redesigned the UI)
  • Customer behavior evolves (usage patterns shift over time)
  • Market dynamics change (new competitor enters the scene)
  • Your data quality gets worse

How to Catch It:

  • Track accuracy trends (if it's declining for 3+ months straight, you've got drift)
  • Compare current accuracy to historical accuracy
  • Watch for shifts in your prediction distribution

How to Fix It:

  • Retrain your model on recent data
  • Add new dimensions that capture new patterns
  • Adjust weights to reflect what matters now
  • Update thresholds based on current behavior

How to Prevent It:

  • Validate your model every quarter
  • Track accuracy continuously
  • Get regular feedback from your CSM team
  • Document when you make product or go-to-market changes

Regular Review and Updates

Model Maintenance Schedule:

Weekly:

  • Monitor alert volume and response
  • Track CSM feedback on scores
  • Identify data quality issues

Monthly:

  • Calculate accuracy metrics
  • Review false positives/negatives
  • Identify quick wins (threshold adjustments)

Quarterly:

  • Full model validation
  • Weight adjustments
  • Dimension additions/removals
  • Backtest on recent data
  • Implement refinements

Annual:

  • Comprehensive model review
  • Consider major redesign if needed
  • Adopt new methodologies (ML, etc.)
  • Benchmark against industry standards
  • Align with strategic priorities

Documentation:

  • Track all model changes
  • Document rationale
  • Measure impact
  • Share learnings with team

A/B Testing Model Variations

Test Model Changes Before Full Rollout:

Example A/B Test:

Control (Current Model):

  • Usage: 35%
  • Engagement: 25%
  • Value: 20%
  • Relationship: 15%
  • Financial: 5%

Variant (Proposed Model):

  • Usage: 40% (increased)
  • Engagement: 25%
  • Value: 15% (decreased)
  • Relationship: 20% (increased)
  • Financial: 0% (removed)

Test Setup:

  • Apply both models to last 6 months of historical data
  • Compare accuracy metrics
  • Identify which model predicts better

Results:

Metric      Current Model   New Model
Accuracy    78%             84%
Precision   65%             72%
Recall      73%             81%
F1 Score    0.69            0.76

Decision: New model performs better across all metrics. Implement.
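A backtest comparison like this can be run by scoring historical accounts under both weight sets and checking predictions against outcomes. A minimal sketch (data, threshold, and weights are illustrative):

```python
import pandas as pd

# Hypothetical backtest data: dimension scores (0-100) plus actual churn outcome.
history = pd.DataFrame({
    "usage":        [30, 85, 55, 90, 40, 75, 20, 95],
    "engagement":   [45, 80, 60, 85, 35, 70, 30, 90],
    "value":        [50, 75, 65, 80, 45, 70, 40, 85],
    "relationship": [40, 70, 55, 90, 30, 65, 25, 80],
    "financial":    [80, 90, 85, 95, 70, 88, 60, 92],
    "churned":      [1,  0,  0,  0,  1,  0,  1,  0],
})

current  = {"usage": 0.35, "engagement": 0.25, "value": 0.20, "relationship": 0.15, "financial": 0.05}
proposed = {"usage": 0.40, "engagement": 0.25, "value": 0.15, "relationship": 0.20, "financial": 0.00}

def backtest(weights: dict, at_risk_threshold: float = 55) -> float:
    """Score every account under the given weights and return prediction accuracy."""
    score = sum(history[dim] * w for dim, w in weights.items())
    predicted_churn = score < at_risk_threshold
    return (predicted_churn == history["churned"].astype(bool)).mean()

print(f"Current model accuracy:  {backtest(current):.0%}")
print(f"Proposed model accuracy: {backtest(proposed):.0%}")
```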

Shadow Mode Testing:

  • Run new model in parallel with current model
  • Don't act on new model scores yet
  • Compare predictions to actual outcomes over 1-2 months
  • If new model more accurate, switch

Benefits:

  • Validate improvements before rollout
  • Reduce risk of making model worse
  • Data-driven decision making
  • Build confidence in changes

Using Health Scores Effectively

CSM Prioritization and Focus

Prioritize Accounts by Health:

Tier 1: Critical (Score <40)

  • Immediate action required
  • Daily monitoring
  • Save plans, escalation
  • Time allocation: 40% of CSM time

Tier 2: At Risk (Score 40-60)

  • Proactive intervention
  • Weekly touchpoints
  • Improvement initiatives
  • Time allocation: 30% of CSM time

Tier 3: Moderate (Score 60-75)

  • Maintain and improve
  • Bi-weekly touchpoints
  • Standard cadence
  • Time allocation: 20% of CSM time

Tier 4: Healthy (Score 75+)

  • Maintain and grow
  • Monthly touchpoints
  • Expansion conversations
  • Time allocation: 10% of CSM time

Dynamic Prioritization: Re-prioritize daily as health scores change. Account that drops from healthy to at-risk moves up the priority list immediately.

Triggering Interventions and Playbooks

Health Score Thresholds Trigger Actions:

Score Drops Below 50:

  • Playbook: At-Risk Intervention
  • Actions: Root cause analysis, save plan, weekly check-ins, escalation path

Score Drops 15+ Points in 30 Days:

  • Playbook: Rapid Decline Investigation
  • Actions: Emergency CSM call, identify cause, immediate intervention

Score Improves to 80+:

  • Playbook: Expansion Opportunity
  • Actions: Identify expansion signals, schedule expansion call, generate proposal

60 Days to Renewal + Score <70:

  • Playbook: Renewal Risk
  • Actions: Renewal prep, value reporting, stakeholder mapping, negotiation strategy

Automated Playbook Triggers: Integrate health scores with CS platform to automatically launch playbooks when thresholds crossed.
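A trigger layer can be as simple as a rule function evaluated whenever scores refresh, as in this sketch (playbook names and thresholds mirror the examples above but are placeholders for your own playbooks):

```python
def playbooks_to_trigger(score: float, score_30d_ago: float, days_to_renewal: int) -> list:
    """Return the playbooks an account qualifies for based on threshold rules."""
    triggers = []
    if score < 50:
        triggers.append("at_risk_intervention")
    if score_30d_ago - score >= 15:
        triggers.append("rapid_decline_investigation")
    if score >= 80:
        triggers.append("expansion_opportunity")
    if days_to_renewal <= 60 and score < 70:
        triggers.append("renewal_risk")
    return triggers

print(playbooks_to_trigger(score=48, score_30d_ago=66, days_to_renewal=45))
# ['at_risk_intervention', 'rapid_decline_investigation', 'renewal_risk']
```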

Executive Reporting

Monthly Executive Dashboard:

Portfolio Health Summary:

  • Total customers: 487
  • Healthy (75+): 312 (64%)
  • Moderate (50-74): 130 (27%)
  • At Risk (<50): 45 (9%)
  • At-Risk ARR: $2.3M

Trends:

  • Health improving: 78 accounts (16%)
  • Health declining: 52 accounts (11%)
  • Net trend: Positive

Focus Areas:

  • Top 10 at-risk accounts (by ARR)
  • Accounts approaching renewal
  • Intervention success stories

Actions:

  • Saved customers this month: 8 ($450k ARR)
  • Expansion opportunities: 15 ($780k potential)

Customer-Facing Health Reports

Sharing Health Insights with Customers:

What to Include:

  • Usage metrics (active users, feature adoption)
  • Progress over time (celebrating growth)
  • Benchmarks (vs similar companies)
  • Recommendations (areas for improvement)

What to Exclude:

  • Actual health "score" or "grade" (feels judgmental)
  • "At risk" or "churn" language (negative framing)
  • Internal scoring methodology

Format:

  • Part of QBR presentation
  • Monthly email digest
  • Self-service dashboard

Example Customer-Facing Language:

"Your adoption grew 18% this quarter! You now have 78 active users and are using 6 of 8 core features. Companies at your adoption level report 2.3x productivity gains.

To unlock even more value:

  • Expand reporting adoption to managers (40% time savings)
  • Enable integrations (60% usage increase)
  • Pilot with marketing team (similar to [Customer X])"

Tone: Positive, helpful, collaborative (not judgmental or punitive)

Avoiding Over-Optimization

Beware of Goodhart's Law: "When a measure becomes a target, it ceases to be a good measure." In other words, the moment you start optimizing for the health score itself, it stops being useful.

Here's What Can Go Wrong:

Gaming the Metrics:

  • CSMs start focusing on improving scores rather than actual customer success
  • You optimize for metrics instead of outcomes
  • Example: You push customers to log in more (improves the metric) without actually helping them get value (the outcome that matters)

False Comfort:

  • High scores make you complacent
  • You miss important context that the score doesn't capture
  • Example: Account has a score of 85, but the executive champion just left the company—your model doesn't track that

Tunnel Vision:

  • You only pay attention to what's measured
  • Important qualitative signals get ignored
  • Example: Customer is visibly frustrated but still using the product out of necessity (usage high, actual sentiment terrible)

How to Avoid These Traps:

Balance Scores with Human Judgment:

  • Let CSMs override scores when they have good reason
  • Keep doing regular qualitative check-ins
  • Trust your CSM's gut when it conflicts with the score

Track Outcomes, Not Just Scores:

  • What matters is retention rate, not health scores
  • Measure customer satisfaction, not just usage numbers
  • Focus on value realization, not just engagement activities

Use Multiple Metrics:

  • Don't rely on a single health score for everything
  • Track expansion, advocacy, and satisfaction separately
  • Get a holistic view of what's really happening

Review Your Model Regularly:

  • Make sure scores still predict actual outcomes
  • Adjust when customer behavior patterns change
  • Add new signals when you spot gaps

The Bottom Line

Not all health scores are created equal. The difference between a good health score and a useless one comes down to thoughtful design, continuous validation, and a willingness to keep refining it.

When you build a health score model that actually works, here's what you get:

  • Churn prediction with >80% accuracy (yes, this is achievable)
  • 4-6 weeks of lead time to intervene before customers churn
  • CSM time spent on accounts that actually need help
  • Data-driven decisions instead of gut feel
  • Proactive customer success instead of constantly reacting to fires

A health score model that works has these components:

  1. Multi-dimensional scoring (usage, engagement, relationship, sentiment—not just one thing)
  2. Data-driven weighting (based on what actually predicts retention in your business)
  3. Segment-specific models (because enterprise and SMB customers behave completely differently)
  4. Historical trending (momentum matters as much as the current score)
  5. Continuous validation (check accuracy monthly against actual outcomes)
  6. Regular refinement (update the model quarterly as you learn what works)

Start simple, test it against real outcomes, and keep improving it. Your health score model is never "done"—it needs to evolve as your product, customers, and market evolve.

Build a model that actually predicts outcomes, not one that just looks impressive in a dashboard.


Ready to build your health score model? Start with customer health monitoring, implement early warning systems, and track retention metrics.
