Common Data Scientist Pitfalls (And How to Stop Repeating Them)

I've watched the same pattern play out on three different teams. A new data scientist joins, ships a clean first model in six months, gets the obligatory shoutout in the all-hands. Then somewhere around month nine or twelve, the wheels stop turning. Promo doesn't come. The PM goes quiet. A re-org lands them on a "less strategic" team and they spend the next year wondering what happened.

Almost every time, it's one of seven mistakes. Sometimes two or three at once.

This isn't a "top mistakes to avoid" listicle. It's a list of the specific things that kill DS careers between months 6 and 18, when the new-grad goodwill runs out and stakeholders start expecting actual business impact. Each pitfall comes with a symptom you can spot on yourself, a real number that makes the cost concrete, and a fix you can run this week.

Why This Matters Now

Year one, you get a free pass. People are patient. They expect ramp-up. You can ship a notebook and call it a deliverable.

Years two and three, that's gone. You're judged on shipped business outcomes: models running in production, decisions changed, dollars saved or made. The gap between the data scientist who makes Senior in 2-3 years and the one who plateaus at IC2 is rarely IQ or stats chops. Almost everyone at this level has the technical baseline. The gap is whether they avoided this list.

If you're 6-18 months in and something feels off but you can't name it, start here.

Pitfall 1: Starting With the Model, Not the Problem

Symptom. You opened a notebook and started pulling data before you wrote down what decision the model was supposed to inform. Maybe you already picked XGBoost. Maybe you're thinking about whether to try a transformer. The Slack thread with the stakeholder has three messages in it.

The Cost. Gartner and VentureBeat have published the same depressing number for years: 60-70% of ML projects never reach production. The technical work isn't usually what kills them. Most die because nobody framed the business decision the model was supposed to inform, so when the model is "done," there's no place for it to plug in. The dashboard nobody opens. The score nobody acts on. Six months of your life.

The Fix. Before you open a notebook, write a one-page problem brief. Not a Confluence essay. One page.

  • What decision is being made, and by whom
  • What the current baseline is (rules engine, gut feel, last quarter's average)
  • What "better" looks like in dollars or a business KPI
  • Who acts on the model output, and through which surface (dashboard, alert, API call, email)
  • What the cost of being wrong is, in both directions

If you can't answer all five, you don't have a project yet. You have a science fair. Walk back to the stakeholder and have the conversation.

Pitfall 2: Building in a Notebook and Tossing It Over the Wall

Symptom. Your handoff doc is a Jupyter notebook with hardcoded paths and a cell that reads df = pd.read_csv('/Users/yourname/Downloads/data_v3_FINAL.csv'). Engineering quotes 6-8 weeks to "productionize" it. Half their effort goes into rewriting your code into something that can actually run on a schedule.

The Cost. Anaconda's State of Data Science survey has put around 38% of DS time on data prep and deployment friction for years running. Notebook-to-prod rewrites are the largest chunk of that deployment friction. Every time eng has to translate your notebook into production code, you've added weeks to the timeline and made the eng team less excited to work with you next quarter.

The Fix. From week one of a project, write code as if a teammate will run it tomorrow on a different machine. This isn't DevOps. It's basic hygiene.

  • Pull cleaning and feature logic into importable .py modules. The notebook calls them. It doesn't redefine them.
  • Pin dependencies in a requirements.txt or pyproject.toml. Not "whatever was on my laptop in March."
  • Parameterize paths and dates. No /Users/yourname/. No hardcoded 2024-Q3.
  • Use a config file or env vars for anything that changes between dev and prod.
  • Run your own code in a fresh environment before you hand it off. If it breaks for you on a clean install, it'll break for eng.
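
Here's a minimal sketch of the "parameterize paths and dates" and "env vars" bullets, using only the standard library. The variable names CHURN_DATA_DIR and CHURN_AS_OF, the events_<date>.csv naming, and the example directory are all made up for illustration; swap in whatever your project actually uses.

```python
import os
from dataclasses import dataclass
from datetime import date
from pathlib import Path


@dataclass(frozen=True)
class RunConfig:
    data_dir: Path
    as_of: date


def load_config(env=None) -> RunConfig:
    """Read run settings from environment variables instead of hardcoding them."""
    env = os.environ if env is None else env
    # CHURN_DATA_DIR and CHURN_AS_OF are hypothetical variable names.
    data_dir = Path(env.get("CHURN_DATA_DIR", "data"))
    as_of_raw = env.get("CHURN_AS_OF")
    as_of = date.fromisoformat(as_of_raw) if as_of_raw else date.today()
    return RunConfig(data_dir=data_dir, as_of=as_of)


def input_path(cfg: RunConfig) -> Path:
    """Build the input file path from config, so dev and prod differ only in env."""
    return cfg.data_dir / f"events_{cfg.as_of.isoformat()}.csv"


# Passing a dict stands in for the real environment, which keeps this testable.
cfg = load_config({"CHURN_DATA_DIR": "/mnt/shared/churn", "CHURN_AS_OF": "2024-09-30"})
print(input_path(cfg))
```

The notebook then calls load_config() and input_path() instead of spelling out a path. Nothing in the code knows whose laptop it's running on.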

You don't need Kubernetes. You need code somebody else can run without scheduling a Zoom.

Pitfall 3: Skipping Production Monitoring

Symptom. The model shipped. You moved on to the next project. Three months later, support tickets spike. Customer success is yelling. You spend two days digging in and discover the model has been silently drifting since week four.

The Cost. Industry studies consistently put model performance decay at 20-40% within 6-12 months for typical business models without retraining. Customer behavior shifts. Upstream data sources change schemas without telling you. A new product feature changes the input distribution. Without monitoring, you find out from the people who got hurt: your customers and your stakeholder.

The Fix. Ship a monitoring dashboard the same week you ship the model. Not "after the next sprint." Same week.

  • Track Population Stability Index (PSI) on your top 5-10 features. Alert when PSI > 0.2.
  • Track prediction distribution day-over-day. A sudden shift in mean or shape is a leading indicator.
  • Track the business KPI the model is tied to (conversion rate, churn rate, fraud catch rate). If the KPI drifts, you need to know before the VP does.
  • Set a retraining cadence. Quarterly is the floor for most business models, monthly for anything in a fast-moving domain.
  • Put your name and contact on the dashboard. Own it.
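
If you've never computed PSI by hand, a rough sketch looks like this. It assumes equal-width bins over the baseline's range and a small epsilon for empty bins; many teams use quantile bins instead, so treat the binning as one choice among several rather than the standard. The sample data is synthetic.

```python
import math


def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population Stability Index between a baseline sample and a recent sample.

    Bin edges are equal-width over the baseline's range; values outside that
    range fall into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width for constant features

    def bin_shares(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), n_bins - 1)] += 1
        # eps keeps empty bins from producing log(0) or division by zero
        return [max(c / len(values), eps) for c in counts]

    e, a = bin_shares(expected), bin_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


ALERT_THRESHOLD = 0.2  # the alert level from the checklist above

baseline = [i / 100 for i in range(1000)]   # synthetic training-time feature
shifted = [x + 3 for x in baseline]         # same feature after a mean shift

print(psi(baseline, baseline) > ALERT_THRESHOLD)  # identical samples: no alert
print(psi(baseline, shifted) > ALERT_THRESHOLD)   # shifted sample: alert
```

Run it per feature on a schedule, with the training sample as the baseline, and wire the comparison to whatever alerting you already have.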

A model without monitoring is a liability, not an asset. Treat it that way.

Pitfall 4: Over-Engineering Features

Symptom. Your final pipeline has 200 features and a 12-step preprocessing chain. AUC went from 0.81 to 0.83. You're proud of it. Your stakeholder doesn't notice the difference.

The Cost. Those extra 180 features doubled training time, tripled the failure surface in production (every feature is a potential null, schema break, or upstream incident), and the lift you "won" is sitting inside the noise band of your test set. You're now on the hook to maintain a fragile pipeline for a 0.02 AUC bump nobody asked for.

The Fix. Start lean. Add only when the math justifies it.

  • Open with 10-20 features. The ones a domain expert would name in a meeting.
  • Add a feature only when permutation importance or an ablation run shows it buys >1% lift on a held-out set. SHAP helps you rank candidates, but it measures attribution, not lift.
  • Add a feature only when the stakeholder cares about that 1%. If 1% of churn is $50K, fine. If it's $500, drop it.
  • Every feature you add is a maintenance promise. If you can't commit to maintaining it, don't add it.
  • When in doubt, ablate. Drop the bottom-importance features and see if anything actually breaks.
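
The two "add a feature only when" gates can be written down as a single check. This is a sketch, not a policy: the function name, the min_value default, and the idea of pricing one percentage point of lift in dollars are assumptions you'd agree on with your stakeholder.

```python
def worth_adding(lift, value_per_point, min_lift=0.01, min_value=10_000):
    """Apply both gates: measurable held-out lift AND value the stakeholder cares about.

    lift            relative improvement on the held-out metric (0.012 means 1.2%)
    value_per_point dollars that one percentage point of lift is worth (hypothetical input)
    min_value       smallest dollar impact worth a new maintenance promise (hypothetical default)
    """
    dollars = lift * 100 * value_per_point
    return lift > min_lift and dollars >= min_value


print(worth_adding(lift=0.012, value_per_point=50_000))  # True: clears both gates
print(worth_adding(lift=0.012, value_per_point=500))     # False: real lift, trivial dollars
print(worth_adding(lift=0.004, value_per_point=50_000))  # False: lift is below the 1% bar
```

The point of writing it as code is that the thresholds become explicit and arguable, instead of living in your head.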

The Senior DS bar isn't "highest AUC." It's "highest business impact per unit of pipeline complexity."

Pitfall 5: Reporting AUC Without a Business Metric

Symptom. Your readout slide says "AUC 0.87, F1 0.74, log loss 0.21." Your stakeholder nods politely. They forget the project exists by the next quarterly review.

The Cost. Zero direct dollars, but the project stops getting funded because nobody can explain it to the VP. You've built something that works and nobody knows it works. Six months later, when budget gets tight, the project that "we couldn't really articulate the value of" is the first one cut.

The Fix. Every model report leads with the business metric. Always.

  • Open with the dollars or the KPI: "$2.1M in churn saved at this threshold," "14% precision lift over the rules engine," "23% reduction in fraud chargebacks."
  • Show the tradeoff curve in business terms. "At threshold 0.4, we catch 80% of fraud and flag 3% of legitimate transactions. At threshold 0.6, we catch 60% and flag 0.5%. Here's what each costs."
  • Tie the metric to a number the stakeholder already cares about. If they live by ARR, frame it in ARR. If they live by NPS, frame it in NPS.
  • AUC, F1, and log loss go in the appendix. They're for you and your DS reviewer, not the VP.
  • End with the decision the model enables, not the model itself.
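
To turn the threshold tradeoff into dollars, you need volumes and unit costs alongside the rates. A rough sketch: the catch and flag rates below are the two operating points quoted above, while the transaction volumes, loss per fraud, and cost per false flag are invented for illustration. Use your stakeholder's numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    threshold: float
    fraud_caught: float    # share of fraudulent transactions caught
    legit_flagged: float   # share of legitimate transactions flagged


def monthly_dollars(pt, n_fraud, n_legit, loss_per_fraud, cost_per_false_flag):
    """Net monthly value: fraud losses avoided minus the cost of false flags."""
    saved = pt.fraud_caught * n_fraud * loss_per_fraud
    friction = pt.legit_flagged * n_legit * cost_per_false_flag
    return saved - friction


points = [OperatingPoint(0.4, 0.80, 0.03), OperatingPoint(0.6, 0.60, 0.005)]

for pt in points:
    # All four business inputs below are hypothetical.
    net = monthly_dollars(pt, n_fraud=1_000, n_legit=200_000,
                          loss_per_fraud=400, cost_per_false_flag=15)
    print(f"threshold {pt.threshold}: net ${net:,.0f} per month")
```

With these made-up inputs the two thresholds land within a few thousand dollars of each other, which is exactly the kind of finding a VP can act on and an AUC number can't show.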

A model report that doesn't lead with business value is a status update, not a result.

Pitfall 6: Ignoring Data Quality at the Source

Symptom. Your notebook has 400 lines of cleaning code because "the data is messy." You've explained this in the last three standups. You're starting to think of yourself as the person who knows where the bodies are buried in the customer events table.

The Cost. IBM's old number put bad data costs at $3.1 trillion a year across the US economy. At company level, surveys consistently show DS teams spending 40-60% of their time on cleaning. That's your time. Your salary. And every cleaning rule you write in your notebook is a rule that will silently rot the next time the upstream schema changes, because nobody else knows it exists.

The Fix. Treat every cleaning rule as a bug report you owe the data engineering team.

  • File the bug. Once. With a clear repro and the impact ("this affects three production models").
  • Stop fixing it in your notebook. Fix it upstream, in the source pipeline, where every downstream consumer benefits.
  • Push for data contracts on the tables you depend on. Schema, nullability, freshness SLA, owner.
  • Make data quality a shared metric. If your team and data eng both have "data freshness" or "schema break rate" on a dashboard, fixes happen faster.
  • When eng pushes back ("not a priority this quarter"), come with the dollar cost: DS hours wasted, models degraded, decisions blocked.
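
A data contract doesn't need a platform to get started. Here's a minimal sketch of the four fields from the list (schema, nullability, freshness SLA, owner) as plain Python, with a check you could run against a batch of rows. The class names, the customer_events columns, and the owner address are hypothetical, and a real contract would live with the data engineering team, not in your repo.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Column:
    name: str
    dtype: type
    nullable: bool = False


@dataclass(frozen=True)
class DataContract:
    table: str
    owner: str
    freshness_sla: timedelta
    columns: tuple


def violations(contract, rows, last_loaded_at, now=None):
    """Return human-readable contract violations for a batch of rows (list of dicts)."""
    now = now or datetime.now(timezone.utc)
    problems = []
    if now - last_loaded_at > contract.freshness_sla:
        problems.append(f"{contract.table}: stale, last load {last_loaded_at.isoformat()}")
    for i, row in enumerate(rows):
        for col in contract.columns:
            if col.name not in row:
                problems.append(f"row {i}: missing column {col.name}")
            elif row[col.name] is None:
                if not col.nullable:
                    problems.append(f"row {i}: null in non-nullable {col.name}")
            elif not isinstance(row[col.name], col.dtype):
                problems.append(f"row {i}: {col.name} is not {col.dtype.__name__}")
    return problems


# Hypothetical contract and sample batch.
contract = DataContract(
    table="customer_events",
    owner="data-eng@example.com",
    freshness_sla=timedelta(hours=24),
    columns=(Column("customer_id", str), Column("event_ts", str),
             Column("plan", str, nullable=True)),
)
rows = [
    {"customer_id": "c1", "event_ts": "2024-10-01T08:00:00Z", "plan": None},
    {"customer_id": None, "event_ts": "2024-10-01T09:00:00Z", "plan": "pro"},
]
found = violations(
    contract, rows,
    last_loaded_at=datetime(2024, 9, 29, tzinfo=timezone.utc),
    now=datetime(2024, 10, 2, tzinfo=timezone.utc),
)
for problem in found:
    print(problem)
```

Each violation is a ready-made line for the bug report: which table, which column, which row, and who owns it.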

The notebook fix is a tax you pay forever. The upstream fix is a one-time investment. Pay it once.

Pitfall 7: Saying "The Model Is Fine, the Data Is Wrong" Without Fixing the Data

Symptom. You've used some version of this sentence in 3+ standups: "The model's working. The training data is just bad." You said it again in last week's review. The stakeholder went quiet.

The Cost. You, specifically. Within about six months, stakeholders route around you to a DS who "actually ships." You don't get pulled into the next strategic project. You don't get the promo conversation. The model that's "fine" is the model nobody trusts in production, which means it's not fine.

The Fix. Own the full pipeline. Period.

  • The model is your responsibility, and the model includes the data feeding it. There's no clean line where "your job" stops and "data eng's job" starts when the output is broken.
  • If the data is wrong, you escalate. Not in passive-voice Slack. In an email with a name on it and a deadline.
  • Write the spec for what "right" looks like. Numeric ranges, freshness, schema. Don't leave it to interpretation.
  • Sit in the eng review when the fix is being scoped. Make sure they're solving the right problem.
  • Don't drop it until it's fixed in production and verified in your monitoring dashboard.

"Not my problem" is a sentence that ends careers at this stage. The model isn't fine if it doesn't work in production. Making it work in production is your job.

The Pattern Underneath All Seven

Re-read the seven pitfalls. Every one is the same mistake.

Treating data science as a modeling job instead of a business-outcome job.

Pitfall 1 is "I focused on the model, not the decision." Pitfall 2 is "I focused on the model, not the handoff." Pitfall 3 is "I focused on the model, not what happens after launch." Four is "the model, not the maintenance cost." Five is "the model, not the dollars." Six is "the model, not the data feeding it." Seven is "the model, not the system around it."

The Senior DS bar is "ships measurable business impact." The IC2-forever bar is "produces accurate models that nobody uses." Look at any DS who made Senior in 2-3 years on your team. Then look at one who's been an IC2 for four years. The gap is almost never the math. It's whether they treated the model as the deliverable, or as one component of a system that has to actually work for the business.

Self-Audit

Run this against your last 2-3 projects. Be honest. The point isn't to feel bad. It's to know where you are.

For each project, ask:

  1. Did I write a one-page problem brief before opening a notebook?
  2. Was my handoff to engineering clean code in importable modules, not a notebook with hardcoded paths?
  3. Did I ship monitoring the same week I shipped the model?
  4. Did I keep features lean, adding only when SHAP justified it AND the stakeholder cared?
  5. Did my readout lead with the business metric, with AUC in the appendix?
  6. Did I file upstream data quality issues, or just clean it in my notebook?
  7. When data was wrong, did I own the fix end-to-end?

Count the no's.

  • 0-1 no's: You're on track. Keep doing what you're doing.
  • 2-3 no's: Course-correct now. Pick the one with the highest leverage and fix it on your next project.
  • 4+ no's: Have a real conversation with your manager this week. You're a re-org away from a less strategic team and you probably already feel it.

What "Good" Looks Like

The DS who makes Senior in 2-3 years isn't smarter than you. They've just internalized a different definition of the job.

They start every project with a problem brief, not a notebook. They write code from week one as if eng will run it on Monday. They ship monitoring the same week as the model and check it weekly. They keep features lean and ablate ruthlessly. They lead readouts with dollars, not AUC. They file data quality issues upstream and follow them through. When the data's wrong, they own the fix until it's deployed.

That person ships fewer notebooks and more monitored, business-impact systems. Stakeholders pull them into the next strategic project before the current one ends. The promo packet writes itself, because every project has a clean "I shipped X, it drove Y in business value, here's the dashboard" story.

You can be that person. Most of the gap is habits, not horsepower.
