
Data Analyst Tools and Tech Stack: The Honest 6-Layer Build (With Real Prices)

I joined a Series B last year and inherited a stack with three BI tools, two reverse-ETL vendors nobody could remember the password to, a "data catalog" with eleven entries in it, and a $40K annual bill that produced exactly one weekly Slack screenshot, before you even counted Looker. The Looker license alone was another $52K. Looker was rendering twelve dashboards. Two of them had been opened in the prior 90 days. One of those was mine, checking whether the dashboard still worked.

That's the moment I learned what "modern data stack" really means: a logo soup that vendors sell to analysts who haven't yet been forced to defend the line items. If you can't draw your stack on a napkin and justify each layer to a CFO who's never heard of dbt, you'll lose the budget fight, and the budget fight is coming.

So here's the honest version. Six layers. Real prices. The vendors I'd cut from most stacks. And a 30-day audit you can run before you sign another renewal.

Why this matters now

Every CFO I talk to is asking the same question: "Why is our analytics tooling spend up 40% year over year when our headcount is flat?" The answer is usually that someone bought Snowflake when Postgres would've worked, someone else bought Looker because it came up in an interview, and a third person added Fivetran because the old engineer left and nobody wanted to maintain the Python scripts.

None of those decisions were wrong in isolation. The problem is nobody owns the whole stack. Tooling spend is the easiest line item for a CFO to question and the easiest one for an analyst to defend badly. If your answer to "why do we have this?" is "because the last person set it up," you've already lost.

Defensible stacks have one trait in common: every tool maps to exactly one layer, and every layer earns its seat. Six layers is enough.

The core 6 layers (everything else is optional)

1. Warehouse

This is the foundation. Pick wrong and the next three layers cost you 3x what they should.

  • Snowflake: usage-based, roughly $2-$4 per credit depending on your edition and region. Brilliant for spiky workloads and team-wide SQL access. Easy to overspend on if you don't set warehouse auto-suspend to 60 seconds and force everyone onto X-Small for ad-hoc work. I've seen a single misbehaving dbt run torch $800 in a weekend.
  • BigQuery: pay-per-query at $6.25 per TiB scanned (on-demand), or commit to slots if you have predictable load. Great if your traffic is genuinely spiky and you don't want to manage compute. The slot model is confusing for first-timers. Read the docs before you commit.
  • Redshift: cheap if you commit to a reserved instance, painful if you don't. Reserved instances start around $0.25/node/hr and go up. The cluster model feels dated next to Snowflake/BigQuery, but if your shop is already on AWS and your DE team knows it cold, it's defensible.
  • Postgres: still the right answer under 1TB. Stop apologizing for it. A managed Postgres instance on RDS or Supabase runs $50-$500/month and handles everything a mid-stage analyst team actually queries. I've never seen a sub-1TB workload that justified Snowflake. Not once.
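To sanity-check the Snowflake and BigQuery numbers above, here is a minimal cost sketch. It assumes Snowflake's standard sizing, where each size step doubles the credits burned per running hour (X-Small is 1 credit/hour, Large is 8), and it takes the per-credit and per-TiB prices as inputs because your edition, region, and contract set the real numbers. It ignores storage and BigQuery's free tier, so treat the output as an order-of-magnitude estimate, not a quote.

```python
# Rough warehouse cost arithmetic. Prices are inputs, not vendor quotes.

# Snowflake credits consumed per hour while a warehouse is running, by size.
SNOWFLAKE_CREDITS_PER_HOUR = {"XS": 1, "S": 2, "M": 4, "L": 8, "XL": 16}

TIB = 1024 ** 4  # bytes in one tebibyte (BigQuery on-demand bills per TiB)


def snowflake_cost(size: str, hours_running: float, price_per_credit: float) -> float:
    """Dollar cost of one warehouse of `size` running (not suspended) for `hours_running`."""
    credits = SNOWFLAKE_CREDITS_PER_HOUR[size] * hours_running
    return credits * price_per_credit


def bigquery_on_demand_cost(bytes_scanned: int, price_per_tib: float = 6.25) -> float:
    """Dollar cost of scanning `bytes_scanned` at an on-demand rate (free tier ignored)."""
    return bytes_scanned / TIB * price_per_tib


if __name__ == "__main__":
    # A Large warehouse nobody suspended, left running for a 48-hour weekend at $2/credit.
    print(snowflake_cost("L", 48, 2.0))   # 768.0
    # The same weekend on X-Small.
    print(snowflake_cost("XS", 48, 2.0))  # 96.0
    # A dashboard scanning 50 GiB per refresh, refreshed hourly for 30 days.
    monthly_bytes = 50 * 1024 ** 3 * 24 * 30
    print(round(bigquery_on_demand_cost(monthly_bytes), 2))  # 219.73
```

At $2/credit, a Large warehouse left on over a 48-hour weekend comes to $768, the same ballpark as the $800 weekend above, while the identical weekend on X-Small is $96. That gap is the whole case for aggressive auto-suspend and X-Small defaults.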

The decision tree: under 500GB, Postgres. 500GB-5TB with spiky load, BigQuery or Snowflake. Over 5TB or heavy concurrent users, Snowflake. Over 50TB and you have a DE team, Redshift if you'll commit.
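The decision tree is small enough to write down as a function, which also exposes its gaps. In this sketch the thresholds are the ones above (with 1TB treated as 1,000GB), but the order of the checks is an assumption: the under-500GB Postgres rule is checked first, and the narrow Redshift case is checked before the general over-5TB Snowflake rule. The tree says nothing about 500GB-5TB with steady load, so the function returns a "not covered" message there instead of guessing.

```python
def pick_warehouse(data_gb: float, spiky_load: bool = False, heavy_concurrency: bool = False,
                   has_de_team: bool = False, will_commit: bool = False) -> str:
    """Rule-of-thumb warehouse choice. Thresholds from the tree; check order is assumed."""
    if data_gb < 500:
        return "Postgres"
    # Most specific branch first: very large data, a DE team, and a reserved commitment.
    if data_gb > 50_000 and has_de_team and will_commit:
        return "Redshift"
    if data_gb > 5_000 or heavy_concurrency:
        return "Snowflake"
    if spiky_load:
        return "BigQuery or Snowflake"
    # 500GB-5TB with steady load and modest concurrency: the tree is silent here.
    return "not covered by the tree: evaluate case by case"


print(pick_warehouse(200))                                          # Postgres
print(pick_warehouse(60_000, has_de_team=True, will_commit=True))   # Redshift
```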

2. ELT / ingestion

Getting data into the warehouse. This is where a lot of "modern stack" budgets quietly explode.

  • Fivetran: $1K-$10K/month depending on Monthly Active Rows. Brilliant when it works. Expensive when a connector breaks and you spend two days waiting on support. Their pricing model (MAR) is opaque enough that I've seen a $1,200/mo bill jump to $7,800 in one quarter because someone enabled a chatty Salesforce sync.
  • Airbyte: open-source, free if you self-host. Cloud version starts around $360/month for low volumes. Self-hosting on a small EC2 or GKE cluster runs roughly $200/month in infra. The trade-off: you'll fix things at 11pm. I've done it. It's fine if you have a half-decent DE or a strong analytics engineer. Don't pretend it's "free" if your team can't run it.
  • Stitch: mid-tier, fading. Decent if you already have it. I wouldn't start a new shop on it.

My default: Fivetran for the top 5-10 connectors that genuinely matter (Salesforce, HubSpot, Stripe, NetSuite, Postgres replicas). Airbyte for the long tail of weird APIs nobody else cares about. Don't run two of these at once for the same source. Pick.
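The one-tool-per-source rule is easy to check mechanically once you write down which connectors each tool owns. A minimal sketch, assuming you export the connector list from each tool's admin UI by hand; the inventory below is hypothetical, and source names are compared case-insensitively.

```python
from collections import defaultdict


def overlapping_sources(assignments: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return every source synced by more than one ELT tool, with the tools involved."""
    tools_by_source: dict[str, list[str]] = defaultdict(list)
    for tool, sources in assignments.items():
        for source in sources:
            tools_by_source[source.lower()].append(tool)
    return {src: sorted(tools) for src, tools in tools_by_source.items() if len(tools) > 1}


# Hypothetical inventory: Fivetran owns the core connectors, Airbyte the long tail.
current = {
    "Fivetran": ["Salesforce", "HubSpot", "Stripe", "NetSuite"],
    "Airbyte": ["Salesforce", "internal-billing-api", "partner-feed"],
}
print(overlapping_sources(current))  # {'salesforce': ['Airbyte', 'Fivetran']}
```

Anything this returns is a source you are paying to sync twice; pick one tool for it and turn the other connector off.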

3. Transformation

This layer is settled. It's dbt. Stop shopping.

  • dbt Core: free, open-source. Runs anywhere you can run Python. Most analyst teams should start here.
  • dbt Cloud: $50/developer/month for the Team tier, $300/developer/month for Enterprise. You're paying for the IDE, the scheduler, the docs hosting, and the CI integration. Worth it for teams of 3+ analysts who don't have a data engineer. Skip it if you have a DE willing to wire up Airflow or Dagster. Running dbt Core on Airflow is fine, and Airflow itself is free.

The only legitimate alternative is SQLMesh, and only if you're at a scale where dbt's full refresh patterns hurt. For most shops under 100 models, that's not you.

4. BI / dashboards

The most overshopped layer. Most teams have two BI tools because someone joined from a Tableau shop and someone else from a Looker shop and nobody made them pick.

  • Looker: enterprise pricing, public estimates put it at $50K+/year and going up fast. The semantic layer (LookML) is the moat. It's the only BI tool where governance actually works at scale. Don't buy it until you have a real semantic layer to build and a person to maintain it. Buying Looker without a LookML owner is like buying a Ferrari to drive in your garage.
  • Tableau: $75/user/month for Creator, $42 for Explorer, $15 for Viewer. Still the prettiest dashboards on the market. Painful for governance and version control. Good if your audience is execs who care about polish.
  • Hex: $40-$80/user/month depending on tier. Notebooks plus dashboards in one app. The right choice if your analysts spend half their time in SQL exploration and half in stakeholder-facing reports. Replaces the "Jupyter for me, Tableau for them" split.
  • Metabase: open-source, free self-hosted. Metabase Cloud starts at $85/month for 5 users. The right answer for Series A and earlier. Honestly, the right answer for a lot of Series B too. I've seen Metabase outperform a $40K Looker license at companies that didn't have semantic-layer needs yet.
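Per-seat pricing is where BI bills quietly grow, so do the seat math before a renewal. This sketch uses only the Tableau list prices quoted above (per user per month) and ignores discounts and taxes; the seat mix in the example is hypothetical.

```python
# Tableau list prices quoted above, in dollars per user per month.
TABLEAU_PRICE = {"Creator": 75, "Explorer": 42, "Viewer": 15}


def tableau_annual_cost(seats: dict[str, int]) -> int:
    """Annual list cost for a seat mix like {"Creator": 3, "Viewer": 40}."""
    monthly = sum(TABLEAU_PRICE[role] * count for role, count in seats.items())
    return 12 * monthly


print(tableau_annual_cost({"Creator": 3, "Explorer": 10, "Viewer": 40}))  # 14940
```

Three Creators, ten Explorers, and forty Viewers come to $14,940 a year at list. Run the same math against the people who actually opened a dashboard last quarter before accepting the renewal's seat count.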

My rule: one BI tool. If you're under $10M ARR, Metabase. If you have a LookML owner and execs who demand governance, Looker. If your analysts are notebook-first, Hex. Tableau if leadership specifically asked for it. Anything else is a renewal you'll regret.

5. Notebook / exploration

Where analysts actually do the messy thinking before it becomes a dashboard.

  • Jupyter: free, local, works forever. The default. Pair it with VS Code and you're set.
  • Hex: already on your books if you bought it for BI. Kills two layers with one tool. This is part of why Hex's pricing pencils out for some teams.
  • Deepnote: free tier is generous. Paid plans start at $39/user/month. Strong collaborative editing. Worth it if your team genuinely co-edits notebooks; less compelling if everyone works alone.

If you bought Hex for BI, don't add Deepnote. If you didn't, Jupyter is fine.

6. Ticket / intake

The layer most analysts don't think of as a layer. It is.

  • Jira, Notion, or Linear: pick one. Whatever the eng team uses is usually fine. The point isn't the tool. The point is killing the Slack DM as the intake channel.

Slack DMs as analytics intake produce no queue, no priority, no audit trail, and infinite "quick questions" that take six hours. A real intake tool gives you a queue, an SLA, and a record. Treat it like a tool.
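To make "a queue, an SLA, and a record" concrete, here is roughly what an intake tool does under the hood. This is a toy sketch for illustration, not a suggestion to build your own instead of using Jira, Notion, or Linear; the per-priority SLA targets are hypothetical. Requests come off the queue in SLA due-date order, and every submission and pickup lands in an append-only log, which is the audit trail a Slack DM never gives you.

```python
import heapq
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Hypothetical SLA targets per priority level (1 = most urgent); set your own.
SLA = {1: timedelta(days=1), 2: timedelta(days=3), 3: timedelta(days=10)}


@dataclass(order=True)
class Request:
    # Ordering compares `due` first, then `priority`; title and requester are ignored.
    due: datetime
    priority: int
    title: str = field(compare=False)
    requester: str = field(compare=False)


class IntakeQueue:
    def __init__(self) -> None:
        self._heap: list[Request] = []
        self.log: list[str] = []  # append-only record of every state change

    def submit(self, title: str, requester: str, priority: int, now: datetime) -> Request:
        req = Request(now + SLA[priority], priority, title, requester)
        heapq.heappush(self._heap, req)
        self.log.append(f"{now:%Y-%m-%d %H:%M} submitted P{priority} {title!r} by {requester}")
        return req

    def next_up(self, now: datetime) -> Request:
        """Pop the request with the earliest SLA due date and log the pickup."""
        req = heapq.heappop(self._heap)
        late = " (SLA breached)" if now > req.due else ""
        self.log.append(f"{now:%Y-%m-%d %H:%M} started {req.title!r}{late}")
        return req

    def __len__(self) -> int:
        return len(self._heap)


q = IntakeQueue()
t0 = datetime(2024, 3, 4, 9, 0)
q.submit("Churn by cohort deck", "VP Sales", 3, t0)
q.submit("Board metric is wrong", "CFO", 1, t0)
first = q.next_up(t0 + timedelta(hours=2))
print(first.title)  # Board metric is wrong
```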

CRM / sales data — the layer most analysts under-budget

Here's the under-discussed reality: half the "data quality" problems analysts wrestle with are CRM hygiene problems pushed downstream. When ops asks for "clean B2B data," the standard answer is to pipe Salesforce exports through four dbt transformations to deduplicate contacts, normalize company names, fix the phone formats, and patch the missing industry codes.

That's not data engineering. That's compensating for a CRM that didn't enforce hygiene at write time.

Rework starts at $12/user/month for CRM and Sales Ops and exports clean B2B contact and pipeline data straight to your warehouse. The cleanup pass you'd otherwise do in dbt mostly evaporates because the data is structured at intake (required fields, validated formats, deduplication on write). I've moved teams off Salesforce-plus-four-cleanup-models and watched their dbt build time drop from 22 minutes to 6 minutes.
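Here is what enforcing hygiene at write time means in practice, as a generic sketch. It is not Rework's API or any CRM's implementation; the field names, the required-field list, and the deliberately loose email check are assumptions. The point is where the rules live: required fields, format normalization, and deduplication all happen on write, so there is nothing left for a downstream dbt cleanup model to patch.

```python
import re

# Hypothetical required fields; use whatever your downstream models actually need.
REQUIRED_FIELDS = ("email", "company", "industry")
# Loose shape check (something@something.tld), not full RFC 5322 validation.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


class ContactStore:
    """In-memory stand-in for a CRM contact table that enforces hygiene on write."""

    def __init__(self) -> None:
        self._by_email: dict[str, dict] = {}

    def write(self, record: dict) -> dict:
        # 1. Required fields: reject the write instead of patching it in dbt later.
        missing = [f for f in REQUIRED_FIELDS if not str(record.get(f) or "").strip()]
        if missing:
            raise ValueError(f"missing required fields: {missing}")
        # 2. Validated formats: normalize first, then check.
        email = record["email"].strip().lower()
        if not EMAIL_RE.match(email):
            raise ValueError(f"invalid email: {record['email']!r}")
        clean = {**record, "email": email, "company": " ".join(record["company"].split())}
        # 3. Dedupe on write: the same email updates one row instead of adding another.
        self._by_email.setdefault(email, {}).update(clean)
        return self._by_email[email]

    def __len__(self) -> int:
        return len(self._by_email)


store = ContactStore()
store.write({"email": "Ana@Acme.com ", "company": "Acme   Corp", "industry": "SaaS"})
row = store.write({"email": "ana@acme.com", "company": "Acme Corp", "industry": "SaaS",
                   "phone": "555-0100"})
print(len(store), row["company"])  # 1 Acme Corp
```

The second write, with different casing and stray whitespace, updates the existing contact rather than creating a duplicate, and a record missing `industry` is rejected outright.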

This isn't a "Rework wins everywhere" pitch. If you're running Salesforce at a 500-person org with 12 admins, you're not switching tomorrow. But if you're at the stage where "we should buy Salesforce someday" is the plan, do the math on Rework first. The savings show up in dbt model count, not just license cost.

The 30-day stack audit (do this before you buy anything)

Every analyst should run this once a year. It pays for itself in week one.

Days 1-3, inventory. List every tool, every seat, every monthly bill. Pull the AP ledger. Find the credit card statement. Most teams find $10K-$30K/year in shelfware in week one. The Snowflake reader account nobody uses. The Tableau seat for the analyst who left in November. The Census subscription from when you tried reverse-ETL for a quarter.
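The inventory is mostly spreadsheet work, but the two numbers that matter, total annual spend and money going to unused seats, fall out quickly once you have one row per tool. A minimal sketch; the `active_seats` field assumes you can pull last-login data from each tool's admin or SSO export, and the example rows are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LineItem:
    tool: str
    monthly_cost: float  # from the AP ledger or card statement
    seats: int
    active_seats: int    # seats with a login in the last 90 days (admin/SSO export)


def annual_spend(items: list[LineItem]) -> float:
    return sum(item.monthly_cost * 12 for item in items)


def idle_seat_cost(item: LineItem) -> float:
    """Annualized cost of unused seats, assuming cost scales linearly with seat count."""
    if item.seats == 0:
        return 0.0
    return item.monthly_cost * 12 * (item.seats - item.active_seats) / item.seats


def shelfware(items: list[LineItem]) -> list[str]:
    """Tools with zero active seats: the first kill candidates."""
    return [item.tool for item in items if item.active_seats == 0]


# Hypothetical inventory rows.
inventory = [
    LineItem("Tableau", 1_245, 53, 41),
    LineItem("Census", 2_000, 5, 0),
    LineItem("Metabase", 85, 5, 5),
]
print(annual_spend(inventory))  # 39960
print(shelfware(inventory))     # ['Census']
```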

Days 4-10, map. Map each tool to a layer above. Anything that doesn't map gets a "why does this exist?" interview with whoever owns the contract. If they can't answer in two sentences, it's a kill candidate.

Days 11-20, find the duplicates. Two BI tools. Two ELT tools. Three things calling themselves "data catalogs." Pick one per layer. The duplicate is the kill.
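The mapping and duplicate-hunting steps boil down to two questions: which tools map to no layer, and which layers have more than one tool. A sketch, with the six layers from this article as plain string labels (the label names are an assumption) and a hypothetical stack as input; a tool mapped to `None` is one nobody could place.

```python
from collections import defaultdict
from typing import Optional

# The six layers from this article, as plain labels.
LAYERS = ("warehouse", "elt", "transformation", "bi", "notebook", "intake")


def audit_layers(tool_layers: dict[str, Optional[str]]) -> tuple[dict[str, list[str]], list[str]]:
    """Return (layers with more than one tool, tools that map to no layer)."""
    by_layer: dict[str, list[str]] = defaultdict(list)
    unmapped: list[str] = []
    for tool, layer in tool_layers.items():
        if layer in LAYERS:
            by_layer[layer].append(tool)
        else:
            unmapped.append(tool)  # gets the "why does this exist?" interview
    duplicates = {layer: tools for layer, tools in by_layer.items() if len(tools) > 1}
    return duplicates, unmapped


dupes, orphans = audit_layers({
    "Snowflake": "warehouse",
    "Fivetran": "elt",
    "dbt Core": "transformation",
    "Looker": "bi",
    "Tableau": "bi",
    "Atlan": None,
    "Linear": "intake",
})
print(dupes)    # {'bi': ['Looker', 'Tableau']}
print(orphans)  # ['Atlan']
```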

Days 21-30, write the kill list. Concrete dollar amounts. Concrete reasons. Present to the head of data with receipts. Bring the alternative migration plan, even if it's just "move to Metabase, here's the timeline." Heads of data hate vague kill lists. They love specific ones with replacement plans.
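A kill list lands better when every line carries a dollar amount, a reason, and a replacement. This sketch formats one from structured rows and reports net savings (current cost minus replacement cost), biggest first. The Looker, Census, and Metabase figures echo prices from this article; the reasons and migration estimates are hypothetical inputs.

```python
from dataclasses import dataclass


@dataclass
class Kill:
    tool: str
    annual_cost: float              # what you pay today
    replacement: str
    replacement_annual_cost: float  # what the replacement will cost
    reason: str
    migration_weeks: int


def net_savings(k: Kill) -> float:
    return k.annual_cost - k.replacement_annual_cost


def kill_list_report(kills: list[Kill]) -> str:
    """One line per kill, biggest net saving first, then the total."""
    lines = [
        f"{k.tool} -> {k.replacement}: saves ${net_savings(k):,.0f}/yr "
        f"in ~{k.migration_weeks} wk. {k.reason}"
        for k in sorted(kills, key=net_savings, reverse=True)
    ]
    lines.append(f"Total net savings: ${sum(net_savings(k) for k in kills):,.0f}/yr")
    return "\n".join(lines)


report = kill_list_report([
    Kill("Census", 24_000, "Python script", 0, "Two destinations only.", 2),
    Kill("Looker", 52_000, "Metabase Cloud", 1_020, "2 of 12 dashboards opened in 90 days.", 6),
])
print(report)
```

With these inputs the report leads with Looker ($50,980/yr net) and closes with a total of $74,980/yr, which is the number the head of data will ask for first.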

Stack-on-a-napkin diagram (the deliverable to your CFO):

Source systems → ELT (Fivetran) → Warehouse (Postgres or Snowflake) → dbt → BI (one tool) → Stakeholders
                                          ↑
                                   CRM (Rework)
                                   pipes clean
                                   data here
Intake (Jira) governs the queue.

If your napkin needs more boxes than that, you're overbuilt.

The cut list (vendors I'd remove from most stacks)

  • Reverse-ETL when you have two or three destinations. Hightouch and Census are real products, but if you're piping data to Salesforce and HubSpot and that's it, you don't need a $24K/year tool. Write a Python script. Schedule it in Airflow, or in whatever orchestrator already runs your dbt jobs. Move on.
  • Data catalogs under 50 tables. Atlan, Alation, and Collibra are great at scale. Under 50 tables, a Notion page beats them and costs nothing. Catalogs only earn their seat when nobody can find the right table without one.
  • "AI-powered" anything that wraps GPT around a SQL editor. I've evaluated five of these. They all generate plausible SQL that's wrong in subtle ways. Your analysts will spend more time correcting them than writing the SQL themselves. Wait 18 months.
  • Observability tools when you have 12 dbt models. Monte Carlo, Bigeye, and Elementary make sense at scale. With 12 models, your "observability layer" is a dbt test suite and a Slack alert. That's free.
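For the reverse-ETL case in the list above, the "Python script" can be this small. A minimal sketch, assuming a DB-API connection to your warehouse (an in-memory `sqlite3` database stands in for it here) and a `push` callable that wraps your CRM client's bulk-update call; the `account_health` table and the batch size of 200 are hypothetical.

```python
import sqlite3
from typing import Callable, Iterator

Row = dict[str, object]


def read_rows(conn: sqlite3.Connection, query: str) -> list[Row]:
    """Run a query and return rows as dicts keyed by column name."""
    cur = conn.execute(query)
    columns = [d[0] for d in cur.description]
    return [dict(zip(columns, values)) for values in cur.fetchall()]


def batched(rows: list[Row], size: int) -> Iterator[list[Row]]:
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def sync(rows: list[Row], push: Callable[[list[Row]], None], batch_size: int = 200) -> int:
    """Send rows to a destination in batches; returns how many rows were sent."""
    sent = 0
    for batch in batched(rows, batch_size):
        push(batch)  # hypothetical: wrap your CRM client's bulk-update call here
        sent += len(batch)
    return sent


# Stand-in warehouse with a hypothetical account_health table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account_health (account_id TEXT, health_score REAL)")
conn.executemany("INSERT INTO account_health VALUES (?, ?)",
                 [(f"acct-{i}", i / 10) for i in range(450)])

batches: list[list[Row]] = []
sent = sync(read_rows(conn, "SELECT account_id, health_score FROM account_health"),
            batches.append)
print(sent, len(batches))  # 450 3
```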

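For the observability case, "a dbt test suite and a Slack alert" can look like this sketch. It reads the `run_results.json` artifact that dbt writes to `target/` after `dbt test` or `dbt build`, collects the `unique_id` of every result whose status is `fail` or `error`, and posts them to a Slack incoming webhook. The webhook URL and how you schedule the script are up to you.

```python
import json
import urllib.request

# Statuses dbt records for failed tests ("fail") and broken nodes ("error").
FAILING_STATUSES = {"fail", "error"}


def failing_nodes(run_results: dict) -> list[str]:
    """unique_ids of every node that failed or errored in one dbt invocation."""
    return [
        result["unique_id"]
        for result in run_results.get("results", [])
        if result.get("status") in FAILING_STATUSES
    ]


def post_to_slack(webhook_url: str, failures: list[str]) -> None:
    """Post a plain-text message to a Slack incoming webhook."""
    payload = {"text": "dbt failures:\n" + "\n".join(failures)}
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10):
        pass


def check_and_alert(run_results_path: str, webhook_url: str) -> list[str]:
    """Read dbt's run_results.json and alert only when something failed."""
    with open(run_results_path) as f:
        failures = failing_nodes(json.load(f))
    if failures:
        post_to_slack(webhook_url, failures)
    return failures
```

Call `check_and_alert("target/run_results.json", webhook_url)` as the last step of the job that runs dbt. It stays silent when everything passes, which keeps the alert worth reading.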
Common pitfalls

Buying Looker before you have a semantic layer. I see this every quarter. A team buys Looker for the governance story, then realizes nobody on staff knows LookML, then pays a consultant $200/hour to build the semantic layer. Two years later they're still not using it the way Looker intended.

Picking Snowflake for a 200GB workload. Postgres handles 200GB on a $200/month RDS instance. Snowflake handles it for $2K/month minimum once you account for compute, storage, and the warehouses people forgot to suspend. If your data fits in RAM on a $500 server, you don't need a cloud warehouse yet.

Treating dbt Cloud as mandatory. It's not. dbt Core plus Airflow plus a free GitLab CI runner gives you 90% of dbt Cloud at 0% of the cost. The 10% you lose is the IDE and the docs site. Both are nice. Neither is mandatory.

Letting every team buy their own BI tool. Marketing buys Tableau. Sales buys Looker. Product buys Hex. Now you have three semantic layers, three sets of dashboards that disagree, and three renewals to fight. One BI tool. Negotiate hard. Make the teams adapt.

Measuring success

You're done auditing when:

  • You can name every line item in the analytics budget, every monthly price, and every layer it serves.
  • Tooling spend per analyst is benchmarked. (My target: $8K-$15K per analyst per year for everything below the warehouse, plus warehouse compute. If you're over $25K per analyst, something's wrong.)
  • Nothing in your stack exists "because the last person set it up."
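The per-analyst benchmark is one division, but writing it down forces you to decide what goes in the numerator. This sketch follows the target above, so the input excludes warehouse compute. Applying the $25K red flag to that same figure is an assumption; adjust the thresholds if you benchmark total spend instead.

```python
def spend_per_analyst(non_warehouse_tooling_annual: float, analysts: int) -> tuple[float, str]:
    """Annual tooling spend per analyst (excluding warehouse compute) vs. the targets."""
    if analysts <= 0:
        raise ValueError("analysts must be positive")
    per_analyst = non_warehouse_tooling_annual / analysts
    if per_analyst > 25_000:
        verdict = "over $25K: something's wrong"
    elif per_analyst > 15_000:
        verdict = "above the $8K-$15K target"
    elif per_analyst >= 8_000:
        verdict = "inside the $8K-$15K target"
    else:
        verdict = "under $8K"
    return per_analyst, verdict


print(spend_per_analyst(90_000, 6))  # (15000.0, 'inside the $8K-$15K target')
```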

That's the bar. Six layers, real prices, defensible to a CFO who's never heard of dbt. If you can write that paragraph cold, you'll keep the budget. If you can't, you won't.
