
Analyze: How AI Makes Sense of What You've Collected

Analyze capability — magnifying glass revealing patterns in data points

Meet Lisa. She runs a 140-person HR consulting firm. Business is strong. The team has been growing for three years.

But last spring, they made a bet that didn't land. They subscribed to an AI recruiting tool that promised to "screen candidates intelligently." Lisa's team ran a pilot on an open senior analyst role. Five hundred applications came in. The tool processed all of them in under four hours.

Then her Head of Recruiting reviewed the output. Forty percent of the candidates the AI marked as strong matches were clearly wrong fits. A candidate with six years of relevant experience was marked low-priority because the tool didn't recognize an alternate job title convention common in Australia. Two candidates with almost no relevant experience were marked high-priority because they'd optimized their resumes with the right keywords.

The AI wasn't broken. The Analyze capability was just being used in ways nobody had fully thought through, and the failure modes were invisible until they were expensive.

This article is for Lisa, and for any leader trying to understand what Analyze does, where it works, where it breaks, and how to hold it accountable.


What Analyze actually does

In the ACE Framework, Analyze is the second of five core capabilities: Ingest, Analyze, Predict, Generate, Execute. If Ingest takes data in, Analyze makes sense of it.

Analyze takes ingested information and answers the question: what is this? It classifies. It extracts. It summarizes. It translates. It identifies who said what, how they felt about it, and what they wanted.

Predict answers a different question: what will happen? Analyze is oriented to the present and past. It interprets current state: this email is a complaint, this contract contains a 90-day payment clause, this customer is frustrated. Predict takes that interpretation one step further by forecasting what's likely next.

Search is different again. Search returns documents. Analyze returns meaning. When you ask a knowledge base "find me contracts about payment terms," that's Search. When you ask it "summarize what our typical payment terms have been across the last 50 contracts," that's Analyze (combined with Generate for the output).

The distinction matters because many AI tools blur all three. Knowing which capability you're actually using tells you what failure modes to expect and what inputs you need.


The six sub-capabilities of Analyze

Analyze is the broadest of the five ACE capabilities. It encompasses six distinct operations that often work together but can also fail independently.

1. Classification

Classification is the most basic Analyze operation: putting something into a category. Is this email urgent or routine? Is this lead qualified or not? Is this support ticket a billing question, a bug report, or a feature request?

Classifiers assign labels. They can be binary (yes/no), multi-class (which of ten categories?), or multi-label (all applicable categories from a set). The quality of classification depends entirely on the quality and relevance of the training data the model learned from.

This is where Lisa's recruiting tool stumbled. The classifier was trained on resume data that didn't generalize well across regional job title conventions. It labeled candidates correctly within the distribution of its training data and got them wrong everywhere outside it.
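
To make the operation concrete, here's a minimal sketch of a multi-class classifier built on the OpenAI API, using the ticket categories above. The category list, model name, and prompt wording are illustrative assumptions, not a vendor recipe.

```python
# Minimal sketch: multi-class support ticket classification through the OpenAI API.
# The categories, model name, and prompt wording are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

CATEGORIES = ["billing question", "bug report", "feature request"]

def classify_ticket(text: str) -> str:
    prompt = (
        "Classify this support ticket into exactly one of these categories: "
        + ", ".join(CATEGORIES)
        + ". Reply with the category name only.\n\nTicket: " + text
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model choice
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    label = resp.choices[0].message.content.strip().lower()
    # Route anything unexpected to a human instead of guessing.
    return label if label in CATEGORIES else "needs human review"

print(classify_ticket("I was charged twice for the March invoice."))
```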

2. Extraction

Extraction pulls specific facts from unstructured text. Given a vendor contract, extract the payment terms, the liability cap, and the renewal conditions. Given a resume, extract years of experience per skill, most recent employer, education credentials. Given a support ticket, extract the product version and the error code.

Raw text goes in; structured fields come out. Tools like spaCy, Hugging Face transformers, and the OpenAI and Anthropic APIs all have strong extraction capabilities. Where extraction fails is at the borders of ambiguity: extracting "John" from a document without knowing which John, or pulling a date that could refer to multiple events.
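
A minimal sketch of contract-field extraction through the Anthropic API, along the lines of the payment-terms example above. The field names, model string, file path, and prompt are assumptions, and a production version would validate the JSON it gets back before trusting it.

```python
# Minimal sketch: pulling structured contract fields out of unstructured text
# via the Anthropic API. Field names and prompt wording are illustrative assumptions.
import json
import anthropic

client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set

def extract_contract_fields(contract_text: str) -> dict:
    prompt = (
        "Extract the following fields from the contract below and return JSON only, "
        'with keys "payment_terms", "liability_cap", and "renewal_conditions". '
        "Use null for anything the text does not state.\n\n" + contract_text
    )
    msg = client.messages.create(
        model="claude-sonnet-4-20250514",  # assumed model name; use whichever Claude model you have access to
        max_tokens=500,
        messages=[{"role": "user", "content": prompt}],
    )
    # Production code would validate this JSON and flag ambiguous values for review.
    return json.loads(msg.content[0].text)

fields = extract_contract_fields(open("vendor_contract.txt").read())
print(fields["payment_terms"])
```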

3. Summarization

Summarization condenses long content to its key points. A 60-page RFP becomes two paragraphs. A 90-minute sales call becomes five action items and three objections. A 5,000-response survey becomes a dozen themes.

Good summarization is harder than it looks. The model must decide what's important, which requires understanding context and intent. A summary of a legal contract for procurement looks different from a summary for compliance. Tools that don't let you specify the audience produce generic summaries that miss what actually matters.

Gong and Chorus (now part of ZoomInfo) do summarization on sales calls as their primary product. Snowflake Cortex includes summarization for structured data queries.
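
For teams building their own, here's a minimal sketch of audience-aware summarization via the OpenAI API. The model name, file path, and prompt wording are assumptions; the point is that the audience and focus are explicit parameters rather than left to the tool.

```python
# Minimal sketch: audience-aware summarization. The audience and focus are explicit
# parameters; the model name, file path, and prompt wording are assumptions.
from openai import OpenAI

client = OpenAI()

def summarize(document: str, audience: str, focus: str) -> str:
    prompt = (
        f"Summarize the document below for a {audience} reader. "
        f"Emphasize {focus}. Keep it under 150 words.\n\n{document}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

contract = open("vendor_contract.txt").read()
# Same document, two different summaries for two different readers.
print(summarize(contract, "procurement", "pricing, payment terms, and renewal dates"))
print(summarize(contract, "compliance", "liability, data handling, and termination clauses"))
```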

4. Translation

Translation in the ACE Framework is broader than language translation. It also includes format translation: code to documentation, data to narrative, voice to text.

Language translation (English to Spanish, Japanese to French) is now commodity-grade in AI. What's harder is domain translation: converting technical jargon into plain language an executive can act on, or translating customer feedback into structured product requirements. That kind of translation is still very sensitive to context and framing.

5. Sentiment and intent detection

Sentiment detection answers: how does the person writing this feel? Positive, negative, neutral, or more granularly: frustrated, satisfied, confused. Intent detection asks: what does this person want to accomplish?

These two are often paired but shouldn't be conflated. A customer who writes "I can't believe you released this feature finally, been waiting for years" has positive sentiment but is also voicing a complaint about the delay. Intent detection flags this as a feature adoption message, not a support request.

Sentiment and intent analysis are what let Zendesk AI route an angry customer to a senior agent, or let Intercom Fin distinguish between a customer who needs help and one who's about to churn.
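
Here's a minimal sketch of keeping the two signals separate, with a pretrained Hugging Face sentiment model alongside an LLM for intent. The intent labels and model choices are illustrative assumptions.

```python
# Minimal sketch: sentiment and intent scored separately rather than conflated.
# A pretrained Hugging Face model handles sentiment; an LLM picks the intent.
# The intent labels and model choices are illustrative assumptions.
from transformers import pipeline
from openai import OpenAI

sentiment_model = pipeline("sentiment-analysis")  # downloads a default pretrained model
llm = OpenAI()

INTENTS = ["support request", "feature adoption feedback", "churn risk", "sales inquiry"]

def analyze_message(text: str) -> dict:
    sentiment = sentiment_model(text)[0]  # e.g. {'label': 'POSITIVE', 'score': 0.98}
    resp = llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": "Which intent best fits this message: " + ", ".join(INTENTS)
                       + "? Answer with the intent only.\n\nMessage: " + text,
        }],
        temperature=0,
    )
    return {"sentiment": sentiment["label"], "intent": resp.choices[0].message.content.strip()}

print(analyze_message("I can't believe you released this feature finally, been waiting for years"))
```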

6. Entity and topic recognition

Entity recognition identifies and categorizes named things in text: people, organizations, dates, products, locations, currencies, contract amounts. Topic recognition identifies what a piece of text is about without relying on named entities.

An entity recognizer reads "On March 4, Acme Corp signed a $240,000 agreement for software services" and extracts: date (March 4), organization (Acme Corp), amount ($240,000), type (software services). A topic model reads a corpus of support tickets and identifies clusters ("account access," "billing discrepancy," "feature request," "performance issue") without anyone labeling them in advance.
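
Here's a minimal sketch of that same sentence run through spaCy's small pretrained English model. Standard NER covers dates, organizations, and amounts out of the box; a custom field like agreement type would need a custom label set or an LLM prompt.

```python
# Minimal sketch: named entity recognition with spaCy's small pretrained English model.
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("On March 4, Acme Corp signed a $240,000 agreement for software services.")

for ent in doc.ents:
    print(ent.text, ent.label_)
# Typical output, roughly:
#   March 4     DATE
#   Acme Corp   ORG
#   $240,000    MONEY
```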

Both are fundamental to making large volumes of unstructured data navigable at scale.


Six real business examples

Each workflow below tags the primary sub-capabilities at work.

Inbox triage [classification + intent]: An Analyze layer (via the OpenAI API) classifies 500 incoming emails per day as "response needed today," "FYI only," or "no reply needed," and tags intent within the first category. Response time on priority messages drops by 60%.

Support ticket routing [classification + extraction]: A Zendesk AI layer tags each ticket by issue type and extracts the product version and account tier. Tickets route automatically, and the enterprise SLA hit rate goes from 71% to 94%.

Sales call analysis [summarization + sentiment]: Using Gong or Chorus, every recorded call produces a summary, the top three objections raised, the prospect's sentiment arc, and competitive products mentioned. Coaching conversations become specific rather than generic.

Survey synthesis [topic recognition + summarization]: 5,000 free-text survey responses, processed via Hugging Face topic modeling or a GPT-class API, produce 12 themes with representative quotes and a sentiment breakdown in about four hours. Without AI, two people spent a week on the same work.

Resume screening [extraction + classification]: Lisa's firm, post-pilot, uses Analyze for extraction only: pulling years of experience per skill, most recent role, and credentials into structured fields. Recruiters filter and rank those fields themselves. Accuracy improves, and the "alternative job title" problem disappears because recruiters now see the underlying data rather than a black-box score.

Customer feedback analysis [sentiment + entity recognition]: An Analyze pipeline on the Anthropic API extracts mentioned product features, assigns sentiment per feature, and produces a ranked list of what customers praise and criticize. The product team gets actionable input in under a day rather than waiting for a quarterly manual analysis.


Analyze vs. Predict: the distinction that matters

This is the confusion that costs the most. Many AI products describe themselves as "analyzing" data when they're actually doing prediction. The distinction in the ACE Framework is time orientation.

Analyze interprets the present. This email is a billing complaint. This call had three objections. This customer has negative sentiment. These statements describe what is, based on the data you have.

Predict forecasts the future. This customer is 73% likely to churn next quarter. This lead has an 82% probability of closing. This transaction has a 99.4% chance of being fraudulent. These statements project forward based on historical patterns.

The failure modes differ too. Analyze fails when categories are wrong, training data is outdated, or context is ambiguous. Predict fails when historical patterns stop reflecting current reality.

A lead scoring tool that says "this lead is a good fit" is doing Analyze (fit score based on current attributes). A lead scoring tool that says "this lead is 78% likely to close in Q2" is doing Predict. Both useful. Both fail differently. Knowing which one you have tells you which problems to watch for.


Analyze vs. Search: two different jobs

Search returns documents. Analyze returns meaning. Search for "customer complaints about billing" and you get documents. Ask Analyze to "summarize what customers have complained about in billing-related tickets over the last six months" and you get themes, frequencies, representative quotes, and sentiment patterns.

Most real AI workflows combine both: retrieve (Ingest + search) to get the relevant documents, then Analyze to make sense of what was retrieved, then Generate to produce a response or report. This combination is the RAG (Retrieval-Augmented Generation) pattern, and Analyze is the middle step that makes it work.
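
A minimal sketch of that shape, with a toy keyword filter standing in for the retrieval index. The sample tickets, model name, and prompt are illustrative assumptions.

```python
# Minimal sketch of the retrieve -> analyze -> generate shape behind RAG.
# The keyword filter is a toy stand-in for a real search or vector index;
# the sample tickets, model name, and prompt are illustrative assumptions.
from openai import OpenAI

client = OpenAI()

tickets = [
    "Billing: charged twice for the annual plan in March.",
    "Feature request: dark mode for the dashboard.",
    "Billing: invoice shows the wrong currency for our EU entity.",
]

def retrieve(query: str, docs: list[str]) -> list[str]:
    # Stand-in for Ingest + search. A real system would query an index, not scan a list.
    return [d for d in docs if any(word in d.lower() for word in query.lower().split())]

def summarize_retrieved(query: str, docs: list[str]) -> str:
    context = "\n".join(retrieve(query, docs))
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f"Question: {query}\n\nSummarize the themes in these tickets, "
                       f"with rough counts:\n{context}",
        }],
    )
    return resp.choices[0].message.content

print(summarize_retrieved("billing complaints", tickets))
```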


Common tools for Analyze

Text classification, extraction, sentiment: OpenAI API, Anthropic API, Hugging Face Transformers
NLP and entity recognition: spaCy, Hugging Face, AWS Comprehend
Sales call analysis: Gong, Chorus (ZoomInfo), Fireflies
Structured data analysis: Snowflake Cortex, DuckDB, Google BigQuery ML
Customer support classification: Zendesk AI, Intercom Fin, Freshdesk Freddy

Most mid-market companies don't build Analyze capabilities from scratch. They buy them bundled inside platforms (Gong for sales calls, Zendesk for support) or use them via API (OpenAI, Anthropic) to build custom workflows. The API route gives more control; the bundled route ships faster.


How Analyze connects to other ACE capabilities

Analyze is almost always the middle layer in a larger workflow.

Ingest feeds Analyze. A call recording becomes a transcript (Ingest), and Analyze surfaces the objections and sentiment. Ingest converts raw signals into a form Analyze can work with.

Analyze feeds Predict. Prediction needs structured inputs the model can pattern-match against historical outcomes. Analyze creates those features by classifying a lead's job title, extracting their company size, and tagging products they've mentioned.

Analyze feeds Generate. You can't write a good response to a customer complaint without first understanding the complaint. Analyze reads the ticket, identifies the issue type and sentiment, and gives Generate the context it needs.

The chain Ingest → Analyze → Generate is one of the most common patterns in business AI. Meeting intelligence tools (Gong, Fireflies) follow it exactly: take in the call (Ingest), understand what happened (Analyze), produce a summary and follow-up (Generate).
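
Here's a minimal sketch of that chain, assuming the transcript already exists because Ingest (speech-to-text) ran upstream. The file path, model name, and prompts are illustrative assumptions, not how Gong or Fireflies implement it.

```python
# Minimal sketch of the Ingest -> Analyze -> Generate chain for a recorded call.
# Ingest (speech-to-text) is assumed to have happened upstream; the transcript path,
# model name, and prompt wording are illustrative assumptions.
import json
from openai import OpenAI

client = OpenAI()

def analyze_call(transcript: str) -> dict:
    # Analyze: turn the raw transcript into structured meaning.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},
        messages=[{
            "role": "user",
            "content": 'Return JSON with keys "summary", "objections" (a list), and '
                       f'"sentiment" for this sales call transcript:\n\n{transcript}',
        }],
    )
    return json.loads(resp.choices[0].message.content)

def draft_follow_up(analysis: dict) -> str:
    # Generate: write the follow-up from the analysis, not from the raw transcript.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": "Draft a short follow-up email that addresses these objections: "
                       f"{analysis['objections']}. Call summary: {analysis['summary']}",
        }],
    )
    return resp.choices[0].message.content

transcript = open("call_transcript.txt").read()  # output of the Ingest step
print(draft_follow_up(analyze_call(transcript)))
```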


Failure modes

Analyze is reliable in controlled conditions and surprisingly brittle when conditions shift. These are the four failure modes that show up most often.

Label drift. A classifier trained on last year's support tickets performs well on last year's support tickets. When your product, your customers, or the kinds of problems they report change, the classifier's categories stop fitting the new data. This can happen slowly (gradual degradation) or suddenly (a product launch creates ticket types the model has never seen). The fix is monitoring accuracy over time and retraining regularly.

Inherited bias. Classifiers learn from training data. If that data reflects historical human decisions, and those decisions were biased (in recruiting, in loan approvals, in support prioritization), the classifier reproduces those biases at scale. The AI doesn't add bias from nothing; it amplifies patterns already present in the data. This is the failure mode in AI-powered resume screening: classifiers trained on historical hiring data often underweight candidates from underrepresented groups because those groups were underrepresented in past hires.

Overconfident edge cases. Most classifiers output a confidence score. But classifiers often show high confidence on inputs that are actually edge cases, close calls the model has never seen before. The confidence score looks reassuring. The classification is wrong. Human spot-checking on high-stakes classifications is the only way to catch this.

Context-blind extraction. Extraction pulls named entities from text, but names don't carry context with them. "John signed the agreement" (which John?), "The contract expires in 90 days" — from when? Extractors output the literal text they found without resolving the ambiguity. In a document with multiple parties, dates, and references, context-blind extraction creates structured data with gaps that look complete but aren't.


How to measure Analyze quality

Analyze is more measurable than most AI capabilities because it produces labeled outputs you can compare against ground truth.

Precision and recall. Build a labeled test set: a sample of inputs you've manually classified correctly. Precision tells you what fraction of the model's positive classifications are actually positive. Recall tells you what fraction of actual positives the model caught. A good classifier has both above 80%; excellent is above 90%.
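
A minimal sketch of that scoring step with scikit-learn; the ten labels below are invented for illustration, and a real test set would come from your own manual review of sampled outputs.

```python
# Minimal sketch: scoring a classifier against a hand-labeled test set with scikit-learn.
# The labels below are invented for illustration.
from sklearn.metrics import precision_score, recall_score

# Ground truth from human reviewers vs. the model's output on the same tickets.
y_true = ["billing", "bug", "billing", "feature", "bug", "billing", "bug", "feature", "billing", "bug"]
y_pred = ["billing", "bug", "bug",     "feature", "bug", "billing", "bug", "billing", "billing", "bug"]

# average="macro" weights each category equally, so a rare category can't hide behind a common one.
print("precision:", round(precision_score(y_true, y_pred, average="macro"), 2))
print("recall:   ", round(recall_score(y_true, y_pred, average="macro"), 2))
```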

5% human spot-check. Sample roughly 5% of Analyze outputs and have a human review them. This catches drift before it shows up in aggregate metrics and builds institutional knowledge about how the model fails, not just that it fails.

Drift detection. Re-run your test set every 30 to 90 days. If precision and recall are declining, the data distribution has shifted and the model needs retraining. The recruiting tool Lisa used had probably been degrading for months before anyone reviewed the output carefully enough to notice.


Why Analyze is the workhorse

Call an AI vendor today and ask what their product does. Whatever the feature name, the underlying work is probably Analyze. Routing. Tagging. Summarizing. Extracting. Scoring.

Of the five ACE capabilities, Analyze appears in the widest range of business workflows. It's the interpretation layer that converts raw data into something a human or another system can act on. Without it, Ingest just accumulates, Predict has nothing to pattern-match, and Generate has no context to work from.

It's also the quietest capability. When it works, users don't notice. Emails arrive pre-sorted. Tickets route correctly. Calls produce accurate summaries. The work is invisible until it fails. And when it fails, the failure is usually blamed on "the AI is wrong" rather than on label drift, inherited bias, or context-blind extraction.

Knowing those distinctions tells you which questions to ask before you buy, which metrics to monitor after you deploy, and which failures to expect when conditions change.