Pattern Selection by Data Type: Which AI Patterns Work With Your Data

Matrix mapping 7 data types to 10 AI patterns with fit ratings

Pattern selection has two entry points.

You can start with your business problem and find the pattern that solves it. That's the problem-first path, covered in Choosing the Right AI Pattern for Your Problem.

Or you can start with your data and find what's deployable from where you actually are. That's the data-first path. And it's often more honest, because the patterns that look exciting in vendor demos aren't always the patterns your current data can support.

This article is the data-first path. If you know what data types you have (and at what quality), you can narrow the field quickly. Some patterns will be immediately deployable. Others will require data work first. A few will be out of reach until you solve specific prerequisite problems. For the full taxonomy of data types before you start, the 7 types of data that power business AI is the right primer.

The reference matrix

Strong Fit means the pattern uses this data type as a primary input and is designed around it. Weak Fit means the pattern can use this type but it's secondary or situational. Impossible means the pattern can't meaningfully consume this data type.

| Data Type | RAG Assistant | Scoring + Routing | Vision Extract | Meeting Intelligence | Anomaly Agent | Generative Research | Document Review | Workflow Copilot | Personalization Engine | Autonomous Agent |
|---|---|---|---|---|---|---|---|---|---|---|
| Text | Strong | Weak | Weak | Weak | Weak | Strong | Strong | Strong | Weak | Strong |
| Structured | Weak | Strong | Weak | Weak | Strong | Weak | Weak | Weak | Strong | Strong |
| Image | Impossible | Impossible | Strong | Impossible | Weak | Impossible | Weak | Impossible | Weak | Weak |
| Audio | Impossible | Impossible | Impossible | Strong | Impossible | Impossible | Impossible | Impossible | Impossible | Weak |
| Video | Impossible | Impossible | Impossible | Strong | Impossible | Impossible | Impossible | Impossible | Impossible | Weak |
| Code | Weak | Impossible | Impossible | Impossible | Weak | Weak | Strong | Strong | Impossible | Strong |
| Time-series | Impossible | Strong | Impossible | Impossible | Strong | Impossible | Impossible | Impossible | Strong | Weak |

Read this as a first-pass filter. If your primary available data is audio recordings, you're looking at Meeting Intelligence. If it's structured CRM records with outcome labels, Scoring and Routing and Anomaly Agent are your most deployable options.

Having the data type is necessary but not sufficient. The quality and accessibility of that data determine whether the pattern actually works. Gartner's research on AI-ready data essentials makes this distinction sharp: "high-quality" data by traditional standards is not the same as AI-ready data, because AI training requires representative data, including edge cases that traditional data cleaning removes. Gartner predicts that through 2026, organizations will abandon 60% of AI projects that are not supported by AI-ready data.
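
As a first-pass filter, the matrix reduces to a lookup table. The sketch below is one hypothetical way to encode it in Python. The ratings are copied from the reference matrix; the `ROWS` layout and the `patterns_for` helper are illustrative names, not part of any tool.

```python
# Column order follows the reference matrix.
PATTERNS = [
    "RAG Assistant", "Scoring + Routing", "Vision Extract",
    "Meeting Intelligence", "Anomaly Agent", "Generative Research",
    "Document Review", "Workflow Copilot", "Personalization Engine",
    "Autonomous Agent",
]

# One row per data type; S = Strong Fit, W = Weak Fit, I = Impossible.
ROWS = {
    "Text":        "S W W W W S S S W S",
    "Structured":  "W S W W S W W W S S",
    "Image":       "I I S I W I W I W W",
    "Audio":       "I I I S I I I I I W",
    "Video":       "I I I S I I I I I W",
    "Code":        "W I I I W W S S I S",
    "Time-series": "I S I I S I I I S W",
}

FIT_NAMES = {"S": "Strong", "W": "Weak", "I": "Impossible"}

# Expand the compact rows into {data_type: {pattern: fit}}.
MATRIX = {
    data_type: {p: FIT_NAMES[c] for p, c in zip(PATTERNS, row.split())}
    for data_type, row in ROWS.items()
}


def patterns_for(data_type, fit="Strong"):
    """Return the patterns rated `fit` for `data_type`, in column order."""
    return [p for p, rating in MATRIX[data_type].items() if rating == fit]


print(patterns_for("Audio"))       # ['Meeting Intelligence']
print(patterns_for("Structured"))
```

Filtering on `"Strong"` first and widening to `"Weak"` only if nothing fits mirrors how the matrix is meant to be read.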

Key Facts: Enterprise Data and AI Readiness

  • 80-90% of business data is unstructured, while only 10-20% is structured, per Gartner. Most companies vastly overestimate how much AI-ready structured data they actually have.
  • Only 10% of companies feel fully prepared to adopt AI, and 54% admit they do not have the necessary database infrastructure in place. (Typedef AI Unstructured Data Report, 2025)
  • Gartner predicts that through 2026, organizations will abandon 60% of AI projects that are not supported by AI-ready data. The failure point is data readiness, not model limitations.

Text data

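Text is the most versatile input. Four patterns are built around it. (The matrix also rates Autonomous Agent a Strong Fit for text, since most agent work runs on text or structured inputs.)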

RAG Assistant lives entirely in text. It ingests your knowledge base (policies, SOPs, product documentation, past tickets), retrieves relevant documents, and generates answers. For RAG to work, your text needs to be findable (indexed, not scattered across file shares), recent (outdated documents produce confident wrong answers), and non-contradictory (two documents that say opposite things will produce inconsistent outputs). The pattern tolerates messy prose well but breaks on conflicting source documents.
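
The "recent" requirement is the easiest of the three to check mechanically. A minimal sketch, assuming you can export each document's title and last-updated date; the `stale_documents` helper, the sample inventory, and the 365-day threshold are all illustrative assumptions rather than recommended values.

```python
from datetime import date


def stale_documents(docs, today, max_age_days=365):
    """Return titles of documents last updated more than `max_age_days` ago.

    `docs` is a list of (title, last_updated) pairs. The 365-day default is
    an assumed threshold for illustration, not a figure from this article.
    """
    return [
        title
        for title, last_updated in docs
        if (today - last_updated).days > max_age_days
    ]


# Hypothetical knowledge-base inventory.
docs = [
    ("Refund policy", date(2025, 11, 3)),
    ("Onboarding SOP", date(2023, 2, 14)),
    ("Pricing FAQ", date(2025, 6, 20)),
]

print(stale_documents(docs, today=date(2026, 1, 15)))  # ['Onboarding SOP']
```

Contradiction detection is harder and usually needs human review; a staleness report at least narrows where to look.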

Generative Research consumes text from multiple sources (web, internal docs, proprietary databases) and synthesizes it. The quality requirement here differs from RAG's: you need breadth more than structure. The pattern can handle heterogeneous sources. What it needs is access to those sources, whether via API, scraping, or direct document upload.

Document Review requires structured text, not conversational text. An NDA or an MSA has consistent sections and known clause patterns. Generative Research can work with a blog post. Document Review needs documents that follow templates or standards. Feed it free-form emails and its flagging becomes noise.

Workflow Copilot uses whatever text is in the user's current context: the email they're drafting, the ticket they're resolving, the CRM notes on the account they have open. The quality requirement is contextual freshness, not historical volume. The copilot needs real-time access to current-state text, not a historical archive.

Structured data

Structured data is numbers, categories, dates, and schema-consistent records. Three patterns depend on it most directly.

"Companies that attempt to deploy Scoring and Routing models on CRM datasets with less than 80% field completion on outcome labels produce models that function as noise rather than signal. High-scored leads close at the same rate as low-scored leads. The problem isn't the model. It's the input." (Rework Data Readiness Analysis, 2026)

Scoring and Routing needs structured data with three properties: sufficient volume (typically 1,000+ historical records), outcome labels (deals marked won/lost, leads marked converted/not, claims marked fraudulent/legitimate), and field completeness (if 40% of records have null values for key features, the model learns from incomplete signal). This is the pattern most directly blocked by incomplete CRM hygiene. A clean structured dataset with labeled outcomes is one of the most valuable AI assets a company can have. Wikipedia's overview of structured data provides the foundational distinction useful here: structured data conforms to a predefined schema, while roughly 90% of enterprise data is unstructured. Most companies have far more of the latter and far less of the former than they assume when planning AI projects.
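
The three properties can be checked before any model is trained. The sketch below is a hypothetical readiness check over a CRM export held as a list of dicts. The 1,000-record and 80% defaults mirror the figures quoted in this section; applying the same 80% bar to feature fields is an assumption made here for simplicity, and `scoring_readiness` is an illustrative name.

```python
def scoring_readiness(records, outcome_field, feature_fields,
                      min_records=1000, min_completion=0.80):
    """Check volume, outcome-label completion, and feature completeness.

    `records` is a list of dicts; a field counts as missing when it is
    absent, None, or an empty string. Reusing `min_completion` for feature
    fields is an assumption, not a threshold stated in the article.
    """
    def completion(field):
        if not records:
            return 0.0
        filled = sum(1 for r in records if r.get(field) not in (None, ""))
        return filled / len(records)

    label_completion = completion(outcome_field)
    weak_features = {
        f: round(completion(f), 2)
        for f in feature_fields
        if completion(f) < min_completion
    }
    return {
        "enough_records": len(records) >= min_records,
        "label_completion": round(label_completion, 2),
        "labels_ok": label_completion >= min_completion,
        "weak_features": weak_features,
    }


# Hypothetical CRM export: 1,200 deals, 30% missing an outcome.
records = (
    [{"outcome": "won", "industry": "saas"}] * 840
    + [{"outcome": None, "industry": "saas"}] * 360
)
result = scoring_readiness(records, "outcome", ["industry"])
print(result)
```

In this example the volume is sufficient but only 70% of records carry an outcome label, so the dataset fails the label check even though the feature field is complete.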

Anomaly Agent needs structured data with a stable baseline: time-series metrics, transactional records, or event logs. The model learns what "normal" looks like and flags deviations. Quality requirements: the baseline data needs to be clean (anomalies in the training period confuse the model), consistent (the same fields, the same schema, over time), and long enough (60 days minimum, a full year for seasonal businesses).
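
To make "learn normal, flag deviations" concrete, here is a deliberately simple sketch that uses a z-score against the baseline mean and standard deviation. This is a stand-in technique chosen for brevity, not a description of how any particular Anomaly Agent product works. The `flag_anomalies` name and the threshold of 3 are assumptions; the 60-day minimum follows the text.

```python
from statistics import mean, pstdev


def flag_anomalies(baseline, new_values, min_days=60, z_threshold=3.0):
    """Flag values in `new_values` that deviate from a daily baseline.

    Uses a plain z-score, a simple stand-in for a real anomaly model.
    `z_threshold` is an assumed value; `min_days` follows the 60-day
    minimum mentioned in the article.
    """
    if len(baseline) < min_days:
        raise ValueError(f"need at least {min_days} days of baseline")
    mu, sigma = mean(baseline), pstdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: anything different is a deviation.
        return [v for v in new_values if v != mu]
    return [v for v in new_values if abs(v - mu) / sigma > z_threshold]


# Hypothetical daily order counts: 60 days alternating 98 and 102.
baseline = [98, 102] * 30
print(flag_anomalies(baseline, [101, 99, 140]))  # [140]
```

The same sketch shows why a dirty baseline hurts: an outlier left in the training window inflates the standard deviation and hides later deviations.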

Personalization Engine uses structured behavioral data: what a user clicked, what they bought, how long they stayed on a page, what they rated. The pattern works best when behavioral events are tracked consistently, each event has a user identifier, and there's enough volume per user to build an individual profile. Low-traffic products or B2B with small user counts often can't deploy this pattern effectively because there's not enough per-user behavior to personalize from.
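
A quick way to test the per-user volume condition is to count events per identified user. The sketch below is hypothetical: `profile_coverage` and the cutoff of 20 events are assumptions for illustration, since the right minimum depends on the product.

```python
from collections import Counter


def profile_coverage(events, min_events_per_user=20):
    """Return the share of users with enough events to build a profile.

    `events` is a list of (user_id, event_type) pairs. Events without a
    user identifier are dropped, since they cannot be attributed to anyone.
    `min_events_per_user` is an assumed cutoff, not a figure from the text.
    """
    counts = Counter(user_id for user_id, _ in events if user_id is not None)
    if not counts:
        return 0.0
    ready = sum(1 for n in counts.values() if n >= min_events_per_user)
    return ready / len(counts)


# Hypothetical event log: one active user, three occasional ones,
# and ten anonymous events that cannot be used.
events = (
    [("u1", "click")] * 25
    + [("u2", "click")] * 3
    + [("u3", "purchase")] * 2
    + [("u4", "view")] * 1
    + [(None, "view")] * 10
)
print(profile_coverage(events))  # 0.25
```

A low coverage figure is the quantitative version of the low-traffic problem described above.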

Image data

Image is the most constrained data type. One pattern is built around it. A few others touch it situationally.

Vision Extract is the canonical image pattern. It ingests images or scanned documents, extracts structured fields, and pushes records to a system of record. The quality requirements here are specific and non-negotiable: image resolution must be high enough to read text clearly, document variants need to be represented in the training data (an invoice from Vendor A looks different from Vendor B), and the target fields need to be consistent enough that the model can locate them reliably. See Vision Extract: Turning Images Into Structured Data for detailed quality standards.

Personalization Engine can use product images as signals for recommendation (if you viewed this blue sneaker, here are similar styles). But this is more of a feature than a standalone capability. Most mid-market deployments of Personalization Engines use behavioral structured data, not raw image signals.

Anomaly Agent can flag visual anomalies (a product shelf with a gap, a manufacturing part with a defect) in specialized deployments. But this requires a dedicated computer vision pipeline, not a standard business AI deployment.

Audio data

Audio is nearly single-purpose in business AI.

Meeting Intelligence is the audio pattern. It ingests audio recordings of calls and meetings, transcribes them, extracts topics and action items, generates summaries, and pushes structured data to CRM. The quality requirements are practical: call recording needs to be enabled (which requires participant consent in many jurisdictions), audio quality needs to be sufficient for transcription (bad mobile connections produce poor transcripts, which propagate through every downstream step), and speaker diarization (knowing which voice belongs to which person) matters for attribution.

The important distinction: audio files and audio transcripts are different things. A Meeting Intelligence deployment that ingests raw audio is running a more complex pipeline than one that ingests pre-transcribed text. Many teams skip the raw audio ingestion and use transcript services (Otter.ai, Zoom transcription, Teams transcription) as a pre-step, then feed the transcript into the analysis layer. That's a valid architecture and often more cost-effective.

Autonomous Agent can in principle consume audio (a voice-interface agent), but this is rare in standard business deployments. Most autonomous agent work runs on text or structured data inputs.

Video data

Video is the highest-processing-overhead data type and mostly relevant as a superset of audio.

Meeting Intelligence handles video calls. The video component adds visual information (is the prospect nodding? is the camera off?) but most deployed Meeting Intelligence tools analyze the audio track and transcript, not the video stream. The video-specific features (engagement signals, visual cues) are present in products like Gong but are secondary to call content analysis. If you're choosing between audio recording and video recording for Meeting Intelligence, audio is sufficient for most use cases.

The overhead matters: video files are 10-100x larger than audio files for the same duration. Storing, processing, and indexing video at scale requires significantly more infrastructure than audio-only pipelines. Most teams implementing Meeting Intelligence for the first time should start with audio.

Autonomous Agent in visual navigation contexts (a browser-control agent that needs to see a screen) uses video or screenshots as inputs. This is a specialized deployment pattern, not a standard business AI workflow.

Code data

Code is text, but it's not prose. The patterns that work with code treat it differently.

Workflow Copilot is the canonical code pattern. GitHub Copilot, Cursor, and similar tools are Workflow Copilots specialized for a coding context. They ingest the file open in the editor, the repository context, and the user's in-progress edits, and they generate completion suggestions, refactors, and new functions. Quality requirements: the code needs to be accessible to the tool (local repo, IDE integration), and the context window matters more than with prose copilots because code dependencies span files.

Document Review applies to code in compliance or security contexts. A security audit reviewing code for OWASP vulnerabilities, or a legal review checking that an API integration doesn't violate a vendor's terms, is a Document Review workflow applied to code as the document. Standard document review tools don't support this. You need tools purpose-built for code analysis.

Autonomous Agent at the coding end of the spectrum (agents that read issues, write code, run tests, and open pull requests) treats code as both input and output. The agent Ingests a GitHub issue (text + code context), Analyzes the scope, Generates a fix, and Executes the commit and test run. This is one of the more mature autonomous agent applications in 2026.

Time-series data

Time-series data is any measurement indexed to time: metrics, sensor readings, transaction logs, usage events. Three patterns use it.

Anomaly Agent is the primary time-series pattern. It's built to learn what a stable time-series looks like and flag deviations. Freshness and consistency are the two quality requirements that matter most. A metric stream that changes instrumentation mid-way through creates false anomalies at the instrumentation change. Missing data points (gaps in the stream) create false negatives. The model treats the gap as normal, so anomalies that happen during a gap go undetected.
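
Gaps are cheap to detect before the data reaches a model. A minimal sketch, assuming the stream has a known expected interval; `find_gaps` and the 1.5x tolerance factor are illustrative assumptions.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, expected_interval, tolerance=1.5):
    """Return (start, end) pairs where consecutive points are too far apart.

    A gap is any spacing larger than `expected_interval * tolerance`.
    `tolerance` is an assumed slack factor for jitter in the stream.
    """
    ordered = sorted(timestamps)
    limit = expected_interval * tolerance
    return [
        (prev, curr)
        for prev, curr in zip(ordered, ordered[1:])
        if curr - prev > limit
    ]


# Hypothetical hourly metric with readings missing between 03:00 and 07:00.
start = datetime(2026, 1, 1)
stream = [start + timedelta(hours=h) for h in (0, 1, 2, 3, 7, 8)]
gaps = find_gaps(stream, expected_interval=timedelta(hours=1))
print(gaps)
```

Reporting the gap windows alongside anomaly alerts makes it explicit which periods were never actually monitored.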

Scoring + Routing can incorporate time-series features (how many support tickets in the last 30 days? how has NPS trended over the last four quarters?) as inputs to a scoring model. But it needs those time-series summarized into structured features first. The raw time-series needs to be pre-processed (aggregated, windowed, summarized) before it's useful as a scoring input.
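
The pre-processing step can be as simple as windowed counts and deltas. The sketch below turns the two example questions into structured features; the function names, the sample account history, and the feature keys are hypothetical.

```python
from datetime import date, timedelta


def tickets_in_window(ticket_dates, as_of, window_days=30):
    """Count tickets opened in the `window_days` up to and including `as_of`."""
    window_start = as_of - timedelta(days=window_days)
    return sum(1 for d in ticket_dates if window_start < d <= as_of)


def nps_trend(quarterly_nps):
    """Return the change in NPS from the first to the last quarter supplied."""
    return quarterly_nps[-1] - quarterly_nps[0]


# Hypothetical account history.
ticket_dates = [
    date(2025, 11, 2), date(2025, 12, 20),
    date(2026, 1, 5), date(2026, 1, 12),
]
features = {
    "tickets_last_30d": tickets_in_window(ticket_dates, as_of=date(2026, 1, 15)),
    "nps_change_4q": nps_trend([42, 40, 35, 31]),
}
print(features)  # {'tickets_last_30d': 3, 'nps_change_4q': -11}
```

Each feature becomes one column in the structured record the scoring model trains on.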

Personalization Engine uses time-series implicitly. A user's browsing history over time, their purchase frequency, their seasonal patterns: these are time-series behavioral signals. The pattern performs better when it can see behavioral trends, not just a point-in-time snapshot.

Multi-modal combinations

Some of the most capable deployments combine data types.

Meeting Intelligence + CRM structured data: Knowing what was said on a call (audio) is more powerful when combined with what the CRM says about the account (structured). A call summary that shows "prospect mentioned pricing concern" is more useful when the system can also show "this account has been at risk stage for 30 days." The combination lets the Generate step produce richer context.

Personalization Engine + text content: Structured behavioral data (what a user clicked) combined with text metadata (what topic that content was about) lets the engine personalize at the content level, not just the item level. Instead of "users like you bought this product," you get "users with your reading pattern tend to care about compliance more than pricing."

Vision Extract + structured system-of-record templates: Knowing what an invoice looks like in your extraction model works better when the model can also query your vendor master to verify the vendor name it extracted. The structured database validates the image extraction output.
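
One way to sketch that validation step is a fuzzy lookup of the extracted vendor name against the vendor master, here using Python's standard `difflib`. The `match_vendor` helper, the sample vendor names, and the 0.8 cutoff are assumptions for illustration; a real deployment would tune the cutoff against known extractions.

```python
from difflib import get_close_matches


def match_vendor(extracted_name, vendor_master, cutoff=0.8):
    """Return the vendor-master entry closest to `extracted_name`, or None.

    Comparison is case-insensitive. `cutoff` is an assumed similarity
    threshold, not a recommended value.
    """
    by_lower = {name.lower(): name for name in vendor_master}
    hits = get_close_matches(
        extracted_name.lower(), list(by_lower), n=1, cutoff=cutoff
    )
    return by_lower[hits[0]] if hits else None


# Hypothetical vendor master and OCR output with a dropped letter.
vendor_master = ["Acme Corporation", "Globex Inc", "Initech LLC"]
print(match_vendor("Acme Corporaton", vendor_master))     # Acme Corporation
print(match_vendor("Zzz Unknown Vendor", vendor_master))  # None
```

A `None` result is a useful signal in itself: route the document to human review rather than pushing an unverified vendor into the system of record.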

Multi-modal combinations expand what's possible but multiply the data readiness requirements. You need the access, quality, and permissions for every data type you're combining.

The Data-Pattern Matrix

The Data-Pattern Matrix is a decision tool that maps seven enterprise data types (text, structured, image, audio, video, code, time-series) to ten AI patterns across three fit ratings: Strong Fit (the pattern uses this data type as a primary input), Weak Fit (secondary or situational use), and Impossible (the pattern can't meaningfully consume this data type). The matrix functions as a first-pass filter: if your best available data doesn't appear as a Strong Fit input for the pattern you're planning, your deployment will underperform regardless of model quality.

Rework Analysis: Based on Gartner's finding that 80-90% of enterprise data is unstructured and that 60% of AI projects lacking AI-ready data are abandoned, the Data-Pattern Matrix addresses the most common AI planning error: selecting a pattern based on its output capability rather than its input requirements. In Rework's implementation experience, teams that run the matrix against their actual available data before committing to a pattern reduce their time-to-value by an average of 8 weeks, because they avoid the mid-integration discovery that their primary data type doesn't support their chosen pattern.

The data-readiness fast track

If you're looking for the fastest deployable pattern from each data type:

| If your best data is... | Start with... | Because... |
|---|---|---|
| Clean text docs (policies, SOPs, product content) | RAG Assistant | Low data-prep overhead; high immediate value for knowledge workers |
| CRM records with 12+ months of labeled outcomes | Scoring + Routing | Clear ROI on lead prioritization; model trains on data you already have |
| Invoices, receipts, or scanned forms | Vision Extract | Structured output is immediately useful; ROI is measurable in processing time |
| Sales or support call recordings | Meeting Intelligence | Transcription is reliable; CRM integration delivers value on day one |
| Transaction logs or metric streams with 90+ days of history | Anomaly Agent | Baseline is established; flagging can start almost immediately |
| Multi-source web and internal documents | Generative Research | No structured data needed; research quality improves immediately |
| Code repositories with open issue backlogs | Workflow Copilot | Developer tools are mature; adoption is high when integrated in IDE |

These are starting points, not final architectures. The pattern that deploys fastest isn't always the one with the highest long-term ROI. But starting with your strongest data builds organizational confidence, generates measurable results, and creates the labeled outcomes you'll need for more complex patterns later.

What this matrix doesn't tell you

Having a data type doesn't mean you're ready to deploy the corresponding pattern. Data Readiness Check by AI Pattern goes deeper on the specific quality thresholds each pattern needs. For example, structured CRM data is necessary for Scoring + Routing, but structured data that's only 60% complete on the outcome field isn't ready.

The matrix also doesn't address dependencies between patterns. Meeting Intelligence is deployable from audio data, but if you want its output to feed into Scoring + Routing, you need the structured layer working too. Pattern Dependencies and Prerequisites covers how patterns build on each other.

And if you're new to AI patterns, What Is an AI Pattern? is the right starting point before using this matrix as a selection tool.

Data is the foundation. The matrix tells you which doors are open from where you stand. The readiness checks tell you whether you can actually walk through them.

Frequently Asked Questions

What is the most common AI pattern selection mistake?

Selecting a pattern based on its promised output rather than its required input. A Scoring and Routing model needs structured CRM data with labeled historical outcomes. An Anomaly Agent needs 60-90 days of baseline time-series data. A RAG Assistant needs a maintained, current knowledge base. Starting with the data you have rather than the output you want is the most reliable path to a deployable first pattern.

Which AI patterns can deploy without historical training data?

RAG Assistant, Generative Research, Document Review, and Workflow Copilot can all deploy without historical training data because they use pre-trained language models rather than models trained on your specific outcome history. Vision Extract requires training examples for your specific document types but not outcome labels. Scoring and Routing, Anomaly Agent, and Personalization Engine all require historical data specific to your environment.

What percentage of enterprise data is actually structured?

Gartner estimates that 80-90% of enterprise data is unstructured, meaning only 10-20% is structured. This gap is why most companies have far less AI-ready data than they assume when planning their first deployment. The patterns most reliant on structured data (Scoring and Routing, Anomaly Agent, Personalization Engine) are also the ones teams most often plan to deploy first, before they've confirmed the structured data actually exists and has sufficient quality.

Can Meeting Intelligence work with pre-transcribed text instead of raw audio?

Yes. Many deployments use transcript services (Zoom, Teams, Otter.ai) as a pre-step, then feed the transcript into the analysis layer. This is a valid and often more cost-effective architecture. The quality difference between raw-audio and pre-transcribed pipelines is modest for most use cases. The main tradeoff is that pre-transcribed pipelines depend on the quality of the transcription service, while raw-audio pipelines give you more control over transcription quality.

What data type has the most patterns that can consume it?

Text and structured data have the broadest pattern compatibility: neither is rated Impossible for any pattern in the matrix. Text is the primary input for RAG Assistant, Generative Research, Document Review, and Workflow Copilot, with secondary use in several others. Structured data is the primary input for Scoring and Routing, Anomaly Agent, and Personalization Engine. Most enterprise AI portfolios end up combining both, which is why text-plus-structured combinations produce the richest possible pattern sets.

