
The complete guide to AI lead enrichment

AI lead enrichment fills the columns no data provider sells: ICP fit, what they sell, segment, and trigger. Here is how to build it accurately in Clay.

May 25, 2026 · 11 min read

AI lead enrichment uses a research agent to answer the questions a data provider has no field for. A vendor can sell you a headcount, an industry code, and a funding round. It cannot tell you whether this account fits your ICP, what this company actually sells, which segment it belongs in, or whether a trigger just fired that makes now the moment to reach out. Those are judgment calls a human SDR used to make by reading a website, and they are exactly the calls a research agent can now make on every row. This guide covers what AI lead enrichment is, what it does that providers can't, how to keep it accurate, where it sits in your enrichment stack, and how to build it in Clay.

What AI lead enrichment is

AI lead enrichment is research, not lookup. A data provider matches your record against a database and returns whatever row it finds: the same fixed fields, the same way, for every company. A research agent starts from a question, reads live sources to answer it, and returns a structured verdict with the evidence behind it. The provider tells you a company has 540 employees. The agent tells you that company sells a compliance platform to mid-market healthcare, just posted four security-engineering roles, and reads as a strong ICP fit because of it.

This is the work Clay calls last-mile data: the attributes no provider sells because they require judgment, not a database row. Classic enrichment, the kind covered in the complete guide to data enrichment and the waterfall enrichment guide, gets you coverage on the fields that exist somewhere. AI enrichment gets you the fields that exist nowhere until someone reads the page and decides.

The cleanest way to see the difference is to ask which of your enrichment columns a vendor could actually invoice you for.

Lead attribute | Where it comes from | Why
Employee count | A data provider can sell it | Lives in a database, matchable by domain
Industry code | A data provider can sell it | Standardized field, matchable by domain
HQ location | A data provider can sell it | Lives in a database, matchable by domain
Funding round | A data provider can sell it | Tracked and sold as firmographic data
Does this account fit our ICP? | Only research can answer it | Requires reading the page and making a judgment
What does this company actually sell? | Only research can answer it | No provider field; read the site to decide
Which segment does it belong in? | Only research can answer it | Your segments, not a standard taxonomy
Did a buying trigger just fire? | Only research can answer it | Lives in news, hiring, and product pages, not a database
Is this person the real decision-maker for us? | Only research can answer it | Depends on your buyer, requires interpretation

Four of those nine columns are firmographics any provider will invoice you for. The other five decide fit, segment, and timing, and no vendor sells them: they require judgment rather than a database match. Those five are where deals are won or lost, and until now the only way to fill them was a person reading a website.

What AI lead enrichment does that providers can't

A provider returns a record; an agent returns a decision. OpenAI's GTM team named this exactly when they built their enrichment: they started by automating what their best sellers already did, visiting company websites and reading pages to decide who was worth a rep's time. That research, run by hand, was the bottleneck. Run by an agent on every row, it became the foundation.

A-LIGN had been paying a contractor sixty thousand dollars a year to research two thousand accounts over six months, and the output was a spreadsheet of yes/no answers that reps did not trust. When they rebuilt that research as an automated agent workflow, it delivered fifteen times more useful information, faster and at lower cost: the Clay contract came in ten thousand dollars cheaper than the manual one. The agent did not just check a box for "uses compliance services." It read each company's site and reasoned about whether they were a fit, the same call the contractor had made, on every account, in minutes instead of months.

40% → 80%

OpenAI roughly doubled its inbound enrichment coverage after moving off a single provider and onto a multi-source model with AI research filling the gaps.


The pattern repeats across teams that sell something a database can't describe well. The point is not that the agent is smarter than a provider. The point is that it answers a different kind of question: not "what is true about this company" but "what does this mean for us."

How an AI enrichment run actually works on one lead

An AI enrichment run is a prompt, a read, and a structured verdict. You give the agent the row's context and a question. It searches named sources, reads them, and returns the answer in fields you defined, with a link to where it found each one. The part that separates a usable column from a dangerous one is what happens when the answer is not on the page.

Example: research on one lead
Lead: Northwind Robotics (northwind.io)
Question: Does this account fit our ICP, and what segment?

A good enrichment run returns a structured, cited verdict for what it can confirm and an honest blank for what it cannot, which is the single behavior that separates a trustworthy column from a confident guess.

The fork is the whole game. A field that returns "unclear" is information you can act on; you know not to score on it. A field that invents a revenue figure looks identical to a real one in your CRM, and a rep will quote it in a call. The next section is how to make sure your columns only ever do the first thing.
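To make the fork concrete, here is a minimal Python sketch of a verdict treated as data. The `FieldVerdict` class, the `scorable_fields` helper, the example URL, and the revenue figure are all hypothetical illustrations, not Clay's schema. It shows one idea only: a field is usable when it has a real value and a source link, and anything else is dropped before scoring.

```python
from dataclasses import dataclass
from typing import Dict, Optional

UNCLEAR = "unclear"


@dataclass(frozen=True)
class FieldVerdict:
    """One enriched field: the value plus the page that supports it."""
    value: str
    evidence_url: Optional[str] = None

    @property
    def is_confirmed(self) -> bool:
        # Usable only if it is not the "unclear" floor AND it cites a page.
        return self.value != UNCLEAR and bool(self.evidence_url)


def scorable_fields(verdict: Dict[str, FieldVerdict]) -> Dict[str, str]:
    """Drop every field that is unclear or uncited before scoring."""
    return {name: f.value for name, f in verdict.items() if f.is_confirmed}


# Hypothetical agent output for one lead (values invented for illustration).
northwind = {
    "icp_fit": FieldVerdict("high", "https://northwind.io/product"),
    "segment": FieldVerdict(UNCLEAR),
    "annual_revenue": FieldVerdict("$12M"),  # no source, so treated as a guess
}

print(scorable_fields(northwind))
```

Only `icp_fit` survives here. The unclear segment and the uncited revenue figure never reach the score, which is the behavior you want from the column.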

How to keep AI lead enrichment accurate

An AI enrichment column is only as trustworthy as its instructions to return nothing. The failure mode of AI enrichment is not that it gets a fact wrong now and then. It is that an ungrounded prompt produces a confident, well-formatted answer for every row, including the rows where the truth was not available, and a fabricated value is indistinguishable from a real one once it lands in a field. Three controls remove that risk, and they compound.

Example: one risky cell with the three accuracy controls off

Lead: a company selling to both consumers and small businesses (genuinely ambiguous segment)
segment: Mid-market (backed by invented justification from model memory)
Hallucination risk: High

Source-grounding, a locked schema, and a verify-before-write pass each close a separate door the model uses to hallucinate, and together they turn a risky column into one that fails to a blank instead of a guess.

Two of these controls have analogs you already run on classic data. Verification on an AI column is the same instinct as verifying an email before you send: confirm it is real before you act on it, the discipline covered in the guide to verifying email addresses. The locked schema is what makes the column scorable instead of decorative. And source-grounding is what lets a rep click through to the page and trust the verdict, which is the difference between an agent that augments your sellers and one that embarrasses them.
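As a rough sketch of how the three controls combine, the Python below post-checks an agent response before it is written to a row. Everything here is an assumption for illustration: the `verify_before_write` function, the `SCHEMA` dictionary (its allowed values are borrowed from the prompt later in this guide), and the simplification that "grounded" means the evidence URL sits on the company's own domain. A real verification pass would likely also re-read the cited page.

```python
from typing import Dict
from urllib.parse import urlparse

# Locked schema: each field accepts a closed set of values, "unclear" included.
SCHEMA = {
    "icp_fit": {"high", "medium", "low", "unclear"},
    "segment": {"SMB", "mid-market", "enterprise", "unclear"},
}


def verify_before_write(raw: Dict[str, object], company_domain: str) -> Dict[str, str]:
    """Return only values that pass every check; the rest fail to 'unclear'."""
    # Source-grounding check: the cited page must be on the company's domain.
    host = urlparse(str(raw.get("evidence_url") or "")).hostname or ""
    grounded = host == company_domain or host.endswith("." + company_domain)

    clean = {}
    for field, allowed in SCHEMA.items():
        value = raw.get(field, "unclear")
        in_schema = value in allowed  # locked-schema check
        clean[field] = str(value) if (grounded and in_schema) else "unclear"
    return clean


# No evidence URL: every field fails safe, however confident it looks.
ungrounded = {"icp_fit": "high", "segment": "mid-market", "evidence_url": None}
print(verify_before_write(ungrounded, "northwind.io"))

# Cited page, but one value falls outside the schema: only that field is blanked.
partly_valid = {
    "icp_fit": "strong",
    "segment": "SMB",
    "evidence_url": "https://www.northwind.io/pricing",
}
print(verify_before_write(partly_valid, "northwind.io"))
```

The design choice is that every failure path ends at the same value, "unclear", so downstream scoring only has one blank state to handle.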

"We consolidated three vendors into Clay and started enriching data points that didn't exist in any traditional database. Our reps went from starting every conversation cold to knowing exactly who to call and what to say."

Bryanna Clancy, Marketing Strategy & GTM Engineering Leader, Hex

Where AI enrichment fits in your enrichment stack

AI enrichment is the last layer, not the first. It is tempting to point an agent at every empty field, but research is the slowest and most expensive way to get a fact that a provider already sells for a fraction of a cent. The economical stack runs providers first, catches the misses with a waterfall, and saves the agent for the questions that have no database answer. Assemble it in the wrong order and you pay an agent to look up a headcount.

The four layers, in the order that fills every field for the least money:

1. Primary data provider: cheapest broad coverage
2. Data-provider waterfall: fill the misses across many sources
3. AI research agent: answer judgment fields no provider sells
4. Verification: confirm before write

In this order: low cost per lead, 5 of 5 judgment columns filled.

AI enrichment belongs after providers and the waterfall, running only on the fields no database can fill, so you pay for research exactly where research is the only option.

The order is a budget decision, the same lesson the waterfall enrichment guide makes about provider ordering: coverage comes from having the right layers, cost comes from running them in the right sequence. The agent is the most capable layer and the one you want to fire least often.
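The ordering can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how Clay executes a table: the `enrich` function, the field lists, and the stub providers are all hypothetical. It shows database fields going to the primary provider and then the waterfall, while the agent is called only for judgment fields and its answers are verified before they are written.

```python
from typing import Callable, Dict, List, Optional, Tuple

Provider = Callable[[str, str], Optional[str]]  # (domain, field) -> value or None

DATABASE_FIELDS = ["employee_count", "industry"]
JUDGMENT_FIELDS = ["icp_fit", "segment"]


def enrich(
    domain: str,
    primary: Provider,
    waterfall: List[Provider],
    agent: Callable[[str, str, Dict[str, Optional[str]]], str],
    verify: Callable[[str, str], bool],
) -> Tuple[Dict[str, Optional[str]], Dict[str, int]]:
    """Fill one row layer by layer and count how often each layer fires."""
    row: Dict[str, Optional[str]] = {}
    calls = {"primary": 0, "waterfall": 0, "agent": 0}

    for field in DATABASE_FIELDS:
        calls["primary"] += 1
        value = primary(domain, field)
        if value is None:  # only misses reach the waterfall
            for provider in waterfall:
                calls["waterfall"] += 1
                value = provider(domain, field)
                if value is not None:
                    break
        row[field] = value

    for field in JUDGMENT_FIELDS:  # the agent never sees a database field
        calls["agent"] += 1
        candidate = agent(domain, field, row)  # the row so far is its context
        row[field] = candidate if verify(field, candidate) else "unclear"

    return row, calls


# Stub layers with invented data, for illustration only.
def primary(domain: str, field: str) -> Optional[str]:
    return {"industry": "Software"}.get(field)

def provider_a(domain: str, field: str) -> Optional[str]:
    return None

def provider_b(domain: str, field: str) -> Optional[str]:
    return {"employee_count": "540"}.get(field)

def agent(domain: str, field: str, row: Dict[str, Optional[str]]) -> str:
    return {"icp_fit": "high", "segment": "somewhere in between"}[field]

def verify(field: str, value: str) -> bool:
    return value in {"high", "medium", "low", "SMB", "mid-market", "enterprise"}

row, calls = enrich("northwind.io", primary, [provider_a, provider_b], agent, verify)
print(row)
print(calls)
```

The call counts make the budget point visible: the agent fires once per judgment field and never for a headcount.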

Here is how the two enrichment styles compare directly, so you can decide which fields belong to which.

AI agent enrichment vs. classic data-provider enrichment

Dimension | Classic provider enrichment | AI agent enrichment
What it returns | A matched database record | A reasoned answer to your question
Fields it fills | Firmographics, contact data, tech stack | ICP fit, what they sell, segment, trigger, persona fit
How it works | Match on domain or name, return the row | Read named sources, reason, return a structured verdict
Cost per field | Fractions of a cent | Higher: a research run per field
Speed | Instant | Seconds per row
When it's wrong | Stale or missing record | A confident guess, unless grounded and verified
Best used for | Facts that live in a database | Judgment that lived in an SDR's head

Run providers for the classic-enrichment fields and the agent for the AI-enrichment fields, and never the other way around.

How to build AI lead enrichment in Clay

You build AI enrichment in Clay as a prompt that returns a contract, not a paragraph. The mechanic is Claygent, Clay's research agent, running as a column on your table. It takes the whole row as context, reads the sources you name, and writes back the fields you specified. The build has four parts: give it the row, point it at sources, lock the output with an unclear floor, and pick a model.

Start the prompt with the row's context so the agent knows what it is looking at, then name where to look and what to return. A working judgment-enrichment prompt looks like this.

ICP-fit research prompt (Claygent)
Here is everything we know about this lead:
{{company_name}}, {{domain}}, {{employee_count}}, {{industry}}

Read the company's own website first: homepage, /product, /pricing, and /careers. Then answer for our team.

Our ICP: B2B software companies, 50 to 1,000 employees, that sell to revenue or marketing teams.

Return exactly these fields:
- icp_fit: high, medium, low, or unclear
- what_they_sell: one sentence
- segment: SMB, mid-market, enterprise, or unclear
- trigger: one current buying signal, or "none found"
- reasoning: 2 sentences, include one reason this might NOT fit
- evidence_url: the page that supports your answer

If you cannot confirm a field from a real page, return "unclear" for that field. Never guess. Never infer a number that is not written on a page.

Two lines in that prompt do the heavy lifting. The "include one reason this might NOT fit" line forces the agent to argue against itself, which catches the optimistic over-scoring that makes every account look like a fit. The "return unclear, never guess" floor is what makes the column safe to score on. Everything above it is context and routing.
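If you want to preview what the agent receives for a given row before running it, the placeholder substitution is easy to mimic offline. The sketch below is an assumption about the mechanics, not Clay's implementation: `render_prompt` is a hypothetical helper, and the choice to write "unknown" for an empty input column is ours, made so the agent sees an explicit gap rather than a silent blank.

```python
import re
from typing import Dict

PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def render_prompt(template: str, row: Dict[str, object]) -> str:
    """Replace each {{column}} placeholder with the row's value."""
    def substitute(match: "re.Match[str]") -> str:
        value = row.get(match.group(1))
        # An empty input column becomes an explicit "unknown".
        return "unknown" if value in (None, "") else str(value)

    return PLACEHOLDER.sub(substitute, template)


template = (
    "Here is everything we know about this lead: "
    "{{company_name}}, {{domain}}, {{employee_count}}, {{industry}}"
)
# Hypothetical row; industry is missing on purpose.
row = {"company_name": "Northwind Robotics", "domain": "northwind.io", "employee_count": 540}

prompt = render_prompt(template, row)
print(prompt)
```

Reading the rendered text for a handful of rows is a cheap way to catch a column that is empty more often than you expected.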

On model choice, do not reflexively pick the cheapest option to save credits. A one-credit model that drops data quality ten percent across a hundred thousand accounts is not a saving; it is ten thousand misjudged leads to recover the price of a coffee. Test on Clay's free test cases first, confirm the cheaper model holds quality on your specific question, and only then run it at scale. Reserve the heavier models for the judgment calls that actually move pipeline.
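The arithmetic, and the test that should precede the decision, can be sketched in Python. The function names, the five sample labels, and the two-point tolerance are hypothetical; pick a tolerance that matches how much misjudgment your pipeline can absorb.

```python
from typing import List


def extra_misjudged(accounts: int, quality_drop: float) -> int:
    """Leads judged wrongly because of a drop in quality."""
    return round(accounts * quality_drop)


def accuracy(predictions: List[str], labels: List[str]) -> float:
    """Share of test cases where the model matches the trusted label."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def cheaper_model_holds(
    cheap: List[str], heavy: List[str], labels: List[str], tolerance: float = 0.02
) -> bool:
    """True if the cheaper model loses no more than `tolerance` accuracy."""
    return accuracy(heavy, labels) - accuracy(cheap, labels) <= tolerance


# The figure from the paragraph above: a ten percent drop on 100,000 accounts.
print(extra_misjudged(100_000, 0.10))

# Invented test cases: what a rep concluded vs. what each model returned.
rep_labels = ["high", "low", "medium", "low", "high"]
heavy_model = ["high", "low", "medium", "low", "high"]
cheap_model = ["high", "high", "medium", "low", "high"]

print(accuracy(cheap_model, rep_labels))
print(cheaper_model_holds(cheap_model, heavy_model, rep_labels))
```

Five cases is far too few to trust in practice; the sample is small here only to keep the example readable.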

Once the column returns clean, it feeds everything downstream: ICP fit and trigger become scoring inputs, segment becomes a routing rule, and what-they-sell becomes the first line of the rep's outreach. For research aimed at outbound prospecting rather than enrichment, the guide to using Claygent for prospect research goes deep on writing these prompts.
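A minimal Python sketch of that downstream handoff follows. The point weights, queue names, and the `score_and_route` function are hypothetical, chosen only to show the shape of the logic: an unclear fit is never scored, segment picks the queue, and what-they-sell is passed along for the opener.

```python
from typing import Dict, Optional

# Illustrative weights and queues; tune these to your own scoring model.
FIT_POINTS = {"high": 3, "medium": 2, "low": 0}
SEGMENT_QUEUES = {
    "SMB": "smb_team",
    "mid-market": "mid_market_team",
    "enterprise": "enterprise_team",
}


def score_and_route(row: Dict[str, str]) -> Dict[str, Optional[object]]:
    """Turn the enrichment fields into a score, a queue, and an opener."""
    fit = row.get("icp_fit", "unclear")
    if fit not in FIT_POINTS:
        # "unclear" (or anything off-schema) is never scored.
        return {"score": None, "queue": "needs_review", "opener": None}

    has_trigger = row.get("trigger", "none found") != "none found"
    score = FIT_POINTS[fit] + (1 if has_trigger else 0)
    queue = SEGMENT_QUEUES.get(row.get("segment", "unclear"), "needs_review")
    return {"score": score, "queue": queue, "opener": row.get("what_they_sell")}


# Hypothetical enriched row, echoing the example earlier in this guide.
lead = {
    "icp_fit": "high",
    "segment": "mid-market",
    "trigger": "posted four security-engineering roles",
    "what_they_sell": "A compliance platform for mid-market healthcare.",
}
print(score_and_route(lead))
print(score_and_route({"icp_fit": "unclear", "segment": "SMB"}))
```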

Where to start

Pick the one judgment field your reps research by hand most often, and automate that column first. For most teams it is ICP fit, because every lead needs it and reps waste the most time on it. Write the prompt above against five real accounts, run it on Clay's free test cases, and read the output next to what a rep would have concluded. If it agrees, scale it. If it over-scores, check that the devil's-advocate line and the unclear floor are in place, tighten them, and run again.
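Reading the output next to a rep's conclusions can be made mechanical with a small comparison. This Python sketch is illustrative: `compare_to_rep` and the five sample verdicts are hypothetical. It counts agreement and, separately, the cases where the agent rated a lead higher than the rep did, which is the over-scoring signal described above.

```python
from typing import Dict, List

ORDER = {"low": 0, "medium": 1, "high": 2}


def compare_to_rep(agent: List[str], rep: List[str]) -> Dict[str, int]:
    """Count agreement and over-scoring on accounts both sides actually rated."""
    # Skip pairs where either side returned "unclear" or an off-schema value.
    pairs = [(a, r) for a, r in zip(agent, rep) if a in ORDER and r in ORDER]
    return {
        "compared": len(pairs),
        "agree": sum(a == r for a, r in pairs),
        "over_scored": sum(ORDER[a] > ORDER[r] for a, r in pairs),
    }


# Invented verdicts for five accounts.
agent_fit = ["high", "high", "medium", "unclear", "high"]
rep_fit = ["high", "medium", "medium", "low", "low"]

summary = compare_to_rep(agent_fit, rep_fit)
print(summary)
```

In this invented sample the agent agrees on two of four comparable accounts and over-scores the other two, which is the pattern that should send you back to the prompt before scaling.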

Build one column, trust it, then add the next: what they sell, their segment, the trigger. Within a few columns you have replaced the research a human SDR used to do by hand with a system that does it on every row, in seconds, with the evidence attached.


Frequently asked questions

What is AI lead enrichment?

AI lead enrichment uses a research agent to answer questions about a lead that no data provider sells as a field: whether the account fits your ICP, what the company actually sells, which segment it belongs in, and whether a buying trigger just fired. Instead of matching your record against a database, the agent reads live sources, reasons about what it finds, and writes back a structured verdict with the evidence behind each answer. It is the judgment work an SDR used to do by hand, run automatically on every row.

How is AI lead enrichment different from data enrichment?

Classic data enrichment fills fields that already exist in a database somewhere, like headcount, industry, funding stage, and contact details, by matching on a domain or name. AI enrichment fills fields that exist nowhere until someone reads a page and decides, like ICP fit and segment. They are complements, not substitutes: run providers and a waterfall for the database facts, and run the agent only for the judgment fields no provider can offer.

Does AI lead enrichment hallucinate?

It can, if you let it. An ungrounded prompt will return a confident answer for every row, including rows where the truth was not available, and a fabricated value looks identical to a real one in your CRM. Three controls prevent it: point the agent at named sources so it reads pages instead of its training memory, lock the output to a fixed schema that includes an "unclear" option, and require a verification pass to confirm a value before it writes. With those in place the column fails to a blank rather than a guess.

Where does AI enrichment fit in the enrichment stack?

Last. Run your primary data provider first for cheap broad coverage, catch the misses with a waterfall across many providers, then run the AI agent only on the empty judgment fields, and verify before write. Pointing an agent at fields a provider already sells means paying research prices for a database lookup. The agent is your most capable layer and the one you should fire least often.

Can AI lead enrichment replace an SDR's research?

It replaces the manual part, not the judgment. The agent does what reps did by hand, reading websites to decide fit, segment, and timing, and it does it on every lead in seconds with sources attached. A-LIGN rebuilt the work of a sixty-thousand-dollar-a-year research contractor as an automated agent and got fifteen times more useful information at lower cost. Reps then spend their time on the conversations the research surfaced, not on the research itself.