
Clay GTM guide

How to Build a Lead Scoring Model (Step by Step)

Build a lead scoring model backward from closed-won: fit times intent, weighted by what predicts a purchase, validated against real outcomes.

May 1, 2026 · 11 min read

Most lead scoring models are guesses dressed up as math. Someone in a room decides a demo request is worth +10, a webinar signup +5, an enterprise title +15, and the numbers go live without anyone ever checking them against a single deal that actually closed. A scoring model that works is built backward from closed-won, not forward from opinion: it reads fit and intent together, weights each input by how much it actually separates buyers from tire-kickers, and gets checked against who really converted before any rep ever sees a score.

Fit without intent ranks a perfect-profile account that will never buy. Intent without fit ranks a curious student downloading your whitepaper. You need both, multiplied, then validated. This is how to build it.

What you need before you start

You need three things:

  • A Clay account and a list of your last 30 to 50 closed-won deals, plus a roughly equal set of closed-lost or no-decision leads to score against.
  • Enriched lead data (firmographics, seniority, tech stack, and any intent signals you capture).
  • A working ICP hypothesis, even an imperfect one.

You do not need a perfect model before you begin; you build a rough version, score historical outcomes, and tighten the weights until the scores line up with reality. The upstream half of this (capturing form fills, enriching them, and routing the scored result to a rep in minutes) is the inbound lead qualification workflow; this article builds the scoring engine that workflow runs on.

Step 1: Define your ICP from pain, not demographics

A demographic-first ICP is the first guess everyone gets wrong. Teams open a doc and start typing filters: 100 to 500 employees, North America, SaaS, $10M+ revenue. The list looks reasonable and predicts almost nothing, because it never answers why any of those traits would make someone buy.

The right starting point is the buying motivation underneath the firmographics. Before you write a single filter, answer five questions about who actually feels the problem you solve.

[Interactive: answer five questions about who actually buys, and watch your scoring inputs build themselves. Each answer fills in your ICP hypothesis as a set of fit inputs and a set of intent inputs.]

An ICP built from buying motivation produces concrete, scoreable inputs that already split cleanly into fit and intent. An ICP built from demographics produces filters that predict nothing.

The payoff of starting here is that every answer becomes a measurable input you can score later. "RevOps leaders at companies that just raised" is not a vibe; it is a seniority check, a department match, and a funding-recency signal. Your ICP is a hypothesis, not a verdict. You will refine it against closed-won in Step 5, but it has to start from motivation or there is nothing real to refine.
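As a rough illustration of how one ICP sentence becomes three measurable inputs, here is a minimal Python sketch. The field names, the list of senior tiers, and the 180-day definition of "just raised" are all assumptions made for this example, not values from Clay or from any real dataset.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical cutoffs, chosen only for illustration.
SENIOR_TIERS = {"c-level", "vp", "director"}
FUNDING_WINDOW_DAYS = 180  # assumed definition of "just raised"


@dataclass
class Lead:
    seniority_tier: str
    department: str
    last_funding_date: Optional[date]


def icp_inputs(lead: Lead, today: date) -> dict:
    """Turn "RevOps leaders at companies that just raised" into three checks."""
    days_since_raise = (
        (today - lead.last_funding_date).days
        if lead.last_funding_date is not None
        else None
    )
    return {
        # Who the person is.
        "seniority_match": lead.seniority_tier.lower() in SENIOR_TIERS,
        "department_match": lead.department.lower() == "revops",
        # Timing signal: something that changed recently.
        "recent_funding": days_since_raise is not None
        and 0 <= days_since_raise <= FUNDING_WINDOW_DAYS,
    }


lead = Lead("VP", "RevOps", date(2026, 3, 1))
print(icp_inputs(lead, today=date(2026, 5, 1)))
```

Each key is something you can later weight and validate on its own, which is the point of starting from motivation rather than from filters.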

Step 2: Separate fit from intent, then multiply them

A single number that mashes fit and intent together hides the one thing a rep needs to know. Fit answers "should we ever sell to this account," and intent answers "are they ready right now"; collapsing both into one additive score loses the distinction that decides what a rep does next. A perfect-fit account with zero intent is a nurture target. A scrappy account showing five buying signals is a call today. The same total score, two completely different plays.

The fix is to score the two dimensions on separate axes and let an account's position decide its action, rather than summing everything into one pile of points.

[Interactive: drag five leads into the fit-times-intent grid, which runs from low intent to high intent, and see why a single score hides the play. Example lead to place: VP RevOps, 220 employees, ideal stack, no recent activity.]

Fit and intent are independent axes, and an account's quadrant, not its total point count, is what tells a rep whether to call, nurture, qualify, or drop it.

In practice you can keep two sub-scores, Fit and Intent, each on a 0 to 100 scale, and combine them as a product rather than a sum so that a near-zero on either axis pulls the whole lead down. A lead at 90 fit and 10 intent is not the same as 50 and 50: both sum to 100, but their products are 900 and 2,500, and multiplying keeps that difference visible in a way addition never does.
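The arithmetic is easy to check. The sketch below is one possible way to combine the two sub-scores, not a Clay feature: dividing the product by 100 to stay on a 0 to 100 scale and using 50 as the cutoff between "low" and "high" are both assumptions for illustration.

```python
HIGH = 50  # assumed cutoff between "low" and "high" on each 0-100 axis


def combined_score(fit: float, intent: float) -> float:
    """Product of two 0-100 sub-scores, rescaled back to 0-100."""
    return fit * intent / 100


def quadrant_play(fit: float, intent: float) -> str:
    """Map a lead's position on the two axes to a rep action."""
    if fit >= HIGH and intent >= HIGH:
        return "call"
    if fit >= HIGH:
        return "nurture"
    if intent >= HIGH:
        return "qualify"
    return "drop"


# Same sum, very different products.
print(90 + 10, combined_score(90, 10))  # 100 9.0
print(50 + 50, combined_score(50, 50))  # 100 25.0

# High fit, low intent: a nurture target, not a call today.
print(quadrant_play(90, 10))  # nurture
```

Keeping `quadrant_play` separate from `combined_score` means the product can rank a list while the quadrant still tells the rep which play to run.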

Step 3: Assign weights based on what the data says predicts a purchase

Equal weights are a confession that you have not looked at your data. Most teams give every criterion the same importance because deciding otherwise feels arbitrary, but the whole point of a scoring model is that some signals separate buyers from non-buyers far more sharply than others. Seniority might be the strongest predictor in your data, or it might be tech stack, or funding recency. You do not know until you weight them and test.

Start with a weighted set of fit inputs, set them from your best current read of which signals correlate with closing, and treat those numbers as a draft you will correct in Step 5.

[Interactive: set the dimension weights and watch the lead list re-rank, pushing your real closed-won deals to the top.]

Dimension weights:

  • Seniority tier: 30
  • Company headcount fit: 25
  • Tech stack depth: 20
  • Funding stage and recency: 15
  • Industry match: 10

This weighting puts your real winners on top. That is what the data is telling you to weight.

Ranked leads under this weighting:

  1. Lead A (closed-won): 85
  2. Lead E (closed-won): 82
  3. Lead C (closed-won): 79
  4. Lead G: 59
  5. Lead B: 50
  6. Lead D: 47
  7. Lead F: 35
  8. Lead H: 24

The right weights are the ones that rank your actual closed-won deals near the top. You discover them by tuning against real outcomes, not by debating importance in a meeting.

If pushing your closed-won deals up the list forces you into weights that feel strange, that is the data correcting your instinct, which is exactly what it is for. The criteria themselves come straight from your ICP work in Step 1; the weights are the part you are now letting the outcomes decide.
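To make the weighting concrete, here is a small Python sketch of a weighted fit score and a "did the winners rise to the top" check. The weights are the ones from the example above; the per-dimension scores for each lead are invented for illustration, picked so the totals match the example ranking, and do not come from real deals.

```python
# Weights from the example above; they sum to 100.
WEIGHTS = {
    "seniority": 30,
    "headcount": 25,
    "tech_stack": 20,
    "funding": 15,
    "industry": 10,
}

# Hypothetical leads: each dimension scored 0-10, plus the known outcome.
LEADS = [
    {"name": "Lead A", "won": True,
     "dims": {"seniority": 9, "headcount": 8, "tech_stack": 9, "funding": 8, "industry": 8}},
    {"name": "Lead B", "won": False,
     "dims": {"seniority": 4, "headcount": 6, "tech_stack": 5, "funding": 6, "industry": 4}},
    {"name": "Lead C", "won": True,
     "dims": {"seniority": 8, "headcount": 8, "tech_stack": 7, "funding": 8, "industry": 9}},
]


def fit_score(dims: dict, weights: dict) -> float:
    """Weighted average of 0-10 dimension scores, scaled to 0-100."""
    total_weight = sum(weights.values())
    return 10 * sum(weights[k] * dims[k] for k in weights) / total_weight


def won_in_top_k(leads: list, weights: dict, k: int) -> int:
    """Count how many of the top-k ranked leads actually closed."""
    ranked = sorted(leads, key=lambda l: fit_score(l["dims"], weights), reverse=True)
    return sum(1 for lead in ranked[:k] if lead["won"])


for lead in LEADS:
    print(lead["name"], fit_score(lead["dims"], WEIGHTS))
# Lead A 85.0
# Lead B 50.0
# Lead C 79.0

print(won_in_top_k(LEADS, WEIGHTS, k=2))  # 2
```

Tuning then becomes a loop: change `WEIGHTS`, re-run `won_in_top_k` on your historical set, and keep the weighting that puts the most closed-won deals near the top.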

Clay gave us the ability to define what a great customer looks like on our terms. Not just industry and title, but the signals that actually predict who will buy. Our reps are working better lists, closing faster, and generating 19% more revenue per head.

Step 4: Build the scoring logic in Clay with deterministic rules plus AI

Not every criterion should be scored the same way. Firmographic checks (headcount band, funding stage, title keywords) are deterministic lookups: a clear rule returns a clear number. Judgment calls (does this messy industry value map to a category we win in, does this job title signal real buying authority despite a weird label) need AI. Build the deterministic 80% with native rules first, then layer an AI formula only over the 20% of cases that rules handle badly, so you spend credits where judgment actually changes the score.

Clay's native Score Row enrichment (Add Enrichment, then Score Row) handles firmographic scoring across up to 15 criteria: you set each factor, a comparison type, keywords, and the points to assign, and Clay returns a number plus its reasoning. Use it for everything a rule can decide cleanly. For the harder inputs, add an AI formula column. A common one is normalizing the industry field, since raw provider data is inconsistent (Ulta Beauty and Nike both come back as "Retail," and different vendors label the same sector "IT," "Software," or "Internet"). Map it to your own categories before you score it.
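The split between rules and judgment can be sketched outside Clay as a lookup with an explicit fallback. This is a hypothetical illustration in Python, not Clay's formula syntax: the mapping table and the `NEEDS_REVIEW` marker are assumptions, and in practice the fallback rows are the ones you would hand to an AI formula.

```python
from typing import Optional

# Hypothetical mapping from raw provider labels to our own categories.
INDUSTRY_MAP = {
    "it": "Software",
    "software": "Software",
    "internet": "Software",
}

NEEDS_REVIEW = "NEEDS_REVIEW"  # assumed marker for rows a rule cannot decide


def normalize_industry(raw: Optional[str]) -> str:
    """Map a raw industry label to our category, or flag it for judgment."""
    key = (raw or "").strip().lower()
    return INDUSTRY_MAP.get(key, NEEDS_REVIEW)


for label in ["IT", " Software ", "Internet", "Retail", None]:
    print(repr(label), "->", normalize_industry(label))
```

"Retail" deliberately falls through to review here: a label that covers both Ulta Beauty and Nike is too broad for a lookup to resolve, so it is a case for the AI layer.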

The other input AI handles well is reading a lead's enriched fields and returning a fit score with its reasoning, so a human can audit why the model rated it that way.

AI formula: score a lead's fit from enriched fields
You are scoring inbound and sourced leads for fit against our ICP.

Lead data:
- Job title: {{job_title}}
- Seniority tier: {{seniority_tier}}
- Department: {{department}}
- Company: {{company_name}}
- Headcount: {{headcount}}
- Industry (normalized): {{industry_normalized}}
- Tech stack: {{tech_stack_summary}}
- Funding stage: {{funding_stage}}

Score each dimension 0-10, then return JSON only:
{
  "seniority_fit": 0-10,
  "headcount_fit": 0-10,
  "tech_stack_fit": 0-10,
  "funding_fit": 0-10,
  "industry_fit": 0-10,
  "fit_score_0_100": 0-100,
  "strongest_signal": "one sentence: the single trait most like our best customers",
  "weakest_signal": "one sentence: the trait that argues against fit",
  "reasoning": "one sentence explaining the overall fit score"
}

Scoring guidance:
- Seniority: C-level 9-10, VP/Director 7-8, Manager 5-6, IC 3-4, entry 1-2.
- Headcount: 50-500 is our primary band (8-10); outside it scores lower.
- Tech stack: reward a mature stack with 5+ tools in the category.
- Do not invent signals. Use only the data provided. If a field is empty, score that dimension 5 and say so in reasoning.

Test this on 10 to 20 rows before running at scale, and read every output. If a meaningful share look generic or contradictory, tighten the prompt before you proceed. When an AI formula misfires, Clay's "Output is Wrong" button drops you into a flow to fix the logic; if it still resists, the formula is just generated code, so paste it into an AI assistant with your expected-versus-actual outputs and ask why it breaks.
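Reading every output is easier if the mechanical problems are caught first. The sketch below is a hypothetical Python check you could run over exported test rows; it assumes only the JSON keys named in the prompt above and does not replace reading the reasoning yourself.

```python
import json

DIMENSIONS = ["seniority_fit", "headcount_fit", "tech_stack_fit",
              "funding_fit", "industry_fit"]
TEXT_FIELDS = ["strongest_signal", "weakest_signal", "reasoning"]


def check_output(raw: str) -> list:
    """Return the problems found in one AI response; an empty list means it passed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ["not valid JSON"]
    if not isinstance(data, dict):
        return ["not a JSON object"]

    problems = []
    for key in DIMENSIONS:
        value = data.get(key)
        if not isinstance(value, (int, float)) or not 0 <= value <= 10:
            problems.append(f"{key} missing or outside 0-10")

    total = data.get("fit_score_0_100")
    if not isinstance(total, (int, float)) or not 0 <= total <= 100:
        problems.append("fit_score_0_100 missing or outside 0-100")

    for key in TEXT_FIELDS:
        value = data.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"{key} is empty")
    return problems


sample = json.dumps({
    "seniority_fit": 8, "headcount_fit": 9, "tech_stack_fit": 7,
    "funding_fit": 6, "industry_fit": 8, "fit_score_0_100": 78,
    "strongest_signal": "VP-level RevOps title.",
    "weakest_signal": "No recent funding.",
    "reasoning": "Strong seniority and headcount fit.",
})
print(check_output(sample))  # []
```

A check like this only confirms the shape of the response; generic or contradictory reasoning still needs a human eye.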

Step 5: Validate the model against who actually closed

This is the step almost every team skips, and skipping it is why most scoring models quietly lie. A scoring model is unvalidated until you have run it across your historical closed-won and closed-lost leads and confirmed that your best customers actually score high; a model that has never met a real outcome is a guess with decimals. The test is simple: score the deals you already know the answer to, and see whether the model agrees with reality.

Take your last 30 to 50 closed-won deals and a matched set of closed-lost or no-decision leads, run them all through the model, and check the distributions. The bar to clear is that the large majority of your closed-won deals land above your sales-ready threshold. If they scatter all over the range, or your best customers cluster in the middle, the weights from Step 3 are wrong and you go back and re-tune.

[Interactive: drag the sales-ready threshold across your real closed-won deals (n=40) and closed-lost / no-decision deals (n=40), then compare to a guessed model. In the example, a threshold of 65 leaves 40 won and 0 lost above the line: 100% of won deals captured (recall) and 100% precision above the line. A validated model separates the bands. The defensible threshold is where that separation holds.]

A model is only trustworthy once its scores actually separate the deals that closed from the ones that did not, and the threshold you can defend is the one where that separation holds.
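The separation check comes down to two counts at a given threshold. Here is a minimal Python sketch; the score lists are invented for illustration, and treating a score equal to the threshold as "above the line" is an assumption.

```python
def validate(won_scores: list, lost_scores: list, threshold: float) -> dict:
    """Count known outcomes above the threshold and derive recall and precision."""
    won_above = sum(s >= threshold for s in won_scores)
    lost_above = sum(s >= threshold for s in lost_scores)
    flagged = won_above + lost_above
    return {
        "won_above": won_above,
        "lost_above": lost_above,
        # Share of closed-won deals the threshold captures.
        "recall": won_above / len(won_scores),
        # Share of leads above the line that actually closed.
        "precision": won_above / flagged if flagged else 0.0,
    }


# Hypothetical scores for deals whose outcome is already known.
won = [85, 82, 79, 71, 68]
lost = [59, 50, 47, 35, 24]

for threshold in (45, 65, 80):
    print(threshold, validate(won, lost, threshold))
```

With these made-up scores, 65 captures every won deal and no lost ones; dropping to 45 lets three lost deals through (precision 0.625), and raising to 80 keeps precision at 1.0 but captures only two of five winners (recall 0.4). The threshold you can defend sits between those failure points.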

There is a second, generative use of closed-won here. Once you know which companies actually bought, you can find more that look like them: mark a record closed-won in your CRM, trigger a Clay workflow that runs the company through Find Company Lookalikes (or Ocean.io), and write the 10 nearest matches to a new table to enrich and score. The same outcomes that validate your model also become a source for the next list. ElevenLabs built automated scoring on exactly this foundation, scoring every inbound lead so the right ones reached sales faster.

+50%: lift in sales-qualified leads ElevenLabs saw after moving to automated lead scoring in Clay.


Step 6: Operationalize the score and keep it fresh

A validated model is worthless if it lives in a spreadsheet no rep opens. Once the scores separate your winners, the model has to run automatically on every new lead, write its score and reasoning where reps work, and get re-validated on a schedule, because a model that was right last quarter degrades as your market moves. Operationalizing is two jobs: wiring the score into the live flow, and keeping the model honest over time.

For the live flow, the scoring columns you built run on every new row as leads land in Clay, and the score plus its tier and one-line reasoning sync to your CRM as custom fields (ICP Tier, Fit Score, Strongest Signal). Standard CRM fields will not surface the right context, so build the custom fields before the sync runs. The routing and alerting that happens after the score is set, who gets the lead and how fast, is the job of the inbound qualification workflow; the scoring model is what feeds it a number it can trust.
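As a sketch of what the synced record might look like, the Python below shapes a scored lead into the three custom fields named above. The tier cutoffs (80 and 65) and tier labels are hypothetical, and the function only builds a dictionary; it does not call any CRM or Clay API.

```python
def icp_tier(fit_score: float) -> str:
    """Bucket a 0-100 fit score into a tier. Cutoffs are assumed, not prescribed."""
    if fit_score >= 80:
        return "Tier 1"
    if fit_score >= 65:
        return "Tier 2"
    return "Tier 3"


def crm_fields(ai_output: dict) -> dict:
    """Pick the values a rep needs and label them with the custom field names."""
    score = ai_output["fit_score_0_100"]
    return {
        "ICP Tier": icp_tier(score),
        "Fit Score": score,
        "Strongest Signal": ai_output["strongest_signal"],
    }


scored = {"fit_score_0_100": 78, "strongest_signal": "VP-level RevOps title."}
print(crm_fields(scored))
```

Whatever cutoffs you choose, they should come out of the Step 5 validation rather than being set by feel.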

Keeping it fresh is the part teams forget. Set a recurring date, every quarter is reasonable, to re-run the Step 5 validation on the most recent closed-won and closed-lost deals. Markets shift, your product moves upmarket, a new competitor changes which signals matter. When the recent winners stop scoring high, re-tune the weights. Scoring is not set-and-forget; it is a loop where each quarter of outcomes corrects the next quarter of scores.
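The quarterly check can be reduced to one question: what share of the most recent winners still clears the threshold? A minimal Python sketch follows, assuming a hypothetical floor of 80% as the definition of "large majority" and made-up score lists.

```python
def needs_retune(recent_won_scores: list, threshold: float,
                 min_recall: float = 0.8) -> bool:
    """True when too few recent closed-won deals score above the threshold."""
    if not recent_won_scores:
        return False  # nothing to judge this quarter
    above = sum(s >= threshold for s in recent_won_scores)
    return above / len(recent_won_scores) < min_recall


# Hypothetical quarters: scores the current model gives to deals that closed.
last_quarter = [85, 82, 79, 71, 68]
this_quarter = [70, 60, 55, 40, 66]

print(needs_retune(last_quarter, threshold=65))  # False
print(needs_retune(this_quarter, threshold=65))  # True
```

When the check flips to `True`, that is the signal to go back to the weights in Step 3 with the fresh outcomes.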

Common failure modes

  • Building forward from opinion instead of backward from closed-won: A demo request is worth +10 because someone decided it was, not because the data showed demo-requesters close. Always run new weights against historical outcomes before they go live.
  • Adding fit and intent into one number: A 90-fit, 10-intent account and a 50-50 account get the same total but call for completely different plays. Keep the two scores separate and combine them as a product, not a sum.
  • Equal weights across every criterion: Flat weighting scatters your real winners through the middle of the distribution. Let the closed-won validation set decide which signals get the points.
  • Scoring on raw, un-normalized data: Seniority rules that look for "VP" miss "Vice President," and industry filters miss because providers label the same sector three different ways. Normalize titles and industries before any scoring runs.
  • Never re-validating: A model that nailed last year's buyers can quietly drift as your market changes. Re-run the closed-won validation every quarter and re-tune when recent winners stop scoring high.
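The "VP" versus "Vice President" miss in the list above is the kind of thing a small normalization pass catches before scoring runs. The Python sketch below is illustrative only: the tier names and regular-expression patterns are assumptions, they are checked in order, and real title data will need more patterns than this.

```python
import re

# Hypothetical patterns, checked in order; the first match wins.
TIER_PATTERNS = [
    ("C-level", r"\b(chief|c[efotm]o|cro)\b"),
    ("VP", r"\b(vp|svp|evp|vice president)\b"),
    ("Director", r"\b(director|head of)\b"),
    ("Manager", r"\bmanager\b"),
]


def seniority_tier(title: str) -> str:
    """Normalize a raw job title to a seniority tier before any scoring rule runs."""
    # Drop dots so "V.P." becomes "vp", then turn other punctuation into spaces.
    cleaned = re.sub(r"[^a-z ]", " ", title.lower().replace(".", ""))
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    for tier, pattern in TIER_PATTERNS:
        if re.search(pattern, cleaned):
            return tier
    return "Other"


for title in ["VP, Revenue Operations", "Vice-President of Sales",
              "V.P. Marketing", "Sr. Director, RevOps", "Account Manager"]:
    print(title, "->", seniority_tier(title))
```

Keyword rules like these still misfire on unusual titles (a "Chief of Staff" would land in C-level here), which is where the AI judgment layer from Step 4 earns its credits.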

Build a lead scoring model that actually predicts who buys

Score fit times intent in Clay, validate it against your closed-won deals, and run it on every new lead automatically.

Frequently asked questions

What is a lead scoring model?

A lead scoring model is a system that assigns each lead a numerical value representing how likely it is to convert, so reps can prioritize the highest-value prospects first. A good model scores two things separately: fit (does this account match the customers you win) and intent (are they showing signals they are ready to buy). The score is only meaningful if the weights behind it were validated against deals that actually closed, rather than assigned by opinion.

How do you build a lead scoring model step by step?

Start by defining your ICP from buying motivation, not demographics. Separate your inputs into fit and intent, and combine them as a product so a near-zero on either axis pulls the lead down. Assign weights to each criterion based on what your data shows predicts a purchase, build the logic in Clay using native Score Row for firmographics and AI formulas for judgment calls, then validate the whole model against your last 30 to 50 closed-won and closed-lost deals before it goes live. Finally, wire it into your live lead flow and re-validate every quarter.

What is the difference between fit and intent in lead scoring?

Fit measures whether an account matches the kind of customer you win: seniority, headcount, tech stack, industry, funding. Intent measures whether they are in-market right now: recent funding, hiring for a relevant role, evaluating tools, repeated engagement. A high-fit, low-intent account is a nurture target, and a low-fit, high-intent lead is worth a quick qualification call but probably not an AE. Scoring them on one combined number hides the distinction that decides what a rep should do next.

How do you validate a lead scoring model?

Run the model across leads where you already know the outcome: your recent closed-won deals plus a matched set of closed-lost or no-decision leads. Check whether the large majority of your closed-won deals score above your sales-ready threshold and whether the lost ones fall below it. If the two groups separate cleanly, the model works; if they overlap, your weights are wrong and you re-tune. A model that has never been scored against real outcomes is unvalidated, no matter how reasonable the points look.

How often should you update a lead scoring model?

Re-validate every quarter by re-running your closed-won and closed-lost deals through the current model. Markets shift, your product moves, and new competitors change which signals matter, so a model that separated buyers cleanly last quarter can drift. When your most recent winners stop scoring high, re-tune the weights against the fresh outcome data. Treat scoring as a loop where each quarter of results corrects the next, not a one-time setup.