Most “dirty CRM” is not missing data. It is inconsistent data.
The same company is written five ways, job titles sit in five formats, and the picklist no one agrees on has forty values where it should have eight. You cannot dedupe, route, score, or report on a database until its values are standardized. A rep sees “Acme Inc,” “Acme Corp,” and “ACME” as one account; your CRM sees three, splits the history, and routes the same buyer to three reps.
The fix is not another enrichment vendor adding more rows. It is a pass that makes the rows you already have agree with each other. This is how to standardize the data you have, validate what is still good, flag what has rotted, and put a rule in place so it stays clean.
Step 1: Audit what kind of dirty your CRM actually is
Before you clean anything, separate inconsistent data from missing data, because they need opposite fixes. Missing data is an enrichment problem: a blank phone field needs a value added. Inconsistent data is a standardization problem: a full phone field written six different ways needs one format applied. Teams reach for an enrichment tool when most of their pain is the second kind, then wonder why the database still feels untrustworthy after they have paid to fill it.
Run a profile of your highest-traffic fields and sort every issue into one of two buckets. The audit is not about counting blanks. It is about finding the fields where one real-world value is stored as a half-dozen different strings, because those are the fields breaking your routing and your reports right now.
Sort your fields by inconsistency, not by how full they look
A field can be 100% full and still be the dirtiest field you have, because fill rate measures presence, not consistency. The fields with the highest fill rate are often the ones quietly breaking routing.
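One way to make the inconsistency sort concrete is to profile a field for both fill rate and the number of raw spellings that collapse to the same comparison key. The sketch below is illustrative only: `normalize_key` and `profile_field` are hypothetical helpers, and the sample values are made up.

```python
from collections import defaultdict
from typing import Optional

def normalize_key(value: str) -> str:
    """Crude comparison key: lowercase, keep only letters and digits."""
    return "".join(ch for ch in value.lower() if ch.isalnum())

def profile_field(values: list[Optional[str]]) -> dict:
    """Report fill rate alongside how many raw spellings share one key."""
    filled = [v for v in values if v and v.strip()]
    variants = defaultdict(set)
    for v in filled:
        variants[normalize_key(v)].add(v)
    # A collision is one comparison key stored under more than one raw string.
    collisions = {k: sorted(s) for k, s in variants.items() if len(s) > 1}
    return {
        "fill_rate": len(filled) / len(values) if values else 0.0,
        "distinct_raw": len(set(filled)),
        "distinct_keys": len(variants),
        "collisions": collisions,
    }

# Made-up sample: a country field that is 100% full and still inconsistent.
countries = ["US", "U.S.", "us", "United States", "U.S.", "Canada"]
report = profile_field(countries)
```

Here the fill rate is 1.0, yet three spellings collide on the key `us`. A key this crude will not link "United States" to "US"; that kind of match needs a canonical decision, not a formatting rule.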
The fields at the top of the inconsistency sort are your work list for the next four steps. Note that this guide is about reconciling the data you already have. Adding net-new fields is enrichment, a separate job covered elsewhere.
Step 2: Standardize formats and casing with rule-based tools first
Most format cleanup is deterministic, so it should not cost you a single enrichment credit or an AI call. A phone number written four ways, a country written six ways, a title in mixed casing: these follow rules, not judgment. Clay's formatters run rule-based transforms without spending Data Credits. The available actions are Normalize Company Name (with an optional Normalize Case toggle), Normalize Phone Number, Format Date/Time, and Remove Extra Whitespace from Text. Reach for AI only when a value needs interpretation, not when it needs reformatting.
Open the Add enrichment panel, choose Normalize, and apply the built-in functions to the fields your audit flagged. The order matters less than the principle: exhaust the free deterministic tools before you spend anything.
Run the free deterministic cleanup before spending anything
- Company: Normalize Company Name (Normalize Case)
- Phone: Normalize Phone Number
- Date: Format Date/Time
- Text: Remove Extra Whitespace from Text
Format and casing cleanup is deterministic, so the right tool is a free rule-based normalizer, not a paid AI or enrichment call. Save AI for the judgment calls in the next step.
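To show why this kind of cleanup needs no model, here is a minimal Python sketch of rule-based transforms. It is not how Clay's formatters are implemented. The function names, the suffix list, the default country code of `1`, and the accepted date formats are all assumptions chosen for illustration.

```python
import re
from datetime import datetime
from typing import Optional

# Assumed suffix list for the sketch; extend it for your own data.
LEGAL_SUFFIXES = {"inc", "llc", "corp", "corporation", "ltd", "gmbh"}

def clean_whitespace(text: str) -> str:
    """Trim the ends and collapse internal runs of whitespace."""
    return " ".join(text.split())

def normalize_phone(raw: str, default_country_code: str = "1") -> Optional[str]:
    """Reduce to digits and emit a +<digits> string; None if implausible."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 10:  # assumption: bare 10-digit numbers are national
        digits = default_country_code + digits
    if not 8 <= len(digits) <= 15:
        return None
    return "+" + digits

def normalize_company(raw: str) -> str:
    """Drop periods and commas, strip trailing legal suffixes, fix casing."""
    words = re.sub(r"[.,]", "", clean_whitespace(raw)).split()
    while words and words[-1].lower() in LEGAL_SUFFIXES:
        words.pop()
    return " ".join(w.capitalize() for w in words)

def format_date(raw: str, formats=("%Y-%m-%d", "%m/%d/%Y")) -> Optional[str]:
    """Try each known input format and return an ISO 8601 date string."""
    for fmt in formats:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None
```

Every function is a pure rule with no judgment in it, which is the test for whether a transform belongs in this step. Note the limit of rules: `capitalize()` would turn "IBM" into "Ibm", so acronyms and brand names are better left to a canonical mapping.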
Step 3: Collapse company-name and picklist variants to one canonical value
The most damaging inconsistency is the same entity stored under several names, because it fractures every count you run. “Acme Inc,” “Acme Corp,” and “ACME” are not a formatting nuisance; they are three accounts in your reporting, three owners in your routing, and three line items in a board deck that should read as one. Stripping legal suffixes in Step 2 gets you partway. The rest needs a canonical mapping: a decision about which single value every variant resolves to, applied consistently across the whole table.
Build the canonical value, then map every variant onto it. Native normalization handles the mechanical part (suffixes, casing, punctuation). For the judgment part, deciding that “Big Blue” and “International Business Machines” are the same canonical company, or that “Tech,” “SaaS,” and “Information Technology” all map to one picklist value, use an AI formula so the rule is consistent across every row. Here is a prompt you can drop into an AI column to canonicalize a company name:
You are standardizing company names in a CRM to one canonical value.
Raw value: {{company_name}}
Optional context: domain {{domain}}, industry {{industry}}
Rules:
- Strip legal suffixes (Inc, LLC, Corp, Corporation, Ltd, GmbH) and punctuation.
- Resolve known abbreviations and brand names to the legal/common parent (IBM = International Business Machines = Big Blue -> "IBM").
- Use the domain to disambiguate when two raw values look alike.
- Output Title Case. Do not invent a name you cannot support from the inputs.
Return JSON only:
{"canonical_name": "<value>", "confidence": 0-100, "merged_variants": ["<raw inputs you collapsed>"]}
Run it across the table, accept high-confidence results, and the same logic that cleaned company names cleans your industry and lead-source picklists.
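"Accept high-confidence results" implies a gate between the model's output and your table. The sketch below is one way to write that gate in Python, assuming the response follows the JSON shape the prompt requests. The threshold of 85 and the `triage_ai_result` helper are hypothetical; pick a cutoff by spot-checking your own results.

```python
import json

ACCEPT_THRESHOLD = 85  # hypothetical cutoff, not a recommended value

def triage_ai_result(raw_value: str, ai_response: str,
                     threshold: int = ACCEPT_THRESHOLD) -> dict:
    """Accept a canonical name only if the JSON parses and confidence is high."""
    try:
        parsed = json.loads(ai_response)
        name = parsed["canonical_name"].strip()
        confidence = int(parsed["confidence"])
    except (json.JSONDecodeError, KeyError, TypeError, ValueError, AttributeError):
        # Malformed or incomplete output never overwrites the raw value.
        return {"raw": raw_value, "canonical": None, "action": "review"}
    if not name or confidence < threshold:
        return {"raw": raw_value, "canonical": name or None, "action": "review"}
    return {"raw": raw_value, "canonical": name, "action": "accept"}

result = triage_ai_result(
    "Big Blue",
    '{"canonical_name": "IBM", "confidence": 96, "merged_variants": ["Big Blue"]}',
)
```

Anything that fails to parse or falls under the threshold goes to a review queue and is not written, so a bad model response cannot silently rename an account.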
Five spellings of one company become one account (figure comparing account count, routing owners, and open pipeline across the split rows)
Collapsing name variants to one canonical value is what makes your account counts, routing, and pipeline totals correct, not just tidier. A good rule keeps a genuinely different entity (Acme Robotics) apart.
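The effect on counts can be seen in a few lines of Python. This is a sketch with a hand-written mapping; the `CANONICAL` table and the `canonicalize` helper are hypothetical, and a real mapping would come from the decisions you made above.

```python
from collections import Counter

# Hypothetical canonical mapping: every known variant resolves to one value.
CANONICAL = {
    "acme inc": "Acme",
    "acme corp": "Acme",
    "acme": "Acme",
    "acme robotics": "Acme Robotics",  # a genuinely different entity stays apart
}

def canonicalize(raw: str) -> str:
    """Look up a lightly cleaned key; unknown values pass through untouched."""
    key = " ".join(raw.lower().replace(".", "").replace(",", "").split())
    return CANONICAL.get(key, raw)

accounts = ["Acme Inc", "Acme Corp", "ACME", "Acme Robotics"]
before = Counter(accounts)
after = Counter(canonicalize(a) for a in accounts)
```

`before` holds four distinct accounts; `after` holds two, with the three Acme variants merged and Acme Robotics kept separate. Passing unknown values through unchanged is a deliberate choice here: a value the mapping has not decided on should not be guessed at.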
This is the difference between a database a team trusts and one it quietly works around. At Sana, the team rebuilt confidence in 150,000 Salesforce accounts by reconciling and standardizing at scale instead of letting reps patch records one call at a time.
“Reps used to spend hours validating account information because they couldn't trust the data. With Clay, reps are much more confident in our CRM data and most accounts in their books of business are now worth reaching out to.”
When the canonical value is trustworthy, a rep stops re-verifying the account before every call. That recovered time is the real return on standardization.
Step 4: Validate emails and flag the records that have rotted
A standardized record can still be wrong, so cleaning is not finished until you separate the values that are correct from the ones that have decayed. Standardizing a phone number does not make it ring. Reformatting an email does not make it deliverable. Roughly a third of B2B contact data goes stale each year as people change jobs and companies restructure, so a clean-looking field is often a confident-looking lie. The job here is to score each record's health and route it by what is actually wrong with it, not to delete on sight.
Validate the email against deliverability, not just syntax, so a well-formatted dead address gets caught. Check the rest of the record for the signals of rot: a title that no longer matches the person, a company that was acquired, a contact who left. Then assign a health state and let the state decide the record's fate.
Score each record's health, then route it by what's wrong
Most records are not deletable; they are routable.
Cleaning ends by routing each record by its health state, not by deleting bad rows, because most bad records are recoverable, not garbage.
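A health state can be as simple as a function over a handful of signals. The Python sketch below assumes you already have those signals from a verification step; the `ContactSignals` fields, the state names, and the `ROUTES` table are all hypothetical labels, not a scheme defined by any particular tool.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ContactSignals:
    email_deliverable: Optional[bool]  # None means not yet verified
    title_current: bool
    company_acquired: bool
    left_company: bool

# Hypothetical routes: each state maps to an action, and none of them is "delete".
ROUTES = {
    "healthy": "keep in active sequences",
    "needs_verification": "send to email verification",
    "needs_refresh": "queue for title/company update",
    "quarantine": "suppress from outreach, keep history",
}

def health_state(s: ContactSignals) -> str:
    """Assign the most severe state that applies to the record."""
    if s.left_company or s.email_deliverable is False:
        return "quarantine"
    if s.email_deliverable is None:
        return "needs_verification"
    if s.company_acquired or not s.title_current:
        return "needs_refresh"
    return "healthy"

state = health_state(ContactSignals(True, False, False, False))
action = ROUTES[state]
```

Checking the most severe condition first keeps the logic readable: a contact who left the company is quarantined even if the email still happens to deliver.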
Quarantining the rotted records protects more than report accuracy. Dead addresses inflate bounce rates, and bounce rates damage the sender reputation every valid email then depends on.
Step 5: Write the clean values back without creating new records
A cleanup is only safe if it updates the records you have instead of inserting fresh ones, or you have replaced a standardization problem with a duplication problem. This is where careful cleanups go wrong: the write-back creates new rows next to the old ones, and the database you just standardized now has twins. The safeguard is to treat the whole pull as read-only until the moment you write, and to key every write on the CRM record ID so the update lands on the existing row in place.
Pull records into Clay through the native HubSpot or Salesforce integration, build and check your canonical values and health states, and change nothing in production while you work. When you are ready, test the write in a Salesforce sandbox first so a mistake never touches live data. Then use update actions keyed on the record ID. Deduplication gets one pass in this loop: before any write, look up each record by normalized email or domain so a standardized record updates its existing match instead of creating a new one. Finding and merging the near-match duplicates that exact matching misses is its own job, covered in the guide on how to find and remove duplicate contacts.
A write-back that updates in place and creates zero new rows
- 1. Pull (read-only): Records flow into Clay behind a lock. Production unchanged.
- 2. Standardize & score
- 3. Test in sandbox
- 4. Write to production by record ID
Keying the write-back on the CRM record ID is what updates records in place, so a cleanup never becomes the next source of duplicates.
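The update-in-place rule can be expressed as a guard that refuses any write whose ID it does not recognize. The Python sketch below uses an in-memory dict as a stand-in for the CRM; it does not call the Salesforce or HubSpot APIs, and `write_back`, `UnknownRecordError`, and the `dry_run` flag are hypothetical names.

```python
class UnknownRecordError(KeyError):
    """Raised when an update targets an ID that is not in the CRM."""

def write_back(crm: dict, updates: list, dry_run: bool = True) -> dict:
    """Apply field updates keyed on record ID; never insert new rows."""
    missing = [u["id"] for u in updates if u["id"] not in crm]
    if missing:
        raise UnknownRecordError(f"refusing to create new records: {missing}")
    # A dry run writes to a copy, playing the role of the sandbox test.
    target = {rid: dict(fields) for rid, fields in crm.items()} if dry_run else crm
    for u in updates:
        target[u["id"]].update(u["fields"])
    return target

crm = {"001A": {"company": "ACME", "phone": "415 555 0100"}}
updates = [{"id": "001A", "fields": {"company": "Acme"}}]

preview = write_back(crm, updates, dry_run=True)   # production untouched
```

After the preview looks right, the same call with `dry_run=False` updates the existing row, and the row count stays the same. Validating every ID before touching any row means a single bad update aborts the whole batch before anything is written.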
Step 6: Put a standardization rule at the point of entry so it stays clean
Cleaning once buys you about ninety days; without a rule at entry, the next import refills the mess. A database drifts back to dirty the same way it got dirty: new records arrive from forms, list uploads, and manual entry, each with its own spelling of a company and its own phone format, and nothing standardizes them before they land. The durable fix is to run the same canonical mapping and validation on every new record on a schedule, so cleaning stops being a project you repeat and becomes a step that just runs.
Set the standardization pass as a standing step on the inbound path and on a refresh cadence for existing records. New arrivals get normalized on the same rules from Steps 2 through 4, get a health state, and get checked against existing records before they are written. Flip this rule off and the same audit you ran in Step 1 will read dirty again within a quarter.
Standardize at entry vs. re-clean every quarter (figure: 90-day projection; the entry rule runs on entry plus a scheduled refresh)
A standardization rule that runs on every new record on a schedule is the only thing that keeps a cleaned database from drifting back to dirty.
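The standing rule has two triggers: a check when a record arrives and a refresh when an existing record gets old. The Python sketch below illustrates both under stated assumptions: matching is by normalized email only, the 90-day refresh window is a placeholder, and `on_entry`, `due_for_refresh`, and `index_by_email` are hypothetical names.

```python
from datetime import date, timedelta

REFRESH_AFTER = timedelta(days=90)  # placeholder cadence, tune to your data

def email_key(email: str) -> str:
    """Normalize an email for matching: trim and lowercase."""
    return email.strip().lower()

def on_entry(new_record: dict, index_by_email: dict) -> tuple:
    """Decide whether an inbound record updates an existing row or creates one."""
    existing_id = index_by_email.get(email_key(new_record["email"]))
    if existing_id is not None:
        return ("update", existing_id)
    return ("create", None)

def due_for_refresh(last_cleaned: date, today: date,
                    max_age: timedelta = REFRESH_AFTER) -> bool:
    """True when an existing record has gone a full window without a pass."""
    return today - last_cleaned >= max_age

index_by_email = {"ana@acme.com": "003X"}  # normalized email -> CRM record ID
decision = on_entry({"email": " Ana@Acme.com "}, index_by_email)
```

Here `decision` is `("update", "003X")`, so the inbound record lands on the existing row. In a fuller version the record would pass through the same normalizers and health scoring before this check runs.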
Common failure modes when cleaning CRM data
Four mistakes turn a CRM cleanup into wasted effort.
- Treating inconsistency as a missing-data problem: Paying an enrichment vendor to fill fields that are already full, when the real issue is that the full fields disagree with each other.
- Spending AI credits on deterministic work: Running a model to reformat a phone number or strip a suffix that a free rule-based normalizer handles in place.
- Standardizing without a canonical decision: Cleaning casing and punctuation but never deciding that “Acme Corp” and “ACME” resolve to one value, so the counts stay wrong.
- Cleaning once and skipping the entry rule: A perfect one-time pass that the next import quietly undoes.
All four come from treating cleaning as a one-time chore instead of a standing system. Sort inconsistent from missing before you touch anything, use the free deterministic tools before the paid ones, decide the canonical value, validate for decay, write back in place, and run the same logic at entry. Done that way, your CRM stops being something the team works around and becomes something it can route, score, and report on.