
Clay GTM guide

How to Scrape Business Data from Professional Profiles

Turn a name, domain, or profile URL into structured fields (title, seniority, tenure, skills) at scale. Here is how to pull clean profile data into a usable table, verify it, and keep it compliant, with no per-site scraper required.

June 8, 2026 · 9 min read

Most people scrape the wrong thing. They build a fragile scraper for one site, watch it break the moment the page layout changes, and start over the next week. The structured business data you actually want (a person's name, title, seniority, company, tenure, skills, and recent activity) does not require a per-site scraper at all.

You start from a name, a domain, or a public profile URL, then ask Clay to find the profile and read the named fields off the public page. The page can change all it wants; the fields you ask for stay the same. This is how to pull clean profile data into a usable table, verify it, and keep it compliant.

Step 1: Decide what you can start from

Your starting input decides what you can extract, and how reliably. A profile URL is the cleanest place to begin, because the page is already identified. A name plus a company domain, or an email, forces a find-the-person step first. Each input unlocks a different set of fields at a different match rate.

[Interactive figure: toggle a starting input (company domain, public profile URL, work email, or name) on a sample contact record to see which enrichments it unlocks. A company domain opens the email waterfall.]

What you can pull off a profile depends entirely on what you start from: a profile URL lights up the most fields, a bare name the fewest.

A bare name with no domain is the weakest input: if two people share it, you cannot tell which profile is the right one. Add the company or a verified work email and the match becomes specific. If you only have an email, Clay can reverse-resolve the profile first; that path is covered in the companion guide on how to find a person from an email address.
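The ranking of starting inputs above can be sketched as a small triage function you run before any lookup. This is a minimal sketch, not part of Clay: the `StartingInput` record, the label strings, and both function names are hypothetical and exist only to illustrate the ordering the text describes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StartingInput:
    """Hypothetical record of what you know about a contact before any lookup."""
    name: Optional[str] = None
    company_domain: Optional[str] = None
    work_email: Optional[str] = None
    profile_url: Optional[str] = None


def classify_input(rec: StartingInput) -> str:
    """Return a rough label for how specific the starting input is."""
    if rec.profile_url:
        return "profile_url"       # page already identified, no find step
    if rec.work_email:
        return "work_email"        # profile must be reverse-resolved first
    if rec.name and rec.company_domain:
        return "name_plus_domain"  # needs a find-the-person step first
    if rec.name:
        return "bare_name"         # weakest: same-named people collide
    return "insufficient"          # nothing that identifies a person


def needs_find_step(rec: StartingInput) -> bool:
    """True when a person must be located before a page can be read."""
    return classify_input(rec) in {"work_email", "name_plus_domain", "bare_name"}
```

Running this over a list before enrichment shows how many rows start as a bare name, which are the rows worth fixing at the input rather than downstream.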

Step 2: Pick the right read for the page

The page type, not the data source, decides the tool. A clean, structured directory page reads differently from a messy public profile, and Clay gives you a different mechanic for each. Pick by how the page is built, not by what site it lives on.

Clay for Chrome extracts structured data from a webpage you have open. It auto-detects list-like structures (a directory, a results table, a roster) and turns them into rows you can send straight to a Clay table, download as CSV, or copy. Use it for clean, list-style pages where the data is already laid out in a repeating pattern.

An AI column (Web research) and Claygent read a public page and return the named fields you asked for, even when the page is unstructured. Claygent is Clay's AI web scraper: it pulls real-time context from the live page rather than a stale database, so it handles profiles whose layout shifts. ScrapeMagic is the option when you want to parse specific fields from a single URL by describing each field by name.

Match the read to the page in front of you

| Page you are reading | Best tool | Why |
| --- | --- | --- |
| Clean directory or results list (repeating rows) | Clay for Chrome extension | Auto-detects the list pattern and exports rows directly |
| Single freeform public profile page | AI column (Web research) or Claygent | Reads named fields off an unstructured layout in real time |
| Specific fields from one known URL | ScrapeMagic (Parse Data from URL) | Define each field by name and description, get structured output |
| A page that changes layout often | Claygent | Reads the live page, so a layout change does not break it |

The takeaway: you never maintain a scraper per site. You maintain a list of fields you want, and Clay reads them off whatever page is in front of it.
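If you track this decision in your own tooling, the table above reduces to a lookup keyed by page type. The sketch below is illustrative only: the `PageKind` enum and `pick_tool` function are hypothetical names, and the tool labels are copied from the table rather than taken from any Clay API.

```python
from enum import Enum


class PageKind(Enum):
    """Hypothetical labels for how a page is built, not which site hosts it."""
    REPEATING_LIST = "clean directory or results list"
    FREEFORM_PROFILE = "single freeform public profile"
    KNOWN_URL_FIELDS = "specific fields from one known URL"
    SHIFTING_LAYOUT = "page that changes layout often"


# Tool names are plain strings copied from the guide's table.
TOOL_FOR_PAGE = {
    PageKind.REPEATING_LIST: "Clay for Chrome extension",
    PageKind.FREEFORM_PROFILE: "AI column (Web research) or Claygent",
    PageKind.KNOWN_URL_FIELDS: "ScrapeMagic (Parse Data from URL)",
    PageKind.SHIFTING_LAYOUT: "Claygent",
}


def pick_tool(kind: PageKind) -> str:
    """Look up the suggested read for a given page type."""
    return TOOL_FOR_PAGE[kind]
```

The point of keying on page type is that nothing in the lookup mentions a site name, so a new source never adds a row.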

Step 3: Run a single-profile extraction

Start with one profile URL and one AI column before you run a thousand. Add a Use AI column, choose the Generate tab, and describe what you want in plain English. Clay builds the prompt, picks a model, and sets up output fields for each business field you name. Run it on one row, read the output, and only then point it at the full list.

Example: one profile URL in, structured fields out

Sample public professional profile (freeform layout):

Dana Whitfield leads revenue operations at Northwind Logistics, a mid-market freight company based in Austin. She has spent a little over two years in the role and writes regularly about supply-chain hiring and forecasting. Skills called out on the page include RevOps, Salesforce administration, and pipeline forecasting.

One profile URL plus one AI read turns an unstructured public page into a full row of named, structured business fields.

Here is a Claygent extraction prompt you can paste into a Use AI or Claygent column. Map the profile URL to the column that holds it.

Claygent: profile field extraction

Read the public professional profile at {{Profile URL}}.
Return ONLY these fields as structured output. If a field is not present on the page, return "not found" rather than guessing:
- full_name
- current_title
- seniority_level (IC / Manager / Director / VP / C-level)
- current_company
- tenure_in_current_role (years and months)
- top_skills (up to 5, comma-separated)
- most_recent_public_activity (one line, with approximate date)
Use only what appears on the public page. Do not infer or fabricate.

The reason you tell it to return “not found” instead of guessing: an AI read that invents a plausible title is worse than a blank. You cannot tell the fabricated cell from the real one downstream.
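If you export the column output and post-process it yourself, the "not found" convention is easy to enforce in code. The sketch below assumes the output arrives as a JSON object keyed by the field names in the prompt; that export shape is an assumption, and `parse_profile_output` and `missing_fields` are hypothetical helpers, not Clay functions.

```python
import json
from typing import Dict, List, Optional

# Field names copied from the extraction prompt above.
EXPECTED_FIELDS = [
    "full_name",
    "current_title",
    "seniority_level",
    "current_company",
    "tenure_in_current_role",
    "top_skills",
    "most_recent_public_activity",
]


def parse_profile_output(raw_json: str) -> Dict[str, Optional[str]]:
    """Map 'not found', empty, or absent fields to None; trim everything else."""
    data = json.loads(raw_json)
    row: Dict[str, Optional[str]] = {}
    for field in EXPECTED_FIELDS:
        value = data.get(field)
        text = str(value).strip() if value is not None else ""
        if not text or text.lower() == "not found":
            row[field] = None  # an explicit blank you can re-run later
        else:
            row[field] = text
    return row


def missing_fields(row: Dict[str, Optional[str]]) -> List[str]:
    """List the fields that came back blank, in prompt order."""
    return [field for field, value in row.items() if value is None]
```

Keeping blanks as `None` rather than as the literal string means a downstream filter cannot mistake "not found" for a real title.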

Clay has helped us simplify complex workflows, eliminate redundant tools, and make smarter decisions faster. It's become a cornerstone of our RevOps strategy.

Step 4: Scale from one profile to a list

A single read proves the prompt; a list needs a find step in front of it. If you do not already have profile URLs, start with Clay's Find People source. It searches by job title, organizational level, function, company attributes, location, and experience, and returns the matching people as rows, including their public professional profile URLs. From there, your tested AI column reads each one, and a verify column keeps only the rows you can trust.

Find People matches on synonyms and similar titles by default. A search for “Software Developer” also returns “Frontend Engineer.” If you need exact titles only, use the “must contain exact” or “must match exactly” options, and wrap multi-word terms in quotes for exact-phrase matching. That precision matters most when seniority is the whole point of the list. The sequence is always the same: search criteria find the people, your AI read fills the fields, and the verify step narrows the list to the rows that hold up.
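When seniority is the point of the list, it can help to re-check returned titles locally after the search. The following is a rough sketch of exact-phrase and exact-match checks; it does not describe how Clay implements its options, and the function name and `mode` strings are hypothetical.

```python
import re


def title_matches(title: str, term: str, mode: str = "contains_exact") -> bool:
    """Check a returned title against a search term, ignoring case and extra spaces."""
    t = " ".join(title.lower().split())
    q = " ".join(term.lower().split())
    if mode == "match_exactly":
        return t == q
    if mode == "contains_exact":
        # The whole phrase must appear, bounded by non-word characters.
        pattern = r"(?<!\w)" + re.escape(q) + r"(?!\w)"
        return re.search(pattern, t) is not None
    raise ValueError(f"unknown mode: {mode}")
```

With this check, "Senior Software Developer" passes a contains-exact test for "Software Developer", while "Frontend Engineer" does not, which is the distinction the synonym default blurs.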

Step 5: Dedupe and verify what you pulled

Raw scraped fields are a draft, not a record. The same person shows up twice under two title variants. A tenure reads as current when the person changed jobs last month. A name comes back with trailing whitespace. Clean before you trust. Clay's formatters handle the mechanical part: Normalize Company Name collapses “Northwind Logistics, Inc.” and “Northwind Logistics” into one value, and Remove Extra Whitespace strips the stray spaces that break a join.

From a messy scrape to a clean, verified record


Deduping and verification turn a scrape into a record you can act on: one row per person, normalized values, and fields confirmed current.
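Outside Clay, the same cleanup can be approximated in a few lines. This is a minimal sketch under stated assumptions: the suffix list is a small illustrative sample, the row keys follow the extraction prompt's field names, and none of these helpers reproduce Clay's Normalize Company Name or Remove Extra Whitespace formatters exactly.

```python
import re
from typing import Dict, Iterable, List

# Illustrative, incomplete list of legal suffixes to strip from company names.
_LEGAL_SUFFIX = re.compile(r"[,\s]+(inc|llc|ltd|corp|co)\.?$", re.IGNORECASE)


def clean_whitespace(value: str) -> str:
    """Trim the ends and collapse internal runs of whitespace."""
    return " ".join(value.split())


def normalize_company(name: str) -> str:
    """Drop a trailing legal suffix so name variants collapse to one value."""
    return _LEGAL_SUFFIX.sub("", clean_whitespace(name))


def dedupe_rows(rows: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the first row per (person, company) after normalizing both."""
    seen: Dict[tuple, Dict[str, str]] = {}
    for row in rows:
        name = clean_whitespace(row["full_name"])
        company = normalize_company(row["current_company"])
        key = (name.lower(), company.lower())
        if key not in seen:
            seen[key] = {**row, "full_name": name, "current_company": company}
    return list(seen.values())
```

Keeping the first row is an arbitrary choice here; in practice you would pick the most recently verified row when two title variants collide.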

For the fields that drift fastest, title and company, re-read or cross-check against a second source before you act on them. A profile scraped six months ago is a snapshot, not a feed. Treat tenure and recent activity as the first things to re-verify.

Step 6: Match the input type to your match rate

Coverage and quality are not the same number, and both move with your input type. Clay benchmarks profile finders by what you feed them. The gap between a name-plus-domain start and an email start is real. If your match rate is low, the fix is usually a better input, not a better provider.

[Figure: Profile match, coverage vs quality by starting input. Vertical axis: match quality. Horizontal axis: coverage (share matched). Points plotted: name + company domain, work email, personal email. The points sit along a tradeoff line, not in one corner.]

The starting input you choose moves both coverage and accuracy, so picking the right input does more for your match rate than swapping providers. Source: Clay data test, best professional profile finder by input type, 2025.

The pattern the benchmark shows is clear. A name plus a company domain gives you the broadest coverage with high accuracy. A work email trades coverage for accuracy. A personal email is the hardest input to resolve cleanly. When a record comes back empty, change the input before you change the tool.
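To keep the two numbers separate in your own tests, compute them from different denominators. The sketch below assumes coverage means the share of inputs that returned any match and quality means the share of those matches that were correct; the second definition is an assumption, and the `MatchStats` class is hypothetical rather than anything from Clay's benchmark.

```python
from dataclasses import dataclass


@dataclass
class MatchStats:
    """Counts from one test run of a single input type."""
    total_inputs: int     # rows you fed in
    matched: int          # rows that returned any profile
    matched_correct: int  # returned profiles confirmed as the right person

    @property
    def coverage(self) -> float:
        """Share of inputs that returned a match."""
        return self.matched / self.total_inputs if self.total_inputs else 0.0

    @property
    def quality(self) -> float:
        """Share of returned matches that were correct."""
        return self.matched_correct / self.matched if self.matched else 0.0
```

Computing both per input type on a hand-checked sample shows whether a low match rate comes from missing profiles (coverage) or wrong ones (quality).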

Anthropic tripled its data enrichment coverage by combining multiple data providers in Clay instead of relying on a single source.

Step 7: Keep it compliant and current

Public data, used for legitimate business contact, is the only ground you should stand on. Pull from public professional pages and public profiles, store only what you need, and respect each source's terms. Clay reads public pages for the fields you name; it does not require you to operate a fragile, terms-violating scraper of your own. The compliance posture and the accuracy posture point the same way: scrape narrowly, verify often, and refresh the fields that go stale.

Schedule the read to re-run on a cadence so titles and companies stay current rather than decaying into a one-time snapshot. The most defensible profile data is also the freshest: pulled from public sources, scoped to the fields you actually use, and re-verified on a schedule.
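A refresh cadence comes down to comparing each field's last-verified date against a maximum age. The sketch below is one way to flag what to re-read; the 90-day and 365-day windows are placeholder assumptions, not recommendations from Clay, and `stale_fields` is a hypothetical helper.

```python
from datetime import date, timedelta
from typing import Dict, List

# Hypothetical refresh windows in days; volatile fields get shorter windows.
REFRESH_AFTER_DAYS = {
    "current_title": 90,
    "current_company": 90,
    "tenure_in_current_role": 90,
    "top_skills": 365,
}


def stale_fields(last_verified: Dict[str, date], today: date) -> List[str]:
    """Return fields never verified or verified longer ago than their window."""
    stale = []
    for field, max_age_days in REFRESH_AFTER_DAYS.items():
        verified_on = last_verified.get(field)
        if verified_on is None or today - verified_on > timedelta(days=max_age_days):
            stale.append(field)
    return stale
```

A field with no verification date counts as stale, so a row that was scraped once and never checked is always queued for a re-read.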

Common failure modes

The four ways profile scraping breaks are predictable, and each has a fix you can apply before you ever run the list. Most failed scrapes are not a tooling problem; they are an input problem, a verification problem, or a fabrication problem.


Almost every bad scrape traces to one of four fixable causes: a weak input, an unverified field, a stale snapshot, or an AI that guessed instead of returning “not found.”

The single most common one is the last: a model that fills a blank with a confident guess. That is why the prompt in this guide includes an instruction to return “not found.” A blank you can re-run. A fabricated VP title you will email.

Pull clean profile data into a table, without a scraper per site

Start from a name, a domain, or a URL and let Clay find the profile and read the fields you name.

Frequently asked questions

Is it legal to scrape business data from professional profiles?

Pulling publicly available professional data for legitimate business contact is generally accepted. The details depend on your jurisdiction, the source's terms of service, and how you use the data. The safe posture is to read only public pages, store only the fields you need, and respect each source's terms. Clay reads public pages for the named fields you request rather than requiring you to run a scraper that violates a site's terms. For any specific use case, review the relevant terms and consult a legal professional.

What business data can you actually extract from a profile?

Reliably: name, current title, seniority, company, and skills. With a clean profile URL you can also pull tenure and recent public activity. The fields you can reach depend on your starting input: a profile URL unlocks the most, a bare name the fewest. Treat title, company, and tenure as the fields most likely to be out of date, and re-verify them before acting.

Do I need a separate scraper for each site?

No, and that is the point. A per-site scraper breaks every time the page layout changes. In Clay you maintain a list of fields you want, and an AI read (Use AI or Claygent) pulls those fields off whatever public page is in front of it. For clean, list-style pages, the Clay for Chrome extension auto-detects the structure and exports rows directly. The page can change; your field list does not.

Can I scrape profiles from just a name or an email?

Yes, but the match rate depends on the input. A name plus a company domain gives the broadest coverage with high accuracy. A work email is strong on accuracy and narrower on coverage. A bare name with no company is the weakest input, because you cannot tell which of several same-named people is the right match. If you only have an email, Clay can reverse-resolve the profile first, then read its fields.

How do I keep scraped profile data accurate over time?

A scrape is a snapshot, so it decays. Dedupe with Clay's formatters (Normalize Company Name, Remove Extra Whitespace), verify the volatile fields against a second source, and schedule the read to re-run on a cadence so titles and companies stay current. The fields that drift fastest, title, company, and tenure, are the ones to re-verify first.