Search used to return ten blue links, and you could see your rank on every one. AI search returns one synthesized answer, and your brand is either inside it or invisible. Ask ChatGPT what the best tool for outbound is and it names three or four products in a paragraph. There is no page two. There is no second result for the curious clicker to find. Google Search Console still tells you about clicks; the AI platforms tell you nothing. An AI visibility dashboard is the measurement layer that closes that gap. This is how to think about one, what it should measure, and how to build it.
What is an AI visibility dashboard?
An AI visibility dashboard measures whether AI assistants name your brand when buyers ask them what to buy. It runs a library of real buyer prompts through ChatGPT, Claude, and Perplexity on a schedule. A second AI pass extracts a structured signal from each answer. The result is a set of metrics you can act on: are you mentioned, where in the answer, with what sentiment, and against which competitors.
The discipline behind it is AEO, or Answer Engine Optimization. SEO was about ranking on a results page. AEO is about being the answer when there is no results page, just a model deciding which brands to surface and how to describe them. Whether you get surfaced is not random. It is driven by how often your content gets cited, how your brand sits in the context the model retrieves, and which competitors keep appearing alongside you. A dashboard makes those forces visible. Clay built one for its own growth team, and it is the dashboard described throughout this guide. The hard part was not code. It was teaching a second AI to read messy answers and return clean signal every single time.
What an AI visibility dashboard measures
The metrics are the point. A dashboard that shows a single visibility gauge is a vanity meter; a useful one breaks visibility into the signals a marketer can change. Each metric below comes from Clay's own build.
The six metrics an AI visibility dashboard should report
Visibility is not one number; it is a small family of metrics, each answering a different question a marketer can act on.
Visibility Score is the headline: the share of answers that name you at all. Citation Rate is sharper, because being mentioned and being cited as a source are different things. Average Position separates being listed fourth from being recommended first. Sentiment scores how the model frames you when it does name you. Competitor Visibility is the same calculation run for each rival, which gives you share of voice. And the period delta on every metric is what turns the dashboard from a snapshot into a trend.
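To make the six definitions concrete, here is a minimal Python sketch of each calculation over a list of analyzed answers. The `AnswerRow` shape and all function names are hypothetical, and counting an answer as "cited" when one of your own URLs appears among its citations is an assumption, not necessarily how Clay scores it.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List, Optional


@dataclass
class AnswerRow:
    """One analyzed answer: one prompt, one platform, one run day (hypothetical shape)."""
    mentioned: bool                  # brand named anywhere in the answer
    cited: bool                      # assumption: one of your own URLs is among the citations
    position: Optional[int]          # 1 = named first; None when not mentioned
    sentiment_score: Optional[int]   # 0-100, 50 = neutral; None when not mentioned
    competitors: List[str] = field(default_factory=list)


def _share(rows, predicate):
    """Fraction of rows where predicate(row) is true; 0.0 for an empty period."""
    return sum(1 for r in rows if predicate(r)) / len(rows) if rows else 0.0


def visibility_score(rows):
    return _share(rows, lambda r: r.mentioned)


def citation_rate(rows):
    return _share(rows, lambda r: r.cited)


def average_position(rows):
    positions = [r.position for r in rows if r.position is not None]
    return mean(positions) if positions else None


def average_sentiment(rows):
    scores = [r.sentiment_score for r in rows if r.sentiment_score is not None]
    return mean(scores) if scores else None


def competitor_visibility(rows, competitor):
    """The visibility calculation run for one rival; compare rivals for share of voice."""
    return _share(rows, lambda r: competitor in r.competitors)


def period_delta(current, previous):
    """Change versus the previous period; None if either value is missing."""
    if current is None or previous is None:
        return None
    return current - previous


# Illustrative rows for one period (made-up values).
this_period = [
    AnswerRow(True, True, 1, 80, ["Apollo"]),
    AnswerRow(True, False, 3, 60, ["Apollo", "HubSpot"]),
    AnswerRow(False, False, None, None, ["HubSpot"]),
    AnswerRow(False, False, None, None, ["ZoomInfo"]),
]
print(visibility_score(this_period))                      # 0.5
print(average_position(this_period))                      # 2
print(period_delta(visibility_score(this_period), 0.25))  # 0.25
```

Position and sentiment are averaged only over answers that name you, so a low visibility score and a strong average position can coexist; reading them together is the point.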
Why off-the-shelf AEO tools fall short
Off-the-shelf AEO tools track brand mention rate and sentiment, and stop there. That is a one-size-fits-all schema, and one size does not fit the questions a product marketer actually needs to answer. Generic tools tell you that you were mentioned 40 percent of the time with neutral sentiment. They cannot tell you whether your new pricing changed how AI describes your value. They cannot tell you how fast a new product started appearing in answers after launch. They cannot tag a prompt by the internal use case it maps to, or normalize four names for one competitor, or separate the prompts that inflate your score from the ones that count. Those answers require a schema you write yourself, because they are specific to your category and your roadmap.
The other reason to build is that the generic tools are a black box. You cannot see why a number moved. A custom dashboard stores every raw answer and every extracted field, so when your visibility score drops you can read the exact responses that drove it. This is not hypothetical. Clay's growth team, led by its SEO/AEO lead, runs three Clay-powered systems. One refreshes more than 8,000 company-and-executive dossier pages. One turns video transcripts into searchable template pages with Claygent. The third is this AI visibility dashboard, and the team built it in two days.
Clay's growth team built its own AI visibility dashboard in two days, at roughly a fifth the cost of the off-the-shelf AEO tools it evaluated.
Building beat buying for one reason: every AEO tool the team evaluated calls the same LLM APIs that already run inside Clay. The data retrieval was never the moat. The parsing and the schema were, and they had to be specific to how Clay's MCP app, Claygent, and a recent pricing change were being discussed. The economics follow from where the dashboard sits. It runs on top of Clay and Supabase and touches no core infrastructure, so the proof of concept came in at roughly a fifth the cost of the tools on the market. The real advantage is maintenance cost. If a better approach appears, the team rebuilds it in another two days, with no migration, no contract to unwind, and no sunk implementation to protect. This is content engineering as a natural extension of GTM engineering: the same tools, the same logic, and the same engineering discipline Clay's growth team already applies to outreach.
Why branded prompts inflate your score
Every brand looks good on branded prompts, so branded prompts cannot count toward your visibility score. Ask any AI what Clay is used for and it answers confidently, because the model already knows what Clay is. That is brand perception, not competitive visibility, and mixing the two inflates your numbers. The fix is to split your prompt library by type and report the two types separately.
The score should count only open, non-branded questions like these:
- best tools for B2B prospecting
- how do I automate outbound
- which CRM enrichment tool should I use
- top sales intelligence platforms
Branded prompts always name you, so letting them back into the score pushes it up without measuring whether you win an open question. Only non-branded prompts measure competitive reality.
So every leaderboard, topic breakdown, and trend line should filter to non-branded prompts only. Branded prompts still matter, but they belong in a separate sentiment view that answers a different question: when AI talks about you by name, is it accurate, and is it positive? Start with 20 to 30 non-branded prompts that mirror real buyer questions, then scale.
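The inflation is easy to reproduce. In this Python sketch the mention outcomes are made up, and the four branded prompts are hypothetical examples; the only point is that the blended score and the non-branded score diverge.

```python
from dataclasses import dataclass


@dataclass
class PromptResult:
    prompt: str
    prompt_type: str  # "branded" or "non-branded"
    mentioned: bool   # did the answer name the brand?


def visibility_score(results, prompt_type=None):
    """Share of answers naming the brand, optionally restricted to one prompt type."""
    pool = [r for r in results if prompt_type is None or r.prompt_type == prompt_type]
    return sum(1 for r in pool if r.mentioned) / len(pool) if pool else 0.0


results = [
    # Non-branded prompts: open questions the brand has to win (made-up outcomes).
    PromptResult("best tools for B2B prospecting", "non-branded", True),
    PromptResult("how do I automate outbound", "non-branded", False),
    PromptResult("which CRM enrichment tool should I use", "non-branded", False),
    PromptResult("top sales intelligence platforms", "non-branded", False),
    # Hypothetical branded prompts: the model already knows the brand, so it names it.
    PromptResult("what is Clay used for", "branded", True),
    PromptResult("is Clay good for data enrichment", "branded", True),
    PromptResult("how does Clay pricing work", "branded", True),
    PromptResult("Clay vs Apollo", "branded", True),
]

print(visibility_score(results))                 # 0.625 (5 of 8): inflated
print(visibility_score(results, "non-branded"))  # 0.25 (1 of 4): competitive reality
```

Making the prompt type a required filter argument in your query layer, rather than an afterthought, keeps branded prompts from leaking into a leaderboard by accident.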
The four-layer architecture
An AI answer is a paragraph of prose, and a dashboard needs rows of structured data, so the architecture exists to bridge that gap. The sentence "Clay was named third with positive framing, alongside HubSpot and Apollo" is a judgment, not a database field. Something has to read the prose and make that judgment, reliably, thousands of times.
The pipeline's job is to turn one unstructured AI answer into structured, queryable signal in four layers: collect, analyze, store, visualize.
- Collect (Clay): sends each prompt to ChatGPT, Claude, and Perplexity and retrieves the raw answer plus the URLs the model cited.
- Analyze (Clay): a second AI pass reads the raw answer and returns structured JSON.
- Store (Supabase): holds the structured rows.
- Visualize (Next.js): renders the stored rows as the dashboard.
Clay does the data work in the first two layers, which is the hard part. Supabase stores it. A Next.js app shows it. The split matters because each layer is debuggable on its own. You can read exactly what the collector retrieved, what the analyzer extracted, and what landed in the database. That is how you catch a silent failure before it corrupts weeks of data.
Collection runs on native AI API integrations rather than a generic web scrape, and the reason is citations. Clay's first build used Claygent and got the answer text back, but not the citation data: the URLs and domains the model sourced. Citations are one of the most valuable signals, so the collection layer was rebuilt to capture them.
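Once citations are captured, each cited URL has to be reduced to a domain before it can be counted or bucketed. A minimal Python sketch using only the standard library; the domain lists are hypothetical placeholders, and it covers only three buckets (Owned, Competition, Other), a subset of the citation types a full classifier would need.

```python
from urllib.parse import urlparse

# Hypothetical domain lists: replace with your own brand and competitor domains.
OWNED_DOMAINS = {"clay.com"}
COMPETITOR_DOMAINS = {"apollo.io", "hubspot.com", "zoominfo.com", "outreach.io"}


def citation_domain(url):
    """Return the lowercase host of a cited URL without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[len("www."):] if host.startswith("www.") else host


def _belongs_to(domain, domains):
    # Match the domain itself or any subdomain of it (blog.hubspot.com -> hubspot.com).
    return any(domain == d or domain.endswith("." + d) for d in domains)


def url_type(domain):
    """Coarse bucket for one citation domain."""
    if _belongs_to(domain, OWNED_DOMAINS):
        return "Owned"
    if _belongs_to(domain, COMPETITOR_DOMAINS):
        return "Competition"
    return "Other"


def summarize_citations(urls):
    """One flattened row per cited URL, ready for a citation-domains table."""
    rows = []
    for url in urls:
        domain = citation_domain(url)
        rows.append({"url": url, "domain": domain, "urlType": url_type(domain)})
    return rows


# Illustrative URLs only.
cited = [
    "https://www.clay.com/blog/some-post",
    "https://blog.hubspot.com/sales/some-post",
    "https://en.wikipedia.org/wiki/Sales_intelligence",
]
for row in summarize_citations(cited):
    print(row["domain"], row["urlType"])
```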
How Clay builds the collection and analysis layers
The two layers that matter most live inside one Clay table, where each row is a prompt and each column is a step in the pipeline. This is where the dashboard earns its accuracy, so it is worth seeing the columns in order. The table starts with the prompt and its metadata: prompt text, prompt type (branded or non-branded), topic, the internal use case it maps to, and tags. Then, per platform, a Claygent column or a Use AI column sends the prompt and returns the raw answer with cited URLs. A formula column flattens that raw response into clean fields. A second Use AI column, the analyzer, reads the flattened fields and returns strict JSON. Finally an HTTP API column posts that JSON to storage.
Two AI columns per platform, not one, is deliberate. The raw query object has to be flattened by the formula column before the analyzer reads it, and separating the steps means you can see exactly what the analyzer received. When something breaks, you know which step broke. The analyzer prompt is the intelligence of the whole system, and the hardest part to get right. It has to return clean JSON every time, because Clay reports a 200 OK even when the JSON silently failed to parse. Here is the shape of it.
Read the AI response below. Return ONLY valid JSON, no markdown fences.
Extract:
- clayMentioned: "Yes" or "No"
- clayMentionPosition: integer (1 = named first), or null
- brandSentiment: "positive" | "neutral" | "negative"
- brandSentimentScore: 0-100 (50 = neutral)
- citationType: "Direct" | "Indirect" | "None"
- citations: [{ url, domain, title, urlType }]
  urlType in Owned | Competition | Institution | Earned Media | Social | PR Wire | Other
- competitorsMentioned: [ normalized company names ], e.g. ["Apollo","HubSpot","ZoomInfo"]
- themes: [ short strings ]
- positioningVsCompetitors: one sentence, or null
Normalize competitor names BEFORE returning:
"ZoomInfo / Zoom Info / DiscoverOrg" -> "ZoomInfo"
"HubSpot / Hubspot / HubSpot CRM" -> "HubSpot"
"Apollo / Apollo.io" -> "Apollo"
"Outreach / Outreach.io" -> "Outreach"
AI response:
{{Parsed Response}}
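Because a 200 OK says nothing about whether the JSON parsed, it is worth validating every analyzer reply before it is stored. A minimal Python sketch that checks a reply against the fields listed in the prompt above; the exception class and function name are hypothetical, and a production version might use a schema library instead of hand-written checks.

```python
import json

SENTIMENTS = ("positive", "neutral", "negative")
CITATION_TYPES = ("Direct", "Indirect", "None")
LIST_FIELDS = ("citations", "competitorsMentioned", "themes")


class AnalyzerOutputError(ValueError):
    """Raised when an analyzer reply is not the JSON the prompt asked for."""


def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def parse_analyzer_output(text):
    """Parse one analyzer reply and fail loudly instead of storing nulls."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        raise AnalyzerOutputError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise AnalyzerOutputError("reply must be a JSON object")
    if data.get("clayMentioned") not in ("Yes", "No"):
        raise AnalyzerOutputError("clayMentioned must be 'Yes' or 'No'")
    position = data.get("clayMentionPosition")
    if position is not None and (not isinstance(position, int) or isinstance(position, bool) or position < 1):
        raise AnalyzerOutputError("clayMentionPosition must be an integer >= 1 or null")
    if data.get("brandSentiment") not in SENTIMENTS:
        raise AnalyzerOutputError("brandSentiment must be positive, neutral, or negative")
    score = data.get("brandSentimentScore")
    if not (_is_number(score) and 0 <= score <= 100):
        raise AnalyzerOutputError("brandSentimentScore must be a number from 0 to 100")
    if data.get("citationType") not in CITATION_TYPES:
        raise AnalyzerOutputError("citationType must be Direct, Indirect, or None")
    for key in LIST_FIELDS:
        if not isinstance(data.get(key), list):
            raise AnalyzerOutputError(f"{key} must be a list")
    return data


reply = (
    '{"clayMentioned": "Yes", "clayMentionPosition": 2, "brandSentiment": "positive", '
    '"brandSentimentScore": 72, "citationType": "Direct", "citations": [], '
    '"competitorsMentioned": ["Apollo"], "themes": ["enrichment"], '
    '"positioningVsCompetitors": null}'
)
print(parse_analyzer_output(reply)["clayMentionPosition"])  # 2

FENCE = "`" * 3
try:
    # A reply wrapped in markdown fences is exactly the silent failure to catch.
    parse_analyzer_output(FENCE + 'json\n{"clayMentioned": "Yes"}\n' + FENCE)
except AnalyzerOutputError as err:
    print("rejected:", err)
```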
That normalization line is not a nice-to-have. Without it, one company arrives as four separate competitors and the leaderboard fragments into nonsense.
Why competitor normalization happens at extraction time
A model will name the same company five different ways in one answer, and your leaderboard has to treat them as one. "ZoomInfo", "Zoom Info", and "DiscoverOrg" are one competitor. Collapse them at the moment of extraction, not in a cleanup pass later, or the fragmented names leak into every downstream query.
Figure: Ten raw aliases collapse into four real competitors (raw mentions vs. normalized companies). Normalizing aliases at extraction time keeps one company from appearing as four, so the share-of-voice leaderboard reflects reality.
The cost of skipping this is a fragmented picture: you look like you are losing to "ZoomInfo" and "DiscoverOrg" separately, when there is one company in the room. Normalize early and every leaderboard, gap analysis, and trend line downstream is correct by construction.
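The prompt asks the model to normalize, but models occasionally slip. One option, sketched here in Python as an assumption rather than Clay's documented method, is a deterministic alias map applied in the same extraction step, right after the analyzer returns and before the row is stored. The alias lists are the ones from the analyzer prompt; the names of the map and function are hypothetical.

```python
# Alias lists copied from the analyzer prompt; extend them as new variants appear.
CANONICAL_COMPETITORS = {
    "ZoomInfo": ["ZoomInfo", "Zoom Info", "DiscoverOrg"],
    "HubSpot": ["HubSpot", "Hubspot", "HubSpot CRM"],
    "Apollo": ["Apollo", "Apollo.io"],
    "Outreach": ["Outreach", "Outreach.io"],
}

# Inverted, case-insensitive lookup: alias -> canonical company name.
ALIAS_TO_COMPANY = {
    alias.casefold(): company
    for company, aliases in CANONICAL_COMPETITORS.items()
    for alias in aliases
}


def normalize_competitors(names):
    """Map raw names to canonical companies, de-duplicated, first-seen order kept.

    Unknown names pass through unchanged so they can be reviewed and added to the map.
    """
    result = []
    for raw in names:
        cleaned = raw.strip()
        company = ALIAS_TO_COMPANY.get(cleaned.casefold(), cleaned)
        if company not in result:
            result.append(company)
    return result


raw_mentions = [
    "ZoomInfo", "Zoom Info", "DiscoverOrg", "HubSpot", "Hubspot",
    "HubSpot CRM", "Apollo", "Apollo.io", "Outreach", "Outreach.io",
]
print(normalize_competitors(raw_mentions))  # ['ZoomInfo', 'HubSpot', 'Apollo', 'Outreach']
```

Letting unknown names pass through, rather than dropping them, keeps new competitors visible until someone decides where they belong.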
How storage and the dashboard fit together
Storage and visualization are the boring layers, and boring is correct here. Supabase holds five tables, among them one with a row per unique prompt, one with a row per prompt-platform-day, and flattened tables for citation domains and for competitors so the leaderboard queries stay simple.
The one design decision that matters early is a unique constraint on prompt, platform, and run day, so that a same-day re-run overwrites instead of duplicating. Add it before you have real data, not after. Clay also hit a camelCase versus snake_case mismatch (the analyzer emits clayMentioned, the schema expected clay_mentioned) and solved it with a COALESCE in the RPC so whichever key is non-null wins. Small thing, saves a rewrite. The dashboard itself is a Next.js app where every query lives in a pure function in one place and components never touch SQL. Pages map to the questions a marketer asks: a home view with the headline metrics, a citations view, a competitive intelligence view with topic-by-topic gaps, and a sentiment view with the actual competitive-framing quotes AI used.
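The unique constraint and the key coalescing are both small enough to sketch. This Python version uses an in-memory SQLite database (3.24 or later) as a stand-in for Supabase's Postgres, which supports the same UNIQUE plus `ON CONFLICT ... DO UPDATE` upsert. The table and column names are hypothetical, and `first_non_null` mirrors in application code what Clay did with a COALESCE in the RPC.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical table and column names. The UNIQUE constraint is what matters:
    # at most one row per prompt, platform, and run day.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS prompt_runs (
            prompt_id      TEXT NOT NULL,
            platform       TEXT NOT NULL,
            run_day        TEXT NOT NULL,
            clay_mentioned TEXT,
            UNIQUE (prompt_id, platform, run_day)
        )
        """
    )


def first_non_null(row, *keys):
    """Return the first non-null value among the given keys, like SQL COALESCE."""
    for key in keys:
        if row.get(key) is not None:
            return row[key]
    return None


def upsert_run(conn, row):
    """Write one prompt-platform-day row; a same-day re-run overwrites instead of duplicating."""
    conn.execute(
        """
        INSERT INTO prompt_runs (prompt_id, platform, run_day, clay_mentioned)
        VALUES (?, ?, ?, ?)
        ON CONFLICT (prompt_id, platform, run_day)
        DO UPDATE SET clay_mentioned = excluded.clay_mentioned
        """,
        (
            row["prompt_id"],
            row["platform"],
            row["run_day"],
            # Accept either spelling of the key: snake_case first, then camelCase.
            first_non_null(row, "clay_mentioned", "clayMentioned"),
        ),
    )


conn = sqlite3.connect(":memory:")
create_schema(conn)
# Two runs on the same day, emitting different key spellings.
upsert_run(conn, {"prompt_id": "p1", "platform": "chatgpt", "run_day": "2025-01-15", "clayMentioned": "No"})
upsert_run(conn, {"prompt_id": "p1", "platform": "chatgpt", "run_day": "2025-01-15", "clay_mentioned": "Yes"})
print(conn.execute("SELECT * FROM prompt_runs").fetchall())
```

The second run replaces the first rather than adding a row, which is why the constraint is worth adding before real data exists: retrofitting it means deduplicating history first.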
One operational lesson is worth more than the schema. In late April, OpenAI shifted GPT-4o to GPT-5.5, the response format changed, the Clay columns failed silently (200 OK, empty parsed response, nulls into storage), and weeks of data came in incomplete. Treat the pipeline like an engineering system, not a spreadsheet: add a row-count monitor and a null-rate check on the key fields. A silent ingestion failure is worse than a loud one.
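A row-count monitor and a null-rate check can be a few lines. This Python sketch assumes analyzed rows arrive as dictionaries keyed by the analyzer's field names; the 5 percent threshold and the choice of key fields are hypothetical defaults to tune.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical choice of key fields, using the analyzer's field names.
KEY_FIELDS = ("clayMentioned", "brandSentiment", "citationType")


@dataclass
class IngestionReport:
    row_count: int
    null_rates: Dict[str, float] = field(default_factory=dict)
    alerts: List[str] = field(default_factory=list)


def check_ingestion(rows, expected_rows, max_null_rate=0.05, key_fields=KEY_FIELDS):
    """Flag a run that stored too few rows, or too many nulls in key fields."""
    report = IngestionReport(row_count=len(rows))
    if len(rows) < expected_rows:
        report.alerts.append(f"row count {len(rows)} is below the expected {expected_rows}")
    for key in key_fields:
        nulls = sum(1 for r in rows if r.get(key) is None)
        rate = nulls / len(rows) if rows else 1.0
        report.null_rates[key] = rate
        if rate > max_null_rate:
            report.alerts.append(f"{key} is null in {rate:.0%} of rows")
    return report


# Expected rows per run: prompts x platforms (3 prompts on 1 platform here, for brevity).
healthy = [{"clayMentioned": "Yes", "brandSentiment": "positive", "citationType": "Direct"} for _ in range(3)]
silent_failure = [{"clayMentioned": None, "brandSentiment": None, "citationType": None} for _ in range(3)]

print(check_ingestion(healthy, expected_rows=3).alerts)
print(check_ingestion(silent_failure, expected_rows=3).alerts)
```

The silent-failure case passes the row-count check, since every row still lands, which is why the null-rate check has to run alongside it.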
What a PMM can actually act on
The dashboard is only worth building if it produces decisions, not charts. The right questions turn each metric into a specific next move.
Each question maps to one move
| Question the dashboard answers | What you do about it |
|---|---|
| Which AI platforms never mention us? | A 0% platform is a content or citation gap; go earn citations the model reads. |
| Which topics are we invisible on? | These are the topics to create content for next. |
| What narratives does AI repeat about us? | A recurring negative theme is a reputational risk to address directly. |
| Which domains does AI cite for our competitors? | Get cited by those same sources. |
| What does AI say about us versus a rival in one answer? | The competitive-framing quote is your sharpest positioning input. |
The last one is the most useful and the hardest to get any other way. When a single answer compares you to a competitor in the same breath, that sentence is exactly how the market's most-used research tool frames the choice. Read those quotes weekly.
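The first two questions in the table, and the weekly read of framing quotes, reduce to simple group-bys over analyzed rows. A Python sketch, assuming rows are dictionaries already filtered to non-branded prompts; the field names other than `positioningVsCompetitors` (which mirrors the analyzer field) are hypothetical.

```python
from collections import defaultdict


def mention_rate_by(rows, key):
    """Share of answers naming the brand, grouped by a field such as 'platform' or 'topic'."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for r in rows:
        totals[r[key]] += 1
        if r["mentioned"]:
            hits[r[key]] += 1
    return {group: hits[group] / totals[group] for group in totals}


def never_mentioned(rows, key):
    """Groups with a 0% mention rate: platforms or topics where the brand is invisible."""
    return sorted(group for group, rate in mention_rate_by(rows, key).items() if rate == 0)


def framing_quotes(rows):
    """Competitive-framing sentences from answers where the analyzer extracted one."""
    return [r["positioningVsCompetitors"] for r in rows if r.get("positioningVsCompetitors")]


# Illustrative rows, already filtered to non-branded prompts (made-up values).
rows = [
    {"platform": "chatgpt", "topic": "prospecting", "mentioned": True,
     "positioningVsCompetitors": "Placeholder sentence comparing the brand with a rival."},
    {"platform": "perplexity", "topic": "prospecting", "mentioned": False},
    {"platform": "claude", "topic": "enrichment", "mentioned": False},
    {"platform": "chatgpt", "topic": "enrichment", "mentioned": False},
]
print(never_mentioned(rows, "platform"))  # ['claude', 'perplexity']
print(never_mentioned(rows, "topic"))     # ['enrichment']
print(framing_quotes(rows))
```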
How to start
You do not need the full five-table schema to learn something this week. Start with the prompt library, because deciding what to track is the real work, and the rest is plumbing. Write 20 to 30 non-branded prompts that match how buyers actually ask: best tools for X, how do I do Y, which platform for Z. Tag each by topic and intent. Put them in a Clay table, add one Use AI or Claygent column per platform to collect answers, and add the analyzer column to extract structured signal.
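The prompt library itself is just tagged records. A Python sketch with hypothetical field names and intent labels, showing how tagging each prompt by topic and intent makes coverage gaps countable before anything runs.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class Prompt:
    text: str
    topic: str
    intent: str                      # hypothetical labels: "best-tools", "how-to", "which-platform"
    prompt_type: str = "non-branded"
    tags: List[str] = field(default_factory=list)


library = [
    Prompt("best tools for B2B prospecting", "prospecting", "best-tools"),
    Prompt("how do I automate outbound", "outbound", "how-to"),
    Prompt("which CRM enrichment tool should I use", "enrichment", "which-platform"),
    Prompt("top sales intelligence platforms", "sales intelligence", "best-tools"),
]


def coverage(library, attribute):
    """Count non-branded prompts per value of 'topic' or 'intent' to spot thin coverage."""
    return Counter(getattr(p, attribute) for p in library if p.prompt_type == "non-branded")


def starter_size_ok(library, low=20, high=30):
    """True once the non-branded library sits in the 20-30 prompt starting range."""
    n = sum(1 for p in library if p.prompt_type == "non-branded")
    return low <= n <= high


print(coverage(library, "intent"))
print(starter_size_ok(library))  # False: four prompts is below the starting range
```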
Schedule the table to re-run on a recurring basis so you get a trend, not a snapshot. Watch the two credit meters as you scale: every column run costs Actions, and the AI model calls draw Data Credits. That alone gives you a visibility score and a competitor leaderboard before you write a line of dashboard code. The hands-on, persistent build with citation analysis and sentiment trends over time is the natural next read; for the broader context on why this work sits with a GTM engineer rather than an analyst, start with the GTM engineering guide, and for the signal side of the same coin, the intent data guide.