Step 1: Build the prompt set from buyer questions

List the questions someone would ask an AI assistant at each stage of choosing a product like yours. Not your keywords — questions. "Best [category] for [use case]", "[competitor] vs [you]", "how do I solve [problem you solve]", "is [your brand] good for [use case]". Pull from sales calls, support tickets, and community threads. Aim for 20-50 prompts: enough to cover the journey, few enough to maintain. Freeze the wording — rephrasing a prompt changes results and breaks your trend line.

Step 2: Pick engines and query them the same way each time

Minimum viable coverage in 2026: ChatGPT, Perplexity, and Google AI Overviews — they have different citation behavior (research shows only ~11% citation overlap across platforms). Add Claude, Gemini, and Copilot as capacity allows. Query in a fresh session each time (no conversation history), don't log into personalized accounts, and record the date. Engines are non-deterministic, so expect some run-to-run noise; that's why trends matter more than single runs.

Step 3: Score each answer on three axes

  • Mention: does the answer name your brand at all? Record position too — named first reads differently than named seventh.
  • Citation: is your site linked as a source? A mention without citation means the engine knows you from training data or third-party sources; a citation means your pages are being retrieved.
  • Sentiment and accuracy: how are you characterized, and is it correct? Wrong pricing, outdated features, and misattributed capabilities are common and worth cataloguing — they're fixable with content updates.

Track the same for your top 3-5 competitors on every prompt. Share of voice against competitors is more actionable than your raw mention rate.

Step 4: Trend it and act on the gaps

One measurement is a baseline; the value shows up at measurement three or four when you can see movement. Falling visibility on a prompt cluster tells you where to look: check whether the cited sources changed, whether a competitor published something that now dominates retrieval, or whether your pages stopped being fetched (crawl logs show this). Rising visibility after a content change is the closed loop that justifies the work.

When to automate

Manual tracking costs roughly a day per full run at 30 prompts × 4 engines. That's fine monthly; it collapses at weekly cadence or 100+ prompts. Automation platforms run the prompt set on schedule and score responses consistently — Lighthouse does this across 10+ engines with sentiment and citation scoring built in, and the free tier will give you a first baseline in about three minutes. Whatever tool you use, apply the same test: its numbers should match what you see when you spot-check the engines by hand.