Gate 1: Can the engine fetch your page?

AI crawlers are less capable than Googlebot and easier to block by accident. Work through this checklist:

  • robots.txt allows AI crawlers: GPTBot and OAI-SearchBot (OpenAI), ClaudeBot (Anthropic), PerplexityBot (Perplexity), Google-Extended (a robots.txt token Google honors, not a separate crawler), and CCBot (Common Crawl). Check for CDN-level bot blocking too. Cloudflare and similar services ship AI-bot blocking that rejects these crawlers at the edge no matter what your robots.txt says, and it's often on by default. (We found exactly this on our own site: our robots.txt now explicitly welcomes every major AI crawler.)
  • Content is in the initial HTML. AI crawlers largely don't execute JavaScript. If your content renders client-side, the crawler sees an empty shell. Server-render or pre-render anything you want cited.
  • No auth walls or paywalls on pages you want cited, and no aggressive rate limiting that times crawlers out.
  • Help discovery: a sitemap.xml listed in robots.txt, and an llms.txt file that gives LLM agents a map of your site.
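
The robots.txt, sitemap, and initial-HTML checks can be sketched with Python's standard library. This is a minimal sketch under our own assumptions: you have already fetched robots.txt and the raw HTML yourself (for example with curl, which never runs JavaScript), and the helper names are hypothetical. Two caveats apply. Python's urllib.robotparser applies the first matching rule rather than RFC 9309's longest-match rule, so treat its verdict as approximate. And nothing in robots.txt reveals CDN-level blocking, which you still have to check in your CDN's settings.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# User-agent tokens from the checklist above.
AI_AGENTS = ["GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot", "Google-Extended", "CCBot"]


def _parse_robots(robots_txt: str) -> RobotFileParser:
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the file as read
    return parser


def blocked_agents(robots_txt: str, url: str) -> list[str]:
    """AI user agents that this robots.txt stops from fetching `url`."""
    parser = _parse_robots(robots_txt)
    return [agent for agent in AI_AGENTS if not parser.can_fetch(agent, url)]


def listed_sitemaps(robots_txt: str) -> list[str]:
    """Sitemap URLs declared in robots.txt (site_maps() needs Python 3.8+)."""
    return _parse_robots(robots_txt).site_maps() or []


class _RawText(HTMLParser):
    """Collects the text a non-rendering crawler sees: everything outside <script>/<style>."""

    def __init__(self) -> None:
        super().__init__()
        self._skip = 0
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def answer_in_initial_html(raw_html: str, phrase: str) -> bool:
    """True if `phrase` appears in the served HTML text without executing any JavaScript."""
    extractor = _RawText()
    extractor.feed(raw_html)
    extractor.close()
    # Collapse whitespace so inline tags and line breaks don't defeat the match.
    text = " ".join(" ".join(extractor.parts).split()).lower()
    return " ".join(phrase.split()).lower() in text
```

An empty list from blocked_agents, at least one sitemap URL, and True from answer_in_initial_html for your key answer sentence mean the page clears these checks as far as the files themselves can tell.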

Symptom of a Gate 1 failure: you're rarely or never mentioned even for prompts where you should be an obvious answer, and your server logs show no AI-crawler visits.
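
To confirm the log side of that symptom, here is a rough sketch that counts AI-crawler requests in an access log. It assumes the common "combined" log format, where the User-Agent is the last double-quoted field; other formats need a different pattern. Google-Extended is left out on purpose: Google fetches with its regular Googlebot user agents, so that token never appears in logs. User-agent strings are also easy to spoof, so this counts claimed visits, not verified ones.

```python
import re
from collections import Counter
from typing import Iterable

# Google-Extended is omitted: it is a robots.txt token, not a user agent seen in logs.
AI_AGENTS = ("GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot", "CCBot")

# In the combined log format the User-Agent is the last double-quoted field.
UA_FIELD = re.compile(r'"([^"]*)"\s*$')


def ai_crawler_hits(log_lines: Iterable[str]) -> Counter:
    """Count requests per AI crawler by case-insensitive token match on the User-Agent."""
    hits: Counter = Counter()
    for line in log_lines:
        match = UA_FIELD.search(line)
        if match is None:
            continue  # not a combined-format line
        user_agent = match.group(1).lower()
        for agent in AI_AGENTS:
            if agent.lower() in user_agent:
                hits[agent] += 1
    return hits
```

Passing an open file works directly, e.g. ai_crawler_hits(open("access.log")) with your own (hypothetical) log path. An empty Counter over a few weeks of traffic is the Gate 1 signal.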

Gate 2: Does the engine choose your page?

Before reading a page, retrieval systems judge its "cover": URL slug, title, description, and date. The cover must signal that this page answers this question.

  • Title precision. "Best AEO Tools in 2026: 8 Platforms Compared" beats "Our Thoughts on AI Search Tools." Include the category, the use case, and a freshness signal.
  • Slug matches topic. /learn/best-aeo-tools-2026, not /blog/post-847.
  • Format matches question shape. Comparative questions get answered from comparison-shaped sources. If the prompt is "X vs Y" or "best X", a single-product page loses to a structured comparison regardless of quality.
  • Visible dates. Publish and update dates on the page and in schema. Engines prefer sources that look maintained.
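
For the schema half of the dates bullet, a minimal sketch that emits schema.org Article JSON-LD carrying datePublished and dateModified. The helper name and example URL are hypothetical, and the dates you pass in should match the ones printed on the page.

```python
import json
from datetime import date


def article_jsonld(headline: str, url: str, published: date, modified: date) -> str:
    """Return a <script> tag with schema.org Article markup carrying both dates."""
    if modified < published:
        raise ValueError("dateModified cannot be earlier than datePublished")
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "url": url,
        "datePublished": published.isoformat(),  # ISO 8601, e.g. 2026-01-05
        "dateModified": modified.isoformat(),
    }
    # Escape "</" so no field value can close the <script> element early.
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


print(article_jsonld(
    "Best AEO Tools in 2026: 8 Platforms Compared",
    "https://example.com/learn/best-aeo-tools-2026",
    date(2026, 1, 5),
    date(2026, 2, 10),
))
```

Bump the modified date (and the visible "Updated" line) only when the content actually changes, so the freshness signal stays honest.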

Symptom of a Gate 2 failure: crawlers visit your pages (logs show it) but citations go to competitors whose titles and formats match the question more precisely.
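
One rough way to compare your cover with a competitor's for a given prompt is to list which of the prompt's key terms each cover field lacks. This is a crude lexical heuristic under our own assumptions (the tiny stopword list is hypothetical), not a model of how any engine actually ranks sources, and it says nothing about whether the page's format matches the question.

```python
import re

# Hypothetical minimal stopword list; extend it for your own prompts.
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "or", "our", "the", "to", "vs", "what", "which"}


def key_terms(text: str) -> set[str]:
    """Lowercased alphanumeric tokens minus stopwords; slugs split on '-' and '/' automatically."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def missing_cover_terms(prompt: str, cover: dict[str, str]) -> dict[str, set[str]]:
    """For each cover field (title, slug, description, ...), the prompt terms it lacks."""
    wanted = key_terms(prompt)
    return {field: wanted - key_terms(value) for field, value in cover.items()}


print(missing_cover_terms(
    "best AEO tools 2026",
    {"title": "Our Thoughts on AI Search Tools", "slug": "/blog/post-847"},
))
```

Run it on your page and on the competitor that won the citation: if their fields come back empty while yours are missing the category or year, the cover is the likely gap.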

Gate 3: Can the engine extract an answer?

Engines lift chunks of roughly 200-400 words, not whole pages. Your job is to make the chunk that carries your answer self-sufficient:

  • Answer first. State the complete answer in the opening block, then elaborate. A 500-word scene-setting intro pushes your answer below the extraction window.
  • Question-shaped headings. H2s that match how users phrase questions give retrieval a direct hook.
  • Text, not artifacts. Answers that exist only in images, charts without captions, videos without transcripts, or JavaScript-driven accordions don't get extracted.
  • Specifics. Numbers, names, dates, prices. Quotable facts survive synthesis; adjectives don't.
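
The answer-first and question-heading checks can be approximated on a Markdown draft. This sketch assumes your source marks H2s with "## "; the question-word list is a hypothetical heuristic, and the 200-word threshold mentioned below is our assumption, taken from the low end of the chunk range.

```python
import re
from typing import Optional

QUESTION_WORDS = {"how", "what", "why", "which", "when", "where", "who", "can", "does", "is", "should"}


def sections(markdown: str) -> list[tuple[str, str]]:
    """Split on '## ' headings into (heading, body); text before the first H2 gets heading ''."""
    found, heading, body = [], "", []
    for line in markdown.splitlines():
        if line.startswith("## "):
            found.append((heading, "\n".join(body)))
            heading, body = line[3:].strip(), []
        else:
            body.append(line)
    found.append((heading, "\n".join(body)))
    return [(h, b) for h, b in found if h or b.strip()]


def is_question_heading(heading: str) -> bool:
    """Heuristic: ends with '?' or opens with a common question word."""
    words = heading.lower().split()
    return heading.rstrip().endswith("?") or (bool(words) and words[0] in QUESTION_WORDS)


def answer_word_offset(body: str, phrase: str) -> Optional[int]:
    """Word index where `phrase` first starts in `body`, or None if it never appears."""
    words = re.findall(r"\w+", body.lower())
    target = re.findall(r"\w+", phrase.lower())
    for i in range(len(words) - len(target) + 1):
        if words[i:i + len(target)] == target:
            return i
    return None
```

A section whose heading fails is_question_heading, or whose answer_word_offset is None or well past about 200 words, is a candidate for rewriting answer-first.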

Symptom of a Gate 3 failure: your page appears in the citation list occasionally but your actual points never make it into the answer text, or engines cite you for trivia while your main argument goes unused.
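
To check that symptom for a tracked prompt, one rough sketch compares the specifics in your answer chunk with the answer text you collected from the engine. It only looks at numbers, prices, and percentages, an assumption that follows the "Specifics" bullet; names, dates written in prose, or reworded figures ("49 dollars") would need a richer matcher.

```python
import re

# Numbers with an optional $ prefix, thousands/decimal separators, and % suffix.
NUMBER = re.compile(r"\$?\d+(?:[.,]\d+)*%?")


def specifics(text: str) -> set[str]:
    return set(NUMBER.findall(text))


def reused_specifics(page_chunk: str, engine_answer: str) -> tuple[set[str], set[str]]:
    """Split your chunk's specifics into (appear in the engine's answer, missing from it)."""
    mine = specifics(page_chunk)
    used = mine & specifics(engine_answer)
    return used, mine - used
```

If the first set stays empty across prompts where your page is cited, the engine is citing you without using your answer, which points back to Gate 3.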

Diagnosing in order

Run the gates as a decision tree. Never mentioned + no crawler visits → Gate 1: fix access. Crawled but not cited → Gate 2: fix the cover and format. Cited but your answer isn't used → Gate 3: restructure the page answer-first. Re-measure after each fix — the tracking method gives you the before/after, or a Lighthouse analysis scores all three layers per page automatically.
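
The tree is small enough to write down directly. A sketch with one assumption of ours: it treats missing crawler visits alone as the Gate 1 signal, and the class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PromptCheck:
    crawler_visits: bool  # server logs show AI crawlers fetching the page
    cited: bool           # the page appears in the engine's citation list
    answer_used: bool     # your actual points appear in the answer text


def failing_gate(check: PromptCheck) -> Optional[str]:
    """Return the first gate that fails, checked in diagnosis order."""
    if not check.crawler_visits:
        return "Gate 1: fix access"
    if not check.cited:
        return "Gate 2: fix the cover and format"
    if not check.answer_used:
        return "Gate 3: restructure the page answer-first"
    return None  # all three gates pass for this prompt
```

Checking in this order matters: a page that is never fetched can't be judged on its cover, and a page that is never chosen can't be judged on its extractability.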