AEO · 1 min read

How to Check If AI Crawlers Can Read Your Site

By Shlok

The short answer

To check whether AI crawlers can read your site, inspect your live robots.txt for Disallow rules on AI user-agents, then test an actual fetch with each crawler's user-agent (GPTBot, ClaudeBot, PerplexityBot). Google-Extended is a robots.txt control token rather than a separate request user-agent, so it can only be checked in robots.txt. A page can load in a browser but return 403 to a bot if a CDN or WAF blocks it, so verify both layers.

Why this is the first thing to check

If AI crawlers can't fetch your pages, no amount of content or schema will get you cited. Crawler blocks are common and silent: your site looks fine to humans while being invisible to answer engines.

Step 1: read your robots.txt

  • Open yoursite.com/robots.txt.
  • Search for GPTBot, OAI-SearchBot, ClaudeBot, Claude-SearchBot, PerplexityBot, Google-Extended.
  • Any Disallow: / under one of those agents blocks that engine. So does a Disallow: / under User-agent: * when the agent has no group of its own.
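
The robots.txt check above can be sketched in a few lines of Python using the standard library's robots.txt parser. This is a minimal sketch, not a full audit: the function name blocked_agents, the SAMPLE rules, and the example.com URL are placeholders for illustration, and in practice you would pass in the text of your own live robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Robots.txt tokens named in this post.
AI_AGENTS = [
    "GPTBot", "OAI-SearchBot", "ClaudeBot",
    "Claude-SearchBot", "PerplexityBot", "Google-Extended",
]


def blocked_agents(robots_txt: str, url: str = "https://example.com/") -> list[str]:
    """Return the AI agents that robots_txt disallows from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in AI_AGENTS if not parser.can_fetch(agent, url)]


# Hypothetical robots.txt content, for illustration only.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

print(blocked_agents(SAMPLE))  # prints ['GPTBot']
```

Checking a specific page URL instead of the site root also catches narrower rules such as a Disallow on /blog/.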

Step 2: test a real fetch

robots.txt is advisory; your CDN or WAF can hard-block bots regardless. Send a request with each AI user-agent and confirm a 200, not a 403 or challenge. Cloudflare, in particular, can inject AI-bot blocks that override your own robots file.
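
A live probe along these lines can be sketched with Python's standard library. Treat it as a rough sketch under a few assumptions: the User-Agent strings below are simplified stand-ins, so swap in the exact strings each vendor publishes, and the names probe and probe_all are made up for this example. Google-Extended is left out because it is a robots.txt token and does not send its own request user-agent. Some WAFs also verify crawler IP ranges, so a request from your own machine may not be treated exactly like the real bot.

```python
import urllib.error
import urllib.request

# Simplified stand-in User-Agent strings (assumption): replace each one with
# the exact string from the vendor's crawler documentation.
USER_AGENTS = {
    "GPTBot": "Mozilla/5.0 (compatible; GPTBot/1.0)",
    "OAI-SearchBot": "Mozilla/5.0 (compatible; OAI-SearchBot/1.0)",
    "ClaudeBot": "Mozilla/5.0 (compatible; ClaudeBot/1.0)",
    "Claude-SearchBot": "Mozilla/5.0 (compatible; Claude-SearchBot/1.0)",
    "PerplexityBot": "Mozilla/5.0 (compatible; PerplexityBot/1.0)",
}


def probe(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Fetch url with the given User-Agent header and return the HTTP status."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx and 5xx responses raise HTTPError; the status code is the result.
        return error.code
    # Connection failures (DNS, timeouts) raise URLError and go to the caller.


def probe_all(url: str) -> dict[str, int]:
    """Map each crawler name to the status code its User-Agent receives."""
    return {name: probe(url, agent) for name, agent in USER_AGENTS.items()}
```

Calling probe_all("https://yoursite.com/") returns one status code per crawler. Anything other than 200 for a given name is worth investigating at the CDN or WAF layer.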

Step 3: fix and verify

Remove the Disallow rules that target AI agents and whitelist those agents in your CDN/WAF, then re-test. cited? automates both checks (robots.txt parsing plus a live user-agent probe) for every major AI crawler and flags exactly which engine is blocked.
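
For the robots.txt half of the fix, one option is to give the AI agents their own explicit Allow group and confirm the result before deploying it. The sketch below assumes a hypothetical policy that blocks all other bots by default. The helper names allow_group and verify are illustrative, and the CDN/WAF side still has to be changed in your provider's dashboard.

```python
from urllib.robotparser import RobotFileParser

AI_AGENTS = [
    "GPTBot", "OAI-SearchBot", "ClaudeBot",
    "Claude-SearchBot", "PerplexityBot", "Google-Extended",
]


def allow_group(agents: list[str]) -> str:
    """Build one robots.txt group that explicitly allows every listed agent."""
    lines = [f"User-agent: {agent}" for agent in agents]
    lines.append("Allow: /")
    return "\n".join(lines) + "\n"


def verify(robots_txt: str, agents: list[str],
           url: str = "https://example.com/") -> bool:
    """Return True only if every agent may fetch url under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return all(parser.can_fetch(agent, url) for agent in agents)


# Hypothetical policy: AI agents allowed, every other bot blocked.
FIXED = allow_group(AI_AGENTS) + "\nUser-agent: *\nDisallow: /\n"

print(verify(FIXED, AI_AGENTS))  # prints True
```

A named group takes precedence over User-agent: * for the agents it lists, which is why the AI crawlers stay allowed even though the default group disallows everything.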

Frequently asked questions

Can a site be blocked even if robots.txt allows AI bots?

Yes. A CDN or WAF (like Cloudflare's AI-bot controls) can block crawlers at the edge regardless of robots.txt. Always test an actual fetch with each user-agent, not just read the robots file.

Which AI crawlers should I allow?

At minimum GPTBot and OAI-SearchBot (OpenAI), ClaudeBot and Claude-SearchBot (Anthropic), PerplexityBot (Perplexity), and Google-Extended (Google AI). Allowing them lets the major answer engines fetch and potentially cite your content.

How do I test a specific crawler quickly?

Send an HTTP request with that crawler's user-agent string and check the status code, or use an AEO scanner that runs live user-agent probes for all major bots automatically.
