Ahrefs published a study of 1.4 million ChatGPT prompts. The goal was to reverse-engineer how ChatGPT decides which pages get cited and which ones get used quietly in the background, without credit.
If you’ve got clients asking how to show up in AI-generated answers (and at this point, most of you do), this is the study worth understanding.
Here’s what actually matters.
ChatGPT Is Reading Dozens of Pages and Citing About Half of Them
For any given query, ChatGPT is pulling dozens of URLs. It cites roughly 50% of them. The other half gets read, absorbed, used for context, and never acknowledged.
It’s the research equivalent of being used as a source but left off the bibliography. Your content contributed to the answer. Your domain got nothing.
This distinction matters because a lot of the current conversation around “AI visibility” conflates being used with being cited. They’re not the same thing, and the difference has real implications for how you build a content strategy.
The Retrieval System Has a Clear Hierarchy
ChatGPT tags every source internally with something called a ref_type, a label for where the content came from. The citation rates across those source types are not even close:
General search index: 88.46% citation rate. If your content is ranking in search, it’s getting cited. There’s no ambiguity here.
News: 12%. ChatGPT reads news coverage extensively. It almost never cites it.
Reddit: 1.93%. And this is where the study gets genuinely interesting.
ChatGPT has over 16 million data points from Reddit. It uses Reddit constantly to gauge consensus, understand how real people frame topics, and build contextual understanding of a subject. But it almost never surfaces Reddit as an actual citation.
Ahrefs described it as ChatGPT treating Reddit like “a textbook it’s embarrassed to admit it read.” Hilarious, but also accurate and useful.
YouTube and academic sources come in under 1% for citation rates, despite being pulled at scale.
What this means practically: Reddit is a context layer, not a citation play. If you’re trying to shape how ChatGPT understands your topic area, Reddit presence matters. If you’re trying to get cited directly, it’s not the mechanism. The content that gets cited is the content that ranks in search. That’s the channel.
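If you want to sanity-check what a "citation rate" means here, it’s just cited sources divided by retrieved sources, grouped by ref_type. The sketch below is a minimal illustration, not the study’s pipeline: the rows are invented, and the label strings ("search", "news", "reddit") are placeholders rather than confirmed ref_type values.

```python
from collections import defaultdict

# Hypothetical retrieval log: (ref_type, was_cited) pairs, invented for illustration.
retrieved = [
    ("search", True), ("search", True), ("search", True), ("search", False),
    ("news", False), ("news", True),
    ("reddit", False), ("reddit", False),
]

def citation_rates(rows):
    """Return {ref_type: cited / retrieved} for each source type."""
    totals = defaultdict(int)
    cited = defaultdict(int)
    for ref_type, was_cited in rows:
        totals[ref_type] += 1
        if was_cited:
            cited[ref_type] += 1
    return {ref_type: cited[ref_type] / totals[ref_type] for ref_type in totals}

rates = citation_rates(retrieved)
# Print source types from highest to lowest citation rate.
for ref_type, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
    print(f"{ref_type}: {rate:.0%}")
```

The point of splitting the two counters is the same point the study makes: a source type can be retrieved constantly and still have a citation rate near zero.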
Your Title Is the First Filter, Not the Last
Before ChatGPT reads your content, it generates internal sub-questions, which the study calls “fan-out queries.” These are the questions behind the question. A user asks something broad, and ChatGPT internally spins up more specific queries to answer it well.
Then it looks at your title, snippet, and URL to decide whether your page is worth reading at all.
The semantic similarity between fan-out queries and cited page titles was 0.656. For pages that got skipped, it was 0.484. That’s a meaningful gap, and it tracks how closely your title aligns with the sub-questions the AI is generating, not just the surface-level query.
Your title isn’t just an SEO asset anymore. It’s the first thing an AI evaluates when deciding whether your content is relevant to a question the user hasn’t even explicitly asked.
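To make the title-versus-sub-question comparison concrete, here is a rough sketch of scoring a title against a handful of fan-out queries. Big caveat: the study’s scores presumably come from an embedding model, which this post doesn’t describe, so the code swaps in plain word-overlap cosine similarity as a crude stand-in. It won’t reproduce 0.656 or 0.484, and the sample queries and titles are made up.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine_similarity(a, b):
    """Cosine similarity between two texts using raw word counts (not embeddings)."""
    va, vb = bag_of_words(a), bag_of_words(b)
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def title_alignment(title, fan_out_queries):
    """Average similarity between one title and a list of sub-questions."""
    if not fan_out_queries:
        return 0.0
    scores = [cosine_similarity(title, q) for q in fan_out_queries]
    return sum(scores) / len(scores)

# Hypothetical sub-questions behind a broad prompt about getting cited by ChatGPT.
queries = [
    "how does chatgpt choose which pages to cite",
    "why does chatgpt cite some pages and not others",
]
aligned = title_alignment("Why ChatGPT Cites Some Pages and Not Others", queries)
vague = title_alignment("Our Q3 Marketing Newsletter", queries)
print(aligned, vague)
```

Word overlap misses synonyms and paraphrases, which is exactly what embeddings are for, so treat this as a way to think about the audit rather than a measurement tool.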
Clean URL Structure Is a 9-Point Citation Advantage
This one is almost too straightforward.
Pages with natural language URL slugs (something like /why-chatgpt-cites-pages/) had an 89.78% citation rate. Pages with cryptic, auto-generated URL structures came in at 81.11%.
That’s a gap of nearly 9 percentage points. For something you can fix in under a minute per page.
If you’re still using query-string URLs or random ID-based slugs, this is the clearest low-effort, high-impact fix in the entire study.
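A slug audit like this is easy to rough out in a few lines. The heuristic below is my own assumption about what "natural language" means (at least two alphabetic words in the last path segment, and no query string); the study doesn’t publish its classification rule, and the example.com URLs are placeholders.

```python
import re
from urllib.parse import urlparse

def looks_like_natural_slug(url):
    """Heuristic: True if the last path segment reads like hyphenated words."""
    parsed = urlparse(url)
    if parsed.query:
        # Query-string URLs such as /p?id=123 are flagged as not descriptive.
        return False
    segments = [s for s in parsed.path.split("/") if s]
    if not segments:
        # A bare domain has no slug to evaluate.
        return False
    words = segments[-1].lower().split("-")
    # Count purely alphabetic tokens of two or more letters as "real" words.
    real_words = [w for w in words if re.fullmatch(r"[a-z]{2,}", w)]
    # Assumed thresholds: at least two real words, making up half the slug or more.
    return len(real_words) >= 2 and len(real_words) >= len(words) / 2

# Hypothetical URLs to audit.
urls = [
    "https://example.com/why-chatgpt-cites-pages/",
    "https://example.com/p?id=48213",
    "https://example.com/post/8f3a2c91",
]
flagged = [u for u in urls if not looks_like_natural_slug(u)]
print(flagged)
```

The thresholds are arbitrary, and a legitimate one-word slug like /pricing gets flagged too, so read the output as a review list rather than a verdict.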
The Average Cited Page Is About 500 Days Old
This was the finding that stopped our CCO, Kyle Christensen, mid-conversation when I shared the study with the team. His exact response: “500 days old for the average cited page?! That’s crazy.”
It’s also the most important strategic signal in the study.
ChatGPT isn’t chasing recency. It’s rewarding content that has established itself over time, built authority, and proven its staying power in search. At 500 days, the average cited page is roughly 16 months old. That’s not a coincidence; it reflects how the model is weighing trust.
The exception is news content, where freshness is a tiebreaker. Cited news pages average around 200 days old, versus 300 for non-cited. If you’re publishing time-sensitive coverage, speed still matters.
For everything else, depth and durability win.
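If you want to see where your own pages sit against those age figures, the arithmetic is simple date subtraction. This is a minimal sketch with invented publication dates and a fixed "as of" date so the numbers are repeatable. How the study itself dated pages isn’t covered here.

```python
from datetime import date

def page_age_days(published, today):
    """Age of a page in days as of `today`."""
    return (today - published).days

def average_age_days(publish_dates, today):
    """Mean age in days across a list of publication dates."""
    if not publish_dates:
        raise ValueError("need at least one publication date")
    ages = [page_age_days(d, today) for d in publish_dates]
    return sum(ages) / len(ages)

# Hypothetical publication dates for pages you want to audit.
as_of = date(2025, 6, 1)
pages = [date(2024, 1, 18), date(2023, 11, 5), date(2025, 3, 2)]
for published in pages:
    print(published.isoformat(), page_age_days(published, as_of), "days")
print("average:", round(average_age_days(pages, as_of)), "days")
```

For scale, 500 days works out to a little over 16 months, so a page published in mid-January 2024 would hit that mark around the start of June 2025.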
The Fundamentals Didn’t Change. The Urgency Did.
I’ve been in marketing long enough to remember when the conversation was about keyword density and meta description character limits. What this study tells us about getting cited by the most advanced AI currently in wide deployment is… basically the same thing.
Pages retrieved through the general search index get cited about 88% of the time. Semantically relevant titles influence whether you get selected. Clean URL structure correlates with higher citation rates. Comprehensive, maintained content outlasts the quick-hit stuff.
The tools are different. The principles are the same ones that have always mattered.
This is actually useful information if you’re explaining AI search visibility to clients who think they need an entirely new strategy. They probably don’t. They need to execute the existing strategy well and understand a few additional layers on top of it.
What to Do with This
Ranking is still the primary lever. There’s no shortcut here. If your content isn’t in the search index at a competitive position, ChatGPT is unlikely to cite it. SEO fundamentals remain the foundation.
Audit your titles for semantic depth. Not keyword stuffing. Genuine alignment with the questions behind the questions. Think about what sub-queries an AI would generate to answer a broad prompt, then check whether your titles actually address those.
Fix your URL slugs. Descriptive, natural language slugs consistently outperform auto-generated or ID-based structures. This is one of the easiest audits you can run on existing content.
Stop chasing freshness for its own sake. Publish authoritative, comprehensive content and maintain it. A well-maintained page that’s 18 months old is outperforming new content in citation rates. That’s the data.
Use Reddit intentionally. Reddit shapes the contextual understanding that ChatGPT uses when evaluating your topic area, even when it doesn’t cite Reddit directly. Establishing genuine topical authority in relevant communities is worth doing. Just don’t expect it to show up as a citation.
Audit for citability, not just traffic. Look at your top-ranking pages. Are the titles semantically rich enough to match the sub-questions an AI would generate? Are the URLs clean? Is the content comprehensive enough to actually answer those sub-questions? Small updates to existing pages can have a disproportionate impact here.
The Takeaway
AI citation isn’t a new game with entirely new rules. It’s search, with an additional semantic filtering layer on top of it.
The pages getting cited are the ones whose titles align with the questions ChatGPT is asking behind the scenes, and that surface through the right retrieval channel. That channel is the general search index, where roughly 88% of retrieved pages get cited, and it’s the same index you’ve been optimizing for all along.
If you’ve been doing SEO correctly, you’re most of the way there. If you haven’t, this is one more reason to start, and now you know exactly which parts of the foundation matter most.
