The short version: We found five factors that account for about 80% of whether AI engines cite your brand. We turned them into a free audit tool. Here’s the research, what we learned, and how you can build something like it yourself.
Last week we rolled out a free AI Discovery Audit. It’s part of a growing set of free tools we’re building for customers and growth marketers.
Here’s the thing: AI engines like ChatGPT, Perplexity, and Google AI Overviews are a real distribution channel now. Frase reports AI-referred traffic is growing 527% year over year. RankScience found visitors from AI search convert at 4-5x the rate of regular organic traffic. This isn’t a future trend. It’s happening.
But the goal isn’t just to “show up” in AI results. It’s to change how you think about content so AI visibility becomes part of your regular workflow.
I'll be honest: building the tool was hard. But there's nothing stopping any of our readers from building something similar. Here's how we figured out what's working.
Where Did the Research Come From?
We learned from two places: our own client work and academic research.
What we saw firsthand
During a consulting engagement with Hedge Fund Alpha, we noticed a curious thing. The more we posted on Reddit and YouTube, the more likely we were to show up in ChatGPT answers. This wasn’t a small improvement. Our Reddit posts and comments led to a 300% increase in ChatGPT mentions. (ChatGPT still drives more referral traffic than Claude and other AI engines, but I expect that to change soon.)
At People Inc, while leading growth for MyRecipes, I saw firsthand how internal experiments with YouTube and content summaries led to higher mentions across AI engines. With over 200 million visitors, People Inc’s main revenue source is search traffic. Figuring out AI was not optional for them.
What the research says
We also drew from outside sources:
- The Princeton/Georgia Tech GEO study, which analyzed 10,000 AI queries and found that citing credible sources boosts your chances of showing up in AI answers by 30-40%
- Research from Ahrefs, Semrush, Amsive, and HubSpot on what actually gets brands mentioned by AI
- Conversations and events with people like Jonathan Martinez (GrowthPair) and Sean Ellis on growth-led discovery strategies
What Are the Five Factors That Drive 80% of AI Citations?
We found that these five factors account for roughly 80% of whether AI engines will cite you or skip you.
| Factor | What It Means | Key Stat |
|---|---|---|
| 1. Page structure | Format content so AI can easily pull answers: tables, question-and-answer sections, short direct answers | Tables get cited 34% of the time vs 3% for plain paragraphs |
| 2. Third-party authority | Get mentioned on Reddit, G2, Wikipedia, YouTube, and industry publications | 85% of AI citations come from third-party sources, not your own site |
| 3. Citations and credentials | Use real statistics, cite where you got them, and put a named author on every piece | Adding citations boosts visibility by 30-40% (Princeton/GEO study) |
| 4. Technical setup | Behind-the-scenes site code, FAQ pages, an llms.txt file, clean URLs | Sites with the right technical setup get cited 2.8x more often |
| 5. Freshness | Keep content updated and publish regularly | 76.4% of top-cited pages were updated within 30 days |
Here’s what each one actually looks like in practice.
1. Structure your pages so AI can pull answers from them
AI engines don’t read your page top to bottom. They scan for passages they can grab and drop into an answer. The format of your content decides whether you get picked or skipped.
What works:
- Question-style headings followed by a direct answer (40-60 words), then a longer explanation
- Comparison tables whenever you’re comparing features, pricing, or options. AI engines love tables
- Bullet points and numbered lists for supporting detail
- Short paragraphs. Two to three sentences. Keep sentences under 20 words when you can
Opollo tested this across 120 AI queries. Tables got picked up 34% of the time. Q&A content: 29%. Structured lists: 21%. Plain paragraphs? Just 3%.
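Here's a minimal sketch of what that structure looks like on a page. The heading, copy, and table are illustrative, not a template you have to copy; the pattern is question heading, short direct answer, then a table AI can lift whole:

```markdown
## How often do AI engines cite comparison tables?

In our testing, tables were picked up far more often than plain
paragraphs. Lead with a direct 40-60 word answer like this one,
then expand below it.

| Format           | Citation rate |
|------------------|---------------|
| Tables           | 34%           |
| Q&A content      | 29%           |
| Structured lists | 21%           |
| Plain paragraphs | 3%            |
```

The same shape works in HTML with `<h2>`, `<p>`, and `<table>` tags.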
2. Build authority on third-party platforms
This was the most surprising finding. 85% of the time AI engines cite a brand, the source is a third-party website, not the brand’s own site. Ahrefs found in December 2025 that how often your brand gets mentioned across the web is the single strongest predictor of whether AI will cite you. Stronger than how many sites link to you. Stronger than how well-known your domain is. Mentions matter way more than links.
According to research from Amsive, the platforms that matter most are Reddit, Wikipedia, G2, YouTube, and authoritative industry publications. And it’s better when communities mention your brand on their own rather than through traditional PR.
Here’s what creates a real flywheel: publish original research that other people cite. More third-party mentions lead to more AI citations. We saw this play out directly with Hedge Fund Alpha.
3. Add real numbers, cite your sources, and put a name on it
The Princeton/Georgia Tech GEO study gives the clearest evidence here. Across 10,000 queries:
- Adding credible source citations boosted visibility by 30-40%
- Swapping vague claims (“very fast”) for specific statistics (“10,000 requests per second”) boosted visibility by 30-40%
- Content with a named author and visible credentials gets cited 41% more often
The takeaway is simple. Replace marketing fluff with real numbers. Cite your sources. Put a real person’s name on your content with their credentials visible. AI engines reward this heavily.
4. Get the technical setup right
I honestly didn’t know much about this until recently. But it makes a real difference. Here’s what it involves:
- Structured data: This is code your developer adds behind the scenes that tells AI engines what your page is about. Is it a product? An article? An FAQ? Sites with this get cited 2.8x more often, according to AirOps
- FAQ pages with the right code: If you have an FAQ section on your site, there's a way to tag it so AI engines can read the questions and answers directly. Only 10.5% of AI-cited pages do this. Adding it gives you a real edge
- An llms.txt file: A simple text file you add to your website that tells AI engines what your site is about and where your key pages are. Think of it as a welcome mat for AI
- Clean URLs: Descriptive URLs with 5-7 words get 11.4% more AI citations than generic ones
- The content formats that get cited most: “Best of” lists, “vs” comparison pages, how-to guides, product pages with clear factual descriptions, and original data studies
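For the FAQ piece, the "right code" is typically schema.org FAQPage structured data, embedded in your page inside a `<script type="application/ld+json">` tag. Here's a minimal sketch with an illustrative question; your developer would swap in your real Q&A pairs:

```json
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [{
    "@type": "Question",
    "name": "What is an llms.txt file?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "A plain text file served at /llms.txt that tells AI engines what your site is about and where your key pages are."
    }
  }]
}
```

The llms.txt file itself is even simpler: plain text at the root of your site, no special tagging required.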
5. Optimize for each AI platform separately
Each AI engine rewards different things. We learned this the hard way:
- ChatGPT rewards authority and clarity. It cites Wikipedia in nearly half its answers
- Google AI Overviews overlaps heavily with traditional search (76% overlap with the organic top 10)
- Perplexity prefers fresh content with lots of cited sources and averages 6.6 references per answer
- Claude favors structured, long-form content with clear logical flow
Here’s the kicker: there’s less than 1% overlap between what ChatGPT and Perplexity cite for the same query. If you’re treating them the same, you’re leaving visibility on the table.
How Did We Build the Tool?
Here’s what we did next:
- Put all our research into one document. Every stat, every best practice, every scoring idea in one place. (We published this as part of our AEO/GEO service page.)
- Fed it to Claude and planned how the tool would work. We mapped out six scoring categories: can AI crawlers access your site, do you have the right behind-the-scenes code, is your content structured well, does your brand show up on third-party sites, do you cite sources and name authors, and is your content fresh. Each one has specific checks.
- Built the tool with Claude Code. We went from plan to working prototype in about a week.
- Tested it against real sites and iterated. This is where it got interesting.
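The scoring step can be sketched as a weighted rubric over the six categories. To be clear, this is a hypothetical sketch, not our production scoring: the category names follow the plan above, but the weights and the letter-grade cutoffs are illustrative.

```python
# Hypothetical sketch of a weighted AI-readiness score.
# Category names follow the six in our plan; the weights and
# grade cutoffs here are illustrative, not our real ones.

WEIGHTS = {
    "crawler_access": 0.15,
    "structured_data": 0.20,
    "content_structure": 0.20,
    "third_party_presence": 0.25,
    "citations_and_authors": 0.10,
    "freshness": 0.10,
}

def overall_score(category_scores: dict[str, float]) -> float:
    """Combine per-category scores (0-100) into one weighted grade."""
    return round(
        sum(WEIGHTS[c] * category_scores.get(c, 0.0) for c in WEIGHTS), 1
    )

def letter_grade(score: float) -> str:
    """Map a 0-100 score to a letter, the way an audit report might."""
    for cutoff, letter in [(90, "A"), (80, "B"), (70, "C"), (60, "D")]:
        if score >= cutoff:
            return letter
    return "F"
```

The hard part wasn't the math. It was deciding what each category's underlying checks should be and whether the resulting grades felt right against real sites.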
What surprised us during the build
A few things we didn’t expect:
- You need web search APIs. Claude Code can’t browse the web on its own. We added APIs like EXA and Brave Search to check brand presence on Reddit, Wikipedia, G2, and other platforms
- Every scan costs money. Each time someone runs a report, it costs us a little. This is why we gated part of the report behind an email capture. We need to at least start building a relationship with the people using the tool
- Scoring is harder than it sounds. Grading a website’s AI readiness is not straightforward. We spent a lot of time scanning well-known sites like HubSpot and Ahrefs to calibrate. Every time we added a new check, we had to go back and see if the grades still felt right
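The brand-presence check can be sketched like this. Another hypothetical sketch: it assumes a web search API (Exa, Brave Search, or similar) that accepts a query string and returns result URLs; the site-restricted query format and the parsing below are illustrative, not our production code.

```python
from urllib.parse import urlparse

# Platforms we check for third-party brand presence.
PLATFORMS = ["reddit.com", "wikipedia.org", "g2.com", "youtube.com"]

def mention_queries(brand: str) -> list[str]:
    """Build one site-restricted search query per platform."""
    return [f'site:{site} "{brand}"' for site in PLATFORMS]

def tally_mentions(result_urls: list[str]) -> dict[str, int]:
    """Count search results per platform.

    `result_urls` would come from a web search API response;
    here it is just a list of URL strings, so this runs offline.
    """
    counts = {site: 0 for site in PLATFORMS}
    for url in result_urls:
        host = urlparse(url).netloc.lower()
        for site in PLATFORMS:
            if host == site or host.endswith("." + site):
                counts[site] += 1
    return counts
```

You'd run each query through the search API, feed the returned URLs into `tally_mentions`, and score platforms with zero results as gaps.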
How Can You Try This Yourself?
Here’s what I’d recommend:
- Start with our AI Discovery Audit to see where your site stands today
- Read through the five factors above and figure out your biggest gaps
- If you want to build your own tracker, take the research, feed it into your favorite AI coding tool, and start from there
Send me a note at alex@theremarkableagency.com if you have questions about how to do this or want help improving your AEO/GEO visibility. Or book a free strategy call and we’ll walk through it together.
Seeing patterns like this in your own growth data?
We help growth-stage companies diagnose exactly what's working and what's not.
Book a Free Diagnostic