Siddharth Ramakrishnan

Writing

Academic SEO Is Coming

February 19, 2026

AI answer engines are becoming a discovery layer for science. That will change what gets read, cited, and built on.

AI SEO companies like Profound are helping brands optimize how they appear in ChatGPT, Perplexity, and Google AI Overviews. They are betting that if AI generates the answer, you need to be inside that answer. I think academia is heading toward the same dynamic.

Researchers are already using ChatGPT, Claude, and Perplexity for literature search, related work drafting, and early-stage synthesis. A 2025 Wiley survey of 2,400+ researchers reported AI tool usage rising from 57% to 84% in one year, with 62% using AI specifically for research and publication work. At the same time, domain-specific tools like PhilLit emerged because generic models still hallucinate citations.

That raises three practical questions:

  1. How close are LLM-generated related-work suggestions to what humans actually cite?
  2. Is there evidence that AI tools are already reshaping citation patterns?
  3. What paper features predict whether LLMs will surface your work?

Study Design

I built a dataset around 64 Best Paper Award winners from NeurIPS, ICML, ICLR, ACL, and CVPR (2019-2025) as a "famous paper" set. I also sampled 10 applied ML papers with 10-300 citations (for example, medical imaging and food quality detection) as a "niche" set.

For each anchor paper, I pulled the human reference list from OpenAlex as ground truth. Then I prompted GPT-4o and Claude for 25 related-work suggestions per paper, using a standardized prompt asking for foundational work, methodologically similar papers, and concurrent approaches.
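The reference-list pull can be sketched against the public OpenAlex works endpoint. This is a minimal sketch, not the post's actual pipeline; the helper names are mine, and it assumes you already have OpenAlex work IDs:

```python
import json
import urllib.request

OPENALEX_WORKS = "https://api.openalex.org/works/"

def refs_url(work_id: str) -> str:
    # Build the OpenAlex works URL for an ID like "W2741809807".
    return OPENALEX_WORKS + work_id

def fetch_reference_ids(work_id: str) -> list[str]:
    # Network call: the works endpoint returns JSON whose
    # "referenced_works" field lists the paper's references as OpenAlex IDs.
    with urllib.request.urlopen(refs_url(work_id)) as resp:
        work = json.load(resp)
    return work.get("referenced_works", [])
```

The `referenced_works` field is what serves as the human ground-truth bibliography here.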

I tested three prompt conditions: a standard prompt run closed-book, the same prompt with search access, and a methodology-focused variant.

Result 1: LLMs Are Strong for Niche Discovery

On applied niche papers, suggestions were far more diverse and much less anchored to canonical defaults than on the famous-paper set.

| Paper Type | Unique Suggestions | Famous Paper Bias |
| --- | --- | --- |
| Random applied papers (niche) | 92% | 10-20% |
| Best paper award set (famous) | 75% | 30-47% |

For unfamiliar domains, this is useful. If you need to get oriented outside your home subfield, these tools can surface relevant work quickly.

Result 2: The Default Bibliography Problem

On famous ML topics, both models repeatedly over-suggested canonical papers.

| Paper | Suggestion Rate | Human Citation Rate | Over-Suggestion |
| --- | --- | --- | --- |
| Attention Is All You Need | 47% | 15% | 3.1x |
| BERT | 31% | 12% | 2.6x |
| Deep Residual Learning | 28% | 8% | 3.5x |
| ImageNet Classification (AlexNet) | 25% | 6% | 4.2x |

This points to a rich-get-richer dynamic. Models learn strong associations from pretraining frequency, then reuse those defaults even when they are not the most relevant references for the specific paper.
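The over-suggestion column is just the ratio of the two rates: AlexNet's 25% suggestion rate against a 6% human citation rate gives 25/6 ≈ 4.2x. A one-line sketch (function name mine):

```python
def over_suggestion(suggestion_rate: float, human_citation_rate: float) -> float:
    # How many times more often a model suggests a paper than humans cite it.
    return round(suggestion_rate / human_citation_rate, 1)
```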

Result 3: LLM-Human Overlap Is Low

Across conditions, overlap with human bibliographies stayed modest.

| Model | Condition | Jaccard | Precision | Recall |
| --- | --- | --- | --- | --- |
| Claude Opus 4.5 | Closed-book | 5.9% | 14.2% | 8.1% |
| Claude Opus 4.5 | With search | 4.8% | 12.1% | 6.9% |
| GPT-4o | Closed-book | 4.2% | 11.8% | 5.9% |
| GPT-4o | With search | 5.1% | 13.4% | 7.2% |

Jaccard in the 4-6% range means model suggestions and expert references are mostly disjoint sets. Search access helps validity checks, but it does not reliably reproduce expert relevance judgments.
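These overlap metrics are plain set arithmetic over suggested vs. cited papers; a minimal sketch (function name mine):

```python
def overlap_metrics(suggested: set[str], cited: set[str]) -> tuple[float, float, float]:
    # Jaccard = |intersection| / |union|; precision measures the intersection
    # against the suggestions, recall measures it against the human citations.
    inter = suggested & cited
    union = suggested | cited
    jaccard = len(inter) / len(union) if union else 0.0
    precision = len(inter) / len(suggested) if suggested else 0.0
    recall = len(inter) / len(cited) if cited else 0.0
    return jaccard, precision, recall
```

With mostly disjoint sets, the intersection stays small relative to the union, which is exactly the 4-6% Jaccard regime above.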

Result 4: Prompt Engineering Did Not Remove Bias

The methodology-focused prompt did not fix canonical drift.

| Prompt Type | Jaccard | Famous Paper Bias |
| --- | --- | --- |
| Standard | 8.3% | 29.7% |
| Methodology-focused | 8.5% | 37.5% |

Bias increased slightly under structured prompting, suggesting this behavior is largely structural rather than a prompt-level issue.

Result 5: Adoption Is High, Citation Shift Is Not

I compared pre-AI papers (2019-2021) with AI-era papers (2023-2024). If AI tools were already driving bibliography construction end to end, post-2023 papers should align more with LLM suggestions.

| Era | Mean Jaccard | Mean Precision |
| --- | --- | --- |
| Pre-AI (2019-2021) | 4.2% | 25.3% |
| AI-era (2023-2024) | 3.1% | 27.0% |

A two-sample t-test gave p = 0.549, so there was no significant difference. The feedback loop appears to be forming, but it has not measurably closed in citation data yet.
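The statistic itself is a short computation. A sketch in Welch's form (an assumption, since the post doesn't say which variant was used; converting t to the reported p-value additionally needs a t-distribution CDF, e.g. `scipy.stats`):

```python
import math

def welch_t(a: list[float], b: list[float]) -> float:
    """Welch's two-sample t statistic (unequal variances)."""
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    # Unbiased sample variances (ddof = 1).
    v1 = sum((x - m1) ** 2 for x in a) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in b) / (n2 - 1)
    return (m1 - m2) / math.sqrt(v1 / n1 + v2 / n2)
```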

What Predicts AI Discoverability

I trained a logistic regression model (AUC = 0.70) to estimate which paper features predict whether multiple LLMs surface a paper, used here as a proxy for robust AI discoverability.

| Feature | Coefficient | Direction |
| --- | --- | --- |
| Top venue (NeurIPS, ICML, etc.) | +0.29 | More discoverable |
| Contains "GAN" in title | +0.30 | More discoverable |
| Contains "deep learning" keyword | +0.16 | More discoverable |
| Contains "attention" keyword | +0.12 | More discoverable |
| Contains "reinforcement" keyword | +0.16 | More discoverable |
| Published in journal (vs conference) | -0.38 | Less discoverable |
| Long title | -0.16 | Less discoverable |

The venue effect is substantial. Method-signaling keywords are also strong predictors, while longer titles tend to reduce retrievability.
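Given fitted coefficients, scoring a paper is a sigmoid over a weighted feature sum. A sketch using the coefficients above; the feature keys are my shorthand, and the zero intercept is a placeholder assumption because the fitted intercept is not reported:

```python
import math

# Coefficients from the regression; key names are illustrative shorthand.
COEFS = {
    "top_venue": 0.29,
    "kw_gan": 0.30,
    "kw_deep_learning": 0.16,
    "kw_attention": 0.12,
    "kw_reinforcement": 0.16,
    "journal": -0.38,
    "long_title": -0.16,
}
INTERCEPT = 0.0  # assumption: the fitted intercept is not reported

def discoverability(features: dict[str, bool]) -> float:
    # Logistic score: a probability-like estimate that multiple LLMs
    # surface the paper, given its binary features.
    z = INTERCEPT + sum(c for name, c in COEFS.items() if features.get(name))
    return 1 / (1 + math.exp(-z))
```

For example, a journal paper from a top venue nets z = 0.29 - 0.38 = -0.09, a score just under one half (relative to the unreported baseline).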

Practical Playbook for Researchers

  1. Treat LLM suggestions as a starting point for unfamiliar subfields, where their diversity is highest, not as a finished bibliography.
  2. Verify every suggested reference against the literature yourself: overlap with expert citation is low, and canonical papers are systematically over-suggested.
  3. If AI discoverability matters to you, the regression results favor short titles with clear method-signaling keywords and top-venue publication.

The Bigger Picture

Search engines changed web traffic. Social feeds changed news distribution. AI answer engines are now changing information discovery again. Academia is unlikely to be exempt.

The citation loop has not fully shifted yet, but the components are in place: mass adoption, measurable model bias, and distribution effects that favor already prominent work. The right response is early intervention with better tooling, stronger awareness, and explicit norms for critical curation.

Limitations

This analysis rests on small samples (64 famous and 10 niche anchor papers), two model families, OpenAlex reference lists as ground truth, and multi-model agreement as a proxy for discoverability. Each of those choices limits how far the numbers generalize.

Takeaway

AI-mediated discovery is becoming part of how scholarship gets routed. If we understand its biases now, we can shape systems that amplify relevant, high-quality work instead of just recycling what is already famous.