Academic SEO Is Coming
February 19, 2026
AI answer engines are becoming a discovery layer for science. That will change what gets read, cited, and built on.
AI SEO companies like Profound are helping brands optimize how they appear in ChatGPT, Perplexity, and Google AI Overviews. They are betting that if AI generates the answer, you need to be inside that answer. I think academia is heading toward the same dynamic.
Researchers are already using ChatGPT, Claude, and Perplexity for literature search, related work drafting, and early-stage synthesis. A 2025 Wiley survey of 2,400+ researchers reported AI tool usage rising from 57% to 84% in one year, with 62% using AI specifically for research and publication work. At the same time, domain-specific tools like PhilLit emerged because generic models still hallucinate citations.
That raises three practical questions:
- How close are LLM-generated related-work suggestions to what humans actually cite?
- Is there evidence that AI tools are already reshaping citation patterns?
- What paper features predict whether LLMs will surface your work?
Study Design
I built a dataset around 64 Best Paper Award winners from NeurIPS, ICML, ICLR, ACL, and CVPR (2019-2025) as a "famous paper" set. I also sampled 10 applied ML papers with 10-300 citations (for example, medical imaging and food quality detection) as a "niche" set.
For each anchor paper, I pulled the human reference list from OpenAlex as ground truth. Then I prompted GPT-4o and Claude Opus 4.5 for 25 related-work suggestions per paper, using a standardized prompt asking for foundational work, methodologically similar papers, and concurrent approaches.
I tested three prompt conditions:
- Closed-book: no retrieval, no search, only model memory.
- With web search: model can verify and expand with online context.
- Methodology-focused: stepwise prompt mirroring how humans build related work.
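Scoring any of these conditions requires matching free-text LLM suggestions against the OpenAlex reference titles. The post doesn't specify the matching rule, so here is a minimal sketch assuming exact match after normalization; the function names are illustrative, not from the study's code.

```python
import re

def normalize_title(title: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace so that
    superficially different renderings of the same title compare equal."""
    title = title.lower()
    title = re.sub(r"[^a-z0-9\s]", " ", title)  # drop punctuation
    return re.sub(r"\s+", " ", title).strip()   # collapse runs of whitespace

def match_suggestions(suggested: list[str], ground_truth: list[str]) -> set[str]:
    """Return the LLM suggestions that appear in the human reference
    list after normalization."""
    truth = {normalize_title(t) for t in ground_truth}
    return {s for s in suggested if normalize_title(s) in truth}
```

A fuzzier matcher (edit distance, DOI lookup) would catch more true matches; exact-match-after-normalization is the conservative baseline.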
Result 1: LLMs Are Strong for Niche Discovery
On applied niche papers, suggestions were much more diverse and less anchored to canonical defaults.
| Paper Type | Unique Suggestions | Famous Paper Bias |
|---|---|---|
| Random applied papers (niche) | 92% | 10-20% |
| Best paper award set (famous) | 75% | 30-47% |
For unfamiliar domains, this is useful. If you need to get oriented outside your home subfield, these tools can surface relevant work quickly.
Result 2: The Default Bibliography Problem
On famous ML topics, both models repeatedly over-suggested canonical papers.
| Paper | Suggestion Rate | Human Citation Rate | Over-Suggestion |
|---|---|---|---|
| Attention Is All You Need | 47% | 15% | 3.1x |
| BERT | 31% | 12% | 2.6x |
| Deep Residual Learning | 28% | 8% | 3.5x |
| ImageNet Classification (AlexNet) | 25% | 6% | 4.2x |
This points to a rich-get-richer dynamic. Models learn strong associations from pretraining frequency, then reuse those defaults even when they are not the most relevant references for the specific paper.
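The over-suggestion column is just the ratio of the model's suggestion rate to the human citation rate. A quick sketch, plugging in the rates from the table above as fractions:

```python
def over_suggestion_ratio(suggestion_rate: float, citation_rate: float) -> float:
    """How many times more often a model proposes a paper than humans cite it."""
    return suggestion_rate / citation_rate

# (suggestion rate, human citation rate) from the table above.
rows = {
    "Attention Is All You Need": (0.47, 0.15),
    "BERT": (0.31, 0.12),
    "Deep Residual Learning": (0.28, 0.08),
    "ImageNet Classification (AlexNet)": (0.25, 0.06),
}
ratios = {paper: round(over_suggestion_ratio(s, c), 1)
          for paper, (s, c) in rows.items()}
```

A ratio above 1.0 means the model reaches for that paper more often than expert relevance judgments warrant.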
Result 3: LLM-Human Overlap Is Low
Across conditions, overlap with human bibliographies stayed modest.
| Model | Condition | Jaccard | Precision | Recall |
|---|---|---|---|---|
| Claude Opus 4.5 | Closed-book | 5.9% | 14.2% | 8.1% |
| Claude Opus 4.5 | With search | 4.8% | 12.1% | 6.9% |
| GPT-4o | Closed-book | 4.2% | 11.8% | 5.9% |
| GPT-4o | With search | 5.1% | 13.4% | 7.2% |
Jaccard in the 4-6% range means model suggestions and expert references are mostly disjoint sets. Search access helps validity checks, but it does not reliably reproduce expert relevance judgments.
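Concretely, all three metrics are set comparisons over (normalized) titles. A minimal sketch:

```python
def overlap_metrics(suggested: set[str], cited: set[str]) -> dict[str, float]:
    """Jaccard, precision, and recall between a model's suggestions and
    the human reference list, both treated as sets of titles."""
    inter = suggested & cited
    return {
        "jaccard": len(inter) / len(suggested | cited),
        "precision": len(inter) / len(suggested),  # suggestions humans also cited
        "recall": len(inter) / len(cited),         # human references the model found
    }
```

For intuition: with 25 suggestions, a 40-entry reference list, and 4 titles in common, the union has 61 entries, so Jaccard is 4/61 ≈ 6.6%, precision 16%, recall 10%; the same order of magnitude as the table.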
Result 4: Prompt Engineering Did Not Remove Bias
The methodology-focused prompt did not fix canonical drift.
| Prompt Type | Jaccard | Famous Paper Bias |
|---|---|---|
| Standard | 8.3% | 29.7% |
| Methodology-focused | 8.5% | 37.5% |
Bias actually increased under structured prompting (29.7% to 37.5%), suggesting this behavior is largely structural rather than a prompt-level issue.
Result 5: Adoption Is High, Citation Shift Is Not
I compared pre-AI papers (2019-2021) with AI-era papers (2023-2024). If AI tools were already driving bibliography construction end to end, post-2023 papers should align more with LLM suggestions.
| Era | Mean Jaccard | Mean Precision |
|---|---|---|
| Pre-AI (2019-2021) | 4.2% | 25.3% |
| AI-era (2023-2024) | 3.1% | 27.0% |
A two-sample t-test gave p = 0.549, so the difference is not statistically significant. The feedback loop appears to be forming, but it has not measurably closed in citation data yet.
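The post doesn't say whether the t-test assumed equal variances. A Welch's version (no equal-variance assumption) is easy to compute from the per-paper Jaccard scores with the standard library; the p-value is then read from the t distribution at the returned degrees of freedom (e.g. via `scipy.stats.t.sf`). The sample values below are hypothetical, not the study's data.

```python
import math

def welch_t(sample_a: list[float], sample_b: list[float]) -> tuple[float, float]:
    """Welch's two-sample t statistic and degrees of freedom
    (unequal variances allowed)."""
    na, nb = len(sample_a), len(sample_b)
    ma, mb = sum(sample_a) / na, sum(sample_b) / nb
    va = sum((x - ma) ** 2 for x in sample_a) / (na - 1)  # sample variances
    vb = sum((x - mb) ** 2 for x in sample_b) / (nb - 1)
    se2 = va / na + vb / nb
    t = (ma - mb) / math.sqrt(se2)
    # Welch-Satterthwaite degrees of freedom.
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df
```

With n = 8 per era, the test is badly underpowered; a null result here is weak evidence of no shift, which the limitations section acknowledges.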
What Predicts AI Discoverability
I trained a logistic regression model (AUC = 0.70) to estimate which paper features predict whether multiple LLMs surface a paper, used here as a proxy for robust AI discoverability.
| Feature | Coefficient | Direction |
|---|---|---|
| Top venue (NeurIPS, ICML, etc.) | +0.29 | More discoverable |
| Contains "GAN" in title | +0.30 | More discoverable |
| Contains "deep learning" keyword | +0.16 | More discoverable |
| Contains "attention" keyword | +0.12 | More discoverable |
| Contains "reinforcement" keyword | +0.16 | More discoverable |
| Published in journal (vs conference) | -0.38 | Less discoverable |
| Long title | -0.16 | Less discoverable |
The venue effect is substantial. Method-signaling keywords are also strong predictors, while longer titles tend to reduce discoverability.
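To make the coefficients concrete, here is how the fitted model scores a paper. The coefficients are taken from the table above; the intercept is not reported in the post, so 0.0 stands in as a placeholder and the resulting probabilities are illustrative only.

```python
import math

# Coefficients from the table above; intercept is a placeholder (not reported).
COEFS = {
    "top_venue": 0.29,
    "title_has_gan": 0.30,
    "kw_deep_learning": 0.16,
    "kw_attention": 0.12,
    "kw_reinforcement": 0.16,
    "is_journal": -0.38,
    "long_title": -0.16,
}
INTERCEPT = 0.0  # placeholder

def discoverability_score(features: dict[str, float]) -> float:
    """Logistic-regression estimate that multiple LLMs surface a paper.
    `features` maps feature names to 0/1 indicators."""
    z = INTERCEPT + sum(COEFS[k] * v for k, v in features.items())
    return 1 / (1 + math.exp(-z))  # sigmoid
```

A short-titled NeurIPS paper with "attention" in the title scores above the 0.5 baseline; a long-titled journal paper scores below it, matching the signs in the table.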
Practical Playbook for Researchers
- Lead with method and task in the title. Avoid generic phrasing like "A Novel Approach" when specific technical terms would carry more retrieval signal.
- Keep titles short and information-dense. Longer titles diluted discoverability in this dataset.
- Front-load extractable metadata in abstracts. Method, dataset, benchmark, and domain should be explicit early.
- Compensate for venue effects when needed. Journal papers may need stronger distribution via preprints, talks, posts, and community channels.
- Treat LLM related work as draft input. Use it for breadth and discovery, then curate rigorously before final citation decisions.
The Bigger Picture
Search engines changed web traffic. Social feeds changed news distribution. AI answer engines are now changing information discovery again. Academia is unlikely to be exempt.
The citation loop has not fully shifted yet, but the components are in place: mass adoption, measurable model bias, and distribution effects that favor already prominent work. The right response is early intervention with better tooling, stronger awareness, and explicit norms for critical curation.
Limitations
- The sample is concentrated in AI/ML; other fields may behave differently.
- The temporal split is small (n = 8 per era) for the adoption-vs-citation analysis.
- Only two model families were tested (GPT-4o and Claude).
- The feature model (AUC = 0.70) is useful but incomplete; factors like recency, author prominence, and abstract content likely matter.
Takeaway
AI-mediated discovery is becoming part of how scholarship gets routed. If we understand its biases now, we can shape systems that amplify relevant, high-quality work instead of just recycling what is already famous.