Do LLMs Use Metadata or Page Content? James Dooley Interviews Sergey Lucktinov
Listen on your favourite platform
| Platform | Link |
|---|---|
| YouTube | Listen on YouTube → |
What Does “Do LLMs Use Metadata or Page Content? James Dooley Interviews Sergey Lucktinov” Talk About?
This episode of the James Dooley Podcast tackles one of the most debated questions in the modern SEO community: when AI tools like ChatGPT, Gemini, Claude, and Perplexity retrieve information, do they rely solely on metadata from search results, or do they actually open and read page content? Guest Sergey Lucktinov breaks down the full LLM retrieval pipeline into distinct stages, explaining that the answer is both, depending on the query and what is found at each step. The conversation walks through how query fan-out works, what signals an LLM uses at the metadata stage, and why elements like meta titles and descriptions are more critical than ever in the age of AI search.
The episode goes deeper into what happens after the metadata stage, covering light skimming of DOM structure and page stability before full content parsing begins. Sergey explains that relevance must be established within the first 50 to 70 words of a page or the LLM discards it before completing a full analysis. The discussion also covers the role of schema markup as a context-clarifying tool, the complete irrelevance of meta keywords, and why leaving meta descriptions blank is a strategy that no longer holds up. Finally, the episode outlines three distinct retrieval paths: pre-trained knowledge, metadata-only retrieval, and full deep-analysis fan-out, giving listeners a practical framework for understanding when and how their content gets seen by AI systems.
“If you do not lead with meaning, you fail early. If you want to be relevant for a specific fan-out query, the answer needs to appear in roughly the first 50 to 70 words.”
— Sergey Lucktinov
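The "first 50 to 70 words" point in the quote above can be turned into a rough self-check for your own pages. The sketch below is only an illustration under stated assumptions: the function names, the 70-word limit as a default, and the term-overlap threshold are all hypothetical, and simple word matching is a stand-in for whatever relevance scoring a real LLM retrieval system applies.

```python
import re


def leading_words(text: str, limit: int = 70) -> list[str]:
    """Return the first `limit` lowercase word tokens of the text."""
    return re.findall(r"[a-z0-9']+", text.lower())[:limit]


def leads_with_meaning(page_text: str, query: str,
                       limit: int = 70, min_overlap: float = 0.6) -> bool:
    """Crude check (assumption): do enough query terms appear in the opening words?"""
    opening = set(leading_words(page_text, limit))
    # Ignore very short tokens such as "do" or "ai" to reduce noise.
    terms = [t for t in leading_words(query) if len(t) > 2]
    if not terms:
        return False
    hits = sum(1 for t in terms if t in opening)
    return hits / len(terms) >= min_overlap


if __name__ == "__main__":
    query = "do meta descriptions matter for ai search"
    front_loaded = "Meta descriptions still matter for AI search because they are read first."
    buried = "welcome " * 80 + "meta descriptions matter for search"
    print(leads_with_meaning(front_loaded, query))
    print(leads_with_meaning(buried, query))
```

A page that answers the query in its opening sentence passes this check, while one that spends its first 80 words on filler does not, which is the behaviour the quote describes.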
Who Are the Guests on “Do LLMs Use Metadata or Page Content? James Dooley Interviews Sergey Lucktinov”?
Sergey Lucktinov is the guest expert on this episode, brought in to demystify how large language models retrieve and process web content during AI-powered search. His knowledge spans the technical architecture of LLM retrieval pipelines, including metadata filtering, DOM structure analysis, schema markup interpretation, and the distinction between pre-trained knowledge and live search retrieval. His ability to break down complex retrieval mechanics into actionable SEO guidance makes him a valuable voice for anyone trying to understand how to optimise content for AI-driven search environments.
James Dooley serves as host and interviewer, guiding the conversation with practical, community-driven questions that reflect the real confusion SEOs and marketers are experiencing as AI search tools become mainstream. James grounds the discussion by challenging common assumptions, such as the idea that leaving meta descriptions blank is acceptable, and by asking about edge cases like metadata consensus and pre-trained knowledge retrieval. His interviewing style keeps the episode focused and accessible for a broad digital marketing audience.
What Are the Key Takeaways From “Do LLMs Use Metadata or Page Content? James Dooley Interviews Sergey Lucktinov”?
Here are the key points discussed in this episode:
- Meta titles and meta descriptions remain critically important in the LLM era because they are the first and sometimes only signals an AI retrieval system evaluates before deciding whether to open a page.
- If a page fails to establish relevance within approximately the first 50 to 70 words, it will be discarded by the LLM before a full content analysis is ever completed.
- Schema markup functions as a helpful context signal that reduces computational effort for the model, but only when implemented correctly and consistently matched with actual page content.
- Meta keywords are entirely obsolete and carry no value in LLM-based retrieval, while meta descriptions continue to play a meaningful role in the first stage of the retrieval pipeline.
- There are three distinct LLM retrieval paths: answers drawn from pre-trained knowledge, metadata-only retrieval for simple non-controversial queries, and full fan-out with deep page parsing for complex or subjective topics.
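The schema takeaway in the list above says markup only helps when it matches the actual page content. One way to reduce the risk of a mismatch is to generate the JSON-LD from the same fields used for the page title and meta description, then verify them before publishing. This is a minimal sketch, not a method described in the episode: the function names and the exact-match rule are assumptions, and `Article`, `headline`, `description`, and `url` are standard schema.org vocabulary.

```python
import json


def build_article_jsonld(headline: str, description: str, url: str) -> str:
    """Build a small schema.org Article block as a JSON-LD string."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "description": description,
        "url": url,
    }
    return json.dumps(data, indent=2)


def schema_matches_page(jsonld: str, page_title: str, meta_description: str) -> bool:
    """Assumed rule: headline and description must equal the page's own metadata."""
    data = json.loads(jsonld)
    return (
        data.get("headline", "").strip() == page_title.strip()
        and data.get("description", "").strip() == meta_description.strip()
    )


if __name__ == "__main__":
    title = "Do LLMs Use Metadata or Page Content?"
    description = "How LLM retrieval uses metadata first and page content second."
    jsonld = build_article_jsonld(title, description, "https://example.com/episode")
    print(jsonld)
    print(schema_matches_page(jsonld, title, description))
```

Exact string equality is deliberately strict here. A real audit might allow small differences, but a strict check surfaces the kind of mismatch the episode warns about.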
“Meta keywords are completely irrelevant. They are a relic from the past and have no value in the LLM era.”
— Sergey Lucktinov
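To act on the metadata advice, a page can be audited for the elements the episode names: a title, a non-empty meta description, and leftover meta keywords. The sketch below uses Python's standard `html.parser` module. The class name, function name, and issue messages are hypothetical, and it only inspects static HTML, so it says nothing about how any particular search engine or LLM actually reads the page.

```python
from html.parser import HTMLParser


class MetaAudit(HTMLParser):
    """Collect the title, meta description, and meta keywords flag from HTML."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self.has_meta_keywords = False
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = (attr.get("name") or "").lower()
            if name == "description":
                self.description = (attr.get("content") or "").strip()
            elif name == "keywords":
                self.has_meta_keywords = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def audit(html: str) -> list[str]:
    """Return a list of metadata issues found in the HTML (empty if none)."""
    parser = MetaAudit()
    parser.feed(html)
    issues = []
    if not parser.title.strip():
        issues.append("missing title")
    if not parser.description:
        issues.append("missing meta description")
    if parser.has_meta_keywords:
        issues.append("meta keywords present")
    return issues


if __name__ == "__main__":
    page = '<html><head><title>Guide</title><meta name="keywords" content="a,b"></head></html>'
    print(audit(page))
```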
Is “Do LLMs Use Metadata or Page Content? James Dooley Interviews Sergey Lucktinov” Worth Listening To?
This episode is worth listening to because it provides a clear, technically grounded explanation of a question that most SEO content online still gets wrong or oversimplifies. Rather than offering vague advice about optimising for AI, Sergey Lucktinov maps out a specific, stage-by-stage retrieval pipeline that explains exactly when metadata matters, when pages get opened, and when the LLM bypasses search entirely. That level of precision is rare and immediately useful for anyone making decisions about how to structure and write web content.
What makes this episode particularly valuable is how it reframes established SEO practices in light of AI search behaviour. The argument that hardcoding intent into meta descriptions is now a safety measure rather than an optional best practice is a meaningful shift in thinking. The discussion of schema markup as a double-edged tool and the explanation of metadata consensus as a reason pages sometimes never get opened are insights that give listeners a genuinely updated mental model of how their content is evaluated. Anyone managing a website in 2024 and beyond will find this episode directly applicable to their work.
Who Should Listen to “Do LLMs Use Metadata or Page Content? James Dooley Interviews Sergey Lucktinov”?
This episode is ideal for:
- SEO professionals who want to understand how AI-powered search tools like Gemini, ChatGPT, and Perplexity evaluate and retrieve web content
- Content strategists and writers who need to know how to structure and front-load content to survive LLM retrieval filtering
- Digital marketers responsible for on-page optimisation who are questioning whether traditional practices like meta descriptions and schema markup still matter
- Website owners and business operators who rely on organic and AI-driven search visibility and want a clearer picture of what signals determine whether their pages are surfaced
Where Can You Listen to the James Dooley Podcast?
You can listen to the James Dooley Podcast on all major podcast platforms:
- Apple Podcasts – Search for “James Dooley Podcast” in the Podcasts app
- Spotify – Available on Spotify for free
- Amazon Music / Audible – Listen through your Amazon account
- Overcast – For iOS users who prefer a dedicated podcast app
- Pocket Casts – Cross-platform podcast player
You can also subscribe using the RSS feed: https://feeds.transistor.fm/james-dooley-podcast
What Are Listeners Saying About This Episode?
“Finally an episode that actually explains the pipeline rather than just saying optimise for AI. The breakdown of the three retrieval paths changed how I think about our content strategy. The point about the first 50 to 70 words was a wake-up call.”
“I have been going back and forth on whether to keep filling in meta descriptions and this episode settled it for me. Sergey's explanation of metadata consensus and why pages sometimes never get opened was something I had never heard explained that clearly before.”
“Really appreciated that schema markup was treated honestly here, not as a magic fix but as something that can actually hurt you if done wrong. Short episode but packed with practical detail that I can actually use on client sites.”

James Dooley: Hi. Today I’m joined with Sergey, and we’re covering a really interesting topic. When people search using AI tools like ChatGPT, Gemini, Claude, or Perplexity, there’s a big debate in the SEO community. Do large language models only use metadata like the meta title and meta description from the search engine results page, or do they actually open pages and parse the content?
Sergey Lucktinov: It actually does both. There are different stages of retrieval. The first stage is when query fan-out happens and the LLM fetches results from a search engine, usually Google or Bing. At this stage, it only sees metadata. That includes the website name, page URL, meta title, and meta description. If those elements are irrelevant to the fan-out query it is trying to answer, the result gets removed immediately. So at the first stage, meta titles and meta descriptions are extremely important. This is why leaving meta descriptions blank, which used to work years ago, is no longer effective. You should always include a proper meta description that covers the likely fan-out intent you want to be relevant for.
James Dooley: So once it moves past metadata and decides it wants more information, does it just pull a central snippet from the page, or does it read the full content?
Sergey Lucktinov: After the metadata stage, it moves into light skimming. This is where it checks the DOM structure, page stability, and whether the page is clean and properly structured. If the page passes that stage, then it moves into full parsing. At that point, every section of the page is analysed, chunk by chunk, to see whether it makes sense and matches the intent. The second stage is critical. If you do not lead with meaning, you fail early. If you want to be relevant for a specific fan-out query, the answer needs to appear in roughly the first 50 to 70 words. If the LLM does not see relevance there, the page gets discarded before full analysis happens.
James Dooley: And does it read schema markup as part of this process?
Sergey Lucktinov: Yes, absolutely. Schema markup is important, but it is not a silver bullet. It works more like a handshake. You are making the LLM’s job easier by clearly explaining the context and content of the page. However, if schema is implemented incorrectly or mismatched with the actual content, it can hurt you. If someone does not understand schema properly, it is often better not to use it at all. When done correctly, schema reduces computational effort for the model and helps it understand your content faster and more cheaply.
James Dooley: What about meta keywords and meta descriptions? Many SEOs stopped using both. Are either of them used by LLMs today?
Sergey Lucktinov: Meta keywords are completely irrelevant. They are a relic from the past and have no value in the LLM era. Meta descriptions matter because of what is shown on the search engine results page. That text is used during the first retrieval stage. Even though Google sometimes rewrites meta descriptions dynamically, you should not rely on that. You want to control the message and clearly explain the intent of the page.
James Dooley: A lot of people argue that leaving meta descriptions blank is fine because Google will pull a relevant snippet dynamically. Are you saying that is now a bad approach?
Sergey Lucktinov: Yes. You should explain clearly what the page is about. Sometimes Google does a good job, but you cannot rely on it 100 percent of the time. If you hardcode the intent into the meta description, the LLM immediately understands that the page may be relevant and allows it to pass the first stage. That is a much safer approach.
James Dooley: So if Gemini performs a Google search and the top results all say roughly the same thing in their metadata, does the LLM sometimes skip opening the pages entirely and just answer from the search results?
Sergey Lucktinov: Yes, depending on the query. If the topic is not subjective or controversial, it may not open any pages at all. It will simply rely on metadata consensus.
James Dooley: And if the answer is already known, like the capital of France, it will not even search?
Sergey Lucktinov: Exactly. That information already exists in the pre-trained knowledge. There are three main retrieval paths. The first is from pre-trained data. You cannot optimise for that. The second is metadata-only retrieval, which is enough for simple or non-controversial queries. The third is full fan-out and deep analysis, which happens when the topic is subjective, ambiguous, or complex.
James Dooley: That makes complete sense. Hopefully this clears things up for people wondering whether LLMs open pages, rely on metadata, or use internal knowledge. Thanks very much for joining us, Sergey.
Sergey Lucktinov: Thank you.
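The three retrieval paths Sergey describes at the end of the conversation can be summarised as a small decision function. This is a conceptual sketch only: the enum, the function name, and the three boolean inputs are hypothetical labels for the conditions mentioned in the interview, and no AI search product is known to expose flags like these.

```python
from enum import Enum


class RetrievalPath(Enum):
    PRETRAINED = "pre-trained knowledge"
    METADATA_ONLY = "metadata-only retrieval"
    FULL_FAN_OUT = "full fan-out and deep analysis"


def choose_path(known_from_training: bool,
                subjective_or_complex: bool,
                metadata_consensus: bool) -> RetrievalPath:
    """Map the conditions from the interview onto one of the three paths."""
    if known_from_training:
        # Example from the episode: the capital of France needs no search.
        return RetrievalPath.PRETRAINED
    if not subjective_or_complex and metadata_consensus:
        # Simple query and the top results agree, so no pages are opened.
        return RetrievalPath.METADATA_ONLY
    # Subjective, ambiguous, or complex topics get pages opened and parsed.
    return RetrievalPath.FULL_FAN_OUT


if __name__ == "__main__":
    print(choose_path(True, False, False).value)
    print(choose_path(False, False, True).value)
    print(choose_path(False, True, True).value)
```

Only the second and third paths involve live retrieval, which is why the episode treats metadata and front-loaded content as the places where optimisation can have an effect.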
Creators & Guests
Host
James Dooley is a UK entrepreneur.