AI’s Impact on Link Building | KGMID SEO (James Dooley Interviews Dan Petrovic)

E319 · 26:32

Listen on your favourite platform:

  • YouTube – Listen on YouTube

What Does “AI’s Impact on Link Building | KGMID SEO (James Dooley Interviews Dan Petrovic)” Talk About?

This episode features James Dooley interviewing Dan Petrovic about how artificial intelligence and machine learning are reshaping link building strategies in SEO. Dan opens with a candid history of where link building has always failed, explaining that links are typically treated as an afterthought and forced into already-finished content. He describes how the industry trained bloggers to expect formulaic placements such as the “one commercial link plus two filler links” formula, which has only made paid links easier to detect.

Dan then goes deep into the machine learning models he personally built to solve these problems. He explains how he scraped massive editorial sites like TechCrunch, Mashable, and Wired, fine-tuned Microsoft's DeBERTa model using token classification on gigabytes of data, and created a tool called Lingert that predicts where links naturally belong in any piece of text. He also describes a second model called Penguin designed to detect commercially motivated links, and how he built an agentic loop that pits a link writer against the Penguin detector in a continuous cycle until either a natural-looking placement is found or the effort is abandoned.

The conversation then shifts to brand authority and knowledge graph presence as the foundation of modern link strategy. James and Dan discuss unlinked brand mentions, how Gemini in AI mode will auto-link known entities even when no hyperlink exists, and why having a Google Knowledge Graph Machine ID (KGMID) is now central to being discovered and recommended by AI systems. They also cover Wikidata as a practical resource for understanding and mapping entity relationships, and how connecting entities through nodes and edges helps build the confidence and clarity scores that both Google and large language models rely on.

“If you cannot make it fit, don't do it. Don't make that link.”

— Dan Petrovic

Who Are the Guests on “AI’s Impact on Link Building | KGMID SEO (James Dooley Interviews Dan Petrovic)”?

Dan Petrovic is an SEO researcher and technologist with deep expertise in machine learning applications for search. He has presented on link building to large audiences at major industry events and has spent years building custom ML models to analyze and improve link strategies. He runs tools available at dejan.ai and has conducted extensive research into entity mapping, knowledge graphs, Wikidata, and how Google's internal systems handle known entities. His work sits at the intersection of technical SEO, natural language processing, and AI visibility.

James Dooley is the host of the James Dooley Podcast and an experienced SEO practitioner and entrepreneur. He brings a strategic and practical perspective to the conversation, connecting Dan's technical insights to real-world business applications. James is well-versed in branding, knowledge graph optimization, and the shift from traditional ranking-focused SEO toward LLM visibility and entity authority, and he regularly emphasizes brand as the core signal running through all modern SEO strategy.

What Are the Key Takeaways From “AI’s Impact on Link Building | KGMID SEO (James Dooley Interviews Dan Petrovic)”?

Here are the key points discussed in this episode:

  • Training a machine learning model on the link patterns of top editorial sites like TechCrunch and Wired can reveal where links naturally belong in text, removing the guesswork from link placement.
  • The common tactic of placing one commercial link alongside two filler links to appear natural actually makes the paid link more obvious and easier for both humans and algorithms to detect.
  • Gemini in AI mode will automatically insert a hyperlink for well-known brands that appear as unlinked mentions, but only if the brand is already a recognized entity grounded in authoritative sources.
  • Having a Google Knowledge Graph Machine ID (KGMID) and a presence on Wikidata are now foundational steps for any business that wants to be cited and recommended by AI systems, not just ranked in traditional search.
  • Modern link building strategy should simultaneously serve three goals: improving traditional search rankings, strengthening a brand's confidence score in the knowledge graph, and earning citations and recommendations inside large language models.

“If you're a really well-known brand and you have a mention on somebody's website that doesn't have a link, Gemini in AI mode will fill it with a link.”

— Dan Petrovic

Is “AI’s Impact on Link Building | KGMID SEO (James Dooley Interviews Dan Petrovic)” Worth Listening To?

This episode is worth listening to for anyone who wants to understand where link building is actually headed rather than where it has been. Dan Petrovic does not deal in generalities. He walks through the exact technical process he used to build two proprietary ML models, explains the difference between token classification and sequence classification, and describes an agentic loop where a link writer and a spam detector battle each other until a genuinely natural placement emerges or the link attempt is abandoned entirely. That level of specificity is rare in SEO content and gives listeners a concrete mental model for evaluating the quality of their own link placements.

Beyond the technical depth, the episode reframes link building as an entity and brand authority problem rather than a volume or anchor text problem. The discussion around Gemini auto-linking known brands, the role of Wikidata in grounding AI models, and the three-part framework James outlines covering rankings, knowledge graph confidence, and LLM citation gives practitioners a genuinely updated strategic roadmap. Whether you are a technical SEO, a content strategist, or a business owner trying to understand why brand signals matter so much right now, this episode delivers practical and forward-looking insight in a single conversation.

Who Should Listen to “AI’s Impact on Link Building | KGMID SEO (James Dooley Interviews Dan Petrovic)”?

This episode is ideal for:

  • Technical SEOs and link builders who want to understand how machine learning can be applied to evaluate and improve link placement quality
  • Brand and content strategists looking to understand how entity authority and knowledge graph presence now influence AI-driven search visibility
  • Business owners and marketers who want to understand why brand signals, Wikidata entries, and KGMID status are becoming essential for appearing in AI overviews and LLM recommendations
  • SEO agency professionals and consultants who need to update their link building frameworks to account for AI spam detection and the shift toward entity-based search

Where Can You Listen to the James Dooley Podcast?

You can listen to the James Dooley Podcast on all major podcast platforms:

  • Apple Podcasts – Search for “James Dooley Podcast” in the Podcasts app
  • Spotify – Available on Spotify for free
  • Amazon Music / Audible – Listen through your Amazon account
  • Overcast – For iOS users who prefer a dedicated podcast app
  • Pocket Casts – Cross-platform podcast player

You can also subscribe using the RSS feed: https://feeds.transistor.fm/james-dooley-podcast

What Are Listeners Saying About This Episode?

★★★★★

“The breakdown of how Dan built the Lingert and Penguin models was unlike anything I've heard in an SEO podcast. He actually explains the token classification process and why it matters for detecting paid links. I went back and listened to that section twice.”

— Marcus T.

★★★★★

“The part about Gemini auto-linking known brands that appear without a hyperlink completely changed how I'm thinking about unlinked mentions. I had no idea that was happening and it immediately made me prioritize our entity presence differently.”

— Priya N.

★★★★★

“James and Dan together make this really accessible without dumbing it down. The three-part framework at the end covering rankings, knowledge graph confidence, and LLM citations gave me a clear way to explain modern SEO strategy to my clients who keep asking why we talk about branding so much.”

— Simon R.

James Dooley interviews Dan Petrovic on how AI is changing link building for SEO. Dan explains why most outreach fails because links get forced into finished content, then shares how he trained machine learning models to predict natural link placement and detect obvious paid links. They cover unlinked brand mentions, why Gemini sometimes auto-links known entities, and why brand authority and knowledge graph presence now sit at the centre of modern link building strategy.

James Dooley: Hi, today I’m joined by Dan Petrovic, and the topic of conversation is whether AI has affected link building strategies for SEO.

Dan Petrovic: Hey James, how we doing? You all right?

James Dooley: Doing well. So with regards to link building, now that artificial intelligence is upon us, what do you think has changed with regards to link building strategies?

Dan Petrovic: Yeah, look, I have a lot to say about the topic. I’ve presented on link building for many years. I’ve stood on stage in front of very large audiences and told them to clean up their act and do better. So I’d like to give a little bit of history and highlight where link building always fails. Link building is always a sort of afterthought in the SEO process, and you’re always trying to make it fit the strategy you already have. Right? So you start with, okay, we’ve got this thing we want to rank for. The page is already done. That’s finished. We need to get links for it somehow. And we’re just going to try to fit a square peg into a round hole. We take the content, put it somewhere else, and then force the links to exist on that page. You know what I’m talking about. You’ve done links. We’ve been doing this for so long that the people who accept our links are now aware of what we’re doing, and they ask for money. But not only that, they’re fitting our silly narrative of one link for yourself or your client and two links to make it look natural. The most ridiculous thing I’ve ever heard. One to Wikipedia and one to some gov website to make it look natural. So guess what you’re doing when you use the one-plus-two formula? You’re basically putting a target on your link and making it super obvious: hello, I’m the only commercial link on this page and these two are fillers. I got up on stage, I think I was in Munich, and I said to people, this is what’s wrong at the moment, this is what I found. If I can spot your links, so can Google. Nothing changed. People just keep doing the same thing. And those who accept our links now have policies that mirror that. They’re parroting it back to us. They’re saying one link for yourself and two natural-looking links. And I was really furious about the whole thing, because we ruined it for everybody. We trained the bloggers to expect that as well. So what did I do?
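The “one plus two” pattern is simple enough to check mechanically. The sketch below is a hypothetical heuristic, not Dan’s Penguin model: it assumes you already have a page’s outbound link URLs, treats Wikipedia and government hosts as the typical fillers, and flags sparse pages (five links or fewer, an illustrative threshold) where exactly one link points anywhere else.

```python
from urllib.parse import urlparse

# Assumption: Wikipedia and government domains are the typical "natural-looking" fillers.
FILLER_LABELS = {"wikipedia", "gov"}

def is_filler(url: str) -> bool:
    """True if any label of the host (e.g. "en.wikipedia.org", "www.gov.uk") is a filler label."""
    host = urlparse(url).hostname or ""
    return any(label in FILLER_LABELS for label in host.split("."))

def looks_like_one_plus_two(outbound_links: list[str], max_links: int = 5) -> bool:
    """Flag sparse pages where exactly one outbound link is not a filler."""
    if len(outbound_links) < 2 or len(outbound_links) > max_links:
        return False
    commercial = [url for url in outbound_links if not is_filler(url)]
    return len(commercial) == 1

page = [
    "https://example-casino.com/bonus",
    "https://en.wikipedia.org/wiki/Poker",
    "https://www.usa.gov/",
]
print(looks_like_one_plus_two(page))  # True
```

A page from a large editorial site with a dozen or more outbound links falls outside the `max_links` window and is never flagged, which is the contrast between sparse guest-post pages and heavily linked editorial pages that the episode describes.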
Let’s get back to AI. And actually I’m going to go down to the machine learning level now. TechCrunch, Mashable, Wired. I basically took the top 10 biggest blogs in the world, regardless of topic, just by volume and readership, and I reviewed their link integrations. Just an ad hoc look at everything. And one thing stood out for me straight away. Holy cow. 12 links on a page, 24 links on a page, 50 links on a page. Wow. When you go to those spammy guest post farms: one link, two links, three links, maybe four or five. That’s already an immediately obvious signal. But I thought, what if I could train a model to think about links the same way these top-level, highest-quality blogs in the world think about links, and how they link out naturally? It took me a couple of months. I scraped all of them. I scraped TechCrunch, gigabytes of data. I pre-processed everything, cleaned up the text, extracted it sentence by sentence, and every time a link existed I marked its location, from and to, in the character count. And I would mark everything. This is a link, this is a link, this is a link. So basically, I ended up with gigabytes of content with markup where the links used to be. It doesn’t matter where the link goes, but that’s a link. So I pre-processed the data and took a small off-the-shelf pre-trained model. I think it was Microsoft’s DeBERTa V2 or V3. And I fine-tuned that model using token classification. Token classification is not sequence classification. Sequence classification is positive sentiment, negative sentiment. Token classification goes down to the granularity of a single token. So basically it predicts the spans in the text that are more likely to be links than not. In my pre-processing I marked all the non-link text as zeros and all the link text as ones. That went into the model. The text was converted into token IDs. I did my padding and batching. The machine in the background processed everything. I trained for a couple of days. Voilà.
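The labelling step described above (record each link’s start and end character positions, then mark link text as 1 and everything else as 0) can be sketched without any ML library. This is an illustration only: it uses a naive whitespace tokenizer in place of DeBERTa’s subword tokenizer, and the function name and example sentence are hypothetical. A real pipeline would usually take token character offsets from the tokenizer itself (Hugging Face fast tokenizers return them when called with `return_offsets_mapping=True`).

```python
import re

def label_tokens(text: str, link_spans: list[tuple[int, int]]) -> list[tuple[str, int]]:
    """Pair each whitespace token with 1 if it overlaps a link span, else 0.

    link_spans holds (start, end) character offsets with end exclusive,
    i.e. the positions recorded for each link while scraping.
    """
    labelled = []
    for match in re.finditer(r"\S+", text):  # naive tokenizer, stand-in for subwords
        start, end = match.span()
        is_link = any(start < span_end and end > span_start
                      for span_start, span_end in link_spans)
        labelled.append((match.group(), int(is_link)))
    return labelled

text = "Read the full review on TechCrunch before you buy."
anchor_start = text.index("full review")
spans = [(anchor_start, anchor_start + len("full review"))]
print(label_tokens(text, spans))
```

Only “full” and “review” receive the label 1. With a subword tokenizer the same overlap test runs on each subword’s offsets, so a long anchor simply yields more 1-labelled tokens.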
A model that’s intuitive about links on the web. So now I feed it a blank page of text: no links, no markup, no HTML, nothing, just plain text. It’ll paint with great precision where a link falls, as learned from the best of the best of the web and how they link out naturally. So, how can you use this? How can you use AI to improve link building? Two things. One, you’re writing an editorial piece and you’re trying to come up with ways to integrate your links. This will already paint the spots where links fit in naturally. So when you’re trying to think about where to put the link on this page, put the link there. If there’s no nice place, rewrite your content, reprocess it through the model, paint it again and pick the best spots. So that’s sort of a link planning stage. Then you integrate that and do your outreach. And for all the links you’ve already placed in the past, you can run text extraction on all those linking pages from your link profile. You basically process your entire link profile and run the analysis using this model, it’s called Lingert: you run the text analysis and predict where the links naturally fit in that narrative. And you can basically do the scoring. Did I pick the same spot the model picked? So that’s your first level of research: finding where the links fit naturally. The second thing is, since we’re talking about AI and links, I have another model. The second model is called Penguin, and its job is to spot your link. The sole purpose of the model is to see who wanted the link on that page. It effectively acts as a member of Google’s webspam team: it visits the page and reviews all the links. Is there one that’s obviously there for commercial purposes? Who wanted a link on this page? If it cannot detect one, it says I don’t know. And if it can, it flags the link, and it also flags the filler links, the ones used to make it look natural.
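The link-profile audit ends in a simple comparison: did the existing link sit where the model would have put it? Below is a minimal sketch of that scoring step, assuming the model has already produced a per-token link probability. The 0.5 threshold, the function name and the example probabilities are made-up assumptions; the episode does not describe Lingert’s actual outputs or scoring method.

```python
def placement_agreement(link_probs: list[float], actual_labels: list[int],
                        threshold: float = 0.5) -> float:
    """Fraction of the placed link's tokens that the model also predicts as link text.

    link_probs: hypothetical per-token link probabilities from a token classifier.
    actual_labels: 1 where the existing link's anchor text sits, else 0.
    """
    if len(link_probs) != len(actual_labels):
        raise ValueError("probabilities and labels must align token by token")
    link_positions = [i for i, label in enumerate(actual_labels) if label == 1]
    if not link_positions:
        return 0.0
    hits = sum(1 for i in link_positions if link_probs[i] >= threshold)
    return hits / len(link_positions)

# Hypothetical scores for a 6-token sentence.
probs = [0.02, 0.05, 0.10, 0.81, 0.77, 0.04]
print(placement_agreement(probs, [0, 0, 0, 1, 1, 0]))  # 1.0: same spot as the model
print(placement_agreement(probs, [1, 1, 0, 0, 0, 0]))  # 0.0: forced into another spot
```

Averaging this score across every linking page gives one rough number for how naturally a whole link profile was placed.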
And I’ve been doing link profile analysis with this for two years now, and the model outperforms human link builders on link detection. I’m excited about this, and nobody actually knows it. This is the first time I’m talking about it. I now have an agentic flow in place that takes a piece of text and tries to integrate the links in a certain way, then the Penguin model tries to break it, and if the writer fails it goes back into the loop, cycling until it can fit the link in a way that fools my link spam model. Basically, I have a writer and rewriter and an evaluator going in an agentic loop, constantly looping. I tried to fit a link into one of my articles: I wrote an article, pretended I was posting it on Moz.com, and said I want the link to this page to be in that article. Make it work. It went through 10 iterations, 20 iterations, 50 iterations, 100 iterations, and it couldn’t make it work. My writer model, my link integrator model, my link builder model never found a way to fool the judge. So I want to leave this with everyone listening. If that’s the case, if you cannot make it fit, don’t do it. Don’t make that link.
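The writer-versus-judge loop can be outlined as plain control flow. In the sketch below, `write_placement` and `detect_commercial_link` are hypothetical stand-ins for the LLM writer and the Penguin classifier, whose internals are not described in the episode, and the default cap of 100 iterations mirrors the attempt Dan describes. The loop returns `None` when no placement fools the judge, which corresponds to his advice to drop the link.

```python
from typing import Callable, Optional

def integrate_link(
    article: str,
    target_url: str,
    write_placement: Callable[[str, str, int], str],   # hypothetical writer/rewriter
    detect_commercial_link: Callable[[str, str], bool],  # hypothetical judge: True = flagged
    max_iterations: int = 100,
) -> Optional[str]:
    """Rewrite until the judge no longer flags the link, or give up."""
    for attempt in range(max_iterations):
        draft = write_placement(article, target_url, attempt)
        if not detect_commercial_link(draft, target_url):
            return draft  # the judge could not tell who wanted the link
    return None  # never fooled the judge: don't make that link

# Toy stubs for illustration only: the judge lets exactly one variant through.
def toy_writer(article: str, url: str, attempt: int) -> str:
    return f"{article} Variant {attempt} linking {url}."

def toy_judge(draft: str, url: str) -> bool:
    return "Variant 3" not in draft

print(integrate_link("Intro text.", "https://example.com", toy_writer, toy_judge))
# Intro text. Variant 3 linking https://example.com.
```

A real writer would usually receive the judge’s feedback on each failed draft rather than just an attempt counter; the counter keeps the control flow visible without assuming anything about the models.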

James Dooley: So you’re saying relevance is mightily important, because otherwise you’re just trying to push, like you always say, a square into a circle and it’s just not going to fit. So it’s almost less is more: going for quality as opposed to just trying to force it. I’ve got another question for you then. Forget about the actual link, well, it’s related to link building and AI. How important now is an implied link rather than a physical link on the page, like an unlinked mention or a branded mention? Has that become more important or less important with AI, with regards to link building or corroboration?

Dan Petrovic: For ranking purposes it doesn’t really matter. For training purposes it does matter. But where I find the most utility is an interesting behaviour. It’s really cool that you mentioned that. There’s an interesting behaviour where, if you’re a well-known brand (here we go, back to branding), if you’re a really well-known brand and you have a mention on somebody’s website that doesn’t have a link, Gemini in AI mode will fill it with a link.

James Dooley: Really?

Dan Petrovic: Yeah.

James Dooley: I didn’t know that.

Dan Petrovic: It’s like a gift.

James Dooley: Yeah. So basically, you know, if you’re like Nike, Adidas, Under Armour, and it’s familiar with those brands, it’ll just link them up even if they’re only mentioned and not linked. But this comes back down to branding again, and whether it has the confidence and clarity to know exactly who that brand is. Would it only do that if the brand has a KGMID as a known entity? Have you looked into whether it does it for companies that might not have a knowledge panel?

Dan Petrovic: If you’re not a known entity, it’s not going to happen. And I suspect you also have to be a source in the grounding. Not necessarily in that spot, I’m saying it will fill that spot, but you have to be a source in the grounding, because Gemini is obsessed with preventing hallucinations. Not Gemini, Gemini is a model. I should say the Gemini app, or AI mode, or AI overviews. They’ve had some recent embarrassments with glue and rocks, giving poor advice, poor health advice, to people. So they’re a little bit paranoid now, and I think that’s the reason they’re grounding everything with multiple sources. To prevent hallucinations, they only rely on things that are already in the grounding sources. So if you’re not in the grounding sources, if you’re not authoritative, I don’t think there’s a chance you’re going to get that gift of an AI mode result with a link in there for you to click on. That I haven’t seen yet.

James Dooley: Some people watching this might be asking what a KGMID is. It stands for Knowledge Graph machine ID. And you mentioned that you need to be a source. For anyone with a real, genuine business that isn’t a source at present, what’s the easiest way to build that authority and brand? Because on every single one of these episodes, a key takeaway has been brand, brand, brand, brand, and everything seems to relate back to brand. The trust signals that come with a brand, the confidence that comes with a brand, the clarity that comes with a brand. How does someone turn that real business into a source?

Dan Petrovic: Invent a time machine, go back, I don’t know, seven years, and edit Freebase. Before the acquisition. I’m glad I spammed Freebase when I did, because I got in where I wanted to get in. But was it Freebase? Am I getting that right?

James Dooley: We do in the UK, it’s Crunchbase, it’s a massive site.

Dan Petrovic: Not Crunchbase. It was the database Google acquired. I’m pretty sure it’s Freebase. Yeah.

James Dooley: Freebase. Yeah.

Dan Petrovic: I could be wrong; it was a long time ago that they did that acquisition. Joking aside, time machines and everything: if you want to see how all this works, Google actually has a proper system of entities, not just for the knowledge graph and knowledge panels. They have all the entities mapped out. And I even have an extension that helps you. You can go to a Google search results page and hit that extension to see who is a known entity, and it gives you the entity ID from Google’s knowledge graph. And I also have, let me see if I can find it.

James Dooley: Where is that pulling from? The knowledge graph API within Google?

Dan Petrovic: Yeah. It just looks at the rendered source of the page and finds it there. Basically, on dejan.ai/tool, one of the many tools I have listed there is Google Entities. You can do a search: look up a name or a brand or a product and see whether there’s an entity for it in Google’s knowledge graph. That’s basically your proof that you’re a registered, known quantity with Google within the knowledge graph. That’s Google’s MID, machine ID.
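Dan’s extension reads entity IDs from the rendered search results page. A different, documented route to the same kind of ID is Google’s public Knowledge Graph Search API, which returns JSON-LD results whose `@id` values look like `kg:/m/...`. The sketch below swaps in that API rather than reproducing Dan’s tool: it only builds the request URL and parses a response-shaped dict, so it runs offline. `YOUR_API_KEY` is a placeholder, and the sample response is illustrative, not real API output.

```python
from urllib.parse import urlencode

KG_ENDPOINT = "https://kgsearch.googleapis.com/v1/entities:search"

def build_kg_search_url(query: str, api_key: str, limit: int = 5) -> str:
    """Build a Knowledge Graph Search API request URL."""
    return f"{KG_ENDPOINT}?{urlencode({'query': query, 'key': api_key, 'limit': limit})}"

def extract_mids(response: dict) -> list[tuple[str, str]]:
    """Return (name, MID) pairs from a Knowledge Graph Search API response."""
    results = []
    for element in response.get("itemListElement", []):
        entity = element.get("result", {})
        entity_id = entity.get("@id", "")
        if entity_id.startswith("kg:"):
            results.append((entity.get("name", ""), entity_id[len("kg:"):]))
    return results

print(build_kg_search_url("Example Brand", "YOUR_API_KEY"))

# Illustrative, made-up response in the API's shape.
sample = {"itemListElement": [
    {"result": {"@id": "kg:/m/0example", "name": "Example Brand"}, "resultScore": 12.3},
]}
print(extract_mids(sample))  # [('Example Brand', '/m/0example')]
```

An empty list for your brand suggests, but does not prove, that Google has no public knowledge graph entity for it.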

James Dooley: Yeah.

Dan Petrovic: Why is that relevant? That sort of logic and reasoning runs throughout Google’s systems. If you look at the Vertex documentation, whether you’re doing custom search or general Google search, MIDs are always there, and you can ground with them. They have a complete knowledge graph of all the known entities. Now, there’s no way to just download all that and map things out, because it’s proprietary now. You can get it old school, frozen in time from when Freebase was snapshotted. But there is an alternative. I’m wondering if I can think of it, because I was recently working on it, trying to map out all the entities in it. There is, yeah, maybe we’ll sync up after the call. I’ll send you the link. The name escapes me, but it’s a pretty well-known entity database.

James Dooley: Yeah, send it to me in a bit and we’ll put the link in the description. But for me, with regards to link building for AI, I know we’re talking a little bit about known entities and being a source or having a KGMID. Everything in our business model now comes back down to not just ranking, which is where it used to be. We used to be obsessed with just ranking in Google. And obviously we realised many years back that to rank in Google you want brand, social media, real traffic and engagement and everything else that comes with it. The second part is the knowledge graph: trying to improve your confidence score in the knowledge graph for confidence and clarity. And the third is the LLMs: trying to be not just cited but recommended in the LLMs. I think if everything you do with your link building helps your confidence score for who you are and what you do in the knowledge graph, corroborates you and gets the framing right for LLMs, and also gets the rankings, then those three falling in line with each other is what we’re doing with our link building strategies nowadays. Is there anything else related to improving link building for AI?

Dan Petrovic: Before I say that: Wikidata. Wikidata, yeah, yeah, yeah. Get on it. I did something really important. I’ll do a quick screen share just to show you what I found. I took all the Wikidata entities, embedded the known entities, and compared the semantic similarity between the Gemini model and its little cousin, Gemma. And I found they’re basically in the same semantic space. The figures are different, but when you rotate the embeddings, when you mix things up, they always converge on the same semantic thing. And I think there’s something about Wikidata, even if it’s not verbatim from Google’s knowledge graph, that’s of really, really strong utility for SEOs looking to gain an edge in not just SEO but also AI visibility. I’m glad I didn’t forget about this. So yeah. Seriously, check out Wikidata. It’s great.
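Dan doesn’t detail how he compared the Gemini and Gemma embeddings. One standard rotation-invariant check, swapped in here and not necessarily his method, is to compute pairwise cosine similarities among the same entities in each model and correlate the two lists: if one space is just a rotation of the other, the similarities match exactly. It also works when the two models have different dimensions, since cosines are only compared within each space. The vectors below are invented toy values, and a 2-D rotation stands in for the second model.

```python
import math
from itertools import combinations

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def similarity_agreement(space_a: dict, space_b: dict) -> float:
    """Correlate within-space pairwise cosines for the entities both spaces share."""
    shared = sorted(set(space_a) & set(space_b))
    pairs = list(combinations(shared, 2))
    sims_a = [cosine(space_a[p], space_a[q]) for p, q in pairs]
    sims_b = [cosine(space_b[p], space_b[q]) for p, q in pairs]
    return pearson(sims_a, sims_b)

def rotate(v, theta):
    c, s = math.cos(theta), math.sin(theta)
    return (v[0] * c - v[1] * s, v[0] * s + v[1] * c)

# Toy "model A" embeddings; "model B" is the same space rotated by 40 degrees.
space_a = {"SEO": (1.0, 0.1), "Link building": (0.9, 0.3),
           "Cake making": (0.1, 1.0), "Baking": (0.2, 0.9)}
space_b = {name: rotate(vec, math.radians(40)) for name, vec in space_a.items()}
print(round(similarity_agreement(space_a, space_b), 6))  # 1.0
```

Swapping two entities’ vectors in `space_b` breaks the shared structure and drives the score down sharply, which is what makes the check informative.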

James Dooley: The only thing I would say on that, to anyone watching, is make certain you don’t go out creating a Wikidata account and editing it yourself if there isn’t some sort of knowledge of who you are online. I would start by building up who you are online. Ideally, get an entity home, like having a jamesdooley.com, and wrap everything around that. Schema helps to pull everything together. Otherwise, I know a lot of people who’ve tried to create entries and had them deleted. And once that happens, it becomes hard to create it again. It’s almost like trying to create yourself a Wikipedia page before you actually deserve one. It’s the same with Wikidata. A lot of people have tried to create entries and had them deleted.
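An “entity home” is usually backed by Organization (or Person) structured data on the home page, with `sameAs` pointing at the profiles that corroborate the entity. The sketch below generates such a JSON-LD block using real schema.org properties (`name`, `url`, `founder`, `sameAs`); the brand name, founder, URLs and Wikidata ID are placeholders, not real entities.

```python
import json

def organization_jsonld(name: str, url: str, founder_name: str, same_as: list[str]) -> str:
    """Build schema.org Organization markup for an entity home page."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "founder": {"@type": "Person", "name": founder_name},
        "sameAs": same_as,  # profiles that corroborate this entity
    }
    return json.dumps(data, indent=2)

markup = organization_jsonld(
    "Example Brand Ltd",                       # placeholder business
    "https://www.example.com/",
    "Jane Example",                            # placeholder founder
    ["https://www.wikidata.org/wiki/Q000000",  # placeholder Wikidata ID
     "https://www.linkedin.com/company/example-brand"],
)
print(f'<script type="application/ld+json">\n{markup}\n</script>')
```

Listing the Wikidata item in `sameAs` only helps once a legitimate item exists, which is why the advice above is to build the online footprint first.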

Dan Petrovic: Yeah, it’s not going to work. I refer to it as a resource for understanding the current makeup of the entities, because it’s not just Google. There are other systems, and those systems will use it both as training data and as a sort of crutch to lean on for grounding the models. So I think it’s an important resource. It didn’t cross my mind that I could try to inject my own entry in there, because I think there has to be a parallel, an actual Wikipedia page, for this to work.

James Dooley: There doesn’t need to be a Wikipedia page for it; you can inject your own information. So if I’ve got a new brand or business, I can create a Wikidata entry for it. But I need to be connecting it, ideally, with other entities. So if I say James Dooley is the founder of Petrovic SEO, as an example, and I’m saying that’s a business, then because it now has a connection with me, and I’m an entity, it works better. Whereas if it’s just you and you don’t have any sort of relevance online, it becomes a lot more difficult. You need to connect the entities. It’s almost like nodes and edges. You need to connect those relationships together, and the more connections you have on the web, the more likely it is to stick. And I’d say this isn’t just a hack where everyone should go and add a Wikidata entry. So many people don’t add themselves to Wikidata, and it’s so important to do it, as long as you’re a genuine business and you’ve got those connections. But yeah, if one hasn’t been created, then go in and create one for sure. Getting that in there is huge, because a lot of the time it triggers, in time, a knowledge panel, especially for an individual. If you can go and author a book, or even be on podcasts like this, you can get an IMDb profile and stuff like that, and all of that adds to the confidence and clarity score of who Dan Petrovic is. Repeating who you are and what you do builds the confidence score. It’s its own little algorithm, just on the knowledge graph and who people are, and I think Gemini is going to lean on it more and more in time.
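The “nodes and edges” idea maps directly onto Wikidata-style statements: subject, property, object. This toy sketch stores a few hypothetical statements and counts how many distinct entities each node is connected to, a crude proxy for the connectedness described above. All entity names are placeholders; P112 is Wikidata’s “founded by” property, while the other relation names are simplified illustrations rather than exact Wikidata properties.

```python
from collections import defaultdict

# Hypothetical (subject, property, object) statements.
statements = [
    ("Example SEO Ltd", "founded by (P112)", "Jane Example"),
    ("Jane Example", "author of", "Example Book"),
    ("Jane Example", "guest on", "Example Podcast"),
    ("Lonely Brand", "instance of", "business"),
]

def connection_counts(triples):
    """Count distinct neighbours per node, treating every statement as an undirected edge."""
    neighbours = defaultdict(set)
    for subject, _prop, obj in triples:
        neighbours[subject].add(obj)
        neighbours[obj].add(subject)
    return {node: len(linked) for node, linked in neighbours.items()}

counts = connection_counts(statements)
print(counts["Jane Example"], counts["Lonely Brand"])  # 3 1
```

The isolated “Lonely Brand” node is the situation James warns about: a new entry with a single generic edge has little to anchor it.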

Dan Petrovic: It’s very similar to Google’s internal knowledge graph. You mentioned graphs; I actually built the full graph.

James Dooley: Really?

Dan Petrovic: Yeah, the full graph. The whole of Wikidata. I’m showing my age, since I couldn’t recall the name. To be fair, it’s 8:00 p.m. I’m done. Mentally foggy already. So what I did is download the whole dataset, extract the label information, and then build the full undirected knowledge graph, where I treat the text label of each entity as a node, and obviously I’ve got edges. There was some data cleanup in there, because each label has multiple language versions as well.
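The graph-building step can be sketched against the format of Wikidata’s public JSON dump, where, apart from the opening and closing bracket lines, each line is one entity followed by a comma. This sketch keeps only the English label (one simple answer to the multiple-language problem) and turns every item-valued claim into an undirected edge. Unlike the label-as-node approach described above, it keys nodes by item ID and keeps labels in a lookup, since labels are not unique. The sample entities use placeholder IDs and are trimmed illustrations, not lines from the real dump, which is far too large to hold in memory like this.

```python
import json
from collections import defaultdict

def parse_dump_line(line: str):
    """Parse one line of a Wikidata JSON dump; return None for the bracket lines."""
    line = line.strip().rstrip(",")
    if line in ("[", "]", ""):
        return None
    return json.loads(line)

def build_graph(lines, language="en"):
    """Return (labels, edges): one label per entity and an undirected adjacency map."""
    labels, edges = {}, defaultdict(set)
    for line in lines:
        entity = parse_dump_line(line)
        if entity is None:
            continue
        label = entity.get("labels", {}).get(language)
        if label:
            labels[entity["id"]] = label["value"]
        for claims in entity.get("claims", {}).values():
            for claim in claims:
                datavalue = claim.get("mainsnak", {}).get("datavalue", {})
                if datavalue.get("type") == "wikibase-entityid":
                    target = datavalue["value"]["id"]
                    edges[entity["id"]].add(target)
                    edges[target].add(entity["id"])  # undirected edge
    return labels, edges

# Placeholder entities in the dump's shape (IDs are not real Wikidata items).
sample_entities = [
    {"id": "Q_BRAND",
     "labels": {"en": {"language": "en", "value": "Example Brand"},
                "de": {"language": "de", "value": "Beispielmarke"}},
     "claims": {"P112": [{"mainsnak": {"snaktype": "value", "datavalue": {
         "type": "wikibase-entityid",
         "value": {"entity-type": "item", "id": "Q_FOUNDER"}}}}]}},
    {"id": "Q_FOUNDER",
     "labels": {"en": {"language": "en", "value": "Example Founder"}},
     "claims": {}},
]
dump = ["["] + [json.dumps(entity) + "," for entity in sample_entities] + ["]"]
labels, edges = build_graph(dump)
print(labels)                     # {'Q_BRAND': 'Example Brand', 'Q_FOUNDER': 'Example Founder'}
print(sorted(edges["Q_FOUNDER"]))  # ['Q_BRAND']
```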

James Dooley: Yeah.

Dan Petrovic: So then you have to think about how to treat that, and this and that, but I’m not going to go into the details. I’m just reading from my screen now. It’s a 68 GB file, a SQLite database with full connectivity. And for the screen share you saw earlier, I actually generated vector embeddings of the entire knowledge graph.
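On the storage side, a graph like this fits naturally into two SQLite tables, one for nodes and one for edges, which Python’s built-in `sqlite3` module can query. The schema and rows below are a hypothetical sketch (an in-memory database with placeholder IDs), not the layout of Dan’s 68 GB file; P112 (“founded by”) and P50 (“author”) are real Wikidata property IDs used only as edge labels.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nodes (id TEXT PRIMARY KEY, label TEXT NOT NULL);
    CREATE TABLE edges (src TEXT NOT NULL, dst TEXT NOT NULL, property TEXT);
    CREATE INDEX edges_src ON edges (src);  -- fast lookups in both directions
    CREATE INDEX edges_dst ON edges (dst);
""")
conn.executemany("INSERT INTO nodes VALUES (?, ?)", [
    ("Q_BRAND", "Example Brand"),
    ("Q_FOUNDER", "Example Founder"),
    ("Q_BOOK", "Example Book"),
])
conn.executemany("INSERT INTO edges VALUES (?, ?, ?)", [
    ("Q_BRAND", "Q_FOUNDER", "P112"),
    ("Q_BOOK", "Q_FOUNDER", "P50"),
])

def neighbour_labels(conn: sqlite3.Connection, node_id: str) -> list[str]:
    """Labels of every node connected to node_id, following edges both ways."""
    rows = conn.execute("""
        SELECT n.label FROM edges e JOIN nodes n ON n.id = e.dst WHERE e.src = ?
        UNION
        SELECT n.label FROM edges e JOIN nodes n ON n.id = e.src WHERE e.dst = ?
        ORDER BY 1
    """, (node_id, node_id)).fetchall()
    return [label for (label,) in rows]

print(neighbour_labels(conn, "Q_FOUNDER"))  # ['Example Book', 'Example Brand']
```

Querying edges in both directions is what makes the stored, directed statements behave as the undirected graph described in the episode.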

James Dooley: Yeah.

Dan Petrovic: So I now have a semantic search engine where, if I type in Rand Fishkin, for example, it gives me high cosine similarity to SEO but low cosine similarity to cake making. Right. So basically, this gives you a window of insight, and these embeddings are generated by Gemini. It’s how Gemini thinks about brands. You can put your brand in as a search term and it will return the concepts most aligned with that brand in the semantic space of Google’s embedding model. Same technology as Gemini, the model in AI search. Just think about the utility of that.
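A semantic search like this boils down to ranking stored entity embeddings by cosine similarity to a query embedding. In the sketch below, the three-dimensional vectors are invented toy values standing in for real embedding-model output, which has hundreds of dimensions; only the ranking logic is the point.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def most_similar(query_vec: list[float], index: dict[str, list[float]], top_k: int = 3):
    """Rank indexed concepts by cosine similarity to the query embedding."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]

# Toy vectors (invented, not real embeddings).
index = {
    "SEO": [0.9, 0.1, 0.0],
    "Link building": [0.7, 0.4, 0.1],
    "Cake making": [0.0, 0.2, 0.95],
}
query = [0.85, 0.2, 0.05]  # pretend embedding of a well-known SEO's name
for name, score in most_similar(query, index):
    print(f"{name}: {score:.3f}")
```

At the scale of a full knowledge graph, the same ranking would normally run through an approximate nearest-neighbour index rather than a linear scan.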

James Dooley: Yeah, it’s unbelievable.

Dan Petrovic: It’s great for keyword research, great for clustering, great for keyword classification. It’s good for keyword gap analysis, content ideas, link building. It’s just insane that this data is free and available for us to use. But if it wasn’t for AI, I would never have been able to implement this. So I’m just super grateful that we live post AI revolution, where we can do all these things.

James Dooley: It’s crazy how nearly every single episode we’ve done, and this one is about link building for AI, comes back down to building brand again. It’s genuinely about getting yourself into the knowledge graph. In my opinion, it’s only brands that are in there, or obviously individuals, people, businesses and stuff like that. But the more confident they are in yourself, in it... Dan Petrovic, it’s been an absolute pleasure.

James Dooley: We hope you liked the video on link building and what has changed in the AI era. I strongly recommend checking out a couple of the links in the description. There’s one about the future of SEO and another, over 45 minutes long, about how to optimise for the LLMs: ChatGPT, Gemini, Perplexity, and all the other AI platforms out there. Dan, it’s been an absolute pleasure. Thank you very much.

Dan Petrovic: Thanks, James.

Creators & Guests

James Dooley – Host

James Dooley is a UK entrepreneur.
