No-Code Solution for Internal Linking Automation [Part 1]

Over the past few days, I’ve built a backend content-processing system inside Pabbly Connect that automatically turns our entire sitemap into a structured knowledge database.

This system now works as the foundation for all my internal-linking automation and will keep improving as new blogs get published.

TL;DR

I’ve built a backend Pabbly flow that:

  • extracts content from every URL

  • cleans and structures it

  • generates an embedding summary

  • stores it in a database

  • and prepares it for internal linking automation

This system will now quietly run in the background and strengthen our entire content ecosystem while reducing manual work and improving SEO performance.


1. What the Backend Does

The flow starts by pulling all URLs from the sitemap.
For each URL, it performs a full processing pipeline:

a) Fetch the content

The flow loads the page HTML and prepares it for cleaning.
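The actual fetch happens inside Pabbly Connect, but the equivalent step in plain Python is a simple HTTP download. This is only an illustrative sketch; the URL and user-agent are placeholders.

```python
# Minimal sketch of the fetch step: download the raw page HTML so the
# later cleaning and parsing steps have something to work on.
import urllib.request

def fetch_html(url: str, timeout: int = 10) -> str:
    """Download the raw HTML of a page (placeholder for the Pabbly fetch step)."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```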

b) Extract key fields

Using regex + text parsing, the system automatically extracts:

  • Title

  • All H2 and H3 headings (in sequence)

  • Cleaned Body Text (core editorial content only)

This ensures we’re not embedding junk like navigation, footers, images, CTA sections, and forms.
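The extraction step can be sketched roughly as follows. The flow itself does this with Pabbly's built-in regex and text-parsing actions; this Python version is only a simplified illustration (real-world HTML varies, so a production version would need more robust handling).

```python
# Rough sketch of the "extract key fields" step using regex + text parsing.
import re
from html import unescape

def extract_fields(html: str) -> dict:
    """Pull out the title, H2/H3 headings (in document order), and a cleaned body."""
    title_m = re.search(r"<title[^>]*>(.*?)</title>", html, re.I | re.S)
    title = unescape(title_m.group(1).strip()) if title_m else ""

    # Capture H2 and H3 headings in the order they appear.
    headings = [
        (tag.lower(), unescape(re.sub(r"<[^>]+>", "", text)).strip())
        for tag, text in re.findall(r"<(h[23])[^>]*>(.*?)</\1>", html, re.I | re.S)
    ]

    # Strip scripts, styles, nav/footer/form blocks, then all remaining tags.
    body = re.sub(r"<(script|style|nav|footer|form)[^>]*>.*?</\1>", " ",
                  html, flags=re.I | re.S)
    body = re.sub(r"<[^>]+>", " ", body)
    body = re.sub(r"\s+", " ", unescape(body)).strip()

    return {"title": title, "headings": headings, "body": body}
```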

c) Generate embedding summaries

Once the structural elements are clean, a custom AI prompt produces a compact embedding summary that represents the semantic meaning of the entire page.

This summary is:

  • highly compressed

  • optimized for semantic search

  • ideal for matching related content

  • consistent across all articles
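The exact prompt used in the flow is custom, so the version below is purely hypothetical — a sketch of how the extracted fields could be assembled into a summarization prompt for whatever LLM step the flow calls. The wording and truncation limit are illustrative assumptions, not the real prompt.

```python
# Hypothetical sketch: assemble the extracted fields into a prompt asking
# an LLM for a compact, search-optimized embedding summary.
def build_summary_prompt(title: str, headings: list, body: str,
                         max_body_chars: int = 4000) -> str:
    """Build a summarization prompt from the structured page fields."""
    outline = "\n".join(f"- {h}" for h in headings)
    return (
        "Summarize this article in 2-3 dense sentences that capture its "
        "core topic and subtopics, for use in semantic similarity search.\n\n"
        f"Title: {title}\n"
        f"Headings:\n{outline}\n\n"
        f"Body (truncated):\n{body[:max_body_chars]}"
    )
```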

d) Store everything in a backend database

The final structured output saved per URL includes:

  • URL

  • Title

  • H2/H3 list

  • Cleaned body text

  • Embedding summary

  • Timestamp of processing

This becomes our content intelligence database.
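The flow writes to a Pabbly-connected database, but the record shape above is easy to picture as a table. Here is one equivalent way to persist it locally with SQLite (schema and names are illustrative):

```python
# Illustrative local equivalent of the backend store: one row per URL.
import json
import sqlite3
from datetime import datetime, timezone

def init_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create the content-intelligence table if it doesn't exist."""
    conn = sqlite3.connect(path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS pages (
            url TEXT PRIMARY KEY,
            title TEXT,
            headings TEXT,          -- H2/H3 list, stored as JSON
            body TEXT,
            embedding_summary TEXT,
            processed_at TEXT       -- ISO-8601 timestamp of processing
        )
    """)
    return conn

def save_page(conn, url, title, headings, body, summary):
    """Insert or refresh the structured record for one URL."""
    conn.execute(
        "INSERT OR REPLACE INTO pages VALUES (?, ?, ?, ?, ?, ?)",
        (url, title, json.dumps(headings), body, summary,
         datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()
```

Re-running the pipeline on an existing URL simply replaces its row, so the database always reflects the latest processed version of each page.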


2. Why This Matters for Internal Linking

Internal linking works best when:

  • You know EXACTLY what each blog is about

  • You can semantically match new content with old content

  • You have a clean store of topics, subtopics, and summaries

Instead of manually evaluating hundreds of posts, the system now gives us:

  • Automated topic similarity

  • Accurate linking opportunities

  • Anchor text suggestions aligned with headings

  • Content clusters that update automatically

This turns the entire site into a living knowledge graph.


3. How This Will Power the Internal Linking Automation

When a new blog URL is added:

  1. Pabbly re-runs the full pipeline

  2. It generates the embedding summary for the new post

  3. It compares the new post’s summary with the database

  4. It finds the most semantically related older posts

  5. It recommends:

    • which blogs to link to

    • where to insert the link based on heading match

    • what anchor text to use
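Step 4 above — finding the most semantically related older posts — boils down to a similarity ranking. Assuming each embedding summary has been turned into a numeric vector by whatever embedding model the flow uses, a cosine-similarity sketch looks like this (the vectors and database shape are illustrative):

```python
# Sketch of the matching step: rank stored posts by cosine similarity
# between the new post's embedding vector and each stored vector.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def most_related(new_vec, db, top_k=3):
    """db maps url -> embedding vector. Return the top_k most similar URLs."""
    ranked = sorted(db, key=lambda url: cosine(new_vec, db[url]), reverse=True)
    return ranked[:top_k]
```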

The flow then:

  • sends a report

  • logs it in Trello

  • and optionally triggers updates to the older posts (planned extension)

This means we never miss a linking opportunity again.


4. Big Picture: This Is a Mini SEO Engine

This backend system now functions as an internal SEO assistant that:

  • monitors content

  • understands content

  • links content

  • keeps updating relationships between articles

  • massively strengthens topical authority

Over time, the site becomes far less likely to decay, because the system keeps its structure alive and highly interconnected.


5. Future Upgrades Planned

  • auto-updating old posts with new sections

  • rewriting old intros using new keywords

  • generating cluster-level summaries

  • scoring similarity across the entire site

  • powering a site search based on embeddings

  • building an SEO dashboard on top of the embeddings DB