No-Code Solution for Internal Linking Automation [Part 1]
Over the past few days, I’ve built a backend content-processing system inside Pabbly Connect that automatically turns our entire sitemap into a structured knowledge database.
This system now works as the foundation for all my internal-linking automation and will keep improving as new blogs get published.
TL;DR
I’ve built a backend Pabbly flow that:
extracts content from every URL
cleans and structures it
generates an embedding summary
stores it in a database
and prepares it for internal linking automation
The flow runs in the background on every URL in the sitemap, which reduces manual work and should improve SEO performance through more consistent internal linking.
1. What the Backend Does
The flow starts by pulling all URLs from the sitemap.
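Outside Pabbly, the sitemap step could be sketched in Python as below. This is only an illustration, assuming a standard sitemaps.org `urlset` file; the function name `urls_from_sitemap` and the sample URLs are hypothetical and not part of the actual flow.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap files (sitemaps.org protocol).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text: str) -> list[str]:
    """Return every <loc> value found in a sitemap document, in order."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Hypothetical sample sitemap, used only for illustration.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/blog/post-a</loc></url>
  <url><loc>https://example.com/blog/post-b</loc></url>
</urlset>"""

if __name__ == "__main__":
    for url in urls_from_sitemap(SAMPLE_SITEMAP):
        print(url)
```

Each URL returned this way would then be handed to the per-URL pipeline.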
For each URL, it performs a full processing pipeline:
a) Fetch the content
The flow loads the page HTML and prepares it for cleaning.
b) Extract key fields
Using regex and text parsing, the system automatically extracts:
Title
All H2 and H3 headings (in sequence)
Cleaned Body Text (core editorial content only)
This keeps navigation, footers, images, CTA sections, forms, and similar non-editorial elements out of the text we embed.
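For readers who want to see the idea in code, here is a minimal sketch of the extraction step. The Pabbly flow uses regex and text parsing; this sketch swaps in Python's built-in `html.parser` instead, because it is easier to read as an example. The class name `ArticleExtractor`, the `SKIP_TAGS` list, and the choice to treat only `<p>` text as body content are all assumptions made for illustration.

```python
from html.parser import HTMLParser

# Assumption: containers whose text is treated as non-editorial.
SKIP_TAGS = {"nav", "footer", "header", "form", "script", "style", "aside"}
CAPTURE_TAGS = {"title", "h2", "h3", "p"}


class ArticleExtractor(HTMLParser):
    """Collects the title, H2/H3 headings in order, and paragraph text."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.headings = []      # (tag, text) pairs in document order
        self.body_parts = []    # cleaned paragraph strings
        self._skip_depth = 0    # > 0 while inside a skipped container
        self._current = None    # tag currently being captured
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif self._skip_depth == 0 and tag in CAPTURE_TAGS:
            self._current = tag
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag == self._current:
            # Collapse runs of whitespace into single spaces.
            text = " ".join("".join(self._buffer).split())
            if tag == "title":
                self.title = text
            elif tag in ("h2", "h3"):
                self.headings.append((tag, text))
            elif text:
                self.body_parts.append(text)
            self._current = None

    def handle_data(self, data):
        if self._skip_depth == 0 and self._current:
            self._buffer.append(data)


def extract_fields(html: str) -> dict:
    parser = ArticleExtractor()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title,
        "headings": parser.headings,
        "body": "\n".join(parser.body_parts),
    }
```

Text inside inline tags such as links stays part of the surrounding paragraph, while anything inside a skipped container is dropped. Real pages will likely need a longer skip list and site-specific rules.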
c) Generate embedding summaries
Once the structural elements are clean, a custom AI prompt produces a compact embedding summary that represents the semantic meaning of the entire page.
This summary is:
highly compressed
optimized for semantic search
ideal for matching related content
consistent across all articles
d) Store everything in a backend database
The final structured output saved per URL includes:
URL
Title
H2/H3 list
Cleaned body text
Embedding summary
Timestamp of processing
This becomes our content intelligence database.
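The record layout above maps naturally onto a single table. The sketch below uses SQLite purely as a stand-in, since it ships with Python; the actual backend database, the table name `pages`, and the column names are assumptions for illustration only.

```python
import json
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema mirroring the fields listed above, one row per URL.
SCHEMA = """
CREATE TABLE IF NOT EXISTS pages (
    url          TEXT PRIMARY KEY,
    title        TEXT NOT NULL,
    headings     TEXT NOT NULL,  -- JSON-encoded list of H2/H3 headings
    body         TEXT NOT NULL,
    summary      TEXT NOT NULL,  -- the embedding summary
    processed_at TEXT NOT NULL   -- ISO 8601 timestamp in UTC
)
"""


def save_page(conn, url, title, headings, body, summary):
    """Insert a processed page, replacing any earlier row for the same URL."""
    conn.execute(SCHEMA)
    conn.execute(
        "INSERT OR REPLACE INTO pages VALUES (?, ?, ?, ?, ?, ?)",
        (
            url,
            title,
            json.dumps(headings),
            body,
            summary,
            datetime.now(timezone.utc).isoformat(),
        ),
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    save_page(
        conn,
        "https://example.com/blog/post-a",
        "Post A",
        ["H2: Intro", "H3: Details"],
        "Cleaned body text.",
        "Compact summary of the page.",
    )
    print(conn.execute("SELECT url, title FROM pages").fetchall())
```

Using the URL as the primary key means re-processing a page overwrites its old record instead of creating a duplicate.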
2. Why This Matters for Internal Linking
Internal linking works best when:
You know exactly what each blog post is about
You can semantically match new content with old content
You have a clean store of topics, subtopics, and summaries
Instead of manually evaluating hundreds of posts, the system now gives us:
Automated topic similarity
Accurate linking opportunities
Anchor text suggestions aligned with headings
Content clusters that update automatically
In effect, the site becomes a knowledge graph that updates as new posts are published.
3. How This Will Power the Internal Linking Automation
When a new blog URL is added:
Pabbly runs the full pipeline on the new URL
It generates the embedding summary for the new post
It compares the new post’s summary with the database
It finds the most semantically related older posts
It recommends:
which blogs to link to
where to insert the link based on heading match
what anchor text to use
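The comparison step can be illustrated with cosine similarity, a common way to rank semantic closeness. This sketch assumes each stored summary has already been converted into a numeric vector by some embedding model; that model, the helper name `top_related`, and the tiny sample vectors are hypothetical, and the real flow may compare summaries differently.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_related(new_vector, stored_vectors, k=3):
    """Return the k stored URLs most similar to the new post, best first."""
    scored = [
        (url, cosine_similarity(new_vector, vector))
        for url, vector in stored_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical two-dimensional vectors; real embeddings are much longer.
    stored = {
        "https://example.com/blog/post-a": [1.0, 0.0],
        "https://example.com/blog/post-b": [0.0, 1.0],
        "https://example.com/blog/post-c": [1.0, 1.0],
    }
    for url, score in top_related([1.0, 0.1], stored, k=2):
        print(url, round(score, 3))
```

The top-ranked URLs would be the candidates for "which blogs to link to"; heading matching and anchor text selection would then run on those candidates only.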
The flow then:
sends a report
logs it in Trello
and optionally triggers updates to the older posts (planned extension)
This means every new post is checked for linking opportunities automatically instead of relying on manual review.
4. Big Picture: This Is a Mini SEO Engine
This backend system now functions as an internal SEO assistant that:
monitors content
understands content
links content
keeps updating relationships between articles
strengthens topical authority
Over time, this should slow content decay, because the system keeps refreshing the link structure and keeps articles interconnected.
5. Future Upgrades Planned
auto-updating old posts with new sections
rewriting old intros using new keywords
generating cluster-level summaries
scoring similarity across the entire site
powering a site search based on embeddings
building an SEO dashboard on top of the embeddings DB

