Geo.vote March 8, 2026

DIY AI Visibility Tracker: Building with Open Source Tools and pgvector

By Maggie

Vision in Plain Sight: Your Brand Meets AI

Ever wondered how your brand shows up when someone asks an AI assistant? With open source AI tools, you can finally peek behind the curtain. No more guessing. No more blind spots. This guide walks you through building your own AI visibility tracker using PostgreSQL, the pgvector extension and popular open source AI frameworks.

We’ll cover everything from spinning up your database to running semantic searches that reveal how AI describes your offerings. Ready to demystify AI visibility? See how open source AI tools can deliver AI Visibility Tracking for Small Businesses.

1. Why Small Businesses Need an AI Visibility Tracker

Traditional analytics track clicks and pageviews. AI assistants? They organise and summarise based on embeddings and context, not URLs. That means:
* You don’t know which answers mention your brand.
* You can’t see how competitors rank in AI responses.
* You miss out on optimising for generative engine queries.

Building your own tracker with open source AI tools levels the playing field. You’ll gain insights into brand mentions, context, and comparisons—without the hefty subscription fees of enterprise platforms.

1.1 Competitor Snapshot

Most tools focus on SEO for search engines, not generative engines:
– SEMrush and Ahrefs excel at keyword tracking but lack AI context analysis.
– Moz and BuzzSumo dig into content performance yet miss AI-driven narratives.
– Google Analytics tracks web traffic but can’t tell you how AI models describe your brand.

Our DIY solution fills that gap. You’ll harness open source AI tools, keep costs low, and get precise data on how AI chatbots see your business.

2. Setting Up Your Infrastructure

Before we dive into code, let’s outline what you need:
1. A PostgreSQL database (cloud or local).
2. The pgvector extension.
3. Python 3 with pip, plus GPU drivers if you want speed.
4. Access to a vector embedding model on Hugging Face.

2.1 Installing pgvector on PostgreSQL

Install pgvector via your package manager or build from source. On Kubernetes, you can use Percona Operator for PostgreSQL’s Custom Extension feature:

kubectl apply -f bundle.yaml --server-side
kubectl apply -f s3-secret.yaml
kubectl apply -f init.yaml
kubectl apply -f cr.yaml

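This sets up a cluster with pgvector loaded. If you’re on a VM or bare metal, install the package first, then run CREATE EXTENSION IF NOT EXISTS vector; in the database that will hold your embeddings. The extension is enabled per database, not per server.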

2.2 Creating the Embeddings Table

In PostgreSQL, create a table to store snippets and their embeddings:

CREATE TABLE ai_visibility (
  id SERIAL PRIMARY KEY,
  source_url TEXT,
  content TEXT,
  embedding VECTOR(1024)
);

We use 1024 dimensions to match the UAE-Large-V1 model from Hugging Face. Pick your model carefully—switching later means re-embedding everything.
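
Because the column is declared as VECTOR(1024), PostgreSQL rejects any vector of a different length at insert time. A minimal sketch of catching the mismatch earlier, in Python, could look like this. The name check_dimension is a hypothetical helper for illustration, not part of pgvector:

```python
EXPECTED_DIM = 1024  # must match VECTOR(1024) in the ai_visibility table


def check_dimension(vector, expected=EXPECTED_DIM):
    """Return the vector unchanged, or raise if its length does not fit the column.

    Hypothetical helper for illustration; it is not a pgvector API.
    """
    actual = len(vector)
    if actual != expected:
        raise ValueError(
            f"embedding has {actual} dimensions, but the column expects {expected}"
        )
    return vector
```

Calling it right after the model produces a vector makes a model swap fail loudly in your script rather than halfway through a batch of inserts.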

3. Capturing and Indexing Your Data

This section shows how to gather your content, chunk it, and store embeddings in the database.

3.1 Scraping and Cleaning Your Pages

Use the BeautifulSoup library to extract meaningful text from your site or blog:

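import requests
from bs4 import BeautifulSoup

def extract_text(url):
    # Fetch the page; fail fast on timeouts and HTTP errors.
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    # Drop share widgets, comments and related-post blocks before extracting text.
    for div in soup.select(".share-wrap, .comments-sec, #jp-relatedposts"):
        div.decompose()
    # These selectors match one blog theme; adjust them to your own markup.
    body = soup.select_one(".blog-content-inner")
    if body is None:
        raise ValueError(f"No .blog-content-inner element found at {url}")
    return body.get_text(separator="\n", strip=True)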

Once you have raw text, split it into paragraphs, or use LangChain’s MarkdownTextSplitter for Markdown files.
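
If you would rather not pull in a framework for plain paragraphs, the splitting step can be sketched in a few lines of standard-library Python. The function name chunk_paragraphs and the 1000-character default are assumptions for illustration; tune the limit to your content:

```python
def chunk_paragraphs(text, max_chars=1000):
    """Split text on newlines and pack paragraphs into chunks of at most max_chars.

    A single paragraph longer than max_chars is kept whole rather than cut mid-sentence.
    """
    chunks, current = [], ""
    for paragraph in text.split("\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue  # skip blank lines left over from the HTML
        candidate = f"{current}\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks
```

Pairing each chunk with the URL it came from, for example [(c, url) for c in chunk_paragraphs(extract_text(url))], gives you the (chunk, source) pairs that the storage step expects.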

3.2 Generating Embeddings

Load a sentence transformer and create vectors:

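import torch
from sentence_transformers import SentenceTransformer

# Use the GPU when one is available; otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = SentenceTransformer("WhereIsAI/UAE-Large-V1", device=device)

def embed(text):
    # encode() takes a list and returns one 1024-dimensional vector per input.
    return model.encode([text])[0]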

3.3 Storing in pgvector

Connect with psycopg2 and register the vector type:

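import os

import psycopg2
from pgvector.psycopg2 import register_vector

# Assumes the connection string is stored in the DATABASE_URL environment variable.
conn = psycopg2.connect(os.environ["DATABASE_URL"])
register_vector(conn)  # lets psycopg2 send NumPy arrays as vector values
cur = conn.cursor()

# chunks_and_sources: (chunk_text, source_url) pairs built from section 3.1
for chunk, url in chunks_and_sources:
    vec = embed(chunk)
    cur.execute(
        "INSERT INTO ai_visibility (content, source_url, embedding) VALUES (%s, %s, %s)",
        (chunk, url, vec),
    )
conn.commit()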

4. Querying for Brand Mentions

Once data is in place, you can semantically search for any mention.

4.1 Fast Similarity Search

Define a helper in SQL:

CREATE OR REPLACE FUNCTION match_snippets(query_embedding VECTOR)
RETURNS TABLE(similarity FLOAT, source_url TEXT, content TEXT) AS $$
  -- <=> is pgvector's cosine distance, so 1 - distance is cosine similarity.
  SELECT 1 - (embedding <=> query_embedding), source_url, content
  FROM ai_visibility
  ORDER BY embedding <=> query_embedding
  LIMIT 5;
$$ LANGUAGE SQL STABLE;  -- STABLE, not IMMUTABLE: the function reads a table

Then in Python:

def quick_search(query):
    q_vec = embed(query)
    cur.execute("SELECT * FROM match_snippets(%s)", (q_vec,))
    return cur.fetchall()
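
To make the similarity column easier to interpret, here is a small pure-Python sketch of the same arithmetic the SQL performs. It is for intuition only; in the tracker the database does this work, and the function names are illustrative:

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two equal-length, non-zero vectors: 1 - cos(angle)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def similarity(a, b):
    """Mirror of the SQL expression 1 - (embedding <=> query_embedding)."""
    return 1.0 - cosine_distance(a, b)
```

Vectors pointing the same way score close to 1, unrelated (orthogonal) vectors score close to 0, and opposite vectors score close to -1. Vector length does not matter, only direction.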

4.2 Framing User Prompts

Combine the top snippets with the user’s question. For example:

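from transformers import pipeline

qa = pipeline("text2text-generation", model="t5-small", tokenizer="t5-small")

# question: the user's query string, defined elsewhere in your script.
# row[2] is the content column returned by match_snippets.
context = "\n\n".join(row[2] for row in quick_search(question))
prompt = f"question: {question}\ncontext: {context}"
# t5-small was trained on inputs of up to 512 tokens, so truncate longer prompts.
answer = qa(prompt, max_length=150, truncation=True)[0]["generated_text"]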
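
Small models have tight input limits, so it can help to cap how much context goes into the prompt. The sketch below keeps the best-ranked snippets until a character budget runs out. The name build_prompt and the 1500-character default are assumptions, and characters are only a rough stand-in for tokens:

```python
def build_prompt(question, snippets, max_context_chars=1500):
    """Join snippets in ranked order until the character budget is used up.

    The budget counts snippet text only, not the blank lines between snippets.
    """
    selected, used = [], 0
    for snippet in snippets:
        if used + len(snippet) > max_context_chars:
            break  # stop at the first snippet that would overflow the budget
        selected.append(snippet)
        used += len(snippet)
    context = "\n\n".join(selected)
    return f"question: {question}\ncontext: {context}"
```

Because the search results arrive sorted by similarity, stopping at the first overflow drops the least relevant snippets first.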

5. Building Your Visibility Dashboard

With queries in place, aggregate results into a simple web dashboard:
– Show brand mention count over time.
– Highlight top competitor mentions.
– Visualise similarity scores.

You can use Flask or another lightweight framework. Keep it simple so even non-technical team members can interpret AI visibility data at a glance.
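
The "mention count over time" panel boils down to a small aggregation. A minimal sketch follows, assuming you log each retrieved snippet together with the date of the run. The ai_visibility table has no timestamp column, so that log is an addition you would make yourself, and mentions_per_day is a hypothetical helper:

```python
from collections import Counter
from datetime import date


def mentions_per_day(records, brand):
    """Count records whose text mentions the brand, grouped by day.

    records: iterable of (day, text) pairs, where day is a datetime.date.
    Matching is a plain case-insensitive substring test.
    """
    needle = brand.lower()
    counts = Counter(day for day, text in records if needle in text.lower())
    return dict(sorted(counts.items()))
```

The resulting date-to-count mapping can be passed straight to a Flask template or a charting library. Running the same function with a competitor's name gives you the comparison series.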

6. Optimisation Tips and Best Practices

– Chunk size matters. Too long and you lose relevance. Too short and context vanishes.
– Choose cosine distance. It’s the de facto metric for semantic similarity, and in pgvector it is the <=> operator.
– Fine-tune where possible. Training on your own FAQ dataset can boost precision.

For GEO-specific reach, combine this data with local keywords to improve regional recommendations.

7. How This Beats Paid Platforms

Paid tools like SEMrush, Ahrefs, Moz or Brandwatch:
– Lock you into subscription tiers.
– Track only traditional SEO metrics.
– Don’t reveal how AI assistants characterise your brand.

Your DIY tracker:
– Leverages open source AI tools.
– Runs in your database—no extra vendor lock-in.
– Targets AI-generated narratives head on.

And you control every bit of data.

8. Real Voices: AI Visibility in Action

“Implementing the DIY tracker gave us concrete proof of how AI sees our product. We spotted competitor mentions we never knew existed and adjusted our messaging accordingly.”
— Maria Thompson, founder of GreenLeaf Organics

“I was skeptical about running my own embeddings. But the step-by-step guide made it easy. Now I know exactly when AI assistants recommend my site.”
— Oliver Grant, CEO of BrightHomes

Conclusion and Next Steps

You’ve seen how open source AI tools, PostgreSQL and pgvector can be combined to build an AI visibility tracker. No black-box subscriptions. No guesswork. Just clear insights into how generative engines describe your brand.

Ready to take control of your AI narrative?

Run your own AI visibility project today and stay steps ahead in the generative era.