Entity-Based Topic Clustering Tool

A 100% client-side knowledge graph and topic clustering tool that extracts named entities from your page titles and meta descriptions, resolves them to Wikidata, and visualizes the relationships — all running in your browser's RAM.

Live demo: https://client-side-kg.pages.dev

What It Does

Paste your website's titles and meta descriptions (supports thousands of pages)
ML models run locally in your browser to extract named entities (people, organizations, locations) and compute semantic embeddings
Wikidata resolution maps each entity to its canonical knowledge graph ID (QID)
Three visualization modes let you explore the relationships
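The README does not show how app.js turns raw NER predictions into whole entities. If the token-classification model is run without any aggregation, it emits one BIO tag per WordPiece token (B-PER, I-PER, O, ...). The sketch below merges those tokens into entity spans. The `NerToken` shape is an assumption modeled on that kind of output, and `mergeEntities` and the 0.5 score cutoff are hypothetical.

```typescript
// Hypothetical shape of one token-level NER prediction (assumed, not taken
// from app.js). Continuation WordPiece tokens start with "##".
interface NerToken {
  entity: string; // BIO tag such as "B-PER", "I-ORG" or "O"
  word: string;
  score: number;
}

interface Entity {
  text: string;
  label: string; // "PER" | "ORG" | "LOC"
  score: number; // mean score of the tokens in the span
}

type Span = { words: string[]; label: string; scores: number[] };

// Close an open span and keep it if its mean score passes the cutoff.
function finish(span: Span | null, out: Entity[], minScore: number): void {
  if (!span) return;
  const score = span.scores.reduce((a, b) => a + b, 0) / span.scores.length;
  if (score >= minScore) {
    out.push({ text: span.words.join(" "), label: span.label, score });
  }
}

// Merge BIO-tagged tokens into entity spans (hypothetical helper).
function mergeEntities(tokens: NerToken[], minScore = 0.5): Entity[] {
  const out: Entity[] = [];
  let span: Span | null = null;

  for (const tok of tokens) {
    if (tok.entity === "O") {
      finish(span, out, minScore);
      span = null;
      continue;
    }
    const [prefix, label] = tok.entity.split("-");
    const isPiece = tok.word.startsWith("##");
    const piece = isPiece ? tok.word.slice(2) : tok.word;

    // Continue the current span on an I- tag or a "##" subword of the same type.
    if (span !== null && span.label === label && (prefix === "I" || isPiece)) {
      if (isPiece) span.words[span.words.length - 1] += piece;
      else span.words.push(piece);
      span.scores.push(tok.score);
    } else {
      finish(span, out, minScore);
      span = { words: [piece], label, scores: [tok.score] };
    }
  }
  finish(span, out, minScore);
  return out;
}
```

Treating a "##" piece as a continuation even when it is tagged B- guards against models that occasionally restart a tag mid-word.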

No server and no API key are needed, and no data leaves your machine except the entity names sent to the public Wikidata API for lookup.
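The README does not name the Wikidata endpoint the tool calls. A common choice for mapping a name to a QID is the `wbsearchentities` action of the MediaWiki API, and the sketch below assumes that endpoint and simply takes the first hit. `buildSearchUrl`, `topMatch` and `resolveEntity` are hypothetical names. Only the entity string goes over the wire, which matches the privacy note above.

```typescript
// One search hit from wbsearchentities (only the fields used here).
interface WikidataMatch {
  id: string; // QID, e.g. "Q76"
  label?: string;
  description?: string;
}

// Build a wbsearchentities URL. origin=* allows anonymous CORS requests
// from a browser page.
function buildSearchUrl(name: string, language = "en", limit = 3): string {
  const params = new URLSearchParams({
    action: "wbsearchentities",
    search: name,
    language,
    limit: String(limit),
    format: "json",
    origin: "*",
  });
  return `https://www.wikidata.org/w/api.php?${params.toString()}`;
}

// Take the first (highest-ranked) hit, or null when nothing matched.
function topMatch(response: { search?: WikidataMatch[] }): WikidataMatch | null {
  return response.search?.[0] ?? null;
}

// Resolve one entity name to its best Wikidata match (hypothetical helper).
async function resolveEntity(name: string, language = "en"): Promise<WikidataMatch | null> {
  const res = await fetch(buildSearchUrl(name, language));
  if (!res.ok) throw new Error(`Wikidata request failed: HTTP ${res.status}`);
  return topMatch((await res.json()) as { search?: WikidataMatch[] });
}
```

Picking the first hit is the simplest disambiguation rule. A stricter version could compare each candidate's `description` against the NER label (PER, ORG or LOC) before accepting it.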

Visualization Modes

Mode	Best For
2D Network Graph	Seeing entity co-occurrence and semantic similarity edges between pages
3D t-SNE Map	Exploring how pages and entities cluster in semantic space (rotatable 3D scatter)
SEO Topic Clusters	Interactive treemap showing which entities are your content pillars and which pages belong to each
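The co-occurrence edges in the 2D graph can be derived by counting how many pages mention each pair of entities. The sketch below covers only that part (not the page-to-page similarity edges) and emits `{ id, label, value }` nodes and `{ from, to, value }` edges. vis-network accepts these shapes and can scale node and edge size from `value`. `buildCooccurrenceGraph` is a hypothetical name, not a function from app.js.

```typescript
interface GraphNode { id: string; label: string; value: number }
interface GraphEdge { from: string; to: string; value: number }

// pageEntities: one list of entity IDs (e.g. QIDs) per page.
// labels: optional display names keyed by entity ID.
function buildCooccurrenceGraph(
  pageEntities: string[][],
  labels: Record<string, string> = {},
): { nodes: GraphNode[]; edges: GraphEdge[] } {
  const nodeCounts = new Map<string, number>(); // pages per entity
  const edgeCounts = new Map<string, number>(); // pages per entity pair

  for (const entities of pageEntities) {
    // Deduplicate and sort so each pair gets one canonical key.
    const unique = [...new Set(entities)].sort();
    for (const id of unique) nodeCounts.set(id, (nodeCounts.get(id) ?? 0) + 1);
    for (let i = 0; i < unique.length; i++) {
      for (let j = i + 1; j < unique.length; j++) {
        const key = `${unique[i]}|${unique[j]}`;
        edgeCounts.set(key, (edgeCounts.get(key) ?? 0) + 1);
      }
    }
  }

  const nodes = [...nodeCounts].map(([id, value]) => ({ id, label: labels[id] ?? id, value }));
  const edges = [...edgeCounts].map(([key, value]) => {
    const [from, to] = key.split("|");
    return { from, to, value };
  });
  return { nodes, edges };
}
```

With the standalone vis-network bundle the result could be drawn with `new vis.Network(container, buildCooccurrenceGraph(pageEntities), {})`, where `container` is a DOM element of your choosing.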

Tech Stack

Everything loads via CDN. Zero build step required.

Layer	Library	Purpose
NER	`Xenova/distilbert-base-multilingual-cased-ner-hrl` via Transformers.js	Extract person / org / location entities from text
Embeddings	`Xenova/all-MiniLM-L6-v2` via Transformers.js	384-dim sentence vectors for similarity and t-SNE
Entity Resolution	Wikidata API	Map entity names to canonical QIDs with descriptions
2D Graph	vis-network	Force-directed entity relationship graph
3D Scatter	Plotly.js + tsne-js	Interactive 3D t-SNE visualization
Treemap	Plotly.js	Hierarchical topic cluster view
Styling	Tailwind CSS	UI framework
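Plotly draws a treemap from parallel `ids`, `labels`, `parents` and `values` arrays. A minimal sketch of turning entity clusters into that shape, assuming clusters arrive as a map from entity name to the titles of pages that mention it. `buildTreemap`, the "all" root and the `minPages` cutoff are hypothetical choices.

```typescript
// Minimal subset of a Plotly treemap trace.
interface TreemapTrace {
  type: "treemap";
  ids: string[];
  labels: string[];
  parents: string[];
  values: number[];
}

// clusters: entity name -> titles of pages mentioning it.
// minPages: hide entities that appear on fewer pages than this.
function buildTreemap(clusters: Map<string, string[]>, minPages = 2): TreemapTrace {
  const trace: TreemapTrace = { type: "treemap", ids: [], labels: [], parents: [], values: [] };
  const add = (id: string, label: string, parent: string, value: number): void => {
    trace.ids.push(id);
    trace.labels.push(label);
    trace.parents.push(parent);
    trace.values.push(value);
  };

  // Single explicit root. With Plotly's default branchvalues="remainder",
  // value 0 on branch nodes means their size comes from their children.
  add("all", "All pages", "", 0);

  // Largest clusters (likely content pillars) first.
  const sorted = [...clusters]
    .filter(([, pages]) => pages.length >= minPages)
    .sort((a, b) => b[1].length - a[1].length);

  for (const [entity, pages] of sorted) {
    const entityId = `entity:${entity}`;
    add(entityId, entity, "all", 0);
    // ids stay unique even when one page belongs to several clusters.
    for (const page of new Set(pages)) add(`${entityId}/${page}`, page, entityId, 1);
  }
  return trace;
}
```

The trace can then be rendered with `Plotly.newPlot("treemap", [buildTreemap(clusters)])`, where "treemap" is the id of a placeholder `div`.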

Use Cases

Topic Clustering — group pages by the entities they share
Topical Authority Mapping — see which entities you cover deeply vs. thinly
Internal Linking Opportunities — pages sharing entities should link to each other
Cannibalization Detection — pages whose embeddings have very high cosine similarity (>0.85) may compete for the same queries
Entity-Based Site Architecture — organize your content around knowledge graph entities instead of keywords
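The cannibalization and internal-linking checks above can be sketched as one pass over all page pairs: cosine similarity of the page embeddings flags near-duplicates, and shared entities suggest links. The `Page` shape, `analyzePairs`, and the rule that a cannibalizing pair is not also suggested as a link are assumptions; only the 0.85 threshold comes from this README. If the embeddings are already L2-normalized, the cosine reduces to a plain dot product.

```typescript
interface Page { title: string; embedding: number[]; entities: string[] }
interface PagePair { a: string; b: string; similarity: number; sharedEntities: string[] }

// Cosine similarity; returns 0 for zero-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Compare every pair of pages once (O(n^2) pairs).
function analyzePairs(pages: Page[], threshold = 0.85) {
  const cannibalization: PagePair[] = [];
  const linkCandidates: PagePair[] = [];

  for (let i = 0; i < pages.length; i++) {
    for (let j = i + 1; j < pages.length; j++) {
      const similarity = cosine(pages[i].embedding, pages[j].embedding);
      const other = new Set(pages[j].entities);
      const sharedEntities = [...new Set(pages[i].entities)].filter((e) => other.has(e));
      const pair = { a: pages[i].title, b: pages[j].title, similarity, sharedEntities };

      if (similarity > threshold) cannibalization.push(pair);
      else if (sharedEntities.length > 0) linkCandidates.push(pair);
    }
  }
  return { cannibalization, linkCandidates };
}
```

For a few thousand pages with 384-dimensional vectors this is a few hundred million multiply-adds, which a browser tab handles, though moving it to a Web Worker would keep the UI responsive.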

Input Format

Supports two formats:

Title/Description pairs (recommended):

title: Apple iPhone 15 Pro Review & Specs
desc: Our in-depth review of the Apple iPhone 15 Pro camera, USB-C, and performance.

title: Tesla Model 3 Long Range Price & Range 2024
desc: The definitive guide to the 2024 Tesla Model 3 Long Range.

Plain text (one page per line):

Apple iPhone 15 Pro Review & Specs
Tesla Model 3 Long Range Price & Range 2024
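How app.js tells the two formats apart is not documented. The sketch below assumes that any line starting with `title:` switches the parser to pair mode and that otherwise every non-empty line is one page title. `parseInput` and `PageInput` are hypothetical names.

```typescript
interface PageInput { title: string; description: string }

function parseInput(raw: string): PageInput[] {
  const lines = raw
    .split(/\r?\n/)
    .map((l) => l.trim())
    .filter((l) => l.length > 0);

  // Plain-text mode: one page title per line, no description.
  const pairMode = lines.some((l) => /^title:/i.test(l));
  if (!pairMode) return lines.map((title) => ({ title, description: "" }));

  // Pair mode: "title:" starts a page, "desc:" fills in the latest one.
  const pages: PageInput[] = [];
  for (const line of lines) {
    const m = /^(title|desc):\s*(.*)$/i.exec(line);
    if (!m) continue; // ignore stray lines
    const [, key, value] = m;
    if (key.toLowerCase() === "title") pages.push({ title: value, description: "" });
    else if (pages.length > 0) pages[pages.length - 1].description = value;
  }
  return pages;
}
```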

Running Locally

# Clone
git clone https://github.com/metehan777/entity-topic-cluster.git
cd entity-topic-cluster

# Serve the public/ folder (any static server works)
npx serve public

# Open http://localhost:3000

No npm install needed for the frontend — everything loads from CDN.

Deploying to Cloudflare Pages

# Install wrangler
npm install -g wrangler

# Login
wrangler login

# Deploy
wrangler pages deploy public --project-name="your-project-name"

Cloudflare Worker (Optional)

The repo also includes a standalone Cloudflare Worker (src/index.ts) that provides a REST API for entity resolution:

# Install dependencies
npm install

# Deploy the worker
wrangler deploy

POST /resolve — send NER output, get Wikidata QIDs back:

{
  "entities": [
    {"text": "Barack Obama", "label": "PER"},
    {"text": "Google", "label": "ORG"}
  ],
  "language": "en",
  "limit": 3
}
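A minimal client for the worker's `POST /resolve` endpoint, sending the request body shown above. The README does not document the response schema, so it is returned as `unknown`. `resolveViaWorker`, the injectable `fetchImpl` parameter (handy for testing) and the worker URL you pass in are all hypothetical.

```typescript
// Request body as documented above.
interface ResolveRequest {
  entities: { text: string; label: string }[]; // label: "PER", "ORG" or "LOC"
  language?: string;
  limit?: number;
}

// Minimal fetch-like signature so a stub can be injected in tests.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

async function resolveViaWorker(
  workerUrl: string,
  body: ResolveRequest,
  fetchImpl: FetchLike = fetch,
): Promise<unknown> {
  const res = await fetchImpl(new URL("/resolve", workerUrl).toString(), {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`POST /resolve failed: HTTP ${res.status}`);
  // Response shape is not documented here, so callers must inspect it.
  return res.json();
}
```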

Project Structure

├── public/
│   ├── index.html          # Main UI
│   ├── app.js              # Core logic (ML, Wikidata, graph rendering)
│   └── sample-data.js      # 60-page preset dataset
├── src/
│   └── index.ts            # Cloudflare Worker REST API (optional)
├── wrangler.toml            # Cloudflare config
├── package.json
└── tsconfig.json

Contributing

Contributions welcome. Some ideas:

Content Gap Analysis — query Wikidata SPARQL for related entities your site doesn't cover
CSV/TSV import — bulk import from Screaming Frog, Ahrefs, or GSC exports
Entity type filtering — toggle PER / ORG / LOC visibility in the graph
Cluster labeling — auto-generate topic cluster names from entity groups
URL column support — associate each title/desc with its URL for linking recommendations
Web Worker inference — move ML to a background thread to keep the UI responsive on large datasets

License

MIT
