Paper Clusterer: Turn Your Obsidian Vault into a Research Map

[Image: Paper Clusterer visualization, cluster overview]

[Image: Paper Clusterer visualization, focused cluster]

The Problem: A Graveyard of Research Notes

My Obsidian vault had become a mess. Hundreds of paper notes, lecture materials, and research ideas—scattered across folders, poorly connected, and increasingly difficult to navigate. Sound familiar?

As researchers, we consume vast amounts of information. We take notes, highlight papers, jot down ideas. But without proper organization, this knowledge remains fragmented and underutilized.

I knew the technical solution was straightforward: embed the documents with a language model, cluster them by semantic similarity, and auto-generate a meaningful label for each cluster. It is a classic NLP pipeline that could turn chaos into structured insight.

But there was a catch: I didn’t have time to learn TypeScript and the Obsidian Plugin API from scratch.

The Solution: Agentic Development

Enter modern AI coding tools. What would traditionally have taken weeks of learning and development, I built in days through iterative collaboration with AI agents. The result is Paper Clusterer, an Obsidian plugin that automatically organizes your research notes into thematic clusters.

What It Does

Paper Clusterer transforms your scattered notes into an interactive knowledge map:

🔹 Multi-Provider Embeddings

Choose your preferred embedding provider:

  • OpenAI (text-embedding-3-small/large)
  • Ollama (local, free)
  • LM Studio (local, free)

Keep your data private with local options, or use cloud APIs for convenience.
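To make the multi-provider idea concrete, here is a minimal TypeScript sketch of how a plugin might build one embedding request per provider and normalize the responses. It assumes each tool's usual defaults (OpenAI's `/v1/embeddings`, Ollama's `/api/embed` on port 11434, and LM Studio's OpenAI-compatible server on port 1234); the `Provider` type and both function names are hypothetical, not Paper Clusterer's actual code.

```typescript
// Hypothetical provider layer. Endpoints and ports are each tool's usual
// defaults (an assumption); adjust them to match your local setup.
type Provider = "openai" | "ollama" | "lmstudio";

interface EmbeddingRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

const ENDPOINTS: Record<Provider, string> = {
  openai: "https://api.openai.com/v1/embeddings",
  ollama: "http://localhost:11434/api/embed",
  lmstudio: "http://localhost:1234/v1/embeddings",
};

function buildEmbeddingRequest(
  provider: Provider,
  model: string,
  texts: string[],
  apiKey?: string,
): EmbeddingRequest {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  // Only the cloud API needs a key; the local servers accept anonymous requests by default.
  if (provider === "openai") {
    if (!apiKey) throw new Error("OpenAI embeddings require an API key");
    headers["Authorization"] = `Bearer ${apiKey}`;
  }
  // All three endpoints accept { model, input } with a batch of strings.
  return { url: ENDPOINTS[provider], headers, body: JSON.stringify({ model, input: texts }) };
}

// Normalize the two response shapes into one vector per input text.
function parseEmbeddingResponse(provider: Provider, json: unknown): number[][] {
  if (provider === "ollama") {
    // Ollama's /api/embed returns { embeddings: number[][] }.
    return (json as { embeddings: number[][] }).embeddings;
  }
  // OpenAI-compatible servers return { data: [{ index, embedding }] };
  // sort by index so vectors line up with the input order.
  const data = (json as { data: { index: number; embedding: number[] }[] }).data;
  return data
    .slice()
    .sort((a, b) => a.index - b.index)
    .map((d) => d.embedding);
}
```

Keeping request building and response parsing as pure functions leaves the HTTP call to the caller, which would POST `body` to `url` with `fetch` or Obsidian's `requestUrl`. LM Studio needs no special case here because it mimics the OpenAI response shape.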

🔹 Smart Clustering Algorithms

Two algorithms to fit your needs:

  • HDBSCAN: Infers the number of clusters from the density of the data and flags outliers as noise instead of forcing them into a cluster
  • K-means: Classic algorithm with automatic K selection via silhouette analysis
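K selection via silhouette analysis can be sketched as follows: run K-means for each candidate K and keep the K whose mean silhouette score is highest. This is a self-contained illustration rather than the plugin's implementation; the farthest-point initialization and the `chooseK` helper are assumptions made to keep the example deterministic. HDBSCAN is considerably more involved and is not shown.

```typescript
// Minimal K-means with K chosen by mean silhouette score.
// Farthest-point initialization is an assumption made for reproducibility.
type Vec = number[];

function dist(a: Vec, b: Vec): number {
  let s = 0;
  for (let i = 0; i < a.length; i++) s += (a[i] - b[i]) ** 2;
  return Math.sqrt(s);
}

function kmeans(points: Vec[], k: number, maxIter = 100): number[] {
  // Start from the first point, then repeatedly add the point farthest
  // from every centroid chosen so far.
  const centroids: Vec[] = [points[0].slice()];
  while (centroids.length < k) {
    let farthest = 0;
    let farthestD = -1;
    points.forEach((p, i) => {
      const d = Math.min(...centroids.map((c) => dist(p, c)));
      if (d > farthestD) {
        farthestD = d;
        farthest = i;
      }
    });
    centroids.push(points[farthest].slice());
  }
  let labels: number[] = points.map(() => -1);
  for (let iter = 0; iter < maxIter; iter++) {
    // Assignment step: each point joins its nearest centroid.
    const next = points.map((p) => {
      let arg = 0;
      for (let c = 1; c < k; c++) {
        if (dist(p, centroids[c]) < dist(p, centroids[arg])) arg = c;
      }
      return arg;
    });
    const changed = next.some((l, i) => l !== labels[i]);
    labels = next;
    if (!changed) break;
    // Update step: move each centroid to the mean of its members.
    for (let c = 0; c < k; c++) {
      const members = points.filter((_, i) => labels[i] === c);
      if (members.length === 0) continue; // leave an empty cluster's centroid in place
      centroids[c] = members[0].map(
        (_, d) => members.reduce((sum, m) => sum + m[d], 0) / members.length,
      );
    }
  }
  return labels;
}

// Mean silhouette: s(i) = (b - a) / max(a, b), where a is the mean distance to
// the rest of i's cluster and b the lowest mean distance to any other cluster.
function silhouette(points: Vec[], labels: number[]): number {
  const k = Math.max(...labels) + 1;
  let total = 0;
  points.forEach((p, i) => {
    const sums: number[] = new Array(k).fill(0);
    const counts: number[] = new Array(k).fill(0);
    points.forEach((q, j) => {
      if (i === j) return;
      sums[labels[j]] += dist(p, q);
      counts[labels[j]] += 1;
    });
    const own = labels[i];
    if (counts[own] === 0) return; // singleton clusters score 0 by convention
    const a = sums[own] / counts[own];
    let b = Infinity;
    for (let c = 0; c < k; c++) {
      if (c !== own && counts[c] > 0) b = Math.min(b, sums[c] / counts[c]);
    }
    const denom = Math.max(a, b);
    if (b !== Infinity && denom > 0) total += (b - a) / denom;
  });
  return total / points.length;
}

// Try K = 2..maxK and keep the labelling with the highest silhouette.
function chooseK(points: Vec[], maxK: number): { k: number; labels: number[]; score: number } {
  let best = { k: 1, labels: points.map(() => 0), score: -Infinity };
  for (let k = 2; k <= Math.min(maxK, points.length - 1); k++) {
    const labels = kmeans(points, k);
    const score = silhouette(points, labels);
    if (score > best.score) best = { k, labels, score };
  }
  return best;
}
```

The silhouette rewards clusterings that are both tight and well separated, so it penalizes too few clusters (merged groups inflate a) and too many (split groups shrink b) without a hand-tuned threshold.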

🔹 AI-Generated Scientific Labels

No more generic “Cluster 1”, “Cluster 2”. The plugin uses LLMs to generate research-appropriate labels:

  • OpenAI (GPT-4o, GPT-4o-mini)
  • Kimi (Moonshot) - particularly strong for Chinese/English academic content
  • Ollama & LM Studio (local LLMs)

Labels focus on research domains, methodologies, and technical concepts—not vague descriptions.
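One way to enforce "no generic labels" is a post-generation check that normalizes the LLM's output and rejects placeholder, vague, overly long, or duplicate labels. The rules, the `GENERIC_PATTERNS` list, and the eight-word limit below are hypothetical choices for illustration; the plugin's actual quality checks are not spelled out in this post.

```typescript
// Hypothetical quality gate for LLM-generated cluster labels.
// The patterns and the word limit are illustrative assumptions.
const GENERIC_PATTERNS: RegExp[] = [
  /^(cluster|group|topic)\s*\d*$/i, // placeholder names like "Cluster 3"
  /\b(misc|miscellaneous|various|general|other|mixed)\b/i, // vague catch-alls
];

// Strip the wrapping LLMs often add: a "Label:" prefix, quotes, a trailing period.
function normalizeLabel(raw: string): string {
  return raw
    .trim()
    .replace(/^label:\s*/i, "")
    .replace(/^["']+|["'.]+$/g, "")
    .trim();
}

function isAcceptableLabel(label: string, existing: string[], maxWords = 8): boolean {
  if (label.length === 0) return false;
  if (label.split(/\s+/).length > maxWords) return false; // a description, not a label
  if (GENERIC_PATTERNS.some((re) => re.test(label))) return false;
  // Reject case-insensitive duplicates so no two clusters share a name.
  const lower = label.toLowerCase();
  return !existing.some((e) => e.toLowerCase() === lower);
}
```

When a label fails the check, a caller could re-prompt the model and fall back to a plain name after a few attempts. A blunt pattern list also has false positives ("General Relativity" would be rejected here), so rules like these need tuning against real output.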

🔹 Interactive Visualization

Explore your research landscape through an interactive scatter plot:

  • Zoom and pan across clusters
  • Focus on specific clusters to see all papers
  • Visual links between related clusters, measured by L2 (Euclidean) distance
  • Hover to see paper titles, click to open notes
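The post does not say exactly how related clusters are chosen, so the following is only one plausible rule, sketched under that assumption: compute each cluster's centroid in the 2D plot, then link every cluster to its nearest neighbour by L2 distance. `PlotPoint`, `clusterCentroids`, and `nearestClusterEdges` are hypothetical names, and points labelled `-1` are treated as HDBSCAN noise.

```typescript
// Hypothetical rule: link each cluster to its nearest other cluster,
// measured as the L2 distance between centroids in the 2D plot.
interface PlotPoint {
  x: number;
  y: number;
  cluster: number; // -1 marks HDBSCAN-style noise
}

type Edge = [number, number, number]; // [clusterA, clusterB, distance]

function clusterCentroids(points: PlotPoint[]): Map<number, { x: number; y: number }> {
  const sums = new Map<number, { x: number; y: number; n: number }>();
  for (const p of points) {
    if (p.cluster < 0) continue; // noise has no centroid
    const s = sums.get(p.cluster) ?? { x: 0, y: 0, n: 0 };
    s.x += p.x;
    s.y += p.y;
    s.n += 1;
    sums.set(p.cluster, s);
  }
  const out = new Map<number, { x: number; y: number }>();
  sums.forEach((s, c) => out.set(c, { x: s.x / s.n, y: s.y / s.n }));
  return out;
}

function nearestClusterEdges(points: PlotPoint[]): Edge[] {
  const cs = Array.from(clusterCentroids(points).entries());
  const edges = new Map<string, Edge>();
  for (const [ci, a] of cs) {
    let best: Edge | null = null;
    for (const [cj, b] of cs) {
      if (ci === cj) continue;
      const d = Math.hypot(a.x - b.x, a.y - b.y); // L2 distance
      if (best === null || d < best[2]) best = [Math.min(ci, cj), Math.max(ci, cj), d];
    }
    // A→B and B→A collapse into a single undirected edge.
    if (best !== null) edges.set(`${best[0]}-${best[1]}`, best);
  }
  return Array.from(edges.values());
}
```

Nearest-neighbour links guarantee every cluster gets at least one connection while keeping the plot sparse; a distance threshold would instead show only genuinely close pairs but can leave isolated clusters.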

🔹 Persistent Results

Clustering results are saved directly to your vault as a Markdown document with embedded metadata. Reopen the document anytime to restore the full visualization—no need to re-run clustering.
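A Markdown result file that can restore the visualization needs two layers: readable content for humans and machine-readable data for the plugin. The sketch below assumes one possible layout, a heading plus wiki-links per cluster followed by the full result as JSON inside an HTML comment, which Markdown renderers do not display. The `paper-clusterer-data` marker and the `ClusterResult` shape are hypothetical.

```typescript
// Hypothetical result format: readable cluster sections plus the full result
// serialized as JSON inside an HTML comment so it can be restored later.
interface ClusterResult {
  label: string;
  notes: string[]; // note names or vault paths
  points: [number, number][]; // 2D plot coordinates
}

const MARKER = "paper-clusterer-data";

function toMarkdown(clusters: ClusterResult[]): string {
  const body = clusters
    .map((c) => `## ${c.label}\n` + c.notes.map((n) => `- [[${n}]]`).join("\n"))
    .join("\n\n");
  // "-->" inside the JSON would end the comment early; escaping ">" as
  // \u003e keeps the JSON valid and parses back to the same string.
  const json = JSON.stringify(clusters).replace(/-->/g, "--\\u003e");
  return `${body}\n\n<!-- ${MARKER} ${json} -->\n`;
}

function fromMarkdown(md: string): ClusterResult[] | null {
  const m = md.match(new RegExp(`<!-- ${MARKER} ([\\s\\S]*?) -->`));
  return m ? (JSON.parse(m[1]) as ClusterResult[]) : null;
}
```

Storing the 2D coordinates rather than only the cluster assignments is what lets a reopened document redraw the plot without recomputing embeddings or UMAP.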

Technical Highlights

The plugin demonstrates several interesting technical approaches:

  • Dynamic library loading: UMAP and clustering libraries load on-demand to keep plugin startup fast
  • Embedding caching: Reuse embeddings for quick re-clustering with different parameters
  • Dimensionality reduction: UMAP projects high-dimensional embeddings down to 2 to 5 dimensions, where distance- and density-based clustering works more reliably than in the raw embedding space
  • Quality-controlled labeling: Automatic detection and rejection of low-quality generated labels
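Embedding caching is what makes re-clustering cheap: the expensive API or local-model calls happen once per note, and only the dimensionality reduction and clustering steps rerun when parameters change. Below is a minimal sketch of such a cache, assuming entries are keyed by provider, model, and a hash of the note text; `EmbeddingCache` and the 32-bit FNV-1a hash are illustrative choices, not the plugin's documented design.

```typescript
// Hypothetical embedding cache keyed by provider, model, and a hash of the
// note text, so re-clustering with new parameters skips unchanged notes.
function fnv1a(text: string): string {
  let h = 0x811c9dc5; // FNV-1a 32-bit offset basis
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // multiply by the FNV prime, keep unsigned 32 bits
  }
  return h.toString(16).padStart(8, "0");
}

class EmbeddingCache {
  private store = new Map<string, number[]>();
  private provider: string;
  private model: string;

  constructor(provider: string, model: string) {
    this.provider = provider;
    this.model = model;
  }

  private key(text: string): string {
    // Vectors from different models are not comparable, so the model is part of the key.
    return `${this.provider}:${this.model}:${fnv1a(text)}`;
  }

  // Embed only texts missing from the cache, deduplicated, in one batch call.
  async getOrEmbed(
    texts: string[],
    embed: (batch: string[]) => Promise<number[][]>,
  ): Promise<number[][]> {
    const missing = Array.from(new Set(texts.filter((t) => !this.store.has(this.key(t)))));
    if (missing.length > 0) {
      const vectors = await embed(missing);
      missing.forEach((t, i) => this.store.set(this.key(t), vectors[i]));
    }
    return texts.map((t) => this.store.get(this.key(t))!);
  }
}
```

A 32-bit hash keeps the sketch short, but a large vault would be safer with a stronger hash such as SHA-256, which makes collisions between different notes negligible.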

Real Impact

Even in its current form, Paper Clusterer has transformed how I navigate my research:

  • Rediscovered connections between papers I’d forgotten about
  • Identified gaps in my literature review by comparing cluster sizes
  • Linked lecture materials with relevant research papers
  • Generated structured literature review outlines automatically

Get Started

# Install via GitHub (until it's in the community plugin store)
# 1. Download main.js, manifest.json, styles.css from releases
# 2. Copy them to <your-vault>/.obsidian/plugins/paper-clusterer/
# 3. Enable in Community Plugins settings

GitHub: github.com/yiyang92/obsidian-paper-clusterer

Roadmap

  • Fragment-level clustering (cluster parts of documents, not just whole notes)
  • Enhanced visualizations with 3D exploration
  • Time-based clustering (see how your research interests evolve)
  • Integration with Zotero and other reference managers
  • Official Obsidian Community Plugin store submission

Acknowledgments

Huge thanks to Moonshot AI for the developer tools that made experimenting with this idea much easier. Their Kimi models excel at generating precise academic labels.

Also, this project wouldn’t exist without modern agentic coding tools that dramatically lower the barrier to building specialized software.


Contributions welcome! If you find this useful or have ideas for improvement, open an issue or PR on GitHub.