A web-based exploratory search system leveraging CLIP (Contrastive Language-Image Pre-training) models for enhanced discovery of digital collections, including maps, photographs, and born-digital documents.
Our paper describing the Digital Collections Explorer is available at: https://arxiv.org/abs/2507.00961.
We present Digital Collections Explorer, a web-based, open-source exploratory search platform that leverages CLIP (Contrastive Language-Image Pre-training) for enhanced visual discovery of digital collections. Our Digital Collections Explorer can be installed locally and configured to run on a visual collection of interest on disk in just a few steps. Building upon recent advances in multimodal search techniques, our interface enables natural language queries and reverse image searches over digital collections with visual features. An overview of our system can be seen in the image above.
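To give a sense of how CLIP enables this kind of search, the minimal sketch below embeds a text query and a candidate image into a shared vector space and scores them by cosine similarity. It uses OpenAI's `clip` package and the ViT-B/32 checkpoint purely for illustration; it is not the Explorer's actual retrieval code, which may use a different model or library.

```python
# Minimal sketch of CLIP-based multimodal retrieval (illustrative only).
import clip  # pip install git+https://github.com/openai/CLIP.git
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Embed a natural language query and an image into the same vector space.
text = clip.tokenize(["a hand-drawn map of a coastline"]).to(device)
image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)

with torch.no_grad():
    text_emb = model.encode_text(text)
    image_emb = model.encode_image(image)

# Cosine similarity between normalized embeddings ranks images against the query.
text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
score = (text_emb @ image_emb.T).item()
print(f"similarity: {score:.3f}")
```

The same similarity computation supports reverse image search: encode a query image instead of a text prompt and rank the collection's image embeddings against it.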
- Multimodal search capabilities using both text and image inputs
- Support for various digital collection types:
- Historical maps
- Photographs
- Born-digital documents
- Fine-tuned CLIP models for improved accuracy (coming soon)
- User-friendly web interface for exploration
- Python 3.8+
- Node.js 14+
- Git
- Docker (optional, for containerized deployment)
git clone https://github.com/hinxcode/digital-collections-explorer.git
cd digital-collections-explorer
npm install
npm run setup -- --type=photographs
Available collection types:
- `photographs`: For photo collections and image archives
- `maps`: For map collections
- `documents`: For born-digital document collections
This will configure the project for your specific collection type and build the frontend.
# Create and activate a virtual environment (recommended)
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
- Add your images to the `data/raw` directory. Supported formats include JPG, JPEG, PNG, GIF, BMP, TIFF, and WebP. Images in subdirectories will also be retrieved recursively.
- Generate embeddings for your collection:
python -m src.models.clip.generate_embeddings
This will process all images in the `data/raw` directory and create embeddings in the `data/embeddings` directory.
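For readers curious what this step does under the hood, the sketch below shows a typical CLIP embedding pass over a directory of images. The output file names (`image_embeddings.npy`, `image_paths.txt`) are assumptions for illustration and may not match what `src.models.clip.generate_embeddings` actually writes.

```python
# Illustrative sketch of an embedding-generation pass; the bundled
# src.models.clip.generate_embeddings script may differ in detail.
from pathlib import Path
import numpy as np
import torch, clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

raw_dir = Path("data/raw")
out_dir = Path("data/embeddings")
out_dir.mkdir(parents=True, exist_ok=True)

exts = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".tiff", ".webp"}
paths, vectors = [], []
for path in sorted(raw_dir.rglob("*")):  # recurse into subdirectories
    if not path.is_file() or path.suffix.lower() not in exts:
        continue
    image = preprocess(Image.open(path).convert("RGB")).unsqueeze(0).to(device)
    with torch.no_grad():
        emb = model.encode_image(image)
    vectors.append((emb / emb.norm(dim=-1, keepdim=True)).cpu().numpy()[0])
    paths.append(str(path))

# Output file names below are assumptions, not the script's documented outputs.
np.save(out_dir / "image_embeddings.npy", np.stack(vectors))
(out_dir / "image_paths.txt").write_text("\n".join(paths))
```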
python -m src.backend.main
The API server will start at http://localhost:8000
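Once the server is up, you can exercise the API from a script. The example below assumes a hypothetical `/search` endpoint and query parameters; check the backend's route definitions for the actual paths and response schema.

```python
# Hypothetical query against the running backend; the real route names and
# parameters are defined by the Explorer's API and may differ from this sketch.
import requests

resp = requests.get(
    "http://localhost:8000/search",  # assumed endpoint name
    params={"query": "aerial photograph of a harbor", "top_k": 10},
)
resp.raise_for_status()
print(resp.json())
```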
For active development with hot-reloading:
# To enable auto-reloading of the backend server whenever code changes, first change the `api_config.debug` setting in `config.json` from `false` to `true`.
# Then make sure the backend server is running; if it is not, start it from the project's root directory:
python -m src.backend.main
# Start the frontend development server
cd src/frontend/[photographs|maps|documents]
npm run dev
This will start a frontend dev server at http://localhost:5173 with hot-reloading enabled. The development server will automatically proxy API requests to the backend at http://localhost:8000.
When you're ready to deploy your changes, rebuild the frontend only if you have customized it and made code changes (Step 2 already built the frontend once):
npm run frontend-build
Then restart the backend server to serve the updated frontend.
Contributions are welcome! Please feel free to submit a Pull Request.
Mahowald, J., & Lee, B. C. G. (2024). Integrating Visual and Textual Inputs for Searching Large-Scale Map Collections with CLIP. arXiv:2410.01190 [cs.IR]. https://arxiv.org/abs/2410.01190