A web-based exploratory search system leveraging CLIP (Contrastive Language-Image Pre-training) models for enhanced discovery of digital collections, including maps, photographs, and born-digital documents.
Our paper describing the Digital Collections Explorer is available at: https://arxiv.org/abs/2507.00961.
We present Digital Collections Explorer, a web-based, open-source exploratory search platform that leverages CLIP (Contrastive Language-Image Pre-training) for enhanced visual discovery of digital collections. Our Digital Collections Explorer can be installed locally and configured to run on a visual collection of interest on disk in just a few steps. Building upon recent advances in multimodal search techniques, our interface enables natural language queries and reverse image searches over digital collections with visual features. An overview of our system can be seen in the image above.
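To give a sense of how CLIP enables this kind of search, the minimal sketch below embeds a text query and a candidate image into a shared vector space and scores them by cosine similarity. It uses OpenAI's `clip` package and the ViT-B/32 checkpoint purely for illustration; it is not the Explorer's actual retrieval code, which may use a different model or library.

```python
# Minimal sketch of CLIP-based multimodal retrieval (illustrative only).
import clip  # pip install git+https://github.com/openai/CLIP.git
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Embed a natural language query and an image into the same vector space.
text = clip.tokenize(["a hand-drawn map of a coastline"]).to(device)
image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)

with torch.no_grad():
    text_emb = model.encode_text(text)
    image_emb = model.encode_image(image)

# Cosine similarity between normalized embeddings ranks images against the query.
text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
score = (text_emb @ image_emb.T).item()
print(f"similarity: {score:.3f}")
```

The same similarity computation supports reverse image search: encode a query image instead of a text prompt and rank the collection's image embeddings against it.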
- Multimodal search capabilities using both text and image inputs
- Support for various digital collection types:
- Historical maps
- Photographs
- Born-digital documents
- Fine-tuned CLIP models for improved accuracy (coming soon)
- User-friendly web interface for exploration
- Python 3.8+
- Node.js 14+
- Git
- Docker (optional, for containerized deployment)
git clone https://github.com/hinxcode/digital-collections-explorer.git
cd digital-collections-explorer
npm install
npm run setup -- --type=photographs
Available collection types:
- `photographs`: For photo collections and image archives
- `maps`: For map collections
- `documents`: For born-digital document collections
This will configure the project for your specific collection type and build the frontend.
# Create and activate a virtual environment (recommended)
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
- Add your images to the `data/raw` directory. Supported formats include JPG, JPEG, PNG, GIF, BMP, TIFF, and WebP. Images in subdirectories will also be retrieved recursively.
- Generate embeddings for your collection:
python -m src.models.clip.generate_embeddings
This will process all images in the `data/raw` directory and create embeddings in the `data/embeddings` directory.
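For readers curious what this step does under the hood, the sketch below shows a typical CLIP embedding pass over a directory of images. The output file names (`image_embeddings.npy`, `image_paths.txt`) are assumptions for illustration and may not match what `src.models.clip.generate_embeddings` actually writes.

```python
# Illustrative sketch of an embedding-generation pass; the bundled
# src.models.clip.generate_embeddings script may differ in detail.
from pathlib import Path
import numpy as np
import torch, clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

raw_dir = Path("data/raw")
out_dir = Path("data/embeddings")
out_dir.mkdir(parents=True, exist_ok=True)

exts = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".tiff", ".webp"}
paths, vectors = [], []
for path in sorted(raw_dir.rglob("*")):  # recurse into subdirectories
    if not path.is_file() or path.suffix.lower() not in exts:
        continue
    image = preprocess(Image.open(path).convert("RGB")).unsqueeze(0).to(device)
    with torch.no_grad():
        emb = model.encode_image(image)
    vectors.append((emb / emb.norm(dim=-1, keepdim=True)).cpu().numpy()[0])
    paths.append(str(path))

# Output file names below are assumptions, not the script's documented outputs.
np.save(out_dir / "image_embeddings.npy", np.stack(vectors))
(out_dir / "image_paths.txt").write_text("\n".join(paths))
```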
python -m src.backend.main
The API server will start at http://localhost:8000
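Once the server is up, you can exercise the API from a script. The example below assumes a hypothetical `/search` endpoint and query parameters; check the backend's route definitions for the actual paths and response schema.

```python
# Hypothetical query against the running backend; the real route names and
# parameters are defined by the Explorer's API and may differ from this sketch.
import requests

resp = requests.get(
    "http://localhost:8000/search",  # assumed endpoint name
    params={"query": "aerial photograph of a harbor", "top_k": 10},
)
resp.raise_for_status()
print(resp.json())
```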
For active development with hot-reloading:
# To enable auto-reloading of the backend server whenever code changes, first change the `api_config.debug` setting in `config.json` from `false` to `true`.
# Then make sure the backend server is running; if it is not, start it from the project's root directory:
python -m src.backend.main
# Start the frontend development server
cd src/frontend/[photographs|maps|documents]
npm run dev
This will start a frontend dev server at http://localhost:5173 with hot-reloading enabled. The development server will automatically proxy API requests to the backend at http://localhost:8000.
When you're ready to deploy your changes, rebuild the frontend only if you have customized it and made code changes (Step 2 already built the frontend once):
npm run frontend-build
Then restart the backend server to serve the updated frontend.
Contributions are welcome! Please feel free to submit a Pull Request.
Mahowald, J., & Lee, B. C. G. (2024). Integrating Visual and Textual Inputs for Searching Large-Scale Map Collections with CLIP. arXiv:2410.01190 [cs.IR]. https://arxiv.org/abs/2410.01190