This project was developed under a previous phase of the Yale Digital Humanities Lab. Now a part of Yale Library’s Computational Methods and Data department, the Lab no longer includes this project in its scope of work. As such, it will receive no further updates.
Visualize large collections of text data with WebGL
pip install wordmap
To create a visualization from a directory of text files, you can call wordmap as follows:
wordmap --texts "data/*.txt"
That process creates a visualization in ./web that can be viewed if you start a local web server:
# python 2
python -m SimpleHTTPServer 7090
# python 3
python -m http.server 7090
After starting the web server, navigate to http://localhost:7090/web/ to view the visualization.
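Because the --texts value is quoted, the shell does not expand it and the pattern reaches wordmap as a literal string. Before a long run it can be useful to check what such a pattern matches. The sketch below uses Python's standard glob module for that; it assumes wordmap expands the pattern in a comparable way, which this document does not confirm, and preview_texts is a hypothetical helper name rather than part of wordmap.

```python
import glob


def preview_texts(pattern, limit=5):
    """Return how many files match `pattern` and a sorted sample of them.

    Hypothetical helper: it only mirrors what Python's glob module does,
    which may differ from how wordmap itself resolves --texts.
    """
    matches = sorted(glob.glob(pattern))
    return len(matches), matches[:limit]


if __name__ == "__main__":
    count, sample = preview_texts("data/*.txt")
    print(f"{count} files match")
    for path in sample:
        print(" ", path)
```

If the count is zero, the pattern or the working directory is probably not what was intended.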
The following flags can be passed to the wordmap command. Run wordmap --help to see the full list:
--texts A glob of files to process
--encoding The encoding of input files
--max_n The maximum number of words/docs to include in the visualization
--layouts The layouts to render {umap, tsne, grid, img, obj}
--obj_file An .obj file that should be used to create the obj layout
--img_file A .png or .jpg file that should be used to create the img layout
--n_components The number of dimensions to use when creating the layouts
--tsne_perplexity The perplexity value to use when creating TSNE layout
--umap_n_neighbors The n_neighbors value to use when creating UMAP layout
--umap_min_distance The min_distance value to use when creating the UMAP layout
--model_type The model type to use {word2vec}
--use_cache Boolean that, if True, will load saved layouts from models
--model_name The name to use when saving a model to disk
--model A persisted model to use to create layouts
--size The number of dimensions to include in Word2Vec vectors
--window The number of words to include in windows when creating a Word2Vec model
--iter The maximum number of iterations to run the created model
--min_count The minimum occurrences of each word to be included in the Word2Vec model
--workers The number of computer cores to use when processing input data
--verbose If true, logs progress during layout construction
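When wordmap is driven from a script, the flags above can be assembled into an argument list instead of a hand-written shell string. The following is a minimal sketch: build_wordmap_command is a hypothetical helper, not part of wordmap, and it only covers flags that take one or more values. How boolean flags such as --use_cache and --verbose expect their values is not specified here, so they are left out.

```python
import shlex


def build_wordmap_command(options):
    """Turn a dict of flag names to values into an argument list for wordmap.

    Hypothetical helper for illustration; flag names come from the list above.
    """
    args = ["wordmap"]
    for flag, value in options.items():
        args.append(f"--{flag}")
        # Flags such as --layouts accept several space-separated values.
        values = value if isinstance(value, (list, tuple)) else [value]
        args.extend(str(v) for v in values)
    return args


command = build_wordmap_command({
    "texts": "data/*.txt",
    "layouts": ["umap", "tsne", "grid"],
    "max_n": 10000,
})
# shlex.join quotes the glob so the printed line is safe to paste into a shell.
print(shlex.join(command))
```

The resulting list can be handed to subprocess.run(command), which does not invoke a shell, so the glob is passed through unexpanded just as it is when quoted on the command line.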
Examples:
Create a wordmap of the text files in ./data using the umap, tsne, and grid layouts:
wordmap --texts "data/*.txt" \
--layouts umap tsne grid
Create a wordmap using a saved Word2Vec model with 3 dimensions and a maximum of 10000 words:
wordmap --model "1563222036.model" \
--n_components 3 \
--max_n 10000
Create a wordmap with several layouts, each with multiple parameter steps:
python wordmap/wordmap.py \
--texts "data/philosophical_transactions/*.txt" \
--layouts tsne umap grid \
--tsne_perplexity 5 25 100 \
--umap_n_neighbors 2 20 200 \
--umap_min_dist 0.01 0.1 1.0 \
--n_clusters 10 25 \
--iter 100