ravisraju/vector_space_retreival
Simple Search Engine

A simple search engine based on Boolean retrieval and vector space retrieval.

File name: simple_search_engine.py

The corpus has not been uploaded due to size constraints. Place the corpus to be indexed ("nsf award abstracts" in our case) in the same folder as the Python file before running.

Run the file as:

    python simple_search_engine.py

Building the initial index takes about 25 seconds. Computing TF-IDF and constructing the tf-idf vectors for all files takes about another 50 seconds. That is about 75 seconds of up-front work in total, done so that the query part doesn't take a long time.

After this, a query prompt opens. The query structure is "<bool/vector> query". The "bool" option performs a Boolean search, while the "vector" option computes cosine similarity for all documents, ranks them, and prints the top 50 results.

Example queries:
1) bool stephen palumbi
2) vector ricardo osuna

To exit, type "exit". Note: if you type "bool exit", the application will search for the keyword "exit".

The times above were measured on my personal machine (8 GB RAM, Core i5). On Sun machines, indexing takes considerably longer and varies more: the total index time ranged from 8 to 14 minutes.

Resources used:
- "glob" - from Piazza
- "log(1.0*N/df)" - from Piazza
- Python documentation
- Stack Overflow
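The two query modes described above can be illustrated with a minimal sketch. This is not the code in simple_search_engine.py; it only shows one way the pieces could fit together. The idf formula log(1.0*N/df) and the top-50 cutoff come from this README. Everything else is an assumption made for illustration: the function names, whitespace tokenization, raw term counts as tf, and AND semantics for "bool" queries.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase and split on whitespace.
    return text.lower().split()


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def tfidf_vectors(docs, index):
    """Build one sparse tf-idf vector (a dict) per document."""
    n_docs = len(docs)
    vectors = {}
    for doc_id, text in docs.items():
        counts = Counter(tokenize(text))
        # idf = log(1.0*N/df); tf is the raw count (an assumption).
        vectors[doc_id] = {
            term: tf * math.log(1.0 * n_docs / len(index[term]))
            for term, tf in counts.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def bool_search(index, terms):
    # Assumption: a document matches only if it contains every term (AND).
    postings = [index.get(t, set()) for t in terms]
    return set.intersection(*postings) if postings else set()


def vector_search(docs, index, vectors, terms, top_k=50):
    """Rank documents by cosine similarity to the query; keep the top_k."""
    n_docs = len(docs)
    counts = Counter(t for t in terms if t in index)
    query = {
        t: tf * math.log(1.0 * n_docs / len(index[t]))
        for t, tf in counts.items()
    }
    scored = [(cosine(query, vec), doc_id) for doc_id, vec in vectors.items()]
    scored = [(s, d) for s, d in scored if s > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:top_k]]


def answer(line, docs, index, vectors):
    """Dispatch a '<bool/vector> query' line to the matching search mode."""
    mode, _, rest = line.strip().partition(" ")
    terms = tokenize(rest)
    if mode == "bool":
        return sorted(bool_search(index, terms))
    if mode == "vector":
        return vector_search(docs, index, vectors, terms)
    raise ValueError("query must start with 'bool' or 'vector'")
```

Building the index and the document vectors once, before any query is read, mirrors the up-front indexing cost described above: each "vector" query then only has to build a small query vector and compare it against the precomputed document vectors.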