An application for robust search of Logosophy teachings, powered by LLMs and classical techniques. This tool also aims to simplify a crucial part of Logosophy study: preparing specific study programs from the vast amount of material available.
Although the web application is not yet available, the CLI tool can already
index documents and search for specific topics. The logos command serves as
the entry point for the CLI. This tool is part of the package and will always
be the interface for indexing documents.
To index a document, use the index command followed by the path to the
document. All documents are expected to be in Markdown format, regardless of
the file extension. The tool will extract the paragraphs from the document and
store them together with the section headers and the document title.
logos index 'data/prepared/'
The command allows filtering the files and recursively searches for documents
in the given path. For more information on the command options, use the
--help flag.
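
The indexing step described above can be pictured with a minimal sketch. It walks a directory recursively, keeps the files matching a glob pattern, and splits each Markdown file into paragraphs annotated with the document title and the current header trail. This is not the package's actual implementation: the names `Passage`, `extract_passages` and `collect_passages`, the `pattern` parameter, and the rule that the first level-1 header is the document title are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from pathlib import Path

# ATX-style Markdown header: one to six '#' characters followed by text.
HEADER_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


@dataclass
class Passage:
    """Hypothetical record; the real tool's schema may differ."""
    file: str
    title: str
    headers: tuple
    text: str


def extract_passages(markdown, file=""):
    """Split Markdown text into paragraphs, tracking title and header trail."""
    title = ""
    trail = []     # (level, header text) of the currently open sections
    buffer = []    # lines of the paragraph being accumulated
    passages = []

    def flush():
        # Emit the accumulated paragraph, if any, with its current context.
        if buffer:
            headers = tuple(text for _, text in trail)
            passages.append(Passage(file, title, headers, " ".join(buffer)))
            buffer.clear()

    for line in markdown.splitlines():
        match = HEADER_RE.match(line)
        if match:
            flush()
            level, text = len(match.group(1)), match.group(2)
            if level == 1 and not title:
                title = text  # assumption: first level-1 header is the title
                continue
            # Close sections at the same or a deeper level before opening this one.
            while trail and trail[-1][0] >= level:
                trail.pop()
            trail.append((level, text))
        elif line.strip():
            buffer.append(line.strip())
        else:
            flush()  # a blank line ends the current paragraph
    flush()
    return passages


def collect_passages(root, pattern="*"):
    """Recursively read every file under root that matches the glob pattern."""
    passages = []
    for path in sorted(Path(root).rglob(pattern)):
        if path.is_file():
            text = path.read_text(encoding="utf-8")
            passages.extend(extract_passages(text, str(path)))
    return passages
```

Calling `collect_passages("data/prepared/", "*.md")` would restrict the walk to files ending in `.md`, while the default pattern treats every file as Markdown regardless of its extension, mirroring the behaviour described above.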
To search for a specific topic, use the search command followed by the query
you want to search for. The tool will return a list of paragraph passages that
are related to the query, combining dense, sparse, and keyword search.
logos search "Como asimilar la enseñanza logosófica?" --limit 10
The command will display the most relevant passages, together with each passage's score for the query and its metadata (file, section headers, paragraphs, etc.).
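
The text above does not say how the dense, sparse, and keyword results are merged into a single ranking. One common option is reciprocal rank fusion (RRF), sketched below as an illustration only; it is an assumption, not necessarily what the tool does. The function name, the passage ids, and the constant `k=60` are hypothetical choices for this example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, limit=10):
    """Fuse several ranked lists of passage ids into one ranking.

    rankings maps a retriever name to its passage ids, best match first.
    Each passage receives 1 / (k + rank) from every list it appears in.
    """
    scores = defaultdict(float)
    for ranked_ids in rankings.values():
        for rank, passage_id in enumerate(ranked_ids, start=1):
            scores[passage_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by passage id for stability.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused[:limit]


# Made-up rankings from three hypothetical retrievers.
rankings = {
    "dense": ["p3", "p1", "p2"],
    "sparse": ["p1", "p3", "p4"],
    "keyword": ["p1", "p4"],
}
for passage_id, score in reciprocal_rank_fusion(rankings, limit=3):
    print(f"{passage_id}\t{score:.4f}")
```

RRF only needs the rank positions, so it avoids having to normalize scores that come from retrievers with very different scales.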
- Build a first app version that allows the user to search for a specific topic and get a list of paragraphs that contain the topic.
- Allow the user to select only dense, only sparse, or only keyword search while performing a query.
- If the passage is only a fragment of a paragraph, show the full paragraph instead, with the fragment highlighted.
- Add highlighting to the parts of each passage that justify its position in the search results (maybe use an LLM to score or indicate the relevant parts).
- Allow the user to click on a search result and see the surrounding paragraphs, scrolling through them as if viewing the original source.
- Add a chat with a Retriever chatbot that helps the user find the most relevant paragraphs for a specific topic.
- Add a cross-encoder to re-rank search results.
- Test the multilingual-e5-large-instruct model for the search engine.
- Improve sparse search by using character n-grams to split words (reference).
- Replace txtai with llama-index for a more robust resource ecosystem.
- Build a knowledge graph by extracting triplets from the paragraphs.
- Add a deep search mechanism that runs a query, extracts the top-k terms from the results found, creates N new queries similar to the original but using the top-k terms, and re-ranks the results using the cross-encoder.
- Add study program creation.
- Allow the user to select a paragraph and add it to a study program.
- Add login to save study programs per user.
- Allow management of saved study programs.
- Speed up the imports for quicker CLI loading (especially for --help calls).
- Move the doc2docx script to the CLI as a utility.
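
To make the character n-gram item in the list above more concrete, here is a minimal sketch of splitting words into overlapping character n-grams before sparse indexing. The function names, the `<` and `>` boundary markers, and the 3 to 5 n-gram range are assumptions chosen for illustration; the project may settle on a different scheme.

```python
def char_ngrams(word, n_min=3, n_max=5):
    """Return the character n-grams of a word, padded with boundary markers."""
    padded = f"<{word.lower()}>"
    grams = []
    for n in range(n_min, n_max + 1):
        # Slide a window of width n across the padded word.
        for start in range(len(padded) - n + 1):
            grams.append(padded[start:start + n])
    return grams


def tokenize(text, n_min=3, n_max=5):
    """Turn whitespace-separated words into a flat list of character n-grams."""
    tokens = []
    for word in text.split():
        tokens.extend(char_ngrams(word, n_min, n_max))
    return tokens


print(char_ngrams("logos", 3, 3))
```

Because inflected forms such as "enseñanza" and "enseñanzas" share most of their n-grams, a sparse index built on these tokens can match them even when the exact word forms differ.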
Simply install the package using poetry:
pip install poetry
poetry install
To contribute, make sure to install the pre-commit hooks and run the tests before submitting a pull request.
pre-commit install
pre-commit run --all-files