- README
- How to Run this RAG solution
- Project Proposal
- Technical Approach
- Technical Implementation
- Project Brief
- Prototypes developed in support of the Technical Approach & Implementation
- Docker Deployment
- Research, Trades and References
The goal of this project is to implement a local Large Language Model (LLM) capable of reviewing, summarizing, and leveraging proprietary documents to facilitate document management tasks. The LLM will run on a laptop equipped with a 6GB RTX 3060 GPU and 32GB of RAM. The system will take a list of files as input, answer questions about these files, summarize their content, and assist in writing new documents based on the provided information.
This project will be reused after the class to develop documentation from existing proprietary documents at the students' current employer.
NOTE: Splitting text and saving it in the repository is handled with a Jupyter notebook. The RAG query and response are also handled with a Jupyter notebook. These were slated to be added to a third Docker container for automation. However, due to refactoring, the final RAG UI was not re-implemented before the product demonstration. The authors intend to return and complete this work in Q4 2024.
This project aims to enhance document management by leveraging an LLM to process and understand proprietary documents. The LLM will be integrated into a local system that will ensure data security and compliance with privacy regulations. The key functionalities of the LLM will include document review, question answering, summarization, and document creation assistance.
The Project Proposal provides more details about the initial project concept and preliminary approach.
Reference Source: This use case section is taken directly from the High-Level Concepts page of LlamaIndex.
There are endless use cases for data-backed LLM applications, but they can be roughly grouped into four categories:
Structured Data Extraction: Pydantic extractors allow you to specify a precise data structure to extract from your data and use LLMs to fill in the missing pieces in a type-safe way. This is useful for extracting structured data from unstructured sources like PDFs, websites, and more, and is key to automating workflows.
Query Engines: A query engine is an end-to-end pipeline that allows you to ask questions over your data. It takes in a natural language query, and returns a response, along with reference context retrieved and passed to the LLM.
Chat Engines: A chat engine is an end-to-end pipeline for having a conversation with your data (multiple back-and-forth instead of a single question-and-answer).
Agents: An agent is an automated decision-maker powered by an LLM that interacts with the world via a set of tools. Agents can take an arbitrary number of steps to complete a given task, dynamically deciding on the best course of action rather than following pre-determined steps. This gives it additional flexibility to tackle more complex tasks.
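Of these categories, this project most closely follows the query engine pattern. For orientation, the sketch below shows the canonical LlamaIndex quick-start for building a query engine over a directory of files. It is a reference sketch only, assuming a recent llama-index install (these imports live under llama_index.core in current releases) and the library's default embedding model and LLM, which would need to be pointed at local models to meet this project's constraints; it is not the implementation used here.

```python
# Reference sketch of the LlamaIndex "query engine" pattern (not this project's code).
# Assumes a recent llama-index release where core classes live under llama_index.core.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("data").load_data()  # load local files from ./data
index = VectorStoreIndex.from_documents(documents)     # chunk, embed, and index them
query_engine = index.as_query_engine()

response = query_engine.query("Summarize the key requirements in these documents.")
print(response)
```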
- User enters a prompt
- The application uses an embedding model to embed the prompt
- The prompt embedding is sent to the vector database
- The vector database returns a list of documents that most closely match the prompt embedding. This is the context provided to the LLM.
- The application creates a new prompt using:
- The user's original prompt
- A system prompt that provides a "persona" to use for the response
- The context - document text
- This new prompt is sent to the local LLM or, alternatively, to a cloud-based LLM API
- The LLM produces a result based on the extended prompt using the best associated information in the Knowledge Base.
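A minimal end-to-end sketch of this query flow is given below. It is illustrative only: it assumes a ChromaDB collection built with the same sentence-transformers embedding model (see the knowledge-base steps that follow) and a local, Ollama-style LLM endpoint. The endpoint URL, model names, paths, and prompt wording are placeholders rather than the project's actual code.

```python
# Illustrative query flow (placeholder names; not the project's actual code).
import requests
import chromadb
from sentence_transformers import SentenceTransformer

EMBED_MODEL = "all-MiniLM-L6-v2"   # must be the same model used to build the knowledge base
SYSTEM_PROMPT = "You are a documentation assistant. Answer using only the provided context."

embedder = SentenceTransformer(EMBED_MODEL)
collection = chromadb.PersistentClient(path="./kb").get_collection("knowledge_base")

def answer(user_prompt: str, k: int = 4) -> str:
    # Embed the user's prompt with the same embedding model used for the knowledge base
    query_embedding = embedder.encode([user_prompt]).tolist()
    # Ask the vector database for the k closest document chunks; these become the context
    hits = collection.query(query_embeddings=query_embedding, n_results=k)
    context = "\n\n".join(hits["documents"][0])
    # Assemble the extended prompt: persona + context + the user's original prompt
    full_prompt = f"{SYSTEM_PROMPT}\n\nContext:\n{context}\n\nQuestion: {user_prompt}"
    # Send the extended prompt to a local LLM (placeholder Ollama-style endpoint)
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama3", "prompt": full_prompt, "stream": False},
        timeout=120,
    )
    return resp.json()["response"]

print(answer("What interfaces does the subsystem expose?"))
```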
- Provide a list of Documents and Sources to use for the knowledge base.
- Break the documents into smaller chunks
- Create an embedding for each document chunk
- Implement a vector database that stores the document embeddings
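The build side can be sketched in a few lines as well. The snippet below is illustrative only: it assumes sentence-transformers for embeddings and ChromaDB as the lightweight vector store, with a naive fixed-size character splitter. The model name, chunk size, paths, and collection name are placeholders rather than the project's actual configuration.

```python
# Illustrative knowledge-base build (placeholder names; not the project's actual code).
from pathlib import Path
import chromadb
from sentence_transformers import SentenceTransformer

EMBED_MODEL = "all-MiniLM-L6-v2"   # the same model must be used later to embed prompts
CHUNK_SIZE = 1000                  # naive splitter: fixed number of characters per chunk

embedder = SentenceTransformer(EMBED_MODEL)
collection = chromadb.PersistentClient(path="./kb").get_or_create_collection("knowledge_base")

for doc_path in Path("docs").glob("*.txt"):
    text = doc_path.read_text(encoding="utf-8")
    # Break the document into smaller chunks
    chunks = [text[i:i + CHUNK_SIZE] for i in range(0, len(text), CHUNK_SIZE)]
    if not chunks:
        continue
    # Create an embedding for each chunk and store it with its source metadata
    collection.add(
        ids=[f"{doc_path.stem}-{i}" for i in range(len(chunks))],
        documents=chunks,
        embeddings=embedder.encode(chunks).tolist(),
        metadatas=[{"source": doc_path.name}] * len(chunks),
    )
```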
The applications, packages, and approach used for the project are described in the Technical Approach. There are tools that make a local LLM-based solution simple to implement; however, this project implements each component to build the overall solution.
A summary of general considerations is provided below.
- Knowledge Documents: Options include:
- Local Documents (pdf, html, text, etc)
- Internet Search Results
- Embedding Model:
- The same embedding model must be used to create the knowledge base and to embed the user's prompt (see the configuration sketch after this list).
- Vector DB:
- Select a lightweight, easy to implement DB
- Large Language Model (LLM): There are two approaches for the LLM. Both work similarly; the differences are the robustness of the model, the resources required, and the potential licensing requirements and costs. Pros and cons of public vs. private/local LLMs are listed below:
- Public LLM API:
- Local Private LLM:
- User Interface: There are two options for a user interface:
- Use a simple terminal command-line interface for the prompt and response
- Use a web-based interface. There are several approaches that may provide a relatively simple implementation.
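One simple way to enforce the "same embedding model" constraint above is to keep the shared settings in a single configuration module imported by both the ingestion and query code. The module and values below are hypothetical placeholders, not the project's actual settings.

```python
# rag_config.py (hypothetical): shared settings imported by both ingestion and query code,
# so the prompt embeddings always line up with the stored document embeddings.
EMBED_MODEL = "all-MiniLM-L6-v2"    # placeholder embedding model name
VECTOR_DB_PATH = "./kb"             # lightweight local vector store location
COLLECTION_NAME = "knowledge_base"  # placeholder collection name
```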
The team defined tasks, and much of the initial work was completed in the prototypes directory.
See: The knowledge base implementation for details on how to start and run the local Knowledge base.
Note: The application code is added to the streamlit_app.py file and built in via the Streamlit Dockerfile. See the Streamlit README for details.
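For orientation, a minimal streamlit_app.py might look like the sketch below. It is illustrative only and not the project's actual UI; the stubbed answer() helper stands in for the real query flow (embed the prompt, retrieve context, call the local LLM).

```python
# Minimal streamlit_app.py sketch (illustrative only; not the project's actual UI).
import streamlit as st

def answer(prompt: str) -> str:
    # Placeholder: the real app would run the RAG query flow here
    # (embed the prompt, retrieve context from the vector DB, call the local LLM).
    return f"(stub) You asked: {prompt}"

st.title("Local Document RAG")

user_prompt = st.text_input("Ask a question about the knowledge base:")
if user_prompt:
    with st.spinner("Retrieving context and querying the local LLM..."):
        st.write(answer(user_prompt))
```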
See example:
-
To provide a knowledge-base LLM solution from a directory of local files:
- Evaluate different Embedding models
- Evaluate different LLMs
- Evaluate whether additional LlamaIndex features are needed
- Add authentication to Docker deployments
Next Gen LLM Capabilities
- Implement Agents.
Project Timeline