Moved changed logic from Ingest.py into persistent_rag_ingest.ipynb #155
Open
elana wants to merge 1 commit into
…est.py, and moved it into persistent_rag_ingest.ipynb so it won't affect future modules using the same ingest.py
I previously made a PR to prevent duplicate records from being imported into SQLite databases. Because the "id" column is reserved in the database, each record's id from the retrieved JSON has to be stored under a different column name ("doc_id") so we can check whether the record has been imported before. If it has, the record is not imported into the SQLite database again.
However, I put that logic in ingest.py, where it modified the JSON in the response from load_faq_data() and renamed each record's "id" field to "doc_id". I did not realize that the same ingest.py file would be used in future modules. When I ran the code for Module 4, Lesson 2, the notebook errored out: one of its lines prints the "id" from the JSON, and because I had renamed "id" to "doc_id", that field no longer existed.
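The duplicate check described above can be sketched with Python's built-in sqlite3 module. This is a minimal sketch, not the code from the earlier PR: the table name `faq`, the `content` column, and the helpers `create_table()` and `insert_if_new()` are hypothetical. It assumes the table's own "id" column is an auto-assigned primary key, which is why the source record's id goes into a separate `doc_id` column.

```python
import sqlite3


def create_table(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: "id" is reserved for SQLite's own primary key,
    # so the id from the retrieved JSON is stored in "doc_id" instead.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS faq ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "doc_id TEXT UNIQUE NOT NULL, "
        "content TEXT)"
    )


def insert_if_new(conn: sqlite3.Connection, doc_id: str, content: str) -> bool:
    """Insert a record unless its doc_id was imported before; return True if inserted."""
    already_imported = conn.execute(
        "SELECT 1 FROM faq WHERE doc_id = ?", (doc_id,)
    ).fetchone()
    if already_imported is not None:
        return False  # skip: this record was imported on an earlier run
    conn.execute(
        "INSERT INTO faq (doc_id, content) VALUES (?, ?)", (doc_id, content)
    )
    conn.commit()
    return True


conn = sqlite3.connect(":memory:")
create_table(conn)
print(insert_if_new(conn, "faq-1", "first copy"))   # True
print(insert_if_new(conn, "faq-1", "second copy"))  # False
```

The UNIQUE constraint on `doc_id` acts as a second safeguard: even if the SELECT check were skipped, SQLite would reject a duplicate `doc_id` with `sqlite3.IntegrityError`.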
The solution is to move that logic into persistent_rag_ingest.ipynb instead. The field is now renamed from "id" to "doc_id" only right before each record is inserted into the database, so the data loaded through ingest.py keeps its original "id" field for other modules.
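A minimal sketch of the approach in this PR, assuming the loaded records are dicts with hypothetical "question" and "answer" fields alongside "id": the rename is applied to a copy at insert time, so the records returned by the loader are never modified. `to_db_row()` and `insert_records()` are illustrative names, not functions from the notebook, and `INSERT OR IGNORE` is used here as a compact alternative to an explicit existence check.

```python
import sqlite3


def to_db_row(record: dict) -> dict:
    # Build a new dict so the caller's record keeps its original "id" key.
    row = {key: value for key, value in record.items() if key != "id"}
    row["doc_id"] = record["id"]
    return row


def insert_records(conn: sqlite3.Connection, records: list[dict]) -> None:
    # Hypothetical table; "id" stays reserved for SQLite's primary key.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS faq ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "doc_id TEXT UNIQUE NOT NULL, "
        "question TEXT, answer TEXT)"
    )
    for record in records:
        row = to_db_row(record)  # rename happens only here, right before insert
        # INSERT OR IGNORE skips rows whose doc_id violates the UNIQUE constraint.
        conn.execute(
            "INSERT OR IGNORE INTO faq (doc_id, question, answer) "
            "VALUES (:doc_id, :question, :answer)",
            row,
        )
    conn.commit()


records = [{"id": "faq-1", "question": "How do I enroll?", "answer": "Hypothetical answer text."}]
conn = sqlite3.connect(":memory:")
insert_records(conn, records)
insert_records(conn, records)  # second run: the duplicate doc_id is ignored
print(records[0]["id"])  # faq-1
print(conn.execute("SELECT COUNT(*) FROM faq").fetchone()[0])  # 1
```

Because `records[0]["id"]` is still present after insertion, a later notebook line that prints each record's "id" keeps working.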