
Moved changed logic from Ingest.py into persistent_rag_ingest.ipynb #155

Open
elana wants to merge 1 commit into DataTalksClub:main from elana:fix/sqlitesearch-id-field



elana commented Jun 26, 2026

Contributor

I previously opened a PR to prevent duplicate records from being imported into SQLite databases. The "id" column is reserved in the database, so each record's own id (from the retrieved JSON) has to be stored under a different name in order to check whether the record has already been imported. If it has, the record is not imported into the SQLite database again. However, I put that logic in ingest.py, where it modified the JSON returned by load_faq_data() to rename the "id" field to "doc_id". I did not realize that the same ingest.py file would be used in later modules. When I ran the code for Module 4, lesson 2, a line in the notebook that prints the "id" from the JSON raised an error, because I had renamed "id" to "doc_id".

The solution was to move the logic into persistent_rag_ingest.ipynb instead. The field is now renamed from "id" to "doc_id" only right before the record is inserted into the database, so the output of load_faq_data() keeps its original "id" key.
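For reviewers, here is a minimal sketch of the idea, not the actual notebook code. It uses the standard-library sqlite3 module rather than the search library the notebook uses, and the table name `docs`, the `question` field, and the helpers `to_db_row` and `insert_new_records` are all hypothetical names chosen for illustration. The point is that the rename happens on a copy at insert time, so the loaded documents still expose "id" to later modules.

```python
import sqlite3


def to_db_row(record):
    """Return a copy of record with "id" renamed to "doc_id".

    The input dict is left untouched, so code that reads record["id"]
    elsewhere keeps working.
    """
    row = dict(record)
    if "id" in row:
        row["doc_id"] = row.pop("id")
    return row


def insert_new_records(conn, records):
    """Insert only records whose doc_id is not stored yet; return the count added."""
    # Hypothetical schema: "id" is the database's own key, "doc_id" is the JSON id.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS docs ("
        "id INTEGER PRIMARY KEY, doc_id TEXT UNIQUE, question TEXT)"
    )
    inserted = 0
    for record in records:
        row = to_db_row(record)  # rename happens here, right before the insert
        already_there = conn.execute(
            "SELECT 1 FROM docs WHERE doc_id = ?", (row["doc_id"],)
        ).fetchone()
        if already_there:
            continue  # skip records that were imported on an earlier run
        conn.execute(
            "INSERT INTO docs (doc_id, question) VALUES (?, ?)",
            (row["doc_id"], row["question"]),
        )
        inserted += 1
    conn.commit()
    return inserted


# Stand-in for the JSON returned by load_faq_data() (assumed shape).
documents = [
    {"id": "a1", "question": "first question"},
    {"id": "b2", "question": "second question"},
]

conn = sqlite3.connect(":memory:")
first_run = insert_new_records(conn, documents)
second_run = insert_new_records(conn, documents)  # same data again: nothing new
print(first_run, second_run, documents[0]["id"])
```

Running the ingest twice adds the records only once, and `documents[0]["id"]` is still readable afterwards, which is the behavior the Module 4 notebook depends on.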

…est.py, and moved it into persistent_rag_ingest.ipynb so it won't affect future modules using the same ingest.py