A retrieval-augmented generation (RAG) system built entirely inside Snowflake using Cortex Search and Cortex LLM, with a Streamlit UI for querying and uploading internal policy documents.
This version is specifically built for American Refrigeration to allow employees to ask natural language questions about SOPs, safety procedures, and internal policies with answers grounded in source documents and citations. Important for high stakes environments that require safety and strict protocols, where LLM hallucinations could be drastic.
Querying a safety policy:
Cited source chunks from the indexed PDF:
- Ingest - Upload policy PDFs via a Streamlit app into a Snowflake internal stage
- Parse - Snowflake Cortex parses each PDF into structured text
- Chunk - Text is split into overlapping chunks (1800 chars, 250 overlap)
- Index - A Cortex Search Service indexes all chunks for semantic search
- Query - A Streamlit assistant retrieves relevant chunks and optionally
summarizes them with an LLM (
snowflake-arctic)
All compute, storage, and inference runs inside Snowflake. There are no external API keys nor external vector databases/infrastructure.
PDF Upload (Streamlit)
↓
@PDF_STAGE_RAW (Snowflake internal stage)
↓
CORTEX.PARSE_DOCUMENT → PARSED_POLICY_DOCS
↓
CORTEX.SPLIT_TEXT_RECURSIVE_CHARACTER → POLICY_DOC_CHUNKS
↓
CORTEX SEARCH SERVICE: POLICY_DOCS_SEARCH
↓
Streamlit Assistant
├── Summary mode → CORTEX.COMPLETE (snowflake-arctic)
└── Verbatim mode → raw chunk retrieval only
| Component | Technology |
|---|---|
| Data warehouse | Snowflake |
| PDF parsing | Snowflake Cortex (PARSE_DOCUMENT) |
| Text chunking | Snowflake Cortex (SPLIT_TEXT_RECURSIVE_CHARACTER) |
| Semantic search | Snowflake Cortex Search Service |
| LLM | Snowflake Arctic (CORTEX.COMPLETE) |
| Frontend | Streamlit in Snowflake |
| Language | Python, SQL |
- Snowflake account with Cortex enabled
ACCOUNTADMINrole (for initial setup)- Cortex Search available in your region
Run each file in sql/ sequentially in a Snowflake worksheet:
-- 1. Create DB, schema, warehouse
01_setup.sql
-- 2. Parse PDFs → PARSED_POLICY_DOCS
02_parse_docs.sql
-- 3. Chunk text → POLICY_DOC_CHUNKS
03_chunk_text.sql
-- 4. Create Cortex Search Service
04_create_search_service.sql
-- 5. Test search (optional)
05_test_search.sqlCreate a new Streamlit app in Snowflake:
- Database:
RAG_DOCS_DB - Schema:
INTERNAL_PDFS - Warehouse:
RAG_WH
Paste the contents of apps/pdf_uploader.py and run it. Upload your policy PDFs.
Then run in a worksheet:
ALTER STAGE PDF_STAGE_RAW REFRESH;
SELECT * FROM DIRECTORY(@PDF_STAGE_RAW);Create another Streamlit app with the same DB/schema/warehouse settings. Paste one of:
| File | Mode |
|---|---|
apps/assistant_summary.py |
LLM-summarized answers + citations |
apps/assistant_verbatim.py |
Exact policy excerpts only (no LLM) |
apps/assistant_toggle.py |
Toggle between both modes |
Uses CORTEX.COMPLETE (snowflake-arctic) to generate a natural language answer
grounded in the top 5 retrieved policy chunks. Best for internal guidance and
employee Q&A.
Returns exact text excerpts from the indexed PDFs with no LLM interpretation. Best for compliance use cases where verbatim policy text is required.
Lets the user switch between both modes in the same interface.
| Resource | Name | Type |
|---|---|---|
| Database | RAG_DOCS_DB |
Database |
| Schema | INTERNAL_PDFS |
Schema |
| Warehouse | RAG_WH |
XSmall, auto-suspend 60s |
| Stage | PDF_STAGE_RAW |
Internal stage (SSE encrypted) |
| Table | PARSED_POLICY_DOCS |
One row per PDF |
| Table | POLICY_DOC_CHUNKS |
One row per chunk |
| Search Service | POLICY_DOCS_SEARCH |
Cortex Search, 90min lag |
Why Snowflake Cortex instead of an external vector DB? Everything stays inside the data warehouse, so no credentials to manage, no separate infrastructure to maintain, and no data leaving the Snowflake trust boundary. This is good for internal compliance documents.
Why two assistant modes? Verbatim mode is appropriate for official compliance answers. Summary mode is appropriate for internal employee guidance. The toggle version gives operators control over which mode employees use.
Why 1800 char chunks with 250 overlap? This balances retrieval precision (smaller chunks = more targeted results) against context completeness (overlap prevents answers from being split across chunk boundaries).
Built using Snowflake Cortex features:

