
Sample AdventureWorks Data Loads

This repository contains cleaned AdventureWorks CSV exports plus the SQL needed to recreate and load both the OLTP and DW schemas into Snowflake.

Repository Structure

snowflake/
  oltp/
    create_tables_snowflake.sql   # Snowflake DDL for the OLTP schema
    local-ingestion.sql           # PUT/COPY script for local files
    ingestion.sql                 # Same as above, but under a distinct stage name
    data/                         # Cleaned OLTP CSV files (tab-delimited)
  dw/
    create_tables_snowflake.sql
    local-ingestion.sql
    ingestion.sql
    data/                         # Cleaned DW CSV files (pipe-delimited)

Preparing the Data

Both datasets were exported from SQL Server using AdventureWorks’ legacy delimiters. They have already been normalized inside this repo:

  • snowflake/oltp/data: tab-delimited (\t), Windows newlines (\r\n). Binary columns (e.g., SPATIALLOCATION) remain as hex strings, and XML columns are stored as raw text.
  • snowflake/dw/data: pipe-delimited (|), Windows newlines (\r\n). UTF-8 encoded, with quoted fields preserved where needed.

If you ever re-export the CSVs from the source scripts (AdventureWorks-oltp-install-script or AdventureWorksDW-data-warehouse-install-script), run the normalization steps before loading:

  1. For OLTP files, replace every +| with a tab and strip the trailing &|.
  2. For DW files, replace +| with | and strip the trailing &|.
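The two normalization steps can be sketched as a small script. This is illustrative only (the function name and example paths are not part of the repo's scripts); it assumes the raw exports use \r\n row endings, as described above.

```python
# Sketch of the normalization steps above. Reads/writes bytes so the
# \r\n line endings survive untouched on any platform.
from pathlib import Path

def normalize_export(path: Path, field_delimiter: str) -> None:
    """Replace the legacy '+|' field separator with the target delimiter
    and strip the trailing '&|' row terminator emitted by the
    AdventureWorks install scripts."""
    raw = path.read_bytes().decode("utf-8")
    rows = []
    for line in raw.split("\r\n"):
        line = line.replace("+|", field_delimiter)
        if line.endswith("&|"):
            line = line[:-2]
        rows.append(line)
    path.write_bytes("\r\n".join(rows).encode("utf-8"))

# OLTP files are tab-delimited, DW files pipe-delimited, e.g.:
# normalize_export(Path("snowflake/oltp/data/Person.csv"), "\t")
# normalize_export(Path("snowflake/dw/data/DimCustomer.csv"), "|")
```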

Snowflake DDL

Each schema has its own DDL file:

  • snowflake/oltp/create_tables_snowflake.sql
  • snowflake/dw/create_tables_snowflake.sql

They create the database (if needed), switch to the correct schema, and create tables with plain Snowflake data types (no FK/PK constraints).

Loading into Snowflake

Use SnowSQL (or any client that supports PUT and COPY INTO):

!source snowflake/oltp/local-ingestion.sql
-- or
!source snowflake/dw/local-ingestion.sql

The scripts do the following:

  1. USE DATABASE <db>; USE SCHEMA <schema>;
  2. CREATE OR REPLACE STAGE …
  3. CREATE OR REPLACE FILE FORMAT …
  4. PUT file://…/*.csv @<stage> AUTO_COMPRESS = TRUE OVERWRITE = TRUE;
  5. COPY INTO <table> FROM @<stage>/<File>.csv.gz FILE_FORMAT = (FORMAT_NAME = …);

Key file-format details:

  • OLTP: FIELD_DELIMITER = X'09', RECORD_DELIMITER = '\r\n'
  • DW: FIELD_DELIMITER = '|', RECORD_DELIMITER = '\r\n', ERROR_ON_COLUMN_COUNT_MISMATCH = FALSE

Adjust PUT paths if you move the repo.

Common Troubleshooting

  • “Number of columns … does not match …”
    The DW file format sets ERROR_ON_COLUMN_COUNT_MISMATCH = FALSE to tolerate rows whose trailing nullable columns are missing. Add the same option to any ad-hoc file format you create.

  • “Error parsing Geo input / JSON”
    Some OLTP exports (e.g., Store, Address) store XML or binary data. The DDL already defines those columns as VARCHAR/BINARY, so no extra parsing is required.
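When chasing the column-count error above, it can help to inspect the files locally before loading. The sketch below (illustrative, not part of the repo; the function name and example path are made up) histograms the field count per row of a normalized CSV. Note it counts raw delimiter occurrences, so it assumes the delimiter never appears inside quoted fields.

```python
# Quick local sanity check: map "fields per row" -> "number of rows with
# that count". More than one key means some rows are short (typically
# missing trailing nullable columns).
from collections import Counter
from pathlib import Path

def field_count_histogram(path: Path, delimiter: str) -> Counter:
    counts: Counter = Counter()
    # Read bytes and split on \r\n to match the files' Windows newlines.
    for line in path.read_bytes().decode("utf-8").split("\r\n"):
        if line:  # skip the empty tail after the final \r\n
            # Naive count: assumes the delimiter is never quoted inside a field.
            counts[line.count(delimiter) + 1] += 1
    return counts

# e.g. field_count_histogram(Path("snowflake/dw/data/DimCustomer.csv"), "|")
```

A histogram with a single key means every row agrees with the table's column count; multiple keys explain why the DW format needs ERROR_ON_COLUMN_COUNT_MISMATCH = FALSE.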