
Sample AdventureWorks Data Loads

This repository contains cleaned AdventureWorks CSV exports plus the SQL needed to recreate and load both the OLTP and DW schemas into Snowflake.

Repository Structure

snowflake/
  oltp/
    create_tables_snowflake.sql   # Snowflake DDL for the OLTP schema
    local-ingestion.sql           # PUT/COPY script for local files
    ingestion.sql                 # Same as above, but under a distinct stage name
    data/                         # Cleaned OLTP CSV files (tab-delimited)
  dw/
    create_tables_snowflake.sql
    local-ingestion.sql
    ingestion.sql
    data/                         # Cleaned DW CSV files (pipe-delimited)

Preparing the Data

Both datasets were exported from SQL Server using AdventureWorks’ legacy delimiters. They have already been normalized inside this repo:

  • snowflake/oltp/data: tab-delimited (\t), Windows newlines (\r\n). Binary columns (e.g., SPATIALLOCATION) remain as hex strings, and XML columns are stored as raw text.
  • snowflake/dw/data: pipe-delimited (|), Windows newlines (\r\n). UTF-8 encoded, with quoted fields preserved where needed.

If you ever re-export the CSVs from the source scripts (AdventureWorks-oltp-install-script or AdventureWorksDW-data-warehouse-install-script), run the normalization steps before loading:

  1. For OLTP files, replace every +| with a tab and strip the trailing &|.
  2. For DW files, replace +| with | and strip the trailing &|.
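The two normalization steps can be sketched as a small script. This is illustrative only (the function name and example paths are not part of the repo's scripts); it assumes the raw exports use \r\n row endings, as described above.

```python
# Sketch of the normalization steps above. Reads/writes bytes so the
# \r\n line endings survive untouched on any platform.
from pathlib import Path

def normalize_export(path: Path, field_delimiter: str) -> None:
    """Replace the legacy '+|' field separator with the target delimiter
    and strip the trailing '&|' row terminator emitted by the
    AdventureWorks install scripts."""
    raw = path.read_bytes().decode("utf-8")
    rows = []
    for line in raw.split("\r\n"):
        line = line.replace("+|", field_delimiter)
        if line.endswith("&|"):
            line = line[:-2]
        rows.append(line)
    path.write_bytes("\r\n".join(rows).encode("utf-8"))

# OLTP files are tab-delimited, DW files pipe-delimited, e.g.:
# normalize_export(Path("snowflake/oltp/data/Person.csv"), "\t")
# normalize_export(Path("snowflake/dw/data/DimCustomer.csv"), "|")
```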

Snowflake DDL

Each schema has its own DDL file:

  • snowflake/oltp/create_tables_snowflake.sql
  • snowflake/dw/create_tables_snowflake.sql

They create the database (if needed), switch to the correct schema, and create tables with plain Snowflake data types (no FK/PK constraints).

Loading into Snowflake

Use SnowSQL (or any client that supports PUT and COPY INTO):

!source snowflake/oltp/local-ingestion.sql
-- or
!source snowflake/dw/local-ingestion.sql

The scripts do the following:

  1. USE DATABASE <db>; USE SCHEMA <schema>;
  2. CREATE OR REPLACE STAGE …
  3. CREATE OR REPLACE FILE FORMAT …
  4. PUT file://…/*.csv @<stage> AUTO_COMPRESS = TRUE OVERWRITE = TRUE;
  5. COPY INTO <table> FROM @<stage>/<File>.csv.gz FILE_FORMAT = (FORMAT_NAME = …);

Key file-format details:

  • OLTP: FIELD_DELIMITER = X'09', RECORD_DELIMITER = '\r\n'
  • DW: FIELD_DELIMITER = '|', RECORD_DELIMITER = '\r\n', ERROR_ON_COLUMN_COUNT_MISMATCH = FALSE

Adjust PUT paths if you move the repo.

Common Troubleshooting

  • “Number of columns … does not match …”
    The DW file format sets ERROR_ON_COLUMN_COUNT_MISMATCH = FALSE to tolerate rows whose trailing nullable columns are missing. Add the same option to any ad-hoc file format you create.

  • “Error parsing Geo input / JSON”
    Some OLTP exports (e.g., Store, Address) store XML or binary data. The DDL already defines those columns as VARCHAR/BINARY, so no extra parsing is required.
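When chasing the column-count error above, it can help to inspect the files locally before loading. The sketch below (illustrative, not part of the repo; the function name and example path are made up) histograms the field count per row of a normalized CSV. Note it counts raw delimiter occurrences, so it assumes the delimiter never appears inside quoted fields.

```python
# Quick local sanity check: map "fields per row" -> "number of rows with
# that count". More than one key means some rows are short (typically
# missing trailing nullable columns).
from collections import Counter
from pathlib import Path

def field_count_histogram(path: Path, delimiter: str) -> Counter:
    counts: Counter = Counter()
    # Read bytes and split on \r\n to match the files' Windows newlines.
    for line in path.read_bytes().decode("utf-8").split("\r\n"):
        if line:  # skip the empty tail after the final \r\n
            # Naive count: assumes the delimiter is never quoted inside a field.
            counts[line.count(delimiter) + 1] += 1
    return counts

# e.g. field_count_histogram(Path("snowflake/dw/data/DimCustomer.csv"), "|")
```

A histogram with a single key means every row agrees with the table's column count; multiple keys explain why the DW format needs ERROR_ON_COLUMN_COUNT_MISMATCH = FALSE.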