Welcome to Databricks Lakeflow Connect Demo Kit

The demo kit makes demos and PoCs simple. It:

- Creates a tiny database in the cloud
  - Sets up secure random names, passwords, and firewall rules automatically
- Configures the database
  - Enables replication on the catalog and tables
- Starts Lakeflow Connect
  - Creates the connection, staging and target schemas, gateway and ingestion pipelines, and jobs
- Generates DML on the tables
  - insert, update, and delete on the primary key table intpk
  - insert on the table without a primary key, dtix
- Can be customized with CLIs
  - Databricks CLI, database CLI, and cloud CLI

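All objects the demo creates are deleted automatically after about two hours (by default, 131 minutes for database objects and 137 minutes for pipeline objects). The database instance is deliberately tiny and is meant for functional demos only.

Don't reboot the laptop while the demo is running. A reboot kills the background cleanup jobs, so the objects are no longer deleted automatically.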

Install CLI tools

This is a one-time setup task. Copy and paste the commands into a terminal window to install the CLIs. Run the same commands later to upgrade them.
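Before you run a demo, it helps to confirm which CLIs are already on your PATH. The sketch below is a hypothetical helper, check_clis, and is not part of the kit. The tool names in the usage line (databricks, psql, sqlcmd, az, aws, gcloud, tmux, ttyd) are assumptions based on the CLIs this README mentions.

```shell
# Hypothetical helper (not part of the kit): report which commands are on PATH.
# Returns 1 if any command is missing, 0 otherwise.
check_clis() {
  local cmd missing=0
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "found:   $cmd"
    else
      echo "missing: $cmd"
      missing=1
    fi
  done
  return "$missing"
}

# Usage (assumed tool list, based on the CLIs named in this README):
# check_clis databricks psql sqlcmd az aws gcloud tmux ttyd
```

The helper prints one found/missing line per tool. It returns a non-zero status when anything is missing, so a setup script can stop early.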

Steps to run a demo

Open a new terminal in one of the following ways.

- macOS Terminal
  - Press Command+Space to open Spotlight Search.
  - Type terminal.
  - Click the Terminal icon.
- iTerm2 (installed with brew install)
  - Press Command+Space to open Spotlight Search.
  - Type iterm.
  - Click the iTerm icon.
- ttyd, if set up with launchctl at http://localhost:7681/
  1. Open a new browser tab with the URL http://localhost:7681/
- ttyd started from a terminal at http://localhost:7681/
  1. Open Terminal or iTerm as described above.
  2. Run ttyd:
```
nohup ttyd -W tmux new -A -s lakeflow.ttyd &
```
  3. Open a new browser tab with the URL http://localhost:7681/
Use bash 4.0 or later. macOS ships with bash 3.2, so start the Homebrew bash:
```
/opt/homebrew/bin/bash
```
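To catch an older bash that is still active, you can run a minimal check like the following at the top of a session. require_bash4 is a hypothetical name. It reads only bash's built-in BASH_VERSINFO and BASH_VERSION variables.

```shell
# Hypothetical helper: stop early when the running shell is not bash 4.0 or later.
require_bash4() {
  # BASH_VERSINFO is empty outside bash; element 0 is the major version.
  if [ -z "${BASH_VERSINFO:-}" ] || [ "${BASH_VERSINFO[0]}" -lt 4 ]; then
    echo "requires bash 4.0 or later (found ${BASH_VERSION:-none}); start /opt/homebrew/bin/bash" >&2
    return 1
  fi
  echo "bash ${BASH_VERSION} OK"
}

# Usage: require_bash4 || exec /opt/homebrew/bin/bash
```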

Initialize the environment variables in a new terminal session for each new database. To customize them, add export commands before sourcing the script.

```
# [ optional ] customize export commands here as required
source <(curl -s -L https://raw.githubusercontent.com/rsleedbx/lakeflow_connect/refs/heads/main/00_lakeflow_connect_env.sh)
```

Start and configure one of the following database instances

SQL Server

SQL Server: AWS RDS SQL Server

```
source <(curl -s -L https://raw.githubusercontent.com/rsleedbx/lakeflow_connect/refs/heads/main/sqlserver/01_aws_sqlserver.sh)
source <(curl -s -L https://raw.githubusercontent.com/rsleedbx/lakeflow_connect/refs/heads/main/sqlserver/02_sqlserver_configure.sh)
```

SQL Server: Azure SQL Server

```
source <(curl -s -L https://raw.githubusercontent.com/rsleedbx/lakeflow_connect/refs/heads/main/sqlserver/01_azure_sqlserver.sh)
source <(curl -s -L https://raw.githubusercontent.com/rsleedbx/lakeflow_connect/refs/heads/main/sqlserver/02_sqlserver_configure.sh)
```

SQL Server: Azure SQL Server Managed Instance

The cost is relatively high if the free version is not available.

```
source <(curl -s -L https://raw.githubusercontent.com/rsleedbx/lakeflow_connect/refs/heads/main/sqlserver/01_azure_managed_instance.sh)
source <(curl -s -L https://raw.githubusercontent.com/rsleedbx/lakeflow_connect/refs/heads/main/sqlserver/02_sqlserver_configure.sh)
```

SQL Server: Google CloudSQL SQL Server

```
source <(curl -s -L https://raw.githubusercontent.com/rsleedbx/lakeflow_connect/refs/heads/main/sqlserver/01_gcloud_sqlserver_instance.sh)
source <(curl -s -L https://raw.githubusercontent.com/rsleedbx/lakeflow_connect/refs/heads/main/sqlserver/02_sqlserver_configure.sh)
```

Postgres

Postgres: Azure Postgres Flexible Server

```
source <(curl -s -L https://raw.githubusercontent.com/rsleedbx/lakeflow_connect/refs/heads/main/postgres/01_azure_postgres.sh)
source <(curl -s -L https://raw.githubusercontent.com/rsleedbx/lakeflow_connect/refs/heads/main/postgres/02_postgres_configure.sh)
```

Postgres: AWS RDS Postgres

```
source <(curl -s -L https://raw.githubusercontent.com/rsleedbx/lakeflow_connect/refs/heads/main/postgres/01_aws_postgres.sh)
source <(curl -s -L https://raw.githubusercontent.com/rsleedbx/lakeflow_connect/refs/heads/main/postgres/02_postgres_configure.sh)
```
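Every database option above follows the same two-step pattern: a cloud-specific 01 script, then the shared 02 configure script for that database type. The hypothetical helper below captures that mapping using only the URLs listed above. The short cloud labels (aws, azure, azure-mi, gcloud) are invented for this sketch.

```shell
# Hypothetical helper: print the create and configure script URLs for one option.
# Cloud labels (aws, azure, azure-mi, gcloud) are invented for this sketch.
lakeflow_db_scripts() {
  local base="https://raw.githubusercontent.com/rsleedbx/lakeflow_connect/refs/heads/main"
  local db="$1" cloud="$2" create
  case "$db:$cloud" in
    sqlserver:aws)      create="sqlserver/01_aws_sqlserver.sh" ;;
    sqlserver:azure)    create="sqlserver/01_azure_sqlserver.sh" ;;
    sqlserver:azure-mi) create="sqlserver/01_azure_managed_instance.sh" ;;
    sqlserver:gcloud)   create="sqlserver/01_gcloud_sqlserver_instance.sh" ;;
    postgres:aws)       create="postgres/01_aws_postgres.sh" ;;
    postgres:azure)     create="postgres/01_azure_postgres.sh" ;;
    *) echo "unsupported option: $db $cloud" >&2; return 1 ;;
  esac
  echo "$base/$create"
  # The configure script depends only on the database type.
  echo "$base/$db/02_${db}_configure.sh"
}

# Usage:
# for url in $(lakeflow_db_scripts postgres aws); do source <(curl -s -L "$url"); done
```

The loop in the usage line runs in the current shell, so variables set by the sourced scripts stay available afterwards.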

Start the Databricks Lakeflow Connect Database Demo

```
source <(curl -s -L https://raw.githubusercontent.com/rsleedbx/lakeflow_connect/refs/heads/main/03_lakeflow_connect_demo.sh)
```

How to connect to the database using the native CLI

The terminal session maintains variables that include host, user names, and passwords. The variable names used for a connection are:

- $DBA_USERNAME
- $DBA_PASSWORD
- $USER_USERNAME
- $USER_PASSWORD
- $DB_HOST_FQDN
- $DB_PORT
- $DB_CATALOG

For example, use echo to see the value of $DBA_USERNAME:

```
L9P0RQPHY7:lakeflow_connect robert.lee$ echo $DBA_USERNAME
eirai7opei9ahp3h
```
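To review every connection variable at once without printing passwords to the screen, you can use a small sketch such as show_db_vars, a hypothetical helper. It relies on bash indirect expansion (${!name}). It assumes the host is stored in DB_HOST_FQDN, the name used by the connection commands in this README.

```shell
# Hypothetical helper: list the connection variables, masking passwords.
# Uses bash indirect expansion (${!name}); assumes the host is in DB_HOST_FQDN.
show_db_vars() {
  local name value
  for name in DBA_USERNAME DBA_PASSWORD USER_USERNAME USER_PASSWORD \
              DB_HOST_FQDN DB_PORT DB_CATALOG; do
    value="${!name:-<unset>}"
    # Never echo a real password; show a fixed mask instead.
    if [[ "$name" == *PASSWORD && "$value" != "<unset>" ]]; then
      value="********"
    fi
    printf '%-14s %s\n' "$name" "$value"
  done
}
```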

Type SQLCLI_DBA in the terminal after creating the database. It connects as the DBA using $DBA_USERNAME:$DBA_PASSWORD@$DB_HOST_FQDN:$DB_PORT/. For Postgres, it uses psql with postgres as the catalog. For SQL Server, it uses sqlcmd with master as the catalog.

Example with Postgres using SQLCLI_DBA:

```
. postgres/01_azure_postgres.sh
SQLCLI_DBA
```

```
L9P0RQPHY7:lakeflow_connect robert.lee$ SQLCLI_DBA
PGPASSWORD=$DBA_PASSWORD psql postgresql://eirai7opei9ahp3h@eip9aeth9ke3oiji-zp.postgres.database.azure.com:5432/ievoo7ai?sslmode=allow
psql (14.15 (Homebrew), server 16.8)
WARNING: psql major version 14, server major version 16.
       Some psql features might not work.
Type "help" for help.

ievoo7ai=>
```

Type SQLCLI in the terminal after configuring the database. It connects as the user using $USER_USERNAME:$USER_PASSWORD@$DB_HOST_FQDN:$DB_PORT/$DB_CATALOG. For Postgres, it uses psql. For SQL Server, it uses sqlcmd.

Example with Postgres using SQLCLI:

```
. postgres/02_postgres_configure.sh
SQLCLI
```

```
L9P0RQPHY7:lakeflow_connect robert.lee$ SQLCLI
PGPASSWORD=$USER_PASSWORD psql postgresql://eine4jeip3eej4ja@eip9aeth9ke3oiji-zp.postgres.database.azure.com:5432/ievoo7ai?sslmode=allow
psql (14.15 (Homebrew), server 16.8)
WARNING: psql major version 14, server major version 16.
       Some psql features might not work.
Type "help" for help.

ievoo7ai=>
```

Manually connecting to the database

For Postgres, run the following command to connect as the DBA:

```
PGPASSWORD=$DBA_PASSWORD psql "postgresql://${DBA_USERNAME}@${DB_HOST_FQDN}:${DB_PORT}/postgres?sslmode=allow"
```

Run the following command to connect as the user:

```
PGPASSWORD=$USER_PASSWORD psql "postgresql://${USER_USERNAME}@${DB_HOST_FQDN}:${DB_PORT}/${DB_CATALOG}?sslmode=allow"
```
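The commands above are Postgres-only. For SQL Server, a rough equivalent is sketched below with the hypothetical helpers sqlcmd_dba and sqlcmd_user. They assume your sqlcmd accepts -S host,port, -U, and -d, and reads the password from the SQLCMDPASSWORD environment variable. Depending on the server's TLS settings, you may need extra options.

```shell
# Hypothetical helpers: SQL Server counterparts of the psql commands above.
# sqlcmd reads the password from SQLCMDPASSWORD, keeping it off the command line.
# Any extra arguments (for example -Q "query") are passed through to sqlcmd.
sqlcmd_dba() {
  SQLCMDPASSWORD="$DBA_PASSWORD" sqlcmd -S "${DB_HOST_FQDN},${DB_PORT}" \
    -U "$DBA_USERNAME" -d master "$@"
}

sqlcmd_user() {
  SQLCMDPASSWORD="$USER_PASSWORD" sqlcmd -S "${DB_HOST_FQDN},${DB_PORT}" \
    -U "$USER_USERNAME" -d "$DB_CATALOG" "$@"
}
```

Because extra arguments pass through, sqlcmd_user -Q "select * from information_schema.tables" runs a single query and exits.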

Frequently Used Environment Variables

`CDC_CT_MODE`=`BOTH`|`CDC`|`CT`|`NONE`

BOTH is the default.

Example usage:

Replicate only tables that do not have primary keys:

```
export CDC_CT_MODE=CDC
. ./00_lakeflow_connect_env.sh
```

| CDC_CT_MODE | Postgres | SQL Server |
|---|---|---|
| CDC | set `REPLICA IDENTITY FULL` on tables without a primary key | enable CDC on tables without a primary key |
| CT | set `REPLICA IDENTITY DEFAULT` on tables with a primary key | enable CT on tables with a primary key |
| BOTH | set `REPLICA IDENTITY FULL` on tables without a primary key and `REPLICA IDENTITY DEFAULT` on tables with a primary key | enable CDC on tables without a primary key and CT on tables with a primary key |
| NONE | set `REPLICA IDENTITY NOTHING` on the tables | enable CDC and CT on the tables |
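A typo in CDC_CT_MODE is easy to miss. A guard like the sketch below can validate the value before you source the environment script; check_cdc_ct_mode is a hypothetical helper. It treats an empty value as the default, BOTH, which is an assumption about how the kit handles an unset variable.

```shell
# Hypothetical helper: accept only the four documented CDC_CT_MODE values.
# An empty value is treated as the default, BOTH (an assumption).
check_cdc_ct_mode() {
  case "${1:-BOTH}" in
    BOTH|CDC|CT|NONE) return 0 ;;
    *) echo "CDC_CT_MODE must be BOTH, CDC, CT, or NONE (got '$1')" >&2
       return 1 ;;
  esac
}

# Usage: check_cdc_ct_mode "${CDC_CT_MODE:-}" && . ./00_lakeflow_connect_env.sh
```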

`DB_FIREWALL_CIDRS="0.0.0.0/0"`

By default, the database is open to the public (0.0.0.0/0). For security, the scripts use a random server name, catalog name, user name, DBA name, user password, and DBA password. The database is deleted after DELETE_DB_AFTER_SLEEP, about two hours by default.

Example usage:

Allow connections only from 192.168.0.0/24 and 10.10.10.12/32:

```
export DB_FIREWALL_CIDRS="192.168.0.0/24 10.10.10.12/32"
. ./00_lakeflow_connect_env.sh
```
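The sketch below checks that every entry in DB_FIREWALL_CIDRS looks like an IPv4 CIDR block before the value reaches a cloud firewall. valid_cidrs is a hypothetical helper. It handles IPv4 only and assumes entries are separated by spaces, as in the example above.

```shell
# Hypothetical helper: return 0 if every space-separated entry is an IPv4 CIDR.
valid_cidrs() {
  local cidr octet
  local -a octets
  [ -n "$1" ] || return 1
  for cidr in $1; do
    # Shape check: a.b.c.d/n
    [[ "$cidr" =~ ^([0-9]{1,3}\.){3}[0-9]{1,3}/[0-9]{1,2}$ ]] || return 1
    # Prefix length must be 0..32 (10# avoids octal parsing of leading zeros).
    (( 10#${cidr#*/} <= 32 )) || return 1
    IFS=. read -r -a octets <<< "${cidr%/*}"
    for octet in "${octets[@]}"; do
      (( 10#$octet <= 255 )) || return 1
    done
  done
  return 0
}

# Usage: valid_cidrs "$DB_FIREWALL_CIDRS" || echo "check DB_FIREWALL_CIDRS" >&2
```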

`DELETE_DB_AFTER_SLEEP=131m`

By default, the database objects the script creates (server, catalog, schema, tables, and Unity Catalog connection) are deleted after this delay, 131 minutes.

To keep the objects, set DELETE_DB_AFTER_SLEEP="". To change the delay, set another value, for example DELETE_DB_AFTER_SLEEP="67m".

A server that already existed before the script ran is not deleted, even when this variable is set.

Example usage:

```
export DELETE_DB_AFTER_SLEEP=""
. ./00_lakeflow_connect_env.sh
```

`DELETE_PIPELINES_AFTER_SLEEP=137m`

By default, the pipeline objects the script creates (gateway pipeline, ingestion pipeline, and jobs) are deleted after this delay, 137 minutes.

To keep the objects, set DELETE_PIPELINES_AFTER_SLEEP="". To change the delay, set another value, for example DELETE_PIPELINES_AFTER_SLEEP="67m".

Example usage:

```
export DELETE_PIPELINES_AFTER_SLEEP=""
. ./00_lakeflow_connect_env.sh
```
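Both timers take values such as 131m. The sketch below uses two hypothetical helpers, sleep_to_seconds and deletion_time. It converts a value to seconds and prints the local clock time at which cleanup would fire if the timer started now. It assumes values use an s, m, or h suffix (or plain seconds), and it needs bash 4.2 or later for printf '%(...)T'.

```shell
# Hypothetical helpers (not the kit's code).
# sleep_to_seconds: convert 131m, 2h, 30s, or plain 45 into seconds.
sleep_to_seconds() {
  [[ "$1" =~ ^([0-9]+)([smh]?)$ ]] || return 1
  local n=$(( 10#${BASH_REMATCH[1]} ))
  case "${BASH_REMATCH[2]}" in
    h) echo $(( n * 3600 )) ;;
    m) echo $(( n * 60 )) ;;
    *) echo "$n" ;;
  esac
}

# deletion_time: local HH:MM when cleanup would fire if the timer started now.
# An empty value means "do not delete". Needs bash 4.2+ for printf %(...)T.
deletion_time() {
  local secs
  if [ -z "$1" ]; then
    echo "not scheduled"
    return 0
  fi
  secs=$(sleep_to_seconds "$1") || { echo "unrecognized value: $1" >&2; return 1; }
  printf '%(%H:%M)T\n' $(( $(date +%s) + secs ))
}

# Usage (${VAR-default} keeps an explicitly empty value empty):
# deletion_time "${DELETE_DB_AFTER_SLEEP-131m}"
# deletion_time "${DELETE_PIPELINES_AFTER_SLEEP-137m}"
```

The ${VAR-default} form substitutes the default only when the variable is unset. An explicitly empty value therefore still reports "not scheduled".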

`DATABRICKS_CONFIG_PROFILE=DEFAULT`

The Databricks CLI profile defaults to DEFAULT. Set this variable to use a different profile.

Example usage:

Use the profile named azure from the .databrickscfg file:

```
export DATABRICKS_CONFIG_PROFILE="azure"
. ./00_lakeflow_connect_env.sh
```
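A quick check against the config file can catch a profile name that does not exist before the demo starts. profile_exists is a hypothetical helper. It assumes profiles are stored as [name] section headers in ~/.databrickscfg. The Databricks CLI command databricks auth profiles is another way to list them.

```shell
# Hypothetical helper: succeed if [profile] is a section header in the config file.
# Defaults: profile DEFAULT, file ~/.databrickscfg (pass another path as $2).
profile_exists() {
  local profile="${1:-DEFAULT}" cfg="${2:-$HOME/.databrickscfg}"
  [ -f "$cfg" ] && grep -qxF "[$profile]" "$cfg"
}

# Usage:
# profile_exists "${DATABRICKS_CONFIG_PROFILE:-DEFAULT}" || echo "profile not found" >&2
```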

Quick reference

Native CLI quick reference

Common Postgres psql commands

- `\l` list catalogs (databases)
- `\dn` list schemas
- `\dt *.*` list tables in all schemas
- `\q` quit

Common SQL Server sqlcmd commands (type `go` on its own line to run the statement)

- `select * from information_schema.schemata;` list schemas
- `select * from information_schema.tables;` list schemas and tables
- `quit` quit

tmux quick reference

Press Ctrl+b, release it, then press the key:

- Ctrl+b then 0: select window 0
- Ctrl+b then 1: select window 1
- Ctrl+b then c: create a new window
- Ctrl+b then %: split the current pane into left and right panes
- Ctrl+b then ": split the current pane into top and bottom panes
- Ctrl+b then x: close the current pane (confirm with y)

Lakeflow Pipeline commands

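To fully refresh selected tables, list them in full_refresh_selection and start a pipeline update. With "full_refresh" set to false, only the listed tables are fully refreshed, not the whole pipeline.

```
databricks api post "/api/2.0/pipelines/$INGESTION_PIPELINE_ID/updates" --json '{
  "full_refresh": false,
  "full_refresh_selection": [
    "intpk"
  ]
}'
```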
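To refresh a different set of tables, you can generate the JSON body from a list of names. full_refresh_payload is a hypothetical helper. It performs no JSON escaping, so it assumes plain table names such as intpk and dtix.

```shell
# Hypothetical helper: build the update request body for a list of table names.
# No JSON escaping is done; table names must not contain quotes or backslashes.
full_refresh_payload() {
  local first=1 t out='{"full_refresh":false,"full_refresh_selection":['
  for t in "$@"; do
    (( first )) || out+=','   # comma before every name except the first
    out+="\"$t\""
    first=0
  done
  printf '%s]}\n' "$out"
}

# Usage:
# databricks api post "/api/2.0/pipelines/$INGESTION_PIPELINE_ID/updates" \
#   --json "$(full_refresh_payload intpk dtix)"
```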

Bash command

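Kill the background jobs that delete the pipelines. Run this in the same terminal session that started the demo, because jobs only lists that shell's background jobs.

```
kill $(jobs -l | grep "pipelines delete" | awk '{print $2}')
```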
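You can wrap the one-liner above in a reusable function that takes any pattern. kill_jobs_matching is a hypothetical helper. Like the original command, it sees only background jobs of the current shell. It assumes jobs -l prints the process ID in the second column.

```shell
# Hypothetical helper: kill background jobs of the current shell whose
# `jobs -l` line contains the given text. Returns 1 if nothing matched.
kill_jobs_matching() {
  local pattern="$1" listing pids
  listing=$(jobs -l)   # e.g. "[1]+ 12345 Running   some command &"
  pids=$(awk -v p="$pattern" 'index($0, p) { print $2 }' <<< "$listing")
  [ -n "$pids" ] || return 1
  kill $pids   # unquoted on purpose: one argument per PID
}

# Usage, in the same terminal session that started the demo:
# kill_jobs_matching "pipelines delete"
```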
