Generic REST API wrapper for Skyflow SDK operations. Deploy as AWS Lambda + API Gateway to expose Skyflow tokenization, detokenization, and query capabilities via HTTP.
- Multi-column tokenization - Tokenize multiple fields in a single request
- Vault-level detokenization - No table/column needed
- SQL queries - Execute SELECT queries against vault data
- BYOT support - Bring Your Own Token
- Multi-cluster - Route to any Skyflow cluster via request payload
- Serverless - Zero infrastructure management
Before deploying, ensure you have:
- AWS Account with CLI access
- AWS CLI installed and configured
aws --version                # Should show v2.x or higher
aws sts get-caller-identity  # Verify credentials work
- Node.js 18+ installed
node --version  # Should show v18.x or higher

- Skyflow Account with:
- Vault created
- Cluster ID - The prefix of your vault URL (e.g., `ebfc9bee4242` from `https://ebfc9bee4242.vault.skyflowapis.com`)
- Vault ID - Your vault's unique identifier (e.g., `ac7f4217c9e54fa7a6f4896c34f6964b`)
- API Key or Service Account credentials
cd lambda
npm install
cd ..

Grant your IAM user permission to deploy Lambda functions:
# Replace 'your-iam-username' with your actual IAM username
./deploy.sh --setup-permissions your-iam-username

This creates an IAM policy with permissions for:
- Creating/updating Lambda functions
- Managing API Gateway
- Creating IAM roles
- CloudWatch logging
Choose your authentication method:
Option A: API Key (Simpler)
cd lambda
cp config.example.json skyflow-config.json
# Edit skyflow-config.json and add your API key

Option B: JWT Service Account
cd lambda
cp config.example-jwt.json skyflow-config.json
# Edit skyflow-config.json and add your service account credentials

Example skyflow-config.json (API Key):
{
"credentials": {
"apiKey": "sky-xxxxxxxxxxxxxxxx"
}
}

./deploy.sh

You'll see output like:
============================================================================
Deployment Complete! 🎉
============================================================================
API Gateway URL:
https://abc123xyz.execute-api.us-east-1.amazonaws.com/process
# Replace with your actual values
curl -X POST https://your-api-url.amazonaws.com/process \
-H "Content-Type: application/json" \
-H "X-Skyflow-Operation: tokenize" \
-H "X-Skyflow-Cluster-ID: ebfc9bee4242" \
-H "X-Skyflow-Vault-ID: ac7f4217c9e54fa7a6f4896c34f6964b" \
-H "X-Skyflow-Table: users" \
-H "X-Skyflow-Env: PROD" \
-d '{
"records": [{"email": "test@example.com"}]
}'

Note: The `X-Skyflow-Env` header is optional and defaults to PROD if not provided. Use SANDBOX for development/testing environments.
This API provides three endpoints:
| Endpoint | Purpose | Format |
|---|---|---|
| `POST /process` | Standard REST API | Headers + JSON payload |
| `POST /processDatabricks` | Databricks integration | Same as `/process` (see `samples/`) |
| `POST /processSnowflake` | Snowflake external functions | Snowflake-specific format |
Note: /process and /processDatabricks use identical request/response formats. The separate Databricks endpoint exists for traffic isolation and analytics.
| Operation | Description |
|---|---|
| `tokenize` | Insert sensitive data, get tokens back |
| `detokenize` | Convert tokens back to plaintext |
| `query` | Execute SQL queries against the vault |
| `tokenize-byot` | Insert records with your own custom tokens |
Requests are controlled by the following headers (some apply only to certain operations):

- `X-Skyflow-Operation` - Operation to perform (tokenize, detokenize, query, tokenize-byot)
- `X-Skyflow-Cluster-ID` - Your Skyflow cluster ID
- `X-Skyflow-Vault-ID` - Your vault ID
- `X-Skyflow-Table` - Table name (required for tokenize and tokenize-byot operations)
- `X-Skyflow-Env` - Skyflow environment (SANDBOX or PROD; optional, defaults to PROD)
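As a quick illustration, this is how the headers might be assembled from Python; the URL is a placeholder for your deployed API Gateway endpoint and the `requests` library is assumed to be installed.

```python
import requests

# Placeholder URL - replace with your deployed API Gateway endpoint
API_URL = "https://your-api-url.amazonaws.com/process"

headers = {
    "Content-Type": "application/json",
    "X-Skyflow-Operation": "tokenize",       # tokenize | detokenize | query | tokenize-byot
    "X-Skyflow-Cluster-ID": "ebfc9bee4242",  # prefix of your vault URL
    "X-Skyflow-Vault-ID": "ac7f4217c9e54fa7a6f4896c34f6964b",
    "X-Skyflow-Table": "users",              # only needed for tokenize / tokenize-byot
    # "X-Skyflow-Env": "SANDBOX",            # optional; defaults to PROD
}

response = requests.post(API_URL, headers=headers,
                         json={"records": [{"email": "test@example.com"}]})
print(response.json())
```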
curl -X POST $API_URL \
-H "Content-Type: application/json" \
-H "X-Skyflow-Operation: tokenize" \
-H "X-Skyflow-Cluster-ID: ebfc9bee4242" \
-H "X-Skyflow-Vault-ID: ac7f4217c9e54fa7a6f4896c34f6964b" \
-H "X-Skyflow-Table: users" \
-H "X-Skyflow-Env: PROD" \
-d '{
"records": [
{"email": "john@example.com"},
{"email": "jane@example.com"}
]
}'

Environment Options:
- `PROD` - Production environment (default if header omitted)
- `SANDBOX` - Development/testing environment
Response:
{
"success": true,
"data": [
{
"email": "tok_abc123xyz",
"skyflow_id": "uuid-1"
},
{
"email": "tok_def456abc",
"skyflow_id": "uuid-2"
}
],
"metadata": {
"operation": "tokenize",
"duration_ms": 245
}
}

curl -X POST $API_URL \
-H "Content-Type: application/json" \
-H "X-Skyflow-Operation: tokenize" \
-H "X-Skyflow-Cluster-ID: ebfc9bee4242" \
-H "X-Skyflow-Vault-ID: ac7f4217c9e54fa7a6f4896c34f6964b" \
-H "X-Skyflow-Table: users" \
-d '{
"records": [
{
"email": "john@example.com",
"name": "John Doe",
"ssn": "123-45-6789"
}
],
"options": {
"upsert": "email"
}
}'

Response:
{
"success": true,
"data": [
{
"email": "tok_abc123xyz",
"name": "tok_def456abc",
"ssn": "tok_ghi789jkl",
"skyflow_id": "uuid-1"
}
]
}

Governance-Controlled Detokenization (Recommended):
Omit redactionType to let Skyflow's governance engine determine the appropriate redaction based on your vault policies:
curl -X POST $API_URL \
-H "Content-Type: application/json" \
-H "X-Skyflow-Operation: detokenize" \
-H "X-Skyflow-Cluster-ID: ebfc9bee4242" \
-H "X-Skyflow-Vault-ID: ac7f4217c9e54fa7a6f4896c34f6964b" \
-d '{
"tokens": ["tok_abc123xyz", "tok_def456abc"]
}'

Response:
{
"success": true,
"data": [
{
"token": "tok_abc123xyz",
"value": "john@example.com"
},
{
"token": "tok_def456abc",
"value": "Jane Doe"
}
]
}

Override Redaction (Optional):
You can explicitly specify a redaction type to override governance policies:
curl -X POST $API_URL \
-H "Content-Type: application/json" \
-H "X-Skyflow-Operation: detokenize" \
-H "X-Skyflow-Cluster-ID: ebfc9bee4242" \
-H "X-Skyflow-Vault-ID: ac7f4217c9e54fa7a6f4896c34f6964b" \
-d '{
"tokens": ["tok_abc123xyz"],
"options": {
"redactionType": "MASKED"
}
}'

Redaction Types:
- Omit `redactionType` (recommended) - Skyflow governance engine decides based on vault policies
- `PLAIN_TEXT` - Returns unmasked data: `john@example.com`
- `MASKED` - Returns masked data: `j***@example.com`
- `REDACTED` - Returns fully redacted: `***`
- `DEFAULT` - Uses vault's default redaction setting
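For reference, the request-body variants described above look like this when built programmatically (illustrative Python dicts only):

```python
# Governance-controlled (recommended): omit "options" entirely
governance_payload = {"tokens": ["tok_abc123xyz"]}

# Explicit override: specify a redaction type
masked_payload = {"tokens": ["tok_abc123xyz"], "options": {"redactionType": "MASKED"}}
plaintext_payload = {"tokens": ["tok_abc123xyz"], "options": {"redactionType": "PLAIN_TEXT"}}
```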
curl -X POST $API_URL \
-H "Content-Type: application/json" \
-H "X-Skyflow-Operation: query" \
-H "X-Skyflow-Cluster-ID: ebfc9bee4242" \
-H "X-Skyflow-Vault-ID: ac7f4217c9e54fa7a6f4896c34f6964b" \
-d '{
"query": "SELECT email, created_at FROM users WHERE created_at > '\''2024-01-01'\'' LIMIT 10"
}'

Response:
{
"success": true,
"data": [
{
"email": "john@example.com",
"created_at": "2024-01-15",
"skyflow_id": "uuid-1"
}
]
}

Query Limitations:
- Maximum 25 records per query (use LIMIT/OFFSET for pagination; see the sketch after this list)
- SELECT statements only
- Returns plaintext values (not tokens)
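Because each query returns at most 25 records, larger result sets need LIMIT/OFFSET pagination. A minimal sketch, assuming the `requests` library and a deployed API URL (placeholder):

```python
import requests

API_URL = "https://your-api-url.amazonaws.com/process"  # placeholder
HEADERS = {
    "Content-Type": "application/json",
    "X-Skyflow-Operation": "query",
    "X-Skyflow-Cluster-ID": "ebfc9bee4242",
    "X-Skyflow-Vault-ID": "ac7f4217c9e54fa7a6f4896c34f6964b",
}

def query_all(base_query, page_size=25):
    """Page through results with LIMIT/OFFSET until a short page is returned."""
    offset, rows = 0, []
    while True:
        sql = f"{base_query} LIMIT {page_size} OFFSET {offset}"
        resp = requests.post(API_URL, headers=HEADERS, json={"query": sql})
        resp.raise_for_status()
        page = resp.json()["data"]
        rows.extend(page)
        if len(page) < page_size:
            break
        offset += page_size
    return rows

rows = query_all("SELECT email, created_at FROM users WHERE created_at > '2024-01-01'")
```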
curl -X POST $API_URL \
-H "Content-Type: application/json" \
-H "X-Skyflow-Operation: tokenize-byot" \
-H "X-Skyflow-Cluster-ID: ebfc9bee4242" \
-H "X-Skyflow-Vault-ID: ac7f4217c9e54fa7a6f4896c34f6964b" \
-H "X-Skyflow-Table: users" \
-d '{
"records": [
{
"fields": {
"email": "john@example.com"
},
"tokens": {
"email": "my-custom-token-123"
}
}
]
}'

This API provides a dedicated endpoint for Snowflake external functions, enabling tokenization and detokenization directly within Snowflake queries.
Snowflake external functions use a specific request/response format:
Request:
{
"data": [
[0, "value1"],
[1, "value2"]
]
}

Response:
{
"data": [
[0, "result1"],
[1, "result2"]
]
}

Row numbers must match exactly between request and response.
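Conceptually, the endpoint unpacks each `[row_number, value]` pair, runs the operation on the batch of values, and re-attaches the same row numbers to the results. A rough Python sketch of that transformation (illustrative only; the actual Lambda handler is Node.js):

```python
def to_snowflake_response(request_body, transform):
    """Apply `transform` to each value while preserving Snowflake row numbers."""
    rows = request_body["data"]                 # e.g. [[0, "value1"], [1, "value2"]]
    values = [value for _, value in rows]
    results = transform(values)                 # e.g. tokenize or detokenize the whole batch
    return {"data": [[row_num, result] for (row_num, _), result in zip(rows, results)]}

# Example with a stand-in transform that fakes tokenization
fake = to_snowflake_response(
    {"data": [[0, "john@example.com"], [1, "jane@example.com"]]},
    lambda vals: [f"tok_{i}" for i, _ in enumerate(vals)],
)
# {"data": [[0, "tok_0"], [1, "tok_1"]]}
```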
Snowflake automatically prefixes all custom headers with sf-custom-. When you define a header in your Snowflake EXTERNAL FUNCTION, Snowflake adds this prefix before sending the request.
For example, if you define 'X-Skyflow-Operation' = 'tokenize' in your function's HEADERS clause, Snowflake will send it as sf-custom-X-Skyflow-Operation.
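In practice this means the backend has to look headers up under their prefixed names. A small illustration of the mapping (hypothetical helper, not the actual handler code):

```python
def strip_sf_custom(headers):
    """Return Skyflow headers with Snowflake's sf-custom- prefix removed."""
    prefix = "sf-custom-"
    return {
        name[len(prefix):]: value
        for name, value in headers.items()
        if name.lower().startswith(prefix)
    }

print(strip_sf_custom({"sf-custom-X-Skyflow-Operation": "tokenize"}))
# {'X-Skyflow-Operation': 'tokenize'}
```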
Headers to define in Snowflake (without the sf-custom- prefix):
| Header Name (in Snowflake) | Sent As | Required | Used For | Description |
|---|---|---|---|---|
| `X-Skyflow-Operation` | `sf-custom-X-Skyflow-Operation` | Yes | Both | Operation to perform: "tokenize" or "detokenize" |
| `X-Skyflow-Cluster-ID` | `sf-custom-X-Skyflow-Cluster-ID` | Yes | Both | Your Skyflow cluster ID |
| `X-Skyflow-Vault-ID` | `sf-custom-X-Skyflow-Vault-ID` | Yes | Both | Your Skyflow vault ID |
| `X-Skyflow-Env` | `sf-custom-X-Skyflow-Env` | No | Both | Skyflow environment: "SANDBOX" or "PROD" (defaults to PROD) |
| `X-Skyflow-Table` | `sf-custom-X-Skyflow-Table` | Yes | Tokenize only | Table name for storing data |
| `X-Skyflow-Column-Name` | `sf-custom-X-Skyflow-Column-Name` | Yes | Tokenize only | Column name in the table (single-column operations only) |
CREATE OR REPLACE API INTEGRATION skyflow_api_integration
API_PROVIDER = aws_api_gateway
API_AWS_ROLE_ARN = 'arn:aws:iam::YOUR_ACCOUNT:role/snowflake-api-role'
ENABLED = TRUE
API_ALLOWED_PREFIXES = ('https://YOUR_API_ID.execute-api.us-east-1.amazonaws.com/');

-- Production tokenize function
CREATE OR REPLACE EXTERNAL FUNCTION skyflow_tokenize(plaintext VARCHAR)
RETURNS VARCHAR
API_INTEGRATION = skyflow_api_integration
HEADERS = (
'X-Skyflow-Operation' = 'tokenize',
'X-Skyflow-Cluster-ID' = 'ebfc9bee4242',
'X-Skyflow-Vault-ID' = 'ac7f4217c9e54fa7a6f4896c34f6964b',
'X-Skyflow-Env' = 'PROD',
'X-Skyflow-Table' = 'users',
'X-Skyflow-Column-Name' = 'email'
)
AS 'https://YOUR_API_ID.execute-api.us-east-1.amazonaws.com/processSnowflake';
-- Sandbox tokenize function (optional)
CREATE OR REPLACE EXTERNAL FUNCTION skyflow_tokenize_sandbox(plaintext VARCHAR)
RETURNS VARCHAR
API_INTEGRATION = skyflow_api_integration
HEADERS = (
'X-Skyflow-Operation' = 'tokenize',
'X-Skyflow-Cluster-ID' = 'your-sandbox-cluster-id',
'X-Skyflow-Vault-ID' = 'your-sandbox-vault-id',
'X-Skyflow-Env' = 'SANDBOX',
'X-Skyflow-Table' = 'users',
'X-Skyflow-Column-Name' = 'email'
)
AS 'https://YOUR_API_ID.execute-api.us-east-1.amazonaws.com/processSnowflake';

-- Production detokenize function
CREATE OR REPLACE EXTERNAL FUNCTION skyflow_detokenize(token VARCHAR)
RETURNS VARCHAR
API_INTEGRATION = skyflow_api_integration
HEADERS = (
'X-Skyflow-Operation' = 'detokenize',
'X-Skyflow-Cluster-ID' = 'ebfc9bee4242',
'X-Skyflow-Vault-ID' = 'ac7f4217c9e54fa7a6f4896c34f6964b',
'X-Skyflow-Env' = 'PROD'
)
AS 'https://YOUR_API_ID.execute-api.us-east-1.amazonaws.com/processSnowflake';
-- Sandbox detokenize function (optional)
CREATE OR REPLACE EXTERNAL FUNCTION skyflow_detokenize_sandbox(token VARCHAR)
RETURNS VARCHAR
API_INTEGRATION = skyflow_api_integration
HEADERS = (
'X-Skyflow-Operation' = 'detokenize',
'X-Skyflow-Cluster-ID' = 'your-sandbox-cluster-id',
'X-Skyflow-Vault-ID' = 'your-sandbox-vault-id',
'X-Skyflow-Env' = 'SANDBOX'
)
AS 'https://YOUR_API_ID.execute-api.us-east-1.amazonaws.com/processSnowflake';

-- Tokenize email addresses when loading data
INSERT INTO tokenized_customers (id, email_token, name)
SELECT
id,
skyflow_tokenize(email) AS email_token,
name
FROM staging_customers;

-- Create masking policy for role-based access
CREATE OR REPLACE MASKING POLICY email_mask AS (val VARCHAR) RETURNS VARCHAR ->
CASE
WHEN CURRENT_ROLE() IN ('ADMIN', 'ANALYST') THEN skyflow_detokenize(val)
ELSE '***@***.com'
END;
-- Apply policy to column
ALTER TABLE tokenized_customers
MODIFY COLUMN email_token
SET MASKING POLICY email_mask;
-- Admins see real emails, others see masked
SELECT id, email_token, name
FROM tokenized_customers;

-- Detokenize only for specific users
SELECT
customer_id,
CASE
WHEN is_vip = TRUE
THEN skyflow_detokenize(email_token)
ELSE email_token
END AS email
FROM customers
WHERE created_date > '2024-01-01';

Emulate a Snowflake request for testing (note the sf-custom- prefix that Snowflake adds):
Tokenize (Production):
curl -X POST https://YOUR_API_ID.execute-api.us-east-1.amazonaws.com/processSnowflake \
-H "Content-Type: application/json" \
-H "sf-custom-X-Skyflow-Operation: tokenize" \
-H "sf-custom-X-Skyflow-Cluster-ID: ebfc9bee4242" \
-H "sf-custom-X-Skyflow-Vault-ID: ac7f4217c9e54fa7a6f4896c34f6964b" \
-H "sf-custom-X-Skyflow-Env: PROD" \
-H "sf-custom-X-Skyflow-Table: users" \
-H "sf-custom-X-Skyflow-Column-Name: email" \
-d '{"data":[[0,"john@example.com"],[1,"jane@example.com"]]}'

Response:
{
"data": [
[0, "tok_abc123xyz"],
[1, "tok_def456abc"]
]
}

Detokenize (Sandbox):
curl -X POST https://YOUR_API_ID.execute-api.us-east-1.amazonaws.com/processSnowflake \
-H "Content-Type: application/json" \
-H "sf-custom-X-Skyflow-Operation: detokenize" \
-H "sf-custom-X-Skyflow-Cluster-ID: your-sandbox-cluster-id" \
-H "sf-custom-X-Skyflow-Vault-ID: your-sandbox-vault-id" \
-H "sf-custom-X-Skyflow-Env: SANDBOX" \
-d '{"data":[[0,"tok_abc123xyz"],[1,"tok_def456abc"]]}'

Response:
{
"data": [
[0, "john@example.com"],
[1, "jane@example.com"]
]
}

- Snowflake batches rows automatically for efficiency
- Row order is preserved (guaranteed by both Snowflake and Skyflow)
- Lambda singleton pattern ensures fast warm starts
- Typical latency: 50-200ms for batches of 100-1000 rows
- Tokenize requires table name and column name headers
- Only single-column operations supported per function
- For multi-column tokenization, create multiple external functions (one per column)
The /processDatabricks endpoint enables Skyflow tokenization and detokenization in Databricks using Unity Catalog Batch Python UDFs.
Databricks → Lambda (batched) → Skyflow (batched)
- Batching to Lambda: Configurable (default 500 rows per call)
- Lambda to Skyflow: Automatic internal batching at 25 rows per Skyflow API call
- Functions: Persistent in Unity Catalog, governed and shareable
- Views: Support for persistent views with automatic tokenization/detokenization
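To make the two-level batching concrete, here is a hedged sketch of what a caller-side batch loop looks like; it is roughly what the notebook-generated UDF does per batch, assuming a placeholder API URL and the `requests` library (the real implementation lives in `samples/databricks.ipynb`):

```python
import requests

API_URL = "https://your-api-url.amazonaws.com/processDatabricks"  # placeholder
HEADERS = {
    "Content-Type": "application/json",
    "X-Skyflow-Operation": "tokenize",
    "X-Skyflow-Cluster-ID": "ebfc9bee4242",
    "X-Skyflow-Vault-ID": "ac7f4217c9e54fa7a6f4896c34f6964b",
    "X-Skyflow-Table": "users",
}

def tokenize_in_batches(emails, batch_size=500):
    """Send rows to the Lambda in chunks; the Lambda re-batches to Skyflow at 25 rows per call."""
    tokens = []
    for start in range(0, len(emails), batch_size):
        chunk = emails[start:start + batch_size]
        resp = requests.post(API_URL, headers=HEADERS,
                             json={"records": [{"email": e} for e in chunk]})
        resp.raise_for_status()
        tokens.extend(record["email"] for record in resp.json()["data"])
    return tokens
```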
- Deploy Lambda (see Quick Start)
- Import notebook: Upload `samples/databricks.ipynb` to Databricks
- Configure credentials in cell 1
- Run cells 2-3 to create persistent Unity Catalog functions
- Use in SQL:
-- Tokenize
WITH prepared AS (
  SELECT email, 'email' AS col FROM users
)
SELECT skyflow_tokenize_column(email, col) as token FROM prepared;

-- Detokenize
SELECT skyflow_detokenize(token) as email FROM tokens;
See samples/README.md for complete Databricks integration guide including:
- Detailed setup instructions
- Derived column pattern (required for UC PARAMETER STYLE PANDAS)
- Performance tuning and batch size configuration
- Troubleshooting guide
- Security best practices
- Handler signature patterns
import requests
API_URL = "https://your-api.amazonaws.com/process"
def tokenize(cluster_id, vault_id, table, records):
response = requests.post(
API_URL,
headers={
"X-Skyflow-Operation": "tokenize",
"X-Skyflow-Cluster-ID": cluster_id,
"X-Skyflow-Vault-ID": vault_id,
"X-Skyflow-Table": table
},
json={"records": records}
)
return response.json()["data"]
def detokenize(cluster_id, vault_id, tokens, redaction_type=None):
payload = {"tokens": tokens}
# Only include options if redaction_type is specified
if redaction_type:
payload["options"] = {"redactionType": redaction_type}
response = requests.post(
API_URL,
headers={
"X-Skyflow-Operation": "detokenize",
"X-Skyflow-Cluster-ID": cluster_id,
"X-Skyflow-Vault-ID": vault_id
},
json=payload
)
return response.json()["data"]
# Usage
tokens = tokenize(
"ebfc9bee4242",
"ac7f4217c9e54fa7a6f4896c34f6964b",
"users",
[{"email": "john@example.com"}]
)
# Governance-controlled detokenization (recommended)
values = detokenize(
"ebfc9bee4242",
"ac7f4217c9e54fa7a6f4896c34f6964b",
["tok_abc123xyz"]
)
# Or explicitly specify masking
masked_values = detokenize(
"ebfc9bee4242",
"ac7f4217c9e54fa7a6f4896c34f6964b",
["tok_abc123xyz"],
redaction_type="MASKED"
)

const axios = require('axios');
const API_URL = 'https://your-api.amazonaws.com/process';
async function tokenize(clusterId, vaultId, table, records) {
const response = await axios.post(API_URL,
{ records: records },
{
headers: {
'X-Skyflow-Operation': 'tokenize',
'X-Skyflow-Cluster-ID': clusterId,
'X-Skyflow-Vault-ID': vaultId,
'X-Skyflow-Table': table
}
}
);
return response.data.data;
}
async function detokenize(clusterId, vaultId, tokens, redactionType = null) {
const payload = { tokens: tokens };
// Only include options if redactionType is specified
if (redactionType) {
payload.options = { redactionType: redactionType };
}
const response = await axios.post(API_URL, payload, {
headers: {
'X-Skyflow-Operation': 'detokenize',
'X-Skyflow-Cluster-ID': clusterId,
'X-Skyflow-Vault-ID': vaultId
}
});
return response.data.data;
}
// Usage
const tokens = await tokenize(
'ebfc9bee4242',
'ac7f4217c9e54fa7a6f4896c34f6964b',
'users',
[{ email: 'john@example.com' }]
);
// Governance-controlled detokenization (recommended)
const values = await detokenize(
'ebfc9bee4242',
'ac7f4217c9e54fa7a6f4896c34f6964b',
['tok_abc123xyz']
);
// Or explicitly specify masking
const maskedValues = await detokenize(
'ebfc9bee4242',
'ac7f4217c9e54fa7a6f4896c34f6964b',
['tok_abc123xyz'],
'MASKED'
);

Config file: `lambda/skyflow-config.json` (git-ignored)
API Key:
{
"credentials": {
"apiKey": "sky-xxxxxxxxxxxxxxxx"
}
}

JWT Service Account:
{
"credentials": {
"clientID": "your-client-id",
"clientName": "your-client-name",
"tokenURI": "https://your-cluster.vault.skyflowapis.com/v1/auth/sa/oauth/token",
"keyID": "your-key-id",
"privateKey": "-----BEGIN RSA PRIVATE KEY-----\n...\n-----END RSA PRIVATE KEY-----"
}
}

The deploy script automatically converts your config file to Lambda environment variables:
- `SKYFLOW_API_KEY` (for API Key auth)
- `SKYFLOW_CLIENT_ID`, `SKYFLOW_CLIENT_NAME`, `SKYFLOW_TOKEN_URI`, `SKYFLOW_KEY_ID`, `SKYFLOW_PRIVATE_KEY` (for JWT auth)
Note: cluster_id is provided per-request, not in config. This allows routing to multiple clusters from a single Lambda function.
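Because cluster and vault come from request headers, one deployment can serve several vaults; only the header values change per call. An illustrative snippet reusing the `tokenize` helper from the Python example above (the second cluster/vault IDs are hypothetical placeholders):

```python
# Same Lambda deployment, two different vaults - only the headers differ
prod_tokens = tokenize("ebfc9bee4242", "ac7f4217c9e54fa7a6f4896c34f6964b",
                       "users", [{"email": "john@example.com"}])

# Hypothetical second cluster/vault, e.g. another environment or region
other_tokens = tokenize("your-other-cluster-id", "your-other-vault-id",
                        "users", [{"email": "jane@example.com"}])
```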
# Deploy or update
./deploy.sh
# Show help
./deploy.sh --help
# Setup IAM permissions (first-time only)
./deploy.sh --setup-permissions <iam-username>
# Destroy all resources
./deploy.sh --destroy

Client → API Gateway (single endpoint) → Lambda (pure SDK wrapper) → Skyflow SDK → Skyflow API
Key Files:
- `lambda/handler.js` - Request routing and validation
- `lambda/skyflow-client.js` - SDK wrapper (no custom logic)
- `lambda/config.js` - Credential loader
- `deploy.sh` - Deployment automation
Performance:
- Singleton SDK client (reused across warm Lambda invocations)
- Client caching per cluster+vault combination
- Configurable batching and concurrency
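The caching idea is simple: keep one client per cluster+vault combination and reuse it across warm invocations. A minimal sketch of the pattern in Python (the actual implementation is the Node.js code in `lambda/skyflow-client.js`):

```python
# Module-level cache survives across warm Lambda invocations
_client_cache = {}

def make_client(cluster_id, vault_id):
    """Stand-in for SDK client construction (the real code uses skyflow-node)."""
    return {"cluster": cluster_id, "vault": vault_id}

def get_client(cluster_id, vault_id):
    """Return a cached client for this cluster+vault pair, creating it on first use."""
    key = (cluster_id, vault_id)
    if key not in _client_cache:
        _client_cache[key] = make_client(cluster_id, vault_id)
    return _client_cache[key]
```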
All errors return HTTP 500 with:
{
"success": false,
"error": {
"message": "Error description",
"type": "ErrorType"
}
}

Common errors:
- `Missing required header: X-Skyflow-Cluster-ID`
- `Missing required header: X-Skyflow-Vault-ID`
- `Missing required header: X-Skyflow-Table`
- `Tokenization failed: Invalid vault ID`
- `Query failed: SQL syntax error`
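When calling the API programmatically, it is worth checking the `success` flag as well as the HTTP status. A small sketch, assuming the `requests` library:

```python
import requests

def call_api(url, headers, payload):
    """Raise a descriptive error when the Lambda reports a failure."""
    resp = requests.post(url, headers=headers, json=payload)
    body = resp.json()
    if resp.status_code != 200 or not body.get("success", False):
        err = body.get("error", {})
        raise RuntimeError(f"{err.get('type', 'Error')}: {err.get('message', 'unknown error')}")
    return body["data"]
```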
aws logs tail /aws/lambda/skyflow-lambda-api --follow

Monitor in AWS Console:
- Invocations
- Duration
- Errors
- Throttles
- Never commit `skyflow-config.json` (already in `.gitignore`)
- Use AWS Secrets Manager for production credentials (see the sketch after this list)
- Enable API Gateway authentication (API keys, IAM, Cognito)
- Rotate credentials regularly in Skyflow dashboard
- Monitor CloudWatch logs for suspicious activity
- Use HTTPS only (enforced by API Gateway)
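The Secrets Manager recommendation above can be implemented in a few lines; this is a hedged sketch only (the secret name and region are placeholders, and the shipped Lambda currently reads credentials from environment variables instead):

```python
import json
import boto3

def load_skyflow_api_key(secret_name="skyflow/api-key", region="us-east-1"):
    """Fetch the Skyflow API key from AWS Secrets Manager (placeholder secret name)."""
    client = boto3.client("secretsmanager", region_name=region)
    secret = client.get_secret_value(SecretId=secret_name)
    return json.loads(secret["SecretString"])["apiKey"]
```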
- Create `lambda/skyflow-config.json` from `config.example.json`
- Verify the file is in the correct location
- Ensure your request includes the required headers: `X-Skyflow-Cluster-ID` and `X-Skyflow-Vault-ID`
- Increase Lambda memory (more CPU): edit `MEMORY_SIZE` in `deploy.sh`
- Check CloudWatch logs for slow operations
- Verify AWS CLI is configured: `aws sts get-caller-identity`
- Run `./deploy.sh --setup-permissions <your-iam-user>` first
- Check you have `jq` installed: `jq --version`
MIT
- Skyflow SDK: https://github.com/skyflowapi/skyflow-node
- Skyflow Docs: https://docs.skyflow.com
- Issues: Open an issue in this repository