This repository contains the Skyflow User-Defined Functions (UDF) for AWS Athena, which allows you to integrate Skyflow's data privacy capabilities with AWS Athena queries.
- skyflow-udf-project/: Maven project for building the UDF
  - src/main/java/com/amazonaws/athena/connectors/udfs/SkyflowUDFHandler.java: The main UDF handler class
  - src/main/resources/log4j2.xml: Log4j2 configuration file
  - pom.xml: Maven project configuration file
- skyflowudf.jar: Pre-built JAR file for those who don't want to build the project
- log4j2.xml: Log4j2 configuration file
You have two options to use this UDF:
- Use the pre-built JAR file (skyflowudf.jar) directly
- Build the project yourself using Maven
To build the project, you need to have Maven installed. Then run:
cd skyflow-udf-project
mvn clean package
This will create a JAR file named skyflowudf.jar in the target directory.
- A Skyflow vault and the necessary credentials (Vault ID, Vault URL, and API keys or access tokens)
- An AWS account with access to Athena, Lambda, Secrets Manager, and IAM

- In the AWS Management Console, navigate to the Secrets Manager service.
- Click "Store a new secret".
- Choose "Other type of secret" and select "Plaintext".
- In the "Plaintext" field, enter a JSON object with the following structure:
  { "your-skyflow-role-here": "your-skyflow-access-token-here" }
  Replace your-skyflow-role-here with the Skyflow role you want to use (e.g., "Vault Editor"), and your-skyflow-access-token-here with the corresponding access token or API key.
- Set the secret name to skyflow-creds.
- Complete the secret creation process.
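The console steps above can also be scripted. The following is a minimal sketch, not part of this repository: the helper names build_secret_string and store_secret are hypothetical, and the client argument is assumed to behave like boto3.client("secretsmanager"), whose create_secret call accepts Name and SecretString.

```python
import json

SECRET_NAME = "skyflow-creds"  # secret name used in the steps above


def build_secret_string(role_tokens):
    """Serialize a {role: token} mapping into the plaintext JSON the secret holds."""
    for role, token in role_tokens.items():
        if not role or not token:
            raise ValueError("every entry needs a non-empty role and access token")
    return json.dumps(role_tokens)


def store_secret(client, role_tokens):
    """Create the secret through a Secrets Manager client supplied by the caller.

    Assumption: `client` behaves like boto3.client("secretsmanager").
    """
    return client.create_secret(
        Name=SECRET_NAME,
        SecretString=build_secret_string(role_tokens),
    )
```

Passing the client in keeps the sketch free of third-party imports; in practice you would call store_secret(boto3.client("secretsmanager"), {"Vault Editor": "<token>"}).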

- In the AWS Management Console, navigate to the IAM service.
- Click "Policies" and then "Create policy".
- In the JSON tab, paste the following policy:
  {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Sid": "VisualEditor0",
        "Effect": "Allow",
        "Action": "secretsmanager:GetSecretValue",
        "Resource": "arn:aws:secretsmanager:*:{account-id}:secret:skyflow-creds-*"
      }
    ]
  }
  Replace {account-id} with your AWS account ID.
- Set the policy name to GetSkyflowSecretValue.
- Create the policy.
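To avoid hand-editing the {account-id} placeholder, the policy document can be generated. This is a small sketch under the assumption that you only need the single statement shown above; the function name build_secret_policy is hypothetical.

```python
import json


def build_secret_policy(account_id, secret_name="skyflow-creds"):
    """Return the policy above as a dict with {account-id} filled in."""
    # AWS account IDs are 12-digit numbers; reject anything else early.
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError("AWS account IDs are 12 digits")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": "secretsmanager:GetSecretValue",
                "Resource": (
                    f"arn:aws:secretsmanager:*:{account_id}:secret:{secret_name}-*"
                ),
            }
        ],
    }


if __name__ == "__main__":
    # 123456789012 is a placeholder account ID, not a real one.
    print(json.dumps(build_secret_policy("123456789012"), indent=2))
```

The printed JSON can be pasted into the JSON tab as described in the steps above.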

- In the IAM service, click "Roles" and then "Create role".
- Select "AWS service" as the trusted entity and choose "Lambda" as the use case.
- Attach the following policies to the role:
  - AWSLambdaBasicExecutionRole
  - GetSkyflowSecretValue (the policy you created in the previous step)
- Set the role name to skyflowudfrole.
- Create the role.
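For reference, the role created by the steps above could be set up programmatically along these lines. This is a sketch, not the project's own tooling: create_udf_role is a hypothetical helper, the iam argument is assumed to behave like boto3.client("iam"), and the trust policy is the standard one that lets the Lambda service assume a role.

```python
import json

# Trust policy that allows the Lambda service to assume the role.
LAMBDA_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# AWS-managed policy that grants CloudWatch Logs access to Lambda functions.
BASIC_EXECUTION_ARN = (
    "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
)


def create_udf_role(iam, account_id, role_name="skyflowudfrole"):
    """Create the role and attach both policies; returns the attached ARNs.

    Assumption: `iam` behaves like boto3.client("iam").
    """
    iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(LAMBDA_TRUST_POLICY),
    )
    policy_arns = [
        BASIC_EXECUTION_ARN,
        f"arn:aws:iam::{account_id}:policy/GetSkyflowSecretValue",
    ]
    for arn in policy_arns:
        iam.attach_role_policy(RoleName=role_name, PolicyArn=arn)
    return policy_arns
```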

- In the AWS Management Console, navigate to the Lambda service.
- Click "Create function".
- Set the function name to skyflowudf.
- Choose "Java 11" as the runtime.
- Click on "Change default execution role" and under "Execution role", select "Use an existing role" and choose the skyflowudfrole role you created in the previous step.
- Click the "Create function" button.
- In the "Code" section, upload the Skyflow UDF JAR file (skyflowudf.jar).
- Click on "Edit" under the "Runtime settings" section and in the "Handler" field, enter com.amazonaws.athena.connectors.udfs.SkyflowUDFHandler.
- In the "Environment variables" section under the configuration tab, add the following variables:
  - VAULT_ID: Your Skyflow vault ID
  - VAULT_URL: Your Skyflow vault URL
  - SECRET: "skyflow-creds" (the secret name you created in Step 1)
  - LOG_LEVEL: "false" (set to "true" if you want to log API latency)
  Note: All variable names should be uppercase (CAPS).
- Save the Lambda function.
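A mistyped or lowercase variable name is easy to miss in the console. The check below is a hedged sketch of how the settings above could be validated before saving; check_lambda_environment is a hypothetical helper, and it assumes that all four listed variables are required and that LOG_LEVEL only takes "true" or "false".

```python
REQUIRED_VARS = ("VAULT_ID", "VAULT_URL", "SECRET", "LOG_LEVEL")


def check_lambda_environment(variables):
    """Return a list of problems in a {name: value} mapping; empty means OK."""
    problems = []
    # Variable names are expected in uppercase.
    for name in variables:
        if name != name.upper():
            problems.append(f"{name}: variable names must be uppercase")
    # Assumption: every variable listed in the steps above is required.
    for name in REQUIRED_VARS:
        if not variables.get(name):
            problems.append(f"{name}: missing or empty")
    # Assumption: LOG_LEVEL is a "true"/"false" switch and nothing else.
    if variables.get("LOG_LEVEL") not in (None, "", "true", "false"):
        problems.append('LOG_LEVEL: expected "true" or "false"')
    return problems
```

For example, check_lambda_environment({"vault_id": "abc"}) reports the lowercase name and each missing variable.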
In the Athena console, you can now use the Skyflow UDF in your queries. Here are some examples:
USING EXTERNAL FUNCTION getrecord(id varchar, tablename varchar, role varchar, redaction varchar)
RETURNS VARCHAR lambda 'skyflowudf'
SELECT getrecord(skyflow_id, 'aadhaar', 'administrator', 'REDACTED')
FROM aadhaar
LIMIT 10;
This query uses the getrecord external function to retrieve a record from the aadhaar table. The function takes the following parameters:
- id: The Skyflow ID of the record to retrieve
- tablename: The name of the table in the Skyflow vault
- role: The Skyflow role of the user requesting the data
- redaction: The redaction type to apply to the retrieved data
The function returns the retrieved data as a VARCHAR.
USING EXTERNAL FUNCTION detokenize(col varchar, skyflowrole varchar, redaction varchar)
RETURNS VARCHAR lambda 'skyflowudf'
SELECT
detokenize(name, 'administrator', 'PLAIN_TEXT') as name,
detokenize(gender, 'financer', 'DEFAULT') as gender,
country,
skyflow_id
FROM itr
LIMIT 50;
This query uses the detokenize external function to detokenize the name and gender columns from the itr table. The function takes the following parameters:
- col: The column with tokens that need to be detokenized
- skyflowrole: The Skyflow role of the user requesting the data
- redaction: The redaction type to apply to the detokenized data
The function returns the detokenized data as a VARCHAR.
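The same query can be submitted outside the Athena console. The following is a rough sketch rather than a supported client: run_query is a hypothetical helper, the client argument is assumed to behave like boto3.client("athena"), and the database name and S3 output location are values you would supply yourself.

```python
import time

DETOKENIZE_QUERY = """\
USING EXTERNAL FUNCTION detokenize(col varchar, skyflowrole varchar, redaction varchar)
RETURNS VARCHAR lambda 'skyflowudf'
SELECT
    detokenize(name, 'administrator', 'PLAIN_TEXT') as name,
    detokenize(gender, 'financer', 'DEFAULT') as gender,
    country,
    skyflow_id
FROM itr
LIMIT 50"""

TERMINAL_STATES = ("SUCCEEDED", "FAILED", "CANCELLED")


def run_query(client, query, database, output_location,
              poll_seconds=1.0, sleep=time.sleep):
    """Start a query and poll until it finishes; returns (query_id, state).

    Assumption: `client` behaves like boto3.client("athena").
    """
    started = client.start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]
    while True:
        status = client.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in TERMINAL_STATES:
            return query_id, state
        sleep(poll_seconds)  # query is still queued or running
```

Polling is needed because Athena runs queries asynchronously and may queue them before execution.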
- Athena processes queries by assigning resources based on the overall service load and the number of incoming requests. Your queries may be temporarily queued before they run. You can purchase dedicated capacity for your queries using the Capacity Reservation feature.
- There is no defined limit for parallel Lambda requests; the achievable parallelism depends on how many records Athena can process in a single batch and on how much load the Lambda function can handle.
The project depends on the following libraries:
- AWS Athena Federation SDK
- SLF4J
- Log4j2
- Skyflow Java SDK
- JSON Simple
All dependencies are managed by Maven and included in the final JAR file.