Commit ec113b8

Paulo Pereira committed
feat (typescript): added sample for eventual load on Aurora from S3 data
1 parent 2dc9931 commit ec113b8

16 files changed (+989, -0 lines changed)
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
*.js
!jest.config.js
*.d.ts
node_modules

# CDK asset staging directory
.cdk.staging
cdk.out

.idea/
package-lock.json
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
*.ts
!*.d.ts

# CDK asset staging directory
.cdk.staging
cdk.out
Lines changed: 118 additions & 0 deletions
@@ -0,0 +1,118 @@
# demo-aurora-eventual-s3-data-load

## Project Overview

This project implements an event-driven architecture that ingests `CSV` files uploaded to an _Amazon S3_ bucket into
an _Aurora MySQL_ database. It relies on the `LOAD DATA FROM S3` statement supported by _Aurora MySQL_, so the database
reads each file directly from _S3_ and bulk-loads it into the target table.
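As a rough illustration of what such a statement can look like, the helper below builds a `LOAD DATA FROM S3` statement for one uploaded file. It is a sketch, not this project's code: the table name, the column list, and the `;` field separator (taken from the sample files in `data/`) are assumptions to adapt to the actual schema.

```typescript
// Hypothetical description of one file to load; the table and column names
// are assumptions, not names taken from this project's stack.
interface LoadTarget {
  bucket: string;
  key: string; // already URL-decoded S3 object key
  table: string;
  columns: string[];
}

// Escape a value for a single-quoted MySQL string literal
// (assumes the default sql_mode, where backslash is the escape character).
function quoteLiteral(value: string): string {
  return "'" + value.replace(/\\/g, "\\\\").replace(/'/g, "\\'") + "'";
}

// Quote a MySQL identifier with backticks, doubling any embedded backtick.
function quoteIdentifier(name: string): string {
  return "`" + name.replace(/`/g, "``") + "`";
}

// Build an Aurora MySQL LOAD DATA FROM S3 statement for a ';'-separated file.
function buildLoadStatement(target: LoadTarget): string {
  const uri = `s3://${target.bucket}/${target.key}`;
  return [
    `LOAD DATA FROM S3 FILE ${quoteLiteral(uri)}`,
    `INTO TABLE ${quoteIdentifier(target.table)}`,
    "FIELDS TERMINATED BY ';'",
    "LINES TERMINATED BY '\\n'",
    `(${target.columns.map(quoteIdentifier).join(", ")})`,
  ].join("\n");
}

// Usage with hypothetical names:
console.log(
  buildLoadStatement({ bucket: "data-bucket-example", key: "db-data.csv", table: "demo_data", columns: ["id", "name"] }),
);
```

Keeping the statement in one helper makes it easy to add clauses such as `IGNORE 1 LINES` if the files ever gain a header row.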

Key Features:

1. **Event-Driven Architecture**: The system reacts to the events _S3_ emits when a `CSV` file is uploaded to the
   designated bucket, so data is ingested shortly after upload without manual intervention.

2. **Reliable Data Ingestion**: The `LOAD` command loads the `CSV` data into the _Aurora MySQL_ database in a single
   bulk statement, instead of having the _Lambda_ function insert the rows one by one.

3. **Error Handling and Notifications**: If errors or warnings occur during ingestion, the system stores the error
   details in an _SQS Dead Letter Queue_ (DLQ) for later analysis and troubleshooting. An alarm is also triggered to
   notify the relevant stakeholders so the issue can be resolved promptly.

4. **Monitoring and Observability**: The _Lambda_ function logs and the DLQ alarm let you track each load, the health
   of the pipeline, and potential issues in the data ingestion process.
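One way to apply the "errors or warnings" rule from feature 3 is sketched below. A failing `LOAD` already makes the database driver reject the query, but a load that completes with warnings does not, so the warning count has to be checked explicitly. The result shape is an assumption modeled on the `ResultSetHeader` object returned by the `mysql2` driver, and `DataLoadError`/`assertCleanLoad` are hypothetical names. Throwing from the handler leaves the _SQS_ message on the queue, so it is retried and, once the queue's maximum receive count is exceeded, moved to the _DLQ_.

```typescript
// Assumed subset of the result a MySQL driver returns for a LOAD statement;
// the field names mirror mysql2's ResultSetHeader.
interface LoadResult {
  affectedRows: number;
  warningStatus: number;
}

// Error carrying the load result so the handler can log it before failing.
class DataLoadError extends Error {
  readonly result: LoadResult;

  constructor(message: string, result: LoadResult) {
    super(message);
    this.name = "DataLoadError";
    this.result = result;
  }
}

// Return the number of loaded rows, or throw if the LOAD reported warnings.
// Throwing from the Lambda handler leaves the SQS message on the queue, so it
// is retried and eventually redriven to the DLQ.
function assertCleanLoad(result: LoadResult): number {
  if (result.warningStatus > 0) {
    throw new DataLoadError(
      `LOAD reported ${result.warningStatus} warning(s) (${result.affectedRows} row(s) affected)`,
      result,
    );
  }
  return result.affectedRows;
}

// Usage: a clean load simply returns the row count.
console.log(assertCleanLoad({ affectedRows: 4, warningStatus: 0 }));
```

Before throwing, a real handler could also run `SHOW WARNINGS` on the same connection so the log contains the row-level details.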

## Architecture

The architecture consists of the following components:

1. An _S3_ bucket that stores the `CSV` files and sends an event notification for each upload.
2. An _SQS_ queue that stores the event notifications from the _S3_ bucket.
3. An _SQS_ queue that stores the messages that could not be processed (_dead-letter queue_).
4. A _Lambda_ function deployed in the same _VPC_ as the database. It consumes the messages from the _SQS_ queue and
   triggers the database `LOAD` process, passing the data file details.
5. An _Aurora MySQL_ database where the new data is loaded and persisted. The database is responsible for fetching the
   data file from the _S3_ bucket through the `LOAD` command.
6. A _CloudWatch Alarm_ that monitors the `NumberOfMessagesReceived` metric of the _DLQ_.
7. An _SNS_ topic that the alarm publishes to when new messages arrive in the _DLQ_. Stakeholders can subscribe to this
   topic directly with their e-mail addresses or mobile phone numbers. A custom integration can also be built to notify
   the related stakeholders through third-party components.
8. _CloudWatch Logs_, which captures and stores all the logs produced by the _Lambda_ function for further analysis.

![architecture.png](docs/architecture.png)
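For component 4, the _Lambda_ function has to pull the bucket name and object key out of each _SQS_ message. The sketch below assumes the bucket sends its notifications directly to the queue (no _SNS_ or _EventBridge_ envelope) and declares only the fields it reads; `filesFromSqsBody` and `decodeS3Key` are hypothetical names. Two details are easy to miss: object keys arrive URL-encoded with spaces as `+`, and _S3_ sends a one-off `s3:TestEvent` message, which has no `Records`, when the notification configuration is created.

```typescript
// Only the fields this sketch reads from an S3 event notification
// (assumption: S3 publishes directly to SQS, without an SNS/EventBridge envelope).
interface S3EventRecord {
  eventName: string; // e.g. "ObjectCreated:Put"
  s3: {
    bucket: { name: string };
    object: { key: string };
  };
}

interface S3Notification {
  Records?: S3EventRecord[];
  Event?: string; // set to "s3:TestEvent" on the test message
}

interface DataFile {
  bucket: string;
  key: string;
}

// Keys in notifications are URL-encoded, with spaces encoded as '+'.
function decodeS3Key(rawKey: string): string {
  return decodeURIComponent(rawKey.replace(/\+/g, " "));
}

// Extract the uploaded files referenced by one SQS message body.
function filesFromSqsBody(body: string): DataFile[] {
  const notification = JSON.parse(body) as S3Notification;
  if (notification.Event === "s3:TestEvent" || notification.Records === undefined) {
    return []; // nothing to load; the message can simply be acknowledged
  }
  return notification.Records
    .filter((record) => record.eventName.startsWith("ObjectCreated:"))
    .map((record) => ({ bucket: record.s3.bucket.name, key: decodeS3Key(record.s3.object.key) }));
}

// Usage with a trimmed-down notification body:
const sampleBody = JSON.stringify({
  Records: [
    { eventName: "ObjectCreated:Put", s3: { bucket: { name: "data-bucket-example" }, object: { key: "db+data%281%29.csv" } } },
  ],
});
console.log(filesFromSqsBody(sampleBody));
```

Returning an empty list for the test event lets the handler acknowledge it instead of failing and pushing it to the _DLQ_.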

## Deployment Guide

### Prerequisites

- AWS CLI installed and configured with appropriate IAM permissions
- Node.js installed (version 22.4 or later)
- AWS CDK installed (version 2.160 or later)

### CDK Toolkit

The `cdk.json` file tells the CDK Toolkit how to execute your app.

Before working with the project, install its dependencies (and the AWS CDK CLI, if it is not installed already). In
the project directory, run:

```bash
$ npm install
```

### Deploying the solution

To deploy the solution, ask the CDK to deploy all stacks:

```shell
$ cdk deploy --all
```

> **Note**: When the deployment finishes, the stack outputs are printed in the terminal, providing information about
> the deployed solution:
> - **DataBucketName**: S3 bucket where the data files are uploaded.
> - **DataLoadQueueName**: Queue that stores the events sent from S3.
> - **DLQName**: Dead-letter queue that stores the failed events.
> - **NotificationTopicName**: SNS topic that notifies the stakeholders about failed processes.
> - **FunctionLogGroupName**: CloudWatch Log Group that stores the Lambda function's logs.
> - **VpcId**: VPC where the database and the Lambda function are deployed.
> - **DatabaseSecretName**: Secrets Manager secret that holds the database credentials.

```shell
Outputs:
DemoAuroraEventualDataLoadStack.BastionHostSecurityGroupId = sg-XXXXXXXXXXX
DemoAuroraEventualDataLoadStack.DLQName = demo-data-load-dlq
DemoAuroraEventualDataLoadStack.DataBucketName = data-bucket-XXXXXXXXXXX
DemoAuroraEventualDataLoadStack.DataLoadQueueName = demo-data-load
DemoAuroraEventualDataLoadStack.DatabaseSecretName = demo-aurora-eventual-load-database-secret
DemoAuroraEventualDataLoadStack.FunctionLogGroupName = /aws/lambda/demo-aurora-eventual-data-load-function
DemoAuroraEventualDataLoadStack.NotificationTopicName = demo-aurora-eventual-load-notification
DemoAuroraEventualDataLoadStack.VpcId = vpc-XXXXXXXXXXX
```
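When scripting the tests below, it can be handier to read these values from a file than from the terminal. `cdk deploy` accepts `--outputs-file <path>`, which writes the outputs as JSON keyed by stack name and then by output name. The hypothetical helpers below (not part of this repository) parse that JSON and look up a single output, failing loudly if it is missing:

```typescript
// Shape written by `cdk deploy --outputs-file`: { [stackName]: { [outputName]: value } }.
type CdkOutputs = Record<string, Record<string, string>>;

// Parse the outputs file contents (read the file however you prefer and pass the string in).
function parseOutputs(json: string): CdkOutputs {
  return JSON.parse(json) as CdkOutputs;
}

// Look up one output value, throwing if the stack or output name is unknown.
function getOutput(outputs: CdkOutputs, stack: string, name: string): string {
  const value = outputs[stack]?.[name];
  if (value === undefined) {
    throw new Error(`Output ${stack}.${name} not found`);
  }
  return value;
}

// Usage with values shaped like the outputs above:
const outputs = parseOutputs(
  '{"DemoAuroraEventualDataLoadStack":{"DataBucketName":"data-bucket-XXXXXXXXXXX","DLQName":"demo-data-load-dlq"}}',
);
console.log(getOutput(outputs, "DemoAuroraEventualDataLoadStack", "DataBucketName"));
```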

## Testing the solution

1. Open the _AWS_ console and navigate to _S3_.
2. Select the bucket named in the `DataBucketName` output and click `Upload`.
3. Select one of the files in the _/data_ directory of this repo:
   1. `db-data.csv` loads successfully into the database, and the logs show the number of rows loaded.
   2. `db-data-with-error.csv` produces errors, so its message is delivered to the _DLQ_. This triggers the alarm, which
      sends a message to the notification topic. The logs describe the errors encountered.
4. You can check the logs produced by the solution in the _CloudWatch Log Group_ named in the `FunctionLogGroupName`
   output.
5. When testing the failure scenario, after uploading the file with errors you should see the failure logs in the
   _CloudWatch Log Group_ and the alarm in the "In alarm" state in the _CloudWatch Alarms_ section.
6. **(OPTIONAL)** You can subscribe your e-mail address to the _SNS Notification Topic_ (and confirm the subscription)
   to check that you receive an e-mail when a failure occurs.
7. **(OPTIONAL)** You can access the database by deploying an _EC2_ bastion host or a _CloudShell_ session inside the
   created _VPC_ (see the `VpcId` output). You will need to install the _MySQL_ client and retrieve the database
   credentials from _Secrets Manager_ (see the `DatabaseSecretName` output).
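The two sample files differ in a visible way: one has two `;`-separated fields per row with numeric ids, the other has three fields with ids such as `a1`. If you want to catch such problems before uploading, a local check like the one below can help. It is only a sketch: it assumes the target table takes an integer id followed by a text value, which may not match the table actually created by the stack, and `validateCsv` is a hypothetical name.

```typescript
// Hypothetical schema check: assumes the target table expects exactly two
// ';'-separated fields per row, the first being an integer id.
const EXPECTED_FIELDS = 2;

interface RowIssue {
  line: number; // 1-based line number in the file
  reason: string;
}

// Report rows that would likely be rejected or loaded with warnings.
function validateCsv(content: string): RowIssue[] {
  const issues: RowIssue[] = [];
  content.split(/\r?\n/).forEach((raw, index) => {
    if (raw.trim() === "") return; // ignore blank lines such as a trailing newline
    const fields = raw.split(";");
    const line = index + 1;
    if (fields.length !== EXPECTED_FIELDS) {
      issues.push({ line, reason: `expected ${EXPECTED_FIELDS} fields, found ${fields.length}` });
    }
    if (!/^\d+$/.test(fields[0])) {
      issues.push({ line, reason: `id "${fields[0]}" is not an integer` });
    }
  });
  return issues;
}

// Usage with rows from the two sample files:
console.log(validateCsv("1;Teste\n2;Teste 2\n30;Teste 30\n33;Teste 33")); // []
console.log(validateCsv("a1;Teste;aaaa\na2;Teste 2;bbbb").length); // 4
```

Each row of the three-field sample is flagged twice, once for the extra column and once for the non-numeric id.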

## Cleanup

To destroy the provisioned infrastructure, run:

```shell
$ cdk destroy --all
```
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
#!/usr/bin/env node
import 'source-map-support/register';
import * as cdk from 'aws-cdk-lib';
import { DemoAuroraEventualDataLoadStack } from '../lib/demo-aurora-eventual-data-load-stack';

const app = new cdk.App();
new DemoAuroraEventualDataLoadStack(app, 'DemoAuroraEventualDataLoadStack', {
  /* If you don't specify 'env', this stack will be environment-agnostic.
   * Account/Region-dependent features and context lookups will not work,
   * but a single synthesized template can be deployed anywhere. */

  /* Uncomment the next line to specialize this stack for the AWS Account
   * and Region that are implied by the current CLI configuration. */
  // env: { account: process.env.CDK_DEFAULT_ACCOUNT, region: process.env.CDK_DEFAULT_REGION },

  /* Uncomment the next line if you know exactly what Account and Region you
   * want to deploy the stack to. */
  // env: { account: '123456789012', region: 'us-east-1' },

  /* For more information, see https://docs.aws.amazon.com/cdk/latest/guide/environments.html */
});
Lines changed: 72 additions & 0 deletions
@@ -0,0 +1,72 @@
{
  "app": "npx ts-node --prefer-ts-exts bin/demo-aurora-eventual-data-load.ts",
  "watch": {
    "include": [
      "**"
    ],
    "exclude": [
      "README.md",
      "cdk*.json",
      "**/*.d.ts",
      "**/*.js",
      "tsconfig.json",
      "package*.json",
      "yarn.lock",
      "node_modules",
      "test"
    ]
  },
  "context": {
    "@aws-cdk/aws-lambda:recognizeLayerVersion": true,
    "@aws-cdk/core:checkSecretUsage": true,
    "@aws-cdk/core:target-partitions": [
      "aws",
      "aws-cn"
    ],
    "@aws-cdk-containers/ecs-service-extensions:enableDefaultLogDriver": true,
    "@aws-cdk/aws-ec2:uniqueImdsv2TemplateName": true,
    "@aws-cdk/aws-ecs:arnFormatIncludesClusterName": true,
    "@aws-cdk/aws-iam:minimizePolicies": true,
    "@aws-cdk/core:validateSnapshotRemovalPolicy": true,
    "@aws-cdk/aws-codepipeline:crossAccountKeyAliasStackSafeResourceName": true,
    "@aws-cdk/aws-s3:createDefaultLoggingPolicy": true,
    "@aws-cdk/aws-sns-subscriptions:restrictSqsDescryption": true,
    "@aws-cdk/aws-apigateway:disableCloudWatchRole": true,
    "@aws-cdk/core:enablePartitionLiterals": true,
    "@aws-cdk/aws-events:eventsTargetQueueSameAccount": true,
    "@aws-cdk/aws-ecs:disableExplicitDeploymentControllerForCircuitBreaker": true,
    "@aws-cdk/aws-iam:importedRoleStackSafeDefaultPolicyName": true,
    "@aws-cdk/aws-s3:serverAccessLogsUseBucketPolicy": true,
    "@aws-cdk/aws-route53-patters:useCertificate": true,
    "@aws-cdk/customresources:installLatestAwsSdkDefault": false,
    "@aws-cdk/aws-rds:databaseProxyUniqueResourceName": true,
    "@aws-cdk/aws-codedeploy:removeAlarmsFromDeploymentGroup": true,
    "@aws-cdk/aws-apigateway:authorizerChangeDeploymentLogicalId": true,
    "@aws-cdk/aws-ec2:launchTemplateDefaultUserData": true,
    "@aws-cdk/aws-secretsmanager:useAttachedSecretResourcePolicyForSecretTargetAttachments": true,
    "@aws-cdk/aws-redshift:columnId": true,
    "@aws-cdk/aws-stepfunctions-tasks:enableEmrServicePolicyV2": true,
    "@aws-cdk/aws-ec2:restrictDefaultSecurityGroup": true,
    "@aws-cdk/aws-apigateway:requestValidatorUniqueId": true,
    "@aws-cdk/aws-kms:aliasNameRef": true,
    "@aws-cdk/aws-autoscaling:generateLaunchTemplateInsteadOfLaunchConfig": true,
    "@aws-cdk/core:includePrefixInUniqueNameGeneration": true,
    "@aws-cdk/aws-efs:denyAnonymousAccess": true,
    "@aws-cdk/aws-opensearchservice:enableOpensearchMultiAzWithStandby": true,
    "@aws-cdk/aws-lambda-nodejs:useLatestRuntimeVersion": true,
    "@aws-cdk/aws-efs:mountTargetOrderInsensitiveLogicalId": true,
    "@aws-cdk/aws-rds:auroraClusterChangeScopeOfInstanceParameterGroupWithEachParameters": true,
    "@aws-cdk/aws-appsync:useArnForSourceApiAssociationIdentifier": true,
    "@aws-cdk/aws-rds:preventRenderingDeprecatedCredentials": true,
    "@aws-cdk/aws-codepipeline-actions:useNewDefaultBranchForCodeCommitSource": true,
    "@aws-cdk/aws-cloudwatch-actions:changeLambdaPermissionLogicalIdForLambdaAction": true,
    "@aws-cdk/aws-codepipeline:crossAccountKeysDefaultValueToFalse": true,
    "@aws-cdk/aws-codepipeline:defaultPipelineTypeToV2": true,
    "@aws-cdk/aws-kms:reduceCrossAccountRegionPolicyScope": true,
    "@aws-cdk/aws-eks:nodegroupNameAttribute": true,
    "@aws-cdk/aws-ec2:ebsDefaultGp3Volume": true,
    "@aws-cdk/aws-ecs:removeDefaultDeploymentAlarm": true,
    "@aws-cdk/custom-resources:logApiResponseDataPropertyTrueDefault": false,
    "@aws-cdk/aws-s3:keepNotificationInImportedBucket": false
  }
}
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
a1;Teste;aaaa
a2;Teste 2;bbbb
a30;Teste 30;cccc
a33;Teste 33;dddd
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
1;Teste
2;Teste 2
30;Teste 30
33;Teste 33
