Commit 7a34920

New Post: Serverless ETL Pipeline for Weather Data
1 parent 19f8cfa commit 7a34920

2 files changed

+45
-0
lines changed

@@ -0,0 +1,45 @@
---
title: "Serverless ETL Pipeline for Weather Data"
date: 2025-04-23 08:00:00 -0500
mermaid: true
categories: [AWS, Data Engineering]
tags: [AWS, Serverless, ETL, AWS Lambda, Amazon Data Firehose, Amazon S3, AWS Glue, Amazon Aurora, AWS Secrets Manager, VPC Endpoints, Amazon EventBridge, Amazon Athena]
image:
  path: /assets/img/headers/serverless-etl.webp
  lqip:
---

This solution builds a serverless data pipeline that collects, processes, and analyzes global weather data using:
- OpenWeatherMap API as the data source
- AWS Lambda function for data ingestion
- Amazon Data Firehose (formerly Kinesis Data Firehose) for streaming delivery
- Amazon S3 for data lake storage with time-based partitioning
- AWS Glue Crawler & Data Catalog for automated schema discovery
- Amazon Athena for SQL queries against the raw data
- VPC endpoints for private access to S3 and Secrets Manager
- AWS Secrets Manager for credential management
- AWS Glue ETL for data transformation
- Aurora Serverless v2 for structured data storage and analysis
- EventBridge rules for workflow orchestration

The result is a pipeline with minimal operational overhead that keeps raw data in S3 and structured data in Aurora, so both can be queried for weather insights.
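
As a rough illustration of the ingestion stage listed above, the following Python sketch shows how a Lambda function might turn an OpenWeatherMap response into a Firehose record. It is a minimal sketch, not the code from the video: the helper names (`build_request_url`, `to_firehose_record`, `send`), the chosen fields, and the stream name are assumptions, and the Firehose client is passed in so the logic can be exercised without AWS access.

```python
import json
import urllib.parse
from datetime import datetime, timezone

# Current-weather endpoint of the OpenWeatherMap API.
OWM_URL = "https://api.openweathermap.org/data/2.5/weather"


def build_request_url(city, api_key, units="metric"):
    """Build the request URL for one city (hypothetical helper)."""
    query = urllib.parse.urlencode({"q": city, "appid": api_key, "units": units})
    return f"{OWM_URL}?{query}"


def to_firehose_record(payload):
    """Flatten a current-weather response into one newline-delimited JSON record.

    The selected fields are an assumption; the original post does not list them.
    """
    observed = datetime.fromtimestamp(payload["dt"], tz=timezone.utc)
    row = {
        "city": payload["name"],
        "temperature": payload["main"]["temp"],
        "humidity": payload["main"]["humidity"],
        "observed_at": observed.isoformat(),
    }
    # A trailing newline keeps objects separated once Firehose concatenates them in S3.
    return {"Data": (json.dumps(row) + "\n").encode("utf-8")}


def send(firehose_client, stream_name, payloads):
    """Send a batch of responses using a boto3 Firehose client (or a stand-in)."""
    records = [to_firehose_record(p) for p in payloads]
    return firehose_client.put_record_batch(
        DeliveryStreamName=stream_name, Records=records
    )
```

Inside a real Lambda handler, `firehose_client` would typically be `boto3.client("firehose")`, and the API key would come from Secrets Manager rather than from code or environment variables.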

## Intended Audience
This video is designed for beginners in AWS data engineering who want hands-on experience with ETL pipelines. It also suits anyone preparing for the AWS Certified Data Engineer - Associate exam or wanting to build practical cloud data skills through a real-world project.
30+
## Learning Objectives
31+
- Create a Lambda function to process weather data from the OpenWeatherMap API
32+
- Set up Kinesis Firehose to store data in S3 with dynamic partitioning
33+
- Implement AWS Secrets Manager for secure credential management
34+
- Configure a Glue Crawler to catalog the S3 data
35+
- Set up Amazon Athena for querying the raw data via the Glue Data Catalog
36+
- Deploy Amazon Aurora Serverless v2 database to store transformed data
37+
- Establish VPC endpoints for S3 and Secrets Manager for enhanced security
38+
- Build a Glue ETL pipeline using visual ETL and script mode for data transformation
39+
- Configure automated triggers using EventBridge for Lambda and Glue triggers for the crawler and ETL job
40+
- Run SQL queries via Aurora's built-in Query Editor
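
Two of the objectives above, time-based partitioning in S3 and credential lookup through Secrets Manager, can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions: the `raw/weather` root, the `year=/month=/day=/hour=` layout, and the `username`/`password` keys in the secret are hypothetical choices, not details confirmed by the post. The Secrets Manager client is injected so the function can be tried without AWS access.

```python
import json
from datetime import datetime, timezone


def partition_prefix(observed_at, root="raw/weather"):
    """Return a Hive-style S3 prefix for a timezone-aware timestamp.

    The key=value layout is an assumption; it is a common choice because
    Glue crawlers can infer partition column names from it.
    """
    ts = observed_at.astimezone(timezone.utc)
    return f"{root}/year={ts:%Y}/month={ts:%m}/day={ts:%d}/hour={ts:%H}/"


def load_db_credentials(secrets_client, secret_id):
    """Read database credentials from a JSON secret.

    Assumes the secret stores "username" and "password" keys.
    """
    response = secrets_client.get_secret_value(SecretId=secret_id)
    secret = json.loads(response["SecretString"])
    return secret["username"], secret["password"]
```

In the pipeline itself, Firehose would produce the prefix from its own configuration rather than from application code; the function only shows the shape of the resulting keys. `secrets_client` would typically be `boto3.client("secretsmanager")`.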

This hands-on demonstration walks through each stage of the pipeline, from ingesting data with Lambda to querying the transformed data in Aurora.
44+
## Get Started
45+
[Solution Overview](https://www.youtube.com/watch?v=aKEC8z9_UA4&t=1s&ab_channel=Hands-OnWithDigitalDen){:target="_blank"}