| 1 | +--- |
| 2 | +title: "Serverless ETL Pipeline for Weather Data" |
| 3 | +date: 2025-04-23 08:00:00 -0500 |
| 4 | +mermaid: true |
| 5 | +categories: [AWS, Data Engineering] |
| 6 | +tags: [AWS, Serverless, ETL, AWS Lambda, Amazon Data Firehose, Amazon S3, AWS Glue, Amazon Aurora, AWS Secrets Manager, VPC Endpoints, Amazon EventBridge, Amazon Athena] |
| 7 | +image: |
| 8 | + path: /assets/img/headers/serverless-etl.webp |
| 9 | + lqip:  |
| 10 | +--- |
| 11 | + |
| 12 | +This solution builds a serverless data pipeline that collects, processes, and analyzes global weather data using: |
| 13 | +- OpenWeatherMap API as the data source |
| 14 | +- Lambda function for data ingestion |
| 15 | +- Amazon Data Firehose (formerly Kinesis Data Firehose) for streaming delivery |
| 16 | +- S3 for data lake storage with time-based partitioning |
| 17 | +- AWS Glue Crawler & Data Catalog for automated schema discovery |
| 18 | +- Amazon Athena for SQL query capabilities against raw data |
| 19 | +- VPC Endpoints for secure data access |
| 20 | +- AWS Secrets Manager for credential management |
| 21 | +- AWS Glue ETL for data transformation |
| 22 | +- Aurora Serverless v2 for structured data storage and analysis |
| 23 | +- EventBridge Rules for workflow orchestration |
| 24 | + |
| 25 | +The pipeline is built from serverless and managed services, so it runs with minimal operational overhead. It keeps the raw data in S3 for ad hoc queries and loads the transformed, structured data into Aurora for analysis. |
| 26 | + |
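The ingestion and storage steps listed above can be sketched in Python. This is a minimal illustration, not the code used in the video: the function names `to_firehose_record` and `partition_prefix`, the selected fields, and the `year=/month=/day=/hour=` prefix layout are assumptions, and the sample payload only mimics the shape of an OpenWeatherMap current-weather response.

```python
import json
from datetime import datetime, timezone


def to_firehose_record(payload: dict) -> bytes:
    """Flatten a weather payload into one newline-terminated JSON record."""
    record = {
        "city": payload["name"],
        "country": payload["sys"]["country"],
        "temp": payload["main"]["temp"],
        "humidity": payload["main"]["humidity"],
        "description": payload["weather"][0]["description"],
        "observed_at": payload["dt"],  # Unix epoch seconds (UTC)
    }
    # Newline-delimited JSON keeps one object per line in the delivered files.
    return (json.dumps(record) + "\n").encode("utf-8")


def partition_prefix(epoch_seconds: int) -> str:
    """Build a Hive-style, time-based S3 prefix from an epoch timestamp."""
    ts = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return f"year={ts:%Y}/month={ts:%m}/day={ts:%d}/hour={ts:%H}/"


# Hypothetical sample shaped like a current-weather response.
sample = {
    "name": "Austin",
    "sys": {"country": "US"},
    "main": {"temp": 24.5, "humidity": 60},
    "weather": [{"description": "clear sky"}],
    "dt": 1745413200,
}

data = to_firehose_record(sample)
prefix = partition_prefix(sample["dt"])
```

In a Lambda handler, the resulting bytes would typically be passed to the boto3 Firehose client's `put_record` call as `Record={"Data": data}`. The prefix function only shows what a time-based layout could look like; Firehose can also be configured to generate such prefixes itself.
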
| 27 | +## Intended Audience |
| 28 | +This video is designed for beginners in AWS data engineering who want hands-on experience with ETL pipelines. It also suits anyone preparing for the AWS Certified Data Engineer - Associate exam or looking to build practical cloud data skills through a real-world project. |
| 29 | + |
| 30 | +## Learning Objectives |
| 31 | +- Create a Lambda function to process weather data from the OpenWeatherMap API |
| 32 | +- Set up Amazon Data Firehose to deliver data to S3 with dynamic partitioning |
| 33 | +- Implement AWS Secrets Manager for secure credential management |
| 34 | +- Configure a Glue Crawler to catalog the S3 data |
| 35 | +- Set up Amazon Athena for querying the raw data via the Glue Data Catalog |
| 36 | +- Deploy an Amazon Aurora Serverless v2 database to store transformed data |
| 37 | +- Establish VPC endpoints for S3 and Secrets Manager for enhanced security |
| 38 | +- Build a Glue ETL pipeline using visual ETL and script mode for data transformation |
| 39 | +- Automate the pipeline with an EventBridge rule that invokes the Lambda function and Glue triggers that run the crawler and ETL job |
| 40 | +- Run SQL queries via Aurora's built-in Query Editor |
| 41 | + |
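For the Secrets Manager objective above, here is a small Python sketch of reading database credentials from a secret rather than hard-coding them. It assumes the secret stores a JSON string with `host`, `port`, `username`, and `password` keys. The secret name `weather/aurora`, the helper `load_db_credentials`, and the `FakeSecretsClient` stand-in are hypothetical and exist only so the example runs without an AWS account.

```python
import json


def load_db_credentials(secrets_client, secret_id: str) -> dict:
    """Fetch a secret and parse its JSON string into connection settings."""
    response = secrets_client.get_secret_value(SecretId=secret_id)
    secret = json.loads(response["SecretString"])
    return {
        "host": secret["host"],
        "port": int(secret["port"]),
        "username": secret["username"],
        "password": secret["password"],
    }


class FakeSecretsClient:
    """Offline stand-in for boto3.client("secretsmanager")."""

    def get_secret_value(self, SecretId: str) -> dict:
        # Hypothetical values; a real secret would hold the Aurora credentials.
        return {"SecretString": json.dumps({
            "host": "example-cluster.local",
            "port": 5432,
            "username": "etl_user",
            "password": "not-a-real-password",
        })}


creds = load_db_credentials(FakeSecretsClient(), "weather/aurora")
```

With real AWS access, passing `boto3.client("secretsmanager")` in place of the fake client should work the same way, since the function only relies on `get_secret_value` and the `SecretString` field of its response.
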
| 42 | +This hands-on demonstration walks through building the pipeline end to end, from ingesting weather data with Lambda to querying the transformed data in Aurora. |
| 43 | + |
| 44 | +## Get Started |
| 45 | +[Solution Overview](https://www.youtube.com/watch?v=aKEC8z9_UA4&t=1s&ab_channel=Hands-OnWithDigitalDen){:target="_blank"} |