Included in the Arctic Code Vault

Data Engineering Projects

Levels of Skill for each Project

Blue:

  • All code follows best practices and has been run through a linter
  • Classes and functions are used where appropriate
  • Project is organized logically into modules

Purple:

  • Code is testable without interacting with external dependencies (see the sketch after this list)
  • Code is tested with reasonable code coverage
  • Code has integration tests for external dependencies
  • Project has tests for all of the infrastructure
  • Project is a package
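
A minimal sketch of what "testable without interacting with external dependencies" can look like, assuming Python and dependency injection: the pipeline step takes its data store as a parameter, so a unit test can pass an in-memory fake instead of a real database or S3 connection. The names (`RowStore`, `load_active_users`, `FakeRowStore`) are hypothetical and not part of any project in this repo.

```python
from typing import Protocol


class RowStore(Protocol):
    """Anything that can fetch rows for a table name."""
    def fetch_rows(self, table: str) -> list[dict]: ...


def load_active_users(store: RowStore) -> list[str]:
    """Pipeline step: depends only on the injected store, not on a real DB."""
    rows = store.fetch_rows("users")
    return sorted(r["name"] for r in rows if r.get("active"))


class FakeRowStore:
    """In-memory stand-in used by unit tests; no network or database needed."""
    def __init__(self, rows: list[dict]) -> None:
        self._rows = rows

    def fetch_rows(self, table: str) -> list[dict]:
        return self._rows


def test_load_active_users() -> None:
    store = FakeRowStore([
        {"name": "ada", "active": True},
        {"name": "bob", "active": False},
    ])
    assert load_active_users(store) == ["ada"]


if __name__ == "__main__":
    test_load_active_users()
    print("ok")
```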

Brown:

  • Project uses infrastructure as code (Terraform or CloudFormation)
  • Project uses Docker
  • Code uses fakes (mocks and stubs); see the sketch after this list
  • Project uses a CI/CD process with a tool such as Jenkins
  • Project uses concurrency where appropriate
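
A minimal sketch of the "fakes (mocks and stubs)" criterion, assuming a pipeline step that publishes a report to S3. The function, bucket, and key names are hypothetical; the `put_object` call shape follows boto3's S3 client API, but the test never touches AWS because the client is replaced with a mock.

```python
import json
from unittest.mock import MagicMock


def publish_report(s3_client, bucket: str, key: str, report: dict) -> None:
    """Serialize a report and upload it via the injected S3 client."""
    body = json.dumps(report).encode("utf-8")
    s3_client.put_object(Bucket=bucket, Key=key, Body=body)


def test_publish_report_uploads_json() -> None:
    fake_s3 = MagicMock()  # stands in for boto3.client("s3")
    publish_report(fake_s3, "reports", "daily.json", {"rows": 42})

    # The mock records the call, so we can assert on it without any network I/O.
    fake_s3.put_object.assert_called_once()
    kwargs = fake_s3.put_object.call_args.kwargs
    assert kwargs["Bucket"] == "reports"
    assert json.loads(kwargs["Body"]) == {"rows": 42}


if __name__ == "__main__":
    test_publish_report_uploads_json()
    print("ok")
```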

Projects

Core:
  • Merge pipeline (DB and API)
  • Streaming PostgreSQL CDC to S3

Data Modeling and Data Warehouse:
  • Data Modeling in PostgreSQL
  • Data Warehouse (Redshift, Snowflake, or PostgreSQL)

Automating Data Pipelines:
  • Automate a data pipeline (Airflow or Jenkins)

Moving Data:
  • REST API CRUD app (restaurant)
  • gRPC CRUD app (library)
  • AWS Lambda microservices (blockbuster)

Streaming:
  • Kafka project
  • Spark Streaming (not Structured Streaming)

...

Big Data:
  • Spark data lake
  • Spark Delta Lake ...

DBs:
  • Data Modeling in Cassandra
  • Data Modeling in MongoDB
  • Data Modeling in Elasticsearch
  • Redis in-memory DB
  • Globally distributed database
