This project is based on the 11-step data cleaning framework shared by Data Scientist Dawn Choo (ex-Meta, ex-Amazon).
It walks through the essential steps for turning messy data into an analysis-ready dataset:
- Import libraries
- Understand the data structure
- Explore the dataset
- Standardize data formats
- Remove duplicates
- Handle missing values
- Standardize string values
- Filter out bad data
- Remove outliers
- Rename columns
- Save cleaned data
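
The steps above could be sketched in pandas roughly as follows. This is a minimal illustration and not the contents of data_cleaning_cheatsheet.py: the sample data, the column names, the choice to drop (rather than impute) missing values, the 1.5 * IQR outlier rule, and the output file name are all assumptions made for this example.

```python
# 1. Import libraries
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical messy input; every column name and value is invented for illustration.
raw = pd.DataFrame({
    "Customer Name": [" alice ", "BOB", "BOB", "carol", "dave",
                      None, "erin", "frank", "grace", "heidi"],
    "Signup Date": ["2023-01-05", "2023-02-10", "2023-02-10", "2023-03-15", "not a date",
                    "2023-04-01", "2023-05-20", "2023-06-01", "2023-06-15", "2023-07-01"],
    "Order Total": ["100.5", "200", "200", "-5", "150",
                    "120", "99999", "130", "110", "140"],
})

# 2. Understand the data structure (column types, non-null counts)
raw.info()

# 3. Explore the dataset (summary statistics for every column)
print(raw.describe(include="all"))

df = raw.copy()

# 4. Standardize data formats; values that cannot be parsed become NaT/NaN
df["Signup Date"] = pd.to_datetime(df["Signup Date"], format="%Y-%m-%d", errors="coerce")
df["Order Total"] = pd.to_numeric(df["Order Total"], errors="coerce")

# 5. Remove exact duplicate rows
df = df.drop_duplicates()

# 6. Handle missing values (assumption: drop incomplete rows instead of imputing)
df = df.dropna(subset=["Customer Name", "Signup Date", "Order Total"]).copy()

# 7. Standardize string values: trim whitespace, use title case
df["Customer Name"] = df["Customer Name"].str.strip().str.title()

# 8. Filter out bad data (assumption: an order total must be positive)
df = df[df["Order Total"] > 0]

# 9. Remove outliers with the 1.5 * IQR rule
q1, q3 = df["Order Total"].quantile([0.25, 0.75])
iqr = q3 - q1
df = df[df["Order Total"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

# 10. Rename columns to snake_case
cleaned = df.rename(columns=lambda c: c.strip().lower().replace(" ", "_"))

# 11. Save cleaned data (written to the temp directory so the sketch has no side effects elsewhere)
out_path = Path(tempfile.gettempdir()) / "cleaned_example.csv"
cleaned.to_csv(out_path, index=False)
```

The sketch follows the order of the list, so duplicates are removed before strings are standardized. On real data it may be worth standardizing strings first, so that rows differing only in case or whitespace are also caught as duplicates.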

Tools used:
- Python
- Pandas
- NumPy
- Seaborn
To run the example and generate the cleaned dataset:
python data_cleaning_cheatsheet.py
---