Riskify

Project Overview

This repository contains a machine learning project that predicts whether a loan will default or be fully paid. It uses a LendingClub loan dataset to build and evaluate predictive models.

The primary goal of this project is to develop a robust classification model that can accurately predict the loan_status of a loan application. This is a crucial task for financial institutions as it helps in risk assessment and making informed lending decisions. The project involves a complete machine learning workflow, including:

Exploratory Data Analysis (EDA): A detailed analysis of the dataset to understand its structure, identify missing values, and visualize the relationships between different features.
Data Preprocessing and Feature Engineering: Cleaning the data, handling categorical variables, and engineering new features to improve model performance.
Model Training: Training several machine learning models, including a Random Forest Classifier, to predict loan default.
Evaluation: Assessing the model's performance using appropriate metrics like a confusion matrix, classification report, and ROC curve.
Imbalanced Data Handling: Addressing the class imbalance in the dataset, which is a common issue in fraud and default prediction, using techniques like Random Over-sampling.
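
To make the Imbalanced Data Handling step concrete, here is a minimal, dependency-free sketch of what random over-sampling does: it duplicates randomly chosen minority-class rows until the classes are the same size. This is an illustration, not the notebook's code. The project lists imblearn for this step, and the function name `random_oversample` and the toy rows below are assumptions made for the example.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    is as large as the largest one. Returns new (rows, labels) lists."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())

    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Indices of the original rows that belong to this class.
        idx = [i for i, y in enumerate(labels) if y == cls]
        # Sample with replacement to make up the shortfall (none for the majority).
        for i in rng.choices(idx, k=target - n):
            out_rows.append(rows[i])
            out_labels.append(cls)
    return out_rows, out_labels


# Toy training split: 6 fully paid (0) and 2 defaulted (1) loans.
# Columns are loan_amnt and int_rate; the values are made up for illustration.
X_train = [[10000, 11.4], [8000, 9.9], [15000, 13.3], [7200, 6.5],
           [24000, 17.3], [20000, 13.3], [5000, 21.0], [12000, 18.5]]
y_train = [0, 0, 0, 0, 0, 0, 1, 1]

X_res, y_res = random_oversample(X_train, y_train, seed=42)
print(Counter(y_res))  # both classes now have 6 rows
```

Over-sampling is normally applied to the training split only, so that duplicated rows cannot leak into the test set and inflate the evaluation.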

Dataset

The dataset used in this project is lending_club_loan_two.csv, which contains various features related to loan applications, such as loan_amnt, int_rate, emp_length, and annual_inc. The target variable is loan_status, which is transformed into a binary is_default variable (1 for 'Charged Off' and 0 for 'Fully Paid').
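
As a rough sketch of the target transformation described above, the snippet below maps loan_status to is_default using only the standard library. The in-memory CSV, the helper name `add_is_default`, and the choice to raise on any other status value are assumptions for illustration. With pandas, the same mapping is commonly written as `df["loan_status"].map({"Charged Off": 1, "Fully Paid": 0})`.

```python
import csv
import io

# Mapping stated in the Dataset section: 1 = 'Charged Off', 0 = 'Fully Paid'.
STATUS_TO_DEFAULT = {"Charged Off": 1, "Fully Paid": 0}


def add_is_default(rows):
    """Return copies of the rows with a binary is_default field added."""
    encoded = []
    for row in rows:
        status = row["loan_status"].strip()
        if status not in STATUS_TO_DEFAULT:
            # Assumption: any other status is treated as a data error.
            raise ValueError(f"unexpected loan_status: {status!r}")
        encoded.append({**row, "is_default": STATUS_TO_DEFAULT[status]})
    return encoded


# Small in-memory stand-in for lending_club_loan_two.csv (made-up values).
sample_csv = io.StringIO(
    "loan_amnt,int_rate,loan_status\n"
    "10000,11.44,Fully Paid\n"
    "24375,17.27,Charged Off\n"
    "8000,11.99,Fully Paid\n"
)
rows = add_is_default(csv.DictReader(sample_csv))
print([r["is_default"] for r in rows])  # [0, 1, 0]
```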

Technologies Used

Python: The core programming language for the project.
Jupyter Notebook: The primary environment for development, allowing for a clear and organized workflow.
Pandas: Used for data manipulation and analysis.
NumPy: Essential for numerical operations.
Matplotlib & Seaborn: Libraries for data visualization and plotting.
Scikit-learn: A powerful machine learning library used for building and evaluating models.
imbalanced-learn (imported as imblearn): Used for handling imbalanced datasets through resampling, such as random over-sampling.

Getting Started

Prerequisites

To run this project, you will need to have Python installed along with the following libraries:

pip install pandas numpy matplotlib seaborn scikit-learn imbalanced-learn

Usage

Clone this repository:

git clone https://github.com/your-username/your-repository-name.git

Navigate to the project directory:
```
cd your-repository-name
```
Ensure the lending_club_loan_two.csv dataset is present in the same directory.
Open the Jupyter notebook predict loan default.ipynb and run all the cells to see the complete analysis and model training process.

Results

The final model is a Random Forest Classifier. The notebook evaluates it on the test set with a classification report and a confusion matrix, which show how well it predicts each class ('Fully Paid' and 'Charged Off') separately. Random over-sampling improved the model's ability to identify the minority class (loan defaults).
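
For readers unfamiliar with these metrics, the following dependency-free sketch shows how the confusion matrix counts and the per-class precision, recall, and F1 in a classification report are derived. The labels are made up for illustration and are not the notebook's results. The helper names are hypothetical, and in practice scikit-learn's `confusion_matrix` and `classification_report` compute the same quantities.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for one positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 for the positive class (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up labels: 1 = 'Charged Off' (default), 0 = 'Fully Paid'.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

On an imbalanced dataset, recall on the default class is usually more informative than overall accuracy, because a model that predicts 'Fully Paid' for every loan can still score a high accuracy.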
