This project focuses on predicting liver disease using the Indian Liver Patient Dataset. It follows a full machine learning workflow—from exploratory data analysis to model deployment using a FastAPI backend with an interactive HTML frontend.
Early detection of liver disease is crucial for timely medical intervention. This project aims to build a robust binary classification model to predict whether a patient is likely to have a liver disease based on medical attributes.
```
Liver Disease Prediction/
│
├── app/
│   ├── static/                          # Static assets (if any)
│   ├── templates/                       # HTML templates for the web interface
│   │   ├── form.html                    # User input form
│   │   ├── result.html                  # Prediction result page
│   │   └── metrics.html                 # Model metrics (accuracy, confusion matrix, etc.)
│   ├── main.py                          # FastAPI backend logic
│   ├── Ada_Model.pkl                    # Final trained model (AdaBoost)
│   ├── X_test.pkl                       # X_test used for evaluation
│   └── y_test.pkl                       # y_test used for evaluation
│
├── images/                              # Screenshots or plots (optional)
│
├── notebooks/                           # Jupyter notebooks for experimentation
│   ├── ExplorationAndBaselineModel.ipynb
│   ├── DataPrepration.ipynb
│   └── FinalCode.ipynb
│
├── indian_liver_patient.csv             # Original dataset from Kaggle
├── cleaned_indian_liver_patient.csv     # Cleaned version after preprocessing
└── README.md
```
Notebook: notebooks/ExplorationAndBaselineModel.ipynb
- Dataset: Indian Liver Patient Dataset (ILPD)
- Target variable: `Dataset` (1: Liver disease, 2: No disease)
- Key tasks (a minimal EDA sketch follows this list):
  - Null value analysis
  - Correlation heatmaps
  - Feature distributions by gender and class
  - Baseline models: DecisionTreeClassifier, Logistic Regression, Random Forest, SVM
- Insight: Initial model performance was limited by noisy, mislabeled, and imbalanced data.
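A minimal sketch of these EDA and baseline steps, assuming the raw Kaggle CSV and its `Dataset` target column; the notebook's actual code may differ:

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report

df = pd.read_csv("indian_liver_patient.csv")

# Null value analysis
print(df.isnull().sum())

# Correlation heatmap over the numeric features
corr = df.select_dtypes("number").corr()
plt.imshow(corr, cmap="coolwarm")
plt.colorbar()
plt.xticks(range(len(corr)), corr.columns, rotation=90)
plt.yticks(range(len(corr)), corr.columns)
plt.tight_layout()
plt.show()

# Quick baseline: a decision tree on the numeric features only
X = df.select_dtypes("number").drop(columns=["Dataset"]).fillna(df.median(numeric_only=True))
y = df["Dataset"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
baseline = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
print(classification_report(y_test, baseline.predict(X_test)))
```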
Notebook: notebooks/DataPreparation.ipynb
- Detected and removed mislabeled rows through EDA and statistical inspection.
- Feature engineering steps (a cleaning sketch follows this list):
  - Converted `Dataset` from {1, 2} to {1, 0}
  - One-hot encoding for `Gender`
  - Removed outliers for numerical stability
- Exported cleaned dataset: `cleaned_indian_liver_patient.csv`
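A minimal cleaning sketch under the same column-name assumptions. The mislabeled-row removal is omitted (it is dataset-specific), and the IQR-based outlier rule below is illustrative, not necessarily the rule used in the notebook:

```python
import pandas as pd

# Drop the few null rows for brevity (assumption; the notebook may impute instead)
df = pd.read_csv("indian_liver_patient.csv").dropna()

# Remap the target: {1: liver disease, 2: no disease} -> {1, 0}
df["Dataset"] = df["Dataset"].map({1: 1, 2: 0})

# One-hot encode Gender (drop_first avoids a redundant column)
df = pd.get_dummies(df, columns=["Gender"], drop_first=True)

# Illustrative outlier removal: drop rows far outside the IQR of any numeric feature
num_cols = df.select_dtypes("number").columns.drop("Dataset")
q1, q3 = df[num_cols].quantile(0.25), df[num_cols].quantile(0.75)
iqr = q3 - q1
keep = ~((df[num_cols] < q1 - 3 * iqr) | (df[num_cols] > q3 + 3 * iqr)).any(axis=1)
df = df[keep]

df.to_csv("cleaned_indian_liver_patient.csv", index=False)
```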
Notebook: notebooks/FinalCode.ipynb
- Applied SMOTEN to address class imbalance in the categorical features.
- Trained multiple classifiers; selected AdaBoostClassifier based on evaluation metrics.
- Evaluation included (see the training sketch below):
  - Classification report
  - Confusion matrix
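A minimal training and evaluation sketch following these steps (SMOTEN applied to the training split only, then AdaBoost); the hyperparameters and output paths are placeholders, not the tuned values from the notebook:

```python
import joblib
import pandas as pd
from imblearn.over_sampling import SMOTEN
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix

df = pd.read_csv("cleaned_indian_liver_patient.csv")
X, y = df.drop(columns=["Dataset"]), df["Dataset"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Oversample only the training split so the test data stays untouched
X_res, y_res = SMOTEN(random_state=42).fit_resample(X_train, y_train)

model = AdaBoostClassifier(n_estimators=200, random_state=42).fit(X_res, y_res)

y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))

# Persist the artifacts consumed by the FastAPI app
joblib.dump(model, "app/Ada_Model.pkl")
joblib.dump(X_test, "app/X_test.pkl")
joblib.dump(y_test, "app/y_test.pkl")
```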
- Used LIME (Local Interpretable Model-agnostic Explanations) to explain individual predictions.
- Integrated into `main.py` to generate local explanation plots for user inputs (see the LIME sketch below).
- LIME visualizations are rendered dynamically inside `result.html`:
  - Users can view which features influenced the prediction and by how much.
  - Enhances transparency and trust in model predictions.
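A minimal LIME sketch for a single prediction, assuming the pickled artifacts above are a fitted model and a feature DataFrame; the class names and output path are placeholders for whatever `main.py` actually passes to `result.html`:

```python
import joblib
from lime.lime_tabular import LimeTabularExplainer

model = joblib.load("app/Ada_Model.pkl")
X_test = joblib.load("app/X_test.pkl")            # assumed to be a DataFrame of held-out rows
X_arr = X_test.to_numpy(dtype=float)

explainer = LimeTabularExplainer(
    training_data=X_arr,                          # ideally the training split
    feature_names=list(X_test.columns),
    class_names=["No disease", "Liver disease"],  # placeholder labels
    mode="classification",
)

# Explain one patient row and save the plot for the web page to embed
exp = explainer.explain_instance(X_arr[0], model.predict_proba, num_features=8)
exp.as_pyplot_figure().savefig("app/static/lime_explanation.png", bbox_inches="tight")
```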
- Directory: `app/`
- Frontend: HTML templates
  - `form.html`: User input form
  - `result.html`: Displays prediction + LIME explanation
  - `metrics.html`: Shows model performance metrics
- Backend: FastAPI (a simplified sketch follows below)
  - Loads serialized AdaBoost model (`Ada_Model.pkl`)
  - Accepts user input and returns prediction with probability and explanation
  - Serves evaluation metrics on demand
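A simplified FastAPI sketch in the spirit of `app/main.py` (the real app also renders the HTML templates and the LIME plot); the endpoint path and field names here are assumptions, not the actual form fields:

```python
import joblib
import pandas as pd
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("Ada_Model.pkl")   # run from inside app/

class PatientInput(BaseModel):
    # Hypothetical field names; the real inputs come from templates/form.html
    Age: float
    Total_Bilirubin: float
    Direct_Bilirubin: float
    Alkaline_Phosphotase: float
    Alamine_Aminotransferase: float
    Aspartate_Aminotransferase: float
    Total_Protiens: float
    Albumin: float
    Albumin_and_Globulin_Ratio: float
    Gender_Male: int

@app.post("/predict")
def predict(inp: PatientInput):
    row = pd.DataFrame([dict(inp)])                # one-row frame, assumed training column order
    proba = float(model.predict_proba(row)[0, 1])  # probability of liver disease
    return {"prediction": int(proba >= 0.5), "probability": round(proba, 3)}
```

Returning the probability alongside the predicted class leaves the frontend free to display confidence or apply its own threshold.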
To run the app locally:

```bash
cd app
uvicorn main:app --reload
```

Visit: http://127.0.0.1:8000
Dependencies
```text
pandas==2.2.2
numpy==1.26.4
scikit-learn==1.5.1
imbalanced-learn==0.12.3
matplotlib==3.9.2
fastapi==0.115.1
uvicorn[standard]
joblib==1.4.2
lime==0.2.0.1
```

```bash
pip install -r requirements.txt
```
- AdaBoost handled tabular and mildly imbalanced data effectively.
- SMOTEN helped equalize class representation during training.
- Label quality is as important as model choice: data integrity matters.
- Class imbalance distorts metrics; SMOTE-like techniques are essential.
- Explaining predictions (via LIME) is crucial in sensitive domains like healthcare.
- Deployment requires more than a good model: usability and interpretability are key.


