Equitable AI for Dermatology, AJL Kaggle Competition

👥 Team Members

Name	GitHub Handle	Contribution
Bode Chiu	@BootyChu	Model Improvement
Erica Xue	@ericaxuee	Model Improvement
Manoj Nath Yogi	@manojnathyogi	Data Visualization and Model Development
Natalie Kao	@nataliekao03	Model Improvement
Smila Gala	@Smila3	Initial EDA and updating README
Sneha Nangunoori	@snehanangunoori	Model Development and Fine-tuning
Precious Onah	preciousonah	Model Improvement

🎯 Project Highlights

Developed the final model leveraging transfer learning with pre-trained architectures (ResNet, EfficientNet, ConvNeXt) to classify 16 different skin conditions across diverse skin tones on the Fitzpatrick scale.
Attained an F1 score of 0.61, securing 13th place out of 74 teams on the final Kaggle leaderboard.
Employed image generation and augmentation techniques to enhance dataset diversity and model generalization.

🔗 Equitable AI for Dermatology | Kaggle Competition Page

👩🏽‍💻 Setup & Execution

How to run the Notebooks:

Clone the repository in the terminal of your preferred IDE

git clone https://github.com/manojnathyogi/Equitable_AI_Dermatology-TeamGlycolic.git
cd Equitable_AI_Dermatology-TeamGlycolic

Make sure to download the datasets provided in the competition link and update the file paths in the notebook accordingly.
These notebooks can run in IDEs like Visual Studio Code, but we recommend opening it in the Kaggle or Jupyter Notebook

🏗️ Project Overview

Our project was part of a Algorithmic Justic League (AJL) Kaggle competition linked to the Break Through Tech AI Program, aiming to advance AI-driven healthcare solutions.
The challenge required us to develop a model capable of classifying 21 different skin conditions while ensuring fairness across diverse skin tones.
The evaluation metric, weighted F1 score, emphasized the importance of balanced performance across all classes.
Projects like this ensure more competent AI algorithms in the health field, as we so regularly see how misrepresented groups get misdiagnosed due to professionals only studying the majority in the past.
When developing the model, we wanted to be more socially responsible and include those values related to representation and diversity in our project.

📊 Data Exploration

We used the dataset provided by AJL, which is a subset of the FitzPatrick17k dataset, containing approximately 17,000 images depicting a variety of serious (e.g., melanoma) and cosmetic (e.g., acne) dermatological conditions. These images cover a range of skin tones, scored on the FitzPatrick skin tone scale (FST).
FYI: The dataset is available on the competition page.
The dataset contains about 4,500 images representing 21 skin conditions out of the 100+ in the full FitzPatrick dataset.
The Fitzpatrick scale ranges from 1 to 6, but we observed some rows containing an invalid value of -1, which is outside the expected range. To ensure data accuracy, we removed these rows from the dataset.

Observation: The Fitzpatrick scale and "Fitzpatrick_centaur" attributes show a moderate positive correlation but are not perfectly aligned.
Explanation: Cultural differences in skin tone classification, as well as potential biases in data collection or subjectivity in manual assessments, may account for discrepancies between the two variables.

Before training the model, we analyzed the distribution of skin conditions to identify any imbalances and explored strategies to mitigate them.
To address the class imbalance issue, we implemented a strategy using a class weights dictionary, which was later fed into the model during training to ensure balanced learning across all classes.

🧠 Model Development

Models Used:

ResNet
EfficientNet
ConvNeXt

Feature selection and Hyperparameter tuning strategies

Interation 1

As this is a computer vision problem, the features extracted from the dataset were limited to the image paths and the corresponding labels, which were encoded into integers.
The core approach involved leveraging pre-trained image classification models as the base, and fine-tuning the model by adding dense layers and dropout layers to mitigate overfitting.
To transition from the base model to the additional layers, we used GlobalAveragePooling2D(), which aggregates feature maps by averaging their values, reducing spatial dimensions and the risk of overfitting while maintaining key features for classification.

Interation 2

We noticed that the model wasn’t effectively learning from our dataset, so we unfreezed some of the last layers of the pre-trained models. We then experimented with the optimal number of layers to unfreeze, balancing between retaining the knowledge learned from the ImageNet dataset and allowing the model to better adapt to our specific dataset.
Initially, we considered increasing the number of epochs to give the model more time to learn patterns from the dataset. However, after a certain point, the validation accuracy plateaued, indicating that the learning rate was too low. As a result, we adjusted the learning rate from 1e-4 to 1e-3 to improve convergence.

Iteration 3

After increasing the learning rate to 1e-3, the model started to overfit, learning the training data but failing to generalize well. To address this, we implemented a learning rate reduction callback, which dynamically reduces the learning rate when the validation accuracy begins to plateau.
Additionally, for models with relatively fewer layers, such as ResNet50, we needed to train for more epochs to allow the model to effectively learn from the dataset.

Training setup

Training and validation data had a ratio of 80:20
Training evaluation metric: Validation accuracy
Baseline Performance: 0.3 - 0.4

📈 Results & Key Findings

🏆 Performance Metrics

Final Model: Fine-tuned ConvNeXtTiny
Validation Accuracy: ~68%
F1 Score: 0.61
Leaderboard Rank: 13th out of 74 teams

📊 Overall Model Performance

Our model performed well on high-frequency conditions like acne-vulgaris and folliculitis, while rare or visually similar classes such as eczema, seborrheic-keratosis, and malignant-melanoma were more challenging.

Confusion Matrix:

🌈 Performance Across Skin Tones

Most samples were from Fitzpatrick Types II–IV.
Underrepresented skin tones (Types I and VI) showed slightly lower class-wise accuracy.
We used class weights, brightness/contrast augmentations, and removed invalid FST values (-1) to reduce bias.

📉 Precision-Recall Summary

The plot below compares the top 5 and bottom 5 classes based on average precision (AP):

Precision-Recall Curve:

🔼 Top Classes (High AP)	🔽 Bottom Classes (Low AP)
prurigo-nodularis (0.86)	dyshidrotic-eczema (0.29)
folliculitis (0.85)	actinic-keratosis (0.33)
acne-vulgaris (0.84)	eczema (0.33)
basal-cell-carcinoma-morpheiform (0.80)	malignant-melanoma (0.35)
mycosis-fungoides (0.76)	seborrheic-keratosis (0.40)

🖼️ Impact Narrative

We addressed fairness by applying class weighting, augmentation (brightness, flips, etc.), and by filtering invalid fitzpatrick_scale values. We also used stratified validation and analyzed performance across skin tones.
Although we weren't able to implement all of our ideas for making the model more inclusive, we hope our work serves as an inspiration for others to join us in this movement. We believe continued efforts toward creating more inclusive models will have a meaningful impact on improving healthcare outcomes for underrepresented groups.

🚀 Next Steps & Future Improvements

We encountered challenges with model overfitting and inconsistencies in results when running the model multiple times.
With more time and resources, we could have investigated the causes of these inconsistencies and developed strategies to address them. This would involve understanding the internal structure of the pre-trained models and conducting more hyperparameter tuning.
Future improvements include exploring alternative models, implementing ensemble learning, applying k-fold cross-validation, training models specific to each Fitzpatrick skin type, and conducting further image-oriented EDA to enhance model robustness and generalization.

Name		Name	Last commit message	Last commit date
Latest commit History 44 Commits
.gitignore		.gitignore
Manoj-team-glycolic-acid.ipynb		Manoj-team-glycolic-acid.ipynb
README.md		README.md
Screenshot 2025-03-22 123139.png		Screenshot 2025-03-22 123139.png
Screenshot 2025-03-22 123224.png		Screenshot 2025-03-22 123224.png
Screenshot 2025-03-22 123240.png		Screenshot 2025-03-22 123240.png
Smila-starter-notebook.ipynb		Smila-starter-notebook.ipynb
ajl-starter-notebook.ipynb		ajl-starter-notebook.ipynb
confusion-matrix.png		confusion-matrix.png
data-visualization.ipynb		data-visualization.ipynb
label_distribution_in_training_sett.png		label_distribution_in_training_sett.png
precision-recallCurve.png		precision-recallCurve.png
sneha-notebook.ipynb		sneha-notebook.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Equitable AI for Dermatology, AJL Kaggle Competition

👥 Team Members

🎯 Project Highlights

👩🏽‍💻 Setup & Execution

🏗️ Project Overview

📊 Data Exploration

🧠 Model Development

📈 Results & Key Findings

🏆 Performance Metrics

📊 Overall Model Performance

🌈 Performance Across Skin Tones

📉 Precision-Recall Summary

🖼️ Impact Narrative

🚀 Next Steps & Future Improvements

📄 References & Additional Resources

About

Uh oh!

Releases

Packages

Contributors 4

Uh oh!

Languages

manojnathyogi/Equitable_AI_Dermatology-TeamGlycolic

Folders and files

Latest commit

History

Repository files navigation

Equitable AI for Dermatology, AJL Kaggle Competition

👥 Team Members

🎯 Project Highlights

👩🏽‍💻 Setup & Execution

🏗️ Project Overview

📊 Data Exploration

🧠 Model Development

📈 Results & Key Findings

🏆 Performance Metrics

📊 Overall Model Performance

🌈 Performance Across Skin Tones

📉 Precision-Recall Summary

🖼️ Impact Narrative

🚀 Next Steps & Future Improvements

📄 References & Additional Resources

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 4

Uh oh!

Languages

Packages