An end-to-end pipeline for low-shot image classification, designed to bridge the gap between manual data collection and deep learning. This project features a complete workflow: mobile data collection via AppSheet, an automated cleaning suite, a custom PyQt6 curation GUI, and a high-performance Convolutional Neural Network (CNN) built with TensorFlow/Keras.
- AppSheet Integration: Seamlessly fetch image data and metadata collected from mobile devices using the AppSheet API.
- Automated Pre-cleaning: Scripts to automatically remove duplicate images (via SHA-256 hashing) and filter out low-resolution samples.
- Interactive Curation: A custom-built PyQt6 desktop application to manually review and select high-quality samples for the "low-shot" training set.
- Cost-Sensitive CNN: A specialized TensorFlow model that utilizes a custom loss function to penalize specific class confusions (e.g., Blue vs. Purple) more heavily.
- Advanced Data Augmentation: Custom layers for Saturation Jitter and Random Erasing to improve model robustness with minimal data.
- GetData.py: Fetches metadata and downloads images from the AppSheet cloud.
- CleanData.py: Automated removal of duplicates and low-resolution images.
- SelectImages.py: PyQt6 GUI for manual image selection and labeling verification.
- MoveFiles.py: Utility to organize selected images into the final training directory.
- CNN.ipynb: The training pipeline, including data preprocessing (HSV conversion), model architecture, and evaluation.
- Inference.py: Runs predictions with the trained model.
Important
This project requires uv and the Microsoft Visual C++ Redistributable to be installed.
git clone https://github.com/BeUnMerreHuman/Low-Shot-Color-Classifier.git
uv sync
To run inference with the already trained model, proceed directly to Step 6, skipping all prior data processing and training steps.
Create a .env file in the root directory to store your AppSheet credentials:
APP_ID=your_appsheet_id
ACCESS_KEY=your_access_key
TABLE_NAME=your_table_name
Follow these steps in order to process your data and train the model:
Collect your color samples using your configured AppSheet app. Once ready, run the retrieval script:
uv run GetData.py
This downloads images to the images/ folder and generates metadata.csv.
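Under the hood, GetData.py talks to the AppSheet REST API. The sketch below shows one plausible way to build the "Find" request that returns all rows of a table; the endpoint shape follows the AppSheet API v2 convention, and the environment variable names match the `.env` file described above. The actual request/download logic in GetData.py may differ.

```python
import json
import os

def build_find_request(app_id, table_name):
    """URL and JSON body for an AppSheet API v2 'Find' action,
    which returns every row of the given table."""
    url = f"https://api.appsheet.com/api/v2/apps/{app_id}/tables/{table_name}/Action"
    body = {"Action": "Find", "Properties": {"Locale": "en-US"}, "Rows": []}
    return url, json.dumps(body)

# Credentials come from the .env file; placeholders are used if unset.
url, payload = build_find_request(
    os.environ.get("APP_ID", "your_appsheet_id"),
    os.environ.get("TABLE_NAME", "your_table_name"),
)
# POST `payload` to `url` with the header
# {"ApplicationAccessKey": ACCESS_KEY} to receive the table rows,
# then download each image referenced in the response.
```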
Remove redundant or poor-quality files:
uv run CleanData.py
- Duplicates: Removed using SHA-256 hash comparison.
- Resolution: Defaults to removing images smaller than 150x150 pixels.
Launch the GUI to hand-pick the best representatives for each class:
uv run SelectImages.py
- Controls: Click images to toggle selection (Green = Keep, Red = Drop).
- Output: Saves your choices to selected_images.csv.
Organize the selected files into the folder structure required by the model:
uv run MoveFiles.py
Open CNN.ipynb in Jupyter or Google Colab. The notebook performs:
- Preprocessing: Converts images to HSV color space to better isolate "Color" features.
- Training: Executes a 3-block CNN with Global Average Pooling and Dropout.
- Cost-Sensitive Loss: Uses a COST_MATRIX to specifically penalize confusion between Blue and Purple.
- ONNX Model Export: Saves the trained model in ONNX format, along with its configuration files and training results, in the output/ directory.
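The cost-sensitive idea can be sketched without TensorFlow: weight each sample's cross-entropy by the expected misclassification cost under COST_MATRIX. The matrix values and the class order (Blue, Purple, Yellow) below are illustrative, not the notebook's actual numbers, and the notebook's loss may be formulated differently.

```python
import math

# Hypothetical cost matrix; off-diagonal entries make Blue<->Purple
# confusion twice as expensive as any other mistake.
COST_MATRIX = [
    [1.0, 2.0, 1.0],   # true Blue
    [2.0, 1.0, 1.0],   # true Purple
    [1.0, 1.0, 1.0],   # true Yellow
]

def cost_sensitive_ce(y_true, y_pred, eps=1e-7):
    """Cross-entropy where each sample is weighted by its expected
    cost: sum_j p_j * COST_MATRIX[true][j]."""
    total = 0.0
    for t, probs in zip(y_true, y_pred):
        ce = -math.log(max(probs[t], eps))
        weight = sum(c * p for c, p in zip(COST_MATRIX[t], probs))
        total += weight * ce
    return total / len(y_true)
```

With this weighting, a confident Blue-predicted-as-Purple error is penalized more than an equally confident Blue-predicted-as-Yellow error, nudging the model to separate the two hues.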
Runs an ONNX image classification pipeline that preprocesses an input image (RGB→HSV, resize, pad) and outputs the predicted class with confidence scores.
uv run Inference.py --image test.jpg --model_dir output
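Two pieces of that preprocessing are easy to illustrate in isolation: per-pixel RGB→HSV conversion and padding a resized image out to a square. The helpers below are a sketch, not the exact functions in Inference.py; the onnxruntime call and the input tensor name at the end are likewise assumptions.

```python
import colorsys

def pad_to_square(w, h):
    """Symmetric (left, top, right, bottom) padding that turns a
    w x h image into a square, as in a resize-then-pad pipeline."""
    side = max(w, h)
    pad_w, pad_h = side - w, side - h
    left, top = pad_w // 2, pad_h // 2
    return left, top, pad_w - left, pad_h - top

def rgb_to_hsv_pixel(r, g, b):
    """Per-pixel RGB->HSV on [0, 1] floats, mirroring the HSV
    conversion applied before the model sees the image."""
    return colorsys.rgb_to_hsv(r, g, b)

# The padded HSV tensor would then be fed to the exported model, e.g.:
# session = onnxruntime.InferenceSession("output/model.onnx")
# scores = session.run(None, {"input": batch})[0]
```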
The current architecture, KHILONA_CNN, achieves an overall accuracy of 94.12%, even with small datasets (75 samples per class), by leveraging HSV color space and targeted augmentation.
| Class | Precision | Recall | F1-Score |
|---|---|---|---|
| Blue | 1.00 | 0.91 | 0.95 |
| Purple | 0.84 | 1.00 | 0.91 |
| Yellow | 1.00 | 0.92 | 0.95 |

