Local, privacy-friendly resume analysis and classification using traditional ML and modern embeddings. Convert resumes, train a classifier, predict categories, and generate actionable advice — all on your machine.
- Resume-to-text conversion for CSV and PDF datasets
- TF‑IDF + Logistic Regression baseline classifier
- Optional sentence-transformer embeddings for richer features
- Simple CLI scripts for converting, training, predicting, and advice
- Configurable via config.yaml
- Create and activate a virtual environment (recommended).

      python -m venv .venv
      . .venv/Scripts/activate   # Windows PowerShell: .venv\Scripts\Activate.ps1
      # macOS/Linux: . .venv/bin/activate

- Install dependencies.

      pip install -r requirements.txt
      # Temporary compatibility pin
      pip install numpy==1.26.0 --force-reinstall

- Prepare data (see Data section) and run the pipeline below.
- Training data: UpdatedResumeDataSet.csv (Kaggle)
- Test data: Resume Dataset (Kaggle)
Recommended layout:

    data/
      raw/
        UpdatedResumeDataSet.csv
      test/
        <your_pdf_files>.pdf
      processed/
        converted/
CSV to text:

    python src/convert_dataset.py \
      --csv data/raw/UpdatedResumeDataSet.csv \
      --outdir data/processed/converted

PDF directory to text:

    python src/convert_test_data.py \
      --pdfdir data/test \
      --outdir data/processed/converted_test

The CLI naming may be refined in future iterations.
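
As an illustration of the CSV step, here is a minimal sketch of what a converter along these lines might do. It is not the code in `src/convert_dataset.py`. The function name `convert_csv_to_text`, the column names `Resume` and `Category`, and the `resume_00000.txt` naming scheme are all assumptions made for the example.

```python
import csv
from pathlib import Path


def convert_csv_to_text(csv_path, outdir, text_column="Resume", label_column="Category"):
    """Write one .txt file per CSV row and return a list of (path, label) pairs.

    The column names are assumptions about the dataset layout; adjust as needed.
    """
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for index, row in enumerate(csv.DictReader(handle)):
            text = (row.get(text_column) or "").strip()
            if not text:
                continue  # skip rows that carry no resume text
            target = out / f"resume_{index:05d}.txt"  # hypothetical naming scheme
            target.write_text(text, encoding="utf-8")
            written.append((target, row.get(label_column, "")))
    return written
```

Returning the labels alongside the paths keeps the category available for training without encoding it in the file name. PDF conversion is left out here because it depends on a third-party PDF library.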
Train the baseline classifier:

    python src/train_classifier.py

Additional algorithms will be tried in future updates.
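
For readers unfamiliar with the baseline, the sketch below shows one common way to combine TF-IDF features with logistic regression. It assumes scikit-learn, which this README does not name explicitly, and it uses a tiny made-up dataset rather than the Kaggle data. `build_baseline` is a hypothetical helper, not a function from `src/train_classifier.py`.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-in data; a real run would read the converted resume texts instead.
texts = [
    "python machine learning pandas statistics",
    "deep learning python models numpy",
    "recruitment payroll onboarding employee relations",
    "hiring payroll benefits employee policies",
]
labels = ["Data Science", "Data Science", "HR", "HR"]


def build_baseline(max_features=1000):
    """TF-IDF features feeding a logistic regression classifier."""
    return make_pipeline(
        TfidfVectorizer(max_features=max_features),
        LogisticRegression(max_iter=1000),
    )


model = build_baseline()
model.fit(texts, labels)

# Classify a new snippet of resume text.
predicted = model.predict(["python numpy machine learning"])[0]
print(predicted)
```

Wrapping both steps in one pipeline means the same vocabulary is applied at training and prediction time, which avoids a frequent source of feature mismatch.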
Predict the category of a resume text:

    python src/predict.py --input <path_to_text_file>

Generate advice for a resume text:

    python src/advice.py --input <path_to_text_file>

The current advice checks for:
- Length — is the resume too short?
- Missing keywords — e.g., "Python", "machine learning"
- Missing sections — Experience, Projects, Education, Skills
- Soft skills — mentions of communication, leadership, etc.
- Role match — proximity to a target career path (e.g., Data Science)
This is an early MVP and will evolve.
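
The first four checks listed above could be approximated with plain string matching, as in the sketch below. It is an illustration only: the function `generate_advice`, the 150-word minimum, and the keyword lists are hypothetical and are not taken from `src/advice.py`. The role-match check is omitted because its scoring method is not described here.

```python
import re

# Hypothetical defaults for illustration; the project's real values may differ.
MIN_WORDS = 150
TARGET_KEYWORDS = ["python", "machine learning"]
REQUIRED_SECTIONS = ["experience", "projects", "education", "skills"]
SOFT_SKILLS = ["communication", "leadership", "teamwork"]


def generate_advice(text: str) -> list:
    """Return a list of human-readable suggestions for a resume text."""
    lowered = text.lower()
    advice = []

    # Length: count whitespace-separated tokens.
    word_count = len(lowered.split())
    if word_count < MIN_WORDS:
        advice.append(f"Resume is short ({word_count} words); consider adding detail.")

    # Missing keywords: substring match, so multi-word phrases also work.
    missing_keywords = [kw for kw in TARGET_KEYWORDS if kw not in lowered]
    if missing_keywords:
        advice.append("Missing keywords: " + ", ".join(missing_keywords))

    # Missing sections: look for each section name as a whole word.
    missing_sections = [
        name for name in REQUIRED_SECTIONS
        if not re.search(rf"\b{re.escape(name)}\b", lowered)
    ]
    if missing_sections:
        advice.append("Missing sections: " + ", ".join(missing_sections))

    # Soft skills: flag only when none of them is mentioned.
    if not any(skill in lowered for skill in SOFT_SKILLS):
        advice.append("No soft skills mentioned (e.g., communication, leadership).")

    return advice
```

Calling `generate_advice(open(path, encoding="utf-8").read())` on a converted text file would return an empty list when every check passes.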
Tune behavior via config.yaml:

    model:
      embedding: all-MiniLM-L6-v2
      tfidf_max_features: 1000
    advice_threshold: 0.5

    src/
      convert_dataset.py      # CSV → text conversion
      convert_test_data.py    # PDF dir → text conversion
      train_classifier.py     # Train baseline classifier
      predict.py              # Predict class for a resume text
      advice.py               # Generate heuristic advice (MVP)
    models/                   # Saved models/artifacts
    data/                     # Raw/test/processed data
    assets/                   # Plots and images
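
The config.yaml settings shown earlier could be read into typed values along the following lines. This is a sketch under assumptions: it presumes PyYAML is available, the `Settings` class and both function names are hypothetical, and because the exact nesting of `advice_threshold` is not certain, each key is accepted either under `model:` or at the top level.

```python
from dataclasses import dataclass


@dataclass
class Settings:
    # Defaults mirror the example values in this README.
    embedding: str = "all-MiniLM-L6-v2"
    tfidf_max_features: int = 1000
    advice_threshold: float = 0.5


def settings_from_dict(raw):
    """Build Settings from a parsed config, falling back to the defaults."""
    raw = raw or {}
    model = raw.get("model") or {}
    defaults = Settings()

    def pick(key, default):
        # Accept the key either under `model:` or at the top level.
        if key in model:
            return model[key]
        return raw.get(key, default)

    return Settings(
        embedding=str(pick("embedding", defaults.embedding)),
        tfidf_max_features=int(pick("tfidf_max_features", defaults.tfidf_max_features)),
        advice_threshold=float(pick("advice_threshold", defaults.advice_threshold)),
    )


def load_settings(path="config.yaml"):
    """Read the YAML file with PyYAML (assumed installed) and convert it."""
    import yaml  # imported lazily so the rest of the sketch runs without PyYAML

    with open(path, encoding="utf-8") as handle:
        return settings_from_dict(yaml.safe_load(handle))
```

Converting values explicitly with `int` and `float` surfaces a mistyped entry as an error at load time instead of deep inside training.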
- Experiment with additional classifiers (SVM, RandomForest, XGBoost)
- Improve advice heuristics and scoring
- Add evaluation on held-out test sets and reporting
- Streamline CLI and naming for consistency
- Optional lightweight web UI
Contributions, issues, and feature requests are welcome! Feel free to open a PR or issue.
