cacelass/cacelass


Alex | ML Systems Engineer

I build production-ready machine learning systems — from raw data to deployed models — designed to be reproducible, testable, and operational in real environments.

Most ML projects fail outside the notebook. I focus on production constraints: data quality, reproducibility, evaluation correctness, and system reliability.

My background is in systems administration (ASIR), so I design ML systems with infrastructure, failure modes, and performance in mind before model complexity.


What I deliver

  • End-to-end ML pipelines (ingestion → validation → feature engineering → training → evaluation → inference)
  • Data pipelines with schema validation, quality checks, and leakage prevention
  • Reproducible environments with versioned data and locked dependencies (uv, Docker)
  • Calibrated probability outputs (Brier score, reliability curves), not raw model scores
  • Time-aware evaluation frameworks (walk-forward validation or stratified CV, depending on problem type)
  • ML systems decoupled from business decision logic
  • Batch inference pipelines designed for scheduled production workloads
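
As a rough illustration of the schema validation and quality checks mentioned above, here is a minimal standard-library sketch. It is not code from any of the repositories listed here: the `SCHEMA` columns and the `validate_rows` helper are hypothetical names chosen for the example.

```python
from typing import Any

# Hypothetical schema for illustration: column name -> (expected type, nullable).
SCHEMA: dict[str, tuple[type, bool]] = {
    "customer_id": (int, False),
    "income": (float, True),
    "default": (int, False),
}


def validate_rows(
    rows: list[dict[str, Any]], schema: dict[str, tuple[type, bool]]
) -> list[str]:
    """Return a list of readable problems; an empty list means the batch passed."""
    problems: list[str] = []
    for i, row in enumerate(rows):
        # Structural checks: every expected column present, nothing unexpected.
        missing = set(schema) - set(row)
        extra = set(row) - set(schema)
        if missing:
            problems.append(f"row {i}: missing columns {sorted(missing)}")
        if extra:
            problems.append(f"row {i}: unexpected columns {sorted(extra)}")
        # Value checks: nullability first, then type.
        for name, (expected, nullable) in schema.items():
            if name not in row:
                continue
            value = row[name]
            if value is None:
                if not nullable:
                    problems.append(f"row {i}: {name} must not be null")
            elif not isinstance(value, expected):
                problems.append(
                    f"row {i}: {name} expected {expected.__name__}, "
                    f"got {type(value).__name__}"
                )
    return problems
```

In a pipeline, a check like this would typically run right after ingestion and stop the batch when the returned list is non-empty, so that bad data never reaches feature engineering.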

Tech Stack

Python · SQL · Docker · Linux · Azure · Git

HDFS · Hive · Sqoop

Pandas · NumPy · Scikit-learn · PyTorch · Polars

Power BI

Certified: Microsoft Azure Data Fundamentals · Power BI (DAX)
Credly


Featured Projects

dskit — Reproducible ML project scaffold

Production-grade ML template designed to eliminate environment drift and enforce consistent project structure.

Why it matters
Most ML failures are not model failures — they are reproducibility and data consistency failures.

What it enforces

  • Strict project structure (data/, features/, models/, pipelines/)
  • Dependency locking with uv
  • Documentation system with Sphinx
  • Pandas / Polars interoperability

Result
Faster setup, consistent engineering standards, zero environment ambiguity.


credit-risk-classifier — Credit risk scoring system

ML system designed for real decision-making, focused on calibrated probabilities instead of raw predictions.

Key decisions

  • Logistic Regression and Random Forest, to weigh interpretability against performance
  • Probability calibration (Platt scaling / isotonic regression)
  • Decision threshold decoupled from model (business layer owns decision policy)
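
One way to picture the decoupling of the decision threshold from the model is sketched below. This is an assumption about the general shape, not the project's actual code: `DecisionPolicy`, its field names, and the threshold values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecisionPolicy:
    """Business-owned policy (hypothetical names); the model never sees it."""

    approve_below: float  # approve when P(default) is below this value
    reject_above: float   # reject when P(default) is at or above this value

    def decide(self, p_default: float) -> str:
        """Map a calibrated default probability to an action."""
        if not 0.0 <= p_default <= 1.0:
            raise ValueError("p_default must be a probability in [0, 1]")
        if p_default < self.approve_below:
            return "approve"
        if p_default >= self.reject_above:
            return "reject"
        return "manual_review"


# The model only produces probabilities; the thresholds live in the policy
# and can change without retraining. Values below are illustrative.
policy = DecisionPolicy(approve_below=0.10, reject_above=0.40)
decisions = [policy.decide(p) for p in (0.05, 0.25, 0.40)]
```

Because the policy consumes probabilities rather than class labels, it only makes sense when the model outputs are calibrated, which is why calibration and the decoupled threshold go together.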

Evaluation

  • Stratified k-fold cross-validation
  • Brier score + AUC as primary metrics
  • Strict leakage prevention across time and folds
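
For reference, the Brier score is the mean squared difference between predicted probabilities and binary outcomes, so lower is better. The sketch below is a plain-Python version for illustration; the function name and sample values are made up, and the project itself may well use a library implementation instead.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], p_pred: Sequence[float]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(y_true) != len(p_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


# Illustrative values only.
y = [1, 0, 1, 0]
model_score = brier_score(y, [0.9, 0.1, 0.8, 0.3])
# Always predicting the base rate (0.5 here) is a useful reference point.
baseline_score = brier_score(y, [0.5, 0.5, 0.5, 0.5])
```

A model whose Brier score is not clearly below the constant base-rate prediction is adding little probabilistic information, whatever its AUC says.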

Result
AUC: 0.81 with calibrated outputs suitable for operational decision systems.


stock-market-prediction — Time series under real constraints

ML applied to a non-stationary, low signal-to-noise environment under realistic constraints.

What most people do wrong
Random splits → leakage → inflated performance

What this project enforces

  • Walk-forward validation (deployment simulation)
  • Baseline-first evaluation discipline
  • Strict no-leakage constraints
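
Walk-forward validation can be sketched as an index generator in which every test window lies strictly after its training window. The version below assumes an expanding training window and non-overlapping test windows; the repository may use a rolling window or different sizes, and `walk_forward_splits` is a hypothetical name.

```python
from typing import Iterator


def walk_forward_splits(
    n_samples: int, initial_train: int, test_size: int
) -> Iterator[tuple[range, range]]:
    """Yield (train_idx, test_idx) pairs over time-ordered samples.

    The training window expands with each step, and every test index is
    strictly later than every training index, so no future data leaks in.
    """
    if initial_train <= 0 or test_size <= 0:
        raise ValueError("initial_train and test_size must be positive")
    start = initial_train
    while start + test_size <= n_samples:
        yield range(0, start), range(start, start + test_size)
        start += test_size


# Ten time-ordered samples, four for the first fit, tested two at a time.
splits = list(walk_forward_splits(n_samples=10, initial_train=4, test_size=2))
```

Each split mimics a deployment step: fit on everything known so far, score on the next unseen window, then move forward.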

Result
Marginal improvement over the baseline, consistent with a largely efficient market.


Positioning

I design ML systems that remain stable under real-world constraints: shifting data distributions, imperfect labels, and production latency.

I don’t optimize notebooks. I design systems that survive production.


About me

I enjoy learning new technologies and adapting quickly to different problem domains. I’m comfortable working across the full ML stack and iterating on systems from prototype to production.
