StatLab is an interactive web application designed to demonstrate core Data Science and Statistical Analysis competencies.
Unlike static notebooks, this project features a dynamic synthetic data generator that creates realistic employee datasets on the fly from probability distributions. Users can run hypothesis tests and regression analyses in real time and receive automated insights from Google Gemini AI.
This project showcases the practical application of the following statistical concepts, essential for any Data Scientist role:
- Normal Distribution (Gaussian): Implemented the Box-Muller Transform to generate realistic distributions for Age and Performance Scores.
- Skewed Distributions: Modeled Years of Experience to reflect real-world seniority hierarchies.
- Probabilistic Modeling: Used conditional probability logic to model Churn based on Salary, Overwork, and Performance factors.
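The Box-Muller transform named above turns two uniform random numbers into two independent standard normal values. The following is a minimal sketch, assuming `Math.random()` as the uniform source; the function names `boxMuller` and `sampleNormal` and the example age parameters are hypothetical and are not taken from `dataService.ts`.

```typescript
// Box-Muller transform: maps two independent uniform draws u1 in (0, 1] and
// u2 in [0, 1) to two independent standard normal draws.
function boxMuller(u1: number, u2: number): [number, number] {
  const r = Math.sqrt(-2 * Math.log(u1)); // radius; u1 > 0 keeps the log finite
  const theta = 2 * Math.PI * u2;         // angle, uniform on [0, 2π)
  return [r * Math.cos(theta), r * Math.sin(theta)];
}

// Draws one value from Normal(mean, sd), clamped to [min, max].
// `rng` defaults to Math.random, which returns values in [0, 1).
function sampleNormal(
  mean: number,
  sd: number,
  min = -Infinity,
  max = Infinity,
  rng: () => number = Math.random,
): number {
  // 1 - rng() lies in (0, 1], so Math.log never receives 0.
  // This simple version discards the second normal draw; a generator could cache it.
  const [z] = boxMuller(1 - rng(), rng());
  return Math.min(max, Math.max(min, mean + sd * z));
}

// Hypothetical parameters: ages centred on 35 with SD 8, clamped to 22..65.
const age = Math.round(sampleNormal(35, 8, 22, 65));
```

Clamping keeps generated values plausible (no negative ages), at the cost of slightly piling up mass at the bounds.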
- Independent Samples T-Test: Compares means between two independent groups (e.g., Engineering vs. Sales salaries).
- Significance Testing: Calculates the T-Statistic and P-Value to decide whether to reject the null hypothesis.
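The bullets above name an independent-samples t-test without specifying the variant. A minimal sketch, assuming Welch's unequal-variance form and a normal approximation for the two-tailed p-value; `statsService.ts` may instead use the pooled-variance test or an exact Student-t CDF, and all names here are hypothetical.

```typescript
interface TTestResult {
  t: number;      // t-statistic
  df: number;     // Welch-Satterthwaite degrees of freedom
  pValue: number; // two-tailed p-value (normal approximation)
}

function mean(xs: number[]): number {
  return xs.reduce((s, x) => s + x, 0) / xs.length;
}

// Sample variance with Bessel's correction (divides by n - 1).
function sampleVariance(xs: number[]): number {
  const m = mean(xs);
  return xs.reduce((s, x) => s + (x - m) ** 2, 0) / (xs.length - 1);
}

// Abramowitz & Stegun 7.1.26 approximation of erf (max abs error about 1.5e-7).
function erf(x: number): number {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly =
    t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
  return sign * (1 - poly * Math.exp(-ax * ax));
}

// Standard normal CDF.
function normalCdf(z: number): number {
  return 0.5 * (1 + erf(z / Math.SQRT2));
}

function welchTTest(a: number[], b: number[]): TTestResult {
  const va = sampleVariance(a) / a.length; // squared standard error of mean(a)
  const vb = sampleVariance(b) / b.length; // squared standard error of mean(b)
  const t = (mean(a) - mean(b)) / Math.sqrt(va + vb);
  const df = (va + vb) ** 2 / (va ** 2 / (a.length - 1) + vb ** 2 / (b.length - 1));
  // Normal approximation to the t-distribution: adequate for large groups,
  // but it understates the p-value when df is small (the t-distribution has heavier tails).
  const pValue = 2 * (1 - normalCdf(Math.abs(t)));
  return { t, df, pValue };
}
```

Typical use would be `welchTTest(engineeringSalaries, salesSalaries)` followed by comparing `pValue` with a chosen significance level such as 0.05. Welch's form avoids assuming equal variances in the two departments.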
- Simple Linear Regression: Implements the Least Squares method from scratch (y = mx + c) to model relationships.
- Correlation: Calculates the Pearson Correlation Coefficient (r) and the Coefficient of Determination (R²) to measure relationship strength.
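Least squares for y = mx + c has a closed form: m = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and c = ȳ - m·x̄, and the same sums give Pearson's r. A minimal sketch, assuming plain number arrays; the function and field names are hypothetical, not the project's API.

```typescript
interface RegressionResult {
  slope: number;     // m
  intercept: number; // c
  r: number;         // Pearson correlation coefficient
  rSquared: number;  // coefficient of determination
}

function linearRegression(x: number[], y: number[]): RegressionResult {
  if (x.length !== y.length || x.length < 2) {
    throw new Error("x and y must have the same length (at least 2 points)");
  }
  const n = x.length;
  const xMean = x.reduce((s, v) => s + v, 0) / n;
  const yMean = y.reduce((s, v) => s + v, 0) / n;

  // Sums of squared deviations (sxx, syy) and cross-deviations (sxy).
  let sxy = 0;
  let sxx = 0;
  let syy = 0;
  for (let i = 0; i < n; i++) {
    const dx = x[i] - xMean;
    const dy = y[i] - yMean;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }

  const slope = sxy / sxx;                 // m = Sxy / Sxx (NaN if all x are equal)
  const intercept = yMean - slope * xMean; // c = ȳ - m·x̄
  const r = sxy / Math.sqrt(sxx * syy);    // Pearson r
  return { slope, intercept, r, rSquared: r * r }; // for simple regression, R² = r²
}
```

Computing deviations from the means first, rather than using raw sums like Σxy - n·x̄·ȳ, reduces floating-point cancellation on large values such as salaries.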
- Central Tendency: Mean, Median.
- Dispersion: Standard Deviation, Variance, Range.
- Skewness: Measures the asymmetry of a variable's distribution.
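The descriptive statistics listed above can all be derived from one sorted copy and the central moments of the data. A minimal sketch, assuming sample variance (n - 1 denominator) and the moment coefficient of skewness g1; the source does not say which conventions the project uses, and `describe` is a hypothetical name.

```typescript
interface Summary {
  mean: number;
  median: number;
  variance: number; // sample variance (n - 1 denominator)
  stdDev: number;
  range: number;
  skewness: number; // moment coefficient g1 (population form)
}

function describe(xs: number[]): Summary {
  const n = xs.length;
  if (n < 2) throw new Error("need at least two values");
  // Numeric sort on a copy; the default Array.prototype.sort compares as strings.
  const sorted = [...xs].sort((a, b) => a - b);
  const mean = xs.reduce((s, x) => s + x, 0) / n;
  const mid = Math.floor(n / 2);
  const median = n % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];

  // Sums of squared and cubed deviations from the mean.
  let m2 = 0;
  let m3 = 0;
  for (const x of xs) {
    const d = x - mean;
    m2 += d * d;
    m3 += d * d * d;
  }
  const variance = m2 / (n - 1);
  // g1 = (m3 / n) / (m2 / n)^(3/2); positive values indicate a longer right tail.
  const skewness = m3 / n / Math.pow(m2 / n, 1.5);

  return {
    mean,
    median,
    variance,
    stdDev: Math.sqrt(variance),
    range: sorted[n - 1] - sorted[0],
    skewness,
  };
}
```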
- Synthetic Dataset Generator: Instantly generate N=200+ records of employee data with realistic noise and correlations.
- Interactive Dashboard:
  - Overview: Key KPIs (Churn Rate, Average Salary) and raw data view.
  - Hypothesis Lab: Interactive T-Test tool with visualization.
  - Regression Lab: Scatter plots with dynamic trend lines and fit metrics.
- AI Analyst Integration:
  - Uses the Google Gemini API to generate plain-English, "Data Scientist"-style interpretations of statistical results.
- CSV Export: Download the generated dataset for further analysis in Python/R.
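CSV export in the browser amounts to serializing rows into text and handing a `Blob` to a temporary download link. A minimal sketch, assuming flat records of strings, numbers, and booleans; the `Row` type, the function names, and the default filename are hypothetical, and the project's real record type lives in `types.ts`.

```typescript
// Hypothetical row shape; the project's actual employee type may differ.
type Row = Record<string, string | number | boolean>;

// Quote a field (RFC 4180 style) when it contains a comma, quote, or line break.
function escapeCsvField(value: string | number | boolean): string {
  const s = String(value);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// Header row taken from the first record's keys, then one line per record.
function toCsv(rows: Row[]): string {
  if (rows.length === 0) return "";
  const headers = Object.keys(rows[0]);
  const lines = [
    headers.map(escapeCsvField).join(","),
    ...rows.map((row) => headers.map((h) => escapeCsvField(row[h] ?? "")).join(",")),
  ];
  return lines.join("\r\n");
}

// Browser-only: trigger a download of the CSV text via a temporary object URL.
function downloadCsv(rows: Row[], filename = "employees.csv"): void {
  const blob = new Blob([toCsv(rows)], { type: "text/csv;charset=utf-8" });
  const url = URL.createObjectURL(blob);
  const link = document.createElement("a");
  link.href = url;
  link.download = filename;
  link.click();
  // Release the object URL after the click has been dispatched.
  setTimeout(() => URL.revokeObjectURL(url), 0);
}
```

Keeping `toCsv` free of DOM calls means it can be tested in Node, while `downloadCsv` stays a thin browser wrapper.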
- Frontend: React 18, TypeScript
- Styling: Tailwind CSS
- Visualization: Recharts
- AI/LLM: Google GenAI SDK
- Icons: Lucide React
To run this project locally:
- Clone the repository: `git clone https://github.com/your-username/statlab-data-science-portfolio.git`, then `cd statlab-data-science-portfolio`
- Install dependencies: `npm install`
- Set up the environment:
  - Create a `.env` file in the root directory.
  - Add your Gemini API key (required for AI features): `VITE_API_KEY=your_google_gemini_api_key`
  - Note: If you don't have a key, the statistical features still work, but AI insights are disabled.
- Run the development server: `npm run dev`
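The setup notes say AI insights are disabled when no key is configured. One way to get that behavior is to resolve the key once and treat a missing or blank value as "AI off". A minimal sketch; `resolveGeminiKey` is a hypothetical helper, not necessarily how `geminiService.ts` does it.

```typescript
// Hypothetical helper: returns the Gemini key if configured, otherwise null,
// so callers can disable AI insights instead of failing at request time.
function resolveGeminiKey(env: Record<string, string | undefined>): string | null {
  const key = env.VITE_API_KEY?.trim();
  return key ? key : null;
}

// In a Vite app this would receive import.meta.env, which exposes only
// variables prefixed with VITE_ to client code:
// const apiKey = resolveGeminiKey(import.meta.env);
// const aiEnabled = apiKey !== null;
```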
/src
├── services/
│ ├── statsService.ts # Core statistical algorithms (T-Test, Regression, StdDev)
│ ├── dataService.ts # Synthetic data generation logic
│ └── geminiService.ts # AI integration
├── components/ # Reusable UI components
├── types.ts # TypeScript definitions
└── App.tsx # Main application logic
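`geminiService.ts` is described only as the AI integration. Before any model call, such a service has to turn numeric results into a prompt. A minimal sketch of that step, assuming a t-test result as input; the `TestSummary` shape, the `buildInsightPrompt` name, and the prompt wording are all hypothetical.

```typescript
// Hypothetical shape of a statistical result handed to the AI analyst.
interface TestSummary {
  testName: string;         // e.g. "Independent samples t-test"
  groups: [string, string]; // e.g. ["Engineering", "Sales"]
  metric: string;           // e.g. "Salary"
  statistic: number;        // t-statistic
  pValue: number;
  alpha?: number;           // significance level, defaults to 0.05
}

// Builds a plain-text prompt; the decision is computed locally so the model
// explains a fixed conclusion rather than re-deriving it.
function buildInsightPrompt(s: TestSummary): string {
  const alpha = s.alpha ?? 0.05;
  const decision =
    s.pValue < alpha ? "reject the null hypothesis" : "fail to reject the null hypothesis";
  return [
    "You are a data scientist explaining results to a non-technical audience.",
    `Test: ${s.testName} comparing ${s.metric} between ${s.groups[0]} and ${s.groups[1]}.`,
    `Test statistic: ${s.statistic.toFixed(3)}, p-value: ${s.pValue.toFixed(4)}, alpha: ${alpha}.`,
    `At this alpha we ${decision}.`,
    "Explain in plain English what this means and note any caveats.",
  ].join("\n");
}
```

The resulting string would then be sent to a Gemini model through the Google GenAI SDK (in the `@google/genai` package, via `ai.models.generateContent({ model, contents })`); check the SDK version the project pins before relying on that call shape.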
This project is licensed under the MIT License - see the LICENSE file for details.
Created to demonstrate full-stack data science capabilities.