A fast, lightweight resume parser built with Python and FastAPI. It extracts structured data from raw resume text using regex patterns and heuristics — no ML models, no external AI APIs.
I built this as a backend microservice that accepts plain-text resumes and returns clean JSON with contact info, education, work experience, skills, and more.
- Name — from the top of the resume using multiple detection strategies
- Emails, Phone Numbers, Websites — regex-based extraction from the header
- Date of Birth — if labeled in the resume
- Addresses — scored against known Indian and US cities, states, and PIN/ZIP codes
- Summary / Objective — detected by section headers
- Education — degree, institution, year, GPA, field of study
- Work Experience — company, job title, date range, description
- Skills — section-based parsing + full-text scan against 200+ known skills
- Certifications — listed certifications
- Projects — project name, technologies used, description
Every field comes with a confidence score (0.0 to 1.0) so the consumer knows how reliable each extraction is.
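To make the regex-plus-confidence idea concrete, here is a minimal sketch of how a contact extractor could pair each result with a score. The patterns, function names, and score values are assumptions for illustration and are not taken from the project's extractors.

```python
import re

# Hypothetical patterns for illustration; the project's real patterns may differ.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d ().-]{8,}\d")


def extract_emails(text: str) -> tuple[list[str], float]:
    """Return de-duplicated, lowercased emails plus a rough confidence score."""
    seen: dict[str, None] = {}
    for match in EMAIL_RE.findall(text):
        seen.setdefault(match.lower(), None)  # dict keeps first-seen order
    emails = list(seen)
    # Assumed scoring rule: a regex hit is near-certain, no hit scores 0.0.
    confidence = 0.99 if emails else 0.0
    return emails, confidence


def extract_phones(text: str) -> tuple[list[str], float]:
    """Return phone-like strings that contain 10 to 15 digits."""
    candidates = [m.strip() for m in PHONE_RE.findall(text)]
    phones = [p for p in candidates if 10 <= sum(c.isdigit() for c in p) <= 15]
    # Assumed scoring rule, chosen only for this sketch.
    confidence = 0.95 if phones else 0.0
    return phones, confidence


header = "John Doe\njohndoe@gmail.com | +91-9876543210"
print(extract_emails(header))
print(extract_phones(header))
```

Returning the score next to the value keeps each extractor self-describing, so a caller can drop low-confidence fields without knowing how they were parsed.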
- Python 3.11+
- FastAPI — async REST API framework
- Uvicorn — ASGI server
- Pydantic — request/response validation
- Regex + Heuristics — all parsing is rule-based, zero ML dependencies
ResumeParser/
├── main.py                          # FastAPI app, endpoints, server startup
├── pyproject.toml                   # Dependencies and project config
├── run_server.bat                   # Quick-start script (Windows)
├── test_parser.py                   # Manual smoke test with sample resume
├── app/
│   ├── core/
│   │   ├── config.py                # Server settings (host, port, log level)
│   │   ├── parser_engine.py         # Main orchestrator: runs all extractors
│   │   └── section_detector.py      # Splits resume into named sections
│   └── extractors/
│       ├── name_extractor.py        # Candidate name extraction
│       ├── contact_extractor.py     # Emails, phones, URLs, DOB
│       ├── address_extractor.py     # Physical address parsing
│       ├── education_extractor.py   # Degree, institution, GPA, year
│       ├── experience_extractor.py  # Job title, company, dates, bullets
│       ├── skills_extractor.py      # Skills (section + full-text keyword scan)
│       └── section_extractor.py     # Certifications and projects
- Section Detection: scans the resume line by line and splits it into sections (header, summary, education, experience, skills, etc.) using regex pattern matching on section headers.
- Extraction Pipeline: the parser engine runs 10 extractors in sequence, each focused on one data type. Every extractor receives only its relevant section text.
- Confidence Scoring: each extractor returns a confidence score. The engine computes a weighted overall confidence where name, emails, skills, and experience are weighted highest.
- JSON Response: returns all extracted data, per-field confidence scores, overall confidence, and which sections were detected.
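The section detection and confidence scoring steps above can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the header patterns, the weight values, and the function names are hypothetical and may not match `section_detector.py` or `parser_engine.py`.

```python
import re

# Hypothetical header patterns; the real detector may recognise more variants.
SECTION_PATTERNS = {
    "summary": re.compile(r"^(summary|objective|profile)\s*:?$", re.IGNORECASE),
    "education": re.compile(r"^(education|academics)\s*:?$", re.IGNORECASE),
    "experience": re.compile(r"^(work\s+)?experience\s*:?$", re.IGNORECASE),
    "skills": re.compile(r"^(technical\s+)?skills\s*:?$", re.IGNORECASE),
}

# Assumed weights: the README only says these four fields are weighted highest.
FIELD_WEIGHTS = {"name": 2.0, "emails": 2.0, "skills": 2.0, "workExperience": 2.0}
DEFAULT_WEIGHT = 1.0


def detect_sections(resume_text: str) -> dict[str, str]:
    """Split text into named sections; lines before any header go to 'header'."""
    sections: dict[str, list[str]] = {"header": []}
    current = "header"
    for line in resume_text.splitlines():
        stripped = line.strip()
        matched = next(
            (name for name, pat in SECTION_PATTERNS.items() if pat.match(stripped)),
            None,
        )
        if matched:
            # A header line starts a new section and is not kept as content.
            current = matched
            sections.setdefault(current, [])
        else:
            sections[current].append(line)
    return {name: "\n".join(lines).strip() for name, lines in sections.items()}


def overall_confidence(scores: dict[str, float]) -> float:
    """Weighted mean of per-field scores, rounded to two decimals."""
    if not scores:
        return 0.0
    total = sum(FIELD_WEIGHTS.get(field, DEFAULT_WEIGHT) for field in scores)
    weighted = sum(
        score * FIELD_WEIGHTS.get(field, DEFAULT_WEIGHT)
        for field, score in scores.items()
    )
    return round(weighted / total, 2)


sample = "John Doe\njohndoe@gmail.com\n\nSummary\nFull Stack Developer\n\nSkills\nPython, FastAPI"
print(list(detect_sections(sample)))
print(overall_confidence({"name": 1.0, "dateOfBirth": 0.4}))
```

Splitting first and extracting second means each extractor only sees text that is likely relevant to it, which keeps the individual regexes simpler.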
pip install -e .
python main.py
Server runs on http://localhost:2700
Parse Resume
POST /parse-resume
Request body:
{
"resume_text": "John Doe\njohndoe@gmail.com | +91-9876543210\n\nSummary\nFull Stack Developer with 5 years of experience..."
}
Response:
{
"success": true,
"data": {
"name": "John Doe",
"emails": ["johndoe@gmail.com"],
"phoneNumbers": ["+91-9876543210"],
"websites": [],
"dateOfBirth": null,
"addresses": [],
"summary": "Full Stack Developer with 5 years of experience...",
"education": [],
"workExperience": [],
"skills": ["Python", "FastAPI", "React"],
"certifications": [],
"projects": []
},
"confidence": {
"name": 0.88,
"emails": 0.99,
"phoneNumbers": 0.95,
"skills": 0.90
},
"overall_confidence": 0.85,
"sections_detected": ["header", "summary", "skills"]
}
Health Check
GET /health
Returns {"status": "ok", "version": "1.0.0"}
FastAPI auto-generates interactive docs at http://localhost:2700/docs
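For calling the API from Python, a small standard-library client might look like the sketch below. The endpoint path, port, and `resume_text` field come from this README; the helper names `build_parse_request` and `parse_resume` are hypothetical, and `parse_resume` needs the server to be running.

```python
import json
import urllib.request

BASE_URL = "http://localhost:2700"  # default from this README; adjust as needed


def build_parse_request(resume_text: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST /parse-resume request without sending it."""
    body = json.dumps({"resume_text": resume_text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/parse-resume",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_resume(resume_text: str, base_url: str = BASE_URL) -> dict:
    """Send the request and decode the JSON response (requires a running server)."""
    request = build_parse_request(resume_text, base_url)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

With the server up, `parse_resume("John Doe\njohndoe@gmail.com")` should return a dictionary shaped like the response shown above.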
python test_parser.py
Parses a sample resume and prints the full JSON output to verify everything works.