
πŸŽ™οΈ Speaker Verification using Gaussian Mixture Models (GMM)

This repository implements a Speaker Verification System using Gaussian Mixture Models (GMMs) with MFCC, Delta, and Double Delta features. The system is trained and tested on a subset of a speaker recognition dataset containing 1-second WAV files per speaker.


📌 Project Overview

Speaker verification is an essential component in audio-based security and authentication systems. This project uses GMMs to learn speaker-specific acoustic patterns and verify the identity of a speaker based on short utterances.

πŸ” Key Highlights

  • 📂 Preprocessing includes noise addition, resampling, and pre-emphasis.
  • 🎚️ Feature extraction based on MFCC, delta, and double-delta coefficients.
  • 🎯 Model training using Gaussian Mixture Models with Expectation-Maximization.
  • 📊 Equal Error Rate (EER) used for evaluation; best model achieves EER = 0.177.
  • 👥 Includes speaker pair comparisons and real-world applicability demonstrations.

πŸ› οΈ Project Structure

.
β”œβ”€β”€ Audio/                     # Contains 1-second audio files for each speaker
β”œβ”€β”€ Noise/                     # Contains noise samples used for robustness
β”œβ”€β”€ test_pairs.txt             # File containing speaker comparison test cases
β”œβ”€β”€ ml-end.ipynb               # Main notebook for training, evaluation, and testing
β”œβ”€β”€ README.md                  # You're reading it

🔄 Pipeline Summary

1. Preprocessing

  • Silence and Noise Removal
  • Resampling to 16kHz
  • Pre-Emphasis Filtering
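
The resampling and pre-emphasis steps can be sketched as below. This is a minimal illustration, not the notebook's code: the 0.97 pre-emphasis coefficient and the use of scipy.signal.resample_poly are common defaults assumed here, not confirmed from the repository.

```python
import numpy as np
from scipy.signal import resample_poly

def preprocess(signal: np.ndarray, orig_sr: int, target_sr: int = 16000,
               preemph: float = 0.97) -> np.ndarray:
    """Resample to 16 kHz, then apply a first-order pre-emphasis filter."""
    # Polyphase resampling performs the rational rate conversion orig_sr -> target_sr.
    if orig_sr != target_sr:
        signal = resample_poly(signal, target_sr, orig_sr)
    # Pre-emphasis y[t] = x[t] - a * x[t-1] boosts the higher frequencies,
    # which carry much of the speaker-discriminative information.
    return np.append(signal[0], signal[1:] - preemph * signal[:-1])

x = np.random.randn(44100)            # 1 second of audio at 44.1 kHz
y = preprocess(x, orig_sr=44100)      # 1 second at 16 kHz, pre-emphasized
```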

2. Feature Extraction

  • MFCC (Mel-Frequency Cepstral Coefficients)
  • MFCC Delta
  • MFCC Double Delta
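
The delta and double-delta coefficients are first- and second-order temporal derivatives of the MFCC trajectories. A library-agnostic sketch using the standard regression formula is shown below; the MFCC matrix here is a random stand-in, since computing real MFCCs requires an audio front-end not reproduced in this snippet.

```python
import numpy as np

def delta(feat: np.ndarray, N: int = 2) -> np.ndarray:
    """Regression-based deltas over a window of +/- N frames.

    d[t] = sum_{n=1..N} n * (c[t+n] - c[t-n]) / (2 * sum_{n=1..N} n^2)
    """
    padded = np.pad(feat, ((N, N), (0, 0)), mode="edge")  # replicate edge frames
    denom = 2 * sum(n * n for n in range(1, N + 1))
    return np.array([
        sum(n * (padded[t + N + n] - padded[t + N - n]) for n in range(1, N + 1))
        for t in range(feat.shape[0])
    ]) / denom

mfcc = np.random.randn(100, 13)          # (frames, coefficients), stand-in for real MFCCs
d1 = delta(mfcc)                         # delta
d2 = delta(d1)                           # double delta
features = np.hstack([mfcc, d1, d2])     # final 39-dimensional feature vectors
```

Stacking 13 MFCCs with their deltas and double deltas gives the conventional 39-dimensional frame-level feature vector.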

3. Model Training

  • Trained a GMM per speaker using sklearn.mixture.GaussianMixture
  • Parameters: n_components=4, max_iter=160
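
With the stated parameters, per-speaker training might look like the sketch below. The speaker data here is synthetic, and all settings other than n_components and max_iter are sklearn defaults; the notebook's actual configuration may differ.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical training data: one (frames x 39) feature matrix per speaker.
rng = np.random.default_rng(0)
speaker_features = {
    "spk1": rng.normal(0.0, 1.0, size=(200, 39)),
    "spk2": rng.normal(3.0, 1.0, size=(200, 39)),
}

# One GMM per speaker, fit with Expectation-Maximization.
models = {
    spk: GaussianMixture(n_components=4, max_iter=160, random_state=0).fit(feats)
    for spk, feats in speaker_features.items()
}

def identify(test_feats: np.ndarray) -> str:
    """Pick the speaker whose GMM gives the highest average log-likelihood."""
    return max(models, key=lambda s: models[s].score(test_feats))

pred = identify(rng.normal(3.0, 1.0, size=(50, 39)))   # test utterance resembling spk2
```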

4. Evaluation

  • Log-likelihood comparison for speaker prediction
  • Equal Error Rate (EER) calculated for performance
  • Also supports speaker pair comparison
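
The notebook's exact EER routine isn't reproduced here; a common way to estimate the EER is to sweep a decision threshold over genuine and impostor log-likelihood scores and find the point where the false-accept and false-reject rates cross:

```python
import numpy as np

def eer(genuine: np.ndarray, impostor: np.ndarray) -> float:
    """Equal Error Rate: the threshold where FAR and FRR (approximately) meet."""
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    far = np.array([(impostor >= t).mean() for t in thresholds])  # false accept rate
    frr = np.array([(genuine < t).mean() for t in thresholds])    # false reject rate
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0

# Well-separated score distributions give an EER near zero.
g = np.array([5.0, 6.0, 7.0])   # genuine-trial scores
i = np.array([0.0, 1.0, 2.0])   # impostor-trial scores
e = eer(g, i)
```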

📊 Results

| Model Version                | EER    |
| ---------------------------- | ------ |
| MFCC + Delta + Double Delta  | 0.177  |
| MFCC only                    | Higher |

A lower EER indicates better performance; the EER is the operating point at which the false-accept rate equals the false-reject rate.


📚 References

Key papers referenced include:

  • Reynolds & Rose (1995): Robust text-independent speaker identification using GMM
  • Dehak et al. (2007): Modeling Prosodic Features with Joint Factor Analysis
  • Jadhav et al. (2018): GMM + MFCC + EM-based speaker recognition

See the full reference list in the project report.


📈 Future Work

  • Train on larger datasets with more speakers
  • Explore i-vector and x-vector embeddings
  • Apply deep learning approaches (e.g., LSTM, CNN) for feature extraction
  • Integrate with real-time APIs or voice assistants
