Python Audio Analysis Library: Feature Extraction, Classification, Segmentation and Applications

osecen/pyAudioAnalysis


A Python library for audio feature extraction, classification, segmentation and applications

This README provides general information; see the complete wiki for full documentation and a more generic intro to audio data handling.

News

  • [2022-01-01] If you are not interested in training audio models on your own data, check out the Deep Audio API, where you can directly send audio data and receive predictions regarding the respective audio content (speech vs silence, musical genre, speaker gender, etc.)
  • [2021-08-06] deep-audio-features: deep audio classification and feature extraction using CNNs and PyTorch
  • Check out paura, a Python script for real-time recording and analysis of audio data

General

pyAudioAnalysis is a Python library covering a wide range of audio analysis tasks. Through pyAudioAnalysis you can:

  • Extract audio features and representations (e.g. MFCCs, spectrogram, chromagram)
  • Train, parameter tune and evaluate classifiers of audio segments
  • Classify unknown sounds
  • Detect audio events and exclude silence periods from long recordings
  • Perform supervised segmentation (joint segmentation - classification)
  • Perform unsupervised segmentation (e.g. speaker diarization) and extract audio thumbnails
  • Train and use audio regression models (example application: emotion recognition)
  • Apply dimensionality reduction to visualize audio data and content similarities
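The capabilities above all build on short-term analysis: slicing a signal into overlapping frames and computing a feature vector per frame. A minimal NumPy sketch of this idea (illustrative only, not the library's implementation; pyAudioAnalysis computes a much richer feature set):

```python
import numpy as np

def short_term_features(signal, fs, win=0.050, step=0.025):
    """Frame a signal and compute two classic short-term features per frame:
    energy and zero-crossing rate (illustrative sketch, not library code)."""
    w, s = int(win * fs), int(step * fs)
    feats = []
    for start in range(0, len(signal) - w + 1, s):
        frame = signal[start:start + w]
        energy = float(np.sum(frame ** 2) / w)
        zcr = float(np.mean(np.abs(np.diff(np.sign(frame)))) / 2)
        feats.append((energy, zcr))
    return np.array(feats)  # shape: (num_frames, 2)

# 1 second of a 440 Hz tone sampled at 16 kHz
fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t)
F = short_term_features(x, fs)
print(F.shape)  # (39, 2): 39 frames, 2 features each
```

Classifiers and regression models then operate on statistics (e.g. mean, std) of these per-frame features over longer segments.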

An audio classification example

More examples and detailed tutorials can be found at the wiki

pyAudioAnalysis provides easy-to-call wrappers for audio analysis tasks. E.g., the following code first trains an audio segment classifier from a set of WAV files stored in folders (each folder representing a different class), and then uses the trained classifier to classify an unknown WAV file:

from pyAudioAnalysis import audioTrainTest as aT
aT.extract_features_and_train(["classifierData/music","classifierData/speech"], 1.0, 1.0, aT.shortTermWindow, aT.shortTermStep, "svm", "svmSMtemp", False)
aT.file_classification("data/doremi.wav", "svmSMtemp","svm")

Result: (0.0, array([ 0.90156761, 0.09843239]), ['music', 'speech'])
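The returned tuple holds the winning class index, the per-class probabilities, and the class names; picking the predicted label is a one-liner. A plain-Python sketch using the example values above:

```python
# Result tuple as returned by file_classification (values from the example above)
class_id, probs, labels = 0.0, [0.90156761, 0.09843239], ['music', 'speech']

# The predicted label is the class with the highest probability
predicted = labels[max(range(len(probs)), key=probs.__getitem__)]
print(predicted)  # the example file is classified as 'music'
```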

In addition, command-line support is provided for all functionalities. E.g. the following command extracts the spectrogram of an audio signal stored in a WAV file: python audioAnalysis.py fileSpectrogram -i data/doremi.wav
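Conceptually, the spectrogram produced by that command is just the magnitude of FFTs over overlapping windowed frames. A minimal NumPy sketch (an assumption-laden illustration, not the library's own code, which also handles plotting and axis scaling):

```python
import numpy as np

def spectrogram(x, fs, win=0.040, step=0.020):
    """Magnitude spectrogram: |FFT| of overlapping Hamming-windowed frames."""
    w, s = int(win * fs), int(step * fs)
    window = np.hamming(w)
    frames = [x[i:i + w] * window for i in range(0, len(x) - w + 1, s)]
    # Keep only the non-negative frequency bins
    return np.abs(np.fft.rfft(frames, axis=1))

fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1000 * t)           # 1 kHz tone
S = spectrogram(x, fs)                     # shape: (frames, freq_bins)
peak_bin = int(S.mean(axis=0).argmax())
peak_hz = peak_bin * fs / int(0.040 * fs)  # bin -> Hz for a 320-sample FFT
print(S.shape, peak_hz)                    # spectral peak lands at ~1000 Hz
```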

Further reading

Apart from this README file, to better understand how to use this library you should read the following:

@article{giannakopoulos2015pyaudioanalysis,
  title={pyAudioAnalysis: An Open-Source Python Library for Audio Signal Analysis},
  author={Giannakopoulos, Theodoros},
  journal={PloS one},
  volume={10},
  number={12},
  year={2015},
  publisher={Public Library of Science}
}

For MATLAB-related audio analysis material, check this book.

Author

Theodoros Giannakopoulos, Principal Researcher of Multimodal Machine Learning at the Multimedia Analysis Group of the Computational Intelligence Lab (MagCIL) of the Institute of Informatics and Telecommunications, of the National Center for Scientific Research "Demokritos"

Installation Steps

  1. Clone the repository:
git clone https://github.com/tyiannak/pyAudioAnalysis.git
cd pyAudioAnalysis
  2. Create and activate a virtual environment (recommended):
python3 -m venv venv
# On Windows:
venv\Scripts\activate
# On Unix or macOS:
source venv/bin/activate
  3. Install dependencies and the package:
python -m pip install --upgrade pip
pip install -r requirements.txt
pip install -e .

Using the UI Interface

pyAudioAnalysis now includes a web-based UI for easy access to all functionality:

Starting the UI

  1. Ensure your virtual environment is activated:
# On Windows:
venv\Scripts\activate
# On Unix or macOS:
source venv/bin/activate
  2. Launch the UI:
PYTHONPATH=$(pwd) streamlit run pyAudioAnalysis/audioUI.py
  3. Your default web browser will automatically open to the UI (typically http://localhost:8501)

UI Features

The interface provides easy access to:

  • Audio Classification: Upload and classify audio files using pre-trained models
  • Feature Extraction: Visualize audio features like MFCCs, spectrograms
  • Beat Extraction: Analyze rhythm and tempo
  • Segmentation: Perform audio segmentation tasks
  • Regression: Train and use regression models

Usage Tips

  • Audio files must be in WAV format
  • For MP3 files, convert to WAV first using FFmpeg:
ffmpeg -i input.mp3 output.wav
  • Models should be trained first using the command line interface before using them in the UI
  • Large audio files may take longer to process; consider splitting them into smaller segments
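Since the UI only accepts WAV input, it can be handy to sanity-check a file before uploading. A small standard-library sketch (the `wav_info` helper is hypothetical, not part of pyAudioAnalysis):

```python
import wave

def wav_info(path):
    """Return (sample_rate, channels, duration_seconds) of a WAV file;
    raises wave.Error if the file is not a valid WAV."""
    with wave.open(path, "rb") as w:
        frames = w.getnframes()
        rate = w.getframerate()
        return rate, w.getnchannels(), frames / rate

# Write a tiny mono 16 kHz file so the sketch is self-contained
with wave.open("tone.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)       # 16-bit samples
    w.setframerate(16000)
    w.writeframes(b"\x00\x00" * 16000)  # 1 second of silence

print(wav_info("tone.wav"))  # (16000, 1, 1.0)
```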

Running Tests

Basic Testing Commands

# Run all tests
python -m pytest tests/

# Run tests with coverage report
python -m pytest --cov=pyAudioAnalysis tests/

# Run specific test file
python -m pytest tests/test_standard.py

# Run tests verbosely
python -m pytest -v tests/

# Run tests matching specific pattern
python -m pytest -k "test_feature" tests/
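New tests added to the suite follow standard pytest conventions: a `test_*.py` file containing `test_*` functions with plain assertions. A minimal sketch (the file name, function, and `frame_count` helper are hypothetical, not taken from the repo):

```python
# Contents of a hypothetical tests/test_example.py

def frame_count(n_samples, win, step):
    """Number of full analysis frames that fit in a signal."""
    return 0 if n_samples < win else (n_samples - win) // step + 1

def test_frame_count():
    # 1 s at 16 kHz, 50 ms window (800 samples), 25 ms step (400 samples)
    assert frame_count(16000, 800, 400) == 39
    # A signal shorter than one window yields no frames
    assert frame_count(500, 800, 400) == 0

test_frame_count()  # pytest would discover and run this automatically
```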

Test Organization

Test files are organized by functionality:

  • test_standard.py: Core functionality tests (previously in shell scripts)
  • test_audio_utils.py: Utility function tests
  • test_ui.py: Streamlit UI tests

Coverage Reports

# Generate HTML coverage report
python -m pytest --cov=pyAudioAnalysis --cov-report=html tests/

# Generate both coverage and test reports
python -m pytest tests/ --cov=pyAudioAnalysis --cov-report=html --html=tests/test-report.html

The HTML coverage report will be available in the htmlcov directory, and the test report will be in tests/test-report.html.

Continuous Integration

The project uses GitHub Actions for continuous integration, running:

  • All tests with coverage reporting
  • Code style checks
  • System dependency verification
  • Multiple Python version testing

Test reports and coverage information are automatically uploaded as artifacts and to Codecov.

Troubleshooting Tests

If you encounter any issues:

  1. Ensure all dependencies are installed:
pip install -r requirements.txt
  2. Verify system dependencies:
# Ubuntu/Debian
sudo apt-get install ffmpeg libavcodec-extra

# macOS
brew install ffmpeg
  3. Check that you are in the correct directory and that the virtual environment is activated

  4. If you get audio-related errors, ensure your system's audio drivers are properly configured

Command Line and Programmatic Usage

For those who prefer to work without the UI, all features remain available through the audioAnalysis.py command-line wrapper (shown earlier) and the Python API:

from pyAudioAnalysis import audioTrainTest as aT
# Train a classifier
aT.extract_features_and_train(["classifierData/music","classifierData/speech"], 1.0, 1.0, aT.shortTermWindow, aT.shortTermStep, "svm", "svmSMtemp", False)
# Classify a file
aT.file_classification("data/doremi.wav", "svmSMtemp","svm")

Model Organization for UI

The UI looks for pre-trained models in the data/models directory. To use the classification features in the UI:

  1. Create a models directory structure:
pyAudioAnalysis/
├── data/
│   └── models/
│       ├── svm_rbf_sm.svm         # Speech/Music classifier
│       ├── svm_rbf_4class.svm     # 4-class audio classifier
│       ├── knn_speaker_10.knn     # 10-speaker recognition
│       ├── svm_rbf_movie8.svm     # Movie genre classifier
│       └── svm_rbf_musical_genre_6.svm  # Music genre classifier
  2. Model naming convention:
  • Format: {classifier_type}_{task_name}.{ext}
  • Classifier types: svm_rbf or knn
  • Extensions: .svm for SVM models, .knn for KNN models
  3. Default model search paths:
./data/models
../data/models
{package_directory}/data/models

If you're using pre-trained models, place them in one of these locations. The UI will automatically detect and list available models in the classification section.
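The search order above can be sketched as a small resolver (a hypothetical `find_model` helper; the UI's actual lookup code may differ, and the package-directory entry is omitted here for brevity):

```python
from pathlib import Path

# Candidate locations, in the order listed above
# (the {package_directory}/data/models entry is omitted in this sketch)
SEARCH_PATHS = [Path("data/models"), Path("../data/models")]

def find_model(name, search_paths=SEARCH_PATHS):
    """Return the first existing model file matching `name`, else None."""
    for directory in search_paths:
        candidate = directory / name
        if candidate.is_file():
            return candidate
    return None

# Example: create a model file in the first location and resolve it
Path("data/models").mkdir(parents=True, exist_ok=True)
Path("data/models/svm_rbf_sm.svm").touch()
print(find_model("svm_rbf_sm.svm"))  # data/models/svm_rbf_sm.svm
```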

Note: You can train your own models using the command line interface and place them in the models directory for use in the UI.
