ConversationLearning

ConversationLearning is a machine learning project for scoring conversations between a bot and a user. It uses data from Supabase along with target scores provided in a CSV file to train a regression model that predicts various conversational performance metrics.

Introduction

In modern personalized learning systems, measuring the quality of conversations between an AI bot and a student can help improve engagement and learning outcomes. This project implements a regression model that uses chat data from Supabase and manually scored target metrics (e.g., comprehension, participation, problem-solving, etc.) to predict conversation quality.

The pipeline includes:

Data Loading: Querying chat messages from Supabase and reading target scores from a CSV file.
Preprocessing: Adding role tokens (<BOT> and <USER>), grouping messages by thread, and tokenizing conversation text using the Longformer tokenizer.
Model Training: Training a regression model (built on top of a Longformer encoder) with multiple outputs.
Inference: Using the pre-trained model to predict target scores for a new conversation.

Features

Efficient data loading from Supabase (only retrieving conversations with valid thread IDs taken from scores.csv).
Comprehensive preprocessing that preserves target score columns.
A multi-output regression model using a pre-trained Longformer.
Training and inference pipelines with additional evaluation metrics (MSE, MAE, R²).
Optional plotting of predicted vs. ground truth values during training.

Directory Structure

Please make a file called config.py in the project's root directory and save the supabase url and service key here
all executable files are in the src folder
Data for scores in stored in data/ in the root of the project
Saved models are scored in the models/ directory

Installation

Clone the Repository:

git clone https://github.com/yourusername/ConversationLearning.git
cd ConversationLearning

**Set up a virtual environment

python3 -m venv venv
source venv/bin/activate

**Install Dependencies
```
pip install -r requirements.txt
```

Usage

Training

To train the model, run the training script from the project root:

python -m src.train --plot

The --plot flag is optional, and will help you visualize the predicted vs ground-truth values

Inference

After training, please run

python -m src.main

This will prompt you to enter a thread id of a conversation. Please enter the thread id and let the program do its thing!

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
data		data
src		src
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
app.py		app.py
main.py		main.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ConversationLearning

Table of Contents

Introduction

Features

Directory Structure

Installation

Usage

Training

Inference

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

ConversationLearning

Table of Contents

Introduction

Features

Directory Structure

Installation

Usage

Training

Inference

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages