📚 Data-Augmentation Toolkit

A Streamlit-based UI that lets you:

- Upload single- or multi-turn SFT / Alignment datasets (`.jsonl` or `.csv`)
- Validate the file before launching the heavy pipeline
- Run the data-augmentation pipeline with live logs and a progress bar
- Download the generated JSONL in one click
- Check your LLM endpoint instantly via a Health-Check button
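
To make the "validate before launching" step concrete, here is a minimal sketch of what such a pre-flight check could look like. The function name `validate_upload` and the specific rules (UTF-8 text, one JSON object per line, consistent CSV column counts) are assumptions for illustration; the toolkit's own validation code is not shown in this README and may differ.

```python
import csv
import io
import json
from pathlib import Path


def validate_upload(filename: str, data: bytes) -> list[str]:
    """Return human-readable problems; an empty list means the file looks usable.

    Hypothetical helper, not the toolkit's actual validator.
    """
    suffix = Path(filename).suffix.lower()
    if suffix not in {".jsonl", ".csv"}:
        return [f"Unsupported extension {suffix!r}; expected .jsonl or .csv"]

    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return [f"File is not valid UTF-8: {exc}"]

    errors: list[str] = []
    if suffix == ".jsonl":
        # Each non-blank line must parse as a JSON object.
        for lineno, line in enumerate(text.splitlines(), start=1):
            if not line.strip():
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                errors.append(f"Line {lineno}: invalid JSON ({exc.msg})")
                continue
            if not isinstance(record, dict):
                errors.append(f"Line {lineno}: expected a JSON object")
    else:
        # Every non-empty CSV row must have as many columns as the header.
        rows = list(csv.reader(io.StringIO(text, newline="")))
        if not rows:
            return ["CSV file is empty"]
        width = len(rows[0])
        for rowno, row in enumerate(rows[1:], start=2):
            if row and len(row) != width:
                errors.append(
                    f"Row {rowno}: expected {width} columns, found {len(row)}"
                )
    return errors
```

In a Streamlit app, the bytes would typically come from the uploaded file object, and a non-empty result could be shown to the user instead of starting the pipeline.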

✨ Features

| Area | What it does |
| --- | --- |
| LLM Connection | Enter `model_name`, `api_key`, `base_url` and press Health-Check to verify connectivity. |
| Generation Mode | Choose between single/multi-turn SFT or Alignment pipelines. |
| Threading | Adjustable worker slider (1-16) controls concurrent requests. |
| File Validation | Early checks for broken JSONL, malformed CSV, or wrong extensions with descriptive errors. |
| Live Feedback | Real-time `tqdm` progress + log stream in the main pane. |
| Output | Final JSONL is offered for download; CSV deliberately omitted to keep training format consistent. |
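
The Threading and Live Feedback rows can be illustrated with a small sketch of a bounded worker pool that reports progress as requests finish. This uses only the standard library; `run_augmentation`, `augment_one`, and `on_progress` are hypothetical names, and the toolkit's real `executor.py` may be organized differently (for example, it may drive `tqdm` directly rather than a callback).

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable, Optional


def run_augmentation(
    records: Iterable[object],
    augment_one: Callable[[object], object],
    workers: int = 4,
    on_progress: Optional[Callable[[int, int], None]] = None,
) -> list:
    """Apply augment_one to every record using a bounded thread pool.

    Results are returned in input order. on_progress(done, total) is called
    after each record completes. Hypothetical sketch, not the toolkit's code.
    """
    # Mirror the 1-16 range of the worker slider described above.
    if not 1 <= workers <= 16:
        raise ValueError("workers must be between 1 and 16")

    items = list(records)
    results: list = [None] * len(items)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map each future back to the index of the record it processes.
        futures = {pool.submit(augment_one, rec): i for i, rec in enumerate(items)}
        for done, future in enumerate(as_completed(futures), start=1):
            results[futures[future]] = future.result()
            if on_progress is not None:
                on_progress(done, len(items))
    return results
```

A UI could pass an `on_progress` callback that updates a progress bar with `done / total`. Threads suit this workload because each worker spends most of its time waiting on the LLM endpoint rather than on the CPU.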

🛠 Quick Start

# 1. Clone & enter the repo
git clone https://github.com/your-org/data-augmentation-toolkit.git
cd data-augmentation-toolkit

# 2. Create env & install deps
python -m venv .venv
source .venv/bin/activate      # Windows: .venv\Scripts\activate
pip install -r req.txt

# 3. Run the Streamlit app
streamlit run app.py
