This project is currently a work in progress.

A lightweight, ML-powered CLI tool that autocompletes your terminal commands based on your past `.bash_history`. Built with PyTorch, an LSTM, and a custom tokenizer. This project stemmed from the fact that at work I have to use a ton of commands (that I tend to forget)!
- LSTM-based next-token prediction
- Tokenizer, batching, vocab, and training pipeline
- Interactive CLI with `typer`
- Beautiful `rich` table output
- Reusable model + vocab loading
- CUDA and CPU-compatible
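The model is deliberately small; a minimal sketch of an LSTM next-token predictor in this style (class name and hyperparameters are illustrative assumptions, not the exact contents of `model_def.py`):

```python
import torch
import torch.nn as nn

class NextTokenLSTM(nn.Module):
    """Embedding -> LSTM -> linear projection over the vocabulary."""

    def __init__(self, vocab_size: int, embed_dim: int = 64, hidden_dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x, hidden=None):
        emb = self.embed(x)                   # (batch, seq, embed_dim)
        out, hidden = self.lstm(emb, hidden)  # (batch, seq, hidden_dim)
        logits = self.fc(out[:, -1, :])       # score the token after the last position
        return logits, hidden
```

Because `forward` returns the hidden state, the same module works for both batched training and step-by-step interactive inference.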
```
ml-terminal-autocomplete/
├── model/
│   ├── model_def.py        # LSTM architecture
│   ├── dataset.py          # Tokenization + batching
│   ├── train_model.py      # Training loop
│   └── predict.py          # Inference
├── cli/
│   └── main.py             # Interactive CLI tool
├── data/
│   ├── bash_history.txt    # Raw data
│   ├── tokenized_sequences.pkl
│   └── vocab.json
├── saved_models/
│   └── best_model.pt       # Saved trained model
├── requirements.txt
└── README.md
```

```bash
# Clone the repo
git clone https://github.com/YOUR_USERNAME/ml-terminal-autocomplete.git
cd ml-terminal-autocomplete

# Set up a virtual env (optional but recommended)
python -m venv venv
source venv/bin/activate  # or venv\Scripts\activate on Windows

# Install dependencies
pip install -r requirements.txt
```

```bash
python model/dataset.py      # Tokenize + save sequences
python model/train_model.py  # Train and save best model
```

```bash
python -m main interactive start
```

Example:

```
>>> git ch
╭──────────────┬─────────────╮
│ Token        │ Probability │
├──────────────┼─────────────┤
│ checkout     │ 0.8235      │
│ cherry-pick  │ 0.0921      │
│ commit       │ 0.0544      │
╰──────────────┴─────────────╯
```

The training/validation/testing data was sourced from the nl2bash dataset!
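A table like the one above is straightforward to produce with `rich`; a minimal sketch of the top-k display step (the softmax/top-k logic mirrors what the inference code presumably does, and the function and variable names are illustrative, not taken from `predict.py`):

```python
import torch
from rich.console import Console
from rich.table import Table

def show_top_k(logits: torch.Tensor, idx_to_token: dict, k: int = 3) -> None:
    """Convert raw model logits to probabilities and render the top-k as a table."""
    probs = torch.softmax(logits, dim=-1)
    top = torch.topk(probs, k)
    table = Table("Token", "Probability")
    for p, i in zip(top.values.tolist(), top.indices.tolist()):
        table.add_row(idx_to_token[i], f"{p:.4f}")
    Console().print(table)
```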
This project is functional end-to-end, but there are several areas for improvement to increase usefulness, accuracy, and polish:
- Load and preprocess a user's `bash_history`
- Tokenize sequences (char-level)
- Train an LSTM-based model to predict the next character
- Build a prediction engine using the trained model
- Implement a CLI with `typer` and styled output via `rich`
- Allow interactive predictions from user-typed shell fragments
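Char-level tokenization keeps the vocabulary tiny; a minimal sketch of the vocab-building and encoding steps (the actual `dataset.py` may differ — in particular, the `<UNK>` fallback shown here is an assumption):

```python
def build_vocab(commands: list[str]) -> tuple[dict, dict]:
    """Map each character seen in the history to an integer id, reserving 0 for <UNK>."""
    chars = sorted({ch for cmd in commands for ch in cmd})
    token_to_idx = {"<UNK>": 0}
    for ch in chars:
        token_to_idx[ch] = len(token_to_idx)
    idx_to_token = {i: t for t, i in token_to_idx.items()}
    return token_to_idx, idx_to_token

def encode(cmd: str, token_to_idx: dict) -> list[int]:
    """Encode a command char-by-char, falling back to <UNK> for unseen characters."""
    return [token_to_idx.get(ch, token_to_idx["<UNK>"]) for ch in cmd]
```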
- Switch to token-level modeling (predict the next full token, not character)
- Improve dataset quality (real-world command history, fewer `<UNK>` tokens)
- Add greedy or beam search to complete full-word predictions from the char-level model (if char-level is kept)
- Add more training data (2k–5k lines) for better generalization
- Introduce dropout/regularization tuning for better performance
- Optionally try a transformer-based architecture (`nn.Transformer`, `GPT2`, etc.)
- Add a train/val/test split (e.g., 80/10/10)
- Report metrics during training (e.g., loss, accuracy)
- Add test-time evaluation: given real partial commands, does the model predict the correct token?
- Add per-epoch logging and loss visualization (e.g., `matplotlib`, `tensorboard`, or simple CLI output)
- Add shell-style autocomplete (TAB-key mimic, fuzzy match)
- Auto-complete full commands rather than just showing suggestions
- Add a CLI option to show top-k completions as a single line or inline
- Create a `bash` or `zsh` plugin that calls the model for real-time shell autocompletion
- Make the CLI installable via `pip` (`setup.py` or `pyproject.toml`)
- Add `pip` install support (e.g., `pip install ml-terminal-autocomplete`)
- Dockerize for easy use anywhere
- Publish a demo video/gif in the README
- Add usage examples in the README with screenshots
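The greedy-search item above can be sketched without committing to the model's API: repeatedly ask the char-level model for its most likely next character until a word boundary appears. The `predict_next_char` callback here is a stand-in for the real inference step, not an existing function in this repo:

```python
def greedy_complete(fragment: str, predict_next_char, max_len: int = 20) -> str:
    """Greedily extend a partial command one character at a time.

    predict_next_char(text) -> the single most probable next character.
    Stops at a word boundary (space/newline) or after max_len steps.
    """
    out = fragment
    for _ in range(max_len):
        ch = predict_next_char(out)
        if ch in (" ", "\n"):
            break
        out += ch
    return out
```

Beam search would generalize this by keeping the k highest-probability extensions at each step instead of only the single best one.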