Discovering Hidden Algebraic Structures via Transformers with Rank-Aware Beam GRPO
This repository contains code and resources for the paper "Discovering Hidden Algebraic Structures via Transformers with Rank-Aware Beam GRPO". Our research explores the potential of transformer models to recognize and decompose hidden algebraic substructures within polynomials.
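For example (an illustrative instance, not one drawn from the paper's datasets), the expanded polynomial $x^4 + 3x^2 + 2$ hides the composition $g(h(x))$ with outer polynomial $g(y) = y^2 + y$ and inner polynomial $h(x) = x^2 + 1$.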
We provide two methods for generating polynomial datasets:
- `MMA_package/usage_and_demos.m` provides the complete data generation pipeline used for experiments in the paper
  - Implements polynomial data generation with prefix notation tokenization
  - Used for performance comparison with Mathematica's symbolic computation capabilities
- `Data_Generation/Using_Sympy/using_sympy.py` offers a fast, parallelized alternative for dataset generation
  - Generates polynomial-in-polynomial substitution problems with prefix notation (illustrated in the sketch after this list)
  - Supports multi-core parallel generation for large datasets (1M+ samples)
  - No Mathematica dependency required for data generation
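As an illustration of the prefix-notation encoding, here is a minimal sketch that serializes a SymPy expression into prefix tokens. This is illustrative only; the actual token set and ordering used for training are defined in `using_sympy.py`.

```python
import sympy as sp

def to_prefix(expr):
    """Serialize a SymPy polynomial expression as prefix-notation tokens (sketch)."""
    # Leaves: variables and integer coefficients become single tokens.
    if expr.is_Symbol or expr.is_Integer:
        return [str(expr)]
    op = {sp.Add: "+", sp.Mul: "*", sp.Pow: "^"}[type(expr)]
    # Emit one binary operator per argument pair, then recurse into the arguments.
    tokens = [op] * (len(expr.args) - 1)
    for arg in expr.args:
        tokens += to_prefix(arg)
    return tokens

x = sp.symbols("x")
print(to_prefix(sp.expand((x + 1) ** 2)))
# one possible output: ['+', '+', '1', '*', '2', 'x', '^', 'x', '2']
```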
Our implementation is based on Andrej Karpathy's minGPT with the following enhancements:
- Polynomial-specific tokenization
- Parallelized evaluation
- Integrated beam search with direct Mathematica evaluation
- Flash Attention support for 2-3x faster training (automatically enabled; see the sketch after this list)
- KV-Cache optimization for 1.4-5x faster inference (automatically enabled for evaluation modes)
- Combined Flash Attention + KV-Cache: 3x overall speedup
- BGRPO training fully compatible with both optimizations
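Flash Attention is exposed in PyTorch 2.0+ through `torch.nn.functional.scaled_dot_product_attention`, which dispatches to fused kernels when available. The repository enables this automatically; the following is only a minimal sketch of the underlying call, with illustrative shapes:

```python
import torch
import torch.nn.functional as F

# (batch, n_head, seq_len, head_dim) -- illustrative shapes
q = torch.randn(1, 8, 128, 64)
k = torch.randn(1, 8, 128, 64)
v = torch.randn(1, 8, 128, 64)

# Fused replacement for the manual softmax(Q K^T / sqrt(d)) V computation;
# is_causal=True applies the autoregressive mask used by GPT-style models.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```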
```bash
bash setup.sh
# Or use setup_with_uv.sh for faster installation with the uv package manager
```

The `MMA_package/example.nb` notebook demonstrates dataset generation for:
- Single variable polynomials with polynomial substitution
- Single variable polynomials with O(N) singlet substitution
- Multi-variable polynomials
For faster generation without Mathematica dependency:
```python
from using_sympy import generate_all_datasets_parallel

# Generate datasets with parallel processing
generate_all_datasets_parallel(
    file_directory="data_storage/dataset/single_variable",
    num_train=1000000,  # 1M training samples
    num_test=3000,      # 3K test samples per degree combination
    num_valid=128,      # 128 validation samples
    inner_only=True,    # Single variable format
    num_cpus=None       # Auto-detect optimal CPU usage
)
```

This generates training, validation, and 9 test datasets (one per degree combination) with automatic deduplication and shuffling.
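The deduplication-and-shuffle step happens inside the generator; conceptually it amounts to the following minimal sketch (assuming one sample per line; this is not the repository's actual implementation):

```python
import random

def dedup_and_shuffle(in_path, out_path, seed=0):
    # Keep the first occurrence of each sample line, then shuffle for training.
    with open(in_path) as f:
        samples = list(dict.fromkeys(line.rstrip("\n") for line in f))
    random.Random(seed).shuffle(samples)
    with open(out_path, "w") as f:
        f.write("\n".join(samples) + "\n")
```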
This experiment corresponds to the first evaluation axis in our paper.
For polynomial-in-polynomial substitution problems (finding inner and outer polynomials):
```bash
# Generate dataset
python -c "
from Data_Generation.Using_Sympy.using_sympy import generate_all_datasets_parallel
generate_all_datasets_parallel(
    file_directory='data_storage/dataset/single_variable',
    num_train=1000000,
    num_test=3000,
    num_valid=128,
    inner_only=True  # For single variable case
)"

# Train model
python Training/mingpt/main.py \
--dataset_path data_storage/dataset/single_variable/training_dataset.txt \
--val_dataset_path data_storage/dataset/single_variable/validation_dataset.txt \
--save_path data_storage/model/single_variable_model.pt \
--block_size 350 \
--n_layer 6 \
--n_head 8 \
--n_embd 512 \
--max_epochs 10 \
--batch_size 512
```

This is a toy example not included in our paper. An example experiment is provided in `Training/example/example_with_ON_data.sh`, which includes:
- Training a model with example O(N) data
- Fine-tuning capabilities on pre-trained models
- Testing with greedy search inference
- Testing with beam search (beam search requires Mathematica to be set up beforehand)
The complete experiment code for all results in our paper can be found in `Training/things_on_paper/`:
- `exp0`: O(N) experiments
- `exp1_2`: $\mathcal{D}_1$ first part - varying degrees of inner and outer polynomials
- `exp1`, `exp2`: $\mathcal{D}_3$ - multi-variable polynomial experiments
- `exp3-8`: $\mathcal{D}_2$ first part - varying embedding dimension and layer number
- `exp10-11`: $\mathcal{D}_2$ second part - varying number of attention heads
- `exp12-16`: $\mathcal{D}_1$ second part - varying number of variables in inner and outer polynomials
Important: To run these paper experiments (other than the example cases above), you first need to generate the training, test, and validation datasets using the Mathematica package. The datasets should be generated and placed in `data_storage/things_on_paper/dataset/` before running the experiment scripts.
```bash
# Greedy search evaluation
python Training/mingpt/run.py inequality_evaluate4 \
--block_size 350 \
--n_layer 6 \
--n_head 8 \
--n_embd 512 \
--reading_params_path data_storage/model/single_variable_model.pt \
--evaluate_corpus_path data_storage/dataset/single_variable/test_dataset_3_3.txt \
--outputs_path data_storage/predictions/single_variable/greedy_3_3.txt

# Beam search evaluation
python Training/mingpt/run.py debug_beam \
--block_size 350 \
--n_layer 6 \
--n_head 8 \
--n_embd 512 \
--beam_width 30 \
--reading_params_path data_storage/model/single_variable_model.pt \
--evaluate_corpus_path data_storage/dataset/single_variable/test_dataset_3_3.txt \
--outputs_path data_storage/predictions/single_variable/beam_3_3.txt
```

For multi-variable polynomial decomposition:
Data Generation:
```python
from Data_Generation.Using_Sympy.using_sympy import generate_multivariate_datasets_parallel

# Generate multi-variable datasets
generate_multivariate_datasets_parallel(
    file_directory='data_storage/dataset/multi_variable',
    num_inner_vars=3,    # Number of inner variables (a0, a1, a2)
    num_outer_vars=3,    # Number of outer variables (b0, b1, b2)
    max_degree_inner=2,  # Max degree for inner polynomials
    max_degree_outer=2,  # Max degree for the outer polynomial
    num_train=300000,    # Training samples
    num_test=3000,       # Test samples
    num_valid=128,       # Validation samples
    num_cpus=None        # Auto-detect optimal CPU usage
)
```

Training with Extended Vocabulary:
```bash
python Training/mingpt/run.py inequality_finetune \
--extended_vocab \
--block_size 800 \
--max_number_token 101 \
--n_layer 6 \
--n_head 8 \
--n_embd 512 \
--batch_size 128 \
--finetune_corpus_path data_storage/dataset/multi_variable/training_dataset.txt \
--valid_corpus_path data_storage/dataset/multi_variable/validation_dataset.txt \
--writing_params_path data_storage/model/multi_variable_model.pt
```

Evaluation:
```bash
# Greedy search with extended vocabulary
python Training/mingpt/run.py inequality_evaluate4 \
--extended_vocab \
--block_size 800 \
--max_output_length 400 \
--max_number_token 101 \
--n_layer 6 \
--n_head 8 \
--n_embd 512 \
--sympy 1 \
--reading_params_path data_storage/model/multi_variable_model.pt \
--evaluate_corpus_path data_storage/dataset/multi_variable/test_dataset.txt \
--outputs_path data_storage/predictions/multi_variable/greedy.txt

# Beam search
python Training/mingpt/run.py debug_beam \
--extended_vocab \
--block_size 800 \
--max_output_length 400 \
--max_number_token 101 \
--beam_width 10 \
--n_layer 6 \
--n_head 8 \
--n_embd 512 \
--sympy 1 \
--reading_params_path data_storage/model/multi_variable_model.pt \
--evaluate_corpus_path data_storage/dataset/multi_variable/test_dataset.txt \
--outputs_path data_storage/predictions/multi_variable/beam.txt
```

Key Differences from Single-Variable:
- Extended Vocabulary: Includes tokens for variables a0-a18, b0-b18, n1-n18
- Larger Block Size: 800 tokens to accommodate longer multi-variable expressions
- Dataset Format: `expanded ? outer & inner0 & inner1 & inner2` (uses the `?` separator; see the parsing sketch after this list)
- Number Range: Supports 0-100 with `--max_number_token 101`
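As an illustration of this format, a minimal parsing sketch (the helper name and the sample line are hypothetical):

```python
def parse_multivariate_line(line):
    # Split 'expanded ? outer & inner0 & inner1 & inner2' into the model
    # input (expanded polynomial) and its targets (outer + inner polynomials).
    expanded, targets = line.split("?")
    outer, *inners = [t.strip() for t in targets.split("&")]
    return expanded.strip(), outer, inners

# Hypothetical prefix-notation sample, for shape only:
expanded, outer, inners = parse_multivariate_line(
    "+ ^ a0 2 a1 ? + ^ b0 2 b1 & a0 & a1"
)
print(outer, inners)  # '+ ^ b0 2 b1' ['a0', 'a1']
```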
Beam search results can be read directly from the output text files. For greedy search inference, the evaluation code only checks for exact matches against the test dataset. Since problems may have multiple valid answers, you can use Mathematica to verify the correctness of model-generated answers, as demonstrated in `MMA_package/example.nb`.
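If Mathematica is not available, the same check can be performed with SymPy. A minimal sketch, assuming the single-variable case with inner variable `x` and outer variable `y` (adapt to the dataset's actual token conventions):

```python
import sympy as sp

def is_valid_decomposition(expanded_str, outer_str, inner_str):
    # A decomposition is correct iff outer(inner(x)) expands to the input polynomial.
    y = sp.Symbol("y")
    composed = sp.sympify(outer_str).subs(y, sp.sympify(inner_str))
    return sp.expand(composed - sp.sympify(expanded_str)) == 0

print(is_valid_decomposition("x**4 + 3*x**2 + 2", "y**2 + y", "x**2 + 1"))  # True
```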
Performance: Evaluation modes (`inequality_evaluate4` and `debug_beam`) automatically use KV-Cache optimization for 1.4-5x faster inference while producing results identical to the original implementation.
Rank-aware Beam Group Relative Policy Optimization (BGRPO) can be used to further improve model performance:
```bash
cd Training/BGRPO
bash run_single_variable_model.sh
```

BGRPO training supports KV-Cache optimization for faster beam search during reinforcement learning.
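The "group-relative" part of GRPO-style training normalizes each candidate's reward against the statistics of its sampling group, which here is a beam of candidate decompositions. The sketch below shows that normalization with a hypothetical rank-discounted reward to convey the rank-aware flavor; the actual objective and reward are defined in `Training/BGRPO`.

```python
import torch

def group_relative_advantages(rewards, eps=1e-8):
    # Normalize rewards by the mean/std of the group (one beam of candidates).
    r = torch.as_tensor(rewards, dtype=torch.float32)
    return (r - r.mean()) / (r.std() + eps)

# Hypothetical rank-discounted reward: correct decompositions earn more when
# they appear at a higher beam rank (rank 0 = top of the beam).
correct = [True, False, True, False]  # per-candidate correctness within one beam
rewards = [1.0 / (rank + 1) if ok else 0.0 for rank, ok in enumerate(correct)]
print(group_relative_advantages(rewards))
```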
Our implementation includes KV-cache optimization that significantly speeds up autoregressive generation:
Benefits:
- 3x overall speedup when combined with Flash Attention
- Particularly effective for beam search - essential for BGRPO training
- Automatic for evaluation - enabled by default in `inequality_evaluate4` and `debug_beam` modes
- Easy to enable for training - just add `use_kvcache=True` when loading models
To enable KV-cache in your custom scripts:
```python
from mingpt.model_loader import load_model_and_tokenizer

model, tokenizer = load_model_and_tokenizer(
    config_path=config_path,
    model_dir_path=model_dir_path,
    device=device,
    use_kvcache=True  # Enable KV-cache
)
```

Technical Details:
- Caches key-value projections from previous tokens
- Eliminates redundant computation during generation
- Memory overhead: ~5MB for 200 tokens (see the arithmetic after this list)
- See `Training/BGRPO/KV_Cache_Guide.md` for comprehensive documentation
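The ~5MB figure is consistent with the single-variable configuration above (6 layers, `n_embd` 512), assuming fp32 caches:

```python
# One K and one V tensor of shape (seq_len, n_embd) are cached per layer.
n_layer, n_embd, seq_len, bytes_per_float = 6, 512, 200, 4
cache_bytes = n_layer * 2 * seq_len * n_embd * bytes_per_float
print(f"{cache_bytes / 2**20:.1f} MiB")  # ~4.7 MiB, matching the ~5MB estimate
```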
The `Polynomial_decomposition.ipynb` Jupyter notebook provides a comprehensive workflow demonstrating:
- Dataset generation using Python/SymPy
- Supervised learning with transformer models
- Evaluation with greedy and beam search
- Rank-aware GRPO reinforcement learning fine-tuning
This notebook serves as a complete tutorial for polynomial decomposition tasks.