This repository was archived by the owner on Jul 30, 2025. It is now read-only.

Description
I ran the commands below:
git clone https://huggingface.co/datasets/cerebras/SlimPajama-627B
python scripts/prepare_slimpajama.py --source_path /path/to/SlimPajama --tokenizer_path data/llama --destination_path data/slim_star_combined --split validation --percentage 1.0
I only got 257 validation files. Is that the expected number, or is there a mistake somewhere?
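
To double-check the count, the output files can be tallied with a short script. This is only a minimal sketch: the `count_files` helper is hypothetical (not part of the repository), and the `validation*` glob pattern is an assumption about how the output files are named, so adjust it to match what is actually in the destination directory.

```python
from pathlib import Path


def count_files(directory: str, pattern: str = "*") -> int:
    """Count regular files directly inside `directory` whose names match `pattern`."""
    return sum(1 for p in Path(directory).glob(pattern) if p.is_file())


if __name__ == "__main__":
    # Destination path taken from the command above.
    # The "validation*" pattern is an assumed naming scheme, not confirmed.
    print(count_files("data/slim_star_combined", "validation*"))
```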