
GPT-1-From-Scratch

Implementation of a 56M-parameter GPT-1 from scratch. Since the BookCorpus dataset (~1B tokens) used in the original paper is no longer publicly available, I instead pre-train GPT-1 on the WikiText-103 dataset (~103M tokens).

GPT-1 parameter values

Model dimension (d_model) = 512

Number of Attention Heads (n_heads) = 8

Number of decoder layers (num_decoder_layers) = 8

Maximum sequence length (max_len) = 128

Feedforward layer hidden size (dim_feedforward) = 2048

Vocabulary size (vocab_size) = 30000 for the WikiText-103 dataset

Batch size (batch_size) = 64


TOTAL PARAMETER COUNT ≈ 56M (a rough model sketch using these values is shown below)
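
The hyperparameters above correspond to a standard decoder-only Transformer. Below is a minimal, hypothetical sketch of such a model in PyTorch; it is not the actual GPT_Decoder.py, and the class name GPT1Sketch, the use of nn.TransformerEncoder blocks with a causal mask, and the learned positional embeddings are all assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical sketch only -- the real model is defined in GPT_Decoder.py and may differ.
D_MODEL, N_HEADS, N_LAYERS = 512, 8, 8
MAX_LEN, DIM_FF, VOCAB_SIZE = 128, 2048, 30000

class GPT1Sketch(nn.Module):
    def __init__(self):
        super().__init__()
        self.tok_emb = nn.Embedding(VOCAB_SIZE, D_MODEL)      # token embeddings
        self.pos_emb = nn.Embedding(MAX_LEN, D_MODEL)          # learned positions, as in GPT-1
        block = nn.TransformerEncoderLayer(
            d_model=D_MODEL, nhead=N_HEADS, dim_feedforward=DIM_FF,
            activation="gelu", batch_first=True)               # GPT-1 uses GELU activations
        # Decoder-only model: self-attention blocks plus a causal mask applied in forward().
        self.blocks = nn.TransformerEncoder(block, num_layers=N_LAYERS)
        self.lm_head = nn.Linear(D_MODEL, VOCAB_SIZE, bias=False)

    def forward(self, idx):                                    # idx: (batch, seq_len) of token ids
        seq_len = idx.size(1)
        pos = torch.arange(seq_len, device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        # Causal mask: position i may only attend to positions <= i.
        mask = torch.triu(torch.full((seq_len, seq_len), float("-inf"),
                                     device=idx.device), diagonal=1)
        x = self.blocks(x, mask=mask)
        return self.lm_head(x)                                 # logits: (batch, seq_len, vocab_size)
```

With these values the count roughly works out: token embeddings 30,000 × 512 ≈ 15.4M, an untied output head another ≈ 15.4M, and 8 blocks at ≈ 3.15M each ≈ 25.2M, for a total of ≈ 56M parameters.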

Steps to run

  1. Run input_processing.py to generate the tokenized WikiText-103 data and save it in .pt torch tensor format (hypothetical sketches of all three steps follow this list)

  2. Run main_pretrain.py to pre-train GPT-1. Training settings can be changed in this file, and model parameters in GPT_Decoder.py

  3. Run the last cell in test.ipynb to generate sample text from the pre-trained GPT-1
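
A hypothetical sketch of step 1, assuming the HuggingFace datasets and tokenizers libraries and a made-up output file name wikitext103_tokens.pt; the actual input_processing.py may build its tokenizer and tensor layout differently.

```python
# Hypothetical sketch of step 1 -- the real input_processing.py may differ.
import torch
from datasets import load_dataset
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

ds = load_dataset("wikitext", "wikitext-103-raw-v1", split="train")

# Train a 30k-vocab byte-level BPE tokenizer on the raw text.
tokenizer = Tokenizer(models.BPE(unk_token="<unk>"))
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel()
trainer = trainers.BpeTrainer(vocab_size=30000, special_tokens=["<unk>"])
tokenizer.train_from_iterator((row["text"] for row in ds), trainer=trainer)

# Encode the corpus into one long stream of token ids and save it as a torch tensor.
ids = []
for row in ds:
    ids.extend(tokenizer.encode(row["text"]).ids)
torch.save(torch.tensor(ids, dtype=torch.long), "wikitext103_tokens.pt")  # file name assumed
```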
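
Step 2, the pre-training loop, might look roughly like the following. The optimizer, learning rate, step count, and random-window batching are illustrative assumptions; the real settings live in main_pretrain.py. It reuses GPT1Sketch and the token file name from the sketches above.

```python
# Hypothetical sketch of step 2 -- actual training settings live in main_pretrain.py.
import torch
import torch.nn.functional as F

tokens = torch.load("wikitext103_tokens.pt")             # produced by the step-1 sketch
device = "cuda" if torch.cuda.is_available() else "cpu"
model = GPT1Sketch().to(device)                           # model sketch from above
opt = torch.optim.Adam(model.parameters(), lr=2.5e-4)     # max lr from the GPT-1 paper; warmup/decay omitted

BATCH_SIZE, MAX_LEN = 64, 128

def get_batch():
    # Sample BATCH_SIZE random windows of MAX_LEN + 1 tokens; targets are inputs shifted by one.
    ix = torch.randint(0, tokens.size(0) - MAX_LEN - 1, (BATCH_SIZE,)).tolist()
    x = torch.stack([tokens[i:i + MAX_LEN] for i in ix]).to(device)
    y = torch.stack([tokens[i + 1:i + MAX_LEN + 1] for i in ix]).to(device)
    return x, y

for step in range(10_000):                                # step count is illustrative
    x, y = get_batch()
    logits = model(x)
    loss = F.cross_entropy(logits.view(-1, logits.size(-1)), y.view(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 500 == 0:
        print(f"step {step}: loss {loss.item():.3f}")
```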
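
Step 3's generation cell could be approximated by a simple temperature-sampling loop like the one below; the prompt handling, sampling strategy, and decoding in the actual test.ipynb may differ.

```python
# Hypothetical sketch of step 3 -- the real notebook cell in test.ipynb may differ.
import torch

@torch.no_grad()
def generate(model, tokenizer, prompt, max_new_tokens=100, temperature=1.0, max_len=128):
    model.eval()
    device = next(model.parameters()).device
    idx = torch.tensor([tokenizer.encode(prompt).ids], dtype=torch.long, device=device)
    for _ in range(max_new_tokens):
        logits = model(idx[:, -max_len:])                  # crop context to the max sequence length
        probs = torch.softmax(logits[:, -1, :] / temperature, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)  # sample the next token id
        idx = torch.cat([idx, next_id], dim=1)
    return tokenizer.decode(idx[0].tolist())

# Example usage (prompt is arbitrary):
# print(generate(model, tokenizer, "The earliest known mention of"))
```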

Sample generation output: The earliest known mention of this date was that of 544 , when King Olaf II of Norway was discovered in the reign of King Olaf II of Norway . The earliest recorded mention of this date was from 544 , when King Olaf was assassinated . The date of the birth is unknown , but it is unclear whether Olaf was killed . Olaf 's birth date is unknown , but it is likely that Olaf was killed by the Vikings in 842 , but Olaf 's reign is uncertain . .

This can be improved by increasing the model size, but it is good enough for a 56M-parameter model.
