This repo contains code to extend or replicate the dataset used in the Kaggle Bengali.AI Handwritten Grapheme Classification competition. For the dataset, code, discussions and leaderboards, visit the Kaggle competition page. The paper describing the dataset, protocols and future directions can be found here or here.
.
- data
-- scanned
-- extracted
-- error
-- packed
- codes
- collection
-- A4
-- Letter
- logs
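The layout above can be recreated on a fresh checkout with a short script. This is a minimal sketch, assuming the tree is relative to the repository root and that `A4` and `Letter` are sub-folders of `collection` (read off the `-` / `--` markers). The helper name `make_layout` is hypothetical and not part of the repo.

```python
from pathlib import Path

# Folder names copied from the tree above; the nesting is an assumption
# based on the "-" / "--" markers.
LAYOUT = [
    "data/scanned",
    "data/extracted",
    "data/error",
    "data/packed",
    "codes",
    "collection/A4",
    "collection/Letter",
    "logs",
]


def make_layout(root="."):
    """Create any missing folders under root and return their paths."""
    created = []
    for rel in LAYOUT:
        path = Path(root) / rel
        # exist_ok=True leaves folders that are already present untouched.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `make_layout(".")` from the repository root only adds folders that are missing; it does not delete or empty anything.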
- Run `python ./data/extracted/purge.py` to clear the extraction folders.
- Download and extract a batch of scanned .jpg files to `./data/scanned/<batchname>`.
- `cd ./data/scanned` and run `python transcribeGui.py <batchname>`.
- After the Roll/ID values are transcribed, execute `extract.m` in MATLAB. Specify the `source` folder before executing. Replace `surfAlignGPU()` with `surfAlign()` in the absence of GPU support. Set `disp=true` for `ocrForm()`, `surfAlign()` and `surfAlignGPU()` to validate extraction performance. For `surfAlign()` set `nonrigid=true`.
- `cd ./data/error` and check for extraction failures.
- `cd ./data/extracted` and check for label errors in the sub-folders.
- Run `python pack.py`, which creates a separate folder for each extracted `<batchname>` inside `./data/packed`.
- `cd ./data/packed/` and run `python labelXGui.py <batchname>`. Select the `overwriting` and `empty blobs` samples to be discarded and press `Ctrl+S` to save. After you have gone through all of the packets, click the transfer button to remove the errors from the packaged folder.
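The scripted Python steps above can be collected into a small helper that lists, for a given batch, each command and the directory it is run from. This is only a sketch: the function names `python_steps` and `format_steps` are hypothetical, the commands are copied from the steps above, and running `pack.py` from the repository root is an assumption because the steps do not say where it is launched. The MATLAB step (`extract.m`) and the manual checks are left out because they are interactive.

```python
def python_steps(batchname):
    """Return (working_dir, argv) pairs for the scripted Python steps, in order."""
    return [
        # Clear the extraction folders before starting a new batch.
        (".", ["python", "./data/extracted/purge.py"]),
        # Transcribe Roll/ID for the scanned batch.
        ("./data/scanned", ["python", "transcribeGui.py", batchname]),
        # extract.m is run in MATLAB between these two steps (not scripted here).
        # Pack the extracted batches; the working directory is an assumption.
        (".", ["python", "pack.py"]),
        # Review the packets and mark samples to discard.
        ("./data/packed", ["python", "labelXGui.py", batchname]),
    ]


def format_steps(batchname):
    """Render the plan as shell-like lines for printing or logging."""
    return [
        "(cd {} && {})".format(cwd, " ".join(argv))
        for cwd, argv in python_steps(batchname)
    ]
```

For example, `print("\n".join(format_steps("batch01")))` prints the four commands for a batch named `batch01` (a placeholder name). Each pair could also be passed to `subprocess.run(argv, cwd=working_dir, check=True)` if the steps are to be launched from one script.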
- MATLAB 2017b or higher
- MATLAB Computer Vision Toolbox
- Python 3.6.3 or higher
- Pillow == 4.2.1
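The Python-side requirements above can be checked before running the scripts. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the Pillow pin is treated as an exact match, and the MATLAB requirements are not checked at all.

```python
import sys

MIN_PYTHON = (3, 6, 3)      # "Python 3.6.3 or higher"
PINNED_PILLOW = "4.2.1"     # "Pillow == 4.2.1"


def python_ok(version_info=None):
    """True if the given (or running) interpreter is at least MIN_PYTHON."""
    info = sys.version_info if version_info is None else version_info
    return tuple(info[:3]) >= MIN_PYTHON


def pillow_version():
    """Return the installed Pillow version string, or None if it is missing."""
    try:
        import PIL
    except ImportError:
        return None
    # Depending on the release, the version is exposed under one of these names.
    return getattr(PIL, "__version__", None) or getattr(PIL, "PILLOW_VERSION", None)


def check_requirements():
    """Return a list of human-readable problems; empty means all checks passed."""
    problems = []
    if not python_ok():
        problems.append("Python %d.%d.%d or higher is required" % MIN_PYTHON)
    found = pillow_version()
    if found != PINNED_PILLOW:
        problems.append("Pillow == %s expected, found %s" % (PINNED_PILLOW, found))
    return problems
```

An empty list from `check_requirements()` means both Python-side checks passed; any other result names the requirement that is not met.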
- Kaggle competition page: www.kaggle.com/c/bengaliai-cv19
- Dataset introduction: COCO-Grapheme