Home
Malaya (Ma-la-ya) is a Python library for Bahasa Malaysia Natural Language Processing.
Check the /example directory for a quick start.
    pip install scikit-learn==0.19.1 requests fuzzywuzzy tqdm
    pip install nltk unidecode numpy scipy python-levenshtein pandas xgboost==0.80
    python -m nltk.downloader punkt

If you want to use CPU,

    pip install tensorflow==1.5

If you want to use GPU,

    pip install tensorflow-gpu==1.5

Then install Malaya from source,

    git clone https://github.com/DevconX/Malaya && cd Malaya && python setup.py install

Easy, simply import on top of your code,

    import malaya

Malaya was trained on Python 3.6. It is expected to work on other Python 3.x versions below Python 3.7.
The deep learning models in Malaya were trained with CUDA 8.0 and TensorFlow 1.5. Newer versions of CUDA and TensorFlow should work as long as they support TensorFlow 1.5 features.
Malaya depends on scikit-learn 0.19.1; later versions are not recommended.
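The version constraints above can be checked before installing. The following is a minimal sketch, not part of Malaya itself: the names `python_supported`, `matches_pin` and `find_mismatches` are hypothetical helpers introduced here for illustration, and the pins are simply copied from the install commands on this page.

```python
import sys

# Constraints copied from this page (assumption: these are the only pins that matter).
MAX_PYTHON_EXCLUSIVE = (3, 7)
PINNED = {"scikit-learn": "0.19.1", "xgboost": "0.80", "tensorflow": "1.5"}


def python_supported(version=None):
    """Return True for Python 3.x releases below 3.7."""
    major, minor = (version or sys.version_info)[:2]
    return (3, 0) <= (major, minor) < MAX_PYTHON_EXCLUSIVE


def matches_pin(installed, pinned):
    """Return True if `installed` starts with the same components as `pinned`.

    For example, "1.5.0" matches the pin "1.5", but "0.20.0" does not match "0.19.1".
    """
    pinned_parts = pinned.split(".")
    return installed.split(".")[:len(pinned_parts)] == pinned_parts


def find_mismatches(installed_versions):
    """Return {package: (installed, pinned)} for every package that misses its pin.

    `installed_versions` maps package names to version strings; packages
    that are absent from the mapping are skipped rather than reported.
    """
    return {
        name: (installed_versions[name], pin)
        for name, pin in PINNED.items()
        if name in installed_versions and not matches_pin(installed_versions[name], pin)
    }


if __name__ == "__main__":
    print("Python supported:", python_supported())
    # Example input; in practice these strings could come from e.g. sklearn.__version__.
    print(find_mismatches({"scikit-learn": "0.20.0", "tensorflow": "1.5.0"}))
```

The functions take plain version strings so that the check does not depend on how the installed versions are discovered.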
We are also looking to expand the Malaya team, and we are open to any contribution or donation!
Your level of programming skill does not matter; anyone can help improve Malaya.
Most of the data was gathered by crawlers run against targeted Malaysian websites. I am not aware of any data protection issues.