HUSEIN ZOLKEPLI edited this page Sep 13, 2018 · 20 revisions

Welcome to the Malaya wiki!

Malaya (Ma-la-ya) is a Python library for Bahasa Malaysia Natural Language Processing.

Getting Started

See the /example directory for a quick start.

Installation

1. Install dependencies

pip install scikit-learn==0.19.1 requests fuzzywuzzy tqdm
pip install nltk unidecode numpy scipy python-levenshtein pandas xgboost==0.80
python -m nltk.downloader punkt

To run on CPU:

pip install tensorflow==1.5

To run on GPU:

pip install tensorflow-gpu==1.5

2. Clone and install

git clone https://github.com/DevconX/Malaya && cd Malaya && python setup.py install

How-to-start

Import the package at the top of your code:

import malaya
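
Before importing, it can help to confirm that the installation step actually put the package on the import path. The snippet below is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical and is not part of Malaya.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be located on the import path."""
    return importlib.util.find_spec(package_name) is not None


# Report whether the package from the installation steps is visible to Python.
if is_installed("malaya"):
    print("malaya: found")
else:
    print("malaya: not found; re-run the installation steps")
```

If the package is reported as not found, the most likely cause is that `python setup.py install` ran under a different interpreter than the one executing your code.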

Supported environment

Malaya was trained on Python 3.6. It is expected to work on other Python 3.x versions below Python 3.7.

The deep learning models in Malaya were trained on CUDA 8.0 and TensorFlow 1.5. Newer versions of CUDA and TensorFlow are expected to work as long as they support TensorFlow 1.5 features.

Malaya depends on scikit-learn 0.19.1; newer versions are not recommended.
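
These constraints can be checked at start-up. The following is a minimal sketch, assuming only the bounds stated in this section (Python 3.x below 3.7, scikit-learn pinned to 0.19.1). The function names `python_supported` and `sklearn_recommended` are hypothetical and are not part of Malaya.

```python
import sys


def python_supported(version_info=None):
    """Return True for Python 3.x releases below 3.7, the range stated above."""
    major, minor = (version_info or sys.version_info)[:2]
    return major == 3 and minor < 7


def sklearn_recommended(version):
    """Return True only for the scikit-learn release named in this section."""
    return version == "0.19.1"


# Warn, rather than fail, when the running interpreter is outside the range.
if not python_supported():
    print("Warning: Python %d.%d is outside the range Malaya was trained on"
          % sys.version_info[:2])
```

To check the installed scikit-learn, pass `sklearn.__version__` to `sklearn_recommended`. The sketch warns instead of raising an error because the section only advises against other versions and does not say they fail.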

Contribution

We are also trying to expand the Malaya team, and we are open to any contribution or donation!

Your level of programming skill does not matter; anyone can help improve Malaya.

Disclaimer

Most of the data was gathered by crawlers run against targeted Malaysian websites. I am not aware of any data protection restrictions that apply to it.

