This is a web crawler built in 2022 to collect and prepare a dataset for developing a content-based recommendation system. A content-based recommendation system simply tries to find the products or items most similar to those the client has viewed.
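To make the idea of "most similar items" concrete, here is a minimal sketch of one common way to score similarity: bag-of-words vectors compared with cosine similarity. It is an illustration only, not the code in this repository; the function names (`vectorize`, `cosine_similarity`, `most_similar`) and the tiny catalog are made up for the example.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn a description into a bag-of-words term-count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty description is similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(viewed_title, catalog, top_n=2):
    """Rank every other item by similarity to the viewed item."""
    target = vectorize(catalog[viewed_title])
    scores = [
        (title, cosine_similarity(target, vectorize(text)))
        for title, text in catalog.items()
        if title != viewed_title
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


# Made-up example data, not taken from the crawled dataset.
catalog = {
    "Movie A": "space adventure with a robot crew",
    "Movie B": "robot crew lost in deep space",
    "Movie C": "romantic comedy set in paris",
}

if __name__ == "__main__":
    for title, score in most_similar("Movie A", catalog):
        print(title, round(score, 3))
```

A real system would likely weight terms (for example with TF-IDF) and use richer attributes than raw descriptions, but the ranking step has the same shape.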
I developed the code on my own while I was a student, so much of it is poorly formatted. It contains a lot of hard-coded values, and most of it does not follow OOP principles very well. After working for a while, I decided to refactor the code to make it more readable. I also want to clean the resulting dataset further so that it can help build a better content-based recommendation system. Some other attributes may be added to the dataset using techniques I have learned since, such as keywords extracted from movie reviews with NLP.
The dataset will be uploaded to Kaggle, and you are welcome to use it.