GitHub - DFanny-5/web-crawler-for-content-based-recommendiation-system: This web crawler creates a dataset containing all the movies and TV shows available on Canadian Netflix. It tries to include as much metadata as possible by combining the information available from IMDB and TVDB for each movie or TV show.

Project Background

This is a web crawler made in 2022 to collect and prepare a dataset for developing a content-based recommendation system. A content-based recommendation system simply tries to find the products or items most similar to those the client has already viewed.
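To make the idea of "most similar items" concrete, here is a minimal sketch that ranks titles by cosine similarity over their metadata tags. It is only an illustration, not the method used in this repository: the `catalog` contents, the title names, and the helper names `cosine_similarity` and `most_similar` are all hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse tag-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(viewed, catalog, top_n=3):
    """Rank every other title in the catalog by similarity to the viewed one."""
    target = Counter(catalog[viewed])
    scores = [
        (title, cosine_similarity(target, Counter(tags)))
        for title, tags in catalog.items()
        if title != viewed
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


# Hypothetical metadata; a real dataset would hold genres, cast, keywords, etc.
catalog = {
    "Title A": ["sci-fi", "thriller", "space"],
    "Title B": ["sci-fi", "space", "drama"],
    "Title C": ["comedy", "romance"],
}

ranked = most_similar("Title A", catalog, top_n=2)
print(ranked)
```

With this toy catalog, "Title B" ranks above "Title C" because it shares two of its three tags with the viewed title, while "Title C" shares none.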

I developed the code on my own while I was a student, so much of it is poorly formatted, relies heavily on hard-coded values, and does not follow OOP concepts very well. After working for a while, I decided to refactor the code to make it more readable. I also want to further clean the resulting dataset so that it can help build a better content-based recommendation system. Other attributes may be added to the dataset using techniques I have learned since, such as keywords extracted from movie reviews with NLP.
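As a rough illustration of what extracting keywords from reviews could look like, the sketch below scores words with a plain TF-IDF weighting using only the standard library. This is an assumption about one possible approach, not the planned implementation: the stopword list, the sample reviews, and the function names `tokenize` and `top_keywords` are hypothetical.

```python
import math
import re
from collections import Counter

# A tiny, hypothetical stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "and", "of", "is", "in", "to", "it", "was", "with"}


def tokenize(text):
    """Lowercase the text and keep alphabetic words that are not stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def top_keywords(reviews, doc_index, k=3):
    """Return the k highest TF-IDF terms of one review within a review set."""
    docs = [tokenize(r) for r in reviews]
    doc = docs[doc_index]
    if not doc:
        return []
    n_docs = len(docs)
    # Document frequency: in how many reviews each term appears.
    df = Counter(term for d in docs for term in set(d))
    tf = Counter(doc)
    scores = {
        term: (count / len(doc)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
        for term, count in tf.items()
    }
    # Sort by descending score, breaking ties alphabetically for stable output.
    ordered = sorted(scores, key=lambda t: (-scores[t], t))
    return ordered[:k]


# Made-up reviews used only to exercise the function.
reviews = [
    "A tense heist thriller with a clever heist plan",
    "A gentle romance in a quiet town",
    "The thriller was slow and the romance was flat",
]

keywords = top_keywords(reviews, 0, k=3)
print(keywords)
```

Words that are frequent in one review but rare across the others score highest, which is why a term shared by several reviews is pushed down the ranking.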

The dataset will be uploaded to Kaggle, and you are welcome to use it.

Folders and files (9 commits)

.idea
.gitignore
IMDB_API.py
README.md
crawel_netflix_tv_or_movie.py
models.py
read_first.txt
requirements.txt
try2.py
