This project is a simple Python web scraping script that extracts data from the Wikipedia page on highest-grossing Japanese films. The script collects movie data such as title, gross revenue, and release year, and stores it in a pandas DataFrame.
- Scrapes movie data from Wikipedia using `requests` and `BeautifulSoup`.
- Extracts table headers and rows into a clean `pandas` DataFrame.
- Drops unnecessary columns (e.g., notes or references).
- Optional: Save the final DataFrame to a `.csv` file for analysis or visualization.
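A minimal sketch of the column-dropping and optional CSV steps follows. The column names (`Notes`, `Ref`, `Ref.`), the output file `japanese_films.csv`, and the helpers `drop_unwanted` and `save_csv` are hypothetical; the real table's headers may differ.

```python
from typing import Iterable, Optional

import pandas as pd

# Hypothetical column names for notes/references; the real headers may differ.
UNWANTED = ("Notes", "Ref", "Ref.")


def drop_unwanted(df: pd.DataFrame, unwanted: Iterable[str] = UNWANTED) -> pd.DataFrame:
    """Drop the listed columns; names that are not present are ignored."""
    return df.drop(columns=list(unwanted), errors="ignore")


def save_csv(df: pd.DataFrame, path: Optional[str] = "japanese_films.csv") -> Optional[str]:
    """Write df as CSV without the index column.

    With path=None, pandas returns the CSV text instead of writing a file.
    """
    return df.to_csv(path, index=False)


# Illustrative frame with made-up values, not real box-office data.
sample = pd.DataFrame({"Title": ["Film A"], "Year": [2020], "Ref": ["[1]"]})
trimmed = drop_unwanted(sample)
print(save_csv(trimmed, None))
```

Dropping by name with `errors="ignore"` is less brittle than dropping by position if the page layout changes. However, it only works when the column names are known in advance.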
How it works:

- Sends a GET request to the Wikipedia page using a custom `User-Agent` header.
- Parses the page with `BeautifulSoup` and locates the first HTML `<table>`.
- Extracts column headers and row values.
- Cleans the data by dropping the last column.
- Saves or prints the result.
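The steps above can be sketched as follows. This is a minimal sketch, not the project's actual script. The URL and `User-Agent` string are assumptions, and the function names `fetch_html`, `table_to_dataframe`, and `clean` are hypothetical.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Assumption: the README does not give the exact URL; this is the likely page.
URL = "https://en.wikipedia.org/wiki/List_of_highest-grossing_Japanese_films"
# Assumption: a placeholder descriptive User-Agent string.
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; film-scraper-example/0.1)"}


def fetch_html(url: str = URL) -> str:
    """Send a GET request with a custom User-Agent and return the page HTML."""
    import requests  # imported here so the parsing helpers work without it

    response = requests.get(url, headers=HEADERS, timeout=30)
    response.raise_for_status()
    return response.text


def table_to_dataframe(html: str) -> pd.DataFrame:
    """Parse the first <table> in the HTML into a DataFrame."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    rows = table.find_all("tr") if table is not None else []
    if not rows:
        raise ValueError("no <table> with rows found in the page")

    # Column headers come from the <th> cells of the first row.
    headers = [th.get_text(strip=True) for th in rows[0].find_all("th")]
    data = []
    for row in rows[1:]:
        # Wikipedia often marks the title cell as <th>, so collect both kinds.
        cells = [c.get_text(strip=True) for c in row.find_all(["td", "th"])]
        # Simplification: skip rows shortened by rowspan/colspan.
        if len(cells) == len(headers):
            data.append(cells)
    return pd.DataFrame(data, columns=headers)


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop the last column (e.g. notes or references)."""
    return df.iloc[:, :-1]


# Usage (needs network access):
# films = clean(table_to_dataframe(fetch_html()))
# print(films.head())
```

Wikipedia tables often use `rowspan`/`colspan`, which leaves some rows with fewer cells. This sketch simply skips such rows. `pandas.read_html` is an alternative that expands spanned cells, but it needs an extra parser such as `lxml`.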
Install the required packages:

```bash
pip install requests beautifulsoup4 pandas
```
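To confirm the installation, a small check like the following reports the installed version of each package or flags it as missing. It is a sketch that uses only the standard library's `importlib.metadata` (Python 3.8+). The function name `installed_versions` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

# Distribution names as passed to pip in the install command.
PACKAGES = ("requests", "beautifulsoup4", "pandas")


def installed_versions(packages: Iterable[str] = PACKAGES) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```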