devansh-DR/AnimeList_pandas

🎬 Highest-Grossing Japanese Films Web Scraper

This project is a simple Python web scraping script that extracts data from the Wikipedia page on highest-grossing Japanese films. The script collects movie-related data such as titles, gross revenue, release year, and more, and stores it in a structured format using pandas.


📌 Features

  • ✅ Scrapes movie data from Wikipedia using requests and BeautifulSoup.
  • ✅ Extracts table headers and rows into a clean pandas DataFrame.
  • ✅ Drops unnecessary columns (e.g., notes or references).
  • ✅ Optional: Saves the final DataFrame to a .csv file for analysis or visualization.
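
The optional CSV export might look roughly like the sketch below. The helper name `save_to_csv`, the output file name, and the sample rows are hypothetical placeholders, not taken from the script itself; only `DataFrame.to_csv` is standard pandas.

```python
import pandas as pd


def save_to_csv(df: pd.DataFrame, path: str) -> str:
    """Write the DataFrame to a CSV file and return the path (hypothetical helper)."""
    # index=False keeps the pandas row index out of the file.
    df.to_csv(path, index=False, encoding="utf-8")
    return path


# Placeholder rows for illustration only; not real scraped data.
sample = pd.DataFrame({"Title": ["Film A", "Film B"], "Year": [2001, 2016]})
```

Calling `save_to_csv(sample, "japanese_films.csv")` would write a two-row file that can be reloaded later with `pd.read_csv`.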

🧠 How It Works

  1. Sends a GET request to the Wikipedia page using a custom User-Agent.
  2. Parses the page with BeautifulSoup and locates the first HTML <table>.
  3. Extracts column headers and row values.
  4. Cleans the data by dropping the last column.
  5. Saves or prints the result.
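
The steps above can be sketched as follows. This is a minimal illustration, not the repository's actual script: the URL, the User-Agent string, and the function names `fetch_html` and `table_to_dataframe` are assumptions, and rows whose cell count does not match the header (for example, because of merged cells) are simply skipped.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Assumed address of the Wikipedia page; the README does not state it.
URL = "https://en.wikipedia.org/wiki/List_of_highest-grossing_Japanese_films"


def fetch_html(url: str = URL) -> str:
    """Step 1: download the page with a custom User-Agent (value is illustrative)."""
    import requests  # imported here so the parsing helper below works offline

    headers = {"User-Agent": "Mozilla/5.0 (compatible; example-scraper/0.1)"}
    response = requests.get(url, headers=headers, timeout=30)
    response.raise_for_status()
    return response.text


def table_to_dataframe(html: str) -> pd.DataFrame:
    """Steps 2-4: parse the first <table>, read headers and rows, drop the last column."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        raise ValueError("no <table> element found in the page")

    rows = table.find_all("tr")
    columns = [th.get_text(strip=True) for th in rows[0].find_all("th")]

    records = []
    for row in rows[1:]:
        cells = [cell.get_text(strip=True) for cell in row.find_all(["th", "td"])]
        # Keep only rows that line up with the header; others are skipped.
        if len(cells) == len(columns):
            records.append(cells)

    df = pd.DataFrame(records, columns=columns)
    # Drop the last column (e.g. notes or references).
    return df.iloc[:, :-1]
```

Under these assumptions, step 5 would be `print(table_to_dataframe(fetch_html()))`. Keeping the download separate from the parsing lets the parsing logic be tried on a saved HTML file without any network access.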

πŸ› οΈ Requirements

Install the required packages:

pip install requests beautifulsoup4 pandas
