
Web Scraping Project on MyAnimeList

Scraping a list of top anime using BeautifulSoup and Requests

Website URL: https://myanimelist.net/


What exactly is web scraping?

The web contains a huge amount of data, and the technique of extracting data from this pool of information is called web scraping. In other words, web scraping is the process of writing bots that extract content and data from a website so that it can later be used for other purposes, such as analysis. Unlike screen scraping, which only copies the pixels displayed on screen, a web scraper extracts the underlying HTML code and, with it, the data stored in the site's database.

Python is a great language for scraping. For this project, I have used two Python libraries: Requests and Beautiful Soup.

  • Requests downloads the HTML code for a selected URL.
  • BeautifulSoup extracts data from that HTML code.

There is obviously more than one library for scraping a web page, but for one-off scripts that you don't plan to maintain in the long run, these two are a simple and practical choice.
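The division of labour between the two libraries can be sketched as follows. This is a minimal example, not the project's own code: the helper names `download_html` and `link_texts` are hypothetical. The demo parses a static HTML snippet, so it makes no network request.

```python
import requests
from bs4 import BeautifulSoup


def download_html(url):
    """Requests' job: fetch the raw HTML of a page as a string."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # stop early on 4xx/5xx responses
    return response.text


def link_texts(html):
    """Beautiful Soup's job: turn HTML into a tree and pull data out of it."""
    soup = BeautifulSoup(html, "html.parser")
    return [a.get_text(strip=True) for a in soup.find_all("a")]


# Offline demo: parse a static snippet instead of a live download.
snippet = '<ul><li><a href="/a">First</a></li><li><a href="/b"> Second </a></li></ul>'
print(link_texts(snippet))  # → ['First', 'Second']
```

Against the live site, the two steps are chained: `link_texts(download_html("https://myanimelist.net/topanime.php"))`.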

In this project, I have scraped a very interesting website called 'MyAnimeList'. I also call it the 'Reddit for Anime', since it contains everything you need to know to get started watching anime.


I have divided this project into two parts:

  • First:
    I have scraped the top anime across all genres.
    The page https://myanimelist.net/topanime.php lists the top 50 anime, along with other information; further entries continue on subsequent pages. From this list we'll pick the title, rank, image_url, release_date, and number of episodes of each anime.

  • Second:
    I have scraped genre-specific trending anime. The page https://myanimelist.net/anime.php provides a list of genres to choose from (e.g. Adventure, Action, Drama, Cars, Fantasy), and the same code scrapes the anime titles and other information under any specific genre.

  • Next:
    Finally, I will store the scraped information for each part in a pandas DataFrame.
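Both parts start from a set of page URLs. The sketch below shows one way to build them. It rests on assumptions that are not confirmed by this project: that topanime.php pages through the ranking with a `?limit=<offset>` parameter in steps of 50, and that genre links on anime.php point to paths containing `/anime/genre/`. The sample HTML is a hypothetical stand-in for the genre list, and the real link text may carry extra details such as a title count. The names `top_anime_urls` and `genre_links` are illustrative.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

SITE_URL = "https://myanimelist.net"


def top_anime_urls(pages, per_page=50):
    # Assumption: topanime.php pages through the ranking with ?limit=<offset>
    return [f"{SITE_URL}/topanime.php?limit={i * per_page}" for i in range(pages)]


def genre_links(html):
    """Map each genre name to an absolute URL, from links under /anime/genre/."""
    soup = BeautifulSoup(html, "html.parser")
    return {
        a.get_text(strip=True): urljoin(SITE_URL, a["href"])
        for a in soup.select('a[href*="/anime/genre/"]')
    }


# Hypothetical fragment standing in for the genre list on anime.php
sample = '<a href="/anime/genre/1/Action">Action</a><a href="/anime/genre/8/Drama">Drama</a>'
print(top_anime_urls(2))
print(genre_links(sample))
```

Because every genre page is reached through the same kind of link, one parsing routine can be reused for any genre, which is what lets Part II run "the same code" over different genres.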

Here's an outline of the steps we'll follow:

  • Download the web page using Requests.
  • Parse the HTML source code using Beautiful Soup.
  • Extract the titles, ratings, rankings, and URLs from the page.
  • Compile the extracted information into Python lists and dictionaries.
  • Extract information from multiple pages.
  • Save the extracted information to a CSV file.
  • Repeat these steps for Part I and Part II.
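The parse, extract, compile and save steps can be sketched on a single ranking row. The markup below is a hypothetical, simplified stand-in for MyAnimeList's ranking table. Its class names (`ranking-list`, `rank`, `information`), the `data-src` image attribute and the "TV (12 eps)" episode format are all assumptions, so inspect the live page before relying on them. The title and image URL are placeholders, not real data.

```python
import re

import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical, simplified markup for one row of the ranking table.
SAMPLE_HTML = """
<table>
  <tr class="ranking-list">
    <td class="rank"><span>1</span></td>
    <td class="title">
      <img data-src="https://example.com/cover.jpg" alt="cover">
      <h3><a href="https://myanimelist.net/anime/1">Example Anime</a></h3>
      <div class="information">TV (12 eps)<br>Apr 2020 - Jun 2020</div>
    </td>
  </tr>
</table>
"""


def parse_top_anime(html):
    """Return one dict per ranking row with the fields picked in Part I."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for row in soup.select("tr.ranking-list"):
        link = row.select_one("h3 a")
        # stripped_strings yields the text pieces separated by the <br> tags
        info = list(row.select_one("div.information").stripped_strings)
        eps = re.search(r"\((\d+) eps\)", info[0])  # None if the count is unknown
        records.append({
            "rank": int(row.select_one("td.rank").get_text(strip=True)),
            "title": link.get_text(strip=True),
            "url": link["href"],
            "image_url": row.select_one("img")["data-src"],
            "episodes": int(eps.group(1)) if eps else None,
            "release_date": info[1],
        })
    return records


df = pd.DataFrame(parse_top_anime(SAMPLE_HTML))
# to_csv with no path returns the CSV text; pass a filename to write a file
print(df.to_csv(index=False))
```

Collecting rows as a list of dictionaries and building the DataFrame once at the end keeps the scraping logic independent of pandas. Records from several pages can simply be concatenated into one list first.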

Let's begin:

Importing Necessary Libraries

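import requests                        # download web pages over HTTP
from bs4 import BeautifulSoup          # parse the downloaded HTML
import pandas as pd                    # store the scraped results in DataFrames
import jovian                          # only used to save this notebook to Jovian; not needed for scraping
site_url = 'https://myanimelist.net'   # base URL of the site being scraped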
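With the libraries imported, the "download multiple pages" step can be wrapped in a small helper. This is a sketch under assumptions, not the project's own code. The `User-Agent` string and the one-second delay are illustrative politeness choices, and `fetch_soups` is a hypothetical name. Check the site's robots.txt and terms before scraping. Passing in a `session` lets the helper reuse one connection pool, and it can also be swapped for a stub when testing offline.

```python
import time

import requests
from bs4 import BeautifulSoup

site_url = "https://myanimelist.net"

# Assumption: an identifying User-Agent and a pause between requests keep the
# scraper polite; the exact values here are illustrative.
HEADERS = {"User-Agent": "anime-scraper-tutorial/0.1"}


def fetch_soups(paths, delay=1.0, session=None):
    """Download each path under site_url and return one BeautifulSoup per page."""
    session = session or requests.Session()  # reuse one connection pool
    soups = []
    for i, path in enumerate(paths):
        if i:  # pause between requests, not before the first one
            time.sleep(delay)
        response = session.get(site_url + path, headers=HEADERS, timeout=10)
        response.raise_for_status()  # raise requests.HTTPError on 4xx/5xx
        soups.append(BeautifulSoup(response.text, "html.parser"))
    return soups
```

For example, `fetch_soups(["/topanime.php", "/topanime.php?limit=50"])` would download two ranking pages, assuming the `limit` parameter paginates as expected.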