Project1 Web Scraping Mcnutrition
Organizing the Menu of McDonald's Japan for a Nutrition Analysis
In this project we will collect and organize information on all products available at McDonald's Japan as of April 2021. Our objective is to build a table that lists each product's name, price, size and nutritional information.
To achieve this we will use web scraping, the process of using a program (a "bot") to automatically extract content and data from a website. The nutritional information will be scraped from the company's nutrition page: https://www.mcdonalds.co.jp/en/quality/allergy_Nutrition/nutrient/
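A minimal sketch of the download-and-parse step follows. It assumes the page is served as ordinary HTML; if the table were rendered by JavaScript, 'requests' alone would not see it. The function names get_soup and parse_html and the User-Agent string are illustrative choices, not part of any library:

```python
import requests
from bs4 import BeautifulSoup

# Nutrition page quoted in the text above
NUTRITION_URL = "https://www.mcdonalds.co.jp/en/quality/allergy_Nutrition/nutrient/"

# Assumed header: some sites reject requests without a browser-like User-Agent
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; mcnutrition-scraper)"}


def parse_html(html: str) -> BeautifulSoup:
    """Turn raw HTML text into a searchable BeautifulSoup tree."""
    # "html.parser" ships with Python, so no extra parser package is needed
    return BeautifulSoup(html, "html.parser")


def get_soup(url: str, timeout: float = 10.0) -> BeautifulSoup:
    """Download a page with requests and return its parsed HTML."""
    response = requests.get(url, headers=HEADERS, timeout=timeout)
    # Fail loudly on 4xx/5xx responses instead of parsing an error page
    response.raise_for_status()
    return parse_html(response.text)
```

With network access, soup = get_soup(NUTRITION_URL) would return the parsed nutrition page. Keeping parse_html separate lets the parsing logic be tried offline on saved HTML.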
As we can see below, the nutrition data is already provided in a table format:
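Because the data is already in a table, one way to extract it is to read every row's cells into lists. The sketch below assumes a plain <table> whose first row holds the column names. The real page's markup may differ (for example, the data could be split across several tables), so the selectors would need checking against the actual HTML. table_to_rows is a hypothetical helper name, and the sample table is a placeholder:

```python
from typing import List, Tuple

from bs4 import BeautifulSoup


def table_to_rows(soup: BeautifulSoup) -> Tuple[List[str], List[List[str]]]:
    """Return (header, rows) read from the first <table> in the page.

    Assumes the first <tr> holds the column names; every later <tr>
    with at least one cell becomes one data row.
    """
    table = soup.find("table")
    if table is None:
        raise ValueError("no <table> element found on the page")
    trs = table.find_all("tr")
    if not trs:
        raise ValueError("the table has no rows")
    # get_text(strip=True) drops surrounding whitespace and newlines in each cell
    header = [cell.get_text(strip=True) for cell in trs[0].find_all(["th", "td"])]
    rows = []
    for tr in trs[1:]:
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:  # skip empty spacer rows
            rows.append(cells)
    return header, rows


# Tiny hypothetical table standing in for the real page
sample_html = """
<table>
  <tr><th>Product</th><th>Energy (kcal)</th></tr>
  <tr><td>Item A</td><td>100</td></tr>
  <tr><td>Item B</td><td>200</td></tr>
</table>
"""
header, rows = table_to_rows(BeautifulSoup(sample_html, "html.parser"))
print(header)  # ['Product', 'Energy (kcal)']
print(rows)    # [['Item A', '100'], ['Item B', '200']]
```

On the real site, the same function would be applied to the parsed nutrition page instead of sample_html.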
Our goal is to collect all of this data and reorganize it in a pandas DataFrame, adding secondary information such as each product's price and size. This secondary information is available on each product's own page. Here is an example for the Shrimp Filet-O:
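Once the nutrition table and the per-product details (price, size) have been scraped, they can be combined with a pandas merge on the product name. The column names and values below are hypothetical placeholders, not real McDonald's Japan data:

```python
import pandas as pd

# Hypothetical placeholder values, NOT real McDonald's Japan data
nutrition = pd.DataFrame({
    "product": ["Item A", "Item B", "Item C"],
    "energy_kcal": [100, 200, 300],
})
details = pd.DataFrame({
    "product": ["Item A", "Item B"],
    "price_yen": [150, 250],
    "size": ["S", "M"],
})

# A left join keeps every product in the nutrition table; products whose
# detail page was not scraped get NaN for price and size.
# validate="one_to_one" raises a MergeError if a product name is duplicated.
menu = nutrition.merge(details, on="product", how="left", validate="one_to_one")
print(menu)
```

Merging on the product name assumes the name is spelled identically on the nutrition page and on each product page; if it is not, the names would need to be normalized first.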
Outline of the Project:
- Download the necessary pages using the 'requests' library;
- Parse the HTML code of these pages using BeautifulSoup;
- Identify the HTML elements that contain the information of interest;
- Store the extracted data in lists and then organize them in a DataFrame;
- Save the final data as a CSV file.
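The last two steps of the outline could look like the sketch below: the lists become a DataFrame, selected columns are converted from text to numbers, and the result is written to CSV. rows_to_csv, the column names, the sample rows and the output file name are hypothetical; stripping thousands separators is a defensive assumption about how numbers might be formatted:

```python
import os
import tempfile
from typing import Iterable, List

import pandas as pd


def rows_to_csv(header: List[str], rows: List[List[str]], path: str,
                numeric_cols: Iterable[str] = ()) -> pd.DataFrame:
    """Organize scraped lists in a DataFrame and save it as a CSV file."""
    df = pd.DataFrame(rows, columns=header)
    for col in numeric_cols:
        # Scraped cells are text: drop thousands separators such as "1,200",
        # then convert; anything non-numeric (e.g. "-") becomes NaN.
        cleaned = df[col].str.replace(",", "", regex=False)
        df[col] = pd.to_numeric(cleaned, errors="coerce")
    df.to_csv(path, index=False)
    return df


# Hypothetical scraped lists and output path
header = ["product", "energy_kcal", "price_yen"]
rows = [["Item A", "1,200", "150"], ["Item B", "-", "250"]]
out_path = os.path.join(tempfile.gettempdir(), "mcnutrition.csv")

menu = rows_to_csv(header, rows, out_path, numeric_cols=["energy_kcal", "price_yen"])
print(pd.read_csv(out_path))
```

errors="coerce" trades strictness for robustness: a malformed value becomes NaN rather than stopping the whole scrape, so it is worth checking the NaN count afterwards.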
After finishing the web scraping process we expect to end up with a CSV file that contains a table similar to the one below:
How to run the code:
You can execute this notebook using the "Run" button at the top of the page.
You can also make changes and save your own version of the notebook to Jovian by executing the following cells:
!pip install jovian --upgrade --quiet
import jovian
jovian.commit(project="project1-mcnutrition_correct")
[jovian] Attempting to save notebook..
[jovian] Updating notebook "matcha-coding/project1-web-scraping-mcnutrition" on https://jovian.ai
[jovian] Uploading notebook..
[jovian] Uploading additional files...
[jovian] Committed successfully! https://jovian.ai/matcha-coding/project1-web-scraping-mcnutrition
Before starting, we install and import the libraries used in this project:
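!pip install beautifulsoup4 --upgrade --quiet
import requests  # downloads the web pages
import pandas as pd  # organizes the scraped data in a DataFrame
from bs4 import BeautifulSoup  # parses the downloaded HTML
import os  # operating-system and file-path utilities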