Web Scraping Wiki Country Demographics Final

Scraping the Country Demographic Information on Wikipedia using Python

Wikipedia is a free-content online encyclopedia project that aims to create a world in which everyone can freely share in the sum of all knowledge. It is written collaboratively by largely anonymous volunteers. Anyone with Internet access can write and edit Wikipedia articles, except in limited cases where editing is restricted to prevent disruption or vandalism.

The page https://en.wikipedia.org/wiki/Category:Demographics_by_country lists Wikipedia's demographics articles by country. In this project, we will retrieve information from this page using web scraping: the process of extracting information from a website in an automated fashion using code.

We'll use the Python libraries Requests and Beautiful Soup (the beautifulsoup4 package) to scrape data from this page. We'll also use pandas and the built-in os module.

Project Outline

Here's an outline of the steps we'll follow:

  1. Download the page using requests
  2. Parse the HTML source using BeautifulSoup4
  3. Extract country names and country URLs from the main page
  4. Compile extracted information into Python lists and dictionaries
  5. Extract and combine data from multiple pages
  6. Save the extracted information to a CSV file
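
Steps 3 and 4 above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it uses the standard library's html.parser in place of BeautifulSoup so that it runs without any installs, and it parses a small hypothetical HTML snippet instead of the downloaded page. The names DemographicsLinkParser and extract_country_urls are made up for this example, and the assumption that the relevant links start with /wiki/Demographics_of_ is inferred from the sample URLs in this document.

```python
from html.parser import HTMLParser
from urllib.parse import unquote, urljoin

BASE_URL = "https://en.wikipedia.org"
# Assumption: country pages are linked as /wiki/Demographics_of_<Country>
PREFIX = "/wiki/Demographics_of_"


class DemographicsLinkParser(HTMLParser):
    """Collect hrefs of <a> tags that point to 'Demographics of ...' pages."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        # Keep only matching links, skipping duplicates
        if href.startswith(PREFIX) and href not in self.hrefs:
            self.hrefs.append(href)


def extract_country_urls(html):
    """Return a dict mapping country name -> absolute demographics URL."""
    parser = DemographicsLinkParser()
    parser.feed(html)
    country_urls = {}
    for href in parser.hrefs:
        # Derive the country name from the URL slug
        name = unquote(href[len(PREFIX):]).replace("_", " ")
        country_urls[name] = urljoin(BASE_URL, href)
    return country_urls


# Hypothetical snippet standing in for the downloaded category page
sample_html = """
<ul>
  <li><a href="/wiki/Demographics_of_Afghanistan">Demographics of Afghanistan</a></li>
  <li><a href="/wiki/Demographics_of_India">Demographics of India</a></li>
  <li><a href="/wiki/Help:Category">Help</a></li>
</ul>
"""

country_urls = extract_country_urls(sample_html)
print(country_urls)
```

Deriving the name from the URL slug is a simplification: a slug such as Demographics_of_the_United_States would yield "the United States", so real code would likely need some cleanup of the names.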

By the end of the project, we will create a CSV file with the following format:

Country Name, Country Demographic URL
Afghanistan, https://en.wikipedia.org/wiki/Demographics_of_Afghanistan
India, https://en.wikipedia.org/wiki/Demographics_of_India
  • For each country, we will use the demographic URL to extract demographic information such as population, density, growth rate, birth rate, and death rate.
  • We will create a CSV file for each country in the following format (as long as all of this information is available on Wikipedia):
Country,Population,Growth_rate,Birth_rate,Death_rate,Life_expectancy
Afghanistan,"39,864,082",2.34%,38.3,13.7,63.2
India,"1,392,700,000",1.1%,18.2,7.3,70.03
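
The sample above quotes the population values because they contain commas. A minimal sketch of producing that layout is shown below. It uses the standard library's csv module rather than pandas, purely to keep the example self-contained. The helper name rows_to_csv is hypothetical, and the rows are the two sample rows from this document.

```python
import csv
import io

header = ["Country", "Population", "Growth_rate", "Birth_rate",
          "Death_rate", "Life_expectancy"]
rows = [
    ["Afghanistan", "39,864,082", "2.34%", "38.3", "13.7", "63.2"],
    ["India", "1,392,700,000", "1.1%", "18.2", "7.3", "70.03"],
]


def rows_to_csv(header, rows):
    """Serialize a header and rows to CSV text, quoting only where needed."""
    buffer = io.StringIO()
    # The default QUOTE_MINIMAL setting quotes fields that contain a comma
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv(header, rows)
print(csv_text)
```

To write to disk instead, the same writer can be pointed at a file opened with open(path, "w", newline="").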

How to Run the Code

You can execute the code using the "Run" button at the top of the page. You can also make changes and save your own version of the notebook to Jovian by executing the following cells:

Install and Import the libraries

We can use the requests library to download the web page. The library can be installed using pip.

!pip install requests --upgrade --quiet
import requests

!pip install pandas --quiet
import pandas as pd

import os
Vipin Rathore, 6 months ago