Web Scraping Wiki Country Demographics Final
Scraping the Country Demographic Information on Wikipedia using Python
Wikipedia is an online free content encyclopedia project helping create a world in which everyone can freely share in the sum of all knowledge. Wikipedia is written collaboratively by largely anonymous volunteers. Anyone with Internet access can write and make changes to Wikipedia articles, except in limited cases where editing is restricted to prevent further disruption or vandalism.
The page https://en.wikipedia.org/wiki/Category:Demographics_by_country lists Wikipedia's demographics articles by country. In this project, we will retrieve information from this page using web scraping: the process of extracting information from a website in an automated fashion using code.
Here's an outline of the steps we'll follow:
- Download the page using requests
- Parse the HTML source using
- Extract country names and country URLs from the main page
- Compile extracted information into Python lists and dictionaries
- Extract and combine data from multiple pages
- Save the extracted information to a CSV file.
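As a rough illustration of the extraction step in the outline above, the sketch below pulls country names and URLs out of an HTML fragment and collects them in a dictionary. It is a minimal sketch under stated assumptions: it uses Python's built-in html.parser rather than whichever parsing library the project itself relies on, the sample HTML is made up for the example, and the names DemographicsLinkParser and extract_country_links are hypothetical helpers.

```python
from html.parser import HTMLParser
from urllib.parse import unquote, urljoin

BASE_URL = "https://en.wikipedia.org"


class DemographicsLinkParser(HTMLParser):
    """Collect links whose href starts with /wiki/Demographics_of_ (hypothetical helper)."""

    PREFIX = "/wiki/Demographics_of_"

    def __init__(self):
        super().__init__()
        self.countries = {}  # country name -> absolute demographics URL

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if href.startswith(self.PREFIX):
            # Derive the country name from the URL slug, e.g. "Sri_Lanka" -> "Sri Lanka".
            name = unquote(href[len(self.PREFIX):]).replace("_", " ")
            self.countries[name] = urljoin(BASE_URL, href)


def extract_country_links(html):
    """Return a dict mapping country names to demographics page URLs."""
    parser = DemographicsLinkParser()
    parser.feed(html)
    return parser.countries


# Made-up HTML fragment standing in for the downloaded category page.
sample_html = """
<ul>
  <li><a href="/wiki/Demographics_of_Afghanistan">Demographics of Afghanistan</a></li>
  <li><a href="/wiki/Demographics_of_India">Demographics of India</a></li>
  <li><a href="/wiki/Main_Page">Main page</a></li>
</ul>
"""

country_links = extract_country_links(sample_html)
print(country_links)
```

Filtering on the URL prefix is only one possible heuristic; the real category page may contain links that need additional handling.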
By the end of the project, we will create a CSV file with the following format:
Country Name, Country Demographic URL
Afghanistan, https://en.wikipedia.org/wiki/Demographics_of_Afghanistan
India, https://en.wikipedia.org/wiki/Demographics_of_India
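One way to produce a file in this shape is sketched below. It assumes the country names and URLs are already held in a dictionary, and it uses the standard-library csv module instead of pandas so the example stays self-contained; write_country_urls is a hypothetical helper name. Note that csv writes no space after the comma.

```python
import csv
import io


def write_country_urls(country_links, out_file):
    """Write one row per country: name, then demographics URL."""
    writer = csv.writer(out_file, lineterminator="\n")
    writer.writerow(["Country Name", "Country Demographic URL"])
    for name, url in country_links.items():
        writer.writerow([name, url])


country_links = {
    "Afghanistan": "https://en.wikipedia.org/wiki/Demographics_of_Afghanistan",
    "India": "https://en.wikipedia.org/wiki/Demographics_of_India",
}

# An in-memory buffer is used here; a file opened with
# open(path, "w", newline="", encoding="utf-8") works the same way.
buffer = io.StringIO()
write_country_urls(country_links, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```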
- For each country, we will use the demographic URL to extract demographic information such as population, density, growth rate, birth rate, and death rate.
- We will create a CSV file for each country in the following format (as long as all of this information is available on Wikipedia):
Country,Population,Growth_rate,Birth_rate,Death_rate,Life_expectancy
Afghanistan,"39,864,082",2.34%,38.3,13.7,63.2
India,"1,392,700,000",1.1%,18.2,7.3,70.03
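The population values contain commas, so they have to be quoted in the CSV output. The sketch below shows one way to get that behaviour, assuming each country's data has been collected into a dictionary keyed by the column names above. It uses the standard-library csv module, which quotes such fields automatically, and leaves a column empty when a value is not available. The helper name demographics_to_csv is hypothetical.

```python
import csv
import io

FIELDS = [
    "Country", "Population", "Growth_rate",
    "Birth_rate", "Death_rate", "Life_expectancy",
]


def demographics_to_csv(record):
    """Render one country's record as CSV text with a header row."""
    buffer = io.StringIO()
    # restval="" fills in any column missing from the record.
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerow(record)
    return buffer.getvalue()


afghanistan = {
    "Country": "Afghanistan",
    "Population": "39,864,082",
    "Growth_rate": "2.34%",
    "Birth_rate": "38.3",
    "Death_rate": "13.7",
    "Life_expectancy": "63.2",
}
csv_text = demographics_to_csv(afghanistan)
print(csv_text)

# A record with missing values still produces a well-formed row.
partial_text = demographics_to_csv({"Country": "India", "Population": "1,392,700,000"})
print(partial_text)
```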
Install and Import the libraries
We can use the requests library to download the web page. It can be installed with pip, alongside pandas:
!pip install requests --upgrade --quiet
import requests
!pip install pandas --quiet
import pandas as pd
import os
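With requests installed, downloading a page might look like the sketch below. The function name download_page, the optional session parameter, and the 10-second timeout are assumptions made for illustration, not part of the original project.

```python
import requests

CATEGORY_URL = "https://en.wikipedia.org/wiki/Category:Demographics_by_country"


def download_page(url, session=None, timeout=10):
    """Return the HTML text of a page, raising an error on a failed request."""
    # Use the caller's session if given, otherwise the module-level requests.get.
    http = session or requests
    response = http.get(url, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx responses instead of returning an error page.
    response.raise_for_status()
    return response.text
```

Calling download_page(CATEGORY_URL) would then return the HTML source of the category page as a string, ready to be parsed.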