
Webscraping Project Final

Scraping Electricity Production of Different Countries and Indian States using Python


Wikipedia is a free-content, multilingual online encyclopedia written and maintained by a community of volunteers through a model of open collaboration, using a wiki-based editing system. Individual contributors, also called editors, are known as Wikipedians. It is the largest and most-read reference work in history, and consistently one of the 15 most popular websites as ranked by Alexa; as of 2021, Wikipedia was ranked the 13th most popular site. It covers a very wide range of topics, which makes it a convenient source of data.

This project uses two Wikipedia pages. The first provides a list of countries by electricity production. The second, whose link is extracted from the country page, covers electricity generation in the Indian states. We'll retrieve information from these two pages using web scraping: the process of extracting and parsing data from websites in an automated fashion using a computer program. We'll use the Python libraries requests, Beautiful Soup, and pandas to scrape data from these pages.

Here's an outline of the steps we'll follow:

  1. Download the web pages using requests
  2. Parse the HTML source code using beautifulsoup4
  3. Extract the required information from each page
  4. Compile the extracted information into Python lists
  5. Save the extracted information to a CSV file
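
The steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the function names (`download_page`, `parse_first_table`, `write_csv`) and the sample markup are assumptions made for the example, and the real Wikipedia tables will need more careful row and column selection. The standard-library `csv` module is used for the last step only to keep the sketch short.

```python
import csv

import requests
from bs4 import BeautifulSoup


def download_page(url):
    """Step 1: fetch a page and return its HTML text."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.text


def parse_first_table(html):
    """Steps 2-4: parse the HTML and collect the first table's rows as lists."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        raise ValueError("no <table> element found in the page")
    rows = []
    for tr in table.find_all("tr"):
        # Header cells (<th>) and data cells (<td>) are treated alike here.
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    return rows


def write_csv(rows, path):
    """Step 5: save the collected rows to a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)


# A tiny stand-in for a downloaded page (hypothetical markup, not Wikipedia's).
SAMPLE_HTML = """
<table>
  <tr><th>Country_name</th><th>Country_electricity(GWh)</th></tr>
  <tr><td>United States</td><td>4,286,600</td></tr>
</table>
"""

rows = parse_first_table(SAMPLE_HTML)
print(rows)
```

With network access, `parse_first_table(download_page(url))` would chain the first four steps for a real page, and `write_csv(rows, "some_file.csv")` would complete the fifth; the file name here is only a placeholder.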

By the end of the project, we'll have created two CSV files.

The first CSV file has the following format:

Country_name, Country_electricity(GWh),Production_Year,country_url
United States,"4,286,600",2020,

The second CSV file looks like this:

State/Union Territory,Coal,Lignite,Gas,Diesel,Sub-TotalThermal,Nuclear,Hydel,OtherRenewable,Sub-TotalRenewable,Total, % of National total, % Renewable
Western Region,85156,1540,10806,-,97502,1840,7392,30367,37759,137101,35.69%,27.54%
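
Values in these files are stored as text: production figures contain thousands separators, percentages carry a `%` sign, and a `-` appears where no figure is given. The sketch below shows one way to read such rows back and convert them to numbers. The helper `to_number` is hypothetical, the state row is trimmed to a few columns for brevity, and treating `-` as a missing value is an assumption. It uses the standard-library `csv` module rather than pandas so that it runs without extra dependencies.

```python
import csv
import io


def to_number(text):
    """Convert a cell such as '4,286,600', '27.54%' or '-' to a float, or None."""
    cleaned = text.strip().replace(",", "")
    if cleaned in ("", "-"):
        return None  # assumption: '-' (or an empty cell) means "no value"
    if cleaned.endswith("%"):
        return float(cleaned[:-1])
    return float(cleaned)


# Sample rows modeled on the two formats shown above (state columns trimmed).
country_csv = (
    "Country_name, Country_electricity(GWh),Production_Year,country_url\n"
    'United States,"4,286,600",2020,\n'
)
state_csv = (
    "State/Union Territory,Coal,Lignite,Gas,Diesel,Total, % Renewable\n"
    "Western Region,85156,1540,10806,-,137101,27.54%\n"
)

# skipinitialspace=True drops the space that follows some commas in the headers.
country = next(csv.DictReader(io.StringIO(country_csv), skipinitialspace=True))
state = next(csv.DictReader(io.StringIO(state_csv), skipinitialspace=True))

print(to_number(country["Country_electricity(GWh)"]))  # 4286600.0
print(to_number(state["Diesel"]))                      # None
print(to_number(state["% Renewable"]))                 # 27.54
```

The quoted field `"4,286,600"` is what lets a comma appear inside a single CSV value, which is why the country file wraps the production figure in quotes.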

Importance of the project

Electricity is one of the most important blessings that science has given to mankind. It has become a part of modern life, and one cannot think of a world without it. Electricity has many uses in day-to-day life: it lights rooms and runs fans and domestic appliances such as electric stoves and air conditioners, all of which provide comfort to people. In factories, large machines run on electricity, and essential items like food, cloth, and paper are produced with its help. Modern means of transportation and communication have been revolutionized by it; electric trains and battery cars are quick means of travel. Radio, television, and cinema, the most popular forms of entertainment, also depend on electricity. Modern equipment like computers and robots has been developed because of it, and it plays a pivotal role in medicine and surgery, for example in X-ray and ECG machines.

The use of electricity is increasing day by day, and the growth of the electricity sector will be important to sustain the economic output of the country. Electricity is not freely available in nature, so it must be produced. The main sources for electricity generation include coal, lignite, gas, diesel, nuclear, solar, wind, and hydro. The world is moving towards renewable energy sources such as wind, solar, and biomass, which provide reliable power supplies and fuel diversification.

Because electricity has so many uses, data on how much of it is produced, and from which sources, is well worth scraping and analyzing.

This project is divided into two sections:

How to run the code

You can execute the code by clicking the "Run" button at the top of this page and selecting "Run on Binder". You can make changes and save your own version of the notebook to Jovian by executing the following cells.

!pip install jovian --upgrade --quiet
import jovian
# Execute this to save new versions of the notebook
Prasanthi N, 6 months ago