

Web Scraping Threads on Mental Health Forum using Python



Image Credit: Christine Daniloff, MIT

In this project, we will retrieve information from the Mental Health Forum using web scraping - the process of extracting information from a website in an automated fashion using code. Before we dive into the details, let us take a look at some of the terminology used here.

1. What is Web Scraping?

Web scraping refers to the extraction of data from websites. The web scraping software may directly access the World Wide Web using the Hypertext Transfer Protocol or a web browser. While web scraping can be done manually by a software user, the term typically refers to automated processes implemented using a bot or web crawler. It is a form of copying in which specific data is gathered and copied from the web, typically into a central local database or spreadsheet, for later retrieval or analysis.

2. What is the Mental Health Forum?

It is a place where you can speak openly and anonymously about your mental health experiences. On the Mental Health Forum you can share experiences, ask questions, or vent your emotions with people who know what it's like to experience mental health difficulties and everything that goes alongside them. The forum is organised into several categories, so you can speak with people who face the same or similar difficulties.

The Python community offers some pretty powerful web scraping tools. There are several ways you could scrape data using Python. Here, we will use the Python libraries Requests and BeautifulSoup to scrape data from the depression forum: https://www.mentalhealthforum.net/forum/forums/depression-forum.366/.
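As a quick preview of how the two libraries fit together, here is a minimal sketch (the variable names are placeholders, and the detailed extraction of thread information is covered step by step below):

import requests
from bs4 import BeautifulSoup

# Download the HTML source of the depression forum's first page
forum_url = 'https://www.mentalhealthforum.net/forum/forums/depression-forum.366/'
response = requests.get(forum_url)
response.raise_for_status()  # raise an error if the download failed

# Parse the HTML source and inspect the page title
doc = BeautifulSoup(response.text, 'html.parser')
print(doc.title.text)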

3. Outline

Here is an outline of the steps we will follow in this project:

  1. Download the webpage using Requests.
  2. Parse the HTML source code using Beautiful Soup.
  3. Extract the title, replies, views, and timestamp of each thread from the page.
  4. Compile the extracted information into Python lists and dictionaries.
  5. Extract and combine data from multiple pages.
  6. Save the extracted information to a CSV file.

At the end of the project, we will create a CSV file in the following format:

title,replies,views,timestamp
"Anxiety and Depression: Which one do I have?",83,65000,2012-01-04T01:18:39+0000
"Shame",5,44,2021-05-02T20:35:03+0100
"I Feel Like I Don't Have Almost None Happiness Hormones Left In Me",5,58,2021-05-03T05:28:34+0100

4. How to Run the Code?

You can execute the code by clicking the "Run" button at the top of this page and selecting "Run on Binder". You can make changes and save your own version of the notebook to Jovian by executing the following cells:

!pip install jovian --upgrade --quiet
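Once the library is installed, the notebook can be saved to your Jovian profile with jovian.commit (the project name below is only a placeholder):

import jovian
jovian.commit(project='webscraping-mental-health-forum')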

Let us now dive into the project.

Download the webpage using Requests

We'll use the requests library to download the web page. The library can be installed using pip.
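For example, assuming the page is accessible without logging in, installing the library and downloading the page looks roughly like this (variable names are placeholders):

!pip install requests --upgrade --quiet

import requests

forum_url = 'https://www.mentalhealthforum.net/forum/forums/depression-forum.366/'
response = requests.get(forum_url)
print(response.status_code)      # 200 means the page was downloaded successfully
page_contents = response.text    # the HTML source of the page as a string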
