Web Scraping PubMed Project

Web Scraping Research Articles on Alzheimer's Disease from PubMed

PubMed is an online database of biomedical literature drawn from various scientific journals. It is a service provided by the National Library of Medicine (NLM) at the U.S. National Institutes of Health (NIH).

PubMed is a valuable resource for researchers, healthcare professionals, and anyone interested in exploring the latest developments in medical research. It allows users to search for and access a wide range of peer-reviewed publications, including research papers, clinical trials, and reviews, covering various areas of biomedicine, such as genetics, immunology, pharmacology, and many others.

📌Problem Statement

The goal is to extract relevant articles on Alzheimer's disease from PubMed. Alzheimer's disease is a complex and debilitating neurodegenerative disease that affects millions of people worldwide. Because Alzheimer's research is a rapidly evolving field, it is essential to have access to up-to-date and accurate information.

The extracted information will include:

  • name of the article

  • authors of the article

  • citation

  • PubMed ID (a unique identifier assigned to every article on PubMed)

  • link to the article

A breakdown of the code ⚒

  1. Download the web pages with the requests library and convert each one into a BeautifulSoup object
  2. Use BeautifulSoup to parse the information
  3. Create functions to get the necessary tags and write the gathered information to a CSV file
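
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names, the `FIELDS` column names, the search URL parameters (`term`, `page`), and every CSS selector (`article.full-docsum`, `a.docsum-title`, and so on) are assumptions about how the PubMed results page is laid out, and they may need adjusting if the markup differs.

```python
import csv

# Assumed base URL of the PubMed search site.
BASE_URL = "https://pubmed.ncbi.nlm.nih.gov"
# Hypothetical column names for the five fields listed in the problem statement.
FIELDS = ["title", "authors", "citation", "pubmed_id", "link"]


def fetch_soup(term, page=1):
    """Step 1: download one results page and wrap it in a BeautifulSoup object."""
    # Third-party imports are kept local so the CSV helper works without them.
    import requests
    from bs4 import BeautifulSoup

    response = requests.get(
        BASE_URL + "/", params={"term": term, "page": page}, timeout=30
    )
    response.raise_for_status()  # fail loudly on HTTP errors
    return BeautifulSoup(response.text, "html.parser")


def tag_text(parent, selector):
    """Return the whitespace-normalised text of the first match, or ''."""
    tag = parent.select_one(selector)
    return " ".join(tag.get_text().split()) if tag else ""


def parse_articles(soup):
    """Step 2: collect the five fields from each result entry."""
    rows = []
    # All selectors below are assumptions about the results-page markup.
    for entry in soup.select("article.full-docsum"):
        pmid = tag_text(entry, "span.docsum-pmid")
        rows.append({
            "title": tag_text(entry, "a.docsum-title"),
            "authors": tag_text(entry, "span.full-authors"),
            "citation": tag_text(entry, "span.full-journal-citation"),
            "pubmed_id": pmid,
            "link": f"{BASE_URL}/{pmid}/" if pmid else "",
        })
    return rows


def write_csv(rows, path):
    """Step 3: write the gathered rows to a CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```

A possible call sequence would be `write_csv(parse_articles(fetch_soup("alzheimer's disease")), "alzheimers_articles.csv")`, repeated with increasing `page` values to cover more results. The output file name here is only a placeholder.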
