Web Scraping Android Rankings 1
Web Scrape Top Android Apps - Games to Health to Productivity ...
In today's world, data is everything. If data is available in a homogeneous form, we can manipulate and analyse all the different statistics it contains.
The Android app store ranking site offers data about mobile apps: platforms, price, user ratings, categories, country-based rankings, number of downloads (installs), and much more. This makes it an ideal base for scraping interesting data, from video games to health apps to communication apps to dating apps. The data provides insight into people's preferred apps and their growth.
Below is a simple project that shows how to scrape the Android app ranking site to easily obtain interesting information on video games. The same approach can be used to scrape any app category listed on that site.
Web Scraping
Web scraping is the process of extracting and parsing data from websites in an automated fashion using a computer program. It's a useful technique for creating datasets for research and learning. The process of scraping involves the following steps:
- Send an HTTP request to the URL of the web page that needs to be scraped.
- Retrieve the HTML content as text.
- Identify the particular HTML tags from which to extract data. To do this, right-click on the web page in the browser and select the Inspect option to view the page structure.
- Extract the text from those HTML tags and construct the needed dataset.
- Convert the dataset into a format that can be analysed.
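The parsing and extraction steps above can be sketched as follows. This is a minimal sketch, not the project's actual code: the HTML snippet, its table layout, and the names SAMPLE_HTML and extract_rows are assumptions made for illustration, and the string stands in for the text that an HTTP request would return.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for the text of a downloaded page.
SAMPLE_HTML = """
<table id="ranking">
  <tr><th>Rank</th><th>App</th><th>Rating</th></tr>
  <tr><td>1</td><td>Example Game A</td><td>4.5</td></tr>
  <tr><td>2</td><td>Example Game B</td><td>4.2</td></tr>
</table>
"""


def extract_rows(html_text):
    """Parse the HTML and return one dict per table row that holds data."""
    soup = BeautifulSoup(html_text, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        cells = tr.find_all("td")
        if len(cells) != 3:
            continue  # the header row has <th> cells only, so skip it
        rows.append({
            "rank": int(cells[0].get_text(strip=True)),
            "app": cells[1].get_text(strip=True),
            "rating": float(cells[2].get_text(strip=True)),
        })
    return rows


dataset = extract_rows(SAMPLE_HTML)
print(dataset)
```

A list of dictionaries is a convenient intermediate form because each key can later become a column name.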
In this project we will use three Python libraries to scrape the data from the site and store it in a CSV file:
- BeautifulSoup
- requests
- pandas
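The pandas part of that workflow might look like the sketch below. The records and the file name mentioned in the comment are hypothetical placeholders, not data from the site.

```python
import pandas as pd

# Hypothetical records, shaped like the output of a scraping step.
records = [
    {"rank": 1, "app": "Example Game A", "rating": 4.5},
    {"rank": 2, "app": "Example Game B", "rating": 4.2},
]

# Each dictionary becomes a row; the keys become column names.
df = pd.DataFrame(records)

# With no path, to_csv() returns the CSV as a string. Passing a path such
# as "games.csv" (a hypothetical name) writes it to disk instead.
csv_text = df.to_csv(index=False)
print(csv_text)
```

Setting index=False leaves out the DataFrame's row index, which is usually not wanted in the saved file.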
Requests
"Requests" is an elegant and simple HTTP library for Python. More information can be found at https://docs.python-requests.org/en/master/
It abstracts the complexities of making HTTP requests behind a beautiful, simple API so that we can focus on interacting with services and consuming data in the application.
Here we will use the basic "requests.get()" function to send an HTTP request to the specified URL and retrieve the HTTP response. A successful response has the HTTP status code 200 (OK).
Example Usage:
import requests
url = 'https://androidrank.org/'
response = requests.get(url)
response.status_code  # 200 means the request succeeded
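Instead of inspecting the status code by hand, the check can be delegated to the library. The sketch below is one possible way to do that: fetch_html is a hypothetical helper name, the 10-second timeout is an arbitrary choice, and the hand-built Response object exists only so that the failure path can be shown without network access.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the page text, or raise if the request did not succeed."""
    response = requests.get(url, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx and 5xx codes.
    response.raise_for_status()
    return response.text


# A Response built by hand, used only to demonstrate the check offline.
fake = requests.Response()
fake.status_code = 404

try:
    fake.raise_for_status()
except requests.HTTPError as err:
    print("request failed:", err)
```

With network access, calling fetch_html('https://androidrank.org/') would return the page's HTML as a string.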
Beautiful Soup
This Python library makes it simpler to parse HTML and XML data. The HTTP response obtained in the earlier step is parsed using Beautiful Soup.
Here we identify which HTML tags/elements have to be parsed to get the necessary data. Detailed documentation can be seen at https://www.crummy.com/software/BeautifulSoup/bs4/doc/
Example Usage:
1. Parse the response content as HTML.
from bs4 import BeautifulSoup
s = BeautifulSoup(response.content, 'html.parser')
2. Find the first occurrence of a specified HTML tag.
s.find(<tag name>, class_=<class name>)
Eg:
s.find('div', class_='test')
3. Find all occurrences of a specified HTML tag. This returns a list of all the elements satisfying the specified criteria.
s.find_all(<tag name>, class_=<class name>)
Eg:
s.find_all('div', class_='test')
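The difference between the two calls is easiest to see on a small document. The markup below is hypothetical and only reuses the class name test from the examples. The sketch uses the class_ keyword argument, which Beautiful Soup provides for filtering by CSS class because class is a reserved word in Python.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with two matching divs and one non-matching div.
html = """
<div class="test">first</div>
<div class="other">skipped</div>
<div class="test">second</div>
"""
s = BeautifulSoup(html, "html.parser")

# find() returns the first matching Tag, or None when nothing matches.
first = s.find("div", class_="test")
missing = s.find("span")

# find_all() returns a list-like collection of every matching Tag.
matches = s.find_all("div", class_="test")
texts = [d.get_text() for d in matches]

print(first.get_text())
print(texts)
print(missing)
```

Because find() can return None, it is worth checking the result before calling methods such as get_text() on it.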