Project: Profitable App Profile (App Store & Google Play)

In this project, we assume that we work for a company that builds apps for both Google Play and the iOS App Store. Our team of developers wants to know what kind of app to build for both markets in order to deliver the most profit.
Note that this company only develops free apps, so most of our revenue comes from in-app ads. We should therefore build apps that attract the highest number of users. The apps are also targeted at the English-speaking market.

The Data Set

As of September 2018, there were approximately 2 million iOS apps available on the App Store, and 2.1 million Android apps on Google Play.
Collecting data for over 4 million apps requires a significant amount of time and money, so we'll try to analyze a sample of the data instead. To avoid spending resources on collecting new data ourselves, we should first try to see if we can find any relevant existing data at no cost. Luckily, we have two data sets that seem suitable for our goals:

  • A data set containing data about approximately 10,000 Android apps from Google Play; the data was collected in August 2018. You can download the data set directly from this link.
  • A data set containing data about approximately 7,000 iOS apps from the App Store; the data was collected in July 2017. You can download the data set directly from this link.

Opening & Exploring the Data Set

This analysis is done in plain Python, without the PyData stack (for example pandas or NumPy).

from csv import reader

# App Store data set
d = open('AppleStore.csv', encoding='utf8') # assumes the file is UTF-8 encoded
ios = list(reader(d)) # read the file in as a list of lists
d.close()
ios_header = ios[0] # separate the header row (column names) from the data rows
ios_data = ios[1:]

# Google Play data set
e = open('googleplaystore.csv', encoding='utf8') # assumes the file is UTF-8 encoded
android = list(reader(e))
e.close()
android_header = android[0]
android_data = android[1:]
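The loading pattern above (open, read, split off the header) is the same for both files, so it can be wrapped in one helper. The following is only a sketch: `load_csv` is a hypothetical name that does not appear in the original notebook, the UTF-8 encoding is an assumption, and the tiny CSV file is made up and written to a temporary location purely so the snippet can run without the real data sets.

```python
import csv
import os
import tempfile


def load_csv(path, encoding="utf8"):
    """Read a CSV file and return (header, rows) as plain Python lists.

    Hypothetical helper; the encoding default is an assumption.
    """
    # The with-block closes the file even if reading fails.
    # newline="" is what the csv module documentation recommends.
    with open(path, newline="", encoding=encoding) as f:
        rows = list(csv.reader(f))
    return rows[0], rows[1:]


# Made-up sample file standing in for googleplaystore.csv.
sample_text = (
    'App,Category,Installs\n'
    'Demo App,TOOLS,"1,000+"\n'
    'Other App,GAME,"50,000+"\n'
)
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, encoding="utf8", newline=""
) as tmp:
    tmp.write(sample_text)
    sample_path = tmp.name

sample_header, sample_rows = load_csv(sample_path)
os.remove(sample_path)  # clean up the temporary file

print(sample_header)
print(sample_rows)
```

With the real files, the call would look like `ios_header, ios_data = load_csv('AppleStore.csv')`. Note that the quoted value `"1,000+"` stays in one field, because the csv reader handles quoting; a plain `split(',')` would break it apart.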
# Function to print rows of a data set in a readable way
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n') # adds blank space after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
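To see what `explore_data` does, here is a small usage sketch. The function is repeated so the snippet runs by itself, and `sample_data` is a made-up list of rows standing in for `android_data`; none of these values come from the real data set.

```python
def explore_data(dataset, start, end, rows_and_columns=False):
    # Same behavior as the function above, repeated so this snippet is self-contained.
    for row in dataset[start:end]:
        print(row)
        print('\n')
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))


# Hypothetical rows standing in for android_data (header already removed).
sample_data = [
    ['Demo App', 'TOOLS', '4.1'],
    ['Other App', 'GAME', '4.5'],
    ['Third App', 'FAMILY', '3.9'],
]

# Print the first two rows, then the shape of the whole data set.
explore_data(sample_data, 0, 2, rows_and_columns=True)
```

The row and column counts describe the whole data set rather than the printed slice, so it is worth passing the data without its header row (for example `android_data`, not `android`) to avoid counting the header as an app.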
print(android_header) # column names for the Android data set
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']

print(ios_header) # column names for the iOS data set
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
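Because each row is a plain list, columns have to be addressed by position rather than by name. One possible way to avoid counting positions by hand is to build a name-to-index lookup from the headers printed above. This is a minimal sketch: `column_positions`, `android_cols`, and `ios_cols` are hypothetical names introduced here for illustration, and the headers are copied from the output above so the snippet runs without the CSV files.

```python
# Headers copied from the printed output above.
android_header = ['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs',
                  'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated',
                  'Current Ver', 'Android Ver']
ios_header = ['id', 'track_name', 'size_bytes', 'currency', 'price',
              'rating_count_tot', 'rating_count_ver', 'user_rating',
              'user_rating_ver', 'ver', 'cont_rating', 'prime_genre',
              'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


def column_positions(header):
    """Map each column name to its position in a row (hypothetical helper)."""
    return {name: index for index, name in enumerate(header)}


android_cols = column_positions(android_header)
ios_cols = column_positions(ios_header)

# Positions of columns that look relevant to free apps and user counts.
print('Android Installs column:', android_cols['Installs'])
print('Android Price column:', android_cols['Price'])
print('iOS rating_count_tot column:', ios_cols['rating_count_tot'])
print('iOS price column:', ios_cols['price'])
```

A value can then be read as `row[android_cols['Installs']]` instead of a hard-coded index, which keeps later code readable when it filters on price or counts users.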