Project: Profitable App Profile (App Store & Google Play)
In this project, we assume that we work for a company that builds apps for both Google Play and the iOS App Store. Our team of developers wants to know what kind of app, built for both markets, would deliver the most profit.
At this company we only develop free apps, so most of our revenue comes from in-app ads. We should therefore build apps that attract the largest number of users. The apps are also targeted at the English-speaking market.
The Data Set
As of September 2018, there were approximately 2 million iOS apps available on the App Store, and 2.1 million Android apps on Google Play.
Collecting data for over 4 million apps requires a significant amount of time and money, so we'll try to analyze a sample of the data instead. To avoid spending resources on collecting new data ourselves, we should first try to see if we can find any relevant existing data at no cost. Luckily, we have two data sets that seem suitable for our goals:
- A data set containing data about approximately 10,000 Android apps from Google Play; the data was collected in August 2018. You can download the data set directly from this link.
- A data set containing data about approximately 7,000 iOS apps from the App Store; the data was collected in July 2017. You can download the data set directly from this link.
Opening & Exploring the Data Set
This analysis uses only the Python standard library, without the PyData stack (pandas, NumPy).
from csv import reader

# App Store data set (encoding='utf8' assumes the file is UTF-8 encoded)
with open('AppleStore.csv', encoding='utf8') as d:
    ios = list(reader(d))  # read the file in as a list of lists
ios_header = ios[0]  # separate the header (the column names) from the data rows
ios_data = ios[1:]

# Google Play data set
with open('googleplaystore.csv', encoding='utf8') as e:
    android = list(reader(e))
android_header = android[0]
android_data = android[1:]
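Both files go through the same three steps: open, read into a list of lists, and split off the header. As a minimal sketch, those steps could be wrapped in a reusable helper. The names `load_rows` and `load_csv` are hypothetical and not part of the original notebook, the UTF-8 default is an assumption about the files, and the sample rows are made up for illustration.

```python
import csv
import io


def load_rows(file_obj):
    """Return (header, data) from an open CSV file object."""
    rows = list(csv.reader(file_obj))
    return rows[0], rows[1:]


def load_csv(path, encoding='utf8'):
    """Open a CSV file by path and return (header, data).

    The default encoding is an assumption; adjust it if the file differs.
    """
    # newline='' is what the csv module documentation recommends for files
    # passed to csv.reader
    with open(path, encoding=encoding, newline='') as f:
        return load_rows(f)


# In-memory sample with made-up rows, so the sketch runs without the real files
sample = io.StringIO('App,Category\nDemo App,GAME\n"Other, App",TOOLS\n')
sample_header, sample_data = load_rows(sample)
print(sample_header)
print(sample_data)
```

With the real files in the working directory, `ios_header, ios_data = load_csv('AppleStore.csv')` would replace the three separate statements. Note that `load_rows` raises an `IndexError` on an empty file, since there is no header row to split off.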
# function to print data in a readable way
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n')  # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
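To show how the helper behaves, here is a small usage sketch on a toy data set. The function is repeated so the example runs on its own, and the rows in `toy_data` are made up for illustration; they are not taken from either data set.

```python
def explore_data(dataset, start, end, rows_and_columns=False):
    """Same helper as above, repeated so this example is self-contained."""
    for row in dataset[start:end]:
        print(row)
        print('\n')  # blank line after each row
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))


# Hypothetical rows, for illustration only
toy_data = [
    ['Demo App', 'GAME', '4.1'],
    ['Another App', 'TOOLS', '3.9'],
    ['Third App', 'FAMILY', '4.5'],
]

# Print the first two rows, then the overall shape of the data set
explore_data(toy_data, 0, 2, True)
```

The slice `[start:end]` excludes `end`, so the call prints rows 0 and 1 only. The row and column counts describe the whole data set rather than the slice. Because the header was split off earlier, the row count covers data rows only.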
print(android_header) #column names for android data set
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
print(ios_header) #column names for ios data set
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
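Since each row is a plain list, columns are addressed by position rather than by name. A small sketch of looking up positions from the two headers printed above follows. The helper name `column_index` is hypothetical, and the choice of columns (installs, rating counts, and price) is only an assumption about what may matter for a free, ad-supported app.

```python
android_header = ['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs',
                  'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated',
                  'Current Ver', 'Android Ver']
ios_header = ['id', 'track_name', 'size_bytes', 'currency', 'price',
              'rating_count_tot', 'rating_count_ver', 'user_rating',
              'user_rating_ver', 'ver', 'cont_rating', 'prime_genre',
              'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


def column_index(header, name):
    """Return the position of a column name; raises ValueError if absent."""
    return header.index(name)


android_installs = column_index(android_header, 'Installs')
android_price = column_index(android_header, 'Price')
ios_ratings = column_index(ios_header, 'rating_count_tot')
ios_price = column_index(ios_header, 'price')

print(android_installs, android_price)  # 5 7
print(ios_ratings, ios_price)           # 5 4
```

Looking positions up by name avoids hard-coding numbers such as `row[5]` throughout the analysis, and a misspelled column name fails loudly instead of silently reading the wrong column.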