Exploratory Data Analysis on Google Play Store Apps
In this project, we analyze the Google Play Store Apps dataset from Kaggle. The dataset has over 450,000 rows (4.5 lakh) and 29 columns, of which we use the 16 most useful for our analysis. The dataset is available at: https://www.kaggle.com/datasets/geothomas/playstore-dataset?resource=download&select=Playstore_final.csv
The main objective of this project is to analyze each column of the Google Play Store dataset by applying data analysis and visualization skills to a real-world dataset.
Here is an outline of the steps we'll follow:
- Downloading a dataset from an online source
- Data preparation and cleaning
- Exploratory analysis and visualization
- Asking and answering interesting questions
- Future Work
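As a rough illustration of the data preparation and cleaning step in the outline above, here is a minimal sketch on a tiny in-memory CSV. The column names (`App Name`, `Category`, `Rating`, `Installs`, `Internal Id`) and the values are hypothetical stand-ins, not the actual columns of the Kaggle file; the point is only to show how a subset of columns might be kept with `usecols`, duplicates dropped, and a text column converted to numbers.

```python
import io

import pandas as pd

# Tiny in-memory stand-in for the real CSV (hypothetical columns and values).
raw_csv = io.StringIO(
    'App Name,Category,Rating,Installs,Internal Id\n'
    'Alpha,Game,4.5,"1,000+",a1\n'
    'Alpha,Game,4.5,"1,000+",a1\n'
    'Beta,Tools,,"50,000+",b2\n'
)

# Keep only the columns considered useful for the analysis.
useful_columns = ["App Name", "Category", "Rating", "Installs"]
apps = pd.read_csv(raw_csv, usecols=useful_columns)

# Remove exact duplicate rows and renumber the index.
apps = apps.drop_duplicates().reset_index(drop=True)

# Turn strings such as "1,000+" into integers by stripping "," and "+".
apps["Installs"] = (
    apps["Installs"].str.replace(r"[,+]", "", regex=True).astype(int)
)

# Count missing values per column to decide how to handle them later.
missing_per_column = apps.isna().sum()
print(apps)
print(missing_per_column)
```

On the real dataset, the list passed to `usecols` would be replaced by the 16 columns chosen for the analysis.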
Exploratory Data Analysis (EDA) is the process of exploring, investigating and gathering insights from data using statistical measures and visualizations. The objective of EDA is to develop an understanding of the data by uncovering trends, relationships and patterns.
EDA is both a science and an art. On the one hand, it requires knowledge of statistics, visualization techniques and data analysis tools such as NumPy, pandas and Seaborn. On the other hand, it requires asking interesting questions to guide the investigation and interpreting numbers and figures to generate useful insights.
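To make the idea of "statistical measures" concrete, the following is a small sketch of a first EDA pass with pandas. The DataFrame is made-up toy data with hypothetical column names (`Category`, `Rating`, `Installs`), chosen only to illustrate summary statistics, a grouped aggregate and a correlation; it is not taken from the Play Store dataset.

```python
import pandas as pd

# Toy data with hypothetical columns; values are invented for illustration.
apps = pd.DataFrame({
    "Category": ["Game", "Game", "Tools", "Tools", "Education"],
    "Rating": [4.5, 3.9, 4.1, 4.3, 4.7],
    "Installs": [100000, 5000, 50000, 1000, 10000],
})

# Summary statistics (count, mean, std, quartiles) for the numeric columns.
summary = apps[["Rating", "Installs"]].describe()

# A simple question: which category has the highest average rating?
mean_rating = (
    apps.groupby("Category")["Rating"].mean().sort_values(ascending=False)
)

# A simple relationship: Pearson correlation between rating and installs.
rating_installs_corr = apps["Rating"].corr(apps["Installs"])

print(summary)
print(mean_rating)
print(rating_installs_corr)
```

Each result suggests a follow-up question (for example, whether a high average rating simply reflects a category with very few apps), which is the "art" side of EDA described above.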
1. DOWNLOADING DATASET FROM AN ONLINE SOURCE
Installing and importing all required Libraries
# Install the required libraries (notebook shell commands).
!pip install opendatasets --upgrade --quiet
# Note: this pinned matplotlib version is superseded by the `-U matplotlib` upgrade further down.
!pip install matplotlib==3.1.3 --quiet
!pip install plotly --upgrade --quiet
!pip install -U matplotlib --quiet
!pip install folium --upgrade --quiet
!pip install numpy --quiet
!pip install seaborn --upgrade --quiet

# Standard library
import datetime
import os

# Data handling
import numpy as np
import pandas as pd

# Visualization
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
import folium
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator

# Dataset download helper
import opendatasets as od
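With the libraries installed, the download itself might look like the sketch below. It uses `od.download` from `opendatasets` with the dataset link given in the introduction. The helper names `expected_csv_path` and `download_dataset` are hypothetical, and the assumption that the files land in a folder named after the dataset (`playstore-dataset`) reflects how `opendatasets` usually lays out Kaggle downloads; check the folder on disk after downloading. The download needs network access and Kaggle API credentials, so it is wrapped in a function rather than run at import time.

```python
import os
from urllib.parse import parse_qs, urlparse

DATASET_URL = (
    "https://www.kaggle.com/datasets/geothomas/playstore-dataset"
    "?resource=download&select=Playstore_final.csv"
)


def expected_csv_path(url, data_dir="."):
    """Guess where the CSV will be stored after the download.

    Assumption: files are placed in <data_dir>/<dataset-id>/, and the CSV
    name is the one given in the URL's `select` query parameter.
    """
    parsed = urlparse(url)
    dataset_id = parsed.path.rstrip("/").split("/")[-1]
    csv_name = parse_qs(parsed.query).get("select", [""])[0]
    return os.path.join(data_dir, dataset_id, csv_name)


def download_dataset(url, data_dir="."):
    """Download the Kaggle dataset and return the expected CSV path.

    opendatasets asks for a Kaggle username and API key (or reads them
    from a kaggle.json file) the first time it is used.
    """
    import opendatasets as od  # imported lazily; only needed for the download

    # Pass the URL without its query string to keep the dataset id unambiguous.
    od.download(url.split("?")[0], data_dir=data_dir)
    return expected_csv_path(url, data_dir)


print(expected_csv_path(DATASET_URL))
```

In the notebook, calling `download_dataset(DATASET_URL)` and passing the returned path to `pd.read_csv` would be one way to load the data for the next step.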