Eda Zero2pandas
Exploratory Data Analysis on Google Play Store Apps
INTRODUCTION
In this project, we analyze the Google Play Store Apps dataset from Kaggle. The dataset has over 4.5 lakh (450,000) rows and 29 columns, of which we use the 16 most useful for our analysis. The dataset is available at: https://www.kaggle.com/datasets/geothomas/playstore-dataset?resource=download&select=Playstore_final.csv
The main objective of this project is to analyze each column in the Google Play Store dataset, applying data analysis and visualization skills to a real-world dataset.
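Since only 16 of the 29 columns are used, one option is to select them while reading the CSV rather than after loading everything. The sketch below illustrates this with the `usecols` parameter of `pandas.read_csv` on a tiny in-memory CSV. The column names and values are made up for illustration and are not the dataset's actual schema.

```python
import io

import pandas as pd

# Hypothetical miniature CSV standing in for Playstore_final.csv;
# the column names and values here are illustrative assumptions only.
sample_csv = io.StringIO(
    "App Name,Category,Rating,Installs,Developer Website\n"
    "Alpha,Tools,4.1,1000,https://example.com/a\n"
    "Beta,Games,3.8,50000,https://example.com/b\n"
)

# Assumed subset of columns to keep for the analysis.
wanted_columns = ["App Name", "Category", "Rating", "Installs"]

# usecols tells pandas to parse only the listed columns,
# which saves memory on wide files.
apps_df = pd.read_csv(sample_csv, usecols=wanted_columns)

print(apps_df.shape)
print(apps_df.columns.tolist())
```

With the real file, the same call would take the path to the downloaded CSV and the actual list of 16 column names.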
OUTLINE
Here is an outline of the steps we'll follow:
- Downloading a dataset from an online source
- Data preparation and cleaning
- Exploratory analysis and visualization
- Asking and answering interesting questions
- Summary
- Future work
- References
Exploratory Data Analysis (EDA) is the process of exploring and investigating data and gathering insights from it using statistical measures and visualizations. The objective of EDA is to develop an understanding of the data by uncovering trends, relationships and patterns.
EDA is both a science and an art. On the one hand, it requires knowledge of statistics, visualization techniques and data analysis tools such as NumPy, pandas and Seaborn. On the other hand, it requires asking interesting questions to guide the investigation and interpreting numbers and figures to generate useful insights.
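As a minimal sketch of what "statistical measures" and "asking questions" can look like in practice, the example below computes summary statistics, a grouped average, and a correlation with pandas. The data frame is toy data invented for illustration and is not taken from the Play Store dataset.

```python
import pandas as pd

# Toy data invented for illustration; not taken from the Play Store dataset.
toy_df = pd.DataFrame(
    {
        "category": ["Tools", "Tools", "Games", "Games", "Games"],
        "rating": [4.0, 4.4, 3.5, 4.5, 4.0],
        "installs": [1000, 5000, 100000, 500000, 10000],
    }
)

# Statistical measures: count, mean, spread and quartiles per numeric column.
summary = toy_df[["rating", "installs"]].describe()

# A guiding question: which category has the higher average rating?
mean_rating_by_category = toy_df.groupby("category")["rating"].mean()

# Relationship between two numeric columns (Pearson correlation).
rating_installs_corr = toy_df["rating"].corr(toy_df["installs"])

print(summary)
print(mean_rating_by_category)
print(rating_installs_corr)
```

The numbers only become insight once they are interpreted, for example by asking whether a difference in average rating is large enough to matter given how few apps are in each group.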
1. DOWNLOADING DATASET FROM AN ONLINE SOURCE
Installing and importing all required libraries
# Install the required libraries (the leading "!" runs pip from a notebook cell).
!pip install opendatasets --upgrade --quiet
!pip install plotly --upgrade --quiet
!pip install matplotlib --upgrade --quiet
!pip install folium --upgrade --quiet
!pip install numpy --quiet
!pip install seaborn --upgrade --quiet
# wordcloud is imported below; install it in case the environment does not already provide it.
!pip install wordcloud --quiet

# Standard library
import datetime
import os

# Data handling
import numpy as np
import pandas as pd
import opendatasets as od

# Visualization
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
import folium
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
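Because several libraries are installed with different version constraints, it can help to confirm which versions actually ended up in the environment. The sketch below uses the standard library's `importlib.metadata` for this. The helper name `installed_versions` is hypothetical and introduced only for illustration.

```python
from importlib import metadata


def installed_versions(package_names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


# Package names taken from the pip install commands in this section.
required = ["opendatasets", "matplotlib", "plotly", "folium", "numpy", "seaborn"]

for name, version in installed_versions(required).items():
    print(f"{name}: {version if version is not None else 'not installed'}")
```

Running this right after the install cell makes it easier to spot a library that was left at an unexpected version.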