Exploratory Data Analysis on Google Play Store Apps

INTRODUCTION

In this project, we will analyze the Google Play Store Apps dataset from Kaggle. The dataset has over 450,000 rows (4.5 lakh) and 29 columns, of which we'll use the 16 most useful for our analysis. The dataset can be viewed at this link: 'https://www.kaggle.com/datasets/geothomas/playstore-dataset?resource=download&select=Playstore_final.csv'

The main objective of this project is to analyze each column of the Google Play Store dataset, applying data analysis and visualization skills to a real-world dataset.
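As a minimal sketch of keeping only a subset of columns at load time, the snippet below uses the `usecols` parameter of `pd.read_csv`. The tiny in-memory CSV and the column names in `KEEP_COLUMNS` are hypothetical stand-ins, not the actual contents of Playstore_final.csv; with the real file, the path and the 16 chosen column names would take their place.

```python
import io

import pandas as pd

# Tiny in-memory stand-in for the real CSV file.
# The column names and values here are hypothetical, for illustration only.
SAMPLE_CSV = io.StringIO(
    "App Name,Category,Rating,Installs,Developer Email\n"
    "Demo App,Tools,4.2,1000+,dev@example.com\n"
    "Other App,Games,3.9,500+,other@example.com\n"
)

# Hypothetical subset of columns to keep for the analysis.
KEEP_COLUMNS = ["App Name", "Category", "Rating", "Installs"]

# usecols makes pandas parse only the listed columns,
# which saves memory on a wide dataset.
apps_df = pd.read_csv(SAMPLE_CSV, usecols=KEEP_COLUMNS)

print(apps_df.shape)
print(apps_df.columns.tolist())
```

Selecting columns while reading, rather than dropping them afterwards, avoids holding the unused columns in memory at all.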

OUTLINE

Here is an outline of the steps we'll follow:

  1. Downloading a dataset from an online source
  2. Data preparation and cleaning
  3. Exploratory analysis and visualization
  4. Asking and answering interesting questions
  5. Summary
  6. Future work
  7. References

Exploratory Data Analysis (EDA) is the process of exploring, investigating and gathering insights from data using statistical measures and visualizations. The objective of EDA is to develop an understanding of the data by uncovering trends, relationships and patterns.

EDA is both a science and an art. On the one hand, it requires knowledge of statistics, visualization techniques and data analysis tools such as NumPy, pandas and Seaborn. On the other hand, it requires asking interesting questions to guide the investigation and interpreting numbers and figures to generate useful insights.
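To make the idea concrete, here is a small sketch of the kind of statistical summary EDA usually starts from. The DataFrame `toy_df` is made-up illustrative data, not taken from the Play Store dataset, and the column names are assumptions chosen for the example.

```python
import pandas as pd

# Illustrative toy data only; the values are made up for this example.
toy_df = pd.DataFrame({
    "category": ["Tools", "Games", "Games", "Tools", "Education"],
    "rating": [4.2, 3.9, 4.5, 4.0, 4.7],
    "installs": [1000, 50000, 120000, 800, 3000],
})

# Statistical measures: count, mean, spread and quartiles per numeric column.
summary = toy_df[["rating", "installs"]].describe()

# A pattern: average rating per category, highest first.
mean_rating_by_category = (
    toy_df.groupby("category")["rating"].mean().sort_values(ascending=False)
)

# A relationship: Pearson correlation between rating and installs.
rating_installs_corr = toy_df["rating"].corr(toy_df["installs"])

print(summary)
print(mean_rating_by_category)
print(rating_installs_corr)
```

Numbers like these suggest which questions are worth asking next; a plot of the same quantities would typically follow.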

1. DOWNLOADING DATASET FROM AN ONLINE SOURCE

Installing and importing all required libraries

# Install the third-party libraries used in this notebook.
# matplotlib is upgraded to the latest release instead of being pinned to 3.1.3 first:
# the pin conflicted with plotnine, pandas-profiling, mizani and arviz.
!pip install opendatasets --upgrade --quiet
!pip install matplotlib --upgrade --quiet
!pip install plotly --upgrade --quiet
!pip install folium --upgrade --quiet
!pip install numpy --quiet
!pip install seaborn --upgrade --quiet
# wordcloud is imported below, so install it in case the environment lacks it.
!pip install wordcloud --upgrade --quiet

# Standard library
import datetime
import os

# Data handling and download
import numpy as np
import opendatasets as od
import pandas as pd

# Visualization
import folium
import matplotlib
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go
import seaborn as sns
from plotly.subplots import make_subplots
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
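Because an install cell can leave the environment with different versions than expected, it can help to check what actually got installed. The following is a minimal sketch using the standard-library `importlib.metadata` module; the helper name `installed_versions` and the `PACKAGES` list are illustrative choices, not part of the original notebook.

```python
from importlib import metadata

# Distribution names to check; adjust to whatever the notebook installs.
PACKAGES = [
    "matplotlib", "numpy", "pandas", "seaborn",
    "plotly", "folium", "opendatasets", "wordcloud",
]


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(PACKAGES).items():
    print(f"{name}: {version or 'not installed'}")
```

If matplotlib reports an unexpectedly old version, restarting the runtime after the install cell is usually needed before the new version is picked up.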
jp-amith
Amith J Prakash, 6 months ago