Numerical Computing with Python and Numpy
Part 6 of "Data Analysis with Python: Zero to Pandas"
This tutorial is the sixth in an introductory series on programming and data analysis using the Python language. These tutorials take a practical, coding-based approach, and the best way to learn the material is to execute the code and experiment with the examples.
How to run the code
This tutorial is hosted on Jovian.ml, a platform for sharing data science projects online. You can "run" this tutorial and experiment with the code examples in a couple of ways: using free online resources (recommended) or on your own computer.
This tutorial is a Jupyter notebook - a document made of "cells", which can contain explanations in text or code written in Python. Code cells can be executed and their outputs (e.g. numbers, messages, graphs, tables, files) can be viewed within the notebook, which makes it a really powerful platform for experimentation and analysis. Don't be afraid to experiment with the code & break things - you'll learn a lot by encountering and fixing errors. You can use the "Kernel > Restart & Clear Output" menu option to clear all outputs and start again from the top of the notebook.
Option 1: Running using free online resources (1-click, recommended)
The easiest way to start executing this notebook is to click the "Run" button at the top of this page, and select "Run on Binder". This will run the notebook on mybinder.org, a free online service for running Jupyter notebooks. You can also select "Run on Colab" or "Run on Kaggle", but you'll need to create an account on Google Colab or Kaggle to use these platforms.
Option 2: Running on your computer locally
You'll need to install Python and download this notebook on your computer to run it locally. We recommend using the Conda distribution of Python. Here's what you need to do to get started:
- Install Conda by following these instructions. Make sure to add Conda binaries to your system PATH to be able to run the conda command line tool from your Mac/Linux terminal or Windows command prompt.
- Create and activate a Conda virtual environment called zerotopandas which you can use for this tutorial series:
conda create -n zerotopandas -y python=3.8
conda activate zerotopandas
You'll need to create the environment only once, but you'll have to activate it every time you want to run the notebook. When the environment is activated, you should be able to see the prefix (zerotopandas) within your terminal or command prompt.
- Install the required Python libraries within the environment by running the following command on your terminal or command prompt:
pip install jovian jupyter numpy pandas matplotlib seaborn --upgrade
- Download the notebook for this tutorial using the jovian clone command:
jovian clone aakashns/python-numerical-computing-with-numpy
The notebook is downloaded to the directory python-numerical-computing-with-numpy.
- Enter the project directory and start the Jupyter notebook:
cd python-numerical-computing-with-numpy
jupyter notebook
- You can now access Jupyter's web interface by clicking the link that shows up on the terminal or by visiting http://localhost:8888 in your browser. Click on the notebook python-numerical-computing-with-numpy.ipynb to open it and run the code. If you want to type out the code yourself, you can also create a new notebook using the "New" button.
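Before opening the notebook, it can help to confirm that the libraries from the pip install step are visible to the active environment. The snippet below is a minimal sketch using only the standard library; the helper name missing_packages is hypothetical, and the sketch assumes each package can be imported under the same name used on the pip command line.

```python
import importlib.util

# Package names taken from the pip install command above.
# Assumption: each one is importable under the same name.
REQUIRED = ["jovian", "jupyter", "numpy", "pandas", "matplotlib", "seaborn"]

def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_packages(REQUIRED)
if missing:
    print("Not found in this environment:", ", ".join(missing))
else:
    print("All required libraries were found.")
```

If any names are reported as not found, the usual cause is that the zerotopandas environment was not activated before running pip install.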
Working with numerical data
The "data" in Data Analysis typically refers to numerical data e.g. stock prices, sales figures, sensor measurements, sports scores, database tables etc. The Numpy library provides specialized data structures, functions and other tools for numerical computing in Python. Let's work through an example to see why & how to use Numpy for working with numerical data.
Let's say we want to use climate data like the temperature, rainfall and humidity in a region to determine if the region is well suited for growing apples. A really simple approach for doing this would be to formulate the relationship between the annual yield of apples (tons per hectare) and the climatic conditions like the average temperature (in degrees Fahrenheit), rainfall (in millimeters) & average relative humidity (in percentage) as a linear equation.
yield_of_apples = w1 * temperature + w2 * rainfall + w3 * humidity
We're expressing the yield of apples as a weighted sum of the temperature, rainfall and humidity. Obviously, this is an approximation, since the actual relation may not necessarily be linear. But a simple linear model like this often works well in practice.
Based on some statistical analysis of historical data, we might be able to come up with reasonable values for the weights w1, w2 and w3. Here's an example set of values:
w1, w2, w3 = 0.3, 0.2, 0.5
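As a rough sketch of how the linear equation and these weights fit together, the formula can be wrapped in a small function. The function name predict_yield and the climate readings used below are hypothetical, chosen only to illustrate the arithmetic; they are not measurements for any real region.

```python
# Weights from the example above.
w1, w2, w3 = 0.3, 0.2, 0.5

def predict_yield(temperature, rainfall, humidity):
    """Return the predicted apple yield (tons per hectare) as a weighted sum."""
    return w1 * temperature + w2 * rainfall + w3 * humidity

# Hypothetical climate readings, made up for illustration only.
example_temperature = 70  # degrees Fahrenheit
example_rainfall = 60     # millimeters
example_humidity = 40     # percent

example_yield = predict_yield(example_temperature, example_rainfall, example_humidity)
print(f"Predicted yield: {example_yield:.1f} tons per hectare")
```

For these made-up inputs the weighted sum works out to 0.3 * 70 + 0.2 * 60 + 0.5 * 40 = 53 tons per hectare. Each weight controls how strongly its climate variable influences the prediction.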
Given some climate data for a region, we can now predict what the yield of apples in the region might look like. Here's some sample data:
To begin, we can define some variables to record the climate data for a region.