Pandas map(), apply(), & applymap() methods
Pandas offers a variety of tools and methods that streamline data loading, preprocessing, and analysis, and it can comfortably process datasets with millions of rows.
map(), apply(), and applymap() are methods that let you transform the values of a Series or DataFrame without writing an explicit loop, which simplifies data processing. In this post, we will look at the use case for each method and how to implement it.
Let's create a sample DataFrame to work with.
import pandas as pd
a_dict = {'c1':[1,2,3], 'c2':[1,2,3],'c3':[1,2,3], 'c4':[1,2,3]}
df = pd.DataFrame.from_dict(a_dict, orient='columns')
df
map()
The map() method can be used either to apply a custom function to each element of a Series, or to substitute each value with another value looked up in a dictionary or Series.
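Syntax:
Series.map(arg, na_action=None)
Parameters:
- arg: function, collections.abc.Mapping subclass, or Series. The mapping correspondence.
- na_action: {None, 'ignore'}, default None. If 'ignore', NaN values are propagated without being passed to the mapping correspondence.
Returns:
Series with the same index as the caller.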
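To see the three kinds of arg side by side, here is a minimal sketch on a small Series; the names s and lookup are hypothetical and only used for illustration.

```python
import pandas as pd

s = pd.Series([1, 2, 3])

# arg as a dictionary: each value is looked up as a key
by_dict = s.map({1: 'ONE', 2: 'TWO', 3: 'THREE'})

# arg as a function: called once per element
by_func = s.map(lambda x: x * 10)

# arg as a Series: each value is looked up in that Series' index
lookup = pd.Series(['one', 'two', 'three'], index=[1, 2, 3])
by_series = s.map(lookup)

print(by_dict.tolist())    # ['ONE', 'TWO', 'THREE']
print(by_func.tolist())    # [10, 20, 30]
print(by_series.tolist())  # ['one', 'two', 'three']
```

In every case the result keeps the index of s, which is why it can be assigned straight back to a DataFrame column.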
1. Mapping sample values with a dictionary.
df
df['c1'] = df.c1.map({1: 'ONE', 2: 'TWO', 3: 'THREE'})
df
Note that when the argument is a dictionary, any value in the Series that is not a key in the dictionary is converted to NaN. In other words, if we don't provide a substitute for a value present in the Series, it becomes NaN.
df.c2 = df.c2.map({1:'ONE', 2:'TWO'})
df
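A small sketch of this behavior, on a hypothetical Series s: the unmapped value 3 turns into NaN. Filling the gaps from the original Series with fillna() is one common way to keep unmapped values unchanged; it is an extra technique, not something map() does by itself.

```python
import pandas as pd

s = pd.Series([1, 2, 3])

# 3 has no entry in the dictionary, so it becomes NaN
partial = s.map({1: 'ONE', 2: 'TWO'})
print(partial.isna().tolist())  # [False, False, True]

# Keep unmapped values as they were by filling NaN from the original Series
kept = s.map({1: 'ONE', 2: 'TWO'}).fillna(s)
```

Here kept holds 'ONE', 'TWO', and the original 3.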
This brings us to handling NaN values while mapping. We can pass the argument na_action='ignore' so that missing values are left as NaN instead of being passed to the function.
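A minimal sketch of why this matters, assuming a Series that already contains NaN: string concatenation fails on NaN (a float), so the function only works safely with na_action='ignore'.

```python
import numpy as np
import pandas as pd

s = pd.Series(['ONE', 'TWO', np.nan])

def add_lowercase(val):
    return val + '_' + val.lower()

# With na_action='ignore', NaN is passed through without calling the function
safe = s.map(add_lowercase, na_action='ignore')

# Without it, the function also receives NaN and float + str raises TypeError
raised = False
try:
    s.map(add_lowercase)
except TypeError:
    raised = True
print(raised)  # True
```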
2. Mapping sample values with a function.
def add_lowercase(val):
    # e.g. 'ONE' -> 'ONE_one'
    return val + '_' + val.lower()
df.c2.map(add_lowercase, na_action='ignore')
0 ONE_one
1 TWO_two
2 NaN
Name: c2, dtype: object
To make this change permanent, assign the mapped Series back to the original DataFrame column, e.g. df['c2'] = df.c2.map(add_lowercase, na_action='ignore').
apply()
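The apply() method works on both Series and DataFrames. On a Series it calls the function once per element; on a DataFrame it calls the function once per column or row, passing that column or row in as a Series. This makes it the more flexible choice for complex operations, such as aggregating a whole column or combining several values from the same row.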
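A short sketch contrasting the two cases on a hypothetical two-column DataFrame: Series.apply hands the function one value at a time, while DataFrame.apply hands it a whole column.

```python
import pandas as pd

df = pd.DataFrame({'c1': [1, 2, 3], 'c2': [4, 5, 6]})

# Series.apply: the function receives one element at a time
doubled = df['c1'].apply(lambda x: x * 2)
print(doubled.tolist())  # [2, 4, 6]

# DataFrame.apply: the function receives a whole column (a Series)
col_types = df.apply(lambda col: type(col).__name__)
print(col_types.tolist())  # ['Series', 'Series']

# ...so it can aggregate, e.g. the range of each column
col_range = df.apply(lambda col: col.max() - col.min())
print(col_range.tolist())  # [2, 2]
```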
a_dict = {'c1':[1,2,3], 'c2':[1,2,3],'c3':[1,2,3], 'c4':[1,2,3]}
sample_df = pd.DataFrame.from_dict(a_dict, orient='columns')
sample_df
1. pd.Series.apply
def power(val):
    # raise each value to the power of itself: 1 -> 1, 2 -> 4, 3 -> 27
    return val**val
sample_df['c5'] = sample_df['c3'].apply(power)
sample_df
For short, one-off transformations, it is common practice to pass a lambda to any of these three methods instead of defining a named function.
sample_df['c4'].apply(lambda x: x*100)
0 100
1 200
2 300
Name: c4, dtype: int64
Because we called apply() on a single column (a Series), the function runs once per element and the result is the same as what map() would return.
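A quick check of that claim, using a stand-alone copy of the c4 column (the variable names are illustrative):

```python
import pandas as pd

c4 = pd.Series([1, 2, 3], name='c4')

via_apply = c4.apply(lambda x: x * 100)
via_map = c4.map(lambda x: x * 100)

# With an element-wise function on a Series, both give the same result
print(via_apply.equals(via_map))  # True
print(via_map.tolist())           # [100, 200, 300]
```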
2. pd.DataFrame.apply
DataFrame.apply() applies a function along an axis of the DataFrame. The axis argument decides what the function receives: with axis=0 (the default) it is called once per column, and with axis=1 it is called once per row.
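One way to see what the function receives is to return each Series' name, which is the column label for axis=0 and the row label for axis=1. This is a sketch on a hypothetical DataFrame with string row labels r0 to r2.

```python
import pandas as pd

df = pd.DataFrame({'c1': [1, 2, 3], 'c2': [10, 20, 30]},
                  index=['r0', 'r1', 'r2'])

cols = df.apply(lambda s: s.name)          # axis=0 (default): one call per column
rows = df.apply(lambda s: s.name, axis=1)  # axis=1: one call per row
print(cols.tolist())  # ['c1', 'c2']
print(rows.tolist())  # ['r0', 'r1', 'r2']

# The same direction rule applies to aggregations
col_sums = df.apply(lambda s: s.sum())
row_sums = df.apply(lambda s: s.sum(), axis=1)
print(col_sums.tolist())  # [6, 60]
print(row_sums.tolist())  # [11, 22, 33]
```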
sample_df
def func(vals):
    # vals is a whole column (axis=0) or a whole row (axis=1) as a Series
    return vals.sum()
sample_df.apply(func) #column-wise aggregation creates an aggregated row
c1 6
c2 6
c3 6
c4 6
c5 32
dtype: int64
sample_df.apply(func, axis=1) #row-wise aggregation creates an aggregated column
0 5
1 12
2 39
dtype: int64
In other words, a column-wise aggregation produces one value per column, which can be stored as a new row, whereas a row-wise aggregation produces one value per row, which can be stored as a new column. We can see this by adding the results back to the DataFrame:
sample_df.loc[3] = sample_df.apply(func)  # new row labeled 3, matching the existing integer labels 0-2
sample_df
sample_df['c6'] = sample_df.apply(func, axis = 1)
sample_df
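A self-contained recap of the totals computed above, which also confirms the numbers in the outputs. The row label 'total' and column name 'row_total' are our own illustrative choices, not from the original notebook.

```python
import pandas as pd

sample_df = pd.DataFrame({'c1': [1, 2, 3], 'c2': [1, 2, 3],
                          'c3': [1, 2, 3], 'c4': [1, 2, 3]})
sample_df['c5'] = sample_df['c3'].apply(lambda v: v ** v)  # [1, 4, 27]

col_totals = sample_df.apply(lambda vals: vals.sum())          # one value per column
row_totals = sample_df.apply(lambda vals: vals.sum(), axis=1)  # one value per row
print(col_totals.tolist())  # [6, 6, 6, 6, 32]
print(row_totals.tolist())  # [5, 12, 39]

# Column totals become a new row; row totals (including that row) a new column
with_totals = sample_df.copy()
with_totals.loc['total'] = col_totals
with_totals['row_total'] = with_totals.apply(lambda vals: vals.sum(), axis=1)
```

The last entry of row_total is the grand total, 6 * 4 + 32 = 56.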
applymap()
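The applymap() method applies a function to every individual element of a DataFrame. It exists only on DataFrames; for a Series, use map() instead. (In pandas 2.1 and later, applymap() is deprecated and the same operation is available as DataFrame.map().)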
def fifth_power(val):
    # named fifth_power rather than pow to avoid shadowing Python's built-in pow()
    return val**5
a_dict = {'c1': [1, 2, 3], 'c2': [1, 2, 3], 'c3': [1, 2, 3], 'c4': [1, 2, 3]}
sample_df = pd.DataFrame.from_dict(a_dict, orient='columns')
sample_df
sample_df.applymap(lambda x: str(x) + '_')
sample_df.applymap(fifth_power)
The function passed to applymap() is applied to each element of the DataFrame individually.
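Because applymap() was renamed to DataFrame.map() in pandas 2.1, a sketch that runs on both older and newer versions can pick whichever method is available. The hasattr check is our own compatibility shim, not part of the pandas API.

```python
import pandas as pd

sample_df = pd.DataFrame({'c1': [1, 2, 3], 'c2': [1, 2, 3]})

# pandas >= 2.1 provides DataFrame.map; older versions only have applymap
elementwise = sample_df.map if hasattr(pd.DataFrame, 'map') else sample_df.applymap

suffixed = elementwise(lambda x: str(x) + '_')
fifth = elementwise(lambda x: x ** 5)

print(suffixed.values.tolist())  # [['1_', '1_'], ['2_', '2_'], ['3_', '3_']]
print(fifth.values.tolist())     # [[1, 1], [32, 32], [243, 243]]
```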