Missing data can mean many things, depending on where the data comes from and the context in which it was generated. For example, a salary field that is empty, or that holds the number 0, contains an invalid value and can be considered “missing data”.
falsy_values = (0, False, None, '', [], {}) #Python values that evaluate to False
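To make the salary example concrete, here is a minimal sketch of turning falsy placeholders into an explicit missing marker. The `raw_salaries` list is made up for illustration, and treating 0 as missing is an assumption that only holds when a zero salary is known to be invalid in your data.

```python
import numpy as np

# Hypothetical salary records: None, '' and 0 all stand for "unknown" here.
raw_salaries = [52000, 0, None, '', 61000]

# Assumption: a salary of 0 is invalid in this dataset, so every falsy
# value is replaced with np.nan and the rest are kept as floats.
cleaned = np.array([s if s else np.nan for s in raw_salaries], dtype=float)

print(cleaned)
```

If 0 is a legitimate value in your data, check for `None` and `''` explicitly instead of relying on truthiness.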
import numpy as np
np.nan #NumPy has a special "nullable" value for numbers: np.nan
3 + np.nan #Returns nan: arithmetic that involves nan yields nan
NumPy also supports an infinity value:
np.inf #Also spreads through most arithmetic, like a virus
3 + np.inf #Returns inf
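The propagation described above can be checked directly. This is a small sketch using only the values already introduced; the variable names are illustrative.

```python
import numpy as np

# A single nan is enough to turn the whole sum into nan.
total_with_nan = np.array([1.0, 2.0, np.nan]).sum()

# Infinity spreads through addition in the same way.
total_with_inf = np.array([1.0, 2.0, np.inf]).sum()

# Subtracting infinity from infinity is undefined, so the result is nan.
inf_minus_inf = np.inf - np.inf

# nan never compares equal to anything, including itself,
# which is why a dedicated check such as np.isnan is needed.
nan_equals_itself = np.nan == np.nan

print(total_with_nan, total_with_inf, inf_minus_inf, nan_equals_itself)
```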
Two functions, np.isnan and np.isinf, perform the corresponding checks:
np.isnan(np.nan) #Returns True
np.isinf(np.inf) #Returns True
####################
The joint operation is np.isfinite
####################
np.isfinite(np.nan), np.isfinite(np.inf) #Returns False for both
np.isnan(np.array([1,2,3,np.nan,np.inf,4]))
#array([False, False, False, True, False, False])
np.isinf(np.array([1,2,3,np.nan,np.inf,4]))
#array([False, False, False, False, True, False])
np.isfinite(np.array([1, 2, 3, np.nan, np.inf, 4]))
#array([ True, True, True, False, False, True])
When we operate on a NumPy array that we know contains missing values, we need to filter them out before proceeding, to avoid NaN propagation. For this purpose we combine np.isnan, shown above, with Boolean arrays.
a = np.array([1, 2, 3, np.nan, np.nan, 4]) #creates an array
a[~np.isnan(a)] #Keeps only the values that are not nan
a[np.isfinite(a)] #Same result here, because a contains no inf values
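To see why the filtering matters, the sketch below compares an aggregation before and after applying the Boolean mask. It reuses the array from above. `np.nanmean` is not part of the original walkthrough; it is shown only as an alternative NumPy function that skips nan values for you.

```python
import numpy as np

a = np.array([1, 2, 3, np.nan, np.nan, 4])

# Without filtering, the nan values propagate into the result.
unfiltered_mean = a.mean()

# The Boolean mask keeps only the four valid entries: 1, 2, 3, 4.
valid = a[~np.isnan(a)]
filtered_mean = valid.mean()

# Alternative (not in the original text): let NumPy ignore nan itself.
nan_aware_mean = np.nanmean(a)

print(unfiltered_mean, filtered_mean, nan_aware_mean)
```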
Like NumPy, pandas has a few utility functions for detecting null values:
import pandas as pd
pd.isnull(np.nan) #Returns True
pd.isnull(None) #Returns True
pd.isnull(pd.Series([1, np.nan, 7])) #These functions also work with series
pd.isnull(pd.DataFrame({
'Column A': [1, np.nan, 7],
'Column B': [np.nan, 2, 3],
'Column C': [np.nan, 2, np.nan]
})) #Made up data frame
pd.Series([1, 2, np.nan]).count() #Returns 2
pd.Series([1, 2, np.nan]).sum() #Returns 3.0
pd.Series([2, 2, np.nan]).mean() #Returns 2.0
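The results above follow from pandas skipping missing values by default in these aggregations. As a brief sketch, passing `skipna=False` switches back to NumPy-style propagation; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, np.nan])

# Default behaviour: the nan entry is ignored.
default_sum = s.sum()

# With skipna=False the nan propagates, as it does in plain NumPy.
strict_sum = s.sum(skipna=False)

# count() only counts the non-null entries.
non_null = s.count()

print(default_sum, strict_sum, non_null)
```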
As we saw with NumPy, we can combine Boolean selection with pd.isnull to filter out those NaN and null values: