Missing data can mean many things, depending on where the data comes from and how it was generated. For example, in a salary field, an empty value or the number 0 is invalid and can be considered “missing data”.

falsy_values = (0, False, None, '', [], {}) #Python values that evaluate to False in a Boolean context

np.nan #Numpy has a special "nullable" value for numbers which is np.nan

3 + np.nan #Returns nan: any arithmetic that touches nan yields nan
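
The propagation described above also applies to aggregations over a whole array. A minimal sketch, assuming a small made-up array (the variable names are illustrative only):

```python
import numpy as np

values = np.array([1.0, 2.0, np.nan])

total = values.sum()    # nan: a single missing value poisons the sum
mean = values.mean()    # nan for the same reason

# nan is not equal to anything, including itself (IEEE 754 behaviour),
# so == cannot be used to detect it
nan_equals_itself = np.nan == np.nan   # False
```

This is why a dedicated check such as np.isnan is needed rather than an equality comparison.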

Numpy also supports an infinite value:

np.inf #Also propagates through arithmetic, like a virus

3 + np.inf #Returns inf
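
As a rough illustration of how inf behaves in arithmetic, the sketch below uses a few hand-picked expressions (the variable names are only for illustration). Note that some operations on inf produce nan rather than inf:

```python
import numpy as np

grows = 3 + np.inf          # inf: adding a finite number keeps it infinite
cancels = np.inf - np.inf   # nan: the difference is undefined
shrinks = 1 / np.inf        # 0.0: dividing by infinity collapses to zero
```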

Checking for nan or inf -

Two functions, np.isnan and np.isinf, perform these checks:

np.isnan(np.nan) #Returns True

np.isinf(np.inf) #Returns True

####################
The joint operation is np.isfinite
####################

np.isfinite(np.nan), np.isfinite(np.inf) #Returns False for both

np.isnan(np.array([1,2,3,np.nan,np.inf,4])) 
#array([False, False, False,  True, False, False])

np.isinf(np.array([1,2,3,np.nan,np.inf,4]))
#array([False, False, False, False,  True, False])

np.isfinite(np.array([1, 2, 3, np.nan, np.inf, 4]))
#array([ True,  True,  True, False, False,  True])
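
To make the relationship between the three checks concrete, here is a small sketch on the same example array. The names finite_mask, combined_mask and n_bad are introduced only for this illustration:

```python
import numpy as np

data = np.array([1, 2, 3, np.nan, np.inf, 4])

# isfinite is True exactly where a value is neither nan nor inf
finite_mask = np.isfinite(data)
combined_mask = ~(np.isnan(data) | np.isinf(data))

same = np.array_equal(finite_mask, combined_mask)   # True

# Boolean masks can be summed to count the problematic entries
n_bad = int((~finite_mask).sum())                   # 2
```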

Filtering them out -

When we operate on a Numpy array that we know contains missing values, we need to filter them out first to avoid nan propagation. To do this, we combine np.isnan with Boolean array indexing.

a = np.array([1, 2, 3, np.nan, np.nan, 4]) #creates an array

a[~np.isnan(a)] #Keeps only the values that are not nan
a[np.isfinite(a)] #Same result here, since a has no inf values (isfinite would drop those too)
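
Once the missing values are filtered out, aggregations behave as expected. The sketch below reuses the array from above; np.nanmean is shown as one possible alternative that skips nan without an explicit filter, not as the only way to do it:

```python
import numpy as np

a = np.array([1, 2, 3, np.nan, np.nan, 4])

unfiltered_mean = a.mean()       # nan, because of propagation

clean = a[~np.isnan(a)]          # array([1., 2., 3., 4.])
filtered_mean = clean.mean()     # 2.5

nan_aware_mean = np.nanmean(a)   # 2.5, ignores nan internally
```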

Handling missing data with pandas -

Like numpy, pandas has a few utility functions to detect null values:

pd.isnull(np.nan) #Returns True

pd.isnull(None) #Returns True

pd.isnull(pd.Series([1, np.nan, 7])) #These functions also work with series

pd.isnull(pd.DataFrame({
    'Column A': [1, np.nan, 7],
    'Column B': [np.nan, 2, 3],
    'Column C': [np.nan, 2, np.nan]
})) #Made up data frame

pd.Series([1, 2, np.nan]).count() #Returns 2

pd.Series([1, 2, np.nan]).sum() #Returns 3.0

pd.Series([2, 2, np.nan]).mean() #Returns 2.0
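
The results above come from pandas skipping nan by default in its aggregations. A minimal sketch of that behaviour, using the skipna parameter to switch it off (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, np.nan])

default_sum = s.sum()               # 3.0, nan is skipped by default
strict_sum = s.sum(skipna=False)    # nan, propagation as in numpy

# Summing the Boolean mask counts the missing entries
missing_count = s.isnull().sum()    # 1
```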

Filtering missing data -

As we saw with numpy, we can combine Boolean selection with pd.isnull to filter out nan and null values: