Also include common EDA/preprocessing done with libraries such as Seaborn, Plotly, etc.

Checking for Missing Values Per Column with Pandas

#Count missing values per column (train is the DataFrame)
train.isnull().sum(axis=0)
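
A minimal sketch of the same check on a small made-up frame, with the percentage of missing values added next to the raw counts. The toy data and the names `missing_count`, `missing_pct` and `summary` are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for train (made-up values)
train = pd.DataFrame({
    "Phrase": ["good", None, "bad", "fine"],
    "Sentiment": [3, 1, np.nan, np.nan],
})

# Number of missing cells in each column
missing_count = train.isnull().sum(axis=0)
# Share of missing cells in each column, as a percentage
missing_pct = train.isnull().mean(axis=0) * 100

summary = pd.DataFrame({"missing": missing_count, "percent": missing_pct})
print(summary.sort_values("missing", ascending=False))
```

Sorting by the count puts the columns that need the most attention at the top.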

Plot Value Counts of Columns for Numerical Variables with Seaborn

#Visualize how many reviews per Sentiment
import seaborn as sns
sentiment_counts = train.Sentiment.value_counts()
sns.barplot(x=sentiment_counts.index, y=sentiment_counts.values)

Drop Rows Based on a Column's Numeric Value

#Drop fares with values less than 0
df = df[df.fare_amount >= 0]
df.describe()
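
A small sketch of the same filter on made-up fares, to show what the boolean mask keeps. The toy values and the `rows_before` name are assumptions; resetting the index is optional and only tidies the row labels after filtering.

```python
import pandas as pd

# Toy fares (made-up values), including one negative entry
df = pd.DataFrame({"fare_amount": [12.5, -3.0, 0.0, 7.25]})

rows_before = len(df)
# Keep only rows where the fare is zero or positive
df = df[df["fare_amount"] >= 0].reset_index(drop=True)

print(f"Dropped {rows_before - len(df)} row(s)")
print(df)
```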

Plot Value Counts of a Column In a Dataframe

df['class'].value_counts().sort_values().plot(kind = 'barh')

Encoding a Pandas DataFrame with All Categorical Variables

#All variables are categorical, so every column needs encoding
from sklearn.preprocessing import LabelEncoder
def encodeCategorical(data):
    labelencoder=LabelEncoder()
    for col in data.columns:
        data[col] = labelencoder.fit_transform(data[col])
    return data
df = encodeCategorical(df)
df.head()
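
One limitation of the function above is that it reuses a single encoder and overwrites the input frame, so the integer-to-label mapping of each column is lost. A possible variant, sketched here with a hypothetical helper `encode_categorical_with_mapping` and made-up data, keeps one fitted `LabelEncoder` per column so the codes can be turned back into labels later.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

def encode_categorical_with_mapping(data):
    """Label-encode every column of a copy of data and keep each fitted encoder."""
    encoded = data.copy()  # leave the caller's frame untouched
    encoders = {}
    for col in encoded.columns:
        encoders[col] = LabelEncoder()
        encoded[col] = encoders[col].fit_transform(encoded[col])
    return encoded, encoders

# Toy frame (made-up values)
toy = pd.DataFrame({"color": ["red", "blue", "red"], "size": ["S", "M", "L"]})
encoded, encoders = encode_categorical_with_mapping(toy)

# Recover the original labels from the integer codes
original_colors = encoders["color"].inverse_transform(encoded["color"].to_numpy())
print(encoded)
print(list(original_colors))
```

Note that `LabelEncoder` assigns codes in sorted label order, so the integers carry no meaningful ranking.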

Selecting a subset of columns for X, y split

X = df[['artist','Genre/Mood','Language','release_year','popularity']]
y = df['name']

Ordinal Encoding

from sklearn.preprocessing import OrdinalEncoder
enc = OrdinalEncoder()
enc.fit(df[["Sex","Blood", "Study"]])
df[["Sex","Blood", "Study"]] = enc.transform(df[["Sex","Blood", "Study"]])

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/fad7573f-6545-4d85-8b7f-012fba8cb19e/Screen_Shot_2020-08-19_at_4.23.21_PM.png
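
By default `OrdinalEncoder` orders categories by sorting them, which may not match their real ranking. A minimal sketch, assuming a made-up `Study` column with three education levels, passes the order explicitly through the `categories` parameter. The category names and the `study_order` list are hypothetical and not taken from the original data.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Toy column (made-up values)
toy = pd.DataFrame({"Study": ["Bachelor", "High School", "Master", "Bachelor"]})

# Assumed ranking from lowest to highest; one list per encoded column
study_order = ["High School", "Bachelor", "Master"]

enc = OrdinalEncoder(categories=[study_order])
# fit_transform returns a 2D float array; flatten it to fill a single column
toy["Study_encoded"] = enc.fit_transform(toy[["Study"]]).ravel()
print(toy)
```

With the explicit order, the encoded integers follow the intended ranking rather than alphabetical order.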

Smarter Ways to Encode Categorical Data for Machine Learning

Using OrdinalEncoder to transform categorical values in Python