I started out with an amazing TensorFlow course (course GitHub). Before starting it, I wanted to get into deep learning and went through an existential crisis over which framework to practice with. I followed the usual procedure of googling "TF vs PyTorch" and reading a bunch of Medium articles and answers on Quora and Stack Overflow, but I still wasn't convinced enough to let go of either one. Then I read this in one of the Kaggle competitions:

People often ask a question - Keras/Tensorflow vs Pytorch?

Answer is simple - both. Many recent publications use both Keras and Pytorch. If you want to be flexible and understand how solutions work you should know both. This is why I encourage you start today ... and implement your first NN in Pytorch.

So I thought: why not both? Here I go, coding all the concepts from linear regression to time series in both TF and PyTorch. Link to the PyTorch implementation.

Links to all resources are at the end.

Multiclass classification (TensorFlow, Fashion MNIST)

Key points:

The input needs to be flattened (e.g. with tf.keras.layers.Flatten) before it is passed to Dense layers.
Use Categorical Cross Entropy as the loss if the targets are one-hot encoded; otherwise, for integer labels, use Sparse Categorical Cross Entropy (tf.one_hot can one-hot encode the targets).
The output layer should have as many neurons as there are target classes, and it should use softmax as its activation.
For shape errors, check the input shape, the output shape and the loss function.
To track validation loss and validation accuracy, pass the "validation_data" parameter to the fit method as a tuple of validation inputs and targets.
Neural networks train better on normalized data (e.g. pixel values scaled to the 0-1 range).
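The key points above can be tied together in a short sketch. It first checks that Categorical Cross Entropy on one-hot targets and Sparse Categorical Cross Entropy on integer labels give the same loss, then builds a minimal Fashion-MNIST-shaped classifier with Flatten, a softmax output, normalized inputs and validation_data. Assumptions: the hidden layer size (64) and epoch count are illustrative choices, and random stand-in images replace the real dataset so the sketch runs offline; tf.keras.datasets.fashion_mnist.load_data() would supply the real images.

```python
import numpy as np
import tensorflow as tf

# 1) Sparse vs. one-hot targets: both losses agree when the labels match.
labels = np.array([0, 2, 1])                        # integer class ids
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6],
                  [0.2, 0.5, 0.3]], dtype="float32")  # softmax-style outputs
one_hot = tf.one_hot(labels, depth=3)               # shape (3, 3)

sparse_loss = tf.keras.losses.SparseCategoricalCrossentropy()(labels, probs)
cat_loss = tf.keras.losses.CategoricalCrossentropy()(one_hot, probs)
print(float(sparse_loss), float(cat_loss))          # both about 0.520

# 2) Fashion-MNIST-shaped stand-in data (assumption: random pixels, 10 classes).
rng = np.random.default_rng(42)
x_train = rng.integers(0, 256, size=(512, 28, 28)).astype("float32")
y_train = rng.integers(0, 10, size=512)
x_val = rng.integers(0, 256, size=(128, 28, 28)).astype("float32")
y_val = rng.integers(0, 10, size=128)

# Normalize pixel values from 0-255 to 0-1.
x_train, x_val = x_train / 255.0, x_val / 255.0

tf.random.set_seed(42)
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),                        # (28, 28) -> (784,)
    tf.keras.layers.Dense(64, activation="relu"),     # hidden size is illustrative
    tf.keras.layers.Dense(10, activation="softmax"),  # one neuron per class
])
model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(),  # integer labels
              optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              metrics=["accuracy"])

# validation_data adds val_loss / val_accuracy to the training history.
history = model.fit(x_train, y_train, epochs=2, verbose=0,
                    validation_data=(x_val, y_val))
print(sorted(history.history))
```

With the real dataset, the same compile call works unchanged as long as the labels stay as integers; switching to one-hot labels means switching the loss to CategoricalCrossentropy.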

Creating the dataset:

I used scikit-learn's "make_circles" function to create a dataset of two concentric circles, each belonging to a separate class. The data is not linearly separable, so the neural network has to learn a non-linear decision boundary.

from sklearn.datasets import make_circles

n_samples = 1000

# create circles
X, y = make_circles(n_samples, noise=0.03, random_state=42)

import matplotlib.pyplot as plt
plt.scatter(X[:, 0], X[:, 1], c=y);
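A quick numeric check makes the "non-linear" claim concrete. In scikit-learn's make_circles, class 0 is the outer circle (radius 1) and class 1 is the inner circle (radius set by the default factor=0.8), so the distance from the origin separates the classes while no straight line in (x1, x2) can. The 0.9 threshold below is my own choice, halfway between the two radii.

```python
import numpy as np
from sklearn.datasets import make_circles

X, y = make_circles(1000, noise=0.03, random_state=42)

print(X.shape, y.shape)   # (1000, 2) (1000,)
print(np.bincount(y))     # [500 500]: two balanced classes

# Distance of each point from the origin.
radius = np.sqrt((X ** 2).sum(axis=1))
print(radius[y == 0].mean(), radius[y == 1].mean())  # about 1.0 and 0.8

# A threshold on the radius (assumption: 0.9) separates the classes almost
# perfectly, because the true boundary is a circle, not a line.
threshold = 0.9
radius_acc = np.mean((radius < threshold) == (y == 1))
print(radius_acc)
```

This is exactly the kind of feature a network has to build for itself from the raw (x1, x2) inputs.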

TensorFlow Implementation:

I started with a basic three-layer model, using Binary Cross Entropy as the loss function since this is a binary classification problem. The model didn't perform very well.

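import tensorflow as tf

# set seed
tf.random.set_seed(42)

# create model: the hidden layers have no activation functions (they are linear)
model = tf.keras.Sequential([
    tf.keras.layers.Dense(100),
    tf.keras.layers.Dense(10),
    tf.keras.layers.Dense(1, activation="sigmoid")  # one unit, since y holds 0/1 labels
])

# compile
model.compile(loss=tf.keras.losses.BinaryCrossentropy(),
              optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              metrics=['accuracy'])

# fit the model
history = model.fit(X, y, epochs=100, verbose=0)

# check loss and accuracy on the training data
model.evaluate(X, y)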

So let's plot the decision boundary the model learned for the two classes. We can see that the boundary is a straight line, while the data needs a circular one.

import numpy as np
import matplotlib.pyplot as plt

def plot_decision_boundry(model, X, y):
  # Build a 100x100 grid covering the data, with a small margin
  x_min, x_max = X[:, 0].min() - 0.1, X[:, 0].max() + 0.1
  y_min, y_max = X[:, 1].min() - 0.1, X[:, 1].max() + 0.1
  xx, yy = np.meshgrid(np.linspace(x_min, x_max, 100),
                       np.linspace(y_min, y_max, 100))

  # Create x values for making predictions: one (x1, x2) row per grid point
  x_in = np.c_[xx.ravel(), yy.ravel()]
  # Make predictions
  y_pred = model.predict(x_in, verbose=0)
  if len(y_pred[0]) > 1:
    print("doing multiclass classification")
    # pick the most probable class at each grid point
    y_pred = np.argmax(y_pred, axis=1).reshape(xx.shape)
  else:
    print("doing binary classification")
    # threshold the sigmoid probability at 0.5
    y_pred = np.round(y_pred).reshape(xx.shape)
  # plot boundary
  plt.contourf(xx, yy, y_pred, cmap=plt.cm.RdYlBu, alpha=0.6)
  plt.scatter(X[:, 0], X[:, 1], c=y, s=40, cmap=plt.cm.RdYlBu)
  plt.xlim(xx.min(), xx.max())
  plt.ylim(yy.min(), yy.max())

plot_decision_boundry(model, X, y)
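Why is the boundary a straight line? Dense layers without activations compose into a single affine map, so stacking them adds no expressive power. The sketch below builds a layer stack like the first model's (assumption: no activations at all and a single output unit) and checks that its output equals one folded x @ W + b. Since the output is affine in x, the set of points where it crosses any threshold (and where a sigmoid on top crosses 0.5) is a line.

```python
import numpy as np
import tensorflow as tf

tf.random.set_seed(42)
# Same layer sizes as the first model, with no activation functions anywhere.
linear_model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(100),
    tf.keras.layers.Dense(10),
    tf.keras.layers.Dense(1),
])

# Fold every (kernel, bias) pair into a single affine map x @ W + b:
# (x @ K1 + b1) @ K2 + b2 = x @ (K1 @ K2) + (b1 @ K2 + b2), and so on.
W = np.eye(2)
b = np.zeros(2)
for layer in linear_model.layers:
    if not isinstance(layer, tf.keras.layers.Dense):
        continue
    kernel, bias = layer.get_weights()
    W = W @ kernel
    b = b @ kernel + bias

x = np.random.default_rng(0).normal(size=(5, 2)).astype("float32")
deep_out = linear_model.predict(x, verbose=0)
single_out = x @ W + b
print(W.shape, np.allclose(deep_out, single_out, atol=1e-4))
```

The identity holds for any weights, trained or not, which is why more epochs alone cannot bend the boundary.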

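The fix is to add non-linear activations. In the next model the hidden layers use ReLU and the single output unit uses a sigmoid (softmax is its multiclass counterpart; on a one-unit output layer softmax would always return 1). With ReLU between the layers the network is no longer a single linear map, so it starts to pick up the non-linearity in the data.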
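The sigmoid/softmax relationship can be checked numerically: a sigmoid on one logit z equals the class-1 probability of a softmax over the two logits [0, z], while a softmax over a single unit is always 1. The logit values below are arbitrary examples.

```python
import numpy as np
import tensorflow as tf

z = np.array([-3.0, -0.5, 0.0, 1.2, 4.0], dtype="float32")  # example logits

# Sigmoid on a single output unit.
sigmoid = tf.sigmoid(z).numpy()

# Softmax over the pair [0, z]; take the probability of class 1.
pair = np.stack([np.zeros_like(z), z], axis=1)               # shape (5, 2)
softmax_class1 = tf.nn.softmax(pair, axis=-1).numpy()[:, 1]

# Softmax over a single unit normalizes each value by itself.
single_unit = tf.nn.softmax(z.reshape(-1, 1), axis=-1).numpy()

print(np.allclose(sigmoid, softmax_class1, atol=1e-6))  # True
print(single_unit.ravel())                              # all 1.0
```

So for binary labels with one output neuron, sigmoid is the right choice; softmax needs at least two output neurons.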

# Multilayer NN with non-linear hidden activations (ReLU) and a sigmoid output
# set seed
tf.random.set_seed(42)

# model
model_4 = tf.keras.Sequential([
    tf.keras.layers.Dense(4, activation="relu"),
    tf.keras.layers.Dense(4, activation="relu"),
    tf.keras.layers.Dense(1, activation=tf.keras.activations.sigmoid)
])

# compile
model_4.compile(loss="binary_crossentropy",
                optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                metrics=["accuracy"])

# fit model
history_4 = model_4.fit(X, y, epochs=250, verbose=0)

# plot the new decision boundary
plot_decision_boundry(model_4, X, y)
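The model above is trained and evaluated on the same points. A minimal sketch of a held-out check follows, using the validation_data parameter from the key points. Assumptions: the 80/20 split, the hypothetical name model_5, and 100 epochs (fewer than model_4's 250, to keep the sketch quick) are my choices; the architecture mirrors model_4.

```python
import tensorflow as tf
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split

X, y = make_circles(1000, noise=0.03, random_state=42)
# Hold out 20% of the points (assumption) to measure generalization.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

tf.random.set_seed(42)
# Hypothetical model_5: same architecture as model_4.
model_5 = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(4, activation="relu"),
    tf.keras.layers.Dense(4, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model_5.compile(loss="binary_crossentropy",
                optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                metrics=["accuracy"])

# val_loss / val_accuracy are recorded after every epoch.
history_5 = model_5.fit(X_train, y_train, epochs=100, verbose=0,
                        validation_data=(X_test, y_test))

test_loss, test_acc = model_5.evaluate(X_test, y_test, verbose=0)
print(f"test loss: {test_loss:.3f}, test accuracy: {test_acc:.3f}")
```

Comparing history_5.history["loss"] with history_5.history["val_loss"] shows whether the model is overfitting, and plot_decision_boundry(model_5, X_test, y_test) draws the boundary over the held-out points only.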