Hello new readers!

Visit the updated blog post on my new personal website:

Math Behind Graph Neural Networks



I've heard numerous concerns over the two GDL posts on Distill and how they seem to be confusing for beginners. This blog post is my humble attempt at bridging the gaps.

Don't worry, I've added tons of diagrams and drawings to help visualise the whole thing! 🤠

Also, I explicitly avoid the actual math-heavy concepts like spectral graph theory. Maybe an article for the future! The bulk of this article is comprehensive as long as you know the very basics of regular Machine Learning.


Graph Deep Learning (GDL) has picked up its pace over the years. The natural network-like structure of many real-life problems makes GDL a versatile tool in the shed. The field has shown a lot of promise in social media, drug-discovery, chip placement, forecasting, bioinformatics, and more.

Here, I wish to provide a breakdown of popular Graph Neural Networks and their mathematical nuances – a very tiny survey, of sorts. Think of this as the continuation to my previous article on Graph Neural Networks that had no math at all.

<aside> ⭐ The idea behind Graph Deep Learning is to learn the structural and spatial features over a graph with nodes and edges that represent entities and their interactions.


Structure of this article

I start off by providing an in-depth breakdown of Graphs and Graph Neural Networks. Here, I deep dive into the granular steps one would take for the forward pass. Then, I move on to training these networks using familiar end-to-end techniques. Finally, I use the steps in the forward pass section as a framework or guideline to introduce popular Graph Neural Networks from the literature.

Representing Graphs

Before we get into Graph Neural Networks, let's explore what a graph is in Computer Science.

A graph $\mathcal{G}(V, E)$ is a data structure containing a set of vertices (nodes) $i \in V$and a set of edges $e_{ij} \in E$ connecting vertices $i$ and $j$. If two nodes $i$ and $j$ are connected, $e_{ij} = 1$, and $e_{ij} = 0$ otherwise. One can store this connection information in an Adjacency Matrix $A$: