Consider the size of the web in terms of graph size:

We have a power law for the fraction of pages that have $k$ links, i.e. $P(k) \sim k^{-\gamma}$

Note Fibonacci binning ⇒ bin size increases by Fibonacci sequence, should use if plotting to log-log space.

Note Fibonacci binning ⇒ bin size increases by Fibonacci sequence, should use if plotting to log-log space.

image.png

It is clear that the web graph is big data, so we need Hadoop. We observe that many graph algorithms involve:

Thus, we must consider:

We will often do a BFS.

Single Source Shortest Path

In this problem, we want to find the shortest path from a single source node to all other nodes. Note that “shortest” in this context can refer to fewest links or the lowest total weight.