PageRank | Notion

The web can be thought of as a directed graph:

In web searches (lots of sources of information), need to consider who to “trust”.

“Trustworthy” pages may point to each other
Web pages may not be as equally “important”
- e.g. www.stanford.edu vs. random personal site

There is a large diversity in web-graph node connectivity. We can rank the pages by link structure (PageRank).

PageRank

PageRank is a measure of web page quality based on the structure of the hyperlink graph.

More precisely, PageRank is a probability distribution over nodes in graph of web representing the likelihood that a random walk over the link structure will arrive at a particular node
- high in-degree ⇒ tend to have high PageRank, nodes linked to by other nodes have high PageRank
- complete formulation includes randomly jumping to a completely different page

Links on a page can be thought of as “votes”:

A page is more “important” if it has more votes - in-coming links
- e.g. www.stanford.edu has around 23,400 in-links, while personal website may have single digit in-links
Links from important pages should also count more

A basic recursive formulation of PageRank is as follows:

Each link “vote” is proportional to the importance of its own source page
if page $j$ with importance $r_j$ has $n$ out-links, each link gets $r_j/n$ votes
Page $j$’s importance is the sum of the votes on its in-links