The web can be thought of as a directed graph:

In web searches (lots of sources of information), need to consider who to “trust”.
- “Trustworthy” pages may point to each other
- Web pages may not be as equally “important”
There is a large diversity in web-graph node connectivity. We can rank the pages by link structure (PageRank).
PageRank
PageRank is a measure of web page quality based on the structure of the hyperlink graph.
- More precisely, PageRank is a probability distribution over nodes in graph of web representing the likelihood that a random walk over the link structure will arrive at a particular node
- high in-degree ⇒ tend to have high PageRank, nodes linked to by other nodes have high PageRank
- complete formulation includes randomly jumping to a completely different page
Links on a page can be thought of as “votes”:
- A page is more “important” if it has more votes - in-coming links
- e.g. www.stanford.edu has around 23,400 in-links, while personal website may have single digit in-links
- Links from important pages should also count more

A basic recursive formulation of PageRank is as follows:
- Each link “vote” is proportional to the importance of its own source page
- if page $j$ with importance $r_j$ has $n$ out-links, each link gets $r_j/n$ votes
- Page $j$’s importance is the sum of the votes on its in-links