Unsupervised Learning

What is Unsupervised Learning?

Unsupervised learning is the second most widely used form of machine learning after supervised learning. The key difference is that the data contains only inputs (X), with no output labels (Y).

Supervised Learning	Unsupervised Learning
Data includes labels (X → Y)	Data has no labels (just X)
Algorithm learns "right answers"	Algorithm finds patterns on its own
Example: tumor size + diagnosis	Example: tumor size + age, no diagnosis

Instead of predicting a specific output, unsupervised learning finds structure, patterns, or something interesting in the data without being told what to look for.

Clustering

Clustering is a type of unsupervised learning that groups unlabeled data into clusters based on similarities the algorithm discovers itself.
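As a rough illustration of what "groups unlabeled data" can look like in practice, the sketch below runs k-means, one common clustering algorithm that these notes do not themselves describe, on a few made-up (tumor size, age) points with no diagnosis attached. The function name kmeans, the data values, and the choice to seed the centroids with the first k points are all assumptions made for the example.

```python
import math

def kmeans(points, k, iterations=10):
    """Plain k-means on a list of numeric tuples.

    Returns (assignments, centroids), where assignments[i] is the
    cluster index chosen for points[i].
    """
    # Assumption: seed with the first k points so the sketch is deterministic.
    centroids = [points[i] for i in range(k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return assignments, centroids

# Made-up unlabeled data: (tumor size, patient age), with no diagnosis column.
points = [(1.0, 25), (1.2, 30), (0.9, 28), (4.8, 60), (5.1, 65), (5.0, 62)]
labels, centers = kmeans(points, k=2)
print(labels)
```

The algorithm is never told which points belong together; the two groups emerge from the distances alone. In this toy data the age values are much larger than the size values and so dominate the distance, which is why real pipelines usually rescale features first.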

Application 1: Google News

Google News processes hundreds of thousands of articles daily and groups related stories together automatically.

For a story about panda twins born at a zoo, the algorithm notices that several articles share words like "panda," "twin," and "zoo" and clusters them together. No human tells the algorithm which words matter; it figures this out on its own. Doing this manually would be impossible given how many stories and topics appear each day.
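The idea of grouping articles by shared words can be sketched with a simple word-overlap score. This is not how Google News actually works (the notes do not say), and unlike a real system it treats every word as equally important. The headlines, the function names jaccard and group_headlines, and the 0.2 threshold are all invented for illustration.

```python
def jaccard(a, b):
    """Word overlap between two sets: size of intersection over size of union."""
    return len(a & b) / len(a | b)

def group_headlines(headlines, threshold=0.2):
    """Greedy grouping: each headline joins the first group whose
    founding headline overlaps with it enough, else starts a new group."""
    groups = []
    for text in headlines:
        words = set(text.lower().split())
        for group in groups:
            if jaccard(words, group["words"]) >= threshold:
                group["items"].append(text)
                break
        else:
            # No existing group was similar enough.
            groups.append({"words": words, "items": [text]})
    return [g["items"] for g in groups]

# Invented headlines; no topic labels are supplied anywhere.
headlines = [
    "Giant panda gives birth to twin cubs at zoo",
    "Zoo celebrates rare panda twin birth",
    "Central bank raises interest rates again",
    "Markets react as central bank raises rates",
]
story_groups = group_headlines(headlines)
for group in story_groups:
    print(group)
```

The two panda headlines end up in one group and the two bank headlines in another, purely because of the words they share.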

Application 2: DNA/Genetic Analysis

DNA microarray data shows gene expression levels across many individuals:

Each column = one person's DNA activity
Each row = a specific gene
Colors indicate how strongly genes are expressed

A clustering algorithm can group people into types (Type 1, Type 2, Type 3) based on genetic similarities. Researchers don't define the types in advance; the algorithm discovers them.
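A minimal sketch of this, assuming a tiny made-up expression matrix laid out as described (rows are genes, columns are people). It uses single-linkage agglomerative clustering, chosen here only for brevity; the notes do not say which algorithm is used on real microarray data. The values, the function name cluster_people, and the choice of three clusters are assumptions.

```python
import math

# Made-up expression levels: each row is a gene, each column is a person.
expression = [
    # p0   p1   p2   p3   p4   p5
    [0.9, 0.8, 0.1, 0.2, 0.5, 0.5],  # gene A
    [0.8, 0.9, 0.2, 0.1, 0.1, 0.2],  # gene B
    [0.1, 0.2, 0.9, 0.8, 0.1, 0.1],  # gene C
    [0.2, 0.1, 0.1, 0.2, 0.9, 0.8],  # gene D
]

# Transpose so each person becomes one vector of gene activity.
people = list(zip(*expression))

def cluster_people(vectors, n_clusters):
    """Single-linkage agglomerative clustering: start with one cluster
    per person and repeatedly merge the two closest clusters."""
    clusters = [[i] for i in range(len(vectors))]

    def gap(a, b):
        # Distance between clusters = distance between their closest members.
        return min(math.dist(vectors[i], vectors[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        pairs = [
            (a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        ]
        a, b = min(pairs, key=lambda ab: gap(clusters[ab[0]], clusters[ab[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

types = cluster_people(people, n_clusters=3)
print(types)  # each inner list holds the column indices of one discovered "type"
```

Nothing in the input says which person belongs to which type; people with similar expression profiles simply end up in the same list.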

Three Types of Unsupervised Learning

1. Clustering

2. Anomaly Detection