Unsupervised learning consists of the set of statistical tools used in the setting in which we have only a set of $p$ features measured on $n$ observations, but no associated response variable $Y$. The goal is to discover interesting patterns in the measurements on $X_1, X_2,\dots, X_p$. The two main types of unsupervised learning are clustering and dimensionality reduction. We will focus on principal components analysis, a tool used for preprocessing and data visualization, and on two clustering methods, k-means and hierarchical clustering, for discovering unknown subgroups in data.
Unsupervised learning is often more challenging than supervised learning. The analysis tends to be more subjective, since there is no simple goal such as predicting a response, as is often the case in supervised learning. Unsupervised learning is often performed as part of exploratory data analysis.
The main difference is this: in supervised learning, we have universally accepted mechanisms for checking our work, such as cross-validation or validating results on an independent data set. We can see how well our predictive model performs on observations not used in fitting the model. In unsupervised learning, there is no comparable way to check our work, because there is no true answer; the problem is unsupervised.
When faced with a large set of correlated variables, principal components allow us to summarize this set with a smaller number of representative variables that collectively explain most of the variability in the original set. PCA is the process by which principal components are computed; you'll learn how in the next few sections.
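To make the idea concrete, here is a minimal sketch of computing principal components via the singular value decomposition, using a small synthetic data matrix (the data and variable names are illustrative, not from the text):

```python
import numpy as np

# Synthetic data: n = 100 observations on p = 5 features (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))

# Center each feature; PCA is defined on mean-centered variables.
X = X - X.mean(axis=0)

# SVD of the centered matrix: the columns of Vt.T are the principal
# component loading vectors, and U * S gives the component scores.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
scores = U * S

# Proportion of variance explained (PVE) by each component,
# in decreasing order; the proportions sum to 1.
pve = S**2 / np.sum(S**2)
print(pve.round(3))
```

The first few components often capture most of the variability, so plotting the first two score columns gives a low-dimensional view of the data.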
Clustering refers to a broad set of techniques for finding subgroups, or "clusters", within a data set. When we cluster the observations of a data set, we seek to partition them into distinct groups so that the observations within each group are similar to each other, while observations in different groups are quite different from each other.
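As a concrete illustration, here is a minimal from-scratch sketch of the k-means algorithm on two well-separated synthetic clusters (the data, the `kmeans` helper, and its parameters are illustrative assumptions, not from the text):

```python
import numpy as np

# Two well-separated synthetic clusters in two dimensions (illustrative).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, size=(50, 2)),
               rng.normal(5, 0.5, size=(50, 2))])

def kmeans(X, k, n_iter=20, seed=0):
    """Plain k-means: alternate between assigning points to the nearest
    center and recomputing each center as the mean of its points."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every observation to every center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Update each center; keep the old one if its cluster is empty.
        centers = np.array([X[labels == j].mean(axis=0)
                            if np.any(labels == j) else centers[j]
                            for j in range(k)])
    return labels, centers

labels, centers = kmeans(X, k=2)
```

With clusters this well separated, the algorithm recovers the two groups; on real data the result can depend on the random initialization, so the algorithm is typically run several times from different starts.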