Introduction

Dimensionality refers to the minimum number of coordinates needed to specify any point within a space or an object.

For example, to identify a student uniquely in a classroom, you might need both a first and a last name. If two students share the same first and last names, you could add the middle name as well. The dimensionality in that case is 3.

Dimensionality Reduction

Every form of data has to be converted into a feature set before it can be analyzed. For example, if you want to analyze images, they need to be converted into a form that a machine learning algorithm can use. This process is called feature extraction.

Keeping the feature set simple reduces dimensionality and complexity, making the data easier to analyze. However, the feature set might not be distinctive enough to tell data points apart. The opposite holds when dimensionality is high: the feature set distinguishes data points well, but the data might be harder to analyze.
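As a rough illustration of feature extraction and the trade-off above, the sketch below turns a small synthetic grayscale image into two different feature sets. The function names (`flatten_pixels`, `intensity_histogram`), the 8x8 image size, and the choice of 4 histogram bins are assumptions made for this example only.

```python
import numpy as np

def flatten_pixels(image):
    """Use every pixel intensity as one feature (high-dimensional)."""
    return np.asarray(image, dtype=float).ravel()

def intensity_histogram(image, bins=4):
    """Summarize the image as a normalized intensity histogram (low-dimensional)."""
    counts, _ = np.histogram(image, bins=bins, range=(0, 256))
    return counts / counts.sum()

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(8, 8))  # synthetic 8x8 grayscale image

pixels = flatten_pixels(image)      # 64 features: detailed, but harder to analyze
hist = intensity_histogram(image)   # 4 features: simple, but many images share it

print(pixels.shape)  # (64,)
print(hist.shape)    # (4,)
```

The flattened pixels keep all the information in the image, while the histogram is far simpler but can be identical for very different images.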

We have a hard time visualizing data in more than 3 dimensions. So, apart from simplifying data, reducing dimensionality can help us visualize high-dimensional data in a 2D or 3D space.
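One common way to get such a 2D view is PCA. The following is a minimal sketch, assuming NumPy is available and using synthetic data; `pca_project` is a hypothetical helper name, not part of any library.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project the rows of X onto the top principal components."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))          # synthetic data: 200 samples, 10 features
X_2d = pca_project(X, n_components=2)   # coordinates suitable for a 2D scatter plot

print(X_2d.shape)  # (200, 2)
```

Each row of `X_2d` can then be drawn as a point in an ordinary scatter plot.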

Manifold Learning Intuition

A manifold is a surface of any shape. It doesn’t have to be a plane; it can also be shaped like a folded sheet with all its curves. This idea is generalized to ‘n’ dimensions and formalized as a “manifold” in mathematics.

Manifold learning algorithms can be viewed as non-linear versions of PCA. For example, if your data points are distributed in the shape of a Swiss roll, PCA would not do a very good job, because PCA looks for a planar surface to describe the data and the surface of the roll is non-linear.
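The Swiss roll case can be checked numerically. The sketch below builds a synthetic roll with NumPy (the parameter range and height are assumptions chosen for illustration) and measures how well the first principal component tracks a point's position along the rolled-up sheet. It only shows where a linear projection falls short; it does not implement a manifold learning algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
t = rng.uniform(1.5 * np.pi, 4.5 * np.pi, size=n)  # position along the rolled sheet
height = rng.uniform(0.0, 21.0, size=n)            # position across the sheet
roll = np.column_stack([t * np.cos(t), height, t * np.sin(t)])

# PCA via SVD: project onto the single direction of largest variance.
centered = roll - roll.mean(axis=0)
_, _, Vt = np.linalg.svd(centered, full_matrices=False)
pc1 = centered @ Vt[0]

# If PCA "unrolled" the sheet, pc1 would follow t almost perfectly.
corr = np.corrcoef(pc1, t)[0, 1]
print(f"|correlation| between PC1 and position along the roll: {abs(corr):.2f}")
```

A weak correlation here means that points far apart along the sheet can land close together in the linear projection, which is the failure the paragraph above describes.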

Algorithms for this task are based on the idea that the dimensionality of many data sets is only artificially high. Although the data points may consist of thousands of features, they may be described as a function of only a few underlying parameters. That is, the data points are actually samples from a low-dimensional manifold that is embedded in a high-dimensional space. Manifold learning algorithms attempt to uncover these parameters in order to find a low-dimensional representation of the data. Reference
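To make the idea of "artificially high" dimensionality concrete, here is a small sketch under stated assumptions: every sample is generated from a single parameter `theta` (an angle on a circle) and then spread across 50 features by a made-up random linear map called `mixing`. Both names and all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, size=500)   # the single underlying parameter
circle = np.column_stack([np.cos(theta), np.sin(theta)])
mixing = rng.normal(size=(2, 50))                 # hypothetical embedding into 50 features
X = circle @ mixing                               # 500 samples, 50 features each

# A linear view of the data: how many directions are needed to span it?
linear_rank = np.linalg.matrix_rank(X - X.mean(axis=0))

print(X.shape)       # (500, 50)
print(linear_rank)   # 2
```

The data has 50 features, a linear method needs 2 directions to describe it, yet each point is fully determined by the one value `theta`. Recovering that kind of underlying parameter is what manifold learning algorithms aim for.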

Sources

Resources used for this page