A Support Vector Machine (SVM) is a supervised model that can be used for both classification and regression problems. In classification, an SVM finds a hyperplane that best separates the data into two classes. For instance, a hyperplane in two dimensions is simply a line; in three dimensions, it is a plane. Since there are many possible hyperplanes that partition the data, we need to find the best one. The intuition behind the SVM is that the best hyperplane is the one with the largest margin, that is, the greatest distance to the nearest data points of either class. These nearest points, which lie on the margin lines, are called support vectors.
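To make this concrete, here is a minimal sketch, assuming scikit-learn's `SVC` and a made-up toy dataset, that fits a maximum-margin linear SVM and reads off the hyperplane coefficients and the support vectors:

In [ ]:

import numpy as np
from sklearn.svm import SVC

# Toy 2-D dataset (made up for illustration): two well-separated blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=(-2.0, -2.0), scale=0.5, size=(20, 2)),
               rng.normal(loc=(2.0, 2.0), scale=0.5, size=(20, 2))])
y = np.array([0] * 20 + [1] * 20)

# A large C approximates a hard-margin SVM on separable data
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

# In two dimensions the hyperplane is the line w . x + b = 0
w, b = clf.coef_[0], clf.intercept_[0]
print("hyperplane: %.2f*x1 + %.2f*x2 + %.2f = 0" % (w[0], w[1], b))

# The support vectors are the training points lying on the margin
print("support vectors:\n", clf.support_vectors_)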
The hyperplane can also be interpreted in another way: the distance between the optimal hyperplane and a new point $x'$ tells us how much confidence we have in our prediction. If the distance is large, we can be fairly certain about which of the two classes $x'$ belongs to. Conversely, our prediction confidence is low whenever $x'$ lies close to the decision boundary.
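Continuing from the cell above, the sketch below scores two hypothetical points with `decision_function`, which for a linear `SVC` returns the signed value of $w \cdot x + b$; dividing by $\lVert w \rVert$ converts it into a geometric distance:

In [ ]:

# Hypothetical new points: one deep inside a class, one near the boundary
x_new = np.array([[2.5, 2.5],
                  [0.1, -0.1]])

# decision_function returns the signed value of w . x + b;
# dividing by ||w|| converts it into a geometric distance
distances = clf.decision_function(x_new) / np.linalg.norm(clf.coef_)
for point, d in zip(x_new, distances):
    print(point, "-> signed distance: %.2f" % d)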
The idea above requires the classes to be linearly separable. However, by employing the kernel trick, an SVM can also handle datasets that are not linearly separable. A kernel is a function that implicitly projects the data into a higher-dimensional space, where a linear separator may exist.
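To illustrate, the sketch below is an assumed example using scikit-learn's `make_circles`, two concentric rings that no straight line can separate, and compares a linear-kernel SVM with an RBF-kernel one: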
In [ ]:
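from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: no straight line separates them in 2-D
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# The RBF kernel implicitly maps points into a higher-dimensional space
# where the two rings become linearly separable
linear_clf = SVC(kernel="linear").fit(X, y)
rbf_clf = SVC(kernel="rbf").fit(X, y)

# Training accuracy only, just to show the gap between the two kernels
print("linear kernel accuracy:", linear_clf.score(X, y))
print("rbf kernel accuracy:   ", rbf_clf.score(X, y))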