Support Vector Machine:

It is a supervised classification algorithm used to analyze data. The support vector machine was invented by Vladimir Vapnik.

As a first approximation, what Support Vector Machines do is find a separating line, or more generally a hyperplane, between data of two classes. So, suppose we have some data of two different classes. A Support Vector Machine is an algorithm that takes this data as input and outputs a line that separates those classes in the best way, if possible.

The margin is the distance between the line and the nearest point of either of the two classes. The hyperplane made by SVM maximizes the margin.
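
To make the margin concrete, here is a minimal sketch (the toy points and variable names are invented for illustration, not taken from the original example). For a linear kernel, scikit-learn's SVC exposes the normal vector w of the separating hyperplane as clf.coef_, and the margin as defined above works out to 1 / ||w||:

import numpy as np
from sklearn.svm import SVC

# two tiny, linearly separable classes (made-up illustration data)
X = np.array([[1, 1], [2, 1], [1, 2], [6, 6], [7, 6], [6, 7]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear")
clf.fit(X, y)

w = clf.coef_[0]                 # normal vector of the separating hyperplane
print(1 / np.linalg.norm(w))     # distance from the hyperplane to the nearest points, i.e. the margin
print(clf.support_vectors_)      # the nearest points, which define the margin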

Now, to understand what the correct result of a support vector machine looks like, look at the example below:

Line A does maximize the margin to the data points in some sense, but it makes a classification error: the red x is on the wrong side of the green line. With line B, on the other hand, all the points are classified correctly.

The support vector machine puts the correct classification of the labels first and foremost, and only then maximizes the margin. So with support vector machines you try to classify correctly and, subject to that constraint, you maximize the margin.

If there are individual points that cannot be classified correctly while still keeping a large margin, the SVM can treat them as outliers and safely ignore them.
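
For reference, this "classify correctly first, maximize the margin, but tolerate the occasional outlier" behaviour corresponds to the standard soft-margin formulation (background I am adding here, not spelled out in the original post): minimize ½‖w‖² + C·Σ ξi subject to yi(w·xi + b) ≥ 1 − ξi and ξi ≥ 0, where each slack variable ξi measures how far point i violates the margin and C (discussed further below) controls how heavily such violations are penalized.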

Support Vector Machine Algorithm:

SVM, just like Naive Bayes, is a classification algorithm for binary (two-class) and multi-class classification problems. The technique is easiest to understand when described using binary or categorical input values.

  1. Create some training points.
  2. Import SVC from sklearn.svm.
  3. Create the classifier.
  4. Fit the classifier on the training features and labels; i.e., training, where we actually give it the training data and it learns the patterns.
  5. Predict the labels for new points.
  6. Calculate the accuracy.

Example code:

from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# create the classifier with a linear kernel
clf = SVC(kernel="linear")

# train it on the training features and labels
clf.fit(features_train, labels_train)

# predict labels for the test points
pred = clf.predict(features_test)

# compare the true test labels against the predictions
acc = accuracy_score(labels_test, pred)
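
The snippet above assumes that features_train, labels_train, features_test and labels_test already exist. As a self-contained sketch (the make_blobs data below is invented for illustration and is not the data used in the original post), they could be created like this:

from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split

# two well-separated clouds of points as stand-in training data
features, labels = make_blobs(n_samples=200, centers=2, random_state=42)
features_train, features_test, labels_train, labels_test = train_test_split(
    features, labels, test_size=0.25, random_state=42)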

Applying this algorithm to different data sets, we get a decision boundary that classifies the points, predicts which group each new point belongs to, and lets us measure the accuracy of our prediction algorithm using sklearn.

If we apply this algorithm to the self-driving car problem discussed in the previous post, we get a decision boundary like this:

SVM can produce some really complicated decision boundary shapes, sometimes even more complicated than you want.

SVM can still give us a linearly separable problem if we add polynomial features when the data itself is non-linear. For example, adding a new feature z built from the original two dimensions (such as z = x² + y²) turns the problem into one where the classes can be separated linearly in the new feature space.
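
As a rough sketch of that idea (the circular data below is invented for illustration), points inside a circle cannot be separated from points outside it by a straight line in x, y, but once the extra feature z = x² + y² is added, a linear SVM can split them using z alone:

import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = rng.uniform(-1, 1, size=(200, 2))             # random points in the plane
y = (X[:, 0]**2 + X[:, 1]**2 < 0.5).astype(int)   # label = inside / outside a circle

z = (X[:, 0]**2 + X[:, 1]**2).reshape(-1, 1)      # the new z feature
X_with_z = np.hstack([X, z])

clf = SVC(kernel="linear")
clf.fit(X_with_z, y)                              # the data is linearly separable once z is added
print(clf.score(X_with_z, y))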

Important parameters for an SVM

  1. Kernel:

    There are functions that take a low dimensional input space, or feature space, and map it to a much higher dimensional space, so that what used to be non-linearly separable becomes a separable problem. These functions are called kernels. They are not functions over a single feature vector; they are functions over two inputs. When you apply the kernel trick, you change your input space from x, y to a much larger feature space, separate the data points with a support vector machine there, and then take the solution back to the original space, where you now have a non-linear separation. In scikit-learn the kernel can be linear, poly, rbf, sigmoid, precomputed, or a callable. (A short sketch after this list shows the kernel, C and gamma parameters in code.)

  2. C:

    The C parameter controls the tradeoff between a smooth decision boundary and one that classifies all the training points correctly. A large value of C means you are going to get more training points correct; in practice, that gives you the more intricate decision boundaries, where the boundary can wiggle around individual data points to try to get everything right.

  3. Gamma:

    Gamma defines how far the influence of a single training example reaches. With a high value of gamma, the exact shape of the decision boundary depends only on the closest points, and the faraway points are essentially ignored. With a low value of gamma, even the faraway points are taken into account, which generally gives a smoother decision boundary.
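
As mentioned in the kernel item above, here is a combined sketch of the kernel, C and gamma parameters in scikit-learn. The make_circles data and the specific parameter values are my own illustration, not from the original post; they are chosen only to make the contrasts visible.

from sklearn.datasets import make_circles
from sklearn.svm import SVC

# data that is not linearly separable: one class forms a ring around the other
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# kernel: linear struggles here, while rbf maps the data implicitly and separates it
print(SVC(kernel="linear").fit(X, y).score(X, y))
print(SVC(kernel="rbf").fit(X, y).score(X, y))

# C: small C prefers a smooth boundary, large C tries to get every training point right
print(SVC(kernel="rbf", C=0.01).fit(X, y).score(X, y))
print(SVC(kernel="rbf", C=1000.0).fit(X, y).score(X, y))

# gamma: high gamma lets only nearby points shape the boundary, low gamma smooths it out
print(SVC(kernel="rbf", gamma=100.0).fit(X, y).score(X, y))
print(SVC(kernel="rbf", gamma=0.001).fit(X, y).score(X, y))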

Overfitting

It is a common phenomenon in machine learning that happens when you take your data too literally, and your machine learning algorithm produces something much more complex than it needs to be, as opposed to something very simple.

So, in machine learning we really want to avoid over-fitting. One of the ways you can control over-fitting is through the parameters of your algorithm.
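
One common way to do that, shown here as a hedged sketch (a grid search over C and gamma is my own suggestion, not something prescribed in the original post; it reuses the made-up features_train and labels_train from the earlier example), is to search over the SVM parameters with cross-validation:

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# try a small grid of C and gamma values and keep the combination
# that generalizes best under 5-fold cross-validation
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(features_train, labels_train)

print(search.best_params_)                       # the least over-fitting combination found
print(search.score(features_test, labels_test))  # accuracy on data the model has not seen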

Advantages of support vector machines:

  • Effective in high dimensional spaces.
  • Still effective in cases where the number of dimensions is greater than the number of samples.
  • Uses a subset of training points in the decision function (called support vectors), so it is also memory efficient.
  • Versatile: different Kernel functions can be specified for the decision function. Common kernels are provided, but it is also possible to specify custom kernels.

Disadvantages of support vector machines include:

  • If the number of features is much greater than the number of samples, the method is likely to give poor performance and be very slow.
  • SVMs do not directly provide probability estimates; these are calculated using an expensive five-fold cross-validation.
