SUPERVISED LEARNING:

Supervised learning is where you have input variables (x) and an output variable (Y) and you use an algorithm to learn the mapping function from the input to the output.

Y = f(X)

The goal is to approximate the mapping function so well that when you have new input data (x) you can predict the output variables (Y) for that data.

It is called supervised learning because the process of an algorithm learning from the training dataset can be thought of as a teacher supervising the learning process. We know the correct answers; the algorithm iteratively makes predictions on the training data and is corrected by the teacher. Learning stops when the algorithm achieves an acceptable level of performance.

Supervised learning problems can be further grouped into regression and classification problems.

Classification: A classification problem is when the output variable is a category, such as “red” or “blue” or “disease” and “no disease”.

One of the uses of supervised classification is self-driving cars. The car needs to be trained when to go fast and when to slow down based on the road terrain: go fast when the road is smooth and slow down when it's bumpy.

As an example, suppose we take 750 points in our scatter plot for the self-driving car.
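
The exact data set is not reproduced here, but as a rough sketch of what those 750 points could look like (the feature names, the fixed seed, and the labeling rule below are invented purely for illustration), they might be generated with NumPy like this:

>>> import numpy as np
>>> rng = np.random.RandomState(42)                 # fixed seed so the sketch is reproducible
>>> grade = rng.uniform(0.0, 1.0, 750)              # hypothetical feature: steepness of the road
>>> bumpy = rng.uniform(0.0, 1.0, 750)              # hypothetical feature: bumpiness of the terrain
>>> features = np.column_stack([grade, bumpy])      # one row per point, one column per feature
>>> labels = (grade + bumpy > 1.0).astype(int)      # 1 = terrain where we must go slow, 0 = fast (arbitrary rule)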

What our machine learning algorithm does is define what's called a decision surface. The goal is to draw a decision boundary that helps us distinguish the terrain where we need to go slow from the terrain where we can go really fast; that means being able to draw a boundary that divides the two classes.

Once we have drawn this decision boundary between the two classes, we can immediately classify any arbitrary point as terrain where we have to go slow or terrain where we can drive really fast.

To draw this decision boundary we use the Gaussian Naive Bayes algorithm, implemented with the Scikit-learn (sklearn) Python library.

Gaussian Naive Bayes Algorithm:

Naive Bayes is a classification algorithm for binary (two-class) and multi-class classification problems. The technique is easiest to understand when described using binary or categorical input values.

This algorithm is divided into two phases, Training and Testing.

  • In the training phase,
  1. We create some training points.
  2. Import GaussianNB from sklearn.naive_bayes.
  3. Create the classifier.
  4. Fit the classifier on the training features and labels; i.e., training is where we actually give it the training data and it learns the patterns.
  5. Then we predict the labels for new points.

Example code:

>>> import numpy as np
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])   # training features
>>> Y = np.array([1, 1, 1, 2, 2, 2])                                       # training labels
>>> from sklearn.naive_bayes import GaussianNB
>>> clf = GaussianNB()                                                     # create the classifier
>>> clf.fit(X, Y)                                                          # fit it on the training features and labels
GaussianNB(priors=None)
>>> print(clf.predict([[-0.8, -1]]))                                       # predict the label for a new point
[1]
>>> clf_pf = GaussianNB()
>>> clf_pf.partial_fit(X, Y, np.unique(Y))                                 # incremental alternative to fit()
GaussianNB(priors=None)
>>> print(clf_pf.predict([[-0.8, -1]]))
[1]

On applying this algorithm to the self-driving-car data sets, we get a decision boundary that classifies the points into the two terrain classes, as in the sketch below.
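
The plot itself is not reproduced here, but a decision boundary like that can be visualized by evaluating a classifier fitted on the terrain points over a dense grid and shading each grid cell by its predicted class. A minimal matplotlib sketch, reusing the hypothetical features and labels arrays from the earlier sketch:

>>> from sklearn.naive_bayes import GaussianNB
>>> import matplotlib.pyplot as plt
>>> clf_terrain = GaussianNB().fit(features, labels)                       # fit on the hypothetical terrain data above
>>> xx, yy = np.meshgrid(np.linspace(0, 1, 200), np.linspace(0, 1, 200))   # dense grid over the feature space
>>> boundary = clf_terrain.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
>>> plt.contourf(xx, yy, boundary, alpha=0.3)                              # shaded regions show the decision surface
>>> plt.scatter(features[:, 0], features[:, 1], c=labels, edgecolors="k", s=10)
>>> plt.xlabel("grade")
>>> plt.ylabel("bumpiness")
>>> plt.show()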

  • Now in the testing phase,

We find out how well our algorithm is doing by writing code that tells us the accuracy of the naive_bayes classifier we made. Accuracy is just the number of points that are classified correctly divided by the total number of points in the test set.

Example code:

>>> from sklearn.metrics import accuracy_score
>>> accuracy_score(y_true, y_pred)   # y_true = the known test labels, y_pred = the labels predicted by the classifier

This way we can predict which data belongs to which group and find out the accuracy of our prediction algorithm using sklearn.
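
Putting the two phases together, a minimal end-to-end sketch might hold out part of the data for testing with sklearn's train_test_split (the 25% split and the hypothetical features and labels arrays from the earlier sketch are assumptions, not part of the original example):

>>> from sklearn.model_selection import train_test_split
>>> from sklearn.naive_bayes import GaussianNB
>>> from sklearn.metrics import accuracy_score
>>> X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=0.25, random_state=0)
>>> model = GaussianNB()
>>> model.fit(X_train, y_train)               # training phase: learn only from the training split
>>> pred = model.predict(X_test)              # testing phase: predict labels for points the model has not seen
>>> accuracy_score(y_test, pred)              # fraction of test points classified correctly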

Bayes Rule:

Bayes' Theorem provides a way to calculate the probability of a hypothesis given our prior knowledge:

P(h | D) = P(D | h) * P(h) / P(D)

where h is the hypothesis, D is the observed data, P(h) is the prior probability of the hypothesis, and P(h | D) is the probability of the hypothesis after seeing the data.

Now, for example, suppose we are given a set of emails written by 2 authors. If we are then given a new email, we can use the Naive Bayes algorithm to predict which author wrote it.
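
For text like emails, Naive Bayes classifiers usually work on word counts rather than Gaussian-distributed features, so a sketch of the email example would more typically use sklearn's CountVectorizer together with MultinomialNB. The toy emails and author names below are invented purely for illustration:

>>> from sklearn.feature_extraction.text import CountVectorizer
>>> from sklearn.naive_bayes import MultinomialNB
>>> emails = ["meet me at the office", "the quarterly report is attached",
...           "let us grab lunch later", "see you at the game tonight"]
>>> authors = ["sara", "sara", "chris", "chris"]                  # made-up labels: who wrote each email
>>> vectorizer = CountVectorizer().fit(emails)                    # learn the vocabulary from the known emails
>>> clf_email = MultinomialNB().fit(vectorizer.transform(emails), authors)
>>> clf_email.predict(vectorizer.transform(["the report is ready"]))   # predicts the likely author of a new email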

Strengths of Naive Bayes:

It's actually really easy to implement with big feature spaces; there are somewhere between 20,000 and 200,000 words in the English language.

It is also really simple to run and really efficient.

Weaknesses of Naive Bayes:

It can break.

Historically, when Google search first came out, a search for Chicago Bulls, the sports team, would return many images of bulls (the animals) and of the city of Chicago. But "Chicago Bulls" means something distinctly different from either word on its own.

So phrases that encompass multiple words and have distinctive meanings don’t work really well in Naïve Bayes.
