Contents:

  • What is Machine Learning
  • Types of Machine Learning
    • Supervised Learning
    • Unsupervised Learning
    • Reinforcement Learning
  • Unsupervised Learning
    • What is Clustering
      • Examples
      • Applications
      • Use cases of Clustering in business
  • Types of Clustering
    • Exclusive Clustering
    • Overlapping Clustering
    • Hierarchical Clustering
  • K-Means Algorithm
    • Where to apply
    • Explanation
    • Steps to follow
  • Conclusion
  • References

1. What is Machine Learning?

In this blog we will look at K-Means clustering, starting with an overview of Machine Learning (ML) and its types. This overview will help us understand which type of ML K-Means clustering falls under.

Machine learning is the study of computer algorithms that improve automatically through experience. It is seen as a subset of artificial intelligence. Machine learning algorithms build a mathematical model based on sample data, known as “training data”, in order to make predictions or decisions without being explicitly programmed to do so. Machine learning algorithms are used in a wide variety of applications, such as email filtering and computer vision, where it is difficult or infeasible to develop conventional algorithms to perform the needed tasks.

– Wikipedia

While watching a video on YouTube, viewers get video recommendations based on the content they watch. This is an example of a recommendation system built with a machine learning algorithm. Most online shopping websites, such as Amazon, use recommendation models built using machine learning. Movie recommendations in streaming services like Netflix are similarly based on past viewing history.

2. Types of Machine Learning

Machine learning is broadly divided into 3 types:

  • Supervised learning
  • Unsupervised learning
  • Reinforcement learning

The chart below shows the overall classification of the types of machine learning as well as their popular sub-types.

What is Supervised learning?

Supervised learning algorithms build a mathematical model of a known set of data that contains both the inputs and the desired outputs. Because this labeled dataset is used to train the model, much like a teacher guiding a student through the process of learning, the approach is known as supervised learning. Once the model is trained, it can be used to predict the output for new data given to it.

Wikipedia

Example of Supervised learning:

Email spam and malware filtering is an example of supervised learning: the model is trained on emails that have already been labeled as spam or non-spam. Based on this training, it can automatically classify a new email as spam or non-spam.
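To make the idea of training on labeled emails concrete, here is a minimal sketch in plain Python. It is only an illustration, not how real spam filters work: the tiny training set, the "ham" label for non-spam and the word-counting rule are all assumptions made for this example.

```python
from collections import Counter

def train(labelled_emails):
    """Count how often each word appears in spam and in non-spam ("ham") emails."""
    counts = {"spam": Counter(), "ham": Counter()}
    for text, label in labelled_emails:
        counts[label].update(text.lower().split())
    return counts

def predict(counts, text):
    """Label a new email by the class that has seen its words more often."""
    words = text.lower().split()
    spam_score = sum(counts["spam"][w] for w in words)
    ham_score = sum(counts["ham"][w] for w in words)
    return "spam" if spam_score > ham_score else "ham"

# Hypothetical labeled training data: (email text, known label)
training_data = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]

model = train(training_data)
print(predict(model, "claim your free prize"))
print(predict(model, "agenda for the team meeting"))
```

The important part is that every training email comes with its correct label; that is what makes this supervised.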

What is Unsupervised learning?

Unsupervised learning is a type of machine learning that looks for previously undetected patterns in a data set with no pre-existing labels and with a minimum of human supervision. In contrast to supervised learning that usually makes use of human-labeled data, unsupervised learning, also known as self-organization, allows for modeling of probability densities over inputs. Because the learning takes place without the guidance of any supervisor, it is called unsupervised learning.

Wikipedia

Such an algorithm can group inputs with similar features into clusters, but it cannot attach meaningful labels to those clusters. It can, however, assign new data to one of the clusters created during training.

Example of Unsupervised learning:

As humans, we can easily group fruits based on shape, color and other factors. In the same way, we can give a model unlabeled data about different fruits such as mangoes, apples and bananas. The model itself tries to find patterns and creates clusters or groups, which then help us decide which of the existing clusters a new data point falls under. The figure below shows an example of grouping a raw dataset consisting of fruits (mangoes, apples and bananas).

What is Reinforcement Learning?

Reinforcement learning (RL) is teaching a software agent how to behave in an environment by telling it how good it’s doing. It is an area of machine learning inspired by behaviourist psychology. Reinforcement learning is different from supervised learning because the correct inputs and outputs are never shown. Also, reinforcement learning usually learns as it goes (online learning) unlike supervised learning. This means an agent has to choose between exploring and sticking with what it knows best.

Wikipedia

It works like trial and error: the agent gets a reward or a penalty depending on whether its action was right or wrong. Training is then repeated with this feedback so that the agent chooses better the next time.

Example of Reinforcement Learning:

Robotics is one of the finest examples of reinforcement learning: robots learn to behave on their own and receive rewards or penalties, and training is repeated until the behavior of the agent (the robot) improves.

Consider a mouse trying to follow the best path to reach its favorite food (cheese).

If it takes the correct path it gets a reward, i.e. cheese; otherwise it gets a penalty, i.e. no cheese. In the same way, any agent can be trained in a given environment.
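The mouse-and-cheese idea can be sketched in a few lines of Python. This is a heavily simplified, bandit-style illustration rather than a full reinforcement learning algorithm: the two path names, the reward values, the exploration rate `epsilon` and the function name `learn_paths` are all assumptions made for this example.

```python
import random

def learn_paths(rewards, episodes=200, epsilon=0.1, seed=0):
    """Estimate the value of each path from rewards received by trial and error."""
    rng = random.Random(seed)
    paths = list(rewards)
    value = {p: 0.0 for p in paths}   # current estimate of each path's reward
    tries = {p: 0 for p in paths}     # how often each path was taken
    for episode in range(episodes):
        if episode < len(paths):
            path = paths[episode]              # try every path once first
        elif rng.random() < epsilon:
            path = rng.choice(paths)           # explore: pick a random path
        else:
            path = max(paths, key=value.get)   # exploit: pick the best path so far
        reward = rewards[path]                 # environment feedback
        tries[path] += 1
        # Running average of the rewards seen on this path
        value[path] += (reward - value[path]) / tries[path]
    return value

# Hypothetical environment: cheese is at the end of the right-hand path
rewards = {"left": 0.0, "right": 1.0}
values = learn_paths(rewards)
best = max(values, key=values.get)
print(values, best)
```

The agent is never told which path is correct. It only sees the reward after acting, and the `epsilon` parameter controls the trade-off between exploring and sticking with what it knows best.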

3. Unsupervised learning

As the name suggests, K-Means clustering is a clustering algorithm, and clustering comes under the unsupervised learning branch of machine learning. So let’s begin with the concept of clustering, followed by K-Means clustering with an example.

  1. What is Clustering?
  2. Types of Clustering
  3. What is K-means Clustering?
  4. How does the K-Means algorithm work?

1. What is Clustering?

Clustering is the process of separating a dataset into groups based on the similarity of its data points. Unlike classification, it does not rely on predefined labels.

Clustering works on the principle that points in the same group should be as similar as possible, and points in different groups should be as dissimilar as possible.

Examples:

  • Group of employees in a company.
  • Different flowers present in a garden.

Applications:

  • Amazon gives recommendations based on past searches or purchases.
  • YouTube shows videos based on watch history.

Use cases of Clustering in business:

  • Retail Store (customer shopping behavior)
  • Banking (customer credit and profitability)
  • Insurance Companies (fraud detection, risk factor identification)

4. Types of Clustering

Exclusive Clustering

This type is also known as hard clustering, because each data point / item belongs exclusively to one cluster.

Example: K-Means clustering

Overlapping Clustering

This type is also known as soft clustering, because a data point / item can belong to multiple clusters.

Example: Fuzzy C-Means clustering
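The difference between hard and soft clustering can be shown with a small sketch. It assumes one-dimensional data and two fixed, distinct cluster centers, and it uses the membership formula commonly given for Fuzzy C-Means with fuzziness `m = 2`. The function names are made up for this illustration.

```python
def hard_assignment(point, centers):
    """Exclusive clustering: return the index of the single nearest center."""
    distances = [abs(point - c) for c in centers]
    return distances.index(min(distances))

def soft_memberships(point, centers, m=2.0):
    """Overlapping clustering: return a degree of membership for every center.

    Uses the usual Fuzzy C-Means membership formula and assumes the
    centers are distinct. The memberships always add up to 1.
    """
    distances = [abs(point - c) for c in centers]
    if 0.0 in distances:  # the point sits exactly on a center
        return [1.0 if d == 0.0 else 0.0 for d in distances]
    power = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** power for d_k in distances)
        for d_i in distances
    ]

centers = [0.0, 10.0]   # hypothetical cluster centers
point = 2.0

cluster = hard_assignment(point, centers)
memberships = soft_memberships(point, centers)
print(cluster)        # index of the one cluster the point belongs to
print(memberships)    # share of the point in each cluster
```

With hard clustering the point belongs only to the first cluster. With soft clustering it belongs mostly to the first cluster but also a little to the second.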

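Hierarchical Clustering

This type builds a hierarchy (tree) of clusters, either by repeatedly merging the closest clusters or by repeatedly splitting larger ones.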

What is K-Means Clustering?

“K-Means is an unsupervised machine learning algorithm that is used to cluster data based on similar features.”

The letter ‘K’ in K-Means represents the number of clusters or groups. The choice of K is a major factor in the quality of the resulting clusters.

Examples: Sorting a folder of different types of legal contracts, or a basket filled with different fruits, into groups based on similarity and dissimilarity is what is meant by clustering.

Applications of K-Means

We can apply this algorithm wherever there is unlabeled data that we need to group.

Example: Document classification (tags, topics and contents).

5. K-Means Algorithm

Consider grouping different types of unlabelled flowers.

Step 1: Let’s begin by selecting the value of K, i.e. the number of clusters. Looking at the figures below, we can take K=3, and we have given a different color to each of the 3 clusters (red, blue and green).

Raw data plotted
Clustering of data with K=3

Step 2: Now select 3 random data points as shown below. These serve as the initial red, blue and green clusters.

Random selection of 3 points

Step 3: Calculate the distance of the 1st data point from each of the red, blue and green clusters, one by one.

Distance from point 1 to red cluster
Distance from point 1 to blue cluster
Distance from point 1 to green cluster

Step 4: After calculating the distance of the 1st point from each cluster, we assign it to the red cluster because that cluster is nearest to it.

Assign 1st point to nearest cluster
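Steps 3 and 4 can be sketched in Python as follows. The coordinates of the point and of the three cluster centers are made-up values for illustration, and Euclidean distance is assumed as the distance measure since the text does not name one.

```python
import math

def nearest_cluster(point, centroids):
    """Return the name of the closest cluster and all computed distances."""
    distances = {
        name: math.dist(point, center)   # Euclidean distance
        for name, center in centroids.items()
    }
    return min(distances, key=distances.get), distances

# Hypothetical positions of the three clusters and of the 1st point
centroids = {"red": (1.0, 1.0), "blue": (5.0, 5.0), "green": (9.0, 1.0)}
point_1 = (2.0, 2.0)

nearest, distances = nearest_cluster(point_1, centroids)
for name, d in distances.items():
    print(f"Distance from point 1 to {name} cluster: {d:.2f}")
print("Point 1 is assigned to:", nearest)
```

Here the 1st point ends up in the red cluster because its distance to red is the smallest of the three.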

Step 5: Once the assignment is done, we calculate the mean point of the red cluster, including the new point.

Calculate the mean value including the new point for red cluster
Calculate the cluster mean including the new point
Result from 1st iteration
Total variation within the cluster
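As a rough sketch of Step 5, the snippet below recomputes the mean of a cluster after a new point joins it and measures the variation within the cluster. The points are made-up values, and "variation" is taken here to mean the sum of squared distances from each point to the cluster mean, which is the usual choice for K-Means but is an assumption about what the figures showed.

```python
def cluster_mean(points):
    """Mean point of a cluster: the average of each coordinate."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def within_cluster_variation(points):
    """Sum of squared distances from every point to the cluster mean."""
    mean = cluster_mean(points)
    return sum(
        sum((x - m) ** 2 for x, m in zip(point, mean))
        for point in points
    )

# Hypothetical red cluster and a newly assigned point
red = [(1.0, 1.0), (1.0, 3.0), (3.0, 1.0)]
new_point = (3.0, 3.0)
red_with_new = red + [new_point]

print(cluster_mean(red))            # mean before the new point
print(cluster_mean(red_with_new))   # mean including the new point
print(within_cluster_variation(red_with_new))
```

A tight cluster has a small variation, so comparing this number across different clusterings tells us which grouping is better.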

The K-Means algorithm keeps recalculating the mean after every new data point is assigned to a cluster, and loops until the data points inside each cluster stop changing. As we can see, the value of K is very important here, so we need to learn how to choose K in order to reduce the variation seen in the figure above.
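The whole loop can be sketched as below. Note that this is the common batch variant of K-Means, which recomputes the means once per pass over all points rather than after every single assignment, so it is an approximation of the steps described above. The data, the starting centroids and the function name `k_means` are assumptions made for this example.

```python
import math

def k_means(points, initial_centroids, max_iterations=100):
    """Batch K-Means: assign points, recompute means, stop when nothing changes."""
    centroids = list(initial_centroids)
    assignments = None
    for _ in range(max_iterations):
        # Assignment step: each point goes to its nearest centroid
        new_assignments = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_assignments == assignments:
            break                      # clusters stopped changing
        assignments = new_assignments
        # Update step: each centroid moves to the mean of its members
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:                # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignments

# Hypothetical data with two visible groups, and two data points as the start
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, assignments = k_means(points, [points[0], points[3]])
print(centroids)
print(assignments)
```

In practice the starting centroids are picked at random, so different runs can give different clusters. They are fixed here only to keep the example repeatable.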

Step 6: How can one choose the value of K?

Decide the value of K
Comparing variation for K=1 and K=2
Comparing variation for K=2 and K=3
Comparing variation for K=3 and K=4

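When we run the algorithm for different values of K and plot the total within-cluster variation against K, we can see an elbow point: beyond it, adding more clusters reduces the variation only slightly. This elbow point is used to choose the value of K (the number of clusters).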

Elbow point for K value determination
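The elbow idea can be tried out with a short sketch. It assumes one-dimensional data with three visible groups, a compact batch K-Means, and a deterministic choice of starting centroids (values spread evenly through the sorted data) so that the result is repeatable. All names and numbers are made up for this illustration.

```python
def k_means_1d(data, k, max_iterations=100):
    """Compact 1-D K-Means with a deterministic start (an assumption of this sketch)."""
    ordered = sorted(data)
    n = len(ordered)
    # Starting centroids: k values spread evenly through the sorted data
    centroids = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    labels = None
    for _ in range(max_iterations):
        new_labels = [
            min(range(k), key=lambda j: abs(x - centroids[j])) for x in data
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [x for x, a in zip(data, labels) if a == j]
            if members:
                centroids[j] = sum(members) / len(members)
    return centroids, labels

def total_variation(data, k):
    """Total within-cluster variation: squared distance of each point to its centroid."""
    centroids, labels = k_means_1d(data, k)
    return sum((x - centroids[a]) ** 2 for x, a in zip(data, labels))

data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.2, 8.8]
variation = {k: total_variation(data, k) for k in range(1, 5)}
for k, v in variation.items():
    print(f"K={k}: total variation = {v:.2f}")
```

For this data the variation falls sharply up to K=3 and barely changes at K=4, so the elbow sits at K=3, matching the three groups in the data.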

Step 7: Repeat the above steps for each remaining point to find which cluster it belongs to.

Add the point to the cluster and repeat steps

6. Conclusion

In this blog, we went through the basics of Machine Learning and its types, and then looked at K-Means clustering, an unsupervised ML algorithm, along with the steps one can follow to carry it out.

7. References:

  1. Machine Learning definition: https://en.wikipedia.org/wiki/Machine_learning
  2. Supervised Learning definition: https://en.wikipedia.org/wiki/Supervised_learning
  3. Unsupervised Learning definition: https://en.wikipedia.org/wiki/Unsupervised_learning
  4. Reinforcement Learning definition: https://simple.wikipedia.org/wiki/Reinforcement_learning
  5. K-Means Clustering images: https://www.slideserve.com/EdurekaIN/k-means-clustering-algorithm-k-means-example-in-python-machine-learning-algorithms-edureka-powerpoint-ppt-presentation
