Modified on

30 Dec 2022 07:06 pm

Types of Machine Learning / Deep Learning Algorithms



Machine learning algorithms learn from the data and help make a prediction or classification.

Depending on whether the input data is labelled, we can classify algorithms as supervised or unsupervised. Apart from these, we also have reinforcement learning, where the algorithm gets feedback based on its predictions and learns from that feedback. A typical example of supervised learning is labelling all cat pictures as cats and all dog pictures as dogs and asking the machine to learn from them. In contrast, in unsupervised learning, unlabelled cat and dog pictures are given to the machine, which has to find the groups by itself. Finally, in reinforcement learning, the machine gets positive feedback every time a cat is identified as a cat and negative feedback for every wrong identification. Based on this, the model learns and trains itself.

How To Classify The Type Of Algorithms Based On The Work They Do?

Regression Algorithms

In regression algorithms, we intend to find a mathematical relationship between the input and the output: a function that takes the inputs as variables and predicts the output.

Some of the well-known regression models are

  • Least Squares Regression

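In least squares regression, we fit the line that minimizes the sum of the squared vertical distances (residuals) between the line and the data points. The best-fit line need not pass through any of the points.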
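The least squares fit for a single input variable has a closed form, which the sketch below works through. The function name `fit_line` and the sample data are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the line minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the common 1/n factor cancels out).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative data lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

With noisy data the same formula still applies; the line then balances the residuals instead of hitting every point.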

  • Linear Regression / Polynomial Regression

Here we fit a linear function or a polynomial function that explains the data points well.

  • Logistic Regression

Here the output is discrete (for example, yes or no), while the input can be continuous. The model predicts the probability of each class. A classic example is the Breast Cancer dataset, where a tumour is classified as malignant or benign.

  • Stepwise Regression

Here the variables are added one step at a time, and statistical tests are performed to check their significance. Only the variables that pass the significance test are kept for modelling.

  • Multivariate Regression

Multivariate regression models the relationship between several dependent (response) variables and one or more independent (predictor) variables. Essentially, it describes how the response variables behave as the predictors change.

  • Locally Estimated Scatterplot Smoothing (LOESS)

Here a separate low-degree regression is fitted around each point, using only the nearby data and giving closer points more weight. Joining these local fits gives a smooth curve through the scatter plot.

Instance-Based Algorithms

  • k-Nearest Neighbors (kNN)

Here a data point is classified based on its k nearest neighbors, usually by a majority vote of their labels. The value of k is chosen by the user.
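A minimal sketch of kNN classification follows, assuming Euclidean distance and a simple majority vote. The function name `knn_predict` and the toy points are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair every training point's distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Illustrative 2-D data with two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 7.0), (6.5, 8.0)]
labels = ["cat", "cat", "cat", "dog", "dog", "dog"]
prediction = knn_predict(points, labels, (2.0, 2.0), k=3)
print(prediction)
```

An odd k avoids tied votes in two-class problems. There is no training step: all the work happens at prediction time, which is why kNN is called instance-based.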

  • Learning Vector Quantization (LVQ)

Here we have a two-layer artificial neural network that adjusts its weights as learning happens. The learning is competitive rather than driven by error-correcting feedback.

  • Self Organizing Map

This is similar to LVQ, except that it is unsupervised.

  • Locally Weighted Learning

Locally weighted learning is a group of methods that predict the output for a particular input by fitting a local model around it.

  • Support Vector Machines

Here the data is separated using hyperplanes. A new point is classified according to which side of the separating hyperplane it falls on; a variant of the method also handles regression.

Regularization Algorithms

Based on regularization, we have the following set of algorithms. Regularization is needed so that the model does not overfit the training data (high variance). Usually, the model is penalized for large coefficients in the cost function, at the price of a little extra bias.

  • LASSO (L1 regularization)

In the cost function, we add a penalty of lambda times the L1 norm of the coefficients, which is the sum of their absolute values.

  • Ridge regression (L2 regularization)

In the cost function, we add a penalty of lambda times the squared L2 norm of the coefficients, which is the sum of their squares.

  • Elastic Net

Here the regularization is done by combining both L1 and L2 norm.
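The three penalties can be written as small functions of the model coefficients, as sketched below. The function names and the mixing parameter `l1_ratio` are assumptions for illustration, and libraries differ in how they scale and combine the two terms.

```python
def l1_penalty(weights):
    """Sum of absolute values of the coefficients (LASSO term)."""
    return sum(abs(w) for w in weights)


def l2_penalty(weights):
    """Sum of squared coefficients (ridge term)."""
    return sum(w * w for w in weights)


def elastic_net_penalty(weights, lam, l1_ratio):
    """Weighted mix of both terms; l1_ratio=1 is pure L1, l1_ratio=0 is pure L2."""
    mix = l1_ratio * l1_penalty(weights) + (1.0 - l1_ratio) * l2_penalty(weights)
    return lam * mix


# Illustrative coefficients.
weights = [3.0, -4.0]
print(l1_penalty(weights), l2_penalty(weights))
print(elastic_net_penalty(weights, lam=0.1, l1_ratio=0.5))
```

The penalty is added to the usual data-fit cost, so larger values of lambda push the coefficients towards zero.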

  • Least angle regression

This is similar to forward stepwise regression and is useful when there are many attributes to consider.

Decision Tree Algorithms

Decision trees have a structure of nodes and branches. The tree keeps growing branches until the data at a node becomes pure; such a node is called a leaf. By purity, we mean that all the data points in that region belong to the same class. An example would be a class containing girls and boys. A node could ask: what is the student's gender? This has two answers, boy and girl, and every data point (i.e., student) falls into one of the two. In the boys node all data points are boys, so that node is pure. The same holds for the girls node.

  • Classification and regression tree (CART)

Here the splits are chosen using the Gini impurity index, and each split is binary.
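As a rough illustration of the Gini impurity used to score a split, the sketch below computes it for a set of labels and for a binary split. The function names `gini_impurity` and `split_gini` are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """1 minus the sum of squared class proportions; 0 means the node is pure."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(left, right):
    """Size-weighted Gini impurity of the two children of a binary split."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


mixed = ["boy", "girl", "boy", "girl"]
print(gini_impurity(mixed))                              # impurity before the split
print(split_gini(["boy", "boy"], ["girl", "girl"]))      # impurity after a perfect split
```

A tree builder would evaluate `split_gini` for every candidate split and keep the one with the lowest value.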

  • Iterative Dichotomiser 3 (ID3)

Here the split is chosen using entropy, by picking the attribute with the highest information gain.
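Entropy and information gain can be sketched in a few lines. The helper names `entropy` and `information_gain` are made up for this example, and the labels reuse the boys and girls illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of the class distribution in `labels`."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of its children."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


parent = ["boy", "boy", "girl", "girl"]
children = [["boy", "boy"], ["girl", "girl"]]  # split on gender
print(entropy(parent), information_gain(parent, children))
```

The attribute whose split gives the largest information gain is chosen for the node.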

  • C4.5 and C5.0

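These are successors of ID3. C4.5 selects splits using the gain ratio, a normalized form of information gain, and can handle continuous attributes; C5.0 is a later, faster version.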

  • Chi-square automatic interaction detection (CHAID)

Here splits are chosen using chi-square tests, and a node can be split into more than two branches. It is used more for descriptive analysis.

  • Decision Stump

The decision stump is a Decision tree model with just one decision-making node.

  • M5

M5 builds a tree for regression. Instead of a class label, each leaf node holds a linear function that takes the inputs and predicts a value.

  • Continuous decision trees

There may be scenarios where the target is not a category but a continuous value, and in those scenarios we use a continuous variable decision tree. This is also called a regression tree, because each leaf predicts a number rather than a class.

  • Conditional inference trees

In all the previous scenarios, a node was selected based on entropy or information gain. However, the node selection is made here by conducting a series of non-parametric tests.

Bayesian Algorithms

The next type of algorithms we study are the Bayesian algorithms. These apply Bayes' theorem to compute the probability of a class given the input features. Bayes' theorem itself always holds; the naive variants below additionally assume that the input variables are independent of each other given the class.

Some of the most popular Bayesian algorithms are listed below.

  • Naive Bayes

Mostly used for high-dimensional datasets. It assumes that the features are independent of each other given the class. The class probabilities are calculated and the most probable class is chosen.

  • Gaussian Naive Bayes

It is similar to the Naive Bayes, except that here we make an assumption that our input features follow a Gaussian distribution.
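The sketch below is a minimal Gaussian Naive Bayes classifier: it stores a prior plus a per-feature mean and variance for each class, then picks the class with the highest log-posterior. The function names, the toy data and the small variance floor `1e-9` are assumptions for illustration.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Return {label: (prior, [(mean, variance) per feature])}."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    model = {}
    for label, group in grouped.items():
        stats = []
        for column in zip(*group):
            mean = sum(column) / len(column)
            # Small floor keeps the variance strictly positive.
            var = sum((v - mean) ** 2 for v in column) / len(column) + 1e-9
            stats.append((mean, var))
        model[label] = (len(group) / len(rows), stats)
    return model


def predict_gaussian_nb(model, row):
    """Pick the label with the highest log prior plus summed Gaussian log-likelihoods."""
    def log_posterior(label):
        prior, stats = model[label]
        total = math.log(prior)
        for value, (mean, var) in zip(row, stats):
            total += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        return total
    return max(model, key=log_posterior)


rows = [(1.0, 2.0), (1.2, 1.8), (0.8, 2.2), (5.0, 6.0), (5.2, 5.8), (4.8, 6.2)]
labels = ["a", "a", "a", "b", "b", "b"]
model = fit_gaussian_nb(rows, labels)
print(predict_gaussian_nb(model, (1.1, 2.0)))
```

Summing the per-feature log-likelihoods is where the independence assumption enters: each feature contributes on its own, with no interaction terms.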

  • Multinomial/Binomial Naive Bayes

Here the features are frequency counts, such as word counts in a document. The binomial variant is used when each feature takes two values, and the multinomial variant when it can take more than two.

  • Averaged One-Dependence Estimators (AODE)
  • Bayesian Belief Network
  • Bayesian Network

Clustering Algorithms

These algorithms group data points based on the structure in the data, without using labels. The data is organized into groups so that the members of a group are as similar as possible.

Some of the popular algorithms are

  • K-means

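In the K-means algorithm, the data is divided into K groups. K central points (also called centroids) are chosen, every data point is assigned to the nearest centroid, and each centroid is then recalculated as the mean of the points assigned to it. These two steps repeat until the assignments stop changing. New data is assigned to a group based on its proximity to the centroids.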
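The assign-then-average loop of K-means can be sketched as follows. The starting centroids are passed in explicitly to keep the example deterministic, and the function name `kmeans` and the sample points are hypothetical.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and re-averaging."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(point, centroids[i]),
            )
            clusters[nearest].append(point)
        # Update step: each centroid becomes the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])
print(centroids)
```

In practice the starting centroids are usually chosen at random, so different runs can end in different groupings.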

  • K-medians

Here the centroids are calculated based on the medians.

  • Expectation Maximization

This algorithm alternates between two steps. The first is the expectation step (E-step) and the second is the maximization step (M-step). In the E-step, the missing or hidden variables are estimated using the current parameters, and in the M-step the model's parameters are updated to maximize the likelihood given those estimates.

  • Hierarchical Clustering

In hierarchical clustering the data is clustered into a hierarchy of groups, and within a group the data points are similar. The difference between hierarchical clustering and K-means clustering is that in the latter the number of groups is decided in advance. There are two ways in which this algorithm works: agglomerative and divisive. Agglomerative is a bottom-up approach, while divisive is a top-down approach. In the agglomerative approach, every point starts as its own cluster. The two nearest clusters are merged first, and the merging repeats until all the points are brought into one big cluster. The divisive approach starts with the entire data set as one cluster and splits it repeatedly.

Association Rule Learning Algorithms

In these algorithms, relationships between variables are uncovered and rules that explain them are mined from the data. These rules can then be used for prediction.

  • Apriori algorithm

In this algorithm, association rules are studied between members or transactions. For instance, shopkeepers are always interested in knowing whether buyers who buy object A also buy object B. If they do, A can be kept near B. These kinds of association rules are mined from the dataset.
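The sketch below covers only the two measures that association rule mining relies on, support and confidence, not the full Apriori candidate-generation procedure. The function names and the shopping baskets are made up for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
]
print(support(transactions, {"bread", "butter"}))
print(confidence(transactions, {"bread"}, {"butter"}))
```

A rule such as "buyers of bread also buy butter" is kept only if both its support and its confidence exceed thresholds chosen by the analyst.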

  • Eclat (Equivalence Class Clustering and bottom-up Lattice Traversal)

In this algorithm, association rules are mined by intersecting the sets of transaction ids in which each item appears. This is often faster than the Apriori algorithm.

Artificial Neural Network Algorithms

These are inspired by the neuron structure in our brains. Neurons are interconnected to each other. While training the model, the weights of the interconnection are constantly adjusted.

  • Perceptron

It is a single neuron with a binary output. It can take many inputs.
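A minimal sketch of a perceptron trained with the classic perceptron learning rule on the logical AND function follows. The function names, the learning rate and the number of epochs are assumptions chosen for this small example.

```python
def train_perceptron(samples, targets, epochs=10, lr=1.0):
    """Learn weights and a bias for a single neuron with a step activation."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, targets):
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            output = 1 if activation > 0 else 0
            error = target - output  # -1, 0 or +1
            # Nudge the weights towards the correct answer only when wrong.
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


def predict(weights, bias, x):
    """Binary output of the trained neuron for input x."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


# Logical AND: many inputs (here two), one binary output.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [0, 0, 0, 1]
weights, bias = train_perceptron(samples, targets)
print([predict(weights, bias, x) for x in samples])
```

A single perceptron can only learn classes that a straight line (or hyperplane) can separate, which is why AND works here but XOR would not.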

  • Multi-layer perceptrons

A multi-layer perceptron is a fully connected neural network with at least three layers: an input layer, one or more hidden layers and an output layer. If there is more than one hidden layer, it becomes a deep-learning artificial neural network.

  • Backpropagation

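Backpropagation is the training procedure used by these networks rather than a network type. The errors in classification or prediction are propagated backwards from the output towards the input, and the weights are adjusted to reduce them.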

  • Stochastic Gradient descent

In stochastic gradient descent, the gradients are calculated on a small part of the data rather than the whole dataset, and each iteration uses different points from the dataset.
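As an illustration, the sketch below runs mini-batch stochastic gradient descent to fit a straight line with a squared-error loss. The function name, learning rate, batch size, epoch count and fixed random seed are all assumptions chosen to keep the example small and repeatable.

```python
import random


def sgd_linear_regression(xs, ys, lr=0.1, epochs=500, batch_size=4, seed=0):
    """Fit y = slope * x + intercept using gradients from small random batches."""
    rng = random.Random(seed)
    slope, intercept = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # a different ordering, hence different batches, each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            grad_slope = 0.0
            grad_intercept = 0.0
            for i in batch:
                error = (slope * xs[i] + intercept) - ys[i]
                grad_slope += 2 * error * xs[i]
                grad_intercept += 2 * error
            # Step against the average gradient of this batch only.
            slope -= lr * grad_slope / len(batch)
            intercept -= lr * grad_intercept / len(batch)
    return slope, intercept


xs = [i / 10 for i in range(11)]
ys = [2 * x + 1 for x in xs]  # illustrative noise-free data
slope, intercept = sgd_linear_regression(xs, ys)
print(slope, intercept)
```

Each update is cheaper than a full pass over the data, at the cost of a noisier path towards the minimum.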

  • Hopfield Network

Hopfield Network consists of a fully interconnected neural network. That is to say that all neurons are connected to all others. This network is used to learn associations.

  • Radial basis Function Network

This is a three-layer feed-forward neural network. The first layer is the input layer, the second is the hidden layer with the activation units, and the last is the output layer. The activation units mainly consist of Gaussian functions.

Deep Learning Algorithms

These are extensions of ANN. Some of the important algorithms in this category are

  • Convolutional Neural Network

Convolutional neural networks are mainly used for classifying images. The images are stored as arrays of pixels. A smaller array, called the kernel or filter, is slid across the input array; at each position the overlapping values are multiplied element by element and summed. The kernel is usually much smaller than the input. The features are extracted via this process of convolution.
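The sliding multiply-and-sum operation can be sketched as below, with no padding and a stride of one. As in most deep learning libraries the kernel is not flipped, so strictly speaking this is cross-correlation. The function name and the tiny image are made up for illustration.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` and sum the element-wise products at each position."""
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    return [
        [
            sum(
                image[row + i][col + j] * kernel[i][j]
                for i in range(kernel_h)
                for j in range(kernel_w)
            )
            for col in range(out_w)
        ]
        for row in range(out_h)
    ]


# A 3x4 "image" that is dark on the left and bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# A 2x2 kernel that responds to a dark-to-bright vertical edge.
kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = convolve2d(image, kernel)
print(feature_map)
```

The output is largest where the image matches the pattern in the kernel, here the column where dark pixels meet bright ones. In a CNN the kernel values are learned during training instead of being hand-written.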

  • Recurrent Neural Networks

In a recurrent neural network, the connections form loops, so the output from one step is fed back as input to the next step. This makes it well suited to analyzing temporal or sequential data.

  • Long Short-Term Memory networks (LSTM)

An LSTM is a type of RNN that can remember information over long sequences. It consists of a cell, an input gate, a forget gate and an output gate. The three gates control the flow of information. The important parts of a message are stored and used for further processing.

  • Stacked Autoencoders

These are used to reduce the dimension of the data. A non-linear function describes the relationship between the input and the output. These will automatically capture features.

An autoencoder contains three parts: the encoder, the bottleneck and the decoder. The encoder compresses the input into the bottleneck, keeping the most important features. The decoder tries to reconstruct the original information from it. Multiple autoencoders stacked one after another form a stacked autoencoder.

  • Deep Boltzmann machine

In a deep Boltzmann machine, all neurons are connected to each other; they are multi-directional. The connections grow exponentially.

  • Deep Belief networks

Deep belief networks arise when we stack multiple restricted Boltzmann machines.

Dimensionality Reduction Algorithms

Here high-dimensional data is reduced to fewer dimensions based on the inherent structure of the data. This technique aids in visualizing or simplifying the data.

  • Principal component analysis

In principal component analysis, the axes are rotated in the higher-dimensional space so that they line up with the directions in which the data varies the most; the directions with little or no variation can then be dropped. This reduces the complexity of the problem. A classic example is tracking the movement of chalk on a board with a camera, which gives the position of the chalk in the x, y and z directions. However, since the chalk moves on the board, the motion is restricted to a plane, and performing a PCA reduces the dimension from 3 to 2.
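The sketch below finds only the first principal component, using power iteration on the covariance matrix instead of a full eigendecomposition. The function name and the chalk positions are made up, with z held constant to mimic motion on the board.

```python
import math


def first_principal_component(points, iterations=100):
    """Unit vector along the direction of largest variance, found by power iteration."""
    n = len(points)
    dims = len(points[0])
    means = [sum(p[d] for p in points) / n for d in range(dims)]
    centered = [[p[d] - means[d] for d in range(dims)] for p in points]
    # Sample covariance matrix of the centered data.
    cov = [
        [sum(row[i] * row[j] for row in centered) / (n - 1) for j in range(dims)]
        for i in range(dims)
    ]
    vec = [1.0] * dims  # starting guess; must not be orthogonal to the answer
    for _ in range(iterations):
        vec = [sum(cov[i][j] * vec[j] for j in range(dims)) for i in range(dims)]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
    return vec


# Chalk positions (x, y, z): the chalk stays on the board, so z never changes.
chalk = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0), (2.0, 1.0, 0.0), (3.0, 3.0, 0.0)]
component = first_principal_component(chalk)
print(component)
```

The z entry of the resulting direction is zero because the data has no variance along z, which is the sense in which PCA reveals that two dimensions are enough here.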

  • Principal Component Regression

PCR = PCA + LR (linear regression). The regression is run on the principal components instead of the original variables, which works well on multivariate data.

  • Partial Least Squares Regression (PLSR)

Here the algorithm tries to reduce the input as much as possible and still predict y. The difference between PCR and PLSR is that PCR concentrates on X alone, while PLSR considers Y also.

  • Sammon mapping

Here the data is mapped from a higher dimension to a lower dimension using gradient descent methods, while trying to preserve the distances between points.

  • Multi-Dimensional Scaling
  • Projection Pursuit

Here a projection index, for example one based on the kurtosis of the data, scores how interesting a low-dimensional projection is, and the projections with the best scores are kept.

  • Linear Discriminant Analysis (LDA)

LDA finds a feature subspace that best separates the classes and is mostly used in supervised learning. Here there is an inherent assumption that each class comes from a Gaussian distribution and that all classes share the same covariance.

  • Mixture Discriminant Analysis

Similar to LDA, but with the single-Gaussian assumption relaxed: each class is modelled as a mixture of Gaussians.

  • Quadratic Discriminant Analysis

It is a more general model than LDA: each class comes from a Gaussian distribution with its own covariance.

  • Flexible Discriminant Analysis

Here there is a mixture of linear regression models that is used for prediction purposes.

Ensemble Methods

Ensemble methods combine multiple models. These models work together to give better accuracy.

  • Boosting

In this process, many weak models are combined to make a stronger model. By a weak model, we mean one that is only slightly better than a random guess, while a strong model's predictions are close to the actual values. Here the models are trained sequentially on samples of the data, and each subsequent model tries to learn from the weaknesses of the previous one. The weak rules from all of them are combined to form a strong one. Boosting is used when there is low variance and high bias. AdaBoost and XGBoost are two very popular techniques.

  • Bootstrapped aggregation or Bagging

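In bagging, the models are trained in parallel on bootstrap samples of the data (drawn with replacement), and their predictions are averaged or put to a vote. Bagging is used when there is high variance and low bias.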

  • Weighted Average (blending)
  • Stacked Generalization (stacking)
  • Gradient boosting machines (GBM)
  • Gradient-boosted regression trees (GBRT)
  • Random forest

Feature Selection Algorithms

These manipulate the input to reduce the noise and keep the more relevant information for making a prediction.

Algorithm Accuracy Evaluation

For classification, we can use accuracy, precision, recall, F-1 score, ROC, AUC
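A few of these classification metrics can be computed directly from the true and predicted labels, as in the sketch below. The function name and the label vectors are hypothetical, and class 1 is treated as the positive class.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall, f1) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    accuracy = correct / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1


y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
accuracy, precision, recall, f1 = classification_metrics(y_true, y_pred)
print(accuracy, precision, recall, f1)
```

Precision asks how many predicted positives were right, recall asks how many actual positives were found, and the F1 score is their harmonic mean.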

For regression, we can use MSE, MAE
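Mean squared error (MSE) and mean absolute error (MAE) can be sketched as follows. The function names and the sample values are made up for illustration.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences; penalizes large errors more heavily."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences; in the same units as the target."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 9.0]
mse = mean_squared_error(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
print(mse, mae)
```

Because MSE squares each error, a single large miss affects it far more than it affects MAE.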

Ranking metrics would involve finding MRR, DCG and NDCG

Correlation is one of the statistical metrics

PSNR, SSIM and IOU are used for computer vision

Perplexity and BLEU scores are used for NLP

Inception score and Fréchet Inception Distance are used for generative deep learning models

performance measures

Apart from these, we have algorithms for the following

  • Optimization Algorithms
  • Evolutionary Algorithms
  • Computer Vision
  • Natural Language Processing
  • Recommender Systems
  • Reinforcement Learning
  • Graphical Models




Navin Baskar








