CSE

Uploaded on 31 Dec 2022

Skill-Lync

Machine learning algorithms learn from data and use what they learn to make predictions or classifications.

Depending on whether the input data is labelled or not, we can classify algorithms as supervised or unsupervised. Apart from these, we also have reinforcement learning, where the algorithm gets feedback based on its predictions and learns from that feedback. A typical example of supervised learning would be labelling all cat pictures as "cat" and all dog pictures as "dog" and asking the machine to learn from them. In contrast, in unsupervised learning, unlabelled cat and dog pictures are given to the machine, which has to find the groupings by itself. Finally, in reinforcement learning, the machine gets positive feedback every time it identifies a cat as a cat and negative feedback for every wrong identification. Based on this feedback, the model learns and trains itself.

In regression algorithms, we intend to find a mathematical relationship between the input and the output: a function that takes the input variables and predicts the output from them.

Some of the well-known regression models are:

**Least Squares Regression**

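In least squares regression, we fit the line that minimizes the sum of the squared vertical distances (residuals) between the data points and the line. This best-fit line does not need to pass exactly through any of the points.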
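For a single input variable, minimizing the squared residuals has a closed-form answer: the slope is the covariance of x and y divided by the variance of x, and the line passes through the point of means. The pure-Python sketch below shows this; the function name and the five data points are made up for illustration.

```python
def least_squares_line(xs, ys):
    """Illustrative helper: return (slope, intercept) minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = cov(x, y) / var(x); the common 1/n factors cancel
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The best-fit line always passes through (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [0, 1, 2, 3, 4]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]
slope, intercept = least_squares_line(xs, ys)
print(slope, intercept)  # approximately 1.96 and 1.1
```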

**Linear Regression / Polynomial Regression**

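Here we fit a linear function or a polynomial function through the data points so that it explains the dataset well. Polynomial regression is still fitted by least squares, because the model remains linear in its coefficients.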
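A polynomial fit can be sketched as ordinary least squares on the columns 1, x, x², …: build that design matrix, form the normal equations and solve them. The version below is a minimal pure-Python illustration (the helper name `polyfit` is my own, not a library call) using Gaussian elimination; setting `degree=1` reduces it to the straight-line fit.

```python
def polyfit(xs, ys, degree):
    """Illustrative least-squares polynomial fit; returns [c0, c1, ..., c_degree]."""
    # Design matrix: each row is [1, x, x^2, ..., x^degree]
    X = [[x ** p for p in range(degree + 1)] for x in xs]
    m = degree + 1
    # Normal equations: (X^T X) c = X^T y
    A = [[sum(row[i] * row[j] for row in X) for j in range(m)] for i in range(m)]
    b = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(m)]
    # Gaussian elimination with partial pivoting
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            factor = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution
    coeffs = [0.0] * m
    for i in reversed(range(m)):
        s = sum(A[i][j] * coeffs[j] for j in range(i + 1, m))
        coeffs[i] = (b[i] - s) / A[i][i]
    return coeffs

xs = [-2, -1, 0, 1, 2, 3]
ys = [1 + 2 * x - 0.5 * x ** 2 for x in xs]  # noise-free quadratic for illustration
coeffs = polyfit(xs, ys, degree=2)  # close to [1, 2, -0.5]
```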

**Logistic Regression**

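Here the output is discrete (for example, yes/no), while the input can be continuous. In such scenarios, we use logistic regression, which models the probability of each class with the sigmoid function. A classic example is the breast cancer dataset, where tumours are classified as malignant or benign.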

**Stepwise Regression**

Here variables are added (or removed) one step at a time, and a statistical test is performed at each step to check their significance. Only the variables that pass the significance test are retained for modelling.

**Multivariate Regression**

Multivariate regression estimates the relationship between the dependent (response) variables and the independent (predictor) variables. Essentially, it describes how the response behaves as the predictors change.

**Locally Estimated Scatterplot Smoothing (LOESS)**

Here, instead of fitting one global function, a low-degree polynomial is fitted to the neighbourhood of each point using weighted least squares, with nearby points weighted more heavily. Joining these local fits gives a smooth curve through the scatter plot.

**k-Nearest Neighbor (kNN) algorithm**

Here a new data point is classified by a majority vote among its k nearest neighbors in the training data. The value of k is chosen by the user.
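A minimal pure-Python sketch of kNN classification: compute the Euclidean distance from the query to every training point, keep the k closest, and take a majority vote. The helper name and the cat/dog points are illustrative only; real implementations usually add a spatial index (such as a k-d tree) to avoid scanning every point.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Illustrative helper: classify `query` by majority vote of its k nearest neighbors."""
    # Euclidean distance from the query to every training point
    distances = [
        (math.dist(p, query), label) for p, label in zip(train_points, train_labels)
    ]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # Majority vote among the k nearest neighbors
    return Counter(nearest_labels).most_common(1)[0][0]

points = [(1, 1), (1, 2), (2, 1), (6, 6), (6, 7), (7, 6)]
labels = ["cat", "cat", "cat", "dog", "dog", "dog"]
print(knn_predict(points, labels, (1.5, 1.5)))  # cat
print(knn_predict(points, labels, (6.5, 6.5)))  # dog
```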

**Learning Vector Quantization (LVQ)**

Here we have a two-layer artificial neural network whose weights (prototype vectors) are adjusted as learning happens. The learning is competitive: the prototype closest to each input wins and only it is updated, rather than all weights being corrected by error feedback.

**Self-Organizing Map**

This is similar to LVQ, except that it is unsupervised.

**Locally Weighted Learning**

Locally weighted learning is a family of methods that make a prediction for a particular input using a model fitted locally around it, with nearby training points weighted more heavily.

**Support Vector Machines**

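Here the data is separated by a hyperplane chosen to maximize the margin between the classes. A new point is classified (or, in regression, predicted) according to which side of the hyperplane it falls on and its distance from it.

Based on regularization, we have the following set of algorithms. Regularization helps control overfitting (high variance) by adding a penalty for overly complex models to the cost function; the penalty strength is tuned so that the model does not swing to the opposite extreme of high bias.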

**LASSO (L1 regularization)**

In the cost function, we add a term equal to lambda times the L1 norm of the model coefficients, which is the sum of their absolute values. This penalty can shrink some coefficients exactly to zero.

**Ridge regression (L2 regularization)**

In the cost function, we add a term equal to lambda times the squared L2 norm of the model coefficients, which is the sum of their squares.

**Elastic Net**

Here the regularization combines both the L1 and the L2 penalties.
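The three penalized cost functions differ only in the penalty term added to the data-fit error. The sketch below uses mean squared error for a linear model and leaves the bias unpenalized, which is a common convention. The elastic net mixing shown (an `alpha` weight between the L1 and L2 terms) is one of several parameterizations in use, so treat the function names and parameters as illustrative assumptions.

```python
def mse(weights, bias, xs, ys):
    """Mean squared error of the linear model y_hat = w . x + b."""
    errors = [
        (sum(w * xi for w, xi in zip(weights, x)) + bias - y) ** 2
        for x, y in zip(xs, ys)
    ]
    return sum(errors) / len(errors)

def lasso_cost(weights, bias, xs, ys, lam):
    # L1 penalty: lambda * sum of absolute coefficient values (bias not penalized)
    return mse(weights, bias, xs, ys) + lam * sum(abs(w) for w in weights)

def ridge_cost(weights, bias, xs, ys, lam):
    # L2 penalty: lambda * sum of squared coefficient values
    return mse(weights, bias, xs, ys) + lam * sum(w ** 2 for w in weights)

def elastic_net_cost(weights, bias, xs, ys, lam, alpha):
    # alpha blends the penalties: alpha = 1 gives LASSO, alpha = 0 gives ridge
    l1 = sum(abs(w) for w in weights)
    l2 = sum(w ** 2 for w in weights)
    return mse(weights, bias, xs, ys) + lam * (alpha * l1 + (1 - alpha) * l2)

# A model that fits the data perfectly, so only the penalties remain
weights, bias = [2.0, -1.0], 0.0
xs, ys = [[1, 1], [2, 0]], [1, 4]
print(lasso_cost(weights, bias, xs, ys, lam=0.5))   # 1.5
print(ridge_cost(weights, bias, xs, ys, lam=0.5))   # 2.5
print(elastic_net_cost(weights, bias, xs, ys, lam=0.5, alpha=0.5))  # 2.0
```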

**Least angle regression**

This is similar to forward stepwise regression. It is particularly useful when many attributes (features) have to be considered, even more than the number of observations.

Decision trees are built from nodes and branches. A tree keeps growing branches until the data in a branch becomes pure, at which point that branch ends in a leaf. By purity, we mean that all the data points in that region belong to the same class. For example, consider a class of boys and girls. A node could ask "What is the gender?", which has two answers: boy and girl. Every data point (i.e., student) goes down one of the two branches. In the "boy" branch all data points are boys, so that node is pure; the same holds for the "girl" branch.

**Classification and Regression Tree (CART)**

Here splits are chosen using the Gini impurity index, and every split is binary, so each node has exactly two branches.

**Iterative Dichotomiser 3 (ID3)**

Here each split is chosen to maximize the information gain, that is, the reduction in entropy achieved by the split.
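The purity measures used by CART and ID3 are short formulas: Gini impurity is 1 − Σ p_k², entropy is −Σ p_k log₂ p_k, and information gain is the parent's entropy minus the size-weighted entropy of the children. The sketch below computes them for the boys-and-girls example from the text; the function names are illustrative.

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum(p_k^2). It is 0 for a pure node."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits: -sum(p_k * log2(p_k)). It is 0 for a pure node."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

# Splitting on "gender" turns a mixed node into two pure nodes
students = ["boy", "boy", "girl", "girl"]
split = [["boy", "boy"], ["girl", "girl"]]
print(gini(students), entropy(students), information_gain(students, split))  # 0.5 1.0 1.0
```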

**C4.5 and C5.0**

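These are successors of ID3. C4.5 chooses splits using the gain ratio (information gain normalized by the entropy of the split itself) and can also handle continuous attributes and missing values; C5.0 is a faster, more memory-efficient version of C4.5.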

**Chi-square Automatic Interaction Detection (CHAID)**

Here splits are chosen using chi-square tests, and a node can have more than two branches. It is used mostly for descriptive analysis.

**Decision Stump**

A decision stump is a decision tree with just one decision node, i.e., a single split on one feature.
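Because a stump makes only one split, it can be fitted by brute force: try every candidate threshold and every labelling of the two sides, and keep the combination with the fewest misclassifications. The sketch below does this for a single numeric feature; the helper names and the toy data are illustrative.

```python
def fit_stump(xs, ys):
    """Illustrative helper: find the single threshold that misclassifies the fewest points.

    Returns (threshold, left_label, right_label); predict left_label when x <= threshold.
    """
    best = None
    labels = sorted(set(ys))
    for threshold in sorted(set(xs)):
        for left in labels:
            for right in labels:
                errors = sum(
                    (left if x <= threshold else right) != y for x, y in zip(xs, ys)
                )
                if best is None or errors < best[0]:
                    best = (errors, threshold, left, right)
    _, threshold, left, right = best
    return threshold, left, right

def predict_stump(stump, x):
    threshold, left, right = stump
    return left if x <= threshold else right

xs = [1, 2, 3, 4, 5, 6]
ys = ["no", "no", "no", "yes", "yes", "yes"]
stump = fit_stump(xs, ys)
print(stump)  # (3, 'no', 'yes')
```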

**M5**

This model tree can be used for regression: instead of a single class, each leaf node holds a linear regression model that takes the input and predicts the value.

**Continuous decision trees**

There may be scenarios where the target cannot be divided into discrete classes, and in those scenarios we use a continuous variable decision tree. This is also called a regression tree, because it predicts a continuous value at each leaf.

**Conditional inference trees**

In all the previous algorithms, a split was selected based on impurity measures such as entropy, information gain or the Gini index. Here, the split is instead selected by conducting a series of non-parametric statistical tests.

The next type of algorithms we study are the Bayesian algorithms, which are built on Bayes' theorem. The naive variants additionally assume that the input features are independent of each other given the class.

Some of the most popular Bayesian algorithms are listed below.

**Naive Bayes**

Mostly used for high-dimensional datasets. It assumes that the features are independent of each other given the class, calculates the probability of each class from the data, and assigns the class with the highest probability.

**Gaussian Naive Bayes**

It is similar to Naive Bayes, except that here we assume each input feature follows a Gaussian (normal) distribution within each class.
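A minimal sketch of Gaussian Naive Bayes: estimate each class's prior and, per feature, a mean and variance; then score a new point by the log prior plus the sum of per-feature Gaussian log-likelihoods. Summing the per-feature terms is exactly where the independence assumption enters. The helper names, the toy data and the small constant added to each variance (to avoid dividing by zero) are assumptions for illustration.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(rows, labels):
    """Illustrative helper: class priors plus per-feature (mean, variance) for each class."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    model = {}
    for label, class_rows in by_class.items():
        prior = len(class_rows) / len(rows)
        stats = []
        for feature in zip(*class_rows):
            mean = sum(feature) / len(feature)
            # Small constant keeps the variance strictly positive
            var = sum((v - mean) ** 2 for v in feature) / len(feature) + 1e-9
            stats.append((mean, var))
        model[label] = (prior, stats)
    return model

def log_gaussian(x, mean, var):
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def predict_gaussian_nb(model, row):
    """Pick the class with the highest log prior + summed per-feature log-likelihoods."""
    def score(label):
        prior, stats = model[label]
        # Naive independence assumption: the feature log-likelihoods simply add up
        return math.log(prior) + sum(
            log_gaussian(x, mean, var) for x, (mean, var) in zip(row, stats)
        )
    return max(model, key=score)

rows = [(1.0, 2.1), (1.2, 1.9), (0.8, 2.0), (3.0, 4.1), (3.2, 3.9), (2.9, 4.0)]
labels = ["A", "A", "A", "B", "B", "B"]
model = fit_gaussian_nb(rows, labels)
print(predict_gaussian_nb(model, (1.1, 2.0)))  # A
```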

**Multinomial/Binomial Naive Bayes**

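Here the class-conditional probabilities are estimated from frequency counts of the features. The multinomial version models counts (for example, how often each word appears in a document), while the binomial (Bernoulli) version models binary features, such as whether a word appears at all.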

**Averaged One-Dependence Estimators (AODE)**

**Bayesian Belief Network**

**Bayesian Network**

These algorithms group members based on the structure in the data: the data is organized into groups (clusters) so that the members of each group share maximum commonality and similarity.

Some of the popular algorithms are:

**K-means**

In the K-means algorithm, the data is divided into K clusters. K central points (called centroids) are chosen, each data point is assigned to the group of its nearest centroid, and each centroid is then recomputed as the mean of the points assigned to it. These two steps repeat until the assignments stop changing.

**K-medians**

Here the centroids are calculated as the medians of the points in each cluster, which makes the method less sensitive to outliers.
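K-means and K-medians share the same loop: assign every point to its nearest centroid, then recompute each centroid from its group. The sketch below switches between them with a flag; pairing medians with Manhattan distance is the usual choice for K-medians. The helper name, the fixed starting centroids and the toy points (including the outlier at (30, 30)) are illustrative assumptions.

```python
import math
import statistics

def cluster(points, initial_centroids, use_median=False, iterations=100):
    """Illustrative helper: K-means (default) or K-medians on 2-D points."""
    centroids = list(initial_centroids)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        # (Euclidean distance for K-means, Manhattan distance for K-medians)
        groups = [[] for _ in centroids]
        for p in points:
            if use_median:
                dists = [abs(p[0] - c[0]) + abs(p[1] - c[1]) for c in centroids]
            else:
                dists = [math.dist(p, c) for c in centroids]
            groups[dists.index(min(dists))].append(p)
        # Update step: each centroid becomes the mean (or median) of its group
        centre = statistics.median if use_median else statistics.fmean
        new_centroids = [
            tuple(centre(coord) for coord in zip(*group)) if group else c
            for group, c in zip(groups, centroids)
        ]
        if new_centroids == centroids:
            break  # nothing moved, so the assignments have settled
        centroids = new_centroids
    return centroids

points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9), (30, 30)]
kmeans = cluster(points, [(0, 0), (10, 10)])
kmedians = cluster(points, [(0, 0), (10, 10)], use_median=True)
print(kmeans[1])    # (13.875, 13.875): the outlier drags the mean
print(kmedians[1])  # (8.75, 8.75): the median is barely affected
```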

**Expectation Maximization**

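This algorithm alternates between two steps. In the expectation (E) step, the missing or hidden (latent) variables are estimated, as expected values, using the current model parameters. In the maximization (M) step, the model's parameters are updated to maximize the likelihood given those estimates. The two steps repeat until the parameters converge.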
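A common illustration of EM is fitting a mixture of two 1-D Gaussians, where the hidden variable is which component generated each point. The E-step computes each point's responsibility (the probability that it belongs to each component) and the M-step re-estimates the weights, means and variances from those responsibilities. This pure-Python sketch assumes the starting guesses, toy data and iteration count shown; the small constant added to each variance only guards against collapse to zero.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_two_gaussians(xs, means, variances, weights, iterations=50):
    """Illustrative helper: EM for a 1-D Gaussian mixture."""
    means, variances, weights = list(means), list(variances), list(weights)
    for _ in range(iterations):
        # E-step: responsibility r[k] = P(component k | x) under the current parameters
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M-step: re-estimate the parameters from the responsibility-weighted data
        for k in range(len(means)):
            nk = sum(r[k] for r in resp)
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            variances[k] = (
                sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk + 1e-6
            )
            weights[k] = nk / len(xs)
    return means, variances, weights

xs = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
means, variances, weights = em_two_gaussians(xs, [0.0, 6.0], [1.0, 1.0], [0.5, 0.5])
print(means, weights)  # means close to 1.0 and 5.0, weights close to 0.5 each
```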

**Hierarchical Clustering**

In hierarchical clustering, the data is organized into a hierarchy of groups, and within a group the data points are similar. The difference between hierarchical clustering and K-means clustering is that in K-means the number of groups must be decided in advance. There are two ways this algorithm works: agglomerative and divisive. Agglomerative clustering is a bottom-up approach, while divisive clustering is a top-down approach. The agglomerative approach eventually merges the entire dataset into one cluster. It starts with every point on its own; first, nearby points are merged into a cluster, then clusters grow by merging with the next closest points or clusters, and this repeats until all the points are brought into one big cluster. The divisive approach works the other way round, starting from one cluster and splitting it repeatedly.
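The bottom-up (agglomerative) procedure can be sketched in a few lines: start with singleton clusters and repeatedly merge the two closest clusters. The version below uses single linkage (distance between the closest members) on 1-D points and records each merge; the helper name, the linkage choice and the data are illustrative. Running it until one cluster remains gives the full hierarchy.

```python
def agglomerative(points, target_clusters=1):
    """Illustrative helper: single-linkage agglomerative clustering of 1-D points.

    Returns (merges, clusters), where merges lists (cluster_a, cluster_b, distance).
    """
    clusters = [[p] for p in points]  # every point starts as its own cluster
    merges = []
    while len(clusters) > target_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return merges, clusters

points = [1.0, 1.5, 5.0, 5.4, 12.0]
merges, clusters = agglomerative(points, target_clusters=2)
print(clusters)  # [[1.0, 1.5, 5.0, 5.4], [12.0]]
```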

In these algorithms, relationships between variables are uncovered, and rules that explain them are mined from the data. These rules can then be used for predictions.

**Apriori algorithm**

In this algorithm, association rules are mined between items across transactions. For instance, shopkeepers are interested to know whether buyers who buy item A also buy item B; if they usually do, A can be kept near B. Apriori finds such rules by first finding the frequently occurring itemsets, using the property that every subset of a frequent itemset must itself be frequent.
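A minimal pure-Python sketch of Apriori: find frequent single items, then repeatedly join frequent (k−1)-itemsets into k-item candidates, prune any candidate with an infrequent subset, and keep those that meet the minimum support. The confidence of a rule such as "bread → butter" is then support(bread, butter) / support(bread). The helper name, the baskets and the threshold are invented for illustration.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Illustrative helper: return {itemset: support} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset
        return sum(itemset <= t for t in transactions) / n

    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in current}
    k = 2
    while current:
        # Join frequent (k-1)-itemsets into candidate k-itemsets ...
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # ... and prune candidates that have an infrequent (k-1)-subset
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({s: support(s) for s in current})
        k += 1
    return frequent

baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"butter", "milk"},
]
freq = apriori(baskets, min_support=0.5)
# Confidence of the rule bread -> butter
confidence = freq[frozenset({"bread", "butter"})] / freq[frozenset({"bread"})]
print(round(confidence, 3))  # 0.667
```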

**The Eclat algorithm** is also known as equivalence class clustering and bottom-up lattice traversal.

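In this algorithm, frequent itemsets are mined by intersecting the sets of transaction IDs (tid-sets) in which each item appears. This is often faster than the Apriori algorithm, because the support of an itemset is obtained from the size of a tid-set intersection instead of rescanning the whole dataset.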
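The tid-set idea can be sketched as a depth-first search: convert the data to a vertical layout (item → set of transaction IDs), then extend each itemset by intersecting its tid-set with those of the remaining items. The helper name, the baskets and the minimum count of 2 are illustrative assumptions.

```python
def eclat(transactions, min_count):
    """Illustrative helper: depth-first Eclat returning {itemset: count}."""
    # Vertical layout: item -> set of ids of transactions containing it
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)

    frequent = {}

    def extend(prefix, candidates):
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            frequent[frozenset(itemset)] = len(tids)
            # Support of a larger itemset = size of the tid-set intersection
            suffix = []
            for other, other_tids in candidates[i + 1:]:
                common = tids & other_tids
                if len(common) >= min_count:
                    suffix.append((other, common))
            extend(itemset, suffix)

    # Only items that are frequent on their own can start an itemset
    start = sorted(
        (item, tids) for item, tids in tidsets.items() if len(tids) >= min_count
    )
    extend(frozenset(), start)
    return frequent

baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"butter", "milk"},
]
result = eclat(baskets, min_count=2)
print(result[frozenset({"bread", "butter"})])  # 2
```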

These are inspired by the structure of neurons in our brains, which are interconnected with each other. While training the model, the weights of the interconnections are continually adjusted.

**Perceptron**

It is the simplest neural network: a single neuron (node) that can take many inputs and produces a binary output.
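The classic perceptron learning rule nudges the weights only when the binary output is wrong: w ← w + lr·(target − output)·x. The sketch below trains it on logical AND, which is linearly separable, so the rule is guaranteed to converge. The helper names, learning rate and epoch count are illustrative.

```python
def perceptron_output(weights, bias, x):
    # Step activation: the neuron fires (1) only if the weighted sum is positive
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0

def train_perceptron(samples, labels, lr=1.0, epochs=20):
    """Illustrative helper: perceptron learning rule for 0/1 labels."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, labels):
            error = target - perceptron_output(weights, bias, x)
            # error is 0 when the prediction is right, so nothing changes then
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias

samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]  # logical AND
weights, bias = train_perceptron(samples, labels)
print([perceptron_output(weights, bias, x) for x in samples])  # [0, 0, 0, 1]
```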

**Multi-layer perceptrons**

A multi-layer perceptron is a fully connected neural network with at least three layers: an input layer, one or more hidden layers and an output layer. If there is more than one hidden layer, it becomes a deep artificial neural network.

**Backpropagation**

Backpropagation is the algorithm used to train such networks: the error in classification or prediction is propagated backwards from the output layer towards the input layer, and the weights are adjusted using the resulting gradients.

**Stochastic Gradient Descent**

In stochastic gradient descent, instead of computing the gradient on the whole dataset, it is computed on a single point or a small random subset (mini-batch), and each iteration uses different points from the dataset.
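The sketch below applies mini-batch stochastic gradient descent to a one-variable linear regression: each epoch shuffles the data, and each update uses the squared-error gradient of only a small batch. The learning rate, batch size, seed and noise-free toy data are assumptions chosen so the example converges quickly; real data would need tuning.

```python
import random

def sgd_linear_regression(xs, ys, lr=0.05, epochs=200, batch_size=2, seed=0):
    """Illustrative helper: fit y = w*x + b by mini-batch SGD on squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # a different ordering of the points every epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the mean squared error on this mini-batch only
            grad_w = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum(2 * (w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
ys = [2 * x + 1 for x in xs]  # noise-free line with w = 2, b = 1
w, b = sgd_linear_regression(xs, ys)
print(round(w, 3), round(b, 3))  # approximately 2.0 and 1.0
```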

**Hopfield Network**

A Hopfield network is a fully interconnected neural network: every neuron is connected to every other neuron. It works as an associative memory, learning associations so that a stored pattern can be recalled from a partial or noisy version of it.
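A minimal sketch of Hopfield storage and recall, assuming ±1 neuron states: the Hebbian rule sets each weight w_ij to the sum of p_i·p_j over stored patterns (no self-connections), and recall repeatedly sets each neuron to the sign of its weighted input. The helper names and the 6-neuron pattern are illustrative.

```python
def train_hopfield(patterns):
    """Illustrative helper: Hebbian weights for +1/-1 patterns."""
    n = len(patterns[0])
    W = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # every neuron connects to every other, but not to itself
                    W[i][j] += p[i] * p[j]
    return W

def recall(W, state, sweeps=5):
    """Asynchronously update neurons so the state settles on a stored pattern."""
    s = list(state)
    for _ in range(sweeps):
        for i in range(len(s)):
            activation = sum(W[i][j] * s[j] for j in range(len(s)))
            s[i] = 1 if activation >= 0 else -1
    return s

stored = [1, -1, 1, -1, 1, -1]
W = train_hopfield([stored])
noisy = [1, 1, 1, -1, 1, -1]  # second neuron flipped
print(recall(W, noisy) == stored)  # True
```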

**Radial Basis Function Network**

This is a three-layer feed-forward neural network. The first layer is the input layer, the second is the hidden layer of radial basis activation units, and the last is the output layer. The activation units are usually Gaussian functions.

These are extensions of artificial neural networks. Some of the important algorithms in this category are:

**Convolutional Neural Network**

Convolutional neural networks are mainly used for classifying images. An image is stored as an array of pixels. A smaller array, called the kernel or filter, is slid over this input array; at each position, the overlapping values are multiplied element-wise and summed. The kernel is usually much smaller than the input. Features are extracted via this process of convolution.
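The sliding multiply-and-sum can be sketched directly. Like most CNN layers, the version below does not flip the kernel (strictly, it is cross-correlation) and uses no padding, so the output shrinks by kernel size − 1 along each axis. The tiny image and the vertical-edge kernel are made up for illustration; in a CNN, the kernel values are learned.

```python
def convolve2d(image, kernel):
    """Illustrative 'valid' 2-D convolution as used in CNNs (kernel not flipped)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = [[0] * out_w for _ in range(out_h)]
    for r in range(out_h):
        for c in range(out_w):
            # Multiply the kernel element-wise with the patch under it and sum
            output[r][c] = sum(
                image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw)
            )
    return output

image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
# A simple vertical-edge detector
kernel = [
    [-1, 1],
    [-1, 1],
]
features = convolve2d(image, kernel)
print(features)  # [[0, 18, 0], [0, 18, 0]]: strong response only at the edge
```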

**Recurrent Neural Networks**

In a recurrent neural network, the output of a neuron (its hidden state) is fed back as an input at the next time step, so the network retains information about previous inputs. This makes it well suited to analyzing temporal or sequential data.

**Long Short-Term Memory networks**

An LSTM is a type of RNN designed to retain information over long sequences, which plain RNNs struggle with. It consists of a cell and three gates: an input gate, a forget gate and an output gate. The gates control the flow of information, so the important parts of a sequence are stored and used for further processing.

**Stacked Autoencoders**

These are used to reduce the dimensionality of the data. They learn a non-linear mapping between the input and the output and capture features automatically.

An autoencoder has three parts: the encoder, the bottleneck and the decoder. The encoder compresses the input into the bottleneck, keeping the most important features, and the decoder tries to reconstruct the original input from it. Several autoencoders stacked together form a stacked autoencoder.

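**Deep Boltzmann machine**

A deep Boltzmann machine has several layers of hidden units with undirected (bidirectional) connections between neighboring layers. Because the connections are undirected, exact training and inference become computationally very expensive as the network grows.

**Deep Belief networks**

Deep belief networks are formed by stacking multiple restricted Boltzmann machines, which are trained one layer at a time.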

Here large, high-dimensional data is reduced to fewer dimensions based on the inherent structure of the data. This aids visualization and simplifies the data.

**Principal component analysis**

In principal component analysis, the coordinate axes are rotated to new orthogonal directions (principal components), ordered by how much of the data's variance they capture; keeping only the first few reduces the number of dimensions and the complexity of the problem. A classic example: the movement of a piece of chalk on a board is tracked with a camera, giving its position in the x, y and z directions. However, since the chalk moves on the board, its motion is restricted to a plane, so performing PCA reduces the dimension from 3 to 2.
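The chalk example can be reproduced numerically. The sketch below assumes a tilted board on which every recorded point satisfies z = x + y, computes the covariance matrix and extracts the principal components by power iteration with deflation. Libraries typically use an SVD instead; power iteration is used here only to keep the example dependency-free, and all names and data are illustrative. The third variance comes out (numerically) zero, confirming that two dimensions are enough.

```python
import math

def covariance_matrix(points):
    """Sample covariance matrix of d-dimensional points."""
    n, d = len(points), len(points[0])
    means = [sum(p[k] for p in points) / n for k in range(d)]
    centred = [[p[k] - means[k] for k in range(d)] for p in points]
    return [
        [sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
        for i in range(d)
    ]

def principal_components(cov, iterations=500):
    """Illustrative helper: (variance, unit direction) pairs, largest variance first."""
    d = len(cov)
    A = [row[:] for row in cov]
    components = []
    for _ in range(d):
        v = [1.0 / (k + 1) for k in range(d)]  # arbitrary, non-symmetric start vector
        n0 = math.sqrt(sum(x * x for x in v))
        v = [x / n0 for x in v]
        for _ in range(iterations):
            w = [sum(A[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break  # remaining variance is (numerically) zero
            v = [x / norm for x in w]
        # Rayleigh quotient: the variance of the data along direction v
        variance = sum(v[i] * A[i][j] * v[j] for i in range(d) for j in range(d))
        components.append((variance, v))
        # Deflation: remove this component so the next pass finds the next one
        A = [[A[i][j] - variance * v[i] * v[j] for j in range(d)] for i in range(d)]
    return components

# Hypothetical chalk positions (x, y, z) on a tilted board where z = x + y
xy = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 2)]
points = [(x, y, x + y) for x, y in xy]
variances = [var for var, _ in principal_components(covariance_matrix(points))]
print([round(v, 6) for v in variances])  # about [2.2, 0.4, 0.0]
```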

**Principal Component Regression**

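PCR = PCA + LR (linear regression): the inputs are first reduced with PCA, and a linear regression is then fitted on the principal components. It works well on multivariate data.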

**Partial Least Squares Regression (PLSR)**

Here the algorithm tries to reduce the inputs as much as possible while still predicting y. The difference between PCR and PLSR is that PCR chooses its components by looking at X alone, while PLSR also takes Y into account.

**Sammon mapping**

A non-linear mapping from a higher dimension to a lower dimension that tries to preserve the distances between points, found using gradient descent methods.

**Multi-Dimensional Scaling**

**Projection Pursuit**

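Projection pursuit searches for "interesting" low-dimensional projections of the data. Projection indexes, for example ones based on the kurtosis of the projected data, are devised to measure how interesting each projection is.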

**Linear Discriminant Analysis LDA**

LDA finds a lower-dimensional feature subspace that best separates the classes and is mostly used in supervised learning. It assumes that each class follows a Gaussian distribution and that all classes share the same covariance matrix.

**Mixture Discriminant Analysis**

Similar to LDA, but it relaxes the single-Gaussian assumption by modeling each class as a mixture of Gaussian distributions.

**Quadratic Discriminant Analysis**

It is a more general model that assumes each class comes from its own Gaussian distribution with a separate covariance matrix, which gives quadratic decision boundaries.

**Flexible Discriminant Analysis**

Here the linear regression step of discriminant analysis is replaced by more flexible, non-parametric regression models, which allows non-linear decision boundaries.

Ensemble methods combine multiple models. These models work together to give better accuracy than any one of them alone.

**Boosting**

In this process, many weak models are combined to make a stronger model. By a weak model, we mean one that is only slightly better than a random guess, while a strong model's predictions are close to the actual values. The models are trained sequentially, and each new model tries to learn from the weaknesses (errors) of the previous ones, for example by giving more weight to the points they got wrong. The weak rules from all the models are then combined into a strong one. Boosting is used when there is low variance and high bias. AdaBoost and XGBoost are two very popular techniques.

**Bootstrapped aggregation or Bagging**

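In bagging, each model is trained independently on a bootstrap sample (drawn with replacement) of the data, so the models can run in parallel, and their predictions are combined by averaging or voting. Bagging is used when there is high variance and low bias.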
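A minimal sketch of bagging for classification: draw bootstrap samples, fit one weak model per sample (independently, so this loop could run in parallel), and combine predictions by majority vote. The weak learner here is a brute-force threshold stump, and the helper names, number of models, seed and toy data are all illustrative assumptions.

```python
import random
from collections import Counter

def fit_stump(xs, ys):
    """Illustrative weak learner: best single threshold found by brute force."""
    best = None
    labels = sorted(set(ys))
    for t in sorted(set(xs)):
        for left in labels:
            for right in labels:
                errors = sum((left if x <= t else right) != y for x, y in zip(xs, ys))
                if best is None or errors < best[0]:
                    best = (errors, t, left, right)
    _, t, left, right = best
    return lambda x: left if x <= t else right

def bagging(xs, ys, fit, n_models=25, seed=0):
    """Train n_models independently, each on its own bootstrap resample."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        # Bootstrap: draw n indices with replacement
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit([xs[i] for i in idx], [ys[i] for i in idx]))
    return models

def bagged_predict(models, x):
    # Majority vote (for regression, average the predictions instead)
    return Counter(m(x) for m in models).most_common(1)[0][0]

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = ["no", "no", "no", "no", "yes", "yes", "yes", "yes"]
models = bagging(xs, ys, fit_stump)
print(bagged_predict(models, 0), bagged_predict(models, 9))  # no yes
```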

**Weighted Average (blending)**

**Stacked Generalization (stacking)**

**Gradient Boosting Machines (GBM)**

**Gradient-Boosted Regression Trees (GBRT)**

**Random Forest**

**Feature selection algorithm**

These algorithms select or transform the input features to reduce noise and keep the information most relevant for making a prediction.

**Algorithm accuracy evaluation**

- For classification, we can use accuracy, precision, recall, F1 score, the ROC curve and AUC.
- For regression, we can use MSE (mean squared error) and MAE (mean absolute error).
- Ranking metrics include MRR, DCG and NDCG.
- Correlation is one of the statistical metrics.
- PSNR, SSIM and IoU are used for computer vision.
- Perplexity and BLEU score are used for NLP.
- Inception Score and Fréchet Inception Distance are used for deep generative models.
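The most common classification and regression metrics from the list above follow directly from their definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), F1 is their harmonic mean, and MSE and MAE average the squared and absolute errors. The sketch below computes them on small made-up predictions; the function names are illustrative, and returning 0.0 when a denominator is zero is one convention among several.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Illustrative helper: accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

def regression_metrics(y_true, y_pred):
    """Illustrative helper: mean squared error and mean absolute error."""
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mse = sum(e ** 2 for e in errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    return mse, mae

acc, prec, rec, f1 = classification_metrics([1, 0, 1, 1, 0, 1], [1, 0, 0, 1, 1, 1])
mse, mae = regression_metrics([3.0, 5.0, 2.0], [2.5, 5.0, 3.0])
print(acc, prec, rec, f1)  # accuracy 4/6, precision 0.75, recall 0.75, F1 0.75
print(mse, mae)            # MSE 1.25/3, MAE 0.5
```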

performance measures

- Optimization Algorithms
- Evolutionary Algorithms
- Computer Vision
- Natural Language Processing
- Recommender Systems
- Reinforcement Learning
- Graphical Models

Author

Navin Baskar

Author

Skill-Lync
