Modified on
18 Jul 2022 02:10 pm
Skill-Lync
A machine learning algorithm learns from data in order to make predictions or classifications. Depending on whether the input data is labelled, algorithms are classed as supervised or unsupervised. A third family is reinforcement learning, where the algorithm receives feedback on its predictions and learns from that feedback. A typical example of supervised learning is labelling all cat pictures as cats and all dog pictures as dogs and asking the machine to learn from them, whereas in unsupervised learning the cat and dog pictures are given to the machine without labels and it has to find the groups by itself. In reinforcement learning, the machine gets positive feedback every time it identifies a cat as a cat and negative feedback for every wrong identification. Based on this feedback, the model trains itself to make accurate decisions.
Let’s explore how algorithms can be grouped by the function each of them performs.
Regression algorithms try to find a mathematical relationship between the input and the output. We look for a function that takes the inputs as variables and predicts the output.
Some of the well-known regression models are
1. Least Squares Regression
In least squares regression, we fit the line that minimizes the sum of the squared vertical distances (residuals) between the line and the data points. The line need not pass through any of the points.
2. Linear Regression / Polynomial Regression
Here we fit a linear or polynomial function to the data points so that it explains the dataset well.
3. Logistic Regression
Here the output is discrete (for example, yes or no), while the inputs can be continuous. The model estimates the probability of each class. A classic example is the Breast Cancer dataset, where a tumour is classified as malignant or benign.
4. Stepwise Regression
Here variables are added (or removed) one at a time and a statistical test is performed at each step to check significance. Only the variables that pass the significance test are kept in the model.
5. Multivariate Regression
Multivariate regression models the relationship between several independent (predictor) variables and one or more dependent (response) variables. Essentially, it describes how the response behaves as the predictors change.
6. Locally estimated scatterplot smoothing (LOESS)
Here a separate low-degree polynomial is fitted around each point, using only the nearby data and giving closer neighbours more weight. Joining these local fits gives a smooth curve through the scatter plot.
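The least squares fit from item 1 above can be sketched in a few lines. This is a minimal illustration for a single input variable, assuming the closed-form solution for a straight line; the function name `fit_line` and the sample data are made up for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both without the 1/n factor)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative data that lies exactly on y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

With noisy data the line would no longer pass through the points, but it would still be the one with the smallest sum of squared residuals.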
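Instance-based algorithms compare new data points with stored training examples instead of fitting one global function. Some of the well-known ones are
1. k Nearest Neighbor (kNN) Algorithm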
Here a data point is classified by a majority vote among its k nearest neighbours in the training data. The value of k is decided by the user.
2. Learning Vector Quantization (LVQ)
Here we have a two-layer artificial neural network that adjusts its weights as learning happens. The learning is competitive: the prototype closest to a training example wins and is moved towards the example or away from it, depending on whether their labels match.
3. Self-Organizing Map
This is similar to LVQ except that it is unsupervised.
4. Locally weighted Learning
Locally weighted learning is a group of methods that predict the output for a query point by fitting a local model to the training points around it.
5. Support Vector Machines
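Here the classes are separated by a hyperplane chosen to maximize the margin to the nearest training points (the support vectors). A new point is classified according to the side of the hyperplane on which it falls; a variant of the method is used for regression.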
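The k nearest neighbour idea from item 1 above can be sketched as follows. This is a minimal illustration, assuming Euclidean distance and a simple majority vote; the function name `knn_predict` and the toy cat/dog points are invented for the example.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """train is a list of (point, label); return the majority label
    among the k training points closest to query."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Illustrative 2-D points forming two well separated groups
train = [((1.0, 1.0), "cat"), ((1.2, 0.8), "cat"), ((0.9, 1.1), "cat"),
         ((5.0, 5.0), "dog"), ((5.2, 4.9), "dog"), ((4.8, 5.1), "dog")]
prediction = knn_predict(train, (1.1, 1.0), k=3)
print(prediction)
```

Nothing is learned ahead of time here: all the work happens when a new point is queried, which is typical of instance-based methods.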
Below are the different types of machine learning algorithms based on regularization. Regularization is needed so that the model is not caught at the high-variance (overfitting) end of the bias-variance trade-off. Usually, a penalty on model complexity is added to the cost function to avoid this.
1. LASSO (L1 regularization)
In the cost function, we add lambda times the L1 norm of the coefficients, which is the sum of their absolute values.
2. Ridge regression (L2 regularization)
In the cost function, we add lambda times the squared L2 norm of the coefficients, which is the sum of their squares.
3. Elastic Net
Here the regularization is done by combining both L1 and L2 norms.
4. Least angle regression
This is similar to forward stepwise regression and is useful when there are many attributes to consider.
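The penalty terms described above can be written out explicitly. The sketch below is only an illustration of how the L1 and L2 penalties enter a squared-error cost; the function name `penalised_cost` and the `l1_ratio` mixing parameter are assumptions made for this example, and real libraries may scale the terms differently.

```python
def penalised_cost(residuals, weights, lam, l1_ratio):
    """Squared-error cost plus a regularization penalty.

    l1_ratio = 1.0 gives the LASSO penalty, l1_ratio = 0.0 gives the
    ridge penalty, and values in between give an elastic net mix.
    """
    sse = sum(r ** 2 for r in residuals)
    l1 = sum(abs(w) for w in weights)   # sum of absolute coefficients
    l2 = sum(w ** 2 for w in weights)   # sum of squared coefficients
    return sse + lam * (l1_ratio * l1 + (1.0 - l1_ratio) * l2)


# Illustrative residuals and coefficients
residuals = [1.0, -2.0]
weights = [3.0, -4.0]
lasso = penalised_cost(residuals, weights, lam=0.5, l1_ratio=1.0)
ridge = penalised_cost(residuals, weights, lam=0.5, l1_ratio=0.0)
elastic = penalised_cost(residuals, weights, lam=0.5, l1_ratio=0.5)
print(lasso, ridge, elastic)
```

Note that the penalty depends only on the coefficients, not on the residuals, so larger coefficients are punished even when they fit the training data well.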
Decision trees have a structure of nodes and branches. The tree keeps growing branches until the data at the end of a branch becomes pure; at that stage, the end node is called a leaf. By purity, we mean that all the data points in that region belong to the same class. An example would be a class containing girls and boys. A node could ask: what is the gender? This has two answers, boy and girl, and any data point (i.e., student) falls into one of the two. When we check the boys' node, all data points are boys, so that node is pure. The same holds for the girls' node.
1. Classification and regression tree (CART)
Here the splits are chosen using the Gini impurity index and each split is binary.
2. Iterative Dichotomiser 3 (ID3)
Here the split is chosen using entropy: the attribute with the highest information gain is selected.
3. C4.5 and C5.0
C4.5 extends ID3 and chooses splits using the gain ratio, a normalized form of information gain. C5.0 is its faster and more memory-efficient successor.
4. Chi-square automatic interaction detection (CHAID)
Here a node can split into more than two branches, and the splits are chosen using chi-square tests. It is used more for descriptive analysis.
5. Decision Stump
The decision stump is a Decision tree model with just one decision-making node.
6. M5
M5 is a model tree that can be used for regression. Its leaf nodes hold linear functions that take the input and predict a numeric value.
7. Continuous decision trees
There may be scenarios where the target is a continuous value rather than a class, and in those scenarios we use a continuous variable decision tree. This is also called a regression tree because each leaf predicts a number instead of a class label.
8. Conditional inference trees
In the previous trees a split was selected using an impurity measure such as entropy, information gain or the Gini index. Here, however, the split is selected by conducting a series of non-parametric statistical significance tests.
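The purity measures mentioned for CART and ID3 above can be computed directly. The sketch below reuses the boys-and-girls example from the introduction to decision trees; the function names are chosen for this illustration only.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 0 for a pure node, larger for mixed nodes."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: 0 for a pure node."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


classroom = ["boy", "boy", "girl", "girl"]
split_by_gender = [["boy", "boy"], ["girl", "girl"]]
print(gini(classroom), entropy(classroom))
print(information_gain(classroom, split_by_gender))
```

Splitting the mixed classroom on gender produces two pure nodes, so the split recovers all of the parent's entropy as information gain.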
Bayesian algorithms apply Bayes' theorem to compute the probability of a class given the observed inputs. The naive Bayes variants additionally assume that the input variables are independent of each other given the class.
Some of the most popular Bayesian algorithms are listed below.
1. Naive Bayes
Mostly used on high-dimensional datasets, under the assumption that the features are independent of each other. The class probabilities are calculated and the class with the highest probability is chosen.
2. Gaussian Naive Bayes
It is similar to the Naive Bayes, except that here we make an assumption that our input features follow a Gaussian distribution.
3. Multinomial/Binomial Naive Bayes
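Here the features are frequency counts, for example word counts in a document. The multinomial variant models how often each feature occurs, while the binomial variant (usually called Bernoulli naive Bayes) models only whether it occurs or not.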
4. Averaged one-dependence Estimators (AODE)
5. Bayesian Belief Network
6. Bayesian Network
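A count-based naive Bayes classifier, as described in items 1 and 3 above, can be sketched as follows. This is a minimal illustration, assuming multinomial word counts with add-one (Laplace) smoothing; the function names and the tiny spam/ham dataset are invented for the example.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """docs is a list of (words, label). Returns the counts the model needs."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in docs:
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab


def predict_nb(model, words):
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total_docs)  # log prior of the class
        total_words = sum(word_counts[label].values())
        for w in words:
            # Independence assumption: per-word log likelihoods are added.
            # Add-one smoothing avoids zero probabilities for unseen words.
            p = (word_counts[label][w] + 1) / (total_words + len(vocab))
            score += math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Illustrative training data
docs = [(["free", "prize", "now"], "spam"),
        (["win", "free", "cash"], "spam"),
        (["meeting", "at", "noon"], "ham"),
        (["lunch", "meeting", "today"], "ham")]
model = train_nb(docs)
print(predict_nb(model, ["free", "cash"]))
```

Working with log probabilities keeps the product of many small numbers from underflowing.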
Clustering algorithms group the data based on the structure in it. The data is organized into groups so that members of the same group are as similar as possible.
Some of the popular algorithms are
1. K-means
In the K-means algorithm, the data is divided into K groups. K central points (also called centroids) are identified, each data point is assigned to the group whose centroid is nearest, and each centroid is recalculated as the mean of the points in its group. These two steps repeat until the groups stop changing.
2. K-medians
Here the centroids are calculated based on the medians.
3. Expectation Maximization
This algorithm alternates between two steps. The first is the expectation step and the second is the maximization step. In the first step, the missing (hidden) variables are estimated using the current model parameters, and in the second step, the parameters are updated to maximize the likelihood of the data.
4. Hierarchical Clustering
In hierarchical clustering, the data is clustered into nested groups, and within a group the data points are similar. The difference between hierarchical clustering and K-means clustering is that in the latter the number of groups is decided in advance. There are two ways in which this algorithm works: agglomerative and divisive. Agglomerative is a bottom-up approach, while divisive is a top-down approach. In the agglomerative approach, every point starts as its own cluster. First, the nearest points are merged into small clusters. Then the clusters grow by merging with the next nearest points or clusters. This process repeats until all the points are brought into one big cluster. The divisive approach works the other way round: it starts with one big cluster and splits it repeatedly.
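The K-means loop from item 1 above can be sketched as follows. This is a minimal illustration, assuming fixed starting centroids and a fixed number of iterations instead of a convergence check; the function name `kmeans` and the sample points are made up for the example.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points and recomputing centroids.
    iterations must be at least 1."""
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid becomes the mean of its cluster
        # (an empty cluster keeps its old centroid)
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters


# Illustrative data with two obvious groups, K = 2
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
```

Replacing the mean in the update step with a per-coordinate median would give the K-medians variant from item 2.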
In association rule learning algorithms, relationships between variables are uncovered and rules that explain them are extracted from the data. These rules can then be used for prediction.
1. Apriori algorithm
In this algorithm, association rules are studied between the items in transactions. For instance, shopkeepers are always interested to know whether buyers who buy object A also buy object B. If they do, A is kept near B. These kinds of association rules are mined from the dataset.
2. Eclat algorithm (Equivalence Class Clustering and bottom-up Lattice Traversal)
In this algorithm, frequent itemsets are found by intersecting the sets of transaction ids in which each item occurs. This is often more efficient than the Apriori algorithm.
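The shopping example above can be made concrete with support and confidence, the two quantities association rules are usually judged by. The sketch below only covers item pairs and is a simplified illustration of the Apriori pruning idea, not a full implementation; the function names and baskets are invented for the example.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return {pair: support} for item pairs bought together often enough."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in set(t))
    # Apriori pruning: a pair can only be frequent if both items are frequent
    frequent_items = {i for i, c in item_counts.items() if c / n >= min_support}
    pair_counts = Counter()
    for t in transactions:
        kept = sorted(set(t) & frequent_items)
        pair_counts.update(combinations(kept, 2))
    return {p: c / n for p, c in pair_counts.items() if c / n >= min_support}


def confidence(transactions, a, b):
    """Fraction of the baskets containing a that also contain b."""
    with_a = [t for t in transactions if a in t]
    return sum(1 for t in with_a if b in t) / len(with_a)


# Illustrative shopping baskets
baskets = [["bread", "butter", "milk"],
           ["bread", "butter"],
           ["bread", "jam"],
           ["milk", "butter"]]
pairs = frequent_pairs(baskets, min_support=0.5)
print(pairs)
print(confidence(baskets, "bread", "butter"))
```

Here bread and butter appear together in half of the baskets, which is the kind of evidence a shopkeeper would use to place them near each other.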
Artificial neural networks (ANN) are inspired by the structure of neurons in our brains. The neurons are interconnected, and while the model is trained, the weights of the interconnections are constantly adjusted.
1. Perceptron
It is a single neuron that takes many inputs and produces a binary output.
2. Multi-layer perceptrons
A multi-layer perceptron is a fully connected neural network with at least three layers: an input layer, a hidden layer and an output layer. If there is more than one hidden layer, it becomes a deep neural network.
3. Backpropagation
Backpropagation is the training procedure in which the errors in classification or prediction are propagated backwards from the output towards the input, and the weights are adjusted accordingly.
4. Stochastic Gradient descent
In stochastic gradient descent, the gradient is calculated on a small random part of the data at each iteration, and a different part is used for the next iteration.
5. Hopfield Network
A Hopfield network is a fully interconnected neural network; that is, every neuron is connected to every other neuron. This network is used to learn associations.
6. Radial Basis Function Network
This is a three-layer feed-forward neural network. The first layer is the input layer, the second is the hidden layer with the activation units, and the last is the output layer. The activation units are mainly Gaussian functions.
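The weight adjustment described for the perceptron in item 1 above can be sketched as follows. This is a minimal illustration of the classic perceptron learning rule on the logical AND function; the function names, the learning rate and the number of epochs are assumptions made for this example.

```python
def train_perceptron(samples, epochs=20, lr=1.0):
    """Perceptron learning rule for two inputs and a binary output."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            activation = sum(w * x for w, x in zip(weights, inputs)) + bias
            output = 1 if activation > 0 else 0
            error = target - output  # -1, 0 or +1
            # Weights move only when the prediction was wrong
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias


def predict(weights, bias, inputs):
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0


# Truth table of logical AND, which is linearly separable
and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_gate)
print([predict(weights, bias, x) for x, _ in and_gate])
```

A single perceptron can only learn linearly separable functions such as AND; something like XOR needs the hidden layer of a multi-layer perceptron.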
Deep learning algorithms are extensions of ANNs with many layers. Some of the important algorithms in this category are
1. Convolutional Neural Network
Convolutional neural networks are mainly used for classifying images. The images are stored as arrays of pixels. A smaller array, called the kernel or the filter, slides over the input array; at each position the overlapping values are multiplied element by element and summed. The kernel is usually much smaller than the input. Features are extracted via this process of convolution.
2. Recurrent Neural Networks
In a recurrent neural network, the connections form loops, so the output of a neuron at one time step is fed back as input at the next step. This gives the network a memory of earlier inputs and makes it well suited to analyzing temporal or sequential data.
3. Long Short-Term Memory networks (LSTM)
An LSTM is a type of RNN that can remember information over long sequences. It consists of a cell, an input gate, a forget gate, and an output gate. The three gates control the flow of information into and out of the cell. The important parts of a message are stored and used for further processing.
4. Stacked Autoencoders
These are used to reduce the dimension of the data. A non-linear function describes the relationship between the input and the compressed representation, so the network captures features automatically.
An autoencoder contains three parts: the encoder, the bottleneck, and the decoder. The encoder compresses the input into the bottleneck, keeping the most important features. The decoder tries to reconstruct the original information from the bottleneck. Multiple autoencoders stacked on top of each other form a stacked autoencoder.
5. Deep Boltzmann machine
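In a deep Boltzmann machine, the neurons are arranged in several layers and the connections between adjacent layers are undirected, so signals flow in both directions. Because the number of possible network states grows exponentially with the number of neurons, training is computationally expensive.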
6. Deep Belief networks
Deep belief networks arise when we stack multiple restricted Boltzmann machines (RBMs), with the hidden layer of one serving as the visible layer of the next.
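The convolution step described for convolutional neural networks in item 1 above can be sketched in plain Python. This is a minimal illustration without padding or strides, written as the sliding multiply-and-sum that CNN layers typically compute; the function name `convolve2d`, the tiny image and the edge-detecting kernel are made up for the example.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image; at each position multiply the
    overlapping values element by element and sum them."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = sum(image[i + a][j + b] * kernel[a][b]
                        for a in range(kh) for b in range(kw))
            row.append(total)
        output.append(row)
    return output


# Illustrative 3x4 image: dark on the left, bright on the right
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
# A 2x2 kernel that responds to a dark-to-bright vertical edge
vertical_edge = [[-1, 1],
                 [-1, 1]]
feature_map = convolve2d(image, vertical_edge)
print(feature_map)
```

The feature map is non-zero only in the column where the edge sits, which is what extracting a feature by convolution means in practice. In a real CNN the kernel values are learned during training instead of being written by hand.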
Here high-dimensional data is reduced to fewer dimensions by exploiting its inherent structure. This technique helps in visualizing or simplifying the data.
1. Principal component analysis
In principal component analysis, the coordinate axes are rotated in the higher-dimensional space so that they line up with the directions of greatest variance, and the directions with little variance are dropped. This reduces the complexity of the problem. A classic example is tracking the movement of a piece of chalk on a board with a camera, which gives the position of the chalk in the x, y, and z directions. However, since the chalk moves on the board, the motion is restricted to a plane, and performing a PCA would reduce the dimension from 3 to 2.
2. Principal Component Regression
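PCR = PCA + LR (linear regression): the regression is performed on the principal components instead of the original variables. It works well on multivariate data with correlated predictors.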
3. Partial Least Squares Regression (PLSR)
Here the algorithm tries to reduce the input X to as few components as possible while still predicting Y. The difference between PCR and PLSR is that PCR considers X alone when choosing the components, while PLSR considers Y as well.
4. Sammon mapping
It maps data from a higher dimension to a lower dimension while trying to preserve the distances between points, typically using gradient descent.
5. Multi-Dimensional Scaling
6. Projection Pursuit
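Projection pursuit searches for low-dimensional projections of the data that show the most interesting structure. How interesting a projection is gets scored by a projection index, which can be based on, for example, the kurtosis of the projected data.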
7. Linear Discriminant Analysis (LDA)
LDA finds a feature subspace that best separates the classes and is mostly used in supervised learning. It assumes that every class follows a Gaussian distribution and that all classes share the same covariance matrix.
8. Mixture Discriminant Analysis
Similar to LDA, but the single-Gaussian assumption is relaxed: each class is modelled as a mixture of Gaussian distributions.
9. Quadratic Discriminant Analysis
Similar to LDA, but each class is assumed to follow a Gaussian distribution with its own covariance matrix, which gives quadratic decision boundaries.
10. Flexible Discriminant Analysis
Here LDA is recast as a regression problem, so that flexible (non-linear) regression models can be used in place of linear regression for prediction purposes.
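The chalk example under principal component analysis above reduces three dimensions to two. The same idea can be sketched one dimension lower, reducing 2-D points that lie along a line to a single coordinate. This is only an illustration: it assumes the closed-form angle of the main axis of a 2x2 covariance matrix, which does not generalize beyond two dimensions, and the function names and data are made up.

```python
import math


def first_principal_axis(points):
    """Direction of greatest variance for 2-D points, plus the mean."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Rotation angle that diagonalizes the 2x2 covariance matrix
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (math.cos(theta), math.sin(theta)), (mx, my)


def project(points, axis, mean):
    """Coordinate of each centred point along the principal axis."""
    return [(x - mean[0]) * axis[0] + (y - mean[1]) * axis[1]
            for x, y in points]


# Illustrative points lying exactly on the diagonal y = x
points = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
axis, mean = first_principal_axis(points)
scores = project(points, axis, mean)
print(axis, scores)
```

Each 2-D point is now described by a single number without losing information, because all the variance lies along the diagonal.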
Ensemble methods are a combination of multiple models. These models work together to give better accuracy.
1. Boosting
In this process, many weak models are combined to make a stronger model. By a weak model we mean a model that is only slightly better than a random guess, while the predictions of a strong model are close to the actual values. Here samples of the data are used to train the models sequentially, and each succeeding model tries to learn from the weaknesses of the previous one. The weak rules from all the models are combined to form a strong one. Boosting is used when there is low variance and high bias. AdaBoost and XGBoost are two very popular techniques.
2. Bootstrapped aggregation or Bagging
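In bagging, the models are trained in parallel on bootstrapped samples (random samples drawn with replacement) of the data, and their predictions are averaged or put to a vote. Bagging is used when there is high variance and low bias.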
3. Weighted Average (blending)
4. Stacked Generalization (stacking)
5. Gradient boosting machines (GBM)
6. Gradient boosted regression trees (GBRT)
7. Random forest
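A random forest trains many decision trees on bootstrapped samples and random subsets of the features, and combines their predictions by voting or averaging. This reduces the noise of the individual trees and gives a more reliable prediction.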
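The way an ensemble combines its members can be sketched with a simple majority vote. The example below is only an illustration: the three decision stumps are written by hand instead of being trained, and the feature names and thresholds are hypothetical.

```python
from collections import Counter


def majority_vote(models, x):
    """Return the label predicted by most of the models."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


# Three weak decision stumps, each looking at a single (hypothetical) feature
stumps = [
    lambda p: "cat" if p["weight_kg"] < 6 else "dog",
    lambda p: "cat" if p["height_cm"] < 30 else "dog",
    lambda p: "cat" if p["ear_cm"] < 7 else "dog",
]

# A small dog: the weight stump gets it wrong, the other two get it right
small_dog = {"weight_kg": 5, "height_cm": 35, "ear_cm": 9}
print(majority_vote(stumps, small_dog))
```

One stump misclassifies the example, but it is outvoted by the other two, which is how an ensemble can be more accurate than any of its members.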
Algorithm Accuracy Evaluation
For classification, we can use accuracy, precision, recall, F1 score, the ROC curve and AUC
For regression, we can use MSE and MAE
Ranking metrics include MRR, DCG and NDCG
Correlation is one of the statistical metrics
PSNR, SSIM and IoU are used for computer vision
Perplexity and BLEU scores are used for NLP
Inception score and Fréchet Inception Distance are performance measures for deep learning models
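Several of the classification and regression metrics listed above can be computed by hand. The sketch below is a minimal illustration for binary labels; the function names and the sample predictions are made up for the example.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 score for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Illustrative labels: one false negative and one false positive
scores = classification_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0])
print(scores)
print(mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]), mae([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

MSE punishes the single large error more heavily than MAE does, which is the usual reason for choosing one over the other.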
Apart from these, there are other families such as optimization algorithms, evolutionary algorithms, computer vision, natural language processing, recommender systems, reinforcement learning and graphical models.
Author
Navin Baskar
Author
Skill-Lync