Sushant Ovhal
updated on 09 Oct 2022
1) Perform Gradient Descent in Python with any loss function
Gradient Descent
Gradient descent is an optimization algorithm used to minimize the cost function in many machine learning algorithms, and it is commonly used to train machine learning models and neural networks. Training data helps these models learn over time, and the cost function within gradient descent acts as a barometer, gauging accuracy with each iteration of parameter updates. Until the cost is close to or equal to zero, the model continues to adjust its parameters to yield the smallest possible error.
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6])
y = np.array([6, 7, 8, 9, 11, 12])

m = 0            # slope
b = 0            # intercept
n = len(x)
learning_rate = 0.001
iteration = 10

for i in range(iteration):
    y_predicted = m * x + b
    cost = (1 / n) * sum([val ** 2 for val in (y - y_predicted)])  # mean squared error
    md = -(2 / n) * sum(x * (y - y_predicted))   # partial derivative of cost w.r.t. m
    bd = -(2 / n) * sum(y - y_predicted)         # partial derivative of cost w.r.t. b
    m = m - learning_rate * md
    b = b - learning_rate * bd
    print("m{},b{},cost{} iteration{}".format(m, b, cost, i))
Ans:
m0.069,b0.017666666666666664,cost82.5 iteration0
m0.13578333333333334,b0.034815,cost77.5079425 iteration1
m0.20042086722222224,b0.05146155333333333,cost72.82981771791017 iteration2
m0.2629812033764815,b0.06762235082277777,cost68.44587474255758 iteration3
m0.32353075041830215,b0.08331290436416351,cost64.33760528734116 iteration4
m0.3821337939917312,b0.09854822996917374,cost60.48766551075427 iteration5
m0.4388525646308645,b0.11334286361795995,cost56.879802755553044 iteration6
m0.4937473034584025,b0.12771087660497465,cost53.498786897115764 iteration7
m0.5468763257839295,b0.14166589039422256,cost50.330346011006384 iteration8
m0.5982960826690574,b0.1552210909996133,cost47.361106088000476 iteration9
Alternative implementation: gradient descent with a convergence check
x = [1, 2, 3, 4, 5, 6]
y = [6, 7, 8, 9, 11, 12]
n = len(x)

# model: y_predicted = m*x + b
m = 0
b = 0
error = 1e-6              # stop once the MSE improvement falls below this
mse = 1e9
learning_rate = 0.01
iteration = 1
max_iterations = 100000   # safety cap: the data are noisy, so the MSE itself never reaches zero

while iteration <= max_iterations:
    square = 0
    dm = 0
    db = 0
    for i in range(0, n):
        square = square + pow(y[i] - (m * x[i] + b), 2)
        dm = dm + (2 / n) * (-x[i] * (y[i] - (m * x[i] + b)))  # d(MSE)/dm (sign bug in m*x[i]-b fixed)
        db = db + (2 / n) * (-1 * (y[i] - (m * x[i] + b)))     # d(MSE)/db
    previous_mse = mse
    mse = square / n
    print("Iteration=" + str(iteration) + " mean square error=" + str(mse))
    print("m=" + str(m) + " b=" + str(b))
    if abs(previous_mse - mse) < error:   # converged: the loss has stopped improving
        break
    b = b - learning_rate * db
    m = m - learning_rate * dm
    iteration = iteration + 1

print("The parameters for the best fit line are: b=" + str(b) + " m=" + str(m))
Ans:
Iteration=1mean square Error=6.0
m=0b=0
Iteration=2mean square Error=14.027266666666668
m=0.02b=0.02
Iteration=3mean square Error=23.850670578518518
m=0.08653333333333334b=0.06313333333333333
Iteration=4mean square Error=34.3678233991607
m=0.23110200000000003b=0.13185755555555556
Iteration=5mean square Error=45.981067048968065
m=0.48510332740740747b=0.2270608925925926
Iteration=6mean square Error=52.67004293574694
m=0.8857970590740741b=0.35008897119753085
Iteration=7mean square Error=2.772454088221284
2) Difference between the L1 & L2 loss functions
L1 (Mean Absolute Error):
1) L1 is the Mean Absolute Error (MAE).
2) Mean Absolute Error is another loss function used for regression models.
3) It measures the average magnitude of errors in a set of predictions without considering their directions.
L2 (Mean Squared Error):
1) L2 is the Mean Squared Error (MSE).
2) Mean Squared Error is the most commonly used regression loss function.
3) MSE is the sum of squared distances between our target variable and the predicted values.
3) What are the different loss functions for regression
1) Mean Squared Error / L2 Loss
2) Mean Absolute Error / L1 Loss
3) Root Mean Squared Error
4) Mean Bias Error
5) Huber Loss
6) Hinge Loss
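A minimal NumPy sketch of the first five losses is below (the hinge loss is normally used for classification rather than regression, so it is omitted; the Huber `delta` of 1.0 is an illustrative choice, not a fixed standard):

```python
import numpy as np

def mse(y, y_pred):
    # L2 loss: mean of the squared residuals
    return np.mean((y - y_pred) ** 2)

def mae(y, y_pred):
    # L1 loss: mean of the absolute residuals
    return np.mean(np.abs(y - y_pred))

def rmse(y, y_pred):
    # square root of the MSE, in the same units as y
    return np.sqrt(mse(y, y_pred))

def mean_bias_error(y, y_pred):
    # keeps the sign of the residuals, so over- and under-predictions cancel
    return np.mean(y - y_pred)

def huber(y, y_pred, delta=1.0):
    # quadratic for small residuals, linear for large ones
    r = y - y_pred
    small = np.abs(r) <= delta
    return np.mean(np.where(small, 0.5 * r ** 2, delta * (np.abs(r) - 0.5 * delta)))

y = np.array([6.0, 7.0, 8.0, 9.0, 11.0, 12.0])
y_pred = np.array([6.5, 7.0, 8.5, 9.0, 10.0, 12.0])
print(mse(y, y_pred), mae(y, y_pred), rmse(y, y_pred))
```

Note how the MSE weights the single large residual (y = 11 vs. 10) much more heavily than the MAE does; that sensitivity to outliers is the main practical difference between L1 and L2.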
4) What is the importance of learning rate
The learning rate (also referred to as the step size or alpha) is the size of the steps taken to reach the minimum. It is typically a small value, and it is evaluated and updated based on the behavior of the cost function. A high learning rate results in larger steps but risks overshooting the minimum. Conversely, a low learning rate takes small steps; while this gives more precision, the larger number of iterations compromises overall efficiency, since reaching the minimum takes more time and computation.
The amount that the weights are updated during training is referred to as the step size or the "learning rate." Specifically, the learning rate is a configurable hyperparameter used in the training of neural networks that has a small positive value, often in the range between 0.0 and 1.0. The learning rate controls how quickly the model adapts to the problem. Smaller learning rates require more training epochs, given the smaller changes made to the weights each update, whereas larger learning rates result in rapid changes and require fewer training epochs. A learning rate that is too large can cause the model to converge too quickly to a suboptimal solution, whereas a learning rate that is too small can cause the process to get stuck. The challenge of training deep neural networks therefore involves carefully selecting the learning rate; it may be the most important hyperparameter for the model.
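The trade-off can be seen on a toy problem. This sketch (assumed setup: minimizing f(x) = x², whose gradient is 2x, starting from x = 10 and taking 50 steps) compares a small, a moderate, and an overly large learning rate:

```python
def descend(learning_rate, steps=50, x0=10.0):
    # minimize f(x) = x**2 (gradient = 2*x) by gradient descent, starting from x0
    x = x0
    for _ in range(steps):
        x = x - learning_rate * 2 * x   # gradient descent update
    return x

print(descend(0.01))   # small rate: still far from the minimum at 0 after 50 steps
print(descend(0.1))    # moderate rate: converges close to 0 quickly
print(descend(1.1))    # too large: each step overshoots and the iterate diverges
```

With a rate of 0.01 the iterate creeps toward zero; with 0.1 it is essentially converged after 50 steps; with 1.1 each update overshoots the minimum by more than it corrects, so the iterate grows without bound.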
5) How to evaluate linear regression
Linear regression is a regression model that uses a straight line to describe the relationship between variables. It finds the line of best fit through your data by searching for the value of the regression coefficient(s) that minimizes the total error of the model.
There are two main types of linear regression:
1) Simple linear regression uses only one independent variable.
2) Multiple linear regression uses two or more independent variables.
There are three main metrics for model evaluation in regression:
1) R-Squared / Adjusted R-Squared
R-Squared is the ratio of the Sum of Squares Regression (SSR) to the Sum of Squares Total (SST). The Sum of Squares Regression is the amount of variance explained by the regression line. The R-squared value measures goodness of fit: the greater the value of R-Squared, the better the regression model fits the data.
2) Mean Square Error (MSE) / Root Mean Square Error (RMSE)
Mean Squared Error (MSE) is the sum of squared distances between our target variable and the predicted values. It is similar to MAE, but the errors are squared before averaging, so the MSE has units that are the square of whatever is plotted on the vertical axis. Another quantity we calculate is the Root Mean Squared Error (RMSE), which is just the square root of the MSE.
3) Mean Absolute Error (MAE)
MAE is the sum of absolute differences between our target and predicted variables. It measures the average magnitude of errors in a set of predictions, without considering their directions.
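The three metrics above can be computed directly on the same x and y data used in the gradient descent code earlier. This is a minimal sketch that fits the least-squares line with NumPy's `polyfit` instead of gradient descent:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([6, 7, 8, 9, 11, 12], dtype=float)

# least-squares line of best fit: y_pred = m*x + b
m, b = np.polyfit(x, y, 1)
y_pred = m * x + b

residuals = y - y_pred
mse = np.mean(residuals ** 2)
rmse = np.sqrt(mse)
mae = np.mean(np.abs(residuals))

sst = np.sum((y - y.mean()) ** 2)   # total sum of squares
sse = np.sum(residuals ** 2)        # sum of squared errors
r2 = 1 - sse / sst                  # coefficient of determination

print(f"m={m:.4f}, b={b:.4f}, MSE={mse:.4f}, RMSE={rmse:.4f}, MAE={mae:.4f}, R^2={r2:.4f}")
```

An R² near 1 here indicates the line explains almost all of the variation in y, which is consistent with the small costs the gradient descent runs above are converging toward.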
6) What is the difference between the multiple and adjusted coefficients of determination
The proportion of variance explained by all of the independent variables together is called the coefficient of multiple determination. R is called the multiple correlation coefficient; it measures the correlation between the predicted and actual values of the dependent variable.
R² = SSR/SST
where SSR is the sum of squares regression and SST is the total sum of squares. The multiple coefficient of determination represents the proportion of the variability in the response that is explained by the multiple regression equation. Multiple R-squared is simply R-squared for models that have multiple predictor variables; it measures the amount of variation in the response variable that can be explained by the predictor variables. The fundamental point is that when you add predictors to your model, the multiple R-squared will always increase, because a predictor will always explain some portion of the variance.
The adjusted coefficient of determination measures the goodness of a regression equation while controlling against this increase: it adds a penalty for the number of predictors in the model. It therefore strikes a balance between the most parsimonious model and the best-fitting model. Generally, a large difference between your multiple and adjusted R-squared indicates that you may have overfit your model.
R², defined as SSR/SST, measures the proportion of the total variation in Y explained by the regression model.
Here are some key points about R²:
1) It is a non-negative quantity with range 0 ≤ R² ≤ 1.
2) R² = 0 implies that the regression line does not fit the data at all.
3) R² = 1 implies that the regression line is a perfect fit.
Adjusted R² is a modified version of R² adjusted with the number of predictors. It penalizes for adding unnecessary features and allows a comparison of regression models with a different number of predictors. The adjusted R-squared increases only if the new term improves the model more than would be expected by chance. It decreases when a predictor improves the model by less than expected by chance.
Obtaining a negative value for Adjusted R² can indicate some or all of the following:
1) The linear model is a poor fit for the data.
2) The number of predictors is large.
3) The number of samples is small.
More generally, R² is a statistic used with statistical models whose main purpose is either the prediction of future outcomes or the testing of hypotheses on the basis of other related information. It provides a measure of how well observed outcomes are replicated by the model, based on the proportion of the total variation of outcomes explained by the model.
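The penalty can be sketched directly from the standard formula Adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1). The R² value, sample count, and predictor counts below are hypothetical, chosen only to show the effect:

```python
def adjusted_r2(r2, n, p):
    # n: number of samples, p: number of predictors
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# the same R^2 = 0.90 is penalized more heavily as predictors are added
print(adjusted_r2(0.90, n=30, p=2))    # mild penalty with 2 predictors
print(adjusted_r2(0.90, n=30, p=10))   # heavier penalty with 10 predictors
```

Both adjusted values fall below 0.90, and the drop grows with the number of predictors, which is exactly the overfitting safeguard described above.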