Sushant Ovhal
updated on 09 Oct 2022
1) Perform Gradient Descent in Python with any loss function
Gradient Descent
Gradient descent is an optimization algorithm used to minimize the cost function in many machine learning algorithms, and it is commonly used to train machine learning models and neural networks. Training data helps these models learn over time, and the cost function within gradient descent acts as a barometer, gauging accuracy with each iteration of parameter updates. Until the cost is close to or equal to zero, the model continues to adjust its parameters to yield the smallest possible error.
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6])
y = np.array([6, 7, 8, 9, 11, 12])

m = 0            # slope
b = 0            # intercept
n = len(x)
learning_rate = 0.001
iteration = 10

for i in range(iteration):
    y_predicted = m * x + b
    cost = (1 / n) * sum([val ** 2 for val in (y - y_predicted)])  # mean squared error
    md = -(2 / n) * sum(x * (y - y_predicted))   # partial derivative of cost w.r.t. m
    bd = -(2 / n) * sum(y - y_predicted)         # partial derivative of cost w.r.t. b
    m = m - learning_rate * md
    b = b - learning_rate * bd
    print("m{},b{},cost{} iteration{}".format(m, b, cost, i))
Ans:
m0.069,b0.017666666666666664,cost82.5 iteration0
m0.13578333333333334,b0.034815,cost77.5079425 iteration1
m0.20042086722222224,b0.05146155333333333,cost72.82981771791017 iteration2
m0.2629812033764815,b0.06762235082277777,cost68.44587474255758 iteration3
m0.32353075041830215,b0.08331290436416351,cost64.33760528734116 iteration4
m0.3821337939917312,b0.09854822996917374,cost60.48766551075427 iteration5
m0.4388525646308645,b0.11334286361795995,cost56.879802755553044 iteration6
m0.4937473034584025,b0.12771087660497465,cost53.498786897115764 iteration7
m0.5468763257839295,b0.14166589039422256,cost50.330346011006384 iteration8
m0.5982960826690574,b0.1552210909996133,cost47.361106088000476 iteration9
Alternative implementation: gradient descent with a convergence check
x = [1, 2, 3, 4, 5, 6]
y = [6, 7, 8, 9, 11, 12]
n = len(x)

# model: y_predicted = m*x + b
m = 0
b = 0
error = 1e-6              # stop once the MSE improvement falls below this
mse = 1e9
learning_rate = 0.01
iteration = 1
max_iterations = 100000   # safety cap: the data are noisy, so the MSE itself never reaches zero

while iteration <= max_iterations:
    square = 0
    dm = 0
    db = 0
    for i in range(0, n):
        square = square + pow(y[i] - (m * x[i] + b), 2)
        dm = dm + (2 / n) * (-x[i] * (y[i] - (m * x[i] + b)))  # d(MSE)/dm (sign bug in m*x[i]-b fixed)
        db = db + (2 / n) * (-1 * (y[i] - (m * x[i] + b)))     # d(MSE)/db
    previous_mse = mse
    mse = square / n
    print("Iteration=" + str(iteration) + " mean square error=" + str(mse))
    print("m=" + str(m) + " b=" + str(b))
    if abs(previous_mse - mse) < error:   # converged: the loss has stopped improving
        break
    b = b - learning_rate * db
    m = m - learning_rate * dm
    iteration = iteration + 1

print("The parameters for the best fit line are: b=" + str(b) + " m=" + str(m))
Ans:
Iteration=1mean square Error=6.0
m=0b=0
Iteration=2mean square Error=14.027266666666668
m=0.02b=0.02
Iteration=3mean square Error=23.850670578518518
m=0.08653333333333334b=0.06313333333333333
Iteration=4mean square Error=34.3678233991607
m=0.23110200000000003b=0.13185755555555556
Iteration=5mean square Error=45.981067048968065
m=0.48510332740740747b=0.2270608925925926
Iteration=6mean square Error=52.67004293574694
m=0.8857970590740741b=0.35008897119753085
Iteration=7mean square Error=2.772454088221284
2) Difference between the L1 & L2 loss functions
L1 (Mean Absolute Error):
1) L1 is the Mean Absolute Error (MAE).
2) Mean Absolute Error is another loss function used for regression models.
3) It measures the average magnitude of errors in a set of predictions without considering their directions.
L2 (Mean Squared Error):
1) L2 is the Mean Squared Error (MSE).
2) Mean Squared Error is the most commonly used regression loss function.
3) MSE is the sum of squared distances between our target variable and the predicted values.
3) What are the different loss functions for regression
1) Mean Squared Error / L2 Loss
2) Mean Absolute Error / L1 Loss
3) Root Mean Squared Error
4) Mean Bias Error
5) Huber Loss
6) Hinge Loss
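A minimal NumPy sketch of the first five losses is below (the hinge loss is normally used for classification rather than regression, so it is omitted; the Huber `delta` of 1.0 is an illustrative choice, not a fixed standard):

```python
import numpy as np

def mse(y, y_pred):
    # L2 loss: mean of the squared residuals
    return np.mean((y - y_pred) ** 2)

def mae(y, y_pred):
    # L1 loss: mean of the absolute residuals
    return np.mean(np.abs(y - y_pred))

def rmse(y, y_pred):
    # square root of the MSE, in the same units as y
    return np.sqrt(mse(y, y_pred))

def mean_bias_error(y, y_pred):
    # keeps the sign of the residuals, so over- and under-predictions cancel
    return np.mean(y - y_pred)

def huber(y, y_pred, delta=1.0):
    # quadratic for small residuals, linear for large ones
    r = y - y_pred
    small = np.abs(r) <= delta
    return np.mean(np.where(small, 0.5 * r ** 2, delta * (np.abs(r) - 0.5 * delta)))

y = np.array([6.0, 7.0, 8.0, 9.0, 11.0, 12.0])
y_pred = np.array([6.5, 7.0, 8.5, 9.0, 10.0, 12.0])
print(mse(y, y_pred), mae(y, y_pred), rmse(y, y_pred))
```

Note how the MSE weights the single large residual (y = 11 vs. 10) much more heavily than the MAE does; that sensitivity to outliers is the main practical difference between L1 and L2.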
4) What is the importance of learning rate
The learning rate (also referred to as the step size or alpha) is the size of the steps taken to reach the minimum. It is typically a small value, and it is evaluated and updated based on the behavior of the cost function. A high learning rate results in larger steps but risks overshooting the minimum. Conversely, a low learning rate takes small steps; while this gives more precision, the larger number of iterations compromises overall efficiency, since reaching the minimum takes more time and computation.
The amount that the weights are updated during training is referred to as the step size or the "learning rate." Specifically, the learning rate is a configurable hyperparameter used in the training of neural networks that has a small positive value, often in the range between 0.0 and 1.0. The learning rate controls how quickly the model adapts to the problem. Smaller learning rates require more training epochs, given the smaller changes made to the weights each update, whereas larger learning rates result in rapid changes and require fewer training epochs. A learning rate that is too large can cause the model to converge too quickly to a suboptimal solution, whereas a learning rate that is too small can cause the process to get stuck. The challenge of training deep neural networks therefore involves carefully selecting the learning rate; it may be the most important hyperparameter for the model.
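The trade-off can be seen on a toy problem. This sketch (assumed setup: minimizing f(x) = x², whose gradient is 2x, starting from x = 10 and taking 50 steps) compares a small, a moderate, and an overly large learning rate:

```python
def descend(learning_rate, steps=50, x0=10.0):
    # minimize f(x) = x**2 (gradient = 2*x) by gradient descent, starting from x0
    x = x0
    for _ in range(steps):
        x = x - learning_rate * 2 * x   # gradient descent update
    return x

print(descend(0.01))   # small rate: still far from the minimum at 0 after 50 steps
print(descend(0.1))    # moderate rate: converges close to 0 quickly
print(descend(1.1))    # too large: each step overshoots and the iterate diverges
```

With a rate of 0.01 the iterate creeps toward zero; with 0.1 it is essentially converged after 50 steps; with 1.1 each update overshoots the minimum by more than it corrects, so the iterate grows without bound.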
5) How to evaluate linear regression
Linear regression is a regression model that uses a straight line to describe the relationship between variables. It finds the line of best fit through your data by searching for the value of the regression coefficient(s) that minimizes the total error of the model.
There are two main types of linear regression:
1) Simple linear regression uses only one independent variable.
2) Multiple linear regression uses two or more independent variables.
There are three main metrics for model evaluation in regression:
1) R-Squared / Adjusted R-Squared
R-Squared is the ratio of the Sum of Squares Regression (SSR) to the Sum of Squares Total (SST). The Sum of Squares Regression is the amount of variance explained by the regression line. The R-squared value measures goodness of fit: the greater the value of R-Squared, the better the regression model fits the data.
2) Mean Square Error (MSE) / Root Mean Square Error (RMSE)
Mean Squared Error (MSE) is the sum of squared distances between our target variable and the predicted values. It is similar to MAE, but the errors are squared before averaging, so the MSE has units that are the square of whatever is plotted on the vertical axis. Another quantity we calculate is the Root Mean Squared Error (RMSE), which is just the square root of the MSE.
3) Mean Absolute Error (MAE)
MAE is the sum of absolute differences between our target and predicted variables. It measures the average magnitude of errors in a set of predictions, without considering their directions.
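The three metrics above can be computed directly on the same x and y data used in the gradient descent code earlier. This is a minimal sketch that fits the least-squares line with NumPy's `polyfit` instead of gradient descent:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([6, 7, 8, 9, 11, 12], dtype=float)

# least-squares line of best fit: y_pred = m*x + b
m, b = np.polyfit(x, y, 1)
y_pred = m * x + b

residuals = y - y_pred
mse = np.mean(residuals ** 2)
rmse = np.sqrt(mse)
mae = np.mean(np.abs(residuals))

sst = np.sum((y - y.mean()) ** 2)   # total sum of squares
sse = np.sum(residuals ** 2)        # sum of squared errors
r2 = 1 - sse / sst                  # coefficient of determination

print(f"m={m:.4f}, b={b:.4f}, MSE={mse:.4f}, RMSE={rmse:.4f}, MAE={mae:.4f}, R^2={r2:.4f}")
```

An R² near 1 here indicates the line explains almost all of the variation in y, which is consistent with the small costs the gradient descent runs above are converging toward.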
6) What is the difference between the multiple and adjusted coefficients of determination
The proportion of variance explained by all of the independent variables together is called the coefficient of multiple determination. R is called the multiple correlation coefficient; it measures the correlation between the predicted and actual values of the dependent variable.
R² = SSR/SST
where SSR is the sum of squares regression and SST is the total sum of squares. The multiple coefficient of determination represents the proportion of the variability in the response that is explained by the multiple regression equation. Multiple R-squared is simply R-squared for models that have multiple predictor variables; it measures the amount of variation in the response variable that can be explained by the predictor variables. The fundamental point is that when you add predictors to your model, the multiple R-squared will always increase, because a predictor will always explain some portion of the variance.
The adjusted coefficient of determination measures the goodness of a regression equation while controlling against this increase: it adds a penalty for the number of predictors in the model. It therefore strikes a balance between the most parsimonious model and the best-fitting model. Generally, a large difference between your multiple and adjusted R-squared indicates that you may have overfit your model.
R², defined as SSR/SST, measures the proportion of the total variation in Y explained by the regression model.
Here are some key points about R²:
1) It is a non-negative quantity with range 0 ≤ R² ≤ 1.
2) R² = 0 implies that the regression line does not fit the data at all.
3) R² = 1 implies that the regression line is a perfect fit.
Adjusted R² is a modified version of R² adjusted with the number of predictors. It penalizes for adding unnecessary features and allows a comparison of regression models with a different number of predictors. The adjusted R-squared increases only if the new term improves the model more than would be expected by chance. It decreases when a predictor improves the model by less than expected by chance.
Obtaining a negative value for Adjusted R² can indicate some or all of the following:
1) The linear model is a poor fit for the data.
2) The number of predictors is large.
3) The number of samples is small.
More generally, R² is a statistic used with statistical models whose main purpose is either the prediction of future outcomes or the testing of hypotheses on the basis of other related information. It provides a measure of how well observed outcomes are replicated by the model, based on the proportion of the total variation of outcomes explained by the model.
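The penalty can be sketched directly from the standard formula Adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1). The R² value, sample count, and predictor counts below are hypothetical, chosen only to show the effect:

```python
def adjusted_r2(r2, n, p):
    # n: number of samples, p: number of predictors
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# the same R^2 = 0.90 is penalized more heavily as predictors are added
print(adjusted_r2(0.90, n=30, p=2))    # mild penalty with 2 predictors
print(adjusted_r2(0.90, n=30, p=10))   # heavier penalty with 10 predictors
```

Both adjusted values fall below 0.90, and the drop grows with the number of predictors, which is exactly the overfitting safeguard described above.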