Perform Gradient Descent in Python with any loss function
Vignesh Varatharajan
updated on 31 Mar 2021
Let us consider the below dataset for performing gradient descent:
x = [1,2,3,4,5]
y = [2,4,6,8,10]
The loss function to be used is the mean square error which is defined as:
MSE = (1/n) Σ_{i=1}^{n} (y_a − y_p)^2
n = total number of data points
yp = predicted outcome
ya = actual outcome
Gradient descent is an optimization algorithm. It iteratively tweaks the parameters of a convex loss function so as to minimize that loss toward its local minimum.
Let us define the y_predicted as:
y_predicted = mx + b
where m,b are constants.
Therefore, we can rewrite MSE as:
MSE = (1/n) Σ_{i=1}^{n} (y_a − (mx + b))^2
To find the local minimum, the gradient of MSE with respect to b and m is calculated:
∂MSE/∂b = (2/n) Σ_{i=1}^{n} (−1)(y_a − (mx + b))
∂MSE/∂m = (2/n) Σ_{i=1}^{n} (−x)(y_a − (mx + b))
Using these differentials, we iterate the values of b and m as follows:
b = b − learning_rate · ∂MSE/∂b
m = m − learning_rate · ∂MSE/∂m
# Program to perform gradient descent using the MSE loss function
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
n = len(x)

# y_predicted = m*x + b; assume initial values for m and b
m = 0
b = 0

tol = 1e-6            # tolerance on the loss function
mse = 1e9             # initial loss value
learning_rate = 0.001
iteration = 1

while mse > tol:
    square_error = 0
    d_dm = 0
    d_db = 0
    for i in range(n):
        square_error = square_error + pow(y[i] - (m*x[i] + b), 2)
        d_dm = d_dm + (2/n) * (-x[i] * (y[i] - (m*x[i] + b)))
        d_db = d_db + (2/n) * (-1 * (y[i] - (m*x[i] + b)))
    mse = square_error / n
    print("\nIteration = " + str(iteration) + "\nMean Square Error = " + str(mse))
    print("m = " + str(m) + " b = " + str(b))
    b = b - learning_rate * d_db
    m = m - learning_rate * d_dm
    iteration = iteration + 1

print("\n\nThe parameters for the best fit line are:\n" + "b = " + str(b) + " m = " + str(m))
Output:
After 15,945 iterations, the values of b and m are found to be 0.00234 and 1.99935 respectively. Hence, the predicted values are determined using the equation:
y_predicted = 1.99935x + 0.00234
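The same iteration can be written more compactly as a vectorized sketch with NumPy (this rewrite is mine, not from the original program; the variable names mirror the loop above):

```python
import numpy as np

# Vectorized gradient descent with the MSE loss for y_pred = m*x + b
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 6, 8, 10], dtype=float)

m, b = 0.0, 0.0
learning_rate = 0.001
mse = 1e9
while mse > 1e-6:
    error = y - (m * x + b)          # residuals for the current m, b
    mse = np.mean(error ** 2)        # mean squared error
    d_dm = -2 * np.mean(x * error)   # ∂MSE/∂m
    d_db = -2 * np.mean(error)       # ∂MSE/∂b
    m -= learning_rate * d_dm
    b -= learning_rate * d_db

print(m, b)  # converges near m = 2, b = 0
```

The whole-array operations replace the inner for loop, while the update rule is unchanged.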
2. Difference between the L1 and L2 gradient descent methods
The two major loss functions are:
MAE = (1/n) Σ_{i=1}^{n} |y_p − y_a|
MSE = (1/n) Σ_{i=1}^{n} (y_p − y_a)^2
where:
n = total number of data points
y_p = predicted outcome
y_a = actual outcome
Gradient descent performed using MAE is known as the L1 norm, whereas gradient descent using MSE is known as the L2 norm.
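The practical difference shows up in the gradients. A small sketch (my own illustration, reusing the dataset from above) computes one gradient step under each loss for y_pred = m*x + b:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 6, 8, 10], dtype=float)
m, b = 0.0, 0.0
error = y - (m * x + b)

# L2 (MSE) gradients scale with the size of the error
d_dm_l2 = -2 * np.mean(x * error)
d_db_l2 = -2 * np.mean(error)

# L1 (MAE) gradients depend only on the sign of the error
d_dm_l1 = -np.mean(x * np.sign(error))
d_db_l1 = -np.mean(np.sign(error))

print(d_dm_l2, d_db_l2)  # -44.0 -12.0
print(d_dm_l1, d_db_l1)  # -3.0 -1.0
```

Because the L1 gradient uses only the sign of the residual, large errors do not produce proportionally large steps, which is why MAE is less sensitive to outliers than MSE.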
3. What are the different loss functions for regression?
The different loss functions are:
MAE = (1/n) Σ_{i=1}^{n} |y_p − y_a|
MSE = (1/n) Σ_{i=1}^{n} (y_p − y_a)^2
RMSE = sqrt( (1/n) Σ_{i=1}^{n} (y_p − y_a)^2 )
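The three losses can be computed in a few lines (an illustrative sketch; the data values are mine, not from the article):

```python
import numpy as np

y_actual = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y_pred = np.array([2.5, 3.5, 6.5, 7.5, 10.5])  # each prediction off by 0.5

mae = np.mean(np.abs(y_pred - y_actual))   # mean absolute error
mse = np.mean((y_pred - y_actual) ** 2)    # mean squared error
rmse = np.sqrt(mse)                        # root mean squared error

print(mae, mse, rmse)  # 0.5 0.25 0.5
```

Note that RMSE restores the units of the original data, which MSE squares away.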
4. What is the importance of learning rate?
In the gradient descent method, the error function is minimised by calculating the gradient at the current weights and moving along the convex function in incremental steps. The size of these steps is determined by the learning rate.
A larger learning rate could overshoot the weights and result in an inaccurate value of the minimum. On the other hand, a smaller learning rate improves the accuracy of the algorithm but increases the computational cost.
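This trade-off is easy to see on a toy problem. The sketch below (my own example, not from the article) minimises f(w) = w^2, whose gradient is 2w and whose minimum is at w = 0, with three different learning rates:

```python
def descend(learning_rate, steps=50, w=5.0):
    """Run gradient descent on f(w) = w**2 and return the final w."""
    for _ in range(steps):
        w -= learning_rate * 2 * w  # gradient of w**2 is 2*w
    return w

print(descend(0.1))   # small rate: steadily shrinks toward 0
print(descend(0.9))   # large rate: oscillates around 0 but still shrinks
print(descend(1.1))   # too large: overshoots further each step and diverges
```

Each update multiplies w by (1 − 2·learning_rate), so any rate above 1.0 makes |w| grow instead of shrink.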
5. How to evaluate linear regression?
Evaluating a linear regression model helps you understand its performance. The major metrics used to evaluate a regression model are the error metrics (MAE, MSE, RMSE) and the coefficients of determination (R^2 and adjusted R^2).
R^2 and adjusted R^2 are better used to explain the model to other people, because the number can be read as the percentage of output variability explained. The error metrics, on the other hand, are used to choose the best among various regression models.
MSE gives larger error values and may be difficult to use when comparing different algorithms. In such cases it is beneficial to opt for RMSE, as the error values are smaller and the dimensionality of the predicted value is the same as that of the actual values.
Furthermore, MSE penalises outlier data points significantly more than MAE. Hence, in scenarios where outlier points need not be penalised heavily, we choose MAE over MSE.
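The outlier effect is easy to demonstrate with a toy set of residuals (chosen by me for illustration; one point is ten times larger than the rest):

```python
import numpy as np

residuals = np.array([1.0, 1.0, 1.0, 1.0, 10.0])  # last point is an outlier

mae = np.mean(np.abs(residuals))
mse = np.mean(residuals ** 2)

print(mae)  # 2.8  -> the outlier contributes 10 of the total 14
print(mse)  # 20.8 -> the outlier contributes 100 of the total 104
```

Under MAE the outlier accounts for about 71% of the loss; under MSE, about 96%, so a model fit with MSE bends much further toward the outlier.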
6. What is the difference between multiple and adjusted coefficient of determination?
When linear regression involves one dependent variable and one independent variable, the regression model is evaluated by R^2, the coefficient of determination. In the case of a data set with one dependent variable and several independent variables, the model is evaluated using the multiple coefficient of determination.
When the regression model is not influenced by some of the independent variables, we opt for the adjusted coefficient of determination. The addition of a data point whose independent variable can be ignored does not affect the value of adjusted R^2.
R^2_adjusted = 1 − (1 − R^2)(n − 1) / (n − k − 1)
n = number of data points
k = number of independent variables
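Both coefficients can be computed directly from the definitions above. A small sketch (illustrative data and fitted values are mine, not from the article):

```python
import numpy as np

y_actual = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y_pred = np.array([2.2, 3.9, 6.1, 7.8, 10.0])  # hypothetical fitted values

n = len(y_actual)  # number of data points
k = 1              # number of independent variables

ss_res = np.sum((y_actual - y_pred) ** 2)                 # residual sum of squares
ss_tot = np.sum((y_actual - np.mean(y_actual)) ** 2)      # total sum of squares

r2 = 1 - ss_res / ss_tot
r2_adjusted = 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(round(r2, 4), round(r2_adjusted, 4))  # 0.9975 0.9967
```

Adjusted R^2 is always at or below R^2, and the gap widens as k grows relative to n, penalising models that add uninformative variables.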