Shubhranshu Mishra
updated on 03 Jul 2020
AIM: To write code in Python to perform curve fitting.
OBJECTIVE :
To write code to fit a linear and a cubic polynomial to the Cp data.
To plot the linear and cubic fit curves along with the raw data points.
To measure the fitness characteristics of both curves.
THEORY: Curve fitting is the process of modelling a spread of data by assigning the best-fit function (curve) over the entire range. Ideally, the fitted curve captures the trend in the data and allows us to predict how the data series will behave in the future.
Types of curve fitting include:
Linear and Polynomial Curve fitting :
(i) Linear curve fitting, or linear regression, is when the data is fit to a straight line. Although there might be some curve to your data, a straight line provides a reasonable enough fit to make predictions.
Since the equation of a generic straight line is always given by f(x)= a x + b, the question becomes: what a and b will give us the best fit line for our data?
Considering the vertical distance from each point to a prospective line as an error, and summing them up over our range, gives us a concrete number that expresses how far from ‘best’ the prospective line is.
A line that provides a minimum error can be considered the best straight line.
Since it is the distance from our points to the line that we are interested in, and whether that distance is positive or negative is not relevant, we square the distance in our error calculations. Squaring also weights larger errors more heavily. This method is therefore called the least-squares approach.
(ii) Polynomial curve fitting is when we fit our data to the graph of a polynomial function. The same least-squares method can be used to find the polynomial, of a given degree, that has a minimum total error.
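As a minimal sketch of the least-squares idea described above, the snippet below fits a straight line to a handful of made-up points with np.polyfit and compares its sum of squared errors against an arbitrarily chosen line. The sample values and the helper name sum_squared_error are illustrative assumptions and are not part of the Cp data set.

```python
import numpy as np

# Made-up sample points (not the Cp data set)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])


def sum_squared_error(a, b):
    """Sum of squared vertical distances from the points to the line a*x + b."""
    return float(np.sum((y - (a * x + b)) ** 2))


# np.polyfit with degree 1 returns the least-squares slope and intercept
a_best, b_best = np.polyfit(x, y, 1)

print('best-fit slope and intercept:', a_best, b_best)
print('SSE of best-fit line        :', sum_squared_error(a_best, b_best))
print('SSE of an arbitrary line    :', sum_squared_error(2.5, 0.5))
```

Any other choice of slope and intercept gives a sum of squared errors at least as large as that of the best-fit line.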
To choose the best fit for the curve, the following four parameters help us measure the goodness of fit, that is, how well the equations represent the given data points:
GOVERNING EQUATIONS USED :
where n is the total number of datapoints available.
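Written out using the definitions from the comment block in the code below, the quantities computed in this project can be sketched as follows. The symbols y_i (measured Cp value), \hat{y}_i (fitted value) and \bar{y} (mean of the measured values) are notation assumed here for illustration.

```latex
\begin{align}
\mathrm{SSE}  &= \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 \\
\mathrm{SSR}  &= \sum_{i=1}^{n} \left( \hat{y}_i - \bar{y} \right)^2 \\
\mathrm{SST}  &= \mathrm{SSE} + \mathrm{SSR} \\
R^2           &= \frac{\mathrm{SSR}}{\mathrm{SST}} \\
\mathrm{RMSE} &= \sqrt{\frac{\mathrm{SSE}}{n}}
\end{align}
```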
The raw data of Temperature (K) and specific heat (kJ/kcalK) is obtained from here: Data
SOLUTION STEPS :
PYTHON CODE :
# A program to measure the fitness characteristics of linear and cubic polynomial fits to the Cp data
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit

# Curve fit functions
# Linear function
def function_0(t, a, b):
    return a*t + b

# Cubic function
def function_2(t, a, b, c, d):
    return a*pow(t, 3) + b*pow(t, 2) + c*t + d
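# Reading the thermodynamic data file (comma-separated: temperature, Cp)
def read_file():
    temperature = []
    cp = []
    # 'with' closes the file once reading is done
    with open('data', 'r') as data_file:
        for line in data_file:
            values = line.split(',')
            temperature.append(float(values[0]))
            cp.append(float(values[1]))
    return [temperature, cp]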
# Main Program
temperature , cp = read_file()
popt, pcov = curve_fit(function_0,temperature,cp)
fit_cp_l = function_0(np.array(temperature), *popt)
plt.figure(1)
plt.plot(temperature,cp,'k--')
plt.plot(temperature,fit_cp_l,color='red',linewidth = 1)
plt.legend(['Actual data','Curve fit'])
plt.xlabel('Temperature (K)')
plt.ylabel('Cp')
plt.title('Linear Curve fitting')
plt.show()
popt, pcov = curve_fit(function_2,temperature,cp)
fit_cp_c = function_2(np.array(temperature), *popt)
plt.figure(3)
plt.plot(temperature,cp,'k--')
plt.plot(temperature,fit_cp_c,color='red',linewidth = 1)
plt.legend(['Actual data','Curve fit'])
plt.xlabel('Temperature (K)')
plt.ylabel('Cp')
plt.title('Cubic Curve fitting')
plt.show()
'''
Measuring the fitness characteristics of the curves
SSE (sum of squared errors)
SSR (sum of squares of the regression)
SST = SSE + SSR
R^2 = SSR/SST
RMSE = (SSE/l)^0.5, where l is the number of data points
Finding the mean of the Cp data
mean = (sum of all elements)/(total number of elements)'''
s = np.sum(cp)
l = np.size(cp)
m = s/l
print('Mean of all cp :',m)
print('')
# For Linear curve fit
linear_sse = 0
linear_ssr = 0
for i in range(l):
    linear_error = abs(cp[i] - fit_cp_l[i])
    linear_sse = linear_sse + pow(linear_error, 2)
    linear_ssr = linear_ssr + pow((fit_cp_l[i] - m), 2)
linear_sst = linear_sse + linear_ssr
print('linear_sst :', linear_sst)
linear_R2 = linear_ssr/linear_sst
print('linear_R2 :', linear_R2)
linear_rmse = pow((linear_sse/l), 0.5)
print('linear_RMSE :', linear_rmse)
print('')
# For Cubic curve fit
cubic_sse = 0
cubic_ssr = 0
for j in range(l):
    # Accumulate the squared error and the regression term for data point j
    cubic_error = abs(cp[j] - fit_cp_c[j])
    cubic_sse = cubic_sse + pow(cubic_error, 2)
    cubic_ssr = cubic_ssr + pow((fit_cp_c[j] - m), 2)
cubic_sst = cubic_sse + cubic_ssr
print('cubic_sst :', cubic_sst)
cubic_R2 = cubic_ssr/cubic_sst
print('cubic_R2 :', cubic_R2)
cubic_rmse = pow((cubic_sse/l), 0.5)
print('cubic_RMSE :', cubic_rmse)
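The two loops above compute the same quantities for each fit, so they could also be written once as a vectorized helper. The following is only a sketch: the function name fit_metrics and the demo values are assumptions made for illustration, while the formulas follow the definitions in the comment block of the program.

```python
import numpy as np


def fit_metrics(y, y_fit):
    """Return SSE, SSR, SST, R^2 and RMSE for measured values y and fitted values y_fit."""
    y = np.asarray(y, dtype=float)
    y_fit = np.asarray(y_fit, dtype=float)
    sse = np.sum((y - y_fit) ** 2)          # sum of squared errors
    ssr = np.sum((y_fit - y.mean()) ** 2)   # sum of squares of the regression
    sst = sse + ssr
    return {'SSE': sse, 'SSR': ssr, 'SST': sst,
            'R2': ssr / sst, 'RMSE': np.sqrt(sse / y.size)}


# Made-up values for illustration only
y_demo = [1.0, 2.0, 3.0, 4.0]
y_fit_demo = [1.1, 1.9, 3.2, 3.8]
metrics = fit_metrics(y_demo, y_fit_demo)
print(metrics)
```

With the variables of the program above, the calls would be fit_metrics(cp, fit_cp_l) and fit_metrics(cp, fit_cp_c).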
ERRORS :
The for statement operates on iterable data such as a string, list, tuple, or another object container. The error was rectified by passing range() to the loop. The range function is a convenient way to loop a specified number of times.
RESULTS :
Based on the above work, we can answer the following questions:
Q. What do popt and pcov mean?
Ans. 'popt' is a one-dimensional array that holds the optimal values of the fit coefficients, in the order in which they appear as parameters of the fitting function.
popt : array. Optimal values for the parameters so that the sum of the squared residuals of f(xdata, *popt) - ydata is minimized.
'pcov' is a square matrix that holds the estimated covariance of those coefficients. Its diagonal elements are the variances of the individual coefficients.
pcov : 2-D array. The estimated covariance of popt. The diagonals provide the variance of the parameter estimate. To compute one standard deviation errors on the parameters use perr = np.sqrt(np.diag(pcov)). How the sigma parameter affects the estimated covariance depends on the absolute_sigma argument. If True, sigma is used in an absolute sense and the estimated parameter covariance pcov reflects these absolute values. If False, only the relative magnitudes of the sigma values matter, and the returned parameter covariance matrix pcov is based on scaling sigma by a constant factor.
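A small sketch of how popt, pcov and perr look in practice is given below. The data are synthetic (a line y = 2t + 1 with small fixed perturbations) and the function name line is an assumption for illustration; only curve_fit, np.diag and np.sqrt are used as described above.

```python
import numpy as np
from scipy.optimize import curve_fit


def line(t, a, b):
    return a * t + b


# Synthetic data: y = 2 t + 1 plus small fixed perturbations (illustrative only)
t = np.linspace(0.0, 10.0, 11)
noise = 0.05 * np.array([1, -1, 0.5, -0.5, 1, -1, 0.5, -0.5, 1, -1, 0.5])
y = 2.0 * t + 1.0 + noise

popt, pcov = curve_fit(line, t, y)
perr = np.sqrt(np.diag(pcov))  # one-standard-deviation errors on a and b

print('popt (a, b):', popt)
print('pcov shape :', pcov.shape)
print('perr       :', perr)
```

For a function with two fitted parameters, popt has two entries and pcov is a 2 x 2 matrix.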
Q. What does np.array(temperature) do?
Ans. This call converts the Python list temperature into a NumPy array, so that the arithmetic inside the fitting function is applied to every element at once.
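The difference between a list and an array can be seen in a short sketch. The variable names and values below are made up for illustration.

```python
import numpy as np

temperature_list = [300.0, 400.0, 500.0]
temperature_array = np.array(temperature_list)

# Multiplying a list by an integer repeats the list
repeated = 2 * temperature_list
# Multiplying an array scales every element
scaled = 2 * temperature_array
# Powers work elementwise on arrays; pow(temperature_list, 2) would raise TypeError
squared = temperature_array ** 2

print(repeated)
print(scaled)
print(squared)
```

The conversion matters most for the cubic function, where pow(t, 3) is not defined for a plain list.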
Q. What does the * in *popt mean?
Ans. The '*' unpacks the popt array, so that each coefficient stored in it is passed to the function as a separate positional argument.
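The equivalence can be sketched as follows. The array popt_demo is a made-up stand-in for the array returned by curve_fit.

```python
import numpy as np


def function_0(t, a, b):
    return a * t + b


popt_demo = np.array([2.0, 1.0])   # stand-in for the array returned by curve_fit
t = np.array([0.0, 1.0, 2.0])

unpacked = function_0(t, *popt_demo)                   # a and b taken from the array
explicit = function_0(t, popt_demo[0], popt_demo[1])   # the same call written out
print(unpacked, explicit)
```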
Q. What needs to be done in order to make the curve fit perfectly?
Ans. Increasing the order of the polynomial reduces the fitting error, so the curve follows the data more closely. As the order increases, the value of R^2 approaches 1, which is taken to indicate a near-perfect fit.
This method requires defining a new function for every polynomial order, which can become lengthy. Alternatively, the data can be split into multiple domains. For example, if the temperature column holds 5000 values, it can be split into 10 domains of 500 values each, and each domain can be fitted separately so that the combined piecewise curve follows the data closely.
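One way to try several polynomial orders without writing a function for each is sketched below. It uses np.polyfit and np.polyval in place of curve_fit, and the data are a synthetic stand-in for the Cp values, so the numbers it prints are illustrative only. The helper name r_squared is an assumption; its formula matches the definition used in the program above.

```python
import numpy as np

# Synthetic stand-in for the Cp data (illustrative only)
x = np.linspace(0.0, 1.0, 50)
y = np.exp(x) + 0.3 * np.sin(6.0 * x)


def r_squared(y, y_fit):
    """R^2 = SSR / (SSE + SSR)."""
    sse = np.sum((y - y_fit) ** 2)
    ssr = np.sum((y_fit - np.mean(y)) ** 2)
    return ssr / (sse + ssr)


r2_by_degree = {}
for degree in range(1, 6):
    coeffs = np.polyfit(x, y, degree)   # least-squares polynomial coefficients
    r2_by_degree[degree] = r_squared(y, np.polyval(coeffs, x))
    print('degree', degree, 'R^2 =', r2_by_degree[degree])
```

Each higher degree includes the lower-degree polynomials as special cases, so the value of R^2 does not decrease as the degree goes up.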
CONCLUSION: For a good curve fit, R^2 should be close to 1. From the results obtained, R^2 is approximately 0.93 for the linear fit and nearly 0.99 for the cubic fit, so the cubic polynomial fits the data better than the linear one. As the order of the polynomial increases, the value of R^2 moves closer to 1 and the fit is taken to be better.