Curve fitting using Python

PYTHON

Sourabh Lakhera
updated on 15 Jul 2020

AIM : To write Python code that performs curve fitting.

OBJECTIVE :

To write code that fits a linear and a cubic polynomial to the Cp data.
To plot the linear and cubic fit curves along with the raw data points.
To measure the fitness characteristics of both curves.

THEORY : Curve fitting is the way we model or represent a data spread by assigning a best fit function (curve) along the entire range. Ideally, it will capture the trend in the data and allow us to make predictions of how the data series will behave in the future.

Types of curve fitting include:

Interpolation, where you find a function that fits the data points exactly. Since this assumes no measurement error, it has limited applicability to real-life scenarios.
Smoothing, where you find a function that fits the data points approximately. The points are allowed to lie near, but not necessarily on, the curve, provided the overall error is minimized.
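The difference between the two types can be illustrated with a small sketch. The sample values below are made up for illustration and are not the Cp data. np.interp stands in for interpolation and a least-squares line from np.polyfit stands in for smoothing.

```python
import numpy as np

# Hypothetical noisy samples scattered around a straight line.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Interpolation: the piecewise-linear interpolant reproduces every sample.
y_interp = np.interp(x, x, y)

# Smoothing: a least-squares line passes near, not through, the samples.
slope, intercept = np.polyfit(x, y, 1)
y_smooth = slope * x + intercept

print("max interpolation residual:", np.max(np.abs(y - y_interp)))
print("max smoothing residual:", np.max(np.abs(y - y_smooth)))
```

The interpolant has zero residual at the samples, while the smoothed line leaves a small residual at every point.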

Linear and Polynomial Curve fitting :

(i) Linear curve fitting, or linear regression, is when the data is fit to a straight line. Although there might be some curve to your data, a straight line provides a reasonable enough fit to make predictions.

Since the equation of a generic straight line is always given by f(x)= a x + b, the question becomes: what a and b will give us the best fit line for our data?

Considering the vertical distance from each point to a prospective line as an error, and summing them up over our range, gives us a concrete number that expresses how far from ‘best’ the prospective line is.

A line that provides a minimum error can be considered the best straight line.

Since only the size of the distance from each point to the line matters, and not whether it is positive or negative, we square the distance in the error calculation. Squaring also weights larger errors more heavily. This method is called the least squares approach.
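For a straight line, the values of a and b that minimize the sum of squared errors have a well-known closed form. The sketch below is only an illustration: the function name least_squares_line and the sample values are assumptions, not part of the original program.

```python
import numpy as np

def least_squares_line(x, y):
    """Return (a, b) that minimise sum((y - (a*x + b))**2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: covariance of x and y divided by the variance of x.
    a = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the best-fit line passes through the point of means.
    b = y_mean - a * x_mean
    return a, b

x = [300.0, 400.0, 500.0, 600.0]   # made-up temperatures
y = [1.00, 1.02, 1.05, 1.07]       # made-up Cp values
a, b = least_squares_line(x, y)
print("a =", a, "b =", b)
```

The same coefficients are returned by np.polyfit(x, y, 1), which can be used as a cross-check.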

(ii) Polynomial curve fitting is when we fit our data to the graph of a polynomial function. The same least squares method can be used to find the polynomial, of a given degree, that has the minimum total error.
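One way to carry out a polynomial least squares fit is to build a matrix of powers of x and solve the resulting linear least squares problem. This is a minimal sketch under that assumption. The helper name polyfit_least_squares and the test polynomial are hypothetical.

```python
import numpy as np

def polyfit_least_squares(x, y, degree):
    """Least-squares polynomial coefficients, highest power first."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Columns of A are x**degree, ..., x**1, x**0.
    A = np.vander(x, degree + 1)
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coeffs

x = np.linspace(-1.0, 1.0, 21)
y = 2.0 * x**3 - x + 0.5          # exact cubic, no noise
coeffs = polyfit_least_squares(x, y, 3)
print(coeffs)
```

Because the sample data is an exact cubic, the recovered coefficients match the ones used to generate it, up to rounding.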

To choose the best fit, the following four parameters measure the goodness of fit, that is, how well the fitted equation represents the given data points:

The sum of squares due to error ( $SSE$ ),
R-square ( $R^{2}$ ),
Adjusted R-square,
Root mean squared error ( $RMSE$ )
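Adjusted R-square is listed above but is not computed in the program below. A commonly used definition penalizes $R^{2}$ for the number of fitted terms. The sketch assumes that definition. The function name and arguments are hypothetical, with n_predictors counting the coefficients other than the constant term (1 for the linear fit, 3 for the cubic fit).

```python
def adjusted_r_squared(r_squared, n_points, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r_squared) * (n_points - 1) / (n_points - n_predictors - 1)

# Made-up example: R^2 = 0.9 from a straight-line fit to 11 points.
print(adjusted_r_squared(0.9, 11, 1))
```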

GOVERNING EQUATIONS USED :

Error $(i)$ = $|y(i) - f(x(i))|$ , the absolute difference between $y(i)$ and the value of $f(x)$ at $x(i)$ ,
SSE (sum of squared errors) = $\sum_{i=1}^{n} \text{Error}(i)^{2}$ ,
SSR (sum of squares of the regression) = $\sum_{i=1}^{n} (f(x(i)) - \text{Mean})^{2}$ ,
SST (total sum of squares) = $SSE + SSR$ ,
$R^{2}$ (R squared) = $\frac{SSR}{SST}$ ,
RMSE (root mean squared error) = $\sqrt{\frac{SSE}{n}}$ ,

where n is the total number of data points available and Mean is the mean of the raw y values (here, the Cp data).
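The governing equations can also be evaluated with NumPy array operations instead of a loop. The sketch below follows the definitions given here, including $SST = SSE + SSR$. The function name fit_metrics and the four sample points are assumptions for illustration.

```python
import numpy as np

def fit_metrics(y, y_fit):
    """Return SSE, SSR, SST, R2 and RMSE for raw data y and fitted values y_fit."""
    y = np.asarray(y, dtype=float)
    y_fit = np.asarray(y_fit, dtype=float)
    sse = np.sum((y - y_fit) ** 2)          # sum of squared errors
    ssr = np.sum((y_fit - y.mean()) ** 2)   # sum of squares of the regression
    sst = sse + ssr                         # total, as defined in this report
    return {
        "SSE": sse,
        "SSR": ssr,
        "SST": sst,
        "R2": ssr / sst,
        "RMSE": np.sqrt(sse / y.size),
    }

# Made-up raw and fitted values.
metrics = fit_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.1, 3.9])
for name, value in metrics.items():
    print(name, ":", value)
```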

The raw data of Temperature (K) and specific heat (kJ/kcalK) is obtained from here : DATA GIVEN

SOLUTION STEPS :

First, the necessary modules (math, matplotlib, numpy, scipy) are imported.
Two functions are defined that return the linear and the cubic polynomial expressions.
The data file is placed in the working directory and a function read_file is created to load it. A for loop iterates over the lines of the file, which is opened with the 'open' command in 'r' (read) mode.
From the data obtained, two separate lists of temperature and specific heat are built.
For curve fitting, 'popt' and 'pcov' store the outputs of the 'curve_fit' command, which takes the required function (linear or cubic), the temperature and Cp as inputs. The variables 'fit_cp_l' and 'fit_cp_c' then store the Cp values predicted by the linear and cubic fits over the temperature array.
Plots of the raw data set and the predicted data set are made and compared, for both the linear and the cubic polynomial.
To measure the fitness characteristics of both curves, the parameters $SSE$ , $SSR$ , $SST$ , $R^{2}$ and $RMSE$ are calculated. The fit whose $R^{2}$ is highest, that is, closest to 1, is considered the better fit.
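As an alternative to reading the file line by line, a comma-separated file can be loaded directly into two arrays with np.loadtxt. The sketch assumes the file has two numeric columns and no header row, which matches how read_file parses it. An in-memory sample with made-up values is used here in place of the real data file.

```python
import io
import numpy as np

# Stand-in for the real file: made-up "temperature,cp" rows.
sample = io.StringIO("300,1.005\n400,1.013\n500,1.029\n")

# unpack=True returns one array per column.
temperature, cp = np.loadtxt(sample, delimiter=",", unpack=True)

print(temperature)
print(cp)
```

With the real data, the file name (for example 'data') would be passed instead of the StringIO object.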

PYTHON CODE :

# A program to measure fitness characteristics for the linear and cubic polynomial for the Cp data:
import math
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit

# Curve fit function
# Linear function
def function_0(t,a,b):
	return a*t + b 
# Cubic function	
def function_2(t,a,b,c,d):
	return a*pow(t,3) + b*pow(t,2) + c*t +d
	

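# Reading thermodynamic data file
def read_file():
	temperature = []
	cp = []
	# 'with' closes the file once all lines have been read
	with open('data','r') as data_file:
		for line in data_file:
			# Each line holds 'temperature,cp'
			values = line.split(',')
			temperature.append(float(values[0]))
			cp.append(float(values[1]))

	return [temperature , cp]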

# Main Program
temperature , cp = read_file()

popt, pcov = curve_fit(function_0,temperature,cp)
fit_cp_l = function_0(np.array(temperature), *popt)
plt.figure(1)
plt.plot(temperature,cp,'k--')
plt.plot(temperature,fit_cp_l,color='red',linewidth = 1)
plt.legend(['Actual data','Curve fit'])
plt.xlabel('Temperature (K)')
plt.ylabel('Cp')
plt.title('Linear Curve fitting')
plt.show()

popt, pcov = curve_fit(function_2,temperature,cp)
fit_cp_c = function_2(np.array(temperature), *popt)
plt.figure(3)
plt.plot(temperature,cp,'k--')
plt.plot(temperature,fit_cp_c,color='red',linewidth = 1)
plt.legend(['Actual data','Curve fit'])
plt.xlabel('Temperature (K)')
plt.ylabel('Cp')
plt.title('Cubic Curve fitting')
plt.show()


''' 
Measuring the fitness characteristics of the curves 

 SSE (sum of error squared)
 SSR (sum of squares of the regression)
 SST = SSE + SSR
 R^2 = SSR/SST
 RMSE = (SSE/l)^0.5

 Finding mean of Cp data
 mean = ( sum of all elements)/(total number of elements)'''

s = np.sum(cp)
l = np.size(cp)
m = s/l
print('Mean of all cp :',m)
print('')
# For Linear curve fit

linear_sse = 0
linear_ssr = 0
for i in range(l):
    # Absolute error between the raw and the fitted Cp at point i
    linear_error = abs(cp[i] - fit_cp_l[i])
    linear_sse = linear_sse + pow(linear_error,2)
    linear_ssr = linear_ssr + pow((fit_cp_l[i] - m),2)

 
linear_sst = linear_sse + linear_ssr
print('linear_sst :',linear_sst)
linear_R2 = linear_ssr/linear_sst
print('linear_R2 :',linear_R2)
linear_rmse = pow((linear_sse/l),0.5)
print('linear_RMSE :',linear_rmse)
print('')


# For Cubic curve fit
cubic_sse = 0
cubic_ssr = 0
for j in range(l):
    # Absolute error between the raw and the fitted Cp at point j
    cubic_error = abs(cp[j] - fit_cp_c[j])
    cubic_sse = cubic_sse + pow(cubic_error,2)
    cubic_ssr = cubic_ssr + pow((fit_cp_c[j] - m),2)

 
cubic_sst = cubic_sse + cubic_ssr
print('cubic_sst :',cubic_sst)
cubic_R2= cubic_ssr/cubic_sst
print('cubic_R2 :',cubic_R2)
cubic_rmse = pow((cubic_sse/l),0.5)
print('cubic_RMSE :',cubic_rmse)

ERRORS :

No major errors occurred. Sublime flagged each error as it was made, and it was rectified accordingly.

RESULTS :

The program above produces the following output.

Plot obtained for the linear curve fit:

Plot obtained for the cubic curve fit:

Based on the above work, we can answer the following questions:

Q. What do popt and pcov mean?

Ans. 'popt' is the array that stores the optimized coefficients of the fitting function, in the order in which the parameters appear in the function definition.

popt : array, Optimal values for the parameters so that the sum of the squared residuals of f(xdata, *popt) - ydata is minimized

'pcov' is the square matrix that stores the estimated covariance of those coefficients. Each diagonal element of this matrix is the variance of the corresponding coefficient.

pcov : 2d array

The estimated covariance of popt. The diagonals provide the variance of the parameter estimate. To compute one standard deviation errors on the parameters use perr = np.sqrt(np.diag(pcov)). How the sigma parameter affects the estimated covariance depends on the absolute_sigma argument: if True, sigma is used in an absolute sense and the estimated parameter covariance pcov reflects these absolute values. If False, only the relative magnitudes of the sigma values matter, and the returned parameter covariance matrix pcov is based on scaling sigma by a constant factor.
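The relation between popt, pcov and the one standard deviation errors can be sketched on synthetic data. The function line, the noise level and the random seed below are assumptions chosen for illustration and are not taken from the Cp data.

```python
import numpy as np
from scipy.optimize import curve_fit

def line(t, a, b):
    return a * t + b

# Synthetic data: y = 3t + 2 plus a small amount of Gaussian noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 50)
y = 3.0 * t + 2.0 + rng.normal(scale=0.1, size=t.size)

popt, pcov = curve_fit(line, t, y)

# One standard deviation error of each coefficient.
perr = np.sqrt(np.diag(pcov))

print("coefficients:", popt)
print("covariance matrix shape:", pcov.shape)
print("standard errors:", perr)
```

For a function with two parameters, popt has two entries and pcov is a 2 x 2 matrix.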

Q. What does np.array(temperature) do?

Ans. This command converts the temperature list read from the data file into a NumPy array, so that the arithmetic in the fitting function is applied element-wise to every temperature value.
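A short sketch shows why the conversion is needed when the fitting function is called directly. The temperature values and coefficients are made up for illustration.

```python
import numpy as np

temperature = [300.0, 400.0, 500.0]   # plain Python list (made-up values)
a, b = 0.5, 10.0

# Multiplying a list by a float is not defined in Python.
try:
    a * temperature + b
    list_works = True
except TypeError:
    list_works = False

# A NumPy array applies the arithmetic to every element.
fit = a * np.array(temperature) + b

print("list arithmetic works:", list_works)
print("array result:", fit)
```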

Q. What does the * in *popt mean?

Ans. The '*' unpacks the popt array, so that '*popt' passes each coefficient stored in popt to the function as a separate argument.
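The unpacking can be checked by comparing it with passing the coefficients one by one. Here popt is a hand-written stand-in for the output of curve_fit, and the function name cubic is an assumption.

```python
import numpy as np

def cubic(t, a, b, c, d):
    return a * t**3 + b * t**2 + c * t + d

popt = np.array([1.0, 2.0, 3.0, 4.0])   # stand-in for curve_fit output
t = np.array([0.0, 1.0, 2.0])

unpacked = cubic(t, *popt)                                  # a, b, c, d taken from popt
explicit = cubic(t, popt[0], popt[1], popt[2], popt[3])     # same call written out

print(unpacked)
print(np.array_equal(unpacked, explicit))
```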

Q. What needs to be done in order to make the curve fit perfect?

Ans. If we increase the order of the polynomial, the error is reduced and the curve follows the data more closely. As the order increases, the value of $R^{2}$ approaches 1, which is taken to indicate a near-perfect fit.
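The effect of the polynomial order on the error can be sketched with np.polyfit on synthetic data. The underlying curve, noise level and seed are assumptions for illustration. Note that a lower error on the fitted points does not by itself guarantee better predictions at new points.

```python
import numpy as np

# Synthetic data: an exponential curve with a little noise.
rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 40)
y = np.exp(x) + rng.normal(scale=0.02, size=x.size)

# Sum of squared errors for polynomial degrees 1 to 5.
sse_by_degree = {}
for degree in range(1, 6):
    coeffs = np.polyfit(x, y, degree)
    residual = y - np.polyval(coeffs, x)
    sse_by_degree[degree] = float(np.sum(residual ** 2))

for degree, sse in sse_by_degree.items():
    print("degree", degree, "SSE", sse)
```

Each higher degree contains the lower one as a special case, so the SSE on the fitted points does not increase as the degree goes up.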

This method requires defining a new function each time a different polynomial order is tried, which can be lengthy. Alternatively, the data can be split into several segments: if, say, the temperature column has 5000 values, we can split it into 10 segments of 500 values each and fit each segment separately. When plotted, the segment fits follow the data very closely.
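The segment-wise approach can be sketched as follows. This is only one possible reading of the idea: the helper piecewise_polyfit, the sine test curve and the choice of 10 segments are assumptions, and np.polyfit is used in place of curve_fit to keep the example short.

```python
import numpy as np

def piecewise_polyfit(x, y, n_segments, degree):
    """Fit a separate polynomial to each consecutive segment of the data."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    fitted = []
    for x_seg, y_seg in zip(np.array_split(x, n_segments),
                            np.array_split(y, n_segments)):
        coeffs = np.polyfit(x_seg, y_seg, degree)
        fitted.append(np.polyval(coeffs, x_seg))
    return np.concatenate(fitted)

# Synthetic curve with several oscillations over the range.
x = np.linspace(0.0, 2.0 * np.pi, 200)
y = np.sin(3.0 * x)

y_global = np.polyval(np.polyfit(x, y, 3), x)     # one cubic over the full range
y_piecewise = piecewise_polyfit(x, y, 10, 3)      # ten cubics, one per segment

sse_global = np.sum((y - y_global) ** 2)
sse_piecewise = np.sum((y - y_piecewise) ** 2)
print("SSE, single cubic:", sse_global)
print("SSE, 10 segments:", sse_piecewise)
```

The separate segment fits are not forced to join smoothly at the segment boundaries, which is a limitation of this simple version.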

CONCLUSION :
Hence we can conclude that for a good curve fit, $R^{2}$ should be close to 1. From the results obtained, $R^{2}$ is about 0.93 for the linear curve fit and about 0.99 for the cubic curve fit, so the cubic polynomial fits better than the linear polynomial. If the order of the polynomial is increased, the value of $R^{2}$ moves towards 1 and the fit is considered good.