ARTH SOJITRA
updated on 19 Jun 2020
Aim: Perform a linear and a cubic fit for a given set of data, and gain insight into the parameters that can be used to identify a good fit.
Theory: Curve fitting is the process of constructing a mathematical curve that fits a set of data points according to some criterion, yielding a formula that can be used to interpolate or extrapolate values. It is a powerful method for predicting data at future times, or at points where no discrete measurement is available. The technique is used heavily in finance and actuarial mathematics, where stock prices and financial risks are forecast from fitted curves to guide investment decisions.
Curve fitting works by constructing a mathematical function that reproduces the specified data to within an acceptable tolerance. Polynomial functions are commonly used to fit the data, and are of the form
$$f(x) = A_0 + A_1 x + A_2 x^2 + A_3 x^3 + \dots + A_n x^n$$
where an appropriate order n of the polynomial can be selected to get the desired curve fit.
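As a small illustration of the polynomial form above, the sketch below evaluates a polynomial in MATLAB. The coefficient values are hypothetical and chosen only for the example. Note that `polyval` expects the coefficients in descending powers, so a vector stored as A0, A1, A2 has to be flipped first.

```matlab
% Hypothetical coefficients for f(x) = 1 + 2x + 3x^2 (A0 = 1, A1 = 2, A2 = 3)
A = [1 2 3];          % ascending order: A0, A1, A2
p = fliplr(A);        % polyval expects descending powers: [A2 A1 A0]
x = [0 1 2];          % points at which the polynomial is evaluated
f = polyval(p, x);    % evaluates 3x^2 + 2x + 1 at each x
disp(f)               % displays 1 6 17
```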
Many statistical packages such as R, and numerical software such as Maple, MATLAB, Mathematica, GNU Octave, and SciPy, include commands for curve fitting in a variety of scenarios. For this report, we limit our focus to MATLAB, as it makes it easy to program and visualize the data.
Several parameters can be used to measure the goodness of a fit.
Consider the diagram below, in which we have to perform a curve fit with the help of an nth-degree polynomial.
Consider the curve fit as:
$$f(x) = A_0 + A_1 x + A_2 x^2 + A_3 x^3 + \dots + A_n x^n$$
This curve will then also give discrete values at the data points.
In the sum of squares of the error (SSE), we measure the difference between the actual data and the fitted curve at each discrete data point, square it, and sum over all the data points. Our goal is to find the coefficients for which this sum is minimum.
The SSE depends on the coefficients A_0, A_1, A_2, ..., A_n. To minimize it, we partially differentiate it with respect to each unknown coefficient and set the result equal to zero. This gives n+1 equations in n+1 unknowns, which can then be solved to obtain the coefficients.
Let the data points be: (x_1, y_1), (x_2, y_2), (x_3, y_3), ..., (x_N, y_N)
So the sum of squares of the error is:
$$\mathrm{SSE} = \sum_{i=1}^{N} \left[ y_i - f(x_i) \right]^2$$
Then to find the coefficients A_0, A_1, A_2, ..., A_n:
$$\begin{pmatrix} \dfrac{\partial(\mathrm{SSE})}{\partial A_0} \\ \dfrac{\partial(\mathrm{SSE})}{\partial A_1} \\ \dfrac{\partial(\mathrm{SSE})}{\partial A_2} \\ \vdots \\ \dfrac{\partial(\mathrm{SSE})}{\partial A_n} \end{pmatrix} = 0$$
Solving these n+1 equations in n+1 unknowns gives the coefficients A_0, A_1, A_2, ..., A_n and hence our curve fit.
The SSE can also be used to check the goodness of a given curve fit: the lower the sum of squares of the error, the better the fit.
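The minimization described above can be sketched directly in MATLAB. Setting each partial derivative of the SSE to zero leads to the linear system (V'V)a = V'y, where V holds the powers of x in its columns. The data below is synthetic and assumed purely for illustration, and the variable names are not from the report. The result is compared against `polyfit`, which solves the same least-squares problem.

```matlab
% Synthetic data (an assumption for illustration): a cubic plus a small wiggle
x = linspace(0, 1, 50)';
y = 2 - x + 0.5*x.^2 + 3*x.^3 + 0.01*sin(40*x);

n = 3;                          % polynomial degree, so n+1 = 4 unknowns
V = x.^(n:-1:0);                % columns x^3, x^2, x, 1 (implicit expansion, R2016b+)

% Normal equations obtained from d(SSE)/dA_k = 0 for k = 0..n
A_normal = (V' * V) \ (V' * y);

% Built-in least-squares fit, returned in descending powers
A_polyfit = polyfit(x, y, n)';

% Largest difference between the two coefficient vectors
max_diff = max(abs(A_normal - A_polyfit));
fprintf('Largest coefficient difference: %.2e\n', max_diff);
```

The two approaches should agree to within rounding error for a well-conditioned problem like this one. For higher degrees or widely spread x values, forming V'V explicitly becomes numerically fragile, which is one reason to prefer `polyfit`.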
R-squared (R^2) is one more quantity that measures how well the curve fits the given data. It is the square of the correlation coefficient between the curve-fitted data and the actual data. If the curve fit is a good one, we expect a strong correlation between the curve-fitted data and the actual data, as both of them go hand in hand.
Consider the following diagram showing the data set and parameters used to calculate the R-squared.
If y = f(x) is the curve fit, then the quantity SSR can be defined as:
$$\mathrm{SSR} = \sum_{i=1}^{N} \left[ f(x_i) - \bar{y} \right]^2$$
where \bar{y} is the mean of the original data set,
and SSE can be defined as follows:
$$\mathrm{SSE} = \sum_{i=1}^{N} \left[ y_i - f(x_i) \right]^2$$
Now we can define the term SST as:
$$\mathrm{SST} = \mathrm{SSR} + \mathrm{SSE}$$
and the R-squared can be defined as:
$$R^2 = \frac{\mathrm{SSR}}{\mathrm{SST}}$$
Since SSR and SSE are both non-negative, the value of R^2 lies between 0 and 1. It can be interpreted like this: the closer R^2 is to 1, the better the curve fit accounts for the variation in the data.
The root mean square error (RMSE) is also a statistical way of measuring how good the curve fit is. As the name suggests, it involves squaring the errors, taking their average, and finally taking the square root.
It is a very similar measure to the SSE (sum of squares of the error) discussed earlier in the report. Mathematically,
$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{N} \left[ y_i - f(x_i) \right]^2}{N}}$$
therefore we can see that:
$$\mathrm{RMSE} = \sqrt{\frac{\mathrm{SSE}}{N}}$$
Evidently, the smaller the RMSE, the better the curve fit.
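The three measures defined above can be computed together in a few lines. The sketch below uses a small made-up data set and a linear fit, with variable names chosen only for this example. It also checks that SSR + SSE matches the total sum of squares about the mean, which holds for a least-squares polynomial fit that includes a constant term.

```matlab
% Toy data (assumed for illustration only)
x = (1:6)';
y = [1.1; 1.9; 3.2; 3.8; 5.1; 6.0];

p = polyfit(x, y, 1);            % least-squares straight line
f = polyval(p, x);               % fitted values at the data points

SSE  = sum((y - f).^2);          % sum of squares of the error
SSR  = sum((f - mean(y)).^2);    % spread of the fit about the data mean
SST  = SSR + SSE;                % total sum of squares
R2   = SSR / SST;                % R-squared
RMSE = sqrt(SSE / numel(y));     % root mean square error

% Total sum of squares computed directly from the data, for comparison
SST_direct = sum((y - mean(y)).^2);
fprintf('R2 = %.4f, RMSE = %.4f, SST check = %.2e\n', R2, RMSE, abs(SST - SST_direct));
```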
To compare the goodness of the various fits we will use the data of the variation of Cp with Temperature T. We shall compare the several parameters as discussed above for the linear and the cubic fit.
To start, we first need to import the data into the MATLAB workspace, which is done with the help of the following commands.
clear all; close all; clc;
% Loading the data into MATLAB with the help of an inbuilt command
load('data')
% Now this data will contain around 3200 rows and 2 columns
% The column 1 is the data set for temperature
% The column 2 is the data set for the Cp values
Explanation: This code first clears the MATLAB workspace to remove unnecessary variables, closes all the figures, and clears the command window. It then imports the data from the data file. Make sure that the data file is present in the current folder so that MATLAB can read it.
After importing the data the variable data will have 3200 rows and 2 columns as is shown below:
In this, the first column corresponds to the temperature data points and the second column corresponds to the specific heat data points.
We need to extract these two data sets into separate vectors to allow easy operations on them. It is done with the help of the following MATLAB commands.
% Extracting the temperature data as the first column
temperature_data = data(:,1);
% Extracting the Cp data as the second column
cp_data = data(:,2);
Explanation: This code extracts the first column of the data table and stores it in a vector called temperature_data, and stores the second column, which is the data set for the specific heat values, in the cp_data vector.
Now after the procurement of data it is imperative to look at the data in a graphical format. It is achieved via the following MATLAB Code:
%% Plotting raw data
% Plotting the initial data set
figure(1); clf;
% Resizing the figure
set(gcf,'Position',[100,100,900,700]);
% Plotting the data
plot(temperature_data,cp_data,'LineWidth',4,'Color','b');
% Adding the labels and title
xlabel(' Temperature in [K]');
ylabel(' Specific Heat in [kJ/(kmol K)] ');
title(' Initial Data set ');
% Turning on the grid
grid on; grid minor;
% Increasing the font size
set(gca,'FontSize',20)
The following graph of the initial data was obtained:
As is clearly seen, the specific heat does not stay constant with temperature, as is often assumed to simplify calculations. Rather, it appears to be a complex function of temperature. By the end of this report, the reader will realize the importance of curve fitting.
Now we will try to generate a polynomial fit to the data above according to a polynomial function:
1.) Linear fit :
The idea behind the linear fit is very simple. We want to fit the linear function f(T)=A+BT where T is the variable and A, B are the constants. It is achieved with the help of the following MATLAB command:
%% Fitting the linear data
% Using the polyfit command to get the two coefficients of the line
coeffs_linear = polyfit(temperature_data,cp_data,1);
% Evaluating the linear fit at the temperature data points
linear_fit = polyval(coeffs_linear,temperature_data);
Explanation: This code computes the coefficients of the linear polynomial and stores them in the variable coeffs_linear. Then, using these coefficients, we generate a new data set corresponding to the linear fit by substituting the discrete temperature points into the function and evaluating it.
Now we shall compare the linear fit versus the original data. It is done with the help of the following MATLAB Command:
% Plotting and comparing the initial vs the fit curve
figure(2); clf;
% Resizing the figure
set(gcf,'Position',[100,100,700,700]);
% Plotting the data
plot(temperature_data,cp_data,'LineWidth',4,'Color','b');
hold on;
plot(temperature_data,linear_fit,'LineWidth',4,'Color','r');
% Adding the labels and title
xlabel(' Temperature in [K]');
ylabel(' Specific Heat in [kJ/(kmol K)] ');
title(' Linear fit vs original data ');
% Adding the legend
legend('Original Data','Using the Curve fit for a linear function',...
    'location','northwest');
% Turning on the grid
grid on; grid minor;
% Increasing the font size
set(gca,'FontSize',20)
The following graph was obtained:
As is clearly seen, the linear fit is a very crude approximation to the original data. Our calculation of the statistical parameters will also confirm this.
Firstly we shall find the error between the original and the curve fitted data which is done with the help of the following MATLAB Command:
% Linear Data
size_of_data = numel(linear_fit);
% Squared difference between the actual data and the linear fit,
% computed for every data point at once (vectorized)
square_linear = ( cp_data - linear_fit ).^2;
% Computing the sum of the squares of the error
sum_of_square_linear = sum(square_linear);
Explanation: This code creates a vector in which each element is the square of the difference between the actual data value and the curve fit at that point, and then sums the elements.
On running this MATLAB command the following output was obtained:
As can be seen, the error is quite high. This confirms our observation that the linear fit is very crude. We can also look at the square of the error at each discrete data point. It is done with the help of the following MATLAB commands:
% Plotting the error:
figure(3); clf;
% Resizing the figure
set(gcf,'Position',[100,100,900,700]);
% Plotting the squared error against temperature
plot(temperature_data,square_linear,'LineWidth',4,'Color','b');
% Adding the labels and title
xlabel(' Temperature in [K]');
ylabel(' Error ');
title(' Error at each discrete data point ');
% Turning on the grid
grid on; grid minor;
% Increasing the font size
set(gca,'FontSize',20)
The following graph was obtained:
As is seen from the graph, the error at the initial data points is quite high, suggesting that the curve fit is poor there. From the middle towards the end, however, the error curve is fairly close to zero, suggesting that the curve fit is better in that range.
Now we shall calculate the R-Square term for the linear fitted data. The following MATLAB Command is implemented for the same:
% Calculating the mean of the cp_data to be used in the calculation of
% the R-squared
mean_cp = mean(cp_data);
% Squared difference between the linear fit and the mean of the data,
% computed for every data point at once (vectorized)
R_linear = ( mean_cp - linear_fit ).^2;
% Summing up the data to get the SSR term
SSR_squared_linear = sum(R_linear);
% Finding the SST term
SST_linear = SSR_squared_linear + sum_of_square_linear;
% Finally finding the R-squared
R_square_linear = SSR_squared_linear/SST_linear;
This code implements the formula as discussed in the Theory Section and finds the R-square. The following output is obtained:
As is seen, the R-squared for the linear fit is fairly close to one, which by this measure alone suggests a reasonably good fit. Now, we shall calculate the RMSE (root mean square error). It is implemented with the help of the MATLAB command:
% Calculating the RMSE for the linear fit
RMSE_linear = sqrt(sum_of_square_linear / size_of_data );
The following output was obtained:
2.) Cubic fit :
The idea behind the cubic fit is equally simple. We want to fit the cubic function f(T) = A + BT + CT^2 + DT^3, where T is the variable and A, B, C, D are the constants. It is achieved with the help of the following MATLAB commands:
% Using the polyfit command to get the four coefficients of the cubic
coeffs_cubic = polyfit(temperature_data,cp_data,3);
% Evaluating the cubic fit at the temperature data points
cubic_fit = polyval(coeffs_cubic,temperature_data);
Explanation: This code computes the coefficients of the cubic polynomial and stores them in the variable coeffs_cubic. Then, using these coefficients, we generate a new data set corresponding to the cubic fit by substituting the discrete temperature points into the function and evaluating it.
Now we shall compare the cubic fit versus the original data. It is done with the help of the following MATLAB Command:
% Plotting and comparing the initial vs the cubic fit curve
figure(4); clf;
% Resizing the figure
set(gcf,'Position',[100,100,700,700]);
% Plotting the data
plot(temperature_data,cp_data,'LineWidth',4,'Color','b');
hold on;
plot(temperature_data,cubic_fit,'LineWidth',4,'Color','r');
% Adding the labels and title
xlabel(' Temperature in [K]');
ylabel(' Specific Heat in [kJ/(kmol K)] ');
title(' Cubic fit vs Original Data ');
% Adding the legend
legend('Original Data','Using the Curve fit for a cubic function',...
    'location','northwest');
% Turning on the grid
grid on; grid minor;
% Increasing the font size
set(gca,'FontSize',20)
The following graph was obtained:
As is clearly seen, the cubic fit is a better approximation to the original data than the linear fit. Our calculation of the statistical parameters will also confirm this.
Firstly we shall find the error between the original and the curve fitted data which is done with the help of the following MATLAB Command:
% Squared difference between the actual data and the cubic fit,
% computed for every data point at once (vectorized)
square_cubic = ( cp_data - cubic_fit ).^2;
% Computing the sum of the squares of the error
error_square_cubic = sum(square_cubic);
Explanation: This code creates a vector in which each element is the square of the difference between the actual data value and the curve fit at that point, and then sums the elements.
On running this MATLAB command the following output was obtained:
As can be seen, the error is high but not as high as the linear fit case. This confirms our observation that the cubic fit is a better approximation than the linear fit. Also, we can see the square of the error at each discrete data point. It is done with the help of following MATLAB Command:
% Plotting the error:
figure(5); clf;
% Resizing the figure
set(gcf,'Position',[100,100,900,700]);
% Plotting the squared error against temperature
plot(temperature_data,square_cubic,'LineWidth',4,'Color','b');
% Adding the labels and title
xlabel(' Temperature in [K]');
ylabel(' Error ');
title(' Error at each discrete data point for cubic case ');
% Turning on the grid
grid on; grid minor;
% Increasing the font size
set(gca,'FontSize',20)
The following graph was obtained:
As is seen from the graph, the error at the initial data points is quite low, suggesting that the curve fit is good there. Towards the end, however, the error is high, suggesting that the curve fit is very crude in that range. Overall, the cubic curve fit remains well within bounds, with very low fluctuations.
Now we shall calculate the R-Square term for the cubic fitted data. The following MATLAB Command is implemented for the same:
% Squared difference between the cubic fit and the mean of the data,
% computed for every data point at once (vectorized)
R_cubic = ( mean_cp - cubic_fit ).^2;
% Summing up the data to get the SSR term
SSR_squared_cp_cubic = sum(R_cubic);
% Calculating the SST term for the cubic fit; this must use the scalar
% sum error_square_cubic, not the per-point vector square_cubic
SST_cubic = SSR_squared_cp_cubic + error_square_cubic;
% Calculating the R-squared term
R_squared_cubic = SSR_squared_cp_cubic/SST_cubic;
This code implements the formula as discussed in the Theory Section and finds the R-square. The following output is obtained:
As is seen, the R-squared for the cubic fit is very close to one, suggesting that it is a very good fit. Now, we shall calculate the RMSE (root mean square error). It is implemented with the help of the MATLAB command:
% Calculating the RMSE term for the cubic fit
RMSE_cubic = sqrt(error_square_cubic / size_of_data );
The following output was obtained:
Confirming that our calculations are correct:
The correlation coefficient, according to the formula discussed above, is nothing but the square root of R^2. We implement this in MATLAB and compare it with the correlation coefficient returned by the MATLAB inbuilt command:
% From our formula the correlation coefficient
correlation_cubic_fit = sqrt(R_squared_cubic)
% Using the MATLAB inbuilt command
correlation_matlab_function = corr(cp_data,cubic_fit)
The following output is obtained:
As is seen, the value from the formula and the value from the MATLAB inbuilt command are the same.
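The same check can be sketched with `corrcoef`, which is part of base MATLAB, whereas `corr` belongs to the Statistics and Machine Learning Toolbox. The data below is a stand-in assumed for illustration, not the Cp data set from the report.

```matlab
% Stand-in data (an assumption): a smooth curve fitted with a cubic
x = linspace(0, 2, 40)';
y = exp(x);
f = polyval(polyfit(x, y, 3), x);     % cubic least-squares fit

% R-squared from the SSR/SST definition used in this report
SSE = sum((y - f).^2);
SSR = sum((f - mean(y)).^2);
R2  = SSR / (SSR + SSE);

% Correlation coefficient between the data and the fit
C = corrcoef(y, f);                   % 2x2 correlation matrix
r = C(1, 2);                          % off-diagonal entry is the coefficient

fprintf('sqrt(R2) = %.6f, corrcoef = %.6f\n', sqrt(R2), r);
```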
The workspace after the final calculations looks like shown below:
Now after calculation of the parameters of the linear and cubic fit it is time to compare them:
| | Sum of squares of error | R-Squared | RMSE | Correlation coefficient |
| linear fit | 2163049.08 | 0.9249 | 25.991 | 0.9617 |
| cubic fit | 94272.02 | 0.9967 | 5.4277 | 0.98 |
Clearly, as per our discussion on the good fit, the parameters of the cubic fit are in favor of qualifying it as a better fit than the linear fit.
Now we can answer some of the questions:
1.) How to make a curve fit perfectly?
Ans: The answer is embedded in the question. A perfect fit is a fit in which every data point satisfies the function, i.e. the error is zero. It can be achieved by taking the order of the polynomial to be one less than the number of data points available, since a polynomial of degree N-1 has N coefficients and can pass through N points with distinct x values. In this way, we make sure that every data point satisfies the function.
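A minimal sketch of this idea, using five made-up points: a polynomial of degree 4 has five coefficients and so can pass through all of them. The values are hypothetical and only serve the example. For a data set of thousands of points this approach is not practical, as such a high-degree polynomial is badly conditioned and tends to oscillate between the points.

```matlab
% Five hypothetical data points with distinct x values
x = [1 2 3 4 5];
y = [2 -1 4 0 3];

% Degree N-1 = 4 polynomial: as many coefficients as data points
p = polyfit(x, y, numel(x) - 1);

% The error at the data points is zero up to rounding
SSE = sum((y - polyval(p, x)).^2);
fprintf('SSE of the interpolating polynomial: %.2e\n', SSE);
```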
2.) How to get the best fit?
Ans: Getting the best fit is a matter of experimenting with basic parameters such as the order of the polynomial and the range of data to be fitted. The best fit is obtained on a trial-and-error basis, adjusting these until we reach the desired level of accuracy.
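The trial-and-error process can be sketched as a loop over polynomial orders that reports the RMSE of each. The data here is a stand-in assumed for illustration; in the report's workflow, temperature_data and cp_data would take its place. The three-output form of `polyfit` centers and scales the x values, which helps conditioning when the temperatures are large.

```matlab
% Stand-in data (an assumption, not the report's Cp data)
T  = linspace(300, 3000, 200)';
cp = 900 + 0.3*T - 5e-5*T.^2 + 40*log(T);

orders = 1:4;                         % polynomial orders to try
rmse   = zeros(size(orders));

for k = 1:numel(orders)
    % mu holds the mean and standard deviation used to scale T
    [p, ~, mu] = polyfit(T, cp, orders(k));
    fit_k   = polyval(p, T, [], mu);  % evaluate with the same scaling
    rmse(k) = sqrt(mean((cp - fit_k).^2));
    fprintf('order %d: RMSE = %.4g\n', orders(k), rmse(k));
end
```

On the fitted points the RMSE cannot increase as the order goes up, so a lower RMSE alone does not settle the choice. The order at which the improvement levels off is usually a reasonable stopping point.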
3.) What could be done to improve the cubic fit?
Ans: There are several ways in which we can improve a cubic fit. We can do the cubic fitting over several intervals, i.e. piecewise, and then combine them to form a global fit. This is shown in the MATLAB code below:
%% Improving the cubic fit:
% Partitioning the T into sub-intervals
T1 = temperature_data(1:1100);
T2 = temperature_data(1101:2099);
T3 = temperature_data(2100:end);
% Partitioning the Cp into sub-intervals
C1 = cp_data(1:1100);
C2 = cp_data(1101:2099);
C3 = cp_data(2100:end);
% Polyfitting the curve in these sub-intervals
coeffs_1 = polyfit(T1,C1,3);
coeffs_2 = polyfit(T2,C2,3);
coeffs_3 = polyfit(T3,C3,3);
% Evaluating the polynomials on their own sub-intervals
P1 = polyval(coeffs_1,T1);
P2 = polyval(coeffs_2,T2);
P3 = polyval(coeffs_3,T3);
% Combining the data into one row vector
P_overall = [P1(:)',P2(:)',P3(:)'];
This code will partition the domain into 3 sub-intervals and will perform the cubic fitting there. Now after the piecewise cubic fitting we shall now plot the data sets on top of one another. It is done using the following MATLAB command:
% Plotting the overall curve fit with the piecewise splitting and fitting
% the cubic curves individually
% Plotting and comparing the initial vs the cubic fit curve
figure(6); clf;
% Resizing the figure
set(gcf,'Position',[100,100,800,700]);
% Plotting the data
plot(temperature_data,cp_data,'LineWidth',4,'Color','b');
hold on;
plot(temperature_data,P_overall,'LineWidth',4,'Color','r');
% Adding the labels and title
xlabel(' Temperature in [K]');
ylabel(' Specific Heat in [kJ/(kmol K)] ');
title(' Comparison using the piecewise splitting ');
% Adding the legend
legend('Original Data',['Using the Curve fit for'...
    ' a cubic function by piecewise splitting'],...
    'location','northwest');
% Turning on the grid
grid on; grid minor;
% Increasing the font size
set(gca,'FontSize',20)
The following graph was obtained:
As is seen from the graph, we get a very good fit by splitting the data piecewise and then performing the cubic fit in each sub-interval. Now we shall compute the statistical parameters to confirm this. It is done by the following code:
%% Calculation of statistical parameters for piecewise fit
% Squared error between the data and the piecewise fit; (:) forces both
% to column vectors, since P_overall is a row vector
P_cubic = ( cp_data(:) - P_overall(:) ).^2;
% Computing the sum of the squares of the error
error_square_cubic_P = sum(P_cubic);
% Squared difference between the piecewise fit and the mean of the data
R_cubic_P = ( mean_cp - P_overall(:) ).^2;
% Summing up the data to get the SSR term
SSR_squared_cp_cubic_P = sum(R_cubic_P);
% Calculating the SST term for the piecewise cubic fit
SST_cubic_P = SSR_squared_cp_cubic_P + error_square_cubic_P;
% Calculating the R-squared term
R_squared_cubic_P = SSR_squared_cp_cubic_P/SST_cubic_P;
% Calculating the RMSE term for the piecewise cubic fit
RMSE_cubic_P = sqrt(error_square_cubic_P / size_of_data );
% From our formula the correlation coefficient
correlation_cubic_fit_P = sqrt(R_squared_cubic_P);
The following outputs were obtained:
The parameters strongly suggest that this is a much better fit than the single cubic and linear polynomial fits, which confirms the accuracy gained by the piecewise splitting.
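The piecewise approach generalizes to any number of segments by looping over a list of break indices. The sketch below uses stand-in data and hypothetical break points chosen for illustration, and compares the RMSE of the piecewise cubic fit against a single cubic over the whole range.

```matlab
% Stand-in data (an assumption, not the report's Cp data)
T  = linspace(300, 3000, 300)';
cp = 900 + 0.3*T - 5e-5*T.^2 + 40*log(T);

% Start index of each segment, plus numel(T)+1 as a terminator
edges  = [1 101 201 numel(T)+1];
fit_pw = zeros(size(T));

for k = 1:numel(edges)-1
    idx = edges(k):(edges(k+1)-1);            % indices of segment k
    [p, ~, mu]  = polyfit(T(idx), cp(idx), 3); % cubic fit on this segment
    fit_pw(idx) = polyval(p, T(idx), [], mu);
end

% Single cubic over the whole range, for comparison
[pg, ~, mug] = polyfit(T, cp, 3);
fit_global   = polyval(pg, T, [], mug);

rmse_pw     = sqrt(mean((cp - fit_pw).^2));
rmse_global = sqrt(mean((cp - fit_global).^2));
fprintf('piecewise RMSE = %.4g, single cubic RMSE = %.4g\n', rmse_pw, rmse_global);
```

Because each segment is fitted independently, the combined curve is not guaranteed to be continuous at the break points. If a smooth curve is required, a spline fit would be the usual alternative.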
1.) Mislabeling of the imported data:
% Extracting the temperature data as the first column
% (wrong: this takes the first row instead)
temperature_data = data(1,:);
This code will not give any error by itself. However, further on, the program will crash because the matrix dimensions do not agree, as the resulting error message shows.
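One way to catch this mistake early is to check the sizes right after extraction. The sketch below uses a small hypothetical 5-by-2 matrix in place of the real data file.

```matlab
% Hypothetical stand-in for the loaded data: 5 rows, 2 columns
data = [(300:100:700)', [29; 30; 31; 32; 33]];

% Correct extraction: first and second columns
temperature_data = data(:,1);
cp_data          = data(:,2);

% Wrong extraction: first row, which holds one temperature and one Cp value
wrong = data(1,:);

% Guard that fails immediately if the two vectors do not match in size
assert(isequal(size(temperature_data), size(cp_data)), ...
    'temperature_data and cp_data must have the same size');

fprintf('correct: %dx%d, wrong: %dx%d\n', size(temperature_data), size(wrong));
```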
2.) Misordering of the inputs in the polyfit command:
Sometimes the user can make the mistake of providing the input arguments in the wrong order, as is shown here:
% Using the polyfit command (wrong: x and y are swapped)
coeffs_linear = polyfit(cp_data,temperature_data,1);
% Creating the linear fit function
linear_fit = polyval(coeffs_linear,temperature_data);
This code will not give any error by itself, but the graph produced in the later part of the code will be completely meaningless, as is shown below:
This error can be fixed easily by providing the input arguments in the correct order: the independent variable first, then the dependent variable.
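The effect of swapping the arguments can be seen on a made-up straight line y = 2x + 1. With the correct order `polyfit` recovers the slope and intercept of y as a function of x; with the arguments swapped it fits x as a function of y instead, giving the inverse relation.

```matlab
% Hypothetical exact line, for illustration only
x = (0:5)';
y = 2*x + 1;

p_right = polyfit(x, y, 1);   % y as a function of x: slope 2, intercept 1
p_wrong = polyfit(y, x, 1);   % x as a function of y: slope 0.5, intercept -0.5

fprintf('correct order: %.2f %.2f\n', p_right);
fprintf('swapped order: %.2f %.2f\n', p_wrong);
```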
3.) The wrong partitioning of the interval:
One more mistake users tend to make is to partition the interval incorrectly, either by including an extra point or by missing a point, as is shown below:
% Partitioning the T into sub-intervals (wrong: overlapping indices)
T1 = temperature_data(1:1100);
T2 = temperature_data(1100:2100);
T3 = temperature_data(2100:end);
% Partitioning the Cp into sub-intervals (wrong: overlapping indices)
C1 = cp_data(1:1100);
C2 = cp_data(1100:2100);
C3 = cp_data(2100:end);
This code will not give any error by itself, but the partitioning is clearly wrong. The points 1100 and 2100 are each taken twice, so the combined vector is longer than the original vector, which results in an error in the subsequent part of the code, as the resulting error message shows.
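A simple guard against this mistake is to build the index ranges from a list of segment start points and check that together they cover every index exactly once. The sketch below assumes the 3200 rows reported for the data set and uses the segment boundaries from the correct partition shown earlier.

```matlab
N = 3200;                      % number of rows reported for the data set

% Start index of each segment, plus N+1 as a terminator
edges = [1 1101 2100 N+1];

idx = cell(1, numel(edges)-1);
for k = 1:numel(idx)
    idx{k} = edges(k):(edges(k+1)-1);   % segment k ends just before the next start
end

% Concatenated indices must be exactly 1..N: no gaps and no repeats
all_idx = [idx{:}];
assert(isequal(all_idx, 1:N), 'Segments must cover every index exactly once');

fprintf('segment lengths: %d %d %d\n', numel(idx{1}), numel(idx{2}), numel(idx{3}));
```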
In this report, we studied the nuances of curve fitting and the statistical parameters that help determine how good a fit is. Curve fitting has several applications in actuarial and applied mathematics. In this report, for example, we performed the curve fit on the Cp data. Now imagine that, because of some issue, we are not able to obtain the Cp data at very high temperatures: using the curve fitting approach with appropriate accuracy, we can obtain a good approximation of the data we want. Though the value may still be far from the real value, curve fitting at least provides a first estimate.