Why is there a difference in the formula of variance for population and sample?
Vignesh Varatharajan
updated on 10 Mar 2021
The formula used for determining variance for population is
$$\sigma^2 = \frac{\sum (x_i - \mu_x)^2}{N}$$
Whereas, the variance for a sample is calculated using
$$s^2 = \frac{\sum (x_i - \mu_x)^2}{N - 1}$$
Note that the denominator for the sample variance is reduced by 1. For the same sum of squared deviations, dividing by N−1 instead of N gives a slightly larger value.
When sampling techniques are used, it is usually impractical to calculate the population statistics directly. For example, measuring the weight of every person in a given region could be impossible, since the population is large and the weights vary widely. In such cases, we rely on the sample data. Because the mean used in the sample formula is itself estimated from the sample, the squared deviations around it tend to be smaller than those around the true population mean, so dividing by N would underestimate the population variance on average. Reducing the denominator to N−1 (known as Bessel's correction) compensates for this.
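The effect of the N−1 denominator can be checked on a small example. The sketch below assumes a hypothetical toy population of four values and enumerates every ordered sample of size 2 drawn with replacement; neither the data nor the sample size comes from the original answer. Averaged over all samples, the N−1 formula recovers the population variance, while dividing by N falls short.

```python
from itertools import product
from statistics import mean, pvariance, variance

# Hypothetical toy population (not from the original answer).
population = [2, 4, 6, 8]
sample_size = 2

# Every ordered sample of the given size, drawn with replacement.
samples = list(product(population, repeat=sample_size))

# pvariance divides by n; variance divides by n - 1.
avg_divide_by_n = mean(pvariance(s) for s in samples)
avg_divide_by_n_minus_1 = mean(variance(s) for s in samples)

print("population variance:", pvariance(population))
print("average of sample variances (divide by n):", avg_divide_by_n)
print("average of sample variances (divide by n - 1):", avg_divide_by_n_minus_1)
```

Here `statistics.pvariance` and `statistics.variance` are the standard-library functions for the population and sample formulas respectively.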
The stratified sampling method involves forming strata (groups) from a population based on a distinct characteristic such as gender, age group or color. To determine the statistics, we take a fixed number of entities from each stratum. For example, if there are 3 different colors of balls in the population, we segregate the balls by color. A definite number of balls is then taken from each color and the statistics are calculated. In stratified sampling, every stratum is represented in the sample generated.
The cluster sampling method takes a similar approach of dividing the population into clusters based on a distinct characteristic. However, cluster sampling is used when the population is very large and it is difficult to draw data points from every cluster. In such cases, the sample is generated from only a few clusters instead of all of them. If the chosen clusters differ from the rest of the population, this can result in a biased representation of the population in the samples generated.
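The difference between the two methods can be sketched in a few lines. The helper names `stratified_sample` and `cluster_sample`, the ball data and the fixed seed below are all hypothetical, chosen only for illustration; the cluster version shown is the simple one-stage variant that keeps every member of the selected clusters.

```python
import random
from collections import Counter, defaultdict

def group_by(items, key):
    """Split items into groups keyed by key(item)."""
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    return dict(groups)

def stratified_sample(items, key, per_stratum, rng):
    """Draw the same number of items from every stratum."""
    sample = []
    for members in group_by(items, key).values():
        sample.extend(rng.sample(members, per_stratum))
    return sample

def cluster_sample(items, key, n_clusters, rng):
    """Pick a few clusters at random and keep all of their members."""
    groups = group_by(items, key)
    chosen = rng.sample(sorted(groups), n_clusters)
    return [item for cluster in chosen for item in groups[cluster]]

# Hypothetical population: 10 balls of each of 3 colors.
balls = [(color, i) for color in ("red", "green", "blue") for i in range(10)]
rng = random.Random(0)  # fixed seed so the run is repeatable

strat = stratified_sample(balls, key=lambda b: b[0], per_stratum=2, rng=rng)
clust = cluster_sample(balls, key=lambda b: b[0], n_clusters=1, rng=rng)

print(Counter(color for color, _ in strat))  # every color appears
print(Counter(color for color, _ in clust))  # only the chosen color appears
```

The stratified sample always contains every color, whereas the cluster sample contains only the colors of the clusters that happened to be picked.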
Let us consider a case where the size of population is 4
X={A,B,C,D}
The different samples possible are : {A}, {B}, {C}, {D}, {A,B}, {A,C}, {A,D}, {B,C}, {B,D},{C,D},{A,B,C},{A,B,D},{A,C,D},{B,C,D},{A,B,C,D}
The total number of samples = 15 = $2^4 - 1$
Therefore, the total number of samples that can be generated from a population of size N = $2^N - 1$
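The count can be confirmed by enumerating the samples directly. This is a minimal sketch that treats a sample as any non-empty subset of the population, ignoring order; the helper name `non_empty_samples` is hypothetical.

```python
from itertools import combinations

def non_empty_samples(population):
    """Return every non-empty subset of the population, ignoring order."""
    items = list(population)
    return [
        subset
        for size in range(1, len(items) + 1)
        for subset in combinations(items, size)
    ]

samples = non_empty_samples("ABCD")
print(len(samples))  # 15, which equals 2**4 - 1
```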
Number of jacks in a pack of cards = 4
Drawing without replacement, the probability of getting 2 jacks when picking 2 cards from a pack of 52 cards is
$$P = \frac{4}{52} \cdot \frac{3}{51} = \frac{1}{221}$$
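As a cross-check, the same probability can be computed with exact fractions in two ways: by multiplying the two draw probabilities, and by counting the pairs of jacks among all pairs of cards. The variable names are illustrative only.

```python
from fractions import Fraction
from math import comb

# First draw: 4 jacks out of 52 cards; second draw: 3 jacks out of 51 cards.
p_sequential = Fraction(4, 52) * Fraction(3, 51)

# Counting argument: pairs of jacks divided by all possible pairs of cards.
p_counting = Fraction(comb(4, 2), comb(52, 2))

print(p_sequential, p_counting)  # 1/221 1/221
```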
Outcomes in which both dice show the same number = {(1,1), (2,2), (3,3), (4,4), (5,5), (6,6)}
Therefore, number of favorable outcomes = 6
Total number of outcomes = 36
Probability that both numbers are the same when rolling 2 dice is
$$P = \frac{6}{36} = \frac{1}{6}$$
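The result can also be verified by listing all 36 outcomes and counting the ones where the two dice match. This is a small illustrative sketch, assuming two fair six-sided dice.

```python
from fractions import Fraction
from itertools import product

# All ordered outcomes of rolling two six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))

# Favorable outcomes: both dice show the same number.
doubles = [(a, b) for a, b in outcomes if a == b]

p_doubles = Fraction(len(doubles), len(outcomes))
print(len(doubles), len(outcomes), p_doubles)  # 6 36 1/6
```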
The variance of a data set is calculated by averaging the squared differences between each value and the mean. Because of the squaring, the unit of variance is the square of the unit of the data (for example, kg² for weights in kg), so it is not dimensionally the same as the mean.
To resolve this issue, the standard deviation is introduced, which is the square root of the variance. The mean and the standard deviation have the same unit and can therefore be compared directly.
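One way to see the unit difference is to change the unit of the data and watch how each statistic scales. The sketch below uses hypothetical weights; converting from kg to g multiplies the standard deviation by 1000, like the data itself, but multiplies the variance by 1000².

```python
from statistics import pstdev, pvariance

# Hypothetical weights, first in kilograms and then converted to grams.
weights_kg = [60.0, 70.0, 80.0]
weights_g = [w * 1000 for w in weights_kg]

# Ratio of each statistic after the unit change.
variance_ratio = pvariance(weights_g) / pvariance(weights_kg)
stdev_ratio = pstdev(weights_g) / pstdev(weights_kg)

print("variance scales by:", variance_ratio)
print("standard deviation scales by:", stdev_ratio)
```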