Section outline

  • Two groups of students that each have an average test score of 75 might have a score distribution that looks remarkably different. One class might be made up entirely of grades between 72 and 78 while the other class may have half the group around 50, with the other half getting near 100. Variance is a way of measuring the variation in a set of data or how spread out the data is. What is the mean and variance for the following sample test scores taken from a larger student population ?
    ::每组学生的平均考试分数平均为75分,其中两类学生的得分分布可能明显不同。 一类学生的得分分布可能完全由72至78年级组成,另一类学生的得分可能一半在50岁左右,另一类学生的得分接近100岁,而另一类学生的得分则接近100岁左右。 差异是衡量一组数据差异或数据传播方式的一种方法。 从更多学生群体中得出以下抽样考试分数的平均值和差异是什么?

    75, 73, 78, 90, 60, 51, 87, 79, 80, 77

    Finding Variance
    ::调查结果差异

    The thought process of a person trying to describe the spread (variation) of some data for the first time must have been something like this.
    ::第一次试图描述某些数据传播(变换)的人的思考过程, 一定是像这样的。

    Well, the average is 75. What if I try to just add up how different each number is from 75?
    ::那么,平均数字是75,如果我试着加起来 每一个数字和75有什么不同呢?

          

    As the person calculates the numbers, they realize pretty quickly that this sum will be zero, essentially by definition. This is because the numbers that occur below 75 precisely cancel out with the numbers above 75.
    ::当一个人计算数字时,他们很快意识到这个总和将是零,基本上根据定义。这是因为75以下的数字与75以上的数字完全勾销了75以下的数字。

    Since I cannot add the differences directly, why don’t I just sum the absolute value of the differences?
    ::既然我不能直接增加差异, 为什么我就不能把差异的绝对价值相加?

            

    This is a legitimate method for describing the spread of data. It is called absolute deviation and is simply the sum of the absolute values of each of the differences.
    ::这是描述数据分布的合理方法,被称为绝对偏差,只是每个差异的绝对值之和。

    If I take the average absolute difference, I will be able to judge on average how far away each data point is from the mean. A larger difference means more spread out.
    ::如果我得出平均绝对差异,我就能平均判断每个数据点离平均值有多远。 更大的差异意味着扩大范围。

            

    If you take the average of the absolute deviation, you get the mean absolute deviatio n. The mean absolute variation is a legitimate, but limited, way of describing the spread of data. Eventually, a person trying to describe the spread of data for the first time might consider a method called population variance .
    ::如果您采用绝对偏差的平均值,您就会得到平均绝对偏差。平均绝对偏差是一种合理但有限的数据分布描述方式。最终,试图首次描述数据分布的人可能会考虑一种称为人口差异的方法。

    What if instead of using absolute value to solve the issue, I square each difference and then add them together? Of course I’d have to divide by the number of data points to get the average difference squared
    ::如果不是使用绝对值来解决问题,而是我将每个差异平分,然后将它们加在一起,那该怎么办? 当然,我必须除以数据点数,才能将平均差异平开。

               

    This method turns out to be extraordinarily powerful in statistics . One downside is that most of the time you cannot get data from the entire population, you usually only get it from a sample. Over time people realized that samples were typically less variable than their populations and dividing by the number of data points was consistently underestimating the true variance of the population. In other words, if n is the size of the sample then multiplying the sum of the square differences by  1 n makes the variance too small. Research and theory progressed until it was realized that multiplying the sum of the square differences by  1 n 1 made the fraction slightly larger and properly estimated the variance of the population. Thus, there are two ways to calculate variance, one for populations and one for samples.
    ::这种方法在统计中显得异常有力。 一个不利因素是,大部分时间你无法从全部人口获得数据,通常只能从抽样中获得数据。随着时间推移,人们意识到样本通常比其人口变化较少,而除以数据点数,总是低估了人口的真正差异。换句话说,如果样本的大小是n,那么将平方差之和乘以1n,那么差异就太小了。研究与理论进展到发现平方差之和乘以1n-1使小数小一点,并适当估计了人口差异。因此,有两种方法可以计算差异,一种是人口差异,另一种是抽样。

    Hey wait, by squaring the differences, doesn’t that mean that the units are squared? What if I want to describe the spread in the regular units? Should I just take the square root of the variance?
    ::等等,通过缩小分歧,这是否意味着单位是平方的?如果我想描述常规单位的分布,那怎么办?我是否应该只选择差异的平方根?

                         

    This is why the Greek letter lowercase sigma, σ , is used for standard deviation of a population (which is the square root of the variance) and  σ 2 is the symbol for variance of a population. The letters s , s 2 are used for sample standard deviation and sample variance. The Greek letter mu, μ , is the symbol used for mean of a population, while   ¯ x  is the symbol used for mean of a sample.
    ::这就是为什么希腊字母小写 sigma, QQ, 用于人口的标准偏差( 差异的平方根) , 而 QQ2 是人口差异的符号。 字母 s, s2 用于样本标准偏差和样本差异。 希腊字母 mu, 是人口平均值的符号, 而x 是样本平均值的符号 。

    Mean and variance for the population: x 1 , x 2 , x 3 , , x n
    ::人口平均和差异:x1,x2,x3,...,xn

    μ = 1 n n i = 1 x i σ 2 = 1 n n i = 1 ( μ x i ) 2


    ::1nni=1xI2=1nni=1(xi)2

    Mean and variance for a sample from a population: x 1 , x 2 , x 3 , , x m
    ::人口样本的平均和差异: x1,x2,x3,...,xm

    ¯ x = 1 m m i = 1 x i s 2 = 1 m 1 m i = 1 ( ¯ x x i ) 2


    ::1米=1米=1x2 1米=1米=1米=1米=1米=1米=1米=1米=1个(xxxxxxxxxxi)2

    Remember that variance is a measure of the spread of data. The bigger the variance, the more spread out the data points.
    ::记住差异是衡量数据分布的尺度。差异越大,数据点的分布越大。

    Take a six sided dice.  Since the population for a six sided die is entirely known, you would use the population variance to calculate the variance and mean. You would get .
    ::取一个六边骰子。 因为六边骰子的死亡人口是众所周知的, 你会使用人口差异来计算差异和平均值。 你会得到的 。

      μ = 1 6 ( 1 + 2 + 3 + 4 + 5 + 6 ) = 1 6 21 = 3.5

    σ 2 = 1 6 [ ( 3.5 1 ) 2 + ( 3.5 2 ) 2 + ( 3.5 3 ) 2 + ( 3.5 4 ) 2 + ( 3.5 5 ) 2 + ( 3.5 6 ) 2 ] = 1 6 [ 6.25 + 2.25 + 0.25 + 0.25 + 2.25 + 6.25 ] 2.9167

    Examples
    ::实例

    Example 1
    ::例1

    Earlier, you were asked to find the mean and variance for the following sample test scores taken from a larger student population:
    ::更早之前,有人要求你找到从更多学生中抽取的下列抽样测试分数的平均值和差异:

    75, 73, 78, 90, 60, 51, 87, 79, 80, 77

    The mean of the test scores is 75. The variance is calculated by taking the difference of each number from the mean, squaring and summing these differences.
    ::测试分数的平均值为75。 计算差异的方法是,将每个数字与平均值的差数、差数和差数的差数进行计算。

    0 2 + 2 2 + 3 2 + 15 2 + 15 2 + 24 2 + 12 2 + 4 2 + 5 2 + 2 2 = 1228

    Since this data is a sample, you divide the sum by one fewer than the number of terms.
    ::由于此数据为样本,您将总和除以比条件数少一分。

    1228 10 1 136.4444

    If you knew the variances for two samples, each from a different class, you could quickly determine which class had test scores that were more spread out.
    ::如果你知道两个样本的差异,每个样本来自不同的类别, 你可以很快地确定哪个类的测试分数比较分散。

    Example 2
    ::例2

    Calculate the mean and variance of the following data sample of lap times.
    ::计算下列圈间数据样本的平均值和差异。

    59.8, 57.1, 58.2, 58.6, 57.8, 57.9, 58.0, 57.3

      ¯ x = 1 8 ( 59.8 + 57.1 + 58.2 + 58.6 + 57.8 + 57.9 + 58.0 + 57.3 ) = 58.0875
    ::x=18(59.8+57.1+58.2+58.6+57.8+57.9+58.8+57.3)=58.0875

    This is a sample, so you should use the sample variance formula.
    ::这是样本, 所以您应该使用样本差异公式 。

    s 2 = 1 8 1 [ ( μ 59.8 ) 2 + ( μ 57.1 ) 2 + ( μ 58.2 ) 2 + ( μ 58.6 ) 2 + ( μ 57.8 ) 2 + ( μ 57.9 ) 2 + ( μ 58.0 ) 2 + ( μ 57.3 ) 2 ] = 1 7 [ ( 1.7125 ) 2 + 0.9875 2 + ( 0.1125 ) 2 + ( 0.5125 ) 2 + 0.2875 2 + 0.1875 2 + 0.0875 2 + 0.7875 2 ] 1 7 [ 2.9327 + 0.9751 + 0.0126 + 0.2626 + 0.0826 + 0.0351 + 0.0076 + 0.6201 ] 1 7 [ 4.9288 ] 0.7041


    :sad598.8)2+(58.2)2+(58.2)2+(58.6)2+(57.8)2+(57.8)2+(57.9.2)2+(57.9.2)2+(58.8)2+(57.3)2+(580)2+(57.3)2)2+17[(-1.7125)2+0.98752+(-0.1125)2+(-0.5125)2+0.28752+0.28752+0.18752+0.18752+0.18752+0.8752+0.8752+0.8752+0.8752] +17[2.9327+0.9751+0.0126+0.2626+0.00826+0.0351+0.0076+0.6201]__17[4.9288]0.7041]

    Example 3
    ::例3

    Use a calculator to calculate the variance from Example 2.
    ::使用计算器计算示例2的差异。

    To calculate variance on your calculator, enter the data in a list, choose 1-Var Stats and run the 1-Var Stats on the list you entered the data.
    ::要计算计算计算器上的差异,请在列表中输入数据,选择1-Var Stats,并在您输入的数据列表中运行1-Var Stats。

    lesson content

    lesson content

    The two outputs that are important for you to interpret are:
    ::两项产出对于你的解释很重要,它们是:

    S x = 0.839110924
    ::Sx=0.8.839110924

    σ x = 0.7848163968
    ::XXx=0.7848163968

    Since the calculator does not know whether the data is a population or a sample, it produces both. Since this problem is about a sample, the number of interest is S x . This number does not match the variance from Example B because it is the sample standard deviation which means it is the square root of the sample variance. The calculator produces standard deviation. You need to square that number to produce the appropriate variance.
    ::由于计算器不知道数据是一组数据还是一个样本,它产生两种数据。由于这个问题涉及一个样本,因此利息数是Sx。这个数字与例B的差异不符,因为它是样本标准偏差,表示它是样本差异的平方根。计算器产生标准偏差。为了产生相应的差异,您需要平方该数字。

    0.8391 2 0.7041

    Example 4
    ::例4

    Calculate the standard deviation for the following 6 numbers by hand. Assume the numbers are a population.
    ::手工计算以下6个数字的标准偏差。假设数字是人口。

    2, 4, 6, 8, 12, 19

    μ = 1 6 ( 2 + 4 + 6 + 8 + 12 + 17 ) = 8 σ 2 = 1 6 ( ( 8 2 ) 2 + ( 8 4 ) 2 + ( 8 6 ) 2 + 0 + ( 8 12 ) 2 + ( 8 17 ) 2 ) = 1 6 ( 6 2 + 4 2 + 2 2 + 4 2 + 9 2 ) = 1 6 ( 36 + 16 + 4 + 16 + 81 ) = 1 6 ( 153 ) = 25.5 σ 5.0498

    Example 5
    ::例5

    Use a spreadsheet to organize your calculations for computing the variance of the following numbers. Assume these numbers are a true population.
    ::使用电子表格来组织计算以下数字的差异。 假设这些数字是真实的数字 。

    14, 15, 7, 15, 2, 0, 6, 5, 12, 3

    After entering the data in a column, you can use the power of the embedded programming of the spreadsheet to make a second column of just the average.
    ::在将数据输入一列后,您可以使用电子表格嵌入式编程的功率,使第二列仅为平均值。

    • The average command is: “ = average (A2:A11)
      ::平均命令是:“=平均(A2(A2:A11))”

    You can subtract one cell from another cell to find the difference. You can then square the difference to find the difference squared. You can then sum these values using the sum command.
    ::您可以从另一个单元格中减去一个单元格以找到差数。然后,您可以对差数进行平方以找到差数。然后,您可以使用和数命令将这些值相加。

    • The sum command is: “ = sum(D2:D11)
      ::总和命令是:“=总(D2:D11)”

    Finally, just divide the sum by the number of observations (which is 10) to get the variance.
    ::最后,为得出差异,将总和除以观测次数(即10)即可。

    lesson content

      Summary
    • Variance is a measure of the spread of data, indicating how far each data point is from the mean.
      ::差异是衡量数据分布的尺度,显示每个数据点与平均值之间的距离。
    • Absolute deviation is the sum of the absolute values of each of the differences between data points and the mean.
      ::绝对偏差是数据点和平均值之间每项差异的绝对值总和。
    • Mean absolute deviation is the average of the absolute deviation, which is a limited way of describing the spread of data.
      ::平均绝对偏差是绝对偏差的平均值,这是描述数据分布的有限方式。
    • Standard deviation is the square root of the variance, used to describe the spread in the regular units.
      ::标准偏差是差异的平方根,用来描述常规单位的差分。

    Review
    ::回顾

    1. What are the similarities and differences between standard deviation and variance?
    ::1. 标准偏差和差异之间有哪些相似之处和不同之处?

    2. Data Set A has a mean of 30 and a standard deviation of 10. Data Set B also has a mean of 30, but a standard deviation of 2. What does this mean about Data Set A compared to Data Set B?
    ::2. 数据集A的平均值为30,标准偏差为10。 数据集B的平均值为30,但标准偏差为2。 与数据集B相比,这对数据集A意味着什么?

    Calculate the variance of each set of data by hand.
    ::用手计算每组数据的差异。

    3. Sample: 1, 4, 7, 10, 3, 6, 12, 5, 8, 16, 21, 3, 1, 5
    ::3. 样本:1、4、7、10、3、6、12、5、8、16、21、3、1、5

    4. Population: 23, 27, 19, 24, 20, 22, 31, 30, 28
    ::4. 人口:23、27、19、24、20、22、31、30、38

    5. Sample: 64, 62, 60, 58, 54, 60, 61, 63, 47, 100, 29, 59
    ::5. 抽样:64、62、60、58、54、60、61、63、47、100、29、59

    Calculate the variance of each set of data using your calculator. Compare your answers to your answers to 3-5.
    ::使用您的计算器计算每组数据的差异。 比较您的答复与您的答复到 3-5 。

    6. Sample: 1, 4, 7, 10, 3, 6, 12, 5, 8, 16, 21, 3, 1, 5
    ::6. 样本:1、4、7、10、3、6、12、5、8、16、21、3、1、5

    7. Population: 23, 27, 19, 24, 20, 22, 31, 30, 28
    ::7. 人口:23、27、19、24、20、22、31、30、38

    8. Sample: 64, 62, 60, 58, 54, 60, 61, 63, 47, 100, 29, 59
    ::8. 抽样:64、62、60、58、54、60、61、63、47、100、29、59

    9. If σ 2 = 16 , what is the population standard deviation?
    ::9. 如果0.12=16,人口标准差是多少?

    10. Which data set has the largest standard deviation?
    ::10. 哪些数据集的标准差最大?

    1. 10, 10, 10, 10, 10
    2. 0, 0, 10, 10, 10
    3. 0, 9, 10, 11, 20
    4. 20, 20, 20, 20, 20

    11. What will a large variance look like on a histogram? What will a small variance look like on a histogram?
    ::11. 直方图上的大差异将是什么样子?直方图上的小差异将是什么样子?直方图上的小差异将是什么样子?

    12. You find some data organized in a bar graph. Could you calculate the variance of this data? Explain.
    ::12. 在条形图中找到一些数据。您能否计算此数据的差异? 解释 。

    13. A sample set of 20 exam scores is 67, 94, 88, 76, 85, 93, 55, 87, 80, 81, 80, 61, 90, 84, 75, 93, 75, 68, 100, 98. Calculate the mean, variance, and standard deviation for this data.
    ::13. 一组20分的抽样考试分数为67、94、88、76、85、93、55、87、80、81、80、61、90、84、75、93、75、68、100、98。

    14. All of Mike’s bowling scores are: 1, 1, 2, 10, 12, 1, 9, 6, 7, 8, 4, 3, 4, 1, 4, 1, 6, 7, 11, 5. Calculate the mean, variance, and standard deviation for this data.
    ::14. 迈克的保龄球分数全部是:1、1、2、10、12、1、9、6、7、8、4、4、4、4、1、4、1、6、7、11、5。

    15. Why can’t you always calculate the population variance and standard deviation? Why do you sometimes have to calculate the sample variance and standard deviation?
    ::15. 为什么你不能总是计算人口差异和标准偏差? 为什么有时还要计算抽样差异和标准偏差?

    Review (Answers)
    ::回顾(答复)

    Click to see the answer key or go to the Table of Contents and click on the Answer Key under the 'Other Versions' option.
    ::单击可查看答题键, 或转到目录中, 单击“ 其他版本” 选项下的答题键 。