13.10 Measures of Dispersion
Measures of Dispersion
Look at the graphs below. Each represents a collection of many data points and shows how the individual values (solid line) compare to the mean of the data set (dashed line). Although all three graphs share a common mean, the spread of the data differs from graph to graph. In statistics, we use the word dispersion to describe how spread out the data is.
Range
Range is the simplest measure of dispersion. It is the total spread in the data, calculated by subtracting the smallest value in the group from the largest.
Finding the Range
Find the range and the median of the following data:
223, 121, 227, 433, 122, 193, 397, 276, 303, 199, 197, 265, 366, 401, 222
The first thing to do in this case is to order the data, listing all values in ascending order:
121, 122, 193, 197, 199, 222, 223, 227, 265, 276, 303, 366, 397, 401, 433
Note: It is extremely important to make sure that you don't skip any values when you reorder the list. Two ways to do this are (i) cross out the numbers in the original list as you write them in the second list, and (ii) count the number of values in both lists when you are done. In this example, both lists contain 15 values, so we can be sure we didn't miss any (as long as we didn't count any twice!).
The range is found by subtracting the lowest value from the highest: 433 - 121 = 312.
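The ordering, counting, and subtraction steps above can be sketched in Python. This is only an illustration; the variable names (`data`, `ordered`, `data_range`) are chosen here and do not come from the text.

```python
# Data from the worked example above.
data = [223, 121, 227, 433, 122, 193, 397, 276, 303, 199, 197, 265, 366, 401, 222]

# Step 1: list the values in ascending order.
ordered = sorted(data)

# Step 2: check that no value was dropped while reordering.
assert len(ordered) == len(data) == 15

# Step 3: range = largest value - smallest value.
data_range = ordered[-1] - ordered[0]

print(ordered)
print("range:", data_range)  # range: 312
```

Sorting first is not strictly needed for the range (`max(data) - min(data)` gives the same result), but the ordered list is reused when finding the median.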
Now that the list is ordered, we can see that the median is the 8th of the 15 values: 227.
Variance
The range is not a particularly good measure of dispersion, because it depends only on the two most extreme values and so does nothing to discount points that are unusually high or low compared to the rest of the data (the outliers). A better method involves measuring the distance of each data point from a central average.
Look at the following data values:
11, 13, 14, 15, 19, 22, 24, 26
The mean of these values is 18, and the values differ from 18 by varying amounts. Here is a list of the values' deviations from the mean:
-7, -5, -4, -3, 1, 4, 6, 8
If we take the mean of these deviations, we find that it is zero:
(-7 - 5 - 4 - 3 + 1 + 4 + 6 + 8) / 8 = 0 / 8 = 0
This comes as no surprise. Some of the deviations are positive and some are negative, because the mean lies somewhere near the middle of the range. You can use algebra to prove (try it!) that the sum of the deviations will always be zero, no matter what numbers are in the list. So the sum of the deviations is not a useful tool for measuring dispersion.
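For readers who want to check their own attempt, one way to write the algebra is sketched below. The symbols are assumptions introduced here, not notation from the text: $x_1, \dots, x_n$ are the data values and $\bar{x}$ is their mean.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\quad\Longrightarrow\quad
\sum_{i=1}^{n} x_i = n\bar{x}

\sum_{i=1}^{n} \left(x_i - \bar{x}\right)
  = \sum_{i=1}^{n} x_i - n\bar{x}
  = n\bar{x} - n\bar{x}
  = 0
```

The result does not depend on the particular values, only on the definition of the mean.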
But if we square the deviations, all the negative values become positive, and then we can tell how large the deviations are on average. If we do that for this data set, we get the following list:
49, 25, 16, 9, 1, 16, 36, 64
The sum of those squares is 216, so their average is 216 / 8 = 27.
This average of the squared differences from the mean (the mean squared deviation) is called the variance. The variance is a measure of dispersion, and its value is lower for tightly grouped data than for widely spread data. In the example above, the variance is 27.
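The calculation above can be retraced step by step in a few lines of Python. This is a minimal sketch with variable names chosen here for illustration; it divides by the number of values, as the text does.

```python
values = [11, 13, 14, 15, 19, 22, 24, 26]
n = len(values)

mean = sum(values) / n                       # 18.0
deviations = [x - mean for x in values]      # -7, -5, -4, -3, 1, 4, 6, 8
squared = [d ** 2 for d in deviations]       # 49, 25, 16, 9, 1, 16, 36, 64

# The plain deviations cancel out, so their sum carries no information.
print("sum of deviations:", sum(deviations))   # 0.0

# The variance is the mean of the squared deviations.
variance = sum(squared) / n
print("variance:", variance)                   # 27.0
```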
What does it mean to say that tightly grouped data will have a low variance? You can probably already see that the size of the variance also depends on the size of the data values themselves. Mathematicians have tried to standardize the definition of variance in various ways; the standard deviation is one of the most commonly used.
Standard Deviation
The previous example shows that variance gives us a measure of the spread of the data (tightly grouped data has a smaller mean squared deviation and so a smaller variance), but it is not immediately clear what a number like 27 actually refers to. Since it is the mean of the squares of the deviations, however, it seems logical that taking its square root would be a better way to make sense of it. This root mean square deviation (i.e., the square root of the variance) is called the standard deviation and is given the symbol s.
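Written as formulas, the two definitions look like this. The symbols $x_i$, $\bar{x}$, and $n$ are assumptions introduced here for the data values, their mean, and the number of values; only the symbol $s$ comes from the text. The division by $n$ follows this section's definition of variance as the mean squared deviation; some texts divide by $n - 1$ when working with a sample.

```latex
s^2 = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2
\qquad\text{(variance)}

s = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
\qquad\text{(standard deviation)}
```

For the earlier example with variance 27, this gives $s = \sqrt{27} \approx 5.2$, which is in the same units as the data.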
Calculating the Mean, Variance, and Standard Deviation
Find the mean, the variance, and the standard deviation of the following values:
121, 122, 193, 197, 199, 222, 223, 227, 265, 276, 303, 366, 397, 401, 433
The mean is needed to find the variance, and from the variance we can determine the standard deviation. The sum of all fifteen values is 3945, so their mean is 3945 / 15 = 263.
The variance and standard deviation are often best calculated by constructing a table. Using this method, we enter the deviation and the square of the deviation for each separate data point.
| Value | Deviation | Deviation squared |
|-------|-----------|-------------------|
| 121 | -142 | 20,164 |
| 122 | -141 | 19,881 |
| 193 | -70 | 4,900 |
| 197 | -66 | 4,356 |
| 199 | -64 | 4,096 |
| 222 | -41 | 1,681 |
| 223 | -40 | 1,600 |
| 227 | -36 | 1,296 |
| 265 | 2 | 4 |
| 276 | 13 | 169 |
| 303 | 40 | 1,600 |
| 366 | 103 | 10,609 |
| 397 | 134 | 17,956 |
| 401 | 138 | 19,044 |
| 433 | 170 | 28,900 |
| Sum | 0 | 136,256 |

The variance is the mean of the squares of the deviations, so it is 136,256 / 15 ≈ 9083.733. The standard deviation is the square root of the variance, or approximately 95.31.
If you look at the second column of the table, you can see that the standard deviation is a good measure of the spread. It looks to be a reasonable estimate of the average distance that each point lies from the mean.
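The table method can be reproduced in Python as a check on the arithmetic. The sketch below builds the same three columns and then cross-checks the result against `statistics.pvariance`, the standard-library function that also divides by the number of values. The variable names are illustrative only.

```python
import math
import statistics

values = [121, 122, 193, 197, 199, 222, 223, 227, 265, 276, 303, 366, 397, 401, 433]
n = len(values)
mean = sum(values) / n                     # 3945 / 15 = 263.0

# One row per data point: (value, deviation, squared deviation).
rows = [(x, x - mean, (x - mean) ** 2) for x in values]
for x, dev, sq in rows:
    print(f"{x:>5} {dev:>6.0f} {sq:>8.0f}")

sum_sq = sum(sq for _, _, sq in rows)      # 136256.0
variance = sum_sq / n                      # mean of the squared deviations
std_dev = math.sqrt(variance)              # square root of the variance

print(f"variance = {variance:.3f}")        # variance = 9083.733
print(f"std dev  = {std_dev:.2f}")         # std dev  = 95.31

# Cross-check against the standard library's population variance.
assert math.isclose(variance, statistics.pvariance(values))
```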
Calculating and Interpreting Measures of Central Tendency and Dispersion for Real-World Situations
A number of house sales in a town in Arizona are listed below. Calculate the mean and median house price, and the standard deviation of the sale prices.
| Address | Sale Price |
|---------|------------|
| 518 CLEVELAND AVE | $117,424 |
| 1808 MARKESE AVE | $128,000 |
| 1770 WHITE AVE | $132,485 |
| 1459 LINCOLN AVE | $77,900 |
| 1462 ANNE AVE | $60,000 |
| 2414 DIX HWY | $250,000 |
| 1523 ANNE AVE | $110,205 |
| 1763 MARKESE AVE | $70,000 |
| 1460 CLEVELAND AVE | $111,710 |
| 1478 MILL ST | $102,646 |

The sum of all ten values is $1,160,370, so their mean is $116,037.
The median is halfway between the fifth and sixth highest values. Those two middle values (if we reorder the list by price) are $110,205 and $111,710, so the median is $110,957.50.
Now we can rewrite the table with the deviations and their squares added in:

| Value ($) | Deviation | Deviation squared |
|-----------|-----------|-------------------|
| 60,000 | -56,037 | 3,140,145,369 |
| 70,000 | -46,037 | 2,119,405,369 |
| 77,900 | -38,137 | 1,454,430,769 |
| 102,646 | -13,391 | 179,318,881 |
| 110,205 | -5,832 | 34,012,224 |
| 111,710 | -4,327 | 18,722,929 |
| 117,424 | 1,387 | 1,923,769 |
| 128,000 | 11,963 | 143,113,369 |
| 132,485 | 16,448 | 270,536,704 |
| 250,000 | 133,963 | 17,946,085,369 |
| Sum | 0 | 25,307,694,752 |

The variance is 25,307,694,752 / 10 = 2,530,769,475.2, and the square root of that is about 50,307. So the standard deviation is about $50,307.
In this case, the mean and the median are close to each other, indicating that the house prices in this area of Mesa are spread fairly symmetrically about the mean. Although one house is significantly more expensive than the others, a number of cheaper houses balance out the spread.
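Squared deviations in the billions are easy to mistype when copied by hand, so it can be worth recomputing the summary figures directly from the ten sale prices. The sketch below uses Python's standard `statistics` module; `pstdev` divides by the number of values, matching the definition used in this section. The variable names are illustrative.

```python
import statistics

# Sale prices in dollars, in the order listed in the first table.
prices = [117424, 128000, 132485, 77900, 60000,
          250000, 110205, 70000, 111710, 102646]

mean = statistics.mean(prices)        # 116037
median = statistics.median(prices)    # 110957.5 (halfway between 110205 and 111710)
std_dev = statistics.pstdev(prices)   # square root of the mean squared deviation

print(f"mean    = ${mean:,.2f}")
print(f"median  = ${median:,.2f}")
print(f"std dev = ${std_dev:,.0f}")

# How far apart are the two measures of center, relative to the spread?
print(f"(mean - median) / std dev = {(mean - median) / std_dev:.2f}")
```

The last line puts the gap between mean and median on the scale of the standard deviation, which is one simple way to judge whether the two are "close".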
Example
Example 1
James and John both own fields in which they plant cabbages. James plants cabbages by hand, while John uses a machine to carefully control the distance between the cabbages. The diameters of each grower's cabbages are measured. James's cabbages have an average (mean) diameter of 7.10 inches with a standard deviation of 2.75 inches; John's have a mean diameter of 6.85 inches with a standard deviation of 0.60 inches.
John claims his method of machine planting is better. James insists it is better to plant by hand. Use the data to provide a reason to justify each side of the argument.
- James's cabbages have a larger mean diameter, so on average they are larger than John's. The larger standard deviation also means that there will be a number of cabbages which are significantly bigger than most of John's.
- John's cabbages are smaller on average, but only by a little bit (one quarter inch). Meanwhile, the smaller standard deviation means that the sizes of his cabbages are much more predictable. The spread of sizes is much smaller, so they all end up being closer to the mean. While he may not have many extra large cabbages, he will not have any that are excessively small either, which may be better for any stores to which he sells his cabbages.
Review
1. Two bus companies run services between Los Angeles and San Francisco. Inter-Cal Express takes a mean time of 9.5 hours to make the trip, with a standard deviation of 0.25 hours. Fast-Dog Travel takes 8.75 hours on average, with a standard deviation of 2.5 hours. If Samantha needs to travel between the cities, which company should she choose if:
   a. She needs to be on time for a meeting in San Francisco.
   b. She travels weekly to visit friends who live in San Francisco and wishes to minimize the time she spends on a bus over the entire year.
For problems 2-6, suppose you have a collection of data points for which you have already found the mean, median, mode, range, variance, and standard deviation. Then you collect two new data points: one that is higher than any of the values in the original set, and one that is lower than any of the values in the original set.
2. Based on just this information, can you tell what will happen to the mean value of the data set when these new points are added? (In other words, can you say anything at all about whether the mean will or won't increase, decrease, or stay the same, or do you not have enough information to tell? If not, what additional information would you need?)
3. Can you tell what will happen to the median value?
4. Can you tell what will happen to the mode? (Assume the original data set has only one mode.)
5. Can you tell what will happen to the range?
6. Can you tell what will happen to the variance and standard deviation?
For problems 7-11, suppose that instead of collecting two new values for your data set above, you have collected only one new value, one that is higher than all the values in the original set.
7. Now can you tell what will happen to the mean value?
8. Can you tell what will happen to the median value?
9. Can you tell what will happen to the mode?
10. Can you tell what will happen to the range?
11. Can you tell what will happen to the variance and standard deviation?
Finally, for problems 12-16, suppose that instead of being higher than all the values in the original data set, your new value is somewhere in the middle of the original data set. Specifically, suppose it is higher than the mean, lower than the median, and equal to the mode.
12. Now can you tell what will happen to the mean?
13. Can you tell what will happen to the median?
14. Can you tell what will happen to the mode?
15. Can you tell what will happen to the range?
16. Can you tell what will happen to the variance and standard deviation?