Section outline

  • Suppose you were at a county fair and saw a large jar full of gumballs, maybe 1000 of them, with a sign that said “Guess the Number, Win a Prize!” If the rules of the game are that you could win a $10 prize by guessing within 200 gumballs either way, or a $50 prize by guessing within five gumballs either way, but you have to specify which prize you are trying for before submitting your guess, which would you choose?

    Confidence Intervals 

    The general concept of confidence intervals is pretty intuitive: it is easier to predict that an unknown value will lie somewhere within a wide range than to predict that it will lie within a narrow range. In other words, if you are making an educated guess about an unknown number, you are more likely to be correct if you predict it will fall within a wider range. This idea is reflected in the concept question above, where the reward is greater for guessing within a smaller range, because the contest creator knows that your chance of guessing correctly is much lower when the range is smaller.

    A confidence interval, centered on the mean of your sample, is the range of values that is expected to capture the population mean with a given level of confidence. A wider confidence interval covers a greater range of values, so the confidence level that the range includes the population mean is greater. By convention, you will mostly be concerned with identifying the intervals associated with 90%, 95%, and 99% confidence levels.

    Calculate the confidence interval by combining the sample mean with the margin of error, found by multiplying the standard error of the mean by the z-score of the percent confidence level:

    $$\text{confidence interval} = \bar{x} \pm \text{margin of error}$$

    $$\text{margin of error} = Z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}$$
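
    The formula can be sketched in Python as follows. This is a minimal illustration that assumes the population standard deviation is known; the function name `z_confidence_interval` and the sample values (mean 50, σ = 4, n = 64) are hypothetical and are not taken from the lesson.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Return (lower, upper) for a z-based confidence interval.

    Assumes the population standard deviation sigma is known.
    """
    alpha = 1 - confidence
    # z-score that leaves alpha/2 in the upper tail of the standard normal
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin_of_error = z * sigma / sqrt(n)
    return sample_mean - margin_of_error, sample_mean + margin_of_error


# Hypothetical inputs, chosen only to illustrate the call
low, high = z_confidence_interval(sample_mean=50.0, sigma=4.0, n=64, confidence=0.95)
print(round(low, 2), round(high, 2))  # prints 49.02 50.98
```

    Raising `confidence` makes the z-score larger and the interval wider, while raising `n` shrinks the standard error and narrows the interval.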

    It is common, but incorrect, to assume that a confidence level indicates the probability that the mean of the population will occur within a given range of the mean of your sample. A 95% confidence level means that if you took 100 samples, all of the same size, and formed 100 confidence intervals, you would expect about 95 of these intervals to capture the population mean.

    The confidence level indicates the expected number of times out of 100 that an interval constructed this way will contain the mean of the population.

    Comparing Sample Means to Population Means

    Suppose you took 100 unbiased random samples of the heights of U.S. women (recall that height is normally distributed), each sample containing 30 women. What can you say about the means of the samples ($\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_{100}$) compared to the population mean?

    Since height is normally distributed, we know that approximately 95% of women will have a height within two standard deviations of the mean (remember the Empirical Rule?). The sample means are normally distributed as well, with a standard deviation equal to the standard error, $\frac{\sigma}{\sqrt{n}}$. That means that out of 100 samples, we can expect about 95 of them to have a mean within 2 standard errors of the population mean.
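
    A small simulation can make this concrete. The sketch below draws 100 samples of 30 heights each and counts how many sample means land within 2 standard errors ($2 \times \frac{\sigma}{\sqrt{n}}$) of the population mean. The population mean (65 inches) and standard deviation (2.5 inches) are assumed values chosen for illustration; the lesson does not state them.

```python
import random
from math import sqrt
from statistics import mean

random.seed(0)  # fixed seed so the run is repeatable

POP_MEAN = 65.0    # assumed population mean height in inches (hypothetical)
POP_SD = 2.5       # assumed population standard deviation in inches (hypothetical)
N = 30             # women per sample, as in the text
NUM_SAMPLES = 100  # number of samples, as in the text

# Standard deviation of the sample means (the standard error)
standard_error = POP_SD / sqrt(N)

# Mean height of each simulated sample
sample_means = [
    mean(random.gauss(POP_MEAN, POP_SD) for _ in range(N))
    for _ in range(NUM_SAMPLES)
]

# Count the sample means that fall within 2 standard errors of the population mean
within_two_se = sum(
    abs(m - POP_MEAN) <= 2 * standard_error for m in sample_means
)
print(f"{within_two_se} of {NUM_SAMPLES} sample means are within 2 standard errors")
```

    The count varies from run to run, but it should usually land near 95.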

    Predicting Population Means 

    Suppose the mean of the means of our 100 samples from the previous example is 5′5″, in other words, $\bar{X} = 5'5''$. Within what range of heights can we expect the population mean to be, with 95% confidence? Assume a standard deviation of 1.5″.

    Remember that since height is normally distributed, 95% of the values lie within 2 standard deviations of the mean, so we need to identify that range of values.

    • First we need to use $Z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}$ to identify the margin of error (since we are looking for a 95% confidence level, this is the range of values within 2 standard deviations of the sample mean). Since $\sigma = 1.5$ and $n = 100$, in this case we get $2 \times \frac{1.5}{\sqrt{100}} = 2 \times \frac{1.5}{10} = 2 \times 0.15 = 0.3$ above and below $\bar{X}$.
    • The interval then is 5′4.7″ to 5′5.3″, or 0.3 inches above and below the mean of 5′5″.

    We can say that there is a 95% probability that the mean of our 100 samples would be within 0.3 inches either way of the population mean. Since the mean of our sample is 5′5″, we can say that the population mean is between 5′4.7″ and 5′5.3″ with 95% confidence. 

    Mathematically: $5'4.7'' < \mu < 5'5.3''$
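
    The arithmetic in this example can be checked with a few lines of Python. This sketch uses the values from the text (mean 5′5″, standard deviation 1.5″, n = 100, and a z-score of 2 for 95%); the helper `feet_inches` is a hypothetical formatting function added only for display.

```python
from math import sqrt

sample_mean_in = 5 * 12 + 5  # 5'5" expressed in inches
sigma = 1.5                  # standard deviation in inches, from the text
n = 100                      # value of n used in the worked example
z = 2                        # Empirical Rule approximation for 95% confidence

# Margin of error: z times the standard error of the mean
margin_of_error = z * sigma / sqrt(n)
lower = sample_mean_in - margin_of_error
upper = sample_mean_in + margin_of_error


def feet_inches(total_inches):
    """Format a height given in inches as feet and inches."""
    feet = int(total_inches // 12)
    return f"{feet}'{total_inches - 12 * feet:.1f}\""


print("margin of error (inches):", round(margin_of_error, 2))
print("interval:", feet_inches(lower), "to", feet_inches(upper))
```

    Using the more precise z-score of 1.96 in place of 2 would narrow the interval slightly.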

    Plotting the Means of Samples 

    Suppose you plot the mean of each of your height samples on a graph, and draw a line extending each way from the mean of each sample to represent 2 standard deviations. If you were to do this for 50 of the samples, you might end up with an image like the one below.

    [Image: sample means and their 95% confidence intervals plotted beneath a normal curve]

    The image is a screen capture from the interactive applet at Bedford, Freeman, and Worth Publishing Group's website.

    At the top of the image is a normal curve. Each of the lines below the curve has a length that represents a 95% confidence interval, centered on the mean (in red) of the sample.

    a. What is indicated by the lines that are all red in color?

    The lines that are colored entirely red belong to samples whose mean is more than 2 standard deviations away from the population mean. In other words, the 95% confidence intervals around the means of those two samples did not capture the population mean.

    b. What value is indicated by the vertical red center line on each interval?

    The vertical red center line represents the mean of each sample.

    c. What does the "percent hit" number mean? How would it change if you were to continue taking more and more samples of 60 each?

    The “percent hit” number indicates the percentage of times that the population mean was included in the confidence interval of sample means. If you were to continue plotting sample means and confidence intervals, the percent hit would approach 95%. In fact, here is the same graph after 1000 sample runs:

    [Image: the same graph after 1000 sample runs]
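
    The behavior of the "percent hit" number can be imitated with a short simulation. This is only a sketch, not the applet's actual code: the function `percent_hit`, the population mean (65), and the population standard deviation (2.5) are assumptions made for illustration, while the sample size of 60 and the 95% level come from the text.

```python
import random
from math import sqrt
from statistics import NormalDist, mean


def percent_hit(num_runs, n=60, pop_mean=65.0, pop_sd=2.5, confidence=0.95, seed=1):
    """Percentage of simulated confidence intervals that contain pop_mean."""
    rng = random.Random(seed)  # local generator so results are repeatable
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * pop_sd / sqrt(n)
    hits = 0
    for _ in range(num_runs):
        # Draw one sample of size n and compute its mean
        x_bar = mean(rng.gauss(pop_mean, pop_sd) for _ in range(n))
        # A "hit" means the interval around x_bar captures the population mean
        if x_bar - margin <= pop_mean <= x_bar + margin:
            hits += 1
    return 100 * hits / num_runs


for runs in (50, 1000, 10000):
    print(runs, "runs:", percent_hit(runs), "percent hit")
```

    With only 50 runs the percentage can drift noticeably from 95, but it should settle close to 95 as the number of runs grows.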

    Earlier Problem Revisited

    Suppose you were at a county fair and saw a large jar full of gumballs, maybe 1000 of them, with a sign that said “Guess the Number, Win a Prize!” If the rules of the game are that you could win a $10 prize by guessing within 200 gumballs either way, or a $50 prize by guessing within five gumballs either way, but you have to specify which prize you are trying for before submitting your guess, which would you choose?

    This question is meant to give you an intuitive feeling for the concept of a confidence interval or confidence level. It should be clear that you would have a greater level of confidence in trying for the $10 prize, which you would win simply by guessing within +/- 20% of the number, than in trying for the $50 prize by guessing within +/- 0.5% of the number!

    Examples

    Example 1

    Suppose you took 40 unbiased random samples of the number of candies in a $0.75 bag of candy from a particular factory. The factory states that the number of candies per bag is normally distributed. What can you say about the mean number of candies in your sample?

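    Since the population is normally distributed, the sample means are also normally distributed around the population mean, so we can state that the mean of the sample follows the Empirical Rule: about 68%, 95%, and 99.7% of sample means fall within 1, 2, and 3 standard errors of the population mean, respectively.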

    Example 2

    Suppose the factory states that the number of candies per bag has $\sigma = 2$. If each sample includes data from 40 bags of candies ($n = 40$), what is the standard error of the mean ($\frac{\sigma}{\sqrt{n}}$)?

    The standard error of the mean is calculated as $\frac{\sigma}{\sqrt{n}}$, so $SEM = \frac{2}{\sqrt{40}} = \frac{2}{6.32} \approx 0.316$.

    Example 3

    If the sample mean is 38 candies, within what interval could we expect 99 out of each 100 samples to contain the population mean? What is that interval known as?

    The interval is called the confidence interval, and it is calculated as $\bar{x} \pm z_{\alpha/2} \times \sigma_{\bar{x}}$, where $\sigma_{\bar{x}}$ is the standard error of the mean.

    $$38 \pm z_{0.005} \times 0.316$$

    $$38 \pm 2.58 \times 0.316$$

    $$38 \pm 0.81528$$

    Therefore, the confidence interval is approximately 37.18 to 38.82.
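
    Examples 2 and 3 can be reproduced in Python as a check. The sketch below uses the stated values ($\sigma = 2$, $n = 40$, sample mean 38) and looks up the 99% z-score with the standard library rather than a rounded table value; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

sigma = 2         # standard deviation of candies per bag, stated by the factory
n = 40            # bags per sample
sample_mean = 38  # candies

# Standard error of the mean: sigma divided by the square root of n
sem = sigma / sqrt(n)

# z-score for 99% confidence leaves 0.005 in each tail
z_exact = NormalDist().inv_cdf(1 - 0.005)
margin = z_exact * sem

print(f"SEM = {sem:.3f}")
print(f"z = {z_exact:.3f}")
print(f"interval = {sample_mean - margin:.2f} to {sample_mean + margin:.2f}")
```

    Because this version keeps unrounded values for the z-score and the standard error, its endpoints can differ from the hand calculation in the last decimal place.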

    Example 4

    What is the more common way to describe the fact that we “expect 99 out of each 100 samples to contain the population mean”?

    Saying that you “expect 99 out of each 100 samples to contain the population mean” is the same as saying that the interval has a 99% confidence level.

    Review 

    1. What is a confidence interval?

    2. What is the formula for calculating the confidence interval?

    3. What is the difference between a confidence interval and a confidence level?

    4. What is a margin of error?

    5. How is the margin of error calculated?

    6. What common misconception about confidence level is corrected by stating that a 99% confidence level means that 99 out of 100 samples are expected to contain the population mean?

    7. If a population is known to have an approximately normal distribution, but the standard deviation is unknown, how can the population standard deviation be approximated?

    8. If the sample mean is unknown, is it safe to use the population mean as the sample mean?

    9. What Z-score corresponds to a 98% confidence interval?

    10. What confidence interval is associated with a Z-score of 2.576, assuming a two-tailed test?

    11. Which confidence level would describe a wider confidence interval, 80% or 85%?

    12. A factory produces bags of marbles for a toy store. The factory has previously calculated that $\sigma = 1$ marble per bag. If you were to sample 35 bags and calculate $\bar{x} = 40$, within what range could you predict $\mu$, with 98% confidence?

    13. Interpret your results from question 12, in context.

    14. The manager of a clothing store is attempting to estimate the mean number of customers that pass through her store each day. If the data from past estimates and other franchises suggests that $\sigma = 78$, and the manager has collected the customer counts in the table below from an SRS (Simple Random Sample), within what range can the manager predict the mean number of customers to lie, with 50% confidence?

    148   298   210   213   315   129   145   148   131   281   317

    15. Interpret your answer from problem 14, in context.
