Chapter Outline

  • Lesson Objectives

    • Use the mean and standard deviation of a data set to fit it to a normal distribution and to estimate population percentages.

    Introduction: Going to College

    Discussion Question: Grace recently took both the SAT and the ACT. She scored 1350 on the SAT, which has a mean score of 1060 with a standard deviation of 210. She scored 29 on the ACT, which has a mean score of 21 with a standard deviation of 5.4. On which test did she do relatively better?


    Activity 1: Measuring Distance From the Mean

    Probability distributions help us understand how likely a specific value is to occur in a discrete or continuous dataset. For example, what proportion of movies are more than five hours long, or what is the probability that a person is over 7 feet tall? Either probability isn’t too difficult to estimate if we have a large enough dataset. But what if we want to know whether it is more unusual for a movie to be five hours long or for a person to be seven feet tall? The two values are measured in different units (hours and feet), so we need a universal unit of measure for comparing how far a single data value lies from its mean. The unit we will use is the standard deviation: we will measure the distance of a data value from the mean in standard deviations. Recall that the standard deviation measures the spread of a dataset; the more spread out a dataset is, the greater its standard deviation. Because it scales with the spread of each dataset, the standard deviation is an effective way to compare relative distances both within a dataset and across different datasets. Use the interactive below to derive the formula for measuring the distance of a data value from the mean in standard deviations.

    Use the interactive to explore what a scale of standard deviations would look like.

    INTERACTIVE
    Measuring Distance from the Mean
    • Use the orange and purple sliders to change the mean and the standard deviation of the distribution.

    Activity 2: Z Scores

    In the past, we have used the interquartile range (IQR) and percentiles to measure the location of a data value relative to the center. The IQR is calculated with the median as the measure of center. One challenge with the IQR and percentiles is that they make it difficult to describe the location of multiple points relative to the median. Since the standard deviation is a measure of the typical deviation from the mean, we can use it whenever we know the mean. The standard deviation is particularly useful for data that is normally distributed. Data that is normally distributed is symmetric about the mean, and values are more likely to fall close to the mean. The number of standard deviations a data value is from the mean, together with its direction, is referred to as its z score. A negative z score represents a data value that falls to the left of (below) the mean. A positive z score represents a data value that falls to the right of (above) the mean. The z score of the mean itself is 0. We typically do not attach units to a z score.

    INTERACTIVE
    Z-scores
    • Drag the red point to change the sample data value.
    • Adjust the mean and standard deviation to see how they affect the data value's z-score.

    The z score of a data value can be determined using the following formula:

    $$z = \frac{x - \mu}{\sigma}$$

    • z is the z score
    • x is the data value that you are comparing to the mean
    • μ is the mean
    • σ is the standard deviation
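    The formula can be written as a short function. The sketch below is a minimal Python illustration, not part of the lesson's interactive; the name `z_score` is a hypothetical choice, and the sample mean (1060) and standard deviation (210) are the SAT values from the Grace example.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Return how many standard deviations x lies from the mean mu.

    A negative result means x is below (to the left of) the mean;
    a positive result means x is above (to the right of) the mean.
    """
    if sigma <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mu) / sigma


# The mean itself always has a z score of 0.
print(z_score(1060, 1060, 210))             # 0.0
# One standard deviation above and below the SAT mean.
print(round(z_score(1270, 1060, 210), 2))   # 1.0
print(round(z_score(850, 1060, 210), 2))    # -1.0
```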

    A z score of 2 or more, meaning a data value at least 2 standard deviations above the mean, is considered a significantly high value. Only about 2.3% of the data values in a normally distributed dataset lie 2 or more standard deviations above the mean. A z score of -2 or less, meaning a data value at least 2 standard deviations below the mean, is considered a significantly low value. By symmetry, only about 2.3% of the data values in a normally distributed dataset lie 2 or more standard deviations below the mean.
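    The 2.3% figure can be checked against the standard normal distribution (mean 0, standard deviation 1). This sketch uses Python's built-in `statistics.NormalDist` (available since Python 3.8); the `classify` helper and its labels are hypothetical, chosen only to mirror the ±2 cutoffs described above.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Share of a normal distribution at least 2 SDs above the mean: P(Z >= 2).
upper_tail = 1 - std_normal.cdf(2)
# Share at least 2 SDs below the mean: P(Z <= -2), equal by symmetry.
lower_tail = std_normal.cdf(-2)

print(f"P(Z >= 2)  = {upper_tail:.4f}")   # 0.0228
print(f"P(Z <= -2) = {lower_tail:.4f}")   # 0.0228


def classify(z: float) -> str:
    """Label a z score using the +/-2 cutoffs."""
    if z >= 2:
        return "significantly high"
    if z <= -2:
        return "significantly low"
    return "not significant"


print(classify(2.4), classify(-2.1), classify(1.38))
```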

    Figure: Significance based on standard deviations from the mean.

    When we attach z scores to the data values in a dataset, we say that we standardize the dataset: we express each data value relative to the mean, in units of standard deviations. Standardizing datasets allows us to compare values from completely different datasets and decide which value is relatively more “abnormal.”
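    Standardizing a dataset amounts to applying the z score formula to every value. A minimal sketch, assuming the dataset is treated as a whole population (so it uses `statistics.pstdev`; `statistics.stdev` would give the sample standard deviation instead). The function name `standardize` and the eight-value list are hypothetical.

```python
from statistics import mean, pstdev


def standardize(data: list[float]) -> list[float]:
    """Return the z score of every value in data (population SD)."""
    mu = mean(data)
    sigma = pstdev(data)
    return [(x - mu) / sigma for x in data]


scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data: mean 5, population SD 2
print(standardize(scores))
# [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

    After standardizing, the z scores themselves have a mean of 0 and a standard deviation of 1, which is what puts values from different datasets on a common scale.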

    Example

    Grace recently took both the SAT and the ACT. She scored 1350 on the SAT, which has a mean score of 1060 with a standard deviation of 210. She scored 29 on the ACT, which has a mean score of 21 with a standard deviation of 5.4. On which test did she do relatively better?

    To determine on which test Grace did better, let’s standardize her scores. We will compute a z score for each of her two scores and compare them.

    SAT: $z = \frac{x - \mu}{\sigma} = \frac{1350 - 1060}{210} = \frac{290}{210} \approx 1.38$
    ACT: $z = \frac{x - \mu}{\sigma} = \frac{29 - 21}{5.4} = \frac{8}{5.4} \approx 1.48$

    Let’s examine this on a standardized number line. The numbers on this line represent standard deviations from the mean, which is located at 0.

    Figure: Grace's z scores on the SAT and ACT.

    To decide which value is more extreme, we look for the z score that is farther from 0, the mean. In this case, both values are above the mean, so the greater z score represents the more extreme value.
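    The same comparison can be reproduced in a few lines of Python. This sketch hard-codes the means and standard deviations given in the example; the dictionary layout is an arbitrary choice. Picking the maximum z score works here only because both scores are above the mean.

```python
# Grace's scores and each test's mean and standard deviation (from the example).
tests = {
    "SAT": {"score": 1350, "mean": 1060, "sd": 210},
    "ACT": {"score": 29, "mean": 21, "sd": 5.4},
}

# z = (x - mu) / sigma for each test.
z_scores = {
    name: (t["score"] - t["mean"]) / t["sd"] for name, t in tests.items()
}

for name, z in z_scores.items():
    print(f"{name}: z = {z:.2f}")   # SAT: z = 1.38, then ACT: z = 1.48

# Both z scores are positive, so the larger one is the relatively better result.
best = max(z_scores, key=z_scores.get)
print("Relatively better on:", best)   # Relatively better on: ACT
```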

    Answer: Grace did relatively better on the ACT.


    Activity 3: The Empirical Rule

    In Algebra 1, you may have used the percentage of data within one standard deviation of the mean to make observations about the skew of a dataset. You might not have known it at the time, but the endpoints of those intervals were z scores of -1 and 1. By finding the percent of a dataset that falls within an interval, we can predict where data values will fall in similar datasets based on their z scores. Let’s examine a larger dataset of SAT scores. In the interactive below, we will simulate pulling the scores of other students who took the same exam as Grace from a nationwide database.
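    The interactive’s simulation can be approximated offline. The sketch below draws hypothetical SAT scores from a normal distribution with the example’s mean (1060) and standard deviation (210), then counts how many fall within 1, 2, and 3 standard deviations of the mean. It assumes the scores are well modeled as normal; the sample size and seed are arbitrary, and the simulated scores are not rounded or capped the way real SAT scores are, so the printed percentages are approximate.

```python
import random

MU, SIGMA = 1060, 210   # SAT mean and standard deviation from the Grace example
N = 100_000             # hypothetical number of simulated test takers

rng = random.Random(42)  # fixed seed so repeated runs give the same scores
scores = [rng.gauss(MU, SIGMA) for _ in range(N)]

for k in (1, 2, 3):
    low, high = MU - k * SIGMA, MU + k * SIGMA
    # Fraction of simulated scores within k standard deviations of the mean.
    share = sum(low <= s <= high for s in scores) / N
    print(f"within {k} SD ({low} to {high}): {share:.1%}")
```

    With this many simulated scores, the three shares typically land close to 68%, 95%, and 99.7%.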

    INTERACTIVE
    The Empirical Rule

    Examine a normally distributed dataset of SAT scores. Recall that the empirical rule states that for normally distributed data, nearly all values reside within 3 standard deviations of the mean.

    • Add additional test scores by clicking +10 or +100.
    • Use the first checkbox to show/hide the empirical rule proportions 68-95-99.7.
    • Use the second checkbox to show/hide the test score intervals corresponding with the z-scores along the x-axis.

    The dataset that you created above is an example of a normal distribution. As you can see, a test taker’s score is more likely to fall close to the mean. Additionally, a randomly chosen score is just as likely to fall above the mean as below it. Notice that the proportion of data within each band of standard deviations is approximately equal on the left and right sides of the mean. These proportions emerge for all normally distributed data, and this phenomenon is known as the empirical rule. The empirical rule states that for normally distributed data, nearly all data values will reside within three standard deviations of the mean. More specifically, in relatively large datasets, approximately 68%, 95%, and 99.7% of the data fall within 1, 2, and 3 standard deviations of the mean, respectively.
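    The 68-95-99.7 proportions are rounded versions of exact probabilities for the normal distribution. A short check with Python’s `statistics.NormalDist` (Python 3.8+), which also shows how a fitted normal model can estimate a population percentage: here, the share of test takers expected to score above Grace’s 1350, assuming SAT scores follow a normal distribution with mean 1060 and standard deviation 210.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

for k in (1, 2, 3):
    # Probability that a value lies within k standard deviations of the mean.
    p = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"within {k} SD: {p:.2%}")
# within 1 SD: 68.27%
# within 2 SD: 95.45%
# within 3 SD: 99.73%

# Estimate a population percentage from a normal model of SAT scores
# (mean 1060, SD 210, as in the Grace example).
sat = NormalDist(mu=1060, sigma=210)
above_grace = 1 - sat.cdf(1350)
print(f"Share expected to score above 1350: {above_grace:.1%}")
```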

    Discussion Question: Why do you think the empirical rule will not apply to a dataset that is not normally distributed?

    Summary
    • The z score is the number of standard deviations a data value is from the mean, together with its direction.
    • The formula for the z score is $z = \frac{x - \mu}{\sigma}$.
    • The empirical rule states that for normally distributed data, nearly all data values will reside within three standard deviations of the mean.

    Wrap-Up: Review Questions