4.1 Grouping Data
Suppose you were given the following data:
87, 72, 91, 91, 73, 83, 79, 81, 87, 72, 81, 91, 73, 73, 73
If you were told you were going to evaluate this data using common measures of central tendency and dispersion, how might you start by organizing the data to make the study as straightforward as possible?
Grouping Data
Data in its original form, just a list of numbers, names, letters, colors, etc., is known as raw data, and is often not particularly useful without some kind of organization. The series of numbers in the concept question above, for instance, doesn’t really mean anything at the moment. Without some context and some level of organization, it is just a collection of meaningless values.
Data can be classified into two general types, quantitative and qualitative. There are a number of ways to group or organize each type of data to make it more useful.
- Quantitative data (data that may be conveniently described numerically):
  - Dates or times are commonly organized chronologically.
  - Data with recurring values is often organized in a frequency distribution.
  - Univariate data likely to be used for evaluating the mean or the range of a population is generally organized in increasing or decreasing order of magnitude, or alphabetically.
  - Bivariate data is usually organized in a table showing how the two variables change in relation to each other.
  - Compare-and-contrast tables are excellent for evaluating two or more variables.
- Qualitative data (data that may be difficult to describe with numerical values):
  - Commonly grouped by category.
  - Categories are often evaluated using a frequency distribution.
  - Data may be organized in order of importance.
  - Inductive organization orders information by increasing complexity, listing facts before conclusions and advancing from specific examples to general conclusions.
  - Deductive organization is the inverse of inductive organization, listing recommendations or conclusions first, followed by supporting facts and data.
Organizing Data
Elaina is preparing to create a histogram to illustrate the data that she collected on average time spent taking a particular test in her Statistics class.
16 mins, 18.5 mins, 14.5 mins, 16 mins, 19 mins, 18 mins, 16.5 mins, 15 mins, 15 mins, 14.5 mins, 14 mins, 16 mins, 12.5 mins, 19.5 mins, 14 mins, 15 mins, 16.5 mins, 14 mins, 18 mins, 16 mins
A histogram is a graph that illustrates the relative frequency or probability density of a single variable.
a. How should she organize the data to make the construction of the histogram as straightforward as possible?
Since Elaina will need to identify the number of values in each category of the data, it would be ideal to organize the data in groups called classes or intervals. With the given data, intervals of 1 minute each would seem appropriate.
b. What will the data look like after it is organized?
Minutes Required to Complete the Test:
12.5 | 14, 14, 14, 14.5, 14.5 | 15, 15, 15 | 16, 16, 16, 16, 16.5, 16.5 | 18, 18, 18.5 | 19, 19.5
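The grouping into 1-minute classes can also be done mechanically. The following is a minimal Python sketch, not part of the original lesson; the function name `group_into_classes` and the choice to label each class by its lower bound are assumptions made for illustration.

```python
import math
from collections import defaultdict

# Elaina's raw times in minutes, copied from the text.
times = [16, 18.5, 14.5, 16, 19, 18, 16.5, 15, 15, 14.5,
         14, 16, 12.5, 19.5, 14, 15, 16.5, 14, 18, 16]

WIDTH = 1  # class width in minutes, as suggested in the text


def group_into_classes(values, width=WIDTH):
    """Map each class's lower bound to the sorted values in [bound, bound + width)."""
    classes = defaultdict(list)
    for value in sorted(values):
        lower = math.floor(value / width) * width
        classes[lower].append(value)
    return dict(classes)


classes = group_into_classes(times)
for lower in sorted(classes):
    members = classes[lower]
    print(f"{lower} to under {lower + WIDTH} min: {members} (count {len(members)})")
```

The count printed for each class is the bar height Elaina would use in the histogram. Classes with no values (13 and 17 minutes here) simply do not appear in the result.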
Organizing Raw Data to Create a Box-and-Whisker Plot
Orlando is planning to create a box-and-whisker plot to illustrate how much more popular dogs and cats are as pets than fish tanks, reptiles, and birds. He has collected the data below from a randomized sample of homes in his town, using a survey asking how many pets each family has in each category.
House 1: 2 dogs, 2 cats, 0 birds, 0 reptiles, 1 fish tank
House 2: 3 dogs, 2 cats, 1 bird, 0 reptiles, 0 fish tanks
House 3: 0 dogs, 3 cats, 0 birds, 1 reptile, 1 fish tank
House 4: 2 dogs, 1 cat, 2 birds, 0 reptiles, 1 fish tank
House 5: 2 dogs, 1 cat, 0 birds, 0 reptiles, 0 fish tanks
House 6: 2 dogs, 2 cats, 0 birds, 0 reptiles, 0 fish tanks
House 7: 3 dogs, 1 cat, 1 bird, 0 reptiles, 2 fish tanks
House 8: 3 dogs, 2 cats, 0 birds, 0 reptiles, 1 fish tank
House 9: 2 dogs, 3 cats, 0 birds, 0 reptiles, 0 fish tanks
House 10: 1 dog, 3 cats, 0 birds, 0 reptiles, 0 fish tanks
How should Orlando organize the raw data to facilitate the creation of his box-and-whisker plot? What will the organized data look like?
Since Orlando’s box-and-whisker plot is specifically meant to highlight the number of dogs and cats, it would be a good idea to organize the data in groups by importance, with dogs and cats first. Since he will need to identify the median, range, and quartiles of the data, it would also be good to organize each group in increasing order.
- Dogs: 0, 1, 2, 2, 2, 2, 2, 3, 3, 3
- Cats: 1, 1, 1, 2, 2, 2, 2, 3, 3, 3
- Birds: 0, 0, 0, 0, 0, 0, 0, 1, 1, 2
- Reptiles: 0, 0, 0, 0, 0, 0, 0, 0, 0, 1
- Fish Tanks: 0, 0, 0, 0, 0, 1, 1, 1, 1, 2
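Regrouping the survey by category and sorting each group can be sketched in Python as shown here. This is an illustrative sketch rather than the lesson's own method. The helper names are hypothetical, and the quartiles use the "median of each half" convention, which is an assumption; other quartile conventions can give slightly different values.

```python
from statistics import median

# Survey rows copied from the text: one tuple per house, in category order.
CATEGORIES = ("dogs", "cats", "birds", "reptiles", "fish tanks")
HOUSES = [
    (2, 2, 0, 0, 1),
    (3, 2, 1, 0, 0),
    (0, 3, 0, 1, 1),
    (2, 1, 2, 0, 1),
    (2, 1, 0, 0, 0),
    (2, 2, 0, 0, 0),
    (3, 1, 1, 0, 2),
    (3, 2, 0, 0, 1),
    (2, 3, 0, 0, 0),
    (1, 3, 0, 0, 0),
]


def group_by_category(houses):
    """Turn per-house rows into one sorted list of counts per pet category."""
    columns = zip(*houses)  # transpose: one column per category
    return {name: sorted(column) for name, column in zip(CATEGORIES, columns)}


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves convention."""
    ordered = sorted(values)
    n = len(ordered)
    lower = ordered[: n // 2]            # values below the middle
    upper = ordered[n // 2 + n % 2:]     # values above the middle
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])


grouped = group_by_category(HOUSES)
for name in CATEGORIES:
    print(name, grouped[name], five_number_summary(grouped[name]))
```

Sorting each category first is what makes the summary easy to read off by hand as well: the minimum and maximum are the ends of the list, and the median sits in the middle.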
Consolidating Data
Cheng is interested in how the perceived speed of time changes as people age. He has collected data from 300 people between the ages of 10 and 70. Each person reported the time it seemed to take to complete three neutral (neither particularly liked nor disliked) activities, one 5 minutes, one 15 minutes, and one 60 minutes long. Now Cheng has a massive and somewhat intimidating list of numbers, and he needs to decide how to organize what he has into something useful.
With such a huge amount of raw data, Cheng’s greatest challenge will be consolidating it into a useful and informative format.
a. Identify at least 2 different ways that Cheng might organize the raw data that would illustrate changes in time perception as people age.
Cheng might choose to organize the data by age group, sorting the values first by age and then by perceived time for each activity. He might also wish to sort first by actual activity length, then by age or perceived time.
b. How might Cheng consolidate the data so he doesn't end up needing to plot nearly 1000 values on a chart or graph?
Finding the mean perceived length for each of several age ranges would be a great way for Cheng to keep the general shape of his data while reducing the sheer volume.
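One way to picture this consolidation is the short Python sketch below. The text does not list Cheng's data, so the records here are invented purely to show the mechanics. The 10-year age ranges and the function names are also assumptions.

```python
from collections import defaultdict
from statistics import mean

# HYPOTHETICAL records, invented for illustration only:
# (age in years, actual activity length in minutes, perceived length in minutes).
records = [
    (12, 5, 7.0), (15, 5, 6.5),
    (34, 5, 5.0), (38, 5, 5.5),
    (61, 5, 4.0), (66, 5, 3.5),
]


def age_range(age, width=10):
    """Return the (low, high) age range containing `age`, e.g. 34 -> (30, 39)."""
    low = (age // width) * width
    return (low, low + width - 1)


def mean_perceived_by_group(rows):
    """Average perceived time for each (activity length, age range) pair."""
    groups = defaultdict(list)
    for age, actual, perceived in rows:
        groups[(actual, age_range(age))].append(perceived)
    return {key: mean(values) for key, values in sorted(groups.items())}


summary = mean_perceived_by_group(records)
for (actual, (low, high)), avg in summary.items():
    print(f"{actual}-minute activity, ages {low}-{high}: mean perceived {avg} min")
```

With six 10-year ranges and three activities, Cheng's roughly 900 reported values would reduce to at most a few dozen means, which is a far more manageable number of points to plot.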
Earlier Problem Revisited
87, 72, 91, 91, 73, 83, 79, 81, 87, 72, 81, 91, 73, 73, 73
If you were told you were going to evaluate this data using common measures of central tendency and dispersion, what sort of preparation could you do to make the study as straightforward as possible?
Central tendency measurements are generally facilitated by organizing the data in increasing order from left to right. It is also convenient to note the total number of values, along with their sum, as you order them.
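As a quick illustration of that preparation, the following Python sketch sorts the fifteen values and records their count and sum. It is only a sketch of the bookkeeping described above; the variable names are arbitrary.

```python
from statistics import median

# The fifteen values from the concept question.
data = [87, 72, 91, 91, 73, 83, 79, 81, 87, 72, 81, 91, 73, 73, 73]

ordered = sorted(data)        # increasing order, left to right
count = len(ordered)          # total number of values
total = sum(ordered)          # running sum, needed for the mean

print("Ordered:", ordered)
print("Count:", count, "Sum:", total)
print("Mean:", total / count)
print("Median:", median(ordered))
print("Range:", ordered[-1] - ordered[0])
```

Once the values are ordered and the count and sum are known, the mean, median, and range each take a single step.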
Examples
A class of 40 students took a science exam. They earned the following percentages on their tests:
73, 45, 62, 34, 59, 20, 48, 50, 78, 38, 52, 91, 57, 82, 46, 51, 62, 58, 39, 50, 72, 73, 63, 52, 41, 37, 28, 46, 71, 75, 36, 28, 44, 90, 51, 28, 60, 18, 47, 40.
Example 1
Describe or demonstrate a means of displaying the results more clearly.
A good start would be to simply organize the numbers in increasing order:
18, 20, 28, 28, 28, 34, 36, 37, 38, 39, 40, 41, 44, 45, 46, 46, 47, 48, 50, 50, 51, 51, 52, 52, 57, 58, 59, 60, 62, 62, 63, 71, 72, 73, 73, 75, 78, 82, 90, 91.
Now we can see at a glance that the scores range from 18 to 91, with a greater frequency in the mid-range than at the extremes.
Example 2
The teacher wants to compare the students’ scores with those of another class. Describe a means of organizing the data that would make it easy to compare the two sets of data.
To compare the scores with another class, it would be convenient to have the number of scores in each range summarized. She might tally the number of scores from 0 to 9, then 10 to 19, and so on, or just tally the number of A’s, B’s, etc.
Example 3
The teacher gave grades as follows:
A grade: 90 and above
B grade: 80 to 89
C grade: 70 to 79
D grade: 60 to 69
F grade: 59 and below
Make a table to show how many students achieved each grade.
The table would look like this:
Grade | A | B | C | D | F
Students | 2 | 1 | 6 | 4 | 27
(Either that was a frightfully difficult exam, or the students didn’t study well!)
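The tally can be checked against the 40 raw scores with a short Python sketch. This is offered only as an illustration: the function name `letter_grade` is hypothetical, while the grade boundaries are the ones the teacher gave.

```python
from collections import Counter

# The 40 raw exam percentages, copied from the text.
scores = [73, 45, 62, 34, 59, 20, 48, 50, 78, 38,
          52, 91, 57, 82, 46, 51, 62, 58, 39, 50,
          72, 73, 63, 52, 41, 37, 28, 46, 71, 75,
          36, 28, 44, 90, 51, 28, 60, 18, 47, 40]


def letter_grade(score):
    """Map a percentage to a letter grade using the teacher's boundaries."""
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    if score >= 70:
        return "C"
    if score >= 60:
        return "D"
    return "F"


counts = Counter(letter_grade(score) for score in scores)
grade_table = {grade: counts[grade] for grade in "ABCDF"}

print("Grade:   ", *grade_table.keys())
print("Students:", *grade_table.values())
```

A useful sanity check is that the counts add up to the class size of 40.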
Example 4
Determine if the data is qualitative or quantitative.
A. The majority of the people in Asia most often wear the color red.
B. A survey was done among elementary age children to discover their favorite fruit.
These are both qualitative. Neither A nor B is naturally expressed as numerical data: each records a category (a color, a favorite fruit).
Example 5
These are the numbers of cars sold at a local dealer over the last 12 days. Create a frequency distribution table.
3, 5, 1, 4, 3, 2, 2, 1, 3, 2, 5, 4.
To create a frequency distribution table for this data, label the values that occur in the set across the top, and record the number of occurrences of each in a second row beneath, either as numerals or as tally marks:
Value | 1 | 2 | 3 | 4 | 5
Frequency | II (2) | III (3) | III (3) | II (2) | II (2)
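The same frequency distribution can be produced programmatically. The sketch below uses Python's standard `collections.Counter`; the variable names and the use of the letter I for tally marks are illustrative choices.

```python
from collections import Counter

# Cars sold per day over the last 12 days, copied from the text.
cars_sold = [3, 5, 1, 4, 3, 2, 2, 1, 3, 2, 5, 4]

frequency = Counter(cars_sold)   # value -> number of occurrences
values = sorted(frequency)       # the distinct values, in increasing order

print("Value:    ", " | ".join(str(v) for v in values))
print("Frequency:", " | ".join(str(frequency[v]) for v in values))
print("Tally:    ", " | ".join("I" * frequency[v] for v in values))
```

The frequencies should sum to 12, the number of days recorded.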
Review
For questions 1-3, determine if the data is qualitative or quantitative.
1. The average temperature of a particular city is 23 degrees C.
2. Determine if the number of hours a person spends in front of a computer will affect their eyesight.
3. A random survey was done to find out the average speed of cars on a highway.
4. Which letter has the greatest frequency in the following sentence?
THE SUN ALWAYS SETS IN THE WEST.
5. Joe scored the following numbers of goals in his last twenty soccer games: 3, 0, 1, 5, 4, 3, 2, 6, 4, 2, 3, 3, 0, 7, 1, 1, 2, 3, 4, 3.
a. Organize the values from smallest to greatest.
b. Which number had the greatest frequency?
6. The following number gives the first 31 digits of pi: 3141592653589793238462643383279
a. Treating each digit as a separate unit of data, how might you organize the units to prepare for an evaluation of their frequency and range (spread)?
b. How would the units appear after they are organized?
c. What is the frequency of the digits 3, 5, and 7?
7. A die was thrown 100 times. The frequency distribution is shown in the following table:
Roll | Frequency
1 | 21
2 | 11
3 | 15
4 | 19
5 | 16
6 | 18
a. What is the total frequency of numbers less than 4?
b. What percentage of throws of the die were higher than 5?
c. How many throws scored greater than 2, but less than or equal to 5?
50 students took a test with a total of 10 possible points. The frequency distribution is shown in the following table:
Score | Frequency
0 | 1
1 | 2
2 | 1
3 | 3
4 | 1
5 | 4
6 | 9
7 | 8
8 | 7
9 | 10
10 | 4
8. If 60% is a passing score, how many students passed the test?
9. How many students scored above 80%?
10. What percentage of the students had 5 or more questions correct?
11. How many students scored greater than or equal to 4, but less than or equal to 7?
A spinner is in the shape of a regular heptagon marked with the numbers 1 to 7. Sue spun the spinner 50 times and recorded her results:
1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7
12. Create a frequency table with the data.
13. Which spin had the greatest frequency?
14. Which spin had the least frequency?
A teacher decided to survey the students in her class to determine the number of siblings each of them had. The following numbers are the total number of siblings reported by each student in the class: 2, 0, 1, 0, 1, 0, 4, 3, 4, 9, 2, 1, 3, 1, 5, 1, 2, 1, 2, 4, 3, 2, 2, 6, 3, 2, 4, 2, 3, 5
15. Organize the numbers in a manner conducive to the creation of a frequency table.
16. Create a frequency table.
17. How many students were surveyed to collect this data?
18. How many families have 4 children or fewer?