3.4 Cluster Sampling
Suppose you want to predict the most popular movie among U.S. high school students. Because your population is every high school student in the U.S., a simple random sample is not feasible: you cannot possibly number each student individually. How, then, could you get a representative sample to use for extrapolation?
Look to the end of the lesson for the answer.
Cluster Sampling
Cluster sampling is well suited to extremely large populations and to populations spread over a large geographic area. The idea is to use simple random sampling (SRS) to choose a limited number of groups, or clusters, from the population, and then to apply SRS again within the chosen clusters to identify the specific individuals to sample.
Because each step of the cluster sampling process uses SRS, the results can be used for extrapolation. However, there is still a danger of ending up with a non-representative sample if the clusters you choose from are not each representative of the population (see Understanding Errors in Cluster Sampling).
The main benefit of cluster sampling is that it reduces a very large population to something more manageable without ruining your ability to gather a representative sample.
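The two SRS steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `two_stage_cluster_sample`, the cluster labels, and the population sizes are all hypothetical, and the standard library's `random.Random.sample` stands in for "use SRS".

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Pick clusters by SRS, then pick units inside each chosen cluster by SRS."""
    rng = random.Random(seed)
    # Stage 1: SRS of cluster labels (sampling without replacement).
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = {}
    for label in chosen:
        units = clusters[label]
        # Stage 2: SRS of units within the chosen cluster.
        k = min(n_per_cluster, len(units))
        sample[label] = rng.sample(units, k)
    return sample

# Hypothetical population: 20 clusters of 50 numbered units each.
population = {
    f"cluster_{i}": [f"unit_{i}_{j}" for j in range(50)] for i in range(20)
}
picked = two_stage_cluster_sample(population, n_clusters=4, n_per_cluster=5, seed=1)
for label, units in picked.items():
    print(label, units)
```

Only the 20 cluster labels and the units inside the 4 chosen clusters ever need to be numbered, rather than all 1,000 units.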
Using Cluster Sampling
A consumer reports journalist wants to publish a blog post about the most popular cars in the U.S. She has decided to use publicly available vehicle registration data to identify the most frequently registered car makes. How could she use cluster sampling to help her build a representative sample of U.S. car owners?
One way to get a representative sample of vehicle registrations across the whole country would be to number a list of all of the states in the U.S. and then use a random number generator (RNG) to pick 4 or 5 states. Within each chosen state, she could number the counties, use an RNG to pick a county or two, and then repeat the process to identify cities or towns. By narrowing down the extremely large initial population in this way, she can maintain the randomness of her sample without needing to number every car owner in the U.S.
Understanding Errors in Cluster Sampling
Kevin is trying to create a representative sample of students in his school for a poll asking students' opinions on shortening the school day by 1 hour for students over 18 years old. The results of his survey suggest that over 95% of students think it is a bad idea. Kevin is surprised that the results are so overwhelmingly negative, and he wonders whether he did something wrong when selecting his sample.
If Kevin chose his sample with the cluster sampling method, and started by clustering the students by grade level, can you see why his results might be suspect?
Recall from the lesson that each cluster should be representative of the population. By clustering by grade level, Kevin opened himself up to bias right away. Given the results he received, it is likely that his sample ended up consisting entirely of freshmen (approximately 15 years old) who thought it unfair that older students should have a shorter schedule!
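A small simulation can make Kevin's problem concrete. The sketch below is not Kevin's data: the per-grade opposition counts in `OPPOSED_PER_GRADE`, the grade size, and the helper names are all invented for illustration. It compares the share of opponents in the whole (hypothetical) school with the share seen when only one grade-level cluster is polled.

```python
import random

# Hypothetical school: how many of the 100 students in each grade oppose
# the proposal. These figures are invented for illustration.
OPPOSED_PER_GRADE = {9: 95, 10: 70, 11: 40, 12: 10}
GRADE_SIZE = 100

# True means the student opposes the proposal.
school = {
    grade: [True] * opposed + [False] * (GRADE_SIZE - opposed)
    for grade, opposed in OPPOSED_PER_GRADE.items()
}

def share_opposed(responses):
    """Fraction of responses that are opposed."""
    return sum(responses) / len(responses)

def poll_one_grade(grade, n, rng):
    """SRS of n students from a single grade-level cluster."""
    return share_opposed(rng.sample(school[grade], n))

rng = random.Random(0)
everyone = [r for students in school.values() for r in students]
true_share = share_opposed(everyone)
print(f"whole school: {true_share:.1%} opposed")
for grade in school:
    print(f"grade {grade} only: {poll_one_grade(grade, 30, rng):.1%} opposed")
```

Because opinion varies strongly between grades in this made-up school, a sample drawn from a single grade can land far from the school-wide figure, even though the students within that grade were chosen by SRS.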
Understanding How to Use Cluster Sampling
How could you use a cluster sample to estimate the average density of various tree types in a large forest?
A common method for this type of study is to use a map. If you lay a virtual grid over a map of the forest, you can number the squares and use an RNG to select a number of squares, each of which is a cluster of trees. You can then count the number of each type of tree in each selected cluster.
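The grid approach might look like the following sketch. Everything specific here is an assumption made for illustration: the species names, the 10 x 10 grid, and the randomly generated counts in `make_forest`, which stand in for counts that would really come from field work. Density is expressed as average trees per grid square, assuming all squares have the same area.

```python
import random

SPECIES = ("pine", "oak", "birch")  # hypothetical tree types

def make_forest(rows, cols, rng):
    """Invent a tree count per species for every grid square."""
    return {
        (r, c): {s: rng.randint(0, 40) for s in SPECIES}
        for r in range(rows)
        for c in range(cols)
    }

def estimate_density(forest, n_cells, rng):
    """Average trees per grid square, by species, over an SRS of squares."""
    cells = rng.sample(sorted(forest), n_cells)
    return {s: sum(forest[cell][s] for cell in cells) / n_cells for s in SPECIES}

rng = random.Random(42)
forest = make_forest(10, 10, rng)
print(estimate_density(forest, n_cells=12, rng=rng))
```

Only the 12 selected squares need to be counted, and the per-square averages can be scaled by the number of squares to estimate totals for the whole forest.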
Earlier Problem Revisited
Suppose you want to predict the most popular movie among U.S. high school students. Because your population is every high school student in the U.S., a simple random sample is not feasible: you cannot possibly number each student individually. How, then, could you get a representative sample to use for extrapolation?
This is an ideal opportunity to use a cluster sample. You could number each state and use an RNG to choose a few states, then repeat the process to choose a couple of school districts in each state, then a few schools from each district, and finally 1 or 2 classes from each school.
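The repeated narrowing from states to districts to schools to classes can be sketched as a recursive draw over a nested structure. The sampling frame below is entirely hypothetical (6 states, 5 districts per state, 4 schools per district, 8 classes per school), as are the function name `multistage_sample` and the per-stage sample sizes.

```python
import random

def multistage_sample(tree, sizes, rng):
    """Draw an SRS at each level of a nested dict whose leaves are lists.

    `sizes` gives how many items to draw at each level, top to bottom.
    """
    k = min(sizes[0], len(tree))
    if isinstance(tree, dict):
        chosen = rng.sample(sorted(tree), k)
        return {key: multistage_sample(tree[key], sizes[1:], rng) for key in chosen}
    return rng.sample(tree, k)  # final stage: a list of classes

# Hypothetical frame: 6 states x 5 districts x 4 schools x 8 classes.
frame = {
    f"state_{s}": {
        f"district_{s}_{d}": {
            f"school_{s}_{d}_{h}": [f"class_{s}_{d}_{h}_{c}" for c in range(8)]
            for h in range(4)
        }
        for d in range(5)
    }
    for s in range(6)
}

# 3 states, 2 districts each, 3 schools each, 2 classes each.
picked = multistage_sample(frame, sizes=[3, 2, 3, 2], rng=random.Random(7))
n_classes = sum(
    len(classes)
    for districts in picked.values()
    for schools in districts.values()
    for classes in schools.values()
)
print(n_classes, "classes selected")  # 3 * 2 * 3 * 2 = 36
```

At no point does the procedure need a numbered list of individual students; each stage only requires a list of the groups inside the groups already chosen.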
Examples
For Examples 1-3, explain why each scenario does or does not describe a cluster sample.
Example 1
Armand chooses 4 of the 10 buses in front of his school and polls 10 students from each to see whether they think the buses are comfortable.
This is a valid cluster sample because it is reasonable to assume that the students on each bus are representative of the population of bus riders.
Example 2
A cup of milk is selected from each of 10 of the 50 gallons being studied.
This is not a cluster sample. It is merely an SRS, since each gallon can be considered a single unit, and the cup is just a smaller portion of that unit.
Example 3
5 dogs are chosen from each breed at the show.
This is a stratified sample, not a cluster sample: every breed is sampled, and the breeds are not each representative of the population of show dogs.
Example 4
How could you use the cluster method to select a representative sample of the types of energy drink carried by gas stations in Colorado?
You might start with a grid overlay on a map of Colorado and use an RNG to select a few areas. Then sample the types of drink at one store of each gasoline brand located in the chosen areas (since different stores from the same company in the same geographic area usually carry the same inventory).
Review
For questions 1-10, decide if each situation is an example of a properly selected cluster sample.
- 150 light bulbs are evaluated from 1 randomly selected pallet every 30 minutes.
- 5 light bulbs are evaluated from each case of light bulbs.
- 10 cars are reviewed from each of 10 randomly selected used-car dealers.
- 15 candy bars are tested from each shipment.
- 150 laptops are tested from each company.
- 100 laptops are evaluated from each of 5 randomly selected dealers.
- 25 students from each grade were asked the names of their favorite bands.
- 25 students from each school were asked the names of their favorite bands.
- Gas prices were sampled from each gas station in town to find the cheapest.
- 15 gas stations were sampled from each town to find the town with the cheapest.