2.2 Populations and Sampling
Sometimes it can be tricky to decide whether to conduct a particular study on a sample or on the entire population. Suppose you were putting together a menu for a camping trip with a large group of friends and wanted to make sure nobody was allergic to peanuts before planning peanut-butter sandwiches for lunch. Would you need to question all 50+ friends individually? Would it make sense to choose a representative sample to poll instead? What if you wanted to pick a few popular types of soda to bring along? Would that be a different situation?
At the end of the lesson, we’ll return to this question to apply what we have discussed.
Populations vs. Samples
Before you begin any particular study, you will need to decide whether you need data from the entire population in question or just from a representative sample of it. For most studies, it makes much more sense to use a sample than to try to collect data on an entire population, but sometimes a sample is not enough. A study that collects data from every member of a population is called a census. The most famous census is the U.S. Population Census, conducted once every 10 years.
The Constitution requires that the population of the United States be enumerated once every ten years by physical count; in the intervening years, the population is estimated from statistical samples. The incredible cost (the 2010 Census cost approximately 13 billion dollars!) and the organizational effort required make it clear why so few census studies are conducted on the U.S. population. However, smaller census studies are more common than you might think.
Some studies would make no sense at all to conduct on an entire population. One entire class of study comes to mind: the destructive study. In a destructive study, the testing itself ruins each sampled item for its intended use. Vehicle manufacturers test the durability of different models by crashing sample vehicles into simulated walls or other cars. Obviously, if such studies were conducted on the entire population, there would be no cars left to sell!
Census vs. Representative Sample
As a student, you are most likely familiar with the most common census there is: the attendance count that takes place each morning. Each class is polled to identify any students who are not present, and the data is compiled in the administrative office.
What likely uses are there for this data, and why might it be collected as a census rather than from a representative sample?
The most apparent use of this data is to notify the parents of students who did not make it to class, for safety and rule enforcement. Since the goal is to locate each and every possibly missing or late student, a statistical sample just isn’t acceptable.
Understanding Where Information Comes From
When insurance companies set auto insurance rates, they adjust them according to statistically relevant demographic differences among drivers. Determining which groups of drivers are the most likely to be involved in expensive accidents is a statistical analysis that uses police reports and accident claims as data sources.
It is widely accepted that teenage boys are the most expensive demographic to insure. Would you expect this information to be based on the population of teenage boys, on the population of teenage drivers, or on a sample of the appropriate demographic(s), and why?
The information is based on a sample of teenage male drivers, compared to a sample of teen and adult drivers in general. It would be virtually impossible to conduct a true census of all accidents involving teen male drivers, as there are just too many and there is no real way to ensure that all accidents are correctly documented.
What's Appropriate: Census or Sample?
Suppose your biology teacher wanted to encourage the students in her class to work together on a large project, so she promised the class a pizza party if every single student completed the assigned homework by the deadline. With the deadline fast approaching, you decide to make sure that everyone is on track to get the assignment done on time.
Is this a situation where it would be appropriate to conduct a sample poll of the students, or should you do a full census of all 32 students in the class?
If you want to be sure that everyone is really on track, you’d better complete a full census. A well-chosen sample would give you an idea of how far along the class is in general, but it would not reliably identify all of the outliers, which are the most important data points in this particular study.
Earlier Problem Revisited
Suppose you were putting together a menu for a camping trip with a large group of friends and wanted to make sure nobody was allergic to peanuts before planning peanut-butter sandwiches for lunch. Would you need to question all 50+ friends individually? Would it make sense to choose a representative sample to poll instead? What if you wanted to pick a few popular types of soda to bring along? Would that be a different situation?
As inconvenient as it might be, you would be well advised to ask each and every friend planning to attend the trip about possible peanut allergies. Since even a single person having a severe allergic reaction would probably ruin the trip for everyone, the time saved by polling a sample instead of conducting a complete census would not be worth the risk.
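To put a rough number on that risk, the sketch below computes the chance that a simple random sample of friends contains nobody with the allergy. It assumes, purely for illustration, a group of exactly 50 friends of whom exactly one is allergic; the function name `miss_probability` is hypothetical and not part of the lesson.

```python
from math import comb


def miss_probability(group_size: int, allergic: int, sample_size: int) -> float:
    """Chance that a simple random sample (drawn without replacement)
    includes none of the allergic friends."""
    if not 0 <= allergic <= group_size:
        raise ValueError("allergic must be between 0 and group_size")
    if not 0 <= sample_size <= group_size:
        raise ValueError("sample_size must be between 0 and group_size")
    # Ways to pick the sample only from non-allergic friends,
    # divided by all possible ways to pick the sample.
    return comb(group_size - allergic, sample_size) / comb(group_size, sample_size)


# Assumed scenario: 50 friends, exactly 1 with a peanut allergy.
for sample_size in (10, 25, 40, 50):
    p = miss_probability(50, 1, sample_size)
    print(f"ask {sample_size:2d} friends -> miss probability {p:.2f}")
```

Under these assumptions, asking only 10 friends misses the allergic friend 80% of the time, and the miss probability only reaches zero when all 50 are asked, which is the census.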
The soda choice would indeed be a very different situation. Since it is very unlikely that anyone is going to be more than a little inconvenienced by a particular set of drink choices, a quickly generated list of suggestions from a half-dozen people or so would probably be just fine.
Examples
Example 1
A study is to be conducted on the psychological effects of personally witnessing a jewelry store theft at a local mall. Police records suggest that there were a total of 23 witnesses. Does this situation suggest that the entire population should be included in the study? Why or why not?
The relatively small population size in this example certainly suggests that a full census be taken. A shopping mall is likely to contain a rather broad range of demographics, and the 23 witnesses are therefore likely to differ in age, sex, background, profession, and so on. Any sample taken from so small and varied a group would probably not accurately represent the full range of factors that might affect the results of the study.
Example 2
A new medicine has been developed that the developer claims will stimulate hair growth in balding men. Would you expect safety tests to be conducted on the population of men before release?
Read the question carefully! In statistics, “population” has a very specific meaning. It would be impossible to conduct safety tests on every man in the world, so any safety tests would have to be conducted on a representative sample, not on the population of male humans.
Example 3
The Ford Explorer is a popular sport-utility vehicle sold in the U.S. that was originally equipped with Firestone tires. In May of 2000, Ford and Firestone were both accused of responsibility for hundreds of vehicle accidents caused by tire failure. Given that all vehicles sold in the U.S. undergo extensive safety testing, how could so many bad products have slipped through?
There are many ways that the problem could have gone unnoticed. This is a situation where a census study of every Explorer produced is just not feasible; much of the testing simply has to be conducted on a representative sample. Perhaps the sample vehicles used for safety testing just happened to be ones with good tires, or perhaps the safety tests weren’t extensive enough, or the results were incorrectly evaluated.
Example 4
You and your team are conducting a study on differences in the ability of students in your school to focus at different times throughout the day. Each day your team chooses every nth student to walk in the door, and you study 112 students on Monday, 78 on Tuesday, and 109 on Wednesday. If there are 299 students in the school, is this a sample or a population?
Even though the number of observations your team collected (112 + 78 + 109 = 299) equals the number of students in the school, this is still a sample rather than a true census, since your selection method almost certainly observed some students multiple times and missed others entirely.
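A small simulation can make the overlap visible. The sketch below assumes, as a simplification that is not stated in the example, that each day's observed students are a simple random draw from all 299, independent of the other days; the function names are hypothetical.

```python
import random


def distinct_students_observed(school_size, daily_counts, rng):
    """Simulate one week: return how many different students were seen at least once."""
    seen = set()
    for count in daily_counts:
        # Each day is an independent draw without replacement from the whole school.
        seen.update(rng.sample(range(school_size), count))
    return len(seen)


def expected_distinct(school_size, daily_counts):
    """Expected number of different students seen, under the same assumption."""
    p_never_seen = 1.0
    for count in daily_counts:
        p_never_seen *= 1 - count / school_size
    return school_size * (1 - p_never_seen)


daily_counts = [112, 78, 109]  # Monday, Tuesday, Wednesday
rng = random.Random(0)  # fixed seed so the run is repeatable
covered = distinct_students_observed(299, daily_counts, rng)
print(f"{covered} different students observed out of 299")
print(f"expected under this assumption: {expected_distinct(299, daily_counts):.1f}")
```

Under this assumption the 299 observations cover only about 211 different students on average, so a sizable share of the school is never observed at all.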
Example 5
Why would it be virtually unarguable to state that a product claiming to be “Everyone’s Favorite Soda” has not been properly evaluated from a statistical standpoint?
A population study of every single person in the world is impossible, so a claim about everyone cannot have been verified.
Review
The local public library wants to know if it should increase its hours of operation.
1. How would you want to go about conducting your research? Would you collect a sample or take a census?
2. How would you collect your sample? What time of day would be best to collect the information? Why?
Some college students writing a research paper on whether people their age prefer vocal or instrumental music decide to investigate by sampling 100 people at a concert.
3. What is their population?
4. What is their sample?
5. What is wrong with their sample, based on the identified population?
Identify the Population and the Sample
6. In a survey of 1500 American households, it was found that 20% of the households own a computer.
7. In a recent survey of 2578 high school students, it was found that 28% of them come from single-parent homes.
8. The average height of every 6th person entering the movie theatre within a 3-hour period was 5′4″.
Identify each scenario as either sampling or census, and identify it as either random or not random.
9. Only 12 tickets are available for over 30 candidates. All their names are thrown into a hat and 12 are pulled out.
10. A student wants to know how many students in school have ever worn a cast. Every student who comes to school that day is handed a short survey that they must turn in before they head to lunch.
11. You ask 30 people in a clothing store which clothing store is their favorite.
Identify the choice that best completes the statement or answers the question.
12. A local business owner wants to find out which benefits plan its employees would prefer. Which of the procedures listed below would be the best way to obtain a statistically unbiased sample?
a. Survey a random sample of employees from a list of all employees
b. Invite all employees to indicate their choices by email
c. Place suggestion boxes at random locations in the company’s plant and offices
d. Assemble a group with one member from each department and ask them their preference
13. A simple random sample of 300 people is selected from the 1650 male students in a university business course to take part in a business analysis test. The population being considered is:
a. 300
b. 1650
c. People taking part in the test
d. Male students enrolled in a university business course
14. Which is the best example of an unbiased question?
a. Does the school board have the right to enforce a dress code?
b. Do you think the principal is doing a good job in spite of his questionable character?
c. Do you prefer a daytime or evening class schedule?
d. Do you think the government should be allowed to seize whatever property they want to build a new highway?
15. Which question is biased?
a. Do you prefer daytime or evening television programming?
b. Should there be a school dress code?
c. Do you prefer news or mindless sitcoms?
d. Do you think a new highway should be built?
Review (Answers)
The answer key is available from the Table of Contents, under the 'Other Versions' option.