2.4 Undercoverage
In 1936, the Literary Digest, a well-known and highly respected magazine, announced the results of its poll on who would be elected president. In prior election years, the magazine had shown remarkable accuracy in predicting the winner. This time, it predicted that Republican Alfred Landon, Governor of Kansas, would win by a wide margin (57% vs. 43%) over the incumbent Democrat, President Franklin D. Roosevelt.
Unfortunately for the Literary Digest, when the results of the actual election came in, Roosevelt was the victor by a landslide: 62% vs. 38%! Obviously there was a serious problem with the Literary Digest poll, given that its error was an unheard-of nearly 20 percentage points (Landon was predicted to get 57% but received 38%). The irony is that the poll was also one of the most ambitious surveys of its type ever conducted. Survey cards had been mailed to nearly 10 million people chosen from telephone books, club memberships, magazine subscriptions, and other sources, and approximately 2.5 million people responded.
The error was almost entirely due to sample bias, specifically undercoverage of the less-wealthy, largely Democratic segments of the population. What caused the bias, and how could the magazine have improved the accuracy of its poll?
After we discuss undercoverage and self-selection bias and work a few examples, we will return to this question. Can you figure out the answer on your own before then?
Sample Bias
There are many different types of sample bias, any of which can skew the results of an experiment or survey. Undercoverage is one common type. It refers to a sample that contains too few members of one or more segments of the population it is meant to represent. In some cases, particularly when the under-represented group is small compared with the rest of the population, undercoverage may not have much effect. However, if the undercovered segment is large enough, the sample results may not accurately estimate the characteristics of the population.
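The point about group size can be checked with simple weighted-average arithmetic. The Python sketch below is a hypothetical two-group example. The group proportions (0.80 and 0.40 answering "yes") and the population and sample shares are invented for illustration. It also assumes that the members of each group who do get sampled answer like their group as a whole. The helper names `estimate` and `undercoverage_bias` are illustrative and do not come from any library.

```python
# Hypothetical two-group survey (all numbers are invented for illustration).
# GROUP_MEANS[i] is the proportion of group i answering "yes"; we assume the
# sampled members of each group answer like their group as a whole.
GROUP_MEANS = (0.80, 0.40)


def estimate(shares, group_means=GROUP_MEANS):
    """Overall "yes" proportion when group i makes up shares[i] of the respondents."""
    return sum(s * m for s, m in zip(shares, group_means))


def undercoverage_bias(population_shares, sample_shares, group_means=GROUP_MEANS):
    """Difference between the sample estimate and the true population value."""
    return estimate(sample_shares, group_means) - estimate(population_shares, group_means)


# A small group (2% of the population) is left out of the sample entirely.
small = undercoverage_bias([0.98, 0.02], [1.00, 0.00])
# A large group (40% of the population) makes up only 10% of the sample.
large = undercoverage_bias([0.60, 0.40], [0.90, 0.10])

print(f"small group missed:       bias = {small:+.3f}")
print(f"large group undercovered: bias = {large:+.3f}")
```

Missing the 2% group entirely shifts the estimate by less than one percentage point. Undercovering the 40% group shifts it by 12 points, even though that sample still contains some members of the group.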
Self-selection is related to undercoverage and can actually cause it. Self-selection refers to the practice of asking people to submit responses on their own, rather than actively collecting answers from them. The problem with self-selection is that it limits the sample to people with the time and inclination to respond (known as non-response bias). This reduces the overall sample size and also skews the sample toward the type of person who believes in the value of taking time to respond to polls!
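The two effects described above, a smaller sample and a skewed one, both show up in a quick simulation. This Python sketch assumes a hypothetical population of 10,000 people split evenly on some question. It also assumes that supporters are three times as likely as opponents to return the form (30% vs. 10%). `self_selected_sample` is an illustrative helper, not a standard function.

```python
import random


def self_selected_sample(population, response_prob, rng):
    """Return only the members who choose to respond.

    response_prob(opinion) is the chance that a person holding that opinion
    bothers to send the survey back (a hypothetical model of self-selection).
    """
    return [x for x in population if rng.random() < response_prob(x)]


rng = random.Random(42)
# Hypothetical population: 10,000 people, exactly half in favor (1), half not (0).
population = [1] * 5_000 + [0] * 5_000
# Assume supporters are three times as likely to mail back the form.
respondents = self_selected_sample(population, lambda x: 0.30 if x == 1 else 0.10, rng)

true_share = sum(population) / len(population)
observed_share = sum(respondents) / len(respondents)
print(f"true share in favor: {true_share:.2f}")
print(f"respondents:         {len(respondents)} of {len(population)}")
print(f"observed share:      {observed_share:.2f}")
```

Under these assumptions only about 2,000 of the 10,000 people respond. Roughly three-quarters of those respondents are supporters, even though the true split is 50/50.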
Understanding Bias
You are assisting with a study that aims to measure satisfaction with school communication among students who speak a second language at home. The plan is to send a questionnaire home to the students' parents, asking for their opinions.
What kind(s) of bias is this survey method particularly prone to? How might they be addressed?
This method of sampling is liable to produce both non-response and undercoverage bias. Non-response bias is an issue any time members of a sample are expected to submit a questionnaire on their own, because the results will include more input from the type of person who is willing and able to complete and return the survey. In this case, undercoverage is a particular problem, since the population most affected by the study is also unusually likely to misunderstand the questions, or the reason for asking them, because of the language barrier.
One possible solution is a phone survey administered by a native speaker of the target language(s).
Recognizing Types of Bias
What type(s) of bias do the experiments below suggest?
a. An experiment to determine the danger of mixing household chemicals is conducted by collecting samples of chemicals found under the experimenter’s sink.
Undercoverage bias: This experiment is a prime example of the problems associated with convenience sampling. Since the only chemicals used were the ones conveniently found in one location, the results cannot be assumed to hold for chemicals found under other sinks.
b. Mall shoppers are asked to fill out and return a form rating their shopping experiences at each of the 26 stores, in order to identify the most popular stores in each of 4 categories.
Non-response bias: Since the results depend on shoppers turning in a response form on their own, they will be biased toward a specific type of personality and will not reflect a true cross-section of shoppers' experiences.
c. A study of the average grades of mathematics students polls 16 Algebra I students, 14 Geometry students, 7 Calculus students, and 19 Statistics students.
Undercoverage: The study includes only about half as many Calculus students as students from each of the other subjects.
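The "about half" in part (c) comes from comparing each course's count with the average count of the other three. This Python sketch performs only that comparison. Whether the smaller Calculus group truly amounts to undercoverage depends on how many students take each course at the school, which the example does not give.

```python
# Number of students polled in each course, from part (c).
counts = {"Algebra I": 16, "Geometry": 14, "Calculus": 7, "Statistics": 19}

ratios = {}
for subject, n in counts.items():
    # Average sample size of the other three courses.
    others = [m for s, m in counts.items() if s != subject]
    ratios[subject] = n / (sum(others) / len(others))
    print(f"{subject:<10} n={n:2d}  {ratios[subject]:.2f} x the average of the other courses")
```

Calculus comes out at about 0.43 times the average of the other courses (7 against roughly 16.3). Every other course is at or above 1.0.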
Identifying Bias
There is a commonly referenced story about the difficulty of marketing products internationally, involving the Chevy Nova automobile. According to the story, the Chevrolet Motor Company lost millions trying to sell the popular U.S. vehicle in Mexico without noticing that “No-Va” means “No-Go” in Spanish!
The truth is that the story is just an urban myth and the Nova sold well in Latin America, but the caution is valid nonetheless. If the situation had occurred as described, what sort of bias in Chevy’s market research might have been the culprit behind the misunderstanding?
It is certainly reasonable to suspect that undercoverage might have been a contributing factor here. Any studies or market research that Chevy conducted in the United States about the popularity of the name “Nova” would have included far more native English speakers than Spanish speakers.
Earlier Problem Revisited
In 1936, the Literary Digest predicted that Republican Alfred Landon, Governor of Kansas, would win the presidential race by a wide margin (57% vs. 43%) over the incumbent Democrat, President Franklin D. Roosevelt. When the results of the actual election came in, Roosevelt was the victor by a landslide: 62% vs. 38%! The error was almost entirely due to sample bias, specifically undercoverage of the less-wealthy, largely Democratic segments of the population.
What caused the bias, and how could the magazine have improved the accuracy of its poll?
The bias was caused by the magazine’s method of sampling. Choosing voters from telephone listings (remember that phones were much more of a luxury in 1936!), club memberships, and magazine subscription lists produced a bias toward the wealthier members of the population. Perhaps a door-to-door poll in some of the lower-income areas of the country would have provided valuable insight. At a minimum, the magazine could have issued a statement about possible bias in the survey due to the limited range of incomes it targeted.
Ironically, the uncommonly large size of the sample did nothing to correct the bias. Because every additional response came from the same lists, the flood of answers from the wealthier demographic kept overshadowing the limited number of other responses. The sheer size of the poll only made its wrong answer look more trustworthy. A smaller study conducted in a more balanced way, for instance across a more representative range of areas and incomes, would likely have come much closer to the actual result.
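A short simulation makes the sample-size point concrete. The Python sketch below uses hypothetical numbers, chosen only so that overall Landon support comes out near his actual 38%. It assumes that 30% of voters appear on the phone, club, and magazine lists and favor Landon 60% of the time, while the remaining 70% favor him 28% of the time. `poll_landon_share` is an illustrative helper.

```python
import random

# Hypothetical 1936-style electorate (illustrative numbers, not historical data).
FRAME_SHARE = 0.30      # share of voters on phone, club, or magazine lists
LANDON_IN_FRAME = 0.60  # Landon support among listed voters
LANDON_OUTSIDE = 0.28   # Landon support among everyone else
TRUE_LANDON = FRAME_SHARE * LANDON_IN_FRAME + (1 - FRAME_SHARE) * LANDON_OUTSIDE


def poll_landon_share(n, frame_only, rng):
    """Estimate Landon's share from n respondents.

    frame_only=True draws everyone from the listed (wealthier) frame;
    frame_only=False draws a simple random sample of the whole electorate.
    """
    landon = 0
    for _ in range(n):
        in_frame = frame_only or rng.random() < FRAME_SHARE
        support = LANDON_IN_FRAME if in_frame else LANDON_OUTSIDE
        if rng.random() < support:
            landon += 1
    return landon / n


rng = random.Random(1936)
print(f"true Landon share: {TRUE_LANDON:.3f}")
for n in (1_000, 10_000, 100_000):
    listed = poll_landon_share(n, frame_only=True, rng=rng)
    srs = poll_landon_share(n, frame_only=False, rng=rng)
    print(f"n={n:>7,}  list-based: {listed:.3f}  simple random: {srs:.3f}")
```

As `n` grows, the list-based estimate settles ever more tightly around 0.60, not the true 0.376. A simple random sample of even 1,000 voters typically lands within a few percentage points of the truth. Enlarging a poll shrinks its random error but leaves the bias from its sampling frame untouched.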
Examples
Example 1
If a sample of 100 high school students indicated that 78% thought the most important class in a high school curriculum was “Woodworking”, what might you suspect about the chosen sample?
It certainly appears that the sample was not a representative cross-section of the average public school. It is a good bet that female students were undercovered during the sample selection process.
Example 2
If a study posted results indicating that only 1% of polled students liked football, what bias is likely to have affected the sample selection?
Obviously the athletic students were undercovered in this sample. Maybe this study was conducted using the students who weren't polled during the study in Example 1!
Example 3
Suppose the “Super-Sugar” cola company reported that every person polled who preferred “Super-Sugar Cola” over all other brands of soda was a multi-millionaire. What type(s) of sample selection bias would you suspect, which might keep you from running right out to buy a case of “Super-Sugar” so you could become a multi-millionaire?
This is an example of “cherry-picking”, a sampling technique in which only very specific people are polled to ensure the results look a particular way. If “Super-Sugar Cola” sampled only multi-millionaires, then every person who preferred their drink would be a multi-millionaire. Obviously this method also creates undercoverage bias, since less-wealthy soda drinkers were not included in the sample.
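The cherry-picking effect is easy to reproduce. In this hypothetical Python sketch, 1% of people are multi-millionaires and exactly 30% of each group prefer Super-Sugar. That means the drink has no connection to wealth at all, yet polling only millionaires still makes it look as if every fan is rich. The population and the helper `share_of_fans_who_are_millionaires` are invented for illustration.

```python
# Hypothetical population of 1,000 soda drinkers: (is_millionaire, prefers_super_sugar).
# Exactly 30% of millionaires and 30% of everyone else prefer Super-Sugar,
# so in this made-up population the drink has nothing to do with wealth.
population = (
    [(True, True)] * 3 + [(True, False)] * 7          # 10 millionaires, 3 fans
    + [(False, True)] * 297 + [(False, False)] * 693  # 990 others, 297 fans
)


def share_of_fans_who_are_millionaires(people):
    """Among people who prefer Super-Sugar, the fraction who are millionaires."""
    fans = [is_rich for is_rich, prefers in people if prefers]
    return sum(fans) / len(fans)


cherry_picked = [person for person in population if person[0]]  # poll millionaires only

print(share_of_fans_who_are_millionaires(cherry_picked))  # 1.0
print(share_of_fans_who_are_millionaires(population))     # 0.01
```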
Review
Discuss how undercoverage could be a source of bias in each of the following surveys:
1. A poll showed that 85% of respondents believe that teens make better drivers than adults.
2. The U.S. census of 1980 states that 32,194 Americans are 100 years old or older. However, Social Security figures show only 15,258 adults of this advanced age (Los Angeles Times, Dec. 4, 1983).
3. In a census in Russia, 1.4 million more women than men reported that they were married (U.S. News & World Report, Aug. 30, 1976).
4. To find out how important the clothes of a vice-presidential candidate might be, researchers ran a survey shortly after the 1984 Democratic convention in three locations: the Wall Street area of New York City, State Street in Chicago, and Crown Center in downtown Kansas City. The 347 respondents were shown pictures of women wearing three outfits; the pictures did not show the women's faces. The respondents were then asked several questions about how the outfits affected their sense of the model's competence to serve in public office (Los Angeles Times, Aug. 3, 1984). 310 respondents indicated that the color and fit of the outfit were important in creating feelings of competence.
5. One year after the Detroit race riots of 1967, interviewers asked a sample of residents in Detroit whether they felt they could trust most of their neighbors, some of their neighbors, or none at all. In one sample, 35% answered “most”; in another sample, only 7% answered “most”.
6. In a comment on deregulation of banking, “[the head of California's Security Pacific Bank] reckons the higher interest accounts, and all the other new financial services, are designed for the most affluent 15% to 20% of Security Pacific Bank's customers. By extension--as 2 million customers are surely a sample of the general population--the new world of deregulated finance benefits the top-earning 15% to 20% of U.S. households” (Los Angeles Times, Dec. 4, 1983).
In the following scenarios, identify whether we are dealing with a sampling or a nonsampling error. In each case, be as specific as possible about the source of error. Would this type of error result in bias?
7. In a telephone survey that randomly selects participants, we try to contact a person five times and he/she never picks up the phone.
8. An interviewer chooses people on the street to interview regarding their preference for walking vs. driving.
9. The police department of Lexington would like to know more about people’s opinions of their police force. They send a uniformed officer to randomly selected households, but many of the selected households refuse to participate.
10. A survey asks the question “Do you agree with the U.S. Supreme Court’s decision that corporations are allowed to spend huge amounts of money to sway elections in their favor?”
11. In a survey designed to measure the overall health of college students, including the prevalence of sexually transmitted diseases, some participants are not willing to admit that they have contracted such a disease.
12. In Fayette County, 53.8% of registered voters are registered as Democrats. However, in a simple random sample (SRS) of 200 registered voters, only 45% are registered as Democrats.
13. An interviewer enters all the information into a database during the interview and accidentally records that a person has 22 children instead of 2.