Sampling Methods
One of the most important applications of statistics is collecting information. Statistical studies are done for many purposes: a government agency may want to collect data on weather patterns. An advertising firm might seek information about what people buy. A consumer group could conduct a statistical study on the fuel consumption of cars, or a biologist might study primates to find out more about animal behavior. All of these applications and many more rely on the collection and analysis of information.
One method to collect information is to conduct a census. In a census, information is collected on all the members of the population of interest. For example, when voting for a class president at school, every person in the class votes, so this is an example of a census. With this method, the whole population is polled.
It’s sensible to include everyone’s opinion when the population is small, like that of a high school. But conducting a census on a very large population can be very time-consuming and expensive. An alternative method for collecting information is to use a sampling method. This means that information is collected from a small sample that represents the population with which the study is concerned. The information from the sample is then extrapolated to the population; that is, we assume the results for the whole population would be about the same as the results for the sample.
Sampling Methods
The word population in statistics means the group of people we wish to study, as opposed to the population at large. When we use sampling to conduct a statistical study, first we need to decide how to choose the sample population. It is essential that the sample is a representative sample of the population we are studying. For example, if we are trying to determine the effect of a drug on teenage girls, it would make no sense to include males or older women in our sample population.
There are several ways to choose a population sample from a larger group. The two main types of sampling are random sampling and stratified sampling.
Random Sampling
This method simply involves picking people at random from the population we wish to poll. However, this doesn’t mean we can simply ask the first fifty people who walk by in the street. For instance, if you were conducting a survey on people’s eating habits, you’d get different results if you were standing in front of a fast-food restaurant than if you were standing in front of a health food store. In a true random sample, everyone in the population must have the same chance of being chosen. Calling people on the phone, for example, might be a better way of getting a random sample for a survey about eating habits.
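The "same chance of being chosen" requirement can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it presumes the whole population can be listed in advance, and the roster of 500 numbered people, the sample size of 50, and the function name `simple_random_sample` are all made up for the example. The standard-library call `random.Random.sample` draws without replacement, so no one is picked twice.

```python
import random

def simple_random_sample(population, sample_size, seed=None):
    """Draw sample_size distinct members; every member is equally likely to be picked."""
    rng = random.Random(seed)  # the seed only makes the draw repeatable
    return rng.sample(population, sample_size)

# Hypothetical roster: people numbered 1 to 500.
roster = list(range(1, 501))
chosen = simple_random_sample(roster, 50, seed=1)
print(len(chosen), "people chosen, for example:", sorted(chosen)[:5])
```

Standing outside one restaurant corresponds to sampling from only part of the roster, which is why it does not give a true random sample of the whole population.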
Stratified Sampling
This method of sampling actively seeks to poll people from many different backgrounds. The population is first divided into different categories (or strata) and the number of members in each category is determined. Gender and age groups are commonly used strata, but others could include salary, education level or even hair color. Then, a sample is made up by picking members from each category in the same proportion as they are in the population. For example, imagine you are conducting a survey that calls for a sample size of 100 people. If you know that 10% of the population you’re studying are males between the ages of 10 and 25, then you would seek 10 males in that age group to be part of your sample. Once those 10 have responded, no more males between 10 and 25 may take part in the survey.
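The proportional step described above can be sketched as follows. This is a minimal illustration, not a standard library routine: the function names, the stratum labels, and the population counts are hypothetical, and the rule for handing out places lost to rounding (largest fractional part first) is one reasonable choice among several.

```python
import random

def proportional_allocation(stratum_sizes, sample_size):
    """Split sample_size across strata in proportion to their population sizes."""
    total = sum(stratum_sizes.values())
    exact = {name: sample_size * size / total for name, size in stratum_sizes.items()}
    counts = {name: int(share) for name, share in exact.items()}
    # Hand out places lost to rounding down, largest fractional part first.
    leftover = sample_size - sum(counts.values())
    by_remainder = sorted(exact, key=lambda name: exact[name] - counts[name], reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts

def stratified_sample(strata, sample_size, seed=None):
    """strata maps a stratum name to the list of its members."""
    rng = random.Random(seed)
    sizes = {name: len(members) for name, members in strata.items()}
    counts = proportional_allocation(sizes, sample_size)
    return {name: rng.sample(strata[name], counts[name]) for name in strata}

# Made-up population of 1000 in which 10% are males aged 10 to 25.
quota = proportional_allocation({"males_10_to_25": 100, "everyone_else": 900}, 100)
print(quota)
```

With these made-up counts, the quota for males aged 10 to 25 comes out as 10 of the 100 places, matching the example in the text.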
Sample Size
In order for sampling to work well, the sample size must be large enough that a lopsided sample is unlikely to arise by chance. For example, if you randomly sample 6 children, there is roughly a one-in-three chance that four or more of them will be boys, assuming boys and girls are equally common. If you randomly sample 6000 children, it’s far more likely that they will be approximately equally spread between boys and girls. Even in stratified sampling (when we would likely poll equal numbers of boys and girls) it’s important to have a large enough sample to include other kinds of different viewpoints.
The sample size is determined by the precision desired for the estimate. The larger the sample size is, the more precise the estimate is. However, the larger the sample size, the more expensive and time-consuming the statistical study becomes. In more advanced statistics classes you’ll learn how to use statistical methods to determine the best sample size for a given survey.
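The effect of sample size on the boys-and-girls example can be checked with a short calculation. The sketch below assumes each child is independently a boy with probability 1/2, which is a simplification. The function names and the 48% to 52% band used for "approximately equally spread" are choices made for this illustration only.

```python
from math import comb

def prob_boys_at_least(n_children, min_boys):
    """Chance that at least min_boys of n_children are boys (each a boy with probability 1/2)."""
    favourable = sum(comb(n_children, k) for k in range(min_boys, n_children + 1))
    return favourable / 2 ** n_children

def prob_boys_between(n_children, low, high):
    """Chance that the number of boys is between low and high, inclusive."""
    favourable = sum(comb(n_children, k) for k in range(low, high + 1))
    return favourable / 2 ** n_children

print(prob_boys_at_least(6, 4))   # 22/64 = 0.34375: four or more boys out of 6
print(prob_boys_at_least(6, 6))   # 1/64 = 0.015625: all 6 are boys
print(prob_boys_between(6000, 2880, 3120))  # boys make up 48% to 52% of 6000
```

Under these assumptions, a sample of 6 is lopsided about a third of the time, while a sample of 6000 almost always lands within two percentage points of an even split.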
Choosing a Sampling Method
For a class assignment, you have been asked to find out whether students in your school are planning to attend university after graduating from high school. Students can respond with “yes”, “no” or “undecided”. How will you choose which students to interview if you want your results to be reliable?
The best method for obtaining a representative sample would be stratified sampling. Students in the upper grades might be more sure of their post-graduation plans than students in the lower grades, so it makes sense to divide your sample by grade level. You’ll need to find out what proportion of the total student population is included in each grade, then interview the same proportion of students from each grade when conducting the survey.
Identifying Biased Samples
Once we have identified our population, it is important that the sample we choose accurately reflects the spread of people present in the population. If the sample we choose ends up with one or more sub-groups that are either over-represented or under-represented, then the sample is biased. The results of a biased sample might not really represent the entire population, so we want to avoid selecting one. Stratified sampling helps, but it doesn’t always eliminate bias in a sample. Even with a large sample size, we may be consistently picking one group over another.
Some researchers may deliberately seek a biased sample in order to bolster a particular viewpoint. For example, if a group of students were trying to petition the school to allow eating candy in the classroom, they might try to show that a lot of students support this idea by surveying students immediately before lunchtime when they are all hungry. The practice of polling only those you believe will support your cause is sometimes referred to as cherry picking.
Many surveys may have a non-response bias. For example, if researchers simply hand out questionnaires on a street corner and ask people to fill them out and then mail them in, most people will just throw the questionnaires away. Only people who are really interested in the subject will bother to send them in, and those might also be the people who are more likely to answer the questions a certain way. (Imagine if the questionnaire asked “Do you care a lot about surveys?” People who cared about surveys would answer it, people who didn’t care wouldn’t bother, and a researcher just looking at the surveys that got sent in would conclude that everybody cares about surveys, because everybody who actually answered the survey said yes!)
Non-response bias may be reduced by conducting face-to-face interviews. When you talk to people in person, you can get them to agree to answer a question before you tell them what it is, and then the people you get answers from won’t just be the people who care a lot about the question.
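To put rough numbers on the mailed-questionnaire example, the sketch below computes what share of the returned forms say “yes” when people who care are more likely to reply. All the figures (20% of people care, 90% of them reply, 5% of the others reply) and the function name are invented for illustration.

```python
def share_yes_among_replies(share_yes, reply_rate_yes, reply_rate_no):
    """Fraction of returned questionnaires that say yes."""
    yes_replies = share_yes * reply_rate_yes
    no_replies = (1 - share_yes) * reply_rate_no
    return yes_replies / (yes_replies + no_replies)

# Made-up numbers: 20% of people care about surveys; 90% of those who care mail
# the form back, but only 5% of those who do not care bother to.
observed = share_yes_among_replies(0.20, 0.90, 0.05)
print(f"true share: 20%, share among replies: {observed:.0%}")
```

With these invented rates, about 82% of the replies say “yes” even though only 20% of the population would. If both groups replied at the same rate, as a face-to-face interview aims for, the share among replies would match the true share.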
Self-selected respondents tend to have stronger opinions on subjects than others and are more motivated to respond. For this reason, phone-in and online polls also tend to be poor representations of the overall population. Even if it looks like both sides are responding, the poll may disproportionately represent extreme viewpoints from both sides, while ignoring more moderate opinions which may, in fact, be the majority view. Self-selected polls are generally regarded as unscientific.
A classic example of a biased sample occurred in the 1948 Presidential Election. On election night, the Chicago Tribune printed the headline DEWEY DEFEATS TRUMAN, which turned out to be mistaken. The reason the paper was mistaken is that its editor trusted the results of a phone survey. Telephones were still relatively new at the time, so the people who had them tended to be wealthier than average; therefore, a sample of people who had telephones was not a representative sample of the population at large.
Identifying Bias in Samples
Identify each sample as biased or unbiased. If the sample is biased, explain how you would improve your sampling method.
a) Asking people shopping at a farmer’s market if they think locally grown fruit and vegetables are healthier than supermarket fruits and vegetables.
This would be a biased sample because people who shop at farmer’s markets are more likely than the average person to think that locally grown produce is better. The study could be improved by interviewing an equal number of people coming out of a supermarket, or by interviewing people in a more neutral environment such as the post office.
b) You want to find out public opinion on whether teachers get paid a sufficient salary by interviewing the teachers in your school.
This is a biased sample because teachers probably would think they should get a higher salary, but that doesn’t mean everybody else would agree. A better sample could be obtained by constructing a stratified sample with people in different income categories.
c) You want to find out if your school needs to improve its communications with parents by sending home a survey written in English.
This is a biased sample because only English-speaking parents would understand the survey, and parents who don’t speak English would be more likely to find that the school doesn’t communicate with them well. The study could be improved by sending different versions of the survey written in languages spoken at the students’ homes.
Identify Biased Questions
When you are creating a survey, you must think very carefully about the questions you should ask, how many questions are appropriate and even the order in which the questions should be asked. A biased question is a question that is worded in such a way (whether intentional or not) that it causes a swing in the way people answer it. Biased questions can lead even a representative, non-biased population sample to answer in a way that does not accurately reflect the larger population.
While biased questions are a bad way to judge the overall mood of a population, they are sometimes used by politicians or advertising companies to falsely suggest that a product or policy is more or less popular than it really is.
There are several ways to spot biased questions:
- They may use polarizing language, words and phrases that people associate with emotions:
  - Is it right that farmers murder animals to feed people?
  - How much of your time do you waste on TV every week?
  - Should we be able to remove a person’s freedom of choice over cigarette smoking?
- They may refer to a majority or to a supposed authority:
  - Would you agree with the American Heart and Lung Association that smoking is bad for your health?
  - The president believes that criminals should serve longer prison sentences. Do you agree?
  - Do you agree with 90% of the public that the car on the right looks better?
- The question may be phrased so as to suggest the person asking the question already knows the answer:
  - It’s OK to smoke so long as you do it on your own, right?
  - You shouldn’t be forced to give your money to the government, should you?
  - You wouldn’t want criminals free to roam the streets, would you?
- The question may be phrased in an ambiguous way (often with double negatives), which may confuse people:
  - Do you reject the possibility that the moon landings never took place?
  - Do you disagree with people who oppose the ban on smoking in public places?
In addition to biased questions, the overall design of a survey can be biased in other ways. In particular, question order can play a role. For example, a survey may contain several questions on people’s attitudes to cigarette smoking. Then, if the question “What, in your opinion, are the three biggest threats to public health today?” is asked at the end of the survey, people will be more likely to give “smoking” as one of their answers than they would be if that question had been asked as part of a different survey, or if it had been placed at the beginning of this survey instead of at the end.
Example
Example 1
Suppose you are interested in learning how popular the internet music program Spotify is at your school. You select a random sample of your friends. Is this sample likely to be representative of your school?
When you select a random sample of your friends, not everyone in your school has an equal chance of being selected. In fact, students who are not your friends have no chance of being selected at all. Therefore, this is not a random sample of students at your school. Your sample may be biased because your circle of friends is likely to share similar interests and may not reflect the interests of all students at the school. At best, this sample could represent how popular Spotify is with your friends.
Review
For 1-6, comment on the way the following samples have been chosen. For the unsatisfactory cases, suggest a way to improve the sample choice.
1. You want to find whether wealthier people have more nutritious diets by interviewing people coming out of a five-star restaurant.
2. You want to find out whether a pedestrian crossing is needed at a certain intersection by interviewing people walking by that intersection.
3. You want to find out if women talk more than men by interviewing an equal number of men and women.
4. You want to find whether students in your school get too much homework by interviewing a stratified sample of students from each grade level.
5. You want to find out whether there should be more public buses running during rush hour by interviewing people getting off the bus.
6. You want to find out whether children should be allowed to listen to music while doing their homework by interviewing a stratified sample of male and female students in your school.
For 7-10, a university wants to know if its statistics course is challenging enough for students. Every semester, the university offers several sections of the course. Explain the type(s) of bias most evident in each sampling technique and/or what sampling method is most evident. Be sure to justify your choice.
7. The first 30 students to buy the textbook at the beginning of the next semester.
8. The name of a color is selected at random, and on a given day, all statistics professors ask students wearing that color their opinion on the statistics course.
9. A flier is passed out on campus, asking students who have taken statistics at the university to reply by mail.
10. Five students are selected at random from each section of the statistics course during a given semester.
11. There are 35 students taking statistics in your school, and you want to choose 10 of them for a survey about their impressions of the course. Use your calculator to select an SRS (simple random sample) of 10 students. (Seed your random number generator with the number 10 before starting.) Assuming the students are assigned numbers from 1 to 35, which students are chosen for the sample?
12. For a class assignment, you have been asked to find out how students get to school. Do they take public transportation, drive themselves, get a ride from their parents, carpool, walk, or bike? You decide to interview a sample of students. How will you choose those you wish to interview if you want your results to be reliable?