Chapter Outline

  • The Purpose of this Lesson

    In this lesson, you will explore data that is not numerical. Even so, you can count the data, organize the counts in tables, and interpret the relative proportion of each category.

    Introduction: Frequency Plots and Relative Frequency Plots

    The tables in this section show both the frequency and the relative frequency of counts. You've worked with something similar before: frequency plots and relative frequency plots.

    Work it Out

    Below is a list of the masses (in grams) of the "California" burrito at 20 different burrito shops in San Diego. Use a bin size of 150 g to create a histogram (that is, a frequency plot) of the data. Find the mean and standard deviation, and draw vertical lines on the histogram at the mean and at one standard deviation above and below the mean. Approximate the number of burrito shops whose burritos were within one standard deviation of the mean. Interpret the data in the context of the scenario. Then convert the frequency plot to a relative frequency plot.

    723, 428, 797, 812, 768, 750, 714, 605, 673, 680, 700, 893, 635, 928, 951, 910, 670, 800, 945, 1050
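
    The computations in this Work it Out can be sketched in Python. This is a minimal sketch under two assumptions the lesson does not fix: the bins start at 400 g, and the standard deviation is the population standard deviation (`statistics.pstdev`); switch to `statistics.stdev` if your class uses the sample standard deviation. Drawing the histogram and the vertical lines is left to your graphing tool.

```python
import statistics

# Masses (g) of the "California" burrito at 20 San Diego shops, as listed in the lesson.
masses = [723, 428, 797, 812, 768, 750, 714, 605, 673, 680,
          700, 893, 635, 928, 951, 910, 670, 800, 945, 1050]

BIN_SIZE = 150   # bin width from the lesson
BIN_START = 400  # assumption: left edge of the first bin (not specified in the lesson)

mean = statistics.mean(masses)
sd = statistics.pstdev(masses)  # population SD; statistics.stdev gives the sample SD

# Shops whose burrito mass lies within one standard deviation of the mean.
within_one_sd = sum(1 for m in masses if mean - sd <= m <= mean + sd)

# Frequency table: each bin covers [lo, lo + BIN_SIZE).
freq = {}
lo = BIN_START
while lo <= max(masses):
    freq[(lo, lo + BIN_SIZE)] = sum(1 for m in masses if lo <= m < lo + BIN_SIZE)
    lo += BIN_SIZE

# Relative frequency table: each bin count divided by the number of shops.
rel_freq = {bin_: count / len(masses) for bin_, count in freq.items()}

print(f"mean = {mean:.1f} g, SD = {sd:.1f} g")
print(f"shops within one SD of the mean: {within_one_sd} of {len(masses)}")
for (a, b), count in freq.items():
    print(f"[{a}, {b}) g: {count:2d} shops, {rel_freq[(a, b)]:.0%}")
```

    With these data the mean is 771.6 g and the population standard deviation is about 142.4 g, so 14 of the 20 shops (70%) fall within one standard deviation of the mean; the sample standard deviation (about 146.1 g) gives the same count. With bins starting at 400 g, the bins hold 1, 5, 8, 5, and 1 shops.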


    Activity 1: Categorical Versus Numerical Data

    Example 1-1

    Sanjay's favorite dish is sushi. Yasmine's favorite is burritos. Since one person likes sushi and the other likes burritos, does it make sense to average "sushi" and "burritos" and conclude that the mean dish they prefer is, say, "vegetarian wrap"? Why or why not? To determine which dish is more popular, they ask 20 of their friends whether they prefer sushi or burritos. Below is the categorical data they gathered. Determine the percentage that preferred each.

    |             | Prefer Sushi | Prefer Burritos | Total |
    |-------------|--------------|-----------------|-------|
    | Individuals | 14           | 6               | 20    |

    Solution: The average of "sushi" and "burritos" is not "vegetarian wrap." You can't average this data, because it's categorical. The variable is "dish," and it can take on the value "sushi" or "burrito." It could also take on a value such as "pizza" or "salad," but those values were not included in the domain for this scenario.

    Nevertheless, you can count the number of cases where the variable "dish" took on the value "sushi" and the number of cases where it took on the value "burrito." These counts were organized in a frequency table. For categorical data, a table is often preferred over a histogram. After all, you are not going to compute the mean as "vegetarian wrap" and see a roughly normal distribution of dish types around it! (However, you are encouraged to create such a histogram and submit it as a cartoon to your favorite satirical statistics journal.)

    You can calculate the relative frequencies of each category by dividing the count of each by the total count. Here is a relative frequency table showing the percentage that preferred each dish:

    |             | Prefer Sushi | Prefer Burritos | Total |
    |-------------|--------------|-----------------|-------|
    | Individuals | 14/20 = 70%  | 6/20 = 30%      | 100%  |
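
    The same tally can be done in Python. This sketch rebuilds the 20 individual responses from the counts in the table (the lesson gives only the counts, so the order of responses here is arbitrary), counts each category with `collections.Counter`, and divides by the total to get the relative frequencies.

```python
from collections import Counter

# The lesson gives only the counts (14 sushi, 6 burrito), so this rebuilds
# the 20 responses in an arbitrary order.
responses = ["sushi"] * 14 + ["burrito"] * 6

# Frequency table: how many times each value of the variable "dish" occurs.
freq = Counter(responses)
total = sum(freq.values())

# Relative frequency table: each count divided by the total count.
rel_freq = {dish: count / total for dish, count in freq.items()}

for dish, count in freq.items():
    print(f"{dish:8s} {count:3d}  {rel_freq[dish]:.0%}")
print(f"{'total':8s} {total:3d}  {sum(rel_freq.values()):.0%}")
```

    Note that nothing here averages the values "sushi" and "burrito"; the only arithmetic is on the counts.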

       Categorical Data

    A categorical variable like "dish" can take on categorical values like "sushi" and "burrito." The values for the variable are categories.

    The counts for each category are data that can be summarized in a frequency table.

    A relative frequency table gives the percentage of each count out of the total counted.


    Activity 2: Bivariate Categorical Data

    Example 2-1

    Ava is a cinephile and an audiophile. She is curious to know whether there is a relationship between the kinds of movies her friends prefer and the kinds of music they prefer. She asks each of her 40 friends to choose: science fiction or romantic comedy? Rap or punk? She organizes the counts into a table as shown below. Interpret the counts in the table. Do you see any interesting patterns? Convert the table to a relative frequency table and continue your interpretation.

    |        | Science Fiction | Romantic Comedy | Totals |
    |--------|-----------------|-----------------|--------|
    | Rap    | 15              | 10              | 25     |
    | Punk   | 11              | 4               | 15     |
    | Totals | 26              | 14              | 40     |

    Solution: More of her friends like science fiction (26) than romantic comedy (14). More of her friends like rap (25) than punk (15). It would have been interesting if rap had turned out to be associated with science fiction and punk with romantic comedy, or vice versa, but neither turned out to be the case. Converting this table to a relative frequency table makes that clearer:

    |        | Science Fiction | Romantic Comedy | Totals         |
    |--------|-----------------|-----------------|----------------|
    | Rap    | 15/40 = 37.5%   | 10/40 = 25%     | 25/40 = 62.5%  |
    | Punk   | 11/40 = 27.5%   | 4/40 = 10%      | 15/40 = 37.5%  |
    | Totals | 26/40 = 65%     | 14/40 = 35%     | 40/40 = 100%   |

    Ava's friends consistently preferred rap to punk and science fiction to romantic comedy. You might say that punk was particularly unpopular with the romantic comedy crowd, but this effect is subtle, since punk was relatively unpopular overall.
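
    For a two-way table like Ava's, the relative frequencies come from two steps: add up the rows and columns to get the totals, then divide every count, including the totals, by the grand total. This is a minimal sketch that assumes the counts are stored as a list of rows (rap, then punk); the printed percentages match the relative frequency table above.

```python
# Ava's counts from the lesson.
# Rows: Rap, Punk. Columns: Science Fiction, Romantic Comedy.
row_names = ["Rap", "Punk"]
counts = [
    [15, 10],  # Rap
    [11, 4],   # Punk
]

# Row totals and column totals (the margins of the table).
row_totals = [sum(row) for row in counts]
col_totals = [sum(column) for column in zip(*counts)]
grand_total = sum(row_totals)

# Relative frequencies: each interior count divided by the grand total.
joint_rel = [[n / grand_total for n in row] for row in counts]

for name, row, total in zip(row_names, joint_rel, row_totals):
    cells = "  ".join(f"{p:6.1%}" for p in row)
    print(f"{name:6s} {cells}  {total / grand_total:6.1%}")
col_cells = "  ".join(f"{t / grand_total:6.1%}" for t in col_totals)
print(f"{'Totals':6s} {col_cells}  {grand_total / grand_total:6.1%}")
```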

    Interactive

    Use the interactive to see different frequencies and relative frequencies. Try to enter values so that there is no association between movie and music preferences. Then try to enter values so that punk is associated with science fiction and rap is associated with romantic comedy.
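
    When you choose values that show no association, one quick check for a table with two rows and two columns is the cross-product rule: with counts a and b in the first row and c and d in the second, the two rows split across the columns in the same proportions exactly when a·d = b·c (because a/(a + b) = c/(c + d) rearranges to a·d = b·c). The function name `no_association_2x2` and the second set of counts below are hypothetical, chosen only for illustration. The rule checks for a perfect absence of association, which real survey data rarely shows exactly.

```python
def no_association_2x2(a: int, b: int, c: int, d: int) -> bool:
    """Return True if the 2x2 table [[a, b], [c, d]] shows no association.

    The rows have identical proportions, a/(a+b) == c/(c+d), exactly when
    a*d == b*c, so integer arithmetic avoids any rounding issues.
    """
    return a * d == b * c


# Ava's actual table: Rap (15, 10), Punk (11, 4).
print(no_association_2x2(15, 10, 11, 4))  # False: 15*4 = 60, 10*11 = 110

# Hypothetical values: Rap (15, 10), Punk (9, 6).
print(no_association_2x2(15, 10, 9, 6))   # True: 15*6 = 90 = 10*9
```

    The hypothetical Punk row (9, 6) keeps Ava's row totals of 25 and 15, but both rows now split 60% science fiction and 40% romantic comedy, so knowing someone's music preference tells you nothing about their movie preference.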

        Bivariate Categorical Data

    Joint frequencies are the counts for a combination of values of two variables; for example, 15 friends preferred both science fiction and rap. They are in the interior of the table.

    Marginal frequencies are the totals of the rows and columns. They are on the margins of the table.

    Categories might be associated if there is noticeable variation in the relative frequencies, beyond what the totals alone would lead you to expect.

    Work it Out

    1. Kenneth enjoys going to fairs and festivals with his friends. He surveys his friends to see what they like to eat for lunch at the fair and what they like to eat for dessert. Do they like pizza or hot dogs for lunch? Do they like funnel cake or cotton candy for dessert? He surveys 30 of his friends and records the results in the table that follows. What percentage of friends liked pizza and funnel cake? What percentage liked hot dogs and cotton candy? Convert the table to a relative frequency table. Interpret the percentages in the tables. Is there any association between categories? Explain.

    |              | Pizza | Hot Dogs | Totals |
    |--------------|-------|----------|--------|
    | Funnel Cake  | 11    | 2        | 13     |
    | Cotton Candy | 5     | 12       | 17     |
    | Totals       | 16    | 14       | 30     |

    2. Kai enjoys snow sports with his friends. Some like snowboarding, and others skiing. All of them listen to music while they are on the slopes. Some like hip hop, and some like classical. He surveys 50 of his friends and records the results as shown below. Convert the table to a relative frequency table. Interpret the percentages in the tables. Is there any association between categories? Explain.

    |           | Snowboarding | Skiing | Totals |
    |-----------|--------------|--------|--------|
    | Hip Hop   | 17           | 0      | 17     |
    | Classical | 8            | 25     | 33     |
    | Totals    | 25           | 25     | 50     |

    3. Carla runs a reading lounge and cafe that offers coffee and tea. Her patrons enjoy reading either physical books or electronic tablets. Carla surveys her clientele and records the data in a table, but an unfortunate spill obscures some of the values. Find the missing values in the table. Convert the table to a relative frequency table. Interpret the percentages in the tables. Is there any association between categories? Explain.

    Coffee Tea Totals Books 24 Tablets 57 Totals 46 52


    Activity 3: Conditional Relative Frequencies

    Conditional relative frequencies provide another way to interpret bivariate categorical data. Instead of computing percentages out of the total of all counts, you compute them out of the total for each row or for each column. Conditional relative frequencies give you another way to detect an association between categories.

    Example 3-1

    Helene runs a travel company. She interviews her clients to determine their location preferences (beach, lakes, or mountains) and their activity preferences (swimming, fishing, or hiking). She collects the counts in the table below. Convert the table to a conditional relative frequency table using the row totals. Interpret the results. Is there any association between categories? Explain.

    |          | Beach | Lakes | Mountains | Totals |
    |----------|-------|-------|-----------|--------|
    | Swimming | 24    | 5     | 9         | 38     |
    | Fishing  | 16    | 32    | 14        | 62     |
    | Hiking   | 4     | 11    | 45        | 60     |
    | Totals   | 44    | 48    | 68        | 160    |

    Solution: Conditional relative frequencies for rows are found by dividing each count by the total for its row. For example, 24/38 ≈ 63.2% of swimmers choose to go to the beach, and 45/60 = 75% of hikers choose to go to the mountains. Certain categories are clearly associated with one another: "swimming" is associated with "beach," and "hiking" is associated with "mountains." In fact, swimming is nearly as strongly associated with the beach as hiking is with the mountains, which certainly makes sense. Below is the complete table with percentages rounded. Interestingly, those who like fishing spread out more evenly across the destinations. Notice that each row totals 100%, and it doesn't make sense to add the percentages in a column.

    |          | Beach | Lakes | Mountains | Totals |
    |----------|-------|-------|-----------|--------|
    | Swimming | 63.2% | 13.2% | 23.7%     | 100%   |
    | Fishing  | 25.8% | 51.6% | 22.6%     | 100%   |
    | Hiking   | 6.7%  | 18.3% | 75%       | 100%   |
    | Totals   | n/a   | n/a   | n/a       | n/a    |
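
    The row-conditional table above can be reproduced by dividing each count by its own row total. In this sketch Helene's counts are stored in a dictionary keyed by activity, and `row_conditional` is a hypothetical helper name, not a library function.

```python
# Helene's counts from the lesson: rows are activities, columns are locations.
locations = ["Beach", "Lakes", "Mountains"]
counts = {
    "Swimming": [24, 5, 9],
    "Fishing": [16, 32, 14],
    "Hiking": [4, 11, 45],
}


def row_conditional(table):
    """Divide each count by the total of its own row (row-conditional relative frequency)."""
    return {name: [n / sum(row) for n in row] for name, row in table.items()}


cond = row_conditional(counts)
for activity, shares in cond.items():
    cells = "  ".join(f"{loc} {p:.1%}" for loc, p in zip(locations, shares))
    # Each row sums to 100%; column sums are not meaningful in this table.
    print(f"{activity:9s} {cells}  (row total {sum(shares):.0%})")
```

    Dividing by column totals instead would answer a different question (for example, what share of beach-goers are swimmers), so a conditional table uses one kind of total, not both.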

    It's important to remember that not every association between categories is evidence of causation. As with linear correlation, data can be cherry-picked to create associations that have no real-world explanation. Here is the same table as above, but with different categorical variables and values. Alternatively, imagine removing the category names altogether. The association would still exist, but it would be meaningless.

    |                   | Fav. Color Blue | Fav. Color Green | Fav. Color Red | Totals |
    |-------------------|-----------------|------------------|----------------|--------|
    | Blood Type O      | 63.2%           | 13.2%            | 23.7%          | 100%   |
    | Blood Type A or B | 25.8%           | 51.6%            | 22.6%          | 100%   |
    | Blood Type AB     | 6.7%            | 18.3%            | 75%            | 100%   |
    | Totals            | n/a             | n/a              | n/a            | n/a    |

       Conditional Relative Frequencies

    Conditional relative frequencies are calculated by dividing each count by the total for the corresponding row or column.

    A conditional relative frequency table shows the percentages out of row totals or out of column totals, but not both.

    Of the tables in this lesson, the conditional relative frequency table is the most useful for detecting associations between categories.
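
    One informal way to use conditional relative frequencies to look for an association is to compare each column's conditional percentages with the overall percentages: if movie preference and music preference were unrelated, every column would show roughly the overall split. This sketch applies that idea to Ava's table from Example 2-1 by conditioning on the movie columns. `column_conditional` is a hypothetical helper, and the lesson does not set a threshold for how large a difference counts as an association.

```python
# Ava's counts from the lesson, grouped by movie preference (column).
counts = {
    "Science Fiction": {"Rap": 15, "Punk": 11},
    "Romantic Comedy": {"Rap": 10, "Punk": 4},
}


def column_conditional(table):
    """Divide each count by its column total: music preference given movie preference."""
    return {
        col: {row: n / sum(cells.values()) for row, n in cells.items()}
        for col, cells in table.items()
    }


# Overall share of each music preference, for comparison.
grand_total = sum(sum(cells.values()) for cells in counts.values())
overall = {
    music: sum(cells[music] for cells in counts.values()) / grand_total
    for music in ("Rap", "Punk")
}

cond = column_conditional(counts)
for movie, shares in cond.items():
    print(movie, {m: f"{p:.1%}" for m, p in shares.items()})
print("Overall", {m: f"{p:.1%}" for m, p in overall.items()})
```

    Punk makes up 42.3% of the science fiction fans but only 28.6% of the romantic comedy fans, compared with 37.5% overall. That gap is the subtle effect noted in Example 2-1: punk was somewhat less popular with the romantic comedy crowd.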

    Work it Out

    Diego lives in Chicago and commutes to work by bike. He surveys his colleagues to see which transportation method they use, and how they carry their items to work. The counts are shown in the table below. Complete the totals. Convert the table to a conditional relative frequency table using the totals for rows. Interpret the results. Is there an association between any of the categories? Explain.

    |           | Subway | Bike | Car | Totals |
    |-----------|--------|------|-----|--------|
    | Satchel   | 18     | 10   | 7   |        |
    | Backpack  | 12     | 22   | 11  |        |
    | Briefcase | 4      | 1    | 15  |        |
    | Totals    |        |      |     |        |

      Summary

    • Categorical variables like "dish" or "carrying method" take on categories like "sushi" or "briefcase" as their values.
    • Counts can be presented in frequency tables, relative frequency tables, or conditional relative frequency tables.
    • Associations between categories don't necessarily imply the existence of a causal relationship. 