Analyzing Data
How do you decide what sort of graph is most appropriate for a particular application? If you have a set of average test scores from a number of different tests and want to compare them, should you use a histogram, a box-and-whisker plot, or a bar chart? If you have 500 yes/no/maybe responses to a survey, should you visualize them with a pie chart, a bar chart, or a frequency polygon?
Choosing an effective data visualization can be a bit daunting, particularly at first, but with practice it becomes much less difficult.
Analyzing Data
There are quite a number of common types of data visualizations. Some have been in use for hundreds of years, and modern technology has added even more. Some of the more modern types are animated and/or interactive, and may continually update themselves as new data is collected via the Internet. Other lessons in this course individually detail the creation and interpretation of some of the more common graph types, but in this lesson we will focus on the advantages and disadvantages of each. Having a good idea of the strengths of different methods will help you choose an appropriate method for your own study.
| Graphing Method | Strengths / Advantages | Weaknesses / Disadvantages |
| --- | --- | --- |
| Histogram | Good for comparing multiple ranges of values; can be visually appealing; good for consolidating large amounts of data into a limited number of categories | Gets cluttered if there are too many categories of data; can take a while to construct, particularly by hand; not ideal for comparing a large number of categories; precise data is not easily displayed since values are grouped |
| Box-and-Whisker Plot | Quickly demonstrates a five-number summary of the data (minimum, quartiles including the median, and maximum); can be used to compare multiple data sets; good for consolidating very large data sets | Individual data is usually lost as data is grouped; not often considered as visually appealing as some other graphs; can be somewhat slow to construct |
| Stem-and-Leaf Diagram | Maintains accuracy of individual data values; can be easily evaluated to find ranges and notable clusters; good for medium to medium-large data sets | Generally not particularly attractive; not convenient for identifying central tendencies for larger data sets; can be confusing for audiences not familiar with the idea |
| Scatter Plot | Good for showing increasing/decreasing data trends; maintains accuracy of individual values; makes outliers immediately apparent; faster to construct than some; excellent for bivariate data | May lead to incorrect generalizations about the data; may take a while to construct if there are many data points; limited flexibility for creating attractive designs; regression should only be applied to continuous variables |
| Line Plot | Continuous variables can be represented with a few discrete points; handles bivariate data; moderate flexibility in design; easily interpreted by audiences | Requires continuous data; may not be visually striking |
| Frequency Polygon | Same as Line Plot above, but with greater visual flexibility for grabbing audience attention | Requires continuous data; the anchor points on the baseline may be mistaken for zero-value data points |
| Pie Chart | Simple to create and interpret; easily visualizes relative percentages; great for comparing values from multiple sources; can be very visually striking | Limited uses; requires discrete data; may be overly simplistic |

Determining the Appropriate Graph
1. Which of the following types of graphs is the most appropriate for displaying and evaluating the body weights of 300 different pet dogs? Why?
A histogram is the best option here because it lets you break the data into as many or as few weight classes as you wish, and it clearly shows the comparison between them. A histogram is also a relatively efficient way to summarize a rather large data set.
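To make the idea of weight classes concrete, here is a minimal Python sketch of the grouping step behind a histogram. The function name `bin_counts`, the 10-pound class width, and the sample weights are all assumptions made for illustration; they are not data from the lesson.

```python
from collections import Counter

def bin_counts(values, width):
    """Count how many values fall into each class [lower, lower + width)."""
    counts = Counter(int(v // width) * width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical sample of dog weights in pounds (a real study would have 300).
weights = [8, 12, 15, 22, 25, 27, 31, 38, 44, 45, 52, 58, 61, 67, 73, 88]
width = 10  # assumed class width; a wider class gives fewer, taller bars

classes = bin_counts(weights, width)
for lower, count in classes.items():
    # One row per weight class; the run of '#' marks stands in for a bar.
    print(f"{lower:>3} to under {lower + width:<3} {'#' * count}")
```

Changing `width` shows the trade-off noted in the table above: narrow classes keep more detail but clutter the graph, while wide classes hide individual values.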
2. Which one of the given graph types is most appropriate for evaluating the relationship between amount of sunlight and plant growth rate?
A scatter plot is the right tool here, since you are comparing two different variables. Because both are continuous variables, you can also evaluate the trend of the data points. If the trend appears to be linear, you could use linear regression to find the line of best fit, and then add that line to the graph to illustrate the overall relationship.
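The regression step can be sketched with the standard least-squares formulas. This is an illustrative example only: the function name `fit_line` and the sunlight and growth measurements are hypothetical, and the sketch assumes the trend really is close to linear.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared deviations in x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical measurements: hours of sunlight per day, growth in cm per week.
sunlight = [2, 4, 6, 8, 10]
growth = [1.1, 1.9, 3.2, 3.8, 5.0]

slope, intercept = fit_line(sunlight, growth)
print(f"growth = {slope:.3f} * sunlight + {intercept:.3f}")
```

Drawing the fitted line over the scatter plot lets the audience see both the individual points and the overall trend.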
3. Which one of the given graph types is most appropriate for displaying the relative distribution of several age ranges of trees in a forest: 0-5 yrs, 6-15 yrs, 16-30 yrs, and 31 yrs and older?
This would be a great use for a pie chart, since you are dealing with the entire population of trees and are interested in comparing the percentage of the forest represented by each age range.
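A pie chart is built from each category's share of the whole. The short Python sketch below shows that calculation; the function name `percent_shares` and the tree counts are invented for illustration and are not figures from the lesson.

```python
def percent_shares(counts):
    """Convert category counts into percentages of the whole population."""
    total = sum(counts.values())
    return {label: 100 * n / total for label, n in counts.items()}

# Hypothetical tree counts per age range, for illustration only.
tree_counts = {"0-5 yrs": 120, "6-15 yrs": 200, "16-30 yrs": 130, "31+ yrs": 50}

shares = percent_shares(tree_counts)
for label, pct in shares.items():
    print(f"{label:<10} {pct:5.1f}%")
```

Because the age ranges do not overlap and together cover every tree, the shares add up to 100%, which is what makes a pie chart a sensible choice.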
Earlier Problem Revisited
How do you decide what sort of graph is most appropriate for a particular application? If you have a set of average test scores from a number of different tests and want to compare them, should you use a histogram, a box-and-whisker plot, or a bar chart? If you have 500 yes/no/maybe responses to a survey, should you visualize them with a pie chart, a bar chart, or a frequency polygon?
Choosing the best visual representation of a data set is a matter of identifying the purpose of your study and the type of data you intend to display.
A single average representing each test can be displayed clearly with a bar chart, and could be made quite striking with appropriate design.
500 results spread over only three categories would be a good use of a pie chart. You have the entire population (everyone who answered the survey), and your goal is to show the relative portion of each answer.
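The reasoning in this lesson can be condensed into a few rules of thumb. The Python sketch below encodes them as a small lookup; the function name `suggest_graph` and the goal labels are hypothetical, and the rules are a deliberate simplification rather than a complete method for choosing a graph.

```python
def suggest_graph(goal, overlapping=False):
    """Suggest a graph type from a rough description of the goal.

    goal: one of "averages_by_category", "parts_of_whole",
          "two_variable_relationship", "one_variable_distribution".
    overlapping: True if some members belong to more than one category.
    """
    if goal == "parts_of_whole":
        # A pie chart needs categories that do not overlap.
        return "bar chart" if overlapping else "pie chart"
    suggestions = {
        "averages_by_category": "bar chart",
        "two_variable_relationship": "scatter plot",
        "one_variable_distribution": "histogram",
    }
    if goal not in suggestions:
        raise ValueError(f"unknown goal: {goal!r}")
    return suggestions[goal]

print(suggest_graph("averages_by_category"))  # average scores on several tests
print(suggest_graph("parts_of_whole"))        # 500 yes/no/maybe responses
```

Real data sets often fit more than one graph type, so a lookup like this is best treated as a starting point for the kind of reasoning shown above.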
Examples
Choose the most appropriate graphing method for each situation, and explain your reasoning:
Example 1
The number of high school diplomas earned in Denver, CO, for each year between 1980 and 1990.
Histogram. This is a comparative study of a limited number of categories of data. Note that although time is continuous, grouping it into one-year intervals allows us to graph it with a histogram.
Example 2
The proportion of minnows in a pond that are damaged by chemical dumping.
Pie chart. This is a finite population, and you are comparing proportions of a limited number of categories.
Example 3
The average number of puppies birthed by five breeds of dogs.
Bar chart, since you are dealing with averages of discrete data in a limited number of categories.
Example 4
The number of students in the upper 10%, upper 50%, lower 50%, upper 25%, and lower 25% on finals at a particular university.
Bar chart. Your categories overlap (the upper 50% includes the upper 10%, for example), so a pie chart would not be appropriate.
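One quick way to spot overlapping categories is to check whether the category counts add up to the size of the whole population. The Python sketch below uses a hypothetical class of 200 students; the function name `fits_pie_chart` and the counts are assumptions for illustration. Note that the check is necessary but not sufficient: counts can add up correctly and still be a poor fit for a pie chart.

```python
def fits_pie_chart(category_counts, population_size):
    """Return True if the counts add up to exactly the whole population."""
    return sum(category_counts.values()) == population_size

# Hypothetical class of 200 students, using the groups from Example 4.
groups = {
    "upper 10%": 20,
    "upper 25%": 50,
    "upper 50%": 100,
    "lower 50%": 100,
    "lower 25%": 50,
}
total = sum(groups.values())
print(total)                        # 320, well above the 200 students
print(fits_pie_chart(groups, 200))  # False: students are counted more than once

# Two groups that do not overlap pass the check.
halves = {"upper 50%": 100, "lower 50%": 100}
print(fits_pie_chart(halves, 200))  # True
```

A sum larger than the population means some students were counted in more than one group, so the slices of a pie chart would misrepresent the proportions.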
Example 5
The number of yellow, red, and white roses found in each of 500 ten-by-ten-foot plots in Central Park, New York.
Scatter plot. There are far too many categories (individual flower garden plots) to try to use a histogram or bar chart.
Review
What type of graph should you use?
- Track and compare the different amounts of time you spend playing video games versus doing your homework and practicing piano, over a period of a month.
- You are trying to prove that your family spends too much money on groceries each month. Your family's monthly budget looks like this:
  - $1,250 home mortgage
  - $500 utilities
  - $800 car payments
  - $300 entertainment
  - $800 groceries
- You employ 4 salespeople. You would like to track their sales over the last quarter.
- You would like to compare the number of votes that 4 candidates received in the last student council election.
- You would like to know if there is a relationship between the time you spend studying for a test and the test scores you receive in a class, knowing you study more for classes you enjoy.
- Choose methods of graphing commonly used to “Compare”. For instance, comparing the sales performance of one car to another.
  - Bar Graphs
  - Column Graphs
  - Scatter Plots
  - Pie Charts
  - Line Graphs
  - Data Tables
  - Box Plots
  - Histograms
- Choose types of graphs commonly used to show “Distribution”. For instance, the waiting room times of patients in 5 different doctors’ offices.
  - Bar Graphs
  - Column Graphs
  - Scatter Plots
  - Pie Charts
  - Line Graphs
  - Data Tables
  - Box Plots
  - Histograms
- Choose types of graphs commonly used to show “Parts of a Whole”. For instance, the number of female viewers of a new gaming website.
  - Bar Graphs
  - Column Graphs
  - Scatter Plots
  - Pie Charts
  - Line Graphs
  - Data Tables
  - Box Plots
  - Histograms
- Choose types of graphs often used to show “Trends over Time”. For instance, the number of umbrellas sold over a 365-day period.
  - Bar Graphs
  - Column Graphs
  - Scatter Plots
  - Pie Charts
  - Line Graphs
  - Data Tables
  - Box Plots
  - Histograms
- Choose types of graphs used to show “Deviations”. For instance, sales numbers in an established business when a competitor opens across town.
  - Bar Graphs
  - Column Graphs
  - Scatter Plots
  - Pie Charts
  - Line Graphs
  - Data Tables
  - Box Plots
  - Histograms
- Choose types of charts that can be used to show “Relationship”. For instance, the number of tutoring students obtained after report cards are released.
  - Bar Graphs
  - Column Graphs
  - Scatter Plots
  - Pie Charts
  - Line Graphs
  - Data Tables
  - Box Plots
  - Histograms
Review (Answers)
To see the answer key for this book, go to the and click on the Answer Key under the ' ' option.