5.6 Fitting Lines to Data
Katja has noticed that sales are falling off at her store lately. She plots her sales figures for each week on a graph and sees that the points are trending downward, but they don't quite make a straight line. How can she predict what her sales figures will be over the next few weeks?
In real-world problems, the relationship between our dependent and independent variables is often close to linear, but not perfectly so. We may have a number of data points that don't quite fit on a straight line, but we may still want to find an equation representing those points. In this lesson, we'll learn how to find linear equations to fit real-world data.
Make a Scatter Plot
A scatter plot is a plot of all the ordered pairs in a table. Even when we expect the relationship we're analyzing to be linear, we usually can't expect that all the points will fit perfectly on a straight line. Instead, the points will be "scattered" about a straight line.
There are many reasons why the data might not fall perfectly on a line. Small errors in measurement are one reason; another reason is that the real world isn't always as simple as a mathematical abstraction, and sometimes math can only describe it approximately.
Make a scatter plot of the following ordered pairs:
(0, 2); (1, 4.5); (2, 9); (3, 11); (4, 13); (5, 18); (6, 19.5)
We make a scatter plot by graphing all the ordered pairs on the coordinate plane.
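A rough text version of such a plot can be produced with a few lines of Python. This is only an illustrative sketch: the helper name `text_scatter` and the 10-row grid height are assumptions made here, and the code relies on the x-values being the whole numbers 0 through 6 so that each point gets its own column.

```python
# Ordered pairs from the scatter plot example above.
points = [(0, 2), (1, 4.5), (2, 9), (3, 11), (4, 13), (5, 18), (6, 19.5)]


def text_scatter(points, height=10):
    """Return the plot as a list of text rows (first row = top of the plot).

    Hypothetical helper: assumes whole-number, non-negative x-values and
    positive y-values; y is scaled so the largest value lands on the top row.
    """
    y_max = max(y for _, y in points)
    width = max(x for x, _ in points) + 1
    grid = [[" "] * width for _ in range(height + 1)]
    for x, y in points:
        row = height - round(y / y_max * height)  # larger y -> nearer the top
        grid[row][x] = "*"
    return ["".join(cells) for cells in grid]


rows = text_scatter(points)
for row in rows:
    print("|" + row)               # vertical axis
print("+" + "-" * len(rows[0]))    # horizontal axis
```

The stars climb from the lower left to the upper right, which is the roughly linear upward trend the lesson goes on to describe.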
Fit a Line to Data
Notice that the points look like they might be part of a straight line, although they wouldn't fit perfectly on a straight line. If the points were perfectly lined up, we could just draw a line through any two of them, and that line would go right through all the other points as well. When the points aren't lined up perfectly, we just have to find a line that is as close to all the points as possible.
Here you can see that we could draw many lines through the points in our data set. However, the red line, A, is the line that best fits the points. To prove this mathematically, we would measure the distance from each data point to line A, and then we would show that the sum of all those distances (or rather, the square root of the sum of the squares of the distances) is less than it would be for any other line.
Actually proving this is a lesson for a much more advanced course, so we won't do it here. And finding the best fit line in the first place is even more complex; instead of doing it by hand, we'll use a graphing calculator or just "eyeball" the line, as we did above, using our visual sense to guess what line fits best.
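To make "as close to all the points as possible" concrete, the sketch below adds up the squared distances from each point to a candidate line; the smaller the total, the better the fit. It assumes the distances are measured vertically (the usual least-squares convention), and the two candidate lines, y = 3x + 2 and y = 2x + 5, are made-up examples chosen only for comparison. Neither is claimed to be line A.

```python
points = [(0, 2), (1, 4.5), (2, 9), (3, 11), (4, 13), (5, 18), (6, 19.5)]


def sum_squared_distances(points, m, b):
    """Sum of squared vertical distances from the points to y = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in points)


close_line = sum_squared_distances(points, m=3, b=2)  # hypothetical candidate
far_line = sum_squared_distances(points, m=2, b=5)    # hypothetical candidate

print(close_line, far_line)  # 3.5 30.5
```

The first candidate has the much smaller total, so by this measure it fits the data better than the second.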
Write an Equation for a Line of Best Fit
Once you draw the line of best fit, you can find its equation by using two points on the line. Finding the equation of the line of best fit is also called linear regression.
Caution: Make sure you don't get caught making a common mistake. Sometimes the line of best fit won't pass straight through any of the points in the original data set. This means that you can't just use two points from the data set; you need to use two points that are on the line, which might not be in the data set at all.
In the scatter plot example above, it happens that two of the data points are very close to the line of best fit, so we can just use these points to find the equation of the line: (1, 4.5) and (3, 11).
Start with the slope-intercept form of a line: y = mx + b
Find the slope: m = (11 - 4.5)/(3 - 1) = 6.5/2 = 3.25.
So y = 3.25x + b.
Plug (3, 11) into the equation: 11 = 3.25(3) + b, so b = 1.25
So the equation for the line that fits the data best is y = 3.25x + 1.25.
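The same two-point calculation can be checked with a short script. This is a minimal sketch; the function name `line_through` is chosen here for illustration and is not part of the lesson.

```python
def line_through(p1, p2):
    """Return (slope, intercept) of the line through two points.

    Assumes the two points have different x-values (the line is not vertical).
    """
    (x1, y1), (x2, y2) = p1, p2
    m = (y2 - y1) / (x2 - x1)  # rise over run
    b = y1 - m * x1            # solve y1 = m*x1 + b for b
    return m, b


m, b = line_through((1, 4.5), (3, 11))
print(m, b)  # 3.25 1.25
```

Either point can be used to solve for the intercept; the script uses the first one, while the worked steps above use (3, 11), and both give the same result.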
Performing Linear Regression With a Graphing Calculator
The problem with eyeballing a line of best fit, of course, is that you can't be sure how accurate your guess is. To get the most accurate equation for the line, we can use a graphing calculator instead. The calculator uses a mathematical algorithm to find the line that minimizes the sum of the squares of the distances from the data points to the line.
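For reference, the quantity being minimized and the standard closed-form least-squares solution can be written as follows. The notation (n data points (x_i, y_i), slope m, intercept b) is chosen here for illustration, and the lesson does not say which internal procedure the calculator uses to reach this minimum.

```latex
S(m, b) = \sum_{i=1}^{n} \bigl(y_i - (m x_i + b)\bigr)^2
\qquad
m = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2},
\qquad
b = \frac{\sum y_i - m \sum x_i}{n}
```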
Use a graphing calculator to find the equation of the line of best fit for the following data:
(3, 12), (8, 20), (1, 7), (10, 23), (5, 18), (8, 24), (11, 30), (2, 10)
Step 1: Input the data in your calculator.
Press [STAT] and choose the [EDIT] option. Input the data into the table by entering the x-values in the first column and the y-values in the second column.
Step 2: Find the equation of the line of best fit.
Press [STAT] again and use the right arrow to select [CALC] at the top of the screen.
Choose option number 4, LinReg(ax+b), and press [ENTER].
The calculator will display LinReg(ax+b).
Press [ENTER] and you will be given the a and b values.
Here a represents the slope and b represents the y-intercept of the equation. The linear regression line is y = 2.01x + 5.94.
Step 3: Draw the scatter plot.
To draw the scatter plot, press [STATPLOT], which is [2nd] [Y=].
Choose Plot 1 and press [ENTER].
Select the On option and set the Type as scatter plot (the one highlighted in black).
Make sure that the X list and Y list names match the names of the columns of the table in Step 1.
Choose the box or plus as the mark, since the simple dot may make it difficult to see the points.
Press [GRAPH] and adjust the window size so you can see all the points in the scatter plot.
Step 4: Draw the line of best fit through the scatter plot.
Press [Y=].
Enter the equation of the line of best fit that you just found: y = 2.01x + 5.94.
Press [GRAPH].
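For readers without a graphing calculator, the regression result can be reproduced with a short script. This is a sketch using the standard least-squares formulas for slope and intercept; the function name `least_squares_line` is an assumption made here, and nothing is claimed about how the calculator computes its answer internally.

```python
data = [(3, 12), (8, 20), (1, 7), (10, 23), (5, 18), (8, 24), (11, 30), (2, 10)]


def least_squares_line(points):
    """Return (slope, intercept) of the least-squares line y = a*x + b."""
    n = len(points)
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    sum_xy = sum(x * y for x, y in points)
    sum_xx = sum(x * x for x, _ in points)
    a = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    b = (sum_y - a * sum_x) / n
    return a, b


a, b = least_squares_line(data)
print(f"y = {a:.2f}x + {b:.2f}")  # y = 2.01x + 5.94
```

The output agrees with the a and b values reported in Step 2.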
Solve Real-World Problems Using Linear Models of Scattered Data
Once we've found the line of best fit for a data set, we can use the equation of that line to predict other data points.
Real-World Application: 5K Training
Nadia is training for a 5K race. The following table shows her times for each month of her training program. Find an equation of a line of fit. Predict her running time if her race is in August.
Month       Month number   Average time (minutes)
January     0              40
February    1              38
March       2              39
April       3              38
May         4              33
June        5              30
Let's make a scatter plot of Nadia's running times. The independent variable, x, is the month number and the dependent variable, y, is the running time. We plot all the points in the table on the coordinate plane, and then sketch a line of fit.
Two points on the line are (0, 42) and (4, 34). We'll use them to find the equation of the line:
m = (34 - 42)/(4 - 0) = -8/4 = -2
y = -2x + b
42 = -2(0) + b, so b = 42
y = -2x + 42
In a real-world problem, the slope and y-intercept have a physical significance. In this case, the slope tells us how Nadia's running time changes each month she trains. Specifically, it decreases by 2 minutes per month. Meanwhile, the y-intercept tells us that, according to the line, Nadia ran a distance of 5K in about 42 minutes when she started training.
The problem asks us to predict Nadia's running time in August. Since June is defined as month number 5, August will be month number 7. We plug x = 7 into the equation of the line of fit:
y = -2(7) + 42 = -14 + 42 = 28
The equation predicts that Nadia will run the 5K race in 28 minutes.
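The prediction from the eyeballed line can be traced in code. This is a minimal sketch: the `months` list and the helper `line_through` are illustrative names introduced here, and the month numbering (January = 0) follows the table.

```python
months = ["January", "February", "March", "April",
          "May", "June", "July", "August"]


def line_through(p1, p2):
    """Return (slope, intercept) of the line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    m = (y2 - y1) / (x2 - x1)
    b = y1 - m * x1
    return m, b


# Two points read off the sketched line of fit (not data points).
m, b = line_through((0, 42), (4, 34))

x_august = months.index("August")  # January is 0, so August is 7
predicted = m * x_august + b

print(m, b, predicted)  # -2.0 42.0 28.0
```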
In this solution, we eyeballed a line of fit. Using a graphing calculator on the data in the table, we can find this equation for a line of fit instead: y = -1.89x + 41.05
If we plug x = 7 into this equation, we get y = -1.89(7) + 41.05 = 27.8, rounded to one decimal place. This means that Nadia will run her race in about 27.8 minutes. You see that the graphing calculator gives a different equation and a different answer to the question. The graphing calculator result is more accurate, but the line we drew by hand still gives a good approximation to the result. And of course, there's no guarantee that Nadia will actually finish the race in that exact time; both answers are estimates, it's just that the calculator's estimate is slightly more likely to be right.
Example
Example 1
Peter is testing the burning time of "BriteGlo" candles. The following table shows how long it takes to burn candles of different weights. Assume it's a linear relation, so we can use a line to fit the data. If a candle burns for 95 hours, what must be its weight in ounces?
Candle weight (oz)   Time (hours)
2                    15
3                    20
4                    35
5                    36
10                   80
16                   100
22                   120
26                   180
Let's make a scatter plot of the data. The independent variable, x, is the candle weight and the dependent variable, y, is the time it takes the candle to burn. We plot all the points in the table on the coordinate plane, and draw a line of fit.
Two convenient points on the line are (0, 0) and (30, 200). Find the equation of the line:
m = 200/30 = 20/3
y = (20/3)x + b
0 = (20/3)(0) + b, so b = 0
y = (20/3)x
A slope of 20/3 = 6 2/3 tells us that for each extra ounce of candle weight, the burning time increases by 6 2/3 hours. A y-intercept of zero tells us that a candle of weight 0 oz will burn for 0 hours.
The problem asks for the weight of a candle that burns 95 hours; in other words, what's the x-value that gives a y-value of 95? Plugging in y = 95:
95 = (20/3)x
x = 285/20 = 57/4 = 14 1/4
A candle that burns 95 hours weighs 14.25 oz.
A graphing calculator gives the linear regression equation as y = 6.1x + 5.9 and a result of 14.6 oz.
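Here the unknown is the x-value, so the line has to be solved backwards: weight = (time - b)/m. The sketch below applies that to both the hand-drawn line and the calculator's line; the function name `weight_for_time` is a hypothetical helper, and the calculator coefficients are the rounded values quoted above.

```python
def weight_for_time(hours, m, b):
    """Solve hours = m*weight + b for weight (requires m != 0)."""
    return (hours - b) / m


eyeballed = weight_for_time(95, m=200 / 30, b=0)  # hand-drawn line y = (20/3)x
calculator = weight_for_time(95, m=6.1, b=5.9)    # regression line y = 6.1x + 5.9

print(round(eyeballed, 2), round(calculator, 1))  # 14.25 14.6
```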
Review
For problems 1-4, draw the scatter plot and find, by hand, an equation that fits the data set.
1. (57, 45); (65, 61); (34, 30); (87, 78); (42, 41); (35, 36); (59, 35); (61, 57); (25, 23); (35, 34)
2. (32, 43); (54, 61); (89, 94); (25, 34); (43, 56); (58, 67); (38, 46); (47, 56); (39, 48)
3. (12, 18); (5, 24); (15, 16); (11, 19); (9, 12); (7, 13); (6, 17); (12, 14)
4. (3, 12); (8, 20); (1, 7); (10, 23); (5, 18); (8, 24); (2, 10)
5. Use the graph from problem 1 to predict the y-values for two x-values of your choice that are not in the data set.
6. Use the graph from problem 2 to predict the x-values for two y-values of your choice that are not in the data set.
7. Use the equation from problem 3 to predict the y-values for two x-values of your choice that are not in the data set.
8. Use the equation from problem 4 to predict the x-values for two y-values of your choice that are not in the data set.
For problems 9-11, use a graphing calculator to find the equation of the line of best fit for the data set.
9. (57, 45); (65, 61); (34, 30); (87, 78); (42, 41); (35, 36); (59, 35); (61, 57); (25, 23); (35, 34)
10. (32, 43); (54, 61); (89, 94); (25, 34); (43, 56); (58, 67); (38, 46); (47, 56); (95, 105); (39, 48)
11. (12, 18); (3, 26); (5, 24); (15, 16); (11, 19); (0, 27); (9, 12); (7, 13); (6, 17); (12, 14)
12. Graph the best fit line on top of the scatter plot for problem 10. Then pick a data point that's close to the line, and change its y-value to move it much farther from the line.
(a) Calculate the new best fit line with that one point changed; write the equation of that line along with the coordinates of the new point.
(b) How much did the slope of the best fit line change when you changed that point?
13. Graph the scatter plot from problem 11 and change one point as you did in the previous problem.
(a) Calculate the new best fit line with that one point changed; write the equation of that line along with the coordinates of the new point.
(b) Did changing that one point seem to affect the slope of the best fit line more or less than it did in the previous problem? What might account for this difference?
14. Shiva is trying to beat the samosa-eating record. The current record is 53.5 samosas in 12 minutes. He practices each day, and the following table shows how many samosas he eats each day for the first week of his training.
Day   No. of samosas
1     30
2     34
3     36
4     36
5     40
6     43
7     45
(a) Draw a scatter plot and find an equation to fit the data.
(b) Will he be ready for the contest if it occurs two weeks from the day he started training?
(c) What are the meanings of the slope and the y-intercept in this problem?
15. Anne is trying to find the elasticity coefficient of a Superball. She drops the ball from different heights and measures the maximum height of the ball after the bounce. The table below shows the data she collected.
Initial height (cm)   Bounce height (cm)
30                    22
35                    26
40                    29
45                    34
50                    38
55                    40
60                    45
65                    50
70                    52
(a) Draw a scatter plot and find the equation.
(b) What height would she have to drop the ball from for it to bounce 65 cm?
(c) What are the meanings of the slope and the y-intercept in this problem?
(d) Does the y-intercept make sense? Why isn't it (0, 0)?
16. The following table shows the median California family income from 1995 to 2002 as reported by the US Census Bureau.
Year   Income
1995   53,807
1996   55,217
1997   55,209
1998   55,415
1999   63,100
2000   63,206
2001   63,761
2002   65,766
(a) Draw a scatter plot and find the equation.
(b) What would you expect the median annual income of a Californian family to be in year 2010?
(c) What are the meanings of the slope and the y-intercept in this problem?
(d) Inflation in the U.S. is measured by the Consumer Price Index, which increased by 20% between 1995 and 2002. Did the median income of California families keep up with inflation over that time period? (In other words, did it increase by at least 20%?)