Machine Learning
In the previous topic, we learned about Simple Linear Regression, where a single independent (predictor) variable X is used to model the response variable Y. In many cases, however, the response variable is affected by more than one predictor variable; for such cases, Multiple Linear Regression is used.
Multiple Linear Regression (MLR) is an extension of Simple Linear Regression: it takes more than one predictor variable to predict the response variable.
We can define it as:
“Multiple Linear Regression is one of the important regression algorithms which models the linear relationship
between a single dependent continuous variable and more than one independent variable.”
Example:
Predicting the CO2 emissions of a car from its engine size and number of cylinders.
Some key points about MLR:
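- For MLR, the dependent or target variable (Y) must be continuous (real-valued), but the predictor or independent variables may be continuous or categorical. Categorical predictors are usually encoded numerically, for example as dummy variables, before fitting.
- Each feature variable is assumed to have a linear relationship with the dependent variable.
- MLR fits a regression line through a multidimensional space of data points; with more than one predictor, this "line" is a plane or hyperplane.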
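The point about categorical predictors can be illustrated with a small sketch. One common way to feed a categorical variable into a linear model is dummy (one-hot) coding with one level dropped as the baseline. The `fuel_type` values and the helper name `dummy_encode` are hypothetical, introduced only for illustration; they do not come from the original text.

```python
def dummy_encode(values, baseline):
    """Map each categorical value to 0/1 indicator columns.

    The baseline level gets no column of its own (all zeros), which avoids
    a column that is perfectly predictable from the others.
    """
    levels = sorted(set(values) - {baseline})
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows


# Hypothetical categorical predictor (illustrative, not from the text).
fuel_type = ["petrol", "diesel", "petrol", "hybrid"]
levels, encoded = dummy_encode(fuel_type, baseline="petrol")
print(levels)   # ['diesel', 'hybrid']
print(encoded)  # [[0, 0], [1, 0], [0, 0], [0, 1]]
```

Each indicator column then acts as one more predictor x in the model, with its own coefficient measured relative to the baseline level.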
MLR equation:
In Multiple Linear Regression, the target variable (Y) is a linear combination of multiple predictor variables x1, x2, x3, ..., xn. Since MLR is an extension of Simple Linear Regression, the same form applies with one term per predictor, and the equation becomes:
$$Y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + \dots + b_n x_n \qquad \text{(a)}$$
Where:
- Y = output (response) variable
- b0, b1, b2, b3, ..., bn = coefficients of the model (b0 is the intercept)
- x1, x2, x3, ..., xn = the independent (feature) variables
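To make equation (a) concrete, here is a minimal sketch that estimates the coefficients and then evaluates the equation for a new observation. The text does not say how the coefficients are estimated; this sketch assumes ordinary least squares solved through the normal equations, written in plain Python. The function names and the data are illustrative assumptions: the numbers are synthetic, generated from Y = 50 + 30*x1 + 10*x2, and are not real emissions measurements.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are left untouched.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be perfectly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_mlr(xs, ys):
    """Return [b0, b1, ..., bn] minimising the squared error (normal equations)."""
    design = [[1.0] + list(row) for row in xs]  # leading 1 carries the intercept b0
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coeffs, x):
    """Evaluate Y = b0 + b1*x1 + ... + bn*xn for one observation."""
    return coeffs[0] + sum(b * xi for b, xi in zip(coeffs[1:], x))


# Synthetic data: (engine size in litres, number of cylinders) -> Y.
xs = [(1.0, 4), (2.0, 4), (2.0, 6), (3.0, 6), (3.5, 8)]
ys = [120.0, 150.0, 170.0, 200.0, 235.0]

coeffs = fit_mlr(xs, ys)
print([round(b, 6) for b in coeffs])
print(round(predict(coeffs, (2.5, 6)), 6))
```

Because the synthetic data contain no noise, the fitted coefficients should come out very close to 50, 30 and 10. In practice a numerical library would normally be used instead of a hand-written solver.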
Assumptions for Multiple Linear Regression:
- A linear relationship should exist between the target and the predictor variables.
- The regression residuals must be normally distributed.
- MLR assumes little or no multicollinearity (correlation between the independent variables) in the data.
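The multicollinearity assumption can be screened roughly by looking at the correlation between pairs of predictors. The sketch below computes the Pearson correlation coefficient in plain Python. The data are synthetic, and the 0.8 cut-off is an assumed rule of thumb rather than a value from the text; pairwise correlation is only a first check, and variance inflation factors give a more thorough picture.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Synthetic predictor columns (illustrative only).
engine_size = [1.0, 2.0, 2.0, 3.0, 3.5]
cylinders = [4, 4, 6, 6, 8]

r = pearson(engine_size, cylinders)
THRESHOLD = 0.8  # assumed rule-of-thumb cut-off, not taken from the text
print(f"r = {r:.3f}")
if abs(r) > THRESHOLD:
    print("predictors are strongly correlated; multicollinearity may be a concern")
```

A high correlation does not stop the model from being fitted, but it makes the individual coefficients harder to interpret and less stable.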