Machine Learning with Python – Basics
We are living in the ‘age of data’, which is enriched with greater computational power and
more storage resources. The volume of data grows day by day, but the real challenge is to
make sense of all of it. Businesses and organizations are trying to meet this challenge by
building intelligent systems using concepts and methodologies from data science, data
mining and machine learning. Among these, machine learning is one of the most exciting
fields of computer science. It would not be wrong to call machine learning the application
and science of algorithms that make sense of data.
What is Machine Learning?
Machine Learning (ML) is the field of computer science that enables computer systems to
make sense of data in much the same way as human beings do.
In simple words, ML is a type of artificial intelligence that extracts patterns from raw
data by using an algorithm or method. The main focus of ML is to allow computer systems to
learn from experience without being explicitly programmed and without human intervention.
Need for Machine Learning
Human beings are, at this moment, the most intelligent and advanced species on earth
because they can think, evaluate and solve complex problems. AI, on the other hand, is still
in its initial stage and has not surpassed human intelligence in many respects. The
question, then, is why we need to make machines learn. The most suitable reason is “to make
decisions, based on data, with efficiency and scale”.
Lately, organizations have been investing heavily in newer technologies such as Artificial
Intelligence, Machine Learning and Deep Learning to extract key information from data,
perform real-world tasks and solve problems. We can call these data-driven decisions taken
by machines, particularly to automate processes. Such data-driven decisions can be used in
place of hand-written programming logic for problems whose rules cannot be programmed
explicitly. The fact is that we cannot do without human intelligence, but we also need to
solve real-world problems efficiently and at a huge scale. That is why the need for
machine learning arises.
Why & When to Make Machines Learn?
We have already discussed the need for machine learning, but another question arises:
in what scenarios must we make the machine learn? There are several circumstances in
which we need machines to take data-driven decisions with efficiency and at a huge scale.
The following are some circumstances where making machines learn would be more effective:
Lack of human expertise
The first scenario in which we want a machine to learn and take data-driven decisions
is a domain where human expertise is lacking. Examples include navigation in unknown
territories or on other planets.
Dynamic scenarios
Some scenarios are dynamic in nature, i.e. they keep changing over time. For such
scenarios and behaviors, we want a machine to learn and take data-driven decisions.
Examples include network connectivity and the availability of infrastructure in an
organization.
Difficulty in translating expertise into computational tasks
There are various domains in which humans have expertise but are unable to translate it
into computational tasks. In such circumstances we want machines to learn. Examples
include speech recognition, cognitive tasks etc.
Machine Learning Model
Before discussing the machine learning model, we need to understand the following
formal definition of ML given by Professor Tom Mitchell:
“A computer program is said to learn from experience E with respect to some class of
tasks T and performance measure P, if its performance at tasks in T, as measured by P,
improves with experience E.”
The above definition focuses on three parameters, which are also the main components
of any learning algorithm: Task (T), Performance (P) and Experience (E). In this
context, we can simplify the definition as:
ML is a field of AI consisting of learning algorithms that:
Improve their performance (P)
At executing some task (T)
Over time with experience (E)
Based on the above, the following diagram represents a Machine Learning Model:
[Diagram: a machine learning model made up of Task (T), Performance (P) and Experience (E)]
Let us now discuss each of them in more detail:
Task (T)
From the perspective of the problem, we may define the task T as the real-world problem to
be solved. The problem can be anything, such as finding the best house price in a specific
location or finding the best marketing strategy. In machine learning terms, however, the
definition of a task is different, because ML-based tasks are difficult to solve with a
conventional programming approach.
A task T is said to be an ML-based task when it is defined by the process that the system
must follow to operate on data points. Examples of ML-based tasks are classification,
regression, structured annotation, clustering, transcription etc.
Experience (E)
As the name suggests, it is the knowledge gained from the data points provided to the
algorithm or model. Once provided with the dataset, the model runs iteratively and learns
some inherent pattern. The learning thus acquired is called experience (E). By analogy
with human learning, we can think of this as a human being learning or gaining experience
from various attributes such as situations, relationships etc. Supervised, unsupervised
and reinforcement learning are some of the ways to learn or gain experience. The
experience gained by our ML model or algorithm is used to solve the task T.
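To make the three components concrete, here is a minimal sketch in plain Python. Everything in it is an assumption chosen for illustration and not part of the original text: the task T is to decide whether a number is at least 50, the experience E is a small stream of labeled numbers, and the performance P is accuracy on the integers 0 to 99. The learner simply places its decision threshold halfway between the largest negative example and the smallest positive example it has seen.

```python
def true_label(x):
    # Hypothetical ground truth for the task T: positive when x is at least 50.
    return 1 if x >= 50 else 0


def fit_threshold(examples):
    # Learn from experience E: put the threshold midway between the largest
    # negative example and the smallest positive example seen so far.
    negatives = [x for x, y in examples if y == 0]
    positives = [x for x, y in examples if y == 1]
    lo = max(negatives) if negatives else 0
    hi = min(positives) if positives else 100
    return (lo + hi) / 2


def accuracy(threshold, test_points):
    # Performance P: fraction of test points classified correctly.
    correct = sum(
        1 for x in test_points
        if (1 if x >= threshold else 0) == true_label(x)
    )
    return correct / len(test_points)


# Illustrative stream of labeled examples (the experience E).
stream = [20, 96, 40, 70, 48, 54]
experience = [(x, true_label(x)) for x in stream]
test_points = list(range(100))

scores = []
for n in (2, 4, 6):
    threshold = fit_threshold(experience[:n])
    scores.append(accuracy(threshold, test_points))

print(scores)  # [0.92, 0.95, 0.99]
```

With this particular stream, accuracy rises from 0.92 to 0.99 as the learner sees more examples, which is what the definition means by performance improving with experience. A different stream could produce a temporary dip, so the sketch illustrates the idea and does not guarantee steady improvement.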
Performance (P)
An ML algorithm is supposed to perform a task and gain experience over time. The measure
that tells whether the ML algorithm is performing as expected is its performance (P). P is
basically a quantitative metric that tells how well a model is performing the task T using
its experience E. Many metrics help to assess ML performance, such as accuracy, F1 score,
confusion matrix, precision, recall (also called sensitivity) etc.
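As a rough illustration of how several of these metrics relate to one another, the following sketch computes accuracy, precision, recall and F1 score from the four cells of a binary confusion matrix. It uses only the Python standard library. The function names and the two label lists are hypothetical, made up for this example.

```python
def confusion_counts(y_true, y_pred):
    # Count the four cells of a binary confusion matrix (1 = positive class).
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels: 4 positives and 6 negatives, with one mistake of each kind.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

metrics = classification_metrics(y_true, y_pred)
print(confusion_counts(y_true, y_pred))  # (3, 1, 1, 5)
print(metrics)
```

Here accuracy is 0.8, while precision, recall and F1 are all 0.75. Accuracy counts every correct prediction, whereas precision and recall look only at the positive class, so the numbers can differ noticeably when the classes are imbalanced.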
Challenges in Machine Learning
While Machine Learning is rapidly evolving and making significant strides in cybersecurity
and autonomous cars, this segment of AI as a whole still has a long way to go. The reason
is that ML has not yet been able to overcome a number of challenges. The challenges
that ML currently faces are:
Quality of data: Having good-quality data for ML algorithms is one of the biggest
challenges. Low-quality data leads to problems in data preprocessing and feature
extraction.
Time-consuming tasks: Another challenge faced by ML models is the time consumed,
especially by data acquisition, feature extraction and retrieval.
Lack of specialists: As ML technology is still in its infancy, experts are hard to
find.
No clear objective for formulating business problems: The lack of a clear objective and
well-defined goal for business problems is another key challenge for ML, because the
technology is not yet mature.
Overfitting and underfitting: If the model overfits or underfits the training data, it
cannot represent the problem well.
Curse of dimensionality: Another challenge ML models face is data points with too many
features. This can be a real hindrance.
Difficulty in deployment: The complexity of ML models makes them quite difficult to
deploy in real life.
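One common way to see why too many features become a hindrance is sketched below. It assumes, purely for illustration, that data points are spread uniformly over a unit hypercube, and asks how long the edge of a sub-cube must be to contain a given fraction of the points. The function name is hypothetical.

```python
def edge_length_needed(fraction, dimensions):
    # A sub-cube with edge length e inside a unit hypercube has volume
    # e ** dimensions, so covering `fraction` of uniformly spread data
    # requires e = fraction ** (1 / dimensions).
    return fraction ** (1.0 / dimensions)


# Edge length needed to capture 10% of the data as the feature count grows.
for d in (1, 2, 10, 100):
    print(d, round(edge_length_needed(0.1, d), 3))
# 1 0.1
# 2 0.316
# 10 0.794
# 100 0.977
```

With one feature, a neighborhood covering 10% of the data spans 10% of the feature's range. With 100 features, it must span almost the entire range of every feature, so "nearby" points are no longer close in any meaningful sense, and far more data is needed to cover the space.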
Applications of Machine Learning
Machine Learning is one of the most rapidly growing technologies and, according to
researchers, we are in the golden years of AI and ML. It is used to solve many complex
real-world problems that cannot be solved with a traditional approach. The following are
some real-world applications of ML:
Emotion analysis
Sentiment analysis
Error detection and prevention
Weather forecasting and prediction
Stock market analysis and forecasting
Speech synthesis
Speech recognition
Customer segmentation
Object recognition
Fraud detection
Fraud prevention
Product recommendations for customers in online shopping