Translation of Deep Learning, Chapter 1: Introduction


Inventors have long dreamed of creating machines that think. Ancient Greek myths tell of intelligent objects, such as animated statues of human beings and tables that arrive full of food and drink when called.


When programmable computers were first conceived, people wondered whether they might become intelligent, over a hundred years before one was built (Lovelace, 1842). Today, artificial intelligence (AI) is a thriving field with many practical applications and active research topics. We look to intelligent software to automate routine labor, understand speech or images, make diagnoses in medicine, and support basic scientific research.

In the early days of artificial intelligence, the field rapidly tackled and solved problems that are intellectually difficult for human beings but relatively straightforward for computers, problems that can be described by a list of formal, mathematical rules. The true challenge to artificial intelligence proved to be solving the tasks that are easy for people to perform but hard for people to describe formally: problems that we solve intuitively, that feel automatic, like recognizing spoken words or faces in images.

This book is about a solution to these more intuitive problems. This solution is to allow computers to learn from experience and understand the world in terms of a hierarchy of concepts, with each concept defined in terms of its relation to simpler concepts. By gathering knowledge from experience, this approach avoids the need for human operators to formally specify all of the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones. If we draw a graph showing how these concepts are built on top of each other, the graph is deep, with many layers. For this reason, we call this approach to AI deep learning.


Many of the early successes of AI took place in relatively sterile and formal environments and did not require computers to have much knowledge about the world. For example, IBM’s Deep Blue chess-playing system defeated world champion Garry Kasparov in 1997 (Hsu, 2002). Chess is of course a very simple world, containing only sixty-four locations and thirty-two pieces that can move in only rigidly circumscribed ways. Devising a successful chess strategy is a tremendous accomplishment, but the challenge is not due to the difficulty of describing the set of chess pieces and allowable moves to the computer. Chess can be completely described by a very brief list of completely formal rules, easily provided ahead of time by the programmer.

Ironically, abstract and formal tasks that are among the most difficult mental undertakings for a human being are among the easiest for a computer. Computers have long been able to defeat even the best human chess player, but are only recently matching some of the abilities of average human beings to recognize objects or speech. A person’s everyday life requires an immense amount of knowledge about the world. Much of this knowledge is subjective and intuitive, and therefore difficult to articulate in a formal way. Computers need to capture this same knowledge in order to behave in an intelligent way. One of the key challenges in artificial intelligence is how to get this informal knowledge into a computer.


Several artificial intelligence projects have sought to hard-code knowledge about the world in formal languages. A computer can reason about statements in these formal languages automatically using logical inference rules. This is known as the knowledge base approach to artificial intelligence. None of these projects has led to a major success. One of the most famous such projects is Cyc (Lenat and Guha, 1989). Cyc is an inference engine and a database of statements in a language called CycL. These statements are entered by a staff of human supervisors. It is an unwieldy process. People struggle to devise formal rules with enough complexity to accurately describe the world. For example, Cyc failed to understand a story about a person named Fred shaving in the morning (Linde, 1992). Its inference engine detected an inconsistency in the story: it knew that people do not have electrical parts, but because Fred was holding an electric razor, it believed the entity “FredWhileShaving” contained electrical parts. It therefore asked whether Fred was still a person while he was shaving.

The difficulties faced by systems relying on hard-coded knowledge suggest that AI systems need the ability to acquire their own knowledge, by extracting patterns from raw data. This capability is known as machine learning. The introduction of machine learning allowed computers to tackle problems involving knowledge of the real world and make decisions that appear subjective. A simple machine learning algorithm called logistic regression can determine whether to recommend cesarean delivery (Mor-Yosef et al., 1990). A simple machine learning algorithm called naive Bayes can separate legitimate e-mail from spam e-mail.

The performance of these simple machine learning algorithms depends heavily on the representation of the data they are given. For example, when logistic regression is used to recommend cesarean delivery, the AI system does not examine the patient directly. Instead, the doctor tells the system several pieces of relevant information, such as the presence or absence of a uterine scar. Each piece of information included in the representation of the patient is known as a feature. Logistic regression learns how each of these features of the patient correlates with various outcomes. However, it cannot influence the way that the features are defined in any way. If logistic regression were given an MRI scan of the patient, rather than the doctor’s formalized report, it would not be able to make useful predictions. Individual pixels in an MRI scan have negligible correlation with any complications that might occur during delivery.
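As a rough illustration of learning from hand-specified features, here is a minimal logistic regression sketch trained by plain gradient descent. The two binary features and the four records are invented purely for illustration; they are not clinical data and are not taken from the cited study. The function names are hypothetical as well.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(weights, bias, x):
    """Probability of the positive label for one feature vector."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train_logistic_regression(features, labels, learning_rate=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n = len(features)
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            error = predict_probability(weights, bias, x) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


# Invented toy records: each row is [feature_a_present, feature_b_present].
# The label is 1 whenever at least one feature is present (an assumption
# made only so that the toy data is linearly separable).
FEATURES = [[0, 0], [1, 0], [0, 1], [1, 1]]
LABELS = [0, 1, 1, 1]

WEIGHTS, BIAS = train_logistic_regression(FEATURES, LABELS)

if __name__ == "__main__":
    for x in FEATURES:
        print(x, round(predict_probability(WEIGHTS, BIAS, x), 3))
```

The model only ever sees the feature values it is handed. It learns one weight per feature, which is why it has no way to redefine the features themselves.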


This dependence on representations is a general phenomenon that appears throughout computer science and even daily life. In computer science, operations such as searching a collection of data can proceed exponentially faster if the collection is structured and indexed intelligently. People can easily perform arithmetic on Arabic numerals, but find arithmetic on Roman numerals much more time-consuming. It is not surprising that the choice of representation has an enormous effect on the performance of machine learning algorithms. For a simple visual example, see Fig. 1.1.
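The effect of representation can be sketched with a small experiment. The data below is an assumption made for illustration: two classes of points lying on concentric rings. A rule that thresholds a single coordinate cannot separate them in Cartesian coordinates, but the same kind of rule succeeds once the points are re-expressed in polar coordinates.

```python
import math


def make_rings(points_per_ring=12):
    """Two classes of points on concentric circles (radius 1 and radius 3)."""
    points, labels = [], []
    for label, radius in ((0, 1.0), (1, 3.0)):
        for k in range(points_per_ring):
            angle = 2 * math.pi * k / points_per_ring
            points.append((radius * math.cos(angle), radius * math.sin(angle)))
            labels.append(label)
    return points, labels


def to_polar(point):
    """Convert (x, y) to (radius, angle)."""
    x, y = point
    return (math.hypot(x, y), math.atan2(y, x))


def best_threshold_accuracy(values, labels):
    """Best accuracy of any rule 'predict 1 when value > t' or its mirror image."""
    n = len(values)
    candidates = sorted(set(values))
    thresholds = [candidates[0] - 1.0] + candidates
    best = 0.0
    for t in thresholds:
        correct = sum((v > t) == bool(y) for v, y in zip(values, labels))
        # (n - correct) is the score of the mirrored rule 'predict 1 when value <= t'.
        best = max(best, correct / n, (n - correct) / n)
    return best


POINTS, LABELS = make_rings()
CARTESIAN_X_ACCURACY = best_threshold_accuracy([p[0] for p in POINTS], LABELS)
POLAR_RADIUS_ACCURACY = best_threshold_accuracy([to_polar(p)[0] for p in POINTS], LABELS)

if __name__ == "__main__":
    print("threshold on x:     ", CARTESIAN_X_ACCURACY)
    print("threshold on radius:", POLAR_RADIUS_ACCURACY)
```

The learning rule is identical in both cases. Only the representation of the input changes.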



Many artificial intelligence tasks can be solved by designing the right set of features to extract for that task, then providing these features to a simple machine learning algorithm. For example, a useful feature for speaker identification from sound is an estimate of the size of the speaker’s vocal tract. It therefore gives a strong clue as to whether the speaker is a man, woman, or child.

However, for many tasks, it is difficult to know what features should be extracted. For example, suppose that we would like to write a program to detect cars in photographs. We know that cars have wheels, so we might like to use the presence of a wheel as a feature. Unfortunately, it is difficult to describe exactly what a wheel looks like in terms of pixel values. A wheel has a simple geometric shape but its image may be complicated by shadows falling on the wheel, the sun glaring off the metal parts of the wheel, the fender of the car or an object in the foreground obscuring part of the wheel, and so on.

One solution to this problem is to use machine learning to discover not only the mapping from representation to output but also the representation itself. This approach is known as representation learning. Learned representations often result in much better performance than can be obtained with hand-designed representations. They also allow AI systems to rapidly adapt to new tasks, with minimal human intervention. A representation learning algorithm can discover a good set of features for a simple task in minutes, or a complex task in hours to months. Manually designing features for a complex task requires a great deal of human time and effort; it can take decades for an entire community of researchers.

The quintessential example of a representation learning algorithm is the autoencoder. An autoencoder is the combination of an encoder function that converts the input data into a different representation, and a decoder function that converts the new representation back into the original format. Autoencoders are trained to preserve as much information as possible when an input is run through the encoder and then the decoder, but are also trained to make the new representation have various nice properties. Different kinds of autoencoders aim to achieve different kinds of properties.
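The encoder and decoder pairing can be sketched with a deliberately tiny linear autoencoder. This is a toy under stated assumptions, not a practical architecture: the data is five invented 2-D points lying on the line y = 2x, the code is a single number, both maps are linear, and training is plain gradient descent on squared reconstruction error with hand-picked starting values. Practical autoencoders are usually nonlinear and add further constraints on the code.

```python
def encode(encoder, point):
    """Encoder: compress a 2-D point into a single number (the code)."""
    return encoder[0] * point[0] + encoder[1] * point[1]


def decode(decoder, code):
    """Decoder: map the 1-D code back to a 2-D point."""
    return (decoder[0] * code, decoder[1] * code)


def reconstruction_error(encoder, decoder, data):
    """Mean squared distance between each point and its reconstruction."""
    total = 0.0
    for point in data:
        rx, ry = decode(decoder, encode(encoder, point))
        total += (rx - point[0]) ** 2 + (ry - point[1]) ** 2
    return total / len(data)


def train_autoencoder(data, learning_rate=0.05, steps=2000):
    """Gradient descent on the reconstruction error of a linear autoencoder."""
    encoder = [0.1, 0.1]  # arbitrary small starting values (an assumption)
    decoder = [0.1, 0.1]
    n = len(data)
    for _ in range(steps):
        grad_enc = [0.0, 0.0]
        grad_dec = [0.0, 0.0]
        for point in data:
            code = encode(encoder, point)
            recon = decode(decoder, code)
            err = (recon[0] - point[0], recon[1] - point[1])
            # Chain rule: the error reaches the encoder through the code.
            d_code = 2 * (err[0] * decoder[0] + err[1] * decoder[1])
            for j in range(2):
                grad_dec[j] += 2 * err[j] * code / n
                grad_enc[j] += d_code * point[j] / n
        encoder = [e - learning_rate * g for e, g in zip(encoder, grad_enc)]
        decoder = [d - learning_rate * g for d, g in zip(decoder, grad_dec)]
    return encoder, decoder


# Invented toy data: points on the line y = 2x, so one number suffices per point.
DATA = [(t, 2 * t) for t in (-1.0, -0.5, 0.0, 0.5, 1.0)]
INITIAL_ERROR = reconstruction_error([0.1, 0.1], [0.1, 0.1], DATA)
ENCODER, DECODER = train_autoencoder(DATA)
FINAL_ERROR = reconstruction_error(ENCODER, DECODER, DATA)

if __name__ == "__main__":
    print("error before training:", INITIAL_ERROR)
    print("error after training: ", FINAL_ERROR)
```

Because the data has only one underlying degree of freedom, the one-number code can preserve essentially all of the information, which is the property the training objective rewards.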


When designing features or algorithms for learning features, our goal is usually to separate the factors of variation that explain the observed data. In this context, we use the word “factors” simply to refer to separate sources of influence; the factors are usually not combined by multiplication. Such factors are often not quantities that are directly observed. Instead, they may exist either as unobserved objects or unobserved forces in the physical world that affect observable quantities. They may also exist as constructs in the human mind that provide useful simplifying explanations or inferred causes of the observed data. They can be thought of as concepts or abstractions that help us make sense of the rich variability in the data.

When analyzing a speech recording, the factors of variation include the speaker’s age, their sex, their accent and the words that they are speaking. When analyzing an image of a car, the factors of variation include the position of the car, its color, and the angle and brightness of the sun.

A major source of difficulty in many real-world artificial intelligence applications is that many of the factors of variation influence every single piece of data we are able to observe. The individual pixels in an image of a red car might be very close to black at night. The shape of the car’s silhouette depends on the viewing angle. Most applications require us to disentangle the factors of variation and discard the ones that we do not care about.


Of course, it can be very difficult to extract such high-level, abstract features from raw data. Many of these factors of variation, such as a speaker’s accent, can be identified only using sophisticated, nearly human-level understanding of the data. When it is nearly as difficult to obtain a representation as to solve the original problem, representation learning does not, at first glance, seem to help us.


Deep learning solves this central problem in representation learning by introducing representations that are expressed in terms of other, simpler representations. Deep learning allows the computer to build complex concepts out of simpler concepts. Fig. 1.2 shows how a deep learning system can represent the concept of an image of a person by combining simpler concepts, such as corners and contours, which are in turn defined in terms of edges.

The quintessential example of a deep learning model is the feedforward deep network or multilayer perceptron (MLP). A multilayer perceptron is just a mathematical function mapping some set of input values to output values. The function is formed by composing many simpler functions. We can think of each application of a different mathematical function as providing a new representation of the input.
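The idea of a network as a composition of simpler functions can be made concrete with a minimal sketch. The helper names below are hypothetical, and the weights are set by hand rather than learned; they are chosen so that the composed function computes XOR, which no single affine layer followed by a threshold can do.

```python
def affine(weights, biases):
    """Return a function computing weights @ inputs + biases for one layer."""
    def layer(inputs):
        return [
            sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)
        ]
    return layer


def relu(values):
    """Elementwise rectified linear unit."""
    return [max(0.0, v) for v in values]


def compose(*functions):
    """Chain functions left to right: compose(f, g)(x) == g(f(x))."""
    def composed(x):
        for f in functions:
            x = f(x)
        return x
    return composed


# Hand-set weights (not learned) for a two-layer network that computes XOR.
hidden = affine([[1.0, 1.0], [1.0, 1.0]], [0.0, -1.0])
output = affine([[1.0, -2.0]], [0.0])
xor_network = compose(hidden, relu, output)

if __name__ == "__main__":
    for pair in ([0, 0], [0, 1], [1, 0], [1, 1]):
        print(pair, xor_network(pair))
```

Each intermediate list, such as the output of `hidden` or of `relu`, is a new representation of the same input. For the input [1, 1], the hidden layer produces [2.0, 1.0], and the output layer maps that to [0.0].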

The idea of learning the right representation for the data provides one perspective on deep learning. Another perspective on deep learning is that it allows the computer to learn a multi-step computer program. Each layer of the representation can be thought of as the state of the computer’s memory after executing another set of instructions in parallel. Networks with greater depth can execute more instructions in sequence. Being able to execute instructions sequentially offers great power because later instructions can refer back to the results of earlier instructions. According to this view of deep learning, not all of the information in a layer’s representation of the input necessarily encodes factors of variation that explain the input. The representation is also used to store state information that helps to execute a program that can make sense of the input. This state information could be analogous to a counter or pointer in a traditional computer program. It has nothing to do with the content of the input specifically, but it helps the model to organize its processing.


There are two main ways of measuring the depth of a model. The first view is based on the number of sequential instructions that must be executed to evaluate the architecture. We can think of this as the length of the longest path through a flow chart that describes how to compute each of the model’s outputs given its inputs. Just as two equivalent computer programs will have different lengths depending on which language the program is written in, the same function may be drawn as a flowchart with different depths depending on which functions we allow to be used as individual steps in the flowchart. Fig. 1.3 illustrates how this choice of language can give two different measurements for the same architecture.
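The longest-path notion of depth can be sketched in a few lines. The example below is an illustration under assumptions: the graph encoding (a dict from each operation to the nodes it reads) and the node names are hypothetical, and the model is a two-input logistic regression drawn once with multiplication, addition, and sigmoid as separate steps, and once as a single step.

```python
from functools import lru_cache


def graph_depth(graph, inputs):
    """Length of the longest input-to-output path, counted in operations.

    `graph` maps each operation node to the list of nodes it reads from;
    `inputs` is the set of input node names, which have depth 0.
    """
    @lru_cache(maxsize=None)
    def depth(node):
        if node in inputs:
            return 0
        return 1 + max(depth(parent) for parent in graph[node])

    return max(depth(node) for node in graph)


INPUTS = frozenset({"x1", "w1", "x2", "w2"})

# Fine-grained language: multiply, add, and sigmoid are the individual steps.
FINE_GRAINED = {
    "x1*w1": ["x1", "w1"],
    "x2*w2": ["x2", "w2"],
    "sum": ["x1*w1", "x2*w2"],
    "sigmoid": ["sum"],
}

# Coarse language: logistic regression itself is a single step.
COARSE = {
    "logistic_regression": ["x1", "w1", "x2", "w2"],
}

if __name__ == "__main__":
    print("fine-grained depth:", graph_depth(FINE_GRAINED, INPUTS))  # 3
    print("coarse depth:      ", graph_depth(COARSE, INPUTS))        # 1
```

Both graphs describe the same function, so the measured depth reflects the chosen set of elementary steps as much as it reflects the model.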

Another approach, used by deep probabilistic models, regards the depth of a model as being not the depth of the computational graph but the depth of the graph describing how concepts are related to each other. In this case, the depth of the flowchart of the computations needed to compute the representation of each concept may be much deeper than the graph of the concepts themselves. This is because the system’s understanding of the simpler concepts can be refined given information about the more complex concepts. For example, an AI system observing an image of a face with one eye in shadow may initially only see one eye. After detecting that a face is present, it can then infer that a second eye is probably present as well. In this case, the graph of concepts includes only two layers (a layer for eyes and a layer for faces), but the graph of computations includes 2n layers if we refine our estimate of each concept given the other n times.

Because it is not always clear which of these two views (the depth of the computational graph, or the depth of the probabilistic modeling graph) is most relevant, and because different people choose different sets of smallest elements from which to construct their graphs, there is no single correct value for the depth of an architecture, just as there is no single correct value for the length of a computer program. Nor is there a consensus about how much depth a model requires to qualify as “deep.” However, deep learning can safely be regarded as the study of models that involve a greater amount of composition of either learned functions or learned concepts than traditional machine learning does.


To summarize, deep learning, the subject of this book, is an approach to AI. Specifically, it is a type of machine learning, a technique that allows computer systems to improve with experience and data. According to the authors of this book, machine learning is the only viable approach to building AI systems that can operate in complicated, real-world environments. Deep learning is a particular kind of machine learning that achieves great power and flexibility by learning to represent the world as a nested hierarchy of concepts, with each concept defined in relation to simpler concepts, and more abstract representations computed in terms of less abstract ones. Fig. 1.4 illustrates the relationship between these different AI disciplines. Fig. 1.5 gives a high-level schematic of how each works.