History of Machine Learning

anonymous


It is likely that you learned in high school that the history of science is a long one--important modern theories (such as gravity) are hundreds of years old, and some mathematical theorems were discovered millennia ago. Among the many fields of scientific study, Computer Science and its related disciplines are often thought to be very new, having surfaced only within the past 30 or so years. While this is largely true, the foundations of computing go back a very long way. In this article, I will discuss the field of machine learning as it relates to computer science--its history, and its eventual climb from obscurity into one of the hottest research topics of today.

Machine learning is essentially a name for a set of algorithms that fit functions to complex data in order to make predictions, rather than being explicitly programmed by humans to do so. As such, the field relies heavily on statistical and mathematical methods first developed in the late 1700s and early 1800s. Bayes' Theorem, first worked toward in Thomas Bayes' An Essay towards solving a Problem in the Doctrine of Chances (1763), was fully realized in 1812 in Pierre-Simon Laplace's Théorie Analytique des Probabilités. Likewise, least squares, one of the many methods of fitting data used in machine learning, was first published in Nouvelles méthodes pour la détermination des orbites des comètes, by Adrien-Marie Legendre.
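To make the idea of fitting a function to data concrete, here is a minimal sketch of ordinary least squares in Python using NumPy; the data, the line, and the noise level are invented purely for illustration and are not tied to any of the historical works above.

```python
import numpy as np

# Illustrative noisy data drawn around the line y = 2x + 1
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=x.shape)

# Ordinary least squares: find the slope and intercept that minimize
# the sum of squared errors between the line and the data.
A = np.column_stack([x, np.ones_like(x)])   # design matrix [x, 1]
(slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)

print(f"fitted line: y = {slope:.2f}x + {intercept:.2f}")
```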

Pierre-Simon Laplace

Pierre-Simon Laplace (source: Wikipedia)

In the 1940s and 1950s, several machine-learning algorithms were developed. Neural networks, a common paradigm in modern machine learning in which a function is represented as a collection of interconnected neurons, were first conceived in 1943--the first paper discussing the idea was A logical calculus of the ideas immanent in nervous activity, by Warren McCulloch and Walter Pitts. In this paper, neural networks are presented as a model based on biology--the authors make an astute point that is the foundation of this branch of machine learning: a single neuron is a very simple mathematical function, and the complexity that makes neural networks worthwhile comes from combining a large number of these neurons and letting them interact. This foundational paper was followed by the first real neural network machine, SNARC, built by Marvin Minsky and Dean Edmonds in 1951 (Kurenkov). In 1952, Arthur Samuel at IBM created one of the first programs that could learn on its own, which played checkers (Press). The perceptron, a model that fits a decision boundary to classify data into different groups, was invented by Frank Rosenblatt in 1957 at Cornell (History of the Perceptron). Today it is common to combine the two computational models above--that is, to create a neural network in which each "neuron" is a perceptron--into what is called a "multi-layer perceptron."
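To illustrate the point that a single neuron is just a simple mathematical function, the sketch below implements one perceptron-style neuron in Python; the threshold activation and the weights (chosen here to compute a logical AND) are illustrative assumptions, not details drawn from the papers above.

```python
import numpy as np

def perceptron(inputs, weights, bias):
    """A single neuron: a weighted sum of the inputs passed through a threshold."""
    return 1 if np.dot(weights, inputs) + bias > 0 else 0

# Illustrative weights that make this neuron behave like a logical AND gate
weights = np.array([1.0, 1.0])
bias = -1.5

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(f"{a} AND {b} -> {perceptron(np.array([a, b]), weights, bias)}")
```

Stacking many such neurons in layers, with the output of one layer feeding the next, gives the multi-layer perceptron described above.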

A simple neural network

A simple neural network. (Source: Wikipedia)

The 1960s and '70s brought a host of new discoveries as well. It is fascinating that neural networks were created and studied before probabilistic (inductive) inference, a framework that underlies much of how such models are built and used, was formalized by Ray Solomonoff in 1964 and published in his paper A Formal Theory of Inductive Inference. The k-nearest neighbor rule, though first proposed by Fix and Hodges in 1951, was fully worked out by T. Cover and P. Hart in their 1967 paper Nearest neighbor pattern classification (Who Invented the Nearest Neighbor Rule). Finally, backpropagation, an algorithm fundamental to training neural networks and other statistical models, was discovered by Seppo Linnainmaa and published in The representation of the cumulative rounding error of an algorithm as a Taylor expansion of the local rounding errors in 1970. Unfortunately, for the rest of the 1970s, neural networks were largely set aside by scholars (Morrison), in part because of the popularity of the von Neumann architecture, which was used in place of neural networks for much of the decade due to structural similarities (Morrison).
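As a rough sketch of the nearest-neighbor rule analyzed by Cover and Hart, the snippet below classifies a query point by a majority vote among its k closest training points; the tiny dataset and the choice of k are made up for illustration.

```python
import numpy as np
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    distances = np.linalg.norm(train_x - query, axis=1)
    nearest = np.argsort(distances)[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Tiny illustrative dataset: two clusters, labeled 0 and 1
train_x = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
train_y = np.array([0, 0, 1, 1])

print(knn_predict(train_x, train_y, np.array([0.2, 0.1])))  # expected label: 0
```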

The early 1980s brought back some interest in neural networks, beginning with the Hopfield network, a recurrent neural network model with a form of memory over time, introduced by John Hopfield in 1982 (Hopfield Network). However, for much of the rest of the '80s and '90s, neural networks once again fell out of fashion, partly because training reliable neural networks was computationally expensive at the time. Other forms of machine learning, however, continued to be studied and improved upon. In 1986, Terry Sejnowski created a program that learned to pronounce words the way a baby does (Timeline of Machine Learning). The foundations of reinforcement learning were advanced by Christopher Watkins' invention of Q-learning, and the first commercial software package to use machine learning algorithms, Evolver, was released in 1989 by Axcelis, Inc. (Timeline of Machine Learning, Watkins).

The Evolver logo

The modern logo for Evolver. (Source: Palisade.com)

The "modern" era of machine learning, which has largely been dominated by research in neural networks, arguably started in 1997, with the discovery of LSTM (long short-term memory) neural networks, by Sepp Hochreiter and Jürgen Schmidhuber (Olah). These networks, unlike those of the past, have a sense of long and short-term memory, which can aid in training for certain kinds of classification tasks (Olah). In 1998, the MNIST database, a collection of handwritten digits that have widely been used for benchmarking different classification algorithms from its conception, was created in 1998 (LeCun). The late 2000s and 2010s saw neural networks finally begin to come mainstream, starting with ImageNet, created in 2009, which provided a massive database of images for training models to recognize thousands of objects. AlexNet, a neural network created in 2012, was the first model to achieve an error of 15.3% on this dataset--more than 10.8 points higher than any other model at the time (Nayak). Models since have improved on these numbers. Since then, modern networks such as Google PixelRNN, PixelCNN, and WaveNet (2016), have pushed the bounds of what machine learning can accomplish--all generate complex images or audio. One specific application of Google Wavenet is text to speech--the network is so accurate that it may be difficult for some people to discern between output of this network from a real voice (WaveNet: A Generative Model for Raw Audio).

WaveNet diagram

A demonstration of a WaveNet (Source: DeepMind.com, Google)

Machine learning has come a long way since its inception over 70 years ago. Yet with modern advancements in computational power and neural network architectures, it surely has not come close to reaching its full potential. The next few years of research and discovery will likely grow ever more exciting, as we take on complex computational tasks that many people thought impossible only a few years ago.

References

  • History of the Perceptron, CSU Long Beach, web.csulb.edu/~cwallis/artificialn/History.htm
  • "Hopfield Network." Scholarpedia, www.scholarpedia.org/article/Hopfield_network
  • Kurenkov, Andrey. "A 'Brief' History of Neural Nets and Deep Learning." Andrey Kurenkov's Web World, www.andreykurenkov.com/writing/ai/a-brief-history-of-neural-nets-and-deep-learning/
  • LeCun, Yann, et al. "THE MNIST DATABASE." MNIST Handwritten Digit Database, Yann LeCun, Corinna Cortes and Chris Burges, yann.lecun.com/exdb/mnist/
  • Morrison, Jack, et al. "History of Machine Learning." AI in Radiology, 2018, www.doc.ic.ac.uk/~jce317/history-machine-learning.html
  • Nayak, Sunita. "Understanding AlexNet." Learn OpenCV, Big Vision LLC, 13 June 2018, www.learnopencv.com/understanding-alexnet/
  • Olah, Christopher. "Understanding LSTM Networks." Understanding LSTM Networks -- Colah's Blog, 27 Aug. 2015, colah.github.io/posts/2015-08-Understanding-LSTMs/
  • Palisade. "Maker of Risk & Decision Analysis Software Using Monte Carlo Simulation." Palisade, Palisade, www.palisade.com
  • "Pierre-Simon Laplace." Wikipedia, Wikimedia Foundation, 15 Mar. 2019, en.wikipedia.org/wiki/Pierre-Simon_Laplace
  • Press, Gil. "A Very Short History Of Artificial Intelligence (AI)." Forbes, Forbes Magazine, 30 Dec. 2016, www.forbes.com/sites/gilpress/2016/12/30/a-very-short-history-of-artificial-intelligence-ai/#f311a7c6fba2
  • "Timeline of Machine Learning." Wikipedia, Wikimedia Foundation, 17 Jan. 2019, en.wikipedia.org/wiki/Timeline_of_machine_learning
  • Watkins, Christopher, and Peter Dayan. "Technical Note: Q-Learning." Machine Learning, vol. 8, 1992, pp. 279–292., www.gatsby.ucl.ac.uk/~dayan/papers/cjch.pdf
  • "WaveNet: A Generative Model for Raw Audio." DeepMind, Google, Inc., 2016, deepmind.com/blog/wavenet-generative-model-raw-audio/
  • "Who Invented the Nearest Neighbor Rule?" Pattern Recognition Tools, 37steps, 28 Jan. 2014, https://37steps.com/4370/nn-rule-invention