Sven Kreiss

AncientML 1

AncientML is a series of paper reading notes. The purpose is to review outstanding contributions to machine learning that were valuable to its formation as an academic field.

Some rules about the papers:

  • have at least 500 citations
  • be sufficiently old so that interest in them cannot be considered a conflict for industry ML researchers and engineers
  • have had impact on academia so that they would be considered valuable to teach

These notes are not supposed to be summaries but rather to inspire reading the papers themselves and discussing them in person.

A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence, August 31, 1955 (McCarthy et al., 2006), PDF

  • The paper/event that gets credited with the foundation of the field of Artificial Intelligence research.
  • The paper is three pages long and the authors include Claude Shannon.
  • scale of the proposed project: 2 months, 10 men
  • focused on language, abstraction and concepts
  • identifies seven areas to improve: Automatic Computers, How Can a Computer be Programmed to Use a Language, Neuron Nets, Theory of the Size of a Calculation, Self-Improvement, Abstractions, Randomness and Creativity
  • “the major obstacle is not lack of machine capacity, but our inability to write programs”
  • There is a Wikipedia article on the Dartmouth workshop.
  • 102 pages of Ray Solomonoff’s handwritten notes, including some doodles on page 3.

The Mathematical Theory of Communication (Shannon et al., 1951), PDF

  • Central paper for many fields. 90 pages (skip the part by Weaver).
  • The Idea Factory (Gertner, 2012) is a book about Bell Labs around that time.
  • Khinchin (1957) is a book that discusses this paper.
  • p.49: information is not attached to a particular message but to the amount of freedom of choice
  • p.49: “decomposition of choice” is a beautiful requirement for \(H\) and, together with the other two requirements, leads to a unique form for \(H\)
  • p.50: simple example to visualize the connection between probability of a message and information is shown in the figure below
  • p.53: origin of the terms of the form \(p_i\log{}p_i\)
  • p.56: relative entropy, maximum possible compression, redundancy
  • p.70: capacity of a noisy channel; includes a max() over all possible information sources
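
The notes on p.49, p.56 and p.70 can be made concrete with a few lines of Python. This is only a minimal sketch and not code from the paper: the function names `entropy`, `redundancy` and `bsc_capacity` are made up here, and the binary symmetric channel is an assumed toy channel chosen to illustrate the max over information sources. The decomposition check uses the probabilities 1/2, 1/3, 1/6 from the paper's own decomposition-of-choice example.

```python
import math


def entropy(probs):
    """Shannon entropy H = -sum p_i log2 p_i in bits; p_i = 0 terms contribute 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Decomposition of choice: choose among (1/2, 1/3, 1/6) directly, or first
# flip a fair coin and then, half of the time, choose with (2/3, 1/3).
direct = entropy([1/2, 1/3, 1/6])
staged = entropy([1/2, 1/2]) + 1/2 * entropy([2/3, 1/3])


def redundancy(probs):
    """One minus the relative entropy H / H_max, with H_max = log2(#symbols)."""
    return 1 - entropy(probs) / math.log2(len(probs))


def bsc_capacity(flip_prob, steps=1000):
    """Capacity in bits per use of a binary symmetric channel (assumed example).

    Brute-force max over input sources q = P(X = 1) of H(Y) - H(Y | X).
    """
    noise = entropy([flip_prob, 1 - flip_prob])  # H(Y | X), independent of q
    best = 0.0
    for k in range(steps + 1):
        q = k / steps
        p_y1 = q * (1 - flip_prob) + (1 - q) * flip_prob
        best = max(best, entropy([p_y1, 1 - p_y1]) - noise)
    return best
```

Here `direct` and `staged` agree up to floating point error (about 1.459 bits), which is the decomposition requirement in action. For the toy channel, the grid search lands on the uniform source, so `bsc_capacity(0.1)` matches `1 - entropy([0.1, 0.9])`.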


Similar resources: @shakir_za tweets a series called “Sunday Classic Paper”.


Jon Louis Bentley. Multidimensional binary search trees used for associative searching. Communications of the ACM, 18(9):509–517, 1975.

Jon Gertner. The Idea Factory: Bell Labs and the great age of American innovation. Penguin Press, New York, 2012. ISBN 978-0143122791.

Alex Graves. Supervised sequence labelling. In Supervised sequence labelling with recurrent neural networks, pages 5–13. Springer, 2012.

João F Henriques, Rui Caseiro, Pedro Martins, and Jorge Batista. High-speed tracking with kernelized correlation filters. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(3):583–596, 2015.

Geoffrey E Hinton and Ruslan R Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507, 2006.

A Khinchin. Mathematical foundations of information theory. Dover Publications, New York, 1957. ISBN 978-0486604343.

John McCarthy, Marvin L Minsky, Nathaniel Rochester, and Claude E Shannon. A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence, August 31, 1955. AI magazine, 27(4):12, 2006.

Claude E Shannon, Warren Weaver, and Arthur W Burks. The Mathematical Theory of Communication. The University of Illinois Press, 1951.

Paul Smolensky. Information processing in dynamical systems: foundations of harmony theory. Technical Report, Colorado University at Boulder Dept of Computer Science, 1986.

Martin Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the 20th International Conference on Machine Learning (ICML-03), pages 928–936, 2003.
