AncientML is a series of paper reading notes. The purpose is to review outstanding contributions to machine learning that were valuable to its formation as an academic field.

Some rules about the papers:

- have at least 500 citations
- be sufficiently old so that interest in them cannot be considered a conflict for industry ML researchers and engineers
- have had impact on academia so that they would be considered valuable to teach

These notes are not meant to be summaries; they are meant to inspire reading the papers themselves and discussing them in person.

## A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence, August 31, 1955 (McCarthy et al., 2006), PDF

- The paper/event that gets credited with the foundation of the field of Artificial Intelligence research.
- The paper is three pages long and the authors include Claude Shannon.
- scale of the proposed project: 2 months, 10 men
- focused on language, abstraction and concepts
- identifies seven areas to improve: Automatic Computers, How Can a Computer be Programmed to Use a Language, Neuron Nets, Theory of the Size of a Calculation, Self-Improvement, Abstractions, Randomness and Creativity
- “the major obstacle is not lack of machine capacity, but our inability to write programs”
- There is a Wikipedia article on the Dartmouth workshop.
- 102 pages of Ray Solomonoff’s handwritten notes, including some doodles on page 3.

## The Mathematical Theory of Communication (Shannon et al., 1951), PDF

- Central paper for many fields. 90 pages (skip the part by Weaver).
- *The Idea Factory* (Gertner, 2012) is a book about Bell Labs around that time.
- Khinchin (1957) is a book that discusses this paper.
- p.49: *information* is not attached to a particular message but to the amount of freedom of choice
- p.49: “decomposition of choice” is a beautiful requirement for \(H\), and leads with the other two requirements to a unique form for \(H\)
- p.50: simple example to visualize the connection between probability of a message and information is shown in the figure below
- p.53: origin for terms of the form \(p_i\log{}p_i\)
- p.56: relative entropy, maximum possible compression, redundancy
- p.70: capacity of a noisy channel; includes a `max()` over all possible information sources
- …
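
The quantities in the notes above can be tried out numerically. The following is a minimal sketch, not code from the book: the function names (`entropy`, `redundancy`, `bsc_mutual_information`, `bsc_capacity`) are hypothetical. The binary symmetric channel is an assumption chosen because it is the simplest noisy channel, and the grid search stands in for the analytical `max()` over sources. The decomposition check uses the probabilities 1/2, 1/3, 1/6.

```python
import math


def entropy(probs):
    """Shannon entropy H = -sum(p_i * log2(p_i)) in bits.

    Terms with p_i == 0 are skipped, following the convention 0 * log 0 = 0.
    """
    return -sum(p * math.log2(p) for p in probs if p > 0)


def redundancy(probs):
    """One minus the relative entropy H / H_max, where H_max = log2(n).

    Assumes at least two symbols, otherwise H_max would be zero.
    """
    return 1.0 - entropy(probs) / math.log2(len(probs))


def bsc_mutual_information(q, flip):
    """H(Y) - H(Y|X) for a binary symmetric channel (an assumed example).

    q is the source probability P(X=1); flip is the bit-flip probability.
    """
    p_y1 = q * (1 - flip) + (1 - q) * flip
    return entropy([p_y1, 1 - p_y1]) - entropy([flip, 1 - flip])


def bsc_capacity(flip, steps=1000):
    """Approximate capacity as a max() over a grid of candidate sources."""
    return max(bsc_mutual_information(i / steps, flip) for i in range(steps + 1))


# Decomposition of choice: a three-way choice split into two successive choices.
direct = entropy([1 / 2, 1 / 3, 1 / 6])
staged = entropy([1 / 2, 1 / 2]) + 1 / 2 * entropy([2 / 3, 1 / 3])

if __name__ == "__main__":
    print(f"direct choice: {direct:.6f} bits, staged choice: {staged:.6f} bits")
    print(f"redundancy of a skewed source: {redundancy([0.7, 0.1, 0.1, 0.1]):.4f}")
    print(f"BSC capacity at flip=0.1: {bsc_capacity(0.1):.6f} bits per symbol")
```

The direct and staged values agree, which is what the decomposition requirement demands. For this symmetric channel the grid maximum sits at the uniform source, where capacity equals one bit minus the entropy of the flip probability.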

## Backlog

- Multidimensional binary search trees used for associative searching, (Bentley, 1975), PDF
- RBM predecessor Harmonium: Information processing in dynamical systems: Foundations of harmony theory, (Smolensky, 1986), PDF
- Reducing the Dimensionality of Data, (Hinton and Salakhutdinov, 2006), PDF
- Online Convex Programming and Generalized Infinitesimal Gradient Ascent (Zinkevich, 2003), PDF
- Supervised Sequence Labelling with Recurrent Neural Networks (Graves, 2012), PDF
- High-speed tracking with kernelized correlation filters, (Henriques et al., 2015), PDF

Similar resources: @shakir_za tweets a series called “Sunday Classic Paper”.

