From Tabulating Machines to Machine Learning to Deep Learning


A Census Bureau clerk tabulates data using a Hollerith Machine (Source: US Census Bureau)

This week’s milestone in the history of technology is the patent that launched the ongoing quest to get machines to help us and them know more about our world, from tabulating machines to machine learning to deep learning (or today’s “artificial intelligence”).

On January 8, 1889, Herman Hollerith was granted a patent titled the “Art of Compiling Statistics.” The patent described a punched-card tabulating machine that launched a new industry and the fruitful marriage of statistics and computer engineering, called “machine learning” since the late 1950s and reincarnated today as “deep learning” (also popularly known as “artificial intelligence”).

Commemorating IBM’s 100th anniversary in 2011, The Economist wrote:

In 1886, Herman Hollerith, a statistician, started a business to rent out the tabulating machines he had originally invented for America’s census. Taking a page from train conductors, who then punched holes in tickets to denote passengers’ observable traits (e.g., that they were tall, or female) to prevent fraud, he developed a punch card that held a person’s data and an electric contraption to read it. The technology became the core of IBM’s business when it was incorporated as Computing Tabulating Recording Company (CTR) in 1911 after Hollerith’s firm merged with three others.

In his patent application, Hollerith explained the use of his machine in the context of a population survey, highlighting its usefulness in the statistical analysis of “big data”:

The returns of a census contain the names of individuals and various data relating to such persons, as age, sex, race, nativity, nativity of father, nativity of mother, occupation, civil condition, etc. These facts or data I will for convenience call statistical items, from which items the various statistical tables are compiled. In such compilation the person is the unit, and the statistics are compiled according to single items or combinations of items… it may be required to know the numbers of persons engaged in certain occupations, classified according to sex, groups of ages, and certain nativities. In such cases persons are counted according to combinations of items. A method for compiling such statistics must be capable of counting or adding units according to single statistical items or combinations of such items. The labor and expense of such tallies, especially when counting combinations of items made by the usual methods, are very great.
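The kind of tally Hollerith describes, counting persons by a single item or by a combination of items, can be sketched in a few lines of modern Python. This is only an illustration of the counting task, not a model of his electromechanical machine; the records, the field names, and the `tabulate` helper are all hypothetical and invented for the example.

```python
from collections import Counter

# Hypothetical census returns: one record per person (the "unit").
# Field names and values are made up for illustration.
returns = [
    {"sex": "F", "age_group": "20-29", "nativity": "native", "occupation": "teacher"},
    {"sex": "M", "age_group": "20-29", "nativity": "foreign", "occupation": "farmer"},
    {"sex": "F", "age_group": "30-39", "nativity": "native", "occupation": "teacher"},
    {"sex": "M", "age_group": "30-39", "nativity": "native", "occupation": "farmer"},
    {"sex": "F", "age_group": "20-29", "nativity": "foreign", "occupation": "teacher"},
]


def tabulate(records, items):
    """Count persons according to one statistical item or a combination of items."""
    # Each person contributes one tuple holding the values of the chosen items.
    return Counter(tuple(person[item] for item in items) for person in records)


# Counting by a single item.
by_occupation = tabulate(returns, ["occupation"])

# Counting by a combination of items: occupation, classified by sex and age group.
by_combination = tabulate(returns, ["occupation", "sex", "age_group"])

for key, count in sorted(by_combination.items()):
    print(key, count)
```

Done by hand, every new combination of items means another full pass over the returns with a tally sheet, which is the labor and expense the patent set out to reduce.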

James Cortada in Before the Computer quotes Walter Wilcox of the U.S. Bureau of the Census:

While the returns of the Tenth (1880) Census were being tabulated at Washington, John Shaw Billings [Director of the Division of Vital Statistics] was walking with a companion through the office in which hundreds of clerks were engaged in laboriously transferring data from schedules to record sheets by the slow and heartbreaking method of hand tallying. As they were watching the clerks he said to his companion, “there ought to be some mechanical way of doing this job, something on the principle of the Jacquard loom.”

Says Cortada: “It was a singular moment in the history of data processing, one historians could reasonably point to and say that things had changed because of it. It stirred Hollerith’s imagination and ultimately his achievements.” Cortada describes the results of the first large-scale machine learning project:

The U.S. Census of 1890… was a milestone in the history of modern data processing…. No other occurrence so clearly symbolized the start of the age of mechanized data handling…. Before the end of that year, [Hollerith’s] machines had tabulated all 62,622,250 souls in the United States. Use of his machines saved the bureau $5 million over manual methods while cutting sharply the time to do the job. Additional analysis of other variables with his machines meant that the Census of 1890 could be completed within two years, as opposed to nearly ten years taken for fewer data variables and a smaller population in the previous census.

But the efficient output of the machine was considered by some as “fake news.” In 1891, the Electrical Engineer reported (quoted in Patricia Cline Cohen’s A Calculating People):

The statement by Mr. Porter [the head of the Census Bureau, announcing the initial count of the 1890 census] that the population of this great republic was only 62,622,250 sent into spasms of indignation a great many people who had made up their minds that the dignity of the republic could only be supported on a total of 75,000,000. Hence there was a howl, not of “deep-mouthed welcome,” but of frantic disappointment. And then the publication of the figures for New York! Rachel weeping for her lost children and refusing to be comforted was a mere puppet-show compared with some of our New York politicians over the strayed and stolen Manhattan Island citizens.

A century later, no matter how efficiently machines learned, they were still accused of creating and disseminating “fake news.” On March 24, 2011, the U.S. Census Bureau delivered “New York’s 2010 Census population totals, including first look at race and Hispanic origin data for legislative redistricting.” In response to the census data showing that New York has about 200,000 fewer people than originally thought, Sen. Chuck Schumer said, “The Census Bureau has never known how to count urban populations and needs to go back to the drawing board. It strains credulity to believe that New York City has grown by only 167,000 people over the last decade.” Mayor Bloomberg called the numbers “totally incongruous,” and Brooklyn borough president Marty Markowitz said, “I know they made a big big mistake.” [The results of the 1990 census were also disappointing and were unsuccessfully challenged in court, according to the New York Times.]

Complaints by politicians and other people not happy with learning machines have not slowed down the continuing advances in using computers in ingenious ways for increasingly sophisticated statistical analysis. But for many years after Hollerith’s invention and after tabulating machines became digital computers, the machines interacted with the world around them in a very specific, one-dimensional way. Kevin Maney in Making the World Work Better:

Hollerith gave computers a way to sense the world through a crude form of touch. Subsequent computing and tabulating machines would improve on the process, packing more information onto cards and developing methods for reading the cards much faster. Yet, amazingly, for six more decades computers would experience the outside world no other way.

Deep learning, the recently successful variant of machine learning (giving rise to the buzz around “artificial intelligence”), opened up new vistas for learning machines. Now they can “see” and “hear” the world around them, driving a worldwide race to produce the winning self-driving car and to plant virtual assistants everywhere. These are new applications in the age-old endeavor of combining statistical analysis with computer engineering, of getting machines to assist us in the processing, tabulating, and analysis of data.

Originally published on

About GilPress

I launched the Big Data conversation; writing, research, marketing services; &