Build your own Memex

by Mike Shea on 5 August 2006

In July 1945, as World War II drew to a close, the scientist Vannevar Bush published an article called "As We May Think". In it, Bush describes the "Memex", a machine, essentially a series of tubes (it's not a big truck), that is able to scan, photocopy, record, and index all of the information in one's head. In his own words:

"A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility. It is an enlarged intimate supplement to his memory."

In his day, he envisioned the Memex as an analog device that would photocopy books, texts, papers, notes, and drawings. These items could all be indexed or marked and interlinked so that someone could travel through this base of data and find the bit of information for which they were looking.

It was the descriptions of these links that eventually influenced the invention of hypertext.

Today all of the technology Vannevar Bush describes is within the grasp of average Americans. We would never consider the capture of analog material, however, without digitizing it. In my article last week, I described the Lifebackup, a system, process, and set of scripts that captures one's digital life: writings, emails, photos, captured web pages, music, recordings, videos, and chat logs. All of these data can be stored digitally on portable hard drives or web servers. They can all be indexed with a variety of tools such as Google Desktop Search and Picasa. With it, one's entire digital life can be passed on from generation to generation.

Another technology more closely resembles the vision of Vannevar Bush - the printer / scanner / copier. This single device can take books, papers, drawings, and other analog media, scan them, and record them digitally. People can take their snapshots, documents, notebooks, drawings, and favorite passages from books, scan them, and store them within their own digital lifebackup. Anyone can buy one for about $200 and, according to Saving Stuff, black and white laser printing can last as long as the paper upon which it is printed.

Is that the right direction? Digital media and digital content have a very poor history for long-term life. Media degrades quickly and the data itself either becomes corrupted or so obsolete that there is no way to read it. Legal issues surround all of it like a dark spectre, threatening anyone who might spend the time to reverse engineer the technology to decode these lost data. Who knows if any company will have the legal rights to render a .jpeg in forty years, much less the capability?

Last week a book of psalms was found in the mud in Ireland. The book is about 1000 years old, dating to between 800 and 1000 AD. While the damage to the twenty-page book is extensive, the material within it can, over time, be fully recovered. What would have happened to a CD-ROM or a USB drive stuck in the mud for a thousand years?

Books have proven to last for two millennia when properly treated. The material within the pages does not need to be decoded or transformed. As long as the pages survive, the information within can be read and understood.

So why would we take a medium known to last for thousands of years and convert it to a medium and format known to be so complicated and delicate that we lose hundreds of gigabytes in only a few years?

The digitization of data has the following advantages:

It is easy to transform digital data from one format, media, or file type to another.

It is easy to make a full copy of the data.

It is easy to transport the data or a copy of the data elsewhere.

It is easier to search and navigate digital data than analog.

Digital data can easily include audio and video. The strengths of hardcopy data are lost when it comes to audio and video. No format or media has yet been invented that preserves audio and video as well as still images or text.

It is easier to make analog copies of digital data than it is to make analog copies of analog copies. It is easier to print a paper photo album from a digital master than to copy another paper one.

Data kept in a hardcopy format has the following advantages:

It requires no tool or device to read the data.

It degrades gracefully. Any lost page does not affect the others.

It is inexpensive. A book costs only a few dollars these days. Paper is cheap.

It can be stored for extremely long periods of time. One could preserve hardcopy data for a thousand years.

It would seem, with an understanding of the above principles, that a modern Memex machine would actually contain both analog and digital data. This is where the printer / scanner / copier comes in.

With a digital lifebackup built, a multifunction printer could help one copy data from paper to a digital format and help print digital data to paper. A copy of many of the items stored in digital format can be printed and stored for long-term preservation while still kept in a digital repository to assist with search, copying, transportation, and transformation.

Print-on-demand services offer us another excellent tool for digital preservation. One can easily create a template and transform their digital data into a format suitable for printing from a service such as Lulu. A few hundred pages can be printed and bound into a nice trade-paperback format for under $20. These hardcopy books can act as the long-term archive of digital data in a format and medium that can last far longer than its digital equivalent.
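As a rough illustration of the "create a template and transform" step, here is a minimal Python sketch that merges a set of articles into one plain-text manuscript. It is not the script I actually used. The `ARTICLE_TEMPLATE` layout, the `build_manuscript` function, and the `title` / `date` / `body` fields are all assumptions made for the example, and a real print-on-demand service will have its own requirements for file format and page layout.

```python
from string import Template

# Hypothetical per-article layout; the field names are assumptions
# made for this sketch, not part of any print service's format.
ARTICLE_TEMPLATE = Template("$title\n$underline\n$date\n\n$body\n")


def build_manuscript(articles):
    """Merge articles into one text manuscript, oldest first.

    Each article is a dict with "title", "date" (ISO yyyy-mm-dd string),
    and "body". A form feed character separates articles so that each
    one can start on a new page.
    """
    ordered = sorted(articles, key=lambda a: a["date"])
    pages = [
        ARTICLE_TEMPLATE.substitute(
            title=a["title"],
            underline="=" * len(a["title"]),
            date=a["date"],
            body=a["body"].strip(),
        )
        for a in ordered
    ]
    return "\f".join(pages)


if __name__ == "__main__":
    sample = [
        {"title": "Build your own Memex", "date": "2006-08-05",
         "body": "In 1945 ..."},
        {"title": "The Lifebackup", "date": "2006-07-29",
         "body": "A system, process, and set of scripts ..."},
    ]
    print(build_manuscript(sample))
```

The resulting text would still need to be laid out and converted to whatever format the printing service accepts, but the point stands: once the articles are digital, assembling a book from them is a small scripting job.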

Yesterday I built four books, three of them over 500 pages, combining all of the articles of Mikeshea.net, Liquidtheater.com, Loralciriclight.com, and Vrenna and the Red Stone. These four books contain just about all of the articles I wrote on the web. Within an hour, I built, formatted, and ordered professional hardcopy books on acid-free paper. I can put these four books into Ziploc bags (I found out that Ziploc bags are made of polyethylene, an archival-quality plastic), stick them in a pretzel jar, and save them for a thousand years. I can't do that with a digital copy of these data. The total cost of this paper archive is about $80.

Yesterday Reuters reported that an electrician in Danville, VA found a 188-year-old bible broken into four pieces. The book had survived fire and floods. No electronic device could have survived what this book faced over those 188 years.