by Mike Shea on 24 September 2005
I've become fascinated with the challenges of preserving our knowledge. It's probably some psychological arm-wrestle with mortality, but whatever, I'm having fun.
There was a good article in Fairfax Digital called The Digital Dark Age that talks about a lot of the issues with preserving digital data. I also found a lot of interesting reports from various sources on archiving digital data and the challenges it presents.
At the same time, I continually look at preserving my own information both in paper and in digital formats. Right now I have a pretty good process that follows Arthur C. Clarke's Rule of Three: preserve in three formats on three media types in three places.
There are a few problems faced when considering the longevity of digital archives:
Most people only consider the last one. I bought some CD-ROMs and DVDs that are rated for 100 years, but I'm pretty sure no one will have a CD-ROM drive that far in the future. Also, what good is a CD with your lost novel if it's written in a binary Microsoft Word file that no one can understand?
Here are some more random thoughts:
Archival Formats:
- Text: UTF-8 or ASCII, HTML (minimal markup, strict compliance, complete removal or isolation of style from structure), XML
- Images: GIF, JPEG, PNG, BMP, TIFF
- Audio: MP3, WAV, OGG
- Video: MPEG-1, AVI
Archival Media:
- Hard drives (40 GB+ size, lifespan of ten years or so; degrades as used and under changes in climate)
- USB drives (1 GB to 4 GB size, lifespan of five to ten years; degrades as it is written to, survives extreme climates)
- CD-ROMs (600 MB size, 100-year media lifespan, ubiquitous, open, cross-platform)
- DVD-ROMs (4 GB size, 100-year lifespan, not as ubiquitous as CD, open, cross-platform)
National Archives rules:
- Open standards
- Ubiquity - is it used often?
- Stability - is it a stable format with a proven history?
- Metadata support - does it allow for an open source of metadata?
- Feature set - can it do what you need it to do?
- Interoperability - does it work on a variety of systems and platforms?
- Viability - does it do any sort of internal data check?
Mike's additional rules:
- Durability - can the data be restored or preserved if damaged?
- Migratability - can it be easily shifted from one media, format, or system to another?
- Ten year rule - has this format remained ubiquitous over a long period of time?
- Your system is more important than technical detail.
- Do not compress - uncompressed data is easier to fix on damaged media.
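The viability and durability questions both depend on being able to tell that a file has been damaged in the first place. Plain text and HTML carry no internal check, so one option is to keep a checksum manifest alongside the archive. What follows is a minimal sketch under that assumption: the function names are made up for illustration and SHA-256 is just one reasonable choice of hash, not something the rules above call for.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map every file under root (by relative path) to its digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_damage(root: Path, manifest: dict[str, str]) -> list[str]:
    """List files that are missing or whose contents no longer match."""
    damaged = []
    for name, expected in manifest.items():
        target = root / name
        if not target.is_file() or sha256_of(target) != expected:
            damaged.append(name)
    return damaged
```

Build the manifest when a copy is written, store it with the copy, and run `find_damage` against each copy before trusting it for a restore.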
What is my backup procedure? Let's take a look:
Preserve in Three Places:
- On my website
- On my home machine HDs
- At Michelle's house
- On my person
Formats:
- HTML, single file per site
- HTML, one file per article
- Single-file SQL ASCII database dump
Media:
- Website HDs
- Home HDs
- USB flash drives
Rules of Thumb:
- Don't compress
- Use standards
- Keep it simple
- Document it well
- Keep it automated
- Single volume per copy
- Document the recovery well
- Keep it platform, format, application, and media independent
- Don't go overboard. Organization and documentation are more important than chaotic mass backup.
Procedures:
My website is the core area I care about. All of my other machines can explode as long as the data on my website are ok. Keep no personal files local that aren't copied to the web.
Every morning at 1am, push anything I've written on my home machine's stories folder to the "stories" directory on my website.
Every day at 2am, tar up the following on my pair.com webserver to http://mikeshea.net/lifebackup.tar.gz:
- mikeshea.net/articles
- liquidtheater.com/
- loralciriclight.com/
- mobhunter.com/
- bobshea.net/
- vrenna_and_other_tales.html
- notes
- stories
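The 2am step is a plain tar job on the web server. As an illustration only, here is roughly the same thing in Python. It assumes all of the listed items sit under one web root, which is a guess about the server layout; `build_archive` and the path in the usage comment are hypothetical.

```python
import tarfile
from pathlib import Path

# The items from the list above, relative to an assumed common web root.
BACKUP_ITEMS = [
    "mikeshea.net/articles",
    "liquidtheater.com",
    "loralciriclight.com",
    "mobhunter.com",
    "bobshea.net",
    "vrenna_and_other_tales.html",
    "notes",
    "stories",
]


def build_archive(base: Path, names: list[str], archive: Path) -> list[str]:
    """Pack the named files and directories under base into a .tar.gz.

    Returns the names that were actually added; missing ones are skipped.
    """
    added = []
    with tarfile.open(archive, "w:gz") as tar:
        for name in names:
            source = base / name
            if source.exists():
                # arcname keeps paths inside the archive relative to base.
                tar.add(source, arcname=name)
                added.append(name)
    return added


# Hypothetical usage; the web root path is a placeholder:
# build_archive(Path("/path/to/webroot"), BACKUP_ITEMS, Path("lifebackup.tar.gz"))
```

Comparing the returned list against `BACKUP_ITEMS` is a cheap way to notice that a directory has gone missing before the nightly archive quietly shrinks.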
Every day at 4am, my home multimedia machine (my core backup machine) retrieves the lifebackup.tar.gz file and unpacks it using 7za.exe, 7zip's universal unpack executable, to the lifebackup directory in "My Documents".
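The retrieve-and-unpack step can be sketched the same way. This is not the 7za.exe setup described above, just an equivalent using Python's standard library; `fetch_and_unpack` and the working directory in the usage comment are assumptions made for the example.

```python
import shutil
import tarfile
import urllib.request
from pathlib import Path


def fetch_and_unpack(url: str, work_dir: Path) -> Path:
    """Download a .tar.gz from url and unpack it into work_dir/lifebackup."""
    work_dir.mkdir(parents=True, exist_ok=True)
    archive = work_dir / "lifebackup.tar.gz"

    # Stream the download to disk rather than holding it all in memory.
    with urllib.request.urlopen(url) as response, archive.open("wb") as out:
        shutil.copyfileobj(response, out)

    target = work_dir / "lifebackup"
    target.mkdir(exist_ok=True)
    # The archive comes from my own server, so its contents are trusted here.
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(target)
    return target


# Hypothetical usage; the local directory is a placeholder:
# fetch_and_unpack("http://mikeshea.net/lifebackup.tar.gz", Path("My Documents"))
```

Keeping the downloaded archive next to the unpacked tree means each backup machine holds both a single-file copy and a browsable one.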
My multimedia machine then uses xcopy to copy any changed files to my bank of four USB thumb drives, my internal video drive, my external Western Digital drive, and my iPod nano.
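The post doesn't say which xcopy switches are in use, so the sketch below assumes the behavior of something like `xcopy /D /S`: walk the source tree and copy only files that are missing from the destination or newer in the source. `copy_changed`, `update_all`, and the drive letters in the usage comment are all hypothetical.

```python
import shutil
from pathlib import Path


def copy_changed(source: Path, destination: Path) -> list[str]:
    """Copy files that are missing from destination or newer in source."""
    copied = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        relative = src.relative_to(source)
        dst = destination / relative
        if not dst.exists() or src.stat().st_mtime > dst.stat().st_mtime:
            dst.parent.mkdir(parents=True, exist_ok=True)
            # copy2 preserves the modification time, so an unchanged file
            # is not copied again on the next run.
            shutil.copy2(src, dst)
            copied.append(relative.as_posix())
    return copied


def update_all(source: Path, destinations: list[Path]) -> dict[str, list[str]]:
    """Update every destination that is currently present (plugged in)."""
    report = {}
    for destination in destinations:
        if destination.exists():
            report[str(destination)] = copy_changed(source, destination)
    return report


# Hypothetical usage; the drive letters are placeholders:
# update_all(Path("lifebackup"), [Path("E:/lifebackup"), Path("F:/lifebackup")])
```

Skipping destinations that don't exist keeps the nightly job from failing just because one thumb drive happens to be unplugged.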
My multimedia box also backs up EverQuest, Warcraft, EQ2, and My Documents (including all of my music and audio books) to my Western Digital external drive.
Once a month I plug in another isolated USB hard drive and copy everything over. This drive rotates with another drive over to Michelle's house for a full backup of all of my games, music, and written works outside of my own apartment.
Here are some other resources I've found:
http://www.tessella.com/Services/Capabilities/e_Digital%20Preservation.pdf
http://www.si.umich.edu/CAMILEON/reports/reports.html
http://www.dpconline.org/graphics/digpres/presissues.html
http://www.columbia.edu/acis/dl/imagespec.html
http://www.columbia.edu/cu/lweb/services/preservation/dlpolicy.html
http://www.nationalarchives.gov.uk/preservation/advice/digital.htm
http://www.nationalarchives.gov.uk/preservation/advice/pdf/selecting_file_formats.pdf