Evernote and Long-Term Data Archiving

by Mike Shea on 18 January 2010

I've been having a lot of fun with Evernote over the past couple of months. When one manages to build it into one's workflow, it really is an amazing ubiquitous capture device.

The long-term survival of our data is something I constantly consider these days. Digital data just seems weak to me, easily destroyed, easily corrupted, easily lost as formats and devices change. When I'm looking at a program like Evernote, I'm thinking about how long I will use such a thing or how long such a program will last. Is it possible that Evernote will be around for the next forty or fifty years? If not, what will become of it and what will become of the data I am sticking into it?

One thing I always want to be sure of, these days, is how I can get my data out of a thing. iTunes, for example, lets me pull out my music and stick it somewhere else. Same with my images. I can be generally assured that JPEG is the best format for my images these days, patent politics aside. As far as the long-term survival of images goes, JPEG is a reasonably safe bet.

But what about a program like Evernote? Well, I was happily surprised to see that Evernote on the Mac not only exports to a directory full of HTML files - including all of the images for those files - but also stores these files natively in both HTML and XML. In fact, if you were to archive the data directory for Evernote, you would be archiving a directory hierarchy of HTML data and the images contained within. This speaks very well for the survival of the data held within Evernote. Granted, things like tags might get lost should you want to import Evernote data into something else. To my knowledge, no other program lets you import from Evernote. Not too surprising, as it's a relatively niche market. Even so, I could index a pile of HTML subdirectories easily enough and search against it. Browsing is probably a bit more difficult but still doable. If I really needed to, I could probably write scripts to parse through the results and generate some sort of HTML index page.
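As a rough idea of what such an index script might look like, here is a minimal Python sketch. It assumes the export is simply a directory tree of .html files, each possibly carrying a <title> tag; I haven't checked that against Evernote's actual export layout, and the function names (find_notes, build_index) are made up for illustration.

```python
import html
import re
import sys
import urllib.parse
from pathlib import Path

# Assumption: each exported note is an .html file that may contain a <title> tag.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def find_notes(export_dir):
    """Return (title, relative_path) pairs for every HTML file under export_dir."""
    root = Path(export_dir)
    notes = []
    for path in sorted(root.rglob("*.html")):
        # Skip an index page left behind by an earlier run of this script.
        if path.parent == root and path.name == "index.html":
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        match = TITLE_RE.search(text)
        title = html.unescape(match.group(1)).strip() if match else ""
        if not title:
            title = path.stem  # fall back to the file name
        notes.append((title, path.relative_to(root).as_posix()))
    return notes


def build_index(export_dir):
    """Write index.html at the top of export_dir, linking to every note found."""
    root = Path(export_dir)
    items = []
    for title, rel in find_notes(root):
        href = html.escape(urllib.parse.quote(rel))
        items.append(f'<li><a href="{href}">{html.escape(title)}</a></li>')
    page = (
        "<!DOCTYPE html>\n<html><head><meta charset=\"utf-8\">"
        "<title>Note index</title></head>\n<body>\n<ul>\n"
        + "\n".join(items)
        + "\n</ul>\n</body></html>\n"
    )
    out = root / "index.html"
    out.write_text(page, encoding="utf-8")
    return out


if __name__ == "__main__":
    # Usage: python build_index.py /path/to/exported/notes
    print(build_index(sys.argv[1]))
```

Tags and notebooks would still be lost this way, but the result is a single browsable page that any web browser should be able to open.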

Evernote also has a relatively robust and well-published API. This is something to be commended and helps reinforce my decision to invest in Evernote with a higher-end account. A better software developer than I could write against this API and do quite a bit with the data held within, including importing it into other systems in the future.

There is no knowing how digital data will hold up over time. Right now all reasonable data archivists tell you to manually move your data to newer platforms and media every five years. Some have termed this data curation. Technology simply moves too quickly to let your data sit around for 10 years. There is no format or medium I know of that can be left alone for 10 or 20 years without some sort of migration and cleanup. Text files and JPEGs might be decent enough formats to save your data in, but ten-year-old hard drives are likely to fail and any other ten-year-old medium has even less survivability. We can hope the USB thumb drive can last longer than that but we haven't seen it for sure yet. We're finally getting to the point where a thumb drive is cheap enough per gigabyte to store nearly all of our vital data on a single drive. A 16 GB drive runs about $30 these days.

Still, the philosophy and articulation of Evernote's design make me feel safe enough to use it to store most everything I want to capture. I've written a script to import my autojournal every week, so now all of my current writings are captured in there as well. I haven't gotten to the point where I'll store my entire lifebackup in there - it's a lot of data to stash in a single place - but I could.

The big question will be how well this program works in 2020. Will it still be around? What will become of the data we've stored in it if it's not? I'm confident I'll still have these data, but only time will tell me exactly how I will use them.

We live in interesting times.