Dear Appy, - Gordon Bell



Dear Appy,

How committed are you?

Signed,

Lost and Forgotten Data

Gordon Bell

Microsoft Research

Dear Appy,

I have trouble with the long-term commitments of the apps that created me and that I associate with. Over time, these pesky apps evolve and simply don’t recognize data that they once helped create. But we data progeny consider them responsible for eternal support. Some apps even disappear. Is it expecting too much for 20-something-year-old data like me to be interpretable by my app (e.g., Acrobat, DB2, Draw, Eudora, Office, Quicken, or RealNetworks), or am I just associating with irresponsible apps? Based on history, it seems I will be uninterpretable within 20 to 50 years. Apps will move to other platforms, or evolve to be more internet- or next-big-thing-centric.

My owner is trying to store everything cyberizable in his life. His Cyberall aims at holding every form of personal information, whether directly created or digitally encoded, e.g., documents, photos, and videos. His data wants to be valid, and hence understood, in an indeterminate future! For example, high-quality paper will hold information for a millennium, and film is sometimes rated at several hundred years. A CD is likely to be readable in 50 years, but finding the CD reader/computer & file system/app to read it will clearly be impossible if history is a guide[1]. So, would you say that the only true form of long-term storage is paper? Is data committed to an inordinate conversion effort with each generation if it wants to be eternal, due to these three too rapidly evolving, finger-pointing levels: the media, the computer and its operating system, and the app? If so, this means storing 10-foot stacks of paper holding personal information versus a single DVD for the few GBytes, and simply giving up on audio and video! Various computing forefathers donate their personal archives to places like the Charles Babbage Institute, the Computer Museum History Center, or a university library in hopes that future scholars will find them useful. The irony is that with this much paper that computers helped create, computers are unlikely to be helpful in assisting the retention and retrieval of their personal archives. But poor video data! An app that encoded video just two years ago has gone away, leaving its data useless. That was because of the evolving nature of proprietary formats coming out of the format wars. Any one of the MPEGs would have been a better choice.

Are there a few basic data-types that will be forever interpretable, so that one doesn’t have to print everything out and store it in large stacks of irretrievable paper, waiting to be encoded or otherwise found?

For one thing, data has learned that in order to be understood in the future, it cannot be subject to highly volatile apps that change every year, such that a particular version has to be executed in order for data to be understood, e.g., Quicken 95…2000.[2] As apps evolve, this means either data keeps the creating version of the app around, or all past data associated with a named app has to be converted forward. Kind of an issue of object philosophy. Alternatively, the simplest way to ensure interpretability is to transform an app’s progeny, i.e., its data, into a generic form in which one has very long-term confidence. This assumes there are a few golden, generic formats that will live indefinitely. ASCII text is probably the only proven long-term data type. It is too early to tell whether HTML will make it as a golden format. Data of the world sure has a commitment to it. Unfortunately, an HTML document consists of a number of files, including images (e.g., GIF or JPEG), making it a less than ideal format. PDF looks like a potential bet for almost all paper documents, if it can prove it has a commitment to our longevity!

Clearly, a good solution to longevity is to have just a few data-types with wide acceptance and standardization that data can be transformed into, and that are not subject to the whims of rapidly evolving apps. Forget about data held in a complex store such as a drawing program’s files or a database, e.g., DB2 or Outlook[3]. What golden formats will exist in addition to ASCII? How long will data held in RTF, PDF, JPEG, various MPEGs, and MP3 be interpretable[4]? Given the vast amount of data in Microsoft’s Office apps[5], what commitment will these apps make to their data?

In the future, what prenuptials do you recommend for data? What about apps’ fiduciary responsibilities to data that may have cost hundreds of billions of dollars to create?

Lost data

-----------------------

[1] A friend has converted a number of the author’s c. 1980 8” floppy disks from the Digital PDP-8 Word Processing System (WPS 8) format into WordPerfect using a PDP-8 emulator running WPS 8 software.

[2] Data written in 1990 on a Mac and converted forward to a more recent version can be used to generate an almost accurate report of the transactions. Data written on a Mac cannot be converted across, i.e., read on a PC, without inordinate effort. MacDraw, MacDraw Professional, and Draw (MacDraw for the PC) have essentially the same characteristics.

[3] Eudora creates a single ASCII file of messages for each folder; hence it is almost certain to remain readable.

[4] Hardware devices such as cameras help create format inertia, but don’t guarantee longevity!

[5] Data written in the early 1980s can be converted forward from Mac versions 4.0 and converted across to the PC Office 2000 standards for Excel, PowerPoint, and Word. PowerPoint can be converted to JPEG.
